Contents (209 sections)
- Cover
- Copyright Information
- Credits
- About the Authors
- About the Reviewer
- www.PacktPub.com
- Why subscribe?
- Customer Feedback
- Preface
- What this book covers
- What you need for this book
- Who this book is for
- Conventions
- Reader feedback
- Customer support
- Downloading the example code
- Downloading the color images of this book
- Errata
- Piracy
- Questions
- Introduction to Apex
- Unbounded data and continuous processing
- Stream processing
- Stream processing systems
- What is Apex and why is it important?
- Use cases and case studies
- Real-time insights for Advertising Tech (PubMatic)
- Industrial IoT applications (GE)
- Real-time threat detection (Capital One)
- Silver Spring Networks (SSN)
- Application Model and API
- Directed Acyclic Graph (DAG)
- Apex DAG Java API
- High-level Stream Java API
- SQL
- JSON
- Windowing and time
- Value proposition of Apex
- Low latency and stateful processing
- Native streaming versus micro-batch
- Performance
- Where Apex excels
- Where Apex is not suitable
- Summary
- Getting Started with Application Development
- Development process and methodology
- Setting up the development environment
- Creating a new Maven project
- Application specifications
- Custom operator development
- The Apex operator model
- CheckpointListener/CheckpointNotificationListener
- ActivationListener
- IdleTimeHandler
- Application configuration
- Testing in the IDE
- Writing the integration test
- Running the application on YARN
- Execution layer components
- Installing Apex Docker sandbox
- Running the application
- Working on the cluster
- YARN web UI
- Apex CLI
- Logging
- Dynamically adjusting logging levels
- Summary
- The Apex Library
- An overview of the library
- Integrations
- Apache Kafka
- Kafka input
- Kafka output
- Other streaming integrations
- JMS (ActiveMQ, SQS, and so on)
- Kinesis streams
- Files
- File input
- File splitter and block reader
- File writer
- Databases
- JDBC input
- JDBC output
- Other databases
- Transformations
- Parser
- Filter
- Enrichment
- Map transform
- Custom functions
- Windowed transformations
- Windowing
- Global Window
- Time Windows
- Sliding Time Windows
- Session Windows
- Window propagation
- State
- Accumulation
- Accumulation Mode
- State storage
- Watermarks
- Allowed lateness
- Triggering
- Merging of streams
- The windowing example
- Dedup
- Join
- State Management
- Summary
- Scalability, Low Latency, and Performance
- Partitioning and how it works
- Elasticity
- Partitioning toolkit
- Configuring and triggering partitioning
- StreamCodec
- Unifier
- Custom dynamic partitioning
- Performance optimizations
- Affinity and anti-affinity
- Low-latency versus throughput
- Sample application for dynamic partitioning
- Performance – other aspects for custom operators
- Summary
- Fault Tolerance and Reliability
- Distributed systems need to be resilient
- Fault-tolerance components and mechanism in Apex
- Checkpointing
- When to checkpoint
- How to checkpoint
- What to checkpoint
- Incremental state saving
- Incremental recovery
- Processing guarantees
- Example – exactly-once counting
- The exactly-once output to JDBC
- Summary
- Example Project – Real-Time Aggregation and Visualization
- Streaming ETL and beyond
- The application pattern in a real-world use case
- Analyzing Twitter feed
- Top Hashtags
- TweetStats
- Running the application
- Configuring Twitter API access
- Enabling WebSocket output
- The Pub/Sub server
- Grafana visualization
- Installing Grafana
- Installing Grafana Simple JSON Datasource
- The Grafana Pub/Sub adapter server
- Setting up the dashboard
- Summary
- Example Project – Real-Time Ride Service Data Processing
- The goal
- Datasource
- The pipeline
- Simulation of a real-time feed using historical data
- Parsing the data
- Looking up of the zip code and preparing for the windowing operation
- Windowed operator configuration
- Serving the data with WebSocket
- Running the application
- Running the application on GCP Dataproc
- Summary
- Example Project – ETL Using SQL
- The application pipeline
- Building and running the application
- Application configuration
- The application code
- Partitioning
- Application testing
- Understanding application logs
- Calcite integration
- Summary
- Introduction to Apache Beam
- Introduction to Apache Beam
- Beam concepts
- Pipelines, PTransforms, and PCollections
- ParDo – elementwise computation
- GroupByKey/CombinePerKey – aggregation across elements
- Windowing, watermarks, and triggering in Beam
- Windowing in Beam
- Watermarks in Beam
- Triggering in Beam
- Advanced topic – stateful ParDo
- WordCount in Apache Beam
- Setting up your pipeline
- Reading the works of Shakespeare in parallel
- Splitting each line on spaces
- Eliminating empty strings
- Counting the occurrences of each word
- Format your results
- Writing to a sharded text file in parallel
- Testing the pipeline at small scale with DirectRunner
- Running Apache Beam WordCount on Apache Apex
- Summary
- The Future of Stream Processing
- Lower barrier for building streaming pipelines
- Visual development tools
- Streaming SQL
- Better programming API
- Bridging the gap between data science and engineering
- Machine learning integration
- State management
- State query and data consistency
- Containerized infrastructure
- Management tools
- Summary

Last updated: 2021-07-02 22:39:10