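Brand: China National Publications Import & Export (Group) Corporation (中圖公司)
Listed: 2021-07-02 18:47:07
Publisher: Packt Publishing
The digital rights to this book are provided by China National Publications Import & Export (Group) Corporation, which has authorized Shanghai Yuewen Information Technology Co., Ltd. to produce and distribute this edition.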
- Cover
- Copyright information
- Credits
- About the Authors
- About the Reviewer
- www.PacktPub.com
- Why subscribe?
- Customer Feedback
- Preface
- What this book covers
- What you need for this book
- Who this book is for
- Conventions
- Reader feedback
- Customer support
- Downloading the example code
- Downloading the color images of this book
- Errata
- Piracy
- Questions
- Introduction to Apex
- Unbounded data and continuous processing
- Stream processing
- Stream processing systems
- What is Apex and why is it important?
- Use cases and case studies
- Real-time insights for Advertising Tech (PubMatic)
- Industrial IoT applications (GE)
- Real-time threat detection (Capital One)
- Silver Spring Networks (SSN)
- Application Model and API
- Directed Acyclic Graph (DAG)
- Apex DAG Java API
- High-level Stream Java API
- SQL
- JSON
- Windowing and time
- Value proposition of Apex
- Low latency and stateful processing
- Native streaming versus micro-batch
- Performance
- Where Apex excels
- Where Apex is not suitable
- Summary
- Getting Started with Application Development
- Development process and methodology
- Setting up the development environment
- Creating a new Maven project
- Application specifications
- Custom operator development
- The Apex operator model
- CheckpointListener/CheckpointNotificationListener
- ActivationListener
- IdleTimeHandler
- Application configuration
- Testing in the IDE
- Writing the integration test
- Running the application on YARN
- Execution layer components
- Installing Apex Docker sandbox
- Running the application
- Working on the cluster
- YARN web UI
- Apex CLI
- Logging
- Dynamically adjusting logging levels
- Summary
- The Apex Library
- An overview of the library
- Integrations
- Apache Kafka
- Kafka input
- Kafka output
- Other streaming integrations
- JMS (ActiveMQ, SQS, and so on)
- Kinesis streams
- Files
- File input
- File splitter and block reader
- File writer
- Databases
- JDBC input
- JDBC output
- Other databases
- Transformations
- Parser
- Filter
- Enrichment
- Map transform
- Custom functions
- Windowed transformations
- Windowing
- Global Window
- Time Windows
- Sliding Time Windows
- Session Windows
- Window propagation
- State
- Accumulation
- Accumulation Mode
- State storage
- Watermarks
- Allowed lateness
- Triggering
- Merging of streams
- The windowing example
- Dedup
- Join
- State Management
- Summary
- Scalability, Low Latency, and Performance
- Partitioning and how it works
- Elasticity
- Partitioning toolkit
- Configuring and triggering partitioning
- StreamCodec
- Unifier
- Custom dynamic partitioning
- Performance optimizations
- Affinity and anti-affinity
- Low-latency versus throughput
- Sample application for dynamic partitioning
- Performance – other aspects for custom operators
- Summary
- Fault Tolerance and Reliability
- Distributed systems need to be resilient
- Fault-tolerance components and mechanism in Apex
- Checkpointing
- When to checkpoint
- How to checkpoint
- What to checkpoint
- Incremental state saving
- Incremental recovery
- Processing guarantees
- Example – exactly-once counting
- The exactly-once output to JDBC
- Summary
- Example Project – Real-Time Aggregation and Visualization
- Streaming ETL and beyond
- The application pattern in a real-world use case
- Analyzing Twitter feed
- Top Hashtags
- TweetStats
- Running the application
- Configuring Twitter API access
- Enabling WebSocket output
- The Pub/Sub server
- Grafana visualization
- Installing Grafana
- Installing Grafana Simple JSON Datasource
- The Grafana Pub/Sub adapter server
- Setting up the dashboard
- Summary
- Example Project – Real-Time Ride Service Data Processing
- The goal
- Datasource
- The pipeline
- Simulation of a real-time feed using historical data
- Parsing the data
- Looking up the zip code and preparing for the windowing operation
- Windowed operator configuration
- Serving the data with WebSocket
- Running the application
- Running the application on GCP Dataproc
- Summary
- Example Project – ETL Using SQL
- The application pipeline
- Building and running the application
- Application configuration
- The application code
- Partitioning
- Application testing
- Understanding application logs
- Calcite integration
- Summary
- Introduction to Apache Beam
- Introduction to Apache Beam
- Beam concepts
- Pipelines, PTransforms, and PCollections
- ParDo – elementwise computation
- GroupByKey/CombinePerKey – aggregation across elements
- Windowing, watermarks, and triggering in Beam
- Windowing in Beam
- Watermarks in Beam
- Triggering in Beam
- Advanced topic – stateful ParDo
- WordCount in Apache Beam
- Setting up your pipeline
- Reading the works of Shakespeare in parallel
- Splitting each line on spaces
- Eliminating empty strings
- Counting the occurrences of each word
- Format your results
- Writing to a sharded text file in parallel
- Testing the pipeline at small scale with DirectRunner
- Running Apache Beam WordCount on Apache Apex
- Summary
- The Future of Stream Processing
- Lower barrier for building streaming pipelines
- Visual development tools
- Streaming SQL
- Better programming API
- Bridging the gap between data science and engineering
- Machine learning integration
- State management
- State query and data consistency
- Containerized infrastructure
- Management tools
- Summary (updated: 2021-07-02 22:39:10)