- Learning Apache Apex
- Thomas Weise Munagala V. Ramanath David Yan Kenneth Knowles
- 2021-07-02 22:38:38
Native streaming versus micro-batch
Let's examine how stateful stream processing (as found in Apex and Flink) compares to the micro-batch approach used in Apache Spark Streaming.
Let's look at the following diagram:

In the preceding diagram, the top half shows an example of processing in Spark Streaming and the bottom half shows an example in Apex. Based on its underlying "stateless" batch architecture, Spark Streaming processes a stream by dividing it into small batches (micro-batches) that typically last from 500 ms to a few seconds. A new task is scheduled for every micro-batch. Once scheduled, the new task needs to be initialized. Such initialization could include opening connections to external resources, loading data that is needed for processing, and so on. Overall, this implies a per-task overhead that limits the micro-batch frequency and leads to a latency trade-off.
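The latency trade-off can be made concrete with a deliberately simplified model. The sketch below is not Spark Streaming code; the function names and the numbers (a 50 ms setup cost, a 100 ms processing time) are assumptions chosen only for illustration.

```python
def overhead_fraction(batch_interval_ms, init_overhead_ms):
    """Share of each batch interval consumed by task setup (hypothetical model)."""
    return init_overhead_ms / batch_interval_ms


def worst_case_latency_ms(batch_interval_ms, init_overhead_ms, processing_ms):
    """Latency for a record that arrives right at the start of a batch interval.

    It waits for the batch to close, then for the new task to initialize,
    then for the batch to be processed.
    """
    return batch_interval_ms + init_overhead_ms + processing_ms


# Assumed figures: 50 ms of per-task setup, 100 ms of actual processing.
for interval_ms in (2000, 500, 100):
    share = overhead_fraction(interval_ms, 50)
    latency = worst_case_latency_ms(interval_ms, 50, 100)
    print(f"interval={interval_ms} ms  setup share={share:.1%}  worst-case latency={latency} ms")
```

Under these assumptions, shrinking the interval lowers latency but raises the share of time spent on setup, which is why the fixed per-task cost puts a floor under the practical micro-batch interval.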
In classical batch processing, a task may last for the entire bounded input data set. Any computational state remains internal to the task, and fault tolerance typically requires no special consideration: whenever there is a failure, the task can simply restart from the beginning.
However, with unbounded data and streaming, a stateful operation such as counting needs to maintain the current count, and that count has to be transferred across task boundaries. As long as the state is small, this may be manageable. However, when transformations are applied to data with large key cardinality, the state can easily grow to a size that makes it impractical to swap in and out (because of the cost of serialization, I/O, and so on). Correct state management is hard to achieve without underlying platform support, especially when accuracy, consistency, and fault tolerance are important.
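The cost of moving state across task boundaries can be sketched in a few lines. This is a toy illustration, not the Spark Streaming or Apex API: `run_micro_batch` and `LongRunningCounter` are hypothetical names, and `pickle` stands in for whatever serialization a real platform would use.

```python
import pickle
from collections import Counter


def run_micro_batch(serialized_state, batch):
    """One short-lived task: load the counts, update them, hand them back."""
    counts = pickle.loads(serialized_state) if serialized_state else Counter()
    counts.update(batch)
    return pickle.dumps(counts)


class LongRunningCounter:
    """An operator that lives for the whole stream and keeps counts in memory."""

    def __init__(self):
        self.counts = Counter()

    def process(self, record):
        self.counts[record] += 1


batches = [["a", "b", "a"], ["b", "c"], ["a"]]

# Micro-batch style: state is deserialized and reserialized for every batch.
state = b""
bytes_moved = 0
for batch in batches:
    bytes_moved += len(state)   # state read by the new task
    state = run_micro_batch(state, batch)
    bytes_moved += len(state)   # state written back for the next task
micro_counts = pickle.loads(state)

# Long-running style: state never leaves the operator during normal processing.
operator = LongRunningCounter()
for batch in batches:
    for record in batch:
        operator.process(record)

print(micro_counts == operator.counts, bytes_moved)
```

Both approaches arrive at the same counts, but the first moves the full state on every batch, so the bytes moved grow with the number of distinct keys. The sketch leaves out fault tolerance entirely; a long-running operator still needs some form of checkpointing to survive failures.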