- Learning Apache Apex
- Thomas Weise Munagala V. Ramanath David Yan Kenneth Knowles
- 430字
- 2021-07-02 22:38:36
Directed Acyclic Graph (DAG)
An Apex application is represented by a DAG, which expresses processing logic as operators (vertices) and streams (edges). Streams are unbounded sequences of pieces of data, also called events or tuples. The logic that can be executed is arranged in the DAG in sequence or in parallel.

The resulting graph must be acyclic, meaning that any given tuple is processed only once by an operator. An exception to this is iterative processing, also supported by Apex, whereby the output of an operator becomes the input of a predecessor (or upstream operator), introducing a loop in the graph as far as the streams are concerned. This construct is frequently required for machine learning algorithms.
The concept of a DAG is not unique to Apex. It is widely used, for example to represent the history in revision control systems such as Git. Several projects in the Hadoop ecosystem use a DAG to model the processing logic, including Apache Storm, Apache Spark, and Apache Tez. Apache Beam pipelines are represented as a DAG of transformations and each of the streaming engines that currently offer Beam runners also have a DAG as their internal representation.
Operators are the functional building blocks that can contain custom code specific to a single use case or generic functionality that can be applied broadly. The Apex Malhar library (to be introduced later) contains reusable operators, including connectors that can read from various sources, provide filtering or transformation functionality, or output to various destinations:

The flow of data is defined through streams, which are connections between ports. Ports are the endpoints of operators to receive data (input ports) or emit data (output ports). Each operator can have multiple ports and each port is connected to at most one stream (ports can also be optional, in which case they don't have to be connected in the DAG). We will look at ports in more detail when discussing operator development. For now, it is sufficient to know that ports provide the type-safe endpoints through which the application developer specifies the data flow by connecting streams. The advantage of using ports versus just a single process and emit method on the operator is that the type of tuple or record is explicit and, when working with a Java IDE, the compiler will show type mismatches as compile errors.
In the following subsections, we will introduce the different APIs that Apex offers to develop applications. Each of these representations is eventually translated into the native DAG, which is the input required by the Apex engine to launch an application.
- 后稀缺:自動化與未來工作
- 大數據戰爭:人工智能時代不能不說的事
- Hybrid Cloud for Architects
- 大學C/C++語言程序設計基礎
- INSTANT Heat Maps in R:How-to
- 電腦日常使用與維護322問
- 奇點將至
- Artificial Intelligence By Example
- MPC5554/5553微處理器揭秘
- 工業機器人操作
- 工業機器人應用系統三維建模
- Hands-On Geospatial Analysis with R and QGIS
- Hands-On Agile Software Development with JIRA
- 微控制器的選擇與應用
- Apache Spark Machine Learning Blueprints