- Learning Apache Apex
- Thomas Weise Munagala V. Ramanath David Yan Kenneth Knowles
- 2021-07-02 22:38:40
Application specifications
Let's start by transforming this placeholder application into an application that counts words – the Hello World equivalent for big data processing frameworks. The functionality is easy to understand and not very important, as our focus here is on the development process.
The full source code of the modified application is available at https://github.com/tweise/apex-samples/tree/master/wordcount. Here is the modified application assembly in Application.java:
@Override
public void populateDAG(DAG dag, Configuration conf)
{
  // Add the four operators to the DAG, each under a unique name.
  LineByLineFileInputOperator lineReader = dag.addOperator("input",
      new LineByLineFileInputOperator());
  LineSplitter parser = dag.addOperator("parser", new LineSplitter());
  UniqueCounter counter = dag.addOperator("counter", new UniqueCounter());
  GenericFileOutputOperator<Object> output = dag.addOperator("output",
      new GenericFileOutputOperator<>());
  // Convert each result tuple to a string before it is written.
  output.setConverter(new ToStringConverter());

  // Connect the operators through their ports; each stream has a name.
  dag.addStream("lines", lineReader.output, parser.input);
  dag.addStream("words", parser.output, counter.data);
  dag.addStream("counts", counter.count, output.input);
}
The pipeline reads from a file (LineByLineFileInputOperator), splits each line into words (LineSplitter), counts the occurrences of each word (UniqueCounter), and finally writes the result to a file (GenericFileOutputOperator). Apart from LineSplitter, all of these operators are part of the Apex library. After the operators are added to the DAG, the pipeline is completed by connecting them through their ports using addStream. This is the explicit style of composing the logical DAG (rather than using the high-level API), hence the name compositional API. Note that ports must always be defined in their respective operators, and are not always named input and output; UniqueCounter, for example, exposes data and count.
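To make the data flow through the streams easier to picture, here is a minimal plain-Java sketch of the same lines, words, counts wiring. It does not use the Apex API: OutputPort, Splitter, Counter, the local addStream helper, and the flush method are hypothetical stand-ins invented for illustration, and the word-splitting rule (split on non-word characters, lowercase) is an assumption rather than the behavior of the book's LineSplitter.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Plain-Java model of the word count DAG. None of these types are Apex classes;
// they only imitate the idea of operators connected through ports.
class WordCountSketch {

    // Hypothetical output port: forwards each emitted tuple to the connected input.
    static final class OutputPort<T> {
        private Consumer<T> sink = tuple -> { }; // an unconnected port drops tuples

        void emit(T tuple) {
            sink.accept(tuple);
        }
    }

    // Plays the role of dag.addStream(name, outputPort, inputPort) in this model.
    static <T> void addStream(String name, OutputPort<T> from, Consumer<T> to) {
        from.sink = to;
    }

    // Stand-in for LineSplitter: one line in, one tuple per word out.
    static final class Splitter {
        final OutputPort<String> output = new OutputPort<>();

        void input(String line) {
            for (String word : line.split("\\W+")) {
                if (!word.isEmpty()) {
                    output.emit(word.toLowerCase());
                }
            }
        }
    }

    // Stand-in for UniqueCounter: counts each distinct word, emits totals on flush.
    static final class Counter {
        final OutputPort<Map<String, Integer>> count = new OutputPort<>();
        private final Map<String, Integer> counts = new LinkedHashMap<>();

        void data(String word) {
            counts.merge(word, 1, Integer::sum);
        }

        void flush() {
            count.emit(new LinkedHashMap<>(counts));
            counts.clear();
        }
    }

    // Wires the stages together and pushes the given lines through them.
    static Map<String, Integer> countWords(List<String> lines) {
        OutputPort<String> lineReader = new OutputPort<>();   // stands in for the file reader
        Splitter parser = new Splitter();
        Counter counter = new Counter();
        Map<String, Integer> result = new LinkedHashMap<>();  // stands in for the output file

        addStream("lines", lineReader, parser::input);
        addStream("words", parser.output, counter::data);
        addStream("counts", counter.count, result::putAll);

        lines.forEach(lineReader::emit);
        counter.flush();
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello apex", "hello big data");
        System.out.println(countWords(lines));
    }
}
```

As in the Apex assembly, each stage only knows its own ports; the wiring lives in one place (countWords here, populateDAG in the application), so a stage can be swapped without touching its neighbors.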