- Learning Apache Apex
- Thomas Weise Munagala V. Ramanath David Yan Kenneth Knowles
- 2021-07-02 22:38:40
Application specifications
Let's start by transforming this placeholder application into an application that counts words – the Hello World equivalent for big data processing frameworks. The functionality is easy to understand and not very important, as our focus here is on the development process.
The full source code of the modified application is available at https://github.com/tweise/apex-samples/tree/master/wordcount. Here is the modified application assembly in Application.java:
@Override
public void populateDAG(DAG dag, Configuration conf)
{
  // Add the four operators to the logical DAG
  LineByLineFileInputOperator lineReader = dag.addOperator("input",
      new LineByLineFileInputOperator());
  LineSplitter parser = dag.addOperator("parser", new LineSplitter());
  UniqueCounter counter = dag.addOperator("counter", new UniqueCounter());
  GenericFileOutputOperator<Object> output = dag.addOperator("output",
      new GenericFileOutputOperator<>());
  output.setConverter(new ToStringConverter());

  // Connect the operators through their ports
  dag.addStream("lines", lineReader.output, parser.input);
  dag.addStream("words", parser.output, counter.data);
  dag.addStream("counts", counter.count, output.input);
}
The pipeline reads lines from a file (LineByLineFileInputOperator), splits each line into words (LineSplitter), counts the occurrences of each word (UniqueCounter), and finally writes the result to a file (GenericFileOutputOperator). Apart from LineSplitter, all of these operators are part of the Apex library. After all the operators have been added to the DAG, the pipeline is completed by connecting the operators through their ports using addStream. This is the explicit style of composing the logical DAG (rather than using the high-level API), hence the name compositional API. Note that ports must always be defined in their respective operators, and they are not always named input and output; in this example, UniqueCounter exposes its ports as data and count.
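To make the data flow through the three streams more concrete, here is a minimal plain-Java sketch of what each stage computes. It does not use any Apex classes: the class name WordCountSketch and its methods are hypothetical stand-ins for the parser, counter, and output stages, and splitting on whitespace is an assumption (the LineSplitter in the sample project may tokenize differently). The real operators also process tuples incrementally as a stream rather than over an in-memory list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; none of these names come from the Apex library.
class WordCountSketch {

    // Stands in for the "parser" stage: one line in, zero or more words out.
    // Assumption: words are separated by whitespace.
    static List<String> splitLine(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.split("\\s+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    // Stands in for the "counter" stage: tallies the occurrences of each word.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys for stable output
        for (String line : lines) {
            for (String word : splitLine(line)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Stands in for the "output" stage: converts each count to a line of text.
    static List<String> format(Map<String, Integer> counts) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            out.add(entry.getKey() + "=" + entry.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or not", "to be");
        // Prints: be=2, not=1, or=1, to=2 (one entry per line)
        for (String s : format(countWords(lines))) {
            System.out.println(s);
        }
    }
}
```

Each method corresponds to one hop in the DAG: the "lines" stream feeds splitLine, the "words" stream feeds countWords, and the "counts" stream feeds format. In the Apex application, these hand-offs are what the addStream calls declare.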