Application specifications

Let's start by transforming this placeholder application into an application that counts words – the Hello World equivalent for big data processing frameworks. The functionality is easy to understand and not very important, as our focus here is on the development process.

The full source code of the modified application is available at https://github.com/tweise/apex-samples/tree/master/wordcount. Here is the modified application assembly in Application.java:

@Override
public void populateDAG(DAG dag, Configuration conf)
{
  // Add the operators that make up the pipeline
  LineByLineFileInputOperator lineReader = dag.addOperator("input",
      new LineByLineFileInputOperator());
  LineSplitter parser = dag.addOperator("parser", new LineSplitter());
  UniqueCounter counter = dag.addOperator("counter", new UniqueCounter());
  GenericFileOutputOperator<Object> output = dag.addOperator("output",
      new GenericFileOutputOperator<>());
  output.setConverter(new ToStringConverter());

  // Connect the operators through their ports
  dag.addStream("lines", lineReader.output, parser.input);
  dag.addStream("words", parser.output, counter.data);
  dag.addStream("counts", counter.count, output.input);
}

The pipeline reads from a file (LineByLineFileInputOperator), splits each line into words (LineSplitter), counts the occurrences of each word (UniqueCounter), and finally writes the result to a file (GenericFileOutputOperator). Apart from LineSplitter, all of these operators are part of the Apex library. After all the operators are added to the DAG, the pipeline is completed by connecting the operators (through their ports) using addStream. This is the explicit style of composing the logical DAG (rather than using the high-level API), hence the name compositional API. Note that ports must always be defined in their respective operators and are not always named input and output; here, for example, the counter exposes the ports data and count.
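
To make the data flow through the pipeline concrete, the following is a minimal plain-Java sketch of what the parser and counter stages compute. It does not use the Apex API and is not the code from the sample repository: the class name WordCountSketch, the methods splitLine and countWords, and the choice to split on non-word characters are all assumptions made for illustration. It also does not model streaming, windowing, or distributed execution, which the real operators take care of.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the word count pipeline; no Apex dependency.
public class WordCountSketch {

    // Stand-in for the "parser" stage: split one line into words.
    // Splitting on runs of non-word characters is an assumption of this sketch.
    public static List<String> splitLine(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.split("\\W+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    // Stand-in for the "counter" stage: count the occurrences of each word.
    public static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            for (String word : splitLine(line)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Stand-in for the "input" stage: lines that would come from a file.
        List<String> lines = Arrays.asList("hello world", "hello apex");

        // Stand-in for the "output" stage: print one word and its count per line.
        countWords(lines).forEach((word, count) ->
            System.out.println(word + "=" + count));
    }
}
```

In the Apex application, each of these stages is a separate operator, and the method calls in this sketch correspond to the streams named lines, words, and counts that addStream declares between the operator ports.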
