官术网_书友最值得收藏!

  • Learning Apache Apex
  • Thomas Weise Munagala V. Ramanath David Yan Kenneth Knowles
  • 451字
  • 2021-07-02 22:38:37

High-level Stream Java API

The high-level Apex Stream Java API provides an abstraction from the lower level DAG API. It is a declarative, fluent style API that is easier to learn for someone new to Apex. Instead of identifying inpidual operators, the developer works with methods on the stream interface to specify the transformations.

The API will internally keep track of the operator(s) needed for each of the transformations and eventually translate it into the lower level DAG. The high-level API is part of the Apex library and outside of the Apex engine.

Here is the Word Count example application written with the high-level API (using Java 8 syntax):

StreamFactory.fromFolder("/tmp") 
   .flatMap(input -> Arrays.asList(input.split(" ")), name("Words")) 
   .window(new WindowOption.GlobalWindow(), 
           new 
TriggerOption().accumulatingFiredPanes().withEarlyFiringsAtEvery(1)) .countByKey(input -> new Tuple.PlainTuple<>(new KeyValPair<>(input, 1L)),
name("countByKey")) .map(input -> input.getValue(), name("Counts")) .print(name("Console")) .populateDag(dag);

Windowing is supported and stateful transformations can be applied to a windowed stream, as shown with countByKey in the preceding code listing. The inpidual windowing options will be explained later in the Windowing and time section, as they are applicable in a broader context.

In addition to the transformations that are directly available through the Stream API, the developer can also use other (possibly custom) operators through the addOperator(..) and endsWith(..) methods. For example, if output should be written to JDBC, the connector from the library can be integrated using these generic methods instead of requiring the stream API to have a method like toJDBC.

The ability to add additional operators is important, because not all possible functionality can be baked into the API and larger projects typically require customizations to operators or additional operators that are not part of the Apex library. Additionally, there are many connectors available as part of the library, each with its own set of dependencies and, sometimes, these dependencies and connectors may conflict. In this situation it isn't practical or possible to add a method for each connector to the API. Instead, the developer needs to be able to plug-in the required connector and use it along with the generally applicable transformations that are part of the Stream API.

It is also possible to extend the Stream API with custom methods to provide new transformations without exposing the details of the underlying operator. An example for this extension mechanism can be found in the API unit tests.

For readers interested to explore the API further, there is a set of example applications in the apex-malhar repository at https://github.com/apache/apex-malhar/tree/master/examples/highlevelapi.

The Stream API is relatively new and there are several enhancements planned, including expansion of the set of windowed transforms, watermark handling, and custom trigger support. The community is also discussing expanding the language support to include a native API for Scala and Python.

主站蜘蛛池模板: 永康市| 原平市| 镶黄旗| 炉霍县| 乐安县| 鹿泉市| 项城市| 合作市| 郑州市| 邵阳县| 青铜峡市| 宝清县| 大埔县| 金平| 朝阳县| 桦南县| 宁南县| 汶上县| 五指山市| 保康县| 甘德县| 布尔津县| 垫江县| 佛山市| 自治县| 沅江市| 黄龙县| 灵台县| 北海市| 旌德县| 南安市| 华安县| 玛多县| 正阳县| 文成县| 延边| 雷波县| 德江县| 扎兰屯市| 宁河县| 巨野县|