官术网_书友最值得收藏!

  • Learning Apache Apex
  • Thomas Weise Munagala V. Ramanath David Yan Kenneth Knowles
  • 451字
  • 2021-07-02 22:38:37

High-level Stream Java API

The high-level Apex Stream Java API provides an abstraction from the lower level DAG API. It is a declarative, fluent style API that is easier to learn for someone new to Apex. Instead of identifying inpidual operators, the developer works with methods on the stream interface to specify the transformations.

The API will internally keep track of the operator(s) needed for each of the transformations and eventually translate it into the lower level DAG. The high-level API is part of the Apex library and outside of the Apex engine.

Here is the Word Count example application written with the high-level API (using Java 8 syntax):

StreamFactory.fromFolder("/tmp") 
   .flatMap(input -> Arrays.asList(input.split(" ")), name("Words")) 
   .window(new WindowOption.GlobalWindow(), 
           new 
TriggerOption().accumulatingFiredPanes().withEarlyFiringsAtEvery(1)) .countByKey(input -> new Tuple.PlainTuple<>(new KeyValPair<>(input, 1L)),
name("countByKey")) .map(input -> input.getValue(), name("Counts")) .print(name("Console")) .populateDag(dag);

Windowing is supported and stateful transformations can be applied to a windowed stream, as shown with countByKey in the preceding code listing. The inpidual windowing options will be explained later in the Windowing and time section, as they are applicable in a broader context.

In addition to the transformations that are directly available through the Stream API, the developer can also use other (possibly custom) operators through the addOperator(..) and endsWith(..) methods. For example, if output should be written to JDBC, the connector from the library can be integrated using these generic methods instead of requiring the stream API to have a method like toJDBC.

The ability to add additional operators is important, because not all possible functionality can be baked into the API and larger projects typically require customizations to operators or additional operators that are not part of the Apex library. Additionally, there are many connectors available as part of the library, each with its own set of dependencies and, sometimes, these dependencies and connectors may conflict. In this situation it isn't practical or possible to add a method for each connector to the API. Instead, the developer needs to be able to plug-in the required connector and use it along with the generally applicable transformations that are part of the Stream API.

It is also possible to extend the Stream API with custom methods to provide new transformations without exposing the details of the underlying operator. An example for this extension mechanism can be found in the API unit tests.

For readers interested to explore the API further, there is a set of example applications in the apex-malhar repository at https://github.com/apache/apex-malhar/tree/master/examples/highlevelapi.

The Stream API is relatively new and there are several enhancements planned, including expansion of the set of windowed transforms, watermark handling, and custom trigger support. The community is also discussing expanding the language support to include a native API for Scala and Python.

主站蜘蛛池模板: 青海省| 桐柏县| 龙陵县| 炉霍县| 新疆| 铜梁县| 河南省| 昭苏县| 静安区| 洞头县| 株洲市| 龙里县| 嫩江县| 湖口县| 亳州市| 荣成市| 北海市| 衡阳县| 尉氏县| 铜山县| 怀来县| 疏附县| 新绛县| 句容市| 潼南县| 西华县| 遂昌县| 金秀| 上林县| 富锦市| 青海省| 仪征市| 西乡县| 华容县| 大城县| 开远市| 日照市| 平塘县| 阿城市| 马山县| 涟水县|