官术网_书友最值得收藏!

  • Learning Apache Apex
  • Thomas Weise Munagala V. Ramanath David Yan Kenneth Knowles
  • 350字
  • 2021-07-02 22:38:37

SQL

SQL is widely used for data transformation and access, not only with traditional relational databases but also in the Apache big data space with projects like Hive, Drill, Impala, and several others. They all let the user process bounded data at rest using familiar SQL syntax without requiring other programming skills. SQL can be used for ETL purposes but the most common use is for querying data, either directly or through the wide range of SQL compatible BI tools.

Though it has been in use in the Hadoop space for years, SQL is relatively new in the stream processing area as a declarative approach to specify a streaming application. Apex is using Apache Calcite for its SQL support, which has already been adopted by many other big data processing frameworks. Instead of every project coming up with its own declarative API, Calcite aims to make SQL the common language. Calcite accepts standard SQL, translates it into relational algebra, facilitates query planning and optimization to physical plan and allows for integration of any data source that can provide collections of records with columns (files, queues, and so on).

There are different use cases for Calcite, including ETL, lookups, search, and so on. With unbounded data sources, the processing of SQL becomes continuous and it is necessary to express windows on the stream that define boundaries at which results can be computed and emitted. Calcite provides streaming SQL extensions to support unbounded data (https://calcite.apache.org/docs/stream.html).

The initial SQL support in Apex covers select, insert, inner join, where clause and scalar functions. Endpoints (sources and sinks) can be files, Kafka or streams that are defined with the DAG API (fusion style) and CSV is supported as a data format.

Here is a simple example to illustrate the translation of SQL into an Apex DAG:

Translation of SQL into Apex DAG

For more information, you can visit http://apex.apache.org/docs/malhar/apis/calcite/.

The community is working on the support for windowed transformations (required for aggregations), which will be based on the scalable window and accumulation state management of the Apex library (refer to Chapter 3, The Apex Library).

主站蜘蛛池模板: 玉田县| 太康县| 东乡县| 瓦房店市| 新疆| 金昌市| 淳安县| 建平县| 桂林市| 上杭县| 巴塘县| 武川县| 乌苏市| 惠东县| 启东市| 岱山县| 南投县| 柳河县| 邛崃市| 监利县| 揭东县| 会泽县| 宁河县| 吉林市| 壶关县| 绥棱县| 资溪县| 宁波市| 肥东县| 吉林省| 微山县| 永兴县| 三穗县| 镇康县| 兖州市| 余干县| 大洼县| 金秀| 甘洛县| 宁武县| 遂川县|