Spark SQL

Spark version 1.3 introduced DataFrames, which allow Spark data to be processed in tabular form and manipulated with tabular functions such as select, filter, and groupBy. The Spark SQL module also integrates with the Parquet and JSON formats, so data can be stored in formats that better represent its structure, and it offers more options for integrating with external systems.

Apache Spark can also be integrated with Hive, the Hadoop big data warehouse. Hive-context-based Spark applications can manipulate Hive table data, bringing Spark's fast in-memory distributed processing to Hive's big data storage capabilities; in effect, Hive can use Spark as its processing engine.

Additionally, an abundance of connectors allows NoSQL databases outside the Hadoop ecosystem to be accessed directly from Apache Spark. In Chapter 2, Apache Spark SQL, we will see how the Cloudant connector can be used to access a remote Apache CouchDB NoSQL database and issue SQL statements against JSON-based NoSQL document collections.