
Spark SQL

Spark version 1.3 introduced DataFrames, which allow Spark data to be processed in a tabular form and manipulated with tabular functions such as select, filter, and groupBy. The Spark SQL module integrates with the Parquet and JSON formats to allow data to be stored in formats that better represent it. This also offers more options for integrating with external systems.
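As a minimal sketch of these tabular operations, the following assumes a local Spark 2.x installation; the sample data and column names are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DataFrameExample {
  def main(args: Array[String]): Unit = {
    // Start a local Spark session (Spark 2.x API)
    val spark = SparkSession.builder()
      .appName("DataFrameExample")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: (name, department, salary)
    val df = Seq(
      ("Alice", "Sales", 50000),
      ("Bob", "Sales", 45000),
      ("Carol", "Engineering", 70000)
    ).toDF("name", "department", "salary")

    // select and filter behave like their SQL counterparts
    df.select("name", "salary")
      .filter($"salary" > 45000)
      .show()

    // groupBy aggregates rows, here averaging salary per department
    df.groupBy("department")
      .agg(avg("salary").alias("avg_salary"))
      .show()

    spark.stop()
  }
}
```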

Apache Spark can also be integrated with Hive, the Hadoop-based big data warehouse. Hive context-based Spark applications can manipulate Hive table data, bringing Spark's fast in-memory distributed processing to Hive's big data storage capabilities. It effectively lets Hive use Spark as a processing engine.
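A sketch of this integration, assuming a Spark build with Hive support and a reachable Hive metastore; the table and column names are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object HiveExample {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport connects the session to the Hive metastore
    val spark = SparkSession.builder()
      .appName("HiveExample")
      .enableHiveSupport()
      .getOrCreate()

    // Plain SQL runs against Hive tables, but Spark does the processing
    spark.sql("SELECT count(*) FROM sales_records").show()

    // Hive tables also come back as ordinary DataFrames
    val df = spark.table("sales_records")
    df.groupBy("region").count().show()

    spark.stop()
  }
}
```

Because the result of each query is a DataFrame, the tabular operations described earlier apply unchanged to Hive-backed data.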

Additionally, there is an abundance of additional connectors to access NoSQL databases outside the Hadoop ecosystem directly from Apache Spark. In Chapter 2, Apache Spark SQL, we will see how the Cloudant connector can be used to access a remote Apache CouchDB NoSQL database and issue SQL statements against JSON-based NoSQL document collections.
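The general pattern of issuing SQL against JSON documents can be sketched without any external connector, using Spark's built-in JSON reader; the file path, view name, and columns below are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object JsonSqlExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("JsonSqlExample")
      .master("local[*]")
      .getOrCreate()

    // Spark infers the schema from the JSON documents it reads
    val docs = spark.read.json("/tmp/documents.json")

    // Register the DataFrame as a temporary view so plain SQL can query it
    docs.createOrReplaceTempView("docs")
    spark.sql("SELECT name, value FROM docs WHERE value > 10").show()

    spark.stop()
  }
}
```

External connectors such as the Cloudant one follow the same shape: a DataFrame is created from the remote source, and SQL is issued against it through a registered view.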
