- Mastering Apache Spark 2.x (Second Edition)
- Romeo Kienzler
- 2021-07-02 18:55:23
Spark SQL
DataFrames were introduced in Apache Spark 1.3 so that Spark data can be processed in tabular form, using tabular functions such as select, filter, and groupBy. The Spark SQL module integrates with the Parquet and JSON formats, so data can be stored in a format that better represents its structure. This also offers more options for integrating with external systems.
Apache Spark can also be integrated with Apache Hive, the Hadoop-based big data database. Hive context-based Spark applications can be used to manipulate Hive-based table data, which brings Spark's fast in-memory distributed processing to Hive's big data storage capabilities. It effectively lets Hive use Spark as a processing engine.
There is also an abundance of connectors for accessing NoSQL databases outside the Hadoop ecosystem directly from Apache Spark. In Chapter 2, Apache Spark SQL, we will see how the Cloudant connector can be used to access a remote Apache CouchDB NoSQL database and issue SQL statements against JSON-based NoSQL document collections.