- Apache Spark Quick Start Guide
- Shrey Mehrotra Akash Grade
- 2021-07-02 13:39:55
Spark SQL
Spark SQL lets developers work with structured and semi-structured data such as Hive tables, MySQL tables, Parquet files, Avro files, JSON files, CSV files, and more. An alternative for processing structured data is Hive, which processes structured data stored on HDFS using Hive Query Language (HQL). Hive internally uses MapReduce for its processing, and we shall see how Spark can deliver better performance than MapReduce. In early versions of Spark, structured data was represented as a SchemaRDD (a special type of RDD), which later releases replaced with the DataFrame. When data comes with a schema, SQL becomes the natural first choice for processing it. Spark SQL is the Spark component that enables developers to process data with Structured Query Language (SQL).
Using Spark SQL, business logic can be written in SQL or HQL. This enables data warehouse engineers with a good knowledge of SQL to use Spark for their extract, transform, load (ETL) processing. Hive projects can be migrated to Spark using Spark SQL, in many cases without changing the Hive scripts.
Spark SQL is also a first choice for data analysis and data warehousing. It enables data analysts to write ad hoc queries for their exploratory analysis. Spark provides a Spark SQL shell, where you can run SQL-like queries that are executed on Spark. Spark internally converts the query into a chain of RDD computations, whereas Hive converts an HQL job into a series of MapReduce jobs. Using Spark SQL, developers can also make use of caching (a Spark feature that keeps data in memory), which can significantly improve the performance of their queries.