- Mastering Apache Spark 2.x (Second Edition)
- Romeo Kienzler
- 163 words
- 2021-07-02 18:55:27
Apache Spark SQL
In this chapter, we will examine Apache Spark SQL: SQL, DataFrames, and Datasets, which are built on top of Resilient Distributed Datasets (RDDs). DataFrames were introduced in Spark 1.3, essentially replacing SchemaRDDs. They are distributed collections of data organized into named columns, roughly equivalent to tables in a relational database. Datasets were introduced as an experimental API in Spark 1.6 and became a stable part of the API in Spark 2.0, where a DataFrame is simply a Dataset of Row objects (Dataset[Row]).
We have tried to keep the individual chapters as independent of one another as possible so that you can work through them in any order. However, we do recommend that you read this chapter, because the other chapters assume familiarity with DataFrames and Datasets.
This chapter will cover the following topics:
- SparkSession
- Importing and saving data
- Processing the text files
- Processing the JSON files
- Processing the Parquet files
- DataSource API
- DataFrames
- Datasets
- Using SQL
- User-defined functions
- RDDs versus DataFrames versus Datasets
Before moving on to SQL, DataFrames, and Datasets, we will give an overview of SparkSession.