- Mastering Apache Spark 2.x(Second Edition)
- Romeo Kienzler
- 2021-07-02 18:55:28
Understanding the DataSource API
The DataSource API was introduced in Apache Spark 1.1 and has been extended continually since then. You have already used it, perhaps without realizing it, whenever you read or wrote data through SparkSession or DataFrames.
The DataSource API provides an extensible framework for reading data from, and writing data to, many different data sources in various formats. There is built-in support for Hive, Avro, JSON, JDBC, Parquet, and CSV, and a large number of third-party plugins add support for systems such as MongoDB, Cassandra, Apache CouchDB, Cloudant, and Redis.
You rarely use classes from the DataSource API directly, because they are wrapped behind the read method of SparkSession and the write method of a DataFrame or Dataset. Schema discovery is also hidden from the user: the data source works out column names and types for you.
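As a minimal sketch of how this looks from user code, the PySpark function below reads a JSON file and writes it back out as Parquet. The function name `convert_json_to_parquet` and both paths are hypothetical; only `read.format(...).load(...)` and `write.format(...).mode(...).save(...)` are actual Spark calls, and the format string is what selects the data source.

```python
def convert_json_to_parquet(spark, json_path, parquet_path):
    """Read JSON through one data source and write Parquet through another.

    `spark` is expected to be an existing SparkSession, for example one
    created with SparkSession.builder.getOrCreate(). The function name
    and the paths are illustrative assumptions, not part of Spark.
    """
    # The format string selects the data source; "json" is built in.
    df = spark.read.format("json").load(json_path)

    # Writing goes through the same mechanism, here with the "parquet" source.
    df.write.format("parquet").mode("overwrite").save(parquet_path)
    return df
```

Switching to another built-in or third-party source should, in most cases, only require a different format string and the options that source expects.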
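To give a feel for what schema discovery involves, here is a toy sketch in plain Python that scans JSON records and derives a field-to-type mapping. It is not Spark's implementation: the function `infer_schema` and its widening rules are assumptions made for illustration only. When you read JSON in Spark without supplying a schema, the data source performs a comparable scan on your behalf.

```python
import json


def infer_schema(json_lines):
    """Toy schema discovery: map each field name to a Python type name.

    Illustrative only. The widening rules below are assumptions and
    do not reproduce Spark's actual type-merging logic.
    """
    schema = {}
    for line in json_lines:
        record = json.loads(line)
        for field, value in record.items():
            if value is None:
                continue  # a null carries no type information
            seen = type(value).__name__
            previous = schema.get(field)
            if previous is None or previous == seen:
                schema[field] = seen
            elif {previous, seen} == {"int", "float"}:
                schema[field] = "float"  # widen int to float
            else:
                schema[field] = "str"  # fall back to string on conflict
    return schema


sample = ['{"id": 1, "name": "a"}', '{"id": 2.5, "city": "x"}']
print(infer_schema(sample))
```

Note that fields missing from some records still end up in the result, which is why inference has to look at the data rather than at a single record.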