
Understanding the DataSource API

The DataSource API was introduced in Apache Spark 1.1 and has been extended continuously ever since. You have already used the DataSource API without knowing it whenever you read and write data using SparkSession or DataFrames.

The DataSource API provides an extensible framework for reading and writing data to and from an abundance of data sources in various formats. There is built-in support for Hive, Avro, JSON, JDBC, Parquet, and CSV, and a nearly infinite number of third-party plugins support, for example, MongoDB, Cassandra, Apache CouchDB, Cloudant, or Redis.
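To make this concrete, the following is a minimal sketch of how the same read API selects different built-in formats by name; the application name and file paths are hypothetical placeholders, not taken from the text.

import org.apache.spark.sql.SparkSession

object DataSourceFormats {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DataSourceFormats") // hypothetical application name
      .getOrCreate()

    // Built-in formats are chosen by passing their name to format();
    // the file paths below are hypothetical placeholders.
    val jsonDf = spark.read.format("json").load("/data/people.json")
    val csvDf = spark.read
      .format("csv")
      .option("header", "true")
      .load("/data/people.csv")
    val parquetDf = spark.read.format("parquet").load("/data/people.parquet")

    // Third-party sources plug into the same API by registering
    // their own format name, which is passed to format() in the same way.

    spark.stop()
  }
}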

Usually, you never use classes from the DataSource API directly, as they are wrapped behind the read method of SparkSession or the write method of a DataFrame or Dataset. Another thing that is hidden from the user is schema discovery, as illustrated in the sketch that follows.
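As a rough illustration of how schema discovery and writing stay hidden behind these methods, consider the following sketch; the file paths are again hypothetical.

import org.apache.spark.sql.SparkSession

object SchemaDiscoveryExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SchemaDiscoveryExample") // hypothetical application name
      .getOrCreate()

    // Reading triggers schema discovery: for JSON, the source samples
    // the data and infers column names and types automatically.
    val df = spark.read.json("/data/events.json") // hypothetical path
    df.printSchema()

    // Writing goes through the same DataSource API, here to Parquet.
    df.write.mode("overwrite").parquet("/data/events_parquet") // hypothetical path

    spark.stop()
  }
}

In both cases the DataSource API picks the concrete reader and writer implementation based on the format, so the user-facing code looks the same regardless of the underlying source.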
