
Understanding the DataSource API

The DataSource API was introduced in Apache Spark 1.1 and has been extended continuously ever since. You have already used the DataSource API without knowing it whenever you read and write data using SparkSession or DataFrames.

The DataSource API provides an extensible framework for reading and writing data to and from an abundance of data sources in various formats. There is built-in support for Hive, Avro, JSON, JDBC, Parquet, and CSV, and a nearly infinite number of third-party plugins support, for example, MongoDB, Cassandra, Apache CouchDB, Cloudant, or Redis.
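To make this concrete, the following is a minimal sketch of how the same read API selects different built-in formats by name; the application name and file paths are hypothetical placeholders, not taken from the text.

import org.apache.spark.sql.SparkSession

object DataSourceFormats {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DataSourceFormats") // hypothetical application name
      .getOrCreate()

    // Built-in formats are chosen by passing their name to format();
    // the file paths below are hypothetical placeholders.
    val jsonDf = spark.read.format("json").load("/data/people.json")
    val csvDf = spark.read
      .format("csv")
      .option("header", "true")
      .load("/data/people.csv")
    val parquetDf = spark.read.format("parquet").load("/data/people.parquet")

    // Third-party sources plug into the same API by registering
    // their own format name, which is passed to format() in the same way.

    spark.stop()
  }
}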

Usually, you never use classes from the DataSource API directly, as they are wrapped behind the read method of SparkSession or the write method of a DataFrame or Dataset. Another thing that is hidden from the user is schema discovery, as illustrated in the sketch that follows.
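As a rough illustration of how schema discovery and writing stay hidden behind these methods, consider the following sketch; the file paths are again hypothetical.

import org.apache.spark.sql.SparkSession

object SchemaDiscoveryExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SchemaDiscoveryExample") // hypothetical application name
      .getOrCreate()

    // Reading triggers schema discovery: for JSON, the source samples
    // the data and infers column names and types automatically.
    val df = spark.read.json("/data/events.json") // hypothetical path
    df.printSchema()

    // Writing goes through the same DataSource API, here to Parquet.
    df.write.mode("overwrite").parquet("/data/events_parquet") // hypothetical path

    spark.stop()
  }
}

In both cases the DataSource API picks the concrete reader and writer implementation based on the format, so the user-facing code looks the same regardless of the underlying source.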
