Processing the Parquet files
Apache Parquet is another columnar data format used by many tools in the Hadoop ecosystem, such as Hive, Pig, and Impala. It improves performance through efficient compression, encoding routines, and its columnar layout. The Parquet processing example is very similar to the JSON Scala code: the DataFrame is created and then saved in Parquet format using the write method with the parquet format:
df.write.parquet("hdfs://localhost:9000/tmp/test.parquet")
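For context, here is a minimal self-contained sketch of this step, assuming a local SparkSession and illustrative sample data (the session setup and the sample rows are assumptions for this sketch, not taken from the book's example):

import org.apache.spark.sql.SparkSession

object ParquetWriteExample {
  def main(args: Array[String]): Unit = {
    // Assumed local session; the book's examples run against HDFS on localhost
    val spark = SparkSession.builder()
      .appName("parquet-write")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for the book's DataFrame df
    val df = Seq(("alice", 34), ("bob", 45)).toDF("name", "age")

    // The same call as above; Spark writes one part file per partition
    df.write.parquet("hdfs://localhost:9000/tmp/test.parquet")

    // Reading the data back returns a DataFrame with the schema preserved
    val readBack = spark.read.parquet("hdfs://localhost:9000/tmp/test.parquet")
    readBack.show()

    spark.stop()
  }
}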
This results in an HDFS directory containing eight Parquet part files.
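The number of part files corresponds to the number of partitions of the DataFrame at write time. As a sketch, coalescing to a single partition before the write produces a single part file (the output path here is an assumption for illustration):

// Reducing to one partition before the write yields a single part file;
// coalesce avoids the full shuffle that repartition would trigger
df.coalesce(1)
  .write
  .parquet("hdfs://localhost:9000/tmp/test_single.parquet")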
For more information about the available SparkContext and SparkSession methods, check the API documentation for the org.apache.spark.SparkContext and org.apache.spark.sql.SparkSession classes in the Apache Spark API reference at http://spark.apache.org/docs/latest/api/scala/index.html.
In the next section, we will examine Apache Spark DataFrames. They were introduced in Spark 1.3 and became first-class citizens in Apache Spark 1.5 and 1.6.