- Mastering Apache Spark 2.x (Second Edition)
- Romeo Kienzler
The SparkSession--your gateway to structured data processing
The SparkSession is the starting point for working with structured, columnar data in Apache Spark. It replaces the SQLContext used in earlier versions of Apache Spark. It is created on top of the SparkContext and provides, among other things, the means to load and save data files of various formats using DataFrames and Datasets, and to manipulate columnar data with SQL. It can be used for the following functions:
- Executing SQL via the sql method
- Registering user-defined functions via the udf method
- Caching
- Creating DataFrames
- Creating Datasets
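As a sketch of the functions listed above, the following self-contained program builds a SparkSession and exercises SQL execution, UDF registration, caching, and DataFrame creation. The application name, the `local[*]` master, the `items` view, and the `shout` UDF are all illustrative choices, not names from the book:

```scala
import org.apache.spark.sql.SparkSession

object SparkSessionSketch extends App {
  // Build (or reuse) a SparkSession; appName and master are illustrative
  val spark = SparkSession.builder()
    .appName("sparksession-sketch")
    .master("local[*]")
    .getOrCreate()

  import spark.implicits._

  // Creating a DataFrame and exposing it to SQL as a temporary view
  val df = Seq((1, "a"), (2, "b")).toDF("id", "label")
  df.createOrReplaceTempView("items")

  // Executing SQL via the sql method
  spark.sql("SELECT id FROM items WHERE label = 'a'").show()

  // Registering a user-defined function via the udf method
  spark.udf.register("shout", (s: String) => s.toUpperCase)
  spark.sql("SELECT shout(label) AS loud FROM items").show()

  // Caching the DataFrame for reuse across actions
  df.cache()

  spark.stop()
}
```

Registering the view first is what makes the plain SQL strings work; without it, `spark.sql` has no table named `items` to resolve.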
Importing `spark.implicits._` from the SparkSession allows you to implicitly convert RDDs into DataFrames or Datasets. For instance, you can convert an RDD into a DataFrame or Dataset by calling the toDF or toDS method:
import spark.implicits._                 // brings toDF and toDS into scope
val rdd = sc.parallelize(List(1, 2, 3)) // an RDD[Int]
val df = rdd.toDF                        // implicit conversion to a DataFrame
val ds = rdd.toDS                        // implicit conversion to a Dataset[Int]
As you can see, this is very simple as the corresponding methods are on the RDD object itself.
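The same implicit conversions also work for an RDD of case-class instances, in which case toDF derives named columns and toDS yields a typed Dataset. As a sketch, assuming an illustrative `Person` case class (not one from the book):

```scala
import org.apache.spark.sql.SparkSession

// Illustrative case class; toDF/toDS derive the schema from its fields
case class Person(name: String, age: Int)

object RddConversionSketch extends App {
  val spark = SparkSession.builder()
    .appName("rdd-conversion-sketch") // illustrative name
    .master("local[*]")
    .getOrCreate()

  import spark.implicits._

  val rdd = spark.sparkContext.parallelize(
    Seq(Person("Alice", 29), Person("Bob", 31)))

  val df = rdd.toDF // DataFrame with columns "name" and "age"
  val ds = rdd.toDS // Dataset[Person], keeping the compile-time type

  df.printSchema()
  ds.filter(_.age > 30).show() // typed lambda instead of a column expression

  spark.stop()
}
```

The practical difference is visible in the filter: the Dataset version operates on `Person` fields checked at compile time, whereas the DataFrame equivalent would reference the `"age"` column by name.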
Next, we will examine some of the supported file formats available to import and save data.