- Mastering Apache Spark 2.x(Second Edition)
- Romeo Kienzler
The SparkSession: your gateway to structured data processing
The SparkSession is the starting point for working with columnar data in Apache Spark. It subsumes the SQLContext (and HiveContext) used in previous versions of Apache Spark. It wraps the underlying Spark context and provides the means to load and save data files of different types using DataFrames and Datasets, and to manipulate columnar data with SQL, among other things. It can be used for the following functions:
- Executing SQL via the sql method
- Registering user-defined functions via the udf method
- Caching
- Creating DataFrames
- Creating Datasets
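The functions above can be sketched as follows. This is a minimal example, assuming a local Spark installation; the application name, the `items` view, and the `double_it` UDF are illustrative choices, not names from the text:

```scala
import org.apache.spark.sql.SparkSession

// Build (or reuse) a SparkSession; in spark-shell one is provided as `spark`
val spark = SparkSession.builder
  .appName("SparkSessionExample")
  .master("local[*]")
  .getOrCreate()

// Creating a DataFrame and registering it as a temporary view
val df = spark.createDataFrame(Seq((1, "a"), (2, "b"))).toDF("id", "label")
df.createOrReplaceTempView("items")

// Executing SQL via the sql method
spark.sql("SELECT id FROM items WHERE label = 'b'").show()

// Registering a user-defined function via the udf method
spark.udf.register("double_it", (x: Int) => x * 2)
spark.sql("SELECT double_it(id) AS doubled FROM items").show()
```

`getOrCreate` returns an existing session if one is already active, which is why it is safe to call in both applications and interactive shells.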
Using the SparkSession allows you to implicitly convert RDDs into DataFrames or Datasets. For instance, you can convert an RDD into a DataFrame or Dataset by calling the toDF or toDS methods:
// Bring the implicit conversions (rddToDatasetHolder etc.) into scope
import spark.implicits._

val rdd = sc.parallelize(List(1, 2, 3))
val df = rdd.toDF  // DataFrame with a single column named "value"
val ds = rdd.toDS  // typed Dataset[Int]
As you can see, this is very simple as the corresponding methods are on the RDD object itself.
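The same implicit conversions also work for RDDs of case class instances, where the case class fields become the schema. A minimal sketch, assuming an active SparkSession named `spark` (the `Person` class and its fields are made up for illustration):

```scala
import spark.implicits._

// A case class gives the Dataset a typed schema
case class Person(name: String, age: Int)

val peopleRdd = sc.parallelize(Seq(Person("Alice", 29), Person("Bob", 31)))

// toDS yields a typed Dataset[Person]; toDF a DataFrame with columns name, age
val peopleDs = peopleRdd.toDS
val peopleDf = peopleRdd.toDF

// Typed operations are checked at compile time on the Dataset
peopleDs.filter(_.age > 30).show()
```

The Dataset variant keeps compile-time type safety, while the DataFrame variant exposes the same data through untyped, named columns.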
Next, we will examine some of the supported file formats available to import and save data.