- Hands-On Data Analysis with Scala
- Rajesh Gupta
- 2021-06-24 14:51:07
Apache Spark
Apache Spark (https://spark.apache.org/) is a unified analytics engine for large-scale data processing. Spark provides APIs for batch as well as stream data processing in a distributed computing environment. Spark's API can be broadly divided into the following five categories:
- Core: RDD
- SQL (structured data): DataFrames and Datasets
- Streaming: Structured Streaming and DStreams
- MLlib: Machine learning
- GraphX: Graph processing
Apache Spark is a very active open source project. New features and performance improvements are added on a regular basis. Typically, there is a new minor release of Apache Spark every three months with significant performance and feature improvements. At the time of writing, 2.4.0 is the most recent version of Spark.
The following SBT settings add Spark SQL as a dependency, which pulls in Spark core transitively:
scalaVersion := "2.11.12"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.1"
Spark version 2.4.0 introduced support for Scala version 2.12; however, we will be using Scala version 2.11 to explore Spark's feature set. Spark is covered in more detail in subsequent chapters.
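As a quick check that the dependency above resolves, the following is a minimal sketch that touches two of the API categories listed earlier: Core (an RDD) and SQL (a DataFrame). The object name SparkApiSketch, the helper names, the application name, and the sample data are illustrative assumptions and do not come from the book; the local[*] master simply runs Spark inside the current JVM.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical example object; names and data are for illustration only.
object SparkApiSketch {

  // Core API: distribute the range 1..n as an RDD, square each element, and add them up.
  def sumOfSquares(spark: SparkSession, n: Int): Int =
    spark.sparkContext.parallelize(1 to n).map(x => x * x).reduce(_ + _)

  // SQL API: build a small DataFrame from local data and sum the values per key.
  def totalsByKey(spark: SparkSession): DataFrame = {
    import spark.implicits._ // enables toDF on local collections
    Seq(("a", 1), ("b", 2), ("a", 3))
      .toDF("key", "value")
      .groupBy("key")
      .sum("value")
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation; a real cluster would use a different master.
    val spark = SparkSession.builder()
      .appName("spark-api-sketch")
      .master("local[*]")
      .getOrCreate()
    try {
      println(sumOfSquares(spark, 5))
      totalsByKey(spark).show()
    } finally {
      spark.stop()
    }
  }
}
```

Both helpers take the SparkSession as a parameter, so the same session can be reused from the Spark shell or from a test without creating a second one.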