Apache Spark Quick Start Guide
Shrey Mehrotra, Akash Grade
Spark RDD
Resilient Distributed Datasets (RDDs) are the basic building block of a Spark application. An RDD represents a read-only collection of objects distributed across multiple machines. Spark can distribute a collection of records using an RDD and process them in parallel on different machines.
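To make this concrete, here is a minimal PySpark sketch of that idea: it distributes a small list of records as an RDD and applies a transformation to them in parallel. The application name and sample data are illustrative, and it assumes a local Spark installation.

```python
# Minimal sketch, assuming a local PySpark installation.
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("rdd-intro").setMaster("local[*]")
sc = SparkContext(conf=conf)

# Distribute a small collection of records as an RDD.
numbers = sc.parallelize([1, 2, 3, 4, 5])

# map() is applied to each partition in parallel;
# collect() brings the results back to the driver.
squares = numbers.map(lambda x: x * x).collect()
print(squares)  # [1, 4, 9, 16, 25]

sc.stop()
```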
In this chapter, we shall learn about the following:
- What is an RDD?
- How do you create RDDs?
- Different operations available to work on RDDs
- Important types of RDD
- Caching an RDD
- Partitions of an RDD
- Drawbacks of using RDDs
The code examples in this chapter are written in Python and Scala only. If you wish to go through the Java and R APIs, you can visit the Spark documentation page at https://spark.apache.org/.