Book title: Machine Learning with Spark (Second Edition)
Authors: Rajdeep Dua, Manpreet Singh Ghotra, Nick Pentreath
Last updated: 2021-07-09 21:07:40

Resilient Distributed Datasets

The core of Spark is a concept called the Resilient Distributed Dataset (RDD). An RDD is a collection of records (strictly speaking, objects of some type) that is distributed, or partitioned, across many nodes in a cluster (in Spark's local mode, the single multithreaded process can be thought of in the same way). An RDD is fault-tolerant: if a given node or task fails for a reason other than erroneous user code (for example, hardware failure or loss of communication), the RDD can be reconstructed automatically on the remaining nodes and the job will still complete.
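The two properties described above, partitioning and automatic reconstruction, can be illustrated with a small toy model in plain Python. This is only a conceptual sketch and not Spark's API: the class `ToyRDD`, its methods, and the round-robin partitioning scheme are all hypothetical names and choices made for illustration. It assumes that a lost partition can be rebuilt by reapplying the recorded transformation to the original input, which is the general idea behind how Spark recovers lost data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyRDD:
    """Hypothetical stand-in for an RDD: source data, a partition count, and a recipe."""
    source: List[int]
    num_partitions: int
    transform: Callable[[int], int]
    # Partition index -> computed records, as if held in memory on a worker node.
    cache: Dict[int, List[int]] = field(default_factory=dict)

    def partition_input(self, index: int) -> List[int]:
        # Round-robin split of the source records across partitions (an assumption).
        return self.source[index::self.num_partitions]

    def compute(self, index: int) -> List[int]:
        # If the partition is missing, rebuild it from the source and the transform.
        if index not in self.cache:
            self.cache[index] = [self.transform(x) for x in self.partition_input(index)]
        return self.cache[index]

    def collect(self) -> List[int]:
        # Gather every partition, recomputing any that were lost.
        out: List[int] = []
        for i in range(self.num_partitions):
            out.extend(self.compute(i))
        return sorted(out)

    def lose_partition(self, index: int) -> None:
        # Simulate a node failure by discarding one partition's computed records.
        self.cache.pop(index, None)


rdd = ToyRDD(source=list(range(10)), num_partitions=3, transform=lambda x: x * x)
before = rdd.collect()   # computes all three partitions
rdd.lose_partition(1)    # pretend the node holding partition 1 failed
after = rdd.collect()    # partition 1 is rebuilt; the others are reused
```

In this sketch, `before` and `after` hold the same records: losing a partition costs only the work of recomputing it, not the result of the job.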
