- Machine Learning with Spark(Second Edition)
- Rajdeep Dua Manpreet Singh Ghotra Nick Pentreath
Resilient Distributed Datasets
The core of Spark is a concept called the Resilient Distributed Dataset (RDD). An RDD is a collection of records (strictly speaking, objects of some type) that are distributed, or partitioned, across many nodes in a cluster (in Spark's local mode, the single multithreaded process can be thought of in the same way). An RDD in Spark is fault-tolerant: if a given node or task fails for some reason other than erroneous user code (such as a hardware failure or loss of communication), the RDD can be reconstructed automatically on the remaining nodes and the job will still complete.
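The mechanism behind this fault tolerance is lineage: an RDD records the chain of transformations that produced it, so a lost partition can be recomputed from its source rather than restored from a replica. The following is a minimal pure-Python sketch of that idea (the `SimpleRDD` class and its methods are illustrative inventions, not Spark's actual API):

```python
# Toy illustration of lineage-based fault tolerance, NOT Spark's real API.
# An RDD-like object keeps its source partitions plus the transformation
# chain (lineage) needed to rebuild any materialized partition.

class SimpleRDD:
    def __init__(self, source_partitions, transform=lambda x: x):
        self.source = source_partitions      # immutable source records
        self.transform = transform           # lineage: how records are derived
        # Materialized partitions (what worker nodes would hold in memory).
        self.partitions = [
            [transform(rec) for rec in part] for part in source_partitions
        ]

    def map(self, fn):
        # A new RDD extends the lineage; the source data is unchanged.
        prev = self.transform
        return SimpleRDD(self.source, lambda x: fn(prev(x)))

    def lose_partition(self, i):
        self.partitions[i] = None            # simulate a node failure

    def recover_partition(self, i):
        # Fault tolerance: replay the lineage over the source partition.
        self.partitions[i] = [self.transform(rec) for rec in self.source[i]]

    def collect(self):
        for i, part in enumerate(self.partitions):
            if part is None:
                self.recover_partition(i)    # rebuild before returning results
        return [rec for part in self.partitions for rec in part]

rdd = SimpleRDD([[1, 2], [3, 4]]).map(lambda x: x * 10)
rdd.lose_partition(1)                        # "node" holding [30, 40] dies
print(rdd.collect())                         # prints [10, 20, 30, 40]
```

In real Spark, the same replay happens per partition on the surviving executors, which is why RDDs need no eager replication to be resilient.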