Resilient Distributed Datasets

The core of Spark is a concept called the Resilient Distributed Dataset (RDD). An RDD is a collection of records (strictly speaking, objects of some type) that is distributed, or partitioned, across many nodes in a cluster (in Spark's local mode, the single multithreaded process can be thought of in the same way). An RDD is fault-tolerant: if a given node or task fails for some reason other than erroneous user code (such as hardware failure or loss of communication), the RDD can be reconstructed automatically on the remaining nodes and the job will still complete.
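To make the idea of partitioning and automatic reconstruction more concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's API: the class `ToyRDD` and its methods `compute_partition`, `lose_partition`, and `collect` are hypothetical names invented for illustration. It assumes that a lost partition can be rebuilt by reapplying a recorded transformation to the original input, which is roughly the idea behind how Spark recomputes lost data.

```python
from typing import Callable, Dict, List, Optional


class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not Spark's API).

    Holds records split into partitions, plus the recipe needed
    to rebuild any partition from the original source data.
    """

    def __init__(self, source: List[int], num_partitions: int,
                 transform: Callable[[int], int]) -> None:
        self.source = source
        self.num_partitions = num_partitions
        self.transform = transform
        # Computed results per partition; None means "not computed or lost".
        self.cache: Dict[int, Optional[List[int]]] = {
            i: None for i in range(num_partitions)
        }

    def _input_slice(self, index: int) -> List[int]:
        # Round-robin split of the source records across partitions.
        return self.source[index::self.num_partitions]

    def compute_partition(self, index: int) -> List[int]:
        # Rebuild the partition from the source if its data is missing.
        if self.cache[index] is None:
            self.cache[index] = [
                self.transform(x) for x in self._input_slice(index)
            ]
        return self.cache[index]

    def lose_partition(self, index: int) -> None:
        # Simulate a node failure that drops one partition's data.
        self.cache[index] = None

    def collect(self) -> List[int]:
        # Gather all partitions in order, recomputing any that are missing.
        out: List[int] = []
        for i in range(self.num_partitions):
            out.extend(self.compute_partition(i))
        return out


rdd = ToyRDD(source=list(range(8)), num_partitions=4,
             transform=lambda x: x * 10)
before = rdd.collect()
rdd.lose_partition(2)   # pretend the node holding partition 2 failed
after = rdd.collect()   # the missing partition is recomputed on demand
print(before == after)
```

In this sketch only the lost partition is recomputed; the other three are served from their existing copies, which is why the job can still finish after a failure.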
