
Resilient Distributed Datasets

The core of Spark is a concept called the Resilient Distributed Dataset (RDD). An RDD is a collection of records (strictly speaking, objects of some type) that is distributed, or partitioned, across many nodes in a cluster. In Spark's local mode, the single multithreaded process can be thought of in the same way. An RDD is fault-tolerant: if a given node or task fails for some reason other than erroneous user code (such as a hardware failure or a loss of communication), the RDD can be reconstructed automatically on the remaining nodes and the job will still complete.
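
The idea of a partitioned collection that can be rebuilt after a failure can be illustrated with a small toy model. The sketch below is plain Python and does not use Spark; the class ToyRDD and its methods (parallelize, map, partition, lose_partition, collect) are hypothetical names chosen for illustration only. It assumes that each dataset remembers how its partitions were computed, so a lost partition can be recomputed on demand instead of being restored from a copy.

```python
from typing import Callable, Iterable, List, Optional


class ToyRDD:
    """Toy stand-in for an RDD: records split into partitions,
    plus a recipe for rebuilding any single partition."""

    def __init__(self, num_partitions: int, compute: Callable[[int], List[int]]):
        self.num_partitions = num_partitions
        self._compute = compute  # how to (re)build partition i
        self._cache: List[Optional[List[int]]] = [None] * num_partitions

    @classmethod
    def parallelize(cls, data: Iterable[int], num_partitions: int) -> "ToyRDD":
        records = list(data)
        # Round-robin split: partition i holds records i, i + n, i + 2n, ...
        return cls(num_partitions, lambda i: records[i::num_partitions])

    def map(self, f: Callable[[int], int]) -> "ToyRDD":
        parent = self
        # The child only stores the recipe: apply f to the parent's partition.
        return ToyRDD(self.num_partitions,
                      lambda i: [f(x) for x in parent.partition(i)])

    def partition(self, i: int) -> List[int]:
        # Compute the partition on first use, or again after it was lost.
        if self._cache[i] is None:
            self._cache[i] = self._compute(i)
        return self._cache[i]

    def lose_partition(self, i: int) -> None:
        # Simulate a node failure by discarding one computed partition.
        self._cache[i] = None

    def collect(self) -> List[int]:
        # Gather all partitions, in partition order, into one list.
        return [x for i in range(self.num_partitions) for x in self.partition(i)]


numbers = ToyRDD.parallelize(range(1, 9), num_partitions=4)
squares = numbers.map(lambda x: x * x)

before = squares.collect()
squares.lose_partition(2)   # pretend the node holding partition 2 failed
after = squares.collect()   # partition 2 is rebuilt from its recipe
```

In this toy model, the two collect() calls return the same records even though one partition was discarded in between. A real cluster additionally has to schedule the recomputation on a surviving node, which this single-process sketch does not attempt to show.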
