Spark RDD

Resilient Distributed Datasets (RDDs) are the basic building blocks of a Spark application. An RDD represents a read-only collection of objects distributed across multiple machines. Using an RDD, Spark can distribute a collection of records and process them in parallel on different machines.
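To make the idea of a partitioned, read-only collection concrete, here is a minimal conceptual sketch in plain Python. It is not the Spark API: the helper names `split_into_partitions` and `map_partitions` are hypothetical, and a local thread pool stands in for the separate machines that Spark would use.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, num_partitions):
    """Split records into contiguous, immutable chunks (hypothetical helper)."""
    size, extra = divmod(len(records), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        # The first `extra` partitions take one additional record each.
        end = start + size + (1 if i < extra else 0)
        # Tuples are immutable, mirroring the read-only nature of an RDD.
        partitions.append(tuple(records[start:end]))
        start = end
    return partitions


def map_partitions(partitions, func):
    """Apply func to every record, one worker thread per partition.

    Assumption: threads model the machines of a cluster. The input
    partitions are never modified; new tuples are returned instead.
    """
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        return list(pool.map(lambda part: tuple(func(x) for x in part), partitions))


records = list(range(1, 11))
partitions = split_into_partitions(records, 3)
squared = map_partitions(partitions, lambda x: x * x)

# Gather the per-partition results back into a single list.
result = [x for part in squared for x in part]
print(partitions)
print(result)
```

Each partition is processed independently of the others, which is what allows the work to be spread across workers. The original collection is left untouched, and the transformation produces a new collection.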

In this chapter, we shall learn about the following:

    • What is an RDD? 
    • How do you create RDDs?
    • Different operations available to work on RDDs
    • Important types of RDD
    • Caching an RDD
    • Partitions of an RDD
    • Drawbacks of using RDDs

The code examples in this chapter are written in Python and Scala only. For the Java and R APIs, see the Spark documentation at https://spark.apache.org/
