
Summary

In this chapter, we went from creating an RDD to manipulating the data within it. We looked at the transformations and actions available on an RDD, and walked through various code examples to explain the differences between the two. Finally, we moved on to the advanced topic of Pair RDDs, where we demonstrated the creation of a Pair RDD along with some advanced transformations on it.

We are now ready to explain the ETL process and the types of external storage systems that Spark can read data from and write data to, including external filesystems, Apache Hadoop HDFS, Apache Hive, Amazon S3, and so on. We'll also look at connectors for some of the most popular databases, and at how to load data from storage systems and store it back efficiently.

However, before moving on to the next chapter, take a break. You definitely deserve it!
