Chapter 1. Getting Started with Spark and GraphX

Apache Spark is a cluster-computing platform for processing large distributed datasets. Data processing in Spark is both fast and easy, thanks to its optimized parallel computation engine and its flexible, unified API. The core abstraction in Spark is the Resilient Distributed Dataset (RDD). By generalizing the MapReduce programming model, Spark's Core API makes analytics jobs easier to write. On top of the Core API, Spark offers an integrated set of high-level libraries for specialized tasks such as graph processing and machine learning. In particular, GraphX is Spark's library for graph-parallel processing.

This chapter will introduce you to Spark and GraphX by building a social network and exploring the links between people in the network. In addition, you will learn to use the Scala Build Tool (SBT) to build and run a Spark program. By the end of this chapter, you will know how to:

  • Install Spark successfully on your computer
  • Experiment with the Spark shell and review Spark's data abstractions
  • Create a graph and explore the links using base RDD and graph operations
  • Build and submit a standalone Spark application with SBT