
Chapter 1. Getting Started with Spark and GraphX

Apache Spark is a cluster-computing platform for processing large, distributed datasets. Data processing in Spark is both fast and easy, thanks to its optimized parallel computation engine and its flexible, unified API. The core abstraction in Spark is the Resilient Distributed Dataset (RDD), a fault-tolerant collection of elements partitioned across the nodes of a cluster that can be operated on in parallel. By extending the MapReduce model, Spark's Core API makes analytics jobs easier to write. On top of the Core API, Spark offers an integrated set of high-level libraries for specialized tasks such as graph processing and machine learning. In particular, GraphX is Spark's library for graph-parallel processing.

This chapter will introduce you to Spark and GraphX by building a social network and exploring the links between people in the network. In addition, you will learn to use the Scala Build Tool (SBT) to build and run a Spark program. By the end of this chapter, you will know how to:

  • Install Spark successfully on your computer
  • Experiment with the Spark shell and review Spark's data abstractions
  • Create a graph and explore the links using base RDD and graph operations
  • Build and submit a standalone Spark application with SBT
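For orientation, here is a minimal sketch of the kind of program these goals lead up to. It builds a tiny social network from two ordinary RDDs, one of people (vertices) and one of relationships (edges), and combines them into a GraphX `Graph`. This shows how GraphX sits on top of the Core API. The object name `SocialGraphSketch`, the helper `buildGraph`, and the sample people and relationship labels are all hypothetical. The sketch assumes `spark-core` and `spark-graphx` are on the classpath (for example, declared as dependencies in an SBT build) and runs Spark in local mode, so no cluster is needed.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Hypothetical example object: a three-person social network.
object SocialGraphSketch {

  // Builds a graph from two RDDs: vertices as (id, name) and edges as (src, dst, relationship).
  def buildGraph(sc: SparkContext): Graph[String, String] = {
    // Sample people and relationship labels are hypothetical.
    val people: RDD[(VertexId, String)] = sc.parallelize(Seq(
      (1L, "Alice"), (2L, "Bob"), (3L, "Carol")))
    val links: RDD[Edge[String]] = sc.parallelize(Seq(
      Edge(1L, 2L, "friend"), Edge(2L, 3L, "follows")))
    Graph(people, links)
  }

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark in-process using all available cores.
    val conf = new SparkConf().setAppName("SocialGraphSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val graph = buildGraph(sc)
      println(s"people: ${graph.numVertices}, links: ${graph.numEdges}") // people: 3, links: 2
      // Each triplet joins an edge with the attributes of its source and destination vertices.
      graph.triplets.collect().foreach(t => println(s"${t.srcAttr} ${t.attr} ${t.dstAttr}"))
    } finally {
      sc.stop()
    }
  }
}
```

Packaged as a JAR with SBT, an application like this could be launched with `spark-submit --class SocialGraphSketch <path-to-jar>`. The hard-coded `setMaster("local[*]")` keeps the sketch self-contained. A real application would usually leave the master to be set by `spark-submit`.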