Apache Spark Graph Processing
Rindra Ramamonjison
Chapter 1. Getting Started with Spark and GraphX
Apache Spark is a cluster-computing platform for processing large distributed datasets. Data processing in Spark is both fast and easy, thanks to its optimized parallel computation engine and its flexible, unified API. Spark's core abstraction is the Resilient Distributed Dataset (RDD). By extending the MapReduce framework, Spark's Core API makes analytics jobs easier to write. On top of the Core API, Spark offers an integrated set of high-level libraries for specialized tasks such as graph processing or machine learning. In particular, GraphX is Spark's library for graph-parallel processing.
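To make these pieces concrete, here is a minimal sketch of a Spark program that builds a tiny social graph: the `people` and `links` RDDs use the Core API, and GraphX's `Graph` combines them into a property graph. The object name `TinyGraphApp`, the sample data, and the `local[*]` master are illustrative assumptions, not taken from the book.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

object TinyGraphApp {
  def main(args: Array[String]): Unit = {
    // Run locally with as many worker threads as available cores
    val conf = new SparkConf().setAppName("TinyGraphApp").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Core API: an RDD of (vertex id, name) pairs for a small social network
    val people = sc.parallelize(Seq(
      (1L, "Alice"), (2L, "Bob"), (3L, "Carol")
    ))

    // An RDD of edges labelled with the relationship type
    val links = sc.parallelize(Seq(
      Edge(1L, 2L, "friend"), Edge(2L, 3L, "colleague")
    ))

    // GraphX: combine the two RDDs into a property graph and query it
    val graph = Graph(people, links)
    println(s"vertices = ${graph.numVertices}, edges = ${graph.numEdges}")

    sc.stop()
  }
}
```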
This chapter introduces Spark and GraphX by building a social network graph and exploring the links between the people in it. You will also learn to use the Scala Build Tool (SBT) to build and run a Spark program. By the end of this chapter, you will know how to create a graph of a social network, explore its links, and build and run your own Spark program with SBT.
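As a starting point for the SBT setup mentioned above, a minimal build.sbt might look like the sketch below. The project name and the Scala and Spark versions are assumptions to be adjusted to your installation; only the spark-core and spark-graphx dependencies are needed for this chapter.

```scala
// build.sbt -- a minimal sketch; adjust scalaVersion and the Spark version
// to match the release you have installed (versions here are assumptions).
name := "spark-graphx-intro"

version := "0.1.0"

scalaVersion := "2.12.18"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"   % "3.5.1",
  "org.apache.spark" %% "spark-graphx" % "3.5.1"
)
```

With this file at the project root, `sbt package` produces a JAR that can be submitted with `spark-submit`, and `sbt run` launches the local example above directly.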