- Mastering Machine Learning with Spark 2.x
- Alex Tellez Max Pumperla Michal Malohlava
- 2021-07-02 18:46:05
Inside the box
So, you have downloaded the latest version of Spark (which distribution you chose depends on how you plan to launch Spark) and you have run the standard Hello, World! example. What now?
Spark comes equipped with five libraries, which can be used separately or together, depending on the task we are trying to solve. In this book, we will use several of these libraries within the same application, so that you get the maximum exposure to the Spark platform and better understand the benefits (and limitations) of each library. These five libraries are as follows:
- Core: This is the Spark core infrastructure. It provides the primitives for representing and storing data, called Resilient Distributed Datasets (RDDs), and for manipulating that data with tasks and jobs.
- SQL: This library provides a user-friendly API over the core RDDs by introducing DataFrames and SQL for manipulating the stored data.
- MLlib (Machine Learning Library): This is Spark's very own library of machine learning algorithms, developed in-house, that can be used within your Spark application.
- GraphX: This is used for graphs and graph computations; we will explore this particular library in depth in a later chapter.
- Streaming: This library allows real-time streaming of data from various sources, such as Kafka, Twitter, Flume, and TCP sockets, to name a few. Many of the applications we will build in this book leverage the MLlib and Streaming libraries.
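
To make the "separately or together" point concrete, here is a minimal sketch of an sbt build definition that pulls all five libraries into one application. The artifact names are the standard Spark module names; the Spark and Scala version numbers are assumptions chosen for illustration and should be matched to the Spark 2.x release you actually downloaded.

```scala
// build.sbt (sketch): one dependency per Spark library listed above.
// The version numbers are assumptions; align them with your installation.
val sparkVersion = "2.1.1"

scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % sparkVersion, // Core: RDDs, tasks, jobs
  "org.apache.spark" %% "spark-sql"       % sparkVersion, // SQL: DataFrames and SQL
  "org.apache.spark" %% "spark-mllib"     % sparkVersion, // MLlib: machine learning
  "org.apache.spark" %% "spark-graphx"    % sparkVersion, // GraphX: graph computations
  "org.apache.spark" %% "spark-streaming" % sparkVersion  // Streaming: real-time data
)
```

An application that needs only some of the libraries can simply drop the lines it does not use; each module is a separate artifact.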

The Spark platform can also be extended with third-party packages. There are many of them; examples include support for reading CSV or Avro files, integration with Redshift, and Sparkling Water, which encapsulates the H2O machine learning library.