Title: Hands-On Deep Learning with Apache Spark
Author: Guglielmo Iozzia
Chapter word count: 160
Last updated: 2021-07-02 13:34:20

The Apache Spark Ecosystem

Apache Spark (http://spark.apache.org/) is a fast, open-source cluster-computing platform. It was originally created at the AMPLab at the University of California, Berkeley, and its source code was later donated to the Apache Software Foundation (https://www.apache.org/). Much of Spark's speed comes from loading data into distributed memory (RAM) across a cluster of machines. Data can not only be transformed quickly, but also cached on demand for a variety of use cases. According to the Spark project, programs can run up to 100 times faster than on Hadoop MapReduce when the data fits in memory, and up to 10 times faster when it has to be read from disk. Spark supports four programming languages: Java, Scala, Python, and R. This book covers only the Spark APIs (and deep learning frameworks) for Scala (https://www.scala-lang.org/) and Python (https://www.python.org/).
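To make the idea of caching on demand concrete, here is a minimal sketch in plain Python. It is a toy, single-machine model and not Spark's API: the class `ToyDataset`, its `computations` counter, and its method names are all hypothetical and were chosen only for illustration. It shows the general pattern of lazy transformations whose result is kept in memory after the first action, once caching has been requested.

```python
from typing import Callable, Iterable, List, Optional


class ToyDataset:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not a Spark class)."""

    def __init__(self, source: Callable[[], Iterable[int]]):
        self._source = source                     # recomputes the data when called
        self._cached: Optional[List[int]] = None  # in-memory copy, if any
        self._cache_requested = False
        self.computations = 0                     # times the data was recomputed

    def map(self, fn: Callable[[int], int]) -> "ToyDataset":
        # Transformations are lazy: wrap this dataset, compute nothing yet.
        return ToyDataset(lambda: (fn(x) for x in self._materialize()))

    def cache(self) -> "ToyDataset":
        # Only marks the dataset; the data is stored when an action first runs.
        self._cache_requested = True
        return self

    def _materialize(self) -> List[int]:
        if self._cached is not None:
            return self._cached                   # reuse the in-memory copy
        self.computations += 1
        data = list(self._source())
        if self._cache_requested:
            self._cached = data
        return data

    def sum(self) -> int:
        # An action: forces evaluation.
        return sum(self._materialize())

    def count(self) -> int:
        # Another action over the same data.
        return len(self._materialize())


# With caching requested, the second action reuses the stored result.
squares = ToyDataset(lambda: range(1, 6)).map(lambda x: x * x).cache()
total = squares.sum()
size = squares.count()

# Without caching, every action recomputes the data from its source.
uncached = ToyDataset(lambda: range(1, 6)).map(lambda x: x * x)
uncached.sum()
uncached.count()
```

In this toy model, `squares.computations` stays at 1 after both actions, while `uncached.computations` reaches 2. Spark applies the same principle across the memory of many machines, which is where much of the benefit for repeated or iterative workloads comes from.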

This chapter will cover the following topics:

Apache Spark fundamentals
Getting Spark
Resilient Distributed Dataset (RDD) programming
Spark SQL, Datasets, and DataFrames
Spark Streaming
Cluster mode using different cluster managers
