The Apache Spark Ecosystem

Apache Spark (http://spark.apache.org/) is a fast, open source cluster-computing platform. It was originally created by the AMPLab at the University of California, Berkeley, and its source code was later donated to the Apache Software Foundation (https://www.apache.org/). Spark owes much of its speed to loading data into distributed memory (RAM) across a cluster of machines. Data can not only be transformed quickly, but also cached on demand for a variety of use cases. Compared to Hadoop MapReduce, Spark can run programs up to 100 times faster when the data fits in memory, or up to 10 times faster on disk. Spark provides support for four programming languages: Java, Scala, Python, and R. This book covers the Spark APIs (and deep learning frameworks) for Scala (https://www.scala-lang.org/) and Python (https://www.python.org/) only.
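
To make the idea of caching on demand concrete before any Spark code appears, here is a minimal sketch in plain Python. It is a toy model and not Spark's API: the `LazyDataset` class, the `read_source` function, and the `reads` counter are hypothetical names made up for illustration, and everything runs in a single process rather than across a cluster. It only shows why keeping an intermediate result in memory avoids recomputing it from the source.

```python
from typing import Callable, Iterable, List, Optional


class LazyDataset:
    """Toy stand-in for a lazily evaluated, cacheable dataset (not Spark's API)."""

    def __init__(self, compute: Callable[[], Iterable[int]]):
        self._compute = compute            # recipe for producing the data
        self._cache_requested = False      # set by cache(); nothing is stored yet
        self._cached: Optional[List[int]] = None

    def map(self, fn: Callable[[int], int]) -> "LazyDataset":
        # A transformation only records the work to do; nothing runs here.
        return LazyDataset(lambda: (fn(x) for x in self._materialize()))

    def cache(self) -> "LazyDataset":
        # Ask for the data to be kept in memory the first time it is computed.
        self._cache_requested = True
        return self

    def _materialize(self) -> List[int]:
        if self._cached is not None:
            return self._cached            # served from memory, no recomputation
        data = list(self._compute())
        if self._cache_requested:
            self._cached = data
        return data

    def sum(self) -> int:
        return sum(self._materialize())

    def count(self) -> int:
        return len(self._materialize())


reads = {"count": 0}  # how many times the (pretend) slow source was read


def read_source() -> Iterable[int]:
    reads["count"] += 1
    return range(1, 6)


# Without caching, every action recomputes from the source.
squares = LazyDataset(read_source).map(lambda x: x * x)
uncached_total = squares.sum()
uncached_count = squares.count()
reads_without_cache = reads["count"]

# With caching, the first action computes and stores; the second reuses it.
reads["count"] = 0
cached_squares = LazyDataset(read_source).map(lambda x: x * x).cache()
cached_total = cached_squares.sum()
cached_count = cached_squares.count()
reads_with_cache = reads["count"]

print(reads_without_cache, reads_with_cache)
```

In this toy model the uncached dataset reads its source once per action, while the cached one reads it only once in total. Spark applies the same idea to data partitioned across the memory of many machines.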

This chapter will cover the following topics:

  • Apache Spark fundamentals
  • Getting Spark
  • Resilient Distributed Dataset (RDD) programming
  • Spark SQL, Datasets, and DataFrames
  • Spark Streaming
  • Cluster mode using a different manager