The Apache Spark Ecosystem

Apache Spark (http://spark.apache.org/) is a fast, open source cluster-computing platform. It was originally created at AMPLab at the University of California, Berkeley, and its source code was later donated to the Apache Software Foundation (https://www.apache.org/). Spark owes much of its speed to loading data into distributed memory (RAM) across a cluster of machines. Data can not only be transformed quickly but also cached on demand for a variety of use cases. Compared to Hadoop MapReduce, Spark can run programs up to 100 times faster when the data fits in memory, or up to 10 times faster on disk. Spark supports four programming languages: Java, Scala, Python, and R. This book covers only the Spark APIs (and deep learning frameworks) for Scala (https://www.scala-lang.org/) and Python (https://www.python.org/).

This chapter will cover the following topics:

  • Apache Spark fundamentals
  • Getting Spark
  • Resilient Distributed Dataset (RDD) programming
  • Spark SQL, Datasets, and DataFrames
  • Spark Streaming
  • Cluster mode using a different manager