
From Hadoop MapReduce to Spark

As data volumes grew, single-machine tools could no longer satisfy industry needs, which created space for new data processing methods and tools, most notably Hadoop MapReduce, which is based on an idea originally described in the Google paper, MapReduce: Simplified Data Processing on Large Clusters (https://research.google.com/archive/mapreduce.html). However, it is a generic framework without any explicit support or libraries for creating machine learning workflows. Another limitation of classical MapReduce is that it performs many disk I/O operations during the computation instead of taking advantage of machine memory.
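To make the programming model concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases, using word counting as the example. It is plain Python, not Hadoop code; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are illustrative assumptions, and a real cluster would distribute each phase across machines and write intermediate results to disk.

```python
from collections import defaultdict


def map_phase(documents):
    """Emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group the emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine the values collected for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Chain the three phases (hypothetical helper for this sketch)."""
    return reduce_phase(shuffle_phase(map_phase(documents)))


counts = word_count(["spark and hadoop", "hadoop mapreduce"])
print(counts)
```

Note that nothing in this model is specific to machine learning: an iterative algorithm would have to be expressed as a chain of such jobs, each one rereading its input.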

As you have seen, there are several existing machine learning tools and distributed platforms, but none of them is an exact match for performing machine learning tasks on large data in a distributed environment. These limitations open the door for Apache Spark.

Enter the room, Apache Spark!

Created in 2010 at the UC Berkeley AMP Lab (Algorithms, Machines, People), the Apache Spark project was built with an eye for speed, ease of use, and advanced analytics. One key difference between Spark and other distributed frameworks such as Hadoop is that datasets can be cached in memory. This lends itself nicely to machine learning, given its iterative nature (more on this later!) and the fact that data scientists access the same data many times over.
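The benefit of caching for iterative work can be illustrated with a toy sketch. This is plain Python rather than Spark code: `CountingSource` stands in for a dataset on disk, `cached` is a hypothetical wrapper that keeps the data in memory after the first read, and `iterate_mean` is a made-up iterative algorithm that touches the full dataset on every pass.

```python
class CountingSource:
    """Stands in for a dataset on disk and counts how often it is read."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.reads = 0

    def load(self):
        self.reads += 1
        return list(self._rows)


def cached(load):
    """Wrap a loader so the data is read once and then served from memory."""
    memo = []

    def load_cached():
        if not memo:
            memo.append(load())
        return memo[0]

    return load_cached


def iterate_mean(load, iterations):
    """Toy iterative algorithm: move an estimate toward the data mean."""
    estimate = 0.0
    for _ in range(iterations):
        data = load()  # every iteration needs the full dataset
        estimate += 0.5 * (sum(data) / len(data) - estimate)
    return estimate


uncached_source = CountingSource([1.0, 2.0, 3.0])
uncached_result = iterate_mean(uncached_source.load, iterations=10)

cached_source = CountingSource([1.0, 2.0, 3.0])
cached_result = iterate_mean(cached(cached_source.load), iterations=10)

print(uncached_source.reads, cached_source.reads)
```

Both runs compute the same estimate, but the uncached run reads its source once per iteration while the cached run reads it only once.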

Spark can be run in a variety of ways, such as the following:

  • Local mode: This entails a single Java Virtual Machine (JVM) executed on a single host
  • Standalone Spark cluster: This entails multiple JVMs on multiple hosts
  • Via a resource manager such as YARN/Mesos: This application deployment is driven by a resource manager, which controls the allocation of nodes, application distribution, and deployment
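In practice, the run mode is usually selected through the master URL handed to Spark, for example via the `--master` option of `spark-submit`. The sketch below builds such URLs for the modes listed above. The helper `master_url` and its parameters are hypothetical and exist only for illustration; the URL formats and the default ports (7077 for a standalone master, 5050 for Mesos) follow the commonly documented conventions and should be checked against your Spark version.

```python
def master_url(mode, host=None, port=None, threads=None):
    """Build a Spark master URL for a run mode (hypothetical helper)."""
    if mode == "local":
        # One JVM on one host; "*" means one worker thread per core.
        return "local[*]" if threads is None else f"local[{threads}]"
    if mode == "yarn":
        # The cluster location comes from the Hadoop configuration.
        return "yarn"
    if mode in ("standalone", "mesos"):
        if host is None:
            raise ValueError(f"mode {mode!r} requires a host")
        if mode == "standalone":
            return f"spark://{host}:{port or 7077}"
        return f"mesos://{host}:{port or 5050}"
    raise ValueError(f"unknown mode: {mode!r}")


local_url = master_url("local")
standalone_url = master_url("standalone", host="master-node")
yarn_url = master_url("yarn")
print(local_url, standalone_url, yarn_url)
```

The host name `master-node` is a placeholder; the resulting string would be passed as, say, `spark-submit --master spark://master-node:7077 ...`.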