Spark machine learning

Running a machine-learning algorithm is difficult when the data is distributed across multiple machines. A calculation on one point may depend on another point that is stored or processed on a different executor. Data can be shuffled across executors or workers, but shuffling is expensive because it moves data over the network. Spark reduces this cost with caching: once a dataset (including the output of a shuffle) is kept in memory, later iterations can reuse it instead of shuffling or recomputing it again. Spark's ability to keep large amounts of data in memory therefore makes iterative machine-learning algorithms much easier to write.
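Why caching matters for iterative algorithms can be sketched in plain Python. This is an analogy, not the Spark API: the hypothetical `Dataset` class below stands in for a lazily derived RDD or DataFrame, and `expensive_source` stands in for reading and transforming the input. Gradient descent reads the dataset on every iteration. Without a cache, the derivation runs once per iteration; with a cache it runs once and is reused. In Spark itself, the corresponding calls are `cache()` or `persist()` on an RDD or DataFrame.

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]  # (feature x, label y)


class Dataset:
    """Hypothetical stand-in for a lazily derived dataset that may be cached in memory."""

    def __init__(self, source: Callable[[], List[Point]]) -> None:
        self._source = source
        self._cached: Optional[List[Point]] = None
        self.use_cache = False
        self.computations = 0  # how many times the (expensive) derivation ran

    def cache(self) -> "Dataset":
        self.use_cache = True
        return self

    def collect(self) -> List[Point]:
        # Reuse the in-memory copy if caching is enabled and it already exists.
        if self.use_cache and self._cached is not None:
            return self._cached
        self.computations += 1
        data = self._source()
        if self.use_cache:
            self._cached = data
        return data


def fit_slope(ds: Dataset, iterations: int = 50, lr: float = 0.1) -> float:
    """Fit y ~ w * x by gradient descent on squared error; every iteration reads the dataset."""
    w = 0.0
    for _ in range(iterations):
        data = ds.collect()
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def expensive_source() -> List[Point]:
    # Hypothetical: stands in for reading and transforming the input data.
    return [(float(x), 3.0 * x) for x in range(1, 4)]


if __name__ == "__main__":
    plain = Dataset(expensive_source)
    fit_slope(plain)
    print(plain.computations)  # 50: recomputed on every iteration

    cached = Dataset(expensive_source).cache()
    print(round(fit_slope(cached), 6))  # 3.0
    print(cached.computations)  # 1: computed once, then served from memory
```

The fitted slope is the same either way; only the amount of repeated work changes. On a cluster, the repeated work would include any reads and shuffles needed to rebuild the dataset.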

Spark MLlib and ML are Spark's packages for working with machine-learning algorithms. They provide the following:

  • Built-in machine-learning algorithms such as classification, regression, clustering, and more
  • Features such as pipelines, vector creation, and more

These algorithms and features are optimized to minimize data shuffling and to scale across the cluster.
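The pipeline and vector-creation features can be pictured with a small plain-Python sketch. It only mirrors the general shape of `pyspark.ml.Pipeline` and `pyspark.ml.feature.VectorAssembler`; the classes `AssembleVector`, `CenterVector`, and `SimplePipeline` are hypothetical and are not the Spark API. One stage assembles numeric columns into a feature vector, a second stage learns per-component means during `fit()` and subtracts them during `transform()`, and the pipeline fits the stages in order, feeding each stage's output to the next.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


class AssembleVector:
    """Hypothetical stateless stage: copies several numeric columns into one list-valued column."""

    def __init__(self, input_cols: Sequence[str], output_col: str) -> None:
        self.input_cols = list(input_cols)
        self.output_col = output_col

    def fit(self, rows: List[Row]) -> "AssembleVector":
        return self  # nothing to learn

    def transform(self, rows: List[Row]) -> List[Row]:
        # Build new dicts so the input rows are left unchanged.
        return [{**r, self.output_col: [float(r[c]) for c in self.input_cols]} for r in rows]


class CenterVector:
    """Hypothetical learned stage: fit() computes per-component means, transform() subtracts them."""

    def __init__(self, input_col: str, output_col: str) -> None:
        self.input_col = input_col
        self.output_col = output_col
        self.means: List[float] = []

    def fit(self, rows: List[Row]) -> "CenterVector":
        vectors = [r[self.input_col] for r in rows]
        n = len(vectors)
        self.means = [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]
        return self

    def transform(self, rows: List[Row]) -> List[Row]:
        return [
            {**r, self.output_col: [x - m for x, m in zip(r[self.input_col], self.means)]}
            for r in rows
        ]


class SimplePipeline:
    """Hypothetical pipeline: fits stages in order, passing each stage's output to the next."""

    def __init__(self, stages) -> None:
        self.stages = list(stages)

    def fit(self, rows: List[Row]) -> "SimplePipeline":
        current = rows
        for stage in self.stages:
            stage.fit(current)
            current = stage.transform(current)
        return self

    def transform(self, rows: List[Row]) -> List[Row]:
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


if __name__ == "__main__":
    rows = [{"age": 20, "income": 1.0}, {"age": 40, "income": 3.0}]
    pipe = SimplePipeline(
        [AssembleVector(["age", "income"], "features"), CenterVector("features", "centered")]
    ).fit(rows)
    print(pipe.transform(rows)[0]["centered"])  # [-10.0, -1.0]
```

Keeping each step behind the same `fit`/`transform` pair is what lets a fitted pipeline be reapplied to new data in one call.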