
Spark MLlib

Apache Spark is an open-source platform for processing large datasets. It is well suited to iterative machine learning tasks because it keeps working data in in-memory structures such as RDDs (Resilient Distributed Datasets). MLlib is Spark's machine learning library. It provides a range of learning algorithms, both supervised and unsupervised, and includes various statistical and linear algebra optimizations. Because it ships with Apache Spark, it avoids the separate installation that some other libraries require. MLlib supports several high-level languages, including Scala, Java, Python, and R, and provides a high-level API for building machine learning pipelines.

MLlib's integration with Spark has several benefits. Spark is designed for iterative computation, which makes it an efficient platform for implementing large-scale machine learning algorithms, since these algorithms are themselves iterative.
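The following plain-Python sketch (not Spark or MLlib code) shows why iteration matters: batch gradient descent for a one-feature linear model scans the entire dataset once per iteration, so the same data is read hundreds or thousands of times. In Spark, calling `cache()` on an RDD or DataFrame keeps such data in memory between passes. The function name `gradient_descent`, the learning rate, and the synthetic data are assumptions chosen for illustration.

```python
from typing import List, Tuple


def gradient_descent(points: List[Tuple[float, float]], lr: float = 0.05,
                     iterations: int = 1000) -> Tuple[float, float, int]:
    """Fit y = w * x + b by minimizing mean squared error.

    Returns (w, b, passes), where `passes` counts full scans of `points`.
    """
    w, b = 0.0, 0.0
    n = len(points)
    passes = 0
    for _ in range(iterations):
        # Every iteration makes one full pass over the same in-memory data.
        # This repeated-scan access pattern is what in-memory caching speeds up.
        grad_w = grad_b = 0.0
        for x, y in points:
            err = w * x + b - y
            grad_w += 2.0 * err * x / n
            grad_b += 2.0 * err / n
        passes += 1
        # Step against the gradient of the mean squared error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, passes


# Synthetic data from y = 2x + 1, materialized once and reused every pass.
data = [(float(x), 2.0 * x + 1.0) for x in range(5)]
w, b, passes = gradient_descent(data)
print(f"w={w:.4f} b={b:.4f} passes={passes}")  # prints: w=2.0000 b=1.0000 passes=1000
```

If each of those 1,000 passes had to reload the data from disk, I/O would dominate the run time. Keeping the working set in memory removes that cost.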

Any improvement to Spark's data structures translates directly into gains for MLlib. Contributions from Spark's large community have also helped bring new algorithms to MLlib faster.

Spark also offers other APIs, such as the Pipeline APIs and GraphX, that can be used in conjunction with MLlib, which makes it easier to build interesting use cases on top of MLlib.
