Spark computing for machine learning

With its innovations in resilient distributed datasets (RDDs) and in-memory processing, Apache Spark has made distributed computing readily accessible to data scientists and machine learning professionals. According to the Apache Spark team, Apache Spark runs on the Mesos cluster manager, letting it share resources with Hadoop and other applications. Apache Spark can also read from any Hadoop input source, such as HDFS.

For these reasons, the Apache Spark computing model is well suited to distributed computing for machine learning. It is an especially good fit for rapid interactive machine learning, parallel computing, and complicated modelling at scale.

According to the Spark development team, Spark's philosophy is to make life easy and productive for data scientists and machine learning professionals. To that end, Apache Spark offers:

  • Well-documented, expressive APIs
  • Powerful domain-specific libraries
  • Easy integration with storage systems
  • Caching to avoid data movement

According to an introduction by Patrick Wendell, co-founder of Databricks, Spark is designed specifically for large-scale data processing. Apache Spark supports agile data science by enabling rapid iteration, and it integrates easily with IBM and other solutions.
