Spark computing for machine learning

With its innovations in resilient distributed datasets (RDDs) and in-memory processing, Apache Spark has made distributed computing easily accessible to data scientists and machine learning professionals. According to the Apache Spark team, Apache Spark runs on the Mesos cluster manager, letting it share resources with Hadoop and other applications. Apache Spark can also read from any Hadoop input source, such as HDFS.

For these reasons, the Apache Spark computing model is well suited to distributed computing for machine learning. It is a particularly good fit for rapid interactive machine learning, parallel computing, and complicated modelling at scale.

According to the Spark development team, Spark's philosophy is to make life easy and productive for data scientists and machine learning professionals. To that end, Apache Spark offers:

  • Well-documented, expressive APIs
  • Powerful domain-specific libraries
  • Easy integration with storage systems
  • Caching to avoid data movement

According to an introduction by Patrick Wendell, co-founder of Databricks, Spark is designed specifically for large-scale data processing. Apache Spark supports agile data science by allowing rapid iteration, and it integrates easily with IBM and other solutions.
