- Mastering Machine Learning with Spark 2.x
- Alex Tellez Max Pumperla Michal Malohlava
- 2021-07-02 18:46:05
Introducing H2O.ai
H2O is an open source machine learning platform that works extremely well with Spark; in fact, it was one of the first third-party packages deemed "Certified on Spark".

Sparkling Water (H2O + Spark) is H2O.ai's integration of its platform with Spark, combining the machine learning capabilities of H2O with the full functionality of Spark. Users can therefore run H2O algorithms on Spark RDDs and DataFrames for both exploration and deployment. This is possible because H2O and Spark run in the same JVM, which allows seamless transitions between the two platforms. H2O stores data in an H2O frame, a compressed columnar representation of your dataset that can be created from a Spark RDD or DataFrame. Throughout much of this book, we will reference algorithms from both Spark's MLlib library and H2O's platform, showing how to use the two libraries together to get the best possible results for a given task.
The following is a summary of the features Sparkling Water comes equipped with:
- Use of H2O algorithms within a Spark workflow
- Transformations between Spark and H2O data structures
- Use of Spark RDDs and/or DataFrames as inputs to H2O algorithms
- Use of H2O frames as inputs to MLlib algorithms (this will come in handy when we do feature engineering later)
- Transparent execution of Sparkling Water applications on top of Spark (for example, we can run a Sparkling Water application within a Spark stream)
- The H2O user interface to explore Spark data
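
The transformations between Spark and H2O data structures listed above can be sketched roughly as follows. This is a minimal illustration, not code from the book: it assumes a Sparkling Water 2.x release matching your Spark 2.x version is on the classpath (where the API lives in the `org.apache.spark.h2o` package), and the object name, column names, and sample rows are hypothetical. Newer Sparkling Water releases have moved and renamed parts of this API, so check the documentation for your version.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.h2o._ // assumption: Sparkling Water 2.x package layout

// Hypothetical example object; names and data are for illustration only.
object SparklingWaterRoundTrip {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sparkling-water-round-trip")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Start (or reuse) H2O services alongside Spark; with the default
    // internal backend, H2O runs in the same JVMs as the Spark executors.
    val h2oContext = H2OContext.getOrCreate(spark)

    // A small Spark DataFrame with made-up columns.
    val sparkDf: DataFrame =
      Seq((1.0, "a"), (2.0, "b"), (3.0, "a")).toDF("value", "label")

    // Spark DataFrame -> H2O frame (the compressed columnar representation).
    val h2oFrame = h2oContext.asH2OFrame(sparkDf)

    // H2O frame -> Spark DataFrame, so it can feed Spark transformations
    // or MLlib algorithms again.
    val roundTripped: DataFrame =
      h2oContext.asDataFrame(h2oFrame)(spark.sqlContext)
    roundTripped.show()

    // Shut down H2O and the underlying SparkContext.
    h2oContext.stop(stopSparkContext = true)
  }
}
```

Because both conversions go through the same `H2OContext`, a workflow can move a dataset into H2O for model training and back into Spark for further processing without leaving the application.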