
Introducing H2O.ai

H2O is an open source machine learning platform that works very well with Spark; in fact, it was one of the first third-party packages to be designated "Certified on Spark".

Sparkling Water (H2O + Spark) is H2O's integration of its platform with Spark, combining the machine learning capabilities of H2O with all the functionality of Spark. This means that users can run H2O algorithms on Spark RDDs and DataFrames for both exploration and deployment. This is possible because H2O and Spark share the same JVM, which allows for seamless transitions between the two platforms. H2O stores data in the H2O frame, a columnar-compressed representation of your dataset that can be created from a Spark RDD and/or DataFrame. Throughout much of this book, we will reference algorithms from both Spark's MLlib library and H2O's platform, showing how to use the two libraries together to get the best possible results for a given task.
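To make the phrase "columnar-compressed" more concrete, here is a small conceptual sketch in plain Python. It does not use H2O or Spark, and it is not how the H2O frame is implemented: run-length encoding is used purely as a stand-in for whatever compression H2O applies, and the helper names (to_columns, run_length_encode, run_length_decode) are hypothetical. The idea it illustrates is only that storing each column contiguously places similar values next to each other, which tends to make them easier to compress.

```python
from itertools import groupby


def to_columns(rows):
    """Pivot a list of row dicts into a dict mapping column name -> list of values."""
    if not rows:
        return {}
    return {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values):
    """Compress a column into (value, run_length) pairs (illustrative only)."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


# A tiny row-oriented dataset, similar in spirit to rows of an RDD or DataFrame.
rows = [
    {"city": "Prague", "label": 0},
    {"city": "Prague", "label": 0},
    {"city": "Prague", "label": 1},
    {"city": "Boston", "label": 1},
]

# Column-oriented view: one list per column.
columns = to_columns(rows)

# Each column is compressed independently of the others.
compressed = {name: run_length_encode(col) for name, col in columns.items()}

# Decoding recovers the original columns, so nothing is lost.
restored = {name: run_length_decode(pairs) for name, pairs in compressed.items()}
```

In this toy dataset, the four "city" values collapse into two runs, and decoding gives back exactly the original columns.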

The following is a summary of the features Sparkling Water comes equipped with:

  • Use of H2O algorithms within a Spark workflow
  • Transformations between Spark and H2O data structures
  • Use of Spark RDDs and/or DataFrames as inputs to H2O algorithms
  • Use of H2O frames as inputs to MLlib algorithms (this will come in handy when we do feature engineering later)
  • Transparent execution of Sparkling Water applications on top of Spark (for example, we can run a Sparkling Water application within a Spark stream)
  • The H2O user interface to explore Spark data