
Introducing H2O.ai

H2O is an open source machine learning platform that integrates closely with Spark; in fact, it was one of the first third-party packages deemed "Certified on Spark".

Sparkling Water (H2O + Spark) is H2O's integration of its platform with Spark, combining the machine learning capabilities of H2O with all the functionality of Spark. This means that users can run H2O algorithms on Spark RDDs and DataFrames for both exploration and deployment. This is possible because H2O and Spark share the same JVM, which allows for seamless transitions between the two platforms. H2O stores data in an H2O frame, a columnar-compressed representation of your dataset that can be created from a Spark RDD or DataFrame. Throughout much of this book, we will reference algorithms from both Spark's MLlib library and H2O's platform, showing how to use the two libraries together to get the best possible results for a given task.
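
To make the phrase "columnar-compressed representation" more concrete, the following is a toy sketch in plain Python of the general idea: row records are pivoted into columns, and each column is then compressed on its own. It assumes simple run-length encoding as the compression step, which is an illustrative choice only; it is not H2O's actual storage format, and the names `to_columns`, `rle_encode`, `rle_decode`, and `compress` are hypothetical helpers, not part of H2O or Spark.

```python
from itertools import groupby


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    if not rows:
        return {}
    names = list(rows[0].keys())
    return {name: [row[name] for row in rows] for name in names}


def rle_encode(values):
    """Run-length encode one column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


def compress(rows):
    """Toy 'columnar-compressed frame': one encoded list per column."""
    return {name: rle_encode(col) for name, col in to_columns(rows).items()}


# Hypothetical sample data, row-oriented as it might arrive from a table.
rows = [
    {"city": "Prague", "label": 1},
    {"city": "Prague", "label": 1},
    {"city": "Prague", "label": 0},
    {"city": "Brno", "label": 0},
]

frame = compress(rows)
print(frame)
```

Because values within a single column tend to repeat or fall in a narrow range, compressing column by column usually works better than compressing whole rows, which is the motivation behind columnar layouts in general.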

The following is a summary of the features Sparkling Water comes equipped with:

  • Use of H2O algorithms within a Spark workflow
  • Transformations between Spark and H2O data structures
  • Use of Spark RDDs and/or DataFrames as inputs to H2O algorithms
  • Use of H2O frames as inputs to MLlib algorithms (this will come in handy when we do feature engineering later)
  • Transparent execution of Sparkling Water applications on top of Spark (for example, we can run a Sparkling Water application within a Spark stream)
  • The H2O user interface to explore Spark data