The Spark MLlib library

Spark MLlib is a library of machine learning algorithms and utilities designed to make machine learning easy and to run it in parallel. It covers regression, collaborative filtering, classification, and clustering. Spark MLlib provides two APIs in separate packages: spark.mllib, which is built on top of RDDs, and spark.ml, which is built on top of the DataFrame. The primary machine learning API for Spark is now the DataFrame-based API in the spark.ml package. Using spark.ml with the DataFrame API is more versatile and flexible, and it brings the benefits of the DataFrame, such as the Catalyst optimizer. The RDD-based spark.mllib API is in maintenance mode and is expected to be removed in the future.

Machine learning is applicable to various data types, including text, images, structured data, and vectors. To support these data types under a unified dataset concept, Spark ML uses the Spark SQL DataFrame as its dataset abstraction. This makes it easy to combine various algorithms in a single workflow or pipeline.

The following sections will give you a detailed view of a few key concepts in the Spark ML API.
