- Artificial Intelligence for Big Data
- Anand Deshpande Manish Kumar
- 2021-06-25 21:57:12
The Spark MLlib library
Spark MLlib is a library of machine learning algorithms and utilities designed to make machine learning easy to use and to run in parallel. It covers regression, classification, clustering, and collaborative filtering. MLlib exposes two APIs in two packages: spark.mllib, which is built on top of RDDs, and spark.ml, which is built on top of DataFrames. The primary machine learning API for Spark is now the DataFrame-based API in the spark.ml package. Using spark.ml with the DataFrame API is more versatile and flexible, and it brings the benefits of DataFrames, such as the Catalyst optimizer. The RDD-based spark.mllib API is expected to be removed in the future.
Machine learning is applicable to various data types, including text, images, structured data, and vectors. To support these data types under a unified dataset concept, Spark ML uses the Spark SQL DataFrame. This makes it easy to combine various algorithms in a single workflow or pipeline.
The following sections will give you a detailed view of a few key concepts in the Spark ML API.