- Mastering Machine Learning with Spark 2.x
- Alex Tellez Max Pumperla Michal Malohlava
- 2021-07-02 18:46:08
Labeled point vector
Before running any supervised machine learning algorithm in Spark MLlib, we must convert our dataset into labeled point vectors, each of which maps a set of features to a given label (response). Labels are stored as doubles, which lets the same structure serve both classification and regression tasks. For all binary classification problems, labels should be stored as either 0 or 1, which the preceding summary statistics confirmed holds true for our example.
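import org.apache.spark.mllib.regression.LabeledPoint

// Pair each response with its feature vector
val higgs = response.zip(features).map {
  case (response, features) => LabeledPoint(response, features)
}
// Name the RDD and cache it for reuse
higgs.setName("higgs").cache()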
An example of a labeled point vector follows:
(1.0, [0.123, 0.456, 0.567, 0.678, ..., 0.789])
In the preceding example, the doubles inside the brackets are the features, and the single number outside the brackets is the label. Note that we have not yet told Spark that we are performing a classification task rather than a regression task; that happens later.
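To make the label/feature split concrete, the following minimal sketch builds a single labeled point by hand and reads both parts back. The numeric values and the names `example`, `label`, `featureValues`, and `isBinaryLabel` are hypothetical, chosen only for illustration; they are not taken from the Higgs dataset or from the book's code. No SparkContext is needed, since `LabeledPoint` and `Vectors.dense` are local classes.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Hypothetical values for illustration only; not real Higgs data.
val example = LabeledPoint(1.0, Vectors.dense(0.123, 0.456, 0.567))

// The label is the single number outside the brackets.
val label: Double = example.label

// The features are the doubles inside the brackets.
val featureValues: Array[Double] = example.features.toArray

// For binary classification, the label should be exactly 0.0 or 1.0.
val isBinaryLabel: Boolean = label == 0.0 || label == 1.0
```

Because the label is a plain `Double`, the same structure would hold a continuous response for regression; nothing in the labeled point itself marks the task as classification.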