- Hands-On Machine Learning with ML.NET
- Jarred Capellman
- 2021-06-24 16:43:33
The Trainer class
Inside the Trainer class, much of the code was rewritten to handle the larger feature set and to evaluate a regression algorithm rather than the binary classification model we looked at in Chapter 2, Setting Up the ML.NET Environment.
The first change is that the data is separated by commas rather than by the default tab character we used in Chapter 2, Setting Up the ML.NET Environment:
var trainingDataView = MlContext.Data.LoadFromTextFile<EmploymentHistory>(trainingFileName, ',');
The next change is in the pipeline creation itself. In our first application, we had a label and fed that straight into the pipeline. In this application, we use nine features to predict the duration of a person's employment, which is held in the DurationInMonths property. We append each feature to the pipeline using nameof, a C# 6.0 feature. You might have noticed the use of magic strings to map class properties to features in various code samples on GitHub and MSDN; personally, I find this error-prone compared to the strongly typed approach.
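To illustrate the difference between a magic string and nameof, here is a minimal sketch. The trimmed-down EmploymentHistory class is a stand-in for illustration only, and the NameofDemo class is hypothetical; neither is the sample project's actual code.

```csharp
using System;

// Trimmed-down stand-in for the chapter's EmploymentHistory class
// (assumption: the real class has more properties than shown here).
public class EmploymentHistory
{
    public float DurationInMonths { get; set; }
    public float YearsExperience { get; set; }
}

public static class NameofDemo
{
    // A magic string: if the property is renamed, this value silently goes stale.
    public const string MagicString = "YearsExperience";

    // nameof is evaluated at compile time: if the property is renamed and this
    // expression is not updated, the code no longer compiles.
    public const string StronglyTyped = nameof(EmploymentHistory.YearsExperience);

    public static void Main()
    {
        Console.WriteLine(StronglyTyped);
        Console.WriteLine(MagicString == StronglyTyped);
    }
}
```

Both constants hold the same text today; the difference only shows up when the class changes, which is when a stale magic string would otherwise surface as a runtime error.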
For every property, we call the NormalizeMeanVariance transform method, which, as the name implies, normalizes the input data using its mean and variance. Conceptually, mean-variance normalization subtracts the mean of the input data from each value and divides the result by the standard deviation (the square root of the variance), so that each feature ends up on a comparable scale. The purpose is to keep features with large raw values, or rows with unusual values, from skewing the model toward an edge case at the expense of the normal range. For example, suppose the sample dataset of employment history had 20 rows and all but one of those rows had a person with 50 years of experience. The one row that didn't fit would be rescaled relative to the rest of the data rather than entering the model at its raw magnitude.
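The arithmetic behind that 20-row example can be sketched in plain C#. This is the textbook standardization formula, not ML.NET's internal implementation, whose options may change the exact output. The value of 2 years for the odd row, the use of the population standard deviation, and the MeanVarianceDemo class are all assumptions made for illustration.

```csharp
using System;
using System.Linq;

public static class MeanVarianceDemo
{
    // Textbook standardization: (x - mean) / standard deviation.
    // Assumption: population standard deviation (divide by N).
    public static double[] Standardize(double[] values)
    {
        double mean = values.Average();
        double variance = values.Select(v => (v - mean) * (v - mean)).Average();
        double stdDev = Math.Sqrt(variance);

        // A constant column has zero spread; map it to zeros to avoid dividing by zero.
        if (stdDev == 0) return values.Select(_ => 0.0).ToArray();

        return values.Select(v => (v - mean) / stdDev).ToArray();
    }

    public static void Main()
    {
        // 19 rows with 50 years of experience plus one hypothetical row with 2 years.
        double[] years = Enumerable.Repeat(50.0, 19).Append(2.0).ToArray();
        double[] scaled = Standardize(years);

        Console.WriteLine($"Typical row: {scaled[0]:0.##}");
        Console.WriteLine($"Odd row: {scaled[19]:0.##}");
    }
}
```

With these made-up numbers, the typical rows land at roughly 0.23 and the odd row at roughly -4.36, so the unusual value is expressed in units of standard deviation rather than raw years.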
In addition, note the use of the extension method referred to earlier, which simplifies the following code where we concatenate all of the feature columns:
var dataProcessPipeline = MlContext.Transforms.CopyColumns("Label", nameof(EmploymentHistory.DurationInMonths))
    .Append(MlContext.Transforms.NormalizeMeanVariance(nameof(EmploymentHistory.IsMarried)))
    .Append(MlContext.Transforms.NormalizeMeanVariance(nameof(EmploymentHistory.BSDegree)))
    .Append(MlContext.Transforms.NormalizeMeanVariance(nameof(EmploymentHistory.MSDegree)))
    .Append(MlContext.Transforms.NormalizeMeanVariance(nameof(EmploymentHistory.YearsExperience)))
    .Append(MlContext.Transforms.NormalizeMeanVariance(nameof(EmploymentHistory.AgeAtHire)))
    .Append(MlContext.Transforms.NormalizeMeanVariance(nameof(EmploymentHistory.HasKids)))
    .Append(MlContext.Transforms.NormalizeMeanVariance(nameof(EmploymentHistory.WithinMonthOfVesting)))
    .Append(MlContext.Transforms.NormalizeMeanVariance(nameof(EmploymentHistory.DeskDecorations)))
    .Append(MlContext.Transforms.NormalizeMeanVariance(nameof(EmploymentHistory.LongCommute)))
    .Append(MlContext.Transforms.Concatenate("Features",
        typeof(EmploymentHistory).ToPropertyList<EmploymentHistory>(nameof(EmploymentHistory.DurationInMonths))));
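The ToPropertyList extension method is not shown in this section. Judging only from how it is called above, it could look something like the following reflection-based sketch. The body is a hypothetical reconstruction, and the three-property EmploymentHistory class is a stand-in trimmed for brevity.

```csharp
using System;
using System.Linq;

// Stand-in with three of the chapter's properties (assumption: trimmed for brevity).
public class EmploymentHistory
{
    public float DurationInMonths { get; set; }
    public float IsMarried { get; set; }
    public float YearsExperience { get; set; }
}

public static class ExtensionMethods
{
    // Hypothetical reconstruction: returns the names of all public properties
    // on the given type except the one used as the label. The type parameter T
    // is kept only to match the call shape used in the pipeline code.
    public static string[] ToPropertyList<T>(this Type objType, string labelName) =>
        objType.GetProperties()
            .Where(p => p.Name != labelName)
            .Select(p => p.Name)
            .ToArray();
}

public static class ToPropertyListDemo
{
    public static void Main()
    {
        string[] features = typeof(EmploymentHistory)
            .ToPropertyList<EmploymentHistory>(nameof(EmploymentHistory.DurationInMonths));

        // Prints the feature column names, with the label excluded.
        Console.WriteLine(string.Join(", ", features));
    }
}
```

A helper like this means a newly added property is picked up as a feature without touching the Concatenate call. Note that reflection does not guarantee the order in which properties are returned, so code should not rely on a particular column order.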
We can then create the Sdca (Stochastic Dual Coordinate Ascent) trainer, passing the default column names ("Label" and "Features"):
var trainer = MlContext.Regression.Trainers.Sdca(labelColumnName: "Label", featureColumnName: "Features");
Lastly, we call the Regression.Evaluate method to obtain regression-specific metrics, followed by a Console.WriteLine call to write these metrics to the console. We will go into detail about what each of these means in the last section of this chapter:
var modelMetrics = MlContext.Regression.Evaluate(testSetTransform);
// The 0.## format keeps a leading zero for values below 1 (for example, 0.43 rather than .43)
Console.WriteLine($"Loss Function: {modelMetrics.LossFunction:0.##}{Environment.NewLine}" +
    $"Mean Absolute Error: {modelMetrics.MeanAbsoluteError:0.##}{Environment.NewLine}" +
    $"Mean Squared Error: {modelMetrics.MeanSquaredError:0.##}{Environment.NewLine}" +
    $"RSquared: {modelMetrics.RSquared:0.##}{Environment.NewLine}" +
    $"Root Mean Squared Error: {modelMetrics.RootMeanSquaredError:0.##}");
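As a rough preview of what most of these numbers measure, the following sketch computes four of the metrics by hand using their textbook definitions. It does not call ML.NET. The sample durations, the RegressionMetricsDemo class, and its method names are made up for illustration.

```csharp
using System;
using System.Linq;

public static class RegressionMetricsDemo
{
    // Average size of the errors, ignoring their sign.
    public static double MeanAbsoluteError(double[] actual, double[] predicted) =>
        actual.Zip(predicted, (a, p) => Math.Abs(a - p)).Average();

    // Average of the squared errors; large misses are penalized more heavily.
    public static double MeanSquaredError(double[] actual, double[] predicted) =>
        actual.Zip(predicted, (a, p) => (a - p) * (a - p)).Average();

    // Square root of the mean squared error, back in the label's own units.
    public static double RootMeanSquaredError(double[] actual, double[] predicted) =>
        Math.Sqrt(MeanSquaredError(actual, predicted));

    // R-squared: 1 - (residual sum of squares / total sum of squares).
    public static double RSquared(double[] actual, double[] predicted)
    {
        double mean = actual.Average();
        double residual = actual.Zip(predicted, (a, p) => (a - p) * (a - p)).Sum();
        double total = actual.Sum(a => (a - mean) * (a - mean));
        return 1.0 - residual / total;
    }

    public static void Main()
    {
        // Made-up employment durations in months, purely for illustration.
        double[] actual = { 10, 20, 30, 40 };
        double[] predicted = { 12, 18, 33, 37 };

        Console.WriteLine($"Mean Absolute Error: {MeanAbsoluteError(actual, predicted):0.##}");
        Console.WriteLine($"Mean Squared Error: {MeanSquaredError(actual, predicted):0.##}");
        Console.WriteLine($"Root Mean Squared Error: {RootMeanSquaredError(actual, predicted):0.##}");
        Console.WriteLine($"RSquared: {RSquared(actual, predicted):0.##}");
    }
}
```

For these four made-up rows, the errors are 2, 2, 3, and 3 months, giving a mean absolute error of 2.5, a mean squared error of 6.5, a root mean squared error of about 2.55, and an R-squared of about 0.95.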