- Hands-On Machine Learning with ML.NET
- Jarred Capellman
- 425 words
- 2021-06-24 16:43:33
The Trainer class
Inside the Trainer class, a large portion of the code was rewritten to handle the expanded set of features and to evaluate a regression algorithm rather than the binary classification model we looked at in Chapter 2, Setting Up the ML.NET Environment.
The first change is the use of a comma to separate the data, rather than the default tab separator we used in Chapter 2, Setting Up the ML.NET Environment:
var trainingDataView = MlContext.Data.LoadFromTextFile<EmploymentHistory>(trainingFileName, ',');
The next change is in the pipeline creation itself. In our first application, we had a label and fed that straight into the pipeline. In this application, we use nine features to predict the duration of a person's employment, held in the DurationInMonths property, and we append each of them to the pipeline using nameof, a C# 6.0 feature. You might have noticed the use of magic strings to map class properties to features in various code samples on GitHub and MSDN; personally, I find magic strings more error-prone than the strongly typed approach.
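To make the contrast with magic strings concrete, here is a minimal sketch. The EmploymentHistorySketch class and the NameofDemo class are hypothetical stand-ins introduced only for illustration; they are not the chapter's actual types.

```csharp
using System;

// Hypothetical, trimmed-down stand-in for the chapter's EmploymentHistory class.
public class EmploymentHistorySketch
{
    public float DurationInMonths { get; set; }
    public float YearsExperience { get; set; }
}

public static class NameofDemo
{
    // A magic string: it still compiles if the property is renamed later,
    // and the mismatch only surfaces at runtime.
    public const string MagicString = "YearsExperience";

    // nameof is resolved at compile time, so renaming the property without
    // updating this expression breaks the build instead.
    public static string StronglyTyped() =>
        nameof(EmploymentHistorySketch.YearsExperience);

    public static void Main()
    {
        // Both approaches yield the same column name today.
        Console.WriteLine(StronglyTyped() == MagicString);
    }
}
```

The two values are identical at runtime; the difference lies only in when a rename mistake would be caught.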
For every property, we call the NormalizeMeanVariance transform method, which, as the name implies, normalizes the input data using both its mean and its variance. ML.NET computes this by subtracting the mean of the input data from each value and dividing the result by the variance of the input data. The purpose is to reduce the influence of outliers in the input data so that the model isn't skewed toward an edge case at the expense of the normal range. For example, suppose the sample dataset of employment history had 20 rows, and all but one of those rows described a person with 50 years of experience. The one row that didn't fit would be rescaled relative to the rest of the data so that it sits closer to the range of values fed into the model.
In addition, note the use of the extension method referred to earlier, which simplifies the following code when we concatenate all of the feature columns:
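The arithmetic described above can be sketched in plain C#, without ML.NET. This follows the paragraph's description literally (subtract the mean, divide by the variance); the library's actual implementation may differ in detail, and the outlier value of 5 years is an assumption, since the example does not say what the odd row contains.

```csharp
using System;
using System.Linq;

public static class MeanVarianceSketch
{
    // Subtracts the mean and divides by the (population) variance,
    // as the surrounding text describes. Not ML.NET's own code.
    public static double[] Normalize(double[] values)
    {
        double mean = values.Average();
        double variance = values.Select(v => (v - mean) * (v - mean)).Average();

        // A constant column has zero variance; map it to zeros to avoid dividing by zero.
        if (variance == 0)
        {
            return values.Select(_ => 0.0).ToArray();
        }

        return values.Select(v => (v - mean) / variance).ToArray();
    }

    public static void Main()
    {
        // 19 rows with 50 years of experience plus one assumed outlier row with 5 years.
        double[] years = Enumerable.Repeat(50.0, 19).Append(5.0).ToArray();
        double[] normalized = Normalize(years);

        Console.WriteLine($"Typical row: {normalized[0]:0.####}");
        Console.WriteLine($"Outlier row: {normalized[19]:0.####}");
    }
}
```

After the transform, the values are centered on zero, with the typical rows slightly above it and the outlier row below it.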
var dataProcessPipeline = MlContext.Transforms.CopyColumns("Label", nameof(EmploymentHistory.DurationInMonths))
    .Append(MlContext.Transforms.NormalizeMeanVariance(nameof(EmploymentHistory.IsMarried)))
    .Append(MlContext.Transforms.NormalizeMeanVariance(nameof(EmploymentHistory.BSDegree)))
    .Append(MlContext.Transforms.NormalizeMeanVariance(nameof(EmploymentHistory.MSDegree)))
    .Append(MlContext.Transforms.NormalizeMeanVariance(nameof(EmploymentHistory.YearsExperience)))
    .Append(MlContext.Transforms.NormalizeMeanVariance(nameof(EmploymentHistory.AgeAtHire)))
    .Append(MlContext.Transforms.NormalizeMeanVariance(nameof(EmploymentHistory.HasKids)))
    .Append(MlContext.Transforms.NormalizeMeanVariance(nameof(EmploymentHistory.WithinMonthOfVesting)))
    .Append(MlContext.Transforms.NormalizeMeanVariance(nameof(EmploymentHistory.DeskDecorations)))
    .Append(MlContext.Transforms.NormalizeMeanVariance(nameof(EmploymentHistory.LongCommute)))
    .Append(MlContext.Transforms.Concatenate("Features",
        typeof(EmploymentHistory).ToPropertyList<EmploymentHistory>(nameof(EmploymentHistory.DurationInMonths))));
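For readers who do not have the earlier definition at hand, the following is a sketch of what such an extension method might look like, assuming its job is to return every property name except the label. The body and the EmploymentRowSketch class are assumptions made for illustration; only the call shape (a Type receiver, a generic argument, and the label name) is taken from the pipeline code above.

```csharp
using System;
using System.Linq;

// Hypothetical, trimmed-down stand-in for the chapter's EmploymentHistory class.
public class EmploymentRowSketch
{
    public float DurationInMonths { get; set; }
    public float IsMarried { get; set; }
    public float YearsExperience { get; set; }
}

public static class PropertyListSketch
{
    // Assumed behavior: list the public property names of the type,
    // skipping the one used as the label. The type parameter T is kept
    // only so the call shape matches the pipeline code; it is unused here.
    public static string[] ToPropertyList<T>(this Type objType, string labelName) =>
        objType.GetProperties()
               .Where(property => property.Name != labelName)
               .Select(property => property.Name)
               .ToArray();

    public static void Main()
    {
        string[] features = typeof(EmploymentRowSketch)
            .ToPropertyList<EmploymentRowSketch>(nameof(EmploymentRowSketch.DurationInMonths));

        // Prints the feature column names, with the label excluded.
        Console.WriteLine(string.Join(", ", features));
    }
}
```

With a helper of this kind, adding a feature property to the data class adds it to the Features column without any change to the Concatenate call.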
We can then create the Sdca trainer using the default column names ("Label" and "Features"):
var trainer = MlContext.Regression.Trainers.Sdca(labelColumnName: "Label", featureColumnName: "Features");
Lastly, we call the Regression.Evaluate method to obtain regression-specific metrics, followed by a Console.WriteLine call to write these metrics to the console output. We will go into detail about what each of these means in the last section of this chapter:
var modelMetrics = MlContext.Regression.Evaluate(testSetTransform);
Console.WriteLine($"Loss Function: {modelMetrics.LossFunction:0.##}{Environment.NewLine}" +
$"Mean Absolute Error: {modelMetrics.MeanAbsoluteError:#.##}{Environment.NewLine}" +
$"Mean Squared Error: {modelMetrics.MeanSquaredError:#.##}{Environment.NewLine}" +
$"RSquared: {modelMetrics.RSquared:0.##}{Environment.NewLine}" +
$"Root Mean Squared Error: {modelMetrics.RootMeanSquaredError:#.##}");
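As a rough guide to what four of these numbers represent, the sketch below computes them by hand using their standard textbook definitions. It is not ML.NET's implementation, the RegressionMetricsSketch class and the sample values are hypothetical, and the loss function metric is left out because it depends on the loss the trainer is configured with.

```csharp
using System;
using System.Linq;

public static class RegressionMetricsSketch
{
    // Standard definitions: MAE, MSE, RMSE, and R-squared (1 - SS_res / SS_tot).
    public static (double Mae, double Mse, double Rmse, double RSquared) Compute(
        double[] actual, double[] predicted)
    {
        if (actual.Length == 0 || actual.Length != predicted.Length)
        {
            throw new ArgumentException("Inputs must be non-empty and of equal length.");
        }

        double[] errors = actual.Zip(predicted, (a, p) => a - p).ToArray();

        double mae = errors.Average(e => Math.Abs(e));
        double mse = errors.Average(e => e * e);
        double rmse = Math.Sqrt(mse);

        // R-squared compares the model's squared error with that of always
        // predicting the mean; it is undefined when all actual values are equal.
        double mean = actual.Average();
        double totalSumSquares = actual.Sum(a => (a - mean) * (a - mean));
        double residualSumSquares = errors.Sum(e => e * e);
        double rSquared = 1 - residualSumSquares / totalSumSquares;

        return (mae, mse, rmse, rSquared);
    }

    public static void Main()
    {
        // Made-up employment durations in months, purely for illustration.
        double[] actual = { 10, 20, 30 };
        double[] predicted = { 12, 18, 30 };

        var metrics = Compute(actual, predicted);
        Console.WriteLine($"MAE: {metrics.Mae:0.##}, MSE: {metrics.Mse:0.##}, " +
                          $"RMSE: {metrics.Rmse:0.##}, RSquared: {metrics.RSquared:0.##}");
    }
}
```

Lower error values and an R-squared closer to 1 indicate predictions that track the actual values more closely.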