
The Trainer class

Inside the Trainer class, a large portion was rewritten to handle the expanded set of features and to evaluate a regression algorithm, as opposed to the binary classification we looked at in Chapter 2, Setting Up the ML.NET Environment.

The first change is the use of a comma to separate the data, as opposed to the default tab we used in Chapter 2, Setting Up the ML.NET Environment:

var trainingDataView = MlContext.Data.LoadFromTextFile<EmploymentHistory>(trainingFileName, ',');

The next change is in the pipeline creation itself. In our first application, we had a label and fed that straight into the pipeline. In this application, we use nine features to predict the duration of a person's employment (the DurationInMonths property), and we append each one of them to the pipeline using nameof, a C# 6.0 feature. You might have noticed the use of magic strings to map class properties to features in various code samples on GitHub and MSDN; personally, I find this error-prone compared to the strongly typed approach.

For every property, we call the NormalizeMeanVariance transform method, which, as the name implies, normalizes the input data using its mean and variance. Conceptually, this means centering each value on the mean of the input data and rescaling it by the spread of that data (the standard deviation, which is the square root of the variance), so that the column ends up with roughly unit variance. The purpose is to put every feature on a comparable scale and to limit how much raw magnitudes and outliers skew the model, so it isn't tuned to an edge case at the expense of the normal range. For example, suppose the sample dataset of employment history had 20 rows and all but one of those rows had a person with 50 years of experience. The one row that didn't fit would be rescaled relative to the rest of the column rather than fed into the model as a raw value.
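
To make the arithmetic concrete, here is a minimal sketch of textbook mean-variance standardization in plain C#. It is not ML.NET's implementation (the library's transform has its own options and defaults); the MeanVarianceSketch class, the Standardize method, and the sample values are assumptions made up for illustration.

```csharp
using System;
using System.Linq;

public static class MeanVarianceSketch
{
    // Returns (x - mean) / stdDev for each value, using the population standard deviation.
    public static double[] Standardize(double[] values)
    {
        double mean = values.Average();
        double variance = values.Select(v => (v - mean) * (v - mean)).Average();
        double stdDev = Math.Sqrt(variance);

        // A constant column has zero spread; return zeros instead of dividing by zero.
        if (stdDev == 0)
        {
            return new double[values.Length];
        }

        return values.Select(v => (v - mean) / stdDev).ToArray();
    }

    public static void Main()
    {
        // Hypothetical YearsExperience column: mean 5, population standard deviation 2.
        double[] yearsExperience = { 2, 4, 4, 4, 5, 5, 7, 9 };
        double[] standardized = Standardize(yearsExperience);

        for (int i = 0; i < yearsExperience.Length; i++)
        {
            Console.WriteLine($"{yearsExperience[i]} -> {standardized[i]}");
        }
    }
}
```

With these sample values, 9 years of experience maps to 2 (two standard deviations above the mean) and 2 years maps to -1.5, so every value is expressed relative to the column as a whole rather than in raw years.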

In addition, note the use of the extension method referred to earlier, which simplifies the following code when we concatenate all of the feature columns:

// Copy the value we want to predict into the "Label" column
var dataProcessPipeline = MlContext.Transforms.CopyColumns("Label", nameof(EmploymentHistory.DurationInMonths))
    // Normalize each of the nine feature columns
    .Append(MlContext.Transforms.NormalizeMeanVariance(nameof(EmploymentHistory.IsMarried)))
    .Append(MlContext.Transforms.NormalizeMeanVariance(nameof(EmploymentHistory.BSDegree)))
    .Append(MlContext.Transforms.NormalizeMeanVariance(nameof(EmploymentHistory.MSDegree)))
    .Append(MlContext.Transforms.NormalizeMeanVariance(nameof(EmploymentHistory.YearsExperience)))
    .Append(MlContext.Transforms.NormalizeMeanVariance(nameof(EmploymentHistory.AgeAtHire)))
    .Append(MlContext.Transforms.NormalizeMeanVariance(nameof(EmploymentHistory.HasKids)))
    .Append(MlContext.Transforms.NormalizeMeanVariance(nameof(EmploymentHistory.WithinMonthOfVesting)))
    .Append(MlContext.Transforms.NormalizeMeanVariance(nameof(EmploymentHistory.DeskDecorations)))
    .Append(MlContext.Transforms.NormalizeMeanVariance(nameof(EmploymentHistory.LongCommute)))
    // Combine every property except the label into the "Features" column
    .Append(MlContext.Transforms.Concatenate("Features",
        typeof(EmploymentHistory).ToPropertyList<EmploymentHistory>(nameof(EmploymentHistory.DurationInMonths))));
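
The extension method itself is not shown in this section. As a rough sketch of how such a helper could work, the following assumes it uses reflection to return the names of all public properties except the label. The PropertyListSketch class name, the body of ToPropertyList, and the property types on the stand-in EmploymentHistory class are assumptions for illustration, not the chapter's actual code.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the chapter's EmploymentHistory class; the property
// types are assumptions made only so this sketch compiles on its own.
public class EmploymentHistory
{
    public float DurationInMonths { get; set; }
    public float IsMarried { get; set; }
    public float BSDegree { get; set; }
    public float MSDegree { get; set; }
    public float YearsExperience { get; set; }
    public float AgeAtHire { get; set; }
    public float HasKids { get; set; }
    public float WithinMonthOfVesting { get; set; }
    public float DeskDecorations { get; set; }
    public float LongCommute { get; set; }
}

public static class PropertyListSketch
{
    // Returns the names of all public properties except the label.
    // T is unused here; it is kept only so the call site matches the pipeline code.
    public static string[] ToPropertyList<T>(this Type objType, string labelName) =>
        objType.GetProperties()
               .Where(property => property.Name != labelName)
               .Select(property => property.Name)
               .ToArray();

    public static void Main()
    {
        string[] featureColumns = typeof(EmploymentHistory)
            .ToPropertyList<EmploymentHistory>(nameof(EmploymentHistory.DurationInMonths));

        // Prints the nine feature column names, separated by commas.
        Console.WriteLine(string.Join(", ", featureColumns));
    }
}
```

Deriving the list from the class keeps the Concatenate call in step with the data class: a feature property added later would be picked up without editing the pipeline, although it would still need its own normalization step.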

We can then create the Sdca trainer using the default parameters ("Label" and "Features"):

var trainer = MlContext.Regression.Trainers.Sdca(labelColumnName: "Label", featureColumnName: "Features");

Lastly, we call the Regression.Evaluate method to obtain regression-specific metrics, followed by a Console.WriteLine call to write these metrics to the console. We will go into detail about what each of these means in the last section of this chapter:

var modelMetrics = MlContext.Regression.Evaluate(testSetTransform);

Console.WriteLine($"Loss Function: {modelMetrics.LossFunction:0.##}{Environment.NewLine}" +
$"Mean Absolute Error: {modelMetrics.MeanAbsoluteError:#.##}{Environment.NewLine}" +
$"Mean Squared Error: {modelMetrics.MeanSquaredError:#.##}{Environment.NewLine}" +
$"RSquared: {modelMetrics.RSquared:0.##}{Environment.NewLine}" +
$"Root Mean Squared Error: {modelMetrics.RootMeanSquaredError:#.##}");
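
As a quick orientation before that discussion, the sketch below computes four of these metrics from their standard textbook definitions in plain C#. It is not how ML.NET computes them internally; the RegressionMetricsSketch class, the Compute method, and the sample values are hypothetical, and LossFunction is omitted because it depends on the loss the trainer is configured with.

```csharp
using System;
using System.Linq;

public static class RegressionMetricsSketch
{
    public static (double Mae, double Mse, double Rmse, double RSquared) Compute(
        double[] actual, double[] predicted)
    {
        if (actual.Length == 0 || actual.Length != predicted.Length)
        {
            throw new ArgumentException("Arrays must be non-empty and of equal length.");
        }

        // Per-row error: actual value minus predicted value.
        double[] errors = actual.Zip(predicted, (a, p) => a - p).ToArray();

        double mae = errors.Average(e => Math.Abs(e)); // Mean Absolute Error
        double mse = errors.Average(e => e * e);       // Mean Squared Error
        double rmse = Math.Sqrt(mse);                  // Root Mean Squared Error

        // R-squared compares the model's squared error with that of always predicting the mean.
        // If every actual value is identical, the denominator is zero and the result
        // is NaN or negative infinity; this sketch does not special-case that.
        double meanActual = actual.Average();
        double totalSumOfSquares = actual.Sum(a => (a - meanActual) * (a - meanActual));
        double residualSumOfSquares = errors.Sum(e => e * e);
        double rSquared = 1 - residualSumOfSquares / totalSumOfSquares;

        return (mae, mse, rmse, rSquared);
    }

    public static void Main()
    {
        // Hypothetical employment durations in months.
        double[] actual = { 3, 5, 7, 9 };
        double[] predicted = { 2, 5, 8, 11 };

        var metrics = Compute(actual, predicted);
        Console.WriteLine($"MAE: {metrics.Mae}, MSE: {metrics.Mse}, " +
                          $"RMSE: {metrics.Rmse}, RSquared: {metrics.RSquared}");
    }
}
```

For these sample values, the mean absolute error is 1, the mean squared error is 1.5, and R-squared is approximately 0.7.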