- Test-Driven Machine Learning
- Justin Bozonier
Different approaches to validating the improved models
Model quality validation, of course, depends on the kinds of models you're building and on their purpose. There are a few general types of machine learning problems that I've covered in this book, and each has different ways of validating model quality.
Classification overview
We'll get to the specifics in just a moment, but let's review the high-level terms. One method for quantifying the quality of a supervised classifier is the ROC curve. ROC curves can be summarized by computing the total area under the curve (AUC), by finding the location of the inflection point, or simply by setting a limit on how much of the data must be classified correctly at a given false positive rate.
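As a minimal sketch of turning AUC into a test, assuming scikit-learn is available (the synthetic data, the logistic regression model, and the 0.80 threshold are illustrative assumptions, not prescriptions):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def test_classifier_auc_meets_minimum():
    # Synthetic data stands in for your real training and validation sets.
    X, y = make_classification(n_samples=500, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    model = LogisticRegression().fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]

    # Fail the build if the area under the ROC curve drops below our limit.
    assert roc_auc_score(y_test, scores) >= 0.80
```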
Another common technique is the confusion matrix. Limits can be set on particular cells of the matrix to help drive testing, and the matrix also serves as a diagnostic tool that helps identify which classes the model is confusing.
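For example, a test might cap a single off-diagonal cell, such as the number of false negatives. This sketch assumes scikit-learn; the labels and predictions are placeholders for whatever your own model produces:

```python
from sklearn.metrics import confusion_matrix


def test_false_negatives_stay_below_limit():
    # y_test and predictions would come from your own model; these arrays
    # are placeholders for illustration only.
    y_test = [1, 1, 0, 1, 0, 0, 1, 0]
    predictions = [1, 0, 0, 1, 0, 1, 1, 0]

    # Rows are actual classes, columns are predicted classes.
    matrix = confusion_matrix(y_test, predictions)
    false_negatives = matrix[1][0]  # actual positive, predicted negative

    assert false_negatives <= 1
```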
We will typically use k-fold cross-validation. Cross-validation is a technique where we take our sample dataset and divide it into several separate datasets. We can then develop against one of them, use another to validate that our model isn't overfitted, and keep a third for a final check on whether the first two steps went well. All of these separate datasets work to make sure that we develop a generally applicable model, and not just one that predicts our training data well but falls apart in production.
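A hedged sketch of this idea using scikit-learn's cross-validation helpers (the model, the fold count, and the 0.75 floor are assumptions for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def test_model_generalizes_across_folds():
    X, y = make_classification(n_samples=500, random_state=0)

    # Five folds: each fold takes a turn as the held-out validation set.
    scores = cross_val_score(LogisticRegression(), X, y, cv=5)

    # Every fold should clear the bar; a single bad fold hints at overfitting
    # or at a model that only works on part of the data.
    assert scores.min() >= 0.75
```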
Regression
Linear regression quality is typically quantified with a combination of the adjusted R² value and a check that the residuals of the model don't follow a pattern. How do we check for this in an automated test?
The adjusted R² value is provided by most statistical tools. It's a quick measure of how much of the variation in the data is explained by your model. Checking model assumptions is more difficult: it is much easier to see patterns in the residuals visually than via discrete, specific tests.
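For instance, statsmodels exposes the adjusted R² on a fitted OLS result, so a simple guard might look like this (the synthetic data and the 0.6 floor are illustrative assumptions):

```python
import numpy as np
import statsmodels.api as sm


def test_adjusted_r_squared_meets_floor():
    # Synthetic linear data stands in for your real observations.
    rng = np.random.default_rng(0)
    x = rng.normal(size=200)
    y = 3.0 * x + rng.normal(scale=0.5, size=200)

    # statsmodels expects an explicit intercept column.
    X = sm.add_constant(x)
    results = sm.OLS(y, X).fit()

    assert results.rsquared_adj >= 0.6
```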
So this is hard, but there are other tests, perhaps even more important ones, that are easier: cross-validation. By selecting strong test datasets that exhibit a litany of misbehavior, we can compare R² statistics from development, to testing, to production readiness. If a serious drop occurs at any point, we can circle back.
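One hedged way to encode that comparison: score the same fitted model on the development split and on a held-out split, and fail if the R² drops too far (the data, the split sizes, and the allowed 0.1 drop are assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def test_r_squared_does_not_collapse_on_holdout():
    rng = np.random.default_rng(1)
    X = rng.normal(size=(300, 3))
    y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=300)

    X_dev, X_holdout, y_dev, y_holdout = train_test_split(
        X, y, test_size=0.3, random_state=1)

    model = LinearRegression().fit(X_dev, y_dev)

    dev_r2 = model.score(X_dev, y_dev)              # R² on development data
    holdout_r2 = model.score(X_holdout, y_holdout)  # R² on unseen data

    # Circle back if the holdout score falls well below the development score.
    assert dev_r2 - holdout_r2 <= 0.1
```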
Clustering
Clustering is the way in which we create our classification model. From there, we can test it by cross-validating against our data. This is especially useful with clustering algorithms such as k-means, where the feedback can help us tune the number of clusters so that we minimize the within-cluster variation. As we move from one cross-validation dataset to another, it's important to remember not to carry over our training data from the previous tests, lest we bias our results.
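A minimal sketch of exploring the number of clusters this way with scikit-learn's KMeans (the synthetic blobs and the range of candidate k values are assumptions for illustration). We fit on one split and score the held-out split, where `score` returns the negative within-cluster sum of squares:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split


def held_out_cluster_variation():
    # Synthetic blobs stand in for real, unlabeled data.
    X, _ = make_blobs(n_samples=600, centers=4, random_state=0)
    X_train, X_validate = train_test_split(X, test_size=0.3, random_state=0)

    variation_by_k = {}
    for k in range(2, 8):
        # Retrain from scratch for every candidate k; carrying a fit over
        # from a previous split would bias the comparison.
        model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_train)

        # score() is the negative within-cluster sum of squares on the
        # held-out split; negate it to report the variation itself.
        variation_by_k[k] = -model.score(X_validate)

    return variation_by_k
```

Looking at where the held-out variation stops improving as k grows gives a defensible choice for the number of clusters, and a test can then pin that choice down once it has been made.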