- Mastering Machine Learning with R (Second Edition)
- Cory Lesmeister
Advanced Feature Selection in Linear Models
"I found that math got to be too abstract for my liking and computer science seemed concerned with little details--trying to save a microsecond or a kilobyte in a computation. In statistics I found a subject that combined the beauty of both math and computer science, using them to solve real-world problems."
Rob Tibshirani, Professor, Stanford University, quoted from his research page at:
https://statweb.stanford.edu/~tibs/research_page.html.
So far, we've examined the use of linear models for both quantitative and qualitative outcomes, with an emphasis on feature selection, that is, the techniques used to exclude useless or unwanted predictor variables. We saw that linear models can be quite effective in machine learning problems. However, newer techniques developed and refined over the last couple of decades can improve predictive ability and interpretability above and beyond the linear models discussed in the preceding chapters. In this day and age, many datasets have a large number of features relative to the number of observations, a situation known as high dimensionality. If you've ever worked on a genomics problem, this will quickly become self-evident. Additionally, given the size of the data we are being asked to work with, techniques such as best subsets or stepwise feature selection can take inordinate amounts of time to converge, even on high-speed computers. I'm not talking about minutes: in many cases, hours of system time are required to arrive at a best subsets solution.
There is a better way in these cases. In this chapter, we will look at the concept of regularization, where the coefficients are constrained or shrunk towards zero. There are a number of regularization methods and permutations thereof, but we will focus on ridge regression, Least Absolute Shrinkage and Selection Operator (LASSO), and finally elastic net, which combines the benefits of both techniques into one.
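As a preview of how these three techniques relate, here is a minimal sketch using the `glmnet` package, one widely used R implementation of penalized regression (the simulated data and parameter choices here are illustrative assumptions, not taken from the chapter's case study):

```r
# glmnet's alpha parameter mixes the two penalty types:
#   alpha = 0      -> ridge       (L2 penalty: shrinks coefficients toward zero)
#   alpha = 1      -> LASSO       (L1 penalty: can set coefficients exactly to zero)
#   0 < alpha < 1  -> elastic net (a blend of the two penalties)
library(glmnet)  # install.packages("glmnet") if needed

set.seed(123)
x <- matrix(rnorm(100 * 20), nrow = 100)      # 100 observations, 20 features
y <- 2 * x[, 1] - 1 * x[, 2] + rnorm(100)     # only 2 features truly matter

ridge <- cv.glmnet(x, y, alpha = 0)           # cross-validated ridge
lasso <- cv.glmnet(x, y, alpha = 1)           # cross-validated LASSO
enet  <- cv.glmnet(x, y, alpha = 0.5)         # cross-validated elastic net

# LASSO's L1 penalty should zero out most of the 18 irrelevant coefficients,
# performing feature selection as part of the fit
coef(lasso, s = "lambda.1se")
```

Note how `cv.glmnet` uses cross-validation to choose the shrinkage penalty lambda, so the amount of regularization is data-driven rather than hand-tuned.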