- Statistics for Machine Learning
- Pratap Dangeti
- 330 words
- 2021-07-02 19:05:58
Comparison between regression and machine learning models
Linear regression and machine learning models address the same problem in different ways. Take the simple example of fitting a plane to data with two explanatory variables: a regression model fits the best possible hyperplane by minimizing the errors between the hyperplane and the actual observations. In machine learning, the same problem is recast as an optimization problem in which the errors are modeled in squared form and minimized by iteratively adjusting the weights.
In statistical modeling, samples are drawn from the population and the model is fitted on the sampled data. In machine learning, however, even a small batch, such as 30 observations, is enough to update the weights at the end of each iteration; in some cases, such as online learning, the model is updated with a single observation at a time.
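The two views described above can be put side by side in a short sketch. It assumes synthetic data, a closed-form least-squares fit via NumPy for the regression view, and plain batch gradient descent on the mean squared error for the machine learning view; the variable names, learning rate, and iteration count are illustrative choices, not taken from the book.

```python
import numpy as np

# Synthetic data (assumed for illustration): y depends linearly on two variables.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
y = 1.5 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=n)

# Design matrix with an intercept column.
A = np.column_stack([np.ones(n), X])

# Regression view: solve the least-squares problem directly.
beta_ols, *_ = np.linalg.lstsq(A, y, rcond=None)

# Machine learning view: minimize the mean squared error by
# repeatedly nudging the weights against the gradient.
w_gd = np.zeros(3)
learning_rate = 0.1  # illustrative value
for _ in range(2000):
    residual = A @ w_gd - y
    gradient = 2.0 / n * A.T @ residual
    w_gd -= learning_rate * gradient

print("least squares   :", beta_ols)
print("gradient descent:", w_gd)
```

On a well-conditioned problem like this one, both routes should arrive at essentially the same plane; the difference lies in how the solution is reached, not in what is being minimized.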
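As a rough illustration of updating weights from small batches, the sketch below runs stochastic gradient descent on a linear model twice: once with batches of 30 observations and once with a single observation per update, as in online learning. The helper `sgd_linear`, the synthetic data, and the learning rates are hypothetical and chosen only for this example.

```python
import numpy as np

def sgd_linear(A, y, batch_size, lr, epochs, seed=0):
    """Fit linear weights by stochastic gradient descent on squared error.

    Hypothetical helper: the weights are updated after every batch of
    `batch_size` rows rather than after a full pass over the data.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle the rows each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            residual = A[idx] @ w - y[idx]
            gradient = 2.0 / len(idx) * A[idx].T @ residual
            w -= lr * gradient
    return w

# Synthetic data (assumed for illustration).
rng = np.random.default_rng(1)
n = 300
true_w = np.array([1.5, 2.0, -3.0])
A = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = A @ true_w + rng.normal(scale=0.1, size=n)

# Mini-batches of 30 observations per weight update.
w_minibatch = sgd_linear(A, y, batch_size=30, lr=0.05, epochs=50)

# Online-style learning: one observation per weight update.
w_online = sgd_linear(A, y, batch_size=1, lr=0.01, epochs=20)

print("mini-batch (30):", w_minibatch)
print("online (1)     :", w_online)
```

A smaller learning rate is used for the single-observation case because each update is based on a noisier gradient estimate.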

Machine learning models can be parallelized effectively and run across multiple machines, with the model weights broadcast to each machine. These techniques are implemented for big data workloads in Spark.
Statistical models are parametric in nature: a model has parameters on which diagnostics are performed to check its validity. Many machine learning models, by contrast, are non-parametric: they do not assume a fixed set of parameters or a predefined curve shape. Instead, they learn from the provided data and arrive at complex, intricate functions rather than fitting a predefined function.
Multi-collinearity checks are required in statistical modeling. In machine learning, by contrast, the weights adjust automatically to compensate for multi-collinearity. For tree-based ensemble methods such as bagging, random forest, and boosting, multi-collinearity is not a concern at all, because the underlying model is a decision tree, which does not suffer from multi-collinearity in the first place.
With the evolution of big data and distributed parallel computing, more complex models are producing state-of-the-art results that were impossible with earlier technology.
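One common way to carry out the multi-collinearity check mentioned above is the variance inflation factor (VIF); the passage does not name a specific diagnostic, so VIF is an assumption here. The sketch below computes it with NumPy only, on synthetic data in which two of the three variables are nearly identical. The function name `variance_inflation_factors` is hypothetical.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return the VIF of each column of X (hypothetical helper).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j on all the other columns plus an intercept.
    """
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        residual = target - others @ coef
        total = np.sum((target - target.mean()) ** 2)
        r_squared = 1.0 - (residual @ residual) / total
        vifs[j] = 1.0 / (1.0 - r_squared)
    return vifs

# Synthetic data (assumed): x2 is almost a copy of x1, x3 is independent.
rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)
x3 = rng.normal(size=n)

vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print("VIF per variable:", vifs)
```

A large VIF for a variable indicates that it is almost a linear combination of the others; thresholds such as 5 or 10 are widely used rules of thumb rather than fixed rules.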