- Mastering Machine Learning on AWS
- Dr. Saket S.R. Mengle, Maximo Gurmendez
Understanding linear regression
Regression algorithms are an important part of a data scientist's toolkit because they can be used for a variety of non-binary prediction tasks. The linear regression algorithm models the relationship between a dependent variable that we are trying to predict and a vector of independent variables. This vector of variables is also called the regressor in the context of regression algorithms. Linear regression assumes that there is a linear relationship between the vector of independent variables and the dependent variable that we are trying to predict. Hence, a linear regression model learns the unknown coefficients of a linear function from the training data so that the linear function fits the training data as closely as possible.
Linear regression can be applied in cases where the goal is to predict or forecast the dependent variable based on the regressor variables. We will use an example to explain how a linear regression model is trained from data.
The following table shows a sample dataset where the goal is to predict the price of a house based on three variables:

[Table: sample house data with columns Floor Size, Number of Bedrooms, Number of Bathrooms, and House Price]
In this dataset, the Floor Size, Number of Bedrooms, and Number of Bathrooms variables are treated as the independent variables of the linear regression. Our goal is to predict the House Price value based on these variables.
Let's simplify this problem by considering only the Floor Size variable to predict the house price. A linear regression model built from a single variable, or regressor, is referred to as simple linear regression. If we create a scatterplot of these two variables, we can see that there is a relationship between them:

[Figure: scatterplot of Floor Size versus House Price]
Although there is no exact linear relationship between the two variables, we can draw an approximate line that represents the trend. The aim of the modeling algorithm is to minimize the error between this line and the observed data points.
As we know, a straight line can be represented by the following equation:

$$y = \beta_0 + \beta_1 x$$

Hence, the approximately linear relationship in the preceding diagram can also be represented using the same formula, and the task of the linear regression model is to learn the values of $\beta_0$ and $\beta_1$. Moreover, since we know that the relationship between the predicted variable and the regressor is not strictly linear, we can add a random error term, $\epsilon$, to the equation to model the noise in the dataset. The simple linear regression model is therefore represented by the following formula:

$$y = \beta_0 + \beta_1 x + \epsilon$$
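To make this concrete, here is a minimal sketch that fits a simple linear regression on a handful of made-up floor-size/price pairs using scikit-learn (one convenient library choice; the data values are purely hypothetical). The learned `intercept_` and `coef_` correspond to $\beta_0$ and $\beta_1$.

```python
# Simple linear regression sketch: Floor Size -> House Price.
# The data values below are hypothetical and only for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

floor_size = np.array([[1100], [1400], [1600], [1900], [2400], [2900]])   # sq. ft.
house_price = np.array([199000, 245000, 312000, 370000, 440000, 540000])  # USD

model = LinearRegression()
model.fit(floor_size, house_price)

print("beta_0 (intercept):", model.intercept_)
print("beta_1 (slope):", model.coef_[0])
print("Predicted price for 2,000 sq. ft.:", model.predict([[2000]])[0])
```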
Now, let's consider a dataset with multiple regressors. Instead of representing the linear relationship between just two variables, $x$ and $y$, we will represent the set of regressors as $x_1, x_2, \ldots, x_n$. We will assume that the relationship between the dependent variable $y$ and the regressors $x_1, x_2, \ldots, x_n$ is linear. Thus, a linear regression model with multiple regressors is represented by the following formula:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \epsilon$$
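As a sketch of the multiple-regressor case, the example below learns one coefficient per feature (Floor Size, Number of Bedrooms, Number of Bathrooms) on made-up data; again, scikit-learn and the numbers used are illustrative assumptions rather than the book's own example.

```python
# Multiple linear regression sketch on hypothetical house data.
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row: [floor size (sq. ft.), bedrooms, bathrooms]
X = np.array([
    [1100, 2, 1],
    [1400, 3, 2],
    [1600, 3, 2],
    [1900, 4, 2],
    [2400, 4, 3],
    [2900, 5, 3],
])
y = np.array([199000, 245000, 312000, 370000, 440000, 540000])  # prices (USD)

model = LinearRegression().fit(X, y)

print("beta_0:", model.intercept_)         # intercept
print("beta_1..beta_n:", model.coef_)      # one coefficient per regressor
print("Prediction for [2000, 3, 2]:", model.predict([[2000, 3, 2]])[0])
```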
Linear regression, as we've already discussed, assumes that there is a linear relationship between the regressors and the dependent variable that we are trying to predict. This is an important assumption that may not hold true in all datasets. For a data scientist, linear regression is attractive because of its fast training time. However, if the dataset's variables do not have a linear relationship with the dependent variable, the model may produce significant errors. In such cases, data scientists may also try algorithms such as Bernoulli regression, Poisson regression, or multinomial regression to improve prediction accuracy. We will also discuss logistic regression later in this chapter, which is used when the dependent variable is binary.
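One quick way to see why the linearity assumption matters is to fit a linear model to data that is clearly not linear and compare the fit quality. The sketch below uses synthetic data and the R² score as a rough fit measure; both the data and the choice of metric are illustrative assumptions.

```python
# Contrast a linear fit on linear vs. clearly nonlinear synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=(200, 1))
y_linear = 3 * x[:, 0] + rng.normal(0, 1, 200)             # linear relationship
y_nonlinear = (x[:, 0] - 5) ** 2 + rng.normal(0, 1, 200)   # U-shaped relationship

for name, y in [("linear data", y_linear), ("nonlinear data", y_nonlinear)]:
    model = LinearRegression().fit(x, y)
    print(name, "R^2:", round(r2_score(y, model.predict(x)), 3))
```

On the U-shaped data the R² score collapses, even though the same algorithm fits the linear data almost perfectly.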
During the training phase, linear regression can use various techniques for parameter estimation to learn the values of $\beta_0$, $\beta_1, \ldots$, and $\beta_n$. We will not go into the details of these techniques in this book. However, we recommend that you try using these parameter estimation techniques in the examples that follow and observe their effect on the training time of the algorithm and the accuracy of prediction.
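For instance, two common estimation routes are the closed-form ordinary least squares solution and iterative (stochastic) gradient descent. The sketch below contrasts them on synthetic data using scikit-learn's `LinearRegression` and `SGDRegressor`; the dataset, scaling step, and hyperparameters are all illustrative assumptions.

```python
# Compare closed-form least squares with stochastic gradient descent.
import numpy as np
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(1000, 3))
true_beta = np.array([2.0, -1.0, 0.5])
y = 4.0 + X @ true_beta + rng.normal(0, 1, 1000)

# Scale features so the two estimators' coefficients are directly comparable
X_scaled = StandardScaler().fit_transform(X)

ols = LinearRegression().fit(X_scaled, y)   # closed-form least squares solution
sgd = SGDRegressor(max_iter=1000, tol=1e-4, random_state=0).fit(X_scaled, y)

print("OLS coefficients:", ols.intercept_, ols.coef_)
print("SGD coefficients:", sgd.intercept_, sgd.coef_)
```

Both estimators should recover nearly the same coefficients on this small, well-behaved dataset; the interesting differences show up in training time and stability as datasets grow.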
To fit a linear model to the data, we first need to be able to determine how well a linear model fits the data. Various techniques have been developed for parameter estimation in linear regression. Parameter estimation is the process of estimating the values of $\beta_0$, $\beta_1, \ldots$, and $\beta_n$. In the following sections, we will briefly explain the following estimation techniques.