Let's say you have a list of paired data points $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$.
You want to find out whether there is a linear relationship between $x$ and $y$.
In the simplest model of linear regression, there is a linear relationship between the independent variable (also known as the predictor variable) and the dependent variable (also known as the predicted or target variable). The independent variable is most often represented by the symbol $X$ and the target variable by the symbol $Y$. In the simplest form of linear regression, with only one predictor variable, the predicted value of $Y$ is calculated by the following formula:

$$\hat{y}_i = \beta_0 + \beta_1 x_i$$

$\hat{y}_i$ is the predicted value for $x_i$. The error for a single data point is represented by:

$$e_i = y_i - \hat{y}_i$$

$\beta_0$ (the intercept) and $\beta_1$ (the slope) are the regression parameters, which are estimated from the data.
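To make the prediction and error formulas concrete, here is a minimal Python sketch. The function names `predict` and `error`, the parameter values, and the data point are all hypothetical and chosen only for illustration.

```python
def predict(x, beta0, beta1):
    """Predicted value for x on the line with intercept beta0 and slope beta1."""
    return beta0 + beta1 * x


def error(x, y, beta0, beta1):
    """Error for one data point: actual value minus predicted value."""
    return y - predict(x, beta0, beta1)


# Hypothetical parameters and a single hypothetical data point.
beta0, beta1 = 1.0, 2.0
x_i, y_i = 3.0, 7.5

print(predict(x_i, beta0, beta1))     # 1.0 + 2.0 * 3.0 = 7.0
print(error(x_i, y_i, beta0, beta1))  # 7.5 - 7.0 = 0.5
```

A positive error means the actual point lies above the line, and a negative error means it lies below.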
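The best linear model minimizes the sum of squared errors (SSE):

$$\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

where $y_i$ is the actual value and $\hat{y}_i$ is the predicted value for data point $i$.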
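As a rough illustration of how SSE ranks candidate lines, the sketch below scores two hand-picked lines against the same data. The helper name `sse` and the four data points are made up for this example.

```python
def sse(xs, ys, beta0, beta1):
    """Sum of squared errors of the line beta0 + beta1 * x over all points."""
    return sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, not taken from the article.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 10.0]

print(sse(xs, ys, 1.0, 2.0))  # errors 0, 0, 0, 1 -> 1.0
print(sse(xs, ys, 0.0, 2.0))  # errors 1, 1, 1, 2 -> 7.0
```

The first line has the smaller SSE, so it fits this data better than the second. Neither is necessarily the best possible line.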
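For the best model, the regression coefficients (the intercept $\beta_0$ and the slope $\beta_1$) are found by the following formulas:

$$\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \beta_0 = \bar{y} - \beta_1 \bar{x}$$

Here $x_i$ and $y_i$ are the observed values of the $i$-th data point, $\bar{x}$ and $\bar{y}$ are the means of the $x$ and $y$ values, and $n$ is the number of data points.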
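The standard least-squares formulas for the slope and intercept can be sketched in plain Python as follows. The function name `fit_line` and the data are hypothetical, and the code assumes the $x$ values are not all identical (otherwise the denominator is zero).

```python
def fit_line(xs, ys):
    """Return (beta0, beta1) for the least-squares line through the points.

    Assumes xs and ys have the same length and xs are not all equal.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Numerator: sum of products of deviations from the means.
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # Denominator: sum of squared deviations of x from its mean.
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    beta1 = s_xy / s_xx
    beta0 = y_mean - beta1 * x_mean
    return beta0, beta1


# Hypothetical data, not taken from the article.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 10.0]

beta0, beta1 = fit_line(xs, ys)
print(beta0, beta1)  # approximately 0.5 and 2.3, up to floating-point rounding
```

For this data the means are 2.5 and 6.25, the numerator is 11.5 and the denominator is 5, which gives a slope of 2.3 and an intercept of 0.5.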
The best linear model makes the residuals as small as possible overall, in the sense of minimizing the sum of their squares. A residual is the vertical gap between the actual value and the predicted value for a data point. The following image illustrates what is meant by a residual: