
Polynomial regression

Linear regression assumes that the relationship between the independent and the dependent variables is best represented by a straight line. Real-life datasets are often more complex and do not exhibit a linear relationship between cause and effect. In such cases, a straight-line equation does not fit the data points well and hence cannot produce an effective predictive model.

In such cases, we can consider using a higher-order polynomial, such as a quadratic equation, for the predictor function. Given x as the independent variable and y as the dependent variable, the polynomial function takes the following forms:
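In the usual textbook notation, and assuming coefficients named a, b, c, and d (these symbols are an assumption here, chosen only for illustration), the first few orders can be written as:

```latex
\begin{aligned}
y &= a + bx               && \text{(linear, order 1)} \\
y &= a + bx + cx^2        && \text{(quadratic, order 2)} \\
y &= a + bx + cx^2 + dx^3 && \text{(cubic, order 3)}
\end{aligned}
```

Each higher order adds one more power of x, and therefore one more coefficient for the regression to estimate.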

These can be visualized with a small set of sample data as follows:

Figure 3.11 Polynomial prediction function

Note that the straight line cannot accurately represent the relationship between x and y. As we model the prediction function with higher-order functions, R² (the coefficient of determination) improves, which means the model fits the sample data more closely.
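The effect of the polynomial order on R² can be checked with a few lines of code. The following is a minimal sketch using NumPy's `polyfit` and `polyval`. The sample data and the helper `r_squared` are hypothetical and were made up for this illustration; they are not the data behind the figure.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


# Hypothetical sample data with a roughly quadratic trend (an assumption,
# not taken from the book's figure).
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 3.9, 9.2, 15.8, 26.1, 35.7, 50.2, 63.5])

r2_by_degree = {}
for degree in (1, 2, 3):
    # Least-squares fit of a polynomial of the given degree.
    coeffs = np.polyfit(x, y, degree)
    # Evaluate the fitted polynomial on the same points used for fitting.
    y_pred = np.polyval(coeffs, x)
    r2_by_degree[degree] = r_squared(y, y_pred)
    print(f"degree {degree}: R^2 = {r2_by_degree[degree]:.4f}")
```

Because each higher-degree polynomial contains the lower-degree one as a special case, R² measured on the fitting data can only stay the same or increase as the degree grows.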

We may think that it is best to use the highest possible order equation for the prediction function in order to get the best-fitting model. However, that is not right: when the regression curve passes through every training data point, the model fails to accurately predict outcomes for data outside the training sample (test data). This problem is called overfitting. At the other extreme, we may encounter the problem of underfitting, where the model does not fit the training data well and hence also performs poorly on the test data.
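Both problems can be seen by comparing the error on the training data with the error on data the model has not seen. The sketch below assumes a hypothetical dataset: the true relationship is y = x², the training values carry a fixed alternating noise of ±1 (chosen so the run is reproducible), and the test points lie just outside the training range. The helper `mse` and all variable names are illustrative.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return float(np.mean((y_true - y_pred) ** 2))


# Hypothetical training sample: y = x^2 plus a fixed alternating noise of +/-1.
x_train = np.arange(8, dtype=float)
noise = np.array([1.0, -1.0] * 4)
y_train = x_train ** 2 + noise

# Hypothetical test data outside the training sample (noise-free true values).
x_test = np.array([8.0, 9.0])
y_test = x_test ** 2

train_mse, test_mse = {}, {}
for degree in (1, 2, 7):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse[degree] = mse(y_train, np.polyval(coeffs, x_train))
    test_mse[degree] = mse(y_test, np.polyval(coeffs, x_test))
    print(
        f"degree {degree}: train MSE = {train_mse[degree]:.4g}, "
        f"test MSE = {test_mse[degree]:.4g}"
    )
```

In this setup the degree-1 model underfits: its error is high on both the training and the test data. The degree-7 model passes through all eight training points, so its training error is close to zero, yet its test error is far larger than that of the degree-2 model, which is the pattern described above as overfitting.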
