
Regularization

Regularization is one approach a data scientist may use to improve the results generated by a statistical model or data science process, for example when addressing a case of overfitting.

We defined fitting earlier in this chapter: fitting describes how well a statistical model or process describes a data scientist's observations. Overfitting is a scenario where a statistical model or process fits too well, following the actual data so closely that it captures random noise along with the underlying pattern.

Overfitting usually occurs with an overly complex model, while an overly simple model tends to fail in the opposite way (underfitting). A simple model means that you may have only two variables and are drawing conclusions based on the two. For example, using our previously mentioned example of daffodil sales, one might generate a model with temperature as the independent variable and sales as the dependent one. You may see the model fail, since it is not as simple as concluding that warmer temperatures will always generate more sales.

In this example, there is a tendency to add more data to the process or model in hopes of achieving a better result. The idea sounds reasonable. For example, you have information such as average rainfall, pollen count, fertilizer sales, and so on; could these data points be added as explanatory variables?

An explanatory variable is a type of independent variable, with a subtle difference. A variable is truly independent only when it is not affected at all by any other variable. When you cannot be certain of that independence, it is more accurate to call it an explanatory variable.

Continuing to add more and more explanatory variables to your model will have an effect, but will probably cause overfitting: the model ends up closely resembling the data it was built from, much of which is just background noise, and so it predicts poorly on new data.
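The effect described above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the daffodil figures are synthetic, the extra explanatory variables are pure random noise invented for the demonstration, and the helper names `design` and `fit_and_score` are hypothetical. The sketch fits an ordinary least-squares model on a small training set and reports the error on both the training rows and a held-out set as unrelated variables are added.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

n_train, n_test = 20, 200
n = n_train + n_test

# Synthetic stand-in for the daffodil example (made-up numbers, not real data):
# sales depend on temperature plus random noise.
temperature = rng.uniform(5, 25, size=n)
sales = 10 + 2.0 * temperature + rng.normal(0, 5, size=n)

# Hypothetical extra explanatory variables that are unrelated to sales.
noise_vars = rng.normal(size=(n, 15))


def design(n_extra):
    """Intercept, temperature, and the first n_extra unrelated variables."""
    return np.column_stack([np.ones(n), temperature, noise_vars[:, :n_extra]])


def fit_and_score(n_extra):
    """Fit least squares on the training rows; return (train MSE, test MSE)."""
    X = design(n_extra)
    X_train, X_test = X[:n_train], X[n_train:]
    y_train, y_test = sales[:n_train], sales[n_train:]
    coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    train_mse = float(np.mean((X_train @ coef - y_train) ** 2))
    test_mse = float(np.mean((X_test @ coef - y_test) ** 2))
    return train_mse, test_mse


results = {k: fit_and_score(k) for k in (0, 5, 10, 15)}
for k, (train_mse, test_mse) in results.items():
    print(f"extra variables: {k:2d}  train MSE: {train_mse:8.2f}  test MSE: {test_mse:8.2f}")
```

Because each larger model contains the smaller one, the training error can only stay the same or fall as variables are added. The held-out error is the number to watch: when it rises while the training error keeps falling, the model is fitting noise.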

To overcome this situation, a data scientist can use regularization: introducing a tuning parameter into the data science process to solve an ill-posed problem or to prevent overfitting. The tuning parameter is an additional factor, such as a data point's mean value or a minimum or maximum limit, that gives you the ability to change the complexity or smoothness of your model.
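As one concrete illustration of a tuning parameter, the sketch below uses ridge regression. Ridge is a common form of regularization chosen here for illustration; the text above does not name a specific method. It adds a penalty, scaled by a tuning parameter (called `lam` here), on the size of the model's coefficients. The data are synthetic, and the function name `ridge_fit` and the chosen values of `lam` are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the run is repeatable

n, p = 20, 15
X = rng.normal(size=(n, p))
# Only the first column actually drives y; the other columns are noise.
y = 2.0 * X[:, 0] + rng.normal(0, 1.0, size=n)

# Center the data so no intercept is needed and the penalty
# applies to the slope coefficients only.
Xc = X - X.mean(axis=0)
yc = y - y.mean()


def ridge_fit(X, y, lam):
    """Closed-form ridge solution: solve (X^T X + lam * I) beta = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


lambdas = [0.0, 1.0, 10.0, 100.0]
norms = [float(np.linalg.norm(ridge_fit(Xc, yc, lam))) for lam in lambdas]
for lam, norm in zip(lambdas, norms):
    print(f"lambda = {lam:6.1f}  coefficient norm = {norm:.3f}")
```

With `lam = 0` this reduces to ordinary least squares. Increasing `lam` shrinks the coefficients toward zero, which gives a smoother, less complex model. In practice the value is usually chosen by checking prediction error on data held out from fitting.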
