
Checking model assumptions

Linear models, like any kind of model, require that we check their assumptions to justify their application. The accuracy and interpretability of the results come from adhering to a model's assumptions. Sometimes these are rigorous assumptions, in the sense that if they are not strictly met, the model is not considered valid at all. Other times, we work with more flexible assumptions, where a degree of judgment from the analyst comes into play.

For those of you interested, a great article about model assumptions is David Robinson's K-means clustering is not a free lunch, 2015 (http://varianceexplained.org/r/kmeans-free-lunch/).

For linear models, the following are some of the core assumptions:

  • Linearity: There is a linear relation among the variables
  • Normality: Residuals are normally distributed
  • Homoscedasticity: Residuals have constant variance
  • No collinearity: Variables are not linear combinations of each other
  • Independence: Residuals are independent or at least not correlated
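Most of these assumptions are stated in terms of residuals, so it may help to see what those quantities look like. The following is a minimal sketch in plain Python, not the approach used in the rest of the book: it fits a one-predictor least-squares line by hand and computes the residuals, a crude comparison of residual spread, and the correlation between two predictors. The data and the helper names (`fit_simple_ols`, `residuals`, `pearson`) are hypothetical and exist only for illustration.

```python
from statistics import mean, pvariance


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residuals(x, y, intercept, slope):
    """Observed values minus fitted values."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    u_bar, v_bar = mean(u), mean(v)
    num = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    den = (sum((a - u_bar) ** 2 for a in u)
           * sum((b - v_bar) ** 2 for b in v)) ** 0.5
    return num / den


# Hypothetical toy data, made up for this sketch.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

intercept, slope = fit_simple_ols(x, y)
res = residuals(x, y, intercept, slope)

# Homoscedasticity (crude look): compare the residual variance in the
# lower and upper halves of x. Very different values hint at a problem.
half = len(res) // 2
var_low, var_high = pvariance(res[:half]), pvariance(res[half:])

# No collinearity: z is an exact linear combination of x, so the
# correlation between them is 1 (up to floating-point error).
z = [2 * xi + 1 for xi in x]
r_xz = pearson(x, z)

print("residuals:", [round(r, 3) for r in res])
print("residual variance (low, high):", var_low, var_high)
print("correlation between x and z:", r_xz)
```

In practice these quantities are usually inspected graphically (for example, residuals against fitted values) rather than through single numbers, and a formal check would rely on a statistics library instead of hand-written helpers.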

We will briefly show how to check four of them: linearity, normality, homoscedasticity, and no collinearity. The independence assumption is probably the most difficult one to test, and you can generally handle it with common sense and an understanding of how the data was collected. We will not get into that here, as it is more on the statistics side of things and we want to keep the book focused on programming techniques. For the statistically interested reader, we recommend Jeffrey M. Wooldridge's Introductory Econometrics, 2013, and Joshua D. Angrist and Jorn-Steffen Pischke's Mostly Harmless Econometrics, 2008.
