- Mastering Machine Learning with R(Second Edition)
- Cory Lesmeister
- 2021-07-09 18:23:56
Interaction terms
Interaction terms are similarly easy to code in R. Two features interact if the effect of one feature on the prediction depends on the value of the other. For two features x1 and x2, this follows the formulation Y = B0 + B1x1 + B2x2 + B3x1x2 + e, where B3 is the coefficient on the interaction (the product of the two features). An example is available in the MASS package with the Boston dataset. The response is the median home value, which is medv in the output. We will use two features: the percentage of the population with a lower socioeconomic status, which is termed lstat, and the proportion of owner-occupied units built before 1940, which is termed age in the following output:
> library(MASS)
> data(Boston)
> str(Boston)
'data.frame': 506 obs. of 14 variables:
 $ crim   : num 0.00632 0.02731 0.02729 0.03237 0.06905 ...
 $ zn     : num 18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
 $ indus  : num 2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
 $ chas   : int 0 0 0 0 0 0 0 0 0 0 ...
 $ nox    : num 0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
 $ rm     : num 6.58 6.42 7.18 7 7.15 ...
 $ age    : num 65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
 $ dis    : num 4.09 4.97 4.97 6.06 6.06 ...
 $ rad    : int 1 2 2 3 3 3 5 5 5 5 ...
 $ tax    : num 296 242 242 222 222 222 311 311 311 311 ...
 $ ptratio: num 15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
 $ black  : num 397 397 393 395 397 ...
 $ lstat  : num 4.98 9.14 4.03 2.94 5.33 ...
 $ medv   : num 24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
Using feature1*feature2 in the formula passed to lm() puts both features, as well as their interaction term (shown as feature1:feature2 in the output), in the model, as follows:
> value.fit <- lm(medv ~ lstat * age, data = Boston)
> summary(value.fit)
Call:
lm(formula = medv ~ lstat * age, data = Boston)
Residuals:
    Min      1Q  Median      3Q     Max
-15.806  -4.045  -1.333   2.085  27.552
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) 36.0885359  1.4698355  24.553  < 2e-16 ***
lstat       -1.3921168  0.1674555  -8.313 8.78e-16 ***
age         -0.0007209  0.0198792  -0.036   0.9711
lstat:age    0.0041560  0.0018518   2.244   0.0252 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.149 on 502 degrees of freedom
Multiple R-squared: 0.5557, Adjusted R-squared: 0.5531
F-statistic: 209.3 on 3 and 502 DF, p-value: < 2.2e-16
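As a quick check of what the * shorthand does, the following sketch fits the same model with the main effects and the interaction written out explicitly and compares the coefficients. The object names fit_star, fit_long, and same_fit are illustrative and do not come from the text.

```r
library(MASS)  # provides the Boston data frame

# Model written with the * shorthand, as in the text
fit_star <- lm(medv ~ lstat * age, data = Boston)

# The same model with both main effects and the interaction spelled out
fit_long <- lm(medv ~ lstat + age + lstat:age, data = Boston)

# TRUE if the two formulas give the same named coefficient estimates
same_fit <- isTRUE(all.equal(coef(fit_star), coef(fit_long)))
print(same_fit)
```

Writing lstat:age on its own would include only the product term, so the * form is usually the safer default when the main effects should stay in the model.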
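In the summary output for value.fit, we can see that lstat is a highly significant predictor (p = 8.78e-16), while the main effect of age is not (p = 0.9711). However, the lstat:age interaction is significant at the 0.05 level (p = 0.0252). Its positive coefficient means that the negative relationship between lstat and medv becomes weaker as age increases.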
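One way to read an interaction is to compute the slope of one feature at several values of the other. In the fitted model, the slope of lstat at a given age is the lstat coefficient plus the lstat:age coefficient multiplied by age. The sketch below does this arithmetic; the helper lstat_slope and the age values 20, 50, and 90 are illustrative choices, not part of the original example.

```r
library(MASS)  # provides the Boston data frame

# Refit the model from the text so this block runs on its own
value.fit <- lm(medv ~ lstat * age, data = Boston)
b <- coef(value.fit)

# Slope of lstat at a given value of age: B1 + B3 * age
# (hypothetical helper, introduced here for illustration)
lstat_slope <- function(age) unname(b["lstat"] + b["lstat:age"] * age)

ages <- c(20, 50, 90)        # arbitrary illustrative values of age
slopes <- lstat_slope(ages)  # vectorized over ages

print(data.frame(age = ages, lstat_slope = slopes))
```

With the coefficients printed in the summary, these slopes work out to roughly -1.31, -1.18, and -1.02, so the slope stays negative but moves toward zero as age rises.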