- Mastering Machine Learning with R(Second Edition)
- Cory Lesmeister
458 words
- 2021-07-09 18:23:56
Interaction terms
Interaction terms are similarly easy to code in R. Two features interact if the effect of one feature on the prediction depends on the value of the other feature. With two features, x1 and x2, this follows the formulation $Y = B_0 + B_1 x_1 + B_2 x_2 + B_3 x_1 x_2 + e$, where $B_3$ is the coefficient of the interaction term. An example is available in the MASS package with the Boston dataset. The response is the median value of owner-occupied homes in thousands of dollars, which is medv in the output. We will use two features: the percentage of the population with lower socioeconomic status, which is termed lstat, and the percentage of owner-occupied units built before 1940, which is termed age in the following output:
> library(MASS)
> data(Boston)
> str(Boston)
'data.frame': 506 obs. of 14 variables:
$ crim   : num 0.00632 0.02731 0.02729 0.03237 0.06905 ...
$ zn     : num 18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
$ indus  : num 2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
$ chas   : int 0 0 0 0 0 0 0 0 0 0 ...
$ nox    : num 0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
$ rm     : num 6.58 6.42 7.18 7 7.15 ...
$ age    : num 65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
$ dis    : num 4.09 4.97 4.97 6.06 6.06 ...
$ rad    : int 1 2 2 3 3 3 5 5 5 5 ...
$ tax    : num 296 242 242 222 222 222 311 311 311 311 ...
$ ptratio: num 15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
$ black  : num 397 397 393 395 397 ...
$ lstat  : num 4.98 9.14 4.03 2.94 5.33 ...
$ medv   : num 24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
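Before fitting the Boston model, the interaction formula given above can be checked on simulated data. This is a minimal sketch, not from the book: the variables x1, x2, and y, the data frame sim, the helper x1_slope(), and the true coefficients (5, -1.5, 0.2, 0.1) are all hypothetical values chosen for illustration. It shows that R expands x1 * x2 into both features plus their interaction, that the interaction column is the elementwise product of the two features, and that lm() recovers estimates close to the simulated coefficients.

```r
# Minimal sketch on simulated data (hypothetical values, not the Boston data)
set.seed(42)
n  <- 500
x1 <- runif(n, min = 0, max = 10)
x2 <- runif(n, min = 0, max = 10)

# Simulate from Y = B0 + B1*x1 + B2*x2 + B3*x1*x2 + e
# with assumed B0 = 5, B1 = -1.5, B2 = 0.2, B3 = 0.1 and e ~ N(0, 1)
y <- 5 - 1.5 * x1 + 0.2 * x2 + 0.1 * x1 * x2 + rnorm(n, sd = 1)
sim <- data.frame(y, x1, x2)

# In a model formula, x1 * x2 expands to x1 + x2 + x1:x2
term_labels <- attr(terms(y ~ x1 * x2), "term.labels")
print(term_labels)

# The design matrix column for x1:x2 is the elementwise product of x1 and x2
mm <- model.matrix(~ x1 * x2, data = sim)
stopifnot(isTRUE(all.equal(unname(mm[, "x1:x2"]), x1 * x2)))

# Fitting the model should give estimates close to the assumed coefficients
sim.fit <- lm(y ~ x1 * x2, data = sim)
print(round(coef(sim.fit), 3))

# Hypothetical helper: the slope of y with respect to x1 depends on x2 (B1 + B3 * x2)
x1_slope <- function(x2_value) {
  unname(coef(sim.fit)["x1"] + coef(sim.fit)["x1:x2"] * x2_value)
}
print(x1_slope(c(0, 5, 10)))
```

Because the slope of x1 changes with x2, no single coefficient summarizes the effect of x1 once the interaction term is in the model.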
Writing feature1 * feature2 in the lm() formula adds both features and their interaction term to the model, as follows:
> value.fit <- lm(medv ~ lstat * age, data = Boston)
> summary(value.fit)
Call:
lm(formula = medv ~ lstat * age, data = Boston)
Residuals:
    Min      1Q  Median      3Q     Max
-15.806  -4.045  -1.333   2.085  27.552
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) 36.0885359  1.4698355  24.553  < 2e-16 ***
lstat       -1.3921168  0.1674555  -8.313 8.78e-16 ***
age         -0.0007209  0.0198792  -0.036   0.9711
lstat:age    0.0041560  0.0018518   2.244   0.0252 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.149 on 502 degrees of freedom
Multiple R-squared:  0.5557,  Adjusted R-squared:  0.5531
F-statistic: 209.3 on 3 and 502 DF,  p-value: < 2.2e-16
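Examining the output, we can see that lstat is a highly significant predictor (p = 8.78e-16), while the age coefficient is not significant (p = 0.9711); strictly, in a model with an interaction, that coefficient describes the effect of age only when lstat is 0. However, the interaction term lstat:age is significant at the 0.05 level (p = 0.0252) and has a positive estimate, so the negative relationship between lstat and home value becomes weaker as age increases.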
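Because of the interaction, the effect of lstat on medv is not a single number. Under the fitted model, the slope for lstat at a given age is the lstat coefficient plus the interaction coefficient times age, and symmetrically for age. The sketch below simply hard-codes the three estimates printed by summary(value.fit) above; the helper names lstat_slope() and age_slope() and the example input values are hypothetical, introduced only for illustration.

```r
# Estimates copied from the summary(value.fit) output above
b_lstat <- -1.3921168
b_age   <- -0.0007209
b_int   <-  0.0041560  # lstat:age

# Hypothetical helpers: conditional slopes implied by
# medv = B0 + b_lstat*lstat + b_age*age + b_int*lstat*age
lstat_slope <- function(age) b_lstat + b_int * age
age_slope   <- function(lstat) b_age + b_int * lstat

# Change in medv (thousands of dollars) per one-unit increase in lstat,
# evaluated at three example values of age
ages <- c(20, 50, 90)
print(round(lstat_slope(ages), 4))  # about -1.309, -1.184, -1.018

# Change in medv per one-unit increase in age,
# evaluated at three example values of lstat
lstats <- c(5, 15, 30)
print(round(age_slope(lstats), 4))  # about 0.020, 0.062, 0.124
```

In this fit the lstat slope stays negative for any age between 0 and 100 (age is a percentage), but it becomes less steep where more of the housing is older; that is what the positive lstat:age estimate means.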