
Interaction terms

Interaction terms are similarly easy to code in R. Two features interact if the effect of one feature on the prediction depends on the value of the other feature. This follows the formulation Y = B0 + B1x1 + B2x2 + B3x1x2 + e, where x1 and x2 are the two features and B3 is the coefficient of their interaction. An example is available in the MASS package with the Boston dataset. The response is the median home value, which is medv in the output. We will use two features: the percentage of the population with a low socioeconomic status, which is termed lstat, and the age of the housing, recorded as the proportion of owner-occupied units built before 1940, which is termed age in the following output:

    > library(MASS)
    > data(Boston)
    > str(Boston)
    'data.frame': 506 obs. of 14 variables:
     $ crim   : num 0.00632 0.02731 0.02729 0.03237 0.06905 ...
     $ zn     : num 18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
     $ indus  : num 2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
     $ chas   : int 0 0 0 0 0 0 0 0 0 0 ...
     $ nox    : num 0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
     $ rm     : num 6.58 6.42 7.18 7 7.15 ...
     $ age    : num 65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
     $ dis    : num 4.09 4.97 4.97 6.06 6.06 ...
     $ rad    : int 1 2 2 3 3 3 5 5 5 5 ...
     $ tax    : num 296 242 242 222 222 222 311 311 311 311 ...
     $ ptratio: num 15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
     $ black  : num 397 397 393 395 397 ...
     $ lstat  : num 4.98 9.14 4.03 2.94 5.33 ...
     $ medv   : num 24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...

Using feature1*feature2 in the formula passed to the lm() function puts both features, as well as their interaction term, in the model, as follows:

    > value.fit <- lm(medv ~ lstat * age, data = Boston)
    > summary(value.fit)

    Call:
    lm(formula = medv ~ lstat * age, data = Boston)

    Residuals:
        Min      1Q  Median      3Q     Max
    -15.806  -4.045  -1.333   2.085  27.552

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept) 36.0885359  1.4698355  24.553  < 2e-16 ***
    lstat       -1.3921168  0.1674555  -8.313 8.78e-16 ***
    age         -0.0007209  0.0198792  -0.036   0.9711
    lstat:age    0.0041560  0.0018518   2.244   0.0252 *
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 6.149 on 502 degrees of freedom
    Multiple R-squared: 0.5557, Adjusted R-squared: 0.5531
    F-statistic: 209.3 on 3 and 502 DF, p-value: < 2.2e-16
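In R's formula syntax, a * b is shorthand for a + b + a:b, where a:b denotes the interaction term on its own. As a quick check, the following minimal sketch fits the model both ways and compares the coefficients. The name explicit.fit is introduced here purely for illustration and is not part of the original example.

```r
library(MASS)  # provides the Boston dataset

# Shorthand form: both main effects plus their interaction
value.fit <- lm(medv ~ lstat * age, data = Boston)

# Explicit form; explicit.fit is a hypothetical name used only for this check
explicit.fit <- lm(medv ~ lstat + age + lstat:age, data = Boston)

# The two formulas describe the same model, so the estimates should agree
all.equal(coef(value.fit), coef(explicit.fit))
```

Writing medv ~ lstat:age alone would include only the interaction without the main effects, which is rarely what is wanted, so the * form is usually the safer default.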

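Examining the summary output for value.fit, we can see that, while lstat is a highly predictive feature (p-value below 2e-16), age on its own is not (p-value of 0.9711). However, the two features have an interaction that is significant at the 0.05 level (p-value of 0.0252). Because the lstat:age coefficient is positive, the negative association between lstat and the home value weakens as age increases.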
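One way to make the interaction concrete is to compute the slope of medv with respect to lstat at several values of age. Under the fitted model, that slope is the lstat coefficient plus the lstat:age coefficient multiplied by age. The sketch below assumes the same model as above; the helper lstat.slope and the chosen age values are illustrative and do not come from the original text.

```r
library(MASS)  # provides the Boston dataset

value.fit <- lm(medv ~ lstat * age, data = Boston)
b <- coef(value.fit)

# Slope of medv with respect to lstat at a given value of age:
# b["lstat"] + b["lstat:age"] * age
# lstat.slope is a hypothetical helper defined only for this illustration
lstat.slope <- function(age) unname(b["lstat"] + b["lstat:age"] * age)

# Illustrative age values, chosen arbitrarily
ages <- c(25, 50, 75, 100)
data.frame(age = ages, lstat.slope = lstat.slope(ages))
```

With the estimates printed in the summary, the slope works out to roughly -1.29 at age = 25 and roughly -0.98 at age = 100, so each additional percentage point of lstat is associated with a smaller drop in medv in areas with older housing.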
