Title: Mastering Machine Learning with R (Second Edition)
Author: Cory Lesmeister
Chapter word count: 458
Updated: 2021-07-09 18:23:56

Interaction terms

Interaction terms are similarly easy to code in R. Two features interact if the effect of one feature on the prediction depends on the value of the other feature. With two features, the model takes the form Y = B0 + B1x1 + B2x2 + B3x1x2 + e, where B3 is the coefficient of the interaction term. An example is available in the MASS package with the Boston dataset. The response is the median home value (in thousands of dollars), which is medv in the output. We will use two features: the percentage of the population with a lower socioeconomic status, which is termed lstat, and the age of the housing, which is termed age (recorded in the dataset as the proportion of owner-occupied units built before 1940), in the following output:

    > library(MASS)
    
    > data(Boston)
    
    > str(Boston)
    
    'data.frame':   506 obs. of  14 variables:
    $ crim   : num  0.00632 0.02731 0.02729 0.03237 0.06905 ...
    $ zn     : num  18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
    $ indus  : num  2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
    $ chas   : int  0 0 0 0 0 0 0 0 0 0 ...
    $ nox    : num  0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
    $ rm     : num  6.58 6.42 7.18 7 7.15 ...
    $ age    : num  65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
    $ dis    : num  4.09 4.97 4.97 6.06 6.06 ...
    $ rad    : int  1 2 2 3 3 3 5 5 5 5 ...
    $ tax    : num  296 242 242 222 222 222 311 311 311 311 ...
    $ ptratio: num  15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
    $ black  : num  397 397 393 395 397 ...
    $ lstat  : num  4.98 9.14 4.03 2.94 5.33 ...
    $ medv   : num  24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...

In an lm() formula, feature1 * feature2 expands to feature1 + feature2 + feature1:feature2, so the model includes both features (the main effects) as well as their interaction term, as follows:

    > value.fit <- lm(medv ~ lstat * age, data = Boston)
    
    > summary(value.fit)
    
    Call:
    lm(formula = medv ~ lstat * age, data = Boston)
    
    Residuals:
        Min      1Q  Median      3Q     Max
    -15.806  -4.045  -1.333   2.085  27.552
    
    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
    (Intercept) 36.0885359  1.4698355  24.553  < 2e-16 ***
    lstat       -1.3921168  0.1674555  -8.313 8.78e-16 ***
    age         -0.0007209  0.0198792  -0.036   0.9711    
    lstat:age    0.0041560  0.0018518   2.244   0.0252 *  
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Residual standard error: 6.149 on 502 degrees of freedom
    Multiple R-squared:  0.5557,    Adjusted R-squared:  0.5531
    F-statistic: 209.3 on 3 and 502 DF,  p-value: < 2.2e-16
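
As a quick check on how the * operator is expanded, the following minimal sketch fits the same model with the main effects and the interaction written out explicitly, then compares the two fits. The object names fit_star and fit_explicit are introduced here for illustration only and do not appear in the original text.

```r
library(MASS)  # provides the Boston data frame

# Shorthand form used in the text
fit_star <- lm(medv ~ lstat * age, data = Boston)

# Explicit form: both main effects plus the interaction term
fit_explicit <- lm(medv ~ lstat + age + lstat:age, data = Boston)

# Term labels that the shorthand formula expands to
attr(terms(fit_star), "term.labels")

# Compare the coefficient estimates of the two fits
all.equal(coef(fit_star), coef(fit_explicit))
```

Both formulas should yield the same three terms and the same estimates. To include only the interaction without the main effects, the : operator can be used on its own, although that is rarely advisable in practice.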

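Examining the summary output, we can see that lstat is a highly significant predictor (p < 2e-16), whereas age on its own is not (p = 0.9711). However, the interaction term lstat:age is significant at the 0.05 level (p = 0.0252), and its positive coefficient means that the negative effect of lstat on the home value becomes weaker as age increases.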
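
One way to make the interaction concrete is to compute the slope of medv with respect to lstat at a few values of age. Under the fitted model, that slope is the lstat coefficient plus the lstat:age coefficient multiplied by age. The sketch below assumes the same model as in the text. The helper lstat_slope() and the chosen age values are hypothetical and used purely for illustration.

```r
library(MASS)  # provides the Boston data frame

# Refit the interaction model from the text
value.fit <- lm(medv ~ lstat * age, data = Boston)
b <- coef(value.fit)

# Slope of medv with respect to lstat at a given value of age:
# d(medv)/d(lstat) = b["lstat"] + b["lstat:age"] * age
lstat_slope <- function(age) {
  unname(b["lstat"] + b["lstat:age"] * age)
}

# Illustrative age values (chosen for this sketch, not from the text)
ages <- c(25, 50, 75, 100)
data.frame(age = ages, lstat_slope = lstat_slope(ages))
```

Using the coefficients printed in the summary, the slope is roughly -1.39 + 0.0042 * age, so it moves from about -1.29 at age = 25 to about -0.98 at age = 100. The effect of lstat stays negative throughout but shrinks in magnitude as age rises.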
