
  • Advanced Machine Learning with R
  • Cory Lesmeister, Dr. Sunil Kumar Chinnamgari
  • 346 words
  • 2021-06-24 14:24:36

Reverse transformation of natural log predictions

Now that you have read Duan's paper several times, here's how to apply it to our work. I'm going to provide you with a user-defined function. It will do the following:

  1. Exponentiate the residuals from the transformed model
  2. Exponentiate the predicted values from the transformed model
  3. Calculate the mean of the exponentiated residuals
  4. Calculate the smeared predictions by multiplying the values in step 2 by the value in step 3
  5. Return the results
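In formula form, the steps above amount to the following. The notation is my own and not from the code: $\hat{\mu}_j$ is a prediction on the log scale, and $e_1, \dots, e_n$ are the residuals from the log-scale model.

```latex
\hat{y}_j \;=\; \exp(\hat{\mu}_j) \cdot \frac{1}{n} \sum_{i=1}^{n} \exp(e_i)
```

The second factor is the smearing factor. By Jensen's inequality it is at least 1 whenever the residuals average to zero, so in that situation smearing scales every prediction up by the same constant.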

Here's the function, which requires only two arguments:

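> duan_smear <- function(pred, resid){
    expo_resid <- exp(resid)                          # step 1: exponentiate the residuals
    expo_pred <- exp(pred)                            # step 2: exponentiate the predictions
    avg_expo_resid <- mean(expo_resid)                # step 3: the smearing factor
    smear_predictions <- avg_expo_resid * expo_pred   # step 4: smeared predictions
    return(smear_predictions)                         # step 5: return the results
  }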
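To see what the function buys you, here's a minimal sketch on simulated data. Everything in it (the variables `x` and `y`, the coefficients, the noise level) is made up for illustration and has nothing to do with the housing data. It also assumes the residuals come from the sample the model was fit on, which is how Duan's estimator is usually described. When the log-scale errors are roughly normal, simply exponentiating the predictions tends to undershoot the average on the original scale, and the smearing factor corrects for that.

```r
# Toy illustration on simulated data; none of these objects come from the housing analysis.
set.seed(123)

duan_smear <- function(pred, resid) {
  mean(exp(resid)) * exp(pred)
}

# Hypothetical data: log(y) is linear in x with normally distributed noise
n <- 1000
x <- runif(n, 0, 2)
y <- exp(1 + 0.5 * x + rnorm(n, sd = 0.6))

# Fit on the log scale and pull out log-scale predictions and residuals
fit <- lm(log(y) ~ x)
log_pred <- fitted(fit)
log_resid <- residuals(fit)

smear_factor <- mean(exp(log_resid))            # mean of the exponentiated residuals
naive_pred <- exp(log_pred)                     # plain back-transformation
smear_pred <- duan_smear(log_pred, log_resid)   # Duan-smeared back-transformation

# Compare the average prediction with the average observed value
round(c(actual = mean(y), naive = mean(naive_pred), smeared = mean(smear_pred)), 3)
smear_factor
```

Whether that correction helps on held-out data is an empirical question, which is why we compare both sets of predictions on the test set.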

Next, we calculate the new predictions from the results of the MARS model:

> duan_pred <- duan_smear(pred = earth_pred, resid = earth_residTest)

We can now see how the model error plays out at the original sales price:

> caret::postResample(duan_pred, test_y)
      RMSE   Rsquared        MAE
23483.5659     0.9356 16405.7395

We can say that the model is wrong, on average, by $16,406. How does that compare with not smearing? Let's see:

> exp_pred <- exp(earth_pred)
> caret::postResample(exp_pred, test_y)
      RMSE   Rsquared        MAE
23106.1245     0.9356 16117.4235

The error is slightly lower without smearing (an MAE of $16,117 versus $16,406), so in this case smearing the estimate doesn't seem to be the wise choice. I've seen examples where Duan's method and others are combined in an ensemble model. Again, more on ensembles later in this book.
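Notice that Rsquared is 0.9356 in both runs. That's what you'd expect if, as I understand the default behavior of `caret::postResample()`, Rsquared is the squared correlation between predictions and observations: smearing multiplies every prediction by the same constant, and correlation ignores that kind of rescaling. Here's a rough sketch with hand-rolled metrics. The `metrics()` helper, the simulated values, and the 1.02 factor are all hypothetical stand-ins.

```r
# Hand-rolled versions of the three metrics, for illustration only
metrics <- function(pred, obs) {
  c(RMSE = sqrt(mean((pred - obs)^2)),
    Rsquared = cor(pred, obs)^2,
    MAE = mean(abs(pred - obs)))
}

# Hypothetical values, not the housing data
set.seed(42)
obs <- exp(rnorm(200, mean = 12, sd = 0.4))
pred <- obs * exp(rnorm(200, sd = 0.1))

unscaled <- metrics(pred, obs)
scaled <- metrics(1.02 * pred, obs)   # 1.02 stands in for a smearing factor

# Rsquared matches across the two rows; RMSE and MAE do not
rbind(unscaled, scaled)
```

So when you compare smeared and non-smeared predictions, RMSE and MAE are the metrics that can tell them apart.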

Let's conclude the analysis by plotting the non-smeared predictions alongside the actual values. I'll show how to do this in ggplot fashion:

> results <- data.frame(exp_pred, test_y)

> colnames(results) <- c('predicted', 'actual')

> ggplot2::ggplot(results, ggplot2::aes(predicted, actual)) +
    ggplot2::geom_point(size = 1) +
    ggplot2::geom_smooth() +
    ggthemes::theme_fivethirtyeight()

The output of the preceding code is a scatter plot of actual against predicted sale prices, with a smoothed trend line through the points.

This is interesting: there appears to be a subset of observations whose actual sale prices are higher than those of their counterparts with similar predicted values. There may be some feature or interaction term that we could try to find to address that difference. We also see that, around the $400,000 sale price, there's considerable variation in the residuals, primarily, I would argue, because of the paucity of observations.
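One way to check the paucity-of-observations hunch is to count observations and average the absolute error within price bands. The sketch below is only a suggestion: it builds a simulated stand-in for the `results` data frame so that it runs on its own, and the $100,000 band width is an arbitrary assumption.

```r
# Hypothetical stand-in for the results data frame; the values are simulated
set.seed(7)
actual <- exp(rnorm(500, mean = 12, sd = 0.35))
predicted <- actual * exp(rnorm(500, sd = 0.1))
results <- data.frame(predicted, actual)

# Assumed price bands of $100,000; adjust the breaks to suit the data
upper <- ceiling(max(results$predicted) / 1e5) * 1e5
results$band <- cut(results$predicted, breaks = seq(0, upper, by = 1e5), dig.lab = 7)
results$abs_err <- abs(results$actual - results$predicted)

# Observation count and mean absolute error per band (NA where a band is empty)
n_by_band <- table(results$band)
mae_by_band <- tapply(results$abs_err, results$band, mean)
data.frame(n = as.vector(n_by_band),
           mae = round(as.vector(mae_by_band)),
           row.names = names(n_by_band))
```

With the real `results` object you would skip the simulation lines. Bands with only a handful of rows are the ones where the smoother, and any error summary, deserve the least trust.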

For starters, we have a pretty good model that serves as an excellent foundation for other modeling efforts, as discussed. Additionally, we produced a model that's rather simple to interpret and explain, which in some cases may be more critical than some rather insignificant reduction in error. Hey, that's why you make big money. If it were easy, everyone would be doing it.
