官术网_书友最值得收藏!

The regression model

The previous section developed a deep learning model for a binary classification task, this section develops a deep learning model to predict a continuous numeric value, regression analysis. We use the same dataset that we used for the binary classification task, but we use a different target column to predict for. In that task, we wanted to predict whether a customer would return to our stores in the next 14 days. In this task, we want to predict how much a customer will spend in our stores in the next 14 days. We follow a similar process; we load and prepare our dataset by applying log transformations to the data. The code is in Chapter4/regression.R:

set.seed(42)
fileName <- "../dunnhumby/predict.csv"
dfData <- read_csv(fileName,
col_types = cols(
.default = col_double(),
CUST_CODE = col_character(),
Y_categ = col_integer())
)
nobs <- nrow(dfData)
train <- sample(nobs, 0.9*nobs)
test <- setdiff(seq_len(nobs), train)
predictorCols <- colnames(dfData)[!(colnames(dfData) %in% c("CUST_CODE","Y_numeric","Y_numeric"))]

dfData[, c("Y_numeric",predictorCols)] <- log(0.01+dfData[, c("Y_numeric",predictorCols)])
trainData <- dfData[train, c(predictorCols,"Y_numeric")]
testData <- dfData[test, c(predictorCols,"Y_numeric")]

xtrain <- model.matrix(Y_numeric~.,trainData)
xtest <- model.matrix(Y_numeric~.,testData)

We then perform regression analysis on the data using lm to create a benchmark before creating a deep learning model:

# lm Regression Model
regModel1=lm(Y_numeric ~ .,data=trainData)
pr1 <- predict(regModel1,testData)
rmse <- sqrt(mean((exp(pr1)-exp(testData[,"Y_numeric"]$Y_numeric))^2))
print(sprintf(" Regression RMSE = %1.2f",rmse))
[1] " Regression RMSE = 29.30"
mae <- mean(abs(exp(pr1)-exp(testData[,"Y_numeric"]$Y_numeric)))
print(sprintf(" Regression MAE = %1.2f",mae))
[1] " Regression MAE = 13.89"

We output two metrics, rmse and mae, for our regression task. We covered these earlier in the chapter. Mean absolute error measures the absolute differences between the predicted value and the actual value. Root mean squared error (rmse) penalizes the square of the differences between the predicted value and the actual value, so one big error costs more than the sum of the small errors. Now let's look at the deep learning regression code. First we load the data and define the model:

require(mxnet)
Loading required package: mxnet

# MXNet expects matrices
train_X <- data.matrix(trainData[, predictorCols])
test_X <- data.matrix(testData[, predictorCols])
train_Y <- trainData$Y_numeric

set.seed(42)
# hyper-parameters
num_hidden <- c(256,128,128,64)
drop_out <- c(0.4,0.4,0.4,0.4)
wd=0.00001
lr <- 0.0002
num_epochs <- 100
activ <- "tanh"

# create our model architecture
# using the hyper-parameters defined above
data <- mx.symbol.Variable("data")
fc1 <- mx.symbol.FullyConnected(data, name="fc1", num_hidden=num_hidden[1])
act1 <- mx.symbol.Activation(fc1, name="activ1", act_type=activ)
drop1 <- mx.symbol.Dropout(data=act1,p=drop_out[1])

fc2 <- mx.symbol.FullyConnected(drop1, name="fc2", num_hidden=num_hidden[2])
act2 <- mx.symbol.Activation(fc2, name="activ2", act_type=activ)
drop2 <- mx.symbol.Dropout(data=act2,p=drop_out[2])

fc3 <- mx.symbol.FullyConnected(drop2, name="fc3", num_hidden=num_hidden[3])
act3 <- mx.symbol.Activation(fc3, name="activ3", act_type=activ)
drop3 <- mx.symbol.Dropout(data=act3,p=drop_out[3])

fc4 <- mx.symbol.FullyConnected(drop3, name="fc4", num_hidden=num_hidden[4])
act4 <- mx.symbol.Activation(fc4, name="activ4", act_type=activ)
drop4 <- mx.symbol.Dropout(data=act4,p=drop_out[4])

fc5 <- mx.symbol.FullyConnected(drop4, name="fc5", num_hidden=1)
lro <- mx.symbol.LinearRegressionOutput(fc5)

Now we train the model; note that the first comment shows how to switch to using a GPU instead of a CPU:

# run on cpu, change to 'devices <- mx.gpu()'
# if you have a suitable GPU card
devices <- mx.cpu()
mx.set.seed(0)
tic <- proc.time()
# This actually trains the model
model <- mx.model.FeedForward.create(lro, X = train_X, y = train_Y,
ctx = devices,num.round = num_epochs,
learning.rate = lr, momentum = 0.9,
eval.metric = mx.metric.rmse,
initializer = mx.init.uniform(0.1),
wd=wd,
epoch.end.callback = mx.callback.log.train.metric(1))
print(proc.time() - tic)
user system elapsed
13.90 1.82 10.50

pr4 <- predict(model, test_X)[1,]
rmse <- sqrt(mean((exp(pr4)-exp(testData[,"Y_numeric"]$Y_numeric))^2))
print(sprintf(" Deep Learning Regression RMSE = %1.2f",rmse))
[1] " Deep Learning Regression RMSE = 28.92"
mae <- mean(abs(exp(pr4)-exp(testData[,"Y_numeric"]$Y_numeric)))
print(sprintf(" Deep Learning Regression MAE = %1.2f",mae))
[1] " Deep Learning Regression MAE = 14.33"
rm(data,fc1,act1,fc2,act2,fc3,act3,fc4,lro,model)

For regression metrics, lower is better, so our rmse metric on the deep learning model (28.92) is an improvement on the original regression model (29.30). Interestingly, the mae on the the deep learning model (14.33) is actually worse than the original regression model (13.89). Since rsme penalizes big differences between actual and predicted values more, this indicates that the errors in the deep learning model are less extreme than the regression model.

主站蜘蛛池模板: 万载县| 莆田市| 蓝田县| 沙雅县| 新河县| 莎车县| 云龙县| 永川市| 额尔古纳市| 平定县| 招远市| 桐梓县| 红桥区| 宁国市| 绥阳县| 景泰县| 雅安市| 伊金霍洛旗| 察隅县| 洛隆县| 建昌县| 新民市| 天台县| 苏尼特左旗| 贡觉县| 赤峰市| 垦利县| 花莲市| 海口市| 广丰县| 东方市| 南开区| 华蓥市| 津南区| 武胜县| 金川县| 顺昌县| 杂多县| 朝阳县| 宿松县| 淮安市|