
The regression model

The previous section developed a deep learning model for a binary classification task. This section develops a deep learning model for regression analysis, that is, for predicting a continuous numeric value. We use the same dataset as in the binary classification task, but with a different target column. In that task, we wanted to predict whether a customer would return to our stores in the next 14 days. In this task, we want to predict how much a customer will spend in our stores in the next 14 days. We follow a similar process: we load the dataset and prepare it by applying log transformations to the data. The code is in Chapter4/regression.R:

library(readr)  # provides read_csv() and cols()
set.seed(42)
fileName <- "../dunnhumby/predict.csv"
dfData <- read_csv(fileName,
col_types = cols(
.default = col_double(),
CUST_CODE = col_character(),
Y_categ = col_integer())
)
nobs <- nrow(dfData)
train <- sample(nobs, 0.9*nobs)
test <- setdiff(seq_len(nobs), train)
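# exclude the customer ID and both target columns (Y_numeric, Y_categ) from the predictors
predictorCols <- colnames(dfData)[!(colnames(dfData) %in% c("CUST_CODE","Y_numeric","Y_categ"))]

# log-transform the target and predictors; the 0.01 offset avoids log(0)
dfData[, c("Y_numeric",predictorCols)] <- log(0.01+dfData[, c("Y_numeric",predictorCols)])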
trainData <- dfData[train, c(predictorCols,"Y_numeric")]
testData <- dfData[test, c(predictorCols,"Y_numeric")]

xtrain <- model.matrix(Y_numeric~.,trainData)
xtest <- model.matrix(Y_numeric~.,testData)
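The log(0.01 + x) transform keeps zero values finite and compresses the long right tail of customer spend. Below is a minimal sketch of the transform and its inverse on made-up values. The helper names toLog and fromLog are hypothetical and not part of regression.R. The sketch also shows why the metric code that follows can compare exp() of the predictions with exp() of the transformed target directly: the 0.01 offset is present in both terms, so it cancels when they are subtracted.

```r
# Hypothetical helpers mirroring the log(0.01 + x) transform used above
toLog   <- function(x, offset = 0.01) log(offset + x)
fromLog <- function(y, offset = 0.01) exp(y) - offset

spend    <- c(0, 0.5, 12, 250)  # made-up spend values, including a zero
logSpend <- toLog(spend)        # finite even for the zero value
print(round(logSpend, 3))

recovered <- fromLog(logSpend)  # exact inverse: subtract the offset after exp()

# Using exp() alone leaves the offset in both terms, so it cancels in a difference
diffExp <- exp(toLog(250)) - exp(toLog(12))
print(diffExp)
```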

We then perform regression analysis on the data using lm to create a benchmark before creating a deep learning model:

# lm Regression Model
regModel1=lm(Y_numeric ~ .,data=trainData)
pr1 <- predict(regModel1,testData)
# undo the log transform with exp() so the errors are on the original spend scale
rmse <- sqrt(mean((exp(pr1)-exp(testData$Y_numeric))^2))
print(sprintf(" Regression RMSE = %1.2f",rmse))
[1] " Regression RMSE = 29.30"
mae <- mean(abs(exp(pr1)-exp(testData$Y_numeric)))
print(sprintf(" Regression MAE = %1.2f",mae))
[1] " Regression MAE = 13.89"
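RMSE and MAE respond differently to the way errors are distributed, and this matters when comparing models. The illustration below uses made-up numbers, not the dunnhumby data. It compares two sets of predictions that have the same total absolute error: one concentrates the error in a single large miss, and the other spreads it evenly.

```r
# Two standard regression error metrics, written as small functions
rmse <- function(actual, predicted) sqrt(mean((predicted - actual)^2))
mae  <- function(actual, predicted) mean(abs(predicted - actual))

actual    <- c(10, 10, 10, 10)
oneBig    <- c(14, 10, 10, 10)  # a single error of 4
manySmall <- c(11, 11, 11, 11)  # four errors of 1 (same total absolute error)

cat(sprintf("oneBig:    MAE = %.2f, RMSE = %.2f\n", mae(actual, oneBig),    rmse(actual, oneBig)))
cat(sprintf("manySmall: MAE = %.2f, RMSE = %.2f\n", mae(actual, manySmall), rmse(actual, manySmall)))
```

Both prediction sets have an MAE of 1.00. The RMSE is 2.00 for the single large error and 1.00 for the evenly spread errors.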

We output two metrics for our regression task: RMSE and MAE. Both were covered earlier in the chapter. Mean absolute error (MAE) is the average of the absolute differences between the predicted and actual values. Root mean squared error (RMSE) squares the differences before averaging them and takes the square root at the end. As a result, a single large error costs more than several small errors that add up to the same total. Now let's look at the deep learning regression code. First we load the data and define the model:

require(mxnet)
Loading required package: mxnet

# MXNet expects matrices
train_X <- data.matrix(trainData[, predictorCols])
test_X <- data.matrix(testData[, predictorCols])
train_Y <- trainData$Y_numeric

set.seed(42)
# hyper-parameters
num_hidden <- c(256,128,128,64)
drop_out <- c(0.4,0.4,0.4,0.4)
wd=0.00001
lr <- 0.0002
num_epochs <- 100
activ <- "tanh"

# create our model architecture
# using the hyper-parameters defined above
data <- mx.symbol.Variable("data")
fc1 <- mx.symbol.FullyConnected(data, name="fc1", num_hidden=num_hidden[1])
act1 <- mx.symbol.Activation(fc1, name="activ1", act_type=activ)
drop1 <- mx.symbol.Dropout(data=act1,p=drop_out[1])

fc2 <- mx.symbol.FullyConnected(drop1, name="fc2", num_hidden=num_hidden[2])
act2 <- mx.symbol.Activation(fc2, name="activ2", act_type=activ)
drop2 <- mx.symbol.Dropout(data=act2,p=drop_out[2])

fc3 <- mx.symbol.FullyConnected(drop2, name="fc3", num_hidden=num_hidden[3])
act3 <- mx.symbol.Activation(fc3, name="activ3", act_type=activ)
drop3 <- mx.symbol.Dropout(data=act3,p=drop_out[3])

fc4 <- mx.symbol.FullyConnected(drop3, name="fc4", num_hidden=num_hidden[4])
act4 <- mx.symbol.Activation(fc4, name="activ4", act_type=activ)
drop4 <- mx.symbol.Dropout(data=act4,p=drop_out[4])

fc5 <- mx.symbol.FullyConnected(drop4, name="fc5", num_hidden=1)
lro <- mx.symbol.LinearRegressionOutput(fc5)

Now we train the model; note that the first comment shows how to switch to using a GPU instead of a CPU:

# run on cpu, change to 'devices <- mx.gpu()'
# if you have a suitable GPU card
devices <- mx.cpu()
mx.set.seed(0)
tic <- proc.time()
# This actually trains the model
model <- mx.model.FeedForward.create(lro, X = train_X, y = train_Y,
ctx = devices,num.round = num_epochs,
learning.rate = lr, momentum = 0.9,
eval.metric = mx.metric.rmse,
initializer = mx.init.uniform(0.1),
wd=wd,
epoch.end.callback = mx.callback.log.train.metric(1))
print(proc.time() - tic)
user system elapsed
13.90 1.82 10.50
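mx.model.FeedForward.create fits the network by minimizing the squared-error loss attached by mx.symbol.LinearRegressionOutput. It uses stochastic gradient descent, its default optimizer, with the learning.rate, momentum, and wd values passed in. The base-R sketch below applies the same three ingredients to a plain linear model on synthetic data: a gradient step on squared error, an L2 weight-decay term, and momentum. It assumes a textbook heavy-ball form of the update and full-batch gradients. MXNet's own SGD implementation may differ in details such as mini-batching and gradient rescaling, so treat this as an illustration rather than a description of MXNet internals.

```r
set.seed(7)
n <- 200
X <- cbind(1, matrix(rnorm(n * 3), n, 3))  # intercept column + 3 synthetic features
trueBeta <- c(0.5, 2, -1, 0.25)
y <- as.vector(X %*% trueBeta + rnorm(n, sd = 0.1))

lr <- 0.05; momentum <- 0.9; wd <- 1e-5
beta     <- rep(0, ncol(X))
velocity <- rep(0, ncol(X))

for (epoch in 1:500) {
  resid <- as.vector(X %*% beta) - y
  grad  <- as.vector(crossprod(X, resid)) / n  # gradient of mean(resid^2) / 2
  grad  <- grad + wd * beta                    # L2 weight-decay term
  velocity <- momentum * velocity - lr * grad  # momentum accumulates past steps
  beta <- beta + velocity
}

olsBeta <- coef(lm(y ~ X - 1))  # closed-form least-squares fit for comparison
rmseFit <- sqrt(mean((as.vector(X %*% beta) - y)^2))
print(rbind(sgd = beta, lm = as.vector(olsBeta)))
```

With these settings, the fitted coefficients end up very close to the least-squares solution from lm(). The tiny wd value shrinks them only slightly.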

pr4 <- predict(model, test_X)[1,]
rmse <- sqrt(mean((exp(pr4)-exp(testData$Y_numeric))^2))
print(sprintf(" Deep Learning Regression RMSE = %1.2f",rmse))
[1] " Deep Learning Regression RMSE = 28.92"
mae <- mean(abs(exp(pr4)-exp(testData$Y_numeric)))
print(sprintf(" Deep Learning Regression MAE = %1.2f",mae))
[1] " Deep Learning Regression MAE = 14.33"
# remove all the model symbols and the trained model
rm(data,fc1,act1,drop1,fc2,act2,drop2,fc3,act3,drop3,fc4,act4,drop4,fc5,lro,model)

For these error metrics, lower is better. The RMSE of the deep learning model (28.92) is therefore an improvement on the original regression model (29.30). Interestingly, the MAE of the deep learning model (14.33) is actually worse than that of the original regression model (13.89). RMSE penalizes large differences between actual and predicted values more heavily than MAE does. A lower RMSE combined with a higher MAE therefore suggests that the deep learning model makes fewer extreme errors than the regression model, even though its errors are slightly larger on average.
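The two metrics can rank models in opposite orders. The sketch below uses hypothetical error vectors, not errors from either model above. "Steady" errors are moderate everywhere, while "spiky" errors are mostly zero with one large miss.

```r
# Hypothetical prediction errors (predicted - actual) for two models
errSteady <- c(2, -2, 2, -2)  # every prediction off by a moderate amount
errSpiky  <- c(0, 0, 0, 5)    # mostly exact, with one large miss

metrics <- function(err) c(MAE = mean(abs(err)), RMSE = sqrt(mean(err^2)))
mSteady <- metrics(errSteady)
mSpiky  <- metrics(errSpiky)
print(rbind(steady = mSteady, spiky = mSpiky))
```

The spiky profile has the lower MAE (1.25 versus 2.00) but the higher RMSE (2.50 versus 2.00). This is the same pattern the lm model shows relative to the deep learning model.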
