
Data understanding and preparation

To start, we will install any packages that are missing and load the ones we need into the environment. The data is in the MASS package:

> library(magrittr)

> install.packages("caret")

> install.packages("MASS")

> library(MASS)

> install.packages("neuralnet")

> install.packages("vtreat")

The neuralnet package will be used for building the model and caret for data preparation. Let's load the data and examine its structure:

> data(shuttle)

> str(shuttle)

The data consists of 256 observations of 7 variables: six features plus the response. Notice that all of the variables are categorical. The response is use, with two levels, auto and noauto, and the features are as follows:

  • stability: This is stable positioning or not (stab / xstab)
  • error: This is the size of the error (MM / SS / LX / XL)
  • sign: This is the sign of the error, positive or negative (pp / nn)
  • wind: This is the wind sign (head / tail)
  • magn: This is the wind strength (Light / Medium / Strong / Out, that is, out of range)
  • vis: This is the visibility (yes / no)
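One way to check the description above against the data itself is to list the levels of every column. This is a minimal sketch that assumes only that the MASS package is installed; the object names shuttle_levels and all_factors are illustrative and not part of the original code:

```r
library(MASS)   # the shuttle data ships with MASS
data(shuttle)

# List the levels of every column, response included
shuttle_levels <- lapply(shuttle, levels)
print(shuttle_levels)

# Confirm that every column is stored as a factor
all_factors <- all(vapply(shuttle, is.factor, logical(1)))
print(all_factors)
```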

Here, we will look at a table of the response/outcome:

> table(shuttle$use)
  auto noauto
   145    111
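The counts can also be turned into shares with prop.table(). The snippet below is a small sketch of that arithmetic; use_share is an illustrative name introduced here:

```r
library(MASS)
data(shuttle)

# Convert the class counts into proportions, rounded to three decimals
use_share <- round(prop.table(table(shuttle$use)), 3)
print(use_share)
```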

Almost 57% of the time, the decision is to use the autolander. We'll now get our training and testing data set up for modeling:

> set.seed(1942)

> trainIndex <- caret::createDataPartition(shuttle$use, p = .6, list = FALSE)

> shuttleTrain <- shuttle[trainIndex, -7]

> shuttleTest <- shuttle[-trainIndex, -7]
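Passing the response to createDataPartition() is meant to keep the class balance similar in both partitions. The base R sketch below illustrates the general idea of sampling within each class. It is not caret's implementation, and stratified_index() is a hypothetical helper written only for this illustration, so the rows it selects will differ from those in trainIndex:

```r
library(MASS)
data(shuttle)

set.seed(1942)

# Hypothetical helper: draw a fraction p of the row indices within each class
stratified_index <- function(y, p) {
  idx_by_class <- split(seq_along(y), y)
  picked <- lapply(idx_by_class, function(idx) {
    idx[sample.int(length(idx), size = ceiling(p * length(idx)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

idx <- stratified_index(shuttle$use, p = 0.6)

# Compare the share of auto in the full data and in each partition
share_all   <- mean(shuttle$use == "auto")
share_train <- mean(shuttle$use[idx] == "auto")
share_test  <- mean(shuttle$use[-idx] == "auto")
print(round(c(all = share_all, train = share_train, test = share_test), 3))
```

If the three shares are close to one another, the split has preserved the class balance.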

We are going to treat the data to create numeric features, and also drop the cat_P features that the function creates. We covered the idea of treating a dataframe in Chapter 1, Preparing and Understanding Data:

> treatShuttle <- vtreat::designTreatmentsZ(shuttleTrain, colnames(shuttleTrain))

> train_treated <- vtreat::prepare(treatShuttle, shuttleTrain)

> train_treated <- train_treated[, c(-1, -2)]

> test_treated <- vtreat::prepare(treatShuttle, shuttleTest)

> test_treated <- test_treated[, c(-1, -2)]
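To see what turning categorical features into numeric ones looks like, here is a base R sketch that builds one indicator column per factor level with model.matrix(). This is a different, simpler technique than vtreat and will not reproduce the columns of train_treated; the names features, full_contrasts, and indicators are assumptions made for this illustration:

```r
library(MASS)
data(shuttle)

# Features only: drop the response in column 7
features <- shuttle[, -7]

# Ask for full indicator coding so that no reference level is dropped
full_contrasts <- lapply(features, contrasts, contrasts = FALSE)

# Build the indicator matrix without an intercept column
indicators <- model.matrix(~ . - 1, data = features,
                           contrasts.arg = full_contrasts)

print(dim(indicators))
print(colnames(indicators))
```

Each row should contain exactly one 1 per original feature, which is a quick sanity check on the encoding.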

I find the next couple of portions of code awkward. Because neuralnet() requires a formula and the data in a dataframe, we have to turn the response into a numeric vector and then add it to our treated train and test data:

> shuttle_trainY <- shuttle[trainIndex, 7]

> train_treated$y <- ifelse(shuttle_trainY == "auto", 1, 0)

> shuttle_testY <- shuttle[-trainIndex, 7]

> test_treated$y <- ifelse(shuttle_testY == "auto", 1, 0)
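The recoding step can be tried on a toy factor to confirm that it behaves as intended. In this sketch, the vector use_toy and the comparison with as.numeric() are illustrative additions, not part of the original workflow:

```r
# Toy response with the same two levels as shuttle$use
use_toy <- factor(c("auto", "noauto", "auto", "noauto", "auto"))

# Same recoding as above: auto becomes 1, noauto becomes 0
y_toy <- ifelse(use_toy == "auto", 1, 0)
print(y_toy)

# An equivalent form that converts the logical comparison directly
y_alt <- as.numeric(use_toy == "auto")
print(identical(y_toy, y_alt))
```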

The function in neuralnet calls for a formula, as we used elsewhere, such as y ~ x1 + x2 + x3 + x4, data = df. In the past, we used y ~ . to specify all the other variables in the data as inputs. However, neuralnet does not accommodate this at the time of writing. The way around this limitation is to use the as.formula() function. After first creating an object of the variable names, we paste them together, separated by +, to build the right-hand side of the formula:

> n <- names(train_treated)

> form <- as.formula(paste("y ~", paste(n[!n %in% "y"], collapse = " + ")))

The object form gives us what we need to build our model.
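The formula construction can be checked on a small dataframe. The sketch below uses a made-up dataframe, toy, with illustrative column names, and also shows base R's reformulate() as an alternative that should produce the same formula:

```r
# Made-up dataframe standing in for train_treated
toy <- data.frame(x1 = c(0, 1), x2 = c(1, 0), x3 = c(1, 1), y = c(1, 0))

n <- names(toy)
predictors <- n[!n %in% "y"]

# Same approach as above: paste the names together, then convert
form_paste <- as.formula(paste("y ~", paste(predictors, collapse = " + ")))
print(form_paste)

# Alternative: reformulate() assembles the formula from the names directly
form_reform <- reformulate(predictors, response = "y")
print(deparse(form_paste) == deparse(form_reform))
```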
