- Advanced Machine Learning with R
- Cory Lesmeister Dr. Sunil Kumar Chinnamgari
- 2021-06-24 14:24:45
Data understanding and preparation
To start, we will install the necessary packages and load the ones we need into the environment. The data is in the MASS package:
> library(magrittr)
> install.packages("caret")
> install.packages("MASS")
> library(MASS)
> install.packages("neuralnet")
> install.packages("vtreat")
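Calling install.packages() on every run downloads packages that may already be present. A minimal sketch of one way to avoid that is shown here; missing_pkgs() is a hypothetical helper introduced for illustration and is not part of the book's code. It checks each package with requireNamespace() and only offers to install what is absent:

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed.
# Note that requireNamespace() loads the namespace of any package it finds.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("magrittr", "caret", "MASS", "neuralnet", "vtreat")
to_install <- missing_pkgs(needed)

# Only install from an interactive session, so a scripted run never
# triggers downloads on its own.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

The packages still need to be attached with library(), or called with the pkg::function() form, as the rest of this section does.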
The neuralnet package will be used for building the model and caret for data preparation. Let's load the data and examine its structure:
> data(shuttle)
> str(shuttle)
The data consists of 256 observations of 7 variables: six features and the response. Notice that all of the features are categorical and that the response is use, with two levels, auto and noauto. The features are as follows:
- stability: This is stable positioning or not (stab/xstab)
- error: This is the size of the error (MM / SS / LX / XL)
- sign: This is the sign of the error, positive or negative (pp/nn)
- wind: This is the wind sign (head / tail)
- magn: This is the wind strength (Light / Medium / Strong / Out of Range)
- vis: This is the visibility (yes / no)
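The level labels listed above can be read directly from the data rather than from the documentation. The following is a small sketch that assumes only that the MASS package is installed:

```r
library(MASS)   # the shuttle data ships with MASS
data(shuttle)

# Number of levels in each column
sapply(shuttle, nlevels)

# The level labels themselves, one character vector per column
lapply(shuttle, levels)
```

Checking the labels this way is useful before encoding, because the spelling of each level is what later comparisons such as == "auto" depend on.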
Here, we will look at a table of the response/outcome:
> table(shuttle$use)
auto noauto
145 111
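The counts can also be expressed as proportions, which makes the class balance easier to read. A short sketch using base R's prop.table(); the object name use_share is illustrative:

```r
library(MASS)
data(shuttle)

# Share of each response level, rounded to three decimals
use_share <- round(prop.table(table(shuttle$use)), 3)
use_share
```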
Almost 57% of the time, the decision is to use the autolander. We'll now get our training and testing data set up for modeling:
> set.seed(1942)
> trainIndex <-
caret::createDataPartition(shuttle$use, p = .6, list = FALSE)
> shuttleTrain <- shuttle[trainIndex, -7]
> shuttleTest <- shuttle[-trainIndex, -7]
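createDataPartition() samples within each level of the response, so the training and test sets keep roughly the same class mix as the full data. The sketch below illustrates that idea in base R. It is a stand-in for explanation only, not the algorithm caret uses internally, and the object names are illustrative:

```r
library(MASS)
data(shuttle)
set.seed(1942)

# Row indices grouped by response level
idx_by_class <- split(seq_len(nrow(shuttle)), shuttle$use)

# Draw about 60% of the rows within each level (illustrative stand-in,
# not caret's internal implementation)
train_idx <- unlist(
  lapply(idx_by_class, function(i) sample(i, size = ceiling(0.6 * length(i)))),
  use.names = FALSE
)

# Compare the class mix in the two partitions
round(prop.table(table(shuttle$use[train_idx])), 3)
round(prop.table(table(shuttle$use[-train_idx])), 3)
```

Because this sampling differs from caret's, the selected rows will not match trainIndex even with the same seed.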
We are going to treat the data to create numeric features, and also drop the catP features that the function creates, which the code below removes by column position. We covered the idea of treating a data frame in Chapter 1, Preparing and Understanding Data:
> treatShuttle <- vtreat::designTreatmentsZ(shuttleTrain, colnames(shuttleTrain))
> train_treated <- vtreat::prepare(treatShuttle, shuttleTrain)
> train_treated <- train_treated[, c(-1,-2)]
> test_treated <- vtreat::prepare(treatShuttle, shuttleTest)
> test_treated <- test_treated[, c(-1, -2)]
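To see what turning categorical features into numeric ones looks like, here is a base-R sketch using model.matrix(). This is an assumption-laden illustration of indicator encoding in general, not what vtreat does internally, and its column names and count will differ from those in train_treated:

```r
library(MASS)
data(shuttle)

# One 0/1 indicator column per factor level; "- 1" drops the intercept.
# With R's default treatment contrasts, only the first factor keeps all of
# its levels; each later factor drops its first level.
X <- model.matrix(~ . - 1, data = shuttle[, -7])

dim(X)
head(X, 3)
```

One difference worth keeping in mind: in the vtreat code above, the treatment plan is designed on shuttleTrain only and then applied to both sets with prepare(), so the test data does not influence the encoding.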
I find the next couple of portions of code awkward. Because neuralnet() requires a formula and the data in a data frame, we have to turn the response into a numeric vector and then add it to our treated train and test data:
> shuttle_trainY <- shuttle[trainIndex, 7]
> train_treated$y <- ifelse(shuttle_trainY == "auto", 1, 0)
> shuttle_testY <- shuttle[-trainIndex, 7]
> test_treated$y <- ifelse(shuttle_testY == "auto", 1, 0)
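A quick way to confirm that the encoding did what was intended is to cross-tabulate the original labels against the numeric codes. This sketch works on the full shuttle data for brevity; the object name y is illustrative:

```r
library(MASS)
data(shuttle)

# Same encoding as the ifelse() call: 1 for "auto", 0 for "noauto"
y <- as.numeric(shuttle$use == "auto")

# Each label should map to exactly one code
table(label = shuttle$use, code = y)
```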
The neuralnet() function calls for a formula as we used elsewhere, such as y ~ x1 + x2 + x3 + x4, data = df. In the past, we used y ~ . to specify all the other variables in the data as inputs. However, neuralnet does not accommodate this at the time of writing. The way around this limitation is to use the as.formula() function. After first creating an object of the variable names, we will use it as an input to paste the variables properly on the right-hand side of the equation:
> n <- names(train_treated)
> form <- as.formula(paste("y ~", paste(n[!n %in% "y"], collapse = " + ")))
The object form gives us what we need to build our model.
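Base R also provides reformulate() in the stats package, which builds a formula from a character vector of term labels without manual pasting. The sketch below compares the two approaches; the column names x1, x2, and x3 are hypothetical stand-ins for names(train_treated):

```r
# Hypothetical column names standing in for names(train_treated)
n <- c("x1", "x2", "x3", "y")

# The paste-based construction from the text
form_paste <- as.formula(paste("y ~", paste(n[!n %in% "y"], collapse = " + ")))

# The same formula built with reformulate()
form_ref <- reformulate(setdiff(n, "y"), response = "y")

deparse(form_paste)
deparse(form_ref)
```

Either object can be passed wherever a formula is expected; which one to use is largely a matter of taste.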