Neural network code
While the web application is useful for seeing the output of the neural network, we can also run the code itself to really see how it works. The code in Chapter3/nnet.R allows us to do just that: it uses the same hyper-parameters as the web application, but lets you run the neural network from the RStudio IDE. The following is the code that loads the data and sets the initial hyper-parameters for the neural network:
source("nnet_functions.R")
data_sel <- "bulls_eye"
........
####################### neural network ######################
hidden <- 3
epochs <- 3000
lr <- 0.5
activation_ftn <- "sigmoid"
df <- getData(data_sel) # from nnet_functions
X <- as.matrix(df[,1:2])
Y <- as.matrix(df$Y)
n_x=ncol(X)
n_h=hidden
n_y=1
m <- nrow(X)
This code should not be too difficult to understand: it loads a dataset and sets some variables. The data is created by the getData function in the Chapter3/nnet_functions.R file, which in turn uses functions from the clusterSim package. The Chapter3/nnet_functions.R file also contains the core functionality of our neural network, which we will look at here. Once we have loaded our data, the next step is to initialize our weights and biases. The hidden variable controls the number of nodes in the hidden layer; we set it to 3. We need two sets of weights and biases: one for the hidden layer and one for the output layer:
# initialise weights
set.seed(42)
weights1 <- matrix(0.01*runif(n_h*n_x)-0.005, ncol=n_x, nrow=n_h) # small random values in (-0.005, 0.005)
weights2 <- matrix(0.01*runif(n_y*n_h)-0.005, ncol=n_h, nrow=n_y)
bias1 <- matrix(rep(0,n_h), nrow=n_h, ncol=1)                     # biases start at zero
bias2 <- matrix(rep(0,n_y), nrow=n_y, ncol=1)
This creates the weight and bias matrices for the hidden layer (weights1, bias1) and the output layer (weights2, bias2). We need to ensure that our matrices have the correct dimensions; for example, the weights1 matrix should have the same number of columns as the input layer has nodes and the same number of rows as the hidden layer has nodes.
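As a quick check (a sketch using the values set earlier: n_x = 2, n_h = 3, n_y = 1), we can confirm the shapes directly:
dim(weights1)   # 3 2  (n_h rows, n_x columns)
dim(weights2)   # 1 3  (n_y rows, n_h columns)
dim(bias1)      # 3 1
dim(bias2)      # 1 1
Now we move on to the actual processing loop of the neural network: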
for (i in 0:epochs)
{
  activation2 <- forward_prop(t(X), activation_ftn, weights1, bias1, weights2, bias2)
  cost <- cost_f(activation2, t(Y))
  backward_prop(t(X), t(Y), activation_ftn, weights1, weights2, activation1, activation2)
  weights1 <- weights1 - (lr * dweights1)
  bias1 <- bias1 - (lr * dbias1)
  weights2 <- weights2 - (lr * dweights2)
  bias2 <- bias2 - (lr * dbias2)
  if ((i %% 500) == 0)
    print (paste(" Cost after",i,"epochs =",cost))
}
[1] " Cost after 0 epochs = 0.693147158995952"
[1] " Cost after 500 epochs = 0.69314587328381"
[1] " Cost after 1000 epochs = 0.693116915341439"
[1] " Cost after 1500 epochs = 0.692486724429629"
[1] " Cost after 2000 epochs = 0.687107068792801"
[1] " Cost after 2500 epochs = 0.660418522655335"
[1] " Cost after 3000 epochs = 0.579832913091798"
We first run the forward-propagation function and then calculate the cost. We then call a backward-propagation step that calculates our derivatives (dweights1, dbias1, dweights2, dbias2). We use these, together with our learning rate (lr), to update the weights and biases (weights1, bias1, weights2, bias2). We run this loop for the number of epochs (3000) and print a diagnostic message every 500 epochs. This is how every neural network and deep learning model is trained: run forward-propagation, calculate the cost, use back-propagation to calculate the derivatives, update the weights with those derivatives, and repeat.
Now let's look at some of the functions in the nnet_functions.R file. The following is the forward propagation function:
forward_prop <- function(X,activation_ftn,weights1,bias1,weights2,bias2)
{
  # broadcast hack: repeat each bias vector once per column (i.e. per instance)
  bias1a <- bias1
  for (i in 2:ncol(X))
    bias1a <- cbind(bias1a,bias1)
  Z1 <<- weights1 %*% X + bias1a
  activation1 <<- activation_function(activation_ftn,Z1)
  bias2a <- bias2
  for (i in 2:ncol(activation1))
    bias2a <- cbind(bias2a,bias2)
  Z2 <<- weights2 %*% activation1 + bias2a
  activation2 <<- sigmoid(Z2)
  return (activation2)
}
The two for loops expand each bias vector into a matrix by repeating the vector once per column, that is, once per training instance; this is a workaround because R does not broadcast the addition automatically. The interesting code starts with the Z1 assignment: Z1 is a matrix multiplication followed by an addition, and we then call the activation_function function on that value. We use that output (activation1) to perform a similar operation for Z2. Finally, we apply a sigmoid activation to our output layer because our problem is binary classification.
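As an aside, the same bias addition can be written without the loops. The following is a sketch (not the book's code), assuming the weights1, bias1, and X objects from the training script are in the workspace; sweep() adds the bias to every column in a single call:
# the loop-based "broadcast hack" and sweep() produce the same matrix
bias1a <- bias1
for (i in 2:ncol(t(X)))
  bias1a <- cbind(bias1a, bias1)
Z1_loop  <- weights1 %*% t(X) + bias1a
Z1_sweep <- sweep(weights1 %*% t(X), 1, as.vector(bias1), "+")
all.equal(Z1_loop, Z1_sweep)   # TRUE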
The following is the code for the activation function; the first parameter decides which function to use (sigmoid, tanh, or relu). The second parameter is the value to be used as input:
activation_function <- function(activation_ftn,v)
{
  if (activation_ftn == "sigmoid")
    res <- sigmoid(v)
  else if (activation_ftn == "tanh")
    res <- tanh(v)
  else if (activation_ftn == "relu")
  {
    v[v<0] <- 0
    res <- v
  }
  else
    res <- sigmoid(v)
  return (res)
}
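To see what each choice does, the following quick sketch evaluates the three activations on the same inputs (it assumes sigmoid() is defined in nnet_functions.R, for example as 1/(1+exp(-x))):
v <- c(-2, 0, 2)
activation_function("sigmoid", v)   # 0.1192029 0.5000000 0.8807971
activation_function("tanh", v)      # -0.9640276  0.0000000  0.9640276
activation_function("relu", v)      # 0 0 2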
The following is the cost function:
cost_f <- function(activation2,Y)
{
  # binary cross-entropy, averaged over all instances
  cost <- -mean((log(activation2) * Y) + (log(1-activation2) * (1-Y)))
  return(cost)
}
As a reminder, the output of the cost function is what we are trying to minimize. There are many types of cost functions; in this application we are using binary cross-entropy. The formula for binary cross-entropy is -1/m ∑ [yᵢ log(ŷᵢ) + (1-yᵢ) log(1-ŷᵢ)]. Our target values (yᵢ) are always either 1 or 0, so for instances where yᵢ = 1, the per-row cost reduces to -log(ŷᵢ). If we have two rows where yᵢ = 1 and our model predicts 1.0 for the first row and 0.0001 for the second row, then the costs for those rows are -log(1) = 0 and -log(0.0001) ≈ 9.21, respectively. We can see that the closer the prediction is to 1 for these rows, the lower the cost value. Similarly, for rows where yᵢ = 0, the per-row cost reduces to -log(1-ŷᵢ), so the closer the prediction is to 0 for these rows, the lower the cost value.
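As a minimal sketch of that arithmetic (not part of the book's code), we can evaluate the per-row costs for the two yᵢ = 1 rows directly:
y_hat <- c(1.0, 0.0001)   # predictions for two rows whose true label is 1
-log(y_hat)               # per-row costs: 0.00000 9.21034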
The following is the code for the backward-propagation function:
backward_prop <- function(X,Y,activation_ftn,weights1,weights2,activation1,activation2)
{
  m <- ncol(Y)
  # output layer: for a sigmoid output with cross-entropy loss,
  # the derivative with respect to Z2 simplifies to (activation2 - Y)
  derivative2 <- activation2 - Y
  dweights2 <<- (derivative2 %*% t(activation1)) / m
  dbias2 <<- rowSums(derivative2) / m
  # hidden layer: propagate the error back through weights2 and the
  # derivative of the hidden layer's activation function
  upd <- derivative_function(activation_ftn,activation1)
  derivative1 <- t(weights2) %*% derivative2 * upd
  dweights1 <<- (derivative1 %*% t(X)) / m
  dbias1 <<- rowSums(derivative1) / m
}
Backward propagation processes the network in reverse, starting at the last hidden layer and finishing at the first hidden layer, that is, working from the output layer back towards the input layer. In our case, we only have one hidden layer, so the function first calculates the error at the output layer and uses it to compute dweights2 and dbias2. It then calculates the derivative of the activation1 value, which was computed during the forward-propagation step. The derivative function is similar to the activation function, but instead of applying the activation function, it calculates that function's derivative. For example, the derivative of sigmoid(x) is sigmoid(x) * (1 - sigmoid(x)). The derivative values of simple functions can be found in any calculus reference or online:
derivative_function <- function(activation_ftn,v)
{
  # note: v is the activation output from forward propagation, so the
  # derivatives are expressed in terms of the activation value itself
  if (activation_ftn == "sigmoid")
    upd <- (v * (1 - v))
  else if (activation_ftn == "tanh")
    upd <- (1 - (v^2))
  else if (activation_ftn == "relu")
    upd <- ifelse(v > 0.0,1,0)
  else
    upd <- (v * (1 - v))
  return (upd)
}
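As a quick sanity check (a sketch, not part of the book's code; it assumes sigmoid() is defined in nnet_functions.R as 1/(1+exp(-x))), the analytic sigmoid derivative used above should agree with a finite-difference approximation:
x <- 0.7
eps <- 1e-6
analytic  <- derivative_function("sigmoid", sigmoid(x))        # sigmoid(x) * (1 - sigmoid(x))
numerical <- (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
c(analytic, numerical)   # both approximately 0.2217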
That's it! A working neural network using basic R code. It can fit complex functions and performs better than logistic regression. You might not grasp all the parts at once; that's OK. The following is a quick recap of the steps, followed by a short sketch of how to use the trained weights to make predictions:
- Run a forward-propagation step, which involves multiplying the weights by the inputs for each layer, adding the biases, applying the activation function, and passing the result to the next layer.
- Evaluate the output from the final layer using the cost function.
- Based on the error rate, use backpropagation to make small adjustments to the weights in the nodes in each layer. The learning rate controls how much of an adjustment we make each time.
- Repeat steps 1-3, maybe thousands of times, until the cost function begins to plateau, which indicates our model is trained.
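The following is a minimal prediction sketch (not from the book's code); it assumes the trained weights1, bias1, weights2, bias2 and the data X and Y from the training script are still in the workspace:
probs <- forward_prop(t(X), activation_ftn, weights1, bias1, weights2, bias2)
preds <- ifelse(probs > 0.5, 1, 0)   # threshold the sigmoid output at 0.5
mean(preds == t(Y))                  # proportion of training rows classified correctly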