- Natural Language Processing with TensorFlow
- Thushan Ganegedara
- 2021-06-25 21:28:22
Implementing our first neural network
Great! Now that you've learned the architecture, basics, and scoping mechanism of TensorFlow, it's high time that we move on and implement something moderately complex. Let's implement a neural network. More precisely, we will implement the fully connected neural network model that we discussed in Chapter 1, Introduction to Natural Language Processing.
One of the stepping stones to the introduction of neural networks is to implement a neural network that is able to classify digits. For this task, we will be using the famous MNIST dataset made available at http://yann.lecun.com/exdb/mnist/. You might feel a bit skeptical about our use of a computer vision task rather than an NLP task. However, vision tasks can be implemented with less preprocessing and are easy to understand.
As this is our first encounter with neural networks, we will walk through the main parts of the example. However, note that I will only walk through the crucial bits of the exercise. To run the example end to end, you can find the full exercise in the tensorflow_introduction.ipynb file in the ch2 folder.
Preparing the data
First, we need to download the dataset with the maybe_download(...) function and preprocess it with the read_mnist(...) function. These two functions are defined in the exercise file. The read_mnist(...) function performs two main steps:
- Reading the byte stream of the dataset and forming it into a proper numpy.ndarray object
- Standardizing the images to have zero mean and unit variance (sometimes loosely referred to as whitening)
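The standardization step can be illustrated in isolation. The following is a minimal sketch, assuming a tiny made-up array in place of the real MNIST pixels; the standardize helper and the raw array are hypothetical names used only for this illustration:

```python
import numpy as np

# Hypothetical stand-in for raw MNIST pixels: 4 "images" of 6 pixels each, values in [0, 255]
raw = np.array([[0, 0, 255, 255, 128, 64],
                [12, 200, 31, 0, 0, 255],
                [255, 255, 255, 0, 0, 0],
                [90, 45, 180, 10, 220, 5]], dtype=np.uint8)

def standardize(images):
    """Shift and scale the whole array so that it has mean 0 and standard deviation 1."""
    images = images.astype(np.float32)
    return (images - np.mean(images)) / np.std(images)

std_images = standardize(raw)
# The mean should be close to 0 and the standard deviation close to 1
print(std_images.mean(), std_images.std())
```

Note that the statistics are computed over the entire array rather than per pixel or per image, which mirrors the single np.mean and np.std calls used in read_mnist(...).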
The following code shows the read_mnist(...) function. It takes the filename of the file containing the images and the filename of the file containing the labels as input, and produces two NumPy arrays containing all the images and their corresponding labels:
import gzip
import struct
import numpy as np

def read_mnist(fname_img, fname_lbl):
    print('\nReading files %s and %s'%(fname_img, fname_lbl))

    with gzip.open(fname_img) as fimg:
        # The image file starts with a 16-byte header of four big-endian integers
        magic, num, rows, cols = struct.unpack(">IIII", fimg.read(16))
        print(num,rows,cols)
        img = (np.frombuffer(fimg.read(num*rows*cols), dtype=np.uint8).reshape(num, rows * cols)).astype(np.float32)
        print('(Images) Returned a tensor of shape ',img.shape)

    # Standardizing the images
    img = (img - np.mean(img))/np.std(img)

    with gzip.open(fname_lbl) as flbl:
        # flbl.read(8) reads up to 8 bytes
        magic, num = struct.unpack(">II", flbl.read(8))
        lbl = np.frombuffer(flbl.read(num), dtype=np.int8)
        print('(Labels) Returned a tensor of shape: %s'%lbl.shape)
        print('Sample labels: ',lbl[:10])

    return img, lbl
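To see what the struct.unpack(">IIII", ...) call in read_mnist(...) does without downloading anything, here is a small sketch that builds a synthetic, gzip-compressed image file in memory and reads its header back. The 2 x 2 x 3 dimensions and the pixel bytes are made up for illustration; 2051 is the magic number used by the MNIST image files:

```python
import gzip
import io
import struct

# Build a tiny synthetic image file: a 16-byte big-endian header followed by raw pixel bytes
# (2 images of 2 x 3 pixels; these sizes are assumptions chosen only for this example)
header = struct.pack(">IIII", 2051, 2, 2, 3)
pixels = bytes(range(12))
compressed = gzip.compress(header + pixels)

# gzip.open accepts an existing file object as well as a filename
with gzip.open(io.BytesIO(compressed)) as fimg:
    # ">IIII" means four big-endian unsigned 32-bit integers
    magic, num, rows, cols = struct.unpack(">IIII", fimg.read(16))
    data = fimg.read(num * rows * cols)

print(magic, num, rows, cols, len(data))
```

The header tells the reader how many bytes to consume next, which is why read_mnist(...) can pass num*rows*cols directly to fimg.read(...).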
Defining the TensorFlow graph
To define the TensorFlow graph, we'll first define placeholders for the input images (tf_inputs) and the corresponding labels (tf_labels):
# Defining inputs and outputs
tf_inputs = tf.placeholder(shape=[batch_size, input_size], dtype=tf.float32, name = 'inputs')
tf_labels = tf.placeholder(shape=[batch_size, num_labels], dtype=tf.float32, name = 'labels')
Next, we'll write a Python function that creates the variables for the first time. Note that we use scoping to make the variables reusable and to ensure that they are named properly:
# Defining the TensorFlow variables
def define_net_parameters():
    with tf.variable_scope('layer1'):
        tf.get_variable(WEIGHTS_STRING, shape=[input_size,500], initializer=tf.random_normal_initializer(0,0.02))
        tf.get_variable(BIAS_STRING, shape=[500], initializer=tf.random_uniform_initializer(0,0.01))

    with tf.variable_scope('layer2'):
        tf.get_variable(WEIGHTS_STRING, shape=[500,250], initializer=tf.random_normal_initializer(0,0.02))
        tf.get_variable(BIAS_STRING, shape=[250], initializer=tf.random_uniform_initializer(0,0.01))

    with tf.variable_scope('output'):
        tf.get_variable(WEIGHTS_STRING, shape=[250,10], initializer=tf.random_normal_initializer(0,0.02))
        tf.get_variable(BIAS_STRING, shape=[10], initializer=tf.random_uniform_initializer(0,0.01))
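As a quick sanity check on the shapes above, the number of trainable parameters can be worked out by hand. The sketch below assumes input_size is 784 (MNIST images are 28 x 28 pixels); the layer_shapes dictionary and count_parameters helper are hypothetical names introduced only for this calculation:

```python
# Layer sizes taken from define_net_parameters(); input_size = 784 is an assumption
# based on MNIST images being 28 x 28 pixels
input_size = 28 * 28
layer_shapes = {
    'layer1': (input_size, 500),
    'layer2': (500, 250),
    'output': (250, 10),
}

def count_parameters(shapes):
    """Return weights + biases for each fully connected layer."""
    return {name: n_in * n_out + n_out for name, (n_in, n_out) in shapes.items()}

per_layer = count_parameters(layer_shapes)
total = sum(per_layer.values())
print(per_layer)
print(total)
```

Under these assumptions, most of the parameters sit in the first layer, because it connects every input pixel to each of the 500 hidden units.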
Next, we'll define the inference process for the neural network. Note how the scoping has given a very intuitive flow to the code in the function, compared with using variables without scoping. So, in this network we have three layers:
- A fully-connected layer with ReLU activation (layer1)
- A fully-connected layer with ReLU activation (layer2)
- A fully-connected softmax layer (output)
By means of scoping, we name the variables (weights and biases) for each layer as layer1/weights, layer1/bias, layer2/weights, layer2/bias, output/weights, and output/bias. Note that in the code, all of them have the same name, but different scopes:
# Defining calculations in the neural network
# starting from inputs to logits
# logits are the values before applying softmax to the final output
def inference(x):
    # calculations for layer 1
    with tf.variable_scope('layer1',reuse=True):
        w,b = tf.get_variable(WEIGHTS_STRING), tf.get_variable(BIAS_STRING)
        tf_h1 = tf.nn.relu(tf.matmul(x,w) + b, name = 'hidden1')

    # calculations for layer 2
    with tf.variable_scope('layer2',reuse=True):
        w,b = tf.get_variable(WEIGHTS_STRING), tf.get_variable(BIAS_STRING)
        tf_h2 = tf.nn.relu(tf.matmul(tf_h1,w) + b, name = 'hidden2')

    # calculations for output layer
    with tf.variable_scope('output',reuse=True):
        w,b = tf.get_variable(WEIGHTS_STRING), tf.get_variable(BIAS_STRING)
        tf_logits = tf.nn.bias_add(tf.matmul(tf_h2,w), b, name = 'logits')

    return tf_logits
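The same three-layer computation can be mirrored in plain NumPy, which makes the shapes easy to inspect outside of a TensorFlow session. This is only an illustrative sketch: inference_np, init_layer, and the params dictionary are hypothetical names, and the batch size of 4 and input size of 784 are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, input_size = 4, 784  # assumed values for illustration

def init_layer(n_in, n_out):
    """Random weights and biases, loosely following the initializers used in the text."""
    w = rng.normal(0.0, 0.02, size=(n_in, n_out))
    b = rng.uniform(0.0, 0.01, size=(n_out,))
    return w, b

params = {
    'layer1': init_layer(input_size, 500),
    'layer2': init_layer(500, 250),
    'output': init_layer(250, 10),
}

def inference_np(x, params):
    """Compute logits: two ReLU layers followed by a linear output layer."""
    w, b = params['layer1']
    h1 = np.maximum(x @ w + b, 0.0)   # ReLU
    w, b = params['layer2']
    h2 = np.maximum(h1 @ w + b, 0.0)  # ReLU
    w, b = params['output']
    return h2 @ w + b                 # logits (softmax is applied later)

x = rng.normal(size=(batch_size, input_size))
logits = inference_np(x, params)
print(logits.shape)
```

Each row of the result holds ten unnormalized scores, one per digit class, for the corresponding input in the batch.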
Now we'll define a loss function and then a loss minimize operation. The loss minimize operation reduces the loss by nudging the network parameters in the direction that decreases it. There is a diverse collection of optimizers available in TensorFlow. Here, we will be using MomentumOptimizer, which often converges faster and reaches a better final accuracy than GradientDescentOptimizer:
# defining the loss
tf_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=inference(tf_inputs), labels=tf_labels))

# defining the optimize function
tf_loss_minimize = tf.train.MomentumOptimizer(momentum=0.9,learning_rate=0.01).minimize(tf_loss)
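To build some intuition for what a momentum-based update does, here is a minimal sketch of the standard momentum rule (an accumulator of past gradients that is then used to update the parameter) applied to a toy one-dimensional loss, w squared. The momentum_minimize helper and the toy loss are assumptions made for illustration and are not part of the exercise file:

```python
def momentum_minimize(grad_fn, w, learning_rate=0.01, momentum=0.9, steps=200):
    """Minimize a scalar function with the standard momentum update rule."""
    accum = 0.0
    for _ in range(steps):
        # Accumulate a decaying sum of past gradients
        accum = momentum * accum + grad_fn(w)
        # Move the parameter against the accumulated gradient
        w = w - learning_rate * accum
    return w

# Toy loss f(w) = w**2, whose gradient is 2*w and whose minimum is at w = 0
w_final = momentum_minimize(lambda w: 2.0 * w, w=1.0)
print(w_final)
```

Because the accumulator keeps part of the previous updates, consecutive gradients pointing in the same direction reinforce each other, which is one reason momentum can converge faster than plain gradient descent with the same learning rate.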
Finally, we'll define an operation to retrieve the predicted softmax probabilities for a given batch of inputs. This in turn will be used to calculate the accuracy of your neural network:
# defining predictions
tf_predictions = tf.nn.softmax(inference(tf_inputs))
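The softmax operation turns each row of logits into a probability distribution over the classes. A minimal NumPy sketch is shown here, with a hypothetical softmax helper and made-up logits for three classes:

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row maximum avoids overflow in exp."""
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)

# Made-up logits for a batch of two inputs and three classes
logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
probs = softmax(logits)
print(probs)
print(probs.sum(axis=1))
```

The largest logit in a row always receives the largest probability, so taking the argmax of the probabilities gives the predicted class.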
Running the neural network
Now we have all the essential operations required to run the neural network and examine whether it's capable of learning to successfully classify digits:
for epoch in range(NUM_EPOCHS):
    train_loss = []

    # Training Phase
    for step in range(train_inputs.shape[0]//batch_size):
        # Creating one-hot encoded labels with labels
        # One-hot encoding digit 3 for 10-class MNIST dataset
        # will result in
        # [0,0,0,1,0,0,0,0,0,0]
        labels_one_hot = np.zeros((batch_size, num_labels),dtype=np.float32)
        labels_one_hot[np.arange(batch_size),train_labels[step*batch_size:(step+1)*batch_size]] = 1.0

        # Running the optimization process
        loss, _ = session.run([tf_loss,tf_loss_minimize],feed_dict={
            tf_inputs: train_inputs[step*batch_size: (step+1)*batch_size,:],
            tf_labels: labels_one_hot})
        train_loss.append(loss) # Used to average the loss for a single epoch

    test_accuracy = []
    # Testing Phase
    for step in range(test_inputs.shape[0]//batch_size):
        test_predictions = session.run(tf_predictions,feed_dict={tf_inputs: test_inputs[step*batch_size: (step+1)*batch_size,:]})
        batch_test_accuracy = accuracy(test_predictions,test_labels[step*batch_size: (step+1)*batch_size])
        test_accuracy.append(batch_test_accuracy)

    print('Average train loss for the %d epoch: %.3f\n'%(epoch+1,np.mean(train_loss)))
    print('\tAverage test accuracy for the %d epoch: %.2f\n'%(epoch+1,np.mean(test_accuracy)*100.0))
In this code, accuracy(test_predictions,test_labels) is a function that takes some predictions and labels as inputs and returns the accuracy (the fraction of predictions that match the actual labels). It is defined in the exercise file.
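Since the exercise file's accuracy function is not reproduced here, the following is one plausible way it could be written, together with the one-hot encoding trick used in the training loop. Both one_hot and this version of accuracy are assumptions for illustration, not the author's actual definitions:

```python
import numpy as np

def one_hot(labels, num_labels):
    """Encode integer labels as rows with a single 1.0 at the label's index."""
    encoded = np.zeros((len(labels), num_labels), dtype=np.float32)
    # Fancy indexing: for row i, set column labels[i] to 1.0
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

def accuracy(predictions, labels):
    """Fraction of rows whose highest-probability class equals the true label."""
    return np.mean(np.argmax(predictions, axis=1) == labels)

labels = np.array([3, 0, 9])
targets = one_hot(labels, 10)
print(targets[0])

# Pretend predictions: the network is confident about classes 3, 0 and 1,
# so the first two match the labels and the last one does not
predictions = one_hot(np.array([3, 0, 1]), 10)
acc = accuracy(predictions, labels)
print(acc)
```

Returning a fraction rather than a percentage is consistent with the training loop, which multiplies the averaged accuracy by 100.0 before printing it.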
If successful, you should see behavior similar to that shown in Figure 2.10. After 50 epochs, the test accuracy should reach approximately 98%:

Figure 2.10: Training loss and test accuracy for the MNIST digit classification task