Natural Language Processing with TensorFlow
Thushan Ganegedara
Inputs, variables, outputs, and operations
Now, with an understanding of the underlying architecture, let's proceed to the most common elements that comprise a TensorFlow client. If you read any of the millions of TensorFlow clients available on the internet, the TensorFlow-related code all falls into one of these buckets:
- Inputs: Data used to train and test our algorithms
- Variables: Mutable tensors, mostly defining the parameters of our algorithms
- Outputs: Immutable tensors storing both terminal and intermediate outputs
- Operations: Various transformations for inputs to produce the desired outputs
In our earlier sigmoid example, we can find instances of all these categories. We list these elements in Table 2.1:

The following subsections explain each of these TensorFlow elements in more detail.
Defining inputs in TensorFlow
The client can mainly receive data in three different ways:
- Feeding data at every step of the algorithm with Python code
- Preloading and storing data as TensorFlow tensors
- Building an input pipeline
Let's look at each of these ways.
Feeding data with Python code
In the first method, data can be fed to the TensorFlow client using conventional Python code. In our earlier example, x is an example of this method. To feed data into the client from external data structures (for example, numpy.ndarray), the TensorFlow library provides an elegant symbolic data structure known as a placeholder, defined as tf.placeholder(...). As the name suggests, a placeholder does not require actual data at the graph building stage. Rather, the data is fed only for graph executions invoked with session.run(..., feed_dict={placeholder: value}), by passing the external data to the feed_dict argument in the form of a Python dictionary, where the key is the tf.placeholder variable and the corresponding value is the actual data (for example, numpy.ndarray). The placeholder definition takes the following form (a short usage sketch follows the argument list below):
tf.placeholder(dtype, shape=None, name=None)
The arguments are as follows:
- dtype: This is the data type for the data fed into the placeholder
- shape: This is the shape of the placeholder, given as a 1D vector
- name: This is the name of the placeholder, and it is important for debugging
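As a concrete illustration, the following is a minimal sketch of the placeholder-based approach. It mirrors the earlier sigmoid example (the shapes and names are taken from that example); the random input fed through feed_dict is only for illustration:

import tensorflow as tf
import numpy as np

graph = tf.Graph()
session = tf.InteractiveSession(graph=graph)

# x - a placeholder that will receive data only at execution time
x = tf.placeholder(shape=[1,10], dtype=tf.float32, name='x')
W = tf.Variable(tf.random_uniform(shape=[10,5], minval=-0.1, maxval=0.1, dtype=tf.float32), name='W')
b = tf.Variable(tf.zeros(shape=[5], dtype=tf.float32), name='b')
h = tf.nn.sigmoid(tf.matmul(x,W) + b)

tf.global_variables_initializer().run()

# The actual data is supplied only now, through feed_dict
h_eval = session.run(h, feed_dict={x: np.random.rand(1,10)})
print(h_eval)
session.close()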
Preloading and storing data as tensors
The second method is similar to the first one, but with one less thing to worry about. We do not have to feed data during the graph execution as the data is preloaded. To see this in action, let's modify our sigmoid example. Remember that we defined x as a placeholder:
x = tf.placeholder(shape=[1,10],dtype=tf.float32,name='x')
Instead, let's define this as a tensor that contains specific values:
x = tf.constant(value=[[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0]],dtype=tf.float32,name='x')
Also, the full code would become as follows:
import tensorflow as tf

# Defining the graph and session
graph = tf.Graph() # Creates a graph
session = tf.InteractiveSession(graph=graph) # Creates a session

# Building the graph
# x - A pre-loaded input
x = tf.constant(value=[[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0]], dtype=tf.float32, name='x')
W = tf.Variable(tf.random_uniform(shape=[10,5], minval=-0.1, maxval=0.1, dtype=tf.float32), name='W') # Variable
b = tf.Variable(tf.zeros(shape=[5], dtype=tf.float32), name='b') # Variable
h = tf.nn.sigmoid(tf.matmul(x,W) + b) # Operation to be performed

# Executing operations and evaluating nodes in the graph
tf.global_variables_initializer().run() # Initialize the variables

# Run the operation without feed_dict
h_eval = session.run(h)
print(h_eval)
session.close()
You will notice two main differences from our original sigmoid example. We have defined x in a different way: instead of using a placeholder object and feeding in the actual value at graph execution, we now assign a specific value straightaway and define x as a tensor. Also, as you can see, we do not feed in any extra arguments at session.run(...). However, on the downside, you now cannot feed different values to x at session.run(...) and see how the output changes.
Building an input pipeline
Input pipelines are designed for more heavy-duty clients that need to process a lot of data quickly. This essentially creates a queue that holds data until it is needed. TensorFlow also provides various preprocessing steps (for example, for adjusting image contrast/brightness or standardization) that can be performed before feeding data to the algorithm. To make things even more efficient, it is possible to have multiple threads reading and processing data in parallel.
A typical pipeline will consist of the following components:
- The list of filenames
- A filename queue producing filenames for an input (record) reader
- A record reader for reading the inputs (records)
- A decoder to decode the read records (for example, JPEG image decoding)
- Preprocessing steps (optional)
- An example (that is, decoded inputs) queue
Let's write a quick example input pipeline using TensorFlow. In this example, we have three text files (test1.txt, test2.txt, and test3.txt) in CSV format, each with five lines, and each line has 10 numbers separated by commas (an example line: 0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0). We need to read this data as batches (multiple rows of data vectors) by forming an input pipeline from the files all the way to a tensor representing those inputs in the files. We will go step by step to see what is going on.
Note
For more information, refer to the official TensorFlow page on Importing Data at https://www.tensorflow.org/programmers_guide/reading_data.
First, let's import a few important libraries as before:
import tensorflow as tf
import numpy as np
Next, we'll define the graph and session objects:
graph = tf.Graph() # Creates a graph
session = tf.InteractiveSession(graph=graph) # Creates a session
Then we'll define a filename queue, a queue data structure containing filenames. This will be passed as an argument to a reader (soon to be defined). The queue will produce filenames as requested by the reader, so that the reader can fetch the files with these filenames to read data:
filenames = ['test%d.txt'%i for i in range(1,4)]
filename_queue = tf.train.string_input_producer(filenames, capacity=3, shuffle=True, name='string_input_producer')
Here, capacity is the amount of data held in the queue at a given time, and shuffle tells the queue whether the data should be shuffled before it is output.
TensorFlow has several different types of readers (a list of available readers is available at https://www.tensorflow.org/api_guides/python/io_ops#Readers). As we have a few separate text files where a single line represents a single data point, TextLineReader suits us the best:
reader = tf.TextLineReader()
After defining the reader, we can use the read() function to read data from the files. It outputs (key, value) pairs. The key identifies the file and the record (that is, the line of text) being read within the file. We can omit this. The value returns the actual value of the line read by the reader:
key, value = reader.read(filename_queue, name='text_read_op')
Next, we'll define record_defaults, which will be output if any faulty records are found:
record_defaults = [[-1.0], [-1.0], [-1.0], [-1.0], [-1.0], [-1.0], [-1.0], [-1.0], [-1.0], [-1.0]]
Now we decode the read line of text into numerical columns (as we have CSV files). For this we use the decode_csv() method. You will see that we have 10 columns in a single line if you open a file (for example, test1.txt) with a text editor:
col1, col2, col3, col4, col5, col6, col7, col8, col9, col10 = tf.decode_csv(value, record_defaults=record_defaults)
Then we'll concatenate these columns to form a single tensor (we call this features) that will be passed to another method, tf.train.shuffle_batch(). The tf.train.shuffle_batch() method takes the previously defined tensor (features) and outputs a batch of a given batch size by randomly shuffling the tensor:
features = tf.stack([col1, col2, col3, col4, col5, col6, col7, col8, col9, col10])
x = tf.train.shuffle_batch([features], batch_size=3, capacity=5, name='data_batch', min_after_dequeue=1, num_threads=1)
The batch_size argument is the size of the data batch we'll be sampling at a given step, capacity is the capacity of the data queue (more memory is required for large queues), and min_after_dequeue represents the minimum number of elements to be left in the queue after a dequeue. Finally, num_threads defines how many threads are used to produce a batch of data. If there is a lot of preprocessing taking place in the pipeline, you can increase this number. Also, if you need to read data without shuffling (unlike with tf.train.shuffle_batch), you can use the tf.train.batch operation. Then we'll start this pipeline by calling the following:
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(coord=coord, sess=session)
The tf.train.Coordinator() class can be seen as a thread manager. It implements various mechanisms for managing threads (for example, starting threads and joining threads to the main thread once the task is finished). The tf.train.Coordinator() class is needed because the input pipeline spawns many threads for filling in (that is, enqueuing) queues, dequeuing queues, and many other tasks. Next, we will execute tf.train.start_queue_runners(...) using the thread manager we created before. QueueRunner() holds the enqueue operations for a queue, and these are automatically created during the definition of the input pipeline. So, to fill in the defined queues, we need to start these queue runners with the tf.train.start_queue_runners function.
Next, after the task we're interested in is completed, we need to explicitly stop the threads and join them to the main thread, otherwise the program will hang indefinitely. This is achieved with coord.request_stop() and coord.join(threads). This input pipeline, combined with our sigmoid example so that it reads data from the files directly, would look like the following:
import tensorflow as tf
import numpy as np
import os

# Defining the graph and session
graph = tf.Graph() # Creates a graph
session = tf.InteractiveSession(graph=graph) # Creates a session

### Building the Input Pipeline ###
# The filename queue
filenames = ['test%d.txt'%i for i in range(1,4)]
filename_queue = tf.train.string_input_producer(filenames, capacity=3, shuffle=True, name='string_input_producer')

# Check if all files are there
for f in filenames:
    if not tf.gfile.Exists(f):
        raise ValueError('Failed to find file: ' + f)
    else:
        print('File %s found.'%f)

# Reader which takes a filename queue and
# read() which outputs data one by one
reader = tf.TextLineReader()

# Read the data of the file and output as key,value pairs
# We're discarding the key
key, value = reader.read(filename_queue, name='text_read_op')

# If any problems are encountered while reading a file,
# this is the value returned
record_defaults = [[-1.0], [-1.0], [-1.0], [-1.0], [-1.0], [-1.0], [-1.0], [-1.0], [-1.0], [-1.0]]

# Decoding the read value to columns
col1, col2, col3, col4, col5, col6, col7, col8, col9, col10 = tf.decode_csv(value, record_defaults=record_defaults)

# Now we stack the columns together to form a single tensor containing
# all the columns
features = tf.stack([col1, col2, col3, col4, col5, col6, col7, col8, col9, col10])

# Output x is randomly assigned a batch of data of batch_size
# where the data is read from the .txt files
x = tf.train.shuffle_batch([features], batch_size=3, capacity=5, name='data_batch', min_after_dequeue=1, num_threads=1)

# QueueRunners retrieve data from queues and we need to explicitly start them
# Coordinator coordinates multiple QueueRunners
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(coord=coord, sess=session)

# Building the graph by defining the variables and calculations
W = tf.Variable(tf.random_uniform(shape=[10,5], minval=-0.1, maxval=0.1, dtype=tf.float32), name='W') # Variable
b = tf.Variable(tf.zeros(shape=[5], dtype=tf.float32), name='b') # Variable
h = tf.nn.sigmoid(tf.matmul(x,W) + b) # Operation to be performed

# Executing operations and evaluating nodes in the graph
tf.global_variables_initializer().run() # Initialize the variables

# Calculate h with x and print the results for 5 steps
for step in range(5):
    x_eval, h_eval = session.run([x,h])
    print('========== Step %d =========='%step)
    print('Evaluated data (x)')
    print(x_eval)
    print('Evaluated data (h)')
    print(h_eval)
    print('')

# We also need to explicitly stop the coordinator,
# otherwise the process will hang indefinitely
coord.request_stop()
coord.join(threads)
session.close()
Defining variables in TensorFlow
Variables play an important role in TensorFlow. A variable is essentially a tensor with a specific shape defining how many dimensions the variable will have and the size of each dimension. However, unlike a regular tensor, variables are mutable, meaning that the value of the variable can change after it is defined. This is an ideal property for implementing the parameters of a learning model (for example, neural network weights), where the weights change slightly after each step of learning. For example, if you define a variable with x = tf.Variable(0,dtype=tf.int32), you can change the value of that variable using a TensorFlow operation such as tf.assign(x, x+1). However, if you define a tensor such as x = tf.constant(0,dtype=tf.int32), you cannot change its value, unlike with a variable; it stays 0 until the end of the program's execution.
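The following is a minimal sketch of this mutability; the variable and operation names here are assumed for illustration:

import tensorflow as tf

session = tf.InteractiveSession()

x = tf.Variable(0, dtype=tf.int32)       # a mutable variable starting at 0
increment_op = tf.assign(x, x+1)         # an operation that writes x+1 back into x

tf.global_variables_initializer().run()  # initialize the variable
print(session.run(x))                    # prints 0
session.run(increment_op)                # run the assignment
print(session.run(x))                    # prints 1
session.close()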
Variable creation is quite simple. In our example, we already created two variables, W and b. When creating a variable, a few things are of high importance. We list them here and discuss each in detail in the following paragraphs:
- Variable shape
- Data type
- Initial value
- Name (optional)
The variable shape is a 1D vector of the [x,y,z,...] format. Each value in the list indicates how large the corresponding dimension or axis is. For instance, if you require a 2D tensor with 50 rows and 10 columns as the variable, the shape would be equal to [50,10].
The dimensionality of the variable (that is, the length of the shape vector) is recognized as the rank of the tensor in TensorFlow. Do not confuse this with the rank of a matrix.
Note
Tensor rank in TensorFlow indicates the dimensionality of the tensor; for a two-dimensional matrix, rank = 2.
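As a quick sketch (the variable name here is assumed for illustration), a [50,10] variable has rank 2:

# A 2D variable with 50 rows and 10 columns
v = tf.Variable(tf.zeros(shape=[50,10], dtype=tf.float32), name='v')
print(v.get_shape())       # => (50, 10)
print(len(v.get_shape()))  # => 2, the rank of the tensor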
The data type plays an important role in determining the size of a variable. There are many different data types, including the commonly used tf.bool, tf.uint8, tf.float32, and tf.int32. Each data type has a number of bits required to represent a single value of that type. For example, tf.uint8 requires 8 bits, whereas tf.float32 requires 32 bits. It is common practice to use the same data type throughout a computation, as mixing types leads to data type mismatches. So if you have two tensors with different data types that you need to combine, you have to explicitly convert one tensor to the other tensor's type using the tf.cast(...) operation. For example, if you have an x variable with the tf.int32 type that needs to be converted to tf.float32, employ tf.cast(x, dtype=tf.float32) to convert x to tf.float32.
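Here is a minimal sketch of such a conversion; the tensor names and values are assumed for illustration:

x_int = tf.constant([1, 2, 3], dtype=tf.int32)
y_float = tf.constant([0.5, 0.5, 0.5], dtype=tf.float32)

# Adding x_int and y_float directly would raise a type error,
# so we cast x_int to tf.float32 first
z = tf.cast(x_int, dtype=tf.float32) + y_float  # => [1.5, 2.5, 3.5]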
Next, a variable requires an initial value to be initialized with. TensorFlow provides several different initializers for our convenience, including constant initializers and normal distribution initializers. Here are a few popular TensorFlow initializers you can use to initialize variables (a short sketch of their use follows the list):
- tf.zeros
- tf.constant_initializer
- tf.random_uniform
- tf.truncated_normal
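The following is a minimal sketch of these initializers in use; the shapes, values, and variable names here are assumed for illustration:

# Filled with zeros
w1 = tf.Variable(tf.zeros(shape=[5,2], dtype=tf.float32), name='w1')
# tf.constant_initializer is typically passed to tf.get_variable
w2 = tf.get_variable('w2', shape=[5,2], initializer=tf.constant_initializer(0.1))
# Sampled uniformly from [-0.1, 0.1)
w3 = tf.Variable(tf.random_uniform(shape=[5,2], minval=-0.1, maxval=0.1), name='w3')
# Sampled from a normal distribution truncated at two standard deviations
w4 = tf.Variable(tf.truncated_normal(shape=[5,2], stddev=0.1), name='w4')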
Finally, the name of the variable will be used as an ID to identify that variable in the graph. So if you ever visualize the computational graph, the variable will appear by the argument passed to the name keyword. If you do not specify a name, TensorFlow will use the default naming scheme.
Note
Note that the Python variable that the tf.Variable is assigned to is not known by the computational graph and is not a part of TensorFlow variable naming. Consider this example where you specify a TensorFlow variable as follows:
a = tf.Variable(tf.zeros([5]),name='b')
Here, the TensorFlow graph will know this variable by the name b and not a.
Defining TensorFlow outputs
TensorFlow outputs are usually tensors and the result of a transformation of either an input or a variable, or both. In our example, h is an output, where h = tf.nn.sigmoid(tf.matmul(x,W) + b). It is also possible to give such outputs to other operations, forming a chained set of operations. Furthermore, these do not necessarily have to be TensorFlow operations; you can also use standard Python arithmetic with TensorFlow. Here is an example:
x = tf.matmul(w,A)
y = x + B
z = tf.add(y,C)
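Here is a minimal self-contained sketch of such chaining; the tensor names and values are assumed for illustration. Note that the middle step uses the Python + operator rather than an explicit TensorFlow operation:

w = tf.constant([[1.0, 2.0]])    # 1 x 2
A = tf.constant([[3.0], [4.0]])  # 2 x 1
B = tf.constant([[5.0]])         # 1 x 1
C = tf.constant([[1.0]])         # 1 x 1

x = tf.matmul(w, A)   # TensorFlow operation => [[11.0]]
y = x + B             # Python arithmetic    => [[16.0]]
z = tf.add(y, C)      # TensorFlow operation => [[17.0]]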
Defining TensorFlow operations
If you take a look at the TensorFlow API at https://www.tensorflow.org/api_docs/python/, you will see that TensorFlow has a massive collection of operations available. Here we will take a look at a selected few of the myriad TensorFlow operations.
Comparison operations
Comparison operations are useful for comparing two tensors. The following code example includes a few useful comparison operations. You can find the comprehensive list of comparison operators in the Comparison Operators section at https://www.tensorflow.org/api_guides/python/control_flow_ops. Furthermore, to understand the working of these operations, let's consider two example tensors, x and y:
# Let's assume the following values for x and y
# x (2-D tensor) => [[1,2],[3,4]]
# y (2-D tensor) => [[4,3],[3,2]]
x = tf.constant([[1,2],[3,4]], dtype=tf.int32)
y = tf.constant([[4,3],[3,2]], dtype=tf.int32)

# Checks if two tensors are equal element-wise and returns a boolean tensor
# x_equal_y => [[False,False],[True,False]]
x_equal_y = tf.equal(x, y, name=None)

# Checks if x is less than y element-wise and returns a boolean tensor
# x_less_y => [[True,True],[False,False]]
x_less_y = tf.less(x, y, name=None)

# Checks if x is greater than or equal to y element-wise and returns a boolean tensor
# x_great_equal_y => [[False,False],[True,True]]
x_great_equal_y = tf.greater_equal(x, y, name=None)

# Selects elements from x and y depending on whether
# the condition is satisfied (select elements from x)
# or the condition failed (select elements from y)
condition = tf.constant([[True,False],[True,False]], dtype=tf.bool)
# x_cond_y => [[1,3],[3,2]]
x_cond_y = tf.where(condition, x, y, name=None)
Mathematical operations
TensorFlow allows you to perform math operations on tensors that range from the simple to the complex. We will discuss a few of the mathematical operations made available in TensorFlow. The complete set of operations is available at https://www.tensorflow.org/api_guides/python/math_ops.
# Let's assume the following values for x and y
# x (2-D tensor) => [[1,2],[3,4]]
# y (2-D tensor) => [[4,3],[3,2]]
x = tf.constant([[1,2],[3,4]], dtype=tf.float32)
y = tf.constant([[4,3],[3,2]], dtype=tf.float32)

# Add two tensors x and y in an element-wise fashion
# x_add_y => [[5,5],[6,6]]
x_add_y = tf.add(x, y)

# Performs matrix multiplication (not element-wise)
# x_mul_y => [[10,7],[24,17]]
x_mul_y = tf.matmul(x, y)

# Compute natural logarithm of x element-wise
# equivalent to computing ln(x)
# log_x => [[0,0.6931],[1.0986,1.3863]]
log_x = tf.log(x)

# Performs reduction operation across the specified axis
# x_sum_1 => [3,7]
x_sum_1 = tf.reduce_sum(x, axis=[1], keepdims=False)

# x_sum_2 => [[4,6]]
x_sum_2 = tf.reduce_sum(x, axis=[0], keepdims=True)

# Segments the tensor according to segment_ids (items with the same id go in
# the same segment) and computes a segmented sum of the data
data = tf.constant([1,2,3,4,5,6,7,8,9,10], dtype=tf.float32)
segment_ids = tf.constant([0,0,0,1,1,2,2,2,2,2], dtype=tf.int32)
# x_seg_sum => [6,9,40]
x_seg_sum = tf.segment_sum(data, segment_ids)
Scatter and gather operations
Scatter and gather operations play a vital role in matrix manipulation tasks, as these two variants were (until recently) the only way to index tensors in TensorFlow. In other words, you cannot access elements of tensors in TensorFlow as you would in NumPy (for example, x[1,0], where x is a 2D numpy.ndarray). A scatter operation allows you to assign values to specific indices of a given tensor, whereas the gather operation allows you to extract a slice (or individual elements) of a given tensor. The following code shows a few variations of the scatter and gather operations:
# 1-D scatter operation
ref = tf.Variable(tf.constant([1,9,3,10,5], dtype=tf.float32), name='scatter_update')
indices = [1,3]
updates = tf.constant([2,4], dtype=tf.float32)
tf_scatter_update = tf.scatter_update(ref, indices, updates, use_locking=None, name=None)

# n-D scatter operation
indices = [[1],[3]]
updates = tf.constant([[1,1,1],[2,2,2]])
shape = [4,3]
tf_scatter_nd_1 = tf.scatter_nd(indices, updates, shape, name=None)

# n-D scatter operation
indices = [[1,0],[3,1]] # 2 x 2
updates = tf.constant([1,2]) # 2 x 1
shape = [4,3]
tf_scatter_nd_2 = tf.scatter_nd(indices, updates, shape, name=None)

# 1-D gather operation
params = tf.constant([1,2,3,4,5], dtype=tf.float32)
indices = [1,4]
tf_gather = tf.gather(params, indices, validate_indices=True, name=None) #=> [2,5]

# n-D gather operation
params = tf.constant([[0,0,0],[1,1,1],[2,2,2],[3,3,3]], dtype=tf.float32)
indices = [[0],[2]]
tf_gather_nd = tf.gather_nd(params, indices, name=None) #=> [[0,0,0],[2,2,2]]

params = tf.constant([[0,0,0],[1,1,1],[2,2,2],[3,3,3]], dtype=tf.float32)
indices = [[0,1],[2,2]]
tf_gather_nd_2 = tf.gather_nd(params, indices, name=None) #=> [0,2]
Neural network-related operations
Now let's look at several useful neural network-related operations that we will use heavily in the following chapters. The operations we will discuss here range from simple element-wise transformations (that is, activations) to computing partial derivatives of a set of parameters with respect to another value. We will also implement a simple neural network as an exercise.
Nonlinear activations used by neural networks
Nonlinear activations enable neural networks to perform well at numerous tasks. Typically, there is a nonlinear activation transformation (that is, an activation layer) after each layer output in a neural network (except for the last layer). A nonlinear transformation helps a neural network to learn various nonlinear patterns that are present in data. This is very useful for complex real-world problems, where data often has more complex nonlinear patterns, in contrast to linear patterns. If not for the nonlinear activations between layers, a deep neural network would be a bunch of linear layers stacked on top of each other, and a set of linear layers can essentially be compressed into a single bigger linear layer. In conclusion, without nonlinear activations, a neural network with many layers is no more powerful than a single-layer one.
Note
Let's observe the importance of nonlinear activation through an example. First, recall the computation for the neural networks we saw in the sigmoid example. If we disregard b, it will be this:
h = sigmoid(W*x)
Assume a three-layer neural network (having W1, W2, and W3 as layer weights) where each layer does the preceding computation; we can summarize the full computation as follows:
h = sigmoid(W3*sigmoid(W2*sigmoid(W1*x)))
However, if we remove the nonlinear activation (that is, sigmoid), we get this:
h = (W3 * (W2 * (W1 *x))) = (W3*W2*W1)*x
So, without the nonlinear activations, the three layers can be brought down to a single linear layer.
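A quick numerical sketch of this collapse (the weight values here are assumed for illustration): applying three linear layers one after another gives exactly the same result as applying their product once.

import numpy as np

W1 = np.array([[1.0, 2.0], [0.0, 1.0]])
W2 = np.array([[0.5, 0.0], [1.0, 1.0]])
W3 = np.array([[2.0, 1.0], [1.0, 0.0]])
x = np.array([[1.0], [2.0]])

layer_by_layer = W3 @ (W2 @ (W1 @ x))   # three successive linear layers
collapsed = (W3 @ W2 @ W1) @ x          # a single equivalent linear layer

print(np.allclose(layer_by_layer, collapsed))  # True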
Now we'll list two commonly used nonlinear activations in neural networks and how they can be implemented in TensorFlow:
# Sigmoid activation of x is given by 1 / (1 + exp(-x))
tf.nn.sigmoid(x, name=None)

# ReLU activation of x is given by max(0,x)
tf.nn.relu(x, name=None)
The convolution operation
A convolution operation is a widely used signal-processing technique. For images, convolution is used to produce different effects on an image. An example of edge detection using convolution is shown in Figure 2.6. This is achieved by shifting a convolution filter over an image to produce a different output at each location (see Figure 2.7 later in this section). Specifically, at each location we perform element-wise multiplication of the elements in the convolution filter with the image patch (of the same size as the convolution filter) that overlaps with the filter, and take the sum of the products:

Figure 2.6: Using the convolution operation for edge detection in an image (Source: https://en.wikipedia.org/wiki/Kernel_(image_processing))
The following is the implementation of the convolution operation:
x = tf.constant(
    [[
        [[1],[2],[3],[4]],
        [[4],[3],[2],[1]],
        [[5],[6],[7],[8]],
        [[8],[7],[6],[5]]
    ]],
    dtype=tf.float32)

x_filter = tf.constant(
    [
        [ [[0.5]],[[1]] ],
        [ [[0.5]],[[1]] ]
    ],
    dtype=tf.float32)

x_stride = [1,1,1,1]
x_padding = 'VALID'

x_conv = tf.nn.conv2d(
    input=x, filter=x_filter,
    strides=x_stride, padding=x_padding
)
Here, the apparently excessive number of square brackets might make you think that the example could be made easier to follow by getting rid of these redundant brackets. Unfortunately, that is not the case. For the tf.nn.conv2d(...) operation, TensorFlow requires input, filter, and strides to be of an exact format. We will now go through each argument in tf.nn.conv2d(input, filter, strides, padding) in more detail:
- input: This is typically a 4D tensor where the dimensions should be ordered as [batch_size, height, width, channels]:
  - batch_size: This is the amount of data (for example, inputs such as images and words) in a single batch of data. We normally process data in batches, as large datasets are often used for learning. At a given training step, we randomly sample a small batch of data that approximately represents the full dataset, and doing this for many steps allows us to approximate the full dataset quite well. This batch_size parameter is the same as the one we discussed in the TensorFlow input pipeline example.
  - height and width: This is the height and the width of the input.
  - channels: This is the depth of the input (for example, for an RGB image, channels will be 3, one channel for each color).
- filter: This is a 4D tensor that represents the convolution window of the convolution operation. The filter dimensions should be [height, width, in_channels, out_channels]:
  - height and width: This is the height and the width of the filter (often smaller than that of the input)
  - in_channels: This is the number of channels of the input to the layer
  - out_channels: This is the number of channels to be produced in the output of the layer
- strides: This is a list with four elements, where the elements are [batch_stride, height_stride, width_stride, channels_stride]. The strides argument denotes how many elements to skip during a single shift of the convolution window on the input. If you do not completely understand what strides is, you can use the default value of 1.
- padding: This can be one of ['SAME', 'VALID']. It decides how to handle the convolution operation near the boundaries of the input. The VALID operation performs the convolution without padding. If we were to convolve an input of length n with a convolution window of size h, this results in an output of size (n-h+1) < n. The diminishing of the output size can severely limit the depth of neural networks. SAME pads zeros to the boundary such that the output will have the same height and width as the input (see the quick check after this list).
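As a quick check of the earlier example (a sketch, assuming the graph and session are set up as in the earlier examples): the 4 x 4 input convolved with the 2 x 2 filter under VALID padding and a stride of 1 yields a 3 x 3 output.

# Evaluating the earlier convolution example
# x has shape [1,4,4,1], x_filter has shape [2,2,1,1], padding is 'VALID'
x_conv_eval = session.run(x_conv)
print(x_conv_eval.shape)  # => (1, 3, 3, 1)
# Each output element is 0.5*a + 1*b + 0.5*c + 1*d for the overlapping
# 2 x 2 patch [[a,b],[c,d]], for example 0.5*1 + 1*2 + 0.5*4 + 1*3 = 7.5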
To gain a better understanding of what filter size, stride, and padding are, refer to Figure 2.7:

Figure 2.7: The convolution operation
The pooling operation
A pooling operation behaves similarly to the convolution operation, but the final output is different. Instead of outputting the sum of the element-wise multiplication of the filter and the image patch, we now take the maximum element of the image patch at that location (see Figure 2.8):
x = tf.constant(
    [[
        [[1],[2],[3],[4]],
        [[4],[3],[2],[1]],
        [[5],[6],[7],[8]],
        [[8],[7],[6],[5]]
    ]],
    dtype=tf.float32)

x_ksize = [1,2,2,1]
x_stride = [1,2,2,1]
x_padding = 'VALID'

x_pool = tf.nn.max_pool(
    value=x, ksize=x_ksize,
    strides=x_stride, padding=x_padding
)
# Returns (out) => [[[[ 4.] [ 4.]], [[ 8.] [ 8.]]]]

Figure 2.8: The max-pooling operation
Defining loss
We know that in order for a neural network to learn something useful, a loss needs to be defined. There are several functions for automatically calculating the loss in TensorFlow, two of which are shown in the following code. The tf.nn.l2_loss
function is the mean squared error loss, and tf.nn.softmax_cross_entropy_with_logits_v2
is another type of loss, which actually gives better performance in classification tasks. And by logits here, we mean the unnormalized output of the neural network (that is, the linear output of the last layer of the neural network):
# Returns half of the L2 norm of t, given by sum(t**2)/2
x = tf.constant([[2,4],[6,8]], dtype=tf.float32)
x_hat = tf.constant([[1,2],[3,4]], dtype=tf.float32)
# MSE = (1**2 + 2**2 + 3**2 + 4**2)/2 = 15
MSE = tf.nn.l2_loss(x-x_hat)

# A common loss function used in neural networks to optimize the network.
# Calculating the cross entropy with logits (unnormalized outputs of the last layer)
# instead of outputs leads to better numerical stability.
y = tf.constant([[1,0],[0,1]], dtype=tf.float32)
y_hat = tf.constant([[3,1],[2,5]], dtype=tf.float32)
# This function alone doesn't average the cross entropy losses of all data points,
# so you need to do that manually using the reduce_mean function
CE = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=y_hat, labels=y))
Optimization of neural networks
After defining the loss of a neural network, our objective is to minimize that loss over time. Optimization is the procedure used for this. In other words, the objective of the optimizer is to find the neural network parameters (that is, weights and bias values) that give the minimum loss for all the inputs. Again, our beloved TensorFlow provides us with several different optimizers, so we don't have to worry about implementing them from scratch.
Figure 2.9 illustrates a simple optimization problem and shows how the optimization happens over time. The curve can be imagined as the loss curve (for high dimensions, we say loss surface), where x can be thought of as the parameters of the neural network (in this case a neural network with a single weight), and y can be thought of as the loss. We have an initial guess of x=2. From this point, we use the optimizer to reach the minimum y (that is, loss), which is obtained at x=0. More specifically, we take small steps in the direction opposite to the gradient at a given point and continue for several steps in this manner. However, in real-world problems, the loss surface will not be as nice as in the illustration, but it will be more complex:

Figure 2.9: The optimization process
In this example, we use GradientDescentOptimizer. The learning_rate parameter denotes the step size you take in the direction of minimization (the distance between two red dots):
# Optimizers play the role of tuning neural network parameters so that
# their task error is minimal
# For example, the task error can be the cross_entropy error for a classification task
tf_x = tf.Variable(tf.constant(2.0, dtype=tf.float32), name='x')
tf_y = tf_x**2
minimize_op = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(tf_y)
Every time you execute the loss-minimizing operation with session.run(minimize_op), you will get closer to the tf_x value that gives the minimum of tf_y.
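The following is a minimal sketch of running this operation for a few steps; it assumes a session has been created and the variables initialized as in the earlier examples:

tf.global_variables_initializer().run()
for step in range(5):
    session.run(minimize_op)
    # tf_x moves from 2.0 toward 0.0: 1.6, 1.28, 1.024, 0.8192, 0.65536
    print(session.run(tf_x))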
The control flow operations
Control flow operations, as the name implies, control the order of execution in the graph. For example, let's say we need to perform the following computation, in this order:
x = x+5
z = x*2
Precisely, if x = 2, we should get z = 14. Let's first try to achieve this in the simplest possible way:
session = tf.InteractiveSession()
x = tf.Variable(tf.constant(2.0), name='x')
x_assign_op = tf.assign(x, x+5)
z = x*2

tf.global_variables_initializer().run()

print('z=', session.run(z))
print('x=', session.run(x))
session.close()
Ideally, we would want x = 7 and z = 14; instead, TensorFlow produces x = 2 and z = 4. This is not the answer you were expecting. This is because TensorFlow does not care about the order of execution of things unless you explicitly specify it. Control flow operations enable you to do exactly this. To fix the preceding code, we do the following:
session = tf.InteractiveSession()
x = tf.Variable(tf.constant(2.0), name='x')

with tf.control_dependencies([tf.assign(x, x+5)]):
    z = x*2

tf.global_variables_initializer().run()

print('z=', session.run(z))
print('x=', session.run(x))
session.close()
Now this should give us x = 7 and z = 14. The tf.control_dependencies(...) operation makes sure that the operations passed to it as arguments are performed before the nested operation is performed.