
What is TensorFlow?

In Chapter 1, Introduction to Natural Language Processing, we briefly discussed what TensorFlow is. Now let's take a closer look at it. TensorFlow is an open source distributed numerical computation framework released by Google that is mainly intended to alleviate the painful details of implementing a neural network (for example, computing the derivatives of the neural network's weights). TensorFlow takes this a step further by providing efficient implementations of such numerical computations using Compute Unified Device Architecture (CUDA), a parallel computing platform introduced by NVIDIA. The Application Programming Interface (API) of TensorFlow, at https://www.tensorflow.org/api_docs/python/, shows that TensorFlow provides thousands of operations that make our lives easier.

TensorFlow was not developed overnight. It is the result of the persistence of talented, good-hearted individuals who wanted to make a difference by bringing deep learning to a wider audience. If you are interested, you can take a look at the TensorFlow code at https://github.com/tensorflow/tensorflow. Currently, TensorFlow has around 1,000 contributors, and it sits on top of more than 25,000 commits, evolving to be better and better every day.

Getting started with TensorFlow

Now let's learn about a few essential components in the TensorFlow framework by working through a code example. Let's write an example to perform the following computation, which is very common for neural networks:

h = sigmoid(W * x + b)

Here, W and x are matrices and b is a vector. Then, * denotes matrix multiplication (the dot product of rows and columns), and sigmoid is the logistic function, given by the following equation:

sigmoid(z) = 1 / (1 + exp(-z))
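For a quick sanity check, here is the same function written in plain NumPy (a minimal sketch, not part of the book's example):

import numpy as np

def sigmoid(z):
    # Element-wise logistic function; squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))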

We will discuss how to do this computation through TensorFlow step by step.

First, we will need to import TensorFlow and NumPy. Importing them is essential before you run any type of TensorFlow- or NumPy-related operation, in Python:

import tensorflow as tf
import numpy as np

Next, we'll define a graph object, which we will populate with operations and variables later:

graph = tf.Graph() # Creates a graph
session = tf.InteractiveSession(graph=graph) # Creates a session

The graph object contains the computational graph that connects the various inputs and outputs we define in our program to get the final desired output (that is, it defines how W, x, and b are connected to produce h in terms of a graph). For example, if you think of the output as a cake, then the graph would be the recipe to make that cake from its ingredients (that is, inputs). We also define a session object that takes the defined graph as input and executes it. We will talk about these elements in detail in the next section.

Note

To create a new graph object, you can either use the following, as we did in the preceding example:

graph = tf.Graph()

Alternatively, you can use the following to get the TensorFlow default computational graph:

graph = tf.get_default_graph()

We show exercises using both these methods.
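For instance, a minimal sketch of the default-graph alternative (assuming the same imports as before):

graph = tf.get_default_graph() # Grab the global default graph
session = tf.InteractiveSession(graph=graph) # Works just as it did with the explicit graph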

Now we'll define a few tensors, namely x, W, b, and h. A tensor is essentially an n-dimensional array in TensorFlow. For example, a one-dimensional vector or a two-dimensional matrix is a tensor. There are several different ways to define tensors in TensorFlow. Here we will look at three such approaches:

  1. First, x is a placeholder. Placeholders, as the name suggests, are not initialized with some value. Rather, we will provide the value on the fly at the time of graph execution.
  2. Next, we have variables W and b. Variables are mutable, meaning that their values can change over time.
  3. Finally, we have h, which is an immutable tensor produced by performing some operations on x, W, and b:
    x = tf.placeholder(shape=[1,10],dtype=tf.float32,name='x')
    W = tf.Variable(tf.random_uniform(shape=[10,5], minval=-0.1, maxval=0.1, dtype=tf.float32),name='W')
    b = tf.Variable(tf.zeros(shape=[5],dtype=tf.float32),name='b')
    h = tf.nn.sigmoid(tf.matmul(x,W) + b)

Also, notice that for W and b we provide some important arguments such as the following:

tf.random_uniform(shape=[10,5], minval=-0.1, maxval=0.1, dtype=tf.float32)
tf.zeros(shape=[5],dtype=tf.float32)

These are called variable initializers and are the tensors that will be assigned to the W and b variables initially. Unlike placeholders, variables cannot exist without an initial value; they need to have some value assigned to them at all times. Here, tf.random_uniform means that we uniformly sample values between minval (-0.1) and maxval (0.1) to assign to the tensor, and tf.zeros initializes the tensor with zeros. It is also very important to define the shape of a tensor when you define it. The shape property defines the size of each dimension of a tensor. For example, if shape is [10, 5], this means that the tensor will be a two-dimensional structure with 10 elements on axis 0 and 5 elements on axis 1.
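As a quick check of this shape arithmetic, you can print the statically inferred shape of h (assuming the tensors defined above):

print(h.get_shape().as_list()) # [1, 5]: a [1,10] x times a [10,5] W, plus a length-5 b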

Next, we'll run an initialization operation that initializes the variables in the graph, W and b:

tf.global_variables_initializer().run()
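Calling run() directly on this operation works because tf.InteractiveSession installs itself as the default session when it is created. With a standard tf.Session, the equivalent would be:

session.run(tf.global_variables_initializer()) # Equivalent form for a standard tf.Session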

Now, we will execute the graph to obtain the final output we need, h. This is done by running session.run(...), where we provide the value to the placeholder as an argument of the session.run() command:

h_eval = session.run(h,feed_dict={x: np.random.rand(1,10)})
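Since x is fed with random values, h_eval differs from run to run; a hypothetical printout might look like the following:

print(h_eval) # For example: [[0.49 0.53 0.47 0.51 0.52]] -- a 1x5 array, values vary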

Finally, we close the session, releasing any resources held by the session object:

session.close()
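As an aside, a common alternative (not used in this example) is Python's with statement, which closes the session automatically:

with tf.Session(graph=graph) as session:
    session.run(tf.global_variables_initializer())
    h_eval = session.run(h, feed_dict={x: np.random.rand(1,10)})
# The session is closed automatically when the block exits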

Here is the full code of this TensorFlow example. All the code examples in this chapter will be available in the tensorflow_introduction.ipynb file in the ch2 folder:

import tensorflow as tf
import numpy as np

# Defining the graph and session
graph = tf.Graph() # Creates a graph
session = tf.InteractiveSession(graph=graph) # Creates a session

# Building the graph
# A placeholder is a symbolic input
x = tf.placeholder(shape=[1,10],dtype=tf.float32,name='x')
# Variable
W = tf.Variable(tf.random_uniform(shape=[10,5], minval=-0.1, maxval=0.1, dtype=tf.float32),name='W')
# Variable
b = tf.Variable(tf.zeros(shape=[5],dtype=tf.float32),name='b')
h = tf.nn.sigmoid(tf.matmul(x,W) + b) # Operation to be performed

# Executing operations and evaluating nodes in the graph
tf.global_variables_initializer().run() # Initialize the variables

# Run the operation by providing a value to the symbolic input x
h_eval = session.run(h,feed_dict={x: np.random.rand(1,10)})
# Close the session to free any resources held by it
session.close()

When you run this code, you might encounter a warning, as shown here:

... tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: ...

Don't worry about this. This warning says that you used an off-the-shelf precompiled version of TensorFlow without compiling it on your computer. This is totally fine. You would just get slightly better performance if you compiled it on your computer, as TensorFlow would be optimized for that particular hardware.

In the following sections we will explain how TensorFlow executes this code to produce the final output. Also note that the next two sections will be somewhat complex and technical. However, you don't have to worry if you don't understand everything completely, because after this, we will go through a nice, thorough real-world example, where the same execution is explained in terms of how an order is fulfilled in a restaurant, our own Café Le TensorFlow.

TensorFlow client in detail

The preceding example program is called a TensorFlow client. In any client program you write with TensorFlow, there will be two main types of objects: operations and tensors. In the preceding example, tf.nn.sigmoid is an operation and h is a tensor.
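As a quick illustration (assuming the client defined earlier), you can inspect both kinds of objects directly:

print(type(h))   # A tf.Tensor object
print(h.op.type) # 'Sigmoid' -- the operation that produced h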

Then we have a graph object, which is the computational graph that stores the dataflow of our program. When we add the subsequent lines defining x, W, b, and h in the code, TensorFlow automatically adds these tensors and any operations (for example, tf.matmul()) to the graph as nodes. The graph will store vital information such as the tensor dependencies and which operation to perform where. In our example, the graph will know that to calculate h, tensors x, W, and b are required. So, if you haven't properly initialized one of them during runtime, TensorFlow can point you to the exact initialization error that needs to be fixed.
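You can list the nodes TensorFlow has added for yourself; a small sketch (assuming the graph built earlier in this chapter):

for op in graph.get_operations():
    print(op.name) # Includes 'x', 'W', 'b', 'MatMul', 'add', and 'Sigmoid', among others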

Next, the session plays the role of executing the graph by dividing it into subgraphs and then into even finer pieces, which are then assigned to workers that perform the assigned tasks. This is done with the session.run(...) function. We will talk about this soon. For future reference, let's call our example the sigmoid example.

TensorFlow architecture – what happens when you execute the client?

We know that TensorFlow is skillful at creating a nice computational graph with all the dependencies and operations, so that it knows exactly how, when, and where the data flows. But there is one more element needed to make TensorFlow great: the effective execution of the defined computational graph. This is where the session comes in. Now let's peek under the hood of the session to understand how the graph is executed.

First, the TensorFlow client holds a graph and a session. When you create a session, it sends the computational graph as a tf.GraphDef protocol buffer to the distributed master. tf.GraphDef is a standardized representation of the graph. The distributed master sees all the computations in the graph and divides them among different devices (for example, different GPUs and CPUs). The graph in our sigmoid example looks like Figure 2.1. A single element of the graph is called a node:

Figure 2.1: A computational graph of the client
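If you are curious, you can print the GraphDef representation yourself (a quick sketch using the graph from the sigmoid example):

print(graph.as_graph_def()) # Dumps the graph as a GraphDef protocol buffer in text form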

Next, the computational graph will be broken into subgraphs and further into finer pieces by the distributed master. Although decomposing the computational graph might seem trivial in our example, computational graphs can grow exponentially in real-world solutions with many hidden layers. Additionally, it becomes important to break the computational graph into multiple pieces in order to execute things in parallel (for example, on multiple devices). Executing this graph (or a subgraph, if the graph is divided into subgraphs) is called a single task, where a task is allocated to a single TensorFlow server.

However, in reality, each task will be executed by breaking it down into two pieces, where each piece is executed by a single worker (a sketch of how these two roles are declared follows the list):

  • One worker executes the TensorFlow operations using the current values of the parameters (operation executor)
  • The other worker stores the parameters and updates them with new values obtained after executing the operations (parameter server)
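To make these two roles concrete, here is a minimal, hypothetical sketch of how a distributed TensorFlow cluster declares parameter servers and workers (the host addresses are placeholders, not from the book's example):

cluster = tf.train.ClusterSpec({
    'ps': ['localhost:2222'],    # Parameter server: stores and updates the parameters
    'worker': ['localhost:2223'] # Worker: executes the operations
})
# Each task is served by a tf.train.Server bound to one job and task slot
server = tf.train.Server(cluster, job_name='worker', task_index=0)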

This general workflow of a TensorFlow client is depicted in Figure 2.2:

Figure 2.2: The generic execution of a TensorFlow client

Figure 2.3 illustrates the decomposition of the graph. In addition to breaking the graph down, TensorFlow inserts send and receive nodes to help with the communication between the parameter server and the operation executor. You can think of send nodes as sending data whenever it becomes available, while receive nodes keep listening and capture the data when their corresponding send node transmits it:

Figure 2.3: Decomposition of the TensorFlow graph
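You never create send and receive nodes yourself; TensorFlow inserts them automatically when connected operations are placed on different devices. A minimal sketch (assuming a machine with a GPU):

with tf.device('/cpu:0'):
    a = tf.constant(1.0)
with tf.device('/gpu:0'):
    b = a * 2.0 # TensorFlow inserts send/receive nodes to move a from the CPU to the GPU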

Finally, the session brings the updated data back to the client from the parameter server once the calculation is done. The architecture of TensorFlow is shown in Figure 2.4. This explanation is based on the official TensorFlow documentation found at https://www.tensorflow.org/extend/architecture.

Figure 2.4: TensorFlow framework architecture (https://www.tensorflow.org/extend/architecture)

Café Le TensorFlow – understanding TensorFlow with an analogy

If you were overwhelmed by the information in the technical explanation, we'll try to grasp the concepts from a different perspective. Let's say that a new café just opened and you've been dying to try it. So you go there and grab a seat by a window.

Next, the waiter comes to take your order, and you order a chicken burger with extra cheese and no tomatoes. Think of yourself as the client and your order as defining the graph. The graph defines what you need and how you need it. The waiter is analogous to the session, where his responsibility is to carry the order to the kitchen so the order can be made. When taking the order, the waiter uses a certain format to convey your order, for example, table number, menu item ID, quantity, and special requirements. Think of this formatted order written in the waiter's notebook as GraphDef. Then the waiter takes the order to the kitchen and gives it to the kitchen manager. From this point, the kitchen manager assumes the responsibility of fulfilling the order. Here, the kitchen manager represents the distributed master. The kitchen manager makes decisions, such as how many chefs are required to make the dish and which chefs are the best candidates for the job. Let's also assume that each chef has a cook, whose responsibility is to provide the chef with the right ingredients, equipment, and so forth. So the kitchen manager takes the order to a single chef and a cook (a burger is not that hard to prepare) and asks them to prepare the dish. In our example, the chef is the operation executor, and the cook is the parameter server.

The chef looks at the order and tells the cook what is needed. So the cook first finds the things that will be required (for example, buns, patties, and onions) and keeps them close to fulfill the chef's requests as soon as possible. Moreover, the chef might also ask to keep the intermediate results (for example, cut vegetables) of the dish temporarily until the chef needs it back again.

When the order is ready, the kitchen manager receives the burger from the chef and the cook and notifies the waiter. At this point, the waiter takes the burger from the kitchen manager and brings it to you. You will finally be able to enjoy the delicious burger made according to your specifications. This process is shown in Figure 2.5:

Figure 2.5: The restaurant analogy illustrated
