官术网_书友最值得收藏!

Optimizers

We will study AdamOptimizer here; TensorFlow AdamOptimizer uses Kingma and Ba's Adam algorithm to manage the learning rate. Adam has many advantages over the simple GradientDescentOptimizer. The first is that it uses moving averages of the parameters, which enables Adam to use a larger step size, and it will converge to this step size without any fine-tuning.

The disadvantage of Adam is that it requires more computation to be performed for each parameter in each training step. GradientDescentOptimizer can be used as well, but it would require more hyperparameter tuning before it would converge as quickly.
The following example shows how to use AdamOptimizer:

  • tf.train.Optimizer creates an optimizer
  • tf.train.Optimizer.minimize(loss, var_list) adds the optimization operation to the computation graph

Here, automatic differentiation computes gradients without user input:

import numpy as np
import seaborn
import matplotlib.pyplot as plt
import tensorflow as tf

# input dataset
xData = np.arange(100, step=.1)
yData = xData + 20 * np.sin(xData/10)

# scatter plot for input data
plt.scatter(xData, yData)
plt.show()

# defining data size and batch size
nSamples = 1000
batchSize = 100

# resize
xData = np.reshape(xData, (nSamples,1))
yData = np.reshape(yData, (nSamples,1))

# input placeholders
x = tf.placeholder(tf.float32, shape=(batchSize, 1))
y = tf.placeholder(tf.float32, shape=(batchSize, 1))

# init weight and bias
with tf.variable_scope("linearRegression"):
W = tf.get_variable("weights", (1, 1), initializer=tf.random_normal_initializer())
b = tf.get_variable("bias", (1,), initializer=tf.constant_initializer(0.0))

y_pred = tf.matmul(x, W) + b
loss = tf.reduce_sum((y - y_pred)**2/nSamples)

# optimizer
opt = tf.train.AdamOptimizer().minimize(loss)
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())

# gradient descent loop for 500 steps
for _ in range(500):
# random minibatch
indices = np.random.choice(nSamples, batchSize)

X_batch, y_batch = xData[indices], yData[indices]

# gradient descent step
_, loss_val = sess.run([opt, loss], feed_dict={x: X_batch, y: y_batch})

Here is the scatter plot for the dataset:

This is the plot of the learned model on the data:

主站蜘蛛池模板: 疏附县| 乌拉特前旗| 板桥市| 隆子县| 博野县| 盱眙县| 上高县| 新巴尔虎右旗| 谢通门县| 搜索| 台州市| 湖南省| 十堰市| 襄樊市| 霞浦县| 安化县| 厦门市| 东乡县| 上犹县| 吉安县| 满洲里市| 民和| 福清市| 湛江市| 平武县| 芜湖市| 新和县| 桃江县| 交口县| 汉中市| 松原市| 武山县| 电白县| 绥中县| 鄯善县| 安阳市| 刚察县| 贵阳市| 承德市| 泰宁县| 武隆县|