
A brief history of contemporary deep learning

In addition to the aforementioned models, the first edition of this book included networks such as Restricted Boltzmann Machines (RBMs) and Deep Belief Networks (DBNs). They were popularized by Geoffrey Hinton, a Canadian scientist and one of the most prominent deep learning researchers. Back in 1986, he was also one of the inventors of backpropagation. RBMs are a special type of generative neural network, where the neurons are organized into two layers, namely, visible and hidden. Unlike feed-forward networks, the data in an RBM can flow in both directions – from visible to hidden units, and vice versa. In 2002, Prof. Hinton introduced contrastive divergence, an unsupervised algorithm for training RBMs. And in 2006, he introduced DBNs, which are deep neural networks formed by stacking multiple RBMs. Thanks to their novel training algorithm, it became possible to create DBNs with more hidden layers than had previously been feasible. To understand why, we should explain what made it so difficult to train deep neural networks prior to that. In the past, the activation function of choice was the logistic sigmoid, shown in the following chart:

Logistic sigmoid (blue) and its derivative (green)

We now know that, to train a neural network, we need to compute the derivative of the activation function (along with all the other derivatives). The sigmoid derivative has a significant value only in a narrow interval centered around 0, and converges towards 0 everywhere else. In networks with many layers, it's highly likely that the gradient will shrink towards 0 as it is propagated back to the first layers of the network. Effectively, this means we cannot update the weights in these layers. This is the famous vanishing gradients problem, which (along with a few other issues) prevented the training of deep networks. By stacking pre-trained RBMs, DBNs were able to alleviate (but not solve) this problem.
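To make the effect concrete, the following is a minimal NumPy sketch (not code from the original book) that evaluates the sigmoid derivative and shows how its maximum value of 0.25 shrinks a gradient that is propagated back through many layers:

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: squashes any input into the (0, 1) range."""
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    """Derivative of the sigmoid: sigmoid(x) * (1 - sigmoid(x)), at most 0.25."""
    s = sigmoid(x)
    return s * (1 - s)

# The derivative peaks at x = 0 with a value of 0.25 and decays quickly elsewhere
print(sigmoid_derivative(0.0))   # 0.25
print(sigmoid_derivative(5.0))   # ~0.0066

# During backpropagation, each sigmoid layer multiplies the gradient by a
# factor of at most 0.25. Even in this best case, after 10 layers the
# gradient reaching the first layer has shrunk by a factor of 0.25 ** 10.
grad = 1.0
for layer in range(10):
    grad *= sigmoid_derivative(0.0)  # best-case (largest possible) factor
print(grad)  # ~9.5e-07 – effectively a vanished gradient
```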

But training a DBN is not easy. Let's look at the following steps:

  • First, we train each RBM with contrastive divergence and gradually stack them on top of one another. This phase is called pre-training.
  • In effect, pre-training serves as a sophisticated weight-initialization algorithm for the next phase, called fine-tuning. With fine-tuning, we transform the DBN into a regular multi-layer perceptron and continue training it with supervised backpropagation, in the same way we saw in Chapter 2, Neural Networks. A sketch of both phases follows this list.
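The following is a minimal toy sketch of the two phases in NumPy (illustrative only, not the book's implementation, and the function and variable names are assumptions): CD-1 pre-training of stacked RBMs with binary units, followed by reusing the learned weights to initialize the hidden layers of an MLP for supervised fine-tuning:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def train_rbm(data, n_hidden, lr=0.1, epochs=10, seed=0):
    """Train a single RBM with one step of contrastive divergence (CD-1)."""
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W = rng.normal(0, 0.01, size=(n_visible, n_hidden))
    b_visible = np.zeros(n_visible)
    b_hidden = np.zeros(n_hidden)

    for _ in range(epochs):
        # Positive phase: propagate the data from the visible to the hidden units
        h_prob = sigmoid(data @ W + b_hidden)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)

        # Negative phase: reconstruct the visible units, then the hidden units
        v_recon = sigmoid(h_sample @ W.T + b_visible)
        h_recon = sigmoid(v_recon @ W + b_hidden)

        # CD-1 update: data-driven statistics minus reconstruction-driven statistics
        batch_size = data.shape[0]
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / batch_size
        b_visible += lr * (data - v_recon).mean(axis=0)
        b_hidden += lr * (h_prob - h_recon).mean(axis=0)

    return W, b_hidden

# Phase 1 (pre-training): greedily train and stack RBMs. Each new RBM is
# trained on the hidden activations produced by the previous one.
X = (np.random.default_rng(1).random((500, 784)) < 0.5).astype(float)  # toy binary data
layer_sizes = [256, 64]
pretrained, layer_input = [], X
for n_hidden in layer_sizes:
    W, b = train_rbm(layer_input, n_hidden)
    pretrained.append((W, b))
    layer_input = sigmoid(layer_input @ W + b)

# Phase 2 (fine-tuning): the pre-trained (W, b) pairs initialize the hidden
# layers of a regular multi-layer perceptron, which is then trained end to end
# with supervised backpropagation (omitted here), as described in Chapter 2.
```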

However, thanks to some algorithmic advances, it's now possible to train deep networks using plain old backpropagation, thus effectively eliminating the pre-training phase. We will discuss these improvements in the coming sections, but for now, let's just say that they rendered DBNs and RBMs obsolete. DBNs and RBMs are, without a doubt, interesting from a research perspective, but are rarely used in practice anymore. Because of this, we will omit them from this edition.
