
A brief history of contemporary deep learning

In addition to the aforementioned models, the first edition of this book included networks such as Restricted Boltzmann Machines (RBMs) and Deep Belief Networks (DBNs). They were popularized by Geoffrey Hinton, a Canadian scientist and one of the most prominent deep learning researchers. Back in 1986, he was also one of the inventors of backpropagation. RBMs are a special type of generative neural network, where the neurons are organized into two layers, namely visible and hidden. Unlike feed-forward networks, the data in an RBM can flow in both directions – from visible to hidden units, and vice versa. In 2002, Prof. Hinton introduced contrastive divergence, which is an unsupervised algorithm for training RBMs. And in 2006, he introduced DBNs, which are deep neural networks formed by stacking multiple RBMs. Thanks to their novel training algorithm, it was possible to create a DBN with more hidden layers than had previously been possible. To understand this, we should explain why it was so difficult to train deep neural networks before that. At the time, the activation function of choice was the logistic sigmoid, shown in the following chart:

Logistic sigmoid (blue) and its derivative (green)
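
For reference, here is a minimal NumPy sketch of the two curves in the chart; the function names are our own and not taken from the book's code:

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + exp(-x))."""
    return 1 / (1 + np.exp(-x))

def sigmoid_prime(x):
    """Its derivative: sigmoid(x) * (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s * (1 - s)

x = np.linspace(-6, 6, 201)
y, dy = sigmoid(x), sigmoid_prime(x)   # the blue and green curves, respectively
print(dy.max())                        # 0.25, reached at x = 0
```

Plotting y and dy against x reproduces the figure; note that the derivative never exceeds 0.25.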

We now know that, to train a neural network, we need to compute the derivative of the activation function (along with all the other derivatives). The sigmoid derivative has a significant value only in a very narrow interval centered around 0, and it converges towards 0 everywhere else. In networks with many layers, it's highly likely that the derivative will converge to 0 as it is propagated back to the first layers of the network. Effectively, this means we cannot update the weights in these layers. This is the famous vanishing gradients problem, which (along with a few other issues) prevented the training of deep networks. By stacking pre-trained RBMs, DBNs were able to alleviate (but not solve) this problem.
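
The following toy calculation (our own illustration, with made-up pre-activation values) shows the effect: each layer contributes a sigmoid-derivative factor of at most 0.25 to the backpropagated gradient, so the accumulated product shrinks towards 0 after only a handful of layers:

```python
import numpy as np

def sigmoid_prime(x):
    s = 1 / (1 + np.exp(-x))
    return s * (1 - s)

rng = np.random.default_rng(42)
pre_activations = rng.normal(size=20)       # one toy pre-activation per layer
factors = sigmoid_prime(pre_activations)    # each factor is at most 0.25
gradient_scale = np.cumprod(factors)        # gradient magnitude reaching each earlier layer

for depth, scale in enumerate(gradient_scale, start=1):
    print(f'{depth:2d} layers back: gradient scaled by {scale:.1e}')
```

In a real network, the weight matrices also enter this product, but unless the weights are large, they cannot offset factors this small.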

But training a DBN is not easy. Let's look at the following steps:

  • First, we have to train each RBM with contrastive divergence and gradually stack them on top of each other. This phase is called pre-training.
  • In effect, pre-training serves as a sophisticated weight-initialization algorithm for the next phase, called fine-tuning. During fine-tuning, we transform the DBN into a regular multi-layer perceptron and continue training it with supervised backpropagation, in the same way we saw in Chapter 2, Neural Networks. A minimal sketch of both phases follows this list.
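
The following is a simplified sketch of the two phases in NumPy. It is our own illustration, not the book's code: biases, momentum, and mini-batches are omitted, the data is a random stand-in, and the layer sizes are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def train_rbm(visible, n_hidden, epochs=10, lr=0.1, rng=np.random.default_rng(0)):
    """Train one binary RBM with CD-1 (contrastive divergence) and return its weights."""
    w = rng.normal(scale=0.01, size=(visible.shape[1], n_hidden))
    for _ in range(epochs):
        # Positive phase: propagate the data from the visible to the hidden units
        h_prob = sigmoid(visible @ w)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: one Gibbs step back to the visible units and up again
        v_recon = sigmoid(h_sample @ w.T)
        h_recon = sigmoid(v_recon @ w)
        # Contrastive divergence update: positive minus negative associations
        w += lr * (visible.T @ h_prob - v_recon.T @ h_recon) / len(visible)
    return w

# Phase 1: pre-training -- greedily train each RBM and stack it on the previous one
rng = np.random.default_rng(0)
data = (rng.random((100, 784)) > 0.5).astype(float)   # random stand-in for real data
layer_sizes = [784, 512, 256]                          # hypothetical layer sizes
weights, activations = [], data
for n_hidden in layer_sizes[1:]:
    w = train_rbm(activations, n_hidden)
    weights.append(w)
    activations = sigmoid(activations @ w)   # the hidden activations feed the next RBM

# Phase 2: fine-tuning -- `weights` now initializes a regular multi-layer perceptron,
# which is trained further with supervised backpropagation (omitted here).
```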

However, thanks to some algorithmic advances, it's now possible to train deep networks using plain old backpropagation, thus effectively eliminating the pre-training phase. We will discuss these improvements in the coming sections, but for now, let's just say that they rendered DBNs and RBMs obsolete. DBNs and RBMs are, without a doubt, interesting from a research perspective, but are rarely used in practice anymore. Because of this, we will omit them from this edition.
