
Deep learning – a not-so-deep overview

So, what is this deep learning that is grabbing our attention and headlines? Let's turn to Wikipedia again for a working definition: "Deep learning is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using model architectures, with complex structures or otherwise, composed of multiple nonlinear transformations." That sounds as if a lawyer wrote it. The defining characteristic of deep learning is that it is built on ANNs where machine learning techniques, primarily unsupervised learning, are used to create new features from the input variables. We will dig into some unsupervised learning techniques in the next couple of chapters, but you can think of it as finding structure in data where no response variable is available.

A simple way to think of this is the periodic table of elements, a classic case of finding structure where no response is specified. Pull up the table online and you will see that it is organized based on atomic structure, with metals on one side and non-metals on the other; it was created based on latent classification/structure. This identification of latent structure/hierarchy is what separates deep learning from your run-of-the-mill ANN. Deep learning sort of addresses the question of whether there is an algorithm that represents the outcome better than just the raw inputs. In other words, can our model learn to classify pictures with something richer than the raw pixels as the only input? This can be of great help in a situation where you have a small set of labeled responses but a vast amount of unlabeled input data: you could train your deep learning model using unsupervised learning and then apply it in a supervised fashion to the labeled data, iterating back and forth.

Identifying these latent structures is not trivial mathematically, but one relevant concept is regularization, which we looked at in Chapter 4, Advanced Feature Selection in Linear Models. In deep learning, you can penalize weights with regularization methods such as L1 (penalize non-zero weights), L2 (penalize large weights), and dropout (randomly ignore a proportion of units during training, zeroing out their contribution). In standard ANNs, none of these regularization methods is used.
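To make these penalties concrete, here is a minimal sketch in base R; the weight vector w, the tuning constant lambda, and the dropout rate are illustrative values, not anything produced by a fitted network:

    # Illustrative weights and tuning constant (hypothetical values)
    set.seed(123)
    w <- rnorm(10)
    lambda <- 0.01

    l1_penalty <- lambda * sum(abs(w))  # L1: drives weights toward exactly zero
    l2_penalty <- lambda * sum(w^2)     # L2: shrinks large weights

    # Dropout: randomly zero a proportion of units for a training pass
    drop_rate <- 0.2
    mask <- rbinom(length(w), size = 1, prob = 1 - drop_rate)
    w_dropped <- w * mask               # masked units contribute nothing

In a real network, the penalty terms are added to the loss function being minimized, and a fresh dropout mask is drawn on each training pass.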

Another way to identify latent structure is to reduce the dimensionality of the data. One such method is the autoencoder. This is a neural network in which the inputs are transformed into a lower-dimensional hidden representation, from which the network then tries to reconstruct the original inputs. In the following diagram, notice that Feature A is not connected to one of the hidden nodes:

This can be applied recursively, and learning can take place over many hidden layers. What you see happening in this case is that the network develops features of features as the layers are stacked on each other. Deep learning will learn the weights between each pair of layers in sequence first and then use backpropagation to fine-tune these weights. Other feature extraction methods include the restricted Boltzmann machine and sparse coding.
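As a rough illustration, here is a minimal autoencoder sketch in R using the h2o package, assuming h2o (and the Java runtime it needs) is installed; the built-in iris data and the two-node bottleneck are purely illustrative choices:

    library(h2o)
    h2o.init()

    iris_h2o <- as.h2o(iris[, 1:4])   # numeric inputs only

    # Train the network to reconstruct its own inputs through a
    # two-node bottleneck, forcing a compressed representation
    ae <- h2o.deeplearning(x = colnames(iris_h2o),
                           training_frame = iris_h2o,
                           autoencoder = TRUE,
                           hidden = c(2),
                           epochs = 50)

    # The hidden-layer activations are the reduced-dimension features
    reduced <- h2o.deepfeatures(ae, iris_h2o, layer = 1)
    head(reduced)

The extracted features could then feed a supervised model, in the iterating back-and-forth fashion described earlier.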

The details of the restricted Boltzmann machine and sparse coding models are beyond our scope, and many resources are available to learn about the specifics. Here are a couple of starting points: http://www.cs.toronto.edu/~hinton/ and http://deeplearning.net/.

Deep learning has performed well on many classification problems, including winning a Kaggle contest or two. It still suffers from the problems of ANNs, especially the black box problem: try explaining to the uninformed what is happening inside a neural network, regardless of the various in-vogue methods you use. However, it is appropriate for problems where an explanation of how is not an issue and the important question is what. After all, do we really care why an autonomous car avoided running into a pedestrian, or do we care about the fact that it did not? Additionally, the Python community has a bit of a head start on the R community in deep learning usage and packages, but as we will see in the practical exercise, the gap is closing.

While deep learning is an exciting undertaking, be aware that to achieve the full benefit of its capabilities, you will need a high degree of computational power, along with the time to train the best model by fine-tuning the hyperparameters. Here is a list of some things that you will need to consider, followed by a sketch of how they might be specified in code:

  • An activation function
  • Size and number of the hidden layers
  • Dimensionality reduction, that is, restricted Boltzmann machine versus autoencoder
  • The number of epochs
  • The gradient descent learning rate
  • The loss function
  • Regularization
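
As a rough sketch of how these hyperparameters map onto an actual call, here is an illustrative h2o.deeplearning() specification; the predictors, response, and train objects are hypothetical, and the values shown are arbitrary examples rather than recommendations:

    library(h2o)
    h2o.init()

    fit <- h2o.deeplearning(x = predictors,         # hypothetical predictor names
                            y = response,           # hypothetical response name
                            training_frame = train, # hypothetical H2OFrame
                            activation = "Rectifier",   # activation function
                            hidden = c(50, 25),         # size and number of hidden layers
                            epochs = 20,                # number of epochs
                            adaptive_rate = FALSE,      # use a fixed learning rate...
                            rate = 0.01,                # ...the gradient descent learning rate
                            loss = "Automatic",         # loss function
                            l1 = 1e-4, l2 = 1e-4,       # L1/L2 regularization
                            input_dropout_ratio = 0.1)  # dropout regularization

Dimensionality reduction would be handled separately, for example by first building an autoencoder as sketched earlier and feeding its features into the supervised model.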