
CNNs

The most common use case scenarios for CNNs involve image processing, but they are not restricted to that domain; other types of input, such as audio or video, can be handled too. A typical use case is image classification: the network is fed images and classifies them. For example, it outputs lion when you give it a lion picture, tiger when you give it a tiger picture, and so on. The reason this kind of network is used for image classification is that it requires relatively little preprocessing compared to other algorithms in the same space: the network learns the filters that, in traditional algorithms, were hand-engineered.

Being a multilayered neural network, a CNN consists of an input layer and an output layer, as well as multiple hidden layers. The hidden layers can be convolutional, pooling, fully connected, or normalization layers. Convolutional layers apply a convolution operation (https://en.wikipedia.org/wiki/Convolution) to their input before passing the result to the next layer. This operation emulates how the response of an individual physical neuron to a visual stimulus is generated. Each convolutional neuron processes only the data in its receptive field (the particular region of the sensory space of an individual sensory neuron in which a change in the environment will modify the firing of that neuron). Pooling layers combine the outputs of clusters of neurons in one layer into a single neuron in the next layer. There are different implementations of pooling: max pooling, which uses the maximum value from each cluster in the prior layer; average pooling, which uses the average value of each cluster in the prior layer; and so on. Fully connected layers, as their name suggests, connect every neuron in one layer to every neuron in another layer.
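The pooling variants described above can be sketched in a few lines. The following is a minimal illustration using plain NumPy (not any particular deep learning framework); the function name `pool2d` and the 4 x 4 example input are made up for demonstration:

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling over a 2D array (stride equals the pool size)."""
    h, w = x.shape
    # Reshape into clusters of size x size; each cluster becomes one output neuron
    blocks = x[:h - h % size, :w - w % size].reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))    # max pooling: maximum of each cluster
    return blocks.mean(axis=(1, 3))       # average pooling: mean of each cluster

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 0],
              [3, 4, 1, 8]], dtype=float)

print(pool2d(x, mode="max"))      # [[6. 4.] [7. 9.]]
print(pool2d(x, mode="average"))  # [[3.75 2.25] [4.   4.5 ]]
```

Note how a 4 x 4 input collapses to 2 x 2: each 2 x 2 cluster of neurons contributes a single value to the next layer, which is exactly the downsampling role pooling plays in a CNN.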

CNNs don't parse all the training data at once; instead, they usually start with a sort of input scanner. For example, consider an image of 200 x 200 pixels as input. In this case, the model doesn't have a layer with 40,000 nodes, but a scanning input layer of 20 x 20, which is fed the first 20 x 20 pixels of the original image (usually starting in the upper-left corner). Once that input has been passed through (and possibly used for training), the network is fed the next 20 x 20 pixels (this will be explained in more detail in Chapter 5, Convolutional Neural Networks; the process is similar to the movement of a scanner, one pixel to the right). Please note that the image isn't dissected into 20 x 20 blocks; rather, the scanner moves over it. This input data is then fed through one or more convolutional layers. Each node of those layers only has to work with its close neighboring cells—not all of the nodes are connected to each other. The deeper a network becomes, the more its convolutional layers shrink, typically by a factor that divides the input size (if we started with a layer of 20, then, most probably, the next one would be a layer of 10 and the following a layer of 5). Powers of two are commonly used as these factors.
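The scanning behavior above amounts to sliding a small window (the receptive field) across the image and computing a weighted sum at each position. A minimal NumPy sketch (the function name `conv2d_valid` and the edge-detector kernel are illustrative choices, not from the original text):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image one pixel at a time (stride 1, no
    padding), computing a dot product over each receptive field."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1   # output height shrinks by kh - 1
    ow = image.shape[1] - kw + 1   # output width shrinks by kw - 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i:i + kh, j:j + kw]    # current receptive field
            out[i, j] = np.sum(patch * kernel)   # one output node's value
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
edge_kernel = np.array([[-1.0, 1.0]])   # simple horizontal difference filter
print(conv2d_valid(image, edge_kernel).shape)  # (5, 4)
```

Each output node depends only on the pixels inside its window, not on the whole image, which is why the nodes of a convolutional layer are connected only to close neighbors rather than to every input.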

The following diagram (by Aphex34—own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=45679374) shows the typical architecture of a CNN:

Figure 2.8