
Network architecture 

We will now experiment with the pre-trained ResNet50, InceptionV3, and VGG16 networks to find out which one gives the best results. The weights of each pre-trained model were obtained by training on ImageNet. I have provided links to the original papers for the ResNet, InceptionV3, and VGG16 architectures, for reference. Readers are advised to go through these papers to gain an in-depth understanding of the architectures and the subtle differences between them.

The VGG paper link is as follows:

The ResNet paper link is as follows:

The InceptionV3 paper link is as follows:

To explain in brief, VGG16 is a 16-layer CNN that uses 3 x 3 convolution filters and 2 x 2 max pooling windows. The activation functions used throughout the hidden layers are all ReLUs. The VGG architecture, developed by Simonyan and Zisserman, was the runner-up in the classification task of the ILSVRC 2014 competition. The VGG16 network gained a lot of popularity due to its simplicity, and it is one of the most popular networks for extracting features from images.
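The effect of stacking 3 x 3 convolutions and 2 x 2 pooling can be checked with a little arithmetic. The following is a minimal sketch, not code from any library: the 224 x 224 input size, the padding of 1, the pooling stride of 2, and the per-block layer and channel counts are assumptions taken from the commonly published VGG16 configuration rather than from this text, and the helper names are made up for illustration.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution (standard output-size formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2, stride=2):
    """Spatial size after max pooling without padding."""
    return (size - window) // stride + 1


# Assumed VGG16 layout: (number of 3 x 3 conv layers, output channels) per block.
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def vgg16_feature_shapes(input_size=224):
    """Return the (height, width, channels) produced by each block."""
    shapes = []
    size = input_size
    for n_convs, channels in VGG16_BLOCKS:
        for _ in range(n_convs):
            size = conv_out(size)  # 3 x 3 with padding 1 keeps the size
        size = pool_out(size)      # 2 x 2 with stride 2 halves the size
        shapes.append((size, size, channels))
    return shapes


shapes = vgg16_feature_shapes()
# 13 convolutional layers plus 3 fully connected layers give the "16".
n_weight_layers = sum(n for n, _ in VGG16_BLOCKS) + 3

for shape in shapes:
    print(shape)
print("weight layers:", n_weight_layers)
```

Under these assumptions, the last block yields a 7 x 7 x 512 feature map, which is the tensor typically used when VGG16 serves as a feature extractor.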

ResNet50 is a deep CNN built on the idea of the residual block, which is quite different from the design of the VGG16 network. After a series of convolution and activation operations, the input of the block is added to its output through a skip (shortcut) connection. The ResNet architecture was developed by Kaiming He, et al., and although its deepest ImageNet variant has 152 layers (ResNet50 has 50), it is still less complex than the VGG network in terms of computation. This architecture won the ILSVRC 2015 competition by achieving a top-five error rate of 3.57%, which is better than the human-level performance on this competition dataset. The top-five error rate is the fraction of test images for which the target class is not among the five class predictions with the highest probability. In principle, the ResNet network tries to learn the residual mapping, as opposed to directly learning the mapping from the input to the output, as you can see in the following residual block diagram:

Figure 2.8: Residual block of ResNet models
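The residual idea can be illustrated with a toy block in plain Python. This is only a sketch under simplifying assumptions: small dense layers stand in for the convolutions of a real ResNet block, batch normalization is left out, and the function names are hypothetical. The block computes F(x) with two layers and then adds the input x back before the final activation.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]


def dense(v, weights, bias):
    """Compute W v + b, where weights is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]


def residual_block(x, w1, b1, w2, b2):
    """Return relu(F(x) + x), where F is two small dense layers."""
    hidden = relu(dense(x, w1, b1))
    residual = dense(hidden, w2, b2)                      # F(x)
    return relu([r + xi for r, xi in zip(residual, x)])  # skip connection


x = [1.0, 2.0, 3.0]
zero_w = [[0.0] * 3 for _ in range(3)]
zero_b = [0.0] * 3
identity_w = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

# With all weights at zero, F(x) = 0 and the block passes x through unchanged.
out_zero = residual_block(x, zero_w, zero_b, zero_w, zero_b)

# With identity weights, F(x) = x and the block outputs x + x.
out_identity = residual_block(x, identity_w, zero_b, identity_w, zero_b)

print(out_zero)
print(out_identity)
```

The zero-weight case shows why this design helps deep networks: a block that has learned nothing still behaves like an identity mapping, so the layers only need to learn the correction F(x).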

InceptionV3 is a CNN architecture from Google. Instead of using fixed-size convolutional filters at each layer, the InceptionV3 architecture uses filters of different sizes in parallel to extract features at different levels of granularity. The convolution block of an InceptionV3 layer is illustrated in the following diagram:

Figure 2.9: InceptionV3 convolution block
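The idea of applying several filter sizes to the same input can be sketched with a one-dimensional toy example. This is a heavy simplification and not the InceptionV3 block itself: real blocks operate on 2-D feature maps and also include pooling branches and 1 x 1 bottleneck convolutions. The function names and the all-ones kernels are illustrative assumptions.

```python
def conv1d_same(signal, kernel):
    """1-D cross-correlation, zero-padded so output length equals input length.

    Assumes an odd kernel length.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def inception_like_block(signal, kernels):
    """Apply every kernel to the same input and stack the results as channels."""
    return [conv1d_same(signal, k) for k in kernels]


signal = [1.0, 2.0, 3.0, 4.0]
kernels = [
    [1.0],                      # size 1: looks at a single position
    [1.0, 1.0, 1.0],            # size 3: small neighborhood
    [1.0, 1.0, 1.0, 1.0, 1.0],  # size 5: wider neighborhood
]

channels = inception_like_block(signal, kernels)
for kernel, channel in zip(kernels, channels):
    print(len(kernel), channel)
```

Because every branch preserves the input length, the outputs can be stacked along the channel axis, which is what lets the next layer see features computed at several scales at once.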

Inception V1 (GoogLeNet) was the winner of the ILSVRC 2014 competition. Its top-five error rate of 6.67% was very close to human-level performance.
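Since the top-five error rate is used to compare these networks, a minimal sketch of how it can be computed may help. The function name and the toy scores below are made up for illustration; a prediction counts as a miss when the target class is not among the k classes with the highest scores.

```python
def top_k_error(scores, targets, k=5):
    """Fraction of samples whose target is not among the k highest scores."""
    misses = 0
    for row, target in zip(scores, targets):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if target not in ranked[:k]:
            misses += 1
    return misses / len(targets)


# Two samples, seven classes each (toy numbers).
scores = [
    [0.05, 0.30, 0.10, 0.20, 0.15, 0.12, 0.08],
    [0.02, 0.25, 0.20, 0.18, 0.15, 0.12, 0.08],
]
targets = [5, 0]  # class 5 is in the first top five; class 0 is not in the second

print(top_k_error(scores, targets, k=5))
print(top_k_error(scores, targets, k=1))
```

In this toy case, one of the two targets falls inside the top five and neither is the top prediction, so the top-five error is lower than the top-one error, as is generally the case on ImageNet.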
