- Practical Computer Vision
- Abhinav Dadhich
- 2021-06-30 18:54:46
Pascal VOC
Earlier datasets such as MNIST and CIFAR are limited in what they represent, so we cannot use them for tasks like people detection or segmentation. Pascal VOC[4] has become one of the major datasets for object recognition and is widely used for such tasks. From 2005 to 2012, annual competitions were held on this dataset, with entrants competing for the best accuracy on the test data. The dataset is usually referred to by year; for example, VOC2012 is the dataset released for the 2012 competition. VOC2012 has three competition categories. The first is classification and detection, which covers 20 object categories with rectangular bounding-box annotations around the objects. The second is segmentation, with instance boundaries around objects. The third is action recognition from images.
This dataset can be downloaded from the following link:
http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html
A sample annotation file (in XML format) for one image in this dataset follows; each tag names the property that its value describes:
<annotation>
  <folder>VOC2012</folder>
  <filename>2007_000033.jpg</filename>
  <source>
    <database>The VOC2007 Database</database>
    <annotation>PASCAL VOC2007</annotation>
    <image>flickr</image>
  </source>
  <size>
    <width>500</width>
    <height>366</height>
    <depth>3</depth>
  </size>
  <segmented>1</segmented>
  <object>
    <name>aeroplane</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>9</xmin>
      <ymin>107</ymin>
      <xmax>499</xmax>
      <ymax>263</ymax>
    </bndbox>
  </object>
  <object>
    <name>aeroplane</name>
    <pose>Left</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>421</xmin>
      <ymin>200</ymin>
      <xmax>482</xmax>
      <ymax>226</ymax>
    </bndbox>
  </object>
  <object>
    <name>aeroplane</name>
    <pose>Left</pose>
    <truncated>1</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>325</xmin>
      <ymin>188</ymin>
      <xmax>411</xmax>
      <ymax>223</ymax>
    </bndbox>
  </object>
</annotation>
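An annotation in this format can be read with Python's standard `xml.etree.ElementTree` module. The following is a minimal sketch, not the book's own loader: the function name `parse_voc_annotation` and the dictionary layout it returns are assumptions chosen for illustration, and the embedded string is a shortened copy of the sample above containing only its first object.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample annotation (first object only).
SAMPLE_XML = """
<annotation>
  <filename>2007_000033.jpg</filename>
  <size>
    <width>500</width>
    <height>366</height>
    <depth>3</depth>
  </size>
  <object>
    <name>aeroplane</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>9</xmin>
      <ymin>107</ymin>
      <xmax>499</xmax>
      <ymax>263</ymax>
    </bndbox>
  </object>
</annotation>
"""


def parse_voc_annotation(xml_text):
    """Hypothetical helper: return (filename, (width, height), objects)."""
    root = ET.fromstring(xml_text.strip())
    filename = root.findtext("filename")

    size = root.find("size")
    width = int(size.findtext("width"))
    height = int(size.findtext("height"))

    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        objects.append({
            "name": obj.findtext("name"),
            "pose": obj.findtext("pose"),
            # The flags are stored as "0" or "1" in the XML.
            "truncated": obj.findtext("truncated") == "1",
            "difficult": obj.findtext("difficult") == "1",
            # Bounding box as (xmin, ymin, xmax, ymax) in pixels.
            "bbox": tuple(
                int(box.findtext(tag))
                for tag in ("xmin", "ymin", "xmax", "ymax")
            ),
        })
    return filename, (width, height), objects


filename, size, objects = parse_voc_annotation(SAMPLE_XML)
print(filename, size)
for obj in objects:
    print(obj["name"], obj["bbox"])
```

To read an annotation file from disk instead, `ET.parse(path).getroot()` can replace the `ET.fromstring` call; the rest of the sketch stays the same.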
The image that this annotation describes (2007_000033.jpg) is shown in the following figure:

The 20 categories available in this dataset are aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, dining table, dog, horse, motorbike, person, potted plant, sheep, sofa, train, and TV/monitor.
The number of categories is, however, limited. In the next section, we will see a larger dataset with 80 categories. A larger set of generic object categories helps in building applications that work in more general scenarios.