- Deep Learning with PyTorch
- Vishnu Subramanian
Image classification using deep learning
The most important step in solving any real-world problem is getting the data. Kaggle hosts a huge number of competitions on different data science problems. We will pick a problem posed in 2014, use it to test our deep learning algorithms in this chapter, and improve on it in Chapter 5, Deep Learning for Computer Vision, which covers Convolutional Neural Networks (CNNs) and some of the advanced techniques we can use to improve the performance of our image recognition models. You can download the data from https://www.kaggle.com/c/dogs-vs-cats/data. The dataset contains 25,000 images of dogs and cats. Preprocessing the data and creating train, validation, and test splits are important steps that must be performed before we can implement an algorithm. Once the data is downloaded, a quick look shows that the folder contains images in the following format:

Most frameworks make it easy to read the images and tag them with their labels when they are provided in the following format, where each class has a separate folder of its images. Here, all cat images should be in the cat folder and all dog images in the dog folder:
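For reference, the target layout looks roughly like this (the directory names are the ones the code below creates; the individual filenames are illustrative):

```
dogsandcats/
    train/
        cat/cat.0.jpg, cat.1.jpg, ...
        dog/dog.0.jpg, dog.1.jpg, ...
    valid/
        cat/...
        dog/...
```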

Python makes it easy to put the data into the right format. Let's quickly take a look at the code and then go through its important parts:
import os
from glob import glob
import numpy as np

path = '../chapter3/dogsandcats/'

#Read all the files inside our folder.
files = glob(os.path.join(path,'*/*.jpg'))
print(f'Total no of images {len(files)}')
no_of_images = len(files)

#Create a shuffled index which can be used to create a validation dataset.
shuffle = np.random.permutation(no_of_images)

#Create a validation directory for holding validation images.
os.mkdir(os.path.join(path,'valid'))

#Create directories with label names inside train and valid.
for t in ['train','valid']:
    for folder in ['dog/','cat/']:
        os.mkdir(os.path.join(path,t,folder))

#Move a small subset of images into the validation folder.
for i in shuffle[:2000]:
    #Filenames look like 'cat.123.jpg', so the first token is the label.
    folder = files[i].split('/')[-1].split('.')[0]
    image = files[i].split('/')[-1]
    os.rename(files[i],os.path.join(path,'valid',folder,image))

#Move the remaining images into the training folder.
for i in shuffle[2000:]:
    folder = files[i].split('/')[-1].split('.')[0]
    image = files[i].split('/')[-1]
    os.rename(files[i],os.path.join(path,'train',folder,image))
All the preceding code does is retrieve all the files, pick 2,000 images for a validation set, and segregate the images into the two categories of cats and dogs. Creating a separate validation set is a common and important practice, as it is not fair to test an algorithm on the same data it was trained on. To create the validation dataset, we build a list of numbers in the range of the number of images, in shuffled order. The shuffled numbers act as indices for picking a set of images for the validation dataset. Let's go through each section of the code in detail.
We retrieve all the filenames using the following code:
files = glob(os.path.join(path,'*/*.jpg'))
The glob method returns all the files in the particular path. When there is a huge number of images, we can instead use iglob, which returns an iterator rather than loading all the names into memory. In our case, we have only 25,000 filenames, which easily fit into memory.
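The difference can be sketched as follows. This snippet builds a tiny throwaway directory tree so it is runnable anywhere; in the chapter's setting, the real `'../chapter3/dogsandcats/'` path would be used instead:

```python
import os
import tempfile
from glob import glob, iglob

# Build a tiny stand-in directory tree with two fake image files.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, 'train'))
for name in ['cat.0.jpg', 'dog.0.jpg']:
    open(os.path.join(root, 'train', name), 'w').close()

# glob materializes the whole list of filenames in memory...
eager = glob(os.path.join(root, '*/*.jpg'))
# ...while iglob yields one path at a time.
lazy_count = sum(1 for _ in iglob(os.path.join(root, '*/*.jpg')))
print(len(eager), lazy_count)  # 2 2
```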
We can shuffle our files using the following code:
shuffle = np.random.permutation(no_of_images)
The preceding code returns the 25,000 numbers from 0 to 24,999 in a shuffled order, which we use as indices for selecting a subset of images to create the validation dataset.
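As a quick illustration of what np.random.permutation returns (using a small n for readability, and a fixed seed, which the book's code does not set):

```python
import numpy as np

np.random.seed(0)  # fixed seed only so the example is reproducible
idx = np.random.permutation(5)
print(idx)
# Every index from 0 to 4 appears exactly once, just reordered.
assert sorted(int(i) for i in idx) == [0, 1, 2, 3, 4]
```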
We can create the validation directories as follows:
os.mkdir(os.path.join(path,'valid'))
for t in ['train','valid']:
    for folder in ['dog/','cat/']:
        os.mkdir(os.path.join(path,t,folder))
The preceding code creates a valid folder, and then creates a folder for each category (cat and dog) inside both the train and valid directories.
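A slightly more defensive variant (an alternative sketch, not the book's code) uses os.makedirs, which creates any missing intermediate directories and can skip ones that already exist:

```python
import os
import tempfile

path = tempfile.mkdtemp()  # stand-in for '../chapter3/dogsandcats/'
for t in ['train', 'valid']:
    for folder in ['dog', 'cat']:
        # exist_ok=True makes the script safe to re-run.
        os.makedirs(os.path.join(path, t, folder), exist_ok=True)
print(sorted(os.listdir(os.path.join(path, 'train'))))  # ['cat', 'dog']
```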
We can move images into the validation folder using the shuffled index, as follows:
for i in shuffle[:2000]:
    folder = files[i].split('/')[-1].split('.')[0]
    image = files[i].split('/')[-1]
    os.rename(files[i],os.path.join(path,'valid',folder,image))
In the preceding code, we use our shuffled index to randomly pick 2,000 different images for our validation set. We then do something similar for the training data, to segregate the images inside the train directory.
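The label extraction relies on the Kaggle filenames following the `<class>.<number>.jpg` pattern. On a hypothetical path, the two split calls behave like this (os.path.basename would be a more portable way to get the filename than splitting on '/'):

```python
import os

# A hypothetical path following the Kaggle naming scheme.
sample = '../chapter3/dogsandcats/train/cat.123.jpg'
image = sample.split('/')[-1]   # the filename: 'cat.123.jpg'
folder = image.split('.')[0]    # the class label: 'cat'
print(folder, image)  # cat cat.123.jpg
# A more portable equivalent of the first split:
assert os.path.basename(sample) == image
```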
As we have the data in the format we need, let's quickly look at how to load the images as PyTorch tensors.