Computer vision and the machine learning workflow 

Computer vision applications with machine learning have a common basic structure. This structure is divided into different steps:

  1. Pre-process
  2. Segmentation
  3. Feature extraction
  4. Classification result
  5. Post-process

Most of these steps appear in almost all computer vision applications, although some applications omit one or more of them. In the following diagram, you can see the different steps that are involved:

Almost all computer vision applications start with a Pre-process step applied to the input image, which consists of operations such as removing uneven lighting and noise, filtering, and blurring. Once all the required pre-processing has been applied to the input image, the second step is Segmentation. In this step, we extract the regions of interest in the image and isolate each one as a unique object of interest. For example, in a face detection system, we have to separate the faces from the rest of the scene.

After detecting the objects inside the image, we continue to the next step, Feature extraction. Here, we extract the features of each object; the features are normally represented as a vector of characteristics. A characteristic describes our objects and can be the area of an object, its contour, its texture pattern, its pixels, and so on.
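To make the Segmentation and Feature extraction steps more concrete, here is a minimal sketch in plain Python. It assumes the image has already been pre-processed into a small binary mask (1 for object pixels, 0 for background). The function names `segment` and `extract_features`, the flood-fill labeling, and the choice of area and bounding-box aspect ratio as features are illustrative assumptions, not the book's implementation.

```python
from collections import deque


def segment(mask):
    """Group 4-connected foreground pixels into regions (lists of (row, col))."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited object pixel.
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(pixels)
    return regions


def extract_features(pixels):
    """Return the feature vector [area, aspect_ratio] for one region."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    return [float(len(pixels)), width / height]


# Hypothetical 3x6 binary mask containing two separate objects.
mask = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
features = [extract_features(region) for region in segment(mask)]
print(features)  # [[4.0, 1.0], [3.0, 3.0]]
```

Each inner list is one object's feature vector: a 2x2 square with area 4 and aspect ratio 1.0, and a 3x1 bar with area 3 and aspect ratio 3.0.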

Now, we have the descriptor, also known as a feature vector or feature set, of our object. Descriptors are the features that describe an object, and we use them either to train a model or to predict with it. To train, we have to create a large dataset of features, for which thousands of images are pre-processed. We then pass the extracted features (image/object characteristics), such as area, size, and aspect ratio, to the Train model function we choose. In the following diagram, we can see how a dataset is fed into a Machine Learning Algorithm to train and generate a Model:

When we Train with a dataset, the Model learns all the parameters it needs to make a prediction when a new feature vector with an unknown label is given as input to our algorithm. In the following diagram, we can see how an unknown vector of features is used to Predict using the generated Model, which returns the Classification result or, for regression, a numeric value:
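The train and predict flow can be sketched with a deliberately simple classifier. The example below uses a nearest-centroid classifier as a stand-in, not the support vector machine used later, because it fits in a few lines of plain Python. The `train` and `predict` functions, the labels `ring` and `screw`, and the `[area, aspect_ratio]` values are all hypothetical and chosen only for illustration.

```python
import math


def train(samples, labels):
    """Learn the model parameters: one mean feature vector per label."""
    sums, counts = {}, {}
    for vec, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(model, vec):
    """Return the label whose centroid is closest to the unknown vector."""
    return min(model, key=lambda label: math.dist(model[label], vec))


# Hypothetical labeled dataset of [area, aspect_ratio] feature vectors.
samples = [[400.0, 1.0], [420.0, 1.1], [150.0, 4.0], [170.0, 4.2]]
labels = ["ring", "ring", "screw", "screw"]

model = train(samples, labels)
print(predict(model, [390.0, 0.9]))  # ring
print(predict(model, [165.0, 3.8]))  # screw
```

Note that the features are not normalized here, so the area dominates the distance; real pipelines usually scale features to comparable ranges before training.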

After predicting the result, post-processing of the output data is sometimes required, for example, merging multiple classifications to decrease the prediction error, or merging multiple labels. One example is optical character recognition (OCR), where the Classification result is given per predicted character; by combining the results of character recognition, we construct a word. This means that we can create a post-processing method to correct errors in detected words.

With this short introduction to machine learning for computer vision, we are going to implement our own application that uses machine learning to classify objects in a slide tape. We are going to use support vector machines (SVMs) as our classification method and explain how to use them. Other machine learning algorithms are used in a very similar way. The OpenCV documentation has detailed information about all of its machine learning algorithms at the following link: https://docs.opencv.org/master/dd/ded/group__ml.html.
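As a concrete illustration of the post-processing step described above, the following sketch shows two simple strategies: merging several classifications of the same object by majority vote, and joining per-character OCR predictions into a word that is then corrected against a vocabulary. The helper names `majority_vote` and `correct_word`, the vocabulary, and the character-mismatch distance are assumptions made for this example only.

```python
from collections import Counter


def majority_vote(labels):
    """Merge several predictions for one object into the most frequent label."""
    return Counter(labels).most_common(1)[0][0]


def correct_word(chars, vocabulary):
    """Join predicted characters and snap the word to the closest known word.

    Only vocabulary words of the same length are considered, and closeness
    is the number of mismatched character positions.
    """
    word = "".join(chars)
    if word in vocabulary:
        return word
    candidates = [w for w in vocabulary if len(w) == len(word)]
    if not candidates:
        return word  # nothing comparable, so keep the raw prediction
    return min(candidates, key=lambda w: sum(a != b for a, b in zip(w, word)))


# Three hypothetical classifications of the same object.
print(majority_vote(["screw", "ring", "screw"]))  # screw

# Hypothetical per-character OCR output where "l" was misread as "1".
vocabulary = ["hello", "world", "help"]
print(correct_word(["h", "e", "1", "l", "o"], vocabulary))  # hello
```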
