- Machine Learning for OpenCV
- Michael Beyeler
- 2021-07-02 19:47:17
Understanding the machine learning workflow
As mentioned earlier, machine learning is all about building mathematical models in order to understand data. The learning aspect enters this process when we give a machine learning model the capability to adjust its internal parameters; we can tweak these parameters so that the model explains the data better. In a sense, this can be understood as the model learning from the data. Once the model has learned enough--whatever that means--we can ask it to explain newly observed data.
This process is illustrated in the following figure:

Let's break it down step by step.
The first thing to notice is that machine learning problems are always split into (at least) two distinct phases:
- A training phase, during which we aim to train a machine learning model on a set of data that we call the training dataset
- A test phase, during which we evaluate the learned (or finalized) machine learning model on a new set of never-before-seen data that we call the test dataset
The importance of splitting our data into a training set and a test set cannot be overstated. We always evaluate our models on an independent test set because we are interested in knowing how well our models generalize to new data. In the end, isn't this what learning is all about--be it machine learning or human learning? Think back to school when you were a learner yourself: the problems you had to solve as part of your homework would never show up in exactly the same form in the final exam. The same scrutiny should be applied to a machine learning model; we are not so much interested in how well our models can memorize a set of data points (such as a homework problem), but we want to know how our models will use what they have learned to solve new problems (such as the ones that show up in a final exam) and explain new data points.
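To make the split concrete, here is a minimal sketch using only NumPy. The helper name `train_test_split_indices`, the 30% test fraction, and the toy data are assumptions made for illustration; they do not come from the book. The point is simply that every data point ends up in exactly one of the two sets, so the test set stays unseen during training.

```python
import numpy as np


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into disjoint train/test sets.

    Hypothetical helper for illustration; not part of OpenCV or the book.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    # The first n_test shuffled indices form the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


# Toy dataset: 10 data points with 2 features each (placeholder values).
data = np.arange(20, dtype=np.float32).reshape(10, 2)

train_idx, test_idx = train_test_split_indices(len(data), test_fraction=0.3)
train_data = data[train_idx]  # used to fit the model
test_data = data[test_idx]    # held back until evaluation
```

Shuffling before splitting matters when the data is stored in some order (for example, all samples of one class first); otherwise the test set might not resemble the training set.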
The next thing to notice is that machine learning is really all about the data. Data enters the previously described workflow diagram in its raw form--whatever that means--and is used in both training and test phases. Data can be anything from images and movies to text documents and audio files. Therefore, in its raw form, data might be made of pixels, letters, words, or even worse: pure bits. It is easy to see that data in such a raw form might not be very convenient to work with. Instead, we have to find ways to preprocess the data in order to bring it into a form that is easy to parse.
Data preprocessing comes in two stages:
- Feature selection: This is the process of identifying important attributes (or features) in the data. Possible features of an image might be the location of edges, corners, or ridges. You might already be familiar with some more advanced feature descriptors that OpenCV provides, such as speeded up robust features (SURF) or the histogram of oriented gradients (HOG). Although these features can be applied to any image, they might not be that important (or work that well) for our specific task. For example, if our task was to distinguish between clean and dirty water, the most important feature might turn out to be the color of the water, and the use of SURF or HOG features might not help us much.
- Feature extraction: This is the actual process of transforming the raw data into the desired feature space. An example would be the Harris operator, which allows us to extract corners (that is, a selected feature) in an image.
A more advanced topic is the process of inventing informative features, which is known as feature engineering. After all, before it was possible for people to select from popular features, someone had to invent them first. This is often more important for the success of our algorithm than the choice of the algorithm itself. We will talk about feature engineering extensively in Chapter 4, Representing Data and Engineering Features.
A last point to be made is that in supervised learning, every data point must have a label. A label identifies a data point as either belonging to a certain class of things (such as cat or dog) or having a certain value (such as the price of a house). At the end of the day, the goal of a supervised machine learning system is to predict the label of all data points in the test set (as shown in the previous figure). We do this by learning regularities in the training data, using the labels that come with it, and then testing our performance on the test set.
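The whole loop of training on labeled data and then predicting the labels of the test set can be sketched in a few lines of NumPy. The 1-nearest-neighbor rule used here is a stand-in chosen for brevity, not the method the book goes on to use, and the two synthetic clusters, the helper names `make_points` and `predict_1nn`, and the 80/20 split are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_points(center, label, n):
    """Draw n two-dimensional points around a center, all with the same label."""
    points = rng.normal(loc=center, scale=1.0, size=(n, 2))
    return points, np.full(n, label)


# Two well-separated classes, labeled 0 and 1 (synthetic data).
points0, labels0 = make_points((0.0, 0.0), 0, 50)
points1, labels1 = make_points((10.0, 10.0), 1, 50)
features = np.vstack([points0, points1])
labels = np.concatenate([labels0, labels1])

# Shuffle, then hold out 20 of the 100 data points as the test set.
order = rng.permutation(len(features))
train, test = order[:80], order[80:]


def predict_1nn(train_x, train_y, query_x):
    """Give each query point the label of its closest training point."""
    distances = np.linalg.norm(query_x[:, None, :] - train_x[None, :, :], axis=2)
    return train_y[np.argmin(distances, axis=1)]


# Predict labels for the test set and compare with the true labels.
predicted = predict_1nn(features[train], labels[train], features[test])
accuracy = np.mean(predicted == labels[test])
```

The test labels are used only in the last line, to score the predictions. They never influence the model, which is what makes the accuracy an estimate of performance on new data.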
Therefore, in order to build a functioning machine learning system, we first have to cover how to load, store, and manipulate data. How do you even do that in OpenCV with Python?