- Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
- Tarek Amr
Understanding machine learning
You may be wondering how machines actually learn. To get the answer to this query, let's take the following example of a fictional company. Space Shuttle Corporation has a few space vehicles to rent. They get applications every day from clients who want to travel to Mars. They are not sure whether those clients will ever return the vehicles—maybe they'll decide to continue living on Mars and never come back again. Even worse, some of the clients may be lousy pilots and crash their vehicles on the way. So, the company decides to hire shuttle rent-approval officers whose job is to go through the applications and decide who is worthy of a shuttle ride. Their business, however, grows so big that they need to formulate the shuttle-approval process.
A traditional shuttle company would start by having business rules and hiring junior employees to execute those rules. For example, if you are an alien, then sorry, you cannot rent a shuttle from us. If you are a human and you have kids that are in school on Earth, then you are more than welcome to rent one of our shuttles. As you can see, those rules are too broad. What about aliens who love living on Earth and just want to go to Mars for a quick holiday? To come up with a better business policy, the company starts hiring analysts. Their job is to go through historical data and try to come up with detailed rules or business logic. These analysts can come up with very detailed rules. If you are an alien, one of your parents is from Neptune, your age is between 0.1 and 0.2 Neptunian years, and you have 3 to 4 kids and one of them is 80% or more human, then you are allowed to rent a shuttle. To be able to come up with suitable rules, the analysts also need a way to measure how good this business logic is. For example, what percentage of the shuttles return if certain rules are applied? They use historic data to evaluate these measures, and only then can we say that these rules are actually learned from data.
Machine learning works in almost the same way. You want to use historic data to come up with some business logic (an algorithm) in order to optimize some measure of how good the logic is (an objective or loss function). Throughout this book, we will learn about numerous machine learning algorithms; they differ from each other in how they represent business logic, what objective functions they use, and what optimization techniques they utilize to reach a model that maximizes (or sometimes minimizes) the objective function. Like the analysts in the previous example, you should pick an objective function that is as close as possible to your business objective. Any time you hear people saying data scientists should have a good understanding of their business, a significant part of that is their choice of a good objective function and ways to evaluate the models they build. In my example, I quickly picked the percentage of shuttles returned as my objective.
But if you think about it, is this really an accurate one-to-one mapping of the shuttle company's revenue? Is the revenue made by allowing a trip equal to the cost of losing a shuttle? Furthermore, rejecting a trip may also cost your company angry calls to the customer care center and negative word-of-mouth advertising. You have to understand all of this well enough before picking your objective function.
Finally, a key benefit to using machine learning is that it can iterate over a vast amount of business logic cases until it reaches the optimum objective function, unlike the case of the analysts in our space shuttle company who can only go so far with their rules. The machine learning approach is also automated in the sense that it keeps updating the business logic whenever new data arrives. These two aspects make it scalable, more accurate, and adaptable to change.
Types of machine learning algorithms
In this book, we are going to cover the two main paradigms of machine learning—supervised learning and unsupervised learning. Each of these two paradigms has its own sub-branches that will be discussed in the next section. Although it is not covered in this book, reinforcement learning will also be introduced in the next section.

Let's use our fictional Space Shuttle Corporation company once more to explain the differences between the different machine learning paradigms.
Supervised learning
Remember those old good days at school when you were given examples to practice on, along with the correct answers to them at the end to validate whether you are doing a good job? Then, at exam time, you were left on your own. That's basically what supervised learning is. Say our fictional space vehicle company wants to predict whether travelers will return their space vehicles. Luckily, the company has worked with many travelers in the past, and they already know which of them returned their vehicles and who did not. Think of this data as a spreadsheet, where each column has some information about the travelers—their financial statements, the number of kids they have, whether they are humans or aliens, and maybe their age (in Neptunian years, of course). Machine learners call these columns features. There is one extra column for previous travelers that states whether they returned or not; we call this column the label or target column. In the learning phase, we build a model using the features and targets. The aim of the algorithm while learning is to minimize the differences between its predictions and the actual targets. The difference is what we call the error. Once a model is constructed so that its error is minimal, we then use it to make predictions for newer data points. For new travelers, we only know their features, but we use the model we've just built to predict their corresponding targets. In a nutshell, the presence of the target in our historic data is what makes this process supervised.
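The learn-then-predict workflow described above can be sketched in a few lines of scikit-learn. The traveler features and targets here are made up purely for illustration; the choice of a decision tree is also just an assumption, any classifier would do:

```python
# A minimal supervised learning sketch: features and a target column
# for historic travelers, then predictions for a new traveler.
from sklearn.tree import DecisionTreeClassifier

# Historic data: each row is a traveler, each column a feature
# (age in Neptunian years, number of kids) -- values are made up.
X_train = [
    [0.15, 3],
    [0.30, 0],
    [0.12, 4],
    [0.45, 1],
]
y_train = [1, 0, 1, 0]  # the label/target column: 1 = returned, 0 = did not

# Learning phase: the algorithm minimizes the error between its
# predictions and the actual targets.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Prediction phase: for a new traveler we only know the features,
# and we use the fitted model to predict the missing target.
print(model.predict([[0.14, 3]]))  # predicts 1 (will return)
```

The presence of `y_train` in the `fit()` call is exactly what makes this supervised: the model is told the right answers for the historic cases.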
Classification versus regression
Supervised learning is further subdivided into classification and regression. For cases where we only have a few predefined labels to predict, we use a classifier—for example, return versus no return, or human versus Martian versus Venusian. If what we want to predict is a wide-range number—say, how many years a traveler will take to come back—then it is a regression problem since these values can be anything from 1 or 2 years to 3 years, 5 months, and 7 days.
Supervised learning evaluation
Due to their differences, the metrics we use to evaluate classifiers are usually different from the ones we use with regressors:
- Classifier evaluation metrics: Suppose we are using a classifier to determine whether a traveler is going to return. Then, of those travelers that the classifier predicted to return, we want to measure what percentage of them actually did return. We call this measure precision. Also, of all travelers who did return, we want to measure what percentage of them the classifier correctly predicted to return. We call this recall. Precision and recall can be calculated for each class—that is, we can also calculate precision and recall for the travelers who did not return.
Accuracy is another commonly used, and sometimes abused, measure. For each case in our historic data, we know whether a traveler actually returned (actuals) and we can also generate predictions of whether they will return. Accuracy calculates what percentage of cases the predictions and actuals match. As you can see, it is label agnostic, so it can sometimes be misleading when the classes are highly imbalanced. In our example business, say 99% of our travelers actually return. We can build a dummy classifier that predicts that every single traveler will return; it will be accurate 99% of the time. This 99% accuracy value doesn't tell us much, especially if you know that in these cases, the recall value for non-returning travelers is 0%. As we are going to see later on in this book, each measure has its pros and cons, and a measure is only as good as how close it is to our business objectives. We are also going to learn about other metrics, such as F1 score, AUC, and log loss.
- Regressor evaluation metrics: If we are using a regressor to tell how long a traveler will stay, then we need to determine how far the numbers that the regressor is predicting are from reality. Let's say for three users, the regressor expected them to stay for 6, 9, and 20 years, respectively, while they actually stayed for 5, 10, and 26 years, respectively. One solution is to calculate the average of the differences between the prediction and the reality—the average of 6–5, 9–10, and 20–26, so the average of 1, -1, and -6 is -2. One problem with these calculations is that 1 and -1 cancel each other out. If you think about it, both 1 and -1 are mistakes that the model made, and the sign might not matter much here.
So, we will need to use Mean Absolute Error (MAE) instead. This calculates the average of the absolute values of the differences—so, the average of 1, 1, and 6 is 2.67. This makes more sense now, but what if we can tolerate a 1-year difference more than a 6-year difference? We can then use Mean Squared Error (MSE) to calculate the average of the differences squared—so, the average of 1, 1, and 36 is 12.67. Clearly, each measure has its pros and cons here as well. Additionally, we can also use different variations of these metrics, such as median absolute error or max error. Furthermore, sometimes your business objective can dictate other measures. Say we want to penalize the model twice as much for predicting that a traveler will return 1 year late as for predicting them to return 1 year early—what metric can you come up with then?
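The measures discussed in both bullets above ship with scikit-learn, so we can reproduce the chapter's numbers directly. The 99%-returning population is the imbalanced example from the accuracy discussion, and the 6/9/20 versus 5/10/26 stays are the regression example:

```python
from sklearn.metrics import (
    accuracy_score,
    recall_score,
    mean_absolute_error,
    mean_squared_error,
)

# Classification: 99 returning travelers (1) and 1 who does not (0),
# with a dummy classifier that predicts "returns" for everyone.
y_true = [1] * 99 + [0]
y_pred = [1] * 100
print(accuracy_score(y_true, y_pred))             # 0.99
print(recall_score(y_true, y_pred, pos_label=0))  # 0.0 for non-returners

# Regression: predicted stays of 6, 9, and 20 years versus
# actual stays of 5, 10, and 26 years.
predicted = [6, 9, 20]
actual = [5, 10, 26]
print(mean_absolute_error(actual, predicted))  # average of 1, 1, 6  = 2.67
print(mean_squared_error(actual, predicted))   # average of 1, 1, 36 = 12.67
```

The 0.99 accuracy next to the 0.0 recall for the non-returning class is the misleading-accuracy trap in one screenful.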
In practice, the lines between classification and regression problems can get blurred sometimes. For the case of how many years a traveler will take to return, you can still decide to bucket the range into 1–5 years, 5–10 years, and 10+ years. Then, you end up with a classification problem to solve instead. Conversely, classifiers return probabilities along with their predicted targets. For the case of whether a user will return, predicted values of 60% and 95% mean the same thing from a binary classifier's point of view, but the classifier is more confident that the traveler will return in the second case compared to the first case. Although this is still a classification problem, we can use the Brier score to evaluate our classifier here, which is actually MSE in disguise. More on the Brier score will be covered in Chapter 9, The Y is as important as the X. Most of the time, it is clear whether you are facing a classification or regression problem, but always keep your eyes open to the possibility of reformulating your problem if needed.
Unsupervised learning
Life doesn't always provide us with correct answers as was the case when we were in school. We have been told that space travelers like it when they are traveling with like-minded passengers. We already know a lot about our travelers, but of course, no traveler will say by the way, I am a type A, B, or C traveler. So, to group our clients, we use a form of unsupervised learning called clustering. Clustering algorithms try to come up with groups and put our travelers into them without us telling them what groups may exist. Unsupervised learning lacks targets, but this doesn't mean that we cannot evaluate our clustering algorithms. We want the members of a cluster to be similar to each other, but we also want them to be dissimilar from the members of adjacent clusters. The silhouette coefficient basically measures that. We will come across other measures for clustering, such as the Davies-Bouldin index and the Calinski-Harabasz index, later in this book.
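A clustering sketch in scikit-learn looks much like the supervised one, except that no target column is passed to `fit`. The traveler features below are synthetic (two artificially separated groups), and the choice of k-means with two clusters is just an assumption for the demo:

```python
# Group travelers with no labels, then evaluate the grouping with
# the silhouette coefficient (ranges from -1 to 1; higher is better).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.RandomState(0)

# Two synthetic groups of travelers around different feature centers.
X = np.vstack([
    rng.normal(loc=[0.1, 2.0], scale=0.05, size=(20, 2)),
    rng.normal(loc=[0.5, 8.0], scale=0.05, size=(20, 2)),
])

# Note: fit_predict takes only X -- there is no target column.
clusterer = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = clusterer.fit_predict(X)

# Members of each cluster are similar to each other and dissimilar
# from the other cluster, so the score is close to 1 here.
print(silhouette_score(X, labels))
```

In real use we would not know the number of groups in advance; trying several values of `n_clusters` and comparing silhouette scores is one common way to pick it.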
Reinforcement learning
Reinforcement learning is beyond the scope of this book and is not implemented in scikit-learn. Nevertheless, I will briefly talk about it here. In the supervised learning examples we have looked at, we treated each traveler separately. If we want to know which travelers are going to return their space vehicles the earliest, our aim then is to pick the best travelers for our business. But if you think about it, the behavior of one traveler affects the experience of the others as well. We only allow space vehicles to stay up to 20 years in space. However, we haven't explored the effect of allowing some travelers to stay longer or the effect of having a stricter rent period for other travelers. Reinforcement learning is the answer to that, where the key to it is exploration and exploitation.
Rather than dealing with each action separately, we may want to explore sub-optimal actions in order to reach an overall optimum set of actions. Reinforcement learning is used in robotics, where a robot has a goal and it can only reach it through a sequence of steps—2 steps to the right, 5 steps forward, and so on. We can't tell whether a right versus left step is better on its own; the whole sequence must be found to reach the best outcome. Reinforcement learning is also used in gaming, as well as in recommendation engines. If Netflix only recommended to a user what matches their taste best, a user may end up with nothing but Star Wars movies on their home screen. Reinforcement learning is then needed to explore less-optimum matches to enrich the user's overall experience.