- Effective Amazon Machine Learning
- Alexis Perrier
Extracting features to predict outcomes
The available data needs to be accessible and meaningful in order for the algorithm to extract information from it.
Let's consider a simple example. Imagine that we want to predict the market price of a house in a given city. We can think of many variables that would be predictors of the price of a house: the number of rooms or bathrooms, the neighborhood, the surface area, the heating system, and so on. These variables are called features, attributes, or predictors. The value that we want to predict is called the outcome or the target.
If we want our predictions to be reliable, we need several features. Predicting the price of a house based on its surface area alone would not be very accurate. Many other factors influence the price of a house, and our dataset should include as many of them as possible (with some conditions).
It's often possible to add a large number of attributes to a model in an attempt to improve the predictions. For instance, in our housing price prediction, we could add all the characteristics of the house (bathrooms, surface area, heating system, the number of windows, and so on). Some of these variables would bring more information to our pricing model and increase the accuracy of our predictions, while others would just add noise and confuse the algorithm. Adding new variables to a predictive model does not always improve the predictions.
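This effect can be illustrated with a small sketch on synthetic data (the housing figures and the least-squares fit below are illustrative assumptions, not from the book): a feature made of pure random noise barely changes the quality of the fit, because it carries no information about the price.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic housing data (illustrative): price depends on
# surface area and number of rooms, plus unexplained noise.
surface = rng.uniform(50, 250, n)            # m^2
rooms = rng.integers(1, 7, n).astype(float)
price = 2000 * surface + 15000 * rooms + rng.normal(0, 20000, n)

noise_feature = rng.normal(0, 1, n)          # carries no information

def fit_rmse(features):
    """Ordinary least-squares fit; return the in-sample RMSE."""
    X = np.column_stack(features + [np.ones(n)])  # add an intercept
    coef, *_ = np.linalg.lstsq(X, price, rcond=None)
    resid = price - X @ coef
    return np.sqrt(np.mean(resid ** 2))

rmse_informative = fit_rmse([surface, rooms])
rmse_with_noise = fit_rmse([surface, rooms, noise_feature])

# The noise feature leaves the error essentially unchanged:
# it adds a column to the model but no information.
print(rmse_informative, rmse_with_noise)
```

On real data, such uninformative features can actively hurt out-of-sample predictions by letting the model fit noise in the training set.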
In order to make reliable predictions, each new feature you bring to your model must contribute some valuable piece of information. However, this is not always the case. As we will see in Chapter 2, Machine Learning Definitions and Concepts, correlated predictors can hurt the performance of the model.
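One way to see why correlated predictors are a problem is through the condition number of the feature matrix (a minimal sketch with made-up numbers; the near-duplicate "surface" column is an assumption for illustration). When two columns are almost copies of each other, the condition number explodes, which means the fitted coefficients become very sensitive to small changes in the data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
surface = rng.uniform(50, 250, n)
# A second predictor that is almost a copy of the first,
# e.g. the same surface measured a slightly different way.
surface_dup = surface + rng.normal(0, 0.1, n)

# Two roughly independent predictors vs. two highly correlated ones.
X_indep = np.column_stack([surface, rng.uniform(1, 6, n)])
X_corr = np.column_stack([surface, surface_dup])

# A large condition number signals that small perturbations in the
# data can produce large swings in the estimated coefficients.
print(np.linalg.cond(X_indep), np.linalg.cond(X_corr))
```

The predictions themselves may still look reasonable, but the individual coefficients become unstable and hard to interpret, which is one of the ways correlated predictors degrade a model.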
Predictive analytics is built on several assumptions and conditions:
- The value you are trying to predict is predictable and not just some random noise.
- You have access to data that has some degree of association to the target.
- The available dataset is large enough. Reliable predictions cannot be inferred from a dataset that is too small. (For instance, you can define and therefore predict a line with two points but you cannot infer data that follows a sine curve from only two points.)
- The new data you will base future predictions on is similar to the data you parameterized and trained your model on.
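The dataset-size condition can be made concrete with the line-versus-sine example from the list above (a minimal sketch; the sample points are chosen for illustration): two points determine a unique line, but that line can be completely wrong about the sine curve they came from.

```python
import math

# Two samples from y = sin(x): enough to define a line,
# not enough to recover the sine curve behind them.
x0, x1 = 0.0, math.pi
y0, y1 = math.sin(x0), math.sin(x1)   # both are 0.0

# The unique line through these two points is y = 0.
slope = (y1 - y0) / (x1 - x0)
intercept = y0 - slope * x0

# At x = pi/2 the line predicts 0, while the true curve gives 1.
x_mid = math.pi / 2
line_pred = slope * x_mid + intercept
true_val = math.sin(x_mid)
print(line_pred, true_val)
```

With only two observations the linear model fits the training points perfectly yet misses the underlying pattern entirely; more data is needed before the sine shape becomes inferable.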
You may have a great dataset, but that does not mean it will be effective for predictions.
These conditions on the data are very general. In the case of SGD, the conditions are stricter.