- Effective Amazon Machine Learning
- Alexis Perrier
- 426 words
- 2021-07-03 00:17:48
Extracting features to predict outcomes
The available data must be accessible and meaningful for the algorithm to extract information from it.
Let's consider a simple example. Imagine that we want to predict the market price of a house in a given city. Many variables could serve as predictors of that price: the number of rooms or bathrooms, the neighborhood, the surface area, the heating system, and so on. These variables are called features, attributes, or predictors. The value that we want to predict is called the outcome or the target.
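To make the vocabulary concrete, here is a minimal sketch of how such a dataset could be split into features and a target. The records, field names, and prices are hypothetical and made up for illustration only; they do not come from a real dataset.

```python
import numpy as np

# Hypothetical house records; every value here is invented for illustration.
houses = [
    {"rooms": 3, "bathrooms": 1, "surface_m2": 70.0, "price": 210000.0},
    {"rooms": 5, "bathrooms": 2, "surface_m2": 120.0, "price": 365000.0},
    {"rooms": 2, "bathrooms": 1, "surface_m2": 45.0, "price": 150000.0},
]

# The features (attributes, predictors) are the inputs of the model.
feature_names = ["rooms", "bathrooms", "surface_m2"]
X = np.array([[h[name] for name in feature_names] for h in houses], dtype=float)

# The target (outcome) is the value we want to predict.
y = np.array([h["price"] for h in houses], dtype=float)

print(X.shape, y.shape)
```

Each row of `X` describes one house, each column holds one feature, and `y` holds the matching prices.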
If we want our predictions to be reliable, we need several features. Predicting the price of a house based on its surface area alone would not be very accurate. Many other factors influence the price of a house, and our dataset should include as many of them as possible, subject to the conditions discussed below.
It's often possible to add large numbers of attributes to a model to try to improve the predictions. For instance, in our house price prediction, we could add all the characteristics of the house (bathrooms, surface area, heating system, number of windows). Some of these variables would bring more information to our pricing model and increase the accuracy of our predictions, while others would just add noise and confuse the algorithm. Adding new variables to a predictive model does not always improve the predictions.
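The effect of uninformative variables can be sketched with a small simulation. The example below is only an illustration under stated assumptions: the pricing formula, its coefficients, and the helper names (`make_houses`, `compare`) are hypothetical, and ordinary least squares from NumPy stands in for whatever model is actually used. It fits one model on two informative features and another on the same features plus 20 columns of pure random noise, then averages the errors over several simulated datasets.

```python
import numpy as np

def make_houses(n, n_noise, rng):
    """Simulate n houses. The pricing formula is an assumption for this sketch."""
    surface = rng.uniform(30, 200, size=n)              # square meters
    rooms = rng.integers(1, 7, size=n).astype(float)    # 1 to 6 rooms
    price = 2500 * surface + 15000 * rooms + rng.normal(0, 20000, size=n)
    informative = np.column_stack([np.ones(n), surface, rooms])
    noise = rng.normal(size=(n, n_noise))                # unrelated to the price
    return informative, noise, price

def rmse(X, w, y):
    """Root mean squared error of the linear model w on (X, y)."""
    return float(np.sqrt(np.mean((X @ w - y) ** 2)))

def compare(n_train=30, n_test=1000, n_noise=20, n_trials=50, seed=0):
    """Average train/test RMSE with and without the noise features."""
    rng = np.random.default_rng(seed)
    totals = np.zeros(4)
    for _ in range(n_trials):
        inf_tr, noise_tr, y_tr = make_houses(n_train, n_noise, rng)
        inf_te, noise_te, y_te = make_houses(n_test, n_noise, rng)
        all_tr = np.hstack([inf_tr, noise_tr])
        all_te = np.hstack([inf_te, noise_te])
        w_small = np.linalg.lstsq(inf_tr, y_tr, rcond=None)[0]
        w_big = np.linalg.lstsq(all_tr, y_tr, rcond=None)[0]
        totals += np.array([
            rmse(inf_tr, w_small, y_tr), rmse(all_tr, w_big, y_tr),
            rmse(inf_te, w_small, y_te), rmse(all_te, w_big, y_te),
        ])
    return totals / n_trials

train_small, train_big, test_small, test_big = compare()
print(f"train RMSE: informative only {train_small:.0f}, with noise {train_big:.0f}")
print(f"test RMSE:  informative only {test_small:.0f}, with noise {test_big:.0f}")
```

In this setup the model with noise features fits the training houses more closely but predicts unseen houses worse, which is the sense in which extra variables can "confuse the algorithm".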
To make reliable predictions, each new feature you add to your model must contribute some valuable information. However, even informative features do not always help. As we will see in Chapter 2, Machine Learning Definitions and Concepts, correlated predictors can hurt the performance of the model.
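One way correlated predictors can cause trouble is by making the fitted coefficients unstable. The following sketch illustrates this for ordinary least squares only; it is not taken from the book, and the pricing formula and the function name `surface_coef_spread` are assumptions. It repeatedly fits a price model on simulated data, once with the surface area alone and once with an additional, nearly identical second measurement of the surface area, and reports how much the surface coefficient varies from one dataset to the next.

```python
import numpy as np

def surface_coef_spread(add_correlated, n=50, n_trials=200, seed=0):
    """Standard deviation of the fitted surface coefficient across simulated datasets."""
    rng = np.random.default_rng(seed)
    coefs = []
    for _ in range(n_trials):
        surface = rng.uniform(30, 200, size=n)
        # Assumed pricing rule: 2500 per square meter plus random noise.
        price = 2500 * surface + rng.normal(0, 20000, size=n)
        columns = [np.ones(n), surface]
        if add_correlated:
            # Hypothetical second measurement of the same quantity,
            # almost perfectly correlated with `surface`.
            columns.append(surface + rng.normal(0, 1.0, size=n))
        X = np.column_stack(columns)
        w = np.linalg.lstsq(X, price, rcond=None)[0]
        coefs.append(w[1])  # coefficient attached to `surface`
    return float(np.std(coefs))

spread_alone = surface_coef_spread(add_correlated=False)
spread_correlated = surface_coef_spread(add_correlated=True)
print(f"spread of surface coefficient, alone: {spread_alone:.1f}")
print(f"spread of surface coefficient, with correlated copy: {spread_correlated:.1f}")
```

With the near-duplicate column present, the model can no longer tell which of the two columns deserves the weight, so the individual coefficient swings far more between datasets.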
Predictive analytics is built on several assumptions and conditions:
- The value you are trying to predict is predictable and not just some random noise.
- You have access to data that has some degree of association with the target.
- The available dataset is large enough. Reliable predictions cannot be inferred from a dataset that is too small. (For instance, you can define and therefore predict a line with two points but you cannot infer data that follows a sine curve from only two points.)
- The new data on which you will base future predictions is similar to the data on which you tuned and trained your model.
You may have a great dataset, but that does not mean it will be useful for making predictions.
These conditions on the data are very general. In the case of stochastic gradient descent (SGD), the conditions are more restrictive.
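The dataset-size condition listed above, with its line and sine curve example, can be sketched numerically. This is a minimal illustration, assuming NumPy's `polyfit` as the fitting method; the sample points, the polynomial degree, and the helper name `fit_and_predict` are arbitrary choices made for this sketch.

```python
import numpy as np

def fit_and_predict(x_train, y_train, x_new, degree=1):
    """Fit a polynomial of the given degree and evaluate it at x_new."""
    coeffs = np.polyfit(x_train, y_train, degree)
    return float(np.polyval(coeffs, x_new))

# Case 1: two points taken from the line y = 2x + 1 are enough to recover it.
x_line = np.array([1.0, 3.0])
line_pred = fit_and_predict(x_line, 2 * x_line + 1, x_new=10.0)

# Case 2: two points taken from a sine curve (at x = 0 and x = pi, where the
# sine is zero) look like a flat line, so the peak at pi / 2 is missed.
x_two = np.array([0.0, np.pi])
sine_pred_two = fit_and_predict(x_two, np.sin(x_two), x_new=np.pi / 2)

# Case 3: with 50 points and a more flexible model (degree 5 is an arbitrary
# choice), the curve between 0 and pi is approximated closely.
x_many = np.linspace(0.0, np.pi, 50)
sine_pred_many = fit_and_predict(x_many, np.sin(x_many), x_new=np.pi / 2, degree=5)

print("line, 2 points, prediction at x=10 (true value 21):", line_pred)
print("sine, 2 points, prediction at x=pi/2 (true value 1):", sine_pred_two)
print("sine, 50 points, prediction at x=pi/2 (true value 1):", sine_pred_many)
```

The two-point fit of the sine curve is not wrong about the points it saw; it simply has no information about what happens between them.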