  • Machine Learning with Swift
  • Alexander Sosnovshchenko
  • 2021-06-24 18:54:57

Evaluating accuracy

The score function calculates the accuracy of the model on the data it is given. Let's calculate the accuracy of our model on the training set:

In []: 
tree_model.score(X_train, y_train) 
Out[]: 
1.0 
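To make the number above concrete, here is a minimal sketch of what an accuracy score measures: the fraction of predictions that match the true labels. It assumes that `tree_model.score` follows the usual classifier convention of returning mean accuracy. The `accuracy` helper and the toy labels are hypothetical, made up for illustration only.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy of an empty sequence")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels, not taken from the dataset used in this chapter.
y_true = [0, 1, 0, 0]
y_pred = [0, 1, 1, 0]

print(accuracy(y_true, y_pred))  # 3 of 4 predictions match: 0.75
```

A score of 1.0 therefore means that every single prediction matched its label.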

Wow, it looks like our model is 100% accurate. Isn't that a great result? Let's not hurry, and check our model on held-out data first. Evaluation on the test set is the gold standard of success in machine learning:

In []: 
tree_model.score(X_test, y_test) 
Out[]: 
0.87666666666666671 

It got worse. What just happened? This is our first encounter with the problem of overfitting, where the model tries to fit itself to every quirk in the data. Our model adjusted itself to the training data so closely that it lacks the ability to generalize to previously unseen data. Because any real-world data contains both signal and noise, we want our models to fit the signal and ignore the noise. Overfitting is one of the most common problems in machine learning. It typically occurs when datasets are too small or models are too flexible. The opposite situation, when the model is not able to fit complex data well enough, is called underfitting:

Figure 2.6: Underfitting (left column) versus good fit (central column) versus overfitting (right column). The top row shows a classification problem, the bottom row shows a regression problem.
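The regression case can be approximated numerically. The following is a rough sketch, not the code behind the figure: it fits polynomials of increasing degree to a small noisy sample and compares the error on the training points with the error on fresh points. The sine target, the noise level, the sample sizes, and the degrees are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_data(n):
    """Sample n noisy points from an assumed target: signal sin(3x) plus noise."""
    x = rng.uniform(-1, 1, n)
    y = np.sin(3 * x) + rng.normal(0, 0.2, n)
    return x, y


def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on the points (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


x_train, y_train = make_data(20)   # small training set
x_test, y_test = make_data(200)    # held-out data from the same source

results = {}
for degree in (1, 3, 12):          # too rigid, moderate, very flexible
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = mse(coeffs, x_train, y_train)
    test_err = mse(coeffs, x_test, y_test)
    results[degree] = (train_err, test_err)
    print(f"degree {degree:2d}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```

The training error can only go down as the degree grows, because a higher-degree polynomial can always reproduce a lower-degree one. The test error is the number to watch: a flexible model that chases the noise in 20 points will usually do worse on new data than a moderate one.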

The overfitting problem is familiar to anyone who has looked at an item in an online store and was then shown targeted advertisements for that same item everywhere on the internet. The item is most likely no longer relevant, but the machine learning algorithm has already overfitted to the limited dataset, and now you have trinket rabbits (or whatever you looked at in the e-store) on every page you open.

In any case, we must fight overfitting somehow. So, what can we do? The simplest solution is to make the model simpler and less flexible (or, in machine learning terms, to reduce the model's capacity).
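One possible way to reduce capacity, sketched here under the assumption that the model is a scikit-learn `DecisionTreeClassifier`, is to cap the depth of the tree. The synthetic dataset from `make_classification` and the depth limit of 3 are placeholders chosen for illustration, not the data or the settings used in this chapter.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with 10% of the labels flipped to act as noise.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=4,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Unrestricted tree: grows until every training sample is classified correctly.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Restricted tree: at most 3 levels of splits, so it cannot memorize the noise.
shallow_tree = DecisionTreeClassifier(max_depth=3, random_state=0)
shallow_tree.fit(X_train, y_train)

for name, model in (("full", full_tree), ("shallow", shallow_tree)):
    print(name,
          "depth:", model.get_depth(),
          "train:", round(model.score(X_train, y_train), 3),
          "test:", round(model.score(X_test, y_test), 3))
```

The shallow tree gives up its perfect training score. Whether its test score improves depends on the data, so the depth limit is a setting to tune rather than a fixed recipe.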
