What is a good model?

A model is a set of values that describes the world, together with the program that produces those values. That much we concluded in the previous section. Now we have to pass judgment on models: what makes a model good or bad?

A good model needs to describe the world accurately. Stated this generically, the claim covers many different notions, so we will have to make the idea more concrete before we can proceed.

A machine learning algorithm is trained on a bunch of data. To the machine, this data is the world. But to us, the data that we feed the machine for training is not the world. To us humans, there is much more to the world than what the machine may know about. So when I say "a good model needs to describe the world accurately", there are two senses of the word "world" that apply - the world as the machine knows it, and the world as we know it.

The machine has only seen portions of the world as we know it. There are parts of the world the machine has not seen. A good machine learning model, then, is one that provides the correct outputs for inputs it has not yet seen.

As a concrete example, let's once again suppose that we have a machine learning algorithm that determines whether or not an image is that of a hot dog. We feed the model images of hot dogs and hamburgers. To the machine, the world is simply images of hot dogs and hamburgers. What happens when we pass in an image of vegetables as input? A good model would be able to generalize and say it's not a hot dog. A poor model would simply crash.

And thus, with this example, we have defined a good model to be one that generalizes well to unknown situations.

Often, as part of the process of building machine learning systems, we want to put this notion to the test. To do so, we split our dataset into a training dataset and a testing dataset. The machine is trained on the training dataset, and once training has completed, we feed it the testing dataset to see how good the model is. It's assumed, of course, that the machine has never seen the testing dataset. A good machine learning model would hence be able to generate the correct output for the testing dataset, despite never having seen it.
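The splitting procedure described above can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the toy dataset, the helper names (`train_test_split`, `fit_threshold`, `accuracy`), the 25% test fraction, and the trivial "threshold" model are all hypothetical stand-ins, not the method used elsewhere in this book.

```python
import random


def train_test_split(samples, test_fraction=0.25, seed=0):
    """Shuffle a copy of `samples` and return (train, test) lists."""
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    shuffled = list(samples)       # copy, so the caller's data is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_threshold(train):
    """'Train' a toy model: choose a cut-off halfway between the largest
    negative example and the smallest positive example seen in training."""
    largest_negative = max(x for x, label in train if not label)
    smallest_positive = min(x for x, label in train if label)
    return (largest_negative + smallest_positive) / 2


def accuracy(predict, labelled):
    """Fraction of (input, label) pairs that `predict` gets right."""
    correct = sum(1 for x, label in labelled if predict(x) == label)
    return correct / len(labelled)


# Hypothetical toy world: a number is labelled True when it is greater than 5.
data = [(x, x > 5) for x in range(20)]

train, test = train_test_split(data, test_fraction=0.25, seed=42)

# The model only ever sees the training portion.
threshold = fit_threshold(train)


def predict(x):
    return x > threshold


# Evaluate on examples the model has never seen.
test_accuracy = accuracy(predict, test)
print(f"threshold={threshold}, test accuracy={test_accuracy:.2f}")
```

The important design point is that `fit_threshold` receives only `train`; the `test` list is touched only by `accuracy`. If the test accuracy is much lower than the training accuracy, the model has described the machine's world well but not ours.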
