
What is classification?

Classification is one of the most common applications of data mining, both in practice and in research. As before, we have a set of samples that represent the objects or things we are interested in classifying. We also have a new array, the class values, which assigns each sample to a category. Some examples are as follows:

  • Determining the species of a plant by looking at its measurements. The class value here would be: Which species is this?
  • Determining if an image contains a dog. The class would be: Is there a dog in this image?
  • Determining if a patient has cancer, based on the results of a specific test. The class would be: Does this patient have cancer?

While many of the previous examples are binary (yes/no) questions, they do not have to be, as in the case of plant species classification in this section.
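To make the pairing of samples and class values concrete, here is a minimal sketch in plain Python. The measurements, the species names, and the variable names (samples, species, is_setosa) are all invented for illustration and are not taken from a real dataset.

```python
# Hypothetical plant measurements: each sample is
# (petal length in cm, petal width in cm). Values are made up.
samples = [
    (1.4, 0.2),
    (4.7, 1.4),
    (6.0, 2.5),
    (1.3, 0.2),
]

# One class value per sample, aligned by position.
# With three possible species, this is a multiclass problem.
species = ["setosa", "versicolor", "virginica", "setosa"]

# A binary (yes/no) question reduces the same labels to two values.
is_setosa = [label == "setosa" for label in species]

for sample, label, flag in zip(samples, species, is_setosa):
    print(sample, label, flag)
```

The only structural requirement shown here is that the class array has exactly one entry per sample.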

The goal of classification applications is to train a model on a set of samples with known classes and then apply that model to new, unseen samples whose classes are unknown. For example, I might train a spam classifier on my past emails, which I have labeled as spam or not spam. I could then use that classifier to determine whether my next email is spam, without needing to classify it myself.
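The train-then-apply workflow can be sketched with a very small spam classifier. This is only an illustration under stated assumptions: it uses a simple word-count naive Bayes approach with add-one smoothing, which the text does not prescribe, and the function names (train, predict) and the example emails are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(emails, labels):
    """Count words per class from emails whose classes are known."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    for text, label in zip(emails, labels):
        word_counts[label].update(text.lower().split())
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    return word_counts, class_counts, vocabulary


def predict(model, text):
    """Return the most probable class for a new, unseen email."""
    word_counts, class_counts, vocabulary = model
    total_emails = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # Log prior: how common this class was in the training set.
        score = math.log(class_counts[label] / total_emails)
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one smoothing avoids a zero probability for a word
            # that appeared only in the other class.
            count = word_counts[label][word] + 1
            score += math.log(count / (n_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up training data: past emails with known classes.
past_emails = [
    "win free money now",
    "free prize claim now",
    "meeting agenda for monday",
    "lunch on monday with the team",
]
past_labels = ["spam", "spam", "not spam", "not spam"]

model = train(past_emails, past_labels)

# Apply the trained model to emails it has not seen before.
print(predict(model, "claim your free prize"))
print(predict(model, "agenda for the team meeting"))
```

The split into a training step and a prediction step mirrors the description above: the labeled emails are used once to build the model, and every later email is classified without manual labeling.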
