What is classification?

Classification is one of the most common applications of data mining, both in practice and in research. As before, we have a set of samples that represent the objects or things we are interested in classifying. We also have a new array, the class values, which assign each sample to a category. Some examples are as follows:

  • Determining the species of a plant by looking at its measurements. The class value here would be: Which species is this?
  • Determining if an image contains a dog. The class would be: Is there a dog in this image?
  • Determining if a patient has cancer, based on the results of a specific test. The class would be: Does this patient have cancer?

While many of the preceding examples are binary (yes/no) questions, they do not have to be, as in the case of plant species classification in this section.
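
As a rough illustration of how samples and class values line up, here is a minimal sketch using plain Python lists. The measurements and species names are made-up placeholder values chosen for illustration, not data from this section, and the variable names (`samples`, `species`, `is_setosa`) are assumptions.

```python
# Each sample is a list of measurements (hypothetical values, in cm).
samples = [
    [5.1, 3.5],
    [4.9, 3.0],
    [6.7, 3.1],
    [6.3, 2.5],
    [5.8, 2.7],
]

# One class value per sample, in the same order as `samples`.
# Three distinct values make this a multi-class problem.
species = ["setosa", "setosa", "versicolor", "virginica", "virginica"]

# A binary (yes/no) question uses the same layout with only two class values.
is_setosa = [label == "setosa" for label in species]

# Counting the distinct class values shows what kind of task we have.
n_classes = len(set(species))
print(n_classes)
print(is_setosa)
```

The only structural requirement is that the class array has exactly one entry per sample.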

The goal of a classification application is to train a model on a set of samples with known classes and then apply that model to new, unseen samples whose classes are unknown. For example, I might train a spam classifier on my past emails, which I have labeled as spam or not spam. I can then use that classifier to determine whether my next email is spam, without needing to classify it myself.
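
The train-then-apply workflow can be sketched in a few lines of plain Python. The text does not name a particular algorithm, so this example assumes a small naive Bayes word-count model with add-one smoothing. The function names `train` and `predict`, and the four example emails, are hypothetical and exist only to illustrate the idea.

```python
import math
from collections import Counter


def train(emails, labels):
    """Count words per class from emails whose classes are already known."""
    word_counts = {label: Counter() for label in sorted(set(labels))}
    class_counts = Counter(labels)
    for text, label in zip(emails, labels):
        word_counts[label].update(text.lower().split())
    return word_counts, class_counts


def predict(model, text):
    """Return the class with the highest smoothed log-probability."""
    word_counts, class_counts = model
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    total_emails = sum(class_counts.values())

    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        # Start from the prior: how common this class was in training.
        score = math.log(class_counts[label] / total_emails)
        for word in text.lower().split():
            # Add-one smoothing so an unseen word does not rule out a class.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Training: samples with known classes (made-up example emails).
past_emails = [
    "win free money now",
    "free prize claim now",
    "meeting agenda for monday",
    "lunch on monday",
]
past_labels = ["spam", "spam", "not spam", "not spam"]
model = train(past_emails, past_labels)

# Prediction: a new, unseen sample whose class is unknown.
print(predict(model, "claim your free money"))
```

The key point is the separation of the two steps: `train` sees only labeled emails, while `predict` is handed an email it has never seen and must assign the class on its own.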
