Types of features

In the books example, you can see several types of features:

  • Categorical or unordered: Title, author, genre, publisher. They are similar to a Swift enumeration without raw values, with one difference: they have levels instead of cases. Importantly, the levels have no order, so you can't sort them or say that one is bigger than another.
  • Binary: The presence or absence of something, just true or false. In our case, the In stock feature.
  • Real numbers: Page count, year, average reader review score. These can be represented as Float or Double.

There are others, but these are by far the most common.
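
As a rough illustration of the three feature types above, one book could be modeled in Swift as follows. This is only a sketch: the names Genre, Book, and the property names are hypothetical, and the values are made up for the example.

```swift
// Hypothetical types for illustration only.

// Categorical feature: an enum without raw values.
// Its cases play the role of levels.
enum Genre {
    case fiction
    case biography
    case science
}

// One sample from the books example, one property per feature.
struct Book {
    let title: String          // categorical
    let genre: Genre           // categorical
    let inStock: Bool          // binary
    let pageCount: Double      // real number
    let year: Double           // real number
    let averageRating: Double  // real number
}

// Made-up values, not taken from any real dataset.
let book = Book(title: "Example Title", genre: .science, inStock: true,
                pageCount: 320, year: 2015, averageRating: 4.2)

// Levels can be compared for equality, but not ordered:
// `Genre.fiction < Genre.science` does not compile,
// because Genre does not conform to Comparable.
let isScience = book.genre == .science
```

The compiler enforces the "unordered" property here: equality checks on Genre work out of the box, while ordering comparisons are rejected unless you add them deliberately.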

The most common ML algorithms require the dataset to consist of a number of samples, where each sample is represented by a vector of real numbers (a feature vector), and all samples have the same number of features. The simplest way of translating categorical features into real numbers is to replace each level with a numerical code (Table 1.2). It is not the best way, because the codes suggest an order that the categories themselves don't have.
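
The simple encoding described above can be sketched in Swift as shown below. The helper encodeLevels and the sample values are assumptions made for this example, not code or data from the book; the helper just assigns each distinct level an integer code in order of first appearance.

```swift
// Hypothetical helper: maps each distinct level to a numerical code,
// numbering levels in the order they first appear.
func encodeLevels(_ values: [String]) -> (codes: [Double], mapping: [String: Double]) {
    var mapping: [String: Double] = [:]
    var codes: [Double] = []
    for value in values {
        if let code = mapping[value] {
            codes.append(code)
        } else {
            let code = Double(mapping.count)
            mapping[value] = code
            codes.append(code)
        }
    }
    return (codes, mapping)
}

// Made-up columns for four books.
let genres = ["fiction", "science", "fiction", "biography"]
let inStock = [true, false, true, true]
let pageCounts: [Double] = [250, 320, 180, 410]

let genreCodes = encodeLevels(genres).codes  // [0.0, 1.0, 0.0, 2.0]

// One feature vector per sample; every vector has the same length.
let featureVectors: [[Double]] = (0..<genres.count).map { i in
    [genreCodes[i], inStock[i] ? 1.0 : 0.0, pageCounts[i]]
}
```

Note that the resulting codes make "science" (1) look as if it sits between "fiction" (0) and "biography" (2), which is an artifact of the encoding rather than a property of the data.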

Table 1.2: dummy books dataset after simple preprocessing:

This is an example of how your dataset may look before you feed it into your ML algorithm. Later, we will discuss the nuts and bolts of data preprocessing for specific applications.