Types of features

In the books example, you can see several types of features:

  • Categorical or unordered: Title, author, genre, publisher. They are similar to enumerations without raw values in Swift, with one difference: they have levels instead of cases. Important: you can't order them or say that one is bigger than another.
  • Binary: The presence or absence of something, just true or false. In our case, this is the In stock feature.
  • Real numbers: Page count, year, average reader review score. These can be represented as Float or Double.

There are others, but these are by far the most common.
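As a rough illustration, the three feature types above could be modeled in Swift as follows. This is only a sketch: the `Genre` levels, the `Book` type, and the sample values are hypothetical and are not taken from the book's dataset.

```swift
// Hypothetical categorical feature: its levels have no inherent order.
enum Genre: CaseIterable {
    case fiction, science, history
}

// Hypothetical record holding one feature of each kind.
struct Book {
    let title: String         // categorical
    let genre: Genre          // categorical
    let inStock: Bool         // binary
    let pageCount: Double     // real number
    let averageScore: Double  // real number
}

// Made-up sample values, for illustration only.
let book = Book(title: "Example Title",
                genre: .science,
                inStock: true,
                pageCount: 320,
                averageScore: 4.5)

// Levels can be compared for equality, but not for order.
print(book.genre == .science)
```

An enumeration like `Genre` gets equality for free, but an expression such as `Genre.fiction < Genre.science` does not compile unless the type explicitly adopts `Comparable`. This mirrors the point that categorical levels cannot be ordered.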

The most common ML algorithms require the dataset to consist of a number of samples, where each sample is represented by a vector of real numbers (a feature vector), and all samples have the same number of features. The simplest way of translating categorical features into real numbers is to replace them with numerical codes (Table 1.2). It is not the best way, because the codes suggest an order among levels that does not actually exist.

Table 1.2: dummy books dataset after simple preprocessing:

This is an example of how your dataset may look before you feed it into your ML algorithm. Later, we will discuss the nuts and bolts of data preprocessing for specific applications.
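The simple preprocessing described above, replacing categorical levels with numerical codes, can be sketched in Swift as follows. The `RawBook` type, the helper functions, and the sample rows are all hypothetical; the values are made up for illustration and do not reproduce Table 1.2.

```swift
// Hypothetical raw record before preprocessing.
struct RawBook {
    let genre: String
    let inStock: Bool
    let pageCount: Int
    let averageScore: Double
}

// Assigns each distinct level a numerical code in order of first appearance.
func makeCodes(for levels: [String]) -> [String: Double] {
    var codes: [String: Double] = [:]
    for level in levels {
        if codes[level] == nil {
            let nextCode = Double(codes.count)
            codes[level] = nextCode
        }
    }
    return codes
}

// Turns one record into a feature vector of real numbers.
func featureVector(for book: RawBook, genreCodes: [String: Double]) -> [Double] {
    return [
        genreCodes[book.genre] ?? -1,  // -1 marks an unseen level (an assumption of this sketch)
        book.inStock ? 1 : 0,          // binary feature as 1 or 0
        Double(book.pageCount),
        book.averageScore
    ]
}

// Made-up sample data, for illustration only.
let books = [
    RawBook(genre: "fiction", inStock: true, pageCount: 320, averageScore: 4.5),
    RawBook(genre: "science", inStock: false, pageCount: 210, averageScore: 3.9),
    RawBook(genre: "fiction", inStock: true, pageCount: 150, averageScore: 4.1)
]

let genreCodes = makeCodes(for: books.map { $0.genre })
let dataset = books.map { featureVector(for: $0, genreCodes: genreCodes) }
print(dataset)
```

Every row of `dataset` has the same length, which is the property most algorithms expect. Note that the codes depend only on the order in which levels first appear, so a larger code carries no meaning of its own.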
