Types of features

In the books example, you can see several types of features:

  • Categorical or unordered: Title, author, genre, publisher. These resemble Swift enumerations without raw values, with one difference in terminology: their possible values are called levels instead of cases. Importantly, levels have no order, so you can't say that one is bigger than another.
  • Binary: The presence or absence of something, just true or false. In our example, this is the In stock feature.
  • Real numbers: Page count, year, average reader review score. These can be represented as Float or Double.

There are others, but these are by far the most common.
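
To make the three feature types concrete, here is a minimal Swift sketch of how one book record might be modeled. The names Genre and Book, the genre levels, and the sample values are hypothetical and chosen only for illustration; they are not taken from the books dataset.

```swift
// Hypothetical types for illustration only; none of these names come from the dataset.

// Categorical feature: an enumeration without raw values.
// Its cases play the role of levels and carry no inherent order.
enum Genre: CaseIterable {
    case fiction, science, history
}

struct Book {
    let title: String          // categorical
    let genre: Genre           // categorical
    let inStock: Bool          // binary
    let pageCount: Double      // real number
    let year: Double           // real number
    let averageReview: Double  // real number
}

// Placeholder values, not real data.
let book = Book(title: "Example Title",
                genre: .science,
                inStock: true,
                pageCount: 320,
                year: 2017,
                averageReview: 4.5)
```

Because Genre does not conform to Comparable, an expression such as Genre.fiction < Genre.science does not compile, which mirrors the rule that categorical levels cannot be ordered.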

The most common ML algorithms require the dataset to consist of a number of samples, where each sample is represented by a vector of real numbers (feature vector), and all samples have the same number of features. The simplest (but not the best) way of translating categorical features into real numbers is by replacing them with numerical codes (Table 1.2).
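
The code-replacement idea can be sketched as follows. This is one possible approach, not a definitive implementation: the helper encodeLevels and all sample values are assumptions made for illustration, and codes are assigned in order of first appearance.

```swift
// Hypothetical helper: assigns each distinct level a numeric code
// in order of first appearance (0, 1, 2, ...).
func encodeLevels(_ values: [String]) -> (codes: [Double], mapping: [String: Double]) {
    var mapping: [String: Double] = [:]
    var codes: [Double] = []
    for value in values {
        if let code = mapping[value] {
            // Level seen before: reuse its code.
            codes.append(code)
        } else {
            // New level: the next free code is the current mapping size.
            let code = Double(mapping.count)
            mapping[value] = code
            codes.append(code)
        }
    }
    return (codes, mapping)
}

// Placeholder columns, not real data.
let genres = ["fiction", "science", "fiction", "history"]
let inStock = [true, false, true, true]
let pageCounts: [Double] = [310, 220, 150, 480]

let genreCodes = encodeLevels(genres).codes

// One feature vector per sample; every vector has the same length.
let featureVectors: [[Double]] = genres.indices.map { i in
    [genreCodes[i], inStock[i] ? 1.0 : 0.0, pageCounts[i]]
}
```

Every sample ends up as a vector of three real numbers. The drawback of this encoding is that the codes introduce an artificial order (here, history > science > fiction) that the original categorical feature does not have, which is one reason it is the simplest way but not the best.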

Table 1.2: dummy books dataset after simple preprocessing:

This is an example of how your dataset may look before you feed it into your ML algorithm. Later, we will discuss the nuts and bolts of data preprocessing for specific applications.
