官术网_书友最值得收藏!

One hot encoding

The one-of-K or one hot encoding scheme uses dummy variables to encode categorical features. Originally, it was applied to digital circuits. The dummy variables have binary values such as bits, so they take the values zero or one (equivalent to true or false). For instance, if we want to encode continents, we'll have dummy variables, such as is_asia, which will be true if the continent is Asia and false otherwise. In general, we need as many dummy variables as there are unique labels minus one. We can determine one of the labels automatically from the dummy variables, because the dummy variables are exclusive. If the dummy variables all have a false value, then the correct label is the label for which we don't have a dummy variable. The following table illustrates the encoding for continents:

The encoding produces a matrix (grid of numbers) with lots of zeroes (false values) and occasional ones (true values). This type of matrix is called a sparse matrix. The sparse matrix representation is handled well by the the scipy package and shouldn't be an issue. We'll discuss the scipy package later in this chapter.

主站蜘蛛池模板: 白沙| 海宁市| 敦化市| 石首市| 双峰县| 沙洋县| 太仆寺旗| 霸州市| 通化县| 井研县| 呼和浩特市| 乌拉特中旗| 图们市| 山阳县| 启东市| 衡南县| 崇礼县| 龙岩市| 重庆市| 合川市| 祥云县| 弥渡县| 枣阳市| 偃师市| 枣阳市| 香港| 盈江县| 濮阳市| 和林格尔县| 墨玉县| 达拉特旗| 共和县| 依安县| 巨野县| 西宁市| 安义县| 邵阳县| 建水县| 九寨沟县| 肃北| 读书|