
One-hot encoding

The one-of-K, or one-hot encoding, scheme uses dummy variables to encode categorical features. It was originally applied to digital circuits. Like bits, the dummy variables are binary, so they take the values zero or one (equivalent to false or true). For instance, to encode continents we would have dummy variables such as is_asia, which is true if the continent is Asia and false otherwise. In general, we need as many dummy variables as there are unique labels, minus one. The remaining label can be determined automatically from the dummy variables because the dummy variables are mutually exclusive: if all of them are false, the correct label is the one for which we don't have a dummy variable. The following table illustrates the encoding for continents:
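As an illustration, here is a minimal pure-Python sketch of the scheme just described. It uses K - 1 dummy variables for K unique labels and recovers the label that has no dummy variable when a row is all zeros. The continent list is hypothetical example data. The helper names `dummy_encode` and `dummy_decode` are also hypothetical, as is the choice of the alphabetically first label as the implied one.

```python
# A minimal sketch of one-hot (one-of-K) encoding with K - 1 dummy variables.
# The continent labels and helper names are hypothetical, for illustration only.

def dummy_encode(values):
    """Encode categorical values as rows of 0/1 dummy variables.

    The alphabetically first label gets no dummy variable (an assumption of
    this sketch); it is implied whenever every dummy in a row is 0.
    """
    labels = sorted(set(values))
    baseline, dummy_labels = labels[0], labels[1:]
    columns = ["is_" + label.lower().replace(" ", "_") for label in dummy_labels]
    # Exactly one dummy is 1 per row, or none for the baseline label.
    rows = [[1 if value == label else 0 for label in dummy_labels] for value in values]
    return baseline, dummy_labels, columns, rows


def dummy_decode(row, baseline, dummy_labels):
    """Recover the label from a row of dummies; all zeros means the baseline."""
    for label, bit in zip(dummy_labels, row):
        if bit == 1:
            return label
    return baseline


continents = ["Asia", "Europe", "Africa", "Asia", "Oceania"]
baseline, dummy_labels, columns, rows = dummy_encode(continents)
print(columns)  # ['is_asia', 'is_europe', 'is_oceania']
for continent, row in zip(continents, rows):
    print(continent, row)
```

With four unique labels, only three dummy variables are needed. The "Africa" row is all zeros, and `dummy_decode` maps it back to the implied label.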

The encoding produces a matrix (a grid of numbers) with lots of zeros (false values) and occasional ones (true values). A matrix like this, consisting mostly of zeros, is called a sparse matrix. The SciPy package handles sparse matrices efficiently by storing only the nonzero entries, so the many zeros shouldn't be an issue. We will discuss the SciPy package later in this chapter.
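As a rough illustration of why the zeros are cheap, the sketch below stores a K - 1 dummy-variable matrix in SciPy's compressed sparse row format (`scipy.sparse.csr_matrix`). Only the coordinates of the ones are recorded. It assumes NumPy and SciPy are installed. The continent data is hypothetical, and so is the choice of "Africa" as the implied label.

```python
# A minimal sketch: storing a dummy-variable matrix as a SciPy sparse matrix.
# Assumes NumPy and SciPy are installed; the continent data is hypothetical.
import numpy as np
from scipy.sparse import csr_matrix

continents = ["Asia", "Europe", "Africa", "Asia", "Oceania", "Europe"]
labels = sorted(set(continents))   # ['Africa', 'Asia', 'Europe', 'Oceania']
dummy_labels = labels[1:]          # 'Africa' is the implied label (no dummy)
col_index = {label: j for j, label in enumerate(dummy_labels)}

# Record only the (row, column) positions of the ones; every other cell is
# implicitly zero and is not stored at all.
row_ids, col_ids = [], []
for i, continent in enumerate(continents):
    if continent in col_index:
        row_ids.append(i)
        col_ids.append(col_index[continent])
data = np.ones(len(row_ids), dtype=np.int8)

X = csr_matrix((data, (row_ids, col_ids)),
               shape=(len(continents), len(dummy_labels)))

print(X.shape)      # (6, 3)
print(X.nnz)        # 5 stored values out of 18 cells
print(X.toarray())  # dense view, for inspection only
```

In this sketch, the matrix has 18 cells but stores only 5 values. With many rows and many categories, the savings grow accordingly. Converting to a dense array with `toarray()` is done here only to inspect the result.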
