One hot encoding

The one-of-K, or one-hot, encoding scheme uses dummy variables to encode categorical features. It was originally applied to digital circuits. The dummy variables are binary, like bits, so they take the value zero or one (equivalent to false or true). For instance, to encode continents, we would have dummy variables such as is_asia, which is true if the continent is Asia and false otherwise. In general, we need as many dummy variables as there are unique labels minus one. Because the dummy variables are mutually exclusive, the remaining label can be determined automatically: if all the dummy variables are false, the correct label is the one for which we don't have a dummy variable. The following table illustrates the encoding for continents:

The encoding produces a matrix (a grid of numbers) with lots of zeroes (false values) and occasional ones (true values). This type of matrix is called a sparse matrix. The sparse matrix representation is handled well by the scipy package and shouldn't be an issue. We'll discuss the scipy package later in this chapter.
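
As a rough illustration of the scheme described above, the following pure-Python sketch encodes a small list of continents with one dummy variable fewer than the number of unique labels, decodes the rows again, and lists the positions of the ones. The sample data, the function names (fit_labels, encode, decode, nonzero_coordinates), and the choice of the alphabetically first label as the one without a dummy variable are all assumptions made for this example; none of them come from the text or from any library.

```python
def fit_labels(values):
    """Return the sorted unique labels; the first one gets no dummy variable."""
    return sorted(set(values))


def encode(value, labels):
    """Encode a value as len(labels) - 1 dummy variables (each 0 or 1)."""
    if value not in labels:
        raise ValueError(f"unknown label: {value!r}")
    # labels[0] has no column of its own; it is implied by a row of zeros.
    return [1 if value == label else 0 for label in labels[1:]]


def decode(row, labels):
    """Recover the label from a row of mutually exclusive dummy variables."""
    if sum(row) == 0:
        return labels[0]
    return labels[1 + row.index(1)]


def nonzero_coordinates(matrix):
    """List the (row, column) positions that hold a one."""
    return [(i, j) for i, row in enumerate(matrix) for j, v in enumerate(row) if v]


# Hypothetical sample data for the illustration.
continents = ["Asia", "Europe", "Africa", "Asia", "Oceania"]

labels = fit_labels(continents)
matrix = [encode(c, labels) for c in continents]
decoded = [decode(row, labels) for row in matrix]
coords = nonzero_coordinates(matrix)

print("columns:", ["is_" + label.lower() for label in labels[1:]])
for continent, row in zip(continents, matrix):
    print(continent, row)
print("positions of ones:", coords)
```

Each row contains at most one 1, so most entries are zero. Storing only the positions of the ones, as nonzero_coordinates does, is the basic idea that sparse matrix formats build on.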
