
One-hot-encoding

The one-of-K or one-hot-encoding scheme uses dummy variables to encode categorical features. It was originally applied to digital circuits. The dummy variables have binary values, like bits, so they take the values zero or one (equivalent to false or true). For instance, if we want to encode continents, we will have dummy variables such as is_asia, which is true if the continent is Asia and false otherwise. In general, we need as many dummy variables as there are unique labels, minus one. Because the dummy variables are mutually exclusive, the one label without a dummy variable can be determined from the others: if all the dummy variables are false, the correct label is the one for which we don't have a dummy variable. The following table illustrates the encoding for continents:
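The scheme described above can be sketched in plain Python. This is only a minimal illustration, not the book's own code: the helper name `one_hot_encode`, the `drop_first` parameter, and the sample list of continents are all assumptions made for this example. The first label in sorted order is chosen arbitrarily as the one without a dummy variable.

```python
def one_hot_encode(labels, drop_first=True):
    """Encode categorical labels as rows of 0/1 dummy variables.

    Hypothetical helper for illustration. With drop_first=True, the first
    label in sorted order gets no dummy variable, so a row of all zeros
    stands for that label.
    """
    categories = sorted(set(labels))
    kept = categories[1:] if drop_first else categories
    # One column per remaining label, named like is_asia.
    columns = ["is_" + category.lower() for category in kept]
    # Each row has at most a single 1, because the labels are exclusive.
    rows = [[1 if label == category else 0 for category in kept]
            for label in labels]
    return columns, rows


# Assumed sample data, not taken from the original table.
continents = ["Asia", "Europe", "Africa", "Asia"]
columns, rows = one_hot_encode(continents)

print(columns)
for label, row in zip(continents, rows):
    print(label, row)
```

In this sketch, three unique labels need only two dummy variables, and the row for Africa contains only zeros because Africa is the label without a dummy variable.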

The encoding produces a matrix (a grid of numbers) that consists mostly of zeros (false values) with occasional ones (true values). This type of matrix is called a sparse matrix. The SciPy package handles sparse matrix representations well, so the many zeros shouldn't be an issue. We will discuss the SciPy package later in this chapter.
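As a rough sketch of how such a matrix could be held in SciPy, the following example converts a small one-hot matrix to the compressed sparse row (CSR) format with `scipy.sparse.csr_matrix`. The sample matrix is an assumption made up for illustration, and CSR is just one of several sparse formats that SciPy provides.

```python
import numpy as np
from scipy import sparse

# Assumed one-hot matrix: four samples, three dummy variables.
dense = np.array([
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
])

# The CSR format stores only the non-zero entries and their positions.
encoded = sparse.csr_matrix(dense)

print(encoded.shape)  # dimensions of the matrix
print(encoded.nnz)    # number of explicitly stored (non-zero) values

# Convert back to a dense NumPy array when one is needed.
restored = encoded.toarray()
```

With many categories, most entries in each row are zero, so storing only the non-zero values can save a considerable amount of memory.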
