Book title: Hands-On Natural Language Processing with Python
Authors: Rajesh Arumugam, Rajalingappaa Shanmugamani
Last updated: 2021-08-13 16:01:47

One-hot encoding

One-hot encoding is a vectorization technique for categorical data, most commonly class labels. Each category is represented by a vector whose length equals the number of categories, with a 1 in the position assigned to that category and 0 everywhere else. For binary labels, the two target values are represented as [0, 1] and [1, 0]. For three classes, the representations are [0, 0, 1], [0, 1, 0], and [1, 0, 0]. The same scheme extends to any number of categories. The main advantage of one-hot encoding is that it treats all categories equally, in contrast to arbitrary integer labels. For instance, the colors red, green, and blue could be encoded as the integers 0, 1, and 2. Although colors have no intrinsic order, some ML models may treat such input as if it were ordered, for example by assuming that blue (2) is greater than red (0). One-hot encoding avoids this: every category gets its own binary indicator, so no order among the categorical values is implied.
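The encoding described above can be sketched in a few lines of plain Python. This is a minimal illustration and not the book's own code: the helper names `one_hot_encode` and `squared_distance` are hypothetical, and assigning vector positions by sorting the category names is an assumption made here so that the output is deterministic.

```python
def one_hot_encode(labels, categories=None):
    """Return (categories, vectors) for a list of categorical labels.

    Assumption: when no category list is given, positions are assigned
    by sorting the distinct labels. Any fixed assignment would work.
    """
    if categories is None:
        categories = sorted(set(labels))
    index = {category: i for i, category in enumerate(categories)}
    vectors = []
    for label in labels:
        vector = [0] * len(categories)  # all zeros ...
        vector[index[label]] = 1        # ... except the label's own slot
        vectors.append(vector)
    return categories, vectors


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)

print(categories)  # ['blue', 'green', 'red']
for color, vector in zip(colors, encoded):
    print(color, vector)
# red [0, 0, 1]
# green [0, 1, 0]
# blue [1, 0, 0]
# green [0, 1, 0]

# Any two different categories are equally far apart, so no order is implied.
red, green, blue = encoded[0], encoded[1], encoded[2]
print(squared_distance(red, green), squared_distance(red, blue))  # 2 2
```

With the integer codes 0, 1, and 2, red and blue would be twice as far apart as red and green. With one-hot vectors, every pair of distinct categories is separated by the same distance.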
