
One hot encoding

Numerical or categorical information can usually be represented by integers, one for each option or discrete result. But there are situations where a set of binary flags (bins) indicating the current option is preferred. This form of data representation is called one hot encoding. It transforms an input value into a binary array containing only zeros, except at the position indicated by the value of the variable, which is set to one.

In the simple case of integers, this is the one hot encoding of the list [1, 3, 2, 4], with positions counted from zero so that the five columns stand for the values 0 through 4:

[[0 1 0 0 0]
[0 0 0 1 0]
[0 0 1 0 0]
[0 0 0 0 1]]
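
As a minimal sketch of how such a matrix can be produced, the snippet below indexes an identity matrix with the values of the list. The names values, num_classes, and one_hot are illustrative and not part of the original text, and the choice of five columns (covering the values 0 through 4) is an assumption read off the matrix above.

```python
import numpy as np

values = [1, 3, 2, 4]
num_classes = 5  # assumed: one column for each of the values 0 through 4

# Row k of the identity matrix has a single 1 at position k,
# so selecting rows by value gives one one hot row per list element.
one_hot = np.eye(num_classes, dtype=int)[values]
print(one_hot)
```

Printing one_hot should show the same four rows as the matrix above.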

Let's write a simple one hot encoder for integer arrays, in order to better understand the concept:

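import numpy as np

def get_one_hot(input_vector):
    # One slot per possible value, assuming values run from 1 to max(input_vector)
    size = max(input_vector)
    result = []
    for i in input_vector:
        newval = np.zeros(size)
        # Index assignment replaces ndarray.itemset, which NumPy 2.0 removed
        newval[i - 1] = 1
        result.append(newval)
    return result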

In this example, we first define the get_one_hot function, which takes an array of integers as input and returns a list of NumPy arrays, one per input element.

We take the elements of the input one by one, and for each element we generate an array of zeros whose length equals the maximum value in the input, so that there is room for all possible values. Then we set a 1 at the index position indicated by the current value (we subtract 1 because we go from 1-based values to 0-based indexes). This assumes the input values are positive integers starting at 1.

Let's try the function we just wrote:

get_one_hot([1,5,2,4,3])

#Out:
[array([ 1., 0., 0., 0., 0.]),
array([ 0., 0., 0., 0., 1.]),
array([ 0., 1., 0., 0., 0.]),
array([ 0., 0., 0., 1., 0.]),
array([ 0., 0., 1., 0., 0.])]
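
The same result can also be obtained without an explicit loop. The following is only a sketch under stated assumptions: get_one_hot_vectorized and its num_classes parameter are hypothetical names introduced here, not part of the original example, and the values are assumed to be 1-based positive integers as before. An explicit num_classes is useful when the largest possible value does not happen to appear in the input.

```python
import numpy as np

def get_one_hot_vectorized(input_vector, num_classes=None):
    """Hypothetical loop-free variant of get_one_hot for 1-based integers."""
    values = np.asarray(input_vector)
    if num_classes is None:
        # Same default as the loop version: size taken from the largest value
        num_classes = int(values.max())
    # Subtract 1 to map 1-based values onto 0-based identity-matrix rows
    return np.eye(num_classes)[values - 1]

encoded = get_one_hot_vectorized([1, 5, 2, 4, 3])

# argmax finds the position of the single 1 in each row;
# adding 1 undoes the shift and recovers the original values
decoded = encoded.argmax(axis=1) + 1
```

Here encoded is a single two-dimensional array rather than a list of arrays, and decoded holds the original input values, which gives a quick round-trip check of the encoding.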