- Machine Learning with Swift
- Alexander Sosnovshchenko
- 2021-06-24 18:54:56
One-hot encoding
Most machine learning algorithms can't work with categorical variables directly, so we usually convert them to one-hot vectors (statisticians prefer to call them dummy variables). Let's do the conversion first, and then I will explain what it is:
```python
import pandas as pd  # assumed to be imported earlier in the chapter

# In []:
features = pd.get_dummies(features, columns=['color'])
features.head()
```

So now, instead of one column, color, we have four columns: color_light black, color_pink gold, color_purple polka dot, and color_space gray. The color of each sample is encoded as 1 in the corresponding column and 0 in the other three. Why do we need this if we could simply replace the colors with the numbers 1 to 4? That is exactly the problem: why prefer 1 to 4 over 4 to 1, or powers of 2, or prime numbers? These colors carry no quantitative information of their own. They can't be sorted from largest to smallest. If we introduce such information artificially, the machine learning algorithm may try to use it even though it is meaningless, and we will end up with a classifier that sees regularities where there are none.
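The argument above can be checked with a minimal sketch in pandas. It assumes a made-up `features` frame that has only a `color` column, with hypothetical rows; only the four color names come from the text, and the chapter's real dataset has more columns. The sketch also compares pairwise distances under a hypothetical 1-to-4 integer coding with distances between one-hot vectors.

```python
import itertools

import numpy as np
import pandas as pd

# Hypothetical sample data: the four color names follow the text above,
# but the rows themselves are made up for illustration.
features = pd.DataFrame(
    {"color": ["space gray", "pink gold", "light black", "purple polka dot", "pink gold"]}
)

# dtype=int gives 0/1 columns (recent pandas versions default to bool).
one_hot = pd.get_dummies(features, columns=["color"], dtype=int)
print(one_hot.columns.tolist())
# ['color_light black', 'color_pink gold', 'color_purple polka dot', 'color_space gray']

# Every sample has exactly one 1, in the column of its own color.
assert (one_hot.sum(axis=1) == 1).all()

# Hypothetical integer coding: some colors now look "closer" than others.
codes = {"light black": 1, "pink gold": 2, "purple polka dot": 3, "space gray": 4}
int_dists = {abs(codes[a] - codes[b]) for a, b in itertools.combinations(codes, 2)}

# One-hot vectors: every pair of distinct colors is equally far apart (sqrt(2)).
vectors = np.eye(len(codes))
hot_dists = {
    round(float(np.linalg.norm(u - v)), 6)
    for u, v in itertools.combinations(vectors, 2)
}

print(sorted(int_dists))  # [1, 2, 3]
print(sorted(hot_dists))  # [1.414214]
```

With the integer coding, space gray sits three units away from light black but pink gold only one unit away. The colors have no such ordering. With one-hot vectors, all distinct colors are equally far apart, so a distance-based learner gets no spurious ordering to exploit.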