- Machine Learning with Swift
- Alexander Sosnovshchenko
- 2021-06-24 18:54:56
One-hot encoding
Most machine learning algorithms can't work with categorical variables directly, so we usually convert them to one-hot vectors (statisticians prefer to call them dummy variables). Let's convert first, and then I will explain what this is:
In []:
features = pd.get_dummies(features, columns=['color'])
features.head()
Out[]:
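To see the shape of the result on something small, here is a minimal sketch using a made-up four-row frame. The `toy` name, the `weight` column, and its values are assumptions for illustration only, not the book's data; only the color names are taken from the text.

```python
import pandas as pd

# Hypothetical toy data: only the color names come from the text.
toy = pd.DataFrame({
    "weight": [1.0, 2.5, 0.7, 3.1],
    "color": ["space gray", "pink gold", "light black", "purple polka dot"],
})

# Replace the 'color' column with one indicator column per distinct value.
encoded = pd.get_dummies(toy, columns=["color"])

# The untouched column comes first, followed by the new 'color_*' columns.
print(list(encoded.columns))

# Each row has exactly one indicator set: the one matching its color.
print(encoded.filter(like="color_").astype(int).sum(axis=1).tolist())
```

Depending on the pandas version, the indicator columns may hold booleans rather than 0/1 integers; the `astype(int)` call above makes the row sums comparable either way.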

So now, instead of one column, color, we have four columns: color_light black, color_pink gold, color_purple polka dot, and color_space gray. The color of each sample is encoded as 1 in the corresponding column and 0 in the others. Why do we need this if we could simply replace the colors with the numbers from 1 to 4? Well, that is exactly the problem: why prefer 1 to 4 over 4 to 1, or powers of 2, or prime numbers? The colors on their own don't carry any quantitative information. They can't be sorted from largest to smallest. If we introduce this information artificially, the machine learning algorithm may attempt to use that meaningless information, and we will end up with a classifier that sees regularities where there are none.
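One way to make this concrete is to compare how far apart the colors look under each encoding. The sketch below uses only the standard library. The integer assignment is arbitrary, and squared Euclidean distance is my own choice of yardstick for the illustration, not something the text prescribes.

```python
from itertools import combinations

colors = ["light black", "pink gold", "purple polka dot", "space gray"]

# Arbitrary integer codes 1..4; any other assignment would be equally valid.
int_code = {c: i + 1 for i, c in enumerate(colors)}

# One-hot vectors: a 1 in the position of the color, 0 everywhere else.
one_hot = {
    c: [1 if j == i else 0 for j in range(len(colors))]
    for i, c in enumerate(colors)
}

def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

pairs = list(combinations(colors, 2))
int_dists = {(a, b): (int_code[a] - int_code[b]) ** 2 for a, b in pairs}
hot_dists = {(a, b): sq_dist(one_hot[a], one_hot[b]) for a, b in pairs}

print(sorted(set(int_dists.values())))  # [1, 4, 9]: some colors look "closer"
print(sorted(set(hot_dists.values())))  # [2]: every pair is equally far apart
```

With integer codes, "light black" ends up three steps away from "space gray" but only one step from "pink gold", an ordering that exists purely because of how we numbered them. With one-hot vectors, no pair of colors is closer than any other, so no such artificial structure is available for the algorithm to pick up.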