- Practical Data Analysis Cookbook
- Tomasz Drabas
- 2021-07-16 11:13:55
Encoding categorical variables
The final step in preparing the data for the exploratory phase is to encode categorical variables. Some software packages do this behind the scenes, but it is good to understand when and how to do it.
Most statistical models accept only numerical data. Categorical data (which, depending on the context, may itself be expressed as digits) cannot be used in a model directly. To use them, we encode them, that is, we give each level a unique numerical code. That explains when. As for how, you can use the following recipe.
Getting ready
To execute this recipe, you will need the pandas module.
No other prerequisites are required.
How to do it…
Once again, pandas already has a method that does all of this for us (the data_dummy_code.py file):
# dummy code the column with the type of the property
csv_read = pd.get_dummies(
    csv_read,
    prefix='d',
    columns=['type']
)
How it works…
The .get_dummies(...) method converts categorical variables into dummy variables. For example, consider a variable with three different levels:
1  One
2  Two
3  Three
We will need three columns to code it:
1  One    1  0  0
2  Two    0  1  0
3  Three  0  0  1
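The three-level example above can be reproduced with a few lines of pandas. This is a minimal sketch, not code from the book: the DataFrame and its column name level are made up for illustration, the levels are stored as an ordered pandas.Categorical only so that the output columns follow the One, Two, Three order of the table, and dtype=int is passed so that the result contains 0/1 integers regardless of the pandas version's default.

```python
import pandas as pd

# Hypothetical toy data: a single categorical column with three levels.
# The explicit categories list keeps the dummy columns in this order.
levels = pd.DataFrame({
    'level': pd.Categorical(
        ['One', 'Two', 'Three'],
        categories=['One', 'Two', 'Three']
    )
})

# One indicator column per level; dtype=int requests 0/1 integers.
dummies = pd.get_dummies(levels, columns=['level'], dtype=int)

print(dummies)
```

Each row has exactly one 1, in the column that corresponds to its level.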
Sometimes, we can get away with using only two additional columns. However, we can use this trick only if one of the levels is, effectively, null:
1  One   1  0
2  Two   0  1
3  Zero  0  0
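One way to obtain the two-column coding is to dummy code all three levels and then drop the indicator of the level treated as null. The following is a sketch under that assumption. The toy DataFrame and the column name level are hypothetical, and so is the choice of Zero as the null level.

```python
import pandas as pd

# Hypothetical toy data: 'Zero' plays the role of the null level.
levels = pd.DataFrame({'level': ['One', 'Two', 'Zero']})

# Full coding: one indicator column per level (three columns).
full = pd.get_dummies(levels, columns=['level'], dtype=int)

# Reduced coding: drop the indicator of the null level, so a row
# with 0 in both remaining columns stands for 'Zero'.
reduced = full.drop(columns=['level_Zero'])

print(reduced)
```

The .get_dummies(...) method also accepts a drop_first parameter, but it always drops the first level rather than one you choose, so it would remove One here instead of Zero.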
The first parameter to the .get_dummies(...) method is the DataFrame. The columns parameter specifies the column (or columns, as we can also pass a list) in the DataFrame to dummy code. By specifying the prefix, we instruct the method that the names of the newly generated columns should start with d_; in our example, the generated dummy-coded columns will have names such as d_Condo. The underscore _ character is the default separator, but it can be changed by specifying the prefix_sep parameter.
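To see how prefix and prefix_sep shape the generated column names, consider the sketch below. The small DataFrame is a hypothetical stand-in for csv_read: only the type column and the Condo level come from the recipe, while the Residential level and the price column are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the csv_read DataFrame used in the recipe.
csv_read = pd.DataFrame({
    'type': ['Condo', 'Residential', 'Condo'],
    'price': [100000, 250000, 120000],
})

# Default separator: new columns are named <prefix>_<level>.
default_sep = pd.get_dummies(csv_read, prefix='d', columns=['type'])

# Custom separator: new columns are named <prefix>-<level>.
custom_sep = pd.get_dummies(
    csv_read, prefix='d', prefix_sep='-', columns=['type']
)

print(list(default_sep.columns))
print(list(custom_sep.columns))
```

In both cases the original type column is replaced by the dummy-coded columns, and the columns that were not encoded (here, price) are kept.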
Tip
For a full list of parameters to the .get_dummies(...) method, see http://pandas.pydata.org/pandas-docs/stable/generated/pandas.get_dummies.html.