  • Deep Learning By Example
  • Ahmed Menshawy
  • 2021-06-24 18:52:45

Factorizing

This approach creates a numerical categorical feature from any other feature. In pandas, the factorize() function does this by mapping each distinct value to an integer code. This type of transformation is useful when your feature is an alphanumeric categorical variable. In the Titanic data samples, we can transform the Cabin feature into a categorical feature representing the letter of the cabin:

import re
import pandas as pd

# the cabin number is a sequence of alphanumerical characters, so we are going to create
# a feature from the alphabetical part of it
def get_cabin_letter(cabin_value):
    # missing (NaN) cabins are not strings, so treat them as unknown ('U')
    if not isinstance(cabin_value, str):
        return 'U'
    # searching for the letters in the cabin alphanumerical value
    letter_match = re.compile("([a-zA-Z]+)").search(cabin_value)

    if letter_match:
        return letter_match.group()
    else:
        return 'U'

# the helper must be defined before it is used
df_titanic_data['CabinLetter'] = df_titanic_data['Cabin'].map(get_cabin_letter)
# replace each distinct cabin letter with an integer code
df_titanic_data['CabinLetter'] = pd.factorize(df_titanic_data['CabinLetter'])[0]
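To see what pd.factorize() actually produces, here is a small self-contained sketch run on a handful of made-up cabin values (hypothetical, not taken from the real Titanic data). It assumes a cabin-letter helper equivalent to get_cabin_letter, written here with re.search and with missing values mapped to 'U'. pd.factorize() returns a pair: integer codes assigned in order of first appearance, and the array of unique values those codes refer to.

```python
import re

import numpy as np
import pandas as pd


def get_cabin_letter(cabin_value):
    """Return the alphabetical prefix of a cabin code, or 'U' (unknown)."""
    # NaN (missing cabin) is a float, not a string, so it becomes 'U'
    if not isinstance(cabin_value, str):
        return 'U'
    letter_match = re.search(r"[a-zA-Z]+", cabin_value)
    return letter_match.group() if letter_match else 'U'


# Hypothetical stand-in for the Cabin column (toy values, not the real data)
cabins = pd.Series(["C85", np.nan, "E46", "C123", "B57 B59", np.nan])

letters = cabins.map(get_cabin_letter)
# pd.factorize returns (codes, uniques); codes follow order of first appearance
codes, uniques = pd.factorize(letters)

print(letters.tolist())   # ['C', 'U', 'E', 'C', 'B', 'U']
print(codes.tolist())     # [0, 1, 2, 0, 3, 1]
print(list(uniques))      # ['C', 'U', 'E', 'B']
```

The uniques array lets you decode the integers back to letters (uniques[codes]). Mapping missing cabins to 'U' before factorizing matters: if NaN were left in place, pd.factorize() would give it the code -1 by default instead of treating it as a category of its own.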

We can also apply transformations to quantitative features by using one of the following approaches.
