  • Deep Learning By Example
  • Ahmed Menshawy
  • 2021-06-24 18:52:45

Factorizing

This approach creates a numerical categorical feature from any other feature by mapping each distinct value to an integer code. In pandas, the factorize() function does this. This type of transformation is useful when your feature is an alphanumeric categorical variable. In the Titanic data samples, we can transform the Cabin feature into a categorical feature representing the letter of the cabin:

import re
import pandas as pd

# the cabin number is a sequence of alphanumeric characters, so we are going to create some features
# from the alphabetical part of it
def get_cabin_letter(cabin_value):
    # missing cabins are NaN (a float, not a string), so treat them as unknown
    if not isinstance(cabin_value, str):
        return 'U'
    # searching for the letters in the cabin alphanumeric value
    letter_match = re.compile("([a-zA-Z]+)").search(cabin_value)
    if letter_match:
        return letter_match.group()
    else:
        return 'U'

# the helper must be defined before it is used in map()
df_titanic_data['CabinLetter'] = df_titanic_data['Cabin'].map(get_cabin_letter)
df_titanic_data['CabinLetter'] = pd.factorize(df_titanic_data['CabinLetter'])[0]
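To see what factorize() actually returns, here is a small self-contained sketch. It does not use the real Titanic file; the cabin values below are made up for illustration, and the helper mirrors get_cabin_letter() from the snippet above. pd.factorize() returns a pair: an array of integer codes and the unique values, with codes assigned in order of first appearance.

```python
import re

import numpy as np
import pandas as pd

def get_cabin_letter(cabin_value):
    """Return the first run of letters in a cabin string, or 'U' for unknown."""
    if not isinstance(cabin_value, str):  # NaN for passengers with no cabin
        return "U"
    match = re.search(r"[a-zA-Z]+", cabin_value)
    return match.group() if match else "U"

# Hypothetical cabin values, for illustration only
cabins = pd.Series(["C85", np.nan, "E46", "C123", "B28", "G6"])

letters = cabins.map(get_cabin_letter)
codes, uniques = pd.factorize(letters)

print(list(letters))   # ['C', 'U', 'E', 'C', 'B', 'G']
print(codes.tolist())  # [0, 1, 2, 0, 3, 4]
print(list(uniques))   # ['C', 'U', 'E', 'B', 'G']
```

Mapping missing cabins to 'U' before factorizing matters: by default pd.factorize() gives missing values the code -1 instead of a category of their own. Also note that the integer codes only follow the order of first appearance, so they carry no meaningful ordering between cabin letters.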

We can also apply transformations to quantitative features by using one of the following approaches.

主站蜘蛛池模板: 特克斯县| 石棉县| 高陵县| 吴川市| 工布江达县| 平阴县| 胶南市| 布尔津县| 陆河县| 保德县| 慈利县| 方城县| 华亭县| 股票| 邵阳市| 昂仁县| 房山区| 商南县| 延寿县| 永城市| 清流县| 读书| 西和县| 隆子县| 呈贡县| 晋宁县| 永泰县| 宿州市| 论坛| 唐山市| 宁远县| 鄂托克旗| 乐山市| 剑阁县| 张家口市| 丹东市| 仙游县| 建阳市| 涪陵区| 屯留县| 台前县|