
Factorizing

This approach creates a numerical categorical feature from any other feature by assigning an integer code to each distinct value. In pandas, the factorize() function does this. The transformation is useful when a feature is an alphanumeric categorical variable. In the Titanic data samples, we can transform the Cabin feature into a categorical feature representing the letter of the cabin:

import re
import pandas as pd

def get_cabin_letter(cabin_value):
    # search for the letters in the alphanumeric cabin value
    # (re.search needs a string, so missing Cabin values must be filled beforehand)
    letter_match = re.search(r"([a-zA-Z]+)", cabin_value)

    if letter_match:
        return letter_match.group()
    else:
        # no letter found: mark the cabin letter as 'U'
        return 'U'

# the cabin number is a sequence of alphanumeric characters, so we are going to
# create a feature from the alphabetical part of it
df_titanic_data['CabinLetter'] = df_titanic_data['Cabin'].map(get_cabin_letter)
# pd.factorize returns (codes, uniques); keep only the integer codes
df_titanic_data['CabinLetter'] = pd.factorize(df_titanic_data['CabinLetter'])[0]
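
To make the effect of factorize() concrete, here is a minimal sketch on a few made-up cabin values. The sample data and the variable names (cabins, letters, codes, uniques) are hypothetical and are not taken from the Titanic dataset. The sketch also uses the pandas str.extract accessor in place of the helper function above, purely to keep the example short.

```python
import pandas as pd

# Hypothetical sample values standing in for the Cabin column
cabins = pd.Series(["C85", "U0", "E46", "C123", "U0"])

# Pull out the alphabetical part of each cabin value
letters = cabins.str.extract(r"([a-zA-Z]+)", expand=False)

# factorize returns a pair: integer codes and the distinct values they refer to
codes, uniques = pd.factorize(letters)

print(list(codes))    # [0, 1, 2, 0, 1]
print(list(uniques))  # ['C', 'U', 'E']
```

Codes are assigned in order of first appearance, so the integers are only labels and carry no meaningful ordering between cabin letters.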

We can also apply transformations to quantitative features by using one of the following approaches.
