Converting categorical variables

As you may have noticed, a data frame can contain columns of different data types. To see the type of each column, we can check the dtypes attribute of the data frame. You can think of Python attributes as being similar to Swift properties:

In []: 
df.dtypes 
Out[]: 
length    float64 
color      object 
fluffy       bool 
label      object 
dtype: object 

While the length and fluffy columns contain the expected data types, the types of color and label are less transparent. What are those objects? The object type means that a column can hold Python objects of any type. At the moment, we have strings in them, but what we really want them to be are categorical variables. In case you don't remember from the previous chapter, categorical variables are like Swift enums. Fortunately for us, the data frame has handy methods for converting columns from one type to another:

In []: 
df.color = df.color.astype('category') 
df.label = df.label.astype('category')  

That's it. Let's check:

In []: 
df.dtypes 
Out[]: 
length     float64 
color     category 
fluffy        bool 
label     category 
dtype: object  
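If you want to try the conversion without the dataset from the earlier chapters, the following is a minimal sketch. It assumes a small hand-made data frame named toy; the rows and the label names class_a and class_b are invented for illustration and are not the book's data. Note that, depending on your pandas version, the string columns may report a type other than object before the conversion.

```python
import pandas as pd

# Hypothetical toy rows, invented for this sketch (not the book's dataset).
toy = pd.DataFrame({
    "length": [27.5, 12.1, 30.0],
    "color": ["pink gold", "space gray", "pink gold"],
    "fluffy": [True, False, True],
    "label": ["class_a", "class_b", "class_a"],
})

# Column types before the conversion, as plain strings.
before = toy.dtypes.astype(str).to_dict()

# Convert the two string columns to categorical variables.
toy["color"] = toy["color"].astype("category")
toy["label"] = toy["label"].astype("category")

# Column types after the conversion.
after = toy.dtypes.astype(str).to_dict()

print(before)
print(after)
```

Only color and label change type; length and fluffy are left untouched.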

color and label are categorical now. To list all the categories of the color column, execute:

In []: 
colors = df.color.cat.categories.get_values().astype('string') 
colors 
Out[]: 
array(['light black', 'pink gold', 'purple polka-dot', 'space gray'], dtype='|S16') 

As expected, we have four colors. '|S16' stands for fixed-width byte strings 16 bytes long, which is the length of the longest color name, 'purple polka-dot'.
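The get_values() method used above was removed in pandas 1.0, so the call may fail on a recent installation. The following is a hedged sketch of one way to get the same list there, using cat.categories.tolist(). The colors series is a hypothetical stand-in for df.color, built from the four color names shown above. It also prints cat.codes, the integer code pandas stores for each value, which plays a role loosely similar to a raw value in a Swift enum.

```python
import pandas as pd

# Hypothetical stand-in for df.color, using the four color names from the text.
colors = pd.Series(
    ["space gray", "pink gold", "light black", "purple polka-dot", "pink gold"],
    dtype="category",
)

# The distinct categories; for strings, pandas infers them in sorted order.
names = colors.cat.categories.tolist()

# The integer code stored for each row (an index into the categories).
codes = colors.cat.codes.tolist()

# Length of the longest category name.
longest = max(len(name) for name in names)

print(names)    # ['light black', 'pink gold', 'purple polka-dot', 'space gray']
print(codes)    # [3, 1, 0, 2, 1]
print(longest)  # 16
```

The longest name has 16 characters, which matches the 16 in '|S16'.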
