
Loading the dataset

Create and open a new IPython notebook. In the chapter's supplementary materials, you will find the file extraterrestrials.csv. Copy it to the folder where you created your notebook. In the first cell of your notebook, run the magic command:

In []: 
%matplotlib inline 

This makes the plots we create later appear inline, right in the notebook.

We use the pandas library to load and manipulate datasets. Let's import it and load the .csv file:

In []: 
import pandas as pd 
# sep='\t': fields are tab-separated; index_col=0: the first column is the row index
df = pd.read_csv('extraterrestrials.csv', sep='\t', encoding='utf-8', index_col=0) 
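The real file's rows are not reproduced here, so this standalone sketch instead feeds `read_csv` a tiny in-memory, tab-separated sample. The rows, the `label` column name, and the second class name `species_b` are hypothetical placeholders. It shows why `sep` must be the tab escape `'\t'` (a bare `'t'` would split each line at every letter t), and checking `df.dtypes` afterwards is a quick way to confirm the columns were parsed as intended.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample standing in for extraterrestrials.csv.
# The first, unnamed column holds the row index, as index_col=0 expects.
tsv = (
    "\tlength\tcolor\tfluffy\tlabel\n"
    "0\t12.5\tpink\tTrue\trabbosaurus\n"
    "1\t8.1\tgray\tFalse\tspecies_b\n"
)

# sep='\t' splits fields on tab characters; index_col=0 makes column 0 the index.
# encoding is omitted here because the data is already a Python string.
df = pd.read_csv(io.StringIO(tsv), sep='\t', index_col=0)

print(df.columns.tolist())  # ['length', 'color', 'fluffy', 'label']
print(df.dtypes)            # length is parsed as float64, fluffy as bool
```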

The object df is a data frame: a table-like data structure designed for efficient manipulation of columns that hold different data types. To see what's inside, execute:

In []: 
df.head() 
Out[]: 

This prints the first five rows of the table. The first three columns (length, color, and fluffy) are features, and the last one is the class label.
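The column roles described above can also be made explicit in code. This is a minimal sketch on a hypothetical six-row table (`label` and `species_b` are placeholder names, not taken from the real file): `head()` returns five rows unless given another count, and `iloc` separates the features from the class label by position.

```python
import pandas as pd

# Hypothetical six-row table with the same column layout as the chapter's data.
df = pd.DataFrame({
    'length': [12.5, 8.1, 10.0, 11.2, 7.4, 9.9],
    'color': ['pink', 'gray', 'pink', 'gray', 'gray', 'pink'],
    'fluffy': [True, False, True, True, False, False],
    'label': ['rabbosaurus', 'species_b', 'rabbosaurus',
              'rabbosaurus', 'species_b', 'rabbosaurus'],
})

first_rows = df.head()   # five rows by default; df.head(n) returns n rows
X = df.iloc[:, :-1]      # every column except the last: the features
y = df.iloc[:, -1]       # the last column: the class label
```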

How many samples do we have in total? Run this code to find out:

In []: 
len(df) 
Out[]: 
1000 
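`len(df)` counts rows only. When the number of columns matters too, `df.shape` returns both as a `(rows, columns)` tuple; the three-row table below is a hypothetical example.

```python
import pandas as pd

# Hypothetical three-row table with two columns.
df = pd.DataFrame({
    'length': [12.5, 8.1, 10.0],
    'label': ['rabbosaurus', 'species_b', 'rabbosaurus'],
})

n_rows, n_cols = df.shape  # (number of rows, number of columns)
print(len(df), n_rows, n_cols)  # 3 3 2
```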

It looks like most of the samples at the beginning are rabbosauruses. Let's fetch five samples at random to see whether this holds true in other parts of the dataset:

In []: 
df.sample(5) 
Out[]: 
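Each call to `df.sample(5)` draws a different set of rows. To get the same "random" rows on every run, for example so that a notebook's output stays stable, `DataFrame.sample` accepts a `random_state` seed, and `frac` can replace the fixed count. The ten-row table here is a hypothetical stand-in.

```python
import pandas as pd

# Hypothetical ten-row table: seven rabbosauruses and three placeholder samples.
df = pd.DataFrame({
    'length': [float(i) for i in range(10)],
    'label': ['rabbosaurus'] * 7 + ['species_b'] * 3,
})

# The same seed yields the same rows; rows are drawn without replacement.
first = df.sample(5, random_state=0)
second = df.sample(5, random_state=0)

# frac draws a fraction of all rows instead of a fixed number.
half = df.sample(frac=0.5, random_state=0)
```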

Well, eyeballing random rows like this isn't helpful: analyzing the whole table this way would be far too tedious. We need more advanced tools for computing descriptive statistics and visualizing the data.
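As a preview of such tools, pandas can answer the rabbosaurus question directly. This sketch uses a hypothetical six-row table with the placeholder class `species_b`: `value_counts()` counts the samples per class, and `describe()` summarizes the numeric column (count, mean, standard deviation, minimum, quartiles, and maximum).

```python
import pandas as pd

# Hypothetical six-row table: one numeric feature plus the class label.
df = pd.DataFrame({
    'length': [12.5, 8.1, 10.0, 11.2, 7.4, 9.9],
    'label': ['rabbosaurus', 'species_b', 'rabbosaurus',
              'rabbosaurus', 'species_b', 'rabbosaurus'],
})

# Samples per class: are most of them really rabbosauruses?
class_counts = df['label'].value_counts()
print(class_counts)

# Summary statistics; by default only numeric columns are included.
stats = df.describe()
print(stats)
```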
