  • Machine Learning with Swift
  • Alexander Sosnovshchenko
  • 2021-06-24 18:54:55

Loading the dataset

Create and open a new IPython notebook. The chapter's supplementary materials include the file extraterrestrials.csv. Copy it to the folder where you created your notebook. In the first cell of your notebook, execute this magic command:

In []: 
%matplotlib inline 

This tells the notebook to render plots inline, directly below the cell that produces them.

The library we use for loading and manipulating datasets is pandas. Let's import it and load the .csv file:

In []: 
import pandas as pd 
df = pd.read_csv('extraterrestrials.csv', sep='\t', encoding='utf-8', index_col=0)
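
The effect of the sep and index_col arguments can be checked without the real file. The following is a minimal sketch that parses a small in-memory string instead; the rows, the color values, and the column name label are made up for illustration, and only the feature names length, color, and fluffy come from this chapter:

```python
import io

import pandas as pd

# Hypothetical stand-in for the file contents: tab-separated values with an
# unnamed first column that holds the row index.
raw = (
    "\tlength\tcolor\tfluffy\tlabel\n"
    "0\t27.5\tpink\tTrue\trabbosaurus\n"
    "1\t12.1\tblack\tFalse\tother\n"
)

# sep="\t" splits fields on tab characters; index_col=0 turns the first
# column into the row index instead of a regular data column.
sample_df = pd.read_csv(io.StringIO(raw), sep="\t", index_col=0)

print(sample_df.columns.tolist())
print(sample_df.shape)
```

Because the first column becomes the index, it is not counted among the data columns.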

The df object is a data frame: a table-like data structure designed for efficient manipulation of columns that may hold different data types. To see what's inside, execute:

In []: 
df.head() 
Out[]: 

This prints the first five rows of the table. The first three columns (length, color, and fluffy) are features, and the last one is the class label.
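
Since the class label is the last column, features and labels can be separated by position. This is a rough sketch on a toy data frame, assuming the same column order as described above; the rows and the column name label are hypothetical:

```python
import pandas as pd

# Hypothetical rows; only the column layout follows the text.
toy = pd.DataFrame({
    "length": [27.5, 12.1, 30.2],
    "color": ["pink", "black", "pink"],
    "fluffy": [True, False, True],
    "label": ["rabbosaurus", "other", "rabbosaurus"],
})

features = toy.iloc[:, :-1]  # every column except the last one
labels = toy.iloc[:, -1]     # the last column, returned as a Series

print(features.columns.tolist())
print(labels.tolist())
```

Selecting by position with iloc avoids hard-coding the name of the label column, at the cost of depending on the column order.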

How many samples do we have in total? Run this code to find out:

In []: 
len(df) 
Out[]: 
1000 
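
As a side note, the shape attribute reports the number of rows and columns together, which can be handy alongside len. A small sketch on made-up data:

```python
import pandas as pd

# Hypothetical toy table with three rows and two columns.
toy = pd.DataFrame({
    "length": [1.0, 2.0, 3.0],
    "fluffy": [True, False, True],
})

n_rows, n_cols = toy.shape  # shape is a (rows, columns) tuple
print(n_rows, n_cols)
print(len(toy) == n_rows)   # len counts rows only
```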

It looks like most of the samples at the beginning are rabbosauruses. Let's fetch five samples at random to see whether this holds in other parts of the dataset:

In []: 
df.sample(5) 
Out[]: 

Well, this isn't helpful, as it would be too tedious to analyze the table content in this way. We need some more advanced tools to perform descriptive statistics computations and data visualization.
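
As a small taste of what such tools look like, the sketch below counts how often each class occurs instead of inspecting rows by eye. It also passes random_state to sample so that the random draw is repeatable. The toy data and the column name label are assumptions made for illustration:

```python
import pandas as pd

# Hypothetical label column: six samples of one class, four of another.
toy = pd.DataFrame({"label": ["rabbosaurus"] * 6 + ["other"] * 4})

# A fixed random_state makes the random draw identical on every run.
first_draw = toy.sample(5, random_state=0)
second_draw = toy.sample(5, random_state=0)
print(first_draw.equals(second_draw))

# value_counts summarizes the whole column rather than a handful of rows.
counts = toy["label"].value_counts()
print(counts)
```

A count per class answers the question about class balance directly, whereas a random handful of rows only hints at it.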
