
Generating Python datasets

To generate a Python dataset, we use the pandas DataFrame.to_pickle() method, which serializes a DataFrame to a pickle file. The dataset we plan to use is called adult.pkl, as shown in the following screenshot:

The related Python code is given here:

import pandas as pd

# The UCI Adult data file is comma-separated and has no header row
path = "http://archive.ics.uci.edu/ml/machine-learning-databases/"
dataSet = "adult/adult.data"
inFile = path + dataSet
x = pd.read_csv(inFile, header=None)

# Replace the default integer column labels (0 to 14) with descriptive names
adult = x.rename(columns={0: 'age', 1: 'workclass',
    2: 'fnlwgt', 3: 'education', 4: 'education-num',
    5: 'marital-status', 6: 'occupation', 7: 'relationship',
    8: 'race', 9: 'sex', 10: 'capital-gain', 11: 'capital-loss',
    12: 'hours-per-week', 13: 'native-country', 14: 'class'})

# Save the DataFrame as a pickle file; the c:/temp folder must already exist
adult.to_pickle("c:/temp/adult.pkl")
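
To check that a pickle file written this way can be read back, a small round trip is often enough. The following is a minimal sketch rather than part of the original program: the two rows are made-up values that only mimic a few of the adult columns, and the file is written to a temporary folder (an assumption made here so that the example does not depend on c:/temp existing).

```python
import os
import tempfile

import pandas as pd

# Two made-up rows that mimic a few columns of the adult dataset
# (illustrative values only, not taken from the real file)
sample = pd.DataFrame({
    "age": [30, 45],
    "workclass": ["Private", "Self-emp"],
    "hours-per-week": [40, 35],
})

# Write to a temporary folder so the sketch does not rely on c:/temp
tmp_dir = tempfile.mkdtemp()
pkl_path = os.path.join(tmp_dir, "sample.pkl")
sample.to_pickle(pkl_path)

# Read the pickle back and display the first rows
restored = pd.read_pickle(pkl_path)
print(restored.head())
```

The same pd.read_pickle() call should work for c:/temp/adult.pkl once the program above has been run.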

To show the first few observations, we call x.head(), which returns the first five rows by default, as shown in the following screenshot:

Note that a backup copy of the dataset can be downloaded from the author's website at http://canisius.edu/~yany/data/adult.data.txt.
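
If the primary location is unreachable, one possible way to fall back on the backup copy is sketched below. The helper name read_first_available is hypothetical, and the sketch assumes that the backup file has the same layout as the original (comma-separated, no header row), which the text does not state explicitly.

```python
import pandas as pd

PRIMARY = "http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
BACKUP = "http://canisius.edu/~yany/data/adult.data.txt"


def read_first_available(sources):
    """Return the first source that pandas can read as a header-less CSV.

    Hypothetical helper: tries each path or URL in order and moves on to
    the next one when the current one cannot be opened.
    """
    last_error = None
    for src in sources:
        try:
            return pd.read_csv(src, header=None)
        except OSError as err:
            # OSError covers missing local files as well as urllib's
            # URLError and HTTPError for unreachable URLs
            last_error = err
    raise OSError(f"none of the sources could be read: {sources}") from last_error


# Example call (needs network access, so it is left commented out):
# x = read_first_available([PRIMARY, BACKUP])
```

Only errors raised while opening a source are caught here; a file that opens but cannot be parsed would still raise an exception.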
