Generating Python datasets

To generate a Python dataset, we use the pandas to_pickle() method, which saves a DataFrame to a pickle (.pkl) file. The dataset we plan to use is called adult.pkl, as shown in the following screenshot:

The related Python code is given here:

import pandas as pd

# Location of the UCI Adult dataset; the file has no header row
path = "http://archive.ics.uci.edu/ml/machine-learning-databases/"
dataSet = "adult/adult.data"
inFile = path + dataSet
x = pd.read_csv(inFile, header=None)

# Give the 15 integer-labeled columns descriptive names (x itself is unchanged)
adult = x.rename(columns={
    0: 'age', 1: 'workclass', 2: 'fnlwgt', 3: 'education',
    4: 'education-num', 5: 'marital-status', 6: 'occupation',
    7: 'relationship', 8: 'race', 9: 'sex', 10: 'capital-gain',
    11: 'capital-loss', 12: 'hours-per-week', 13: 'native-country',
    14: 'class'})

# Save the DataFrame as a pickle file; the c:/temp folder must already exist
adult.to_pickle("c:/temp/adult.pkl")

To show the first several observations, we use the x.head() method, as shown in the following screenshot:
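The save-and-inspect steps can also be tried without downloading anything. The following is a minimal sketch, assuming a small hand-typed DataFrame in place of the real data; the sample values, the file name adult_sample.pkl, and the temporary folder are illustrative and do not come from the book. It writes the frame with to_pickle(), reads it back with pd.read_pickle(), and prints the first rows with head().

```python
import os
import tempfile

import pandas as pd

# Illustrative sample rows (assumed values, not the downloaded dataset).
sample = pd.DataFrame({
    "age": [39, 50, 38],
    "workclass": ["State-gov", "Self-emp-not-inc", "Private"],
    "hours-per-week": [40, 13, 40],
})

# Use a temporary folder so the sketch runs on any operating system.
with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = os.path.join(tmp_dir, "adult_sample.pkl")
    sample.to_pickle(pkl_path)           # write the DataFrame to disk
    restored = pd.read_pickle(pkl_path)  # load it back into memory

# head(n) returns the first n rows; with no argument it returns five.
print(restored.head(2))
```

A pickle file keeps the column names and data types, so the restored frame should match the one that was saved.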

Note that a backup copy of the dataset is available for download from the author's website at http://canisius.edu/~yany/data/adult.data.txt.
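One possible way to use a backup location is to try each source in turn and keep the first one that loads. The sketch below is only an illustration under that assumption: the helper name read_first_available is hypothetical, and the demonstration uses two local files (one deliberately missing) instead of the real URLs so that it runs offline.

```python
import os
import tempfile

import pandas as pd


def read_first_available(sources):
    """Return (DataFrame, source) for the first source that can be read.

    Hypothetical helper, not part of pandas.
    """
    errors = []
    for source in sources:
        try:
            return pd.read_csv(source, header=None), source
        except OSError as exc:  # missing file, HTTP error, unreachable host
            errors.append(f"{source}: {exc}")
    raise OSError("no source could be read:\n" + "\n".join(errors))


# Offline demonstration: the first path does not exist, the second does.
with tempfile.TemporaryDirectory() as tmp_dir:
    missing = os.path.join(tmp_dir, "primary.data")  # never created
    backup = os.path.join(tmp_dir, "backup.data.txt")
    with open(backup, "w") as f:
        f.write("39, State-gov, 40\n50, Private, 13\n")  # illustrative rows
    df, used = read_first_available([missing, backup])

print(used)
print(df.shape)
```

With the real data, the list passed to the helper would hold the UCI URL first and the author's backup URL second.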
