Generating Python datasets

To generate a Python dataset, we use the pandas DataFrame.to_pickle() method. The dataset we plan to generate is called adult.pkl, as shown in the following screenshot:

The related Python code is given here:

import pandas as pd

# Build the URL of the UCI Adult dataset
path = "http://archive.ics.uci.edu/ml/machine-learning-databases/"
dataSet = "adult/adult.data"
inFile = path + dataSet

# The raw file has no header row, so the columns are numbered 0 to 14
x = pd.read_csv(inFile, header=None)

# Give each column a descriptive name
adult = x.rename(columns={0: 'age', 1: 'workclass',
    2: 'fnlwgt', 3: 'education', 4: 'education-num',
    5: 'marital-status', 6: 'occupation', 7: 'relationship',
    8: 'race', 9: 'sex', 10: 'capital-gain', 11: 'capital-loss',
    12: 'hours-per-week', 13: 'native-country', 14: 'class'})

# Save the DataFrame as a pickle file (the c:/temp folder must already exist)
adult.to_pickle("c:/temp/adult.pkl")
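A pickle file written this way can be loaded back with pd.read_pickle(). The following is a minimal sketch of that round trip, not part of the original program: it uses a tiny made-up DataFrame and a temporary folder (both are assumptions for illustration) so that it runs without a network connection or a c:/temp directory.

```python
import os
import tempfile

import pandas as pd

# A tiny, made-up stand-in for the adult DataFrame (illustration only)
sample = pd.DataFrame({
    "age": [39, 50],
    "workclass": ["State-gov", "Private"],
    "class": ["<=50K", ">50K"],
})

# Write the DataFrame to a pickle file in a temporary folder, then read it back
with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = os.path.join(tmp_dir, "adult_sample.pkl")  # hypothetical file name
    sample.to_pickle(pkl_path)
    restored = pd.read_pickle(pkl_path)

# Check that values, column names, and dtypes survived the round trip
print(restored.equals(sample))
```

Unlike a CSV file, a pickle file keeps the column names and data types, so no renaming is needed after loading.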

To show the first few observations, we call x.head(), as shown in the following screenshot:

Note that a backup copy of the dataset is available for download from the author's website at http://canisius.edu/~yany/data/adult.data.txt.
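If neither URL is reachable, the loading and head() steps can still be tried offline. The sketch below is an illustration rather than the author's code: it parses two sample rows written in the comma-plus-space layout assumed for adult.data, passes the column names directly through the names argument of pd.read_csv(), and uses skipinitialspace=True to drop the blank that follows each comma.

```python
import io

import pandas as pd

# Two illustrative rows in the assumed comma-plus-space layout of adult.data
sample_text = (
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
    "50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, "
    "Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K\n"
)

# Same column names as in the rename() call above
columns = ['age', 'workclass', 'fnlwgt', 'education', 'education-num',
           'marital-status', 'occupation', 'relationship', 'race', 'sex',
           'capital-gain', 'capital-loss', 'hours-per-week',
           'native-country', 'class']

# names= assigns the labels at load time; skipinitialspace= strips the
# blank after each comma so that values read 'State-gov', not ' State-gov'
sample = pd.read_csv(io.StringIO(sample_text), header=None,
                     names=columns, skipinitialspace=True)

# head(n) returns the first n rows; the default is n=5
first_row = sample.head(1)
print(first_row)
```

Naming the columns at load time is an alternative to calling rename() afterwards; both give the same labels.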
