- Hands-On Data Science with Anaconda
- Dr. Yuxing Yan, James Yan
- 2021-06-25 21:08:51
Generating Python datasets
To generate a Python dataset, we use the pandas DataFrame.to_pickle method, which serializes a DataFrame to a pickle (.pkl) file. The dataset we plan to generate is called adult.pkl, as shown in the following screenshot:

The related Python code is given here:
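import pandas as pd

# Location of the UCI adult data file (comma-separated, no header row)
path = "http://archive.ics.uci.edu/ml/machine-learning-databases/"
dataSet = "adult/adult.data"
inFile = path + dataSet

# header=None makes pandas label the columns 0 to 14
x = pd.read_csv(inFile, header=None)
# x is already a DataFrame, so this wrapping step is optional
adult = pd.DataFrame(x, index=None)

# Replace the integer labels with descriptive column names
adult = adult.rename(columns={
    0: 'age', 1: 'workclass', 2: 'fnlwgt', 3: 'education',
    4: 'education-num', 5: 'marital-status', 6: 'occupation',
    7: 'relationship', 8: 'race', 9: 'sex', 10: 'capital-gain',
    11: 'capital-loss', 12: 'hours-per-week', 13: 'native-country',
    14: 'class'})

# Save the DataFrame as a pickle file (the c:/temp folder must already exist)
adult.to_pickle("c:/temp/adult.pkl")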
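A quick way to check that a pickle file was written correctly is to read it back with pd.read_pickle and compare it with the original. The following is a minimal sketch of that round trip, not part of the book's code: the three-row frame, the name sample, and the temporary file name sample.pkl are assumptions chosen so that the example runs without network access or a c:/temp folder.

```python
import os
import tempfile

import pandas as pd

# Illustrative stand-in for the adult data; these values are typed in by hand,
# not read from the real file.
sample = pd.DataFrame(
    {
        "age": [39, 50, 38],
        "workclass": ["State-gov", "Self-emp-not-inc", "Private"],
        "hours-per-week": [40, 13, 40],
    }
)

# Use a temporary directory so the sketch does not depend on c:/temp existing.
with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = os.path.join(tmp_dir, "sample.pkl")  # hypothetical file name
    sample.to_pickle(pkl_path)           # write the DataFrame to disk
    restored = pd.read_pickle(pkl_path)  # load it back into memory

# equals() checks that values, column labels, and dtypes all match.
print(restored.equals(sample))
print(restored.dtypes)
```

Unlike a CSV file, a pickle keeps the column names and data types, so no header or dtype arguments are needed when loading it. Pickle files should only be loaded from sources you trust.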
To show the first several observations, we call x.head(), which returns the first five rows by default, as shown in the following screenshot:
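The behavior of head() can be sketched on a small frame. The ten-row DataFrame named demo below is hypothetical and exists only to show how the row-count argument works; it is not the adult data.

```python
import pandas as pd

# Hypothetical ten-row frame, used only to show how head() behaves.
demo = pd.DataFrame({"id": range(10), "squared": [i * i for i in range(10)]})

first_five = demo.head()          # default: the first 5 rows
first_three = demo.head(3)        # explicit row count
all_but_last_two = demo.head(-2)  # negative n: all rows except the last 2

print(first_five)
print(len(first_three), len(all_but_last_two))
```

The same calls work on x or adult from the code above. Calling adult.head() shows the renamed columns, while x.head() shows the integer labels 0 to 14.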

Note that the backup dataset is available at the author's website, downloadable at http://canisius.edu/~yany/data/adult.data.txt.
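One way to use the backup copy is to try the primary source first and fall back to the author's site if the download fails. The helper below is a sketch under stated assumptions: the function name read_first_available and its reader parameter are hypothetical, and it assumes the backup file has the same comma-separated, header-less layout as the UCI file.

```python
import pandas as pd

PRIMARY_URL = "http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
BACKUP_URL = "http://canisius.edu/~yany/data/adult.data.txt"


def read_first_available(urls, reader=pd.read_csv):
    """Return the DataFrame from the first URL that can be read.

    `reader` is a parameter (hypothetical, for illustration) so the function
    can be exercised without network access.
    """
    errors = []
    for url in urls:
        try:
            # Assumes every source is comma-separated with no header row.
            return reader(url, header=None)
        except Exception as exc:  # network, HTTP, and parse errors all land here
            errors.append(f"{url}: {exc}")
    raise RuntimeError("no source could be read:\n" + "\n".join(errors))


# Example call (requires network access, so it is left commented out):
# x = read_first_available([PRIMARY_URL, BACKUP_URL])
```

Either URL may change or go offline over time, so the error message lists every source that was tried.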