- Hands-On Data Science with Anaconda
- Dr. Yuxing Yan, James Yan
- 2021-06-25 21:08:51
Generating Python datasets
To generate a Python dataset, we use the pandas to_pickle() method, which serializes a DataFrame to a pickle (.pkl) file. The dataset we plan to use is called adult.pkl, as shown in the following screenshot:

The related Python code is given here:
import pandas as pd
path="http://archive.ics.uci.edu/ml/machine-learning-databases/"
dataSet="adult/adult.data"
inFile=path+dataSet
x=pd.read_csv(inFile,header=None)
adult=pd.DataFrame(x,index=None)
adult=adult.rename(columns={0:'age',1:'workclass',
    2:'fnlwgt',3:'education',4:'education-num',
    5:'marital-status',6:'occupation',7:'relationship',
    8:'race',9:'sex',10:'capital-gain',11:'capital-loss',
    12:'hours-per-week',13:'native-country',14:'class'})
adult.to_pickle("c:/temp/adult.pkl")
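As a quick check that to_pickle() writes a file pandas can read back, the following minimal sketch round-trips a tiny DataFrame through a pickle file. The sample rows, the file name sample.pkl, and the temporary directory are assumptions made for illustration only; the original example writes the full dataset to c:/temp/adult.pkl.

```python
import os
import tempfile

import pandas as pd

# A small hand-typed sample with illustrative values (not read from the real file).
sample = pd.DataFrame(
    {
        "age": [39, 50],
        "workclass": ["State-gov", "Self-emp-not-inc"],
        "hours-per-week": [40, 13],
    }
)

# Write to a temporary directory instead of a hard-coded c:/temp path,
# so the sketch also runs on machines where that folder does not exist.
out_dir = tempfile.mkdtemp()
pkl_path = os.path.join(out_dir, "sample.pkl")
sample.to_pickle(pkl_path)

# Read the pickle back and confirm the round trip preserved the data.
restored = pd.read_pickle(pkl_path)
print(restored.equals(sample))
print(restored.head(1))
```

Unlike a CSV file, a pickle keeps the column names and dtypes, so the restored DataFrame needs no renaming or type conversion. Pickle files should only be loaded from sources you trust.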
To show the first few observations, we call x.head(), which returns the first five rows by default, as shown in the following screenshot:

Note that a backup copy of the dataset can be downloaded from the author's website at http://canisius.edu/~yany/data/adult.data.txt.
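If the primary location is unreachable, one way to use the backup copy is to try each source in turn. The sketch below is a hypothetical helper, not part of the book's code: the names read_first_available, ADULT_COLUMNS, and ADULT_SOURCES are introduced here for illustration, and it assumes the backup file has the same comma-separated, header-less layout as adult.data.

```python
import pandas as pd

# Column names taken from the rename() call in the code above.
ADULT_COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "class",
]

# Primary location first, then the backup mentioned in the text.
ADULT_SOURCES = [
    "http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
    "http://canisius.edu/~yany/data/adult.data.txt",
]


def read_first_available(sources, columns=ADULT_COLUMNS):
    """Return a DataFrame read from the first source that can be loaded.

    `sources` is an ordered list of URLs or local file paths.
    """
    errors = []
    for src in sources:
        try:
            # names= assigns the column labels while reading, so no rename is needed.
            return pd.read_csv(src, header=None, names=columns)
        except Exception as exc:  # e.g. network failure, missing file, parse error
            errors.append(f"{src}: {exc}")
    raise OSError("no source could be read:\n" + "\n".join(errors))
```

Calling read_first_available(ADULT_SOURCES) would then return the data from whichever location responds first; the result can be saved with to_pickle() exactly as before.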