- Python: Advanced Predictive Analytics
- Ashish Kumar, Joseph Babcock
- 2021-07-02 20:09:20
Various methods of importing data in Python
pandas is the Python library/package of choice to import, wrangle, and manipulate datasets. Datasets come in various forms, the most frequent being the .csv format. The delimiter (a special character that separates the values in a dataset) in a CSV file is a comma. Now we will look at the various methods by which you can read a dataset in Python.
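To make the role of the delimiter concrete, here is a minimal sketch that parses a small comma-separated string held in memory. The sample rows and the names csv_text and sample are made up for illustration and are not part of the Titanic dataset.

```python
import io
import pandas as pd

# Hypothetical sample text: each comma separates one value from the next,
# and each line break starts a new row.
csv_text = "name,age,fare\nAlice,29,211.34\nBob,35,151.55\n"

# read_csv accepts a file path or a file-like object.
# sep="," is the default, written out here only to show where the delimiter goes.
sample = pd.read_csv(io.StringIO(csv_text), sep=",")

print(sample.shape)           # (rows, columns)
print(list(sample.columns))   # header names taken from the first line
```

If a file uses a different delimiter, such as a tab or a semicolon, the same call should work with sep set to that character.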
Case 1 – reading a dataset using the read_csv method
Open an IPython Notebook by typing ipython notebook in the command line (on current installations, the equivalent command is jupyter notebook).
Download the Titanic dataset from the shared Google Drive folder (either the .xls or the .xlsx file will do). Save the file in CSV format and we are good to go. This is a very popular dataset containing information about the passengers travelling on the Titanic on the fateful voyage during which it sank. If you wish to know more about this dataset, you can look for it in the Google Drive folder.
A common practice is to share a variable description file along with the dataset, describing the context and significance of each variable. Since this is the first dataset we encounter in this book, here is its data description, to give a feel for what data description files actually look like:
Note
VARIABLE DESCRIPTIONS:
pclass       Passenger Class (1 = 1st; 2 = 2nd; 3 = 3rd)
survival     Survival (0 = No; 1 = Yes)
name         Name
sex          Sex
age          Age
sibsp        Number of Siblings/Spouses Aboard
parch        Number of Parents/Children Aboard
ticket       Ticket Number
fare         Passenger Fare
cabin        Cabin
embarked     Port of Embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)
boat         Lifeboat
body         Body Identification Number
home.dest    Home/Destination
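One practical use of such a description is to turn its codes into lookup tables for decoding the data. The sketch below is only an illustration under stated assumptions: the three rows are made up, and the column names are copied from the description, so the actual file may name them differently.

```python
import pandas as pd

# Made-up rows that follow the coding given in the variable descriptions.
sample = pd.DataFrame({
    "pclass": [1, 3, 2],
    "survival": [1, 0, 1],
    "embarked": ["S", "C", "Q"],
})

# Lookup tables copied from the variable descriptions.
pclass_labels = {1: "1st", 2: "2nd", 3: "3rd"}
survival_labels = {0: "No", 1: "Yes"}
port_labels = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}

# Series.map replaces each code with its label; assign returns a new DataFrame
# and leaves the original codes in `sample` untouched.
decoded = sample.assign(
    pclass=sample["pclass"].map(pclass_labels),
    survival=sample["survival"].map(survival_labels),
    embarked=sample["embarked"].map(port_labels),
)
print(decoded)
```

Keeping the coded and decoded versions separate is a design choice here: the numeric codes remain available for modelling, while the labels are easier to read in summaries.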
The following code snippet is enough to import the dataset and get you started:
import pandas as pd
data = pd.read_csv('E:/Personal/Learning/Datasets/Book/titanic3.csv')
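Once a dataset has been read, a few quick checks help confirm that it loaded as expected. The following is a minimal sketch rather than part of the original snippet: it uses a tiny made-up stand-in for the CSV file so that it runs anywhere, and the rows shown are not real passenger records.

```python
import io
import pandas as pd

# Tiny made-up stand-in for the CSV file; in practice, pass the path to
# your own copy of the dataset to read_csv instead.
csv_text = (
    "pclass,survival,name,sex,age\n"
    "1,1,Passenger A,female,29\n"
    "3,0,Passenger B,male,\n"
)
data = pd.read_csv(io.StringIO(csv_text))

print(data.shape)           # number of rows and columns
print(data.head())          # first few rows
print(data.dtypes)          # type inferred for each column
print(data.isnull().sum())  # count of missing values per column
```

The empty age field in the second row is read as a missing value, which is why the last check is worth running on a freshly imported file.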