- Python Data Mining Quick Start Guide
- Nathan Greeneltch
- 2021-06-24 15:19:47
Loading data into memory – viewing and managing with ease using pandas
First, we need to load the data into memory so that Python can interact with it. We will use pandas as our data management and manipulation library:
# load data into Pandas
import pandas as pd
df = pd.read_csv("./data/iris.csv")
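If you do not have ./data/iris.csv on hand, the same call can be tried on a small in-memory sample. This is only a sketch: the column names and the four rows below are illustrative assumptions, not necessarily the layout of the book's file, and sample_df is a hypothetical name used to avoid clobbering df.

```python
import io

import pandas as pd

# Hypothetical stand-in for ./data/iris.csv.
# Column names and rows are assumptions made for illustration only.
csv_text = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.3,3.3,6.0,2.5,virginica
"""

# read_csv accepts a file path or any file-like object,
# so a StringIO buffer works the same way as a file on disk
sample_df = pd.read_csv(io.StringIO(csv_text))

print(type(sample_df))
print(sample_df.columns.tolist())
```

Because read_csv treats the first line as the header by default, the resulting DataFrame has four rows and five named columns.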
Let's use some built-in pandas features to sanity-check the load and confirm that everything was read in properly. First, we use the .shape attribute to check the size of the data, reported as a (rows, columns) tuple. Next, we check the contents of the DataFrame with the .head() method, which returns the first five rows in a new, smaller DataFrame for easy viewing. Finally, we use the .describe() method to show summary statistics for each numeric feature, transposed so that each feature appears as a row.
Pandas has many more sanity-check and quick-view features. For example, .tail() returns the final five rows of the data. Becoming proficient in pandas is well worth the time investment. The dedicated chapter that appears later in the book is a good place to start, as is the essential basic functionality (https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html) page on the pandas documentation site.
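As a rough illustration of a few of these other quick-view features, the sketch below uses a small made-up DataFrame (demo is a hypothetical name, and its values are invented for the example) rather than the iris data:

```python
import pandas as pd

# Small made-up frame for illustration; not the iris data
demo = pd.DataFrame({
    "length": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    "label": ["a", "b", "a", "b", "a", "b", None],
})

last_rows = demo.tail()       # final five rows by default
last_two = demo.tail(2)       # or pass the number of rows you want
missing = demo.isna().sum()   # count of missing values per column

print(last_rows)
print(demo.dtypes)            # data type of each column
print(missing)
```

Checking .dtypes and the per-column missing-value counts right after loading can catch columns that were parsed as text instead of numbers, or rows with gaps, before they cause trouble later.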
# sanity check with Pandas
print("shape of data in (rows, columns) is " + str(df.shape))
print(df.head())
print(df.describe().transpose())
You will see the following output after executing the preceding code:
