- Python Data Mining Quick Start Guide
- Nathan Greeneltch
- 2021-06-24 15:19:47
Loading data into memory – viewing and managing with ease using pandas
First, we will need to load data into memory so that Python can interact with it. Pandas will be our data management and manipulation library:
# load data into Pandas
import pandas as pd
df = pd.read_csv("./data/iris.csv")
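If the CSV file is not at hand, the same call can be tried on in-memory text, because pd.read_csv accepts a file-like object as well as a path. The following is a minimal sketch only: the column names and the three rows are hypothetical stand-ins, not the actual contents of ./data/iris.csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file; column names and values
# are illustrative only and are not taken from ./data/iris.csv.
csv_text = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.3,3.3,6.0,2.5,virginica
"""

# read_csv accepts a path or any file-like object, such as StringIO
demo_df = pd.read_csv(io.StringIO(csv_text))

# the first line of the text is used as the column header
print(demo_df.columns.tolist())
print(demo_df.shape)
```

The header row becomes the column labels, and each remaining row of text becomes one row of the DataFrame.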
Let's use some built-in pandas features to sanity check the data load and make sure that everything was read in properly. First, we use the .shape attribute to check the size of the data, reported as a (rows, columns) tuple. Next, we check the contents of the DataFrame with the .head() method, which returns the first five rows as a new, smaller DataFrame for easy viewing. Finally, we use the .describe() method to show summary statistics for each numeric feature.
Pandas has many more sanity-check and quick-view features. For example, .tail() returns the final five rows of the data. Becoming proficient in pandas is well worth the time investment. Good places to start are the dedicated chapter that appears later in the book and the essential basic functionality (https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html) page on the pandas documentation site.
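As a rough illustration of these additional quick-view helpers, the sketch below uses a small made-up DataFrame (hypothetical values, not the iris data) to try .tail(), the .dtypes attribute, and a per-column count of missing values with .isna().sum(). It is one possible set of checks, not a prescribed routine.

```python
import pandas as pd

# Small made-up DataFrame (not the iris data), for illustration only
demo_df = pd.DataFrame({
    "length": [5.1, 7.0, 6.3, 5.8, 4.9, 6.1, 5.5],
    "width": [3.5, 3.2, 3.3, 2.7, 3.0, None, 2.4],  # one missing value
    "label": ["a", "b", "c", "c", "a", "b", "b"],
})

last_rows = demo_df.tail()       # final five rows by default
last_two = demo_df.tail(2)       # pass n to change the row count
missing = demo_df.isna().sum()   # number of missing values per column

print(last_rows)
print(demo_df.dtypes)            # data type of each column
print(missing)
```

Checking data types and missing values right after loading tends to catch parsing problems early, for example a numeric column that was read in as text.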
# sanity check with Pandas
print("shape of data in (rows, columns) is " + str(df.shape))
print(df.head())
print(df.describe().transpose())
You will see the following output after executing the preceding code:
