Book title: Python Data Mining Quick Start Guide
Author: Nathan Greeneltch
Last updated: 2021-06-24 15:19:47

Loading data into memory – viewing and managing with ease using pandas

First, we need to load the data into memory so that Python can interact with it. We will use pandas as our data management and manipulation library:

# load data into Pandas
import pandas as pd
df = pd.read_csv("./data/iris.csv")
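If the iris file is not on hand at that path, the same call can be tried against a small in-memory sample. The following is a minimal sketch, not the book's data: the three rows, the column names, and the variable names csv_text and sample_df are hypothetical and chosen only for illustration. It relies on the fact that pd.read_csv accepts a file-like object as well as a path.

```python
import io

import pandas as pd

# Hypothetical three-row sample; the column names are assumed for illustration
# and are not taken from the book's iris.csv.
csv_text = """sepal_length,sepal_width,species
5.1,3.5,setosa
7.0,3.2,versicolor
6.3,3.3,virginica
"""

# read_csv accepts a file path or any file-like object, so wrapping the text
# in StringIO lets us exercise the loader without touching the disk.
sample_df = pd.read_csv(io.StringIO(csv_text))

print(sample_df.shape)  # (rows, columns) of the loaded sample
```

The first line of the text is treated as the header row, so the sample loads as three rows and three columns.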

Let's use some built-in pandas features to sanity check the load and confirm that everything arrived as expected. First, the .shape attribute returns the size of the data as a (rows, columns) tuple. Next, the .head() method returns the first five rows as a new, smaller DataFrame for easy viewing. Finally, the .describe() method shows summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for each numeric feature.

Pandas has many more sanity check and quick view features. For example, .tail() returns the final five rows of the data. Becoming proficient in pandas is well worth the time investment. Good places to start are the dedicated chapter later in this book and the essential basic functionality page (https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html) on the pandas documentation site.
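A few of these additional quick-view features can be sketched as follows. This is an illustrative example under stated assumptions: demo_df is a hypothetical stand-in with made-up values, not the iris data, and the choice of checks (.tail(), .dtypes, and .isna().sum()) is ours rather than the book's.

```python
import pandas as pd

# Hypothetical stand-in DataFrame with one deliberately missing value.
demo_df = pd.DataFrame({
    "petal_length": [1.4, 4.7, 6.0, 1.3, 5.1, 4.5, None],
    "species": ["setosa", "versicolor", "virginica", "setosa",
                "virginica", "versicolor", "setosa"],
})

# .tail() mirrors .head(): it returns the final five rows by default.
last_rows = demo_df.tail()

# .isna().sum() counts the missing values in each column.
missing_per_column = demo_df.isna().sum()

print(last_rows)
print(demo_df.dtypes)       # data type pandas inferred for each column
print(missing_per_column)
```

Checking the inferred types and the missing-value counts right after loading can surface parsing problems before any analysis is run.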

# sanity check with Pandas
print("shape of data in (rows, columns) is " + str(df.shape))
print(df.head())
print(df.describe().transpose())

You will see the following output after executing the preceding code:
