
Loading data from files into a DataFrame

The pandas library provides facilities for easy retrieval of data from a variety of data sources as pandas objects. As a quick example, let's examine the ability of pandas to load data in CSV format.

This example uses data/goog.csv, a file provided with the code for this book. Its contents represent time-series financial information for Google stock.

The following statement uses the operating system (from within Jupyter Notebook or IPython) to display the content of this file. Which command you will need to use depends on your operating system:
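In a notebook, the usual route is a shell escape, for example !head -n 5 data/goog.csv on macOS or Linux, or !type data\goog.csv on Windows. As a portable alternative, the sketch below reads the first few lines in plain Python. The head() helper is a hypothetical name, and the file it reads is a small stand-in written to a temporary directory, with placeholder values rather than real quotes:

```python
import tempfile
from itertools import islice
from pathlib import Path


def head(path, n=5):
    """Return the first n lines of a text file, without trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in islice(f, n)]


# Stand-in for data/goog.csv so the sketch runs anywhere.
# Column layout and values are illustrative placeholders, not real quotes.
sample = """Date,Close,Volume
2016-12-19,794.20,1225900
2016-12-20,796.42,951000
2016-12-21,794.56,1211300
"""

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "goog_sample.csv"
    path.write_text(sample, encoding="utf-8")
    preview = head(path, n=3)

for line in preview:
    print(line)
```

With the book's code checked out, head('data/goog.csv') would show the real file instead.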

This information can be easily imported into a DataFrame using the pd.read_csv() function:
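A minimal sketch of the call, assuming a CSV with a Date column followed by numeric columns. To stay self-contained it reads from an in-memory string instead of data/goog.csv; the Close and Volume columns and all values are illustrative placeholders:

```python
import io

import pandas as pd

# Stand-in for data/goog.csv: placeholder layout and values, not real quotes.
csv_text = """Date,Close,Volume
2016-12-19,794.20,1225900
2016-12-20,796.42,951000
2016-12-21,794.56,1211300
"""

# pd.read_csv accepts a file path or any file-like object.
# With the book's file this would be pd.read_csv('data/goog.csv').
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```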

pandas has no idea that the first column in the file is a date and has treated the contents of the date field as a string. This can be verified using the following pandas statement, which shows the type of the Date column as a string:
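One way to check this, sketched against an in-memory stand-in for the file (placeholder values; only the Date column name comes from the text):

```python
import io

import pandas as pd

# Stand-in for data/goog.csv: placeholder layout and values, not real quotes.
csv_text = """Date,Close,Volume
2016-12-19,794.20,1225900
2016-12-20,796.42,951000
"""

df = pd.read_csv(io.StringIO(csv_text))

# Without parse_dates, each value in the Date column is a plain Python string.
first_date = df["Date"].iloc[0]
print(type(first_date))  # <class 'str'>
```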

The parse_dates parameter of the pd.read_csv() function can be used to guide pandas on how to convert data directly into a pandas date object. The following tells pandas to convert the content of the Date column into actual Timestamp objects:
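A minimal sketch of passing parse_dates, again using an in-memory stand-in with placeholder values in place of data/goog.csv:

```python
import io

import pandas as pd

# Stand-in for data/goog.csv: placeholder layout and values, not real quotes.
csv_text = """Date,Close,Volume
2016-12-19,794.20,1225900
2016-12-20,796.42,951000
"""

# parse_dates takes a list of column names (or positions) to convert to dates.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# The Date column now has a datetime64 dtype instead of holding strings.
print(df.dtypes)
```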

If we check whether it worked, we see that the date is a Timestamp:
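The check can be sketched as follows, on an in-memory stand-in for the file (placeholder values):

```python
import io

import pandas as pd

# Stand-in for data/goog.csv: placeholder layout and values, not real quotes.
csv_text = """Date,Close,Volume
2016-12-19,794.20,1225900
2016-12-20,796.42,951000
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# Each element of the parsed column is a pandas Timestamp, not a string.
first_date = df["Date"].iloc[0]
print(type(first_date))
print(isinstance(first_date, pd.Timestamp))  # True
```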

Unfortunately, this has not used the Date column as the index of the DataFrame. Instead, the DataFrame has the default zero-based integer index labels:
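Inspecting the index makes this visible. The sketch below uses a three-row in-memory stand-in for the file (placeholder values):

```python
import io

import pandas as pd

# Stand-in for data/goog.csv: placeholder layout and values, not real quotes.
csv_text = """Date,Close,Volume
2016-12-19,794.20,1225900
2016-12-20,796.42,951000
2016-12-21,794.56,1211300
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# Parsing the dates does not change the index; it is still 0, 1, 2, ...
print(df.index)  # RangeIndex(start=0, stop=3, step=1)
```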

Note that this is now a RangeIndex, whereas earlier versions of pandas would have used an Int64Index. We'll examine this difference later in the book.

This can be fixed using the index_col parameter of the pd.read_csv() function to specify which column in the file should be used as the index:
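A minimal sketch combining index_col with parse_dates, on an in-memory stand-in for the file (placeholder values):

```python
import io

import pandas as pd

# Stand-in for data/goog.csv: placeholder layout and values, not real quotes.
csv_text = """Date,Close,Volume
2016-12-19,794.20,1225900
2016-12-20,796.42,951000
2016-12-21,794.56,1211300
"""

# index_col names the column to use as row labels; parse_dates converts it
# to dates, so the resulting index holds Timestamps rather than strings.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

print(df.index)
print(df)
```

Date no longer appears among the regular columns, since it has become the index.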

The index is now a DatetimeIndex, which lets us look up rows by date.
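Date-based lookup might look like the sketch below, which selects one row by Timestamp and a range of rows by date strings. The data is an in-memory stand-in with placeholder values:

```python
import io

import pandas as pd

# Stand-in for data/goog.csv: placeholder layout and values, not real quotes.
csv_text = """Date,Close,Volume
2016-12-19,794.20,1225900
2016-12-20,796.42,951000
2016-12-21,794.56,1211300
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

# Look up a single row by its date label; the result is a Series.
row = df.loc[pd.Timestamp("2016-12-20")]
print(row)

# Slice a date range with strings; both endpoints are included.
window = df.loc["2016-12-19":"2016-12-20"]
print(window)
```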
