官术网_书友最值得收藏!

Time series data

The example of cross-sectional data discussed earlier is from the year 2010 only. However, instead if we consider only one country, for example United States, and take a look at its military expenses and central government debt for a span of 10 years from 2001 to 2010, that would get two time series - one about the US federal military expenditure and the other about debt of US federal government. Therefore, in essence, a time series is made up of quantitative observations on one or more measurable characteristics of an individual entity and taken at multiple points in time. In this case, the data represents yearly military expenditure and government debt for the United States. Time series data is typically characterized by several interesting internal structures such as trend, seasonality, stationarity, autocorrelation, and so on. These will be conceptually discussed in the coming sections in this chapter.

The internal structures of time series data require special formulation and techniques for its analysis. These techniques will be covered in the following chapters with case studies and implementation of working code in Python.

The following figure plots the couple of time series we have been talking about:

Figure 1.3: Examples of time series data

In order to generate the preceding plots we will extend the code that was developed to get the graphs for the cross-sectional data. We will start by creating two new Series to represent the time series of military expenses and central government debt of the United States from 1960 to 2010:

central_govt_debt_us = central_govt_debt.ix[central_govt_debt['Country Code']=='USA', :].T 
military_exp_us = military_exp.ix[military_exp['Country Code']=='USA', :].T 

The two Series objects created in the preceding code are merged to form a single DataFrame and sliced to hold data for the years 2001 through 2010:

data_us = pd.concat((military_exp_us, central_govt_debt_us), axis=1) 
index0 = np.where(data_us.index=='1960')[0][0] 
index1 = np.where(data_us.index=='2010')[0][0] 
data_us = data_us.iloc[index0:index1+1,:] 
data_us.columns = ['Federal Military Expenditure', 'Debt of Federal  Government'] 
data_us.head(10)  

The data prepared by the preceding code returns the following table:

The preceding table shows that data on federal military expenses and federal debt is not available from several years starting from 1960. Hence, we drop the rows with missing values from the Dataframe data_us before plotting the time series:

data_us.dropna(inplace=True)
print('Shape of data_us:', data_us.shape)

As seen in the output of the print function, the DataFrame has twenty three rows after dropping the missing values:

Shape of data_us: (23, 2)

After dropping rows with missing values, we display the first ten rows of data_us are displayed as follows:

Finally, the time series are generated by executing the following code:

# Two subplots, the axes array is 1-d
f, axarr = plt.subplots(2, sharex=True)
f.set_size_inches(5.5, 5.5)
axarr[0].set_title('Federal Military Expenditure during 1988-2010 (% of GDP)')
data_us['Federal Military Expenditure'].plot(linestyle='-', marker='*', color='b', ax=axarr[0])
axarr[1].set_title('Debt of Federal Government during 1988-2010 (% of GDP)')
data_us['Debt of Federal Government'].plot(linestyle='-', marker='*', color='r', ax=axarr[1])
主站蜘蛛池模板: 乐昌市| 资阳市| 宜兰县| 定远县| 阳曲县| 民乐县| 济源市| 武山县| 全州县| 永和县| 德阳市| 盘锦市| 南川市| 昌平区| 湖口县| 大理市| 南和县| 全椒县| 夹江县| 湘潭县| 壶关县| 盘锦市| 象山县| 临沭县| 大竹县| 沙洋县| 祁东县| 夹江县| 仁寿县| 隆安县| 沂源县| 南宁市| 禹城市| 通辽市| 册亨县| 巴东县| 闽侯县| 马鞍山市| 临江市| 安平县| 张家界市|