
Panel data

So far, we have seen data taken from multiple individuals at one point in time (cross-sectional) or from a single entity over multiple points in time (time series). However, if we observe multiple entities over multiple points in time, we get panel data, also known as longitudinal data. Extending our earlier example about military expenditure, let us now consider four countries over the same period of 1960-2010. The resulting data is a panel dataset. The figure given below illustrates the panel data in this scenario. Rows with missing values, corresponding to the period 1960 to 1987, were dropped before plotting the data.

Figure 1.4: Example of panel data
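
To make the structure of a panel concrete, the following minimal sketch builds a tiny panel in long format, indexed by entity and year. The entity codes AAA and BBB, the column names, and all values are synthetic placeholders chosen for illustration; they are not taken from the military expenditure dataset. Fixing the year recovers a cross-section, and fixing the entity recovers a time series.

```python
import pandas as pd

# Synthetic placeholder values, NOT real military expenditure figures
records = [
    ("AAA", 2000, 1.0), ("AAA", 2001, 1.1), ("AAA", 2002, 1.2),
    ("BBB", 2000, 2.0), ("BBB", 2001, 2.1), ("BBB", 2002, 2.2),
]
panel = pd.DataFrame(records, columns=["entity", "year", "value"])

# A two-level index (entity, year) is one common way to hold panel data
panel = panel.set_index(["entity", "year"]).sort_index()

# Fixing the year gives a cross-section: one row per entity
cross_section = panel.xs(2001, level="year")

# Fixing the entity gives a time series: one row per year
time_series = panel.xs("AAA", level="entity")

print(panel)
print(cross_section)
print(time_series)
```
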
A generic panel data regression model can be stated as $y_{it} = W x_{it} + b + \epsilon_{it}$, which expresses the dependent variable $y_{it}$ as a linear function of the explanatory variable $x_{it}$, where $W$ are the weights of $x_{it}$, $b$ is the bias term, and $\epsilon_{it}$ is the error. Here $i$ indexes the individuals for whom data is collected and $t$ indexes the points in time. This type of panel data analysis therefore seeks to model the variation across both individuals and points in time. The variation is reflected in $\epsilon_{it}$, and the assumptions made about it determine the necessary mathematical treatment. For example, if $\epsilon_{it}$ is assumed to vary non-stochastically with respect to $i$ and $t$, it can be represented by dummy variables rather than treated as random noise. This type of analysis is known as a fixed effects model. On the other hand, $\epsilon_{it}$ varying stochastically over $i$ and $t$ requires special treatment of the error and is handled by a random effects model.
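
As a rough illustration of the fixed effects idea, the sketch below simulates a small panel in which each entity has its own constant offset and then estimates the weight of the explanatory variable in two ways. Everything here is an assumption made for the example: the entity offsets `alpha`, the true weight `true_w`, and the noise level are invented, and the within (demeaning) estimator shown is one textbook way of fitting a fixed effects model, not necessarily the treatment used elsewhere in this book. Subtracting each entity's time average is equivalent to including one dummy variable per entity.

```python
import numpy as np

rng = np.random.default_rng(0)
n_entities, n_periods = 4, 50
true_w = 2.0                              # assumed true weight of x
alpha = np.array([1.0, -1.0, 3.0, 0.5])   # assumed entity-specific offsets

# x is correlated with the entity offsets, so ignoring them biases the slope
x = rng.normal(size=(n_entities, n_periods)) + alpha[:, None]
eps = rng.normal(scale=0.1, size=(n_entities, n_periods))
y = true_w * x + alpha[:, None] + eps

# Within transformation: remove each entity's time average, which removes alpha_i
x_dm = x - x.mean(axis=1, keepdims=True)
y_dm = y - y.mean(axis=1, keepdims=True)
w_fe = (x_dm * y_dm).sum() / (x_dm ** 2).sum()

# Pooled least squares slope that ignores the entity offsets
xc = x - x.mean()
yc = y - y.mean()
w_pooled = (xc * yc).sum() / (xc ** 2).sum()

print("fixed effects estimate:", w_fe)
print("pooled estimate:", w_pooled)
```

In this simulated setting the within estimate should land close to the assumed weight, while the pooled estimate absorbs part of the entity offsets into the slope.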

Let us prepare the data required to plot the preceding figure. We will continue to expand the code used for the cross-sectional and time series data earlier in this chapter. We start by extracting the data for the four countries shown in the preceding plot. This is done as follows:

# .ix has been removed from recent pandas versions. Here .loc filters the rows and
# .iloc slices the year columns by position; index0 and index1 are assumed to be
# the integer positions of the first and last year columns, as set up earlier.

# China
chn = data.loc[(data['Indicator Name']=='Military expenditure (% of GDP)')&
               (data['Country Code']=='CHN')].iloc[:, index0:index1+1]
chn = pd.Series(data=chn.values[0], index=chn.columns)
chn.dropna(inplace=True)

# United States
usa = data.loc[(data['Indicator Name']=='Military expenditure (% of GDP)')&
               (data['Country Code']=='USA')].iloc[:, index0:index1+1]
usa = pd.Series(data=usa.values[0], index=usa.columns)
usa.dropna(inplace=True)

# India
ind = data.loc[(data['Indicator Name']=='Military expenditure (% of GDP)')&
               (data['Country Code']=='IND')].iloc[:, index0:index1+1]
ind = pd.Series(data=ind.values[0], index=ind.columns)
ind.dropna(inplace=True)

# United Kingdom
gbr = data.loc[(data['Indicator Name']=='Military expenditure (% of GDP)')&
               (data['Country Code']=='GBR')].iloc[:, index0:index1+1]
gbr = pd.Series(data=gbr.values[0], index=gbr.columns)
gbr.dropna(inplace=True)
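
The four extraction steps above differ only in the country code, so they could be folded into one helper. The following is a minimal sketch under stated assumptions: `extract_series` is a hypothetical helper name, and the small `data` table is a made-up stand-in that only mimics the layout (a country code column, an indicator name column, then one column per year) so that the example runs on its own. None of its values are real.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real table; all values are made up for illustration
data = pd.DataFrame({
    "Country Code": ["CHN", "CHN", "USA"],
    "Indicator Name": ["Military expenditure (% of GDP)",
                       "Some other indicator",
                       "Military expenditure (% of GDP)"],
    "1960": [np.nan, 1.0, 8.0],
    "1961": [2.0, 1.5, 8.5],
    "1962": [2.5, 2.0, np.nan],
})

# Integer positions of the first and last year columns
index0 = data.columns.get_loc("1960")
index1 = data.columns.get_loc("1962")


def extract_series(df, country_code, indicator, start, stop):
    """Return one country's indicator values as a Series indexed by year label."""
    mask = (df["Indicator Name"] == indicator) & (df["Country Code"] == country_code)
    # First matching row, restricted to the year columns by position
    row = df.loc[mask].iloc[0, start:stop + 1]
    # Drop years with no value and make sure the result is numeric
    return row.dropna().astype(float)


indicator = "Military expenditure (% of GDP)"
series = {code: extract_series(data, code, indicator, index0, index1)
          for code in ["CHN", "USA"]}

for code, values in series.items():
    print(code, values.to_dict())
```

Keeping the result in a dictionary keyed by country code also makes it straightforward to loop over the countries when plotting.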

Now that the data is ready for all four countries, we will plot them using the following code:

plt.figure(figsize=(5.5, 5.5))
# One line per country; the plotting order must match the legend order below
usa.plot(linestyle='-', marker='*', color='b')
chn.plot(linestyle='-', marker='*', color='r')
gbr.plot(linestyle='-', marker='*', color='g')
ind.plot(linestyle='-', marker='*', color='y')
plt.legend(['USA','CHINA','UK','INDIA'], loc=1)
plt.title('Military expenditure of 4 countries')
plt.ylabel('Military expenditure (% of GDP)')
plt.xlabel('Years')
The Jupyter notebook that has the code used for generating all the preceding figures is Chapter_1_Different_Types_of_Data.ipynb under the code folder in the GitHub repo.

The discussion about different types of data sets the stage for a closer look at time series. We will start by examining the special properties typically found in a time series, or in panel data that has an inherent time series in it.
