官术网_书友最值得收藏!

Panel data

So far, we have seen data taken from multiple individuals but at one point in time (cross-sectional) or taken from an individual entity but over multiple points in time (time series). However, if we observe multiple entities over multiple points in time we get a panel data also known as longitudinal data. Extending our earlier example about the military expenditure, let us now consider four countries over the same period of 1960-2010. The resulting data will be a panel dataset. The figure given below illustrates the panel data in this scenario. Rows with missing values, corresponding to the period 1960 to 1987 have been dropped before plotting the data.

Figure 1.4: Example of panel data
A generic panel data regression model can be stated as y_it = W x _it +b+ ? _it, which expresses the dependent variable y_it as a linear model of explanatory variable x_it, where W are weights of x_it, b is the bias term, and ?_it is the error. i represents individuals for whom data is collected for multiple points in time represented by j. As evident, this type of panel data analysis seeks to model the variations across both multiple individual and multiple points in time. The variations are reflected by ? _it and assumptions determine the necessary mathematical treatment. For example, if ?_it is assumed to vary non-stochastically with respect to i and t, then it reduces to a dummy variable representing random noise. This type of analysis is known as fixed effects model. On the other hand, ?_it varying stochastically over i and t requires a special treatment of the error and is dealt in a random effects model.

Let us prepare the data that is required to plot the preceding figure. We will continue to expand the code we have used for the cross-sectional and time series data previously in this chapter. We start by creating a DataFrame having the data for the four companies mentioned in the preceding plot. This is done as follows:

chn = data.ix[(data['Indicator Name']=='Military expenditure (% of GDP)')&\
(data['Country Code']=='CHN'),index0:index1+1
]
chn = pd.Series(data=chn.values[0], index=chn.columns)
chn.dropna(inplace=True)

usa = data.ix[(data['Indicator Name']=='Military expenditure (% of GDP)')&\
(data['Country Code']=='USA'),index0:index1+1
]
usa = pd.Series(data=usa.values[0], index=usa.columns)
usa.dropna(inplace=True)

ind = data.ix[(data['Indicator Name']=='Military expenditure (% of GDP)')&\
(data['Country Code']=='IND'),index0:index1+1
]
ind = pd.Series(data=ind.values[0], index=ind.columns)
ind.dropna(inplace=True)

gbr = data.ix[(data['Indicator Name']=='Military expenditure (% of GDP)')&\
(data['Country Code']=='GBR'),index0:index1+1
]
gbr = pd.Series(data=gbr.values[0], index=gbr.columns)
gbr.dropna(inplace=True)

Now that the data is ready for all five countries, we will plot them using the following code:

plt.figure(figsize=(5.5, 5.5))
usa.plot(linestyle='-', marker='*', color='b')
chn.plot(linestyle='-', marker='*', color='r')
gbr.plot(linestyle='-', marker='*', color='g')
ind.plot(linestyle='-', marker='*', color='y')
plt.legend(['USA','CHINA','UK','INDIA'], loc=1)
plt.title('Miltitary expenditure of 5 countries over 10 years')
plt.ylabel('Military expenditure (% of GDP)')
plt.xlabel('Years')s
The Jupyter notebook that has the code used for generating all the preceding figures is Chapter_1_Different_Types_of_Data.ipynb under the code folder in the GitHub repo.

The discussion about different types of data sets the stage for a closer look at time series. We will start doing that by understanding the special properties of data that can be typically found in a time series or panel data with inherent time series in it.

主站蜘蛛池模板: 德阳市| 江都市| 祁连县| 卓资县| 长宁区| 永济市| 巴南区| 盐城市| 额敏县| 衡南县| 江山市| 中宁县| 灵宝市| 云林县| 竹溪县| 逊克县| 保山市| 河南省| 廊坊市| 五指山市| 根河市| 宁强县| 中牟县| 石门县| 留坝县| 绥滨县| 铜鼓县| 萨迦县| 台东县| 寿阳县| 宁德市| 兴山县| 鱼台县| 南乐县| 双牌县| 天全县| 米泉市| 清水县| 隆子县| 无为县| 临邑县|