Chapter 8
Panel Time Series

8.1 Introduction

Panel time series methods were born to address the issues of “long” panels of possibly nonstationary series, usually of a macroeconomic nature. Such datasets, pooling together a sizable number of time series from different countries (regions, firms), have become increasingly common and are the main object of empirical research in many fields: development economics, regional science, and political science, to name a few. The most typical unit of observation is a country or region within a reasonably large set of similar units, observed over at least two decades of either yearly or quarterly data.

Unlike “large” panels, the emphasis is therefore not only on $N$-asymptotics but on both $N$ and $T$ tending to infinity, either sequentially or jointly (a seminal paper in this respect is Phillips and Moon, 1999). Specifying the order with which $N$ and $T$ diverge is essential for the properties of estimators.

Dynamics holds a more important, often prominent place (see, e.g., Pesaran and Smith, 1995; Eberhardt et al., 2013). Under cointegration, error correction specifications are often of interest (see, e.g., Holly et al., 2010). The assumption of parameter homogeneity is also often questioned in this field, which frequently leads to relaxing it in favor of heterogeneous specifications where the coefficients of individual units are free to vary over the cross section. The parameter of interest can then be either the whole population of individual coefficients or their cross-sectional average.

Lastly, the issue of cross‐sectional correlation, which is assumed away in the case of dynamic GMM estimators à la Arellano and Bond (1991), takes a central role in panel time series methods. In fact, observations coming from countries of the world, or regions within one country or continent, are more likely than not to be correlated in the cross section, either by some spatial process, whereby shocks spread to neighboring units because of proximity, or by the effect of common factors.

For example, consider a dynamic error component model:

y_{it} = \gamma y_{i,t-1} + \mu_i + \epsilon_{it}

where the individual effect $\mu_i$ is allowed to be correlated with $y_{i,t-1}$; for $N \to \infty$ and fixed $T$, the OLS estimator of $\gamma$ is inconsistent because of the presence of the unobserved correlated effects $\mu_i$. As we know from an earlier chapter, the within estimator for this model is in turn biased downward, the bias being inversely proportional to $T$, so that it becomes less severe as the available time dimension gets longer. If $N$ and $T$ both diverge, then for consistency $T$ needs to grow “fast enough” relative to $N$, i.e., at a rate such that the limit of $N/T$ is finite.

From a different viewpoint, if each time series in the panel is considered separately, then as $T \to \infty$ OLS is a consistent estimator of the individual parameters, so that separately estimating, and then either averaging or pooling, the coefficients becomes a feasible strategy.

More generally, the abundance of data along both dimensions in large-$N$, large-$T$ panels opens up possibilities and issues other than the familiar ones of large, short panels: heterogeneity, where coefficients are not fixed across individuals but are allowed to vary, either freely or randomly around an average; nonstationarity, where the long time dimension makes it possible to address unit roots and cointegration; and cross‐sectional dependence across individual units, possibly due to common factors to which individual units react idiosyncratically.

8.2 Heterogeneous Coefficients

Long panels make it possible to estimate separate regressions for each unit. Hence it is natural to question the assumption of parameter homogeneity ($\beta_i = \beta$, also called the pooling assumption) as opposed to various kinds of heterogeneous specifications. This is a vast subject, which we will keep as simple as possible here; in general, imposing the pooling restriction reduces the variance of the pooled estimator but may introduce bias if the restriction is false (Baltagi et al., 2008). Moreover, the heterogeneous model is usually a generalization of the homogeneous one, so that estimating it allows testing the validity of the pooling restriction.

The panel data model with individual heterogeneity:

y_{it} = \alpha_i + \beta_i^\top x_{it} + \epsilon_{it}

generalizes the familiar individual effects model: here, all parameters vary across units, while there only the intercept did. The decision “to pool or not to pool” spans a vast literature; it is analyzed thoroughly by Baltagi et al. (2000) (see also Baltagi and Griffin, 1997; Baltagi et al., 2003a) in a forecasting perspective. Summing up the results of a number of studies, Baltagi et al. (2008) conclude that for forecasting purposes the simplicity and stability of the pooled estimators dominate the flexibility of the heterogeneous ones, but seen from other perspectives, conclusions may reverse. It can be safely stated that data-rich environments favor the latter, while the appeal of the pooling restriction grows as the dataset shrinks.

8.2.1 Fixed Coefficients

The heterogeneous panel model is:

(8.1) y_{it} = \beta_i^\top x_{it} + \epsilon_{it}

where $\beta_i$ are individual‐specific parameters and $x_{it}$ is a vector of $K$ explanatory variables.

If the pooling assumption is relaxed, and one does not want to make any other assumption about how the $\beta_i$ are generated, and if the time dimension permits, one can simply estimate a separate vector of coefficients for each unit.

Individual slope parameters $\beta_i$ can be estimated ($\sqrt{T}$-consistently) by least squares as:

\hat{\beta}_i = \left(X_i^\top X_i\right)^{-1} X_i^\top y_i

This can be accomplished by subsetting the data and running OLS; more efficient functionality is provided in plm through the function pvcm, leaving the model argument at its default value of 'within'.
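As a minimal illustration of the subsetting approach, one can fit one OLS regression per unit on simulated data (the data and all names below are our own, not plm internals; pvcm automates this):

```r
# Unit-by-unit OLS on a simulated heterogeneous panel: conceptually what
# pvcm(..., model = "within") does for each cross-sectional unit.
set.seed(1)
N <- 10; T <- 50
beta_i <- rnorm(N, mean = 2, sd = 0.5)          # unit-specific true slopes
dat <- do.call(rbind, lapply(1:N, function(i) {
  x <- rnorm(T)
  data.frame(id = i, x = x, y = 1 + beta_i[i] * x + rnorm(T))
}))
# one separate regression per unit, by subsetting on the id variable
coefs <- t(sapply(split(dat, dat$id), function(d) coef(lm(y ~ x, data = d))))
colMeans(coefs)                                  # average of the unit estimates
```
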

8.2.2 Random Coefficients

Estimating separate regressions negates the advantages of panel datasets, in that degrees of freedom are greatly reduced with respect to the pooled data. If the $\beta_i$ are treated as fixed, there will be $NK$ parameters to estimate with $NT$ observations. Random coefficients specifications instead allow for cross‐sectional variability while still reaping the benefits of pooling.

8.2.2.1 The Swamy Estimator

Swamy (1970) proposed a model in which all coefficients are individual-specific. In this case, we have:

y_i = X_i \beta_i + \epsilon_i, \qquad \beta_i = \beta + \eta_i

where homoskedasticity of $\epsilon_i$ is not assumed ($\mathrm{E}(\epsilon_i \epsilon_i^\top) = \sigma_i^2 I_T$), and $\mathrm{E}(\eta_i) = 0$, $\mathrm{E}(\eta_i \eta_i^\top) = \Delta$. The model is then rewritten as:

y_i = X_i \beta + \left(X_i \eta_i + \epsilon_i\right) = X_i \beta + v_i

with $v_i = X_i \eta_i + \epsilon_i$. The model errors can be heteroskedastic (in particular because we did not impose homoskedasticity of $\epsilon_i$), and the errors of each individual are correlated, as they contain the same parameter vector $\eta_i$. For the $i$th individual, the error covariance is then:

\Sigma_i = \mathrm{E}\left(v_i v_i^\top\right) = \mathrm{E}\left[\left(X_i \eta_i + \epsilon_i\right)\left(X_i \eta_i + \epsilon_i\right)^\top\right]

$\eta_i$ and $\epsilon_i$ being uncorrelated by hypothesis, we have:

\Sigma_i = X_i \Delta X_i^\top + \sigma_i^2 I_T

For the whole sample, the error covariance $\Omega$ is a block diagonal matrix, each block being equal to $\Sigma_i$.

OLS estimation of this model is inefficient, as it takes into account neither the heteroskedasticity nor the correlation of the errors. The model can be efficiently estimated by generalized least squares, computing $\Omega^{-1/2}$ and then applying OLS to the variables transformed by pre‐multiplying them by $\Omega^{-1/2}$. Given that the latter is a block diagonal matrix, the same result is obtained by pre‐multiplying each individual's data by the corresponding block $\Sigma_i^{-1/2}$. The generalized least squares method is clearly infeasible because $\Omega$ is unknown, but it can be made operational by employing a consistent estimate of it. This amounts to estimating the $N$ variances $\sigma_i^2$ and the distinct elements of the $\Delta$ matrix, or $N + K(K+1)/2$ parameters in total.

To this end, we start by estimating each individual model by OLS. We then have:

\hat{\beta}_i = \left(X_i^\top X_i\right)^{-1} X_i^\top y_i = \beta_i + \left(X_i^\top X_i\right)^{-1} X_i^\top \epsilon_i

A natural estimator of $\sigma_i^2$ is then:

\hat{\sigma}_i^2 = \frac{\hat{\epsilon}_i^\top \hat{\epsilon}_i}{T - K}

The estimates are then averaged:

\bar{\beta} = \frac{1}{N}\sum_{i=1}^N \hat{\beta}_i

The estimation of $\Delta$ is based on the expression $\hat{\beta}_i - \bar{\beta}$, which, developing and regrouping terms, can be written:

\hat{\beta}_i - \bar{\beta} = \left(\eta_i - \frac{1}{N}\sum_{j=1}^N \eta_j\right) + \left[\left(X_i^\top X_i\right)^{-1} X_i^\top \epsilon_i - \frac{1}{N}\sum_{j=1}^N \left(X_j^\top X_j\right)^{-1} X_j^\top \epsilon_j\right]

The usefulness of this expression is in writing $\hat{\beta}_i - \bar{\beta}$ as a linear combination of uncorrelated random variates, which considerably simplifies the computation of the variance of $\hat{\beta}_i - \bar{\beta}$, as all covariances are zero. We then have:

\mathrm{V}\left(\hat{\beta}_i - \bar{\beta}\right) = \frac{N-1}{N}\,\Delta + \left(1 - \frac{1}{N}\right)^2 \sigma_i^2 \left(X_i^\top X_i\right)^{-1} + \frac{1}{N^2}\sum_{j \neq i} \sigma_j^2 \left(X_j^\top X_j\right)^{-1}

Finally, regrouping terms:

\mathrm{V}\left(\hat{\beta}_i - \bar{\beta}\right) = \frac{N-1}{N}\,\Delta + \left(1 - \frac{2}{N}\right)\sigma_i^2 \left(X_i^\top X_i\right)^{-1} + \frac{1}{N^2}\sum_{j=1}^N \sigma_j^2 \left(X_j^\top X_j\right)^{-1}

We then have:

\mathrm{E}\left[\sum_{i=1}^N \left(\hat{\beta}_i - \bar{\beta}\right)\left(\hat{\beta}_i - \bar{\beta}\right)^\top\right] = (N-1)\,\Delta + \frac{N-1}{N}\sum_{i=1}^N \sigma_i^2 \left(X_i^\top X_i\right)^{-1}

which gives the estimator of $\Delta$:

\hat{\Delta} = \frac{1}{N-1}\sum_{i=1}^N \left(\hat{\beta}_i - \bar{\beta}\right)\left(\hat{\beta}_i - \bar{\beta}\right)^\top - \frac{1}{N}\sum_{i=1}^N \hat{\sigma}_i^2 \left(X_i^\top X_i\right)^{-1}
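A compact base-R sketch of the whole procedure on simulated data may help fix ideas (all names are ours; in plm this is pvcm with model='random'). The GLS step weights each individual estimate by the inverse of $\hat{\Delta} + \hat{\sigma}_i^2 (X_i^\top X_i)^{-1}$:

```r
# Illustrative sketch of the Swamy (1970) random coefficients estimator.
set.seed(2)
N <- 20; T <- 40; K <- 2
X <- lapply(1:N, function(i) cbind(1, rnorm(T)))
beta_i <- cbind(rnorm(N, 1, 0.3), rnorm(N, 2, 0.5))   # random coefficients
y <- lapply(1:N, function(i) X[[i]] %*% beta_i[i, ] + rnorm(T))
b  <- lapply(1:N, function(i) solve(crossprod(X[[i]]), crossprod(X[[i]], y[[i]])))
s2 <- sapply(1:N, function(i) sum((y[[i]] - X[[i]] %*% b[[i]])^2) / (T - K))
V  <- lapply(1:N, function(i) s2[i] * solve(crossprod(X[[i]])))
bbar  <- Reduce(`+`, b) / N
S     <- Reduce(`+`, lapply(b, function(bi) tcrossprod(bi - bbar)))
Delta <- S / (N - 1) - Reduce(`+`, V) / N             # Swamy's Delta estimator
# feasible GLS: weighted average with weights (Delta + V_i)^{-1}
W  <- lapply(1:N, function(i) solve(Delta + V[[i]]))
bg <- solve(Reduce(`+`, W),
            Reduce(`+`, lapply(1:N, function(i) W[[i]] %*% b[[i]])))
round(bg, 2)                                          # near the true means (1, 2)
```
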

8.2.2.2 The Mean Groups Estimator

Under less restrictive parametric assumptions than those of the Swamy model, assuming only exogeneity of the regressors and independently sampled errors, the average $\beta = \mathrm{E}(\beta_i)$ can be estimated by the simpler mean groups (MG) method:

(8.3) \hat{\beta}_{MG} = \frac{1}{N}\sum_{i=1}^N \hat{\beta}_i

and its dispersion, in a nonparametric fashion, through the empirical covariance of the individual $\hat{\beta}_i$:

(8.4) \hat{\mathrm{V}}\left(\hat{\beta}_{MG}\right) = \frac{1}{N(N-1)}\sum_{i=1}^N \left(\hat{\beta}_i - \hat{\beta}_{MG}\right)\left(\hat{\beta}_i - \hat{\beta}_{MG}\right)^\top

which is in fact a simplified version of the Swamy covariance seen above. In the context of the Swamy model, it is biased but $\sqrt{N}$-consistent and, unlike the original, always non‐negative definite; as such, it was suggested by Swamy (1970) himself as an alternative for cases where his parametric covariance is not. In general, it can be shown that the MG estimator is a special case of the Swamy estimator with equal GLS weights, to which the latter converges as $T$ grows sufficiently large (Hsiao and Pesaran, 2008). The function pmg performs mean groups estimation by default (model='mg').
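The MG recipe is short enough to sketch directly in base R on simulated data (names and data are illustrative; plm users would call pmg):

```r
# Mean groups estimate and its nonparametric standard error, by hand.
set.seed(3)
N <- 30; T <- 60
beta_i <- rnorm(N, mean = 1.5, sd = 0.4)         # true heterogeneous slopes
est <- sapply(1:N, function(i) {
  x <- rnorm(T)
  y <- 1 + beta_i[i] * x + rnorm(T)
  coef(lm(y ~ x))[2]                             # unit-wise OLS slope
})
b_mg  <- mean(est)                               # MG point estimate
se_mg <- sqrt(var(est) / N)                      # empirical-covariance s.e.
c(estimate = b_mg, se = se_mg)
```
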

Dynamic Mean Groups

Importantly, Pesaran and Smith (1995) consider the MG estimator in dynamic models of the type

(8.5) y_{it} = \lambda_i y_{i,t-1} + \beta_i^\top x_{it} + \epsilon_{it}

and show that, unlike aggregated or pooled regressions, it provides consistent estimates of both coefficients and standard errors. Considering the full parameter vector $\pi_i = (\lambda_i, \beta_i^\top)^\top$, they observe that, while for fixed $T$ the individual estimator $\hat{\pi}_i$ is biased of order $1/T$, the individual regressions (8.2) become consistent estimators of $\pi_i$ as $T$ diverges. Hence the MG estimator of the average parameter vector $\pi = \mathrm{E}(\pi_i)$ is consistent as both $N \to \infty$ and $T \to \infty$ (see the discussion in Hsiao and Pesaran, 2008). Explicit calculation of the individual parameters' covariance as in (8.4) in turn provides a consistent estimate of the dispersion of $\hat{\pi}_{MG}$.

8.2.3 Testing for Poolability

Heterogeneous estimators relax the assumption made in the error components model, which imposes homogeneity of all model parameters (but the intercept) across individuals. Under this assumption, one can estimate a single model for the whole sample, at most including individual‐specific constant terms. This restriction, usually called poolability, can be tested by comparing the estimation results from the different approaches. One can additionally impose the restriction of no individual‐specific intercepts.

In the variable coefficients framework, unrestricted estimation consists in estimating by OLS a different model for each individual. The sum of squared residuals is then $SSR_{unr} = \sum_{i=1}^N \hat{\epsilon}_i^\top \hat{\epsilon}_i$; with $K$ slope coefficients and an intercept per unit, the degrees of freedom are $N(T-K-1)$. The restricted model to compare with can be either pooled OLS ($SSR_{pool}$, with $NT-K-1$ degrees of freedom) or the within model ($SSR_{within}$, with $NT-N-K$ degrees of freedom), depending on whether the absence of individual effects is imposed or not. The test statistic is then (taking the within specification as the restricted model):

F = \frac{\left(SSR_{within} - SSR_{unr}\right) / \left[(N-1)K\right]}{SSR_{unr} / \left[N(T-K-1)\right]}

This takes the form of a well‐known stability test (the Chow test), distributed under $H_0$ as an $F$ with $(N-1)K$ and $N(T-K-1)$ degrees of freedom.

The function performing this kind of test is called pooltest. One possible usage is to provide two models: one estimated separately for each individual, and either an OLS or a within model. In the first case, all parameters are supposed constant under $H_0$, including the constant terms. The unrestricted model is estimated by the function pvcm. As seen above, this function can estimate two different models, depending on the argument model; here, the appropriate value for this argument is 'within' (the other possible choice being illustrated in the next section).
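The mechanics of the test (here with the within model as the restricted one) can be sketched on simulated data; names and data are illustrative, and in plm the whole procedure is wrapped by pooltest:

```r
# Chow-type poolability test: within (restricted) vs. one OLS per unit.
set.seed(4)
N <- 12; T <- 40; K <- 1                           # K slope coefficients
id <- rep(1:N, each = T)
x  <- rnorm(N * T)
y  <- rep(rnorm(N), each = T) +                    # individual intercepts
      rep(rnorm(N, 1, 0.3), each = T) * x +        # heterogeneous slopes
      rnorm(N * T)
ssr_w   <- sum(resid(lm(y ~ x + factor(id)))^2)    # restricted (within)
ssr_unr <- sum(unlist(lapply(split(data.frame(y, x), id),
                             function(d) resid(lm(y ~ x, data = d))^2)))
df1 <- (N - 1) * K; df2 <- N * (T - K - 1)
Fstat <- ((ssr_w - ssr_unr) / df1) / (ssr_unr / df2)
pf(Fstat, df1, df2, lower.tail = FALSE)            # small p: reject pooling
```
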

8.3 Cross‐sectional Dependence and Common Factors

Dependence across individual units, or cross‐sectional dependence, can take two main forms. Either it depends on the relative position of units in (some) space, so that – according to the so‐called Tobler law – nearby units are “more related” than far away ones; or it depends on being observed at the same time and thus being subject to the same set of common, global factors that affect each unit to an extent that does not depend on distance.

The former kind of dependence is called spatial and is more appropriate to describe phenomena that spill over from one unit to nearby ones through vicinity, such as the diffusion of a disease, of know‐how in the labor force, or the alteration in cigarette sales from cross‐border smuggling. In this case, one therefore often speaks of local dependence; although in many spatial models effects do carry over across all spatial units, they always do so in a distance‐decaying fashion, whereby influence is strongest between the closest units. In the characterization of Pesaran and Tosetti (2011), this kind of dependence is also dubbed “cross‐sectional weak dependence.”

The latter kind of dependence, instead, does not require units to be referenced in any space: relative position does not matter, because correlation is assumed to stem from exposure to the same, cross‐sectionally invariant common factors (the world interest rate, the price of oil, the rate of technological progress, stock market booms or busts, the price of homes in some reference market). Common factors may well originate from one or more main locations (think of a primary stock exchange, such as New York or London, setting prices that affect all other peers worldwide), but their effect will not depend on distance. Because factor‐related dependence typically does not decrease with the distance between units, it is also called global dependence. In the characterization of Pesaran and Tosetti (2011), it is named “cross‐sectional strong dependence.”

As can be seen from the examples, common factors can be observable or not: the case when they are unobservable is of course the most interesting one. Most importantly, they can also be correlated with the regressors included in the model so that if they are omitted because they are unobservable, they will be a source of endogeneity and hence of inconsistency for estimators, unless they are appropriately accounted for (for an assessment of the properties of panel time series estimators under different omitted factors scenarios, see Coakley et al., 2006).

The first kind of dependence will be the subject of the chapter on spatial panels. In the following, common factor induced correlation will be our primary concern; nevertheless, the methods presented here are generally robust to spatial correlation as well.

8.3.1 The Common Factor Model

Consider the factor‐augmented panel model

y_{it} = \beta_i^\top x_{it} + \lambda_i^\top f_t + \epsilon_{it}

where $i = 1, \ldots, N$ is the cross‐sectional index and $t = 1, \ldots, T$ the time index; $x_{it}$ is a $K \times 1$ vector of observed, strictly exogenous regressors including a 1, and $f_t$ is a vector of unobserved, cross‐sectionally invariant common factors.

Such a structure is capable of generating cross‐sectional correlation in the case of a similar, albeit not identical, response across countries to movements in the common factors, measured by the factor loadings $\lambda_i$. The common factors are allowed to be correlated with the regressors, as is most likely the case, so their effect comes both through the factor loadings and through the indirect effect on the observed regressors. The common factors are also allowed to be nonstationary. Moreover, the remainder error term $\epsilon_{it}$ is allowed to be spatially correlated, as in

\epsilon_{it} = \rho \sum_{j=1}^N w_{ij} \epsilon_{jt} + u_{it}

where $w_{ij}$ is the generic element of an $N \times N$ spatial weights matrix $W$ in which nonzero elements correspond to pairs of spatially close observation units (e.g., regions sharing a common border, or lying below a given distance threshold), so that each error is correlated with a weighted average of the errors of close‐by observations according to the parameter $\rho$.2

The two kinds of error dependence induced by omitted common factors and by spatial error correlation have serious consequences on the properties of estimators if they are neglected. The former induces cross‐sectional correlation of a pervasive type, not dying out with distance, characterized by Pesaran and Tosetti (2011) as strong; moreover, if the omitted common factors are correlated with the regressors, the latter become endogenous and estimators become inconsistent. The latter type of dependence, dubbed weak because it dies out with distance, has less serious consequences on estimation but can still cause inefficiency (and hence inconsistent standard errors and invalid inference); moreover, as discussed in the next section, it weakens consistency in the particular case of spurious panel regression. Estimators able to control for the strong kind of dependence, as it turns out, are consistent in the presence of weak dependence as well.

In the special case of a single factor with uniform factor loadings $\lambda_i = \lambda$, the common factor model becomes a time fixed effects model, which can be estimated either by OLS with time dummies or by the appropriate within estimator, i.e., OLS on cross‐sectionally demeaned data.
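This special case is easy to verify by simulation: with a single factor and uniform loadings, cross-sectional demeaning removes the factor exactly (the setup and names below are our own illustration, not from plm):

```r
# One common factor with loading 1 for every unit: subtracting the
# cross-sectional mean at each t (time fixed effects) removes it entirely.
set.seed(9)
N <- 40; T <- 30
f <- rnorm(T)                                    # common factor
X <- matrix(rnorm(N * T), T, N)
Y <- sapply(1:N, function(i) 2 * X[, i] + f + rnorm(T))
Yd <- Y - rowMeans(Y)                            # cross-sectional demeaning
Xd <- X - rowMeans(X)
b <- coef(lm(as.vector(Yd) ~ as.vector(Xd) - 1))
b                                                # close to the true slope, 2
```
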

8.3.2 Common Correlated Effects Augmentation

The principle of common correlated effects (CCE) augmentation of Pesaran (2006) is based on the idea that, for large $N$, the factors $f_t$ can be approximated by cross‐sectional averages of the response and the regressors. Following the original paper (see also Holly et al., 2010), consider the model:

(8.6) y_{it} = \alpha_i + \beta_i^\top x_{it} + e_{it}

where both the (composite) error $e_{it}$ and the regressors $x_{it}$ are generated by linear combinations of the unobserved, cross‐sectionally invariant factors $f_t$:

(8.7) e_{it} = \lambda_i^\top f_t + \epsilon_{it}

(8.8) x_{it} = a_i + \Gamma_i^\top f_t + v_{it}

Substituting (8.7) in (8.6) and combining the result with (8.8), we get:

(8.9) z_{it} = \begin{pmatrix} y_{it} \\ x_{it} \end{pmatrix} = c_i + C_i^\top f_t + u_{it}

where $z_{it} = (y_{it}, x_{it}^\top)^\top$ and

u_{it} = \begin{pmatrix} \epsilon_{it} + \beta_i^\top v_{it} \\ v_{it} \end{pmatrix}

C_i = \begin{pmatrix} \lambda_i & \Gamma_i \end{pmatrix} \begin{pmatrix} 1 & 0 \\ \beta_i & I_K \end{pmatrix}

c_i = \begin{pmatrix} \alpha_i + \beta_i^\top a_i \\ a_i \end{pmatrix}

Taking cross‐section averages of (8.9),

\bar{z}_t = \bar{c} + \bar{C}^\top f_t + \bar{u}_t

so that, if $\bar{C}\,\bar{C}^\top$ is invertible, the common factors can be written as:

f_t = \left(\bar{C}\,\bar{C}^\top\right)^{-1} \bar{C}\left(\bar{z}_t - \bar{c} - \bar{u}_t\right)

If, as $N \to \infty$, $\bar{u}_t \xrightarrow{p} 0$ and $\bar{C} \xrightarrow{p} C$, then

f_t - \left(C C^\top\right)^{-1} C\left(\bar{z}_t - \bar{c}\right) \xrightarrow{p} 0

Following this line of reasoning, Pesaran (2006) shows that the cross‐sectional averages of the response ($\bar{y}_t$) and of the regressors ($\bar{x}_t$) are consistent (as $N \to \infty$) estimators of the unobserved common factors and can therefore be used as observable proxies thereof. Augmenting the regression with these averages is known as the common correlated effects (CCE) principle. CCE estimators can be used to consistently estimate the individual slope parameters $\beta_i$ by applying least squares to the augmented regression

y_{it} = \alpha_i + \beta_i^\top x_{it} + \delta_i^\top \bar{z}_t + \epsilon_{it}

where $\bar{z}_t = (\bar{y}_t, \bar{x}_t^\top)^\top$.

The estimator for each individual slope coefficient can then be written compactly as

\hat{\beta}_{CCE,i} = \left(X_i^\top \bar{M} X_i\right)^{-1} X_i^\top \bar{M} y_i

with $\bar{M} = I_T - \bar{H}\left(\bar{H}^\top \bar{H}\right)^{-1}\bar{H}^\top$, where $\bar{H}$ contains the $T \times (K+1)$ matrix of cross‐sectional averages $\bar{z}_t$, $t = 1, \ldots, T$, and a deterministic component comprising an individual intercept and time trend (Pesaran, 2006, p. 974). The average is then estimated by the MG method,

\hat{\beta}_{CCEMG} = \frac{1}{N}\sum_{i=1}^N \hat{\beta}_{CCE,i}

This estimator is known as CCEMG, for “common correlated effects mean groups.”

The covariance matrix is estimated nonparametrically, on the basis of the empirical covariance of the individual coefficients, just like in the MG case:

(8.10) \hat{\mathrm{V}}\left(\hat{\beta}_{CCEMG}\right) = \frac{1}{N(N-1)}\sum_{i=1}^N \left(\hat{\beta}_{CCE,i} - \hat{\beta}_{CCEMG}\right)\left(\hat{\beta}_{CCE,i} - \hat{\beta}_{CCEMG}\right)^\top

Unlike other estimators, the CCE is ($\sqrt{N}$-)consistent for any fixed, unknown number of possibly nonstationary common factors. Being robust to strong forms of cross‐sectional dependence, the CCE estimator is also robust to weak ones, such as spatial correlation (see Pesaran and Tosetti, 2011). Moreover, the CCE strategy has proved most effective in a number of simulation studies, e.g., Coakley et al. (2006), Pesaran and Tosetti (2011), and Kapetanios et al. (2011).
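CCE augmentation is easy to demonstrate by hand on a simulated one-factor panel: each unit's regression is augmented with the cross-sectional averages of y and x, which is what pcce automates (the setup and all names below are our own illustration):

```r
# Naive unit-wise OLS vs. CCE-augmented regressions under an omitted,
# nonstationary common factor correlated with the regressors.
set.seed(5)
N <- 50; T <- 50
f <- cumsum(rnorm(T))                            # nonstationary common factor
beta_i   <- rnorm(N, 1, 0.2)                     # true heterogeneous slopes
lambda_i <- rnorm(N, 1, 0.3)                     # heterogeneous factor loadings
X <- sapply(1:N, function(i) 0.5 * f + rnorm(T)) # regressors load on f too
Y <- sapply(1:N, function(i) beta_i[i] * X[, i] + lambda_i[i] * f + rnorm(T))
ybar <- rowMeans(Y); xbar <- rowMeans(X)         # observable factor proxies
b_ols <- sapply(1:N, function(i) coef(lm(Y[, i] ~ X[, i]))[2])
b_cce <- sapply(1:N, function(i) coef(lm(Y[, i] ~ X[, i] + ybar + xbar))[2])
c(naive_mg = mean(b_ols), ccemg = mean(b_cce))   # only the latter is near 1
```
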

8.3.2.1 CCE Mean Groups vs. CCE Pooled

Estimation by the CCE principle can be performed either leaving the parameters $\beta_i$ free to vary, as above, or imposing parameter homogeneity (but maintaining heterogeneity in intercepts, factor loadings, and possibly time trends), which leads to the CCEP (pooled) estimator

(8.11) \hat{\beta}_{CCEP} = \left(\sum_{i=1}^N X_i^\top \bar{M} X_i\right)^{-1} \sum_{i=1}^N X_i^\top \bar{M} y_i

which is to be preferred on efficiency grounds when the underlying assumption $\beta_i = \beta$ is reasonable. It must be observed that the CCEP estimator, although imposing $\beta_i = \beta$, still allows the individual factor loadings $\lambda_i$ to differ.

The standard pooled or heterogeneous estimators can be seen as special cases of this more general formulation where augmentation is eliminated or reduced: pooled OLS is CCEP with $\bar{M} = I_T$; the individual fixed effects estimator is CCEP with $\bar{H}$ containing only individual dummies. The mean groups (MG) estimator can in turn be seen as CCEMG where $\bar{H}$ contains only an intercept.

8.3.2.2 Computing the CCEP Variance

According to Pesaran (2006, 5.2), the variance of the CCEP estimator can be computed in two different ways, depending on whether the assumption of parameter homogeneity is imposed here as well (homogeneous estimator) or not (heterogeneous, or nonparametric, estimator).

The heterogeneous version (Pesaran, 2006, Th. 3) is based again on the nonparametric estimate of the individual coefficients' covariance. Defining

\hat{\Psi}^* = \frac{1}{N}\sum_{i=1}^N \frac{X_i^\top \bar{M} X_i}{T}

and

\hat{R}^* = \frac{1}{N-1}\sum_{i=1}^N \frac{X_i^\top \bar{M} X_i}{T}\left(\hat{\beta}_{CCE,i} - \hat{\beta}_{CCEMG}\right)\left(\hat{\beta}_{CCE,i} - \hat{\beta}_{CCEMG}\right)^\top \frac{X_i^\top \bar{M} X_i}{T}

the estimator is

(8.12) \hat{\mathrm{V}}\left(\hat{\beta}_{CCEP}\right) = \frac{1}{N}\,\hat{\Psi}^{*-1}\,\hat{R}^*\,\hat{\Psi}^{*-1}

This estimator is consistent under quite general conditions as regards the rate of growth of $T$ vs. $N$ and the distribution of the individual parameters; it is the one that fares best in the original paper's simulation study and the one the author recommends using. It is therefore the default method in the pcce function.

Nevertheless, strictly speaking, (8.12) is not appropriate under complete homogeneity. Pesaran (2006, Th. 4) presents an alternative, which is appropriate for large panels (i.e., as both $N$ and $T$ grow large). The latter, presented in detail in Pesaran (2006, p. 988), is based on the nonparametric kernel‐smoothed estimator of Newey and West (see 5.1.1.3) and can be calculated using standard methods. Analogously, and again in large‐$N$ settings, the familiar clustering estimator can be applied. In fact, $\bar{M}$ being idempotent, the CCEP estimator in (8.11) can be seen as OLS on the variables transformed by pre‐multiplication by $\bar{M}$; hence methods for robust covariances can be applied to pcce objects the same way they are to plm ones, e.g., those representing a within model. From a software viewpoint, the pcce function is compliant with both vcovNW and vcovHC.

8.4 Nonstationarity and Cointegration

The time series dimension of “long” panel datasets raises the issue of possible nonstationarity and cointegration. From an econometric viewpoint, if two (single) nonstationary time series are cointegrated, then the least squares estimator of the regression parameter characterizing the relationship is superconsistent and converges to the true value faster than its stationary counterpart (Stock, 1987). If on the contrary they are nonstationary but not cointegrated, the statistical relationship is spurious, and least squares estimates do not converge to their true values at all, while fit and significance diagnostics yield the false positive results famously discussed by Granger and Newbold (1974).

In a panel time series context, there is one more dimension available for inference: the cross section. Assuming cross‐sectional independence, Phillips and Moon (1999) show that a spurious panel data regression can still deliver a consistent estimate of long‐run parameters. Yet its convergence properties will be weaker than those of a cointegrating one: in particular, the coefficients of a spurious panel regression will still converge to their true values, although at a much slower rate ($\sqrt{N}$) than that of a cointegrating panel, which is $T\sqrt{N}$.

This result depends on the assumption of cross‐sectional independence. It is weakened if the errors are cross‐sectionally weakly correlated, for example if they follow a spatial process, and can be expected to fail in the presence of strong cross‐sectional dependence, as would arise when omitting to control for common factors (Phillips and Moon, 1999, pages 1091–1092). Both pooled OLS (Phillips and Sul, 2003) and mean groups estimators (Coakley et al., 2006) lose their precision advantage from pooling when cross‐sectional dependence is present.

8.4.1 Unit Root Testing: Generalities

Detecting unit roots has become a central subject in macroeconometrics. The techniques employed are adaptations from the time series literature to the panel case. We will begin by reviewing the main results regarding time series.

Consider a variable $y_t$ generated by an autoregressive process of order one:

y_t = \rho y_{t-1} + \beta^\top d_t + \epsilon_t

The vector $d_t$ may contain an intercept, a linear trend, and other explanatory variables. To keep things simple, in the following we will assume $\beta = 0$, so that $y_t$ follows a “pure” autoregressive process. As regards the error (which in this context is often called the innovation), we will assume that it has mean zero and standard deviation $\sigma$. By recursive substitution, one has:

y_t = \rho^t y_0 + \sum_{s=0}^{t-1} \rho^s \epsilon_{t-s}

If $y_0$ is deterministic and the $\epsilon_t$ are not correlated, the variance of $y_t$ can be written:

\mathrm{V}(y_t) = \sigma^2 \sum_{s=0}^{t-1} \rho^{2s} = \sigma^2\,\frac{1 - \rho^{2t}}{1 - \rho^2}

If $|\rho| < 1$, we have:

\lim_{t \to \infty} \mathrm{V}(y_t) = \frac{\sigma^2}{1 - \rho^2}

On the other hand, if $\rho = 1$, $\mathrm{V}(y_t) = \sigma^2 t$, so that the variance grows to infinity with $t$; the series is then nonstationary and is said to have a unit root. The presence of unit roots poses various problems, first and foremost that of spurious regressions. In the presence of a unit root, a series exhibits a peculiar sort of trend that is not deterministic but stochastic, and the presence of such trends in two series containing unit roots may induce an artificial correlation between them. In Figure 8.2 we present two autoregressive series, one stationary and one with a unit root ($\rho = 1$). We see how in the former case the autoregressive process translates into correlation between successive values of $y_t$; in particular, if $y_{t-1} < 0$ then $y_t$ is more likely to be negative than positive. However, the curve representing the realization of the process crosses the horizontal axis frequently. On the other hand, in the case of a unit root, one can clearly detect the presence of a stochastic trend (in this case, on the rise): $y_t$ only changes sign once, and most of its realizations are positive.


Figure 8.2 Autoregressive processes with different values of the $\rho$ parameter.

To illustrate the importance of the spurious regression problem, we perform a short simulation exercise; we draw two autoregressive series independently, regress one on the other, and recover the t‐statistic corresponding to the null hypothesis $\beta = 0$. This hypothesis is true by construction; therefore, in a normal context the t‐statistic should not reject (i.e., be roughly less than 2 in absolute value) in 95% of cases. Let us begin by illustrating this result for $\rho = 0.2$. To this end, we employ two functions: autoreg generates an autoregressive series, and tstat performs the OLS estimation and recovers the t‐statistic:

autoreg <- function(rho = 0.1, T = 100){
  e <- rnorm(T)
  for (t in 2:T) e[t] <- e[t] + rho * e[t-1]
  e
}
tstat <- function(rho = 0.1, T = 100){
  y <- autoreg(rho, T)
  x <- autoreg(rho, T)
  z <- lm(y ~ x)
  coef(z)[2] / sqrt(diag(vcov(z))[2])
}
result <- c()
R <- 1000
for (i in 1:R) result <- c(result, tstat(rho = 0.2, T = 40))
quantile(result, c(0.025, 0.975))
  2.5%  97.5%
-2.114  1.990
prop.table(table(abs(result) > 2))

FALSE  TRUE
0.943 0.057 

We can see how the empirical quantiles are very close to their expected values and the share of false positives is in the region of 5%. Let us now do the same with two series, each containing a unit root:

 result <- c()
R <- 1000
for (i in 1:R) result <- c(result, tstat(rho = 1, T = 40))
quantile(result, c(0.025, 0.975))
  2.5%  97.5%
-9.158  8.227
prop.table(table(abs(result) > 2))

FALSE  TRUE
0.379 0.621 

Judging by the usual t‐statistic, in two thirds of cases one would conclude in favor of a significant relationship between our two independently generated variables.

It is therefore crucial to detect the presence of unit roots in time series data; otherwise, there is a considerable chance of obtaining falsely significant results. To this end, it is simplest to rewrite the equation of the autoregressive process, subtracting $y_{t-1}$ from both sides. One then has:

\Delta y_t = (\rho - 1) y_{t-1} + \epsilon_t

The unit root test then becomes a zero restriction test for the coefficient associated with $y_{t-1}$ in the model where the regressand is $\Delta y_t$. One might want to use a classic t‐statistic, obtained by dividing $\hat{\rho} - 1$ by its standard error. Setting $H_0: \rho = 1$ vs. $H_1: \rho < 1$, one would then reject the unit root hypothesis at the 5% level if the statistic is less than $-1.64$.

R <- 1000
T <- 100
result <- c()
for (i in 1:R){
  y <- autoreg(rho = 1, T = 100)
  Dy <- y[2:T] - y[1:(T-1)]
  Ly <- y[1:(T-1)]
  z <- lm(Dy ~ Ly)
  result <- c(result, coef(z)[2] / sqrt(diag(vcov(z))[2]))
}

In Figure 8.3 we depict a histogram of the realizations of the t‐statistic, superposing a normal density curve:


Figure 8.3 Histogram of the Student statistic in the case of a unit root.

One can easily see that employing classic inference procedures to detect the presence of unit roots is unwarranted, as the t‐statistic follows a distribution that is very far from the normal. Employing the usual critical value of $-1.64$, one has here:

 prop.table(table(result < -1.64))

FALSE  TRUE
0.542 0.458 

which leads to rejecting the true hypothesis of a unit root about half of the time. To perform the Dickey‐Fuller test, one needs specific critical values, which are not those of the normal (or Student's t) distribution. The test can be performed augmenting the auxiliary model with a constant and/or a deterministic trend; lags of $\Delta y_t$ can also be added in order to remove any possible autocorrelation of the errors.
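The appropriate critical values can themselves be approximated by simulation; the sketch below (no constant, no trend) recovers a 5% quantile near the familiar Dickey-Fuller value of about $-1.95$, well below the normal $-1.64$:

```r
# Simulating the Dickey-Fuller distribution (case without constant or trend).
set.seed(6)
R <- 2000; T <- 100
dfstat <- replicate(R, {
  y  <- cumsum(rnorm(T))                 # pure random walk
  Dy <- diff(y)
  Ly <- y[1:(T - 1)]
  z  <- lm(Dy ~ Ly - 1)                  # auxiliary regression, no intercept
  coef(z)[1] / sqrt(diag(vcov(z))[1])
})
quantile(dfstat, 0.05)                   # empirical 5% critical value
```
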

The regression between two series both containing a unit root is only appropriate if they present a long‐term structural relationship. One then speaks of cointegration. More precisely, we will say that two variables $y_t$ and $x_t$ are cointegrated if there exists a $\beta$ such that:

z_t = y_t - \beta x_t

where $z_t$ is stationary, i.e., it does not have a unit root. A simple cointegration test can then be performed as follows:

  1. verify whether $y_t$ and $x_t$ have unit roots with a Dickey‐Fuller test,
  2. if they both do, then estimate a regression of $y_t$ on $x_t$ and recover the residuals $\hat{z}_t$,
  3. perform a Dickey‐Fuller test on $\hat{z}_t$: if the unit root hypothesis is rejected, then $y_t$ and $x_t$ are cointegrated and the regression of $y_t$ on $x_t$ is meaningful; otherwise, $y_t$ and $x_t$ are integrated but not cointegrated, and the regression of $y_t$ on $x_t$ will be spurious.
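The three steps can be sketched on a pair of simulated cointegrated series (strictly, the test on estimated residuals requires its own critical values rather than the plain Dickey-Fuller ones; we only illustrate the mechanics, with names of our own choosing):

```r
# Two-step residual-based cointegration check on simulated data.
set.seed(7)
T <- 200
x <- cumsum(rnorm(T))                    # step 1: x is I(1) by construction
y <- 2 * x + rnorm(T)                    # cointegrated with beta = 2
zhat <- resid(lm(y ~ x))                 # step 2: cointegrating residuals
Dz <- diff(zhat)
Lz <- zhat[1:(T - 1)]
dfreg <- lm(Dz ~ Lz - 1)                 # step 3: DF regression on residuals
coef(dfreg)[1] / sqrt(diag(vcov(dfreg))[1])   # strongly negative here
```
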

8.4.2 First Generation Unit Root Testing

The classical test for unit roots is usually called ADF for “augmented Dickey‐Fuller”. Many extensions of this test have been proposed to adapt it to a panel data setting.

8.4.2.1 Preliminary Results

Some of these tests are obtained by applying separate ADF tests to every individual in the sample. To perform these preliminary tests, one must choose the number of lags and the relevant set of deterministic variables $d_t$, which can be either empty, an intercept, or an intercept and a time trend.

This choice can be based on a number of criteria:

  • the Schwarz information criterion (SIC),
  • the Akaike information criterion (AIC),
  • the Hall method, consisting in adding as many lags as there are significant ones.

The regression is performed on $T - p_i - 1$ observations for each individual, which leads to $N(T - \bar{p} - 1)$ observations in total, with $\bar{p}$ the average number of lags. The variance of the residuals for individual $i$ is estimated by:

(8.14) \hat{\sigma}_{\epsilon_i}^2 = \frac{\sum_t \hat{\epsilon}_{it}^2}{df_i}

with $df_i$ the degrees of freedom of the regression.

8.4.2.2 Levin‐Lin‐Chu Test

Levin et al. (2002) proposed the first panel unit root test. To perform it, one must run two preliminary regressions: respectively, of $\Delta y_{it}$ and of $y_{i,t-1}$ on the lagged differences $\Delta y_{i,t-l}$ and the deterministic component $d_t$, obtaining two residual vectors denoted respectively $\hat{e}_{it}$ and $\hat{v}_{i,t-1}$.

These two residuals are then normalized dividing them by the estimated standard error (equation 8.14). The estimator of images is obtained by regressing images on images for the whole sample. Its standard deviation and t‐statistic are denoted respectively by images and images.

The long-term variance of $\Delta y_{it}$ is estimated by:

$$\hat{\sigma}^2_{y i} = \frac{1}{T-1} \sum_{t=2}^{T} \Delta y_{it}^2 + 2 \sum_{L=1}^{\bar{K}} w_{\bar{K}L} \left[ \frac{1}{T-1} \sum_{t=2+L}^{T} \Delta y_{it} \Delta y_{i,t-L} \right]$$

where $\bar{K}$ is the truncation lag parameter and $w_{\bar{K}L}$ are the sample covariance weights, which depend on the choice of kernel.

Calling $s_i = \hat{\sigma}_{y i} / \hat{\sigma}_{\epsilon i}$ the ratio between the long-term and the short-term standard deviation for the $i$-th individual and $\bar{S}_N = \frac{1}{N} \sum_{i=1}^{N} s_i$ the sample average thereof, Levin et al. (2002) show that the statistic:

$$t^*_\rho = \frac{t_\rho - N \tilde{T} \bar{S}_N \hat{\sigma}^{-2}_{\tilde{\epsilon}} \hat{\sigma}(\hat{\rho}) \mu^*_{\tilde{T}}}{\sigma^*_{\tilde{T}}}$$

is normally distributed under the null hypothesis of a unit root. The adjustment terms $\mu^*_{\tilde{T}}$ and $\sigma^*_{\tilde{T}}$ can be found in the original paper.
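Omitting the lagged differences ($p_i = 0$) and the final mean and variance adjustment, the mechanics of the two preliminary regressions and the pooled estimation can be sketched as follows. This is a simplified illustration assuming only an intercept as deterministic term; the resulting $t_\rho$ still lacks the correction with the tabulated $\mu^*_{\tilde{T}}$ and $\sigma^*_{\tilde{T}}$, so it is not yet a usable test statistic.

```python
import numpy as np

def demean(v):
    # residuals of a regression on an intercept only (d_t = {1})
    return v - v.mean()

rng = np.random.default_rng(3)
N, T = 20, 100
panel = np.cumsum(rng.normal(size=(N, T)), axis=1)  # N independent random walks

e_all, v_all = [], []
for i in range(N):
    y = panel[i]
    dy, lag = np.diff(y), y[:-1]
    e = demean(dy)    # residuals of Delta y_it on the deterministics
    v = demean(lag)   # residuals of y_{i,t-1} on the deterministics
    # per-unit regression standard error used for the normalization
    res = e - (e @ v / (v @ v)) * v
    s = np.sqrt(res @ res / (len(res) - 2))
    e_all.append(e / s)
    v_all.append(v / s)

e, v = np.concatenate(e_all), np.concatenate(v_all)
rho = e @ v / (v @ v)                    # pooled estimate of rho
resid = e - rho * v
se = np.sqrt((resid @ resid / (len(e) - 1)) / (v @ v))
t_rho = rho / se
print(t_rho)  # unadjusted: the LLC correction with mu*, sigma* comes next
```

The unadjusted pooled $t$-statistic is biased downward even under the null, which is precisely why the mean/variance adjustment above is needed.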

8.4.2.3 Im, Pesaran and Shin Test

One of the drawbacks of the Levin et al. (2002) test is that the alternative hypothesis holds that $\rho < 0$, but at the same time it is the same for all individuals. The test proposed by Im et al. (2003) (IPS) overcomes this limitation: the null hypothesis is still $\rho_i = 0$ for all individuals, but the alternative now allows $\rho_i$ to differ across individuals, provided that $\rho_i < 0$ at least for some of them. The IPS test takes the form of a simple average of the $t$-statistics for $\hat{\rho}_i$ from the individual ADF regressions (8.13):

$$\bar{t} = \frac{1}{N} \sum_{i=1}^{N} t_{\rho_i}$$

The IPS statistic follows a nonstandard distribution and must therefore be compared with values tabulated ad hoc. Alternatively, it can be standardized with the mean $E(t_{\rho_i})$ and variance $V(t_{\rho_i})$ given in the Im et al. (2003) paper. The test statistic is then $z_{\bar{t}}$:

$$z_{\bar{t}} = \frac{\sqrt{N} \left( \bar{t} - E(t_{\rho_i}) \right)}{\sqrt{V(t_{\rho_i})}}$$

which, under the null of a unit root, is normally distributed.
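The averaging and standardization can be sketched as follows. The moments $-1.533$ and $0.706$ used here are approximate large-$T$ values for the intercept-only case; the exact moments tabulated in Im et al. (2003) depend on $T$ and the lag order.

```python
import numpy as np

def df_tstat(y):
    """DF t-statistic with intercept: Delta y_t on (y_{t-1}, 1)."""
    dy, lag = np.diff(y), y[:-1]
    X = np.column_stack([lag, np.ones_like(lag)])
    b, *_ = np.linalg.lstsq(X, dy, rcond=None)
    r = dy - X @ b
    s2 = r @ r / (len(dy) - 2)
    return b[0] / np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])

rng = np.random.default_rng(1)
N, T = 25, 150
panel = np.cumsum(rng.normal(size=(N, T)), axis=1)  # all units have unit roots

t_bar = np.mean([df_tstat(panel[i]) for i in range(N)])

# approximate large-T moments of the individual t-statistic (intercept case)
E_t, V_t = -1.533, 0.706
z = np.sqrt(N) * (t_bar - E_t) / np.sqrt(V_t)
print(z)  # approximately N(0, 1) under the null; reject for large negative z
```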

8.4.2.4 The Maddala and Wu Test

Maddala and Wu (1999) proposed a similar test, again not imposing homogeneity of $\rho_i$ under the alternative. Instead of the $t$-statistics, it is based on combining the $N$ p-values $p_i$ obtained from the individual ADF tests. The test statistic is then simply:

$$P = -2 \sum_{i=1}^{N} \ln p_i$$

and, under the null of a unit root for all $N$ individuals, it is distributed as a $\chi^2$ with $2N$ degrees of freedom.
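Given the $N$ p-values from the individual ADF tests, the Fisher combination is immediate. A toy example with hypothetical p-values ($N = 10$, hence $2N = 20$ degrees of freedom; 31.41 is the standard 95th percentile of a $\chi^2(20)$):

```python
import math

# hypothetical p-values from N = 10 individual ADF tests
p = [0.31, 0.04, 0.22, 0.48, 0.09, 0.15, 0.02, 0.37, 0.11, 0.26]

P = -2 * sum(math.log(pi) for pi in p)   # ~ chi-squared(2N) under the null
crit = 31.41   # 95th percentile of the chi-squared with 20 degrees of freedom
print(round(P, 2), P > crit)   # 38.81 True -> reject the joint unit root null
```

An attractive feature of this combination is that it works with unbalanced panels and with any unit root test producing p-values, not just the ADF.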

8.4.3 Second Generation Unit Root Testing

The above panel unit root tests all rest on the hypothesis of absence of cross-sectional correlation. When, after the turn of the millennium, the panel data literature started recognizing how pervasive cross-sectional correlation is in applications and progressed toward the development of consistent methods in its presence, this assumption came to be seen as too restrictive. The tests assuming no cross-sectional correlation became known under the collective name of "first-generation" panel unit root tests. The new breed of testing procedures that emerged, sharing the quality of remaining consistent in the face of cross-sectional correlation, were dubbed "second generation" and are currently the ones most often employed in applications.

The reference framework for cross‐sectionally correlated panels is, as discussed above, the common factor model. A number of cross‐correlation‐compliant panel unit root procedures have been devised in this framework based on various defactoring procedures. One of the most popular second‐generation tests, due to Pesaran (2007), takes the approach of controlling for the common factors, instead of trying to eliminate them; it does so in the CCE framework, by augmenting the auxiliary regressions through cross‐sectional averages of the response and regressors. The individual ADF regressions are augmented with the cross‐sectional averages of lagged levels and differences of the individual series:

(8.15)$$\Delta y_{it} = \alpha_i + \rho_i y_{i,t-1} + c_i \bar{y}_{t-1} + d_i \Delta \bar{y}_t + \epsilon_{it}$$

The individual ADF regressions are therefore denoted "cross-sectionally augmented ADF" (CADF) regressions; the resulting individual CADF statistics can in principle be combined as described above, forming the basis for either a "cross-sectionally augmented IPS" (CIPS) test or a Maddala-Wu-type test. However, the limiting distributions derived above no longer hold once cross-sectional independence fails; for this reason, Pesaran (2007) tabulated critical values for the CIPS test for the three cases where the auxiliary CADF regressions contain an intercept, an intercept and a deterministic trend, or neither.
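A minimal sketch of the CADF regression (8.15) and the resulting CIPS average, for the $p_i = 0$, intercept-only case. The data-generating process below (a common I(1) factor with heterogeneous loadings) is an illustrative assumption; the CIPS statistic must be compared with Pesaran's tabulated critical values, not normal ones.

```python
import numpy as np

def cadf_tstat(panel, i):
    """t-statistic on y_{i,t-1} in the CADF regression:
    Delta y_it on (1, y_{i,t-1}, ybar_{t-1}, Delta ybar_t)."""
    y = panel[i]
    ybar = panel.mean(axis=0)              # cross-sectional average at each t
    dy, dybar = np.diff(y), np.diff(ybar)
    X = np.column_stack([np.ones(len(dy)), y[:-1], ybar[:-1], dybar])
    b, *_ = np.linalg.lstsq(X, dy, rcond=None)
    r = dy - X @ b
    s2 = r @ r / (len(dy) - X.shape[1])
    return b[1] / np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])

rng = np.random.default_rng(7)
N, T = 20, 120
f = np.cumsum(rng.normal(size=T))          # common I(1) factor
lam = rng.uniform(0.5, 1.5, size=N)        # heterogeneous factor loadings
panel = lam[:, None] * f + np.cumsum(rng.normal(size=(N, T)), axis=1)

cips = np.mean([cadf_tstat(panel, i) for i in range(N)])  # CIPS statistic
print(cips)   # compare with Pesaran (2007) tabulated critical values
```

The cross-sectional averages $\bar{y}_{t-1}$ and $\Delta \bar{y}_t$ proxy the unobserved common factor, which is what makes the individual statistics usable under cross-sectional dependence.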
