Pakistani

The Pakismn Development Review
Vol. XXVII, No.4 Part II (Winter 1988)
Combining Yearly and Quarterly Data in

Regression Analysis
EATZAZ AHMAD"
I. INTRODUCTION
Data deficiency is often a problem in regression analysis. The problem, for

example, may be due to non-availability of data on some variable, missing observa-
tions, lack of information due to multicollinearity and measurement errors, etc.
Various approaches have been suggested to deal with the problem depending on its
precise nature. One such problem we want to focus our attention on is the lack of
time disaggregated data in time-series regression analysis. In particular, observations
on some variables over a shorter time interval like a quarter may be limited in
number while the corresponding observations over a longer time interval like a year
are available for a long period of time. The number of quarterly observations may
not be sufficient to estimate the desired relationship with acceptable degrees of
freedom. On the other hand, estimation with yearly data may require the use of a
long time series going way back into the past. The estimates thus obtained may not
capture the relationship prevailing at present or in the recent past and, therefore,
mislead the researcher. In addition, the use of yearly data may also result in lack of
degrees of freedom.
One possible approach to deal with the problem is to convert the yearly data
into quarterly data by using some distribution scheme and combine with these data
the other quarterly observations available. Friedman (1962) suggestsvarious non-
correlation methods of distributing a time aggregatedseries into a time disaggregated
seties. Chow and Un (1971), Friedman (1962), Hsiao (1979) and Palm and Nijman
(1982) also suggest various methods of using a time disaggregated related series to
distribute a time aggregatedseries over shorter time intervals.
The problem with any method of distribution is that it introduces measure-
ment error and related problems in regressionanalysis. Due to this reason we suggest
another approach which is quite simple and does not involve distribution of yeany
data over quarterly intervals. One can simply pool quarterly and yearly data
adjusting for heteroscedasticity introduced by the pooling. It is shown that the
.The author is Assistant Professor in the Department of Economics, Quaid-i-AzamUni-

versity, Islamabad. He is grateful to Professor F. T. Denton of McMasterUniversity for his
comments on an earlier draft of this paper.
--- --- -
716 Eatzaz Ahmad Combining Yearly and Quarterly Data 717
estimators of regression coefficients with pooled data have smaller variances as (5') E(Ut U)8 = 40'2 for all t= s
compared to the estimators with yearly or quarterly data.
The paper is organized as follows. In Section 2 we present the model and = 0 for all t =f s.
derive an Ordinary Least Squares estimator and the corresponding variance- With no loss of generality, we assume that the quarterly observations are
covariance matrix of the vector of regression coefficients with alternative data sets. available over a whole number of years, say m. In addition, n yearly observations are
Relative efficiency of the alternative estimators is compared in Section 3. Section 4 also assumed to be available. Asis more likely, it is assumed that the yearly observa-
presents the conclusions of the paper with some thoughts on future research. tions appear before the quarterly ones. We consider the following three options
availableto estimate the relationship.
2. THE MODELAND ITS ESTIMATIONWITH
ALTERNATIVE DATA SETS Option I
Let the relationship to be estimated be described by the following linear Use yearly data over n + m years. With this option the regression model can
be written as:
regression equation:
Y t 1 =b1 + b2 x 2 t 1. + . . . . . + bk
' X
k tl
.+U .
tl (1)
where, t and i refer to year and quarter respectively. Weassume that:

[ :.] = [:.] B + [ :.J
where,
(1) All the variables are flow variables;
(2) There is no laggedvariable in the equation; Yn+1
Y1 U1
U b1
(3) All the x variablesare non-stochastic; Y= , Y*= . .
,U= ,U*=I. n+1 ,B=
(4) Uti is randomly distributed with E(Uti) '
=0 for all t and i; and
(5) E(u t I U8J.)=0'2 forallt=sandi=
' j
= 0 for all t"* s or i "*j. Yn
Yn+m U U
n n+m bk
Owing to assumptions (1) and (2) we can conveniently define yearly observa- 4 X 4
tions as: 2 ,1 Xk,1 X2,n+1 Xk,n+1
4
Yt =.l:
X= and X* =
1=1 Y t"1 for all t
4
4 X 4
l: x J.t 1' for all t and j
X'tJ = i=1 2,n Xk,n X2 ,n+m . ... X k ,n+ m
4 The best linear estimator of B, its mean vector and variance-covariancematrix
Ut =.l: Ut' for all t.
1=1 1 are as follows. 1
B
['"
= (X' X + x' * X *)-1 (X' Y + X' * Y *)
Thus we can write Equation (1) for yearly observations as:
(2) E(B[ ) =B
Yt = 4b1 + b2 X2 t + +bk Xkt + Ut
2
V(B[) =4 0 (x' X + X~X *r1 (3)
where Ut satisfies the following obvious properties.
'It is assumedthat the matrix [X' X~] has full column rank k which is less than n + m.
(4') Ut is randomly distributed with E(Ut) =0 for all t and
Option n E(BnI ) =B
Use only quarterly data over 4m quarterly observations. In this case the
V(Bm) = a2 (~X' X + x'*x *rl (5)
regressionmodel can be written as:
y*=x*B+u* 3. RELATIVE EFFICIENCY OF THE ALTERNATNE ESTIMATORSOF B

Since all the three estimators of B outlined above are unbiased, their relative
where, efficiency depends only on relative variances. For the comparison of variances we
will use the followingtheorem taken from Maddala (1977).
1 x ....
2.n+l .1
Theorem
y*- ., u* = , X* =
If B is a positive definite matrix and A - B is positive semidefmite then
B-1 - A-I is positive semidefinite.
[+'.' 1 x
f+'''j
Yn+m, n+m, 2.n+m,4 .... :k.n+L']
k,n+m,4
and B is the same as defined before. Proof

The best linear estimator of B, its mean vector and variance-covariancematrix Since B and hence B-1 are positive definite and A - B is positive semidefinite,
Therefore the Equation IB-1
under this option are:2 the matrix B-1
(A - B) is positive semidefinite.
(A - B) - vI I has all roots v;;' O. But the roots of this equation are the same as the
Bn = (x'*x*rl (x~y*) roots of I(A B) vB or IA (1 + v)B I or IB-1 - (1 + v)A-l I or I A(B-l -
- - 1 - \
A-I) - vI I. Thus all v ;;. 0 implies that A(B-l - A-I) is positive semidefinite.
E(Bn) = B Since B is positive definite and A - B is positive semidefini te, sum of the two, that is
-:4is positive defmite and so is A-I. Multiplying this positive definite matrix (A-I)
V(Bn) = a2- (x~x *rl (4) by the positive semidefmite matrix A(B-1 - A-I) gives a positive semidefinite
matrix B-1 - A-I. This completes the proof.
Option III Let us now apply this theorem to compare variances of various estimators of
Combine.4m quarterly and n yearly observations. Sincevariance of Ut is four B. We will consider the variance of a linear combination of the elements of B r
times as high as the variance of Uti' we have the problem of heteroscedasticity. (r = I. II, /II), namely e' B r where, e is a k x 1 column vector of known constants.
With pre-adjustment for heteroscedasticity, the model becomes: .
Var (e'BIII) Versus J1zr(e'BI)
Consider the following matrix.
t.} r x. X]B+ r u. U] (~X'X + x:.x*) - (~X'X + ~X'~*) = x:.x* - ~X:,.x*

where, Y, Y *, x, x *, U, u * and B are the same as defined earlier. With this option The element in jth row and hth column of this matrix is:
the best linear estimator of B, its mean vector and variance covariance matrix are:
n+m 4 n+m 4 4
~ ~ x.. x - ~ ~ ( ~ x' .) ( LX.) =
t=n+l i=1 Itl hti t=n+l i= 1 It I
B In =(~ X' X + x~x*rl (~ X' Y + x~Y*) i=1 h tl
n+m 4 n+m
L
4
L
t=n +1
Lex.
i=1 Itl
. - x.It ) (x
hti
- X
ht
) = L L
Zjti Zhti
2 The matrix x * is assumed to have full column rank k < 4m. t=n+l i=1
4 theorem implies that the matrix (x'* x *r1 - (% X' X + x;" x *r1 is positive
where, Ziti = Xiti - x it' Zhti = x hti - x ht' x it = ~ x. ./4 semidefinite.
i= 1 ltl
4 Therefore we can write:
and Xht = .~
1=1
Xht./4
1 0'2 c'(x~x*r1c ~a2 c'(%X'X+x;"x*r1c
The above discussion impliesthat the whole matrix (%X'X + x~ *) - (%X'X This implies, according to Equations (4) and (5), that
+%"x'~*) =x~* - %x'~* can be written as Z~*where,
var (c' B II ) .~var (c' B III ).
0 x -x
2 ,n+1,1 2,n+1 xk,n+1,1 - xk,n+1
This result does not require any explanation. Addition of observations to a
0 x -x
2 ,n + 1,2 2 ,n +1 Xk,k+1,2 - x k,n +1 given data set improv~s efficiency of the estimators.
Z* IDustration: One Explanatory Variable Case
Consider the case of one explanatory variable without intercept:
Yti =b x ti + Uti
In this casethe variance of br (r = I, II, IlI) can be calculated as follows:

0 x -x x -x
2,n+m,4 2,n+m k,n+m,4 k,n+m
a2
var (b[) n+m 4
dearly, Z~ Z* = (%x' X +x~ x *) - (% X' X + %X~ X *) is positive semidefinite. % ( Xt/
In addition, (% X' X + % X'* X *) is positive definite. Therefore, accordirlg to the t=1 i=1
theorem, the matrix 4 (X' X + X~ X *r1 - (% X' X + x'* x *r1 is positive semi-
definite. a2
var (bII) = n+m 4
t=n+1 i=1 (XtY
4 a2 c'(x' X + X'*X*r1 c ~ a2 c' (%X' X +x~x*r1c.
a2
Or, according to Equations (3) and (5) var (bm) = n 4 n+m 4
% ~ ~
t=1 (i~ Xt/ + t=n +1 i~ (xtif
var (c' B I ) ~ var (c' B III ).
Calling the denominator of b t as Dr respectively for r = I, Il and IIl, we can
Notice that, if Xiti = Xit' that is, if there is no variation across quarters within a show that:
n+m 4
year then Z',J,* is positive as well as negative semidefirlite. In this case var(c'B[)
Dm = D[ + t=n+1
.
1=1 (Xt'-1 Xt)2 and
is equal to var (c'B[II)' This is precisely what one should expect. If within a year
variation across quarters is zero then disaggregating yearly observations into quarterly n 4
observations does not provide any additional information. Dm = DII +
( x ti y
t=1 i=1
Var (c' Bm) Versus Var (c' BI) This implies that:
Now consider the matrix (%X' X +X:.X*) - x'*x *= %X' X which is obvious-
var (bIII ) < var (bI ) unless X ti = x t for t = n+1, . . ., n+m
ly positive semidefinite. Since, in addition, x'* x * is positive defmite, the above
Ii t = 1, , 4
1
722 Eatzaz Ahmod
4
var(bII/) < var(bI/) unless 1=1
.~ x("1 = 0 for t = 1, , n.
S. CONCLUDINGREMARKS
Aggregation of quarterly data into yearly data for any part of the sample
period results in loss of efficiency. Using quarterly data alone when additional Comments on
yearly data are available also"resultsin loss of efficiency. The best use of the limited "Combining Yearly and Quarterly Data
data is to pool yearly observations with the quarterly observations with appropriate in Regression Analysis"
adjustment for heteroscedasticity. The result can be generalizedto include seasonal
effects in the regression equation. It can be shown that pooling yearly data to a given In this paper the author has proposed an alternative approach to deal with the
set of quarterly data also improves efficiency of seasonal effects although the yearly issue of non-availability of comparable data in regression analysis. In particular, the
data alone are uselessto estimate seasonal effects.3 paper deals with the problem where both quarterly as well as annual observations on
The research can be extended to develop tests for autocorrelation of first or some variables are availablebut the number of each type of observation i.e. quarterly
fourth order (these are the most likely orders of autocorrelation at quarterly level). limited as a result of which it is difficult to estimate the desired relationship, by
Our preliminary research suggests that it is quite complicated to determine the order either using quarterly or annual data with acceptable degrees of freedom. Various
of autocorrelation with pooled data. Once the order of autocorrelation is deter- approaches have been suggested in the literature to deal with such problems. A
mined, one can use various procedures to improve the asymptotic efficiency of the common weakness of most of these approaches is that they introduce measurement
estimates. errors. This paper suggests that instead of estimating the desired relationship by
either using quarterly or annual data the researcher should simply pool the quarterly
REFERENCES
and the annual data with appropriate adjustment for heteroscedasicity. It is claimed
Chow, G. C., and A. I. Lin (1971). "Best Linear Unbiased Interpolation, Distribution that the coefficients obtained using pooled data will have a lower variance as
and Extrapolation of Time Seriesby Related Series". Review of Economics and compared to the estimators with yearly or quarterly data.
Statistics. Vol. CIII, No.4. pp. 372-375. To show the relative efficiency of the estimates obtained using pooled data the
Friedman, M. (1962). "The Interpolation of Time Seriesby Related Series".Journal author has made use of one of the standard theorems available in the econometric
of American Statistical Association. Vol. 57, pp. 729-757. literature. There is thus little room to caste doubt on his claim. The paper, there-
Hsiao, C. (1979). "linear Regression Using Both Temporally Aggregated and Tem- fore, can be regarded as a theoretical contribution in the area.
porally DisaggregatedData". Journal of Econometrics. Vol. 10, No.2, pp. 243- A general problem faced by many researchers in at least developing countries
252.' is not that both quarterly as well as annual data are available for the same variables
Maddala,G. S. (1979). Econometrics. Tokyo; McGraw-Hill,Inc. but that for some variables quarterly data are available while for others the annual
Palm, F. C., and T. E. Nijman (1982). "linear Regression Using Both Temporally data are available. In these circumstances the techniques suggested by the author
Aggregated and Temporally Disaggregated Data", Journal of Econometrics. is hardly of any use. I wonder if we can somehow modify the suggestedtechnique
Vol. 19, pp. 333-343. to handle the above-mentioned problem which is more frequently faced by the
researchers.
Nadeem A. Burney
Pakistan Institute of
Development Economics,
Islamabad
3 'This result has been proved in the paper originally presented in the general meeting.
The paper had to be reduced due to the space constraint.

Pakistani

Uploaded by

Document Information

Original Description:

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Pakistani

Uploaded by

Copyright:

Available Formats

The Pakismn Development Review

Vol. XXVII, No.4 Part II (Winter 1988)

Combining Yearly and Quarterly Data in

Data deficiency is often a problem in regression analysis. The problem, for

.The author is Assistant Professor in the Department of Economics, Quaid-i-AzamUni-

where, t and i refer to year and quarter respectively. Weassume that:

y=xB+u* 3. RELATIVE EFFICIENCY OF THE ALTERNATNE ESTIMATORSOF B

and B is the same as defined before. Proof

Var (e'BIII) Versus J1zr(e'BI)

Consider the following matrix.

t.} r x. X]B+ r u. U] (~X'X + x:.x) - (~X'X + ~X'~) = x:.x* - ~X:,.x*

Z* IDustration: One Explanatory Variable Case

Consider the case of one explanatory variable without intercept:

In this casethe variance of br (r = I, II, IlI) can be calculated as follows:

The paper had to be reduced due to the space constraint.

You might also like