CH - 2 - Econometrics UG
MTU, College of Agriculture and Natural Resource Department of Agricultural Economics
A perfect positive correlation is given the value of 1. A perfect negative correlation is given the
value of -1. If there is absolutely no correlation present the value given is 0. The closer the
number is to 1 or -1, the stronger the correlation, or the stronger the relationship between the
variables. The closer the number is to 0, the weaker the correlation.
Two variables may have a positive correlation, negative correlation, or they may be uncorrelated.
This holds true both for linear and nonlinear correlation. Two variables are said to be positively
correlated if they tend to change together in the same direction, that is, if they tend to increase or
decrease together.
Such perfect correlation is seldom encountered. We still need to measure correlational strength,
defined as the degree to which data points adhere to an imaginary trend line passing through the
“scatter cloud.” Strong correlations are associated with scatter clouds that adhere closely to the
imaginary trend line. Weak correlations are associated with scatter clouds that adhere marginally
to the trend line. The closer r is to +1, the stronger the positive correlation.
The closer r is to -1, the stronger the negative correlation. Examples of strong and weak
correlations are shown below. Note: Correlational strength cannot be quantified visually. It is too
subjective and is easily influenced by axis-scaling. The eye is not a good judge of correlational
strength.
Such positive correlation is postulated by economic theory for the quantity of a commodity
supplied and its price. When the price increases the quantity supplied increases. Conversely,
when price falls the quantity supplied decreases.
Negative correlation: Two variables are said to be negatively correlated if they tend to change in
the opposite direction: when X increases Y decreases, and vice versa. For example, saving and
household size are negatively correlated.
When price increases, demand for the commodity decreases and when price falls demand
increases.
2.2. Correlation coefficient and types of Correlation coefficient
The Population Correlation Coefficient ‘ρ’ and its Sample Estimate ‘r’
In the light of the above discussions it appears clear that we can determine the kind of correlation
between two variables by direct observation of the scatter diagram. In addition, the scatter
diagram indicates the strength of the relationship between the two variables. This section is about
how to determine the type and degree of correlation using a numerical result. For a precise
quantitative measure of the degree of correlation between two variables X and Y, we use the
sample correlation coefficient r, defined as:

r = ∑(Xi − X̄)(Yi − Ȳ) / [√∑(Xi − X̄)² · √∑(Yi − Ȳ)²]                2.1

Or, in deviation form,

r = ∑xi yi / √(∑xi² · ∑yi²)                2.2

Where,
xi = Xi − X̄
yi = Yi − Ȳ
Explanation of the Formula for r. The formulas presented above are those which are used to
determine r. The calculation is relatively straightforward, although tedious, but no explanation
of the formula has been given yet. Here an intuitive explanation of the formula is provided. Recall
that the original formula for determining the correlation coefficient r for the association between
two variables X and Y is
r = ∑(Xi − X̄)(Yi − Ȳ) / [√∑(Xi − X̄)² · √∑(Yi − Ȳ)²]
The denominator of this formula involves the sums of the squares of the deviations of each value
of X and Y about their respective means. These summations under the square root sign in the
denominator are the same expressions as were used when calculating the variance and the
standard deviation in Chapter 5. The expression ∑(Xi − X̄)² can be called the variation in X.
This differs from the variance in that it is not divided by n-1. Similarly, ∑(Yi − Ȳ)² is the
variation in Y. The denominator of the expression for r is the square root of the products of these
two variations. It can be shown mathematically that this denominator, along with the expression
in the numerator scales the correlation coefficient so that it has limits of -1 and +1.
The numerator of the expression for r is ∑(Xi − X̄)(Yi − Ȳ), and this is called the covariation of
X and Y.
We will use a simple example from the theory of supply. Economic theory suggests that the
quantity of a commodity supplied in the market depends on its price, ceteris paribus. When price
increases the quantity supplied increases, and vice versa. When the market price falls producers
offer smaller quantities of their commodity for sale. In other words, economic theory postulates
that price (X) and quantity supplied (Y) are positively correlated.
Example 2.1: The following table shows the quantity supplied for a commodity with the
corresponding price values. Determine the type of correlation that exists between these two
variables.
Table 1: Data for computation of correlation coefficient
Time period (in days)    Quantity supplied Yi (in tons)    Price Xi (in shillings)
1 10 2
2 20 4
3 50 6
4 40 8
5 50 10
6 60 12
7 80 14
8 90 16
9 90 18
10 120 20
Or using the deviation form (Equation 2.2), with ∑xiyi = 1810, ∑xi² = 330 and ∑yi² = 10490
computed from Table 1, the correlation coefficient is:

r = 1810 / √(330 × 10490) ≈ 0.973
This result shows that there is a strong positive correlation between the quantity supplied and the
price of the commodity under consideration.
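The computation in Example 2.1 can be checked with a short script; this is a sketch using the price and quantity data from Table 1 (the function name is illustrative):

```python
import math

# Price (X) and quantity supplied (Y) from Table 1
X = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
Y = [10, 20, 50, 40, 50, 60, 80, 90, 90, 120]

def corr(xs, ys):
    """Sample correlation coefficient r (Equation 2.1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))   # covariation of X and Y
    vx = sum((x - mx) ** 2 for x in xs)                      # variation in X
    vy = sum((y - my) ** 2 for y in ys)                      # variation in Y
    return cov / math.sqrt(vx * vy)

print(round(corr(X, Y), 3))  # → 0.973, a strong positive correlation
```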
The simple correlation coefficient always takes a value between -1 and +1: it cannot be less than
-1 or greater than +1. If r = -1, there is perfect negative correlation between the variables. If
0 < r < 1, there is positive correlation between the two variables, and the degree of positive
correlation grows as r moves from zero toward one. If r = +1, there is perfect positive correlation
between the two variables. If the correlation coefficient is zero, there is no linear relationship
between the two variables. If the two variables are independent, the correlation coefficient is
zero; however, a zero correlation coefficient does not by itself show that the two variables are
independent.
The sign of the correlation coefficient determines whether the correlation is positive or negative.
The magnitude of the correlation coefficient determines the strength of the correlation.
Correlation is an effect size, so we can verbally describe the strength of the correlation using
the guide that Evans (1996) suggests for the absolute value of r:
0.00–0.19: very weak
0.20–0.39: weak
0.40–0.59: moderate
0.60–0.79: strong
0.80–1.00: very strong
Though the correlation coefficient is very popular in applied statistics and econometrics, it has
its own limitations. The major limitations of the method are:
1. The correlation coefficient always assumes a linear relationship, regardless of whether that
assumption is true.
2. Great care must be exercised in interpreting the value of this coefficient, as the coefficient is
very often misinterpreted. For example, a high correlation between lung cancer and smoking
does not by itself show that smoking causes lung cancer.
3. The value of the coefficient is unduly affected by extreme values.
4. The coefficient requires the quantitative measurement of both variables. If one of the two
variables is not quantitatively measured, the coefficient cannot be computed.
Definition of Covariance
Covariance is a statistical term, defined as a systematic relationship between a pair of random
variables wherein a change in one variable is accompanied by a corresponding change in the
other variable.
Covariance can take any value between -∞ and +∞, wherein a negative value indicates a
negative relationship and a positive value indicates a positive relationship. Further, it measures
only the linear relationship between the variables; a value of zero therefore indicates no linear
relationship. In addition, when all the observations of either variable are the same, the
covariance will be zero.
When we change the unit of measurement of one or both variables, the strength of the
relationship between the two variables does not change, but the value of the covariance does.
For two random variables X and Y, their covariance is defined by
cov(X, Y) = E[(X − EX)(Y − EY)].
Alternative expression: cov(X, Y) = E(XY) − (EX)(EY).
Proof:
cov(X, Y) = E[(X − EX)(Y − EY)]
          = E[XY − X(EY) − (EX)Y + (EX)(EY)]
          = E(XY) − (EX)(EY) − (EX)(EY) + (EX)(EY)
          = E(XY) − (EX)(EY)
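The equivalence of the two expressions for covariance can be checked numerically on any small sample; a minimal sketch, with sample data invented purely for illustration:

```python
X = [2.0, 4.0, 6.0, 8.0]
Y = [1.0, 3.0, 2.0, 6.0]
n = len(X)

EX = sum(X) / n
EY = sum(Y) / n
EXY = sum(x * y for x, y in zip(X, Y)) / n

# Definition: E[(X - EX)(Y - EY)]
cov_def = sum((x - EX) * (y - EY) for x, y in zip(X, Y)) / n

# Alternative expression: E(XY) - (EX)(EY)
cov_alt = EXY - EX * EY

print(cov_def, cov_alt)  # the two expressions agree
```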
The following points are noteworthy so far as the difference between covariance and correlation
is concerned:
1. A measure used to indicate the extent to which two random variables change in tandem is
known as covariance. A measure used to represent how strongly two random variables are related
is known as correlation.
2. Covariance measures the direction of the joint variability of the two variables, whereas
correlation is the scaled (standardized) form of covariance.
3. The value of correlation lies between -1 and +1. Conversely, the value of covariance lies
between -∞ and +∞.
4. Covariance is affected by a change in scale: if all the values of one variable are multiplied by
a constant, and all the values of the other variable are multiplied by the same or a different
constant, then the covariance changes. As against this, correlation is not influenced by a change
in scale.
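This scale effect is easy to demonstrate: multiplying one variable by a constant multiplies the covariance by that constant but leaves the correlation unchanged. A sketch with illustrative data (the helper functions are assumptions, not library calls):

```python
import math

def cov(xs, ys):
    """Population covariance of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def corr(xs, ys):
    """Correlation as the scaled form of covariance."""
    return cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))

X = [1.0, 2.0, 3.0, 4.0]
Y = [2.0, 3.0, 5.0, 4.0]
X_scaled = [100 * x for x in X]  # e.g. change of units, metres -> centimetres

print(cov(X, Y), cov(X_scaled, Y))    # covariance is scaled by 100
print(corr(X, Y), corr(X_scaled, Y))  # correlation is unchanged
```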
Correlation is dimensionless, i.e. it is a unit-free measure of the relationship between the
variables, unlike covariance, whose value is obtained in the product of the units of the two
variables.
The rank correlation coefficient is computed as:

r' = 1 − 6∑D² / [n(n² − 1)]                2.3
Where,
D = difference between ranks of corresponding pairs of X and Y
n = number of observations.
The values that r may assume range from + 1 to – 1.
Two points are of interest when applying the rank correlation coefficient. Firstly, it does not
matter whether we rank the observations in ascending or descending order. However, we must
use the same rule of ranking for both variables. Secondly, if two (or more) observations have the
same value, we assign to them the mean rank. Let’s use an example to illustrate the application of
the rank correlation coefficient.
Example 2.2: A market researcher asks experts to express their preference for twelve different
brands of soap. Their replies are shown in the following table.
Table 3: Example for rank correlation coefficient
Brands of soap A B C D E F G H I J K L
Person I 9 10 4 1 8 11 3 2 5 7 12 6
Person II 7 8 3 1 10 12 2 6 5 4 11 9
The figures in this table are ranks, not quantities. We have to use the rank correlation
coefficient to determine the type of association between the preferences of the two persons. This
can be done as follows.
Table 4: Computation for rank correlation coefficient
Brands of soap A B C D E F G H I J K L Total
Person I 9 10 4 1 8 11 3 2 5 7 12 6
Person II 7 8 3 1 10 12 2 6 5 4 11 9
Di 2 2 1 0 -2 -1 1 -4 0 3 1 -3
Di2 4 4 1 0 4 1 1 16 0 9 1 9 50
The rank correlation coefficient (using Equation 2.3) is:
r' = 1 − (6 × 50) / [12(12² − 1)] = 1 − 300/1716 ≈ 0.825
This figure, 0.825, shows a marked similarity of the preferences of the two persons for the
various brands of soap.
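The rank correlation computation above can be replicated in a few lines, using the ranks from Table 3 directly (variable names are illustrative):

```python
# Rankings of the twelve soap brands A..L by the two persons (Table 3)
ranks_I  = [9, 10, 4, 1, 8, 11, 3, 2, 5, 7, 12, 6]
ranks_II = [7, 8, 3, 1, 10, 12, 2, 6, 5, 4, 11, 9]

n = len(ranks_I)
D2 = sum((a - b) ** 2 for a, b in zip(ranks_I, ranks_II))  # sum of squared rank differences
r_rank = 1 - 6 * D2 / (n * (n ** 2 - 1))                   # Equation 2.3

print(D2, round(r_rank, 3))  # → 50 0.825
```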
The partial correlation coefficient measures the correlation between two variables while holding
the influence of a third variable constant. For three variables X1, X2 and X3, the partial
correlation between X1 and X2, keeping the effect of X3 constant, is given by:

r12.3 = (r12 − r13 r23) / √[(1 − r13²)(1 − r23²)]                2.4
Similarly, the partial correlation between X1 and X3, keeping the effect of X2 constant, is given
by:

r13.2 = (r13 − r12 r23) / √[(1 − r12²)(1 − r23²)]

and the partial correlation between X2 and X3, keeping the effect of X1 constant, is:

r23.1 = (r23 − r12 r13) / √[(1 − r12²)(1 − r13²)]
Example 2.3: The following table gives data on the yield of corn per acre (Y), the amount of
fertilizer used (X1) and the amount of insecticide used (X2). Compute the partial correlation
coefficient between the yield of corn and the fertilizer used, keeping the effect of insecticide
constant.
Table 5: Data on yield of corn, fertilizer and insecticides used
Year 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980
Y 40 44 46 48 52 58 60 68 74 80
X1 6 10 12 14 16 18 22 24 26 32
X2 4 4 5 7 9 12 14 20 21 24
The computations are done as follows:
Table 6: Computation for partial correlation coefficients
Year Y X1 X2 y x1 x2 x1y x2y x1x2 x1² x2² y²
1971 40 6 4 -17 -12 -8 204 136 96 144 64 289
1972 44 10 4 -13 -8 -8 104 104 64 64 64 169
1973 46 12 5 -11 -6 -7 66 77 42 36 49 121
1974 48 14 7 -9 -4 -5 36 45 20 16 25 81
1975 52 16 9 -5 -2 -3 10 15 6 4 9 25
1976 58 18 12 1 0 0 0 0 0 0 0 1
1977 60 22 14 3 4 2 12 6 8 16 4 9
1978 68 24 20 11 6 8 66 88 48 36 64 121
1979 74 26 21 17 8 9 136 153 72 64 81 289
1980 80 32 24 23 14 12 322 276 168 196 144 529
Sum 570 180 120 0 0 0 956 900 524 576 504 1634
Mean 57 18 12
ryx1=0.9854
ryx2=0.9917
rx1x2=0.9725
Then, the partial correlation between yield and fertilizer, holding insecticide constant, is:

ryx1.x2 = (ryx1 − ryx2 rx1x2) / √[(1 − ryx2²)(1 − rx1x2²)]
        = (0.9854 − 0.9917 × 0.9725) / √[(1 − 0.9917²)(1 − 0.9725²)] ≈ 0.70
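The partial correlation of Example 2.3 can be verified with a short script using the data of Table 5; the helper function is an illustrative sketch, not a library call:

```python
import math

# Data from Table 5: corn yield (Y), fertilizer (X1), insecticide (X2)
Y  = [40, 44, 46, 48, 52, 58, 60, 68, 74, 80]
X1 = [6, 10, 12, 14, 16, 18, 22, 24, 26, 32]
X2 = [4, 4, 5, 7, 9, 12, 14, 20, 21, 24]

def corr(xs, ys):
    """Simple correlation coefficient (Equation 2.1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r_yx1, r_yx2, r_x1x2 = corr(Y, X1), corr(Y, X2), corr(X1, X2)

# Partial correlation of Y and X1, holding X2 constant
r_yx1_x2 = (r_yx1 - r_yx2 * r_x1x2) / math.sqrt((1 - r_yx2 ** 2) * (1 - r_x1x2 ** 2))
print(round(r_yx1_x2, 2))  # → 0.7
```

Note how the strong simple correlation ryx1 ≈ 0.985 falls to about 0.70 once the effect of insecticide is held constant.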
Just because two sets of data are correlated, it doesn't mean that one is the cause of the other.
Correlation analysis has serious limitations as a technique for the study of economic
relationships.
Firstly, the above formulae for r apply only when the relationship between the variables is
linear. However, two variables may be strongly connected through a nonlinear relationship.
It should be clear that zero correlation and statistical independence of two variables (X and Y)
are not the same thing. Zero correlation implies zero covariance of X and Y so that r=0.
Statistical independence of X and Y implies that the probability of xi and yi occurring
simultaneously is the simple product of the individual probabilities:
P(x and y) = P(x) P(y)
Independent variables do have zero covariance and are uncorrelated: the linear correlation
coefficient between two independent variables is equal to zero. However, zero linear correlation
does not necessarily imply independence. In other words uncorrelated variables may be
statistically dependent. For example if X and Y are related so that the observations fall on a
circle or on a symmetrical parabola, the relationship is perfect but not linear. The variables are
statistically dependent.
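The parabola case can be illustrated directly: with X symmetric about zero and Y = X², the variables are perfectly (nonlinearly) dependent, yet their linear covariance, and hence r, is zero. A sketch:

```python
X = [-3, -2, -1, 0, 1, 2, 3]
Y = [x ** 2 for x in X]  # perfect nonlinear dependence: observations lie on a parabola

n = len(X)
mx, my = sum(X) / n, sum(Y) / n
cov = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / n

print(cov)  # → 0.0: zero covariance, so r = 0, even though Y is a function of X
```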
Secondly, although the correlation coefficient is a measure of the co-variability of variables, it
does not necessarily imply any functional relationship between the variables concerned.
Correlation theory does not establish or prove any causal relationship between the variables. It
seeks to discover whether a co-variation exists, but it does not suggest that variations in, say, Y
are caused by variations in X, or vice versa. Knowledge of the value of r alone will not enable us
to predict the value of Y from X. A high correlation between variables Y and X may describe any
one of the following situations:
(1) variation in X is the cause of variation in Y,
(2) variation in Y is the cause of variation X,
(3) Y and X are jointly dependent, that is, there is a two-way causation: Y is the cause of (is
determined by) X, but X is also the cause of (is determined by) Y. For example, in any
market q = f(p), but also p = f(q); therefore there is a two-way causation between q and p,
or in other words p and q are simultaneously determined.
(4) There is another common factor (Z) that affects X and Y in such a way as to show a
close relation between them. This often occurs in time series when two variables have
strong time trends (i.e. grow over time). In this case we find a high correlation between Y
and X even though they happen to be causally independent.
(5) The correlation between X and Y may be due to chance.