Unit 1: Introduction To Econometric Analysis OUTLINE: A. Economic Questions and Data: The Role of Econometrics

Econometrics I · 2018/19
© Maite Cabeza-Gutés
Unit 1: Introduction to Econometric analysis · OUTLINE
1.1. Presentation
A. Economic questions and data: the role of econometrics
≡ Some examples of economic questions
SLIDE #1(1)
Please refer to Chapter 1 of the course textbooks for a detailed presentation of additional
examples of economic questions we might be interested in.
≡ What is econometrics?
Informal definition: Application of statistical and mathematical tools in order to ex-
tract information from economic data in the context of economic theory (expertise,
previous knowledge, intuition). It allows us to provide quantitative answers to eco-
nomic questions and to empirically study relationships among economic variables.
≡ Background needed
Economic Theory: Econometrics deals with the analysis of economic data. As
we shall see, knowledge of the expected relationships between variables, that is,
economic expertise, is also a key ingredient in econometrics.
Probability and Statistics: It is important to revise your probability theory and statistics.
In the course site (Moodle classroom) you will find a copy of Statistics for Economics
and Business with RStudio, courtesy of Professor Xavier Vilà. This textbook is the
same one used in Statistics I, but it has been updated using R/RStudio. In the
course site (Moodle classroom), you will also find a set of exercises to help you revise
some key statistical concepts needed for Econometrics I.
Linear algebra: A good command of basic linear algebra is expected. In the course
site (Moodle classroom) you will also find a set of exercises to help you revise your
linear algebra for Econometrics I.
≡ Software: R/RStudio
R/RStudio is the software we shall use in this course. R is a free software environment
for statistical computing and graphics, and it runs on a wide variety of
platforms: Windows, MacOS, Linux.¹ RStudio is an IDE (Integrated Development
Environment) for working with R, and it makes R easier to use. Please install
R (https://cran.r-project.org/) on your personal computer and, afterwards, install
RStudio (https://www.RStudio.com/).
¹ https://www.r-project.org/
≡ Econometrics: Objectives
(1) Measuring association
→ detecting and measuring meaningful association between economic variables.
→ Examples:
- How connected are stock markets?
- How closely does the return of a given stock follow the movements of the
stock market?
→ Is it easy to measure the degree of association between two variables? Not
necessarily. Associations can be deceiving.
Example 1: Storks and births (Matthews, R. (2000), "Storks deliver babies",
Teaching Statistics, vol. 22(2)).
Example 2: Mozart effect (Rauscher, F. et al. (1993), "Music and spatial task
performance", Nature, vol. 365).
Example 3: Simpson’s paradox.
SLIDE #1(2)
What could be behind this meaningless, spurious correlation between
these two variables? The presence of confounding.
→ What is confounding?
In measuring the degree of association between variable A and variable B, confounding
appears whenever there is a variable C that affects both variable A and
variable B at the same time. This variable C is sometimes referred to as a confounder.
Confounding is best understood by using so-called causal diagrams. A
causal diagram is simply a picture made of variables and arrows that summarizes
the researcher’s expertise, or previous knowledge, about the cause-effect relationships
between the variables under study. This knowledge can come from formal
economic theory, from intuition, and so on. We will use causal diagrams extensively
throughout the course, and we will see that they are crucial in the process of data analysis.
Causal diagrams allow us to state our prior assumptions clearly before the process
of data analysis actually starts. Including an arrow from C to A captures
the researcher’s belief that variable C has an effect on variable A. The absence of
an arrow implies that the researcher considers there is no direct link between the
two variables.
Figure 1.1 shows the simplest representation of confounding. According to
the diagram, variable C has an effect on variable A. At the same
time, variable C has an effect on variable B. No direct link is present
between A and B. Their association comes only from the path through C.
Figure 1.1: Basic causal diagram representing confounding
A ← C → B
Another example of using a causal diagram to illustrate confounding is found
in Figure 1.2. In this case, there is a causal link between variables A and B,
but C is causing confounding, as it affects both of these variables.
Figure 1.2: Causal diagram representing confounding
C → A, C → B, A → B
Suppose we are only interested in measuring the link between A and B. If
C is not controlled for, i.e., not taken into account, confounding will bias our
measure of the direct association between A and B. That means that, depending
on the signs of the relationships in the diagram, we could underestimate, overestimate
or even change the sign of the direct association between A and B.
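A small simulation can make the sign-reversal claim concrete. The sketch below is only an illustration under assumed numbers: the coefficients (-0.5 and 2), the sample size, and the direction A → B are hypothetical choices, and `lm()` is used here merely as a convenient device for holding C fixed.

```r
set.seed(123)                        # fixed seed so the sketch is reproducible
n <- 10000
C <- rnorm(n)                        # confounder
A <- C + rnorm(n)                    # C -> A
B <- -0.5 * A + 2 * C + rnorm(n)     # A -> B (assumed direct effect: -0.5) and C -> B

naive      <- coef(lm(B ~ A))["A"]      # association between A and B, ignoring C
controlled <- coef(lm(B ~ A + C))["A"]  # association between A and B, holding C fixed

print(c(naive = unname(naive), controlled = unname(controlled)))
```

With these assumed values the uncontrolled measure comes out positive although the direct link is negative by construction, while the controlled measure is close to -0.5.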
(2) Prediction / Forecasting
→ Guessing the value of a variable. It can imply guessing the value of a variable
into the future, but not exclusively. The idea is to make an informed guess, a
best/optimal guess.
→ Examples:
- What is the prediction for next quarter’s growth rate?
- How much does a company expect to sell next month?
→ In this course we shall learn how to make conditional predictions. That is,
predictions on one variable using information available from other variables.
It is important to understand that detecting associations between variables,
even if they are spurious or biased measures, can be useful for prediction.
(3) Causal inference
→ Economists are most interested in causal relationships. Examples:

- Does reducing class size improve school education?
- Does it pay to invest in a college degree?
- What is the degree of gender discrimination in a given labor market?
- What is the degree of racial discrimination in a given labor market?
- Can an increase in cigarette taxes reduce smoking?
- Is climate change a source of conflict?
- What are the annual health care costs attributable to smoking?
- How effective is an advertising campaign in promoting sales?
- How much do the economic conditions before political elections drive their
results?
- How likely is it that a child born in a low income household earns a high
income in adulthood?
All these questions inquire about cause-effect relationships between
variables. Notice that some of them might be related to testing implications of a
given economic theory/economic model. Some of them might be related to policy
evaluation.
The following example illustrates how tricky causal inference might be.
→ Example:
We have data for a sample of people. For each person in the sample we know
their wage (W) and whether they have gone to college (C). We code the qualitative
variable going or not going to college using a dummy or indicator variable,
C, taking 2 values: 1 (person has gone to college), 0 otherwise. We calculate
two conditional sample means:
sample mean of W given C = 1: $\bar{W}_1$
sample mean of W given C = 0: $\bar{W}_0$
Suppose that with this sample we observe:
$\bar{W}_1 - \bar{W}_0 > 0$.
The data is pointing at an association between W and C: people with a college
degree are, on average, paid more than people without a college degree.
Notice that the population equivalent of the difference between these two statistics would be:
E(W/C = 1) − E(W/C = 0)
Hence, $\bar{W}_1 - \bar{W}_0 > 0$ would be signaling that E(W/C = 1) − E(W/C = 0) > 0.
But can we use the measurement $\bar{W}_1 - \bar{W}_0$ for causal inference? That is, can
we use it as a measure of how much going to college pays? As a measure of
the link C → W? As the effect of going to college on your salary?
Not necessarily.
Causal diagrams help in understanding why association and causation are different
things. Causal diagrams, as explained, translate our prior expertise on
the variables we are analyzing. In this case, we consider that going to college
has an effect on pay, hence an arrow from C to W. We also believe that
family income, F, affects the chances of people going to college and, at the
same time, influences their pay (maybe through networking, good family
connections, ...). The causal diagram reflecting these assumptions is included
below.
C → W (link of interest), F → C, F → W
It is important to emphasize that the arrows included in the causal diagram
do not represent deterministic relationships between the variables. That is,
an arrow from C to W does not imply that knowing whether a person has a
college degree or not is enough to know exactly how much that person earns.
Notice that, under our causal diagram, we would be facing confounding. This
would imply that $\bar{W}_1 - \bar{W}_0$ would be a biased measure of our link of interest.
In this case, since F raises both the chances of going to college and pay, it would
overestimate the effect of C on W. It does not matter
how large our sample is: the bias will be there.
How to deal with confounding? To actually move towards measuring our link
of interest, we should take the role of F into account in our calculations, something
not done when calculating $\bar{W}_1$ and $\bar{W}_0$.
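The claim that the bias does not shrink with the sample size can be illustrated with a simulation. This is a sketch under assumed numbers, not an analysis of real data: the effect of college (5), the effect of family income (8) and the college probabilities (0.8 and 0.2) are hypothetical, and the confounder is called `Fam` because `F` is an alias for `FALSE` in R.

```r
set.seed(42)

naive_gap <- function(n) {
  Fam <- rbinom(n, 1, 0.5)                          # 1 = high family income (F in the text)
  C   <- rbinom(n, 1, ifelse(Fam == 1, 0.8, 0.2))   # F -> C
  W   <- 10 + 5 * C + 8 * Fam + rnorm(n)            # C -> W (assumed effect: 5) and F -> W
  mean(W[C == 1]) - mean(W[C == 0])                 # difference in conditional sample means
}

gaps <- sapply(c(1000, 100000), naive_gap)
names(gaps) <- c("n = 1,000", "n = 100,000")
print(gaps)
```

Under these assumptions the population gap is 5 + 8 × (0.8 − 0.2) = 9.8, so both sample sizes give a difference in means well above the assumed effect of 5; the larger sample only estimates the biased quantity more precisely.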
B. The Notion of causality: a closer look

≡ We know that association is not causation. But what is causation? What do we
mean by variable A causing variable B? Let us consider the following example to
illustrate some of the key elements in learning about causal inference.
≡ Illustration: Inference on the effect of a fertilizer dose (F) on crop yields (Y):
F → Y (?)
= For simplicity, let us consider that the treatment F is defined as an indicator
variable with only two possible values: apply or do not apply the fertilizer dose.
Consider the following definitions:
$Y_{i1}$ ≡ yield of plot i if the fertilizer dose is applied ($F_i = 1$)
$Y_{i0}$ ≡ yield of plot i if the fertilizer dose is not applied ($F_i = 0$)
Notice: We can use $Y_{i1}$ and $Y_{i0}$ to clearly define the effect of the fertilizer on plot i:
$Y_{i1} - Y_{i0}$ = effect of fertilizer for plot i.
Notice that $Y_{i1}$ and $Y_{i0}$ are counterfactual. That is, they cannot both be observed
for the same plot: each plot either receives the dose or it does not. Now, we can
define the (average) effect of the fertilizer on
crop yields using the parameter ATE (Average Treatment Effect):
$ATE \equiv E(Y_{i1} - Y_{i0})$
Defining a causal effect in terms of counterfactuals, as $ATE \equiv E(Y_{i1} - Y_{i0})$,
allows us to have a clear definition of what we mean by a variable causing,
or having an effect on, another. If ATE > 0, for example, we can say that
the fertilizer has, on average, a positive effect on crop yields.
= How could we collect data to estimate ATE?
Consider the following experiment: Select n plots. In some of them fertilizer
was applied but not in others. With the data collected, we calculate:
$\bar{Y}_1 - \bar{Y}_0$
where $\bar{Y}_1$ is the sample mean yield of the plots that received fertilizer and
$\bar{Y}_0$ that of the plots that did not.
Key question: If $\bar{Y}_1 - \bar{Y}_0 > 0$, what can we conclude?
→ Plots where the fertilizer was applied have, on average, a higher yield than
the plots where no fertilizer was applied. (Association)
→ But can we conclude that the fertilizer has a positive effect on crop yields?
That is, can we use $\bar{Y}_1 - \bar{Y}_0$ as a measure of ATE? Not necessarily.
Why? Suppose that the way F was applied depends on the value of
another variable S, say the sunshine level of the plot. For simplicity, consider
only two values for S: S = 1 (plot with high level of sunshine) and S =
0 (plot in the shade). Suppose plots where S = 1 had a much larger
probability of receiving a fertilizer dose. In this case, the relationship
between F, S and Y can be summarized by the following causal diagram:
F → Y (?), S → F (+), S → Y (+)
Under this scenario:
$\bar{Y}_1$: fertilizer + effect of higher levels of sunlight
$\bar{Y}_0$: no fertilizer + effect of lower levels of sunlight
Hence, we would not know if $\bar{Y}_1 - \bar{Y}_0 > 0$ is due to the effect of the fertilizer,
to the differences in sunlight, or to both. That is, $\bar{Y}_1 - \bar{Y}_0$ would be
a biased measure of ATE. This would again be a case of confounding.
= Way out? We have in fact two alternative strategies to deal with confounding.
Alternative 1: Randomization of treatment

To use $\bar{Y}_1 - \bar{Y}_0$ for causal inference, we would need the plots where F has been
applied and the plots where it has not been applied to be identical, on average,
in everything else affecting Y. How can we do this? Use randomization. That
is, randomize the application of the fertilizer, so that the probability that a plot
is assigned a fertilizer dose is the same regardless of its level of sunshine. Randomization
of treatment should guarantee that the two groups of plots (those
where the fertilizer was applied and those where it was not) are on average the
same in anything else affecting yields.
That is, under proper randomization of treatment, the sunlight component is
approximately (≈) the same in both sample means:
$\bar{Y}_1$: fertilizer + average effect of sunlight (mixture of high and low levels)
$\bar{Y}_0$: no fertilizer + average effect of sunlight (mixture of high and low levels)
Notice that, in terms of the causal diagram above, we have removed confounding
by eliminating the arrow connecting S and F. That is,
F → Y (?), S → Y (+)
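The contrast between confounded and randomized assignment can be sketched with simulated potential outcomes. Everything numeric here is an assumption made for illustration: a constant fertilizer effect of 2 (so ATE = 2), a sunlight effect of 4, and assignment probabilities of 0.9 and 0.1 in the confounded design.

```r
set.seed(2018)
n  <- 20000
S  <- rbinom(n, 1, 0.5)          # 1 = sunny plot, 0 = plot in the shade
Y0 <- 10 + 4 * S + rnorm(n)      # potential yield without fertilizer
Y1 <- Y0 + 2                     # potential yield with fertilizer (assumed ATE = 2)

diff_in_means <- function(treated) {
  Y <- ifelse(treated == 1, Y1, Y0)   # only one potential outcome is observed per plot
  mean(Y[treated == 1]) - mean(Y[treated == 0])
}

confounded <- rbinom(n, 1, ifelse(S == 1, 0.9, 0.1))  # sunny plots are favoured (S -> F)
randomized <- rbinom(n, 1, 0.5)                        # coin flip that ignores S

res <- c(confounded = diff_in_means(confounded),
         randomized = diff_in_means(randomized))
print(res)
```

In this sketch the difference in means under confounded assignment mixes the fertilizer effect with the sunlight effect, whereas under randomized assignment it lands close to the assumed ATE of 2.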
Alternative 2: Control for S

If it is too late for randomization, we could control for sunlight, provided data
is available for S. This implies calculating not marginal sample means but
conditional sample means. That is:
→ For plots where S = 1, calculate $\bar{Y}_1$ and $\bar{Y}_0$. Notation: ($\bar{Y}_{1|S=1}$, $\bar{Y}_{0|S=1}$)
→ For plots where S = 0, calculate $\bar{Y}_1$ and $\bar{Y}_0$. Notation: ($\bar{Y}_{1|S=0}$, $\bar{Y}_{0|S=0}$)
Notice that by fixing/controlling for S, we have a measure of the effect of the
fertilizer for the plots receiving more sunlight ($ATE_1$) on the one hand, and
for the plots with less sunlight ($ATE_0$) on the other:
$\bar{Y}_{1|S=1} - \bar{Y}_{0|S=1} \to ATE_1$
$\bar{Y}_{1|S=0} - \bar{Y}_{0|S=0} \to ATE_0$
Notice that the population counterparts of the differences in conditional sample means above
would be:
E(Y/F = 1, S = 1) − E(Y/F = 0, S = 1)
E(Y/F = 1, S = 0) − E(Y/F = 0, S = 0)
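Conditional sample means of this kind can be computed in R with `tapply()`. The following is a minimal sketch on simulated data, assuming a fertilizer effect of 2 in both sunlight groups and an assignment that favours sunny plots; the variable name `Fert` and all numbers are hypothetical.

```r
set.seed(7)
n    <- 20000
S    <- rbinom(n, 1, 0.5)                             # sunlight level of each plot
Fert <- rbinom(n, 1, ifelse(S == 1, 0.9, 0.1))        # confounded assignment: S -> F
Y    <- 10 + 2 * Fert + 4 * S + rnorm(n)              # assumed fertilizer effect: 2

# 2 x 2 table of conditional sample means: rows are Fert = 0/1, columns are S = 0/1
cond_means <- tapply(Y, list(Fert = Fert, S = S), mean)
print(cond_means)

effect_S1 <- cond_means["1", "1"] - cond_means["0", "1"]   # comparison within sunny plots
effect_S0 <- cond_means["1", "0"] - cond_means["0", "0"]   # comparison within shaded plots
print(c(S1 = effect_S1, S0 = effect_S0))
```

Because each comparison is made within a fixed value of S, both differences should be close to the assumed effect of 2 even though the assignment was not randomized.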
Comments
(i) To control for a variable, this variable should be part of our data set. In
our example, if no data is available on sunlight levels, then controlling for
sunlight would not be possible and confounding would affect our causal
inference. We would face what is called an identification problem. No
matter how large the sample is, if we do not control for S, the identification
problem remains.
(ii) Under the causal diagrams above relating F, S and Y, S is assumed to
be the only variable causing confounding. If that is the case, controlling
for S would allow us to measure the effect of F on Y. That is, under the
assumed causal diagram,
$\bar{Y}_{1|S=1} - \bar{Y}_{0|S=1}$ and $\bar{Y}_{1|S=0} - \bar{Y}_{0|S=0}$
can be interpreted in causal terms. Of course, the larger the sample, the
better our measurement of this causal effect would be. That is, we would
not face an identification problem, as increasing the sample size would allow
us to get more precise estimates of our causal parameter of interest.
→ In this course we will learn to use regression analysis as a way to try to control
for confounding. Regression analysis is a very useful tool for calculating
conditional means. If all variables causing confounding have been identified,
and data for these variables is available, regressions could be used for causal
inference.
→ Summing up:
In the context of a controlled experiment, causal inference requires that all other
factors that influence the outcome either vary at random (randomization) or can be
controlled for. That is, even if the direction of causality between treatment
and outcome is clear, prior knowledge (theory) of the variables affecting the
outcome is required and will determine the quality of the inference.
C. About the data

≡ The nature of economic data
"Not all data are created equal".
The nature of economic data is central to the understanding of econometrics and
its methodology.
A large proportion of the data available for analysis is obtained by observing actual
behavior outside an experiment. That is, we have observational data (also referred
to as passive data or nonexperimental data). The observational nature of the data is what
makes causal inference very tricky.
Exceptions/Alternatives: Natural experiments (many available) and social experiments
(increasingly available).
In this course we will mostly be analyzing observational data.
≡ The structure of the data

Cross sectional data
Data on different economic units (countries, families, firms, individuals, ...) at a
given time period. An observation for variable y in a cross sectional data set is
typically identified as:
$y_i$
where subindex i indicates the observation for unit i.
Time series
Data on the same economic unit at different time periods. A series always implies some
ordering of the data. In the case of a time series, observations are temporally
ordered. An observation of a variable in a time series data set is typically identified
as:
$y_t$
where subindex t could refer to a day (in the case of a daily data set), a month
(monthly data set), a year (yearly data set), ...
Longitudinal / Panel data

Data on a set of economic units, observed at different time periods. An observation
of a variable in a panel data set is usually identified as:
$y_{it}$
where subindex it indicates the observation for unit i in time period t.

As we shall see, some of the issues in data analysis are common to all data structures,
but some are structure specific.
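As a rough illustration of how the three structures might look once loaded in R, the sketch below builds one small data frame of each kind. The unit labels, years and values are made up purely for illustration.

```r
# Cross section: several units, one time period (observations indexed by i)
cross_section <- data.frame(unit = c("A", "B", "C"),
                            y    = c(2.1, 3.4, 1.8))

# Time series: one unit, several time periods (observations indexed by t)
time_series <- data.frame(year = 2015:2018,
                          y    = c(1.0, 1.3, 1.1, 1.6))

# Panel: several units, each observed in several periods (observations indexed by it)
panel <- expand.grid(unit = c("A", "B", "C"), year = 2015:2018)
panel$y <- round(seq(1, 4, length.out = nrow(panel)), 2)

print(cross_section)
print(time_series)
print(head(panel))
```

A panel with 3 units and 4 years has 3 × 4 = 12 rows, one for each (i, t) pair.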
D. About the variables

≡ Discrete versus continuous
Discrete: A variable is discrete if it can take a countable number of values.
E.g.: life expectancy, number of internet searches, units purchased, years of completed
education.
Continuous: A variable is continuous when it can take any value within an interval.
E.g.: income/earnings/profits or growth measurement.
≡ Types of Variables by level of measurement

Not all variables have the same level of measurement. This has important implica-
tions not only in how to read the results from our data analysis, but also in selecting
the appropriate tools. According to their levels of measurement, from strongest to
weakest, variables can be classified as:
Ratio scale:
Variable y is a ratio scale variable if, for two values of this variable, y2 and y1, their ratio y2/y1 is meaningful.²
Notice that if y2/y1 is meaningful, it implies that y2 − y1 is also meaningful, and that
a natural ordering between the values (y2 > y1 or y2 < y1) is possible. E.g.: Wage.
Interval scale:
Variable y is an interval scale variable if, for two values of this variable, y2 and y1,
their distance is meaningful but their ratio is not.
Notice that if y2 − y1 is meaningful, a natural ordering between the values (y2 > y1
or y2 < y1) is possible. E.g.: Years (time).
Ordinal scale:
A natural ordering between two values, y2 and y1, is possible (like from lower to
higher) but neither their ratio nor the distance between the two values has meaning.
E.g.: Corruption index.
Nominal variable:
We label a variable as having a nominal scale when no order is possible between
two values y2, y1. Notice that in this case, neither y2/y1 nor y2 − y1 has any meaning.
Variables in the nominal scale are sometimes referred to as qualitative variables.
E.g.: gender defined in terms of a dummy variable.
² Usually the definition also requires that the scale includes a ’true’ 0 point (i.e., it has an absolute
zero). When a level of measurement includes an absolute 0, a value of 0 indicates a lack
of presence of the quality being measured. Example: an income of 0 means the person has no income,
whereas a temperature of 0 (in degrees Celsius, say) does not indicate a lack of temperature. Hence,
income is ratio scale but temperature is not.
1.2. Linear regression model

A. Introduction to regression analysis: Galton’s work
≡ Francis Galton: "Regression Towards Mediocrity in Hereditary Stature" in The
Journal of the Anthropological Institute of Great Britain and Ireland, Vol. 15.
(1886), pp. 246-263.
Goal: study the relationship between parents’ height and their children’s height as
adults. We will use part of his work in this article to introduce regression analysis.
Galton’s data: data collected on 928 people, born to 205 couples.
SLIDE #1(3)
Important concepts to revise in relation to Slide #1(3):
→ Not a deterministic relationship between the two variables.
→ Information regarding the joint distribution of the two variables.
→ Information regarding the conditional distribution
→ Conditional sample means: how would you calculate them?
→ Information regarding the marginal distribution of each variable
→ Marginal sample means: how would you calculate them?
→ Notice Galton’s table includes sample marginal medians.
≡ A first look at the data set³
SLIDE #1(4)
child = child’s height, as adult, in inches (1 in = 2.54 cm)
parents = mid-parents’ height in inches (average of father’s height and mother’s height)
≡ Analysis of Galton’s data: Univariate analysis
Descriptive statistics⁴:
SLIDE #1(5) (statistics)
Statistic   N     Mean    St. Dev.   Min     Pctl(25)   Median   Pctl(75)   Max
child       928   68.09   1.53       63.19   67.03      68.12    69.08      72.19
parents     928   68.30   1.86       63.50   67.25      68.25    69.50      73.25
³ See appendix GaltonCode1.
⁴ See appendix GaltonCode2.
Important:
→ Sample mean: $\overline{child}$ = 68.09.
- How is the sample mean calculated?
- What is the population counterpart? I.e., what is the sample mean an estimate
of?
- What information about the distribution of variable child does the average
provide?
- What are the units of measurement of the mean (sample or population) in
this case?
→ Sample standard deviation: 1.53
- What is the population counterpart? The square root of the variance. The variance
is also just an expectation. An expectation of...?
- What is the sample counterpart of the population variance? That is, which
expression is used to calculate the sample variance?
- What information about the distribution of variable child do the standard
deviation or variance provide? A measure of dispersion.
- What are the units of measurement of the variance (sample or population) in
this case?
- What are the units of measurement of the standard deviation (sample or
population) in this case?
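One way to explore these questions is to compute the statistics by hand and compare them with R's built-in functions. The sketch below uses a handful of hypothetical heights (not Galton's data) and assumes the usual n − 1 divisor for the sample variance, which is the convention followed by R's `var()` and `sd()`.

```r
# Hypothetical heights in inches (illustrative values, not Galton's data)
x <- c(66.2, 67.5, 68.0, 68.9, 70.4)
n <- length(x)

mean_by_hand <- sum(x) / n                            # units: inches
var_by_hand  <- sum((x - mean_by_hand)^2) / (n - 1)   # units: inches squared
sd_by_hand   <- sqrt(var_by_hand)                     # units: inches again

# Compare with R's built-in functions (var() and sd() also divide by n - 1)
print(c(by_hand = mean_by_hand, built_in = mean(x)))
print(c(by_hand = var_by_hand,  built_in = var(x)))
print(c(by_hand = sd_by_hand,   built_in = sd(x)))
```

Taking the square root is what brings the standard deviation back to the units of the variable itself.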
Density histograms:
SLIDE #1(5) (Plots)
[Figure: density histograms of Parents Height (left panel) and Child’s Height (right panel); vertical axis: Density]
Comments:
→ In this plot we see the density histograms of parents’ height and
of children’s height as adults. In red, the sample mean of each variable is also
identified.
→ If the distribution of child is assumed to be normal, then we would expect
95% of all the observations to be within.....
≡ Analysis of Galton’s data: Moving towards bivariate analysis
SLIDE #1(6)
[Figure: scatter plot of Child’s Height (as adults) against Parent Height, with marginal density histograms]
Comments:
→ The plot above provides information on both the relationship between the two variables
(scatter) and the behavior of each variable in the sample, regardless of the value
of the other (marginal density histograms). The dotted blue lines identify each
of the (marginal) sample means.
→ From now on we are interested in relationships, not marginal behavior, so we
will focus on the scatter part of the plot.
Focusing on the scatter:⁷
SLIDE #1(7)
[Figure: scatter plot of Child’s Height (as adults) against Parent’s Height]
Comments: Association between the two?
→ The two variables do not keep a deterministic relationship!
→ Sample covariance between child and parent: 2.14
⁷ See appendix GaltonCode5.
Population definition of covariance between two variables Z and W? Again,
an average!
cov(Z, W) ≡ E[(Z − E(Z))(W − E(W))]
How is the sample covariance calculated?
Is the sign of the covariance as you expected? Units of measurement of the
covariance? Any idea about the information the value provides?
What does it mean for the covariance to be positive? You should be able to
explain it by looking at the scatter in the figure above.
Looking at the population definition of covariance, or at its sample counterpart, it is easy
to see that cov(Z, W) = cov(W, Z). Why?
→ Sample correlation coefficient: r = 0.75
Population definition of the correlation coefficient (ρ)?
How is the sample correlation coefficient calculated?
Sign as expected? Units of the correlation coefficient? Unit free! Why?
→ How else could we characterize the relationship between the two? We could
calculate conditional expectations.
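As a hedged worked check, the usual sample formulas can be written down and applied to the figures reported in these notes. The n − 1 divisor is an assumption (the notes do not state which convention was used), and the standard deviations 1.53 and 1.86 are taken from the descriptive statistics table for child and parents.

```latex
s_{ZW} = \frac{1}{n-1}\sum_{i=1}^{n}\left(Z_i-\bar{Z}\right)\left(W_i-\bar{W}\right),
\qquad
r = \frac{s_{ZW}}{s_Z \, s_W}

r = \frac{2.14}{1.53 \times 1.86} = \frac{2.14}{2.8458} \approx 0.75
```

The covariance is measured in squared inches and each standard deviation in inches, so the units cancel in the ratio, which is why r is unit free.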
≡ Conditional expectations
Suppose we focus on just the sub-group of parents with mid-height = 63.75 inches,
and we want to estimate:
E(child/parent = 63.75)
How could we extract information from the sample on this conditional expectation?
We select from the sample all children whose parents have mid-height 63.75 in (there
are 8 of them) and calculate the average of the 8 heights:
$\overline{child}_{|parent=63.75}$ = 65.53 in
Now, we can repeat the calculations for all conditional means (40 of them).
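The two steps just described (one conditional mean, then all of them) could be carried out in R roughly as follows. The data frame below is a tiny hypothetical example with made-up heights, not Galton's data, and the names `heights`, `parent` and `child` are chosen only for this sketch.

```r
# Hypothetical mini data set (heights in inches); not Galton's actual observations
heights <- data.frame(
  parent = c(63.75, 63.75, 63.75, 66.25, 66.25, 68.75, 68.75, 68.75),
  child  = c(64.0,  65.5,  66.0,  66.5,  67.5,  68.0,  69.0,  70.0)
)

# Step 1: average child height within the sub-group with parent = 63.75
mean_6375 <- mean(heights$child[heights$parent == 63.75])
print(mean_6375)

# Step 2: all conditional sample means at once, one per distinct value of parent
cond_means <- tapply(heights$child, heights$parent, mean)
print(cond_means)
```

Plotting `cond_means` against the distinct values of `parent` would give the sample analogue of the conditional expectation function.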
SLIDE #1(8)
[Figure: two panels plotting Child Height and Mean Child Height against Parent Height]
Comments
→ We can see that the conditional means (red squares) change as a function of the
value of parent. That is, we can talk about a conditional expectation function (CEF):
E(child/parent) = f(parent).
→ We can see that the conditional means can be approximated, or summarized, by
a line.
SLIDE #1(8)
[Figure: the same two panels, Child Height and Mean Child Height against Parent Height, with a black line added]
The black line can be used as an approximation of this CEF. In this case, a good
approximation.
→ The black line is positively sloped, with a slope of approximately 2/3. Since the
slope is below 1, children of unusually tall (or short) parents tend to be closer to
the average than their parents: regression
to the mean (hence the title of Galton’s article).
→ Notice that the CEF could be used:
(1) To describe the association between the two variables (providing more information
than just using the covariance or correlation coefficient).
(2) As an (optimal) conditional prediction of child, given the information on
parent.
(3) Could it be used in this case as a measure of the effect (in this case through
genetic inheritance) of parents’ height on children’s height? Probably.
B. Regression analysis
≡ Regression analysis is a tool we use to extract information from the data set. Regressions
focus on characterizing the behavior of a single variable y with respect to
a set of variables x1, x2, ..., xk.
Variable y is usually referred to as the dependent variable and variables x1, x2, ..., xk
as regressors.
The most popular regression models focus on the conditional expectation of y given the
values of the regressors, x1, x2, ..., xk:⁸
E(y/x1, x2, ..., xk)
This is referred to as the Conditional Expectation Function (CEF) or Regression
Function.⁹ It is considered a function since the expected value of y is assumed to
change depending on the values of the regressors. That is:
E(y/x1, x2, ..., xk) = f(x1, x2, ..., xk).
Now, associated with this CEF we can define the regression model as:
y = E(y/x1, x2, ..., xk) + u.
Term u is known as the disturbance, error or noise. It acknowledges that the
values of y associated with given values of the regressors might be above or below the expected value.
That is, the regressors do not fully characterize the behavior of y. Other factors, not captured
by the regressor or regressors, are playing a role.
Our goal is going to be to use the data to extract information about this function,
as this function:
(1) characterizes the association between y and the regressors;
(2) can be used as an optimal predictor for y given the information included in the values
of the regressors (conditional predictor);
(3) under certain assumptions, might be used for causal inference.
⁸ Not the only option: we could also focus on conditional medians, as Galton did.
≡ Special case: Regression analysis using 2 variables

Only two variables are involved: one acting as the dependent variable (y) and the other
one as a regressor (x). The focus, as explained, is on the CEF, which in this case
simplifies to:
E(y/x) = f(x).
Hence, our regression model will be:
y = E(y/x) + u.
Galton’s work: Please notice the equivalences to Galton’s example just presented in
section 1.2.A.
Example:
→ Variables: annual income (wages, W) and whether the person has a college degree or not,
using indicator variable C.
→ To define the regression model, we need to choose which variable acts as y and
which one as x. Our prior causal model might help in that.
y: Wage (W)
x: College degree indicator (C)
⁹ Notice the label ’regression function’ is more general. It could also be applied to regression analysis
that focuses on conditional medians, for example.
→ In this case, the conditional expectation function (CEF) is:
E(W/C)
Notice that treating the CEF as a function implies allowing the possibility that
E(W/C) might change depending on the value of C:
$$E(W/C) = \begin{cases} E(W/C = 1) & \text{(expected wage for a person with a college degree)} \\ E(W/C = 0) & \text{(expected wage for a person without a college degree)} \end{cases}$$
That is:
E(W/C) = f(C)
Notice that this CEF is a function that can only take two values.
→ In this case, the regression model would be:
W = E(W/C) + u
Term u would capture the part of W not captured by C. What could this
include?
C. Linear regression model

≡ Key assumption regarding the CEF:
E(y/x1, x2, ..., xK) = β0 + β1 x1 + ... + βK xK
Since the CEF is assumed to be linear, it is known as the Linear Regression function
or Regression line.
β0, β1, ..., βK are called the regression coefficients or regression parameters. It is very
important to understand their meaning, as they will be of key importance in extracting
information from data.
Under this assumption, our regression model would be:
y = β0 · 1 + β1 x1 + β2 x2 + ... + βK xK + u (Regression Model)
- the regressor ’1’ is called the constant
- K non-constant regressors: x1, x2, ..., xK
This is called the (multiple) linear regression model.
≡ Back to special case: regression analysis using two variables
In this case, we assume that:
E(y/x) = β0 + β1 x
Hence, the corresponding regression model will be:
$$y = \underbrace{\beta_0 + \beta_1 x}_{E(y/x)} + u.$$
This is known as the simple regression model.
Galton’s work: Back to Galton’s Slide #1(8): would this assumption of a linear CEF
seem reasonable?
Back to previous example:
→ Variables: annual income (wages, W) and whether the person has a college degree or not,
using indicator variable C.
W: acting as dependent variable
C: acting as regressor
→ Regression line: E(W/C) = β0 + β1 C
Simple regression model: W = β0 + β1 C + u.
→ Parameter meaning?
E(W/C = 1) = β0 + β1
E(W/C = 0) = β0
Hence: β0 is the expected pay of a person without a college degree.
β1 = E(W/C = 1) − E(W/C = 0)
Hence: β1 is the expected difference in pay between a person with a college degree
and a person without.
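The interpretation of β0 and β1 has a direct sample analogue that can be checked in R: with a single dummy regressor, the fitted intercept equals the sample mean of the C = 0 group and the fitted slope equals the difference in group means. The wages below are hypothetical values (in thousands) chosen only to make the arithmetic easy to follow.

```r
# Hypothetical data: C = college indicator, W = annual wage in thousands
wages <- data.frame(
  C = c(0, 0, 0, 0, 1, 1, 1),
  W = c(18, 22, 20, 24, 30, 34, 32)
)

mean0 <- mean(wages$W[wages$C == 0])   # (18 + 22 + 20 + 24) / 4 = 21
mean1 <- mean(wages$W[wages$C == 1])   # (30 + 34 + 32) / 3 = 32

fit <- lm(W ~ C, data = wages)         # simple regression of W on the dummy C
b   <- coef(fit)

print(b)                               # intercept and slope
print(c(mean0 = mean0, difference = mean1 - mean0))
```

Here the estimated intercept matches `mean0` and the estimated coefficient on `C` matches `mean1 - mean0`, mirroring β0 = E(W/C = 0) and β1 = E(W/C = 1) − E(W/C = 0).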
Notice that now our parameters can be used to formalize our goals:
(1) β1: characterize the association between W and C. Still, we should be
careful, as we are interested in meaningful associations (not spurious) and
in measuring these associations without bias.
(2) β0 and β1: produce an optimal predictor of W given information on C
(conditional predictor).
(3) β1: for causal inference? This is more complex than measuring associations since, in
addition, it will always be dependent on the assumed causal links between
variables.
Example:
We are given data on (C, W ), where C = indicator of whether the person

has a college degree (C = 1) or it does not (C = 0), and W = wage.
Set regression model:
W = β0 + β1 C + u
Focus: E(W/C) = β0 + β1 C.
- Consider causal diagram, capturing our view of the relationship be-
tween W , C, and u is:
link of interest!
C W
In this case, C is labelled as exogenous, or moving in ceteris paribus

conditions, implying that we consider that when we shift from the
group with C = 1 to the group with C = 0, nothing else determining
W is systematically moving. In other words, the two groups are equal,
on average, in all other variables that also influence W .
How can we formalize this:
E(u/C = 1) = E(u/C = 0)
Or, equivalently:
E(u/C) 6= f (C),
or,
E(u/C) = constant (usually chosen to be 0).
→ Under these conditions β1 could be estimated without bias using

regression analysis and it could be interpreted as a causal parameter.
Obviously, as we shall see, the larger the number of observations in the
sample, the more precise our estimation will be.
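The exogenous case can be illustrated with a small simulation. This is only a sketch under assumed values: the "true" parameters β0 = 20 and β1 = 10, the sample size, the seed and the distribution of u are all hypothetical choices made for the illustration, not part of the course data.

```r
set.seed(123)                            # arbitrary seed, for reproducibility
n <- 100000                              # large hypothetical sample

C <- rbinom(n, size = 1, prob = 0.5)     # degree status drawn independently of u
u <- rnorm(n, mean = 0, sd = 5)          # everything else that determines W
W <- 20 + 10 * C + u                     # assumed true beta0 = 20, beta1 = 10

# Exogeneity: the average of u should be roughly the same in both groups
mean_u1 <- mean(u[C == 1])
mean_u0 <- mean(u[C == 0])

# Least squares estimate of beta1 from the simple regression of W on C
b1_hat <- unname(coef(lm(W ~ C))["C"])

c(mean_u1 = mean_u1, mean_u0 = mean_u0, b1_hat = b1_hat)
```

Because C is generated independently of u, the two group averages of u should be close to each other and b1_hat should be close to the assumed value of 10. Rerunning with a smaller n should give a noisier estimate.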
- But consider now the causal diagram to be:
C → W (link of interest!)
u → C
u → W
The reasoning behind this causal diagram could be, for example, that u
includes the person's family income (F), and that the higher F is, the
higher the probability that the person goes to college, since they can
afford it (arrow from u to C), and also that higher family income means
better networking and hence better chances of finding a good job
(arrow from u to W).
In this case:
E(u/C = 1) ≠ E(u/C = 0)
Or, equivalently:
E(u/C) = f (C).
→ Under these conditions, regressor C is labelled as endogenous.
If β1 is defined as the parameter measuring the effect of C on W, then
using data solely on C and W to measure this effect will produce a
biased estimate, since it will be affected by confounding.
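A simulation sketch of this endogenous case. All numbers are assumptions made for the illustration: a true β1 of 10, a standardised family income variable called F_inc (the name F is avoided because R treats F as shorthand for FALSE), and a simple rule by which higher family income makes a degree more likely.

```r
set.seed(123)                            # arbitrary seed, for reproducibility
n <- 100000                              # large hypothetical sample

F_inc <- rnorm(n)                        # family income (standardised), part of u
C <- as.numeric(F_inc + rnorm(n) > 0)    # higher F_inc makes a degree more likely (u -> C)
u <- 5 * F_inc + rnorm(n, sd = 5)        # F_inc also raises wages directly (u -> W)
W <- 20 + 10 * C + u                     # assumed true beta0 = 20, beta1 = 10

# Endogeneity: the average of u now differs between the two groups
mean_u1 <- mean(u[C == 1])
mean_u0 <- mean(u[C == 0])

# Simple regression using data on C and W only
b1_simple <- unname(coef(lm(W ~ C))["C"])

c(mean_u1 = mean_u1, mean_u0 = mean_u0, b1_simple = b1_simple)
```

In this design the estimate from the simple regression should sit well above the assumed value of 10, because it also picks up the difference in average family income between the two groups.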
Could we correct for confounding? If we identify variable F as the
variable inside u causing confounding, and data on F is available, then
we could control for F. This could be done by adding F as a regressor
in our regression model, moving from simple to multiple regression
analysis:
W = β0 + β1 C + β2 F + u
The CEF would now be:
E(W/C, F) = β0 + β1 C + β2 F.
In this case, the confounding caused by F would have been taken care
of. Why? Consider F set to be fixed at value F0 . Two conditional
expectations of W with respect to C for level F0 can be defined:
E(W/C = 1, F = F0 ) = β0 + β1 + β2 F0
E(W/C = 0, F = F0 ) = β0 + β2 F0
Then,
β1 = E(W/C = 1, F = F0 ) − E(W/C = 0, F = F0 ).
Parameter β1 would be capturing the expected difference in pay be-
tween people that have a college degree and people without, controlling
for F . If F is the only variable causing confounding, then, the link from
u to C would be eliminated, and the link between C and W would be
estimated without a bias. Obviously, as mentioned before, the larger
the sample, the better our estimation would be.
One should be aware that, for confounding to be fully controlled, F
would have to be the only variable in u affecting both C and W at the
same time. That might not be the case, as other variables might also
be causing confounding.
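Controlling for F can be sketched with simulated data in which, by construction, family income is the only confounder. The true values (β1 = 10, β2 = 5), the variable name F_inc (used instead of F, which R reads as FALSE) and the data-generating rule are assumptions made only for this illustration.

```r
set.seed(123)                            # arbitrary seed, for reproducibility
n <- 100000                              # large hypothetical sample

F_inc <- rnorm(n)                        # family income (standardised)
C <- as.numeric(F_inc + rnorm(n) > 0)    # family income affects college attendance
W <- 20 + 10 * C + 5 * F_inc + rnorm(n, sd = 5)   # and also affects wages

# Simple regression: F_inc is left inside the disturbance
b1_simple <- unname(coef(lm(W ~ C))["C"])

# Multiple regression: F_inc is added as a regressor (controlled for)
b1_multi <- unname(coef(lm(W ~ C + F_inc))["C"])

c(b1_simple = b1_simple, b1_multi = b1_multi)
```

Here b1_multi should be close to the assumed value of 10, while b1_simple should be noticeably larger. If a second confounder were added to the simulated data and left out of the regression, b1_multi would be expected to be biased as well.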
≡ Regression Analysis: course plan
As explained, our goal will be to use the information contained in our sample to
approximate the CEF or regression function. Given that we are using linear regression
analysis, this implies estimating the parameters of the regression function using our sample.
In order to produce estimates of these parameters, we will need an estimation
method. The estimation method we will use in this course is least squares
estimation.
After estimation, hypothesis testing and confidence intervals will be used to evaluate
what conclusions we can draw regarding the population from the analysis of our
sample.
In Unit 2, estimation, hypothesis testing and confidence intervals will be presented
for the case of the simple regression model. The multiple regression model will be
covered in Units 3 and 4. In Unit 5, some extensions of regression analysis will be
presented.
Readings
Stock, James and Watson, Mark, Introduction to Econometrics. Updated third edition
(or earlier). Chapters 1, 2 and 3.
Wooldridge, Jeffrey, Introduction to Econometrics. Sixth edition (or earlier).
Chapter 1 and Appendices A, B, C and D.
Key concepts revised in Unit 1

• Measuring association
• Correlation
• Spurious correlation
• Confounding
• Bias
• Simpson’s paradox
• Prediction
• Conditional prediction
• Causal inference
• Causal diagram
• Association versus causation
• Counterfactual
• Average treatment effect (ATE)
• Conditional expectation
• Experiment
• Randomization of treatment
• Experimental data
• Observational, passive, non-experimental data
• Cross-sectional data
• Time series data
• Discrete random variable
• Continuous random variable
• Level of measurement of a variable (ratio scale, interval scale,...)
• Regression analysis
• Conditional Expectation Function (CEF)
• Disturbance or noise
• Regression model
• Regression line
• Regression parameter or regression coefficient
• Linear Regression model
• Simple regression model
• Multiple regression model
• Exogenous regressor
• Endogenous regressor
• Ceteris paribus conditions
Appendix: R codes
• GaltonCode1: Import data and look at data
    library(readr)                            # provides read_csv()
    Galton_data <- read_csv("Galton.csv")     # import the data set
    head(Galton_data)                         # first rows
    tail(Galton_data)                         # last rows
• GaltonCode2: Summary statistics
    library(stargazer)
    # Galton_dataW is not defined elsewhere in these notes; it is assumed here
    # to be a plain data.frame copy of Galton_data.
    Galton_dataW <- as.data.frame(Galton_data)

    # Text output, shown on screen and saved to table1.txt
    stargazer(Galton_dataW, median = TRUE, type = "text",
              title = "Descriptive statistics", digits = 2, out = "table1.txt")

    # LaTeX table output, saved to table1.tex
    stargazer(Galton_dataW, median = TRUE, type = "latex",
              title = "Descriptive statistics", digits = 2, out = "table1.tex")
• GaltonCode3: Density histograms
    library(ggplot2)

    # Parents: density histogram with the sample mean marked in red
    p1 <- ggplot(Galton_data, aes(x = parent)) +
      geom_histogram(aes(y = ..density..), fill = "lightblue", color = "darkblue") +
      geom_vline(aes(xintercept = mean(parent)), col = "red", size = 2) +
      ylab("Density") + xlab("Parents Height") +
      theme(axis.text = element_text(size = 10),
            axis.title = element_text(size = 10))
    p1

    # Child's: same plot for the child's height
    p2 <- ggplot(Galton_data, aes(x = child)) +
      geom_histogram(aes(y = ..density..), fill = "lightblue", color = "darkblue") +
      geom_vline(aes(xintercept = mean(child)), col = "red", size = 2) +
      ylab("Density") + xlab("Child's Height (as adult)") +
      theme(axis.text = element_text(size = 10),
            axis.title = element_text(size = 10))

    # Both side by side:
    library(gridExtra)
    p3 <- grid.arrange(p1, p2, ncol = 2)
    # Import the data and open it in the RStudio data viewer
    Galton_data <- read_csv("Galton.csv")
    View(Galton_data)
• GaltonCode4: Scatter, marginal means and marginal histograms
    library(ggplot2)
    # Scatter plot with dotted lines at the sample means of child and parent
    g <- ggplot(Galton_data, aes(parent, child)) +
      geom_point(shape = 20, color = "black", size = 2) +
      xlab("Parent Height") + ylab("Child's Height (as adults)") +
      geom_hline(yintercept = mean(Galton_data$child), linetype = "dotted",
                 color = "blue", size = 1) +
      geom_vline(xintercept = mean(Galton_data$parent), linetype = "dotted",
                 color = "blue", size = 1)

    # Add marginal histograms to the scatter plot
    library(ggExtra)
    gm <- ggMarginal(g, type = "histogram", fill = "gray53", color = "black", size = 4)
    gm
• GaltonCode5: Scatter with marginal means
    library(ggplot2)
    # Scatter plot with dotted lines at the sample means of child and parent
    g <- ggplot(Galton_data, aes(parent, child)) +
      geom_point(shape = 20, color = "black", size = 2) +
      xlab("Parent's Height") + ylab("Child's Height (as adults)") +
      geom_hline(yintercept = mean(Galton_data$child), linetype = "dotted",
                 color = "blue", size = 1) +
      geom_vline(xintercept = mean(Galton_data$parent), linetype = "dotted",
                 color = "blue", size = 1) +
      # parent is numeric, so a continuous scale with one tick per inch is used
      scale_x_continuous(breaks = seq(63, 73, by = 1))
    g
