
Lecture 1 (Intro to Econometrics):

● Econometrics (focuses on causality) vs. Machine Learning (focuses on prediction).


● Econometrics depends mainly on observational data (using sample statistics to
understand population parameters).
● Econometric Models:
- Deterministic: X explains 100% of the variation in Y
- Stochastic (Random / Probabilistic): X explains some degree of variation in Y.
The rest is captured by the error term.
● The purpose of Econometrics is:
- Making inferences about population parameters through analyzing sample
statistics (this happens by using descriptive statistics and inferential statistics).
- Understanding the nature of association between the dependent variable (Y) and
the independent / explanatory variable (X).
Lecture 2: Ordinary Least Squares (OLS)
● Y = B0 (y-intercept; does not necessarily have economic interpretation) + B1X
(systematic part) + E (noise / error)
● Intuition:
- Scatter Plot
- Best Fit Line that minimizes the Sum of Squared Residuals (SSR)
● Purposes:
- Estimation of population parameters based on sample statistics.
- On average, the expected value of the sample statistic should be VERY CLOSE
to the actual value of the population parameter in order for your model to be
properly specified (otherwise the model is considered biased).
- To test theories about how the world works.
● Do NOT use OLS when your dataset contains outliers (extremely high or low
values compared to the rest of the data points).
● Use OLS when variables are continuous.
● OLS Assumptions:
1- Linearity in Parameters
2- Random Sampling
3- Mean Error Term = 0
4- Homoscedasticity of Error Term: Constant variance of errors.
5- Normal Distribution of Error Term: Can be checked through Box Plot and/or Histogram
6- No serial correlation (auto-correlation) between the error terms: the current error is
NOT correlated with past errors.
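● For illustration only (not from the lecture), here is a minimal Python sketch of how one might check assumptions 3, 5, and 6 on the residuals of a fitted model; the residuals below are simulated stand-ins rather than output from a real regression:

```python
# Minimal sketch: checking the mean-zero, normality, and no-autocorrelation
# assumptions on a vector of residuals. The residuals are simulated purely
# for illustration; in practice you would use the residuals of your fitted
# OLS model (e.g. results.resid in statsmodels).
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
resid = rng.normal(loc=0.0, scale=1.0, size=200)   # stand-in residuals

# Assumption 3: mean of the error term should be (close to) 0
print("Mean of residuals:", resid.mean())

# Assumption 5: normality, checked visually with a histogram and a box plot
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(resid, bins=20)
ax1.set_title("Histogram of residuals")
ax2.boxplot(resid)
ax2.set_title("Box plot of residuals")
plt.show()

# Assumption 6: no serial correlation; a Durbin-Watson statistic near 2
# suggests no first-order autocorrelation.
print("Durbin-Watson statistic:", durbin_watson(resid))
```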
● OLS Regression Estimators are BLUE (Best Linear Unbiased Estimators).
● Example: CAPM Model
● Understanding Standard Error of Regression (SER) is crucial in OLS regression
analysis.
● Achieving a high R-Square (more than 75%) is also a key indicator of how well the
model fits.
- R-Square: represents the percentage of the variation in the dependent variable (Y)
that is attributed to the explanatory variable (X).
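● For illustration only (not from the lecture), a minimal Python sketch of running an OLS regression on simulated data and reading off the estimated coefficients, the R-Square, and the SER; the data-generating process (B0 = 2, B1 = 0.5) is made up:

```python
# Minimal sketch of an OLS regression on simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
x = rng.normal(size=100)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=100)   # stochastic model with an error term

X = sm.add_constant(x)              # adds the column for the y-intercept (B0)
results = sm.OLS(y, X).fit()

print("Estimated B0 and B1:", results.params)
print("R-Square:", results.rsquared)          # share of the variation in y attributed to x
print("SER:", np.sqrt(results.mse_resid))     # standard error of the regression
```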
-------------------------------------------------------------------------------------------------------------------------------
Lecture 3 (Hypothesis Testing):
● MIDTERM Exam Questions (according to Prof. Antonio):
- What is an Econometric Model?
- What are the assumptions/properties of OLS Regression? What makes the
OLS Regression BLUE (Best Linear Unbiased Estimator)?
● Before we understand what is hypothesis testing, there are some basic terminologies
that we all need to grasp:
- Identification Error: Occurs when one or more OLS assumptions are not
fulfilled.
- Sampling Error: Occurs when the process of sample selection is biased
(wrong), which results in a sample that does NOT accurately represent its
population.
- Coefficient of determination (R-Square): Shows how much of the change in the
dependent variable (y) is attributed to the explanatory / independent variable (x).
- Standard Error (SE) (aka Sampling Variation): Represents the average
distance of the actual data points from the regression line (estimated model).
- A proper OLS regression model is one with BOTH a relatively high R-Square
and low SE.
- Endogeneity: refers to correlation between the independent variable and the
unexplained variation (or “error”) in the dependent variable, which is caused
by relevant variables that are NOT included in our model (omitted variables).
★ In a regression analysis, endogeneity occurs when there is a relationship
between the predictor variable and the error term.
Hypothesis Testing
● Definition: Using sample statistics to assess whether the population parameters
estimated by the researcher are accurate or not, and how confident we are in that
accuracy.
- Example: using the sample mean “x-bar” to predict if an estimated population
mean is accurate or not. And if it is accurate, how confident are we in that
estimation?
● Our main challenge, whether with OLS regression or Hypothesis Testing, is that using
different samples from the same population will obviously give us different coefficients
(y-intercept & slope).
- To deal with this challenge, we need to make sure that our sample is properly
selected and our OLS estimators are BLUE (regression model is properly
specified).
- If this happens, then our estimated coefficients (sample statistics) will be very
close to the actual population parameters.
● While engaging in hypothesis testing, we need to consider the following:
- Alpha (Level of Significance):
★ Represents the probability of Type I Error (Alpha) occurrence, which is
rejecting the null hypothesis (H0) although it is true.
★ Type I Error is the opposite of Type II Error (Beta), which is NOT
REJECTING the null hypothesis (H0) although it is false.
★ Could be 1%, 5%, or 10% (determined by the researcher)
★ Example (Alpha = 5%): This means that there is a 5% probability that I
might, wrongfully, reject H0 when it is actually true.
- Confidence Level (CL) vs. Confidence Interval (CI):
★ CL:
➔ Represents how much I am confident in my results.
➔ CL is equal to 1 - Alpha.
★ CI (QQ version):
➔ Represents the mean of your estimate (sample statistic) plus and
minus the variation in that estimate.
➔ This makes CI the range of values you expect your estimate to fall
between if you redo your test, within a certain level of confidence
(CL).
➔ Example: I am 95% confident (CL) that if I redo a certain analysis,
my results will lie within the same range / interval (CI).
★ It is important to note that the higher the Confidence Level (CL), the wider
your Confidence Interval (CI) should be.
➔ The logic behind this is that when you say that, for example, you
are 99% confident of something, you must put a wider range of
values in order to decrease the probability of being wrong.
➔ In plain terms: when you say, for example, that you are 99% confident about
something, you have to cover yourself and allow a wide range of values so that
you do not turn out to be wrong. What does that mean?
➔ Example: if you say you are 99% confident that the whole class will pass, you
are protecting yourself from being wrong; but if you say you are 99% confident
that the whole class will get an A+, you have cornered yourself and the chance
of being wrong becomes large. So how do you fix that?
➔ You say you are 90% confident (instead of 99%) that the whole class will get an
A+; that way you give yourself a larger margin of error.
➔ Example 2: I can say I am 99% confident that Al Ahly will win the league with
somewhere between 70 and 90 points (giving myself a 20-point margin of error).
But if you want to be more precise (reduce the margin of error), you must in turn
lower your confidence level, so you would say, for example: I am 90% confident
that Al Ahly will win the league with between 80 and 85 points.
➔ Example 3: if you are going out with a friend, you would tell him you are 99%
sure you will meet him between Maghrib and Isha (between 6:00 and 7:30), but
you would say you are 90% sure you will meet him at exactly 7:00. Got it? 😀
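● A small numeric illustration (not from the lecture) of the same trade-off: a higher CL means a larger critical value and therefore a wider CI. The sample mean (50) and standard error (2) below are made-up numbers:

```python
# The higher the confidence level, the larger the critical z-value,
# and the wider the resulting confidence interval.
from scipy.stats import norm

x_bar, se = 50.0, 2.0          # made-up sample mean and standard error
for cl in (0.90, 0.95, 0.99):
    alpha = 1 - cl
    z_crit = norm.ppf(1 - alpha / 2)                   # two-sided critical z-value
    lower, upper = x_bar - z_crit * se, x_bar + z_crit * se
    print(f"CL={cl:.0%}  z={z_crit:.2f}  CI=({lower:.2f}, {upper:.2f})  width={upper - lower:.2f}")
```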
● Please refer to the following link for essential relevant information from last year’s QQ
course: Tutorial Slides
-------------------------------------------------------------------------------------------------------------------------------
Lecture 4:
● Simple Linear Regression (SLR): y = B1 + B2X + u
- B1: y-intercept; usually has no economic interpretation
- B2: slope coefficient (estimate); represents how much y changes when x
changes by 1 unit.
- Note: you can refer to the y-intercept and slope coefficient with any symbols you
like (“a & b”; “B0 & B1”; “B1 & B2”; … etc).
- Accordingly, the symbol of the slope coefficient can vary. Nevertheless, the
slope coefficient is ALWAYS the coefficient I am interested in
understanding.
- u: residual (unexplained error); represents exogenous factors that are NOT
included in our regression model. It captures the effect of these factors on y.
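● For illustration only (not from the lecture), a small Python sketch of the SLR model on simulated data, showing B2 as the change in y for a 1-unit change in x and u as the unexplained part; the true values used below (B1 = 1, B2 = 3) are made up:

```python
# Simulate y = B1 + B2*x + u and interpret the estimated coefficients.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=200)
y = 1.0 + 3.0 * x + rng.normal(scale=2.0, size=200)   # u is the noise term

results = sm.OLS(y, sm.add_constant(x)).fit()
b1_hat, b2_hat = results.params

# B2 hat: the estimated change in y when x increases by 1 unit
print("Estimated y-intercept (B1 hat):", b1_hat)
print("Estimated slope (B2 hat):", b2_hat)

# u hat (residuals): the part of y that x does not explain
print("First few residuals (y - fitted):", results.resid[:3])
```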

● Hypothesis Testing: First, let’s start from what we know from QQ. Kindly check the
following screenshot:
[Screenshot from the QQ course not reproduced here.]
● Moving on to the ECONOMETRICS course: The purpose of hypothesis testing in our
course is to know if our slope coefficient is statistically different from 0 or not. Why 0 in
particular?
- Because if the slope coefficient is equal to 0, this means that our explanatory
variable (x) has NO EFFECT on y, which makes our OLS regression pointless.
- On the other hand, if we prove that the slope coefficient is STATISTICALLY
DIFFERENT from 0 (does NOT equal 0), this means that our selected (x) does
actually impact y.
- So, how do we know if our slope coefficient is different from 0 or not? We do
hypothesis testing using the following 4 steps:
1- Identify Null Hypothesis (H0) & Alternative Hypothesis (H1) as follows:
➔ H0: B2 = 0
➔ H1: B2 does NOT equal 0
2- Determine the critical value:
➔ From QQ, we used to get this critical value from the z-table or t-table.
➔ In Econometrics, this critical value is either:
A- Given (no need to go through any tables)
B- Rule of Thumb (A value you should remember on your own). These
values are:
★ Critical z-value (at 95% CL) = 1.96 (Note: this value is used to
determine the Confidence Interval (CI); we will get to that at a
later point).
★ Critical t-value (at 95% CL) = 2 (actually it is 1.987, but we
approximate it to 2 for convenience).
3- Calculate the computed value:
➔ From the following equation (WHICH IS VERY SIMILAR TO THE ONE
WE TOOK IN QQ, BUT THE SYMBOLS ARE DIFFERENT):
computed t-value = (B1 hat − Null) / s.e.(B1 hat)
Where:
➢ B1 hat: slope coefficient
➢ Null: the value I am testing for, which is 0 in our case and in most
cases that you will encounter in this course.
➢ s.e. (B1 hat): standard error of the slope coefficient (it will be
given; you do not need to calculate it).
4- Compare the computed value (from Step 3) to the critical value (From
Step 2) in order to make a final decision:
➔ Remember that the critical value in our course will usually equal 2.
➔ Accordingly, if the computed value is greater than 2, then you should
reject H0.
★ What am I rejecting? I am rejecting that the slope coefficient is
equal to 0, which means that it is actually NOT EQUAL to zero,
which means that MY EXPLANATORY VARIABLE (X) HAS A
REAL EFFECT ON Y.
➔ On the other hand, if the computed value is less than 2, then you should
NOT REJECT H0.
★ What am I NOT REJECTING? I am NOT REJECTING that the
slope coefficient is equal to 0, which means that it is actually
EQUAL to zero
★ This also means that MY EXPLANATORY VARIABLE (X) DOES
NOT HAVE A REAL EFFECT ON Y.
★ This is because when the slope coefficient is 0, the whole x
term vanishes (0 times anything = 0). In that case, x does not
explain any of the variation that happens in y (x and y are
NOT correlated). A worked sketch of these four steps is shown
below.
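● The worked sketch below (not from the lecture) runs through the 4 steps in Python, using made-up numbers for the slope estimate and its standard error; in the course both would be given or read from the regression output:

```python
# Step 1: H0: B2 = 0 vs. H1: B2 != 0 (we test against a null value of 0)
b1_hat = 0.85          # slope coefficient (B1 hat), made-up number
se_b1 = 0.30           # s.e.(B1 hat), assumed given
null_value = 0.0

# Step 2: rule-of-thumb critical t-value at a 95% CL
critical_t = 2.0

# Step 3: computed t-value
t_computed = (b1_hat - null_value) / se_b1

# Step 4: compare the computed value to the critical value
if abs(t_computed) > critical_t:
    print(f"t = {t_computed:.2f} > {critical_t}: reject H0 -> x has a real effect on y")
else:
    print(f"t = {t_computed:.2f} <= {critical_t}: do NOT reject H0 -> no evidence that x affects y")
```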
-------------------------------------------------------------------------------------------------------------------------------
● Confidence Intervals (CI) (Econometrics version):
CI = B hat ± (critical value × s.e.(B hat)), e.g. B hat ± 1.96 × s.e.(B hat) at a 95% CL
- CI represents the range of values you expect your population parameter to fall
within based on the sample statistics, within a certain level of confidence (CL).
- Example: I am 95% confident (CL), based on my sample statistics, that my
population parameter will lie within this range.
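● A minimal sketch (not from the lecture) of building this CI around a slope estimate with the rule-of-thumb critical value 1.96 at a 95% CL; the estimate (0.85) and its standard error (0.30) are made-up numbers:

```python
# 95% confidence interval around a slope estimate.
b_hat = 0.85           # made-up slope estimate
se_b = 0.30            # made-up standard error
z_crit = 1.96          # rule-of-thumb critical value at 95% CL

lower = b_hat - z_crit * se_b
upper = b_hat + z_crit * se_b
print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
# If 0 lies outside this interval, the conclusion agrees with rejecting H0: B2 = 0.
```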
