
Chapter 2: The linear regression model

1. Regression: regression to the mean

Term introduced by Galton in his study of the heights of fathers and sons
2. Regression line: connects the mean points and shows the relation on average
3. PRF/SRF (population regression function/sample regression function):
3.1. PRF
3.1.1. Definition: the regression function constructed from a survey of the whole
population
E(Y|Xi) = f(Xi) = β0 + β1Xi [1]
E: expectation
i: the i-th observation
n: number of observations
β0, β1: regression coefficients/parameters
β0: constant coefficient (intercept)
β1: slope coefficient
– The PRF shows how the expected value of Y changes at different values of X
– If the PRF has 1 independent variable -> simple regression function
– If the PRF has 2 or more independent variables -> multiple regression function
3.1.2. Error/ disturbance term
- Because E(Y|Xi) is the expected value of Y given Xi, an individual value Yi is not
necessarily equal to E(Y|Xi), but the values of Yi lie around E(Y|Xi).
- Let ui denote the difference between Yi and E(Y|Xi); we have:
ui = Yi - E(Y|Xi) [3]
Or: Yi = E(Y|Xi) + ui [4]
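Equations [1], [3] and [4] can be illustrated with a small simulation. This is only a sketch: the parameter values, the normal distribution of the disturbance, and the names prf and draw_observation are assumptions made for the example, not part of the notes.

```python
import random

# Hypothetical population parameters, chosen only for illustration.
BETA0, BETA1 = 1.0, 0.5


def prf(x):
    """Population regression function [1]: E(Y|X = x) = β0 + β1·x."""
    return BETA0 + BETA1 * x


def draw_observation(x, rng):
    """Return (Y, u) for one individual with the given X, following [4]."""
    u = rng.gauss(0.0, 1.0)  # assumption: normal disturbance with mean 0
    return prf(x) + u, u


rng = random.Random(0)  # fixed seed so the run is repeatable
x = 10
sample = [draw_observation(x, rng) for _ in range(10_000)]

# Individual Y values differ from E(Y|X), but their average is close to it.
mean_y = sum(y for y, _ in sample) / len(sample)
print("E(Y|X=10) =", prf(x))
print("average of simulated Y at X=10 =", round(mean_y, 3))
```

Each simulated Y differs from E(Y|X) by exactly its own u, which is what [3] states.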
3.2. Sample regression function (SRF)
- In practice we cannot survey the whole population, so the PRF is unknown -> we use a sample and the SRF
- ui: error term, disturbance, noise
- ûi: estimate of ui, the residual
- k: number of independent variables in the model
- Ŷi is an estimate of E(Y|Xi) and is the fitted value/predicted value of Y
- β̂0, β̂1 are estimates of β0, β1
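For the simple case of one independent variable, and assuming the SRF mirrors equations [1] and [4], the sample counterparts can be written as follows. The multiple-regression form with k variables would add one term per variable.

```latex
\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i
\qquad\text{and}\qquad
Y_i = \hat{Y}_i + \hat{u}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{u}_i
```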
4. OLS (ordinary least squares):
Linear regression model with cross-sectional data and a continuous dependent variable Y => OLS
- It is used to estimate the parameters under certain assumptions.
- The estimators have some properties (linearity, unbiasedness, and efficiency).
- It is currently the most widely used estimation method.
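- We have to choose the SRF so that the residuals are, overall, as small as possible
- Σ ûi (sum of residuals): positive and negative residuals offset each other
- Σ ûi² has the minimum value (squaring solves the sign problem) (SSR: sum of squared residuals)
- β̂0, β̂1: the values that minimize SSR
- SSR = Σ ûi² = Σ (Yi - β̂0 - β̂1Xi)²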
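The minimization above has a well-known closed-form solution for one independent variable: β̂1 = Σ(Xi - X̄)(Yi - Ȳ) / Σ(Xi - X̄)² and β̂0 = Ȳ - β̂1X̄. A minimal sketch follows; the function names and the five data points are made up for illustration.

```python
def ols_simple(x, y):
    """Closed-form OLS estimates (b0, b1) for Y = b0 + b1*X + residual."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx           # slope estimate
    b0 = y_bar - b1 * x_bar  # intercept estimate
    return b0, b1


def ssr(x, y, b0, b1):
    """Sum of squared residuals for a candidate line (b0, b1)."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, only for illustration.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols_simple(x, y)
residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

print("b0 =", round(b0, 4), "b1 =", round(b1, 4))
print("sum of residuals =", round(sum(residuals), 10))
print("SSR at the OLS estimates =", round(ssr(x, y, b0, b1), 4))
print("SSR at a nearby line     =", round(ssr(x, y, b0 + 0.1, b1), 4))
```

When the model has an intercept, the OLS residuals sum to zero. This is why the plain sum of residuals cannot be used to choose between lines, while SSR can.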
STATA
2 kinds of variables: quantitative and qualitative
- Quantitative variable: a random variable whose values are numbers that are
meaningful algebraically. We can compare these values and the
comparison has meaning.
Ex: educ: level of education
- Qualitative variable: a random variable whose values may be coded as numbers, but the
numbers have no algebraic meaning. We can compare these values, but the
comparison has no meaning.
- We have to convert a qualitative variable into dummy (zero-one, binary) variables.
- If a qualitative variable has n categories, we can create n dummy variables, but
we can include only (n-1) of them in the model. The category whose dummy is excluded
from the model serves as the base group (benchmark) for comparison.
- Ex: variable Male = 1 if the obs is male, = 0 otherwise
variable Female = 1 if the obs is female, = 0 otherwise
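The (n-1) rule can be sketched in a few lines of Python. The notes work in STATA, so this is only an illustration of the idea: the function make_dummies and the sample values are hypothetical.

```python
def make_dummies(values, base):
    """Build one 0/1 dummy per category, leaving out the base group."""
    categories = sorted(set(values))
    if base not in categories:
        raise ValueError(f"base group {base!r} is not among the categories")
    return {
        category: [1 if v == category else 0 for v in values]
        for category in categories
        if category != base  # the base group gets no dummy
    }


# Two categories -> one dummy in the model (male is the base group).
gender = ["male", "female", "female", "male"]
print(make_dummies(gender, base="male"))

# Three categories -> two dummies in the model (north is the base group).
region = ["north", "south", "west", "south"]
print(make_dummies(region, base="north"))
```

The coefficient on each included dummy is then read as a difference relative to the excluded base group.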

STEPS:
Step 1: Question of interest
Topic: Analyzing determinants of income of individuals in the USA
Choose Y and X
Y: wage
X: educ, exper, nonwhite…
Step 2:
Step 3:
Step 4:
Step 5: Estimate the model
- Econometric model: linear regression model
- Method to estimate the coefficients: OLS
- The correlation matrix presents:
o Correlation between Y and X -> r(Y,X): if r(Y,X) != 0 -> X is correlated
with Y -> we can include X in the model
o Example:
▪ r(wage,educ) = 0.459 > 0 -> education is positively correlated with
wage -> educ should be included in the model
▪ r(wage,nonwhite) = -0.0385 < 0 -> race is (weakly) correlated with wage -> the
negative correlation coefficient implies that non-white people tend to
have lower wages than white people (on average)
▪ r(wage,female) = -0.3041 < 0 -> gender is correlated with wage -> the
negative correlation coefficient implies that females tend to have
lower wages than males on average
- Conclusion: all the independent variables are correlated with the dependent
variable -> we can include them in the model
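The correlation coefficients used in this step can be computed as in the sketch below. The six observations are invented for illustration and are not the data set behind the figures above; the helper pearson_r is likewise a hypothetical name.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient r(a, b)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical observations, only to show the mechanics.
data = {
    "wage": [3.1, 3.2, 6.0, 5.3, 11.6, 8.7],
    "educ": [11, 12, 11, 8, 12, 16],
    "female": [1, 1, 0, 0, 0, 0],
}

# Correlation matrix as a nested dictionary: corr[row][column].
corr = {
    row: {col: pearson_r(data[row], data[col]) for col in data}
    for row in data
}

for name in ("educ", "female"):
    print(f"r(wage,{name}) = {corr['wage'][name]:.3f}")
```

The sign of each coefficient is read as in the examples above: positive means the two variables tend to move together, negative means they tend to move in opposite directions.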
Step 6:
- Topic 1: Analyze the impact of foreign direct investment (FDI) on the GDP growth of Vietnam
o Y: GDP growth
o X: FDI
o X2, X3…Xk: control variables
- Topic 2: Analyze the relationship between FDI and the GDP growth of Vietnam
o 1. GDP growth = f(FDI, Xi)
o 2. FDI = f(GDP growth, Xi)
- If we have 2 variables A and B and want to decide which one is Y and which is X -> we
have to check the nature of A and B
o calculate the correlation coefficient of A and B -> r(A,B)
o if r(A,B) != 0, we can include A and B in the model
o if A is correlated with B, check whether the correlation reflects causation
- If causation:
o result/consequence -> Y
o cause -> X
5. The sum of squares
SST
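SST, the total sum of squares, is Σ(Yi - Ȳ)². A minimal sketch of how it relates to the SSR defined earlier, assuming a simple OLS fit with an intercept. The name ESS for the explained part and the data points are assumptions made for this example.

```python
# Hypothetical data, only for illustration.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Simple OLS estimates (closed form for one independent variable).
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation in Y
ess = sum((fi - y_bar) ** 2 for fi in fitted)            # part explained by the line
ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # squared residuals

print("SST =", round(sst, 4))
print("ESS + SSR =", round(ess + ssr, 4))
```

For an OLS fit with an intercept, the total variation splits exactly into the explained part and the residual part, so the two printed values agree.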
