Week 2 New

Week 2: Basic Regression of Binary Variable (Qualitative Variable)
Bintang Satrio S.E., M.Ec.Dev
Preview
• Recall our simple or multiple linear regression model:
  y_i = β0 + β1 x1_i + β2 x2_i + u_i
• β0 → constant (intercept); β1 and β2 → coefficients of the independent variables
• The subscript i indicates the entity (observation)
• The subscript depends on your data type (time series, cross section, pooled cross section, panel)
• The term u is the error term; it captures unobserved factors
Preview
• Suppose we have:
  wage_i = β0 + β1 educ_i + u_i
• educ is years of education → quantitative (12 years, 9 years, etc.)
• wage is also quantitative → $100/month, $250/month, etc.
• Sometimes we encounter qualitative data
Qualitative Data
• How do we describe a qualitative variable?
E.g. 1: Marital status → single or married
E.g. 2: Sex → male or female
E.g. 3: Race → white or black

• We cannot estimate an equation without using numbers.
• A qualitative variable can be captured by defining a binary variable (dummy variable)
Qualitative Data
• For example, to indicate sex, instead of using "Male" or "Female", we can code it as 0 or 1
• So → 1 for female; 0 otherwise
• We can then name the variable female, so the name shows which category is coded as 1:
No   Sex      Binary form (female)
1    Female   1
2    Female   1
3    Male     0
4    Female   1
5    Male     0
6    Female   1
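
The coding in the table above can be sketched in a few lines of Python. This is only an illustration: the list name `sex` and the variable name `female` are chosen here to mirror the table and are not part of the course material.

```python
# The six observations from the table above.
sex = ["Female", "Female", "Male", "Female", "Male", "Female"]

# Dummy variable: 1 if the person is female, 0 otherwise.
female = [1 if s == "Female" else 0 for s in sex]

print(female)  # [1, 1, 0, 1, 0, 1]

# The mean of a dummy variable is the share of observations coded 1.
share_female = sum(female) / len(female)
print(share_female)
```

Naming the variable after the category coded as 1 (female rather than sex) makes the regression coefficient easier to read later.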
Qualitative Data
• As another example, to indicate race, instead of using "White" or "Black", we can code it as 0 or 1

No   Race     Binary form (white)
1    White    1
2    White    1
3    White    1
4    Black    0
5    White    1
6    White    1
Simple Linear Regression with a Binary Variable
• Suppose:
  wage = β0 + β1 female + u
Where we assume:
  E(u | female) = 0 → Zero Conditional Mean Assumption
Then:
  E(wage | female) = β0 + β1 female
• How do you interpret it?


Interpretation
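• You need to know:
  E(wage | female = 1) = β0 + β1
  E(wage | female = 0) = β0
So (recall your math course):
  E(wage | female = 1) − E(wage | female = 0) = (β0 + β1) − β0
Then:
  β1 = E(wage | female = 1) − E(wage | female = 0)
• So β1 shows the difference in expected wage between females and males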
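
To see this interpretation numerically, here is a minimal sketch in plain Python. The wage figures are hypothetical and chosen only for illustration, and `ols_simple` is a helper written for this example, not a library function. It fits a regression of wage on the female dummy and compares the slope with the difference in group means.

```python
from statistics import mean

# Hypothetical data (not from the slides): monthly wage and a female dummy.
wage = [250.0, 300.0, 400.0, 280.0, 420.0, 270.0]
female = [1, 1, 0, 1, 0, 1]


def ols_simple(x, y):
    """Return (intercept, slope) of the OLS fit y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = ols_simple(female, wage)

# Group means computed directly from the data.
mean_female = mean([w for w, f in zip(wage, female) if f == 1])
mean_male = mean([w for w, f in zip(wage, female) if f == 0])

print("intercept:", round(b0, 6), "mean wage of males:", mean_male)
print("slope:", round(b1, 6), "difference in means:", mean_female - mean_male)
```

In this made-up sample the intercept matches the average wage of the group coded 0 (males, 410), and the slope matches the difference in average wages (275 − 410 = −135), up to floating-point rounding.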


Interpretation
• Given another example:
  perform = β0 + β1 training + u
Where:
perform = employee performance → performance score
training = participation in training → 1 if the employee joined the training; 0 otherwise

and assume:
  E(u | training) = 0 → Zero Conditional Mean
Interpretation

• Suppose the estimate of β1 is, for example, 5.2
• We can say that the performance scores of employees who attended the training are, on average, 5.2 points higher than those of employees who did not attend.
• In other words, 5.2 points is the difference in average score between those who joined the training and those who did not.
Case 1: House Pricing (Prepare your STATA)
• As a basic example, consider the scenario:
“What is the effect of the location of a garbage incinerator on house prices?”
• So, given the equation:
  lprice = β0 + β1 nearinc + β2 X + u
Where:
lprice = house selling price in logarithmic form
nearinc → 1 if the house is near the incinerator; 0 otherwise
X = control variables (such as number of rooms, number of bathrooms)

• What is the expected sign of β1?


Case 1: House Pricing (How to do it in STATA)
• First, open “kielmc.DTA” in STATA
• Remember your equation:
  lprice = β0 + β1 nearinc + β2 rooms + β3 baths + u
• Then, to estimate the equation above, type “reg lprice nearinc rooms baths” in your command window
Case 1: House Pricing
• The estimation shows that the coefficient of nearinc is -0.12
• Because the dependent variable is in logs, this means that houses located near the incinerator are approximately 12 percent cheaper than houses far from the incinerator, holding rooms and baths constant.
• Additional: you can add the “ro” (robust) option to obtain standard errors that remain valid under heteroskedasticity
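
When the dependent variable is in logs, multiplying the coefficient on a dummy by 100 gives only an approximate percentage difference. The exact figure is 100 × (exp(coefficient) − 1). The short sketch below compares the two for the coefficient reported on the slide; the variable names are illustrative.

```python
import math

coef = -0.12  # coefficient on nearinc reported on the slide

# Approximation: read the log-point difference directly as a percentage.
approx_pct = 100 * coef

# Exact percentage difference implied by a log-level model with a dummy.
exact_pct = 100 * (math.exp(coef) - 1)

print(f"approximate: {approx_pct:.1f}%")  # approximate: -12.0%
print(f"exact: {exact_pct:.1f}%")         # exact: -11.3%
```

The two numbers are close for small coefficients and drift apart as the coefficient grows in absolute value.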
Case 2: Math Score
• As another basic example, consider the scenario:
“Is there a difference in scores based on where students attend school?”
• So, given the equation:
  math12 = β0 + β1 cathhs + β2 X + u
Where:
math12 = standardized mathematics score
cathhs → 1 if the student attended a Catholic high school; 0 otherwise
X = control variables (such as parents' education, family income)
Note
• There are many types of binary variables, not only white or black, female or male, etc.
• Examples:
E.g. 4: Ethnicity → Asian or non-Asian; Hispanic or non-Hispanic
E.g. 5: Decision → decide to buy or not; decide to join insurance or not

• What if the binary variable acts as the dependent variable?


Next Week!
• Limited dependent variables
• Logit, Probit, and LPM (linear probability model)
