
Dummy Variables (as independent variables)

Dummy variables are variables that take only the values 0 or 1.

In a regression model, a dummy variable with a value of 0 causes its
coefficient to drop out of the equation. Conversely, a value of 1 causes the
coefficient to act as a supplemental (additional) intercept, because
multiplying the coefficient by 1 leaves it unchanged.

Suppose we want to see the impact of additional years of schooling on the level
of income of a population. Let Y be the level of income and (edu) be the
years of schooling of an individual. Leaving education aside for the moment,
the expected level of income can be written with a gender dummy alone:
Y^ = β1 + β2(woman)

Here, (woman) is a dummy variable.

For a man, the dummy variable ‘woman’ takes the value zero, which removes β2
from the equation, so Y^ = β1.

For a woman, the dummy variable ‘woman’ takes the value one, and the value of
Y^ shifts by β2 (an additive change, not a multiplicative one), so
Y^ = β1 + β2.

As the number of years of education is not included in this equation, only
the intercept changes in the case of a woman.
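
The intercept shift can be checked numerically. The sketch below is not from
the original notes: the income figures are invented for illustration and the
helper name fit_dummy_only is hypothetical. It fits Y^ = β1 + β2(woman) by
ordinary least squares and shows that β1 comes out as the mean income of men
(the base group) and β2 as the difference between the two group means.

```python
from statistics import mean

# Hypothetical monthly incomes, invented purely for illustration.
incomes = [30.0, 34.0, 32.0, 25.0, 27.0, 29.0]
woman = [0, 0, 0, 1, 1, 1]  # dummy: 1 for a woman, 0 for a man


def fit_dummy_only(y, d):
    """OLS fit of y = b1 + b2*d for a single 0/1 regressor d."""
    d_bar, y_bar = mean(d), mean(y)
    # Standard simple-regression formulas for slope and intercept.
    cov = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
    var = sum((di - d_bar) ** 2 for di in d)
    b2 = cov / var
    b1 = y_bar - b2 * d_bar
    return b1, b2


b1, b2 = fit_dummy_only(incomes, woman)

# Group means computed directly, for comparison with the coefficients.
mean_men = mean(y for y, d in zip(incomes, woman) if d == 0)
mean_women = mean(y for y, d in zip(incomes, woman) if d == 1)
print(b1, b2, mean_men, mean_women - mean_men)
```

With these made-up numbers the men average 32 and the women 27, so the fit
gives β1 = 32 and β2 = -5.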

If the intercept as well as the slope of the regression equation differs
between men and women, then we will have the following equation:
Y^ = β1 + β2(woman) + β3.edu + β4.edu.(woman)

The slope on education is β3 for men and β3 + β4 for women, so the increment
in income for the same increase in education level is different for men and
women.

Assigning Codes for Different Categories


Suppose we have more than 2 categories in a group, and want to represent a
regression equation with dummy variables. In this case, we give a code to each
category in the group.

E.g. if we want to see the effect of a person’s social group on their income
level, we may assign the following codes to the different social groups:
ST – 1

SC – 2

OBC – 3

Others – 9 (It is a convention to code others as 9)

The regression equation can be represented as follows:


Y^ = β1 + β2(SC) + β3(OBC) + β4(others)

Here, β1 represents the expected income level of the ST category, as we have
taken ST to be our base category, from which each other category’s deviation
is measured.

SC, OBC, and others are dummy variables, each taking a value of 0 or 1.
For a person belonging to SC, SC = 1 and OBC = (others) = 0, so only β2 is
added to β1. β2 is the deviation of the SC category from the ST category.

So, Y^ (SC) = β1 + β2

Similarly, Y^ (OBC) = β1 + β3 ; and Y^ (others) = β1 + β4

Note: If we have ‘n’ categories in a group, then we will have ‘n-1’ dummy
variables and one base category, which is absorbed into the constant.
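
The step from category codes to ‘n-1’ dummy variables can be sketched as
follows. This is an illustrative sketch only: the codes are the ones listed
above, while the names CODES, BASE and to_dummies are hypothetical. The codes
1, 2, 3 and 9 are labels, so they are not entered into the regression as
numbers; each non-base category gets its own 0/1 column instead.

```python
# Codes as listed in the text; ST (code 1) is treated as the base category.
CODES = {1: "ST", 2: "SC", 3: "OBC", 9: "others"}
BASE = "ST"


def to_dummies(code):
    """Map a social-group code to its n-1 dummy values (base category omitted)."""
    group = CODES[code]  # raises KeyError for an unknown code
    return {name: int(name == group) for name in CODES.values() if name != BASE}


print(to_dummies(2))  # a person coded SC
print(to_dummies(1))  # a person coded ST: every dummy is 0
```

A person in the base category has all three dummies equal to zero, which is
why no separate ST dummy is needed.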

There can be many dummy variables in an equation with different categories.

E.g. the average years of education can depend upon many variables like
gender, income group, social group.

Edu^ = β1 + β22(Q2) + β23(Q3) + β24(Q4) + β32(SC) + β33(OBC) + β34(others) + β4(male)

Here, the constant β1 represents a female belonging to the ST group and the Q1
(topmost quartile of income) category, since these are the base categories.
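
To see how several sets of dummies sit side by side, the sketch below builds
the row of regressors for one person. It is only an illustration: the function
name design_row and the dictionary layout are assumptions, and no coefficient
values are implied because the notes give none for this equation.

```python
QUARTILES = ["Q2", "Q3", "Q4"]    # Q1 is the base income category
GROUPS = ["SC", "OBC", "others"]  # ST is the base social group


def design_row(quartile, group, male):
    """Return the regressors (constant first) for one person."""
    row = {"const": 1}
    row.update({q: int(q == quartile) for q in QUARTILES})
    row.update({g: int(g == group) for g in GROUPS})
    row["male"] = int(male)
    return row


base = design_row("Q1", "ST", male=False)   # only the constant is switched on
other = design_row("Q3", "OBC", male=True)  # Q3, OBC and male are switched on
print(base)
print(other)
```

For the base person every dummy is zero, so the expected years of education
reduce to β1 alone.
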

Example 1: Let us understand the working of dummy variables with an
example of monthly per capita expenditure (MPCE).

MPCE = 1688 – 66(SC) + 244(OBC) + 956(oth)

Hence,

MPCE(ST) = 1688 (since ST is the base category)

MPCE(SC) = 1688 – 66 = 1622

MPCE(OBC) = 1688 + 244 = 1932

MPCE(oth) = 1688 + 956 = 2644
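
The same arithmetic can be written as a short sketch. The coefficients are
those of the MPCE equation above; the names INTERCEPT, DEVIATION and
expected_mpce are hypothetical.

```python
INTERCEPT = 1688  # expected MPCE of the base category (ST)
# Deviation of each category from the base; ST deviates from itself by 0.
DEVIATION = {"ST": 0, "SC": -66, "OBC": 244, "oth": 956}


def expected_mpce(group):
    """Expected MPCE for a social group: intercept plus that group's coefficient."""
    return INTERCEPT + DEVIATION[group]  # raises KeyError for an unknown group


for group in DEVIATION:
    print(group, expected_mpce(group))
```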

Example 2:
Y^ = β1 + β2.edu + β3.(female) + β4.edu.(female)

Now, dY^/d(edu) = β2 (for men), i.e. when (female) is zero

And dY^/d(edu) = β2 + β4 (for women), i.e. when (female) is one

The intercept term for men is β1 and that for women is β1 + β3.

[Figure: Comparison of effect of education on income. Monthly Income (0 to 60)
plotted against No. of years of education (0 to 25), with separate series for
Men and Women.]

As we can see from the graph, both the intercept and the slope differ between
men and women. This may imply that there is a gender bias in the workplace
from the start, and that further education does not help women as much as men
in earning a higher salary.

The above graph depicts the equation:


Y^ = 8 + 1.68(edu) – 3(female) – 0.48(edu)(female)

Here, men are the base category and (female) is the dummy variable for women.
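
The intercepts, slopes and income gap implied by this equation can be worked
out with a short sketch. The coefficients are read off the equation above; the
function names and the chosen education levels (0, 10 and 25 years) are
assumptions made for illustration.

```python
# Coefficients of the fitted equation, in the order const, edu, female, edu*female.
B1, B2, B3, B4 = 8.0, 1.68, -3.0, -0.48


def expected_income(edu, female):
    """Expected monthly income; female is the 0/1 dummy."""
    return B1 + B2 * edu + B3 * female + B4 * edu * female


def intercept_and_slope(female):
    """Group-specific intercept and slope implied by the interaction model."""
    return B1 + B3 * female, B2 + B4 * female


print(intercept_and_slope(0))  # men
print(intercept_and_slope(1))  # women

# The gap between men and women widens as education increases.
for edu in (0, 10, 25):
    gap = expected_income(edu, 0) - expected_income(edu, 1)
    print(edu, round(gap, 2))
```

Up to floating-point rounding, men have intercept 8 and slope 1.68, women have
intercept 5 and slope 1.20, and the gap grows from 3 at zero years of
education to 15 at 25 years.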
