Measuring Economic Inequalities


Julien Barlan, institut d’etudes politiques de paris

https://docs.google.com/Doc?docid=0AXe2E1Mm09WIZGhzazhxaDRfMjUzZ25nMjdkZzY&hl=en
Julien R. Barlan (IEP Paris) Austin, Texas, USA
Department of Economics - April 2010
julien.barlan@sciences-po.org
http://jrbpagetravaux.blogspot.com/
 
Measuring economic inequalities: Lorenz curve, coefficient of variation and
Gini coefficient

In this handout, I aim to introduce a classical economic tool, the Lorenz curve, which
measures inequality in a society by focusing on income distribution. It was developed by
American economist Max Lorenz in the early 20th century, when he was a graduate student
at the University of Wisconsin at Madison.

First of all, download the Excel spreadsheet that accompanies this handout.

You are strongly advised to reproduce what you are reading while going over this
document. Eventually, you should be capable of running your own experiments. If you are
using this handout to review for an exam or just to do homework, make sure you understand
the concepts and the calculations behind all the notions. Moreover, do not skip hand
calculations. It is unlikely you will be asked to compute complicated curves or coefficients
while taking an exam, so keep my Excel spreadsheet for the more complicated
experiments.

I will first go over the theory and the mathematical formalization, before running some
experiments about income inequality in the United States. The third part consists of an
application to NBA statistics, as an illustration of the usefulness of this tool to measure the
degree of equality of any kind of distribution.

This handout is accessible through my web site http://jrbpagetravaux.blogspot.com/ and the
following Wikipedia pages:
       The Lorenz curve http://en.wikipedia.org/wiki/Lorenz_curve
       The Gini coefficient http://en.wikipedia.org/wiki/Gini_coefficient
       Income inequality metrics http://en.wikipedia.org/wiki/Income_inequality_metrics

Credited reproduction is permitted for academic purposes only. If you plan to use this
handout in class or in a presentation, please don’t forget to reference my university,
Sciences Po Paris (France), and myself.
 
I) Theory and mathematics of the Lorenz curve
 
The Lorenz curve is nothing more than a convenient way to measure the degree of inequality within a
society. We only need two sets of data to compute it: revenues and population. If
we know what is owned and by whom, we can organize our knowledge in the following way:
we will have two vectors, X and Y, one representing the cumulative shares of the population, the
other being the cumulative shares of the total income with respect to population.
 

The entries $x_i$ in X are shares of the population; the entries $y_i$ in Y are the shares of
total income corresponding to the $x_i$.

 
Let’s take an example: in a society composed of six individuals, one owns $100, two
own $200 each and the remaining three possess $300 each. Does that make
a total income Y = $600?
The answer is no. This is the first mistake to avoid: each income has to be weighted by the
number of people earning it. For example, the third group earns 3 x 300 = $900 overall. So our
total income ($) is:
100 + 400 + 900 = $1,400.

Looking at the population, the first group is 1/6 = 16.67% of it and earns 100/1,400 = 7.14%
of total income. The second group represents 2/6 = 33.33% of the total population and earns
400/1,400 = 28.57% of total income. Similarly, the third group, half the population, owns
900/1,400 = 64.29% of the society’s wealth.

We then add those proportions in order to get the cumulative shares. The poorest 16.67%
of the population receives 7.14% of the total revenue.
16.67% + 33.33% = 50%, so the poorer half of the population owns just 7.14% + 28.57% = 35.71% of Y.
Finally, adding the third and last portion of X, we end up with the pair (X(%), Y(%))
= (100, 100). If you plug the values into the attached spreadsheet, you end up with
the following:

Table 1

Now we can draw our graphical representation of the Lorenz curve, labeling the x-axis
as the cumulative shares of the population and the y-axis as the cumulative shares of
income.
Note: you should draw up those tables every time, whether in an exam, doing homework or
conducting research!
Graph 1

We clearly identify our four points, the endpoints (0,0) and (100,100) being
obvious.
 
Here are the pairs: (0,0) (16.67,7.14) (50,35.71) (100,100)
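
The four pairs above can be recomputed with a few lines of code. The following is a minimal sketch in Python, not a replacement for the spreadsheet; the function name lorenz_points and its arguments are my own choices for illustration. It takes the size of each group and the income of each member of that group, and returns the cumulative shares in percent.

```python
def lorenz_points(group_sizes, incomes_per_head):
    """Return cumulative (population %, income %) pairs, starting at (0, 0)."""
    # Sort groups from poorest to richest so the curve stays under the equality line.
    groups = sorted(zip(incomes_per_head, group_sizes))
    total_pop = sum(size for _, size in groups)
    total_income = sum(income * size for income, size in groups)

    points = [(0.0, 0.0)]
    cum_pop = 0
    cum_income = 0
    for income, size in groups:
        cum_pop += size
        cum_income += income * size  # weight each income by the group size
        points.append((100 * cum_pop / total_pop, 100 * cum_income / total_income))
    return points


# The six-person society of the example: 1 x $100, 2 x $200, 3 x $300.
for x, y in lorenz_points([1, 2, 3], [100, 200, 300]):
    print(f"({x:.2f}, {y:.2f})")
```

Run on the example, it prints (0.00, 0.00), (16.67, 7.14), (50.00, 35.71) and (100.00, 100.00), which are the pairs listed above.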

How can we interpret the previous graph? It might be difficult, or at least not
obvious, without knowing that the perfect equality line represents a society in which
the outcome is equally shared among the population. It is a continuous straight line on
the interval [0,100]. Mathematically, it is the simple function f(a) = a. The interpretation
is the following: X% of the population would earn X% of Y, or $x_i = y_i$.

Consequently, the farther a particular Lorenz curve lies below the perfect equality line, the
less equal the society is. But that is just a graphical interpretation. Based on this piece of
information, one can only say that the studied society has reached some point on the
inequality scale. To go into more detail, we have to introduce two statistical tools.

1.   The coefficient of variation.

It measures the spread of our weighted income distribution. It relies on some basic statistical
tools, namely the weighted mean and the standard deviation: it is the ratio of the
latter to the former. The standard deviation measures how far values lie from the mean. One
might argue that the overall sum of the deviations should be zero, and that is exactly right.
This is why we must square the deviations (why? See a statistics text if you are not
familiar with this analysis). But since $A^2$ is a more than proportionally increasing function of A,
we want each squared deviation from the mean to be weighted by the proportion of the population
earning $y_j$. Finally, we divide the standard deviation by the weighted mean $\mu$.
In our example, C is roughly 32%. Remember that it is just a measure of spread. Nevertheless, the
larger the spread, the more unequal the society is likely to be. To give an
example, in 1950 Mexico, C was equal to 2.5, i.e. 250%. So one can argue that the society
we are looking at is comparatively less unequal than mid-20th century Mexico, but is
comparatively less equal than a society experiencing the perfect equality situation,
where C = 0 (since all deviations from the mean are necessarily 0!), or than any more realistic
society with 0 < C < 0.32.
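
To check the 32% figure by hand or by machine, here is a minimal Python sketch. It assumes the population-weighted formula described in words above, $C = \frac{1}{\mu}\sqrt{\sum_j \frac{n_j}{n}(y_j - \mu)^2}$, where $n_j$ is the number of people earning $y_j$ and $n$ is the total population. The function name is my own and is not taken from the spreadsheet.

```python
import math


def coefficient_of_variation(group_sizes, incomes_per_head):
    """Population-weighted standard deviation divided by the weighted mean."""
    n = sum(group_sizes)
    pairs = list(zip(group_sizes, incomes_per_head))
    mean = sum(size * income for size, income in pairs) / n
    # Each squared deviation is weighted by the population share of its group.
    variance = sum((size / n) * (income - mean) ** 2 for size, income in pairs)
    return math.sqrt(variance) / mean


c = coefficient_of_variation([1, 2, 3], [100, 200, 300])
print(f"C = {c:.4f}")  # about 0.32 for the six-person example
```

With the example data the weighted mean is 1,400/6 = 233.33 and the weighted variance is about 5,555.56, which gives a standard deviation of about 74.54 and a ratio close to 0.32.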

2.   The Gini coefficient

Developed by Corrado Gini, an Italian statistician, in the first part of the 20th century, it
“takes the difference between all pairs of income and totals the absolute differences”[1],
divided by twice the product of the population squared and the weighted mean.
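
The verbal definition above can be sketched in Python as follows. I am assuming it corresponds to $G = \frac{1}{2 n^2 \mu}\sum_j \sum_k n_j n_k |y_j - y_k|$, with $n_j$ people earning $y_j$, $n$ the total population and $\mu$ the weighted mean; the function name gini_pairwise is hypothetical.

```python
def gini_pairwise(group_sizes, incomes_per_head):
    """Gini coefficient from the sum of absolute differences between all pairs."""
    n = sum(group_sizes)
    groups = list(zip(group_sizes, incomes_per_head))
    mean = sum(size * income for size, income in groups) / n
    # Every pair of groups (j, k) contributes n_j * n_k * |y_j - y_k|.
    total_abs_diff = sum(
        size_j * size_k * abs(income_j - income_k)
        for size_j, income_j in groups
        for size_k, income_k in groups
    )
    return total_abs_diff / (2 * n ** 2 * mean)


g = gini_pairwise([1, 2, 3], [100, 200, 300])
print(f"G = {g:.4f}")
```

For the six-person society the double sum is 2,800 and the denominator is 2 x 36 x 233.33 = 16,800, so G = 1/6, or roughly 16.7%.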

Despite this apparently unfriendly calculation, the Gini coefficient is very useful and widely
used in social sciences, not only economics, since it measures the area A between the
perfect equality line (E) and the Lorenz curve as a proportion of the area which lies between E
and the x-axis. Let’s make it simple. E(x) = x on the interval [0,100].

Since the antiderivative is $F(x) = \frac{x^2}{2}$, we get $\int_0^{100} x \, dx = F(100) - F(0) = 5000$. In other words, we have $G = \frac{A}{5000}$.
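
The area interpretation can also be checked numerically. The sketch below is an illustration under two assumptions of mine: that the passage above reads as G = A / 5000 with both axes in percent, and that the Lorenz curve is piecewise linear between the known points, so that the area under it is a sum of trapezoids. The function name is hypothetical.

```python
def gini_from_lorenz(points):
    """Gini coefficient from Lorenz points given in percent, (0, 0) to (100, 100)."""
    area_under_equality = 5000.0  # integral of E(x) = x on [0, 100]
    area_under_lorenz = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Trapezoid between two consecutive points of the curve.
        area_under_lorenz += (x1 - x0) * (y0 + y1) / 2
    area_a = area_under_equality - area_under_lorenz
    return area_a / area_under_equality


example = [(0, 0), (16.67, 7.14), (50, 35.71), (100, 100)]
print(f"G = {gini_from_lorenz(example):.3f}")
```

With the rounded pairs of the example, the area under the Lorenz curve is about 4,166, so A is about 834 and G is about 0.167. The perfect equality line itself, [(0, 0), (100, 100)], gives G = 0.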

We face two extreme values for G. Obviously, $G = 0 \Rightarrow A = 0$, so when the Gini coefficient for a
Lorenz curve is zero, the curve is $x_i = y_i$, which is E(x), the equal society.

On the other hand, $G = 1 \Rightarrow A = 5000$. The society is as unequal as possible, since people
in the last category own all the income. Note that we need at least two people split into
two categories for this to hold. If one person composes the society, it can only be
equal, though it does not make much mathematical sense to talk about equality for a single
observation.
Those two situations are just the two extremes of the model. They have to be
understood, but it is very unlikely that they will happen in real situations.

II) Income inequality in the United States


1.   Geographic

Based on a 2009 report of the Census Bureau[2], we are going to compute a Lorenz curve
for income distribution in the US with respect to geography. The population, expressed in
terms of households, will be organized by administrative region. There are four of them:
Northeast, Midwest, South and West. First of all, we need to order them by increasing
income. If we do not pay attention to that organizational matter, we might end up with a
Lorenz curve lying partially or totally above E(x). This is a mistake since it has to lie
under E(x); think about it in terms of the ratio of integrals. Table 2 has consequently ordered
the regions as follows, using median incomes: South, Midwest, Northeast, and West.
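
The ordering issue is easy to reproduce. Since the Census figures are not repeated in this text, the sketch below reuses the six-person society from part I rather than the regional data; the helper names are hypothetical. It builds the cumulative shares in the order given and checks whether every point lies on or under E(x).

```python
def cumulative_shares(group_sizes, incomes_per_head):
    """Cumulative (population share, income share) pairs, in the order given."""
    total_pop = sum(group_sizes)
    total_income = sum(s * y for s, y in zip(group_sizes, incomes_per_head))
    points, cum_pop, cum_income = [(0.0, 0.0)], 0, 0
    for size, income in zip(group_sizes, incomes_per_head):
        cum_pop += size
        cum_income += size * income
        points.append((cum_pop / total_pop, cum_income / total_income))
    return points


def lies_on_or_under_equality(points):
    """True when no point of the curve is above the line y = x."""
    return all(y <= x + 1e-12 for x, y in points)


richest_first = cumulative_shares([3, 2, 1], [300, 200, 100])
poorest_first = cumulative_shares([1, 2, 3], [100, 200, 300])
print(lies_on_or_under_equality(richest_first))
print(lies_on_or_under_equality(poorest_first))
```

Listing the richest group first puts the point (50%, 64.29%) above the equality line, so the first check fails; sorting by increasing income restores a proper Lorenz curve.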

Table 2

To avoid confusion, note that we assign the same income value to every household within a group. For
example, each household in Texas is considered to earn $47,961, which is the lowest
per-household income, but total income in the South is far higher than in any other
part of the country. Make sure that you organize your data accordingly.
We can now have a look at the Lorenz curve and our two statistical measures of inequality.

Graph 2

It turns out we can largely rule out the hypothesis that the United States is geographically unequal
when it comes to income. The spread is about 6.3% and G = 3.5%. We ran this study
using median incomes. Recall that if there are n households in the Midwest, the median income is the
$\frac{n+1}{2}$-th value in the distribution, when it is ordered from the lowest to the highest
value.
For those who are familiar with empirical studies, this choice is not surprising since the median,
unlike the mean, is not affected by outlier values. Even though we are dealing with very large
samples, the distribution need not be symmetric around the mean, so using the
median is generally a safer way to estimate inequality. See a statistics textbook if you want
to go over those issues in more detail.
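
A small numerical illustration of why the median resists outliers, using Python's standard statistics module. The five incomes below are hypothetical values chosen only for the example; they do not come from the Census report. The helper middle_value assumes an odd sample size, so that the median is the value of rank (n + 1) / 2 once the sample is sorted.

```python
import statistics

# Hypothetical household incomes in dollars, for illustration only.
incomes = [30_000, 40_000, 50_000, 60_000, 70_000]
# Same sample, but the richest household is replaced by an extreme value.
with_outlier = incomes[:-1] + [7_000_000]


def middle_value(values):
    """Median of an odd-sized sample: the value of rank (n + 1) / 2 once sorted."""
    ordered = sorted(values)
    n = len(ordered)
    return ordered[(n + 1) // 2 - 1]  # minus 1 because Python lists start at 0


print(statistics.mean(incomes), statistics.median(incomes))
print(statistics.mean(with_outlier), statistics.median(with_outlier))
print(middle_value(with_outlier))
```

The mean jumps from 50,000 to well over one million once the outlier is introduced, while the median stays at 50,000.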
 
2.   Age inequalities

As it appears that income is fairly equally spread around the United States, we might now
be interested in measuring inequalities among citizens. Let’s make the hypothesis that we
will encounter a larger level of inequality when looking at age rather than location. It
seems to be a straightforward hypothesis, since young people are not so likely to make
money and retired people earn less than workers. Based on the same Census report, it actually
turns out that young people aged 15 to 24 individually make more than people over 65. Note
that 45-54 means people from age 45 to 54. Income is measured in dollars.

Table 3

Income looks less equally distributed than in the geographical test. Let’s have a
look at the Lorenz curve.

Graph 3
Compared to our previous experiment, where we had C = 6%, we observe a coefficient of
variation more than four times bigger. The Gini coefficient confirms what one could
see graphically: the area between the Lorenz curve and the perfect equality line has
significantly increased.

Question: do you conclude that the United States is a country where it does not matter where
one lives, so that, for example, one would find as many poor places in the Midwest as
in New England?
 

3.   Interpretation and comparison issues

The answer is no. You may have noticed that it is all about categories. The geographical slicing
of the United States is just a rough cut into four pieces. Consider the West part of the
country: it includes California and Idaho. Do you believe that income distribution is the same
in those two states? Do you think we are likely to find as many rich places, in volume,
in Idaho as we could in California? Once again, the answer is negative. It all depends on
categories. We are now dealing with volume issues, whereas we had run a proportional study.
To do so, we picked California and Idaho, grouped them with other states and created
“West”. We could always arrange our statistics to argue that Idaho is as rich as California, or
try to argue that more cities in Idaho than in California earn more than a certain proportional
amount of revenue, but let’s make it clear: California is richer than Idaho.
We are just trying to underline that scaling, grouping, and setting up criteria might obscure
reality.

All that matters is what “geography” actually means. If it means four regions created in
order to have four groups more or less equal in terms of income, then the United States
shows almost no inequality. If it means fifty different states, our conclusion does not hold any
longer.

Let’s take another example: does a state like New York experience an income distribution
like the one we observe for the whole United States? Or looking closer, does New York City?
Or even closer, does the borough of Manhattan? Zooming in as much as we can, is
income equally distributed along Fifth Avenue, between Harlem and Rockefeller Center? The
answer is most likely negative. Everything is just a matter of scaling. When looking
into details, we can find strong evidence of inequality.

If you are about to run your own experiments, you have to clearly define your criteria.
You have to remember that the larger your samples and the larger your
groups, the more everything tends to balance out.

We can now deal with another issue: comparisons over time. Our studies can lead us to
compare two Gini coefficients at $t_0$ and $t_1$. We might conclude that inequality has decreased
within a country over a period of time or, on the contrary, underline that it has risen. The
challenge is to find data, and my purpose is not to wrestle with the Census website. If you
are interested in those questions, you might want to compute it yourself. You would just
have to run the experiment by plugging values into the Excel file that goes with this handout.
Nevertheless, you should be aware of the following problem:
Table 4

            Gini coefficient    Coefficient of variation
Period 1    0.20                0.45
Period 2    0.30                0.030
 
Don’t you notice something unexpected? Since the Gini coefficient has risen, we might
conclude that the society has become more unequal between $t_0$ and $t_1$. But on the other hand,
we can also notice that the coefficient of variation has fallen. It is confusing, because it
would suggest that inequality has shrunk.

Such an empirical observation is not merely hypothetical; it can actually happen. To deal with it, we
need to introduce the notions of progressive and regressive transfers. In the first case,
some income is transferred from one group to another, relatively poorer one. This
reduces inequality, holding the total income fixed. A regressive transfer is the
opposite situation, where we transfer some money from a group to a relatively richer one.
Consequently, holding total income fixed, it increases inequality
within the society. In a nutshell, if we have such contradictory results, we are dealing with a
mix of both kinds of transfers. Looking in detail at how much is transferred every time a
transfer occurs can help determine whether the society has become more equal or not. But
there is no general rule. We are reaching the limits of what the Lorenz curve can tell us.
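
To see how a single transfer of each kind moves the two measures, here is a minimal Python sketch on the six-person society of part I, with one income per individual. The transfer amount of $50 and the helper names are arbitrary choices of mine; the formulas are the unweighted versions of the Gini coefficient and the coefficient of variation, with each person counting once.

```python
import math


def gini(incomes):
    """Sum of absolute differences over all pairs, divided by 2 * n^2 * mean."""
    n = len(incomes)
    mean = sum(incomes) / n
    return sum(abs(a - b) for a in incomes for b in incomes) / (2 * n * n * mean)


def coefficient_of_variation(incomes):
    """Standard deviation divided by the mean."""
    n = len(incomes)
    mean = sum(incomes) / n
    return math.sqrt(sum((y - mean) ** 2 for y in incomes) / n) / mean


def transfer(incomes, giver, receiver, amount):
    """Move `amount` from one individual to another; total income is unchanged."""
    result = list(incomes)
    result[giver] -= amount
    result[receiver] += amount
    return result


base = [100, 200, 200, 300, 300, 300]
progressive = transfer(base, giver=5, receiver=0, amount=50)  # richer to poorer
regressive = transfer(base, giver=0, receiver=5, amount=50)   # poorer to richer

for label, incomes in [("base", base), ("progressive", progressive), ("regressive", regressive)]:
    print(f"{label:12s} G = {gini(incomes):.3f}  C = {coefficient_of_variation(incomes):.3f}")
```

On this example, a single progressive transfer lowers both G and C, and a single regressive transfer raises both. Contradictory movements such as those in Table 4 can therefore only come from a combination of several transfers of both kinds.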

Note: Such transfers are redistribution processes because we are not considering an increase
in the economy’s wealth. If we were comparing inequality in the United States between
1960 and 2010 using a Lorenz curve, we would have to do it in proportion and not in
volume. And we would have to take into account inflation.
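
One way to read “in proportion and not in volume” is that the Gini coefficient only depends on income shares. The sketch below illustrates this under an assumption of mine: that inflation multiplies every income by the same factor. The factor 1.5 is hypothetical and is not an actual price index.

```python
def gini(incomes):
    """Sum of absolute differences over all pairs, divided by 2 * n^2 * mean."""
    n = len(incomes)
    mean = sum(incomes) / n
    return sum(abs(a - b) for a in incomes for b in incomes) / (2 * n * n * mean)


incomes_then = [100, 200, 200, 300, 300, 300]
price_factor = 1.5  # hypothetical: every nominal income is 50% higher
incomes_now = [price_factor * y for y in incomes_then]

print(f"G then = {gini(incomes_then):.4f}")
print(f"G now  = {gini(incomes_now):.4f}")
```

Both values coincide: multiplying all incomes by a common factor changes the volume of income but not its distribution. A change in G between two dates therefore reflects a change in shares, not the general rise in nominal incomes.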

III) Applicability of the Lorenz curve: basketball example

The Lorenz curve can actually measure any kind of inequality. One just has to redefine
what population and income are. Let’s focus on Los Angeles Lakers scorers. It is known
they rely a lot on Kobe Bryant who, so far, has been scoring 27.1 points per game in the
2009-2010 season. It might be interesting to highlight how important he is for his team. The
point is to evaluate how unequal the Lakers’ scoring distribution is. We take the ten leading
scorers of the team. Necessarily, population is 1 for every player. We just replace “income”
by “points per game”. Let’s check out the Excel output.
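
The same experiment can be sketched in Python. Because every player counts for one unit of population, the Gini formula simplifies to its unweighted form. Only the 27.1 points per game figure comes from the text; the nine other values below are made-up placeholders, not real Lakers statistics, and should be replaced by the actual averages.

```python
def gini(values):
    """Unweighted Gini coefficient: each player counts for one unit of population."""
    n = len(values)
    mean = sum(values) / n
    total_abs_diff = sum(abs(a - b) for a in values for b in values)
    return total_abs_diff / (2 * n * n * mean)


# Points per game for ten scorers. Only 27.1 is taken from the handout;
# the other nine numbers are hypothetical placeholders.
points_per_game = [27.1, 18.0, 14.0, 11.0, 10.0, 8.0, 6.0, 5.0, 4.0, 3.0]
balanced_team = [10.0] * 10  # benchmark: ten players scoring the same amount

print(f"G (one dominant scorer, hypothetical) = {gini(points_per_game):.3f}")
print(f"G (perfectly balanced team)           = {gini(balanced_team):.3f}")
```

The balanced benchmark gives G = 0, which is the perfect equality line; the more a team relies on one scorer, the farther its coefficient moves from zero.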

Table 5
It is clear that Kobe Bryant has a major impact on the Lakers’ season. But it is even more
interesting to compare this result to other teams’ statistics in order to derive an overall
conclusion. The 2009 NBA Finals opposed the Orlando Magic and the LA Lakers. The
Lakers won Game 1 100-75, and Bryant scored 40% of their points. Let’s run the same experiment
we just did to estimate whether the Magic rely as much on a single player as the Lakers do.
 
It turns out that the Magic’s scoring distribution is definitely less unequal than the Lakers’.
What could we conclude? That the Lakers are better? Or that Kobe Bryant is better than
the whole Orlando team? What we can be sure of is that if they were to play again, Kobe would be very
likely to be the best scorer. We are also pretty confident when we argue that the Lakers would
be seriously harmed if Kobe were injured or could not play for any reason. You can
try to run the experiment for the San Antonio Spurs and you would find that their
scoring distribution is even more equal than the Magic’s.

Now have a look at your college basketball team’s data and repeat the exercise once again.
Assume one player is very good and you find a very unequal distribution. For some reason,
your team is about to play the Lakers tonight. According to the Gini coefficient, would you
be ready to bet some money that you will win the game? Or would you bet that your best scorer
will end up with more field goals than Kobe? Probably not. Once again, we are facing
a comparison problem. We are not really talking about the same thing when comparing
the NCAA to the NBA. Players are different, the level of experience is completely different, and
even the rules are not the same! We could not seriously derive a
conclusion. Finally, one might even argue that if economics is a science, even an
imperfect one, sport is much more unpredictable.
 
 

[1] Ray, Debraj. Development Economics. Princeton, NJ: Princeton University Press, 1998.
[2] http://www.census.gov/prod/2009pubs/p60-236.pdf
