
Uncertainty: Part V

Stochastic Dominance: Applications to Inequality [1]

P.G. Babu
Indira Gandhi Institute of Development Research
Gen.A.K.Vaidya Marg
Goregaon (East), Mumbai 400 065.
email: babu@igidr.ac.in

Inequality

We can think of inequality measurement as an attempt to give meaning to comparisons of income distributions in terms of criteria derived from ethical principles, appealing mathematical formulas, or intuition. Once we think of it as comparing income distributions, we can draw upon the methods that are used to compare distributions in general; for example, we can apply general principles of distributional ranking such as Stochastic Dominance. The focus on income distribution gives rise to the fundamental question: what is an income distribution?
There are two ways in which we can think about income distributions. One is based on an individualist approach. If we assume that income captures all that we might want to know about an individual's economic status, then the income distribution can be represented as a list of persons and a list of their corresponding incomes. For example, if we have $n$ people in an economy, letting $i = 1, \ldots, n$ represent their names, and $x_i$, $i = 1, \ldots, n$, their incomes, the distribution can be represented simply by an $n$-dimensional vector:
$x = (x_1, x_2, \ldots, x_n)$.
In case you want to capture the idea that the income-receiving agents may not be merely individuals, but families with different household sizes, you could modify the above definition by adding population weights, in which case the distribution would become:
$((w_1, x_1), (w_2, x_2), \ldots, (w_n, x_n))$.
In the latter case, each income recipient is described not by a single variable but by a pair (weight, income).
The other way in which we can think about an income distribution is to use the general statistical concept of a probability distribution. Think of a parade of people ordered according to their incomes: the lowest income person comes first in the parade, and so on. If we order them this way, we can measure proportions of the population along the horizontal axis and income along the vertical axis. That is, the population is arranged in ascending order of income. If you see Figure 5, you will notice that the point $x_{0.2}$ corresponds to the person who appears exactly twenty percent of the way along the parade of people, and $x_{0.8}$ gives the income of the person who appears eighty percent of the way along the parade of the entire population. In this way of representing the income distribution, we gain the advantage of borrowing concepts from Statistics in our inequality analysis.

[1] These are merely lecture notes. No originality, other than the organization of the material, is claimed.
Both of the above methods are useful. For example, if we are more interested in developing individualistic welfare criteria, we would use the first approach. However, given our intent to develop a general theory based on distributions, we focus more on the latter, viz., the distributional approach to inequality measurement.
Needless to state, a good definition of inequality requires us to come up with implementable definitions of income and of the income recipient. If we need to convert the standard accounting measure of total family income into a measure that captures individual welfare, we need to introduce an equivalence scale, which defines the rate of exchange between conventionally defined accounting income and an adjusted notion of equivalised income that works as a money metric of utility. For the moment, we take it that our notion of income incorporates all such concerns.
With this preamble, let us turn to the details.

Specific Inequality Comparisons

One way of illustrating inequality in the distribution of income is to divide the total population into several equal-sized income classes, or so-called fractiles, from the poorest to the richest, and then to evaluate the percentage of income accruing to each fractile or income class. Consider the following example.
Table 1:
Income Class (Fifth of the Population)    Percentage Share in Year X    Percentage Share in Year Y
Bottom                                     6.8                           6.1
Second                                    13.2                          12.5
Third                                     18.5                          18.1
Fourth                                    24.5                          24.5
Top                                       37.0                          38.8

The top fifth gained at the expense of the lowest three income classes. The distribution in Year X has less inequality than the distribution in Year Y, as the share of total income held by the poorest classes is larger. Of course, this measure is somewhat crude, as it ignores the distribution of income within each income class or fractile. A way around this is given by the Lorenz curve, formulated by Max Otto Lorenz in 1905. It graphs the cumulative share of total income accruing to each cumulative share of the total population, when incomes are ordered from the poorest to the richest. As the share of the population $p$ varies over $[0, 1]$, the Lorenz curve indicates the share $L(p)$ of total income received by the lowest $p$ of the population.

If all incomes are equal, the lowest p receives exactly p of the total income: its Lorenz curve
will be the diagonal from (0, 0) to (1, 1).
INSERT FIGURE ABOUT HERE.
As income becomes less equal, the associated Lorenz curve bows outward, away from the diagonal. Distribution 1 is said to have less inequality than Distribution 2 in the diagram below by the Lorenz criterion if $L_1(p) \ge L_2(p)$ for all $p \in [0, 1]$, with strict inequality for some $p$. If two Lorenz curves intersect, then they are not comparable by the Lorenz criterion.
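To make the Lorenz criterion concrete, here is a minimal Python sketch that applies it to the quintile shares of Table 1. The function names `lorenz_points` and `lorenz_dominates` are illustrative choices, not part of the original notes, and the check is made only at the five quintile points, so it says nothing about what happens inside each income class.

```python
from itertools import accumulate

# Quintile income shares (percent) taken from Table 1.
shares_x = [6.8, 13.2, 18.5, 24.5, 37.0]
shares_y = [6.1, 12.5, 18.1, 24.5, 38.8]

def lorenz_points(shares):
    """Cumulative income shares L(p) at p = 0.2, 0.4, ..., 1.0."""
    total = sum(shares)
    return [c / total for c in accumulate(shares)]

def lorenz_dominates(l1, l2, tol=1e-12):
    """True if l1 >= l2 at every point, with strict inequality at some point."""
    weakly = all(a >= b - tol for a, b in zip(l1, l2))
    strictly = any(a > b + tol for a, b in zip(l1, l2))
    return weakly and strictly

L_x = lorenz_points(shares_x)
L_y = lorenz_points(shares_y)
print([round(v, 3) for v in L_x])
print([round(v, 3) for v in L_y])
print("Year X Lorenz-dominates Year Y:", lorenz_dominates(L_x, L_y))
```

At these points the Year X curve lies on or above the Year Y curve, which agrees with the reading of Table 1 given earlier.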
INSERT FIGURE ABOUT HERE.
In the above figure, $L_3$ has more inequality than $L_1$ and $L_2$. But $L_1$ and $L_2$ are not comparable. The Lorenz curve offers only a partial ordering of income distributions with respect to inequality. To obtain a complete ordering, you need a numerical inequality measure. The Gini Coefficient is the most widely used numerical inequality measure, partly due to its interpretation in terms of the Lorenz curve. Consider the area $A$ between the diagonal and the Lorenz curve of a given distribution, as below.
INSERT FIGURES ABOUT HERE.
When all incomes are equal, the Lorenz curve coincides with the diagonal, and the area $A$ is zero. At the other extreme, when only a single income is positive, the area tends to $B$, the entire area below the diagonal. The Gini coefficient is the ratio of $A$ to $B$ or, equivalently, $2A$ (given that $B = 0.5$). The potential range of the Gini coefficient is between 0 and 1; typical values are between 0.20 and 0.60.
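As a rough illustration of the ratio $A/B$, the sketch below approximates the Gini coefficient for the two years of Table 1 by joining the quintile points of the Lorenz curve with straight lines (the trapezoid rule). The function name `gini_from_shares` is an assumption made for this example. Since straight segments ignore inequality within each class, the result should be read as a lower bound on the true coefficient.

```python
def gini_from_shares(shares):
    """Approximate Gini = A / B from equal-sized class shares (trapezoid rule)."""
    total = sum(shares)
    width = 1.0 / len(shares)          # each class is the same share of the population
    area_under_lorenz = 0.0
    cumulative = 0.0
    for s in shares:
        previous = cumulative
        cumulative += s / total
        area_under_lorenz += width * (previous + cumulative) / 2.0
    area_a = 0.5 - area_under_lorenz   # area between the diagonal and the Lorenz curve
    area_b = 0.5                       # entire area below the diagonal
    return area_a / area_b

gini_x = gini_from_shares([6.8, 13.2, 18.5, 24.5, 37.0])
gini_y = gini_from_shares([6.1, 12.5, 18.1, 24.5, 38.8])
print(round(gini_x, 4), round(gini_y, 4))
```

Under these assumptions the values come out at roughly 0.29 for Year X and 0.31 for Year Y, both inside the typical range mentioned above.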
You can also consider the coefficient of variation. If we consider the distribution of income as a probability distribution, where the probability of a specific income is the proportion of the population holding it, we may define the mean, variance and standard deviation as usual. Then, the coefficient of variation is simply the standard deviation divided by the mean, while its square is the variance over the squared mean. Measuring income in Euros or in thousands of Euros leaves the coefficient of variation unchanged, while this is not the case for the variance. The problem with the coefficient of variation is its transfer neutrality: a redistribution of income at the lower end of the income distribution has the same effect as an equivalent redistribution at the upper end. One would expect inequality measures to weigh transfers at the lower end more heavily.
To quickly recall: the Arithmetic Mean is given by $\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$; the Variance is given by $\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n - 1}$; the Standard Deviation, denoted by $\sigma$, is the square root of $\sigma^2$; and the Coefficient of Variation, CV, is given by $\sigma / \mu$.
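The two properties of the coefficient of variation discussed above can be checked on a small made-up sample. The following sketch uses Python's standard `statistics` module, whose `stdev` and `variance` use the $n - 1$ divisor; the income figures and the helper name `coefficient_of_variation` are illustrative assumptions.

```python
import math
import statistics

def coefficient_of_variation(incomes):
    """Sample standard deviation (n - 1 divisor) divided by the mean."""
    return statistics.stdev(incomes) / statistics.mean(incomes)

incomes_eur = [10000, 20000, 30000, 40000, 50000, 60000]
incomes_keur = [x / 1000 for x in incomes_eur]   # same incomes in thousands of Euros

cv_eur = coefficient_of_variation(incomes_eur)
cv_keur = coefficient_of_variation(incomes_keur)
var_ratio = statistics.variance(incomes_eur) / statistics.variance(incomes_keur)

# Transfer 5000 from a richer to a poorer person with the same income gap,
# once at the lower end and once at the upper end of the distribution.
low_transfer = [15000, 15000, 30000, 40000, 50000, 60000]
high_transfer = [10000, 20000, 30000, 40000, 55000, 55000]
cv_low = coefficient_of_variation(low_transfer)
cv_high = coefficient_of_variation(high_transfer)

print("CV unchanged by units:", math.isclose(cv_eur, cv_keur))
print("variance ratio:", var_ratio)
print("same effect at both ends:", math.isclose(cv_low, cv_high))
```

Both transfers lower the coefficient of variation by the same amount, which is exactly the transfer neutrality that one might object to.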

Instead of thinking in terms of specific inequality indexes as above, let us think about general
principles.

General Inequality Measures

Let us quickly look at the key concepts that we need in order to compare income distributions in the context of measuring inequality. An inequality ordering, denoted by $\succeq_I$, is a complete and transitive binary relation on $D$, the set of all distributions. If it is continuous, we can represent it by a function $I : D \to \mathbb{R}$. A Social Welfare Function (SWF), $W$, can be defined as $W : D \to \mathbb{R}$. The strict ordering and indifference relations are defined in the usual manner. The properties of $I$ and $W$ are determined by the fundamental distributional axioms or ethical principles. Let us turn to a quick recap of the axioms now.
Axiom 1: Anonymity. $(x_1, x_2, \ldots, x_n) \sim_I (x_2, x_1, x_3, \ldots, x_n) \sim_I (x_1, x_3, x_2, \ldots, x_n) \ldots$
This axiom states that permutations of the names or labels of persons are regarded as distributionally equivalent. It implies that we only make use of the information about the income variable and not any other characteristic which might be discernible in a sample.
Axiom 2: The Population Principle (Dalton). $(x_1, x_2, \ldots, x_n) \sim_I (x_1, x_1, x_2, x_2, \ldots, x_n, x_n) \sim_I (x_1, x_1, x_1, x_2, x_2, x_2, \ldots, x_n, x_n, x_n) \ldots$
The Population Principle says that an income distribution is to be regarded as distributionally equivalent to a distribution formed by replications of it. That is, the ordering is immune to replication.
Axiom 3: Principle of Transfers (Pigou-Dalton). $G \succ_I F$ if distribution $G$ can be obtained from $F$ by a mean preserving spread.
How do we understand this axiom? Consider an arbitrary distribution, say, $x_A = (x_1, \ldots, x_i, \ldots, x_j, \ldots, x_n)$, and a number $\delta$ such that $0 < \delta < x_i \le x_j$; from $x_A$ let us form another distribution $x_B = (x_1, \ldots, x_i - \delta, \ldots, x_j + \delta, \ldots, x_n)$, that is, transfer $\delta$ from the poorer person $i$ to the richer person $j$. The axiom of transfers then ranks $x_B$ as more unequal than $x_A$. The rest of the components of $x_A$ and $x_B$ are kept identical.
Axiom 4: Monotonicity. $G \succ_W F$ if distribution $G$ can be obtained from $F$ by a rightward translation of some probability mass.
That is, let $x_A = (x_1, \ldots, x_i, \ldots, x_n)$ and, for some $\delta > 0$, get $x_B$ from $x_A$ as $x_B = (x_1, \ldots, x_i + \delta, \ldots, x_n)$. Then this axiom says that welfare is higher in $x_B$ than in $x_A$. You can also think of a uniform rightward translation of the whole distribution instead of the Monotonicity axiom.
Axiom 5: Scale Invariance. If all incomes go up or down by a scalar multiple, then the ordering is unaffected.
That is, for $F, G \in D$, if $G \succeq_I F$, then $G(x/k) \succeq_I F(x/k)$ for a scalar multiple $k \in \mathbb{R}_+$. You could also think of translation invariance instead of scale invariance.
Axiom 6: Decomposability. Given $F, G, K \in D$ and $\alpha \in [0, 1]$, $G \succeq_I F$ implies $(1 - \alpha)G + \alpha K \succeq_I (1 - \alpha)F + \alpha K$.
If we combine distribution $K$ in the same way with both $G$ and $F$, where all the distributions $F, G, K$ have the same mean, the original ordering should still be preserved. This is analogous to the independence axiom of the N-M utility theory.
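To get a feel for the first few axioms, the sketch below checks numerically how one particular index behaves under a permutation, a replication, a proportional scaling and a regressive transfer. The index used is the Gini coefficient computed as a normalized mean absolute difference; that choice, the function name `gini` and the sample vector are assumptions made for this example, and a check on a single vector is of course not a proof.

```python
import math

def gini(incomes):
    """Gini coefficient as the normalized mean absolute difference of incomes."""
    n = len(incomes)
    mean = sum(incomes) / n
    total_abs_diff = sum(abs(a - b) for a in incomes for b in incomes)
    return total_abs_diff / (2 * n * n * mean)

x = [10.0, 20.0, 30.0, 40.0]

permuted = [20.0, 10.0, 40.0, 30.0]     # Axiom 1: relabel the persons
replicated = x + x                      # Axiom 2: replicate the population once
scaled = [3.0 * v for v in x]           # Axiom 5: multiply all incomes by k = 3
delta = 5.0
regressive = [10.0 - delta, 20.0, 30.0, 40.0 + delta]   # Axiom 3: poorest gives delta to richest

print("anonymity:", math.isclose(gini(permuted), gini(x)))
print("population principle:", math.isclose(gini(replicated), gini(x)))
print("scale invariance:", math.isclose(gini(scaled), gini(x)))
print("transfer raises inequality:", gini(regressive) > gini(x))
```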
Dalton argued that underlying any inequality measure, there ought to be some concept of social
welfare. Let us follow him to assume an additively separable and symmetric function of individual
incomes; then we would rank distributions according to:
$W = \int_a^b u(x) f(x)\,dx = \int_a^b u(x)\,dF(x)$,
where $u : X \to \mathbb{R}$ is the evaluation function of incomes.


Note that the additively separable form of the Social Welfare Function (SWF) W implies the
decomposability of the inequality ordering.
In order to arrive at any ordering of distributions, we have to make some assumption about the
form of u(x). Let us consider a class of u(x) that are increasing and concave. You can of course see
the parallel to what you have been doing in N-M utility theory and stochastic dominance. Recall
our stochastic dominance ideas.
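For a finite list of incomes, the integral defining $W$ reduces to the average of $u(x_i)$. The sketch below evaluates it for two increasing and concave evaluation functions, the logarithm and the square root, which are illustrative choices and not prescribed by the notes. The second income vector is obtained from the first by a regressive transfer that keeps the mean unchanged.

```python
import math

def welfare(incomes, u):
    """W = average of u(x_i), the discrete analogue of the integral of u(x) dF(x)."""
    return sum(u(x) for x in incomes) / len(incomes)

x_a = [10.0, 20.0, 30.0, 40.0]
x_b = [5.0, 20.0, 30.0, 45.0]    # x_a after the poorest transfers 5 to the richest

for name, u in [("log", math.log), ("sqrt", math.sqrt)]:
    print(name, "ranks x_a above x_b:", welfare(x_a, u) > welfare(x_b, u))

# A linear u only looks at the mean, so it cannot tell the two apart.
print("linear u is indifferent:", welfare(x_a, lambda x: x) == welfare(x_b, lambda x: x))
```

Both concave functions rank the more equal vector higher, while a linear $u$ is indifferent, which is the same pattern you have seen with risk aversion in the N-M setting.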

3.1 First Order Distributional Dominance

For all $F \in D$, and for all $0 \le q \le 1$, the quantile functional is defined by:
$Q(F; q) = \inf\{x \mid F(x) \ge q\} = x_q$.
INSERT FIGURE ABOUT HERE.
For any distribution of income $F$, the graph of $Q$ describes the parade of incomes in ascending order, where $x_{0.2}$ gives the income of the person who appears exactly twenty percent of the way along the parade of people. It helps to think of the above diagram as the inverse of the one that appears below.
INSERT FIGURE ABOUT HERE.
Theorem 1
$G \succeq_Q F$ if and only if $W(G) \ge W(F)$ for all $W \in \mathcal{W}_1$, where $\mathcal{W}_1$ is the class of increasing SWFs.
If each quantile in distribution $G$ is no less than the corresponding quantile in distribution $F$, and at least one quantile is strictly greater, then distribution $G$ will be assigned a higher welfare level by every SWF in the class $\mathcal{W}_1$ (that is, the class of increasing evaluation functions of income, viz., $u(\cdot)$).
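The quantile functional and the comparison in Theorem 1 can be sketched for empirical distributions as follows. The helper names `quantile` and `first_order_dominates` and the two samples are assumptions for illustration; the dominance check further assumes samples of equal size, so that comparing the sorted incomes position by position is the same as comparing quantiles.

```python
def quantile(incomes, q):
    """Q(F; q) = inf{x : F(x) >= q} for the empirical distribution, 0 < q <= 1."""
    if not 0 < q <= 1:
        raise ValueError("this sketch handles 0 < q <= 1 only")
    ordered = sorted(incomes)
    n = len(ordered)
    for k, x in enumerate(ordered, start=1):
        if k / n >= q:      # the k lowest incomes make up a share k/n of the population
            return x

def first_order_dominates(g, f):
    """Every quantile of g is >= that of f, and at least one is strictly greater."""
    if len(g) != len(f):
        raise ValueError("this sketch assumes samples of equal size")
    pairs = list(zip(sorted(g), sorted(f)))
    return all(a >= b for a, b in pairs) and any(a > b for a, b in pairs)

f = [10, 20, 30, 40, 50]
g = [12, 20, 35, 40, 60]

print(quantile(f, 0.2), quantile(f, 0.8))
print("g dominates f:", first_order_dominates(g, f))
print("f dominates g:", first_order_dominates(f, g))
```

Here `quantile(f, 0.2)` picks out the income of the person one fifth of the way along the parade, in line with the description of $x_{0.2}$ above.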
As we have seen in the first order stochastic dominance part of the lectures, this criterion has two limitations. First, in practical applications, it is very often the case that neither distribution first order stochastically dominates the other. Second, it may not incorporate all the standard principles of social welfare analysis; for example, it does not incorporate the principle of transfers. For this reason, it is good to introduce the idea of Second Order Distributional Dominance.

3.1.1 Quick Recall: Mean Preserving Spread

Call $s$ a Mean Preserving Spread (MPS) if
$\int_a^b s(x)\,dx = \int_a^b x\,s(x)\,dx = 0$.
Suppose we obtain the probability density function $g(x)$ from another density function, say, $f(x)$, by the Mean Preserving Spread $s(x)$; that is:
$g(x) = f(x) + s(x)$.
We will then get:
$\int_a^b g(x)\,dx = \int_a^b f(x)\,dx + \int_a^b s(x)\,dx = 1$.
Also note carefully that
$\int_a^b x\,g(x)\,dx = \int_a^b x\,f(x)\,dx + \int_a^b x\,s(x)\,dx = \int_a^b x\,f(x)\,dx$,
given the properties of the MPS.
Define $S(x) = \int_a^x s(u)\,du$. Then, we have the following:
$S(b) = \int_a^b s(u)\,du = 0 = S(a)$.
There exists $z \in (0, 1)$ such that $S(x) \ge 0$ if $x \le z$, and $S(x) \le 0$ for $x > z$.
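A discrete analogue may help fix these properties. In the sketch below, integrals are replaced by sums over three income levels and probabilities are held as exact fractions; the particular numbers are made up for illustration. The spread $s$ moves probability mass from the middle income level to the two tails.

```python
from fractions import Fraction as Fr
from itertools import accumulate

points = [1, 2, 3]                        # income levels
f = [Fr(1, 4), Fr(1, 2), Fr(1, 4)]        # original probabilities
s = [Fr(1, 10), Fr(-1, 5), Fr(1, 10)]     # spread: mass moves from the middle to the tails
g = [fi + si for fi, si in zip(f, s)]     # g = f + s

total_s = sum(s)                                          # analogue of the integral of s(x)
first_moment_s = sum(x * si for x, si in zip(points, s))  # analogue of the integral of x s(x)
mean_f = sum(x * p for x, p in zip(points, f))
mean_g = sum(x * p for x, p in zip(points, g))
S = list(accumulate(s))                                   # running sum, the analogue of S(x)

print("s sums to zero:", total_s == 0)
print("s preserves the mean:", first_moment_s == 0 and mean_f == mean_g)
print("g is a probability distribution:", sum(g) == 1 and min(g) >= 0)
print("S:", S)
```

The running sum $S$ is first non-negative and then non-positive, ending at zero, which is the single-crossing pattern stated above.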
With this quick recall of the idea of Mean Preserving Spread, let us get back to our main argument.

3.2 Second Order Distributional Dominance

Let us begin with the following definition.

Definition: For all $F \in D$, and for all $0 \le q \le 1$, the cumulative income functional is defined by:
$C(F; q) = \int_a^{Q(F; q)} x\,dF(x)$.
Then, note carefully that $C(F; 0) = 0$ and $C(F; 1) = \mu(F)$. You can take the income to range from $a$ to $b$, as we have done before in the stochastic dominance section. The graph of $C(F; q)$ against $q$ gives the Generalized Lorenz curve.
INSERT FIGURE ABOUT HERE.
Theorem 2
For all $F, G \in D$, $G \succeq_C F$ if and only if $W(G) \ge W(F)$ for all $W \in \mathcal{W}_2$, that is, the class of SWFs with $u(\cdot)$ increasing and concave.
INSERT FIGURE ABOUT HERE.
The distribution $G$ second order dominates $F$ in the above diagram. The Generalized Lorenz curve (GLC) is a fundamental tool for drawing inferences about welfare from individual income data.
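For individual income data, the ordinates of the Generalized Lorenz curve at $q = k/n$ are simply the running sums of the sorted incomes divided by $n$. A minimal sketch under that reading follows; the function names and the two samples are illustrative assumptions, and the dominance check assumes samples of equal size. The samples are chosen so that neither first order dominates the other.

```python
import math
from itertools import accumulate

def generalized_lorenz(incomes):
    """Ordinates C(F; k/n) = (sum of the k lowest incomes) / n, for k = 0, ..., n."""
    n = len(incomes)
    return [0.0] + [c / n for c in accumulate(sorted(incomes))]

def gl_dominates(g, f, tol=1e-12):
    """Weak generalized Lorenz dominance of g over f (equal sample sizes assumed)."""
    if len(g) != len(f):
        raise ValueError("this sketch assumes samples of equal size")
    return all(a >= b - tol
               for a, b in zip(generalized_lorenz(g), generalized_lorenz(f)))

def welfare(incomes, u):
    """Average of u(x_i) over the sample."""
    return sum(u(x) for x in incomes) / len(incomes)

f = [10, 20, 30, 40]
g = [15, 18, 32, 36]    # second-lowest income is below that of f: no first order dominance

print(generalized_lorenz(f))
print(generalized_lorenz(g))
print("g dominates f:", gl_dominates(g, f))
print("welfare agrees (sqrt, log):",
      welfare(g, math.sqrt) > welfare(f, math.sqrt),
      welfare(g, math.log) > welfare(f, math.log))
```

The last ordinate of each curve equals the mean of the sample, matching $C(F; 1) = \mu(F)$.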

3.3 Lorenz Curve

A closely linked concept is the traditional Lorenz curve, also known as the Relative Lorenz Curve:
$L(F; q) = \frac{C(F; q)}{\mu(F)}$.
The Lorenz Curve is the graph of $L(F; q)$ against $q$.


INSERT FIGURE ABOUT HERE.
In the above figure, it is clear that the income share of the bottom $100q$ percent of the population must be higher in distribution $G$ than in $F$, whatever the value of $q$.
Theorem 3
For all $F, G \in D$, the class of all distributions with the same mean, $G \succeq_L F$ if and only if $W(G) \ge W(F)$ for all $W \in \mathcal{W}_2$, that is, the class of SWFs with $u' > 0$, $u'' < 0$.
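Theorem 3 can be illustrated on two samples with the same mean. In the sketch below, the Lorenz ordinates are the running sums of sorted incomes divided by total income; the helper names, the samples and the two concave evaluation functions are assumptions for this example, and two functions are of course only a spot check of a statement about the whole class.

```python
import math
from itertools import accumulate

def lorenz(incomes):
    """Ordinates L(F; k/n) = C(F; k/n) / mean, for k = 0, ..., n."""
    total = sum(incomes)
    return [0.0] + [c / total for c in accumulate(sorted(incomes))]

def welfare(incomes, u):
    """Average of u(x_i) over the sample."""
    return sum(u(x) for x in incomes) / len(incomes)

f = [5.0, 15.0, 30.0, 50.0]     # mean 25
g = [10.0, 20.0, 30.0, 40.0]    # mean 25, incomes closer together

g_above_f = all(a >= b for a, b in zip(lorenz(g), lorenz(f)))

print(lorenz(f))
print(lorenz(g))
print("Lorenz curve of g lies on or above that of f:", g_above_f)
print("welfare agrees (sqrt, log):",
      welfare(g, math.sqrt) >= welfare(f, math.sqrt),
      welfare(g, math.log) >= welfare(f, math.log))
```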
As we know, when Lorenz curves intersect, you cannot rank the distributions by Second Order Stochastic Dominance. Then, you need to impose additional restrictions on the class of SWFs. For example, you might want to think of Kolm's Principle of Diminishing Transfers, viz., that a small transfer from an individual with income $x$ to one with income $x - \delta$ should have a greater impact on inequality the lower the income $x$ is located in the distribution. This is analogous to the notion of Decreasing Absolute Risk Aversion (DARA) that we have seen while motivating Third Order Stochastic Dominance. Hence, we can go on to use third order stochastic dominance in such cases.
At this point, let us pause to see which axioms the Lorenz curve satisfies. It clearly satisfies the axioms of anonymity and the population principle, as neither permutation nor replication alters the underlying cumulative distribution function. Proportionate changes in income shift and stretch the cumulative distribution function horizontally, but the Lorenz curve normalizes such changes by the mean of the income distribution, and hence the resulting Lorenz curve is unchanged. Also, a regressive income transfer is a mean preserving spread, and the distribution obtained by a mean preserving spread is ranked as more unequal than the original one. Hence, the Lorenz curve satisfies the Pigou-Dalton Principle of Transfers. We can therefore draw the following inference.
Result: The Lorenz curve measure satisfies the axioms of Anonymity, Population Principle, Homogeneity (Scale Invariance), and the Principle of Transfers.
Given the immunity to replication and proportionate changes in income, we call any measure that satisfies the above four axioms a relative inequality measure, or a measure of relative inequality.
We can define an inequality measure $I : D \to \mathbb{R}$ as Lorenz Consistent if, for all $F, G \in D$,
$G \succ_L F$ implies $I(F) > I(G)$, and
$G \succeq_L F$ implies $I(F) \ge I(G)$.
Needless to state, Lorenz curve itself is Lorenz consistent. Which other measures satisfy this
property? We can characterize the class of Lorenz consistent measures. Let us state it without the
proof.

Result: An inequality measure $I : D \to \mathbb{R}$ is Lorenz consistent if and only if it satisfies the axioms of Anonymity, Population Principle, Homogeneity, and the Principle of Transfers.
One direction is quite easy to see. Let $I : D \to \mathbb{R}$ be a measure of inequality. Suppose that $I$ is Lorenz consistent. We know that the Lorenz ordering $\succeq_L$ satisfies all the above axioms. Hence, by Lorenz consistency, the measure $I$ too satisfies all the above axioms. The reverse is slightly more involved, which we skip.
This result underscores the point that the class of Lorenz consistent inequality measures is exactly the same as the class of relative inequality measures. This leads us to conclude that:
Result: $I(F) > I(G)$ for every relative inequality measure $I$ if and only if $G \succ_L F$; similarly, $I(F) \ge I(G)$ for every relative inequality measure $I$ if and only if $G \succeq_L F$.
However, it is in the nature of the general ranking principles that in many practical situations they tend to yield an indecisive answer. In such cases, the specific indices come in handy. Let us take a quick look at two such measures. The most popular one is the Gini coefficient.

3.4 Gini Coefficient

We can define it in many ways. The first way is:
$G(F) = \frac{1}{2\mu(F)} \int\int |x - x'|\,dF(x)\,dF(x')$.
This definition is the normalized average absolute difference between all pairs of incomes in the population. It captures the idea of average distance between incomes in the population according to the $L_1$ metric on $\mathbb{R}^n$: $\sum_{j=1}^{n} |x_j - x'_j|$ for $x, x' \in \mathbb{R}^n$.
The other way to define the Gini Coefficient is:
$G(F) = 1 - 2 \int_0^1 L(F; q)\,dq$.
That is, the Gini coefficient is twice the area between the Lorenz curve and the diagonal; equivalently, it is the normalized area between the Lorenz curve and the diagonal.
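The two definitions can be compared on a small sample. In the sketch below, the double integral becomes an average over all ordered pairs of incomes, and the integral of the Lorenz curve is taken under the piecewise linear curve through the points $(k/n, L(F; k/n))$; both function names and the sample are illustrative assumptions.

```python
import math
from itertools import accumulate

def gini_pairwise(incomes):
    """First definition: mean absolute difference over all pairs, divided by 2 * mean."""
    n = len(incomes)
    mean = sum(incomes) / n
    return sum(abs(a - b) for a in incomes for b in incomes) / (2 * n * n * mean)

def gini_lorenz(incomes):
    """Second definition: 1 - 2 * (area under the piecewise linear Lorenz curve)."""
    n = len(incomes)
    total = sum(incomes)
    lorenz = [0.0] + [c / total for c in accumulate(sorted(incomes))]
    area = sum((lorenz[k - 1] + lorenz[k]) / 2 / n for k in range(1, n + 1))
    return 1 - 2 * area

x = [1.0, 2.0, 3.0, 4.0]
print(gini_pairwise(x), gini_lorenz(x))
print("definitions agree:", math.isclose(gini_pairwise(x), gini_lorenz(x)))
```

For this sample both routes give a Gini coefficient of 0.25, up to floating point rounding.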

3.5 Equally Distributed Equivalent

Let point $F$ in the diagram below represent an income distribution in a two-person economy. Then, we can find the mean income $\mu$ as the abscissa of the point $M$ where the 45 degree line through $F$ intersects the equality ray. The equally distributed equivalent, $\xi$, is the abscissa of the point $E$ where the $W$-contour through $F$ intersects the equality ray.
INSERT FIGURE ABOUT HERE.
The farther $F$ is on the constant total income line from the perfect equality point $M$, the lower is $\xi$. The normalized gap between $\xi$ and $\mu$ then provides a natural basis for an inequality index:
$I_A(F) = 1 - \frac{\xi(F)}{\mu(F)}$.

For any given income distribution, the more sharply convex to the origin the $W$-contour is, the greater is the gap between $\xi$ and $\mu$. In the extreme case, you will get the following, where the SWF contour is L-shaped.
INSERT FIGURE ABOUT HERE.
This value, $\xi(F)$, is interpreted as reflecting aversion to inequality. Put this way, there is a strong link between the ideas here and the Economics of Uncertainty. For example, the equally distributed equivalent, viz., $\xi(F)$, is nothing but the Certainty Equivalent in the N-M theory.
Of course, the equally distributed equivalent income level will be different for different social
welfare functions, and so the inequality measure will be dependent on the choice of the social welfare
function.
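As a small numerical companion to the two-person diagram, the sketch below computes the equally distributed equivalent and the index $I_A$ for an additively separable welfare function, so that $\xi$ solves $u(\xi) = \frac{1}{n} \sum_i u(x_i)$. The square root evaluation function, the two incomes and the function names are assumptions made for this example only; a different $u$ would give a different value.

```python
import math

def equally_distributed_equivalent(incomes, u, u_inverse):
    """Income that, if received by everyone, yields the same welfare as the actual incomes."""
    average_utility = sum(u(x) for x in incomes) / len(incomes)
    return u_inverse(average_utility)

def inequality_index(incomes, u, u_inverse):
    """I_A = 1 - (equally distributed equivalent) / (mean income)."""
    mean = sum(incomes) / len(incomes)
    return 1 - equally_distributed_equivalent(incomes, u, u_inverse) / mean

def square(w):
    """Inverse of the square root evaluation function on non-negative values."""
    return w * w

x = [1.0, 4.0]                                                # a two-person economy
xi = equally_distributed_equivalent(x, math.sqrt, square)     # ((1 + 2) / 2) ** 2
index = inequality_index(x, math.sqrt, square)                # 1 - xi / 2.5

print(xi, index)
```

Here $\xi = 2.25$ lies below the mean income of $2.5$, exactly as a certainty equivalent lies below the expected value for a risk averse individual.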
Poverty Measures
Even though we will not go into poverty measures, let us understand certain subtle differences between inequality measurement and poverty measurement. Any practical approach to poverty has to specify a poverty line. This could be a unique, exogenously given value or, alternatively, some functional of the distribution $F$, or a set of possible values. Given such a value, there is a clear partition of the population into poor and non-poor. A simple poverty measure you can think of is the Head Count Ratio, viz., the proportion of individuals with incomes below the poverty line. Now, you cannot use the Anonymity axiom, for example, in the poverty context without additional changes. You may want to apply the anonymity axiom only to the partition of the non-poor: a permutation of the non-poor alone should leave the poverty index unaltered. With such suitable modifications, you would then be able to apply the stochastic dominance structure to poverty comparisons. However, that is beyond our scope. For those interested in poverty measures, a good place to begin would be: Atkinson, A.B., 1987, On the Measurement of Poverty, Econometrica, Vol. 55, pp. 749-764.
