Data Envelopment Analysis Pattern
Robert M. Hayes
2005
Overview
Introduction
Data Envelopment Analysis
DEA Models
Extensions to include a priori Valuations
Strengths and Weaknesses of DEA
Implementation of DEA
The Example of Libraries
Annals of Operations Research 66
Annals of Operations Research 73
Introduction
Utility Functions
Cost/Effectiveness
Interpretation for Libraries
Utility Functions
A fundamental requirement in applying operations research models is the
identification of a "utility function" which combines all variables relevant to
a decision problem into a single variable which is to be optimized.
Underlying the concept of a utility function is the view that it should
represent the decision-maker's perceptions of the relative importance of the
variables involved rather than being regarded as uniform across all
decision-makers or externally imposed.
The problem, of course, is that the resulting utility functions may bear no
relationship to each other and it is therefore difficult to make comparisons
from one decision context to another. Indeed, not only may it not be possible
to compare two different decision-makers but it may not be possible to
compare the utility functions of a single decision-maker from one context to
another.
Cost/Effectiveness
A traditional way to combine variables in a utility
function is to use a cost/effectiveness ratio, called an
"efficiency" measure. It measures utility by the "cost
per unit produced". On the surface, that would appear
to make comparison between two contexts possible by
comparing the two cost/effectiveness ratios. The
problem, though, is that two different decision-makers
may place different emphases on the two variables.
Cost/Effectiveness
It also must be recognized that effectiveness will usually entail
consideration of a number of products and services, and costs a
number of sources of cost. Cost/effectiveness measurement
requires combining the sources of cost into a single measure of cost
and the products and services into a single measure of effectiveness.
Again, the problem of different emphases of importance must be
recognized. This is especially the case for the several measures of
effectiveness. But it may also be the case with the several measures
of cost, since some costs may be regarded as more important than
others even though they may all be measured in dollars. When
some costs cannot be measured in dollars, the problem is
compounded.
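The effect of different emphases can be sketched with a small calculation. The two units, their costs and outputs, and both sets of weights below are hypothetical and serve only to illustrate how a weighted cost/effectiveness ratio can reverse a ranking.

```python
def cost_effectiveness(costs, outputs, cost_weights, output_weights):
    """Weighted cost per weighted unit of output (lower is better)."""
    total_cost = sum(w * c for w, c in zip(cost_weights, costs))
    total_output = sum(w * o for w, o in zip(output_weights, outputs))
    return total_cost / total_output

# Hypothetical units: costs = (staff, materials), outputs = (loans, reference answers).
UNITS = {
    "A": {"costs": (100.0, 50.0), "outputs": (400.0, 50.0)},
    "B": {"costs": (100.0, 50.0), "outputs": (200.0, 150.0)},
}

# Two hypothetical decision-makers who value the outputs differently.
weights_1 = (1.0, 1.0)  # treats both outputs alike
weights_2 = (1.0, 4.0)  # values a reference answer four times as much as a loan

def preferred(output_weights, cost_weights=(1.0, 1.0)):
    """Name of the unit with the lowest weighted cost per weighted output."""
    scores = {
        name: cost_effectiveness(u["costs"], u["outputs"], cost_weights, output_weights)
        for name, u in UNITS.items()
    }
    return min(scores, key=scores.get)

print(preferred(weights_1))
print(preferred(weights_2))
```

With equal output weights unit A has the lower ratio; with the second set of weights unit B does, although neither unit's data has changed.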
Cost/Effectiveness
More generally, instead of costs and effectiveness, the
variables may be identified as "input" and "output".
The efficiency ratio is then no longer characterized as
cost/effectiveness but as "output/input", but the issues
identified above are the same.
Graphical Illustration
To illustrate, consider seven DMUs which each have
one input and one output: L1 = (2,2), L2 = (3,5), L3 =
(6,7), L4 = (9,8), L5 = (5,3), L6 = (4,1), L7 = (10,7).
[Figure: the seven DMUs L1 to L7 plotted with Input on the horizontal axis and Output on the vertical axis.]
Graphical Illustration
DEA identifies the units in the comparison set which lie at
the top and to the left, as represented by L1, L2, L3, and
L4. These units are called the efficient units, and the line
connecting them is called the "envelopment surface"
because it envelops all the cases.
DMUs L5 through L7 are not on the envelopment surface
and thus are evaluated as inefficient by the DEA analysis.
There are two ways to explain their weakness. One is to
say that, for example, L5 could perhaps produce as much
output as it does, but with less input (comparing with L1
and L2); the other is to say it could produce more output
with the same input (comparing with L2 and L3).
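For the one-input, one-output case, the envelopment surface described here can be sketched as the rising part of the upper convex hull of the points. The function name and the hull-based approach are illustrative assumptions, not the spreadsheet implementation discussed later.

```python
DMUS = {
    "L1": (2, 2), "L2": (3, 5), "L3": (6, 7), "L4": (9, 8),
    "L5": (5, 3), "L6": (4, 1), "L7": (10, 7),
}  # name -> (input, output), as given in the example

def cross(o, a, b):
    """Z-component of (a - o) x (b - o); negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def envelopment_surface(dmus):
    """Names of the units at the top and to the left (one input, one output)."""
    # Sort by input; for equal inputs put the larger output first.
    points = sorted(dmus.items(), key=lambda kv: (kv[1][0], -kv[1][1]))
    hull = []
    for name, p in points:
        # Keep only clockwise turns so the chain stays concave from above.
        while len(hull) >= 2 and cross(hull[-2][1], hull[-1][1], p) >= 0:
            hull.pop()
        hull.append((name, p))
    # Units beyond the highest output use more input for less output.
    best_output = max(p[1] for _, p in hull)
    frontier = []
    for name, p in hull:
        frontier.append(name)
        if p[1] == best_output:
            break
    return frontier

print(envelopment_surface(DMUS))
```

For the seven DMUs this returns L1, L2, L3 and L4, matching the efficient units named above; L7 lies on the hull but is dropped because L4 produces more output with less input.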
Graphical Illustration
Thus, there are two possible definitions of efficiency depending on
the purpose of the evaluation. One might be interested in possible
reduction of inputs (in DEA this is called the input orientation) or
augmentation of outputs (the output orientation) in achieving
technical efficiency. Depending on the purpose of the evaluation,
the analysis provides different sets of peer groups to which to
compare.
However, there are times when reduction of inputs or
augmentation of outputs is not sufficient. In our example, even
when L6 reduces its input from 4 units to 2, there is still a gap
between it and its peer L1 in the amount of one unit of output. In
DEA, this is called the "slack" which means excess input or
missing output that exists even after the proportional change in
the input or the outputs.
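The L6 case can be checked with a short sketch that projects a unit horizontally onto the envelopment surface and reports any output slack that remains. The frontier vertices are the efficient units L1 to L4 from the example; the function name and the piecewise-linear interpolation are assumptions made for illustration.

```python
# Efficient units L1..L4 as (input, output), ordered along the surface.
FRONTIER = [(2.0, 2.0), (3.0, 5.0), (6.0, 7.0), (9.0, 8.0)]

def input_projection(x, y, frontier=FRONTIER):
    """Return (target_input, output_slack) for an input-oriented comparison.

    x is kept for readability; the target depends only on the output y.
    """
    first_x, first_y = frontier[0]
    if y <= first_y:
        # Input cannot drop below the smallest input on the surface,
        # so any shortfall in output remains as slack.
        return first_x, first_y - y
    for (x0, y0), (x1, y1) in zip(frontier, frontier[1:]):
        if y0 <= y <= y1:
            t = (y - y0) / (y1 - y0)
            return x0 + t * (x1 - x0), 0.0
    raise ValueError("output lies above the envelopment surface")

print(input_projection(4, 1))  # L6
print(input_projection(5, 3))  # L5
```

For L6 the target input is 2 with one unit of output slack, as in the text; for L5 the target lies between L1 and L2 and no slack remains.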
Graphical Illustration
This example will be used to illustrate the several forms
that the DEA model can take.
In each case, the results presented are based on the
implementation of DEA that will be discussed later in
this presentation. It is an Excel spreadsheet using the
add-in Solver capability.
The spreadsheet is identical for all of the forms, but the
application of Solver differs in the target optimized and
in the values to be varied, so for each form the target
and the values to be varied will be identified.
DEA Models
Formulation
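Let $(Y_k, X_k) = (Y_{ki}, X_{kj})$, $k = 1$ to $n$, $i = 1$ to $s$, $j = 1$ to $m$.
Maximize $\mu Y_k / \nu X_k$ for each value of $k$ from 1 to $n$,
subject in each case to $\mu Y_j / \nu X_j \le 1$, $j = 1$ to $n$, where
$\mu Y_k$ means $\sum_i \mu_i Y_{ki}$, $i = 1$ to $s$,
$\nu X_k$ means $\sum_i \nu_i X_{ki}$, $i = 1$ to $m$,
with weights $\mu_i$ and $\nu_i$.
\[
\begin{array}{c|cc|c}
\text{Min} & \mu & \nu & \\ \hline
\lambda & Y_j & -X_j & \le 0 \\
a & -I & & \le -I \\
b & & -I & \le -I \\ \hline
 & \ge & \ge & \\
\text{Max} & Y_k & -X_k &
\end{array}
\]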
In a moment, we will interpret this display as it is
translated into alternative formulations of the
optimization target and conditional inequalities.
Primal Form vs. Dual Form
Input Minimization vs. Output Maximization
Fixed Returns vs. Variable Returns
Discretionary Variables vs. Non-discretionary Variables
Additive vs. Multiplicative
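Primal Formulation
\[
\begin{array}{c|cc|c}
 & \mu & \nu & \text{(m)} \\ \hline
\text{(1)} & Y_j & -X_j & \le 0 \\
\text{(2)} & -I & & \le -I \\
\text{(3)} & & -I & \le -I \\ \hline
\text{(M)} & Y_k & -X_k &
\end{array}
\]
\[
\begin{aligned}
\text{(M)}\quad & \text{Maximize } W = \mu Y_k - \nu X_k \text{ subject to} \\
\text{(1)}\quad & \mu Y_j - \nu X_j \le 0, \quad j = 1 \text{ to } n \\
\text{(2)}\quad & -\mu \le -1, \text{ or } \mu \ge 1 \\
\text{(3)}\quad & -\nu \le -1, \text{ or } \nu \ge 1
\end{aligned}
\]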
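Illustration
To illustrate, consider the example previously presented. The
target to be minimized in the Dual form is $W = -a - b$. The
values to be varied are $(\lambda, a, b)$, or $(\mu, \nu)$.
The following table shows the solution for both forms:

| DMU | X | Y | W | b | $\mu$ | $\nu$ |
|---|---|---|---|---|---|---|
| L1 | 2 | 2 | -1.33 | 1.33 | 1 | 1.67 |
| L2 | 3 | 5 | 0.00 | 0.00 | 1 | 1.67 |
| L3 | 6 | 7 | -3.00 | 3.00 | 1 | 1.67 |
| L4 | 9 | 8 | -7.00 | 7.00 | 1 | 1.67 |
| L5 | 5 | 3 | -5.33 | 5.33 | 1 | 1.67 |
| L6 | 4 | 1 | -5.67 | 5.67 | 1 | 1.67 |
| L7 | 10 | 7 | -9.67 | 9.67 | 1 | 1.67 |

For L1, $\lambda = 0.67$.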
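The W values in this illustration can be reproduced without a solver in the one-input, one-output case. The sketch below assumes that the best output/input ratio in the set is at least 1, so that the bound on $\nu$ is not binding; under that assumption the optimum sits at $\mu = 1$ and $\nu$ equal to the best ratio. The function name is illustrative and this is not the Solver-based implementation.

```python
DMUS = {
    "L1": (2, 2), "L2": (3, 5), "L3": (6, 7), "L4": (9, 8),
    "L5": (5, 3), "L6": (4, 1), "L7": (10, 7),
}  # name -> (X, Y)

def primal_solution(dmus):
    """Closed-form optimum of: max mu*Yk - nu*Xk
    subject to mu*Yj - nu*Xj <= 0 for all j, mu >= 1, nu >= 1.

    Valid for one input and one output when max(Y/X) >= 1 (assumption).
    """
    best_ratio = max(y / x for x, y in dmus.values())
    if best_ratio < 1:
        raise ValueError("closed form assumes the best ratio is at least 1")
    mu = 1.0
    nu = mu * best_ratio  # smallest nu allowed by the ratio constraints
    return {name: (mu, nu, mu * y - nu * x) for name, (x, y) in dmus.items()}

for name, (mu, nu, w) in primal_solution(DMUS).items():
    print(name, round(mu, 2), round(nu, 2), round(w, 2))
```

The printed W values agree with the table to two decimal places, with L2 as the only unit at W = 0.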
Illustration
Graphically, the results are as follows:
[Figure: graph of the solution for the seven DMUs.]
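Orientation to Input
The linear programming display for the input orientation is as follows:
\[
\begin{array}{c|cc|c}
\text{Min} & \mu & \nu & \\ \hline
\lambda & Y_j & -X_j & \le 0 \\
a & -I & & \le 0 \\
b & & -I & \le 0 \\
c - 1 & & X_k & \le 1 \\ \hline
 & \ge & \ge & \\
\text{Max} & Y_k & -X_k &
\end{array}
\]
It adds one additional condition, $\nu X_k \le 1$, to the display.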
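Orientation to Input
The resulting Dual formulation is as follows:
\[
\begin{array}{c|cc|c}
 & \mu & \nu & \text{(m)} \\ \hline
\lambda & Y_j & -X_j & 0 \\
a & -I & & 0 \\
b & & -I & 0 \\
c - 1 & & X_k & 1 \\ \hline
 & \ge & \ge & \\
\text{Max} & Y_k & -X_k &
\end{array}
\]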
Orientation to Input
Continuing with the same example, the following table shows the solutions
in both formulations. The target is $W = c - 1$. Values to be varied are now
$(\lambda, a, b, c)$ or $(\mu$ and $\nu)$.

| DMU | X | Y | W = c - 1 | $\mu$ | $\nu$ |
|---|---|---|---|---|---|
| L1 | 2 | 2 | -0.40 | 0.30 | 0.50 |
| L2 | 3 | 5 | 0.00 | 0.20 | 0.33 |
| L3 | 6 | 7 | -0.30 | 0.10 | 0.17 |
| L4 | 9 | 8 | -0.46 | 0.07 | 0.11 |
| L5 | 5 | 3 | -0.64 | 0.12 | 0.20 |
| L6 | 4 | 1 | -0.85 | 0.15 | 0.25 |
| L7 | 10 | 7 | -0.58 | 0.06 | 0.10 |

For L1, $\lambda = 0.40$.
Note that L2 still dominates the solution, but the graph is now quite
different.
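For one input and one output under fixed returns, the input-oriented values can be computed directly from the output/input ratios. The sketch below assumes that closed form (c is each unit's ratio divided by the best ratio, with $\nu X_k = 1$ and $\mu Y_k = c$); the function name is illustrative, and the results should match the table only to within rounding.

```python
DMUS = {
    "L1": (2, 2), "L2": (3, 5), "L3": (6, 7), "L4": (9, 8),
    "L5": (5, 3), "L6": (4, 1), "L7": (10, 7),
}  # name -> (X, Y)

def input_oriented(dmus):
    """Input-oriented scores for one input and one output, fixed returns."""
    best_ratio = max(y / x for x, y in dmus.values())
    results = {}
    for name, (x, y) in dmus.items():
        c = (y / x) / best_ratio          # share of current input that suffices
        results[name] = {
            "W": c - 1.0,                 # target value, zero for efficient units
            "mu": c / y,                  # output weight, so that mu * Y = c
            "nu": 1.0 / x,                # input weight, so that nu * X = 1
        }
    return results

for name, r in input_oriented(DMUS).items():
    print(name, round(r["W"], 2), round(r["mu"], 2), round(r["nu"], 2))
```

Each unit's weights also satisfy $\mu Y_j - \nu X_j \le 0$ for every unit j, with equality at L2, which is why L2 dominates the solution.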
Orientation to Input
[Figure: graph of the input-oriented solution for the seven DMUs.]
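Orientation to Output
The linear programming display for the output orientation is as follows:
\[
\begin{array}{c|cc|c}
\text{Min} & \mu & \nu & \\ \hline
\lambda & Y_j & -X_j & \le 0 \\
a & -I & & \le 0 \\
b & & -I & \le 0 \\
1 - c & Y_k & & \le 1 \\ \hline
 & \ge & \ge & \\
\text{Max} & Y_k & -X_k &
\end{array}
\]
It adds one additional condition, $\mu Y_k \le 1$, to the display.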
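Orientation to Output
\[
\begin{aligned}
\text{(m)}\quad & \text{Minimize } W = 1 - c \text{ subject to} \\
\text{(1)}\quad & \lambda Y_j - a \ge c Y_k
\end{aligned}
\]
\[
\begin{array}{c|cc|c}
 & \mu & \nu & \text{(m)} \\ \hline
\lambda & Y_j & -X_j & 0 \\
a & -I & & 0 \\
b & & -I & 0 \\
1 - c & Y_k & & 1 \\ \hline
 & \ge & \ge & \\
\text{Max} & Y_k & -X_k &
\end{array}
\]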
Orientation to Output
Continuing with the same example, the following table shows the solutions
in both formulations. The target is $W = 1 - c$. Values to be varied are still
$(\lambda, a, b, c)$ or $(\mu$ and $\nu)$.

| DMU | X | Y | W = 1 - c | $\mu$ | $\nu$ |
|---|---|---|---|---|---|
| L1 | 2 | 2 | -0.67 | 0.50 | 0.83 |
| L2 | 3 | 5 | 0.00 | 0.20 | 0.33 |
| L3 | 6 | 7 | -0.43 | 0.14 | 0.24 |
| L4 | 9 | 8 | -0.87 | 0.13 | 0.21 |
| L5 | 5 | 3 | -1.78 | 0.33 | 0.56 |
| L6 | 4 | 1 | -5.67 | 1.00 | 1.67 |
| L7 | 10 | 7 | -1.38 | 0.14 | 0.24 |

For L1, $\lambda = 0.67$.
Note that L2 still dominates the solution, but the graph is now quite
different.
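The output-oriented values follow the same pattern. The sketch below again assumes one input, one output and fixed returns, with c taken as the factor by which output could be expanded (best ratio times X, divided by Y), $\mu Y_k = 1$ and $\nu X_k = c$. The function name is illustrative and the results should match the table only to within rounding.

```python
DMUS = {
    "L1": (2, 2), "L2": (3, 5), "L3": (6, 7), "L4": (9, 8),
    "L5": (5, 3), "L6": (4, 1), "L7": (10, 7),
}  # name -> (X, Y)

def output_oriented(dmus):
    """Output-oriented scores for one input and one output, fixed returns."""
    best_ratio = max(y / x for x, y in dmus.values())
    results = {}
    for name, (x, y) in dmus.items():
        c = best_ratio * x / y            # attainable expansion of output
        results[name] = {
            "c": c,
            "W": 1.0 - c,                 # target value, zero for efficient units
            "mu": 1.0 / y,                # output weight, so that mu * Y = 1
            "nu": c / x,                  # input weight, so that nu * X = c
        }
    return results

for name, r in output_oriented(DMUS).items():
    print(name, round(r["W"], 2), round(r["mu"], 2), round(r["nu"], 2))
```

Under these assumptions the expansion factor is the reciprocal of the input-oriented contraction factor: for L1, 1/0.6 is about 1.67, which gives W of about -0.67.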
Orientation to Output
Note that the graphical display is identical to that for
the general form, though the interpretation is
somewhat different (replacing efficiencies by slacks).
[Figure: graph of the output-oriented solution for the seven DMUs.]
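\[
\begin{array}{c|ccc|c}
\text{Min} & \mu & \nu & u & \\ \hline
\lambda & Y_j & -X_j & I & \le 0 \\
a & -I & & & \le -I \\
b & & -I & & \le -I \\ \hline
 & \ge & \ge & & \\
\text{Max} & Y_k & -X_k & I &
\end{array}
\]
It adds the variable u to the display.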
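\[
\begin{array}{c|ccc|c}
\text{Min} & \mu & \nu & u & \\ \hline
\lambda & Y_j & -X_j & I & \le 0 \\
a & -I & & & \le 0 \\
b & & -I & & \le 0 \\
c - 1 & & X_k & & \le 1 \\ \hline
 & \ge & \ge & & \\
\text{Max} & Y_k & -X_k & I &
\end{array}
\]
It adds one additional condition, $\nu X_k \le 1$, to the display.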
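Orientation to Input
\[
\begin{array}{c|ccc|c}
 & \mu & \nu & u & \text{(m)} \\ \hline
\lambda & Y_j & -X_j & I & 0 \\
a & -I & & & 0 \\
b & & -I & & 0 \\
c - 1 & & X_k & & 1 \\ \hline
 & \ge & \ge & & \\
\text{Max} & Y_k & -X_k & I &
\end{array}
\]
The new, third condition makes things interesting.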
Orientation to Input
Continuing with the same example, the
following table shows the solutions in both
formulations. The target is $W = c - 1$. Values to
be varied are now $(\lambda, a, b, c)$ or $(\mu, \nu, u)$.

| DMU | X | Y | W = c - 1 | b | $\lambda$ | u |
|---|---|---|---|---|---|---|
| L1 | 2 | 2 | 0.00 | 0.00 | 1.00 | 0.00 |
| L2 | 3 | 5 | 0.00 | 0.00 | 1.00 | 0.00 |
| L3 | 6 | 7 | 0.00 | 0.00 | 1.00 | 0.00 |
| L4 | 9 | 8 | 0.00 | 0.00 | 1.00 | 0.00 |
| L5 | 5 | 3 | -4.00 | 2.00 | 1.00 | 0.00 |
| L6 | 4 | 1 | -5.00 | 4.00 | 1.00 | 0.00 |
| L7 | 10 | 7 | -4.00 | 0.00 | 1.00 | 0.00 |

Two further columns, whose labels are lost, contain 0.00 for every unit.
Orientation to Input
[Figure: graph of the input-oriented solution with the variable u included.]
Categorical Variables
In the DEA model as so far presented, the variables are
treated as essentially quantitative, but sometimes one
would like to identify non-quantitative variables, such
as ordinal or nominal variables.
For example, one might like to compare institutions of
the same type, such as public or private universities.
This is accomplished by introducing categorical
variables containing numbers for order or identifiers
for names.
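One simple way to use a nominal variable, sketched below, is to restrict each unit's comparison set to units in the same category. This is an assumption made for illustration and is not necessarily how the DEA model described here treats categorical variables; the institutions, their data and the ratio-based score are all hypothetical.

```python
from collections import defaultdict

# Hypothetical institutions: name -> (category, input, output).
UNITS = {
    "U1": ("public", 4.0, 8.0),
    "U2": ("public", 5.0, 5.0),
    "U3": ("private", 2.0, 6.0),
    "U4": ("private", 4.0, 6.0),
}

def efficiency_within_category(units):
    """Output/input ratio of each unit relative to the best ratio in its category."""
    groups = defaultdict(list)
    for name, (category, _, _) in units.items():
        groups[category].append(name)
    scores = {}
    for names in groups.values():
        best = max(units[n][2] / units[n][1] for n in names)
        for n in names:
            scores[n] = (units[n][2] / units[n][1]) / best
    return scores

print(efficiency_within_category(UNITS))
```

Here U1 is rated fully efficient among public institutions even though its ratio is lower than that of the private institution U3, because the two are never compared.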
Substitutability of Variables
A still unresolved issue is the means for representing
substitutability of variables. For example, two input
variables may represent two different types of labor
which may be, to some extent, substitutable for each
other.
How is such substitutability to be incorporated?
Let's explore this issue a bit further since, by doing so,
we can illuminate some additional perspectives on the
basic DEA model.
Substitutability of Variables
For simplicity in description, consider two input variables and a
single output variable that has the same value for all DMUs. The
graphic representation of the envelopment surface can now best
be presented not in terms of the relationship between output and
input, as previously shown, but between the variables of input.
The two variables are Professional Staff and Non-Professional Staff. The assumption is that they are completely
substitutable and that physicians differ only in their styles of
providing service, represented by the mix of the two means for
doing so.
The efficient DMUs are located on the red envelopment
surface, which shows the minimums in use of variables.
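With a single common output, the efficient units are those on the lower-left boundary of the points in the plane of the two inputs. A minimal sketch follows; the six units and their staffing values are hypothetical, and the convex-hull approach is an assumption chosen for illustration.

```python
# Hypothetical units with equal output: name -> (professional, non_professional).
STAFF = {
    "D1": (1, 8), "D2": (2, 4), "D3": (4, 2),
    "D4": (8, 1), "D5": (5, 5), "D6": (3, 6),
}

def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def minimal_input_surface(units):
    """Names of units on the lower-left convex boundary of the input plane."""
    points = sorted(units.items(), key=lambda kv: kv[1])
    hull = []
    for name, p in points:
        # Keep only counter-clockwise turns so the chain is convex from below.
        while len(hull) >= 2 and cross(hull[-2][1], hull[-1][1], p) <= 0:
            hull.pop()
        hull.append((name, p))
    # Past the smallest second input, both inputs only increase.
    least_second = min(p[1] for _, p in hull)
    surface = []
    for name, p in hull:
        surface.append(name)
        if p[1] == least_second:
            break
    return surface

print(minimal_input_surface(STAFF))
```

In this hypothetical data D1 to D4 form the surface, each representing a different mix of the two kinds of staff, while D5 and D6 use more of both inputs than some combination of the others.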
Substitutability of Variables
[Figure: Non-Professional Staff (vertical axis) plotted against Professional Staff (horizontal axis), with points labelled Style 1, Style 2, Style 3 and Style 4.]
Implementation of DEA
Structure
Spreadsheet implementation
Choice of Model
Spreadsheet Structure
Spreadsheet Calculations
Solver Elements in Spreadsheet
Visual Basic Program
Access to the Implementation
The data included in the spreadsheet is for ARL
libraries in 1996.
Choice of Model
The spreadsheet includes means to identify the choice of
model by means of three parameters:
Form: Dual represented by 0 and Primal by 1
Orientation: Addition by 0, Input by 1, Output by 2
Convexity: No by 0, Yes by 1
Given the specification, solution of the resulting model is
initiated by pressing Ctrl-q.
The solution is effected by a Visual Basic program that
determines the model from the parameters and then launches
the Excel Add-In called Solver.
The program then produces the output on Sheet 3 that shows
the results.
Spreadsheet Structure
The DEA Spreadsheet for application to ARL libraries
consists of three main parts:
Spreadsheet Calculations
The Spreadsheet calculations in D5:R14 can be
illustrated by D5:D14 and N5:N14:

| Row | C | D | N |
|---|---|---|---|
| 5 | Discretionary? | 1 | 1 |
| 6 | Weights | 0.000001 | 9.99999999999265E-07 |
| 7 | | | |
| 8 | | | |
| 9 | Comp | `=SUMPRODUCT(Mult,D17:D113)*D16` | `=SUMPRODUCT(Mult,N17:N113)*N16` |
| 10 | Slacks | 15.2073410229378 | 5.56269731722995 |
| 11 | Mod Comp | `=D9+D10` | `=N9-N10` |
| 12 | `=INDEX(C17:C126,MATCH($B$12,$B$17:$B$126,0),1)` | `=INDEX(Data,MATCH($B$12,$B$17:$B$126,0),COLUMN()-3)*D16` | `=INDEX(Data,MATCH($B$12,$B$17:$B$126,0),COLUMN()-3)*N16` |
| 13 | | `=D12*$B$13` | `=N12*$B$13` |
| 14 | | `=IF($B$2=1,D13,D12)` | `=IF($B$2=2,N13,N12)` |
| B1 | B2 | B3 | Target | Min/Max | Vary | Conditions |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | B7 | Min | `$D$10:$R$10,$A$17:$A$113` | `$D$11:$R$11 = $D$14:$R$14`; `$A$17:$A$113 >= 0` |
| 0 | 0 | 1 | B7 | Min | `$D$10:$R$10,$A$17:$A$113` | `$D$11:$R$11 = $D$14:$R$14`; `$A$17:$A$113 >= 0`; `$B$127 = 1` |
| 0 | 1 | 0 | B8 | Min | `$D$10:$R$10,$A$17:$A$113,$B$13` | `$D$11:$R$11 = $D$14:$R$14`; `$A$17:$A$113 >= 0` |
| 0 | 1 | 1 | B8 | Min | `$D$10:$R$10,$A$17:$A$113,$B$13` | `$D$11:$R$11 = $D$14:$R$14`; `$A$17:$A$113 >= 0`; `$B$127 = 1` |
| 0 | 2 | 0 | B9 | Min | `$D$10:$R$10,$A$17:$A$113,$B$13` | `$D$11:$R$11 = $D$14:$R$14`; `$A$17:$A$113 >= 0` |
| 0 | 2 | 1 | B9 | Min | `$D$10:$R$10,$A$17:$A$113,$B$13` | `$D$11:$R$11 = $D$14:$R$14`; `$A$17:$A$113 >= 0`; `$B$127 = 1` |
| 1 | 0 | 0 | B6 | Max | `$D$6:$R$6` | `$T$17:$T$113 <= 0`; `$T$12 = 1` |
| 1 | 0 | 1 | B6 | Max | `$D$6:$R$6,$S$6` | `$T$17:$T$113 <= 0`; `$T$12 = 1` |
| 1 | 1 | 0 | B6 | Max | `$D$6:$R$6` | `$T$17:$T$113 <= 0`; `$T$12 = 1` |
| 1 | 1 | 1 | B6 | Max | `$D$6:$R$6,$S$6` | `$T$17:$T$113 <= 0`; `$T$12 = 1` |
| 1 | 2 | 0 | B6 | Max | `$D$6:$R$6` | `$T$17:$T$113 <= 0`; `$T$12 = 1` |
| 1 | 2 | 1 | B6 | Max | `$D$6:$R$6,$S$6` | `$T$17:$T$113 <= 0`; `$T$12 = 1` |
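The way the three parameters select the Solver target, direction and cells to vary can be summarized as a small lookup. The sketch below is in Python rather than the Visual Basic of the actual implementation, the function name is hypothetical, and it assumes that the rows of the listing above line up in the order shown (B1 = Form, B2 = Orientation, B3 = Convexity). The conditions are left out because their alignment is less certain.

```python
def solver_settings(form, orientation, convexity):
    """Target cell, direction and variable cells for one choice of model.

    form: 0 = Dual, 1 = Primal
    orientation: 0 = Addition, 1 = Input, 2 = Output
    convexity: 0 = No, 1 = Yes
    """
    if form not in (0, 1) or orientation not in (0, 1, 2) or convexity not in (0, 1):
        raise ValueError("unknown model parameters")
    if form == 0:
        # Dual: one target cell per orientation, always minimized.
        target = {0: "B7", 1: "B8", 2: "B9"}[orientation]
        vary = "$D$10:$R$10,$A$17:$A$113"
        if orientation != 0:
            vary += ",$B$13"  # extra cell varied for the oriented models
        return {"target": target, "direction": "Min", "vary": vary}
    # Primal: a single target cell, always maximized.
    vary = "$D$6:$R$6"
    if convexity == 1:
        vary += ",$S$6"  # extra cell varied when convexity is requested
    return {"target": "B6", "direction": "Max", "vary": vary}

print(solver_settings(0, 1, 0))
print(solver_settings(1, 2, 1))
```

Read this way, the orientation changes only the Dual settings and the convexity changes only the Primal cells to vary.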
Selection of Data
The Variables
Constraints on Weights
Results
Efficiency Distribution
The following chart displays the efficiency distribution
for the 97 U.S. ARL libraries.
The input and output components for each institution
have been multiplied by the size of the collection.
Note the cluster of inefficient institutions below the
3,000,000 volumes of holdings.
There appear to be three groups of institutions:
[Figure: Collection*Output (vertical axis) plotted against Collection*Input (horizontal axis).]
Sum of Projections
The following chart shows the distribution of the sum of
the projections as a function of the Intensity.
[Figure: Sum of Projections (vertical axis) plotted against Intensity (horizontal axis).]
Distribution of Weights
The following chart shows the magnitudes of the
weights on each of the Input and Output components.
[Figure: magnitude of the weight for each Input and Output component.]
Extensions in DEA
Covers (1) new measures of efficiency, (2) new models, and
(3) new implementations.
The TDT measure of relative efficiency takes the
criterion measure (weighted output/weighted input)
relative to the maximum for that measure.
The Pareto-Koopman measure applies the Pareto criterion
(no variables can be improved without worsening others).
The BCC model (variable returns to scale) is presented.
Congestion arises when excess inputs interfere with
outputs. It thus represents relationships among variables.
Translation invariance
This paper proves that several of the DEA models are
translation invariant (i.e., optimal solutions are not
changed if the original variable values are translated,
that is, all values for a variable are replaced by some
constant minus the values).
Specifically, the primal additive model is translation
invariant.
The BCC input oriented primal model is output
translation invariant.
The CCR models are not translation invariant.
Lack of invariance
This paper supplements the prior one. It shows that in
neither the BCC model nor the additive model are the
optimal solutions for the dual (i.e., multiplier)
formulation invariant under translation.
Duality, classification, slack
This paper considers the role of slacks and especially in the context of
Multiplier Sensitivity
The stability of the set E of extreme efficient DMUs is
examined to determine the sensitivity to changes in the
data,
Simulation studies
Well, so be it.
DEA/AR analysis
Another application in China.
Staffing efficiency
Again, styles of management are identified, this time
based on ratios of types of staffing (e.g., professional vs.
non-professional). Industries are divided into types (batch
vs. line processing industries) and best practices for
each type are identified by DEA.
Software production
Input to software production is taken as cost; outputs as
size (measured by function points), quality (measured
by defects or rework hours), and time to market.
The DEA is compared to performance ratio analyses,
such as Cost/Function, Defects/Function, Days/Function.
Then, constraints on the weights are introduced. One set
of constraints consisted of bounds on ratios of weights. A
second set of constraints consisted of tradeoffs between
variables, again represented by bounds on ratios.
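The performance ratios and the bounds on ratios of weights can be sketched as follows. The project figures, the weight names and the numerical bounds are hypothetical; only the three ratio names come from the text.

```python
def performance_ratios(project):
    """Single-factor ratios per function point for one project."""
    fp = project["function_points"]
    return {
        "Cost/Function": project["cost"] / fp,
        "Defects/Function": project["defects"] / fp,
        "Days/Function": project["days"] / fp,
    }

def satisfies_ratio_bounds(weights, bounds):
    """True if low <= weights[a] / weights[b] <= high for every bounded pair."""
    return all(
        low <= weights[a] / weights[b] <= high
        for (a, b), (low, high) in bounds.items()
    )

# Hypothetical project and hypothetical weights on the three outputs.
project = {"cost": 50000.0, "function_points": 200.0, "defects": 10.0, "days": 100.0}
weights = {"size": 0.5, "quality": 0.3, "time": 0.2}
bounds = {("size", "quality"): (1.0, 2.0), ("quality", "time"): (1.0, 3.0)}

print(performance_ratios(project))
print(satisfies_ratio_bounds(weights, bounds))
```

A set of weights that violated either bound would be excluded from the solution, which is how such constraints narrow the weights that the analysis may assign.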
Baseball batters
Traditional methods for evaluating batters include
fixed and variable weight statistics (homers, batting
average, slugging average, RBI, etc.). The point in this
article is that use of DEA allows one to determine the
effect of changes over time.
Another effect of interest is noise. To correct for
noise, the DEA model derates the data for each
player by a factor based on the player's standard
deviation for each variable.
Efficiency of families
Family home health care is assessed using a stepped
procedure in DEA.
The stepped procedure involves a series of steps in
which variables are successively introduced:
Inputs
Step 1 Direct Costs Medical Expense
Step 2 Indirect Costs Training
Step 1
Step 3 Caring Costs Hours/day
Months/caregiving
Medication
Step 2
Outputs
Family Income
Patient/Caregiver
Step 1
Caregiver burden
Caregiver esteem
Comparative disadvantage
The DEA model for determining comparative
disadvantage is:
Max R C + w subject to
- uY1 + R = -1
Min -
- Y1 + Y + T0 = 0
vX1 C = 1
X1 X+ T1 1 = 0
uY vX = Iw <= 0
uT0 <= 0, vT1 <= 0
R, C >= 0
I= 1
<= 1, >= 1
>= 0
Comparative advantage
The DEA model for determining comparative
advantage is applied to the set removing the target
unit:
Max R + C + w subject to
- uY1 + R = 1
Min -
- Y1 + Y1 + T0 = 0
vX1 + C = 1
X1 X1+ T1 1 = 0
I= 1
>= 1, <= 1
>= 0
Model mis-specification
This paper examines the effects of various types of misspecifications of the DEA model. They include:
Discriminant Analysis
Discriminant analysis is a means for determining group
classification for a set of similar units or observations.
It determines a set of factor weights which best
separate the groups, given units for which membership
is already known.
This paper proposes the use of DEA as a means for
doing DA
The End