Computers & Industrial Engineering: Jingjing Ding, Shengqing Chang, Ruifeng Wang, Chenpeng Feng, Liang Liang
Keywords: Data envelopment analysis (DEA); Large-scale datasets; Dantzig-Wolfe decomposition; Data privacy

Abstract: The application of data envelopment analysis (DEA) to large-scale datasets raises computational concerns, and many novel algorithms have been proposed. However, limitations of the existing algorithms, such as computational difficulties due to data volume and privacy issues, remain when the datasets under evaluation are massive and have a high-density feature. The existing algorithms have not addressed the potential conflict between the requirement of full data for implementation and the reality that data privacy concerns may prevent a full-data application. To address these issues, we integrate DEA and the Dantzig-Wolfe (DW) decomposition algorithm and propose a parallel DEA-DW algorithm to facilitate the computation of efficiency scores. Furthermore, the computing time of the algorithm is analyzed. Finally, we perform numerical experiments on different datasets to demonstrate the feasibility and effectiveness of the proposed algorithm, and analyze the interactions of the master problem (MP) and the sub-problems (SPs) of the algorithm.
1. Introduction

Data envelopment analysis (DEA), originally proposed by Charnes et al. (1978), is a non-parametric estimation method for measuring the relative efficiencies of a set of homogeneous decision-making units (DMUs). In an era of expanding data scale, the need arises to boost computing efficiency by exploiting the structural properties of DEA models when facing large amounts of data in real-life applications.

The standard or 'naive' approach for assessing the efficiency of n DMUs is to solve n specialized linear programs (LPs), one per DMU. Repeatedly solving many similar LPs is computationally intensive, if not infeasible, in large-scale data applications. To address the issue, one needs high-performance computer hardware equipped with a stack of computational strategies, a high-performance LP solver and other appropriate software packages. Although the last two factors play key roles (Dulá, 2008), this paper focuses on a computational strategy that takes advantage of features such as decomposability and parallelism while treating the LP solver as a 'black box'.

Many computational strategies have been put forward in the literature for large-scale datasets. Ali (1993, 1994) proposes two solution-enhancement techniques, restricted basis entry (RBE) and early identification of efficient DMUs (EIE); the former reduces the LP's size by removing inefficient DMUs, and the latter reduces the number of LPs solved by identifying efficient DMUs. Barr and Durchholz (1997) design an algorithm that quickly finds efficient DMUs as benchmarks and then completes the evaluation of all DMUs. Similar work can be seen in Korhonen and Siitari (2007, 2009), Dulá (2011), Zhu et al. (2018), Khezrimotlagh et al. (2019), Khezrimotlagh and Zhu (2020), Jie (2020), Khezrimotlagh (2021), Yu et al. (2021), and Dellnitz (2022), just to name a few. Chen and Cho (2009) and Chen and Lai (2017) propose a new method to evaluate all DMUs by identifying a few "similar" key DMUs of each evaluated DMU and solving small-size LPs. Dulá and López (2009) summarize five pre-processing methods that can be used to quickly determine partially efficient or partially inefficient DMUs. More studies on DEA computation for large-scale datasets can be found in Dulá and Thrall (2001), Dulá and López (2002), Dulá and López (2013), and so on.

Dulá (2008) identifies three important factors that affect DEA computing times: the number of DMUs (cardinality), the number of inputs and outputs (dimension), and the proportion of efficient DMUs (density). Unfortunately, the methods mentioned above are particularly suitable for low-density situations. When the density is high, the cost outweighs the benefit gained from removing inefficient DMUs. For example, the hierarchical decomposition (HD) proposed by Barr and Durchholz (1997), the build hull (BH) procedure proposed by Dulá (2011) and the framework proposed by Khezrimotlagh et al. (2019) to identify all efficient DMUs have limitations, because the remaining set of efficient DMUs is still too large in high-density situations.

Regarding the high-density cases, Chen and Cho (2009) identify a few "similar" critical DMUs as a reference set to compute each DMU's efficiency value; simulation results show that the accelerating procedure reduces the computational time drastically. Chen and Lai (2017) propose a "Trial and Error" (TE) procedure that can control the size of the individual LPs while still maintaining optimality. These methods share a prominent feature: the size of the main working DEA model is unchanged. However, to verify whether a solution is optimal, they need to check an optimality condition on the whole dataset, which renders the algorithm impractical, as it involves frequent hard-disk operations, when the size of the dataset exceeds the capacity limit of a computer.

To sum up, what if the final number of efficient DMUs is too big to fit into RAM (Random Access Memory)? When the size of a dataset is larger than the RAM of a computer, the computer has to interact frequently with the comparatively low-speed hard disk to swap the data generated in the computing process in and out. This is impractical and time-consuming. A question that arises is: how can we take advantage of the structural features of a DEA model if we cannot avoid running a large-scale DEA model that exceeds the capacity limit of a computer? The need to overcome the difficulties of solving a DEA model on large-scale datasets with a high-density feature is one of the motivations for this work.

Aside from the computing difficulty, data privacy has not been discussed in addressing the computing issue in the current literature. The existing DEA algorithms require the full data of all DMUs to be implemented. This requirement poses a hidden risk to data sharing.

[…] computing time is provided. (3) The proposed algorithm helps maintain data confidentiality among different data owners, which broadens the range of applications by satisfying the data privacy requirement.

The rest of the paper is organized as follows. Section 2 describes the underlying DEA model used in this paper and how to split the model by the DW decomposition. In Section 3, we describe the proposed parallel DEA-DW algorithm and demonstrate how it works with an example; in addition, a formula is provided to estimate the computing time. Section 4 performs numerical experiments and analyzes the interactions of the MP and SPs. Section 5 concludes the paper.

* Corresponding author.
E-mail addresses: jingding@hfut.edu.cn (J. Ding), 2020110822@mail.hfut.edu.cn (S. Chang), wrf960117@mail.hfut.edu.cn (R. Wang), cpfeng@hfut.edu.cn (C. Feng), lliang@hfut.edu.cn (L. Liang).
https://doi.org/10.1016/j.cie.2022.108875

2. Preliminaries

2.1. Basic BCC model

Suppose that there are n DMUs, each with m inputs and s outputs. DMU_j (j = 1, 2, …, n) consumes inputs x_ij (i = 1, …, m) to produce outputs y_rj (r = 1, …, s). DMU_0 denotes the DMU under evaluation. A popular DEA model for evaluating the efficiency score is the BCC (Banker–Charnes–Cooper) model (Banker et al., 1984). The input-oriented envelopment model is as follows:

E_0^input = Min θ_0
s.t. ∑_{j=1}^{n} λ_j x_ij ⩽ θ_0 x_i0 , i = 1, …, m;
∑_{j=1}^{n} λ_j y_rj ⩾ y_r0 , r = 1, …, s;        (1)
∑_{j=1}^{n} λ_j = 1;
λ_j ⩾ 0 , j = 1, …, n.
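The paper solves these LPs with Gurobi; as a self-contained sketch, Model (1) can also be set up with SciPy's `linprog`. The two-DMU toy data below are hypothetical, not from the paper:

```python
import numpy as np
from scipy.optimize import linprog

def bcc_input_efficiency(X, Y, j0):
    """Input-oriented BCC efficiency of DMU j0.
    X: (m, n) inputs, Y: (s, n) outputs. Decision vector: [theta, lambda_1..lambda_n]."""
    m, n = X.shape
    s = Y.shape[0]
    c = np.zeros(1 + n)
    c[0] = 1.0                                   # minimise theta
    # sum_j lambda_j x_ij - theta * x_i0 <= 0
    A_ub = np.hstack([-X[:, [j0]], X])
    b_ub = np.zeros(m)
    # -sum_j lambda_j y_rj <= -y_r0  (outputs at least y_r0)
    A_ub = np.vstack([A_ub, np.hstack([np.zeros((s, 1)), -Y])])
    b_ub = np.concatenate([b_ub, -Y[:, j0]])
    # convexity constraint: sum_j lambda_j = 1
    A_eq = np.hstack([[0.0], np.ones(n)]).reshape(1, -1)
    b_eq = np.array([1.0])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(None, None)] + [(0, None)] * n, method="highs")
    return res.fun

# One input, one output; DMU 2 uses twice the input for the same output.
X = np.array([[1.0, 2.0]])
Y = np.array([[1.0, 1.0]])
print(bcc_input_efficiency(X, Y, 0))  # -> 1.0 (efficient)
print(bcc_input_efficiency(X, Y, 1))  # -> 0.5
```

Running n such models independently is exactly the 'naive' approach whose cost the decomposition below is designed to avoid.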
J. Ding et al. Computers & Industrial Engineering 175 (2023) 108875
Model (3-A) splits the n DMUs of Model (1) into p blocks of ⌊n/p⌋ DMUs each; the first two blocks are shown, and the remaining blocks follow the same pattern:

Min ∑_{k=1}^{p} θ_1^k
s.t. ∑_{j=1}^{⌊n/p⌋} λ_j x_ij ⩽ θ_i^1 x_i0 , ∑_{j=1}^{⌊n/p⌋} λ_j y_rj ⩾ y_r0^1 , i = 1, …, m , r = 1, …, s;
∑_{j=⌊n/p⌋+1}^{⌊2n/p⌋} λ_j x_ij ⩽ θ_i^2 x_i0 , ∑_{j=⌊n/p⌋+1}^{⌊2n/p⌋} λ_j y_rj ⩾ y_r0^2 , i = 1, …, m , r = 1, …, s;        (3-A)
⋮

In Model (3-A), (s_i^{1−}, s_r^{1+}, s_i^{2−}, s_r^{2+}) denote slack variables. For brevity, we provide a matrix form of Model (3-A) in Model (3-B):

Min E = C^T X
s.t. [ D1 D2 ; F1 0 ; 0 F2 ] [ X1 ; X2 ] = [ b1 ; b2 ]        (3-B)

where the first block row A1 = [D1 D2] collects the coupling constraints (right-hand side b1) and the block-diagonal rows A2 = diag(F1, F2) collect the block constraints (right-hand side b2).
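The summation limits above assign DMUs j = ⌊(k−1)n/p⌋+1, …, ⌊kn/p⌋ to block k, so each of the p blocks carries about n/p λ-variables plus its own θ^k and y_r0^k variables. A small helper (hypothetical, for illustration only) makes this bookkeeping explicit:

```python
def block_ranges(n, p):
    """1-based index range (lo, hi) of the DMUs in each of the p blocks,
    following the floor(k*n/p) partition of Model (3-A)."""
    return [((k - 1) * n // p + 1, k * n // p) for k in range(1, p + 1)]

def sp_size(n, p, m, s):
    """Variables carried by one block: ~n/p lambdas, m thetas, s block outputs."""
    return n // p + m + s

print(block_ranges(10, 3))       # -> [(1, 3), (4, 6), (7, 10)]
print(sp_size(10000, 2, 2, 2))   # -> 5004
```

Note that with the floor partition the last block absorbs any remainder, so the blocks need not be exactly equal in size.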
Applying the DW decomposition to Model (3-B) yields the master problem (MP), Model (4-A):

Min z = ∑_{k=1}^{p} ∑_{l∈L_k} ω_(l)^k θ_{1(l)}^k
s.t. ∑_{k=1}^{p} ∑_{l∈L_k} ω_(l)^k θ_{1(l)}^k = ∑_{k=1}^{p} ∑_{l∈L_k} ω_(l)^k θ_{2(l)}^k = ⋯ = ∑_{k=1}^{p} ∑_{l∈L_k} ω_(l)^k θ_{m(l)}^k , 1 ⩾ θ_i^k ⩾ 0;
∑_{j=1}^{n} ∑_{l∈L_k} ω_(l)^k λ_{j(l)} = 1 , λ_j ⩾ 0;
∑_{k=1}^{p} ∑_{l∈L_k} ω_(l)^k y_{r0(l)}^k = y_r0 , r = 1, …, s;   (dual variables of the above three constraint groups: π)        (4-A)
∑_{l∈L_k} ω_(l)^k = 1 , k = 1, …, p;   (dual variables: α_k)
ω_(l)^k ⩾ 0.

The k-th sub-problem (SP) is Model (5-A):

Min z_k = (C_k^T − π^T D_k) X_k − α_k
s.t. ∑_{j=⌊(k−1)n/p⌋+1}^{⌊kn/p⌋} λ_j x_ij + s_i^{k−} − θ_i^k x_i0 = 0 , i = 1, …, m;
∑_{j=⌊(k−1)n/p⌋+1}^{⌊kn/p⌋} λ_j y_rj − s_r^{k+} − y_{r0}^k = 0 , r = 1, …, s;        (5-A)
λ_j , s_i^{k−} , s_r^{k+} ⩾ 0 , y_r0 ⩾ y_{r0}^k ⩾ 0.

Note that z is the optimal objective function value of the MP, z_k (k = 1, …, p) the optimal objective function value of the k-th SP, π the dual variables with respect to the first three rows of constraints, and α_k (k = 1, …, p) the dual variables of the convexity constraints. All these variables will be used extensively in the sequel. In Model (4-A), [θ_{1(l)}^k, θ_{2(l)}^k, …, θ_{m(l)}^k, y_{10(l)}^k, y_{20(l)}^k, …, y_{r0(l)}^k, λ_{j(l)} (j = ⌊(k−1)n/p⌋+1, …, ⌊kn/p⌋)], l ∈ L_k, denotes the extreme points of the k-th SP, and ω_(l)^k (k = 1, 2, …, p; l ∈ L_k) are the convex multipliers associated with the extreme points of the k-th SP. In Model (5-A), it is worth stating that C_k^T is the vector of objective function coefficients in the MP corresponding to the variables of the k-th SP. Models (4-A) and (5-A) have the same structures as the respective matrix forms, Models (4-B) and (5-B), below:

Min E = C^T EW
s.t. A1 EW = [ D1 ⋯ D_{p−1} D_p ] [ E1 W1 ; E2 W2 ; ⋮ ; Ep Wp ] = b1        (4-B)

Min z_k = (C_k^T − π^T D_k) X_k − α_k
s.t. F_k X_k = b_2^k        (5-B)

3. The parallel DEA-DW algorithm

3.1. Description of the algorithm

Based on the above discussion, we provide the procedure, referred to as the parallel DEA-DW algorithm, in Algorithm 1.

Algorithm 1: Parallel DEA-DW
Step 1: Transform the standard BCC model (Model-1) into a block-angular structure (Model-2-A). Partition the transformed program into one MP (Model-4-A) with nini [1] variables and p SPs (Model-5-A) with m + s + (n/p) variables each. Assign the MP to one computer and the p SPs to p computers.
Step 2: Identify an initial basic feasible solution for the MP by the Big M method. Initialize I = 0 [2].
Step 3: Solve the MP (Model-4-A) and store z, passing π and α_k to all SPs.
Step 4: Set the objective functions of the SPs and solve the SPs (Model-5-A).
Step 5: Gather the optimal objectives z_k (k = 1, …, p) of all SPs; if all z_k are positive, go to Step 6, otherwise go to Step 7.
Step 6: The resulting optimal solution is z* = z; the algorithm terminates.
Step 7: Let z_s = min{z_k : z_k ⩽ 0}, s ∈ {1, …, p}. Based on the optimal solution X_s* of the s-th SP, generate a specific coefficient column with a new variable for the MP, and let I = I + 1. Go to Step 3.

[1] nini indicates the initial number of variables in the MP.
[2] I represents the number of iterations in the parallel DEA-DW algorithm, i.e., the number of column generations.
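Steps 3–7 form a standard column-generation loop: the MP's duals (π, α_k) reprice each SP, and any SP whose optimum z_k is negative contributes a new column (C_k^T X_k, D_k X_k, 0, 1)^T to the MP. The NumPy sketch below illustrates only this pricing-and-column-building step, on tiny made-up data (all names and values hypothetical):

```python
import numpy as np

def price_and_build_column(Ck, Dk, pi, alpha_k, Xk, k, p):
    """Reduced cost z_k = (Ck - Dk^T pi) . Xk - alpha_k for the k-th SP's
    optimal vertex Xk, plus the MP column it generates when z_k < 0.
    k is a 0-based SP index selecting the convexity row."""
    zk = (Ck - Dk.T @ pi) @ Xk - alpha_k
    if zk >= 0:
        return zk, None                # no improving column from this SP
    cost = Ck @ Xk                     # objective coefficient added to the MP
    convexity = np.zeros(p)            # one convexity row per SP ...
    convexity[k] = 1.0                 # ... with a 1 in the k-th row
    column = np.concatenate([Dk @ Xk, convexity])
    return zk, (cost, column)

# Hypothetical data: 2 coupling rows, 3 SP variables, p = 2 SPs.
Ck = np.array([1.0, 0.0, 0.0])
Dk = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]])
pi = np.array([1.0, 1.0])
Xk = np.array([1.0, 0.0, 1.0])
zk, col = price_and_build_column(Ck, Dk, pi, alpha_k=0.5, Xk=Xk, k=1, p=2)
print(zk)  # -> -1.5, so a column (1.0, [1, 1, 0, 1]) would enter the MP
```

In the full algorithm the vertex X_k comes from solving Model (5-A), and the loop repeats until every z_k is positive.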
… θ_2^2), (y_{10}^1, y_{20}^1) and (y_{10}^2, y_{20}^2) are bounded above by 1, y_{10} and y_{20}, respectively. Thus, it is equivalent to add these upper bounds to SP1 and SP2 to make the feasible domains of the SPs explicitly bounded polyhedrons. As a result, we only need to use the extreme points of Constraints (3.2) and (3.3), together with the upper bounds, in the column generation process of the DW decomposition. Because the extreme points of the two SPs are sometimes difficult to obtain, we use the Big M method to initiate the procedure. Note that the MP has six constraints (four are shown in Constraints (3.1) and the other two are the convexity constraints corresponding to the two SPs). Thus, we attach six artificial variables to these six constraints, associate six big M's with them as coefficients in the objective function, and so construct a basic feasible solution for Steps 1 and 2.

Next, we can get an initial dual price vector (π, α_1, α_2) for constructing the objective functions of the two SPs. Taking SP1 as an example, its objective function is Min z_1 = (C_1^T − π^T D1) X1 − α_1. Naturally, we may start two computers to solve the SPs in parallel. Once z_1 and z_2 of the two SPs are both positive, the original problem's optimal objective value z* equals the optimal value z of the current MP. That is, we get the efficiency value of the DMU under evaluation. Otherwise, we choose the optimal solution from the SP with the most negative value to generate a column to enter the MP. For example, if z_2 < z_1 and z_2 < 0, we choose the optimal solution from SP2 to generate a new variable with the coefficient column (C_2^T X2, D2 X2, 0, 1)^T for the MP, where C_2^T X2 is added to the objective function and (D2 X2, 0, 1)^T enters the constraints. The remaining steps repeat the column generation process iteratively until the procedure converges to an optimal solution.

To sum up, two features of the proposed algorithm can be discerned. (1) Scalability. The data needed for interactions are very small, and the majority of the data are stored and maintained locally in the SPs. Also note that the SPs are independent of one another and only need to interact with the MP. (2) Confidentiality. The owners of parts of the DMUs' information can construct their own SPs and compute with private partial data, while the MP is constructed and solved by a central evaluator. The DMU's efficiency evaluation can be completed by the interaction mechanism depicted in Fig. 1, with the necessary data exchanges but without disclosing the information to other data owners.

3.2. Computing time analysis of the algorithm

In this subsection, we discuss the performance of the proposed parallel DEA-DW algorithm from the perspective of computing time analysis.

The current paper assumes that the underlying algorithm used to solve the DEA model is the Simplex algorithm. Dulá (2008) demonstrates that the Simplex algorithm is more efficient than alternatives such as interior-point methods in DEA computation, and provides the approximate relationship T = C(m, s, d) n, where T is the time required to solve a standard DEA problem, the factor C(m, s, d) is an inherent attribute of the data concerning dimension (m, s) and density (d), and n is the cardinality. It follows that the total time required to obtain the efficiencies of all n DMUs is T = C(m, s, d) n². Based on this, we can analyze the computing time of the parallel DEA-DW algorithm; Property 2 summarizes the main result.

Property 2. Assume p computers run in parallel for the calculations of the p SPs. The computing time required to generate the efficiencies of all n DMUs by the parallel DEA-DW algorithm is given by Formula (6), where T_int indicates the interaction time overhead for passing parameters to all SPs and gathering the optimal solutions from the SPs in one iteration.

T1 = ( C_MP(m, s, d) · (2 nini + I)(I + 1)/2 + ( C_SP(m, s, d) · n/p + T_int ) · I ) · n        (6)

Proof: For the parallel DEA-DW algorithm, the computing time of a single DMU is spent on two tasks. One is the MP computation, which increases in size with the iteration count I; the other is the parallel computation of the SPs. The initial MP is an LP with nini variables, and every time a column is generated by the SPs, the size of the MP grows by one. So the first half of the formula represents the time consumed by the MP, which is the sum of an arithmetic series from C_MP(m, s, d) · nini to C_MP(m, s, d) · (nini + I). The rest is the time consumed by the SPs and the interaction. Since each SP is an LP model with n/p variables, its time is C_SP(m, s, d) · n/p, and the required interaction time T_int for the comparison of the SPs' optimal solutions is added (note that the interaction overhead costs little, because only a few parameters of the MP need to be passed to all SPs, and the optimal solutions of the SPs need to be gathered for comparison). Since every iteration incurs such a pass, we multiply by I. Finally, all n DMUs need to be evaluated, so the outermost part of Formula (6) is multiplied by n. Hence the proof.

4. Numerical study

In this section, we use different datasets to demonstrate the feasibility and validity of the proposed parallel DEA-DW algorithm. We perform the parallel DEA-DW algorithm using two sources of datasets: randomly generated datasets and datasets from Dulá (2011). The first source is computer-generated random data: 12 n-by-(m + s) data matrices are randomly generated, where (m, s) = (2, 2), (3, 3), (4, 4), (5, 5) and n = 10000, 25000, 50000; the datasets can be viewed at https://github.com/1660622007/datasets-Parallel_DEA_DW. The other source is Dulá (2011); we used 16 datasets with dimension (m, s) = (7, 8), (10, 10), cardinality n = 25000, 50000, 75000, 100000 and density ρ = 0.10, 0.25.

In the following experiments, we use the following tools and software: Gurobi, Python, mpi4py, and a personal laptop (Gurobi 9.1.2, Gurobi Optimization, LLC, "Gurobi Optimizer Reference Manual", 2021, available at https://www.gurobi.com; Python 3.9; mpi4py, https://mpi4py.readthedocs.io/en/stable/; Intel Core i5-7300HQ CPU @ 2.50 GHz and 8 GB memory). We use one computer with multiple running processes to simulate multiple computers, where one process solves the MP and each of the remaining processes solves an SP. The process-to-process interaction is implemented with the support of mpi4py.

The performance results of the parallel DEA-DW algorithm are shown in Table 1 for our generated datasets (3 processes) and in Table 2 for the datasets from Dulá (2011) (3 processes).

In Tables 1 and 2, we give the three attributes of each dataset: dimension, cardinality and density. For example, dimension = 2i2o, cardinality = 10000 and density = 0.62 % mean that the dataset has 10,000 DMUs with 2 inputs and 2 outputs, and that the proportion of efficient DMUs is 0.62 %. As mentioned before, nini indicates the initial number of variables in the MP. Avg. I, Avg. final MP size and SPs size indicate the average number of iterations, the average number of variables in the final MP and the number of variables in each SP, respectively. Avg. computing time means the average computing time (unit: second) required to compute one DMU's efficiency.

As shown in Table 1, in the first dataset (2i2o, 10000, 0.62 %), the number of variables involved in each SP is 5004, including 5000 for λ_j, 2 for θ_i^k and 2 for y_{r0}^k. In other words, the number of variables in each SP is determined by the number of DMUs and the number of dimensions involved in each SP, while the number of constraints per SP is 4, which is determined by the number of dimensions. The average final number of variables in the MP is 26.62, where the initial MP has 6 variables (i.e., nini) and 20.62 variables are added through the column generation method (equal to Avg. I).

It is worth stating that Avg. I is equivalent to the number of information interactions required between the MP and each SP for each DMU, and to the number of times the MP and each SP are solved. Therefore, it can be regarded as the most critical index of our algorithm, which
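Formula (6) can be evaluated numerically once the machine-dependent constants C_MP, C_SP and T_int have been measured; the helper below (names hypothetical) mirrors the formula term by term, with the MP term written as the closed form of the arithmetic series used in the proof:

```python
def t1(c_mp, c_sp, t_int, n_ini, n_iter, n, p):
    """Total time of the parallel DEA-DW algorithm per Formula (6).
    MP time: arithmetic series from c_mp*n_ini to c_mp*(n_ini + I);
    SP + interaction time: paid once per iteration; all repeated for n DMUs."""
    mp_time = c_mp * (2 * n_ini + n_iter) * (n_iter + 1) / 2
    sp_time = (c_sp * n / p + t_int) * n_iter
    return (mp_time + sp_time) * n

# With unit constants, no interaction cost, n_ini = 6, one iteration,
# n = 10 DMUs and p = 2 SPs:
print(t1(1.0, 1.0, 0.0, 6, 1, 10, 2))  # -> 180.0
```

Doubling p only shrinks the `sp_time` term, which is consistent with the observation below that more SPs also tend to raise the iteration count I.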
Table 1
The performance of the proposed algorithm in randomly generated datasets with 3 processes.
Dataset nini Avg. I Avg. final MP size SPs size Avg. computing time
Table 2
The performance of the proposed algorithm in datasets of Dulá (2011) with 3 processes.
Dataset nini Avg. I Avg. final MP size SPs size Avg. computing time
significantly determines the size of the MP and the number of times the MP and SPs are solved, i.e., the main computational consumption.

To further show the feasibility of the proposed algorithm, we simulate more computers, i.e., more SPs, and the results for Avg. I are shown in Figs. 2–5.

The following observations can be made based on Figs. 2–5. (1) As the number of SPs increases, Avg. I also increases, regardless of the dimension; Figs. 2 and 3 show an unambiguous upward trend. However, it should be noted that the size of each SP becomes smaller, so the computational cost of each SP should be reduced. (2) As the dimension increases, Avg. I also increases, regardless of the number of SPs. In Figs. 2 and 3, the line with the lower dimension lies below the line with the higher dimension, and in Figs. 4 and 5, the heights of the columns for different dimensions differ significantly. The more dimensions, the more variables are involved in the MP and the SPs, and consequently the more iterations are needed to find an optimal solution. (3) The cardinality has a small effect on Avg. I. Figs. 4 and 5 display graphically that the effect of cardinality in the same dimension setting is less prominent. In summary, the critical factors affecting Avg. I are the dimension and the number of SPs, while the cardinality has a weak effect.

5. Conclusion

As data become more and more available, the difficulty with large-scale datasets in DEA applications attracts the attention of many researchers. Apart from this issue, the data privacy issue has not been discussed in designing efficient algorithms in DEA research. There is a potential conflict between the requirement of full data for implementing an algorithm and the stark reality that data are owned by different owners who are unwilling to transfer them to a central location.

To address these problems, we first study the combination of the DEA model and the DW decomposition, and then propose the parallel DEA-DW algorithm. In addition, we analyze the computing time of the proposed algorithm. Subsequently, we demonstrate the feasibility of our algorithm through simulations. It can be found that it is feasible to use DW decomposition to decompose large-scale DEA problems, where the size of the SPs depends on the number of decomposed SPs. The parallel DEA-DW algorithm is feasible for dealing with the DEA computation problem for large-scale datasets while preserving data confidentiality. The key index, the average number of iterations, is tightly related to the number of dimensions and the number of SPs, while the cardinality has a small effect on it.

Our proposed algorithm provides a solution strategy for a practical implementation of DEA computations when large-scale datasets exceed the capacity of RAM; the amount of information exchanged in the computation process is small. Moreover, the data privacy issue has attracted widespread public attention, and many scholars have discussed it in numerous studies (Horvitz and Mulligan, 2015; Landau, 2015). Our proposed method provides a solution that avoids disclosing DMU information and thereby achieves data confidentiality. By respecting data confidentiality, our proposed method broadens the scope of applications of DEA models. Vaccine distribution is a potential case in point, where countries could obtain distribution strategies through DEA methods without revealing sensitive information.

Finally, there are still some limitations left for future development. First, in terms of the initial feasible solution, we use the Big M method to obtain one; iterating from this feasible solution, the MP and the SPs may require multiple interactions, and discovering a
Fig. 2. Relations between the number of SPs and the average iterations in different cases of cardinality in randomly generated datasets.
Fig. 3. Relations between the number of SPs and the average iterations in different cases of cardinality in datasets of Dulá (2011).
Fig. 4. Relations between the dimension and the average iterations in different cases of SPs in randomly generated datasets.
Fig. 5. Relations between the dimension and the average iterations in different cases of SPs in datasets of Dulá (2011).
better method to obtain the initial feasible solution may reduce the interactions. Second, the proposed algorithm may be applied to more scenarios, such as the case where the dataset is stored in a distributed form, and so on.

CRediT authorship contribution statement

Jingjing Ding: Conceptualization, Methodology, Writing – original draft, Writing – review & editing, Funding acquisition. Shengqing Chang: Data curation, Software, Visualization, Writing – original draft, Writing – review & editing. Ruifeng Wang: Data curation, Software, Writing – original draft. Chenpeng Feng: Writing – review & editing, Supervision, Project administration, Funding acquisition. Liang Liang: Validation, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgment

The authors would like to thank the guest editor and the anonymous reviewers for their constructive comments and invaluable suggestions. Jingjing DING and Shengqing CHANG are joint first authors. This research is supported by the National Natural Science Foundation of China (Nos. 71771074, 71971072, 72188101, 71971074).

References

Ali, A. I. (1993). Streamlined computation for data envelopment analysis. European Journal of Operational Research, 64(1), 61–67.
Ali, A. I. (1994). Computational aspects of DEA. In Data envelopment analysis: Theory, methodology, and applications (pp. 63–88). Netherlands: Springer.
Banker, R. D., Charnes, A., & Cooper, W. W. (1984). Some models for estimating technical and scale inefficiencies in data envelopment analysis. Management Science, 30(9), 1078–1092.
Barr, R. S., & Durchholz, M. L. (1997). Parallel and hierarchical decomposition approaches for solving large-scale data envelopment analysis models. Annals of Operations Research, 73(1), 339–372.
Charnes, A., Cooper, W. W., & Rhodes, E. (1978). Measuring the efficiency of decision making units. European Journal of Operational Research, 2(6), 429–444.
Chen, W. C., & Cho, W. J. (2009). A procedure for large-scale DEA computations. Computers & Operations Research, 36(6), 1813–1824.
Chen, W. C., & Lai, S. Y. (2017). Determining radial efficiency with a large data set by solving small-size linear programs. Annals of Operations Research, 250(1), 147–166.
Dantzig, G. B., & Wolfe, P. (1960). Decomposition principle for linear programs. Operations Research, 8(1), 101–111.
Dellnitz, A. (2022). Big data efficiency analysis: Improved algorithms for data envelopment analysis involving large datasets. Computers & Operations Research, 137, Article 105553.
Dulá, J. H. (2008). A computational study of DEA with massive data sets. Computers & Operations Research, 35(4), 1191–1203.
Dulá, J. H. (2011). An algorithm for data envelopment analysis. INFORMS Journal on Computing, 23(2), 284–296.
Dulá, J. H., & López, F. J. (2002). Data envelopment analysis (DEA) in massive data sets. In Handbook of massive data sets (pp. 419–437). Boston, MA: Springer.
Dulá, J. H., & López, F. J. (2009). Preprocessing DEA. Computers & Operations Research, 36(4), 1204–1220.
Dulá, J. H., & López, F. J. (2013). DEA with streaming data. Omega, 41(1), 41–47.
Dulá, J. H., & Thrall, R. M. (2001). A computational framework for accelerating DEA. Journal of Productivity Analysis, 16(1), 63–78.
Horvitz, E., & Mulligan, D. (2015). Data, privacy, and the greater good. Science, 349, 253–255.
Jie, T. (2020). Parallel processing of the build hull algorithm to address the large-scale DEA problem. Annals of Operations Research, 295, 453–481.
Khezrimotlagh, D. (2021). Parallel processing and large-scale datasets in data envelopment analysis. In Data-enabled analytics. International Series in Operations Research & Management Science (pp. 159–198). Cham: Springer.
Khezrimotlagh, D., & Zhu, J. (2020). Data envelopment analysis and big data: Revisit with a faster method. In Data science and productivity analytics (pp. 1–34). Boston, MA: Springer.
Khezrimotlagh, D., Zhu, J., Cook, W. D., & Toloo, M. (2019). Data envelopment analysis and big data. European Journal of Operational Research, 274(3), 1047–1054.
Korhonen, P. J., & Siitari, P. A. (2007). Using lexicographic parametric programming for identifying efficient units in DEA. Computers & Operations Research, 34(7), 2177–2190.
Korhonen, P. J., & Siitari, P. A. (2009). A dimensional decomposition approach to identifying efficient units in large-scale DEA models. Computers & Operations Research, 36(1), 234–244.
Landau, S. (2015). Control use of data to protect privacy. Science, 347, 504–506.
Yu, A., Shi, Y., & Zhu, J. (2021). Acceleration of large-scale DEA computations using random forest classification. In Data-enabled analytics. International Series in Operations Research & Management Science (pp. 31–50). Cham: Springer.
Zhu, Q., Wu, J., & Song, M. (2018). Efficiency evaluation based on data envelopment analysis in the big data context. Computers & Operations Research, 98, 291–300.