Optimization of Complex Systems: Theory, Models, Algorithms and Applications
Hoai An Le Thi, Hoai Minh Le, Tao Pham Dinh (Editors)
Advances in Intelligent Systems and Computing
Volume 991
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences,
Warsaw, Poland
Advisory Editors
Nikhil R. Pal, Indian Statistical Institute, Kolkata, India
Rafael Bello Perez, Faculty of Mathematics, Physics and Computing, Universidad
Central de Las Villas, Santa Clara, Cuba
Emilio S. Corchado, University of Salamanca, Salamanca, Spain
Hani Hagras, School of Computer Science & Electronic Engineering, University of
Essex, Colchester, UK
László T. Kóczy, Department of Automation, Széchenyi István University, Gyor,
Hungary
Vladik Kreinovich, Department of Computer Science, University of Texas at El
Paso, El Paso, TX, USA
Chin-Teng Lin, Department of Electrical Engineering, National Chiao Tung
University, Hsinchu, Taiwan
Jie Lu, Faculty of Engineering and Information Technology, University of
Technology Sydney, Sydney, NSW, Australia
Patricia Melin, Graduate Program of Computer Science, Tijuana Institute of
Technology, Tijuana, Mexico
Nadia Nedjah, Department of Electronics Engineering, University of Rio de
Janeiro, Rio de Janeiro, Brazil
Ngoc Thanh Nguyen, Faculty of Computer Science and Management, Wrocław
University of Technology, Wrocław, Poland
Jun Wang, Department of Mechanical and Automation Engineering, The Chinese
University of Hong Kong, Shatin, Hong Kong
The series “Advances in Intelligent Systems and Computing” contains publications
on theory, applications, and design methods of Intelligent Systems and Intelligent
Computing. Virtually all disciplines, such as engineering, natural sciences, computer
and information science, ICT, economics, business, e-commerce, environment,
healthcare, and life science, are covered. The list of topics spans all the areas of modern
intelligent systems and computing such as: computational intelligence, soft computing
including neural networks, fuzzy systems, evolutionary computing and the fusion
of these paradigms, social intelligence, ambient intelligence, computational neuroscience,
artificial life, virtual worlds and society, cognitive science and systems,
perception and vision, DNA and immune based systems, self-organizing and
adaptive systems, e-Learning and teaching, human-centered and human-centric
computing, recommender systems, intelligent control, robotics and mechatronics
including human-machine teaming, knowledge-based paradigms, learning paradigms,
machine ethics, intelligent data analysis, knowledge management, intelligent
agents, intelligent decision making and support, intelligent network security, trust
management, interactive entertainment, Web intelligence and multimedia.
The publications within “Advances in Intelligent Systems and Computing” are
primarily proceedings of important conferences, symposia and congresses. They
cover significant recent developments in the field, both of a foundational and
applicable character. An important characteristic feature of the series is the short
publication time and world-wide distribution. This permits a rapid and broad
dissemination of research results.
** Indexing: The books of this series are submitted to ISI Proceedings,
EI-Compendex, DBLP, SCOPUS, Google Scholar and Springerlink **
Editors
Hoai An Le Thi, Computer Science and Applications Department, LGIPM, University of Lorraine, Metz Cedex 03, France
Hoai Minh Le, Computer Science and Applications Department, LGIPM, University of Lorraine, Metz Cedex 03, France
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
WCGO 2019 was the sixth event in the series of World Congress on Global
Optimization conferences, and it took place on July 8–10, 2019, in Metz, France.
The conference aimed to bring together leading specialists in both the theoretical
and algorithmic aspects, as well as a variety of application domains, of nonconvex
programming and global optimization, to highlight recent advances, trends, and
challenges, and to discuss how to expand the role of these fields in several potential
high-impact application areas.
The WCGO conference series is a biennial conference of the International
Society of Global Optimization (iSoGO). The first event, WCGO 2009, took place in
Hunan, China. The second event, WCGO 2011, was held in Chania, Greece, followed
by the third event, WCGO 2013, in Huangshan, China. The fourth event,
WCGO 2015, took place in Florida, USA, while the fifth event was held in Texas,
USA. One of the highlights of this biennial meeting is the announcement of the
Constantin Carathéodory Prize of iSoGO, awarded in recognition of lifetime
contributions to the field of global optimization.
WCGO 2019 was attended by about 180 scientists and practitioners from 40
countries. The scientific program included oral presentations of 112 selected full
papers, as well as several selected abstracts covering all the main topic areas. In
addition, the conference program was enriched by six plenary lectures given
by Prof. Aharon Ben-Tal (Israel Institute of Technology, Israel), Prof. Immanuel M.
Bomze (University of Vienna, Austria), Prof. Masao Fukushima (Nanzan
University, Japan), Prof. Anna Nagurney (University of Massachusetts Amherst,
USA), Prof. Panos M. Pardalos (University of Florida, USA), and Prof. Anatoly
Zhigljavsky (Cardiff University, UK).
This book contains 112 papers selected from about 250 submissions to WCGO
2019. Each paper was peer-reviewed by at least two members of the International
Program Committee and the International Reviewer Board. The book covers both
theoretical and algorithmic aspects of nonconvex programming and global
optimization, as well as its applications to modeling and solving decision problems in
various domains. The book is composed of ten parts, each dealing with
the theory and/or methods in a branch of optimization such as continuous
WCGO 2019 was organized by the Computer Science and Applications Department,
LGIPM, University of Lorraine, France.
Conference Chair
Program Chairs
Publicity Chair
Organization
External Reviewers
Plenary Lecturers
Sponsoring Institutions
Continuous Optimization
A Hybrid Simplex Search for Global Optimization with
Representation Formula and Genetic Algorithm . . . . . . . . . . . . . . . . . 3
Hafid Zidani, Rachid Ellaia, and Eduardo Souza de Cursi
A Population-Based Stochastic Coordinate Descent Method . . . . . . . . . 16
Ana Maria A. C. Rocha, M. Fernanda P. Costa, and Edite M. G. P. Fernandes
A Sequential Linear Programming Algorithm for Continuous
and Mixed-Integer Nonconvex Quadratic Programming . . . . . . . . . . . 26
Mohand Bentobache, Mohamed Telli, and Abdelkader Mokhtari
A Survey of Surrogate Approaches for Expensive Constrained
Black-Box Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Rommel G. Regis
Adaptive Global Optimization Based on Nested Dimensionality
Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
Konstantin Barkalov and Ilya Lebedev
A B-Spline Global Optimization Algorithm for Optimal Power
Flow Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Deepak D. Gawali, Bhagyesh V. Patil, Ahmed Zidna,
and Paluri S. V. Nataraj
Concurrent Topological Optimization of a Multi-component Arm
for a Tube Bending Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Federico Ballo, Massimiliano Gobbi, and Giorgio Previati
Discrete Interval Adjoints in Unconstrained Global Optimization . . . . 78
Jens Deussen and Uwe Naumann
Multiobjective Programming
A Global Optimization Algorithm for the Solution of Tri-Level
Mixed-Integer Quadratic Programming Problems . . . . . . . . . . . . . . . . 579
Styliani Avraamidou and Efstratios N. Pistikopoulos
A Method for Solving Some Class of Multilevel Multi-leader
Multi-follower Programming Problems . . . . . . . . . . . . . . . . . . . . . . . . . 589
Addis Belete Zewde and Semu Mitiku Kassa
Engineering Systems
Application of PLS Technique to Optimization of the Formulation
of a Geo-Eco-Material . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 971
S. Imanzadeh, Armelle Jarno, and S. Taibi
Databases Coupling for Morphed-Mesh Simulations and Application
on Fan Optimal Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 981
Zebin Zhang, Martin Buisson, Pascal Ferrand, and Manuel Henner
Kriging-Based Reliability-Based Design Optimization Using
Single Loop Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 991
Hongbo Zhang, Younes Aoues, Hao Bai, Didier Lemosse,
and Eduardo Souza de Cursi
Sensitivity Analysis of Load Application Methods for Shell Finite
Element Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1001
Wilson Javier Veloz Parra, Younes Aoues, and Didier Lemosse
1 Introduction
In the context of the resolution of engineering problems, many optimization
algorithms have been proposed, tested, and analyzed in recent decades. However,
optimization in engineering remains an active research field, since many real-world
engineering optimization problems remain very complex in nature and quite
difficult to solve with the existing algorithms. The literature reflects intensive
research efforts on several difficulty points that remain incompletely solved and
for which only partial answers have been obtained.

© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 3–15, 2020.
https://doi.org/10.1007/978-3-030-21803-4_1
Among these, we may cite: handling nonconvexity, especially when optimization
constraints are involved; working with incomplete or erroneous evaluations of the
functions and constraints; increasing the number of optimization variables up to
those of realistic designs in practical situations; dealing with nonregular
(discontinuous or nondifferentiable) functions; and determining convenient starting
points for iterative methods (Floudas [5]). We observe that the difficulties
concerning nonconvexity and the determination of starting points are connected:
efficient methods for the optimization of regular functions are often deterministic
and involve gradients, but depend strongly on the initial point; they can be
trapped by local minima if an inconvenient initial guess is used. Alternatively,
methods based on the exploration of the space of the design variables usually
involve a stochastic aspect, and thus a significant increase in the computational cost,
and are less dependent on the initial choice; however, improvements in their
performance require combination with deterministic methods, which may introduce
a dependence on the initial choice. This has led to hybrid procedures involving
both approaches, which try to benefit from the best of each method; for these
reasons, the literature on mixed stochastic/deterministic methods has grown in
recent years [2]. Such hybrid algorithms perform better if the initial point belongs
to an attraction area of the optimum, which shows the importance of the initial
guess in optimization algorithms [8]. Hence, in this paper we use a representation
formula to provide a convenient initial guess of the solution. Let S denote a closed,
bounded, regular domain of the n-dimensional Euclidean space R^n, and let f be a
continuous function defined on S taking its values in R. An unconstrained
optimization problem can be formulated, in general, as follows: find x* ∈ S such that

f(x*) = min { f(x) : x ∈ S }.   (1)
More recently, the original representation proposed by Pincus has been
reformulated by Souza de Cursi [3] as follows: let X be a random variable taking its
values on S and let g : R^2 -> R be a function. If these elements are conveniently
chosen, then

x* = lim_{λ→+∞} E( X g(λ, f(X)) ) / E( g(λ, f(X)) ).   (2)
In this paper, we propose the use of the representation given by Eq. (3), hybridized
with the Nelder-Mead algorithm and a genetic algorithm, for the global
optimization of multimodal functions.
Hybrid methods have been introduced to keep the flexibility of stochastic
methods and the efficiency of deterministic ones. In our paper, the hybrid
method for solving optimization problems is a coupling of the representation
formula proposed by Pincus [9] with the Nelder-Mead algorithm and a genetic
algorithm. The representation formula is used first to find the region containing the
global solution, based on generating finite samples of the random variables
involved in the expression and approximating the limit. For instance, we may
choose λ large enough and generate a sample by using standard random
number generators.
The generation of points can be done using either a uniform distribution or a
Gaussian one. In the case of a Gaussian distribution, when a generated trial point
lies outside S, it is projected back onto S in order to obtain an admissible point.
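As a minimal sketch of this first stage, the Monte Carlo approximation of the representation formula (2) can be written as follows; the choice g(λ, y) = exp(−λy) and the uniform sampling of X on a box S are assumptions for illustration, not fixed by the text above:

```python
import numpy as np

def rf_initial_guess(f, lower, upper, lam=20.0, n_samples=20000, seed=0):
    """Monte Carlo approximation of Eq. (2):
    x* ~ E[X g(lam, f(X))] / E[g(lam, f(X))].

    Assumed here: g(lam, y) = exp(-lam * y) and X sampled uniformly
    on the box S = [lower, upper]."""
    rng = np.random.default_rng(seed)
    lower = np.atleast_1d(np.asarray(lower, dtype=float))
    upper = np.atleast_1d(np.asarray(upper, dtype=float))
    X = rng.uniform(lower, upper, size=(n_samples, lower.size))
    fX = np.array([f(x) for x in X])
    w = np.exp(-lam * (fX - fX.min()))  # shift by the min for numerical stability
    return (w[:, None] * X).sum(axis=0) / w.sum()

# Example: the minimizer of f(x) = (x - 1)^2 + 2 on [-5, 5] is x* = 1
x0 = rf_initial_guess(lambda x: (x[0] - 1.0) ** 2 + 2.0, [-5.0], [5.0])
```

For large λ the weights concentrate near the global minimizer, so the weighted average of the sample lands in its attraction area, which is all the hybrid scheme needs before the local refinement.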
In order to obtain more accurate results, it is convenient to use the improvement
provided by the following algorithms:
3 Test Bed
To demonstrate the efficiency and the accuracy of the hybrid algorithms, 21
typical benchmark functions of different levels of complexity and multimodality
were chosen from the global optimization literature [6].
One hundred runs were performed for each test function to estimate
the probability of success of the methods used. The test functions are:
Bohachevsky 1 BO1, Bohachevsky 2 BO2, Branin function BR, Camel func-
tion CA, Cosine mixture function CO, DeJoung function DE, Goldstein and
price function GO, Griewank function GR, Hansen function HN, Hartman 3
function HR3, Hartman 6 function HR6, Rastrigin function RA, Rosenbrock
function RO, Shekel 5 SK5, Shekel 7 SK7, Shekel 10 SK10, Shubert 1 func-
tion SH1, Shubert 2 function SH2, Shubert 3 function SH3, Shubert 4 function
SH4 and Wolfe nondifferentiable function WO.
4 Numerical Results
In this section we focus on the efficiency of the six algorithms, i.e. Representation
Formula (RF), Classical Genetic Algorithm (GA), Representation Formula with
GA (RFGA), Genetic Algorithm using Nelder Mead algorithm at the mutation
stage (GANM), Representation Formula with GA and Nelder Mead (RFGANM),
and Representation Formula with Nelder Mead (RFNM).
A series of experiments was performed to analyze their performance.
To avoid attributing the optimization results to the choice of particular
conditions, and to conduct fair comparisons, we performed each test 100 times,
starting from various randomly selected points in the hyperrectangular search
domain.
The parameters used in the genetic algorithm are: a population size from 2 to
50; a mutation rate of 0.2; rank-weighting selection; and, as stopping criteria, the
maximum number of iterations (set to 2000 for GA) and the maximum number of
consecutive iterations without improving the solution (set to 1000 for GA).
Concerning NM, we adopted the standard parameters recommended by the
authors; the stopping criteria used are: maximum number of function evaluations
maxfun = 50000, maximum number of iterations maxiter = 10000, termination
tolerance on the function value tolf = 10^-5, and termination tolerance on x,
tolx = 10^-6.
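For illustration, such a Nelder-Mead refinement step can be sketched with SciPy; the mapping of tolf/tolx onto SciPy's fatol/xatol options is our assumption, not the authors' code:

```python
import numpy as np
from scipy.optimize import minimize

def nm_refine(f, x0):
    """Nelder-Mead local refinement with the stopping settings quoted above."""
    res = minimize(f, x0, method="Nelder-Mead",
                   options={"maxfev": 50000, "maxiter": 10000,
                            "fatol": 1e-5, "xatol": 1e-6})
    return res.x, res.fun

# Refine a rough starting point on the 2-D Rosenbrock function (RO)
rosen = lambda x: 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2
x_best, f_best = nm_refine(rosen, np.array([-1.2, 1.0]))
```

In the hybrid schemes above, x0 would be the point returned by the representation formula rather than an arbitrary guess.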
A Hybrid Simplex Search for Global Optimization 7
To investigate the effect of the sample size used in the representation formula on
the search quality of the hybrid algorithms, we chose six levels of sample size
(SS = 60, 100, 500, 1000, 2000 and 5000) and set the population size
to 6. The experimental results are reported in Table 3. Because of space limitations,
only 10 test functions are presented, chosen in order to illustrate significant
aspects.
We observe that the success rate and the number of function evaluations
increase with the sample size. The RF method failed in almost all tests (SR =
0% in most cases), except in the case of the HR3 function (SR = 100% for SS =
The results presented in Tables 4 and 5 are based on P opGA = 12 for GA and
GANM algorithms, and P opGA = 6 for the others. The sample size chosen for
TestF | Dim | SS | RF SR | RF Neval | RFGA SR | RFGA Neval | RFGANM SR | RFGANM Neval | RFNM SR | RFNM Neval
SH4 4 60 0% 1 800 7% 15 460 38% 103 489 43% 7 981
SH4 4 100 0% 3 000 9% 16 892 34% 104 957 47% 9 137
SH4 4 500 0% 15 000 8% 28 885 74% 117 804 100% 21 010
SH4 4 5000 0% 150 000 10% 163 800 97% 252 327 100% 155 899
SK10 4 60 0% 1 800 0% 15 470 78% 108 783 100% 8 002
SK10 4 100 0% 3 000 2% 16 669 72% 110 233 100% 9 209
SK10 4 500 0% 15 000 4% 28 783 91% 121 826 100% 21 198
SK10 4 5000 0% 150 000 1% 163 758 99% 255 924 100% 155 946
BO2 2 60 0% 1 800 10% 14 320 83% 9 338 100% 4 421
BO2 2 100 0% 3 000 9% 15 477 78% 11 953 100% 5 532
BO2 2 500 0% 15 000 14% 26 791 92% 20 139 100% 17 332
BO2 2 5000 0% 150 000 17% 160 866 100% 152 901 100% 152 160
CA 2 60 1% 1 800 68% 13 957 100% 32 760 100% 4 338
CA 2 100 1% 3 000 71% 15 106 100% 33 982 100% 5 565
CA 2 500 1% 15 000 71% 27 101 100% 45 952 100% 17 481
CA 2 5000 11% 150 000 73% 160 307 100% 180 812 100% 152 405
CO 4 60 0% 1 800 58% 15 421 92% 8 142 96% 8 990
CO 4 100 0% 3 000 50% 16 752 93% 9 167 96% 10 219
CO 4 500 0% 15 000 43% 28 507 100% 20 130 100% 22 540
CO 4 5000 0% 150 000 52% 163 426 100% 155 101 100% 157 744
GR 5 60 0% 1 800 0% 15 685 43% 183 996 97% 27 227
GR 5 100 0% 3 000 0% 16 866 32% 185 066 100% 28 212
GR 5 500 0% 15 000 0% 28 872 49% 196 576 100% 40 228
GR 5 5000 0% 150 000 0% 163 962 33% 330 770 100% 172 940
HN 2 60 1% 1 800 63% 14 338 100% 41 534 100% 4 008
HN 2 100 2% 3 000 63% 15 387 100% 42 604 100% 5 252
HN 2 500 1% 15 000 57% 27 103 100% 54 571 100% 17 107
HN 2 5000 14% 150 000 72% 161 323 100% 189 857 100% 152 137
RA 2 60 0% 1 800 7% 13 856 64% 10 726 93% 4 054
RA 2 100 0% 3 000 13% 15 105 79% 9 321 100% 5 256
RA 2 500 0% 15 000 8% 26 663 96% 18 566 100% 17 306
RA 2 5000 1% 150 000 5% 160 487 100% 152 868 100% 152 329
RO 10 60 0% 1 800 0% 15 806 99% 326 193 96% 133 508
RO 10 100 0% 3 000 0% 16 979 99% 328 651 100% 133 634
RO 10 500 0% 15 000 0% 29 006 100% 341 868 100% 146 056
RO 10 5000 0% 150 000 0% 163 858 100% 477 806 100% 281 391
WO 2 60 0% 1 800 0% 14 399 100% 21 077 100% 6 467
WO 2 100 0% 3 000 0% 15 090 100% 22 260 100% 7 567
WO 2 500 0% 15 000 0% 27 260 100% 34 235 100% 19 385
WO 2 5000 0% 150 000 0% 160 978 100% 169 121 100% 154 137
the test functions depends on its dimension: SS = 60 for Dim = 1 or 2, 100 for
Dim = 3, 500 for Dim = 4 or 5, 1000 for Dim = 6 and 2000 for Dim = 10.
Tables 4 and 5 summarize the results (i.e., SR, SD, CPUT and NEvalF)
obtained from the 100 runs of the six algorithms for the 21 benchmark functions.
Table 4. Comparison between methods: GA, GANM and RF
No | TestF | Dim | GA: SR, SD, CPUT, NEvalF | GANM: SR, SD, CPUT, NEvalF | RF: SR, SD, CPUT, NEvalF
20 SK7 4 21% 3,29E-01 0,65 23 726 90% 1,58E-01 51,66 217 461 0% 1,21E-01 0,23 15 000
21 WO 2 1% 4,63E-02 0,23 20 512 100% 0,00E+00 2,04 39 624 0% 3,14E+00 0,02 1 800
Table 5. Comparison between methods: RFGA, RFGANM and RFNM
No | TestF | Dim | RFGA: SR, SD, CPUT, NEvalF | RFGANM: SR, SD, CPUT, NEvalF | RFNM: SR, SD, CPUT, NEvalF
4 CA 2 68% 2,99E-04 0,23 13 957 100% 3,72E-08 1,65 32 760 100% 3,72E-08 0,15 4 338
5 CO 4 43% 2,20E-04 0,36 28 507 100% 6,88E-08 0,33 20 130 100% 6,88E-08 0,53 22 540
6 DE 3 78% 5,48E-05 0,25 16 260 100% 4,99E-12 0,19 7 073 100% 2,25E-12 0,29 8 271
7 GO 2 25% 1,11E-03 0,32 14 649 100% 0,00E+00 1,27 20 873 100% 0,00E+00 0,17 4 212
8 GR 5 0% 2,41E-01 0,43 28 872 49% 2,04E-01 12,12 196 576 90% 5,05E-02 1,77 40 228
9 HN 2 63% 1,76E-02 0,27 14 338 100% 6,37E-08 2,33 41 534 100% 6,37E-08 0,16 4 008
10 HR3 3 100% 1,34E-05 0,30 16 290 100% 6,43E-08 4,37 65 699 100% 6,43E-08 0,33 7 570
11 HR6 6 35% 1,74E-02 0,58 43 874 73% 1,60E-02 16,81 231 152 100% 2,00E-08 1,76 48 337
12 RA 2 7% 9,79E-03 0,22 13 856 64% 5,03E-01 0,46 10 726 93% 2,55E-01 0,13 4 054
13 RO 10 0% 8,02E-01 2,98 73 994 99% 3,99E-01 23,75 386 100 98% 2,41E-03 11,24 190 569
14 SH1 1 88% 9,12E-05 0,15 9 942 100% 4,91E-08 1,00 21 925 100% 4,91E-08 0,08 2 756
15 SH2 2 51% 4,18E-04 0,27 14 308 100% 3,48E-08 2,29 40 642 100% 3,48E-08 0,16 4 047
16 SH3 3 36% 3,66E-02 0,33 16 512 93% 7,48E-02 4,72 75 502 99% 3,38E-02 0,29 6 771
17 SH4 4 8% 7,18E-02 0,46 28 885 74% 1,38E-01 7,27 117 804 83% 1,05E-01 0,56 21 010
18 SK10 4 4% 2,57E-01 0,96 28 783 91% 1,51E-01 34,07 121 826 100% 5,64E-08 2,16 21 198
19 SK5 4 3% 2,57E-01 0,74 28 858 83% 1,88E-01 20,33 122 398 100% 0,00E+00 1,37 21 129
20 SK7 4 2% 2,72E-01 0,84 28 693 91% 1,51E-01 25,71 121 771 100% 7,07E-08 1,70 21 206
21 WO 2 0% 1,17E-01 0,24 14 399 100% 0,00E+00 1,01 21 077 100% 0,00E+00 0,26 6 467
– The success rates for GA are generally modest (4% to 70%, with SR = 0% for
GR and RO), except in the case of SH1 and HR3 (80% and 90%, respectively).
– The results are improved for GANM, for which the success rate is 100% for 7
test functions and SR ≥ 90% for 6 test functions. These results are similar to
those obtained in [1].
– The representation formula RF failed for almost all tests, except for SH1,
where SR = 67% (SR = 100% for an accuracy of 10^-3). We notice that the
number of function evaluations increased considerably.
– The accuracy of RFGA is lower than that of GANM, with a larger number of
function evaluations. Total success is obtained for only 5 functions (SR ≥ 99%).
– In view of the algorithms' overall effectiveness and efficiency, the RFGANM
hybrid approach is, after RFNM, more competitive than the other methods.
Indeed, its SR is 100% for 12 test functions and SR ≥ 82% for 18 functions
(SR = 100% for PopGA = 50; see Table 2 for all the test functions). Its number
of function evaluations is similar to that of the RFGA method.
– Concerning RFNM, the experimental data obtained from the 21 test functions
show high accuracy for all test problems, with a 100% rate of successful
performance on all examples, and a smaller number of function evaluations than
RFGANM and RFGA.
In this section, the experiments aim to compare the performance of RFNM
against the five global optimization methods listed below (Table 6). In order to make
the results comparable, all conditions are set to the same values (100 runs,
and the accuracy for SR is set to 10^-4).
In the previous tests, we used the same parameters for all test functions. The
results showed that the RFNM method is robust (SR = 100%). To compare
it with the algorithms listed in the table, we took appropriate settings for each
function.
Methods References
Representation Formula with Nelder Mead (RFNM) This work
Enhanced Continuous Tabu Search (ECTS) [2]
Staged Continuous Tabu Search (SCTS) [7]
Continuous Hybrid Algorithm (CHA) [7]
Differential Evolution (DE) [7]
LPτ NM Algorithm (LPτ NM) [7]
Table 7. Comparison with other methods in terms of success rate and number of
function evaluations
Table 7 shows that RFNM performs better than the other algorithms for three
functions (BR, SK10, RO), with a success rate of 100%, and with some additional
function evaluations in some other cases.
5 Conclusion
References
1. Chelouah, R., Siarry, P.: Genetic and Nelder-Mead algorithms hybridized for a more
accurate global optimization of continuous multiminima functions. Eur. J. Oper.
Res. 148(2), 335–348 (2003)
2. Chelouah, R., Siarry, P.: A hybrid method combining continuous tabu search and
Nelder-Mead simplex algorithms for the global optimization of multiminima
functions. Eur. J. Oper. Res. 161(3), 636–654 (2005)
1 Introduction
Optimization methods for solving problems with large amounts of data, as in
large-scale machine learning, can make use of classical gradient-based methods,
namely the full gradient, accelerated gradient, and conjugate gradient methods,
classified as batch approaches [1]. Using intuitive schemes to reduce the
information per iteration, stochastic gradient approaches have been shown to be more
efficient than the batch methods. An appropriate approach to solving this type of
problem is through coordinate descent methods. Despite being among the
first optimization methods to appear in the literature, they have received much
attention recently. Although the global optimization (GO) problem addressed
in this paper does not involve a large amount of data, the solution method
proposed herein is iterative and stochastic, and relies on a population of candidate
solutions at each iteration. Thus, a large amount of computation may be required at
each iteration. To improve efficiency, we borrow some of the ideas present in
c Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 16–25, 2020.
https://doi.org/10.1007/978-3-030-21803-4_2
Population Based Stochastic Method 17
min f(x)  subject to x ∈ Ω,   (1)
where e_i represents the i-th coordinate vector for some index i, randomly selected
from the set {1, 2, . . . , n}. We note that the search direction is along a component
of the gradient computed at a special point of subpopulation k, x_{k_H}, further
on denoted as the point with the highest score. Since d_{k_j} might not be a descent
direction for f at x_{k_j}, the movement according to (3) is applied only if d_{k_j} is
descent for f at x_{k_j}; otherwise, the point x_{k_j} is not moved. Whenever the new
position of a point falls outside the bounds, a projection onto Ω is carried out.
The index of the point with highest score kH , at iteration k, satisfies
is the score of the point x_i^k [6]. The normalized distance D̂(x_i^k), from x_i^k to
the center point of the k-th subpopulation, and the normalized objective function
value f̂(x_i^k) at x_i^k are defined by

and

f̂(x_i^k) = ( f(x_i^k) − min_{j=1,...,|P_k^+|} f(x_j^k) ) / ( max_{j=1,...,|P_k^+|} f(x_j^k) − min_{j=1,...,|P_k^+|} f(x_j^k) ),   (7)
respectively. The distance function D(x_i^k) (to the center point x̄^k) is measured
by ||x_i^k − x̄^k||_2, and the center point is evaluated as follows:

x̄^k = (1 / |P_k^+|) Σ_{j=1}^{|P_k^+|} x_j^k.   (8)
We note here that the point with the highest score in each subpopulation is
the point that lies far away from the center of the region defined by its points
(translated by x̄) and has the lowest function value. This way, by looking for the
largest distance to x̄, the algorithm strengthens its exploration ability, and by
choosing the one with the lowest f value, the algorithm reinforces its local
exploitation capability. For each point with index k_j, j = 1, . . . , |P_k^+|, the gradient
coordinate index i may be randomly selected by U on the set {1, 2, . . . , n}, one at a
time for each k_j, with replacement. However, the random choice may also be done
using U on {1, 2, . . . , n} without replacement. In this latter case, when all indices
have been chosen, the set {1, 2, . . . , n} is shuffled [5].
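The selection of the highest-score point can be sketched as follows. The center point and the normalized objective follow Eqs. (8) and (7); since the exact score formula is not reproduced in this excerpt, the min-max normalization of the distances and the combination D̂ − f̂ (favoring points far from the center with a low objective value) are our assumptions:

```python
import numpy as np

def highest_score_index(P, f_vals):
    """Pick the point with the highest score in a subpopulation.

    P: array of shape (m, n), one point per row; f_vals: objective values.
    The center follows Eq. (8) and f_hat follows Eq. (7); the min-max
    normalization of distances and the score d_hat - f_hat are assumed."""
    center = P.mean(axis=0)                       # Eq. (8)
    d = np.linalg.norm(P - center, axis=1)        # distance to the center
    d_hat = (d - d.min()) / (d.max() - d.min() + 1e-12)
    f_hat = (f_vals - f_vals.min()) / (f_vals.max() - f_vals.min() + 1e-12)
    return int(np.argmax(d_hat - f_hat))          # far from center, low f

P = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
f_vals = np.array([3.0, 0.0, 2.0, 1.0])
k_H = highest_score_index(P, f_vals)  # picks (2, 0): far from center, lowest f
```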
The stopping condition of our population-based stochastic coordinate descent
algorithm aims to guarantee a solution in the vicinity of f*. Thus, if

|f(x_b^k) − f*| ≤ ε,   (9)

where x_b^k is the best point of subpopulation k and f* is the known global
optimum, is satisfied for a given tolerance ε > 0, the algorithm stops. Otherwise,
the algorithm runs until a specified number of function evaluations, nfmax, is
reached. The main steps of the algorithm are shown in Algorithm 1.
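One coordinate-descent move, as described above, might be sketched as follows; the forward-difference gradient estimate, the descent check, and the fixed step size alpha are illustrative assumptions, since the paper's Eq. (3) update is not reproduced in this excerpt:

```python
import numpy as np

def coord_step(f, x_j, x_H, lower, upper, i, alpha=0.1, h=1e-6):
    """Move x_j along coordinate i of (an estimate of) the gradient at the
    highest-score point x_H, only if that direction is descent at x_j,
    then project onto the box [lower, upper] (the feasible set Omega)."""
    e = np.zeros_like(x_j)
    e[i] = 1.0
    g_i = (f(x_H + h * e) - f(x_H)) / h        # i-th partial derivative at x_H
    d = -g_i * e                               # candidate search direction
    if (f(x_j + h * d) - f(x_j)) / h < 0.0:    # descent check at x_j
        return np.clip(x_j + alpha * d, lower, upper)  # projection onto Omega
    return x_j                                 # otherwise the point stays

# Example on f(x) = ||x||^2, gradient coordinate taken at x_H
f = lambda x: float(np.sum(x ** 2))
x_new = coord_step(f, np.array([2.0, 0.0]), np.array([1.0, 0.0]),
                   np.array([-5.0, -5.0]), np.array([5.0, 5.0]), i=0)
```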
4 Numerical Experiments
In our previous work [2], we used the gradient computed at x̄. Besides
this variant, we also tested a variant where the gradient is computed at
the best point of the subpopulation. These variants are now compared with the
new strategy, based on the gradient computed at the point with the highest score,
summarized in the previous section. The tested variants are named as follows:
– best w (best wout): gradient computed at the best point and the coordinate
index i (see (4)) is randomly selected by U with (without) replacement;
– center w (center wout): gradient computed at x̄ and the coordinate index i is
randomly selected by U with (without) replacement;
– hscore w (hscore wout): gradient computed at the point with highest score and
the coordinate index i is randomly selected by U with (without) replacement;
– best full g (center full g / hscore full g): using the full gradient computed at
the best point (x̄/the point with highest score) to define the search direction.
Each variant was run 30 times with each problem. Tables 1 and 2 show the
average of the obtained f solution values over the 30 runs, favg , the minimum f
solution value obtained after the 30 runs, fmin , the average number of function
evaluations, nfavg , and the percentage of successful runs, %s, for the variants
best w, center w, hscore w and best wout, center wout, hscore wout. A successful
run is one that stops with the stopping condition satisfied for the specified ε; see
(9). The other statistics also reported in the tables are: (i) the % of problems
with 100% of successful runs (% prob 100%); (ii) the average nf in problems
with 100% of successful runs (nfavg 100%); (iii) average nf in problems with
100% of successful runs simultaneously in the 3 tested variants (for each table)
(nfavg all100%). A result printed in bold refers to the best variant shown and
compared in that particular table. From the results, we may conclude that choosing
the coordinate index i (see (4)) with or without replacement has no
influence on the robustness and efficiency of the variant based on the gradient
computed at the best point. Variants best w and best wout are the least robust,
and variants center w, hscore w and hscore wout are the most robust.
When computing the average number of function evaluations for the problems
that have 100% successful runs in all 3 tested variants, best w wins, followed
by hscore w and then by center w (the same is true for best wout, hscore wout
and center wout). We remark that these average numbers of evaluations
correspond to the simpler, easy-to-solve problems. For the most difficult and larger
problems, the variants hscore w (75% against 50% and 69%) and
hscore wout (69% against 50% and 63%) win as far as robustness is concerned.
This justifies their larger nfavg 100% values.
The results reported in Table 3 aim to show that robustness is not
improved when the full gradient is used. All values and statistics have the
same meaning as in the previous tables. Similarly, the variant based on the
gradient computed at the best point reports the lowest nfavg all100% but also
reaches the lowest % prob 100%. The use of the full gradient deteriorated
the results, mostly for the variant center full g, when compared with both
center w and center wout.
Table 1. Results based on the use of one coordinate of the gradient, randomly selected with replacement.
Prob | best w: favg, fmin, nfavg, %s | center w: favg, fmin, nfavg, %s | hscore w: favg, fmin, nfavg, %s
GP 3.000E+00 3.000E+00 828 100 3.000E+00 3.000E+00 1262 100 3.000E+00 3.000E+00 1564 100
HSK −2.346E+00 −2.346E+00 81 100 −2.346E+00 −2.346E+00 305 100 −2.346E+00 −2.346E+00 110 100
MT 9.652E-09 9.006E-09 1542 100 8.650E-09 5.902E-09 2255 100 8.556E-09 1.157E-09 2159 100
MC −1.913E+00 −1.913E+00 93 100 −1.913E+00 −1.913E+00 318 100 −1.913E+00 −1.913E+00 172 100
MHB 3.510E-01 2.662E-10 12144 77 5.525E-09 7.487E-10 1721 100 4.254E-09 5.141E-10 1450 100
NF2 3.728E-03 3.690E-05 50009 0 1.023E-02 6.221E-06 50020 0 4.403E-03 6.601E-05 50020 0
PWQ 6.514E-03 1.936E-07 50013 0 5.965E-03 5.386E-06 50019 0 6.655E-03 1.723E-05 50021 0
RG-2 4.643E-01 3.165E-09 18947 63 4.568E-09 1.868E-11 1505 100 4.160E-09 1.600E-10 2074 100
RG-5 3.283E+00 5.984E-09 43593 13 4.026E-09 8.058E-12 5918 100 3.855E-09 3.368E-12 6981 100
RG-10 5.373E+00 1.990E+00 50007 0 3.317E-02 1.994E-10 13911 97 3.157E-09 2.200E-11 20202 100
RB 5.346E-04 3.505E-08 50008 0 6.533E-03 1.224E-06 50027 0 5.449E-03 2.052E-07 50023 0
WF 8.587E-04 1.706E-05 50012 0 1.490E-01 6.966E-06 50018 0 2.425E-01 4.423E-03 50021 0
% prob 100% 50 69 75
nfavg 100% 607 1607 3170
nfavg all100% 607 1067 916
Table 2. Results based on the use of one coordinate of the gradient, randomly selected without replacement.
Table 4 compares the results obtained for five of the above-mentioned problems
with those presented in [2]. The comparison involves the three tested variants
center w, hscore w and hscore wout, which provided the highest percentages of
successful runs: 69%, 75% and 69%, respectively. This table reports the values
of favg and nfavg after 30 runs. We note that the stopping condition used here
is the same as that of [2]. All reported variants have 100% successful runs when
solving GP, MHB, RG-2 and RG-5. However, only the variant hscore w reaches
100% success when solving RG-10 (see the last row of the table).
5 Conclusions
In this paper, we presented a population-based stochastic coordinate descent
method for bound constrained GO problems. Several variants were compared in
order to find the most robust one, especially when difficult and larger problems
are considered. The idea of using the point with the highest score to generate
the coordinate descent directions that move all the points of the subpopulation
has been shown to be more robust than the other tested ideas and is worth pursuing.
Future work will include, in the set of tested problems, instances with varied
dimensions in order to analyze the influence of the dimension n on the
performance of the algorithm. Another direction is to choose a specified (yet
small) number of gradient coordinate indices (rather than just one) by the
uniform distribution on the set {1, 2, . . . , n}, to move each point of
the subpopulation.
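This proposed sampling of coordinate indices can be sketched as follows (a hypothetical illustration only: the step size, gradient oracle and plain descent update are placeholders, not the authors' method):

```python
import numpy as np

def coordinate_step(x, grad_fn, k=1, step=0.1, rng=None):
    """Move x along k gradient coordinates chosen uniformly at random
    from {0, ..., n-1} without replacement; all other coordinates stay fixed."""
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.choice(x.size, size=k, replace=False)  # uniform, no replacement
    g = grad_fn(x)
    x_new = x.copy()
    x_new[idx] -= step * g[idx]  # descent step on the chosen coordinates only
    return x_new
```

With k = 1 this reduces to the single-coordinate variants studied above; larger k trades more gradient information per move against more coordinate evaluations.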
References
1. Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine
learning. Technical Report arXiv:1606.04838v3, Computer Sciences Department,
University of Wisconsin-Madison (2018)
2. Rocha, A.M.A.C., Costa, M.F.P., Fernandes, E.M.G.P.: A stochastic coordinate
descent for bound constrained global optimization. AIP Conf. Proc. 2070, 020014
(2019)
3. Kvasov, D.E., Mukhametzhanov, M.S.: Metaheuristic vs. deterministic global opti-
mization algorithms: the univariate case. Appl. Math. Comput. 318, 245–259 (2018)
4. Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization
problems. SIAM J. Optim. 22(2), 341–362 (2012)
5. Wright, S.J.: Coordinate descent algorithms. Math. Program. Series B 151(1), 3–34
(2015)
6. Liu, H., Xu, S., Chen, X., Wang, X., Ma, Q.: Constrained global optimization via a
DIRECT-type constraint-handling technique and an adaptive metamodeling strat-
egy. Struct. Multidisc. Optim. 55(1), 155–177 (2017)
7. Ali, M.M., Khompatraporn, C., Zabinsky, Z.B.: A numerical evaluation of several
stochastic algorithms on selected continuous global optimization test problems. J.
Glob. Optim. 31(4), 635–672 (2005)
A Sequential Linear Programming
Algorithm for Continuous
and Mixed-Integer Nonconvex Quadratic
Programming
1 Introduction
Nonconvex quadratic programming is a very important branch of optimization.
No polynomial-time algorithm is known for finding the global optimum
of nonconvex quadratic programs, which are NP-hard optimization
problems. Several approaches have been proposed for finding local optimal solutions
(DCA [16], interior-point methods [2], a simplex algorithm for the concave
quadratic case [4], etc.) and approximate global solutions (branch and cut [18],
branch and bound [10,12,13], DC combined with branch and bound [1], integer
linear programming reformulation approaches [21], approximation-set and linear
programming (LP) approaches [3,5,17], etc.).
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 26–36, 2020.
https://doi.org/10.1007/978-3-030-21803-4_3
In [5], a new and very interesting approach based on the concept of an approximation
set and LP is proposed for finding an approximate global solution of a strictly
concave quadratic program with inequality and nonnegativity constraints.
This approach computes a finite number of feasible points on the level
line passing through the initial feasible solution, then solves a sequence of
linear programs. After that, the current point is improved using the global
optimality criterion proposed in [9].
In [3,17], this approach is adapted and extended to solve concave
quadratic programs written in general form (the matrix of the quadratic form
is negative semi-definite, the problem can contain equality and inequality
constraints, and the variable bounds can be finite or infinite). In order to
improve the current solution, the global optimality criterion proposed in [14] is
used. However, these global criteria [9,14] rely on the concavity assumption,
so they cannot be used in the general nonconvex case.
In this work, we generalize the algorithms proposed in [3,5,17] to solve
nonconvex quadratic programming problems written in general form. Hence, a
new approach called the "sequential linear programming algorithm" is proposed.
This algorithm starts with an initial extreme point, namely the solution of
the linear program corresponding to the minimization of the linear part of the
objective function over the feasible set of the quadratic problem; it then moves
from the current extreme point to a new one with a better objective function value
by solving a sequence of LP problems. The algorithm stops when no improvement
is possible. Our algorithm finds a good approximate global extreme point for
continuous as well as mixed-integer quadratic programs, is easy to implement,
and has polynomial average complexity.
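The extreme-point-to-extreme-point move described above can be sketched as follows (an illustrative simplification with inequality constraints only; the actual SLPqp also handles equality constraints, infinite bounds and mixed-integer variables):

```python
import numpy as np
from scipy.optimize import linprog

def slp_qp(Q, c, A_ub, b_ub, bounds, max_iter=100, tol=1e-9):
    """Sequential LP sketch for min 0.5*x'Qx + c'x s.t. A_ub x <= b_ub and
    bound constraints: start from the minimizer of the linear part, then
    repeatedly minimize the linearized objective (gradient Qx + c) over the
    feasible polytope until the quadratic objective stops improving."""
    f = lambda x: 0.5 * x @ Q @ x + c @ x
    x = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds).x  # initial extreme point
    for _ in range(max_iter):
        grad = Q @ x + c
        x_new = linprog(grad, A_ub=A_ub, b_ub=b_ub, bounds=bounds).x
        if f(x_new) < f(x) - tol:
            x = x_new   # better extreme point found
        else:
            break       # no improving extreme point: stop
    return x, f(x)
```

On a small concave example (Q negative definite) this stops at an extreme point; as with SLPqp itself, the returned point is an approximate global extreme point without a general optimality guarantee.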
In order to compare our algorithm with the existing approaches, we developed
an efficient implementation in MATLAB2018a [11]. Then we present
numerical experiments comparing the performance of the developed
nonconvex solver (SLPqp) with the branch and cut algorithm implemented in
CPLEX12.8 [6] on a collection of 104 test problems: 64 nonconvex quadratic
test problems and 20 concave quadratic test problems from the Globallib library
[8], 8 concave test problems randomly generated with the algorithm proposed
in [15], and 12 mixed-integer concave quadratic test problems constructed from
the continuous qps [7,15] by considering 50% of their variables as integers.
This paper is organized as follows. In Sect. 2, we state the problem and recall
some definitions and results of nonconvex quadratic programming. In Sect. 3, we
describe the proposed algorithm and we illustrate it with two numerical examples
(a continuous nonconvex qp [19] and an integer concave qp [20]). In Sect. 4, we
present some numerical experiments which compare our solver with the branch
and cut solver of CPLEX12.8. Finally, we conclude the paper and give some
future works.
M. Bentobache et al.
The global minimizer is x∗ = (0, 0, 0, 0, 5, 1, 0, 0)T with f (x∗ ) = −179 [19]. The
current point and its corresponding objective value at each iteration of SLPqp
are shown in the left side of Table 1.
The optimal solution is x∗ = (2, 3)T with f (x∗ ) = −10 [20]. The current point
and its corresponding objective value at each iteration of SLPqp are shown in
the right side of Table 1.
4 Numerical Experiments
In order to compare our algorithm (SLPqp) with the branch and cut solver
of CPLMEX12.8 (CPLEX): the “globalqpex1” function with the parameter
“optimalitytaget” set to 3, i.e., global, we developed an implementation with
MATLAB2018a. In this implementation, we used the barrier interior-point algo-
rithm of CPLEX12.8 (the “cplexlp” function with the parameter “lpmethod”
set to 4) to solve the intermediate continuous linear programs and we used the
“cplexmilp” function for solving the intermediate mixed-integer LPs. In the com-
parison, the different solvers are executed on a PC with a CORE i7-4790CPU
3.60 GHz processor, 8 G0 of RAM and Windows 10 operating system. We have
considered 104 nonconvex quadratic test problems (these qps can be downloaded
from “https://www.sciencedz.net/perso.php?id=mbentobache&p=253”):
(A) Twelve mixed-integer concave quadratic test problems obtained by considering
50% of the variables of the following problems as integers: nine qps
taken from [7] (the problems miqp7, miqp8, miqp9 are obtained by setting,
in Problem 7, page 12 of [7], (λi , αi ) = (1, 2), (λi , αi ) = (1, −5) and
(λi , αi ) = (1, 8), respectively). The last three problems, miqp10, miqp11 and
miqp12, are obtained from the first three generated continuous qps Rosen-qp1,
Rosen-qp2 and Rosen-qp3 shown in Table 6. The results of the two solvers for
these miqps are shown in Table 2.
(B) Sixty-four nonconvex quadratic test problems of the library Globallib [8].
Results are shown in Tables 3 and 4.
(C) Twenty concave quadratic test problems of the library Globallib [8]. Results
are shown in Table 5.
(D) Eight concave quadratic test problems randomly generated with the algo-
rithm proposed in [15]. These qps are written in the form:
No  QP  m  n | SLPqp: f*  It1  CPU1  It  CPU | CPLEX: f*  CPU
1 ex2-1-1 1 5 −17,000 4 0,07 6 0,31 −17,000 0,46
2 ex2-1-10 10 20 −498345,482 4 0,10 6 1,02 −498345,482 0,11
3 ex2-1-2 2 6 −213,000 1 0,06 2 0,22 −213,000 0,03
4 ex2-1-3 9 13 −15,000 5 0,07 7 0,58 −15,000 0,10
5 ex2-1-4 5 6 −11,000 1 0,06 2 0,24 −11,000 0,06
6 ex2-1-6 5 10 −39,000 2 0,07 5 0,67 −39,000 0,07
7 ex2-1-9 1 10 0,000 1 0,06 2 0,28 −0,375 0,28
8 nemhaus 5 5 31,000 1 0,06 2 0,20 31,000 0,03
9 qp1 2 50 0,063 2 0,10 4 2,20 – > 14400
10 qp2 2 50 0,104 2 0,10 4 2,17 – > 14400
11 qp3 52 100 0,006 1 0,06 3 7,44 – > 14400
12 st-bpaf1a 10 10 −45,380 1 0,06 3 0,49 −45,380 0,07
13 st-bpaf1b 10 10 −42,963 1 0,06 3 0,50 −42,963 0,23
14 st-bpk1 6 4 −13,000 1 0,06 3 0,25 −13,000 0,07
15 st-bpk2 6 4 −13,000 1 0,06 3 0,25 −13,000 0,10
16 st-bpv2 5 4 −8,000 1 0,06 3 0,26 −7,999 0,12
17 st-bsj2 5 3 1,000 1 0,06 2 0,17 1,000 0,55
18 st-bsj3 1 6 −86768,550 5 0,07 7 0,32 −86768,550 0,03
19 st-bsj4 4 6 −70262,050 4 0,07 6 0,33 −70262,050 0,06
20 st-e22 5 2 −85,000 2 0,06 4 0,19 −85,000 0,04
21 st-e23 2 2 −0,750 1 0,06 2 0,15 −1,083 0,05
22 st-e24 4 2 8,000 1 0,06 3 0,19 8,000 0,04
23 st-e25 8 4 0,870 1 0,06 2 0,19 0,870 0,05
24 st-e26 4 2 −185,779 2 0,06 4 0,19 −185,779 0,04
25 st-fp1 1 5 −17,000 4 0,07 6 0,29 −17,000 0,05
26 st-fp2 2 6 −213,000 1 0,06 2 0,22 −213,000 0,03
27 st-fp3 10 13 −15,000 5 0,07 7 0,59 −15,000 0,06
28 st-fp4 5 6 −11,000 1 0,06 2 0,23 −11,000 0,04
29 st-fp5 11 10 −268,015 1 0,06 2 0,30 −268,015 0,06
30 st-fp6 5 10 −39,000 2 0,07 5 0,67 −39,000 0,06
31 st-glmp-fp1 8 4 10,000 1 0,06 3 0,27 10,000 0,05
32 st-glmp-fp2 9 4 7,345 1 0,06 3 0,28 7,345 0,14
33 st-glmp-fp3 8 4 −12,000 1 0,06 3 0,27 −12,000 0,04
34 st-glmp-kk90 7 5 3,000 1 0,06 3 0,31 3,000 0,05
35 st-glmp-kk92 8 4 −12,000 1 0,06 3 0,27 −12,000 0,03
36 st-glmp-kky 13 7 −2,500 1 0,06 2 0,25 −2,500 0,06
37 st-glmp-ss1 11 5 −24,000 1 0,06 3 0,31 −24,571 0,07
38 st-glmp-ss2 8 5 3,000 1 0,06 3 0,30 3,000 0,06
39 st-ht 3 2 −1,600 3 0,06 5 0,19 −1,600 0,11
40 st-iqpbk1 7 8 −621,488 3 0,07 5 0,40 −621,488 0,07
41 st-iqpbk2 7 8 −1195,226 3 0,07 5 0,40 −1195,226 0,09
42 st-jcbpaf2 13 10 −794,856 1 0,06 4 0,68 −794,856 0,07
43 st-jcbpafex 2 2 −0,750 1 0,06 2 0,15 −1,083 0,05
44 st-kr 5 2 −85,000 2 0,06 4 0,19 −85,000 0,07
45 st-pan1 4 3 −5,284 3 0,06 5 0,23 −5,284 0,05
46 st-pan2 1 5 −17,000 4 0,07 6 0,29 −17,000 0,05
47 st-ph1 5 6 −230,117 3 0,07 5 0,34 −230,117 0,05
48 st-ph10 4 2 −10,500 1 0,06 2 0,15 −10,500 0,03
No  QP  m  n | SLPqp: f*  It1  CPU1  It  CPU | CPLEX: f*  CPU
49 st-ph11 4 3 −11,281 4 0,06 6 0,22 −11,281 0,04
50 st-ph12 4 3 −22,625 4 0,06 6 0,22 −22,625 0,04
51 st-ph13 10 3 −11,281 4 0,06 6 0,23 −11,281 0,04
52 st-ph14 10 3 −229,722 2 0,06 4 0,23 −229,125 0,04
53 st-ph15 4 4 −392,704 3 0,06 5 0,27 −392,704 0,06
54 st-ph2 5 6 −1028,117 3 0,07 5 0,34 −1028,117 0,04
55 st-ph20 9 3 −158,000 3 0,06 5 0,24 −158,000 0,04
56 st-ph3 5 6 −420,235 3 0,06 5 0,33 −420,235 0,05
57 st-phex 5 2 −85,000 2 0,06 4 0,19 −85,000 0,05
58 st-qpc-m0 2 2 −5,000 3 0,06 5 0,19 −5,000 0,04
59 st-qpc-m3a 10 10 −382,695 2 0,07 4 0,49 −382,695 0,03
60 st-qpk1 4 2 −3,000 1 0,06 3 0,19 −3,000 0,05
61 st-qpk2 12 6 −12,250 2 0,06 4 0,35 −12,250 0,07
62 st-qpk3 22 11 −36,000 3 0,07 5 0,59 −36,000 0,08
63 st-z 5 3 0,000 1 0,06 2 0,17 0,000 0,05
64 stat 5 3 0,000 1 0,06 2 0,17 0,000 0,05
• In terms of CPU time, CPLEX is slightly faster than SLPqp on almost all the
Globallib test problems. However, for the nonconvex problems qp1, qp2 and qp3
(see Table 3), CPLEX failed to obtain the solution within 4 h, while SLPqp
found an approximate global extreme point in less than 8 s.
• SLPqp outperforms CPLEX on all the generated test qps shown in
Table 6: our algorithm found the known global minimum of all the generated
problems with good accuracy (2.91 × 10−11 ≤ Error ≤ 1.86 × 10−5).
Moreover, SLPqp solved the problem Rosen-qp3 of dimension 31 × 30 in 1.67 s,
while CPLEX found the solution in 542.39 s (about 9 min); SLPqp solved Rosen-qp4
of dimension 41 × 40 in 2.69 s, while CPLEX found the solution in 63120.91 s
(17.53 h); SLPqp solved Rosen-qp5 of dimension 51 × 50 in 4.26 s, while CPLEX
failed to find the solution after 238482.58 s (66.25 h). Finally, for problems of
dimension 201 × 200, 401 × 400 and 1001 × 1000, SLPqp found the global optimal
values in less than 5142.77 s (1.43 h), while we stopped the execution of
CPLEX after 4 h.
• Since the global optimal solution of a concave quadratic program is an extreme
point, SLPqp gives the global optimum with good accuracy for this type of
problem.
No  QP  m  n | SLPqp: Error  It1  CPU1  It  CPU | CPLEX: Error  CPU
1 ex2-1-5 11 10 4,74E-10 1 0,07 2 0,32 4,74E-10 0,23
2 ex2-1-7 10 20 1,74E-09 4 0,09 6 0,93 1,74E-09 0,17
3 ex2-1-8 0 24 0 2 0,08 4 1,02 0 0,08
4 st-fp7a 10 20 6,12E-04 2 0,08 5 1,31 6,12E-04 0,13
5 st-fp7b 10 20 6,12E-04 5 0,10 8 1,34 6,12E-04 0,13
6 st-fp7c 10 20 2,68E+03 4 0,09 7 1,32 3,18E-04 0,12
7 st-fp7d 10 20 6,12E-04 3 0,08 6 1,31 6,12E-04 0,18
8 st-fp7e 10 20 1,34E-04 4 0,09 7 1,31 1,34E-04 0,16
9 st-m1 11 20 0 1 0,07 2 0,55 8,89E-02 0,10
10 st-m2 21 30 0 1 0,07 2 0,89 9,67E-01 0,14
11 st-qpc-m1 5 5 0 2 0,06 4 0,30 4,50E-09 0,04
12 st-qpc-m3b 10 10 0 1 0,06 2 0,31 0 0,04
13 st-qpc-m3c 10 10 0 1 0,06 2 0,31 0 0,03
14 st-qpc-m4 10 10 0 1 0,06 2 0,30 0 0,04
15 st-rv1 5 10 0 2 0,07 4 0,51 1,42E-14 0,07
16 st-rv2 10 20 0 1 0,07 2 0,55 1,42E-14 0,08
17 st-rv3 20 20 0 2 0,08 4 1,02 0 0,21
18 st-rv7 20 30 0 2 0,09 4 1,50 7,87E-10 0,26
19 st-rv8 20 40 0 3 0,12 5 2,09 0 0,17
20 st-rv9 20 50 0 2 0,11 5 3,99 0 0,68
QP  n | SLPqp: Error  It1  CPU1  It  CPU | CPLEX: Error  CPU
Rosen-qp1 10 5,38E-09 2 0,07 4 0,50 3,20E-04 0,78
Rosen-qp2 20 2,91E-11 2 0,08 4 0,99 7,06E-04 5,01
Rosen-qp3 30 5,82E-11 2 0,09 4 1,66 2,61E-03 542,39
Rosen-qp4 40 7,33E-09 2 0,11 4 2,69 1,30E-05 63120,91
Rosen-qp5 50 1,16E-08 2 0,15 4 4,26 Failure 238482.58
Rosen-qp6 200 9,69E-08 1 1,97 3 274,77 – >14400
Rosen-qp7 400 1,86E-05 1 23,17 3 4782,68 – >14400
Rosen-qp8 1000 4,77E-06 1 677,29 3 5142,77 – >14400
• SLPqp successfully found the same global optimum as CPLEX for problems
miqp1, . . . , miqp9 (see Table 2). For test problems miqp10, miqp11 and miqp12,
our algorithm found mixed-integer approximate global optimal solutions in
less than 89 s, while we interrupted the execution of CPLEX after 3 h.
5 Conclusion
References
1. An, L.T.H., Tao, P.D.: A branch and bound method via dc optimization algorithms
and ellipsoidal technique for box constrained nonconvex quadratic problems. J.
Global Optim. 13(2), 171–206 (1998)
2. Absil, P.-A., Tits, A.L.: Newton-KKT interior-point methods for indefinite
quadratic programming. Comput. Optim. Appl. 36(1), 5–41 (2007)
3. Bentobache, M., Telli, M., Mokhtari, A.: A global minimization algorithm for con-
cave quadratic programming. In: Proceedings of the 29th European Conference on
Operational Research, EURO 2018, p. 329, University of Valencia, 08–11 July 2018
4. Bentobache, M., Telli, M., Mokhtari, A.: A simplex algorithm with the small-
est index rule for concave quadratic programming. In: Proceedings of the Eighth
International Conference on Advanced Communications and Computation, INFO-
COMP 2018, pp. 88–93, Barcelona, Spain, 22–26 July 2018
5. Chinchuluun, A., Pardalos, P.M., Enkhbat, R.: Global minimization algorithms for
concave quadratic programming problems. Optimization 54(6), 627–639 (2005)
6. CPLEX12.8, IBM Ilog. Inc., NY (2017)
7. Floudas, C.A., Pardalos, P.M., Adjiman, C., Esposito, W.R., Gumus, Z.H., Hard-
ing, S.T., Klepeis, J.L., Meyer, C.A., Schweiger, C.A.: Handbook of Test Problems
in Local and Global Optimization. Nonconvex Optimization and its Applications.
Springer, Boston (1999)
8. Globallib: Gamsworld global optimization library. http://www.gamsworld.org/
global/globallib.htm. Accessed 15 Jan 2019
9. Hiriart-Urruty, J.B., Ledyaev, Y.S.: A note on the characterization of the global
maxima of a (tangentially) convex function over a convex set. J. Convex Anal. 3,
55–62 (1996)
10. Horst, R.: An algorithm for nonconvex programming problems. Math. Program.
10, 312–321 (1976)
11. Matlab2018a. Mathworks, Inc., NY (2018)
12. Pardalos, P.M., Rodgers, G.: Computational aspects of a branch and bound algo-
rithm for quadratic zero-one programming. Computing 45(2), 131–144 (1990)
13. Rusakov, A.I.: Concave programming under simplest linear constraints. Comput.
Math. Math. Phys. 43(7), 908–917 (2003)
14. Strekalovsky, A.S.: Global optimality conditions for nonconvex optimization. J.
Global Optim. 12(4), 415–434 (1998)
15. Sung, Y.Y., Rosen, J.B.: Global minimum test problem construction. Math. Pro-
gram. 24(1), 353–355 (1982)
16. Tao, P.D., An, L.T.H.: Convex analysis approach to DC programming: theory,
algorithms and applications. Acta Math. Vietnam. 22, 289–355 (1997)
17. Telli, M., Bentobache, M., Mokhtari, A.: A successive linear approximations
approach for the global minimization of a concave quadratic program. Submitted
to Computational and Applied Mathematics. Springer (2019)
18. Tuy, H.: Concave programming under linear constraints. Doklady Akademii Nauk
SSSR 159, 32–35 (1964)
19. Tuy, H.: DC optimization problems. In : Convex analysis and global optimization.
Springer optimization and its applications, vol. 110, pp. 167–228, Second edn.
Springer, Cham (2016)
20. Wang, F.: A new exact algorithm for concave knapsack problems with integer
variables. Int. J. Comput. Math. 96(1), 126–134 (2019)
21. Xia, W., Vera, J., Zuluaga, L. F.: Globally solving non-convex quadratic programs
via linear integer programming techniques. arXiv preprint, arXiv:1511.02423v3
(2018)
A Survey of Surrogate Approaches
for Expensive Constrained Black-Box
Optimization
Rommel G. Regis(B)
1 Introduction
min f(x)
s.t. x ∈ R^d, ℓ ≤ x ≤ u,
     g_i(x) ≤ 0, i = 1, . . . , m,    (1)
     h_j(x) = 0, j = 1, . . . , p,
     x ∈ X ⊆ R^d
A widely used kriging surrogate model is described in Jones et al. [13] and
Jones [12] (sometimes called the DACE model) where the values of the black-
box function f are assumed to be the outcomes of a stochastic process. That
is, before f is evaluated at any point, assume that f (x) is a realization of a
Gaussian random variable Y (x) ∼ N (μ, σ 2 ). Moreover, for any two points xi
and xj , the correlation between Y (xi ) and Y (xj ) is modeled by
Corr[Y(x_i), Y(x_j)] = exp( − Σ_{ℓ=1}^{d} θ_ℓ |x_{iℓ} − x_{jℓ}|^{p_ℓ} ).    (4)

The kriging prediction at a new point x* is given by

ŷ(x*) = μ̂ + r^T R^{−1} (y − J μ̂),    (5)

where μ̂ = (J^T R^{−1} y) / (J^T R^{−1} J) and r = (Corr[Y(x*), Y(x_1)], . . . , Corr[Y(x*), Y(x_n)])^T.
Moreover, a measure of error of the kriging predictor at x* is given by

s²(x*) = σ̂² ( 1 − r^T R^{−1} r + (1 − J^T R^{−1} r)² / (J^T R^{−1} J) ),    (6)

where σ̂² = (1/n) (y − J μ̂)^T R^{−1} (y − J μ̂).
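A direct, if naive, dense-linear-algebra implementation of this predictor can be sketched as follows (fixed hyperparameters θ_ℓ and p are assumed here for illustration; in practice they are estimated, e.g. by maximum likelihood):

```python
import numpy as np

def kriging_predict(X, y, x_star, theta, p=2.0):
    """Kriging predictor sketch: correlation matrix R from the
    power-exponential model, prediction y_hat(x*) and the associated
    error measure s^2(x*) with a constant-mean (ordinary kriging) trend."""
    n = X.shape[0]
    corr = lambda a, b: np.exp(-np.sum(theta * np.abs(a - b) ** p))
    R = np.array([[corr(X[i], X[j]) for j in range(n)] for i in range(n)])
    r = np.array([corr(x_star, X[i]) for i in range(n)])
    J = np.ones(n)
    Ri = np.linalg.inv(R)
    mu = (J @ Ri @ y) / (J @ Ri @ J)                # generalized mean estimate
    y_hat = mu + r @ Ri @ (y - J * mu)              # prediction
    sigma2 = (y - J * mu) @ Ri @ (y - J * mu) / n   # process variance estimate
    s2 = sigma2 * (1.0 - r @ Ri @ r
                   + (1.0 - J @ Ri @ r) ** 2 / (J @ Ri @ J))  # error measure
    return y_hat, s2
```

Note that the predictor interpolates the data: at a sampled point it reproduces the observed value and the error measure vanishes, which is what makes it attractive for expensive black-box functions.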
One effective infill strategy for problems with expensive black-box objective and
inequality constraints (no equality constraints and no hidden constraints) is
provided by the COBRA algorithm [19]. COBRA uses the above RBF model
to approximate the objective and constraint functions though one can use
other types of surrogates with its infill strategy. It treats each inequality con-
straint individually instead of combining them into a penalty function and
builds/updates RBF surrogates for the objective and constraints in each iter-
ation. Moreover, it handles infeasible initial sample points using a two-phase
approach where Phase I finds a feasible point while Phase II improves on this
feasible point. In Phase I, the next iterate is a minimizer of the sum of the
squares of the predicted constraint violations (as predicted by the RBF surro-
gates) subject only to the bound constraints. In Phase II, the next iterate is a
minimizer of the RBF surrogate of the objective subject to RBF surrogates of
the inequality constraints within some small margin and also satisfying a dis-
tance requirement from previous iterates. That is, the next iterate xn+1 solves
the optimization subproblem:
min_x  s_n^{(0)}(x)
s.t.   x ∈ R^d, ℓ ≤ x ≤ u,
       s_n^{(i)}(x) + ε_n^{(i)} ≤ 0,  i = 1, 2, . . . , m,    (7)
       ‖x − x_j‖ ≥ ρ_n,  j = 1, . . . , n

Here, s_n^{(0)}(x) is the RBF model of f(x) while s_n^{(i)}(x) is the RBF model of g_i(x)
for i = 1, . . . , m. Moreover, ε_n^{(i)} > 0 is the margin for the ith constraint and ρ_n is
the distance requirement given the first n sample points. The margins are meant
to facilitate the generation of feasible iterates. The ρ_n's are allowed to cycle
between large values meant to enforce global search and small values that promote
local search.
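The Phase II subproblem (7) can be posed with generic surrogate callables and an off-the-shelf NLP solver, as in the sketch below (illustrative assumptions only: COBRA builds RBF surrogates and has its own subproblem machinery; the solver choice and the eps/rho values in the usage are placeholders):

```python
import numpy as np
from scipy.optimize import minimize

def phase2_iterate(s_obj, s_cons, X_prev, bounds, eps, rho, x0):
    """Next Phase-II-style iterate: minimize the objective surrogate s_obj
    subject to margin-shifted constraint surrogates s_i(x) + eps <= 0 and a
    distance requirement ||x - x_j|| >= rho from all previous sample points."""
    # SLSQP inequality convention: fun(x) >= 0 is feasible
    cons = [{'type': 'ineq', 'fun': lambda x, s=s: -(s(x) + eps)}
            for s in s_cons]
    cons.append({'type': 'ineq',
                 'fun': lambda x: float(np.min(np.linalg.norm(X_prev - x,
                                                              axis=1)) - rho)})
    res = minimize(s_obj, x0, bounds=bounds, constraints=cons, method='SLSQP')
    return res.x
```

In the real algorithm the surrogates are refit after each expensive evaluation, so this subproblem is cheap relative to one black-box call.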
References
1. Appel, M.J., LaBarre, R., Radulović, D.: On accelerated random search. SIAM J.
Optim. 14(3), 708–731 (2004)
2. Bagheri, S., Konen, W., Allmendinger, R., Branke, J., Deb, K., Fieldsend, J.,
Quagliarella, D., Sindhya, K.: Constraint handling in efficient global optimization.
In: Proceedings of the Genetic and Evolutionary Computation Conference, pp.
673–680. GECCO 2017, ACM, New York (2017)
3. Bagheri, S., Konen, W., Emmerich, M., Bäck, T.: Self-adjusting parameter control
for surrogate-assisted constrained optimization under limited budgets. Appl. Soft
Comput. 61, 377–393 (2017)
4. Basudhar, A., Dribusch, C., Lacaze, S., Missoum, S.: Constrained efficient global
optimization with support vector machines. Struct. Multidiscip. Optim. 46(2),
201–221 (2012)
5. Bouhlel, M.A., Bartoli, N., Otsmane, A., Morlier, J.: Improving kriging surrogates
of high-dimensional design models by partial least squares dimension reduction.
Struct. Multidiscip. Optim. 53(5), 935–952 (2016)
6. Bouhlel, M.A., Bartoli, N., Regis, R.G., Otsmane, A., Morlier, J.: Efficient global
optimization for high-dimensional constrained problems by using the kriging mod-
els combined with the partial least squares method. Eng. Optim. 50(12), 2038–2053
(2018)
7. Boukouvala, F., Hasan, M.M.F., Floudas, C.A.: Global optimization of general
constrained grey-box models: new method and its application to constrained PDEs
for pressure swing adsorption. J. Global Optim. 67(1), 3–42 (2017)
8. Conn, A.R., Le Digabel, S.: Use of quadratic models with mesh-adaptive direct
search for constrained black box optimization. Optim. Methods Softw. 28(1), 139–
158 (2013)
9. Forrester, A.I.J., Sobester, A., Keane, A.J.: Engineering Design Via Surrogate
Modelling: A Practical Guide. Wiley (2008)
10. Ginsbourger, D., Le Riche, R., Carraro, L.: Kriging Is Well-Suited to Parallelize
Optimization, pp. 131–162. Springer, Heidelberg (2010)
11. Jones, D.R.: Large-scale multi-disciplinary mass optimization in the auto industry.
In: MOPTA 2008, Modeling and Optimization: Theory and Applications Confer-
ence. MOPTA, Ontario, Canada, August 2008
12. Jones, D.R.: A taxonomy of global optimization methods based on response sur-
faces. J. Global Optim. 21(4), 345–383 (2001)
13. Jones, D., Schonlau, M., Welch, W.: Efficient global optimization of expensive
black-box functions. J. Global Optim. 13(4), 455–492 (1998)
14. Koch, P., Bagheri, S., Konen, W., Foussette, C., Krause, P., Bäck, T.: A new
repair method for constrained optimization. In: Proceedings of the Genetic and
Evolutionary Computation Conference (GECCO 2015), pp. 273–280 (2015)
15. Nuñez, L., Regis, R.G., Varela, K.: Accelerated random search for constrained
global optimization assisted by radial basis function surrogates. J. Comput. Appl.
Math. 340, 276–295 (2018)
16. Parr, J.M., Keane, A.J., Forrester, A.I., Holden, C.M.: Infill sampling criteria for
surrogate-based optimization with constraint handling. Eng. Optim. 44(10), 1147–
1166 (2012)
17. Powell, M.J.D.: The theory of radial basis function approximation in 1990. In:
Light, W. (ed.) Advances in Numerical Analysis, Volume 2: Wavelets, Subdivision
Algorithms and Radial Basis Functions, pp. 105–210. Oxford University Press,
Oxford (1992)
18. Regis, R.G.: Stochastic radial basis function algorithms for large-scale optimization
involving expensive black-box objective and constraint functions. Comput. Oper.
Res. 38(5), 837–853 (2011)
19. Regis, R.G.: Constrained optimization by radial basis function interpolation for
high-dimensional expensive black-box problems with infeasible initial points. Eng.
Optim. 46(2), 218–243 (2014)
20. Regis, R.G.: Evolutionary programming for high-dimensional constrained expen-
sive black-box optimization using radial basis functions. IEEE Trans. Evol. Com-
put. 18(3), 326–347 (2014)
21. Regis, R.G.: Surrogate-assisted particle swarm with local search for expensive con-
strained optimization. In: Korošec, P., Melab, N., Talbi, E.G. (eds.) Bioinspired
Optimization Methods and Their Applications, pp. 246–257. Springer International
Publishing, Cham (2018)
22. Regis, R.G., Shoemaker, C.A.: Parallel radial basis function methods for the global
optimization of expensive functions. Eur. J. Oper. Res. 182(2), 514–535 (2007)
23. Regis, R.G., Wild, S.M.: CONORBIT: constrained optimization by radial basis
function interpolation in trust regions. Optim. Methods Softw. 32(3), 552–580
(2017)
24. Sasena, M.J., Papalambros, P., Goovaerts, P.: Exploration of metamodeling sam-
pling criteria for constrained global optimization. Eng. Optim. 34(3), 263–278
(2002)
25. Schonlau, M.: Computer Experiments and Global Optimization. Ph.D. thesis, Uni-
versity of Waterloo, Canada (1997)
26. Sóbester, A., Leary, S.J., Keane, A.J.: On the design of optimization strategies
based on global response surface approximation models. J. Global Optim. 33(1),
31–59 (2005)
27. Wild, S.M., Regis, R.G., Shoemaker, C.A.: ORBIT: optimization by radial basis
function interpolation in trust-regions. SIAM J. Sci. Comput. 30(6), 3197–3219
(2008)
28. Zhan, D., Qian, J., Cheng, Y.: Pseudo expected improvement criterion for parallel
EGO algorithm. J. Global Optim. 68(3), 641–662 (2017)
Adaptive Global Optimization Based on
Nested Dimensionality Reduction
1 Introduction
This paper considers “black-box” global optimization problems of the following
form:
This study was supported by the Russian Science Foundation, project No. 16-11-10150.
μ = max_{1≤i≤k} |z_i − z_{i−1}| / Δ_i,    (3)

where Δ_i = x_i − x_{i−1}. If the above formula yields a zero value, assume that
μ = 1.
Step 3. For each interval (x_{i−1}, x_i), 1 ≤ i ≤ k, calculate the characteristic

R(i) = r μ Δ_i + (z_i − z_{i−1})² / (r μ Δ_i) − 2(z_i + z_{i−1}),    (4)

where r > 1 is a predefined parameter of the method.
Step 4. Find the interval (x_{t−1}, x_t) with the maximum characteristic
R(t) = max_{1≤i≤k} R(i).    (5)
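Steps 2-4 reduce to a few vectorized operations over the current trial points, as the sketch below shows (decision rule only, omitting trial-point placement and the stopping test; the input arrays are the sorted trial points x_i with values z_i):

```python
import numpy as np

def gsa_step(xs, zs, r=2.0):
    """One decision step of the one-dimensional global search algorithm:
    estimate the Lipschitz-type constant mu (eq. 3), compute the interval
    characteristics R(i) (eq. 4), and return the index t of the interval
    (x_{t-1}, x_t) with the maximum characteristic (Step 4)."""
    xs = np.asarray(xs, dtype=float)
    zs = np.asarray(zs, dtype=float)
    dx = np.diff(xs)                      # Delta_i = x_i - x_{i-1}
    dz = zs[1:] - zs[:-1]
    mu = np.max(np.abs(dz) / dx)
    if mu == 0.0:                         # degenerate case of eq. (3)
        mu = 1.0
    R = r * mu * dx + dz ** 2 / (r * mu * dx) - 2.0 * (zs[1:] + zs[:-1])
    return int(np.argmax(R)) + 1          # 1-based interval index t
```

Intervals adjacent to low function values (and with large uncertainty) receive the highest characteristics, which is what drives the trial toward promising regions.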
3 Dimensionality Reduction
3.1 Dimensionality Reduction Using Peano-Type Space-Filling
Curves
The use of a Peano-type curve y(x) satisfying

{ y ∈ R^N : −2^{−1} ≤ y_i ≤ 2^{−1}, 1 ≤ i ≤ N } = { y(x) : 0 ≤ x ≤ 1 }    (7)
where the Hölder constant H is linked to the Lipschitz constant L by the relation
H = 2L √(N + 3).    (10)
Condition (9) allows adapting the algorithm for one-dimensional problems
presented in Sect. 2 to the multidimensional problems reduced
to one-dimensional ones. For this, the lengths of the intervals Δ_i involved in
rules (3), (4) of the algorithm are substituted by the lengths

Δ_i = (x_i − x_{i−1})^{1/N}    (11)
in the block scheme are multidimensional. The dimensionality reduction
method based on Peano curves can be applied for solving them. This is a
principal difference from the initial nested scheme.
The solution of the arising set of subproblems (15) (for the nested optimization
scheme) or (20) (for the block nested optimization scheme) can be organized in
various ways. A straightforward way (developed in detail for the nested
optimization scheme [9,18] and for the block nested optimization scheme [2,3]) is
to solve the subproblems according to their generation order. However, in this
case a considerable part of the information on the objective function is lost
when solving the multidimensional problem. Another approach is
the adaptive scheme, in which all subproblems are solved simultaneously; this
allows a more complete use of the information on the multidimensional problem
and accelerates the process of solving it.
For the case of the one-dimensional subproblems the adaptive scheme was
theoretically substantiated and tested in [8,10,11]. The present work proposes
a generalization of the adaptive scheme for the case of the multidimensional
subproblems. Let us give a brief description of its basic elements.
Let us assume that the nested subproblems (20) are solved with the use of the
multidimensional global search algorithm described in Sect. 3.1. Then each
subproblem (20) can be associated with a numerical value called the characteristic
of this problem. The value R(t) from (5) (i.e., the maximum characteristic of
the intervals formed within the subproblem) can be taken as such a characteristic.
According to the rule (4) for computing the characteristics, the higher the
value of the characteristic, the more promising the subproblem is for continuing
the search for the global minimum of the initial problem (1). Therefore, at each
iteration the subproblem with the highest characteristic is selected for executing
the next trial. This trial either computes the objective function value ϕ(y)
(if the selected subproblem belongs to the level j = M) or generates new
subproblems according to (20) when j ≤ M − 1. In the latter case, the newly
generated problems are added to the current problem set, their characteristics
are computed, and the process is repeated. The optimization process finishes
when the stopping condition is fulfilled for the algorithm solving the root problem.
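Operationally, the adaptive scheme is a priority queue over subproblems keyed by their characteristics. A structural sketch, under the assumption that each subproblem object exposes a characteristic() value and a trial() method returning an optional objective value plus any newly generated subproblems:

```python
import heapq

def adaptive_scheme(roots, n_trials):
    """Spend each trial in the currently most promising subproblem: pop the
    max-characteristic entry, run one trial in it, then push back the
    subproblem and any children it generated (with updated characteristics)."""
    heap = [(-s.characteristic(), i, s) for i, s in enumerate(roots)]
    heapq.heapify(heap)          # max-heap via sign flip on the key
    tie = len(roots)             # tie-breaker so equal keys never compare objects
    best = float('inf')
    for _ in range(n_trials):
        _, _, sub = heapq.heappop(heap)
        value, children = sub.trial()      # objective value (or None) + spawned subproblems
        if value is not None:
            best = min(best, value)
        for s in [sub] + children:
            heapq.heappush(heap, (-s.characteristic(), tie, s))
            tie += 1
    return best
```

Because every generated subproblem stays in the queue, information gathered anywhere in the hierarchy immediately redirects the next trial, which is the point of the adaptive scheme.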
In order to compare the efficiency of the algorithms, two criteria were
used: the average number of trials and the operating characteristic. The
operating characteristic of an algorithm is the function P(k) defined as the
fraction of the problems from the considered series solved using no more than
k trials. A problem was considered solved if the algorithm generated a trial
point y^k in the vicinity of the global minimizer y*, i.e.,
‖y^k − y*‖ ≤ δ‖b − a‖, where δ = 0.01, and a and b are the boundaries of the
search domain D.
The first series of experiments was carried out on the two-dimensional
problems from the classes Fgr, GKLS Simple, and GKLS Hard (100 functions
from each class). Table 1 presents the averaged numbers of trials executed by
GSA with the use of evolvents (Ke), the nested optimization scheme (Kn), and the
adaptive nested optimization scheme (Kan). Figures 1 and 2(a, b) present the
operating characteristics of the algorithms obtained on the problem classes Fgr,
GKLS Simple, and GKLS Hard, respectively. The solid line corresponds to the
algorithm using the evolvents (GSA-E), the short-dashed line to the adaptive
nested optimization scheme (GSA-AN), and the long-dashed line to the nested
optimization scheme (GSA-N). The results of the experiments demonstrate that GSA
with the adaptive nested optimization scheme shows almost the same
speed as GSA with the evolvents, and both algorithms considerably
outperform the algorithm using the nested optimization scheme. Therefore, further
experiments were limited to the comparison of different variants of the adaptive
dimensionality reduction scheme.
Fig. 1. Operating characteristics using the 2d Fgr class (curves for GSA-E, GSA-N and GSA-AN; plot data omitted)
Fig. 2. Operating characteristics using 2d GKLS Simple (a) and Hard (b) classes
The second series of experiments was carried out on the four-dimensional
problems from the classes GKLS Simple and GKLS Hard (100 functions of each
class). Table 2 presents the averaged numbers of trials executed by GSA with the
adaptive nested optimization scheme (Kan) and the block adaptive nested
optimization scheme (Kban) with two levels of subproblems of equal dimensionality
N1 = N2 = 2. Note that when solving a problem of dimensionality N = 4 with
the initial variant of the adaptive scheme, four levels of one-dimensional
subproblems are formed, which complicates their processing.
Figure 3(a, b) presents the operating characteristics of the algorithms
obtained on the classes GKLS Simple and GKLS Hard, respectively. The dashed
line corresponds to GSA using the adaptive nested optimization scheme (GSA-AN),
the solid line to the block adaptive nested optimization scheme (GSA-BAN).
The results of the experiments demonstrate that the block adaptive nested
optimization scheme provides a considerable gain in the number of trials (up to
35%) compared to the initial adaptive nested optimization scheme.
Fig. 3. Operating characteristics using 4d GKLS Simple (a) and Hard (b) classes
5 Conclusion
References
1. Barkalov, K., Gergel, V., Lebedev, I.: Use of Xeon Phi coprocessor for solving
global optimization problems. Lecture Notes in Computer Science, vol. 9251, pp.
307–318 (2015)
2. Barkalov, K., Lebedev, I.: Solving multidimensional global optimization problems
using graphics accelerators. Commun. Comput. Inf. Sci. 687, 224–235 (2016)
3. Barkalov, K., Gergel, V.: Multilevel scheme of dimensionality reduction for parallel
global search algorithms. In: OPT-i 2014 Proceedings of 1st International Confer-
ence on Engineering and Applied Sciences Optimization, pp. 2111–2124 (2014)
4. Carr, C., Howe, C.: Quantitative Decision Procedures in Management and Eco-
nomic: Deterministic Theory and Applications. McGraw-Hill, New York (1964)
5. Evtushenko, Y., Posypkin, M.: A deterministic approach to global box-constrained
optimization. Optim. Lett. 7, 819–829 (2013)
Adaptive Global Optimization 57
1 Introduction
The optimal power flow (OPF) problem has a rich research history since it was
first introduced by Carpentier in 1962 [1]. In practice, the OPF problem aims at
minimizing the electric generator fuel cost needed to meet the desired load
demand of a power system under various operating conditions, such as system
thermal dissipation, voltages, and powers.
Briefly, the classical formulation of the OPF problem can be stated as follows:
In this section, we briefly introduce the polynomial B-spline form [17]. This
polynomial B-spline form is the basis of the main B-spline global optimization
algorithm reported in Sect. 2.3.
60 D. D. Gawali et al.
$x_{-m} = x_{-m+1} = \cdots = x_0,$
$x_i = a + ih, \quad \text{for } i = 1, \ldots, k,$
$x_{k+1} = x_{k+2} = \cdots = x_{k+m}.$
$$\pi_j^t = \frac{\mathrm{Sym}_t(j+1, \ldots, j+m)}{\binom{m}{t}} \quad \text{for } t = 0, \ldots, m. \qquad (3)$$
where
$$\gamma_{i,m}(x) = \begin{cases} \dfrac{x - x_i}{x_{i+m} - x_i}, & \text{if } x_i < x_{i+m},\\ 0, & \text{otherwise}, \end{cases} \qquad (5)$$
and
$$N_i^0(x) := \begin{cases} 1, & \text{if } x \in [x_i, x_{i+1}),\\ 0, & \text{otherwise}. \end{cases} \qquad (6)$$
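Definitions (5) and (6) feed the standard Cox-de Boor recursion $N_i^m(x) = \gamma_{i,m}(x)\,N_i^{m-1}(x) + (1-\gamma_{i+1,m}(x))\,N_{i+1}^{m-1}(x)$; the recursion line itself is not shown in this excerpt, so it is assumed here. A direct, unoptimized sketch (ours, not the authors' code):

```python
def gamma(knots, i, m, x):
    """gamma_{i,m}(x) from (5): local barycentric coordinate; zero when the
    knot span collapses (repeated end knots)."""
    if knots[i + m] > knots[i]:
        return (x - knots[i]) / (knots[i + m] - knots[i])
    return 0.0

def bspline_basis(knots, i, m, x):
    """N_i^m(x) via the standard Cox-de Boor recursion built from (5)-(6)."""
    if m == 0:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    return (gamma(knots, i, m, x) * bspline_basis(knots, i, m - 1, x)
            + (1.0 - gamma(knots, i + 1, m, x)) * bspline_basis(knots, i + 1, m - 1, x))
```

For the clamped knot vector [0, 0, 0, 1, 2, 3, 3, 3], the five quadratic basis functions sum to one at every point of [0, 3), as a partition of unity should.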
A B-Spline Global Optimization Algorithm for Optimal Power Flow Problem 61
where
$$d_j = \sum_{t=0}^{n} a_t\, \pi_j^{(t)}. \qquad (10)$$
where $I := (i_1, i_2, \ldots, i_s)$ and $N := (n_1, n_2, \ldots, n_s)$. By substituting (2) for each
$x_t$, (11) can be written as
$$
\begin{aligned}
p(x_1, \ldots, x_s) &= \sum_{i_1=0}^{n_1} \cdots \sum_{i_s=0}^{n_s} a_{i_1 \ldots i_s} \left( \sum_{j_1=-m_1}^{k_1-1} \pi_{j_1}^{(i_1)} N_{j_1}^{m_1}(x_1) \right) \cdots \left( \sum_{j_s=-m_s}^{k_s-1} \pi_{j_s}^{(i_s)} N_{j_s}^{m_s}(x_s) \right) \\
&= \sum_{j_1=-m_1}^{k_1-1} \cdots \sum_{j_s=-m_s}^{k_s-1} \left( \sum_{i_1=0}^{n_1} \cdots \sum_{i_s=0}^{n_s} a_{i_1 \ldots i_s}\, \pi_{j_1}^{(i_1)} \cdots \pi_{j_s}^{(i_s)} \right) N_{j_1}^{m_1}(x_1) \cdots N_{j_s}^{m_s}(x_s) \\
&= \sum_{j_1=-m_1}^{k_1-1} \cdots \sum_{j_s=-m_s}^{k_s-1} d_{j_1 \ldots j_s}\, N_{j_1}^{m_1}(x_1) \cdots N_{j_s}^{m_s}(x_s),
\end{aligned} \qquad (12)
$$
so we have expressed $p$ as
$$p(x) = \sum_{I \le N} d_I\, N_I^{N}(x), \qquad (13)$$
p̄(x) = D(x).
Remark 1. Equation (15) above states that the minimum and maximum B-spline
coefficients of a multivariate polynomial p on x, obtained by transforming it
from the power form to the B-spline form, provide an enclosure of the range of p.
We shall obtain such a B-spline transformation for the OPF problem (1), followed
by an interval branch-and-bound procedure to locate the correct global optimal
solution of (1).
9 {Subdivision decision}
   if (wid $b < \epsilon$) & (max $D_o(b)$ − min $D_o(b) < \epsilon$) then
      enter the item {$b$, min $D_o(b)$} into $L_{sol}$ & go to 7
   else
      go to 10
   end
10 {Generate two sub-boxes}
   Choose the subdivision direction along the longest direction of $b$ and
   the subdivision point as the midpoint. Subdivide $b$ into two sub-boxes
   $b^1$ and $b^2$ such that $b = b^1 \cup b^2$.
11 for $r \leftarrow 1$ to 2
   (a) {Set flag vector}
       Set $F^r = (F_1^r, \ldots, F_p^r, F_{p+1}^r, \ldots, F_{p+q}^r) := F$
   (b) {Compute B-spline coefficients and the corresponding B-spline range
       enclosures for $b^r$}
       Compute the B-spline coefficient arrays of the objective and
       constraint polynomials on the box $b^r$ and the corresponding
       B-spline range enclosures $D_o(b^r)$, $D_{g_i}(b^r)$ and $D_{h_j}(b^r)$.
   (c) {Set local current minimum estimate}
       Set $\tilde{p}_{local} = \min(D_o(b^r))$
   (d) if ($\tilde{p}_{local} < \tilde{p}$) then
       (I) for $i \leftarrow 1$ to $p$ do
              if ($F_i = 0$) & ($D_{g_i}(b^r) \le 0$) then
                 $F_i^r = 1$
              end
           end
       (II) for $j \leftarrow 1$ to $q$ do
              if ($F_{p+j} = 0$) & ($D_{h_j}(b^r) \subseteq [-\epsilon_{zero}, \epsilon_{zero}]$) then
                 $F_{p+j}^r = 1$
              end
            end
       end
   (e) if $F^r = (1, \ldots, 1)$ then
       set $\tilde{p} := \min(\tilde{p}, \max(D_o(b^r)))$
       end
   (f) Enter {$b^r$, $D_o(b^r)$, $D_{g_i}(b^r)$, $D_{h_j}(b^r)$, $F^r$} into the list $L$.
   end
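Step 10 above is a plain longest-edge bisection. A minimal sketch (ours, not the authors' MATLAB code), with a box represented as a list of (lo, hi) pairs:

```python
def subdivide(box):
    """Step 10: split box b along its longest edge at the midpoint into
    b1 and b2 with b = b1 ∪ b2. A box is a list of (lo, hi) pairs."""
    widths = [hi - lo for lo, hi in box]
    d = widths.index(max(widths))            # longest direction
    lo, hi = box[d]
    mid = 0.5 * (lo + hi)
    b1 = list(box); b1[d] = (lo, mid)
    b2 = list(box); b2[d] = (mid, hi)
    return b1, b2
```

Bisecting at the midpoint of the longest edge keeps the boxes from becoming needle-shaped, which helps the range enclosures shrink uniformly.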
numerical results we chose the global optimization solvers BARON and GloptiPoly.
For BARON, the 3-bus system is modeled in GAMS and solved via the NEOS server
for optimization [16].
The B-spline Opt algorithm is implemented in MATLAB, and the OPF instance
for the 3-bus system is also modeled in MATLAB. It is solved on a PC with an
Intel Core i3-370M processor running at 2.40 GHz with 6 GB of RAM. The
termination accuracy $\epsilon$ and the equality constraint feasibility
tolerance $\epsilon_{zero}$ are set to 0.001.
Table 1 shows the numerical results (global minimum, f∗) for the different
solution approaches. We found that the B-spline Opt algorithm is able to find a
better optimal solution than MATPOWER. It is worth noting that, practically,
this accounts for around 3 $/hr savings in the fuel cost required for
electricity generation. Our numerical results are further validated against
BARON (cf. Table 1). We further note that, for GloptiPoly, the relaxation order
needs to be systematically increased to obtain convergence to the final result.
However, GloptiPoly exhausts the memory even with a small relaxation order (in
this case just 2).
Table 1. Comparison of the optimal fuel cost value (f∗) in (1) for a 3-bus system with
the different solution approaches.

Solver/Algorithm   f∗ ($/hr)
B-spline Opt       5703.52*
BARON              5703.52*
GloptiPoly         **
MATPOWER           5707.11***

* Indicates the best obtained fuel cost value. ** Indicates that the solver did
not give the result even after one hour and was therefore terminated.
*** Indicates the local optimal fuel cost value.
Remark 2. In practice, local optima exist for the OPF problem. Reference [10]
shows that, for small bus power systems whose voltages are within practical
limits, standard fixed-point optimization packages such as MATPOWER converge to
a local optimum. Similarly, we observed that the global optimization software
package GloptiPoly successfully solved the OPF instance of the 3-bus power
system; however, it took a significantly large amount of computational time to
report the final global optimum.
4 Conclusions
This paper addressed an important planning problem in power systems, termed
optimal power flow (OPF). A new global optimization algorithm based on the
polynomial B-spline form was proposed for solving the OPF problem. The
applicability of the B-spline algorithm was demonstrated on the OPF instance
corresponding to a real-world 3-bus system. A notable saving in the fuel cost
(3 $/hr) was achieved using the B-spline algorithm with respect to the
traditional MATPOWER toolbox. Furthermore, the results obtained using the
proposed B-spline algorithm were validated against the generic global
optimization solver BARON and found to be satisfactory.
References
1. Capitanescu, F.: Critical review of recent advances and further developments
needed in AC optimal power flow. Electr. Power Syst. Res. 136, 57–68 (2016)
2. Huneault, M., Galiana, F.: A survey of the optimal power flow literature. IEEE
Trans. Power Syst. 6(2), 762–770 (1991)
3. Torres, G., Quintana, V.: Optimal power flow by a nonlinear complementarity
method. In: Proceedings of the 21st IEEE International Conference Power Industry
Computer Applications (PICA’99), 211–216 (1999)
1 Introduction
In the literature, many different engineering problems referring to the topological
optimization of structural components can be found (see for instance [3–5,9,
13,15]). In most cases, only one component is considered. Recently, however,
problems involving multi-domain optimization, such as the design of multi-phase
or multi-material structures [12], have been considered.
Referring to the particular problem of the optimization of systems composed
of two bodies sharing the same design domain, some applications can
be found. In [7,11], the level set method is used for the optimization of the
2 Problem Formulation
Fig. 1. Generalized geometries of the design domains (Ω1 and Ω2 ) of two bodies sharing
a portion of the design space. Each body has its own system of applied forces (f1 and
f2 ) and boundary constraints (Γ1 and Γ2 ). Ω1−2 represents the shared portion of the
design space. Left: initial definition of the domains. Right: division of the domains with
assignment of the shared portion of the design space to each body.
where $K(E_e^1) = \sum_{e=1}^{N_1} K_e E_e^1$ and $K(E_e^2) = \sum_{e=1}^{N_2} K_e E_e^2$, with $\Omega_1$ and $\Omega_2$ connected,
where u1 and u2 are the displacement fields of the two bodies, xe are the
coordinates of the centre of the considered element, K is the stiffness matrix of
each body, with Ke the element stiffness matrix, Ee1 and Ee2 are the elastic moduli
of each element of the two bodies, with Ee∗1 and Ee∗2 the reference elastic moduli,
N1 and N2 are the numbers of elements of the two bodies, and ρ is the
pseudodensity, with p the penalty term.
Concurrent Topological Optimization of a Multi-component 71
Fig. 2. Diagram of the algorithm for the concurrent topological optimization. Symbols
refer to Fig. 1.
In [2,8], the authors presented a solution algorithm able to solve the
problem stated in Eq. 1 under the condition that the two bodies have the
same mesh in the shared part of the domain, i.e. in this region there is a
one-to-one correspondence between the elements of the two bodies. In the
following, this condition is removed.
The modified solution algorithm is reported in the diagram of Fig. 2. The
algorithm is divided into two parts, i.e. the dashed rectangles labeled A
and B in Fig. 2.
The sub-algorithm A represents a standard topology optimization algorithm
(see [3]). This sub-algorithm follows the solution of the finite element model
72 F. Ballo et al.
of the two bodies (block 1 in Fig. 2). The sub-algorithm B is the part of the
algorithm devoted to the allocation of the shared part of the domain (Ω1−2 ) to
each of the two bodies.
The sub-algorithm B implements the following steps.
After the sub-algorithm B is completed, the new finite element model is con-
structed and the solution algorithm is repeated until convergence or the maxi-
mum number of iterations is reached.
(Figure: geometry of the two-dimensional test case; design domains 1 and 2 share the region 1-2, each body loaded by a force F = 100; axes in dimension units.)
The described algorithm has been employed for the optimization of a tool-support
swing arm for a tube bending machine. This research activity was carried out in
collaboration with BLM Group [1]. Due to the very high production rate of the
machine, the swing arm is subjected to high accelerations and, as a consequence,
an inertial torque arises. The optimization of the system is thus important to
reduce energy consumption and increase the production rate.
In Fig. 5, the tool-support swing arm is depicted. The figure shows only half
of the model due to the symmetry of the geometry of the system. The arm is
composed of two parts, namely the support arm, which rotates around a vertical
rotation axis, and the sledge, which slides on a guide rail on the support arm.
The tool load is applied to the sledge by a multi-point constraint. A contact
interaction is imposed between the sledge and the support arm at the guide
rail. It is worth noting that, by including this contact condition, a nonlinear
finite element analysis has to be run for each optimization step. A screw drive
is moved by a motor in order to position the sledge with respect to the support
arm. The screw is actuated by a motor connected to a gear.
Fig. 6. Optimization results - surfaces with pseudodensity greater than 0.3. Left: com-
plete system. Right: detail of the shared part of the domain.
5 Conclusion
In the present paper, an improved algorithm has been presented for the
concurrent optimization of two bodies sharing part of the design domain. The
improved algorithm allows for the utilization of an arbitrary mesh on each body. Also,
the algorithm has been used with a commercial finite element software by
considering a SIMP approach and a symmetry constraint in the solution, proving
the applicability of the method with existing optimization algorithms. In this
way, the algorithm can be applied to real-world optimization problems.
The new algorithm has been tested on a simple two-dimensional problem and
then applied to the optimization of the arm of a tube bending machine designed
in collaboration with BLM Group. The application has shown the ability of
the algorithm to solve real problems and to find non-trivial, efficient solutions
for the assignment of the shared domain. Further developments of the method,
considering the possibility of including contact interactions in the shared part
of the domain, will be investigated.
References
1. BLM Group. http://www.blmgroup.com. Accessed 24 Jan 2019
2. Ballo, F., Gobbi, M., Previati, G.: Concurrent topological optimisation: optimisa-
tion of two components sharing the design space. In: EngOpt 2018 Proceedings
of the 6th International Conference on Engineering Optimization, pp. 725–738.
Springer International Publishing, Cham (2019)
3. Bendsøe, M.P., Sigmund, O.: Topology Optimization. Theory, Methods, and Appli-
cations, 2nd edn. Springer Berlin (2004)
4. Eschenauer, H.A., Olhoff, N.: Topology optimization of continuum structures: a
review. Appl. Mech. Rev. 54(4), 331 (2001)
5. Guo, X., Cheng, G.D.: Recent development in structural design and optimization.
Acta Mech. Sin. 26(6), 807–823 (2010)
6. Kosaka, I., Swan, C.C.: A symmetry reduction method for continuum structural
topology optimization. Comput. Struct. 70(1), 47–61 (1999)
7. Lawry, M., Maute, K.: Level set shape and topology optimization of finite strain
bilateral contact problems. Int. J. Numer. Methods Eng. 113(8), 1340–1369 (2018)
8. Previati, G., Ballo, F., Gobbi, M.: Concurrent topological optimization of two
bodies sharing design space: problem formulation and numerical solution. Struct.
Multidiscip. Optim. (2018)
9. Rozvany, G.I.N.: A critical review of established methods of structural topology
optimization. Struct. Multidiscip. Optim. 37(3), 217–237 (2009)
10. Sigmund, O.: A 99 line topology optimization code written in matlab. Struct.
Multidiscip. Optim. 21(2), 120–127 (2001)
11. Strömberg, N.: Topology optimization of orthotropic elastic design domains with
mortar contact conditions. In: Schumacher, A., Vietor, T., Fiebig, S., Bletzinger,
K.-U., Maute, K. (eds.) Advances in Structural and Multidisciplinary Optimiza-
tion: Proceedings of the 12th World Congress of Structural and Multidisciplinary
Optimization, pp. 1427–1438. Springer International Publishing, Braunschweig,
Germany (2018)
12. Tavakoli, R., Mohseni, S.M.: Alternating active-phase algorithm for multimate-
rial topology optimization problems: a 115-line MATLAB implementation. Struct.
Multidiscip. Optim. 49(4), 621–642 (2014)
13. Zhang, W., Zhu, J., Gao, T.: Topology Optimization in Engineering Structure
Design. Elsevier, Oxford (2016)
14. Zhang, W., Yuan, J., Zhang, J., Guo, X.: A new topology optimization approach
based on Moving Morphable Components (MMC) and the ersatz material model.
Struct. Multidiscip. Optim. 53(6), 1243–1260 (2016)
15. Zhu, J.H., Zhang, W.H., Xia, L.: Topology optimization in aircraft and aerospace
structures design. Arch. Comput. Methods Eng. 23(4), 595–622 (2016)
Discrete Interval Adjoints in
Unconstrained Global Optimization
1 Introduction
Algorithmic differentiation (AD) [1,2] is the preferred numerical method to
compute derivatives of a given computer code at a specified point; it exploits
the chain rule and elemental symbolic differentiation rules. The tangent mode
of AD computes the Jacobian at a cost proportional to the number of arguments.
In the case of a high-dimensional domain and a low-dimensional codomain, the
adjoint mode is advantageous for the derivative computation, as its cost is
proportional to the number of outputs. AD methods are successfully applied in,
e.g., machine learning [3], computational finance [4], and fluid dynamics [5].
Interval arithmetic (IA) has the property that all function values attainable
on a given domain are reliably contained in the output of the corresponding
interval evaluation. This has the advantage that, instead of evaluating the
function at several points, a single function evaluation in IA suffices to
obtain semi-local information on the function value. Among other uses, IA can
be used to estimate errors in floating-point computations [6,7], and in
optimization to find global optima [8–10]. Branch and bound algorithms are
often applied in this context.
Combining the discrete differentiation techniques of AD with the inclusion
property of IA yields semi-local derivative information. This information can,
e.g., be used to compute worst-case approximations of the error that can occur
in a neighborhood of an evaluation point. Another application field for interval
adjoints is approximate and unreliable computing [11].
c Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 78–88, 2020.
https://doi.org/10.1007/978-3-030-21803-4_8
Discrete Interval Adjoints in Unconstrained Global Optimization 79
2 Methodology
2.1 Interval Arithmetic
IA is a concept that enables computing bounds of a function evaluation on a
given interval. Since this chapter gives only a brief introduction to IA, the
reader is referred to [6–8] for more information on the topic.
We will use the following notation for an interval of a variable x with lower
bound $\underline{x}$ and upper bound $\overline{x}$:
$$[x] = [\underline{x}, \overline{x}] = \{x \in \mathbb{R} \mid \underline{x} \le x \le \overline{x}\}.$$
The midpoint $m[x]$ and the width $w[x]$ are the real numbers defined as
$$m[x] = \tfrac{1}{2}(\underline{x} + \overline{x}), \qquad w[x] = \overline{x} - \underline{x}.$$
The superset relation states that the interval [y] can be an overestimation of
all possible values on [x], but it guarantees that these values are contained.
To ensure this inclusion property, arithmetic operations and elementary
functions must be redefined. More complex functions can then be composed of
these basic functions.
One reason for the already mentioned overestimation is that the underlying
data format (e.g. floating-point numbers) cannot represent the exact bounds.
For a lower bound, IA rounds towards negative infinity; for an upper bound,
towards positive infinity. Overestimation can also be caused by the dependency
problem: if a function evaluation uses a variable multiple times, IA does not
take into account that the actual values taken from these intervals are equal.
The larger the intervals are, the more significant the overestimation is.
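The dependency problem can be seen with a toy interval type. The class below is a minimal illustration of ours (no outward rounding; a real implementation such as the Boost interval library mentioned later would also control rounding):

```python
class Interval:
    """Minimal interval type to illustrate the dependency problem."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __sub__(self, other):
        # [a, b] - [c, d] = [a - d, b - c]
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

x = Interval(0.0, 1.0)
print(x - x)   # prints: [-1.0, 1.0] -- IA forgets both operands are the same x
```

Mathematically x − x is identically 0, but the interval evaluation treats the two occurrences of x as independent and returns an enclosure of width 2.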
Another problem of applying IA occurs if the implementation contains con-
ditional branches that depend on interval arguments. Comparisons of intervals
are only well-defined if these intervals do not intersect. By accepting further
80 J. Deussen and U. Naumann
AD techniques use the chain rule to compute, in addition to the function value
of a primal implementation, the derivatives of the function value with respect
to arguments and intermediate variables at a specified point. Differentiability
of the underlying function is required for the application of AD.
In the following we will only consider multivariate scalar functions with n
arguments x and a single output y
f : Rn → R, y = f (x) . (1)
The first derivative of these functions is the gradient ∇f (x) ∈ Rn , and the second
derivative is the Hessian matrix ∇2 f (x) ∈ Rn×n . The next subsections will
briefly introduce the basic modes of AD. More detailed and general derivations
of these models can e.g. be found in [1,2].
Tangent Mode. The tangent model can be derived by differentiating the function
dependence. Thus, the model consists of the function evaluation in (1) and
$$y^{(1)} = \sum_{j=0}^{n-1} \frac{\partial y}{\partial x_j}\, x_j^{(1)}. \qquad (2)$$
For each evaluation with x(1) set to the i-th Cartesian basis vector ei in Rn
(also called seeding), an entry of the gradient can be extracted from y(1) (also
called harvesting). Using this model to obtain all entries of the gradient
requires n evaluations, which is proportional to the number of arguments. The
cost of this method is similar to that of a finite difference approximation,
but AD methods are accurate up to machine precision.
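Seeding and harvesting in tangent mode can be sketched with a small dual-number class. This is only an illustration of the mechanism, not the dco/c++ tool used later in the paper:

```python
class Tangent:
    """Forward (tangent) mode: propagate (value, directional derivative) pairs.
    Seeding x^(1) = e_i and harvesting y^(1) yields the i-th gradient entry,
    so n evaluations give the full gradient."""
    def __init__(self, v, d=0.0):
        self.v, self.d = v, d

    def __add__(self, o):
        o = o if isinstance(o, Tangent) else Tangent(o)
        return Tangent(self.v + o.v, self.d + o.d)
    __radd__ = __add__

    def __mul__(self, o):
        o = o if isinstance(o, Tangent) else Tangent(o)
        return Tangent(self.v * o.v, self.d * o.v + self.v * o.d)
    __rmul__ = __mul__

def gradient_tangent(f, x):
    g = []
    for i in range(len(x)):                       # one sweep per argument
        args = [Tangent(v, 1.0 if j == i else 0.0) for j, v in enumerate(x)]
        g.append(f(args).d)                       # harvest y^(1)
    return g

# f(x) = x0 * x1 + x1  ->  grad = (x1, x0 + 1)
print(gradient_tangent(lambda x: x[0] * x[1] + x[1], [3.0, 5.0]))  # prints: [5.0, 4.0]
```

Note the n separate sweeps, which is exactly why the adjoint mode below is preferable for gradients of scalar functions with many arguments.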
Adjoint Mode. The adjoint mode is also called reverse mode, due to the reverse
computation of the adjoints compared to the computation of the values. Therefore,
a data-flow reversal of the program is required to store additional information
on the computation (e.g. partial derivatives) [12], which potentially leads to
high memory requirements. The data structure storing this additional information
is often called a tape.
Again following [2], first-order adjoints are denoted with a subscript (1):
$$x_{(1),j} = y_{(1)}\, \frac{\partial y}{\partial x_j}. \qquad (3)$$
This equation is computed for each j = 0, . . . , n − 1. Note that the evaluation
of the primal (1) is also part of the adjoint model. The reverse mode yields the
product of the gradient with the adjoint y(1):
$$x_{(1)} = y_{(1)} \cdot \nabla f(x). \qquad (4)$$
By seeding y(1) = 1, the resulting x(1) contains all entries of the gradient; a
single adjoint computation is required.
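A minimal tape-based reverse mode can make the mechanism concrete. Again this is only an illustration of ours, not the dco/c++ tool:

```python
class Adjoint:
    """Reverse (adjoint) mode: record local partial derivatives on a tape
    during the forward sweep, then propagate adjoints in reverse order.
    Seeding y_(1) = 1 yields the whole gradient in one sweep, cf. (3)-(4)."""
    tape = []                                   # global tape (data-flow record)

    def __init__(self, v, parents=()):
        self.v, self.adj, self.parents = v, 0.0, parents
        Adjoint.tape.append(self)

    def __add__(self, o):
        return Adjoint(self.v + o.v, [(self, 1.0), (o, 1.0)])

    def __mul__(self, o):
        return Adjoint(self.v * o.v, [(self, o.v), (o, self.v)])

def grad(y):
    """Reverse sweep: each x_(1),j accumulates y_(1) * dy/dx_j, cf. (3)."""
    y.adj = 1.0                                 # seed the output adjoint
    for node in reversed(Adjoint.tape):         # data-flow reversal
        for parent, partial in node.parents:
            parent.adj += node.adj * partial

# y = x0*x1 + x1 at (3, 5): gradient is (x1, x0 + 1) = (5, 4)
Adjoint.tape.clear()
x0, x1 = Adjoint(3.0), Adjoint(5.0)
y = x0 * x1 + x1
grad(y)
print(x0.adj, x1.adj)   # prints: 5.0 4.0
```

The tape is exactly the "additional information" whose storage the text warns can dominate the memory footprint.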
The general idea of these algorithms is to partition the domain and remove
those parts that cannot contain a global minimum. Furthermore, the algorithms
compute a bound for the global minimum y∗ on the domain D and return a
subdomain of the desired precision that contains the minimum.
The algorithm referred to in this paper is described in Algorithm 1. It uses a
task queue Q to manage all (sub)domains that need to be analyzed. At the
beginning there is only a single task in the queue (the domain [x] = D). The
algorithm terminates when the queue is empty. In line 7, the tangent component
of [x] and the adjoint component of [y] are seeded. The adjoint model itself is
called in line 8. After that, there are three checks of the following conditions
that need to be fulfilled at the global minimum:
1. The value must be less than any other value in the domain.
2. The first-order optimality condition requires the gradient to be zero.
3. The second-order optimality condition requires a positive-definite Hessian.
To eliminate those parts of the domain that cannot contain a global minimum,
these conditions are reformulated in IA: domains whose lower bound of the
function value y is larger than the upper bound of the global minimum y∗, and
domains that do not contain zeros in the gradient intervals $\nabla_{[x]} f$,
are removed. The third check removes a domain if the Hessian is not
positive-definite, i.e. if
$$\exists\, v \in \mathbb{R}^n : \quad v^\top \cdot \nabla^2_{[x]} f \cdot v < 0.$$
The product of the interval Hessian with a random vector $x^{(2)}$ (line 7) can
be harvested from $[x]^{(2)}_{(1)}$. This product is multiplied by the random
vector again. If the resulting interval is negative, the Hessian is not
positive-definite.
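The three elimination tests can be summarized on interval enclosures represented as (lo, hi) pairs. This is a schematic sketch of the checks described above (names and representation are ours):

```python
def keep_box(y_box, grad_boxes, hess_quad_box, y_star_upper):
    """A box survives branch-and-bound pruning only if
    (1) its value enclosure can still undercut the incumbent bound y*,
    (2) every gradient component encloses zero (first-order optimality),
    (3) the enclosure of v' H v for a random v is not entirely negative
        (otherwise the Hessian cannot be positive-definite on the box).
    All enclosures are (lo, hi) pairs."""
    if y_box[0] > y_star_upper:                              # value check
        return False
    if any(g[0] > 0.0 or g[1] < 0.0 for g in grad_boxes):    # gradient check
        return False
    if hess_quad_box[1] < 0.0:                               # curvature check
        return False
    return True
```

Note the asymmetry: the checks can only prove that a box is *removable*; a surviving box may still contain no minimizer, which is why the refinement loop continues until the precision target is met.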
In line 12, the bound for the global minimum y∗ is updated. The implemented
branch and bound algorithm provides three methods to update this upper bound:
1. Store the minimal upper bound of the interval value y.
2. Store the smallest value evaluated at the midpoints of the domains, f(m[x]).
3. Perform a few gradient descent steps on the domain to advance towards a
(local) minimum and store the smallest function value.
While the first method only needs the already computed interval function value,
the other two methods require further function evaluations, and the third even
requires derivatives in floating-point arithmetic.
If none of the previous checks failed and the domain is still larger than the
desired precision $\epsilon_X$, the domain is refined by a splitting (line 14)
in every direction. This procedure appends $2^n$ new tasks to the task queue.
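Splitting in every direction produces 2**n children per box. A small sketch (ours; boxes again represented as lists of (lo, hi) pairs):

```python
from itertools import product

def refine(box):
    """Refinement step: bisect the box in every direction, producing
    2**n children (n = dimension of the box)."""
    halves = []
    for lo, hi in box:
        mid = 0.5 * (lo + hi)
        halves.append([(lo, mid), (mid, hi)])
    # Cartesian product of the per-direction halves enumerates all children.
    return [list(combo) for combo in product(*halves)]
```

The exponential fan-out per refinement is why removing the flat non-smooth regions from further subdivision (discussed below) matters for the task count.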
Whenever an undefined comparison occurs, the IA implementation is assumed to
throw an exception, which is handled in lines 6 and 19. The algorithm only
Fig. 1. Isolines of the composed objective function (left) with non-smooth regions in
red and global optima in green. Results of the algorithm (right). Red domains can be
non-smooth, green domains can contain global optima, blue, orange and purple indicate
if the value, gradient or Hessian check failed, respectively.
splits these domains down to a previously defined width $\epsilon_N$. This
prevents the algorithm from generating huge numbers of tasks, but it also
results in a neighborhood of each potential non-smoothness that needs to be
investigated further by techniques other than the applied IA.
The implementation of the branch and bound algorithm uses the AD tool
dco/c++¹ [13] to compute all required derivatives in interval as well as in
floating-point arithmetic. The Boost library [14] is used as the implementation
of IA. Both template libraries make use of operator overloading. Choosing the
interval datatype as the base type of the adjoint datatype yields the desired
first-order interval derivatives; nesting the interval adjoint type into a
tangent type yields intervals with second-order derivative information. The
adjoint models are only evaluated if the value check passed. Furthermore, OpenMP
is used for its implementation of a task queue, which also takes care of the
parallelization on a shared-memory architecture.
The user of the branch and bound implementation can decide which of the
conditions described in the previous section should be verified. The software
only uses second-order adjoint types if the second-order optimality condition is
checked. Moreover, the user can select which method should be used to update
the bound of the global minimum. If the third method is chosen the user needs
to decide how many gradient descent steps should be performed for every task.
4 Case Studies
4.1 Ambiguous Control-Flow Branches
As a first test case the global minima of the six-hump camel function [15] are
computed. To show how the algorithm treats control-flow branches the imple-
1
https://www.nag.co.uk/content/adjoint-algorithmic-differentiation.
(Plot data omitted; legend: GD($2\log(w[x]/\epsilon_X)$), GD($\log(w[x]/\epsilon_X)$), GD(16), GD(4), $f(m[x])$, and $\overline{y}$.)
Fig. 2. Convergence of the bound for the minimum y ∗ for the different update
approaches over time (left) and over number of tasks (right). For the gradient descent
methods (GD) the number of performed steps is given in brackets.
To determine which method is best for updating y∗, we performed tests for the
Griewank function [16] with n = 8 on the domain [x0] = [−200, 220] in each
direction. The target interval width was set to $\epsilon_X = 10^{-13}$. In
Fig. 2 we compare the convergence of the minimum bound for the proposed update
methods. Evaluating the function at the midpoint of the domain improves the
branch and bound compared to using the upper bounds of the interval values.
The (incomplete) local search for a minimum implemented by gradient descent can
decrease y∗ even faster, although it has higher computational costs due to the
computation of the gradient for every task. We observe that the gradient descent
method with 16 steps converges faster in the beginning, but it loses this
advantage after some time due to the high computational effort. Thus, choosing
the number of descent steps depending on the width of the domain (brown) leaves
the number of tasks unchanged while the run time decreases. Computing even more
gradient descent steps (green) reduces the number of computed tasks and, with
that, the run time to less than a half.
Table 1. Additional average costs per task for computing interval derivative infor-
mation if required (left) and relative amount of tasks failing the particular conditions
(right) for the Griewank (GW), Rosenbrock (RB) and Styblinski-Tang (ST) function.
References
1. Griewank, A., Walther, A.: Evaluating Derivatives: Principles and Techniques of
Algorithmic Differentiation. 2nd edn. SIAM, Philadelphia, PA (2008)
2. Naumann, U.: The Art of Differentiating Computer Programs: An Introduction to
Algorithmic Differentiation. SIAM, Philadelphia (2012)
3. Baydin, A.G., Pearlmutter, B.A., Radul, A.A., Siskind, J.M.: Automatic differ-
entiation in machine learning: a survey. J. Mach. Learn. Res. 18(1), 5595–5637
(2017)
4. Giles, M., Glasserman, P.: Smoking adjoints: fast Monte Carlo Greeks. Risk 19(1),
88–92 (2006)
5. Towara, M., Naumann, U.: SIMPLE adjoint message passing. Optim. Methods
Softw. 33(4–6), 1232–1249 (2018)
6. Moore, R.E.: Methods and Applications of Interval Analysis, 2nd edn. SIAM,
Philadelphia (1979)
7. Moore, R.E., Kearfott, R.B., Cloud, M.J.: Introduction to Interval Analysis. SIAM,
Philadelphia (2009)
8. Hansen, E., Walster, G.W.: Global Optimization using Interval Analysis. Marcel
Dekker, New York (2004)
9. Neumaier, A.: Complete search in continuous global optimization and constraint
satisfaction. Acta Numer. 13, 271–369 (2004)
10. Floudas, C.A., Pardalos, P.M.: Encyclopedia of Optimization, 2nd edn. Springer,
New York (2009)
11. Vassiliadis, V., Riehme, J., Deussen, J., Parasyris, K., Antonopoulos, C.D., Bellas,
N., Lalis, S., Naumann, U.: Towards automatic significance analysis for approx-
imate computing. In: Proceedings of CGO 2016, pp. 182–193. ACM, New York,
(2016)
12. Hascoët, L., Naumann, U., Pascual, V.: “To be recorded” analysis in reverse-mode
automatic differentiation. FGCS 21(8), 1401–1417 (2005)
13. Naumann, U., Lotz, J., Leppkes, K., Towara, M.: Algorithmic differentiation of
numerical methods: tangent and adjoint solvers for parameterized systems of non-
linear equations. ACM Trans. Math. Softw. 41(4), 26:1–26:21 (2015)
14. Brönnimann, H., Melquiond, G., Pion, S.: The design of the Boost interval arith-
metic library. Theor. Comput. Sci. 351(1), 111–118 (2006)
15. Dixon, L.C.W., Szegö, G.P.: The global optimization problem: an introduction. In:
Towards Global Optimization, vol. 2, pp. 1–15. North-Holland, Amsterdam (1978)
16. Griewank, A.: Generalized descent for global optimization. J. Optim. Theory Appl. 34(1), 11–39 (1981)
17. Rosenbrock, H.H.: An automatic method for finding the greatest or least value of
a function. Comput. J. 3(3), 175–184 (1960)
18. Styblinski, M.A., Tang, T.S.: Experiments in nonconvex optimization: stochastic
approximation with function smoothing and simulated annealing. Neural Netw.
3(4), 467–483 (1990)
Diving for Sparse Partially-Reflexive
Generalized Inverses
1 Introduction
AHA = A (P1)
HAH = H (P2)
M. Fampa was supported in part by CNPq grant 303898/2016-0. J. Lee was supported in part by ONR grant N00014-17-1-2296.
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 89–98, 2020.
https://doi.org/10.1007/978-3-030-21803-4_9
90 V. K. Fuentes et al.
(AH)ᵀ = AH (P3)
(HA)ᵀ = HA (P4)
Note that with regard to how a generalized inverse H is used, we are moti-
vated by the situation in which A is very large (and hence so is H), and we have
many right-hand sides b for which we wish to form Hb. Clearly, a sparse H has
computational advantages for this use case.
Except for P2, the M-P properties are linear in H. Therefore, minimizing ‖H‖₁ over any subset of the M-P properties that includes P1 and excludes P2 yields four different sparse generalized inverses, each of which can be computed by linear optimization; see [11].
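As a sketch, this 1-norm minimization subject to the linear property P1 can be set up as an LP in the standard way, splitting |H| with an auxiliary matrix T. The code below is our own illustration (the function name and the vectorized formulation are not from [11]) and assumes SciPy is available.

```python
import numpy as np
from scipy.optimize import linprog

def sparse_generalized_inverse(A):
    """Minimize ||H||_1 subject to the linear M-P property P1 (AHA = A).

    Variables are vec(H) and vec(T) with -T <= H <= T; the objective
    sums the entries of T, so at the optimum T = |H|.
    """
    m, n = A.shape
    N = n * m                       # number of entries of H (H is n x m)
    # vec(AHA) = (A^T kron A) vec(H) in column-major (Fortran) vectorization
    K = np.kron(A.T, A)
    A_eq = np.hstack([K, np.zeros((m * n, N))])   # K h = vec(A)
    b_eq = A.flatten(order="F")
    # inequality constraints: h - t <= 0 and -h - t <= 0
    I = np.eye(N)
    A_ub = np.vstack([np.hstack([I, -I]), np.hstack([-I, -I])])
    b_ub = np.zeros(2 * N)
    c = np.concatenate([np.zeros(N), np.ones(N)])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(None, None)] * N + [(0, None)] * N)
    return res.x[:N].reshape((n, m), order="F")
```

For instance, on A = [[1, 0], [0, 0]] this yields H = [[1, 0], [0, 0]], the sparsest H satisfying P1.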
Following [16], we call any generalized inverse satisfying P2 a reflexive generalized inverse. P2 is strongly connected with the rank of a generalized inverse. Sparse generalized inverses based on the M-P properties were introduced in [11] and developed further in [10,17]. The modeling ideas presented in Sect. 2 are extended to general bilinear forms in [9]. The work in this paper will also appear as part of the forthcoming doctoral dissertation of V.K. Fuentes.
2 Relaxing P2
T − H ≥ 0,
T + H ≥ 0,
AHA = A.
On top of this formulation, we can easily impose either of P3 and P4, which are obviously linear in H. The challenge is to find a way to work with P2. We do not consider fully imposing P2; indeed, if we had already imposed P3 and P4, we would simply end up with the M-P pseudoinverse, which is likely to be fully dense. Rather, we want to impose a relaxation of P2, so as to give ourselves enough room to find a sparse solution.
First, we write P2 as the following nm non-symmetric quadratic equations
for all ij ∈ n × m.
where the concave terms −t²_pij have been replaced with the linear terms +w_pij, for p = 1, 2. Assuming lower and upper bounds on t_pij (α_pij ≤ t_pij ≤ β_pij), the new variables w_pij are then constrained to satisfy the secant inequalities

−((β²_pij − α²_pij)/(β_pij − α_pij)) (t_pij − α_pij) − α²_pij ≤ w_pij .

We assume that we can impose reasonable interval bounds λ_ij ≤ h_ij ≤ μ_ij on the h_ij, for ij ∈ n × m. Then interval bounds [α_pij, β_pij] on t_pij can be directly derived:
α_1ij = (1/2) [ Σ_{ℓ=1..m} min{(u^ij)_ℓ λ_iℓ, (u^ij)_ℓ μ_iℓ} + Σ_{ℓ=1..n} min{(v^ij)_ℓ λ_ℓj, (v^ij)_ℓ μ_ℓj} ],

β_1ij = (1/2) [ Σ_{ℓ=1..m} max{(u^ij)_ℓ λ_iℓ, (u^ij)_ℓ μ_iℓ} + Σ_{ℓ=1..n} max{(v^ij)_ℓ λ_ℓj, (v^ij)_ℓ μ_ℓj} ],

α_2ij = (1/2) [ Σ_{ℓ=1..m} min{(u^ij)_ℓ λ_iℓ, (u^ij)_ℓ μ_iℓ} − Σ_{ℓ=1..n} max{(v^ij)_ℓ λ_ℓj, (v^ij)_ℓ μ_ℓj} ],

β_2ij = (1/2) [ Σ_{ℓ=1..m} max{(u^ij)_ℓ λ_iℓ, (u^ij)_ℓ μ_iℓ} − Σ_{ℓ=1..n} min{(v^ij)_ℓ λ_ℓj, (v^ij)_ℓ μ_ℓj} ].
We could also seek to tighten these bounds by formulating and solving appropriate optimization problems.
Additionally, we could replace the convex quadratic terms +t²_pij with lower-bounding linearizations. That is, we can replace +t²_pij with

η²_pij + 2η_pij (t_pij − η_pij),

at one or more values η_pij ∈ [α_pij, β_pij] in the interval domain of t_pij. More specifically, we substitute as follows:

+t²_1ij ← η²_1ij + 2η_1ij ((u^ij h_i· + v^ij h_·j)/2 − η_1ij),

and

+t²_2ij ← η²_2ij + 2η_2ij ((u^ij h_i· − v^ij h_·j)/2 − η_2ij).

In this manner, we could choose to work with a linear rather than quadratic model.
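The substitution is valid because the tangent of t² at η never overestimates: t² − (η² + 2η(t − η)) = (t − η)² ≥ 0. A quick numerical check of this identity:

```python
import numpy as np

def tangent(t, eta):
    """Tangent of t^2 at eta: a valid lower bound for t^2 everywhere."""
    return eta**2 + 2.0 * eta * (t - eta)

rng = np.random.default_rng(0)
t = rng.uniform(-5.0, 5.0, size=1000)
eta = rng.uniform(-5.0, 5.0, size=1000)
gap = t**2 - tangent(t, eta)
assert np.all(gap >= -1e-12)            # never overestimates
assert np.allclose(gap, (t - eta)**2)   # gap is exactly (t - eta)^2
```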
Let (Ĥ, K̂) denote the solution of P. Appropriate vectors u^ij and v^ij can be obtained from the columns of the matrices U^ij and V^ij in the SVD

(U^ij)ᵀ (K̂_ij − ĥ_i· ĥ_·j) V^ij = Σ^ij.

It might be beneficial to pre-compute some of these vectors before finding cuts iteratively via SVD.
3 Diving
The inequalities that we have been considering for relaxing P2 are rather heavy, and it is not practical to include a large number of them. Moreover, it may not even be desirable to implicitly include all of them. The inequalities relax P2, but we do not want to fully enforce P2. Instead, we understand that there is a trade-off to be made between sparsity, as measured by ‖H‖₁, and satisfaction of P2, as measured say by ‖H − HAH‖_F. In this section, we propose a "diving" procedure for progressively enforcing P2 while heuristically narrowing the domain of our feasible region.
Diving is well known as a key primal heuristic for mixed-integer linear opti-
mization, in the context of branch-and-bound; see, for example, [1–4,8]. Part of
its popularity stems from the fact that it is easy to implement within a mixed-
integer linear-optimization solver that already has the infrastructure to carry out
branch-and-bound. Iteratively, via a sequence of continuous relaxations, vari-
ables that are required to be integer in feasible solutions are heuristically fixed
to integer values. This is a bit akin to “reduced-cost fixing” (for mixed-integer
linear-optimization), where variables are fixed to bounds in a provably correct
manner. Diving heuristics employ special (heuristic) branching rules, with the
aim of tending towards (primal) feasibility and not towards a balanced sub-
division of the problem (as many branching rules seek to do). These heuristics
“quickly go down” the branch-and-bound tree (in the sense of depth-first search),
giving us the term diving. The heuristic is so important in the context of mixed-
integer linear-optimization solvers that most of them, as a default, do a sequence
of dives at the beginning of the solution process, so as to quickly obtain a good
feasible solution (which is very important for limiting the branching exploration).
Applying this type of idea in continuous non-convex global optimization appears
to be a fairly recent idea; see [12].
Our diving heuristic is closely related to this idea, but there is an important difference. Diving in the context of global optimization aims to get lucky and branch directly toward what will turn out to be a globally-optimal solution. Our context is different: the "target" that we aim toward is the M-P pseudoinverse A⁺. But, importantly, our goal is not to get there; rather, our goal is to find good solutions along the way that trade off sparsity against satisfaction of P2.
We consider a diving procedure that iteratively increases the enforcement of property P2, while heuristically localizing our search, and we study its impact on the sparsity (approximately measured by ‖H‖₁) of a computed generalized inverse H.
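To make the trade-off concrete, the following toy sketch (not the paper's diving procedure; it simply interpolates from a sparse P1-feasible H₀ toward the dense pseudoinverse) records how ‖H‖₁ and the P2 violation ‖H − HAH‖_F move against each other:

```python
import numpy as np

def tradeoff_path(A, H0, steps=10):
    """Illustration only: move from a sparse P1-feasible H0 toward the
    M-P pseudoinverse A+ and record (||H||_1, ||H - HAH||_F) pairs."""
    A_pinv = np.linalg.pinv(A)
    path = []
    for s in np.linspace(0.0, 1.0, steps):
        H = (1.0 - s) * H0 + s * A_pinv
        norm1 = np.abs(H).sum()                 # sparsity proxy ||H||_1
        viol = np.linalg.norm(H - H @ A @ H)    # P2 violation
        path.append((norm1, viol))
    return path
```

At s = 1 the violation vanishes (the pseudoinverse satisfies P2 exactly), while sparser iterates early on the path typically have larger violation.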
– The procedure is initialized with the solution of problem P, but without the
quadratic lifting inequalities.
– We define bounds for h_ij (λ_ij ≤ h_ij ≤ μ_ij), such that [λ_ij, μ_ij] is the smallest interval that contains ĥ_ij and A⁺_ij. By including the current Ĥ in the box, we hope to remain localized to a region where there is a somewhat-sparse solution. By including the M-P pseudoinverse A⁺ in the box, we guarantee that at every step we will have a feasible solution to our domain-restricted relaxation.
– Next, for a fixed number of iterations, we consider the last solution (Ĥ, K̂) of P, and we append to P the following inequalities, for all ij such that K̂_ij − ĥ_i· ĥ_·j ≠ 0_{m×n}:

⟨K_ij, u^ij (v^ij)ᵀ⟩ + w_1ij + t²_2ij ≤ 0,

−((β²_1ij − α²_1ij)/(β_1ij − α_1ij)) (t_1ij − α_1ij) − α²_1ij ≤ w_1ij,

α_1ij ≤ t_1ij ≤ β_1ij,
4 Preliminary Experiments
We present results on an example, as a proof of concept. More detailed computational results will appear in a subsequent publication. We imposed all of P1, P3 and P4. Figure 1 contains two plots. The first shows the increase in ‖H‖₁ as we dive, by iteration. The second shows the decrease in P2 violation as we dive.
Fig. 3. Tradeoff
Figure 3 shows the results, as a scatter plot, for the best weighting, 25/75. The two series in the plot differ only in the selection of the "branching" point. We can see that there is no significant difference.
Overall, we see that our diving heuristic appears to be an effective means for
trading off sparsity against P2 satisfaction.
References
1. Achterberg, T.: Constraint integer programming. Ph.D. thesis, Berlin Institute of
Technology (2007). http://opus.kobv.de/tuberlin/volltexte/2007/1611/
2. Berthold, T.: Primal Heuristics for Mixed Integer Programming. Master’s thesis,
Technische Universität Berlin (2006)
3. Berthold, T.: Heuristics of the branch-cut-and-price framework SCIP. In: Kalcsics, J., Nickel, S. (eds.) Operations Research Proceedings 2007, pp. 31–36. Springer, Berlin (2008)
4. Danna, E., Rothberg, E., Le Pape, C.: Exploring relaxation induced neighborhoods to improve MIP solutions. Math. Progr. Ser. A 102, 71–90 (2005). https://doi.org/10.1007/s10107-004-0518-7
5. Dokmanić, I., Kolundžija, M., Vetterli, M.: Beyond Moore-Penrose: Sparse pseu-
doinverse. In: ICASSP, vol. 2013, pp. 6526–6530 (2013)
6. Dokmanić, I., Gribonval, R.: Beyond Moore-Penrose Part I: Generalized Inverses
that Minimize Matrix Norms (2017). https://hal.inria.fr/hal-01547283
7. Dokmanić, I., Gribonval, R.: Beyond Moore-Penrose Part II: The Sparse Pseudoin-
verse (2017). https://hal.inria.fr/hal-01547283
8. Eckstein, J., Nediak, M.: Pivot, cut, and dive: a heuristic for 0–1 mixed integer
programming. J. Heuristics 13, 471–503 (2007)
9. Fampa, M., Lee, J.: Efficient treatment of bilinear forms in global optimization
(2018). arXiv:1803.07625
10. Fampa, M., Lee, J.: On sparse reflexive generalized inverse. Oper. Res. Lett. 46(6),
605–610 (2018)
11. Fuentes, V., Fampa, M., Lee, J.: Sparse pseudoinverses via LP and SDP relaxations
of Moore-Penrose. CLAIO 2016, 343–350 (2016)
12. Gerard, D., Köppe, M., Louveaux, Q.: Guided dive for the spatial branch-and-
bound. J. Glob. Optim. 68(4), 685–711 (2017)
13. Golub, G., Van Loan, C.: Matrix Computations, 3rd edn. Johns Hopkins University
Press, Baltimore (1996)
14. Penrose, R.: A generalized inverse for matrices. Proc. Camb. Philos. Soc. 51, 406–
413 (1955)
15. Rao, C., Mitra, S.: Generalized Inverse of Matrices and Its Applications. Probabil-
ity and Statistics Series. Wiley (1971)
16. Rohde, C.: Contributions to the theory, computation and application of
generalized inverses. Ph.D. thesis, University of North Carolina, Raleigh,
N.C. (May 1964). https://www.stat.ncsu.edu/information/library/mimeo.archive/
ISMS 1964 392.pdf
17. Xu, L., Fampa, M., Lee, J.: Aspects of symmetry for sparse reflexive generalized
inverses (2019)
Filtering Domains of Factorable Functions
Using Interval Contractors
Laurent Granvilliers(B)
1 Introduction
X^i ⊆ Ω ∩ D ⊆ X^i ∪ X^o ⊆ Ω.
that must be inserted in C. For example, the function whose paving is depicted
in Fig. 1 leads to the set
2 Interval Computations
2.1 Interval Arithmetic
An interval is a closed and connected set of real numbers. The set of intervals
is denoted by I. The empty interval represents an empty set of real numbers.
The width of an interval [a, b] is equal to (b − a). The interval hull of a set of
real numbers S is the interval [inf S, sup S] denoted by hull S. Given an integer
n ≥ 1, an n-dimensional box X is a Cartesian product of intervals X1 × · · · × Xn .
A box is empty if one of its components is empty. The width of a box X, denoted by wid X, is the maximum width of its components.
Interval arithmetic is a set extension of real arithmetic [13]. Let g : D → R be a real function with D ⊆ Rⁿ. An interval extension of g is an interval function G : Iⁿ → I such that g(x) ∈ G(X) for every x ∈ X ∩ D. This property, called the fundamental theorem of interval arithmetic, implies that the interval G(X) encloses the range of g over X. When g corresponds to a basic operation, it is possible to implement the interval operation so as to calculate the hull of the range, by exploiting monotonicity properties, limits and extrema. More complex functions can be extended in several ways. In particular, the natural interval extension of a factorable function consists of evaluating the function with interval operations given interval arguments.
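A minimal sketch of a natural interval extension in Python, ignoring the outward (directed) rounding that a real implementation, e.g. one conforming to IEEE 1788, would apply:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def sqr(self):
        """Tight extension of x^2 (not self * self, which overestimates)."""
        cands = (self.lo * self.lo, self.hi * self.hi)
        lo = 0.0 if self.lo <= 0.0 <= self.hi else min(cands)
        return Interval(lo, max(cands))

    def scale(self, k):
        a, b = k * self.lo, k * self.hi
        return Interval(min(a, b), max(a, b))

def g_natural(x1, x2, x3):
    # natural extension of g(x) = 2*x1 + x2^2 - x3
    return x1.scale(2.0) + x2.sqr() - x3
```

Evaluating g on the box [0, 10] × [−5, 5] × [−1, 4] yields the enclosure [−4, 46].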
Fig. 2. Let g(x) ≤ 0 be an inequality constraint with g(x1 , x2 , x3 ) = 2x1 + x22 − x3 and
let X be the box [0, 10] × [−5, 5] × [−1, 4]. The interval on the right at each node of g is
the result of the interval evaluation phase of the HC4Revise contractor. The interval at
the root node is intersected with the interval I = [−∞, 0] associated with the relation
symbol. The interval on the left at each node is the result of the projection phase
from the root to the leaves. For example, let u ∈ [−4, 0], v ∈ [0, 20] and w ∈ [−4, 26]
be three variables respectively labelling the + node, the × node and the − node. We
have to project the equation v + w = u over v and w, which propagates the new
domain at the root node to its children nodes. To this end the equation is inverted
and it is equivalently rewritten as v = u − w. The new domain for v is calculated as
[0, 20] ∩ ([−4, 0] − [−4, 26]), which leads to the new domain [0, 4] at the × node. The
new domain for w is derived similarly. At the end of this backward phase we obtain the new box [0, 2] × [−2, 2] × [0, 4].
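The projection step in the caption can be reproduced with plain interval operations (intervals represented as (lo, hi) pairs):

```python
def intersect(a, b):
    """Intersection of two intervals; None if empty."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

def sub(a, b):
    """Interval subtraction a - b."""
    return (a[0] - b[1], a[1] - b[0])

# project u = v + w onto v, i.e. v := v ∩ (u − w), with the caption's domains
u, v, w = (-4.0, 0.0), (0.0, 20.0), (-4.0, 26.0)
v_new = intersect(v, sub(u, w))   # (0.0, 4.0), the new domain at the × node
```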
Fig. 3. Let c be the inequality constraint x21 + x22 ≤ 4 that defines a disk in the
Euclidean plane and let X be the box [0, 4] × [−1, 1]. The hatched surface is returned
by an HC4Revise contractor Γ associated with the negation of c. The gray region
X \ Γ (X) is thus an inner region for c (every point satisfies c) and it is a box here.
Fig. 4. The function f(x) = √(x² − x) is undefined in the open interval (0, 1), since the square root is defined on R₊ and g(x) = x² − x is negative for all x such that 0 < x < 1. The restricted domain of the square root entails the constraint x² − x ≥ 0.
3.2 Branch-and-Contract Algorithm
Algorithm 1 implements a classical interval branch-and-contract algorithm that calculates a paving of the domain of definition of a function f within a given box Ω. It maintains a list L from which a CSP ⟨C, X⟩ is extracted at each iteration. This CSP is reduced and divided by two algorithms, contract and branch, that are specific to our problem. If the set C becomes empty, then X is inserted in the set of inner boxes X^i. A tolerance ε > 0 allows the algorithm to stop processing too-small boxes, which are inserted in the set of outer boxes X^o.
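A stripped-down version of this loop can be sketched as follows; the contractor is reduced to an interval-based classifier, so the actual contraction and maximal-inner-box steps of Algorithm 1 are omitted. All names are ours.

```python
import collections

def width(box):
    return max(hi - lo for lo, hi in box)

def bisect(box):
    """Split the largest component of a box (a list of (lo, hi) pairs)."""
    i = max(range(len(box)), key=lambda k: box[k][1] - box[k][0])
    lo, hi = box[i]
    mid = 0.5 * (lo + hi)
    return ([*box[:i], (lo, mid), *box[i + 1:]],
            [*box[:i], (mid, hi), *box[i + 1:]])

def branch_and_contract(box, classify, eps):
    """`classify` returns 'inner', 'empty', or 'unknown' for a box."""
    inner, outer = [], []
    queue = collections.deque([box])
    while queue:
        X = queue.popleft()
        tag = classify(X)
        if tag == "empty":
            continue
        if tag == "inner":
            inner.append(X)
        elif width(X) <= eps:
            outer.append(X)
        else:
            queue.extend(bisect(X))
    return inner, outer

def disk(X):
    """Interval classifier for the constraint x1^2 + x2^2 <= 4."""
    (a, b), (c, d) = X
    lo = ((0.0 if a <= 0.0 <= b else min(a * a, b * b)) +
          (0.0 if c <= 0.0 <= d else min(c * c, d * d)))
    hi = max(a * a, b * b) + max(c * c, d * d)
    if hi <= 4.0:
        return "inner"
    if lo > 4.0:
        return "empty"
    return "unknown"

inner, outer = branch_and_contract([(-3.0, 3.0), (-3.0, 3.0)], disk, 0.25)
```

The inner boxes tile a subset of the disk, and inner plus outer boxes cover it, so the total inner area lower-bounds the disk area 4π.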
is an inner region for the CSP, which means that every point of this region
satisfies all the constraints from C, as illustrated in Fig. 5.
be the interval hull of the contracted boxes with respect to the constraint negations. If H is empty, then X is an inner box and it is inserted in X^i. Now suppose that H is not empty. Let d⁻_i = min H_i − min X_i and d⁺_i = max X_i − max H_i be the distances between the corresponding bounds of X_i and H_i, and let

d = max{d⁻_1, ..., d⁻_n, d⁺_1, ..., d⁺_n}.

Suppose for instance that d = d⁻_j for some j; then X is split into two sub-boxes X^i ∪ X^o at x_j = min H_j. The maximal inner box X^i is directly inserted in the set of inner boxes X^i and the CSP ⟨C, X^o⟩ is inserted in L. Otherwise, a bisection of the largest component of X generates two sub-boxes X′ ∪ X″, and the CSPs ⟨C, X′⟩ and ⟨C, X″⟩ are added to L, which ensures the convergence of the branch-and-contract algorithm.
4 Experimental Results
Fig. 6. The sets of inner boxes X^i computed by the three strategies for the introductory problem at tolerance ε = 0.1: 330 boxes for S1, 738 for S2 and 570 for S3. Their total areas are respectively equal to 30.38, 29.95 and 29.74.
Acknowledgment. The author would like to thank Christophe Jermann for interest-
ing discussions about these topics and his careful reading of a preliminary version of
this paper.
References
1. IEEE Std 1788-2015: IEEE Standard for Interval Arithmetic (2015)
2. Benhamou, F., Goualard, F.: Universally quantified interval constraints. In: Pro-
ceedings of International Conference on Principles and Practice of Constraint Pro-
gramming (CP), pp. 67–82 (2000)
3. Benhamou, F., Goualard, F., Granvilliers, L., Puget, J.F.: Revising hull and box
consistency. In: Proceedings of International Conference on Logic Programming
(ICLP), pp. 230–244 (1999)
4. Chabert, G., Beldiceanu, N.: Sweeping with continuous domains. In: Proceedings
of International Conference on Principles and Practice of Constraint Programming
(CP), pp. 137–151 (2010)
5. Chabert, G., Jaulin, L.: Contractor programming. Artif. Intell. 173(11), 1079–1100
(2009)
6. Collavizza, H., Delobel, F., Rueher, M.: Extending consistent domains of numeric
CSP. In: Proceedings of International Joint Conference on Artificial Intelligence
(IJCAI), pp. 406–413 (1999)
7. Fousse, L., Hanrot, G., Lefèvre, V., Pélissier, P., Zimmermann, P.: MPFR: a
multiple-precision binary floating-point library with correct rounding. ACM Trans.
Math. Softw. 33(2) (2007)
8. Granvilliers, L.: A new interval contractor based on optimality conditions for bound
constrained global optimization. In: Proceedings of International Conference on
Tools with Artificial Intelligence (ICTAI), pp. 90–97 (2018)
9. Granvilliers, L., Benhamou, F.: Algorithm 852: realpaver: an interval solver using
constraint satisfaction techniques. ACM Trans. Math. Softw. 32(1), 138–156 (2006)
10. Hentenryck, P.V., McAllester, D., Kapur, D.: Solving polynomial systems using a
branch and prune approach. SIAM J. Numer. Anal. 34(2), 797–827 (1997)
11. Lhomme, O.: Consistency techniques for numeric CSPs. In: Proceedings of Inter-
national Joint Conference on Artificial Intelligence (IJCAI), pp. 232–238 (1993)
12. Mackworth, A.K.: Consistency in networks of relations. Artif. Intell. 8, 99–118
(1977)
13. Moore, R.E.: Interval Analysis. Prentice-Hall (1966)
Leveraging Local Optima Network
Properties for Memetic Differential
Evolution
1 Introduction
Consider the global optimization problem
2 Definitions
2.1 Strategies
The most popular DE variants which apply different strategies are distinguished
by the notation DE/x/y/z, where
– x specifies the solution to be perturbed, and it can be either rand or best, i.e., a random one or the current best solution. In the algorithm description above, this defines the way p_j is chosen in Step 2(a).
– y specifies the number of difference vectors (i.e., the difference between two randomly selected and distinct population members) to be used in the perturbation done in Step 2(b); its typical values are either 1 or 2.
The choice y = 1 is considered the default, and hence Steps 2(a) and 2(b) are as already given in the description. In the case y = 2, besides p_k and p_l, two further vectors, p_m and p_n, are also selected in order to create another difference vector.
– z identifies which probability distribution function is used by the crossover operator: either bin (binomial) or exp (exponential).
In bin, a dimension index d is chosen at random. In Step 2(c) the vector c is modified as follows: for every index e ≠ d, let c_e := p_ie with probability CR.
In exp, a dimension index d is chosen at random. Starting from d, every dimension e is stepped through and c_e is modified to p_ie; at each step, the modification is finished with probability 1 − CR.
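Following the description above literally (so CR is the probability of copying the parent's component p_ie into the trial vector; the more common DE convention applies CR to the mutant instead), the two crossover operators can be sketched as:

```python
import random

def crossover_bin(mutant, parent, CR, rng=random):
    """Binomial: c starts as the mutant; for every index e != d the
    parent's component is copied with probability CR."""
    n = len(mutant)
    d = rng.randrange(n)
    return [mutant[e] if e == d else
            (parent[e] if rng.random() < CR else mutant[e])
            for e in range(n)]

def crossover_exp(mutant, parent, CR, rng=random):
    """Exponential: starting at a random index d, copy the parent's
    components over consecutive (cyclic) indices; at each step the
    modification finishes with probability 1 - CR."""
    n = len(mutant)
    trial = list(mutant)
    e = d = rng.randrange(n)
    while True:
        trial[e] = parent[e]
        e = (e + 1) % n
        if e == d or rng.random() < 1.0 - CR:
            break
    return trial
```

With CR = 0, bin leaves the mutant untouched while exp copies exactly one parent component; with CR = 1, bin takes the parent everywhere except index d and exp copies all components.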
which is a single-funnel function with 10^n local minimizers, and its global minimum value is 0.
– Schwefel:

f_S(x) = −Σ_{i=1}^{n} x_i sin(√|x_i|),  x ∈ [−500, 500]^n,

which is a highly multi-funnel function with 2^n funnel bottoms, and its global minimum value is −418.98129n.
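The two benchmark functions in their base (unshifted, unrotated) form can be written as follows; the Rastrigin formula is not shown above, so we use its standard definition:

```python
import numpy as np

def rastrigin(x):
    """Standard Rastrigin: 10*n + sum(x_i^2 - 10*cos(2*pi*x_i))."""
    x = np.asarray(x, dtype=float)
    return 10.0 * x.size + np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x))

def schwefel(x):
    """Schwefel: -sum(x_i * sin(sqrt(|x_i|))) on [-500, 500]^n."""
    x = np.asarray(x, dtype=float)
    return -np.sum(x * np.sin(np.sqrt(np.abs(x))))
```

Rastrigin's global minimum is 0 at the origin; Schwefel's minimizer is near x_i ≈ 420.9687 in every coordinate, with value close to −418.98129n.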
In fact, we used modified versions of these functions; namely, we applied shifting and rotation to Rastrigin, f_R(W(x − x̄)), and rotation to Schwefel, f_S(Wx), where W is an n-dimensional orthogonal matrix and x̄ is an n-dimensional shift vector. These transformations result in even more challenging test functions, as they are non-separable and their global minimizer points do not lie in the center of the search space (as in the original versions).
3.4 Results
As already mentioned, we executed K = 50 independent runs for every variant. Both the dimension and the population size were fixed to 20. The MDE parameters were set to F = 0.5 and CR = 0.1 in all experiments.
For the Rastrigin function the tested strategies resulted in different perfor-
mance metrics as it can be seen in Table 1. The most successful is the rand/2/bin
variant as it was able to find the global optimum in all cases. Overall the rand/y/z
strategies did quite well, except the rand/1/exp which resulted in the highest SP
value. Among the best/x/y ones the best/2/bin got the highest success rate and
the lowest SP value, whereas the best/1/exp did not succeed at all. Regarding the
LONs we can notice that the x/2/z strategies led to larger graphs, as expected.
This is a clear indication that these versions discover wider regions during the
optimization runs. Note that larger LONs, such as rand/2/z have not resulted
in larger diameters. The small size LONs of the best/1/z strategies and their low
average degree are evidences of early convergence to local optima.
Table 1. Performance and graph metrics for rotated and shifted Rastrigin-20
As expected, the Schwefel problem turned out to be much more challenging for the MDE versions; see Table 2. Only three out of eight strategies were able to find the global optimum at least once. For this function, rand/2/bin has the largest success rate and the lowest Adf and SP values, being essentially better than any other variant. However, the relatively good performance of rand/2/bin is related to the highest number of nodes and edges in its LONs; hence it spends considerably more computational time than the others. An overall observation
114 V. Homolya and T. Vinkó
is that the diameters are certainly lower for the Schwefel problem than for the
Rastrigin. On the other hand, the average degree values are very similar for the
two problems.
Fig. 1. Function values of out-neighbors for fS with n = 20; the most successful runs
for: best/1/bin (left) and rand/1/bin (right)
During the MDE run the corresponding LON gets built up, and it is possible to store the function values of the nodes. We can investigate the out-neighbors
of node u and compare their function values against u. Figure 1 contains two
plots of this kind, showing two different runs of two MDE variants. The x-axis
contains the function values of LON nodes with positive out-degree. Each dot
shows the function values of the out-neighbors. The straight line helps us to see, for each node, how many neighbors have higher and how many have lower function values. Having more dots above the line indicates that the MDE variant created more children with worse function values from a given node. A side effect of this behavior is a wider exploration of the search space, which can be quite beneficial, especially on multi-funnel functions such as Schwefel.
Although the rand/1/bin variant resulted in much larger LONs than the
best/1/bin ones, Fig. 1 clearly shows that rand/1/bin has relatively much more
dots above the line than below. For the other rand/y/z variants we obtained
similar figures, and we know from Table 2 that some of these variants were able
to find the global minimizer. On the other hand, best/1/bin got stuck in a local
minimizer point and from the plot we can see the sign of greedy behavior.
The fact that more successful variants can show similar behavior for the single-funnel Rastrigin function is shown in Fig. 2. Greedy behavior for this
function could lead to better performance, nevertheless, even the most successful
run (in terms of best function value reached) of best/1/exp converged to a local
minimizer (left hand side on Fig. 2).
Fig. 2. Function values of out-neighbors for fR with n = 20; the most successful runs
for: best/1/exp (left) and rand/1/exp (right)
the population visited fairly large part of the space, so it has good chances to
converge to the global optimum if we use the MDE without this modification.
We propose to extend the MDE algorithm in its Step 2 with the following rule, which has three integer parameters, δ > 0, α > 0 and θ ≤ 0. If the diameter of the current LON is lower than δ, then in every α-th iteration, for all p_i, do the following:
– collect the out-neighbors of p_i into the set N_i,
– calculate the function values of the elements of N_i,
– let N_i^a := {q : f(q) > f(p_i)} and N_i^b := {q : f(q) < f(p_i)},
– if |N_i^a| − |N_i^b| < θ, then replace p_i by a newly generated random vector.
Note that function values of the nodes are stored directly in the LON, so prac-
tically they need to be calculated only once.
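A sketch of this rule in Python; the LON is a plain adjacency dict, function values are read from a stored map (so they are computed only once), and all names are ours:

```python
def restart_rule(lon, f, population, delta, alpha, theta, iteration,
                 new_random_vector, diameter):
    """Apply the proposed restart rule to the population.

    lon: dict mapping a node to its list of out-neighbors
    f:   dict mapping nodes to their stored function values
    new_random_vector: generator for a fresh random individual
    """
    if diameter >= delta or iteration % alpha != 0:
        return population
    refreshed = []
    for p in population:
        neigh = lon.get(p, [])
        above = sum(1 for q in neigh if f[q] > f[p])   # |N_i^a|
        below = sum(1 for q in neigh if f[q] < f[p])   # |N_i^b|
        refreshed.append(new_random_vector() if above - below < theta else p)
    return refreshed
```

With θ ≤ 0, an individual is restarted only when clearly more of its LON children improved on it than worsened, the signature of greedy convergence to a local minimizer.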
Table 3. Performance and graph metrics for rotated and shifted Rastrigin-20 using
the new rule
We can see that our rule improved the percentage of success (S) for the single-funnel Rastrigin function by up to 16%, and resulted in lower average
Table 4. Performance and graph metrics for rotated and shifted Schwefel-20 using the
new rule
5 Conclusions
To the best of our knowledge, our paper is the first one reporting benchmarking results on MDE variants. According to the numerical experiments, the rand/2/bin strategy provides overall the best percentage-of-success metric, especially when it is applied to multi-funnel problems. This is somewhat in line with the results reported in [6] for DE. For a single-funnel function the best/2/bin variant can be advantageous if one needs good success performance, i.e., lower computational time.
We have shown that incorporating certain knowledge of the MDE's local optima network into the evolutionary procedure can lead us to formalize restarting rules that enhance the diversification of the population. Our numerical tests indicate that the proposed restarting rule is beneficial on average for most of the MDE variants.
In this work we have developed a computational tool in Python using the Pyomo and NetworkX packages, which provides us with a general framework to explore further possibilities in the field of (evolutionary) global optimization and network science. We plan to extend our codebase with further MDE rules, in particular those involving network centrality measures for selection [4].
Acknowledgment. This research has been partially supported by the project “Inte-
grated program for training new generation of scientists in the fields of computer sci-
ence”, no EFOP-3.6.3-VEKOP-16-2017-0002. The project has been supported by the
European Union and co-funded by the European Social Fund. Ministry of Human
Capacities, Hungary grant 20391-3/2018/FEKUSTRAT is acknowledged.
References
1. Cabassi, F., Locatelli, M.: Computational investigation of simple memetic approaches for continuous global optimization. Comput. Oper. Res. 72, 50–70 (2016)
2. Hagberg, A., Schult, D., Swart, P.: Exploring network structure, dynamics, and function using NetworkX. Technical report, Los Alamos National Laboratory (LANL), Los Alamos, NM, USA (2008)
3. Hart, W.E., Laird, C.D., Watson, J.P., Woodruff, D.L., Hackebeil, G.A., Nicholson,
B.L., Siirola, J.D.: Pyomo-Optimization Modeling in Python, vol. 67. Springer,
Heidelberg (2012)
4. Homolya, V., Vinkó, T.: Memetic differential evolution using network centrality measures. In: AIP Conference Proceedings, vol. 2070, 020023 (2019)
5. Locatelli, M., Maischberger, M., Schoen, F.: Differential evolution methods based
on local searches. Comput. Oper. Res. 43, 169–180 (2014)
6. Mezura-Montes, E., Velázquez-Reyes, J., Coello Coello, C.A.: A comparative study
of differential evolution variants for global optimization. In: Proceedings of the 8th
Annual Conference on Genetic and Evolutionary Computation, pp. 485–492. ACM
(2006)
7. Moscato, P.: On evolution, search, optimization, genetic algorithms and martial
arts: towards memetic algorithms. Caltech concurrent computation program. C3P
Rep. 826 (1989)
8. Murtagh, B.A., Saunders, M.A.: MINOS 5.5.1 user’s guide. Technical Report SOL
83-20R (2003)
9. Neri, F., Cotta, C.: Memetic algorithms and memetic computing optimization: a
literature review. Swarm Evol. Comput. 2, 1–14 (2012)
10. Piotrowski, A.P.: Adaptive memetic differential evolution with global and local
neighborhood-based mutation operators. Inf. Sci. 241, 164–194 (2013)
11. Skanderova, L., Fabian, T.: Differential evolution dynamics analysis by complex
networks. Soft Comput. 21(7), 1817–1831 (2017)
12. Storn, R., Price, K.: Differential evolution - a simple and efficient heuristic for
global optimization over continuous spaces. J. Global Optim. 11, 341–359 (1997)
13. Vinkó, T., Gelle, K.: Basin hopping networks of continuous global optimization
problems. Cent. Eur. J. Oper. Res. 25, 985–1006 (2017)
Maximization of a Convex Quadratic
Form on a Polytope: Factorization and
the Chebyshev Norm Bounds
1 Introduction
We consider one of the basic global optimization problems [6,9,15,16], maxi-
mization of a convex quadratic form on a convex polyhedral set
There are various methods developed for solving (1). This includes cutting
plane methods [10], reformulation-linearization/convexification and branch &
bound methods [3,15], among others. Polynomial time approximation methods
also exist [18]. We refer to the many works on quadratic programming [7] and concave function minimization [14] for a more detailed state of the art.
In this paper, we focus on computation of a cheap upper bound on f∗. Tight upper bounds are important, for instance, when quadratic functions in a nonlinear model are relaxed. Perhaps more importantly, tight bounds are crucial for the effectiveness of a branch & bound approach when solving nonlinear optimization problems.
Notation. For a matrix A, we use A_{i,∗} to denote its ith row. Inequalities and absolute values are applied entry-wise for vectors and matrices. The vector of ones is denoted by e = (1, ..., 1)ᵀ and the identity matrix of size n by I_n. We use two vector norms, the Euclidean norm ‖x‖₂ = √(xᵀx) and the maximum (Chebyshev) norm ‖x‖_∞ = max_i |x_i|. For a matrix M ∈ R^{n×n}, we use the induced maximum norm ‖M‖_∞ = max_i Σ_j |M_{ij}|.
Factorization. Matrix A can be factorized as A = GᵀG. Then xᵀAx = xᵀGᵀGx = ‖Gx‖₂², and we can formulate the problem as maximization of the squared Euclidean norm
The inner optimization problem max_{x∈M} G_{i,∗}x has the form of an LP problem, and we have to solve max_{x∈M} ±G_{i,∗}x for each i = 1, ..., n. So in order to calculate g∗(G), it is sufficient to solve 2n LP problems in total.
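A sketch of this computation with SciPy, assuming M = {x : Cx ≤ d} is bounded and that g∗(G) sums the squared per-row maxima of |G_{i,∗}x| (the precise displayed definition is not reproduced above):

```python
import numpy as np
from scipy.optimize import linprog

def upper_bound(G, C, d):
    """Upper bound on max_{x in M} ||G x||_2^2 over M = {x : C x <= d},
    obtained by solving 2n LPs (a max and a min per row of G)."""
    n = C.shape[1]
    bnds = [(None, None)] * n        # linprog defaults to x >= 0 otherwise
    total = 0.0
    for g in G:
        hi = -linprog(-g, A_ub=C, b_ub=d, bounds=bnds).fun   # max g.x
        lo = linprog(g, A_ub=C, b_ub=d, bounds=bnds).fun     # min g.x
        total += max(abs(hi), abs(lo)) ** 2
    return total
```

For A = I₂ (so G = I₂) and M the box [−1, 1]², the bound evaluates to 2, which here coincides with the true maximum of xᵀx.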
The quality of the upper bound g∗(G) depends on the factorization A = GᵀG. Our problem thus reads:
Find the factorization A = GᵀG such that the upper bound (3) is as tight as possible.
2 Methods
There are two natural choices for the factorization A = GT G:
Algorithm 1. (Factorization A = R^T R)
Input: Let A = G^T G be an initial factorization.
1: Put R := G.
2: Put y := |R|e.
3: Put α := (1/√n)‖y‖_2.
4: Put H := H(α · e − y).
5: If ‖HR‖_∞ < ‖R‖_∞, put R := HR and go to step 2.
Output: factorization A = R^T R.
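A compact Python sketch of Algorithm 1 follows. We assume, as the later discussion of random Householder matrices suggests, that H(v) denotes the Householder reflection I − 2vv^T/(v^T v); the function names are ours.

```python
import numpy as np

def householder(v):
    """Householder reflection H(v) = I - 2 v v^T / (v^T v); H is orthogonal,
    so replacing R by H R preserves A = R^T R."""
    n = v.size
    s = float(v @ v)
    return np.eye(n) if s == 0 else np.eye(n) - 2.0 * np.outer(v, v) / s

def improve_factorization(G, max_iter=50):
    """Algorithm 1: repeat steps 2-5 while the induced max norm decreases."""
    R = G.copy()
    for _ in range(max_iter):
        y = np.abs(R) @ np.ones(R.shape[1])           # step 2: y := |R| e
        alpha = np.linalg.norm(y) / np.sqrt(y.size)   # step 3
        H = householder(alpha * np.ones(y.size) - y)  # step 4
        if np.max(np.abs(H @ R).sum(axis=1)) < np.max(np.abs(R).sum(axis=1)):
            R = H @ R                                 # step 5: improved, loop
        else:
            break
    return R
```

Because H is orthogonal, every iterate R still satisfies R^T R = A, so the loop can only tighten ‖R‖_∞.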
Alternative Approaches
In order to carry out a comparison, we consider three alternative methods:
Exact method by enumeration. The optimal value f ∗ is attained for a vertex of
the feasible set M. Thus, to compute f ∗ , we enumerate all vertices of M and
take the maximum. Due to high computational complexity of this method, we
use it in small dimensions only.
122 M. Hladı́k and D. Hartman
x^T A^+ x \le \overline{x}^T A^+ x + \underline{x}^T A^+ x - \underline{x}^T A^+ \overline{x}
= (\underline{x} + \overline{x})^T A^+ x - \underline{x}^T A^+ \overline{x} = 2x_c^T A^+ x - \underline{x}^T A^+ \overline{x},
and
x^T A^- x \ge \underline{x}^T A^- x + x^T A^- \underline{x} - \underline{x}^T A^- \underline{x} = 2\underline{x}^T A^- x - \underline{x}^T A^- \underline{x},
x^T A^- x \ge \overline{x}^T A^- x + x^T A^- \overline{x} - \overline{x}^T A^- \overline{x} = 2\overline{x}^T A^- x - \overline{x}^T A^- \overline{x}.
in [−1, 1] and b is chosen randomly and uniformly in [0, e^T |a|]. For larger dimensions
(n ≥ 70), we set a randomly selected 80% of the entries of the constraint matrix
to zero and run the computations in sparse mode.
For small dimensions, the efficiency of a method is evaluated relative to the
exact method. That is, we record the ratio b_m/f∗, where b_m is the upper bound
given by the method and f∗ is the optimal value. For higher dimensions, the
exact method is too time consuming, so the efficiency of a method is evaluated
relative to the trivial method. That is, we record the ratio b_m/b_triv, where b_m
is the upper bound given by the method and b_triv is the upper bound given by
the trivial method.
The computations were carried out in MATLAB R2017b on an eight-processor
AMD Ryzen 7 1800X machine with 32187 MB RAM. The symbols used in the
tables have the following meaning:
– runs: the number of runs over which the mean values in each row are computed;
– triv: the trivial upper bound using the interval hull of M;
– McCormick: the upper bound using McCormick relaxation and the interval
hull of M;
– sqrtm: our upper bound using G as the square root of A;
– sqrtm+it: our upper bound using G as the square root of A and iterative
modification of G by means of Algorithm 1;
– chol: our upper bound using G from the Cholesky decomposition of A;
– chol+it: our upper bound using G from the Cholesky decomposition of A
and iterative modification of G by means of Algorithm 1;
– chol+rand: our upper bound using G from the Cholesky decomposition of A
and iterative improvement of G by trying 10 random Householder matrices.
Small dimension. Table 1 compares the effectivities for small dimensions, and
Table 2 displays the corresponding running times. By definition, effectivity of
the exact method is 1. From the results we see that for a very small dimension,
the best strategy is to compute the exact optimal value – it is the tightest and
fastest method. As the dimension increases, computation of the exact optimal
value becomes more time consuming. The running times of the upper bound
methods are more-or-less the same. Of course, chol+rand is about ten times
slower, since it runs ten instances.
Our approach is more effective with respect to tightness, provided a suitable
factorization is used. The square root of A behaves better on average than the
Cholesky decomposition. Algorithm 1 can improve the performance of the
Cholesky approach, but not that of the square-root one. The random generation
of Householder matrices has the best performance, indicating that there is high
potential in choosing a suitable factorization. On average, random Householder
matrix generation performs similarly when applied to sqrtm or to chol, so we
numerically tested only the latter.
As the dimension increases, all the bounds (the trivial ones, the McCormick
ones and ours) tend to improve. This is rather surprising, and we have
no complete explanation for this behaviour. It seems to be affected by the
geometry of the convex polyhedron in connection with the way the bounds are
constructed.
Table 1. Efficiency of the methods – small dimensions. The best efficiencies are
highlighted in boldface.
Higher dimension. Tables 3 and 4 show the results for higher dimensions. By
definition, the efficiency of the trivial method is 1. For smaller n, random
Householder matrix generation performs best, but for larger n the number of random
matrices is not sufficient and the winner is the square root of A. Sometimes
its tightness is improved by additional iterations, but not always. Again, the
computation times are very similar to each other. This is not surprising, since all
the methods basically need to solve 2n LP problems.
For n ≥ 70, we ran the computations in sparse mode. We can see from the
tables that the calculations took less time due to the sparse mode. With respect
to the efficiencies, the methods perform similarly to the previous dense case.
Again, as the dimension increases, our bounds tend to improve. Since we
related the displayed efficiency of the bounds to the trivial ones, this behaviour
might be caused by the worse quality of the trivial bounds in higher dimensions.
4 Conclusion
We proposed a simple and cheap method to compute an upper bound for the
problem of maximizing a convex quadratic form over a convex polyhedron. The
method is based on a factorization of the quadratic form matrix and application
of the Chebyshev vector norm.
The numerical experiments indicate that (at least for the randomly generated
instances), with basically the same running time, the method gives tighter bounds
than the trivial method or the McCormick relaxation approach. For small
dimensions, the performance of all the considered approximation methods was
low even in comparison with exact optimum computation. However, in medium
or larger dimensions, the effectiveness of our approach becomes very significant.
Therefore, it may serve as a promising approximation method for solving large-scale
problems. Indeed, the larger the dimension, the tighter the bounds we obtained
relative to the trivial or McCormick ones.
Table 3. Efficiency of the methods – higher dimensions. The best efficiencies are
highlighted in boldface. The bottom part was run in sparse mode.
Table 4. Computational times of the methods (in seconds) – higher dimensions. The
bottom part was run in sparse mode.
In the future, it would also be interesting to compare our approach with other
approximation methods, including the state-of-the-art technique of semidefinite
programming.
The question of finding a suitable factorization remains open. In our
experiments, the square-root approach behaves best. Algorithm 1 can sometimes
slightly improve the tightness of the resulting bounds with almost no additional
effort. Nevertheless, as the numerical experiments with random Householder
matrices suggest, there is high potential for achieving even better results. The
problem of finding the best factorization is challenging – so far, there are no
complexity-theoretic results or any kind of characterization.
References
1. Allemand, K., Fukuda, K., Liebling, T.M., Steiner, E.: A polynomial case of uncon-
strained zero-one quadratic optimization. Math. Program. 91(1), 49–52 (2001)
2. Baharev, A., Achterberg, T., Rév, E.: Computation of an extractive distillation
column with affine arithmetic. AIChE J. 55(7), 1695–1704 (2009)
3. Bazaraa, M.S., Sherali, H.D., Shetty, C.M.: Nonlinear Programming. Theory and
Algorithms. 3rd edn. Wiley, Hoboken (2006)
4. Černý, M., Hladı́k, M.: The complexity of computation and approximation of the
t-ratio over one-dimensional interval data. Comput. Stat. Data Anal. 80, 26–43
(2014)
5. Floudas, C.A.: Deterministic Global Optimization. Theory, Methods and Applica-
tions, Nonconvex Optimization and its Applications, vol. 37. Kluwer, Dordrecht
(2000)
6. Floudas, C.A., Visweswaran, V.: Quadratic optimization. In: Horst, R., Parda-
los, P.M. (eds.) Handbook of Global Optimization, pp. 217–269. Springer, Boston
(1995)
7. Gould, N.I.M., Toint, P.L.: A quadratic programming bibliography. RAL Internal
Report 2000-1, Science and Technology Facilities Council, Scientific Computing
Department, Numerical Analysis Group, 28 March, 2012. ftp://ftp.numerical.rl.
ac.uk/pub/qpbook/qp.pdf
8. Hansen, E.R., Walster, G.W.: Global Optimization Using Interval Analysis, 2nd
edn. Marcel Dekker, New York (2004)
9. Horst, R., Tuy, H.: Global Optimization: Deterministic Approaches. Springer, Hei-
delberg (1990)
10. Konno, H.: Maximizing a convex quadratic function over a hypercube. J. Oper.
Res. Soc. Jpn 23(2), 171–188 (1980)
11. Kreinovich, V., Lakeyev, A., Rohn, J., Kahl, P.: Computational Complexity and
Feasibility of Data Processing and Interval Computations. Kluwer, Dordrecht
(1998)
12. McCormick, G.P.: Computability of global solutions to factorable nonconvex pro-
grams: Part I - Convex underestimating problems. Math. Program. 10(1), 147–175
(1976)
13. Moore, R.E., Kearfott, R.B., Cloud, M.J.: Introduction to Interval Analysis. SIAM,
Philadelphia (2009)
Maximization of a Convex Quadratic Form: Factorization and Bounds 127
14. Pardalos, P., Rosen, J.: Methods for global concave minimization: a bibliographic
survey. SIAM Rev. 28(3), 367–379 (1986)
15. Sherali, H.D., Adams, W.P.: A Reformulation-Linearization Technique for Solving
Discrete and Continuous Nonconvex Problems. Kluwer, Boston (1999)
16. Tuy, H.: Convex Analysis and Global Optimization. Springer Optimization and Its
Applications, vol. 110, 2nd edn. Springer, Cham (2016)
17. Vavasis, S.A.: Nonlinear Optimization: Complexity Issues. Oxford University Press,
New York (1991)
18. Vavasis, S.A.: Polynomial time weak approximation algorithms for quadratic pro-
gramming. In: Pardalos, P.M. (ed.) Complexity in Numerical Optimization, pp.
490–500. World Scientific Publishing, Singapore (1993)
New Dynamic Programming Approach
to Global Optimization
Abstract. The paper deals with the problem of finding the global minimum
of a function over a subset of Rn described by values of solutions to
a system of semilinear parabolic equations. We propose the construction
of a new dual dynamic programming to formulate a new optimization
problem. As a consequence, we state and prove a verification theorem for
the global minimum and investigate a dual optimal feedback control for
global optimization.
1 Introduction
In a classical optimization problem, our aim is to minimize a real-valued objective
function, defined on a subset of a Euclidean space, which is determined by a
family of constraint functions. Depending on the type of those functions (linear,
convex, nonconvex, nonsmooth), different tools from analysis and numerical
analysis can be applied in order to find the minimum (or an approximate minimum)
of the objective function (see e.g. [1]). However, some sets that are interesting from a
practical point of view are very difficult to describe by constraints. Sometimes such
problematic sets can be characterized as controllability sets of dynamics, e.g.
differential equations depending on controls. The aim of this paper is to present one
such dynamics, a system of parabolic differential equations, and to construct
a new dynamic programming to derive a verification theorem for the optimization
problem. As a consequence, we can define a dual feedback control and an optimal
dual feedback, and state a theorem regarding sufficient optimality conditions in
terms of the feedback control.
minimize R(x) on P.
Notice that nothing is assumed about R, and the set P can be very irregular. It is
not easy to study such a problem and, in fact, the theory of optimization
does not offer suitable tools to perform that task. We develop a new method
to handle problem R. To this end, we transform R into the language of
optimal control theory. Let us introduce an open, bounded domain Ω ⊂ Rn of
the variables z, a compact set U ⊂ Rm (m ≥ 1) and an interval [0, T]. Define a
family

U = {u(t, z), (t, z) ∈ [0, T] × Ω : u ∈ L¹([0, T] × Ω), u(t, z) ∈ U}

x_t(t, z) − Δx(t, z) = f(t, z, x(t, z), u(t, z)), (t, z) ∈ [0, T] × Ω,  (2)
parameters u. Thus, we can apply tools from optimal control theory. Of course,
one may wonder whether this machinery is too complicated for optimizing R over
P. Everything depends on what type of set P is, as well as on how smooth the
function R is. If R and P are regular enough, we have many instruments in the
theory of optimization to solve problem R, also numerically; but when there is no
sufficient regularity, these methods become very complicated or, in the case of very
bad data, cannot be applied. In order to derive verification conditions, in fact
sufficient optimality conditions, for Rc, we develop a quite new dual method
based on ideas from [3]. Using that dual method, we also construct a new dual
optimal feedback control for Rc. The essential point of the proposed approach is that
we do not need any regularity of R on P, as we move all considerations related
to Rc to an extended space.
Remark 1. Notice that if we omit the integral in the definition of P, then P
becomes a subset of an infinite-dimensional space; the method developed
in the subsequent sections can be applied to that case as well (i.e. to the problem of
finding a minimum in a subset of an infinite-dimensional space).
3 Dual Approach to Rc
The dual approach to optimal control problems was first introduced in [3] and
then developed in several papers for different problems of that kind, governed
by elliptic, parabolic and wave equations (see e.g. [2,4]). In that method we do
not deal directly with a value function but with some auxiliary function, defined
on an extended set and satisfying a dual dynamic equation, which allows us to derive
verification conditions for the primal value function. One of the benefits of this
technique is that we do not need any properties of the value function, such as
smoothness or convexity. In this paper we construct a new dual method
to treat the problem Rc. We start with the definition of a dual set: P ⊂ Rn, an
open set of the variables p. The set P is chosen by us! Let P ⊂ R^{2n+1} be an
open set of the variables (t, z, p), (t, z) ∈ [0, T] × Ω, p ∈ P, i.e.
depending on the primal variable (t, z) and the dual variable p. The primal and
the dual variables are independent, and the functions in the space W^{1,2}(P) enjoy
different properties with respect to (t, z) and p. The strategy of dual dynamic
programming consists in building all notions in the dual space; this also concerns
a dynamic programming equation. Thus the question is: how to construct that
equation in our case? The answer is neither easy nor unique: on the left-hand side
of (2) there is a linear differential operator, which acts on a state x. Certainly,
the auxiliary function V has to be real valued, as it must relate somehow to
a value function. This implies that the system of dynamic equations has to be
composed of one equation only, despite the fact that (2) is a system of n equations.
The main problem is to choose a proper differential operator for the auxiliary
function V and a correct Hamiltonian, as these choices depend on each other.
We have decided that in our approach it is better to apply to V only the
parabolic operator ∂/∂t − Δ. We state the dynamic equation in a strong
form (see (5)). We should stress that this equation is considered in the set P, i.e.
in the set of the variables (t, z, p).
Therefore, we require that a function V(t, z, p), V ∈ W^{1,2}(P), satisfies in
P, for some y⁰ ∈ L²([0, T] × Ω) continuous in t, a parabolic partial differential
equation of dual dynamic programming of the form:

(∂/∂t) V(t, z, p) − Δ_z V(t, z, p) − inf{p f(t, z, V(t, z, p), u) : u ∈ U}
= (∂/∂t) V(t, z, p) − Δ_z V(t, z, p) − p f(t, z, V(t, z, p), u(t, z, p))  (5)
= y⁰(t, z), (t, z, p) ∈ P,
as well as the initial condition

∫_Ω y⁰(T, z) dz ≤ R(∫_Ω p V(T, z, p) dz), p ∈ P,  (6)
while V(t, z, p) is a solution to (5). We will call p(·) a dual trajectory, while x(·)
stands for a primal trajectory. Moreover, we say that a dual trajectory p(·) is
dual to x(·) if both are generated by the same control u(t, z). Further, we confine
ourselves only to those admissible trajectories x(·) which satisfy the equation
x(t, z) = p(t, z)V(t, z, p(t, z)) for (t, z) ∈ [0, T] × Ω. Thus denote

Ad_V = {(x, u) ∈ Ad : there exists p ∈ L²([0, T] × Ω), dual to x(t, z),
such that x(t, z) = p(t, z)V(t, z, p(t, z)) for (t, z) ∈ [0, T] × Ω}.
132 A. Kaźmierczak and A. Nowakowski

Actually, it means that we are going to study the problem Rc possibly in
some smaller set Ad_V, which is determined by V. All the above was simply the
precise description of the family Ad_V. This means we must reformulate Rc to:

R^V = inf_{(x,u)∈Ad_V} R(∫_Ω x(T, z) dz).  (8)
We name R^V the dual optimal value, in contrast to the optimal value

R^o = inf_{(x,u)∈Ad} R(∫_Ω x(T, z) dz),

where the infimum is taken over the whole set Ad rather than over the set Ad_V.
Moreover, an essential point is that the set Ad_V is, in general, smaller
than Ad, i.e. Ad_V ⊂ Ad, so the dual optimal value R^V may be greater than
the optimal value R^o, i.e. R^V ≥ R^o. In order to find the set Ad_V, we must first
find the function V, i.e. solve equation (5), and then define the set of admissible
dual trajectories. This is not easy, but it permits us to assert that a suspected
trajectory is really optimal with respect to all trajectories lying in Ad_V. To our
knowledge, this fact is presented in the literature for the first time.
Remark 2. We need not worry about the problem R if Ad_V is strictly smaller
than Ad, since the given P can be characterized with the help of the set Ad_V. In
practice, we extend Ad so that (a possibly smaller set) Ad_V corresponds
precisely to P.
Moreover, assume that x̄(t, z) = p̄(t, z)V(t, z, p̄(t, z)), (t, z) ∈ [0, T] × Ω, together
with ū, belongs to Ad_V.
Then (x̄(·), ū(·)) is the optimal pair relative to all (x(·), u(·)) ∈ Ad_V.
Proof. Let us take any (x(·), u(·)) ∈ Ad_V and p(·) generated by u(·), i.e. such
that (u(t, z), p(t, z)), (t, z) ∈ [0, T] × Ω, satisfy (7) for some y ∈ L²([0, T] × Ω),
y ≤ y⁰. Hence, by the definition of Ad_V, the control u(·) generates x(t, z) =
p(t, z)V(t, z, p(t, z)), (t, z) ∈ [0, T] × Ω. Then, on the basis of (9) and (6), we can
write

R(∫_Ω x̄(T, z) dz) = R(∫_Ω p̄(T, z)V(T, z, p̄(T, z)) dz) = ∫_Ω y⁰(T, z) dz
≤ R(∫_Ω p(T, z)V(T, z, p(T, z)) dz) = R(∫_Ω x(T, z) dz),

which gives the assertion.
New Dynamic Programming Approach to Global Optimization 133
A dual feedback control ū(t, z, p), (t, z, p) ∈ P, is called optimal if there exist:
(i) a function x̄(t, z, p), (t, z, p) ∈ P, x̄ ∈ W^{1,2}(P), satisfying (10) with ū(t, z, p);
(ii) V ∈ W^{1,2}(P), given by the relation x̄(t, z, p) = pV(t, z, p), satisfying (6) for
some y⁰ ∈ L²([0, T] × Ω) and defining

Ad_x̄ = {(x, u) ∈ Ad : x(t, z) = x̄(t, z, p(t, z)) for some p ∈ L²([0, T] × Ω)
satisfying (7) with u(t, z) = ū(t, z, p(t, z))
and some y ∈ L²([0, T] × Ω), y ≤ y⁰};

(iii) a dual trajectory p̄(·) ∈ L²([0, T] × Ω) such that the pair

x̄(t, z) = x̄(t, z, p̄(t, z)), ū(t, z) = ū(t, z, p̄(t, z)), (t, z) ∈ [0, T] × Ω,

is optimal relative to the set Ad_x̄, and p̄ satisfies (7) together with ū.
The next theorem asserts the existence of an optimal dual feedback control, again
in terms of the function V(t, z, p).
Theorem 2. Let ū(t, z, p) be a dual feedback control in P and let x̄(t, z, p),
(t, z, p) ∈ P, be defined according to (10). Suppose that for some y⁰ ∈ L²([0, T] ×
Ω) there exists a function V ∈ W^{1,2}(P) satisfying (6), and that (11) holds.
Let p̄(·) ∈ L²([0, T] × Ω), (t, z, p̄(t, z)) ∈ P, be such a function that the pair
x̄(t, z) = x̄(t, z, p̄(t, z)), ū(t, z) = ū(t, z, p̄(t, z)) belongs to Ad_x̄ and p̄ satisfies
(7) with ū and V. Moreover, assume that

R(∫_Ω p̄(T, z)V(T, z, p̄(T, z)) dz) = R(∫_Ω y⁰(T, z) dz).  (12)
Proof. Take any function p(t, z), p ∈ L²([0, T] × Ω), dual to x(t, z) =
x̄(t, z, p(t, z)) and such that for u(t, z) = ū(t, z, p(t, z)) we have (x, u) ∈ Ad_x̄. By (11),
it follows that x(t, z) = p(t, z)V(t, z, p(t, z)) for (t, z) ∈ [0, T] × Ω. Analogously
to the proof of Theorem 1, Eqs. (6) and (12) give

R(∫_Ω p̄(T, z)V(T, z, p̄(T, z)) dz) ≤ R(∫_Ω x(T, z) dz).  (13)
References
1. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press
(2004)
2. Galewska, E., Nowakowski, A.: A dual dynamic programming for multidimensional
elliptic optimal control problems. Numer. Funct. Anal. Optim. 27, 279–289 (2006)
3. Nowakowski, A.: The dual dynamic programming. Proc. Am. Math. Soc. 116, 1089–
1096 (1992)
4. Nowakowski, A., Sokolowski, J.: On dual dynamic programming in shape control.
Commun. Pure Appl. Anal. 11, 2473–2485 (2012)
On Chebyshev Center of the Intersection
of Two Ellipsoids
1 Introduction
and F_i ∈ R^{m_i×n}, g_i ∈ R^{m_i} for i = 1, 2. We assume that one of the two ellipsoids is
nondegenerate, so that Ω is bounded. To this end, we let F₁ be of full column rank.
We also assume that Ω has at least one interior point. Without loss of generality,
we assume the origin 0 is an interior point of Ω, that is, ‖g_i‖ < 1, i = 1, 2.
Under these assumptions, (CC) has an optimal solution (z∗, x∗). Then z∗ is the
Chebyshev center of Ω, and the ball centered at z∗ with radius ‖x∗ − z∗‖ is the
smallest ball covering Ω.
(CC) has a direct application in bounded error estimation. Consider the
linear regression model Ax ≈ b, where A is ill-conditioned. In order to stabilize
the estimation, a regularization constraint ‖Lx‖² ≤ η is introduced to restrict
x. Therefore, the admissible solutions to the linear system are given by the
intersection of two ellipsoids [13]:

© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 135–144, 2020.
https://doi.org/10.1007/978-3-030-21803-4_14
136 X. Cen et al.
As a robust approximation of the true solution, Beck and Eldar [4] suggested
the Chebyshev center of F, which leads to the minimax optimization (CC).
(CC) is difficult to solve. Relaxing the inner nonconvex quadratic optimization
problem to its Lagrange dual (which can be reformulated as a semidefinite
programming (SDP) minimization), Beck and Eldar [4] proposed the SDP relaxation
approach for (CC). Their numerical experiments demonstrated that this
approximation is "pretty good" in practice. Interestingly, when (CC) is defined
over the complex domain rather than the real space, there is no gap between (CC)
and this SDP relaxation, since strong duality holds for the inner quadratic
maximization with two quadratic constraints over the complex domain [3]. The other
zero-duality case is reported in [2], where both ellipsoids are Euclidean balls and
n ≥ 2. The SDP relaxation approach was later extended by Eldar et al. [10]
to find the Chebyshev center of the intersection of multiple ellipsoids, where an
alternative derivation of the SDP relaxation was presented.
To the best of our knowledge, there is no dedicated global optimization
method for solving (CC). Moreover, the answers to the following two questions are unknown:
– The SDP relaxation has been shown to be "pretty good" only in numerical
experiments [4]. Is there any theoretical guarantee?
– Can (CC) be globally solved in polynomial time?
In this paper, we will positively answer the above two questions. In particular,
we establish in Sect. 2 the worst-case approximation bound of the SDP relaxation
of (CC). In Sect. 3, we propose a global optimization method to solve (CC) and
show that it can be done in polynomial time. As a by-product, in Sect. 4, we
show that based on (CC) one can randomly generate Celis-Dennis-Tapia (CDT)
subproblems having positive Lagrangian duality gap with high probability.
Notations. Let σ_max(·) and σ_min(·) be the largest and smallest singular
values of the matrix (·), respectively. Denote by I_n the n × n identity matrix.
v(·) denotes the optimal value of the problem (·). For two n × n symmetric
matrices A and B, Tr(AB) = Σ_{i=1}^n Σ_{j=1}^n a_{ij} b_{ij} returns the inner product of A
and B. A ≻ (⪰) B means that the matrix A − B is positive (semi)definite. Let
0_n and O be the n-dimensional zero vector and the n × n zero matrix, respectively.
For a real number x, ⌈x⌉ denotes the smallest integer larger than or equal to x.
We first introduce the SDP relaxation in a new, simple way. Consider the
inner nonconvex maximization of (CC):

(QP(z)) max_{x∈Ω} {x^T x − 2z^T x + z^T z}.  (2)
Σ_{i=1}^{2} α_i F_i^T F_i ⪰ I_n.  (6)
The last inequality actually holds as an equality, since one can verify that

\begin{pmatrix} A & b \\ b^T & c \end{pmatrix} \succeq 0,\ A \succeq I_n \ \Longrightarrow\ \begin{pmatrix} A & b \\ b^T & c \end{pmatrix} - \begin{pmatrix} I_n & A^{-1}b \\ b^T A^{-1} & b^T A^{-2}b \end{pmatrix} \succeq 0.  (7)
Denote by (SDP) the SDP relaxation (4)–(6). Let (α₁∗, α₂∗) be an optimal
solution of (SDP). Then, according to (7), the optimal solution argmin_z v(D(z))
is recovered by

z∗ = −(Σ_{i=1}^{2} α_i∗ F_i^T F_i)^{-1} Σ_{i=1}^{2} α_i∗ F_i^T g_i,
where the parameter γ (0 ≤ γ < 1) is the optimal value of the following univariate
concave maximization problem:

γ = sup_{0<λ<1} { λ‖g₁‖² + (1 − λ)‖g₂‖² − l(λ)^T (λF₁^T F₁ + (1 − λ)F₂^T F₂)^{-1} l(λ) },

and l(λ) = λF₁^T g₁ + (1 − λ)F₂^T g₂. Moreover, suppose both ellipsoids are
nondegenerate; then γ is bounded in terms of the distance between their centers,
denoted by c₁ and c₂ respectively. That is,

γ ≤ min{σ²_max(F₁), σ²_max(F₂)} · ‖c₁ − c₂‖².  (10)
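Since γ is the supremum of a concave function of the scalar λ, it can be evaluated numerically with a bounded univariate search. The following Python sketch (function name ours) mirrors the formula above.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def gamma_param(F1, g1, F2, g2):
    """gamma = sup_{0<lam<1} lam*||g1||^2 + (1-lam)*||g2||^2
               - l(lam)^T (lam F1^T F1 + (1-lam) F2^T F2)^{-1} l(lam),
    with l(lam) = lam F1^T g1 + (1-lam) F2^T g2."""
    def phi(lam):
        M = lam * F1.T @ F1 + (1.0 - lam) * F2.T @ F2
        l = lam * F1.T @ g1 + (1.0 - lam) * F2.T @ g2
        return lam * g1 @ g1 + (1.0 - lam) * g2 @ g2 - l @ np.linalg.solve(M, l)
    # phi is concave in lam, so a bounded scalar search finds the supremum
    res = minimize_scalar(lambda lam: -phi(lam), bounds=(1e-9, 1 - 1e-9),
                          method='bounded')
    return -res.fun
```

On data with ‖g₁‖, ‖g₂‖ < 1 (such as Example 1 below), the returned value stays below 1, consistent with the theorem.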
Proof. The proof is based on the following approximation bound for nonconvex
quadratic optimization.

Theorem 2 (Theorem 2.3 [12]). Consider the following nonconvex quadratic
optimization problem with k ellipsoid constraints:

(EQP) max_{x∈R^n} g(x) = x^T F₀x + 2g₀^T x
      s.t. ‖F_i x + g_i‖² ≤ 1, i = 1, . . . , k.

Suppose 0_n is in the interior of the feasible region of (EQP) and the primal SDP
relaxation of (EQP), denoted by (SDR), has an optimal solution. Then a feasible
solution x̃ of (EQP) can be generated in polynomial time such that

g(x̃) ≥ ((1 − √γ)/(√r̃ + √γ))² v(SDR),
Since we have assumed ‖g_i‖ < 1, i = 1, 2, it follows from the definition (11) that

γ ≤ max{‖F₁0_n + g₁‖², ‖F₂0_n + g₂‖²} = max{‖g₁‖², ‖g₂‖²} < 1.

Now we show (10). For the two centers c₁ and c₂, we have F_i c_i + g_i = 0 for
i = 1, 2. It follows from the definition (11) that

√γ ≤ max{‖F₁c₁ + g₁‖, ‖F₂c₁ + g₂‖} = ‖F₂c₂ + g₂ + F₂(c₁ − c₂)‖
   ≤ ‖F₂c₂ + g₂‖ + σ_max(F₂)‖c₁ − c₂‖ = σ_max(F₂)‖c₁ − c₂‖.

Similarly, we have √γ ≤ σ_max(F₁)‖c₁ − c₂‖.
The proof of (10) is complete.
Remark 1. The approximation bound (9) is not tight. Interestingly, when γ = 0,
one can prove that v(CC) = v(SDP).
Solving the above determinantal equations reduces to finding the generalized
zero eigenvalues. Therefore, the computational complexity of the global
algorithm for solving the CDT problem is at most O(n⁶ log log u⁻¹), where u is the
unit roundoff.
Now we focus on solving (CC), which is an unconstrained optimization problem
in terms of z. Since the optimal z-solution clearly lies in the interior of Ω, we
can add a redundant constraint to (CC):

v(CC) = min_z f(z) := z^T z + max_{x∈Ω} {x^T x − 2z^T x}  (12)
        s.t. z ∈ Q := {z ∈ R^n : f̄(z) := ‖F₁z + g₁‖² − 1 ≤ 0}.

One can see that the convex feasible region Q is bounded, closed and has
nonempty interior. The objective function f(z) is nonsmooth but convex. For
any given point z, let x∗(z) be an optimal solution of max_{x∈Ω}{x^T x − 2z^T x},
which is solved as a CDT subproblem. Then a subgradient of f at any point
z is given by

g(z) = 2z − 2x∗(z).  (13)
We employ the ellipsoid method to solve the nonsmooth convex problem
(12). The algorithmic framework, following [14], is presented below,
where z_c := −(F₁^T F₁)^{-1}F₁^T g₁ is the center of the ellipsoid Q and ḡ(z) denotes
the gradient of f̄(z).
Ellipsoid method
The following convergence result of the above ellipsoid method for solving the
nonsmooth convex optimization problem min_{z∈Q} f(z) (whose optimal value and
optimal solution are denoted by f∗ and z∗, respectively) can be found in [14].

Theorem 3 (Theorem 3.2.8, [14]). Let f(z) be Lipschitz continuous on the ball {z ∈
R^n : ‖z − z∗‖ ≤ R} with some constant M. Assume that there exist some ρ > 0
and z̄ ∈ Q such that {z ∈ R^n : ‖z − z̄‖ ≤ ρ} ⊆ Q. Then for any k,

Q ∩ {y₀, y₁, . . . , y_k} ≠ ∅ and min_{0≤j≤k, y_j∈Q} f(y_j) − f∗ ≤ (1/ρ) M R² · e^{−k/(2(n+1)²)}.
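The framework can be sketched generically in Python (names ours; the update formulas are the standard central-cut ellipsoid steps of [14]): at a feasible center we cut with an objective subgradient, at an infeasible one with the constraint gradient.

```python
import numpy as np

def ellipsoid_method(f, subgrad_f, fbar, grad_fbar, z, P, iters=300):
    """Minimize a nonsmooth convex f over Q = {z : fbar(z) <= 0} with the
    ellipsoid method; (z, P) encodes {y : (y - z)^T P^{-1} (y - z) <= 1}."""
    n = z.size
    best_val, best_z = np.inf, None
    for _ in range(iters):
        if fbar(z) <= 0:                 # feasible center: objective cut
            if f(z) < best_val:
                best_val, best_z = f(z), z.copy()
            g = subgrad_f(z)
        else:                            # infeasible center: feasibility cut
            g = grad_fbar(z)
        gt = g / np.sqrt(g @ P @ g)      # normalize in the P metric
        z = z - (P @ gt) / (n + 1)       # shift the center
        P = (n * n / (n * n - 1.0)) * (P - (2.0 / (n + 1)) * np.outer(P @ gt, gt @ P))
    return best_z, best_val
```

As a sanity check, for f(z) = ‖z − (2, 0)‖² over the unit disk, the method approaches the constrained optimum z∗ = (1, 0) with f∗ = 1, the best recorded value decreasing at the geometric rate of Theorem 3.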
Proof. According to the definitions of (QP(z)) (2) and (D(z)) (3), we have
where the first inequality follows from weak duality and the second inequality
holds trivially.
Under the assumption v(SDP) > v(CC), it follows from the above chain of
inequalities and the definition v(SDP) = minz v(D(z)) that
v(QP(z ∗ )) < v(D(z ∗ )).
The proof is complete since (D(z ∗ )) is the Lagrangian dual problem of the CDT
subproblem (QP(z ∗ )).
We tested 1000 instances of (CC) in two and three dimensions, respectively,
where each component of the inputs F_i and g_i (i = 1, 2) is randomly,
independently and uniformly generated from {0, 0.01, 0.02, · · · , 0.99, 1}. v(CC) and v(SDP)
are computed by the ellipsoid method of Sect. 3 and by the solver CVX [11],
respectively. To our surprise, among the 1000 two-dimensional instances, there are
766 instances satisfying v(SDP) > v(CC), while for the 1000 three-dimensional
instances, the number of instances satisfying v(SDP) > v(CC) is 916. This implies
that, with the help of (CC) and Proposition 1, one can generate CDT subproblems
admitting a positive duality gap with high probability.
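The random instances described above can be reproduced with a short sketch (the generator and its names are ours; the paper does not specify a random seed):

```python
import numpy as np

def random_cc_instance(n, rng):
    """Draw each entry of F1, g1, F2, g2 independently and uniformly
    from the grid {0, 0.01, 0.02, ..., 0.99, 1}."""
    draw = lambda *shape: rng.integers(0, 101, size=shape) / 100.0
    return draw(n, n), draw(n), draw(n, n), draw(n)

rng = np.random.default_rng(2019)     # arbitrary seed, ours
F1, g1, F2, g2 = random_cc_instance(2, rng)
```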
Finally, we illustrate two small examples of (CC) and the corresponding CDT
subproblems (QP(z∗)). For each example, we plot in Fig. 1 the exact Chebyshev
center and the corresponding SDP approximation: the smallest covering circle
with radius √v(CC) and the approximate circle via SDP relaxation, whose radius
is √v(SDP). One can observe that the smaller the distance between the centers
of the two input ellipses, the tighter the SDP relaxation. This demonstrates
relation (10) in Theorem 1.
Example 1. Let

n = 2, F₁ = (1 0; 0 1), g₁ = (0.94, 0.19)^T, F₂ = (0.01 0.88; 0.72 0.39), g₂ = (0.51, 0.15)^T.
Fig. 1. Two examples in two dimensions, where the input ellipses are plotted with solid
lines. The dotted and dashed circles are the Chebyshev solutions and the SDP
approximations, respectively. Chebyshev centers and the corresponding SDP approximations
are marked by ∗ and +, respectively.
References
1. Ai, W., Zhang, S.: Strong duality for the CDT subproblem: a necessary and suffi-
cient condition. SIAM J. Optim. 19(4), 1735–1756 (2009)
2. Beck, A.: Convexity properties associated with nonconvex quadratic matrix func-
tions and applications to quadratic programming. J. Optim. Theory Appl. 142(1),
1–29 (2009)
3. Beck, A., Eldar, Y.: Strong duality in nonconvex quadratic optimization with two
quadratic constraints. SIAM J. Optim. 17(3), 844–860 (2006)
4. Beck, A., Eldar, Y.: Regularization in regression with bounded noise: a Chebyshev
center approach. SIAM J. Matrix Anal. Appl. 29(2), 606–625 (2007)
5. Bienstock, D.: A note on polynomial solvability of the CDT problem. SIAM J.
Optim. 26(1), 488–498 (2016)
6. Burer, S.: A gentle, geometric introduction to copositive optimization. Math. Pro-
gram. 151(1), 89–116 (2015)
7. Burer, S., Anstreicher, K.M.: Second-order-cone constraints for extended trust-
region subproblems. SIAM J. Optim. 23(1), 432–451 (2013)
8. Chen, X., Yuan, Y.: On local solutions of the Celis-Dennis-Tapia subproblem.
SIAM J. Optim. 10(2), 359–383 (2000)
9. Consolini, L., Locatelli, M.: On the complexity of quadratic programming with two
quadratic constraints. Math. Program. 164(1–2), 91–128 (2017)
10. Eldar, Y., Beck, A.: A minimax Chebyshev estimator for bounded error estimation.
IEEE Trans. Signal Process. 56(4), 1388–1397 (2008)
11. Grant, M., Boyd, S.: CVX: Matlab software for disciplined convex programming,
version 2.1. (March 2014). http://cvxr.com/cvx
12. Hsia, Y., Wang, S., Xu, Z.: Improved semidefinite approximation bounds for non-
convex nonhomogeneous quadratic optimization with ellipsoid constraints. Oper.
Res. Lett. 43(4), 378–383 (2015)
13. Milanese, M., Vicino, A.: Optimal estimation theory for dynamic systems with set
membership uncertainty: an overview. Automatica 27(6), 997–1009 (1991)
14. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course.
Kluwer Academic, Boston (2004)
15. Sakaue, S., Nakatsukasa, Y., Takeda, A., Iwata, S.: Solving generalized CDT prob-
lems via two-parameter eigenvalues. SIAM J. Optim. 26(3), 1669–1694 (2016)
16. Xia, Y., Yang, M., Wang, S.: Chebyshev center of the intersection of balls: com-
plexity, relaxation and approximation (2019). arXiv:1901.07645
17. Yang, B., Burer, S.: A two-variable approach to the two-trust-region subproblem.
SIAM J. Optim. 26(1), 661–680 (2016)
18. Ye, Y., Zhang, S.: New results on quadratic minimization. SIAM J. Optim. 14(1),
245–267 (2003)
19. Yuan, J., Wang, M., Ai, W., Shuai, T.: New results on narrowing the duality gap of
the extended Celis-Dennis-Tapia problem. SIAM J. Optim. 27(2), 890–909 (2017)
20. Yuan, Y.: On a subproblem of trust region algorithms for constrained optimization.
Math. Program. 47(1–3), 53–63 (1990)
On Conic Relaxations of Generalization
of the Extended Trust Region
Subproblem
1 Introduction
We consider the following quadratically constrained quadratic programming
(QCQP) problem,

$$(P_0)\qquad \min\ \tfrac{1}{2}\, z^T C z + c^T z$$
$$\text{s.t.}\quad \tfrac{1}{2}\, z^T B z + b^T z + e \le 0, \qquad (1)$$
$$\qquad\quad A^T z \le d,$$
where C and B are n × n symmetric matrices, not necessarily positive semidefinite, A is an n × m matrix, c, b ∈ R^n, e ∈ R and d ∈ R^m. Problem (P_0) is nonconvex since both the quadratic objective and the quadratic constraint may be nonconvex.
Supported by Shanghai Sailing Program 18YF1401700, Natural Science Foundation
of China (NSFC) 11801087 and Hong Kong Research Grants Council under Grants
14213716 and 14202017.
c Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 145–154, 2020.
https://doi.org/10.1007/978-3-030-21803-4_15
146 R. Jiang and D. Li
where λmin (C) stands for the minimal eigenvalue of C and [a1 , . . . , am ] = A, and
showed its immediate application in robust least squares and a robust SOCP
model problem. Hsia and Sheu [12] derived a more general sufficient condition.
After that, using KKT conditions of the SDP relaxation (in fact, an equiva-
lent SOCP relaxation) of the ETRS, Locatelli [18] presented a better sufficient
condition than [12], which corresponds to the solution conditions of a specific
linear system. Meanwhile, Ho-Nguyen and Kilinc-Karzan [11] also developed a
sufficient condition by identifying the feasibility of a linear system. In fact, the
two conditions in [11,12] are equivalent for the ETRS as stated in [11].
In this paper, we mainly focus on a generalization of ETRS (GETRS), which
replaces the unit ball constraint in ETRS with a general, possibly nonconvex,
quadratic constraint. To the best of our knowledge, the equivalence between the GETRS and its SDP relaxation has not yet been studied in the literature. Our study of this equivalence is motivated not only by the wide applications of the GETRS, but also by its theoretical implications for a more general class of QCQP problems. The GETRS is much more difficult than the ETRS: the feasible region of the GETRS is no longer compact, the optimal solution may be unattainable in some cases, and the null space of C + uB in the GETRS is more complicated than that in the ETRS, where u is the corresponding KKT multiplier of constraint (1). To introduce our investigation of sufficient conditions under which the SDP relaxation is exact, we first define the set I_PSD = {λ : C + λB ⪰ 0}, which is in fact an interval [19]. Define I⁺_PSD = I_PSD ∩ R₊, where R₊ is the set of nonnegative reals. We then focus on the condition that the set I⁺_PSD has a nonempty interior. We mainly show that under this condition the SDP relaxation is equivalent to an SOCP reformulation.
We then derive sufficient conditions under which the SDP relaxation of problem
(P) is tight.
Notation. For any index set J, we define A_J as the restriction of matrix A to the rows indexed by J and v_J as the restriction of vector v to the entries indexed by J. We denote by J^C the complement of J. The notation ‖v‖ denotes the Euclidean norm of vector v. We use Diag(A) and diag(a) to denote the vector formed by the diagonal entries of matrix A and the diagonal matrix formed by vector a, respectively, and v(·) represents the optimal value of problem (·). We use Null(A) to denote the null space of matrix A.
2 Optimality Conditions
In this section, to simplify our problem, we consider the case where the Slater condition of the SDP relaxation holds, and we further show a sufficient exactness condition for the SDP relaxation when I⁺_PSD has a nonempty interior. This case is also known as the regular condition in the study of the GTRS [19,26]. In fact, int(I⁺_PSD) ≠ ∅ implies that the two matrices C and B are SD [28], that is, there exists a nonsingular matrix U such that U^T C U and U^T B U are both diagonal. Problem (P_0) can then be reformulated, via the change of variables z = Ux, as follows,
$$(P)\qquad \min\ \sum_{i=1}^n \tfrac{1}{2}\,\delta_i x_i^2 + \sum_{i=1}^n \varepsilon_i x_i$$
$$\text{s.t.}\quad \sum_{i=1}^n \tfrac{1}{2}\,\alpha_i x_i^2 + \sum_{i=1}^n \beta_i x_i + e \le 0,$$
$$\qquad\quad \bar{A}^T x \le d,$$
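The SD reformulation can be carried out numerically: when C + uB is positive definite for some u, a Cholesky factor of C + uB followed by a symmetric eigendecomposition yields the congruence U. A minimal sketch with hypothetical toy data (not taken from the paper):

```python
import numpy as np
from scipy.linalg import cholesky, eigh

def simultaneous_diag(C, B, u):
    """Build a nonsingular U with U.T @ C @ U and U.T @ B @ U both diagonal,
    assuming C + u*B is positive definite (i.e., u lies in int(I_PSD^+))."""
    L = cholesky(C + u * B, lower=True)   # C + u*B = L @ L.T
    Linv = np.linalg.inv(L)
    w, Q = eigh(Linv @ B @ Linv.T)        # orthogonally diagonalize L^{-1} B L^{-T}
    return Linv.T @ Q                     # U^T (C+uB) U = I, U^T B U = diag(w)

# toy data (hypothetical): C is indefinite, B is the identity
C = np.array([[1.0, 2.0], [2.0, -1.0]])
B = np.eye(2)
u = 3.0                                   # C + 3B is positive definite here
U = simultaneous_diag(C, B, u)
D_C = U.T @ C @ U                          # diagonal
D_B = U.T @ B @ U                          # diagonal
```

By construction U^T (C + uB) U is the identity, so both congruences come out diagonal simultaneously.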
The equivalence of (SOCP) and (SDP) is obvious and thus we only need to focus
on identifying the exactness of (SOCP).
It is well known that, under the Slater condition, any optimal solution of a convex problem must satisfy the KKT conditions [4]. This fact enables us to find sufficient conditions that guarantee the exactness of the SDP relaxation. Let us denote the jth column of matrix Ā by ā^j. Then the KKT conditions of the convex problem (SOCP) are given as follows:
$$\tfrac{1}{2}(\delta_i + u\alpha_i) - w_i = 0, \quad i = 1, \ldots, n,$$
$$\varepsilon_i + u\beta_i + \textstyle\sum_{j=1}^m v_j \bar{a}^j_i + 2 w_i x_i = 0, \quad i = 1, \ldots, n,$$
$$\textstyle\sum_{i=1}^n \tfrac{1}{2}\alpha_i y_i + \sum_{i=1}^n \beta_i x_i + e \le 0,$$
$$(\bar{a}^j)^T x \le d_j, \quad j = 1, \ldots, m,$$
$$x_i^2 \le y_i, \quad i = 1, \ldots, n, \qquad (2)$$
$$u\Big(\textstyle\sum_{i=1}^n \tfrac{1}{2}\alpha_i y_i + \sum_{i=1}^n \beta_i x_i + e\Big) = 0,$$
$$v_j\big((\bar{a}^j)^T x - d_j\big) = 0, \quad j = 1, \ldots, m,$$
$$w_i(x_i^2 - y_i) = 0, \quad i = 1, \ldots, n,$$
$$u, v_j, w_i \ge 0, \quad j = 1, \ldots, m, \ i = 1, \ldots, n,$$

where u is the KKT multiplier of the constraint Σ_{i=1}^n (1/2)α_i y_i + Σ_{i=1}^n β_i x_i + e ≤ 0, v_j is the KKT multiplier of the constraint (ā^j)^T x ≤ d_j, j = 1, …, m, and w_i is the KKT multiplier of the constraint x_i² ≤ y_i, i = 1, …, n.
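As an illustration, the KKT system (2) can be checked numerically by evaluating its residuals at a candidate point. The sketch below uses a tiny hypothetical instance (n = 1, no linear constraints); the stationarity condition in x is written with the term 2·w_i·x_i, the scaling obtained by differentiating w_i(x_i² − y_i), which is an assumption about the multiplier normalization:

```python
import numpy as np

def kkt_residual(delta, alpha, eps, beta, e, Abar, d, x, y, u, v, w):
    """Max violation over system (2); the x-stationarity uses the 2*w_i*x_i
    term obtained by differentiating the multiplier term w_i*(x_i^2 - y_i)."""
    r_y = (delta + u * alpha) / 2.0 - w            # stationarity in y_i
    r_x = eps + u * beta + Abar @ v + 2.0 * w * x  # stationarity in x_i
    quad = 0.5 * alpha @ y + beta @ x + e          # quadratic constraint value
    lin = Abar.T @ x - d                           # linear constraint values
    return max(
        np.abs(r_y).max(), np.abs(r_x).max(),
        max(quad, 0.0), np.max(lin, initial=0.0),
        np.maximum(x**2 - y, 0.0).max(),
        abs(u * quad), np.max(np.abs(v * lin), initial=0.0),
        np.abs(w * (x**2 - y)).max(),
    )

# toy instance (n = 1, m = 0): min y - 2x  s.t.  y <= 1/2, x^2 <= y
res = kkt_residual(np.array([2.0]), np.array([2.0]), np.array([-2.0]),
                   np.array([0.0]), -0.5, np.zeros((1, 0)), np.zeros(0),
                   x=np.array([np.sqrt(0.5)]), y=np.array([0.5]),
                   u=np.sqrt(2.0) - 1.0, v=np.zeros(0), w=np.array([np.sqrt(2.0)]))
```

For this instance the optimum sits on both constraints, and the listed multipliers make every residual vanish.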
The following lemma shows that the SDP relaxation is always bounded from below and its optimal value is attainable if int(I⁺_PSD) ≠ ∅ and problem (P) is feasible, which is weaker than the Slater condition of the original problem (P).

Lemma 1. If int(I⁺_PSD) ≠ ∅ and problem (P) is feasible, then the SDP relaxation of (P) is bounded from below and the optimal value is attainable.
Proof. Consider the following Lagrangian dual problem of (P) ([24,25]), which is also the conic dual problem of (SDP),

$$(L)\qquad \max_{u, v, \tau}\ -\tau/2 + ue - d^T v$$
$$\text{s.t.}\quad M := \begin{pmatrix} C + uB & \varepsilon + u\beta + \bar{A}v \\ (\varepsilon + u\beta + \bar{A}v)^T & \tau \end{pmatrix} \succeq 0,$$
$$\qquad\quad u \ge 0, \ v \ge 0.$$
Since int(I⁺_PSD) ≠ ∅, we can always find some (v, τ) such that the matrix M is positive semidefinite for any u ∈ int(I⁺_PSD). In fact, for any u ∈ int(I⁺_PSD) we have C + uB ≻ 0, and thus there exists τ ≥ 0 such that M ⪰ 0 for every v ≥ 0, e.g., τ = (ε + uβ + Āv)^T (C + uB)^{-1} (ε + uβ + Āv) + 1. This means that (τ, u, v) satisfies the Slater condition for problem (L). As the Slater condition of problem (SDP) implies its feasibility, we have v(SDP) < +∞, and problem (L) is bounded from above due to weak duality, i.e., v(L) ≤ v(SDP). Hence, by strong duality, the optimal value of the SDP relaxation equals that of problem (L), and the optimal value is attainable [4].
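The Slater-point construction in the proof is easy to verify numerically: for a positive definite H standing in for C + uB, the proposed τ makes the bordered matrix M positive definite by the Schur complement. A sketch with toy data (not from the paper):

```python
import numpy as np

# H stands for C + u*B with u in int(I_PSD^+); q stands for eps + u*beta + Abar @ v
H = np.array([[4.0, 2.0], [2.0, 2.0]])   # positive definite toy matrix
q = np.array([1.0, -1.0])
tau = q @ np.linalg.solve(H, q) + 1.0    # the tau proposed in the proof
M = np.block([[H, q[:, None]], [q[None, :], np.array([[tau]])]])
lam_min = np.linalg.eigvalsh(M).min()    # positive: M is positive definite
```

Since τ exceeds the Schur-complement value q^T H^{-1} q by 1, the smallest eigenvalue of M is strictly positive.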
we have Ā_J = [ā¹_J, …, ā^m_J], where the superscript denotes the column index. We next show a sufficient condition, which generalizes the result in [18], that guarantees the exactness of the SDP relaxation.

Condition 2. The interior of I⁺_PSD is nonempty. For any u ∈ ∂I⁺_PSD, if J ≠ ∅, then {v : ε_J + uβ_J + Ā_J v = 0} ∩ R^m₊ = ∅.
Theorem 3. Assume that the Slater condition holds for problem (SOCP). If Condition 2 holds, then the SDP relaxation is exact and the optimal values of both the SDP relaxation and problem (P) are attainable.
Proof. From Lemma 1, we obtain that (SOCP) is bounded from below and its optimal solution is attainable. Then, due to the Slater condition, every optimal solution of (SOCP) must satisfy the KKT system (2). We thus have the following two cases:
1. If u ∈ ∂I⁺_PSD, then either J = ∅ or J ≠ ∅. In the first case, (1/2)(δ_i + uα_i) − w_i = 0 implies that w_i = (1/2)(δ_i + uα_i) > 0. This, together with the complementary slackness w_i(x_i² − y_i) = 0, implies that x_i² = y_i, i.e., (SOCP) is already exact. In the latter case, the KKT condition (1/2)(δ_i + uα_i) − w_i = 0, i = 1, …, n, implies that w_i = 0 for all i ∈ J. But Condition 2 shows {v : ε_J + uβ_J + Ā_J v = 0} ∩ R^m₊ = ∅, i.e., there is no KKT solution satisfying the second equation in (2) in this case.
2. Otherwise, u ∈ int(I⁺_PSD) and w_i = (1/2)(δ_i + uα_i) > 0 for all u ∈ int(I⁺_PSD). By the complementary slackness w_i(x_i² − y_i) = 0, we have x_i² − y_i = 0 for all i = 1, …, n, and thus the SOCP relaxation is exact.
Let us consider now the following illustrative example. In this problem, I⁺_PSD = [1, 2] is an interval and ∂I⁺_PSD = {1, 2}. One may check that Condition 2 is satisfied. The optimal value of the SDP relaxation is −2.44082 with x = (−1.2907, −0.8161)^T and

$$X = \begin{pmatrix} 1.6660 & 1.0534 \\ 1.0534 & 0.6660 \end{pmatrix}.$$

It is easy to verify that X = xx^T and the SDP relaxation is exact.
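The rank-one claim for the reported solution can be checked directly from the printed four-digit values:

```python
import numpy as np

# Values copied from the text's reported SDP solution
x = np.array([-1.2907, -0.8161])
X = np.array([[1.6660, 1.0534],
              [1.0534, 0.6660]])
gap = np.abs(X - np.outer(x, x)).max()   # entrywise distance from rank one
```

The entrywise gap is of order 10⁻⁴, i.e., within rounding of the printed digits, and the smallest eigenvalue of X is likewise numerically zero.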
Motivated by the perturbation condition in Theorem 3.1 in [18], we propose the following condition to extend Condition 2.

Condition 4. The interior of I⁺_PSD is nonempty. For any u ∈ ∂I⁺_PSD, if J ≠ ∅, then for all ϵ > 0 there exists η ∈ R^J such that ‖η‖ ≤ ϵ and {v : ε_J + η + uβ_J + Ā_J v = 0} ∩ R^m₊ = ∅.
Condition 4 also guarantees the exactness of the SDP relaxation under the same mild assumptions as in Theorem 3.

Theorem 5. Assume that the Slater condition holds for problem (SOCP). If Condition 4 holds, the SDP relaxation is exact and the optimal values of both the SDP relaxation and problem (P) are attainable.
Remark 6. When B reduces to the identity matrix, problem (P) reduces to the ETRS, and an exactness condition is given in [18]. The difficulty in our proof, compared to the results in [18], mainly comes from the possible non-compactness of the feasible region.
In the above example, I⁺_PSD = [1, 2] is an interval and ∂I⁺_PSD = {1, 2}. It is easy to verify that Condition 2 is not fulfilled, but Condition 4 is fulfilled for any ϵ > 0 and η = t(1, 1)^T, where t ∈ R and |t| ≤ √2 ϵ/2. The optimal value of the SDP relaxation is −1 with x = (−1, 0)^T and

$$X = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.$$

So we have X = xx^T and the SDP relaxation is exact.
In fact, Condition 2 holds if and only if the following linear programming problem has no solution,

$$(LP)\qquad \min_v\ 0 \quad \text{s.t.}\ \ \varepsilon_J + u\beta_J + \bar{A}_J v = 0, \ \ v \ge 0.$$
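Since (LP) is a pure feasibility problem, Condition 2 can be tested at a given boundary point u with any LP solver. A sketch using `scipy.optimize.linprog`; the data and the boundary value u = 1 below are made up for illustration:

```python
import numpy as np
from scipy.optimize import linprog

def condition2_holds_at(AbarJ, epsJ, betaJ, u):
    """Condition 2 at a boundary point u: the feasibility system
    { v >= 0 : AbarJ @ v = -(epsJ + u*betaJ) } must have no solution."""
    res = linprog(np.zeros(AbarJ.shape[1]), A_eq=AbarJ,
                  b_eq=-(epsJ + u * betaJ),
                  bounds=[(0, None)] * AbarJ.shape[1])
    return res.status == 2          # status 2 means the LP is proven infeasible

# hypothetical data: |J| = 2 rows, m = 1 column, boundary point u = 1
AbarJ = np.array([[1.0], [1.0]])
holds = condition2_holds_at(AbarJ, np.array([1.0, -1.0]), np.zeros(2), 1.0)  # infeasible
fails = condition2_holds_at(AbarJ, np.array([-1.0, -1.0]), np.zeros(2), 1.0) # v = 1 works
```

In the first call the two equations force v = −1 and v = 1 simultaneously, so the system is infeasible and Condition 2 holds at that u; in the second, v = 1 is feasible and the condition fails.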
Its dual problem is given by

$$(LD)\qquad \max_y\ -(\varepsilon_J + u\beta_J)^T y \quad \text{s.t.}\ \ \bar{A}_J^T y \le 0.$$
Proof. The first statement follows directly from the infeasibility of (LP) and strong duality. Condition 4 is equivalent to: for all ϵ > 0, there exists η ∈ R^J with ‖η‖ ≤ ϵ such that {v : ε_J + η + uβ_J + Ā_J v = 0} ∩ R^m₊ = ∅, i.e., the perturbed linear program

$$(LP')\qquad \min_v\ 0 \quad \text{s.t.}\ \ \varepsilon_J + \eta + u\beta_J + \bar{A}_J v = 0, \ \ v \ge 0$$

has no solution, whose dual problem is

$$(LD')\qquad \max_y\ -(\varepsilon_J + \eta + u\beta_J)^T y \quad \text{s.t.}\ \ \bar{A}_J^T y \le 0.$$
The above lemma shows that Condition 2 holds if and only if (LD) is unbounded from above, which is equivalent to the existence of a nonzero ȳ such that Ā_J^T ȳ ≤ 0 and −(ε_J + uβ_J)^T ȳ > 0; indeed, defining ỹ = kȳ, we have Ā_J^T ỹ ≤ 0 and −(ε_J + uβ_J)^T ỹ → ∞ as k → ∞. On the other hand, when Condition 2 fails, Condition 4 holds if and only if there exists a nonzero ȳ such that Ā_J^T ȳ ≤ 0 and −(ε_J + uβ_J)^T ȳ = 0. The above two statements can be combined as: there exists a nonzero ȳ such that Ā_J^T ȳ ≤ 0 and (ε_J + uβ_J)^T ȳ ≤ 0. Define θ ∈ R^n with θ_{J^C} = 0 and θ_J = ȳ. Then we have Ā^T θ = Ā_J^T ȳ ≤ 0 and (ε + uβ)^T θ = (ε_J + uβ_J)^T ȳ ≤ 0, which is equivalent, by defining z = Uθ, to the existence of a nonzero z ∈ R^n such that (C + uB)z = 0, A^T z ≤ 0 and (c + ub)^T z ≤ 0. (Note that U is the congruence matrix such that U^T C U = diag(δ) and U^T B U = diag(α), as mentioned at the beginning of this section.) The above implication suggests that Conditions 2 and 4 can be combined into the following condition.
Condition 8. For any u ∈ ∂I⁺_PSD, if Null(C + uB) ≠ {0}, then there exists a nonzero z ∈ R^n such that (C + uB)z = 0, A^T z ≤ 0 and (c + ub)^T z ≤ 0.
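Condition 8 can be checked directly from the original data: compute a basis of Null(C + uB) and search for a nonzero null-space element satisfying the two homogeneous linear inequalities. Because the system is homogeneous, one can normalize by fixing a null-space coordinate to ±1 and solving small feasibility LPs. A sketch on hypothetical data, with the linear-term inequality written as (c + ub)^T z ≤ 0 (in the ETRS special case this reduces to c^T z ≤ 0, matching Condition 2.1 in [11]):

```python
import numpy as np
from scipy.linalg import null_space
from scipy.optimize import linprog

def condition8_witness(C, B, A, c, b, u):
    """Look for a nonzero z with (C + u*B) z = 0, A^T z <= 0 and
    (c + u*b)^T z <= 0, via normalized feasibility LPs."""
    N = null_space(C + u * B)                  # basis of Null(C + u*B)
    k = N.shape[1]
    if k == 0:
        return None                            # Null(C + u*B) = {0}
    G = np.vstack([A.T @ N, (c + u * b) @ N])  # inequalities G @ w <= 0
    for i in range(k):
        for s in (1.0, -1.0):
            A_eq = np.zeros((1, k)); A_eq[0, i] = 1.0
            res = linprog(np.zeros(k), A_ub=G, b_ub=np.zeros(len(G)),
                          A_eq=A_eq, b_eq=[s], bounds=[(-1, 1)] * k)
            if res.status == 0:
                return N @ res.x               # a nonzero witness z
    return None

# toy data (hypothetical): C + B is singular with null space spanned by e_2
C = np.array([[1.0, 0.0], [0.0, -1.0]])
B = np.array([[0.0, 0.0], [0.0, 1.0]])
A = np.array([[0.0], [1.0]])                   # single linear constraint column
c = np.array([0.0, 0.0]); b = np.array([0.0, 1.0])
z = condition8_witness(C, B, A, c, b, u=1.0)
```

Any nonzero element of the cone can be rescaled so that its largest null-space coordinate is ±1, so enumerating the fixed coordinate and sign is exhaustive.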
We can then summarise our main result under the condition int(I⁺_PSD) ≠ ∅ in the following theorem.

Theorem 9. When int(I⁺_PSD) ≠ ∅ and Condition 8 holds, the SDP relaxation is exact. Moreover, if the Slater condition holds, both problem (P) and its SDP relaxation are bounded from below and the optimal values are attainable.
An advantage of Condition 8 is that it can be checked directly with the original data, i.e., we do not need to invoke the congruence transformation to obtain the SD form. In particular, when the quadratic constraint reduces to the unit ball constraint, problem (P) reduces to the ETRS, and Condition 8 reduces to Condition 2.1 in [11], i.e., there exists a nonzero vector z such that (C + λ_min(C)I)z = 0, A^T z ≤ 0 and c^T z ≤ 0; Conditions 2 and 4 reduce to (13) and (14) in [18]. As a result, for problem (BP), Condition 2.1 in [11] is equivalent to (13) and (14) in [18], which was also indicated in [11].
Together with the fulfillment of the Slater condition, we further have the following S-lemma with linear inequalities.

Theorem 10 (S-lemma with linear inequalities). Assume that there exists (X, x) such that (1/2) B • X + b^T x + e ≤ 0, A^T x ≤ d and X ⪰ xx^T, that int(I⁺_PSD) ≠ ∅, and that Condition 8 holds. Then the following two statements are equivalent:

(i) (1/2) x^T Bx + b^T x + e ≤ 0 and A^T x ≤ d ⇒ (1/2) x^T Cx + c^T x + γ ≥ 0.
(ii) ∃ u, v_1, …, v_m ≥ 0 such that for all x ∈ R^n, (1/2) x^T Cx + c^T x + γ + u((1/2) x^T Bx + b^T x + e) + v^T(A^T x − d) ≥ 0.
Proof. It is obvious that (ii) ⇒ (i). Next let us prove (i) ⇒ (ii). From Theorem 9, we obtain that the SDP relaxation is bounded from below. So the SDP relaxation is equivalent to the Lagrangian dual of problem (P) [24]. Hence,

$$\max_{u \ge 0, v \ge 0} \min_x \ L(x, u, v) := \tfrac{1}{2} x^T C x + c^T x + u\big(\tfrac{1}{2} x^T B x + b^T x + e\big) + v^T (A^T x - d)$$
$$= v(\mathrm{SDP}) = v(\mathrm{P}) = \min_x \big\{ \tfrac{1}{2} x^T C x + c^T x : \tfrac{1}{2} x^T B x + b^T x + e \le 0, \ A^T x \le d \big\}.$$
Remark 11. The classical S-lemma, which was first proposed by Yakubovich [29], and its variants have many real-world applications; see the survey paper [22]. To the best of our knowledge, our S-lemma is the most general one with linear constraints, while the S-lemma in Jeyakumar and Li [13] is confined to a unit ball constraint.
3 Conclusions
References
1. Ben-Tal, A., El Ghaoui, L., Nemirovski, A.: Robust optimization. Princeton Uni-
versity Press (2009)
2. Ben-Tal, A., den Hertog, D.: Hidden conic quadratic representation of some non-
convex quadratic optimization problems. Math. Program. 143(1–2), 1–29 (2014)
3. Ben-Tal, A., Teboulle, M.: Hidden convexity in some nonconvex quadratically con-
strained quadratic programming. Math. Program. 72(1), 51–63 (1996)
4. Boyd, S., Vandenberghe, L.: Convex optimization. Cambridge University Press
(2004)
5. Burer, S., Anstreicher, K.M.: Second-order-cone constraints for extended trust-
region subproblems. SIAM J. Optim. 23(1), 432–451 (2013)
6. Burer, S., Yang, B.: The trust region subproblem with non-intersecting linear con-
straints. Math. Program. 149(1–2), 253–264 (2015)
7. Conn, A.R., Gould, N.I., Toint, P.L.: Trust Region Methods, vol. 1. Society for
Industrial and Applied Mathematics (SIAM), Philadelphia (2000)
8. Fallahi, S., Salahi, M., Karbasy, S.A.: On SOCP/SDP formulation of the extended
trust region subproblem (2018). arXiv:1807.07815
9. Feng, J.M., Lin, G.X., Sheu, R.L., Xia, Y.: Duality and solutions for quadratic
programming over single non-homogeneous quadratic constraint. J. Glob. Optim.
54(2), 275–293 (2012)
10. Hazan, E., Koren, T.: A linear-time algorithm for trust region problems. Math.
Program. 1–19 (2015)
11. Ho-Nguyen, N., Kilinc-Karzan, F.: A second-order cone based approach for solving
the trust-region subproblem and its variants. SIAM J. Optim. 27(3), 1485–1512
(2017)
12. Hsia, Y., Sheu, R.L.: Trust region subproblem with a fixed number of additional
linear inequality constraints has polynomial complexity (2013). arXiv:1312.1398
13. Jeyakumar, V., Li, G.: Trust-region problems with linear inequality constraints:
exact SDP relaxation, global optimality and robust optimization. Math. Program.
147(1–2), 171–206 (2014)
14. Jiang, R., Li, D.: Novel reformulations and efficient algorithm for the generalized
trust region subproblem (2017). arXiv:1707.08706
15. Jiang, R., Li, D.: A linear-time algorithm for generalized trust region problems
(2018). arXiv:1807.07563
16. Jiang, R., Li, D.: Exactness conditions for SDP/SOCP relaxations of generalization
of the extended trust region subproblem. Working paper (2019)
17. Jiang, R., Li, D., Wu, B.: SOCP reformulation for the generalized trust region sub-
problem via a canonical form of two symmetric matrices. Math. Program. 169(2),
531–563 (2018)
18. Locatelli, M.: Exactness conditions for an SDP relaxation of the extended trust
region problem. Optim. Lett. 10(6), 1141–1151 (2016)
19. Moré, J.J.: Generalizations of the trust region problem. Optim. Methods Softw.
2(3–4), 189–209 (1993)
20. Moré, J.J., Sorensen, D.C.: Computing a trust region step. SIAM J. Sci. Stat.
Comput. 4(3), 553–572 (1983)
21. Pardalos, P.M.: Global optimization algorithms for linearly constrained indefinite
quadratic problems. Comput. Math. Appl. 21(6), 87–97 (1991)
22. Pólik, I., Terlaky, T.: A survey of the S-lemma. SIAM Rev. 49(3), 371–418 (2007)
23. Rendl, F., Wolkowicz, H.: A semidefinite framework for trust region subproblems
with applications to large scale minimization. Math. Program. 77(1), 273–299
(1997)
24. Shor, N.Z.: Quadratic optimization problems. Sov. J. Comput. Syst. Sci. 25(6),
1–11 (1987)
25. Shor, N.: Dual quadratic estimates in polynomial and boolean programming. Ann.
Oper. Res. 25(1), 163–168 (1990)
26. Stern, R.J., Wolkowicz, H.: Indefinite trust region subproblems and nonsymmetric
eigenvalue perturbations. SIAM J. Optim. 5(2), 286–313 (1995)
27. Sturm, J.F., Zhang, S.: On cones of nonnegative quadratic functions. Math. Oper.
Res. 28(2), 246–267 (2003)
28. Uhlig, F.: Definite and semidefinite matrices in a real symmetric matrix pencil.
Pac. J. Math. 49(2), 561–568 (1973)
29. Yakubovich, V.A.: S-procedure in nonlinear control theory. Vestnik Leningrad Uni-
versity, vol. 1, pp. 62–77 (1971)
30. Ye, Y., Zhang, S.: New results on quadratic minimization. SIAM J. Optim. 14(1),
245–267 (2003)
On Constrained Optimization Problems
Solved Using the Canonical Duality
Theory
Constantin Zălinescu¹,²
¹ University "Al. I. Cuza" Iasi, Bd. Carol I 11, Iasi, Romania
zalinesc@uaic.ro
² Octav Mayer Institute of Mathematics, Bd. Carol I 8, Iasi, Romania
Abstract. D.Y. Gao, together with some of his collaborators, applied his canonical duality theory (CDT) to solving a class of constrained optimization problems. Unfortunately, several papers on this subject contain unclear statements, unconvincing proofs, or even false results. Our aim in this work is to study rigorously this class of constrained optimization problems in finite-dimensional spaces and to point out several false results published in the last ten years.
1 Preliminaries
We consider the following constrained minimization problem

$$(P_J)\qquad \min f(x) \ \text{ s.t. } \ x \in X_J,$$

where J ⊂ \overline{1,m},

$$X_J := \{x \in R^n \mid [\forall j \in J : g_j(x) = 0] \wedge [\forall j \in J^c : g_j(x) \le 0]\}$$

with J^c := \overline{1,m} \setminus J, f := g_0 and

$$g_k(x) := q_k(x) + V_k(\Lambda_k(x)) \quad (x \in R^n, \ k \in \overline{0,m}),$$
$$q_k(x) := \tfrac{1}{2}\langle x, A_k x\rangle - \langle b_k, x\rangle + c_k \ \wedge\ \Lambda_k(x) := \tfrac{1}{2}\langle x, C_k x\rangle - \langle d_k, x\rangle + e_k \quad (x \in R^n),$$
$$X := \{x \in R^n \mid \forall k \in \overline{0,m} : \Lambda_k(x) \in \operatorname{dom} V_k\} = \textstyle\bigcap_{k=0}^m \Lambda_k^{-1}(\operatorname{dom} V_k),$$
$$X_0 := \{x \in R^n \mid \forall k \in \overline{0,m} : \Lambda_k(x) \in \operatorname{int}(\operatorname{dom} V_k)\} \subset \operatorname{int} X;$$

$$\Xi(x, \lambda, \sigma) = \tfrac{1}{2}\langle x, G(\lambda, \sigma) x\rangle - \langle F(\lambda, \sigma), x\rangle + E(\lambda, \sigma) - \sum_{k=0}^m \lambda_k V_k^*(\sigma_k). \qquad (2)$$
and

$$\Gamma_J := \{\lambda \in R^m \mid \lambda_j \ge 0 \ \forall j \in J^c\} \supset R^m_+ := \{\lambda \in R^m \mid \lambda_j \ge 0 \ \forall j \in \overline{1,m}\},$$

respectively; clearly,

$$\Gamma_\emptyset = R^m_+, \qquad \Gamma_{\overline{1,m}} = R^m, \qquad \Gamma_{J \cap K} = \Gamma_J \cap \Gamma_K \quad \forall J, K \subset \overline{1,m}.$$

where

$$I_{J,Q} := \prod_{k=0}^m I_k^{**} \quad \text{with} \quad I_k^{**} := \begin{cases} \{0\} & \text{if } k \in J \cap Q, \\ I_k^* & \text{if } k \in \overline{0,m} \setminus (J \cap Q), \end{cases}$$

and

$$\sup_{(\lambda,\sigma) \in \Gamma_{J \cap Q} \times I_{J,Q}} \Xi(x, \lambda, \sigma) = \sup_{\lambda \in \Gamma_{J \cap Q}} L(x, \lambda) = \begin{cases} f(x) & \text{if } x \in X_{J \cap Q}, \\ \infty & \text{if } x \in X \setminus X_{J \cap Q}. \end{cases}$$
Taking into account (2), we have that Ξ(·, λ, σ) is [strictly] convex for (λ, σ) ∈ T⁺_col [(λ, σ) ∈ T⁺], and so

Observe that T ∩ (R^m × int I^*) ⊂ int T, and for any σ ∈ I^* the set {λ ∈ R^m | (λ, σ) ∈ T} is open. Similarly to the computation of ∂D(λ)/∂λ_j in [2, p. 5], using the expression of D(λ, σ) in (4), we get

$$\frac{\partial D(\lambda, \sigma)}{\partial \lambda_j} = q_j(\xi(\lambda, \sigma)) + \sigma_j \Lambda_j(\xi(\lambda, \sigma)) - V_j^*(\sigma_j) \quad \forall j \in \overline{1,m}, \ \forall (\lambda, \sigma) \in T,$$

and

$$\frac{\partial D(\lambda, \sigma)}{\partial \sigma_k} = \lambda_k \big[\Lambda_k(\xi(\lambda, \sigma)) - \nabla V_k^*(\sigma_k)\big] \quad \forall k \in \overline{0,m}, \ \forall (\lambda, \sigma) \in T \cap (R^m \times \operatorname{int} I^*).$$
Lemma 3. Let (λ, σ) ∈ (R^m × int I^*) ∩ T and set x := ξ(λ, σ). Then

$$\forall j \in J^c : \ \lambda_j \ge 0 \ \wedge\ \frac{\partial \Xi}{\partial \lambda_j}(x, \lambda, \sigma) \le 0 \ \wedge\ \lambda_j \frac{\partial \Xi}{\partial \lambda_j}(x, \lambda, \sigma) = 0,$$

or, equivalently,

$$x \in X_J \ \wedge\ \lambda \in \Gamma_J \ \wedge\ [\forall j \in J^c : \lambda_j g_j(x) = 0];$$
In the case in which J = ∅ we obtain the notions of KKT points for Ξ and D. So, (x, λ, σ) ∈ R^n × R^m × int I^* is a KKT point of Ξ if ∇_x Ξ(x, λ, σ) = 0, ∇_σ Ξ(x, λ, σ) = 0 and

$$\lambda \in R^m_+ \ \wedge\ \nabla_\lambda \Xi(x, \lambda, \sigma) \in R^m_- \ \wedge\ \langle \lambda, \nabla_\lambda \Xi(x, \lambda, \sigma)\rangle = 0, \qquad (6)$$

where R^m_- := {λ ∈ R^m | λ_j ≤ 0 ∀j ∈ \overline{1,m}}, and (λ, σ) ∈ R^m × int I^* is a KKT point of D if ∇_σ D(λ, σ) = 0 and

$$\lambda \in R^m_+ \ \wedge\ \nabla_\lambda D(\lambda, \sigma) \in R^m_- \ \wedge\ \langle \lambda, \nabla_\lambda D(\lambda, \sigma)\rangle = 0.$$

The definition of a KKT point for Ξ is suggested in the proof of [4, Th. 3]. Observe that (x, λ, σ) verifying the conditions in (6) is called a critical point of Ξ in [5, p. 477].
Corollary 1. Let (λ, σ) ∈ (R^m × int I^*) ∩ T.
(i) If x := ξ(λ, σ), then (x, λ, σ) is a J-LKKT point of Ξ if and only if (λ, σ) is a J-LKKT point of D.
(ii) If M_=(λ) = \overline{1,m}, then (x, λ, σ) is a J-LKKT point of Ξ if and only if (x, λ, σ) is a critical point of Ξ, if and only if x = ξ(λ, σ) and (λ, σ) is a critical point of D.
Remark 2. Taking into account Remark 1, as well as (3) and Lemma 3, the functions ∇_x Ξ, ξ, ∇_σ D do not depend on σ_k for k ∈ Q. Consequently, if (x, λ, σ) is a J-LKKT point of Ξ, then σ_k = 0 for k ∈ Q ∩ M_=(λ), and (x, λ, σ̃) is also a J-LKKT point of Ξ, where σ̃_k := 0 for k ∈ Q and σ̃_k := σ_k for k ∈ \overline{0,m} \setminus Q. Conversely, taking into account that ∇_σ D does not depend on σ_k for k ∈ Q, if (λ, σ) ∈ T is a J-LKKT point of D, then (λ, σ̃) is also a J-LKKT point of D, where σ̃_k := 0 for k ∈ Q and σ̃_k := σ_k for k ∈ \overline{0,m} \setminus Q.
Having in view the previous remark, without loss of generality, in the sequel (if not mentioned otherwise) we shall assume that σ_k = 0 for k ∈ Q whenever (x, λ, σ) ∈ R^n × R^m × int I^* is a J-LKKT point of Ξ, or (λ, σ) ∈ T is a J-LKKT point of D.
The main result of the paper is the next one; in it one can see the roles of the different hypotheses in obtaining the main conclusion, that is, the min-max duality formula provided by Eq. (7).
The remark below refers to the case Q = ∅. A similar remark (but a bit less
dramatic) is valid for Q0 = ∅.
The next example shows that the condition Q₀^c ⊂ M_=(λ) is essential for x to be a feasible solution of problem (P_J); moreover, it shows that, unlike the quadratic case (see [2, Prop. 9]), it is not possible to replace T^{J+}_{Q,col} by {(λ, σ) ∈ T_col | λ ∈ Γ_J, G(λ, σ) ⪰ 0} in (7). The problem is a particular case of the one considered in [7, Ex. 1], "which is very simple, but important in both theoretical study and real-world applications since the constraint is a so-called double-well function, the most commonly used nonconvex potential in physics and engineering sciences [7]";¹ more precisely, q := 1, c := 6, d := 4, e := 2.
However, applying Proposition 2 we get assertion (i) and the last part of
assertion (ii) of [2, Prop. 9].
We have to remark that all papers of D.Y. Gao on constrained optimization problems in which CDT is used contain a result stating the "complementary-dual principle" and at least one result stating a min-max duality formula. However, we did not find a convincing proof of that min-max duality formula in these papers. We mention below some results which are not true.
The problem considered by Gao, Ruan and Sherali in [5] is of type (P_i). Theorem 2 (Global Optimality Condition) of [5] is false because, under the mentioned conditions, x is not necessarily in X_i, as Example 1 shows. Also Theorem 1 (Complementary-Dual Principle) and Theorem 3 (Triality Theory) are false because (λ, σ) = (0; (0, 14 + 8√3)) is a critical point of D (by Lemma 3), while the assertion "x is a KKT point of (P)" is not true because x = 6 ∉ X_i. It is
References
1. Rockafellar, R. T.: Convex Analysis. Princeton University Press, Princeton, N.J.
(1972)
2. Zălinescu, C.: On quadratic optimization problems and canonical duality theory.
arXiv:1809.09032 (2018)
3. Zălinescu, C.: On unconstrained optimization problems solved using CDT and tri-
ality theory. arXiv:1810.09009 (2018)
4. Ruan, N., Gao, D.Y.: Canonical duality theory for solving nonconvex/discrete con-
strained global optimization problems. In: Gao, D.Y., Latorre, V., Ruan, N. (eds.)
Canonical Duality Theory. Advances in Mechanics and Mathematics, vol. 37, pp.
187–201. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-58017-3_9
5. Gao, D.Y., Ruan, N., Sherali, H.: Solutions and optimality criteria for nonconvex
constrained global optimization problems with connections between canonical and
Lagrangian duality. J. Global Optim. 45, 473–497 (2009)
6. Voisei, M.-D., Zălinescu, C.: Counterexamples to some triality and tri-duality
results. J. Global Optim. 49, 173–183 (2011)
7. Latorre, V., Gao, D.Y.: Canonical duality for solving general nonconvex constrained
problems. Optim. Lett. 10, 1763–1779 (2016)
8. Morales-Silva, D., Gao, D.Y.: On minimal distance between two surfaces. In: Gao,
D.Y., Latorre, V., Ruan, N. (eds.) Canonical Duality Theory. Advances in Mechanics
and Mathematics, vol. 37, pp. 359–371. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-58017-3_18
9. Voisei, M.-D., Zălinescu, C.: A counter-example to ‘minimal distance between two
non-convex surfaces’. Optimization 60, 593–602 (2011)
On Controlled Variational Inequalities
Involving Convex Functionals
Savin Treanţă
1 Introduction
Convexity theory provides an important foundation for studying a wide class of apparently unrelated problems in a unified and general framework. Based on the notion of unique sharp minimizer introduced by Polyak [11], taking into account the works of Burke and Ferris [2] and Patriksson [10], and following Marcotte and Zhu [8], variational inequalities have been intensively investigated by using the concept of weak sharp solution. We mention, in this respect, the works of Wu and Wu [15], Oveisiha and Zafarani [9], Alshahrani et al. [1], Liu and Wu [6] and Zhu [16].
In this paper, motivated and inspired by the ongoing research in this area, we introduce and investigate a new class of scalar variational inequalities. More precisely, by using several variational techniques presented in Clarke [3], Treanţă [12,13] and Treanţă and Arana-Jiménez [14], we develop a new mathematical framework for controlled continuous-time variational inequalities governed by convex path-independent curvilinear integral functionals and, under some conditions and using a dual gap-type functional, we provide characterization results for the associated solution set. As is well known, functionals of mechanical work type, due to their physical meaning, are very important in applications. Thus, the importance of this paper is supported by both theoretical and practical reasoning. As well, the ideas and techniques of this paper may stimulate further research in this dynamic field.
Supported by University Politehnica of Bucharest, Bucharest, Romania (Grant No.
MA51-18-01).
c Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 164–174, 2020.
https://doi.org/10.1007/978-3-030-21803-4_17
For $x_\beta := \frac{\partial x}{\partial t^\beta}$, β = \overline{1,m}, let X be the space of piecewise smooth state functions x : Θ → R^n with the norm

$$\|x\| = \|x\|_\infty + \sum_{\beta=1}^m \|x_\beta\|_\infty, \quad \forall x \in X;$$
$$D_\zeta V_\beta^i = D_\beta V_\zeta^i, \quad \beta, \zeta = \overline{1,m}, \ \beta \ne \zeta, \ i = \overline{1,n},$$

are assumed.

Note. Further, in this paper, summation over repeated indices is assumed.

In the following, J¹(R^m, R^n) denotes the first-order jet bundle associated to R^m and R^n. For β = \overline{1,m}, we consider the real-valued continuously differentiable
Definition 2.1. The scalar functional L(x, u) is called convex on X × U if, for any (x, u), (x^0, u^0) ∈ X × U, the inequality

$$L(x, u) - L(x^0, u^0) \ge \int_\Upsilon \left[ \frac{\partial l_\beta}{\partial x}\big(t, x^0, x^0_\vartheta, u^0\big)(x - x^0) + \frac{\partial l_\beta}{\partial x_\vartheta}\big(t, x^0, x^0_\vartheta, u^0\big) D_\vartheta(x - x^0) \right] dt^\beta$$
$$+ \int_\Upsilon \frac{\partial l_\beta}{\partial u}\big(t, x^0, x^0_\vartheta, u^0\big)(u - u^0)\, dt^\beta$$

is satisfied.
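The defining inequality is the curvilinear-integral analogue of the usual gradient inequality for convex functions. In finite dimensions the same characterization reads L(p) − L(p⁰) ≥ ⟨∇L(p⁰), p − p⁰⟩, which the following sketch verifies on a toy convex quadratic (the functional and data here are illustrative only):

```python
import numpy as np

# Finite-dimensional analogue of Definition 2.1: a differentiable functional L
# is convex iff L(p) - L(p0) >= <grad L(p0), p - p0> for all p, p0.
L = lambda p: 0.5 * p @ p                 # convex quadratic toy functional
gradL = lambda p: p
rng = np.random.default_rng(0)
ok = all(L(p) - L(p0) >= gradL(p0) @ (p - p0) - 1e-9
         for p, p0 in ((rng.standard_normal(3), rng.standard_normal(3))
                       for _ in range(100)))
```

For this L the gap in the inequality equals ½‖p − p⁰‖², which is nonnegative, so the check passes for every sampled pair.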
Working assumptions. (i) In this work, it is assumed that the inner product
between the variational derivative of a scalar functional and an element (ψ, Ψ )
in X × U is accompanied by the condition ψ(t1 ) = ψ(t2 ) = 0.
(ii) Assume that

$$dU := \frac{\partial l_\beta}{\partial x_\vartheta}\, D_\vartheta(x - x^0)\, dt^\beta$$

is an exact total differential and satisfies U(t₁) = U(t₂).
At this point, we have the necessary mathematical tools to formulate the following controlled variational inequality problem: find (x^0, u^0) ∈ X × U such that

$$(CVIP)\quad \int_\Upsilon \left[ \frac{\partial l_\beta}{\partial x}\big(t, x^0, x^0_\vartheta, u^0\big)(x - x^0) + \frac{\partial l_\beta}{\partial x_\vartheta}\big(t, x^0, x^0_\vartheta, u^0\big) D_\vartheta(x - x^0) \right] dt^\beta + \int_\Upsilon \frac{\partial l_\beta}{\partial u}\big(t, x^0, x^0_\vartheta, u^0\big)(u - u^0)\, dt^\beta \ge 0,$$

for any (x, u) ∈ X × U. The dual controlled variational inequality problem associated to (CVIP) is formulated as follows: find (x^0, u^0) ∈ X × U such that

$$(DCVIP)\quad \int_\Upsilon \left[ \frac{\partial l_\beta}{\partial x}(t, x, x_\vartheta, u)(x - x^0) + \frac{\partial l_\beta}{\partial x_\vartheta}(t, x, x_\vartheta, u) D_\vartheta(x - x^0) \right] dt^\beta + \int_\Upsilon \frac{\partial l_\beta}{\partial u}(t, x, x_\vartheta, u)(u - u^0)\, dt^\beta \ge 0,$$

for any (x, u) ∈ X × U.
Denote by (X × U)^* and (X × U)_* the solution sets associated to (CVIP) and (DCVIP), respectively, and assume they are nonempty.
Remark 2.1. As can be easily seen (see (ii) in the working assumptions), we can reformulate the above controlled variational inequality problems as follows: find (x^0, u^0) ∈ X × U such that

$$(CVIP)\quad \left\langle \left( \frac{\delta_\beta L}{\delta x^0}, \frac{\delta_\beta L}{\delta u^0} \right); (x - x^0, u - u^0) \right\rangle \ge 0, \quad \forall (x, u) \in X \times U,$$

respectively: find (x^0, u^0) ∈ X × U such that

$$(DCVIP)\quad \left\langle \left( \frac{\delta_\beta L}{\delta x}, \frac{\delta_\beta L}{\delta u} \right); (x - x^0, u - u^0) \right\rangle \ge 0, \quad \forall (x, u) \in X \times U.$$
In the following, in order to describe the solution set (X × U)∗ associated
to (CV IP ), we introduce the following gap-type path-independent curvilinear
integral functionals.
Definition 2.3. For (x, u) ∈ X × U, the primal gap-type path-independent curvilinear integral functional associated to (CVIP) is defined as

$$S(x, u) = \max_{(x^0, u^0) \in X \times U} \left\{ \int_\Upsilon \left[ \frac{\partial l_\beta}{\partial x}(t, x, x_\vartheta, u)(x - x^0) + \frac{\partial l_\beta}{\partial x_\vartheta}(t, x, x_\vartheta, u) D_\vartheta(x - x^0) + \frac{\partial l_\beta}{\partial u}(t, x, x_\vartheta, u)(u - u^0) \right] dt^\beta \right\},$$

and the dual gap-type path-independent curvilinear integral functional associated to (CVIP) is defined as follows

$$R(x, u) = \max_{(x^0, u^0) \in X \times U} \left\{ \int_\Upsilon \left[ \frac{\partial l_\beta}{\partial x}\big(t, x^0, x^0_\vartheta, u^0\big)(x - x^0) + \frac{\partial l_\beta}{\partial x_\vartheta}\big(t, x^0, x^0_\vartheta, u^0\big) D_\vartheta(x - x^0) + \frac{\partial l_\beta}{\partial u}\big(t, x^0, x^0_\vartheta, u^0\big)(u - u^0) \right] dt^\beta \right\}.$$
For (x, u) ∈ X × U, we introduce the following notations:

$$A(x, u) := \left\{ (z, \nu) \in X \times U : S(x, u) = \int_\Upsilon \left[ \frac{\partial l_\beta}{\partial x}(t, x, x_\vartheta, u)(x - z) + \frac{\partial l_\beta}{\partial x_\vartheta}(t, x, x_\vartheta, u) D_\vartheta(x - z) + \frac{\partial l_\beta}{\partial u}(t, x, x_\vartheta, u)(u - \nu) \right] dt^\beta \right\},$$

$$Z(x, u) := \left\{ (z, \nu) \in X \times U : R(x, u) = \int_\Upsilon \left[ \frac{\partial l_\beta}{\partial x}(t, z, z_\vartheta, \nu)(x - z) + \frac{\partial l_\beta}{\partial x_\vartheta}(t, z, z_\vartheta, \nu) D_\vartheta(x - z) + \frac{\partial l_\beta}{\partial u}(t, z, z_\vartheta, \nu)(u - \nu) \right] dt^\beta \right\}.$$
In the following, in accordance with Marcotte and Zhu [8], we introduce some
central definitions.
3 Preliminary Results
In this section, in order to formulate and prove the main results of the paper,
several auxiliary propositions are established.
Proposition 3.1. Let the path-independent curvilinear integral functional L(x, u) be convex on X × U. Then:

(i) the equality

$$\int_\Upsilon \left[ \frac{\partial l_\beta}{\partial x}\big(t, x^2, x^2_\vartheta, u^2\big)(x^1 - x^2) + \frac{\partial l_\beta}{\partial x_\vartheta}\big(t, x^2, x^2_\vartheta, u^2\big) D_\vartheta(x^1 - x^2) \right] dt^\beta + \int_\Upsilon \frac{\partial l_\beta}{\partial u}\big(t, x^2, x^2_\vartheta, u^2\big)(u^1 - u^2)\, dt^\beta = 0$$

is fulfilled, for any (x¹, u¹), (x², u²) ∈ (X × U)^*;

(ii) (X × U)^* ⊂ (X × U)_*.
Remark 3.1. The property of continuity for the variational derivative δ_β L(x, u) implies (X × U)_* ⊂ (X × U)^*. By Proposition 3.1, we conclude (X × U)^* = (X × U)_*. Also, the solution set (X × U)_* associated to (DCVIP) is a convex set and, consequently, the solution set (X × U)^* associated to (CVIP) is a convex set.
Proposition 3.2. Let the path-independent curvilinear integral functional R(x, u) be differentiable on X × U. Then the inequality

$$\left\langle \left( \frac{\delta_\beta R}{\delta x}, \frac{\delta_\beta R}{\delta u} \right); (v, \mu) \right\rangle \ge \left\langle \left( \frac{\delta_\beta L}{\delta y}, \frac{\delta_\beta L}{\delta w} \right); (v, \mu) \right\rangle$$

is satisfied, for any (x, u), (v, μ) ∈ X × U and (y, w) ∈ Z(x, u).
Proposition 3.3 Let the path-independent curvilinear integral functional R(x, u) be differentiable on (X × U)* and the path-independent curvilinear integral functional L(x, u) be convex on X × U. Also, assume the implication
$$\Big\langle \Big( \frac{\delta_\beta R}{\delta x^*}, \frac{\delta_\beta R}{\delta u^*} \Big); (v, \mu) \Big\rangle \ge \Big\langle \Big( \frac{\delta_\beta L}{\delta z}, \frac{\delta_\beta L}{\delta \nu} \Big); (v, \mu) \Big\rangle \;\Longrightarrow\; \Big( \frac{\delta_\beta R}{\delta x^*}, \frac{\delta_\beta R}{\delta u^*} \Big) = \Big( \frac{\delta_\beta L}{\delta z}, \frac{\delta_\beta L}{\delta \nu} \Big)$$
is true for any (x*, u*) ∈ (X × U)*, (v, μ) ∈ X × U, (z, ν) ∈ Z(x*, u*). Then
Z(x*, u*) = (X × U)*, ∀(x*, u*) ∈ (X × U)*.
4 Main Results
In this section, taking into account the preliminary results established in the pre-
vious section, we investigate weak sharp solutions for the considered controlled
variational inequality governed by convex path-independent curvilinear integral
functional. Concretely, following Marcotte and Zhu [8], in accordance with Ferris and Mangasarian [4], the weak sharpness property of (X × U)* associated to
(CV IP ) is studied. In this regard, two characterization results are established.
170 S. Treanţă
Definition 4.1 The solution set (X × U)* associated to (CVIP) is called weakly sharp if there exists γ > 0 such that
$$\gamma B \subset \Big( \frac{\delta_\beta L}{\delta x^*}, \frac{\delta_\beta L}{\delta u^*} \Big) + \Big[ T_{X \times U}(x^*, u^*) \cap N_{(X \times U)^*}(x^*, u^*) \Big]^{\circ}, \quad \forall (x^*, u^*) \in (X \times U)^*,$$
(here int(Q) denotes the interior of the set Q and B the open unit ball in X × U), or, equivalently,
$$\Big( -\frac{\delta_\beta L}{\delta x^*}, -\frac{\delta_\beta L}{\delta u^*} \Big) \in \operatorname{int}\Big( \bigcap_{(x, u) \in (X \times U)^*} \big[ T_{X \times U}(x, u) \cap N_{(X \times U)^*}(x, u) \big]^{\circ} \Big),$$
is true, for any (x*, u*) ∈ (X × U)*, (v, μ) ∈ X × U, (z, ν) ∈ Z(x*, u*);
(b) $\Big( \dfrac{\delta_\beta L}{\delta x^*}, \dfrac{\delta_\beta L}{\delta u^*} \Big)$ is constant on (X × U)*.
Then (X × U)* is weakly sharp if and only if there exists γ > 0 such that
and, following Hiriart-Urruty and Lemaréchal [5], we obtain (x, u) − (ŷ, ŵ) ∈ T_{X×U}(ŷ, ŵ) ∩ N_{(X×U)*}(ŷ, ŵ). By hypothesis and Lemma 4.1, we get
$$\int_{\Upsilon} \Big[ \frac{\partial l_\beta}{\partial x}(t, \hat y, \hat y_\vartheta, \hat w)\,(x - \hat y) + \frac{\partial l_\beta}{\partial x_\vartheta}(t, \hat y, \hat y_\vartheta, \hat w)\, D_\vartheta(x - \hat y) + \frac{\partial l_\beta}{\partial u}(t, \hat y, \hat y_\vartheta, \hat w)\,(u - \hat w) \Big]\, dt^\beta \ge \gamma\, d\big((x, u), (X \times U)^*\big), \quad \forall (x, u) \in X \times U. \qquad (3)$$
Since
$$R(x, u) \ge \int_{\Upsilon} \Big[ \frac{\partial l_\beta}{\partial x}(t, \hat y, \hat y_\vartheta, \hat w)\,(x - \hat y) + \frac{\partial l_\beta}{\partial x_\vartheta}(t, \hat y, \hat y_\vartheta, \hat w)\, D_\vartheta(x - \hat y) + \frac{\partial l_\beta}{\partial u}(t, \hat y, \hat y_\vartheta, \hat w)\,(u - \hat w) \Big]\, dt^\beta, \quad \forall (x, u) \in X \times U,$$
by (3), we obtain R(x, u) ≥ γ d((x, u), (X × U)*), ∀(x, u) ∈ X × U.
"⇐=" Suppose there exists γ > 0 such that R(x, u) ≥ γ d((x, u), (X × U)*), ∀(x, u) ∈ X × U. Obviously, for any (y, w) ∈ (X × U)*, the case T_{X×U}(y, w) ∩ N_{(X×U)*}(y, w) = {(0, 0)} involves [T_{X×U}(y, w) ∩ N_{(X×U)*}(y, w)]° = X × U and, consequently,
$$\gamma B \subset \Big( \frac{\delta_\beta L}{\delta y}, \frac{\delta_\beta L}{\delta w} \Big) + \big[ T_{X \times U}(y, w) \cap N_{(X \times U)^*}(y, w) \big]^{\circ}, \quad \forall (y, w) \in (X \times U)^*$$
is trivial. In the following, let (0, 0) ≠ (x, u) ∈ T_{X×U}(y, w) ∩ N_{(X×U)*}(y, w); then there exists a sequence (x_k, u_k) converging to (x, u), with (y, w) + t_k(x_k, u_k) ∈ X × U for some sequence of positive numbers {t_k} decreasing to zero, such that
$$d\big( (y, w) + t_k (x_k, u_k),\, H_{x,u} \big) = \frac{t_k \,\big\langle (x, u); (x_k, u_k) \big\rangle}{\|(x, u)\|}, \qquad (4)$$
where H_{x,u} = {(x̄, ū) ∈ X × U : ⟨(x, u); (x̄, ū) − (y, w)⟩ = 0} is a hyperplane passing through (y, w) and orthogonal to (x, u). By hypothesis and (4), it results
$$R\big( (y, w) + t_k (x_k, u_k) \big) \ge \gamma\, \frac{t_k \,\big\langle (x, u); (x_k, u_k) \big\rangle}{\|(x, u)\|},$$
or, equivalently (R(y, w) = 0, ∀(y, w) ∈ (X × U)*),
Further, by taking the limit for k → ∞ in (5) and using a classical result of functional analysis, we obtain
$$\Big\langle \Big( \frac{\delta_\beta R}{\delta y}, \frac{\delta_\beta R}{\delta w} \Big); (x, u) \Big\rangle \ge \gamma \|(x, u)\|. \qquad (7)$$
Now, taking into account the hypothesis and (7), for any (b, υ) ∈ B, it follows
$$\Big\langle \gamma (b, \upsilon) - \Big( \frac{\delta_\beta L}{\delta y}, \frac{\delta_\beta L}{\delta w} \Big); (x, u) \Big\rangle = \big\langle \gamma (b, \upsilon); (x, u) \big\rangle - \Big\langle \Big( \frac{\delta_\beta R}{\delta y}, \frac{\delta_\beta R}{\delta w} \Big); (x, u) \Big\rangle \le \gamma \|(x, u)\| - \gamma \|(x, u)\| = 0,$$
and the proof is complete.
Theorem 4.2 Let the solution set (X × U)∗ associated to (CV IP ) be weakly
sharp and the path-independent curvilinear integral functional L(x, u) be convex
on X × U. Then (CVIP) satisfies the minimum principle sufficiency property.
Further, for
$$P(x, u) = \Big\langle \Big( \frac{\delta_\beta L}{\delta x^*}, \frac{\delta_\beta L}{\delta u^*} \Big); (x, u) \Big\rangle, \quad (x, u) \in X \times U,$$
we get that A(x*, u*) is the solution set of min_{(x,u)∈X×U} P(x, u). For other related investigations, the readers are directed to Mangasarian and Meyer [7]. We can write
$$P(x, u) - P(\tilde x, \tilde u) \ge \gamma\, d\big( (x, u), A(x^*, u^*) \big), \quad \forall (x, u) \in X \times U,\; (\tilde x, \tilde u) \in A(x^*, u^*),$$
or,
$$\Big\langle \Big( \frac{\delta_\beta L}{\delta x^*}, \frac{\delta_\beta L}{\delta u^*} \Big); (x, u) - (x^*, u^*) \Big\rangle \ge \gamma\, d\big( (x, u), (X \times U)^* \big), \quad \forall (x, u) \in X \times U,$$
or, equivalently,
$$\int_{\Upsilon} \Big[ \frac{\partial l_\beta}{\partial x}(t, x^*, x^*_\vartheta, u^*)\,(x - x^*) + \frac{\partial l_\beta}{\partial x_\vartheta}(t, x^*, x^*_\vartheta, u^*)\, D_\vartheta(x - x^*) + \frac{\partial l_\beta}{\partial u}(t, x^*, x^*_\vartheta, u^*)\,(u - u^*) \Big]\, dt^\beta \ge \gamma\, d\big( (x, u), (X \times U)^* \big), \quad \forall (x, u) \in X \times U. \qquad (9)$$
By (8), (9) and Theorem 4.1, we get that (X × U)* is weakly sharp.
“⇐=” This is a consequence of Theorem 4.2.
References
1. Alshahrani, M., Al-Homidan S., Ansari, Q.H.: Minimum and maximum principle
sufficiency properties for nonsmooth variational inequalities. Optim. Lett. 10, 805–
819 (2016)
2. Burke, J.V., Ferris, M.C.: Weak sharp minima in mathematical programming.
SIAM J. Control Optim. 31, 1340–1359 (1993)
3. Clarke, F.H.: Functional Analysis, Calculus of Variations and Optimal Control.
Springer, London (2013)
4. Ferris, M.C., Mangasarian, O.L.: Minimum principle sufficiency. Math. Program.
57, 1–14 (1992)
5. Hiriart-Urruty, J.-B., Lemaréchal, C.: Fundamentals of Convex Analysis. Springer,
Berlin (2001)
6. Liu, Y., Wu, Z.: Characterization of weakly sharp solutions of a variational inequal-
ity by its primal gap function. Optim. Lett. 10, 563–576 (2016)
7. Mangasarian, O.L., Meyer, R.R.: Nonlinear perturbation of linear programs. SIAM
J. Control Optim. 17, 745–752 (1979)
8. Marcotte, P., Zhu, D.: Weak sharp solutions of variational inequalities. SIAM J.
Optim. 9, 179–189 (1998)
9. Oveisiha, M., Zafarani, J.: Generalized Minty vector variational-like inequalities
and vector optimization problems in Asplund spaces. Optim. Lett. 7, 709–721
(2013)
10. Patriksson, M.: A unified framework of descent algorithms for nonlinear programs
and variational inequalities. Ph.D. thesis, Linköping Institute of Technology (1993)
11. Polyak, B.T.: Introduction to Optimization. Optimization Software. Publications
Division, New York (1987)
12. Treanţă, S.: Multiobjective fractional variational problem on higher-order jet bun-
dles. Commun. Math. Stat. 4, 323–340 (2016)
13. Treanţă, S.: Higher-order Hamilton dynamics and Hamilton-Jacobi divergence
PDE. Comput. Math. Appl. 75, 547–560 (2018)
1 Introduction
the classical fact that each proper lower semicontinuous convex function is the
upper envelope of a certain set of affine functions. In the present paper we use Φ-convexity to investigate duality for a wide class of nonconvex optimization problems.
The aim of this paper is to investigate Lagrangian duality for optimization
problems involving Φlsc -convex functions. The class of Φlsc -convex functions con-
sists of lower semicontinuous functions defined on Hilbert spaces and minorized
by quadratic functions. This class embodies many important classes of functions
appearing in optimization, e.g. prox-regular functions [3], also known as prox-
bounded functions [12], DC (difference of convex) functions [20], weakly convex
functions [21], para-convex functions [13] and lower semicontinuous convex (in
the classical sense) functions.
Our main Lagrangian duality result (Theorem 4) is based on a general minimax theorem for Φ-convex functions proved in [18]. An important ingredient of Theorem 4 is condition (3), which can be viewed as a regularity condition guaranteeing strong duality. Condition (3) appears to be weaker than many already known regularity conditions [4].
The organization of the paper is as follows. In Sect. 2 we recall basic facts on Φ-convexity. In Sect. 3 we define the subgradient for a particular class of Φ-convex functions, namely the class of Φlsc-convex functions. In Sect. 4 we formulate the Lagrangian duality theorem in the class of Φlsc-convex functions (Theorem 4). Condition (3) of Theorem 4 is a regularity condition ensuring that Lagrangian duality holds for optimization problems with Φlsc-convex functions. Let us note that Φlsc-convex functions as defined above may admit +∞ values, which allows us to also consider indicator functions in our framework.
2 Φ-Convexity
f ≤ g ⇔ f (x) ≤ g(x) ∀x ∈ X.
supp(f, Φ) := {ϕ ∈ Φ : ϕ ≤ f }
is called the support of f with respect to Φ. We will use the notation supp(f ) if
the class Φ is clear from the context.
The class of Φlsc-convex functions is very broad and contains many well-known classes of nonconvex functions appearing in optimization, e.g. prox-regular (also called prox-bounded) functions [3,12], DC functions [15,20], weakly convex functions [21], and para-convex functions [6,13].
Let us note that the set of all Φlsc -convex functions defined on Hilbert space
X contains all proper lower semicontinuous and convex (in the classical sense)
functions defined on X.
3 Subgradients
It can be shown that many subgradients, which appear in the literature, are Φlsc -
subgradients. Examples of such subgradients are: proximal subgradients [19],
subgradients for DC functions [1], for weakly convex functions [21], para-convex
functions [13] and classical subgradient for lower semicontinuous convex func-
tions.
178 E. M. Bednarczuk and M. Syga
where [ϕ < α] := {x ∈ X : ϕ(x) < α} is the strict lower level set of a function
ϕ : X → R. The general minimax theorem for Φ-convex functions is proved in
Theorem 3.3.3 of [18]. In the case Φ = Φlsc Theorem 3.3.3 of [18] can be rewritten
in the following way.
then
$$\sup_{y \in Y} \inf_{x \in X} a(x, y) = \inf_{x \in X} \sup_{y \in Y} a(x, y).$$
Remark 1. Let us note that, if $\inf_{x \in X} \sup_{y \in Y} a(x, y) = -\infty$, then the equality $\sup_{y \in Y} \inf_{x \in X} a(x, y) = \inf_{x \in X} \sup_{y \in Y} a(x, y)$ holds. If $\inf_{x \in X} \sup_{y \in Y} a(x, y) = +\infty$, then for the minimax equality to hold we need to assume that the assumption of Theorem 3 is true for every β < +∞.
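The weak inequality $\sup_y \inf_x a(x,y) \le \inf_x \sup_y a(x,y)$ always holds; minimax theorems of this type supply conditions under which it becomes an equality. A quick finite sanity check (the function and grid are illustrative examples, not from the paper):

```python
def sup_inf(a, X, Y):
    # sup over y of (inf over x of a(x, y))
    return max(min(a(x, y) for x in X) for y in Y)

def inf_sup(a, X, Y):
    # inf over x of (sup over y of a(x, y))
    return min(max(a(x, y) for y in Y) for x in X)

# bilinear a(x, y) = x * y on the finite grid {-1, 0, 1}:
# sup_inf <= inf_sup always; here the two values coincide (both 0)
grid = (-1.0, 0.0, 1.0)
a = lambda x, y: x * y
```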
$= p_x^{**}(y_0) = p(x, y_0)$.
The dual problem to (P) is defined as follows:
$$\sup_{(a, v) \in \mathbb{R}_+ \times Y^*} \; \inf_{x \in X} L(x, a, v). \qquad (D)$$
Let $\beta := \inf_{x \in X} \sup_{(a, v) \in \mathbb{R}_+ \times Y^*} L(x, a, v)$. The following theorem is based on the general minimax theorem.
then
$$\sup_{(a, v) \in \mathbb{R}_+ \times Y^*} \; \inf_{x \in X} L(x, a, v) = \inf_{x \in X} \; \sup_{(a, v) \in \mathbb{R}_+ \times Y^*} L(x, a, v)$$
and there exists (ā, v̄) ∈ R₊ × Y* such that $\inf_{x \in X} L(x, \bar a, \bar v) \ge \beta$.
It can be shown through numerous examples that condition (3) works in some cases where other relatively weak conditions are not satisfied. Moreover, condition (3) is close to being necessary for the conclusion of Theorem 4 to hold.
5 Conclusions
The most important ingredient of our work is condition (3). It would be interesting to derive an equivalent form of condition (3) in order to make it easier to check.
On Lagrange Duality of Nonconvex Optimization Problems 181
References
1. Bačák, M., Borwein, J.M.: On difference convexity of locally Lipschitz
functions. Optimization 60(8–9), 961–978 (2011). https://doi.org/10.1080/
02331931003770411
2. Bednarczuk, E.M., Syga, M.: On minimax theorems for lower semicontinuous func-
tions in Hilbert spaces. J. Convex Anal. 25(2), 389–402 (2018)
3. Bernard, F., Thibault, L.: Prox-regular functions in Hilbert spaces. J. Math. Anal.
Appl. 303(1), 1–14 (2005). https://doi.org/10.1016/j.jmaa.2004.06.003
4. Boţ, R.I., Csetnek, E.R.: Regularity conditions via generalized interiority notions
in convex optimization: new achievements and their relation to some classical state-
ments. Optimization 61(1), 35–65 (2012). https://doi.org/10.1080/02331934.2010.
505649
5. Boţ, R.I., Wanka, G.: Duality for composed convex functions with applications
in location theory, pp. 1–18. Deutscher Universitätsverlag, Wiesbaden (2003).
https://doi.org/10.1007/978-3-322-81539-2
6. Cannarsa, P., Sinestrari, C.: Semiconcave Functions, Hamilton-Jacobi Equations,
and Optimal Control, vol. 58. Birkhauser Boston, MA (2004)
7. Dolecki, S., Kurcyusz, S.: On φ-convexity in extremal problems. SIAM J. Control
Optim. 16, 277–300 (1978)
8. Harada, R., Kuroiwa, D.: Lagrange-type duality in DC programming. J. Math.
Anal. Appl. 418(1), 415–424 (2014). https://doi.org/10.1016/j.jmaa.2014.04.017
9. Hare, W., Poliquin, R.: The quadratic Sub-Lagrangian of a prox-regular func-
tion. Nonlinear Anal. 47, 1117–1128 (2001). https://doi.org/10.1016/S0362-
546X(01)00251-6
10. Martínez-Legaz, J.E., Volle, M.: Duality in D.C. programming: the case of several
D.C. constraints. J. Math. Anal. Appl. 237(2), 657–671 (1999). https://doi.org/
10.1006/jmaa.1999.6496
11. Pallaschke, D., Rolewicz, S.: Foundations of Mathematical Optimization. Kluwer
Academic (1997)
12. Rockafellar, R., Wets, R.J.B.: Variational Analysis. Springer, Berlin (1998)
13. Rolewicz, S.: Paraconvex analysis. Control Cybern. 34, 951–965 (2005)
14. Rubinov, A.M.: Abstract Convexity and Global Optimization. Kluwer Academic,
Dordrecht (2000)
15. Singer, I.: Duality for D.C. optimization problems, pp. 213–258. Springer, New
York (2006). https://doi.org/10.1007/0-387-28395-1
16. Sun, X., Long, X.J., Li, M.: Some characterizations of duality for DC optimization
with composite functions. Optimization 66(9), 1425–1443 (2017). https://doi.org/
10.1080/02331934.2017.1338289
17. Syga, M.: Minimax theorems for φ-convex functions: sufficient and necessary con-
ditions. Optimization 65(3), 635–649 (2016). https://doi.org/10.1080/02331934.
2015.1062010
18. Syga, M.: Minimax theorems for extended real-valued abstract convex-concave
functions. J. Optim. Theory Appl. 176(2), 306–318 (2018). https://doi.org/10.
1007/s10957-017-1210-4
19. Syga, M.: Minimax theorems via abstract subdifferential. preprint (2019)
20. Tuy, H.: D.C. Optimization: theory, methods and algorithms (1995). https://doi.
org/10.1007/978-1-4615-2025-2
21. Vial, J.P.: Strong and weak convexity of sets and functions. Math. Oper. Res. 8(2),
231–259 (1983). https://doi.org/10.1287/moor.8.2.231
On Monotone Maps: Semidifferentiable
Case
1 Introduction
Karamardian and Schaible [3] discussed the concepts of monotone and generalized monotone maps. In that paper they established the relationships between convex/generalized convex functions and the monotonicity/generalized monotonicity of their gradient maps. The theory of generalized monotone maps plays an important role in variational inequalities [5], complementarity problems [1], and equilibrium problems [15].
On the other hand, differentiability can be relaxed to notions suited to non-differentiable functions, such as subdifferentials [2] and semidifferentials [6], for single-valued as well as set-valued maps [8]. Kaul and Kaur [6] introduced the concept of semidifferentials and discussed the properties of locally star-shaped sets and semilocally generalized convex functions with the help of semidifferentials. The oldest traces of Hadamard semidifferentiability can already be found in two articles, by Durdil [12] and Penot [13]. Further, Delfour and Zolésio [14] gave a rather complete treatment of Hadamard semidifferentials in infinite dimensions, where they are the natural tool for shape optimization. In the study of non-differentiable functions, semidifferentials are in some respects more useful than subdifferentials [7]. In particular, convex continuous functions can be characterized via semidifferentiability, so the solution of a minimization problem with a convex continuous objective function over a convex set can be found with the help of semidifferentials [7]. For obtaining the necessary optimality conditions of an optimization problem with a non-convex objective
c Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 182–190, 2020.
https://doi.org/10.1007/978-3-030-21803-4_19
2 Preliminaries
2.1 Semidifferentials
Definition 2.1. [6] Let f be a numerical function defined on a set U ⊆ Rⁿ; then the semidifferential of f at y in the direction x − y is denoted by (df)⁺ and defined by
$$(df)^+(y, x - y) = \lim_{\lambda \to 0^+} \frac{f(y + \lambda(x - y)) - f(y)}{\lambda},$$
provided the limit exists.
Definition 2.2. [7] Let f be a numerical function defined on a set U ⊆ Rⁿ; then the Hadamard semidifferential of f at y in the direction u is denoted by df and defined by
$$(df)(y, u) = \lim_{\substack{\lambda \to 0^+ \\ w \to u}} \frac{f(y + \lambda w) - f(y)}{\lambda},$$
provided the limit exists.
Remark 2.1. Hadamard semidifferentials coincide with the semidifferentials, if
the direction u = x − y.
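The one-sided limit in Definition 2.1 can be approximated numerically with a small positive λ; a minimal sketch (the helper name and step size are illustrative assumptions):

```python
def semidifferential(f, y, d, lam=1e-7):
    # forward difference approximating lim_{λ→0+} (f(y + λ d) - f(y)) / λ
    return (f(y + lam * d) - f(y)) / lam
```

For f(x) = |x| at y = 0, both one-sided limits exist even though f is not differentiable there: the approximation returns 1 in direction +1 and also 1 in direction −1.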
Theorem 2.1. [7] (Mean-Value Theorem for Semidifferentials)
Let f : Rn → R be a function, x, u ∈ Rn . Let the function t → s(t) = f (x + tu)
be continuous on [0, 1] and differentiable on (0, 1), then ∃ λ ∈ (0, 1) such that
f (x + u) = f (x) + (df )+ (x + λu, u).
u → (df )+ (x, u) : Rn → R
A map F : U → Rⁿ is monotone on U if, for all x, y ∈ U,
$$\langle F(x) - F(y), x - y \rangle \ge 0,$$
and pseudomonotone on U if, for all x, y ∈ U,
$$\langle F(y), x - y \rangle \ge 0 \;\Longrightarrow\; \langle F(x), x - y \rangle \ge 0.$$
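The monotonicity condition ⟨F(x) − F(y), x − y⟩ ≥ 0 can be probed numerically on a sample of points; a hedged sketch (the function name and tolerance are illustrative, and a finite sample gives evidence only, not a proof):

```python
def is_monotone_on_sample(F, points, tol=1e-12):
    # checks <F(x) - F(y), x - y> >= 0 on all sampled pairs;
    # F maps a list of floats to a list of floats
    for x in points:
        for y in points:
            inner = sum((fx - fy) * (xi - yi)
                        for fx, fy, xi, yi in zip(F(x), F(y), x, y))
            if inner < -tol:
                return False
    return True
```

The gradient of the convex function f(x) = ‖x‖², namely F(x) = 2x, passes the check, while F(x) = −x (gradient of a concave function) fails it.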
3 Main Results
Theorem 3.1. Let U be a non-empty open and convex subset of Rn . Then, a
semidifferentiable function f : U → R is convex if and only if (df )+ (., x − y) is
monotone on U .
By the Mean-Value Theorem, there exists z = λx + (1 − λ)y for some λ ∈ (0, 1) such that
$$f(x) - f(y) = (df)^+(z, x - y) = \frac{1}{\lambda}\,(df)^+(z, z - y). \qquad (4)$$
186 S. K. Mishra et al.
Therefore, f is convex on U.
i.e., (df)⁺(y, λ(x − y)) < 0, ∀x ≠ y ∈ U, for some λ ∈ (0, 1),
i.e., (df)⁺(y, x − y) < 0 (since λ > 0),
which contradicts the assumed pseudomonotonicity of (df)⁺(·, x − y). Thus, f is pseudoconvex on U.
f(x) = x + sin x.
Since f is quasiconvex on U,
and
(df)⁺(x*, λ̄(x − y)) > 0 (since x̄ − y = λ̄(x − y)). (18)
Since 1 − λ̄ > 0 and λ̄ > 0, from (17) and (18) we have
and
(df)⁺(x*, x − y) > 0. (20)
Remark 3.1. Theorem 3.2 and Theorem 3.3 extend Theorem 3.1 of Karamardian [1] and Proposition 5.2 of Karamardian and Schaible [3], respectively, to the semidifferentiable case.
Which is quasimonotone on R.
References
1. Karamardian, S.: Complementarity problems over cones with monotone and pseu-
domonotone maps. J. Optim. Theory Appl. 18, 445–454 (1976)
2. Rockafellar, R.T.: Characterization of the subdifferentials of convex functions.
Pacific J. Math. 17, 497–510 (1966)
3. Karamardian, S., Schaible, S.: Seven kinds of monotone maps. J. Optim. Theory
Appl. 66(1), 37–46 (1990)
4. Minty, G.J.: On the monotonicity of the gradient of a convex function. Pacific J.
Math. 14, 243–247 (1964)
5. Ye, M., He, Y.: A double projection method for solving variational inequalities
without monotonicity. Comput. Optim. Appl. 60(1), 141–150 (2015)
6. Kaul, R.N., Kaur, S.: Generalizations of convex and related functions. European
J. Oper. Res. 9(4), 369–377 (1982)
7. Delfour, M.C.: Introduction to optimization and semidifferential calculus. Society
for Industrial and Applied Mathematics (SIAM). Philadelphia (2012)
8. Penot, J.-P., Quang, P.H.: Generalized convexity of functions and generalized
monotonicity of set-valued maps. J. Optim. Theory Appl. 92(2), 343–356 (1997)
9. Komlósi, S.: Generalized monotonicity and generalized convexity. J. Optim. Theory
Appl. 84(2), 361–376 (1995)
10. Mangasarian, O.L.: Nonlinear Programming. McGraw-Hill Book Co., New York-
London-Sydney (1969)
11. Castellani, M., Pappalardo, M.: On the mean value theorem for semidifferentiable
functions. J. Global Optim. 46(4), 503–508 (2010)
12. Durdil, J.: On Hadamard differentiability. Comment. Math. Univ. Carolinae 14,
457–470 (1973)
13. Penot, J.-P.: Calcul sous-différentiel et optimisation. J. Funct. Anal. 27(2), 248–276
(1978)
14. Delfour, M.C., Zolésio, J.-P.: Shapes and geometries, Society for Industrial and
Applied Mathematics (SIAM). Philadelphia (2001)
15. Giannessi, F., Maugeri, A. (eds.): Variational Inequalities and Network Equilibrium
Problems. Plenum Press, New York (1995)
Parallel Multi-memetic Global Optimization
Algorithm for Optimal Control
of Polyarylenephthalide’s Thermally-
Stimulated Luminescence
1 Introduction
This paper presents a modified parallel MEC algorithm with an incorporated LA procedure, accompanied by a static load balancing method. An outline of the algorithm and a brief description of its software implementation are given. In addition, the computationally expensive optimal control problem of polyarylenephthalide's thermally-stimulated luminescence was studied in this work and solved using the proposed technique.
Here Φ(X) is the scalar objective function, Φ(X*) = Φ* is the required minimal value, X = (x₁, x₂, …, xₙ) is the n-dimensional vector of variables, Rⁿ is the n-dimensional arithmetical space, and D is the constrained search domain.
Initial values of the vector X are generated within a domain D₀, which is defined as follows:
$$D_0 = \{ X : x_i^{\min} \le x_i \le x_i^{\max},\; i \in [1 : n] \} \subset \mathbb{R}^n.$$
This paper presents the new parallel Modified Multi-Memetic MEC (M3MEC) algorithm. The SMEC algorithm is based on three stages: initialization, similar taxis and dissimilation [10]. In turn, the initialization stage of the M3MEC algorithm contains the LA procedure, which is based on the concept of the Lebesgue integral [5, 11] and divides the objective function's range space into levels based on the values of Φ(X). This stage can be described as follows.
1. Generate N quasi-random n-dimensional vectors within the domain D₀. In this work the LPτ sequence was used to generate quasi-random numbers since it provides a high-quality coverage of a domain.
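Step 1 relies on a low-discrepancy generator; the paper uses LPτ (Sobol) sequences, which need a dedicated library. As a stdlib-only stand-in illustrating the same idea, the sketch below fills D₀ with Halton quasi-random points (all function names and bounds are illustrative):

```python
def van_der_corput(k, base=2):
    # radical-inverse of integer k in the given base, in [0, 1)
    q, denom = 0.0, 1.0
    while k:
        k, r = divmod(k, base)
        denom *= base
        q += r / denom
    return q

def halton_point(k, dims, bases=(2, 3, 5, 7, 11, 13)):
    # k-th point of the Halton sequence in [0, 1)^dims
    return [van_der_corput(k, bases[d]) for d in range(dims)]

def initial_population(N, lo, hi):
    # N quasi-random vectors scaled into D0 = {x : lo_i <= x_i <= hi_i}
    n = len(lo)
    return [[lo[i] + (hi[i] - lo[i]) * c
             for i, c in enumerate(halton_point(k + 1, n))]
            for k in range(N)]
```

A Sobol generator (e.g. `scipy.stats.qmc.Sobol`) would slot in the same way, returning points in the unit cube to be scaled into D₀.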
194 M. Sakharov and A. Karpenko
Fig. 1. Determining a diameter of the first sub-domain for the benchmark composition function
1 from CEC’14
There are three possible cases for the approximated dependency d(l): d can be an increasing function of l; d can decrease as l grows; or d(l) can be neither decreasing nor increasing. Within the scope of this work it is assumed that the latter scenario takes place when the slope angle of the approximated line is within ±5°. Each case represents a certain set of numeric values of M3MEC's free parameters, suggested based on the numeric studies [10].
The similar taxis stage was modified in M3MEC in order to include a meme selection and local improvement stage. Meme selection is performed in accordance with the simple random hyper-heuristic [12]. Once the most suitable meme is selected for a specific sub-population, it is applied randomly to half of its individuals for k_ls = 10 iterations. The dissimilation stage of SMEC was not modified in M3MEC. To handle the constraints of the search domain D, the death penalty technique [2] was utilized during the similar taxis stage.
In this work four local search methods were utilized, namely, Nelder-Mead method
[13], Hooke-Jeeves method [14], Monte-Carlo method [15], and Random Search on a
Sphere [16]. Only zero-order methods were used to deal with problems where the
objective function’s derivative is not available explicitly and its approximation is
computationally expensive.
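The modified similar taxis step, picking one meme via the simple random hyper-heuristic and applying it to half of a sub-population, can be sketched as follows. This is a simplified illustration: the single zero-order meme shown here stands in for the paper's actual memes (Nelder-Mead, Hooke-Jeeves, Monte-Carlo, Random Search on a Sphere), and all names and parameters are assumptions:

```python
import random

def random_walk_meme(f, x, step=0.1, iters=10):
    # zero-order local improvement: accept only better neighbours
    best, fbest = list(x), f(x)
    for _ in range(iters):
        cand = [xi + random.uniform(-step, step) for xi in best]
        fc = f(cand)
        if fc < fbest:
            best, fbest = cand, fc
    return best

MEMES = [random_walk_meme]  # Nelder-Mead, Hooke-Jeeves, etc. would be added here

def similar_taxis_step(f, group):
    # simple random hyper-heuristic: pick one meme at random,
    # apply it to a randomly chosen half of the group's individuals
    meme = random.choice(MEMES)
    for i in random.sample(range(len(group)), len(group) // 2):
        group[i] = meme(f, group[i])
    return group
```

Because the meme only accepts improving moves, the best objective value in a group can never get worse after a similar taxis step.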
The similar taxis and dissimilation stages are performed in parallel, independently for each sub-population. To map those sub-populations onto the available computing nodes, a static load balancing method was proposed by the authors [5] specifically for loosely coupled systems.
We modify the initialization stage described above so that at step 2, apart from calculating the values of the objective function Φ_r, the time t_r required for those calculations is also measured. The proposed adaptive load balancing method can be described as follows.
1. For each sub-population K_l, l ∈ [1 : |K|], we analyze all time measurements t_r for the corresponding vectors X_r, r ∈ [1 : N/|K|], to determine whether there are outliers.
2. All found outliers are excluded from the sub-populations. A new sub-population is composed of those outliers; it can be investigated at the user's request after the computational process is over.
3. All available computing nodes are sorted by their computational power; then the first sub-population K_1 is sent to the first node.
4. Individuals in the other sub-populations are re-distributed between neighboring sub-populations, starting from K_2, so that the average calculation time is approximately the same for every sub-population. The balanced sub-populations K_l, l ∈ [2 : |K|] are then mapped onto the computing nodes.
The modified similar taxis stage, along with the dissimilation stage, is launched on each node with the specific values of the free parameters chosen in accordance with the results of landscape analysis. Each computing node uses stagnation of the computational process as its termination criterion, while the algorithm as a whole works in a synchronous mode, so that the final result is calculated once all computing nodes have completed their tasks.
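Steps 1–4 above can be sketched as outlier removal followed by a greedy redistribution that roughly equalizes total measured evaluation time per node. This is a simplified stand-in for the authors' method: the 3-sigma outlier rule, the greedy longest-processing-time assignment, and all names are assumptions:

```python
from statistics import mean, stdev

def split_outliers(individuals, times, k=3.0):
    # steps 1-2: move individuals whose measured evaluation time is an
    # outlier (more than k sigma above the mean) into a separate group
    mu = mean(times)
    sd = stdev(times) if len(times) > 1 else 0.0
    regular, outliers = [], []
    for ind, t in zip(individuals, times):
        (outliers if t > mu + k * sd else regular).append((ind, t))
    return regular, outliers

def balance(regular, n_nodes):
    # steps 3-4: greedy longest-processing-time assignment so that every
    # node carries roughly the same total predicted evaluation time
    nodes = [[] for _ in range(n_nodes)]
    loads = [0.0] * n_nodes
    for ind, t in sorted(regular, key=lambda p: -p[1]):
        i = loads.index(min(loads))  # currently least-loaded node
        nodes[i].append(ind)
        loads[i] += t
    return nodes, loads
```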
The Modified Multi-Memetic MEC algorithm (M3MEC), along with the utilized memes, was implemented by the authors in Wolfram Mathematica. The software implementation has a modular structure, which helps to modify the algorithm easily and extend it with additional assisting methods and memes. The proposed parallel algorithm and its software implementation were used to solve an optimal control problem for the thermally-stimulated luminescence of polyarylenephthalides (PAP).
Nowadays, organic polymer materials are widely used in the field of optoelectronics. Polyarylenephthalides are high-molecular-weight compounds that belong to the class of unconjugated cardo polymers. PAPs exhibit good optical and electrophysical characteristics along with thermally-stimulated luminescence. Determining the origins of PAP's luminescent states is of both fundamental and practical importance [17].
Here y₁, y₂ represent the initial stable species of various nature; y₃ is a certain excited state toward which y₁ and y₂ are headed; y₄ represents quanta of light. T(t) is the reaction temperature. The luminescence intensity I(t) is calculated according to the formula I(t) = 544663240 · y₃(t); relative units are used for measuring I(t). The initial concentrations of the species in the reaction equal y₁(0) = 300; y₂(0) = 1000; y₃(0) = y₄(0) = 0. The integration interval equals [0, 2000] seconds.
$$J(T(t)) = \int_0^{2000} \big[ I_{ref}(t) - I(T(t)) \big]^2 \, dt \;\to\; \min_{T(t)}. \qquad (3)$$
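Once I(T(t)) has been obtained by integrating the kinetic model (2), the functional (3) can be evaluated on a discrete time grid; a minimal trapezoidal-rule sketch (the function name and grid layout are illustrative, and the ODE solve itself is omitted here):

```python
def objective_J(t_grid, I_model, I_ref):
    # J = integral over [0, 2000] of (I_ref(t) - I(T(t)))^2 dt,
    # approximated by the trapezoidal rule on the sampled grid
    J = 0.0
    for k in range(len(t_grid) - 1):
        h = t_grid[k + 1] - t_grid[k]
        a = (I_ref[k] - I_model[k]) ** 2
        b = (I_ref[k + 1] - I_model[k + 1]) ** 2
        J += 0.5 * h * (a + b)
    return J
```

For a constant deviation of 1 over [0, 2000], the integral is exactly 2000, which makes a convenient sanity check of the quadrature.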
The global minimization problem (3) was solved using the proposed M3MEC-P algorithm and its software implementation. The following values of the algorithm's free parameters were utilized: the number of groups c = 90; the number of individuals in each group |S| = 50; the stagnation iteration number k_stop = 100; the tolerance used for identifying stagnation ε = 10⁻⁵. All computations were performed using a desktop grid made of eight personal computers that did not communicate with each other. The number of sub-populations |K| = 8 was selected to be equal to the number of computing nodes. In order to increase the probability of localizing the global optimum, the multi-start method with 15 launches was used. The BDF integration method was utilized at every evaluation to solve (2).
The first set of experiments was devoted to studying the dynamics of the model (2) under constant values of the reaction temperature within the range T = 150…215 °C, with a step size of 10 degrees (Fig. 2). The obtained results demonstrate that under any constant temperature the luminescence intensity I(t) decreases over time. Furthermore, the higher the temperature, the faster the luminescence intensity decreases.
The second set of experiments was devoted to maintaining a constant value of the luminescence intensity I(t). The results obtained for the target value I_ref(t) = 300 are displayed in Fig. 3. They suggest that in order to maintain a constant value of I(t), the reaction temperature has to increase approximately linearly (by about 1 °C every 300 s). This places a restriction on the reaction time, as it is impossible to maintain constant growth of the temperature in the experimental setup.
The third set of experiments was conducted to determine the law of variation of the temperature T(t) that would provide the required pulse changes in the luminescence intensity I(t). The obtained results are presented in Fig. 4. The optimal control trajectory repeats the required law of I(t) variation with the addition of a linearly increasing trend.
Fig. 2. PAP's luminescence intensity under various constant values of the temperature T ∈ [150, 215] °C
Fig. 3. Optimal control for maintaining the constant value of PAP’s luminescence intensity
Fig. 4. Optimal control for providing pulse changes in the luminescence intensity
Fig. 5. Optimal control for providing harmonic oscillations of the luminescence intensity
5 Conclusions
This paper presents a new modified parallel population-based global optimization algorithm designed for loosely coupled systems, together with its software implementation. The M3MEC-P algorithm is based on the adaptation strategy and the landscape analysis procedure originally proposed by the authors and incorporated into the traditional SMEC algorithm.
The algorithm is capable of adapting to various objective functions using both static and dynamic adaptation. Static adaptation was implemented with the use of landscape analysis, while dynamic adaptation was made possible by utilizing several memes. The proposed landscape analysis is based on the concept of the Lebesgue integral and allows one to group objective functions into six categories. Each category suggests the usage of a specific set of values for the algorithm's free parameters. The proposed algorithm and its software implementation proved efficient in solving a real-world computationally expensive global optimization problem: determination of the kinetics of the thermally-stimulated luminescence of polyarylenephthalides.
Further research will be devoted to the study of asynchronous stopping criteria, as
well as the investigation of different architectures of loosely coupled systems.
Acknowledgments. This work was supported by the RFBR under a grant 18-07-00341.
References
Proper Choice of Control Parameters for
CoDE Algorithm
1 Introduction
A proper setting of the control parameters of the differential evolution (DE) algorithm
plays an important role when solving optimisation problems. More precisely,
there is no single setting that performs best on most problems
(No-Free-Lunch theorem [16]). Although many approaches for adapting
the values of the DE parameters exist, none of them dominates the others.
This paper focuses on a more suitable setting of two control parameters of
the DE algorithm, based on preliminary work. Our preliminary comprehensive
experiment provided interesting results for the DE algorithm on real-world
problems: a large number of combinations of the two DE control parameters were studied
and evaluated on selected problems in order to rank them from more efficient to
less efficient.
Although DE has only a few control parameters, its efficiency is very sensitive
to the setting of the F and CR values. Unfortunately,
simple trial-and-error tuning of these parameters requires a lot of time. Several
authors have recommended settings of the DE control parameters [7,9,10]; unfortunately,
these values are valid only for a subset of optimisation problems. As a consequence,
many adaptive mechanisms controlling the values of F and CR have been
proposed, e.g. [1,11,12,14,15]. The summary of DE research has been presented
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 202–212, 2020.
https://doi.org/10.1007/978-3-030-21803-4_21
Although CoDE is not often used in real applications, the results of the following
experiments show that there are problems on which this DE variant
performs better compared to other algorithms [2,3].
Fig. 1. Mean ranks from the Friedman test for all 441 (F, CR) settings and all problems.
These parameters are used in CoDE to increase performance. The remaining settings
of CoDEFCR1 are the same as those of the original CoDE algorithm.
5 Experimental Settings
The main aim of this study is to increase the efficiency of the CoDE algorithm on
real-world problems. Therefore, the test suite of 22 real-world problems selected
for the CEC 2011 Special Session and Competition on Real-Parameter Numerical Optimization
[5] is used as a benchmark in the experimental comparison. The functions
in the benchmark differ in computational complexity and in the dimension
of the search space, which varies from D = 1 to D = 240.
For each algorithm and problem, 25 independent runs were carried out. A
run stops when the prescribed number of function evaluations
MaxFES = 150000 is reached. The partial results of the algorithms after reaching
one-third and two-thirds of MaxFES were also recorded for further analysis.
The point in the final population with the smallest function value is taken as the
solution found in the run. Since the minimal function values of the
problems are unknown, the algorithm providing the lower function value is
considered better performing.
The experiments in this paper are divided into two parts. In the first part,
the classic DE algorithm with the rand/1/bin strategy and 441 different combinations
of F and CR settings was studied, as mentioned in Sect. 3. The only remaining
control parameter is the population size, which is set to N = 100.
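The classic DE/rand/1/bin loop used in the first part of the experiment can be sketched as follows. This is a minimal pure-Python sketch; the function and variable names are ours, and the papers' own implementations are in Matlab.

```python
import random

def de_rand_1_bin(f, bounds, F=0.8, CR=0.9, NP=100, max_fes=150_000, seed=1):
    """Classic DE with the rand/1/bin strategy: each target vector x_i is
    challenged by a trial vector built from three distinct random members
    (v = x_a + F*(x_b - x_c)) via binomial crossover with rate CR."""
    rng = random.Random(seed)
    D = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(NP)]
    fit = [f(x) for x in pop]
    fes = NP
    while fes < max_fes:
        for i in range(NP):
            if fes >= max_fes:
                break
            a, b, c = rng.sample([j for j in range(NP) if j != i], 3)
            jrand = rng.randrange(D)   # guarantees at least one mutated coordinate
            trial = list(pop[i])
            for j in range(D):
                if rng.random() < CR or j == jrand:
                    lo, hi = bounds[j]
                    trial[j] = min(max(pop[a][j] + F * (pop[b][j] - pop[c][j]), lo), hi)
            fit_t = f(trial)
            fes += 1
            if fit_t <= fit[i]:        # greedy one-to-one replacement
                pop[i], fit[i] = trial, fit_t
    best = min(range(NP), key=fit.__getitem__)
    return pop[best], fit[best]
```

With NP = 100 and MaxFES = 150000 this mirrors the protocol above; the 441 (F, CR) combinations would be obtained by looping this routine over a grid of both parameters.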
In the second part of the experiment, the original CoDE algorithm and two
newly proposed enhanced CoDE variants (CoDEFCR1 , CoDEFCR2 ) were applied
to the set of 22 real-world problems. The names of the newly proposed variants
are abbreviated to FCR1 (CoDEFCR1 ) and FCR2 (CoDEFCR2 ) in some parts
of the results. The other control parameters are set according to the authors'
recommendations in the original paper. All the algorithms are implemented
in Matlab 2017b, and all computations were carried out on a standard PC with
Windows 7, Intel(R) Core(TM)i7-4790 CPU 3.6 GHz, 16 GB RAM.
6 Results
The original CoDE algorithm and two newly proposed enhanced CoDE variants
are compared on 22 real-world problems. Global insight into overall algorithms
Fig. 2. Mean rank values for three CoDE variants in three stages of the search from
Friedman test.
More detailed results of the comparison of the three CoDE variants are provided
by the non-parametric Kruskal-Wallis test with Dunn's multiple-comparison method.
This test is applied to each problem separately to show which setting of the control
parameters in CoDE is more efficient. The null hypothesis of equal algorithm
performance was rejected in most problems at significance level 1 × 10−4 . The
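For reference, the Kruskal-Wallis H statistic used here can be computed directly. The sketch below is our own minimal version without tie correction (the published analysis additionally applies Dunn's post-hoc method, which is omitted here).

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction): rank the pooled sample,
    then H = 12/(N(N+1)) * sum_i n_i*(rbar_i - (N+1)/2)^2."""
    pooled = sorted((v, gi) for gi, g in enumerate(groups) for v in g)
    N = len(pooled)
    rank_sums = [0.0] * len(groups)
    for rank, (_, gi) in enumerate(pooled, start=1):
        rank_sums[gi] += rank
    H = 0.0
    for g, rs in zip(groups, rank_sums):
        n = len(g)
        rbar = rs / n                      # mean rank of the group
        H += n * (rbar - (N + 1) / 2) ** 2
    return 12.0 / (N * (N + 1)) * H
```

For three fully separated groups {1,2,3}, {4,5,6}, {7,8,9} the mean ranks are 2, 5, 8 and H evaluates to 7.2.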
Table 1. Minimum and median values (of 25 runs) of three CoDE variants and results of Kruskal-Wallis tests.

Problem  D    CoDE min / median       FCR1 min / median       FCR2 min / median       Best          Worst
T04      1    0 / 0                   0 / 0                   0 / 0                   No difference
T05      30   −33.5321 / −29.8468     −33.2764 / −32.2624     −33.5565 / −32.3392     FCR2, FCR1    CoDE
T06      30   −26.5649 / −21.6885     −28.0248 / −25.9769     −28.1222 / −26.0169     FCR2, FCR1    CoDE
T07      20   1.20988 / 1.39838       1.04545 / 1.40338       1.18516 / 1.36425       No difference
T08      7    220 / 220               220 / 220               220 / 220               No difference
T09      126  1823.74 / 2719.55       1323.17 / 2128.9        1188.79 / 1746.66       FCR2, FCR1    CoDE
T10      12   −21.8285 / −21.5793     −21.8425 / −21.6437     −21.8425 / −21.6437     FCR1, FCR2    CoDE
T11.1    120  51855.2 / 53034.7       51439.1 / 52220.5       51299.7 / 52179.2       FCR2, FCR1    CoDE
T11.2    96   1070680 / 1074570       1084660 / 1130970       1069540 / 1113210       CoDE          FCR2, FCR1
T11.3    240  15444.2 / 15444.2       15444.2 / 15444.2       15444.2 / 15444.2       No difference
T11.4    6    18263.1 / 18372.8       18075.3 / 18085.5       18019.2 / 18080.9       FCR2, FCR1    CoDE
T11.5    13   32797.9 / 32847.6       32715.2 / 32766.3       32729.9 / 32768.2       FCR1, FCR2    CoDE
T11.6    15   124765 / 126703         126989 / 128813         126883 / 128564         CoDE          FCR2, FCR1
T11.7    40   1868890 / 1892380       1892840 / 1908440       1874920 / 1901360       CoDE, FCR2    FCR1
T11.8    140  934946 / 937003         935498 / 940008         934834 / 939806         CoDE          FCR2, FCR1
T11.9    96   940282 / 960844         944549 / 968972         941807 / 983218         No difference
T11.10   96   934483 / 936501         933303 / 939932         935862 / 938568         CoDE          FCR2, FCR1
T13      26   17.263 / 20.4819        13.555 / 18.0592        11.6715 / 18.2202       FCR2, FCR1    CoDE
T14      22   16.3306 / 21.14         8.93319 / 15.5489       8.73351 / 15.3736       FCR1, FCR2    CoDE
minimum and median values of 25 runs for each variant along with the best and
the worst performing variant for each problem are figured in Table 1.
In the 'best' column, the significantly better performing variants are listed,
and in the 'worst' column, the variants with the worst efficiency. Clearly,
the most frequently worst performing variant is the original CoDE algorithm,
which loses in 12 out of 22 problems. On the other hand, both newly proposed
CoDE variants win in 12 out of 22 problems; the variant CoDEFCR2 outperforms
CoDEFCR1 significantly in only one problem. In six out of 22 problems, all CoDE
variants perform similarly. In the left part of the table, the least median value
for each problem is printed in bold and underlined. The original CoDE provides
the best median value in 7 out of 22 problems, the CoDEFCR1 variant in 4 out
of 22, and CoDEFCR2 achieves the least median in 9 out of 22 problems.
Table 2. Average frequencies of use of three CoDE variants over 22 real-world problems.

the frequency of this setting is higher in CoDEFCR2 . In the original CoDE variant,
the settings with F = 0.8 and CR = 0.2 are used about as frequently as the
combination F = 0.95 and CR = 1 in the new variants, and the curves for CoDE
are more similar to each other.
7 Conclusion
Based on the experimental results, it is obvious that the original CoDE algorithm
performs substantially worse than the newly proposed variants. The performance
of CoDEFCR1 is better than that of the original CoDE, and the difference between
the algorithms decreases as the number of function evaluations increases. The
second proposed variant, CoDEFCR2 , performs best, and its efficiency is rather
invariant to the increasing number of function evaluations. The combinations of
F and CR used in CoDEFCR2 , although worse on average, achieve the best overall
performance on the real-world problems. The most frequently worst performing
variant is the original CoDE algorithm, which loses in 12 out of 22 problems,
whereas both newly proposed CoDE variants win in 12 out of 22 problems; CoDEFCR2
outperforms CoDEFCR1 significantly in only one problem. It is interesting that
the rather worse settings used in CoDEFCR2 achieve slightly better results than
CoDEFCR1 with the best combination of F and CR. The proposed methods outperform
the winner of the CEC 2011 competition (GA-MPC) in 7 out of 22 problems. More
suitable combinations of control parameters in adaptive DE variants will be
studied in further research.
References
1. Brest, J., Maučec, M.S., Bošković, B.: Single objective real-parameter optimization:
algorithm jSO. In: 2017 IEEE Congress on Evolutionary Computation (CEC), pp.
1311–1318 (2017)
2. Bujok, P.: Migration model of adaptive differential evolution applied to real-world
problems. Artificial Intelligence and Soft Computing–Part I. Lecture Notes in Com-
puter Science, vol. 10841, pp. 313–322. In: 17th International Conference on Arti-
ficial Intelligence and Soft Computing ICAISC. Zakopane, Poland (2018)
3. Bujok, P., Tvrdı́k, J., Poláková, R.: Differential evolution with exponential
crossover revisited. In: Matoušek, R. (ed.) MENDEL, 22nd International Con-
ference on Soft Computing, pp. 17–24. Czech Republic, Brno (2016)
4. Das, S., Mullick, S.S., Suganthan, P.N.: Recent advances in differential evolution-an
updated survey. Swarm Evol. Comput. 27, 1–30 (2016)
5. Das, S., Suganthan, P.N.: Problem Definitions and Evaluation Criteria for CEC
2011 Competition on Testing Evolutionary Algorithms on Real World Optimiza-
tion Problems. Tech. rep. Jadavpur University, India and Nanyang Technological
University, Singapore (2010)
6. Das, S., Suganthan, P.N.: Differential evolution: a survey of the state-of-the-art.
IEEE Trans. Evol. Comput. 15, 27–54 (2011)
7. Feoktistov, V.: Differential Evolution: In Search of Solutions. Springer (2006)
8. Neri, F., Tirronen, V.: Recent advances in differential evolution: a survey and
experimental analysis. Artif. Intell. Rev. 33, 61–106 (2010)
9. Price, K.V., Storn, R., Lampinen, J.: Differential Evolution: A Practical Approach
to Global Optimization. Springer (2005)
10. Storn, R., Price, K.V.: Differential evolution–a simple and efficient heuristic for
global optimization over continuous spaces. J. Glob. Optim. 11, 341–359 (1997)
212 P. Bujok et al.
11. Tang, L., Dong, Y., Liu, J.: Differential evolution with an individual-dependent
mechanism. IEEE Trans. Evol. Comput. 19(4), 560–574 (2015)
12. Tvrdı́k, J.: Competitive differential evolution. In: Matoušek, R., Ošmera, P. (eds.)
MENDEL 2006, 12th International Conference on Soft Computing, pp. 7–12. Uni-
versity of Technology, Brno (2006)
13. Tvrdı́k, J., Poláková, R., Veselský, J., Bujok, P.: Adaptive variants of differential
evolution: towards control-parameter-free optimizers. In: Zelinka, I., Snášel, V.,
Abraham, A. (eds.) Handbook of Optimization–From Classical to Modern Approach.
Intelligent Systems Reference Library, vol. 38, pp. 423–449. Springer, Berlin
Heidelberg (2012)
14. Wang, Y., Cai, Z., Zhang, Q.: Differential evolution with composite trial vector
generation strategies and control parameters. IEEE Trans. Evol. Comput. 15, 55–
66 (2011)
15. Wang, Y., Li, H.X., Huang, T., Li, L.: Differential evolution based on covariance
matrix learning and bimodal distribution parameter setting. Appl. Soft Comput.
18, 232–247 (2014)
16. Wolpert, D.H., Macready, W.G.: No free lunch theorems for optimization. IEEE
Trans. Evol. Comput. 1, 67–82 (1997)
Semidefinite Programming Based Convex
Relaxation for Nonconvex Quadratically
Constrained Quadratic Programming
1 Introduction
We consider in this survey paper the following class of quadratically constrained
quadratic programming (QCQP) problems:
SDP relaxations enhanced with valid inequalities may be tight for the original
problems, i.e., there exists a rank one SDP solution and an optimal solution of
the original problem can be recovered from the SDP solution.
The remaining of this paper is organized as follows. In Sect. 2, we review
various valid inequalities to strengthen the basic SDP relaxation. We conclude
the paper in Sect. 3.
Notations. We use v(·) to denote the optimal value of problem (·). Let $\|x\|$
denote the Euclidean norm of x, i.e., $\|x\| = \sqrt{x^T x}$, and $\|A\|_F$ denote
the Frobenius norm of a matrix A, i.e., $\|A\|_F = \sqrt{\mathrm{tr}(A^T A)}$. The
notation $A \succeq 0$ means that the matrix A is positive semidefinite and symmetric,
and the notation $A \succeq B$ for matrices A and B means that $A - B \succeq 0$
with both A and B symmetric. The inner product of two symmetric matrices is defined by
$A \cdot B = \sum_{i,j=1,\dots,n} A_{ij} B_{ij}$, where $A_{ij}$ and $B_{ij}$ are the
(i, j) entries of A and B, respectively. We also use $A_{i,\cdot}$ and $A_{\cdot,i}$
to denote the ith row and column of the matrix A, respectively. For a positive
semidefinite n × n matrix A with spectral decomposition $A = U^T D U$, where D is an
n × n diagonal matrix and U is an n × n orthogonal matrix, we use the notation
$A^{1/2}$ to denote $U^T D^{1/2} U$, where $D^{1/2}$ is the diagonal matrix whose
entries are the square roots of the corresponding entries of D.
In this section, we review the basic SDP relaxation for problem (P) and its
strengthened variants with RLT, SOC-RLT, GSRT and other valid inequalities.
By lifting x to the matrix $X = xx^T$ and relaxing $X = xx^T$ to $X \succeq xx^T$,
which is further equivalent to
$$\begin{pmatrix} 1 & x^T \\ x & X \end{pmatrix} \succeq 0$$
due to the Schur complement, we have the following basic SDP relaxation for problem (P):
$$\begin{aligned}
(\mathrm{L})\quad \max\ & \tau \\
\text{s.t.}\ & \begin{pmatrix} Q_0 & \frac{c_0}{2} \\ \frac{c_0^T}{2} & -\tau \end{pmatrix}
- \sum_{i=1}^{l} \lambda_i \begin{pmatrix} Q_i & \frac{c_i}{2} \\ \frac{c_i^T}{2} & d_i \end{pmatrix}
- \sum_{j=1}^{m} \mu_j \begin{pmatrix} 0 & \frac{a_j}{2} \\ \frac{a_j^T}{2} & -b_j \end{pmatrix} \succeq 0, \\
& \lambda_i \ge 0,\ i = 1, \dots, l, \quad \mu_j \ge 0,\ j = 1, \dots, m,
\end{aligned}$$
also known as Shor's relaxation [16]. Strong duality holds for (SDP)
when (SDP) is bounded from below and the Slater condition holds for (SDP).
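The lifting step and its Schur-complement characterization can be checked numerically on a toy instance. The 2-D data below are our own illustration: X ⪰ xxᵀ holds exactly when the bordered matrix M = [[1, xᵀ], [x, X]] is positive semidefinite, which for this positive definite case we verify via Sylvester's criterion in plain Python.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# x in R^2 and X = x x^T + I, so X - x x^T = I is positive definite.
x = [1.0, 2.0]
X = [[x[i] * x[j] + (1.0 if i == j else 0.0) for j in range(2)] for i in range(2)]
M = [[1.0] + x,
     [x[0]] + X[0],
     [x[1]] + X[1]]

# Sylvester's criterion: all leading principal minors positive <=> M is
# positive definite, which by the Schur complement is equivalent to
# X - x x^T being positive definite.
minors = [M[0][0],
          M[0][0] * M[1][1] - M[0][1] * M[1][0],
          det3(M)]
```

Here M = [[1,1,2],[1,2,2],[2,2,5]] and all three leading minors equal 1, confirming positive definiteness.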
We next review valid inequalities that have been considered to strengthen
(SDP) in the literature. Sherali and Adams [15] first introduced the concept of
“reformulation-linearization technique” (RLT) to formulate a linear program-
ming relaxation for problem (P). The RLT [15] linearizes the product of any
pair of linear constraints, i.e.,
A tighter relaxation for problem (P) can be obtained by enhancing the (SDP)
relaxation with the RLT constraints:
form, i.e.,
$$x^T Q_i x \le -d_i - c_i^T x,\quad -d_i - c_i^T x \ge 0
\;\Longrightarrow\;
\left\|\begin{pmatrix} B_i x \\ \frac{1}{2}(-d_i - c_i^T x - 1)\end{pmatrix}\right\|
\le \frac{1}{2}(-d_i - c_i^T x + 1). \quad (5)$$
Multiplying the linear term $b_j - a_j^T x \ge 0$ on both sides of the above SOC yields
the following valid inequality,
$$\left\|\begin{pmatrix} B_i x\,(b_j - a_j^T x) \\ \frac{1}{2}(1 + d_i + c_i^T x)(b_j - a_j^T x)\end{pmatrix}\right\|
\le \frac{1}{2}(b_j - a_j^T x)(1 - d_i - c_i^T x),$$
whose linearization ($xx^T \to X$) gives
$$\left\|\begin{pmatrix} B_i (b_j x - X a_j) \\ \frac{1}{2}\big(-c_i^T X a_j + (b_j c_i^T - d_i a_j^T - a_j^T) x + (1 + d_i) b_j\big)\end{pmatrix}\right\|
\le \frac{1}{2}\big(c_i^T X a_j + (d_i a_j^T - a_j^T - b_j c_i^T) x + (1 - d_i) b_j\big),
\quad i \in C,\ j = 1, \dots, m. \quad (6)$$
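The SOC reformulation in (5) rests on the elementary identity $((s+1)/2)^2 - ((s-1)/2)^2 = s$ with $s = -d_i - c_i^T x$, so the SOC holds exactly when $\|B_i x\|^2 \le s$. A quick numeric check with illustrative values of our own:

```python
import math

def soc_holds(Bx, s):
    """SOC form of (5): ||(B_i x, (s-1)/2)|| <= (s+1)/2 with s = -d - c^T x.
    Equivalent to ||B_i x||^2 <= s because ((s+1)/2)^2 - ((s-1)/2)^2 = s."""
    lhs = math.hypot(math.sqrt(sum(v * v for v in Bx)), (s - 1.0) / 2.0)
    return lhs <= (s + 1.0) / 2.0 + 1e-12

Bx = [0.6, 0.8]                  # ||B_i x||^2 = 1.0
assert soc_holds(Bx, 1.0)        # boundary case: ||B_i x||^2 = s
assert soc_holds(Bx, 2.0)        # strictly feasible
assert not soc_holds(Bx, 0.5)    # violated: 1.0 > 0.5
```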
Enhancing (SDPRLT ) with the SOC-RLT constraints gives rise to a tighter relax-
ation:
(SDPSOC-RLT ) min Q0 · X + cT0 x
s.t. (1), (2), (3), (4), (6).
Recently, Burer and Yang [9] demonstrated that the SDP+RLT+(SOC-RLT)
relaxation has no gap in an extended trust region problem of minimizing a
quadratic function subject to a unit ball and multiple linear constraints, where
the linear constraints do not intersect with each other in the interior of the ball.
Stimulated by the construction of the SOC-RLT constraints, the authors in [12]
derive the GSRT constraints from nonconvex quadratic constraints and linear
constraints. They first decompose each indefinite matrix in the quadratic constraints
according to the signs of its eigenvalues, i.e., $Q_i = L_i^T L_i - M_i^T M_i$, $i \in N$,
where $L_i$ corresponds to the positive eigenvalues and $M_i$ to the negative
eigenvalues. One such decomposition is the spectral decomposition over the
$n - p + r$ nonzero eigenvalues, $Q_i = \sum_j \lambda_{i_j} v_{i_j} v_{i_j}^T$, where
$\lambda_{i_1} \ge \lambda_{i_2} \ge \cdots \ge \lambda_{i_r} > 0 > \lambda_{i_{p+1}} \ge \cdots \ge \lambda_{i_n}$,
$0 \le r \le p < n$, and correspondingly
$L_i = (\sqrt{\lambda_{i_1}} v_{i_1}, \dots, \sqrt{\lambda_{i_r}} v_{i_r})^T$,
$M_i = (\sqrt{-\lambda_{i_{p+1}}} v_{i_{p+1}}, \dots, \sqrt{-\lambda_{i_n}} v_{i_n})^T$.
They introduce an augmented variable $z_i$ to reformulate problem (P) as follows:
$$\begin{aligned}
(\mathrm{RP})\quad \min\ & x^T Q_0 x + c_0^T x \\
\text{s.t.}\ & x^T Q_i x + c_i^T x + d_i \le 0, \quad i = 1, \dots, l, \\
& \left\|\begin{pmatrix} L_i x \\ \frac{1}{2}(c_i^T x + d_i + 1)\end{pmatrix}\right\| \le z_i, \quad i \in N, \quad (7)\\
& \left\|\begin{pmatrix} M_i x \\ \frac{1}{2}(c_i^T x + d_i - 1)\end{pmatrix}\right\| = z_i, \quad i \in N, \quad (8)\\
& a_j^T x \le b_j, \quad j = 1, \dots, m.
\end{aligned}$$
Denote $\begin{pmatrix} X & S \\ S^T & Z\end{pmatrix} = \begin{pmatrix} x \\ z\end{pmatrix}(x^T\ z^T)$.
We then relax this intractable nonconvex constraint to
$\begin{pmatrix} X & S \\ S^T & Z\end{pmatrix} \succeq \begin{pmatrix} x \\ z\end{pmatrix}(x^T\ z^T)$,
which is equivalent to the following LMI by the Schur complement:
$$\begin{pmatrix} 1 & x^T & z^T \\ x & X & S \\ z & S^T & Z \end{pmatrix} \succeq 0.$$
Multiplying (7) by $b_j - a_j^T x \ge 0$ gives
$$\left\|\begin{pmatrix} L_i x\,(b_j - a_j^T x) \\ \frac{1}{2}(c_i^T x + d_i + 1)(b_j - a_j^T x)\end{pmatrix}\right\|
\le z_i (b_j - a_j^T x),$$
i.e.,
$$\left\|\begin{pmatrix} L_i b_j x - L_i x x^T a_j \\ \frac{1}{2}\big(c_i^T (b_j x - x x^T a_j) + (d_i + 1)(b_j - a_j^T x)\big)\end{pmatrix}\right\|
\le z_i b_j - z_i x^T a_j.$$
Then the linearization of the above formula gives rise to
$$\left\|\begin{pmatrix} L_i b_j x - L_i X a_j \\ \frac{1}{2}\big(c_i^T (b_j x - X a_j) + (d_i + 1)(b_j - a_j^T x)\big)\end{pmatrix}\right\|
\le z_i b_j - S_{\cdot,i}^T a_j. \quad (9)$$
Since the equality constraint (8) is nonconvex and intractable, relaxing (8) to an
inequality yields the following tractable SOC constraint,
$$\left\|\begin{pmatrix} M_i x \\ \frac{1}{2}(c_i^T x + d_i - 1)\end{pmatrix}\right\| \le z_i. \quad (10)$$
Similarly, linearizing the product of (10) and $b_j - a_j^T x$ gives rise to the following
valid inequalities,
$$\left\|\begin{pmatrix} M_i b_j x - M_i X a_j \\ \frac{1}{2}\big(c_i^T (b_j x - X a_j) + (d_i - 1)(b_j - a_j^T x)\big)\end{pmatrix}\right\|
\le z_i b_j - S_{\cdot,i}^T a_j. \quad (11)$$
Squaring both sides of (8),
$$\left\|\begin{pmatrix} M_i x \\ \frac{1}{2}(c_i^T x + d_i - 1)\end{pmatrix}\right\|^2 = z_i^2,$$
leads to a tractable linearization,
$$Z_{i-k,i-k} = X \cdot M_i^T M_i + \frac{1}{4}\big(c_i c_i^T \cdot X + (d_i - 1)^2 + 2 c_i^T x (d_i - 1)\big). \quad (12)$$
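As a sanity check on (12), expanding the squared norm ourselves (our own intermediate step, term-by-term consistent with (12)):
$$\left\|\begin{pmatrix} M_i x \\ \tfrac{1}{2}(c_i^T x + d_i - 1)\end{pmatrix}\right\|^2
= x^T M_i^T M_i x + \frac{1}{4}\Big((c_i^T x)^2 + 2\,c_i^T x\,(d_i - 1) + (d_i - 1)^2\Big) = z_i^2,$$
and the substitutions $x^T M_i^T M_i x \to X \cdot M_i^T M_i$, $(c_i^T x)^2 \to c_i c_i^T \cdot X$ and $z_i^2 \to Z_{i-k,i-k}$ recover exactly the right-hand side of (12).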
Finally, (7), (9), (10), (11) and (12) together make up the GSRT constraints.
With the GSRT constraints, we strengthen (SDPRLT ) to the following tighter
relaxation:
$$(\mathrm{SDP_{GSRT\text{-}A}})\quad \min\ Q_0 \cdot X + c_0^T x \quad
\text{s.t. } (1), (2), (4), (6), (7), (9), (10), (11), (12),\
\begin{pmatrix} 1 & x^T & z^T \\ x & X & S \\ z & S^T & Z \end{pmatrix} \succeq 0.$$
218 R. Jiang and D. Li
The following theorem, which shows the relationship among all the above
convex relaxations, is obvious due to the nested inclusion relationship of the
feasible regions for this sequence of the relaxations.
Theorem 1. $v(\mathrm{P}) \ge v(\mathrm{SDP_{GSRT\text{-}A}}) \ge v(\mathrm{SDP_{SOC\text{-}RLT}}) \ge v(\mathrm{SDP_{RLT}}) \ge v(\mathrm{SDP})$.
A natural extension of GSRT is to apply a similar idea to linearize the product
of a pair of SOC constraints. From the above, we see that SOC constraints can
be generated from both convex and nonconvex constraints (by adding an augmented
variable $z_i$), denoting by
$$\left\| C^s x x^T (C^t)^T + C^s x (\xi^t)^T + \xi^s x^T (C^t)^T + \xi^s (\xi^t)^T \right\|_F \le l_s l_t, \quad (14)$$
$$\left\| C^s X (C^t)^T + C^s x (\xi^t)^T + \xi^s x^T (C^t)^T + \xi^s (\xi^t)^T \right\|_F \le \beta_{s,t},$$
$$\min\ x^T B x + b^T x \quad \text{s.t. } \|x\| \le 1,\ \|A x + c\| \le 1,$$
which are derived from (and equivalent to) the GSOC constraints in (13) by the Schur
complement, where $h_j(x) = C^j x + \xi^j$, $j = s, t$. Linearizing the above Kronecker
product yields the KSOC valid inequalities.
3 Concluding Remark
In this survey paper, we have reviewed various valid inequalities to tighten the
SDP relaxations for nonconvex QCQP problems. In fact, we can further rewrite
the objective function as min τ and add a new constraint $x^T Q_0 x + c_0^T x \le \tau$
with a new variable τ. The original problem is then equivalent to minimizing
τ, and all the techniques developed in this paper can be applied to the new
constraint $x^T Q_0 x + c_0^T x \le \tau$ to achieve a tighter lower bound. A drawback
of the convex relaxations in this paper is their large number of valid inequalities,
which prevents them from being computed efficiently. A future direction is to
investigate how to find the valid inequalities that are violated most and add them
dynamically when solving the original problem.
References
1. Anstreicher, K.: Semidefinite programming versus the reformulation-linearization
technique for nonconvex quadratically constrained quadratic programming. J.
Global Optim. 43(2–3), 471–484 (2009)
2. Anstreicher, K.: Kronecker product constraints with an application to the two-
trust-region subproblem. SIAM J. Optim. 27(1), 368–378 (2017)
3. Anstreicher, K., Chen, X., Wolkowicz, H., Yuan, Y.X.: Strong duality for a trust-
region type relaxation of the quadratic assignment problem. Linear Algebr. Its
Appl. 301(1–3), 121–136 (1999)
4. Anstreicher, K., Wolkowicz, H.: On lagrangian relaxation of quadratic matrix con-
straints. SIAM J. Matrix Anal. Appl. 22(1), 41–55 (2000)
5. Beck, A., Eldar, Y.C.: Strong duality in nonconvex quadratic optimization with
two quadratic constraints. SIAM J. Optim. 17(3), 844–860 (2006)
6. Burer, S., Anstreicher, K.: Second-order-cone constraints for extended trust-region
subproblems. SIAM J. Optim. 23(1), 432–451 (2013)
7. Burer, S., Saxena, A.: The MILP road to MIQCP. In: Mixed Integer Nonlinear
Programming, pp. 373–405. Springer (2012)
8. Burer, S., Vandenbussche, D.: A finite branch-and-bound algorithm for nonconvex
quadratic programming via semidefinite relaxations. Math. Program. 113(2), 259–
282 (2008)
9. Burer, S., Yang, B.: The trust region subproblem with non-intersecting linear con-
straints. Math. Program. 149(1–2), 253–264 (2013)
10. Celis, M., Dennis, J., Tapia, R.: A trust region strategy for nonlinear equality
constrained optimization. Numer. Optim. 1984, 71–82 (1985)
11. Goemans, M.X., Williamson, D.P.: Improved approximation algorithms for max-
imum cut and satisfiability problems using semidefinite programming. J. ACM
(JACM) 42(6), 1115–1145 (1995)
12. Jiang, R., Li, D.: Convex relaxations with second order cone constraints for non-
convex quadratically constrained quadratic programming (2016)
13. Linderoth, J.: A simplicial branch-and-bound algorithm for solving quadratically
constrained quadratic programs. Math. Program. 103(2), 251–282 (2005)
14. Pardalos, P.M., Vavasis, S.A.: Quadratic programming with one negative eigen-
value is NP-hard. J. Global Optim. 1(1), 15–22 (1991)
15. Sherali, H.D., Adams, W.P.: A reformulation-linearization technique for solving
discrete and continuous nonconvex problems, vol. 31. Springer Science & Business
Media (2013)
16. Shor, N.Z.: Quadratic optimization problems. Sov. J. Comput. Syst. Sci. 25(6),
1–11 (1987)
17. Sturm, J.F., Zhang, S.: On cones of nonnegative quadratic functions. Math. Oper.
Res 28(2), 246–267 (2003)
18. Vavasis, S.A.: Quadratic programming is in NP. Inf. Process. Lett. 36(2), 73–77
(1990)
19. Ye, Y., Zhang, S.: New results on quadratic minimization. SIAM J. Optim. 14(1),
245–267 (2003)
Solving a Type of the Tikhonov
Regularization of the Total Least Squares
by a New S-Lemma
1 Introduction
The S-lemma can be extended to deal with the equality g(x) = 0 along a
series approaches, for example, please see [2,5,6,14]. They try to answer, for
what
pairs of (f (x), g(x)), the following two statements can become equivalent
(E1 ) ∼ (E2 ) :
The complete necessary and sufficient conditions on the pair of quadratic
functions (f (x), g(x)) under which (E1 ) ∼ (E2 ) were established by Xia et al.
[13], with new applications to both quadratic optimization and the convexity of
the joint numerical range. As a further extension, Wang and Xia [12] established
the so-called S-lemma with interval bounds:
$$f(x) - z_1 = 0,\ g(x) - z_2 = 0,\ z_1 a + z_2 b \le c \Longrightarrow z^T \Theta z + \theta^T z - \gamma \ge 0;$$
$$\exists \zeta, \eta \in \mathbb{R} : \zeta P + \eta Q \succeq 0, \quad (1)$$
$$\Theta = \begin{pmatrix} \theta_1 & \theta_2 \\ \theta_2 & \theta_3 \end{pmatrix} \succeq 0, \quad (2)$$
under the condition (2). Then, we use the result from minimizing (PoD4) to solve a
type of the Tikhonov regularization of the total least squares (TRTLS) proposed
by Beck and Ben-Tal in [1]. The purpose of solving (TRTLS) is to stabilize,
via the Tikhonov regularization, the total least squares solution for fitting an
overdetermined linear system Ax = b. It was formulated in [1] as follows. Given
$$\begin{aligned}
\min_{E, r, x}\ & \{\|E\|^2 + \|r\|^2 + \rho \|L x\|^2 : (A + E) x = b + r\} \\
&= \min_x \min_{E, r} \{\|E\|^2 + \|r\|^2 + \rho \|L x\|^2 : (A + E) x = b + r\} \\
&= \min_{x \in \mathbb{R}^n} \frac{\|A x - b\|^2}{\|x\|^2 + 1} + \rho \|L x\|^2. \quad (3)
\end{aligned}$$
For L = I, Beck and Ben-Tal [1] used the Dinkelbach method [4] combined with a
bisection search to solve (3). We show, in Sect. 3, that (3) can be resolved by
solving two SDPs, one to obtain the optimal value and the other for the optimal
solution; no bisection method is needed.
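For comparison, Dinkelbach's scheme replaces the ratio minimization by a sequence of parametric problems $\min_x N(x) - t\,D(x)$. The sketch below is our own illustration: it runs the scheme over a finite candidate grid with made-up functions N and D, not the TRTLS instance itself.

```python
def dinkelbach(N, D, xs, tol=1e-12, max_iter=100):
    """Minimize the ratio N(x)/D(x) over the finite candidate set xs (D > 0 on xs).
    Each iteration solves the parametric problem min_x N(x) - t*D(x) exactly;
    t is updated to the ratio of the parametric minimizer until F(t) ~ 0."""
    t = N(xs[0]) / D(xs[0])                 # any feasible starting ratio
    for _ in range(max_iter):
        x_star = min(xs, key=lambda x: N(x) - t * D(x))
        if N(x_star) - t * D(x_star) >= -tol:   # F(t) = 0 => t is the optimal ratio
            break
        t = N(x_star) / D(x_star)
    return x_star, t

xs = [i / 1000.0 for i in range(4001)]      # grid on [0, 4]
N = lambda x: x * x + 1.0
D = lambda x: x + 2.0
x_opt, t_opt = dinkelbach(N, D, xs)
```

On a finite set the update sequence of t is strictly decreasing and terminates at the minimal ratio, which is what makes the method attractive compared to a plain bisection on t.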
The remainder of this study is organized as follows: In Sect. 2, we provide the
proof for Theorem 1 and solve Problem (PoD4). In Sect. 3, we use the Dinkelbach
method incorporating two SDP’s to solve (TRTLS) for the case L = I. Finally,
we have a short discussion in Sect. 4 for future extensions.
is convex. Let
$$D_2 = \{(z_1, z_2) \mid z_1 a + z_2 b \le c\}. \quad (5)$$
It is easy to see that $D_2 \subset \mathbb{R}^2$ is also convex. Then, the statement (G1 ) can
be recast as
$$(z_1, z_2) \in D_1 \cap D_2 \Rightarrow F(z) - \gamma = z^T \Theta z + \theta^T z - \gamma \ge 0.$$
where the last equation (14) is due to Theorem 1 by redefining the notation as
$\theta_4 + 2\gamma\theta_2 := \theta_4$, $\gamma\theta_3 - t := \theta_5$, $\theta - t\gamma := -\gamma$.
Moreover, we can write (14) as the following SDP:
$$t^* = \max\left\{ t \;:\; \exists\, \alpha, \beta \text{ such that }
\begin{pmatrix}
\theta_1 & \theta_2 & [0] & \frac{\theta_4 + 2\gamma\theta_2 - \alpha}{2} \\
\theta_2 & \theta_3 & [0] & \frac{\gamma\theta_3 - t - \beta}{2} \\
{[0]} & [0] & \alpha P + \beta Q & \alpha p + \beta q \\
\frac{\theta_4 + 2\gamma\theta_2 - \alpha}{2} & \frac{\gamma\theta_3 - t - \beta}{2} & \alpha p^T + \beta q^T & \xi
\end{pmatrix} \succeq 0 \right\}, \quad (15)$$
where $\xi = \alpha p_0 + \beta q_0 + \theta - t\gamma$. In other words, the optimal value $t^*$ of (12),
and thus the optimal value of the problem (TRTLSI), can be computed through
solving the SDP (15).
After obtaining the optimal value $t^*$ of (12) from (15), by (13), we can find the
corresponding optimal solution $x^*$ by solving the following problem,
where $h(x) - t^* l(x) = \theta_1 f(x)^2 + 2\theta_2 f(x) g(x) + \theta_3 g(x)^2 + (\theta_4 + 2\gamma\theta_2) f(x) + (\gamma\theta_3 - t^*) g(x) + \theta - t^*\gamma$.
Since (16) is a special form of (PoD4), we are able
to get $x^*$ by solving another SDP similar to (11).
4 Discussion
In this paper, we propose a set of sufficient conditions (1)–(2) under which
(G1 ) ∼ (G2 ). It can be easily verified that, when m = 1, a = 1, b = c = θ1 =
. . . = θ4 = γ = 0, θ5 = 1, (G1 ) ∼ (G2 ) reduces to (S1 ) ∼ (S2 ), and we recover the
classical S-lemma. Similarly, (G1 ) ∼ (G2 ) covers (I1 ) ∼ (I2 ) with m = 2, a =
(1, −1)T , b = (0, 0)T , c = (v0 , −u0 )T , θ1 = θ2 = θ3 = θ4 = γ = 0 and θ5 = 1.
Moreover, if we further have u0 = v0 = 0, (G1 ) ∼ (G2 ) becomes (E1 ) ∼ (E2 ).
In other words, if the sufficient conditions (1)–(2) can be removed, (G1 ) ∼ (G2 )
would be the most general result, summarizing all previous results on the S-lemma
so far.
References
1. Beck, A., Ben-Tal, A.: On the solution of the Tikhonov regularization of the total
least squares problem. SIAM J. Optim. 17(1), 98–118 (2006)
2. Beck, A., Eldar, Y.C.: Strong duality in nonconvex quadratic optimization with
two quadratic constraints. SIAM J. Optim. 17(3), 844–860 (2006)
3. Derinkuyu, K., Pınar, M.Ç.: On the S-procedure and some variants. Math. Methods
Oper. Res. 64(1), 55–77 (2006)
4. Dinkelbach, W.: On nonlinear fractional programming. Manag. Sci. 13, 492–498
(1967)
5. Nguyen, V.B., Sheu, R.L., Xia, Y.: An SDP approach for quadratic fractional
problems with a two-sided quadratic constraint. Optim. Methods Softw. 31(4),
701–719 (2016)
6. Polik, I., Terlaky, T.: A survey of the S-lemma. SIAM Rev. 49(3), 371–418 (2007)
7. Pong, T.K., Wolkowicz, H.: The generalized trust region subproblem. Comput.
Optim. Appl. 58, 273–322 (2014)
8. Polyak, B.T.: Convexity of quadratic transformations and its use in control and
optimization. J. Optim. Theory Appl. 99(3), 553–583 (1998)
9. Rockafellar, R.T.: Convex Analysis. Princeton University Press (1970)
10. Stoer, J., Witzgall, C.: Convexity and Optimization in Finite Dimensions, vol. I.
Springer-Verlag, Heidelberg (1970)
11. Stern, R., Wolkowicz, H.: Indefinite trust region subproblems and nonsymmetric
eigenvalue perturbations. SIAM J. Optim. 5(2), 286–313 (1995)
12. Wang, S., Xia, Y.: Strong duality for generalized trust region subproblem: S-lemma
with interval bounds. Optim. Lett. 9(6), 1063–1073 (2015)
13. Xia, Y., Wang, S., Sheu, R.L.: S-lemma with equality and its applications. Math.
Program. Ser. A. 156(1), 513–547 (2016)
14. Tuy, H., Tuan, H.D.: Generalized S-lemma and strong duality in nonconvex
quadratic programming. J. Global Optim. 56, 1045–1072 (2013)
15. Yakubovich, V.A.: S-procedure in nonlinear control theory. Vestn. Leningr. Univ.
1, 62–77 (1971). (in Russian)
16. Yang, M., Yong, X., Wang, J., Peng, J.: Efficiently solving total least squares with
Tikhonov identical regularization. Comput. Optim. Appl. 70(2), 571–592 (2018)
Solving Mathematical Programs with
Complementarity Constraints with a
Penalization Approach
1 Introduction
The Mathematical Program with Equilibrium Constraints (MPEC) is a con-
strained optimization problem in which the constraints include equilibrium con-
straints, such as variational inequalities or complementarity conditions. In this
paper, we consider a special case of MPEC, the Mathematical Program with
Complementarity Constraints (MPCC) in which the equilibrium constraints are
complementarity constraints. MPCC is an important class of problems since they
arise frequently in applications in engineering design, in economic equilibrium
and in multilevel games [18]. One main source of MPCC comes from bilevel
programming problems, which have numerous applications in practice [27].
A standard nonlinear programming problem can be solved via its
Karush-Kuhn-Tucker (KKT) system using numerical methods such as
Newton-type methods. However, the classical Mangasarian-Fromovitz constraint
qualification (MFCQ), which is very often used to guarantee convergence of
algorithms, is violated at every feasible point when the MPCC is treated as a
standard nonlinear programming problem; hence a local minimizer of MPCC may
not be a solution of the classical KKT system. This is partly due to the geometry
of the complementarity constraint, which always has an empty relative interior.
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 228–237, 2020.
https://doi.org/10.1007/978-3-030-21803-4_24
A wide range of numerical methods has been proposed to solve this problem,
such as relaxation methods [5,6], interior-point methods [16,20,25], penalty
methods [10,18,24], SQP methods [8], DC methods [23], filter methods [15] and
Levenberg-Marquardt methods [14].
In this study, following [3], we study a penalization method to solve the
MPCC. We regularize the complementarity constraints by using concave and
nondecreasing functions introduced in [9], and then penalize the constraints.
This approach allows us to consider the regularization parameter as a variable of
the problem. We prove that every cluster point of the KKT points of the penalty
problem gives a local minimum for the MPCC. We improve the result from [3]
by proving a convergence theorem without any constraint qualification, thus
removing a restrictive assumption. Numerical tests on some randomly generated
problems are studied to show the efficiency and robustness of this approach.
This paper is organized as follows. In Sect. 2, we present some preliminaries
on the smoothing functions and our problem formulation. In Sect. 3, we present
our penalty method and give the link between the penalized problem and the
original problem. The last section presents a set of numerical experiments
on a simple number partitioning problem.
2 Preliminaries
In this section, we present some preliminaries concerning the regularization and
approximation process. We consider the following problem:
$$
(P)\qquad \begin{cases} f^* = \min\ f(x, y) \\ \langle x, y \rangle = 0 \\ (x, y) \in D \end{cases}
$$
By definition of f,
$$
\lim_{t \to +\infty} \theta(t) = \int_0^{+\infty} f(x)\,dx = 1 \qquad\text{and}\qquad \theta(0) = \int_0^{0} f(x)\,dx = 0.
$$
Some interesting examples of this family for $t \ge 0$ are: $\theta^1_\varepsilon(t) = \dfrac{t}{t+\varepsilon}$, $\theta^2_\varepsilon(t) = 1 - e^{-t/\varepsilon}$, and $\theta^3_\varepsilon(t) = \dfrac{\log(1+t)}{\log(1+t+\varepsilon)}$.
Lemma 1. For all $x \in [0, v]$ and all $\varepsilon \in (0, \varepsilon_0]$, there exists $m > 0$ such that $|\partial_\varepsilon \theta_\varepsilon(x)| \le \dfrac{m}{\varepsilon^2}$.
Proof. Since $\theta_\varepsilon(x) := \theta(\tfrac{x}{\varepsilon})$, we have $\partial_\varepsilon \theta_\varepsilon(x) = -\tfrac{x}{\varepsilon^2}\,\theta'(\tfrac{x}{\varepsilon})$. Now, by the concavity of $\theta$, for $x \ge 0$ we have $0 \le \theta'(\tfrac{x}{\varepsilon}) \le \theta'(0)$. Then $-\tfrac{m}{\varepsilon^2} \le -\tfrac{x}{\varepsilon^2}\theta'(\tfrac{x}{\varepsilon}) \le 0$ with $m = x\,\theta'(0)$.
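The bound of Lemma 1 can be sanity-checked by finite differences for the member $\theta^1_\varepsilon$; a sketch under the assumption that, for $\theta(s) = s/(1+s)$, the constant is $m = x\,\theta'(0) = x$:

```python
# Finite-difference check of Lemma 1 for theta(s) = s/(1+s), i.e.
# theta_eps(x) = x/(x + eps); here theta'(0) = 1, so m = x * theta'(0) = x.
def theta_eps(x, eps):
    return x / (x + eps)

h = 1e-8
for x in (0.1, 1.0, 5.0):
    for eps in (0.05, 0.1, 0.5):
        # central difference approximating d(theta_eps)/d(eps)
        d = (theta_eps(x, eps + h) - theta_eps(x, eps - h)) / (2.0 * h)
        assert abs(d) <= x / eps**2 + 1e-6   # |d theta_eps/d eps| <= m/eps^2
print("Lemma 1 bound holds on the grid")
```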
where $D = [0, v]^{2n} \times [0, 1]^n$. The limit problem $(P_\varepsilon)$ for $\varepsilon = 0$ is denoted $(P)$.
3 A Penalization Approach
$$
\Delta(z, \varepsilon) := \|G_\varepsilon(z)\|^2.
$$
The term $\sigma\beta(\varepsilon)$ allows us to consider $\varepsilon$ as a new optimization variable and to minimize simultaneously over $z$ and $\varepsilon$.
Let us now recall the definition of the Mangasarian-Fromovitz constraint qualification (MFCQ).
Definition 1 ([21]). We say that the Mangasarian-Fromovitz condition (MFCQ) for $(P_\varepsilon)$ holds at $z \in D$ if $\nabla G_\varepsilon(z)$ has full rank and there exists a vector $p \in \mathbb{R}^{3n}$ such that $\nabla G_\varepsilon(z)p = 0$, where
$$
p_i \begin{cases} > 0 & \text{if } z_i = 0 \\ < 0 & \text{if } z_i = w_i \end{cases} \qquad (1)
$$
with
$$
w_i = \begin{cases} v & \text{if } i \in \{1, \dots, 2n\} \\ 1 & \text{if } i \in \{2n+1, \dots, 3n\} \end{cases}
$$
The following lemma proves that MFCQ is satisfied whenever ε > 0. This is a
great improvement with respect to [3] where this was a crucial assumption.
Lemma 2. Let $\varepsilon > 0$. Any $z$ feasible for $(P_\varepsilon)$ verifies MFCQ.
Proof. (i) Let $z$ be a feasible point of $(P_\varepsilon)$. The matrix $\nabla G_\varepsilon(z) \in \mathbb{R}^{n \times 3n}$ defined as
$$
\nabla G_\varepsilon(z) = \begin{pmatrix}
\theta'_\varepsilon(x_1) & \cdots & 0 & \theta'_\varepsilon(y_1) & \cdots & 0 & 1 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & \theta'_\varepsilon(x_n) & 0 & \cdots & \theta'_\varepsilon(y_n) & 0 & \cdots & 1
\end{pmatrix}
$$
is of full rank.
(ii) We have to prove that there exists $p \in \mathbb{R}^{3n}$ such that $\nabla G_\varepsilon(z)p = 0$ and $p_i$ verifies (1).
Let $p = (p_1, \dots, p_{3n})$. $\nabla G_\varepsilon(z)p = 0$ implies that
$$
\theta'_\varepsilon(x_i)p_i + \theta'_\varepsilon(y_i)p_{n+i} + p_{2n+i} = 0, \quad \text{for } i = 1, \dots, n. \qquad (2)
$$
We distinguish three cases:
1. $e_j = 1$,
2. $e_j = 0$,
3. $0 < e_j < 1$.
(1) For the first case, $e_j = 1$, so $x_j = y_j = 0$, since $\theta_\varepsilon(0) = 0$.
Replacing in (2), we obtain
$$
\theta'_\varepsilon(0)(p_j + p_{n+j}) = -p_{2n+j}.
$$
We can take $p_j = p_{n+j} = 1 > 0$ and $p_{2n+j} = -2\theta'_\varepsilon(0) < 0$ (since $\theta'_\varepsilon(0) > 0$). So MFCQ is verified in this case.
(2) For the second case, $e_j = 0$, we have to consider $x_j \ne 0, y_j \ne 0$; $x_j \ne 0, y_j = 0$; or $x_j = 0, y_j \ne 0$, since $\theta'_\varepsilon(x_j) \ne 0$ when $x_j \ne 0$. In Table 1, "no" means that we do not have any constraint on $p_j$ or $p_{n+j}$.
(i) Taking $p_j = -1$, $p_{n+j} = -1$ in (2), we obtain $p_{2n+j} = \theta'_\varepsilon(v) + \theta'_\varepsilon(v) > 0$ (since $\theta'_\varepsilon(v) > 0$), and MFCQ is verified.
(2i) There is no constraint on $p_j$, $p_{n+j}$; only $p_{2n+j}$ should be positive. So MFCQ is verified.
(3i) Taking $p_j = -1$, $p_{2n+j} = 1$ in (2), we have $p_{n+j} = \dfrac{\theta'_\varepsilon(v) - 1}{\theta'_\varepsilon(y_j)}$. So MFCQ is verified.
(4i) Taking $p_j = 1$, $p_{2n+j} = 1$ in (2), we get $p_{n+j} = -\dfrac{\theta'_\varepsilon(0) + 1}{\theta'_\varepsilon(v)} < 0$. So MFCQ is verified.
(5i, 6i, 7i, 8i) As above, it is easy to see that MFCQ is verified.
(3) In the third case, $0 < e_j < 1$, we can consider the same subcases as in (2), but here there is additionally no constraint on $p_{2n+j}$.
Table 1. Case $e_j = 0$

        x_j           y_j           p_j   p_{n+j}   p_{2n+j}
 (i)    v             v             <0    <0        >0
 (2i)   0 < x_j < v   0 < y_j < v   no    no        >0
 (3i)   v             0 < y_j < v   <0    no        >0
 (4i)   0             v             >0    <0        >0
 (5i)   0 < x_j < v   v             no    <0        >0
 (6i)   v             0             <0    >0        >0
 (7i)   0 < x_j < v   0             no    >0        >0
 (8i)   0             0 < y_j < v   >0    no        >0
where $\nabla f_\sigma$ is the gradient of $f_\sigma$ with respect to $(z, \varepsilon)$. Let $(z_k, \varepsilon_k)$ be a sequence of KKT points of $(P_{\sigma_k})$ with $\varepsilon_k \ne 0$ for all $k$ and $\lim_{k \to +\infty} \sigma_k = +\infty$.
Since $D$ is compact, it holds (up to a subsequence) that
$$
\varepsilon_k^2\, \partial_\varepsilon \Delta_k + 2\varepsilon_k^3 \sigma_k \beta_1 \le \varepsilon_k \Delta_k.
$$
Let $V$ be a neighborhood of $(z^*, 0)$. For any $z$ feasible for $(P)$ such that $(z, 0) \in V$, we have
$$
f_\sigma(z^*, 0) \le f_\sigma(z, 0) = f(x, y) < +\infty \qquad (4)
$$
since $\Delta(z, 0) = 0$.
Since $f_\sigma(z^*, 0)$ is finite, it follows that $\Delta(z^*, 0) = 0$. So $\langle x^*, y^* \rangle = 0$, and therefore $(x^*, y^*)$ is a feasible point of $(P)$.
Thus, (4) gives $f(x^*, y^*) = f_\sigma(z^*, 0) \le f_\sigma(z, 0) = f(x, y)$. Therefore, $(x^*, y^*)$ is a local minimum of the MPCC problem.
4 Numerical Results
Thanks to Theorem 1, and by driving $\sigma$ to infinity in a controlled way, we also improved the numerical results with respect to [3]. We consider some generated partitioning problems that can be cast as an MPCC. These simulations were done using the AMPL language [4] with the SNOPT solver [13]. In all our tests, we use the same function $\beta$, defined by $\beta(\varepsilon) := \sqrt{\varepsilon}$ [11].
Partitioning problem. We now describe the formulation of the partitioning problem. We consider a set of numbers $S = \{s_1, s_2, s_3, \dots, s_n\}$. The goal is to divide $S$ into two subsets such that the subset sums are as close to each other as possible. Let $x_j = 1$ if $s_j$ is assigned to subset 1, and $x_j = 0$ otherwise. Let $\mathrm{sum}_1 = \sum_{j=1}^n s_j x_j$ and $\mathrm{sum}_2 = \sum_{j=1}^n s_j - \sum_{j=1}^n s_j x_j$. The difference of the sums is
$$
\mathrm{diff} = \mathrm{sum}_2 - \mathrm{sum}_1 = c - 2\sum_{j=1}^n s_j x_j, \qquad \Big(c = \sum_{j=1}^n s_j\Big),
$$
where
$$
q_{ii} = s_i(s_i - c), \qquad q_{ij} = s_i s_j.
$$
Dropping the additive and multiplicative constants, we obtain the following optimization problem:
$$
(UQP)\qquad \begin{cases} \min\ x^T Q x \\ x \in \{0, 1\}^n \end{cases}
$$
This formulation can be written as an MPCC problem:
$$
(UQP)\qquad \begin{cases} \min\ x^T Q x \\ x.(1 - x) = 0 \end{cases}
$$
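The reduction of the partitioning problem to the quadratic form can be verified exhaustively on a small instance; a sketch (the helper names `build_Q` and `quad` are ours), using the identity $\mathrm{diff}^2 = 4\,x^T Q x + c^2$ for binary $x$:

```python
import itertools

# Build Q for the number-partitioning UQP: q_ii = s_i(s_i - c), q_ij = s_i s_j.
# For binary x, diff^2 = 4 * x^T Q x + c^2, so minimizing x^T Q x minimizes |diff|.
def build_Q(s):
    c = sum(s)
    n = len(s)
    return [[s[i] * (s[i] - c) if i == j else s[i] * s[j]
             for j in range(n)] for i in range(n)]

def quad(Q, x):
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

s = [4, 5, 6, 7, 8]
c = sum(s)
Q = build_Q(s)
# exhaustive check of the identity diff^2 = 4 x^T Q x + c^2
for x in itertools.product((0, 1), repeat=len(s)):
    diff = c - 2 * sum(si * xi for si, xi in zip(s, x))
    assert diff * diff == 4 * quad(Q, x) + c * c
# the best partition of {4,5,6,7,8} (total 30) splits 15/15, so diff = 0
best = min(itertools.product((0, 1), repeat=len(s)), key=lambda x: quad(Q, x))
assert c - 2 * sum(si * xi for si, xi in zip(s, best)) == 0
print("identity and optimal partition verified")
```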
To get some local solutions of $(UQP)$, we use the approach described in the previous section. We generated various randomly generated problems of size 100 and report the following measures:
– Best Sum Diff: the best value of $\left|\sum_{i=1}^{100} (Q * \mathrm{round}(x[i])) - 0.5 * c\right|$,
– Integrality measure: $\max_i |\mathrm{round}(x_i) - x_i|$,
– nb: the number of tests such that the best sum is satisfied,
– nb10: the number of tests such that the sum $\sum_{i=1}^{100} (Q * \mathrm{round}(x[i])) - 0.5 * c \le 10$.
5 Conclusion
In this paper, we presented a penalty approach to solve mathematical programs
with complementarity constraints. Under some mild hypotheses and without any
constraint qualification assumption, we proved the link between the penalized
problem and the MPCC. Our approach, tested on some randomly generated
partitioning problems, gives very promising results that validate it.
References
1. Abdallah, L., Haddou, M., Migot, T.: A sub-additive dc approach to the comple-
mentarity problem. Comput. Optim. Appl. 1–26 (2019)
2. Abdallah, L., Haddou, M., Migot, T.: Solving absolute value equation using comple-
mentarity and smoothing functions. J. Comput. Appl. Math. 327, 196–207 (2018)
3. Abdallah, L., Haddou, M.: An exact penalty approach for mathematical programs
with equilibrium constraints. J. Adv. Math. 9, 2946–2955 (2014)
4. AMPL. http://www.ampl.com
5. Dussault, J.P., Haddou, M., Kadrani, A., Migot, T.: How to Compute a M-
Stationary Point of the MPCC. Optimization online.org (2017)
6. Dussault, J.P., Haddou, M., Migot, T.: The New Butterfly Relaxation Method
for Mathematical Programs with Complementarity Constraints. Optimization
online.org (2016)
7. Facchinei, F., Pang, J.-S.: Finite-Dimensional Variational Inequalities and Com-
plementarity Problems, vol. I and II. Springer, New York (2003)
8. Fletcher, R., Leyffer, S., Ralph, D., Scholtes, S.: Local convergence of SQP methods
for mathematical programs with equilibrium constraints. SIAM J. Optim. 17(1),
259–286 (2006)
9. Haddou, M.: A new class of smoothing methods for mathematical programs with
equilibrium constraints. Pac. J. Optim. 5, 87–95 (2009)
10. Hu, X.M., Ralph, D.: Convergence of a penalty method for mathematical program-
ming with complementarity constraints. J. Optim. Theory Appl. 123, 365–390
(2004)
11. Huyer, W., Neumaier, A.: A new exact penalty function. SIAM J. Optim. 13(4),
1141–1158 (2003)
12. Facchinei, F., Jiang, H., Qi, L.: A smoothing method for mathematical programs
with equilibrium constraints. Math. Program. 85, 81–106 (1995)
13. Gill, P., Murray, W., Saunders, M.: SNOPT: a large-scale solver for smooth optimization
problems having linear or nonlinear objectives and constraints. http://www-neos.
mcs.anl.gov/neos/solvers
14. Guo, L., Lin, G.H., Ye, J.J.: Solving mathematical programs with equilibrium
constraints. J. Optim. Theory Appl. 166, 234–256 (2015)
15. Leyffer, S., Munson, T.S.: A globally convergent filter method for MPECs, April
2009. ANL/MCS-P1457-0907
16. Leyffer, S., López-Calva, G., Nocedal, J.: Interior methods for mathematical pro-
grams with complementarity constraints. SIAM J. Optim. 17(1), 52–77 (2006)
17. Lin, G.H., Fukushima, M.: Some exact penalty results for nonlinear programs and
mathematical programs with equilibrium constraints. J. Optim. Theory Appl. 118,
67–80 (2003)
18. Luo, Z.Q., Pang, J.S., Ralph, D.: Mathematical Programs with Equilibrium Con-
straints. Cambridge University Press, Cambridge, UK (1996)
19. Liu, G., Ye, J., Zhu, J.: Partial exact penalty for mathematical programs with
equilibrium constraints. J. Set-Valued Anal. 16, 785–804 (2008)
20. Liu, X., Sun, J.: Generalized stationary points and an interior-point method for
mathematical programs with equilibrium constraints. Math. Program. 101(1),
231–261 (2004)
21. Mangasarian, O.L., Fromovitz, S.: The Fritz John necessary optimality conditions
in the presence of equality and inequality constraints. J. Math. Anal. Appl. 17,
37–47 (1967)
22. Mangasarian, O.L., Pang, J.S.: Exact penalty functions for mathematical programs
with linear complementarity constraints. J. Glob. Optim. 5 (1994)
23. Marechal, M., Correa, R.: A DC (Difference of Convex functions) Approach of the
MPECs. Optimization online.org (2014)
24. Monteiro, M.T.T., Meira, J.F.P.: A penalty method and a regularization strategy
to solve MPCC. Int. J. Comput. Math. 88(1), 145–149 (2011)
25. Raghunathan, A.U., Biegler, L.T.: An interior point method for mathematical
programs with complementarity constraints (MPCCs). SIAM J. Optim. 15(3),
720–750 (2005)
26. Ralph, D., Wright, S.J.: Some Properties of Regularization and Penalization
Schemes for MPECs. Springer, New York (2000)
27. Ye, J.J., Zhu, D.L., Zhu, Q.J.: Exact penalization and necessary optimality condi-
tions for generalized bilevel programming problems. SIAM J. Optim. 7, 481–507
(1997)
Stochastic Tunneling for Improving the
Efficiency of Stochastic Efficient Global
Optimization
1 Introduction
The optimization of a variety of engineering problems may require the minimization
(or maximization) of expensive-to-evaluate and high-dimensional integrals.
These problems become more challenging if the resulting objective function turns
out to be nonconvex and multimodal. Examples of this kind may arise, for
example, from the maximization of the expected performance of a mechanical
system, vastly applied in robust design [10], the multidimensional integral of
Performance Based Design Optimization [2], or the double integral of Optimal
Design of Experiment problems [3].
A powerful approach to handle these issues is the Efficient Global Optimiza-
tion (EGO) [9], which exploits the information provided by the Kriging meta-
model to iteratively add new points, improving the surrogate accuracy and at
c Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 238–246, 2020.
https://doi.org/10.1007/978-3-030-21803-4_25
the same time seeking its global minimum. For problems presenting variability
(or uncertainty), the Stochastic Kriging (SK) [1] was developed. The use of SK
within the EGO framework, or stochastic Efficient Global Optimization (sEGO),
is relatively recent. For example, [11] benchmarked different infill criteria for the
noisy case, while [8] compared Kriging-based methods in heterogeneous noise
situations.
Recently, an Adaptive Variance Target sEGO [4] approach was proposed for
the minimization of integrals. It employs Monte Carlo Integration (MCI) to
approximate the objective function and includes the variance of the integration
error into the SK framework. This variance of the error is adaptively
managed by the method, providing an efficient optimization process that rationally
spends the available computational budget. This method reached promising
results, especially in high-dimensional problems [4].
This paper thus aims at enhancing the performance of the Adaptive Variance
Target sEGO [4] by proposing the use of a normalization scheme during the
optimization process. This normalization is the result of the so-called stochastic
tunneling approach, applied together with simulated annealing (SA) for the
global minimization of complex potential energy landscapes [12]. In the SA
context, the physical idea behind the stochastic tunneling method is to allow
the particle to "tunnel" through high-energy regions of the domain, once it is
realized that they are not relevant for the low-energy properties of the problem.
In the sEGO context, this normalization is expected to reduce the variability
level in the regions of the design domain that have high values of the objective
function, as well as to reduce the dependency of the quality of the search on
the parameters of the SK.
The rest of the paper is organized as follows: Sect. 2 presents the problem
statement. The Adaptive Variance Target sEGO is presented in Sect. 3, together
with the proposed normalization scheme. Numerical examples are studied in
Sect. 6 to show the efficiency and robustness of the normalization. Finally, the
main conclusions are listed in Sect. 7.
2 Problem Statement
while the resulting objective function y is not convex and multimodal. Applying
MCI to estimate y, we have
$$
y(d) \approx \bar{y}(d) = \frac{1}{n_r} \sum_{i=1}^{n_r} \phi(d, x^{(i)}), \qquad (2)
$$
where nr is the sample size and x(i) are sample points randomly drawn from
distribution w(x). One of the advantages of MCI is that we are able to estimate
the variance of the error of the approximation as:
$$
\sigma^2(d) = \frac{1}{n_r(n_r - 1)} \sum_{i=1}^{n_r} \left(\phi_i - \bar{y}(d)\right)^2, \qquad (3)
$$
where $\phi_i = \phi(d, x^{(i)})$. Thus, by increasing the sample size $n_r$ (i.e., the number of replications), the variance estimate decreases and the approximation in Eq. (2) gets closer to the exact value of Eq. (1).
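Equations (2) and (3) can be sketched as follows; `phi` and the sampling distribution below are illustrative stand-ins, not the paper's test problems:

```python
import random, math

# Monte Carlo integration of y(d) = E_w[phi(d, X)] together with the
# variance-of-the-mean estimate: sigma^2(d) = 1/(nr(nr-1)) * sum (phi_i - ybar)^2.
def mci(phi, d, sample, nr):
    phis = [phi(d, sample()) for _ in range(nr)]
    ybar = sum(phis) / nr
    var = sum((p - ybar) ** 2 for p in phis) / (nr * (nr - 1))
    return ybar, var

random.seed(0)
phi = lambda d, x: math.sin(d) + x          # E[phi] = sin(d) since E[X] = 0
sample = lambda: random.gauss(0.0, 1.0)     # stand-in for w(x)

ybar_small, var_small = mci(phi, 1.0, sample, 100)
ybar_big, var_big = mci(phi, 1.0, sample, 10000)
# increasing nr shrinks the estimated variance of the error (roughly by 1/nr)
assert var_big < var_small
assert abs(ybar_big - math.sin(1.0)) < 0.1
print(f"ybar = {ybar_big:.3f}, estimated variance = {var_big:.2e}")
```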
For the construction of the initial sampling plan, we employ here the Latin
hypercube scheme detailed in [5]. Then, Steps 2 and 3 are repeated until a
stopping criterion is met, e.g., a maximum number of function evaluations. The
manner in which the infill points are added in each iteration is what
distinguishes the different sEGO approaches. In this paper, we employ the AEI
criterion as the infill point strategy, since it has provided promising results
[11]. Step 2 constructs a prediction model, which is given by the SK in this
paper; its formulation is given in the next subsection.
The work of [1] proposed a SK accounting for the sampling variability that is
inherent to a stochastic simulation. To accomplish this, they characterized both
the intrinsic error inherent in a stochastic simulation and the extrinsic error that
comes from the metamodel approximation. Then, the SK prediction can be seen as
$$
\mathcal{Y}(d) = M(d) + Z(d) + \varepsilon(d),
$$
where $M(d)$ is the usual average trend and $Z(d)$ accounts for the model uncertainty
and is now referred to as the extrinsic noise. The additional term $\varepsilon(d)$, called the intrinsic
noise, accounts for the simulation uncertainty or variability. In this paper, the
variability is due to the error in the approximation of the integral from Eq. (1)
caused by MCI. It is worth recalling here that MCI provides an estimate of
the variance of this error. That is, we are able to estimate the intrinsic noise
and, consequently, introduce this information into the metamodel framework. To
accomplish this, we construct the covariance matrix of the intrinsic noise among
the current sampling plan points. Since the intrinsic error is assumed to be i.i.d.
and normal, the covariance matrix is a diagonal matrix with components
which is the usual Kriging prediction with the added diagonal correlation matrix
from the intrinsic noise. Similarly, the predicted error takes the form:
$$
s_n^2(d) = \hat{\sigma}^2\left(1 + \lambda(d) - r^T(\Psi + \Sigma)^{-1} r\right),
$$
The adaptive target selection is briefly introduced in this section. For a more
detailed description, the reader is referred to [4]. With the framework presented
so far, we are able to incorporate error estimates from MCI within the sEGO
scheme. It is important to notice that the number of samples of the MCI is an
input parameter, i.e., the designer has to set $n_r$ in Eq. (3). Consequently, the
designer is able to control the magnitude of $\Sigma$ and $\lambda$ by changing the sample size
$n_r$. However, in practice a target variance ($\sigma_0^2$) is first chosen and the sample size
is iteratively increased until the evaluated variance is close to the target value.
Thus, for a constant target variance, the regression parameter is then enforced
by the MCI procedure to be
$$
\lambda(d) = \sigma_0^2. \qquad (8)
$$
The choice of the target variance must consider two facts: (a) if the target
variance is too high, the associated error may lead to a poor and deceiving
approximation of the integral, and, (b) if the target tends to zero, so does the
error and we retrieve the deterministic case, however, at the expense of a huge
computational effort.
The advantage here is that the Adaptive Variance Target selection automat-
ically defines the variance target for a new infill point in the sEGO search. That
is, the adaptive approach starts exploring the design domain by evaluating the
objective function value of each design point using MCI with a high target vari-
ance - so that each evaluation requires only a few samples. Then, it gradually
reduces the target variance for the evaluation of additional infill points in already
visited regions.
A flowchart of the proposed stochastic EGO algorithm, including the pro-
posed adaptive target selection, is shown in Fig. 1. In the next paragraphs, each
of its steps is detailed.
After the construction of the SK metamodel for the initial sampling plan,
the infill stage begins. The AEI method is employed for this purpose. Here, an
initial target variance $\sigma_0^2$ is set and the first infill point is added to the model,
being simulated up to this target variance.
From the second infill point on, the adaptive target selection scheme starts to
take place. We propose the use of an exponential decay equation parameterized
by problem dimension (n) and the number of points already sampled near the
new infill point (nclose ), which is defined by the number of points in the model
located at a given distance (rhc ) of the infill point. Here, we consider a hypercube
around the infill point selected with half-sides rhc to evaluate nclose .
Then, when the infill is located within an unsampled region, its target variance
is set to the initial target variance. On the other hand, when the infill is
located in a region with existing sampled points, a lower target variance ($\sigma_{\mathrm{adapt}}^2$)
is employed for the approximation of its objective function value. This is done to
allocate more computational effort to regions that need to be exploited. When
the sampled points start to group up, the focus changes to landscape exploitation.
In this situation, the target MCI variance is set to a lower value, increasing
the model accuracy.
The expression proposed to calculate the adaptive target value for each iteration of the sEGO algorithm is
$$
\sigma_{\mathrm{adapt}}^2 = \frac{\sigma_0^2}{\exp(a_1 + a_2\, n + a_3\, n_{\mathrm{close}} - a_4\, n_{\mathrm{close}}\, n)}, \qquad (9)
$$
where the $a_i$ are given constants. We also set a minimum and a maximum value for
the adaptive target, in order to avoid a computationally intractable number of
samples. We thus enforce
$$
\sigma_{\min}^2 \le \sigma_{\mathrm{adapt}}^2 \le \sigma_0^2, \qquad (10)
$$
where $\sigma_{\min}^2$ is a lower bound on the target.
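A sketch of the adaptive target of Eqs. (9)-(10); the constants $a_i$ below are illustrative placeholders, not the values used in [4]:

```python
import math

# Adaptive target variance, Eqs. (9)-(10): exponential decay from sigma0^2 with
# problem dimension n and the number of nearby sampled points n_close, clamped
# to [sigma_min^2, sigma0^2]. The constants a_i here are illustrative only.
def adaptive_target(sigma0_sq, sigma_min_sq, n, n_close,
                    a=(0.0, 0.05, 0.5, 0.01)):
    a1, a2, a3, a4 = a
    target = sigma0_sq / math.exp(a1 + a2 * n + a3 * n_close - a4 * n_close * n)
    return min(max(target, sigma_min_sq), sigma0_sq)  # enforce Eq. (10)

sigma0_sq, sigma_min_sq = 1.0, 1e-3
# unsampled region (n_close = 0): target stays near sigma0^2;
# densely sampled region: target shrinks, but never below the lower bound
t_far = adaptive_target(sigma0_sq, sigma_min_sq, n=2, n_close=0)
t_near = adaptive_target(sigma0_sq, sigma_min_sq, n=2, n_close=10)
assert t_near < t_far <= sigma0_sq
assert adaptive_target(sigma0_sq, sigma_min_sq, 2, 1000) == sigma_min_sq
print(t_far, t_near)
```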
Following [12], the normalized function is
$$
J(d) := 1 - \exp\left(-\gamma\,(\bar{y}(d) - y_0)\right),
$$
where $\gamma$ and $y_0$ are given parameters. Then, the sEGO approach minimizes
$J$ instead of the original approximated function $\bar{y}$. In the sEGO context, this
normalization is expected to reduce the variability level in the regions of the
design domain that have high values of the objective function, as well as to
reduce the dependency of the quality of the search on the parameters of the SK
and the adaptive method.
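The stochastic tunneling transformation of [12] can be sketched as follows (assuming the standard STUN form $J = 1 - e^{-\gamma(\bar{y} - y_0)}$, with $y_0$ the best value found so far):

```python
import math

# Stochastic tunneling (STUN) transformation of Wenzel & Hamacher [12]:
# J(d) = 1 - exp(-gamma * (ybar(d) - y0)). It compresses the landscape above
# y0 into (0, 1), "tunneling" high-valued regions while preserving the ranking.
def stun(ybar, y0, gamma):
    return 1.0 - math.exp(-gamma * (ybar - y0))

y0, gamma = 0.0, 1.0
# monotone: the ranking of designs is preserved
assert stun(0.5, y0, gamma) < stun(2.0, y0, gamma)
# bounded above by 1: very bad regions are flattened
assert stun(10.0, y0, gamma) < 1.0
assert 1.0 - stun(10.0, y0, gamma) < 1e-4
# J = 0 at the current best value
assert stun(y0, y0, gamma) == 0.0
print("STUN properties verified")
```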
6 Numerical Examples
In this section, we analyze the minimization of two multimodal problems taken
from [4]. The first problem is a stochastic version of the 2D multimodal Branin
function.
7 Conclusion
This paper proposed the use of a normalization scheme in order to increase the
performance of the recently developed Adaptive Target Variance sEGO method
for the minimization of functions that depend on expensive to evaluate and high
dimensional integrals. As in the original version of the method, the integral to
be minimized was approximated by MCI and the variance of the error in this
approximation was included into the SK framework. The AEI infill criterion was
employed to guide the addition of new points in the metamodel. The modification
proposed here was to minimize a normalized version of ȳ, by employing the
nonlinear transformation proposed by [12].
Overall, the use of the normalization in the sEGO method yielded very
promising results for the minimization of integrals. Indeed, it was able to obtain
more precise results, while requiring only a fraction of the computational bud-
get of the original version of the algorithm. However, since the results presented
here are preliminary, the use of the normalization deserves further investigation
in order to better assess its impact in the sEGO search in different situations
and problems.
Acknowledgements. The authors acknowledge the financial support and thank the
Brazilian research funding agencies CNPq and CAPES.
References
1. Ankenman, B., Nelson, B.L., Staum, J.: Stochastic kriging for simulation meta-
modeling. Oper. Res. 58(2), 371–382 (2010)
2. Beck, A.T., Kougioumtzoglou, I.A., dos Santos, K.R.M.: Optimal performance-
based design of non-linear stochastic dynamical RC structures subject to stationary
wind excitation. Eng. Struct. 78, 145–153 (2014)
3. Beck, J., Dia, B.M., Espath, L.F.R., Long, Q., Tempone, R.: Fast Bayesian exper-
imental design: Laplace-based importance sampling for the expected information
gain. Comput. Methods Appl. Mech. Eng. 334, 523–553 (2018). https://doi.org/
10.1016/j.cma.2018.01.053
4. Carraro, F., Lopez, R.H., Miguel, L.F.F., Andre, J.T.: Optimum design of planar
steel frames using the search group algorithm. Struct. Multidiscip. Optim. (2019,
to appear)
5. Forrester, A., Sobester, A., Keane, A.: Engineering Design Via Surrogate Mod-
elling: A Practical Guide. Wiley, Chichester (2008)
6. Gomes, W.J., Beck, A.T., Lopez, R.H., Miguel, L.F.: A probabilistic metric for
comparing metaheuristic optimization algorithms. Struct. Saf. 70, 59–70 (2018)
7. Huang, D., Allen, T.T., Notz, W.I., Zeng, N.: Global optimization of stochastic
black-box systems via sequential kriging meta-models. J. Glob. Optim. 34(3), 441–
466 (2006)
8. Jalali, H., Nieuwenhuyse, I.V., Picheny, V.: Comparison of kriging-based algorithms
for simulation optimization with heterogeneous noise. Eur. J. Oper. Res. 261(1),
279–301 (2017). https://doi.org/10.1016/j.ejor.2017.01.035
9. Jones, D.R., Schonlau, M., Welch, W.J.: Efficient global optimization of expensive
black-box functions. J. Glob. Optim. 13, 455–492 (1998). https://doi.org/10.1023/
a:1008306431147
10. Lopez, R., Ritto, T., Sampaio, R., de Cursi, J.S.: A new algorithm for the robust
optimization of rotor-bearing systems. Eng. Optim. 46(8), 1123–1138 (2014).
https://doi.org/10.1080/0305215X.2013.819095
11. Picheny, V., Wagner, T., Ginsbourger, D.: A benchmark of kriging-based infill
criteria for noisy optimization. Struct. Multidiscip. Optim. 48(3), 607–626 (2013)
12. Wenzel, W., Hamacher, K.: Stochastic tunneling approach for global mini-
mization of complex potential energy landscapes. Phys. Rev. Lett. 82, 3003–
3007 (1999). https://doi.org/10.1103/PhysRevLett.82.3003, https://link.aps.org/
doi/10.1103/PhysRevLett.82.3003
The Bernstein Polynomials Based
Globally Optimal Nonlinear Model
Predictive Control
1 Introduction
The design of efficient controllers to extract the desired performance from phys-
ical engineering systems has been a well studied problem in control engineer-
ing [3]. This problem can be broadly split into the following two stages: (i)
¹ Now with the John Deere Technology Centre, Magarpatta City, Pune, India (email:
PatilBhagyesh@JohnDeere.com). ²,³ The authors acknowledge funding support from
the NTU Start-Up Grant and the MOE Academic Research Fund Tier 1 Grant.
c Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 247–256, 2020.
https://doi.org/10.1007/978-3-030-21803-4_26
development of a mathematical model for the physical system under study; and
(ii) controller design based on the mathematical model developed in (i). In the
last decade, tremendous advancements have been made in the development of
optimization algorithms and computing platforms used to solve optimization
problems. Consequently, several controllers have been designed which utilize
advanced computational algorithms to solve complex optimization problems (see,
for instance [7,8,10], and the references therein).
In recent years, nonlinear model predictive control (NMPC) has emerged as
a promising advanced control methodology. In principle, MPC performs a cost
optimization subject to specific constraints on the system. The cost optimization
is performed repeatedly over a moving horizon window [13]. We note that the
following two issues need to be carefully considered while designing any NMPC
scheme:
(i) Can the nonlinear optimization procedure be completed until a convergence
criterion is satisfied to guarantee the optimality of the solution obtained?
(ii) Can (i) be achieved within the prescribed sampling time?
This work primarily addresses (i) which necessitates the development of an
advanced optimization procedure. In the literature, (i) has been addressed by
many researchers using various global optimization solution approaches. For
instance, the particle swarm optimization (PSO) approach was used in [4]. A
branch-and-bound approach was adopted to solve the NMPC optimization prob-
lem in [2]. Reference [9] extended the traditional branch-and-bound approach
with bound tightening techniques to locate the correct global optimum for the
NMPC optimization problem. Apart from these works, Patil et al. advocated the
use of Bernstein global optimization procedures for NMPC applications (see, for
instance, [1,11]). These optimization procedures are based on the Bernstein form
of polynomials [12] and use several attractive ‘geometrical’ properties associated
with the Bernstein form of polynomials.
This work is a sequential improvement of the previous work reported in [11].
Specifically, [11] introduced a Bernstein branch-and-bound algorithm to solve the
nonlinear optimization problems encountered in an NMPC scheme. We note that
the algorithm presented in [11] is computationally expensive due to the numerous
branchings involved in a typical branch-and-bound framework. This motivates
the main contribution of this work wherein a tool is developed to accelerate
the solution search procedure for the Bernstein global optimization algorithm.
The developed tool speeds up the Bernstein global optimization algorithm by
trimming (discarding) those regions from the solution search space which cer-
tainly do not contain any solution. Due to the nature of its main function, the
developed tool is called a ‘box trim operator’.
2 NMPC Formulation
Consider a class of time-invariant continuous-time systems described using the
following nonlinear model:
ẋ = f (x, u), x(t0 ) = x0 (1)
where x ∈ Rnx and u ∈ Rnu represent the vectors of the system states and the
control inputs respectively while f describes the nonlinear dynamic behavior
of the system. The NMPC of a discrete-time system involves the solution of a
nonlinear optimization problem at each sampling instant. Mathematically, the
NMPC problem formulation can be summarized as follows:
$$
\begin{aligned}
\min_{x_k, u_k}\quad & J = \sum_{k=0}^{N-1} L(x_k, u_k) && (2a)\\
\text{subject to}\quad & x_0 = \hat{x}_0 && (2b)\\
& x_{k+1} = x_k + \Delta t \cdot f(x_k, u_k) && (2c)\\
& c(x_k, u_k) \le 0 && (2d)\\
& x_k^{\min} \le x_k \le x_k^{\max} && (2e)\\
& u_k^{\min} \le u_k \le u_k^{\max} && (2f)\\
& \text{for } k = 0, 1, \dots, N-1 && (2g)
\end{aligned}
$$
where N represents the prediction horizon; x̂0 ∈ Rnx represents the initial states
of the system and xk ∈ Rnx and uk ∈ Rnu represent the system states and the
control inputs respectively at the kth sampling instant. The objective function
(J) in (2a) is defined by the stage cost L (.). The discretized nonlinear dynamics
in (2c) are formulated as a set of equality constraints and c(xk , uk ) represents
the nonlinear constraints arising from the operational requirements of the system
in (1). The system is subjected to the state and input constraints described by
(2e)–(2f).
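The discretized dynamics (2c) amount to a forward-Euler rollout of the model; a sketch with an illustrative scalar system ($f$, $L$, and the input profile below are stand-ins, not a system from the paper):

```python
# Forward-Euler rollout generating the equality constraints (2c) and the
# stage-cost sum (2a) for an illustrative scalar system xdot = -x + u.
def rollout(x0, u_seq, dt):
    xs = [x0]
    for u in u_seq:
        xs.append(xs[-1] + dt * (-xs[-1] + u))  # x_{k+1} = x_k + dt*f(x_k, u_k)
    return xs

def cost(xs, u_seq, x_ref=1.0):
    # stage cost L(x_k, u_k) = (x_k - x_ref)^2 + 0.01 * u_k^2
    return sum((x - x_ref) ** 2 + 0.01 * u * u for x, u in zip(xs, u_seq))

dt, N = 0.1, 20
# constant input u = 1 drives the state toward the equilibrium x = 1
xs = rollout(0.0, [1.0] * N, dt)
assert abs(xs[-1] - 1.0) < abs(xs[0] - 1.0)   # moving toward the setpoint
assert cost(xs, [1.0] * N) > 0.0
print(f"x_N = {xs[-1]:.3f}")
```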
where $N$ is the degree of $p_f$. We transform (3) into the following Bernstein form of polynomials to obtain the bounds for its range over an $l$-dimensional box $\mathbf{x}$:
$$
p_b(x) = \sum_{I \le N} b_I(\mathbf{x})\, B_I^N(x), \qquad (4)
$$
where $B_I^N(x)$ is the $I$th Bernstein basis polynomial of degree $N$, defined as
$$
B_I^N(x) = B_{i_1}^{n_1}(x_1) \cdots B_{i_l}^{n_l}(x_l), \quad x \in \mathbb{R}^l. \qquad (5)
$$
For $i_j = 0, 1, \dots, n_j$ and $j = 1, 2, \dots, l$,
$$
B_{i_j}^{n_j}(x_j) = \binom{n_j}{i_j} \frac{(x_j - \underline{x}_j)^{i_j} (\overline{x}_j - x_j)^{n_j - i_j}}{(\overline{x}_j - \underline{x}_j)^{n_j}}, \qquad (6)
$$
where $\underline{x}_j$ and $\overline{x}_j$ denote the lower and upper bounds of the box in the $j$th coordinate.
Note that all the Bernstein coefficients $b_I(\mathbf{x})$, $I \in S$, form an array, wherein $S = \{I : I \le N\}$. Furthermore, we define $S_0$ as a special set comprising only the vertex indices from $S$.
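The range-enclosure property that these coefficients provide can be illustrated for a univariate polynomial; a sketch (the power-to-Bernstein conversion routine below is a standard basis change written by us, not the paper's implementation):

```python
from math import comb

# Bernstein coefficients of a univariate polynomial p(x) = sum_j a_j x^j on a
# box [lo, hi]; the coefficients enclose the range: min(b) <= p(x) <= max(b).
def power_coeffs_on_unit(a, lo, hi):
    # substitute x = lo + t*(hi - lo) and expand in powers of t in [0, 1]
    n = len(a) - 1
    c = [0.0] * (n + 1)
    w = hi - lo
    for j, aj in enumerate(a):
        for k in range(j + 1):
            c[k] += aj * comb(j, k) * (lo ** (j - k)) * (w ** k)
    return c

def bernstein_coeffs(a, lo, hi):
    c = power_coeffs_on_unit(a, lo, hi)
    n = len(a) - 1
    return [sum(comb(i, j) / comb(n, j) * c[j] for j in range(i + 1))
            for i in range(n + 1)]

# p(x) = x^3 - 2x + 1 on the box [0, 2]
a = [1.0, -2.0, 0.0, 1.0]
b = bernstein_coeffs(a, 0.0, 2.0)
p = lambda x: x**3 - 2*x + 1
samples = [p(2.0 * k / 200) for k in range(201)]
assert min(b) <= min(samples) and max(samples) <= max(b)  # range enclosure
assert b[0] == p(0.0) and abs(b[-1] - p(2.0)) < 1e-9      # endpoint interpolation
print(f"enclosure [{min(b):.2f}, {max(b):.2f}]")
```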
Remark 3.1. The partial derivative of the polynomial $p_f$ in (3) with respect to $x_r$ ($1 \le r \le l$) can be obtained from its Bernstein coefficients $b_I(\mathbf{x})$ using the following relation [12]:
$$
p'_{f,r}(\mathbf{x}) = \frac{n_r}{w(\mathbf{x})} \sum_{I \le N_{r,-1}} \left[ b_{I_{r,1}}(\mathbf{x}) - b_I(\mathbf{x}) \right] B_{N_{r,-1},\, I}(x), \quad 1 \le r \le l,\ x \in \mathbf{x}, \qquad (8)
$$
where $p'_{f,r}(\mathbf{x})$ contains an enclosure of the range of the derivative of the polynomial $p_f$ on the box $\mathbf{x}$.
$$
y^{k+1} = y^k - \frac{F(y^k)}{F'(y^k)},
$$
Inputs: The cost function in (2a) as $f$, the equality constraints in (2c) as $h_j$, the inequality constraints in (2d) as $g_i$, the initial search box comprising the system states ($x_k$) and the control inputs ($u_k$) as $\mathbf{y}$, the tolerance parameter $\epsilon_f$ on the global minimum of (2a), and the tolerance parameter $\epsilon_h$ to which the equality constraints in (2c) need to be satisfied.
Outputs: The global minimum cost value of (2a) as $f^*$ and the global minimizers for the states ($x_k$) and the control input profile ($u_k$) as $\mathbf{y}^*$.
BEGIN Algorithm
Relaxation step
• For the item $(f, b_{I,f}, b_{I,g_i}, b_{I,h_j}, \mathbf{y})$, check the constraint feasibility as detailed in Remark 3.2. If it is found to be quasi-feasible, go to the branching step.
• Check if the item $(f, b_{I,f}, b_{I,g_i}, b_{I,h_j}, \mathbf{y})$ satisfies the vertex property. If 'true', then update $f = b^{vi}_{I,f}$ and add this item to $L_{sol}$. Go to the box trimming step. (Note that the $b^{vi}_{I,f}$ are the vertex Bernstein coefficients obtained from $b_{I,f}$ using the special set $S_0$.)
Branching step
• For the item (f, bI,f , bI,gi , bI,hj , y ), subdivide the box y into two subboxes
y1 and y2 .
• Compute the Bernstein coefficient matrices for f , gi , and hj on y1 and y2 .
Construct the two items as (fk , bI,f,k , bI,gi,k , bI,hj,k , yk ), k = 1, 2.
• Discard yk , k = 1, 2, for which min(bI,f,k ) > f . Enter (fk , bI,f,k , bI,gi,k ,
bI,hj,k , yk ) into L, where fk := min(bI,f,k ). Go to the box trimming step.
The Bernstein Polynomials Based Globally Optimal NMPC 253
Termination step
• Find the item in Lsol for which the first entry is equal to f . Denote that item
by If .
• In If , the first entry is the global minimum f ∗ while the last entry is the
global minimizer y∗ .
• Return the global solution (f ∗ , y∗ ).
END Algorithm
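The relaxation–branching–termination loop above follows the classical interval branch-and-bound pattern. The following simplified sketch (our own illustration: it uses a crude interval lower bound instead of Bernstein coefficients, and omits the constraints and the trim step) shows just the pruning logic:

```python
def bb_minimize(f, lower_bound, lo, hi, tol=1e-6):
    """Branch-and-bound minimization of f on [lo, hi].

    lower_bound(a, b) must return a valid lower bound of f on [a, b].
    """
    best = min(f(lo), f(hi))          # incumbent upper bound
    boxes = [(lo, hi)]
    while boxes:
        a, b = boxes.pop()
        if lower_bound(a, b) > best:   # prune: box cannot contain the minimum
            continue
        m = 0.5 * (a + b)
        best = min(best, f(m))         # update incumbent with the midpoint
        if b - a > tol:                # branch: subdivide into two subboxes
            boxes += [(a, m), (m, b)]
    return best

# Minimize f(x) = x^2 - x on [0, 2]; for 0 <= a <= b, the interval
# evaluation a^2 - b is a valid lower bound of x^2 - x on [a, b].
fmin = bb_minimize(lambda x: x * x - x,
                   lambda a, b: a * a - b, 0.0, 2.0)
print(fmin)  # -0.25, attained at x = 0.5
```

In the BBTB algorithm the role of `lower_bound` is played by the minimum Bernstein coefficient on each subbox, which is both tighter and certified.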
In this section, the performance of the BBTB algorithm (Sect. 3.2) based NMPC
scheme is first compared with that of the conventional sequential-quadratic pro-
gramming based suboptimal NMPC scheme implemented in MATLAB [5]. These
two schemes are compared in terms of the optimality of the solutions obtained
while solving the nonlinear optimization problems encountered in the respective
schemes. Subsequently, the benefits derived from the use of the box trim operator
in the BBTB algorithm are assessed. Specifically, the computation time required
and the number of boxes processed by the BBTB algorithm based NMPC scheme
are compared with the computation time required and the number of boxes pro-
cessed by the previously reported Bernstein algorithm (BBBC) based NMPC
scheme from [11].
The following parameter values were chosen for the simulation studies in this
paper:
Figure 1 shows the evolution of the system states from their initial values
(CA = 0.2, T = 370) for a series of setpoint changes. The closed-loop perfor-
mances of the BBTB based NMPC scheme and the suboptimal NMPC scheme
are compared. We observed that both the system states transitioned smoothly
to their new values for multiple setpoint changes when the CSTR system was
controlled using the BBTB algorithm based NMPC scheme. On the other hand,
some undershoot and overshoot (≈ 2 − 5%) was observed when the suboptimal
NMPC scheme was used to control the CSTR system. The settling time was sim-
ilar for both the NMPC schemes. Figure 2a illustrates the control action observed
when the CSTR system was controlled using the BBTB algorithm based NMPC
scheme and the suboptimal NMPC scheme. It is apparent that except for the first
few samples (≈ 0−20), the BBTB algorithm based NMPC scheme demonstrates
smooth control performance. The suboptimal NMPC scheme demonstrates a
slightly oscillating control action, particularly when the setpoint changes are
applied.
[Figure 1: two plot panels (a) and (b) over Samples; legend: Global NMPC (Bernstein algorithm) vs. Sub-optimal NMPC (fmincon).]
Fig. 1. Evolution of the states CA (a) and T (b) when the CSTR system is controlled
using the BBTB algorithm based NMPC scheme and the sequential-quadratic pro-
gramming based suboptimal NMPC scheme.
[Figure 2: two plot panels (a) and (b) over Samples; panel (b) vertical axis: Time (Sec); legend: Global NMPC (Bernstein algorithm) vs. Sub-optimal NMPC (fmincon).]
Fig. 2. (a) Control input (Tc ) profile for the CSTR system controlled using the BBTB
algorithm based NMPC scheme and the sequential-quadratic programming based sub-
optimal NMPC scheme. (b) Comparison of the computation times needed for solving
a nonlinear optimization problem of the form (2a)–(2g) at each sampling instant using
the BBTB algorithm and BBBC algorithm based NMPC schemes. The sampling time
is 0.5s.
[Figure 3: two plot panels (a) and (b) over Samples; legend (a): BBBC (without the box trim operator) vs. BBTB (with the box trim operator); legend (b): Global NMPC (Bernstein) vs. Sub-optimal NMPC (fmincon), with setpoint changes marked SP1–SP4.]
Fig. 3. (a) Number of boxes processed during the branch-and-bound process of the
BBBC and BBTB algorithms. (b) Cost function values of the nonlinear optimization
problems of the form (2a)–(2g) solved at each sampling instant when the CSTR sys-
tem is controlled using the BBTB algorithm based NMPC scheme and the sequential-
quadratic programming based suboptimal NMPC scheme. SP1 , SP2 , SP3 , and SP4
show the samples at which the setpoint changes are implemented.
4 Conclusions
This work presented a global optimization algorithm based NMPC scheme for
nonlinear systems. We first discussed the necessity of using global optimization
algorithm based NMPC scheme. Subsequently, we proposed an improvement in
the Bernstein global optimization algorithm. The proposed improvement was a
Newton method based box trim operator which utilized some nice geometrical
properties associated with the Bernstein form of polynomials. Practically, this
operator quickened the computational times for the online nonlinear optimiza-
tion problems encountered in an NMPC scheme. The BBTB algorithm based
NMPC scheme was tested on a CSTR system to demonstrate its efficacy. The
results of the case studies performed on the CSTR system demonstrated the
superior control performance of the BBTB algorithm based NMPC scheme when
compared with a conventional sequential-quadratic programming based suboptimal NMPC scheme. The case studies also showed that the performance of the Bernstein global optimization algorithm based NMPC scheme can be improved considerably by the proposed box trim operator.
References
1. Patil, B.V., Bhartiya, S., Nataraj, P.S.V., Nandola, N.N.: Multiple-model based
predictive control of nonlinear hybrid systems based on global optimization using
the Bernstein polynomial approach. J. Process Control 22(2), 423–435 (2012)
2. Cizniar, M., Fikar, M., Latifi, M.A.: Design of constrained nonlinear model pre-
dictive control based on global optimisation. In: 18th European Symposium on
Computer Aided Process Engineering-ESCAPE 18, pp. 1–6 (2008)
3. Doyle, J.C., Francis, B.A., Tannenbaum, A.R.: Feedback Control Theory. Dover
Publications, USA (2009)
4. Germin Nisha, M., Pillai, G.N.: Nonlinear model predictive control with relevance
vector regression and particle swarm optimization. J. Control. Theory Appl. 11(4),
563–569 (2013)
5. Grüne, L., Pannek, J.: Nonlinear Model Predictive Control, pp. 43–66. Springer,
London (2011)
6. Hansen, E.R., Walster, G.W.: Global Optimization Using Interval Analysis, 2nd
edn. Marcel Dekker, New York (2005)
7. Wolf, I.J., Marquardt, W.: Fast NMPC schemes for regulatory and economic
NMPC – a review. J. Process Control 44, 162–183 (2016)
8. Lenhart, S., Workman, J.T.: Optimal Control Applied to Biological Models. CRC
Press, USA (2007)
9. Long, C., Polisetty, P., Gatzke, E.: Nonlinear model predictive control using deter-
ministic global optimization. J. Process Control 16(6), 635–643 (2006)
10. Åström, K.J., Wittenmark, B.: Computer-Controlled Systems: Theory and Design,
3rd edn. Dover Publications, USA (2011)
11. Patil, B.V., Maciejowski, J., Ling, K.V.: Nonlinear model predictive control based
on Bernstein global optimization with application to a nonlinear CSTR. In: IEEE
Proceedings of 15th Annual European Control Conference, pp. 471–476. Aalborg,
Denmark (2016)
12. Ratschek, H., Rokne, J.: New Computer Methods for Global Optimization. Ellis
Horwood Publishers, Chichester, England (1988)
13. Rawlings, J.B., Mayne, D.Q., Diehl, M.M.: Model Predictive Control: Theory,
Computation, and Design, 2nd edn. Nob Hill Publishing, USA (2017)
14. Stahl, V.: Interval methods for bounding the range of polynomials and solving
systems of nonlinear equations. Ph.D. thesis, Johannes Kepler University, Linz
(1995)
Towards the Biconjugate of Bivariate
Piecewise Quadratic Functions
1 Introduction
with the existence of linear time algorithms for various convex transforms [4,22].
Computing the full graph of the convex hull of univariate PLQ functions is
possible in optimal linear worst-case time complexity [9].
For a function f defined over a region P , the pointwise supremum of
all its convex underestimators is called the convex envelope and is denoted
convfP (x, y). Computing the convex envelope of a multilinear function over
a unit hypercube is NP-Hard [7]. However, the convex envelope of functions
defined over a polytope P and restricted by the vertices of P can be computed
in finite time using a linear program [26,27]. A method that reduces the computation of the convex envelope of functions that are (n−1)-convex (i.e., convex in one lower dimension, R^{n−1}) and have an indefinite Hessian to optimization problems in lower dimensions is discussed in [14].
Any general bivariate nonconvex quadratic function can be linearly transformed into the sum of a bilinear and a linear function. Convex envelopes for bilinear
functions over rectangles have been discussed in [23] and validated in [1]. The
convex envelope over special polytopes (not containing edges with finite positive
slope) was derived in [25] while [15] deals with bilinear functions over a triangle
containing exactly one edge with finite positive slope. The convex envelope over
general triangles and triangulation of the polytopes through doubly nonnegative
matrices (both semidefinite and nonnegative) is presented in [2].
In [16], it is shown that the analytical form of the convex envelope of some
bivariate functions defined over polytopes can be computed by solving a con-
tinuously differentiable convex problem. In that case, the convex envelope is
characterized by a polyhedral subdivision.
The Fenchel conjugate $f^*(s) = \sup_{x \in \mathbb{R}^n} [\langle s, x\rangle - f(x)]$ (we note $\langle s, x\rangle = s^T x$) of a function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is also known as the Legendre-Fenchel transform, the convex conjugate, or simply the conjugate. It plays a significant role in duality
and computing it is a key step in solving the dual optimization problem [24].
Most notably, the biconjugate is also the closed convex envelope.
A method to compute the conjugate known as the fast Legendre transform
was introduced in [5] and studied in [6,18]. A linear time algorithm was later
introduced by Lucet to compute the discrete Legendre transform [19]. Those
algorithms are numeric and do not provide symbolic expressions.
Computation of the conjugate of convex univariate PLQ functions has been
well studied in the literature, and linear time algorithms have been developed
in [8,11]. Recently, a linear time algorithm to compute the conjugate of convex
bivariate PLQ functions was proposed in [12].
Let f : Rn → R ∪ {+∞} be a piecewise function, i.e. f (x) = fi (x) if x ∈ Pi
for i = 1, . . . , N . From [13, Theorem 2.4.1], we have (inf i fi )∗ = supi fi∗ , and
from [13, Proposition 2.6.1], conv(inf i (fi + IPi )) = conv(inf i [conv(fi + IPi )])
where IPi is the indicator function for Pi . Hence, conv(inf i (fi + IPi )) =
(supi [conv(fi + IPi )]∗ )∗ . This provides an algorithm to compute the closed con-
vex envelope: (1) compute the convex envelope of each piece, (2) compute the
conjugate of the convex envelope of each piece, (3) compute the maximum of all
the conjugates, and (4) compute the conjugate of the function obtained in (3)
to obtain the biconjugate. The present work focuses on Step (2).
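Numerically, the closed convex envelope can be approximated on a grid by applying a discrete conjugate twice, mirroring the identity between the biconjugate and the closed convex envelope. A brute-force O(n²) sketch of our own (the linear-time discrete Legendre transform of [19] is far more efficient):

```python
def discrete_conjugate(xs, fvals, ss):
    """Discrete conjugate f*(s) = max_x [s*x - f(x)], evaluated on the grid ss."""
    return [max(s * x - fx for x, fx in zip(xs, fvals)) for s in ss]

# Double-well f(x) = x^4 - x^2 on [-2, 2]: its convex envelope is flat
# at -1/4 between the two minima, so f**(0) should be close to -0.25.
n = 401
xs = [-2.0 + 4.0 * i / (n - 1) for i in range(n)]
ss = [-8.0 + 16.0 * i / (n - 1) for i in range(n)]
f = [x ** 4 - x ** 2 for x in xs]
fstar = discrete_conjugate(xs, f, ss)
fstarstar = discrete_conjugate(ss, fstar, xs)  # biconjugate on the x grid
print(fstarstar[n // 2])  # close to -0.25 at x = 0
```

By construction the discrete biconjugate never exceeds f at the grid points, which is the numerical counterpart of the envelope being an underestimator.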
Recall that given a quadratic function over a polytope, the eigenvalues of
its symmetric matrix determine how difficult its convex envelope is to compute
(for computational purposes, we can ignore the affine part of the function). If
the matrix is semi-definite (positive or negative), the convex envelope is easily
computed. When it is indefinite, a change of coordinate reduces the problem
to finding the convex envelope of the function (x, y) → xy over a polytope, for
which step (1) is known [17].
The paper is organized as follows. Section 3 focuses on the domain of the
conjugate while Sect. 4 determines the symbolic expressions. Section 5 concludes
the paper with future work.
Given a nonconvex PLQ function, we first compute the closed convex envelope
of each piece and obtain a piecewise rational function [17]. We now compute
the conjugate of such a rational function over a polytope by first computing its
domain, which will turn out to be a parabolic subdivision. Recall that for PLQ
functions, dom f ∗ = ∂f (dom f ). We decompose the polytope dom f = P into
its interior, its vertices, and its edges.
Following [17], we write a rational function as
Proposition 1 (Interior). Consider $r$ defined by (1); there exist $\alpha_{ij}$ such that $\bigcup_{x \in \operatorname{dom}(r)} \partial r(x) = \{s : C_r(s) = 0\}$, where $C_r(s) = \alpha_{11}s_1^2 + \alpha_{12}s_1 s_2 + \alpha_{22}s_2^2 + \alpha_{10}s_1 + \alpha_{02}s_2 + \alpha_{00}$ and $\{s : C_r(s) = 0\}$ is a parabolic curve.
Proof. Note $\xi_1(x) = \xi_{11}x_1 + \xi_{12}x_2 + \xi_{10}$, $\xi_2(x) = \xi_{21}x_1 + \xi_{22}x_2 + \xi_{20}$, and $\xi_0(x) = \xi_{01}x_1 + \xi_{02}x_2 + \xi_{00}$. Since $r$ is differentiable everywhere in $\operatorname{dom}(r) = \mathbb{R}^2 \setminus \{z : \xi_2(z) = 0\}$, for any $x \in \operatorname{dom}(r)$ we compute $s = \nabla r(x)$ as $s_i = 2\xi_{1i}t - \xi_{2i}t^2 + \xi_{0i}$ for $i = 1, 2$, where $t = (\xi_{11}x_1 + \xi_{12}x_2 + \xi_{10})/(\xi_{21}x_1 + \xi_{22}x_2 + \xi_{20})$. Hence, $s = \nabla r(x)$ represents the parametric equation of a conic section, and by eliminating $t$, we get $C_r(s) = 0$ where
Next we compute the subdifferential at any vertex in the smooth case (the
proof involves a straightforward computation of the normal cone).
Lemma 1 (Vertices). Let $g \in C^1$, $P$ a polytope, and $v$ a vertex of $P$. Let $f(x) = g(x) + I_P(x)$. Then $\partial f(v)$ is an unbounded polyhedral set.
There is one vertex at which both numerator and denominator equal zero
although the rational function can be extended by continuity over the polytope;
we conjecture the result based on numerous observations.
Proof. For all x ∈ ri(E), ∂f (x) = ∂g(x) + NP (x). Let L(x) = x2 − mx1 − c be
the expression of the line joining xl and xu such that P ⊂ {x : L(x) ≤ 0}. (The
case P ⊂ {x : L(x) ≥ 0} is analogous.)
Since P ⊂ R2 is a polytope, for all x ∈ ri(E), NP (x) = {s : s = λ∇L(x), λ ≥
0} is the normal cone of P at x and can be written NP (x) = {s : s1 + ms2 =
0, s2 ≥ 0}. In the special case when E = {x : x1 = d, xl1 ≤ x1 ≤ xu1 },
L(x) = x1 − d and NP (x) = {s : s2 = 0, s1 ≥ 0}. Now for any x ∈ ri(E),
$\partial f(x) = \partial g(x) + N_P(x) = \{s + \nabla g(x) : s_1 + ms_2 = 0,\ s_2 \ge 0\}$, so
$$\bigcup_{x \in \operatorname{ri}(E)} \partial f(x) = \bigcup_{x \in \operatorname{ri}(E)} \{s + \nabla g(x) : s_1 + ms_2 = 0,\ s_2 \ge 0\}$$
is a parabolic region.
[Figure: parabolic subdivision of dom f ∗, showing the regions ∂f(v1), ∂f(v2), ∂f(v3), ∪_{x∈ri(E12)} ∂f(x), and ∪_{x∈ri(E13)} ∂f(x).]
4 Conjugate Expressions
Now that we know dom f ∗ as a parabolic subdivision, we turn to the computa-
tion of its expression on each piece. We note
$$g_f(s_1, s_2) = \frac{\psi_1(s_1, s_2)}{\zeta_{00}\,\psi_{1/2}(s_1, s_2)} + \psi_0(s_1, s_2) \qquad (2)$$
$$g_q(s_1, s_2) = \zeta_{11}s_1^2 + \zeta_{12}s_1 s_2 + \zeta_{22}s_2^2 + \zeta_{10}s_1 + \zeta_{01}s_2 + \zeta_{00} \qquad (3)$$
$$g_l(s_1, s_2) = \zeta_{10}s_1 + \zeta_{01}s_2 + \zeta_{00} \qquad (4)$$
Proof. We compute the critical points for the optimization problem defining f ∗ .
Case 1 (Vertices) For any vertex v, f ∗ (s) = s1 v1 + s2 v2 − r(v) is a linear
function of form (4) defined over an unbounded polyhedral set (from Lemma 1).
In the special case, when ∂f (v) is a parabolic region (Conjecture 1), the conjugate
would again be a linear function but defined over a parabolic region.
x2 = mx1 + c, (6)
where all γij and γij/k are defined in the coefficients of r, and parameters m and
c. When $\xi_{21} + m\xi_{22} \neq 0$, solving (5) and (6) leads to a quadratic equation in $t$
with coefficients that are linear functions in $s$.
By substituting (7) and (6) in $f^*(s)$, when $\xi_{21} + m\xi_{22} \neq 0$, we have
$$f^*(s) = \frac{\psi_1(s_1, s_2)}{\zeta_{00}\,\psi_{1/2}(s_1, s_2)} + \psi_0(s_1, s_2),$$
where all ζij , ψi and ψi/j are defined in the coefficients of r, and parameters m
and c, with ψi (s) and ψi/j(s) linear functions in s.
From Proposition 2, $\bigcup_{x \in \operatorname{ri}(E)} \partial f(x)$ is either a parabolic region or a ray. So for any $E$, the conjugate is a fractional function of the form (2) defined over a parabolic region. When $\bigcup_{x \in \operatorname{ri}(E)} \partial f(x)$ is a ray, the computation of the conjugate is deduced from its neighbours by continuity.
Case 3 (Interior) Since $\bigcup_{x \in \operatorname{int}(P)} \partial f(x)$ is contained in a parabolic arc (from Corollary 1), the computation of the conjugate is deduced by continuity.
Example 2. For a bivariate rational function $r(x) = \dfrac{x_2^2}{x_2 - x_1 + 1}$ defined over a polytope $P$ with vertices $v_1 = (1, 1)$, $v_2 = (1, 0)$ and $v_3 = (0, 0)$, let $f(x) = r(x) + I_P(x)$.
264 D. Kumar and Y. Lucet
where
R1 = {s : s2 ≥ −s1 + 2, s2 ≥ 1}
R2 = {s : s2 ≥ s1 , s21 + 2s1 s2 − 4s1 + s22 ≤ 0}
R3 = {s : s2 ≤ s1 , s2 ≤ 1, s1 ≥ 0}
R4 = {s : 0 ≤ s1 , s2 ≤ −s1 }
R5 = {s : s2 ≥ −s1 , s2 ≤ −s1 + 2, s2 ≥ s1 , s21 + 2s1 s2 − 4s1 + s22 ≥ 0}.
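The region descriptions above translate directly into membership tests; a small sketch (the function names are ours, for illustration only):

```python
def in_R1(s1, s2):
    return s2 >= -s1 + 2 and s2 >= 1

def in_R2(s1, s2):
    return s2 >= s1 and s1**2 + 2*s1*s2 - 4*s1 + s2**2 <= 0

def in_R3(s1, s2):
    return s2 <= s1 and s2 <= 1 and s1 >= 0

def in_R4(s1, s2):
    return 0 <= s1 and s2 <= -s1

def in_R5(s1, s2):
    return (s2 >= -s1 and s2 <= -s1 + 2 and s2 >= s1
            and s1**2 + 2*s1*s2 - 4*s1 + s2**2 >= 0)

print(in_R1(0.0, 3.0), in_R2(1.0, 1.0), in_R5(0.5, 1.0))  # True True True
```

Note that the curved boundary shared by R2 and R5 is exactly the parabolic curve $s_1^2 + 2s_1 s_2 - 4s_1 + s_2^2 = 0$, in line with the parabolic subdivision of dom f ∗.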
[Flowchart: Step 1 – convex envelope of each piece (Qi, Pi) → (ri, Pi) via [Loc16]; Step 2 – conjugate of each piece; Step 3 – maximum of all the conjugates; Step 4 – conjugate of the maximum, yielding the biconjugate.]
Fig. 3. Summary
References
1. Al-Khayyal, F.A., Falk, J.E.: Jointly constrained biconvex programming. Math.
Oper. Res. 8(2), 273–286 (1983)
2. Anstreicher, K.M.: On convex relaxations for quadratically constrained quadratic
programming. Math. Program. 136(2), 233–251 (2012)
3. Bauschke, H.H., Goebel, R., Lucet, Y., Wang, X.: The proximal average: basic
theory. SIAM J. Optim. 19(2), 766–785 (2008)
4. Bauschke, H.H., Lucet, Y., Trienis, M.: How to transform one convex function
continuously into another. SIAM Rev. 50(1), 115–132 (2008)
5. Brenier, Y.: Un algorithme rapide pour le calcul de transformées de Legendre-
Fenchel discretes. Comptes rendus de l’Académie des sciences. Série 1,
Mathématique 308(20), 587–589 (1989)
6. Corrias, L.: Fast Legendre-Fenchel transform and applications to Hamilton-Jacobi
equations and conservation laws. SIAM J. Numer. Anal. 33(4), 1534–1558 (1996)
7. Crama, Y.: Recognition problems for special classes of polynomials in 0–1 variables.
Math. Program. 44(1–3), 139–155 (1989)
8. Gardiner, B., Jakee, K., Lucet, Y.: Computing the partial conjugate of convex
piecewise linear-quadratic bivariate functions. Comput. Optim. Appl. 58(1), 249–
272 (2014)
9. Gardiner, B., Lucet, Y.: Convex hull algorithms for piecewise linear-quadratic func-
tions in computational convex analysis. Set-Valued Var. Anal. 18(3–4), 467–482
(2010)
10. Gardiner, B., Lucet, Y.: Graph-matrix calculus for computational convex analysis.
In: Fixed-Point Algorithms for Inverse Problems in Science and Engineering, pp.
243–259. Springer (2011)
11. Gardiner, B., Lucet, Y.: Computing the conjugate of convex piecewise linear-
quadratic bivariate functions. Math. Program. 139(1–2), 161–184 (2013)
12. Haque, T., Lucet, Y.: A linear-time algorithm to compute the conjugate of convex
piecewise linear-quadratic bivariate functions. Comput. Optim. Appl. 70(2), 593–
613 (2018)
13. Hiriart-Urruty, J.B., Lemaréchal, C.: Convex analysis and minimization algorithms
II: Advanced Theory and Bundle Methods. Springer Science & Business Media
(1993)
14. Jach, M., Michaels, D., Weismantel, R.: The convex envelope of (n-1)-convex func-
tions. SIAM J. Optim. 19(3), 1451–1466 (2008)
15. Linderoth, J.: A simplicial branch-and-bound algorithm for solving quadratically
constrained quadratic programs. Math. Program. 103(2), 251–282 (2005)
16. Locatelli, M.: A technique to derive the analytical form of convex envelopes for
some bivariate functions. J. Glob. Optim. 59(2–3), 477–501 (2014)
17. Locatelli, M.: Polyhedral subdivisions and functional forms for the convex
envelopes of bilinear, fractional and other bivariate functions over general poly-
topes. J. Glob. Optim. 66(4), 629–668 (2016)
18. Lucet, Y.: A fast computational algorithm for the Legendre-Fenchel transform.
Comput. Optim. Appl. 6(1), 27–57 (1996)
19. Lucet, Y.: Faster than the fast Legendre transform, the linear-time Legendre trans-
form. Numer. Algorithms 16(2), 171–185 (1997)
20. Lucet, Y.: Fast Moreau envelope computation I: numerical algorithms. Numer.
Algorithms 43(3), 235–249 (2006)
21. Lucet, Y.: What shape is your conjugate? A survey of computational convex anal-
ysis and its applications. SIAM Rev. 52(3), 505–542 (2010)
22. Lucet, Y., Bauschke, H.H., Trienis, M.: The piecewise linear-quadratic model for
computational convex analysis. Comput. Optim. Appl. 43(1), 95–118 (2009)
23. McCormick, G.P.: Computability of global solutions to factorable nonconvex pro-
grams: Part I – convex underestimating problems. Math. Program. 10(1), 147–175
(1976)
24. Rockafellar, R.T., Wets, R.J.B.: Variational Analysis, vol. 317. Springer Science &
Business Media (1998)
25. Sherali, H.D., Alameddine, A.: An explicit characterization of the convex envelope
of a bivariate bilinear function over special polytopes. Ann. Oper. Res. 25(1),
197–209 (1990)
26. Tardella, F.: On the existence of polyhedral convex envelopes. In: Frontiers in
Global Optimization, pp. 563–573. Springer (2004)
27. Tardella, F.: Existence and sum decomposition of vertex polyhedral convex
envelopes. Optim. Lett. 2(3), 363–375 (2008)
Tractable Relaxations for the Cubic
One-Spherical Optimization Problem
1 Introduction
The cubic one-spherical optimization problem has the following form:
$$\mathrm{CSP}: \quad \min_{x \in \mathbb{R}^n} f(x) := \mathcal{A}x^3 = \sum_{i,j,k=1}^{n} a_{ijk}\, x_i x_j x_k \quad \text{s.t. } \|x\| = 1,$$
Our main purpose in this work is to develop new and efficient relaxations for
problem CSP. For that, we propose different approaches, described in Sect. 3. In
Sect. 4, we present preliminary numerical results concerning the quality of the
resulting lower bounds and the computational effort to compute them, for small
instances from the literature as well as larger randomly generated instances on
up to n = 200 variables.
with
$$A_1 = \begin{pmatrix} a_{111} & a_{112} & a_{113} \\ a_{121} & a_{122} & a_{123} \\ a_{131} & a_{132} & a_{133} \end{pmatrix}, \quad A_2 = \begin{pmatrix} a_{211} & a_{212} & a_{213} \\ a_{221} & a_{222} & a_{223} \\ a_{231} & a_{232} & a_{233} \end{pmatrix}, \quad A_3 = \begin{pmatrix} a_{311} & a_{312} & a_{313} \\ a_{321} & a_{322} & a_{323} \\ a_{331} & a_{332} & a_{333} \end{pmatrix},$$
and
$$\tilde A_1 := \begin{pmatrix} a_{122} & a_{123} \\ a_{132} & a_{133} \end{pmatrix}, \quad \tilde A_2 := \begin{pmatrix} a_{211} & a_{213} \\ a_{231} & a_{233} \end{pmatrix}, \quad \tilde A_3 := \begin{pmatrix} a_{311} & a_{312} \\ a_{321} & a_{322} \end{pmatrix}.$$
In the following, for a symmetric real matrix $X$, we will denote by $\lambda_{\min}(X)$ the smallest eigenvalue of $X$. Given a vector $x \in \mathbb{R}^n$ and $\ell \in \{1, \ldots, n\}$, we define the vector $x_{\hat\ell} \in \mathbb{R}^{n-1}$ as $x_{\hat\ell} := (x_1, \ldots, x_{\ell-1}, x_{\ell+1}, \ldots, x_n)$, i.e., the vector $x$ where the $\ell$-th component is omitted.
We first decompose the objective function of CSP by the first index, as follows:
$$\sum_{i,j,k=1}^{n} a_{ijk}\, x_i x_j x_k = \sum_{i=1}^{n} \Bigg( x_i \sum_{\substack{j,k=1 \\ j,k \neq i}}^{n} a_{ijk}\, x_j x_k + 2x_i^2 \sum_{\substack{j=1 \\ j \neq i}}^{n} a_{iij}\, x_j + a_{iii}\, x_i^3 \Bigg) \qquad (1)$$
Then we conclude
$$\min_{\|x\|=1} \sum_{i,j,k=1}^{n} a_{ijk}\, x_i x_j x_k \;\ge\; \sum_{i=1}^{n} \min_{\|x\|=1} \Bigg( x_i \sum_{\substack{j,k=1 \\ j,k \neq i}}^{n} a_{ijk}\, x_j x_k + 2x_i^2 \sum_{\substack{j=1 \\ j \neq i}}^{n} a_{iij}\, x_j + a_{iii}\, x_i^3 \Bigg), \qquad (2)$$
using the notation of Sect. 2. Multiplying the right hand side with $x_i$ and taking
the minimum over $x_i \in [-1, 1]$, we obtain
$$\min_{x_i \in [-1,1]} x_i \min_{\|x_{\hat i}\| = \sqrt{1-x_i^2}} \sum_{\substack{j,k=1 \\ j,k \neq i}}^{n} a_{ijk}\, x_j x_k = -\frac{2\sqrt{3}}{9}\, |\lambda_{\min}(\tilde A_i)|. \qquad (5)$$
Moreover,
$$\min_{\|x_{\hat i}\| = \sqrt{1-x_i^2}} \sum_{\substack{j=1 \\ j \neq i}}^{n} a_{iij}\, x_j = -\sqrt{\sum_{\substack{j=1 \\ j \neq i}}^{n} a_{iij}^2}\; \sqrt{1-x_i^2} \qquad (6)$$
270 C. Buchheim et al.
and hence
$$\min_{x_i \in [-1,1]} 2x_i^2 \min_{\|x_{\hat i}\| = \sqrt{1-x_i^2}} \sum_{\substack{j=1 \\ j \neq i}}^{n} a_{iij}\, x_j = -\frac{4\sqrt{3}}{9} \sqrt{\sum_{\substack{j=1 \\ j \neq i}}^{n} a_{iij}^2}. \qquad (7)$$
Finally,
$$\min_{\|x\|=1} a_{iii}\, x_i^3 = -|a_{iii}|. \qquad (8)$$
The time to calculate this lower bound is dominated by computing the smallest
eigenvalues of the n symmetric (n − 1) × (n − 1)-matrices Ã1 , . . . , Ãn .
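For the case n = 3 the matrices Ãi are symmetric 2 × 2, so the smallest eigenvalue in the bound (5) has a closed form via the trace and determinant. A minimal sketch (assuming a symmetric 2 × 2 input; helper names are ours):

```python
from math import sqrt

def lambda_min_2x2(a, b, d):
    """Smallest eigenvalue of the symmetric matrix [[a, b], [b, d]]."""
    tr, det = a + d, a * d - b * b
    return (tr - sqrt(tr * tr - 4.0 * det)) / 2.0

def bound_term(a, b, d):
    """The eigenvalue term -(2*sqrt(3)/9) * |lambda_min| appearing in (5)."""
    return -(2.0 * sqrt(3.0) / 9.0) * abs(lambda_min_2x2(a, b, d))

print(lambda_min_2x2(0.0, 1.0, 0.0))  # -1.0
print(bound_term(0.0, 1.0, 0.0))      # about -0.3849
```

For general n one would instead call a symmetric eigensolver on each (n − 1) × (n − 1) matrix, which is the dominant cost noted above.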
$$= \min_{x_i \in [-1,1]} \min_{\|y\|=1} \Bigg( x_i(1-x_i^2) \sum_{\substack{j,k=1 \\ j,k \neq i}}^{n} a_{ijk}\, y_j y_k + 2 x_i^2 \sqrt{1-x_i^2} \sum_{\substack{j=1 \\ j \neq i}}^{n} a_{iij}\, y_j + a_{iii}\, x_i^3 \Bigg)$$
$$= \min_{x_i \in [-1,1]} \min_{\|y\|=1} \Big( x_i(1-x_i^2)\, y^\top \tilde A_i y + x_i^2 \sqrt{1-x_i^2}\, a_i^\top y + a_{iii}\, x_i^3 \Big), \qquad (9)$$
where we set
$$y := \frac{1}{\sqrt{1-x_i^2}}\,(x_1, x_2, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n) = \frac{1}{\sqrt{1-x_i^2}}\, x_{\hat i} \in \mathbb{R}^{n-1}.$$
Taking into account that we only aim at finding lower bounds for CSP, our
strategy is to fix $\alpha = -\lambda_{in} - \varepsilon$ in (12) and $\alpha = \lambda_{i1} - \varepsilon$ in (13), with $\varepsilon > 0$, for
each $i = 1, \ldots, n$. We thus obtain a lower bound as
$$\min_{\|x\|=1} \sum_{i,j,k=1}^{n} a_{ijk}\, x_i x_j x_k \;\ge\; \sum_{i=1}^{n} \min\{\omega_i,\, \nu_i,\, -|a_{iii}|\} \qquad (14)$$
where
$$\omega_i := \min_{x_i \in (-1,0)} \; x_i(1-x_i^2)(\lambda_{in} + \varepsilon) - \sum_{j=1}^{n} \frac{x_i^3\, (v_{ij}^\top a_i)^2}{4(\lambda_{ij} - \lambda_{in} - \varepsilon)} + a_{iii}\, x_i^3, \qquad (15)$$
$$\nu_i := \min_{x_i \in (0,1)} \; x_i(1-x_i^2)(\lambda_{i1} - \varepsilon) - \sum_{j=1}^{n} \frac{x_i^3\, (v_{ij}^\top a_i)^2}{4(\lambda_{ij} - \lambda_{i1} + \varepsilon)} + a_{iii}\, x_i^3.$$
The value −|aiii | in (14) covers the case xi ∈ {−1, 0, 1}. Note that the minimiza-
tion problems (15) are univariate polynomial optimization problems of degree 3
and hence easily solved by a closed formula.
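A univariate cubic over an interval is minimized by comparing the endpoint values with the real critical points, obtained from the quadratic formula applied to the derivative. A sketch of such a closed-formula solver (our own helper; the open intervals in (15) are handled here by simply clipping to the closed interval):

```python
from math import sqrt

def min_cubic(c3, c2, c1, c0, lo, hi):
    """Minimize c3*x^3 + c2*x^2 + c1*x + c0 over [lo, hi] by closed formula."""
    p = lambda x: ((c3 * x + c2) * x + c1) * x + c0
    cands = [lo, hi]
    # Critical points: real roots of the derivative 3*c3*x^2 + 2*c2*x + c1.
    disc = 4.0 * c2 * c2 - 12.0 * c3 * c1
    if c3 != 0.0 and disc >= 0.0:
        r = sqrt(disc)
        cands += [(-2.0 * c2 + r) / (6.0 * c3), (-2.0 * c2 - r) / (6.0 * c3)]
    elif c3 == 0.0 and c2 != 0.0:
        cands.append(-c1 / (2.0 * c2))
    return min(p(x) for x in cands if lo <= x <= hi)

# min of x^3 - x on [-1, 1] is -2/(3*sqrt(3)), attained at x = 1/sqrt(3).
print(min_cubic(1.0, 0.0, -1.0, 0.0, -1.0, 1.0))  # about -0.3849
```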
The computational effort for computing this bound is dominated by the
diagonalization of the n symmetric (n − 1) × (n − 1)-matrices Ã1 , . . . , Ãn .
4 Numerical Results
We implemented the routines to compute lower bounds to the cubic one-spherical
optimization problem CSP, based on the approaches described in the previous
section, in MATLAB R2017b. Our experiments were run on a cluster of 64-bit
Intel(R) Xeon(R) E5-4620 processors running at 2.20 GHz with 252.4 GB of
memory.
We solve both problems in (15) by a closed formula, for 100 different values
of $\varepsilon$ equally distributed in the interval $[10^{-5}, 5]$, and report the best (i.e., largest)
bound obtained.
using the algorithm introduced by Lucidi and Palagi [6]. This, however, does not
yield a closed form solution, so that, unlike in the previous sections, we
cannot minimize the resulting expression exactly over $x_i \in [-1, 1]$. Instead, we
discretize the interval [−1, 1] and use the smallest values obtained for any grid
point. Note however that this approach does not yield a safe lower bound in
general, as we cannot estimate the error incurred by the discretization.
As mentioned above, Approach 3 does not give a rigorous lower bound, since
instead of globally solving problem (16) for each i, we obtain the best optimal
solution among the problems where xi is fixed at a point in a discretization set in
interval [−1, 1]. To have a better idea of the quality of these solutions we did an
experiment where we obtain a solution for (16), for each i, 10 times. At first we
have only 5 discretized points. Then, at each iteration k = 2, . . . , 10, we add 50k
more points to the discretization set. The points added are always equidistant.
At the last iteration, we consider 2255 points. The lower bounds obtained for
different numbers of discretization points (npoints) are depicted in Table 2. We
observe that when increasing the number of points in the discretization set from
155, the solutions obtained are very similar to each other. The percentage relative
difference shown in the last column, given by
$$\text{rel.dif} := \frac{\text{lower.bound}(k-1) - \text{lower.bound}(k)}{|\text{lower.bound}(k-1)|} \times 100,$$
is always smaller than 0.04% when k > 2 for the first two examples, and smaller
than 0.16% for Example 3. These results suggest that the lower bounds obtained
by Approach 3 quickly converge to valid bounds when the number of discretiza-
tion points increases.
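The relative-difference column can be reproduced directly from consecutive bounds; for instance, the first two lower bounds of Example 1 in Table 2 give the reported 41.23126% (helper name is ours):

```python
def rel_dif(prev_bound, cur_bound):
    """Percentage relative difference between consecutive lower bounds."""
    return (prev_bound - cur_bound) / abs(prev_bound) * 100.0

# First two lower bounds of Example 1 (npoints = 5 and npoints = 55).
print(rel_dif(-7.661209e-1, -1.082002))  # about 41.23126
```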
Table 2. Lower bounds obtained by Approach 3 for increasing numbers of discretization points (npoints).

Example 1
 it  npoints  lower bound     rel.dif
  1        5  -7.661209e-001  -
  2       55  -1.082002e+000  41.2312600
  3      155  -1.083636e+000  0.1510226
  4      305  -1.083790e+000  0.0142051
  5      505  -1.084534e+000  0.0686056
  6      755  -1.084610e+000  0.0070751
  7     1055  -1.084610e+000  0.0000000
  8     1405  -1.084610e+000  0.0000000
  9     1805  -1.084610e+000  0.0000000
 10     2255  -1.084688e+000  0.0071920

Example 2
 it  npoints  lower bound     rel.dif
  1        5  -2.117380e+000  -
  2       55  -2.972419e+000  40.3819833
  3      155  -2.973344e+000  0.0311089
  4      305  -2.973650e+000  0.0102988
  5      505  -2.973650e+000  0.0000000
  6      755  -2.973650e+000  0.0000000
  7     1055  -2.973688e+000  0.0012582
  8     1405  -2.973770e+000  0.0027572
  9     1805  -2.973770e+000  0.0000000
 10     2255  -2.973770e+000  0.0000000

Example 3
 it  npoints  lower bound     rel.dif
  1        5  -7.405235e+000  -
  2       55  -1.051328e+001  41.9709771
  3      155  -1.131087e+001  7.5864276
  4      305  -1.131287e+001  0.0176827
  5      505  -1.131301e+001  0.0012360
  6      755  -1.131359e+001  0.0051961
  7     1055  -1.131359e+001  0.0000000
  8     1405  -1.131405e+001  0.0039842
  9     1805  -1.133194e+001  0.1581602
 10     2255  -1.133194e+001  0.0000000
For Approach 3, for each i = 1, . . . , n, we solve the quadratic problem (16), for
200 equally spaced points xi in the interval [−1, 1], using the algorithm described
in [6]. The computational time needed for this approach is large and significantly
increases with n. Therefore, we only apply it for the smallest instance in Table 3.
We emphasize once more that the main objective of applying Approach 3 is to
have an evaluation of the quality of the lower bounds computed by the other
approaches. Note that, for the instances with n = 3, as the number of discretized
points in Approach 3 approaches infinity, its solution should converge
to the best possible bound given by Approach 2. For the larger instances in
Table 4, we apply our two approaches actually intended to generate lower bounds
for the CSP.
Table 4. Results for random instances, n = 5, 10, 30, 50, 100, 200.
References
1. Bader, B.W., Kolda, T.G., et al.: MATLAB Tensor Toolbox Version 3.0-dev, Oct
2017. https://www.tensortoolbox.org
2. Basser, P.J., Mattiello, J., LeBihan, D.: MR diffusion tensor spectroscopy and imag-
ing. Biophys. J. 66, 259–267 (1994)
3. Basser, P.J., Mattiello, J., LeBihan, D.: Estimation of the effective self-diffusion
tensor from the NMR spin echo. J. Magn. Reson. B 103, 247–254 (1994)
4. Basser, P.J., Jones, D.K.: Diffusion-tensor MRI: theory, experimental design and
data analysis-a technical review. NMR Biomed. 15, 456–467 (2002)
5. Liu, C.L., Bammer, R., Acar, B., Moseley, M.E.: Characterizing non-Gaussian diffu-
sion by using generalized diffusion tensors. Magn. Reson. Med. 51, 924–937 (2004)
6. Nesterov, Y.E.: Random walk in a simplex and quadratic optimization over convex
polytopes. CORE Discussion Paper 2003/71 CORE-UCL (2003)
7. Nie, J., Wang, L.: Semidefinite relaxations for best rank-1 tensor approximations.
SIAM J. Matrix Anal. Appl. 35, 1155–1179 (2014)
8. Stern, R.J., Wolkowicz, H.: Indefinite trust region subproblems and nonsymmetric
eigenvalue perturbations. SIAM J. Optim. 5, 286–313 (1995)
9. Zhang, X., Qi, L., Ye, Y.: The cubic spherical optimization problems. Math. Com-
put. 81(279), 1513–1525 (2012)
DC Programming and DCA
A DC Algorithm for Solving
Multiobjective Stochastic Problem via
Exponential Utility Functions
1 Introduction
Multiobjective stochastic linear programming (MOSLP) is a tool for modeling
many concrete real-life problems, because complete data about problem parameters are rarely available. Such a class of problems includes investment
and energy resources planning [1,20], manufacturing systems in production planning [7,8], mineral blending [12], water use planning [2,5] and multi-product
batch plant design [23]. So, to deal with this type of problems, it is required to
introduce a randomness framework.
In order to obtain the solutions for these multiobjective stochastic problems,
it is necessary to combine techniques used in stochastic programming and multi-
objective programming. From this, two approaches are considered, both of them
involve a double transformation. The difference between the two approaches is
the order in which the transformations are carried out. Ben Abdelaziz qualified
as multiobjective approach the perspective which transform first, the stochastic
multiobjective problem into its equivalent multiobjective deterministic problem,
and stochastic approach the techniques that transform in first the stochastic
multiobjective problem into a monobjective stochastic problem [4].
Several interactive methods for solving (MOSLP) problems have been devel-
oped. We can mention the Probabilistic Trade-off Development Method or
PROTRADE by Goicoechea et al. [10]. The Strange method proposed by
c Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 279–288, 2020.
https://doi.org/10.1007/978-3-030-21803-4_29
280 R. Kasri and F. Bellahcene
Teghem et al. [21] and the interactive method with recourse which uses a two
stage mathematical programming model by Klein et al. [11].
In this paper, we propose another approach which is a combination between
the multiobjective approach and a nonconvex technique (Difference of Convex
functions), to solve the multiobjective stochastic linear problem with normal
multivariate distributions. The DC programming and DC Algorithm have been
introduced by Pham Dinh Tao in 1985 and developed by Le Thi and Pham Dinh
since 1994 [13–16]. This method has proved its efficiency in a large number of
nonconvex problems [17–19].
The paper is structured as follows. Section 2 gives the problem formulation.
Section 3 shows how to reformulate the problem by introducing utility
functions and applying the weighting method. Section 4 presents a review of DC
programming and DCA. Section 5 illustrates the application of DC programming
and DCA to the resulting quadratic problem. Our experimental results are
presented in the last section.
2 Problem Statement
Let us consider the multiobjective stochastic linear programming problem for-
mulated as follows:
min_x (c̃1 x, c̃2 x, ..., c̃q x),
s.t. x ∈ S,        (1)
where x = (x1 , x2 , ..., xn ) denotes the n-dimensional vector of decision variables.
The feasible set S is a subset of n-dimensional real vector space IRn characterized
by a set of constraint inequalities of the form Ax ≤ b; where A is an m × n
coefficient matrix and b an m-dimensional column vector. We assume that S
is nonempty and compact in IRn. Each vector c̃k follows a normal distribution
with mean c̄k and covariance matrix Vk. Therefore, every objective c̃k x follows
a normal distribution with mean μk = c̄k x and variance σk² = xᵗ Vk x.
In the following, we are mainly interested in how to transform problem (1)
into an equivalent multiobjective deterministic problem, which in turn will be
reformulated as a DC programming problem.
First, we take into consideration the notion of risk. Assuming that decision
makers' preferences can be represented by utility functions, under plausible
assumptions about decision makers' risk attitudes, problem (1) is interpreted
as:
min_x (E[U(c̃1 x)], E[U(c̃2 x)], ..., E[U(c̃q x)]),
s.t. x ∈ S.        (2)
The utility function U is generally assumed to be continuous and convex. In this
paper, we consider an exponential utility function of the form U (r) = 1 − e−ar ,
A DC Algorithm for Solving Multiobjective Stochastic Problem 281
where r is the value of the objective and a is the coefficient of incurred risk (a
large a corresponds to a conservative attitude). Our choice is motivated by the fact
that exponential utility functions lead to an equivalent quadratic problem,
which encouraged us to design a DC method to solve it simply and accurately.
Therefore, if r ∼ N (μ, σ 2 ), we have:
E(U(r)) = ∫_{−∞}^{+∞} (1 − e^{−ar}) e^{−(r−μ)²/2σ²} / (√(2π) σ) dr = 1 − e^{σ²a²/2 − μa}.

Minimizing E(U(r)) means maximizing σ²a²/2 − μa, or equivalently minimizing μ − σ²a/2.
Our aim is to search for efficient (Pareto optimal) solutions of the multiobjective
deterministic problem (2).
Applying the most widely used method for finding efficient solutions of multiobjective
programming problems, namely the weighted sum method [3,6], we assign
to each objective function in (2) a non-negative weight wk and aggregate the
objective functions into a single function. Thus, problem (2) is
reduced to:
min_x Σ_{k=1}^q wk E[U(c̃k x)],
s.t. x ∈ S,
     wk ∈ Λ ∀k ∈ {1, . . . , q},        (3)
or equivalently

min_x E[U(Σ_{k=1}^q wk c̃k x)],
s.t. x ∈ S,
     wk ∈ Λ ∀k ∈ {1, . . . , q},        (4)
where Λ = {w : Σ_{k=1}^q wk = 1, wk ≥ 0 ∀k ∈ {1, . . . , q}}.
The aggregated objective Σ_{k=1}^q wk c̃k x is then normally distributed with mean
μ = Σ_{k=1}^q wk c̄k x and variance

σ² = Σ_{k=1}^q wk² σk² + 2 Σ_{k<s} wk ws σks,        (6)
where σks denotes the covariance of the random objectives c̃k x and c̃s x. Finally,
we obtain the following quadratic problem:
min_x Σ_{k=1}^q wk c̄kᵗ x − (a/2)(Σ_{k=1}^q wk² σk² + 2 Σ_{k<s} wk ws σks),
s.t. x ∈ S,        (7)
or

min_x Σ_{k=1}^q wk c̄kᵗ x − (a/2)(Σ_{k=1}^q wk² xᵗ Vk x + 2 Σ_{k<s} wk ws xᵗ Vks x),
s.t. x ∈ S,        (8)
where c̄k = (c̄k1, c̄k2, ..., c̄kn) is the k-th component of the expected value of the
random multinormal vector c̃, and Vks and Vk are blocks of the positive definite
covariance matrix V of c̃:

V = ⎛ V1   V12  ...  V1s  ...  V1q ⎞
    ⎜ V21  V2   ...  V2s  ...  V2q ⎟
    ⎜ ...  ...  ...  ...  ...  ... ⎟
    ⎜ Vk1  Vk2  ...  Vks  ...  Vkq ⎟
    ⎜ ...  ...  ...  ...  ...  ... ⎟
    ⎝ Vq1  Vq2  ...  Vqs  ...  Vq  ⎠.
From [15], the most frequently used necessary local optimality conditions for problem (9) are the following:
DCA constructs two sequences {xi } and {y i } (candidates for being primal and
dual solutions, respectively), such that their corresponding limit points satisfy
the local optimality conditions (12) and (13). There are two forms of DCA: the
simplified DCA and the complete DCA. In practice, the simplified DCA is used
more often than the complete DCA because it is less expensive [13]. The simplified
DCA has the following scheme [13,18]:
Simplified DCA Algorithm
Step 1: Let x⁰ ∈ IRⁿ be given. Set i = 0.
Step 2: Calculate yⁱ ∈ ∂h(xⁱ).
Step 3: Calculate xⁱ⁺¹ ∈ ∂g*(yⁱ).
Step 4: If a convergence criterion is satisfied, then stop; else set i = i + 1 and
go to Step 2.
We also note the following properties [15,18]:
– DCA is a descent method without linesearch.
– If g(xⁱ⁺¹) − h(xⁱ⁺¹) = g(xⁱ) − h(xⁱ), then xⁱ is a critical point of f and yⁱ is
a critical point of h* − g*.
– DCA has linear convergence for general DC programs, and finite
convergence for polyhedral DC programs.
– If the optimal value of problem (8) is finite and the sequences {xⁱ} and {yⁱ}
are bounded, then every limit point x* (resp. y*) of the sequence {xⁱ} (resp.
{yⁱ}) is a critical point of g − h (resp. h* − g*).
with

g(x) = χS(x) + Σ_{k=1}^q wk c̄kᵗ x,
h(x) = (a/2)(Σ_{k=1}^q wk² xᵗ Vk x + 2 Σ_{k<s} wk ws xᵗ Vks x).
After that, we compute the two sequences {xⁱ} and {yⁱ} defined as follows:
yⁱ ∈ ∂h(xⁱ) and xⁱ⁺¹ ∈ ∂g*(yⁱ).
Computation of yⁱ:
Since h is differentiable, we choose yⁱ ∈ ∂h(xⁱ) = {∇h(xⁱ)}.
It is equivalent to calculate:
yⁱ = a(Σ_{k=1}^q wk² Vk xⁱ + 2 Σ_{k<s} wk ws Vks xⁱ).        (15)
Computation of xⁱ⁺¹:
We can choose xⁱ⁺¹ ∈ ∂g*(yⁱ) as a solution of the following convex problem:

min { Σ_{k=1}^q wk c̄kᵗ x − xᵗ yⁱ : x ∈ S }.        (16)
The solution xⁱ⁺¹ is considered optimal for problem (14) if one of the following
conditions is satisfied:

|(g − h)(xⁱ⁺¹) − (g − h)(xⁱ)| ≤ ε,        (17)
‖xⁱ⁺¹ − xⁱ‖ ≤ ε.        (18)
Finally, the DC Algorithm that we can apply to problem (8) with the decom-
position (14) can be described as follows:
Algorithm DCAMOSLP
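As a hedged illustration of how the pieces of Sect. 5 fit together, here is a minimal Python sketch of the DCA scheme applied to problem (8), using the data of test problem (19) below with w = (0.8, 0.2)ᵗ and a = 0.01. It is only a sketch, not the authors' implementation: SciPy's `linprog` stands in for the linear subproblem (16), and the covariance blocks V1, V2, V12 are read off the 4×4 matrix V of the example.

```python
import numpy as np
from scipy.optimize import linprog

# Data of the bi-objective test problem (19), weights w, risk coefficient a.
w = np.array([0.8, 0.2])
a = 0.01
cbar = np.array([[0.5, 1.0], [1.0, 2.5]])      # expected objective coefficients
V1 = np.array([[25.0, 0.0], [0.0, 25.0]])      # covariance blocks of V
V2 = np.array([[1.0, 0.0], [0.0, 9.0]])
V12 = np.array([[0.0, 3.0], [3.0, 0.0]])

lin = w @ cbar                                 # linear part of g
# h(x) = (1/2) x^t H x, so that grad h(x) = H x, cf. (15):
H = a * (w[0]**2 * V1 + w[1]**2 * V2 + 2 * w[0] * w[1] * V12)

# Feasible set S: x1 + 2 x2 >= 4, 0 <= x1, x2 <= 3 (written as A_ub x <= b_ub).
A_ub, b_ub = np.array([[-1.0, -2.0]]), np.array([-4.0])
bounds = [(0.0, 3.0), (0.0, 3.0)]

x = np.zeros(2)                                # x0 = (0, 0)
for _ in range(100):
    y = H @ x                                  # y^i = grad h(x^i)
    res = linprog(lin - y, A_ub=A_ub, b_ub=b_ub, bounds=bounds)  # subproblem (16)
    x_new = res.x
    if np.linalg.norm(x_new - x) <= 1e-6:      # stopping test (18)
        x = x_new
        break
    x = x_new

print(x)   # reaches the non-dominated solution (3, 0.5) reported in Table 1
```

The iterate stabilizes after two subproblem solves, matching the small iteration counts reported for small a.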
6 Experimental Results
To demonstrate the performance of our algorithm, two numerical examples
are given in this section. The first, taken from [6], shows the efficiency
of the algorithm. The second illustrates the performance of
DCAMOSLP as certain parameters vary.
Let us consider the following stochastic bi-objective programming problem:
min_x (c̃11 x1 + c̃12 x2, c̃21 x1 + c̃22 x2),
s.t. x1 + 2x2 ≥ 4,        (19)
     x1, x2 ≤ 3,
     x1, x2 ≥ 0,
with c̃ = (c̃11, c̃12, c̃21, c̃22)ᵗ being a multinormal random vector with expected
value c̄ = (0.5, 1, 1, 2.5)ᵗ and positive definite covariance matrix:
V = ⎛ 25   0   0   3 ⎞
    ⎜  0  25   3   0 ⎟
    ⎜  0   3   1   0 ⎟
    ⎝  3   0   0   9 ⎠.
For this test, we take ε = 10⁻⁶ and x⁰ = (0, 0) as the initial point. The
application of algorithm DCAMOSLP to this problem for different values of the
coefficient of incurred risk a and a fixed weight vector w = (0.8, 0.2)ᵗ gives the
results in Table 1, where nbr it is the number of iterations.
The non-dominated solution (3, 0.5) is obtained for values of the parameter
a ≤ 10⁻²; the non-dominated solution reported for w = (0.8, 0.2)ᵗ in Ref. [6] is also (3, 0.5).
We also note that the number of iterations decreases as the
parameter a decreases.
Now we will test the performance of the algorithm with a second problem
which has a larger set of feasible solutions.
min_x (c̃11 x1 + c̃12 x2, c̃21 x1 + c̃22 x2),
s.t. 2x1 + 3x2 ≥ 10,        (20)
     x1, x2 ≤ 5,
     x1, x2 ≥ 0,
The results of applying algorithm DCAMOSLP to this problem for different
values of the parameter a and the weight vector w are given in Table 2.
We observe from the results that algorithm DCAMOSLP gives efficient
solutions of the multiobjective stochastic problem for small values of the
coefficient of incurred risk (a ≤ 10⁻²). The number of iterations decreases as the
parameter a decreases.
7 Conclusion
We have presented a DC optimization approach for solving a multiobjective
stochastic problem with multivariate normal distributions in which the objective
functions are to be minimized. The experimental results show the efficiency of
the algorithm, although further experimental validation and
comparison with existing methods are needed. As future work, an algorithm for
the stochastic multiobjective maximization problem is planned.
References
1. Alarcon-Rodriguez, A., Ault, G., Galloway, S.: Multiobjective planning of dis-
tributed energy resources review of the state-of-the-art. Renew. Sustain. Energy
Rev. 14(5), 1353–1366 (2010)
2. Ben Abdelaziz, F., Mejri, S.: Application of goal programming in a multi-objective
reservoir operation model in Tunisia. Eur. J. Oper. Res. 133, 352–361 (2001)
3. Ben Abdelaziz, F., Lang, P., Nadeau, R.: Distributional unanimity in multiobjec-
tive stochastic linear programming. In: Clmaco, J. (ed.) Multicriteria Analysis.
Springer-Verlag, Heidelberg (1997)
4. Ben Abdelaziz, F.: L’efficacité en programmation multi-objectifs stochastique.
Ph.D. Thesis, Université de Laval, Québec (1992)
5. Bravo, M., Gonzalez, I.: Applying stochastic goal programming: a case study on
water use planning. Eur. J. Oper. Res. 2(196), 1123–1129 (2009)
6. Caballero, R., Cerdá, E., del Mar Muñoz, M., Rey, L.: Stochastic approach versus
multiobjective approach for obtaining efficient solutions in stochastic multiobjec-
tive programming problems. Eur. J. Oper. Res. 158(3), 633–648 (2004)
7. Caner, T.Z., Tamer, U.A.: Tactical level planning in float glass manufacturing
with co-production, random yields and substitutable products. Eur. J. Oper. Res.
199(1), 252–261 (2009)
8. Fazlollahtabar, H., Mahdavi, I.: Applying stochastic programming for optimizing
production time and cost in an automated manufacturing system. In: International
Conference on Computers & Industrial Engineering, Troyes, 6–9 July 2009, pp.
1226–1230 (2009)
9. Geoffrion, A.M.: Proper efficiency and the theory of vector maximization. J.
Math. Anal. Appl. 22(3), 618–630 (1968)
10. Goicoechea, A., Dukstein, L., Bulfin, R.T.: Multiobjective Stochastic Program-
ming. The PROTRADE-Method. Operation Research Society of America (1976)
11. Klein, G., Moskowitz, H., Ravindran, A.: Interactive multiobjective optimization
under uncertainty. Manag. Sci. 36(1), 58–75 (1990)
12. Kumral, M.: Application of chance-constrained programming based on multiobjec-
tive simulated annealing to solve mineral blending problem. Eng. Optim. 35(6),
661–673 (2003)
13. Le Thi, H.A., Pham Dinh, T.: Solving a class of linearly constrained indefinite-
quadratic problems by DC algorithms. J. Glob. Optim. 11(3), 253–285 (1997b)
14. Le Thi, H.A., Pham Dinh, T.: A continuous approach for globally solving linearly
constrained quadratic zero-one programming problems. Optimization 50, 93–120
(2001)
15. Le Thi, H.A., Pham Dinh, T.: The DC (difference of convex functions) program-
ming and DCA revisited with DC models of real world nonconvex optimization
problems. Ann. Oper. Res. 133, 23–46 (2005)
16. Le Thi, H.A., Pham Dinh, T., Huynh, V.N.: Exact penalty and error bounds in
DC programming. J. Glob. Opt. 52, 509–535 (2012)
17. Le Thi, H.A., Pham Dinh, T., Nguyen, C.N., Nguyen, V.T.: DC programming
techniques for solving a class of nonlinear bilevel programs. J. Glob. Opt. 44,
313–337 (2009)
18. Pham Dinh, T., Le Thi, H.A.: Convex analysis approach to DC programming:
theory, algorithms and applications (dedicated to Professor Hoang Tuy on the
occasion of his 70th birthday). Acta Math. Vietnam. 22, 289–355 (1997a)
19. Pham Dinh, T., Nguyen, C.N., Le Thi, H.A.: DC programming and DCA for
globally solving the value-at-risk. Comput. Manag. Sci. 6, 477–501 (2009)
20. Teghem, J., Kunsch, P.: Application of multiobjective stochastic linear program-
ming to power systems planning. Eng. Costs Prod. Econ. 9(13), 83–89 (1985)
21. Teghem, J., Dufrane, D., Thauvoye, M., Kunsch, P.L.: Strange, an interactive
method for multiobjective stochastic linear programming under uncertainty. Eur.
J. Oper. Res. 26(1), 65–82 (1986)
22. Vahidinasab, V., Jadid, S.: Stochastic multiobjective self-scheduling of a power
producer in joint energy & reserves markets. Electr. Power Syst. Res. 80(7), 760–
769 (2010)
23. Wang, Z., Jia, X.P., Shi, L.: Optimization of multi-product batch plant design
under uncertainty with environmental considerations. Clean Technol. Environ. Pol-
icy 12(3), 273–282 (2009)
A DCA-Based Approach for Outage
Constrained Robust Secure
Power-Splitting SWIPT MISO System
1 Introduction
Simultaneous wireless information and power transfer (SWIPT) is efficient to
mitigate energy scarcity [6]. However, the secrecy rate is degraded in SWIPT
systems since radio-frequency signals carry not only information but also energy
[12]. Thus, security in SWIPT systems is a critical issue. Fortunately, physical
layer security (PLS) has proven able to secure communication. The challenge for PLS
is that the transmitter needs to know the channel state information (CSI). In
practice, it is hard to obtain perfect CSI. The works [2,3,19,21] have considered
robust secure transmission based on a deterministic norm-bounded uncertainty
model for SWIPT systems. These works correspond to the active eavesdropping
scenario. In the case of passive eavesdropping, the channel error cannot be known
exactly.
In such a case, the concept of outage probability is an effective approach for
secure SWIPT design. The problems in this field, such as transmit power
minimization [5,7,20] and secrecy rate maximization [1,13,14], are nonconvex and thus
hard to solve. To overcome the nonconvexity, the semidefinite relaxation
(SDR) technique was applied in [5,7,20]. However, the relaxed problems
only give an upper bound of the objective value and, hence, the performance is
degraded. Chen et al. then applied successive convex approximation (SCA) for
solving the outage constrained secrecy rate maximization problem in [1].
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 289–298, 2020.
https://doi.org/10.1007/978-3-030-21803-4_30
290 P. A. Nguyen and H. A. L. Thi
2 System Model
In this section, we briefly describe the optimization problem reformulated in [1].
Consider a MISO SWIPT network with K transmit-receive pairs in the presence
of one eavesdropper on each link. Each transmitter is equipped with N antennas.
The channel vectors from the jth transmitter to the kth legitimate user and the kth
eavesdropper are denoted by hjk ∈ Cᴺ and gjk ∈ Cᴺ, where gjk ∼ CN(0, Gjk) and
hjk = h̄jk + Δhjk with Δhjkᴴ Qjk Δhjk ≤ 1, j, k ∈ K ≜ {1, ..., K}. Let wk ∈ Cᴺ be the
transmit beamforming vector and ρk ∈ (0, 1) the power splitting factor of the
kth link. The additive Gaussian noises at the kth information receiver (IR), the
kth energy receiver (ER) and the kth eavesdropper are ni,k ∼ CN(0, σi,k²), i = 1, 2, 3.
The considered problem is maximizing the minimal secrecy rate under the
following constraints:

The EH constraint: (1 − ρk) Σ_{j=1}^K |hjkᴴ wj|² ≥ Ek, 0 < ρk < 1, j, k ∈ K.        (1)

The power constraint: Σ_{j=1}^K ‖wj‖² ≤ Pmax.        (2)
A DCA-Based Approach for Outage Constrained 291
s. t. (1)–(4).        (6)
Denote X = ({ρk}, {Wk}, {βk}, {λjk}, {fjk}, {RI,k}, {sk}, {ak}, {Rk}, R).
The problem is difficult due to the nonconvexity of constraints (16)–(18).
Since it is hard to optimize over all variables simultaneously, a natural idea is to
decouple them into two variable subsets (X and ξ) and then use an alternating
optimization procedure.
First, we fix ξk = ξᵗ and compute X. The problem is nonconvex due to
constraints (17) and (18). We propose a DC decomposition for constraint
(17) as F0(zk) = G0(zk) − H0(zk) ≤ 0, where zk = (Rk, RI,k, sk), G0(zk) =
Rk − RI,k, H0(zk) = −log(1 + ξkᵗ sk). For constraint (18): F1(z) = G1(z) −
H1(z) ≤ 0, where z = (x, u, v), G1(z) = (ρ/2)(x² + u² + v²), H1(z) = (ρ/2)(x² +
u² + v²) + u − 2ˣv. It is not easy to estimate ρ such that H1 is convex; thus,
we investigate the DCA-Like based algorithm [9]. Then, we fix X = Xᵗ⁺¹ and
compute ξk.
Our proposed algorithm is described in Algorithm 1.
Algorithm 1. The DCA-based algorithm for solving problem (5)
repeat
1. Fix ξk = ξkᵗ, compute Xᵗ⁺¹.
   repeat
   1.2. Solve the convex subproblem

        max_X R
        s. t. (10)–(16),
              R − RI,k + log(1 + ξkᵗ skˡ) + ξkᵗ(sk − skˡ)/((1 + ξkᵗ skˡ) ln 2) ≤ 0, k ∈ K,
              G1(z) − H1(zˡ) − ⟨∇H1(zˡ), z − zˡ⟩ ≤ 0.

   1.3. while H1(zˡ⁺¹) < H1(zˡ) + ⟨∇H1(zˡ), zˡ⁺¹ − zˡ⟩:
        ρ ← ηρ and update zˡ⁺¹ by step 1.2.
   1.4. l ← l + 1,
   until the stopping condition is met.
2. Fix X = Xᵗ⁺¹, compute ξkᵗ⁺¹.
   Initialization: let ξkᵗ be an initial point, l ← t.
   repeat
   2.1. Solve the convex subproblem

        max_{ξk} Rᵗ⁺¹
        s. t. −ln pk − ξk σ3k² − Σ_{j≠k} ln(1 + ξk Tr(Gjk Wjᵗ⁺¹)) ≤ 0, k ∈ K,
              Rᵗ⁺¹ − RI,k + log(1 + ξkˡ skᵗ⁺¹) + skᵗ⁺¹(ξk − ξkˡ)/((1 + ξkˡ skᵗ⁺¹) ln 2) ≤ 0, k ∈ K.

   2.2. l ← l + 1,
   until the stopping condition is met.
3. t ← t + 1,
until the stopping condition is met.
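The ρ-update of step 1.3 can be illustrated on a one-dimensional toy problem (the function f below is our illustrative choice, not from the paper): start with a deliberately small ρ, and grow it by a factor η whenever the linearization inequality on H fails at the new iterate.

```python
# Toy objective f(z) = z^4/4 - z^2 (nonconvex, minimizers at z = ±sqrt(2)).
f = lambda z: z**4 / 4 - z**2
df = lambda z: z**3 - 2 * z

rho, eta = 0.1, 2.0                            # deliberately small rho; growth factor eta > 1
h = lambda z, rho: rho / 2 * z**2 - f(z)       # H component; convex only for large rho
dh = lambda z, rho: rho * z - df(z)

z = 1.0
for _ in range(200):
    z_new = dh(z, rho) / rho                   # argmin of (rho/2) z^2 - <dh(z^l), z>
    # DCA-Like test: accept z_new only if the linearization inequality on h holds;
    # otherwise increase rho and recompute (the 'while' step 1.3 of Algorithm 1).
    while h(z_new, rho) < h(z, rho) + dh(z, rho) * (z_new - z) - 1e-12:
        rho *= eta
        z_new = dh(z, rho) / rho
    if abs(z_new - z) < 1e-10:
        z = z_new
        break
    z = z_new

print(z, abs(df(z)))   # a stationary point of f, with near-zero gradient
```

Each accepted step is a gradient step of size 1/ρ, so ρ is grown just enough to keep the DC interpretation valid, rather than fixed conservatively in advance.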
Remark 1. We can apply DCA to derive the beamforming vector wj* from Wj*.
Indeed, Wj = wj wjᴴ can be rewritten as the following problem (see [18]):

min_{A1,A2,wj} 0,        (19)

s. t. ⎛ A1   Wj   wj ⎞
      ⎜ Wjᴴ  A2   wj ⎟ ⪰ 0,        (20)
      ⎝ wjᴴ  wjᴴ  1  ⎠

      Tr(A1 − wj wjᴴ) ≤ 0.        (21)

Linearizing the concave part of (21) at the current iterate wjᵏ, DCA solves at each
iteration:

min_{A1,A2,wj} 0,
s. t. (20),
      −‖wjᵏ‖² + 2(ℜ(wjᵏ))ᵀℜ(wj) + 2(ℑ(wjᵏ))ᵀℑ(wj) ≥ Tr(A1).
4 Numerical Experiments
initial point. The CVX¹ toolbox is used to solve the subproblems. The elements of
all the legitimate channel vectors are generated by a complex Gaussian distribution
with zero mean and unit variance. For the channel error models, Qjk = ε⁻² I and
Gjk = γjk I, where γjk denotes the average channel correlation gain. We assume that
E = Ek, p = pk, σ² = σ1k² = σ2k² = σ3k², and γ = γjk for all k, j. Unless otherwise
specified, we set by default K = 5, n = 4, P = 5, E = 0.001, p = 0.1, σ = 0.01,
γ = 1, ε = 0.05.
Comments on numerical results:
We are interested in the effect of the coefficient error ε, the channel correlation
gain and the secrecy outage probabilities. For each value of these parameters, the
algorithms are tested on 10 independent channel realizations and the average
values of the secrecy rate (SSR-AVER) and the CPU time in seconds (CPU(s)) are
reported.
In Fig. 1, we compare the SSR-AVER of the DCA and SCA algorithms under
different coefficient errors. It can be seen that DCA is much more efficient than
SCA in terms of both SSR-AVER and CPU time. In terms of SSR-AVER, the
gain varies from 1.1295 to 2.2538; concerning CPU time, the ratio of gain varies
from 3.9 to 4.4 times.
Figure 2 presents the SSR-AVER under different channel correlation gains.
DCA always provides better solutions. Regarding SSR-AVER, the gain varies
from 1.9620 to 2.2722; concerning CPU time, the ratio of gain is around 4.3
times.
Figure 3 shows the results for increasing secrecy outage probability. This figure
again shows the superiority of DCA over SCA in terms
of both SSR-AVER and CPU time. In terms of SSR-AVER, the gain varies from
2.102 to 2.2538; as for CPU time, the ratio of gain varies from 4.2 to 5.6 times.
In summary, the proposed DCA algorithm is more efficient than the existing
method in terms of both solution quality and speed.
Fig. 1. SSR-AVER (left) and CPU time in seconds (right) of DCA and SCA under different coefficient errors.
¹ Grant, M., Boyd, S.: CVX: Matlab software for disciplined convex programming, version 2.0. http://cvxr.com/cvx (2012).
Fig. 2. SSR-AVER (left) and CPU time in seconds (right) of DCA and SCA under different channel correlation gains.
Fig. 3. SSR-AVER (left) and CPU time in seconds (right) of DCA and SCA under different secrecy outage probabilities.
5 Conclusions
In this paper, we considered the secrecy rate maximization problem under
total transmit power, energy harvesting, and outage probability
constraints. By studying the special structure of this problem, we reformulated it
as a DC program. To reduce the dimension of the problem, we proposed
an efficient approach based on DCA and alternating optimization. Numerical
experiments confirmed the efficiency of our proposed algorithm in comparison
with the existing method.
Appendix A
First, we transform the EH constraint. According to [4], we rewrite |hjkᴴ wj|² =
wjᴴ(H̄jk + Djk)wj, where H̄jk = h̄jk h̄jkᴴ and Djk = h̄jk Δhjkᴴ + Δhjk h̄jkᴴ +
Δhjk Δhjkᴴ. By applying the triangle inequality and the Cauchy-Schwarz
inequality, we have

‖Djk‖ ≤ ‖h̄jk Δhjkᴴ‖ + ‖Δhjk h̄jkᴴ‖ + ‖Δhjk Δhjkᴴ‖
      ≤ ‖Qjk⁻¹‖ + 2‖h̄jk‖ ‖Qjk⁻¹‖^{1/2} = εjk  ⇒  −εjk IN ⪯ Djk ⪯ εjk IN,

thus Tr((H̄jk − εjk IN)Wj) ≤ |hjkᴴ wj|² = Tr((H̄jk + Djk)Wj).
Therefore, the EH constraint is recast as Σ_{j=1}^K Tr(H̄jk Wj − εjk Wj) ≥ Ek/(1 − ρk).
Next,

RI,k ≤ log(1 + ρk hkkᴴ Wk hkk / (ρk Σ_{j≠k} hjkᴴ Wj hjk + ρk σ1k² + σ2k²)).        (22)

The relaxed constraints hold with equalities at the optimal solution [1].
By using the slack variables ξk = (2^{RI,k − Rk} − 1)/Tr(Gkk Wk), the outage
constraint is transformed into

−ln pk − ξk σ3k² − Σ_{j≠k} ln(1 + ξk Tr(Gjk Wj)) ≤ 0.

Indeed, ξk = (2^{RI,k − Rk} − 1)/Tr(Gkk Wk) at the optimum; if not, we could increase
Rk. In addition, Tr(Gkk Wk) = sk at the optimum; otherwise, we could decrease sk,
which would increase Rk through the same relation.
Thus, if the relaxed constraints did not hold with equalities at the optimum, the
objective function could be further increased.
References
1. Chen, D., He, Y., Lin, X., Zhao, R.: Both worst-case and chance-constrained robust
secure SWIPT in MISO interference channels. IEEE Trans. Inf. Forensics Secur.
13(2), 306–317 (2018)
2. Chu, Z., Zhu, Z., Hussein, J.: Robust optimization for AN-aided transmission and
power splitting for secure MISO SWIPT system. IEEE Commun. Lett. 20(8),
1571–1574 (2016)
3. Feng, Y., Yang, Z., Zhu, W., Li, Q., Lv, B.: Robust cooperative secure beamforming
for simultaneous wireless information and power transfer in amplify-and-forward
relay networks. IEEE Trans. Veh. Technol. 66(3), 2354–2366 (2017)
4. Gharavol, E.A., Liang, Y., Mouthaan, K.: Robust downlink beamforming in mul-
tiuser MISO cognitive radio networks with imperfect channel-state information.
IEEE Trans. Veh. Technol. 59(6), 2852–2860 (2010)
5. Khandaker, M.R.A., Wong, K., Zhang, Y., Zheng, Z.: Probabilistically robust
SWIPT for secrecy misome systems. IEEE Trans. Inf. Forensics Secur. 12(1), 211–
226 (2017)
298 P. A. Nguyen and H. A. L. Thi
6. Krikidis, I., Timotheou, S., Nikolaou, S., Zheng, G., Ng, D.W.K., Schober, R.:
Simultaneous wireless information and power transfer in modern communication
systems. IEEE Commun. Mag. 52(11), 104–110 (2014)
7. Le, T.A., Vien, Q., Nguyen, H.X., Ng, D.W.K., Schober, R.: Robust chance-
constrained optimization for power-efficient and secure SWIPT systems. IEEE
Trans. Green Commun. Netw. 1(3), 333–346 (2017)
8. Le Thi, H.A., Huynh, V.N., Pham Dinh, T.: DC programming and DCA for general
DC programs. In: Advanced Computational Methods for Knowledge Engineering,
pp. 15–35 (2014)
9. Le Thi, H.A., Le, H.M., Phan, D.N., Tran, B.: A DCA-like algorithm and
its accelerated version with application in data visualization (2018). CoRR
arXiv:1806.09620
10. Le Thi, H.A., Pham Dinh, T.: The DC (difference of convex functions) program-
ming and DCA revisited with DC models of real world nonconvex optimization
problems. Ann. Oper. Res. 133(1), 23–46 (2005)
11. Le Thi, H.A., Pham Dinh, T.: DC programming and DCA: thirty years of devel-
opments. Math. Program. 169(1), 5–68 (2018)
12. Lei, H., Ansari, I.S., Pan, G., Alomair, B., Alouini, M.: Secrecy capacity analysis
over α − μ fading channels. IEEE Commun. Lett. 21(6), 1445–1448 (2017)
13. Li, Q., Ma, W.: Secrecy rate maximization of a MISO channel with multiple multi-
antenna eavesdroppers via semidefinite programming. In: 2010 IEEE International
Conference on Acoustics, Speech and Signal Processing, pp. 3042–3045 (2010)
14. Ma, S., Hong, M., Song, E., Wang, X., Sun, D.: Outage constrained robust secure
transmission for MISO wiretap channels. IEEE Trans. Wirel. Commun. 13(10),
5558–5570 (2014)
15. Pham Dinh, T., Le Thi, H.A.: Convex analysis approach to D.C. programming:
theory, algorithm and applications. Acta Math. Vietnam. 22(1), 289–355 (1997)
16. Pham Dinh, T., Le Thi, H.A.: D.C. optimization algorithms for solving the trust
region subproblem. SIAM J. Optim. 8(2), 476–505 (1998)
17. Pham Dinh, T., Le Thi, H.A.: Recent advances in DC programming and DCA. In:
Transactions on Computational Intelligence XIII. pp. 1–37. Springer, Heidelberg
(2014)
18. Rashid, U., Tuan, H.D., Kha, H.H., Nguyen, H.H.: Joint optimization of source
precoding and relay beamforming in wireless MIMO relay networks. IEEE Trans.
Commun. 62(2), 488–499 (2014)
19. Tian, M., Huang, X., Zhang, Q., Qin, J.: Robust AN-aided secure transmission
scheme in MISO channels with simultaneous wireless information and power trans-
fer. IEEE Signal Process. Lett. 22(6), 723–727 (2015)
20. Wang, K., So, A.M., Chang, T., Ma, W., Chi, C.: Outage constrained robust trans-
mit optimization for multiuser MISO downlinks: tractable approximations by conic
optimization. IEEE Trans. Signal Process. 62(21), 5690–5705 (2014)
21. Wang, S., Wang, B.: Robust secure transmit design in MIMO channels with simul-
taneous wireless information and power transfer. IEEE Signal Process. Lett. 22(11),
2147–2151 (2015)
DCA-Like, GA and MBO: A Novel
Hybrid Approach for Binary Quadratic
Programs
1 Introduction
Binary quadratic programs (BQPs) are NP-hard combinatorial optimization
problems which take the following mathematical form:
(BQP)  min Z(x) = xᵀQx + cᵀx,
       s.t. Ax = b,
            Bx ≤ b′,        (1)
            x ∈ {0, 1}ⁿ,

where Q ∈ R^{n×n}, c ∈ Rⁿ, A ∈ R^{m×n}, B ∈ R^{p×n}, b ∈ Rᵐ, b′ ∈ Rᵖ.
BQP is a common model of several problems in different areas including
scheduling, facility location, assignment and knapsack.
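To make the form (1) concrete, the following sketch enumerates a tiny instance by brute force (the data Q, c, A, b, B, b′ are illustrative, not from the paper). Exhaustive enumeration is only viable for very small n, which is precisely why the heuristic and DCA-based methods discussed below are needed.

```python
import itertools
import numpy as np

# A tiny illustrative BQP instance in the form (1):
Q = np.array([[2, -1, 0], [-1, 2, -1], [0, -1, 2]])
c = np.array([-1, -1, -1])
A, b = np.array([[1, 1, 1]]), np.array([2])       # equality: x1 + x2 + x3 = 2
B, bp = np.array([[1, 0, 1]]), np.array([1])      # inequality: x1 + x3 <= 1

best_x, best_val = None, np.inf
for bits in itertools.product([0, 1], repeat=3):  # enumerate {0,1}^n
    x = np.array(bits)
    if np.array_equal(A @ x, b) and np.all(B @ x <= bp):
        val = x @ Q @ x + c @ x                   # Z(x) = x^T Q x + c^T x
        if val < best_val:
            best_x, best_val = x, val

print(best_x, best_val)   # the optimal feasible point and its objective value
```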
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 299–309, 2020.
https://doi.org/10.1007/978-3-030-21803-4_31
300 S. Samir et al.
Many exact methods have been developed to solve the BQP, such as branch-
and-bound and cutting plane methods. The main limitation of these methods is their
exponential execution time; thus, they quickly become unusable for realistic-size
instances. To solve these problems effectively, researchers have directed their
efforts towards the development of methods known as heuristics, which have been
widely studied: e.g., genetic algorithms, scatter search, ant colony optimization,
tabu search, variable neighborhood search, cuckoo search, and migrating bird
optimization. DC (Difference of Convex functions) programming and DCA (DC
Algorithm) constitute another research direction which has been successfully
applied to BQP (see [8,9,17]).
The main contribution of this study lies in a new cooperative
approach combining DCA-like and metaheuristic methods, named COP-DCAl-Meta,
for solving BQP. COP-DCAl-Meta is inspired by the collaborative metaheuristic
optimization scheme proposed by Yagouni and Hoai An in 2014 [20]. COP-
DCAl-Meta combines DCA-like (a new variant of DCA), a genetic
algorithm and the migrating bird optimization metaheuristic in a cooperative
way. The participating algorithms start running in parallel; then, the solution of
every algorithm is distributed to the other ones via MPI (Message Passing
Interface). We opted for DCA-like due to its power for solving nonconvex
programs in different areas. As for GA and MBO, their proven efficiency
in combinatorial optimization motivated us to use them. To evaluate the
performance of COP-DCAl-Meta, we test it on instances of the well-known
quadratic assignment problem (QAP).
DC programming and DCA were first introduced in 1985 by Pham Dinh Tao
and have been extensively developed since 1994 by Le Thi Hoai An and
Pham Dinh Tao. For an introduction to DC programming and DCA, we refer the reader
to the seminal survey [9]. DCA is a continuous approach
which has shown its efficiency for solving combinatorial optimization problems [7,
18] by using exact penalty techniques (see [10,11]).
Genetic algorithms are evolutionary algorithms proposed by Holland [3] and
inspired by the process of natural selection. In the literature, we can find a lot
of applications of genetic algorithms [4,13,14].
Migrating bird optimization (MBO) was presented by Duman et al. in 2012
[1]. MBO is inspired by the V-formation flight of migrating birds. It has been
proved to be efficient in combinatorial optimization (see, e.g., [1,2,19]).
This paper is organized as follows. After the introduction, the component
algorithms, including DCA-like, GA and MBO, and their application for solving BQP
are briefly presented in Sect. 2. Section 3 is devoted to the cooperative approach
for BQP. Numerical results are reported and discussed in Sect. 4, and Sect. 5
concludes the paper.
where g, h ∈ Γ0(Rⁿ), the set of all lower semi-continuous proper convex
functions on Rⁿ. g and h are called DC components, while g − h is a DC
decomposition of f. To avoid ambiguity in DC programming, (+∞) − (+∞) = +∞
is the usual convention [16]. A constrained DC program is defined by

where C is a nonempty closed convex set. It can be clearly seen that (3) is
a special case of (2). A constrained DC program can be transformed into an
unconstrained DC program by adding the indicator function χC of the set C
(χC(x) = 0 if x ∈ C, +∞ otherwise) to the first DC component g:
According to the exact penalty theorem [10], there exists a number t0 > 0
such that for all t > t0 the problems (1) and (7) are equivalent in the sense that
they have the same optimal value and the same optimal solution set.
We can use the following DC components of F:
• Gρ(x) = (ρ/2)‖x‖² + χΩ(x),
• Hρ(x) = (ρ/2)‖x‖² − Z(x) − t p(x),
with ρ > 0.
We can see that Gρ(x) is convex since Ω is convex. As for Hρ(x), its
convexity depends on the value of ρ. In practice, the best value of ρ is hard to
determine and is estimated by a large value. With ρ larger than the spectral radius
of ∇²Z(x), denoted ρ(∇²Z(x)), Hρ(x) is convex.
Since ρ(∇²Z(x)) ≤ ‖∇²Z(x)‖, we can choose ρ = ‖∇²Z(x)‖, where

‖∇²Z(x)‖ = Σ_{i,j,p,q} (aip djq + api dqj).        (8)

However, ρ = ‖∇²Z(x)‖ is quite large, which can affect the convergence of
DCA. Hence, we update ρ as in the DCA-like algorithm [5].
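For a QAP objective read as Z(x) = Σ_{i,j,p,q} a_ip d_jq x_ij x_pq (our assumed reading of the data a, d entering (8)), the bound can be checked numerically: the sum in (8) dominates the spectral radius of the Hessian, since for a nonnegative symmetric matrix the spectral radius is at most any row sum, hence at most the total entry sum.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.integers(0, 10, (n, n)).astype(float)   # illustrative QAP "flow" matrix
D = rng.integers(0, 10, (n, n)).astype(float)   # illustrative QAP "distance" matrix

# Hessian of Z(x), rows/columns indexed by the pair (i, j):
# H[(i,j),(p,q)] = a_ip d_jq + a_pi d_qj
H = np.einsum('ip,jq->ijpq', A, D) + np.einsum('pi,qj->ijpq', A, D)
H = H.reshape(n * n, n * n)

rho_bound = np.einsum('ip,jq->', A, D) + np.einsum('pi,qj->', A, D)  # formula (8)
spec = max(abs(np.linalg.eigvalsh(H)))          # exact spectral radius of the Hessian

print(rho_bound, spec)   # rho_bound dominates spec, but by a large margin
```

The printed margin illustrates why ρ = ‖∇²Z(x)‖ is "quite large" and why an adaptive ρ-update is preferable.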
Contemplating and studying the V-formation flight of migrating birds gave
rise to a new metaheuristic for solving combinatorial optimization
problems. This metaheuristic, called migrating bird optimization (MBO), was
introduced by Duman et al. in [1]. It consists in exploiting the power saving of
the V-shaped flight to minimize (or maximize) an objective function. MBO
is based on a population of birds and uses a neighborhood search technique.
A bird represents a solution of the combinatorial optimization problem. To keep
the V-formation context, one bird is considered the leader and the others
constitute the right and left lines. MBO processes the birds from the leader to the
tails along the lines, as shown in Algorithm 3.
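As a rough illustration of the MBO mechanics just described (a leader feeding two lines, neighbor sharing down each line, leader rotation), here is a simplified Python sketch on a toy binary problem. The single-bit-flip neighbor operator, the sharing of one unused neighbor per follower, and all parameter values are our illustrative choices, not those of Algorithm 3.

```python
import random

def flip_neighbor(x, rng):
    """One random single-bit-flip neighbor of a binary tuple."""
    i = rng.randrange(len(x))
    return x[:i] + (1 - x[i],) + x[i + 1:]

def mbo_minimize(cost, n, n_birds=7, k=5, s=1, flights=40, seed=0):
    """Compact MBO sketch: leader at position 0, two lines behind it."""
    rng = random.Random(seed)
    birds = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(n_birds)]
    order = list(range(n_birds))             # order[0] is the current leader
    for _ in range(flights):
        shared = {0: []}                     # neighbors passed down each line
        for pos, b in enumerate(order):
            cand = [flip_neighbor(birds[b], rng) for _ in range(k)] + shared.get(pos, [])
            cand.sort(key=cost)
            if cost(cand[0]) < cost(birds[b]):
                birds[b] = cand[0]           # bird moves to its best neighbor
                cand = cand[1:]
            # pass the s best unused neighbors to the next bird on the same line
            nxt = pos + 1 if pos == 0 else pos + 2
            if nxt < n_birds:
                shared.setdefault(nxt, []).extend(cand[:s])
            if pos == 0 and n_birds > 2:     # the leader also feeds the other line
                shared.setdefault(2, []).extend(cand[s:2 * s])
        order = order[1:] + order[:1]        # rotate the tired leader to the tail
    return min(birds, key=cost)

# Toy problem: minimize the Hamming distance to a target pattern.
target = (1, 0, 1, 1, 0, 0, 1, 0)
cost = lambda x: sum(a != b for a, b in zip(x, target))
best = mbo_minimize(cost, len(target))
print(best, cost(best))
```

On this toy landscape the population quickly reaches the optimum; on BQP/QAP instances the same skeleton is used with permutation- or assignment-based neighborhoods.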
algorithms in parallel, each solving the whole problem. The cooperation consists in
exchanging and distributing information. Using the master-slave model, we can
describe COP-DCAl-Meta as follows.
The Parallel Initialization.
The master chooses an initial point for DCA-like. Then, it computes ρ using (8).
Both of slaves generate randomly the initial populations.
A Cycle (Parallel and Distributed).
The master runs one iteration of DCA-like. If a binary solution is found then
it distributes a message indicating the end of the actual cycle. Otherwise, the
order of continuing execution is broadcasted. The slave 1 (slave 2) performs one
iteration of GA (MBO) and receives the information to rerun or to go to the
next step. This step will be repeated until getting a binary solution by DCA-like
which means until the end of the cycle.
Parallel Exchanging and Evaluation.
At the end of a cycle, the component algorithms exchange their objective values (S_DCA, S_GA, and S_MBO). An evaluation is then performed to determine which algorithm obtained the best solution. In this step, every process also broadcasts whether its algorithm satisfies its stopping criterion (Stop-DCA, Stop-GA, and Stop-MBO).
If all criteria are met, COP-DCAl-Meta terminates and the final solution is BFS*. Otherwise, the best process distributes its solution (BFS) to the others, which use it as an initial point for DCA-like, as a new chromosome for GA, or as a new bird for MBO. At this point, COP-DCAl-Meta starts a new cycle (Fig. 1).
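The cycle structure above can be simulated sequentially as follows (the actual scheme runs the three components in parallel under the master-slave model; every callable here is an illustrative placeholder, not the paper's code):

```python
def cop_cycle(dca_step, ga_step, mbo_step, is_binary, best_sol):
    """One COP-DCAl-Meta cycle, simulated sequentially: iterate all three
    components until DCA-like yields a binary solution, then exchange and
    evaluate to pick the cycle's best solution."""
    while True:
        s_dca = dca_step()        # master: one DCA-like iteration
        s_ga = ga_step()          # slave 1: one GA iteration
        s_mbo = mbo_step()        # slave 2: one MBO iteration
        if is_binary(s_dca):      # end of cycle: exchange and evaluate
            return best_sol(s_dca, s_ga, s_mbo)
```

In a real implementation, the `is_binary` broadcast and the final exchange would be messages between processes rather than function calls.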
DCA-Like, GA and MBO: A Novel Hybrid Approach 305
4 Numerical Results
as a convex quadratic solver. All experiments were carried out on a Dell desktop computer with an Intel Core(TM) i5-6600 CPU at 3.30 GHz and 8 GB RAM. To study the performance of our approach, we test on seven instances of the quadratic assignment problem taken from QAPLIB (A Quadratic Assignment Problem Library¹) in OR-Library.² We compare COP-DCAl-Meta, the participating algorithms, and the best known lower bound (BKLB, which is either the optimal solution or the best known one) on these instances. The comparison takes into account the objective value, the gap (see Eq. (10)), and the running time measured in seconds.
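Eq. (10) itself is not reproduced in this excerpt; assuming the standard relative-gap definition against the BKLB (an assumption on our part), it would read:

```python
def gap(objective, bklb):
    """Relative gap (%) between an objective value and the best known
    lower bound. The exact Eq. (10) is not shown in this excerpt, so this
    standard definition is an assumption."""
    return 100.0 * (objective - bklb) / bklb
```

A gap of 0% then corresponds to attaining the BKLB.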
The results of the four algorithms are reported in Table 1. The first column gives the ID of each dataset, its size (which varies from 12 to 90), and the BKLB taken from OR-Library. The remaining columns show the objective value, the gap, and the CPU time obtained by each algorithm. From the numerical results, it can be seen that:
– In terms of the objective value and the gap, the cooperative approach COP-DCAl-Meta is the most efficient, followed by MBO, DCA-like, and finally GA.
– COP-DCAl-Meta attains the BKLB on 6 of the 7 instances.
– The cooperation between the component algorithms allows them to change their behavior and explore more promising regions, yielding better results.
– DCA-like is very efficient on large instances; COP-DCAl-Meta can exploit this advantage to improve further.
– Regarding running time, COP-DCAl-Meta consumes more time than each component algorithm. The difference is due to the communication between the members.
¹ http://anjos.mgi.polymtl.ca/qaplib//inst.html.
² http://people.brunel.ac.uk/mastjjb/jeb/info.html.
5 Conclusion
thing, shows that the cooperation between the component algorithms has been successfully realized. As an avenue for further research, we can combine DCA with other algorithms to solve other nonconvex problems.
References
1. Duman, E., Uysal, M., Alkaya, A.F.: Migrating birds optimization: a new meta-
heuristic approach and its performance on quadratic assignment problem. Inf. Sci.
217, 65–77 (2012)
2. Duman, E., Elikucuk, I.: Solving credit card fraud detection problem by the new
metaheuristics migrating birds optimization. In: Proceedings of the 12th Inter-
national Conference on Artificial Neural Networks. Advances in Computational
Intelligence, vol. II, pp. 62–71. Springer (2013)
3. Holland, J.H.: Adaptation in Natural and Artificial Systems. University of Michi-
gan Press (1975)
4. Julstrom, B.A.: Greedy, genetic, and greedy genetic algorithms for the quadratic
knapsack problem. In: Proceedings of the 7th Annual Conference on Genetic and
Evolutionary Computation, pp. 607–614. ACM (2005)
5. Hoai An, L.T., Le, H.M., Phan, D.N., Tran, B.: A DCA-like algorithm and its
accelerated version with application in data visualization. https://arxiv.org/abs/
1806.09620 (2018)
6. Hoai An, L.T., Pham, D.T.: Solving a class of linearly constrained indefinite
quadratic problems by DC algorithms. J. Glob. Optim. 11(3), 253–285 (1997)
7. Hoai An, L.T., Pham, D.T.: A continuous approach for globally solving linearly constrained quadratic zero-one programming problems. Optimization 50(1–2), 93–120 (2001)
8. Hoai An, L.T., Pham, D.T.: A continuous approach for large-scale constrained quadratic zero-one programming. Optimization 45(3), 1–28 (2001). (In honor of Professor ELSTER, Founder of the Journal Optimization)
9. Hoai An, L.T., Pham, D.T.: DC programming and DCA: thirty years of develop-
ments. Math. Program. 169(1), 5–68 (2018)
10. Hoai An, L.T., Pham, D.T., Le, D.M.: Exact penalty in DC programming. Viet-
nam. J. Math. 27(2), 169–178 (1999)
11. Hoai An, L.T., Pham, D.T., Van Ngai, H.: Exact penalty and error bounds in DC
programming. J. Glob. Optim. 52(3), 509–535 (2011)
12. Hoai An, L.T., Pham, D.T., Yen, N.D.: Properties of two DC algorithms in
quadratic programming. J. Glob. Optim. 49(3), 481–495 (2011)
13. Merz, P., Freisleben, B.: Genetic algorithms for binary quadratic programming. In:
Proceedings of the 1st Annual Conference on Genetic and Evolutionary Computa-
tion, vol. 1, pp. 417–424. Morgan Kaufmann Publishers Inc. (1999)
14. Misevicius, A., Staneviciene, E.: A new hybrid genetic algorithm for the grey pat-
tern quadratic assignment problem. Inf. Technol. Control. 47(3), 503–520 (2018)
15. Osborn, A.F.: Your creative power: how to use imagination to brighten life, to get
ahead. How To Organize a Squad To Create Ideas, pp. 265–274. Charles Scribner’s
Sons, New York (1948). ch. XXXIII.
16. Pham, D.T., Hoai An, L.T.: Convex analysis approach to DC programming: theory,
algorithm and applications. Acta Mathematica Vietnamica, 22(1), 289–355 (1997)
17. Pham, D.T., Hoai An, L.T., Akoa, F.: Combining DCA (DC Algorithms) and
interior point techniques for large-scale nonconvex quadratic programming. Optim. Methods Softw. 23, 609–629 (2008)
18. Pham, D.T., Canh, N.N., Hoai An, L.T.: An efficient combined DCA and B&B
using DC/SDP relaxation for globally solving binary quadratic programs. J. Glob.
Optim. 48(4), 595–632 (2010)
19. Tongur, V., Ülker, E.: Migrating birds optimization for flow shop sequencing prob-
lem. J. Comput. Commun. 02, 142–147 (2014)
20. Yagouni, M., Hoai An, L.T.: A collaborative metaheuristic optimization scheme:
methodological issues. In: van Do, T., Thi, H.A.L., Nguyen, N.T. (eds.) Advanced
Computational Methods for Knowledge Engineering. Advances in Intelligent Sys-
tems and Computing, vol. 282, pp. 3–14. Springer (2014)
Low-Rank Matrix Recovery with Ky Fan
2-k-Norm
1 Introduction
The matrix recovery problem concerns the reconstruction of a matrix from incomplete information about its entries. This problem has a wide range of applications, such as recommendation systems with incomplete information on users' ratings, or sensor localization with partially observed distance matrices (see, e.g., [3]). In these applications, the matrix is usually known to be (approximately) low-rank. Finding these low-rank matrices is theoretically difficult due to the non-convexity of the problem. Computationally, it is important to study the tractability of these problems given the large scale of the datasets considered in practical applications. Recht et al. [11] studied the low-rank matrix recovery problem using a tractable convex relaxation approach. More precisely, in order to recover a low-rank matrix $X \in \mathbb{R}^{m\times n}$ which satisfies $\mathcal{A}(X) = b$, where the linear map $\mathcal{A}: \mathbb{R}^{m\times n} \to \mathbb{R}^p$ and $b \in \mathbb{R}^p$, $b \ne 0$, are given, the following convex optimization problem is proposed:
$$\min_X\ \|X\|_* \quad \text{s.t.}\ \mathcal{A}(X) = b, \qquad (1)$$

where $\|X\|_* = \sum_i \sigma_i(X)$ is the nuclear norm, the sum of all singular values of X. Recht et al. [11] showed the recoverability of this convex approach using some
This work is partially supported by the Alan Turing Fellowship of the first author.
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 310–319, 2020.
https://doi.org/10.1007/978-3-030-21803-4_32
Low-Rank Matrix Recovery with Ky Fan 2-k-Norm 311
These unitarily invariant norms (see, e.g., Bhatia [2]) and their gauge functions
have been used in sparse prediction problems [1], low-rank regression analysis
[6] and multi-task learning regularization [7]. When k = 1, the Ky Fan 2-k-norm is the spectral norm, $\|A\| = \sigma_1(A)$, the largest singular value of A, whose dual norm is the nuclear norm. Similar to the nuclear norm, the dual Ky Fan 2-k-norm with k > 1 can be used to compute the k-approximation of a matrix A (Proposition 2.9, [5]), which demonstrates its low-rank property. Motivated by
this low-rank property of the (dual) Ky Fan 2-k-norm, which is more general than
that of the nuclear norm, and its usage in other applications, in this paper, we
propose a Ky Fan 2-k-norm-based non-convex approach for the matrix recovery
problem which aims to recover matrices which are not recoverable by the convex
relaxation formulation (1). In Sect. 2, we discuss the proposed models in detail
and in Sect. 3, we develop numerical algorithms to solve those models. Some
numerical results will also be presented.
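For concreteness, the Ky Fan 2-k-norm discussed above is the ℓ₂ norm of the k largest singular values; a small illustrative sketch (numpy; the function name is ours):

```python
import numpy as np

def ky_fan_2k_norm(A, k):
    """|A|_{k,2}: the l2 norm of the k largest singular values of A.
    For k = 1 this is the spectral norm; for k = min(m, n) it equals
    the Frobenius norm."""
    s = np.linalg.svd(A, compute_uv=False)   # singular values, descending
    return float(np.sqrt(np.sum(s[:k] ** 2)))
```

This also makes the inequality $|A|_{k,2} \le \|A\|_F$ used below immediate: the left side sums only the k largest squared singular values.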
$$|A|_{k,2} = \Big(\sum_{i=1}^{k} \sigma_i^2(A)\Big)^{1/2} \le \|A\|_F = \Big(\sum_{i=1}^{\min\{m,n\}} \sigma_i^2(A)\Big)^{1/2},$$

where $\|\cdot\|_F$ is the Frobenius norm. Now, considering the dual Ky Fan 2-k-norm and using the definition of the dual norm, we obtain the following inequality:
Thus we have:
Given the result in Theorem 1, the exact recovery of a low-rank matrix using (5) or (6) relies on the uniqueness of the low-rank solution of $\mathcal{A}(X) = b$. Recht et al. [11] generalized the restricted isometry property of vectors introduced by Candès and Tao [4] to matrices and used it to provide sufficient conditions for the uniqueness of these solutions.
Definition 1 (Recht et al. [11]). For every integer k with 1 ≤ k ≤ min{m, n},
the k-restricted isometry constant is defined as the smallest number δk (A) such
that
$$(1 - \delta_k(\mathcal{A}))\,\|X\|_F \le \|\mathcal{A}(X)\|_2 \le (1 + \delta_k(\mathcal{A}))\,\|X\|_F \qquad (7)$$
holds for all matrices X of rank at most k.
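Computing $\delta_k(\mathcal{A})$ exactly is intractable in general, since (7) must hold over all rank-≤k matrices. A Monte Carlo sketch (illustrative; it assumes the linear map is given as a matrix acting on vec(X), and the function name is ours) gives an empirical lower bound by sampling random rank-k matrices:

```python
import numpy as np

def rip_lower_bound(A_mat, m, n, k, trials=200, seed=0):
    """Monte Carlo lower bound on the k-restricted isometry constant of
    the map X -> A_mat @ vec(X): the worst observed deviation of
    ||A(X)||_2 / ||X||_F from 1 over random rank-k samples."""
    rng = np.random.default_rng(seed)
    worst = 0.0
    for _ in range(trials):
        X = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
        ratio = np.linalg.norm(A_mat @ X.ravel()) / np.linalg.norm(X, 'fro')
        worst = max(worst, abs(ratio - 1.0))
    return worst
```

This only bounds $\delta_k$ from below; certifying the upper bound needed by Theorem 2 requires structural arguments such as those in [11].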
Using Theorem 3.2 in Recht et al. [11], we can obtain the following exact recovery
result for (5) and (6).
Theorem 2. Suppose that δ2k < 1 and there exists a matrix X ∈ Rm×n which
satisfies A(X) = b and rank(X) ≤ k, then X is the unique solution to (5) and
(6), which implies exact recoverability.
The condition in Theorem 2 is indeed better than those obtained for the nuclear norm approach (see, e.g., Theorem 3.3 in Recht et al. [11]). The non-convex optimization problems (5) and (6) use a norm ratio and a norm difference. When k = 1, the ratio and difference are computed between the nuclear and Frobenius norms. The idea of using these norm ratios and differences with k = 1 has been used to generate non-convex sparsity regularizers in the vector case, i.e., m = 1. Yin et al. [13] investigated the ratio $\ell_1/\ell_2$ while Yin et al. [14] analyzed the difference $\ell_1 - \ell_2$.
3 Numerical Algorithm
3.1 Difference of Convex Algorithms
We start with the problem (5). It can be reformulated as

$$\max_{Z,z}\ \|Z\|_F^2 \quad \text{s.t.}\ |Z|_{k,2} \le 1,\ \ \mathcal{A}(Z) - z\,b = 0,\ \ z > 0, \qquad (8)$$
with the change of variables $z = 1/|X|_{k,2}$ and $Z = X/|X|_{k,2}$. The compact formulation is

$$\min_{Z,z}\ \delta_{\mathcal{Z}}(Z, z) - \|Z\|_F^2/2, \qquad (9)$$

where $\mathcal{Z}$ is the feasible set of the problem (8) and $\delta_{\mathcal{Z}}(\cdot)$ is the indicator function of $\mathcal{Z}$. The problem (9) is a difference of convex (d.c.) optimization problem (see, e.g., [9]). The difference of convex algorithm DCA proposed in [9] can be applied to the problem (9) as follows.
Step 1. Start with $(Z^0, z^0) = (X^0/|X^0|_{k,2},\ 1/|X^0|_{k,2})$ for some $X^0$ such that $\mathcal{A}(X^0) = b$, and set s = 0.
Step 2. Update $(Z^{s+1}, z^{s+1})$ as an optimal solution of the following convex optimization problem:

$$\max_{Z,z}\ \langle Z^s, Z\rangle \quad \text{s.t.}\ |Z|_{k,2} \le 1,\ \ \mathcal{A}(Z) - z\,b = 0,\ \ z > 0. \qquad (10)$$

Step 3. Set s ← s + 1 and repeat Step 2.
314 X. V. Doan and S. Vavasis
Let $X^s = Z^s/z^s$. Using the general convergence analysis of DCA (see, e.g., Theorem 3.7 in [10]), we can obtain the following convergence results.

Proposition 1. Given the sequence $\{X^s\}$ obtained from the DCA algorithm for the problem (9), the following statements are true.

(i) The sequence $\left\{\dfrac{|X^s|_{k,2}}{\|X^s\|_F}\right\}$ is non-increasing and convergent.
(ii) $\left\|\dfrac{X^{s+1}}{|X^{s+1}|_{k,2}} - \dfrac{X^s}{|X^s|_{k,2}}\right\|_F \to 0$ as $s \to \infty$.
These convergence results show that the DCA algorithm improves the objective of the ratio minimization problem (5). The DCA algorithm can stop when $(Z^s, z^s) \in \mathcal{O}(Z^s)$, where $\mathcal{O}(Z^s)$ is the set of optimal solutions of (10); a point $(Z^s, z^s)$ satisfying this condition is called a critical point. Note that (local) optimal solutions of (9) can be shown to be critical points. The following proposition gives an equivalent condition for critical points.
Clearly, $\left\langle \dfrac{X^s}{|X^s|_{k,2}},\ \dfrac{X^s + Y}{|X^s + Y|_{k,2}}\right\rangle \le \left\langle \dfrac{X^s}{|X^s|_{k,2}},\ \dfrac{X^s}{|X^s|_{k,2}}\right\rangle$ is equivalent to

$$|X^s + Y|_{k,2} - \frac{|X^s|_{k,2}}{\|X^s\|_F^2}\,\langle X^s, Y\rangle \ge |X^s|_{k,2}.$$
The result of Proposition 2 shows the similarity between the norm ratio minimization problem (5) and the norm difference minimization problem (6) with respect to the implementation of the DCA algorithm. Indeed, the problem (6) is a d.c. optimization problem and the DCA algorithm can be applied as follows.
The following proposition shows that $X^s$ is a critical point of the problem (14) for many functions $\alpha(\cdot)$ if $\operatorname{rank}(X^s) \le k$.

$$\frac{1}{\|X\|_F} \le \alpha(X) \le \frac{|X^s|_{k,2}}{\|X^s\|_F^2}, \qquad (15)$$

we have $\alpha(X^s)\cdot X^s \in \partial|X^s|_{k,2}$. Thus for all Y, the following inequality holds:
Proposition 3 shows that one can choose different functions $\alpha(\cdot)$, such as $\alpha(X) = 1/|X|_{k,2}$, for the sub-problem in the general DCA framework to solve the original problem. The generalized sub-problem (14) is a convex optimization problem, which can be formulated as a semidefinite optimization problem given the following calculation of the dual Ky Fan 2-k-norm provided in [5]:
In order to implement the DCA algorithm, one also needs to consider how to find the initial solution $X^0$. We can use the nuclear norm minimization problem (1), the convex relaxation of the rank minimization problem, to find $X^0$. A similar approach is to use the following dual Ky Fan 2-k-norm minimization problem to find $X^0$, given its low-rank properties:

$$\min_X\ |X|_{k,2} \quad \text{s.t.}\ \mathcal{A}(X) = b. \qquad (17)$$
Figure 1 shows recovery probabilities and average computation times (in seconds)
for different sizes of the linear map.
The results show that the proposed algorithm recovers the matrix M exactly with a 100% rate when s ≥ 250 for both initial solutions, while the nuclear norm approach cannot recover any matrix at all (0% rate) when s ≤ 300. k2-nuclear is slightly better than k2-zero in terms of recoverability when s is small, while their average computational times are almost the same in all cases. The efficiency of the proposed algorithm when s is small comes with higher average computational times compared to the nuclear norm approach. For example, when s = 180, on average one needs 80 iterations to reach the solution with the proposed algorithm, instead of 1 with the nuclear norm optimization approach. Note that the average number of iterations is computed over all cases, including those where the matrix M cannot be recovered. For recoverable cases, the average number of iterations is much smaller: for example, when s = 180, it is 40 instead of 80. When the size of the linear map increases, the average number of iterations decreases significantly. We only need 2 extra iterations when s = 250, or 1 extra iteration
on average when s = 300, to obtain a 100% recovery rate while the nuclear norm optimization approach still cannot recover any of the matrices (0% rate). These results show that the proposed algorithm achieves a significantly better recovery rate with a small number of extra iterations in many cases. We also test the algorithms with higher ranks, namely r = 5 and r = 10. Figure 2 shows the results when the size of the linear map is $s = 1.05\,d_r$.
Fig. 2. Recovery probabilities and average computation times for different ranks
These results show that when the size of the linear map is small, the proposed algorithms are significantly better than the nuclear norm optimization approach. With $s = 1.05\,d_r$, the recovery probability increases as r increases and is close to 1 when r = 10. The computational time increases with r, given that the size of the sub-problems depends on the size of the linear map. The number of iterations remains low: when r = 10, the average numbers of iterations are 22 and 26 for k2-nuclear and k2-zero, respectively. This shows that k2-nuclear is slightly better than k2-zero both in terms of recovery probability and computational time.
4 Conclusion
We have proposed non-convex models based on the dual Ky Fan 2-k-norm for low-rank matrix recovery and developed a general DCA framework to solve the models. The computational results are promising. Numerical experiments with larger instances will be conducted, with first-order algorithm development for the proposed models as a future research direction.
References
1. Argyriou, A., Foygel, R., Srebro, N.: Sparse prediction with the k-support norm.
In: NIPS, pp. 1466–1474 (2012)
2. Bhatia, R.: Matrix Analysis, Graduate Texts in Mathematics, vol. 169. Springer,
New York (1997)
3. Candès, E.J., Recht, B.: Exact matrix completion via convex optimization. Found.
Comput. Math. 9(6), 717–772 (2009)
4. Candès, E.J., Tao, T.: Decoding by linear programming. IEEE Trans. Inf. Theory
51(12), 4203–4215 (2005)
5. Doan, X.V., Vavasis, S.: Finding the largest low-rank clusters with Ky Fan 2-k-norm and $\ell_1$-norm. SIAM J. Optim. 26(1), 274–312 (2016)
6. Giraud, C.: Low rank multivariate regression. Electron. J. Stat. 5, 775–799 (2011)
7. Jacob, L., Bach, F., Vert, J.P.: Clustered multi-task learning: a convex formulation.
NIPS 21, 745–752 (2009)
8. Ma, T.H., Lou, Y., Huang, T.Z.: Truncated $\ell_{1-2}$ models for sparse recovery and rank minimization. SIAM J. Imaging Sci. 10(3), 1346–1380 (2017)
9. Pham, D.T., Hoai An, L.T.: Convex analysis approach to dc programming: theory,
algorithms and applications. Acta Mathematica Vietnamica 22(1), 289–355 (1997)
10. Pham, D.T., Hoai An, L.T.: A dc optimization algorithm for solving the trust-
region subproblem. SIAM J. Optim. 8(2), 476–505 (1998)
11. Recht, B., Fazel, M., Parrilo, P.: Guaranteed minimum-rank solutions of linear
matrix equations via nuclear norm minimization. SIAM Rev. 52(3), 471–501 (2010)
12. Toh, K.C., Todd, M.J., Tütüncü, R.H.: SDPT3-a MATLAB software package for
semidefinite programming, version 1.3. Optim. Methods Softw. 11(1–4), 545–581
(1999)
13. Yin, P., Esser, E., Xin, J.: Ratio and difference of $\ell_1$ and $\ell_2$ norms and sparse representation with coherent dictionaries. Commun. Inf. Syst. 14(2), 87–109 (2014)
14. Yin, P., Lou, Y., He, Q., Xin, J.: Minimization of $\ell_{1-2}$ for compressed sensing. SIAM J. Sci. Comput. 37(1), A536–A563 (2015)
Online DCA for Time Series Forecasting
Using Artificial Neural Network
Abstract. In this work, we study the online time series forecasting problem using artificial neural networks. To solve this problem, different online DCAs (Difference of Convex functions Algorithms) are investigated. We also give a comparison with online gradient descent, the online version of one of the most popular optimization algorithms for neural networks. Numerical experiments on some benchmark time series datasets validate the efficiency of the proposed methods.
1 Introduction
Time series analysis and forecasting play an important role and have a wide range of applications, such as the stock market, weather forecasting, energy demand, fuel usage, and electricity, as well as any domain with specific seasonal or trend changes over time [15]. The information one gets from forecasting time series data can contribute to important, high-priority decisions of companies or organizations. The goal of time series analysis is to extract information from a given time series over some period of time. This information is then used to construct a model, which can be used to predict future values of the considered time series.
Online learning is a machine learning technique performed in a sequence of consecutive rounds [14,17]. At each round t, we receive a question $x_t$ and give a corresponding prediction $p_t(x_t)$. After that, we receive the true answer $y_t$ and suffer the loss between $p_t(x_t)$ and $y_t$. In many real-world situations, we do not know the entire time series beforehand; new data may arrive sequentially in real time. In those cases, analysis and forecasting of time series should be placed in an online learning context.
Linear models such as autoregressive and autoregressive moving average models are standard tools for time series forecasting problems [2]. However, many real-world processes are nonlinear. Empirical experience shows that linear models are not always the best at simulating the underlying dynamics of a time series. This gives rise to a demand for better nonlinear models. Recently, artificial neural networks have shown promising results in different applications [9],
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 320–329, 2020.
https://doi.org/10.1007/978-3-030-21803-4_33
which comes from the flexibility of those models in approximating functions (see [4,5]). In terms of time series forecasting using neural networks, there are many works worth mentioning [1,11].
Although these works demonstrate the effectiveness of neural networks for time series applications, they used smooth activation functions such as sigmoid or tanh. In recent works, the ReLU activation function has been shown to outperform these smooth activation functions, with good properties in practice [12]. In this work, we propose an online autoregressive model using a neural network with the ReLU activation function. Unlike other regression works which use the square loss, we choose the ε-insensitive loss function to reduce the impact of outliers. We limit the architecture of the network to one hidden layer to reduce overfitting. Despite not being a deep network, fitting a one-hidden-layer neural network is still a nonconvex, nonsmooth optimization problem. To solve such a problem in an online context, we utilize the tools of online DC (Difference of Convex functions) programming and online DCA (DC Algorithm) (see [7,8,13]). The contribution of this work is the proposal and comparison of several online optimization algorithms based on online DCA to solve the time series forecasting problem using neural networks. Numerical experiments on different time series datasets indicate the effectiveness of the proposed methods.
The structure of this work is organized as follows. In Sect. 2, we present the online learning scheme, DCA, and online DCA schemes with two learning rules. In Sect. 3, we formulate the autoregressive model with a neural network and the corresponding online optimization problem. Section 4 contains three online DC algorithms to solve the problem of the previous section; we also consider the online gradient descent algorithm in this section. Numerical experiments are presented in Sect. 5, with the conclusion in Sect. 6.
where $g, h \in \Gamma_0(\mathbb{R}^d)$, the set of all lower semicontinuous proper convex functions on $\mathbb{R}^d$. Such a function f is called a DC function, g − h a DC decomposition of f, and g and h the DC components of f. A standard DC program with a convex constraint C (a nonempty closed convex set in $\mathbb{R}^d$), which is $\alpha = \inf\{f(x) := g(x) - h(x) : x \in C\}$, can be expressed in the form of $(P_{dc})$ by adding the indicator function of C to the function g.
The main idea of DCA is quite simple: each iteration k of DCA approximates the concave part −h by its affine majorization, which corresponds to taking a subgradient $y^k \in \partial h(x^k)$, and minimizes the resulting convex function.
Convergence properties of DCA and its theoretical basis are described in [7,13]. Over the past years, DCA has been successfully applied in several works in various fields, among them machine learning, financial optimization, and supply chain management [8].
In the online learning context, at each round t we receive a DC loss function $f_t = g_t - h_t$, where $g_t, h_t$ are functions in $\Gamma_0(S)$ and S is the parameter set of the online learner. Then, we approximate the concave part $-h_t$ by its affine majorization corresponding to taking $z_t \in \partial h_t(x_t)$ and minimize the resulting convex subproblem [3].
The subproblem of online DCA can take two forms. The first form is the follow-the-leader learning rule [14], which means that at round t we minimize the cumulative loss $\sum_{i=1}^{t} f_i$. We can also minimize the current loss $f_t$ instead of the cumulative one. In short, we can write both learning rules in a single formula as $\sum_{i=t_0}^{t} f_i$, where $t_0 = 1$ or $t$. The online DCA scheme is then given in Algorithm 2.
To estimate the error between the predicted and true data, we use the ε-insensitive loss: $L_\varepsilon(p(x_t \mid \theta), y_t) = \max(0,\ |p(x_t \mid \theta) - y_t| - \varepsilon)$, where ε is a positive number. From now on, we use the notation $f_t$ to denote the objective function corresponding to a single data point $(x_t, y_t)$, that is, $f_t(\theta) = L_\varepsilon(p(x_t \mid \theta), y_t)$.
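A direct transcription of this loss (the function name is ours):

```python
def eps_insensitive_loss(pred, y, eps=0.1):
    """L_eps(p, y) = max(0, |p - y| - eps): errors inside the eps tube
    are not penalized, which dampens the influence of outliers compared
    to the square loss."""
    return max(0.0, abs(pred - y) - eps)
```

Outliers contribute only linearly (shifted by ε) rather than quadratically, which is the robustness property motivating this choice.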
In the online setting, we have two learning rules: either minimizing the loss $f_t$ or the cumulative loss $\sum_{i=1}^{t} f_i$ at round t. Both choices can be written in a single form as

$$\min_{\theta \in S}\ \sum_{i=t_0}^{t} f_i(\theta) = \sum_{i=t_0}^{t} \max\Big(0,\ \big|\max(0,\ x_i^T U + a^T)\,V + b - y_i\big| - \varepsilon\Big), \qquad (2)$$

where $t_0$ is either 1 or t.
In summary, we follow the online learning scheme 1 to receive the question $x_t$, predict $p(x_t \mid \theta)$, then suffer the loss $f_t(\theta) = L_\varepsilon(p(x_t \mid \theta), y_t)$ and minimize the problem (2). This process is repeated for t = 1, 2, ..., T in real time.
$$q(x_t \mid \theta) = b + \frac{1}{2}\sum_{j=1}^{N}\Big[\big(\max(0,\ x_t^T U_j + a_j) + V_j^+\big)^2 + \big(V_j^-\big)^2\Big],$$

$$r(x_t \mid \theta) = \frac{1}{2}\sum_{j=1}^{N}\Big[\big(\max(0,\ x_t^T U_j + a_j) + V_j^-\big)^2 + \big(V_j^+\big)^2\Big], \qquad (5)$$
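As reconstructed here, these DC components satisfy $q - r = p$, the one-hidden-layer ReLU prediction with $V = V^+ - V^-$, via the identity $uv = ((u+v)^2 - u^2 - v^2)/2$ applied to each product of a hidden activation with $V_j^\pm$. A numpy check (all function names are ours):

```python
import numpy as np

def predict(x, U, a, Vp, Vm, b):
    """One-hidden-layer ReLU prediction with output weights V = V+ - V-."""
    h = np.maximum(0.0, x @ U + a)          # hidden activations, shape (N,)
    return b + h @ (Vp - Vm)

def q_part(x, U, a, Vp, Vm, b):
    h = np.maximum(0.0, x @ U + a)
    return b + 0.5 * np.sum((h + Vp) ** 2 + Vm ** 2)

def r_part(x, U, a, Vp, Vm, b):
    h = np.maximum(0.0, x @ U + a)
    return 0.5 * np.sum((h + Vm) ** 2 + Vp ** 2)
```

Expanding the squares, the $h_j^2$, $(V_j^+)^2$, and $(V_j^-)^2$ terms cancel between q and r, leaving exactly $b + \sum_j h_j (V_j^+ - V_j^-)$.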
where $z_i \in \partial h_i(\theta_t)$. There are two cases for this subproblem: $t_0 = t$ or $t_0 = 1$. In the following sections, we will consider each case in detail.
Algorithm 3. DCA-1a
1: Initialization: θ1 ∈ S.
2: for t = 1, 2, 3, ..., T do
3: Receive question xt . Give prediction p(xt | θt ).
4: Receive answer yt and suffer loss ft (θ) = gt (θ) − ht (θ).
5: Choose step size αt > 0.
6: Calculate wt ∈ ∂gt (θt ) and zt ∈ ∂ht (θt ).
7: Calculate θt+1 = ProjS (θt − αt (wt − zt )).
8: end for
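A single DCA-1a update (steps 5–7 of Algorithm 3) amounts to a projected subgradient-difference step; a minimal sketch (the function name and the toy projection in the usage are ours):

```python
def dca1a_step(theta, grad_g, grad_h, alpha, project=lambda t: t):
    """One DCA-1a update: step along the difference of subgradients of the
    DC components g_t and h_t at theta, then project onto S (identity
    projection by default)."""
    return project(theta - alpha * (grad_g(theta) - grad_h(theta)))
```

Since $w_t - z_t$ is a subgradient of $f_t = g_t - h_t$ whenever $f_t$ is differentiable, this coincides with an online gradient descent step in that case, which is why DCA-1a and OGD are treated as one algorithm in the experiments.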
Algorithm 4. DCA-1b
1: Initialization: θ1 ∈ S and T0 , K ∈ N.
2: for t = 1, 2, 3, ..., T do
3: Receive question xt . Give prediction p(xt | θt ).
4: Receive answer yt and suffer loss ft (θ) = gt (θ) − ht (θ).
5: Calculate $z_t \in \partial h_t(\theta_t)$.
6: if t ≤ T0 then
7: Solve minθ∈S {gt (θ) − zt , θ} by subgradient method with K iterations.
8: else
9: Solve minθ∈S {gt (θ) − zt , θ} by subgradient method with 1 iteration.
10: end if
11: end for
in the first $T_0$ rounds, which therefore leads to faster convergence. On the other hand, the computational time is kept within an acceptable threshold as we adjust $T_0$ and K. In view of the regret bound, one can see that $\sum_{i=1}^{T_0} f_i$ is bounded, so it does not affect the sublinearity of the regret bound. The bound only depends on DCA-1a, which is applied for the latter $T - T_0$ rounds. This combined strategy is described in Algorithm 4 (DCA-1b).
Algorithm 5. DCA-2
1: Initialization: θ1 ∈ S.
2: for t = 1, 2, 3, ..., T do
3: Receive question $x_t$. Give prediction $p(x_t \mid \theta_t)$.
4: Receive answer $y_t$ and suffer loss $\sum_{i=1}^{t} f_i(\theta) = \sum_{i=1}^{t} g_i(\theta) - \sum_{i=1}^{t} h_i(\theta)$.
5: Choose step size $\lambda_t > 0$.
6: Calculate $w_t \in \partial g_t(\theta_t)$ and $z_t \in \partial h_t(\theta_t)$.
7: Calculate $\theta_{t+1} = \mathrm{Proj}_S\big(\theta_t - \lambda_t \sum_{i=1}^{t} (w_i - z_i)\big)$.
8: end for
case, the update formulas of DCA-1a and OGD are exactly the same. Therefore, in the numerical experiments, we consider DCA-1a and OGD as one algorithm. More details about OGD for convex loss functions can be found in [14].
4.3 Learning Rule: $\min\big\{\sum_{i=1}^{t} f_i(\theta) : \theta \in S\big\}$ (Case $t_0 = 1$)
In this case, the subproblem (6) becomes $\min_{s \in S}\ \sum_{i=1}^{t} g_i(s) - \big\langle \sum_{i=1}^{t} z_i,\ s\big\rangle$, where $z_i \in \partial h_i(\theta_i)$. If we use the subgradient method, we first initialize $s^1 = \theta_t$ and let $\alpha_k$ be the step size at iteration k. The update formula is

$$s^{k+1} = s^k - \alpha_k\Big(\sum_{i=1}^{t} w_i^k - \sum_{i=1}^{t} z_i\Big), \quad \text{where } w_i^k \in \partial g_i(s^k).$$

At each iteration k, we have to compute $w_i^k$ for all i in $\{1, 2, \ldots, t\}$. This makes the computation heavy, even with only one iteration of the subgradient method. Another approach is to replace each $g_i$ with a piecewise linear approximation. The linear approximation of $g_i$ at $s^0$ has the form $\phi_i^0(s) = g_i(s^0) + \langle w_i^0,\ s - s^0\rangle$ for all s in S, where $w_i^0 \in \partial g_i(s^0)$. Assume we have a finite set $\{s^1, s^2, \ldots, s^n\} \subset S$. Then $g_i$ can be approximated by the piecewise linear function $g_i(s) \approx \max\{\phi_i^j(s) : j \in \{1, \ldots, n\}\}$. Put $G_i(s) = \max\{\phi_i^j(s) : j \in \{1, \ldots, n\}\}$; then the subproblem becomes

$$\min_{s \in S}\ \sum_{i=1}^{t} G_i(s) - \Big\langle \sum_{i=1}^{t} z_i,\ s\Big\rangle.$$

So we obtain $\theta_{t+1} = \mathrm{Proj}_S\big(\theta_t - \lambda_t \sum_{i=1}^{t} (w_i - z_i)\big)$. With this update rule, we have Algorithm 5 (DCA-2).
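The DCA-2 update can be implemented by maintaining a running sum of the per-round subgradient differences $w_i - z_i$, each evaluated at its round's iterate. A sketch with scalar parameters and a constant step size (illustrative simplifications; Algorithm 5 allows a per-round $\lambda_t$ and a projection onto S):

```python
def dca2(grad_gs, grad_hs, theta0, lam=0.1, project=lambda t: t):
    """DCA-2 sketch: at round t, step along the accumulated differences
    sum_i (w_i - z_i), where w_i, z_i were computed at round i's iterate.
    Returns the trajectory of iterates."""
    theta, acc = theta0, 0.0
    history = [theta0]
    for g, h in zip(grad_gs, grad_hs):
        acc += g(theta) - h(theta)          # w_t - z_t at the current iterate
        theta = project(theta - lam * acc)
        history.append(theta)
    return history
```

Keeping the running sum `acc` avoids recomputing all t subgradients per round, matching the motivation for the piecewise linear approximation above.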
328 V. A. Nguyen and H. A. Le Thi
5 Numerical Experiments
We conduct experiments with the three algorithms on five time series datasets taken from the UCI machine learning repository.¹ The experimental procedure is as follows. We preprocess by transforming the time series $\{\tau_t\}$ into a new dataset in which the feature vector $x_t$ has the form $x_t = (\tau_{t-4}, \tau_{t-3}, \tau_{t-2}, \tau_{t-1})$, which we call a window of length 4. For each window, we take the current t-th value of the time series as the label, i.e., $y_t = \tau_t$. In short, the online model uses the 4 past values to predict the upcoming value. We choose $T_0 = 10$ and K = 100 for DCA-1b. The mean square error (MSE) is used as the quality measure.
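The windowing step above can be sketched as follows (the `w` parameter generalizing the window length is ours; the paper uses w = 4):

```python
def make_windows(series, w=4):
    """Turn a time series into (window, label) pairs: each x_t holds the
    w previous values and the label is y_t = tau_t."""
    return [(tuple(series[t - w:t]), series[t])
            for t in range(w, len(series))]
```

A series of length L thus yields L − 4 training pairs, delivered one per round in the online setting.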
Results are reported in Table 1.
Table 1. Comparative results on time series datasets. L denotes the length of the time
series. Bold values correspond to best results for each dataset. We have chosen the
subgradients such that DCA-1a and OGD have the same update formulas.
Comment. In terms of MSE, DCA-2 is better than DCA-1a (or OGD) on all datasets. This can be explained by the fact that DCA-2 minimizes the cumulative loss, which gives more information to the online learner. DCA-1b outperforms DCA-2 on the datasets with a larger number of instances (Appliances, Temperature, and Pressure).
Regarding computational time, DCA-1a (or OGD) is the best due to its light update formula. DCA-1b is slow since it has to loop over K iterations of the subgradient method during the first $T_0$ rounds.
In summary, DCA-2 performs well in both quality and time. Although DCA-1b has good MSEs on long time series, its computational time is large compared to the other two algorithms. DCA-1a (or OGD) is the fastest but has the lowest quality.
6 Conclusion
This work presents an approach for online time series forecasting using neural networks. The resulting optimization problem of a neural network with ReLU activation
¹ http://archive.ics.uci.edu/ml.
Parallel DC Cutting Plane Algorithms
for Mixed Binary Linear Program
1 Introduction
min f(x, y) := cᵀx + dᵀy
s.t. Ax + By ≥ b, (P)
(x, y) ∈ {0, 1}^n × R^q_+,
The research is funded by the National Natural Science Foundation of China (Grant
No: 11601327) and by the Key Construction National “985” Program of China (Grant
No: WF220426001).
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 330–340, 2020.
https://doi.org/10.1007/978-3-030-21803-4_34
For the integer set {0, 1}^n, we often use the piecewise linear function
p(x) = ∑_{i=1}^{n} min{xi, 1 − xi},
then S = K ∩ {x ∈ R^n : p(x) ≤ 0} × [0, ȳ].
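As a quick sanity check, the piecewise linear penalty vanishes exactly on binary vectors. A minimal sketch (the helper name is ours):

```python
def p(x):
    """Piecewise linear penalty p(x) = sum_i min(x_i, 1 - x_i) on [0, 1]^n.
    p(x) >= 0 on the box, and p(x) = 0 iff every coordinate is 0 or 1."""
    return sum(min(xi, 1.0 - xi) for xi in x)

print(p([0, 1, 1, 0]))   # binary point -> 0.0
print(p([0.5, 1, 0.2]))  # fractional point -> 0.7
```

This is precisely why adding t·p(x) to the objective penalizes fractional points while leaving binary points unaffected.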
Based on the exact penalty theorem [4,8], if K is nonempty, then there exists a
finite number t0 ≥ 0 such that for all t > t0, problem (P) is equivalent to:
3 DC Cutting Planes
3.1 Valid Inequalities Based on DC Programming
To simplify the notation, we will identify lu∗ (x) with lu∗ (u) and p(x) with p(u).
Theorem 1. (see [11, 12]) There exists a finite number t1 ≥ 0 such that for all
t > t1 , if u∗ ∈ V (K) \ S is a local minimizer of (P t ), then
Proof. First, it follows from Lemma 1 that lu∗ (u∗ ) = p(u∗ ); then p(u∗ ) ∉ Z
implies lu∗ (u∗ ) < ⌈lu∗ (u∗ )⌉, thus u∗ violates inequality (2). Second,
when t is sufficiently large, it follows from Theorem 1 that for all u ∈ S,
lu∗ (u) ∈ Z and lu∗ (u) ≥ lu∗ (u∗ ) ∉ Z; hence lu∗ (u) ≥ ⌈lu∗ (u∗ )⌉, ∀u ∈ S.
Case 2: In this case, we will use classical cuts to separate u∗ from S. The lift-and-
project (LAP) cut, one of the classical cuts, is introduced in Algorithm 2; the
reader can refer to [1,2] for more details.
max{vb − (wDj + vCj )u∗ : wCj − vCj = 0, (w, v) ≥ 0}
Proof. First, since lu∗ (u∗ ) = 0, u∗ violates inequality (3). Second,
let C1 = {(x∗ , y) : (x∗ , y) ∈ K}, the following problem
4 DC-CUT Algorithms
In this section, we will establish DC-CUT algorithms based on the DC cuts and
the classical cutting planes presented in the previous section.
min{f(u) : u ∈ K^k},
min{τt(u) : u ∈ K^k}.
the lower bound, and provides more candidates for updating the incumbent upper
bound. Considering also the power of multiple CPUs/GPUs, we propose to start
DCA simultaneously from random initial points.
The differences between the Parallel-DC-CUT Algorithm and DC-CUT
Algorithm 3 are mainly located in lines 7 to 19. Supposing that we want to use
s parallel workers, at line 7 we choose s random initial points in [0, 1]^n × [0, ȳ]
to start DCA simultaneously and construct cutting planes respectively. Once the
parallel block terminates at line 19, we collect all created cutting planes in V^k
to update the sets K^k and S^k.
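The multi-start pattern just described can be sketched as follows. The paper uses MATLAB's parfor; here we use Python's thread pool instead, and the local solver is a toy stand-in for DCA, not the real algorithm:

```python
import random
from concurrent.futures import ThreadPoolExecutor

def run_dca_from(start):
    """Stand-in for one DCA run from a given start: here we simply round
    the start to a vertex of the unit box and report a toy objective."""
    x = [round(v) for v in start]
    return x, sum(x)

def parallel_multistart(n, s, seed=42):
    """Launch s workers from random points in [0, 1]^n and collect results,
    mirroring the parallel block (lines 7-19) of the algorithm."""
    rng = random.Random(seed)
    starts = [[rng.random() for _ in range(n)] for _ in range(s)]
    with ThreadPoolExecutor(max_workers=s) as pool:
        results = list(pool.map(run_dca_from, starts))
    # In the real algorithm each result would also yield a DC cut for V^k;
    # here we only keep the best incumbent found.
    return min(results, key=lambda r: r[1])

best_x, best_obj = parallel_multistart(n=6, s=4)
print(best_x, best_obj)
```

Each worker is independent, so the s runs parallelize trivially; the only synchronization point is collecting the cuts and incumbents at the end of the block.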
5 Experimental Results
In this section, we report some numerical results of our proposed algorithms. The
algorithms are implemented in MATLAB, using parfor for parallel computing.
The linear subproblems are solved by Gurobi 8.1.0 [3]. The experiments are
performed on a laptop equipped with an Intel i5-6200U 2.30 GHz CPU (2 cores,
4 threads) and 8 GB RAM; thus we use 4 workers for the tests in parallel computing.
We first illustrate the test results on a pure binary linear program exam-
ple with 10 binary variables and 10 linear constraints; the optimal value is
0. Figure 1 illustrates the updates of upper bounds (solid circle line) and lower
bounds (dotted square line) with respect to the iterations of the four DC-CUT
algorithms (DC-CUT, Parallel-DC-CUT, Variant DC-CUT, and Variant Parallel-
DC-CUT). Comparing the parallel cases (b) and (d) with the sequential
cases (a) and (c), we observe that introducing parallelism allows DCA to find
a global optimal solution more quickly and reduces the number of required
iterations. The computing times for these four algorithms
are respectively (a) 3.4 s, (b) 2.3 s, (c) 2.6 s and (d) 1.6 s. Clearly, the parallel
cases (b) and (d) are faster than the sequential cases (a) and (c), and
the variant methods (c) and (d), which introduce more cutting planes in each
iteration, perform better in general. The fastest algorithm is the last case (d).
Moreover, in this test example, a negative gap can be produced, as in
case (c), since the DC cut is a kind of local cut which may cut off some feasible
solutions, so that the lower bound on the reduced set can exceed
the current upper bound. This feature is particular to DC cuts and quite
different from classical global cuts such as lift-and-project. DC cuts
can therefore often provide deeper cuts than global cuts, which plays an important role
in accelerating the convergence of the cutting plane method.
Another important feature of our algorithms is the possibility of finding a global
optimal solution without a small gap between the upper and lower bounds. We can
observe that cases (a), (b) and (d) terminate without a small gap. This
is because the introduced cutting planes eventually yield an empty feasible set,
so no further computation is required.
More numerical test results for large-scale cases will be reported in the full-
length paper.
References
1. Balas, E., Ceria, S., Cornuéjols, G.: A lift-and-project cutting plane algorithm for
mixed 0–1 programs. Math. Program. 58, 295–324 (1993)
2. Cornuéjols, G.: Valid inequalities for mixed integer linear programs. Math. Pro-
gram. 112(1), 3–44 (2008)
3. Gurobi 8.1.0. http://www.gurobi.com
4. Le Thi, H.A., Pham, D.T., Le Dung, M.: Exact penalty in dc programming. Viet-
nam J. Math. 27(2), 169–178 (1999)
5. Le Thi, H.A., Pham, D.T.: The DC (difference of convex functions) programming
and DCA revisited with DC models of real world nonconvex optimization problems.
Ann. Oper. Res. 133, 23–46 (2005)
6. Le Thi, H.A., Nguyen, Q.T., Nguyen, H.T., Pham, D.T.: Solving the earliness
tardiness scheduling problem by DC programming and DCA. Math. Balk. 23(3–
4), 271–288 (2009)
7. Le Thi, H.A., Moeini, M., Pham, D.T.: Portfolio selection under downside risk mea-
sures and cardinality constraints based on DC programming and DCA. Comput.
Manag. Sci. 6(4), 459–475 (2009)
8. Le Thi, H.A., Pham, D.T., Huynh, V.N.: Exact penalty and error bounds in dc
programming. J. Glob. Optim. 52(3), 509–535 (2012)
9. Le Thi, H.A., Pham, D.T.: DC programming and DCA: thirty years of develop-
ments. Math. Program. 169(1), 5–68 (2018)
10. Ndiaye, B.M., Le Thi, H.A., Pham, D.T., Niu, Y.S.: DC programming and DCA for
large-scale two-dimensional packing problems. In: Pan, J.S., Chen, S.M., Nguyen,
N.T. (eds.) Intelligent Information and Database Systems, LNCS, vol. 7197, pp.
321–330. Springer, Berlin (2012). https://doi.org/10.1007/978-3-642-28490-8_34
11. Nguyen, V.V.: Méthodes exactes pour l’optimisation DC polyédrale en variables
mixtes 0-1 basées sur DCA et des nouvelles coupes. Ph.D. thesis, INSA de Rouen
(2006)
12. Nguyen, Q.T.: Approches locales et globales basées sur la programmation DC et
DCA pour des problèmes combinatoires en variables mixtes 0–1, applications à la
planification opérationnelle. These de doctorat dirigée par Le Thi H.A, Informa-
tique Metz (2010)
13. Niu, Y.S., Pham, D.T.: A DC Programming Approach for Mixed-Integer Linear
Programs. In: Le Thi, H.A., Bouvry, P., Pham, D.T. (eds.) Modelling, Computa-
tion and Optimization in Information Systems and Management Sciences (MCO
2008), Communications in Computer and Information Science, vol. 14, pp. 244–
253. Springer, Berlin (2008). https://doi.org/10.1007/978-3-540-87477-5_27
14. Niu, Y.S.: Programmation DC & DCA en Optimisation Combinatoire et Optimi-
sation Polynomiale via les Techniques de SDP–Codes et Simulations Numériques.
Ph.D. thesis, INSA-Rouen, France (2010)
15. Niu, Y.S., Pham D.T.: Efficient DC programming approaches for mixed-integer
quadratic convex programs. In: International Conference on Industrial Engineering
and Systems Management (IESM 2011), pp. 222–231 (2011)
16. Niu, Y.S.: On combination of DCA branch-and-bound and DC-Cut for solving
mixed 0-1 linear program. In: 21st International Symposium on Mathematical Pro-
gramming (ISMP 2012). Berlin (2012)
17. Niu, Y.S.: A parallel branch and bound with DC algorithm for mixed integer
optimization. In: 23rd International Symposium in Mathematical Programming
(ISMP 2018). Bordeaux, France (2018)
340 Y.-S. Niu et al.
18. Pham, D.T., Le Thi, H.A.: Convex analysis approach to D.C. programming: theory,
algorithm and applications. Acta Math. Vietnam. 22(1), 289–355 (1997)
19. Pham, D.T., Le Thi, H.A.: A D.C. optimization algorithm for solving the trust-
region subproblem. SIAM J. Optim. 8(2), 476–505 (1998)
20. Pham, D.T., Le Thi, H.A., Pham, V.N., Niu, Y.S.: DC programming approaches
for discrete portfolio optimization under concave transaction costs. Optim. Lett.
10(2), 261–282 (2016)
21. Karp, R.M.: Reducibility among combinatorial problems. In: Miller, R.E.,
Thatcher, J.W. (eds.) Complexity of Computer Computations, The IBM Research
Symposia Series, pp. 85–103. Springer, Boston (1972). https://doi.org/10.1007/978-1-4684-2001-2_9
22. Schleich, J., Le Thi, H.A., Bouvry, P.: Solving the minimum M-dominating set
problem by a continuous optimization approach based on DC programming and
DCA. J. Comb. Optim. 24(4), 397–412 (2012)
Sentence Compression via DC
Programming Approach
1 Introduction
Recent years have seen the rapid evolution of artificial intelligence (AI)
technologies, and sentence compression problems have attracted the
attention of researchers due to the necessity of dealing with huge amounts of
natural language information within a very short response time. The general idea of
sentence compression is to produce a summary with shorter sentences containing
the most important information while maintaining grammatical rules. Nowadays,
various technologies involve sentence compression, such as text summa-
rization, search engines and question answering. Sentence compression will be
a key technology in future human-AI interaction systems.
There are various models proposed for sentence compression. The paper of
Jing [3] may be one of the first works addressing this topic, with many rewrit-
ing operations such as deletion, reordering, substitution, and insertion. This approach
The research is funded by Natural Science Foundation of China (Grant No: 11601327)
and by the Key Construction National “985” Program of China (Grant No:
WF220426001).
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 341–351, 2020.
https://doi.org/10.1007/978-3-030-21803-4_35
¹ Punctuation is also deemed a word.
where P(xi |start) stands for the probability of a sentence starting with xi ,
P(xk |xi , xj ) denotes the probability that xi , xj , xk occur successively in a sen-
tence, and P(end|xi , xj ) denotes the probability that xi , xj end a sentence. The
probability P(xi |start) is computed by a bigram model, and the others are com-
puted by a trigram model based on some corpora.
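The objective then decomposes into start, trigram and end log-probabilities. A toy sketch with made-up probability tables (the table values are illustrative, and the second word's bigram factor is folded into the trigram table for brevity — both are our assumptions):

```python
import math

# Toy probability tables (values are illustrative, not from a real corpus).
p_start = {"he": 0.2}                    # P(x_i | start), bigram model
p_tri = {("he", "ran", "fast"): 0.1}     # P(x_k | x_i, x_j), trigram model
p_end = {("ran", "fast"): 0.3}           # P(end | x_i, x_j)

def log_prob(words):
    """Log-probability of a compression under the start/trigram/end model."""
    lp = math.log(p_start[words[0]])
    # The second word would need P(x_j | x_i); we fold it into the trigram
    # table here for brevity -- an assumption, not the exact factorization.
    for i in range(len(words) - 2):
        lp += math.log(p_tri[tuple(words[i:i + 3])])
    lp += math.log(p_end[tuple(words[-2:])])
    return lp

print(round(log_prob(["he", "ran", "fast"]), 4))
```

Working in log space turns the product of probabilities into a sum, which is what makes the objective linear in the trigram selection variables.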
Constraints: The following sequential constraints will be introduced to restrict
the possible trigram combinations:
Constraint 1 Exactly one word can begin a sentence.
∑_{i=1}^{n} αi = 1. (1)
² [[m, n]] with m ≤ n stands for the set of integers between m and n.
with given lower and upper bounds l and l̄ on the compression length.
Constraint 7 The introducing term of a prepositional phrase (PP) or subordinate
clause (SBAR) must be included in the compression if any word of the phrase
is included; otherwise, the phrase should be entirely removed. Denoting by
Ii = {j : xj ∈ PP/SBAR, j ≠ i} the index set of the words included in the PP/SBAR
led by the introducing term xi , then
∑_{j∈Ii} δj ≥ δi , δi ≥ δj , ∀j ∈ Ii . (7)
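Constraint (7) can be checked per phrase as below; the index sets and the 0-based indexing are illustrative assumptions:

```python
def pp_sbar_ok(delta, i, I_i):
    """Check constraint (7): sum_{j in I_i} delta_j >= delta_i and
    delta_i >= delta_j for all j in I_i.  Indices are 0-based here."""
    if sum(delta[j] for j in I_i) < delta[i]:
        return False                      # intro kept but phrase fully dropped
    return all(delta[i] >= delta[j] for j in I_i)

# Toy PP led by word 2, covering words 3 and 4.
print(pp_sbar_ok([1, 1, 1, 1, 0], i=2, I_i=[3, 4]))  # intro + one word kept: OK
print(pp_sbar_ok([1, 1, 0, 1, 0], i=2, I_i=[3, 4]))  # word kept without intro: violated
```

The pairwise inequalities force the introducing term in whenever any phrase word survives, while the aggregate inequality forbids keeping the introducing term alone.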
This step helps to extract the sentence trunk, keeping the main meaning of
the original sentence while reducing the number of binary decision variables.
More precisely, we introduce for each node Ni of the parse tree a label
sNi taking values in {0, 1, 2}. The value 0 represents the deletion of the node;
1 represents the reservation of the node; whereas 2 indicates that the node can
either be deleted or reserved. We set these labels as compression rules for
each CFG grammar so as to support any sentence type of any language.
For the word xi , we go through all its parent nodes up to the root S. If the
traversal path contains a 0, then δi = 0; else if the traversal path contains only 1s,
then δi = 1; otherwise δi will be further determined by solving the ILP model.
The sentence trunk is composed of the words xi whose δi are fixed to 1. Using
this method, we can extract the sentence trunk and reduce the number of binary
variables in the ILP model.
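The traversal rule can be sketched as follows; the parent-map encoding of the parse tree and the function name are our assumptions:

```python
def fix_delta(word_node, parent, label):
    """Decide delta for a word from labels {0, 1, 2} along its path to root S.
    Returns 0 (delete), 1 (keep), or None (left to the ILP model)."""
    path = []
    node = word_node
    while node is not None:            # climb to the root collecting labels
        path.append(label[node])
        node = parent.get(node)
    if 0 in path:
        return 0                       # a deleted ancestor deletes the word
    if all(l == 1 for l in path):
        return 1                       # all reserved: word is in the trunk
    return None                        # contains a 2: decided by the ILP

parent = {"w": "NP", "NP": "S"}        # toy parse path w -> NP -> S
print(fix_delta("w", parent, {"w": 1, "NP": 1, "S": 1}))  # -> 1
print(fix_delta("w", parent, {"w": 1, "NP": 0, "S": 1}))  # -> 0
print(fix_delta("w", parent, {"w": 2, "NP": 1, "S": 1}))  # -> None
```

Only the words returning None stay as free binary variables, which is exactly how the trunk extraction shrinks the ILP.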
Step 4 (Solve ILP): Apply an ILP solution algorithm to solve the simplified
ILP model derived in Step 3 and generate a compression. In the next section,
we introduce a DC programming approach for solving the ILP.
min{f(x) := cᵀx : x ∈ S} (P)
min{f (x) : x ∈ K}
Based on the exact penalty theorem [5,11], there exists a large enough param-
eter t ≥ 0 such that problem (P) is equivalent to problem (Pt):
4 Experimental Results
In this section, we present our experimental results for assessing the performance
of the sentence compression model described above.
Our sentence compression model is implemented in Python as a Natural
Language Processing package called 'NLPTOOL' (currently supporting multi-
language tokenization, tagging, parsing, automatic CFG grammar generation,
and sentence compression), which embeds NLTK 3.2.5 [19] for creating parse
trees and Gurobi 8.1.0 [2] for solving the linear relaxation problems R(Pi ) and
the convex optimization subproblems in Step 2 of DCA. The PDCABB algorithm
is implemented in C++ and invoked from Python. The parallel computing part of
PDCABB is realized with OpenMP.
better F-scores on average for all compression rates, while the computing times
for both Gurobi and PDCABB are very short (less than 0.2 s). We can
also see that Gurobi and PDCABB provided different solutions, since the F-scores
differ. This is due to the fact that a branch-and-bound algorithm finds only
approximate global solutions when the gap between upper and lower bounds is
small enough. Even when both solvers provide global optimal solutions, these
solutions can still differ, since the global optimal solution of an ILP may
not be unique. However, the reliability of our judgment is still guaranteed,
since these two algorithms provided very similar F-score results.
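The F-scores in the table compare the kept words against a reference compression. A common word-overlap formulation (our assumption of the exact metric used) is:

```python
def f_score(system, reference):
    """Harmonic mean of precision and recall over kept word sets."""
    sys_set, ref_set = set(system), set(reference)
    overlap = len(sys_set & ref_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(sys_set)   # kept words that are correct
    recall = overlap / len(ref_set)      # reference words that were kept
    return 2 * precision * recall / (precision + recall)

print(round(f_score(["a", "b", "c"], ["b", "c", "d"]) * 100, 1))  # -> 66.7
```

Because it balances precision and recall, a compression that keeps too much or too little is penalized either way.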
Corpus+Model  Solver   50% compression rate    70% compression rate    90% compression rate
                       F-score (%)  Time (s)   F-score (%)  Time (s)   F-score (%)  Time (s)
Treebank+P    Gurobi   56.5         0.099      72.1         0.099      79.4         0.081
              PDCABB   59.1         0.194      76.2         0.152      80.0         0.122
Treebank+H    Gurobi   79.0         0.064      82.6         0.070      81.3         0.065
              PDCABB   79.9         0.096      82.7         0.171      82.1         0.121
Clarke+P      Gurobi   70.6         0.087      80.2         0.087      80.0         0.071
              PDCABB   81.4         0.132      80.0         0.128      81.2         0.087
Clarke+H      Gurobi   77.8         0.046      85.5         0.052      82.4         0.041
              PDCABB   79.9         0.081      85.2         0.116      82.3         0.082
References
1. Clarke, J., Lapata, M.: Global inference for sentence compression: an integer linear
programming approach. J. Artif. Intell. Res. 31, 399–429 (2008)
2. Gurobi 8.1.0. http://www.gurobi.com
3. Jing, H.: Sentence reduction for automatic text summarization. In: Proceedings of
the 6th Applied Natural Language Processing Conference, pp. 310–315 (2000)
4. Knight, K., Marcu, D.: Summarization beyond sentence extraction: a probabilistic
approach to sentence compression. Artif. Intell. 139, 91–107 (2002)
5. Le Thi, H.A., Pham, D.T., Le Dung, M.: Exact penalty in dc programming. Viet-
nam J. Math. 27(2), 169–178 (1999)
6. Le Thi, H.A., Pham, D.T.: A continuous approach for large-scale constrained
quadratic zero-one programming. Optimization 45(3), 1–28 (2001)
7. Le Thi, H.A., Pham, D.T.: The DC (difference of convex functions) programming
and DCA revisited with DC models of real world nonconvex optimization problems.
Ann. Oper. Res. 133, 23–46 (2005)
8. Le Thi, H.A., Nguyen, Q.T., Nguyen, H.T., et al.: Solving the earliness tardiness
scheduling problem by DC programming and DCA. Math. Balk. 23, 271–288 (2009)
9. Le Thi, H.A., Moeini, M., Pham, D.T.: Portfolio selection under downside risk mea-
sures and cardinality constraints based on DC programming and DCA. Comput.
Manag. Sci. 6(4), 459–475 (2009)
10. Le Thi, H.A., Minh, L.H., Pham, D.T., Bouvry, P.: Solving the perceptron problem
by deterministic optimization approach based on DC programming and DCA. In:
Proceeding in INDIN 2009, Cardiff. IEEE (2009)
11. Le Thi, H.A., Pham, D.T., Huynh, V.N.: Exact penalty and error bounds in dc
programming. J. Glob. Optim. 52(3), 509–535 (2012)
12. McDonald, R.: Discriminative sentence compression with soft syntactic con-
straints. In: Proceedings of EACL, pp. 297–304 (2006)
13. Niu, Y.S., Pham, D.T.: A DC programming approach for mixed-integer linear
programs. In: Modelling, Computation and Optimization in Information Systems
and Management Sciences, CCIS, vol. 14, pp. 244–253 (2008)
14. Niu, Y.S.: Programmation DC & DCA en Optimisation Combinatoire et Optimi-
sation Polynomiale via les Techniques de SDP. Ph.D. thesis, INSA, France (2010)
15. Niu, Y.S., Pham, D.T.: Efficient DC programming approaches for mixed-integer
quadratic convex programs. In: Proceedings of the International Conference on
Industrial Engineering and Systems Management (IESM2011), pp. 222–231 (2011)
16. Niu, Y.S.: On difference-of-SOS and difference-of-convex-SOS decompositions for
polynomials (2018). arXiv:1803.09900
17. Niu, Y.S.: A parallel branch and bound with DC algorithm for mixed integer opti-
mization. In: The 23rd International Symposium in Mathematical Programming
(ISMP2018), Bordeaux, France (2018)
18. Nguyen, H.T., Pham, D.T.: A continuous DC programming approach to the strate-
gic supply chain design problem from qualified partner set. Eur. J. Oper. Res.
183(3), 1001–1012 (2007)
19. NLTK 3.2.5: The Natural Language Toolkit. http://www.nltk.org
20. Pham, D.T., Le Thi, H.A., Pham, V.N., Niu, Y.S.: DC programming approaches
for discrete portfolio optimization under concave transaction costs. Optim. Lett.
10(2), 261–282 (2016)
21. Schleich, J., Le Thi, H.A., Bouvry, P.: Solving the minimum m-dominating set
problem by a continuous optimization approach based on DC programming and
DCA. J. Comb. Optim. 24(4), 397–412 (2012)
Discrete Optimization and Network
Optimization
A Horizontal Method of Localizing Values
of a Linear Function in
Permutation-Based Optimization
1 Introduction
Combinatorial optimization problems (COPs) with permutations as candidate
solutions, commonly known as permutation-based problems [13], can be found in
a variety of application areas such as balancing problems associated with chip
design, ship loading, aircraft outfitting and turbine balancing, as well as in geomet-
ric design, facility layout, VLSI design, campus design, assignment, scheduling,
routing, process communications, ergonomics, network analysis,
cryptography, etc. [4,9,10,13–15,19,23–25,27,34,35].
Different COPs are easily representable by the graph-theoretic approach (GTA)
[1–3,6–8,11]. First of all, this concerns COPs on a set E coinciding with the vertex set
of its convex hull P (vertex-located sets, VLSs [28,30]). Such COPs are equiv-
alent to optimization problems on the node set of a skeleton graph G = (E, E)
of the polytope P , where E is the edge set of P . Note that in case E is not
a VLS, approaches to equivalently reformulating the COP as an optimiza-
tion problem on a VLS in a higher-dimensional space can be applied first [31,32].
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 355–364, 2020.
https://doi.org/10.1007/978-3-030-21803-4_36
356 L. Koliechkina and O. Pichugina
The benefits of using the graph-theoretic approach are not limited to simple
illustration; it also provides an opportunity to develop approaches to solving
COPs based on configuration graphs [11] and structural graphs [1–3,6–8]
of the problems. Localization of COP solutions or of objective function
values is an interesting technique that allows a considerable reduction of the search domain
by exploiting the specifics of the domain, the type of constraints and the objec-
tive function [1,3,6,7,29]. In particular, the method of ordering the values of
an objective function on an image En (A) in Rn of a set of n-permutations induced
by a set A is considered in [1]. It consists in constructing a Hamiltonian path
in the permutation graph, which is a skeleton graph of the permutohe-
dron Pn (A) = conv(En (A)). In [2], a similar problem is considered on an image
Enk (A) in Euclidean space of a set of multipermutations induced by a multiset A
including k different elements. In this case, a skeleton graph of the generalized
permutohedron (the multipermutation graph) is considered instead of the per-
mutation graph. In [8], linearly constrained single- and multiobjective COPs on
Enk (A) are solved using the multipermutation graph.
This paper is dedicated to developing GTA techniques [1–3,6–8] for solving
permutation-based COPs (PBOPs) related to the localization of objective function
values. Namely, we study a generalization of the Subset Sum Problem (SSP) [5,12],
which is a known NP-complete COP, from the Boolean set Bn as the admissible domain to
En (A) (further referred to as a permutation-based SSP, BP-SSP). Also, we
consider versions of BP-SSP where a single feasible solution x∗ is sought (BP-SSP1)
or the complete solution set X ∗ is sought (BP-SSP2).
Jm = {1, . . . , m}.
E = En (A);
f1 (x) = f (x) − z0 ≤ 0; f2 (x) = −f (x) + z0 ≤ 0, (6)
f (x) = z0 ,
is a lower/upper bound on the branch E(cd), where zmin (cd) = f (ymin (cd)),
zmax (cd) = f (ymax (cd)).
G(cd) is a skeleton graph of conv(E(cd)); G(cd) is a directed grid graph,
shown in Fig. 1. G(cd) has 2(n − lcd ) nodes, two of which have already been examined,
namely the top-left and bottom-right ones: zmax (cdn−lcd ) = zmin (cd), zmin (cd1 ) =
zmin (cd). In the terminology of [8], G(cd) is a two-dimensional structural graph
of BP-COP2.
Pruning branches:
– if z0 > zmax (cd) or z0 < zmin (cd), then prune E(cd) (rule PB1);
– if z0 = zmax (cd), then find Xmax (cd), update X ∗ : X ∗ = X ∗ ∪ Ymax (cd), where
Ymax (cd) = {(x, cd)}x∈Xmax (cd) , and prune E(cd) (rule PB2);
– if z0 = zmin (cd), then find Xmin (cd), update X ∗ : X ∗ = X ∗ ∪ Ymin (cd), where
Ymin (cd) = {(x, cd)}x∈Xmin (cd) , and prune E(cd) (rule PB3).
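The three pruning rules condense into one decision function; the string return codes are our own encoding:

```python
def prune_decision(z0, zmin, zmax):
    """Apply rules PB1-PB3 to a branch with value bounds [zmin, zmax].
    Returns 'prune', 'collect-max', 'collect-min', or 'branch'."""
    if z0 > zmax or z0 < zmin:
        return "prune"           # PB1: z0 unreachable on this branch
    if z0 == zmax:
        return "collect-max"     # PB2: record Xmax(cd), then prune
    if z0 == zmin:
        return "collect-min"     # PB3: record Xmin(cd), then prune
    return "branch"              # z0 strictly inside: keep branching

print(prune_decision(109, 83, 127))   # -> 'branch'
print(prune_decision(130, 83, 127))   # -> 'prune'
print(prune_decision(127, 83, 127))   # -> 'collect-max'
```

Only branches whose bound interval strictly contains z0 survive, which is what drives the reduction of the search domain.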
4 BP-COP2 Example
Solve BP-COP2 with n = 6, c = (2, 3, 4, 6, 7, 8), A = J6 , z0 = 109.
The coefficients of c are all different; therefore, by (2)–(3), rules PB2 and PB3
simplify to:
– if z0 = zmax (cd), then X ∗ = X ∗ ∪ {(xmax (cd), cd)} (rule PB2');
– if z0 = zmin (cd), then X ∗ = X ∗ ∪ {(xmin (cd), cd)} (rule PB3').
Step 1. cd = X 0 = ∅, lcd = 0.
xmin (cd) = xmin = (6, 5, 4, 3, 2, 1), xmax (cd) = xmax = (1, 2, 3, 4, 5, 6),
zmin (cd) = 83 < z0 = 109 < zmax (cd) = 127. The branch E(cd) is not dis-
carded; branch(E(cd)) = {E(i)}i∈J6 . Graph G(∅) is depicted in Fig. 3 with E(6)
on top and E(1) on bottom, together with the bounds lb(E(i)), ub(E(i)), i ∈ Jn .
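The bounds zmin(cd) = 83 and zmax(cd) = 127 follow from the rearrangement inequality: pairing c (sorted ascending) with A in the opposite order minimizes the sum, and in the same order maximizes it. A quick check of the example's numbers:

```python
def bounds(c, A):
    """Min/max of sum c_i * x_i over all permutations x of the multiset A,
    by the rearrangement inequality."""
    cs, As = sorted(c), sorted(A)
    zmin = sum(ci * ai for ci, ai in zip(cs, reversed(As)))  # opposite order
    zmax = sum(ci * ai for ci, ai in zip(cs, As))            # same order
    return zmin, zmax

print(bounds([2, 3, 4, 6, 7, 8], [1, 2, 3, 4, 5, 6]))  # -> (83, 127)
```

This matches xmin = (6, 5, 4, 3, 2, 1) and xmax = (1, 2, 3, 4, 5, 6) in Step 1.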
5 Conclusion
References
1. Donec, G.A., Kolechkina, L.M.: Construction of Hamiltonian paths in graphs of
permutation polyhedra. Cybern. Syst. Anal. 46(1), 7–13 (2010). https://doi.org/
10.1007/s10559-010-9178-1
2. Donec, G.A., Kolechkina, L.M.: Extremal Problems on Combinatorial Configura-
tions. RVV PUET, Poltava (2011)
3. Donets, G.A., Kolechkina, L.N.: Method of ordering the values of a linear function
on a set of permutations. Cybern. Syst. Anal. 45(2), 204–213 (2009). https://doi.
org/10.1007/s10559-009-9092-6
4. Gimadi, E., Khachay, M.: Extremal Problems on Sets of Permutations. Ural Federal
University, Yekaterinburg (2016). [in Russian]
5. Kellerer, H., Pferschy, U., Pisinger, D.: Knapsack Problems. Springer, Berlin, New
York (2010)
6. Koliechkina, L.M., Dvirna, O.A.: Solving extremum problems with linear fractional
objective functions on the combinatorial configuration of permutations under mul-
ticriteriality. Cybern. Syst. Anal. 53(4), 590–599 (2017). https://doi.org/10.1007/
s10559-017-9961-3
7. Koliechkina, L.N., Dvernaya, O.A., Nagornaya, A.N.: Modified coordinate method
to solve multicriteria optimization problems on combinatorial configurations.
Cybern. Syst. Anal. 50(4), 620–626 (2014). https://doi.org/10.1007/s10559-014-
9650-4
8. Koliechkina, L., Pichugina, O.: Multiobjective Optimization on Permutations with
Applications. DEStech Trans. Comput. Sci. Eng. Supplementary Volume OPTIMA
2018, 61–75 (2018). https://doi.org/10.12783/dtcse/optim2018/27922
9. Kozin, I.V., Maksyshko, N.K., Perepelitsa, V.A.: Fragmentary structures in dis-
crete optimization problems. Cybern. Syst. Anal. 53(6), 931–936 (2017). https://
doi.org/10.1007/s10559-017-9995-6
10. Korte, B., Vygen, J.: Combinatorial Optimization: Theory and Algorithms.
Springer, New York (2018)
11. Lengauer, T.: Combinatorial Algorithms for Integrated Circuit Layout.
Vieweg+Teubner Verlag (1990)
12. Martello, S., Toth, P.: Knapsack Problems: Algorithms and Computer Implemen-
tations. Wiley, Chichester, New York (1990)
13. Mehdi, M.: Parallel Hybrid Optimization Methods for permutation based problems
(2011). https://tel.archives-ouvertes.fr/tel-00841962/document
14. Pichugina, O.: Placement problems in chip design: Modeling and optimization. In:
2017 4th International Scientific-Practical Conference Problems of Infocommuni-
cations. Science and Technology (PIC S&T). pp. 465–473 (2017). https://doi.org/
10.1109/INFOCOMMST.2017.8246440
15. Pichugina, O., Farzad, B.: A human communication network model. In: CEUR
Workshop Proceedings, pp. 33–40. KNU, Kyiv (2016)
16. Pichugina, O., Yakovlev, S.: Convex extensions and continuous functional repre-
sentations in optimization, with their applications. J. Coupled Syst. Multiscale
Dyn. 4(2), 129–152 (2016). https://doi.org/10.1166/jcsmd.2016.1103
17. Pichugina, O.S., Yakovlev, S.V.: Functional and analytic representations of the
general permutation. East. Eur. J. Enterp. Technol. 79(4), 27–38 (2016). https://
doi.org/10.15587/1729-4061.2016.58550
18. Pichugina, O.S., Yakovlev, S.V.: Continuous representations and functional exten-
sions in combinatorial optimization. Cybern. Syst. Anal. 52(6), 921–930 (2016).
https://doi.org/10.1007/s10559-016-9894-2
Abstract. Well-known graph theory problems are graph coloring and finding
the maximum clique in an undirected graph, MCP for short; these
problems are closely related. Vertex coloring is usually considered an initial step
before finding the maximum clique of a graph. The maximum clique
problem is NP-hard, which means that no algorithm is known that
can solve this kind of problem in polynomial time.
Maximum clique algorithms make heavy use of heuristic vertex coloring
algorithms to find bounds and estimates. One class of such algorithms executes
the coloring only in the first stage, so those algorithms are less concerned with the
performance of the heuristic and more with the discovered colors. Researchers
always face the question of which heuristic vertex coloring algorithm should be
selected to improve the performance of the core algorithm. Here we try to give
insights into existing heuristic vertex coloring algorithms and compare
them, identifying their ability to find color classes: 17 coloring algorithms are
investigated, described and tested on random graphs.
1 Introduction
Let G = (V, E) be an undirected graph. Then, V is a finite set of elements called vertices
and E is a finite set of unordered pairs of vertices, called edges. The cardinality of a set
of vertices, or just the number of its elements, is called the order of a graph and is
denoted as n = |V|. The cardinality of a set of edges, or just the number of its edges, is
called the size of a graph and is denoted as m = |E| [1]. If vi and vj are vertices
of the same graph and there is an edge between them, then these vertices are
adjacent. The degree of vertex v in graph G is
the number of edges incident to it [1]; in other words, it is the number of the
vertex's neighbors. The maximum degree of a graph is the degree of the vertex
with the most neighbors, and the minimum degree is the degree of the vertex
with the fewest neighbors. Usually, the degree of a vertex is denoted deg(v).
Density is the ratio of the number of edges in graph G to the number of vertices
of the graph; it is denoted g(G).
The complement of graph G is the graph with the same vertices as G in which any
two vertices are adjacent if and only if the same vertices are nonadjacent in the
original graph. A simple graph is an undirected graph with finite sets
of vertices and edges that has no loops or multiple edges. A subgraph G′ = (V′, E′)
is a subset of the vertices of graph G together with some of the corresponding edges
(V′ ⊆ V, E′ ⊆ E); not all possible edges need be included, so vertices vi and vj that are
adjacent in graph G may have no edge between them in a subgraph of G. An induced
subgraph G′ = (V′, E′) is a subset of the vertices of graph G with all their
corresponding edges: G[V′] = (V′ ⊆ V, E′ = {(vi, vj) | i ≠ j, (vi, vj) ∈ E, vi, vj ∈ V′}).
A complete subgraph G′ = (V′, E′) is a subset of the vertices of graph G, with all their
corresponding edges, in which each pair of vertices is connected by an edge. A clique is a
complete subgraph of graph G. A clique V′ in graph G is called maximal if there does
not exist any other clique V″ such that V′ ⊂ V″. The size of the largest clique in
graph G is called the clique number [1]. An independent set (IS) of a graph G is any
subset of vertices V′ ⊆ V whose vertices are pairwise nonadjacent. So, it is not hard to
conclude that every clique in graph G corresponds to an independent set in the
complement graph G′ and vice versa. A coloring assigns colors to the vertices of a
graph according to the algorithm's construction. If we have an undirected graph
G = (V, E), then the assignment of colors must follow the rule below:
• for every edge (vi, vj) ∈ E with i ≠ j: c(vi) ≠ c(vj)
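The rule above can be checked directly in code. A minimal illustrative sketch (our own example, assuming the graph is given as an edge list and the coloring as a dict):

```python
def is_proper_coloring(edges, color):
    """The rule above: for every edge (vi, vj), c(vi) != c(vj)."""
    return all(color[u] != color[v] for u, v in edges)

edges = [(0, 1), (1, 2), (0, 2)]                       # a triangle
print(is_proper_coloring(edges, {0: 0, 1: 1, 2: 2}))   # True
print(is_proper_coloring(edges, {0: 0, 1: 1, 2: 0}))   # False: 0 and 2 are adjacent
```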
2 Coloring Algorithms
Many algorithms have been developed to solve the graph coloring problem heuristically,
but Greedy remains the basic algorithm for assigning colors in a graph. It provides a
relatively good solution in a small amount of time. The order in which the algorithm
colors the vertices plays a major role in the process and heavily affects the quality of the
coloring. Therefore, there are many algorithms that employ different ordering
heuristics to determine the order before coloring the vertices. These algorithms are
mostly based on Greedy but use additional vertex ordering to achieve better perfor-
mance. As a rule, they surpass Greedy in the number of colors used, producing better
results but taking more time to complete. The most popular ordering heuristics are:
• First-Fit ordering - the most primitive ordering there is. It assigns each vertex the
lowest possible color and is the fastest among the ordering heuristics.
• Degree-based ordering - uses a certain criterion to order the vertices and then
chooses the correct one to color. It takes much more time than First-Fit
ordering but produces much better results in terms of the number of used colors.
There are many different degree ordering heuristics, but the most popular among
them are:
a. Random: colors the vertices of a graph in random order or according to random
degree function, i.e. random unique numbers given to every vertex;
b. Largest-First: colors the vertices of a graph in order of decreasing degree, i.e. it
takes into account the number of neighbors of each vertex;
c. Smallest-Last: repeatedly assigns weights to the vertices of a graph with the
smallest degree, and removes them from the graph, then colors the vertices
according to their weights in decreasing order [4];
d. Incidence: sequentially colors the vertices of a graph according to the highest
number of colored neighbors;
e. Saturation: iteratively colors the vertices of a graph by the largest number of
distinctly colored neighbors;
f. Mixed/Combined: uses a combination of known ordering heuristics. For
example, saturation degree ordering combined with largest-first ordering, where
the latter is used only to break ties, i.e., when the saturation degree of some
vertices is the same.
Sequential algorithms tend to perform many tasks that could be executed simul-
taneously; that is why many popular algorithms have parallel versions.
1. Greedy – classical algorithm introduced by Welsh and Powell in 1967 [3]. It iterates
over the vertices of a graph and assigns each vertex the smallest possible color that
is not assigned to any adjacent vertex, i.e., no neighbors share the same color.
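The greedy rule can be sketched in a few lines (an illustrative sketch, not the implementation evaluated in the paper; the graph is assumed to be an adjacency-list dict):

```python
def greedy_coloring(adj, order=None):
    """Visit vertices in the given order; give each the smallest color
    not already used by a colored neighbor."""
    color = {}
    for v in (order if order is not None else adj):
        used = {color[w] for w in adj[v] if w in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}     # a 4-cycle
print(greedy_coloring(adj))                            # {0: 0, 1: 1, 2: 0, 3: 1}
```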
2. Largest-First - Welsh and Powell also suggested an ordering for the greedy algo-
rithm, called largest-first, based on vertices' degrees. The algorithm orders the
vertices according to the number of neighbors each of them has and then starts
the greedy coloring.
368 D. Kumlander and A. Kulitškov
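The largest-first ordering is then a one-line sort feeding the greedy rule; a sketch (the greedy core is repeated to keep the example self-contained):

```python
def greedy_coloring(adj, order):
    color = {}
    for v in order:
        used = {color[w] for w in adj[v] if w in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

def largest_first(adj):
    """Welsh-Powell ordering: sort vertices by decreasing degree, then color greedily."""
    order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    return greedy_coloring(adj, order)

adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}   # a star: the center is colored first
print(largest_first(adj))                      # {0: 0, 1: 1, 2: 1, 3: 1}
```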
3. Largest-First V2 - A slightly modified version of the Largest-First algorithm. In this
algorithm more than one vertex can be colored in each iteration: after coloring
the vertex with the largest number of neighbors, the algorithm also assigns the same
color to all vertices that respect the rule of coloring (no adjacent vertices
share the same color) and, finally, removes these vertices from the graph.
4. Largest-First V3 - Based on the second version, we made a third edition of the
Largest-First algorithm. The main idea of the algorithm is the same as in V2;
however, the vertices are reordered in each iteration, meaning
that when a vertex is removed from the graph, its neighbors' degrees are decreased.
5. DSatur - This heuristic algorithm was developed by Daniel Brelaz in 1979 [5]. Its
core idea is to order the vertices by their saturation degrees. If a tie occurs, the
vertex with the largest number of uncolored neighbors is chosen. By assigning
colors to the vertex with the largest number of distinctly colored neighbors first,
DSatur minimizes the possibility of setting an incorrect color [2].
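A compact sketch of the DSatur selection rule (our own illustration; tie-breaking follows the description above):

```python
def dsatur(adj):
    """Repeatedly color the vertex with the highest saturation degree
    (number of distinctly colored neighbors); per the description above,
    ties are broken by the number of uncolored neighbors."""
    color, uncolored = {}, set(adj)
    while uncolored:
        v = max(uncolored,
                key=lambda v: (len({color[w] for w in adj[v] if w in color}),
                               sum(1 for w in adj[v] if w not in color)))
        used = {color[w] for w in adj[v] if w in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
        uncolored.remove(v)
    return color

adj = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [0, 3]}  # 5-cycle
col = dsatur(adj)
print(len(set(col.values())))  # 3 (the chromatic number of an odd cycle)
```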
6. DSatur V2 - another interesting version of DSatur [6]. It first finds a largest
clique of the graph, assigns each of its vertices a distinct color, and removes the
newly colored vertices from the graph. After this procedure, the algorithm executes
as the original DSatur. Equivalently, the algorithm takes the complement graph,
finds the largest independent set there, colors it, then removes these vertices
from the graph and starts working as the first version of DSatur.
7. Incidence degree ordering (IDO) - This ordering was first introduced by Daniel
Brelaz [5] and was modified by Coleman and More in their work [7]. In short, it
is a modification of the DSatur algorithm. The main principle of this heuristic is to
order the vertices by decreasing number of colored neighbors. If a tie
occurs, the vertex to be chosen can be decided by the use of
random numbers. The coloring itself is done by the Greedy algorithm.
8. MinMax - The MinMax algorithm was introduced by Hilal Almara'Beh and Amjad
Suleiman in their work in 2012 [8]. The main function of this algorithm is to find
a maximum independent set, but it can be used for coloring purposes as well
because independent sets are color classes.
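Since independent sets are color classes, a coloring can be obtained by repeatedly extracting a maximal independent set and giving it one color. The sketch below illustrates only this idea; it is not the exact MinMax procedure of [8]:

```python
def independent_set_coloring(adj):
    """Repeatedly build a maximal independent set greedily and give all of
    its vertices the next color; each independent set is one color class."""
    color, remaining, c = {}, set(adj), 0
    while remaining:
        indep = set()
        for v in sorted(remaining):                 # deterministic scan order
            if all(w not in indep for w in adj[v]):
                indep.add(v)
        for v in indep:
            color[v] = c
        remaining -= indep
        c += 1
    return color

adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}  # 4-cycle
print(independent_set_coloring(adj))                # two color classes: {0, 2} and {1, 3}
```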
9. LDO-IDO - This modification was introduced by Hussein Al-Omari and Khair
Eddin Sabri in their work in 2006 [9]. The basic heuristic for this algorithm is
Largest-First; if a tie occurs, the IDO heuristic decides which vertex to take.
On the whole, this is almost the same algorithm as Largest-First V3 with an IDO
function inside: the first ordering is done by the largest number
of neighbors and then by the largest number of colored neighbors.
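The two-level ordering can be expressed as a single composite sort key; a hypothetical helper (our own sketch) showing how LDO-IDO picks the next vertex:

```python
def next_vertex_ldo_ido(adj, color, uncolored):
    """LDO first (largest degree), IDO as tie-break (most colored neighbors)."""
    return max(uncolored,
               key=lambda v: (len(adj[v]),
                              sum(1 for w in adj[v] if w in color)))

adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
# vertices 0 and 1 tie on degree; 0 has a colored neighbor, so IDO selects 0
print(next_vertex_ldo_ido(adj, {2: 0}, {0, 1, 3}))  # 0
```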
10. DSatur-LDO - This modification of the DSatur algorithm was also introduced by
Hussein Al-Omari and Khair Eddin Sabri in their work in 2006 [9]. The
algorithm works as DSatur, but if a tie occurs, the Largest-First heuristic steps in
to solve the conflict. According to the results, this heuristic works a little
better than the original DSatur within the same amount of time.
11. DSatur-IDO-LDO - In this algorithm, ties are resolved first by Incidence Degree
Ordering, and then the remaining ties are resolved by Largest Degree
Ordering [10].
1. Jones and Plassmann algorithm - The algorithm was first proposed by Jones and
Plassmann in their work in 1993 [11] and is based on Luby's parallel
algorithm [12]. The core idea is to construct a unique set of weights at the
beginning, for example random numbers, to be used throughout the algorithm;
any conflict between equal random numbers is resolved by the vertex number.
In each iteration, the JP algorithm finds an independent set of the graph, i.e., all the
vertices whose weight is higher than the weights of the neighboring vertices, and
then assigns colors to these vertices using the Greedy algorithm. Every action is
done in parallel.
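A sequential simulation of the JP rounds (an illustrative sketch; the real algorithm colors each round's independent set in parallel):

```python
import random

def jones_plassmann(adj, seed=0):
    """Sequentially simulate the JP rounds: every uncolored vertex whose
    weight beats all uncolored neighbors (a local maximum) belongs to an
    independent set and receives its smallest available color.  Weights are
    random numbers with ties resolved by vertex number."""
    rng = random.Random(seed)
    weight = {v: (rng.random(), v) for v in adj}
    color, uncolored = {}, set(adj)
    while uncolored:
        winners = [v for v in uncolored
                   if all(w not in uncolored or weight[v] > weight[w]
                          for w in adj[v])]
        for v in winners:            # colored simultaneously in the real algorithm
            used = {color[w] for w in adj[v] if w in color}
            c = 0
            while c in used:
                c += 1
            color[v] = c
        uncolored -= set(winners)
    return color

adj = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [0, 3]}  # 5-cycle
col = jones_plassmann(adj)
print(all(col[u] != col[w] for u in adj for w in adj[u]))  # True: a proper coloring
```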
2. Jones and Plassmann V2 - Another version of JP algorithm was introduced by
William Hasenplaugh, Tim Kaler, Tao B. Schardl and Charles E. Leiserson in their
work in 2014 [4]. The idea behind the modification was to use recursion. The
algorithm orders the vertices in the order of function p, which generates random
numbers. It starts by partitioning the neighbors of each vertex into predecessors (the
vertices with larger priorities) and successors (the vertices with lower priorities) [4].
If there are no vertices in predecessors, then the algorithm begins coloring. It has a
helper function named JpColor, which uses recursion to color the vertices. The
color is chosen by collecting all the colors from the predecessors and choosing the
smallest possible one (this is done in the GetColor helper function). When a
vertex with an empty predecessors list is colored, the algorithm scans this vertex's
successors list for vertices whose predecessor counter has reached zero and starts
coloring them. All of this is done in parallel subtasks.
3. Parallel Largest-First - Parallel Largest-First uses the JP algorithm as its base, but
with Largest-First as the heuristic. The main difference is that the weight system
used in JP is replaced by the largest degree of each vertex. However, random
numbers are not removed.
4. Parallel Smallest-Last - The Smallest-Last heuristic was first introduced by
Matula in his work in 1972 [13]. The SL heuristic's system of weights is more
sophisticated and complex. The algorithm uses two phases [14]: a weighting phase
and a coloring phase. The weighting phase begins by finding the vertices that
correspond to the current smallest degree in the graph. These vertices are assigned
the current weight and removed from the graph, and the degrees of all neighbors of
the deleted vertices are decreased. These steps are repeated until every vertex
receives its weight.
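The weighting phase can be sketched as follows (our own illustration; the coloring phase would then color vertices in decreasing weight order with the greedy rule):

```python
def smallest_last_weights(adj):
    """Weighting phase of Smallest-Last: repeatedly assign the current weight
    to every vertex of minimum remaining degree, delete those vertices, and
    decrease the degrees of their surviving neighbors."""
    degree = {v: len(adj[v]) for v in adj}
    remaining = set(adj)
    weight, w = {}, 0
    while remaining:
        d = min(degree[v] for v in remaining)
        batch = [v for v in remaining if degree[v] == d]
        for v in batch:
            weight[v] = w
            remaining.discard(v)
            for u in adj[v]:
                if u in remaining:
                    degree[u] -= 1
        w += 1
    return weight

adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}   # star: leaves get weight 0, center weight 1
print(smallest_last_weights(adj))
```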
5. Non-Parallel Implementations - These algorithms include:
• Greedy From Parallel – non-parallel copy of Jones and Plassmann algorithm;
• Greedy V2 From Parallel – non-parallel copy of Jones and Plassmann V2
algorithm;
• Largest-First From Parallel – non-parallel copy of Parallel Largest-First
algorithm;
• Smallest-Last From Parallel – non-parallel copy of Parallel Smallest-Last.
In this part we conduct tests to determine the most suitable coloring algorithms,
which could later be used within maximum clique algorithms. In this study we
focus on the number of colors used as a function of the density of the graph. The
random graphs are generated as described by Kumlander in 2005 [2].
[Fig. 1. Randomly generated graphs tests' results compared in used colors. Sequential algorithms, density 10%. (Plot of number of used colors vs. number of vertices; series: Greedy, LargestFirst, LargestFirstV2, LargestFirstV3, DSatur, DSaturV2.)]
An Experimental Comparison of Heuristic Coloring Algorithms 371
[Fig. 2. Randomly generated graphs tests' results compared in used colors. Sequential algorithms, density 50%. (Plot of number of used colors vs. number of vertices, 350–450; series: Greedy, LargestFirst, LargestFirstV2, LargestFirstV3, DSatur, DSaturV2.)]
[Fig. 3. Randomly generated graphs tests' results compared in used colors. Sequential algorithms, density 90%. (Plot of number of used colors vs. number of vertices, 110–150; series: Greedy, LargestFirst, LargestFirstV2, LargestFirstV3, DSatur, DSaturV2.)]
The best results in terms of used colors among all the sequential algorithms were
produced by DSatur, DSatur V2 and Largest-First V3. Their performance is much better
than that of the Greedy algorithm; however, it comes at the cost of taking more time
to complete (Figs. 2 and 3).
[Fig. 4. Randomly generated graphs tests' results compared in used colors. Combined algorithms, density 10%. (Plot of number of used colors vs. number of vertices, 2000–3500; series: Greedy, IdoLdo, IdoLdoRandom, LdoIdo, DSaturLdo, DSaturIdoLdo.)]
[Fig. 5. Randomly generated graphs tests' results compared in used colors. Combined algorithms, density 50%. (Plot of number of used colors vs. number of vertices, 350–450; series: Greedy, IdoLdo, IdoLdoRandom, LdoIdo, DSaturLdo, DSaturIdoLdo.)]
When it comes to consumed time, LDO-IDO clearly wins among these three,
although its running time is still larger than that of Greedy.
Parallel Jones and Plassmann and its second version
perform very similarly to the Greedy algorithm, using slightly fewer or more colors
than Greedy (Figs. 7, 8 and 9).
[Fig. 6. Randomly generated graphs tests' results compared in used colors. Combined algorithms, density 90%. (Plot of number of used colors vs. number of vertices, 110–150; series: Greedy, IdoLdo, IdoLdoRandom, LdoIdo, DSaturLdo, DSaturIdoLdo.)]
[Fig. 7. Randomly generated graphs tests' results compared in used colors. Parallel algorithms, density 10%. (Plot of number of used colors vs. number of vertices, 2000–3500; series: Greedy, ParallelJp, ParallelJpV2, ParallelLargestFirst, ParallelSmallestLast, GreedyFromParallel, GreedyV2FromParallel.)]
[Fig. 8. Randomly generated graphs tests' results compared in used colors. Parallel algorithms, density 50%. (Plot of number of used colors vs. number of vertices, 350–450; series: Greedy, ParallelJp, ParallelJpV2, ParallelLargestFirst, ParallelSmallestLast, GreedyFromParallel, GreedyV2FromParallel.)]
[Fig. 9. Randomly generated graphs tests' results compared in used colors. Parallel algorithms, density 90%. (Plot of number of used colors vs. number of vertices, 110–150; series: Greedy, ParallelJp, ParallelJpV2, ParallelLargestFirst, ParallelSmallestLast, GreedyFromParallel, GreedyV2FromParallel.)]
In terms of time used to complete the task, Parallel Smallest-Last demonstrates the
worst results, and the performance of Parallel Largest-First is not far from that of
Parallel Smallest-Last. The only thing that should be noted is that at 30%,
50% and 80% density the Parallel Largest-First algorithm's execution time is very
similar to that of Parallel JP, despite the fact that it uses largest-first ordering.
4 Conclusion
The algorithms that showed the best results within their group in terms of the
number of used colors are:
• Among sequential: DSatur, DSatur V2 and Largest-First V3;
• Among combined: DSatur-LDO, DSatur-IDO-LDO and LDO-IDO;
• Among parallel: Parallel Largest-First and Parallel Smallest-Last.
References
1. Kubale, M.: Graph Colorings. American Mathematical Society, US (2004)
2. Kumlander, D.: Some practical algorithms to solve the maximum clique problem. Tallinn
University of Technology, Tallinn (2005)
3. Welsh, D.J.A., Powell, M.B.: An upper bound for the chromatic number of a graph and its
application to timetabling problems. Comput. J. 10(1), 85–86 (1967)
4. Hasenplaugh, W., Kaler, T., Schardl, T.B., Leiserson, C.E.: Ordering heuristics for parallel
graph coloring. In: Proceedings of the 26th ACM Symposium on Parallelism in Algorithms
and Architectures–SPAA’14, pp. 166–177 (2014)
5. Brelaz, D.: New methods to color the vertices of a graph. Commun. ACM 22(4), 251–256
(1979)
6. Andrews, P.S., Timmis, J., Owens, N.D.L., Aickelin, U., Hart, E., Hone, A., Tyrrell, A.M.:
Artificial Immune Systems. York, UK (2009)
7. Coleman, T.F., More, J.J.: Estimation of sparse Jacobian matrices and graph coloring
problems. SIAM J. Numer. Anal. 20, 187–209 (1983)
8. Almarabeh, H., Suleiman, A.: Heuristic algorithm for graph coloring based on maximum
independent set. J. Appl. Comput. Sci. Math. 6(13), 9–18 (2012)
9. Al-Omari, H., Sabri, K.E.: New graph coloring algorithms. J. Math. Stat. 2(4), 439–441
(2006)
10. Saha, S., Baboo, G., Kumar, R.: An efficient EA with multipoint guided crossover for
bi-objective graph coloring problem. In: Contemporary Computing: 4th International
Conference - IC3 2011, pp. 135–145 (2011)
11. Jones, M.T., Plassmann, P.E.: A parallel graph coloring heuristic. SIAM J. Sci. Comput. 14
(3), 654–669 (1993)
12. Luby, M.: A simple parallel algorithm for the maximal independent set problem.
SIAM J. Comput. 15(4), 1036–1053 (1986)
13. Matula, D.W., Marble, G., Isaacson, J.D.: Graph coloring algorithms. Academic Press, New
York (1972)
14. Allwright, J.R., Bordawekar, R., Coddington, P.D., Dincer, K., Martin, C.L.: A comparison
of parallel graph coloring algorithms. Technical Report SCCS-666 (1995)
Cliques for Multi-Term Linearization of
0–1 Multilinear Program for Boolean
Logical Pattern Generation
For i ∈ S, we let
Ji := {j ∈ N | Aij = 0}.
Since the dataset is duplicate- and contradiction-free, all Ji's are unique and
|Ji| = n, ∀i ∈ S.
In [15], we showed that the 0–1 MP below provides a unifying theory for LAD
pattern generation:

(PG): max{ ϕ⁺(x) + l(x) : ϕ⁻(x) = 0, x ∈ {0,1}^{2n} }
The minimal cover inequalities provide a poor linear programming (LP) relax-
ation bound, however. For 0–1 linearly overestimating ϕ⁺, McCormick con-
cave envelopes for a 0–1 monomial can serve the purpose (e.g., [2,10,12,14]).
This 'standard' method achieves the goal by means of introducing m⁺ (where
m⁺ = |S⁺|) variables

yi = ∏_{j ∈ Ji} (1 − xj),  i ∈ S⁺   (1)

and n × m⁺ inequalities

yi ≤ 1 − xj,  j ∈ Ji, i ∈ S⁺   (2)
378 K. Yan and H. S. Ryoo
or aggregate them with respect to i via standard probing techniques and logical
implications in integer programming (e.g., [5–7]) to

∑_{i ∈ Ij⁺} yi ≤ |Ij⁺| (1 − xj),  j ∈ N.

The results in this note subsume our earlier results; thus, they yield a tighter polyhedral
relaxation of (PG) in terms of a smaller number of 0–1 linear inequalities.
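As a small worked illustration (our own example, not from the note): for an observation with Ji = {1, 2}, substitution (1) and its overestimating inequalities (2) read:

```latex
% Substitution (1) for J_i = \{1,2\}:
y_i = (1 - x_1)(1 - x_2)
% Overestimating (concave-envelope) inequalities (2):
y_i \le 1 - x_1, \qquad y_i \le 1 - x_2
% These are tight over \{0,1\}^2: e.g., (x_1,x_2) = (0,1) forces
% y_i \le 0 = (1-0)(1-1).
```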
As for organization, this note consists of three parts. Following this section
of introduction and background, we present the main results in Sect. 2 and fol-
low them with a preliminary numerical study. The numerical study compares our new
relaxation method against the McCormick relaxation method in pattern gen-
eration experiments with six machine learning benchmark datasets; recall that
McCormick provides the strongest lower bound when compared to two alterna-
tives in [5–7,13]. In short, the performance of the new results is far superior and
demonstrates well their practical utility in data mining applications.
2 Main Results
Definition 2. A vertex clique cover is a set of cliques whose union covers all
vertices of a graph. A vertex maximal clique cover is a vertex clique cover in which
each clique is maximal.
Finally, let

IPG := { (x, y) ∈ {0,1}^{2n} × [0,1]^{m⁺} : (1), ϕ⁻(x) = 0 }.
For variables x regarding a pair of original and negated Boolean attributes, the
requirement for a logical statement gives the following complementary relation

xj + xjc ≤ 1,   (3)

which is valid for IPG and is of great importance in deriving stronger valid
inequalities for IPG.
yi + yk ≤ 1 − xj   (5)

is valid for IPG.

yi + yk = 0 ≤ 1 − xj.
yi + yk = 0 < 1 − xj.
Assume xι = 0, ∀ι ∈ J \ {j}; to satisfy (6), there must exist ι ∈ N \ J such that
xι = 1. Note that

N = J ∪ Jc ∪ (Ji \ J) ∪ (Jk \ J),

where Jc := {jc | j ∈ J}. One can see that ι ∉ Jc, for otherwise ιc ∈ J, which
contradicts the choice of ι. This indicates that either ι ∈ Ji \ J or ι ∈ Jk \ J. Without
loss of generality, assume ι ∈ Ji \ J; then yi ≤ 1 − xι = 0, and yk ≤ 1 − xιc = 1
via (3). Thus

yi + yk ≤ 1 = 1 − xj.

This completes the proof.
To fully extend the result above, we represent the data under analysis in a
graph, as done in [16,17]. The difference in this graph representation is that,
while each observation in Ij⁺ maps to a unique node in the graph, we now introduce
an edge between a pair of vertices if the pair satisfies the condition set forth in
Lemma 1. The resulting undirected graph is denoted by Gj⁺. Now, we have the
following result for each clique of Gj⁺.
Cliques for Multi-term Linearization of 0–1 Multilinear Program 381
Theorem 2. For a clique with vertex set Ω in Theorem 1, (7) defines a facet
of conv(Ξ).
Proof. First note that (7) is valid via the proof of Theorem 1. Suppose that (7)
is not facet-defining. Then, there exists a facet-defining inequality of conv(Ξ) of
the form

∑_{j ∈ N} αj xj + ∑_{i ∈ Ω ∪ Ijc⁺} βi yi ≤ γ,   (8)

where (α, β) ≠ (0, 0), such that (7) defines a face of the facet of conv(Ξ) defined
by (8). That is:

F := { (x, y) ∈ Ξ : xj + ∑_{i ∈ Ω} yi = 1 }
  ⊆ F′ := { (x, y) ∈ Ξ : ∑_{j ∈ N} αj xj + ∑_{i ∈ Ω ∪ Ijc⁺} βi yi = γ }
382 K. Yan and H. S. Ryoo
βi = 0, ∀i ∈ Ijc⁺.
Case 2. (xj = 0) By the same argument that a pattern exists for a contradiction-
free dataset, we have βi = γ, i ∈ Ω, and αjc + βi = γ, i ∈ Ω, for solutions with
xjc = 0 and xjc = 1, respectively. These yield

αjc = 0 and βi = γ, ∀i ∈ Ω.
Summarizing, the two cases above show

αι = 0, ∀ι ∈ N \ {j},  βi = 0, ∀i ∈ Ijc⁺,  and  αj = βi = γ > 0, ∀i ∈ Ω,

where γ > 0 follows from our supposition that (7) is dominated by (8). This shows
that (8) is a positive multiple of (7) and completes the proof.
We close this section with two remarks.
Remark 1. The last statement of Theorem 1 indicates that only the maximal
cliques of Gj⁺ need to be considered. As finding all maximal cliques is time-
consuming (e.g., [11]), we recommend instead using a vertex maximal clique
cover of Gj⁺.
Remark 2. We wish to add that the two main results – namely, Theorems 1 and 3 –
in [17] are subsumed by Theorem 1 above. Specifically, via Theorem 1, we obtain
a set of inequalities that in large part dominate those given by the aforementioned
results from [17], along with a small number of common ones. (We have theorems
and proofs for this but omit them here for reasons of space, given the 10-page
limit for the conference proceedings.) This helps greatly improve the overall
efficiency of LAD pattern generation via a more effective multi-term relaxation
of (PG) and its solution, as demonstrated in the following section.
3 A Preliminary Experiment
In this preliminary study, we compare the new results of the previous section
against the standard McCormick relaxation method. We used six machine
learning benchmark datasets from [9] for this experiment.
All MILP instances generated were solved by CPLEX [1] with the addition of any
cuts by the solver disallowed. As metrics for comparison, we adopted the CPU
time for solution and the root relaxation gap, defined as the difference between the
root node relaxation value and the optimum. We remind the reader that the latter
is a fair metric in that this value is least affected by the arsenal of solution options
and heuristics featured in a powerful MILP solver such as CPLEX.
First, Table 1 provides the root node relaxation gap values in the format 'average
± 1 standard deviation' of 30 results for each dataset, followed by the minimum
and maximum values in parentheses. The numbers in the last column
measure the improvement in the root relaxation gap made by cliques in com-
parison to mccormick.
Table 2 provides the CPU seconds for solving the instances of (PG)•mccormick
and (PG)•cliques in the format 'average ± 1 standard deviation' of 30 results. We
note that the time for (PG)•cliques includes the time spent finding the graph rep-
resentation of the data and the vertex maximal clique covers. Again, the last column of
References
1. IBM Corp.: IBM ILOG CPLEX Optimization Studio CPLEX User’s Manual
Version 12 Release 8 (2017). https://www.ibm.com/support/knowledgecenter/
SSSA5P 12.8.0/ilog.odms.studio.help/pdf/usrcplex.pdf. Accessed 12 Dec 2018
2. Crama, Y.: Concave extensions for nonlinear 0–1 maximization problems. Math.
Program. 61, 53–60 (1993)
3. Del Pia, A., Khajavirad, A.: A polyhedral study of binary polynomial programs.
Math. Oper. Res. 42(2), 389–410 (2017)
4. Del Pia, A., Khajavirad, A.: The multilinear polytope for acyclic hypergraphs.
SIAM J. Optim. 28(2), 1049–1076 (2018)
5. Fortet, R.: L'algèbre de Boole et ses applications en recherche opérationnelle.
Cahiers du Centre d'Études de Recherche Opérationnelle 1(4), 5–36 (1959)
6. Fortet, R.: Applications de l'algèbre de Boole en recherche opérationnelle. Revue
Française d'Informatique et de Recherche Opérationnelle 4(14), 17–25 (1960)
7. Glover, F., Woolsey, E.: Converting the 0–1 polynomial programming problem to
a 0–1 linear program. Oper. Res. 12(1), 180–182 (1974)
8. Granot, F., Hammer, P.: On the use of boolean functions in 0–1 programming.
Methods Oper. Res. 12, 154–184 (1971)
9. Lichman, M.: UCI Machine Learning Repository (2013). http://archive.ics.uci.
edu/ml. Accessed 12 Dec 2018
10. McCormick, G.: Computability of global solutions to factorable nonconvex
programs: part I - convex underestimating problems. Math. Program. 10, 147–175
(1976)
11. Moon, J.W., Moser, L.: On cliques in graphs. Isr. J. Math. 3(1), 23–28 (1965)
12. Rikun, A.: A convex envelope formula for multilinear functions. J. Glob. Optim.
10, 425–437 (1997)
13. Ryoo, H.S., Jang, I.Y.: MILP approach to pattern generation in logical analysis of
data. Discret. Appl. Math. 157, 749–761 (2009)
14. Ryoo, H.S., Sahinidis, N.: Analysis of bounds for multilinear functions. J. Glob.
Optim. 19(4), 403–424 (2001)
15. Yan, K., Ryoo, H.S.: 0–1 multilinear programming as a unifying theory for LAD
pattern generation. Discret. Appl. Math. 218, 21–39 (2017)
16. Yan, K., Ryoo, H.S.: Strong valid inequalities for Boolean logical pattern generation.
J. Glob. Optim. 69(1), 183–230 (2017)
17. Yan, K., Ryoo, H.S.: A multi-term, polyhedral relaxation of a 0–1 multilinear function
for Boolean logical pattern generation. J. Glob. Optim. (in press). https://doi.org/10.
1007/s10898-018-0680-8
Gaining or Losing Perspective
Introduction
J. Lee was supported in part by ONR grant N00014-17-1-2296 and LIX, l'École
Polytechnique. D. Skipper was supported in part by ONR grant N00014-18-W-X00709.
E. Speakman was supported by the Deutsche Forschungsgemeinschaft (DFG, German
Research Foundation) - 314838170, GRK 2297 MathCoRe.
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 387–397, 2020.
https://doi.org/10.1007/978-3-030-21803-4_39
388 J. Lee et al.
which case p − 1 = 1 and the naïve perspective relaxation is the true perspective
relaxation. So some might think, even for p > 2, that q = 1 would give the
convex hull, but this naïve perspective relaxation is not the strongest; we need
to use q = p − 1 to get the convex hull.
In Sect. 2, we present a formula for the volumes of all of these relaxations as
a means of comparing them. In doing so, we quantify, in terms of l, u, p, and q,
how much stronger the convex hull is compared to the weaker relaxations, and
when, in terms of l and u, there is much to be gained at all by considering more
than the weakest relaxation. Using our formula, and thinking of the baseline
of q = 1, namely the naı̈ve perspective relaxation, we quantify the impact of
“losing perspective” (e.g., going to q = 0, namely the most naı̈ve relaxation) and
of “gaining perspective” (e.g., going to q = p − 1, namely the true perspective
relaxation). Depending on l and u for a particular x (of which there may be a
great many in a real model), we may adopt different relaxations based on the
differences of the volumes of the various relaxation choices and on the solver
environment. For p = 2, we obtain further results on asymptotic behavior and
on optimal branching-point selection.
Compared to earlier work on volume formulae and related branching-point
selection relevant to comparing convex relaxations, our present results are the
first involving convex sets that are not polytopes. Thus we demonstrate that we
can get meaningful results that do not rely on triangulation of polytopes.
In Sect. 3 we present some computational experiments (for p = 2) which
bear out our theory, as we verify that volume can be used to determine which
variables are more important to handle by perspective relaxation.
1 Definitions
For real scalars u > l > 0 and p > 1, we define

Sp := { (x, y, z) ∈ R² × {0,1} : y ≥ x^p, uz ≥ x ≥ lz },
Note that even though x^p − y z^q is not a convex function for q > 0 (even for
p = 2, q = 1), the set Sp^q is convex. In fact, the set Sp^q is higher-dimensional-
power-cone representable, which makes working with it appealing. Still, com-
putationally handling higher-dimensional power cones efficiently is not a trivial
matter, and we should not take it on without considering alternatives.
These sets are unbounded in the increasing-y direction. This is rather
inconvenient because we want to assess relaxations by computing their volumes. But
Proposition 1. For p > 1 and q ∈ [0, p − 1]: (i) S̄p ⊆ S̄p^q; (ii) S̄p^q is a convex
set; (iii) S̄p^{q′} ⊆ S̄p^q for 0 ≤ q ≤ q′ ≤ p − 1; and (iv) conv(S̄p) = S̄p^{p−1}.
2 Our Results
2.1 Volumes
Inequality (6) is implied by (3) because y ≥ 0. Therefore, for the various choices
of u, l, and y, the tight inequalities for Ry are among (1), (2), (3), (4), and (5).
In fact, the region is always described either by the entire set of inequalities
(if y > l^p) or by (1), (2), (3), and (4) (if y ≤ l^p). For an illustration of these two
cases with p = 5 and q = 3, see Figs. 1 and 2.
To understand why these two cases suffice, observe that together (2) and (4)
create a 'wedge' in the positive orthant. Ry is composed of this wedge intersected
with { (x, z) ∈ R² : z ≥ x^{p/q} y^{−1/q} }, for y/u^p ≤ z ≤ 1. With a slight abuse of
notation, based on context we use (k), for k = 1, 2, ..., 5, to refer both to the
inequality defined above and to the 1-d boundary of the region it describes.
Now consider the set of points formed by the wedge and the inequality
z ≥ x^{p/q} y^{−1/q}. Curves (1) and (4) intersect at (0, 0) and at

a = (x_a, z_a) := ( (y/l^q)^{1/(p−q)}, (y/l^p)^{1/(p−q)} ).

Curves (1) and (2) intersect at (0, 0) and at

b = (x_b, z_b) := ( (y/u^q)^{1/(p−q)}, (y/u^p)^{1/(p−q)} ).

Gaining or Losing Perspective 391

To understand the area that we are seeking to compute, we need to ascertain
where (0, 0), a, and b fall relative to (3) and (5), which bound the region
y/u^p ≤ z ≤ 1. Note that the origin falls on or below (3), and because u > l, a is
always above b (in the sense of a higher value of z).
We show that b must fall between lines (3) and (5). This is equivalent to

y/u^p ≤ (y/u^p)^{1/(p−q)} = z_b ≤ 1.

Now, we know y ≤ u^p, which implies y/u^p ≤ 1. From our assumptions on p and q,
we also have 0 < 1/(p − q) ≤ 1. From this we can immediately conclude
y/u^p ≤ (y/u^p)^{1/(p−q)} = z_b ≤ 1.
Furthermore, given that a must be above b, we now have our two cases: a is
either above (5) (if y > l^p), or on or below (5) (if y ≤ l^p). Using the observations
made above, we can now calculate the area of Ry via integration. We integrate
over z, and the limits of integration depend on the value of y. If y ≤ l^p, then the
area is given by the expression:

∫_{y/u^p}^{z_b} (uz − lz) dz + ∫_{z_b}^{z_a} ( (y z^q)^{1/p} − lz ) dz.
x ≤ y^{1/p}.   (1′)

Similarly to before, we can use this information to compute the volume of S̄p^0:
vol(S̄p^0) = ∫_0^{l^p} [ ∫_{y/u^p}^{y^{1/p}/u} (uz − lz) dz + ∫_{y^{1/p}/u}^{y^{1/p}/l} (y^{1/p} − lz) dz ] dy
         + ∫_{l^p}^{u^p} [ ∫_{y/u^p}^{y^{1/p}/u} (uz − lz) dz + ∫_{y^{1/p}/u}^{1} (y^{1/p} − lz) dz ] dy
We can now precisely quantify how much better the convex-hull perspective
relaxation (q = p − 1) is compared to the most naı̈ve relaxation (q = 0):
Corollary 3. For p > 1,

vol(S̄p^0) − vol(S̄p^{p−1}) = (p − 1)(u^{p+1} − l^{p+1}) / (3(p + 1)(p + 2)),

which equals (u³ − l³)/36 for p = 2.
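Corollary 3 can be sanity-checked numerically. The sketch below is our own illustration and assumes the bounded sets cap y by u²z (consistent with the integration limits above), i.e., membership means z ∈ [0, 1], lz ≤ x ≤ uz, y ≤ u²z, and x² ≤ y z^q. Since q = 1 already gives the convex hull for p = 2 (Corollary 4), the Monte Carlo estimate of the gap should approach (u³ − l³)/36:

```python
import random

def member(x, y, z, l, u, q):
    """(x, y, z) lies in the (assumed) bounded relaxation for p = 2; the cap
    y <= u^2 z is inferred from the integration limits in the text."""
    if not (0.0 <= z <= 1.0 and l * z <= x <= u * z and y <= u * u * z):
        return False
    return x * x <= y * (z ** q)   # q = 0: y >= x^2 ; q = 1: y z >= x^2

def mc_volumes(l, u, n=200_000, seed=1):
    """Estimate vol for q = 0 and q = 1 by sampling the box [0,u] x [0,u^2] x [0,1]."""
    rng = random.Random(seed)
    box = u * (u * u) * 1.0
    hits = [0, 0]
    for _ in range(n):
        x, y, z = rng.uniform(0, u), rng.uniform(0, u * u), rng.uniform(0, 1)
        for q in (0, 1):
            hits[q] += member(x, y, z, l, u, q)
    return [h / n * box for h in hits]

v0, v1 = mc_volumes(l=1.0, u=2.0)
print(v0 - v1, (2 ** 3 - 1 ** 3) / 36)   # both approximately 0.194
```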
We can also precisely quantify how much better the convex-hull perspective
relaxation (q = p − 1) is compared to the naı̈ve perspective relaxation (q = 1):
Corollary 4. For p ≥ 2,

vol(S̄p^1) − vol(S̄p^{p−1}) = (p − 2)(u^{p+1} − l^{p+1}) / (3(p + 1)²).
lim_{u→∞} [ vol(S̄2^0) − vol(S̄2^1) ] / vol(S̄2^0) = (1 + k + k²) / (3(3 − k − k²)) ≥ 1/9,

where k := l/u.
the domain of a variable, so that re-convexifying the two child relaxations yields
the least volume.
For S ∈ {S̄2^1, S̄2^0}, let vS(x̂) be the sum of the volumes of the two pieces
of S created by branching on x at x̂ ∈ [l, u]. Interestingly, the branching-point
behaviors of S̄2^1 and S̄2^0 are identical.
Theorem 6. For S ∈ {S̄2^1, S̄2^0}, vS is strictly convex on [l, u], and its minimum
is at x̂ = (l + √(l² + 3u²))/3.
min cᵀy + fᵀz
subject to:
aᵀx ≥ b;
ui zi ≥ xi ≥ li zi, i = 1, . . . , n;
ui² zi ≥ yi ≥ xi², i = 1, . . . , n;
1 ≥ zi ≥ 0, i = 1, . . . , n.
If we only want to apply the perspective relaxation for some pre-specified fraction of the i's, we get the best improvement in the objective value (thinking of it as a lower bound for the true problem with the constraints z_i ∈ {0, 1}) by preferring i with the largest value of u_i³ − l_i³. Moreover, most of the benefit is already achieved at much lower values of k than for the other rankings.
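The ranking rule can be sketched directly (the list of bounds below is purely illustrative):

```python
def perspective_ranking(bounds):
    """Rank variable indices by u_i^3 - l_i^3, largest first, as suggested
    for choosing which i receive the perspective relaxation."""
    return sorted(range(len(bounds)),
                  key=lambda i: bounds[i][1]**3 - bounds[i][0]**3,
                  reverse=True)

bounds = [(0.0, 1.0), (0.5, 3.0), (1.0, 1.5)]   # (l_i, u_i) pairs, sample data
# u^3 - l^3 values: 1.0, 26.875, 2.375 -> index 1 ranks first, then 2, then 0
assert perspective_ranking(bounds) == [1, 2, 0]
```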
[8,12] suggested that for a pair of relaxations P, Q ⊂ ℝᵈ, a good measure for evaluating Q relative to P might be $\sqrt[d]{\mathrm{vol}(Q)} - \sqrt[d]{\mathrm{vol}(P)}$ (in our present setting, we have d = 3). We did experiments ranking by this, rather than the simpler vol(Q) − vol(P), and we found no significant difference in our results. This can be explained by the fact that ranking by either of these choices is very similar for our test set.
min v + c⊤y
subject to:
a⊤x ≥ b ;
e⊤z ≤ κ ;
w − Mx = 0 ;
v ≥ w² ;
u_i z_i ≥ x_i ≥ l_i z_i , i = 1, . . . , n ;
u_i² z_i ≥ y_i ≥ x_i² , i = 1, . . . , n ;
1 ≥ z_i ≥ 0, i = 1, . . . , n ;
w_i unrestricted, i = 1, . . . , n .
Note: The inequality v ≥ w² is correct; there is a typo in [6], where it is written as v ≥ w. The inequality v ≥ w², while not formulating a Lorentz (second-order) cone, may be re-formulated as an affine slice of a rotated Lorentz cone, or not, depending on the solver employed.
Our results followed the same general trend seen in Fig. 3.
References
1. Aktürk, M.S., Atamtürk, A., Gürel, S.: A strong conic quadratic reformulation for
machine-job assignment with controllable processing times. Oper. Res. Lett. 37(3),
187–191 (2009)
2. Basu, A., Conforti, M., Di Summa, M., Zambelli, G.: Optimal cutting planes from
the group relaxations. arXiv:abs/1710.07672 (2018)
3. Dey, S., Molinaro, M.: Theoretical challenges towards cutting-plane selection.
arXiv:abs/1805.02782 (2018)
4. Frangioni, A., Gentile, C.: Perspective cuts for a class of convex 0–1 mixed integer
programs. Math. Program. 106(2), 225–236 (2006)
5. Grant, M., Boyd, S.: CVX: Matlab software for disciplined convex programming,
version 2.1, build 1123. http://cvxr.com/cvx (2017)
6. Günlük, O., Linderoth, J.: Perspective reformulations of mixed integer nonlinear
programs with indicator variables. Math. Program. Ser. B 124, 183–205 (2010)
7. Ko, C.W., Lee, J., Steingrı́msson, E.: The volume of relaxed Boolean-quadric and
cut polytopes. Discret. Math. 163(1–3), 293–298 (1997)
8. Lee, J., Morris Jr., W.D.: Geometric comparison of combinatorial polytopes. Dis-
cret. Appl. Math. 55(2), 163–182 (1994)
9. Lee, J., Skipper, D.: Volume computation for sparse boolean quadric relaxations.
Discret. Appl. Math. (2017). https://doi.org/10.1016/j.dam.2018.10.038.
10. Speakman, E., Lee, J.: Quantifying double McCormick. Math. Oper. Res. 42(4),
1230–1253 (2017)
11. Speakman, E., Lee, J.: On branching-point selection for trilinear monomials in spatial branch-and-bound: the hull relaxation. J. Glob. Optim. (2018)
12. Speakman, E., Yu, H., Lee, J.: Experimental validation of volume-based compar-
ison for double-McCormick relaxations. In: Salvagnin, D., Lombardi, M. (eds.)
CPAIOR 2017, pp. 229–243. Springer (2017)
Gaining or Losing Perspective 397
13. Speakman, E.E.: Volumetric guidance for handling triple products in spatial
branch-and-bound. Ph.D., University of Michigan (2017)
14. Steingrı́msson, E.: A decomposition of 2-weak vertex-packing polytopes. Discret.
Comput. Geom. 12(4), 465–479 (1994)
15. Toh, K.C., Todd, M.J., Tütüncü, R.H.: SDPT3-a MATLAB software package for
semidefinite programming. Optim. Methods Softw. 11, 545–581 (1998)
Game Equilibria and Transition Dynamics
with Networks Unification
Abstract. In this paper, we consider the following problem: what affects the Nash-equilibrium amount of investment in knowledge when one complete graph joins another complete one? Solving this problem will allow us to understand exactly how game agents behave when deciding whether to enter the other network, what conditions and externalities affect this decision, and how the level of the future equilibrium amount of investment in knowledge can be predicted.
1 Introduction
This article continues the study of Nash equilibria and their changes in the process of unification of complete graphs. However, this paper contains a number of new elements in comparison with previous studies.
To begin with, we study the dynamic behavior of agents, not only by generalizing the simple two-period model of endogenous growth of Romer with production and externalities of knowledge (as in paper [7]), but also by using difference equations. Moreover, we assume that our agents are innovative companies that are interested in knowledge investments.
The main content of the article is focused on the analysis of changes in the Nash-equilibrium investment values, as well as the description of corner solutions. We study necessary and sufficient conditions, and possible limitations, for the appearance of new different equilibria.
The main problem of this research is to study the differences in agents' behavior during the unification of networks of different sizes, and the relations between the amount of the actors' knowledge investments and their productivity.
To achieve this goal, the following objectives should be fulfilled:
(1) to create a model which can describe agents' decisions on the amount of knowledge investments;
(2) to find the equilibrium condition of this model that shows the optimal choices of each network agent;
(3) to outline the relations between an agent's productivity, the network size and the value of knowledge investments.
2 Model Description
There is a network (undirected graph) with n nodes, i ¼ 1; 2; ::; n; each node represents
an agent. In period 1 each agent i possesses initial endowment of good, e, and uses it
partially for consumption in first period of life, ci1 , and partially for investment into
knowledge, ki :
ci1 þ ki ¼ e; i ¼ 1; 2; . . .; n: ð1Þ
$$F(k_i, K_i) = B_i k_i K_i, \quad B_i > 0, \quad (3)$$
The first two constraints of problem P(K_i) at the optimum point are evidently satisfied as equalities. Substituting into the objective function, we obtain a new function (payoff function):

$$V_i(k_i, K_i) = e^2(1 - a) - e(1 - 2a)k_i - a k_i^2 + A_i k_i K_i \quad (7)$$
If all players' solutions are internal (0 < k_i < e, i = 1, 2, . . . , n), i.e. all players are active, the equilibrium will be referred to as an inner equilibrium. Clearly, the inner equilibrium (if it exists for the given values of the parameters) is defined by the system

$$D_1 V_i(k_i, K_i) = 0, \quad i = 1, 2, \ldots, n \quad (8)$$
or

$$D_1 V_i(k_i, K_i) = (A_i - 2a)k_i + A_i \tilde K_i - e(1 - 2a) = 0, \quad (10)$$

$$\tilde k_i^s = \frac{e(2a - 1) + A_i \tilde K_i}{2a - A_i}, \quad (11)$$

where K̃_i is the pure externality of agent i. It is obvious that if agent i is active, then his investments in equilibrium will be equal to k̃_i^s. To analyze equilibria we need the following statement.
Proposition 1 ([5], Lemma 2.1 and Corollary 2.1). A set of agents' investment values (k₁, k₂, . . . , k_n) can be an equilibrium only if for each i = 1, 2, . . . , n it is true that

1. if k_i = 0, then K̃_i ≤ e(1 − 2a)/A_i;
2. if 0 < k_i < e, then k_i = k̃_i^s;
3. if k_i = e, then K̃_i ≥ e(1 − A_i)/A_i.
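The inner-equilibrium formula (11) can be checked numerically against the first-order condition (10). A minimal sketch (all parameter values here are illustrative, not taken from the paper):

```python
def k_tilde(e, a, A, K_ext):
    """Inner-equilibrium investment of an active agent, Eq. (11)."""
    return (e * (2 * a - 1) + A * K_ext) / (2 * a - A)

# Sample parameters: e = 1, a = 0.6 (so 2a > 1), A = 0.3, externality 0.5.
k = k_tilde(1.0, 0.6, 0.3, 0.5)
# The value solves the first-order condition (10): (A - 2a)k + A*K - e(1-2a) = 0.
assert abs((0.3 - 1.2) * k + 0.3 * 0.5 - 1.0 * (1 - 1.2)) < 1e-12
```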
Let us consider the following situation. There are two cliques with n₁ and n₂ nodes, respectively, with the same agents' productivity A.
There are three types of agents in the united network. The actors of the first type are all the agents of the first network, except the one agent of the first network which is connected to the agents of the second network. Actors of the second type are all agents of the second network. The third type consists of only one agent: the agent of the first network that is connected to all actors of the second network. Since all agents of the same type have the same environment, they will behave in the same way, not only in equilibrium but also in dynamics. Therefore, the investment of each agent of type i will be denoted k_i, and the environment of each agent of type i will be denoted K_i.
Both cliques are initially in inner equilibrium. It follows immediately from (9) that the initial investments of the agents are the following:

$$\begin{cases} \left((n_1 - 1)A - 2a\right)k_1 + Ak_3 = e(1 - 2a), \\ \left(n_2 A - 2a\right)k_2 + Ak_3 = e(1 - 2a), \\ (n_1 - 1)Ak_1 + n_2 Ak_2 + (A - 2a)k_3 = e(1 - 2a). \end{cases} \quad (13)$$
Solving this system by Cramer's rule, we obtain the new inner-equilibrium investment amounts.
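As a numerical sketch, system (13) can be solved by Cramer's rule in a few lines of pure Python (the parameter values n₁, n₂, A, a, e below are our own illustrative choices):

```python
def det3(M):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def system13(n1, n2, A, a, e):
    """Coefficient matrix and right-hand side of system (13)."""
    M = [[(n1 - 1) * A - 2 * a, 0.0,            A        ],
         [0.0,                  n2 * A - 2 * a, A        ],
         [(n1 - 1) * A,         n2 * A,         A - 2 * a]]
    return M, [e * (1 - 2 * a)] * 3

def cramer3(M, b):
    """Solve a 3x3 linear system by Cramer's rule."""
    d = det3(M)
    sol = []
    for j in range(3):
        Mj = [row[:] for row in M]
        for i in range(3):
            Mj[i][j] = b[i]
        sol.append(det3(Mj) / d)
    return sol

# Illustrative parameters (not from the paper): n1 = 4, n2 = 3, A = 0.1, a = 0.4, e = 1.
M, b = system13(n1=4, n2=3, A=0.1, a=0.4, e=1.0)
k = cramer3(M, b)
# The solution satisfies all three equations of (13).
assert all(abs(sum(M[i][j] * k[j] for j in range(3)) - b[i]) < 1e-9 for i in range(3))
```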
Definition 1 implies that the dynamics in the model under consideration is described by the system of difference equations

$$\begin{cases} k_1^{t+1} = \dfrac{(n_1 - 1)A}{2a}\, k_1^t + \dfrac{A}{2a}\, k_3^t + \dfrac{e(2a - 1)}{2a}, \\[4pt] k_2^{t+1} = \dfrac{n_2 A}{2a}\, k_2^t + \dfrac{A}{2a}\, k_3^t + \dfrac{e(2a - 1)}{2a}, \\[4pt] k_3^{t+1} = \dfrac{(n_1 - 1)A}{2a}\, k_1^t + \dfrac{n_2 A}{2a}\, k_2^t + \dfrac{A}{2a}\, k_3^t + \dfrac{e(2a - 1)}{2a}, \end{cases} \quad (18)$$

where t = 0, 1, 2, . . .
The characteristic equation for this system is

$$\begin{vmatrix} \dfrac{(n_1 - 1)A}{2a} - \lambda & 0 & \dfrac{A}{2a} \\[4pt] 0 & \dfrac{n_2 A}{2a} - \lambda & \dfrac{A}{2a} \\[4pt] \dfrac{(n_1 - 1)A}{2a} & \dfrac{n_2 A}{2a} & \dfrac{A}{2a} - \lambda \end{vmatrix} = 0. \quad (19)$$
Definition 2. The equilibrium is called dynamically stable if, after a small deviation of one of the agents from the equilibrium, a dynamic process starts that returns the system to the initial equilibrium state. In the opposite case the equilibrium is called dynamically unstable.
To find the eigenvalues of the system of difference equations (18) in general, we need to impose the restriction n₂ = n₁ − 1 = n. Then Eq. (19) takes the form
$$\begin{vmatrix} \dfrac{nA}{2a} - \lambda & 0 & \dfrac{A}{2a} \\[4pt] 0 & \dfrac{nA}{2a} - \lambda & \dfrac{A}{2a} \\[4pt] \dfrac{nA}{2a} & \dfrac{nA}{2a} & \dfrac{A}{2a} - \lambda \end{vmatrix} = \left( \dfrac{nA}{2a} - \lambda \right) \left( \lambda^2 - \dfrac{(n+1)A}{2a}\, \lambda - \dfrac{nA^2}{4a^2} \right) = 0, \quad (20)$$

hence

$$\lambda_{1,2} = \frac{(n+1)A}{4a} \pm \sqrt{\frac{(n+1)^2 A^2}{16a^2} + \frac{nA^2}{4a^2}} = \frac{A}{4a}\left( n + 1 \pm \sqrt{n^2 + 6n + 1} \right), \qquad \lambda_3 = \frac{nA}{2a}. \quad (21)$$
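The eigenvalue formulas (21) can be cross-checked against the determinant (20) numerically; each λ must zero out the characteristic determinant. A minimal check in pure Python (the values of n, A, a are our own illustrative choices):

```python
import math

def det3(M):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def char_poly_value(lam, n, A, a):
    """Value of the characteristic determinant (20) at lambda = lam."""
    b, c = n * A / (2 * a), A / (2 * a)
    M = [[b - lam, 0.0, c],
         [0.0, b - lam, c],
         [b, b, c - lam]]
    return det3(M)

n, A, a = 3, 0.1, 0.4                 # sample parameters, n2 = n1 - 1 = n
s = math.sqrt(n * n + 6 * n + 1)
lams = [A / (4 * a) * (n + 1 - s),    # lambda_1
        A / (4 * a) * (n + 1 + s),    # lambda_2
        n * A / (2 * a)]              # lambda_3
for lam in lams:
    assert abs(char_poly_value(lam, n, A, a)) < 1e-12
```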
Thus, if λ ≠ nA/(2a), then x₁ = x₂ = x and

$$x_3 = \frac{\left( \lambda - \frac{nA}{2a} \right) x}{\frac{A}{2a}},$$

and if λ = nA/(2a), then x₃ = 0, x₁ = −x₂.
Hence we may choose
$$e_1 = \begin{pmatrix} 1 \\ 1 \\ \frac{1}{2}\left( 1 - n - \sqrt{n^2 + 6n + 1} \right) \end{pmatrix}, \quad (24)$$

corresponding to $\lambda_1 = \frac{A}{4a}\left( n + 1 - \sqrt{n^2 + 6n + 1} \right)$:
404 A. Korolev and I. Garmashov
$$x_3 = \frac{\frac{A}{4a}\left( n + 1 - \sqrt{n^2 + 6n + 1} \right) - \frac{nA}{2a}}{\frac{A}{2a}} = \frac{1}{2}\left( n + 1 - \sqrt{n^2 + 6n + 1} \right) - n = \frac{1}{2}\left( 1 - n - \sqrt{n^2 + 6n + 1} \right)$$

(we supposed x = x₁ = x₂ = 1);
$$e_2 = \begin{pmatrix} 1 \\ 1 \\ \frac{1}{2}\left( 1 - n + \sqrt{n^2 + 6n + 1} \right) \end{pmatrix}, \quad (25)$$

corresponding to $\lambda_2 = \frac{A}{4a}\left( n + 1 + \sqrt{n^2 + 6n + 1} \right)$, and
$$e_3 = \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}, \quad (26)$$

corresponding to $\lambda_3 = \frac{nA}{2a}$.
Thus, the dynamics in the joined network is described by the following vector equation:

$$\begin{pmatrix} k_1^t \\ k_2^t \\ k_3^t \end{pmatrix} = C_1 \lambda_1^t e_1 + C_2 \lambda_2^t e_2 + C_3 \lambda_3^t e_3 + \begin{pmatrix} k_1 \\ k_2 \\ k_3 \end{pmatrix}. \quad (27)$$
The constants C₁, C₂, C₃ can be found from the initial conditions. Before the unification, both networks were in symmetric inner equilibria:
Adding the first two equations of the previous system, we obtain the following system of two equations defining C₁ and C₂:
$$\begin{cases} C_1 + C_2 = \dfrac{e(1-2a)\left[ (2n+1)A - 4a \right]}{2\left[ (n+1)A - 2a \right](nA - 2a)} - \dfrac{2a\, e(1-2a)}{nA^2 + 2a(n+1)A - 4a^2} > 0, \\[8pt] \dfrac{1}{2}\left( 1 - n - \sqrt{n^2 + 6n + 1} \right) C_1 + \dfrac{1}{2}\left( 1 - n + \sqrt{n^2 + 6n + 1} \right) C_2 = \\ \qquad = \dfrac{e(1-2a)}{(n+1)A - 2a} - \dfrac{e(1-2a)(nA + 2a)}{nA^2 + 2a(n+1)A - 4a^2} < 0. \end{cases} \quad (30)$$
It is easy to check that the right-hand side of the first equation is positive and the right-hand side of the second equation is negative. It is clear that λ₂ is the largest eigenvalue in absolute value, and it is positive, as are all the components of the eigenvector e₂. Hence, the nature of the transition process is determined by the sign of the constant C₂. Further, the sign of C₂ is defined by the sign of the following expression:
$$\tilde D_2 = 2\left( k_3^0 - k_3 \right) + \left( \sqrt{n^2 + 6n + 1} + n - 1 \right)\left( k_1^0 + k_2^0 - k_1 - k_2 \right), \quad (31)$$
6 Conclusion
In this paper, we have described the process of change in the game equilibrium during graph unification using a dynamic model. We have highlighted the significant role of productivity, which influences the agents' behavior. Moreover, we have determined the importance of the network sizes, which also affect the decisions the agents take during the unification process.
We believe that this article offers a base model of game-equilibrium change that can be improved by increasing the number of parameters and by modifying the graph type to incomplete nets or non-oriented graphs.
Acknowledgement. The research is supported by the Russian Foundation for Basic Research
(project 17-06-00618).
References
1. Alcácer, J., Chung, W.: Location strategies and knowledge spillovers. Manage. Sci. 53(5), 760–776 (2007)
2. Breschi, S., Lissoni, F.: Knowledge spillovers and local innovation systems: a critical survey. Ind. Corp. Change 10(4), 975–1005 (2001)
3. Chung, W., Alcácer, J.: Knowledge seeking and location choice of foreign direct investment in the United States. Manage. Sci. 48(12), 1534–1554 (2002)
4. Cooke, P.: Regional innovation systems, clusters, and the knowledge economy. Ind. Corp. Change 10(4), 945–974 (2001)
5. Jaffe, A.B., Trajtenberg, M., Henderson, R.: Geographic localization of knowledge spillovers as evidenced by patent citations. Q. J. Econ. 108(3), 577–598 (1993)
6. Katz, M.L., Shapiro, C.: Network externalities, competition, and compatibility. Am. Econ. Rev. 75(3), 424–440 (1985)
7. Matveenko, V.D., Korolev, A.V.: Network game with production and knowledge externalities. Contrib. Game Theory Manag. 8, 199–222 (2015)
8. Matveenko, V.D., Korolev, A.V.: Knowledge externalities and production in network: game equilibria, types of nodes, network formation. Int. J. Comput. Econ. Econom. 7(4), 323–358 (2017)
9. Matveenko, V., Korolev, A., Zhdanova, M.: Game equilibria and unification dynamics in networks with heterogeneous agents. Int. J. Eng. Bus. Manag. 9, 1–17 (2017)
Local Search Approaches with Different
Problem-Specific Steps for Sensor
Network Coverage Optimization
1 Introduction
Wireless sensor networks are the subject of many research projects, where an important issue is the maximization of the time during which the system fulfills its tasks, namely the network lifetime. When a set of immobile sensors with a limited battery capacity is randomly distributed over an area to monitor a set of points of interest (POI) and the number of sensors is significant, the monitoring ranges of the majority of sensors overlap. Moreover, not all POIs must be monitored all the time. In many applications, it is sufficient to control at any given time 80 or 90% of POIs. This percentage of POIs is called the required level of coverage. Thus, not all sensors must be active all the time. Turning off some of them saves their energy and allows us to extend the network lifetime.
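The required level of coverage can be made concrete with a small helper that measures the fraction of POIs within sensing range of at least one active sensor. A minimal sketch (the coordinates below are illustrative):

```python
def coverage_level(active_sensors, pois, r_sens=1.0):
    """Fraction of POIs within sensing range of at least one active sensor."""
    r2 = r_sens * r_sens
    covered = sum(
        1 for (px, py) in pois
        if any((px - sx)**2 + (py - sy)**2 <= r2 for (sx, sy) in active_sensors))
    return covered / len(pois)

pois = [(0.0, 0.0), (2.0, 0.0), (5.0, 5.0)]
sensors = [(0.5, 0.0), (2.5, 0.0)]
# The first two POIs are within range 1.0 of some sensor, the third is not: 2/3.
assert abs(coverage_level(sensors, pois) - 2 / 3) < 1e-12
```

A slot of a schedule is feasible when this fraction meets or exceeds the required level, e.g. 0.8.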
In this paper, we study relative performance of local search strategies used
to solve the problem of the network lifetime maximization. Such an approach
consists of three major problem–specific steps: finding any possible schedule,
that is, an initial problem solution, using a perturbation procedure to obtain
its neighbor, i.e., a solution close to the original one, and refining this neighbor
solution. If the refined neighbor is better than its ancestor, it takes the place
of the current best–found schedule. The algorithm repeats the steps of neighbor
generation and replacement until some termination condition is satisfied.
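The loop described above can be sketched as a generic local-search skeleton with pluggable problem-specific steps; the toy instantiation below (lists of numbers, quality = their sum) is our own illustration, not the MLCP representation used in the paper:

```python
import random

def local_search(init, perturb, refine, evaluate, iterations=500, seed=0):
    """Generic local-search loop: initialize, then repeatedly perturb and
    refine a neighbor, keeping it only when it improves the best solution."""
    rng = random.Random(seed)
    best = refine(init(rng), rng)
    for _ in range(iterations):
        neighbor = refine(perturb(best, rng), rng)
        if evaluate(neighbor) > evaluate(best):
            best = neighbor
    return best

# Toy instantiation: "schedules" are lists of numbers clipped to [0, 1].
sol = local_search(
    init=lambda rng: [rng.random() for _ in range(5)],
    perturb=lambda s, rng: [x + rng.uniform(-0.1, 0.2) for x in s],
    refine=lambda s, rng: [min(max(x, 0.0), 1.0) for x in s],
    evaluate=sum)
assert 0.0 <= sum(sol) <= 5.0
```

Swapping the `init`, `perturb`, and `refine` arguments is exactly what produces the algorithm variants studied later in the paper.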
Earlier, in [8–10] we have proposed three local search algorithms to solve
the problem in question. Each of these algorithms employs a different method
to obtain an initial solution and a different perturbation procedure to get a
neighbor solution. Moreover, each of these algorithms refines the neighbor with
the method used to get the initial solution. Due to the regular and universal
structure of the three optimization algorithms, one can easily create new ones
by swapping selected problem–specific steps between them. In this paper, we
construct a group of local search algorithms based on the three proposed earlier.
We also evaluate their performance experimentally.
The paper is organized as follows. Related work is briefly discussed in Sect. 2.
Section 3 defines the Maximum Lifetime Coverage Problem (MLCP) formally.
The local search approach is introduced in Sect. 4. Section 5 describes our exper-
iments with local search algorithms for MLCP. Our conclusions are given in
Sect. 6.
2 Related Work
The problem of maximization of the sensor network lifetime has been intensively
studied for the last two decades. There are many variants of this problem, and
various strategies have been employed to solve them. More about these strategies
can be found in, e.g., the monograph [11] on the subject. More recently, modern
heuristic techniques have also been applied to this problem. One can find, e.g.,
papers on schedule optimization based on evolutionary algorithms [1,4], local
search with problem–specific perturbation operators [8–10], simulated anneal-
ing [5,7], particle swarm optimization [13], whale algorithm [12], graph cellular
automata [6] or other problem–specific heuristics [2,3,14].
We assume that NS immobile sensors with a limited battery capacity are ran-
domly deployed over an area to monitor NP points of interest (POI). All sensors
have the same sensing range rsens and the same fully charged batteries. We use
a discrete-time model where a sensor is either active or asleep during every time
slot. Every sensor consumes one unit of energy per time unit for its activity while
in a sleeping state the energy consumption is negligible. Battery capacity allows
the sensor to be active during Tbatt time steps (consecutive, or not). The assump-
tions mentioned above give a simplified model, of course. In real life, effective
battery capacity depends on various factors, such as temperature, and is hard
to predict. Frequently turning the battery on and off shortens its lifetime. In this
research, we omit such problems and assume that neither the temperature nor
the sensor activity schedule influences the battery parameters. An active sensor
monitors all POIs within its sensing range. We assume that every POI can be
When step #1 is over, and the initialization procedure returns a new sched-
ule, in almost every case a small set of sensors retains a little energy in their
batteries. Even if we turn them all on, they will not provide a sufficient level
of POI coverage. Therefore, no feasible slot can be created using these sensors.
Perturbation operators make use of this set.
A perturbation operator builds a neighbor schedule in two steps (lines 3 and
4). First, the operator modifies the input schedule to make the set of available
working sensors larger. In the second step, it builds slots based on these sensors.
Eventually, the new list of slots should be longer or at least as long as the list
in the input schedule.
410 K. Trojanowski and A. Mikitiuk
The first step of the perturbation operator may follow two opposite strategies.
In the first one, the operator turns off selected sensors in the schedule. In [8] a
single slot is chosen randomly and removed entirely from the schedule. Thus, all
the active sensors from this slot recover one portion of energy. In [10], for each
of the slots, the sensors to be turned off are chosen randomly with a given probability.
Simulations show that even for a minimal probability like, for example, 0.0005
the number of slots with unsatisfied coverage level is much larger than one when
the procedure is over. Therefore, this perturbation is much stronger than the
previous one because all such invalid slots are removed immediately from the
schedule.
The second strategy [9] starts with activation of sensors from the pool of
the remaining working sensors in random slots of the schedule. For each of the
selected slots we draw a sensor from the pool randomly, but for faster improve-
ment, we activate only sensors which increase the level of coverage in the slot.
Precisely, for selected slots, we choose a sensor randomly from the pool and then
check if its coverage of POIs is fully redundant with any of the sensors already active
in this slot. If yes, activation of this sensor is pointless because the slot coverage
level does not change. In this case, the selected sensor goes back to the pool,
and we try the same procedure of sensor selection and activation with the next
slot. When the pool is empty, that is, all the remaining working sensors have
been activated, in the modified slots we fit the sets of active sensors to have
the coverage level just a bit above the requested threshold. Saved sensors retain
energy and participate in a new set of working ones.
In the second step of the perturbation operator, we assume that the new set
of working sensors is large enough to provide a satisfying level of coverage, so
we apply the initialization procedure from step #1. The procedure creates slots
one by one and decreases the energy in batteries at the same time according to
the sensor activities defined in subsequent new slots. This scheme is the same in
HMA, RFTA, and CAIA (albeit, they differ in details). Hence, the non-empty
schedule and the set of working sensors may successfully represent input data
for the initialization procedure called in the second step of the perturbation
operator.
Eventually, we get three variants for each of the three steps. When we swap these variants between the LS algorithms, we can obtain twenty-seven versions of LS. We name these versions of LS according to the origin of
the three steps. For example, the notation [HM A, HM A, HM A] represents a
Local Search algorithm, where all the three steps are like in LSHMA , that is, it
is the original, unmodified version of LSHMA . [HM A, RF T A, HM A] represents
the case where the initialization and the refine steps come from LSHMA , but the
modification – from LSRFTA .
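The twenty-seven labels follow directly by taking every (initialization, modification, refinement) combination of the three source algorithms, which can be enumerated mechanically:

```python
from itertools import product

steps = ["HMA", "RFTA", "CAIA"]
# Every (initialization, modification, refinement) combination of the three
# problem-specific steps yields one local-search variant.
variants = [list(combo) for combo in product(steps, repeat=3)]
assert len(variants) == 27
assert ["HMA", "RFTA", "HMA"] in variants   # e.g. [HMA, RFTA, HMA] from the text
```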
5 Experiments
The experimental part of the research consists of experiments with new versions
of LS. For fair comparisons, all the tested versions of LS should start with the
same initial solutions. Low quality of the initial schedules creates an opportunity
to show the efficiency of the compared optimization algorithms. HMA returns
the most extended schedules which are hard to improve; therefore it is not taken
into account. From the remaining two procedures, we selected CAIA to generate
initial schedules for each of the problem instances. CAIA represents the initial-
ization step of compared LS versions in every case, and what is more important,
the main loops of the algorithms begin optimization from the same starting
points assigned to the instances. Thus, just the main loop, that is, precisely the
perturbation operator of the algorithm may vary in the subsequent versions of
LS. So, in the further text, we label the LS versions according to the construction of just the perturbation operator; the symbol for the method used in the initialization procedure is omitted. The full list of considered versions of LS
is as follows: [HM A, HM A], [HM A, RF T A], [HM A, CAIA], [RF T A, HM A],
[RF T A, RF T A], [RF T A, CAIA], [CAIA, HM A], [CAIA, RF T A], and [CAIA,
CAIA]. In every case, the loop has a limit of 500 iterations.
For fair evaluation of the algorithm efficiency, we should compare lengths of
obtained schedules with the optimal schedules for each of the instances. Unfor-
tunately, optimal solutions of the instances are unknown, and the complexity of
these problems makes them impossible to solve by an exhaustive search in a rea-
sonable time. Therefore, to obtain sub-optimal schedules, we did a set of experi-
ments with different versions of LS for all instances. All these versions employed
HMA as the initialization step hoping that in this way we maximize chances to
get solutions in close vicinity of the optimum. Lengths of best-obtained sched-
ules represent reference values in further evaluations of the percentage quality
of schedules.
For our experiments, we used a set of eight test cases SCP1 proposed earlier [8–
10]. In all cases, there are 2000 sensors with the sensing range rsens one unit
(this is an abstract unit, not one of the standard units of the length). In these
test cases, POIs form nodes of a rectangular or a triangular grid. The area under
consideration is a square with possible side sizes: 13, 16, 19, 22, 25, and 28 units.
The distance between POIs grows together with the side size of the area. This
gives us similar numbers of POIs in all test cases. The POI distribution should not be regular; therefore, about 20% of the nodes in the grid do not get a POI. A grid node has a POI only if a randomly generated value from
the range [0, 1] is less than 0.8. Thus, instances of the same test case differ in
the number of POIs from 199 to 240 for the triangular grid and from 166 to
221 for the rectangular grid. Either a random generator or a Halton generator is
the source of the sensor localization coordinates. For every test case, a set of 40
instances has been generated. The reader is referred to [8–10] for a more detailed
description of the benchmark SCP1. In our experiments, we have assumed cov = 80% and δ = 5%.
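The instance-generation scheme above can be sketched as follows; the grid spacing of one unit and the specific bases of the Halton generator are our own illustrative assumptions, not details fixed by the paper:

```python
import random

def halton(index, base):
    """Radical-inverse (van der Corput) value in [0, 1); a 2-D Halton point
    uses one base per dimension, e.g. bases 2 and 3."""
    f, result = 1.0, 0.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result

def generate_instance(side, spacing=1.0, n_sensors=2000, keep_prob=0.8, seed=0):
    """POIs on a rectangular grid (each node kept with probability keep_prob)
    plus low-discrepancy sensor locations inside the side x side square."""
    rng = random.Random(seed)
    pois = [(x * spacing, y * spacing)
            for x in range(int(side / spacing) + 1)
            for y in range(int(side / spacing) + 1)
            if rng.random() < keep_prob]
    sensors = [(side * halton(i, 2), side * halton(i, 3))
               for i in range(1, n_sensors + 1)]
    return pois, sensors

pois, sensors = generate_instance(side=13.0)
assert len(sensors) == 2000
assert all(0 <= x <= 13 and 0 <= y <= 13 for x, y in sensors)
```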
Table 1. Mean, min and max percentage qualities of the best-found schedules returned
by the LS algorithms for each of the five values of Tbatt from 10 to 30. Codes in
column headers: C – CAIA, H – HMA, R – RFTA, e.g., HR represents the version
[HM A, RF T A]; init – qualities of the initial schedules generated by CAIA
Tbatt Init HH HR CH HC RH CR RR CC RC
10 Mean 53.94 96.75 94.55 94.51 93.56 57.37 55.18 54.70 54.55 54.49
Min 52.18 94.21 92.26 91.51 89.59 54.72 53.36 52.87 52.79 52.80
Max 56.04 98.72 96.73 98.10 96.23 61.35 57.07 56.74 56.60 56.60
15 Mean 53.99 97.01 94.77 94.31 93.84 56.87 55.44 54.70 54.50 54.46
Min 52.35 95.01 92.56 91.26 90.72 54.41 53.49 52.98 52.83 52.79
Max 55.96 99.02 96.93 97.37 96.42 59.79 57.28 56.65 56.47 56.38
20 Mean 53.89 96.95 94.76 94.08 93.83 56.30 55.60 54.56 54.32 54.28
Min 52.37 94.60 92.70 91.01 91.15 54.16 53.92 53.07 52.83 52.73
Max 56.04 98.91 96.77 97.14 96.06 59.04 57.80 56.68 56.40 56.37
25 Mean 54.10 97.13 94.92 94.22 94.06 56.34 56.05 54.78 54.48 54.44
Min 52.37 94.98 92.76 91.13 91.08 54.18 54.14 53.01 52.75 52.73
Max 55.87 98.95 96.92 97.03 96.14 58.62 58.13 56.65 56.22 56.20
30 Mean 53.86 97.02 94.73 93.69 93.80 55.86 56.07 54.55 54.21 54.17
Min 52.29 95.19 92.77 90.91 91.31 53.68 54.29 52.93 52.58 52.55
Max 55.85 98.99 97.00 96.77 96.19 58.33 58.17 56.59 56.24 56.20
Concerning effectiveness, one can divide the LS versions into three groups:
weak ones, effective ones, and the master approach which is [HM A, HM A].
The group of effective ones consists of [HM A, RF T A], [CAIA, HM A] and
[HM A, CAIA]. The remaining LS versions belong to the group of weak ones.
Thus, all three approaches using the perturbation method from HMA result in
obtaining a relatively good schedule, no matter what algorithm is used in the
last step to repair or refine the schedule.
One could ask why [CAIA, HM A] is an effective approach while
[RF T A, HM A] is a weak one. The probable reason is that the perturbation
operator used in CAIA removes from the original schedule many more slots
than the one used in RFTA. Thus, the number of sensors available again is
higher, and they have more energy in batteries than in the case of RFTA per-
turbation operator. Having a broader set of available sensors (or even the same
set but having more power), an efficient algorithm HMA can extend a shorter
input schedule much more than in the opposite case when it gets on input a
longer input schedule and a smaller set of available sensors obtained from the
perturbation used by RFTA.
Mean, min and max lengths of schedules returned by the best representatives of
the three groups are presented in Tables 2 – [HM A, HM A], 3 – [HM A, RF T A],
4 – [RF T A, HM A]. One can see that the individual results in Table 2 are often
even 5–7% better than the corresponding results in Table 3. However, in some
cases results in Table 3 are slightly (less than 1%) better than those in Table 2.
The individual results in Table 4 are always much worse than the corresponding
Table 2. Mean, min and max lengths of schedules returned by the version
[HM A, HM A] for each of the eight test cases in SCP1 and for five values of Tbatt
from 10 to 30
Table 3. Mean, min and max lengths of schedules returned by the version
[HM A, RF T A] for each of the eight test cases in SCP1 and for five values of Tbatt
from 10 to 30
results from the previous two tables. Thus, comparison of absolute results for
individual test cases confirms observations from Sect. 5.2 concerning overall mean
relative quality of the schedules generated by particular approaches.
6 Conclusions
Table 4. Mean, min and max lengths of schedules returned by the version
[RF T A, HM A] for each of the eight test cases in SCP1 and for five values of Tbatt
from 10 to 30
In this paper, we swapped selected problem-specific steps between the three local search algorithms proposed earlier, and in this way we obtained new versions of the local search algorithms. In the set
of experiments, we compared the efficiency of these new versions.
In our experiments, we generated an initial schedule using CAIA, and we tried
perturbation methods and refinement/repair methods from all three approaches.
We used benchmark data set SCP1 which we proposed in our previous papers.
Our experiments have shown that the best pair of perturbation and refine-
ment/repair methods is the one used in LSHMA , i.e., [HM A, HM A]. Approaches
[HM A, RF T A], [CAIA, HM A], and [HM A, CAIA] are also effective while the
remaining combinations give much worse results.
References
1. Gil, J.M., Han, Y.H.: A target coverage scheduling scheme based on genetic algo-
rithms in directional sensor networks. Sensors (Basel, Switzerland) 11(2), 1888–
1906 (2011). https://doi.org/10.3390/s110201888
2. Keskin, M.E., Altinel, I.K., Aras, N., Ersoy, C.: Wireless sensor network lifetime
Modelling Dynamic Programming-Based
Global Constraints in Constraint
Programming
1 Introduction
Constraint Programming (CP) is one of the most active fields in Artificial Intel-
ligence (AI). Designed to solve optimisation and decision problems, it provides
expressive modelling languages, development tools and global constraints. An
overview of the current status of CP and its challenges can be found in [9].
The Dynamic Programming (DP) approach builds an optimal solution by
breaking the problem down into subproblems and solving each to optimality in
a recursive manner, achieving great efficiency by solving each subproblem once
only.
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 417–427, 2020.
https://doi.org/10.1007/978-3-030-21803-4_42
418 A. Visentin et al.
Until recently there was no standard procedure to encode a DP model into CP.
If part of a problem required a DP-based constraint that is not provided by the
solver being used, the modeller was forced either to write the global constraint
manually, or to change solver. This restricts the usefulness of DP in CP.
However, a new connection between CP and DP was recently defined. [16]
introduced a technique that allows DP to be seamlessly integrated within CP:
given a DP model, states are mapped to CP variables while seed values and
recurrence equations are mapped to constraints. The resulting model is called a
dynamic program encoding (DPE). Using a DPE, a DP model can be solved by
pure constraint propagation without search. DPEs can form part of a larger CP
model, and provide a general way for CP users to implement DP-based global
constraints. In this paper we explore DPEs further.
2 Method
In this section we formalize the DPE. As mentioned above, it models every DP
state with a CP variable, and the seed values and recurrence relations with
constraints. [16] introduces the technique informally, and here we give a more
formal description based on the definition of DP given in [4].
Many problems can be solved with a DP approach that can be modelled as a
shortest path on a DAG, for example the knapsack problem [11] or the lot sizing
problem [7]. We decided to use the shortest path problem directly, as it was
already used as a benchmark for the MiniZinc challenge [18]. One of the most
famous DP-like algorithms, Dijkstra's algorithm, is used to solve this problem.
We will use Fig. 1 to help visualize the problem.
is based on its value; and base cases, or final states, which are the solutions of the
smallest problems. The solutions of these problems do not depend on any other
state. In Fig. 1 they are represented by the source and the sink of the graph.
The last general characteristic of the DP approach is the recursive optimiza-
tion procedure. The goal of this procedure is to build the overall solution by
solving one stage at a time and linking the optimal solution of each state to the
optimal solutions of the states in subsequent stages. This procedure is generally
based on a backward induction process, and is specified by a functional equation,
or Bellman equation [2], together with a recursion order.
In the DPE the functional equation is not a single equation, but is applied to
every state via a constraint. This constraint contains an equality that binds the
optimal value obtainable at that state to the values of the involved states in the
next stage. For every state we have a set of feasible decisions that can lead to a
state of the next stage; in the graph these are represented by the edges leaving
the associated node, and an edge used in the shortest path means that the
corresponding decision is taken. The constraint also includes the immediate cost
associated with each decision, which is the value added to or subtracted from
the next-stage state variables. In Fig. 1 these costs are represented by the
weights of the involved edges. In the shortest path problem, the constraint
applied to each (non-sink) state assigns to the node's CP variable the minimum,
over all nodes reachable by one edge, of that node's CP variable plus the edge
cost.
The important difference between the encodings is the order in which the
states are explored and resolved. In DP they are ordered in such a way that each
state is evaluated only when all the subsequent stages are solved, while in the
encodings the ordering is delegated to the solvers. In the MIP flow formulation it
is completely replaced by a search on the binary variables, while in the DPE it is
done by constraint propagation, which depends on the CP solver's implementation
of the propagators. This approach is more robust than search, which in the worst
case can spend a significant amount of time exploring a search subtree. The
optimality of the solution is guaranteed by the correctness of the DP formulation.
3 Computational Results
We aim to make all our results replicable by other researchers. Our code is avail-
able online at: https://github.com/andvise/dpincp. We also decided to use only
open source libraries. In the first experiment we used MiniZincIDE 2.1.7, while
the second part is coded in Java 10. We used 3 CP solvers: Gecode 6.0.1, Google
OR-Tools 6.7.2 and Choco Solver 4.0.8. As MIP solvers we used the COIN-OR
branch-and-cut solver (CBC), IBM ILOG CPLEX 12.8 and Gurobi 8.0.1. All
experiments were executed on an Ubuntu system with an Intel i7-3610QM, 8 GB
of RAM and 15 GB of swap memory.
CP solver          0      1      2       3      4      5      6      7         8         9
Dijkstra         23 ms  19 ms  18 ms   17 ms  24 ms  20 ms  25 ms  23 ms     20 ms     29 ms
Flow formulation   -    50 ms  60 ms  571 ms  46 ms    -    47 ms  1 182 s   4 504 ms    -
The DPE requires a smaller number of variables, since it requires only one
for each node. On the contrary, the flow formulation requires a variable for
each edge. This is without taking into account the number of additional variables
created during the decomposition.
MIP solver         0      1      2      3      4        5        6      7      8       9
Dijkstra         375 s  64 ms    -      -      -    20 667 ms  61 ms    -    138 ms  303 ms
Flow formulation 31 ms  39 ms  34 ms  40 ms  46 ms    35 ms    36 ms  40 ms  37 ms   53 ms
to the source for the full description of the algorithm. Utilizing the structure of
a DP approach in the previous section: J[i, j] with i ∈ I and j ∈ [0, C] are the
states of our DP. Each J[i, j] contains the optimal profit of packing the subset
of items Ii = (i, . . . , n) in a knapsack of volume j.
The formulation can be represented by a rooted DAG, in this case a tree with
node J[1, C] as root and nodes J[n, j], j ∈ [0, C] as leaves. For every internal
node J[i, j] the leaving arcs represent the action of packing the item i, and their
weight is the profit obtained by packing the i-th item. A path from the root to
a leaf is equivalent to a feasible packing, and the longest path of this graph is
the optimal solution. If we encode this model using a DPE, creating all the CP
variables representing the nodes of the graph, then it is solved by pure constraint
propagation with no backtracking.
We use this problem to show the potential for speeding up computational
times. With the DPE implementation we can use simple and well known tech-
niques to reduce the state space without compromising the optimality. For exam-
ple, if at state J[i, j] volume j is large enough to contain all items from i to n
(all the items I might pack in the next stages) then we know that the optimal
solution of J[i, j] will contain all of them, as their profit is a positive number.
This pruning can be made more effective by sorting the items in decreasing order
of size, so the pruning will occur closer to the root and further reduce the size
of the search space.
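The J[i, j] recursion and the state-space reduction just described can be sketched as follows. This is an assumed Python illustration, not the paper's implementation; `knapsack_dpe` and the item data are ours.

```python
# Sketch (assumed, not the paper's code) of the knapsack DP J[i, j] with the
# state-space reduction described above: if volume j fits all remaining
# items i..n, their total profit is optimal for that state and the recursion
# stops there. Items are sorted by decreasing size so the pruning triggers
# closer to the root of the recursion.

def knapsack_dpe(sizes, profits, capacity):
    n = len(sizes)
    items = sorted(range(n), key=lambda k: -sizes[k])
    sizes = [sizes[k] for k in items]
    profits = [profits[k] for k in items]
    suffix_size = [0] * (n + 1)        # total size of items i..n-1
    suffix_profit = [0] * (n + 1)      # total profit of items i..n-1
    for i in range(n - 1, -1, -1):
        suffix_size[i] = suffix_size[i + 1] + sizes[i]
        suffix_profit[i] = suffix_profit[i + 1] + profits[i]
    memo = {}

    def J(i, j):                       # best profit, items i..n-1, volume j
        if i == n:
            return 0
        if suffix_size[i] <= j:        # pruning: everything fits, take it all
            return suffix_profit[i]
        if (i, j) not in memo:
            best = J(i + 1, j)         # skip item i
            if sizes[i] <= j:          # or pack item i
                best = max(best, profits[i] + J(i + 1, j - sizes[i]))
            memo[(i, j)] = best
        return memo[(i, j)]

    return J(0, capacity)

print(knapsack_dpe([3, 2, 2], [5, 3, 4], 4))  # -> 7 (pack the two size-2 items)
```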
We test the DPE on different types of instances of increasing item set size,
and compare its performance with several other decompositions of the constraint:
– A CP model that uses the simple scalar product of the model (1a)–(1b)
(Naive CP). The MiniZinc encoding of the knapsack constraint uses the
same structure.
– The knapsack global constraint available in Choco (Global constraint).
This constraint is implemented with scalar products. The propagator uses
Dantzig-Wolfe relaxation [6].
– A CP formulation of the encoding proposed in this paper (DPE) solved using
Google OR.
– A DPE with the state space reduction technique introduced before (DPE + sr).
– A DPE with the state space reduction technique and with the items sorted
(DPE + sr + sorting).
– The MIP flow formulation proposed by [13] (Flow model Solver) which we
tested with 3 different MIP solvers: COIN CBC, CPLEX and Gurobi.
To make the plots readable we decided to show only these solutions, but others
are available in our code.
As a benchmark we decided to use Pisinger's instances described in [14]. We
did not use the code available online because it does not allow setting a seed to
make the experiments replicable. Four different types of instances are defined,
in order of decreasing correlation between item weight and profit: subsetsum,
strongly correlated, weakly correlated and uncorrelated. Due to space limitations
we refer the reader to the original paper for the details of the instances. In our
experiments we tested all the types, and we kept the same configuration as the
first set of experiments in [14]. We increased the size of the instances until none
of the DP encodings found the optimal solution before the time limit. A time
limit of 10 min was imposed on the MIP and CP solvers, including variable
creation overhead.
Fig. 2. Average computational time for: (a) subsetsum instances; and, (b) uncorrelated
instances.
Figure 2 shows the computational time as a function of instance size. Due
to space limitations we had to limit the number of plots. We can see that the
DPE clearly outperforms the naive CP formulation and the previous encoding
(flow formulation) solved with an open source solver, CBC. The normal DPE
solved with an open source solver is computationally comparable to the flow
formulation implemented in CPLEX, and outperforms the one solved by Gurobi
on instances where the correlation between item weight and profit is lower, even
though the commercial MIP solvers use parallel computation. The DPE
outperforms the variable redefinition technique in MIP because of the absence of
search. It is also clearly better than a simple CP model of the problem definition,
which is the same model used for the MiniZinc constraint. The Choco constraint
with its ad-hoc propagator outperforms the DPE in most cases, confirming that
a global constraint is faster than a DPE. A particular situation is the test on
the strongly correlated instances: in this case the global constraint fails to find
the optimal solution in many test instances even with a small number of items;
probably the particular structure of the problem causes the search to get stuck
in non-optimal branches.
It is interesting to note the speed-up from the space reduction technique. The
basic DPE can solve instances of up to 200 items, but it has a memory problem:
the state space grows so rapidly that massive use of swap memory is needed.
However, this effect is less marked when a state reduction technique is applied.
This effect is stronger when the correlation between item profits and
Fig. 3. Computational time in the subsetsum instances with volume per item reduced
In the case that the constraint has to be called multiple times during the
solving of a larger model, the DPE can outperform the pure constraint, since
the overhead of creating all the variables is not repeated. This experiment
demonstrates the potential of the DPE with state space reduction: even with a
simple and intuitive reduction technique we can solve instances 10 times bigger
than with a simple CP model. We can see that the behaviour of the DPE is
stable regardless of the type of instance; on the contrary, the performance of the
space reduction technique strongly depends on the instance type and the volume
of the knapsack.
Of course we cannot outperform a pure DP implementation, even if our
solution involves a similar number of operations. This is mainly due to the time
and space overhead of creating CP variables. In fact the DPE requires more time
to create the CP variables than to propagate the constraints.
4 Conclusions
References
1. Beldiceanu, N., Carlsson, M., Rampon, J.X.: Global constraint catalog, (revision
a) (2012)
2. Bellman, R.: The theory of dynamic programming. Technical report, RAND Corp
Santa Monica CA (1954)
3. Bergman, D., Cire, A.A., van Hoeve, W.J., Hooker, J.N.: Discrete optimization
with decision diagrams. INFORMS J. Comput. 28(1), 47–66 (2016)
4. Bradley, S.P., Hax, A.C., Magnanti, T.L.: Applied Mathematical Programming.
Addison Wesley (1977)
5. Chu, G., Stuckey, P.J.: Minimizing the maximum number of open stacks by cus-
tomer search. In: International Conference on Principles and Practice of Constraint
Programming, pp. 242–257. Springer (2009)
6. Dantzig, G.B., Wolfe, P.: Decomposition principle for linear programs. Oper. Res.
8(1), 101–111 (1960)
7. Eppen, G.D., Martin, R.K.: Solving multi-item capacitated lot-sizing problems
using variable redefinition. Oper. Res. 35(6), 832–848 (1987)
8. Focacci, F., Milano, M.: Connections and integrations of dynamic programming
and constraint programming. In: CPAIOR 2001 (2001)
9. Freuder, E.C.: Progress towards the holy grail. Constraints 23(2), 158–171 (2018)
10. Malitsky, Y., Sellmann, M., van Hoeve, W.J.: Length-lex bounds consistency for
knapsack constraints. In: International Conference on Principles and Practice of
Constraint Programming, pp. 266–281. Springer (2008)
11. Martello, S.: Knapsack Problems: Algorithms and Computer Implementations.
Wiley-Interscience Series in Discrete Mathematics and Optimization (1990)
12. Martello, S., Pisinger, D., Toth, P.: New trends in exact algorithms for the 0–1
knapsack problem. Eur. J. Oper. Res. 123(2), 325–332 (2000)
13. Martin, R.K.: Generating alternative mixed-integer programming models using
variable redefinition. Oper. Res. 35(6), 820–831 (1987)
14. Pisinger, D.: A minimal algorithm for the 0–1 knapsack problem. Oper. Res. 45(5),
758–767 (1997)
15. Plateau, G., Nagih, A.: 0–1 knapsack problems. In: Paradigms of Combinatorial
Optimization: Problems and New Approaches, vol. 2, pp. 215–242 (2013)
16. Prestwich, S.D., Rossi, R., Tarim, S.A., Visentin, A.: Towards a closer integration
of dynamic programming and constraint programming. In: 4th Global Conference
on Artificial Intelligence (2018)
17. Quimper, C.G., Walsh, T.: Global grammar constraints. In: International Confer-
ence on Principles and Practice of Constraint Programming, pp. 751–755. Springer
(2006)
18. Stuckey, P.J., Feydy, T., Schutt, A., Tack, G., Fischer, J.: The MiniZinc challenge
2008–2013. AI Mag. 35(2), 55–60 (2014)
19. Zhou, N.F., Kjellerstrand, H., Fruhman, J.: Constraint Solving and Planning with
Picat. Springer (2015)
Modified Extended Cutting Plane
Algorithm for Mixed Integer Nonlinear
Programming
1 Introduction
In this work, we address the following convex Mixed Integer Nonlinear Program-
ming (MINLP) problem:

(P)   min_{x,y}  f(x, y)
      s.t.  g(x, y) ≤ 0,                                   (1)
            x ∈ X,  y ∈ Y ∩ Z^{n_y}.

Introducing an auxiliary variable α, (P) can be rewritten in the equivalent
epigraph form:

(P̄)   min_{α,x,y}  α
      s.t.  f(x, y) ≤ α,                                   (2)
            g(x, y) ≤ 0,
            x ∈ X,  y ∈ Y ∩ Z^{n_y}.
Since the constraints of (P̄) are convex, when we linearize them by a first-order
Taylor expansion about any given point (x̄, ȳ) ∈ X × Y, we obtain the following
valid inequalities for (P̄):

∇f(x̄, ȳ)^T ((x, y) − (x̄, ȳ)) + f(x̄, ȳ) ≤ α              (3)
∇g(x̄, ȳ)^T ((x, y) − (x̄, ȳ)) + g(x̄, ȳ) ≤ 0              (4)
Collecting the linearization points in a set L, we obtain the following MILP
master problem for (P):

(M^L)  min_{α,x,y}  α
       s.t.  ∇f(x^j, y^j)^T ((x, y) − (x^j, y^j)) + f(x^j, y^j) ≤ α,  ∀(x^j, y^j) ∈ L    (5)
             ∇g(x^j, y^j)^T ((x, y) − (x^j, y^j)) + g(x^j, y^j) ≤ 0,  ∀(x^j, y^j) ∈ L
             x ∈ X,  y ∈ Y ∩ Z^{n_y}.
Let (α̂, x̂, ŷ) be an optimal solution of (M L ). We emphasize that the value α̂
is a lower bound for (P̄ ) and (P ). If (α̂, x̂, ŷ) is feasible for (P̄ ), then the value
α̂ is also an upper bound for (P̄ ) and (P ). In this case, as (α̂, x̂, ŷ) gives the
same value as a lower and an upper bound for (P̄ ), this solution is optimal for
(P̄). Therefore, (x̂, ŷ) is also an optimal solution for (P). On the other hand, if
(α̂, x̂, ŷ) is not feasible for (P̄), it is necessary to add valid inequalities to (M^L)
to cut this solution out of its feasible set, strengthening the relaxation given by
this problem. To reach this goal, the ECP algorithm uses the strategy of adding
the solution (x̂, ŷ) to the set L.
The ECP algorithm is presented as Algorithm 1. We point out that ECP does
not require the solution of any NLP problem and does not use any information
from second order derivatives or approximations of them. This characteristic
can be advantageous in some cases, especially when the computation of second
order derivatives is hard or cannot be accomplished for some reason. We also
emphasize that the strategy of adding the solution (x̂k , ŷ k ) to the set L at the
end of each iteration (line 13) does not ensure that ECP has finite convergence.
We have observed that the newly generated cuts are usually weak, which makes
the algorithm require a large number of iterations to converge to an optimal
solution.
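The ECP loop can be illustrated on a toy pure-integer, one-dimensional problem. This is a hedged sketch rather than the paper's implementation: the real master problem is a MILP handed to a solver, while here, with a single bounded integer variable, it is solved by enumerating the integer grid.

```python
# Toy sketch of the ECP loop (an assumed illustration, not the solver's
# code): pure-integer 1-D problem  min f(y), y in {lo..hi}, f convex.
# The master "min alpha s.t. cuts" is solved by enumeration; at each
# iteration the incumbent master solution is added as a new linearization
# point, mirroring the cut-adding step of Algorithm 1.

def ecp_1d(f, df, lo, hi, tol=1e-6, max_iter=100):
    y_hat = lo                                   # arbitrary starting point
    cuts = [(y_hat, f(y_hat), df(y_hat))]        # (y_j, f(y_j), f'(y_j))
    for _ in range(max_iter):
        def alpha(y):                            # master objective at y
            return max(fj + gj * (y - yj) for yj, fj, gj in cuts)
        y_hat = min(range(lo, hi + 1), key=alpha)
        lower = alpha(y_hat)                     # lower bound from the master
        upper = f(y_hat)                         # y_hat feasible: upper bound
        if upper - lower <= tol:                 # cuts tight at y_hat: stop
            return y_hat
        cuts.append((y_hat, upper, df(y_hat)))   # weak cut -> many iterations
    return y_hat

f = lambda y: (y - 2.3) ** 2
df = lambda y: 2 * (y - 2.3)
print(ecp_1d(f, df, 0, 5))  # -> 2, the integer minimizer of (y - 2.3)^2
```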
Let (ǔ^k, x̌^k) be an optimal solution of (P^F_{ŷ^k}). The point (x̌^k, ŷ^k) is then added
to the set L. After the update of L, with the addition of (x̃^k, ŷ^k) if problem (P_{ŷ^k})
is feasible, or with the addition of (x̌^k, ŷ^k) otherwise, the algorithm starts a new
iteration, using as a stopping criterion a maximum tolerance for the difference
between the best lower and upper bounds obtained. As shown in [7], assuming
that the KKT conditions are satisfied at the solutions of (P_{ŷ^k}) and (P^F_{ŷ^k}), the
strategy used to update L ensures that a given solution ŷ^k for the integer variable
y is not visited more than once by the algorithm, except when it is part of the
optimal solution of (P) (in that case the solution may be visited at most twice).
As the number of integer solutions is finite by hypothesis, since Y is bounded, the
algorithm is guaranteed to find an optimal solution of (P) in a finite number of
iterations. Thus, in comparison with the ECP algorithm, OA tends to require
fewer iterations, with the overhead of needing to solve one or two NLP problems
at each iteration. Algorithm 2 presents the OA algorithm.
Note that (M^L_{ŷ^k}) can be obtained from (M^L) simply by fixing the variable y at
the value ŷ^k. Thus, by considering problem (M^L_{ŷ^k}), we expect to obtain good
feasible solutions sooner than with the traditional ECP algorithm. These
solutions are used for a possible update of the known upper bound z^u and
to strengthen the relaxation given by the master problem through their inclusion
in the set L.
The MECP algorithm is presented as Algorithm 3. Compared to the ECP
algorithm, the novelty is the introduction of lines 7 and 12–17. We point out that,
at each iteration, between the solution of (M^L) (line 9) and the solution of (M^L_{ŷ^k})
(lines 12–13), the solution (x̂^k, ŷ^k) is added to the set L (line 11) to strengthen
the linear relaxation built with the points of this set. For this reason, it is possible
that the optimal solution x^k of (M^L_{ŷ^k}) differs from x̂^k. With this strategy, we
expect that the MECP algorithm will find feasible solutions sooner and can
therefore close the integrality gap with less computational effort than ECP. We
also note that the solution obtained when solving (M^L_{ŷ^k}) is added to L as well
(line 17), in case the problem is feasible. As (M^L_{ŷ^k}) is a linear programming
problem, its resolution does not add a significant computational burden compared
to the resolution of (M^L).
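The extra fixed-y step can be sketched as follows. This is a hedged one-dimensional illustration, with all names and data ours: in general (M^L with y fixed) is an LP, while here, with a single continuous variable, the convex piecewise-linear cut model is minimized by ternary search.

```python
# Sketch (assumed) of MECP's extra step: after the master gives
# (x_hat, y_hat), fix y = y_hat and minimize the same piecewise-linear cut
# model over the continuous x only (an LP in general; here a 1-D convex
# minimization by ternary search), then evaluate f(x_tilde, y_hat) as a
# candidate upper bound.

def fixed_y_master(cuts, y_hat, x_lo, x_hi, iters=200):
    # cuts: list of (x_j, y_j, f_j, gx_j, gy_j) encoding the inequality
    # alpha >= f_j + gx_j*(x - x_j) + gy_j*(y - y_j).
    def model(x):
        return max(fj + gx * (x - xj) + gy * (y_hat - yj)
                   for xj, yj, fj, gx, gy in cuts)

    lo, hi = x_lo, x_hi               # ternary search: model() is convex in x
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if model(m1) <= model(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2              # x_tilde, used to update the upper bound

f = lambda x, y: (x - 0.7) ** 2 + (y - 2.3) ** 2
# One linearization point at (0, 0): f = 5.78, gradient = (-1.4, -4.6).
cuts = [(0.0, 0.0, 5.78, -1.4, -4.6)]
x_t = fixed_y_master(cuts, y_hat=2, x_lo=0.0, x_hi=1.0)
print(round(f(x_t, 2), 2))  # upper bound at y = 2; about 0.18
```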
It is important to point out that the convergence of MECP to an optimal
solution of (P) follows easily from the convergence of ECP, as MECP considers,
during its iterations, all linearization points considered by ECP (plus some
additional points). We also emphasize that, in this context, approximating the
feasibility problem (P^F_{ŷ^k}) by a linear programming problem would make no
sense: if (M^L_{ŷ^k}) is infeasible, the value ŷ^k cannot be part of a solution of any
problem (M^L) from the current iteration onwards, and therefore the resolution
of a feasibility problem is not necessary to cut solutions with value ŷ^k out of
(M^L).
434 W. Melo et al.
5 Computational Results
The algorithms were implemented in C++11 and compiled with ICPC
16.0.0. To solve the MILP problems we use the solver CPLEX 12.6.0 [4], and
to solve the NLP problems we use the solver MOSEK 7.1.0 [1]. The tests were run
on a computer with an Intel Core i7-4790 processor (3.6 GHz) under the operating
system openSUSE Linux 13.1. All the algorithms were configured to run on
a single processing thread, which means that each was executed by a single
processor at a time on the test machine. The CPU time of each algorithm on
each test instance was limited to 4 hours. Values of 10^-6 and 10^-3
were adopted as absolute and relative convergence tolerances, respectively, for all
algorithms.
More specifically, if the curve of a given algorithm passes through the point
(α, τ), this indicates that for τ% of the instances, the result obtained by the
algorithm is less than or equal to α times the best computational time
among all algorithms.
Note, for example, that the OA curve passes through the point (1, 57%).
This means that, for 57% of the test instances considered, OA achieves the best
result with respect to computational time (one times the best result). Next, the
curve passes through the point (1.2, 63%), indicating that OA was able to solve
63% of the instances spending up to 20% more time than the best algorithm on
each (1.2 times the best result). Thus, roughly speaking, the higher the curve of
an algorithm lies above the curves of the other algorithms in the graph, the better
the algorithm did compared to the others with respect to the characteristic
analyzed in the graph.
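The curve points described above can be computed as follows (an assumed sketch with made-up timing data, not the paper's benchmark scripts):

```python
# Sketch (assumed) of how a performance-profile point is computed: for
# algorithm a with time t[a][i] on instance i, the curve value at ratio r
# is the fraction of instances where t[a][i] <= r * best_i, with best_i
# the best time any algorithm achieved on instance i.

def profile_point(times, algo, r):
    insts = range(len(next(iter(times.values()))))
    best = [min(times[a][i] for a in times) for i in insts]
    hits = sum(1 for i in insts if times[algo][i] <= r * best[i])
    return hits / len(best)

# Made-up times (seconds) on four instances.
times = {"OA":  [1.0, 2.0, 3.0, 11.0],
         "ECP": [2.0, 1.0, 9.0, 4.0]}
print(profile_point(times, "OA", 1.0))   # best on 2 of 4 instances -> 0.5
print(profile_point(times, "OA", 2.5))   # within 2.5x of best on 3 of 4 -> 0.75
```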
Analyzing Fig. 1, we can observe that the performance of OA dominates
the performance of ECP. It is also possible to note that our MECP algorithm
presents substantially better results than ECP, completely dominating its
performance and even becoming competitive with the OA algorithm. It is
worth noting that the MECP curve dominates the OA curve for results greater
than or equal to 2.2 times the best result. All algorithms were able to solve about
90% of the test instances in the maximum running time stipulated.
Finally, we note that the implementations of all the algorithms considered in
this study, ECP, MECP and OA, together with heuristics [11] are available in
our MINLP solver Muriqui [10,12].
References
1. The MOSEK optimization software. Software. http://www.mosek.com/
2. Bonami, P., Kilinç, M., Linderoth, J.: Algorithms and software for convex mixed
integer nonlinear programs. Technical Report 1664, Computer Sciences Depart-
ment, University of Wisconsin-Madison (2009)
3. CMU-IBM: Open source MINLP project (2012). http://egon.cheme.cmu.edu/ibm/
page.htm
4. IBM Corporation: IBM ILOG CPLEX V12.6 User's Manual for CPLEX (2015).
https://www.ibm.com/support/knowledgecenter/en/SSSA5P 12.6.0
5. D’Ambrosio, C., Lodi, A.: Mixed integer nonlinear programming tools: a practical
overview. 4OR 9(4), 329–349 (2011). https://doi.org/10.1007/s10288-011-0181-9
6. Duran, M., Grossmann, I.: An outer-approximation algorithm for a class of mixed-
integer nonlinear programs. Math. Program. 36, 307–339 (1986). https://doi.org/
10.1007/BF02592064
7. Fletcher, R., Leyffer, S.: Solving mixed integer nonlinear programs by outer
approximation. Math. Program. 66, 327–349 (1994). https://doi.org/10.1007/
BF01581153
8. Hemmecke, R., Köppe, M., Lee, J., Weismantel, R.: Nonlinear integer program-
ming. In: Jünger, M., Liebling, T.M., Naddef, D., Nemhauser, G.L., Pulleyblank,
W.R., Reinelt, G., Rinaldi, G., Wolsey, L.A. (eds.) 50 Years of Integer Programming
1958–2008, pp. 561–618. Springer, Heidelberg (2010). https://doi.org/10.1007/978-
3-540-68279-0 15
9. Leyffer, S.: MacMINLP: test problems for mixed integer nonlinear programming
(2003). https://wiki.mcs.anl.gov/leyffer/index.php/MacMINLP
10. Melo, W., Fampa, M., Raupp, F.: Integrating nonlinear branch-and-bound and
outer approximation for convex mixed integer nonlinear programming. J. Glob.
Optim. 60(2), 373–389 (2014). https://doi.org/10.1007/s10898-014-0217-8
11. Melo, W., Fampa, M., Raupp, F.: Integrality gap minimization heuristics for binary
mixed integer nonlinear programming. J. Glob. Optim. 71(3), 593–612 (2018).
https://doi.org/10.1007/s10898-018-0623-4
12. Melo, W., Fampa, M., Raupp, F.: An overview of MINLP algorithms and their
implementation in muriqui optimizer. Ann. Oper. Res. (2018). https://doi.org/10.
1007/s10479-018-2872-5
13. Trespalacios, F., Grossmann, I.E.: Review of mixed-integer nonlinear and general-
ized disjunctive programming methods. Chem. Ing. Tech. 86(7), 991–1012 (2014).
https://doi.org/10.1002/cite.201400037
14. Westerlund, T., Pettersson, F.: An extended cutting plane method for solving con-
vex MINLP problems. Comput. Chem. Eng. 19(Supplement 1), 131–136 (1995).
https://doi.org/10.1016/0098-1354(95)87027-X. European Symposium on Com-
puter Aided Process Engineering
15. GAMS World: MINLP library 2 (2014). http://www.gamsworld.org/minlp/minlplib2/
html/
On Proximity for k-Regular
Mixed-Integer Linear Optimization
1 Introduction
We study the standard form MILO (mixed-integer linear optimization) problem
with full row-rank A ∈ Z^{m×n}, b ∈ Q^m, and I ⊆ [n] := {1, 2, . . . , n}. The main
issue that we are interested in is: for distinct I, J ⊆ [n] and an optimal solution
x*(I) to I-MIP, find a good upper bound on ‖x*(I) − x*(J)‖_∞ for some optimal
x*(J) to J-MIP. Mostly we consider the ∞-norm, though it is nice to
have results using the 1-norm. A key special case of interest is I = ∅ and J = [n],
where we ask for a bound on how far components of an optimal solution
of a pure MILO problem may be from components of a solution of its continuous
relaxation—a quantity that is very relevant to the issue of rounding and local
search starting from a relaxation solution. In some situations we add further
natural conditions (e.g., b ∈ Z^m, x*(∅) is a basic solution, etc.).
Even in dimension n = 2, it is easy to construct examples where the solution
of a pure MILO problem is far from the solution of its continuous relaxation.
Choose p1 < p2 to be a pair of large, relatively-prime positive integers. Consider
the integer standard-form problem
By the equation in I-P, every feasible solution (x̂1, x̂2) satisfies (x̂1 + 1)/(x̂2 + 1) =
p1/p2. Because p1 and p2 are relatively prime, there cannot be a feasible solution
to {1, 2}-P with x̂1 smaller than p1 − 1. So the optimal solution to {1, 2}-P is
(z1*, z2*) := (p1 − 1, p2 − 1). But it is very easy to see that the (unique and basic)
Supported in part by ONR grant N00014-17-1-2296.
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 438–447, 2020.
https://doi.org/10.1007/978-3-030-21803-4_44
optimal solution to its continuous relaxation ∅-P is (x∗1 , x∗2 ) := (0, −1 + p2 /p1 ),
quite far from the optimal solution to {1, 2}-P. With such a small example, it is
not obvious exactly what drives this behavior, and at a high level our goal is to
control and investigate this.
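A quick numeric check of this example follows; note that the displayed constraint of I-P was lost in extraction, and we assume from the surrounding text that it reads p2(x1 + 1) = p1(x2 + 1).

```python
# Numeric check of the example above, assuming the elided equality
# constraint reads p2*(x1 + 1) = p1*(x2 + 1), i.e. (x1+1)/(x2+1) = p1/p2.
from math import gcd

p1, p2 = 7, 11                       # relatively prime, p1 < p2
assert gcd(p1, p2) == 1

# Smallest nonnegative integer solution: x1 = p1 - 1, x2 = p2 - 1.
sols = [(x1, x2) for x1 in range(3 * p1) for x2 in range(3 * p2)
        if p2 * (x1 + 1) == p1 * (x2 + 1)]
print(sols[0])                       # -> (6, 10), i.e. (p1 - 1, p2 - 1)

# Continuous relaxation: x1 = 0 gives x2 = -1 + p2/p1, far from (6, 10).
print(-1 + p2 / p1)
```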
Thus we could directly apply the theorems in the general form to the standard
form using the simple fact that Δ([A; −A; −I]) = Δ(A). However, the special
structure of the resulting general-form matrix could imply a better bound in some
cases. For example, making a clever argument employing the “Steinitz Lemma”,
[3] establishes ‖x̄ − z*‖₁ ≤ m(2mU + 1)^m, when x̄ is a basic optimal solution of
∅-MIP and z* is some optimal solution of [n]-MIP, where U = max_{ij}{|a_{ij}|}. [1]
gives an optimal bound Δ(A) − 1 on the ∞-norm distance between basic optimal
solutions and feasible integer solutions for standard-form knapsack polyhedra,
i.e., the case m = 1. [6,7] establish a bound of k·dim(n.s.(A)) on the
∞-norm distance between optimal solutions for “k-regular” ∅-MIP and [n]-MIP,
where k-regular means that the elementary vectors (i.e., the nonzero vectors with
minimal support) in the null space of the constraint matrix A can be scaled to
have all entries in {0, ±1, ±2, . . . , ±k} (see [6–8]). Note that k = 1 (regularity)
is equivalent to A being equivalent to a totally-unimodular matrix. A nice family
of examples with k = 2 is when A is the vertex-edge incidence matrix (or its
transpose) of a mixed graph.
440 L. Xu and J. Lee
1.2 Fundamentals
Let F be an arbitrary field. For any x ∈ Fn , the support of x, denoted x, is the
set of coordinates with nonzero entries, i.e., x := {i ∈ [n] : xi = 0}. Let V be
a vector subspace of Fn . A vector x ∈ V is an elementary vector of V if x = 0,
and x has minimal support in V \ {0}; i.e., x ∈ V \ {0} and y ∈ V \ {0} with
y x. The set of elementary vectors of V is denoted as F(V ).
Assume now that F is ordered. A vector y ∈ Fn conforms to x ∈ Fn if
xi yi > 0 for i ∈ y. The following result of Rockafellar is fundamental, that every
nonzero x ∈ V can be expressed as a conformal sum of at most min{dim(V ), |x|}
elementary vectors from F(V ).
Theorem 1 ([10], Theorem 1). Let V be a subspace of F^n, where F is an
ordered field. For every x ∈ V \ {0}, there exist elementary vectors v^1, . . . , v^t ∈
V, such that x = Σ_{i=1}^{t} v^i, where each v^i conforms to x, none has its support
contained in the union of the supports of the others, and t ≤ min{dim(V), |supp(x)|}.
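A small concrete check of the theorem (our own example, not from the paper): take V to be the null space of [1 1 1] in R^3.

```python
# Small check of Theorem 1 (assumed illustration): V = null space of
# [1 1 1] in R^3, i.e. vectors whose coordinates sum to 0. Its elementary
# vectors are the scalings of (1,-1,0), (1,0,-1), (0,1,-1): nonzero members
# of V with minimal support. The vector x = (2,-1,-1) in V is a conformal
# sum of two of them, matching t <= min(dim V, |supp(x)|) = min(2, 3) = 2.

x = (2, -1, -1)                      # in V: coordinates sum to 0
v1 = (1, -1, 0)                      # elementary vectors of V
v2 = (1, 0, -1)

s = tuple(a + b for a, b in zip(v1, v2))
assert s == x                        # x = v1 + v2, a sum of t = 2 terms

# Conformal: each nonzero entry of v1, v2 has the same sign as in x.
for v in (v1, v2):
    assert all(vi == 0 or vi * xi > 0 for vi, xi in zip(v, x))
print("conformal decomposition verified with t = 2")
```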
Definition 2. Let V be a subspace of R^n. The subspace V is k-regular if for
every x ∈ F(V) there exists λ ∈ R \ {0} such that λx_i ∈ {±1, ±2, . . . , ±k} for all
i ∈ supp(x).
We also refer to a standard-form problem as k-regular when the null space of
its constraint matrix is k-regular. We have the following simple property for
Proposition 3. Let P := {x : Ax = b, x ≥ 0}, where A ∈ Zm×n and
rank(A) = m, and suppose that P has an integer feasible solution (this implies
that b ∈ Zm ). If V := n.s.(A) is 2-regular, then every basic solution (feasible or
not) x̄ of P satisfies 2x̄ ∈ Zn .
Proof. Rearranging columns, we may assume that the first m columns of A form
a basis matrix A_β corresponding to x̄. That is, A = [A_β, A_η], x̄_β = A_β^{-1} b, and
x̄_η = 0. Multiplying both sides of Ax = b by A_β^{-1}, we get [I, M]x = A_β^{-1} b. Also
V = r.s.([−Mᵀ, I]) (r.s.(B) means the row space of B), and each row of [−Mᵀ, I]
is in F(V). Because each row has an entry of 1, and each row can be scaled by
a nonzero to have all entries in {0, ±1, ±2}, it follows that all entries of M are
in {0, ±1/2, ±1, ±2}. So we have that Ax = b is equivalent to [2I, 2M]x = 2A_β^{-1} b,
where now [2I, 2M] is an all-integer matrix. Plugging a feasible integer solution
x^0 into [2I, 2M]x = 2A_β^{-1} b, we can conclude that 2x̄_β = 2A_β^{-1} b ∈ Z^m, and so
2x̄ ∈ Z^n.
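Proposition 3 can be illustrated on a small mixed graph: a triangle plus one half arc at a vertex. The null space of that incidence matrix is spanned by (1, 1, −1, −2), hence 2-regular, and the basic solution for the triangle basis with b = (1, 1, 1) is half-integral, as the proposition predicts. A sketch in plain Python with exact rationals (names ours):

```python
from fractions import Fraction as F

def solve3(M, b):
    """Solve a 3x3 rational linear system by Gauss-Jordan elimination."""
    A = [row[:] + [bi] for row, bi in zip(M, b)]
    for c in range(3):
        p = next(r for r in range(c, 3) if A[r][c] != 0)
        A[c], A[p] = A[p], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(3):
            if r != c and A[r][c] != 0:
                A[r] = [vr - A[r][c] * vc for vr, vc in zip(A[r], A[c])]
    return [A[r][3] for r in range(3)]

# Incidence matrix of a triangle plus one half arc at vertex 1:
# columns = 3 triangle edges + half arc; null space spanned by (1,1,-1,-2),
# so entries lie in {+-1, +-2} and V is 2-regular.
Aful = [[1, 1, 0, 1], [1, 0, 1, 0], [0, 1, 1, 0]]
z = [1, 1, -1, -2]
assert all(sum(a * v for a, v in zip(row, z)) == 0 for row in Aful)

# Basis = the three triangle columns, b = (1, 1, 1):
Ab = [[F(1), F(1), F(0)], [F(1), F(0), F(1)], [F(0), F(1), F(1)]]
xb = solve3(Ab, [F(1), F(1), F(1)])
assert xb == [F(1, 2), F(1, 2), F(1, 2)]          # half-integral basic solution
assert all((2 * v).denominator == 1 for v in xb)  # 2*x_bar is integral
```
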
On Proximity for k-Regular Mixed-Integer Linear Optimization 441
[9] considers the question of bounding the ∞-norm distance between optimal
solutions of mixed-integer linear problems that differ only in the sets of indices
of integer variables. Using the properties of so-called bimodular systems
from [12], they manage to give a tighter bound Δ(A) − 1 for the special case when
Δ(A) ≤ 2 and I1, I2 ∈ {∅, [n]}, which is just in terms of Δ(A) and not relative to
|I1 ∪ I2| = n. However, Δ(A) ≤ 2 is a very strong assumption. Of course totally-unimodular
A have this property, so we have vertex-edge incidence matrices of
digraphs, for example. But we do not know further broad families of examples
with Δ(A) ≤ 2. It is natural to think about vertex-edge incidence matrices of
mixed graphs. But in general these have subdeterminants of magnitude 2^k, k ∈ Z_+.
For example, if G is a collection of k disjoint undirected triangles, then the
square vertex-edge incidence matrix has determinant of magnitude 2^k. But interestingly, the
null space of the vertex-edge incidence matrix of every mixed graph is 2-regular
(see [7], for example), so there is an opportunity to get a better proximity bound
than afforded by only considering Δ(A). Generally, for integer matrices A, we
have k ≤ Δ(A) (see [7]), and so the idea of k-regularity gives a more refined
view that can be exploited.
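The determinant claim for disjoint triangles is easy to check numerically. The sketch below (plain Python, exact rational arithmetic, names ours) builds the block-diagonal incidence matrix of k vertex-disjoint triangles and evaluates its determinant:

```python
from fractions import Fraction as F

def det(M):
    """Determinant via Gaussian elimination over the rationals."""
    M = [[F(v) for v in row] for row in M]
    n, d = len(M), F(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return F(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            d = -d                      # row swap flips the sign
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return d

T = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]   # incidence matrix of one triangle

def disjoint_triangles(k):
    """Block-diagonal incidence matrix of k vertex-disjoint triangles."""
    n = 3 * k
    M = [[0] * n for _ in range(n)]
    for b in range(k):
        for i in range(3):
            for j in range(3):
                M[3 * b + i][3 * b + j] = T[i][j]
    return M

assert abs(det(T)) == 2                          # one triangle: |det| = 2
assert abs(det(disjoint_triangles(3))) == 8      # k = 3: |det| = 2^3
```
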
We consider the standard-form I-MIP, where we assume that V := n.s.(A)
is k-regular and dim(V ) = r. Note that [n]-MIP is the k-regular pure-integer
problem, while ∅-MIP is its continuous relaxation.
Theorem 4 ([7]). If [n]-MIP is feasible, then for each optimal solution x∗ to
the corresponding continuous relaxation ∅-MIP, there exists an optimal solution
z∗ to [n]-MIP with ‖z∗ − x∗‖∞ ≤ kr.
We are going to generalize Theorem 4, using the technique developed in [9].
Toward this, we restate the main lemma used in [9].
Lemma 5 ([9], Lemma 1). Let d, t ∈ Z_{≥1}, g^1, . . . , g^t ∈ Z^d, and α_1, . . . , α_t ≥
0. If Σ_{i=1}^t α_i ≥ d, then there exist β_i ∈ [0, α_i] for i ∈ [t] such that not all
β_1, . . . , β_t are zero and Σ_{i=1}^t β_i g^i ∈ Z^d.
We use a mild generalization of [3, Lemma 5] (from the case I = [n], J = ∅):
Proof. Without loss of generality, assume that I ∪ J = [d], with d ∈ [n]. Let
x∗(I) ∈ R^n be optimal for I-MIP, and let z̃(J) ∈ R^n be any optimal solution of
J-MIP. By Theorem 1, y := x∗(I) − z̃(J) ∈ V can be expressed as a conformal
sum of at most r vectors in F(V), i.e., y = Σ_{i=1}^t v^i, t ≤ r, where each v^i ∈ F(V)
conforms to y. For each summand v^i, because V is k-regular, there exists a
positive scalar λ_i, so that (1/λ_i)v^i is {0, ±1, . . . , ±k}-valued. So we have

  y = Σ_{i=1}^t λ_i (1/λ_i) v^i =: Σ_{i=1}^t λ_i g^i,

where g^i = (1/λ_i)v^i is an integer vector with ‖g^i‖∞ ≤ k and Ag^i = 0; moreover,
g^i also conforms to y. Next, consider the set

  S := {(γ̄_1, . . . , γ̄_t) : γ̄_i ∈ [0, λ_i] for all i ∈ [t], Σ_{i=1}^t γ̄_i g^i ∈ Z^d × R^{n−d}},

choose (γ_1, . . . , γ_t) ∈ S maximizing Σ_{i=1}^t γ̄_i, and set α_i := λ_i − γ_i. Suppose, for
the sake of contradiction, that the desired bound does not
hold, i.e., Σ_{i=1}^t α_i > d. Thus, letting h^i ∈ Z^d be the projection of g^i onto the
first d coordinates, we can apply Lemma 5 to the α_i, h^i, and obtain β_1, . . . , β_t with
β_i ∈ [0, α_i] such that not all β_i's are zero and Σ_{i=1}^t β_i h^i ∈ Z^d. Hence Σ_{i=1}^t β_i g^i ∈
Z^d × R^{n−d}. Now we consider γ′_i := γ_i + β_i ≥ 0. Note that γ′_i ≤ γ_i + α_i = λ_i, and

  Σ_{i=1}^t γ′_i g^i = Σ_{i=1}^t γ_i g^i + Σ_{i=1}^t β_i g^i ∈ Z^d × R^{n−d}.

So (γ′_1, . . . , γ′_t) ∈ S. However, because not all β_i's are zero, we have Σ_{i=1}^t γ′_i >
Σ_{i=1}^t γ_i, which contradicts the maximality of (γ_1, . . . , γ_t).
Proof. (1): The proof is similar to that of Theorem 7, with some extra care using
Theorem 1 and Proposition 3. Let x̄∗ be a basic optimal solution of ∅-MIP. If
x̄∗ ∈ Z^n, then z∗ := x̄∗ satisfies the conclusion, so we consider only x̄∗ ∉ Z^n.
Because V is 2-regular, by Proposition 3, we have 2x̄∗ ∈ Z^n. Let z̃ be optimal for
(2): The proof is similar to that of (1), choosing a basic optimal solution x̄
first, and then letting x∗ := x̄ − w.
Remark 9. For the mixed-integer case, we do not have a result like Proposition 3
for the optimal solution, so we cannot generalize Theorem 8 in such a direction.
Next we give an example to demonstrate that for Theorem 8, Part (1), the
bound of (3/2)r cannot be improved to better than r.
Example 10. Let

  G = [1 0 1; 1 1 0; 0 1 1],   G^{-1} = (1/2)[1 1 −1; −1 1 1; 1 −1 1],

  e^1 = (1, 0, 0)ᵀ,   h = G^{-1}e^1 = (1/2, −1/2, 1/2)ᵀ,

  Ḡ = [1 0 0 0; 1 1 0 1; 0 1 1 0; 0 0 1 1],
  Ḡ^{-1} = [1 0 0 0; −1/2 1/2 1/2 −1/2; 1/2 −1/2 1/2 1/2; −1/2 1/2 −1/2 1/2],

  ē^1 = (1, 0, 0, 0)ᵀ,   h̄ = Ḡ^{-1}ē^1 = (1, −1/2, 1/2, −1/2)ᵀ,

and let

  A = [Ḡ 0 ⋯ 0 ē^1 ⋯ ē^1;
       0 G ⋯ 0 e^1 ⋯ 0;
       ⋮ ⋮ ⋱ ⋮ 0 ⋱ 0;
       0 0 ⋯ G 0 ⋯ e^1] ∈ Z^{(4+3p)×(4+4p)},   b = β𝟙 ∈ Z^{4+3p}.
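As a quick sanity check of the matrices in Example 10, the following sketch verifies Gh = e^1 and Ḡh̄ = ē^1 over the rationals (plain Python, names ours):

```python
from fractions import Fraction as F

def matvec(M, x):
    """Matrix-vector product over exact rationals."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

G  = [[1, 0, 1], [1, 1, 0], [0, 1, 1]]
h  = [F(1, 2), F(-1, 2), F(1, 2)]          # = G^{-1} e^1
Gb = [[1, 0, 0, 0], [1, 1, 0, 1], [0, 1, 1, 0], [0, 0, 1, 1]]
hb = [F(1), F(-1, 2), F(1, 2), F(-1, 2)]   # = Gb^{-1} e^1

assert matvec(G, h)   == [1, 0, 0]         # G h  = e^1
assert matvec(Gb, hb) == [1, 0, 0, 0]      # Gb hb = e^1
```
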
Now, let p be an even integer, and let β be a large enough (> p) odd integer.
Then x_{3p+4+i} is odd for i = 1, . . . , p because of the integrality of x_{3i+2}, which
implies x_{3p+4+i} ≥ 1. In this case, ‖x − u‖∞ ≥ |x_1 − u_1| = Σ_{i=1}^p x_{3p+4+i} ≥ p
for every feasible integer solution x. Note that because A has full row rank, r =
n − m = (4 + 4p) − (4 + 3p) = p.
Inspired by the proximity result for bimodular matrices (i.e., Δ(A) ≤ 2) in [9,12],
we consider a special case where A is the incidence matrix of a mixed graph.
Such an A has a 2-regular null space, but it is not generally bimodular (see [7]).
A mixed graph G = G(V, E+, E−, A) has vertices V, positive edges E+, negative
edges E−, and arcs A. An edge with identical endpoints is a (positive or negative)
loop. An arc may have one void endpoint, in which case it is a half arc. The
incidence matrix A of G has a row for each vertex and a column for each edge
and arc. For each positive (resp., negative) loop e = (v, v), A_{v,e} = +2 (resp., −2). For
all other positive (resp., negative) edges e = (v, w), A_{v,e} = A_{w,e} = +1 (resp., −1). For
each half arc a = (v, ∅) (resp., a = (∅, w)), A_{v,a} = +1 (resp., A_{w,a} = −1). For
each arc a = (v, w), A_{v,a} = −A_{w,a} = +1. All unspecified entries of A are zero.
Mixed graphs and their incidence matrices have been studied previously under
the names bidirected graphs and signed graphs (see [14]).
For a mixed graph G(V, E+, E−, A), we can construct an oriented signed graph
which has the same incidence matrix. The signed graph Σ consists of an unsigned
graph (V, {E+, E−, A}) and an edge labelling σ, which maps E+ and E− to −1 and
maps A (except half arcs) to +1. Because there are in general two possibilities
for each column of the incidence matrix of a signed graph, the orientation is
chosen so as to make the incidence matrix the same as that of the mixed graph (see [14]).
For a cycle C = e_1 e_2 · · · e_k not containing a half arc, if the product
σ(e_1)σ(e_2) · · · σ(e_k) of the labels along the cycle is +1, then the cycle is balanced, and
otherwise it is unbalanced. For C with some orientation, the incidence matrix is

  C = [   1        0        0      ⋯   −σ(e_k) ;
       −σ(e_1)     1        0      ⋯     0     ;
          0     −σ(e_2)     1      ⋯     0     ;
          ⋮        ⋮        ⋱      ⋱     ⋮     ;
          0        0        ⋯  −σ(e_{k−1})  1  ],        (1)
Lemma 12. If full row-rank A is the incidence matrix of a mixed graph, and B
is a basis of A, then up to row/column rearrangement, B is block diagonal, with
each block being the incidence matrix of a quasitree, consisting of a spanning
tree, plus a half arc or an arc forming an unbalanced cycle (including the case
that the arc is a loop).
Lemma 12 follows from Theorem 5.1(g) of [14], although [14] allows another case for
a block of B, namely that it represents a spanning tree; but in that case det(B) = 0,
so such a block cannot occur in a basis.
Theorem 13. Let A be the incidence matrix of a mixed graph G such that, for
every set S of vertex-disjoint unbalanced cycles in G, there is a partition S1 ∪ S2 of
S such that each unbalanced cycle in S1 has a half arc in G incident to the cycle,
and S2 has a perfect matching in G pairing these unbalanced cycles. Suppose that
P := {x : Ax = b, x ≥ 0} and P ∩ Z^n ≠ ∅. Then for each vertex u of P, there
exists y ∈ P ∩ Z^n satisfying ‖y − u‖∞ ≤ 1.
If u_i ∉ Z^{n_i}, then |det B_i| = 2, and there is an unbalanced cycle C_i in this block.
Similarly to the proof of Theorem 2 in [12], the lattice L generated by the columns
of B_i^{-1} can be divided into two classes: Z^{n_i} and u_i + Z^{n_i}. For any j ∈ [n_i], we have
r^j = B_i^{-1} e^j ∉ Z^{n_i}; otherwise 𝟙ᵀB_i r^j = 𝟙ᵀe^j, where the left-hand side is even because
𝟙ᵀB_i ≡ 0 mod 2, while the right-hand side is odd, resulting in a contradiction.
Therefore r^j ∈ u_i + Z^{n_i}. Also, for j ∈ [p_i], r^j = B_i^{-1}e^j = [C_i^{-1}(:, j); 0], implying
that the first p_i entries of r^j are 1/2 or −1/2 by Proposition 11.
Consider the set S of unbalanced cycles in the blocks in which u_i is not integral.
By the assumption, there is a partition S1 ∪ S2 of S such that each unbalanced
cycle in S1 has a half arc in G incident to the cycle, and S2 has a perfect matching
in G pairing these unbalanced cycles.
For each unbalanced cycle in S1, assume that the cycle is in block B_i, and
the half arc incident to it has a non-zero (±1) entry corresponding to the columns
References
1. Aliev, I., Henk, M., Oertel, T.: Distances to lattice points in knapsack polyhedra.
arXiv preprint arXiv:1805.04592 (2018)
2. Cook, W., Gerards, A.M., Schrijver, A., Tardos, É.: Sensitivity theorems in integer
linear programming. Math. Program. 34(3), 251–264 (1986)
3. Eisenbrand, F., Weismantel, R.: Proximity results and faster algorithms for integer
programming using the Steinitz lemma. In: SODA. pp. 808–816 (2018)
4. Granot, F., Skorin-Kapov, J.: Some proximity and sensitivity results in quadratic
integer programming. Math. Program. 47(1–3), 259–268 (1990)
5. Hochbaum, D.S., Shanthikumar, J.G.: Convex separable optimization is not much
harder than linear optimization. J. ACM 37(4), 843–862 (1990)
6. Lee, J.: Subspaces with well-scaled frames. Ph.D. dissertation, Cornell University
(1986)
7. Lee, J.: Subspaces with well-scaled frames. Linear Algebra Appl. 114, 21–56 (1989)
8. Lee, J.: The incidence structure of subspaces with well-scaled frames. J. Comb.
Theory Ser. B 50(2), 265–287 (1990)
9. Paat, J., Weismantel, R., Weltge, S.: Distances between optimal solutions of
mixed-integer programs. Math. Program. (2018). https://doi.org/10.1007/s10107-018-1323-z
10. Rockafellar, R.T.: The elementary vectors of a subspace of Rn . In: Combinatorial
Mathematics and Its Applications, pp. 104–127. University of North Carolina Press
(1969)
11. Schrijver, A.: Theory of Linear and Integer Programming. Wiley (1998)
12. Veselov, S.I., Chirkov, A.J.: Integer program with bimodular matrix. Discret.
Optim. 6(2), 220–222 (2009)
13. Werman, M., Magagnosc, D.: The relationship between integer and real solutions
of constrained convex programming. Math. Prog. 51(1), 133–135 (1991)
14. Zaslavsky, T.: Signed graphs. Discrete Appl. Math. 4(1), 47–74 (1982)
On Solving Nonconvex MINLP Problems
with SHOT
1 Introduction
Mixed-integer nonlinear programming (MINLP) constitutes a difficult class of
mathematical optimization problems. As MINLP combines the combinatorial
nature of mixed-integer linear programming (MILP) with the nonlinearities of
nonlinear programming (NLP), there is still today often a practical limit on the
size of the problems (with respect to the number of constraints and/or variables)
that can be solved. While this limit is constantly pushed forward through
computational and algorithmic improvements, there are still MINLP
problems with only a few variables that are difficult to solve. Most of these cases
AL and JK acknowledge support from the Magnus Ehrnrooth Foundation and the
Newton International Fellowship by the Royal Society (NIF\R1\82194) respectively.
c Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 448–457, 2020.
https://doi.org/10.1007/978-3-030-21803-4_45
are nonconvex problems, i.e., MINLP problems with either a nonconvex objective
function or one or more nonconvex constraints, e.g., a nonlinear equality
constraint.
Globally solving convex MINLP problems can nowadays be regarded almost
as a technology, as seen in a recent benchmark [10]. However, global nonconvex
MINLP is still very challenging. Solvers for this problem class include Antigone
[18], BARON [21], Couenne [1] and SCIP [5]. These solvers mostly rely on spatial
branch and bound, where convex underestimators and concave overestimators are
refined in the nodes of a branching tree. There are also reformulation techniques that
can transform special classes of nonconvex problems, e.g., signomial [14] or general
twice-differentiable [16] ones, into convex MINLP problems that can then be solved
with convex solvers.
Solving the MINLP problem to guaranteed optimality, or having
tight bounds on the best possible solution, may be a nice bonus, but for
many real-world cases it is not always a necessity, or even possible. Often, end
users of optimization software are mostly interested in finding a good-enough
feasible solution to the optimization problem at hand within a reasonable time.
For these use-cases, local MINLP solvers may be worth considering. Here, a local
solver for MINLP is defined as a solver that, while it does solve convex problems
to global optimality, cannot guarantee that a solution is found for a nonconvex
problem, let alone a locally or globally optimal one. However, local solvers are
often faster than global solvers, and in many cases they also manage to return
the global solution, or a very good approximation of it. Local MINLP solvers
include AlphaECP [13], Bonmin [2], DICOPT [6] and SBB [4]. SHOT is a new
local solver initially intended mainly for convex MINLP problems [15].
In this paper, the following general type of MINLP problem is considered:
  minimize   f(x),
  subject to Ax ≤ a, Bx = b,
             g_k(x) ≤ 0  ∀k ∈ K_I,
             h_k(x) = 0  ∀k ∈ K_E,                          (1)
             x̲_i ≤ x_i ≤ x̄_i  ∀i ∈ I = {1, 2, . . . , n},
             x_i ∈ R or x_i ∈ Z  ∀i ∈ I.
As could be seen in Example 1, the main issue is that the generated cuts can exclude
viable solution candidates. In general, a simple remedy for this shortcoming is to
generate fewer and less tight cuts, e.g., by generating cutting planes
(ECP) instead of supporting hyperplanes (ESH), or by reducing the emphasis
given to generating cuts for nonconvex constraints. Also, since it is nontrivial,
or in many cases not even possible, to obtain the integer-relaxed interior point
required in the ESH algorithm, the ECP method might in general be a safer
choice for the dual strategy in SHOT; this is, however, not considered in this
paper.
SHOT is available at https://www.github.com/coin-or/shot.
Fig. 1. In the figures, the shaded area indicates the integer-relaxed feasible region of the
MINLP problem. In the left figure, the MILP problem of the first iteration, with
the feasible set defined by the variable bounds and the two original linear constraints
l1 and l2, has been solved to obtain the solution point (2, 2). In the middle figure, a
root search is performed according to the ESH algorithm between this point and an
interior point (6.0, 0.4) of the integer-relaxed feasible region of the MINLP problem
to give a point on the boundary. A supporting hyperplane (c1) is then added to the
MILP problem. In the right figure, the integer-relaxed feasible region of the
updated MILP problem is shown. Since all integer-feasible solutions have been cut off,
the problem is infeasible. We cannot continue after this, and no primal solution is found.
stage; if a primal solution has been found, we can terminate with this solution.
One alternative strategy would be to remove some of the added cuts until
the problem becomes feasible again, but this can eliminate the effect of cuts added
in previous iterations and result in cycling. SHOT uses a different approach,
however, where the MILP problem is relaxed to restore feasibility. The same
approach was successfully used in [8], where an implementation of the ECP
algorithm in MATLAB was connected to the process simulation tool Aspen to
solve simulation-based MINLP problems. A similar technique was also used in
[7] to determine the feasibility of problems. To find the relaxation needed to
restore feasibility, the following MILP problem is solved:
  minimize   vᵀr
  subject to Ax ≤ a, Bx = b,
             C_k x + c_k ≤ r, r ≥ 0,                        (2)
             x̲_i ≤ x_i ≤ x̄_i  ∀i ∈ I = {1, 2, . . . , n},
             x_i ∈ R or x_i ∈ Z  ∀i ∈ I.
Here the matrix C_k and vector c_k contain the cuts added up to the current
iteration k. The vector r contains the relaxations for restoring feasibility, and
the vector v is used to individually penalize the relaxations of the different cuts.
The main strategy is to penalize the relaxation of the most recent cuts more strongly
than the relaxation of the cuts from the early iterations. This favors relaxing
the early cuts and reduces the risk of cycling, i.e., first adding cuts in
one iteration and removing them in the next. The penalty terms in SHOT are
currently determined as vᵀ = [1, 2, . . . , N], where N is the total number of cuts
added. After the relaxation problem (2) is solved, the MILP model is modified
as C_k x + c_k ≤ τr, where τ > 1 is a parameter that relaxes the model further.
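For intuition, the effect of this repair step can be sketched in a toy form. The helper below is hypothetical (SHOT itself delegates the relaxation to problem (2) or the MILP solver); it uses the observation that, for a fixed candidate point x̂, the smallest feasible slack of cut k is max(0, C_k x̂ + c_k), applies the weights vᵀ = [1, 2, . . . , N], and inflates the accepted slacks by τ:

```python
def repair_cuts(cuts, x_hat, tau=1.2):
    """Toy sketch of the cut-relaxation step (hypothetical helper, not
    SHOT's implementation).  For a fixed point x_hat, the minimal slack of
    cut (C_k, c_k) in C_k x + c_k <= r_k is r_k = max(0, C_k x_hat + c_k).
    Later cuts get larger weights v_k = k, favoring relaxation of early
    cuts; accepted slacks are inflated by tau > 1 as in C_k x + c_k <= tau*r_k."""
    weights = list(range(1, len(cuts) + 1))            # v^T = [1, 2, ..., N]
    slacks = [max(0.0, sum(C_i * x_i for C_i, x_i in zip(C, x_hat)) + c)
              for C, c in cuts]
    penalty = sum(w * r for w, r in zip(weights, slacks))
    # fold tau*r_k into the constant term: C_k x + (c_k - tau*r_k) <= 0
    relaxed = [(C, c - tau * r) for (C, c), r in zip(cuts, slacks)]
    return penalty, relaxed

# Two cuts in R^2; the second is violated at x_hat = (1, 1) by 0.5.
cuts = [((1.0, 0.0), -2.0),    # x1 - 2 <= 0        (satisfied)
        ((1.0, 1.0), -1.5)]    # x1 + x2 - 1.5 <= 0 (violated)
penalty, relaxed = repair_cuts(cuts, (1.0, 1.0))
assert penalty == 2 * 0.5                      # only cut 2 needs slack
assert relaxed[1][1] == -1.5 - 1.2 * 0.5       # its bound moved by tau * r
```
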
Both CPLEX and Gurobi have built-in functionality for finding a feasibility
relaxation of an infeasible problem, and this functionality is directly utilized in
SHOT. As Cbc lacks this functionality, repairing infeasible cuts is currently
not supported with this choice of subsolver.
Whenever the MILP solver returns with a status of infeasible, the repair
functionality of SHOT tries to modify the bounds in the constraints causing
the infeasibility. We do not want to modify the linear constraints that originate
from the original MINLP problem, nor the variable bounds. In the MILP solvers'
repair functionality, it is possible to minimize either the sum of the numerical
modifications to the constraint bounds or the number of modifications; here
we have used the former. If it is possible to repair the problem, SHOT
continues to solve the problem normally; otherwise SHOT terminates with the
currently best known primal solution (if one has been found).
In Fig. 2, we apply the repair functionality to the problem in Example 1.
As seen in Fig. 2, the dual strategy in SHOT can get stuck in suboptimal solu-
tions. To try to force the dual strategy to search for a better solution, a primal
Fig. 2. The repair functionality is now applied to the infeasible MILP problem illustrated
on the left, so that the constraint c1 is relaxed and replaced with c2. Thus,
an integer-feasible solution (7.8, 1) to the updated MILP problem can be obtained, as
illustrated in the middle and right figures.
Fig. 3. A primal cut p is now introduced to the MILP problem in the left figure, which
makes the problem infeasible, as shown in the middle figure. The previously generated
cut c2 is therefore relaxed by utilizing the technique in Sect. 3.1 and replaced with c3,
allowing the updated MILP problem to have a solution (2.3, 1). This solution is better
than the previous primal solution (7.8, 1)! After this, we can continue generating more
supporting hyperplanes to try to find an even better primal solution.
if a new primal bound has not been found within a specified number of iterations in
SHOT. This procedure, applied to Example 1, is illustrated in Fig. 3.
It is beneficial to introduce the set of binaries for the variable with the smaller
domain. Also, depending on whether c is negative or positive, one of the following
constraints is required:

  ∀k = x̲_i, . . . , x̄_i :   w − k·x_j + x_i x_j · b_k ≤ x_i x_j    if c > 0,
                           w − k·x_j − x_i x_j · b_k ≤ −x_i x_j   if c < 0.
6 Conclusions
In this paper, some functionality for improving the stability of SHOT on nonconvex
MINLP problems was described. With these modifications, the performance on
nonconvex problems is very good compared to the other local MINLP solvers
considered. The steps illustrated in this paper are, however, only a starting
point for further development, and the goal is to significantly increase the types
of MINLP problems that can be solved to global optimality by SHOT. To this
end, we intend to include convexification techniques based on lifting reformulations
for signomial and general twice-differentiable functions based on [14]. For
problems with a low to moderate number of nonlinearities, this might prove to
be a viable alternative to spatial branch-and-bound solvers.
References
1. Belotti, P., Lee, J., Liberti, L., Margot, F., Wächter, A.: Branching and bounds
tightening techniques for non-convex MINLP. Optim. Methods Softw. 24, 597–634
(2009)
2. Bonami, P., Lee, J.: BONMIN user’s manual. Numer. Math. 4, 1–32 (2007)
3. Bussieck, M.R., Dirkse, S.P., Vigerske, S.: PAVER 2.0: an open source environment
for automated performance analysis of benchmarking data. J. Glob. Optim. 59(2),
259–275 (2014)
1 Introduction
graph that does not contain any loops and has no more than one edge connecting
any two vertices. It should be noted that in this paper we study only unweighted
simple graphs. An undirected graph where all the vertices are adjacent to each other is
called complete. Conversely, a graph with no edges, i.e., one in which no
two vertices are adjacent, is called edgeless. A clique is a complete subgraph of a graph
G, and an independent set is an edgeless subgraph of G. The complement graph G′ of a
simple graph G is a graph with the same vertex set whose edge set consists of exactly
the edges not present in G: G′ = (V, K \ E), where K is the edge set consisting
of all possible edges. A vertex cover of a graph G is a vertex set such that each edge of
G is incident to at least one vertex from this set. Graph coloring is the process of assigning
labels, i.e., colors, to vertices with the special property that no two adjacent vertices
share the same color. A color class is the set of vertices with the same
color. It is clearly seen from the coloring property that each color class is nothing more
than an independent set. A graph is called k-colorable if it can be colored with k colors.
The minimum number of colors required for coloring a graph G is called the chromatic
number χ(G), and in this case the graph is called χ-chromatic. The maximum clique problem
is the problem of finding the largest possible complete subgraph of a graph G. Solving
the maximum clique problem, or improving algorithms for finding a maximum clique, will not
only advance one specific, narrow problem but also help to find better algorithms for all the
problems reducible to the maximum clique problem.
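The connection between coloring and cliques above can be made concrete with a small sketch. The following is a plain sequential greedy coloring (an illustration only, not the swap-based variant used later in the paper); by construction each returned color class is an independent set:

```python
def greedy_coloring(adj):
    """Greedy sequential coloring: give each vertex the smallest color not
    used by its already-colored neighbors.  Returns the color classes
    (lists of vertices); each class is an independent set by construction."""
    color = {}
    for v in adj:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    classes = {}
    for v, c in color.items():
        classes.setdefault(c, []).append(v)
    return [classes[c] for c in sorted(classes)]

# A 4-cycle a-b-c-d is 2-colorable: classes {a, c} and {b, d}.
adj = {'a': {'b', 'd'}, 'b': {'a', 'c'}, 'c': {'b', 'd'}, 'd': {'a', 'c'}}
classes = greedy_coloring(adj)
assert len(classes) == 2
# no two vertices inside one class are adjacent
assert all(u not in adj[v] for cl in classes for u in cl for v in cl)
```

Note that the number of color classes produced is always an upper bound on the clique number, which is exactly the pruning bound the algorithms below exploit.
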
5. MCS - Three years after MCR was released, a new improvement of the same
algorithm appeared, called MCS (Tomita et al. 2010). This time the authors focused on
approximate-coloring enhancements.
6. MCS improved - The article "Improvements to MCS algorithm for the maximum clique
problem" was published in 2013 by Mikhail Batsyn, Boris Goldengorin,
Evgeny Maslov and Panos M. Pardalos (Batsyn et al. 2013). MCSI shows very good
results on dense graphs, using a high-quality initial solution obtained by an ILS heuristic
algorithm.
3 New Algorithm
In this part we introduce a new algorithm for solving the maximum clique
problem. It is called VRecolor-BT-u, as it is a successor of the VColor-BT-u
algorithm and implements recoloring at each depth. Multiple algorithms were
described previously in this work; the idea of the new one is to gather and combine all
the gained knowledge to speed up maximum clique finding even more.
3.1 Description
The main idea of the new algorithm is to combine reversed search by color classes (from
VColor-BT-u) and in-depth coloring, i.e., recoloring (from MCQ and its successors).
Before we start, some useful properties of the previous algorithms
should be noted:
1. Reversed search by color classes means searching for a clique in a constantly
growing subgraph, adding the color classes one by one and holding a cache b[] for each
color class, where the cache value is the size of the maximum clique found for the given
color class. First of all, we consider a subgraph S1 consisting only of the vertices of the
first color class C1. After that, a subgraph S2 is created with two color classes C1 and C2.
In general, S_i = C1 ∪ C2 ∪ . . . ∪ C_i.
2. The pruning formula for reversed search by color classes, d − 1 + b[Ca(v_di)] ≤ |CBC|,
can be used only if the vertices in each subgraph S_i are ordered by initial color classes
(using these color classes we construct a new subgraph on each iteration).
3. If vertices are ordered by their color numbers and are expanded starting from the
largest color number, then all the vertices with color number lower than the threshold
th = |CBC| − (d − 1) can be ignored, as they will not be expanded because of the
pruning formula d − 1 + Max{No[p] | p ∈ R} ≤ |CBC|.
4. The pruning formula d − 1 + Max{No[p] | p ∈ R} ≤ |CBC| can be used when we
reapply coloring at each depth and the vertices are reordered according to these
colors.
From this it is seen that properties 2 and 4 conflict with each other, as the two
pruning formulas require different vertex orderings. As a result, if both bounding rules
are used, we are going to miss some cliques when a promising branch is pruned.
To avoid such situations, the formula d − 1 + Max{No[p] | p ∈ R} ≤ |CBC| was used not
Reversed Search Maximum Clique Algorithm Based on Recoloring 461
to prune a branch but to skip the current vertex, as expanding it is not going to give us a
better solution. This means that if vertices are recolored at each depth, but are not
reordered according to the new colors, we can skip a vertex without expanding it if and
only if its color number is lower than the current threshold and there is no neighbor of
this vertex with a color number larger than the threshold that stands after the bound
obtained from the first pruning formula d − 1 + b[Ca(v_di)] ≤ |CBC|.
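The skip test just described can be sketched as a small predicate. The signature and the position encoding here are our own illustration (the paper's CanBeSkipped is specified only in prose): expansion runs right to left, and vertices standing after the pruning bound are never expanded themselves.

```python
def can_be_skipped(v, No, th, pos, bnd, adj):
    """Illustrative sketch of the skip test (hypothetical signature).
    v may be skipped iff its in-depth color No[v] is at most the threshold
    AND every neighbor with a color above the threshold will still be
    expanded itself (position >= bnd); a high-color neighbor at a position
    before the bound is never expanded, so skipping v could lose a clique."""
    if No[v] > th:
        return False                    # must be expanded
    return all(No[u] <= th or pos[u] >= bnd for u in adj[v])

# The Fig. 1 situation: th = 2, bnd = 2 (positions < 2 are never expanded).
adj = {'h': {'r'}, 't': {'k'}, 'a': set(), 'r': {'h'}, 'k': {'t'}}
No  = {'h': 1, 't': 1, 'a': 2, 'r': 3, 'k': 3}
pos = {'k': 0, 'r': 4, 'a': 2, 't': 3, 'h': 5}   # illustrative positions
assert can_be_skipped('h', No, 2, pos, 2, adj)       # r is expanded later
assert not can_be_skipped('t', No, 2, pos, 2, adj)   # k is never expanded
assert can_be_skipped('a', No, 2, pos, 2, adj)       # no neighbor after bound
```
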
Figure 1 shows an example of how the conflict between the two different
colorings is resolved. Green lines show adjacency of two vertices (not all adjacent
vertices are marked with green lines, only the two that are interesting for us in this
specific example). Let us assume that the current depth is 2 and we have the following
prerequisites:
• d = 2 (depth is 2)
• |CBC| = 3 (current best clique is 3)
• th = 3 − (2 − 1) = 2 (threshold taken from the skipping formula; we need to expand
vertices having color number bigger than the threshold)
• b[1] = 1, b[2] = 2, b[3] = 3, b[4] = 3 (cache values from previous iterations)
• bnd = 2 (index of the rightmost vertex for which the pruning formula
d − 1 + b[Ca(v_di)] ≤ |CBC| will prune the current branch)
• Ca – array storing initial color classes, Cb – array storing in-depth color classes
Let us analyze the current example (Fig. 1). We start with the rightmost vertex h,
with in-depth color number 1 (No[h] = 1). We skip this vertex because its color
number is lower than the threshold (th = 2). As you can see, vertex h might be contained in
a larger clique, as it is connected with vertex r (No[r] = 3), but we skip it anyway
because vertex r will be expanded later. Now we proceed with the next vertex, t. The color
number of t is 1 (No[t] = 1), the same as that of vertex h, but in this case it is not possible to
skip vertex t, because it is adjacent to vertex k (No[k] = 3). Vertex k stands after the
pruning bound (bnd = 2) and therefore will not be expanded at all. If we skipped vertex t right
now, we might possibly miss a larger clique, which means that vertex t should be expanded.
The next vertex to analyze is vertex a; we skip it, as its in-depth color number is equal to
the threshold (th = No[a] = 2) and there is no adjacent vertex standing after the bound.
The last expanded vertex at the current depth is r (No[r] = 3), as its color number is
larger than the threshold. It should be noted that skipped vertices are not thrown away
from further consideration (when building the next depth); they should be stored in a
462 D. Kumlander and A. Porošin
separate array and added to the next depth with preserved order. There is another
pruning formula used right after recoloring is done. As we already know, the number of
color classes obtained by coloring subgraph G_d is an upper bound for the maximum clique
in the current subgraph. This property allows us to use the following pruning formula:
d − 1 + cn ≤ |CBC|, where cn is the number of colors obtained from recoloring.
3.3 Algorithm
CBC – current best clique, the largest clique found so far; d – depth; c – index of the
currently processed color class; di – index of the currently processed vertex at depth d;
b – array that saves the maximum clique value for each color class; Ca – initial color classes
array; Cb – color classes array recalculated at each depth; G_d – subgraph of graph G
induced by the vertices at depth d; cn – number of color classes recalculated at each
depth; CanBeSkipped(v_di, c) – function that returns true if a vertex can be skipped without
expanding it.
1. Graph density calculation. If the graph density is lower than 35%, go to step 2a; else
go to step 2b.
2. Heuristic vertex greedy coloring. Two arrays are created to store the
initial color classes defined only once (Ca) and the color classes recalculated at each
depth (Cb). During this step both arrays must be equal.
a. Before coloring, vertices are unordered and are colored with swaps.
b. Before coloring, vertices are put in decreasing order of degree
and are colored without swaps.
3. Searching. For each color class, starting from the first (current color class index c):
a. Subgraph (branch) building. Build the first depth by selecting all the vertices
from color classes whose number is equal to or smaller than c. Vertices
from the first color class should stand first; vertices at the end should belong to
color class c.
b. Process subgraph.
(1) Initialize depth. d = 1.
(2) Initialize current vertex. Set the current vertex index di to be expanded
(initially the first expanded vertex is the rightmost one): di = n_d.
(3) Bounding rule check. Check whether the current branch can possibly contain a larger clique
than found so far. If Ca(v_di) < c and d − 1 + b[Ca(v_di)] ≤ |CBC|, then
prune. Go to step 3.2.7.
(4) Vertex skipping check. Check whether the current vertex can possibly yield a larger clique
than found so far. If d − 1 + Cb(v_di) ≤ |CBC| and CanBeSkipped(v_di, c),
skip this vertex. Decrease the index, i = i − 1. Go to step 3.2.3.
(5) Expand current vertex. Form the new depth by selecting all the vertices
adjacent to the current vertex v_di (G_{d+1} = N(v_di)). Set the next
expanding vertex at the current depth: di = di − 1.
(6) New depth analysis. Check whether the new depth contains vertices.
i. If G_{d+1} = ∅, then check whether the current clique is the largest one; if so, it must be
saved. Go to step 3.3.
ii. If G_{d+1} ≠ ∅, then check the graph density. If the graph density is lower than
55%, apply greedy coloring with swaps to G_{d+1}; else use greedy coloring
without swaps. Save the number of color classes (cn) acquired by this
coloring. If the number of color classes cannot possibly give us a larger
clique, then prune: if d − 1 + cn ≤ |CBC|, decrease the index, i = i − 1, and
go to step 3.2.3; else increase the depth, d = d + 1, and go to step 3.2.2.
(7) Step back. Decrease the depth, d = d − 1. Delete the expanded vertex from the
current depth. If d = 0, go to step 3.3; else go to step 3.2.3.
(8) Complete iteration. Save the current best clique value for this color:
b[c] = |CBC|.
4. Return maximum clique. Return CBC.
CanBeSkipped function. Notation: th – threshold from which the branch is pruned; CBC – current best clique, the largest clique found so far; d – depth; c – index of the currently processed color class; di – index of the currently processed vertex at depth d; bnd – bound from which vertices cannot be skipped; b – array storing the maximum clique value for each color class; Ca – initial color classes array; Cb – color classes array recalculated at each depth.
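For illustration, the sequential greedy coloring used in step 2 can be sketched as follows (our own code, not the authors'; variant (b) — decreasing degree order, no swaps — is shown, with the graph stored as adjacency sets):

```python
def greedy_color(adj, order):
    """Sequentially assign each vertex the smallest color index not
    already used by its colored neighbors; return the color classes."""
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    classes = {}
    for v, c in color.items():
        classes.setdefault(c, []).append(v)
    return [classes[c] for c in sorted(classes)]

# Variant (b): vertices sorted in decreasing order of degree, no swaps.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
classes = greedy_color(adj, order)  # triangle 0-1-2 forces three classes
```

The variant with swaps would additionally try to recolor an already-colored neighbor to free a smaller color; that refinement is omitted in this sketch.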
4 Results
464 D. Kumlander and A. Porošin
In this part we compare the new algorithm to all the previously described ones. The following algorithms take part in the testing: Carraghan and Pardalos, Östergård, VColor-u (Kumlander 2005), VColor-BT-u, MCQ, MCR, MCS, MCS Improved and VRecolor-BT-u. The first part of the tests is devoted to randomly generated graphs. These random tests give a general overview of the algorithms' performance and therefore show whether the new algorithm is worth using for clique finding. All test cases are divided by graph density, and for each density different algorithms are tested. Algorithms that perform much worse than the others are removed from the test results. The second part contains an analysis of the algorithms' results on the DIMACS instances. Each DIMACS graph has a special structure corresponding to some specific real problem. Four algorithms were tested with this benchmark: MCS, MCSI, VColor-BT-u and VRecolor-BT-u.
[Figures: running time (ms) versus number of vertices for three density settings. Top: Östergård, VColorBtu, Mcq, Mcr and VRecolorBtu on graphs with 800–1200 vertices. Middle: Mcq, Mcr, Mcs, Mcsi and VRecolorBtu on graphs with 400–500 vertices. Bottom: Mcr, Mcs, Mcsi and VRecolorBtu on graphs with 125–150 vertices.]
Basic pruning formulas are very effective at such low densities. Nevertheless, VRecolor-BT-u outperforms them, showing that the skipping technique has an overall positive impact even though the algorithm must spend time on coloring and on proving that a vertex can be skipped. On densities from 20% to 40% the closest results to VRecolor-BT-u are those of MCQ and MCR, but the new algorithm performs about 20–25% faster. Note that some algorithms are missing on certain densities: we filter out those whose performance is exceptionally bad.
Based on the randomly generated graph results we can conclude the following:
• Graphs with densities lower than 50% are best solved using the VRecolor-BT-u algorithm.
• When the graph density is about 50%, three algorithms (MCQ, MCR and VRecolor-BT-u) are the fastest, and their time consumption fluctuates slightly relative to each other.
• If the density of the graph lies between 55% and 75%, then the VRecolor-BT-u algorithm is the best choice.
• For dense graphs with density above 75%, MCS Improved is the fastest algorithm.
5 Summary
The main goal of this study was to develop a new, improved algorithm for maximum clique finding on undirected, unweighted graphs. The new maximum clique algorithm, called VRecolor-BT-u, is presented. This algorithm is constructed on the basis of reversed search by color classes. The main idea is to apply coloring at each depth to preserve the most up-to-date color classes and to combine updated vertex colors with the reversed search approach. At first sight, the idea of in-depth recoloring might be
unclear, as reversed search is built around initial color classes, but the introduction of a new skipping technique instead of pruning allows this conflict to be avoided. Furthermore, two different greedy coloring algorithms (with swaps and without swaps) are used for the initial and in-depth coloring. Experimentally obtained constants, which depend on graph density, determine which coloring is applied. The new algorithm shows the best results on random graphs with low density and loses only on dense graphs to the MCS and MCSI algorithms, which are specially designed for high densities. VRecolor-BT-u produces fewer branches than its predecessor on all the DIMACS instances, but there are some cases where the new algorithm consumes more time. A decreasing branch number combined with performance degradation might seem misleading at first glance, but it is explained by a simple fact: in some special cases the additional in-depth recoloring consumes a lot of time while the skipping technique is practically not working. The result is a slightly lower branch number but increased time consumption. Finally, it was noted that each graph should be solved by a different algorithm depending on the graph density. At low to mid densities it is advised to use the VRecolor-BT-u algorithm, while the best option for dense graphs is the MCS Improved algorithm.
References
Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, New York (2003)
Cook, S.A.: The complexity of theorem proving procedures. In: Proceedings of the 3rd
Annual ACM Symposium on Theory of Computing, pp. 151–158 (1971)
Carraghan, R., Pardalos, P.M.: An exact algorithm for the maximum clique problem. Oper. Res.
Lett. 9, 375–382 (1990)
Östergård, P.R.J.: A fast algorithm for the maximum clique problem. Discret. Appl. Math. 120,
197–207 (2002)
Kumlander, D.: Some Practical Algorithms to Solve the Maximum Clique Problem. Tallinn
University of Technology, Tallinn (2005)
Clarkson, K.: A modification to the greedy algorithm for vertex cover. Inf. Process. Lett. 16(1),
23–25 (1983)
Andrade, D.V., Resende, M.G.C., Werneck, R.F.: Fast local search for the maximum
independent set problem. J. Heuristics 18(4), 525–547 (2012)
Tomita, E., Seki, T.: An efficient branch-and-bound algorithm for finding a maximum clique. In:
Proceedings of the 4th International Conference on Discrete Mathematics and Theoretical
Computer Science, DMTCS’03. Springer-Verlag, Berlin, Heidelberg, pp. 278–289 (2003)
Tomita, E., Kameda, T.: An efficient branch-and-bound algorithm for finding a maximum clique
with computational experiments. J. Glob. Optim. 37(1), 95–111 (2007)
Tomita, E., Sutani, Y., Higashi, T., Takahashi, S., Wakatsuki, M.: A simple and faster branch-
and-bound algorithm for finding a maximum clique. In: Proceedings of the 4th International
Conference on Algorithms and Computation, WALCOM’10. Springer-Verlag, Berlin,
Heidelberg, pp. 191–203 (2010)
Batsyn, M., Goldengorin, B., Maslov, E., Pardalos, P.M.: Improvements to MCS Algorithm for
the Maximum Clique Problem. Springer Science+Business Media, New York (2013)
Sifting Edges to Accelerate the
Computation of Absolute 1-Center
in Graphs
1 Introduction
1.1 Previous Results
Let G = (V, E, w) be an undirected connected graph, where V is the set of n
vertices, E is the set of m edges, and w : E → R+ is a positive weight function
on edges. The vertex 1-center problem (V1CP) aims to find a vertex of G,
called a vertex 1-center (V1C), to minimize the longest distance from it to all the
other vertices. The V1CP is tractable and the whole computation is dominated
by finding all-pairs shortest paths in G, which can be done by using Fredman
and Tarjan’s O(mn + n2 log n)-time algorithm [3], or O(m∗ n + n2 log n)-time
c Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 468–476, 2020.
https://doi.org/10.1007/978-3-030-21803-4_47
algorithm by Karger et al. [7], where m* is the number of edges used by shortest paths, or Pettie's O(mn + n² log log n)-time algorithm [9].
The classic absolute 1-center problem (A1CP) asks for a point on some
edge of G, called an absolute 1-center (A1C), to minimize the longest distance
from it to all the vertices of G. The A1CP was proposed by Hakimi [4], who
showed that an A1C of a vertex-unweighted graph G must be located at either one of the n(n − 1)/2 break points or at one vertex of G. The A1CP admits polynomial-time exact algorithms. For example, Hakimi et al. [5] presented an O(mn log n)-time
algorithm. In [8], Kariv and Hakimi first devised a linear-time subroutine to
compute a local center on every edge and then found an A1C by comparing all
the local centers. As a result, they developed an O(mn + n² log n)-time algorithm when the all-pairs shortest paths distance matrix is known and an O(mn + n³)-time algorithm when the distance matrix is unknown. The A1CP in vertex-unweighted trees admits a linear-time algorithm. Moreover, the A1CP in vertex-weighted graphs admits an O(mn² log n)-time algorithm by Hakimi et al. [5] and Kariv and Hakimi's O(mn log n)-time algorithm [8]. For the A1CP in vertex-weighted trees, Kariv and Hakimi designed an O(n log n)-time algorithm [8]. For
more results on A1CP, we refer readers to [2,10] and references listed therein. The
A1CP has applications in the design of minimum diameter spanning subgraph
[1,6].
In this paper, we consider a generalized version of the classic A1CP, formally defined in Sect. 2, where we are asked to find an absolute 1-center from a given subset of candidate edges with the goal of minimizing the longest distance from the center to a given subset of terminals. Unless otherwise specified, A1CP refers to this generalized version in the remainder of this paper.
First, we prove that a V1C is just an A1C if the all-pairs shortest paths
distance matrix from the vertices covered by candidate edges to terminals has a
(global) saddle point. Next, we introduce the definition of the local saddle point
of edge, and prove that the local center on one edge can be reduced to one of its
endpoints if the edge has no local saddle point. In other words, the candidate
edges that have a local saddle point can be sifted. Moreover, we combine the
tool of sifting edges with the framework of Kariv and Hakimi’s algorithm to
design an O(m + pm* + np log p)-time algorithm for the A1CP, where m* is the number of the remaining candidate edges. Applying our algorithm to the classic A1CP takes O(m + m*n + n² log n) time when the distance matrix is known and O(mn + n² log n) time when the matrix is unknown, which reduces, to some extent, the O(mn + n² log n) time and the O(mn + n³) time of Kariv and Hakimi's algorithm, respectively.
Organization. The rest of this paper is organized as follows. In Sect. 2, we define
the notations and A1CP formally. In Sect. 3, we show the fundamental properties
which form the basis of our algorithm. In Sect. 4, we present our algorithm and
apply it to the classic A1CP. In Sect. 5, we conclude this paper.
470 W. Ding and K. Qiu
The vertex 1-center problem (V1CP) aims to find a vertex vi∗ , called a
vertex 1-center (V1C), from S to minimize the value of φ(vi , T ). So,
Let Pk be the set of continuum points on edge ek , for any 1 ≤ k ≤ m, and P(G)
be the set of continuum points on all the edges of G. For any point p ∈ P(G),
if p is a vertex then τ (p) also denotes the index of vertex p and τ (p) is empty
otherwise. Let E ⊆ {1, 2, . . . , m} be the index set of the given candidate edges
and also let E denote the set of candidate edges. Let PE be the set of continuum
points on all the edges in E. Clearly,

PE = ∪_{k∈E} Pk.    (4)
The absolute 1-center problem (A1CP) asks for a point p∗, called an absolute 1-center (A1C), from PE that minimizes the value of φ(p, T). So,
and
ψ(vj∗, PE) = max_{j∈T} ψ(vj, PE).    (9)
and
ψ(vj∗∗, SE) = max_{j∈T} ψ(vj, SE).    (11)
3 Fundamental Properties
In this section, we show several important properties which form the basis of
our algorithm.
By Theorem 2, we only need to select the endpoints e1k and e2k as candidate vertices of an A1C when ek, k ∈ E, has a local saddle point. So, Corollary 2 follows.
It is well known that Kariv and Hakimi's algorithm is the most popular one for the classic A1CP. The fundamental framework of their algorithm is based on the result that there is surely a classic A1C in the union of all the vertices of the graph and the local centers of all the edges. By Corollaries 2 and 3, we claim for the A1CP that the candidate edges having a local saddle point can be sifted, with only their endpoints remaining as candidate vertices of an A1C. In other words, the edges in E1 can be omitted. In this subsection, we combine the framework of Kariv and Hakimi's algorithm with the tool of sifting candidate edges to design a fast algorithm, named AlgA1CP, for the A1CP, which consists of three stages.
Algorithm (AlgA1CP) for A1CP:
Input: an instance I of A1CP and distance matrix D;
Output: an A1C, p∗ .
01: Use DFS to traverse G to get SE , compute φ(vi , T )
02: and record ji∗ ← arg maxj∈T d(vi , vj ) for each i ∈ SE ;
03: i∗ ← arg mini∈SE φ(vi , T ); // (the index of V1C)
04: E0 ← ∅; SE0 ← ∅;
05: for every k ∈ E do
06: Determine whether or not ek has a local saddle point;
07: if ek has no local saddle point then
08: E0 ← E0 ∪ {k}; SE0 ← SE0 ∪ {τ (e1k ), τ (e2k )};
09: endif
10: endfor
11: if E0 = ∅ then
12: Return vi∗ ; // (i.e., p∗ = vi∗ )
13: else
14: for every i ∈ SE0 do
15: Sort d(vi , vj ), j ∈ T in a nonincreasing order;
16: endfor
17: for every k ∈ E0 do
First of all, we recall and introduce some useful definitions and notations. For every 1 ≤ k ≤ m, a local center on ek, denoted p∗k, is a point on ek such that the value of φ(p, T) is minimized, i.e., φ(p∗k, T) = min_{p∈Pk} φ(p, T). An optimal local center, p∗k∗, is the one that minimizes the value of φ(p∗k, T), i.e., φ(p∗k∗, T) = min_{k∈E0} φ(p∗k, T). Note that k∗ is the index of the candidate edge in E0 that has an optimal local center. Moreover, for every i ∈ SE0, we let ji∗ be the index of the terminal that maximizes the value of d(vi, vj), i.e., d(vi, vji∗) = max_{j∈T} d(vi, vj).
In the first stage (lines 1–3 in AlgA1CP), it takes O(m) time to obtain SE
by using depth first search (DFS) to traverse G. When the distance matrix D is
known, it takes O(|T |) time to compute φ(vi , T ) (i.e., find the maximum element
on the i-th row of D) and record the column index ji∗ , for every i ∈ SE . Then,
it takes O(|SE |) time to find a V1C, i.e., the vertex index i∗ such that the value
of φ(vi , T ) is minimized. So, the time cost of the first stage is O(m + |SE ||T |).
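The first stage can be pictured with a small sketch (our own illustration, not the paper's code; D is the distance matrix as a list of rows, S the candidate vertex indices and T the terminal indices): φ(vi, T) is the row maximum over the terminal columns, and the V1C index i∗ minimizes it.

```python
def vertex_1_center(D, S, T):
    """Return (i_star, phi, j_star): phi[i] = max_{j in T} D[i][j],
    j_star[i] the maximizing terminal, i_star = argmin_{i in S} phi[i]."""
    phi, j_star = {}, {}
    for i in S:
        j_star[i] = max(T, key=lambda j: D[i][j])  # farthest terminal from v_i
        phi[i] = D[i][j_star[i]]
    i_star = min(S, key=lambda i: phi[i])
    return i_star, phi, j_star

# Toy symmetric distance matrix on three vertices.
D = [[0, 4, 2],
     [4, 0, 3],
     [2, 3, 0]]
i_star, phi, j_star = vertex_1_center(D, S=[0, 1, 2], T=[0, 1, 2])
```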
In the second stage (lines 4–10 in AlgA1CP), the major task is to record E0 and SE0. So, we need to determine, for every k ∈ E, whether or not ek has a local saddle point. Lemma 6 implies that the following practice can be used to determine whether or not D(Sk, T) has a saddle point. In detail, we determine that D(Sk, T) has a saddle point if d(vτ(e1k), vj∗_{τ(e1k)}) ≤ d(vτ(e2k), vj∗_{τ(e1k)}) or d(vτ(e2k), vj∗_{τ(e2k)}) ≤ d(vτ(e1k), vj∗_{τ(e2k)}), and that it has no saddle point otherwise. Such a decision takes O(1) time. Accordingly, the update of E0 and SE0 also takes O(1) time. So, the time cost of the second stage is O(|E|).
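The O(1) endpoint test can be sketched in code (our own reading of the text; D maps a vertex to its row of distances to terminals, and j_star[i] is the precomputed farthest-terminal index for vertex i):

```python
def edge_has_saddle_point(D, u, v, j_star):
    """Constant-time test: the 2 x |T| submatrix D({u, v}, T) has a
    saddle point if one endpoint's farthest-terminal distance is at most
    the other endpoint's distance to that same terminal."""
    return (D[u][j_star[u]] <= D[v][j_star[u]]
            or D[v][j_star[v]] <= D[u][j_star[v]])

# Endpoint 0's farthest terminal is 0, and 5 <= 6, so a saddle point exists.
has_sp = edge_has_saddle_point({0: [5, 2], 1: [6, 1]}, 0, 1, {0: 0, 1: 0})
# Here neither domination holds, so this edge has no saddle point.
no_sp = edge_has_saddle_point({0: [5, 1], 1: [2, 6]}, 0, 1, {0: 0, 1: 1})
```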
In the third stage (lines 11–24 in AlgA1CP), we find a local center p∗k on ek
for every k ∈ E0 and then determine the optimal local center p∗k∗ . By comparing
φ(vi∗ , T ) and φ(p∗k∗ , T ), we obtain an A1C, p∗ . The main body of this stage is
to compute p∗k∗ . First, it takes O(|SE0 ||T | log |T |) time to sort d(vi , vj ), j ∈ T for
all i ∈ SE0 . Next, it takes O(|T |) time to apply Kariv and Hakimi’s subroutine
to the candidate edge ek to get p∗k , for every k ∈ E0 . So, it takes O(|E0 ||T |) time
to compute all p∗k , k ∈ E0 and then determine p∗k∗ . So, the time cost of the third
stage is O(|E0 ||T | + |SE0 ||T | log |T |).
The above three stages form our algorithm AlgA1CP for A1CP. Therefore,
the total time cost of AlgA1CP is
Since |SE | ≤ n and |SE0 | ≤ min{2|E0 |, n} ≤ n, we conclude that the total time
cost of AlgA1CP is at most
Now, we consider the special case of A1CP, where all the candidate edges
have a local saddle point, i.e., E0 = ∅. The time cost of applying AlgA1CP to this
special case is obtained by substituting |SE0 | = |E0 | = 0 and |SE | ≤ n into Eq.
(14), see Corollary 4.
4.3 Application
The classic A1CP is the special case of A1CP studied in this paper, where
T = {1, 2, . . . , n} and E = {1, 2, . . . , m}. Therefore, when the distance matrix is
known, we substitute p = n into Theorem 3 to obtain the time cost of applying AlgA1CP to the classic A1CP; see Theorem 4. Moreover, when the distance matrix is unknown, we additionally use Pettie's O(mn + n² log log n)-time algorithm [9] to get the matrix.
Next, we consider the special case of the classic A1CP where all the edges
have a local saddle point. We substitute p = n into Corollary 4 to obtain the
time cost of applying AlgA1CP to this special case. Similarly, when the distance
matrix is unknown, we use Pettie’s algorithm to get it.
5 Conclusions
This paper studies the (generalized) A1CP in an undirected connected graph.
We examine an important property that if the distance matrix has a saddle point
then all the candidate edges can be sifted and only their endpoints remain as
the candidate vertices of A1C (i.e., a V1C is just an A1C), and further conclude
that every candidate edge having a local saddle point can be sifted. Based on
this property, we combine the tool of sifting edges with Kariv and Hakimi’s
subroutine for finding a local center on an edge to design a faster exact algorithm for the classic A1CP, which reduces the O(mn + n² log n) time of Kariv and Hakimi's algorithm to O(m + m*n + n² log n) time, where m* is the number of edges that have no local saddle point, when the distance matrix is known, as well as its O(mn + n³) time to O(mn + n² log n) time when the matrix is unknown.
In this paper, we determine separately, for every candidate edge, whether or not it can be sifted (i.e., whether it has a local saddle point). This is a straightforward approach. It remains a future research topic to find a faster method that sifts candidate edges to a larger extent.
References
1. Ding, W., Qiu, K.: Algorithms for the minimum diameter terminal Steiner tree problem. J. Comb. Optim. 28(4), 837–853 (2014)
2. Eiselt, H.A., Marianov, V.: Foundations of Location Analysis. Springer, Heidelberg
(2011)
3. Fredman, M.L., Tarjan, R.E.: Fibonacci heaps and their uses in improved network
optimization algorithms. J. ACM 34(3), 596–615 (1987)
4. Hakimi, S.L.: Optimum locations of switching centers and the absolute centers and
medians of a graph. Oper. Res. 12(3), 450–459 (1964)
5. Hakimi, S.L., Schmeichel, E.F., Pierce, J.G.: On p-centers in networks. Transport.
Sci. 12(1), 1–15 (1978)
6. Hassin, R., Tamir, A.: On the minimum diameter spanning tree problem. Inf. Process. Lett. 53(2), 109–111 (1995)
7. Karger, D.R., Koller, D., Phillips, S.J.: Finding the hidden path: time bounds for
all-pairs shortest paths. SIAM J. Comput. 22(6), 1199–1217 (1993)
8. Kariv, O., Hakimi, S.L.: An algorithmic approach to network location problems.
I: the p-centers. SIAM J. Appl. Math. 37(3), 513–538 (1979)
9. Pettie, S.: A new approach to all-pairs shortest paths on real-weighted graphs.
Theor. Comp. Sci. 312(1), 47–74 (2004)
10. Tansel, B.C., Francis, R.L., Lowe, T.J.: Location on networks: a survey. Part I: the
p-center and p-median problems. Manag. Sci. 29(4), 482–497 (1983)
Solving an MINLP with Chance
Constraint Using a Zhang’s Copula
Family
Adriano Delfino(B)
1 Introduction
In recent years, the stochastic programming community has witnessed great development in optimization methods for dealing with stochastic programs with mixed-integer variables [2]. However, there are only a few works on chance-constrained programming with mixed-integer variables [1,3,12,13]. In this work, the problems of interest are nonsmooth convex mixed-integer nonlinear programs with chance constraints (CCMINLP). This class of problems can, for instance, be solved by employing the outer-approximation (OA) technique. In general, OA algorithms require solving fewer MILP subproblems than extended cutting-plane algorithms [14]; therefore the former class of methods is preferable to the latter. This justifies why we have chosen the former class of methods to deal with problems of the type
min_{(x,y)∈X×Y} f0(x, y)
s.t. fi(x, y) ≤ 0, i = 1, . . . , mf − 1    (1)
     P[g(x, y) ≥ ξ] ≥ p,
Due to the probability function P[g(x, y) ≥ ξ], evaluating the constraint (2) and computing its subgradient are difficult tasks: for instance, if ξ follows a multivariate normal distribution, computing a subgradient of P[g(x, y) ≥ ξ] requires numerically solving m integrals of dimension m − 1. If the dimension m of ξ is too large, then creating a cut for the function log(p) − log(P[g(x, y) ≥ ξ]) is computationally challenging. In this situation, it makes sense to replace the probability measure by a simpler function.
In this manner, this work proposes to approximate the hard chance constraint P[g(x, y) ≥ ξ] ≥ p by a copula C:

C(Fξ1(g1(x, y)), Fξ2(g2(x, y)), . . . , Fξm(gm(x, y))) ≥ p.    (4)
In addition to the difficulties present in MINLP models, we recall that the constraint function (4) can be nondifferentiable. Our main contribution in this paper is to prove that Zhang's copula family is log-concave; with this result we obtain an approximation of Problem (3) that can be solved in reasonable time.
This work is organized as follows: in Sect. 2 we briefly review some basics about copulae and prove that Zhang's family satisfies the condition required by the outer-approximation algorithm developed in [4]. In Sect. 3, a review of outer approximation is presented. In Sect. 4, we describe a model problem coming from energy power management; in Sect. 5 we present some preliminary numerical results for this problem; and finally, in Sect. 6, we give a short conclusion.
fm(x, y) = log(p) − log C(Fξ1(g1(x, y)), Fξ2(g2(x, y)), . . . , Fξm(gm(x, y))),    (6)
There is a family of copulae with this property, introduced by Zhang [15]. The family is given by

C(u1, . . . , um) = ∏_{j=1}^{r} min_{1≤i≤m} (ui^{a_{j,i}}),    (8)

where a_{j,i} ≥ 0 and ∑_{j=1}^{r} a_{j,i} = 1 for all i = 1, . . . , m. Different choices of the parameters a_{j,i} give different copulae, all of them nonsmooth functions, but with a subgradient easily computed via the chain rule. We prove that this family of copulae is log-concave.
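For illustration, the family (8) is straightforward to evaluate directly (our own sketch with hypothetical parameters; each column of the parameter matrix must sum to one over j, as required above):

```python
def zhang_copula(u, a):
    """Evaluate C(u_1,...,u_m) = prod_{j=1}^r min_{1<=i<=m} u_i ** a[j][i],
    where a[j][i] >= 0 and sum_j a[j][i] == 1 for every i."""
    m, r = len(u), len(a)
    assert all(abs(sum(a[j][i] for j in range(r)) - 1.0) < 1e-9
               for i in range(m))
    value = 1.0
    for row in a:
        value *= min(u[i] ** row[i] for i in range(m))
    return value

# r = 1 with unit exponents recovers the comonotone copula min(u1, u2);
# an identity-like parameter matrix recovers the product (independence) copula.
c_min = zhang_copula([0.3, 0.8], [[1.0, 1.0]])
c_prod = zhang_copula([0.3, 0.8], [[1.0, 0.0], [0.0, 1.0]])
```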
Theorem 2. Let ξ ∈ IR^m be a random vector with all marginals Fξi, i = 1, . . . , m, being 0-concave functions. Suppose that g : IR^{nx} × IR^{ny} → IR^m is a concave function. Consider Zhang's copula C given in (8) for any choice of parameters a_{j,i} satisfying a_{j,i} ≥ 0 and ∑_{j=1}^{r} a_{j,i} = 1 for all i = 1, . . . , m. Then

C(Fξ1(g1(x, y)), Fξ2(g2(x, y)), . . . , Fξm(gm(x, y)))

is α-concave for α ≤ 0.
Proof. Given a pair (x, y) ∈ IR^{nx} × IR^{ny}, we set z = (x, y) to simplify the notation. Let z1 = (x1, y1), z2 = (x2, y2) and z = λz1 + (1 − λ)z2 with λ ∈ [0, 1]. As the function g is concave, for all i = 1, . . . , m,

log(Fξi(gi(λz1 + (1 − λ)z2))) ≥ log(Fξi(λgi(z1) + (1 − λ)gi(z2))).    (11)

log(Fξi(λgi(z1) + (1 − λ)gi(z2))) ≥ λ log(Fξi(gi(z1))) + (1 − λ) log(Fξi(gi(z2))).    (12)

By gathering inequalities (11) and (12) we have

log(Fξi(gi(z))) ≥ log(λFξi(gi(z1)) + (1 − λ)Fξi(gi(z2)))
             ≥ λ log(Fξi(gi(z1))) + (1 − λ) log(Fξi(gi(z2))).    (13)
where aj,i ≥ 0.
To simplify the notation, we write Fξ1(g1(z)), . . . , Fξm(gm(z)) as Fξ(g(z)). So,

log C(Fξ(g(z))) = ∑_{j=1}^{r} log( min_{1≤i≤m} [Fξi(gi(z))]^{a_{j,i}} ).
Solving an MINLP with Chance Constraint Using a Zhang’s Copula Family 481
As the log function is increasing, log min u = min log u, and therefore

log C(Fξ(g(z))) = ∑_{j=1}^{r} min_{1≤i≤m} [a_{j,i} log(Fξi(gi(z)))].
log C(Fξ (g(λz1 + (1 − λ)z2 ))) ≥ λ log C(Fξ (g(z1 ))) + (1 − λ) log C(Fξ (g(z2 ))),
i.e., log C(Fξ1(g1(z)), . . . , Fξm(gm(z))) is a concave function. In other words, the copula C is α-concave for α ≤ 0.
3 Outer Approximation
max_{x,y,z} ∑_{t=1}^{T} πt zt
s.t. P[xt + ξt ≥ dt ∀t = 1, . . . , T] ≥ p
     yt v ≤ xt + zt ≤ yt v̄              ∀t = 1, . . . , T
     xt, zt ≥ 0, yt ∈ {0, 1}            ∀t = 1, . . . , T    (17)
     l ≤ l0 + tω − (1/χ) ∑_{τ=1}^{t} (xτ + zτ) ≤ l̄          ∀t = 1, . . . , T
     l0 + Tω − (1/χ) ∑_{τ=1}^{T} (xτ + zτ) ≥ l∗,
where zt is the residual energy produced by the hydro power plant in time interval t that is sold to the market; πt is the energy price at time t; xt is the amount of energy generated by the hydro power plant to supply the remaining demand of the local community at time t; dt is the local community demand at time t, which is assumed to be known (due to the short planning horizon of one day); ξt is the random energy generated by the wind farm at time t; p ∈ (0, 1] is the given parameter ensuring the confidence level for demand satisfaction; v and v̄ are the lower and upper operating limits of the hydro power plant turbine, respectively; yt is the binary variable modeling whether the turbine is turned on or off; l0 is the initial water level of the hydro power plant reservoir at the beginning of the horizon; l and l̄ are the lower and upper water levels, respectively, allowed in the hydro power plant reservoir at any time; ω denotes the constant amount of water inflow to the hydro power plant reservoir at any time t; χ represents a conversion factor between the released water and the energy produced by the turbine: one unit of water released corresponds to χ units of energy generated; l∗ is the minimum water level in the hydro power plant reservoir in the last period T of the time horizon; and P is the probability measure associated with the random vector ξ. As in [1], we assume that the wind power generation follows a multivariate normal distribution with mean vector μ and a positive definite correlation matrix Σ. This assumption ensures that problem (3) is convex. Theorem 1 guarantees that we can replace the probability measure P by a Zhang's copula C and, finally, Theorem 2 confirms that problems of the form (7) are also convex; consequently, we can use outer approximation to solve them.
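The deterministic reservoir constraints of model (17) are easy to audit in code (our own sketch with made-up data; variable names follow the model, with l_lo and l_hi standing for l and l̄):

```python
def reservoir_feasible(x, z, l0, omega, chi, l_lo, l_hi, l_star):
    """Check that l0 + t*omega - (1/chi) * sum_{tau<=t} (x_tau + z_tau)
    stays in [l_lo, l_hi] for every period t and ends at or above l_star."""
    released = 0.0
    level = l0
    for t in range(1, len(x) + 1):
        released += x[t - 1] + z[t - 1]
        level = l0 + t * omega - released / chi
        if not (l_lo <= level <= l_hi):
            return False
    return level >= l_star

# Hypothetical 4-period schedule: inflow outpaces release, level rises slowly.
ok = reservoir_feasible(x=[1.0] * 4, z=[0.5] * 4, l0=10.0, omega=1.0,
                        chi=2.0, l_lo=5.0, l_hi=20.0, l_star=9.0)
```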
5 Numerical Experiments
Numerical experiments were performed on a computer with an Intel(R) Core(TM) i7-5500U CPU @ 2.40 GHz and 8 GB RAM, under Windows 10 (64 bits); our algorithm from [4] was coded in or called from Matlab version 2017a.
We solved the power system management problem for T = 24 (one day) using Zhang's copula to approximate the probability function, and then checked the probability constraint of (17). Without this approximation, for dimension T = 24 it was not possible to solve Problem (17) within one hour of CPU time on the considered computer and software.
One of the difficulties in using copulae is finding coefficients that model the probability constraint accurately. The parameters of Zhang's copula depend on the size of the problem: if the random vector ξ has dimension T, then the number of parameters is 1 + rT, namely r and the a_{j,i} ≥ 0 with ∑_{j=1}^{r} a_{j,i} = 1 for all i = 1, . . . , T. In this work we do not yet focus on the best choice of the copula parameters. For now, we simply set r = 8, and the coefficients a_{j,i} were generated following a uniform probability distribution with low sparseness. As shown below, this simple choice gives satisfactory results.
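The coefficient choice described above can be sketched as follows (our own code; draws are uniform and each column is normalized so that ∑_j a_{j,i} = 1, as Zhang's family requires):

```python
import random

def random_copula_coefficients(r, T, seed=0):
    """Draw nonnegative a[j][i] uniformly, then normalize each column i
    so that sum_{j=1}^r a[j][i] == 1."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(T)] for _ in range(r)]
    for i in range(T):
        col = sum(a[j][i] for j in range(r))
        for j in range(r):
            a[j][i] /= col
    return a

a = random_copula_coefficients(r=8, T=24)  # one column per time period
```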
These nonsmooth convex mixed-integer programs are solved with the following solvers, coded in or called from Matlab version 2017a:
We solved 21 problems, each of them based on a day of the week and using the values 0.8, 0.9 and 0.95 for p.
Table 1 shows the performance of those algorithms on all problems. Although some problems were not solved within the required time (one hour) by OA, the regularized variants solved all problems with satisfactory results. It is important to note that if we use the probability function instead of the copula, then no problem is solved by the algorithms within the time limit of one hour.
486 A. Delfino
Table 1. Number of MILPs and CPU time for p ∈ {0.8, 0.9, 0.95}
6 Conclusion
In this work we showed that using copulae is an excellent alternative for solving problems involving probabilistic constraints. In future work we will continue in this direction and improve our numerical results.
References
1. Arnold, T., Henrion, R., Möller, A., Vigerske, S.: A mixed-integer stochastic non-
linear optimization problem with joint probabilistic constraints. Pac. J. Optim. 10,
5–25 (2014)
2. Birge, J., Louveaux, F.: Introduction to Stochastic Programming. Springer, New
York (2011)
3. de Oliveira, W.: Regularized optimization methods for convex MINLP problems.
TOP 24, 665–692 (2016)
Solving an MINLP with Chance Constraint Using a Zhang’s Copula Family 487
Sai Ji1, Dachuan Xu1, Min Li2, Yishui Wang3(B), and Dongmei Zhang4
1 College of Applied Sciences, Beijing University of Technology, Beijing 100124, People's Republic of China
jisai@emails.bjut.edu.cn, xudc@bjut.edu.cn
2 School of Mathematics and Statistics, Shandong Normal University, Jinan, People's Republic of China
liminEmily@sdnu.edu.cn
3 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, People's Republic of China
ys.wang1@siat.ac.cn
4 School of Computer Science and Technology, Shandong Jianzhu University, Jinan 250101, People's Republic of China
zhangdongmei@sdjzu.edu.cn
1 Introduction
Maximizing a submodular function subject to an independence constraint is related to many machine learning and data science applications such as information gathering [13], document summarization [14], image segmentation [12], and PAC learning [16]. There are many studies on variants of submodular maximization [2,4,8,11].
However, many subset selection problems in data science are not always submodular [22]. In this paper, we study the constrained maximization of an objective that may be decomposed into the sum of a submodular and a supermodular function. That is, we consider the following problem:
2 Preliminaries
Given a set V = {v1, v2, . . . , vn}, denote by fv(X) = f(X ∪ {v}) − f(X) the marginal gain of adding the item v to the set X ⊂ V. A function f is monotone if f(X) ≤ f(Y) for any X ⊆ Y. Without loss of generality, we assume that monotone functions are normalized, i.e., f(∅) = 0.
Definition 1 (Submodular curvature [7]). The curvature Kf of a submodular function f is defined as

Kf = 1 − min_{v∈V} fv(V\{v}) / f({v}).

Definition 2 (Supermodular curvature [7]). The curvature Kg of a supermodular function g is defined as

Kg = 1 − min_{v∈V} g({v}) / gv(V\{v}).

From Definitions 1 and 2, we have 0 ≤ Kf, Kg ≤ 1. In this paper, we study the case that 0 < Kf < 1 and 0 < Kg < 1.
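For intuition, both curvatures of Definitions 1 and 2 can be computed by brute force for small ground sets (our own sketch; the functions f and g are toy examples, not from the paper):

```python
def submodular_curvature(f, V):
    """Kf = 1 - min_{v in V} fv(V \ {v}) / f({v})."""
    return 1 - min((f(V) - f(V - {v})) / f(frozenset({v})) for v in V)

def supermodular_curvature(g, V):
    """Kg = 1 - min_{v in V} g({v}) / gv(V \ {v})."""
    return 1 - min(g(frozenset({v})) / (g(V) - g(V - {v})) for v in V)

V = frozenset({0, 1})
f = lambda X: 2 * len(X) ** 0.5   # monotone submodular in |X|, f(empty) = 0
g = lambda X: len(X) ** 2         # monotone supermodular in |X|, g(empty) = 0
Kf = submodular_curvature(f, V)   # 1 - (2*sqrt(2) - 2)/2 = 2 - sqrt(2)
Kg = supermodular_curvature(g, V) # 1 - 1/3 = 2/3
```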
3 Algorithms
In this section, we provide the SG algorithm, the SSG algorithm and the RG algorithm for the monotone BP maximization problem subject to a cardinality constraint, the monotone BP maximization problem subject to a p-system constraint, and the non-monotone BP maximization problem subject to a cardinality constraint, respectively. All three algorithms are iterative processes. In each iteration, the SG algorithm samples a candidate set of size (n/k) ln(1/ε) uniformly at random from V minus the current solution, and then chooses from this candidate set the item with the maximum marginal gain. The SSG algorithm selects one item whose marginal gain is at least ξ ∈ (0, 1] times the largest marginal gain value, where ξ ∈ (0, 1] comes from a distribution D. The RG algorithm chooses an item uniformly at random from a set of size k with the largest summation of individual marginal gain values.
The detailed algorithms are shown as follows.
3.1 SG Algorithm
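The algorithm listing did not survive extraction; the sampling step described above can be sketched as follows (our own code, not the paper's Algorithm 1; h is the set function, k the cardinality bound, and ε a hypothetical accuracy parameter):

```python
import math
import random

def stochastic_greedy(V, h, k, eps=0.1, seed=0):
    """In each of k rounds, sample about (n/k) * ln(1/eps) items from
    V minus the current solution and add the sampled item with the
    largest marginal gain h(S + v) - h(S)."""
    rng = random.Random(seed)
    n = len(V)
    t = min(n, math.ceil(n / k * math.log(1 / eps)))
    S = []
    for _ in range(k):
        pool = [v for v in V if v not in S]
        R = rng.sample(pool, min(t, len(pool)))
        S.append(max(R, key=lambda v: h(S + [v]) - h(S)))
    return S

# Toy modular objective h(S) = sum of item values: the three largest win.
S = stochastic_greedy(list(range(10)), lambda X: sum(X), k=3, eps=0.01)
```

With a small ε the sample covers the whole pool and the method reduces to the standard greedy; larger ε trades approximation quality for fewer function evaluations.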
Similar to Lemma 2 of [15], we have the following lemma, which estimates a lower bound on the expected gain at the (i + 1)-th step and reveals the relationship between the current solution S and the optimal solution S∗.

Lemma 1. Let t = (n/k) ln(1/ε). The expected gain of Algorithm 1 at the (i + 1)-th step is at least

((1 − ε)/|S∗\Si|) ∑_{s∈S∗\Si} hs(Si),
Stochastic Greedy Algorithm Is Still Good: Maximizing 491
(k − |S∗ ∩ Si|)/((1 − Kg)(1 − ε)) + ai+1,

where S∗ is the optimal solution, {si} = Si\Si−1, ai = E[hsi(Si−1)], Kf is the curvature of the submodular function f, and Kg is the curvature of the supermodular function g.
Combining Lemma C.1 and Lemma D.2 of [1] with Lemmas 1–2, we obtain the following theorem, which estimates the approximation ratio of the SG algorithm.
Theorem 1. Let $t = \frac{n}{k}\ln\frac{1}{\epsilon}$. For the monotone BP maximization problem with a cardinality constraint, Algorithm 1 finds a subset $S\subseteq V$ with $|S| = k$ and
$$E[h(S)] \ge \frac{1}{K_f}\Bigl(1 - e^{-(1-K_g)K_f} - \epsilon\Bigr)h(S^*),$$
where $K_f$ and $K_g$ are the curvature of the submodular function f and the curvature of the supermodular function g, respectively.
$$E[h(S)] \ge \frac{(1-K_g)^2\,\mu}{p + (1-K_g)^2\,\mu}\,h(S^*),$$
where $S^*$, $K_f$, $K_g$, and $\mu$ are the optimal solution, the curvature of the submodular function f, the curvature of the supermodular function g, and the expectation of $\xi\sim D$, respectively.
3.3 RG Algorithm
Lemma 4. Let $h = f + g$, where f is a submodular function and g is a supermodular function. Denote by $A(p)$ a random subset of A in which each element appears with probability at most p (not necessarily independently). Then we have
$$E[h(A(p))] \ge \bigl[1 - (1-K_g)p\bigr]\,h(\emptyset).$$
From Lemma 4, we get the following lemma, which is crucial to the analysis of the RG algorithm.
Proof. Denote by $S_i$ the subset obtained by Algorithm 3 after i steps. For each $1\le i\le k$, consider a set $M_i$ containing the elements of $S^*\cup S_{i-1}$ plus enough dummy elements to make the size of $M_i$ exactly k. Recalling Algorithm 3 and Lemma C.1 of [1], we have
$$E[h_{s_i}(S_{i-1})] = k^{-1}\sum_{s\in M_i} h_s(S_{i-1}) \ge k^{-1}\sum_{s\in S^*\setminus S_{i-1}} h_s(S_{i-1}).$$
By (1), we have
$$E[h(S)] = E[h(S_k)] \ge \Bigl(1-\frac{1-K_g}{k}\Bigr)^{k} h(\emptyset) + \frac{1-K_g}{k}\Bigl[\Bigl(1-\frac{1-K_g}{k}\Bigr)^{k-1} + \Bigl(1-\frac{1-K_g}{k}\Bigr)^{k-2}\Bigl(1-\frac{1}{k}\Bigr)$$
$$+ \Bigl(1-\frac{1-K_g}{k}\Bigr)^{k-3}\Bigl(1-\frac{1}{k}\Bigr)^{2} + \cdots + \Bigl(1-\frac{1}{k}\Bigr)^{k-1}\Bigr] h(S^*)$$
$$\ge k\,\frac{1-K_g}{k}\Bigl(1-\frac{1}{k}\Bigr)^{k-1} h(S^*) \ge (1-K_g)\,e^{-1}\,h(S^*).$$
The theorem is proved.
4 Numerical Experiments
In this section, we present numerical experiments comparing the Stochastic-Greedy algorithm (Algorithm 1) and the Stochastic-Standard-Greedy algorithm (see [1]) on BP maximization problems subject to a cardinality constraint. We use the same model as Bai et al. [1]. In this model, the ground set V is partitioned into $V_1 = \{v_1,\dots,v_k\}$ and $V_2 = V\setminus V_1$. The submodular function is defined as
$$f(S) := \lambda\Bigl(\frac{k-\alpha|S\cap V_2|}{k}\sum_{i:\,v_i\in S} w_i + \frac{|S\cap V_2|}{k}\Bigr),$$
[Figure: objective value achieved by the stochastic greedy and the standard greedy algorithms; panels correspond to different parameter settings, among them λ = 0.7 and λ = 0.9.]
is very big. Moreover, it is interesting that, for fixed λ and $K_g$, a lower $K_f$ yields a smaller gap between the values of the two algorithms, and that for big $K_f$ a larger λ (meaning that the function h is closer to being submodular) yields a smaller gap between the values of the two algorithms.
| Size | λ | Min time (s), standard greedy | Min time (s), stochastic greedy | Max time (s), standard greedy | Max time (s), stochastic greedy | Avg time (s), standard greedy | Avg time (s), stochastic greedy |
|---|---|---|---|---|---|---|---|
| n = 300, k = 150 | 0.1 | 8.2265 | 0.0923 | 8.7303 | 0.0969 | 8.4784 | 0.0943 |
| | 0.3 | 8.2101 | 0.0921 | 8.7409 | 0.0988 | 8.4771 | 0.0943 |
| | 0.5 | 8.2262 | 0.0923 | 8.6891 | 0.0981 | 8.4800 | 0.0943 |
| | 0.7 | 8.2150 | 0.0921 | 8.6225 | 0.0969 | 8.4808 | 0.0943 |
| | 0.9 | 8.2512 | 0.0922 | 8.7408 | 0.1136 | 8.4816 | 0.0944 |
5 Conclusion
In this paper, we consider the monotone BP maximization problem subject to a cardinality constraint and a p-system constraint, respectively. Then, we consider the non-monotone BP maximization problem subject to a cardinality constraint.
496 S. Ji et al.
For each problem, we give a stochastic algorithm. The theoretical analysis indicates that the stochastic algorithms work well on the BP maximization problems. Numerical experiments show that the algorithms are effective. There are two possible future research directions. One is to design a better stochastic algorithm for the BP maximization problem subject to a cardinality constraint or a p-system constraint. The other is to study other variants of the constrained submodular maximization problem.
References
1. Bai, W., Bilmes, J.A.: Greed is still good: maximizing monotone submodular+
supermodular functions (2018). arXiv preprint arXiv:1801.07413
2. Bian, A., Levy, K., Krause, A., Buhmann, J.M.: Continuous DR-submodular maximization: structure and algorithms. In: Advances in Neural Information Processing Systems, pp. 486–496 (2017)
3. Bian, A.A., Buhmann, J.M., Krause, A., Tschiatschek, S.: Guarantees for
greedy maximization of non-submodular functions with applications (2017). arXiv
preprint arXiv:1703.02100
4. Bogunovic, I., Zhao, J., Cevher, V.: Robust maximization of non-submodular
objectives (2018). arXiv preprint arXiv:1802.07073
5. Buchbinder, N., Feldman, M., Naor, J.S., Schwartz, R.: Submodular maximization
with cardinality constraints. In: Proceedings of the Twenty-Fifth Annual ACM-
SIAM Symposium on Discrete Algorithms, pp. 1433–1452 (2014)
6. Chekuri, C., Vondrák, J., Zenklusen, R.: Submodular function maximization via
the multilinear relaxation and contention resolution schemes. SIAM J. Comput.
43(6), 1831–1879 (2014)
7. Conforti, M., Cornuéjols, G.: Submodular set functions, matroids and the greedy algorithm: tight worst-case bounds and some generalizations of the Rado–Edmonds theorem. Discret. Appl. Math. 7(3), 251–274 (1984)
8. Epasto, A., Lattanzi, S., Vassilvitskii, S., Zadimoghaddam, M.: Submodular opti-
mization over sliding windows. In: Proceedings of the 26th International Conference
on World Wide Web, pp. 421–430 (2017)
9. Fisher, M.L., Nemhauser, G.L., Wolsey, L.A.: An analysis of approximations for
maximizing submodular set functions - II. Polyhedral Combinatorics, pp. 73–87
(1978)
10. Iwata, S., Orlin, J.B.: A simple combinatorial algorithm for submodular function
minimization. In: Proceedings of the Twentieth Annual ACM-SIAM Symposium
on Discrete Algorithms, pp. 1230–1237 (2009)
11. Kawase, Y., Sumita, H., Fukunaga, T.: Submodular maximization with uncertain
knapsack capacity. In: Latin American Symposium on Theoretical Informatics, pp.
653–668 (2018)
12. Kohli, P., Kumar, M.P., Torr, P.H.: P3 & beyond: move making algorithms for
solving higher order functions. IEEE Trans. Pattern Anal. Mach. Intell. 31(9),
1645–1656 (2009)
13. Krause, A., Guestrin, C., Gupta, A., Kleinberg, J.: Near-optimal sensor placements:
maximizing information while minimizing communication cost. In: Proceedings of
the 5th International Conference on Information Processing in Sensor Networks,
pp. 2–10 (2006)
Stochastic Greedy Algorithm Is Still Good: Maximizing 497
14. Lin, H., Bilmes, J.: A class of submodular functions for document summarization,
pp. 510–520 (2011)
15. Mirzasoleiman, B., Badanidiyuru, A., Karbasi, A., Vondrák, J., Krause, A.: Lazier
than lazy greedy. In: AAAI, pp. 1812–1818 (2015)
16. Narasimhan, M., Bilmes, J.: PAC-learning bounded tree-width graphical models.
In: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence,
pp. 410–417 (2004)
17. Nemhauser, G.L., Wolsey, L.A., Fisher, M.L.: An analysis of approximations for
maximizing submodular set functions - I. Math. Prog. 14(1), 265–294 (1978)
18. Niazadeh, R., Roughgarden, T., Wang, J.R.: Optimal algorithms for continuous non-monotone submodular and DR-submodular maximization (2018). arXiv preprint arXiv:1805.09480
19. Qian, C., Yu, Y., Tang, K.: Approximation guarantees of stochastic greedy algo-
rithms for subset selection. In: Proceedings of the Twenty-Seventh International
Joint Conference on Artificial Intelligence IJCAI, pp. 1478–1484 (2018)
20. Schoenebeck, G., Tao, B.: Beyond worst-case (in) approximability of nonsubmod-
ular influence maximization. In: International Conference on Web and Internet
Economics, pp. 368–382 (2017)
21. Sviridenko, M.: A note on maximizing a submodular set function subject to a
knapsack constraint. Oper. Res. Lett. 32(1), 41–43 (2004)
22. Wei, K., Iyer, R., Bilmes, J.: Submodularity in data subset selection and active
learning. In: International Conference on Machine Learning, pp. 1954–1963 (2015)
Towards Multi-tree Methods
for Large-Scale Global Optimization
1 Introduction
We consider block-separable (or quasi-separable) MINLP problems of the form
$$\min\; c^T x \quad \text{s.t.}\quad x \in P,\; x_k \in X_k,\; k \in K \tag{1}$$
with
$$P := \{x \in \mathbb{R}^n : a_j^T x \le b_j,\; j \in J\},\qquad X_k := G_k \cap L_k \cap Y_k, \tag{2}$$
where the vector of variables $x \in \mathbb{R}^n$ is partitioned into $|K|$ blocks such that $n = \sum_{k\in K} n_k$, where $n_k$ is the dimension of the k-th block, and $x_k \in \mathbb{R}^{n_k}$ denotes the variables of the k-th block. The vectors $\underline{x}, \overline{x} \in \mathbb{R}^n$ denote lower and upper bounds on the variables.
The linear constraints defining the polytope P are called global. The constraints defining the subsets $X_k$ are called local. The set $X_k$ is defined by nonlinear local constraints, denoted by $G_k$, by linear local constraints, denoted by $L_k$, and by integrality constraints, denoted by $Y_k$. In this paper, all the nonlinear local constraint functions $g_{kj} : \mathbb{R}^{n_k} \to \mathbb{R}$ are assumed to be bounded and continuously differentiable within the set $[\underline{x}_k, \overline{x}_k]$. The linear global constraints of P are defined by $a_j \in \mathbb{R}^n$, $b_j \in \mathbb{R}$, $j \in J$, and the linear local constraints $L_k$ are defined by $a_{kj} \in \mathbb{R}^{n_k}$, $b_{kj} \in \mathbb{R}$, $j \in J_k$. The set $Y_k$ defines the integrality conditions on the variables $x_{ki}$, $i \in I_k$, where $I_k$ is an index set. The linear objective function is defined by $c^T x := \sum_{k\in K} c_k^T x_k$, $c_k \in \mathbb{R}^{n_k}$, and the matrix $A_k \in \mathbb{R}^{m\times n_k}$, with $m = |J| + |J_k|$, is defined by the columns with the indices of the k-th block. Furthermore, we define the sets
$$G := \prod_{k\in K} G_k,\qquad Y := \prod_{k\in K} Y_k,\qquad X := \prod_{k\in K} X_k. \tag{4}$$
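The block structure above can be illustrated with a small helper (hypothetical names, not part of the paper's code): the flat vector x is split into blocks $x_k$ of dimensions $n_k$, and the objective is evaluated block-wise as $c^T x = \sum_k c_k^T x_k$.

```python
def split_blocks(x, sizes):
    """Partition the flat variable vector x into |K| blocks of dimensions n_k."""
    blocks, i = [], 0
    for nk in sizes:
        blocks.append(x[i:i + nk])
        i += nk
    return blocks

def objective(c, x, sizes):
    # c^T x = sum_k c_k^T x_k, evaluated block by block
    return sum(sum(ck_i * xk_i for ck_i, xk_i in zip(ck, xk))
               for ck, xk in zip(split_blocks(c, sizes), split_blocks(x, sizes)))
```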
2 Polyhedral Outer-Approximation
$$\min\; c^T x \quad \text{s.t.}\quad x \in P,\; x_k \in \hat{X}_k,\; k \in K, \tag{5}$$
where
$$\hat{X}_k := \hat{Y}_k \cap L_k \cap \hat{G}_k, \quad \text{with } \hat{Y}_k := Y_k \text{ or } \hat{Y}_k := \mathbb{R}^{n_k}. \tag{6}$$
500 P. Muts and I. Nowak
where $\varphi_{ki}(x_i) := x_i^2$ and $I_{kj} = \{i : \partial g_{kj}/\partial x_i \ne \text{const}\}$ denotes the index set of nonlinear variables of the constraint function $g_{kj}$.
The convexification parameters are computed by $\sigma_{kj} = \max\{0, -v_{kj}\}$, where $v_{kj}$ is a lower bound on the optimal value of the following nonlinear eigenvalue problem
where
$$\bar{h}_{kj}(x,\hat{y}) := h_{kj}(\hat{y}) + \nabla h_{kj}(\hat{y})^T (x - \hat{y})$$
denotes the linearization of $h_{kj}$ at the sample point $\hat{y} \in T_k \subset \mathbb{R}^{n_k}$.
A piecewise linear overestimator $\check{q}_{kj}$ of $q_{kj}$ is defined by replacing $\varphi_{ki}$ by
$$\check{\varphi}_{ki}(x_i) := \varphi_{ki}(p_{ki,t})\,\frac{p_{ki,t+1}-x_i}{p_{ki,t+1}-p_{ki,t}} + \varphi_{ki}(p_{ki,t+1})\,\frac{x_i-p_{ki,t}}{p_{ki,t+1}-p_{ki,t}}, \tag{10}$$
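For the convex quadratic $\varphi_{ki}(x_i) = x_i^2$, the overestimator in (10) is simply the chord between consecutive breakpoints; a minimal sketch:

```python
def phi(x):
    # the quadratic term phi_ki(x) = x**2 used in the DC splitting
    return x * x

def phi_check(x, breakpoints):
    """Piecewise linear overestimator of phi, cf. (10): linear interpolation
    between the two breakpoints enclosing x (breakpoints sorted ascending)."""
    for lo, hi in zip(breakpoints, breakpoints[1:]):
        if lo <= x <= hi:
            lam = (hi - x) / (hi - lo)
            return lam * phi(lo) + (1 - lam) * phi(hi)
    raise ValueError("x outside the breakpoint range")
```

Since $\varphi$ is convex, the chord lies above the graph, and adding breakpoints tightens the overestimate.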
A DC-based OA of G is given by
$$\hat{G}_k = \{x_k \in L_k : (x_k, r_k) \in \check{C}_k \cap \check{Q}_k\}, \tag{12}$$
where
$$\check{C}_k := \Bigl\{y \in [\underline{x}_k, \overline{x}_k],\; r_k \in \mathbb{R}^{n_k} : \check{h}_{kj}(y) - \sigma_{kj}\sum_{i\in I_{kj}} r_{ki} \le 0\Bigr\}.$$
The polytope $\check{C}_k$ is defined by linearization cuts as in (9), and the set $\check{Q}_k$ is defined by breakpoints $B_k$ as in (10).
3 DECOA
In this section, we describe the DECOA (Decomposition-based Outer Approximation) algorithm for solving (1), depicted in Algorithm 1.
It starts by computing an initial OA, defined by $\check{C}$, breakpoints B, and L, a solution estimate $\hat{x}$, and sets the upper objective bound $\bar{v}$. Then it computes a solution candidate $\tilde{x}$ by calling the procedure oaLocalSearch. Using $\tilde{x}$, the outer approximation $\check{C}$ is improved by calling the procedure fixAndRefine. If the solution point $\tilde{x}$ of problem (21) improves the best solution candidate, i.e., $\tilde{x} \in X$ and $c^T\tilde{x} < \bar{v}$, the point $\tilde{x}$ becomes the new solution candidate of problem (1), denoted by $x^*$. Moreover, we update $\bar{v}$ to $c^T x^*$. The OA master problem (5) is solved by calling the procedure solveOA. Furthermore, in order to refine the OA, the following projection sub-problem is solved for $k \in K$:
$$\hat{y}_k = \operatorname*{argmin}\; \|x_k - \hat{x}_k\|^2 \quad \text{s.t.}\quad x_k \in G_k,\; x_{ki} = \hat{x}_{ki},\; i \in I_k, \tag{13}$$
by calling the procedure project($\hat{x}_k$). The points $\hat{x}_k$ and $\hat{y}_k$ are used by the method addCutsAndPoints for cut and breakpoint generation. The algorithm iteratively performs these steps until a stopping criterion is fulfilled.
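The solve-refine interplay described above can be illustrated on a one-dimensional toy problem (a simplification for exposition, not DECOA itself): minimize c·x subject to g(x) = x² − 1 ≤ 0, where the LP master problem reduces to a bound-constrained minimization and each iteration adds a linearization cut of g at the current LP solution:

```python
def toy_outer_approximation(c=1.0, lo=-2.0, hi=2.0, tol=1e-6, max_iter=50):
    """Cutting-plane OA loop for: min c*x  s.t.  g(x) = x**2 - 1 <= 0, x in [lo, hi]."""
    g = lambda x: x * x - 1      # nonlinear constraint g(x) <= 0
    dg = lambda x: 2 * x         # its derivative
    x = lo if c > 0 else hi
    for _ in range(max_iter):
        x = lo if c > 0 else hi  # solve the current LP outer approximation
        if g(x) <= tol:          # x is (almost) feasible: done
            return x
        # add the linearization cut g(x) + g'(x)*(y - x) <= 0 and tighten the box
        a, b = dg(x), g(x) - dg(x) * x
        if a > 0:
            hi = min(hi, -b / a)
        elif a < 0:
            lo = max(lo, -b / a)
    return x
```

Each cut is a valid outer approximation of the feasible set, so the LP bound improves monotonically toward the optimum at x = ±1.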
Algorithm 3. OA-Initialization
1: function initOa
2:   for k ∈ K do
3:     for $d_k$ ∈ {$c_k$, 1, −1} do
4:       ($\hat{x}_k$, $\check{C}_k$, $S_k$) ← oaSubSolve($d_k$)
5:       $L_k$ ← {$x_k \in L_k$ : $d_k^T \hat{x}_k \le d_k^T x_k$}
6:     [$\underline{x}_k$, $\overline{x}_k$] ← box($S_k$)
7:     $B_k$ ← {$\underline{x}_k$, $\overline{x}_k$, $\hat{x}_k$, $x^*_k$}
8:   ($\hat{x}$, $\check{C}$) ← addRnlpCuts($\underline{x}$, $\overline{x}$)
9:   return ($\hat{x}$, $\check{C}$, L, B)
Note that 1 denotes a vector of ones and box(S) denotes the smallest interval $[\underline{x}, \overline{x}]$ containing a set S. The procedure addRnlpCuts($\underline{x}$, $\overline{x}$) performs cutting plane iterations for solving the RNLP-OA
$$(\tilde{y}, s) = \operatorname*{argmin}\; c^T x + \gamma\|s\|_1 \quad \text{s.t.}\quad Ax \le b + s,\; x \in [\underline{x}, \overline{x}],\; s \ge 0, \tag{16}$$
$$h_{kj}(x_k) - \check{q}_{kj}(x_k) \le 0,\; j \in J_k,\; k \in K,$$
where
$$\check{q}_{kj}(x_k) = \sigma_{kj}\sum_{i\in[n_k]}\Bigl[\varphi_{ki}(\underline{x}_{ki})\,\frac{\overline{x}_{ki}-x_{ki}}{\overline{x}_{ki}-\underline{x}_{ki}} + \varphi_{ki}(\overline{x}_{ki})\,\frac{x_{ki}-\underline{x}_{ki}}{\overline{x}_{ki}-\underline{x}_{ki}}\Bigr]. \tag{17}$$
$$\min\; d_k^T x_k \quad \text{s.t.}\quad x_k \in \hat{X}_k. \tag{18}$$
Note that Algorithm 4 uses temporary breakpoints Bk , which are initialized
using initLocalBreakPoints.
Algorithm 4. OA sub-solver
1: function oaSubSolve($d_k$)
2:   $\hat{x}_k$ ← solveSubOa($\check{C}_k$)
3:   $B_k$ ← initLocalBreakPoints
4:   repeat
5:     $\hat{y}_k$ ← project($\hat{x}_k$)
6:     ($\check{C}_k$, $B_k$) ← addCutsAndPoints($\hat{x}_k$, $\hat{y}_k$, $B_k$)
7:     $\hat{x}_k$ ← solveSubOa($\check{C}_k$)
8:   until stopping criterion
9:   $x^*_k$ ← solveFixedNlp($\hat{x}_k$)
10:  $S_k$ ← $S_k \cup \{x^*_k\}$
11:  return ($\hat{x}_k$, $\check{C}_k$, $S_k$)
3.4 Fix-and-Refine
The procedure fixAndRefine, described in Algorithm 5, generates cuts and breakpoints per block by solving a partly-fixed sub-problem, similarly as in Algorithm 4. It uses the procedure solveFixOA for solving a MIP-OA problem in which the variables are fixed for all blocks except one:
$$\min\; c_k^T x_k + \gamma\|s\|_1 \quad \text{s.t.}\quad A_k x_k \le s + b - \sum_{m\in K\setminus\{k\}} A_m \tilde{x}_m,\quad x_k \in \hat{X}_k,\; s \ge 0, \tag{19}$$
where $\tilde{x}$ is a solution candidate of (1).
Fig. 1. Number of MIP solutions per problem size for convex MINLPs
5 Conclusions
Optimization under Uncertainty
Fuzzy Pareto Solutions in Fully Fuzzy
Multiobjective Linear Programming
Manuel Arana-Jiménez(B)
1 Introduction
Fuzzy linear programming is a field in which many researchers model decision making in a fuzzy environment [3,8,11,15,17,31,32]. Usually, not all variables and parameters in a fuzzy linear problem are assumed to be fuzzy numbers, although it is interesting to provide a general model for linear problems in which all elements are fuzzy, called the fully fuzzy linear programming problem ((FFLP) problem, for short). In this regard, Lotfi et al. [30] proposed a method to find the fuzzy optimal solution of an (FFLP) with equality constraints and symmetric fuzzy numbers. Kumar et al. [26] proposed a new method for finding the fuzzy optimal solutions of (FFLP) problems with equality constraints, using a ranking function (see [3] and the bibliography therein). Najafi and Edalatpanah [35] made a correction to the previous method. Khan et al. [24] studied (FFLP) problems with inequalities, also using ranking functions to compare the objective function values (see also [10,25]). Ezzati et al. [16] revisited the methods provided by Lotfi et al. [30] and Kumar et al. [26] to propose a new method based on a multiobjective programming problem with equality constraints. Liu and Gao [29] have remarked on some limitations of the existing methods to solve (FFLP) problems. As an application, Chakraborty et al. [12] locate fuzzy optimal solutions in fuzzy transportation problems. Recently, Arana-Jiménez [5] has provided a novel method to find fuzzy optimal (nondominated) solutions of (FFLP) problems with inequality constraints and triangular fuzzy numbers, not necessarily symmetric, via solving a crisp multiobjective linear programming problem. This method does not require ranking functions.
Supported by the research project MTM2017-89577-P (MINECO, Spain) and UCA.
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 509–517, 2020.
https://doi.org/10.1007/978-3-030-21803-4_51
On the other hand, some models require the decision maker to face several objectives at the same time. This type of problem includes multiobjective programming problems, in which two or more objectives have to be optimized (minimized or maximized) and we deal with conflicts among the objectives. Pareto optimality in multiobjective programming associates the concept of a solution with a property that seems intuitively natural, and it is an important concept in mathematical models, economics, engineering, decision theory, and optimal control, among other areas (see [2]). When extending the idea of fuzzy linear programming to fuzzy multiobjective linear programming, the objectives are again conflicting in nature; therefore, a concept of Pareto solution is necessary too. In such fuzzy multiobjective problems, Bharati et al. [9] comment that choosing the best among the available alternatives requires ranking the fuzzy numbers used in the model. They compare different methods using ranking functions, and adopt the concept of Pareto-optimal solution suggested by Jimenez and Bilbao [22] by means of a ranking function. Some applications, for instance to DEA, can be found in [27]. In the present work, as an extension of [5], we face the challenge of studying a problem with fuzzy variables and parameters, that is, a fully fuzzy multiobjective linear programming problem ((FFMLP) problem, for short). In this regard, a new method is proposed to obtain fuzzy Pareto solutions, and no ranking functions are used.
The structure is as follows. In the next section, we present notation, arithmetic, and partial orders on fuzzy numbers. Later, in Sect. 3, we formulate the fully fuzzy multiobjective linear programming problem and provide an algorithm to generate fuzzy Pareto solutions of (FFMLP) by means of solving an auxiliary crisp multiobjective programming problem. Finally, we conclude the paper and present future work. Owing to length restrictions on this text for the congress, proofs and examples are omitted; they will be presented in an extended version of this paper.
A fuzzy set on $\mathbb{R}^n$ is a mapping $u : \mathbb{R}^n \to [0,1]$. Each fuzzy set u has an associated family of α-level sets, described as $[u]_\alpha = \{x \in \mathbb{R}^n \mid u(x) \ge \alpha\}$ for any $\alpha \in (0,1]$, and its support, $\mathrm{supp}(u) = \{x \in \mathbb{R}^n \mid u(x) > 0\}$. The 0-level of u is defined as the closure of supp(u), that is, $[u]_0 = \mathrm{cl}(\mathrm{supp}(u))$. A very useful type of fuzzy set for modeling parameters and variables are the fuzzy numbers. Following Dubois and Prade [13,14], a fuzzy set u on $\mathbb{R}$ is said to be a fuzzy number if (i) u is normal, that is, there exists $x_0 \in \mathbb{R}$ such that $u(x_0) = 1$, (ii) u is upper semicontinuous, (iii) u is convex, and (iv) $[u]_0$ is compact. $F_C$ denotes the family of all fuzzy numbers. The α-levels of a fuzzy number can be represented by means of a real interval, that is, $[u]_\alpha = [\underline{u}_\alpha, \overline{u}_\alpha] \in K_C$, $\underline{u}_\alpha, \overline{u}_\alpha \in \mathbb{R}$, where $K_C$ is the set of real compact intervals. There exist many families of fuzzy numbers that have been applied to model uncertainty in different situations. Some of the most popular are the L-R, triangular, trapezoidal, polygonal, Gaussian, quasi-quadric, exponential, and singleton fuzzy numbers. The reader is referred to [7,21,36] for a complete description of these families and their representation properties. Among them, we point out triangular fuzzy numbers (TFNs), because of their easy modeling and interpretation (see, for instance, [13,23,24,30,36]), whose definition is as follows.
At the same time, given a triangular fuzzy number $\tilde{a} = (a^-, \hat{a}, a^+)$, its α-levels are given by
$$[\tilde{a}]_\alpha = [a^- + \alpha(\hat{a} - a^-),\; a^+ - \alpha(a^+ - \hat{a})]$$
for all $\alpha \in [0,1]$. This means that a triangular fuzzy number is completely determined by three real numbers $a^- \le \hat{a} \le a^+$. A unique triangular fuzzy number is characterized by means of the previous formulation of α-levels, as Goetschel and Voxman [19] established. The set of all TFNs is denoted by $T_F$.
The nonnegativity condition on some parameters and variables in many optimization problems makes the following special consideration of TFNs useful. Let $\tilde{a}$ be a fuzzy number. We say that $\tilde{a}$ is a nonnegative (respectively, nonpositive) fuzzy number if $\underline{a}_0 \ge 0$ (respectively, $\overline{a}_0 \le 0$). So, in the case that $\tilde{a}$ is a TFN, $\tilde{a}$ is nonnegative (nonpositive, respectively) if and only if $a^- \ge 0$ ($a^+ \le 0$, respectively).
Classical arithmetic operations on intervals are well known, and can be
referred to Moore [33,34] and Alefeld and Herzberger [1]. A natural extension of
these arithmetic operations to fuzzy numbers u, v ∈ FC can be found described
in [18,28], where the membership function of the operation u ∗v, with ∗ ∈ {+, ·},
is defined by
(u ∗ v)(z) = sup min{u(x), v(y)}. (1)
z=x∗y
However, TF is not closed under the multiplication operation (4) (see, for
instance, the examples in [39]). To avoid this situation, it is usual to apply a
different multiplication operation between TFNs, such as those referenced in
[5,23,24,26], which can be considered as an approximation to the multiplication
given in (1). We provide the following definition for the multiplication:
$$\tilde{a}\tilde{b} = \bigl((\tilde{a}\tilde{b})^-,\; \widehat{\tilde{a}\tilde{b}},\; (\tilde{a}\tilde{b})^+\bigr) = \bigl(\min\{a^-b^-, a^-b^+, a^+b^-, a^+b^+\},\; \hat{a}\hat{b},\; \max\{a^-b^-, a^-b^+, a^+b^-, a^+b^+\}\bigr). \tag{7}$$
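A minimal sketch of TFN arithmetic under the component-wise sum and the approximate multiplication (7) (illustrative code, not from the paper):

```python
class TFN:
    """Triangular fuzzy number a~ = (a_minus, a_hat, a_plus)."""

    def __init__(self, lo, mid, hi):
        assert lo <= mid <= hi
        self.lo, self.mid, self.hi = lo, mid, hi

    def __add__(self, other):
        # component-wise sum of triangular fuzzy numbers
        return TFN(self.lo + other.lo, self.mid + other.mid, self.hi + other.hi)

    def __mul__(self, other):
        # approximate multiplication (7): extreme products at the endpoints
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return TFN(min(p), self.mid * other.mid, max(p))

    def alpha_cut(self, alpha):
        # alpha-level interval [a^- + alpha(a^ - a^-), a^+ - alpha(a^+ - a^)]
        return (self.lo + alpha * (self.mid - self.lo),
                self.hi - alpha * (self.hi - self.mid))

    def is_nonnegative(self):
        # a~ is nonnegative iff a_minus >= 0
        return self.lo >= 0
```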
(i) $\mu \prec \nu$ if and only if $\underline{\mu}_\alpha < \underline{\nu}_\alpha$ and $\overline{\mu}_\alpha < \overline{\nu}_\alpha$, for all $\alpha \in [0,1]$,
(ii) $\mu \preceq \nu$ if and only if $\underline{\mu}_\alpha \le \underline{\nu}_\alpha$ and $\overline{\mu}_\alpha \le \overline{\nu}_\alpha$, for all $\alpha \in [0,1]$.
The relations $\succ$, $\succeq$ are obtained in a similar manner. Note that saying that $\tilde{a}$ is nonnegative is equivalent to writing $\tilde{a} \succeq \tilde{0} = (0,0,0)$.
Definition 3. Let $\tilde{\bar{x}}$ be a feasible solution for (FFMLP). $\tilde{\bar{x}}$ is said to be a fuzzy Pareto solution of (FFMLP) if there does not exist a feasible solution $\tilde{x}$ for (FFMLP) such that $\sum_{j=1}^{n}\tilde{c}_{ij}\tilde{\bar{x}}_j \preceq \sum_{j=1}^{n}\tilde{c}_{ij}\tilde{x}_j$ for all $i \in \{1,\dots,p\}$, and $\sum_{j=1}^{n}\tilde{c}_{i_0 j}\tilde{\bar{x}}_j \prec \sum_{j=1}^{n}\tilde{c}_{i_0 j}\tilde{x}_j$ for some $i_0 \in \{1,\dots,p\}$.
Let us remark that $\tilde{x}_j$ is a nonnegative TFN, so the multiplication rule is given by (8). This means that $\tilde{c}_{ij}\tilde{x}_j$ is computed by one of the three expressions in (8), which only depends on $\tilde{c}_{ij}$. Since the fuzzy coefficients $\tilde{c}_{ij}$ are known, the expressions of $\tilde{c}_{ij}\tilde{x}_j = \bigl((\tilde{c}_{ij}\tilde{x}_j)^-, \widehat{\tilde{c}_{ij}\tilde{x}_j}, (\tilde{c}_{ij}\tilde{x}_j)^+\bigr)$ are also known. The same applies to the products in the constraints, and (CMLP) is formulated with objective functions $f_h$, with $f_h$ linear functions, $h = 1,\dots,3p$. Since all constraints are represented as linear inequalities in the variable x, (CMLP) is a multiobjective linear programming problem. Recall that a feasible point $\bar{x} \in \mathbb{R}^{3n}$ of (CMLP) is said to be a Pareto solution if there does not exist another feasible point x such that $f_h(\bar{x}) \le f_h(x)$ for all $h = 1,\dots,3p$, and $f_{h_0}(\bar{x}) < f_{h_0}(x)$ for some $h_0 \in \{1,\dots,3p\}$. The relationship between the fuzzy Pareto solutions of (FFMLP) and the Pareto solutions of (CMLP) is as follows.
Theorem 2. $\tilde{x} = (\tilde{x}_1,\dots,\tilde{x}_n)$, with $\tilde{x}_j = (x_j^-, \hat{x}_j, x_j^+) \in T_F$, $j = 1,\dots,n$, is a fuzzy Pareto solution of (FFMLP) if and only if $x = (x_1^-, \hat{x}_1, x_1^+, \dots, x_n^-, \hat{x}_n, x_n^+) \in \mathbb{R}^{3n}$ is a Pareto solution of (CMLP).
In the literature, we can find several methods to generate Pareto solutions of a multiobjective linear problem (see [2] and the bibliography therein). The most popular methods are based on scalarization. One of them is by means of related weighting problems, formulated as follows. Given (CMLP) and $w = (w_1,\dots,w_{3p}) \in \mathbb{R}^{3p}$, $w_i > 0$, $\sum_{i=1}^{3p} w_i = 1$, we define the related weighting problem as
$$\text{(CMLP)}_w \quad \text{Minimize}\;\; \sum_{i=1}^{3p} w_i f_i(x)$$
$$\text{subject to}\quad \sum_{j=1}^{n} (\tilde{a}_{rj}\tilde{x}_j)^- \le b_r^-, \;\; r = 1,\dots,m,$$
$$\sum_{j=1}^{n} \widehat{\tilde{a}_{rj}\tilde{x}_j} \le \hat{b}_r, \;\; r = 1,\dots,m,$$
$$\sum_{j=1}^{n} (\tilde{a}_{rj}\tilde{x}_j)^+ \le b_r^+, \;\; r = 1,\dots,m,$$
$$x_j^- - \hat{x}_j \le 0, \;\; j = 1,\dots,n,$$
$$\hat{x}_j - x_j^+ \le 0, \;\; j = 1,\dots,n,$$
$$x_j^- \ge 0,\; \hat{x}_j \ge 0,\; x_j^+ \ge 0, \;\; j = 1,\dots,n.$$
Theorem 3. Given $w = (w_1,\dots,w_{3p}) \in \mathbb{R}^{3p}$, $w_i > 0$, $\sum_{i=1}^{3p} w_i = 1$, if $x = (x_j^-, \hat{x}_j, x_j^+)_{j=1}^{n} \in \mathbb{R}^{3n}$ is an optimal solution of the weighting optimization problem (CMLP)$_w$, then $\tilde{x} = (\tilde{x}_1,\dots,\tilde{x}_n)$, with $\tilde{x}_j = (x_j^-, \hat{x}_j, x_j^+)$, is a fuzzy Pareto solution of (FFMLP).
The previous result allows us to outline a method to obtain fuzzy Pareto solutions for the (FFMLP) problem, which can be written via the following algorithm.
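The weighted-sum scalarization behind Theorem 3 can be sketched on a finite candidate set (a deliberate simplification: a real (CMLP)$_w$ is a continuous linear program, here we only scan a list of feasible points; all names are hypothetical):

```python
def weighted_sum_solution(points, objectives, w):
    """Pick the feasible point minimizing sum_i w_i * f_i(x); with all
    w_i > 0 the minimizer is a Pareto solution of the multiobjective problem."""
    assert all(wi > 0 for wi in w) and abs(sum(w) - 1) < 1e-9
    return min(points, key=lambda x: sum(wi * f(x) for wi, f in zip(w, objectives)))

def is_pareto(x, points, objectives):
    # x is Pareto iff no other point weakly improves every objective and
    # strictly improves at least one (minimization convention used here)
    fx = [f(x) for f in objectives]
    for y in points:
        fy = [f(y) for f in objectives]
        if all(b <= a for a, b in zip(fx, fy)) and any(b < a for a, b in zip(fx, fy)):
            return False
    return True
```

The strict positivity of the weights is what rules out returning a dominated point, mirroring the hypothesis $w_i > 0$ of Theorem 3.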
Algorithm
4 Conclusions
References
1. Alefeld, G., Herzberger, J.: Introduction to Interval Computations. Academic
Press, New York (1983)
2. Arana-Jiménez, M. (ed.): Optimality Conditions in Vector Optimization. Bentham Science Publishers Ltd, Bussum (2010)
25. Khan, I.U., Ahmad, T., Maan, N.: A reply to a note on the paper “A simplified
novel technique for solving fully fuzzy linear programming problems”. J. Optim.
Theory Appl. 173, 353–356 (2017)
26. Kumar, A., Kaur, J., Singh, P.: A new method for solving fully fuzzy linear pro-
gramming problems. Appl. Math. Model. 35, 817–823 (2011)
27. Mehlawat, M.K., Kumar, A., Yadav, S., Chen, W.: Data envelopment analysis
based fuzzy multi-objective portfolio selection model involving higher moments.
Inf. Sci. 460, 128–150 (2018)
28. Liu, B.: Uncertainty Theory. Springer-Verlag, Heidelberg (2015)
29. Liu, Q., Gao, X.: Fully fuzzy linear programming problem with triangular fuzzy
numbers. J. Comput. Theor. Nanosci. 13, 4036–4041 (2016)
30. Lotfi, F.H., Allahviranloo, T., Jondabeha, M.A., Alizadeh, L.: Solving a fully fuzzy
linear programming using lexicography method and fuzzy approximate solution.
Appl. Math. Modell. 3, 3151–3156 (2009)
31. Maleki, H.R., Tata, M., Mashinchi, M.: Linear programming with fuzzy variables.
Fuzzy Set. Syst. 109, 21–33 (2000)
32. Maleki, H.R.: Ranking functions and their applications to fuzzy linear program-
ming. Far East J. Math. Sci. 4, 283–301 (2002)
33. Moore, R.E.: Interval Analysis. Prentice-Hall, Englewood Cliffs (1966)
34. Moore, R.E.: Method and Applications of Interval Analysis. SIAM, Philadelphia
(1979)
35. Najafi, H.S., Edalatpanah, S.A.: A note on “A new method for solving fully fuzzy linear programming problems”. Appl. Math. Model. 37, 7865–7867 (2013)
36. Stefanini, L., Sorini, L., Guerra, M.L.: Parametric representation of fuzzy numbers
and application to fuzzy calculus. Fuzzy Sets Syst. 157(18), 2423–2455 (2006)
37. Stefanini, L., Arana-Jiménez, M.: Karush-Kuhn–Tucker conditions for interval and
fuzzy optimization in several variables under total and directional generalized dif-
ferentiability. Fuzzy Sets Syst. 262, 1–34 (2019)
38. Wu, H.C.: The optimality conditions for optimization problems with convex con-
straints and multiple fuzzy-valued objective functions. Fuzzy Optim. Decis. Making
8, 295–321 (2009)
39. Yasin Ali Md., Sultana, A., Khodadad Kha, A.F.M.: Comparison of fuzzy multipli-
cation operation on triangular fuzzy number. IOSR J. Math. 12(4), 35–41 (2016)
Minimax Inequalities and Variational
Equations
1 Introduction
Minimax inequalities are normally associated with game theory. This was the original motivation of von Neumann's work in 1928, but in the mathematical literature, generalizations of von Neumann's results, called minimax theorems, became objects of study in their own right. These generalizations focus on various directions: some of them pay attention to topological conditions, others to the study of weak convexity conditions (see [23]). At the same time, minimax inequalities have turned out to be a powerful tool in other fields: see, for instance, [4,5,12,13,15,16,18,21,22].
In this work, in Sect. 2, we illustrate the applicability of a minimax inequality to analyse the existence of a solution for a quite general system of variational inequalities. After that, in Sect. 3, we explore some new generalizations of minimax theorems with weak convexity conditions.
For the first aim we analyse a class of systems which arises in many situations. To evoke one of them, let us recall that the study of variational equations with constraints emerges naturally, among others, from the context of the elliptic boundary value problem, when its essential boundary conditions are treated as constraints in its standard variational formulation. This leads one to its variational formulation, which coincides with the system of variational equations:
find $x_0 \in X$ such that
$$z \in Z \Rightarrow f(z) = a(x_0, z),$$
$$y \in Y \Rightarrow g(y) = b(x_0, y),$$
Partially supported by project MTM2016-80676-P (AEI/FEDER, UE) and by Junta
de Andalucı́a Grant FQM359.
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 518–525, 2020.
https://doi.org/10.1007/978-3-030-21803-4_52
for some Banach spaces X and Y , a closed vector subspace Z of X, some con-
tinuous bilinear forms a : X × X −→ R and b : X × Y −→ R, and f ∈ X ∗ and
g ∈ Y ∗ (“∗ ” stands for “topological dual space”): see the details, for instance, in
[10, Sect. 4.6.1]. In a more general way, we deal with the following problem: let X
be a real reflexive Banach space, N ≥ 1, and suppose that for each j = 1, . . . , N ,
Yj is a real Banach space, yj∗ ∈ Yj∗ , Cj is a convex subset of Yj with 0 ∈ Cj , and
aj : X × Yj −→ R is a bilinear form satisfying yj ∈ Cj ⇒ aj (·, yj ) ∈ X ∗ ; then
find $x_0 \in X$ such that
$$y_1 \in C_1 \Rightarrow y_1^*(y_1) \le a_1(x_0, y_1),$$
$$\cdots$$
$$y_N \in C_N \Rightarrow y_N^*(y_N) \le a_N(x_0, y_N). \tag{1}$$
This kind of variational system is so general that it includes certain mixed varia-
tional formulations associated with some elliptic problems, those in the so-called
Babuška–Brezzi theory (see, for instance [3,9] and some of its generalizations [7]).
Proof. The fact that (2) ⇒ (3) is straightforward. On the other hand, let $\alpha > 0$ be such that (3) holds. Then we apply the minimax theorem, Theorem 1, to the convex sets $X := \alpha B_E$, $Y := C$ and the bifunction $(x,y) \mapsto a(x,y) - y_0^*(y)$, obtaining
$$\max_{x\in\alpha B_E}\,\inf_{y\in C}\,\bigl(a(x,y) - y_0^*(y)\bigr) = \inf_{y\in C}\,\max_{x\in\alpha B_E}\,\bigl(a(x,y) - y_0^*(y)\bigr).$$
Moreover,
$$\inf_{y\in C}\,\max_{x\in\alpha B_E}\,\bigl(a(x,y) - y_0^*(y)\bigr) = \inf_{y\in C}\,\bigl(\alpha\|a(\cdot,y)\| - y_0^*(y)\bigr),$$
which is nonnegative according to (3). Therefore, the left-hand side term is also nonnegative, i.e., there exists $x_0 \in E$ (in fact, $x_0 \in \alpha B_E$).
To conclude, the fact that $x_0$ can be chosen in $\alpha B_E$ implies the stability condition (4).
The next result establishes that this necessary condition is also sufficient (see [8,
Theorem 2.2, Corollary 2.3]).
yj ∈ Cj ⇒ aj (·, yj ) ∈ E ∗ .
Example 1. Given $\mu \in \mathbb{R}$ and $f \in L^p(0,1)$, $1 < p < \infty$, let us consider the boundary value problem
$$-z'' + \mu z = f \;\text{ on } (0,1),\qquad z(0) = 0,\; z(1) = 0, \tag{6}$$
where
$$X := W^{1,p}(0,1),\quad Y := W^{1,q}(0,1),\quad Z := L^p(0,1),\quad W := L^q(0,1),$$
$$y_0^*(y) := 0 \quad (y \in Y),$$
and
$$w_0^*(w) := -\int_0^1 f\,w \quad (w \in W).$$
522 M. I. Berenguer et al.
Now Theorem 3 applies, since this system adopts the form of (5) with $N = 2$, the reflexive space $E := (X \times Z)^*$, the Banach spaces $F_1 := Y$, $F_2 := W$, the convex sets $C_1 := F_1$, $C_2 := F_2$, the continuous bilinear forms $a_1 : E^* \times F_1 \to \mathbb{R}$ and $a_2 : E^* \times F_2 \to \mathbb{R}$ defined at each $(x,z) \in E^*$, $y \in F_1$ and $w \in F_2$ as
and
$$a_2((x,z), w) := c(x,w) + d(z,w),$$
and the continuous linear forms $y_1^* := y_0^*$ and $y_2^* := w_0^*$. This mixed variational formulation admits a unique solution $(x,z) \in E = (X \times Z)^*$ as soon as $|\mu| < 0.5$: see [8, Example 2.4] for the details.
Let us mention that the boundary value problem in the preceding example does not fall within the scope of the Babuška–Brezzi theory, or even of the more general one in [7], where the analysis of Theorem 3 is carried out by means of independent conditions on the involved bilinear forms.
The abstract framework of Theorem 3 allows us to state a Galerkin scheme for the system of inequalities under study when the convex sets Cj coincide with the spaces Fj and the bilinear forms are continuous (see [8]).
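Outside the paper's scope, the flavor of such a Galerkin scheme can be illustrated numerically on the one-dimensional problem (6). The sketch below assumes the reaction form −z′′ + μz = f with homogeneous Dirichlet conditions and piecewise-linear (P1) trial and test functions on a uniform mesh; all names and parameter values are illustrative.

```python
import numpy as np

def galerkin_bvp(mu=0.0, f=lambda x: 1.0, n=10):
    """P1 Galerkin discretization of -z'' + mu*z = f on (0,1), z(0)=z(1)=0."""
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)
    xi = x[1:-1]                                   # interior nodes
    m = n - 1
    # stiffness matrix (from the -z'' term)
    K = (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h
    # consistent mass matrix (from the mu*z term)
    M = h * (2.0 / 3.0 * np.eye(m) + np.eye(m, k=1) / 6.0 + np.eye(m, k=-1) / 6.0)
    b = h * np.array([f(t) for t in xi])           # load vector (exact for constant f)
    z = np.linalg.solve(K + mu * M, b)
    return xi, z

xi, z = galerkin_bvp(mu=0.0)
# for mu = 0 and f = 1 the exact solution is z(x) = x(1 - x)/2,
# and the P1 Galerkin solution is nodally exact
print(abs(z[4] - 0.125) < 1e-12)
```

For μ = 0 and f ≡ 1 the computed value at the midpoint node x = 0.5 matches the exact solution up to roundoff.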
Let us also emphasize, to conclude, that the numerical treatment of some inverse problems related to the systems of variational equations under consideration has been developed in [14].
In the previous section we have shown some applications of the von Neumann–Fan minimax inequality to variational analysis. Now we focus on the study of minimax inequalities. With the aim of deriving more general applications, a wide variety of results of this kind has appeared in recent decades. Most of them involve a certain concept of convexity together with some topological conditions.
Let us first recall that a minimax inequality is a result guaranteeing that, under suitable hypotheses, a function f : X × Y −→ R, with X and Y nonempty sets, satisfies the inequality

    inf_{y∈Y} sup_{x∈X} f(x, y) ≤ sup_{x∈X} inf_{y∈Y} f(x, y),    (7)

and therefore the equality also holds, since the opposite inequality is always true.
Note that when X is a compact topological space and f is upper semicontinuous on X, the inequality (7) can be written as in Theorem 1.
Our starting point is the generalization of upper semicontinuity introduced in [1, Definition 8]: if X is a nonempty topological space, Y is a nonempty set, x0 ∈ X and inf_{y∈Y} sup_{x∈X} f(x, y) ∈ R, let us recall that f is infsup–transfer upper semicontinuous at x0 if, for (x0, y0) ∈ X × Y, f(x0, y0) < inf_{y∈Y} sup_{x∈X} f(x, y) implies that there exist y1 ∈ Y and a neighborhood U ⊂ X of x0 such that f(x, y1) < inf_{y∈Y} sup_{x∈X} f(x, y) for all x ∈ U. In addition, f is said to be infsup–transfer upper semicontinuous on X when it is so at each x0 ∈ X.
We also use the following concept of convexity, introduced in [11] without a name.
Given nonempty sets X and Y, a function f : X × Y −→ R is said to be infsup–convex on Y provided that

    m ≥ 1, t ∈ Δm, y1, . . . , ym ∈ Y  ⇒  inf_{y∈Y} sup_{x∈X} f(x, y) ≤ sup_{x∈X} Σ_{j=1}^{m} tj f(x, yj).
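As a quick numerical illustration (not in the paper), the always-true half of the comparison, sup_x inf_y f ≤ inf_y sup_x f, can be checked on finite grids, and for a concave-convex bifunction the two sides coincide; the grids and test functions below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.standard_normal((50, 60))       # F[i, j] = f(x_i, y_j) on finite grids

sup_inf = F.min(axis=1).max()           # sup_x inf_y f(x, y)
inf_sup = F.max(axis=0).min()           # inf_y sup_x f(x, y)
print(sup_inf <= inf_sup)               # the "easy" inequality always holds

# concave-convex case f(x, y) = x*y on [-1, 1]^2: both sides equal 0
xs = np.linspace(-1.0, 1.0, 201)
G = np.outer(xs, xs)
print(np.isclose(G.min(axis=1).max(), G.max(axis=0).min()))
```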
4 Conclusions
In this work, we characterize the existence of a solution for a system of variational equations in some reflexive Banach spaces in terms of the existence of a certain scalar. Our main tool is a classical minimax theorem. The family of systems considered includes those appearing in the Babuška–Brezzi theory. We illustrate our results with a non-Hilbert example. In addition, we mention sufficient conditions of a convex and topological nature which guarantee the validity of some minimax inequalities.
References
1. Baye, M., Tian, G., Zhou, J.: Characterizations of the existence of equilibria in
games with discontinuous and nonquasiconcave payoffs. Rev. Econ. Stud. 60, 935–
948 (1993)
2. Berenguer, M. I., Gámez, D., Garralda-Guillem, A. I., Ruiz Galán, M.: A discrete
characterization of the solvability of equilibrium problems. Submitted for publica-
tion
3. Boffi, D., et al.: Mixed Finite Elements, Compatibility Conditions and Applica-
tions. Lecture Notes in Mathematics, vol. 1939. Springer-Verlag, Heidelberg (2008)
4. Borwein, J.M., Giladi, O.: Some remarks on convex analysis in topological groups.
J. Convex. Anal. 23, 313–332 (2016)
5. Deng, X.T., Li, Z.F., Wang, S.Y.: A minimax portfolio selection strategy with
equilibrium. Eur. J. Oper. Res. 166, 278–292 (2005)
6. Fan, K.: Minimax theorems. Proc. Nat. Acad. Sci. USA 39, 42–47 (1953)
7. Garralda Guillem, A.I., Ruiz Galán, M.: Mixed variational formulations in locally
convex spaces. J. Math. Anal. Appl. 414, 825–849 (2014)
8. Garralda Guillem, A.I., Ruiz Galán, M.: A minimax approach for the study of
systems of variational equations and related Galerkin schemes. J. Comput. Appl.
Math. 354, 103–111 (2019)
9. Gatica, G.N.: A Simple Introduction to the Mixed Finite Element Method Theory
and Applications. Springer Briefs in Mathematics. Springer, Cham (2014)
10. Grossmann, C., Roos, H.G., Stynes, M.: Numerical Treatment of Partial Differen-
tial Equations. Springer-Verlag, Heidelberg (2007)
11. Kassay, G., Kolumbán, J.: On a generalized sup-inf problem. J. Optim. Theory
Appl. 91, 651–670 (1996)
12. Khanh, P.Q., Quan, N.H.: General existence theorems, alternative theorems and
applications to minimax problems. Nonlinear Anal. 72, 2706–2715 (2010)
13. Kenmochi, N.: Monotonicity and compactness methods for nonlinear variational
inequalities, Handbook of differential equations: stationary partial differential
equations, IV, pp. 203–298. Elsevier/North-Holland, Amsterdam (2007)
14. Kunze, H., La Torre, D., Levere, K., Ruiz Galán, M.: Inverse problems via the “gen-
eralized collage theorem” for vector-valued Lax-Milgram-based variational prob-
lems. Math. Probl. Eng. 8 (2015). Article ID 764643
15. Polyanskiy, Y.: Saddle point in the minimax converse for channel coding. IEEE
Trans. Inf. Theory 59, 2576–2595 (2013)
16. Ricceri, B.: On a minimax theorem: an improvement, a new proof and an overview
of its applications. Minimax Theory Appl. 2, 99–152 (2017)
Minimax Inequalities and Variational Equations 525
17. Ruiz Galán, M.: A concave-convex Ky Fan minimax inequality. Minimax Theory
Appl. 1, 11–124 (2016)
18. Ruiz Galán, M.: The Gordan theorem and its implications for minimax theory. J.
Nonlinear Convex Anal. 17, 2385–2405 (2016)
19. Ruiz Galán, M.: An intrinsic notion of convexity for minimax. J. Convex Anal. 21,
1105–1139 (2014)
20. Ruiz Galán, M.: A version of the Lax-Milgram theorem for locally convex spaces.
J. Convex Anal. 16, 993–1002 (2009)
21. Saint Raymond, J.: A new minimax theorem for linear operators. Minimax Theory
Appl. 3, 131–160 (2018)
22. Simons, S.: Minimax and Monotonicity. Lecture Notes in Mathematics, vol. 1693.
Springer-Verlag, Heidelberg (1998)
23. Simons, S.: Minimax theorems and their proofs. In: Minimax and applications.
Nonconvex Optimization and its Applications, pp. 1–23. Kluwer Academic Pub-
lishers, Dordrecht (1995)
Optimization of Real-Life Integrated Solar
Desalination Water Supply System
with Probability Functions
Bayrammyrat Myradov(&)
Ashgabat, Turkmenistan
bbmrdv@gmail.com
1 Introduction
The United Nations General Assembly set the Sustainable Development Goals in 2015, which cover economic, environmental and social development issues, including but not limited to water, energy and poverty.
One of the most important problems of modernity is reliable water supply to consumers. This problem is especially critical in remote desert areas with distributed low-power consumers. Underground saline water resources could be the source of water for consumers in such areas, but such water must be desalinated. An integrated solar desalination system can be one of the most suitable technological and economic solutions for the production of potable water in remote desert areas with distributed low-power consumers. It is therefore necessary to investigate the attractiveness of investment in an integrated solar desalination water supply system in such areas.
where x = (x_1, x_2, . . . , x_N)^T and X = {x | x_n ∈ [x_n^L, x_n^U], n = 1, . . . , N},
subject to

    P_j(x) := P{x ∈ X : a_{i,j} ≤ g̃_{i,j}(x, ω) ≤ b_{i,j}, i = 1, . . . , I_j},  P_j(x) ≥ α_j,  j = 1, . . . , J.    (2)
3 A Model of System
The model of the System consists of (a) a simulation model of the water supply system (WSS) and (b) a financial-economic model of the System (FEMS). The WSS and the FEMS are considered on monthly (s) and annual (t) time frames, respectively. The indexes s and t correspond to the indexes i and j in Sect. 2.
Here q_{s,t}(x) is the productivity of the solar desalination unit, r_{s,t}(ω) is the rainfall, C^A is the number of Consumers A, and k_n, k^A are the coefficients of collecting rainfall. There are monthly correlations between q_{s,t}(x) and r_{s,t}(ω).
The volume of potable water that can enter the reservoir-storage at demand b_{s,t} is

    z_{s,t}(x, ω) = max{0, min{Q_{s−1,t}(x, ω) − b_{s−1,t}, x_3}},  s = 2, . . . , 12, t = 1, . . . , T,    (4)

with initially z_{1,1} = 0.
The number of runs by the water truck (WT) needed to satisfy the water demand is

    M_{s,t}(x, ω) = 0,  if Q_{s,t}(x, ω) + z_{s,t}(x, ω) ≥ b_{s,t};
    M_{s,t}(x, ω) = m_{s,t}(x, ω),  if Q_{s,t}(x, ω) + z_{s,t}(x, ω) < b_{s,t},    (6)

where m_{s,t}(x, ω) = ⌈(b_{s,t} − Q_{s,t}(x, ω) − z_{s,t}(x, ω))/V⌉ and V is the WT's tank volume. Then the water remaining in the tank of the WT is

    zr_{s,t}(x, ω) = M_{s,t}(x, ω) V − (b_{s,t} − Q_{s,t}(x, ω) − z_{s,t}(x, ω)),  s = 1, . . . , 12, t = 1, . . . , T,    (7)

and

    Z_{s,t}(ω) = min{z_{s,t}(x, ω) + zr_{s,t}(x, ω), x_3},  s = 1, . . . , 12, t = 1, . . . , T.    (8)
Here D_s is the number of days in each month, y_{s,t} is the quantity of Consumers B, and d^A, d^w, d^sf and d^sa are the demand per day per Consumer A and per day per Consumer B in the corresponding season, respectively.
The model (3)–(9) does not allow a water deficit, i.e., demand will always be satisfied thanks to the presence of the WT. This is very important in a remote arid area.
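The bookkeeping of Eqs. (4)–(8) can be sketched as a one-year simulation; the production and demand vectors and the constant monthly demand below are illustrative stand-ins, not the paper's data.

```python
import math
import numpy as np

def simulate_year(q, b, x3, V):
    """One-year pass through recursions (4)-(8): q and b are 12-vectors of
    production and demand, x3 is the reservoir capacity, V the truck tank.
    Returns the total number of truck runs (sketch with illustrative inputs)."""
    Z = 0.0                 # water stored at the start of the month, eqs. (4)/(8)
    runs = 0
    for s in range(12):
        supply = q[s] + Z
        if supply >= b[s]:
            M = 0                                   # eq. (6): no truck needed
            zr = 0.0
        else:
            M = math.ceil((b[s] - supply) / V)      # eq. (6): runs needed
            zr = M * V - (b[s] - supply)            # eq. (7): leftover in tank
        runs += M
        carry = max(0.0, supply - b[s])             # eq. (4): surplus carried over
        Z = min(carry + zr, x3)                     # eq. (8): capped by reservoir
    return runs

q = np.array([30, 35, 50, 70, 90, 110, 120, 115, 90, 60, 40, 30], float)
b = np.full(12, 80.0)       # illustrative constant monthly demand, m^3
print(simulate_year(q, b, x3=145.6, V=10.0))
```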
Here c_0 is the initial investment not related to x_n, and c_n are the initial investments related to x_n.
The annual net cash flows NCF_t(x, ω) after taxes at rate r_tax are paid are:
Here e_{0,t} is the independent yearly expenditure, e_{n,t} is the yearly expenditure related to x_n, and e^WT_{s,t} is the expenditure related to the runs of the WT.
Analysis and discussion of this model is the subject of a separate paper in economics.
Subject to:

    0 ≤ x_n ≤ x_n^U.    (15)
The results for objective function (14) are close to one another, although the structure of the solutions differs. This shows that such problems should be solved by several algorithms, after which a deep analysis of the results is required.
The certain difference of the SQA result from the other algorithms can be explained by the flatness of the objective function in the zone of the extremum, by the presence of ravines, and by the many false local extrema caused by the empirical approximations in objective function (14). SQA also behaves "badly" when the objective function has such "bad" peculiarities as flatness, ravines and false extrema. For instance, the numerical experiments with various initial values of x_1 between 5000 m² and 6000 m² have shown that objective function (14) does not reduce its value; when the optimization process begins with x_1 = 4500 m², it reaches 4788 m².
The best result is obtained by the DE algorithm (DEA). Therefore, some further results obtained with this algorithm are discussed below.
The calculations show that the minimal value of (14) is $831077.8 and its maximal value is $1076427.0. The spread of the values of (14) is 29.5%, and 76.9% of the values of (14) lie in [$930000; $1005000].
The numerical experiments also show that a surplus of water beyond the optimized volume of the reservoir-storage (x_3 = 145.6 m³) occurs in 22% of the simulations. The maximal surplus of water in a month can reach 1092.54 m³; it can happen in April with rainfall above 97 mm. The probability of such an event is extremely small, 0.0008%, but it is still important to note that it can occur. The probability of a surplus of water up to 60 m³ is 11.65%; the probability of a surplus above 500 m³ is 0.77%.
It is also very important to know the number of runs by the WT needed to satisfy demand. The minimum is 47 runs and the maximum 224 runs per year. The most probable number is from 100 to 180 runs per year (83%). In January the number of runs can reach 34.
    α̂ = n_s / N_tr.    (18)
This inequality is used to evaluate the chance constraint(s) by computer simulation via the Monte Carlo sampling technique.
This approach to evaluating the chance constraint(s) has several advantages, such as simplicity, independence from the distribution function of the random variables, and the capacity to work with random variables in the technology matrix (TM) and the right-hand side (RHS), as well as with correlated random variables. It is also computationally inexpensive, because we do not need to run all N_tr trials to the end: if during the simulation the number of failures reaches N_f for the given N_tr and α, the evaluation of the chance constraint(s) is finished. Different approaches to estimating N_tr are well known; some of them are discussed in [11].
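The early-stopping evaluation described above can be sketched as follows; the constraint function and the α values are illustrative, and taking N_f = ⌊(1 − α)·N_tr⌋ is an assumption.

```python
import numpy as np

def chance_ok(g, alpha, n_tr, rng):
    """Estimate whether P{g(omega) <= 0} >= alpha by Monte Carlo with early
    stopping: once the number of failures exceeds N_f = floor((1-alpha)*N_tr),
    the constraint cannot be satisfied and sampling stops (sketch)."""
    n_f = int((1.0 - alpha) * n_tr)   # allowed number of failed trials
    fails = 0
    for _ in range(n_tr):
        if g(rng) > 0.0:              # constraint violated on this sample
            fails += 1
            if fails > n_f:           # early exit: alpha already unreachable
                return False
    return True

rng = np.random.default_rng(1)
# illustrative constraint: a N(0,1) load must stay below 2.0
g = lambda r: r.standard_normal() - 2.0
print(chance_ok(g, alpha=0.95, n_tr=10_000, rng=rng))   # P{g <= 0} is about 0.977
print(chance_ok(g, alpha=0.999, n_tr=10_000, rng=rng))  # stops early
```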
The classical DEA is often used for the solution of complex real-life optimization problems, though it has not been theoretically proved that this algorithm possesses global convergence.
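For reference, the classical DE scheme (DE/rand/1/bin) can be sketched as below; the control settings and the sphere test function are illustrative, not those used in the paper.

```python
import numpy as np

def de_minimize(f, bounds, np_=30, F=0.5, CR=0.9, gens=200, seed=0):
    """Classical DE/rand/1/bin sketch (names and settings are illustrative)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    d = len(lo)
    pop = rng.uniform(lo, hi, (np_, d))
    fit = np.array([f(x) for x in pop])
    for _ in range(gens):
        for i in range(np_):
            idx = rng.choice([j for j in range(np_) if j != i], 3, replace=False)
            a, b, c = pop[idx]
            v = np.clip(a + F * (b - c), lo, hi)          # mutation
            cross = rng.random(d) < CR
            cross[rng.integers(d)] = True                 # at least one gene
            u = np.where(cross, v, pop[i])                # binomial crossover
            fu = f(u)
            if fu <= fit[i]:                              # greedy selection
                pop[i], fit[i] = u, fu
    return pop[fit.argmin()], fit.min()

x, fx = de_minimize(lambda x: float(np.sum(x**2)), [(-5, 5)] * 3)
print(fx < 1e-4)
```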
With the purpose of conducting a comparative analysis and providing additional information to the decision maker, problem (3)–(16) has been solved for α = 0.95, α = 0.99 and α = 0.999999 by the classical DEA. Some results of these calculations are presented in Table 2. These results show the peculiarities caused by the presence of chance constraint (16); above all, they convincingly show the importance of considering chance constraint (16).
Problem (3)–(15), (20) is a joint chance-constrained programming problem (JCCPP). As a rule, this problem is more complicated than the previous one, but it is more practice-relevant than the cases considered above because it requires satisfaction of chance constraint (20) over the whole time horizon T.
Using the classical DEA to solve this JCCPP was very time consuming, and there is no guarantee that this algorithm possesses global convergence. A modified DEA with global convergence in probability was proposed in [11], and this algorithm was used to solve problem (3)–(15), (20). It was proved in [12] that DEA with mutation vectors modified by the subspace clustering mutation operator possesses global convergence.
The modified mutation vectors v^M_{n,i_NP,g} are generated as [11]:

    v^M_{n,i_NP,g} = v_{n,i_NP,g},  if r_1 = rand(1, ⌊NP(1 + Rb)⌋) ≤ NP;
    v^M_{n,i_NP,g} = x_{n,Rbtop,g} + rand(0, 1)·(x_{n,b1,g} − x_{n,b2,g}),  otherwise,    (21)

where NP is the population size; Rb is the increasing factor of the random-integer r_1 region; x_{n,Rbtop,g} is an individual selected by random sampling from the top Rb of the gth population; and x_{n,b1,g} and x_{n,b2,g} are two boundary individuals, each element of which is equal to the upper or lower boundary value with equal probability.
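A literal reading of (21) might look like the following sketch; the helper names, the use of the population's componentwise min/max as stand-ins for the box bounds, and the assumption that the population is sorted best-first are all illustrative.

```python
import numpy as np

def modified_mutant(pop, v_classic, Rb, rng):
    """Sketch of mutation (21): keep the classical mutant with probability
    NP/(NP*(1+Rb)); otherwise combine a random top-Rb individual with a
    scaled difference of two random boundary individuals."""
    NP, d = pop.shape                              # pop assumed sorted best-first
    r1 = rng.integers(1, int(NP * (1 + Rb)) + 1)   # random integer in [1, NP(1+Rb)]
    if r1 <= NP:
        return v_classic
    top = pop[rng.integers(max(1, int(np.ceil(Rb * NP))))]  # from the top Rb share
    lo, hi = pop.min(axis=0), pop.max(axis=0)      # stand-ins for the box bounds
    b1 = np.where(rng.random(d) < 0.5, lo, hi)     # boundary individual 1
    b2 = np.where(rng.random(d) < 0.5, lo, hi)     # boundary individual 2
    return top + rng.random(d) * (b1 - b2)

rng = np.random.default_rng(0)
pop = rng.random((10, 4))               # assume rows already sorted by fitness
v0 = np.zeros(4)
print(modified_mutant(pop, v0, Rb=0.5, rng=rng).shape)
```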
The latest five results of solving the problem at α = 0.99 and N_tr = 10^5 are presented in Table 3.
The classical approach has been used for the control parameters of the DEA: the population size NP, the mutation factor, and the crossover rate.
These results show that the structure of the decision variables changes considerably over the latest five best iterations, although the objective function changes by only 2.2%. This means the decision maker should pay attention not only to the value of the objective function but also to the structure of the decision variables.
5 Conclusions
A model of the System, which consists of (a) a stochastic simulation model of the water supply system and (b) a financial-economic model of the System, is developed.
Three stochastic programming problems are formulated and solved.
The first of these problems is a classical stochastic programming problem with the objective function in mathematical-expectation form. This problem was solved by four different algorithms because objective function (14) has flatness, ravines, and many local, in particular false, extrema. This problem also helps one to understand the necessity of formulating a chance-constrained programming problem, because the numerical experiments show a surplus of water. At the same time, this problem has another practical implication: what can we do with the surplus of water; perhaps it is another source of revenue? This is a task for another study.
In this study, however, we do not want to have a surplus of water with certain probabilities. Therefore, the second problem was formulated as a combined CCPP. This problem was solved by the classical DEA together with the Monte Carlo sampling technique for evaluating the chance constraint(s).
The third problem was formulated as a JCCPP with a tightened constraint on the surplus of water. This problem was solved by the modified DEA together with the Monte Carlo sampling technique for evaluating the chance constraint(s).
The results of the numerical experiments on these three problems demonstrate:
• the attractiveness of investment in business projects related to the organization of sustainable living activity for a small community in remote desert areas by using
Appendix
The codes were written in FORTRAN 95. The calculations were executed on a PC with an Intel Core 2 Duo CPU at 2.2 GHz and 3 GB RAM, under Windows 7 (32-bit).
The input data and relations are the following:

    N = 4; T = 10; x^U_n = {7500 m², 2000 m², 2850 m³, 3500}; k_n = {1.3, 1.0, 0.2, 1.1};
    c_0 = $50000; dr = 5%; r_tax = 2.5%; k^CB_s = {1, 1, 1.05, 1.1, 1.15, 1.2, 1.25, 1.3, 1.4, 1.3, 1.2, 1.1};
    C^A = 10; c_n = {$105/m², $5/m², $0.25/L, $170}; d^A = 100 L; k^A = 100; V = 10 m³;
    d^w = 3 L; d^sa = 4 L; d^sf = 5 L; e_{n,t} = {$7.35/m², $0.15/m², $0.0125/L, $12.1 per Consumer B};
    e_{0,t} = $2500 ∀t; e^WT_{1,1} = $575; k^WT_t = {1.02, 1.01, 1.0, 1.01, 1.01, 1.01, 1.01, 1.01, 1.02}, t = 2, . . . , T;
    a4_{s,t} = 127.5 k^C_t ∀s, k^C_t = k^WT_t, t = 2, . . . , T; a4_{s,1} = 0 ∀s;
    e^WT_{s,t} = k^WT_t e^WT_{s,t−1}, s = 1, . . . , 12, t = 2, . . . , T; y_{1,1} = x_4; y_{s,t} = k^CB_s y_{s−1,t}, s = 2, . . . , 12;
    SRC_s = {0.92, 0.9, 0.87, 0.85, 0.81, 0, 0, 0, 0, 0.87, 0.91, 0.93};
    D_s = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31}.
References
1. Ermoliev, Y.: Methods of Stochastic Programming. Nauka, Moscow (1976). (In Russian)
2. Prékopa, A.: Stochastic Programming. Kluwer, Dordrecht, Boston (1995)
3. Birge, J.R., Louveaux, F.: Introduction to Stochastic Programming. Springer-Verlag, New
York (1997)
4. Kall, P., Mayer, J.: Stochastic Linear Programming: Models, Theory, and Computation.
Springer, New York (2005)
5. Shapiro, A., Dentcheva, D., Ruszczyński, A.: Lectures on Stochastic Programming:
Modeling and Theory. MPS-SIAM Series on Optimization, Philadelphia (2009)
6. Kibzun, A.I., Kan, Yu.S.: Stochastic Programming Problems with Probability Criteria.
Fizmatlit, Moscow (2009). (in Russian)
7. Mirzoahmedov, F., Uryasyev, S.P.: Adaptive stepsize rule for the stochastic optimization
algorithm. Comput. Math. Math. Phys. 23(6), 1314–1325 (1983). (in Russian)
8. Price, K.V., Storn, R.M., Lampinen, J.A.: Differential Evolution: A Practical Approach to
Global Optimization. Springer, Berlin (2005)
9. Chen, X., Li, Y.: On convergence and parameter selection of an improved particle swarm
optimization. Int. J. Control Autom. Syst. 6(4), 559–770 (2008)
10. Zhigljavsky, A., Zilinskas, A.: Stochastic Global Optimization. Springer, Berlin (2008)
11. Myradov, B.: Optimization of stochastic problems with probability function via differential
evolution. http://www.optimization-online.org/DB_HTML/2017/11/6341.html
12. Hu, Z., Xiong, S., Wang, X., Su, Q., Liu, M., Chen, Z.: Subspace clustering mutation
operator for developing convergent differential evolution algorithm. Math. Probl. Eng.
Article ID 154626, 1–18 (2014)
13. Seyitkurbanov, S., Fateeva, G.S., Ryhlov, A.B., Sergeev, V.A.: Collection of atmospheric
precipitation in autonomous heliocomplex. Probl. Desert Dev. 4, 74–76 (1984)
14. Devroye, L.: Non-uniform Random Variate Generation. Springer, New York (1986)
Social Strategy of Particles in Optimization
Problems
Bożena Borowska(&)
1 Introduction
The inspiration for the standard PSO model, created by Kennedy and Eberhart [1, 2], was the natural environment of swarms of insects such as bees [3]. In practice, the PSO method works on a population (called a "swarm") of random individuals, each of which is represented as a point in the search space. Members of the population are called "particles". The particles search the space to find the optimum. In an n-dimensional search space, each particle is represented by two n-dimensional vectors Xj = (xj1, xj2, …, xjn) and Vj = (vj1, vj2, …, vjn). Vector Xj describes the location of the particle in the search space, whereas vector Vj represents the velocity of the particle, according to which the particle changes its position. Besides location and velocity, particles also possess memory: they remember their best previously visited locations as Pbj = (pbj1, pbj2, …, pbjn). The best location among all the particles of the swarm is remembered as Gb = (gb1, gb2, …, gbn). In each iteration, the velocities according to which the particles search the space are updated by the following formula:
    Vj(l + 1) = w·Vj(l) + c1·r1·(Pbj − Xj(l)) + c2·r2·(Gb − Xj(l))    (1)
    Xj(l + 1) = Xj(l) + Vj(l + 1)    (2)
where:
w is a parameter called the inertia weight,
Pbj is the best location of particle j,
Gb is the best location in the whole swarm,
r1, r2 are random values generated from (0, 1),
c1, c2 are acceleration coefficients.
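The update rules (1)–(2) can be sketched as a complete loop; the search range, swarm settings and the sphere test function below are illustrative choices.

```python
import numpy as np

def pso_minimize(f, dim, S=40, iters=1000, c1=2.0, c2=2.0, seed=0):
    """Standard PSO per Eqs. (1)-(2), with inertia w decreasing 0.9 -> 0.4."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-5.0, 5.0, (S, dim))       # positions
    V = np.zeros((S, dim))                     # velocities
    Pb, Pb_val = X.copy(), np.array([f(x) for x in X])
    g = Pb_val.argmin()
    Gb, Gb_val = Pb[g].copy(), Pb_val[g]
    for l in range(iters):
        w = 0.9 - 0.5 * l / (iters - 1)        # linearly decreasing inertia
        r1, r2 = rng.random((S, dim)), rng.random((S, dim))
        V = w * V + c1 * r1 * (Pb - X) + c2 * r2 * (Gb - X)   # Eq. (1)
        X = X + V                                              # Eq. (2)
        vals = np.array([f(x) for x in X])
        better = vals < Pb_val
        Pb[better], Pb_val[better] = X[better], vals[better]
        g = Pb_val.argmin()
        if Pb_val[g] < Gb_val:
            Gb, Gb_val = Pb[g].copy(), Pb_val[g]
    return Gb, Gb_val

gb, gv = pso_minimize(lambda x: float(np.sum(x**2)), dim=10)
print(gv)
```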
3 SoPSO Method
In the presented SoPSO method, particle swarm optimization with an efficient, nonlinear strategy for the acceleration coefficient is proposed. The novelty of this approach consists in the introduction of a new method for calculating the acceleration coefficients. In each iteration, while searching the space of possible solutions, the particles constantly change their locations. Each new position is assessed, and the particles with the minimal and maximal fitness are remembered. On this basis, and based on the number of iterations, the new acceleration coefficient is calculated. According to this strategy, the acceleration coefficients change dynamically. This allows the algorithm to specify more precisely the search direction and the velocity with which it travels in the search space to discover the optimal solution of the considered problem. The strategy is described by the following Eqs. (3, 4):
    Par = g·(f_min − f(Xj(l)))·100·Iter/(f_max·Iter_max)    (3)
    Vj(l + 1) = w·Vj(l) + c1·r1·(Pbj − Xj(l)) + r2·Par·(Gb − Xj(l))    (4)
where Iter and Iter_max represent the current and maximal numbers of iterations, f_max (f_min) denotes the current maximal (minimal) fitness, and g is a random real number between 0 and 1.
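Because the printed form of Eq. (3) is ambiguous in this reproduction, the operator placement below is an assumption; the sketch only shows how Par would feed the social term of Eq. (4), with all numeric values illustrative.

```python
import numpy as np

def social_par(g, f_min, f_xj, f_max, it, it_max):
    """One reading of Eq. (3) (operator placement is an assumption): the
    social coefficient scales with the fitness spread and iteration count."""
    return g * (f_min - f_xj) * 100.0 * it / (f_max * it_max)

rng = np.random.default_rng(0)
# velocity update of Eq. (4) for one particle (illustrative values)
w, c1 = 0.7, 2.0
x, v, pb, gb = np.zeros(3), np.ones(3), np.full(3, 0.5), np.full(3, 1.0)
par = social_par(rng.random(), f_min=1.0, f_xj=2.5, f_max=4.0, it=300, it_max=2000)
v_new = w * v + c1 * rng.random(3) * (pb - x) + rng.random(3) * par * (gb - x)
print(v_new.shape)
```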
4 Test Results
The proposed method was tested on a set of ten benchmark functions, four of which are presented in this article and included in Table 1. The results of the tests were compared with the performance of standard PSO and the improved IPSO described in [24].
The algorithms worked with a linearly decreasing inertia weight, which changed from w = 0.9 to 0.4. The experiments were run for three different dimensions n = 10, 20 and 30. The swarm consisted of S = 20, 40 or 80 particles. The acceleration coefficient used in the computations was c1 = 2.0. The total number of iterations depended on the dimension and was equal to 1000, 1500, and 2000, respectively. For each experiment, 50 runs were conducted.
The exemplary results for the Griewank, Rastrigin, Zakharov and Rosenbrock functions for S = 20, 40 and 80 particles in the swarm are shown in Tables 2–5. All settings regarding the IPSO algorithm were adopted from Jiang et al. [24].
Table 2. Mean function values of Griewank function (for 20, 40, 80 particles).

Population size | Dimension | Number of iterations | Average PSO | Average IPSO | Average SoPSO
20 | 10 | 1000 | 9.20e−002 | 7.84e−002 | 6.15e−002
20 | 20 | 1500 | 3.17e−002 | 2.36e−002 | 1.79e−002
20 | 30 | 2000 | 4.82e−002 | 1.65e−002 | 1.12e−002
40 | 10 | 1000 | 7.62e−002 | 6.48e−002 | 4.63e−002
40 | 20 | 1500 | 2.27e−002 | 1.82e−002 | 1.25e−002
40 | 30 | 2000 | 1.53e−002 | 1.51e−002 | 9.31e−003
80 | 10 | 1000 | 6.58e−002 | 5.94e−002 | 4.79e−002
80 | 20 | 1500 | 2.22e−002 | 9.10e−003 | 5.38e−003
80 | 30 | 2000 | 1.21e−002 | 4.00e−004 | 5.30e−005
Table 3. Mean function values of Rastrigin function (for 20, 40, 80 particles).

Population size | Dimension | Number of iterations | Average PSO | Average IPSO | Average SoPSO
20 | 10 | 1000 | 5.21e+000 | 3.29e+000 | 3.31e+000
20 | 20 | 1500 | 2.28e+001 | 1.64e+001 | 1.47e+001
20 | 30 | 2000 | 4.93e+001 | 3.50e+001 | 3.61e+001
40 | 10 | 1000 | 3.57e+000 | 2.62e+000 | 2.08e+000
40 | 20 | 1500 | 1.73e+001 | 1.49e+001 | 1.03e+001
40 | 30 | 2000 | 3.89e+001 | 2.78e+001 | 2.75e+001
80 | 10 | 1000 | 2.38e+000 | 1.71e+000 | 1.25e+000
80 | 20 | 1500 | 1.29e+001 | 7.67e+000 | 6.03e+000
80 | 30 | 2000 | 3.00e+001 | 1.39e+001 | 9.89e+000
The exemplary charts representing the mean best fitness over the iterations for the SoPSO, IPSO and SPSO algorithms are depicted in Figs. 1–4.
Table 4. Mean function values of Rosenbrock function (for 20, 40, 80 particles).

Population size | Dimension | Number of iterations | Average PSO | Average IPSO | Average SoPSO
20 | 10 | 1000 | 4.26e+001 | 1.05e+001 | 9.67e+000
20 | 20 | 1500 | 8.73e+001 | 7.57e+001 | 7.34e+001
20 | 30 | 2000 | 1.33e+002 | 9.98e+001 | 1.02e+002
40 | 10 | 1000 | 2.44e+001 | 1.24e+000 | 8.12e−001
40 | 20 | 1500 | 4.77e+001 | 8.73e+000 | 6.49e+000
40 | 30 | 2000 | 6.66e+001 | 1.47e+001 | 1.28e+001
80 | 10 | 1000 | 1.53e+001 | 1.92e−001 | 1.43e−001
80 | 20 | 1500 | 4.06e+001 | 1.58e+000 | 1.21e+000
80 | 30 | 2000 | 6.34e+001 | 1.54e+000 | 1.15e+000
Table 5. Mean function values of Zakharov function (for 20, 40, 80 particles).

Population size | Dimension | Number of iterations | Average PSO | Average IPSO | Average SoPSO
20 | 10 | 1000 | 1.3499e−003 | 1.1410e−003 | 1.0026e−004
20 | 20 | 1500 | 2.3139e+002 | 2.1745e+002 | 1.8914e+002
20 | 30 | 2000 | 5.7685e+002 | 5.4051e+002 | 5.1132e+002
40 | 10 | 1000 | 1.4368e−005 | 1.2513e−005 | 1.1907e−006
40 | 20 | 1500 | 1.7789e+002 | 1.4526e+002 | 1.3215e+002
40 | 30 | 2000 | 3.9224e+002 | 3.3972e+002 | 3.0422e+002
80 | 10 | 1000 | 2.4349e−008 | 2.1905e−008 | 1.9418e−009
80 | 20 | 1500 | 8.7169e+001 | 4.9795e+001 | 2.7951e+001
80 | 30 | 2000 | 2.3817e+002 | 1.7842e+002 | 1.5376e+002
Fig. 1. The mean best fitness for Zakharov function (for 80 particles and 30 dimensions).
Fig. 2. The mean best fitness for Griewank function (for 80 particles and 30 dimensions).
Fig. 3. The mean best fitness for Rastrigin function (for 80 particles and 30 dimensions).
Fig. 3. The mean best fitness for Rastrigin function (for 80 particles and 30 dimensions).
1.0E+04
SoPSO
1.0E+03 SPSO
Average Best Fitness
IPSO
1.0E+02
1.0E+01
1.0E+00
1.0E-01
0 500 1000 1500 2000
Iterations
Fig. 4. The mean best fitness for Rosenbrock function (for 80 particles and 30 dimensions).
In most cases of the studied functions, the mean function values achieved by SoPSO were lower than the results found by the remaining algorithms (Tables 2–5), and only in a few cases were they comparable to those obtained by IPSO. The SPSO algorithm obtained worse results than IPSO and was more often trapped in local minima.
In the standard PSO, the particles of the swarm focus exclusively on sharing knowledge about their personal best found position and the best position discovered by the swarm; other information about the particles' achievements is irrelevant and not taken into account.
5 Conclusion
References
1. Kennedy, J., Eberhart, R.C.: Particle swarm optimization. In: IEEE International Conference
on Neural Networks, pp. 1942–1948. Perth, Australia (1995)
2. Chomatek, L., Duraj, A.: Multiobjective genetic algorithm for outliers detection. In: IEEE
International Conference on Innovations in Intelligent Systems and Applications (INISTA),
pp. 379–384. Gdynia, Poland (2017)
3. Robinson, J., Rahmat-Samii, Y.: Particle swarm optimization in electromagnetics. IEEE
Trans. Antennas Propag. 52(2), 397–407 (2004)
4. Kennedy, J., Eberhart, R.C., Shi, Y.: Swarm Intelligence. Morgan Kaufmann Publishers, San
Francisco (2001)
5. Borowska, B.: Nonlinear inertia weight in particle swarm optimization. In: 12th International
Scientific and Technical Conference on Computer Sciences and Information Technologies
(CSIT), pp. 296–299. IEEE, Ukraine (2017)
6. Ratnaweera, A., Halgamuge, S.K., Watson, H.C.: Self-organizing hierarchical particle swarm
optimizer with time-varying acceleration coefficients. IEEE Trans. Evol. Comput. 8(3), 240–
255 (2004)
7. Mashhadban, H., Kutanaei, S.S., Sayarinejad, M.A.: Prediction and modeling of mechanical
properties in fiber reinforced self-compacting concrete using particle swarm optimization
algorithm and artificial neural network. Constr. Build. Mater. 119, 277–287 (2016)
8. Soszyński, F., Wołowski, J., Stasiak, B.: Music games: as a tool supporting music education.
In: Proceedings of the Conference on Game Innovations, CGI 2016, pp. 116–132 (2016)
9. Yang, X., Yuan, J., Yuan, J., Mao, H.: A modified particle swarm optimizer with dynamic
adaptation. Appl. Math. Comput. 189, 1205–1213 (2007)
10. Nouaouria, N., Boukadoum, M., Proulx, R.: Particle swarm classification: a survey and
positioning. Pattern Recognit. 46, 2028–2044 (2013)
11. Kiranyaz, S., Ince, T., Gabbouj, M.: Multidimensional Particle Swarm Optimization for
Machine Learning and Pattern Recognition. Adaptation, Learning, and Optimization, vol. 15.
Springer-Verlag, Berlin (2014)
12. Ling, S.H., Chan, K.Y., Leung, F.H., Jiang, F., Nguyen, H.: Quality and robustness
improvement for real world industrial systems using a fuzzy particle swarm optimization.
Eng. Appl. Artif. Intell. 47, 68–80 (2016)
13. Jordehy, A.R., Jasni, J.: Parameters selection in particle swarm optimisation: a survey.
J. Exp. Theor. Artif. Intell. 25, 527–542 (2013)
14. Eberhart, R.C., Shi, Y.: Particle swarm optimization: developments, applications and
resources. In: Proceedings of IEEE International Conference on Evolutionary Computation,
pp. 81–86 (2001)
15. Shi, Y., Eberhart, R.C.: A modified particle swarm optimizer. In: Proceedings of IEEE
International Conference on Evolutionary Computation, pp. 69–73 (1998)
16. Shi, Y., Eberhart, R.C.: Parameter selections in particle swarm optimization. In: Proceedings
of the 7th International Conference on Evolutionary Programming, pp. 591–600. New York
(1998)
17. Shi, Y., Eberhart, R.C.: Empirical study of particle swarm optimization. In: Proceedings of
the Congress on Evolutionary Computation, vol. 3, pp. 1945–1950 (1999)
18. He, S., Wu, Q.H., Wen, J.Y., Saunders, J.R., Paton, R.C.: A particle swarm optimizer with
passive congregation. Biosystems 78, 135–147 (2004)
19. Borowska, B.: An improved particle swarm optimization algorithm with repair procedure.
Adv. Intell. Syst. Comput. 512, 1–16. Springer International Publishing (2017)
20. Eberhart, R.C., Shi, Y.: Evolving artificial neural networks. In: Proceedings of the International
Conference on Neural Networks and Brain, pp. 5–13. Beijing, China (1998)
21. Borowska, B.: Exponential inertia weight in particle swarm optimization. Adv. Intell.
Syst. Comput. 524, 265–275. Springer International Publishing (2017)
22. Clerc, M., Kennedy, J.: The particle swarm—explosion, stability, and convergence in a
multidimensional complex space. IEEE Transact. Evolut. Comput. 6, 58–73 (2002)
23. Fan, H.Y.: A modification to particle swarm optimization algorithm. Eng. Comput.
19(8), 970–989 (2002)
24. Jiang, Y., Hu, T., Huang, C., Wu, X.: An improved particle swarm optimization algorithm.
Appl. Math. Comput. 193, 231–239 (2007)
25. Borowska, B.: Novel algorithms of particle swarm optimization with decision criteria.
J. Exp. Theor. Artif. Intell. 30, 615–635 (2018)
26. Robinson, J., Sinton, S., Rahmat-Samii, Y.: Particle swarm, genetic algorithm, and their
hybrid: optimizations of a profiled corrugated horn antenna. In: Proceedings of IEEE
International Symposium on Antennas and Propagation, vol. 1, pp. 314–317. San Antonio,
USA (2002)
27. Garg, H.: A hybrid PSO-GA algorithm for constrained optimization problems. Appl.
Mathem. Computat. 274, 292–305 (2016)
28. Dimopoulos, G.G.: Mixed-variable engineering optimization based on evolutionary and
social metaphors. Comput. Method Appl. Mechan. Eng. 196, 803–817 (2007)
29. Ratnaweera, A., Halgamuge, S.K., Watson, H.C.: Self-organizing hierarchical particle
swarm optimizer with time-varying acceleration coefficients. IEEE Trans. Evol. Comput. 8
(3), 204–255 (2004)
30. Liu, Y., Niu, B., Luo, Y.: Hybrid learning particle swarm optimizer with genetic disturbance.
Neurocomputing 151, 1237–1247 (2015)
31. Sheikhalishahi, M., Ebrahimipour, V., Shiri, H., Zaman, H., Jeihoonian, M.: A hybrid GA-
PSO approach for reliability optimization in redundancy allocation problem. Int. J. Adv.
Manuf. Technol. 68, 317–338 (2013)
32. Liu, L., Hu, R.S., Hu, X.P., Zhao, G.P., Wang, S.: A hybrid PSO-GA algorithm for job shop
scheduling in machine tool production. Int. J. Prod. Res. 53, 5755–5781 (2015)
33. Lim, W.H., Isa, N.A.M.: Particle swarm optimizations with dual-level task allocation. Eng.
Appl. Artif. Intell. 38, 88–110 (2015)
34. Abdelhalim, A., Nakata, K., El-Alem, M.: Eltawil, A 2017 Guided particle swarm
optimization method too solve general nonlinear optimization problems. Eng. Comput. 50,
568–583 (2017)
35. Wang, L., Li, L., Liu, L.: An effective hybrid PSOSA strategy for optimization and its
application to parameter estimation. Appl. Mathemat. Computat. 179, 135–146 (2006)
36. Liu, Fl, Zhou, Z.: And improved QPSO algorithm and its application in thee high-
dimensional complex problems. Chemomet. Intellig. Laborat. System 132, 82–90 (2014)
37. Shi, Y., Eberhart, R.C.: Fuzzy adaptive particle swarm optimization. In: Proceedings of the
IEEE Congress on Evolutionary Computation, vol. 1, pp. 101–106. IEEE, South Korea
(2001)
38. Khan, S.A., Engelbrecht, A.P.: A fuzzy particles swarms optimization algorithm for
computer communications network topology design. Appl. Intell. 36, 161–177 (2012)
39. Nobile, M., Cazzaniga, P., Besozzi, D., et al.: Fuzzy self-tuning PSO: at settings-free
algorithm for globals optimization. Swarm Evolution. Computat. 39, 70–85 (2018)
Statistics of Pareto Fronts
1 Introduction
Real phenomena are affected by variability and uncertainty, so that a description considering uncertainties is more realistic than the one provided by deterministic models. In practice, even numerical, model or implementation errors and inaccuracies are sources of uncertainty and variability. Thus, Uncertainty Quantification (UQ) is a field of increasing interest, notably in design, especially for the optimization of systems. In the context of multiobjective optimization, UQ must manipulate objects such as curves, surfaces or, more generally, manifolds, which are objects belonging to infinite-dimensional spaces [1]. The designer may be interested in statistics of the Pareto fronts (for instance, mean, median, variance, confidence intervals) or in their probability distribution: in any case, a sample formed by a significant number of fronts must be generated to produce statistically reliable results. Thus, many calculations are required: a large number of fronts must be generated, while each front is the result of an optimization procedure which must be restarted each time. We present here a procedure for significantly reducing the computational effort, based on the use of Hilbert approximations of the Pareto fronts, typical of chaos expansions. We illustrate the approach using 2D fronts, but it generalizes to higher dimensions.
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 547–556, 2020.
https://doi.org/10.1007/978-3-030-21803-4_55
548 M. Bassi et al.
All these parameterizations generate the same set of circles but, as shown in Fig. 1, the pointwise mean $\mathbb{E}(X(t\,|\,U))$ leads to different results:

$$\begin{pmatrix} \mathbb{E}(X_1) \\ \mathbb{E}(X_2) \end{pmatrix} \to \begin{pmatrix} \mathbb{E}(u_1)\cos t \\ \mathbb{E}(u_1)\sin t \end{pmatrix};\quad \begin{pmatrix} 0 \\ 0 \end{pmatrix};\quad \begin{pmatrix} 0 \\ \mathbb{E}(u_1)\sin t \end{pmatrix}. \tag{2}$$
Fig. 1. Pointwise means (red) furnished by three different representations of the green circles
This problem involves only inequality restrictions, but the approach also applies to the general situation involving mixed equality/inequality restrictions. In order to evaluate statistics of the Pareto front associated with this problem, we generate $n_s$ variates from the random vector $U$: $u_1, u_2, \ldots, u_{n_s}$. For each variate $u_i$, we find the Pareto front associated with:

$$\underset{x\in\mathbb{R}^d}{\text{Minimize}}\ F(x\,|\,U=u_i) = \bigl(F_1(x\,|\,U=u_i),\ F_2(x\,|\,U=u_i)\bigr)$$
It is interesting to notice that our experiments with other distances, such as the $d_{L^2}$ distance and the modified Hausdorff distance $d_{HM}$ [6], led to the same results.
$$\underset{x\in\mathbb{R}^3}{\text{Minimize}}\ \begin{cases} f_1(x) = 1 - \exp\left(-\displaystyle\sum_{i=1}^{3}\left(x_i - \tfrac{1}{\sqrt{3}} + u_i\right)^2\right)\\[6pt] f_2(x) = 1 - \exp\left(-\displaystyle\sum_{i=1}^{3}\left(x_i + \tfrac{1}{\sqrt{3}} + u_i\right)^2\right) \end{cases} \tag{6}$$

under the restrictions $-4 \le x_i \le 4,\ i\in\{1,2,3\}$.
The results are exhibited in Fig. 2, where 200 Pareto fronts corresponding to a sample of size $n_s = 200$ are plotted. In Fig. 2, the median appears in red and the 90% confidence interval in cyan, while the blue curves lie outside the confidence interval.
Fig. 2. Pareto fronts for the Fonseca-Fleming problem under uncertainty. The sample has size $n_s = 200$. The median appears in red, the 90% confidence interval in cyan. Blue curves lie outside the confidence interval.
$$\underset{x\in\mathbb{R}^2}{\text{Minimize}}\ \begin{cases} f_1(x) = x_1 + u_1\\[6pt] f_2(x) = g(x)\left(1 - \sqrt{\dfrac{f_1(x)}{g(x)}} - \left(\dfrac{f_1(x)}{g(x)} + u_2\right)\sin\bigl(10\pi f_1(x)\bigr)\right) \end{cases} \tag{7}$$

under the restrictions $0 \le x_i \le 1,\ i\in\{1,2\}$, for $g(x) = 1 + 9(x_1 + x_2)$.
In Fig. 3, the median Pareto front is the red curve and the 90% confidence interval is the set of cyan fronts, while the blue fronts are the remaining 10% of the elements of the set, those farthest from the median front in the sense of the Hausdorff distance.
Fig. 3. Pareto fronts for the ZDT3 problem under uncertainty. The sample has size $n_s = 200$. The median appears in red, the 90% confidence interval in cyan. Blue curves lie outside the confidence interval.
The preceding examples show that the construction of the median and of the confidence interval may require a large number of Pareto fronts, particularly if high confidence is requested. But each Pareto front results from an optimization procedure, and the whole process may be expensive in terms of computational cost. To accelerate the procedure, we may consider the use of Generalized Fourier Series (GFS): given a relatively small sample of exact Pareto fronts, we may determine an expansion of the functions $F(t\,|\,U)$ corresponding to the Pareto fronts and use it to generate a much larger sample of approximated Pareto fronts.
The process involves two steps. The first one consists in approximating each exact Pareto front by a polynomial of degree $N-1$, whose coefficients are considered exact too. In the second step, another approximation, of the random vector of exact coefficients, is made by using GFS. In the sequel, "ex" refers to "exact" and "app" refers to "approximated".
For instance, let us consider the Pareto front given by the equation $F(t\,|\,U) = \bigl(f_1^{ex}(t\,|\,U),\ f_2^{ex}(t\,|\,U)\bigr)$, $t\in(0,1)$. We may consider the expansion:

$$f_i^{ex}(t\,|\,U) \approx \sum_{j=1}^{N} c_{ij}^{ex}(U)\,\Psi_j(t), \qquad \Psi_j(t) = t^{\,j-1}. \tag{8}$$
$$c_{ij}^{ex}(U) \approx c_{ij}^{app}(U) = \sum_{k=0}^{n_c} d_{ijk}\,\Phi_k(U). \tag{9}$$
$$e_m = \begin{pmatrix} \left\| \mathbb{E}\left( f_1^{ex}(t\,|\,U) - \sum_{j=1}^{N} c_{1j}^{ex}(U)\,t^{\,j-1} \right) \right\|_\infty \\[8pt] \left\| \mathbb{E}\left( f_2^{ex}(t\,|\,U) - \sum_{j=1}^{N} c_{2j}^{ex}(U)\,t^{\,j-1} \right) \right\|_\infty \end{pmatrix} = \begin{pmatrix} 6.4\times 10^{-3} \\ 6.4\times 10^{-3} \end{pmatrix}$$

where $\|\cdot\|_\infty$ refers to the norm defined on $\mathbb{R}^n$ by $\|x\|_\infty = \sup\{|x_i| : 1 \le i \le n\}$.
Now, we generate a sample of 1000 values of $c_{ij}^{app}$ resulting from the GFS expansion of $c_{ij}^{ex}$ and we construct 1000 approximated Pareto fronts given by:

$$f_i^{app}(t\,|\,U) = \sum_{j=1}^{N} c_{ij}^{app}(U)\,t^{\,j-1}, \quad t\in(0,1). \tag{10}$$
Now, after having compared two samples of the same size according to the values in Tables 1 and 2, we generate a sample of $10^5$ values of $c_{ij}^{app}$, which allows us to build $10^5$ approximated Pareto fronts $\bigl(f_1^{app}(t\,|\,U),\ f_2^{app}(t\,|\,U)\bigr)$, $t\in(0,1)$.

In Fig. 4, we present in cyan the new set of $10^5$ approximated Pareto fronts of Fonseca and Fleming and, in black, the mean Pareto front $P_m^{app}$ resulting from the means of the $c_{ij}^{app}$, $1\le i\le 2$, $1\le j\le 7$:

$$P_m^{app} = \left\{ \left( \sum_{j=1}^{N} \overline{c}_{1j}^{\,app}\,t^{\,j-1},\ \sum_{j=1}^{N} \overline{c}_{2j}^{\,app}\,t^{\,j-1} \right) : t\in(0,1) \right\}.$$
Table 1. Relative errors between the correlation coefficients of 1000 values of $c_{ij}^{ex}$ and $c_{ij}^{app}$.
[The table is a symmetric $14\times 14$ matrix with zero diagonal; all off-diagonal relative errors are negative, with magnitudes between about $10^{-13}$ and $10^{-7}$.]
Table 2. Relative errors between the first 4 moments of 1000 values of $c_{ij}^{ex}$ and $c_{ij}^{app}$.
The method presented here allows computing very large samples of a given random object at a very low computational cost, and thus leads to a better estimation of their statistical characteristics. Note that the calculation of a sample of $10^5$ exact Fonseca and Fleming Pareto fronts takes about 215 h, that is, about 9 days of computation, while the sample in Fig. 4 took a few seconds to generate.
Fig. 4. $10^5$ approximated Fonseca and Fleming Pareto fronts (cyan) and their mean (black) obtained by using GFS expansions
4 Concluding Remarks
In the framework of UQ, we are interested in the representation of random variables. Let us consider a couple of random variables $(U, X)$ such that $X = X(U)$, that is, $X$ is a function of $U$. If $X \in V$, where $V$ is a separable Hilbert space associated with the scalar product $(\cdot,\cdot)$, we may consider a convenient Hilbert basis (or total family) $\Phi = \{\varphi_i\}_{i\in\mathbb{N}}$ and look for a representation of $X$ given by [2]:

$$X = X(U) = \sum_{i\in\mathbb{N}} x_i\,\varphi_i(U). \tag{11}$$

If the family is orthonormal, $(\varphi_i, \varphi_j) = \delta_{ij}$ and the coefficients of the expansion are given by $x_i = (X, \varphi_i(U))$. Otherwise, we may consider the approximations of $X$ by finite sums:

$$X \approx P_n X = \sum_{1\le i\le n} x_i\,\varphi_i(U). \tag{12}$$

In this case, the coefficients $x_i$ are the solutions of the linear system $Ax = B$, where $A_{ij} = (\varphi_i, \varphi_j)$ and $B_i = (X, \varphi_i)$. We have:

$$\lim_{n\to\infty} P_n X = X. \tag{13}$$
In UQ, the Hilbert space $V$ is mainly $L^2(\Omega, P)$, where $\Omega \subset \mathbb{R}^n$ and $P$ is a probability measure, with $(Y, Z) = \mathbb{E}(YZ)$. Classical families $\Phi$ are formed by polynomials, trigonometric functions, splines or finite element approximations. Examples of approximations may be found in the literature (see, for instance, [2, 3]). When $X$ is a function of a second variable, for instance $t$, we denote the function $X(t\,|\,U)$ and we have:
$$X(t\,|\,U) = \sum_{i\in\mathbb{N}} x_i(t)\,\varphi_i(U) \approx P_n X(t\,|\,U) = \sum_{1\le i\le n} x_i(t)\,\varphi_i(U). \tag{14}$$
The reader may refer to [4] for more information and MATLAB codes for the evaluation of the coefficients $x_i$, particularly in multidimensional situations. In practice, we use a sample from $X(t\,|\,U)$: $X(t\,|\,U_1), \ldots, X(t\,|\,U_{n_s})$, in order to evaluate the means forming $A$ and $B$.
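As a concrete, hedged illustration of this construction (the cited reference provides MATLAB codes; the snippet below is an independent Python sketch with a hypothetical random variable $X(U)=e^U$ and a plain monomial family), the system $Ax=B$ is assembled from sample means:

```python
import numpy as np

rng = np.random.default_rng(1)

# Represent X = X(U) on a (non-orthonormal) polynomial family phi_i(U) = U**i.
# The coefficients solve A x = B with A_ij = E(phi_i phi_j), B_i = E(X phi_i);
# both expectations are estimated by sample means, as in the text.
U = rng.uniform(-1.0, 1.0, size=20000)
X = np.exp(U)                               # hypothetical random variable X(U)

n = 5                                       # truncation order of the sum
Phi = np.vander(U, n, increasing=True)      # columns phi_0 .. phi_{n-1}
A = (Phi.T @ Phi) / len(U)                  # A_ij ~ E(phi_i phi_j)
B = (Phi.T @ X) / len(U)                    # B_i  ~ E(X phi_i)
coeff = np.linalg.solve(A, B)

# P_n X(U) = sum_i coeff_i phi_i(U) approximates X in the L2 sense.
X_approx = Phi @ coeff
rms = np.sqrt(np.mean((X - X_approx) ** 2))
```

With $n=5$ the truncated sum already reproduces $e^U$ on $[-1,1]$ to sub-percent accuracy, which is the behavior expressed by the limit (13).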
References
1. Croquet, R., Souza de Cursi, E.: Statistics of uncertain dynamical systems. In: Topping, B.H.
V., Adam, J.M., Pallarés, F.J., Bru, R., Romero, M.L. (eds.) Proceedings of the Tenth
International Conference on Computational Structures Technology, Paper 173, Civil-Comp
Press, Stirlingshire, UK (93), pp. 541–561. https://doi.org/10.4203/ccp.93.173 (2010)
2. Bassi, M., Souza de Cursi, E., Ellaia, R.: Generalized Fourier series for representing random
variables and application for quantifying uncertainties in optimization. In: 3rd International
Symposium on Uncertainty Quantification and Stochastic Modeling, Maresias, SP, Brazil,
15–19 February (2016). http://www.swge.inf.br/PDF/USM-2016-0037_027656.PDF
3. Bassi, M.: Quantification d’Incertitudes et Objets en Dimension Infinie. Ph.D. Thesis, INSA
Rouen Normandie, Normandie Université, Saint-Etienne du Rouvray (2019)
4. Souza de Cursi, E., Sampaio, R.: Uncertainty Quantification and Stochastic Modelling with
Matlab. ISTE Press, London, UK (2015)
5. Bassi, M., Souza de Cursi, E., Pagnacco, E., Ellaia, R.: Statistics of the Pareto front in multi-objective optimization under uncertainties. Lat. Am. J. Solids Struct. 15(11), e130. Epub
November 14, 2018. https://doi.org/10.1590/1679-78255018 (2018)
6. Dubuisson, M., Jain, A.K.: A modified Hausdorff distance for object matching. In:
Proceedings of 12th International Conference on Pattern Recognition, October 9–13,
Jerusalem, pp. 566–568, https://doi.org/10.1109/icpr.1994.576361 (1994)
Uncertainty Quantification in Optimization
1 Introduction
Uncertainties are a key issue in engineering design: optimal solutions usually imply no safety margins, but real systems involve uncertainty, variability and errors. For instance, geometry, material parameters, boundary conditions, or even the model itself include uncertainties. To provide safe designs, uncertainty must be considered in the design procedure, and so in optimization procedures.

There are different ways to introduce uncertainty in design: the most popular ones are interval methods, fuzzy variables and probabilistic modeling. Each approach has its particularities, advantages and inconveniences. Here, we focus on the probabilistic approach, which is used in situations where quantitative statistical information about the variability is available; fuzzy approaches are often used when the information about uncertainty is qualitative, and interval approaches do not require information about the statistical properties of the uncertainties.

When using the probabilistic approach, the variability is modeled by random variables. In general, the only assumption about the distributions of the random parameters is the existence of a mean and a variance, i.e., that the random variables are square integrable. The distribution is generally to be calculated: it is one of the unknowns to be determined.
For instance, let us consider the model problem
where $U$ is a random vector. Thus, the optimal solutions $x$ may be sensitive to the variations of $U$. In the case of a significant variability of $u$, standard optimization procedures cannot ensure a requested safety level: for each possible value $u = u(\omega)$, the solution takes the value $x(\omega) = x(u(\omega))$, so that $x$ is a random variable. The determination of its statistical properties and of its distribution is requested in order to control statistical properties or the probabilities of some crucial events, such as failure.
The reader may find in the literature different approaches used to guarantee the safety of the solution: sensitivity analysis, robust optimization, reliability-based optimization, chance-constrained optimization, value-at-risk analysis. None of these approaches furnishes the distribution of $x$: it is necessary to use Monte Carlo simulation or Uncertainty Quantification (UQ) approaches. In general, Monte Carlo requires a larger computational effort, while UQ approaches are more economical.
In a preceding work [1], we considered the determination of the distribution of $x$ in unconstrained optimization. The results extend directly to the situation where $S$ is defined by inequalities. Let us introduce

$$\varphi(u) = \bigl(\varphi_1(u), \ldots, \varphi_{N_X}(u)\bigr)$$

and a matrix $X = \bigl(X_{ij} : 1 \le i \le N_X,\ 1 \le j \le n\bigr)$ such that its line $i$ contains the components of $x_i$:

$$X_{ij} = (x_i)_j, \quad \text{i.e.,} \quad x_i = (X_{i1}, \ldots, X_{in}).$$

Then $Px = \varphi(u)\,X$. The unknowns to be determined are the elements of the matrix $X$. In the sequel, we examine some methods for the determination of $X$.
2.1 Collocation
When a sample $\bigl(x^k, u^k\bigr)$, $1 \le k \le n_s$, of $n_s$ variates from the pair $(x, u)$ is available, we may consider the system of linear equations given by:

$$\varphi(u^k)\,X = x^k.$$

The solution is

$$X_1 = 1/u_1, \qquad X_2 = 1/(u_1 + u_2).$$
We have

$$\mathbb{E}(y^t\,Px) = Y^t\,\mathbb{E}\bigl(\varphi(u)^t \varphi(u)\bigr)\,X, \qquad \mathbb{E}(y^t x) = Y^t\,\mathbb{E}\bigl(\varphi(u)^t x\bigr).$$

Thus, the coefficients $X$ are the solution of the linear system (see Figs. 6, 7 and 8):

$$\mathbb{E}\bigl(\varphi(u)^t \varphi(u)\bigr)\,X = \mathbb{E}\bigl(\varphi(u)^t x\bigr).$$
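A minimal numerical sketch of this linear system follows, with hypothetical choices (a scalar parameter $u$, the monomial family $\varphi(u)=(1,u,u^2,u^3)$, and a made-up two-component optimum $x(u)=(u,u^2)$), estimating both expectations by sample means:

```python
import numpy as np

rng = np.random.default_rng(2)

# Determine the matrix X in Px = phi(u) X from a sample of pairs (x_k, u_k).
u = rng.uniform(0.5, 1.5, size=5000)
x = np.column_stack([u, u**2])             # sample of optima, shape (ns, 2)

Phi = np.vander(u, 4, increasing=True)     # phi(u) = (1, u, u^2, u^3)
# Normal equations  E(phi^t phi) X = E(phi^t x), expectations by sample means.
A = Phi.T @ Phi / len(u)
B = Phi.T @ x / len(u)
Xmat = np.linalg.solve(A, B)               # shape (4, 2)

Px = Phi @ Xmat                            # approximated random optimum
err = np.max(np.abs(Px - x))
```

Since the assumed optimum lies in the span of the family, the recovery here is exact up to round-off; in practice the residual measures the truncation error of the expansion.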
$$M^{e}_{k_1\ldots k_n} = \frac{1}{n_s}\sum_{i=1}^{n_s} x_{i1}^{k_1}\cdots x_{in}^{k_n} \approx \mathbb{E}\bigl(x_1^{k_1}\cdots x_n^{k_n}\bigr)$$

$$M_{k_1\ldots k_n}(X) = \frac{1}{n_s}\sum_{i=1}^{n_s} (Px_i)_1^{k_1}\cdots (Px_i)_n^{k_n} \approx \mathbb{E}\bigl((Px)_1^{k_1}\cdots (Px)_n^{k_n}\bigr), \qquad Px_i = \varphi(u_i)\,X.$$

These equations form a nonlinear system of $(K_M + 1)^n$ equations which must be solved for the $n\,N_X$ unknowns $X$ by an appropriate method. If the number of equations exceeds the number of unknowns, an alternative consists in minimizing a pseudo-distance $\mathrm{dist}\bigl(M(X), M^e\bigr)$. The main difficulty in this approach is obtaining a good quality numerical solution: due to the lack of convexity, the minimization of $\mathrm{dist}\bigl(M(X), M^e\bigr)$ is a global optimization problem.
Let us illustrate this approach by using Rosenbrock's function. Let $n = 2$ and consider a sample of 64 values of $u$, corresponding to 8 random values of each variable $u_i$. We consider $K_M = 5$ and we minimize the mean square norm $\|M(X) - M^e\|$. For $X$ exactly determined, we obtain a relative error of 1.0%. By using a uniform grid of $8\times 8$ values of $u$, the relative error is 0.8%. When 5% errors are introduced in the values of $X$, the relative error is 1.0% for a sample of random values and 2.0% for a uniform grid. When considering $u$ as a pair of independent normal variables having mean 1.5 and standard deviation 0.25, the relative error is 1.6%. An example of result is shown in Fig. 9.
Since $Px^{(p+1)} = \varphi(u)\,X^{(p+1)}$, projecting the iteration $x^{(p+1)} = \Psi\bigl(x^{(p)}\bigr)$ on the family gives:

$$\mathbb{E}\bigl(\varphi(u)^t \varphi(u)\bigr)\,X^{(p+1)} = \mathbb{E}\Bigl(\varphi(u)^t\,\Psi\bigl(\varphi(u)\,X^{(p)}\bigr)\Bigr).$$
564 E. S. de Cursi and R. Holdorf Lopez
The solution of this linear system determines $X^{(p+1)}$ and, thus, $Px^{(p+1)}$. A particularly useful situation concerns iterations where

$$\Psi(x) = x + \Phi(x), \qquad X^{(p+1)} = X^{(p)} + \Delta X^{(p)},$$

where

$$\mathbb{E}\bigl(\varphi(u)^t \varphi(u)\bigr)\,\Delta X^{(p)} = \mathbb{E}\Bigl(\varphi(u)^t\,\Phi\bigl(\varphi(u)\,X^{(p)}\bigr)\Bigr).$$

This approach may be used, for instance, when an implementation of a descent method for problem (1) is available, such as a code implementing projected gradient descent. Then, the code furnishes $\Psi$ and we may adapt it to uncertainty quantification (see Figs. 10, 11 and 12).
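A toy sketch of such an iteration (not the authors' code; the problem, basis and step size below are assumptions): minimize $F(x,u)=(x-u)^2$, whose random optimum is $x(u)=u$, with the descent map $\Psi(x)=x-2a(x-u)$ propagated through the expansion $Px=\varphi(u)X$:

```python
import numpy as np

rng = np.random.default_rng(3)

# Sample of the random parameter and the fixed basis matrix phi(u) = (1, u, u^2).
u = rng.uniform(1.0, 2.0, size=4000)
Phi = np.vander(u, 3, increasing=True)
A = Phi.T @ Phi / len(u)                     # E(phi^t phi) by sample means

a = 0.2                                      # step size of the descent map
X = np.zeros(3)                              # initial expansion coefficients
for _ in range(100):
    Px = Phi @ X                             # current approximated optimum
    update = -2.0 * a * (Px - u)             # Phi-part of Psi(x) = x + Phi(x)
    dX = np.linalg.solve(A, Phi.T @ update / len(u))
    X = X + dX                               # X^(p+1) = X^(p) + Delta X^(p)
```

The iteration contracts toward the coefficients of $x(u)=u$, i.e. $X\approx(0,1,0)$, because each step multiplies the coefficient error by $(1-2a)$.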
Fig. 10. Results for stochastic descent with an error in the distribution of u (N(1.5, 0.25))

Fig. 11. Results for Robbins-Monro iterations with an error in the distribution of u (N(1.5, 0.25)), degree 4
If the optimality conditions may be stated as an algebraic equation

$$\mathcal{E}(x) = 0,$$

such as, for instance, $\nabla F(x) = 0$, then we may use methods for the uncertainty quantification of algebraic equations (see, for instance, [2, 3]). This approach may be applied when $q = 0$ (only inequalities, no equalities). In this case, we may consider

$$m(t, u) = \min_{y\in\mathbb{R}^n}\{r(y, t, u)\}, \qquad r(y, t, u) = \max\bigl\{F(y, u) - t,\ g_1(y, u), \ldots, g_p(y, u)\bigr\},$$

and we have

$$y \notin S(u) \Rightarrow r(y, t, u) > 0.$$

It results from these inequalities that $m$ has a zero at the point $t^\ast = F(x, u)$. Thus, an alternative approach consists in determining a zero $t^\ast$ of $m$ (i.e., $m(t^\ast, u) = 0$). Then $x = \arg\min_{y\in\mathbb{R}^n}\{r(y, t^\ast, u)\}$.
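A hedged sketch of this zero-finding strategy on a hypothetical instance (the objective and restriction below are illustrative assumptions, not from the paper): the inner minimization is done on a grid and the zero of $m$ is located by bisection on $t$.

```python
import numpy as np

# Hypothetical instance: F(y, u) = (y - u)^2 with the single inequality
# restriction g(y, u) = 1 - y <= 0, for the fixed variate u = 0.5.
# The constrained optimum is x = 1 with value F(x, u) = 0.25.
u = 0.5
ys = np.linspace(0.0, 3.0, 20001)           # grid for the inner minimization

def m(t):
    """m(t, u) = min_y max(F(y, u) - t, g(y, u))."""
    r = np.maximum((ys - u) ** 2 - t, 1.0 - ys)
    return r.min()

# m(t) > 0 below t* = F(x, u) and m(t) < 0 above it: bisect for the zero t*.
lo, hi = 0.0, 2.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if m(mid) > 0 else (lo, mid)
t_star = 0.5 * (lo + hi)

# Recover the optimum as the inner minimizer at t*.
x_star = ys[np.argmin(np.maximum((ys - u) ** 2 - t_star, 1.0 - ys))]
```

On this instance the bisection recovers $t^\ast\approx 0.25$ and $x^\ast\approx 1$, up to the grid resolution of the inner minimization.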
3 Concluding Remarks
References
1. Lopez, R.H., De Cursi, E.S., Lemosse, D.: Approximating the probability density function of
the optimal point of an optimization problem. Eng. Optim. 43(3), 281–303 (2011) https://doi.
org/10.1080/0305215x.2010.489607
2. Lopez, R.H., Miguel, L.F.F., De Cursi, E.S.: Uncertainty quantification for algebraic systems
of equations. Comput. Struct. 128, 189–202 (2013) https://doi.org/10.1016/j.compstruc.2013.
06.016
3. De Cursi, E.S., Sampaio, R.: Uncertainty Quantification and Stochastic Modelling with Matlab. ISTE Press, London, UK (2015)
Uncertainty Quantification in
Serviceability of Impacted Steel Pipe
1 Introduction
To assure the urban security in the case of an explosion efforts have to be made
in developing reliability analysis and design methods. Research efforts, stimu-
lated by industrial needs, are still required to achieve this goal. Yi Zhu et al.
in [21] have analyzed oil vapor explosion accident, various causes led to the
explosion, high casualties and severe damages. They are mentioning that debris
usually present dangerous potential hazard, e.g. domino accident. Among possi-
ble affected structures pipelines can play a major role in the domino effect. This
consideration defines the object of the present research. Prediction of a debris
This research is a part of a project AMED, that has been funded with the support
from the European Union with the European Regional Development Fund (ERDF)
and from the Regional Council of Normandie.
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 567–576, 2020.
https://doi.org/10.1007/978-3-030-21803-4_57
568 R. Troian et al.
The perfectly clamped hollow cylindrical steel beam is shown in Fig. 1(a, b). Its characteristics are: length L = 1 m, diameter d = 0.1 m, thickness r = 0.02 m, Young's modulus E = 2.158e11 Pa, density ρ = 7966 kg/m³ and yield stress σy = 2.5e8 Pa.
in the elastic domain are marked with color. For impact position p = 0.1 m, normal stresses are smaller than for p = 0.5 m, as expected. The input parameters of the system that are considered to keep it in the elastic domain are chosen following the values in Fig. 2 (colored stresses) and are given in Table 1.
Fig. 2. Maximum stress values. Stress values in the elastic domain are given by a colorbar. The white area corresponds to the plasticity domain.
The study concentrates on the stochastic nature of the impact. The characteristics of the impact that will be studied are the impact force, duration and position. The variability of the structure material will be considered through variation of the Young's modulus E.

The parameters are supposed to be independent and to have uniform distributions within the limits given in Table 1 for the impulse characteristics. Concerning the Young's modulus, the material of the pipe is assumed to be known, but it can vary slightly due to manufacturing or aging. To take this into account, E has a uniform distribution in the range [1.95e11, 2.2e11] Pa. A sample of size N = 1400 is obtained with Latin hypercube sampling (LHS) [10].
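For reference, a minimal Latin hypercube sampler is easy to write. This is a generic sketch, not the code used in the study; the ranges for force, duration and position are placeholder assumptions, while the range for E follows the text:

```python
import numpy as np

def lhs(n_samples, bounds, rng):
    """Basic Latin hypercube sampling: one point per equal-probability
    stratum in each dimension, with independently shuffled strata."""
    d = len(bounds)
    # Stratified uniform variates in (0, 1), one stratum per sample.
    strata = (np.arange(n_samples)[:, None] + rng.random((n_samples, d))) / n_samples
    for j in range(d):                     # decouple the coordinates
        strata[:, j] = strata[rng.permutation(n_samples), j]
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    return lo + strata * (hi - lo)

rng = np.random.default_rng(4)
# Four inputs as in the study: force, duration, position, Young's modulus
# (the first three ranges are placeholders; E as in the text).
bounds = [(1e3, 1e5), (1e-4, 1e-3), (0.1, 0.9), (1.95e11, 2.2e11)]
sample = lhs(1400, bounds, rng)
```

Compared to plain Monte Carlo, each marginal of the sample is guaranteed to cover all 1400 equal-probability strata exactly once, which reduces the variance of the estimated outputs.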
The output parameters of the model that are influenced by the impact characteristics and are important for the evaluation of the structural integrity are the maximum beam deflection wmax and the maximum stresses σmax, together with the deflection wp and stresses σp at the impact position.
While many methods are available for using the decomposition of variance as a sensitivity measure, the method of Sobol [16] is one of the most established and widely used, and is capable of computing the Total Sensitivity Indices (TSI), which measure the main effect of a given parameter and all the interactions (of any order) involving that parameter. Sobol's method uses the decomposition of variance to calculate the sensitivity indices. The basis of the method is the decomposition of the model output function y = f(x) into summands of variance using combinations of input parameters in increasing dimensionality.
The first-order index Si represents the share of the output variance that is explained by the considered parameter alone. The most important parameters therefore have a high index, but a low one does not mean the parameter has no influence, as it can be involved in interactions. The total index SItot is a measure of the share of the variance that is removed from the total variance when the considered parameter is fixed to its reference value. Therefore, parameters with a low SItot can be considered non-influential.
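As an illustrative sketch (not the implementation used in the paper), first-order and total indices can be estimated with a pick-freeze Monte Carlo scheme, here with Jansen's estimators and a simple additive test function whose exact indices are known:

```python
import numpy as np

def sobol_indices(f, d, n, rng):
    """Monte Carlo estimates of first-order (S_i) and total (ST_i) Sobol
    indices by the pick-freeze scheme with Jansen's estimators."""
    A = rng.random((n, d))
    B = rng.random((n, d))
    fA, fB = f(A), f(B)
    var = np.var(np.concatenate([fA, fB]))
    S, ST = np.empty(d), np.empty(d)
    for i in range(d):
        ABi = A.copy()
        ABi[:, i] = B[:, i]                # resample only coordinate i
        fABi = f(ABi)
        S[i] = (var - 0.5 * np.mean((fB - fABi) ** 2)) / var
        ST[i] = 0.5 * np.mean((fA - fABi) ** 2) / var
    return S, ST

rng = np.random.default_rng(5)
# Additive test model y = x1 + 2*x2 (no interactions): the analytic indices
# are S = ST = (1/12, 4/12) / (5/12) = (0.2, 0.8).
S, ST = sobol_indices(lambda x: x[:, 0] + 2.0 * x[:, 1], 2, 200000, rng)
```

For a model with interactions, ST_i would exceed S_i for the parameters involved; the gap between the two is exactly the interaction share described above.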
Fig. 3. Indices using Sobol’s method for the maximum beam deflection wmax , maxi-
mum stresses σmax , deflection wp and stresses σp at the impact position.
play a major role in the obtained values. Contrariwise the position of the impact
relatively to the boundary conditions and impact amplitude influence strongly
the beam response, as well as impact duration.
a) 0.001 s b) 0.0005 s
Fig. 5. Probability of possible stresses under two impacts with different intervals.
Fig. 6. The cumulative distribution functions (CDFs) of the stress distribution. (1) corresponds to the case of one impact, (2.1) to two impacts with an interval of 0.001 s, (2.2) to two impacts with an interval of 0.0005 s and (3) to three impacts.
5 Conclusion
The paper proposes a stochastic analysis of a structural response under a random impact. A steel pipeline is simulated with a Bernoulli beam model, and impacts are introduced into the system as impulses of rectangular or sinusoidal shape. The present research shows the need to consider not only big/heavy impactors or impactors with high velocity: relatively small and slow impactors can cause plastic deformations and lead to the rupture of a pipe and a subsequent domino effect. The proposed analysis is conducted on a simplified model. Nevertheless, the conclusions on parameter sensitivity give insight into the problem modeling. It was shown that the impactor properties and the impact position are more important for the structural dynamic response than the variation of the structure material. Also, the proposed approach, in which the impactor is introduced into the system through its time-force history, can save time for more complex numerical models. Further studies with detailed 3D modeling are ongoing to detect rupture modes for different kinds of impactors.
Acknowledgment. This research is a part of a project AMED, that has been funded
with the support from the European Union with the European Regional Development
Fund (ERDF) and from the Regional Council of Normandie.
References
1. Abrate, S.: Soft impacts on aerospace structures. Prog. Aerosp. Sci. 81, 1–17 (2016)
2. Alizadeh, A.A., Mirdamadi, H.R., Pishevar, A.: Reliability analysis of pipe convey-
ing fluid with stochastic structural and fluid parameters. Eng. Struct. 122, 24–32
(2016)
3. Andreaus, U., Casini, P.: Dynamics of SDOF oscillators with hysteretic motion-limiting stop. Nonlinear Dyn. 22(2), 145–164 (2000)
4. Antoine, G., Batra, R.: Sensitivity analysis of low-velocity impact response of lam-
inated plates. Int. J. Impact Eng. 78, 64–80 (2015)
5. Fyllingen, Ø., Hopperstad, O., Langseth, M.: Stochastic simulations of square alu-
minium tubes subjected to axial loading. Int. J. Impact Eng. 34(10), 1619–1636
(2007)
6. Hadianfard, M.A., Malekpour, S., Momeni, M.: Reliability analysis of H-section steel columns under blast loading. Struct. Saf. 75, 45–56 (2018)
7. Kelliher, D., Sutton-Swaby, K.: Stochastic representation of blast load damage in
a reinforced concrete building. Struct. Saf. 34(1), 407–417 (2012)
8. Li, Q., Liu, Y.: Uncertain dynamic response of a deterministic elastic-plastic beam.
Int. J. Impact Eng. 28(6), 643–651 (2003)
9. Lönn, D., Fyllingen, Ø., Nilsson, L.: An approach to robust optimization of impact problems using random samples and meta-modelling. Int. J. Impact Eng. 37(6), 723–734 (2010)
10. McKay, M.D., Beckman, R.J., Conover, W.J.: A comparison of three methods for
selecting values of input variables in the analysis of output from a computer code.
Technometrics 42(1), 55–61 (2000)
11. Perera, S., Lam, N., Pathirana, M., Zhang, L., Ruan, D., Gad, E.: Deterministic
solutions for contact force generated by impact of windborne debris. Int. J. Impact
Eng. 91, 126–141 (2016)
12. Ren, Y., Qiu, X., Yu, T.: The sensitivity analysis of a geometrically unstable struc-
ture under various pulse loading. Int. J. Impact Eng. 70, 62–72 (2014)
13. Riha, D., Thacker, B., Pleming, J., Walker, J., Mullin, S., Weiss, C., Rodriguez, E.,
Leslie, P.: Verification and validation for a penetration model using a deterministic
and probabilistic design tool. Int. J. Impact Eng. 33(1–12), 681–690 (2006)
14. Shinohara, Y., Madi, Y., Besson, J.: A combined phenomenological model for the
representation of anisotropic hardening behavior in high strength steel line pipes.
Eur. J. Mech. A Solids 29(6), 917–927 (2010)
15. Shinohara, Y., Madi, Y., Besson, J.: Anisotropic ductile failure of a high-strength
line pipe steel. Int. J. Fract. 197(2), 127–145 (2016)
16. Sobol, I.M.: Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Math. Comput. Simul. 55(1), 271–280 (2001)
17. Timashev, S., Bushinskaya, A.: Methods of assessing integrity of pipeline systems
with different types of defects. In: Diagnostics and Reliability of Pipeline Systems,
pp. 9–43. Springer (2016)
18. Villavicencio, R., Soares, C.G.: Numerical modelling of the boundary conditions
on beams stuck transversely by a mass. Int. J. Impact Eng. 38(5), 384–396 (2011)
19. Van der Voort, M., Weerheijm, J.: A statistical description of explosion produced
debris dispersion. Int. J. Impact Eng. 59, 29–37 (2013)
20. Wagner, H., Hühne, C., Niemann, S., Khakimova, R.: Robust design criterion
for axially loaded cylindrical shells-simulation and validation. Thin-Walled Struct.
115, 154–162 (2017)
21. Zhu, Y., Qian, X.M., Liu, Z.Y., Huang, P., Yuan, M.Q.: Analysis and assessment of the Qingdao crude oil vapor explosion accident: lessons learnt. J. Loss Prev. Process Ind. 33, 289–303 (2015)
Multiobjective Programming
A Global Optimization Algorithm for the
Solution of Tri-Level Mixed-Integer
Quadratic Programming Problems
1 Introduction
Optimization problems that involve three decision makers at three different deci-
sion levels are referred to as tri-level optimization problems. The first decision
maker, also referred to as the leader, solves an optimization problem which
includes in its constraint set another optimization problem solved by a second
decision maker, that it is in turn constraint by a third optimization problem
solved by the third decision maker.
A tri-level problem formulation can be applied to many different applications
in different fields including operations research, process engineering, and man-
agement. Moreover, tri-level problems can involve both discrete and continuous
Supported by Texas A&M Energy Institute, RAPID SYNOPSIS Project (DE-
EE0007888-09-03) and National Science Foundation grant [1739977].
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 579–588, 2020.
https://doi.org/10.1007/978-3-030-21803-4_58
580 S. Avraamidou and E. N. Pistikopoulos
decision variables, as they have been used to formulate supply chain manage-
ment problems [23], safety and defense [1,6,24] or robust optimization [7,13,14]
problems. Mixed-integer tri-level problems have the general form of (1), where
x is a vector of continuous variables, and y is a vector of discrete variables.
$$\begin{aligned}
&\min_{x_1,\,y_1}\ F_1(x, y)\\
&\ \text{s.t.}\ \ G_1(x, y) \le 0\\
&\qquad \min_{x_2,\,y_2}\ F_2(x, y)\\
&\qquad\ \text{s.t.}\ \ G_2(x, y) \le 0 \qquad\qquad (1)\\
&\qquad\qquad \min_{x_3,\,y_3}\ F_3(x, y)\\
&\qquad\qquad\ \text{s.t.}\ \ G_3(x, y) \le 0\\
&x = [x_1^T\ x_2^T\ x_3^T]^T, \quad y = [y_1^T\ y_2^T\ y_3^T]^T\\
&x \in \mathbb{R}^n, \quad y \in \mathbb{Z}^p
\end{aligned}$$
This manuscript is organized as follows. The following subsection presents previous work on solution algorithms for tri-level problems; Sect. 2 presents the class of problems considered and the proposed algorithm; Sect. 3 presents computational studies; and Sect. 4 concludes this manuscript.
where ω is a vector of all decision variables of all decision levels, xi are continuous
bounded decision variables and yi are binary decision variables of optimization level
i, Qi ⪰ 0, ci and cci are constant coefficient matrices in the objective function of
optimization level i, Ai and Ei are constant coefficient matrices multiplying the decision
variables of level i in the constraint set, and b is a constant vector.
Faisca et al. [9] presented an algorithm for the solution of continuous tri-level
programming problems using multi-parametric programming [15]. Avraamidou
and Pistikopoulos [4] expanded on that approach and presented an algorithm for
the solution of mixed-integer linear tri-level problems. The approach presented
here is an extension to these algorithms and tackles the more general mixed-
integer quadratic tri-level problem.
The proposed algorithm will be introduced through the general form of the
tri-level mixed-integer quadratic programming problem (2) and then illustrated
through a numerical example in Subsect. 2.1.
The first step of the proposed algorithm is to recast the third level optimiza-
tion problem as a multi-parametric mixed-integer quadratic programming (mp-MIQP)
problem, in which the optimization variables of the second and first level problems
are considered as parameters (3).
The solution of problem (3) results in the parametric solution (4), which con-
sists of the complete profile of optimal solutions of the third level variables, x3
and y3, as explicit functions of the decision variables of optimization levels one
and two (x1, y1, x2, y2).
x3 = { ξ1 = p1 + q1 [x1T y1T x2T y2T]T   if  H1 [x1T y1T x2T y2T]T ≤ h1,  y3 = r1
       ξ2 = p2 + q2 [x1T y1T x2T y2T]T   if  H2 [x1T y1T x2T y2T]T ≤ h2,  y3 = r2        (4)
       ⋮
       ξk = pk + qk [x1T y1T x2T y2T]T   if  Hk [x1T y1T x2T y2T]T ≤ hk,  y3 = rk
where ξi is the affine function of the third level continuous variables in terms
of the first and second level decision variables, Hi [xT1 y1T xT2 y2T ]T ≤ hi , y3 = ri
is referred to as critical region i, CRi , and k denotes the number of computed
critical regions.
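Operationally, a parametric solution such as (4) is a lookup structure: given values of the upper-level variables, one finds the critical region whose inequalities hold and applies that region's affine law. A minimal sketch, with invented region data (not from the paper):

```python
# Minimal sketch of evaluating a piecewise-affine parametric solution like (4).
# Each critical region CR_i is {theta : H_i theta <= h_i} and carries an
# affine law x3 = p_i + q_i . theta together with a fixed binary part y3 = r_i.
# The region data below is invented for illustration.
regions = [
    {"H": [[1.0, 0.0], [0.0, 1.0]], "h": [0.5, 0.5],
     "p": 1.0, "q": [2.0, -1.0], "r": 0},
    {"H": [[-1.0, 0.0]], "h": [-0.5],
     "p": 0.0, "q": [1.0, 1.0], "r": 1},
]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def evaluate(theta):
    """Locate the critical region containing theta and apply its affine law."""
    for cr in regions:
        if all(dot(row, theta) <= b + 1e-9 for row, b in zip(cr["H"], cr["h"])):
            return cr["p"] + dot(cr["q"], theta), cr["r"]
    raise ValueError("theta lies outside all critical regions")

x3, y3 = evaluate([0.25, 0.25])  # region 1 applies: x3 = 1 + 0.5 - 0.25 = 1.25
```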
The next step is to recast the second level optimization problem into k mp-
MIQP problems, by considering the optimization variables of the first level prob-
lem, x1 , y1 , as parameters and substituting in the corresponding functions ξi of
x3 and y3 . Also, the corresponding critical region definitions are added to the
existing set of second level constraints, as an additional set of constraints for
each problem.
The k formulated problems are solved using POP toolbox, providing the
complete profile of optimal solutions of the second level problem (for an optimal
third level problem), as explicit functions of the decision variables of the first
level problem, x1 and y1 .
The computed parametric solution is in turn used to formulate single-level
reformulations of the upper level problem by substituting the derived critical
region definitions and affine functions of the variables in the leader problem,
forming a single level mixed-integer quadratic programming (MIQP) problem for
each critical region. The single-level MIQP problems are solved with appropriate
algorithms (CPLEX® if convex, and either BARON® [19] or ANTIGONE® [12]
if non-convex).
The final step of the algorithm is a comparison procedure to select the global
optimum solution. This is done by solving the mixed-integer linear problem (5).
z∗ = min_{α, γ}  α
s.t.  α = Σ_{i,j} γi,j zi,j
      Σ_{i,j} γi,j = 1                                        (5)
      γi,j ui,j ≤ γi,j up,q    ∀ i, j,  p ≠ i,  q
      γi,j vi ≤ γi,j vr        ∀ i, j,  r ≠ i
      γi,j ∈ {0, 1}
where z ∗ is the exact global optimum of problem (2), γi,j are binary variables
corresponding to each CRi,j , zi,j are the objective function values obtained when
solving problems in Step 6, ui are the objective function values obtained when
solving problems in Step 4, and vi are the objective function values obtained
when solving problems in Step 2.
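One way to read the selection logic of (5): a candidate may be chosen only if it is consistent with optimality of the lower levels, and among the survivors the smallest leader objective wins. A rough sketch of that filtering under this reading, on illustrative candidate tuples (the values are invented, not from the example):

```python
# Rough sketch of one reading of the comparison step (5): a candidate is kept
# only if its third-level value v_i and second-level value u_ij are optimal for
# the lower levels; among the survivors the smallest leader objective z_ij
# wins. Candidate tuples (i, j, z_ij, u_ij, v_i) are invented for illustration.
candidates = [
    (1, 1, 4.2, 0.9, 1.5),
    (2, 1, 3.1, 0.7, 1.2),
    (2, 2, 5.0, 0.8, 1.2),
    (3, 1, 2.8, 1.4, 2.0),
]

v_min = min(c[4] for c in candidates)               # best third-level value
feasible = [c for c in candidates if c[4] <= v_min + 1e-9]
u_min = min(c[3] for c in feasible)                 # best second-level value
feasible = [c for c in feasible if c[3] <= u_min + 1e-9]
best = min(feasible, key=lambda c: c[2])            # leader's objective decides
print(best)  # (2, 1, 3.1, 0.7, 1.2)
```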
Step 2: Problem (7) is solved using the mp-MIQP solver in the POP® toolbox
[16]. The multi-parametric solution of problem (7) consists of 14 critical regions.
A subset of them is presented in Table 3.
set of constraints. The affine functions of the third level variables, x3 , are sub-
stituted in the problems, along with the value of the binary third level variables.
Finally, decision variables of the first level problem are considered as parame-
ters. The first mp-MIQP formulated corresponds to CR1 and is presented as (8).
Similar problems are formulated for the rest of the critical regions.
min_{x2, y2}  u1,1 = 4x1² + 4y2² + 6y1 − 2x2 + 6y2 − (2.56x1 − 2.88x2 + 4.6) + 5(0)
s.t.  0.6236x1 + 0.7759x2 + 0.0960y2 ≤ 0.1743
      −0.6644x1 − 0.7474x2 ≤ −0.2206
      −y1 − y2 ≤ −1                                           (8)
      x1 ≤ 10
      y1, y2 ∈ {0, 1}
Step 4: All the problems formulated in Step 3 are solved using the mp-MIQP
solvers in the POP® toolbox [16]. The resulting solutions consisted of a total of 22
critical regions. The critical regions corresponding to CR1 and CR6 are presented
in Table 4.
Step 5: The parametric solutions for the second level problem obtained in
Step 4 are used to formulate 22 single level deterministic MIQP problems, each
corresponding to a critical region of the second level problem. Each critical region
definition is added to the first level problem as a new set of constraints, and the
affine functions of the second and third level decision variables are substituted into
the objective, resulting in MIQP problems that involve only the first level variables
x1, y1. The MIQPs formulated from CR1,1 and CR6,1 are presented below as (9)
and (10), respectively.
min_{x1, y1}  z1,1 = 5x1² + 6(−0.8889x1 + 0.2951)² + 3y1 + 3(1)
              − 3(−2.56x1 − 2.88(−0.8889x1 + 0.2951) + 4.6)            (9)
s.t.  2.2792 ≤ x1 ≤ 10,  y1 ∈ {0, 1}
min_{x1, y1}  z6,1 = 5x1² + 6(−0.8049x1 − 0.75)² + 3y1 + 3(0)
              − 3(−165x1 − 205(−0.8049x1 − 0.75) − 25)                 (10)
s.t.  −3.2852 ≤ x1 ≤ 10,  y1 ∈ {0, 1}
Step 6: The 22 single level MIQP problems formulated in Step 5 are solved
using the CPLEX® mixed-integer quadratic programming solver. The resulting
solutions from problems (9) and (10) are presented in Table 5.
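Because each single-level problem of Step 5 involves one continuous and one binary variable, a problem like (9) can also be checked by hand: enumerate y1 and minimize the convex one-dimensional quadratic in x1 in closed form over its box. A sketch of that check (this mimics the structure of Step 6; the paper itself uses CPLEX):

```python
# Hand check of a single-level MIQP of the form (9): enumerate the binary y1
# and, for each fixed y1, minimize the convex 1-D quadratic in x1 in closed
# form over its box.
def z11(x1, y1):
    return (5 * x1**2 + 6 * (-0.8889 * x1 + 0.2951) ** 2 + 3 * y1 + 3 * 1
            - 3 * (-2.56 * x1 - 2.88 * (-0.8889 * x1 + 0.2951) + 4.6))

def min_quadratic(f, lo, hi):
    # recover f(x) = a x^2 + b x + c from three samples, then clip the vertex
    c = f(0.0)
    a = (f(1.0) + f(-1.0) - 2 * c) / 2
    b = (f(1.0) - f(-1.0)) / 2
    x = min(max(-b / (2 * a), lo), hi) if a > 0 else (lo if f(lo) <= f(hi) else hi)
    return x, f(x)

results = []
for y1 in (0, 1):
    x1, z = min_quadratic(lambda x: z11(x, y1), 2.2792, 10.0)
    results.append((z, x1, y1))
z_opt, x1_opt, y1_opt = min(results)
# the quadratic's vertex lies left of the box, so x1 sticks to its lower bound
```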
Step 7: The comparison optimization problem (5) is then solved using the
information in Tables 3, 4 and 5. The exact global optimum lies in CR6,1, with
optimal decisions x1 = −0.4076, y1 = 0, x2 = −0.4220, y2 = 0, x3 = 3.7500, y3 = 1
and y4 = 0.
The computational performance of the algorithm for this numerical example
is presented in Table 6.
3 Computational Studies
A small set of tri-level mixed-integer quadratic problems of different sizes was
solved to investigate the capabilities of the proposed algorithm. The randomly
generated problems have the general mathematical form of (2) and all variables
appear in all three optimization levels. Table 6 presents the studied problems,
where XT denotes the total number of continuous variables of the tri-level prob-
lem, YT denotes the total number of binary variables of the tri-level problem,
X1 , X2 and X3 denote the number of continuous decision variables of the first,
second and third optimization level respectively, Y1 , Y2 and Y3 denote the num-
ber of binary decision variables of the first, second and third optimization level
respectively, C denotes the total number of constraints in the first, second and
third optimization level, L1, L2, and L3 denote the time required to solve each
optimization level, Com denotes the time required to solve the comparison prob-
lem and CPU denotes the total computational time for each test problem in
seconds.
The computations were carried out on a 2-core machine with an Intel Core
i7 at 3.1 GHz and 16 GB of RAM, MATLAB R2016a, and IBM ILOG CPLEX
Optimization Studio 12.6.3. The test problems presented in Table 6 can be found
on the parametric.tamu.edu website as ‘BPOP TMIQP’.
Table 6. Computational results of the presented algorithm for tri-level MIQP problems
of the general form (2)
4 Conclusions
References
1. Alguacil, N., Delgadillo, A., Arroyo, J.: A trilevel programming approach for elec-
tric grid defense planning. Comput. Oper. Res. 41(1), 282–290 (2014)
2. Avraamidou, S., Pistikopoulos, E.N.: B-POP: Bi-level parametric optimization
toolbox. Comput. Chem. Eng. 122, 193–202 (2018)
3. Avraamidou, S., Pistikopoulos, E.N.: A Multi-Parametric optimization approach
for bilevel mixed-integer linear and quadratic programming problems. Comput.
Chem. Eng. 122, 98–113 (2019)
4. Avraamidou, S., Pistikopoulos, E.N.: Multi-parametric global optimization
approach for tri-level mixed-integer linear optimization problems. J. Global Optim.
(2018)
5. Blair, C.: The computational complexity of multi-level linear programs. Ann. Oper.
Res. 34(1), 13–19 (1992)
6. Brown, G., Carlyle, M., Salmerón, J., Wood, K.: Defending critical infrastructure.
Interfaces 36(6), 530–544 (2006)
7. Chen, B., Wang, J., Wang, L., He, Y., Wang, Z.: Robust optimization for trans-
mission expansion planning: minimax cost vs. minimax regret. IEEE Trans. Power
Syst. 29(6), 3069–3077 (2014)
8. Dua, V., Bozinis, N., Pistikopoulos, E.: A multiparametric programming approach
for mixed-integer quadratic engineering problems. Comput. Chem. Eng. 26(4–5),
715–733 (2002)
9. Faisca, N.P., Saraiva, P.M., Rustem, B., Pistikopoulos, E.N.: A multi-parametric
programming approach for multilevel hierarchical and decentralised optimisation
problems. Comput. Manag. Sci. 6, 377–397 (2009)
10. Han, J., Zhang, G., Hu, Y., Lu, J.: A solution to bi/tri-level programming problems
using particle swarm optimization. Inf. Sci. 370–371, 519–537 (2016)
11. Kassa, A., Kassa, S.: A branch-and-bound multi-parametric programming
approach for non-convex multilevel optimization with polyhedral constraints. J. Global
Optim. 64(4), 745–764 (2016)
12. Misener, R., Floudas, C.: ANTIGONE: algorithms for continuous/integer global opti-
mization of nonlinear equations. J. Global Optim. 59(2–3), 503–526 (2014)
13. Moreira, A., Street, A., Arroyo, J.: An adjustable robust optimization approach
for contingency-constrained transmission expansion planning. IEEE Trans. Power
Syst. 30(4), 2013–2022 (2015)
14. Ning, C., You, F.: Data-driven adaptive nested robust optimization: general mod-
eling framework and efficient computational algorithm for decision making under
uncertainty. AIChE J. 63, 3790–3817 (2017)
15. Oberdieck, R., Diangelakis, N., Nascu, I., Papathanasiou, M., Sun, M., Avraami-
dou, S., Pistikopoulos, E.: On multi-parametric programming and its applications
in process systems engineering. Chem. Eng. Res. Design 116, 61–82 (2016)
16. Oberdieck, R., Diangelakis, N., Papathanasiou, M., Nascu, I., Pistikopoulos, E.:
Pop-parametric optimization toolbox. Ind. Eng. Chem. Res. 55(33), 8979–8991
(2016)
17. Oberdieck, R., Pistikopoulos, E.: Explicit hybrid model-predictive control: the
exact solution. Automatica 58, 152–159 (2015)
18. Oberdieck, R., Diangelakis, N.A., Avraamidou, S., Pistikopoulos, E.N.: On
unbounded and binary parameters in multi-parametric programming: Applications
to mixed-integer bilevel optimization and duality theory. J. Glob. Optim. 69(3),
587–606 (2017)
19. Sahinidis, N.: BARON 17.8.9: Global Optimization of Mixed-Integer Nonlinear
Programs, User’s Manual
20. Sakawa, M., Matsui, T.: Interactive fuzzy stochastic multi-level 0–1 programming
using tabu search and probability maximization. Expert Syst. Appl. 41(6), 2957–
2963 (2014)
21. Sakawa, M., Nishizaki, I., Hitaka, M.: Interactive fuzzy programming for multi-
level 0–1 programming problems through genetic algorithms. Eur. J. Oper. Res.
114(3), 580–588 (1999)
22. Woldemariam, A., Kassa, S.: Systematic evolutionary algorithm for general mul-
tilevel Stackelberg problems with bounded decision variables (SEAMSP). Ann.
Oper. Res. (2015)
23. Xu, X., Meng, Z., Shen, R.: A tri-level programming model based on conditional
value-at-risk for three-stage supply chain management. Comput. Ind. Eng. 66(2),
470–475 (2013)
24. Yao, Y., Edmunds, T., Papageorgiou, D., Alvarez, R.: Trilevel optimization in
power network defense. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 37(4),
712–718 (2007)
A Method for Solving Some Class of
Multilevel Multi-leader Multi-follower
Programming Problems
1 Introduction
Multi-leader-follower games are a class of hierarchical games in which a collection
of leaders compete in a Nash game constrained by the equilibrium conditions of
another Nash game amongst the followers. Generally, when several players in a
game take the position of leaders and the remaining players act as followers,
the game becomes a multi-leader-follower game. The leader-follower Nash
equilibrium, a solution concept for the multi-leader-follower game, can be defined
as a set of leaders' and followers' strategies such that no player (leader or follower)
can improve his status by unilaterally changing his own current strategy.
The early study associated with the multi-leader-follower game and the equilib-
rium problem with equilibrium constraints (EPEC) dates back to 1984 [14].
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 589–599, 2020.
https://doi.org/10.1007/978-3-030-21803-4_59
590 A. B. Zewde and S. M. Kassa
In [7], Kassa and Kassa reformulated the class of multilevel programs
with a single leader and multiple followers, consisting of separable terms and
parameterized common terms across all the followers, into equivalent multilevel
programs having a single follower at each level. The resulting (non-convex)
multilevel problem is then solved by the branch-and-bound multi-parametric
programming strategy that they developed in [6].
In most of the literature reviewed above, the existence of equilibria has been
established mainly for multi-leader-follower games with a specific structure (such as
the bilevel case or the single-leader case) and with constraint sets and/or objective
functions assumed to have nice properties (such as linearity, convexity, differentia-
bility, or separability). In this paper we consider an equivalent reformulation of
a multilevel multi-leader multi-follower programming problem into a multilevel
single-leader multi-follower programming problem with one more level of hier-
archy. Then, for some special classes of problems, the reformulated problem is
transformed into an equivalent multilevel program having only a single follower
at each level of the hierarchy, thereby yielding a solution approach for
multilevel multi-leader-follower games.
2 Problem Formulation
Multilevel programs involving multiple decision makers at each level of the
hierarchy are called multilevel multi-leader multi-follower (MLMLMF) program-
ming problems.
A general k-level multi-leader multi-follower programming problem involving
N leaders and multiple followers at each level can be described by:
min_{y1n ∈ Y1n}  F1n(y1n, y1−n, y2i, y2−i, y3j, y3−j, . . . , ykl, yk−l),   n = 1, . . . , N
s.t.  G1n(y1n, y2i, y2−i, y3j, y3−j, . . . , ykl, yk−l) ≤ 0

where y1n ∈ Y1n is the decision vector of the nth leader's optimization problem
and y1−n is the vector of the decision variables of all leaders except the nth
leader, i.e., y1−n = (y11, . . . , y1n−1, y1n+1, . . . , y1N), n = 1, 2, . . . , N. The
shared constraint H1 is the leaders' common constraint set, whereas the constraint
G1n applies only to the nth leader. Similarly, ymc ∈ Ymc is the decision vector of
the cth follower at level m, and ym−c denotes the decision variables of all followers
at level m other than the cth, where c = i, j, . . . , l and m ∈ {2, 3, . . . , k}. The
shared constraint hm is the common constraint set of the mth-level followers,
whereas the constraint gmc applies only to the cth follower of the mth-level
optimization problem.
3 Equivalent Formulation
s.t.  Gi(xi, yj, y−j) ≤ 0
      H(xi, x−i, yj, y−j) ≤ 0                            (2)
      min_{yj ∈ Yj}  fj(xi, x−i, yj, y−j)
(ii) The feasible set for the j th follower (for any leaders strategy x = (xi , x−i ))
can be defined as
Aj(xi, x−i, y−j) = { yj ∈ Yj : gj(xi, x−i, yj) ≤ 0, h(xi, x−i, yj, y−j) ≤ 0 }.
(iii) The Nash rational reaction set for the j th follower is defined by the set of
parametric solutions,
Bj(xi, x−i, y−j) = { ȳj ∈ Yj : ȳj ∈ argmin{ fj(xi, x−i, yj, y−j) :
yj ∈ Aj(xi, x−i, y−j) } },  j = 1, . . . , M.
(iv) The feasible set for the ith leader is defined as
Ai(x−i) = { (xi, yj, y−j) ∈ Xi × Yj × Y−j : Gi(xi, yj, y−j) ≤ 0, H(xi, x−i, yj, y−j) ≤ 0,
gj(xi, x−i, yj) ≤ 0, h(xi, x−i, yj, y−j) ≤ 0, yj ∈ Bj(xi, x−i, y−j), j = 1, . . . , M }.
(v) The Nash rational reaction set for the ith leader is defined as
Bi(x−i) = { (xi, yj, y−j) ∈ Xi × Yj × Y−j : xi ∈ argmin{ Fi(xi, x−i, yj, y−j) :
(xi, yj, y−j) ∈ Ai(x−i) } },  i = 1, . . . , N.
(vi) The set of Nash equilibrium points (optimal solutions) of problem (2) is
given by
S = { (xi, x−i, yj, y−j) : (xi, x−i, yj, y−j) ∈ A, (xi, yj, y−j) ∈ Bi(x−i), i = 1, . . . , N }.
(i) the feasible set for the third level followers problem by Ω3 (xi , x−i , y −j );
(ii) the rational reaction set for the third level followers problem by
Ψ3 (xi , x−i , y −j );
(iii) the feasible set for the second level problem by Ω2 (x−i );
(iv) the rational reaction set for the second level followers problem by Ψ2 (x−i );
(v) the feasible set of problem (3) is given by:
Φ = (z, xi , x−i , y j , y −j ) : z = (x, y), gj (xi , x−i , y j ) ≤ 0, h(xi , x−i , y j , y −j ) ≤ 0,
Gi (xi , y j , y −j ) ≤ 0, H(xi , x−i , y j , y −j ) ≤ 0, i = 1, . . . , N, j = 1, . . . , M ;
With these notations and definitions, problem (3) can be rewritten as:

min_z  0
s.t.  (z, xi, x−i, yj, y−j) ∈ IR                         (4)
Since every feasible point of (4) is an optimal point, the optimal set of (4) is
given by
S ∗ = IR = (z, xi , x−i , y j , y −j ) : (z, xi , x−i , y j , y −j ) ∈ Φ, (xi , y j , y −j ) ∈ Ψ2 (x−i ) .
Having established the relations between the BLMLMF problem (2) and the
TLSLMF problem (3), we now describe their equivalence with the following
conclusions.
Theorem 3.1. A point (x∗,i, x∗,−i, y∗,j, y∗,−j) is an optimal solution to (2) if and
only if (z∗, x∗,i, x∗,−i, y∗,j, y∗,−j) is an optimal solution to (4).
Proof. Suppose that (x∗,i, x∗,−i, y∗,j, y∗,−j) is an optimal solution to (2),
i.e., (x∗,i , x∗,−i , y ∗,j , y ∗,−j ) ∈ S which implies that, (x∗,i , x∗,−i , y ∗,j , y ∗,−j ) ∈
A, (x∗,i , y ∗,j , y ∗,−j ) ∈ Bi (x∗,−i ), i = 1, . . . , N . This implies
(x∗,i , y ∗,j , y ∗,−j ) ∈ Ψ2 (x∗,−i ), gj (x∗,i , x∗,−i , y ∗,j ) ≤ 0, h(x∗,i , x∗,−i , y ∗,j , y ∗,−j ) ≤ 0,
Gi (x∗,i , y ∗,j , y ∗,−j ) ≤ 0, H(x∗,i , x∗,−i , y ∗,j , y ∗,−j ) ≤ 0, i = 1, . . . , N, j = 1, . . . , M.
Then for any point (z ∗ , x∗,i , x∗,−i , y ∗,j , y ∗,−j ), z ∗ = (x∗ , y ∗ ), and
(x∗,i , x∗,−i , y ∗,j , y ∗,−j ) ∈ S, we have
(x∗,i, y∗,j, y∗,−j) ∈ Ψ2(x∗,−i),  z∗ = (x∗, y∗),  gj(x∗,i, x∗,−i, y∗,j) ≤ 0,
h(x∗,i, x∗,−i, y∗,j, y∗,−j) ≤ 0,  Gi(x∗,i, y∗,j, y∗,−j) ≤ 0,
H(x∗,i, x∗,−i, y∗,j, y∗,−j) ≤ 0,  i = 1, . . . , N,  j = 1, . . . , M.
This implies that (z ∗ , x∗,i , x∗,−i , y ∗,j , y ∗,−j ) ∈ Φ and (x∗,i , y ∗,j , y ∗,−j ) ∈
Ψ2 (x∗,−i ). Therefore (z ∗ , x∗,i , x∗,−i , y ∗,j , y ∗,−j ) ∈ IR = S ∗ and hence
(z ∗ , x∗,i , x∗,−i , y ∗,j , y ∗,−j ) is an optimal solution to (4).
This implies that (x∗,i , x∗,−i , y ∗,j , y ∗,−j ) ∈ A, (x∗,i , y ∗,j , y ∗,−j ) ∈ Bi (x∗,−i ), i =
1, . . . , N . Therefore (x∗,i , x∗,−i , y ∗,j , y ∗,−j ) ∈ S and hence (x∗,i , x∗,−i , y ∗,j , y ∗,−j )
is an optimal solution to (2). □
Remark 1. The idea described above can be extended to any finite k-level multi-
leader multi-follower programming problem. By adding an upper decision maker,
problem (1) can be equivalently reformulated as a (k+1)-level MLSLMF program-
ming problem. As a result, the leaders in the upper level problem of (1) become
followers at the second level, and the followers at the mth level of (1) become
followers at the (m+1)th level, where m ∈ {2, . . . , k}.
5 Example
Consider the following bilevel multi-leader multi-follower programming problem:
min_{x1}  F1(x1, x2, y1, y2) = (1/2)x1 − y1
min_{x2}  F2(x1, x2, y1, y2) = −(1/2)x2 − y2
s.t.  0 ≤ x1, x2 ≤ 1
      min_{y1}  f1(x1, x2, y1, y2) = y1(−1 + x1 + x2) + (1/2)y1²        (5)
      min_{y2}  f2(x1, x2, y1, y2) = y2(−1 + x1 + x2) + (1/2)y2²
      s.t.  y1 ≥ 0, y2 ≥ 0
By adding an upper level decision maker (as in Remark 1), problem (5) is
equivalently reformulated as:

min_z  0
s.t.  z = (x, y)
      min_{x1}  F1(x1, x2, y1, y2) = (1/2)x1 − y1
      min_{x2}  F2(x1, x2, y1, y2) = −(1/2)x2 − y2
      s.t.  0 ≤ x1, x2 ≤ 1                                              (6)
            min_{y1}  f1(x1, x2, y1, y2) = y1(−1 + x1 + x2) + (1/2)y1²
            min_{y2}  f2(x1, x2, y1, y2) = y2(−1 + x1 + x2) + (1/2)y2²
            s.t.  y1 ≥ 0, y2 ≥ 0
Then (6) is transformed into the following tri-level programming problem with a
single follower:
min_z  0
s.t.  z = (x, y)
      min_{x1, x2}  F(x1, x2, y1, y2) = (1/2)x1 − (1/2)x2 − y1 − y2
      s.t.  0 ≤ x1, x2 ≤ 1                                              (7)
            min_{y1, y2}  f(x1, x2, y) = (1/2)y1² + (1/2)y2² + y1(−1 + x1 + x2) + y2(−1 + x1 + x2)
            s.t.  y1 ≥ 0, y2 ≥ 0
Then the third level problem in (7) can be considered as a multi-parametric
programming (MPP) problem with parameter x = (x1, x2):
min_{y1, y2}  f(x1, x2, y) = (1/2)y1² + (1/2)y2² + y1(−1 + x1 + x2) + y2(−1 + x1 + x2)        (8)
s.t.  0 ≤ x1, x2 ≤ 1,  y1 ≥ 0,  y2 ≥ 0
The Lagrangian of the problem is given by L(x, y) = (1/2)y1² + (1/2)y2² +
y1(−1 + x1 + x2) + y2(−1 + x1 + x2), and the KKT points are given by

y1 ∂L/∂y1 = y1(y1 − 1 + x1 + x2) = 0,   ∂L/∂y1 = y1 − 1 + x1 + x2 ≥ 0,   y1 ≥ 0,
y2 ∂L/∂y2 = y2(y2 − 1 + x1 + x2) = 0,   ∂L/∂y2 = y2 − 1 + x1 + x2 ≥ 0,   y2 ≥ 0.
Therefore, the parametric solution with the corresponding critical regions is
given by (Fig. 1):

CR1:  y1∗(x) = y2∗(x) = 1 − x1 − x2,  valid for x1 + x2 ≤ 1, 0 ≤ x1, x2 ≤ 1;
CR2:  y1∗(x) = y2∗(x) = 0,  valid for x1 + x2 ≥ 1, 0 ≤ x1, x2 ≤ 1,
which can be incorporated into the second level followers problem of (7) and after
solving the resulting problems in each critical region we have the following solutions:
Since the objective value obtained in CR1 is better, we can take (x1, x2, y1, y2) =
(0, 0, 1, 1) as an optimal solution to the second level followers' problem of (7). Therefore,
the optimal solution to the bilevel multi-leader multi-follower programming problem (5)
is (x1, x2, y1, y2) = (0, 0, 1, 1), with the corresponding objective values F1 = −1, F2 =
−1, f1 = −0.5 and f2 = −0.5.
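The example's parametric solution can be verified numerically: for fixed (x1, x2), each follower's problem in (8) is min over y ≥ 0 of (1/2)y² + y(−1 + x1 + x2), whose closed-form solution y∗ = max(0, 1 − x1 − x2) reproduces CR1 and CR2. A small check (illustrative only):

```python
# Check of the parametric solution of (8): for fixed (x1, x2) each follower
# solves min_{y >= 0} 0.5*y**2 + y*(-1 + x1 + x2), with closed-form solution
# y* = max(0, 1 - x1 - x2), i.e. CR1 when x1 + x2 <= 1 and CR2 otherwise.
def y_star(x1, x2):
    return max(0.0, 1.0 - x1 - x2)

def f_single(y, x1, x2):
    return 0.5 * y**2 + y * (-1.0 + x1 + x2)

for x1, x2 in [(0.0, 0.0), (0.3, 0.4), (0.8, 0.9)]:
    # brute-force grid search agrees with the closed form in both regions
    best_y = min((k / 1000.0 for k in range(2001)),
                 key=lambda y: f_single(y, x1, x2))
    assert abs(best_y - y_star(x1, x2)) < 1e-3

# at the reported optimum x = (0, 0): y1 = y2 = 1 and F1 = 0.5*0 - 1 = -1
assert y_star(0.0, 0.0) == 1.0
```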
6 Conclusion
In a multilevel multi-leader-follower programming problem, the various relationships
among the multiple leaders at the upper level and the multiple followers at the lower
levels give rise to different decision processes. To support decision making in such
problems, this work reformulated a class of multilevel multi-leader multi-follower
programming problems, consisting of separable terms and parameterized common terms
across all objective functions of the followers and leaders, into a multilevel
single-leader multi-follower programming problem. The reformulated problem is then
transformed into an equivalent multilevel program having a single follower at each
level of the hierarchy. Finally, this single-leader hierarchical problem is solved
using the solution procedures proposed in [5, 7]. The proposed approach can solve
multilevel multi-leader multi-follower problems whose objectives at all levels share
common nonseparable terms, possibly with different positive weights, and whose
constraints at each level are polyhedral. However, further research is needed to turn
these procedures into effective algorithmic tools, and we believe this direction
deserves further investigation.
References
1. Başar, T., Olsder, G.: Dynamic Noncooperative Game Theory. Classics in Applied
Mathematics. SIAM, Philadelphia (1999)
2. Ehrenmann, A.: Equilibrium problems with equilibrium constraints and their appli-
cations in electricity markets. Dissertation, Judge Institute of Management, Cam-
bridge University, Cambridge, UK (2004)
3. Ehrenmann, A.: Manifolds of multi-leader Cournot equilibria. Oper. Res. Lett. 32,
121–125 (2004)
4. Facchinei, F., Pang, J.S.: Finite-Dimensional Variational Inequalities and Com-
plementarity Problems. Springer Series in Operations Research, vol. I, 1st edn.
Springer, New York (2003)
5. Faı́sca, N.P., Saraiva, M.P., Rustem, B., Pistikopoulos, N.E.: A multi-parametric
programming approach for multilevel hierarchical and decentralised optimisation
problems. Comput. Manag. Sci. 6, 377–397 (2009)
6. Kassa, A.M., Kassa, S.M.: A branch-and-bound multi-parametric programming
approach for general non-convex multilevel optimization with polyhedral con-
straints. J. Glob. Optim. 64(4), 745–764 (2016)
7. Kassa, A.M., Kassa, S.M.: Deterministic solution approach for some classes of
nonlinear multilevel programs with multiple follower. J. Glob. Optim. 68(4), 729–
747 (2017)
8. Kulkarni, A.A.: Generalized Nash games with shared constraints: existence, effi-
ciency, refinement and equilibrium constraints. Ph.D. dissertation, Graduate Col-
lege of the University of Illinois, Urbana, Illinois (2010)
9. Kulkarni, A.A., Shanbhag, U.V.: An existence result for hierarchical Stackelberg
v/s Stackelberg games. IEEE Trans. Autom. Control 60(12), 3379–3384 (2015)
10. Leyffer, S., Munson, T.: Solving multi-leader-common-follower games. Optim.
Methods Softw. 25(4), 601–623 (2010)
11. Okuguchi, K.: Expectations and stability in oligopoly models. In: Lecture Notes in
Economics and Mathematical Systems, vol. 138. Springer, Berlin (1976)
12. Pang, J.S., Fukushima, M.: Quasi-variational inequalities, generalized Nash equi-
libria, and multi-leader-follower games. Comput. Manag. Sci. 2(1), 21–56 (2005)
13. Pang, J.S., Fukushima, M.: Quasi-variational inequalities, generalized Nash equi-
libria, and multi-leader-follower games. Comput. Manag. Sci. 6, 373–375 (2009)
14. Sherali, H.D.: A multiple leader Stackelberg model and analysis. Oper. Res. 32(2),
390–404 (1984)
15. Su, C.L.: A sequential NCP algorithm for solving equilibrium problems with equi-
librium constraints. Technical report, Department of Management Science and
Engineering, Stanford University (2004)
16. Su, C.L.: Analysis on the forward market equilibrium model. Oper. Res. Lett.
35(1), 74–82 (2007)
17. Sun, L.: Equivalent bilevel programming form for the generalized Nash equilibrium
problem. J. Math. Res. 2(1), 8–13 (2010)
A Mixture Design of Experiments
Approach for Genetic Algorithm Tuning
Applied to Multi-objective Optimization
1 Introduction
An inadequate setup of these parameters can affect the performance of the algo-
rithm, leading to unsatisfactory solutions. To bypass the configuration issues of
the GA, many studies have proposed methods for the optimization of these
parameters, including adaptive techniques, meta-heuristics, and design of
experiments.
This study addresses not only the optimization of the GA parameters, but also
the optimization of the weights applied to the objective functions of a MOP and
the interactions that may exist between the weights and the parameters of the
algorithm used to solve it. Therefore, this work proposes an experimental pro-
cedure that applies the design of experiments methodology, through a mixture
design with process variables, to evaluate the influence of the genetic
algorithm parameters on the results of a multi-objective optimization problem
with weighted responses. With this procedure, it is also possible to deter-
mine both the optimal weights to be used in the GCM function and the optimal
parameters for tuning the GA.
To demonstrate the applicability of the proposed method, a case study of a
flux-cored arc welding (FCAW) process is used. Four input parameters are used
to configure the FCAW process and its optimization includes four responses that
describe the weld bead geometry.
2 Theoretical Fundamentals
2.1 Global Criterion Method
Several methods for optimization of multiple objectives can be found in the
literature [1]. For scalarization methods, the strategy is to combine individual
objective functions into a single function, which becomes the global objective of
the problem. In the global criterion method (GCM), the optimum solution x∗
is found by minimizing a pre-selected global criterion, F(x) [2]. In this study,
the adopted global criterion is based on the normalization of the objectives, so
that they all have the same magnitude. The GCM equation is then defined as in
Eq. 1, where fi(x∗) is the optimal value from the individual optimization (utopia
point) of each response and fi(xmax) is the value most distant from fi(x∗) (nadir
point).
Min F(x) = Σ_{i=1}^{p} wi [ (fi(x∗) − fi(x)) / (fi(xmax) − fi(x∗)) ]²
s.t.:  gj(x) ≤ 0,  j = 1, 2, . . . , m                                  (1)
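A direct transcription of the criterion in Eq. 1 (the utopia/nadir values and weights below are made up, not those of the FCAW case study):

```python
# Direct transcription of the global criterion in Eq. 1: each objective is
# scaled by its utopia/nadir spread so all terms share the same magnitude
# before the weighted sum. All numeric values are illustrative only.
def gcm(f, f_utopia, f_nadir, w):
    return sum(wi * ((fu - fi) / (fn - fu)) ** 2
               for fi, fu, fn, wi in zip(f, f_utopia, f_nadir, w))

# two illustrative objectives: utopia 1.0 and 20.0, nadir 3.0 and 50.0
value = gcm(f=[1.5, 30.0], f_utopia=[1.0, 20.0], f_nadir=[3.0, 50.0],
            w=[0.5, 0.5])
```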
3 Experimental Method
To evaluate the influence of the GA parameters in solving a multi-objective problem
and to determine which of the tested parameter settings give the best results for
this particular problem, an experimental procedure was developed. In this procedure,
the algorithm parameters are treated as process variables and evaluated in a mixture
design combined with process variables, where the mixture proportions are the
weights of the objective functions in the global criterion method. To this end,
the two-level optimization strategy used in this work follows these five steps:
Step 1: Problem Definition
The optimization problem is defined by determining the responses, control vari-
ables and process variables. The responses must be individually optimized in
order to find the utopia and nadir points for each one of them. With these val-
ues, it is possible to define the GCM function, according to Eq. 1, which along
with the restriction functions will characterize the MOP.
Step 2: Experimental Design
In this step, the mixture design is defined by establishing the experimental matrix
in which the weights for the objective functions and the algorithm parameters
will be tested.
Step 3: Optimization Using Genetic Algorithm
The MOP defined in step 1 is solved using GA and the weights and parameters
are set up as specified in the experimental design defined in step 2. The results
found in each experiment are used for the calculation of the Global Percent-
age Error, according to Eq. 3, where: yi∗ are the values of the Pareto-optimal
responses, Ti are the targets (individual optimization solutions) and m is the
number of objectives.
GPE = Σ_{i=1}^{m} | yi∗ / Ti − 1 |                                      (3)
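Eq. 3 can be transcribed directly; the Pareto responses and targets below are illustrative, not the case-study values:

```python
# Direct transcription of the Global Percentage Error in Eq. 3: sum of
# relative deviations of the Pareto-optimal responses from their
# individual-optimization targets. Numeric values are illustrative only.
def gpe(y_star, targets):
    return sum(abs(y / t - 1.0) for y, t in zip(y_star, targets))

err = gpe([14.8, 0.9, 3.2, 17.1], [15.576, 0.828, 3.342, 16.275])
```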
The target values for the responses were established using the individual
constrained minimization for P and D, and the individual constrained maxi-
mization for W and R. The values found in the individual optimizations are
shown in Table 1. The MOP can then be stated as in Eq. 4, where wi is the
weight applied to each response and the problem is subject to the experimen-
tal space constraint XᵀX ≤ 2².

Min G = w1 [ (15.576 − W) / (−3.125) ]² + w2 [ (0.828 − P) / 0.584 ]²
      + w3 [ (3.342 − R) / (−0.287) ]² + w4 [ (16.275 − D) / 7.796 ]²   (4)
s.t.:  W² + P² + R² + D² ≤ 4.0
The GA parameters chosen for analysis as process variables in this study
were population size (Tp), crossover rate (Tc), and mutation type (Tm). The levels
tested (−1 and +1) were: 20 and 100 for Tp; 0.15 and 0.85 for Tc; and the
Gaussian (Gau) and Adaptive Feasible (AF) functions for Tm. The experiment
matrix was based on a fourth-degree simplex lattice design created for four
components {4, 4} with 0.05 ≤ wi ≤ 0.85, coupled with a full factorial design
606 T. I. de Paula et al.
(2³) for three process variables (parameters), which resulted in 280 experiments.
Table 2 presents a fragment of the created mixture design.

Run  w1    w2    w3    w4    Tp   Tc    Tm
1    0.85  0.05  0.05  0.05  20   0.15  Gau
2    0.65  0.25  0.05  0.05  20   0.15  Gau
...
98   0.05  0.25  0.45  0.25  20   0.85  Gau
99   0.05  0.25  0.25  0.45  20   0.85  Gau
...
226  0.25  0.25  0.05  0.45  20   0.85  AF
227  0.25  0.05  0.65  0.05  20   0.85  AF
...
279  0.05  0.05  0.25  0.65  100  0.85  AF
280  0.05  0.05  0.05  0.85  100  0.85  AF
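The 280-run count can be reproduced with a short sketch. The lattice and factorial levels are taken from the text; the pseudo-component mapping w_i = 0.05 + 0.8·λ_i is an assumption, chosen because it is consistent with the listed runs (e.g. run 1: w1 = 0.85):

```python
from itertools import product

# {4,4} simplex lattice: proportions lambda_i in {0, 1/4, 1/2, 3/4, 1} with sum 1
lattice = [tuple(k / 4 for k in ks)
           for ks in product(range(5), repeat=4) if sum(ks) == 4]

# assumed pseudo-component mapping onto the bounds 0.05 <= w_i <= 0.85
blends = [tuple(round(0.05 + 0.8 * lam, 2) for lam in lams) for lams in lattice]

# full factorial 2^3 over the GA parameters Tp, Tc, Tm
factorial = list(product([20, 100], [0.15, 0.85], ["Gau", "AF"]))

runs = [b + f for b in blends for f in factorial]
print(len(lattice), len(runs))  # 35 blends x 8 parameter settings = 280 runs
```

The 35 blends are the nonnegative integer solutions of k1 + k2 + k3 + k4 = 4 scaled by 1/4, i.e. C(7, 3) points.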
The optimization problem was programmed and optimized in Matlab® ,
according to the 280 weights and parameters settings determined by the mixture
design, in 30 replicates. The means of the results found in each replicate were
used to calculate the global percentage error according to Eq. 3 and the GPE
function was modeled with the DOE analysis in Minitab® , considering a level
of significance of 5%. The final model for the GPE function (Eq. 5) presented an
adjusted R² of 95.95%, the residuals were normally distributed, and no lack of fit was
found.
GPE = 0.744 w1 + 0.232 w2 + 0.374 w3 + 0.224 w4 + 0.364 w1 w3
      − 0.219 w2 w3 + 0.231 w2 w3 (w2 − w3) − 4.240 w1² w2 w3
      − 3.436 w1² w2 w4 − 4.725 w1² w3 w4 + 0.907 w1 w3 (w1 − w3)²
      + 0.284 w1 w3 (w1 − w3) Tc − 0.179 w2 w3 (w2 − w3) Tc          (5)
      − 0.048 w3 w4 Tp Tc + 0.150 w1 w2 (w1 − w2) Tp Tm
      − 0.050 w1 w2 Tc Tm + 0.562 w1 w2 (w1 − w2)² Tc Tm
      + 0.014 w3 Tp Tc Tm + 0.195 w1 w2 (w1 − w2) Tp Tc Tm
It is possible to notice in Eq. (5) that the three GA parameters tested presented
significant interactions with the mixture components, which means that changes
in the algorithm configuration will impact the final value of this metric.
Additionally, it is possible to evaluate the influence of the GA parameters by
analyzing their main effects plots, presented in Fig. 1, which show how changing
the levels of the process variables impacts the average results obtained for GPE.
It can be observed that the process variable levels that provide the smallest GPE
values are Tp = 100, Tc = 0.15 and Tm = Gau.
The contour plots for different combinations of weights and parameters shown
in Fig. 2 indicate which regions contain minimum and maximum values for the
GPE function, for some specific configurations tested in the arrangement. These
plots are distributed in the cube vertices according to the GA settings.
A Mixture DOE Approach for GA Tuning Applied to MOO 607
By analyzing Fig. 2, it is evident that the algorithm parameters affect the results for
the response, since the contour plots are distinct for different combinations of
parameters, despite having the same combinations of weights. The great influence
of the weights assigned to the objective functions on the GPE results can also
be noticed.
With the GPE function properly modeled, the optimal parameters and
weights were obtained by solving the optimization problem described in Eq.
(6). The problem was solved through the desirability method in Minitab’s opti-
mizer. The optimization results were: w1 = 0.05, w2 = 0.85, w3 = 0.05, w4 =
0.05, Tp = 20, Tc = 0.85 and Tm = Gau, which resulted in a GP E = 0.232 with
a desirability composite D = 0.9687.
Min GP E
s.t.: w1 + w2 + w3 + w4 = 1
0.05 ≤ wi ≤ 0.85
(6)
20 ≤ Tp ≤ 100
0.15 ≤ Tc ≤ 0.85
− 1 ≤ Tm ≤ 1
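The reported optimum can be checked against the regression model in Eq. (5). The sketch below assumes the process variables enter the model in coded units (Tp = 20 → −1, Tc = 0.85 → +1, Tm = Gau → −1); this coding is a Minitab convention and an assumption here, not stated explicitly in the text:

```python
def gpe_model(w1, w2, w3, w4, Tp, Tc, Tm):
    # Eq. (5), with the process variables Tp, Tc, Tm in coded units (-1/+1)
    return (0.744*w1 + 0.232*w2 + 0.374*w3 + 0.224*w4 + 0.364*w1*w3
            - 0.219*w2*w3 + 0.231*w2*w3*(w2 - w3) - 4.240*w1**2*w2*w3
            - 3.436*w1**2*w2*w4 - 4.725*w1**2*w3*w4 + 0.907*w1*w3*(w1 - w3)**2
            + 0.284*w1*w3*(w1 - w3)*Tc - 0.179*w2*w3*(w2 - w3)*Tc
            - 0.048*w3*w4*Tp*Tc + 0.150*w1*w2*(w1 - w2)*Tp*Tm
            - 0.050*w1*w2*Tc*Tm + 0.562*w1*w2*(w1 - w2)**2*Tc*Tm
            + 0.014*w3*Tp*Tc*Tm + 0.195*w1*w2*(w1 - w2)*Tp*Tc*Tm)

# reported optimum: w = (0.05, 0.85, 0.05, 0.05), Tp = 20, Tc = 0.85, Tm = Gau
val = gpe_model(0.05, 0.85, 0.05, 0.05, Tp=-1, Tc=+1, Tm=-1)
print(round(val, 3))  # approximately the reported GPE = 0.232
```

Under this coding the model value at the reported settings agrees with the reported GPE = 0.232 to within rounding.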
The optimal configuration of weights and parameters was used for the solu-
tion of the initial optimization problem, the FCAW process defined in Eq. (4),
in 30 replicates, to confirm that the optimal combination of weights and param-
eters will lead to the best optimization results for this process, specifically. With
the mean results, it was possible to calculate the welding process input param-
eters and to determine the behavior of the FCAW responses (Table 3). It can
be noted from Table 3 that all four responses were established relatively close to
their targets, which suggests that the proposed method is very suitable for the
optimization of the FCAW process.
Table 3. Confirmation results for the FCAW optimization.

           Wf     V     S       N     W       P      R      D       GPE
Optimal    9.4    28.7  22.8    23.7  13.648  0.864  3.283  17.094  0.236
Target     –      –     –       –     15.58   0.83   3.34   16.27   –
Objective  –      –     –       –     Max     Min    Max    Min     –
Unit       m/min  V     cm/min  mm    mm      mm     mm     %       –
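As a quick sanity check, the reported GPE can be recomputed from the confirmation means in Table 3 and the individual-optimization targets used in Eq. (4); small rounding differences against the reported 0.236 are expected:

```python
# Confirmation-run means (Table 3) and individual-optimization targets (Eq. 4)
responses = {"W": 13.648, "P": 0.864, "R": 3.283, "D": 17.094}
targets   = {"W": 15.576, "P": 0.828, "R": 3.342, "D": 16.275}

# Global Percentage Error, Eq. (3): sum of relative deviations from the targets
gpe = sum(abs(responses[k] / targets[k] - 1) for k in responses)
print(round(gpe, 3))  # close to the reported 0.236
```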
5 Conclusion
References
1. Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engi-
neering. Struct. Multidiscip. Optim. 26, 369–395 (2004). https://doi.org/10.1007/
s00158-003-0368-6
2. Rao, S.S.: Engineering Optimization: Theory and Practice, 4th edn. Wiley, New
Jersey (2009)
3. Heredia-Langner, A., Montgomery, D.C., Carlyle, W.M.: Solving a multistage par-
tial inspection problem using genetic algorithms. Int. J. Product. Res. 40(8), 1923–
1940 (2002). https://doi.org/10.1080/00207540210123337
4. Holland, J.H.: Adaptation in Natural and Artificial Systems. University of Michi-
gan Press, Ann Arbor (1975)
5. Zain, A.M., Haron, H., Sharif, S.: Application of GA to optimize cutting conditions
for minimizing surface roughness in end milling machining process. Expert Syst.
Appl. 37(6), 4650–4659 (2010). https://doi.org/10.1016/j.eswa.2009.12.043
6. Fleming, P., Purshouse, R.: Evolutionary algorithms in control systems engineer-
ing: a survey. Control Eng. Pract. 10(11), 1223–1241 (2002). https://doi.org/10.
1016/S0967-0661(02)00081-3
7. Maaranen, H., Miettinen, K., Penttinen, A.: On initial populations of a genetic
algorithm for continuous optimization problems 37 (2007). https://doi.org/10.
1007/s10898-006-9056-6
8. Candan, G., Yazgan, H.R.: Genetic algorithm parameter optimisation using
Taguchi method for a flexible manufacturing system scheduling problem. Int.
J. Product. Res. 53(3), 897–915 (2014). https://doi.org/10.1080/00207543.2014.
939244
9. Weise, T., Wu, Y., Chiong, R., Tang, K., Lässig, J.: Global versus local search:
the impact of population sizes on evolutionary algorithm performance. J. Global
Optim. 1–24 (2016). https://doi.org/10.1007/s10898-016-0417-5
10. Ortiz, F., Simpson, J.R., Pignatiello, J.J., Heredia-langner, A.: A genetic algo-
rithm approach to multiple-response optimization. J. Qual. Technol. 36(4), 432–
450 (2004)
11. Grefenstette, J.: Optimization of control parameters for genetic algorithms. IEEE
Trans. Syst. Man Cybern. 16(February), 122–128 (1986)
12. Eiben, A.E., Smit, S.K.: Evolutionary algorithm parameters and methods to tune
them. In: Autonomus Search, Chap. 2, pp. 15–36. Springer, Heidelberg (2012).
https://doi.org/10.1007/978-3-642-21434-9
13. Alajmi, A., Wright, J.: Selecting the most efficient genetic algorithm sets in solving
unconstrained building optimization problem. Int. J. Sustain. Built Environ. 3(1),
18–26 (2014). https://doi.org/10.1016/j.ijsbe.2014.07.003
14. Fernandez-Prieto, J.A., Canada-Bago, J., Gadeo-Martos, M.A., Velasco, J.R.: Opti-
misation of control parameters for genetic algorithms to test computer networks
under realistic traffic loads. Appl. Soft Comput. J. 12(4), 1875–1883 (2012).
https://doi.org/10.1016/j.asoc.2012.04.018
1 Introduction
This paper concerns optimization over the efficient set of a multiobjective linear
programming problem, which is formulated as
minimize  Cx
subject to  Ax ≤ b,  x ≥ 0,                              (1)

where X := {x ∈ Rn | Ax ≤ b, x ≥ 0} denotes the feasible region.
An efficient solution is also called a Pareto-optimal solution. Let E denote the set
of all efficient solutions of problem (1). Then the optimization problem
over the efficient set (OE problem) is defined as
minimize φ(x)
(2)
subject to x ∈ E,
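To make the definition concrete, here is a toy, purely illustrative sketch with a finite set of outcome vectors; the paper itself treats the continuous MOLP case, so the data and the function phi below are hypothetical:

```python
# Outcome vectors Cx for a handful of candidate solutions (hypothetical data)
points = [(0, 4), (1, 2), (2, 1), (4, 0), (3, 3)]

def dominates(a, b):
    # a dominates b: no worse in every objective and different in at least one
    return all(ai <= bi for ai, bi in zip(a, b)) and a != b

# E: efficient (Pareto-optimal) outcomes -- nothing in the set dominates them
E = [p for p in points if not any(dominates(q, p) for q in points)]

# problem (2): minimize some function phi over the efficient set only
phi = lambda p: p[0] + 2 * p[1]
best = min(E, key=phi)
print(E, best)
```

Here (3, 3) is dominated by (1, 2) and drops out of E, so the minimization of phi runs over the four remaining efficient outcomes only.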
We note that Assumption 1 is very strong, because it is not satisfied if the dimen-
sion of the feasible region X is larger than 1. Hence, Sun's result is valid only
when the dimension of the feasible region X is at most 1. We will give examples
in Sect. 4 to show what can happen when this assumption fails. Assumption
2 is rather standard and used in most analyses of problem (2). Note that
problem (1) is feasible and the efficient set E is nonempty under Assumption 2.
We state Sun’s main result [17] in the next proposition.
A Numerical Study on MIP Approaches over the Efficient Set 615
To implement Sun's approach, we assume that there is some i for which the
problem can be solved by Sun's approach, so we test a random i ∈ {1, . . . , n} to
set up C(i) and ci. In fact, the problem obviously does not always satisfy
Assumption 1 for the system
Ax = 0,  0 ≤ x ≤ c,  xi = t
4 Conclusion
In this paper, we propose that an optimal solution of the optimization problem
(2) over the efficient set of the multiobjective linear programming problem (1)
be computed by solving the MIP problem (4). Compared with the previous MIP
approach by Sun [17], our approach does not make Assumption 1 on problem
(1) and reduces the size of the MIP problem. In preliminary computational
experiments, we observed that our approach solves the linear cases of the OE
problem more accurately and faster.
References
1. An, L.T.H., Tao, P.D., Muu, L.D.: Numerical solution for optimization over the
efficient set by dc optimization algorithms. Oper. Res. Lett. 19(3), 117–128 (1996)
2. Benson, H.P.: An algorithm for optimizing over the weakly-efficient set. Eur. J.
Oper. Res. 25(2), 192–199 (1986)
3. Benson, H.P.: A finite, nonadjacent extreme-point search algorithm for optimiza-
tion over the efficient set. J. Optim. Theory Appl. 73(1), 47–64 (1992)
4. Benson, H.P.: An outcome space algorithm for optimization over the weakly effi-
cient set of a multiple objective nonlinear programming problem. J. Global Optim.
52(3), 553–574 (2012)
1
Every 0–1 linear program is a special case of a DC program, but the reverse is not
true, so specialized MIP tools may be able to achieve better performance.
2
Additional experiments satisfying Assumption 1 were conducted and can be found
on the authors' homepage. Even in those cases, the accuracy of both approaches is
the same, while the running times of our approach are still smaller than those of the
previous approach. However, the feasible region in those instances is a very special
1-dimensional space, so we do not include them in this paper.
616 K. Lu et al.
5. Benson, H.P., Lee, D.: Outcome-based algorithm for optimizing over the efficient
set of a bicriteria linear programming problem. J. Optim. Theory Appl. 88(1),
77–105 (1996)
6. Bolintineanu, S.: Minimization of a quasi-concave function over an efficient set.
Math. Program. 61(1–3), 89–110 (1993)
7. Dauer, J.P., Fosnaugh, T.A.: Optimization over the efficient set. J. Global Optim.
7(3), 261–277 (1995)
8. Hoang, T.: Convex Analysis and Global Optimization. Springer (2016)
9. Liu, Z., Ehrgott, M.: Primal and dual algorithms for optimization over the efficient
set. Optimization 67(10) 1–26 (2018)
10. Lu, K., Mizuno, S., Shi, J.: A mixed integer programming approach for the mini-
mum maximal flow. J. Oper. Res. Soc. Jpn 64(4), 261–271 (2018)
11. Lu, K., Mizuno, S., Shi, J.: Optimization over the efficient set of a linear multiob-
jective programming: Algorithm and applications (to appear). RIMS Kôkyûroku
(2018)
12. Muu, L.D., Thuy, L.Q.: On dc optimization algorithms for solving minmax flow
problems. Math. Methods Oper. Res. 80(1), 83–97 (2014)
13. Philip, J.: Algorithms for the vector maximization problem. Math. Program. 2(1),
207–229 (1972)
14. Phong, T.Q., Tuyen, J.: Bisection search algorithm for optimizing over the efficient
set. Vietnam J. Math. 28, 217–226 (2000)
15. Shi, J., Yamamoto, Y.: A global optimization method for minimum maximal flow
problem. Acta Math. Vietnam. 22(1), 271–287 (1997)
16. Shigeno, M., Takahashi, I., Yamamoto, Y.: Minimum maximal flow problem: an
optimization over the efficient set. J. Global Optim. 25(4), 425–443 (2003)
17. Sun, E.: On optimization over the efficient set of a multiple objective linear pro-
gramming problem. J. Optim. Theory Appl. 172(1), 236–246 (2017)
18. Thach, P., Konno, H., Yokota, D.: Dual approach to minimization on the set of
pareto-optimal solutions. J. Optim. Theory Appl. 88(3), 689–707 (1996)
19. Yamamoto, Y.: Optimization over the efficient set: overview. J. Global Optim.
22(1–4), 285–317 (2002)
Analytics-Based Decomposition of a Class
of Bilevel Problems
In (iii) weak constraints such as a single linear inequality are allowed. Thus the
BPMSIF has the form:
min_{x¹,...,x^Q, y¹,...,y^Q}  F(x¹, . . . , x^Q, y¹, . . . , y^Q)
s.t.  G(x¹, . . . , x^Q, y¹, . . . , y^Q) ≤ 0
where each y^q (q = 1, . . . , Q) solves                    (1)
    min_{y^q}  f_q(x^q, y^q)
    s.t.  g_q(x^q, y^q) ≤ 0
x^q ∈ X_q,  y^q ∈ Y_q
Ω = {(x¹, . . . , x^Q, y¹, . . . , y^Q) ∈ X_1 × . . . × X_Q × Y_1 × . . . × Y_Q :
     G(x¹, . . . , x^Q, y¹, . . . , y^Q) ≤ 0,  g_q(x^q, y^q) ≤ 0,  q = 1, . . . , Q}
Ω_q(x^q) = {y^q : (x^q, y^q) ∈ Ω}
IR = {(x¹, . . . , x^Q, y¹, . . . , y^Q) : (x¹, . . . , x^Q, y¹, . . . , y^Q) ∈ Ω,
      y^q ∈ Ψ_q(x),  q = 1, . . . , Q}
As in standard bilevel programming min and argmin have been used without
loss of generality: each subproblem may involve maximisation. Note that the
follower problems need not be linear, or even optimisation problems: follower q
can be any algorithm that computes y q from xq .
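Because each follower can be any algorithm mapping x^q to y^q, a tiny BPMSIF-style instance can be brute-forced directly. Everything below (objectives, grids, the Q = 2 followers) is a made-up illustration, not an instance from the paper:

```python
from itertools import product

ygrid = [i / 2 for i in range(7)]            # follower decision grid: 0.0 .. 3.0

def follower(xq):
    # follower q's "algorithm": solve min_y (y - xq/2)^2 over its grid
    return min(ygrid, key=lambda y: (y - xq / 2) ** 2)

best = None
for x in product([0, 1, 2], repeat=2):       # leader decisions, one x_q per follower
    if sum(x) > 2:                           # weak coupling constraint G(x) <= 0
        continue
    y = tuple(follower(xq) for xq in x)      # followers react independently
    F = sum(x) + sum(y)                      # leader objective (maximised here)
    if best is None or F > best[0]:
        best = (F, x, y)
print(best)
```

Note how the followers never see each other's variables: each y_q depends only on x_q, which is exactly the structural property the decomposition approach exploits.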
Classical and evolutionary approaches have been applied to single- and multi-
follower problems. [13] presents a general framework and solutions for nine classes
of multi-follower problem, but none are applicable to the BPMSIF. Classical
methods for multi-follower problems include Kuhn-Tucker approaches [14,15,22]
and branch-and-bound algorithms [16]. The Kth-best approach (or some modifi-
cation) has also been applied to multi-follower problems [23,24,27,28]. [4] refor-
mulate a problem with multiple followers into one with one leader and one fol-
lower, by replacing the lower levels with an equivalent objective and constraint
region. This method also cannot be applied to the BPMSIF, as neither its objec-
tives nor its inducible region are equivalent to the problem class of [4]. Addition-
ally, the methods proposed in [4,13] assume that the followers are linear, which
is not the case with the BPMSIF.
Most classical methods for handling bilevel problems require assumptions of
smoothness, linearity or convexity, while the BPMSIF makes no such assump-
tions. Evolutionary and meta-heuristic techniques also do not make these
assumptions [2,9,11] but most are computationally intensive nested strategies.
They are efficient for smaller problems but do not scale up well to large-scale
problems. In contrast, our approach scales up well as the number of followers
increases (see Sect. 3).
3 Numerical Examples
To illustrate and evaluate our decomposition approach, we use two example
problems. Monte Carlo simulation and clustering were done in Java and R (using
the CLARA package [17]) respectively. The CPLEX 12.6 solver was also used on
a 3.0 GHz Intel Xeon Processor with 8 GB of RAM.
where λ11 = x1 , λ12 = x2 , λ21 = x3 and λ22 = x4 . This model can be linearised
using the big-M approach, but the ILP is solved faster when CPLEX indica-
tor constraints are used.1 The binary variables uk and vk ensure that only one
assignment each is selected from Λk1 and Y k1 , and from Λk2 and Y k2 respec-
tively. The λ11 + λ12 + λ21 + λ22 ≤ 40 constraint ensures that an (x1 , x2 ) and
an (x3 , x4 ) that satisfy the original constraints on the x are selected.
In experiments, as K increases better solutions were found, with the highest
value of 6594.05 obtained when K = 160 giving x = (8.13, 3.80, 11.23, 16.82),
y 1 = (0.74, 11.20) and y 2 = (28.04, 0.00) (rounded to 2 decimal places). The
clustering time when K = 160 is 234.53 seconds. The solution is 0.09% less than
optimal, but the strength of our approach is in its ability to handle large-scale
problems, as demonstrated next.
1
These are a way of expressing if-else relationships among variables [8].
622 A. Fajemisin et al.
max  Σ_{q=1}^{Q} a_q x^q + Σ_{q=1}^{Q} b_q y^q
s.t.  x^q ∈ R^N,                       q = 1 . . . Q
      x_{qn} ≤ x^max_{qn},             q = 1 . . . Q, n = 1 . . . N
      y^q ∈ argmin c_q x^q + d_q y^q,  q = 1 . . . Q                  (4)
            s.t.  Σ_{n=1}^{N} y_{qn} ≤ Σ_{n=1}^{N} x_{qn},  q = 1 . . . Q
                  e_{qn} x_{qn} ≤ y_{qn} ≤ y^max_{qn},  q = 1 . . . Q, n = 1 . . . N
where Σ_q a_q x^q = Σ_q Σ_n a_{qn} x_{qn}, x and y are the variables controlled by the
leader and followers respectively, and Q is the total number of followers. Both
x and y are vectors of real numbers. The leader variables are partitioned
among the followers such that each follower contains one x^q, and each x^q
is of size N. Each component x_{qn} is constrained to be ≤ a given
upper bound x^max_{qn}; a_q, b_q, c_q, d_q and e_q are vectors of constants.
The decomposition approach outlined in Sect. 2 was used to decompose the
problem, which is then written as:
max  Σ_{q=1}^{Q} Σ_{n=1}^{N} a_{qn} x_{qn} + Σ_{q=1}^{Q} Σ_{n=1}^{N} b_{qn} y_{qn}
s.t.  x_{qn} − X_{kqn} ≤ M(1 − u_{kq})   k = 1 . . . K, q = 1 . . . Q, n = 1 . . . N
      X_{kqn} − x_{qn} ≤ M(1 − u_{kq})   k = 1 . . . K, q = 1 . . . Q, n = 1 . . . N
      y_{qn} − Y_{kqn} ≤ M(1 − u_{kq})   k = 1 . . . K, q = 1 . . . Q, n = 1 . . . N
      Y_{kqn} − y_{qn} ≤ M(1 − u_{kq})   k = 1 . . . K, q = 1 . . . Q, n = 1 . . . N
      Σ_{k=1}^{K} u_{kq} = 1             q = 1 . . . Q
      u_{kq} ∈ {0, 1}                    k = 1 . . . K, q = 1 . . . Q
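The role of the big-M rows can be sanity-checked in a few lines: when u_kq = 1 they pin (x^q, y^q) to medoid k, and when u_kq = 0 they are slack. The data, function name and M value below are illustrative:

```python
def pinned_to_selected(vec, medoids, u, M=1e4):
    # big-M rows: |vec_i - X_ki| <= M * (1 - u_k) for every medoid k, coordinate i
    assert sum(u) == 1                      # exactly one medoid selected
    return all(abs(v - Xk[i]) <= M * (1 - uk)
               for uk, Xk in zip(u, medoids)
               for i, v in enumerate(vec))

medoids = [[1.0, 2.0], [3.0, 4.0]]          # representative X_k vectors (made up)
print(pinned_to_selected([3.0, 4.0], medoids, [0, 1]))  # True: matches medoid 2
print(pinned_to_selected([1.0, 2.0], medoids, [0, 1]))  # False: medoid 2 selected
```

With uk = 1 the right-hand side collapses to 0, forcing equality with that medoid; with uk = 0 the bound M is far larger than any coordinate difference, so the row never binds.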
algorithm is given in [6], and the variance-based termination criterion was set to
0.000001.
MFGA Parameters These were also varied using 100 followers. The popula-
tion size popSize was varied from 30–90, and the maximum number of gener-
ations maxGens from 50–500. The MFGA parameters selected were therefore:
maxGens = 500, popSize = 50. This population size was selected because,
although there is little difference between its objective value and the best
objective at popSize = 80, the time taken is almost 50% less. Uniform
crossover with a crossover rate of 0.5 (50%) was used. Other parameters are
elitePercentage = 0.20, tournamentSize = 5, mutationRate = 0.015 and
fitnessFunction = Σ_{q=1}^{Q} Σ_{n=1}^{N} a_{qn} x_{qn} + Σ_{q=1}^{Q} Σ_{n=1}^{N} b_{qn} y_{qn}.
Comparing all three approaches For both N-BLEA and MFGA, each problem
size was solved 10 times, and the average objective values and solution times
were recorded. It should be noted that the poor performance of N-BLEA is
due to the operation of its crossover operator which is additive in nature, and
frequently violates the bounds of the vectors. This crossover operator results in
offspring which are frequently infeasible, and are thus heavily penalised by the
constraint handling scheme. MFGA was designed to avoid this problem: since
vector generation is done using Hypersphere Point Picking with the appropriate
boundaries, it always produces feasible offspring.
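The Hypersphere Point Picking used for feasible vector generation can be sketched with the classic normalized-Gaussian method [18, 19]; the function name and dimension below are illustrative:

```python
import math
import random

def random_unit_vector(n, rng=random):
    # Muller/Marsaglia method: normalise i.i.d. standard Gaussians to obtain a
    # uniformly distributed direction on the (n-1)-sphere
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in g))
        if norm > 1e-12:                # guard against a vanishingly rare zero draw
            return [x / norm for x in g]

v = random_unit_vector(5)
print(sum(x * x for x in v))  # ~1.0: the vector lies on the unit sphere
```

Scaling such directions to the appropriate radius and centre keeps every generated offspring inside the required bounds, which is the feasibility property claimed above.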
For 10–100 followers, the solution found by the MFGA was better in 7 out
of 10 of the cases, though the decomposition approach finds a close solution
in a fraction of the time (Fig. 1). However, as the problems get larger (Q from
100–1000) the decomposition approach is much better in terms of both solution
quality and runtime (Fig. 2). This demonstrates the scalability of our approach.
Reduction of a very large set of potential solutions to a much smaller (but highly
representative) set using medoids allows the ILP to choose the best solution from
a vast number of possibilities.
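The medoid-reduction step can be illustrated with a brute-force PAM-style objective on a tiny instance; the paper uses CLARA [17] for large candidate sets, so the points, the Manhattan distance and K below are hypothetical:

```python
from itertools import combinations

def k_medoids_exact(points, K):
    # brute force over all K-subsets: pick the medoid set minimising the total
    # distance from every point to its nearest medoid (the PAM objective)
    def cost(meds):
        return sum(min(abs(p[0] - m[0]) + abs(p[1] - m[1]) for m in meds)
                   for p in points)
    return min(combinations(points, K), key=cost)

pts = [(0, 0), (0, 1), (5, 5), (6, 5)]
meds = k_medoids_exact(pts, 2)
print(meds)  # one representative from each of the two clusters
```

Unlike k-means centroids, the medoids are actual candidate solutions, so the ILP selects only vectors that genuinely exist in the sampled set.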
4 Conclusions
In this paper a new class of multi-follower bilevel problems, the BPMSIF, was
proposed. In this problem no variables are shared between followers, so that
the leader variables can be partitioned among the followers. The follower
problems are allowed to be integer or non-linear, and variables from different
follower problems are only connected through weak constraints. To solve the
BPMSIF an analytics-based decomposition approach was developed, and tested
on two numerical examples. The first example showed that the decomposition
approach is competitive even for small bilevel problems. More importantly, the
second example showed that the decomposition approach is much more scalable
as the number of followers increases, in terms of both runtime and solution
quality. Furthermore, we have applied these techniques to a large-scale real-world
cutting stock problem, forestry harvesting, in [20].
References
1. de Amorim, R., Fenner, T.: Weighting Features for Partition Around Medoids
Using the Minkowski Metric, pp. 35–44. Springer, Heidelberg (2012)
2. Angelo, J., Barbosa, H.: Differential evolution to find Stackelberg-Nash equilibrium
in bilevel problems with multiple followers. In: IEEE Congress on Evolutionary
Computation, CEC 2015, Sendai, Japan, May 25–28, 2015, pp. 1675–1682 (2015)
3. Bard, J.: Convex two-level optimization. Math. Program. 40(1), 15–27 (1988)
4. Calvete, H., Galé, C.: Linear bilevel multi-follower programming with independent
followers. J. Glob. Optim. 39(3), 409–417 (2007)
5. Colson, B., Marcotte, P., Savard, G.: An overview of bilevel optimization. Ann.
Oper. Res. 153(1), 235–256 (2007)
6. Deb, K.: An efficient constraint handling method for genetic algorithms. Comput.
Methods Appl. Mech. Eng. 186(2–4), 311–338 (2000)
7. DeMiguel, V., Xu, H.: A stochastic multiple-leader Stackelberg model: analysis,
computation, and application. Oper. Res. 57(5), 1220–1235 (2009)
8. IBM: User’s manual of IBM CPLEX optimizer for z/OS: what is an indicator
constraint? (2017). https://ibmco/2ErnDyn
9. Islam, M., Singh, H., Ray, T.: A memetic algorithm for solving bilevel optimization
problems with multiple followers. In: IEEE Congress on Evolutionary Computa-
tion, CEC 2016, Vancouver, BC, Canada, July 24–29, 2016, pp. 1901–1908 (2016)
10. Kaufman, L., Rousseeuw, P.: Finding Groups in Data: An Introduction to Cluster
Analysis, vol. 344. Wiley (2009)
11. Liu, B.: Stackelberg-Nash equilibrium for multilevel programming with multiple
followers using genetic algorithms. Comput. Math. Appl. 36(7), 79–89 (1998)
12. Lu, J., Han, J., Hu, Y., Zhang, G.: Multilevel decision-making: a survey. Inf. Sci.
346–347(Supplement C), 463–487 (2016). https://doi.org/10.1016/j.ins.2016.01.084,
http://www.sciencedirect.com/science/article/pii/S0020025516300202
13. Lu, J., Shi, C., Zhang, G.: On bilevel multi-follower decision making: general frame-
work and solutions. Inf. Sci. 176(11), 1607–1627 (2006)
14. Lu, J., Shi, C., Zhang, G., Dillon, T.: Model and extended Kuhn-Tucker approach
for bilevel multi-follower decision making in a referential-uncooperative situation.
J. Glob. Optim. 38(4), 597–608 (2007)
15. Lu, J., Shi, C., Zhang, G., Ruan, D.: Multi-follower linear bilevel programming:
model and Kuhn-Tucker approach. In: AC 2005, Proceedings of the IADIS Inter-
national Conference on Applied Computing, Algarve, Portugal, February 22–25,
2005, vol. 2, pp. 81–88 (2005)
16. Lu, J., Shi, C., Zhang, G., Ruan, D.: An extended branch and bound algorithm
for bilevel multi-follower decision making in a referential-uncooperative situation.
Int. J. Inf. Technol. Decis. Mak. 6(2), 371–388 (2007)
17. Maechler, M., Rousseeuw, P., Struyf, A., Hubert, M., Hornik, K.: cluster: Cluster
Analysis Basics and Extensions (2017). R package version 2.0.6—for new features,
see the ‘Changelog’ file (in the package source)
18. Marsaglia, G.: Choosing a point from the surface of a sphere. Ann. Math. Statist.
43(2), 645–646 (1972). https://doi.org/10.1214/aoms/1177692644
19. Muller, M.: A note on a method for generating points uniformly on n-dimensional
spheres. Commun. ACM 2(4), 19–20 (1959)
20. Prestwich, S., Fajemisin, A., Climent, L., O’Sullivan, B.: Solving a Hard Cut-
ting Stock Problem by Machine Learning and Optimisation, pp. 335–347. Springer
International Publishing, Cham (2015)
21. Ramos, M., Boix, M., Aussel, D., Montastruc, L., Domenech, S.: Water
integration in eco-industrial parks using a multi-leader-follower approach.
Comput. Chem. Eng. 87(Supplement C), 190–207 (2016). https://doi.org/10.
1016/j.compchemeng.2016.01.005, http://www.sciencedirect.com/science/article/
pii/S0098135416000089
22. Shi, C., Lu, J., Zhang, G., Zhou, H.: An extended Kuhn-Tucker approach for linear
bilevel multifollower programming with partial shared variables among followers.
In: Proceedings of the IEEE International Conference on Systems, Man and Cyber-
netics, Waikoloa, Hawaii, USA, October 10–12, 2005, pp. 3350–3357 (2005)
23. Shi, C., Zhang, G., Lu, J.: The Kth-best approach for linear bilevel multi-follower
programming. J. Glob. Optim. 33(4), 563–578 (2005)
24. Shi, C., Zhou, H., Lu, J., Zhang, G., Zhang, Z.: The Kth-best approach for linear
bilevel multifollower programming with partial shared variables among followers.
Appl. Math. Comput. 188(2), 1686–1698 (2007)
25. Sinha, A., Malo, P., Frantsev, A., Deb, K.: Finding optimal strategies in a multi-
period multi-leader-follower Stackelberg game using an evolutionary algorithm.
Comput. Oper. Res. 41, 374–385 (2014)
26. Wei, C.P., Lee, Y.H., Hsu, C.M.: Empirical comparison of fast clustering algo-
rithms for large data sets. In: Proceedings of the 33rd Annual Hawaii International
Conference on System Sciences, pp. 10-pp. IEEE (2000)
27. Zhang, G., Lu, J.: Fuzzy bilevel programming with multiple objectives and coop-
erative multiple followers. J. Glob. Optim. 47(3), 403–419 (2010)
28. Zhang, G., Shi, C., Lu, J.: An extended Kth-best approach for referential-
uncooperative bilevel multi-follower decision making. Int. J. Comput. Intell. Syst.
1(3), 205–214 (2008)
KMCGO: Kriging-Assisted Multi-objective
Constrained Global Optimization
1 Introduction
2 Kriging Model
$R(\theta; x, x') = \prod_{i=1}^{n} R_i(\theta_i, x_i - x'_i).$    (4)

$\hat{\sigma}^2 = (Y - F\hat{\beta})^T R^{-1} (Y - F\hat{\beta})/m$. Therefore, the Kriging predictor at any
point can be expressed as

$\hat{y}(x) = F\hat{\beta} + r^T(x)\,\hat{c}$    (5)
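A minimal 1-D sketch of the predictor in Eq. (5), using a Gaussian correlation and a crude constant trend in place of the paper's GLS estimate of β; all data and the θ value below are made up. Kriging interpolates the training data exactly, which the final check confirms:

```python
import math

def corr(a, b, theta=5.0):
    # Gaussian correlation R_i(theta, a - b), cf. Eq. (4)
    return math.exp(-theta * (a - b) ** 2)

def solve(A, rhs):
    # small dense linear solver (Gaussian elimination with partial pivoting)
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

xs, ys = [0.0, 0.5, 1.0], [1.0, 0.2, 0.7]      # made-up sampled design data
R = [[corr(a, b) for b in xs] for a in xs]
beta = sum(ys) / len(ys)                        # crude constant trend (stand-in for GLS)
c_hat = solve(R, [y - beta for y in ys])        # c-hat = R^{-1} (Y - F beta-hat)

def predict(x):
    r = [corr(x, a) for a in xs]
    return beta + sum(ri * ci for ri, ci in zip(r, c_hat))   # Eq. (5)

print(all(abs(predict(a) - y) < 1e-8 for a, y in zip(xs, ys)))  # True: interpolation
```

At a training point x_j the correlation vector r equals the j-th row of R, so the prediction reduces to β + (Y_j − β) = Y_j regardless of the trend estimate used.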
3 KMCGO Method
where $\hat{f}(x)$ at point x is predicted by Kriging, and the non-negative exponent e can be
adaptively selected from $\{e_1, \ldots, 0.001, 0.01, 0.1, 1, 10, \ldots, e_k\}$ according to the
iteration number in each cycle. Parameter d is the distance factor, defined as follows:

$d = \begin{cases} \dfrac{d_{\max} - d'_{\min}}{\|b - a\|}, & \hat{f}(x) > 0 \\[6pt] 1 - \dfrac{d'_{\min}}{d_{\max}\|b - a\|}, & \hat{f}(x) \le 0 \end{cases}$    (13)
630 Y. Li et al.
In Eq. (13), parameter $d_{\max} = \max\{d : x \in X\}$ is the maximum distance among the
sampled design data $X = [x_1, \ldots, x_m]$, and $d'_{\min} = \min(\|x - x_1\|, \ldots, \|x - x_m\|)$
is the shortest distance between the untried point x and the set $X = [x_1, \ldots, x_m]^T$.
The second optimization objective. In practice, sampling points which lie precisely
on the approximate constraint boundaries may be infeasible, but points which lie in
the approximate feasible area with a tiny deviation from the constraint boundary are
likely to be feasible in most instances. We define $g_{\max} = \max[g_1(x), \ldots, g_q(x)]$
as the maximum constraint violation; if $g_{\max}$ is less than or equal to zero, all
constraints are considered met. In KMCGO, Kriging is used to approximate all
constraint functions, and the probability P [11] of meeting all constraints leads to
the objective
$\max\left[\hat{s}_f(x) - \sum_{i=1}^{q} \hat{s}_{g_i}(x)\right]$    (16)
(Flowchart of the KMCGO procedure: parameter initialization — design domain,
constraints, Kriging model parameters, convergence conditions; construction of three
optimization objectives from the Kriging predictions, mean square errors and
probability of feasibility; and expensive function evaluation of the selected
sampling points.)
Step 9. Stop criterion. If the convergence conditions are met, continue with
Step 10; otherwise, jump to Step 11.
Step 10. Expensive function evaluation of new data. Perform expensive function
evaluations for the selected new sampling points.
Step 11. End. End the process and output global approximate solution (xbest, ybest).
4 Test
To verify the performance of the KMCGO method, a series of tests, including
benchmark numerical functions [13] (G4, G7, G8 and G9) and a speed reducer
design problem [14], has been executed. This section gives the test results of
KMCGO and compares them with the results of SCGOSR [15], TOKCGO [7]
and KCGO [11].
Table 2. Description of functions. The abbreviations for test problem, number of
design variables, number of constraints, the known optimum, the given relative error
and maximum expensive evaluation number are TP, NODV, NOC, TKO, GRE and
MEEN, respectively.

TP  NODV  NOC  Bound constraint                  TKO         GRE   MEEN
G4  5     6    [78, 102] × [33, 45] × [27, 45]³  −30665.539  1e−5  50
G7  10    8    [−10, 10]^10                      24.30621    0.01  150
G8  2     2    [0, 10]²                          −0.095825   1e−4  50
G9  7     4    [−10, 10]^7                       680.630057  0.5   150
sampling points or infeasible sampling points gather near the constraint boundary,
which greatly improves the feasibility of new sampling points; (2) the remaining
two of the three optimization objectives, together with the screening method, bring
the new sampling points closer to the real optimal solution in the feasible domain.
Furthermore, Figs. 3, 4, 5, 6 and 7 provide the iterative results of KMCGO. For
the low-dimensional problems (G4 and G8), LHD can find a feasible point in most
cases, but the high-dimensional problems usually have to explore a feasible point
during the iterative optimization. The optimization result of G4 converges slightly
less well, while G7, G8,
4.3 Comparison
The comparisons are shown in Table 3. Three conclusions may be drawn: (1) LHD
usually fails to obtain feasible points in most cases; (2) for the test functions G7,
G8 and G9, with larger feasible exploration regions, it is necessary to perform more
function evaluations (especially for G7 and G9) in many cases, and even so, the
approximate optimal solution found in some cases is not very good; (3) the
comparison results show that KMCGO has better convergence characteristics than
the other three methods.
Table 3. Comparison results. MEEN, AOA, DTM and RRE stand for mean expensive
evaluation number, approximate optimum area, distance to minimizer and the real
relative error.

TF   Dim  Method  MEEN   AOA                  DTM                 RRE
G4   5    KMCGO   44.6   −30665.472 ± 0.052   [0.015, 0.119]      [4.9e−7, 3.9e−6]
          SCGOSR  53.9   −30665.463 ± 0.064   [0.012, 0.140]      [3.9e−7, 4.6e−6]
          TOKCGO  46.5   −30665.475 ± 0.043   [0.021, 0.127]      [6.8e−7, 4.1e−6]
          KCGO    32.7   −30665.480 ± 0.035   [0.024, 0.094]      [7.8e−7, 3.1e−6]
G7   10   KMCGO   124.8  24.5046 ± 0.192      [0.00639, 0.1984]   [2.63e−4, 8.16e−3]
          SCGOSR  178.2  24.6559 ± 0.314      [0.00869, 0.69069]  [3.58e−4, 0.02842]
          TOKCGO  130.1  24.5878 ± 0.266      [0.01559, 0.54759]  [6.41e−4, 0.02253]
          KCGO    136.5  24.3139 ± 0.046      [0.00309, 0.01239]  [1.27e−4, 5.10e−4]
G8   2    KMCGO   46.2   −0.0958 ± 0.000024   [1e−6, 0.000049]    [1.0e−5, 5.1e−4]
          SCGOSR  51.8   −0.0958 ± 0.000013   [12e−6, 0.000038]   [1.2e−5, 4.0e−4]
          TOKCGO  45.9   −0.0958 ± 0.000021   [4e−6, 0.000046]    [4.2e−5, 4.8e−4]
          KCGO    47.4   −0.0958 ± 0.000020   [5e−6, 0.000045]    [5.2e−5, 4.7e−4]
G9   7    KMCGO   112.8  827.678 ± 85.57      [61.48, 255.15]     [0.0903, 0.3418]
          SCGOSR  115.6  904.08 ± 77.78       [140.42, 294.98]    [0.2044, 0.4294]
          TOKCGO  124.4  839.195 ± 96.59      [61.98, 255.15]     [0.0916, 0.3749]
          KCGO    165.9  910.49 ± 67.96       [155.64, 291.56]    [0.2266, 0.4245]
SRD  7    KMCGO   77.3   2995.36 ± 0.95       [0.01, 1.89]        [3.34e−6, 6.31e−4]
          SCGOSR  88.1   2996.15 ± 1.65       [0.08, 3.38]        [2.68e−5, 1.13e−3]
          TOKCGO  79.6   2996.25 ± 1.07       [0.76, 2.90]        [2.54e−4, 9.68e−4]
          KCGO    136.5  24.3139 ± 0.046      [0.00309, 0.01239]  [1.27e−4, 5.10e−4]
5 Conclusions
Multistage Global Search Using Various Scalarization Schemes in Multicriteria Optimization Problems
1 Introduction

Multicriteria optimization (MCO) problems, which arise as statements of decision making problems, are the subject of extensive research — see, for example, the monographs [1–3] and the reviews of scientific and applied results in this area [4,5].
Usually, solving an MCO problem is understood as finding the efficient (non-dominated) decisions, i.e., those in which the value with respect to any criterion cannot be improved without worsening the value with respect to some other criterion. In the most general case, solving an MCO problem may require obtaining the complete set of efficient decisions (the Pareto set). However, finding all efficient decisions may require a considerable amount of computation, and the set of obtained decisions may turn out to be quite large. As a result, approaches that obtain a more limited set of efficient decisions are more widely applied. Among such approaches are various kinds of criteria convolutions, lexicographic optimization methods, algorithms searching for the best approximation to given prototypes, etc. All the methods mentioned above allow accounting for the specific features of the MCO problem being solved and satisfying the optimality requirements of the decision maker.

This research was supported by the Russian Science Foundation, project No. 16-11-10150 "Novel efficient methods and software tools for time-consuming decision making problems using supercomputers of superior performance."

© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 638–648, 2020.
https://doi.org/10.1007/978-3-030-21803-4_64

Multistage Global Search in Multicriteria Optimization Problems 639
The present work is devoted to solving MCO problems that describe complex decision making problems in which the efficiency criteria may be multiextremal and computing the values of the criteria and constraints may require a large amount of computation. It is also assumed that, in the course of computations, the problem statement, the methods, and the parameters of solving the MCO problem may change, which makes it necessary to solve the global optimization problems multiple times.

The practical use of this approach requires overcoming the considerable computational complexity of the decision making problems, which can be achieved by using highly efficient global optimization methods and by fully utilizing the search information obtained in the course of computations.

The present paper reports the results of investigations on generalizing the decision making problem statements [6,7] and on developing highly efficient global optimization methods that utilize all the search information obtained in the course of computations [8–10].
The remainder of the paper is organized as follows. In Sect. 2, the statement of decision making problems based on multistage multicriteria global optimization is presented. In Sect. 3, a general scheme of criteria scalarization is proposed. In Sect. 4, the search information obtained in the course of computations is considered. In Sect. 5, an efficient algorithm for solving time-consuming global optimization problems with nonlinear constraints is presented. Section 6 presents the results of numerical experiments confirming that the developed approach is promising. In the Conclusion, the obtained results are discussed and the main directions of further research are outlined.
D = {y ∈ RN : ai ≤ yi ≤ bi , 1 ≤ i ≤ N } (2)
Pt = {P1, P2, . . . , Pt}, (8)

will be allowed; this set can be varied in the course of computations by adding new optimization problems or removing existing ones. In general, the proposed scheme of the optimal decision search process (1)–(8) defines a new class of optimization problems — the multistage multicriteria global optimization (MMGO) problems.
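The mutable problem family Pt from (8) can be pictured as a small container that grows and shrinks during the run. The sketch below is purely illustrative; the class and method names are ours, not the authors'.

```python
# Illustrative sketch (our names): the problem family P_t of (8), which can be
# varied during the computation by adding new or removing existing problems.
class ProblemFamily:
    def __init__(self):
        self.problems = []            # P_t = {P_1, ..., P_t}

    def add(self, problem):
        """Append a scalarized problem; returns the current t."""
        self.problems.append(problem)
        return len(self.problems)

    def remove(self, index):
        """Drop an already-posed problem from the family."""
        self.problems.pop(index)

family = ProblemFamily()
family.add(("minimax", (1.0, 0.0)))
t = family.add(("minimax", (0.5, 0.5)))   # t == 2
family.remove(0)                          # family now holds one problem
```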
where f_i^min and f_i^max are the minimum and maximum values of the criterion f_i(y), 1 ≤ i < s, in the domain D, respectively, and 0 ≤ δ_i ≤ 1, 1 ≤ i < s, are the concessions with respect to each criterion. As before, the concession values 0 ≤ δ_i ≤ 1, 1 ≤ i < s, can be varied in the course of computations. The quantities f_i^min and f_i^max, 1 ≤ i < s, whose values may be unknown a priori, can be replaced by the minimum and maximum estimates of the criteria values computed from the available search information.
3. If estimates of the criteria values of the required decision are available (for example, based on an ideal decision or on an existing prototype), the MCO problem solution may consist in finding an efficient decision
642 V. Gergel and E. Kozinov
min F3(λ, y) = (1/s) Σ_{i=1}^{s} θ_i (f_i(y) − f_i*)², g(y) ≤ 0, y ∈ D (12)

where the objective function F3(λ, y) is the standard deviation of the decision y ∈ D from the sought ideal decision, and the quantities 0 ≤ θ_i ≤ 1, 1 ≤ i < s, are the magnitudes of importance of the approximations with respect to each particular variable y_i, 1 ≤ i ≤ N.
Within the framework of the developed approach, it is possible to change the scalarization method used (10)–(12) and/or to alter the convolution parameters λ, δ, and θ. Such variations expand the set of MCO problems P from (8) necessary for solving the initial decision making problem into a wider set of scalar global optimization problems (9), in which each problem P ∈ P from (8) can correspond to several global optimization problems (9).
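The three scalarizations named above — the minimax convolution (10), the method of successive concessions (11), and the reference-point deviation (12) — can be sketched in code. Only (12) is spelled out in this excerpt; the forms used for (10) and (11) below are common textbook instances and are our assumption, as are all function names.

```python
# Hedged sketch of the three scalarization schemes (10)-(12); the exact forms
# of (10) and (11) are assumptions, (12) follows the formula in the text.
def minimax_convolution(f_vals, lam):
    # One common form of (10): F1(λ, y) = max_i λ_i f_i(y)
    return max(l * f for l, f in zip(lam, f_vals))

def reference_point(f_vals, theta, f_star):
    # (12): F3(λ, y) = (1/s) Σ θ_i (f_i(y) − f_i*)²
    s = len(f_vals)
    return sum(t * (f - fs) ** 2
               for t, f, fs in zip(theta, f_vals, f_star)) / s

def concession_feasible(f_vals, f_min, f_max, delta):
    # (11): criteria 1..s-1 must stay inside the concession band
    #       f_i(y) ≤ f_i^min + δ_i (f_i^max − f_i^min)
    return all(f <= lo + d * (hi - lo)
               for f, lo, hi, d in zip(f_vals[:-1], f_min, f_max, delta))

print(minimax_convolution([2.0, 1.0], [0.5, 0.5]))   # 1.0
```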
In the developed approach, one more step of transforming the problems F(λ, y) from (9) is applied: dimensionality reduction is performed using Peano space-filling curves (evolvents) y(x), which provide an unambiguous mapping of the interval [0, 1] onto the N-dimensional hypercube D [11,12]. As a result of this reduction, the multidimensional global optimization problem (9) is reduced to a one-dimensional problem.

The dimensionality reduction allows applying many well-known, highly efficient one-dimensional global optimization algorithms (after the necessary generalization) to the problems (9) — see, for example, [11–15].
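The idea of the reduction can be illustrated with a discrete Hilbert curve, a standard stand-in for the Peano evolvents of [11,12]; the brute-force 1-D scan below is only a toy replacement for the efficient one-dimensional global search algorithms the paper actually applies.

```python
# Sketch: map a 1-D index onto a 2-D grid along a Hilbert curve (the classic
# bit-manipulation construction), so a 2-D problem becomes a 1-D scan.
def hilbert_d2xy(order, d):
    """Map index d in [0, 4**order) to integer (x, y) on a 2**order grid."""
    x = y = 0
    t = d
    s = 1
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                       # rotate the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def minimize_via_curve(f, order=8):
    """Evaluate f at every curve point of [0, 1]^2 (brute-force 1-D search)."""
    n = 1 << order
    best = None
    for d in range(n * n):
        ix, iy = hilbert_d2xy(order, d)
        y1, y2 = ix / (n - 1), iy / (n - 1)
        v = f(y1, y2)
        if best is None or v < best[0]:
            best = (v, y1, y2)
    return best

val, y1, y2 = minimize_via_curve(lambda a, b: (a - 0.3) ** 2 + (b - 0.7) ** 2)
```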
Table 1. Efficiency of the reuse of the search information in solving the problem (22)
using various criteria scalarization methods
In the last experiment, a test MCO problem was solved with reuse of the search information while the applied criteria scalarization methods were varied in the course of computations. At the first stage of computations, the minimax criteria convolution (10) was used to solve three subproblems with the convolution coefficients (1, 0), (0.5, 0.5), and (0, 1), respectively. At the second stage, the scalarization method was changed to the reference point method (12), with the estimate (−14, −7) used as the reference point and the weighting coefficients (0.5, 0.5). Finally, at the third stage, the method of successive concessions (11) was applied with the concession with respect to the first criterion δ = 0.5.
Table 2. Results of solving the problem (22) with the criteria scalarization methods altered in the course of computations

Stage of computations, criteria scalarization method   Total iters   New iters   PDA
1. MMC (10), three subproblems                         304           304         14
2. RPM (12)                                            402           98          19
3. MSC (11)                                            437           35          24
The results of the performed experiments are given in Table 2. The column "Total iters" contains the total number of executed global search iterations, whereas the column "New iters" shows only the iterations executed at the particular stage of solving the problem. As follows from the results presented in Table 2, the number of global search iterations executed at the separate stages of solving the MCO problem is continuously reduced (from 304 down to 35), and the reuse of the search information provides a computationally efficient opportunity for dynamic variation of the criteria scalarization methods applied in the course of the MCO problem solution.
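The mechanism behind this saving — expensive criteria evaluations are stored once and re-scored for free when the scalarization changes between stages — can be sketched as a per-point cache. All names here are ours and purely illustrative.

```python
# Illustrative sketch: reuse of search information across scalarization
# changes. Criteria values are cached per point; switching the convolution
# re-ranks the accumulated points at no additional criteria evaluations.
class SearchInfo:
    def __init__(self, criteria):
        self.criteria = criteria          # list of callables f_i(y)
        self.cache = {}                   # y -> tuple of criteria values

    def evaluate(self, y):
        if y not in self.cache:
            self.cache[y] = tuple(f(y) for f in self.criteria)
        return self.cache[y]

    def rescore(self, scalarize):
        """Best stored point under a new scalarization; no new evaluations."""
        return min(self.cache, key=lambda y: scalarize(self.cache[y]))

info = SearchInfo([lambda y: (y - 1) ** 2, lambda y: (y + 1) ** 2])
for i in range(41):                       # stage 1: accumulate search points
    info.evaluate(i / 10 - 2)
# stage 2: switch to a weighted minimax convolution; the cache is reused
best = info.rescore(lambda fs: max(0.5 * fs[0], 0.5 * fs[1]))
```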
The PDA approximations of the Pareto set obtained at the sequentially executed stages are shown in Fig. 1.
Fig. 1. Approximations of the Pareto set PDA obtained at the sequentially executed
stages of computations
7 Conclusion

An approach is proposed for transforming decision making problems into multistage multicriteria time-consuming global optimization problems. A key property of the proposed approach is that the optimization problem statements and the applied methods of criteria scalarization can be changed in the course of computations. The computational complexity is reduced by reusing the accumulated search information. The performed numerical experiments have confirmed that the developed approach is promising.

In further investigations it is intended to carry out numerical experiments on problems with a larger number of efficiency criteria and larger dimensionality. Parallel computations can be considered as well.
References
1. Parnell, G.S., Driscoll, P.J., Henderson, D.L. (eds.): Decision Making in Systems
Engineering and Management. Wiley, New Jersey (2008)
2. Collette, Y., Siarry, P.: Multiobjective Optimization: Principles and Case Studies.
Decision Engineering. Springer (2011)
3. Pardalos, P.M., Žilinskas, A., Žilinskas, J.: Non-Convex Multi-Objective Optimiza-
tion. Springer (2017)
4. Hillermeier, C., Jahn, J.: Multiobjective optimization: survey of methods and
industrial applications. Surv. Math. Ind. 11, 1–42 (2005)
5. Modorskii, V.Y., Gaynutdinova, D.F., Gergel, V.P., Barkalov, K.A.: Optimization
in design of scientific products for purposes of cavitation problems. In: AIP Confer-
ence Proceedings, vol. 1738, p. 400013 (2016). https://doi.org/10.1063/1.4952201
6. Strongin, R.G., Gergel, V.P.: Parallel Computing for Globally Optimal Decision
Making. Lecture Notes in Computer Science, vol. 2763, pp. 76–88 (2003)
7. Gergel, V.P., Kozinov, E.A.: Accelerating multicriterial optimization by the inten-
sive exploitation of accumulated search data. In: AIP Conference Proceedings, vol.
1776, p. 090003 (2016). https://doi.org/10.1063/1.4965367
8. Gergel, V.: An unified approach to use of coprocessors of various types for solving
global optimization problems. In: 2nd International Conference on Mathematics
and Computers in Sciences and in Industry (2015). https://doi.org/10.1109/MCSI.
2015.18
9. Barkalov, K., Gergel, V., Lebedev, I.: Solving global optimization problems on GPU
cluster. In: AIP Conference Proceedings, vol. 1738, p. 400006 (2016). https://doi.
org/10.1063/1.4952194
10. Gergel, V.P., Kozinov, E.A.: Efficient multicriterial optimization based on intensive
reuse of search information. J. Glob. Optim. 71(1), 73–90 (2018). https://doi.org/
10.1007/s10898-018-0624-3
11. Strongin, R., Sergeyev, Y.: Global Optimization with Non-Convex Constraints.
Sequential and Parallel Algorithms. Kluwer Academic Publishers, Dordrecht (2nd
edn 2013, 3rd edn 2014)
12. Sergeyev, Y.D., Strongin, R.G., Lera, D.: Introduction to Global Optimization
Exploiting Space-Filling Curves. Springer (2013)
13. Zhigljavsky, A., Žilinskas, A.: Stochastic Global Optimization. Springer, Berlin
(2008)
14. Locatelli, M., Schoen, F.: Global Optimization: Theory, Algorithms, and Applica-
tions. SIAM (2013)
15. Floudas, C.A., Pardalos, P.M.: Recent Advances in Global Optimization. Princeton
University Press (2016)
Necessary Optimality Condition for Nonlinear Interval Vector Programming Problem Under B-Arcwise Connected Functions
1 Introduction
The research of Mohan Bir Subba is supported by Council of Scientific and Industrial
Research (CSIR), New Delhi, India through File No.: 09/1191(0001)/2017-EMR-I.
c Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 649–659, 2020.
https://doi.org/10.1007/978-3-030-21803-4_65
650 M. B. Subba and V. Singh
the programming problems under the function. Singh [13] considered both differentiable and non-differentiable functions for investigating the basic properties of arcwise connected sets and functions. Bhatia and Mehra [3] derived the optimality conditions of nonlinear programming problems under arcwise connected functions with the help of directional derivatives, while Davar and Mehra [6] considered fractional optimization problems and established optimality conditions and duality results. The B-arcwise connected function was developed and studied by Zhang [20], who gave optimality conditions and duality results for nonlinear semi-infinite programming problems. Optimization problems also have to deal with uncertainty. To handle it, different methods, viz. interval numbers, fuzzy numbers, and stochastic processes, have been developed. In recent years, many researchers have contributed to the area of interval optimization ([5,8]). Hukuhara derivatives were introduced by Hukuhara [7] for solving interval optimization problems and were further generalized by Stefanini and Bede [14] to the generalized Hukuhara derivative (gH-derivative). Osuna-Gómez et al. [10] considered generalized differentiable interval-valued functions and obtained their optimality conditions. Interval optimization problems involving single and multiobjective functions were rigorously studied by Wu ([17–19]), resulting in the derivation of KKT-optimality conditions and Wolfe duality. Wang and Zhang [16] derived the same by introducing arcwise connected interval-valued functions. Interval-valued invex functions were introduced by Li et al. [9] to derive sufficient optimality conditions using gH-differentiability. Recently, a new type of efficiency conditions was introduced by Osuna-Gómez et al. [11] for interval optimization problems considering generalized smooth multiobjective convex functions with the gH-derivative. Antczak [1] obtained Mond-Weir duality results and conditions that are necessary and sufficient for weakly LU-efficient solutions to be optimal in nondifferentiable multiobjective programming problems with multiple interval-valued convex functions.
This paper deals with a new type of nonlinear interval vector programming problem (NIVP) in which both the objectives and the constraints are B-arcwise connected interval-valued functions (BCIFs). The current work is organized in the following fashion. Section 2 is dedicated to the introduction of BCIFs, right gH-derivatives of BCIFs (gHBCIFs), and their properties for deriving a global optimality criterion. In Sect. 3, Fritz-John-type and KKT-type necessary weakly LU-efficiency conditions for NIVP with gHBCIFs in both the multiple objectives and the constraints are presented. The conclusion and further outlook of the paper are discussed in Sect. 4.
2 Preliminaries
Throughout the paper, Ic denotes the set of closed intervals in IR; any closed interval C ∈ Ic is written C = [c, c̄] with c ≤ c̄, where c is the lower bound and c̄ the upper bound of C. For any two arbitrary closed intervals C, D ∈ Ic and a real number μ ∈ IR, we have by definition

Necessary Optimality Condition for NIVP Under BCNs 651

C + D = [c + d, c̄ + d̄] ∈ Ic  and  μC = μ[c, c̄] = [μc, μc̄] if μ ≥ 0, [μc̄, μc] if μ < 0   (1)
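Definition (1) can be exercised with a minimal interval type. This is an illustrative sketch, not the authors' code; it writes [lo, hi] for the paper's [c, c̄].

```python
# Minimal sketch of definition (1): closed-interval addition and scalar
# multiplication, with the bound swap for negative scalars.
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        # C + D = [c + d, c̄ + d̄]
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __rmul__(self, mu):
        # μC = [μc, μc̄] if μ ≥ 0, else [μc̄, μc]
        if mu >= 0:
            return Interval(mu * self.lo, mu * self.hi)
        return Interval(mu * self.hi, mu * self.lo)

C, D = Interval(1.0, 2.0), Interval(-1.0, 3.0)
print(C + D)       # Interval(lo=0.0, hi=5.0)
print(-2.0 * C)    # Interval(lo=-4.0, hi=-2.0)
```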
by letting ψ → 0+ .
b⁺(x̂, y) [F(y) ⊖_gH F(x̂)] ≥U_L F⁺(H_{x̂,y}(0)) = H′_{x̂,y}(0) ∇F(x̂)ᵀ = [0, 0]

⇒ b⁺(x̂, y) F(y) ≥U_L b⁺(x̂, y) F(x̂)

⇒ F(y) ≥U_L F(x̂)

Since b⁺(x̂, y) > 0, F thus has a global minimum at x̂ ∈ X.
Theorem 3. Consider an interval-valued function F : X → Ic which is SgHBCIF on a non-empty open AC set X ⊂ IRⁿ with b⁺(x̂, y) = lim_{ψ→0⁺} b(x̂, y, ψ) > 0 for x̂, y ∈ X. If x̂ is a point such that ∇F(x̂) = [0, 0], then F has a unique global minimum at x̂ ∈ X.

Proof. Following the proof of Theorem 2, we can conclude that F has a global minimum point at x̂ ∈ X. For uniqueness, we argue by contradiction. Suppose F has a global minimum point at x¹ ∈ X with F(x̂) = F(x¹) and x̂ ≠ x¹. Since F is SBCIF and x̂, x¹ ∈ X, there exists a real-valued function for which the strict SBCIF inequality holds for 0 < ψ < 1, contradicting that F has a global minimum point at x̂ ∈ X. Thus x̂ is the unique global minimum point of F.
3 Optimality Conditions

In this paper, we consider the following nonlinear interval vector programming problem (NIVP) under a multiobjective interval-valued function:

(NIVP)  Min_{x∈X} F(x) = (F₁(x), F₂(x), ..., F_m(x))
        subject to G_j(x) ≤U_L [0, 0], j = 1, 2, ..., p

S denotes the set of feasible solutions of NIVP, and we define I(x̂), the set of indices of the constraints active at the feasible point x̂ ∈ S, i.e., I(x̂) = {j ∈ P : G_j(x̂) = [0, 0]}, and J(x̂) = {j ∈ P : G_j(x̂) <U_L [0, 0]}.
Definition 8 [19]. Let F(x) = (F₁(x), ..., F_m(x)) be a multiobjective interval-valued function defined on a non-empty open AC set X ⊂ IRⁿ such that Fᵢ : X → Ic. Then x̂ ∈ S is called

(i) a weakly LU-efficient solution of NIVP if and only if there exists no feasible point x ∈ S such that Fᵢ(x) <U_L Fᵢ(x̂) for every i ∈ M;

(ii) an LU-efficient solution of NIVP if and only if there exists no feasible point x ∈ S such that Fᵢ(x) ≤U_L Fᵢ(x̂) for every i ∈ M and Fᵢ(x) <U_L Fᵢ(x̂) for at least one i ∈ M.
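Definition 8(i) can be illustrated with a brute-force filter over a finite candidate set. The strict order `lu_less` below encodes one common reading of <U_L (both bounds strictly smaller); the helper names and that reading are our assumptions, not the paper's exact definitions.

```python
# Illustrative sketch of weak LU-efficiency (Definition 8(i)) over a finite
# candidate set, with intervals represented as (lower, upper) tuples.
def lu_less(C, D):
    # C <_LU D : both the lower and the upper bound of C are strictly smaller
    return C[0] < D[0] and C[1] < D[1]

def weakly_lu_efficient(candidates, F):
    """Return candidates x̂ for which no x satisfies F_i(x) <_LU F_i(x̂)
    simultaneously for every criterion F_i."""
    out = []
    for xh in candidates:
        dominated = any(all(lu_less(Fi(x), Fi(xh)) for Fi in F)
                        for x in candidates if x != xh)
        if not dominated:
            out.append(xh)
    return out

# Two conflicting interval criteria: no point improves both at once,
# so every candidate is weakly LU-efficient.
F = [lambda x: (x, x + 1.0), lambda x: (-x, 1.0 - x)]
print(weakly_lu_efficient([0, 1, 2], F))   # [0, 1, 2]
```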
Σ_{j=1}^{p} μ_j G_j(x) ≥U_L [0, 0], i.e.,

( Σ_{j=1}^{p} μ_j g_j(x), Σ_{j=1}^{p} μ̄_j ḡ_j(x) ) ≥U_L [0, 0], ∀ x ∈ X.

G_j(H_{x,y}(ψ)) ≤U_L (1 − ψ b(x, y, ψ)) G_j(x) + ψ b(x, y, ψ) G_j(y)   (11)

for 0 ≤ ψ ≤ 1 and H_{x,y}(ψ) ∈ X. Now, for any (0, 0) ≤ μ_j ∈ IR², Eq. (11) gives

Σ_{j=1}^{p} μ_j G_j(H_{x,y}(ψ)) <U_L [0, 0]   (12)

Σ_{j=1}^{p} μ_j G_j(x̂) = [0, 0]   (14)

(μ*_i, μ̄*_i, μ_j, μ̄_j) ≥ (0, 0, 0, 0)   (15)
For sufficiently small ψ, 0 < ψ < ψ̄ < 1, and from Eqs. (16)–(19), we have

F_i(H_{x,x̂}(0)) <U_L F_i(x̂), i ∈ M,  and  G_j(H_{x,x̂}(0)) <U_L G_j(x̂), j ∈ I(x̂)   (20)

Since the arc function H_{x,x̂}(ψ) is continuous at ψ and the G_j(H_{x,x̂}(ψ)), j ∈ J(x̂), are continuous at x̂, we have

lim_{ψ→0⁺} G_j(H_{x,x̂}(ψ)) = G_j(x̂) <U_L [0, 0], j ∈ J(x̂)

Taking ψ̂ = min(ψ̄, ψ_i) and using Eqs. (20) and (21), we obtain H_{x,x̂}(ψ) ∈ S ⊂ X for 0 < ψ < ψ̂, which implies that F_i(H_{x,x̂}(ψ)) < F_i(x̂) — a contradiction, since x̂ is a weakly LU-efficient solution of NIVP. Thus Eq. (16) has no solution x ∈ X.

Since F_i⁺(H_{x,x̂}(0)), i ∈ M, and G_j⁺(H_{x,x̂}(0)), j ∈ P, are BCIFs of x, by Lemma 1 there exist (μ*_i, μ̄*_i) ∈ IR² and (μ_j, μ̄_j) ∈ IR² such that Eqs. (13)–(15) are satisfied at x̂. Hence the theorem is proved.
Σ_{j=1}^{p} μ_j G_j(x̂) ≥U_L [0, 0], ∀ j ∈ P   (24)

Moreover, x̂ ∈ S is a feasible solution of NIVP; thus for all (μ*_i, μ̄*_i) ∈ IR² and (μ_j, μ̄_j) ∈ IR² satisfying Eq. (24) we get

Σ_{j=1}^{p} μ_j G_j(x̂) = [0, 0], ∀ j ∈ P   (25)

Thus for every x ∈ S and 0 < ψ < 1 there exists an arc H_{x,x̂}(ψ) in X. Now, from Eqs. (23) and (25), substituting x = H_{x,x̂}(ψ) we obtain, for all i ∈ M and j ∈ P, an inequality which, after division by ψ > 0 and taking the limit ψ → 0⁺, yields Eq. (13). This proves the theorem.
Σ_{j=1}^{p} μ_j G_j⁺(H_{x̂,x*}(0)) ≤U_L b⁺(x̂, x*) Σ_{j=1}^{p} μ_j [G_j(x*) ⊖_gH G_j(x̂)]   (27)

b⁺(x̂, x*) Σ_{j=1}^{p} μ_j G_j(x*) ≥U_L [0, 0]   (28)
4 Conclusions

The classes of BCIFs and SBCIFs on AC sets have been studied. The right gH-derivative, defined via the Hukuhara difference between two closed intervals, has been introduced for BCIFs, and its properties have been investigated under the right gH-derivative for a globally optimal solution. Fritz-John-type and Karush-Kuhn-Tucker-type necessary weakly LU-efficiency conditions are obtained for NIVP involving gHBCIFs. To the best of our knowledge, these necessary weakly LU-efficiency conditions are new in the area of NIVP under the set of differentiable BCIFs and SBCIFs. Many practical problems can be treated by interval optimization, including the electric energy market [12], the sinter cooling process [15], etc. The work can be further extended to establish sufficient optimality conditions and duality results.
References
1. Antczak, T.: Optimality conditions and duality results for nonsmooth vector opti-
mization problems with the multiple interval-valued objective function. Acta Math.
Sci. 37B(4), 1133–1150 (2017)
2. Avriel, M., Zang, I.: Generalized arcwise-connected functions and characterization
of local-global minimum properties. J. Optim. Theory Appl. 32(4), 407–425 (1980)
3. Bhatia, D., Mehra, A.: Optimality conditions and duality involving arcwise con-
nected and generalized arcwise connected functions. J. Optim. Theory Appl.
100(1), 181–194 (1999)
4. Cambini, A., Martein, L.: Generalized Convexity and Optimization. Lecture Notes
in Economics and Mathematical Systems, vol. 616. Springer, Berlin (2009)
5. Chalco-Cano, Y., Rufián-Lizana, A., Román-Flores, H., Jiménez-Gamero, M.D.:
Calculus for interval-valued functions using generalized Hukuhara derivative and
applications. Fuzzy Sets Syst. 219, 49–67 (2013)
6. Davar, S., Mehra, A.: Optimality and duality for fractional programming problems
involving arcwise connected functions and their generalizations. J. Math. Anal.
Appl. 263(2), 666–682 (2001)
7. Hukuhara, M.: Integration des applications mesurables dont la valeur est un com-
pact convexe. Funkc. Ekvacioj 10, 205–223 (1967)
8. Jana, M., Panda, G.: Solution of nonlinear interval vector optimization problem.
Oper. Res. Int. J. 14(1), 71–85 (2014)
9. Li, L., Liu, S., Zhang, J.: On interval-valued invex mappings and optimality condi-
tions for interval-valued optimization problems. J. Inequal. Appl. 179, 2–19 (2015)
10. Osuna-Gómez, R., Chalco-Cano, Y., Hernández-Jiménez, B., Ruiz-Garzón, G.:
Optimality conditions for generalized differentiable interval-valued functions. Inf.
Sci. 321, 136–146 (2015)
11. Osuna-Gómez, R., Hernández-Jiménez, B., Chalco-Cano, Y., Ruiz-Garzón, G.:
New efficiency conditions for multiobjective interval-valued programming prob-
lems. Inf. Sci. 420, 235–248 (2017)
12. Saric, A.T., Stankovic, A.M.: An application of interval analysis and optimization
to electric energy markets. IEEE Trans. Power Syst. 21(2), 515–523 (2006)
13. Singh, C.: Elementary properties of arcwise connected sets and functions. J. Optim.
Theory Appl. 41(2), 377–387 (1983)
14. Stefanini, L., Bede, B.: Generalized Hukuhara differentiability of interval-valued
functions and interval differential equations. Nonlinear Anal. 71(3–4), 1311–1328
(2009)
15. Tian, W., Ni, B., Jiang, C., Wu, Z.: Uncertainty analysis and optimization of sinter
cooling process for waste heat recovery. Appl. Therm. Eng. 150(1), 111–120 (2019)
16. Wang, H., Zhang, R.: Optimality conditions and duality for arcwise connected
interval optimization problems. Opsearch 52(4), 870–883 (2015)
1 Introduction
Nonsmooth phenomena occur naturally and frequently in optimization theory, and they have led to the development of several notions of generalized directional derivatives and subdifferentials. The notion of a convexificator is a generalization of
The first author is supported by the Science and Engineering Research Board (SERB), Department of Science and Technology, India under the Early Career Research (ECR) advancement scheme through grant no. ECR/2016/001961.
c Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 660–671, 2020.
https://doi.org/10.1007/978-3-030-21803-4_66
On Generalized Vector Variational Inequalities and Nonsmooth 661
The following proposition from [25] gives a necessary and sufficient condition for a locally Lipschitz function to be a ∂∗∗-approximate convex function.
Proposition 1. Let f : K → R be a locally Lipschitz function on K that admits bounded convexificators ∂∗∗f(x) for any x ∈ K. Then f is ∂∗∗-approximate convex at x̄ ∈ K if and only if for all α > 0 there exists δ > 0 such that for any x, y ∈ B(x̄, δ) ∩ K and λ ∈ [0, 1], one has
This result led to the following generalization of ∂∗∗-approximate convex functions:

Definition 5. Let f : K → R be a locally Lipschitz function on K that admits a bounded convexificator ∂∗∗f(x̄) at x̄. The function f is said to be ∂∗∗-approximate pseudoconvex of type I at x̄ ∈ K if for all α > 0 there exists δ > 0 such that for all x ∈ B(x̄, δ) ∩ K, one has
or equivalently,
(NVOP)  min f(x) = (f₁(x), ..., f_m(x))
        subject to x ∈ K,

where fᵢ : K → R, i ∈ M, are non-differentiable, locally Lipschitz functions bounded below on K.
The notions of local quasi efficient and local weak quasi efficient solutions to the (NVOP) may be defined as follows; see [1,11].

Definition 6. A point x̄ ∈ K is said to be a local quasi efficient solution to the (NVOP) if there exist α ∈ int(R^m_+) and a neighbourhood U of x̄ such that, for any x ∈ K ∩ U, one has

(f₁(x) − f₁(x̄) + α₁‖x − x̄‖, ..., f_m(x) − f_m(x̄) + α_m‖x − x̄‖) ∉ −R^m_+ \ {0}.

Similarly, x̄ ∈ K is a local weak quasi efficient solution to the (NVOP) if

(f₁(x) − f₁(x̄) + α₁‖x − x̄‖, ..., f_m(x) − f_m(x̄) + α_m‖x − x̄‖) ∉ −int(R^m_+).
Remark 1. It is obvious from the definitions that every local efficient solution (local weak efficient solution) is a local quasi efficient solution (local weak quasi efficient solution) to the (NVOP), but the converse may not be true; see [3] and Mishra and Upadhyay [23].
(∂∗∗-WSVVIP) Find a point x̄ ∈ K such that, for any x*ᵢ ∈ ∂∗∗fᵢ(x̄), i ∈ M, there exists no x ∈ K such that
Proof. Let x̄ ∈ K be a local weak quasi efficient solution to the (NVOP). Hence, there exist α ∈ int(R^m_+) and δ > 0 such that, for any x ∈ B(x̄, δ) ∩ K, one has

f(x) − f(x̄) + α‖x − x̄‖ ∉ −int(R^m_+).

Since K is a convex set, for any t ∈ [0, 1] and x ∈ B(x̄, δ) ∩ K, we have x̄ + t(x − x̄) ∈ B(x̄, δ) ∩ K. Therefore, for any α ∈ int(R^m_+), 0 < t < 1, and x ∈ B(x̄, δ) ∩ K, it follows that

[f(x̄ + t(x − x̄)) − f(x̄) + αt‖x − x̄‖] / t ∉ −int(R^m_+).

Taking the limit inferior as t → 0, for any x ∈ B(x̄, δ) ∩ K, we get

f⁻(x̄, x − x̄) := (f₁⁻(x̄, x − x̄), ..., f_m⁻(x̄, x − x̄)) ∉ −int(R^m_+).

Since the fᵢ admit bounded convexificators ∂∗∗fᵢ(x̄) for all i ∈ M, for any x ∈ B(x̄, δ) ∩ K we infer that
fᵢ(x) − fᵢ(x̄) + αᵢ‖x − x̄‖ ≤ 0,

λᵀx* = 0.
If x̄ is not a local weak quasi efficient solution to the (NVOP), then for any α ∈ int(R^m_+) and any δ > 0 there exists x ∈ B(x̄, δ) ∩ K such that
References
1. Ansari, Q.H., Lee, G.M.: Nonsmooth vector optimization problems and Minty vec-
tor variational inequalities. J. Optim. Theory Appl. 145, 1–16 (2010)
2. Al-Homidan, S., Ansari, Q.H.: Generalized Minty vector variational like inequalities
and vector optimization problems. J. Optim. Theory Appl. 144, 1–11 (2010)
3. Bhatia, D., Gupta, A., Arora, P.: Optimality via generalized approximate convexity
and quasiefficiency. Optim. Lett. 7, 127–135 (2013)
4. Clarke, F.H.: Optimization and Nonsmooth Analysis. Wiley-Interscience, New
York (1983)
5. Daniilidis, A., Georgiev, P.: Approximate convexity and submonotonicity. J. Math.
Anal. Appl. 291, 292–301 (2004)
6. Deng, S.: On approximate solutions in convex vector optimization. SIAM J. Control
Optim. 35, 2128–2136 (1997)
7. Demyanov, V.F.: Convexification and Concavification of Positively Homogeneous
Functions by the Same Family of Linear Functions. Report 3.208.802 Universita
di Pisa (1994)
8. Demyanov, V.F., Jeyakumar, V.: Hunting for a smaller convex subdifferential. J.
Global Optim. 10, 305–326 (1997)
9. Dutta, J., Chandra, S.: Convexificators, generalized convexity and vector optimiza-
tion. Optimization 53, 77–94 (2004)
10. Dutta, J., Vetrivel, V.: On approximate minima in vector optimization. Numer.
Funct. Anal. Optim. 22, 845–859 (2001)
11. Giannessi, F.: Theorems of the alternative, quadratic programming and comple-
mentarily problems. In: Cottle, R.W., Giannessi, F., Lions, J.L. (eds.) Variational
Inequalities and Complementarity Problems, pp. 151–186. Wiley, New York (1980)
12. Giannessi, F.: On Minty variational principle. In: Giannessi, F., Komlósi, S.,
Rapcsák, T. (eds.) New Trends in Mathematical Programming, pp. 93–99. Kluwer
Academic Publishers, Dordrecht, Netherland (1997)
13. Golestani, M., Nobakhtian, S.: Convexificator and strong Kuhn-Tucker conditions.
Comput. Math. Appl. 64, 550–557 (2012)
14. Gupta, A., Mehra, A., Bhatia, D.: Approximate convexity in vector optimization.
Bull. Austral. Math. Soc. 74, 207–218 (2006)
15. Gupta, D., Mehra, A.: Two types of approximate saddle points. Numer. Funct.
Anal. Optim. 29, 532–550 (2008)
16. Jeyakumar, V., Luc, D.T.: Nonsmooth calculus, minimality, and monotonicity of
convexificators. J. Optim. Theory Appl. 101(3), 599–621 (1999)
17. Jeyakumar, V., Luc, D.T.: Approximate Jacobian matrices for nonsmooth contin-
uous maps and C 1 -optimization. SIAM J. Control Optim. 36, 1815–1832 (1998)
18. Li, X.F., Zhang, J.Z.: Stronger Kuhn-Tucker type conditions in nonsmooth multi-
objective optimization: locally Lipschitz case. J. Optim. Theory Appl. 127, 367–
388 (2005)
19. Long, X.J., Huang, N.J.: Optimality conditions for efficiency on nonsmooth mul-
tiobjective programming problems. Taiwanese J. Math. 18, 687–699 (2014)
20. Luu, D.V.: Convexificators and necessary conditions for efficiency. Optimization
63, 321–335 (2013)
21. Mangasarian, O.L.: Nonlinear Programming. McGraw-Hill, New York (1969)
22. Michel, P., Penot, J.P.: A generalized derivative for calm and stable functions.
Differ. Integral Equ. 5, 433–454 (1992)
23. Mishra, S.K., Upadhyay, B.B.: Some relations between vector variational inequal-
ity problems and nonsmooth vector optimization problems using quasi efficiency.
Positivity 17, 1071–1083 (2013)
24. Mishra, S.K., Upadhyay, B.B.: Pseudolinear Functions and Optimization. Taylor
and Francis (2014)
25. Upadhyay, B.B., Mohapatra, R.N.: On approximate convex functions and sub-
monotone operators using convexificators. J. Nonlinear Convex Anal. (2018) (sub-
mitted)
26. Mishra, S.K., Wang, S.Y., Lai, K.K.: Generalized Convexity and Vector Optimiza-
tion. Nonconvex Optimization and Its Applications. Springer, Berlin (2009)
27. Mordukhovich, B.S., Shao, Y.H.: On nonconvex subdifferential calculus in Banach
spaces. J. Convex Anal. 2, 211–227 (1995)
28. Ngai, H.V., Luc, D.T., Thera, M.: Approximate convex functions. J. Nonlinear
Convex Anal. 1, 155–176 (2000)
29. Ngai, H.V., Penot, J.P.: Approximate convex functions and approximately mono-
tone operators. Nonlinear Anal. 66, 547–564 (2007)
30. Osuna-Gomez, R., Rufian-Lizana, A., Ruiz-Canales, P.: Invex functions and gen-
eralized convexity in multiobjective programming. J. Optim. Theory Appl. 98,
651–661 (1998)
31. Upadhyay, B.B., Mohapatra, R.N., Mishra, S.K.: On relationships between vec-
tor variational inequality and nonsmooth vector optimization problems via strict
minimizers. Adv. Nonlinear Var. Inequalities 20, 1–12 (2017)
32. Yang, X.M., Yang, X.Q., Teo, K.L.: Some remarks on the Minty vector variational inequality. J. Optim. Theory Appl. 121, 193–201 (2004)
33. Yang, X.Q., Zheng, X.Y.: Approximate solutions and optimality conditions of vec-
tor variational inequalities in Banach spaces. J. Glob. Optim. 40, 455–462 (2008)
SOP-Hybrid: A Parallel Surrogate-Based
Candidate Search Algorithm
for Expensive Optimization on Large
Parallel Clusters
This work was partially supported by the Singapore National Research Foundation,
Prime Minister’s Office, Singapore under its Campus for Research Excellence and Tech-
nological Enterprise (CREATE) programme (E2S2-CREATE project CS-B) and by
Prof. Shoemaker’s NUS startup grant.
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 672–680, 2020.
https://doi.org/10.1007/978-3-030-21803-4_67
Parallel Surrogate-Based Candidate Search 673
1 Introduction
There is a growing need for global optimization methods for expensive objective
functions, especially those based on computer simulation models. The monumen-
tal increase in computational power over the last few decades has also resulted
in a massive increase in complexity and computational intensity of simulation
models. For instance, state-of-the-art environmental and hydrodynamic simu-
lation models solve Partial Differential Equation (PDE) systems that require
considerable computational time and resources. A single training run of a Deep Neural Network (DNN) can take hours, or even days, on a GPU. Global optimization of continuous and computationally expensive black-box functions derived from such models is thus extremely challenging [8].
Deterministic programming methods [13] and stochastic metaheuristics are
the two prevalent classes of algorithms that have been used in the past for
expensive global optimization. A recent comprehensive algorithm comparison
[12] showed that both deterministic and stochastic algorithm classes are com-
petitive, with stochastic methods performing better on smaller evaluation bud-
gets, and deterministic methods performing better on relatively larger evaluation
budgets.
When the evaluation budget is limited, the optimization efficiency of stochastic algorithms can be enhanced significantly (i) by using cheap approximation functions as surrogate models during optimization and (ii) by incorporating parallelization, i.e., proposing multiple points for expensive evaluation during the iterative stochastic optimization process.
Prior work on the use of surrogates for expensive optimization is dominated
by iterative algorithmic frameworks where, in each algorithm iteration, surro-
gates are (i) used to propose new points for evaluation and (ii) subsequently
updated after new points are evaluated. This general framework is also called
Sequential Model-based Optimization (SMBO) [4]. Gaussian Processes (GP)
have been the most popular choice as surrogates for the SMBO framework [6,14].
However, many recent studies show that Radial Basis Functions (RBFs) can be more effective as surrogates than GPs for the SMBO framework [5,9,11].
A critical component of any SMBO algorithm is the mechanism for proposing
new points for expensive evaluations. If this mechanism allows multiple points
to be proposed for evaluation in each algorithm iteration, expensive evaluations
of the multiple points can be executed in parallel. Parallel variants of both GP
[14] and RBF-based algorithms [7,10] have been proposed in the past, with the
focus on balancing the trade-off between exploration and exploitation during the
process of proposing multiple points for expensive evaluations. However, most
of these algorithms have been applied to situations where only a few points are
proposed for evaluation in each algorithm iteration. For instance, Snoek et al.
[14] test their GP-based algorithm with up to only 10 parallel evaluations.
Given the availability of many cores on large computing clusters, it is impor-
tant to explore the efficiency of surrogate algorithms when many points are
proposed for parallel evaluation in each algorithm iteration. This study explores
the efficiency of the SOP algorithm [7] in this regard, and subsequently proposes
674 T. Akhtar and C. A. Shoemaker
a modified version of SOP, called SOP-Hybrid, that is designed for parallel optimization of expensive functions when many cores (approximately 50 or more) are available.
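The synchronous parallel setting described above can be sketched in a few lines of Python; the quadratic objective below is a hypothetical stand-in for an expensive simulation, and the function names are ours, not from the SOP implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def expensive_objective(x):
    # Stand-in for a costly simulation; a real run may take minutes or hours.
    return sum(xi ** 2 for xi in x)

def evaluate_batch(points, workers=4):
    """Synchronously evaluate a batch of proposed points in parallel;
    the algorithm iteration blocks until all evaluations have returned."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(expensive_objective, points))
```

With P proposed points and P available cores, the wall-clock cost of one synchronous iteration approaches that of a single expensive evaluation.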
where $F_1(x) = f(x)$ and $F_2(x) = -\min_{s \in S^{(n)} \setminus \{x\}} \|s - x\|$. Here $S^{(n)}$ is the set of all $n$ previously evaluated points. $F_1(x)$ denotes the expensive objective function value, and $F_2(x)$ is the negated minimum distance of an evaluated point from the other evaluated points, and is hence a metric of the isolation of an evaluated point.
After evaluated points are ranked (via non-dominated sorting as per Eq. 1), P
points are selected (these points are also called centers) in SOP for neighborhood
candidate search (see Sect. 2.3 below). Selection of center points according to
Eq. 1 is essentially based on the exploration-exploitation trade-off [6], where
F1 (x) implies exploitation (since we will subsequently perform candidate search
around better solutions found so far) and F2 (x) implies exploration.
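A minimal sketch of this center-selection step, assuming Euclidean distance and a simple quadratic-time non-dominated sort (the function names are ours, not from the SOP code):

```python
import math

def bi_objective_scores(points, fvals):
    """Score each evaluated point by (F1, F2): the objective value and
    the negated distance to its nearest other evaluated point (Eq. 1)."""
    scores = []
    for i, x in enumerate(points):
        d = min(math.dist(x, s) for j, s in enumerate(points) if j != i)
        scores.append((fvals[i], -d))  # both components are minimized
    return scores

def dominates(a, b):
    """Pareto dominance for minimization: a is no worse in every
    component and differs from b."""
    return all(ai <= bi for ai, bi in zip(a, b)) and a != b

def nondominated_rank(scores):
    """Non-dominated sorting: rank 0 is the Pareto-optimal front."""
    n = len(scores)
    ranks, remaining, rank = [0] * n, set(range(n)), 0
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(scores[j], scores[i])
                            for j in remaining if j != i)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks

def select_centers(points, fvals, P):
    """Pick P centers in order of non-dominated rank on (F1, F2)."""
    ranks = nondominated_rank(bi_objective_scores(points, fvals))
    return sorted(range(len(points)), key=lambda i: ranks[i])[:P]
```

Points that are both good (low F1) and isolated (large nearest-neighbor distance, hence low F2) land in the earliest fronts and are chosen as centers.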
After P center points are selected, the SOP algorithm proceeds by generating P large sets of candidate points. One candidate set is generated around each center, i, by randomly perturbing a subset of decision variables around center i. This candidate generation mechanism is also referred to as Dynamic Coordinate Search (DYCORS), which was first proposed in [11]. Subsequently, one point, the point with the best surrogate approximation value, is selected for expensive evaluation from each candidate set. Hence, SOP proposes P new points (Step 5 in Algorithm 1) for simultaneous evaluation in each iteration.
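The DYCORS-style candidate step can be sketched as follows; the perturbation probability, step size, and surrogate are placeholders of ours, assuming box-constrained decision variables:

```python
import random

def dycors_candidates(center, num_cands, perturb_prob, sigma, bounds):
    """Generate candidates around a center by Gaussian perturbation of a
    random subset of coordinates (DYCORS-style dynamic coordinate search)."""
    dim = len(center)
    cands = []
    for _ in range(num_cands):
        # Each coordinate is perturbed with a given probability;
        # at least one coordinate is always perturbed.
        idx = [i for i in range(dim) if random.random() < perturb_prob]
        if not idx:
            idx = [random.randrange(dim)]
        x = list(center)
        for i in idx:
            lo, hi = bounds[i]
            x[i] = min(hi, max(lo, x[i] + random.gauss(0.0, sigma)))
        cands.append(x)
    return cands

def best_by_surrogate(cands, surrogate):
    """Select the candidate with the lowest surrogate prediction."""
    return min(cands, key=surrogate)
```

Running this once per center yields the P points proposed for simultaneous expensive evaluation.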
3 SOP-Hybrid Framework
4 Computer Experiments
4.1 Experimental Setup
Test Problems. SOP and SOP-Hybrid are tested on six noiseless BBOB benchmark [3] problems (F15, F16, F19, F20, F23, F24). These test problems are chosen since they are highly multi-modal and are frequently used in global optimization competitions. Moreover, F20, F23 and F24 have weak global structures and hence pose an additional optimization challenge. The number of decision variables for all test problems is set to ten.
Fig. 2. Progress curves (low curves are better) of SOP and SOP-Hybrid with 48 syn-
chronous simultaneous evaluations, for Problems F15 (left sub-plot) and F16 (right
sub-plot)
4.2 Results
The progress curves of SOP and SOP-Hybrid for test problems F15 and F16 are
illustrated in Fig. 2. Results of Fig. 2 indicate that both SOP and SOP-Hybrid
have comparable performance on Problems F15 and F16. However, performance
of SOP-Hybrid is slightly better.
Fig. 3. Progress curves (low curves are better) of SOP and SOP-Hybrid with 48 syn-
chronous simultaneous evaluations, for Problems F19 (left sub-plot) and F20 (right
sub-plot)
5 Conclusion
SOP-Hybrid is an extension of the synchronous parallel SOP algorithm that is designed for computationally expensive continuous black-box functions and parallel frameworks where computational resources are available for simultaneously evaluating many (more than 40) expensive points in each algorithm iteration. SOP-Hybrid incorporates the weighted distance acquisition function proposed in [9,10] into the SOP framework, to balance the exploration-exploitation trade-off both at the global level (via center selection) and the local level (via DYCORS candidate search).
Fig. 4. Progress curves (low curves are better) of SOP and SOP-Hybrid with 48 synchronous simultaneous evaluations, for Problems F23 (left sub-plot) and F24 (right sub-plot)
Results, with 48 points proposed for synchronous evaluations, show that
SOP-Hybrid is more efficient than the baseline SOP algorithm. In the future, we wish to extend our comparative analysis of SOP and SOP-Hybrid by testing
both algorithms on test problems with up to 200 simultaneous evaluations per
iteration, and on higher dimensional problems. Moreover, we intend to compare
both algorithms on some real simulation-optimization applications.
References
1. Deb, K., Kalyanmoy, D.: Multi-Objective Optimization Using Evolutionary Algo-
rithms. Wiley, New York (2001)
2. Eriksson, D., Bindel, D., Shoemaker, C.: pySOT: Surrogate optimization toolbox. https://github.com/dme65/pySOT (2015)
3. Hansen, N., Finck, S., Ros, R., Auger, A.: Real-parameter black-box optimization
benchmarking 2009: Noiseless functions definitions. Technical Report RR-6829,
INRIA (2009)
4. Hutter, F., Hoos, H.H., Leyton-Brown, K.: Sequential model-based optimization
for general algorithm configuration. In: Proceedings of the 5th International Con-
ference on Learning and Intelligent Optimization, pp. 507–523. Springer-Verlag,
Heidelberg (2011)
5. Ilievski, I., Akhtar, T., Feng, J., Shoemaker, C.: Efficient hyperparameter optimiza-
tion for deep learning algorithms using deterministic RBF surrogates. In: AAAI
Conference on Artificial Intelligence (2017)
6. Jones, D.R., Schonlau, M., Welch, W.J.: Efficient global optimization of expensive
black-box functions. J. Global Optim. 13(4), 455–492 (1998)
7. Krityakierne, T., Akhtar, T., Shoemaker, C.A.: SOP: parallel surrogate global opti-
mization with pareto center selection for computationally expensive single objective
problems. J. Global Optim. 66(3), 417–437 (2016)
8. Pintér, J.D.: Global Optimization in Action. Springer, New York (1996)
9. Regis, R.G., Shoemaker, C.A.: A stochastic radial basis function method for the
global optimization of expensive functions. INFORMS J. Comput. 19(4), 497–509
(2007)
10. Regis, R.G., Shoemaker, C.A.: Parallel stochastic global optimization using radial
basis functions. INFORMS J. Comput. 21(3), 411–426 (2009)
11. Regis, R.G., Shoemaker, C.A.: Combining radial basis function surrogates dynamic
coordinate search in high dimensional expensive black-box optimization. Eng.
Optim. 45(5), 529–555 (2013)
12. Sergeyev, Y.D., Kvasov, D.E., Mukhametzhanov, M.S.: On the efficiency of nature-inspired metaheuristics in expensive global optimization with limited budget. Sci. Rep. 8, 453 (2018)
13. Sergeyev, Y.D., Kvasov, D.E.: Deterministic Global Optimization: An Introduction
to the Diagonal Approach, 1st edn. Springer, New York (2017)
14. Snoek, J., Larochelle, H., Adams, R.P.: Practical Bayesian optimization of machine
learning algorithms. In: Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q.
(eds.) Advances in Neural Information Processing Systems, vol. 25, pp. 2951–2959.
Curran Associates, Inc. (2012)
Surrogate Many Objective Optimization:
Combining Evolutionary Search,
ε-Dominance and Connected Restarts
This work was partially supported by the Singapore National Research Foundation,
Prime Minister’s Office, Singapore under its Campus for Research Excellence and Tech-
nological Enterprise (CREATE) programme (E2S2-CREATE project CS-B) and by
Prof. Shoemaker’s NUS startup grant.
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 681–690, 2020.
https://doi.org/10.1007/978-3-030-21803-4_68
682 T. Akhtar et al.
1 Introduction
Many real-world optimization problems are multi-objective, where evaluation
of objectives is computationally expensive. Multi-Objective Evolutionary Algo-
rithms (MOEAs) are extremely popular for solving computationally expensive
multi-objective problems, since their population-based structure allows MOEAs
to converge to the Pareto front and simultaneously find diverse trade-off solu-
tions [5].
Despite their inherent capability of simultaneously pursuing convergence and
diversity, MOEAs may still require many expensive simulations to find suitable
trade-off solutions, especially for Many-objective Optimization (MaOO) prob-
lems [1,2]. Iterative use of surrogate models in optimization can significantly
reduce computational effort for expensive MO problems.
The taxonomy of iterative surrogate multi-objective optimization is discussed
in [10]. Many iterative surrogate algorithms have been proposed in past literature
for expensive multi-objective optimization, and are dominated by methods that
either use Gaussian Processes (GP) [6,7,11] or Radial Basis Functions (RBFs)
[1,14] as surrogates. However, most surrogate methods introduced in the past
are only designed for and tested on problems with up to 3 objectives.
Since many real-world optimization applications can have many objectives (more than three), this study proposes ε-GOMORS, an extension of the GOMORS algorithm [1], that is designed to handle many objectives. GOMORS is an iterative surrogate MO algorithm that uses RBFs as surrogates and performs better than the GP-based ParEGO [11] on a limited evaluation budget, especially on problems with more than 10 decision variables.
ε-GOMORS replaces the use of non-dominance archiving in GOMORS by the ε-non-dominance archiving introduced in [13]. This ε-non-dominance archiving mechanism is a computationally efficient alternative to non-dominance archiving and has been used in some non-surrogate MOEAs to improve algorithm run-time efficiency and scale performance on many-objective problems [9,12].
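An ε-non-dominance archive can be sketched with a box-index dictionary (minimization assumed; the tie-break inside a box is a simplistic choice of ours, not the rule from [13]):

```python
def eps_box(point, eps):
    """Map an objective vector to its epsilon-box index (minimization)."""
    return tuple(int(v // eps) for v in point)

def eps_dominates(a, b):
    """Box a dominates box b if it is no worse in every coordinate."""
    return all(x <= y for x, y in zip(a, b))

def update_eps_archive(archive, point, eps):
    """Maintain an epsilon-non-dominated archive: at most one point per
    epsilon-box, and no retained box is dominated by another box."""
    box = eps_box(point, eps)
    if box in archive:
        # Keep the better representative within the same box
        # (crude sum-based tie-break, for illustration only).
        if sum(point) < sum(archive[box]):
            archive[box] = point
        return archive
    if any(eps_dominates(b, box) for b in archive):
        return archive  # the newcomer's box is dominated: discard it
    # Remove boxes dominated by the newcomer, then insert it.
    for b in [b for b in archive if eps_dominates(box, b)]:
        del archive[b]
    archive[box] = point
    return archive
```

Because dominance is checked on box indices rather than raw objective vectors, the archive size stays bounded, which is what makes this mechanism cheap on many-objective problems.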
An additional challenge associated with multi-objective algorithms is that
they can get stuck in locally optimum solutions and fronts, especially for prob-
lems with multi-modal objectives. Restart mechanisms have been used to alle-
viate this challenge in the past, especially in MOEAs [9,12] and single objective
surrogate algorithms [15,16]. ε-GOMORS also incorporates a novel restart mechanism to ensure that the algorithm does not get stuck in locally optimum fronts.
archiving [9,12]. If the expensive optimization problem has many objectives, the auxiliary problems of Eqs. 1 and 2 will also have many objectives. Hence, solving the auxiliary problems requires an algorithm that is suitable for many-objective problems. ε-GOMORS thus uses ε-NSGA-II [12] as the auxiliary solver instead of NSGA-II (see Fig. 1). Moreover, for the Gap problem of Eq. 2, a solution is randomly selected from the ε-non-dominance archive as $x_{crowd}$.
States Geological Survey (USGS) Station 01421618), and the two calibration
objectives represent different errors between simulated and observed data. Run-
ning time of a 10-year simulation of the Cannonsville model is around 1 min. The
watershed calibration problem is called ‘CFLOW’ in subsequent discussions.
$$H_c(P) = \frac{H_v(P) - H_v(P_{\mathrm{init}})}{H_v(P^*) - H_v(P_{\mathrm{init}})} \qquad (3)$$
Let P be the set of non-dominated solutions obtained by an algorithm and let
P ∗ be the Pareto front of the multi-objective problem being solved. Moreover, let
Hv (A) be the Hypervolume [6] of the objective space dominated by an arbitrary
set A. Consequently, $H_c(P)$ is the proportion of the total feasible objective space (after subtracting the space dominated by the initial solutions, i.e., $P_{\mathrm{init}}$) dominated by P. Higher values of $H_c$ are desirable, and the ideal value is 1.
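For two objectives, $H_c$ of Eq. 3 can be computed with a simple sweep over the sorted front; this is an illustrative sketch assuming minimization and a fixed reference point, not the evaluation code used in the experiments:

```python
def hypervolume_2d(points, ref):
    """Hypervolume dominated by a 2-objective point set (minimization)
    with respect to a reference point ref, via a left-to-right sweep."""
    pts = sorted(p for p in points if p[0] <= ref[0] and p[1] <= ref[1])
    hv, best_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 < best_f2:
            # New staircase step: rectangle not covered by earlier points.
            hv += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return hv

def hv_coverage(P, P_star, P_init, ref):
    """Normalized hypervolume coverage Hc of Eq. 3: the value 1 means the
    approximation gains as much space beyond the initial solutions as the
    true Pareto front P* does."""
    hv_init = hypervolume_2d(P_init, ref)
    return ((hypervolume_2d(P, ref) - hv_init)
            / (hypervolume_2d(P_star, ref) - hv_init))
```

For more than two objectives an exact hypervolume routine (e.g., the WFG algorithm) or Monte Carlo estimation would replace `hypervolume_2d`.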
Fig. 2. DTLZ2 Progress Plots: Hypervolume coverage progress curves (averaged over
multiple trials) of all algorithms for DTLZ2 [4], with 2, 4 and 6 objectives. Each subplot
corresponds to a Hypervolume progress plot (higher values are better) comparison for
a fixed number of objectives (depicted in subplot title).
Surrogate Many Objective Optimization 687
3.2 Results
Progress Curves for Many-Objective Test Problems. Results for the two scalable test problems, DTLZ2 and DTLZ4 [4], are compared by plotting the Hypervolume coverage ($H_c$) obtained by an algorithm against the number of completed function evaluations. We call these plots progress curves in subsequent discussions. Figures 2 and 3 illustrate the progress curves for DTLZ2 and DTLZ4, respectively. ε-GOMORS is labeled as “eps-GOMORS” and ε-NSGA-II is labeled as “EpsNSGA2” in Figs. 2 and 3.
Fig. 3. DTLZ4 Progress Plots: Hypervolume coverage progress curves (averaged over
multiple trials) of all algorithms for DTLZ4 [4], with 2, 4 and 6 objectives. Each subplot
corresponds to a Hypervolume progress plot (higher values are better) comparison for
a fixed number of objectives (depicted in subplot title).
4 Conclusion
ε-GOMORS is a novel extension of the surrogate MO algorithm GOMORS [1] that incorporates ε-dominance and connected restarts to handle many-objective optimization problems. A restart mechanism is introduced in ε-GOMORS to ensure that the algorithm does not get stuck in locally optimum trade-off solutions.
Results of ε-GOMORS on two many-objective test problems are promising, indicating that the ε-dominance concept allows the algorithm to scale well for up to six objectives. Moreover, results on many-objective test problems also show that ε-GOMORS is more efficient than other state-of-the-art many-objective evolutionary (non-surrogate) algorithms, ε-NSGA-II and NSGA-III, when the evaluation budget is limited to 1000.
When applied to a watershed calibration problem, ε-GOMORS is more reliable than GOMORS (i.e., the variance of ε-GOMORS' performance across multiple optimization trials is lower), and considerably more efficient than the other surrogate (e.g., the Gaussian Process-based ParEGO) and non-surrogate algorithms it is compared against. In the future, we intend to test ε-GOMORS with different surrogate taxonomies [2] to further improve efficiency for surrogate many-objective optimization. A Python implementation of ε-GOMORS is available upon request and will be made available online in the future as part of the pySOT toolbox [8].
References
1. Akhtar, T., Shoemaker, C.A.: Multi objective optimization of computationally
expensive multi-modal functions with RBF surrogates and multi-rule selection.
J. Global Optim. 64(1), 17–32 (2016)
2. Deb, K., Hussein, R., Roy, P.C., Toscano, G.: A taxonomy for metamodeling frame-
works for evolutionary multi-objective optimization. IEEE Trans. Evol. Comput.
1–1 (2018)
1 Introduction
Bilevel programming problems are hierarchical optimization problems with two
levels, each of which is an optimization problem itself. The upper level problem
models the leader’s decision making problem whereas the lower level problem
models the follower’s problem. These two problems are coupled through common
variables.
Consider a particular problem formulated by Dempe and Franke [4]:
$$\min_{x,y}\ a^T x + b^T y \quad \text{s.t.}\quad x \in P_1,\ y \in P_2,\quad y \in \arg\min_y \{x^T y : y \in P_2\}, \qquad (1)$$
where a and b are vectors with entries in $\mathbb{R}_{\max}$ and $TP_1$ and $TP_2$ are tropical polyhedra of $\mathbb{R}^n_{\max}$, in the sense of the following definition.
Note that unlike the classical halfspace, the tropical halfspace is defined as a
solution set of a two-sided inequality, since we cannot move terms in the absence
of (immediately defined) tropical subtraction. Also note that any tropical polyhedron can be defined as a set of the form
$$\{x \in \mathbb{R}^n_{\max} \mid Ax \oplus c \le Bx \oplus d\},$$
where A, B are matrices and c, d are vectors with entries in $\mathbb{R}_{\max}$ of appropriate dimensions. Furthermore, any tropical polyhedron is a tropically convex set in
the sense of the following definition:
$$\operatorname{opt}_{x,y}\ f(x, y) \quad \text{s.t.}\quad x \in TP_1,\ y \in \arg\min_y \{x^T y : y \in TP_2\}, \qquad (2)$$
Using $\phi(x) := \min_y \{x^T y : y \in TP_2\}$ we can write the lower level value function (LLVF) reformulation of (2):
$$\operatorname{opt}_{x,y}\ f(x, y) \quad \text{s.t.}\quad x \in TP_1,\ y \in TP_2,\ x^T y \le \phi(x). \qquad (3)$$
Further we will assume that f(x, y) is continuous and $TP_1$ and $TP_2$ are compact in the topology¹ induced by the metric $\rho(x, y) = \max_i |e^{x_i} - e^{y_i}|$.
Let us now introduce the following notion.
¹ In other words, $e^{f(x,y)}$ is continuous and the sets $\{y \in \mathbb{R}^n_+ : \log(y) \in TP_1\}$ and $\{z \in \mathbb{R}^n_+ : \log(z) \in TP_2\}$ are compact in the usual Euclidean topology.
694 S. Sergeev and Z. Liu
$$\operatorname{opt}_{x,y}\ f(x, y) \quad \text{s.t.}\quad x \in TP_1,\ y \in TP_2. \qquad (4)$$
We verify whether $y^0 \in \arg\min_y \{(x^0)^T y : y \in TP_2\}$. If “yes” then stop: $(x^0, y^0)$ is a solution.
If not, then find a point $z^0$ of $S_{\min}(TP_2)$ that attains $\min_y \{(x^0)^T y : y \in TP_2\}$. Let $Z^{(0)} = \{z^0\}$.
$$\operatorname{opt}_{x,y}\ f(x, y) \quad \text{s.t.}\quad x \in TP_1,\ y \in TP_2,\ x^T y \le \min_{z \in Z^{(k-1)}} x^T z. \qquad (5)$$
Proof. First observe that, as $TP_1$ and $TP_2$ are compact, the feasible set of (4) is also compact. The feasible set of (5) is also compact, as the intersection of the compact set $TP_1 \times TP_2$ with the closed set $\{(x, y) : x^T y \le \min_{z \in Z^{(k-1)}} x^T z\}$. As f(x, y) is continuous as a function of (x, y), the optima in (4) and (5) always exist.
Tropical Analogues of a Dempe-Franke Bilevel Optimization Problem 695
belong to a finite (min-essential) subset of $TP_2$, and hence there exist $k_1$ and $k_2$ such that $k_1 < k_2$ and $z^{k_1} = z^{k_2}$. However, $z^{k_1} \in Z^{(k_2-1)}$, and hence the inequalities turn into equalities, and $(x^{k_2}, z^{k_2})$ is a globally optimal solution since it is feasible for (2) and globally optimal for its relaxation (5).
Let us now argue that a finite min-essential set exists for each tropical poly-
hedron TP.
We have the following known observation. Note, however, that this observa-
tion does not hold in the usual convexity, as counterexamples on the plane can
be easily constructed.
The set of extreme points of a tropical polyhedron is finite, see for example
Allamigeon, Gaubert and Goubault [2]. Combining this with an observation that
the set {z ∈ TP : z ≤ y} is compact and hence contains a minimal point, we
obtain the following claims.
$$\min_{x,y}\ f(x, y) \quad \text{s.t.}\quad x \in TP_1,\ y \in TP_2,\ x^T y \le \min_{y' \in TP_2} x^T y', \qquad (7)$$
where f(x, y) is isotone with respect to the second argument: $f(x, y^1) \le f(x, y^2)$ whenever $y^1 \le y^2$. We can observe the following.
$$\min_{x,y}\ f(x, y)$$
Step 2. For each point $y' \in M(TP_2)$ we solve the following optimization problem:
$$\min_x\ f(x, y') \quad \text{s.t.}\quad x \in TP_1,\ x^T y' \le x^T z\ \ \forall z \in M(TP_2). \qquad (8)$$
Step 3. Find the minimum among all Problems (8) for all $y' \in M(TP_2)$.
Note that when $f(x, y) = a^T x \oplus b^T y$ for some vectors a, b over $\mathbb{R}_{\max}$, Problem (8) can be solved by any algorithm of tropical linear programming [1,3,7].
The set of all minimal points can be found by a combination of the tropical
double description method of [2] that finds the set of all extreme points and the
techniques of Preparata et al. for finding all minimal points of a finite set [9],
although clearly a more efficient procedure should be sought for this purpose.
$$\operatorname{opt}_{x,y}\ f(x, y) \quad \text{s.t.}\quad x \in TP_1,\ y \in TP_2,\ x^T y = \phi(x), \qquad (10)$$
where $\phi(x) = \max_z \{x^T z : z \in TP_2\}$. The following are similar to Definitions 4 and 3.
Definition 6 (Maximal Points). Let TP be a tropical polyhedron. A point $x \in TP$ is called maximal if $y \ge x$ and $y \in TP$ imply $y = x$.
Definition 7 (Max-Essential Subset). Let TP be a tropical polyhedron. A set $S_{\max}$ is called a max-essential subset of TP if for any $x \in \mathbb{R}^n_{\max}$ the maximum $\max_z \{x^T z : z \in TP\}$ is attained at a point of $S_{\max}$.
However, it is immediate that each compact tropical polyhedron contains its
greatest point, and the above notions trivialize.
Proposition 3. Let TP be a compact tropical polyhedron. Then TP contains its greatest point $y^{\max}$. Furthermore, the singleton $\{y^{\max}\}$ is a max-essential subset of TP.
Proposition 3 implies that (9) and (10) are equivalent to
$$\operatorname{opt}_{x,y}\ f(x, y) \quad \text{s.t.}\quad x \in TP_1,\ y \in TP_2,\ x^T y = x^T y^{\max}, \qquad (11)$$
where $y^{\max}$ is the greatest point of $TP_2$. The following result yields an immediate solution of the max-max problem.
Corollary 2 (Solving the Max-max Problem). If f(x, y) is isotone with respect to both arguments and opt = max, then $(x^{\max}, y^{\max})$ is a globally optimal solution of (9), where $x^{\max}$ and $y^{\max}$ are the greatest points of $TP_1$ and $TP_2$.
Let us now consider (11) where f is not necessarily isotone, or where opt = min as in the case of the Min-max problem. Suppose that $y^{\max}$ has all components in $\mathbb{R}$ and define the point $x^*$ with coordinates
$$x^*_i = \bigoplus_{k \ne i} y^{\max}_k.$$
Lemma 3. Let $y^{\max} \in \mathbb{R}^n$. Consider sets I and J such that $I \cup J = [n]$ and $I \cap J = \emptyset$. Let x be such that
$$x_i = x^*_i\ \ \forall i \in I, \qquad x_i < x^*_i\ \ \forall i \in J. \qquad (12)$$
Proof. Observe that $y \in TP_2$ implies $y \le y^{\max}$. With x as in (12) and y such that $y \le y^{\max}$, we have
$$x^T y^{\max} = \bigoplus_{i \in I} x^*_i\, y^{\max}_i \oplus \bigoplus_{j \in J} x_j\, y^{\max}_j = \bigoplus_{i \in I} \Big(\bigoplus_{k \ne i} y^{\max}_k\Big) y^{\max}_i = \bigoplus_{k \in [n]} y^{\max}_k,$$
$$x^T y = \bigoplus_{i \in I} x^*_i\, y_i \oplus \bigoplus_{j \in J} x_j\, y_j = \bigoplus_{i \in I} \Big(\bigoplus_{k \ne i} y^{\max}_k\Big) y_i \oplus \bigoplus_{j \in J} x_j\, y_j.$$
$$TP_1^{IJ} = \{x \in TP_1 : x_j (x^*_j)^{-1} < x_i (x^*_i)^{-1}\ \forall i \in I,\ j \in J;\ \ x_k (x^*_k)^{-1} = x_l (x^*_l)^{-1}\ \forall k, l \in I\}, \qquad (15)$$
$$TP_2^{IJ} = \Big\{y \in TP_2 : \bigoplus_{i \in I} \Big(\bigoplus_{k \ne i} y^{\max}_k\Big) y_i = \bigoplus_{k \in [n]} y^{\max}_k\Big\}.$$
Note that “$x_j (x^*_j)^{-1}$” means $x_j - x^*_j$ in the usual arithmetic. Now, using Lemma 3 we can prove the following.
where the union is taken over all I and J such that $I \cap J = \emptyset$ and $I \cup J = [n]$.
Theorem 2 suggests that Problem (11) (and, equivalently, (9)) can be solved by the following straightforward procedure.
Step 1. For each partition I, J of [n], identify the system of inequalities (15) defining $TP_1^{IJ}$ and $TP_2^{IJ}$, and find a solution of the problem $\operatorname{opt}_{x,y} f(x, y)$ over $(x, y) \in TP_1^{IJ} \times TP_2^{IJ}$, if such a solution exists.
Fig. 1. The tropical polyhedra $TP_1$ (a) and $TP_2$ (b).
In this example, y max = (2, 1) (the greatest point of TP2 in Fig. 1b). There-
fore, x∗ = (1, 2). Table 1 shows three possible partitions of TP1 and TP2 . Par-
tition 1 corresponds to the line segment between (−2, −1) and (−2, −3) in TP1
and the line segment connecting y max and (2, −1) in TP2 (red). Partition 2 cor-
responds to the line segment between (−2, −1) and (−3, −1) in TP1 and the
line segment connecting y max and (1, 1) in TP2 (blue). Partition 3 corresponds
to the line segment between (−2, −1) and (−1, 0) in TP1 (green) and in TP2
the union of the line segment connecting y max and (1, 1) and the line segment
between y max and (2, −1) (green).
Table 1. The partitions I, J and the corresponding sets $TP_1^{IJ}$ and $TP_2^{IJ}$.
Assume the upper level objective is of the form $\min\ a^T x \oplus b^T y$, where $a, b \in \mathbb{R}^2$. In ordinary algebra it can be written as $\min \max\{a_1 + x_1,\ a_2 + x_2,\ b_1 + y_1,\ b_2 + y_2\}$. It is obvious that the objective function is isotone with respect to x and y. In partition 1, x = (−2, −3) and y = (2, −1) is always a solution regardless
of a and b. In partition 2, x = (−3, −1) and y = (1, 1) is a solution. In partition
3, either x = (−2, −1) and y = (1, 1) or x = (−2, −1) and y = (2, −1) solve the
problem. However, these solutions are always dominated by the optimal points of partition 1 and partition 2. Therefore, in this example, it is sufficient to consider only partition 1 and partition 2, and decide between $(x, y)^1 = ((-2, -3), (2, -1))$ and $(x, y)^2 = ((-3, -1), (1, 1))$. Taking $a_1 = a_2 = b_1 = b_2$ makes $(x, y)^2$ an optimal solution of the problem, but taking $a_2 = 10$ and $a_1 = b_1 = b_2$ results in $(x, y)^1$.
References
1. Allamigeon, X., Benchimol, P., Gaubert, S., Joswig, M.: Tropicalizing the simplex
algorithm. SIAM J. Discrete Math. 29(2), 751–795 (2015)
2. Allamigeon, X., Gaubert, S., Goubault, É.: Computing the vertices of tropical poly-
hedra using directed hypergraphs. Discrete Comput. Geom. 49, 247–279 (2013)
3. Butkovič, P.: Max-linear Systems: Theory and Algorithms. Springer, London (2010)
4. Dempe, S., Franke, S.: Solution algorithm for an optimistic linear Stackelberg prob-
lem. Comput. Oper. Res. 41, 277–281 (2014)
5. De Schutter, B., Heemels, W.P.M.H., Bemporad, A.: On the equivalence of linear
complementarity problems. Oper. Res. Lett. 30(4), 211–222 (2002)
6. De Schutter, B., Heemels, W.P.M.H., Bemporad, A.: Max-plus-algebraic problems
and the extended linear complementarity problem–algorithmic aspects. In: Proceed-
ings of the 15th IFAC World Congress. Barcelona, Spain (2002)
7. Gaubert, S., Katz, R.D., Sergeev, S.: Tropical linear-fractional programming and
parametric mean-payoff games. J. Symb. Comput. 47(12), 1447–1478 (2012)
1 Introduction
will still be original, being extensions of existing theorems in other articles, and we have added the concept of (Φ, ρ)-invexity to these extensions so that our results are more general. We organize the paper as follows. In the next section, we provide the notations to be used in the rest of the paper, and in Sect. 3 we present our main results.
2 Notations
In this section, we briefly review some notions of nonsmooth analysis widely used in the formulations and proofs of the main results of the papers [6,15]. As usual, ‖x‖ stands for the Euclidean norm of $x \in \mathbb{R}^n$, and $B_n$ denotes the closed unit ball in $\mathbb{R}^n$. Given $x, y \in \mathbb{R}^n$, we write $x \leqq y$ (resp. $x < y$) when $x_i \le y_i$ (resp. $x_i < y_i$) for all $i \in \{1, \ldots, n\}$. Moreover, we write $x \le y$ when $x \leqq y$ and $x \ne y$. The
zero vector of $\mathbb{R}^n$ is denoted by $0_n$. Given a nonempty set $A \subseteq \mathbb{R}^n$, we denote by $\overline{A}$, conv(A), and cone(A) the closure of A, the convex hull, and the convex cone (containing the origin) generated by A, respectively. Also, we denote the Clarke tangent cone of A at $\hat{x} \in A$ by $\Gamma(A, \hat{x})$, i.e.,
$$\Gamma(A, \hat{x}) := \big\{v \in \mathbb{R}^n \mid \forall \{x_r\} \subseteq A,\ x_r \to \hat{x},\ \forall t_r \downarrow 0,\ \exists v_r \to v \text{ such that } x_r + t_r v_r \in A\ \forall r \in \mathbb{N}\big\}.$$
Let $\hat{x} \in \mathbb{R}^n$ and let $\varphi : \mathbb{R}^n \to \mathbb{R}$ be a locally Lipschitz function. The Clarke directional derivative of φ at $\hat{x}$ in the direction $v \in \mathbb{R}^n$ and the Clarke subdifferential of φ at $\hat{x}$ are respectively given by
$$\varphi^0(\hat{x}; v) := \limsup_{y \to \hat{x},\ t \downarrow 0} \frac{\varphi(y + tv) - \varphi(y)}{t}$$
and
$$\partial_c \varphi(\hat{x}) := \big\{\xi \in \mathbb{R}^n \mid \langle \xi, v \rangle \le \varphi^0(\hat{x}; v) \text{ for all } v \in \mathbb{R}^n\big\}.$$
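As a concrete illustration of these two definitions (our example, not from the paper), take $\varphi(x) = |x|$ at $\hat{x} = 0$:

```latex
\[
\varphi(x) = |x|, \qquad
\varphi^0(0; v) = \limsup_{y \to 0,\ t \downarrow 0} \frac{|y + tv| - |y|}{t} = |v|,
\]
since $|y + tv| - |y| \le t|v|$ with equality along $y = 0$; hence
\[
\partial_c \varphi(0) = \{\xi \in \mathbb{R} \mid \xi v \le |v| \ \ \forall v \in \mathbb{R}\} = [-1, 1],
\]
recovering the familiar subdifferential of the absolute value at its kink.
```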
It is worth observing that if $\hat{x}$ is a minimizer of a locally Lipschitz function φ on a set C, then
$$0 \in \partial_c \varphi(\hat{x}) + N(C, \hat{x}),$$
where $N(C, \hat{x})$ denotes the Clarke normal cone of C at $\hat{x}$, i.e.,
$$N(C, \hat{x}) := \big\{x \in \mathbb{R}^n \mid \langle x, a \rangle \le 0,\ \forall a \in \Gamma(C, \hat{x})\big\}.$$
3 Main Results
In this paper, we consider the following multiobjective semi-infinite programming problem:
$$\text{(P)}\qquad \inf\ \big(f_1(x), f_2(x), \ldots, f_p(x)\big) \quad \text{s.t.}\quad g_t(x) \le 0,\ t \in T,\quad x \in \mathbb{R}^n,$$
704 A. Sadeghieh et al.
$$M := \{x \in \mathbb{R}^n \mid g_t(x) \le 0,\ \forall t \in T\}.$$
It should be observed from [8,19] that the PLV property is strictly weaker than continuity for (P). Thus, the following simple theorem improves on its continuous versions (see, e.g., [4,17]).
Theorem 1 (FJ necessary condition). Let $\hat{x}$ be a weakly efficient solution of (P). If the PLV property holds at $\hat{x}$, then there exist $\alpha_i \ge 0$ (for $i \in I$) and $\beta_t \ge 0$ (for $t \in T(\hat{x})$), with $\beta_t \ne 0$ for finitely many indexes, such that
$$0_n \in \sum_{i=1}^p \alpha_i\, \partial_c f_i(\hat{x}) + \sum_{t \in T(\hat{x})} \beta_t\, \partial_c g_t(\hat{x}), \quad \text{and} \quad \sum_{i=1}^p \alpha_i + \sum_{t \in T(\hat{x})} \beta_t = 1.$$
Φ−Weak Slater Constraint Qualification in Nonsmooth 705
where $\Theta(x) := \max_{i \in I} \{f_i(x) - f_i(\hat{x})\}$ and Ψ(x) is defined as in Definition 1. Thus, by the PLV property we deduce that
$$0_n \in \partial_c \vartheta(\hat{x}) \subseteq \operatorname{conv}\big(\partial_c \Theta(\hat{x}) \cup \partial_c \Psi(\hat{x})\big) \subseteq \operatorname{conv}\big(\operatorname{conv}(F_{\hat{x}}) \cup \operatorname{conv}(G_{\hat{x}})\big).$$
It is easy to check that $M = \{0\}$, and SC does not hold. Now, assume that $T^*$ is a finite subset of T. Take $q := \max(T^*)$ and $x_{T^*} := \frac{1}{q+1}$. Then
$$g_t(x_{T^*}) = \begin{cases} x_{T^*} - \dfrac{1}{t} = \dfrac{1}{q+1} - \dfrac{1}{t} < 0 & \text{if } t \in \mathbb{N} \cap T^*, \\[2mm] -x_{T^*} = -\dfrac{1}{q+1} < 0 & \text{if } t = 0. \end{cases}$$
Thus the GWSC holds.
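The sign checks in this example are easy to verify with exact rational arithmetic (illustrative code of ours; $g_t$ and the chosen $T^*$ follow the example above):

```python
from fractions import Fraction

def g(t, x):
    """Constraint family of the example: g_t(x) = x - 1/t for t in N,
    and g_0(x) = -x."""
    return -x if t == 0 else x - Fraction(1, t)

def gwsc_point(T_star):
    """The strictly feasible point used in the example:
    x = 1/(q+1) with q = max(T*)."""
    q = max(T_star)
    return Fraction(1, q + 1)

# For T* = {0, 1, 3, 7}: q = 7 and x = 1/8, strictly feasible for every t.
x = gwsc_point({0, 1, 3, 7})
```

Since $t \le q < q + 1$ for every $t \in \mathbb{N} \cap T^*$, we have $1/t > 1/(q+1)$, so every constraint value is strictly negative, exactly as claimed.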
As mentioned in [1], the definition of (Φ, ρ)-invexity generalizes almost all concepts of invexity and convexity.
Example 2. Consider the function $\Phi : \mathbb{R} \times \mathbb{R} \times \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ defined by
$$\Phi(x, y, u, w) := \begin{cases} w - \dfrac{u}{3y^2}\,|x^3 - y^3| & \text{if } y \ne 0, \\[2mm] w\,|x^3| & \text{if } y = 0. \end{cases}$$
Let x and $\hat{x}$ be arbitrary elements of $\mathbb{R}$. Since $\Phi(x, \hat{x}, \cdot, \cdot)$ is a linear function and
$$\Phi(x, y, 0, r) = \begin{cases} r & \text{if } y \ne 0, \\ r\,|x^3| & \text{if } y = 0, \end{cases}$$
the conditions (1) and (2) hold. Take $\rho(x, y) := -1$ for all $x, y \in \mathbb{R}$, and $\ell(x) := x^3$. It is easy to check that (3) holds too. Furthermore, as follows from [3], $\ell(\cdot)$ is not an invex function on $\mathbb{R}$ with respect to any $\eta : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$.
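A quick numerical check of the invexity inequality $\ell(x) - \ell(\hat{x}) \ge \Phi(x, \hat{x}, \ell'(\hat{x}), \rho(x, \hat{x}))$ for this example, with $\ell(x) = x^3$, $\ell'(\hat{x}) = 3\hat{x}^2$ and $\rho \equiv -1$ (illustrative code, assuming the form of Φ displayed above and this shape of the (Φ, ρ)-invexity inequality):

```python
def Phi(x, y, u, w):
    """The function of Example 2 (as written above)."""
    if y != 0:
        return w - u / (3 * y ** 2) * abs(x ** 3 - y ** 3)
    return w * abs(x ** 3)

def invexity_gap(x, xhat):
    """l(x) - l(xhat) - Phi(x, xhat, l'(xhat), -1) with l(x) = x^3;
    a nonnegative value means the invexity inequality holds here."""
    return (x ** 3 - xhat ** 3) - Phi(x, xhat, 3 * xhat ** 2, -1.0)

# The gap is nonnegative on a grid of test pairs (x, xhat).
grid = [i / 4 for i in range(-12, 13)]
assert all(invexity_gap(x, xh) >= 0 for x in grid for xh in grid)
```

For $\hat{x} \ne 0$ the gap simplifies to $(x^3 - \hat{x}^3) + 1 + |x^3 - \hat{x}^3| \ge 1$, and for $\hat{x} = 0$ it is $x^3 + |x^3| \ge 0$, so the check passes identically.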
Everywhere in the following we will assume that X equals the feasible set of (P), i.e., X = M, but for the sake of simplicity we will omit mentioning X. The following definition is motivated by the above comments.
Definition 4. Let Φ : Rn × Rn × Rn × R → R be a given function, and x̂ ∈ M .
We say that (P ) satisfies the Φ-weak SCQ (Φ-WSCQ, briefly) at x̂, if
– for each t ∈ T (x̂), the gt function is (Φ, ρt )−invex at x̂ for some given function
ρt : Rn × Rn → R.
Naturally, we are interested in establishing a Karush-Kuhn-Tucker necessary condition for (P) under the Φ-WSCQ assumption. In fact, the following theorem guarantees that Φ-WSCQ is a constraint qualification.
Theorem 2. (KKT necessary condition for weakly efficient solutions) Let x̂ be a weakly efficient solution of (P). Suppose that the Φ-WSCQ is satisfied at x̂ with ρ_t(x, x̂) ≥ 0 for every (x, t) ∈ R^n × T. Then, there exist α_i ≥ 0 (for i ∈ I) with Σ_{i=1}^{p} α_i = 1, and β_t ≥ 0 (for t ∈ T(x̂)), with β_t ≠ 0 for only finitely many indices, such that

0_n ∈ Σ_{i=1}^{p} α_i ∂_c f_i(x̂) + Σ_{t∈T(x̂)} β_t ∂_c g_t(x̂).   (4)
All we need to prove is that at least one α_i is positive. If this is not the case, then

Σ_{t∈T*} β_t ζ_t = 0_n,   Σ_{t∈T*} β_t = 1.

Thus, owing to Σ_{t∈T*} β_t ρ_t(x_{T*}, x̂) ≥ 0 and Definition 2, we get

0 ≤ Φ( x_{T*}, x̂, Σ_{t∈T*} β_t ζ_t, Σ_{t∈T*} β_t ρ_t(x_{T*}, x̂) )
  ≤ Σ_{t∈T*} β_t Φ( x_{T*}, x̂, ζ_t, ρ_t(x_{T*}, x̂) )
  ≤ Σ_{t∈T*} β_t ( g_t(x_{T*}) − g_t(x̂) ) = Σ_{t∈T*} β_t g_t(x_{T*}) < 0.
⟨λ, f(x̂)⟩ ≤ ⟨λ, f(x)⟩, ∀x ∈ M.
As proved in [7], the above definition of proper efficiency is weaker than its other definitions (under some assumed conditions). Thus, the following theorem can be extended to other senses of proper efficiency under further assumptions.
Theorem 3. (KKT strong necessary condition for properly efficient solutions) Let x̂ be a properly efficient solution of (P). Suppose that the Φ-WSCQ is satisfied at x̂ with ρ_t(x, x̂) ≥ 0 for every x ∈ R^n. Then, there exist α_i > 0 (for i ∈ I) with Σ_{i=1}^{p} α_i = 1, and β_t ≥ 0 (for t ∈ T(x̂)), with β_t ≠ 0 for only finitely many indices, such that (4) is fulfilled.
Proof. By the definition of proper efficiency, there exist some scalars λi > 0 (for
i ∈ I) such that x̂ is a minimizer of the following scalar semi-infinite problem:
min_{x∈M} Σ_{i=1}^{p} λ_i f_i(x).

for some μ_t ≥ 0 (t ∈ T(x̂)), with μ_t ≠ 0 for only finitely many indices. For each i ∈ I take α_i := λ_i / Σ_{i=1}^{p} λ_i, and for each t ∈ T(x̂) put β_t := μ_t / Σ_{i=1}^{p} λ_i.
Proof. (a) By contradiction, assume that Υ(x̂, ξ̂, λ̂) = 0 while x̂ is not a weakly efficient solution of (P). Then, we can find an x_0 ∈ M such that f_i(x_0) < f_i(x̂) for all i ∈ I. Thus, the (Φ, ρ_i)-invexity of the functions f_i implies that
According to the above inequalities, the (Φ, ρ_i)-invexity of the functions f_i, and the assumption λ̂ > 0_p, we get

Σ_{i=1}^{p} λ̂_i Φ(x_0, x̂, ξ̂_i, ρ̂_i) < 0   =⇒   Υ(x̂, ξ̂, λ̂) < 0,
Taking λ*_i := λ_i / Σ_{i=1}^{p} λ_i, we conclude that Σ_{i=1}^{p} λ*_i = 1, which implies that Σ_{i=1}^{p} λ*_i f_i is a (Φ, ρ)-invex function at x̂ by [2]. Thus, the latter inequality implies that

Σ_{i=1}^{p} λ*_i Φ(x_0, x̂, ξ̂_i, ρ) ≤ Σ_{i=1}^{p} λ*_i f_i(x_0) − Σ_{i=1}^{p} λ*_i f_i(x̂) < 0.
References
1. Antczak, T.: Saddle point criteria and Wolfe duality in nonsmooth (Φ, ρ)−invex
vector optimization problems with inequality and equality constraints. Int. J. Com-
put. Math. 92(5), 882–907 (2015)
2. Antczak, T., Stasiak, A.: (Φ, ρ)−invexity in nonsmooth optimization. Numer. Func.
Anal. Optim. 32, 1–25 (2015)
3. Caristi, G., Kanzi, M., Soleimani-damaneh, M.: On gap functions for nonsmooth
multiobjective optimization problems. Optim. Lett. (2017). https://doi.org/10.
1007/s11590-017-1110-4
4. Caristi, G., Ferrara, M., Stefanescu, A.: Semi-infinite multiobjective programming with generalized invexity. Math. Rep. 62, 217–233 (2010)
5. Caristi, G., Ferrara, M., Stefanescu, A.: Mathematical programming with (ρ, Φ)-
invexity. In: Konnor, I.V., Luc, D.T., Rubinov, A.M. (eds.) Generalized Convexity
and Related Topics. Lecture Notes in Economics and Mathematical Systems, vol.
583, pp. 167–176. Springer, Heidelberg (2006)
6. Clarke, F.H.: Optimization and Nonsmooth Analysis. Wiley-Interscience, New York (1983)
7. Ehrgott, M.: Multicriteria Optimization. Springer, Berlin (2005)
8. Goberna, M.A., Kanzi, N.: Optimality conditions in convex multiobjective SIP.
Math. Program. (2017). https://doi.org/10.1007/s10107-016-1081-8
9. Goberna, M., Guerra-Vazquez, F., Todorov, M.I.: Constraint qualifications in linear
vector semi-infinite optimization. Eur. J. Oper. Res. 227, 32–40 (2016)
10. Goberna, M.A., Guerra-Vazquez, F., Todorov, M.I.: Constraint qualifications in
convex vector semi-infinite optimization. Eur. J. Oper. Res. 249, 12–21 (2013)
11. Gopfert, A., Riahi, H., Tammer, C., Zalinescu, C.: Variational Methods in Partially Ordered Spaces. Springer, New York (2003)
12. Guerraggio, A., Molho, E., Zaffaroni, A.: On the notion of proper efficiency in
vector optimization. J. Optim. Theory Appl. 82, 1–21 (1994)
13. Hearn, D.W.: The gap function of a convex program. Oper. Res. Lett. 1, 67–71
(1982)
14. Hettich, R., Kortanek, O.: Semi-infinite programming: theory, methods, and appli-
cations. SIAM Rev. 35, 380–429 (1993)
15. Hiriart-Urruty, J.B., Lemarechal, C.: Convex Analysis and Minimization Algo-
rithms. I & II. Springer, Heidelberg (1991)
16. Kanzi, N., Shaker Ardekani, J., Caristi, G.: Optimality scalarization and duality
in linear vector semi-infinite programming. Optimization (2018). https://doi.org/
10.1080/02331934.2018.1454921
17. Kanzi, N.: Necessary and sufficient conditions for (weakly) efficient of nondiffer-
entiable multi-objective semi-infinite programming. Iran. J. Sci. Technol. Trans. A
Sci. (2017). https://doi.org/10.1007/s40995-017-156-6
18. Kanzi, N.: Necessary Optimality conditions for nonsmooth semi-infinite program-
ming problems. J. Global Optim. 49, 713–725 (2011)
19. Kanzi, N.: Constraint qualifications in semi-infinite systems and their applications
in nonsmooth semi-infinite problems with mixed constraints. SIAM J. Optim. 24,
559–572 (2014)
20. Kanzi, N.: On strong KKT optimality conditions for multiobjective semi-infinite
programming problems with Lipschitzian data. Optim. Lett. 9, 1121–1129 (2015)
21. Kanzi, N., Nobakhtian, S.: Optimality conditions for nonsmooth semi-infinite mul-
tiobjective programming. Optim. Lett. (2013). https://doi.org/10.1007/s11590-
013-0683-9
22. López, M.A., Vercher, E.: Optimality conditions for nondifferentiable convex semi-
infinite programming. Math. Program. 27, 307–319 (1983)
Data science: Machine Learning, Data
Analysis, Big Data and Computer Vision
A Discretization Algorithm for k-Means
with Capacity Constraints
1 Introduction
The clustering problem is the task of dividing data into several clusters such that data within the same cluster have large similarity and data in different clusters have large diversity. However, most clustering problems are less tractable than they may appear; the k-means problem is a prime example. It is one of the most classical and fundamental problems in theoretical computer science, combinatorial optimization, machine learning and artificial intelligence. As one of the most efficient tools for text clustering and sentiment analysis, k-means has received increasing attention from IT companies such as Google, Baidu and Microsoft.
In practical clustering tasks, especially large-scale ones, most existing clustering methods suffer from expensive computation and memory costs. Shen et al. [10] propose a compressed k-means clustering algorithm that
c Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 713–719, 2020.
https://doi.org/10.1007/978-3-030-21803-4_71
714 Y. Xu et al.
compresses high-dimensional data into short binary codes, which are well suited
for fast large-scale clustering. Spectral clustering is also one of the most important approaches in the big-data environment. Chen and Cai [3] propose a novel approach called Landmark-based Spectral Clustering that selects only a small number of representative data points as landmarks representing the original ones. Numerical experiments show that the spectral embedding of the data can be efficiently computed with the landmark-based representation. Further research is in progress.
Theoretically, Matoušek [9] proposed the concept of the ε-approximate centroid set for geometric k-clusterings, inspiring several well-known results. First, based on a local search heuristic that allows swapping centers in and out, Kanungo et al. [7] present a (9+ε)-approximation algorithm, the first constant approximation for k-means clustering. Second, Cohen-Addad et al. [4] and Friggstad et al. [5] independently propose a PTAS (Polynomial Time Approximation Scheme) for a large class of k-clusterings (including k-means) in fixed-dimensional Euclidean space.
Most results for capacitated k-clusterings are based on linear programming techniques. However, due to the limitations of LP-based techniques, these results are mostly pseudo-approximations that satisfy either the capacity constraints or the cardinality constraint, but not both. Byrka et al. [2] consider capacitated k-median and propose a bi-factor algorithm with two trade-offs: violating the capacities by a factor of 2+ε yields an O(1/ε)-approximation, while violating them by a factor of 3+ε yields a constant approximation. Later, Li [8] moved a step towards a constant approximation algorithm for capacitated k-median by proposing an O(1/ε²)-approximation algorithm which violates the cardinality constraint by a factor of (1+ε). An et al. [1] studied the capacitated k-center problem and proposed an LP-based 9-approximation algorithm. Heuristic techniques also work in some cases, but not always; see the improved k-means algorithm [6] for an example. No efficient capacitated k-means clustering algorithms are known.
Inspired by recent progress on k-clusterings, we consider capacitated k-means and propose a discretization algorithm that, in polynomial time, outputs an ε-approximate centroid set of acceptable size. In other words, we map the infinite continuous centroid set onto the intersections of a well-constructed grid with a loss of at most a factor of (1+ε). Moreover, our result implies an FPT(k, d) (i.e., with fixed parameters k and d) PTAS for capacitated k-means. We hope this result is a significant step towards constant approximations.
The remainder of this paper is organized as follows. In the next section we introduce some basic concepts, mainly the ε-approximate centroid set. In the third section we show how the ε-approximate centroid set works and how to deal with capacitated k-means when a centroid set is given. We build the ε-approximate centroid set for simple instances and then extend it to general instances in the fourth section. As an application, we present an FPT PTAS for capacitated k-means based on the proposed discretization algorithm in the fifth section. Further work is discussed in that section as well.
cost(S, C) = min_{Π:=(S_1, S_2, ··· , S_k)} Σ_{i=1}^{k} cost(S_i, c_i),
Next we will show how and why the -approximate centroid set works.
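To make the capacitated objective concrete, here is a minimal brute-force sketch (our own illustration, not the paper's algorithm): for tiny one-dimensional instances it computes the optimal assignment cost when every center may serve at most `capacity` points; the function name and uniform capacity are our assumptions.

```python
from itertools import product

def capacitated_cost(points, centers, capacity):
    """Brute-force the optimal capacitated assignment cost: each center
    may serve at most `capacity` points; the cost is the sum of squared
    distances from each point to its assigned center."""
    k = len(centers)
    best = float("inf")
    for assign in product(range(k), repeat=len(points)):
        # Skip assignments violating the capacity constraint.
        if any(assign.count(j) > capacity for j in range(k)):
            continue
        cost = sum((p - centers[j]) ** 2 for p, j in zip(points, assign))
        best = min(best, cost)
    return best

# Four 1-D points, two centers. With capacity 3 the three leftmost points
# share center 0.0; capacity 2 forces a more expensive split.
print(capacitated_cost([0.0, 0.1, 0.2, 1.0], [0.0, 1.0], 2))
print(capacitated_cost([0.0, 0.1, 0.2, 1.0], [0.0, 1.0], 3))
```

The exponential enumeration is only meant to pin down the objective; it is exactly this intractability that motivates approximation algorithms.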
Definition 3 (η-dense set). We say A′ is an η-dense set for A if for any point of A, there exists a point in A′ that is within η distance of it.
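Definition 3 is straightforward to check computationally; the sketch below (our own illustration for one-dimensional point sets, with hypothetical names) tests whether a candidate set is η-dense:

```python
def is_eta_dense(cover, points, eta):
    """Check Definition 3: `cover` is eta-dense for `points` if every
    point has some element of `cover` within distance eta."""
    return all(min(abs(p - c) for c in cover) <= eta for p in points)

# A grid with spacing 0.5 is 0.25-dense for points in [0, 1].
grid = [0.0, 0.5, 1.0]
print(is_eta_dense(grid, [0.1, 0.3, 0.7, 0.95], 0.25))
```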
We only present the theorem for general instances; for more detail, see the journal version of this paper.
References
1. An, H.C., Bhaskara, A., Chekuri, C., Gupta, S., Madan, V., Svensson, O.: Cen-
trality of trees for capacitated k-center. Math. Program. 154(1–2), 29–53 (2015)
2. Byrka J., Fleszar K., Rybicki B., Spoerhase J.: Bi-factor approximation algorithms
for hard capacitated k-median problems. In: Proceedings of the 26th Annual ACM-
SIAM Symposium on Discrete Algorithms, pp. 722–736. SIAM, San Diego, USA
(2015)
3. Chen X., Cai D.: Large scale spectral clustering with landmark-based representa-
tion. In: Proceedings of the 25th AAAI Conference on Artificial Intelligence, pp.
313–318. AAAI, San Francisco, USA (2011)
4. Cohen-Addad V., Klein P.N., Mathieu C.: Local search yields approximation
schemes for k-means and k-median in Euclidean and minor-free metrics. In: Pro-
ceedings of the 57th IEEE Annual Symposium on Foundations of Computer Sci-
ence, pp. 353–364. IEEE, New Brunswick, USA (2016)
5. Friggstad Z., Rezapour M., Salavatipour M.R.: Local search yields a PTAS for k-
means in doubling metrics. In: Proceedings of the 57th IEEE Annual Symposium
on Foundations of Computer Science, pp. 365–374. IEEE, New Brunswick, USA
(2016)
6. Geetha S., Poonthalir G., Vanathi P.: Improved k-means algorithm for capacitated
clustering problem. In: Proceedings of the 28th IEEE Conference on Computer
Communications, pp. 52–59. IEEE, Rio de Janeiro, Brazil (2009)
7. Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R., Wu,
A.Y.: A local search approximation algorithm for k-means clustering. Comput.
Geom. 28(2–3), 89–112 (2004)
8. Li, S.: On uniform capacitated k-median beyond the natural LP Relaxation. ACM
Trans. Algorithms 13(2), 1–22 (2017)
9. Matoušek, J.: On approximate geometric k-clustering. Discret. Comput. Geom.
24(1), 61–84 (2000)
10. Shen X., Liu W., Tsang I., Shen F., Sun Q.: Compressed k-means for large-scale
clustering. In: Proceedings of the 31st AAAI Conference on Artificial Intelligence,
pp. 2527–2533. AAAI, San Francisco, USA (2017)
A Gray-Box Approach for Curriculum
Learning
1 Introduction
Note that γ_m is used to emphasize the rewards that occur early during an episode. We say that s_t is an absorbing state if s_{t′} = s_t for all t′ ≥ t, and r_m(s_{t′}, a) = 0 for any action a ∈ A; that is, the state can never be left, and from that point on the agent receives a reward of 0. Absorbing states effectively terminate an episode before the maximum number of time steps T_m is reached.
The policy function π is obtained from an estimate qπ of the value function
q_π(s, a) := E[ Σ_{j=t}^{T_m − 1} (γ_m)^{j−t} r_m(s_j, a_j) : s_t = s, a_t = a ],
for any state s ∈ S and action a ∈ A. The value function is the expected reward
for taking action a in state s at any possible time step t and following π thereafter
until the end of the episode. We linearly approximate the value function qπ in a
parameter θ ∈ D ⊂ RK :
q_π(s, a; θ) := Σ_{k=1}^{K} θ_k φ_k(s, a),
722 F. Foglino et al.
where the φ_k are suitable basis functions mapping the pair (s, a) into R. The policy function π, for any point (s, a) ∈ S × A and any parameter θ ∈ D, is given by

π(s, a; θ) := q_π(s, a; θ) / Σ_{α∈A} q_π(s, α; θ).
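The two displayed formulas can be sketched in a few lines; note that this normalization only yields a valid probability distribution when the q-values are positive, and the basis functions `phi` below are a hypothetical toy choice, not from the paper:

```python
def q_value(features, theta):
    """Linear value approximation: q(s, a; theta) = sum_k theta_k * phi_k(s, a)."""
    return sum(t * f for t, f in zip(theta, features))

def policy(phi, state, actions, theta):
    """Normalize the (assumed positive) q-values over the action set,
    following the displayed definition of pi(s, a; theta)."""
    qs = {a: q_value(phi(state, a), theta) for a in actions}
    total = sum(qs.values())
    return {a: q / total for a, q in qs.items()}

# Toy example with hypothetical basis functions phi(s, a) = (1, s + a).
phi = lambda s, a: (1.0, s + a)
pi = policy(phi, 1.0, [0, 1], (0.5, 0.25))
print(pi)
```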
We want the agent to quickly obtain high values of ψ_{m_L} in a specific environment m_L that we call the final task. To do this, it is crucial to ensure that the reinforcement learning phase in the final task m_L starts from a good initial point θ_L, ideally close to a global maximum of ψ_{m_L} over D. Curriculum learning is precisely a way to obtain a good starting point θ_L, computed by sequentially learning the policy on a subset of possible tasks (i.e., environments) different from the final task m_L; see, e.g., [8] and references therein. The curriculum
c = (m0 , . . . , mL−1 ) is the sequence of these tasks in which the policy of the
agent is optimized before addressing the final task mL . Specifically, given a
starting θ_0 ∈ D, the point θ_1 is obtained by (approximately) maximizing ψ_{m_0} over {θ ∈ D : ||θ − θ_0|| < ζ_{m_0}}, the point θ_2 is obtained by (approximately) maximizing ψ_{m_1} over {θ ∈ D : ||θ − θ_1|| < ζ_{m_1}}, and so on. At the end of this
process we get a point θL ready to be used as starting guess for the optimization
of the policy in the final task mL . Clearly, the obtained θL depends on the
Pr(c) := Σ_{i=1}^{N_{m_L}} ( g − ψ_{m_L}( θ_{L+(i/N_{m_L})}(c) ) ),
where g is a given good performance threshold (which can be the total reward
obtained with the optimal policy when known), and θL+(i/NmL ) (c) is the point
obtained with the learning algorithm at the end of the ith episode. Given the
curriculum c, the function Pr (c) sums the gaps between the threshold g and the
total reward actually achieved at every episode. Clearly the aim is to minimize
it:

min_{c∈C} Pr(c).   (3)
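For a fixed curriculum, the regret Pr(c) is just a sum of gaps; a minimal sketch with hypothetical per-episode total rewards (the function name and numbers are our own):

```python
def regret(episode_rewards, g):
    """Pr(c): sum of gaps between the performance threshold g and the
    total reward achieved at the end of each episode. Lower is better."""
    return sum(g - r for r in episode_rewards)

# Three episodes of learning in the final task against a threshold g = 0.
print(regret([-3.0, -1.5, -0.5], 0.0))
```

Minimizing over curricula c ∈ C then means choosing the task sequence whose induced learning curve accumulates the smallest total gap.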
This concept of penalty in assumption (A1) is useful to model the fact that a
task mj can be preparatory for another task mi . In this sense, if the policy is
not optimized in the preparatory task mj before it is optimized in task mi , then
the utility given by task mi has to be reduced by the corresponding penalty.
We intend to approximate U with the following function, which is linear with respect to (δ, γ):

Û(δ, γ; u, p) := Σ_{i=1}^{n} u_i δ_i − Σ_{i=1}^{n} Σ_{j=1, j≠i}^{n} p_{ij} γ_{ij}.
maximize_{x,δ,γ}   Û(δ, γ; u, p)
subject to   x_i ≥ (L − 1)(1 − δ_i),                i = 1, . . . , n
             x_i + δ_j ≤ x_j + L γ_ji,              i, j = 1, . . . , n, i ≠ j   (4)
             γ_ij + γ_ji ≤ 1,                       i, j = 1, . . . , n, i ≠ j
             x ∈ [0, L − 1]^n ∩ Z^n, δ ∈ {0, 1}^n, γ ∈ {0, 1}^{n×(n−1)}.
Problem (4) is an Integer Linear Program (ILP) that can be solved by resorting
to many algorithms in the literature.
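For small n, the search can also be carried out by brute force instead of an ILP solver. The sketch below is our own illustration under one reading of the model, in which the utility of each scheduled task i is reduced by p_ij for every task j not scheduled before it; it is not the paper's formulation (4):

```python
from itertools import permutations

def best_curriculum(u, p, L):
    """Enumerate all length-L task orderings and return the one with the
    largest approximate utility: sum of u_i over scheduled tasks, minus
    p[i][j] for every pair where task j is not scheduled before task i.
    Tractable only for small n; the ILP is the scalable formulation."""
    n = len(u)
    best, best_val = None, float("-inf")
    for order in permutations(range(n), L):
        val = 0.0
        for pos, i in enumerate(order):
            val += u[i]
            before = set(order[:pos])
            val -= sum(p[i][j] for j in range(n) if j != i and j not in before)
        if val > best_val:
            best, best_val = order, val
    return best, best_val

# Task 1 is penalized (p[1][0] = 0.9) unless task 0 precedes it.
u = [1.0, 1.0, 0.0]
p = [[0, 0, 0], [0.9, 0, 0], [0, 0, 0]]
print(best_curriculum(u, p, 2))
```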
The following properties hold:

– Let (x̃, δ̃, γ̃) be an optimal point of the scheduling problem (4) with (u, p) ∈ R₊^n × R₊^{n×(n−1)}. Let c̃ = (m̃_0, . . . , m̃_{L−1}) be such that, for all j ∈ {0, . . . , L−1}, m̃_j = m ∈ T with x̃_{ord(m)} = j and δ̃_{ord(m)} = 1, where the operator ord(m) returns the index of the task m in T. Then c̃ ∈ C, i.e., c̃ is a feasible curriculum.
– Let c̃ = (m̃_0, . . . , m̃_{L−1}) be any curriculum in C; then parameters (ũ, p̃) ∈ R₊^n × R₊^{n×(n−1)} exist such that solving problem (4) with (u, p) = (ũ, p̃) gives (x̃, δ̃, γ̃) such that x̃_{ord(m̃_j)} = j and δ̃_{ord(m̃_j)} = 1, for all j ∈ {0, . . . , L − 1}. That is, any curriculum in C can be computed by solving problem (4) with suitable parameters (u, p).
The gray-box function Ψ can be used in different ways in order to solve the
curriculum learning problem efficiently. Here we consider three of them.
Computing a good estimate for (u, p) can be critical for obtaining good numerical performance. Here we propose a method justified by assumption (A1): if assumption (A1) holds, then for any (i, j) with i ≠ j:

U(m_i, m_j) = u_i + u_j − Σ_{k=1, k≠i}^{n} p_{ik} − Σ_{k=1, k≠i, k≠j}^{n} p_{jk} + U,

U(m_i) = u_i − Σ_{k=1, k≠i}^{n} p_{ik} + U,   U(m_j) = u_j − Σ_{k=1, k≠j}^{n} p_{jk} + U,
6 Experimental Evaluation
In order to evaluate the effectiveness of the proposed framework, we implemented it on the GridWorld domain. In this section, we describe the GridWorld setting and the libraries adopted in the implementation of the framework.
6.1 GridWorld
GridWorld is an implementation of an episodic grid-world domain used in the
evaluation of existing curriculum learning methods, see e.g. [15]. Each cell can
be free, or occupied by a fire, pit, or treasure. The aim of the game is to find the
treasure in the least number of possible episodes, avoiding both fires and pits.
An example of GridWorld is shown in Fig. 1.
States S: The state is given by the agent position, that is d = 2.
Actions A and transition function pm : The agent can move in the four
cardinal directions, and the actions are deterministic.
Reward function rm : The reward is −2500 for entering a pit, −500 for entering
a fire, −250 for entering the cell next to a fire, and 200 for entering a cell
with the treasure. The reward is −1 in all other cases.
Episodes length Tm , absorbing states, discount parameter γm : All the
episodes terminate under one of these three conditions: the agent falls into a
pit, reaches the treasure, or executes a maximum number of actions (Tm =
50). We use γm = 0.99.
Basis functions φ_k: The variables fed to tile coding are the distance from, and relative position of, the treasure (which is global and fulfills the Markov property), and the distance from, and relative position of, any pit or fire within a radius of 2 cells from the agent (these are local variables that allow the agent to learn how to deal with such objects when they are close, and to transfer this knowledge from one task to another).
We consider tasks of dimensions similar to Fig. 1 and with a variable number of
fires and pits. The number of episodes for all the tasks is the same.
docplex (v 2.8.125): the Python modeling interface to CPLEX, used for solving the ILP (4). We set the running time to 60 s per iteration and the MIP gap to 10^{-2}.
GPyOpt (v 1.2.5): used as black-box optimization algorithm for solving prob-
lem (5) when no information on good estimates of (u, p) is available. It is
a Sequential Model Based Optimization (SMBO) algorithm where the sur-
rogate function is defined through a Gaussian Process and the new point is
determined by the maximization of the EI [1,12].
Table 1. Results obtained on the GridWorld domain (Pr* indicates the regret obtained with the optimal policy).

              n = 12, L = 4          n = 7, L = 7
Algorithm     Pr        Rank         Pr        Rank
C0            −0.6389   11499        −0.5051   4535
GREEDY Par    −0.7765   144          −0.6113   260
GP            −0.7882   32           −0.6511   38
Heuristic     −0.7773   121          −0.5966   417
TPE           −0.8025   4            −0.6697   14
Pr*: −0.8149, |C| = 13345            Pr*: −0.7224, |C| = 13700
From the numerical results, it is evident that all the proposed optimization methods based on the gray-box improve the performance value Pr obtained when training the agent directly on the final task (algorithm C0). As proof of the effectiveness of the proposed heuristic method from (6) and (7), we highlight that this procedure always finds better solutions than C0 and solutions similar to those returned by GREEDY Par. Moreover, the definition of a surrogate function through a Gaussian Process seems to be a successful choice for further improving the solution found. Finally, the local search performed by TPE around the tentative point (u, p) leads to a remarkable improvement of the final performance, finding in both scenarios one of the 15 best solutions out of more than 13000 possible curricula.
References
1. GPyOpt: a Bayesian optimization framework in Python. http://github.com/SheffieldML/GPyOpt (2016)
2. Belotti, P., Kirches, C., Leyffer, S., Linderoth, J., Luedtke, J., Mahajan, A.: Mixed-
integer nonlinear optimization. Acta Numer. 22, 1–131 (2013)
3. Bergstra, J.: Hyperopt: distributed asynchronous hyperparameter optimization in
python (2013)
4. Bergstra, J., Yamins, D., Cox, D.D.: Making a science of model search: hyperpa-
rameter optimization in hundreds of dimensions for vision architectures (2013)
5. Bergstra, J.S., Bardenet, R., Bengio, Y., Kégl, B.: Algorithms for hyper-parameter
optimization. In: Advances in Neural Information Processing Systems, pp. 2546–
2554 (2011)
6. Custódio, A.L., Scheinberg, K., Nunes Vicente, L.: Methodologies and software
for derivative-free optimization. In: Advances and Trends in Optimization with
Engineering Applications, pp. 495–506 (2017)
7. Di Pillo, G., Liuzzi, G., Lucidi, S., Piccialli, V., Rinaldi, F.: A DIRECT-type app-
roach for derivative-free constrained global optimization. Comput. Optim. Appl.
65(2), 361–397 (2016)
8. Foglino, F., Leonetti, M.: An optimization framework for task sequencing in cur-
riculum learning (2019). arXiv preprint arXiv:1901.11478
9. Frazier, P.I.: A tutorial on bayesian optimization (2018). arXiv preprint
arXiv:1807.02811
10. Leonetti, M., Kormushev, P., Sagratella, S.: Combining local and global direct
derivative-free optimization for reinforcement learning. Cybern. Inf. Technol.
12(3), 53–65 (2012)
11. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G.,
Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., et al.: Human-level
control through deep reinforcement learning. Nature 518(7540), 529 (2015)
12. Rasmussen, C.E.: Gaussian processes in machine learning. In: Advanced Lectures
on Machine Learning, pp. 63–71. Springer (2004)
13. Shahriari, B., Swersky, K., Wang, Z., Adams, R.P., De Freitas, N.: Taking the
human out of the loop: a review of bayesian optimization. Proc. IEEE 104(1),
148–175 (2016)
14. Snoek, J., Larochelle, H., Adams, R.P.: Practical bayesian optimization of machine
learning algorithms. In: Advances in Neural Information Processing Systems, pp.
2951–2959 (2012)
15. Svetlik, M., Leonetti, M., Sinapov, J., Shah, R., Walker, N., Stone, P.: Automatic
curriculum graph generation for reinforcement learning agents. In: AAAI, pp. 2590–
2596 (2017)
A Study on Graph-Structured Recurrent
Neural Networks and Sparsification with
Application to Epidemic Forecasting
Zhijian Li1 , Xiyang Luo2 , Bao Wang2 , Andrea L. Bertozzi2 , and Jack Xin1(B)
1
UC Irvine, Irvine, CA, USA
zhijil2@uci.edu, jxin@math.uci.edu
2
UCLA, Los Angeles, CA, USA
xylmath@gmail.com, wangbaonj@gmail.com, bertozzi@math.ucla.edu
1 Introduction
Epidemic forecasting has been studied for decades [8]. Many statistical and
machine learning methods have been successfully used to detect epidemic out-
breaks [5]. In previous works, epidemic forecasting is mainly considered as a time-
series problem. Time-series methods, such as Auto-Regression (AR), Long Short-
term Memory (LSTM) neural networks and their variants have been applied to
this problem. One of the current directions is to use social media data [9]. In
2008, Google launched Google Flu Trend, a digital service to predict influenza
outbreaks using Google search data. The Google algorithm was discontinued due to flaws; however, in 2015 Yang et al. [13] designed another algorithm, ARGO, also using Google search pattern data. Google Correlate, a collection of time-series data of Google search trends, plays a vital role in this refined regression algorithm. Though ARGO succeeded in accuracy as a time-series algorithm, it lacks spatial structure and requires the additional input of external features (e.g., social media data).
suggests that forecasting is also a spatial problem. Here we study a model to
take advantage of the spatial information so that the data from the adjacent
regions can introduce regional spatial features. This way, we minimize external
data input and the accompanying computational cost. Structured recurrent neu-
ral network (SRNN) is a model for the spatial-temporal problem first adopted
by Jain et al. [4] for motion forecasting in computer vision. Wang et al. [10–12]
successfully adapted SRNN to forecast real-time crime activities. Motivated by
[4,10,11], we present an SRNN model to forecast epidemic activity levels. We test
our model with data provided by the Center for Disease Control (CDC), which
collects data from approximately 100 public and 300 private laboratories in the
US [1]. The CDC data [1] is a well-established authoritative data set widely used
by researchers, which makes it easy for us to compare our model with previous
work. CDC provides the influenza data by the geography of Health and Human
Services regions (HHS regions). We take the geographic structure of ten HHS
regions as our spatial information. The rest of the paper is organized as fol-
lows. In Sect. 2, we overview RNN. In Sects. 3–5, we present a graph-structured
RNN model, graph description of spatial correlations, and sparsity promoting
penalties. Experimental results and concluding remarks are in Sects. 6 and 7.
[Figure: an RNN with parameter matrices U, V, W, shown folded and unfolded over time steps t − 1, t, t + 1, with inputs x_t, hidden states h_t, and outputs ŷ_t.]
|| ∂ŷ_t / ∂s_k || = || (∂ŷ_t / ∂s_t) Π_{i=k}^{t−1} (∂s_{i+1} / ∂s_i) || ≤ η^{t−k} || ∂ŷ_t / ∂s_t ||,

where η < 1 under the assumptions that no bias is used and that the spectral norm of W is less than 1. We see that the gradient vanishes exponentially fast for large t; hence, the RNN learns less and less as time goes by. LSTM [3] is a special kind of RNN that resolves this problem.
f_t = σ(W_f [h_{t−1}, x_t] + b_f),   i_t = σ(W_i [h_{t−1}, x_t] + b_i),
o_t = σ(W_o [h_{t−1}, x_t] + b_o),   C̃_t = tanh(W_C [h_{t−1}, x_t] + b_C),
C_t = f_t ∗ C_{t−1} + i_t ∗ C̃_t,    h_t = o_t ∗ tanh(C_t).
Since the same recurrent function is not applied directly to h_t at every time step in the gradient flow, there is no intrinsic factor η in ∂L_t/∂W. This way the gradient has much less chance to vanish as time goes by. In our model, we use LSTMs for all RNNs.
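A minimal scalar LSTM step matching the gate equations above might look as follows; the parameter layout (`W` and `b` holding the f, i, o, C̃ gates, each acting on the concatenation [h_prev, x_t]) is our own simplification, since real implementations use matrix-valued gates:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell(x_t, h_prev, c_prev, W, b):
    """One LSTM step for scalar input and state.
    W[g] = (weight on h_prev, weight on x_t) and b[g] is the bias,
    for gates g = 0..3 corresponding to (f, i, o, C~)."""
    z = [W[g][0] * h_prev + W[g][1] * x_t + b[g] for g in range(4)]
    f_t, i_t, o_t = sigmoid(z[0]), sigmoid(z[1]), sigmoid(z[2])
    c_tilde = math.tanh(z[3])
    c_t = f_t * c_prev + i_t * c_tilde   # C_t = f_t * C_{t-1} + i_t * C~_t
    h_t = o_t * math.tanh(c_t)           # h_t = o_t * tanh(C_t)
    return h_t, c_t
```

The additive cell update c_t is exactly what removes the repeated factor η from the gradient product.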
M = max_v |v|,   m = min_v |v|,   g(v) = (|v| − m) / (M − m + 10^{-6}).
In our model, nodes with label 0 are in the relatively inactive class, and nodes with label 1 or higher belong to the other class, the relatively active class.
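The rescaling g can be sketched directly (the function name is our own):

```python
def normalize(values):
    """Min-max rescale of activity magnitudes |v|, following
    g(v) = (|v| - m) / (M - m + 1e-6)."""
    mags = [abs(v) for v in values]
    M, m = max(mags), min(mags)
    return [(x - m) / (M - m + 1e-6) for x in mags]

print(normalize([-4.0, 1.0, 3.0]))
```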
We define an RNN E_{i,j} for each connected edge, i.e., each pair with w_ij ≠ 0. We call E_{i,j} an edge RNN since it models the pairwise interaction between two connected nodes. We enforce weight sharing between two edge RNNs, RNN_{E_{i′,j′}} and RNN_{E_{i,j}}, if g(i) = g(i′) and g(j) = g(j′), i.e., if the class assignments of the two node pairs are the same. Similarly, we define an RNN N_i for each node in Z, which we call a node RNN, and apply weight sharing if g(i′) = g(i). Even though the RNNs share weights, their state vectors are still different, and thus we denote them with distinct indices.
Let {v_i^t, i ∈ 1 . . . N} be the set of node features at time t. The GSRNN makes a prediction at node i, time t, by first feeding neighboring features to the respective edge RNNs, and then feeding the averaged output along with the node features to the respective node RNN. Namely,

f_i^t = Σ_j w_ij RNN_{E_{i,j}}(v_i^t, α_ij v_j^t),   ŷ_i^t = RNN_{N_i}(v_i^t, f_i^t).   (1)
Let y_i^t be the true signal at time t. We use the mean square loss function below:

L_t(Θ) = (1/N) Σ_i (ŷ_i^t − y_i^t)².   (2)
Fig. 2. Red edges are of type H-L, green edges are of type L-L, and blue edges are of type H-H.
734 Z. Li et al.
In our model with C = 2, we have three types of edges: H-H, L-L, and H-L. An H-H edge connects two nodes in class H, an L-L edge connects two nodes of class L, and an H-L edge connects a node of class H and a node of class L. The features of each edge type are fed into a different RNN. We normalize our edge weights by the maximum degree: each edge has weight α_ij = w_ij = 1/M_e for all i and j, where M_e is the maximum degree over the ten nodes.
We use a look-back window of two to generate training data for the RNNs: the node feature of v^t contains the information of node v at t − 1 and t − 2. Then, the edge features of a node v ∈ H with edge set E_v are

e^t_{v,H} = ( v_1^t / M_e , v_2^t / M_e , · · · ),   e^t_{v,L} = ( u_1^t / M_e , u_2^t / M_e , · · · )

for all u_i ∈ L such that (v, u_i) ∈ E_v. We feed e^t_{v,H} and e^t_{v,L} into the corresponding edge RNNs:

f^t = edgeRNN_{H−L}(v^t, (1/M_e) e^t_{v,L}),   h^t_v = edgeRNN_{H−H}(v^t, (1/M_e) e^t_{v,H}).
Each edge RNN is jointly trained on all the nodes that have an edge of its type:

arg min_Θ L_{H−L}(Θ) = (1/|N_w|) Σ_{w∈N_w} Σ_t (y_w^t − ŷ_w^t)².
We feed the outputs of the two edge RNNs, together with the node feature of v itself, into nodeRNN_H (Fig. 3):

v^{t+1} = nodeRNN_H(v^t, f^t, h^t).
Fig. 3. Edge features of the same type are jointly trained by one edge RNN. Nodes from the same class are jointly trained by one node RNN.
[Figure: the graph of the ten HHS regions, with nodes labeled 1–10.]
Since lim_{a→0+} ρ_a(x_i) = 1_{x_i ≠ 0} and lim_{a→+∞} ρ_a(x_i) = |x_i| for all i, the T1 penalty interpolates between ℓ0 and ℓ1. For its sparsification in compressed sensing and other applications, see [14] and references therein. To sparsify weights in GSRNN training via ℓ1 and T1, we add them to the loss function of GSRNN with a multiplicative penalty parameter α > 0, and call a stochastic gradient descent optimizer in TensorFlow. Though a relaxed splitting method [2] can enforce sparsity much faster, we leave this as part of future work on the ℓ0 penalty.
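The definition of ρ_a itself falls outside this excerpt; assuming the standard transformed-ℓ1 form ρ_a(x) = (a + 1)|x| / (a + |x|) from the T1 literature [14], the two limits can be checked numerically:

```python
def t1(x, a):
    """Transformed-l1 penalty rho_a(x) = (a + 1)|x| / (a + |x|)
    (standard T1 form, assumed here since the definition is not
    displayed in this excerpt)."""
    return (a + 1.0) * abs(x) / (a + abs(x))

# As a -> infinity the penalty approaches |x| (l1) ...
print(t1(2.0, 1e8))
# ... and as a -> 0+ it approaches the 0/1 indicator of x != 0 (l0).
print(t1(2.0, 1e-8), t1(0.0, 1e-8))
```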
6 Experimental Results
Among the previous works on influenza forecasting, ARGO [13] is the current state-of-the-art prediction model for the entire U.S. influenza activity. To compare with previous works conveniently, we use the CDC data from 2013 to 2015 as our test data. The accuracy is measured by

RMSE = sqrt( (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)² ).
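The accuracy metric can be implemented directly:

```python
import math

def rmse(y, y_hat):
    """Root mean square error between ground truth and predictions."""
    n = len(y)
    return math.sqrt(sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat)) / n)

print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```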
We use a single layer LSTM with 40 hidden units for edge RNNs, and a
three-layer multilayer LSTM with hidden units [10, 40, 10] for node RNNs. We
use the Adam optimizer to train GSRNN. The RMSE of the forecasting from
2013/1/19 to 2015/8/15, 135 weeks in total, is shown in Table 1. We outperform
LSTM and the Autoregressive Model of order 3 (AR(3)) in all nodes, and ARGO in 8
nodes; see Fig. 5 for activity plots in each region. In regions 1, 2, 7 and 8,
ARGO shows some under-predictions, while GSRNN's prediction is
almost identical to the ground truth. The general form of an AR(p) model for
time-series data is
X_t = μ + Σ_{i=1}^p φ_i X_{t−i} + ε_t,
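Dropping the noise term gives the one-step AR(p) point forecast; a sketch under that assumption (names are illustrative):

```python
def ar_forecast(history, mu, phi):
    # X_t = mu + sum_{i=1..p} phi_i * X_{t-i}; history[-1] is X_{t-1}
    p = len(phi)
    return mu + sum(phi[i] * history[-(i + 1)] for i in range(p))
```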
where φ = (φ_1, ..., φ_p) collects the autoregressive coefficients, often written
compactly via the backshift operator. ARGO [13],
as a refined autoregressive model, models the flu activity level as:
ŷ_t = μ_y + Σ_{j=1}^{52} α_j y_{t−j} + Σ_{i=1}^{100} β_i X_{i,t} + ε_t,    [μ_y, α, β] := arg min_{μ_y, α, β} Σ_t (y_t − ŷ_t)²,
with ε_t i.i.d. Gaussian noise and X_{i,t} the log-transformed Google search frequency
of term i at time t.
We observe that ARGO has inconsistent performance over nodes. We believe
this is because the external feature of ARGO, the Google search pattern data,
does not offer useful information, since the national search pattern does not
necessarily apply to a certain HHS region. Meanwhile, our model also has a much lower
computational cost than ARGO, which takes in the top 100 search terms related
to influenza as well as their historical activity levels, with a look-back window
of 52 weeks. In the time ARGO takes to compute one node, our model
finishes all ten nodes.
We sparsify the network through ℓ1 and T1 (Eq. (3), using a = 1 and penalty
parameter α = 10^{−8} during training). After training, we hard-threshold small
network weights to 0 at threshold 10^{−3}, and find that high sparsity under T1
regularization is achieved while maintaining accuracy at the same level; see
Tables 2 and 3. Hard-thresholding improves the predictions for some nodes but
not all of them; however, it reduces the inference latency and is thus beneficial
for the overall algorithm.
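The post-training hard-thresholding step can be sketched as follows (threshold value from the text; the function name is ours):

```python
def hard_threshold(weights, tau=1e-3):
    # zero out weights with magnitude below tau; report resulting sparsity
    pruned = [0.0 if abs(w) < tau else w for w in weights]
    sparsity = pruned.count(0.0) / len(pruned)
    return pruned, sparsity
```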
Table 1. The RMSE between the predicted and ground-truth activity levels by differ-
ent methods over 10 different states.
Node 1 2 3 4 5 6 7 8 9 10
AR(3) 0.242 0.383 0.481 0.415 0.345 0.797 0.401 0.305 0.356 0.317
ARGO 0.281 0.379 0.397 0.335 0.285 0.673 0.449 0.244 0.356 0.310
LSTM 0.271 0.364 0.487 0.349 0.328 0.751 0.421 0.333 0.335 0.310
GSRNN 0.223 0.354 0.374 0.320 0.289 0.664 0.361 0.275 0.284 0.303
Penalty 1 2 3 4 5
α = 0 51.2% 47.8% 50.3% 50.6% 49.9%
ℓ1 (α = 5 · 10−8) 67.7% 51.8% 57.7% 60.7% 61.2%
TL1 (α = 5 · 10−8) 82.3% 58.9% 71.9% 64.2% 71.1%
738 Z. Li et al.
Fig. 5. The exact and predicted flu activity levels by GSRNN and ARGO.
Node 1 2 3 4 5 6 7 8 9 10
α = 0 0.230 0.351 0.390 0.334 0.314 0.676 0.380 0.297 0.287 0.316
ℓ1 (α = 5 · 10−8) 0.234 0.351 0.388 0.327 0.306 0.685 0.363 0.290 0.281 0.296
TL1 (α = 5 · 10−8) 0.225 0.363 0.379 0.328 0.296 0.690 0.365 0.272 0.311 0.305
7 Concluding Remarks
We studied epidemic forecasting based on a graph-structured RNN model to take
into account geo-spatial information. We also sparsified the model, driving
70% of the network weights to zero while maintaining the same level of prediction
accuracy. In future work, we plan to (1) explore wider neighborhood
interactions and more powerful sparsification methods, (2) study additional fac-
tors such as environmental conditions, population distribution, transportation
networks, sanitary conditions among others, (3) train RNNs with the recently
developed Laplacian smoothing gradient descent method [6].
References
1. CDC data: https://gis.cdc.gov/grasp/fluview/fluportaldashboard.html
2. Dinh, T., Xin, J.: Convergence of a relaxed variable splitting method for
learning sparse neural networks via ℓ1, ℓ0, and transformed-ℓ1 penalties (2018).
arXiv:1812.05719
3. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8),
1735–1780 (1997)
4. Jain, A., Zamir, A., Savarese, S., Saxena, A.: Structural-RNN: deep learning on
spatio-temporal graphs. In: Conference on Computer Vision and Pattern Recogni-
tion (CVPR 2016) (2016)
5. Nsoesie, E., Brownstein, J., Ramakrishnan, N., Marathe, M.: A systematic review
of studies on forecasting the dynamics of influenza outbreaks. Influenza Other
Respir. Viruses 8(3), 309–316 (2014)
6. Osher, S., Wang, B., Yin, P., Luo, X., Pham, M., Lin, A.: Laplacian smoothing
gradient descent (2018). arXiv:1806.06317
7. Pascanu, R., Mikolov, T., Bengio, Y.: On the difficulty of training recurrent neu-
ral networks. In: Proceedings of the 30th International Conference on Machine
Learning (2013)
8. Perra, N., Goncalves, B.: Modeling and predicting human infectious disease. In:
Social Phenomena, pp. 59–83. Springer (2015)
9. Volkova, S., Ayton, E., Porterfield, K., Corley, C.: Forecasting influenza-like illness
dynamics for military populations using neural networks and social media. PLOS
one 12(12), e0188941 (2017)
10. Wang, B., Luo, X., Zhang, F., Yuan, B., Bertozzi, A., Brantingham, P.: Graph-
based deep modeling and real time forecasting of sparse spatio-temporal data
(2018). arXiv:1804.00684
11. Wang, B., Yin, P., Bertozzi, A., Brantingham, P., Osher, S., Xin, J.: Deep learning
for real-time crime forecasting and its ternarization (2017). arXiv:1711.08833
12. Wang, B., Zhang, D., Zhang, D., Brantingham, P., Bertozzi, A.: Deep learning for
real-time crime forecasting (2017). arXiv:1707.03340
13. Yang, S., Santillana, M., Kou, S.: Accurate estimation of influenza epidemics using
Google search data via ARGO. Proc. Natl. Acad. Sci. 112(47) (2015)
14. Zhang, S., Xin, J.: Minimization of transformed ℓ1 penalty: theory, difference of
convex function algorithm, and robust application in compressed sensing. Math.
Program. Ser. B 169(1), 307–336 (2018)
Automatic Identification of Intracranial
Hemorrhage on CT/MRI Image Using
Meta-Architectures Improved from
Region-Based CNN
1 Introduction
According to WHO statistics, stroke remains the second leading cause of global
human deaths over the last 15 years [1]. Hemorrhagic stroke is known as acute
stroke due to its abrupt symptom onset and rapid deterioration. Hypertensive
damage leads to the rupture of cerebral arteries, and blood leaks directly into
the parenchyma (intracerebral hemorrhage) or the subarachnoid space (subarachnoid
hemorrhage). Traumatic brain injury is secondary to accidents with blows
to the head or shaking, especially traffic accidents in which victims strike their
heads. Cranial trauma can lead to the main intracranial hemorrhage (ICH)
types: epidural hematoma, subdural hematoma, intracerebral hemorrhage and
subarachnoid hemorrhage [2–4].
CT and MRI are the two popular radiological methods used for the detection and
diagnosis of brain bleeding at hospitals [2]. However, both CT and MRI depend
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 740–750, 2020.
https://doi.org/10.1007/978-3-030-21803-4_74
ICH Identification on CT/MRI Image Using Meta-Architectures 741
on the expertise of radiologists and neurosurgeons, who usually rely on the naked
eye to extract the important signs of disease from images [5], which is a
matter of concern in the diagnosis and treatment of brain hemorrhage.
With the development of image processing techniques and machine learning
algorithms, much research has been performed to detect, identify and segment
bleeding zones in the brain. However, most typical computer-aided diagnosis
(CAD) studies utilize traditional computer vision techniques [6]. In those
systems, CT/MRI images pass through many processing stages such as
enhancement, type transformation, segmentation and feature extraction. Therefore,
important characteristics of the CT/MRI images can be lost during
processing. For instance, Mahmoud et al. [7] propose an approach to detect and
classify brain hemorrhage on CT images automatically with two main parts:
image processing and classification. Using Otsu's method in the segmentation
stage to detect brain hemorrhage is the highlight of this study.
On the other hand, deep learning in general, and convolutional neural networks
(CNNs) in particular, have shown promising results in a broad range of state-of-the-art
computer vision tasks such as object detection and image classification
[6,8,9]. Correspondingly, medical imaging groups have also turned their research
orientation from analysis based on traditional techniques to deep learning
methodology for image analysis across a variety of tasks. According to the review
of Bernal et al. [9], the diagnosis of brain diseases has also seen many CNN-based
proposals for different tasks such as brain tumor detection and the classification
of brain hemorrhages. Rezaei et al. [10] propose a CNN architecture
with seven convolutional layers and three fully-connected layers for brain abnormality
classification on MR images. In the approach of Arbabshirani et al. [6], ICH
presence on CT studies is identified with a fully 3-dimensional deep learning
architecture consisting of five convolutional and two fully-connected layers (aside
from max pooling and local normalization layers). Generally, most studies
are not implemented directly on DICOM files. Research groups are only interested
in detecting and classifying brain abnormalities, and they ignore important
characteristics of the anomalies that affect diagnosis and patient monitoring. In
medical image analysis, radiologists usually use the Hounsfield Unit (HU) for
determining abnormalities on CT/MRI images. In addition, the implementation
of a new CNN architecture takes a lot of time and hardware cost.
Moreover, to learn effectively, CNNs require a training set large enough to cover
the variation of cases to classify. DICOM datasets of brain hemorrhage, nevertheless,
are extremely expensive and scarce because they contain private information.
From these viewpoints, we propose a combination of HU with deep
learning for the identification of ICH. HU is used for detecting the hemorrhage
regions. Two region-based meta-architectures, Faster R-CNN [11] and
R-FCN [12], are evaluated for the accuracy of ICH classification. The rest
of this paper is structured as follows. Section 2 presents the background of our
approach. In the following sections, we illustrate the proposed method and experiments.
The conclusion and perspectives for future work are drawn in the final
section.
742 T.-H.-Y. Le et al.
2 Background
2.1 Meta-Architectures Improved from the Strategy of R-CNN
The combination of region proposals with CNNs (R-CNN) drives the advances
of CNN-based object detection approaches [11–13]. The steps of the original R-CNN
are fairly intuitive, with two stages: proposing regions and classifying the region
proposals with features extracted from them. However, its performance is very slow
because it does not share convolutional computations among regions [11,12]. After the
introduction of R-CNN, many approaches were suggested to improve it. By
using a region proposal network (RPN) instead of Selective Search for proposing
regions, Faster R-CNN and R-FCN really stand out among these approaches.
Faster R-CNN is an improvement of Fast R-CNN, which is the immediate
descendant of R-CNN. Fast R-CNN runs only one CNN to extract features
over the entire image before generating region proposals, i.e. regions are proposed
based on the last feature map of the CNN, not from the input image. The SVM
classifiers of the original are also replaced with a softmax layer. In other words,
instead of creating a new model, Fast R-CNN extends the neural network for
predictions [14]. The remaining problem of Fast R-CNN is the Selective Search used
for proposing regions. Selective Search is one of the most common region proposal
methods, based on greedy merging, but its implementation is slow compared
to efficient detection networks. Faster R-CNN solves this bottleneck of Fast R-CNN
by using an RPN, a deep convolutional network, instead of Selective
Search for proposing regions. It can be said that Faster R-CNN is the composition
of an RPN and Fast R-CNN in a single, unified network for object detection, in
which Fast R-CNN is used to classify the region proposals produced by the
RPN module. Moreover, Ren et al. perform 4-step alternating training to
share computation between the RPN and Fast R-CNN. They start by
training the RPN. In the following step, they train a separate detection network (Fast
R-CNN) with the regions proposed by the first-step RPN. Both the RPN and the
detection network are initialized with an ImageNet-pre-trained model, and the RPN is
fine-tuned end-to-end for the region proposal task. After the second step, the
two networks do not yet share convolutional layers. Next, the RPN and the detection
network are trained in turn, fixing the shared convolutional layers and
fine-tuning only their unique layers [11].
With a comparable idea, improving the speed of R-CNN by sharing computation
across region proposals and between networks, R-FCN increases speed by
maximizing the shared computation. The design of R-FCN adopts the popular
two-stage strategy of CNN-based object detection: generating region
proposals (regions of interest, RoIs) by the fully convolutional architecture of an RPN,
and classifying the candidate regions. All learnable weight layers of R-FCN are
convolutional and are computed on the entire image. In addition, Dai et al. address
the tradeoff between translation invariance for image-level
classification and translation variance for object detection when convolutional
computations are shared across the whole net. Inspired by the
development of FCNs for instance-level semantic segmentation, they introduce the
concept of position-sensitive score maps, which are convolutional feature maps
trained to recognize certain parts of each object. k² score maps represent
relative positions (e.g. k = 3 corresponds to 9 relative positions: top-left, top-middle,
top-right, etc.) of one object class. Moreover, a position-sensitive RoI
pooling layer is introduced to guide learning from the score maps for
object detection. Like Faster R-CNN, the RPN and R-FCN share features according
to the 4-step alternating training [12].
Table 1. The absorbed radiation degree of matters in brain by Hounsfield Units [18, 19]
Here, the pixel value (actually a voxel value) is the absorbed radiation degree
of a voxel in the DICOM image data, while RescaleSlope and RescaleIntercept are
values stored in the respective DICOM tags and specify the linear transformation
of the data, HU = pixel value · RescaleSlope + RescaleIntercept [19,20].
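The standard DICOM rescale transformation just described can be written as a one-line helper (the function name is ours):

```python
def to_hounsfield(pixel_value, rescale_slope, rescale_intercept):
    # HU = pixel value * RescaleSlope + RescaleIntercept (DICOM rescale tags)
    return pixel_value * rescale_slope + rescale_intercept
```

For a typical CT slice with RescaleSlope 1 and RescaleIntercept −1024, a stored pixel value of 100 maps to −924 HU.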
3 Proposed Method
3.1 Converting DICOM to PNG Based on Window Technique
To identify ICH zones, the structures of interest, it is necessary to extract them from
brain CT/MRI images. In practice, the Hounsfield Unit is an important quantity
that helps specialists detect ICH; as a result, we apply it in our experiments.
Besides, in our approach, it is necessary to label the data used to retrain the network
models. However, the range of CT numbers (exceeding 2000 values) recorded by
modern CT/MRI scanners exceeds the popular 0–255 grayscale
displayable on computers [2,20]. Therefore, after computing the HU value of each pixel,
we convert DICOM to PNG with windowing. According to this technique,
HU values above (window level + window width/2) are assigned white and those
below (window level − window width/2) are assigned black.
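The windowing rule just described can be sketched as a mapping from HU to an 8-bit gray level, with linear interpolation inside the window (a common convention; the interpolation inside the window is our assumption, as the text only fixes the two extremes):

```python
def window_to_gray(hu, level, width):
    # map an HU value to 0-255 using window level/width:
    # at or below (level - width/2) -> black, at or above (level + width/2) -> white
    lo = level - width / 2.0
    hi = level + width / 2.0
    if hu <= lo:
        return 0
    if hu >= hi:
        return 255
    return int(round((hu - lo) / (hi - lo) * 255))
```

For example, with a brain window (level 40, width 80), HU values at or below 0 render black and values at or above 80 render white.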
After converting images from DICOM to PNG, the ICH regions used for retraining
the network models are labeled into the 4 main types of ICH. In the following
steps, training and evaluation data are generated in the form of tfrecord files.
The training result of each model is a frozen inference graph (.pb file) containing
the respective ICH detection classifier. Figure 2 presents our implementation
process to retrain and test the models. After testing, the better classifier will be
integrated into a CAD system supporting ICH diagnosis.
4 Experiments
We implement experiments to detect and classify 4 types of ICH with
Faster R-CNN and R-FCN using the ResNet-101 architecture. Our dataset is
collected from Can Tho University Hospital and comprises 250 axial head CT slices
that show manifestations of brain hemorrhage (365 regions).
Table 3 presents the rest of the dataset (50 images with 77 hemorrhage regions),
which is used for evaluating the accuracy of the trained network models. Aside
from converting the image data to PNG, some important medical-record
information stored in DICOM tags is saved and integrated with the classification
result to support ICH diagnosis. Based on HU values, the hemorrhagic time is also
recorded.
After generating the respective label csv files of the Train Set and Eval Set, we
create TFRecord files used for training the networks. After TensorFlow initializes
the training, each step reports the loss. This value starts at 2.6 and descends
rapidly as training progresses. The training stops when the Loss (L) and the
DetectionBoxes Precision mAP@.50IOU (P) saturate. Here P, calculated
with the COCO detection metrics, is the mean average precision of detection
boxes at 50% IOU. It takes about 30000 steps (2 h 9 min) and 77000 steps (5 h
30 min) for Faster R-CNN (L ≤ 0.06 and P ≈ 90.56%) and R-FCN (L ≤ 0.03 and
P ≈ 86.46%), respectively.
5 Conclusion
Our research aims to detect brain hemorrhage regions on CT/MRI images and
classify them into four main types of ICH with the meta-architectures Faster
R-CNN and R-FCN, improved from the R-CNN method. Although training
R-FCN takes a lot of time, it gives better results in the time and accuracy of ICH
identification. HU is an important quantity to support the diagnosis of ICH. In
practice, radiologists usually use HU to detect ICH regions and determine the
time of bleeding. Therefore, our further research will integrate
HU into CNN-based ICH classifiers, R-CNN based ones in particular, to
support the diagnosis and treatment of ICH.
References
1. WHO: The top 10 causes of death. http://www.who.int/en/news-room/fact-
sheets/detail/the-top-10-causes-of-death. Last accessed 19 Nov 2018
2. Holmes, E.J., Misra, R.R.: Interpretation of Emergency Head CT: A Practical
Handbook, 2nd edn. Cambridge University Press, United Kingdom (2017)
3. Pham, N.H., Le, V.P.: CT in Head Injuries, 1st edn. Medical Publishing House,
Vietnam (2011)
4. Ly, N.L., Dong, V.H.: Traumatic Brain Injuries, 1st edn. Medical Publishing House,
Vietnam (2013)
5. Fatima, Sridevi, M., Saba, N., Kauser, A.: Diagnosis and classification of brain
hemorrhage using CAD system. In: Proceeding of NCRIET-2015 and Indian J.
Sci. Res. 12(1), 121–125 (2015) (Indian)
6. Arbabshirani, M.R., Fornwalt B.K., Mongelluzzo, G.J., Suever, J.D., Geise, B.D.,
Patel, A.A., Moore, G.J.: Advanced machine learning in action: identification of
intracranial hemorrhage on computed tomography scans of the head with clinical
workflow integration. NPJ Digit. Med. 1(9) (2018)
7. Mahmoud, A-A., Duaa, A., Khaldun Al-D., Inad, A.: Automatic detection and clas-
sification of brain hemorrhages. WSEAS Trans. Comput. 10(12), 395–405 (2013)
8. Greenspan, H., van Ginneken, B., Summers, R.M.: Guest editorial deep learning in
medical imaging: overview and future promise of an exciting new technique. IEEE
Trans. Med. Imaging 35(5), 1153–1159 (2016)
9. Bernal, J., Kushibar, K., Asfaw, D.S., Valverde, S., Oliver, A., Martı́, R.: Deep con-
volutional neural networks for brain image analysis on magnetic resonance imaging:
a review. Artif. Intell. Med. (2018)
10. Rezaei, M., Yang, H., Meinel, C.: Brain abnormality detection by deep convolu-
tional neural network (2016). arXiv preprint arXiv:1708.05206v1
11. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object
detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell.
39(6) (2015)
12. Dai, J., Li, Y., He, K., Sun, J.: R-FCN: object detection via region-based fully
convolutional networks. In: NIPS, pp. 379–387. Curran Associates Inc (2016)
13. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accu-
rate object detection and semantic segmentation. In: 2014 IEEE Conference on
Computer Vision and Pattern Recognition (2014)
14. Girshick, R.: Fast R-CNN. In: 2015 IEEE International Conference on Computer
Vision. IEEE. Santiago, Chile (2015). https://doi.org/10.1109/ICCV.2015.169
15. Johnson, J., Karpathy, A.: Notes accompanying the Stanford CS class CS231n:
convolutional neural networks for visual recognition (transfer learning). http://
cs231n.github.io/transfer-learning/. Last accessed 28 Dec 2018
16. Brownlee, J.: A gentle introduction to transfer learning for deep learning. https://
machinelearningmastery.com/transfer-learning-for-deep-learning/. Last accessed
28 Dec 2018
17. NEMA’s DICOM Homepage. http://www.dicomstandard.org/. Last accessed 31
Dec 2018
18. Hounsfield units-scale of HU, CT numbers. http://radclass.mudr.org/content/
hounsfield-units-scale-hu-ct-numbers. Last accessed 31 Dec 2018
19. Phan, A.-C., Phan, T.-C, Vo, V.-Q., Le, T.-H.-Y.: Automatic detection and clas-
sification of brain hemorrhage on CT/MRI images. In: 2017 National Conference.
Science and Technics Publishing House, Quy Nhon, Vietnam (2017)
1 Introduction
only. The basic data structure is the rating matrix, where each entry is the
rating of a user on an item. This matrix usually has a small number of known
entries: the aim is to predict the remaining values using Machine Learning. A
specific CF method is known as model-based [4]: it predicts the unknown ratings
by assuming that they can be inferred from a small number of latent factors.
The most successful latent factor models are based on matrix factorization
(MF) [15,24], where the latent factors are learned by identifying a low-rank
approximation of the rating matrix, under the assumption of correlations between
rows (or columns) that guarantee the dimensionality reduction of the matrix itself.
The identification of this approximation is a minimization problem, where the
goal is to determine two low-rank matrices whose product is as close as possible
to the rating matrix. The approximation error depends on the number of latent
factors (i.e. the rank), which is a hyper-parameter of the algorithm. Two other
hyper-parameters are a regularization term, added to reduce non-linearity
effects and over-fitting, and the learning rate of the stochastic gradient descent
[3] procedure used to generate the low-rank approximation. The best values of
these hyper-parameters are unknown a priori. Recently, Bayesian Optimization
(BO) [9,25] has become the most widely adopted strategy for global optimization
of multi-extremal, expensive-to-evaluate and black-box objective functions,
in robotics [20], sensor networks [10], drug design [18], simulation-optimization
problems [5], inversion problems [22] and automated Machine Learning [6]. Examples
of BO applied to RSs can be found in [8,27].
In this paper, we propose BO for the optimization of the hyper-parameters
of a CF based RS. The rest of the paper is organized as follows. Section 2
introduces CF, Sect. 3 describes how BO is used to optimize the learning process
underlying a RS, in Sect. 4 BO is initially investigated on a benchmark test
function generated through the GKLS software [11], a generator of test functions
with known local and global minima for multi-extremal multidimensional box-
constrained global optimization. Finally, we report results of BO for RS based
on the benchmark dataset MovieLens-100k [13].
CF is based on two sets [26]: the set of users U = {u1 , u2 , . . . , uM } and the
set of items I = {i1 , i2 , . . . , iN }. A rating rui ∈ X represents the preference of
the user u for the item i: it can be a Boolean or an integer value. The ratings
given by the users on the items are organized in a matrix R ∈ R^{M×N}, namely the
rating matrix. Usually, each user rates only a small number of items, thus the
matrix entries are known only for a small number of positions (u, i) ∈ S, with
|S| << M × N. The set S is divided into a training set S_Train and a test set
S_Test, with S_Train ∩ S_Test = ∅ and S_Train ∪ S_Test = S. The aim of CF is to make
predictions for S_Test using only the knowledge of S_Train, where the quality of
Bayesian Optimization for Recommender System 753
predictions is measured, for instance, through root mean square error (RMSE):
RMSE = sqrt( (1/|S_Test|) Σ_{(u,i)∈S_Test} (r_ui − r̂_ui)² )    (1)
where r̂_ui denotes the prediction of the actual rating r_ui. The idea behind MF
techniques is to approximate the matrix R as the product of two matrices, R ≈
P · Q, where P is an M × K matrix and Q is a K × N matrix. P is called the
user-feature matrix, Q is called the item-feature matrix, and K is the number of
latent factors (features) in the given factorization. Typically, K << M, N, and
both P and Q contain real numbers, even when R contains only integers. The
matrix factorization is obtained by minimizing an error function (e.g., RMSE)
on the training set ST rain as a function of the matrices (P, Q). Therefore, the
optimization problem becomes
argmin_{(P,Q)} (1/2) Σ_{(u,i)∈S_Train} [ (r_ui − Σ_{k=1}^K p_uk · q_ki)² + λ (Σ_{k=1}^K p_uk² + Σ_{k=1}^K q_ki²) ]    (2)
where puk and qki denote the elements of P and Q, respectively, and λ ≥ 0 is
the regularization factor.
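The regularized objective can be evaluated directly on the observed entries; a pure-Python sketch (names are ours, and only the factors touched by each observed entry are penalized, matching the per-entry form of the sum):

```python
def mf_objective(R, P, Q, train_idx, lam):
    # Eq. (2): 0.5 * sum over observed (u, i) of squared error
    # plus lambda times the L2 penalty on the factors involved
    K = len(P[0])
    total = 0.0
    for (u, i) in train_idx:
        pred = sum(P[u][k] * Q[k][i] for k in range(K))
        err = R[u][i] - pred
        reg = (sum(P[u][k] ** 2 for k in range(K))
               + sum(Q[k][i] ** 2 for k in range(K)))
        total += 0.5 * (err ** 2 + lam * reg)
    return total
```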
While the minimization of Eq. (2) is performed to learn the matrix factorization,
the computation of RMSE on the test set S_Test allows one to estimate how good
that approximation is at predicting new ratings. Usually, to have a robust
estimate, RMSE is computed with k-fold cross-validation.
During the learning of the matrix factorization, both p_uk and q_ki are identified
through stochastic gradient descent, where the update is stochastically approximated
in terms of the error in an observed entry (u, i), chosen uniformly at random,
as follows:
p_uk = p_uk + η · (e_ui · q_ki − λ · p_uk)
q_ki = q_ki + η · (e_ui · p_uk − λ · q_ki)    (3)

where e_ui = r_ui − Σ_{k=1}^K p_uk · q_ki and η is the learning rate.
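One pass of the updates in Eq. (3), visiting the observed entries in random order, can be sketched as (variable names mirror the text; this is an illustrative sketch, not the authors' code):

```python
import random

def sgd_epoch(R, P, Q, train_idx, eta, lam):
    # one stochastic pass of Eq. (3) over the observed entries
    K = len(P[0])
    for (u, i) in random.sample(train_idx, len(train_idx)):
        e_ui = R[u][i] - sum(P[u][k] * Q[k][i] for k in range(K))
        for k in range(K):
            puk = P[u][k]  # keep the old value for the Q update
            P[u][k] += eta * (e_ui * Q[k][i] - lam * puk)
            Q[k][i] += eta * (e_ui * puk - lam * Q[k][i])
```

On a toy 1x1 rating matrix with λ = 0, repeated epochs drive the product P·Q toward the observed rating.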
[a1 , b1 ] × ... × [an , bn ]. Given an error function H : Θ → R+ that maps each con-
figuration θ ∈ Θ to a numeric value, the aim of the hyper-parameter optimization
is to find the best configuration θ∗ minimizing H (θ):
The most common choice for GP kernel is the Squared Exponential kernel.
The BO algorithm starts with an initial set of n configurations θi=1:n and
their associated function values yi=1:n , with yi = H(θi ).
At each iteration t = n + 1, . . . , N the GP is fitted by conditioning its
mean and variance on the set of function evaluations performed so far, D_t =
{(θ_i, y_i)}_{i=1:t}. For any configuration θ ∈ Θ, the posterior mean μ_t(θ) and the
posterior variance σ_t²(θ) of the GP, conditioned on D_t, are known in closed form:
μ_t(θ) = k(θ)^T [K + τ² I]^{−1} y    (6)

σ_t²(θ) = k(θ, θ) − k(θ)^T [K + τ² I]^{−1} k(θ)    (7)
where K is the t × t matrix whose entries are Ki,j = k (θi , θj ) , k (θ) is the t × 1
vector of covariance terms between θ and θi=1:n , y is the t × 1 vector whose ith
entry is yi , and τ 2 is the noise variance. When a new point θt+1 is selected and
evaluated it provides a new observation yt+1 = H(θt+1 ), so we can add the new
pair (θt+1 , yt+1 ) to the current set of function evaluations Dt , updating it for
the next BO iteration: Dt+1 = Dt ∪ (θt+1 , yt+1 ).
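Eqs. (6)-(7) can be written out in pure Python for toy one-dimensional configurations, with a squared exponential kernel (the common choice mentioned above); `solve` is a small Gaussian-elimination helper, not a library call, and all names are ours:

```python
import math

def solve(A, b):
    # solve A x = b by Gaussian elimination with partial pivoting
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def sq_exp(a, b, ls=1.0):
    # squared exponential kernel k(a, b)
    return math.exp(-((a - b) ** 2) / (2.0 * ls ** 2))

def gp_posterior(theta, thetas, ys, tau2=1e-6):
    # posterior mean (Eq. 6) and variance (Eq. 7) at configuration theta
    t = len(thetas)
    K = [[sq_exp(thetas[i], thetas[j]) + (tau2 if i == j else 0.0)
          for j in range(t)] for i in range(t)]
    k_vec = [sq_exp(theta, thetas[i]) for i in range(t)]
    alpha = solve(K, ys)   # (K + tau^2 I)^{-1} y
    v = solve(K, k_vec)    # (K + tau^2 I)^{-1} k(theta)
    mu = sum(k_vec[i] * alpha[i] for i in range(t))
    var = sq_exp(theta, theta) - sum(k_vec[i] * v[i] for i in range(t))
    return mu, var
```

At an already-evaluated configuration the posterior mean recovers the observation and the variance collapses toward zero; far from all observations the posterior reverts to the prior (mean 0, variance 1 for this kernel).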
In Fig. 2a, b we report the best function value observed with respect to the
number of function evaluations. More precisely, 5 initial function evaluations
were performed, randomly, to have a first set of observations to train the GP;
then further 45 function evaluations were used for the optimization process. Ten
different tests have been performed for BO with EI and TS, separately, but the
same initial designs were used for the two approaches in every test. Although TS
in one case was not able to reach an optimal solution close to the global optimum,
the number of tests converging to the optimal solution before 30 evaluations is
higher than BO with EI.
Fig. 2. Best function value observed with respect to the number of function evaluations
5 Application
At the end of the 30 function evaluations, the best value identified by BO-EI (Fig. 3a) is,
on average, 0.9069 (standard deviation = 0.0003), while for BO-TS (Fig. 3b) is
0.9082 (standard deviation = 0.0011).
Fig. 3. Best RMSE on 10 fold-cross validation with respect to the number of function
evaluations
In Tables 1 and 2 we report the best configurations identified for each test
and the function value (i.e., RMSE on 10 fold-cross validation).
Table 1. Best function value observed, λ, η, and K for the five tests with BO-EI
Table 2. Best function value observed, λ, η, and K for the five tests with BO-TS
6 Conclusions
The aim of a RS based on CF is to predict the preferences of users on new
incoming items according to past ratings, which are stored as user-item-score
triples. The core of CF is, usually, a matrix-factorization procedure characterized
by at least three different hyper-parameters. As with most Machine Learning
algorithms, the effectiveness of CF depends on a suitable tuning of its hyper-parameters,
leading to the optimization of a black-box and expensive-to-evaluate
loss function. In this paper, we showed how hyper-parameter optimization for a
CF based RS can be efficiently performed through BO, considering two different
acquisition functions: EI and ε-greedy TS. Results on a 2-dimensional test function
generated through the GKLS software proved that BO is able to get close
to the global optimum in a limited number of function evaluations. Furthermore,
TS proved to converge faster to the optimum than EI. These results
were also confirmed when BO was used to optimize the hyper-parameters of a
CF based RS: after 10 function evaluations the best function value identified
by BO-TS was always lower than 0.95. On the other hand, the best function
value obtained by BO-EI after 30 function evaluations was better than that of
BO-TS in 4 out of 5 tests. Summarizing, BO proved to be a suitable tool for
optimizing a CF based RS, but it is difficult to choose between the two acquisition
functions considered. Future work will consider combining ε-greedy TS and EI in
order to exploit the convergence property of the former and the good empirical
performance of the latter.
References
1. Aggarwal, C.C.: Recommender Systems. Springer International Publishing (2016).
https://doi.org/10.1007/978-3-319-29659-3
2. Basu, K., Ghosh, S.: Analysis of Thompson Sampling for Gaussian Process Opti-
mization in the Bandit Setting (2017). arXiv preprint arXiv:1705.06808
3. Bottou, L.: Large-scale machine learning with stochastic gradient descent. In: Pro-
ceedings of COMPSTAT 2010-19th International Conference on Computational
Statistics, Keynote, Invited and Contributed Papers, pp. 177–186 (2010). https://
doi.org/10.1007/2F978-3-7908-2604-3 16
4. Cacheda, F., Carneiro, V., Fernández, D., Formoso, V.: Comparison of collabora-
tive filtering algorithms. ACM Trans. Web 5(1), 1–33 (2011). https://doi.org/10.
1145/1921591.1921593
5. Candelieri, A., Perego, R., Archetti, F.: Bayesian optimization of pump operations
in water distribution systems. J. Glob. Optim. 71(1), 213–235 (2018). https://doi.
org/10.1007/s10898-018-0641-2
6. Candelieri, A., Giordani, I., Archetti, F., Barkalov, K., Meyerov, I., Polovinkin,
A., Sysoyev, A., Zolotykh, N.: Tuning hyperparameters of a SVM-based water
demand forecasting system through parallel global optimization. Comput. Oper.
Res. (2018). https://doi.org/10.1016/j.cor.2018.01.013
7. Crespo, R.G., Martı́nez, O.S., Lovelle, J.M.C., Garcı́a-Bustelo, B.C.P., Gayo,
J.E.L., Pablos, P.O.D.: Recommendation system based on user interaction data
applied to intelligent electronic books. Comput. Hum. Behav. 27(4), 1445–1449
(2011). https://doi.org/10.1016/j.chb.2010.09.012
8. Dewancker, I., McCourt, M., Clark, S.: Bayesian Optimization for Machine Learning: A Practical Guidebook (2016). arXiv preprint arXiv:1612.04858
9. Frazier, P.I.: A Tutorial on Bayesian Optimization (2018). arXiv preprint
arXiv:1807.02811
10. Garnett, R., Osborne, M.A., Roberts, S.J.: Bayesian optimization for sensor set
selection. In: Proceedings of the 9th ACM/IEEE International Conference on Infor-
mation Processing in Sensor Networks-IPSN 2010, Stockholm, pp. 209–219 (2010).
https://doi.org/10.1145/1791212.1791238
11. Gaviano, M., Kvasov, D.E., Lera, D., Sergeyev, Y.D.: Algorithm 829: software for
generation of classes of test functions with known local and global minima for
global optimization. ACM Trans. Math. Softw. 29(4), 469–480 (2003). https://
doi.org/10.1145/962437.962444
12. Gaviano, M., Kvasov, D., Lera, D., Sergeyev, Y.D.: Software for generation of
classes of test functions with known local and global minima for global optimiza-
tion. ACM Trans. Math. Softw. 29(4), 469–480 (2003)
13. Harper, F.M., Konstan, J.A.: The movielens datasets. ACM Trans. Interact. Intell.
Syst. 5(4), 1–19 (2015). https://doi.org/10.1145/2827872
760 B. Galuzzi et al.
14. Kandasamy, K., Krishnamurthy, A., Schneider, J., Póczos, B.: Parallelised Bayesian
optimisation via Thompson sampling. In: International Conference on Artificial
Intelligence and Statistics, pp. 133–142 (2018)
15. Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender
systems. Computer 42(8), 30–37 (2009). https://doi.org/10.1109/MC.2009.263
16. Lee, S.K., Cho, Y.H., Kim, S.H.: Collaborative filtering with ordinal scale-based
implicit ratings for mobile music recommendations. Inf. Sci. 180(11), 2142–2155
(2010). https://doi.org/10.1016/j.ins.2010.02.004
17. McNally, K., O’Mahony, M.P., Coyle, M., Briggs, P., Smyth, B.: A case study
of collaboration and reputation in social web search. ACM Trans. Intell. Syst.
Technol. 3(1), 1–29 (2011). https://doi.org/10.1145/2036264.2036268
18. Meldgaard, S.A., Kolsbjerg, E.L., Hammer, B.: Machine learning enhanced global
optimization by clustering local environments to enable bundled atomic energies.
J. Chem. Phys. 149(13) (2018). https://doi.org/10.1063/1.5048290
19. Mockus, J.: Bayesian Approach to Global Optimization, vol. 37. Springer Nether-
lands (1989). https://doi.org/10.1007/978-94-009-0909-0
20. Olofsson, S., Mehrian, M., Calandra, R., Geris, L., Deisenroth, M., Misener, R.:
Bayesian multi-objective optimisation with mixed analytical and black-box func-
tions: application to tissue engineering (2018). https://doi.org/10.1109/TBME.
2018.2855404
21. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O.,
Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al.: Scikit-learn: machine
learning in python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
22. Perdikaris, P., Karniadakis, G.E.: Model inversion via multi-fidelity Bayesian
optimization: a new paradigm for parameter estimation in haemodynamics, and
beyond. J. R. Soc. Interface 13(118) (2016). https://doi.org/10.1098/rsif.2015.1107
23. Roustant, O., Ginsbourger, D., Deville, Y.: DiceKriging, DiceOptim: two R pack-
ages for the analysis of computer experiments by kriging-based metamodeling
and optimization. J. Stat. Softw. 51(1), 1–55 (2012). http://www.jstatsoft.org/v51/i01/
24. Salakhutdinov, R., Mnih, A.: Probabilistic matrix factorization. In: Advances in
Neural Information Processing Systems (NIPS), pp. 1257–1264 (2008)
25. Shahriari, B., Swersky, K., Wang, Z., Adams, R.P., De Freitas, N.: Taking the
human out of the loop: a review of Bayesian optimization. Proc. IEEE 104, 148–
175 (2016). https://doi.org/10.1109/JPROC.2015.2494218
26. Takács, G., Pilászy, I., Németh, B., Tikk, D.: Scalable collaborative filtering
approaches for large recommender systems. J. Mach. Learn. Res. 10, 623–656
(2009). https://doi.org/10.1145/1577069.1577091
27. Vanchinathan, H.P., Nikolic, I., De Bona, F., Krause, A.: Explore-exploit in top-N
recommender systems via Gaussian processes. In: Proceedings of the 8th ACM
Conference on Recommender systems-RecSys 2014, No. June 2015, pp. 225–232.
(2014). https://doi.org/10.1145/2645710.2645733
Creation of Data Classification System
for Local Administration
1 Introduction
The sharp growth in the quantity of unstructured data is driven by the rapid
development of the Web. According to the reports of many consulting companies,
about 70% of the digital data gathered, stored and utilized by society is now in
unstructured (text) or semistructured form, and only 30% falls into other types
of data. Therefore, the problem of developing models and methods, and of
constructing systems, that allow efficient processing of large data streams is
essential. Textual data today take an expanding variety of forms and, because of
the involvement of computers in both analysis and production, may be encountered
in many formats.
The purpose of this article is to compare modern methods for solving the task of
classifying texts, to detect trends in the development of this direction, and to
consider the development of an architecture for a system that efficiently
classifies data flows.
Nowadays a large number of methods and their different variations for the classi-
fication of texts have been developed. Each group of methods has its advantages and
disadvantages, areas of application, features and limitations.
Recently, interest in document classification and text mining has been renewed
and intensified by the accelerated growth of unstructured and semistructured
data due to the spread of the Internet. Many domains of human activity are
related to text classification research. For example, the process of classifying
scientific and popular articles from on-line journals using a constraint
satisfaction method was described by
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 761–768, 2020.
https://doi.org/10.1007/978-3-030-21803-4_76
762 R. Uskenbayeva et al.
Tran et al. [1], and various approaches (in particular explicit rules, machine
learning, and linear discriminant analysis based methods) to the classification
of a real-time data set of online employment offers, gathered from heterogeneous
sources with a standard job classification system, were applied and compared by
Amato et al. [2].
Text classification is used in many spheres of human activity, from automatic
indexing of documents to document filtering, automatic metadata generation,
expanding hierarchical directories of web resources, and structuring documents.
Office employees spend much of their working hours on routine, non-optimized
work related to arranging electronic mails, messages, newsletters, chats,
releases, marketing statements, presentations, reviews and other documents that
do not fit properly into relational databases; such documents can be stored as
text files in various formats, and these files may have an internal structure.
Moreover, text classification methods can be directed to the analysis of text
documents against certain criteria, with further identification of duplicate
documents, documents delaying the implementation of governmental assignments,
and documents containing requests for background information and reports that
consume a large amount of working hours, thereby preventing employees from being
diverted from essential work to responding to letters with unimportant contents.
Classification is a standard task in the field of Data Mining. The purpose of the data
classification is to define for each object one or more predefined categories to which
this object belongs. A feature of the classification problem is the assumption that the set
of classified data does not contain “garbage”, that is, each object corresponds to some
given category [17].
A particular case of the classification problem is the separation of a set of
messages into categories. Classification of unstructured text data, as in the
case of classifying objects by a certain property, consists in assigning letters
to one of the previously known classes. Classification applied to text data is
often called text categorization or rubrication. Obviously, this name comes from
the task of systematizing text data into catalogs, categories and rubrics [17].
Let M be the set of possible messages:

M = {m1, ..., mi, ..., mn}. (1)
C = {cr}, (2)

where r = 1, ..., m.
The hierarchy of categories can be represented in the form of a set of pairs
reflecting the ratio of nesting between rubrics:
H = {⟨cj, cp⟩ : cj, cp ∈ C}. (3)
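For illustration, such a nesting relation can be stored directly as a set of (child, parent) pairs; a minimal Python sketch (the rubric names here are hypothetical):

```python
# Hierarchy H as a set of (child, parent) pairs between rubrics.
H = {("schools", "education"),
     ("universities", "education"),
     ("hospitals", "medicine")}

def ancestors(c, pairs):
    """Return the chain of rubrics that transitively contain category c."""
    parent_of = {child: parent for child, parent in pairs}
    chain = []
    while c in parent_of:
        c = parent_of[c]
        chain.append(c)
    return chain
```

A message assigned to "schools" is then implicitly covered by the broader rubric "education" as well.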
The set of attributes of all letters must coincide with the set of attributes of
categories:
It should be noted that these feature sets distinguish the classification of
text data from the classification of objects in Data Mining, which is
characterized by a fixed set of attributes. The decision to assign a message mi
to the category cr is taken on the basis of the intersection of their feature
sets.
K-nearest neighbors (KNN), Naïve Bayes and the Term Graph Model are text
classification methods which were studied in the works of Bijalwan et al. [3].
In that comparative study of text classification methods, KNN showed the highest
accuracy [1]. Despite the fact that the performance of the KNN algorithm is low,
it is broadly used in text classification, because KNN depends completely on
every sample of the training set [5].
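As a point of reference, the basic KNN text classifier over term vectors can be sketched in a few lines of pure Python (toy vectors; an illustration of the method, not the implementation evaluated in [3]):

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length term vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(query, training, k=3):
    """training: list of (term_vector, label) pairs. Take the k samples
    closest to the query vector in Euclidean distance and return the
    majority label among them."""
    nearest = sorted(training, key=lambda s: euclidean(query, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

With term-frequency or TF-IDF vectors as input, this is exactly the distance-based vote the surveyed modifications (weighting, variable k, density adjustment) build upon.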
The KNN algorithm uses the Euclidean distance between the document vector and
the query vector to measure a document's relevance to a given query, which is
quite accurate. Elden [4] described a way to improve the performance of KNN by
replacing the term-document matrix with a low-rank approximation in order to
capture the relevant information and remove the unimportant details in the
documents. As mentioned before, KNN shows the best accuracy among the other text
classification methods considered. However, when comparing Naive Bayes,
K-Nearest Neighbors and Support
For the given model, a document is a vector d = {w1, w2, ..., wn}, where wi is
the weight of the i-th term and n is the size of the dictionary of the sample
set. Thus, according to Bayes' theorem, the probability of a class c for a
document d is

P(c|d) = P(c)P(d|c)/P(d).
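A minimal multinomial realization of this Bayes rule, with Laplace smoothing, might look as follows (toy tokenized documents; out-of-vocabulary words are simply skipped in this sketch):

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, class). Estimate log P(c) and smoothed
    log P(w|c) for the multinomial Naive Bayes model."""
    class_counts = Counter(c for _, c in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, c in docs:
        word_counts[c].update(tokens)
        vocab.update(tokens)
    n, V = len(docs), len(vocab)
    log_prior = {c: math.log(k / n) for c, k in class_counts.items()}
    log_like = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        log_like[c] = {w: math.log((word_counts[c][w] + 1) / (total + V))
                       for w in vocab}
    return log_prior, log_like

def classify_nb(tokens, log_prior, log_like):
    # P(c|d) is proportional to P(c) * prod_i P(w_i|c); compare in log space.
    def score(c):
        return log_prior[c] + sum(log_like[c].get(w, 0.0) for w in tokens)
    return max(log_prior, key=score)
```

Since P(d) is the same for every class, the classifier only compares the numerators P(c)P(d|c).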
The KNN algorithm can be applied to email classification [13]. An approach for
building a machine learning system in R that uses the K-Nearest Neighbors (KNN)
method for the classification of textual documents from two sources,
http://egov.kz and http://www.government.kz, was presented in [19].
There are numerous modifications of the KNN algorithm. One of the issues
addressed is reducing the sparsity of the term-document matrix [18]. A flexible
KNN algorithm combining a weighting algorithm with a variable-K algorithm
enhances the efficiency of text classification [14]. A combination of eager
learning with KNN classification [6] improved both the accuracy and the
efficiency of classification. A novel KNN classification algorithm combining
evidence theory with the model helps to reduce the time consumption of
classification [7].
In order to increase classification accuracy and reduce time consumption, many
researchers have attempted to combine the KNN algorithm with various
classification techniques. In spite of the ease of using KNN and its efficiency
in general, the performance of the KNN algorithm depends mostly on the
distribution of the training set. A modified KNN algorithm was proposed that
integrates the density of the test sample with the density of its nearest
neighbors, taking into account the unevenness of the textual data distribution
[10, 15]. In order to reduce the effect of uneven data distribution on the
classification, the distance between the test sample and samples in a sparse
area is intensified, and the distance between the test sample and samples in a
dense area is reduced. An algorithm based on clustering the training samples,
producing a relatively uniform distribution of training samples, which helps to
solve the problem of their uneven distribution, was also presented [15].
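The density-based distance adjustment of [10, 15] can be illustrated as follows; the scaling factor 1/(1+ρ) is a stand-in for the papers' exact formulas, chosen only to show the direction of the correction:

```python
import math

def local_density(sample, training, radius=1.0):
    """Local density of a training vector: how many training vectors
    (including itself) lie within `radius` of it."""
    return sum(1 for v, _ in training if math.dist(sample, v) <= radius)

def adjusted_distance(query, sample, training, radius=1.0):
    """Shrink distances toward samples in dense areas and, relatively,
    stretch distances toward samples in sparse areas, so that an uneven
    training distribution biases the KNN vote less."""
    rho = local_density(sample, training, radius)
    return math.dist(query, sample) / (1.0 + rho)
```

Feeding `adjusted_distance` instead of the raw Euclidean distance into a standard KNN vote implements the correction described above.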
To improve the performance of the KNN classifier, the TFKNN (Tree-Fast-K-
Nearest-Neighbor) method based on a similarity search tree was proposed [16].
This approach shows how to search for the exact k nearest neighbors while
mitigating the time-consumption drawback.
At the next stage, the data is presented in various forms. Moreover, a variety
of analytical reports can be built according to user needs; for example,
business analytics tools can be used. The architecture of the system is
presented in Fig. 2.
analyzed sources independently, i.e. manually; but with a large amount of data
it is necessary to use automated selection options according to specified
criteria.
Extracting and loading data. Extraction of information from the selected sources
involves identifying the necessary data over which the classification will later
be carried out. After this step, the data loading is performed.
Classification of the data coming into local government. Next, we consider the
classification of data by category. The main task of classification is the
grouping of text data by subject (education, medicine, etc.). Methods for
classifying unstructured text data lie at the junction of several areas:
retrieval of data from various sources, extraction and loading of data, and Data
Mining. The classification task was discussed earlier in Sect. 2 of this paper.
The task of classification methods is to select the most suitable
characteristics and to formulate rules on the basis of which a decision will be
made to assign messages to a given rubric [17].
After the classification of text messages is completed, data visualization is
performed. In our case, the data is presented in text form. If necessary, the
data can be represented in various forms, such as tabular or graphical.
Ready-made business intelligence tools can also be used for various
visualizations of the classified data.
4 Conclusion
In this paper, we considered the management of data flows coming to local
government from various data sources in the form of messages (letters),
questions, etc. The main task was the classification of these messages and
questions by category. The scheme of the system's operation for local government
administrations and its architecture were presented.
The system can be used to support communication of citizens with government
services and institutions. It will carry out professional processing of incoming
calls to administrative institutions and include functions to ensure the
reception of calls through all types of communication channels (call centers,
email, a web portal, social networks), with high-quality and fast processing of
all calls, taking into account the priority of requests and established
procedural deadlines.
The article provides an overview of data classification methods, a scheme and
description of the system architecture for classifying data flows, as well as a
formal description of the classification of text messages.
Acknowledgment. This work has been done in the framework of the grant given by Ministry of
Education and Science of the Republic of Kazakhstan (Grant No. 0218PК01178).
References
1. Tran, L.Q., Moon, C.W., Le, D.X., Thoma, G.R.: Web page downloading and classification.
In: Proceedings 14th IEEE Symposium on Computer-Based Medical Systems. CBMS 2001,
pp. 321–326 (2001). https://doi.org/10.1109/cbms.2001.941739
2. Amato, F., Boselli, R., Cesarini, M., Mercorio, F., Mezzanzanica, M., Moscato, V.,
Picariello, A.: Challenge: processing web texts for classifying job offers. In: Proceedings of
the 2015 IEEE 9th International Conference on Semantic Computing, IEEE ICSC 2015,
pp. 460–463 (2015). https://doi.org/10.1109/icosc.2015.7050852
3. Bijalwan, V., Kumar, V., Kumari, P., Pascual, J.: KNN based machine learning approach for
text and document mining 7(1), 61–70 (2014)
4. Elden, L.: Matrix Methods in Data Mining and Pattern Recognition. SIAM, Philadelphia,
PA, 224 pp. (2007). ISBN 978-0-898716-26-9
5. Hassanat, A.B., Abbadi, M.A., Alhasanat, A.A.: Solving the Problem of the K Parameter in
the KNN Classifier Using an Ensemble Learning Approach. Int. J. Comput. Sci. Inf. Secur.
(IJCSIS) 12(8), 33–39 (2014). https://doi.org/10.1007/s00500-005-0503-y
6. Dong, T., Cheng, W.: The research of kNN text categorization algorithm based on eager
learning, (d), pp. 1120–1123 (2012). https://doi.org/10.1109/icicee.2012.297
7. Guo, G., Ping, X., Chen, G.: A fast document classification algorithm based on improved
KNN, pp. 3–6 (2006)
8. Pratama, B.Y., Sarno, R.: Personality classification based on Twitter text using Naive Bayes,
KNN and SVM. In: 2015 International Conference on Data and Software Engineering
(ICoDSE), pp. 170–174 (2015). https://doi.org/10.1109/icodse.2015.7436992
9. Yan, Z.: Combining KNN algorithm and other classifiers, (1), 1–6 (2010)
10. Shimodaira, H.: Text classification using Naive Bayes, (4) (2015)
11. Wang, L., Zhao, X.: Improved KNN classification algorithms research in text categorization,
i, pp. 1848–1852 (2012)
12. Tjandra, S., Alexandra, A., Warsito, P.: Determining citizen complaints to the appropriate
government departments using KNN algorithm, pp. 2–5 (2015)
13. Nikhath, A.K., Subrahmanyam, K., Vasavi, R.: Building a K-nearest neighbor classifier for
text categorization 7(1), 254–256 (2016)
14. Yunliang, Z., Lijun, Z., Xiaodong, Q., Quan, Z.: Flexible KNN algorithm for text
categorization by authorship based on features of lingual conceptual expression, pp. 601–
605 (2009). https://doi.org/10.1109/csie.2009.363
15. Zhou, L., Wang, L.: A Clustering-based KNN improved algorithm CLKNN for text
classification, pp. 4–7 (2010)
16. Wang, Y.U., Wang, Z.: A fast KNN algorithm for text categorization, 19–22 Aug 2007
17. Barsegiyan, A.A.: Tekhnologii analiza dannykh: Data Mining, Visual Mining, Text Mining,
OLAP / A.A. Barsegyan, M.S. Kupriyanov, V.V. Stepanenko, I.I. Kholod – 2-ye izd.,
pererab. i dop. – SPb.: BKHV-Peterburg, 384 p. (2007)
18. Moldagulova, A.N., Sulaiman, R.B.: Document classification based on KNN algorithm by
term vector space reduction. In: Proceedings of 18th International Conference on Control,
Automation and Systems (ICCAS) (2018)
19. Moldagulova, A.N., Sulaiman, R.B.: Using KNN algorithm for classification of textual
documents. In: Proceedings of 8th International Conference on Information Technology
(ICIT) (2017)
Face Recognition Using Gabor Wavelet
in MapReduce and Spark
1 Introduction
Nowadays, with the development of society, information technology is present in
all aspects of life, such as economics, politics, culture and entertainment.
User privacy is an indispensable requirement, and this field currently receives
a great deal of attention. Data security and privacy are always a top concern
for computer users and information authentication systems. Without appropriate
protection solutions, the risk of losing control over data while interacting
with a global network becomes higher. Therefore, information security plays a
very important role in authentication systems. To authenticate someone, we
usually use magnetic cards, passwords, passports, etc. However, these methods
carry the risk of information theft. Identification systems have been studied
and their increasing reliability contributes to solving information security
issues. The introduction of facial recognition systems has brought many
benefits. Face recognition is considered one of the most common and important
methods of biometric identification. This method is capable of identifying
someone through their facial features. Automatic face recognition systems have
been extensively studied in recent years because of their role in access control
systems and real-time monitoring systems [3]. There have been many important
research accomplishments. Turk and Pentland [4] presented a near real-time
facial recognition system by introducing the eigenface technique for the
extraction of facial features. Wang et al. [5] proposed an effective facial
recognition technique using Principal Component Analysis (PCA) and the Support
Vector Machine (SVM) machine learning algorithm. In general, many methods have
been proposed to solve facial recognition problems.
Typically, there are two approaches to face recognition: a holistic review of
the face, and identification through the geometric characteristics of facial
details. In the first, identification is based on a comprehensive review of the
face, using algorithms such as PCA, LDA, ICA, wavelet transforms and so on to
extract the principal features of the face. In the second, identification is
made through the geometric characteristics of facial details, such as the
location, size and shape of the eyes, nose and mouth, and the relationships
between these details, such as the distance between the two eyes or between the
eyebrows. In this paper, we propose a facial recognition method using the Gabor
wavelet transform and the MapReduce parallel processing model at the training
and identification stages to improve the response time of the system.
2 Related Works
3 Background
3.1 Feature Extraction
KNN [7] is an algorithm that classifies an object based on the distances between
the object to be classified and all objects in the training set. It is one of
the simplest machine learning algorithms. The KNN steps are described as
follows:
Spark provides a general execution model that allows optimizing arbitrary
operator graphs, and it supports in-memory computing, which lets it query data
faster than disk-based engines such as MapReduce. Spark provides APIs for the
Scala, Java, and Python languages. In some experimental results, Spark is as
much as 10 to 100 times faster than Hadoop [8–10]. Apache Spark supports several
deployment models, such as Standalone, Hadoop YARN and Apache Mesos. Spark Core
is a component of Spark: it provides the most basic functions of Spark
772 A.-C. Phan et al.
In this paper, we develop a face recognition system using the Gabor wavelet
transform and the MapReduce parallel computing model. The proposed method
consists of two phases: training and recognition. Figure 2 describes the face
recognition system using the Gabor wavelet transform and MapReduce. First, at
the training stage, we use a Gabor filter to extract the facial features. The
extracted features are stored on the Hadoop distributed file system (HDFS). At
the identification stage, we use the KNN algorithm to predict labels and give
identification results. This process is performed under the MapReduce mechanism
to improve computational speed. The basic issue in facial recognition here is
the use of Gabor filters to extract features. Instead of using a facial features
schema, high-energy points are used to compare faces, which not only reduces the
volume of calculation but also increases the accuracy of the algorithm, because
features do not need to be identified manually.
At the training stage, from the set of training photos, we perform facial
feature extraction by the Gabor wavelet transform and store the results in the
database. The feature extraction algorithm consists of three main steps. Step 1:
detect and extract the face in the image. Step 2: determine the positions of
feature points on the face image using the Gabor wavelet filter. Step 3:
generate the feature vector and save it onto HDFS.
Figure 3 describes the training module. The input data of the map process is a
list of face images stored on HDFS. From this dataset, the TaskTracker generates
sets of records consisting of a key (the label of the image) and a value (the
content of the image). With this set of records, the TaskTracker loops to
retrieve each record as input for the map function, which returns intermediate
key-value results. The output of the map function is sorted in main memory and
then written to the local disk. After the mapping process is completed, its
output (the set of intermediate (image label, feature vector) pairs) becomes the
input to the reduce process. In the training module, the reduce process only
merges the data from the mapping process. The final data, a collection of
records (each record an (image label, feature vector) pair), is stored on HDFS.
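The data flow of this training module can be mimicked in plain Python; the feature extractor is stubbed out here (in the real system it is the Gabor-based extraction), and the function names are illustrative:

```python
from collections import defaultdict

def map_phase(records, extract):
    """records: iterable of (image_label, image_content) pairs read
    from storage. Emit the intermediate (label, feature_vector) pairs
    that the map function of the training module produces."""
    return [(label, extract(content)) for label, content in records]

def reduce_phase(pairs):
    """The reduce step of the training module only merges the mapper
    output: group the feature vectors by image label."""
    merged = defaultdict(list)
    for label, vector in sorted(pairs):   # mapper output arrives sorted
        merged[label].append(vector)
    return dict(merged)
```

In the actual cluster the same two functions are distributed by the MapReduce framework, with HDFS supplying the input records and receiving the merged result.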
From the responses of the face image to the Gabor wavelet filters, we find the
feature points as follows. A point (x0, y0) is a feature point if the following
conditions are satisfied:

- Rj(x0, y0) = max_{(x,y) ∈ W0} Rj(x, y)
- Rj(x0, y0) > (1 / (N1 N2)) Σ_{x=1}^{N1} Σ_{y=1}^{N2} Rj(x, y),  j = 1, ..., 40

where Rj is the response of the face image to Gabor filter j, N1 × N2 is the
size of the facial image, and W0 is a square window of W × W pixels centered at
(x0, y0).

In this algorithm, the window size W is a key parameter. It must be chosen small
enough to capture the important characteristics, and large enough to avoid
redundancy. In this paper, we chose W = 9 to find the feature points of the face
through the responses to the Gabor filters.
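These two conditions can be checked directly on a response matrix; a pure-Python sketch (W assumed odd, with the window clipped at the image border):

```python
def is_feature_point(R, x0, y0, W=9):
    """Check the two feature-point conditions on one Gabor response
    matrix R (a list of rows): the value at (x0, y0) is the maximum of
    a W-by-W window centred there, and it exceeds the mean response
    over the whole N1-by-N2 image."""
    n1, n2 = len(R), len(R[0])
    h = W // 2
    window = [R[x][y]
              for x in range(max(0, x0 - h), min(n1, x0 + h + 1))
              for y in range(max(0, y0 - h), min(n2, y0 + h + 1))]
    mean = sum(sum(row) for row in R) / (n1 * n2)
    return R[x0][y0] == max(window) and R[x0][y0] > mean
```

Scanning this predicate over all pixels of each of the 40 responses yields the high-energy points used to compare faces.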
* Reduce Phase: Since the matrices produced by all map processes have the same
size, we merge these matrices by iterating over each element of all matrices. At
each position under consideration, we keep the class label and distance of the
element with the smallest distance across all matrices. After merging all the
EDi matrices we obtain a single matrix EDReducer, which contains the definitive
list of neighbors (class and distance) for all the examples of FT. We then
perform the majority voting of the KNN model and determine the predicted classes
for FT. As a result, the predicted classes for the whole FT set are provided as
the final output of the reduce phase.
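Mirrored in Python, the reduce phase might look like this. As a simplification for the sketch, each mapper is assumed to hand over its k candidate neighbours per test example already sorted by distance, so a position-wise minimum suffices:

```python
from collections import Counter

def reduce_knn(mapper_outputs):
    """mapper_outputs: a list of ED_i matrices, where ED_i[t] holds the
    k (distance, label) pairs that mapper i found for test example t.
    Position by position, keep the pair with the smallest distance over
    all mappers (yielding ED_Reducer), then predict each example's
    class by majority vote over its merged neighbours."""
    n_test = len(mapper_outputs[0])
    k = len(mapper_outputs[0][0])
    predictions = []
    for t in range(n_test):
        merged = [min(ed[t][j] for ed in mapper_outputs) for j in range(k)]
        votes = Counter(label for _, label in merged)
        predictions.append(votes.most_common(1)[0][0])
    return predictions
```

A full implementation would merge the mappers' sorted lists into the global k smallest distances; the element-wise minimum above follows the description of the merge step in the text.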
In this paper, we conducted experiments on two image data sets: AT&T (of AT&T
Laboratories Cambridge) and Yale (of UCSD Computer Vision). Each data set was
divided into two subsets: a training set (about 70%) and a testing set (about
30%) (Table 1).
From Table 2, the optimal k value equals 1, because the accuracy of
identification is highest in all cases. To assess the system's execution time,
we compared the non-parallel facial recognition method with the MapReduce method
in the Spark environment, using the Gabor wavelet with the KNN method for facial
recognition in both. Table 3 shows the execution times of the non-parallel
recognition method and the MapReduce method in the Spark environment. The
results show that using the MapReduce method significantly improves recognition
time.
6 Conclusion
Face recognition is an attractive area for the study of the nervous system and
for visual research on the computer. Humans can recognize a familiar face with
ease thanks to the visual cortex, but human memory capacity is limited.
Computer-based identification systems have the advantage of a large memory
capacity. MapReduce is a framework for writing applications that process large
amounts of data in parallel with high fault tolerance across thousands of
computing nodes. Apache Spark is an open source system that enables the
construction of fast prediction models, with calculations performed on a set of
computers to provide fast data analysis. Identification methods based on feature
extraction are designed to reduce the problem of storing overly large data, and
the Gabor wavelet transform is suitable for extracting features. In this paper,
we have developed a face recognition system which uses Gabor filters to extract
features of the face. We performed parallel processing at the extraction and
facial recognition stages with the MapReduce model in the Spark environment;
therefore, the operations of processing and storing intermediate data are
performed in main memory (RAM). In future research, we will work on larger data
sets, with a larger number of system nodes, that can support real-time
recognition.
References
1. Bellakhdhar, F., Loukil, K., Abid, M.: Face recognition approach using Gabor
Wavelets, PCA and SVM. Int. J. Comput. Sci. Issues (IJCSI), 10(2) (2013)
2. Gervei, O., Ayatollahi, A., Gervei, N.: 3D face recognition using modified PCA
methods. World Acad. Sci. Eng. Technol. 39 (2010)
3. Jain, A.K., Klare, B., Park, U.: Face recognition: some challenges in forensics. In:
2011 IEEE International Conference on Automatic Face & Gesture Recognition
and Workshops (FG 2011). IEEE (2011)
4. Turk, M.A., Pentland, A.P.: Face recognition using eigenfaces. In: IEEE Computer
Society Conference on Computer Vision and Pattern Recognition, Proceedings
CVPR 1991. IEEE (1991)
5. Wang, C., et al.: Face recognition based on principle component analysis and sup-
port vector machine. In: 3rd International Workshop on Intelligent Systems and
Applications (ISA). IEEE (2011)
6. Maillo, J., Triguero, I., Herrera, F.: A mapreduce-based k-nearest neighbor app-
roach for big data classification. In: Trustcom/BigDataSE/ISPA, 2015, vol. 2. IEEE
(2015)
7. Cover, T., Hart, P.: Nearest neighbor pattern classification. IEEE Trans. Inf. The-
ory 13(1), 21–27 (1967)
8. Liu, L.: Performance Comparison by Running Benchmarks on Hadoop, Spark, and
HAMR. University of Delaware, Diss (2015)
9. Ranjani Priya, A.C., Sridhar, M.: Spark: an efficient framework for large
scale data analytics. Int. J. Sci. Eng. Res. (2016)
10. Zaharia, M., et al.: Spark: cluster computing with working sets. HotCloud 10(10-
10) (2010)
11. Bagwe, T., Darji, N., Gunjal, J., Vanjari, N.: Face Detection Using hadoop map-
reduce framework. Int. J. Res. Advent Technol. (2015)
12. Karau, H., et al.: Learning spark: lightning-fast big data analysis. O’Reilly Media,
Inc. (2015)
13. Divakar, M.A., Arakeri, M.P.: User authentication system using multimodal bio-
metrics and MapReduce. In: Information and Communication Technology for Sus-
tainable Development. Springer, Singapore, pp. 71–82 (2018)
Globally Optimal Parsimoniously Lifting
a Fuzzy Query Set Over a Taxonomy Tree
1 Introduction
Specifically, for each leaf node that is not in Su , we set both L(·) and H(·)
to be empty and the penalty to be zero. For each leaf node that is in Su , L(·) is
set to be empty, whereas H(·) is set to contain just the leaf node, and the penalty is
defined as its membership value multiplied by the offshoot penalty weight γ. To
compute L(t) and H(t) for any interior node t, we analyze two possible cases:
(a) when the head subject has been gained at t and (b) when the head subject
has not been gained at t. In case (a), the sets H(·) and L(·) at its children are
not needed. In this case, H(t), L(t) and p(t) are defined by

H(t) = {t}, L(t) = G(t), p(t) = u(t) + λV(t). (2)
In case (b), the sets H(t) and L(t) are just the unions of those of its children,
and p(t) is the sum of their penalties:
H(t) = ∪_{w∈χ(t)} H(w),  L(t) = ∪_{w∈χ(t)} L(w),  p(t) = Σ_{w∈χ(t)} p(w). (3)
To obtain a parsimonious lift, whichever case gives the smaller value of p(t)
is chosen.
When both cases give the same values for p(t), we may choose arbitrarily –
in the formulation of the algorithm below, we have chosen (a). The output of the
algorithm consists of the sets defined at the root, namely, H – the set of head
subjects and offshoots, L – the set of gaps, and p – the associated penalty.
ParGenFS Algorithm
– INPUT: u, T
– OUTPUT: H = H(root), L = L(root), p = p(root)
I Base Case
for each leaf i ∈ I
if u(i) > 0
H(i) = {i}, L(i) = ∅, p(i) = γu(i)
else
H(i) = ∅, L(i) = ∅, p(i) = 0
II Recursion
if u(t) + λV(t) ≤ Σ_{w∈χ(t)} p(w)
H(t) = {t}, L(t) = G(t), p(t) = u(t) + λV(t)
else
H(t) = ∪_{w∈χ(t)} H(w), L(t) = ∪_{w∈χ(t)} L(w), p(t) = Σ_{w∈χ(t)} p(w)
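The recursion above can be transcribed almost literally in Python. In this sketch, u(t), V(t) and G(t) are supplied as precomputed node attributes, since their derivation from the taxonomy is defined elsewhere in the paper, and λ and γ take the values 0.1 and 0.9 used later for the DST taxonomy:

```python
GAMMA, LAMBDA = 0.9, 0.1   # offshoot and gap penalty weights

def pargenfs(t):
    """Transcription of the ParGenFS recursion. A node is a dict with
    'name', 'u' (membership) and 'children'; interior nodes also carry
    'V' (summary gap importance) and 'G' (gap set), which the paper
    derives from the taxonomy and which are precomputed here.
    Returns (H, L, p) for the subtree rooted at t."""
    children = t.get("children", [])
    if not children:                                 # I. Base Case
        if t["u"] > 0:
            return {t["name"]}, set(), GAMMA * t["u"]
        return set(), set(), 0.0
    results = [pargenfs(w) for w in children]        # II. Recursion
    child_p = sum(p for _, _, p in results)
    gain_p = t["u"] + LAMBDA * t["V"]
    if gain_p <= child_p:                            # case (a): head gained at t
        return {t["name"]}, set(t["G"]), gain_p
    H = set().union(*(h for h, _, _ in results))     # case (b): unions of children
    L = set().union(*(l for _, l, _ in results))
    return H, L, child_p
```

At each interior node the cheaper of the two cases is kept, and ties go to case (a), matching the algorithm's formulation.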
Proof. We prove this result by induction over the number of nodes n in the
tree. If n = 1, there is only one node i and, in the Base Case of ParGenFS, the
definition of the sets H(i) and L(i) is such that the only possible non-empty set
is H(i) = {i}, when i ∈ Su. The penalty in this case is γu(i), which is clearly
the correct, and minimum, penalty. When i ∉ Su, the penalty is obviously zero.
Let us now assume that the statement is true for all rooted trees with fewer
than n nodes. Consider a rooted tree T (t) with n nodes, where n > 1. Each child
w of the root t is itself the root of a subtree T (w) with fewer than n nodes.
If the head subject is not gained at t, then the optimal H- and L-sets at
t are clearly the unions of the corresponding sets for the subtrees T (w); this
follows from the additive structure of the penalty function in (1). Clearly, the
minimum penalty for the subtree T(t) must be the smaller of the penalty values
p(t) = u(t) + λV(t) and p(t) = Σ_{w∈χ(t)} p(w), as it is in the algorithm. The result
now follows by induction on n.
Clusters of topics should reflect the co-occurrence of topics: the greater the number
of texts to which both topics t and t′ are relevant, the greater the interrelation
between t and t′, and the greater the chance for topics t and t′ to fall in the
same cluster. We have tried several popular clustering algorithms on our data.
Unfortunately, no satisfactory results have been found. Therefore, we present
here results obtained with the FADDIS algorithm developed in [7] specifically
for finding thematic clusters. This algorithm implements assumptions that are
relevant to the task:
The clusters above are lifted in the DST taxonomy using the ParGenFS algorithm
with the gap penalty λ = 0.1 and the offshoot penalty γ = 0.9, chosen to correspond
to the specifics of the DST tree.
The results of lifting of Cluster L are shown in Fig. 4. There are three head
subjects: Machine Learning, Machine Learning Theory, and Learning to Rank.
These represent the structure of the general concept "Learning" according to the
text collection under consideration. One can see from these head subjects that the
main work here still concentrates on theory and methods rather than applications.
References
1. The 2012 ACM Computing Classification System. http://www.acm.org/about/
class/2012. Accessed 30 Apr 2018
2. Blei, D.: Probabilistic topic models. Commun. ACM 55(4), 77–84 (2012)
3. Chernyak, E.: An approach to the problem of annotation of research publications.
In: Proceedings of the Eighth ACM International Conference on Web Search and
Data Mining, pp. 429–434. ACM (2015)
4. Frolov, D., Mirkin, B., Nascimento, S., Fenner, T.: Finding an appropriate gen-
eralization for a fuzzy thematic set in taxonomy. Working paper WP7/2018/04,
Moscow, Higher School of Economics Publ. House, 58 p. (2018)
5. Lloret, E., Boldrini, E., Vodolazova, T., Martínez-Barco, P., Munoz, R., Palo-
mar, M.: A novel concept-level approach for ultra-concise opinion summarization.
Expert Syst. Appl. 42(20), 7148–7156 (2015)
6. Mei, J.P., Wang, Y., Chen, L., Miao, C.: Large scale document categorization with
fuzzy clustering. IEEE Trans. Fuzzy Syst. 25(5), 1239–1251 (2017)
7. Mirkin, B., Nascimento, S.: Additive spectral method for fuzzy cluster analysis
of similarity data including community structure and affinity matrices. Inf. Sci.
183(1), 16–34 (2012)
8. Mueller, G., Bergmann, R.: Generalization of workflows in process-oriented case-
based reasoning. In: FLAIRS Conference, pp. 391–396 (2015)
9. Pampapathi, R., Mirkin, B., Levene, M.: A suffix tree approach to anti-spam email
filtering. Mach. Learn. 65(1), 309–338 (2006)
10. Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval.
Inf. Process. Manag. 24(5), 513–523 (1988)
11. Song, Y., Liu, S., Wang, H., Wang, Z., Li, H.: Automatic taxonomy construction
from keywords. US Patent No. 9,501,569. Washington, DC, US Patent and Trade-
mark Office (2016)
12. Vedula, N., Nicholson, P.K., Ajwani, D., Dutta, S., Sala, A., Parthasarathy, S.:
Enriching taxonomies with functional domain knowledge. In: The 41st Inter-
national ACM SIGIR Conference on Research & Development in Information
Retrieval, pp. 745–754. ACM (2018)
13. Waitelonis, J., Exeler, C., Sack, H.: Linked data enabled generalized vector space
model to improve document retrieval. In: Proceedings of NLP & DBpedia 2015
Workshop in Conjunction with 14th International Semantic Web Conference
(ISWC), vol. 1486. CEUR-WS (2015)
14. Wang, C., He, X., Zhou, A.: A short survey on taxonomy learning from text cor-
pora: issues, resources and recent advances. In: Proceedings of the 2017 Conference
on Empirical Methods in Natural Language Processing, pp. 1190–1203 (2017)
K-Medoids Clustering Is Solvable in
Polynomial Time for a 2d Pareto Front
1 Introduction
[Figure: a 2d Pareto front of fifteen points x1, …, x15 plotted in the objective space (Obj1, Obj2).]
Lemma 3. Let x1 , x2 , x3 ∈ R2 .
The naive computation of c_{i,i′} has a time complexity in O((i′ − i)²). Computing
c_{i,i′} independently for all i < i′ thus induces a complexity in O(N⁴) time and
in O(N²) memory space. To improve the complexity, Algorithm 1 computes for
all i ≤ c ≤ i′ the values d_{i,c,i′}, defined as the cost of k-medoids for cluster C_{i,i′}
with center c, using the induction relation (14) to compute each element d_{i,c,i′}
in O(1). The matrix c_{i,i′} is then computed from d_{i,c,i′} using (15):
∀ i ≤ c ≤ i′:   d_{i,c,i′} = Σ_{k=i}^{i′} ||x_k − x_c||²   (13)

∀ i ≤ c ≤ i′ < N:   d_{i,c,i′+1} = d_{i,c,i′} + ||x_{i′+1} − x_c||²   (14)

∀ i ≤ i′:   c_{i,i′} = min_{l ∈ [[i,i′]]} d_{i,l,i′}   (15)
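Relations (13)–(15) can be transcribed directly into a short sketch. This is an illustrative Python rendering (function and variable names assumed), keeping, for fixed i and center, a single running value of d built up by (14) instead of the full three-index array:

```python
# Sketch of Algorithm 1: O(N^3) time, O(N^2) space computation of the
# one-cluster costs c[i][j] (0-indexed), per relations (13)-(15).

def cost_matrix(points):
    """c[i][j]: min over centers l in [i, j] of the sum of squared
    Euclidean distances of points i..j to x_l."""
    n = len(points)
    inf = float('inf')
    c = [[inf] * n for _ in range(n)]
    for i in range(n):
        for ctr in range(i, n):
            d = 0.0                       # running d_{i,ctr,j}, updated by (14)
            for j in range(i, n):
                dx = points[j][0] - points[ctr][0]
                dy = points[j][1] - points[ctr][1]
                d += dx * dx + dy * dy
                if j >= ctr:              # (15): the center must lie in [i, j]
                    if d < c[i][j]:
                        c[i][j] = d
    return c

c = cost_matrix([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)])
```

For three collinear unit-spaced points, the best single center of the whole range is the middle point, with cost 1 + 0 + 1 = 2.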
Proposition 2. Using Algorithm 1, computing the matrix of costs c_{i,i′} for all
i < i′ has a complexity in O(N³) time and in O(N²) memory space.
Proof. The induction formula (14) uses only values d_{i,k,i+l′} with l′ < l. In Algo-
rithm 1, it is easy to show by induction that d_{i,k,i+l}, and also c_{i,i+l}, has its final
value for all l ∈ [[1, N]] at the end of the for loops from k = i to i + l.
Let us analyze the complexity. The space complexity is defined by the sizes
of the matrices c_{i,i′} and d_{i,c,i′}. A priori, the space complexity is in O(N³), given
T_N = Σ_{l=1}^{N} Σ_{i=1}^{N−l} ( β·l + Σ_{k=i}^{i+l} α ) = Σ_{l=1}^{N} Σ_{i=1}^{N−l} ( β·l + (l + 1)α ) = O(N³)
Let c_{i,i′} for i < i′ be the elementary costs computed with Algorithm 1. We define
C_{i,k} as the optimal cost of the k-medoids clustering with k clusters among the points
[[1, i]]. For all i ∈ [[1, N]] and k ∈ [[1, K]], we have the following induction relation:
796 N. Dupin et al.
with the convention C_{0,k} = 0 for all k ≥ 0. The case k = 1 is directly given by:
These relations allow computing the optimal values C_{i,k} by dynamic pro-
gramming in Algorithm 2. C_{N,K} is the optimal value of the k-medoids
problem, and backtracking on the matrix (C_{i,k})_{i,k} yields the optimal partition
into clusters.
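A minimal sketch of this dynamic program and its backtracking phase follows. It is an illustrative Python rendering (names assumed); the recurrence C_{i,k} = min_{j∈[[1,i]]} C_{j−1,k−1} + c_{j,i} is taken from the proof below, and the test builds the elementary costs c by brute force on four 1d points:

```python
# Sketch of Algorithm 2: DP over C_{i,k} = min_{j<=i} C_{j-1,k-1} + c_{j,i},
# with the convention C_{0,k} = 0, plus backtracking of cluster intervals.

def k_medoids_dp(c, K):
    """c[i][j] (0-indexed, i <= j): optimal 1-cluster cost of points i..j.
    Returns (optimal cost with K clusters, list of cluster intervals)."""
    n = len(c)
    inf = float('inf')
    C = [[inf] * (K + 1) for _ in range(n + 1)]
    back = [[0] * (K + 1) for _ in range(n + 1)]
    for k in range(K + 1):
        C[0][k] = 0.0                     # convention C_{0,k} = 0
    for k in range(1, K + 1):
        for i in range(1, n + 1):
            for j in range(1, i + 1):     # last cluster is points j..i
                v = C[j - 1][k - 1] + c[j - 1][i - 1]
                if v < C[i][k]:
                    C[i][k] = v
                    back[i][k] = j
    clusters = []                         # backtrack the interval boundaries
    i, k = n, K
    while k > 0 and i > 0:
        j = back[i][k]
        clusters.append((j - 1, i - 1))   # 0-indexed interval of one cluster
        i, k = j - 1, k - 1
    clusters.reverse()
    return C[n][K], clusters

# Brute-force elementary costs for four points on a line: 0, 1, 10, 11.
pts = [0.0, 1.0, 10.0, 11.0]
n = len(pts)
c = [[min((sum((pts[k] - pts[l]) ** 2 for k in range(i, j + 1))
           for l in range(i, j + 1)), default=0.0)
      for j in range(n)] for i in range(n)]
cost, clusters = k_medoids_dp(c, 2)
```

With K = 2, the two natural clusters {0, 1} and {10, 11} are recovered, each with cost 1.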
Proof. The formula (16) uses only values C_{i,j} with j < k in Algorithm 2. Induc-
tion proves that C_{i,k} has its final value for all i ∈ [[1, N]] at the end of the for
loops from k = 2 to K. At the end of these loops, C_{N,k} is thus the optimal
value of the k-medoids clustering among the N points of E. The backtracking
phase searches for the equalities C_{i,k} = C_{j′−1,k−1} + c_{j′,i} to return the optimal
clusters C_{j′,i}. Let us analyze the complexity. Sorting and indexing the elements
of E following Lemma 2 has a complexity in O(N log N). The computation of
the matrix c_{i,i′} has a complexity in O(N³) thanks to Algorithm 1 and Propo-
sition 2. The construction of the matrix C_{i,k} requires N × K computations of
min_{j∈[[1,i]]} C_{j−1,k−1} + c_{j,i}, each in O(N), so the complexity of this phase is in
O(K.N²). The backtracking phase requires K computations, each with a complex-
ity in O(N), so its complexity is in O(K.N). Since K < N, the bottleneck is the
computation of the matrix c_{i,i′}. Hence, the complexity of Algorithm 2 is in O(N³)
time and in O(N²) memory space because of the initialization phase.
We note that a similar DP algorithm solves HSS and 1d k-means, with
a similar complexity in O(K.N²) time and O(K.N) memory space, in [2,22].
In both cases, the time complexity of the DP was improved: to O(KN) time
with O(N) memory space for 1d k-means in [8], and for HSS to O(K.N + N log N)
in [3] and to O(K.(N − K) + N log N) in [12]. Similarly, a perspective is to speed
up the construction of the matrix C_{i,k}. However, since the complexity of
Algorithm 2 is dominated by the initialization, this limits the impact of such a
speed-up.
The practical efficiency can also be improved with a parallel implementation.
The parallelization of Algorithm 1, as previously described, is crucial: the ini-
tialization phase is indeed the complexity bottleneck. The backtracking
phase is sequential, but it has the lowest complexity. The second phase, con-
structing the matrix C, can also be parallelized: once the C_{i,k} are computed for
all i ≤ N and a given k, all the C_{i,k+1} for i ≤ N can be computed independently.
Constructing the matrix C line by line with increasing k thus yields
K iterations of N independent computations that can be distributed
over several processors. This parallel scheme is straightforward to implement in a
shared-memory environment with OpenMP. For a distributed implementation
with MPI, computing C_{i,k} for all i ≤ N requires only the C_{i,k−1} in memory for
all i ≤ N. At most one N-dimensional array must be stored on each processor,
and an MPI AllGather operation at each iteration k distributes the results
that are required for the next iteration.
Theorem 2 allows computing the whole Pareto front {(k, C_{k,N})}_k with the
same complexity as only one point of this Pareto front. To select a good
value of k, the elbow technique, graph test, and gap test described in [17] apply
to the Pareto front of pairs {(k, C_{k,N})}_k.
References
1. Aloise, D., Deshpande, A., Hansen, P., Popat, P.: NP-hardness of Euclidean sum-
of-squares clustering. Mach. Learn. 75(2), 245–248 (2009)
2. Auger, A., Bader, J., Brockhoff, D., Zitzler, E.: Investigating and exploiting the
bias of the weighted hypervolume to articulate user preferences. In: Proceedings of
GECCO 2009, pp. 563–570. ACM (2009)
3. Bringmann, K., Friedrich, T., Klitzke, P.: Two-dimensional subset selection for
hypervolume and epsilon-indicator. In: Annual Conference on Genetic and Evolu-
tionary Computation, pp. 589–596. ACM (2014)
1 Introduction
Sparsification of neural networks is an effective complexity reduction
method for improving efficiency and generalizability [3,4]. In this paper, we spar-
sify convolutional neural networks (CNNs) for classifying curves with multi-scale
structures. Such curves arise in the handwriting of people with neurological dis-
orders, e.g., Parkinson's disease (PD) patients, and in neuropsychological exams.
Computationally distinguishing the handwriting of normal and PD subjects will
greatly help diagnosis and reduce physicians' workload in evaluations.
People with PD tend to lose control of their hands, and their writing or
drawing shows oscillatory behavior, as shown in Fig. 2, a century-old image avail-
able online. Such oscillatory features can be learned during CNN training. Since
we do not have a large amount of PD handwriting, we generate on the
computer a large number of oscillatory shapes that mimic the shaky writing of PD
subjects. Indeed, we found that a CNN is quite successful at this task and can
reach accuracy as high as 99% on our synthetic data set with three convolution
layers and one fully connected layer, as shown in Fig. 1. However, we also found
c Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 800–809, 2020.
https://doi.org/10.1007/978-3-030-21803-4_80
Learning Sparse Neural Networks 801
that there is a lot of redundancy in the weights of the trained CNNs, especially
in the fully connected layer, where we aim to significantly sparsify the network
weights with minimal loss of accuracy.
Since the natural sparsity-promoting penalty ℓ0 is discontinuous, we shall
adopt the relaxed variable splitting method (RVSM, [3]) for network sparsifica-
tion. Even though Lipschitz continuous penalties such as ℓ1 and transformed-
ℓ1 [2,8] are almost everywhere differentiable, the splitting approach [3] is more
effective for enforcing sparsity than directly placing a penalty function inside the
stochastic gradient descent (SGD) algorithm. The RVSM is also much simpler
than the statistical ℓ0 regularization approach in [4]. A systematic comparison
with [4] will be conducted elsewhere.
The rest of the paper is organized as follows. In Sect. 2, we review RVSM for
the ℓ0, transformed-ℓ1, and ℓ1 penalties and present a convergence theorem. A new
critical point condition is introduced for the limit. We apply RVSM to CNNs
for multi-scale curve classification. In Sect. 3, we describe our data set, CNN
architecture and training, and the CNN performance in terms of network accuracy
and sparsity. We compare weight distributions of sparse and non-sparse networks.
Concluding remarks are in Sect. 4.
Algorithm 1. RVSM
Initialize u⁰, w⁰ randomly.
while not converged do
  u^{t+1} ← argmin_u L_β(u, w^t)
  ŵ^{t+1} ← w^t − η∇f(w^t) − ηβ(w^t − u^{t+1})
  w^{t+1} ← ŵ^{t+1} / ‖ŵ^{t+1}‖
end
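The RVSM iteration can be sketched on a toy problem. The following is an illustrative Python sketch only: the separable quadratic loss f(w) = ½‖w − w*‖², the parameter values, and the omission of the normalization step are assumptions made for this toy example, not the paper's actual training setup. The u-update for the ℓ0 penalty is the usual hard-thresholding with level √(2λ/β).

```python
# Toy RVSM sketch with the ell_0 penalty (hard-threshold u-update) on a
# separable quadratic loss; all names and parameter values are illustrative.
import math

def rvsm_l0(w_star, lam=0.01, beta=1.0, eta=0.1, iters=300):
    thresh = math.sqrt(2 * lam / beta)   # hard-threshold level for ell_0
    w = list(w_star)                     # initialize w at w_star for simplicity
    u = [0.0] * len(w)
    for _ in range(iters):
        # u-update: argmin_u lam*||u||_0 + (beta/2)*||w - u||^2
        u = [wi if abs(wi) > thresh else 0.0 for wi in w]
        # w-update: gradient step on f plus the coupling term beta*(w - u);
        # the normalization step is omitted for this unconstrained toy loss
        w = [wi - eta * (wi - ti) - eta * beta * (wi - ui)
             for wi, ti, ui in zip(w, w_star, u)]
    return u, w

u, w = rvsm_l0([1.0, 0.05, 0.9, 0.02])
```

The two small components (0.05 and 0.02) fall below the threshold √(0.02) ≈ 0.14 and are zeroed out in u, while the two large components survive: the split variable u is the sparse weight used at inference.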
We also consider the transformed ℓ1 (TL1) penalty [8], which nicely interpolates
the ℓ0 and ℓ1 penalties:

ρ_a(x) = (a + 1)|x| / (a + |x|)

By solving the problem with the TL1 penalty, we can also get a thresholding operator
T_{a,γ} in closed form [8]:

T_{a,γ}(w_i) = 0 if |w_i| ≤ t;  g_{a,γ}(w_i) if |w_i| > t,   (3)
where

g_{a,γ}(x) = sgn(x) · ( (2/3)(a + |x|) cos(φ(x)/3) − 2a/3 + |x|/3 )

and φ(x) = arccos( 1 − 27γa(a + 1) / (2(a + |x|)³) ). Here the parameter t depends on γ as:

t = γ(a + 1)/a,           if γ ≤ a²/(2(a + 1)),
t = √(2γ(a + 1)) − a/2,   if γ > a²/(2(a + 1)).   (4)
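Formulas (3)–(4) transcribe directly into code. The following is a sketch assuming the reconstructed formulas above (the function name is ours):

```python
# Closed-form TL1 thresholding operator T_{a,gamma} per (3)-(4).
import math

def tl1_threshold(w, a, gamma):
    # threshold level t per (4)
    if gamma <= a * a / (2 * (a + 1)):
        t = gamma * (a + 1) / a
    else:
        t = math.sqrt(2 * gamma * (a + 1)) - a / 2
    x = abs(w)
    if x <= t:                            # (3): small weights are zeroed
        return 0.0
    phi = math.acos(1 - 27 * gamma * a * (a + 1) / (2 * (a + x) ** 3))
    g = (2.0 / 3.0) * (a + x) * math.cos(phi / 3) - 2 * a / 3 + x / 3
    return math.copysign(g, w)            # g_{a,gamma} keeps the sign of w
```

For a = 1 and γ = 0.1, the threshold is t = 0.2: a weight of 0.1 is set to zero, while a weight of 2.0 is only slightly shrunk, illustrating how TL1 thresholding leaves large weights nearly intact.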
Remark 1. It follows from the Theorem above that the limit point (ū, w̄) satisfies
the equilibrium equations (5) for the ℓ0, ℓ1 and transformed-ℓ1 penalties,
respectively.
The system (5) serves as a novel "critical point condition". This is particularly
useful in the ℓ0 case, where the Lagrangian function L_β(u, w) is discontinuous in
u.
3 Experimental Results
We apply the RVSM algorithm to convolutional neural networks to see how
it brings about a sparse network. After training, w̄ is sparse, with small
components removed, and it serves as the network weight for inference. In the
following experiment, we consider a convolutional neural network of 3 layers and
a data set of 100 × 100 binary images. What we care about is the percentage
of the weights which are zero after training the sparse network. Many of the
algorithms can result in a sparsity of over 90%, which means that fewer than
804 F. Xue and J. Xin
10% of the parameters contribute to the model. This makes our model far more
efficient than the original one without regularization.
In order to find out how the weights are distributed in each layer, we go
through the structure of the network. Figure 1 shows the number of nodes in
each layer, from which we can simply calculate the number of weights needed
to connect the nodes.1 We apply 32 3 × 3 filters to the initial image to get the
first convolutional layer, which results in 32 × 3 × 3 = 288 weights. Similarly,
each of the second and the third convolutional layer contains 32 × 32 × 3 ×
3 = 9216 weights, if we apply 32 3 × 3 filters again. After each convolutional
layer, we add one max pooling layer with a 2 × 2 filter and a stride of 2. The
dimension of each image is not changed by each convolution, since we have
applied padding. But it is halved in both width and height after max pooling
because of the stride of 2. Thus the dimension of the image
is reduced from 100 × 100 to 50 × 50, then to 25 × 25, and finally to 13 × 13. So
this produces 13 × 13 × 32 × 128 = 692224 weights when constructing a dense
layer of 128 nodes. Finally, 128 × 2 = 256 weights are used to connect the dense
layer to the output layer of 2 nodes, as our goal is to classify the images into
two categories. From the above discussion, we notice that 97.3% of the weights
are concentrated in the dense layer. We will see that most of them contribute
nothing to the model after we train the sparse network.
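The weight counts above can be checked with a few lines of arithmetic (a sketch; the variable names are ours):

```python
# Parameter count of the 3-conv + 1-dense CNN described in the text.
import math

conv1 = 32 * 3 * 3            # 1-channel input, 32 filters of size 3x3
conv2 = 32 * 32 * 3 * 3       # 32 -> 32 channels
conv3 = 32 * 32 * 3 * 3
side = 100
for _ in range(3):            # three 2x2 max-pools with stride 2
    side = math.ceil(side / 2)    # 100 -> 50 -> 25 -> 13
dense = side * side * 32 * 128    # flattened feature map -> 128 nodes
output = 128 * 2                  # dense layer -> 2-class output
total = conv1 + conv2 + conv3 + dense + output
dense_share = 100 * dense / total
```

This reproduces the 692224 dense-layer weights and their 97.3% share of the 711200 total parameters.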
The first data set we use consists of images of the alphabet handwritten by Parkin-
son's disease (PD) patients and the normally handwritten alphabet. We know that
many PD patients may suffer from tremors in their daily life and work. One
remarkable feature is that the words they write can be much shakier than
normal, which can be used to distinguish a PD patient during diagnosis. Figure 2²
shows one real example of a sentence handwritten by a PD patient.
¹ When generating the figure, we used a tool by Alex Lenail available at http://alexlenail.me/NN-SVG/LeNet.html.
² https://en.wikipedia.org/wiki/Micrographia (handwriting).
From our point of view, these two writing styles – normal vs. shaky – can
be treated as two fonts. There is one Parkinson's font available on the internet,³
which contains the whole alphabet of 52 uppercase and lowercase letters. We
simulate a training set of 5,000 observations and a test set of 1,000 observations
by adding rotations, affine transformations and elastic distortions [5]. As
mentioned, this is a data set of 100 × 100 binary images, of which some
samples are shown in Fig. 3. Though our model is used to distinguish the letters
written by a Parkinson's disease patient in this single experiment, it can easily
be applied to classify any other fonts.
As most of the redundancy appears in the dense layer, we apply the thresholding
step of the algorithm to the weights in the dense layer only. This is because if we
used the same λ and β in all the layers, the proportion of zero weights in the
convolutional layers might be high, where the zero weights can indeed degrade
the model. Compared to the dense layer of 700,000 weights, there is not much
freedom to modify the convolutional layers of 10,000 weights. Too much sparsity
leads to a sizable loss of accuracy.
In our models, we have the freedom to set the thresholding parameters,
namely β, λ and a. A higher threshold usually means more sparsity, since more
weights are forced to zero by the threshold. From formulas (1) and (2) for the
ℓ0 and ℓ1 penalties, it is clear that the larger λ is and the smaller β is, the higher
the threshold γ will be. Given the same thresholding parameter γ, the ℓ0 model
may result in a sparser model than ℓ1, since its threshold is the square root of γ,
³ https://www.dafont.com/parkinsons.font.
which is higher. From formulas (3) and (4) for the TL1 penalty, the smaller a
is, the higher the threshold is. As discussed in the previous section, when a goes
to infinity, TL1 becomes ℓ1; when a goes to 0, it becomes ℓ0. So, to achieve
more sparsity, we may choose a small a.
Our algorithm converges quickly after a few iterations. In most of the cases, it
obtains an accuracy of 95% and a sparsity of 60% after 10 epochs. The accuracy
soon goes up to 98% within 20 epochs, while some models achieve a sparsity of
around 90% eventually. Figure 4 shows the convergence of the training algorithm.
Table 1 shows our results for sparsity and testing accuracy. It verifies what
we discussed about the thresholding parameter: when the threshold grows
higher, the sparsity also grows correspondingly. When a is less than 0.1, we
achieve a sparsity of 86%, while the accuracy remains high. The key point to be
noticed is that these sparse networks achieve almost the same, or even better,
accuracy than the non-sparse model. Thus we affirm that around 90% of the
parameters are redundant, as they hardly contribute to the accuracy of the
model.
The second data set we consider consists of images of normal vs. shaky planar shapes
such as triangles and quadrangles (not necessarily convex). It can be viewed as
another demonstration of PD patients' handwriting, as what they draw is some-
what shaky, like the letters they write. This data set of 100 × 100 binary images
is simulated by adding random noise to the normal planar shapes. Figure 5 shows
some sample images of our shapes. The results on this data set are similar to
those of the first data set, as shown in Table 2. So RVSM also achieves high
accuracy and sparsity on multi-scale planar curve data.
More properties of our sparse networks are as follows. First, there is a remark-
able difference in the distributions of the weights between the sparse and non-sparse
models. For the sparse model, most of the weights are zero, while the rest are
very close to zero, so its distribution looks like a vertical line plus some noise on
an interval close to zero. Our example of a non-sparse model also has a peak
at zero. However, very few weights are exactly zero: many of them are merely
close to zero, while a large proportion are far away from zero.
We also notice that RVSM performs much better than applying SGD
directly to the TL1-penalized loss function. As shown in Table 3, most of the
normalized weights in the SGD model are distributed between 10⁻⁵ and 10⁻³.
There seems to be no apparent criterion to judge whether a weight of 10⁻⁴ should
be set to zero or does contribute to the network. However, for the RVSM method
with a = 0.01, it is clear that 8.7% of the weights are greater than 10⁻⁴ and 84.9% of
the weights are less than 10⁻¹⁰. There is a significant gap between the two scales
of 10⁻⁴ and 10⁻¹⁰, which makes it reasonable to set all the weights less than
10⁻¹⁰ to zero. This leads to a network of 84.9% sparsity. Another point worth
mentioning is that applying SGD directly to the penalized loss function may hurt
the accuracy considerably at a = 0.01, resulting in 96.7% accuracy for the model. This
is because when a is small, the penalized term behaves like ℓ0, which renders
the objective function nearly singular. RVSM resolves this issue by making the
penalty implicit through a thresholding process, which gives an accuracy of 99.5%.
Table 4 shows another interesting phenomenon. Since the weights are ran-
domly initialized with mean zero, there is a roughly even split of plus/minus signs
Table 2. Testing sparsity and accuracy for the data of planar shapes.
Table 3. Sparsity and accuracy: RVSM vs. Direct SGD for TL1 penalty
in all layers. At the end of training, we counted the number of sign changes in the
kernel of each convolutional layer, and found that more weights changed signs in
the first convolutional layer than in the next two layers. This is consistent with
the network filters being structured towards low-pass in depth after training.
4 Conclusions
In this paper, we have applied the RVSM algorithm to learn sparse neural net-
works. We achieved an accuracy of 99% and a sparsity of 87% when training
CNNs on a data set consisting of synthetic handwritten letters and planar curves
mimicking PD patients, and normal handwriting. We have also discussed the tuning
of the thresholding parameters, and verified that a higher threshold can
produce higher sparsity. Moreover, our experiments show that RVSM
outperforms the direct application of SGD to the penalized loss function, in
both sparsity and accuracy. RVSM generates a significant gap between the
weights of large scale and small scale, which acts as an indicator of sparsity.
In future work, we plan to explore a wider variety of PD patient data and more
refined multi-class classification tasks.
References
1. Blumensath, T., Davies, M.: Iterative thresholding for sparse approximations. J.
Fourier Anal. Appl. 14(5–6), 629–654 (2008)
2. Daubechies, I., Michel, D., De Mol, C.: An iterative thresholding algorithm for linear
inverse problems with a sparsity constraint. Commun. Pure Appl. Math. 57(11),
1413–1457 (2004)
3. Dinh, T., Xin, J.: Convergence of a relaxed variable splitting method for learn-
ing sparse neural networks via ℓ1, ℓ0, and transformed-ℓ1 penalties (2018).
arXiv:1812.05719
4. Louizos, C., Welling, M., Kingma, D.: Learning sparse neural networks through ℓ0
regularization. In: ICLR (2018). arXiv:1712.01312v2
5. Simard, P., Steinkraus, D., Platt, J.: Best practices for convolutional neural net-
works applied to visual document analysis. In: Seventh International Conference on
Document Analysis and Recognition, pp. 958–963. IEEE (2003)
6. Yin, P., Zhang, S., Lyu, J., Osher, S., Qi, Y-Y., Xin, J.: Blended coarse gradient
descent for full quantization of deep neural networks. Res. Math. Sci. 6(1), 14 (2019).
arXiv:1808.05240
7. Yu, D., Deng, L.: Automatic Speech Recognition: A Deep Learning Approach. Sig-
nals and Communication Technology. Springer, New York (2015)
8. Zhang, S., Xin, J.: Minimization of transformed l1 penalty: closed form representa-
tion and iterative thresholding algorithms. Comm. Math. Sci. 15(2), 511–537 (2017)
Pattern Recognition with Using Effective
Algorithms and Methods of Computer Vision
Library
1 Introduction
• Measurement
• Edge detection
• Pattern matching
Pixel Count - counts the number of light or dark pixels. With the help of a pixel
counter, the user can select a rectangular area on the screen in a place of interest, for
example, where he expects to see the faces of passing people [3]. The camera will
immediately respond by providing data about the number of pixels represented by
the rectangle's sides. The pixel counter allows you to quickly check whether the mounted
camera meets regulatory or customer requirements regarding pixel reso-
lution, for example, for recognizing the faces of people entering doors
monitored by the camera, or for license plate recognition.
Binarization - converts a grayscale image to a binary one (white and black pixels). The
value of each pixel is conventionally encoded as "0" or "1". The "0" value is
conventionally referred to as background, while the "1" value is foreground. Often,
when storing digital binary images, a bitmap is used, where one bit of information
represents one pixel.
Historically, especially during the early stages of the technology's development, the two
available colors were black and white, though this is not mandatory.
A simple example of image binarization is illustrated below:
The cvAdaptiveThreshold function converts a grayscale image into a
monochrome image according [7] to the formulas below:

CV_THRESH_BINARY:      dst(x, y) = max_value if src(x, y) > T(x, y); 0 otherwise   (1)

CV_THRESH_BINARY_INV:  dst(x, y) = 0 if src(x, y) > T(x, y); max_value otherwise   (2)
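Formulas (1)–(2) can be sketched in a few lines of Python. This sketch (function name ours) uses a constant threshold T for simplicity; the actual cvAdaptiveThreshold computes a per-pixel threshold map T(x, y), e.g. a local mean, which could be substituted below:

```python
# Sketch of threshold types (1)-(2) with a constant threshold T.

def binarize(src, T, max_value=255, inverse=False):
    """CV_THRESH_BINARY when inverse=False, CV_THRESH_BINARY_INV otherwise."""
    dst = []
    for row in src:
        out = []
        for v in row:
            fg = v > T              # foreground test src(x, y) > T
            if inverse:
                fg = not fg
            out.append(max_value if fg else 0)
        dst.append(out)
    return dst
```

For example, a row [10, 200] with T = 128 becomes [0, 255] under (1) and [255, 0] under (2).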
challenges and urgent tasks in this way. The introduction of technologies such as
computer vision requires serialization and a product-based approach, which reduces
the cost of single implementations [3].
Today we are talking more about computer vision. There is also the term "ma-
chine vision", which refers to the hardware side. There are video cameras similar to those
used for video surveillance, there are webcams used for commu-
nications, and there are special cameras used in industry. The latter differ in
such features as the absence of a normal Ethernet port, the use of special protocols, and the
ability to transmit, for example, 750 frames per second, not in burst mode but
continuously, without compression. There are special cameras with sensitivity
lying in a different range than that optically visible to the eye.
g(x, y, t) = (1 / (2πt)) · e^(−(x² + y²)/(2t))   (3)
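Formula (3) is the 2d Gaussian used for smoothing. A minimal sketch (function name ours) samples it on a discrete grid and normalizes the weights so they sum to 1, which is the usual way to turn (3) into a convolution kernel:

```python
# Discrete Gaussian kernel sampled from g(x, y, t) per formula (3).
import math

def gaussian_kernel(radius, t):
    """Sample g(x, y, t) = exp(-(x^2 + y^2)/(2t)) / (2*pi*t) on a
    (2*radius+1)^2 grid centered at the origin, then normalize to sum 1."""
    k = [[math.exp(-(x * x + y * y) / (2.0 * t)) / (2.0 * math.pi * t)
          for x in range(-radius, radius + 1)]
         for y in range(-radius, radius + 1)]
    s = sum(sum(row) for row in k)
    return [[v / s for v in row] for row in k]

kernel = gaussian_kernel(2, 1.0)
```

The resulting 5 × 5 kernel has its maximum at the center and its weights sum to 1, so convolving with it preserves overall brightness.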
According to the source below, we use the differential equation formula to change
the brightness and matrix values (Fig. 1).
Below, we present test (experimental) corner detection data obtained using the Harris
corner recognition algorithm (Fig. 2):
Fig. 2. Image upload function for a JPEG picture, grayscale conversion (binarization), and Harris
corner detection
The developed program uses the same principle of corner determination, except
that the maximum is computed over the eigenvalues. The maximum element and values
close to it are the corner points. It was experimentally determined that the threshold
value at which the best result is achieved is 70% of the maximum value. The
developed function has two formal parameters: the first is the name of the jpg file,
the second is the threshold value [1]. The result of the execution is the coordinates of
the corners. As already mentioned, the Harris algorithm is one of the earliest, but it
has a significant drawback: high computational complexity. To increase
speed, the original image is compressed, the coordinates of the corners are calculated,
and then the image is restored to its original size.
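The corner-scoring idea described above can be sketched in pure Python. This is an illustrative toy (names and window size ours), computing central-difference gradients, a 3 × 3 structure tensor, and the classic Harris response det − k·trace²; a thresholded maximum of this response gives the corner points:

```python
# Toy Harris corner response on a 2d list-of-lists grayscale image.

def harris_response(img, k=0.04):
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):             # central-difference gradients
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    R = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            sxx = syy = sxy = 0.0         # 3x3 structure tensor sums
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            R[y][x] = sxx * syy - sxy * sxy - k * (sxx + syy) ** 2
    return R

# A bright square on a dark background: its corner should score high.
img = [[0] * 10 for _ in range(10)]
for y in range(3, 7):
    for x in range(3, 7):
        img[y][x] = 1
R = harris_response(img)
```

The response at the square's corner (3, 3) is strictly positive, while a flat region such as (1, 1) scores zero, which is exactly the behavior the thresholding step exploits.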
3 Experimental Results
Fig. 3. The results of testing the recognition methods SURF, FAST and SIFT
Fig. 6. Image recognition and search for objects method SURF. Invariance of this method
4 Contour Analysis
Contour analysis is one way to recognize images. Developers can easily define the
contours of images and manage them using the OpenCV library. The cvFindContours
function helps you find the outlines of a graphic image.
With the help of the function cvFindContours, you can find and determine the
number of contours of a monochrome image. The function fills the first_contour
pointer, which points to the first external contour. This pointer may also be
empty if no contours are detected (for instance, if the picture is completely black).
Using the h_next and v_next links, it is possible to traverse the other
contours. OpenCV has a CvSeq structure that provides an interface for all dynamic
structures and describes the sequence of patterns [4].
Small contours often interfere with the recognition of images. To solve this
problem, it is necessary to run through the whole set of contours, checking
the dimensions of each contour and skipping the small ones [7].
One of the characteristics of a contour, computed as a sum over all the pixels of the
contour, is the moment. It has the following definition:
818 S. B. Mukhanov and R. Uskenbayeva
m_pq = Σ_{i=1}^{n} I(x, y) · x^p · y^q   (5)
where p is the order of x and q is the order of y. The order is the power to which the
corresponding coordinate is raised in the sum. Below are the results
of testing the algorithm for obtaining the contours of an object, displayed in
Fig. 7. The contours of different objects are highlighted there; the contour of the red
circle is clearly expressed in each object.
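Formula (5) can be sketched directly (the function name is ours). The zeroth moment m_00 is the pixel count (area) and the centroid is (m_10/m_00, m_01/m_00), which is the usual use of these moments in contour analysis:

```python
# Image moments m_pq per formula (5), for a 2d list-of-lists image.

def moment(img, p, q):
    """m_pq = sum over all pixels of I(x, y) * x^p * y^q."""
    return sum(img[y][x] * (x ** p) * (y ** q)
               for y in range(len(img))
               for x in range(len(img[0])))

img = [[0, 1],
       [1, 1]]                        # three foreground pixels
area = moment(img, 0, 0)              # m_00: area
cx = moment(img, 1, 0) / area         # centroid x = m_10 / m_00
cy = moment(img, 0, 1) / area         # centroid y = m_01 / m_00
```

For this 2 × 2 example the three foreground pixels give m_00 = 3 with centroid (2/3, 2/3).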
5 Conclusion
This paper has investigated different types of pattern recognition algorithms,
and the detection of object contours has been enhanced. This means that, with the
help of the basic Harris corner recognition algorithm and methods for determining local
features such as SURF, SIFT and FAST, the obtained algorithm recognizes contours in
every image. One issue was the incomplete quality of some images. Furthermore, various
problems such as blurring or merging, as well as interference and damage in the pictures,
were encountered. To clear up these problems, the Harris method was used: the
source image is uploaded and converted to grayscale (load source image, set detector
parameters, detect corners). The given results show the performance of the recognition
algorithms. Contour analysis was implemented on the basis of a mathematical model,
given by formula (5). This algorithm works very efficiently, as can be
observed in Fig. 7. The OpenCV computer vision library helps to deeply
understand the problem area, as well as enabling flexible development in a non-trivial way. On
the basis of machine learning and neural networks, problems in the field of pattern
recognition are solved very effectively. Further, these methods and algorithms can be
optimized for more accurate results. In the future, this work will use new
technologies and programming languages. For example, the Python programming
Pattern Recognition with Using Effective Algorithms and Methods 819
language has a rich machine learning library. That allows to solve a number of problem
areas with the help of flexible, but effective development tools. We will not completely
supplant OpenCV C-based computer vision technology, however, we must offer the
latest technology and programming languages designed in this area.
References
1. Edward, R., Tom, D.: Machine learning for high speed corner detection. In: 9th European
Conference on Computer Vision, vol. 1, pp. 430–443 (2006)
2. Edward, R., Reid, P., Tom, D.: Faster and better: a machine learning approach to corner
detection. IEEE Trans. Pattern Anal. Mach. Intell. 32, 105–119 (2010)
3. Herbert, B., Andreas, E., Tinne, T., Luc, V.G.: SURF: speeded up robust features. Comput.
Vis. Image Underst. (CVIU) 110(3), 346–359 (2008)
4. Novikov, A.I., Sablina, V.A., Nikiforov, M.B., Loginov, A.A.: Contour analysis and image
superimposition task in computer vision system. In: Proceedings of 11th International
Conference on Pattern Recognition and Image Analysis: New Information Technologies
(PRIA-11-2013), vol. 1, pp. 282–285 (2013)
5. Ke, Y., Sukthankar, R.: PCA-SIFT: a more distinctive representation for local image
descriptors. In: CVPR, vol. 2, pp. 506–513 (2004)
6. Gary, B., Adrian, K.: Learning OpenCV 3 Computer Vision in C++, 1 edn. ISBN-13: 978-
1491937990. O’Reilly Media (2016)
7. Kruchinin, A.: Pattern Recognition with Using OpenCV Library. http://recog.ru (2013)
The Practice of Moving to Big Data on the Case
of the NoSQL Database, Clickhouse
Abstract. In the modern world, every technology and user generates a large amount of data, and each piece of data carries value to some degree. The concept of big data is therefore actively developing, because the idea of big data is to generate new value. Addressing big data is a challenging and time-demanding job that needs a large computational infrastructure to ensure successful data processing, storage, and analysis. This report compares how one of the big data storage technologies, Clickhouse, can replace the relational database Oracle. The motivation of this paper is to obtain an understanding of the benefits and drawbacks of a NoSQL database, in the case of Clickhouse, in supporting a huge amount of data.
Keywords: Big data · Big data value chain · Data storage · NoSQL · Clickhouse · Column database
1 Introduction
Big Data is a phenomenon defined as the rapid acceleration in the expanding volume, high velocity, and diverse types of data. Big Data is often defined along three dimensions: volume, velocity, and variety [5]. All three characteristics influence the choice of data storage, and a NoSQL database can be a solution for data storage. Therefore, this raises the research question:
Can NoSQL DBMS Clickhouse serve as a data store layer for Big Data?
In order to provide an answer to this question, the following objectives are defined:
• to analyse available literature and to define terms such as “Big Data”, “Data Storage layer” and data storage technologies;
• to define a list of parameters for comparison;
• to conduct the test regarding the defined parameters and formulate research results.
The aim of this paper is to concentrate on one of the big data storage technologies and compare how this technology, Clickhouse, can replace the relational database Oracle. In order to achieve this purpose, a literature review was done first. This part tried to understand the meaning of “Big Data” and the stages of the Big Data value chain; and the
2 Main Part
Another resource refers to the big data value chain, which adds value at each step of delivering data. It is represented by seven phases: data generation, data collection, data transmission, data pre-processing, data storage, data analysis and decision making [8].
Lehmann et al. (2016) defined four sequential phases: data generation, data acquisition, data storage, and data analytics. The initial four layers of the reference structure relate to the process steps of this Big Data value chain, while the last one delivers valuable outcomes (see Fig. 2) [12].
Data Storage Layer
All three presented chains have the storage layer before the final step, value generation. The perfect big data storage system would permit storage of an unlimited amount of data; cope with high rates of inserts and selects; be flexible and effective in managing different data types; and handle both structured and unstructured data. In addition, the data kept in the storage layer should be encrypted [4].
Big data storage technologies should also address the 6Vs challenges and “do not
fall in the category of relational database systems” [4]. The relational database system
can address these challenges, but unconventional storage technologies such as
columnar stores or Hadoop Distributed File System (HDFS) can be effective and be
cheaper [4]. According to [8], an RDBMS with plain SQL data analysis techniques struggles to process the increasing amount of data, especially the unstructured data types. Because of the Atomicity, Consistency, Isolation, and Durability (ACID) constraints, scaling to large data volumes in an RDBMS is impossible, as is dealing with semi-structured and unstructured data [8]. These restrictions of RDBMS led to
the concept of NoSQL. NoSQL, “schema-free” databases, supports unstructured data
and enables a quick update without rewrites. NoSQL represents document stores, key-
value stores, column stores, and graph database. Data management and data storage
functionalities are separated in the NoSQL databases, which permit the scalability of
data [8]. For example, in the column-oriented database, “data from a given column are
stored together”, which allows flexible scaling and “each row can have a different set of
columns that allow tables to be sparse” with no extra spending [10].
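The difference between the two layouts can be illustrated with a small sketch. This is a toy in-memory model, not ClickHouse's actual storage format: in the row layout every record carries all columns, while in the column layout each column is stored contiguously, so a single-column aggregate touches only the data it needs.

```python
# Toy illustration of row-oriented vs column-oriented layouts.
rows = [                                     # row store: one record per entry
    {"id": 1, "city": "Almaty", "amount": 10},
    {"id": 2, "city": "Astana", "amount": 20},
    {"id": 3, "city": "Almaty", "amount": 30},
]

columns = {                                  # column store: one list per column
    "id":     [1, 2, 3],
    "city":   ["Almaty", "Astana", "Almaty"],
    "amount": [10, 20, 30],
}

# Aggregating one column: the row store must walk every whole record,
# while the column store reads a single contiguous list.
total_row_store = sum(r["amount"] for r in rows)
total_col_store = sum(columns["amount"])
```

Both layouts return the same answer; the column layout simply avoids reading the `id` and `city` data at all, which is the source of the compression and scan-speed advantages discussed above.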
The big data storage systems should ensure a durable storage area and effective access to the data. “The distributed storage systems for big data should consider factors like consistency (C), availability (A) and partition tolerance (P)”. However, the CAP theorem states that not all of these requirements can be implemented simultaneously in one technology [8]. “Hadoop is good at managing fewer very large files”, but when there are many small files this raises performance issues and requires additional ETL steps to combine these small files [8, 13]. Hadoop also does not support a “query optimiser”, which can lead to “an inefficient cost-based plan” [8]. NoSQL databases are created for scalability, sometimes by sacrificing consistency [4]. This article aims to compare a NoSQL database with a relational database, in the case of Clickhouse, introduced by the Yandex company. As the object of the study, a project was chosen that migrated from Oracle to Clickhouse because it generates big data.
3 Methodology
To identify features and find deviations, a list of indicators for comparison should be determined. For a complex comparison of the DBMS Clickhouse with an object-relational database management system, articles with comparative analyses were analysed. These articles tried to identify the strengths and weaknesses of the software, and they also helped to determine the direction of the software research.
824 B. Imasheva et al.
Since Clickhouse has been on the market since 2009, there are already comparative analyses with other products such as Apache Spark. Alexander Rubin, in the article “ClickHouse in a General Analytical Workload (Based on a Star Schema Benchmark)”, compares three software products: MariaDB ColumnStore version 1.0.7 (based on InfiniDB), DBMS Clickhouse and Apache Spark [14]. The purpose of that research is to show how a column DBMS is better in compression and in performance. Compression is defined as the effective use of disk space, while performance is the speed at which the software product performs a query. However, it is not enough to consider just two parameters in order to get insights, and that research compared different types of databases. Yishan Li and Sathiamoorthy Manoharan, in contrast, analysed the same type of database, key-value, using the CRUD (Create, Read, Update, Delete) model [15]. Their research underlines that analysing a database, with regard to its main feature, against its relatives helps to distinguish the fastest and most optimized database [15]. Based on this, we also compare the optimized function of a column-oriented DBMS, column-reading speed, in contrast with row reading.
The company Altinity conducted a test to check Clickhouse stability on time series. The Time Series Benchmark Suite was developed by InfluxDB engineers and modified by the Timescale team [16]. Clickhouse was not prepared for this kind of test, but it showed good results using several nodes, and in some cases was even faster than TimescaleDB and InfluxDB. This specific test showed that the architectural features of the database regarding vertical extensibility should be considered as one of the parameters of the comparative analysis. Moniruzzaman A. and Hossain S. performed research that identified the architectural differences of the various NoSQL databases [17]. Sharding, replication, programming language, horizontal/vertical extensibility, and query language were used as parameters of that analysis.
Roman Leventov compared the analytical models of the databases Clickhouse, Apache Druid and Apache Pinot [18].
After an analysis of the sources, the list of indicators was established. Table 1 summarises the parameters of the further research.
Table 1. (continued)
Indicators group Indicators Type of indicators
Speed of join Measuring (s)
Speed of count Measuring (s)
The main differences The method of reading data Descriptive
Type of table engines Descriptive
Software limitations Descriptive
Analytical aspects Descriptive
The architectural indicators will help to determine the optimality of the current system for projects and its compatibility with other products. The query engine indicators show the main parameters of data processing. The last group provides general distinctions of a column database.
As the dataset, one of the systems was used whose Oracle dump file is 1.4 TB in size (1.6 TB after unzipping), while in Clickhouse it occupies 177 GB.
Table 3 shows the result of the analysis.
5 Conclusion
To conclude, the NoSQL database was introduced as one of the technologies to store Big Data, and Clickhouse was chosen as one of the representatives of NoSQL databases. The aim of this paper was to examine whether Clickhouse can replace the relational database Oracle. To achieve this purpose, the following objectives were met:
• the existing literature was analysed and the main terms were defined;
• the list of parameters for comparison was set;
• the comparative analysis was conducted and the results were formulated.
As a result, the Clickhouse database concentrates on analytical processing of large-scale datasets, offering increased scalability over commodity hardware. Clickhouse shows the ability to accumulate and index arbitrarily big data while allowing a large number of simultaneous user requests, and it is well suited to analytical data processing. It should also be understood that the Clickhouse DBMS has not only architectural features but also distinctive features, such as running a query on sample data to get an approximate result and using aggregation for a limited number of random keys. In order to achieve better performance, Clickhouse sacrifices the accuracy of calculations; this is acceptable for an analytical database.
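The idea of trading exactness for speed — running a query on a sample of the data to get an approximate result — can be sketched as follows. This is an illustration of the principle only, not ClickHouse's SAMPLE implementation; the data and sampling fraction are made up.

```python
import random

def approx_mean(values, sample_fraction, seed=0):
    """Estimate the mean from a random sample instead of a full scan."""
    rng = random.Random(seed)
    k = max(1, int(len(values) * sample_fraction))
    sample = rng.sample(values, k)
    return sum(sample) / len(sample)

data = list(range(100_000))                          # true mean is 49999.5
estimate = approx_mean(data, sample_fraction=0.01)   # scan only 1% of the rows
error = abs(estimate - 49999.5) / 49999.5            # relative estimation error
```

With a uniform sample the estimate lands close to the exact answer at a fraction of the scan cost, which is exactly the trade-off an analytical database can afford to make.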
References
1. Laney, D.: 3-D Data Management: Controlling Data Volume, Velocity, and Variety. META
Group Res Note 6, Stamford (2001)
2. Loukides, M.: What is Data Science. O’Reilly Media (2010)
3. Jacobs, A.: The pathologies of big data. Commun. ACM 8(52) (2009)
4. Cavanillas, M., Curry, E., Wahlster, W.: New Horizons for a Data-Driven Economy: A
Roadmap for Usage and Exploitation of Big Data in Europe. Springer Open, Cham (2016)
5. TechAmerica Foundation.: Demystifying Big Data: A Practical Guide To Transforming The
Business of Government. TechAmerica Foundation, Washington (2012)
6. Gandomi, A., Haider, M.: Beyond the hype: big data concepts, methods, and analytics. Int.
J. Inf. Manag. 2(35), 137–144 (2015)
7. IBM Analytics. https://www.ibmbigdatahub.com/infographic/four-vs-big-data. Last acces-
sed 05 Feb 2019
8. Bhadani, A., Jothimani, D.: Big data: challenges, opportunities and realities. In: Singh, M.K.,
Kumar, D.G. (eds.) Effective Big Data Management and Opportunities for Implementation
2016, pp. 1–24. IGI Global, Pennsylvania (2016)
9. Rajkumar, B., Rodrigo, C.A., Vahid, D.: Big Data Principles and Paradigms. Morgan
Kaufmann, Cambridge (2016)
10. Sakr, S.: Big Data 2.0 Processing Systems: A Survey. Springer Publishing Company,
Incorporated (2016)
11. Curry, E., Freitas, A., Ngonga, A.: D2.2.2 Final Version of Technical White Paper. Big Data
Public Private Forum, pp. 2–8 (2014)
12. Lehmann, D., Fekete, D., Vossen, G.: Technology Selection for Big Data and Analytical
Applications. European Research Center for Information Systems No. 27. (2016)
13. Cloudera Engineering Blog. https://blog.cloudera.com/blog/2014/09/getting-started-with-
big-data-architecture/. Last accessed 04 Feb 2019
14. Rubin, A.: Column Store Database Benchmarks: MariaDB ColumnStore vs. Clickhouse vs.
Apache Spark. https://www.percona.com/blog/2017/03/17/column-store-database-benchmarks-
mariadb-columnstore-vs-clickhouse-vs-apache-spark/. Last accessed 17 Jan 2019
15. Yishan, L., Sathiamoorthy, M.: A performance comparison of SQL and NoSQL databases.
In: Communications, Computers and Signal Processing. New Zealand (2013)
16. Altinity. ClickHouse for Time Series. https://www.altinity.com/blog/clickhouse-for-time-series. Last accessed 05 Jan 2019
17. Moniruzzaman, A., Hossain, S.: NoSQL database: new era of databases for big data analytics-
classification, characteristics and comparison. Int. J. Database Theory Appl. 6(4) (2013)
18. Leventov, R.: Comparison of the Open Source OLAP Systems for Big Data: ClickHouse,
Druid and Pinot. https://medium.com/@leventov/comparison-of-the-open-source-olap-
systems-for-big-data-clickhouse-druid-and-pinot-8e042a5ed1c7. Last accessed 07 Jan 2019
19. Yandex. Distinctive Features of ClickHouse. https://clickhouse.yandex/docs/en/introduction/
distinctive_features/. Last accessed 07 Jan 2019
20. Oracle. Database Limits. https://docs.oracle.com/cd/B28359_01/server.111/b28320/limits.
htm#REFRN004. Last accessed 07 Jan 2019
Economics and Finance
Asymptotically Exact Minimizations
for Optimal Management of Public
Finances
1 Introduction
This work builds on the model of optimal management of public finances published by Loué Jean-François and Jondeau Eric in 1992 in [4]. The modeling of the optimal management of public finances gives an optimization problem of the same type as problem (15), whose constraints are all differentiable inequality constraints. This type of problem is a good candidate for testing our algorithms. A theoretical resolution assuming that all constraints are inactive was made in [4]; this makes it possible to solve only the system (17). We will return to this resolution and then numerically solve the problem in the general case where the constraints are active or inactive. It is clear that the theoretical (or manual) resolution of the system (16) in the case where the constraints are active or inactive is almost impossible because of the multiplicity of sub-systems generated by the exclusion condition.
We briefly describe the formulation of the public finances management problem as follows (see [4] for more details):
The objective function: The government has a goal (a long-term target) for both expenditures and revenues, (g*, t*). The existence of significant adjustment costs leads the government to gradually correct its situation (in terms of revenue and expenditure) so as to reach its target in the absence of constraints. We admit here an exponential adjustment of the revenues of the form:
t_s^opt = θ1 t_{s−1} + (1 − θ1) t*    (1)
where θ1 ∈ [0, 1], and t_s means public revenue divided by GDP per capita at time s. In the same way, we admit an exponential adjustment of the expenses of the form:
constant for s > 0, and τ is the discount rate. Without adjustment costs, the
objective function of the government becomes:
f(s, g_s, t_s) = Σ_{s=1}^{∞} ((1+n)/(1+τ))^s [ (1 − c)(t_s − t^opt)² + c(g_s − g^opt)² ]    (6)
Multiplying each member of Eq. (13) by ((r+1)/(n+1))^{−m}, we obtain

((n+1)/(r+1))^m b_m = Σ_{s=0}^{m} ((n+1)/(r+1))^s d_s + ((r+1)/(n+1)) b_{−1}    (14)
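Equation (14) can be checked numerically by telescoping, assuming the debt accumulation in Eq. (13) has the standard form b_s = ((r+1)/(n+1)) b_{s−1} + d_s. The values of n, r, the deficits d_s and the initial debt b_{−1} below are illustrative assumptions, not data from the paper.

```python
n, r = 0.02, 0.05                 # growth rate and interest rate (n < r)
alpha = (1 + n) / (1 + r)         # discount factor alpha = (n+1)/(r+1) < 1

b_prev = 1.0                      # initial debt b_{-1} (assumed)
d = [0.3, -0.1, 0.2, 0.05]        # primary deficits d_s, s = 0..m (assumed)

# Debt recursion assumed for Eq. (13): b_s = (1/alpha) * b_{s-1} + d_s
b = []
for ds in d:
    b_prev = b_prev / alpha + ds
    b.append(b_prev)

m = len(d) - 1
lhs = alpha ** m * b[m]                                   # discounted final debt
rhs = sum(alpha ** s * ds for s, ds in enumerate(d)) + 1.0 / alpha  # Eq. (14) RHS
```

Multiplying the recursion by α^s and summing over s = 0..m makes every intermediate b_s cancel, which is exactly how Eq. (14) arises from Eq. (13).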
To sum up, the problems to be solved fall into two categories: the case where the government does not have an adjustment cost and the case in which it does. According to the intertemporal budget constraint, each category has three types of problems, as follows:
Category 1: the government does not have an adjustment cost; this category groups problems № 1, 2 and 3. Category 2: the government has an adjustment cost; this category groups problems № 4, 5 and 6.
In this paper, we solve any optimization problem of type
When the constraints of the problem are equality constraints, the exclusion condition in (16) becomes trivial. So, finding a solution of problem (15) amounts to finding a point satisfying the system (16) without the exclusion condition. Our terminology is based on [1, 2]. Additionally, when none of the constraints of problem (15) is active at an optimal point, then according to the exclusion condition all KKT multipliers are equal to zero (λi = 0 ∀ i). In this case, we have:
∇f(x) = 0    (17)
Equation (17) is then all that remains to be solved. In this work, we will solve the problems in each case (all the constraints inactive, that is to say gi(x) < 0 ∀ i, or the constraints not necessarily active, gi(x) ≤ 0 ∀ i). We will compare the solutions obtained in each case and identify the solutions that best reflect reality. The principle of our algorithms is to find x ∈ K such that
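As noted above, when no constraint is active the whole problem collapses to the unconstrained system (17), ∇f(x) = 0. For a quadratic objective of the kind used here, this can be sketched with plain gradient descent; the targets, weight c and step size are illustrative assumptions, and this is not the authors' algorithm of [1, 2].

```python
# Solve grad f(x) = 0 for f(t, g) = (1-c)*(t - t_star)**2 + c*(g - g_star)**2.
# The unique stationary point is (t_star, g_star).
c, t_star, g_star = 0.5, 49.0, 48.5   # assumed weight and targets

def grad_f(t, g):
    return (2 * (1 - c) * (t - t_star), 2 * c * (g - g_star))

t, g = 0.0, 0.0                       # arbitrary starting point
step = 0.4
for _ in range(200):
    gt, gg = grad_f(t, g)
    t, g = t - step * gt, g - step * gg
```

After 200 iterations the error has contracted by a factor 0.6 per step, so (t, g) sits at the target to machine precision, matching the interior-point solution of (17).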
2 Main Results
For the resolution of each problem, we will define a size limit N, which can be chosen very large depending on whether we decide to compute the optimal values by year, by month or by day. Recall that each value t_s^opt represents the optimal value at date s. Each of the problems depends on two quantities, the revenue t and the expenditure g. Let us define a new vector x = (t, g); our problems will depend only on this vector x. We consider a finite horizon where time is discretized as S = {0, 1, ..., N − 1}, and we have t = {t_0, t_1, ..., t_{N−1}},
836 J. Koudi et al.
Let us denote I = {0, 1, ..., 2N − 1} and h1(x) = (h_i(x))_{i∈I}. We have:

∇h1(x) = [ I_N   O ;  O   −I_N ]

where I_N is the identity matrix of order N. Apart from the feasibility
conditions, the problems № 1 and 4 have the same constraints. The problems №
2 and 5 have in addition those equality constraints defined by:
h2(x) = Σ_{s=0}^{N−1} α^s (x_s − x_{N+s}) − b_{−1}/α = 0    (22)
Consider the assumptions made by Loué Jean-François and Jondeau Eric in 1992 in [4] (that is to say, the revenues will never reach their fixed upper level and the expenditures will always remain above their fixed lower level) for the problems № 2 and 5: we have h1s(x) < 0 for all s ∈ I. With this assumption we can easily solve the problems, because λs = 0 ∀ s ∈ I (KKT conditions). Mathematically, these optimal solutions will be interior points. So, the system (24) becomes:
⎧ ∇f(x) + μ ∇h2(x) = 0            ⎧ 2γ^s (1 − c)(t_s − t*) + μ α^s = 0   ∀ s ∈ I
⎨ h2(x) = 0                 ⇐⇒    ⎨ 2γ^s c (g_s − g*) − μ α^s = 0        ∀ s ∈ I      (26)
⎩ μ ∈ ℝ                           ⎩ Σ_{s∈I} α^s (t_s − g_s) − b_{−1}/α = 0,   μ ∈ ℝ

which gives:

t_s − t* = − (μ / (2(1 − c))) (α/γ)^s ,    g_s − g* = (μ / (2c)) (α/γ)^s ,

t_s − g_s = (μ / (2c(c − 1))) (α/γ)^s + (t* − g*)
Replacing t_s − g_s in the equality constraint of the system (26), we obtain the following equation:

Σ_{s∈I} α^s [ (μ / (2c(c − 1))) (α/γ)^s + (t* − g*) ] − b_{−1}/α = 0    (27)

and

μ = 2c(c − 1) [ b_{−1} − (t* − g*) Σ_{s∈I} α^{s+1} ] / ( Σ_{s∈I} α^{2s+1}/γ^s )    (28)

t_s = t* + c (α/γ)^s [ b_{−1} − (t* − g*) Σ_{s∈I} α^{s+1} ] / ( Σ_{s∈I} α^{2s+1}/γ^s )    (29)

g_s = g* + (c − 1) (α/γ)^s [ b_{−1} − (t* − g*) Σ_{s∈I} α^{s+1} ] / ( Σ_{s∈I} α^{2s+1}/γ^s )    (30)
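The interior-point solution given by Eqs. (28)–(30) can be checked numerically: substituting t_s and g_s back into the budget constraint must give h2(x) = 0. The values of α, γ, c, b_{−1}, the targets and the index range below are illustrative assumptions, not taken from [4].

```python
# Verify that the closed-form t_s, g_s of Eqs. (28)-(30) satisfy the
# budget constraint sum_s alpha^s (t_s - g_s) - b_prev / alpha = 0.
alpha, gamma = 0.97, 0.99         # discount factors (assumed values)
c, t_star, g_star = 0.5, 49.0, 48.5
b_prev = 1.0                       # initial debt b_{-1} (assumed)
N = 12                             # horizon, e.g. the monthly option (assumed)

S1 = sum(alpha ** (s + 1) for s in range(N))
S2 = sum(alpha ** (2 * s + 1) / gamma ** s for s in range(N))
K = (b_prev - (t_star - g_star) * S1) / S2     # common factor in (28)-(30)

t = [t_star + c * K * (alpha / gamma) ** s for s in range(N)]        # Eq. (29)
g = [g_star + (c - 1) * K * (alpha / gamma) ** s for s in range(N)]  # Eq. (30)

h2 = sum(alpha ** s * (t[s] - g[s]) for s in range(N)) - b_prev / alpha
```

Since t_s − g_s = (t* − g*) + K (α/γ)^s, the constraint sum collapses to (t* − g*) S1 + K S2 − b_{−1} = 0 by construction of K, so h2 vanishes identically.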
Assume that the discount rate is lower than the interest rate (τ < r) and that the growth rate is lower than the interest rate (n < r). By applying formula (31), we obtain:

Σ_{s=1}^{∞} α^{s+1} = α²/(1 − α)   and   Σ_{s=1}^{∞} α^{2s+1}/γ^s = α³/(γ − α²).

Furthermore, (α/γ)^s converges to zero. This leads to the conclusion that revenues and expenses tend towards their targets (t*, g*) on the infinite horizon. Note that outside the assumptions made by Loué Jean-François and Jondeau Eric on the unsaturation of the budget and revenue constraints, the calculations become complicated and the problems can only be solved numerically.
gopt = [48.702, 48.982, 48.393, 48.472, 48.798, 48.567, 48.762, 48.467, 48.695,
48.631, 48.507, 48.613];
ε2 = 26.207; εm = 0.426; time = 1.534 s.
Problem № 3
topt = [48.568, 48.364, 48.329, 48.880, 48.904, 48.636, 49.073, 48.585, 48.724,
48.838, 49.002, 48.624];
gopt = [48.814, 48.820, 49.039, 48.591, 49.014, 48.889, 48.665, 48.648, 48.233,
48.459, 48.534, 48.729];
ε2 = 26.735; εm = 0.430; time = 2.616 s.
Problem № 4
topt = [45.099, 50.797, 50.512, 51.837, 51.251, 51.267, 52.281, 50.864, 48.412,
50.105, 50.482, 49.495];
gopt = [44.806, 45.161, 45.689, 44.587, 44.725, 46.276, 45.995, 43.831, 44.266,
42.305, 44.079, 45.076];
ε2 = 33.984; εm = 0.485; time = 4.244 s.
Problem № 5
topt = [42.965, 46.971, 47.602, 48.367, 48.692, 47.647, 47.996, 49.729, 49.295,
49.079, 49.781, 48.699];
gopt = [46.848, 47.105, 47.533, 48.464, 48.419, 48.546, 48.979, 49.119, 46.914,
47.756, 48.267, 48.490];
ε2 = 12.508; εm = 0.294; time = 1.582 s.
Problem № 6
topt = [42.934, 48.117, 48.816, 49.034, 48.272, 49.076, 48.987, 49.043, 49.434,
49.200, 49.572, 48.871];
gopt = [47.206, 47.948, 48.703, 48.959, 49.171, 49.157, 48.473, 48.993, 47.893,
47.374, 48.333, 48.768];
ε2 = 13.456; εm = 0.305; time = 1.779 s.
For the daily option, the government can decide to evaluate for each day the optimal level of its revenues and the optimal level of its expenses.
Recall that the size of the problem here is such that the number of variables is Nv = 730, and the number of constraints is Nc = 730 for problems № 1 and 4 and Nc = 731 for the others. All solutions obtained are admissible. Because of the large size of the data topt and gopt (vectors of size 365 each), we will present only the costs borne by the government, the elapsed time of the numerical resolution and the curves of data evolution for each type of problem. The results are presented in Table 1.
The results obtained by manual calculation (theoretical results), under the assumption in [4] that the revenues will never reach their fixed upper level and that the expenditures will always remain above their fixed lower level (h1s(x) < 0 for all s ∈ I), show that the optimal choices proposed to the government by the model tend towards the objective aimed at by the latter (see Eqs. (29) and (30)). This assumption can, however, be quickly invalidated by natural phenomena that are sometimes unpredictable. Also, in the countries of the third world where the budget is mainly based on taxes, the receipts are often lower than the hoped-for level, and to avoid further indebtedness the government must reduce its expenses to the fixed lower level during certain periods of its exercise. It is therefore preferable to consider the cases where the receipts may or may not reach their fixed upper level and the expenses may or may not reach their fixed lower level. The results in these different cases are those of the numerical tests. Comparing the results obtained in the case where the government has an adjustment cost with the case where it does not, we immediately note that the choices are concentrated around the adjustment and converge towards the fixed objective, which is not the case in situations where the government does not have an adjustment cost. We note a higher concentration of data around the adjustment than in the other cases. This shows that, in practice, the choice of this option will minimize fluctuations in the economy of the country.
3 Conclusion
The results obtained are satisfactory and reflect reality. The goal of the optimal management model used in this work is to find solutions that tend towards the objectives at lower cost in the presence of an adjustment cost. This is what the manual results (see Eqs. (29) and (30)) showed us. In the same way, the numerical resolutions gave us the same types of results and demonstrate the difference between solutions without adjustment costs and solutions with adjustment costs. It should also be noted that the fourth case of the intertemporal budget constraint that we created in this work further improved the results: the solutions of these problems are more concentrated around the adjustment than in the other cases, so this assumption decreases fluctuations in the economy. All these results show that the algorithms have a good ability to solve large problems.
References
1. Jean K., Guy D., Babacar, M.N., Mamadou, K.T.: Algorithms for asymptotically
exact minimizations in Karush-Kuhn-Tucker methods. J. Math. Res. 10(2) (2018).
https://doi.org/10.5539/jmr.v10n2pxx
2. Guy, D., Jean, K.: Les multiplicateurs de Lagrange en dimension finie. Edition EUE (November 2013)
3. Barro, R.J.: On the determination of the public debt. J. Polit. Econ. 87(5) (1979)
4. Jean-François, L., Eric, J.: La gestion optimale des finances publiques en présence
de coûts d’ajustement, Economie & prévision, No 104, 1992-3. Politique budgétaire,
taux d’intérêt, taux de change, pp. 19–38 (1992). https://doi.org/10.3406/ecop.
1992.5292
5. Roubini, N., Sachs, J.D.: Political economic determinants of budget deficit in the
industrial democracies. Eur. Econ. Rev. 33 (1989)
Features of Administrative and Management
Processes Modeling
Abstract. Business processes are created for gaining profit, i.e. they produce added value and a product which represents value and consumer qualities. The objectives of the administrative and managerial processes of government bodies are somewhat different: administrative and management processes at the state level are mainly focused on controlling access to an activity. The purpose of this article is to examine the features of administrative and management processes in terms of requirements. The article dwells upon the process approach in government bodies on the example of a university. Descriptions of administrative and management processes are given on the example of the licensing of scientific activities. The systems of standardization of administrative and management processes are presented, including a description of approaches to the creation of process models, considering the specifics of processes at a university. In addition, the article presents models of university processes in the IDEF0, IDEF3 and eEPC notations.
1 Introduction
Nowadays it has become very difficult to conduct the activities of companies and government bodies in the context of the globalization of processes and market volatility. This has led to different new management concepts. Among such concepts, the most promising is the process approach to management, or the process management of companies and government bodies.
According to this concept, the activity of a company or government body is represented as a set of processes (divided into separate processes, from which a network of processes is created), each of which is functionally autonomous at a certain level but interconnected with the others by the subject of work, functionally and informationally [1].
At the same time, business process models are still very simplistic: a business process involves a sequence of chains of actions or operations leading to a goal. Such a simplified business process model is not adequate to the actual processes and all actions
2.2 Description of the Process Environment and Creation of the “As Is”
Process Model
The main stages of building a business process model in this work are: the definition of roles and business functions [7]; binding roles to business functions; determining the order of execution of business functions; and adding events, documents and resources. At the modeling stage, the following results should be obtained: a Process Map, a Role Diagram and an “As Is” model of each considered business process [13].
The process map, representing the connections between the various administrative and management processes of the university and their interaction, is shown in the IDEF0 model (see Fig. 1). The process map shows the main processes and the connections between them (for example, the dependence of one process on another, or the replacement of one process by another when a certain condition is met). It also presents various documents that are passed from process to process or regulate their course (standards, instructions, etc.). The general activity of the university consists of the following processes: Management and organization of educational processes; Financial and Economic Management; Implementation of the administrative procedure of administrative and economic activities based on the Quality Management System (QMS); Research activities; Educational activities. In turn, the process of research activities is divided into the following subprocesses: Planning of research activity, Organization of research activity, Doing of research activity, and Control of research activity [12]. Figure 2 shows the chain of the listed subprocesses in eEPC notation. Despite its small faculty and research staff, MUIT implements this model. At the moment, the university has a license to conduct scientific activities. In Kazakhstan, organizations carrying out research and (or) research and technical activities must undergo an accreditation procedure for research activities once every 5 years.
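The chain of research-activity subprocesses above can be captured as a simple ordered model before it is drawn in eEPC notation. The sketch below only illustrates the alternation of events and functions typical of eEPC diagrams; the two boundary events are assumed for illustration, not taken from the university's model.

```python
# Minimal event -> function chain for the "Research activities" process,
# mirroring the event/function alternation of eEPC diagrams.
chain = [
    ("event",    "Planning period started"),          # assumed trigger event
    ("function", "Planning of research activity"),
    ("function", "Organization of research activity"),
    ("function", "Doing of research activity"),
    ("function", "Control of research activity"),
    ("event",    "Reporting period closed"),          # assumed final event
]

# Extract the functions: the four subprocesses in their prescribed order.
functions = [name for kind, name in chain if kind == "function"]
```

Keeping the chain as data like this makes it easy to check ordering rules (e.g. control always follows execution) before committing the model to a diagramming tool.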
An “as is” model of each considered administrative and managerial process was built, describing the process and reflecting its course, actions, roles, movement of documents and points of possible optimization.
The “as is” model represents a process in the form of a single action (that is, not disclosing the course of the process), for which the event triggering the process, the necessary input data, the result, interrupting events, compensating processes, regulatory documents and related business goals can be shown [11].
Administrative and management processes at the state level are mainly focused on controlling access to an activity. On the other hand, the university, as an organization carrying out scientific and technical activities, is subject to accreditation [10]. Licensing, as one of the forms of access control to an activity, and the process of granting licenses are the most common type of administrative and management process [14]. The study of the automated services of the electronic government of the Republic of Kazakhstan revealed the absence of an automated service “Accreditation of subjects of scientific and technical activities” (State Service Code No. 00802002 [9, 10]).
846 R. Satybaldiyeva et al.
Fig. 3. The eEPC model of business processes “Accreditation of subjects of scientific and
(or) scientific and technical activities” part 1
Fig. 4. The eEPC model of business processes “Accreditation of subjects of scientific and
(or) scientific and technical activities” part 2
Fig. 5. The eEPC model of business processes “Accreditation of subjects of scientific and
(or) scientific and technical activities” part 3
3 Conclusion
The paper discusses an approach to improving the management business processes of an enterprise, for which automation and information systems act as the main means. To do this, it is proposed to investigate the processes of the university for obtaining a license. The results of the practical research made it possible to determine the sequence of work on building
References
1. Saarsen, T., Dumas, M.: Factors Affecting the Sustained Use of Process Models. Business
Process Management Forum, pp. 193–209 (2016)
2. Pudovkina, S.G.: Analiz i optimizatsiya biznes-protsessov: uchebnoye posobiye.
Izdatel'skiy tsentr YUUrGU, Chelyabinsk (2013)
3. Varzunov, A.V., Torosyan, E.K., Sazhneva, L.P.: Analysis and Management of Business
Processes. ITMO University, St. Petersburg (2016)
4. Bedrina, S.L., Bogdanova, O.B., Kiykova, Y.V., Ovsyannikova, G.L.: Modeling of business
processes of higher education institution at introduction of process management. Open Edu.
1(102), 4–11 (2014) (in Russian)
5. Badica, A., Ionascu, C., Radu, C.: Elicitation of business process knowledge: a university
use case. In: Proceedings of the 7th Balkan Conference on Informatics Conference, Craiova,
Romania (2015). https://doi.org/10.1145/2801081.2801120
6. Klimenko, A.V.: Razrabotka metodicheskikh rekomendatsiy po opisaniyu i optimizatsii
protsessov v organakh ispolnitel’noy vlasti v ramkakh podgotovki vnedreniya EAR.
Vysshaya shkola ekonomiki, Moscow (2004)
7. Samuylov, K.Y., Chukarin, A.V., Yarkina, N.V.: Biznes-protsessy i informatsionnyye
tekhnologii v upravlenii sovremennoy infokommunikatsionnoy kompaniyey. Alpina
Publisher, Moscow (2016)
8. Asadullin, I., Samigullina, N., Zamaletdinov, R.: Optimizatsiya upravleniya vysshim
uchebnym zavedeniyem na osnove protsessnogo podkhoda. Rektor VUZa 8 (2015)
9. Reyestr gosudarstvennykh uslug. Government of the Republic of Kazakhstan.
http://ru.government.kz/ru/postanovleniya. Last accessed 2 Mar 2015
10. Ob utverzhdenii Pravil akkreditatsii subyektov nauchnoy i (ili) nauchno-tekhnicheskoy
deyatelnosti. Government of the Republic of Kazakhstan.
http://ru.government.kz/ru/postanovleniya. Last accessed 3 Jun 2016
11. Samuilov, K., Serebrennikova, N., Chukarin, A., Yarkina, N.: Fundamentals of Formal
Methods for Describing Business Processes, vol. 1. RUDN, Moscow (2008)
12. Kovalova, M., Turcok, L.: The importance of business process modelling in terms of
university education. Int. J. Sci. Technol. Res. 3(12), 111–117 (2014)
13. Kamennova, M.S., Krokhin, V.V., Mashkov, I.V.: Business Process Modeling. Part 1:
Textbook and Practical Work for the Academic Bachelor Degree, vol. 1. Publisher Jurajt,
Moscow (2018)
14. Salikhzyanova, N., Gallyamova, D.: Metodologiya modelirovaniya biznes-protsessov
organizatsii. Vestnik Kazanskogo tekhnologicheskogo universiteta 15(5), 202–204 (2012)
Optimization Problems of Economic Structural Adjustment and Problem of Stability
1 Introduction
As is known, optimization of solutions in problems involving real objects of different
nature, represented by static or dynamic models and, in particular, in macroeconomic
modeling [1,2], is widespread. In its mathematical formulation, such an optimization
problem is usually represented as a criterion (a function or functional)
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 850–860, 2020.
https://doi.org/10.1007/978-3-030-21803-4_85
Optimization Problems of Economic Structural Adjustment 851
2 The Model
The dynamic Model was built by developing the static computable general equilibrium
(CGE) model Globe1 [1] and linking it to data, on the basis of a conceptual description
of the global economy.
First, we list some of the prerequisites for a meaningful description.
The world economy is represented as the functioning of interacting agents of the
selected Regions of the world economy: Producers (industries), Households, and the
State. The agent-region Globe imports transportation services and exports them to all
regions when each type of goods is imported from each region to every other region.
852 A. Ashimov et al.
a special converter from the GTAP database [17]. The required SAM sets for the
years 2005, 2006, 2008–2010 and 2012–2015 were calculated using the developed
Algorithm 1 [18], based on the available statistical sources containing the
input–output tables (see, e.g., [19]) and indicators of mutual trade [20], using
the base ratios calculated with the help of the known SAMs for the most recent
year (2004, 2007 or 2011). For the forecast period (2016–2023), the developed
Algorithm 2 [18] was used to calculate these SAM sets based on the following
forecast indicators of the Regions provided by the IMF [21]: GDP, total investment,
import volume, the volume of import of services, the volume of exports of goods,
the volume of exports of services, general government revenues, and general
government expenditure. In doing so, we used the baseline ratios calculated with
the help of the obtained SAM for 2015.
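The base-ratio step can be pictured schematically. The actual Algorithms 1 and 2 of [18] are more involved; the Python sketch below only shows the idea of scaling a base-year SAM by the ratio of a forecast aggregate to its base-year value, so the projected matrix reproduces the forecast total while keeping the base-year structure of the accounts (all names and numbers are illustrative):

```python
def project_sam(base_sam, base_indicator, forecast_indicator):
    """Crude stand-in for the base-ratio step of Algorithms 1 and 2 [18]:
    scale every cell of a base-year social accounting matrix (SAM) by the
    ratio of a forecast aggregate (e.g. GDP) to its base-year value."""
    k = forecast_indicator / base_indicator
    return [[cell * k for cell in row] for row in base_sam]

base = [[10.0, 5.0], [5.0, 10.0]]   # illustrative 2x2 SAM fragment
projected = project_sam(base, base_indicator=30.0, forecast_indicator=33.0)
# every cell grows by 10%, so the base-year shares are preserved
```

Because every cell is scaled by the same ratio, the row and column shares of the base year carry over to the forecast year, which is the sense in which the projected SAM keeps the base-year structure.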
The results of calculating the base scenario of the calibrated Model accurately
reproduce the statistical and forecast data used in building the SAM sets
indicated above.
(1) Algorithm 1 for estimating the set S(f) of singular points of the mapping
f in parallelepiped A. This algorithm is based on dividing parallelepiped A
into small elementary parallelepipeds and evaluating the signs of the
maximal-order minors of the Jacobi matrix of the mapping f at all vertices of
the elementary parallelepipeds. In cases where dim A ≥ dim B and the
set S(f) is estimated as empty, the mapping f is evaluated as a stable
submersion.
(2) Algorithm 2 for estimating the injectivity of the mapping f. This algorithm
is used when dim A < dim B and the set S(f) is estimated as empty. If
Algorithm 2 evaluates the immersion f as an injective mapping, then the
mapping f is evaluated as stable in A.
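The idea of Algorithm 1 can be sketched in Python. This is not the authors' implementation: it uses a forward-difference Jacobian and a simple heuristic (a cell is treated as regular if some maximal-order minor keeps a constant nonzero sign at all of its vertices), and all function names are illustrative. Algorithm 2, the injectivity check, is not sketched here.

```python
import itertools
import numpy as np

def jacobian(f, x, h=1e-6):
    """Forward-difference Jacobian of f: R^n -> R^m at the point x."""
    x = np.asarray(x, float)
    fx = np.asarray(f(x), float)
    J = np.empty((fx.size, x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += h
        J[:, j] = (np.asarray(f(xp), float) - fx) / h
    return J

def candidate_singular_cells(f, lo, hi, splits):
    """Divide the parallelepiped [lo, hi] into elementary cells and flag
    every cell on which no maximal-order minor of the Jacobi matrix keeps
    a constant nonzero sign over the cell's vertices; such cells may
    contain points of S(f), the others are heuristically regular."""
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    n = lo.size
    m = np.asarray(f(lo), float).size
    k = min(n, m)                               # maximal minor order
    edges = [np.linspace(lo[i], hi[i], splits + 1) for i in range(n)]
    flagged = []
    for cell in itertools.product(range(splits), repeat=n):
        vertices = itertools.product(
            *([edges[i][cell[i]], edges[i][cell[i] + 1]] for i in range(n)))
        signs = []
        for v in vertices:
            J = jacobian(f, np.array(v))
            minors = [np.linalg.det(J[np.ix_(r, c)])
                      for r in itertools.combinations(range(m), k)
                      for c in itertools.combinations(range(n), k)]
            signs.append(np.sign(np.round(minors, 10)))
        signs = np.array(signs)
        # the rank is (heuristically) full on the cell if some minor has
        # the same nonzero sign at every vertex of the cell
        full_rank = any(np.all(signs[:, j] == s)
                        for j in range(signs.shape[1])
                        for s in (-1.0, 1.0))
        if not full_rank:
            flagged.append(cell)
    return flagged
```

Constant sign at the vertices does not strictly guarantee a nonzero minor inside the cell, so in practice the subdivision must be fine enough; the paper's statement that an empty estimate of S(f) certifies a stable submersion rests on the same kind of cell-wise verification.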
the f mapping defined by the tested model is assessed on the set A as being
continuously dependent on the exogenous values [6,7]. In the Model experiments,
the set A was a parallelepiped with its center at the point p corresponding to the
baseline values of all the tax rates in all the Regions for the year 2016,
whereas the sets Bt of endogenous variables comprised the following: GDP,
exports, and imports of all the Regions of the Model for the fixed computational
year t (from 2016 to 2023).
Table 1 shows the calculated values of the stability indicator βf(p, 0.01) of the
Model (in percent) for the base point p and α = 0.01.
Table 1. Values of the stability indicators for the basic calculation of the model.
Year 2016 2017 2018 2019 2020–2023
βf(p, 0.01) 0.7652 0.2496 0.0259 0.0033 0.0000
The estimates of the stability indicators in Table 1 show that the stability of
the Model (in the sense of the considered stability indicators) in the
calculations up to 2023 is sufficiently high.
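The exact definition of βf(p, α) is not reproduced in this excerpt. One plausible reading, used here purely as a sketch, is the largest relative deviation (in percent) of the model's endogenous outputs when the exogenous point p is perturbed to any vertex of the box p·(1 ± α):

```python
import itertools
import numpy as np

def beta(f, p, alpha):
    """Hypothetical stability indicator: the maximal relative deviation
    (in percent) of the outputs f over the vertices of the box
    p * (1 +/- alpha). This is an assumed form, not the paper's exact
    definition of beta_f(p, alpha)."""
    p = np.asarray(p, float)
    base = np.asarray(f(p), float)
    worst = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=p.size):
        q = p * (1.0 + alpha * np.asarray(signs))
        deviation = np.max(np.abs((np.asarray(f(q), float) - base) / base))
        worst = max(worst, deviation)
    return 100.0 * worst
```

Under this reading, a small βf means the endogenous variables react weakly to α-sized perturbations of the exogenous point, which is how the values in Table 1 are interpreted.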
Within the framework of developing optimal fiscal policy measures for structural
adjustment at the sectoral level and for the economic growth of the EAEU countries,
the following steps, based on the Model, were proposed, which made it possible
to:
1. select a set of promising industries for each EAEU country for which it is
desirable to have an outstripping growth of output; and
2. solve a set of dynamic optimization problems aimed at economic growth
and accelerated growth of output of each of the selected promising sectors in the
EAEU countries and in the EAEU as a whole.
The marginal cost of public funds for taxes from an industry (MCFr,i) for
the forecast period (see also [10]) is proposed as an indicator characterizing the
prospects of each i-th industry of country r. In this paper, the amount of
change in country r's GDP resulting from an increase in tax collections from
the i-th industry by 1 monetary unit is adopted as MCFr,i. This indicator
characterizes the significance of the industry, in the sense that increasing its
taxation leads to an increase in the country's GDP. The results of the calculation
of the MCFr,i indicator are shown in Table 2.
Table 2. Calculated values of the MCFr,i indicator.
Industry i Country r
1 Kazakhstan 2 Russia 3 Belarus 4 Armenia 5 Kyrgyzstan
1 ming 0.40 0.26 0.37 1.71 0.54
2 crog 0.51 0.08 −1.50 2.27 −1.01
3 mepe −0.58 0.23 0.10 1.39 0.52
4 mind 0.92 0.41 1.01 1.82 0.89
5 ehas 0.32 0.63 0.27 0.75 0.83
6 pegw −0.33 −0.27 0.33 0.32 −0.02
7 fpin −3.14 −1.26 −0.60 −0.66 −0.89
8 psta −0.24 −0.00 1.17 1.33 0.31
9 otis −1.41 −0.06 0.17 0.62 −0.47
10 oths 0.27 −0.34 −0.19 0.29 0.22
11 agff −1.97 −0.77 −0.51 −0.66 0.20
12 buil 0.62 0.89 0.67 0.48 1.02
13 mtal −0.92 −1.03 0.53 0.75 0.37
14 fins −0.22 −0.23 0.16 0.88 0.07
15 chpp −0.85 −0.14 1.39 1.25 0.94
16 tser 0.40 −0.20 1.00 2.10 1.39
Based on the analysis of the values shown in Table 2, the sets of industries
whose MCFr,i indicators are not less than 0.4 have been identified for each
country r of the EAEU (shown in bold). The set of numbers
The set of problems of dynamic optimization (i.e., the problem of SPr parametric
control) was considered in solving the problem of economic growth of the EAEU
countries and accelerated growth in the output of selected industries in these
countries in 2016–2023 via fiscal policy measures. Here r = 1, . . . , 5 corresponds
to such a problem at the level of one country r; r = 0 corresponds to the problem
of developing a coordinated policy at the level of all EAEU countries. We give
an informal formulation of this problem.
Statement of the SPr problem. The problem consists in identifying, based on the
Model, the values of the control parameters ur(t) (effective tax rates on
producer's income, sales tax, and customs duties differentiated by product and
industry; the share of government expenditures for consumption) that provide
the maximum value of the criterion Kr (1)–(2) under constraints of the type
ur(t) ∈ Ur(t) on the control instruments. Here t = 2016, …, 2023, and Ur(t) is a
parallelepiped with its center at the point of base values ur(t) and boundaries
spaced at ±10% of the baseline values.
For the problems SPr , (r ∈ {1, . . . , 5}) the control parameters ur (t) are the
specified instruments of the state policy of only the r-th country, and for the
problem SP0 the control parameters are the indicated instruments of the state
policy of all five EAEU countries.
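The box constraint ur(t) ∈ Ur(t) amounts to a componentwise ±10% check around the base values; a minimal sketch, with instrument names invented for the illustration:

```python
def in_U(u, u_base, frac=0.10):
    """Check u_r(t) in U_r(t): every control instrument stays within
    +/- frac (here 10%) of its baseline value."""
    return all(abs(u[k] - u_base[k]) <= frac * abs(u_base[k]) for k in u_base)

base_rates = {"income_tax": 0.20, "sales_tax": 0.12}
ok = in_U({"income_tax": 0.21, "sales_tax": 0.12}, base_rates)   # within the box
bad = in_U({"income_tax": 0.25, "sales_tax": 0.12}, base_rates)  # 25% above base
```

In the actual problems the check is imposed per year t and, for SP0, for the instruments of all five countries at once.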
In each of the problems SPr, an optimal fiscal policy is to be conducted, aimed
at economic growth and the growth of output of the selected promising
industries in 2016–2023 (in country r for the case r ∈ {1, …, 5}, or in all
EAEU countries for the case r = 0). Therefore, in these problems, the criterion
Kr is written in the following form:

K_r = \sum_{t=2016}^{2023} \Big( TQVA_r(t) + \sum_{i \in I_r} \alpha_{r,i} \, TQX_{r,i}(t) \Big), \quad r \in \{1, \ldots, 5\};   (1)

K_0 = \sum_{r=1}^{5} K_r,   (2)
where TQVAr(t) is the per capita GDP rate of country r, TQXr,i(t) is the per
capita rate of output of industry i in country r in year t, and
αr,i = 0.1 is a weighting coefficient.
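Criterion (1)–(2) is a plain weighted sum over years and can be computed directly; the trajectories below are illustrative, not the Model's output:

```python
ALPHA = 0.1   # the weighting coefficient alpha_{r,i} from the text

def K_r(tqva, tqx, alpha=ALPHA):
    """Criterion (1): over the years, the per-capita GDP rate plus the
    weighted per-capita output rates of the selected industries I_r."""
    return sum(tqva[t] + sum(alpha * x[t] for x in tqx.values())
               for t in tqva)

years = range(2016, 2024)
tqva = {t: 1.0 for t in years}                  # TQVA_r(t), illustrative
tqx = {"ming": {t: 2.0 for t in years},         # TQX_{r,i}(t), i in I_r
       "mind": {t: 3.0 for t in years}}
Kr = K_r(tqva, tqx)   # 8 years * (1.0 + 0.1 * (2.0 + 3.0)) ≈ 12.0
# Criterion (2): K_0 would sum K_r over the five countries.
```

The NLPEC solver mentioned below maximizes exactly this kind of criterion over the admissible control trajectories.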
The formulated problems SPr were solved numerically using the solver
NLPEC [22]. The results of solving these problems, in the form of increments
of the average value of GDP for 2016–2023 (in percent compared with the
baseline scenario), are presented in Table 3.
Table 3. The increments of the average value of the GDP of the Regions as a result
of solving problems SPr .
Problem Country
1 Kazakhstan 2 Russia 3 Belarus 4 Armenia 5 Kyrgyzstan
SP1 1.76 0.00 0.00 0.00 −0.01
SP2 −0.01 2.76 0.00 −0.06 −0.03
SP3 0.00 0.00 3.18 0.00 0.00
SP4 0.00 0.00 0.00 1.52 0.00
SP5 0.00 0.00 0.00 0.00 1.94
SP0 3.00 2.14 5.86 4.50 9.27
Table 4. The increments of the average value of the output of the industries of the
EAEU countries as a result of solving problems SPr.
Industry i Country r
1 Kazakhstan 2 Russia 3 Belarus 4 Armenia 5 Kyrgyzstan
1 ming 3.27 2.19 3.61 2.53 9.10
2 crog 2.52 0.98 0.00 0.73 0.00
3 mepe 1.56 0.03 0.80 0.63 4.25
4 mind 1.92 1.58 0.45 2.73 10.86
5 ehas 5.40 12.74 22.39 9.97 4.82
6 pegw 0.00 1.32 0.00 1.24 3.33
7 fpin 1.54 0.97 1.69 2.88 5.03
8 psta 1.93 5.09 11.99 7.90 0.00
9 otis 1.74 0.91 0.00 1.41 3.51
10 oths 0.00 0.00 0.00 0.25 5.61
11 agff 1.85 0.00 0.00 1.37 0.00
12 buil 1.94 2.16 2.71 2.82 12.96
13 mtal 0.00 4.67 0.86 3.01 2.55
14 fins 2.22 3.67 0.66 0.38 0.00
15 chpp 3.18 1.59 0.96 1.75 5.17
16 tser 11.35 0.00 0.00 1.50 6.02
Table 4 presents the increments of the average value of the output of the
industries of the EAEU countries for 2016–2023, obtained as a result of solving
the problems SPr, in percent compared to the baseline scenario. The values
for the promising industries are shown in bold.
Scenario variants of the Model for the obtained optimal values of the instruments
were tested by the three methods specified in Sect. 3. In all cases, the
calculation results demonstrated:
References
1. GLOBE 1, http://www.cgemod.org.uk/globe1.html. Accessed 09 Nov 2018
2. GTAP Models Home, https://www.gtap.agecon.purdue.edu/models/default.asp.
Accessed 09 Nov 2018
3. Gill, P., Murray, W., Wright, M.: Practical Optimization. Academic Press, London
(1981)
4. Golubitsky, M., Guillemin, V.: Stable Mappings and Their Singularities.
Springer-Verlag, New York (1973)
5. Arnold, V.I.: Geometrical Methods in the Theory of Ordinary Differential Equa-
tions. Springer-Verlag, New York (1988)
6. Ashimov, A., Adilov, Zh., Alshanov, R., Borovskiy, Yu., Sultanov, B.: The theory of
parametric control of macroeconomic systems and its applications (I). Adv. Syst.
Sci. Appl. 1(14), 1–21 (2014)
7. Orlov, A.I.: Econometrics. Ekzamen, Moscow (2002). [in Russian]
8. Abouharb, R., Duchesne, E.: World bank structural adjustment programs and their
impact on economic growth: a selection corrected analysis. In: The 4th Annual
Conference on the Political Economy of International Organizations (2011)
9. Zografakis, S., Sarris, A.: A multisectoral computable general equilibrium model
for the analysis of the distributional consequences of the economic crisis in Greece.
In: 14th Conference on Research on Economic Theory and Econometrics (2015)
10. Devarajan, S., Robinson, S.: The impact of computable general equilibrium models
on policy. In: Conference on Frontiers in Applied General Equilibrium Modeling
(2002)
11. Khan, H.A.: Using macroeconomic computable general equilibrium models for
assessing poverty impact of structural adjustment policies. In: ADB Institute Dis-
cussion Paper, vol. 12 (2004)
12. Shishido, S., Nakamura, O.: Induced technical progress and structural adjustment:
a multi-sectoral model approach to Japan’s growth alternatives. J. Appl. Input-
Output Anal. 1(1), 1–23 (1992)
13. Naastepad, C.W.M.: Effective supply failures and structural adjustment: a real-
financial model with reference to India. Camb. J. Econ. 26(5), 637–657 (2002)
14. Huang, H., Ju, J., Yue, V.Z.: A Unified model of structural adjustments and inter-
national trade: theory and evidence from China. In: Meeting Papers from Society
for Economic Dynamics, vol. 859 (2013)
15. GAMS Homepage, http://www.gams.com. Accessed 09 Nov 2018
16. Ferris, M., Munson, T.: PATH 4.7, http://www.gams.com/latest/docs/S_PATH.html.
Accessed 09 Nov 2018
17. GTAP Data Base Homepage, http://www.gtap.agecon.purdue.edu/databases/
default.asp. Accessed 09 Nov 2018
18. Ashimov, A., Borovskiy, Yu., Novikov, D., Sultanov, B.: Macroeconomic analysis
and parametrical control of the regional economic union. URSS, Moscow (2018).
[in Russian]
19. World Input-Output Database Homepage, http://www.wiod.org/home. Accessed
09 Nov 2018
20. World Integrated Trade Solution Homepage, http://wits.worldbank.org. Accessed
09 Nov 2018
21. World Economic Outlook Databases Homepage, http://www.imf.org/external/ns/
cs.aspx?id=28. Accessed 09 Nov 2018
22. NLPEC, https://www.gams.com/latest/docs/S_NLPEC.html. Accessed 09 Nov 2018
Research of the Relationship Between Business Processes in Production and Logistics Based on Local Models
1 Introduction
In all economic, industrial, and technological spheres, business processes are the main
objects, uniting everything relevant to achieving the goal. Manufacturing and logistics
processes are specific business processes: they focus on the routing of materials and
the allocation of work to resources [1]. A business process can be perceived as
production together with the factors of production [2].
Many existing models of business processes do not sufficiently reflect the properties
of a business process and the needs of the people involved in it. In other words,
such models are functionally incomplete.
Business process analysis is extremely important for production and logistics
systems, since it plays a vital role in successfully improving business processes.
The purpose of process analysis is to discover new knowledge to solve problems and
streamline processes to create key competencies. A large amount of research and
development has been carried out to optimize the performance of business processes in
this complex and dynamic environment [4–6]. For the analysis and optimization of
business processes in the field of production and logistics, several methodologies,
methods, and tools have been developed.
In this regard, it was argued in [1] that the basic business process model (GM or
MBP) is created by the composition of local models (LM). Thus, the properties of the
whole business process can be reflected and transmitted using combinations of local
models.
When organizing local models into a combination, not only the union of functions
is important, but also the organization of the structure: the nature of the relations
between models and the type of protocol (universal or unique). Equally important is
how integration takes place, in particular such features of the organization as the
technologies for integrating data, information, knowledge and rules, services or
agents, and the technologies of tools and interfaces.
Standard integration tools can be used for this purpose, for example ESB (enterprise
service bus), EAI (enterprise application integration), EII (enterprise information
integration), and ETL (extract, transform and load), i.e., software for extracting,
transforming, and loading data.
Thus, the basic model is a composition of local models. The organization of
local models within the basic model depends on the characteristics of the local
models and the environment, as well as on the characteristics of the business
processes and of the problem being solved in a business process.
The basic model has such properties as a single target, hierarchy or peer-to-peer
structure, multidimensionality, semantic operations, non-linearity, fractality, and
facetedness. These properties arise in the composition through integration (based on
the integration bus) and aggregation of local models into a common integrated one.
Before outlining the main idea of the work, we first introduce a number of
concepts.
Definition 1. In the model representation of a business process, it is a priori
assumed that the business process is an object of the external (real or virtual)
world, which performs the function and role of a labor tool, executing tasks that
are assigned to it or that come from a production plan (operational calendar or
schedule).
Definition 2. A complete business process is a process with all of its components,
such as:
• the schema logic (metamodel) of performing business process operations, built
from abstract classes;
• the business process infrastructure, including various types of work tools, among
them business process automation systems.
– During the execution of production tasks, the business process is used as an
instrument of labor that links the plan and operational management in the
production environment, i.e., the process of executing the plan. This shows that
the business process itself, as a method or technology of labor, must be observable
and manageable;
– the automation system belongs to the infrastructure of a business process as one
of its components.
The business process includes many objects or subjects, many special processes,
subjects and means of labor, and also the methodology, the technology, and the
executives responsible for the implementation of the business process. Thus, the
business process has a complex structure and composition, i.e., an architecture with
complex components.
Therefore, the presence of a model makes it possible to organize and accelerate the
construction both of the individual components of a business process and of the
planned business process as a whole.
The resulting model of a complex business process will allow one to:
– establish and disclose the composition, structure, and architecture of a complex
business process of the selected class;
– build optimal business process models;
– automate a complex business process;
– operate and manage a complex business process.
The peculiarity of the basic model of a complex business process is that it
represents the complex business process as a variety of special-purpose processes,
i.e., special processes, each of which is described by the conceptual, logical, and
procedural models of the basic model of the complex business process.
In addition, the model serves as the basis for all phases of the life cycle of a
business process and its automation system; that is, it should support the project
processes from the pre-project stage to the decommissioning (or inheritance) of both
the business process and the automation system.
Thus, the model is a supporting tool for creating a business process and its systems,
ensuring that the general requirements imposed on the business process are met [1].
864 R. Uskenbayeva et al.
Let us define the purpose and function of each individual local model as follows. A
business process is an object of the outside world. Any object of the external world
is characterized by a conceptual representation, i.e., the place this object occupies
in the “world of things” among other objects, a set of distinctive properties, and
the nature of its communication with other objects of the external world. Therefore,
the business process, as an object of the outside world, should be characterized by a
concept, i.e., a conceptual representation.
As is well known, the conceptual features of an object (here, a business process)
must be presented in the form of a special model: a conceptual model.
Research of the Relationship Between Business Processes 865
Note that the object is an element of a united information space (UIS); hence, the
conceptual model of the business process is an element of the UIS.
A business process, as an object of the outside world, must be represented by a
conceptual model (CM, or CMBP for the conceptual model of a business process).
It should be noted that an object is presented conceptually separately for each
purpose. The business process is intended for production and is a managed object.
Therefore, the CM of a business process is characterized by its mission, targets or
purpose, criterion, and input and output (result) data.
In addition, the composition of the input and output depends on the purpose for
which we build the CM of the BP. It should be noted that the CM is built to solve
the problem of integration. Therefore, the CM needs to ensure that the business
process of logistics is integrated with the business processes of other
organizations:
• at the top level, with the partners (suppliers and consumers of goods, machines,
and equipment) of the logistics processes;
• at the lower level, with the business processes of other, for example neighboring,
local problem areas.
Thus, our business process must be able to integrate with the business processes of
other organizations. Accordingly, the inputs and outputs of the CM at the level of
the logistics business process should be harmonized with the business processes of
other organizations [6]. For integration, the following data are needed:
• the internal structure of the logistics sector and of the local problem areas of
the region, their composition and capacity;
• the objects of labor: the sources and flows of goods, the types of goods;
• the means of labor: the means of transporting goods between warehouses and
customers, and of transporting goods within the warehouse;
• which outsourcing operations are available, etc.
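The data items listed above can be gathered into a simple structure; the field names below paraphrase the list and are purely illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class ConceptualModel:
    """Sketch of the CM of a logistics business process as a UIS element."""
    purpose: str                                         # mission / target
    criterion: str                                       # evaluation criterion
    inputs: dict = field(default_factory=dict)           # harmonized input data
    outputs: dict = field(default_factory=dict)          # harmonized result data
    goods_types: list = field(default_factory=list)      # objects of labor
    transport_means: list = field(default_factory=list)  # means of labor
    outsourced_ops: list = field(default_factory=list)   # available outsourcing

cm = ConceptualModel(purpose="warehouse logistics", criterion="throughput")
cm.goods_types.append("pallets")
```

Harmonizing the `inputs` and `outputs` fields across organizations is what makes such a CM usable as an integration element of the UIS.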
The field of logistics consists of two levels: the general problem area of
logistics and the local problem areas that constitute it, each with its own
environment. The j-th local problem area specifies the following information:
(1) a list of the specialized processes included in the created business process of
the local problem area;
(2) the number and types of specialized processes, depending on the problem being
solved and on the characteristics of the business process itself and of its
specialized processes;
(3) data on the specialized processes of the given business process in the local
problem area;
(4) the metamodel (description) of the integration of the specialized processes
within the business process for a specific purpose within the problem area;
(5) the input and output data characterizing this business process as an element of
the second-level united process information space (UIS).
The model is designed to automate business processes. Therefore, we construct the
model for classes of business processes that are observable and controllable.
This is achieved by introducing, as the first phase, the strategic process, which is
the beginning of the management process. Therefore, at the managerial level of the
specialized process, we introduce the strategic process, from which execution must
begin.
It is generally accepted that a business process model is represented in one of two
ways: either “as is” or “as it should be.” In order to present the business process
“as it should be” (that is, observable and manageable), it is necessary to carry the
business process from the “as is” model representation into the “as it should be”
model representation.
This is achieved by reengineering. Business process reengineering (BPR) is defined
as a “fundamental rethinking process for all business metrics such as cost, speed,
quality, and service.” Whether or not every BPR effort succeeds, the concept has
underpinned some of the most successful improvement programs (e.g., 70%) [7, 8].
Business process reengineering must be carried out for a specific purpose: for
automation, for monitoring, and for managing the business process. As a result, the
organization needs methods and the integration of knowledge management models in
order to understand the environment, which includes processes, people, employees,
customers, and tools.
A managed version of a business process is obtained by introducing a control loop
that performs a number of coordinating and controlling functions. These functions
are realized by processes consisting of operators, for example (consisting of
operations): strategic (decision) processes, logical-operator processes, and service
processes (the strategic model, logical model, operator model, and service model;
C = {cijk}, k = 1, …, Kij), and this holds for each specialized BP process:
analytics, administration, organization, management, technological processes,
provision of resources, and services in the local problem area, e.g., B = {bij},
j = 1, …, Ji. Manageability is also provided by the introduction of additional
functions, which is achieved by services.
Processes or operators serve as controls, and the means (control actions) of
management are the specialized processes and the additionally introduced operations.
Such a model is built for each local area isolated from the logistics process:
receiving goods, storing and picking goods, shipping, receiving at a temporary
storage warehouse, etc.: A = {ai}, i = 1, …, I.
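The indexing A = {ai}, B = {bij}, C = {cijk} can be pictured as nested dictionaries; the local areas, process types, and operation names below are invented for the illustration:

```python
# A sketch of the paper's indexing: local areas (a_i) contain specialized
# processes (b_ij), which contain operations (c_ijk). Names illustrative.
local_areas = {
    "receiving": {                                   # a_1
        "technological": ["unload", "inspect"],      # b_11 -> c_111, c_112
        "administration": ["register"],              # b_12 -> c_121
    },
    "storage_picking": {                             # a_2
        "technological": ["put_away", "pick"],       # b_21 -> c_211, c_212
    },
}

def operations(areas):
    """Flatten C = {c_ijk}: every operation with its (area, process) path."""
    return [(a, b, c)
            for a, procs in areas.items()
            for b, ops in procs.items()
            for c in ops]

ops = operations(local_areas)
```

The flattened triples make it easy for a control loop to address any operation by its (area, specialized process, operation) coordinates.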
Thus, the process of reengineering can be explained as follows: the business process
diagram “as it should be” is produced to increase productivity, to increase
efficiency, and to provide manageability. This means that it is necessary to set
aside all existing structures and related procedures, to invent new ways of carrying
out the work, and to complete it in record time. Reengineering is a renewal of the
business process that starts from assumptions and does not take anything for granted.
Therefore, in the business process model (i.e., in the base model) we introduce
additional operations that are created from outside by the model developers
themselves, i.e., that are not present in the BP itself. These are operations of
strategies, logic, operators, and services.
Thus, the manageability of a business process is gained by introducing a control
loop, and first of all its strategic process, which is the beginning of the control
process. Therefore, to give the BP a managed character, we introduce the concept of
a strategic process.
• SP(t) - the production situation that arises before the execution of the business
process;
• Z(t) - the goal or purpose of the business process;
• Jb(t) - the task at the current time;
• Sl(t) - the subject of labor at the current time;
• EP(t) - the factors and objects of the external environment that have a direct
impact on the implementation of the business process;
• BP(t) - the state of the business process, characterized by the values of the
business process indicators.
To make strategic decisions, production situations are divided into classes of
situations.
If the current production situation satisfies SP(t) ∈ SP1, then the necessary list
of specialized processes is selected (the necessary list of types of active
specialized processes needed to perform the specified task by this business
process), together with the priorities of each of them. In the current production
situation SP(t) ∈ SP2, the k-th variant of the specialized process is selected.
If SP(t) ∈ SP3, then for the selected (k-th) variant of a specialized process, the
set of operations is determined, together with the meta scheme for performing the
sequence of operations, in which the allowed combinations of the sequence of
operations (Oph → Opk) reflect the current situation. An admissible combination of
operations is established on the basis of the semantics of a relation, which is
determined from the ontological model. The expression (Oph → Opk) has the following
meaning: the sequence is a valid combination of operations, where Oph is the h-th
class of operations, Opk is the k-th class of operations, and “→” is the sequence
operation. This model plays the role of a scheduler that plans the execution of the
business process for an upcoming order.
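The admissibility check on adjacent operation classes can be sketched as a lookup into a set of allowed pairs; in the paper these pairs come from the semantics of relations in an ontological model, whereas here they are hard-coded for illustration:

```python
# Allowed (Op_h -> Op_k) combinations; in the paper these are derived from
# the ontological model, here they are invented for the sketch.
ALLOWED = {("receive", "inspect"), ("inspect", "store"), ("store", "pick")}

def is_admissible(sequence):
    """A sequence of operation classes is admissible if every adjacent
    pair (Op_h, Op_k) is an allowed combination."""
    return all(pair in ALLOWED for pair in zip(sequence, sequence[1:]))
```

A scheduler built on such a check can enumerate only those operation orderings that the ontological relations permit for the current situation.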
Let us consider the purpose and operating principles of the logical model (the
decision-making model) of the j-th individual specialized process from the stack of
specialized processes of the business process of the local problem area, j = 1, …, J.
The strategic model determines when and in what sequence processes are applied and
executed; all of this is a process of organization and management. The purpose of
the logical model is to define the sequence of business operations of each special
process. Each business operation consists of two parts: an operator and a procedure.
Therefore, the logical model contains two levels.
Research of the Relationship Between Business Processes 869
At the top level, operators are selected from a single symbolic representation of a specialized process taken from the stack of specialized processes of the business process of the local problem area (for example, a technological, organizational, or resource-providing process), and these operators are then assembled from operations. The choice is made on the basis of the current situation (i.e., given the initial situation of the problem area).
In other words, this level of the logical model is an operator model of a specialized process of a business process of a given local problem area.
For two production situations, the composition and sequence of operations within a single logical model may differ. For example, in the initial situations Stek(i) ∈ Sst and Stek(j) ∈ Sst, it has, in particular, the following form.
In a production situation Stek(i) ∈ Sst:
where
Pri, Prj are specialized processes of business process BP;
mi, mj are the numbers of business operations in Pri and Prj, respectively; in the general case mi ≠ mj;
Opit ∈ Pri, Opjk ∈ Prj are operations in the specialized processes SPpi, SPpj of business process BP;
Opit(Stek(i)) (or [Opit(Stek(i))] ∈ SPpi) and Opjk(Stek(j)) (or [Opjk(Stek(j))] ∈ SPpj) are the operations Opit and Opjk performed in the situations Stek(i) and Stek(j), where Stek(i) ∈ Sst and Stek(j) ∈ Sst.
The lower level performs the selection of procedures. Several procedures correspond to each operator; therefore, based on the current situation, one of the procedures is selected for each operator.
Thus, the logical model of the i-th specialized process, which is chosen (by the strategy) of the strategic model, sets a complete list of operations and their sequence of execution based on the admissibility requirement.
5 Conclusion
The authors of this work represent a business process as a formalized process in which all types of resources, performers, and owners of all types of processes necessary to achieve the ultimate goal of the process are defined.
Each type of provision is achieved by separate processes, which are called specialized processes of the business process. Each specialized process is modeled by a separate model. To make the business process manageable, a control loop model is introduced, consisting of:
870 R. Uskenbayeva et al.
• a strategic model that will ensure the adoption and implementation of strategic
decisions on the order of implementation of specialized processes,
• a logical model that determines the sequence of execution of operators of spe-
cialized processes after strategic decision-making and its implementation,
• a service model, which is defined by the control functions in the form of services.
Acknowledgments. This work was supported by the Ministry of Education and Science of the Republic of Kazakhstan (Grant No. 0118PК01084, Digital transformation platform of National economy business processes BR05236517).
References
1. Van der Aalst, W.M.P.: On the automatic generation of workflow processes based on product structures. Comput. Ind. 39(2), 97–111 (1999)
2. Völkner, P., Werners, B.: A decision support system for business process planning. Eur. J. Oper. Res. 125(3), 633–647 (2000)
3. Preparation of Papers in a Two-Column Format for the 2018. In: 18th International Conference on Control, Automation and Systems (ICCAS 2018)
4. Zhang, Y., Feng, S.C., Wang, X., Tian, W., Wu, R.: Object-oriented manufacturing resource modelling for adaptive process planning. Int. J. Prod. Res. 37(18), 4179–4195 (1999)
5. Zhang, F., Zhang, Y.F., Nee, A.Y.C.: Using genetic algorithms in process planning for job shop machining. IEEE Trans. Evol. Comput. 1(4), 278–289 (1997)
6. Duisebekova, K., Serbin, V., Ukubasova, G., Kebekpayeva, Z., Aigul, S., Rakhmetulayeva, S., Shaikhanova, A., Duisebekov, T., Kozhamzharova, D.: Design and development of automation system of business processes in educational activity. J. Eng. Appl. Sci. 8, 4702–4714 (2017) (ISSN: 86-949X, Medwell Journals)
7. Dabbas, R.M., Chen, H.-N.: Mining semiconductor manufacturing data for productivity improvement: an integrated relational database approach. Comput. Ind. 45(1), 29–44 (2001)
8. Musa, M.A., Othman, M.S., Al-Rahimi, W.M.: Ontology driven knowledge map for enhancing business process reengineering. J. Comput. Sci. Eng. 3(6), 11 (2013) (Academy & Industry Research Collaboration Center (AIRCC))
9. Lila, R., Gunjan, M., Kweku-Muata, O.-B.: Building ontology based knowledge maps to assist business process re-engineering. J. Decis. Support Syst. 52(3), 577–589 (2012)
Sparsity and Performance Enhanced
Markowitz Portfolios Using Second-Order
Cone Programming
Note that this formulation can be rewritten as a smooth one by substituting for
each variable the difference of two nonnegative variables and then replacing ||·||1
in the objective by the sum of these variables. Another commonly used penalty
is the L2 -norm. The use of an L2 -norm based penalty appears to stabilize the
inverse covariance matrix which is often ill-conditioned and also it is found to
reduce the estimation error of the covariance matrix [8]. We will consider a
discrete, also known as an “L0 -norm” constraint, where the quantity ||x||0 =
|{i ∈ N | xi = 0 }| is bounded by some given positive integer, together with an
L2 -norm penalty in the objective of our portfolio optimization formulations.
Lobo et al. [9] consider the portfolio selection with transaction costs. For the
case of fixed fee transaction costs they suggest using mixed-integer program-
ming. Additional motivations for using integer based constraints include buy-in
thresholds, diversification and round-lot constraints [2].
Closely related to Markowitz’s model is the Sharpe ratio which incorporates
both the portfolio’s expected return and its risk (as indicated by the standard
deviation of its returns rather than variance) into a single measure of perfor-
mance [12]. If μf is the risk-free return, then the Sharpe ratio of a portfolio x is
defined as (r(x) − μf)/√V(x), which attempts to measure how well the return of an asset compensates for its associated risk. Portfolio optimization models are also related to asset
pricing theory and factor models. In particular, the capital asset pricing model
(CAPM) lays the foundations for factor modeling with the market being the sole
factor. Fama and French [5] extend this notion and show that more than 90%
of a diversified portfolio’s return variance can be explained by the three-factor
model, consisting of the size factor (“small minus big”), the value factor (“high
minus low”) and the market factor.
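As an illustration of the Sharpe ratio definition above (a minimal sketch, not code from the paper; the return values are hypothetical), using the standard deviation of returns as the risk measure:

```python
import statistics

def sharpe_ratio(returns, risk_free=0.0):
    """Sharpe ratio: (mean return - risk-free return) / std of returns."""
    excess = statistics.mean(returns) - risk_free
    return excess / statistics.pstdev(returns)

# A series with mean return 2% and volatility 1% has Sharpe ratio 2
# when the risk-free rate is 0.
print(round(sharpe_ratio([0.01, 0.03, 0.01, 0.03]), 2))  # 2.0
```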
Note that if short-sales are disallowed then x ≥ 0, and it would suffice to set M = 1 without excluding any of the optimal solutions. In general, an upper bound M ≥ maxi∈N |x∗i| over all optimal solutions x∗ may need to be much larger. The magnitude of M (relative to the smallest component of x∗) directly affects the quality of the continuous relaxation of (2). We consider a nonlinear reformulation that replaces the big-M constraints (2e) using additional variables ui for each i ∈ N. The following formulation, equivalent to (2) (with respect to the set of optimal solutions), is based on the perspective reformulation technique described in [1,6,7]:
Z∗ = min over x, u ∈ R^n, z ∈ {0,1}^n of V(x) + q Σ_{i=1}^n u_i    (3a)
subject to (2b)−(2d)    (3b)
x_i² ≤ u_i z_i,  i ∈ N.    (3c)
Note that in every optimal solution (x̂, û, ẑ) of (3), if ẑ_i = 1 then x̂_i² = û_i; otherwise x̂_i² = û_i = ẑ_i = 0. Also, this formulation is tighter than (2), as for each i ∈ N it replaces the upper bounding constant M by a presumably smaller nonnegative variable u_i that is minimized in the objective function. The constraints (3c) are nonconvex, but the set of solutions that satisfy them corresponds to a (convex) rotated second-order cone, which can be reformulated as a second-order cone (SOC) constraint (see [7] for more details).
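As an illustration (a hypothetical helper, not from the paper), a small feasibility check for the perspective constraints (3c):

```python
def perspective_feasible(x, u, z, tol=1e-9):
    """Check the perspective constraints x_i^2 <= u_i * z_i with z_i in {0,1}.

    With z_i = 0 the constraint forces x_i = 0; with z_i = 1 it reduces to
    the rotated second-order cone condition x_i^2 <= u_i."""
    return all(
        zi in (0, 1) and ui >= 0 and xi * xi <= ui * zi + tol
        for xi, ui, zi in zip(x, u, z)
    )

print(perspective_feasible([0.5, 0.0], [0.25, 0.0], [1, 0]))  # True
print(perspective_feasible([0.5, 0.1], [0.25, 0.0], [1, 0]))  # False: z_2 = 0 forces x_2 = 0
```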
We now suggest a scheme to further tighten the formulation (3) when the minimum eigenvalue λmin of C satisfies λmin > 0 (i.e. C is positive definite). Given λmin > 0, consider the tightened formulation

Ẑ∗ ≡ min over x, u, z of xᵀ(C − λmin I)x + (q + λmin) Σ_{i=1}^n u_i    (4a)
subject to (3b)−(3c).    (4b)
874 N. Goldberg and I. Zagdoun
Proposition 1. The optimal value of (3) equals the optimal value of (4), that
is Z ∗ = Ẑ ∗ .
Proof. Let (x∗, u∗, z∗) be an optimal solution of (3), and (x̂, û, ẑ) an optimal solution of (4). For each i ∈ N note that x̂_i² = û_i: otherwise, if (x̂, û, ẑ) is optimal to (4) with x̂_i² < û_i for some i ∈ N (by feasibility (3c), x̂_i² ≤ û_i), then for 0 < ε < û_i − x̂_i² define ū_j = û_j − ε for j = i and ū_j = û_j otherwise, and observe that (x̂, ū, ẑ) is feasible (because u_i appears only in the constraint (3c), which is inactive for this i, and since ẑ_i ∈ {0, 1}, x̂_i² < ū_i and accordingly x̂_i² ≤ ẑ_i ū_i). Further,

x̂ᵀ(C − λmin I)x̂ + (q + λmin) Σ_{i=1}^n ū_i < x̂ᵀ(C − λmin I)x̂ + (q + λmin) Σ_{i=1}^n û_i,

thereby establishing a contradiction. Hence, (x̂, û, ẑ) must satisfy x̂_i² = û_i for all i ∈ N. Since (x̂, û, ẑ) is feasible for (3), it follows by optimality of (x∗, u∗, z∗) to (3) that Z∗ = x∗ᵀCx∗ + q Σ_{i=1}^n u∗_i ≤ x̂ᵀCx̂ + q Σ_{i=1}^n û_i = Ẑ∗ (the last equality uses x̂_i² = û_i). On the other hand, by applying a similar argument to formulation (3), the optimal solution (x∗, u∗, z∗) of (3) satisfies x∗_i² = u∗_i for all i ∈ N. Thus, since (x∗, u∗, z∗) is feasible to (4), by optimality of (x̂, û, ẑ) to (4),

x∗ᵀ(C − λmin I)x∗ + (q + λmin) Σ_{i=1}^n u∗_i ≥ x̂ᵀ(C − λmin I)x̂ + (q + λmin) Σ_{i=1}^n û_i = Ẑ∗,

and since x∗_i² = u∗_i for all i ∈ N, the left-hand side equals Z∗; hence Z∗ ≥ Ẑ∗.
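A quick numeric sanity check of the key identity in this proof (the 2×2 matrix and all values are hypothetical, not from the paper): when u_i = x_i², the objectives of (3) and (4) coincide, since xᵀCx + q Σ u_i = xᵀ(C − λmin I)x + (q + λmin) Σ u_i.

```python
# Hypothetical 2x2 positive definite covariance matrix C with
# eigenvalues 1 and 3, so lambda_min = 1.
C = [[2.0, 1.0], [1.0, 2.0]]
lam_min = 1.0
q = 0.1
x = [0.7, 0.3]
u = [xi * xi for xi in x]  # at an optimum of (3)/(4), u_i = x_i^2

def quad(M, v):
    """v^T M v for a 2x2 matrix."""
    return sum(v[i] * M[i][j] * v[j] for i in range(2) for j in range(2))

obj3 = quad(C, x) + q * sum(u)                    # objective of (3)
C_shift = [[C[i][j] - lam_min * (i == j) for j in range(2)] for i in range(2)]
obj4 = quad(C_shift, x) + (q + lam_min) * sum(u)  # objective of (4)
print(abs(obj3 - obj4) < 1e-9)  # True: the two objectives coincide
```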
Table 1 details the properties of the datasets used in our experiments. The results for computational running times are based on a monthly dataset of S&P 500 stocks. In this dataset, to handle missing data, we removed rows and then columns with more than 50% and 4% of missing values, respectively. Remaining missing values were replaced by the most recent available values. After handling missing values in this manner, the resulting dataset has 221 rows and 359 columns (dataset SP500 221). We created two additional smaller datasets based on the S&P 500 using the first 50 and 60 assets of SP500 221 (datasets SP500 50 and SP500 60, respectively). The predictive performance results were obtained on the other weekly stock index data listed in the table. In addition, we used a recent update of the monthly data of FF48 and FF100 of Fama and French [5]. An earlier version of this data was also used by Brodie et al. [3].
Table 2. CPU seconds and B&B nodes for the mixed-integer formulations on the SP500 50 data. A dark gray background indicates that the run is irrelevant (as k > n).
Table 2 displays computational running times of formulations (2), (3) and (4).
The table compares the computational performance running times and branch-
and-bound (B&B) nodes with an optimality gap of 1%. It is evident that, other
than for small k, formulation (4) is solved with the least B&B nodes and in
most cases the least CPU time. Further, for intermediate values of k formula-
tion (4) solves the given instances within a reasonable running time, while for-
mulations (2) and (3) cannot be solved within the time limit of 2 hours. Table 3
displays computational running times and number of B&B nodes for our choice
of formulation (4) on several datasets. Here it is demonstrated that our chosen
formulation (4) effectively solves the cardinality-constrained integer problem for
real moderately sized financial data.
In order to handle larger datasets we also consider the continuous relaxation
of our integer formulations. Table 4 displays the optimal objective values of the
continuous relaxations of (2), (3) and (4) compared with the optimal integer solu-
tion. Evidently, the continuous relaxation of (4) has consistently larger objective
values, demonstrating that it is indeed a tighter continuous relaxation of the discrete problem. Meanwhile, the continuous relaxation of (2) has an optimal objective value that does not significantly exceed that of the non-cardinality-constrained problem (with k = n) and thus does not appear to provide a useful, sufficiently tight, continuous relaxation. Note that the results of this table motivate our choice of a continuous relaxation and also explain the difference in computational performance of solving the corresponding mixed-integer problems, as shown in Table 2.
Table 3. CPU seconds and B&B nodes for solving formulation (4) on the indicated
datasets. LIMIT is indicated for runs reaching the time limit of two hours.
Table 4. Optimal objective value of the continuous relaxations vs. the optimal integer
solution on the SP500 221 data.
An outer partition of the data in a rolling window fashion is used for model
comparison. (An inner partition is used for some of the methods for fine tuning
parameter values.) In our experiments, each (outer) window consists of a training
set sized ttrain and a test set corresponding to the next ttest trading days. Given a
total of tall observed days, then the data is split into (tall − ttrain )/ttest disjoint
test sets.
In our experiments each training set size is set equal to 20% of the entire dataset, specifically ttrain = ⌊tall/5⌋. The test set amounts to approximately 5% of the data, resulting from setting ttest = ⌊0.25 ttrain⌋. Consequently, for each of the datasets that we experiment with there are 16 test time windows. In the implementation of CCM-R we had 5 inner time windows with tparm-train = ⌊ttrain/2⌋ and tvalid = ⌊ttrain/5⌋. Also, numerically, the TSE data necessitated adding to C a small positive diagonal with entries approximately equal to 7 × 10−7 to make it positive definite.
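The outer rolling-window split described above can be sketched as follows (a minimal sketch; the function name and the value of tall are hypothetical). With the paper's settings, (tall − ttrain)/ttest = (0.8 tall)/(0.05 tall) = 16 test windows:

```python
def rolling_windows(t_all):
    """Outer rolling-window split: train on t_train days, test on the
    next t_test days, then slide the window forward by t_test."""
    t_train = t_all // 5   # 20% of the data
    t_test = t_train // 4  # ~5% of the data
    windows = []
    start = 0
    while start + t_train + t_test <= t_all:
        train = range(start, start + t_train)
        test = range(start + t_train, start + t_train + t_test)
        windows.append((train, test))
        start += t_test
    return windows

# With t_all = 1000: t_train = 200, t_test = 50,
# giving (1000 - 200) / 50 = 16 disjoint test windows.
print(len(rolling_windows(1000)))  # 16
```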
Predictive Performance Comparison for Unspecified Cardinality. The methods designed for this setting include our learning method CCM-R, which determines the parameter k in the continuous relaxation of (4) by additional validation experiments; we compare it with Brodie et al.'s method, Markowitz, and a naive equal-weight portfolio. We also compare it with an additional benchmark corresponding to (4) with k = n, essentially amounting to a Markowitz formulation with an additional L2 penalty term in the objective. This method is referred to as L2-Markowitz. The q penalty parameter was set to the same fixed value as in CCM-R. The predictive performance comparison was evaluated on the datasets SP500, TSE, LSE and TLV100. The choice of the parameter value k is performed in the inner time windows of CCM-R out of the set of candidate values K = {n/100, n/4, n/2, 3n/4, n}.
Brodie et al.’s method was implemented by binary search for the minimum
value of ζ such that all of the xi ’s are nonnegative (no-short positions). This
setting disallows short positions and is similar to the experimental setup in [3]
when the portfolio cardinality is unspecified. In order to facilitate a meaningful
comparison we impose the corresponding nonnegativity constraint x ≥ 0 in the
Markowitz and CCM-R (the continuous relaxation of (4)) formulations.
Table 5 shows the average test Sharpe ratios for the five methods on each of the four datasets. The results of the table show that over all of the datasets CCM-R performs better than Markowitz and the naive method in terms of the average test Sharpe ratio. It performs better than Brodie et al.'s method [3] in nearly all cases. When relaxing the cardinality constraint of the CCM-R formulation (4), the resulting L2-Markowitz method performs best in nearly all cases, but CCM-R is a very close second when it is not best.
In addition, we checked the cardinality of the optimal portfolio vectors for each formulation. Table 6 shows the average and standard deviation of the number of assets held, determined as the number of components whose absolute value is greater than 10−4. It is apparent that the portfolios constructed by our method involve more assets than those of Brodie et al. and Markowitz. However, compared with L2-Markowitz, on average CCM-R portfolios consistently hold fewer assets over all of the datasets. Overall it appears that CCM-R strikes a sensible balance between the performance of L2-regularized Markowitz and sparsity such as that attained by Brodie et al.'s formulation. The CCM-R results with relatively dense portfolios are consistent with the fact that the choice of k in CCM-R is based on the best Sharpe ratio performance in validation experiments.
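The asset count used in Table 6 can be sketched as follows (a hypothetical helper mirroring the 10⁻⁴ threshold; the portfolio vector is made up):

```python
def assets_held(x, tol=1e-4):
    """Number of assets held: components whose absolute value exceeds tol."""
    return sum(1 for xi in x if abs(xi) > tol)

# The third component (5e-5) falls below the threshold and is not counted.
print(assets_held([0.5, 0.3, 0.00005, -0.2, 0.0]))  # 3
```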
Table 5. The average test Sharpe ratio in the unspecified cardinality (k) setting.
Table 6. The average number of assets held in the unspecified cardinality (k) setting.
Fig. 1. A set of figures displaying the Sharpe ratio vs. k for Brodie et al.’s method,
CCM and CCM-UNRM on the indicated datasets.
6 Conclusion
We show that our MISOCP formulation enables the solution, and faster solutions, of realistically sized problems using standard solvers. CCM appeared as
an effective learning method for a small to moderate number of potential assets,
which handles a specified hard cardinality constraint. It appeared to result in
better Sharpe ratios for each given cardinality compared with Brodie et al.’s
L1 -regularized method. Our basic CCM model also provides further evidence
in support of Fama and French’s factor models by showing that the factors
are deployed by optimal portfolios under a sparsity requirement. Our CCM-R
method is designed for large financial datasets by using instead a tight continu-
ous relaxation of the integer formulation. In our experiments, it appears to strike
a sensible balance between the best performing dense L2 -regularized Markowitz
models and the sparsity of Brodie et al.’s method.
References
1. Aktürk, M.S., Atamtürk, A., Gürel, S.: A strong conic quadratic reformulation for machine-job assignment with controllable processing times. Oper. Res. Lett. 37(3), 187–191 (2009)
2. Bonami, P., Lejeune, M.: An exact solution approach for portfolio optimization
problems under stochastic and integer constraints. Oper. Res. 57(3), 650–670
(2009)
3. Brodie, J., Daubechies, I., Mol, C.D., Giannone, D., Loris, I.: Sparse and stable
Markowitz portfolios. Proc. Natl. Acad. Sci. 106(30), 12267–12272 (2009). https://
doi.org/10.1073/pnas.0904287106
4. DeMiguel, V., Garlappi, L., Uppal, R.: Optimal versus naive diversification: how
inefficient is the 1/n portfolio strategy? Rev. Financ. Stud. 5, 1915–1953 (2009)
5. Fama, E.F., French, K.R.: The cross-section of expected stock returns. J. Financ.
2, 427–465 (1992)
6. Goldberg, N., Leyffer, S., Munson, T.: A new perspective on convex relaxations of
sparse SVM. In: Proceedings of the 2013 SIAM International Conference on Data
Mining, pp. 450–457 (2013). https://doi.org/10.1137/1.9781611972832.50
7. Günlük, O., Linderoth, J.: Perspective reformulations of mixed integer nonlinear
programs with indicator variables. Math. Program. 124, 183–205 (2010)
8. Li, J.: Sparse and stable portfolio selection with parameter uncertainty. J. Bus. Econ. Stat. 33(3), 381–392 (2015)
9. Lobo, M.S., Fazel, M., Boyd, S.: Portfolio optimization with linear and fixed trans-
action costs. Ann. Oper. Res. 152(1), 341 (2007). https://doi.org/10.1007/s10479-
006-0145-1
10. Markowitz, H.: Portfolio selection. J. Financ. 7(1), 77–91 (1952). https://doi.org/
10.1111/j.1540-6261.1952.tb01525.x
11. Gurobi Optimization, Inc.: Gurobi optimizer reference manual. http://www.gurobi.com (2014)
12. Sharpe, W.F.: Mutual fund performance. J. Bus. 39(1), 119–138 (1966). Supple-
ment on Security Prices
Managing Business Process Based
on the Tonality of the Output Information
1 Introduction
part of a business process [3], which is constantly looking for a promising mode for
executing a business process after the next business process cycle.
This paper proposes one of the possible options for improving the quality of the business process; in particular, it aims in this way to improve the output or to make it more adequate to the current needs of the population.
Let enterprises produce m types of goods: (a1, a2, a3, a4, a5, …, am). And let the initial preferences at time t = to look like this:
where μi,to is the preference coefficient (weight) of the ai-th product at time t = to.
Then, as a result of studying the opinion of the population, the preference coefficient at time t = ti may become different:
where μi,ti is the preference coefficient (weight) of the ai-th product at the moment of time t = ti.
Such work is necessary in marketing research for product planning, for example, for the types of loans issued by banks. Each type of loan is a product. Then it will be necessary to establish the most preferred and the less preferred types of loan among the population.
It should be noted that the most adequate, reliable and accurate business process management is achieved if products are managed using feedback. As one type of feedback, one can take tonal data on the evaluation of attractiveness to consumers, where product quality is the tonal data evaluated by the consumers of this business process.
Tonal expressions can be varied. In this paper, we consider only two types of
expression of the tonality of the output products of a business process:
• absolute tonality;
• comparative or relative tonality.
As a subject area of research we take the banking sector of the economy. Let there be
reviews on the work of each bank. We will analyze the reviews. Tonal data are reviews
that are presented in the form of texts. An example of absolute tonality:
• The “small business” loan of Kazkom Bank is very good.
• Small business is well credited by Kazkombank.
• Kazkom always credits small business well.
• Kazkombank’s lending to a small business is good.
884 R. Uskenbayeva et al.
The objects of the outside world are not always evaluated in the absolute scale of
measurement, and some reviews are given in the form of a comparative assessment.
An example of relative tonality:
• “Kazkommertsbank services much better than Halyk Bank”;
• “The service in the national bank is not worse than in Kazkommertsbank, and it
serves much better than the Caspian Bank”;
• “The conditions of micro-crediting in Centercreditbank are more attractive than those of ATF Bank”;
• “In Kazkom, lending to small businesses is better than microcredit”;
• “Lending to the agricultural business is better than lending to commercial transactions”, etc.
In such cases, a relative measurement scale is used.
There are two options for assessing the tonality of the quality of products and quality of
service of the business process. The feedback that customers leave on the Internet
reflects the tone of one of these two aspects or both of the aspects of the same business
process.
In addition, customer reviews can contain two types of tonality: absolute and
relative.
Thus, reviews may contain two kinds of tonality, absolute and relative, which reflect the following aspects of the business process: the quality of output products and the quality of servicing customers by the bank.
All these types of tonality in the reviews are identified by keywords (so far without semantics).
We will make a difference between absolute and relative tonality assessments, since
their processing algorithms are different. That is, reviews characterizing absolute and
relative tonality are processed by different algorithms. Here, the tonality of the relative
reviews is first translated into absolute; further processing is also carried out as absolute
reviews [4].
It is assumed that each review reflects a single tone value or characteristic of the
object being evaluated.
The source of feedback is the Internet. How the reviews are recorded does not interest us; for us they are given. We calculate the tonality based on the results of each cycle or for a given period of time Δs, for example, per day. It is also possible to perform a partial analysis after each time period Δs, and then a full tonality analysis after time k·Δs, for example, per week.
Consider the algorithm for processing reviews (or tonality). First, consider the algorithm for processing all types of reviews.
1. Collection of tonal data for a certain period by selecting from various types of data
on the Internet based on keywords (without semantics) or from specialized sites.
It is assumed that in the task of analyzing the tonality each review {Otji} carries a
certain tonality about the product ai, which is denoted here by xi.
Among them, we choose the simplest expression for calculating the tonality, using the formula F(·) = X̄ = (1/n) Σ_{i=1}^n x_i.
Thus, if the assessment of the tonality comes from 11 intervals, then we write it in the form:

F(·) = X̄ = (1/n) Σ_{i=1}^n x_i = Tn(tn) = (1/11)(K(d−5) + K(d−4) + K(d−3) + K(d−2) + K(d−1) + K(d0) + K(d1) + K(d2) + K(d3) + K(d4) + K(d5)),
or
where Tn(tn) is the tonality value at the current time tn, K(di) is the number of reviews with the score di (the number of reviews that received the di rating), where di ∈ {−5, −4, −3, −2, −1, 0, 1, 2, 3, 4, 5}, and ν(ai)absolute is the absolute tonality of the product ai.
The weight (or significance) γ(di) of a score di (for example, of the estimate di = 4) in ν(ai)absolute we calculate in this way:
where K(di) is the number of reviews with the score di, m is the total number of reviews over all tonality points, and γ(di) is the weight of the point di (for example, di = 4) as part of the collected reviews in the amount of m.
Thus, if one i-th point has a lot of reviews K(di), then the weight of this score γ(di) will be higher.
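Assuming the elided weight formula is γ(di) = K(di)/m (an assumption consistent with the statement that scores with more reviews get higher weight; the review scores below are hypothetical), the absolute tonality computation can be sketched as:

```python
from collections import Counter

SCORES = [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5]  # the 11 tonality intervals

def absolute_tonality(review_scores):
    """Score weights gamma(d_i) = K(d_i) / m (assumed form: the paper
    elides the exact formula) and the resulting score-weighted tonality."""
    counts = Counter(review_scores)  # K(d_i): number of reviews per score
    m = len(review_scores)           # total number of reviews
    gamma = {d: counts[d] / m for d in SCORES}
    return sum(d * gamma[d] for d in SCORES)

# Three reviews scoring 4 and one scoring -2:
# gamma(4) = 0.75, gamma(-2) = 0.25, tonality = 3.0 - 0.5 = 2.5
print(absolute_tonality([4, 4, 4, -2]))  # 2.5
```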
Relative tonality
The calculation (assessment) of the tonality of one portion of reviews, selected from the general list or over a specified period of time Δs, proceeds as follows.
The relative preference can be set as follows:
(1) “object i is better than object j”, i.e. i > j;
(2) “object i is worse than object j”, i.e. i < j;
(3) “object i is no better than object j”, i.e. i ≤ j;
(4) “object i is not worse than object j”, i.e. i ≥ j;
(5) “object i is equal to object j”, i.e. i = j.
The above relations are transformed into the following convenient form:
Relative tonality
where ν(ai)total is the total overall assessment of tonality, ν(ai)absolute is the tonality by absolute estimate, and ν(ai)relative is the tonality by relative assessment.
μi^corr(tn) = μi^corr(tn+1) or μi^corr(tn) < μi^corr(tn+1), where μi^corr(tn) is the value of μi^corr at time tn, and μi^corr(tn+1) is the value of μi^corr at time tn+1.
The computed tonality of the output of the business process and the level of customer
service of the organization is used to control the operation of the business process, but
first we make the following assumptions.
Each key parameter is autonomous. Autonomy means that the decision for each key parameter is made independently and has independent values and purposes.
To do this, each group of data is intended for a specific purpose, for example, for planning, for constructing trajectories, etc. This specialization of data makes it convenient to perform uniform actions on it. This is especially beneficial for setting or learning data parameters [5].
Where:
In this case, we control the behavior of the business process using the example of planning the products produced, i.e., we demonstrate the basis for determining or planning the quantity of manufactured products.
The formulation of the problem is as follows. Suppose that initially, at the time to, the state of the object was CS(t) = CS(tn), and at the same time the target situation of the object is given. At the same time, enterprises produce m types of goods: (a1, a2, a3, a4, a5, …, am). And let the initial preferences at time t = to look like this:
where μi,to is the preference coefficient (weight) of the ai-th product at time t = to.
And let at the moment of time t = to+ some feedback on the business process under consideration appear on the Internet, i.e. buyers wrote to the Internet. Then, as a result of studying the opinion of the population, the preference ratio at time t = ti may become different:
where μi,ti is the preference coefficient (weight) of the ai-th product at the moment of time t = ti.
The value of the weight of products and resources varies depending on market
conditions, depending on the change in taste and preference of users of products of the
business process.
Thus, the current weight value is defined as follows:

μi = μi^initial + κi·μi^corr,

where μi^corr = βi·μi^ton or μi^corr = μi^ton + βi·μi^ton, and βi is the correction factor.
The obtained values of weights are taken into account when developing plans for
the output of the business process.
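A minimal sketch of this weight update (all numeric values are hypothetical; it uses the first variant of the correction, μi^corr = βi·μi^ton):

```python
def update_weight(mu_initial, mu_ton, kappa, beta):
    """Update a product's preference weight from its tonality estimate.

    mu_corr = beta * mu_ton (the first variant given in the text);
    mu = mu_initial + kappa * mu_corr."""
    mu_corr = beta * mu_ton
    return mu_initial + kappa * mu_corr

# Hypothetical values: initial weight 0.30, tonality estimate 2.5,
# correction factor beta = 0.1, kappa = 0.2.
print(round(update_weight(0.30, 2.5, kappa=0.2, beta=0.1), 2))  # 0.35
```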
5 Conclusion
In this work, the reviews given by individual representatives of the consumers of a business process are taken as tonal expressions or statements for assessing the quality of the activities and output of the business process. These expressions (reviews) are estimates given by individuals, i.e., the taste for products as determined by individual consumers of the products of the business process. To transform individual quality estimates into an objective or overall rating (estimated by a certain group of consumers), the following three algorithms were developed: processing and summarizing absolute reviews, processing relative reviews, and a total assessment of the results of the two algorithms. The paper also describes the procedures for recording the results of evaluating the quality of the activities and products of the business process when drawing up a plan for the output of products by the business process.
Acknowledgments. This work was supported by the Ministry of Education and Science of the Republic of Kazakhstan (Grant No. 0118PК01084, Digital transformation platform of National economy business processes BR05236517).
References
1. Anderson, D.J.: Kanban: Successful Evolutionary Change for Your Technology Business.
Published by Blue Hole Press (2010)
2. Duisebekova, K., Serbin, V., Ukubasova, G., Kebekpayeva, Z., Skakova, A., Rakhmetulayeva, S., Shaikhanova, A., Duisebekov, T., Kozhamzharova, D.: Design and development of automation system of business processes in educational activity. J. Eng. Appl. Sci. 8, 4702–4714 (2017) (ISSN: 86-949X, Medwell Journals)
3. Rakhmetulaeva, S.: Using inverse information as a method for assessing and analyzing
reputational risk. J. Vestn. EKSTU 2, 129–133 (2015)
4. Kuandykov, A., Rakhmetulayeva, S., Baiburin, Y., Nugumanova, A.: Usage of singular value
decomposition matrix for search latent semantic structures in natural language texts. In: The
34th Chinese Control Conference and SICE Annual Conference 2015 (CCC&SICE2015),
pp. 286–291. Hangzhou, China (2015). https://ieeexplore.ieee.org/document/7285567. Last
accessed 21 Feb 2019
5. Nugumanova, A., Mansurova, M., Alimzhanov, E., Zyryanov, D., Apayev, K.: An automatic construction of concept maps based on statistical text mining. In: International Conference on Data Management Technologies and Applications, pp. 29–38 (2015)
6. Bessmertny, I.: Knowledge visualization based on semantic networks. Program. Comput.
Softw. 36(4), 197–204 (2010)
Energy and Water Management
Customer Clustering of French
Transmission System Operator (RTE)
Based on Their Electricity Consumption
1 Introduction
RTE (French transmission system operator) is in charge of the high voltage grid
for electricity in France. As a smart grid, RTE is responsible for the balance of
production and consumption, the safety of transportation and the quality of the
delivered services. Two thirds of the French industry is connected to the high
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 893–905, 2020.
https://doi.org/10.1007/978-3-030-21803-4_89
894 G. Da Silva et al.
level voltage grid, and the main business objective of RTE is to ensure customer satisfaction by improving its services. In an evolving context with digitalization
and the rise of new technologies, RTE keeps the pace of innovation through its
R&D department and collaborations with research institutions. Projects, like
customer clustering, will grant that the RTE’s high voltage grid is relevant in
the future as a public asset. One of main activities of RTE customer relationships
managers is to analyse customer’s data in order to better know the customers and
to enhance relationships. Traditionally, RTE customer relationships managers
visualize the customer’s consumption curves and manually detect their patterns
to understand troubles, changes of behavior, evolution, etc. The task is realized
based on the experience and knowledge of customer relationships managers.
However, such a technique is somewhat biased and a “manual” pattern detection
is clearly not a good method. Hence, there is a need to develop efficient tools
helping customer relationships managers to realize their tasks.
In this work, we develop an efficient approach for clustering (automatic
classification) RTE's customers based on their electricity consumption. Each RTE
customer is characterized by an electricity consumption curve that records the
consumption every 10 minutes over two years. Hence, each customer is represented
by a time-series sequence of 105,120 points. We are undoubtedly facing a very
large-scale time-series clustering problem.
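As a sanity check on that figure (assuming two non-leap 365-day years at the stated 10-minute sampling step):

```python
# 10-minute resolution: 6 samples/hour, 24 hours/day, 365 days/year, 2 years.
samples_per_hour = 60 // 10
points_per_customer = 2 * 365 * 24 * samples_per_hour
print(points_per_customer)  # 105120
```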
Clustering is a fundamental machine learning task with numerous applications in
various domains. It consists in dividing a set of data objects into
"homogeneous" groups (clusters) such that objects in the same group are more
"similar" to each other than to those in other groups. The main objective of our
customer clustering task is to automatically detect patterns and causalities in
the customers' evolution. The results will help RTE to know its customers better
and to propose more adequate services to them. In marketing terms, they will
allow a better adaptation of the maintenance schedule, a smooth preparation for
real-time operations, and a cost reduction. Understanding the behaviors of
customers on the grid more precisely is one more step towards a smart grid.
In recent years, due to the exponential growth of time-series data in emerging
areas such as sales data, finance, and weather, there has been considerable
research and development in time-series clustering. Time-series clustering is a
hard task due to two main difficulties. The first lies in the nature of the
temporal information in time-series: when evaluating the similarity between
time-series objects, the chosen similarity measure should take into account the
temporal information of the considered data. The second difficulty concerns the
high-dimensional nature of time-series data. A high number of dimensions greatly
increases the computation time of clustering algorithms. Furthermore, clustering
techniques often suffer from the "curse of dimensionality" phenomenon, i.e., the
quality of clustering algorithms degrades as the dimension of the data increases.
Hence, to develop an efficient time-series clustering algorithm, one has to deal
with three important issues: an appropriate similarity measure for time-series
data, an efficient clustering algorithm, and the ability to handle big data.
Customer Clustering of French Transmission System Operator 895
Fig. 1. Euclidean and DTW distances of two very similar time-series (red and blue
curves). The black lines show the matching {p_l = (n_l, m_l)}_{l=1,...,L} between
the points of the two time-series.
(i.e., p_{l+1} − p_l ∈ {(1, 0), (0, 1), (1, 1)} for l ∈ {1, . . . , L − 1}). Hence,
the DTW distance between two time-series x and y is given by

    d_DTW(x, y) = min { Σ_{l=1}^{L} |x_{n_l} − y_{m_l}| : p is a valid (N, M)-warping path }.    (1)
DTW minimizes the differences between two time-series by aligning each point of
one time-series to the best corresponding points of the other via a warping
path. Hence, DTW is flexible enough to handle shifts between time-series and is
able to capture the similar patterns of two time-series sequences. Therefore,
the DTW distance is chosen for our time-series clustering approach.
However, the main drawback of the DTW distance is its computation time. The
algorithm for computing the DTW distance is a recursive (dynamic programming)
algorithm with complexity O(T²), where T is the length of the time-series.
Hence, for high-dimensional time-series data, it is very slow, or even
impossible, to use the DTW distance directly in clustering algorithms that
compute DTW at each iteration (e.g. k-means and its variants). To overcome this
drawback, we adopt a feature-based clustering approach. Time-series clustering
is usually tackled by two approaches: the raw-data-based approach, where
clustering is applied directly to the time-series vectors without any space
transformation prior to the clustering phase, and the feature-based approach,
which does not perform clustering directly on the raw time-series data.
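For reference, the O(T²) dynamic-programming recursion behind Eq. (1) can be sketched as follows (a plain-Python illustration with the unit step condition {(1, 0), (0, 1), (1, 1)}, not the authors' implementation):

```python
def dtw_distance(x, y):
    """DTW distance between two sequences via dynamic programming, O(N*M)."""
    n, m = len(x), len(y)
    INF = float("inf")
    # D[i][j] = cost of the best warping path aligning x[:i] with y[:j].
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Step condition p_{l+1} - p_l in {(1,0), (0,1), (1,1)}.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# A shifted pattern: the Euclidean distance would be large, DTW absorbs the shift.
a = [0, 0, 1, 2, 1, 0, 0]
b = [0, 1, 2, 1, 0, 0, 0]
print(dtw_distance(a, b))  # 0.0 — the shift is handled by the warping path
```

The quadratic table is exactly why computing DTW inside every k-means iteration is prohibitive for 105,120-point series.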
where σ_i is the variance of the Gaussian with mean a_i. Similarly, we define
Q := (q_{i,j})_{i,j=1,...,N}, the pairwise similarity matrix of the x_i in the
new space.
The problem (3) is a non-convex optimization problem for which several algo-
rithms have been developed [8,12,22].
t-SNE offers the freedom to choose the similarity measures d_original (resp.
d_new) in the original (resp. new) space. In the original work [12], the authors
applied t-SNE to data visualization, where the number of dimensions in the new
space is low (2 or 3), with both d_original and d_new being the Euclidean
distance. In our case, we take d_original to be the DTW distance, while the
Euclidean distance is chosen for d_new. On the one hand, d_original should
obviously be the DTW distance, since it is well adapted to time-series data, as
shown in Subsection 2.1. On the other hand, the choice of the Euclidean distance
for d_new is motivated by the existence of several efficient, scalable and
robust clustering algorithms based on DC (Difference of Convex functions)
programming and DCA (DC Algorithm) with the Euclidean distance (e.g. DCA-MSSC
[6], DCA-KMSSC [7]). For a complete study of DC programming and DCA, the reader
is referred to [10,11,15,16] and references therein. To the best of our
knowledge, this is the first time t-SNE has been used with the DTW distance.
The time-series data are now transformed into a new space by t-SNE. In the next
section, we study some efficient clustering algorithms for the transformed data.
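The combination can be illustrated with a simplified SNE-style sketch: similarities P built by a Gaussian kernel on a precomputed DTW distance matrix (d_original), similarities Q built the same way on Euclidean distances in a candidate embedding (d_new), and the KL divergence of problem (3) as the objective. This is a didactic simplification with a single global σ; actual t-SNE calibrates a per-point σ_i from a target perplexity and minimizes the objective by gradient descent.

```python
import numpy as np

def similarity_matrix(dist, sigma=1.0):
    """Turn a pairwise distance matrix into normalized Gaussian similarities."""
    P = np.exp(-dist ** 2 / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)           # no self-similarity
    return P / P.sum()

def kl_objective(P, Y, sigma=1.0):
    """KL(P || Q), where Q uses Euclidean distances in the embedding Y."""
    d_new = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    Q = similarity_matrix(d_new, sigma)
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))

# Toy DTW distance matrix for 3 time-series (precomputed elsewhere).
D_dtw = np.array([[0.0, 1.0, 4.0],
                  [1.0, 0.0, 3.0],
                  [4.0, 3.0, 0.0]])
P = similarity_matrix(D_dtw)
Y = np.array([[0.0], [0.5], [2.0]])    # candidate 1-D embedding
print(kl_objective(P, Y))              # small when the embedding preserves P
```

An embedding whose Euclidean distances reproduce the DTW distances exactly drives the objective to zero, which is precisely what the optimization of (3) pursues.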
Fast and scalable DCA-based clustering algorithms
As we have mentioned previously, in our problem we have no information on the
clusters, nor on the number of clusters, apart from the consumption curves.
Finding the number of clusters is challenging for clustering tasks. Generally,
there exist two approaches. The first consists in first finding the number of
clusters with a "simple" procedure and then applying a clustering algorithm with
that number of clusters. In the second approach, one determines the number of
clusters and the cluster assignment simultaneously.
In the literature, several algorithms have been developed for finding the number
of clusters. For instance, the Elbow algorithm uses the WSS criterion ("total
within-cluster sum of squares"): the number of clusters k* is optimal if the
corresponding WSS does not change significantly when the number of clusters is
increased by 1. The Silhouette Average algorithm is similar to the Elbow
algorithm: it varies the number of clusters and chooses the one that maximizes
the Silhouette criterion. The Gap Statistic algorithm [20] is another variant of
the Elbow algorithm; it maximizes the "gap statistic" criterion, defined as the
difference between the measured WSS and its expected value under a null
reference distribution.
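The WSS criterion underlying these procedures is easy to compute once a clustering is available; a minimal elbow-style loop, with a naive Lloyd k-means standing in for the actual clustering algorithm (1-D data for brevity), could look like:

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Naive Lloyd k-means on 1-D points; returns the centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: (p - centroids[c]) ** 2)
            clusters[j].append(p)
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids

def wss(points, centroids):
    """Total within-cluster sum of squares."""
    return sum(min((p - c) ** 2 for c in centroids) for p in points)

# Elbow: WSS drops sharply until k reaches the true number of clusters (3 here).
data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1, 10.2]
for k in (1, 2, 3, 4):
    print(k, round(wss(data, kmeans(data, k)), 3))
```

Plotting WSS against k and looking for the "elbow" where the curve flattens is exactly the manual step the Gap Statistic automates.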
Assuming that the number of clusters is known, there exists a variety of
Euclidean-based clustering algorithms such as k-means, k-medoids, fuzzy
c-means, etc. Among the many models for clustering, Minimum Sum-of-Squares
Clustering (MSSC) is one of the most popular, since it expresses both
homogeneity and separation. The MSSC problem is described as follows. Given a
dataset X := (x_i)_{i=1,...,N} of N points in R^D (i.e. x_i ∈ R^D is the
transformation of the time-series a_i in the new space) and a pre-defined number
of clusters K, we aim to find K points U := (u_l)_{l=1,...,K} in R^D, known as
"centroids", and to assign each data point in X to its closest centroid. MSSC
minimizes the sum of squared distances from the data points to the centroids of
their clusters, as presented in (4):

    min_{U ∈ R^{K×D}} F_MSSC(U) := (1/2) Σ_{i=1}^{N} min_{l=1,...,K} ||u_l − x_i||²    (4)
DCA-MSSC is a DCA-based algorithm for solving the MSSC model. It has shown its
superiority over state-of-the-art algorithms in terms of performance,
robustness, and adaptation to different types of data. We refer to the original
paper [6] for more details on the DCA-MSSC algorithm.
On the other hand, among the algorithms that simultaneously determine the number
of clusters and the cluster assignment, mclust [19] is a well-known one. mclust
uses a Gaussian Mixture Model; the optimal number of segments K* is determined
by the Bayesian Information Criterion (BIC) and the Integrated Complete-data
Likelihood (ICL). In a different direction, Le Thi et al. [9] have proposed the
DCA-Modularity algorithm, which transforms the data points into a graph and then
segments the vertices of the graph using the modularity criterion as a measure
of clustering quality. The problem of maximizing graph modularity is summarized
as follows. Consider an undirected unweighted network G = (V, E) with N nodes
(V = {1, . . . , N}) and M edges (M = Card(E)). Denote by A the adjacency
matrix: a_{i,j} = 1 if (i, j) ∈ E, and 0 otherwise. The degree of node i is
denoted ω_i (ω_i = Σ_{j=1}^{N} a_{i,j}), and ω stands for the vector whose
components are the ω_i. Let P be a partition of V, and let K be the number of
communities in P. Define the binary assignment U = (u_{i,k})_{i=1,...,N}^{k=1,...,K},
i.e. u_{i,k} = 1 if vertex i belongs to community k and 0 otherwise. Then, the
modularity maximization problem can be written as

    max_U Q(U) := (1/2M) Σ_{i,j=1}^{N} b_{i,j} Σ_{k=1}^{K} u_{i,k} u_{j,k},    (5)
    s.t. Σ_{k=1}^{K} u_{i,k} = 1, for i = 1, . . . , N;
         u_{i,k} ∈ {0, 1}, for i = 1, . . . , N, k = 1, . . . , K,

where b_{i,j} := a_{i,j} − ω_i ω_j / (2M) is the modularity matrix.
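Evaluating the objective of (5) for a given partition is direct. In the sketch below, b_{i,j} is assumed to be the standard modularity matrix a_{i,j} − ω_i ω_j / (2M); the function name and the toy graph are illustrative:

```python
def modularity(adj, communities):
    """Modularity Q of a partition of an undirected, unweighted graph.

    adj: symmetric 0/1 adjacency matrix (list of lists).
    communities: communities[i] = community index of node i.
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)                          # 2M = sum of all degrees
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:
                # b_ij = a_ij - w_i * w_j / 2M (standard modularity matrix)
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m

# Two triangles joined by a single edge: the natural split scores high.
adj = [[0, 1, 1, 0, 0, 0],
       [1, 0, 1, 0, 0, 0],
       [1, 1, 0, 1, 0, 0],
       [0, 0, 1, 0, 1, 1],
       [0, 0, 0, 1, 0, 1],
       [0, 0, 0, 1, 1, 0]]
print(modularity(adj, [0, 0, 0, 1, 1, 1]))   # ≈ 0.357, the two-triangle split
print(modularity(adj, [0, 0, 0, 0, 0, 0]))   # ≈ 0 for the trivial partition
```

Maximizing this quantity over binary assignments U is the combinatorial problem (5) that DCA-Modularity tackles.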
3 Numerical Experiments
Our experiments are carried out on a dataset containing the electricity
consumption curves of 462 RTE customers. For confidentiality reasons, each
client is identified by a randomly generated number. Each consumption curve
contains the electricity consumption every 10 minutes over two years (from 01
January 2016 to 31 December 2017).
The code was written in C# 4.7.1. All experiments were conducted on an
Intel(R) Xeon(R) E5-2630v4 (40 CPUs) with 32 GB of RAM.
Experiment 1: We first analyze the relevance of our clustering result. For this
purpose, we run Algorithm 1 on the whole dataset. It is worth noting that the
total computation time of our method is only 42 minutes. The number of clusters
determined by our algorithm is 19. In Fig. 3, we report the number of customers
in each cluster.
Due to space limitations, we only analyze the results of three clusters. The
choice is based solely on the number of customers in each cluster: the biggest
cluster (cluster C7), followed by a medium-size one (cluster C3, which is 25%
smaller than cluster C7) and one small cluster (cluster C17, the 4th smallest).
The consumption curves of four arbitrarily chosen customers from each cluster
are presented in Fig. 4 (cluster C3), Fig. 5 (cluster C7) and Fig. 6 (cluster
C17).
We observe that customers in each cluster clearly have similar shapes. For
cluster C7, customers tend to have a "regular" consumption pattern: the
consumption is high and followed by a short "drop" (i.e. the consumption
suddenly tumbles to a small value compared with the consumption level of the
previous period), and this pattern repeats during the year. Further analysis
reveals that this is a typical "weekly" consumption pattern, where the drops
often happen during the weekend. In addition, these customers also show a long
drop in consumption for 2–3 weeks around the middle of August, and a shorter
drop at the end of the year. For cluster C17, all four customers have a low
electricity consumption (all around 0), despite the differences in their maximum
consumption; they frequently generate very high peaks of short duration
throughout the year. Customers in cluster C3 have a stable consumption during
the whole year (mostly varying around a base level) and rarely show long
"drops"; they often have short drops in consumption (as opposed to the short
"peaks" of cluster C17).
From Figs. 4, 5 and 6, we can conclude that (1) the consumption curves of
customers in the same cluster are coherent and (2) the differences between
customers of different clusters are quite clear.
Experiment 2: In this experiment, we are interested in the capacity of our
method to detect whether a customer changes his way of consuming. For this
purpose, we apply a sliding-window technique with a slide duration of four
weeks; we thus obtain 13 different one-year-window datasets. Each
one-year-window dataset is processed by Algorithm 1.
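The window construction itself can be sketched with the standard library (dates taken from the dataset's span; note that with this exact boundary convention the loop yields 14 windows, so the paper's count of 13 must rest on a slightly different convention, e.g. dropping the last window):

```python
from datetime import date, timedelta

def one_year_windows(start, end, slide_weeks=4, window_days=365):
    """(start, end) dates of all one-year windows sliding by `slide_weeks`."""
    windows = []
    w_start = start
    while w_start + timedelta(days=window_days) <= end:
        windows.append((w_start, w_start + timedelta(days=window_days)))
        w_start += timedelta(weeks=slide_weeks)
    return windows

wins = one_year_windows(date(2016, 1, 1), date(2017, 12, 31))
print(len(wins))           # 14 under this convention; the paper reports 13
print(wins[0], wins[-1])
```

Each window then yields its own clustering, and tracking which cluster a customer falls into from window to window reveals behavioral changes.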
We now analyze a customer whose consumption behavior changes over time. Consider
the case of customer 10010. As we can see in Fig. 7, this customer has a sharp
drop in consumption in August 2015, while there is a much smaller decline in
August 2016. This customer has therefore clearly changed his mode of
consumption. In Fig. 8, we show the clusters to which customer 10010 belongs
during the 12 monthly runs of our segmentation algorithm. Up to 12/08/2016,
customer 10010 belongs to cluster C2. As detected by our algorithm, this
customer changed his mode of consumption in August 2016: by 09/09/2016, customer
10010 is assigned to a new cluster (cluster C7).
In Figs. 9 and 10, we show some customers of cluster C2 and cluster C7. We
observe the similarities between the load curve of customer 10010 from August
12th, 2015 to August 11th, 2016 (Fig. 7a) and those of customers in cluster C2
(Fig. 9). The same remark holds between customer 10010's consumption curve from
September 9th, 2015 to September 8th, 2016 (Fig. 7b) and other customers of
cluster C7 (Fig. 10).
Fig. 8. The cluster of customer 10010 during nine runs. The first four runs also
give cluster C2 and are cropped out for visibility. The date (e.g. 22/04/2016)
represents the starting date of the one-year window.
4 Conclusion
This approach is the result of in-depth studies using advanced theoretical and
algorithmic tools for large-scale time-series data clustering. We have efficiently
tackled the three challenges for our time-series clustering task: the similarity
distance measure, the clustering algorithm, and the Big data. The innovative
character intervenes in all stages of the proposed approach: the data transfor-
mation via t-SNE with the DTW measure in the original data space and the
References
1. Aach, J., Church, G.M.: Aligning gene expression time series with time warping
algorithms. Bioinformatics 17(6), 495–508 (2001)
2. Aghabozorgi, S., Seyed Shirkhorshidi, A., Ying Wah, T.: Time-series clustering -
a decade review. Inf. Syst. 53, 16–38 (2015)
3. Bagnall, A., Lines, J., Bostrom, A., Large, J., Keogh, E.: The great time series
classification bake off: a review and experimental evaluation of recent algorithmic
advances. Data Min. Knowl. Discov. 31(3), 606–660 (2017)
4. Chu, S., Keogh, E., Hart, D., Pazzani, M.: Iterative deepening dynamic time
warping for time series. In: Proceedings of the 2002 SIAM International
Conference on Data Mining, pp. 195–212. SIAM (2002)
5. Goldin, D.Q., Kanellakis, P.C.: On similarity queries for time-series data:
constraint specification and implementation. In: Montanari, U., Rossi, F. (eds.)
Principles and Practice of Constraint Programming – CP 1995. Lecture Notes in
Computer Science, pp. 137–153. Springer, Heidelberg (1995)
6. Le Thi, H.A., Belghiti, M.T., Pham Dinh, T.: A new efficient algorithm based on
DC programming and DCA for clustering. J. Glob. Optim. 37(4), 593–608 (2007)
7. Le Thi, H.A., Le, H.M., Pham, D.T.: New and efficient DCA based algorithms for
minimum sum-of-squares clustering. Pattern Recognit. 47(1), 388–401 (2014)
8. Le Thi, H.A., Le, H.M., Phan, D.N., Tran, B.: A DCA-like algorithm and its
accelerated version with application in data visualization. arXiv:1806.09620
[cs, math], 8 p. (2018)
9. Le Thi, H.A., Nguyen, M.C., Pham Dinh, T.: A DC programming approach for
finding communities in networks. Neural Comput. 26(12), 2827–2854 (2014)
10. Le Thi, H.A., Pham, D.T.: The DC (Difference of Convex Functions) programming
and DCA revisited with DC models of real world nonconvex optimization problems.
Ann. Oper. Res. 133(1), 23–46 (2005)
11. Le Thi, H.A., Pham Dinh, T.: DC programming and DCA: thirty years of
developments. Math. Program. 169(1), 5–68 (2018)
12. Van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn.
Res. 9, 2579–2605 (2008)
13. Meinard, M.: Dynamic time warping. In: Information Retrieval for Music and
Motion, pp. 69–84. Springer, Heidelberg (2007)
14. Paparrizos, J., Gravano, L.: K-Shape: efficient and accurate clustering of time
series. In: Proceedings of the 2015 ACM SIGMOD International Conference on
Management of Data, pp. 1855–1870. ACM Press (2015)
15. Pham Dinh, T., Le Thi, H.A.: Convex analysis approach to DC programming:
theory, algorithms and applications. Acta Math. Vietnam. 22(1), 289–355 (1997)
16. Pham Dinh, T., Le Thi, H.A.: A D.C. optimization algorithm for solving the
trust-region subproblem. SIAM J. Optim. 8(2), 476–505 (1998)
17. Rakthanmanon, T., Campana, B., Mueen, A., Batista, G., Westover, B., Zhu, Q.,
Zakaria, J., Keogh, E.: Searching and mining trillions of time series
subsequences under dynamic time warping. In: Proceedings of the 18th ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining – KDD 2012,
p. 262. ACM Press, Beijing, China (2012)
18. Schäfer, P.: The BOSS is concerned with time series classification in the presence
of noise. Data Min. Knowl. Discov. 29(6), 1505–1530 (2015)
19. Scrucca, L., Fop, M., Murphy, T.B., Raftery, A.E.: mclust 5: clustering,
classification and density estimation using Gaussian finite mixture models.
R J. 8, 29 (2016)
20. Tibshirani, R., Walther, G., Hastie, T.: Estimating the number of clusters in a data
set via the gap statistic. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 63(2), 411–423
(2001)
21. Warren Liao, T.: Clustering of time series data–a survey. Pattern Recognit. 38(11),
1857–1874 (2005)
22. Yang, Z., Peltonen, J., Kaski, S.: Majorization-minimization for manifold
embedding. In: Artificial Intelligence and Statistics, pp. 1088–1097 (2015)
23. Yi, B.K., Faloutsos, C.: Fast time sequence indexing for arbitrary Lp norms. In:
VLDB, vol. 385, p. 99 (2000)
Data-Driven Beetle Antennae Search
Algorithm for Electrical Power Modeling
of a Combined Cycle Power Plant
1 Introduction
sensors. This phenomenon is depicted in Fig. 1; based on it, the BAS algorithm
can be outlined.
Fig. 1. Beetle Search procedure based on odour sensing mechanism using antennae
2 Research Methodology
In this study, the focus is on the Multi-Layer Perceptron (MLP) network, which
is used as the fitness-evaluation function for BAS. MLP networks are suitable
for predictive modeling because of their natural ability to find correlations
among random inputs and outputs [9]. MLPs are classified into two categories:
(1) the Cascade Feed-Forward Neural Network (CFNN) and (2) the Feed-Forward
Neural Network (FFNN). Unlike the CFNN, the FFNN does not have any direct
connections between inputs and outputs.
Data-Driven Beetle Antennae Search Algorithm for Electrical … 909
The CFNN, by contrast, has direct connections between the inputs and the
outputs. It has n input neurons, m hidden-layer neurons, and output neurons.
The output equation is
    y_i = Z_i^k ( Σ_{k=1}^{n} w_{kj} x_k ) + Z_i^{oa} ( Σ_{j=1}^{m} w_{ji}^{oa} Z_j^{ha} ( Σ_{k=1}^{n} w_{jk}^{ha} x_k ) )    (1)

where Z_i^{oa} denotes the activation function of the ith output y_i, w_{ji}^{oa}
is the weight from the jth hidden-layer neuron to the ith output node, Z_j^{ha}
is the activation function of the jth hidden-layer neuron, w_{jk}^{ha} is the
weight from the kth input to the jth hidden-layer neuron, and x_k is the kth
input signal. Z_i^k is the activation function and w_{kj} the weight of the
direct input-to-output connections. Further, if a bias is added to the input
layer, Eq. (1) becomes

    y_i = Z_i^k ( Σ_{k=1}^{n} w_{kj} x_k ) + Z_i^{oa} ( b_i + Σ_{j=1}^{m} w_{ji}^{oa} Z_j^{ha} ( b_j + Σ_{k=1}^{n} w_{jk}^{ha} x_k ) )    (2)

where b_i is the weight from the bias to the ith output-layer neuron and b_j is
the weight from the bias to the jth hidden-layer neuron. The network weights of
the CFNN are approximated based on the neurons in the input layer.
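Equation (2) translates directly into code. The sketch below uses illustrative choices not fixed by the text: tanh as the hidden activation Z^{ha} and the identity for Z^k and Z^{oa}, with small hypothetical weight matrices:

```python
import math

def cfnn_forward(x, W_io, W_ih, W_ho, b_h, b_o):
    """Forward pass of a cascade feed-forward network per Eq. (2).

    x: n inputs; W_io: direct input->output weights (outputs x n);
    W_ih: input->hidden weights (m x n); W_ho: hidden->output (outputs x m);
    b_h, b_o: bias weights of the hidden and output layers.
    """
    # Hidden activations: Z^ha(b_j + sum_k w^ha_jk x_k), with tanh as Z^ha.
    h = [math.tanh(b_h[j] + sum(w * xi for w, xi in zip(W_ih[j], x)))
         for j in range(len(W_ih))]
    out = []
    for i in range(len(W_ho)):
        direct = sum(w * xi for w, xi in zip(W_io[i], x))      # Z^k = identity
        hidden = b_o[i] + sum(w * hj for w, hj in zip(W_ho[i], h))
        out.append(direct + hidden)                            # Z^oa = identity
    return out

# Tiny example: 2 inputs, 2 hidden neurons, 1 output.
y = cfnn_forward(x=[1.0, 2.0],
                 W_io=[[0.1, 0.2]],
                 W_ih=[[0.5, -0.5], [0.3, 0.3]],
                 W_ho=[[1.0, -1.0]],
                 b_h=[0.0, 0.0],
                 b_o=[0.1])
print(y)
```

The `direct` term is exactly the cascade connection that distinguishes the CFNN from a plain FFNN.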
The prediction quality is evaluated using the root-mean-square error (RMSE) and
the mean absolute error (MAE):

    RMSE = sqrt( (1/N) Σ_i (y_i − t_i)² )    (3)

    MAE = (1/N) Σ_i |y_i − t_i|    (4)

where y_i is the predicted value and t_i the target value.
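In code, Eqs. (3) and (4) read:

```python
import math

def rmse(y, t):
    """Root-mean-square error, Eq. (3)."""
    return math.sqrt(sum((yi - ti) ** 2 for yi, ti in zip(y, t)) / len(y))

def mae(y, t):
    """Mean absolute error, Eq. (4)."""
    return sum(abs(yi - ti) for yi, ti in zip(y, t)) / len(y)

preds, targets = [495.0, 500.0, 498.0], [496.0, 498.0, 498.0]
print(rmse(preds, targets), mae(preds, targets))  # ≈ 1.291, 1.0
```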
912 T. Ghosh et al.
The BAS algorithm is a single-solution-based metaheuristic, similar to the
Simulated Annealing (SA) algorithm. BAS starts with a randomly generated beetle
with position vector x_t at the tth time instant (t = 1, 2, . . . , n); the
position is evaluated using a fitness function f, which determines the smell or
odour concentration [12]. Based on the smell concentration, the beetle decides
its next promising position, obtained in the neighborhood of the previous
position by following some rules. These rules are derived from the behavior of
the beetle, which includes exploring and exploiting behavior. The directional
move is determined by
    b = rnd(k, 1) / ||rnd(k, 1)||    (5)
where rnd(·) is a random function and k is the dimension of the beetle's
position. Exploration is performed on the right (x_r) and left (x_l) sides, just
as the beetle moves using its two antennae:

    x_r = x_t + d_t b    (6)
    x_l = x_t − d_t b    (7)
where d is the sensing length of the antennae, which governs the exploiting
ability. The value of d must be large enough to cover the solution space; this
helps the algorithm escape local optima and improves the convergence speed.
Secondly, to formulate the detecting behavior, the following iterative model is
used:

    x_t = x_{t−1} + δ_t b · sign(f(x_r) − f(x_l))    (8)

where δ is the step size of the exploring mechanism, which follows a decreasing
function of t, and sign(·) is the sign function. The update rules for the
antennae length d and the step size δ are

    d_t = 0.95 d_{t−1}    (10)
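Putting Eqs. (5)–(8) and the decay rule (10) together gives a compact BAS loop. The sketch below is an independent illustration, not the authors' code: it minimizes a toy quadratic in place of the CFNN surrogate, uses a minus sign in the Eq. (8) update (the usual convention for minimization), and decays the step size δ with the same 0.95 factor:

```python
import math
import random

def bas_minimize(f, x0, d0=1.0, delta0=1.0, iters=200, seed=1):
    """Beetle Antennae Search: single-solution metaheuristic, Eqs. (5)-(8)."""
    rng = random.Random(seed)
    k = len(x0)
    x, d, delta = list(x0), d0, delta0
    best_x, best_f = list(x), f(x0)
    for _ in range(iters):
        # Eq. (5): random unit direction b.
        b = [rng.gauss(0.0, 1.0) for _ in range(k)]
        norm = math.sqrt(sum(v * v for v in b)) or 1.0
        b = [v / norm for v in b]
        # Eqs. (6)-(7): right/left antenna probes at sensing length d.
        xr = [xi + d * bi for xi, bi in zip(x, b)]
        xl = [xi - d * bi for xi, bi in zip(x, b)]
        # Eq. (8), with a minus sign for minimization.
        s = math.copysign(1.0, f(xr) - f(xl))
        x = [xi - delta * bi * s for xi, bi in zip(x, b)]
        if f(x) < best_f:
            best_x, best_f = list(x), f(x)
        d, delta = 0.95 * d, 0.95 * delta      # Eq. (10) plus step-size decay
    return best_x, best_f

# Minimize a shifted sphere function: optimum at (3, -2).
sphere = lambda v: (v[0] - 3.0) ** 2 + (v[1] + 2.0) ** 2
x_best, f_best = bas_minimize(sphere, [0.0, 0.0])
print(x_best, f_best)
```

In the paper's setting, `f` is replaced by the trained CFNN surrogate evaluated at a candidate parameter vector [AT, V, AP, RH].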
The proposed data-driven BAS framework is depicted in Fig. 4. The CFNN model is
used as a surrogate model for the BAS algorithm, allowing candidate solutions to
be evaluated efficiently. The CCPP module of this framework is used for
collecting data.
To validate the proposed data-driven BAS algorithm, the CCPP dataset from the
UCI Machine Learning Repository is used, as described in Subsect. 1.2. The data
are split 70:30 into training and testing sets. Thereafter, 100 data points are
randomly picked for validation. Levenberg-Marquardt backpropagation is used for
network training. The CFNN parameters are set as: learning rate = 0.1, error
goal = 1e-7, and number of epochs = 1000. The convergence behavior of the
data-driven BAS algorithm over 500 generations is shown in Fig. 5. The
parameters of the BAS algorithm are set as d_0 = 0.001 and δ_0 = 0.8. It can be
observed that the proposed algorithm reaches the global optimal solution
promptly. The best solution obtained is fbst [PE = 498.2971] with the optimal
parameter set [AT = 7.9415, V = 42.92045, AP = 1003.937, RH = 51.678] at
t = 253. This solution shows a better electric power output than the best record
in the dataset [AT = 5.48, V = 40.07, AP = 1019.63, RH = 65.62, PE = 495.76].
Based on the data considered in Ref. [6], the input variables are divided into 4
subsets, and the CFNN-based predictive model is tested on each. The obtained MAE
and RMSE scores are compared with seven regression models published previously.
The results are shown in Table 2. It can be observed that the CFNN model clearly
outperforms six of the seven published models. The CFNN scores are very close to
the best published results, with low variance overall.
Table 2. Comparison among the CFNN prediction model and published methods [6]
(entries are MAE/RMSE per input subset)

Model | AT        | AT-V      | AT-V-AP   | AT-V-AP-RH | Mean      | Variance
CFNN  | 3.92/5.08 | 3.24/4.25 | 2.98/4.01 | 2.92/3.89  | 3.27/4.31 | 0.21/0.29
LMS   | 4.28/5.43 | 3.91/4.97 | 3.62/4.58 | 3.62/4.57  | 3.86/4.89 | 0.10/0.17
SMO   | 4.28/5.43 | 3.91/4.97 | 3.62/4.58 | 3.62/4.56  | 3.86/4.89 | 0.10/0.17
K*    | 4.26/5.38 | 3.63/4.63 | 3.36/4.33 | 2.88/3.86  | 3.53/4.55 | 0.33/0.41
BREP  | 4.07/5.21 | 3.03/4.03 | 2.95/3.93 | 2.82/3.79  | 3.22/4.24 | 0.33/0.43
M5R   | 3.98/5.08 | 3.42/4.42 | 3.26/4.22 | 3.17/4.13  | 3.46/4.46 | 0.13/0.19
M5P   | 3.98/5.09 | 3.36/4.36 | 3.23/4.18 | 3.14/4.09  | 3.43/4.43 | 0.14/0.21
REP   | 4.09/5.23 | 3.26/4.34 | 3.21/4.29 | 3.13/4.21  | 3.42/4.52 | 0.20/0.23
Figure 6 portrays the CFNN regression plots (with R-values) and the scatter plot
of the CFNN predictive model. The prediction of the electrical power output of
the CCPP is substantially accurate; due to this accuracy and the high R-values
obtained during CFNN training, cross-validation was not performed. This is
demonstrated using the actual and estimated PE values (for the 100 data points
drawn randomly from the dataset). From Table 2, the CFNN obtains MAE = 2.919 and
RMSE = 3.895 on the subset of all four parameters, very close to the BREP scores
(MAE = 2.818 and RMSE = 3.787). Therefore, it can be concluded that the
CFNN-based predictive model is an efficient tool for electric power prediction
in the CCPP; it could further be used to forecast the power output for the
coming hours or days. Moreover, the BAS algorithm is shown to be an efficient
data-driven optimization tool, able to select the right set of process
parameters and the optimal level of electric power output. This approach could
be employed to increase the efficiency of the CCPP, and it further proves that
BAS is capable of reaching a near-optimal solution even when no explicit
objective function is available and the process depends solely on empirical
process data.
Fig. 6. Regression plots and curve fitting plots (Scatter plot) of four parameter subset by CFNN
4 Conclusions
This article proposes a novel data-driven CFNN assisted BAS algorithm for optimal
power output of the CCPP. The BAS is a latest metaheuristic algorithm in the category
of the single solution based metaheuristics, which mimics the searching behavior of the
longhorn beetles. The CFNN network is used as the predictive model for output
approximation for the CCPP. The proposed technique is successfully tested on the
CCPP dataset published in the UCI Machine Learning Repository. The conclusions are,
the CFNN model is competitive and can produce outputs with very low MAE and
RMSE scores, the BAS algorithm is substantially efficient and capable of producing
optimal parameter sets and output of the CCPP, and The CFNN assisted BAS produces
next hour/day/month prediction accurately and enhances the efficiency of the
CCPP. This technique could be further extended for various engineering process
modelling and could be compared with the other standing metaheuristics in future.
References
1. Jiang, X., Li, S.: BAS: beetle antennae search algorithm for optimization problems (2017).
arXiv:1710.10724 [cs.NE]
2. Simpson, T.W., Toropov, V., Balabanov, V., Viana, F.A.C.: Design and analysis of
computer experiments in multidisciplinary design optimization: a review of how far we have
come–or not. In: 12th AIAA/ISSMO Multidisciplinary Analysis and Optimization
Conference (2008)
3. Beykal, B., Boukouvala, F., Floudas, C.A., Pistikopoulos, E.N.: Optimal design of energy
systems using constrained grey-box multi-objective optimization. Comput. Chem. Eng. 116,
488–502 (2018)
4. Garud, S.S., Karimi, I.A., Kraft, M.: Design of computer experiments: a review. Comput.
Chem. Eng. 106, 71–95 (2017)
5. An, Y., Lu, W., Cheng, W.: Surrogate model application to the identification of optimal
groundwater exploitation scheme based on regression kriging method-a case study of
Western Jilin Province. Int. J. Environ. Res. Public Health 12(8), 8897–8918 (2015)
6. Messac, A.: Optimization in Practice with MATLAB. Cambridge University Press, NY,
USA (2015)
7. Niu, L. X.: Multivariable generalized predictive scheme for gas turbine control in combined
cycle power plant. In: IEEE Conference on Cybernetics and Intelligent Systems (2009)
8. Tüfekci, P.: Prediction of full load electrical power output of a base load operated combined
cycle power plant using machine learning methods. Electr. Power Energy Syst. 60, 126–140
(2014)
9. Arnaiz-González, Á., Fernández-Valdivielso, A., Bustillo, A., De Lacalle, L.N.L.: Using
artificial neural networks for the prediction of dimensional error on inclined surfaces
manufactured by ball-end milling. Int. J. Adv. Manuf. Technol. 83, 847–859 (2016)
10. Willmott, C.J.: On the validation of models. Phys. Geogr. 2(2), 184–194 (1981)
11. Zhu, Z., Zhang, Z., Man, W., Tong, X., Qiu, J., Li, F.: A new beetle antennae search
algorithm for multi-objective energy management in microgrid. In: 13th IEEE Conference
on Industrial Electronics and Applications (ICIEA), pp. 1599–1603 (2018)
12. Wang, J., Chen, H.: BSAS: beetle swarm antennae search algorithm for optimization
problems (2018). arXiv:1807.10470 [cs.NE]
Finding Global-Optimal Gearbox Designs
for Battery Electric Vehicles
1 Introduction
Battery electric vehicles (BEVs) are becoming more and more important. The major
drawback of BEVs in comparison with cars with an internal combustion engine is
still the shorter travel range [9]. It is therefore important to increase the
overall efficiency of the complete powertrain, i.e. of all vehicle components
used to transform stored electric energy into kinetic energy. This includes the
engine (electric motor) and the drivetrain, consisting of the transmission
(gearbox), the drive shafts, the differential, and the final drive (drive
wheels). In this paper, we focus on the interplay between the electric motor and
the gearbox. The use
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 916–925, 2020.
https://doi.org/10.1007/978-3-030-21803-4_91
Finding Global-Optimal Gearbox Designs for Battery Electric Vehicles 917
2 Related Work
The optimization of powertrains and their parts is a major research area. Espe-
cially the optimal design of gearboxes has been investigated in the literature.
Because of the combinatorial nature of the underlying mathematical problem,
the general optimization task is highly complex. Therefore, mostly heuristic opti-
mization methods have been applied to derive optimized solutions, cf. [15,20,21].
An approach for finding global-optimal solutions is shown in [1,8]. The authors
use a MINLP formulation to derive gearbox designs with a minimum size and
a minimum number of switching devices, as they have a major impact on the
manufacturing costs.
A concept for optimizing all relevant powertrain components (battery, inverter,
electric motor, and gearbox) is presented in [22]; however, no equations are
given. References [10,24] investigate the economic and dynamic performance
benefits of connecting the electric motor of a BEV to a two-speed transmission,
while optimizing the gear ratio using dynamic programming and a heuristic
approach, respectively. Since both works use specific load cycles, neither
approach is robust against uncertainties in the load, i.e. able to guarantee a
working system for loads different from those expected.
In the following, we present a new systematic methodology to derive optimally
matched gearboxes for an electric motor. We optimize the efficiency and the
dimensions of a gearbox given a set of load scenarios. To ensure a working
system over a whole range of load points, beyond the scenarios considered for
calculating the objective value, we use additional constraints to improve
robustness.
The considered powertrain consists of one electric motor combined with one
multi-gear gearbox. Powertrains with multiple motors and other components like
the traction battery or the inverter are not considered, nor are the effects of
recuperation on the efficiency of electric vehicles, as they mainly affect the battery
size and not the transmission system. We refer to [12] for details on this topic.
Fig. 1. Final clustering result with 6 clusters (axes: angular velocity Ω^W in s⁻¹ versus
torque T^W in kg m² s⁻²). The centers are shown with filled boxes marked with an ×.
To ensure a robust design, we also generate the convex hull comprising all load points,
here depicted as black circles linked with straight lines.
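The clustering step behind Fig. 1 can be sketched as follows. This is a minimal, pure-Python Lloyd's k-means on made-up load points, not the authors' implementation; the convex hull of all load points can then be computed with, e.g., the Quickhull algorithm [2].

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means over 2-D load points (angular velocity, torque)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each load point to its nearest center (squared distance)
            j = min(range(k),
                    key=lambda i: (p[0] - centers[i][0]) ** 2
                                  + (p[1] - centers[i][1]) ** 2)
            clusters[j].append(p)
        # recompute centers as cluster means; keep the old center if a cluster empties
        centers = [(sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
                   if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical WLTP-like load points (omega in 1/s, torque in N m)
points = [(10 * i % 140, 8 * ((7 * i) % 150)) for i in range(1, 60)]
centers, clusters = kmeans(points, k=6)
```

The six resulting centers play the role of the load scenarios T, while the hull vertices give the corner points K used in the robustness constraints.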
β is the angle of inclination, which is set to 20° based on [27, p. 726]. The
dimension d is the same for all gears g in the gearbox, since they use the same
shafts. The considered multi-gear design uses gear-dependent transmission ratios
ig . We restrict the spread of gears by the following constraints:
Set Description
G Set of possible gears
K Convex hull of load points in Fig. 1
Q Set of support points for linearization of motor model in Φ direction
R Set of support points for linearization of motor model in Ψ direction
S Set of simplices for linearization of motor model
T Set of load scenarios
Scalar Description Value
A Slope of domain restriction constraint −1.04314
B Intercept of domain restriction constraint 1.25974
D Normalization length in mm 100
M Module, measured in mm 3
S max Difference in gear ratio between different gears 3
T max Maximum motor momentum in N m 1500
γ Weighting factor 0.05
ηB Bearing efficiency of switchable gears 0.98
ηgG Gear efficiency of gears in mesh 0.99
ηS Sealing efficiency of switchable gears 0.99
πt Probability of occurrence of load scenario t ∈ T 1/|T |
Ω max Maximum motor velocity in s−1 367
Parameter Description
TkW Momentum input for corner point k ∈ K
TtW Momentum input of load scenario t ∈ T
Xq,r Dataset point in Ψ direction
Yq,r Dataset point in Φ direction
Zq,r Efficiency value at position (X, Y )
ΩkW Angular velocity input for corner point k ∈ K
ΩtW Angular velocity input of load scenario t ∈ T
Variable Description Domain
as,t Binary decision variable to select simplex s ∈ S in load scenario t ∈ T {0, 1}
bg,t Binary variable to choose a gear g ∈ G in a load scenario t ∈ T {0,1}
d Distance of shafts in mm {25, ..., 210}
ig Transmission for gear g ∈ G [0, 6]
t^M_{k,g} Motor momentum for corner point k ∈ K and gear g ∈ G [0, T^max]
t^M_t Motor momentum in load scenario t ∈ T [0, T^max]
zg,1 Number of teeth of engine gear wheel {17, ..., 70}
zg,2 Number of teeth of output gear wheel {17, ..., 70}
ηtM Approximated motor efficiency in load scenario t ∈ T [−1, 1]
λq,r,t Linearization variable for support point (q, r) in load scenario t ∈ T [0, 1]
Φ_{k,g} Normalized angular velocity for point k ∈ K and gear g ∈ G [0, 1]
Φ_t Normalized angular velocity in load scenario t ∈ T [0, 1]
Ψ_{k,g} Normalized momentum for point k ∈ K and gear g ∈ G [0, 1]
Ψ_t Normalized momentum in load scenario t ∈ T [0, 1]
ω^M_{k,g} Motor angular velocity for point k ∈ K and gear g ∈ G [0, Ω^max]
ω^M_t Motor angular velocity in load scenario t ∈ T [0, Ω^max]
Finding Global-Optimal Gearbox Designs for Battery Electric Vehicles 921
To avoid interference of engaged gear wheels, the number of teeth of the smaller
wheels is bounded from below by 17, cf. [27, p. 714]. Moreover, we set an upper
bound of 70:
17 ≤ z_{g,j} ≤ 70 ∀g ∈ G, ∀j ∈ {1, 2}. (4a)
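As a small illustration of how the teeth variables relate to the transmission ratio i_g and the shaft distance d, the following uses the standard relation a = M(z₁ + z₂)/2 for uncorrected spur gears (cf. [27]); the chosen teeth numbers are hypothetical and this is not the paper's exact constraint set.

```python
def gear_pair(z1, z2, module_mm=3):
    """Transmission ratio and shaft distance of a standard spur gear pair.

    z1: teeth of the engine-side wheel, z2: teeth of the output wheel.
    Shaft distance a = M * (z1 + z2) / 2 for uncorrected spur gears.
    """
    assert 17 <= z1 <= 70 and 17 <= z2 <= 70, "teeth bounds from Eq. (4a)"
    i = z2 / z1                       # transmission ratio i_g
    a = module_mm * (z1 + z2) / 2.0   # shaft distance in mm
    return i, a

i, d = gear_pair(20, 60)  # hypothetical first gear: i = 3.0, d = 120.0 mm
```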
To consider the efficiency η M of the electric motor within the optimization pro-
gram, we use a generic functional description of the efficiency map of a perma-
nent magnet synchronous motor (PMSM) in Eq. (5a), cf. Sect. 3.3. The variables
Ψt , Φt ∈ [0, 1] represent the normalized motor momentum and rotational speed,
where the normalization is given by Eqs. (5b) and (5c). Equation (5d) restricts
the possible motor domain to physically relevant parts.
Ψ_t = t^M_t / T^max ∀t ∈ T (5b)
Φ_t = ω^M_t / Ω^max ∀t ∈ T (5c)
Φ_t ≤ A Ψ_t + B ∀t ∈ T (5d)
The central part of the gearbox optimization is given by the following constraints:
t^M_t ( Σ_{g∈G} i_g b_{g,t} η^G_g ) η^B η^S = T^W_t ∀t ∈ T , (6a)
ω^M_t = Ω^W_t ( Σ_{g∈G} i_g b_{g,t} ) ∀t ∈ T , (6b)
SOS1(b_{g,t} : g ∈ G) ∀t ∈ T . (6c)
The sum of gear-dependent transmission ratios ig in Eqs. (6a) and (6b) contains
the binary variable bg,t indicating the used gear in each load scenario. The special
ordered set of type 1 in Eq. (6c) ensures that only one gear is used in each
scenario. The efficiency of a pair of meshing gears is considered in Eq. (6a),
with the constant efficiency factor ηgG . Additional efficiency parameters are the
bearing efficiency η B and the sealing efficiency η S . To get a robust solution, we
add further constraints (Eqs. (7a)–(7e)) to restrict the solution space to solutions
which fulfill the most demanding loads. These most demanding loads correspond
to the convex hull K of all points of the WLTP based demand cycle shown in
Fig. 1.
Ψ_{k,g} = t^M_{k,g} / T^max ∀k ∈ K, ∀g ∈ G (7a)
Φ_{k,g} = ω^M_{k,g} / Ω^max ∀k ∈ K, ∀g ∈ G (7b)
Φ_{k,g} ≤ A Ψ_{k,g} + B ∀k ∈ K, ∀g ∈ G (7c)
922 P. Leise et al.
t^M_{k,g} i_g η^G_g η^B η^S = T^W_k ∀k ∈ K, ∀g ∈ G (7d)
ω^M_{k,g} = Ω^W_k i_g ∀k ∈ K, ∀g ∈ G (7e)
As an objective, we consider the motor efficiency in each load scenario and the
dimension of the gearbox modeled by the distance d of both shafts. Both terms
are weighted against each other by a user-specific weighting factor γ = 0.05:
min γ (d / D) − (1 − γ) Σ_{t∈T} π_t η^M_t . (8)
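A minimal sketch of how Eqs. (6a), (6b) and the objective (8) interact for one fixed design; all numbers are hypothetical, and the actual model of course optimizes over designs with an MINLP solver rather than evaluating a single candidate.

```python
# Evaluate Eqs. (6a), (6b) and objective (8) for one fixed, hypothetical design.
ETA_G, ETA_B, ETA_S = 0.99, 0.98, 0.99   # gear, bearing, sealing efficiencies
GAMMA, D_NORM = 0.05, 100.0              # weighting factor and normalization length

def motor_load(omega_wheel, torque_wheel, i_g):
    """Back-calculate the motor operating point for a chosen gear ratio i_g."""
    omega_m = omega_wheel * i_g                           # Eq. (6b)
    t_m = torque_wheel / (i_g * ETA_G * ETA_B * ETA_S)    # Eq. (6a) solved for t^M_t
    return omega_m, t_m

def objective(d_mm, motor_effs):
    """Weighted objective (8): small gearbox vs. high expected motor efficiency."""
    pi = 1.0 / len(motor_effs)  # uniform scenario probabilities 1/|T|
    return GAMMA * d_mm / D_NORM - (1 - GAMMA) * pi * sum(motor_effs)
```

With two scenarios of (made-up) motor efficiency 0.90 and 0.94 and d = 120 mm, the objective evaluates to 0.05·1.2 − 0.95·0.92 = −0.814; smaller values are better.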
S(q, r) is the set of simplices that are adjacent to vertex (q, r).
Next to the approximation of the motor efficiency, uncertainty in the loads
of the gearbox output is also considered by generating different load scenarios
based on the WLTP load cycle. Thus, the derived MINLP model enables us not
only to find an optimal gearbox layout, but also an optimal control strategy for
changing gears in each load scenario, maximizing the expected efficiency.
(Color bar in all panels: motor efficiency, ranging from 0.83 to 0.95; axes Ψ and Φ.)
Fig. 2. (a) Approximation of motor efficiency based on radial basis function. (b) Piece-
wise linear approximation with a 10×10 grid (for illustration only). (c) Piecewise linear
approximation with the used 30 × 30 grid.
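The piecewise-linear approximation in Fig. 2 evaluates the efficiency map as a convex combination of grid vertices inside one simplex (cf. [18,26]); the λ variables of the model are exactly barycentric weights. The sketch below evaluates a single triangle with a made-up efficiency triple standing in for the PMSM map.

```python
def barycentric_eval(tri, vals, p):
    """Interpolate inside one simplex (triangle): the weights l1, l2, l3 are the
    lambda variables of the linearization; they are >= 0 and sum to 1."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    px, py = p
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    l1 = ((y2 - y3) * (px - x3) + (x3 - x2) * (py - y3)) / det
    l2 = ((y3 - y1) * (px - x3) + (x1 - x3) * (py - y3)) / det
    l3 = 1.0 - l1 - l2
    assert min(l1, l2, l3) >= -1e-9, "point lies outside the simplex"
    return l1 * vals[0] + l2 * vals[1] + l3 * vals[2]

# Hypothetical grid triangle in (Psi, Phi) with efficiency values Z at its vertices
tri = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
eta = barycentric_eval(tri, [0.85, 0.87, 0.89], (0.05, 0.025))
```

In the MILP, the binary variables a_{s,t} select which simplex s the weights may be nonzero in, which is what makes the 30 × 30 grid tractable.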
4 Results
We use the MINLP solver SCIP 6.0, cf. [11], to optimize the model. The complete
software stack was implemented in Python 3.6.7 using PySCIPOpt 2.1.2 [16].
The computations were done on a Linux-based machine with an Intel i7-6600U
and 16 GB RAM. The gap limit was set to 0.5% and the time limit to 7200 s.
The final results are shown in Table 2. Compared to an optimized one-speed
gearbox, a multi-gear gearbox yields around 1% higher efficiency and an
8–10% decrease in the maximum momentum, cf. Table 2. The solution with 3
gears reaches the time limit; therefore, its optimality gap is slightly higher than
for the other shown results. It is also important to mention that the performance
increase due to more gears also leads to larger diameters.
Funding
Funded by Deutsche Forschungsgemeinschaft (DFG, German Research Founda-
tion) – project number 57157498 – SFB 805.
References
1. Altherr, L.C., Dörig, B., Ederer, T., Pelz, P.F., Pfetsch, M.E., Wolf, J.: A mixed
integer nonlinear program for the design of mechanical transmission systems. Oper.
Res. Proc. 2016, 227–233 (2018)
2. Barber, C.B., Dobkin, D.P., Huhdanpaa, H.: The quickhull algorithm for convex
hulls. ACM Trans. Math. Softw. (TOMS) 22(4), 469–483 (1996)
3. Belotti, P., Kirches, C., Leyffer, S., Linderoth, J., Luedtke, J., Mahajan, A.: Mixed-
integer nonlinear optimization. Acta Numer. 22, 1–131 (2013)
4. Bishop, C.: Pattern Recognition and Machine Learning, vol. 1. Springer (2006)
5. Bussieck, M.R., Vigerske, S.: MINLP solver software (2010)
6. Davies, D.L., Bouldin, D.W.: A cluster separation measure. IEEE Trans. Pattern
Anal. Mach. Intell. 2, 224–227 (1979)
7. DIN 780-1: Series of modules for gears; modules for spur gears (1977)
8. Dörig, B., Ederer, T., Pelz, P.F., Pfetsch, M.E., Wolf, J.: Gearbox design via mixed-
integer programming. In: Proceedings of the VII European Congress on Compu-
tational Methods in Applied Sciences and Engineering (2016)
9. Egbue, O., Long, S.: Barriers to widespread adoption of electric vehicles: an analysis
of consumer attitudes and perceptions. Energy Policy 48, 717–729 (2012)
10. Gao, B., Liang, Q., Xiang, Y., Guo, L., Chen, H.: Gear ratio optimization and
shift control of 2-speed I-AMT in electric vehicle. Mech. Syst. Signal Process. 50,
615–631 (2015)
11. Gleixner, A., Bastubbe, M., Eifler, L., Gally, T., Gamrath, G., Gottwald, R.L.,
Hendel, G., Hojny, C., Koch, T., Lübbecke, M.E., Maher, S.J., Miltenberger, M.,
Müller, B., Pfetsch, M.E., Puchert, C., Rehfeldt, D., Schlösser, F., Schubert, C.,
Serrano, F., Shinano, Y., Viernickel, J.M., Wegscheider, F., Walter, M., Witt,
J.T., Witzig, J.: The SCIP optimization suite 6.0. Tech. rep. (2018). (Optimization
Online)
12. Grunditz, E.A., Thiringer, T.: Characterizing BEV powertrain energy consump-
tion, efficiency, and range during official and drive cycles from Gothenburg, Sweden.
IEEE Trans. Veh. Technol. 65(6), 3964–3980 (2016)
13. Guzzella, L., Sciarretta, A., et al.: Vehicle Propulsion Systems, vol. 1. Springer
(2007)
14. Hadj, N.B., Abdelmoula, R., Chaieb, M., Neji, R.: Permanent magnet motor effi-
ciency map calculation and small electric vehicle consumption optimization. J.
Electr. Syst. 14(2), (2018)
15. Hofstetter, M., Lechleitner, D., Hirz, M., Gintzel, M., Schmidhofer, A.: Multi-
objective gearbox design optimization for xEV-axle drives under consideration of
package restrictions. Forsch. Im Ing. 82(4), 361–370 (2018)
16. Maher, S., Miltenberger, M., Pedroso, J.P., Rehfeldt, D., Schwarz, R., Serrano, F.:
PySCIPOpt: mathematical programming in python with the SCIP optimization
suite. In: International Congress on Mathematical Software, pp. 301–307. Springer
(2016)
17. McDonald, R.: Electric motor modeling for conceptual aircraft design. In: 51st
AIAA Aerospace Sciences Meeting including the New Horizons Forum and
Aerospace Exposition, p. 941 (2013)
18. Misener, R., Floudas, C.: Piecewise-linear approximations of multidimensional
functions. J. Optim. Theory Appl. 145(1), 120–147 (2010)
19. Rinderknecht, S., Meier, T.: Electric power train configurations and their transmis-
sion systems. In: International Symposium on Power Electronics Electrical Drives
Automation and Motion (SPEEDAM), pp. 1564–1568. IEEE (2010)
20. Salomon, S., Avigad, G., Purshouse, R.C., Fleming, P.J.: Gearbox design for uncer-
tain load requirements using active robust optimization. Eng. Optim. 48(4), 652–
671 (2016)
21. Savsani, V., Rao, R., Vakharia, D.: Optimal weight design of a gear train using
particle swarm optimization and simulated annealing algorithms. Mech. Mach.
Theory 45(3), 531–541 (2010)
22. Schönknecht, A., Babik, A., Rill, V.: Electric powertrain system design of BEV and
HEV applying a multi objective optimization methodology. Transp. Res. Procedia
14, 3611–3620 (2016)
23. Shapiro, A., Dentcheva, D., Ruszczyński, A.: Lectures on stochastic programming:
modeling and theory. SIAM (2009)
24. Tan, S., Yang, J., Zhao, X., Hai, T., Zhang, W.: Gear ratio optimization of a
multi-speed transmission for electric dump truck operating on the structure route.
Energies 11(6), 1324 (2018)
25. Tutuianu, M., Marotta, A., Steven, H., Ericsson, E., Haniu, T., Ichikawa, N., Ishii,
H.: Development of a worldwide harmonized light duty driving test cycle (WLTC).
Draft Technical Report, DHC subgroup, GRPE-67-03 (2013)
26. Vielma, J.P., Ahmed, S., Nemhauser, G.: Mixed-integer models for nonsepara-
ble piecewise-linear optimization: unifying framework and extensions. Oper. Res.
58(2), 303–315 (2010)
27. Wittel, H., Muhs, D., Jannasch, D., Voßiek, J.: Roloff/Matek Maschinenelemente,
vol. 21. Vieweg + Teubner Verlag (2013)
Location Optimization of Gas Power Plants
by a Z-Number Data Envelopment Analysis
Abstract. Electricity demand has been ever increasing with the growth and
development of the country. Demand for electricity comes not only from industrial
consumers but also from home consumers. Supplying the requisite electric power
frequently requires establishing new power plants. Considering the impact of
power plant location on production costs, energy transmission costs, environ-
mental issues, etc., the importance of selecting an optimal location for establishing
a power plant becomes clear. Since gas power plants account for the major share
of thermal electricity production, establishing such power plants requires specific
attention in Iran. Hence, location-allocation of a gas power plant in Iran is
considered in the present study. 25 cities are evaluated for location-allocation of
a gas power plant, and the optimum location is selected by applying a Z-number
data envelopment analysis (Z-DEA). The proposed approach considers the most
important and effective indices, including the pollution rate, land cost, economic
rate, natural risks, distance from the electricity distribution network, distance
from the gas supply station, proximity to water, population and labor force rate,
topographic feasibility, electricity generation amount and land feasibility. Finally,
the fuzzy DEA (F-DEA) model is also applied to validate the obtained results.
1 Introduction
consist in abundant social, economic, political and environmental consequences and for
that, it seems to be necessary to initiate overall studies in advance to construct a power
plant.
There are two types of fuel used in gas power plants: gas oil and natural gas.
To generate each megawatt of electricity in such power plants, approximately 55 L
of gas oil and 313 cubic meters of natural gas are used. Compared with other
thermal power plants, gas power plants have some noticeable advantages, such as
rapid installation, lower price and fast launch. A basic step in establishing a gas
power plant is to determine an appropriate location for its construction. Lior (2012) stated that
selecting a location for thermal power plants affects the amount of energy gen-
eration, the productivity of the power plant, production and transmission costs,
economic development and the environment. Moreover, energy and consumption
resources influence the quality of the environment and of other vital resources,
such as water and food (Lior 2012). Selecting an appropriate location requires
considering a variety of criteria and factors. In location allocation, the aim is to
compare parameters at the same scale. In the present study, 25 cities in Iran are
evaluated as potential locations for the gas power plant. To rank the potential
locations, the Z-number data envelopment analysis (Z-DEA) model is applied as
a novel competent model under highly uncertain conditions.
Based on the previous studies and experts’ opinions, 11 significant and effective
factors in establishing a gas power plant are detected. These indices are categorized into
three groups of techno-economic, social and environmental that are described below.
• Proximity to water: Gas power plants require a noticeable amount of water for their
operation. The consumed water amount depends on several elements, namely the
cooling tower, cooling system, weather condition, the age of the power plant,
maintenance condition, etc.
• Natural risks: Location to construct a power plant shall be selected in a way not to
be in the path of seasonal floods and storms, seismic faults, active volcanoes,
tsunamis, etc., as far as possible.
• Topographic feasibility: The land should be level and free from deep and high
terrain if possible; otherwise, power plant construction will face several
difficulties. Hence, when choosing the land, it is better to look for low-slope
and fairly even surfaces (Azadeh et al. 2014).
• Electricity generation amount: The proximity of the power plant place to the zones
with higher needs of electricity leads to decrease in wastes and economic savings.
• Distance from electricity distribution network: One of the main parameters in
electrical energy waste is the length of transmission lines, hence the closer the
location to the electrical distribution network, the better (Azadeh et al. 2014).
• Land feasibility (hardness/toughness): To construct the power plant on a land of
which underneath stone levels have enough sustainability, it is essential to study the
geometric levels of the area in advance (Azadeh et al. 2014).
• Pollution rate: Owing to the expansion of cities and rising air pollutants,
most big industrial cities face polluted air, and this pollution is harmful and
dangerous to the residents’ health.
928 F. Fakhari et al.
• Population and labor force rate: Population centers are a part of main electricity
consumption and proximity of the selected location to them means the proximity of
electricity generation and consumption centers together.
• Land cost: In some areas, the cost of establishing a power plant is higher than the
other areas, also constructing a gas power plant requires a considerable measure of
land (Azadeh et al. 2014).
• Economic rate: The economic growth rate is simply a ratio in terms of percentage
which indicates the incremental value produced by the economy of a country in a
period (usually a year) divided by the changes in the previous period.
• Distance from a gas supply station: the shorter this distance is, the faster and easier
will be the fuel delivery.
The structure of the present study is as follows. The next section reviews the latest
related literature. In Sect. 3, problem-solving stages are described along with the Z-
DEA method. In Sect. 4 results of the Z-DEA and F-DEA models plus the relevant
statistical methods are presented. Finally, the consequences are discussed in Sect. 5.
2 Literature Review
Concerning the variety of electricity generating power plants, there are some studies
done to evaluate them and compare their efficiencies. Some of them are explained
below. Lam and Shiu (2004) measured the efficiency and productivity of electrical
industry in China, applying the DEA and Malmquist index, considering the generated
electricity in each power plant in megawatt hour (MWh) as the output variable and the
nominal capacity (MW), fuel and labor (person) as input variables. Azadeh et al. (2008)
presented a hierarchy approach based on the DEA and performed location allocation to
solar power plants in different cities and areas of Iran. Chatzimouratidis and Pilavachi
(2009) evaluated 10 types of electricity generation power plants with due attention to
technical, economic and sustainability criteria using the analytic hierarchy pro-
cess (AHP) method. The results of that study showed that renewable energies
are the best solution for the future, as they do not need any fuel and thus incur
no fuel cost. Among those power plants, hydropower, geothermal and wind power
plants held the highest ranks, respectively.
Ren (2010) presented a multi-objective model for location-allocation to construct a
thermal power plant with two objectives of minimizing the cost and maximizing the
efficiency. Choudhary and Shankar (2012) performed a study on location-allocation to
a thermal gas power plant in India aiming to minimize socio-economic, environmental
and infrastructure costs and maximize the electricity generation productivity. They
applied the fuzzy AHP and TOPSIS methods to evaluate and select the optimum
location for thermal power plants.
Chatzimouratidis and Pilavachi (2012) proceeded to study and evaluate the electric
generating power plants in Greece from various aspects using the AHP method. Based
on their findings, geothermal power plants, wind power plants, biomass power plants,
nuclear power plants, combined cycle power plants, gas power plants, coal/lignite
power plants and oil power plants were, respectively, the first preferences for elec-
trical production in Greece. Asayesh and Raad (2014) carried out a performance
assessment of 26 gas stations in two northern cities of Iran and identified efficient
and inefficient stations. They applied the DEA method, treating each gas station
as a system with four input factors and three output factors.
Azadeh et al. (2014) proceeded to determine the optimum location among all alterna-
tives for establishing a wind power station. 25 cities, considering five districts within each
city, were studied, and finally the most efficient city and district were selected by identifying
input and output factors and applying the fuzzy DEA. El-Azab and Amin (2015) reviewed the
solar energy’s current status in the Middle East and North Africa. Also, they proposed an
algorithm for optimizing solar plants site selection. Jahangiri et al. (2016) used a GIS-based
method to determine the best location for wind-solar plants based on the data collected from
400 meteorological stations in the Middle East. Lee et al. (2017) used a hybrid multiple-
criteria decision-making approach for photovoltaic solar plant location selection. Rezaei
et al. (2018) used an MCDM method to determine the best location for the construction of a
wind-solar hybrid plant in the Fars province, Iran. The results show that Eghlid is the best
option for the construction of a wind-solar plant. According to the obtained results, the cities
of Firuzabad, Estahban, Safashahr, Bavanat, Izadkhast and Arsanjan hold the next ranks in
terms of suitability for the construction of solar-wind hybrid plants.
Zadeh (2011) introduced the concept of the Z-number, which can express experts’
information as a linguistic variable. This variable is an ordered pair (E, F), where
the first component E is a fuzzy constraint and F is defined as the reliability of
E. The proposed model is an integrated model based on the Z-number that not only
retains the DEA properties but is also capable of considering uncertainties in
decision-making units (DMUs) along with their relevant reliabilities.
Input and output values take the form of Z-numbers in this model. The fuzzy
values Ẽ_lh relate input l to the h-th DMU, and F̃_lh denotes their reliability in
the form of triangular fuzzy numbers. Equations (1)–(4) show the CCR model
based on Z-numbers, and Eqs. (5)–(8) are the dual form of Eqs. (1)–(4)
(Azadeh and Kokabi 2016).
Indicators:
H Indicators of DMUs
L Indicators of inputs
O Indicators of outputs
X Number of DMUs
Y Number of inputs
W Number of outputs
DMU n n-th DMU
DMU 0 Target DMU (m = 0)
Parameters:
Z̃_lh Z-number value of input l related to DMU h
Ẽ_lh Fuzzy value of input l related to DMU h
F̃_lh Fuzzy reliability value of input l related to DMU h
Z̃_oh Z-number value of output o related to DMU h
Variables:
a_h Weight variables in the proposed model to obtain the efficiency
X_0 Objective (efficiency) value of the DEA model

Min X_0 (1)
s.t.
Σ_{h=1}^{a} a_h Z̃_lh ≤ X_0 Z̃_l0 , l = 1, …, y (2)
Σ_{h=1}^{a} a_h Z̃_oh ≥ Z̃_o0 , o = 1, …, w (3)
a_h ≥ 0 , h = 1, …, a (4)

Max X_0 = Σ_{o=1}^{w} v_o Z̃_o0 (5)
s.t.
Σ_{l=1}^{y} g_l Z̃_l0 = 1 (6)
Σ_{o=1}^{w} v_o Z̃_oh − Σ_{l=1}^{y} g_l Z̃_lh ≤ 0 , h = 1, …, a (7)
v_o , g_l ≥ 0 , o = 1, …, w ; l = 1, …, y (8)
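As a sanity check of the CCR logic before any Z-number machinery, note that the single-input, single-output special case has a closed form: each DMU's efficiency is its output/input ratio divided by the best ratio. The snippet below is a generic illustration with crisp, hypothetical data, not the Z-DEA model itself, which needs an LP solver for multiple inputs and outputs.

```python
def ccr_single(inputs, outputs):
    """CCR efficiency for the single-input, single-output special case.

    With one input and one output, the CCR program reduces to
    eff_h = (y_h / x_h) / max_k (y_k / x_k), so the best DMU scores 1.0.
    """
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]

# Hypothetical DMUs (candidate cities): input = land cost, output = generation
effs = ccr_single(inputs=[2.0, 4.0, 5.0], outputs=[4.0, 10.0, 5.0])
# ratios 2.0, 2.5, 1.0 -> efficiencies 0.8, 1.0, 0.4
```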
The above models are nonlinear. To linearize them, a defuzzification method is
first applied, yielding the membership functions of the reliability amounts,
F̃ = {(X, μ_F̃(X)) | X ∈ [0, 1]}, where μ_F̃(X) is the membership function of
the reliability amount. Equation (9) applies the center of gravity (COG) method
(Azadeh and Kokabi 2016).

U = ∫ X μ_F̃(X) dX / ∫ μ_F̃(X) dX (9)
Assuming that the reliability amounts of DMUs take the shape of triangular
membership functions with parameters (e, f, d), we have:

U = (e + f + d) / 3 (10)
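The closed form in Eq. (10) can be checked numerically against the integral definition in Eq. (9); the triangle below uses the "usually" parameters from Table 1 and a simple Riemann sum.

```python
def tri_membership(x, e, f, d):
    """Triangular membership function with support [e, d] and peak at f."""
    if x <= e or x >= d:
        return 0.0
    return (x - e) / (f - e) if x <= f else (d - x) / (d - f)

def cog(e, f, d, n=20000):
    """Center of gravity, Eq. (9), approximated by a Riemann sum over [0, 1]."""
    xs = [i / n for i in range(n + 1)]
    num = sum(x * tri_membership(x, e, f, d) for x in xs)
    den = sum(tri_membership(x, e, f, d) for x in xs)
    return num / den

# 'usually' from Table 1: (0.65, 0.75, 0.85); Eq. (10) gives (e + f + d)/3 = 0.75
u = cog(0.65, 0.75, 0.85)
```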
Equation (11) transforms the input and output amounts of DMUs into the gravity
Z-number with an abnormal triangular membership function.
For more details, the esteemed readers are suggested to refer to Azadeh and Kokabi
(2016).
The fuzzy programming of the Z-CCR model is presented in Expressions (12–15).
Equations (16–21) are the dual model of the Z-CCR.
Max t_p = Σ_{o=1}^{w} v_o (y^l_op , y^m_op , y^u_op) (12)
s.t.
Σ_{l=1}^{q} u_l (x^l_lp , x^m_lp , x^u_lp) = (1, 1, 1) (13)
Σ_{o=1}^{w} v_o (y^l_oh , y^m_oh , y^u_oh) − Σ_{l=1}^{q} u_l (x^l_lh , x^m_lh , x^u_lh) ≤ 0 , h = 1, …, a (14)
v_o , u_l ≥ 0 , o = 1, …, w ; l = 1, …, q (15)

Max β_p = Σ_{o=1}^{w} y_op (16)
s.t.
Σ_{l=1}^{q} x_lp = 1 (17)
Σ_{o=1}^{w} y_oh − Σ_{l=1}^{q} x_lh ≤ 0 , h = 1, …, a (18)
u_l (α x^m_lh + (1 − α) x^l_lh) ≤ x_lh ≤ u_l (α x^m_lh + (1 − α) x^u_lh) , l = 1, …, q ; h = 1, …, a (19)
v_o (α y^m_oh + (1 − α) y^l_oh) ≤ y_oh ≤ v_o (α y^m_oh + (1 − α) y^u_oh) , o = 1, …, w ; h = 1, …, a (20)
v_o , u_l ≥ 0 , o = 1, …, w ; l = 1, …, q (21)
In this research, information related to the indices applied in this paper was collected
from the Statistical Center of Iran, NIGC, and Tavanir. The Statistical Center of Iran
was established to create a centralized statistical system aiming to provide accurate
and comprehensive statistics in different economic and social fields to meet the
needs of scientific and research planning in Iran. In this section, the procedure
followed in the study, along with the results of the Z-DEA approach for locating
a gas power plant in Iran, is explained in detail.
4.3 Determining the Efficiency of Each Index with F-DEA and Z-DEA
Models
According to Table 1, a specific reliability is assigned to every input and output
variable based on its interval. The reliability amounts are determined by
experts and stated by three linguistic variables, “sure”, “usually” and “likely”
(Azadeh and Kokabi 2016). Figure 1 shows the fuzzy sets of the linguistic
reliability values. The Z-DEA model is then implemented in MATLAB v. 2014.
Table 1. Classification of reliability values given by experts (Azadeh and Kokabi 2016)
Z = (E, F) Interval data Linguistic variable Membership functions parameters
[15,20] Sure [0.8, 1, 1]
[10,15) Usually [0.65, 0.75, 0.85]
[1,10) Likely [0.5, 0.6, 0.7]
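Table 1's interval-to-reliability mapping can be encoded directly; the triangles below are the membership-function parameters from the table, and the function is a straightforward transcription for illustration, not part of the authors' MATLAB code.

```python
def reliability(value):
    """Map an interval-scaled data value to its linguistic reliability triangle
    (e, f, d), following Table 1."""
    if 15 <= value <= 20:
        return "sure", (0.8, 1.0, 1.0)
    if 10 <= value < 15:
        return "usually", (0.65, 0.75, 0.85)
    if 1 <= value < 10:
        return "likely", (0.5, 0.6, 0.7)
    raise ValueError("value outside the intervals covered by Table 1")

label, tri = reliability(12)  # -> ("usually", (0.65, 0.75, 0.85))
```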
In the Z-DEA and F-DEA models, it is necessary to consider different α-cuts due to
the high uncertainty in the model. In this paper, the efficiency of each DMU (i.e., city)
is measured at 14 different α-cuts: 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7,
0.8, 0.9, 0.95, 0.99 and 1.
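For a triangular fuzzy number (e, f, d), each α-cut is the interval obtained by slicing the triangle at height α; a one-line sketch of this generic fuzzy-arithmetic step (not code specific to this paper):

```python
def alpha_cut(e, f, d, alpha):
    """Alpha-cut [lower, upper] of the triangular fuzzy number (e, f, d):
    at alpha = 0 the full support [e, d], at alpha = 1 the peak [f, f]."""
    return (e + alpha * (f - e), d - alpha * (d - f))

# Cuts of the 'usually' triangle at a few of the paper's alpha levels
cuts = {a: alpha_cut(0.65, 0.75, 0.85, a) for a in (0.01, 0.5, 1.0)}
```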
Fig. 1. Fuzzy sets of linguistic reliability values “likely”, “usually” and “sure”
(membership µ) (Azadeh and Kokabi 2016)
W_i = |h̄ − h̄_i| / Σ_i |h̄ − h̄_i| (22)

where h̄ is the full average efficiency and h̄_i is the average efficiency of the i-th factor
(Fig. 2).
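Equation (22) can be computed directly from the average efficiencies; the numbers below are made up purely for illustration.

```python
def factor_weights(h_bar, h_i):
    """Normalized factor weights W_i from Eq. (22): each weight is the absolute
    deviation of the factor's average efficiency from the full average,
    normalized so the weights sum to 1."""
    devs = [abs(h_bar - h) for h in h_i]
    total = sum(devs)
    return [d / total for d in devs]

# Hypothetical full average 0.8 and three per-factor averages
w = factor_weights(0.8, [0.7, 0.9, 0.6])
# deviations 0.1, 0.1, 0.2 -> weights 0.25, 0.25, 0.5
```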
5 Conclusions
Providing the required electricity obligates the development of electric power industry
and subsequent establishment of new power plants. Allocating a proper location to
construct a power plant, greatly impacts on efficiency, energy generation trend, etc. The
main goal of the present study is an appropriate location-allocation to a gas power plant
using the novel Z-DEA method for the first time. In this paper, 11 indices were
introduced. To evaluate and rank the cities as well as weigh the indices, the proposed
model was the Z-DEA. After efficiency measurements with the Z-DEA for 14 a-cuts,
the optimum alpha was determined at 0.01, exerting the noise analysis and statistical
tests. Index weighing and determining the preferences of indices were performed with
the sensitivity analysis and according to the obtained results, the generated electricity
index weighing 30.68%, was the most important index in location-allocation to the
power plant while the population and labor force rate index weighting 0.13% was
identified as the least important index. Hamedan, Gilan and Zanjan were ranked
respectively the most proper locations for allocating to the gas power plant.
References
Aras, H., Erdoğmuş, Ş., Koç, E.: Multi-criteria selection for a wind observation station location
using analytic hierarchy process. Renew. Energy 29(8), 1383–1392 (2004)
Asayesh, R., Raad, Z.F.: Evaluation of the relative efficiency of gas station by data envelopment
analysis. Int. J. Data Envel. Anal. Oper. Res. 12–15 (2014)
Azadeh, A., Ghaderi, S., Maghsoudi, A.: Location optimization of solar plants by an integrated
hierarchical DEA PCA approach. Energy Policy 36(10), 3993–4004 (2008)
Azadeh, A., Kokabi, R.: Z-number DEA: a new possibilistic DEA in the context of Z-numbers.
Adv. Eng. Inform. 30(3), 604–617 (2016)
Azadeh, A., Rahimi-Golkhandan, A., Moghaddam, M.: Location optimization of wind power
generation–transmission systems under uncertainty using hierarchical fuzzy DEA: a case
study. Renew. Sustain. Energy Rev. 30, 877–885 (2014)
Chatzimouratidis, A.I., Pilavachi, P.A.: Technological, economic and sustainability evaluation of
power plants using the Analytic Hierarchy Process. Energy Policy 37(3), 778–787 (2009)
Chatzimouratidis, A.I., Pilavachi, P.A.: Decision support systems for power plants impact on the
living standard. Energy Convers. Manag. 64, 182–198 (2012)
Choudhary, D., Shankar, R.: An STEEP-fuzzy AHP-TOPSIS framework for evaluation and
selection of thermal power plant location: a case study from India. Energy 42(1), 510–521
(2012)
El-Azab, R., Amin, A.: Optimal solar plant site selection. In: Paper presented at the SoutheastCon
2015 (2015)
Jahangiri, M., Ghaderi, R., Haghani, A., Nematollahi, O.: Finding the best locations for
establishment of solar-wind power stations in Middle-East using GIS: A review. Renew.
Sustain. Energy Rev. 66, 38–52 (2016)
Lam, P.-L., Shiu, A.: Efficiency and productivity of China’s thermal power generation. Rev. Ind.
Organ. 24(1), 73–93 (2004)
Lee, A.H., Kang, H.-Y., Liou, Y.-J.: A hybrid multiple-criteria decision-making approach for
photovoltaic solar plant location selection. Sustainability 9(2), 184 (2017)
Lior, N.: Sustainable energy development: the present (2011) situation and possible paths to the
future. Energy 43(1), 174–191 (2012)
Ren, F.: Optimal site selection for thermal power plant based on rough sets and multi-objective
programming. In: Paper presented at the 2010 International Conference on E-Product
E-Service and E-Entertainment (ICEEE) (2010)
Rezaei, M., Mostafaeipour, A., Qolipour, M., Tavakkoli-Moghaddam, R.: Investigation of the
optimal location design of a hybrid wind-solar plant: A case study. Int. J. Hydrogen Energy
43(1), 100–114 (2018)
Zadeh, L.A.: A note on Z-numbers. Inf. Sci. 181(14), 2923–2932 (2011)
Optimization of Power Plant Operation
via Stochastic Programming with Recourse
1 Introduction
Because of prevalent environmental problems, the need to spread awareness about
the use of renewable energy is an urgent concern worldwide. In the Paris Agreement,
which entered into force in 2016, the long-term goal is to keep the increase in the
global average temperature within 2 °C above pre-industrial levels. To achieve this,
efforts are under way all over the world to adopt smart communities. Furthermore,
in Japan, the reexamination of energy costs has been treated as a management
engineering problem due to the liberalization of the electricity market. Against this
background, the introduction of renewable energy as a new type of energy supply
in large-scale facilities such as factories and shopping centers is being studied.
However, as the output of renewable energy is unstable, decision making under
uncertainty is required at the time of introduction.
In this research, an optimization model for operation planning using stochastic
programming by introducing photovoltaic power generation as renewable energy into
factory energy plant was developed. We showed that the modeling by stochastic
programming is more suitable as a realistic operation plan than conventional deter-
ministic mixed integer programming method and economic evaluation on the imple-
mented plan was carried out.
Figure 1 shows the outline of the basic model, which can quantitatively evaluate the energy cost of the smart community. The industrial model of energy consumption at the factory is based on a benchmark problem seeking an
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 937–948, 2020.
https://doi.org/10.1007/978-3-030-21803-4_93
938 T. Fukuba et al.
optimum operation plan of a factory energy plant, presented by Suzuki and Okamoto [1] and Inui and Tokoro [2].
The optimization model for photovoltaic generation extends the benchmark problem by
including photovoltaic generation and a storage battery. To consider the uncertainty of
photovoltaic power generation, stochastic programming is applied.
The benchmark problem considers an energy plant that purchases electricity and gas, as shown in the dotted frame in Fig. 1, and generates electricity, heat, and steam to meet the demand. As equipment, there are a gas turbine, a boiler, two kinds of refrigerators, and a thermal storage tank. The objective of the benchmark problem is to establish an operation plan that minimizes the cost of purchasing electricity and gas while satisfying
the constraints on equipment and energy balance. The decision variables consist of variables related to the amounts of energy purchased and generated, and variables concerning the start and stop of each device.
The photovoltaic generation introduction model includes MW-class photovoltaic generation equipment, whose generated electricity flows to the demand and to the turbo refrigerators. The photovoltaic power is used both for in-house consumption and for sale. We assume that the storage battery can store only the electric power from photovoltaic generation, and that some of the charged electricity is lost by the time of discharge. Figure 1 shows the energy flow of the energy plant when photovoltaic generation and storage batteries are introduced.
The uncertainty of photovoltaic generation output is expressed using a set of
deterministic scenarios. Because the uncertainty of the output of photovoltaic gener-
ation also affects the entire energy flow, decision variables related to the purchase
amount of energy and generation amount are also defined for each scenario. As a result,
the number of decision variables increases according to the number of scenarios. When
the number of scenarios is 30, the number of decision variables increases from 192 to
5826, considering the introduction of photovoltaic power generation.
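The bookkeeping behind this growth can be sketched in a few lines. The counts below are illustrative placeholders, not the paper's actual breakdown of the 192 and 5826 variables (the stochastic model also adds new battery and photovoltaic variables, which a simple duplication formula does not capture):

```python
def num_variables(n_first_stage: int, n_second_stage: int, n_scenarios: int) -> int:
    """First-stage decisions (e.g. the on/off states y) are shared by all
    scenarios; second-stage decisions (energy purchase, generation, battery
    flows) are duplicated once per scenario."""
    return n_first_stage + n_scenarios * n_second_stage

# illustrative counts only, not the paper's data
print(num_variables(96, 96, 1), num_variables(96, 96, 30))
```

The count grows roughly linearly in the number of scenarios K; reproducing the exact figure 5826 would require the actual variable breakdown, which is not given here.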
Optimization of Power Plant Operation 939
Table 1 shows the definitions of the symbols used for formulating the photovoltaic power generation introduction model:

Table 1. (continued)

– $CE_r^i, CF_r^i$: purchase cost of electricity and gas in time $i$
– $E_L^i, Q_L^i, S_L^i$: demand of electricity, heat, and steam in time $i$
– $E_{rm}^i, \tilde S_{rm}^i$: remaining amount of electric energy and steam in time $i$
– $K$: number of scenarios
– $Pr_k$: probability of scenario $k$
– $E_{pv}^{i,k}$: electric energy of photovoltaic power generation in time $i$ under scenario $k$
– $z_{sb}^{init}$: initial storage of the storage battery
– $E_{sb}^{max}$: capacity of the storage battery
– $a$: charge and discharge efficiency

Decision Variable
– $x_{t,j}^{i,k}, x_{s,j}^{i,k}$: heat production of turbo refrigerator $j$ and steam absorption refrigerator $j$ in time $i$ under scenario $k$
– $x_g^{i,k}, x_b^{i,k}$: gas consumption of the gas turbine and the boiler in time $i$ under scenario $k$
– $y_{t,j}^i, y_{s,j}^i$: state of turbo refrigerator $j$ and steam absorption refrigerator $j$ in time $i$ (1 for running, 0 for stopped)
– $y_g^i, y_b^i$: state of the gas turbine and the boiler in time $i$ (1 for running, 0 for stopped)
– $z_{in}^{i,k}$: electric energy charged from the photovoltaic generation to the storage battery in time $i$ under scenario $k$
– $z_{out}^{i,k}$: discharge of the storage battery in time $i$ under scenario $k$
– $z_{pv}^{i,k}$: electric energy directly consumed from photovoltaic power generation in time $i$ under scenario $k$
– $z_{sb}^{i,k}$: storage of the storage battery in time $i$ under scenario $k$

Function
– $Q_{ts}^{i,k}(x_t^{i,k}, x_s^{i,k}, Q_{ts}^{i-1,k}, Q_L^i)$: heat storage of the thermal storage tank in time $i$ under scenario $k$
– $E_r^{i,k}(x_g^{i,k}, x_t^{i,k}, z_{pv}^{i,k}, z_{out}^{i,k}, E_L^i, E_{rm}^i)$: purchase of electric power in time $i$ under scenario $k$
– $S_{rm}^{i,k}(x_g^{i,k}, x_b^{i,k}, x_s^{i,k}, S_L^i)$: remaining amount of steam in time $i$ under scenario $k$
– $f_{ge}(x_g^{i,k})$: power generation of the gas turbine in time $i$ under scenario $k$
– $f_{gs}(x_g^{i,k})$: steam generation of the gas turbine in time $i$ under scenario $k$
– $f_b(x_b^{i,k})$: steam generation of the boiler in time $i$ under scenario $k$
– $f_{t,j}(x_{t,j}^{i,k})$: power input of turbo refrigerator $j$ in time $i$ under scenario $k$
– $f_{s,j}(x_{s,j}^{i,k})$: steam input of steam absorption refrigerator $j$ in time $i$ under scenario $k$
$$Q_{t,j}^{\min} y_{t,j}^i \le x_{t,j}^{i,k} \le Q_{t,j}^{\max} y_{t,j}^i, \quad i = 1,\dots,I,\; j = 1,\dots,N_t,\; k = 1,\dots,K \qquad (5)$$

$$Q_{s,j}^{\min} y_{s,j}^i \le x_{s,j}^{i,k} \le Q_{s,j}^{\max} y_{s,j}^i, \quad i = 1,\dots,I,\; j = 1,\dots,N_s,\; k = 1,\dots,K \qquad (6)$$

$$E_g^{\min} y_g^i \le f_{ge}(x_g^{i,k}) \le E_g^{\max} y_g^i, \quad i = 1,\dots,I,\; k = 1,\dots,K \qquad (7)$$

$$S_b^{\min} y_b^i \le f_b(x_b^{i,k}) \le S_b^{\max} y_b^i, \quad i = 1,\dots,I,\; k = 1,\dots,K \qquad (8)$$

$$y_{t,j}^i - y_{t,j}^{i-1} \le y_{t,j}^s, \quad s = i+1,\dots,\min\{i+L_{t,j}-1, I\},\; i = 2,\dots,I,\; j = 1,\dots,N_t \qquad (9)$$

$$y_{t,j}^{i-1} - y_{t,j}^i \le 1 - y_{t,j}^s, \quad s = i+1,\dots,\min\{i+L_{t,j}-1, I\},\; i = 2,\dots,I,\; j = 1,\dots,N_t \qquad (10)$$

$$y_{s,j}^i - y_{s,j}^{i-1} \le y_{s,j}^s, \quad s = i+1,\dots,\min\{i+L_{s,j}-1, I\},\; i = 2,\dots,I,\; j = 1,\dots,N_s \qquad (11)$$

$$y_{s,j}^{i-1} - y_{s,j}^i \le 1 - y_{s,j}^s, \quad s = i+1,\dots,\min\{i+L_{s,j}-1, I\},\; i = 2,\dots,I,\; j = 1,\dots,N_s \qquad (12)$$

$$y_g^i - y_g^{i-1} \le y_g^s, \quad s = i+1,\dots,\min\{i+L_g-1, I\},\; i = 2,\dots,I \qquad (13)$$

$$y_g^{i-1} - y_g^i \le 1 - y_g^s, \quad s = i+1,\dots,\min\{i+L_g-1, I\},\; i = 2,\dots,I \qquad (14)$$

$$y_b^i - y_b^{i-1} \le y_b^s, \quad s = i+1,\dots,\min\{i+L_b-1, I\},\; i = 2,\dots,I \qquad (15)$$
$$y_b^{i-1} - y_b^i \le 1 - y_b^s, \quad s = i+1,\dots,\min\{i+L_b-1, I\},\; i = 2,\dots,I \qquad (16)$$

$$z_{sb}^{i-1,k} + a\, z_{in}^{i,k} = z_{sb}^{i,k} + z_{out}^{i,k}, \quad i = 1,\dots,I,\; k = 1,\dots,K; \qquad z_{sb}^{0,k} = z_{sb}^{I,k} = z_{sb}^{init} \qquad (17)$$

$$z_{sb}^{i,k} \le E_{sb}^{\max}, \quad i = 1,\dots,I,\; k = 1,\dots,K \qquad (18)$$

$$E_{pv}^{i,k} = z_{pv}^{i,k} + z_{in}^{i,k}, \quad i = 1,\dots,I,\; k = 1,\dots,K \qquad (19)$$

$$x_{t,j}^{i,k} \ge 0, \quad i = 1,\dots,I,\; j = 1,\dots,N_t,\; k = 1,\dots,K \qquad (20)$$

$$x_{s,j}^{i,k} \ge 0, \quad i = 1,\dots,I,\; j = 1,\dots,N_s,\; k = 1,\dots,K \qquad (21)$$

$$z_{sb}^{i,k} \ge 0, \quad i = 1,\dots,I-1,\; k = 1,\dots,K \qquad (23)$$
where

$$Q_{ts}^{i,k}\!\left(x_t^{i,k}, x_s^{i,k}, Q_{ts}^{i-1,k}, Q_L^i\right) = \sum_{j=1}^{N_t} x_{t,j}^{i,k} + \sum_{j=1}^{N_s} x_{s,j}^{i,k} - Q_L^i + Q_{ts}^{i-1,k} + Q_{loss}, \quad i = 1,\dots,I,\; k = 1,\dots,K, \qquad Q_{ts}^{0,k} = Q_{ts}^{init} \qquad (27)$$

$$E_r^{i,k}\!\left(x_g^{i,k}, x_t^{i,k}, z_{pv}^{i,k}, z_{out}^{i,k}, E_L^i, E_{rm}^i\right) = \sum_{j=1}^{N_t} f_{t,j}(x_{t,j}^{i,k}) + E_L^i + E_{rm}^i - f_{ge}(x_g^{i,k}) - z_{pv}^{i,k} - z_{out}^{i,k} \ge 0, \quad i = 1,\dots,I,\; k = 1,\dots,K \qquad (28)$$

$$S_{rm}^{i,k}\!\left(x_g^{i,k}, x_b^{i,k}, x_s^{i,k}, S_L^i\right) = f_{gs}(x_g^{i,k}) + f_b(x_b^{i,k}) - \sum_{j=1}^{N_s} f_{s,j}(x_{s,j}^{i,k}) - S_L^i, \quad i = 1,\dots,I,\; k = 1,\dots,K \qquad (29)$$
$$f_{ge}(x_g^{i,k}) = a_{ge}\, x_g^{i,k}, \quad i = 1,\dots,I,\; k = 1,\dots,K \qquad (30)$$

$$f_{gs}(x_g^{i,k}) = a_{gs}\, x_g^{i,k}, \quad i = 1,\dots,I,\; k = 1,\dots,K \qquad (31)$$

$$f_b(x_b^{i,k}) = a_b\, x_b^{i,k}, \quad i = 1,\dots,I,\; k = 1,\dots,K \qquad (32)$$
$$f_{t,j}(x_{t,j}^{i,k}) = a_{t,j}\, x_{t,j}^{i,k}, \quad i = 1,\dots,I,\; j = 1,\dots,N_t,\; k = 1,\dots,K \qquad (33)$$

$$f_{s,j}(x_{s,j}^{i,k}) = \frac{x_{s,j}^{i,k}}{a_{s,j} \{x_{s,j}^{i,k}\}^2 + b_{s,j}\, x_{s,j}^{i,k} + c_{s,j}}, \quad i = 1,\dots,I,\; j = 1,\dots,N_s,\; k = 1,\dots,K \qquad (34)$$
Objective function (1) represents the minimization of the expected value of the
purchase cost of electricity and gas. Constraints (2) and (3) represent the capacity
constraints of the thermal storage tank. Inequality (2) represents the case of the first
time zone to the (I − 1) time zone, and (3) represents the case of the I time zone.
Equation (4) is a constraint on the remaining amount of steam. Inequalities (5)–(8)
represent the capacity constraints of the energy production amounts of two types of
refrigerators, gas turbines, and boilers. Inequalities (9)–(16) are constraints on on/off
decision of each device, and they are represented by a linear inequality based on the
unit commitment problem [3]. For these restrictions, two constraints are used for each unit: the first means that once a unit starts, it must keep its operating state for a certain period of time, and the second represents the analogous condition for shutdown. Constraint (17) is the storage balance of the storage battery. To account for the electric power lost during charge and discharge, $z_{in}^{i,k}$ is multiplied by the efficiency $a$. The initial charge amount and the storage amount in time zone $I$ are the same. Constraint (18) represents the capacity of the storage battery.
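Constraints (17) and (18) can be checked mechanically for a candidate charge/discharge plan. The following pure-Python sketch, with made-up numbers, simulates the storage balance with charging efficiency $a$ and the cyclic condition on the initial storage:

```python
def battery_feasible(z_in, z_out, z_init, a, cap):
    """Check the storage balance (17) and capacity (18) for a charge/discharge
    plan: z_sb[i] = z_sb[i-1] + a*z_in[i] - z_out[i], with 0 <= z_sb <= cap
    and the cyclic condition z_sb[I] = z_sb[0] = z_init."""
    z = z_init
    for cin, cout in zip(z_in, z_out):
        z += a * cin - cout          # a fraction (1 - a) is lost while charging
        if z < 0.0 or z > cap:
            return False
    return abs(z - z_init) < 1e-9

# a 4-period plan (made-up numbers): charge 2 MWh twice, discharge it later
print(battery_feasible([2, 2, 0, 0], [0, 0, 1.7, 1.7], 0.5, 0.85, 5.0))  # True
```

Overcharging beyond the capacity, or ending away from the initial storage, makes the same plan infeasible.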
Equation (19) shows that the power generated by the photovoltaic generation is
divided into the power to satisfy the demand and the power charged in the storage battery.
Constraints (20)–(26) represent non-negative constraints and 0–1 constraints of decision
variables.
Function (27) represents the heat storage amount of the thermal storage tank. Function (28) represents the purchase amount of electricity; since power selling is not considered, it is constrained to be non-negative. Function (29) represents the remaining amount of steam. Functions (30)–(34) represent the relational expressions between the input and output quantities of the gas turbine, the boiler, and the two types of refrigerators. In particular, Function (34), the input-output relation of the steam absorption refrigerator, is a non-convex nonlinear expression chosen to reflect a practical operating plan. Because such constraints are non-convex, they are difficult to deal with.
This research model includes the nonlinear constraint (34) expressing the relationship between the input and output quantities of the steam absorption refrigerator. To treat the model as a mixed integer programming problem, (34) was linearized by piecewise linear approximation [4].
Let $\hat f_{s,j}(x_{s,j}^{i,k})$ be the approximation of $f_{s,j}(x_{s,j}^{i,k})$, and let $(x_{s,j,l}, f_{s,j}(x_{s,j,l}))$, $l = 1,\dots,p_j$ ($i = 1,\dots,I$, $j = 1,\dots,N_s$, $k = 1,\dots,K$), be the split points of the function. The approximation $\hat f_{s,j}(x_{s,j}^{i,k})$ is given by Eqs. (35)–(37). Constraints (38) and (39) represent the SOS2 condition that at most two adjacent $\lambda_{j,l}^{i,k}$ are positive. Because the piecewise linear approximation adds decision variables, the number of decision variables further increases, resulting in a large-scale mixed integer programming problem.
$$x_{s,j}^{i,k} = \sum_{l=1}^{p_j} \lambda_{j,l}^{i,k}\, x_{s,j,l} \qquad (35)$$

$$\hat f_{s,j}(x_{s,j}^{i,k}) = \sum_{l=1}^{p_j} \lambda_{j,l}^{i,k}\, f_{s,j}(x_{s,j,l}) \qquad (36)$$

$$\sum_{l=1}^{p_j} \lambda_{j,l}^{i,k} = 1, \quad \lambda_{j,l}^{i,k} \ge 0,\; l = 1,\dots,p_j \qquad (37)$$

$$\lambda_{j,1}^{i,k} \le \mu_{j,1}^{i,k}, \quad \lambda_{j,2}^{i,k} \le \mu_{j,1}^{i,k} + \mu_{j,2}^{i,k}, \quad \dots, \quad \lambda_{j,p_j}^{i,k} \le \mu_{j,p_j-1}^{i,k} + \mu_{j,p_j}^{i,k} \qquad (38)$$

$$\sum_{l=1}^{p_j} \mu_{j,l}^{i,k} = 1, \quad 0 \le \mu_{j,l}^{i,k} \in \mathbb{Z},\; l = 1,\dots,p_j \qquad (39)$$
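The λ-form (35)–(37) under the SOS2 restriction (38)–(39) can be exercised numerically: for a fixed value of $x_{s,j}^{i,k}$, only the two split points bracketing it receive positive weights, so the approximation reduces to linear interpolation. A sketch with invented coefficients for the curve (34) (the paper's $a_{s,j}, b_{s,j}, c_{s,j}$ are not given):

```python
import bisect

def f(x, a=0.001, b=0.05, c=1.0):
    """Stand-in for the steam-input curve (34); coefficients are invented."""
    return x / (a * x ** 2 + b * x + c)

def pwl(x, pts):
    """Evaluate the lambda-form approximation (35)-(37): under the SOS2
    condition (38)-(39) only the two split points bracketing x get a
    positive lambda, so the value is a linear interpolation between them."""
    r = min(bisect.bisect_right(pts, x), len(pts) - 1)
    lo, hi = pts[r - 1], pts[r]
    if hi == lo:
        return f(lo)
    lam = (x - lo) / (hi - lo)
    return (1.0 - lam) * f(lo) + lam * f(hi)

pts = [0.0, 5.0, 10.0, 15.0, 20.0]
print(pwl(7.5, pts))      # a value between f(5) and f(10)
```

At a split point the approximation is exact; between split points it interpolates, which is exactly what the SOS2 constraints enforce inside the MILP.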
The exact algorithm using piecewise linear approximation is as follows: at each iteration, we increase the number of split points and thus improve the accuracy of the piecewise linear approximation. This makes it possible to solve the large-scale mixed integer programming problem.
Piecewise Linear Approximation Algorithm
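A minimal sketch of such a refinement loop, doubling the number of split points until a sampled error estimate falls below a tolerance (the target function below is a stand-in, not the paper's, and the sampled grid only approximates the true worst-case error):

```python
import bisect

def pwl_eval(f, x, pts):
    """Linear interpolation of f between the two split points bracketing x."""
    r = min(bisect.bisect_right(pts, x), len(pts) - 1)
    lo, hi = pts[r - 1], pts[r]
    if hi == lo:
        return f(lo)
    lam = (x - lo) / (hi - lo)
    return (1.0 - lam) * f(lo) + lam * f(hi)

def refine(f, a, b, tol=1e-3, max_segments=4096):
    """Double the number of segments until the sampled gap between f and its
    piecewise linear interpolant drops below tol."""
    n = 2
    grid = [a + (b - a) * i / 400 for i in range(401)]
    while True:
        pts = [a + (b - a) * i / n for i in range(n + 1)]
        err = max(abs(f(x) - pwl_eval(f, x, pts)) for x in grid)
        if err <= tol or n >= max_segments:
            return pts, err
        n *= 2

pts, err = refine(lambda x: x * x, 0.0, 1.0)
print(len(pts), err)
```

Each doubling roughly quarters the interpolation error of a smooth curve, so few iterations are needed to reach a given accuracy.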
To evaluate the solution of the stochastic programming method, we use the VSS (value of the stochastic solution) [5] of the stochastic programming problem.
Let $\xi$ be the random variable. For a realization $\xi$ of the random variable, we define the optimization problem

$$\min_x z(x, \xi) = c^T x + \min_y \{\, q^T y \mid W y = h - T x,\; y \ge 0 \,\} \qquad (40)$$

$$\text{s.t.} \quad A x = b,\; x \ge 0 \qquad (41)$$

The optimal objective function value RP (recourse problem) of the stochastic programming problem is defined as

$$RP = \min_x E_{\xi}[\, z(x, \xi) \,] \qquad (42)$$
We define the optimal objective function value ADP (average deterministic problem) of the deterministic problem in which the random variable $\xi$ is replaced by its mean value $\bar\xi$, and let the optimal solution of this problem be $\bar x(\bar\xi)$:

$$ADP = \min_x z(x, \bar\xi) \qquad (43)$$
The objective function value $RP_{\bar\xi}$ obtained when $\bar x(\bar\xi)$ is applied to the stochastic programming problem is defined as

$$RP_{\bar\xi} = E_{\xi}[\, z(\bar x(\bar\xi), \xi) \,] \qquad (44)$$

$$VSS = RP_{\bar\xi} - RP \qquad (45)$$

Because the optimal solution of the problem defining $RP_{\bar\xi}$ is a feasible solution to the problem defining RP, the following relation holds:

$$RP \le RP_{\bar\xi}, \quad VSS \ge 0 \qquad (46)$$
The problem defining RP is the stochastic programming model formulated as (1)–(34). On the other hand, the problem defining ADP is a deterministic model that fixes the output of the photovoltaic generation at its average value. $RP_{\bar\xi}$ then measures how much the deterministic solution of the ADP problem costs under uncertainty.
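The quantities RP, ADP, $RP_{\bar\xi}$ and VSS can be illustrated on a tiny two-stage example with an explicit recourse cost; the data below are made up for illustration and have nothing to do with the plant model:

```python
def z(x, xi, c=1.0, q=3.0):
    """Two-stage cost as in (40): first-stage cost c*x plus recourse cost
    q*max(0, xi - x) for covering the unmet part of realization xi."""
    return c * x + q * max(0.0, xi - x)

scenarios = [(0.5, 1.0), (0.5, 3.0)]        # (probability, realization)
grid = [i * 0.5 for i in range(9)]          # candidate first-stage decisions

def exp_cost(x):
    return sum(p * z(x, xi) for p, xi in scenarios)

RP = min(exp_cost(x) for x in grid)                  # recourse problem value
xi_bar = sum(p * xi for p, xi in scenarios)          # mean-value scenario
x_bar = min(grid, key=lambda x: z(x, xi_bar))        # ADP solution
RP_bar = exp_cost(x_bar)                             # its cost under uncertainty
VSS = RP_bar - RP                                    # value of the stochastic solution
print(RP, RP_bar, VSS)                               # prints 3.0 3.5 0.5
```

Here the mean-value solution under-provisions for the high-demand scenario, and the positive VSS quantifies exactly that loss.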
$$\text{initial investment cost} = \sum_{i=1}^{n} \frac{\text{difference of cost}}{(1 + r)^i} \qquad (47)$$
The calculation results for a discount rate of r = 1 [%] are listed in Table 3. Comparing the recovery years of the present model, solved by the stochastic programming method, with those of the deterministic model, the initial investment cost recovery period is shorter for this research model. Regarding the capacity of the storage battery, the larger the capacity, the longer the recovery period.
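The recovery period implied by (47) is the smallest n whose discounted cumulative cost difference covers the initial investment; a direct transcription with illustrative numbers:

```python
def recovery_years(investment, annual_saving, r=0.01, horizon=50):
    """Smallest n such that sum_{i=1}^{n} annual_saving/(1+r)^i covers the
    initial investment, mirroring (47); None if never within the horizon."""
    total = 0.0
    for i in range(1, horizon + 1):
        total += annual_saving / (1 + r) ** i
        if total >= investment:
            return i
    return None

# illustrative: investment of 100 (money units), 12 saved per year, r = 1%
print(recovery_years(100.0, 12.0, r=0.01))   # prints 9
```

With discounting, the payback period is slightly longer than the undiscounted ratio 100/12 would suggest.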
Table 2. RP and $RP_{\bar\xi}$ ($E_{sb}^{\max} = 1$ [MWh]).
Month RP [yen] $RP_{\bar\xi}$ [yen] Month RP [yen] $RP_{\bar\xi}$ [yen]
1 3,946,209 3,947,316 7 3,872,237 3,873,733
2 3,921,607 3,921,607 8 3,880,117 3,881,212
3 3,906,488 3,908,086 9 3,925,601 3,925,601
4 3,881,487 3,882,983 10 3,938,757 3,938,757
5 3,868,814 3,870,060 11 3,953,877 3,955,884
6 3,896,726 3,898,624 12 3,956,763 3,956,763
6 Concluding Remarks
References
1. Suzuki, R., Okamoto, T.: An introduction of the energy plant operational planning problem: a
formulation and solutions. In: IEEJ International Workshop on Sensing, Actuation, and
Motion Control (2015)
2. Inui, N., Tokoro, K.: Finding the feasible solution and lower bound of the energy plant
operational planning problem by an MILP formulation. In: IEEJ International Workshop on
Sensing, Actuation, and Motion Control (2015)
3. Shiina, T., Birge, J.R.: Stochastic unit commitment problem. Int. Trans. Oper. Res. 11(1), 19–32 (2004)
4. Nemhauser, G., Wolsey, L.A.: Integer and Combinatorial Optimization. Wiley (1989)
5. Birge, J.R., Louveaux, F.: Introduction to Stochastic Programming. Springer, New York
(1997)
Randomized-Variants Lower Bounds for Gas Turbines Aircraft Engines

1 Introduction
Cost management in the aviation sector is a critical factor, considering the tight profit margins and the instability of economic performance. Due to the specificity of the aviation sector, ordinary errors can be catastrophic and lead to huge losses. Aircraft maintenance is one of the most important and most expensive air transport cost items after direct operational cost. Within aircraft maintenance, engine maintenance represents the highest cost and has the greatest effect on aircraft operations and on the continuity of the companies owning these aircraft. Hence the role of scheduling engine maintenance operations: to avoid errors and to ensure the longest working period, the lowest downtime, and the highest financial return. The main goal of this research is to maximize aircraft operation time without affecting the engine maintenance schedule. This goal will be
The authors would like to thank the Deanship of Scientific Research at Majmaah
University for supporting this work.
c Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 949–956, 2020.
https://doi.org/10.1007/978-3-030-21803-4_94
950 M. Jemmali et al.
2 Problem Description
The problem of maximizing the minimum completion time is described as follows. Consider a set P containing a fixed number of spare parts $Sp_n$ that have to be assigned to a deterministic number of turbines $Tu_n$. Each turbine is indexed by $i$ and denoted by $Tu_i$. The lifespan of each part $j$ is denoted by $lp_j$. Each turbine requires some parts to be replaced, and each turbine holds at most one part at a time. The available parts have no release date, i.e., the parts required for the maintenance process are ready for immediate delivery (delivery processing time = 0). This problem focuses on maximizing the minimum turbine operating time, denoted by $Tu_{min}$. We denote by $Cl_j$ the cumulative lifespan of part $j$.
Example 1. Let $Sp_n$ = 5 and $Tu_n$ = 2. Table 1 displays the lifespan $lp_j$ of each part $j$.
Randomized-Variants Lower Bounds for Gas Turbines Aircraft Engines 951
j 1 2 3 4 5
lpj 3 5 2 11 4
We assign all parts to the turbines by applying some algorithm. The schedule given by this assignment is illustrated in Fig. 1: turbine 1 receives parts 3, 1, and 4, while turbine 2 receives parts 5 and 2.
Fig. 1. Example schedule (turbine 1: parts 3, 1, 4; turbine 2: parts 5, 2).
Based on Fig. 1, turbine 1 has a total lifespan of 16, whereas turbine 2 has a total lifespan of 9. The maximum operating time is $Tu_{max}$ = 16 and the minimum operating time is $Tu_{min}$ = 9. The objective is to maximize the minimum operating time $Tu_{min}$, so we seek a more efficient algorithm yielding a minimum operating time greater than 9.
Using the standard three-field notation of [4], this problem can be denoted
as P ||Cmin .
We order all parts in non-increasing order of their lifespans. We then repeatedly assign the part having the greatest remaining lifespan to the most available turbine.
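This greedy rule is the classical LPT heuristic; a compact sketch using a heap for the turbine loads:

```python
import heapq

def lpt_cmin(lifespans, n_turbines):
    """LPT for P||Cmin: take parts in non-increasing lifespan order and give
    each one to the currently least-loaded turbine; return the minimum load."""
    loads = [0] * n_turbines
    heapq.heapify(loads)
    for lp in sorted(lifespans, reverse=True):
        heapq.heappush(loads, heapq.heappop(loads) + lp)
    return min(loads)

print(lpt_cmin([3, 5, 2, 11, 4], 2))   # prints 12
```

On the data of Example 1 ([3, 5, 2, 11, 4] on two turbines) this yields loads {13, 12}, i.e. $Tu_{min}$ = 12 instead of the 9 obtained by the schedule of Fig. 1.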
For this type of lower bound, our study is based on choosing the part with the largest lifespan and scheduling it on the turbine with the minimum total operating time, with some randomization. The lower bounds rely on a probabilistic choice among the k largest parts, with k ∈ {2, 3, 4, 5} for the lower bounds R1, R2, R3, and R4, respectively. The selected part is chosen among the k unscheduled parts having the largest lifespans, with a probability fixed as follows:
– We randomly choose a number r in [1, k]. The selected part is the r-th largest unscheduled part, and we schedule it on the most available turbine.
– Denote by Up the number of unassigned parts. If Up < k, then r is chosen randomly in [1, Up].
Algorithm 1 gives the result for a fixed k, but not the result of the proposed heuristic: the proposed heuristic requires iterating over the values from 2 to k. The following algorithm calculates the value obtained by the proposed heuristic $R_k$ for a fixed k.
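A sketch of the randomized variant: the best minimum load over several randomized passes, where each step picks uniformly among the k largest unscheduled parts. The iteration count and seed below are arbitrary choices, not taken from the paper:

```python
import heapq
import random

def randomized_cmin(lifespans, n_turbines, k, iterations=100, seed=0):
    """Randomized R_k sketch: at each step pick, uniformly, one of the k
    largest unscheduled parts and assign it to the least-loaded turbine;
    keep the best minimum load found over several passes."""
    rng = random.Random(seed)
    best = 0
    for _ in range(iterations):
        parts = sorted(lifespans, reverse=True)
        loads = [0] * n_turbines
        heapq.heapify(loads)
        while parts:
            r = rng.randrange(min(k, len(parts)))    # 0-based rank among top k
            heapq.heappush(loads, heapq.heappop(loads) + parts.pop(r))
        best = max(best, min(loads))
    return best

print(randomized_cmin([3, 5, 2, 11, 4], 2, k=3))
```

Fixing the seed makes the procedure reproducible; increasing the number of passes tightens the lower bound at the cost of running time, matching the Perc/Time trade-off reported in the experiments.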
4 Experimental Results
This section presents an analysis of the results obtained by the developed heuristics. The heuristics were coded in Microsoft Visual C++ (version 2013) and executed on an Intel(R) Core(TM) i5-3337U CPU @ 1.8 GHz with 8 GB RAM, running 64-bit Windows 10.
The proposed lower bounds were tested on a set of instances generated as described in [3]. The lifespan $lp_j$ was generated according to different probability distributions, each of which defines a class. The classes are:
– Class 1: lpj is generated from the discrete uniform distribution U [1, 100].
– Class 2: lpj is generated from the discrete uniform distribution U [20, 100].
– Class 3: lpj is generated from the discrete uniform distribution U [50, 100].
– Class 4: lpj is generated from the normal distribution N [50, 100].
– Class 5: lpj is generated from the normal distribution N [20, 100].
The generated instances were obtained by choosing $Sp_n$, $Tu_n$, and the class. The pair $(Sp_n, Tu_n)$ takes the values displayed in Table 2.
Spn T un
10 2,3,5
25 2,3,5,10,15
50 2,3,5,10,15
100 3,5,10,15
250 3,5,10,15
Table 3. Overall results of the heuristics.
LP T R1 R2 R3 R4
P erc T ime P erc T ime P erc T ime P erc T ime P erc T ime
26.8% - 60.3% 0.010 71.3% 0.019 79.4% 0.028 93.0% 0.038
Table 4. Results by number of spare parts Spn.
Spn LP T R1 R2 R3 R4
Gap T ime Gap T ime Gap T ime Gap T ime Gap T ime
10 1.97 - 0.60 0.001 0.14 0.002 0.03 0.002 0.01 0.003
25 1.57 - 0.58 0.002 0.28 0.004 0.14 0.006 0.04 0.008
50 0.87 - 0.23 0.004 0.10 0.007 0.04 0.011 0.04 0.014
100 0.32 - 0.17 0.009 0.10 0.018 0.06 0.027 0.01 0.036
250 0.07 - 0.03 0.034 0.02 0.066 0.01 0.099 0.00 0.131
Table 3 shows that the heuristic achieving the best lower bound is R4, with Perc = 93% in an average time of 0.038 s, compared with the LPT heuristic, which reaches only 26.8%.
The performance measure based on the number of spare parts $Sp_n$ is given in Table 4. The results show that the performance of the developed heuristics varies with $Sp_n$: for the heuristics LPT and R1, the performance measure improves as $Sp_n$ increases, while this is not the case for R2, R3, and R4. In general, however, all heuristics obtain their best Gap value when $Sp_n$ = 250. The table also shows that heuristic R4 obtains the best performance measure, for $Sp_n$ = 250, and heuristic LPT the worst, for $Sp_n$ = 10.
Table 5. Results by number of turbines Tun.
T un LP T R1 R2 R3 R4
Gap T ime Gap T ime Gap T ime Gap T ime Gap T ime
2 0.91 - 0.10 0.002 0.01 0.003 0.00 0.005 0.00 0.006
3 1.51 - 0.64 0.009 0.22 0.016 0.05 0.024 0.01 0.032
5 0.49 - 0.05 0.009 0.03 0.018 0.02 0.026 0.02 0.035
10 1.31 - 0.54 0.013 0.29 0.026 0.15 0.038 0.04 0.051
15 0.44 - 0.18 0.015 0.10 0.029 0.08 0.043 0.04 0.057
Table 5 shows the results of the performance measure based on the number of turbines $Tu_n$. The worst performance measure was obtained by the heuristic LPT when $Tu_n$ = 3, and the best performance measures were obtained by the heuristics
Class LP T R1 R2 R3 R4
Gap T ime Gap T ime Gap T ime Gap T ime Gap T ime
1 0.59 - 0.08 0.010 0.05 0.019 0.03 0.029 0.02 0.037
2 0.85 - 0.24 0.010 0.11 0.019 0.06 0.028 0.03 0.038
3 0.99 - 0.57 0.010 0.22 0.018 0.09 0.028 0.01 0.037
4 1.13 - 0.21 0.010 0.08 0.019 0.05 0.028 0.03 0.037
5 1.11 - 0.49 0.010 0.22 0.019 0.08 0.028 0.02 0.038
Spn T un LP T R1 R2 R3 R4
10 2 1.24 0.01 0.00 0.00 0.00
3 4.39 1.80 0.43 0.08 0.02
5 0.29 0.00 0.00 0.00 0.00
25 2 1.42 0.30 0.02 0.00 0.00
3 1.53 0.67 0.34 0.10 0.01
5 1.55 0.17 0.10 0.09 0.08
10 3.29 1.67 0.91 0.46 0.06
15 0.07 0.07 0.00 0.05 0.05
50 2 0.07 0.00 0.00 0.00 0.00
3 1.31 0.52 0.21 0.03 0.00
5 0.46 0.07 0.03 0.02 0.01
10 1.42 0.27 0.14 0.07 0.08
15 1.08 0.29 0.14 0.09 0.09
100 3 0.24 0.17 0.10 0.04 0.00
5 0.11 0.03 0.01 0.01 0.01
10 0.43 0.19 0.10 0.05 0.01
15 0.51 0.29 0.21 0.14 0.01
250 3 0.06 0.05 0.03 0.02 0.00
5 0.03 0.01 0.00 0.00 0.00
10 0.08 0.03 0.02 0.01 0.00
15 0.09 0.05 0.03 0.02 0.00
5 Conclusion
References
1. Edmunds, D.B.: Modular engine maintenance concept considerations for aircraft
turbine engines. Aircr. Eng. Aerosp. Technol. 50(1), 14–17 (1978)
2. Gharbi, A.: Scheduling maintenance actions for gas turbines aircraft engines. Con-
straints 10, 4 (2014)
3. Haouari, M., Jemmali, M.: Maximizing the minimum completion time on parallel
machines. 4OR 6(4), 375–392 (2008)
4. Lawler, E.L., Lenstra, J.K., Kan, A.H.R., Shmoys, D.B.: Sequencing and scheduling:
algorithms and complexity. Handb. Oper. Res. Manag. Sci. 4, 445–522 (1993)
5. Mokotoff, E.: Parallel machine scheduling problems: a survey. Asia-Pac. J. Oper.
Res. 18(2), 193 (2001)
6. Tan, Z., He, Y., Epstein, L.: Optimal on-line algorithms for the uniform machine
scheduling problem with ordinal data. Inf. Comput. 196(1), 57–70 (2005)
7. Walter, R., Lawrinenko, A.: Effective solution space limitation for the identical par-
allel machine scheduling problem. Technical report, Working Paper (2014). https://
fhg.de/WrkngPprRW
8. Walter, R., Wirth, M., Lawrinenko, A.: Improved approaches to the exact solution
of the machine covering problem. J. Sched. 20(2), 147–164 (2017)
9. Woeginger, G.J.: A polynomial-time approximation scheme for maximizing the min-
imum machine completion time. Oper. Res. Lett. 20(4), 149–154 (1997)
Robust Design of Pumping Stations
in Water Distribution Networks
1 Introduction
While the lifetime of pipes and water tanks usually reaches 100 years, the mean lifetime of a pump is closer to 20 years. Operators of water networks must therefore periodically proceed to the rehabilitation of pumping stations, with the characteristics of the other network assets already fixed. The problem is complex because, besides the strategic level–which pump combination to install?–it requires investigating the operational level–how to operate the installed pumps?–to evaluate and minimize the lifetime costs over the set of pump combinations. Moreover, evaluating the minimum operation costs brings into play, together, dynamic water demand and energy tariff profiles, discrete pump scheduling decisions, non-convex hydraulic laws, and uncertain long-term forecasts. Actually, solving the
c Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 957–967, 2020.
https://doi.org/10.1007/978-3-030-21803-4_95
958 G. Bonvin et al.
with x the on/off state of the pumps, q the flow through the arcs, and h the head (sum of pressure and elevation, in m) at the nodes, and:

$$\sum_{ij \in L} q_{ijt} = \sum_{ji \in L} q_{jit}, \quad t \in T,\; j \in J_J \qquad (1)$$

$$\sum_{ij \in L} q_{ijt} = S_j (h_{jt} - h_{jt-1}) + D_{jt}, \quad t \in T,\; j \in J_T \qquad (2)$$
Let $\kappa_k$ denote the class of pump $k \in K$. Constraints (7) limit the pump operation range, given $0 < Q_\kappa^{min} \le Q_\kappa^{max}$ (in m³), and bind flow values and pump activation states. Constraints (8) model the head loss due to friction in pipes. For each pipe $ij \in L_p$, the head loss-flow coupling function $\Phi_{ij}$ can be accurately approximated by a quadratic function $\Phi_{ij}(q) = A_{ij} q + B_{ij} q^2$ with $A_{ij} \ge 0$, $B_{ij} \ge 0$. Constraints (9) synchronize (given a large enough M value) the head increase of the active pumps. The function $\Psi_{\kappa t}$ depends on the manufacturer characteristics of class $\kappa$ and on the ageing $t$ of the pump. It can be accurately fitted from operating points as a quadratic function $\Psi_{\kappa t}(q) = \alpha_{\kappa t} - \beta_{\kappa t} q^2$, with $\alpha_{\kappa t} \ge 0$ and $\beta_{\kappa t} \ge 0$. We highlight that the head-flow coupling constraints (8) and (9) are actually equalities in the original–thus non-convex–formulation of the pump scheduling problem, but it is shown in [3] that the optimality gap of this relaxation is small.
Finally, the financial cost of an operation plan (x, q, h) is mainly incurred by the purchase of the electricity consumed by pumping, namely:

$$C_T^K(x, q, h) = \sum_{t \in T} \sum_{k \in K} C_t\, \Gamma_{\kappa_k t}(x_{kt}, q_{kt}), \qquad (10)$$

with $C_t \ge 0$ the actualized electricity price on period t and $\Gamma_{\kappa t}$ the power consumption function for each active pump of class $\kappa$ and ageing t, defined as a linear fit $\Gamma_{\kappa t}(x_{kt}, q_{kt}) = \lambda_{\kappa t} x_{kt} + \mu_{\kappa t} q_{kt}$.
In the considered water network design problem, only the set K of pumps must be sized, in a way that satisfies the future water demand D and minimizes the global cost–the sum of the actualized investment and operation costs–over a life span T of typically 20 years. The number of pumps in K is limited by the capacity $N \in \mathbb{N}^*$ of the pumping station. Each pump is selected from a given set of candidate classes $\kappa \in \mathcal{K}$ and acquired new at time t = 0 at a fixed investment cost $I_\kappa \ge 0$. We assume that the maximal efficiency and the head increase at constant flow decrease by 1% each year for all pumps, according to the empirical ageing model of [5]. Under this hypothesis, we assume in the definition of $\Psi_{\kappa t}$ that $\alpha_{\kappa t} = (1 - \frac{v_t}{100})\, \alpha_{\kappa 0}$ and $\beta_{\kappa t} = (1 - \frac{v_t}{100})\, \beta_{\kappa 0}$ for any time $t \in T$, with $v_t$ the duration from time 0 to t in years. Also, because the power consumption can alternatively be formulated as the product of the flow and the head increase divided by the efficiency, we assume that it does not change in time, i.e. $\lambda_{\kappa t} = \lambda_\kappa$ and $\mu_{\kappa t} = \mu_\kappa$ for $t \in T$ in the definition of $\Gamma_{\kappa t}$.
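The two quadratic fits and the ageing rule can be written directly; the coefficients below are made up, not taken from any catalog:

```python
def head_gain(q, alpha0, beta0, years, v=1.0):
    """Pump curve Psi(q) = alpha_t - beta_t*q^2 with the 1%-per-year ageing
    model alpha_t = (1 - v*t/100)*alpha_0, beta_t = (1 - v*t/100)*beta_0."""
    age = 1.0 - v * years / 100.0
    return age * alpha0 - age * beta0 * q * q

def pipe_loss(q, A, B):
    """Friction head loss Phi(q) = A*q + B*q^2 (quadratic approximation)."""
    return A * q + B * q * q

# made-up coefficients: a new pump vs the same pump after 20 years
print(head_gain(20.0, 120.0, 0.0625, 0), head_gain(20.0, 120.0, 0.0625, 20))
```

Since both coefficients shrink by the same factor, the aged curve is a uniform scaling of the new one, which is what keeps the linear power fit time-independent.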
Assuming that the water demand (Dt )t∈T and the electricity price (Ct )t∈T
are given, the optimal design problem is to find a set K of at most N pumps
of classes in K and an operation plan (x, q, h) ∈ PTK of minimum investment
and operation costs. As the life span extends into years while the operation
model requires a time resolution in minutes or hours, this model is not practical
due to its complexity, its dimension and the stochasticity of the water demand.
To address these issues, we formulate a robust variant of the problem where
the time horizon is decomposed into a given set of regular (resp. peak) days
3 Benders’ Decomposition
Problem (13) has a decomposable structure: we can separate the investment
variables y from the operational ones (x, q, h) to rewrite (13) as
Let $\{y^\iota\}_{\iota=0}^{\ell}$ be the trial points for (14). We split this sequence according to the feasibility of its elements: $O_\ell = \{\iota \in \{0,\dots,\ell\} : c(y^\iota) = 0\}$ and $F_\ell = \{0,\dots,\ell\} \setminus O_\ell$. Note that $F_\ell$ gathers (up to iteration $\ell$) the points that have been proved infeasible for problem (14). Setting $\ell = 0$, starting with a vector $y^0$ and defining the incumbent point $\hat y^0 = y^0$, our variant of [1, Alg. 4] defines trial points by iteratively solving the following MILP:

$$y^{\ell+1} \in \arg\min_y \; \tfrac{1}{2}\, \langle 1 - \hat y^\ell, y \rangle \quad \text{s.t.} \; \dots$$
In this section, we show how to accelerate the convergence of the solution algorithm by using a dominance relation on the set of combinations to generate more than one feasibility cut at a time, including in a preprocessing step. To simplify the presentation, we reasonably assume that a combination which is infeasible for a regular day is also infeasible for a peak day. We finally integrate into the definition of peak days a model of robustness to pump failures.
4.1 Dominance
By definition, the maximal flow a pump combination y can deliver for a given head increase $\alpha = h_s - h_r$ corresponds to activating all $N_\kappa(y)$ pumps of each class $\kappa \in \mathcal{K}$ with $\alpha_{\kappa 0} > \alpha$. It is thus equal to $Q_y(\alpha) = \sum_{\kappa \in \mathcal{K}} N_\kappa(y) \sqrt{\max\!\left(0, \frac{\alpha_{\kappa 0} - \alpha}{\beta_\kappa}\right)}$. We say that a combination y dominates y', and we note $y' \preceq y$, if $Q_{y'} \le Q_y$ on the allowed operation range $[\alpha^{min}, \alpha^{max}]$. The following proposition shows that all combinations dominated by an infeasible combination are infeasible too.
Proposition 1. For any combinations y and y' and any peak day $d \in D_P$, if $y' \preceq y$ and y is infeasible for (14), then so is y'.

Proof. For any combination y, the function $Q_y$ is strictly decreasing in the head increase, so $A_y = \{(q, \alpha) \in \mathbb{R}^2 \mid 0 \le q \le Q_y(\alpha^{min}),\; \alpha^{min} \le \alpha \le Q_y^{-1}(q)\}$
identifies the set of pairs of total flow and head increase which can be operated by y at any time t (because we relaxed the lower bound $Q_\kappa^{min}$ on peak days). Then, for any point in $R_d(y') = \{(x', q', h') \in \mathcal{R}_{T_d}^K \text{ s.t. } x'_d \le y'\; \forall d \in D_P\}$, which is built on a sequence of $T_d$ elements of $A_{y'} \subseteq A_y$, there exists a corresponding point in $R_d(y)$.
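A sketch of $Q_y(\alpha)$ and of a sampled dominance test, solving $\alpha = \alpha_{\kappa 0} - \beta_\kappa q^2$ for q; the pump counts and coefficients below are invented:

```python
import math

def max_flow(combo, classes, alpha):
    """Q_y(alpha): flow deliverable at head increase alpha when every pump
    whose shut-off head alpha_k0 exceeds alpha runs; from alpha = a0 - b0*q^2
    each such pump contributes q = sqrt((a0 - alpha)/b0)."""
    return sum(n * math.sqrt(max(0.0, (a0 - alpha) / b0))
               for n, (a0, b0) in zip(combo, classes))

def dominates(y, y2, classes, alphas):
    """y dominates y2 if Q_{y2} <= Q_y over the sampled operation range."""
    return all(max_flow(y2, classes, a) <= max_flow(y, classes, a) for a in alphas)

classes = [(130.0, 0.01), (110.0, 0.02)]     # (alpha_k0, beta_k0), invented
alphas = range(91, 141)                      # sampled range [alpha_min, alpha_max]
print(dominates([2, 1], [1, 1], classes, alphas))   # prints True
```

Checking dominance on a sampled grid is a pragmatic stand-in for the functional inequality; adding a pump of any class can only enlarge $Q_y$, so the test confirms that the larger combination dominates.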
assume that all the 6 existing pumps should be replaced by at most N = 6 new ones selected from the KSB manufacturer catalog [8]. We have preselected a set $\mathcal{K}$ of 19 classes of pumps compatible with the network's allowed range of pressure [$\alpha^{min}$ = 91, $\alpha^{max}$ = 140]. For each class $\kappa \in \mathcal{K}$, we interpolated the curves $\Psi_\kappa$ and $\Gamma_\kappa$ from the catalog data and estimated the investment cost $I_\kappa = 4\,400 \left(\frac{p_\kappa}{74.6}\right)^{0.67} + 19\,300 \left(\frac{p_\kappa}{52}\right)^{0.77}$ following [11, Table 9-50]. Considering a planning horizon of T = 20 years, we built a set $D_R$ of 12 regular days, each representative of a week day or a week-end in one of six 2-month periods (January-February, March-April, ...) of any year. For each regular day $d \in D_R$, we define $L_d = L_d^0 \sum_{l=1}^{20} (1 + \tau)^{1-l}$, with $L_d^0$ the number of days represented by d in 2013 and $\tau$ = 5% the discount rate. The demand and tariff profiles $D^d$ and $C^d$ for a regular day d are averaged over the $L_d^0$ represented days of 2013. The unique peak day is built from the day in 2013 with the highest instantaneous demand, initializing the tanks at their minimum level $H^{min}$. We fix the time resolution (duration of the time steps) to 2 h for regular days and to 4 h for peak days.
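The weight $L_d$ compresses the 20-year horizon into one discounted day count; a direct transcription, where $L_d^0$ = 40 is an arbitrary example value:

```python
def weighted_days(l0, tau=0.05, years=20):
    """L_d = L0_d * sum_{l=1}^{years} (1 + tau)^(1 - l): days represented by
    the profile d, discounted over the planning horizon at rate tau."""
    return l0 * sum((1.0 + tau) ** (1 - l) for l in range(1, years + 1))

print(round(weighted_days(40), 2))
```

With $\tau$ = 5%, the 20-year factor is about 13.1, so each representative day stands for roughly 13 discounted occurrences per day it represents in the base year.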
The computations were performed on a Xeon E5-2650V4 2.2 GHz with 254
GB RAM. The algorithms were implemented in Python (with tol = 10^−3), and
the master and slave programs solved with Gurobi 7.0.2.
to 66 kW. The purchase cost of the new combination is then half the present
value of the installed combination. Second, the operation costs for year 2013,
when estimated with PTK, are reduced by 24.3%, a reduction mainly driven by
a better usage of the pumps (+18.9% in average efficiency), which demonstrates
a better match to the demand.
6 Conclusion
In this paper, we tackled the water network design problem in the context of
networks equipped with pumping stations, where the operation costs must be
estimated dynamically in addition to the investment costs. We formulated differ-
ent approximations to estimate the operation costs to be embedded in a stabi-
lized Benders’ decomposition approach. We also ensured a certain robustness of
the solutions to stress operation conditions and derived dominance arguments to
accelerate the cutting-plane algorithm. Tested on a realistic instance, the
approximations turned out to be accurate and the algorithm fast. While some
features of our implementation are specific to the FRD network, and more generally
to a class of branched network defined in [3], the method can be generalized to a
variety of water networks after identifying an accurate continuous relaxation for
the operation problem. The potential higher complexity and larger optimality
gap should however be evaluated in practice.
References
1. van Ackooij, W., Frangioni, A., de Oliveira, W.: Inexact stabilized Benders' decomposition approaches with application to chance-constrained problems with finite
support. Comput. Optim. Appl. 65(3), 637–669 (2016)
2. Alperovits, E., Shamir, U.: Design of optimal water distribution systems. Water
Resour. Res. 13(6), 885–900 (1977)
3. Bonvin, G., Demassey, S., Le Pape, C., Maïzi, N., Mazauric, V., Samperio, A.: A
convex mathematical program for pump scheduling in a class of branched water
networks. Appl. Energy 185, 1702–1711 (2017)
4. Bragalli, C., D’Ambrosio, C., Lee, J., Lodi, A., Toth, P.: On the optimal design
of water distribution networks: a practical MINLP approach. Optim. Eng. 13(2),
219–246 (2012)
5. Bunn, S.: Ageing pump efficiency: the hidden cost thief? In: Distribution Systems
Symposium & Exposition (2009)
6. Burgschweiger, J., Gnädig, B., Steinbach, M.: Optimization models for operative
planning in drinking water networks. Optim. Eng. 10(1), 43–73 (2009)
7. D’Ambrosio, C., Lodi, A., Wiese, S., Bragalli, C.: Mathematical programming tech-
niques in water network optimization. Eur. J. Oper. Res. 243(3), 774–788 (2015)
8. KSB: Multitec: high-pressure pumps in ring-section design - booklet with performance curves. https://shop.ksb.com/
9. Marques, J., Cunha, M., Savić, D.A.: Multi-objective optimization of water distri-
bution systems based on a real options approach. Environ. Model. Softw. 63, 1–13
(2015)
Robust Design of Pumping Stations in Water Distribution Networks 967
10. Murphy, L., Dandy, G., Simpson, A.: Optimum design and operation of pumped
water distribution systems. In: Conference on Hydraulics in Civil Engineering
(1994)
11. Perry, R., Green, D., Maloney, J.: Perry’s Chemical Engineers’ Handbook, 7th edn.
McGraw-Hill (1997)
12. Raghunathan, A.U.: Global optimization of nonlinear network design. SIAM J.
Optim. 23(1), 268–295 (2013)
Engineering Systems
Application of PLS Technique to Optimization
of the Formulation of a Geo-Eco-Material
1 Introduction
Since the earliest civilizations, earth has been used as a major building material.
Raw earth material is attracting renewed interest thanks to its green characteristics:
very little energy is needed to transform it into a building material compared to
conventional building materials like concrete. It is recyclable and requires only
simple technology. Building with raw earth, which is a local material, is also a
solution to avoid energy-intensive transport [1].
The principal properties of raw earth material are its mechanical strength,
shrinkage and swelling, hygrothermal behavior and cracking [2]. Raw earth alone
is often not adequate to achieve the performance needed for a building material,
and different stabilizers are used to improve its properties [3]. When the mechanical
strength of a raw earth material is concerned, stabilizers such as lime, gypsum and
cement can be used. Several studies have examined the influence of these binders on
raw earth properties in order to produce an improved construction material [4]. As an
example, Zak et al. [1] studied the effect of reinforcement by natural fibers, gypsum
and cement on the compressive strength of unfired earth bricks. They showed that
mixing earth material with gypsum has no favorable effect on the compressive
strength.
A new concrete made of raw earth was recently developed by Cematerre, a French
firm from Normandy, in collaboration with the University of Le Havre Normandie. Its
originality and advantage lie in its ability to be cast in place like a conventional
concrete. This raw earth concrete, an eco-geo-material, is composed of four
components: silt, lime, cement and water. The principal goal of this paper is to
optimize the formulation of raw earth concrete to improve its mechanical strength.
Design of Experiments (DoE) is a good approach to search for such a formulation
depending on the proportions of the mixture components [5]. To this end,
combinations of four-constituent mixtures were formulated using a D-optimal
mixture design. The experimental domain was determined according to several
constraints. Several series of laboratory tests were performed to establish model
formulations targeting the sought compressive strength after a 90-day curing time.
Thereafter, the derived model was validated. A multivariate statistical regression
method, PLS (Partial Least Squares projections to latent structures), was chosen to
examine the design [6, 7], because of the complicated experimental design data
resulting from the different constraints on the model. The PLS projections on
latent structures were calculated to estimate the coefficients of the model. Finally,
the results were analyzed to improve and optimize formulations of raw earth concrete
materials using the PLS loading plot and the response trace plot.
statistical regression method of PLS, Partial Least Squares projections to latent
structures, is a better choice than the MLR technique.
The PLS technique was originally introduced in chemometrics and economics [11].
The pioneering work on PLS was performed by Wold in the field of
econometrics [6]. It has since been successfully employed in other scientific areas
such as medicine, bioinformatics, computer vision and civil engineering. A PLS model
is an approximation of an underlying complicated reality. The aim of the PLS
technique is to predict a set of response variables from a set of predictor variables
through latent variables, commonly by means of an iterative process. The PLS
technique has two basic advantages: it produces low-rank approximations of the data
that are aligned with the response, and it then uses low-rank approximations of both
the input data and the response data to estimate the final regression model. The PLS
technique comes in two variants, linear and nonlinear. A major limitation of linear
PLS is that many problems display nonlinear characteristics, in which case nonlinear
PLS is appropriate [6].
Together with the PLS technique, tools such as scores and loadings can be used
to interpret the model. Within the PLS modelling framework, particular
attention is given to plotting model parameters like scores and loadings, since
such plots are constructive and useful for model verification and interpretation [12].
Analysis with the PLS technique provides several advantages [7]: (i) it is not
limited by the number of experiments or the degrees of freedom of the model; (ii) if
there are several response variables, their covariance is taken into account in the model;
(iii) in addition to the results given by the effect or response-surface analysis, the PLS
technique provides tools to detect outliers. As each experiment (except the center one)
is often not replicated, it is difficult to locate an outlier with classical methods.
The PLS technique has been extensively explained in the literature [13] and its
application to mixture data was detailed by Kettaneh-Wold [14].
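The iterative process mentioned above is typically the NIPALS algorithm. The following minimal single-response sketch (illustrative only, not the software used in this study) extracts latent variables one at a time, deflates the data, and assembles the regression coefficients:

```python
import numpy as np

def pls_nipals(X, Y, n_components):
    """Minimal NIPALS PLS sketch. X: (n, p) predictors, Y: (n, m) responses,
    both assumed centered. Returns B mapping centered X to centered Y."""
    X, Y = X.astype(float).copy(), Y.astype(float).copy()
    W, P, Q = [], [], []
    for _ in range(n_components):
        u = Y[:, [0]]                        # initial response score
        if np.linalg.norm(u) < 1e-12:        # response fully explained
            break
        for _ in range(100):                 # inner iteration until convergence
            w = X.T @ u
            w /= np.linalg.norm(w)           # X weights
            t = X @ w                        # X scores
            q = Y.T @ t / (t.T @ t)          # Y loadings
            u_new = Y @ q / (q.T @ q)
            if np.linalg.norm(u_new - u) < 1e-12:
                u = u_new
                break
            u = u_new
        p = X.T @ t / (t.T @ t)              # X loadings
        X -= t @ p.T                         # deflation of X and Y
        Y -= t @ q.T
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = (np.hstack(M) for M in (W, P, Q))
    return W @ np.linalg.inv(P.T @ W) @ Q.T  # regression coefficients
```

With as many components as predictors and noiseless data, the fitted values reproduce the response exactly; with fewer components, PLS yields the low-rank approximation described above.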
3 Eco-Geo-Materials
Eco-geo-materials analyzed herein are composed of four constituents: silt, lime, cement
and water. The soil material used for the raw earth concrete is natural silt. Silt is a
granular material with a size between sand and clay, whose mineral origin is quartz and
feldspar [15]. It was selected because it is locally available in abundance on the
construction site. For this natural silt, the effective diameter, Hazen uniformity factor
and curvature factor are respectively equal to 32 µm, 4.37 and 0.94. The Atterberg
limits are 20% for the liquid limit and 6% for the plasticity index [16].
Lime and Portland cement are used as the two binders to produce the raw earth concrete.
The lime comes from the Proviacal® DD range. The cement is CEM I 52.5 N,
in accordance with the NF EN 197-1, NF P 15-318 and NF EN 196-10 standards. It is a
Portland cement composed of 95 to 100% clinker, with an unconfined compressive
strength (UCS) at 28 days of 52.5 MPa (lowest value) and an ordinary
short-term strength class (2 or 7 days). More information about these two binders'
properties is given in Eid 2017 [17]. Potable laboratory tap water was used to
prepare the raw earth concrete.
974 S. Imanzadeh et al.
The mixing was performed in a laboratory mixer with a capacity of four
liters. Molds of 100 mm in height and 50 mm in diameter were then filled by
vibration for two minutes on a vibrating table. Thereafter, the specimens were stored
for a 90-day curing time in a controlled laboratory environment. The laboratory-prepared
raw earth concrete specimens with different mix proportions were examined using the
Unconfined Compressive Strength (UCS) test. The specimens were sheared along an
unconfined compressive strength path in accordance with the NF P94-420 and NF P94-425
French standards. The unconfined compressive strength test yields the
stress-strain curve.
A quadratic polynomial model was chosen for the four constituents to obtain the model
that best fits the experimental measurements and then to predict the
unconfined compressive strength (UCS) of a raw earth concrete for any mixture
of constituents. The quadratic model considers binary interactions for all
possible pairs of components (Eq. 1).
UCS = a_0 + Σ_{i=1}^{n} a_i x_i + Σ_{i,j=1}^{n} a_ij x_i x_j (Quadratic model) (1)
where UCS is the response, in other words the unconfined compressive strength of a
raw earth concrete in MPa. In Eq. 1, a_0 is a constant, n is the number of constituents,
x_i and x_j are the quantities of constituents in weight percent, a_i are the regression
coefficients for the linear terms and a_ij are the regression coefficients for the binary
interaction terms.
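The model of Eq. 1 is straightforward to evaluate once the coefficients are known. A small sketch (the function name and the 0-based indexing are ours):

```python
from itertools import combinations

def ucs_quadratic(x, a0, a, a_pair):
    """Evaluate the quadratic mixture model of Eq. 1.
    x      : constituent proportions x_1..x_n (0-based here)
    a0, a  : constant and linear coefficients a_i
    a_pair : dict {(i, j): a_ij} for i < j (binary interaction coefficients)"""
    n = len(x)
    ucs = a0 + sum(a[i] * x[i] for i in range(n))
    ucs += sum(a_pair[(i, j)] * x[i] * x[j]
               for i, j in combinations(range(n), 2))
    return ucs
```

For instance, with the coefficients of Table 3 and a row of Table 2, this returns the predicted UCS of that formulation.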
The Analysis Of Variance (ANOVA) involves several diagnostic tools to confirm
the validity of the models [18, 19]. The first two diagnostic tools are the determination
coefficients R² and R²adj, which give information on the ability of the model to fit the
measured values. Furthermore, the Q² coefficient evaluates the model validity, that is,
its ability to predict new data. Afterwards, a first and a second F-test should
be evaluated. The first F-test is the regression model significance test; it compares the
regression variance to the residual variance. For the second F-test, the residual error is
split into two parts: lack of fit, due to imperfection of the model, and pure error,
estimated from replicate data [20].
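These diagnostics can be computed directly from the fitted and measured values; Q² is obtained from the leave-one-out PRESS statistic as Q² = 1 − PRESS/SS_tot. A sketch with illustrative helper names:

```python
import numpy as np

def fit_diagnostics(y, y_hat, n_params):
    """R^2 and adjusted R^2 for a fitted model (illustrative)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot
    n = len(y)
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - n_params)
    return r2, r2_adj

def q2_press(X, y, fit_predict):
    """Q^2 from leave-one-out PRESS. fit_predict(X_train, y_train, X_test)
    must refit the model and return predictions for X_test."""
    press = 0.0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        press += (y[i] - fit_predict(X[mask], y[mask], X[i:i + 1])[0]) ** 2
    return 1 - press / np.sum((y - y.mean()) ** 2)
```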
Three constraints must be considered:
(1) Fundamental constraint: the sum of the components of the mixture is equal
to 100% by weight for all the mixes of the design,
(2) Economic and ecological mixture constraints: the raw earth concrete should be
designed to offer convenient mechanical properties for use as a construction
building material. In addition, it should not be energy-intensive. For this
reason, the amount of the two binders, cement and lime, should be limited.
Therefore, the maximum amounts of cement and lime are limited to 16%
and 12% respectively, and a condition on the maximum binder proportion was
fixed: Cement % + Lime % < 16%. In agreement with the above-mentioned
constraints, the mixing range selected for each constituent is presented in Table 1.
(3) Workability constraint: workability has an important role in the mechanical
properties. Properties like cohesion, plasticity and consistency can affect the
workability of a raw earth concrete. In this paper, an S3 consistency level
calibrated by the standard slump test was used to confirm a fluidity close to that of
a very plastic concrete, in conformity with the standard NF EN 206-1.
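Constraints (1) and (2) can be screened programmatically before any laboratory test; constraint (3) requires a physical slump test and is omitted. A sketch with proportions expressed as mass fractions (our convention; the strict "<" of the binder limit is relaxed by a tolerance so that boundary mixes such as 16% cement pass):

```python
def admissible(silt, lime, cement, water, tol=1e-9):
    """Screen a candidate mix (mass fractions in [0, 1]) against
    constraints (1) and (2); the workability constraint (3) needs
    a physical slump test and is not reproduced here."""
    total_ok  = abs(silt + lime + cement + water - 1.0) <= tol  # sums to 100 %
    cement_ok = cement <= 0.16                                  # max 16 % cement
    lime_ok   = lime <= 0.12                                    # max 12 % lime
    binder_ok = cement + lime < 0.16 + tol                      # binder limit
    return total_ok and cement_ok and lime_ok and binder_ok
```

For example, formulation 1 of Table 2, (0.7283, 0.0000, 0.0400, 0.2317), is admissible, while a mix with 20% lime is rejected.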
5 Model Validation
The quadratic polynomial model was first fit to the measured experimental values,
followed by a thorough examination of R², R²adj and Q² to find the most adequate model
representing the measured experimental data. The values of R², R²adj and Q² are
respectively 0.979, 0.976 and 0.978 for the 90-day curing time. For this curing time,
the values of R² and R²adj can be considered good. In addition, the model shows very
good predictive relevance (Q² > 0.9). Furthermore, the chosen model passed the first
and second F-tests. Thus, the selected model can be accepted as valid. Once the
best-fitting model was chosen for this curing time, an equation for predicting the
Unconfined Compressive Strength (UCS) was obtained for raw earth concrete
formulations. The regression coefficients of the model are shown in Table 3.
Table 2. Experimental design of the D-optimal design for a raw earth concrete (xi is the
proportion of the different mixture constituents)
Formulation Silt (x1) Lime (x2) Cement (x3) Water (x4)
1 0.7283 0.0000 0.0400 0.2317
2 0.5784 0.1200 0.0400 0.2616
3 0.5784 0.1200 0.0400 0.2616
4 0.6088 0.0000 0.1600 0.2312
5 0.6088 0.0000 0.1600 0.2312
6 0.7381 0.0000 0.0400 0.2219
7 0.7381 0.0000 0.0400 0.2219
8 0.5882 0.1200 0.0400 0.2518
9 0.6185 0.0000 0.1600 0.2215
10 0.6185 0.0000 0.1600 0.2215
11 0.6784 0.0400 0.0400 0.2416
12 0.6284 0.0800 0.0400 0.2516
13 0.6382 0.0800 0.0400 0.2418
14 0.5886 0.0800 0.0800 0.2514
15 0.5987 0.0400 0.1200 0.2413
16 0.5983 0.0800 0.0800 0.2417
17 0.6084 0.0400 0.1200 0.2316
18 0.6483 0.0400 0.0800 0.2317
19 0.6434 0.0400 0.0800 0.2366
20 0.6434 0.0400 0.0800 0.2366
21 0.6434 0.0400 0.0800 0.2366
Table 3. Model coefficients for the unconfined compressive strength of raw earth concrete
Coefficients Quadratic model (curing time: 90-day)
a0 18.19
a1 −11.52
a2 −286.01
a3 326.28
a4 6.38
a12 210.62
a13 −106.54
a14 −84.12
a23 −355.60
a24 660.80
a34 −765.81
Fig. 1. Interpretation of the unconfined compressive strength response, UCS, by a PLS loading
plot
found that the distance of the orthogonal projection to the origin for the cement
constituent is longer than for these interaction terms. Cement is clearly the main factor
contributing to an increase in UCS: when the amount of cement increases, the
unconfined compressive strength also increases. The positive effect of the silt-lime
interaction on UCS is slightly stronger than that of the silt-water interaction. On the
contrary, the lime and water constituents and both the lime-cement and cement-water
interactions have a negative impact on the unconfined compressive strength. Finally,
the silt component and the lime-water and silt-cement interactions have little impact
on UCS. They can be neglected because their Variable Importance for Projection
(VIP) values are less than 0.8 [6].
Fig. 2. Response surface contour plot of unconfined compressive strength for 90-day curing
time (cement = 8%)
7 Conclusions
In this research study, it was shown that the Design of Experiments method is an
optimization tool well adapted to examining the unconfined compressive strength of a
raw earth material. The design tests were determined by considering three different
constraints: ecological, economic and workability. A multivariate statistical regression
technique, PLS, was selected to evaluate the design. This PLS technique was applied
because of the complicated experimental design data together with the different
constraints on the model. Thanks to the PLS loading plot, the negative or positive
roles of each constituent and the existing interactions between constituents were
pointed out. For example, the model showed the unfavorable role of lime on the UCS
response; on the other hand, the favorable role of the silt-lime interaction on the UCS
response was illustrated. This can be explained by the fact that, when the percentage of
lime exceeds a certain threshold, it has an insignificant effect on UCS and behaves like
an addition of fine particles to the raw earth material, which reduces UCS. The
silt-lime interaction effect on UCS is nevertheless favorable because lime reacts in two
steps: first, cationic exchange within the silt particles changes the granulometry of the
silt, and thereafter pozzolanic reactions produce cementation between the silt
particles, increasing the UCS. The eco-geo-material formulation was optimized based
on surface plots. In the presence of complicated experimental design data with
different constraints on the model, PLS is a good alternative to the more classical
multiple linear regression for predicting a suitable mixture composition.
References
1. Zak, P., Ashour, T., Korjenic, A., Korjenic, S., Wu, W.: The influence of natural
reinforcement fibers, gypsum and cement on compressive strength of earth bricks materials.
Constr. Build. Mater. 106, 179–188 (2016)
2. Ashour, T., Korjenic, A., Korjenic, S., Wu, W.: Thermal conductivity of unfired earth bricks
reinforced by agricultural wastes with cement and gypsum. Energy Build. 104, 139–146
(2015)
3. Carmen Jimenez Delgado, M., Canas Guerrero, I.: Earth building in Spain. Constr. Build.
Mater. 20, 679–690 (2006)
4. Al-Mukhtar, M., Lasledj, A., Alcover, J.F.: Lime consumption of different clayey soils.
Appl. Clay Sci. 95, 133–145 (2014)
5. Herrero, A., Ortiz, M.C., Sarabia, L.A.: D-optimal experimental design coupled with parallel
factor analysis 2 decomposition a tool in determination of triazines in oranges by
programmed temperature vaporization-gas chromatography-mass spectrometry when using
dispersive-solid phase extraction. J. Chromatogr. A 1288, 111–126 (2013)
6. Wold, S., Kettaneh-Wold, N., Skagerberg, B.: Nonlinear PLS modelling. Chemometr. Intell.
Lab. Syst. 7, 53–65 (1989)
7. Eriksson, L., Byrne, T., Johansson, E., Trygg, J., Vikstrom, C.: Multi- and megavariate data
analysis: basic principles and applications, 3rd edn. Umetrics Academy (2013)
8. Myers, R.H., Montgomery, D.C., Anderson-Cook, C.M.: Response surface methodology:
process and product optimization using designed experiments, 4th edn. Wiley, New York
(2016)
9. Eriksson, L., Johansson, E., Kettaneh-Wold, N., Wikström, C., Wold, S.: Design of
Experiments: Principles and Applications. Umetrics AB, Umeå Learnways AB, Stockholm
(2000)
10. Geladi, P., Kowalski, B.R.: Partial least-squares regression: a tutorial. Anal. Chim. Acta 185,
1–17 (1986)
11. Wold, H.: Path Models with Latent Variables: The NIPALS Approach. Academic Press,
New York (1975)
12. Hoskuldsson, A.: Prediction Methods in Science and Technology. Thor Publishing,
Denmark (1996)
13. Li, B., Morris, A.J., Martin, E.B.: Generalized partial least squares regression based on the
penalized minimum norm projection. Chemom. Intell. Lab. Syst. 72, 21–26 (2004)
14. Kettaneh-Wold, N.: Analysis of mixture data with partial least squares. Chemometr. Intell.
Lab. Syst. 14, 57–69 (1992)
15. Assallay, A.M., Rogers, C.F., Smalley, I.J., Jefferson, I.F.: Silt. Earth-Sci. Rev. 45, 20–30
(1998)
16. Imanzadeh, S., Hibouche, A., Jarno, A., Taibi, S.: Formulating and optimizing the
compressive strength of a raw earth concrete by mixture design. Constr. Build. Mater. 163,
149–159 (2018)
17. Eid, J.: New construction material based on raw earth: cracking mechanisms, corrosion
phenomena and physico-chemical interactions. Eur. J. Environ. Civ. Eng. 8189, 1–16 (2017)
18. Box, G.E., Stuart Hunter, J., Hunter, W.G.: Statistics for experimenters: design, innovation,
and discovery, 2nd edn. Wiley (2005)
19. Fisher, R.A.: The Design of Experiments. Hafner, Libraries Australia, New York (1971)
20. Goupy, J.: Plans d’expériences : les mélanges. DUNOD, Paris (2001)
Databases Coupling for Morphed-Mesh
Simulations and Application on Fan
Optimal Design
1 Introduction
With the improvement of measurement and computing facilities such as high-performance
computing, the data collected become more and more complex and informative,
characterized by increasing dimensionality and larger sample sizes [1,5], seriously
challenging our ability to keep pace with the need to precisely model the systems
we seek to design.
In design optimization, it is common practice to extract design information
through a Design of Experiments (DoE) based modeling process [4,14].
Supported by NSFC No.51575498.
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 981–990, 2020.
https://doi.org/10.1007/978-3-030-21803-4_97
982 Z. Zhang et al.
However, the number of samples grows dramatically with the dimension and
size of the design space. Usually it is too computationally costly to obtain
enough information for the entire design space. Consequently, optimizations are
restricted to the neighborhoods of some reference configurations or operating
conditions, and the results are often incrementally "improved" designs
instead of global optima. Nevertheless, data used for one optimization can be
relevant for another, and both can be used for the exploration of the design
space [18].
Inspired by the idea of data clustering, this work draws attention to the
coupling of existing databases which have been used for different optimizations.
Links between these disperse data can be approximated, which helps to capture
the global properties of the design space. In contrast to DoE-based modeling
approaches, only a few databases are used, each relying on either
a new geometrical configuration or a new operating condition. The coupling of
databases is performed in two steps:
(1) Data smoothing: data are collected from a number of discrete measurements,
which are not arbitrarily distributed but roughly centered at a reference
configuration in the design space. A smoothing process is applied to extract
information from the database, so that continuous extrapolation can be
performed away from the reference configuration;
(2) Database coupling: extrapolation from one single database suffers from
truncation error, so the region effectively covered by one database is limited
to a certain radius. For a large-scale design space with disperse reference
configurations, the coupling of several databases is necessary.
The first operation basically creates a "local" model with a limited acting range,
for which a mesh-morphing technique is employed. The second step makes use of the
"local" models to establish a global one by means of the direct co-Kriging method.
This paper begins with a description of the database generation using the
mesh-morphing technique; the direct co-Kriging method is then used to couple the
databases. This methodology is employed for the aerodynamic shape optimization
of an automotive engine cooling fan that was optimized in a previous
study [22].
2 Database Generation
For simulation-based aerodynamic shape optimization, design parameters can
be categorized into geometrical parameters and physical parameters. Variation of
a geometrical parameter results in different geometrical configurations, which
requires a remeshing process. Variation of a physical parameter results in different
operating conditions, which can simply be handled through modification of the
boundary conditions.
A mesh deformation method is applied for the variation of geometrical parameters.
It relies on a morphing technique which calculates the mesh (node) displacements
with a Radial Basis Function (RBF) approach [15,16].
Fig. 1. (a). Mesh morphing driven by control points (Ref. [16]) (b). An illustration of
surface mesh deformation
One single reference mesh is used for one geometrical configuration. Derivatives
can be deduced from the results of new CFD simulations based on deformed meshes.
Compared with the classical parametrization method, which parametrizes the
geometrical configuration directly and is followed by a re-meshing process for each
new configuration, the mesh deformation approach deals with one single reference
mesh, which more likely conserves the total number of mesh elements and ensures a
similar discretization of the computational domains.
Figure 1b compares the original surface mesh with a mesh deformed following
a variation of sweep angle. The variation of parameters follows a selected finite
difference scheme, where the round point in the center represents the reference mesh
and the diamond-shaped points represent the deformed meshes.
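The RBF morphing idea can be sketched as follows: displacements imposed at a few control points are interpolated to every mesh node through a radial basis (here a Gaussian basis with a hypothetical support radius; this is an illustration of the approach in [15,16], not the actual implementation):

```python
import numpy as np

def rbf_morph(nodes, ctrl_pts, ctrl_disp, radius=1.0):
    """Propagate control-point displacements to all mesh nodes with a
    Gaussian RBF interpolant.
    nodes:     (N, 3) mesh node coordinates
    ctrl_pts:  (M, 3) control-point coordinates
    ctrl_disp: (M, 3) imposed displacements at the control points"""
    def phi(r):
        return np.exp(-(r / radius) ** 2)  # Gaussian radial basis

    # solve for RBF weights so the interpolant matches ctrl_disp exactly
    D = np.linalg.norm(ctrl_pts[:, None] - ctrl_pts[None], axis=-1)
    weights = np.linalg.solve(phi(D), ctrl_disp)        # (M, 3)

    # evaluate the interpolant at every mesh node
    E = np.linalg.norm(nodes[:, None] - ctrl_pts[None], axis=-1)
    return nodes + phi(E) @ weights
```

By construction the interpolant reproduces the imposed displacements exactly at the control points, while nodes far from all control points move little.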
CFD simulations are performed based on the morphed meshes and modified
boundary conditions, and the simulation results are used to calculate the derivatives.
The derivative calculation follows the same ordinary least squares method
as used in Ref. [22]: first, the data are normalized and variations are calculated;
second, a second-order polynomial regression is applied to calculate the first-order
and second-order derivatives; finally, not only the simulation results but also the
previously calculated derivatives are used to calculate the cross derivatives. In this
way, the data used to calculate the derivatives are reused for the cross derivatives,
which improves the accuracy. Note that in this study the points are allocated with a
given finite difference scheme, but this is not obligatory for the calculation of
derivatives: the same regression method can be applied to a DoE-style distribution.
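For a single parameter, the regression step reduces to fitting a second-order polynomial around the reference value and reading the derivatives off its coefficients. A one-dimensional sketch (helper name ours):

```python
import numpy as np

def derivatives_1d(p_samples, f_samples, p_ref):
    """Fit f(p) ~ c0 + c1 (p - p_ref) + c2 (p - p_ref)^2 by ordinary least
    squares and read off the first and second derivatives at p_ref."""
    dp = np.asarray(p_samples, float) - p_ref
    A = np.vstack([np.ones_like(dp), dp, dp ** 2]).T   # design matrix
    c, *_ = np.linalg.lstsq(A, np.asarray(f_samples, float), rcond=None)
    return c[1], 2 * c[2]                              # f'(p_ref), f''(p_ref)
```

The multi-parameter case adds the cross terms to the design matrix in the same way, reusing the already-fitted pure derivatives as described above.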
Once the derivative database is computed, a model can be built for each
objective using a multi-parameter high-order Taylor-series expansion:
F (n)
F (P + ΔP ) = F (P ) + F (1) ΔP + . . . + ΔP n + o(ΔP n+1 ) (1)
n!
where F(P + ΔP) is the objective reconstructed in terms of a variation ΔP of
the parameter P. In Eq. (1), the truncation error is of the magnitude of ΔP^{n+1}.
In the previous work [22], one simple polynomial model was used directly for the
optimization. In this work, several models are created and coupled using the
co-Kriging method presented in the next section.
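Truncated at n = 2, which matches the second-order databases used in this study, Eq. (1) reduces to the familiar quadratic model. A multi-parameter sketch (illustrative):

```python
import numpy as np

def taylor2(F0, grad, hess, dP):
    """Second-order instance of Eq. (1):
    F(P + dP) ~ F(P) + g . dP + 0.5 * dP^T H dP,
    with g the gradient and H the Hessian stored in the database."""
    grad, hess, dP = (np.asarray(a, float) for a in (grad, hess, dP))
    return F0 + grad @ dP + 0.5 * dP @ hess @ dP
```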
Much work has been done to improve model reliability, from both local and
global points of view [3,10]. This research follows two main paths. First, by
applying different training strategies, one tries to improve the local precision of
the model, especially in the neighborhoods of the optima, so-called "design space
exploitation". This is particularly useful, and at times indispensable, for large-scale
optimization. Second, one introduces auxiliary information such as derivatives
into the original dataset used to build the surrogate model. The
most typical development concerns the integration of the adjoint method and
the gradient-enhanced co-Kriging model in aerodynamic shape optimization [6].
One remarkable advantage of the adjoint method lies in its independence from
the number of design variables, making this approach an appropriate choice for
high-dimensional problems. However, the gradients obtained from this method
depend on the given objective functions because reverse-mode differentiation is
used; consequently, adding new objectives requires a new adjoint-state
evaluation. Furthermore, for the sake of surrogate model reliability, although
gradients can improve model accuracy, they are not sufficient for modeling
multimodal objective functions with a limited number of evaluations.
Higher-order derivatives, if available, should naturally be considered [9,17]. The
databases used in this study contain derivatives up to second order for the purpose
of illustration [21]; databases of higher order are also applicable.
The Kriging method models an unknown function by a stochastic process [7,
8,13], represented by a mean function and a covariance function. It is often the
best linear unbiased prediction from some given evaluations, i.e., the prediction
error is minimized. Two sub-models are needed for this stochastic process:
(1) A regression model, which can be considered as the mean function of all the
possible prediction functions subject to the existing evaluations;
(2) A correlation model, which reflects the spatial correlations between the points
of the design site. The Gaussian model is the most frequently used; it
assumes a priori that the design site is weak-sense stationary (WSS).
For the regression model, many works presume a constant mean for the Kriging
process [12,20]. For a co-Kriging process [9,19], differentiation of the regression
model is required; once differentiated, a constant regression model cannot
reflect the differences of the high-order terms at different design points. Zhao
assessed different regression models such as Hermite polynomials, trigonometric
functions and exponential functions [24]. According to his research, the polynomial
function is recommended for universal Kriging and is therefore adopted in this
study. The regression matrix for a co-Kriging model consists of a regression part
and a differentiated part, taking the following form: F(x) = [F0 d(F) d²(F)]ᵀ,
where d(F) and d²(F) are respectively the first-order and second-order derivatives
of the regression matrix F. If a higher-order database is concerned, the
corresponding d³(F), …, dⁿ(F) should be added after d²(F). Clearly from this
representation, if the regression model is of lower order than the given database,
the higher-order part of the regression matrix will be all zeros; in this case, the
"enhancing" effect of the high-order derivatives will not be fully exploited. As the
database is integrated with second-order derivatives, a second-order polynomial
function is chosen for this study.
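For a one-dimensional quadratic basis (1, x, x²), the stacked regression matrix [F0 d(F) d²(F)]ᵀ can be assembled as follows (an illustrative sketch, not the study's code):

```python
import numpy as np

def cokriging_regression_matrix(x):
    """Stacked regression matrix for a 1-D quadratic basis (1, x, x^2):
    one value row, one first-derivative row and one second-derivative
    row per sample, matching F(x) = [F0 d(F) d2(F)]^T."""
    x = np.asarray(x, float)
    F0 = np.column_stack([np.ones_like(x), x, x ** 2])                 # values
    dF = np.column_stack([np.zeros_like(x), np.ones_like(x), 2 * x])   # d/dx
    d2F = np.column_stack([np.zeros_like(x), np.zeros_like(x),
                           2 * np.ones_like(x)])                        # d2/dx2
    return np.vstack([F0, dF, d2F])
```

Note how the derivative rows of a basis of the same order as the database are non-trivial, which is exactly the point made above about low-order regression models.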
The Kriging method assumes that the regression function is the mean path of a
random process, so a careful choice of the law of probability for this process is
essential. Generally, the laws of probability are unknown for an n-dimensional design
space. In order to characterize the covariogram of the random field, it is necessary to
estimate it from the existing samples following a given correlation model.
Commonly used correlation models are the exponential model and the Gaussian
model, which assume an a priori known covariogram style. The interpolation
properties primarily depend on the local behavior of the random field: near the
origin, the exponential model behaves linearly while the Gaussian model shows
quadratic behavior. For the sake of differentiability, the latter is chosen for the
current study; the correlation between two samples can be expressed as:
n
R(θ, skj , slj ) = exp(−θj d2j ) (2)
j=1
where r(x) is the correlation vector between any design to be predicted and the
existing reference points, Y is the response matrix, and the regression coefficient
vector β can be calculated by a least-squares estimation, which gives:
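As a minimal illustration of Eq. (2) (our NumPy sketch, with array names of our choosing), the Gaussian correlation matrix over a whole sample set can be computed as:

```python
import numpy as np

def gaussian_correlation(S, theta):
    """Correlation matrix of Eq. (2): R_kl = prod_j exp(-theta_j * d_j^2),
    where d_j is the j-th component difference between samples s^k and s^l.
    S has shape (N, n): N samples in an n-dimensional design space."""
    d = S[:, None, :] - S[None, :, :]               # pairwise differences (N, N, n)
    return np.exp(-np.sum(theta * d**2, axis=-1))   # product over j = exp of the sum
```

The matrix is symmetric with a unit diagonal; the anisotropy parameters θ_j weight each dimension separately.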
fan will not generate additional drag (negative Δp) when a vehicle is running at
high speed.
The pressure rise Δp from upstream to downstream of the fan wheel, the
torque T acting on the fan and the aerodynamic efficiency η are considered
as objectives to be optimized. The first study has been performed with only
one database [22]. With one database, the covered design space is limited. The
complexity of the characteristic curves and the transparent point prediction drive
us to couple several databases.
The database coupling will exploit the most influential parameters; it will also
show the capacity of exploration for a dimension with only one single value,
which is not possible for a classical Gaussian-process-based co-Kriging method.
For the sake of database coupling, where only a few databases are available, an
adaptation of the co-Kriging method has been studied and implemented [23].
The stagger angles at mid-span and at the tip of the fan blade, and the air flow
rate, were taken as parameters to be exploited. Having 2 geometrical
configurations, functioning at 2 operating points, results in 4 databases. The
coupling of all the databases allows performance evaluation at different flow rates.
Fig. 2. Schema of database coupling and optimization results presented on the Pareto
front
at the tip takes its reference relative value 0◦ , a third axis can be imagined
perpendicular to the two axes shown. Although there is only one reference value,
the dimension of the tip stagger angle γt is explored.
Model D, which couples the 4 different databases, allows us to obtain more
reliable results over a larger range of parameter variation. Hence this model is
used to exploit the optima according to different criteria.
Based on model D, a multi-objective optimization has been performed to
pursue an optimal design in terms of performance at 2 different operating
conditions: Qn = 2300 m3/h and Qi = 2800 m3/h. At the nominal operating point
2300 m3/h, a higher efficiency is wanted; at 2800 m3/h, a higher pressure rise
Δp is preferred in order to obtain an extended range of flow rate.
Two objectives have been considered: the efficiency at Qn = 2300 m3/h, namely
ηn, and the pressure rise at Qi = 2800 m3/h, namely Δpi. The algorithm NSGA-II
[2] has been employed with 5000 individuals and 100 generations, which is
affordable thanks to the inexpensive model-based evaluations. For each point, 2
performance evaluations have been done with model D, one at 2300 m3/h and
the other at 2800 m3/h. The bi-objective Pareto front is illustrated in Fig. 2b.
In Fig. 2b, the initial individuals, marked as red points, form a 2-dimensional
projection on the objective surface ηn − Δpi. The frontiers are clearly depicted:
the Pareto front can be seen in the top-right part, marked with black dots,
formed of 502 surviving individuals.
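For intuition, the non-dominated filtering that underlies such a Pareto front (here with both ηn and Δpi maximized) can be sketched in a few lines; NSGA-II itself uses a faster non-dominated sorting plus crowding-distance selection, so this O(N^2) version is only illustrative:

```python
import numpy as np

def pareto_front_indices(points):
    """Indices of non-dominated points when ALL objectives are maximized.
    A point is dominated if some other point is >= in every objective
    and strictly > in at least one."""
    P = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(P):
        dominated = np.any(np.all(P >= p, axis=1) & np.any(P > p, axis=1))
        if not dominated:
            keep.append(i)
    return keep
```

Applied to the (ηn, Δpi) values of all evaluated individuals, this returns exactly the black-dot subset of Fig. 2b.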
In order to illustrate the possible exploitation of the coupled model, 2 optima
have been adopted on the Pareto front: one favors the transparent point
(optimB) and the other valorizes the efficiency at the nominal condition (optimC).
The optimization results are compared with the reference configuration "Ref" in
Table 1.
Geometry   γt (°)   γm (°)   Q (m3/h)   Δp (Pa)   C (Nm)   η (%)
Ref          0        0       2300       226.4    0.9073   54.36
Ref          0        0       2800       184.8    0.9019   54.23
OptimB     −2.88     0.04     2300       242.0    0.9689   54.36
OptimB     −2.88     0.04     2800       199.8    0.9716   54.39
OptimC     −2.65     0.99     2300       232.1    0.9163   55.20
OptimC     −2.65     0.99     2800       184.8    0.9074   53.87
Conclusion
By using a mesh-morphing technique, aerodynamic databases are generated at
different locations in the design space, each of them centered on one reference
geometrical configuration or physical condition that has previously been used for
optimal design. The databases are analyzed and useful regression coefficients are
collected through an ordinary least-squares method. The co-Kriging method has
been implemented to couple the databases. A second-order polynomial regression
and a Gaussian correlation model have been employed for the model. An
adaptation of the classical co-Kriging method has been made so that it works for
a dimension with one single reference value.
The proposed approach is applied to the optimal design of an engine cooling
fan. 4 databases have been obtained, corresponding to 4 polynomial models. 3
coupled models were created based on these databases, among which model D,
being the most reliable one, is used for a model-based optimization. One of the
results showed obvious improvements in both the aerodynamic efficiency and
the torque of the engine cooling fan. One multi-objective optimum succeeded in
enlarging the operating range of the fan; the other managed to keep the same
range and improve the efficiency at the nominal condition.
The approach can possibly be coupled with sensitivity equation methods,
where the Navier-Stokes equations are implicitly differentiated to obtain
derivatives in a much more economical way.
References
1. Constantine, P.G.: Active Subspaces: Emerging Ideas for Dimension Reduction in
Parameter Studies, vol. 2. SIAM, Philadelphia, PA (2015)
2. Deb, K., Agrawal, S., Pratap, A., Meyarivan, T.: A fast elitist non-dominated
sorting genetic algorithm for multi-objective optimization: NSGA-II. Lecture Notes
in Computer Science, vol. 1917. Springer, Berlin, Heidelberg (2000)
3. Forrester, A., Keane, A., Bressloff, N.W.: Design and analysis of "noisy" computer
experiments. AIAA J. 44(10), 2331–2339 (2006)
4. Wang, G., Shan, S.: Review of metamodeling techniques in support of engineering
design optimization. ASME J. Mech. Des. 129(4), 370–380 (2007)
5. Giraldo, R., Dabo-Niang, S.: Statistical modeling of spatial big data: an approach
from a functional data analysis perspective. Stat. Prob. Lett. (2018) (in press)
6. Han, Z., Zimmerman, R., Görtz, S.: Alternative cokriging model for variable-fidelity
surrogate modeling. AIAA J. 50(5), 1205–1210 (2012)
7. Jones, D.R.: A taxonomy of global optimization methods based on response
surfaces. J. Global Optim. 21(4), 345–383 (2001). https://doi.org/10.1023/A:
1012771025575
990 Z. Zhang et al.
8. Krige, D.: Statistical approach to some mine valuations and allied problems at the
Witwatersrand. Master's thesis, University of Witwatersrand (1951)
9. Laurenceau, J., Meaux, M., Montagnac, M., Sagaut, P.: Comparison of gradient-
based and gradient-enhanced response-surface-based optimizers. AIAA J. 48(5),
981–994 (2010)
10. Leifsson, L., Koziel, S., Tesfahunegn, Y.A.: Multiobjective aerodynamic optimiza-
tion by variable-fidelity models and response surface surrogates. AIAA J. 54(2),
531–541 (2016)
11. Lophaven, S.N.: Aspects of the MATLAB toolbox DACE. Technical report, Technical
University of Denmark (2002)
12. March, A., Willcox, K.: Provably convergent multifidelity optimization algorithm
not requiring high-fidelity derivatives. AIAA J. 50(5), 1079–1089 (2012)
13. Matheron, G.: Principles of geostatistics. Econ. Geol. 58, 1246–1266 (1963)
14. Probst, D.M., Senecal, P.K.: Optimization and uncertainty analysis of a diesel
engine operating point using computational fluid dynamics. ASME 2016 Internal
Combustion Engine Division Fall Technical Conference, Greenville, South Carolina,
USA (2016)
15. Rendall, T.C.S., Allen, C.B.: Unified fluid-structure interpolation and mesh motion
using radial basis functions. Int. J. Numer. Methods Eng. 74, 1519–1559 (2014)
16. Rozenberg, Y., Benefice, G., Aubert, S.: Fluid structure interaction problems in
turbomachinery using rbf interpolation and greedy algorithm. In: ASME Turbo
Expo 2014: Turbine Technical Conference and Exposition, vol. 16, no. 1, p. 102
(2014)
17. Rumpfkeil, M.P.: Optimizations under uncertainty using gradients, hessians, and
surrogate models. AIAA J. 51(2), 444–451 (2013)
18. Schnoes, M., Nicke, E.: A database of optimal airfoils for axial compressor through-
flow design. ASME J. Turbomach. 139(5) (2017)
19. Villemonteix, J., Vazquez, E., Walter, E.: An informational approach to the global
optimization of expensive-to-evaluate functions. J. Global Optim. 44(4), 509–534
(2008)
20. Yamazaki, W., Mavriplis, D.J.: Derivative-enhanced variable fidelity surrogate
modeling for aerodynamic functions. AIAA J. 51(1), 126–137 (2013)
21. Zhang, Z., Demory, B.: Space infill study of kriging meta-model for multi-objective
optimization of an engine cooling fan. In: Proceedings of ASME Turbo Expo 2014:
Turbine Technical Conference and Exposition, Düsseldorf, Germany (2014)
22. Zhang, Z., Buisson, M., Ferrand, P.: Meta-model based optimization of a large
diameter semi-radial conical hub engine cooling fan. Mech. Ind. 16(1), 102 (2015)
23. Zhang, Z., Han, Z., Ferrand, P.: High anisotropy space exploration with co-kriging
method. Global Optimization Workshops 2018 (LeGO). Leiden, Netherlands (2018)
24. Zhao, L., Choi, K.K., Lee, I.: Metamodeling method using dynamic kriging for
design optimization. AIAA J. 49(9), 2034–2046 (2011)
Kriging-Based Reliability-Based Design
Optimization Using Single Loop
Approach
1 Introduction
min f(d, μ_X)
s.t.  Prob[G_i(X, d)] ≤ P_fi^t,  i = 1, 2, . . . , m        (1)
      d^L ≤ d ≤ d^U,  μ^L ≤ μ ≤ μ^U
where f(d, μ_X) is the cost function (objective function), G_i(X, d) is the ith limit
state function, Prob[G_i(X, d)] is its probability of failure, P_fi^t is the ith
target probability of failure, and m is the number of limit state functions.
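To make the probabilistic constraint concrete, Prob[G_i(X, d) ≤ 0] can be estimated by the crude Monte Carlo simulation discussed below; the G ≤ 0 failure convention and independent normal inputs are our assumptions for this sketch, not statements from the paper:

```python
import numpy as np

def mcs_failure_probability(G, mu, sigma, n_samples=100_000, seed=0):
    """Crude MCS estimate of Prob[G(X) <= 0] for independent normal X_i.
    G takes an (n_samples, n_var) array and returns n_samples values."""
    rng = np.random.default_rng(seed)
    X = rng.normal(mu, sigma, size=(n_samples, len(mu)))
    return float(np.mean(G(X) <= 0.0))
```

This direct estimate is exactly the expensive inner loop that the single-loop methods reviewed below are designed to avoid, especially when the target failure probability is small.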
To solve the RBDO problem, many algorithms have been proposed and can
be summarized as double loop methods, single loop methods, and decoupled
methods [1].
Double-loop methods aim to solve the RBDO problem in two loops: the outer
loop solves the optimization problem by changing the design variables, while the
inner loop solves the reliability constraints. These methods include simple Monte
Carlo simulation (MCS), which is straightforward but needs large sample sets
and becomes prohibitive when the probability of failure is low [9]. Approximation
methods have been proposed to approximate the probability of failure:
Enevoldsen and Sørensen (1994) proposed the Reliability Index Approach (RIA)
[6], and Tu and Choi (1999) proposed the Performance Measure Approach (PMA)
[15], which has proven to be more robust and efficient in evaluating inactive
probabilistic constraints. These two first-order reliability methods (FORM) are
easy to implement but are time-consuming for complex constraints, because each
time the design variables change, the inner loop must evaluate the reliability
constraints iteratively, which becomes very computationally expensive for
complex engineering problems.
To reduce the computational cost of the double-loop approach, single-loop
and decoupled approaches have been proposed. Madsen and Hansen (1992)
proposed a method based on the Karush-Kuhn-Tucker (KKT) optimality
conditions [12], where the RBDO problem is transformed into its KKT optimality
conditions. Liang et al., building on the KKT method, further developed a Single
Loop Approach (SLA) [10], in which the nested RBDO problem is transformed
into an equivalent deterministic single-loop process. Du and Chen (2004)
proposed the Sequential Optimization and Reliability Assessment (SORA)
method [4], and Cheng et al. (2006) proposed a Sequential Approximate
Programming (SAP) method [2]; these methods all try to separate the reliability
analysis from the optimization loop and transform the RBDO problem into
deterministic optimization loops to improve efficiency.
For complex engineering problems, metamodels are widely used to substitute
complex reliability constraints. Ju and Lee (2008) used a Kriging metamodel
and the moment method to solve the RBDO problem [3]. Lee and Jung (2008)
proposed a constraint boundary sampling (CBS) method, which adds more
training points on the limit state functions, and used MCS to solve the reliability
problem [9]. Chen et al. (2014) proposed a local adaptive sampling (LAS)
method, which adds points around the current design point to update the Kriging
metamodel, and used FORM to perform the reliability analysis [1]. Dubourg
and Sudret (2013) used an importance sampling (IS) method to build the Kriging
model and used MCS to perform the reliability analysis [5]. Zhuang and Pan
(2012) proposed a sequential sampling scheme for Kriging using the PMA
method, adding samples with an expected relative improvement (ERI) criterion,
which concentrates points around the current most probable point (MPP) [11].
Echard et al. (2011) proposed an active learning method combining MCS and
Kriging, which uses an expected feasibility function (EFF) to find the best points
with which to update the surrogate [16].
These sampling methods separate the process of training the Kriging
metamodels from the reliability analysis; they use double-loop methods to solve
RBDO problems. To further improve the efficiency of Kriging-based RBDO, this
paper combines the Kriging metamodel with the Single Loop Approach (SLA).
The Kriging metamodel is updated using the Most Probable Points (MPPs)
calculated at each iteration of the SLA.
The paper is structured as follows: first, previous work on RBDO is discussed;
in part 2, the theory of the Kriging metamodel is briefly introduced; in part 3,
the Kriging-SLA method is introduced; in part 4, two well-known benchmark
problems are used to demonstrate the method. The last part is the conclusion.
To determine the values of θ, β and σ_z^2 from the observed data set [x, Y(x)],
maximum likelihood estimation (MLE) can be used. The log-likelihood function
of Eq. 2 is expressed as [7]:

L(θ, β, σ_z^2 | x, Y(x)) = −(n/2) ln(2π σ_z^2) − (1/2) ln(|R|) − (1/(2σ_z^2)) (Y − Fβ)^T R^{−1} (Y − Fβ)        (5)
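For a fixed θ (hence a fixed R), the maximizing β and σ_z^2 of Eq. (5) have closed forms; substituting them back gives a concentrated log-likelihood that depends only on θ. A NumPy sketch (our variable names; a practical implementation would use a Cholesky factorization instead of an explicit inverse):

```python
import numpy as np

def concentrated_log_likelihood(R, F, Y):
    """Profile Eq. (5): generalized-least-squares beta and MLE sigma^2
    for a given correlation matrix R, returning the resulting
    log-likelihood value together with both estimates."""
    n = len(Y)
    Rinv = np.linalg.inv(R)                     # explicit inverse: fine for a sketch only
    beta = np.linalg.solve(F.T @ Rinv @ F, F.T @ Rinv @ Y)
    resid = Y - F @ beta
    sigma2 = (resid @ Rinv @ resid) / n
    _, logdetR = np.linalg.slogdet(R)           # stable log-determinant
    ll = -0.5 * n * np.log(2 * np.pi * sigma2) - 0.5 * logdetR - 0.5 * n
    return ll, beta, sigma2
```

Maximizing this function over θ (e.g. with a generic optimizer) is what fitting the Kriging correlation parameters amounts to.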
994 H. Zhang et al.
where Ψ(x)^T is the correlation vector between the observed values and the new
prediction.
The derivative ∇Ŷ(x) of the prediction Ŷ(x) can easily be calculated from
Ŷ(x):

∂Ŷ(x)/∂x = (∂h(x)^T/∂x) β̂ + (∂Ψ(x)^T/∂x) R^{−1} (Y − F β̂)        (9)

where

∂Ŷ(x)/∂x = [∂Ŷ(x)/∂x^(1), ∂Ŷ(x)/∂x^(2), . . . , ∂Ŷ(x)/∂x^(n)]        (10)
min G_i(U)
s.t.  ||U|| = β_i^t        (12)

where G_i(U) is the ith RBDO constraint in the standard normal space U and
β_i^t is the target reliability index. At the optimal point (MPP), the KKT
optimality condition of Eq. 12 reads:

∇G(U) + λ∇H(U) = 0        (13)

where H(U) = ||U|| − β^t is the equality constraint of the PMA and λ is the
Lagrange multiplier.
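Condition (13) says that at the MPP, U is anti-parallel to ∇G on the sphere ||U|| = β^t. One classical fixed-point scheme exploiting this is an advanced-mean-value-type iteration; the following is our sketch of that generic scheme, not the paper's implementation:

```python
import numpy as np

def pma_mpp(grad_G, beta_t, n_dim, tol=1e-10, max_iter=100):
    """Fixed-point search for the MPP of Eq. (12): repeatedly rescale the
    steepest-descent direction of G onto the sphere ||U|| = beta_t,
    which enforces the anti-parallelism of Eq. (13)."""
    U = np.zeros(n_dim)
    for _ in range(max_iter):
        g = grad_G(U)
        U_new = -beta_t * g / np.linalg.norm(g)  # anti-parallel to grad G
        if np.linalg.norm(U_new - U) < tol:
            return U_new
        U = U_new
    return U
```

For a linear limit state the iteration converges in one step; for strongly nonlinear constraints more robust MPP algorithms exist, which is part of the motivation for the surrogate-assisted approach of this paper.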
where u_i^(k) are the random design variables in the normalized space, d^(k) are
the deterministic design variables, G_i(d^(k), X^(k)) is the ith constraint, β_i^t is
the target reliability index for the ith constraint, and α_i^(k) is the normalized
gradient vector of the ith constraint. The SLA is run iteratively until convergence.
f(d^(k), μ_X^(k)) is minimized under the deterministic constraints
G_i(d^(k), X^(k)) ≥ 0. In each loop of the SLA, the MPPs for each constraint are
calculated; these MPPs are then used to update the Kriging surrogate. The
program moves the design values iteratively until the convergence criteria are
reached. The flowchart of Kriging-SLA can be summarized as:
Step 1. A design of experiments of N samples (obtained by the Latin Hypercube
Sampling (LHS) method) [x_1, x_2, . . . , x_N] ∈ X and their limit state
function evaluations [G_i(x_1), G_i(x_2), . . . , G_i(x_N)], i = 1, 2, . . . , m, are
used to train the first Kriging surrogate; N is the number of training points.
Step 2. Start the SLA loop from k = 0. d^k is the vector of design variables;
lb and ub are the vectors of the lower and upper bounds of μ_x^k; μ_x^k is
the vector of mean values of the design variables X; σ_X is the vector of
standard deviations; β_i^t is the target reliability index of the ith constraint.
Step 3. k = k + 1; the normalized gradient vectors α_i^(k) and the current
most probable points (MPPs) X_i^k of each constraint are calculated using
the derivatives ∇_μ Ĝ_i(d^(k), X_i^(k)). These derivatives are calculated from
the Kriging surrogate.
Step 4. Calculate the true response of the current MPPs, G_i(X_i^(k)), and add
the new points and their responses to the data set of the Kriging model.
Step 5. Minimize f(d, μ_X) under the SLA constraints G_i(d^k, X^k) ≥ 0, and
calculate the new d^(k) and μ_X^(k).
Step 6. Compare d^(k), μ_X^(k) with d^(k−1), μ_X^(k−1); if ||d^(k) − d^(k−1)|| ≤ ε
and ||μ^(k) − μ^(k−1)|| ≤ ε, stop; else go to Step 3 and continue.
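The flowchart above can be condensed into a driver loop. In the sketch below, `step_fn` is a hypothetical placeholder standing in for Steps 3-5 (MPP computation, surrogate update, and deterministic minimization); only the iteration structure and the Step 6 stopping criterion are shown:

```python
import numpy as np

def sla_loop(step_fn, d0, mu0, eps=1e-6, max_iter=100):
    """Run one deterministic SLA step per iteration until both the design
    vector d and the mean vector mu satisfy the Step 6 stopping criterion.
    Returns the final d, mu and the number of iterations used."""
    d, mu = np.asarray(d0, float), np.asarray(mu0, float)
    for k in range(1, max_iter + 1):
        d_new, mu_new = step_fn(d, mu)            # Steps 3-5 (placeholder)
        if (np.linalg.norm(d_new - d) <= eps
                and np.linalg.norm(mu_new - mu) <= eps):
            return d_new, mu_new, k               # Step 6: converged
        d, mu = d_new, mu_new
    return d, mu, max_iter
```

The key property of the single-loop structure is visible here: each iteration performs one deterministic optimization step, with no nested reliability loop inside it.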
The number N of points in the starting data set for building the surrogate can
be very small: even though the first surrogate may fail to capture the main
characteristics of the constraints and the SLA may converge to an infeasible
point, new points, which are the best points to update the Kriging around the
MPPs, are added to the surrogate until it becomes more and more accurate
around the optimum.
Three benchmark examples are given below and used to validate the proposed
Kriging-SLA method; the results are compared with those reported by other
authors.
fewer training points; but it should be noted that for the LAS, CBS, and SS
methods, after the Kriging surrogate is built, they still need to conduct the
reliability analysis using MCS separately. If the failure rate is very low, large
samples must be drawn from the Kriging surrogate to get accurate results, which
increases the total computational cost.
This is another well-known benchmark example [3], in which there are also
two variables; their standard deviations are both σ = 0.3, and the target
reliability index for all three constraints is set to 3. The reference result with the
SLA method is μ = [3.4391; 3.2864], with an optimum objective function value of 6.7255.
5 Conclusion
The proposed Kriging-SLA method can solve RBDO problems well; it is very
robust and accurate, and well suited to engineering problems with complex
reliability constraints. The method needs fewer initial sample points because it
does not seek to fit the constraints well globally; instead, it keeps adding the best
currently available points until the surrogate locates the accurate MPP. It is
robust enough to converge to the right optimum with very few initial points,
even when the initial sampling fails to cover the main characteristics of the
constraints. The accuracy of Kriging-SLA is in accordance with that of the SLA
method without Kriging.
References
1. Chen, Z., Qiu, H., Gao, L., Li, X., Li, P.: A local adaptive sampling method
for reliability-based design optimization using kriging model. Struct. Multidiscip.
Optim. 49(3), 401–416 (2014)
2. Cheng, G., Xu, L., Jiang, L.: A sequential approximate programming strategy
for reliability-based structural optimization. Comput. struct. 84(21), 1353–1367
(2006)
3. Cho, T.M., Lee, B.C.: Reliability-based design optimization using convex lineariza-
tion and sequential optimization and reliability assessment method. Struct. Saf.
33(1), 42–50 (2011)
4. Du, X., Chen, W.: Sequential optimization and reliability assessment method for
efficient probabilistic design. American Society of Mechanical Engineers (2002)
Abstract. Wind turbine blades are subjected to wind pressure and inertial
loads from their rotational velocity and acceleration, which depend on the
external environment and the turbine control (start-up, normal energy
production, shut-down procedures, etc.). Several numerical tools have been
developed to compute the loads applied to the wind turbine blades. These
numerical tools are generally based on multiphysics simulation (aeroelasticity,
aerodynamics, turbulence, etc.) and a multibody beam finite element model of
the whole turbine. However, when we are interested in optimizing the blade
structure, we need to use shell finite element models in the structural analysis.
Thus, the loads estimated using the beam elements are transformed into a 3D
pressure load distribution for the shell elements. Several Load Application
Methods have been developed in the literature. However, in the context of the
structural reliability analysis and optimization of wind turbine blades, the
suitable method should be selected with respect to its sensitivity to uncertain
input parameters. This study presents a sensitivity analysis of the output of
two load application methods for shell finite element models, with respect to
uncertain input parameters such as loads and material properties. The Morris
method is used to carry out the sensitivity analysis. Both load application
methods are sensitive to the change of thickness in the materials, which has a
greater effect than the distributed loads applied by section.
1 Introduction
Wind turbine blades are subjected to wind pressure and inertial loads from
their rotational velocity and acceleration, which depend on the external
environment, the electromagnetic generator torque and the turbine control
(start-up, normal energy production, shut-down procedures, etc.). Several
numerical tools have been developed to facilitate the design of wind turbine
blades; usually, the load calculation is carried out with a beam Finite Element
Model (FEM), taking into account the aero-elastic behaviour, the turbine control
commands and, for off-shore turbines, the hydrodynamic behaviour. Some
examples of these codes are Fatigue, Aerodynamics, Structures, and Turbulence
[12] from the National Renewable Energy Laboratory, and the Horizontal Axis
Wind turbine simulation Code 2nd generation [15]. These multi-physics,
multi-body, aero-servo-hydro-elastic beam finite element codes are able to run a
great number of the design situations described by certification bodies [9], taking
into account all the different extreme loads acting on the structure and generating
the loading history used for fatigue analysis.
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 1001–1010, 2020.
https://doi.org/10.1007/978-3-030-21803-4_99
W. J. V. Parra et al.
The method used to transform these loads from the beam finite element model
to a shell finite element model is defined as the Load Application Method (LAM)
[5]. The issue is to select the appropriate method to use in the context of the
Reliability-Based Design Optimization (RBDO) of the blades, one that ensures
the convergence of the optimization and reliability procedures while balancing
computational time and physical considerations in the load distribution.
A sensitivity analysis using the Morris method is used to compare two LAMs,
examining the sensitivity of the output responses (displacement and stress) with
respect to uncertain input parameters. The sensitivity analysis aims to identify
the uncertain parameters most significant for the variability of the output
responses and to select the appropriate LAM for the RBDO approach. In other
words, the main goal of the sensitivity analysis in the RBDO approach is to
reduce the stochastic dimension of the reliability analysis: only the uncertain
parameters that have a great influence on the output responses are considered
in the surrogate model, the remaining parameters being fixed at their respective
mean values.
To transfer the 1D load distribution from the beam FEM to a 3D load distribution
to be applied to a shell FEM, Caous [5] has classified the methods reviewed
in the literature into 4 groups.
In this article, only the first two groups are studied, by applying a Morris
sensitivity analysis.
In this first group (Fig. 1(a)), the resultant loads from the beam FEM computed
at selected nodes along the blade span are applied directly to the shell model
at selected sections (which have the same positions as in the beam FEM), either
through a master node which controls the displacement of the whole section via
relations between the nodal degrees of freedom (using Rigid Body Elements:
RBE) [7,10], or directly onto one node having a nearby position compared with
the aerodynamic node of the beam finite element [8].
Sensitivity Analysis of Load Application Methods 1003
Fig. 1. Approaches for load application in a shell FEM of the blade [5].
Fig. 2. LAM used to compare: (a) RBE and (b) load distribution in four points [5].
The authors applied the resultant loads from the beam FEM directly to a
shell model, without distinction between aerodynamic and inertial loads
(Fig. 2(a)). External forces and moments are extracted from ten nodes of the
beam FEM and applied to the shell model as FxRBE, FyRBE, FzRBE, MxRBE,
MyRBE and MzRBE at each section that has the same location as the beam
element.
All the nodes of each section of the shell model are linked by RBE, which
makes the sections undeformable. This approach is mostly used to model
full-scale tests on blades [3], for a simple and fast application of loads from the
beam to the shell FEM.
sensitivity measures are based on the computation of the elementary effect EE_i
for each input parameter X_i, which is defined by a finite difference derivative
approximation. The absolute expected value of the elementary effects is

μ_i* = (1/m) Σ_{j=1}^{m} |EE_i^j|        (2)
The sensitivity measures, the absolute expected value μ* and the standard
deviation σ*, are normalized to [0, 1] using the equations below:

μ_N = (μ* − min(μ*)) / (max(μ*) − min(μ*))
σ_N = (σ* − min(σ*)) / (max(σ*) − min(σ*))        (4)
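Given a table of elementary effects, the measures of Eqs. (2) and (4) reduce to a few array operations; a minimal NumPy sketch (our naming, with the standard deviation taken over the repetitions):

```python
import numpy as np

def morris_measures(EE):
    """Sensitivity measures from a table of elementary effects.
    EE has shape (m, k): m repetitions x k input parameters.
    Returns mu_star (Eq. 2), sigma, and their [0, 1] normalizations (Eq. 4)."""
    EE = np.asarray(EE, dtype=float)
    mu_star = np.mean(np.abs(EE), axis=0)        # Eq. (2)
    sigma = np.std(EE, axis=0, ddof=1)           # spread = interaction/non-linearity
    normalize = lambda v: (v - v.min()) / (v.max() - v.min())
    return mu_star, sigma, normalize(mu_star), normalize(sigma)
```

A large μ* flags an influential parameter; a large σ* flags non-linear or interaction effects, which is how the histograms of Figs. 3 and 4 are read.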
In this study, the Morris method is used to carry out the sensitivity analysis;
it allows the most influential parameters to be determined and also decreases
the number of input parameters that will be considered thereafter in other
sensitivity analyses, such as the Sobol approach. The screening analysis is
implemented using the software OpenTurns [1], which is incorporated in the
open-source software Salome-Meca and allows interaction between Code-Aster
and OpenTurns. Results obtained using this method are presented in the next
section.
where ρ is the density of air, A the reference area, V(z) the wind speed at
height z, c_d the drag coefficient and c_l the lift coefficient. The wind speed is
calculated using the average wind gradient at the boundary of the atmospheric
surface layer developed by Panofsky and Dutton [18]:

ū(z) = ū(z_ref) (z / z_ref)^q        (6)

selecting ū(z_ref) = 11 m/s, z_ref = 10 m and q = 0.27, from the Hellmann
exponent in stable air above an open water surface [13]. The lift and drag
coefficients were selected as 1 and 0.47 respectively, in order to generate the
same distribution of forces in all sections but with different magnitudes in the
two directions.
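Combining Eqs. (5) and (6) with the values above gives the per-section loads directly; in the sketch below the air density ρ = 1.225 kg/m^3 is our assumed value, as the text does not state it:

```python
import numpy as np

def wind_speed(z, u_ref=11.0, z_ref=10.0, q=0.27):
    """Power-law wind gradient of Eq. (6) with the values chosen in the text."""
    return u_ref * (z / z_ref) ** q

def section_forces(z, A, rho=1.225, cd=0.47, cl=1.0):
    """Drag and lift on one section: F = 0.5 * rho * A * V(z)^2 * c (Eq. 5)."""
    p_dyn = 0.5 * rho * A * wind_speed(z) ** 2   # dynamic pressure times area
    return p_dyn * cd, p_dyn * cl
```

Evaluating these at the ten section heights yields the nominal load vector whose components are then perturbed by ±10% in the Morris screening.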
Both forces from Eq. 5 are applied to a 1D beam FEM of the cylinder created
in Code-Aster, and the forces and moments at each section to be applied to the
shell FEM cylinder are then extracted. In the Morris screening method, all the
input parameters are considered uniform random variables with an interval of
±10% around their initial values.
[Figure 3 shows two side-by-side histograms of the normalized sensitivity
measures μ* and σ* for the load application methods RBE and 4NO, with EP1,
EP2 and the other variables on the horizontal axis.]
Fig. 3. Histogram of sensitivity measures of load application methods RBE and 4NO
for case 1.
5.2 Case 1(b): Materials and Loads as Inputs and Stress as Output
Parameters
In Fig. 3(b), for the level of σ22, the thicknesses of both materials have the
highest sensitivity effect for LAM-RBE; EP1 takes the maximum values of both
μ* = 1.4e7 and σ* = 1.5e7, and also has the strongest non-linear effect. For
LAM-4NO the thickness also has a non-linear effect, but not as sensitive as for
LAM-RBE.
[Figure 4 shows two side-by-side histograms of the normalized sensitivity
measures μ* and σ* for the load application methods RBE and 4NO, with the
sectional loads Fx0–Fx9, Fy0–Fy9, Fz0–Fz9, Mx0–Mx9, My0–My9 and
Mz0–Mz9 on the horizontal axis.]
Fig. 4. Histogram of sensitivity results of load application methods RBE and 4NO for
case 2.
References
1. Baudin, M., Dutfoy, A., Iooss, B., Popelin, A.L.: OpenTURNS: an industrial soft-
ware for uncertainty quantification in simulation. In: Handbook of Uncertainty
Quantification, pp. 2001–2038 (2017)
2. Bottasso, C.L., Campagnolo, F., Croce, A., Dilli, S., Gualdoni, F., Nielsen,
M.B.: Structural optimization of wind turbine rotor blades by multilevel
sectional/multibody/3D-FEM analysis. Multibody Syst. Dynam. 32(1), 87–116
(2014)
3. Branner, K., Berring, P., Berggreen, C., Knudsen, H.W.: Torsional performance of
wind turbine blades–part ii: Numerical validation. In: 16th International Confer-
ence on Composite Materials, Anonymous, pp. 8–13 (2007)
4. Caous, D., Valette, J.: Methodology for G1 blade assessment. Technical report,
TENSYL, La Rochelle (2014)
5. Caous, D., Lavauzelle, N., Valette, J., Wahl, J.C.: Load application method for
shell finite element model of wind turbine blade. Wind Eng. 42(5), 467–482 (2018)
6. EDF: Finite element Code_Aster: analysis of structures and thermomechanics for
studies and research (1989–2017). Open source, www.code-aster.org
7. Forcier, L.C., Joncas, S.: Development of a structural optimization strategy for
the design of next generation large thermoplastic wind turbine blades. Struct.
Multidiscip. Optim. 45(6), 889–906 (2012)
8. Griffith, D.T., Ashwill, T.D.: The Sandia 100-meter all-glass baseline wind
turbine blade: SNL100-00. Sandia National Laboratories, Albuquerque, Report No.
SAND2011-3779, p. 67 (2011)
9. Germanischer Lloyd: Guideline for the Certification of Wind Turbines.
Germanischer Lloyd WindEnergie GmbH, Hamburg (2010)
10. Haselbach, P.U., Bitsche, R., Branner, K.: The effect of delaminations on local
buckling in wind turbine blades. Renew. Energy 85, 295–305 (2016)
11. Hu, W., Choi, K., Zhupanska, O., Buchholz, J.H.: Integrating variable wind load,
aerodynamic, and structural analyses towards accurate fatigue life prediction in
composite wind turbine blades. Struct. Multid. Optim. 53(3), 375–394 (2016)
12. Jonkman, J.M., Buhl Jr., M.L.: FAST user's guide - updated August 2005. Technical
report, National Renewable Energy Laboratory (NREL) (2005)
13. Kaltschmitt, M., Streicher, W., Wiese, A.: Renewable Energy: Technology, Eco-
nomics and Environment. Springer Science & Business Media, Heidelberg (2007)
14. Knill, T.J.: The application of aeroelastic analysis output load distributions to
finite element models of wind. Wind Eng. 29(2), 153–168 (2005)
15. Larsen, T.J., Hansen, A.M.: How 2 HAWC2, the user’s manual. Technical report,
Risø National Laboratory (2007)
16. Mandell, J., Samborsky, D.: SNL/MSU/DOE composite material fatigue database
mechanical properties of composite materials for wind turbine blades version 25.0.
Montana State University (2016)
17. Morris, M.D.: Factorial sampling plans for preliminary computational experiments.
Technometrics 33(2), 161–174 (1991)
18. Panofsky, H.A., Dutton, J.A.: Atmospheric Turbulence: Models and Methods for
Engineering Applications. Wiley, New York (1984)
Transportation, Logistics, Resource
Allocation and Production Management
A Continuous Competitive Facility
Location and Design Problem for Firm
Expansion
1 The Model
When locating a new facility in a competitive environment, both the location
and the quality of the facility need to be determined jointly and carefully in
order to maximize the profit obtained by the locating chain. This fact has been
highlighted in [2] among other papers.

This research has been supported by grants from the Spanish Ministry of Economy
and Competitiveness (MTM2015-70260-P and TIN2015-66680-C2-1-R), the Hungarian
National Research, Development and Innovation Office - NKFIH (OTKA grant
PD115554), Fundación Séneca (The Agency of Science and Technology of the Region
of Murcia, 20817/PI/18) and Junta de Andalucía (P12-TIC301), in part financed by
the European Regional Development Fund (ERDF).

© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 1013–1022, 2020.
https://doi.org/10.1007/978-3-030-21803-4_100
B. G.-Tóth et al.

However, when a chain has to decide how
to invest in a given geographical region, it may also invest part of its budget in
modifying the quality of other existing chain-owned centers (in case they exist)
up or down, or even in closing some of those centers in order to allocate the
budget devoted to those facilities to other chain-owned facilities or to the new
one (in case the chain finally decides to open it). In this paper, we extend the
single facility location and design problem introduced in [2] to accommodate
these possibilities as well.
The scenario is as follows. A chain has to decide how to invest its budget B in
a given area of the plane in order to maximize its annual profit. It may open one
new facility and/or close and/or modify the quality of its existing chain-owned
facilities. Let us assume that there already exist m facilities offering the same
goods or product in the area and that the first k of those m facilities belong to
the expanding chain (k < m). We assume k > 0; otherwise, the chain is a
newcomer and the model reduces to that in [2]. It is assumed in this paper that the
demand is fixed and concentrated at n demand points, although a similar model
with variable demand could also be considered (see [9]). Hence, the locations
pi and buying power wi at the demand points are known. The location fj and
present quality α̃j of the j-th existing facility are also known, for j = 1, . . . , m.
In [2] it was assumed that the qualities of the existing chain-owned facilities,
as well as the quality α0 of the new facility to be located, were within the interval
[αmin , αmax ], where αmin > 0 (resp. αmax ) was the minimum (resp. maximum)
value that the quality of a facility run by the chain could take in practice. Here,
we will assume the same, but now we have to take into account the closing of
an existing chain-owned facility, or not opening any new facility. To do so, we
will use a binary variable yj, which is 1 if the j-th facility is kept open and 0 if
it is closed (or, for j = 0, not opened). In the latter case, the quality of the facility
does not play any role, as the attraction of the facility will be 0, as we will see.
Notice that initially, αj = α̃j ∈ [αmin , αmax ], j = 1, . . . , k. Since the j-th facility
is already established, its area can hardly be modified. Hence, most likely its
quality can be upgraded from α̃j only up to a certain level α^j_max ≤ α_max. Hence, we
will assume that αj ∈ [α_min, α^j_max] for all j ∈ {1, . . . , k}, where αj denotes the
final value for the quality of the j-th facility. For j = 0, which refers to the new
facility, we have that α0 ∈ [α_min, α_max].
Of course, some types of costs must be taken into account, too. The most
obvious one is related to the opening of the new facility at a given location f0 with
a quality α0 (in case it is open). This annualized cost will be denoted by G(f0 , α0 )
and it is only incurred if the new facility is actually open. In that case, the actual
cost depends on the location and the quality of the facility. Analogously, we also
have to pay the annualized costs Aj , j = 1, . . . , k, of the facilities already open,
in case they are kept open in the current year. Conversely, the closing of an
existing facility j (j = 1, . . . , k) also implies a cost, Cj , as this usually implies
dismantling the facility, moving materials and furniture to another place, etc.
Another cost is incurred when the quality of an existing facility is varied, as this
usually requires some investment. Again, this annualized cost, Vj(αj), should

A Location Model for Firm Expansion 1015

only be taken into account when a variation in the quality occurs, and in that case,
the amount of the investment depends on how much the new quality αj of the
facility differs from the present quality α̃j . Finally, we also have to consider the
annual cost Rj (αj ) of running the facility j when its quality is αj .
The costs of the chain,

    T(ns) = Σ_{j=1}^{k} [ yj (Aj + Rj(αj) + Vj(αj)) + (1 − yj) Cj ]      (1)
            + y0 (G(f0, α0) + R0(α0)),                                    (2)
include the annualized cost of having open the existing facilities (Aj ) plus the
annual cost of operating them (Rj ) plus the annualized cost of varying their
qualities (Vj ) or the cost of closing them (Cj ), (given by (1)), and the annualized
cost of opening and the annual cost of operating the new facility, in case it is
open (see (2)).
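The cost structure (1)-(2) is easy to express programmatically. The sketch below is a minimal Python rendering; the concrete cost functions G, Rj and Vj are placeholders supplied by the caller, not the calibrated forms used in the paper.

```python
def chain_cost(y, alpha, f0, A, C, G, R, V):
    """Total cost T(ns) of the chain, following Eqs. (1)-(2).

    y[0], alpha[0] refer to the (possibly unopened) new facility;
    y[1..k], alpha[1..k] to the existing chain-owned facilities.
    A[j-1], C[j-1]: annualized fixed cost and closing cost of facility j.
    G, R, V: cost functions (placeholders, chosen by the caller).
    """
    k = len(A)  # number of existing chain-owned facilities
    total = 0.0
    for j in range(1, k + 1):
        if y[j]:  # facility kept open: fixed + operating + quality-change cost
            total += A[j - 1] + R(j, alpha[j]) + V(j, alpha[j])
        else:     # facility closed: dismantling/closing cost
            total += C[j - 1]
    if y[0]:      # new facility opened at f0 with quality alpha[0]
        total += G(f0, alpha[0]) + R(0, alpha[0])
    return total
```

The budget constraint (5) then simply tests `chain_cost(...) <= B` for a candidate solution ns.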
For ease of notation, the variables (f0, α0, . . . , αk, y0, . . . , yk) of the model
will be denoted by ns. Other notation needed for the mathematical formulation
is the Euclidean distance dij between demand point pi and facility fj (i =
1, . . . , n, j = 1, . . . , m) and, similarly, the distance di(f0) between demand point
pi and the new facility. Besides, gi(·) is a non-negative non-decreasing function
which transforms the distance into the attraction measure.
The patronizing behavior of customers is probabilistic, that is, customers’
demand is split among the facilities proportionally to the attraction they feel
for them. The attraction (or utility) that a demand point feels for a facility
depends on both the location of the facility and its quality, and may vary from
one demand point to another, as indicated by the parameter γi . At present, the
attraction (or utility) that demand i feels for facility j is ũij = γi α̃j /gi (dij ).
When the quality changes to αj (or is α0 for the new facility), it is given by

    uij(αj) = yj γi αj / gi(dij),        ui0(f0, α0) = y0 γi α0 / gi(di(f0)).
Notice that due to yj , the attraction is 0 whenever a facility is closed (or not
open).
Based on these assumptions, the market share captured by the chain is

    M(ns) = Σ_{i=1}^{n} wi [ ui0(f0, α0) + Σ_{j=1}^{k} uij(αj) ] / [ ui0(f0, α0) + Σ_{j=1}^{k} uij(αj) + Σ_{j=k+1}^{m} ũij ].    (3)
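Equation (3) is a Huff-type proportional allocation and can be sketched directly in Python; the distance-decay function g used below is an arbitrary illustrative choice, since the model only requires it to be non-negative and non-decreasing.

```python
def attraction(y, gamma_i, alpha, dist, g=lambda d: 1.0 + d * d):
    """Attraction u = y * gamma_i * alpha / g(dist); zero if the facility
    is closed (y = 0). The decay g is a placeholder choice."""
    return y * gamma_i * alpha / g(dist)

def market_share(w, gamma, y, alpha, d_chain, alpha_comp, d_comp):
    """Market share M(ns) captured by the chain, Eq. (3).

    d_chain[i][j]: distance from demand point i to chain facility j
                   (index j = 0 is the candidate new facility).
    alpha_comp, d_comp: qualities and distances of the m - k competitors.
    """
    total = 0.0
    for i, wi in enumerate(w):
        own = sum(attraction(y[j], gamma[i], alpha[j], d_chain[i][j])
                  for j in range(len(alpha)))
        comp = sum(attraction(1, gamma[i], ac, dc)
                   for ac, dc in zip(alpha_comp, d_comp[i]))
        if own + comp > 0:  # guard against a degenerate all-zero case
            total += wi * own / (own + comp)
    return total
```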
The problem (P) of profit maximization can be formulated as follows:

    max   Π(ns) = F(M(ns)) − T(ns)                    (4)
    s.t.  T(ns) ≤ B                                    (5)
          di(f0) ≥ d^min_i,  i = 1, . . . , n          (6)
          f0 ∈ S ⊂ R²                                  (7)
          αj ∈ [α_min, α^j_max],  j = 0, . . . , k     (8)
          yj ∈ {0, 1},  j = 0, . . . , k               (9)
In the previous expression, the parameter δj > 0 determines how much cheaper
decreasing the quality is compared to increasing it.
Concerning Rj (αj ), which gives the annual operating cost of facility j when
its quality is αj, it should be nondecreasing in αj. Its functional
form may be convex, concave, linear, piecewise linear, or take other forms,
depending on the type of facility. In this paper we will assume a linear form,
Rj(αj) = oj αj, with oj > 0 a given constant.
Problem (P ) is a Mixed-Integer NonLinear Programming problem (MINLP).
Hence, solving it is a challenge from the optimization point of view, and global
optimization tools are required to cope with it.
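Rigorous global optimization of (P) relies on enclosures of the objective over boxes of the search space. The snippet below is a toy interval-arithmetic sketch, without the outward rounding that the paper's PROFIL/BIAS library performs, just to illustrate the bounding principle behind an interval branch-and-bound.

```python
class Interval:
    """Minimal interval arithmetic for rigorous bounding (illustrative only;
    a real interval B&B needs directed/outward rounding as in PROFIL/BIAS)."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)
    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)
    def __mul__(self, other):
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return Interval(min(p), max(p))

# Bounding f(x) = x * (3 - x) over the box x in [0, 2]:
x = Interval(0.0, 2.0)
enclosure = x * (Interval(3.0, 3.0) - x)
# Every value of f on [0, 2] lies in [enclosure.lo, enclosure.hi],
# so enclosure.hi is a valid upper bound for pruning boxes in a B&B.
```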
Other discarding tests have also been designed following the same idea.
If in a given iteration more budget is required, then the quality of the least
profitable facility should be reduced down to a given value, or the facility
could even be closed (whichever is better). This procedure, called get more budget
in Algorithm 1, provisionally reduces the quality of the least profitable facility
whose quality can be reduced, j1, down to αj1 = max{α_min, α^lp_j1}, where α^lp_j1 is
the solution of the equation profitab_j1 = profitab_j2 and j2 is the facility with
the second least profitable ratio. Then, it computes an estimation of the profit
that can be obtained with this reduction with a small number of iterations of
a Weiszfeld-like method. Analogously, the procedure computes an estimation of
the profit that can be obtained with the closure of the least profitable facility
and chooses the best option.
Notice that every time that a facility is closed or its quality is reduced, the
profitability ranking should be recomputed, as the market share captured by
the open facilities may change. Also, every time a facility is closed, a forbidden
area surrounding the facility should be included in the model, so as to avoid
locating the new facility in the area where a facility has just been closed.
Procedure open locates a new facility using the available budget, which must be
at least B^min, using a modification of the multi-start Weiszfeld-like algorithm
in [10].
The available budget at a given iteration is distributed using a greedy strat-
egy. The most profitable facility is allowed to vary its quality as much as needed
(provided that this does not surpass the budget and that the profit obtained
by the chain, Π(ns), improves). Once finished, if there is still some budget left,
the process is repeated with the second most profitable facility, and so on
(procedure improve). In addition, if there are previously closed facilities, they can
be reopened if there is enough budget for it and the chain’s profit improves
(procedure reopen).
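The greedy pass described above (procedure improve) can be sketched as follows. The facility attributes and the stopping test are simplified stand-ins for the paper's exact procedures, which re-evaluate the full profit Π(ns) after each change; here a hypothetical per-facility `marginal_profit` callback plays that role.

```python
def distribute_budget(facilities, budget, step=0.5):
    """Greedy budget distribution: visit facilities from most to least
    profitable and raise quality in small steps while (i) the quality bound
    and the budget allow it and (ii) the profit still improves.

    Each facility is a dict with keys 'alpha', 'alpha_max', 'profitability',
    'unit_cost' (cost per quality step) and 'marginal_profit' (callable),
    all hypothetical stand-ins for the paper's procedures.
    """
    for fac in sorted(facilities, key=lambda f: f['profitability'],
                      reverse=True):
        while (fac['alpha'] + step <= fac['alpha_max']        # quality bound
               and fac['unit_cost'] * step <= budget          # budget left
               and fac['marginal_profit'](fac['alpha']) > 0): # profit improves
            fac['alpha'] += step
            budget -= fac['unit_cost'] * step
    return budget  # budget remaining after the greedy pass
```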
3 Computational Studies
All the computational results in this paper have been obtained under Linux on
an AMD Athlon(tm) 64 X2 with a 2.2 GHz CPU and 2 GB of memory. The
algorithms have been implemented in C++. For the interval B&B method, we used
the interval arithmetic in the PROFIL/BIAS library [8], and the automatic
differentiation of the C++ Toolbox library [4].
We have generated a set of random problems in order to evaluate the perfor-
mance of the algorithms. They all have n = 100 demand points, and the number
m of existing facilities and the number k of those facilities belonging to the chain
considered were m = 3, 5 and k = 1, 2.
For each setting, 10 problems were generated by randomly choosing the
parameters of the problems uniformly within given intervals (or computed from
other parameters). The parameters are pi, fj ∈ S = [0, 10] × [0, 10], wi ∈ [0, 10],
γi ∈ [0.75, 1.25], α̃j ∈ [0.5, 5], φi0 = 2, φi1 ∈ [0.5, 1.5], β0 ∈ [7, 9], β1 ∈
[5, 5.5], δj ∈ [3, 5], c ∈ [12, 14], Aj ∈ [8, 11], Cj = Aj/2, oj = 20Aj.
These settings were obtained by varying up and down the values of the
parameters of the quasi-real problem studied in [13], which deals with the location
of supermarkets in southeast Spain. Nevertheless, when applying the model to a
particular problem, those parameters have to be fine-tuned.
The search space for every problem was f0 ∈ S and αj ∈ [0.5, 5], j = 0, . . . , k.
For each problem, we computed the difference, as a percentage, between the
optimal objective value obtained by the B&B and the best solution obtained by
the heuristic in 10 runs, as well as the number of runs in which the heuristic
found that best solution. Table 1 shows the average values obtained for each
(m, k) setting, with the standard deviation in brackets. As shown, we solved each
problem with five different budgets. Hence, in all, 200 instances were generated.
The B&B method could solve all the instances with a relative accuracy of
0.0001. The CPU time needed by the method can be very large, although this is
not always the case. The standard deviation shows that the required time varies
widely, from a few seconds to many hours. It also shows that the difficulty
of the problems depends neither on the number of existing facilities nor on the
size of the chain. Interestingly, the cases with the setting (m = 5, k = 1) seem very
easy compared to the others. Although we have checked the results in more
detail, we could not find any pattern explaining this behavior.
As we can see, the heuristic method can find a solution very close to the
optimum. On average, the difference from the global optimum is 0.5%, and in the
worst case it is still only 4.18%. Clearly, the elapsed time is much shorter:
on average, only 24 seconds are required. Comparing the results for the different
settings, one can see no remarkable effects.
References
1. Fernández, J., Pelegrín, B.: Using interval analysis for solving planar single-facility
location problems: new discarding tests. J. Global Optim. 19(1), 61–81 (2001)
2. Fernández, J., Pelegrín, B., Plastria, F., Tóth, B.: Solving a Huff-like competitive
location and design model for profit maximization in the plane. Eur. J. Oper. Res.
179(3), 1274–1287 (2007)
3. Francis, R., Lowe, T., Tamir, A.: Demand point aggregation for location models.
In: Drezner, Z., Hamacher, H. (eds.) Facility Location: Application and Theory,
pp. 207–232. Springer, Heidelberg (2002)
4. Hammer, R., Hocks, M., Kulisch, U., Ratz, D.: C++ Toolbox For Verified Com-
puting I: Basic Numerical Problems: Theory, Algorithms and Programs. Springer-
Verlag, Heidelberg (1995)
5. Hansen, E., Walster, G.W.: Global Optimization Using Interval Analysis - Second
Edition, Revised and Expanded. Marcel Dekker, New York (2004)
6. Kearfott, R.: Rigorous Global Search: Continuous Problems. Kluwer, Dordrecht
(1996)
7. Kearfott, R., Nakao, M., Neumaier, A., Rump, S.M., Shary, S.P., van Hentenryck,
P.: Standardized notation in interval analysis. Comput. Technol. 15(1), 7–13 (2010)
8. Knüppel, O.: PROFIL/BIAS - a fast interval library. Computing 53(1), 277–287
(1993)
9. Redondo, J., Fernández, J., Arrondo, A., García, I., Ortigosa, P.: Fixed or variable
demand? Does it matter when locating a facility? Omega 40(1), 9–20 (2012)
10. Redondo, J., Fernández, J., García, I., Ortigosa, P.: A robust and efficient global
optimization algorithm for planar competitive location problems. Ann. Oper. Res.
167(1), 87–106 (2009)
11. Tóth, B., Fernández, J.: Interval Methods For Single and Bi-Objective Optimiza-
tion Problems - Applied to Competitive Facility Location Problems. Lambert Aca-
demic Publishing, Saarbrücken (2010)
12. Tóth, B., Fernández, J., Csendes, T.: Empirical convergence speed of inclusion
functions for facility location problems. J. Comput. Appl. Math. 199, 384–389
(2007)
13. Tóth, B., Plastria, F., Fernández, J., Pelegrín, B.: On the impact of spatial pattern,
aggregation, and model parameters in planar Huff-like competitive location and
design problems. OR Spectr. 31(1), 601–627 (2009)
A Genetic Algorithm for Solving the
Truck-Drone-ATV Routing Problem
1 Introduction
A drone is an unmanned aircraft that flies mostly autonomously and relies on
routing algorithms to find its way. Drones have started to play an increasing role
in logistics [3,10]. Typically, drones can carry about 2 to 6 kg and reach speeds of
up to 70 km/h (multirotor drones) or 130 km/h (fixed-wing drones). Because
of their high maneuverability and relative ease of use, multirotor drones are
useful for parcel deliveries to customers in urban areas. However, to overcome
their limited range, a dense network of depots or micro-depots (e.g., DHL's
SkyPort [5]) for launching and landing the drones might be needed. Hence, an
alternative approach consists in using trucks that carry drones and assist the
driver in delivering parcels. More precisely, the advantages lie in cheap, high-
capacity, long-distance transportation by truck and the possibility of charging
the limited batteries of the drones, which in turn have faster access to hard-to-reach
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 1023–1032, 2020.
https://doi.org/10.1007/978-3-030-21803-4_101
1024 M. Moeini and H. Salewski
areas. Additionally, for delivering a single light parcel, drones are energy-efficient
and able to ignore the possibly congested road network. The combination of
trucks and drones has received particular attention in the research community
and has produced a number of articles covering different settings: from a single
truck carrying one drone (e.g., [1,7,9]) up to a fleet of trucks, each carrying
multiple drones [11,13–15].
Parcel delivery by drones is a prominent and emerging industry; however,
such deliveries face some issues, e.g., sensitivity to wind (which endangers the
drones), especially in urban areas. Furthermore, high population density requires
strict safety measures, in particular if a drone fails in its mission. Consequently,
in most western countries, regulatory authorities do not allow the operation of
fully autonomous drones; therefore, human operators are needed to supervise the
drones and require a communication link to them. This increases the true
operational costs and imposes additional restrictions on the number of drones
that can be operated in parallel, and on the places where drones can be used.
Another approach tries to combine trucks with smaller ground-based
autonomous transport vehicles (ATVs). They cannot move as fast (up to 30 km/h
on roads or 6 km/h on sidewalks), as far (up to 3 to 10 km), or as freely as drones.
However, since ATVs are allowed to travel in pedestrian areas and on sidewalks,
they can use a different network than trucks, which might allow shorter distances.
Compared to drones, ATVs are more energy-efficient, can carry much heavier
parcels (up to 40 kg), and require less space to be launched from a truck. If ATVs
were required by law to have an operator, a single operator might handle more
ATVs than drones, thus reducing the operational costs per vehicle. Prototypes
of trucks capable of dispatching up to six ATVs exist. Here again, the goal is to
improve the performance of last-mile delivery systems. This idea was adopted in
[2], where a routing model for a truck that dispatches ATVs at drop-off points is
formulated. The truck may replenish its ATVs at decentralized micro-depots,
and after serving the customers, the ATVs continue to such depots.
Since both approaches, assisting trucks with drones and combining trucks
with ATVs, might be beneficial, including them in a single approach might be
even more advantageous. Depending on the parcel's weight and the exact location
of the customer, either a drone or an ATV could be used for delivering the parcel.
Further, the truck is used to extend the limited range of the ATVs and drones.
The system is more versatile than the use of a truck with just drones or just
ATVs. In this study, we introduce a new concept in last-mile delivery in which a
truck carries a mixed fleet of drones and ATVs, such that the truck dispatches
and collects the drones and ATVs that do the actual delivery from designated
points. In other words, the truck is not used for direct deliveries. From a practical
point of view, this style of delivery system is useful for serving customers, in
particular in cases where a truck might not be allowed in certain areas (e.g., due
to ecological constraints, closed or too narrow roads, protected areas, etc.). Such
a combined system should be cheaper than the widespread installation of
micro-depots that launch drones or ATVs. Additionally, a
A Genetic Algorithm for Solving the Truck-Drone-ATV Routing 1025
2 Problem Description
capacity) and can serve only one customer per operation. But, their battery is
recharged immediately as soon as they return to the truck. The objective of
the TDA-RP consists in finding a feasible routing plan such that, all customers
are served by either drones or ATVs, and the total mission time is minimized.
Moreover, by the end of the mission, all drones and ATVs must be on the truck
and the truck must be at the depot.
Finally, we make the following additional assumptions about the behavior of
drones and ATVs in a risk-free environment [9,11,13–15]:
– A drone or an ATV has a limited battery life of E time units. After returning
to the truck, the battery life of the drone or ATV is recharged immediately
with no delay in service.
– The trucks, drones, and ATVs follow the same distance metric.
– We suppose that the service time to launch and retrieve a drone or an ATV,
as well as the service time required to serve a customer are negligible.
– We assume that the drones/ATVs are in constant flight/movement and
cannot conserve battery while in flight/movement; consequently, if a truck
arrives earlier at a grid point, then it has to wait for its corresponding
drones/ATVs, and vice versa.
– We assume that when a drone or an ATV is dispatched, then its delivery will
be successful.
– Due to technological restrictions, such as limited volume or the lack of a way
to securely divide the cargo hold, we assume that an ATV carries only one
parcel at a time, regardless of its size.
3 Solution Method
¹ A TSP route might then be polished to include only a subset of all grid points.
However, it is ensured that the limited range of the ATVs as well as the drones is
respected and that all customers can be reached from at least one grid point.
to schedule the customer deliveries from each grid point. Since only ATVs are
able to deliver large parcels, this component also includes the decision whether a
small parcel's delivery should be done by a drone (e.g., for customers 4 and 6) or
by an ATV (customer 5).
We design a GA that fits the specific structure of the TDA-RP. For this
purpose, we use a direct problem representation, with the fitness value defined by
the mission time. In the following, we provide a detailed description of our GA.
We initialize the genetic algorithm by creating solutions for the first genera-
tion until we reach a maximum population size of P . In this phase of the GA,
we only use direct sorties. For this purpose, we set the starting and landing grid
point for each customer in a solution: With a probability of pinit , the heuristic
uses the grid point on the truck’s TSP-route that is closest to the customer’s
location. With a probability of 1 − pinit , the direct sorties start from a different
grid point on the truck’s TSP-route. To choose grid points that are closer to the
customers’ location with a higher probability, we use an ordered set of all grid
points from which the customer could be served. We order this set by the grid
point’s distance to the considered customer and then draw the grid point using
a Poisson distribution with parameter λinit .
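The initialization step can be sketched as below. Since Python's standard library has no Poisson sampler, Knuth's classical algorithm is used; the parameter names (`p_init`, `lam_init`) mirror the text, while the feasibility filtering by battery range is omitted, so this is an illustrative sketch rather than the paper's implementation.

```python
import math
import random

def poisson(lam, rng=random):
    """Knuth's algorithm for sampling a Poisson(lam) variate."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def initial_grid_point(customer, grid_points, p_init=0.8, lam_init=1.0,
                       dist=lambda a, b: math.hypot(a[0]-b[0], a[1]-b[1]),
                       rng=random):
    """Pick the grid point of a customer's initial direct sortie.

    With probability p_init take the closest grid point; otherwise draw an
    index from Poisson(lam_init) over the grid points ordered by distance,
    so nearer points remain more likely to be chosen.
    """
    ordered = sorted(grid_points, key=lambda g: dist(customer, g))
    if rng.random() < p_init:
        return ordered[0]
    idx = min(poisson(lam_init, rng), len(ordered) - 1)  # clamp to valid index
    return ordered[idx]
```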
As a recombination operator, we use a proportional (roulette wheel) selection
and a two-point crossover to choose the parents and to generate the children,
respectively. After the recombination, we apply the mutation operator to the
entire population of all children and parents. During mutation, we introduce or
change a jump sortie of a randomly chosen customer’s delivery operation with
a predefined probability of pm . For this purpose, we change the starting grid
point (respectively, landing grid point) to an earlier (respectively, later) position
in the sequence of grid points in the truck’s TSP-route. We set a maximum
distance dmax and use a Poisson distribution with parameter λm to decide how
far down (respectively, up) the starting (respectively, landing) should move in the
sequence of the TSP-route. Whether the jump sortie is changed with respect to
the starting or landing grid point, depends on a preset probability pmd, where m
and d stand for mutation and distance, respectively. In particular, if the randomly
chosen move in the sequence is too large, i.e., would go beyond the depot in
any direction, then we limit the move to the depot, i.e., it is set as the delivery
operation’s starting or landing point. It might happen that invalid solution could
be introduced by jump sorties that use a combination of starting and landing grid
points which violate the battery limit of an ATV or drone. In any stage, whenever
an invalid solution is generated, we remove it from the resulting population.
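The selection and recombination operators can be sketched as follows. The solution encoding, one (start, land) grid-point pair per customer, follows the direct representation described above; details such as the repair of invalid children are omitted, so this is a sketch, not the paper's implementation.

```python
import random

def two_point_crossover(parent1, parent2, rng=random):
    """Two-point crossover on the per-customer list of (start, land) grid
    point pairs: the middle segment is inherited from the second parent."""
    n = len(parent1)
    i, j = sorted(rng.sample(range(n + 1), 2))
    return parent1[:i] + parent2[i:j] + parent1[j:]

def roulette_select(population, fitness, rng=random):
    """Proportional (roulette wheel) selection. The TDA-RP minimizes the
    mission time, so weights are taken inversely proportional to it."""
    weights = [1.0 / fitness(ind) for ind in population]
    return rng.choices(population, weights=weights, k=1)[0]
```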
In order to calculate the fitness of any solution, we first need to schedule the
ATVs’ and drones’ sorties at each grid point. Since the schedule needs to be
determined for each individual in the entire population, the calculation needs to
be quick. Hence, we base it on the approximation algorithm presented in [4],
where a so-called multi-fit descent heuristic approximates the
nonpreemptive scheduling of independent tasks on a set of identical machines.
In the TDA-RP, each machine corresponds to a drone or ATV, and each task
would be a delivery from the considered grid point. Due to the fact that the
scheduling of drones and ATVs is not independent, and furthermore, jump sorties
are possible, we need to modify the approach presented in [4] as follows: First, we
approximate the solution only for the direct sorties, and independently for ATVs
and drones. In order to use as many ATVs/drones as possible for parallel direct
sorties, we schedule any jump sorties after all direct sorties have finished. In addition, as drones
are faster than ATVs, we prefer to use drones for deliveries of small parcels. In
fact, we only consider ATVs for deliveries of small parcels, if all large parcels’
deliveries have been scheduled, and if the use of an ATV to deliver a small parcel
might reduce the overall time that a truck spends at the grid point. Based on
the operations sequence we get from the approximation, we calculate the drones’
and ATVs’ landing times and the total duration a truck needs to spend at the
considered grid point. The operations at the next grid point on the truck’s route
start once the truck arrives at the next grid point, i.e., the time required to
finish all operations at the current grid point plus the time that the truck needs
to drive from the current grid point to the next one. Through the scheduling
procedure, it might happen that some grid points are not used as starting or
landing point. In these cases, the corresponding grid points are omitted by the
truck with the objective of reducing the mission time. The maximum arrival time
of the truck, ATVs, or drones at the depot is the fitness value of the solution
and corresponds to the objective function value of the TDA-RP.
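Scheduling the sorties of identical vehicles at one grid point is a makespan-minimization problem, which is why the machine-scheduling heuristic of [4] applies. As a simplified illustrative stand-in for that multi-fit heuristic, the longest-processing-time (LPT) greedy below conveys the flavor of the computation of the truck's waiting time at a grid point.

```python
import heapq

def lpt_makespan(durations, machines):
    """Longest-Processing-Time greedy for scheduling independent sorties on
    identical vehicles (a simpler stand-in for the multi-fit heuristic of [4]).
    Returns the makespan, i.e. when the last vehicle returns to the truck."""
    loads = [0.0] * machines            # current finish time of each vehicle
    heapq.heapify(loads)
    for d in sorted(durations, reverse=True):
        lightest = heapq.heappop(loads)  # assign to the least-loaded vehicle
        heapq.heappush(loads, lightest + d)
    return max(loads)
```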
We repeat the recombination and the mutation until we can find no further
improvement for a preset number of consecutive iterations. We restart the GA
for a given number of times and report the best found solution for the same route
of grid points (for now) over all independent runs as the result of the algorithm.
4 Computational Experiments
We generated test instances that differ in the number of customers (25, 50, or 75)
and grid points (10 or 20). For generating an instance, we randomly place grid
points in a 20 by 20 km area with a depot at the center. We scatter customers at
random spots within the mentioned area, and ensure that each customer could be
reached from at least one grid point while respecting the endurance restrictions
of the ATVs and drones. Furthermore, the demanded parcel size is determined
randomly, i.e., each customer has a 50% probability of receiving either a large
or a small parcel. For each combination of the number of customers and grid
points, we generate 5 different instances. Considering different combinations of
the number of drones (0, 1, or 2) and ATVs (2 or 4), we have a total number
of 180 test instances [12] on which we apply the realistic technical specifications
presented in Table 1, stating that drones are 7.6 times faster than ATVs.
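A generator in the spirit of this description might look as follows. The endurance radius `reach` and the rejection-sampling loop are assumptions, since the text only states that every customer must be reachable from at least one grid point.

```python
import math
import random

def generate_instance(n_customers, n_grid_points, area=20.0, reach=3.0,
                      seed=0):
    """Random TDA-RP-style instance: grid points and customers scattered in
    an area x area km square with the depot at the center; every customer is
    reachable from some grid point within `reach` km (an assumed endurance
    radius); parcel size is large or small with probability 0.5 each."""
    rng = random.Random(seed)
    depot = (area / 2, area / 2)
    grid = [(rng.uniform(0, area), rng.uniform(0, area))
            for _ in range(n_grid_points)]
    customers = []
    while len(customers) < n_customers:
        c = (rng.uniform(0, area), rng.uniform(0, area))
        # rejection sampling: keep only reachable customers
        if any(math.hypot(c[0]-g[0], c[1]-g[1]) <= reach for g in grid):
            customers.append((c, rng.choice(['small', 'large'])))
    return depot, grid, customers
```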
– Taking into account the size of the test instances, which fits typical real-
world applications, it is interesting to note that the introduced GA is able
to provide routing plans for all instances in a short computation time. Two
sample results for a problem instance with 25 customers and 20 grid points
are depicted in Figs. 2 and 3.
– A larger number of customers requires a longer mission time, which is
confirmed by the numerical results. Similarly, a larger number of vehicles
increases the computation time of the GA.
– Using a mixed fleet of drones and ATVs considerably reduces both the
mission time and the share of small parcels delivered by ATVs. Due to the
higher speed of drones, these effects are not surprising. Finally, a larger
number of vehicles (ATVs and drones) increases the possibility of jump
sorties.
Table 2. Average mission time and solution time, ratio of ATV deliveries of small
parcels, and ratio of jump sorties, as obtained by the GA on the test instances.
5 Conclusion
References
1. Bouman, P., Agatz, N., Schmidt, M.: Dynamic programming approaches for the
traveling salesman problem with drone. Networks 72(4), 528–542 (2018)
2. Boysen, N., Schwerdfeger, S., Weidinger, F.: Scheduling last-mile deliveries with
truck-based autonomous robots. Eur. J. Oper. Res. 271(3), 1085–1099 (2018)
3. Carlsson, J.G., Song, S.: Coordinated logistics with a truck and a drone. Manag.
Sci. 64(9), 4052–4069 (2018)
4. Coffman, E., Garey, M., Johnson, D.: An application of bin-packing to multipro-
cessor scheduling. SIAM J. Comput. 7(1), 1–17 (1978)
5. Deutsche Post DHL Group: DHL Parcelcopter. Press Kit. https://www.dpdhl.
com/en/media-relations/specials/dhl-parcelcopter.html (2019)
6. Reeves, C.R.: Genetic algorithms. In: Gendreau, M., Potvin, J.-Y. (eds.) Handbook
of Metaheuristics, pp. 109–139. Springer, New York (2010). https://doi.org/10.
1007/978-1-4419-1665-5_5
7. Ha, Q.M., Deville, Y., Pham, Q.D., Hà, M.H.: On the min-cost traveling salesman
problem with drone. Transp. Res. Part C 86, 597–621 (2018)
8. Lin, S., Kernighan, B.W.: An effective heuristic algorithm for the traveling-
salesman problem. Oper. Res. 21(2), 498–516 (1973)
9. Murray, C.C., Chu, A.G.: The flying sidekick traveling salesman problem: opti-
mization of drone-assisted parcel delivery. Transp. Res. Part C 54, 86–109 (2015)
10. Otto, A., Agatz, N., Campbell, J., Golden, B., Pesch, E.: Optimization approaches
for civil applications of unmanned aerial vehicles (UAVs) or aerial drones: a survey.
Networks 72(4), 411–458 (2018)
11. Poikonen, S., Wang, X., Golden, B.: The vehicle routing problem with drones:
extended models and connections. Networks 70(1), 34–43 (2017)
12. Salewski, H., Moeini, M.: Instances for the truck-drone-ATV routing problem.
https://doi.org/10.5281/zenodo.2600809
13. Schermer, D., Moeini, M., Wendt, O.: Algorithms for solving the vehicle routing
problem with drones. Lect. Notes Artif. Intell. 10751, 352–361 (2018)
14. Schermer, D., Moeini, M., Wendt, O.: A Variable Neighborhood Search Algorithm
for Solving the Vehicle Routing Problem with Drones, pp. 1–33. Technical Report,
Technische Universität Kaiserslautern (2018)
15. Wang, X., Poikonen, S., Golden, B.: The vehicle routing problem with drones:
several worst-case results. Optim. Lett. 11(4), 679–697 (2016)
A Planning Problem with Resource
Constraints in Health Simulation Center
1 Introduction
a center. We detail below the different elements of this planning problem taken
into consideration in our study.
Horizon: The horizon H used is one week, decomposed into working days. Let
D be the set of these working days and, ∀d ∈ D, let Td denote the set of
time slots of day d, with T = ∪_{d∈D} Td. Each time slot represents 1 h. Let
break_d be a subset of slots identified as potential break times for day d. At
least one of these time slots should be kept idle to ensure the existence of a
daily lunch break for any session.
Resources: We have a finite set of resources R = Rr ∪ Re, with Rr the set
of rooms and Re the set of employees. With Re is associated a set of types
Λe = {λ1, ..., λ|Λe|}, which corresponds, for example, to the skills of employees.
With Rr is likewise associated a set of types Λr = {λ|Λe|+1, ..., λ|Λe|+|Λr|}, which
corresponds, for example, to specific room equipment. We denote Λ = Λr ∪ Λe.
Each resource can have more than one associated type. For example, a room
may be equipped with artificial arms for the simulation of taking blood, but also
with artificial vertebral columns for the simulation of lumbar punctures. We
denote by qtav^t_λi the quantity of resource type λi available at time slot t. All
activities scheduled at time slot t cannot use more than the available resources.
We also take into account the availabilities of employees.
Activities: Let A be the set of activities. Each activity a ∈ A is characterized
by a duration duration_a, an earliest starting date ES_a and a latest starting
date LS_a. qtreq_{λi}^{a} is the quantity of resource of type λi, ∀i = 1, ..., |Λ|, required
by activity a, and Λa = {λi ∈ Λ / qtreq_{λi}^{a} ≠ 0} is the set of resource types
required by a. A precedence relation is defined between the activities, and we
denote by pred_a the set of activities that must be planned before activity a.
Training session: Let S be the set of training sessions to be scheduled over hori-
zon H. Each training session s ∈ S is composed of a set of activities As, and
Λs = ∪_{a∈As} Λa gives the resource types required by training session s. The oper-
ating rules relative to the activities of a given session s are that no two activities
of As may be planned at the same time, and activities are not preemptive.
Algorithm 1 SimU G
Require: S (set of unscheduled training sessions), T (set of time slots)
Ensure: Sol (a feasible solution), Sol− (set of unscheduled activities)
1: Sol ← ∅
2: Sol− ← ∅
3: while S ≠ ∅ do
4: s∗ ← sessionChoice(S)
5: S ← S \ {s∗ }
6: t∗ ← f indBetterStart(s∗ , T )
7: t ← t∗
8: U As ← As
9: while (t ≤ |T |) ∧ (U As ≠ ∅) do
10: EA ← eligibleActivities(As , t)
11: if EA ≠ ∅ then
12: (a∗ , Ra∗ ) ← activityChoice(EA, t)
13: Sol ← Sol ∪ (a∗ , t, Ra∗ )
14: updateAvailability(a∗ , t, Ra∗ )
15: U As ← U As \ {a∗ }
16: t ← t + durationa∗
17: else
18: t←t+1
19: end if
20: end while
21: Sol− ← Sol− ∪ U As
22: end while
23: return (Sol, Sol− )
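The greedy loop of Algorithm 1 can be sketched in Python. This is a minimal sketch under strong simplifying assumptions: a single resource type, sessions already ordered (sessionChoice), the first pending activity taken as activityChoice, the session start fixed at slot 0 instead of findBetterStart, and no lunch-break rule; all names are ours, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class Activity:
    name: str
    duration: int   # number of one-hour slots
    req: int        # units of the single resource type required

def simug(sessions, capacity, horizon):
    """Greedy SimUG sketch: schedule sessions one by one; within a session,
    place the next pending activity at the earliest slot where enough
    resource is continuously available, else advance one slot."""
    avail = [capacity] * horizon            # remaining resource units per slot
    sol, unscheduled = [], []
    for acts in sessions:
        t, pending = 0, list(acts)
        while t < horizon and pending:
            a = pending[0]
            window = avail[t:t + a.duration]
            if len(window) == a.duration and all(v >= a.req for v in window):
                sol.append((a.name, t))     # schedule a at slot t
                for u in range(t, t + a.duration):
                    avail[u] -= a.req       # updateAvailability
                pending.pop(0)
                t += a.duration
            else:
                t += 1                      # no eligible start, try next slot
        unscheduled.extend(a.name for a in pending)
    return sol, unscheduled
```

With one resource unit and two sessions, activities are packed left to right and the second session starts once the first releases the resource, mirroring the slot-advance behavior of the inner while loop.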
Function f indBetterStart() computes the best start date for session s∗ . In order
to find this date, we need to compute for s∗ and for any time slot t, the earliest
end date endt of s∗ . To compute it, we relax all resource constraints.
For a given start date t and for s∗ , Eq. 5 computes endt , where break() is
a function that gives the number of time slots for lunch breaks required by the
operating rules.
1038 S. Caillard et al.
end_t = t + Σ_{a∈As∗} duration_a + break(t, Σ_{a∈As∗} duration_a)   (5)
If two time slots have the same score, we choose the time slot t that maximizes the
sum of available resources over [t, end_t]. The more resources remain available, the
more opportunities there are to plan the remaining activities. The available resources
score of time slot t is given by Eq. (7).

avail_t = Σ_{t′=t}^{end_t} Σ_{λi∈Λs} qtav_{λi}^{t′}   (7)
The best starting time slot t∗ (see Eq. (8)) for training session s∗ is then the
time slot with the smallest resource deficiency Dt and, as a second criterion, the
biggest resource availability avail_t.
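Under the simplifying assumptions of a single resource type and a stubbed break() (the real rule requires one idle lunch slot per day), Eqs. (5) and (7) can be sketched as follows; the function names and encodings are ours.

```python
def end_date(t, durations, brk=lambda t, d: 0):
    """Eq. (5): earliest end date end_t of a session started at slot t with
    all resource constraints relaxed; brk(t, d) returns the number of
    lunch-break slots required (stubbed to 0 here)."""
    total = sum(durations)                 # sum of duration_a over a in As*
    return t + total + brk(t, total)

def avail_score(t, end_t, qtav):
    """Eq. (7): avail_t, the sum of available resource quantities over the
    slots [t, end_t], for a single resource type (qtav[u] is the quantity
    available at slot u)."""
    return sum(qtav[u] for u in range(t, end_t + 1))
```

Comparing avail_score over candidate start slots implements the second-criterion tie-break described above.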
For a given set As∗ of unscheduled activities of session s∗ and a time slot t,
eligibleActivities() computes the set EA of pairs (a, Ra), where a is an activity
that could start at time slot t with the pre-assigned set of resources Ra.
Activity a can start at t if there are enough resources continuously available
over the period [t, t + duration_a] and ES_a ≤ t ≤ LS_a. Moreover, ∀a′ ∈ pred_a, a′
must already be scheduled, with t_{a′} + duration_{a′} < t. We note that an activity can
require several different types of resources and that a resource can be of several types.
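The eligibility test can be sketched as follows; this is a sketch assuming a single resource type, and the dictionaries ES, LS, pred, duration and req are our encoding, not the authors'.

```python
def eligible(a, t, avail, scheduled, ES, LS, pred, duration, req):
    """True if activity a may start at slot t: ES_a <= t <= LS_a holds,
    enough resource is continuously available on the duration_a slots
    starting at t, and every predecessor p is already scheduled with
    t_p + duration_p < t (strict, as in the operating rules)."""
    if not (ES[a] <= t <= LS[a]):
        return False
    window = avail[t:t + duration[a]]
    if len(window) < duration[a] or any(v < req[a] for v in window):
        return False
    return all(p in scheduled and scheduled[p] + duration[p] < t
               for p in pred.get(a, ()))
```

The strict inequality on predecessors matches the rule t_{a′} + duration_{a′} < t stated above.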
When activity a∗ is scheduled on t, all resources in Ra∗ are set unavailable over
the period [t, t + durationa∗ ]. Let us note that precedence constraints are also
updated for all activities linked to a∗ .
Table 1 presents the comparison between CPLEX and SimUG (with penalty
α set to |T|). For each instance, column CPLEX gives the optimal makespan Mcplex,
column SimUG gives the makespan MSimUG computed by our algorithm, and the
last column Gap is computed as (MSimUG − Mcplex)/Mcplex.
SimUG achieved the optimal solution for the D0T0C0 and D0T0C1 Brazil1 family
instances. Moreover, the gap is always less than 6% in the other cases. We can
observe that optimality on the generated instances is always the same, except for the
D1T1C0 family. Indeed, for these instances, the availabilities of employees and rooms
are reduced without compensating for the increasing number of types associated
with employees (see the D1T1C1 family).
In this paper, we have presented a first study of a planning problem with resource
constraints for the health training center SimUSanté. We proposed a greedy
algorithm, SimUG, based on a set of choice criteria aimed at reducing the over-
all makespan of training sessions while respecting all resource and time con-
straints. We experimented with SimUG on new instances, generated from those of
CB-CTT and integrating the characteristics of the SimUSanté problem. The results
obtained were compared to the optimal solutions provided by the CPLEX solver.
Optimality was reached for a few instances, but for the others the gap was still
below 6%. SimUG produces a suitable basic solution that we plan to use in a
genetic algorithm, which is the focus of our current research.
References
1. High School Timetabling Project. https://www.utwente.nl/en/eemcs/dmmp/hstt/
2. Bellenguez-Morineau, O., Neron, E.: A branch and bound method for solving multi-
skill project scheduling. RAIRO Op. 41, 155–170 (2007)
3. Browning, T.R., Yassine, A.A.: Resource-constrained multi-project scheduling: pri-
ority rules. Int. J. Prod. Econ. 126, 212–228 (2010)
4. Brucker, P., Knust, S.: Resource-constrained project scheduling and timetabling. In:
PATAT 2000, LNCS 2079. Springer, Heidelberg (2001)
5. Caillard, S., Brisoux-Devendeville, L., Lucet, C.: Health simulation center SimUSanté's problem benchmarks. https://mis.u-picardie.fr/en/Benchmarks-GOC/
6. Cooper, T.B., Kingston, J.H.: The Complexity of Timetable Construction Problems.
Springer, Heidelberg (1995)
7. Di Gaspero, L., McCollum, B., Schaerf, A.: Curriculum-based CTT - Technical
Report. The Second International Timetabling Competition (ITC-2007)
8. Pritsker, A.A.B., Watters, L.J., Wolfe, P.M.: Multiproject scheduling with limited
resources: a zero-one programming approach. Manag. Sci. 16(1), 93–108 (1969).
http://www.jstor.org/stable/2628369
9. Schaerf, A.: A survey of automated timetabling. Artif. Intell. Rev. 87–127 (1999)
Edges Elimination for Traveling Salesman
Problem Based on Frequency K5 s
Yong Wang(B)
1 Introduction
Given Kn on n vertices {1, . . . , n}, there is a distance function d(x, y) =
d(y, x) > 0 for any x, y ∈ {1, . . . , n}, x ≠ y. A salesman wants to find
a permutation σ = (σ1 , . . . , σn ) of 1, . . . , n such that σ1 = 1 and
d(σ) := d(σn , 1) + Σ_{i=1}^{n−1} d(σi , σi+1 ) is as small as possible. This is the
symmetric traveling salesman problem (TSP). Due to its theoretical value and wide
applications in engineering, the TSP has been extensively studied to find efficient
algorithms for searching for either an optimal Hamiltonian cycle (OHC) or an
approximate solution, i.e. a Hamiltonian cycle given by a permutation τ such that
d(τ) ≤ c·d(σ), where σ is the OHC and c is some constant. There are a number of
special classes of graphs where one can find the OHC in a reasonable computation
time, see [1]. Karp [2] has shown that the TSP is NP-complete. This means that
there are no exact polynomial-time algorithms for the TSP unless P = NP. The
computation time of exact algorithms is O(a^n) for some a > 1 for the general TSP.
For example,
We acknowledge W. Cook and H. Mittelmann, who created Concorde, and G. Reinelt
et al., who provided the TSP data in TSPLIB. The authors acknowledge the funds
supported by the Fundamental Research Funds for the Central Universities (No.
2018MS039 and No. 2018ZD09).
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 1043–1053, 2020.
https://doi.org/10.1007/978-3-030-21803-4_103
1044 Y. Wang
Held and Karp [3], and independently Bellman [4], gave a dynamic programming
approach that requires O(n^2 2^n) time. Integer programming techniques, such as
either branch and bound [5] or cutting planes [6], are able to solve TSP examples
with thousands of points. In 2006, a VLSI instance with 85,900 points was
solved with an improved cutting-plane method on a computer system with 128
nodes [6]. The experiments showed that the computation time of the exact
algorithms is hard to reduce for large TSP instances.
On the other hand, the computation time of approximation algorithms and
heuristics has been significantly decreased. For example, the MST-based algo-
rithm and Christofides' algorithm [7] can find a 2-approximation and a 1.5-
approximation in time O(n^2) and O(n^3), respectively, for the metric TSP. For the
graphic TSP, Mömke and Svensson [8] gave a 1.461-approximation algorithm
with respect to the Held-Karp lower bound. In most cases, the Lin-Kernighan
heuristic (LKH) can generate "high quality" solutions within 2% of the optimum
in nearly O(n^2.2) time [9]. However, these approximation algorithms and heuris-
tics cannot guarantee finding an OHC.
In recent years, researchers have developed polynomial-time algorithms to
resolve the TSP on sparse graphs. In sparse graphs, the number of Hamiltonian
cycles (HC) is greatly reduced. For example, Sharir and Welzl [10] proved that in
a sparse graph of average degree d, the number of HCs is less than e∗(d/2)^n, where
e∗ is the base of the natural logarithm. In addition, Björklund [11] proved that the TSP
on bounded-degree graphs can be solved in time O((2 − ε)^n), where ε depends
on the maximum vertex degree. For the TSP on cubic connected graphs, Correa,
Larré and Soto [12] proved that the approximation threshold is strictly below 4/3.
For the TSP on bounded-genus graphs, Borradaile, Demaine and Tazari [13] gave a
polynomial-time approximation scheme. In the case of the asymmetric TSP, Gharan
and Saberi [14] designed constant-factor approximation algorithms. For the TSP
on planar graphs, the constant factor is 22.51(1 + 1/n). Thus, whether one is trying
to find exact or approximate solutions to the TSP, one has a variety
of more efficient algorithms available if one can reduce a given TSP to finding
an OHC in a sparse graph.
Based on 2-opt moves, Jonker and Volgenant [15] found many useless
edges out of the OHC. After these edges were trimmed, the computation time of
branch-and-bound for certain TSP instances was reduced by half. Hougardy and
Schroeder [16] eliminated useless edges with a combinatorial algorithm based
on 3-opt moves. The combinatorial algorithm eliminates more useless edges for
TSP instances in TSPLIB, and the computation time of the Concorde package was
reduced by more than 11 times for certain big TSP instances. Besides accel-
erating the exact algorithms for the TSP, the candidate edges in a sparse graph
help the local-search solver LKH to detect high-quality solutions quite
efficiently [17]. Differently from the above research, we eliminate useless edges
for the TSP according to frequencies of edges computed with frequency quadrilat-
erals [18,19] and optimal four-vertex paths [20]. As the frequencies of edges are
computed with either frequency quadrilaterals or optimal 4-vertex paths, the
frequencies of OHC edges are generally much bigger than those of most of the
other edges. As the minimum frequency of the OHC edges is taken as a fre-
quency threshold to cut the other edges, the experiments showed that a sparse
graph with O(n log2(n)) edges is obtained for most TSP instances.
In this paper, frequency K5s are presented and a binomial distribution
model based on frequency K5s is built. According to the binomial distribution,
one can eliminate the half of the edges with small frequencies, and OHC edges are
preserved with a high probability. If each edge is contained in a large enough number
of K5s, the edge elimination can be repeated until few K5s remain. In this way, the
Kn of the TSP is converted into a sparse graph. The sparse graphs generally have
O(|V|) or O(|V| ln(|V|)) edges. In addition, if the resulting graph has bounded
degree (genus) or is planar or k-edge connected, then we have even more efficient
algorithms available to find exact or approximate solutions to the TSP.
The outline of this paper is as follows. In Sect. 2, frequency K5s are
introduced and a probability model is built for identifying OHC edges. In Sect. 3,
a binomial distribution model based on frequency K5s is introduced. In Sect. 4,
a heuristic algorithm is designed to trim many useless edges. In Sect. 5, we carry
out experiments to cut edges for four types of TSP instances. Conclusions are
drawn in the last section.
[Fig. 1. Three K5s on the vertices {A, B, C, D, E}: (a) a K5; (b) the frequency K5 computed with the ten OP5s, where each OHC edge has frequency 7 and each other edge has frequency 1; (c) a frequency K5 with the frequency set 9, 8, 7, 6, 5, 2, 1, 1, 1, 0.]
(E, A, B, C, D). Based on the five OP5s, the ten distance inequalities determined
by the edges' distances are derived and shown in Table 1. Besides the five OP5s in
the OHC, there are five other OP5s. The distance inequalities used to compute the
other OP5s cannot violate the inequalities for computing the OP5s in the OHC. For
example, one possible set of the other five OP5s is (A, B, E, D, C), (B, C, A, E, D),
(C, D, B, A, E), (D, E, C, B, A) and (E, A, D, C, B). The inequalities to compute
these five OP5s are not hard to derive; we omit them to save space. The fre-
quency K5 computed with the ten OP5s is shown in Fig. 1(b). The numbers beside
the edges are their frequencies, enumerated from the ten OP5s. The frequency
of each OHC edge is 7 and that of the other edges is 1. The frequency of the OHC
edges is above 4 and the frequency of the other edges is below 4.
OP5s Distance inequalities
(A, B, C, D, E) d(A, B) + d(C, D) < d(A, C) + d(B, D)
(B, C, D, E, A) d(A, B) + d(D, E) < d(A, D) + d(B, E)
(C, D, E, A, B) d(A, E) + d(B, C) < d(A, C) + d(B, E)
(D, E, A, B, C) d(A, E) + d(C, D) < d(A, D) + d(C, E)
(E, A, B, C, D) d(B, C) + d(D, E) < d(B, D) + d(C, E)
d(A, B) + d(A, E) + d(C, D) < d(A, C) + d(A, D) + d(B, E)
d(A, B) + d(B, C) + d(D, E) < d(A, C) + d(B, D) + d(B, E)
d(A, B) + d(C, D) + d(D, E) < d(A, D) + d(B, D) + d(C, E)
d(A, E) + d(B, C) + d(C, D) < d(A, C) + d(B, D) + d(C, E)
d(A, E) + d(B, C) + d(D, E) < d(A, D) + d(B, E) + d(C, E)
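The frequency computation can be sketched by taking, as each OP5, the shortest Hamiltonian path of the K5 between a fixed pair of endpoints (the ten endpoint pairs then yield the ten OP5s); the function and the distance encoding below are our sketch under this assumption.

```python
from itertools import combinations, permutations

def k5_frequencies(d):
    """For a K5 with distance function d (a dict keyed by frozenset pairs),
    enumerate the ten OP5s (one shortest Hamiltonian path per endpoint
    pair) and count how many OP5s contain each edge."""
    V = range(5)
    freq = {frozenset(e): 0 for e in combinations(V, 2)}
    for u, v in combinations(V, 2):           # the ten endpoint pairs
        mid = [w for w in V if w not in (u, v)]
        paths = ((u,) + p + (v,) for p in permutations(mid))
        best = min(paths, key=lambda q: sum(d[frozenset((q[i], q[i + 1]))]
                                            for i in range(4)))
        for i in range(4):                    # the four edges of the OP5
            freq[frozenset((best[i], best[i + 1]))] += 1
    return freq
```

Summed over the ten OP5s, the frequencies always total 40; in a K5 satisfying the inequalities of Table 1, the five OHC edges receive frequency 7 each and the five other edges 1 each, as in Fig. 1(b).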
We assume each K5 contains one OHC and ten OP5s. An edge e is contained
in C(n−2, 3) K5s in Kn. If e belongs to the OHC of a K5, its frequency will be above 4.
Otherwise, its frequency will be much smaller than 4. As we choose N frequency
K5s containing e to compute its total frequency, the frequency of e will be nearly
equal across the different frequency K5s. Thus, we choose one of the thirty-one
frequency K5s as a standard model to compute the frequency of edges in Kn.
Among the thirty-one frequency K5s, the frequency set 9, 8, 7, 6, 5, 2, 1, 1, 1, 0
occurs the maximal number of times, namely 10. One frequency K5 containing this
frequency set is shown in Fig. 1(c). We take the frequency K5 containing this
frequency set as the standard model to compute the frequency of each edge in Kn.
[Figure: from a K5 {A, B, C, D, E} in Kn, an inequality such as d(A,B)+d(C,D) < d(A,C)+d(B,D) indicates that (A,B) and (C,D) belong to the OHC in ABCDE, relating the frequency K5s to the OHC in Kn.]
obviously bigger than 4N for a general edge when N is big. Thus, one can eliminate
the edges with frequencies less than 4N and the OHC will be kept intact.
p5(e_o) = p6(e_o) = p7(e_o) = p8(e_o) = p9(e_o) = 1/10 + 3/(10(n − 2)),
p0(e_o) = p2(e_o) = 1/10 − 3/(10(n − 2)).   (1)
In a K5, half of the edges belong to the OHC. It means half of the edges will have
frequency above 4. For an edge e in Kn, p>4 = 1/2. As we choose N frequency
K5s containing e, there will be N/2 frequency K5s where the frequency of e is
above 4 and N/2 where it is below 4. If we use 4 as a frequency threshold to trim e,
e will be eliminated N/2 times. The probability that e is eliminated is 1/2 according
to frequency threshold 4. Considering the C(n, 2) edges in Kn, half of the edges will
be cut according to threshold 4. Since OHC edges generally have big frequencies
based on formula (1), they will be preserved as we delete the half of the edges with
small frequencies.
After one round of edge elimination, a graph with (1/2)C(n, 2) edges is preserved.
As long as N is big enough for each edge in the preserved graph, we can compute
each of their frequencies with N frequency K5s and eliminate another half of the
edges with small frequencies. This edge elimination can be iterated until N becomes
small for the edges in some preserved graph. As N ≈ 0, the binomial distribution (2)
does not work well. In this case, we will have computed a sparse graph for the TSP;
see the experiments based on frequency quadrilaterals [20]. At the k-th cycle, the
number of preserved edges is (1/2)^k C(n, 2). Thus, the maximum number of
iterations is k_max = log_{1/2}(2/(n − 1)).
For an OHC edge e_o, p>4 > 1/2 based on formula (1). It means the probability
that e_o is cut is less than 1/2 according to frequency threshold 4. As we trim the half
of the edges with small frequencies, e_o will be preserved with a high probability.
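The preservation argument can be made concrete under a simplified model in which each of the N sampled frequency K5s independently "votes" on whether the edge's frequency exceeds 4 (a proxy for the total-frequency threshold 4N); the functions below are our sketch of this model.

```python
from math import ceil, comb, log2

def p_gt4_ohc(n):
    """Formula (1): for an OHC edge, p>4 = p5 + p6 + p7 + p8 + p9
    = 5*(1/10 + 3/(10*(n - 2))) = 1/2 + 3/(2*(n - 2))."""
    return 0.5 + 3.0 / (2 * (n - 2))

def survival_prob(N, p):
    """P(the edge scores above 4 in at least half of N sampled K5s):
    the binomial tail P(X >= ceil(N/2)) with X ~ Bin(N, p)."""
    k0 = (N + 1) // 2
    return sum(comb(N, k) * p**k * (1 - p)**(N - k) for k in range(k0, N + 1))

def k_max(n):
    """After k halving rounds, (1/2)**k * C(n, 2) edges remain, so about
    log2((n - 1)/2) rounds leave on the order of n edges."""
    return ceil(log2((n - 1) / 2))
```

For n = 12, an OHC edge has p>4 = 0.65 per K5, so its survival probability over N sampled K5s is well above that of a general edge with p>4 = 1/2.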
4 A Heuristic Algorithm
TSP n M1 /l1 davg dmax M2 /l2 davg dmax M3 /l3 davg dmax M4 /l4 davg dmax
att48 48 142/0 5 10
gr229 229 759/0 6 13 589/2 5 9 531/5 4 8 506/9 4 8
rd400 400 3813/0 19 49 2415/1 12 31 1691/4 8 21 1083/7 5 12
gr431 431 3703/0 17 41 1982/1 9 27 1209/4 5 11 1121/7 5 10
pcb442 442 1712/0 7 19 1349/1 6 14 1242/4 5 12 1147/8 5 10
att532 532 8326/0 31 70 4760/2 17 42 2492/5 9 22 1657/8 6 15
si535 535 8338/0 31 84 6562/3 24 69 5293/6 19 56 3801/9 14 46
pa561 561 3807/0 13 34 2584/3 9 25 1887/7 6 18
gr666 666 5568/0 16 58 3670/2 11 39 2451/5 7 25 2028/8 6 21
rat783 783 8308/0 21 38 4077/1 10 39 3203/5 8 16 2404/8 6 11
si1032 1032 43038/0 83 212 20689/1 40 77 10882/3 21 48 8469/9 16 36
d1291 1291 16868/0 26 69 10932/1 16 45 5076/5 7 23 3093/10 4 11
d1655 1655 16417/0 19 84 14633/1 17 73 11080/3 13 57 9379/5 11 51
u1817 1817 44728/0 49 116 13682/1 15 36 7770/5 8 21 5863/10 6 16
rl1889 1889 78461/1 83 231 54355/3 57 189 29536/6 31 123 11340/10 12 56
d2103 2103 50184/1 47 136 33681/2 32 102 15873/7 15 55 10158/9 9 36
u2152 2152 20009/0 18 42 15412/1 14 33 13284/3 12 28 9631/8 8 24
u2319 2319 8557/0 7 9 7802/2 6 8 4862/5 4 8 4739/7 4 7
pr2392 2392 39356/0 32 87 26771/2 22 65 12089/5 10 28 9110/9 7 20
increased by 5. For OHC edges, big values of f̄(e) will be computed as N rises,
since they have big probabilities p>4. When Mi − Mi+1 < 5, the heuristic algorithm
terminates. At each iteration, the frequency threshold f, the number
M of preserved edges, the number l of eliminated OHC edges, and the minimum,
average and maximum vertex degrees dmin, davg and dmax are recorded. We
show four groups of results according to li (i = 1, 2, 3, 4), i.e., l1 = 0, 1 ≤ l2 ≤ 3,
4 ≤ l3 ≤ 7 and 8 ≤ l4 ≤ 10. The values of li and the corresponding minimum
Mi, davg and dmax are given in Table 2.
As l1 = 0, M1 = O(n log2 n) and davg = O(log2 n). The heuristic algorithm
cut many useless edges for all of the TSP instances. In addition, dmax ≤ 3davg
for nearly all of the instances when l1 = 0. It means OHC edges have big p>4 in Kn
and in the preserved graphs, so they are not cut. In the following process, more and
more edges are eliminated. M2, M3 and M4 decrease quickly, although only a few
known OHC edges are cut. For example, for 1 ≤ l2 ≤ 3, M2 is much smaller than
M1. In the sparse graphs, most OHC edges still have bigger p>4 than most
of the other eliminated edges. It indicates that the heuristic algorithm works well to
delete useless edges for either dense or sparse graphs of the TSP. When 4 ≤ l3 ≤ 7,
M3 < n log2 n. In this case, the algorithm has computed a very sparse graph for the
TSP at the expense of losing a few OHC edges.
6 Conclusions
Frequency K5s have good properties for eliminating useless edges
for the TSP. As we choose N frequency K5s for an edge to compute its frequency,
the binomial distribution demonstrates that OHC edges generally have higher fre-
quencies than most of the other edges. A heuristic algorithm is provided to cut
useless edges according to their frequencies. The probability model and binomial
distribution are verified by the experimental results.
References
1. Johnson, D.S., McGeoch, L.-A.: The Traveling Salesman Problem and its Varia-
tions, Combinatorial Optimization. 1st edn. Springer Press, London (2004)
2. Karp, R.: On the computational complexity of combinatorial problems. Networks
5(1), 45–68 (1975)
3. Held, M., Karp, R.: A dynamic programming approach to sequencing problems. J.
Soc. Ind. Appl. Math 10(1), 196–210 (1962)
4. Bellman, R.: Dynamic programming treatment of the traveling salesman problem.
J. ACM 9(1), 61–63 (1962)
5. Klerk, E.-D., Dobre, C.: A comparison of lower bounds for the symmetric circulant
traveling salesman problem. Discret. Appl. Math 159(16), 1815–1826 (2011)
6. Applegate, D., Bixby, R., Chvátal, V., Cook, W., Espinoza, D.-G., Goycoolea, M.,
Helsgaun, K.: Certification of an optimal TSP tour through 85900 cities. Oper.
Res. Lett. 37(1), 11–15 (2009)
7. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algo-
rithms, 2nd edn. China Machine Press, Beijing (2006)
8. Mömke, T., Svensson, O.: Approximating graphic TSP by matchings. In: FOCS
2011, pp. 560–569. IEEE, NY (2011)
9. Helsgaun, K.: An effective implementation of the Lin-Kernighan traveling salesman
heuristic. Eur. J. Oper. Res. 126(1), 106–130 (2000)
10. Sharir, M., Welzl, E.: On the number of crossing-free matchings, cycles, and par-
titions. SIAM J. Comput. 36(3), 695–720 (2006)
11. Björklund, A., Husfeldt, T., Kaski, P., Koivisto, M.: The traveling salesman prob-
lem in bounded degree graphs. ACM T. Algorithms 8(2), 1–18 (2012)
12. Correa, J.-R., Larré, O., Soto, J.-A.: TSP tours in cubic graphs: beyond 4/3. SIAM
J. Discret. Math. 29(2), 915–939 (2015)
13. Borradaile, G., Demaine, E.-D., Tazari, S.: Polynomial-time approximation
schemes for subset-connectivity problems in bounded-genus graphs. Algorithmica
68(2), 287–311 (2014)
14. Gharan, S.-O., Saberi, A.: The asymmetric traveling salesman problem on graphs
with bounded genus. In: SODA 2011, pp. 23–25. ACM (2011)
15. Jonker, R., Volgenant, T.: Nonoptimal edges for the symmetric traveling salesman
problem. Oper. Res. 32(4), 837–846 (1984)
16. Hougardy, S., Schroeder, R.-T.: Edges elimination in TSP instances. In: Kratsch,
D., Todinca, I. (eds.) WG 2014. LNCS, vol. 8747, pp. 275–286. Springer, Heidelberg
(2014)
17. Taillard, É.-D., Helsgaun, K.: POPMUSIC for the traveling salesman problem.
Eur. J. Oper. Res. 272(2), 420–429 (2019)
18. Wang, Y., Remmel, J.-B.: A binomial distribution model for the traveling salesman
problem based on frequency quadrilaterals. J. Graph Algorithms Appl. 20(2), 411–
434 (2016)
19. Wang, Y., Remmel, J.-B.: An iterative algorithm to eliminate edges for traveling
salesman problem based on a new binomial distribution. Appl. Intell. 48(11), 4470–
4484 (2018)
20. Wang, Y.: An approximate method to compute a sparse graph for traveling sales-
man problem. Expert Syst. Appl. 42(12), 5150–5162 (2015)
Industrial Symbioses: Bi-objective Model
and Solution Method
1 Introduction
In recent years, it has become obvious that mankind's needs cannot be considered a
necessary and sufficient priority. Indeed, the needs of Nature and of the Earth's ecosystem are
just as important, because of resource availability and the objective of preserving the large
natural cycles [1]. We are now conscious that the Earth system has limited capacities and,
moreover, that some of its natural resources involve very long-term renewal cycles,
becoming, in fact, non-renewable at mankind's scale. This implies that the system stability
which allows the emergence of life is in fact weak and can easily shift toward other states
less suitable to the development of life. It is then necessary to change paradigm and to propose
new production and consumption models. This is one of the major issues of sustainable
development.
Among the different ideas and applications linked to sustainable development,
industrial ecology is very promising for establishing a link between Nature and mankind's
needs by considering industrial systems as ecosystems with objectives compatible with
those of natural ecosystems [2]. We can define industrial ecology as "all practices
useful to reduce industrial pollution"; its objective is to give industrial systems
long-term viability and to transform them into eco-friendly systems, i.e. systems without
negative impacts on the environment. The origin of the concept comes from the metaphor
between natural and industrial ecosystems: reusing matter and waste, hoping to reduce the
need for raw materials extracted from the Earth's resources, and having a major positive
environmental impact [3].
Among the existing concepts developed in the frame of industrial ecology,
industrial symbiosis seems very interesting, because it is based on an exchange rela-
tionship beneficial to all participants. Indeed, an enterprise may not find any use for its
own waste, making it impossible to set up circular economy loops on its own, but another
enterprise may be very interested in this same waste (cheaper, closer and/or more
accessible than its traditional supply) [4]. More than that, this second enterprise may
produce some waste which could be useful to the first one. Even if this concept is
rarely applicable considering only two enterprises, some industrial parks were designed
based on this concept, involving many enterprises to achieve a win-win (called here
symbiotic) relationship between them, which is very profitable to the environment, because
wastes are considered (and used) as raw materials. In fact, this concept transforms
negative environmental externalities into positive ones, such as pollution reduction and
reduced raw material needs.
Nevertheless, beyond the ecological benefit alone, the enterprises involved in the IS
must also find a maximum economic gain. That is why we propose a model which
takes into account simultaneously these two objectives, which can be conflicting:
the improvement of one objective may lead to deterioration of the other [5]. Thus, a
single solution which optimizes all objectives simultaneously may not
exist. To solve our bi-objective maximization problem with logical constraints (for
example, minimum levels of replenishment can be defined according to the needs of
each plant), we propose two different solution methods based on scalarization. Scalarization
is a technique for finding efficient solutions, employed in nearly all exact
methods and many heuristic techniques; it transforms the multi-objective problem
into a single-objective problem, with additional variables and/or parameters. The
single-objective problem is then solved repeatedly in order to find some subset of efficient
solutions of the initial multi-objective problem [6]. The first proposed scalarization
method corresponds to a linear scalarization of our mathematical problem and the
second one to the classical ε-constraint method [7]. The well-known and popular ε-
constraint method consists in retaining one objective and transforming the other
objective(s) into constraints [8, 9]. The linear scalarization consists in a convex com-
bination (i.e. a linear weighted sum) of all the objectives [10]. It is well known that
if all the weight parameters are strictly positive, an optimal
solution of the combination is efficient (but efficient solutions in the interior of the
convex hull of the set of non-dominated points in criterion space cannot be found).
Then, based on these two methods, we develop a numerical study based on a real
industrial park located in China.
The rest of this article is structured as follows. After examining more deeply what an
industrial symbiosis is and which assumptions we consider, we present in the third part the
bi-objective maximization problem model, before proposing methods to solve it and
analyzing the obtained results. Finally, we conclude and explore some potential
future works.
1056 S. Hennequin et al.
[Fig. 1. The industrial symbiosis: 1. construction materials plant 1, 2. construction materials plant 2, 3. fertilizer plant, 6. secondary aluminum plant, 7. aluminum plant 1; the exchanged products include heat, alumina, liquid aluminum, desulfurized gypsum, aluminum alloy, aluminum waste, nitamine and slag.]
Now that we have defined the problem, with its internal and external exchanges and the
limits we consider, we can develop the corresponding mathematical model, which is more
precisely a bi-objective maximization problem.
Table 1. Notations
i ∈ {1, 2, 3, ..., N}: the N different studied enterprises
k ∈ {1, 2, 3, ..., K}: the K different involved product types
R^k(i): requirement of enterprise i in k type product (constant)
T^k(i): threshold quantity for a k type product acceptable for enterprise i
Wa^k(i): waste produced by enterprise i of k type product (constant)
G(i): economic profit for the enterprise i
S_P(i): total selling price of enterprise i outputs (commercialized finished products excepted)
C_T(i): total cost for inputs and outputs for enterprise i
C_{T,in}(i) & C_{T,out}(i): total cost for inputs & outputs for enterprise i
S^k_{ext}(i): selling price of a type k output for enterprise i outside the IS
S^k_{int,i}(j): selling price of a type k output for enterprise i to enterprise j inside the IS
(continued)
Table 1. (continued)
a^k_{ext,in}(i) & a^k_{ext,out}(i): amount of inputs & outputs of type k exchanged by enterprise i outside the IS
a^k_{int,j}(i): amount of outputs of type k transferred from enterprise j to enterprise i
C^k_{ext,in}(i): cost of k type product imported by enterprise i from outside the IS
C^k_{int,j}(i): cost of k type product transferred by enterprise i from enterprise j
C^k_{trans,in/out}(i): cost of input (resp. output) transportation of k type product exchanged by enterprise i with outside
C^k_{env,in/out}(i): environmental cost of input (resp. output) of k type product exchanged by enterprise i with outside
C^k_{soc,in/out}(i): social cost of input (resp. output) of k type product exchanged by enterprise i with outside
C^k_{trait,out}(i): cost of output treatment of k type product exchanged by enterprise i with outside
C^k_{trans,j}(i), C^k_{trait,j}(i), C^k_{env,j}(i) and C^k_{soc,j}(i): transportation, treatment, environmental and social costs of k type product transferred by enterprise i from enterprise j
C_{T,in}(i) = Σ_{k=1}^{K} Σ_{j=1, j≠i}^{N} [ a^k_{ext,in}(i) C^k_1(i) + a^k_{int,j}(i) C^k_{1,j}(i) ],

R^k(i) = a^k_{ext,in}(i) + Σ_{j=1, j≠i}^{N} a^k_{int,j}(i),   (1)

Wa^k(i) = a^k_{ext,out}(i) + Σ_{j=1, j≠i}^{N} a^k_{int,i}(j),   (2)
The logical constraint (3) verifies that industry j is able to supply at least the
threshold T^k(i), so that an exchange between enterprises i and j can take place. Indeed, a
minimum replenishment level, denoted T^k(i), is defined according to the needs of
plant i, to represent the fact that if firm j cannot supply a sufficient quantity of product k
to firm i, but only a few products k, the exchange will not be interesting for firm i.
Now we propose the mathematical model of our problem with its
internal and external exchanges. It focuses on maximizing the profit of the symbiotic
flow and the economic profit of all enterprises in the IS, defined respectively as:

F_1(a) = Σ_{k=1}^{K} Σ_{i=1}^{N} [ Σ_{j=1, j≠i}^{N} a^k_{int,j}(i) − (a^k_{ext,in}(i) + a^k_{ext,out}(i)) ],

F_2(a) = Σ_{i=1}^{N} G(i),

where the variable a = (a^k_{int,j}(i), a^k_{ext,in}(i), a^k_{ext,out}(i))_{i,j=1,...,N; k=1,...,K}.
Finally, the bi-objective maximization problem is given by:

max (F_1(a), F_2(a))   s.t.   a ∈ R_+^{KN^2 + 2KN}, (1), (2) and (3).   (4)
We see that if y^k_j(i) = 1, then T^k(i) ≤ a^k_{int,j}(i) ≤ R^k(i); otherwise, y^k_j(i) = 0.
In this case, problem (4) can be reformulated as the following problem:

max F(a, y) := (F_1(a), F_2(a))   s.t.   a ∈ R_+^{KN^2 + 2KN}, y ∈ {0, 1}^{KN^2}, (1), (2), (5).   (6)

It is worth noting that the numbers of continuous variables, binary variables and linear
constraints are, respectively, KN^2 + 2KN, KN^2 and 2KN + 2KN^2.
Next, we propose two approaches, based on the linear scalarization and the ε-
constraint method, in order to scalarize (6) into a single-objective optimization problem
[7–10].
Solution method 1: linear scalarization. Let us denote by w_1 and w_2 the weights of the objective functions F_1 and F_2, respectively. Assume that w_1 > 0, w_2 > 0 and w_1 + w_2 = 1. In this case, a scalarization of (6) is defined as:

max ( w_1 F_1(a) + w_2 F_2(a) )   s.t.   a ∈ R_+^{KN² + 2KN}, y ∈ {0,1}^{KN²}, (1), (2), (5).   (7)
and

max F_2(a)   s.t.   F_1(a) ≥ −ε_2, a ∈ R_+^{KN² + 2KN}, y ∈ {0,1}^{KN²}, (1), (2), (5).   (9)
Note that the three resulting problems (7), (8) and (9) take the form of mixed-integer linear programs. In what follows, we propose a numerical application of our solutions based on a real industrial park located in China.
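The mechanics of the two scalarizations can be illustrated on a toy finite bi-objective maximization problem (the candidate points below are made up; the real problems (7)–(9) are MILPs solved by a solver):

```python
# Toy illustration of the two scalarizations of max (F1(x), F2(x))
# over a small, made-up finite set of candidate points (F1, F2).

points = [(-300, 90000), (-125, 71264.9), (-50, 40000), (0, 10000)]

def linear_scalarization(pts, w1):
    """Maximize w1*F1 + w2*F2 with w1 + w2 = 1."""
    w2 = 1 - w1
    return max(pts, key=lambda p: w1 * p[0] + w2 * p[1])

def eps_constraint(pts, eps):
    """Maximize F2 subject to F1 >= eps (the pattern of problem (9))."""
    feasible = [p for p in pts if p[0] >= eps]
    return max(feasible, key=lambda p: p[1]) if feasible else None

print(linear_scalarization(points, 0.5))
print(eps_constraint(points, -130))
print(eps_constraint(points, 10))  # no feasible point -> None
```

Different weights or bounds pick out different Pareto points, which is exactly how Tables 7–9 below are generated for the full MILP.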
5 Numerical Experiment
In this section, we conduct an experiment on the real case of the Chinese Qijiang industrial eco-park located in Chongqing (see Fig. 1). Suppose that suppliers outside the IS have an unlimited production capacity. The costs, the initial input needs, the production capacity of each enterprise in the IS for each type, and the minimum input quantities accepted by enterprise i to choose its supplier are given in the following tables. Here N = 8 and K = 10; the indexes of enterprises and product types are given, respectively, in Fig. 1 and Tables 2, 3, 4, 5 and 6.

Table 2. The costs of types in the industrial symbioses ecosystem.

| Type k | C^k_ext,in / S^k_ext | C^k_int,j / S^k_int,i | C^k_trans,in / C^k_trans,out | C^k_env,in / C^k_env,out | C^k_soc,in / C^k_soc,out | C^k_trait,out |
|---|---|---|---|---|---|---|
| 1. Alumina | 40000 $/T | 30000 $/T | 25 $/T | 1.2 $/T | 1.5 $/T | 0 |
| 2. Liquid aluminum | 2000 $/T | 2000 $/T | 70 $/T | 1.2 $/T | 1.5 $/T | 0 |
| 3. Nitamine | 300 $/U | 0 | 200 $/U | 1 $/U | 1 $/U | 0 |
| 4. Slag | 10 $/T | 10 $/T | 80 $/T | 9 $/T | 9 $/T | 3 $/T |
| 5. Desulfurized gypsum | 20 $/T | 0 | 50 $/T | 9 $/T | 11 $/T | 0 |
| 6. Carbon | 92 $/T | – | 50 $/T | 9 $/T | 11 $/T | – |
| 7. Aluminum alloy | 3000 $/T | 3000 $/T | 100 $/T | 1.2 $/T | 1.5 $/T | 0 |
| 8. Aluminum waste | 0 | 0 | 30 $/T | 1 $/T | 1 $/T | 1.3 $/T |
| 9. Heat | 50000 $ | 0 | 0 | 0 | 0 | 0 |
| 10. Building material | 200 $/T | – | 60 $/T | 1 $/T | 1 $/T | 0 |

(Industrial Symbioses: Bi-objective Model and Solution Method, S. Hennequin et al.)
Table 3. The transportation/treatment costs C^k_trans,j(i)/C^k_trait,j(i) of types between two enterprises (in $/T). The receiving enterprises are the electrolytic aluminum plant, construction material plants 1 & 2, the fertilizer plant, the secondary aluminum plant, and aluminum plants 1 and 2.

| Sending enterprise | Product | Costs (trans./trait.) |
|---|---|---|
| Electrolytic aluminum plant | Liquid aluminum | 30/0, 30/0 |
| Power plant | Alumina | 10/0 |
| Power plant | Slag | 10/3 |
| Power plant | Heat | 0/0, 0/0, 0/0 |
| Power plant | Desulfurized gypsum | 10/0 |
| Fertilizer plant | Desulfurized gypsum | 10/0 |
| Secondary aluminum plant | Aluminum alloy | 50/0, 50/0 |
| Aluminum plant 1 | Aluminum waste | 10/1 |
| Aluminum plant 2 | Aluminum waste | 10/1 |
Table 4. The environmental/social costs C^k_env,j(i)/C^k_soc,j(i) of types between two enterprises (in $/T). The receiving enterprises are as in Table 3.

| Sending enterprise | Product | Costs (env./soc.) |
|---|---|---|
| Electrolytic aluminum plant | Liquid aluminum | 0.1/0.15, 0.1/0.15 |
| Power plant | Alumina | 0.1/0.15 |
| Power plant | Slag | 0/0 |
| Power plant | Heat | 0/0, 0/0, 0/0 |
| Power plant | Desulfurized gypsum | 0.1/0.15 |
| Fertilizer plant | Desulfurized gypsum | 0.1/0.15 |
| Secondary aluminum plant | Aluminum alloy | 0.1/0.15, 0.1/0.15 |
| Aluminum plant 1 | Aluminum waste | 0.1/0.15 |
| Aluminum plant 2 | Aluminum waste | 0.1/0.15 |
Table 7. Objective values F1(a*) and F2(a*) at the optimal solution a* to (7) and the CPU time (in seconds) with the different weights w1 and w2. The table also lists the nonzero exchange quantities a^k_int,j(i) at the optimal solution for w1 = w2 = 0.5.

| w1 | w2 | F1(a*) | F2(a*) | CPU |
|---|---|---|---|---|
| 0.2 | 0.8 | −125 | 71264.9 | 0.012 |
| 0.4 | 0.6 | −125 | 71264.9 | 0.010 |
| 0.5 | 0.5 | −125 | 71264.9 | 0.032 |
| 0.6 | 0.4 | −125 | 71264.9 | 0.020 |
| 0.8 | 0.2 | −125 | 71264.9 | 0.010 |
We observe from Tables 7, 8 and 9 that the two approaches, based on linear scalarization and the ε-constraint method, are very efficient for the proposed model. In particular, the values of the objective functions at the optimal solution a* are the same for the different parameter values in both the linear scalarization and the ε-constraint method. Both run very fast, with a CPU time below 0.04 s in all cases. The optimal solutions a* obtained by the two methods are slightly different: there are 27 exchanges between enterprises for the linear scalarization and 26 for the ε-constraint method. However, the total quantities a^k_int,j(i) exchanged between enterprises in the IS are the same for both methods (= 234), as are the total quantities a^k_ext,in/out(i) exchanged with the outside of the IS (= 359).
Table 8. Objective values F1(a*) and F2(a*) at the optimal solution a* to (8) and the CPU time (in seconds) with the different values of ε1. The table also lists the nonzero exchange quantities a^k_int,j(i) at the optimal solution for ε1 = −70000. Here the symbol "*" means that no feasible point is found.

| ε1 | F1(a*) | F2(a*) | CPU |
|---|---|---|---|
| −5000 | −125 | 71264.9 | 0.008 |
| −10000 | −125 | 71264.9 | 0.008 |
| −50000 | −125 | 71264.9 | 0.007 |
| −70000 | −125 | 71264.9 | 0.007 |
| −80000 | * | * | * |
| −100000 | * | * | * |
Table 9. Objective values F1(a*) and F2(a*) at the optimal solution a* to (9) and the CPU time (in seconds) with the different values of ε2. Here the optimal solution to (9) when ε2 = 130 is the same as the one in Table 8.

| ε2 | F1(a*) | F2(a*) | CPU |
|---|---|---|---|
| 100 | * | * | * |
| 130 | −125 | 71264.9 | 0.007 |
| 150 | −125 | 71264.9 | 0.007 |
| 200 | −125 | 71264.9 | 0.017 |
6 Conclusion
In this paper, we first modeled the industrial symbiosis (IS) using a bi-objective mathematical model in which the constraints are mainly linked to the amounts of products that can be exchanged inside the IS, considering the involved enterprises' requirements and their waste production. It can be expressed as a bi-objective maximization problem with logical constraints. Secondly, we reformulated this problem by converting these logical constraints into linear constraints and proposed two solution methods, based on linear scalarization and the ε-constraint method, for the resulting problem. Finally, we used values inspired by the real case of the Chinese Qijiang eco-park located in Chongqing to solve the problem numerically with the two proposed resolution methods, and we compared the results.
As a perspective, we could take into account the limited capacities that are not considered here. More specifically: the transportation means (limited in load/volume, but also in tour frequency), the treatment devices/installations (which may severely reduce the availability or profitability of a specific waste if the process is expensive and/or slow) and, of course, the enterprises themselves, which may produce waste at a rate incompatible with other enterprises' needs. Adding these time and capacity constraints would improve the model's accuracy.
References
1. Barnosky, A.D., Hadly, E.A., Bascompte, J., Berlow, E.L., Brown, J.H., Fortelius, M., Getz,
W.M., Harte, J., Hastings, A., Marquet, P.A., Martinez, N.D., Mooers, A., Roopnarine, P.,
Vermeij, G., Williams, J.W., Gillespie, R., Kitzes, J., Marshall, C., Matzke, N., Mindell, D.
P., Revilla, E., Smith, A.B.: Approaching a state shift in Earth’s biosphere. Nature 486, 52–
58 (2012)
2. Frosch, R.: Industrial ecology: a philosophical introduction. Proc. Natl. Acad. Sci. USA 89,
800–803 (1992)
3. Ayres, R.U.: Industrial Metabolism in Technology and Environment (1989)
4. Trevisan, M., Nascimento, L.F., Madruga, L.R.D.R.G., Mülling, D.N., Figueiró, P.S.,
Bossle, M.B.: Industrial ecology, industrial symbiosis and industrial Eco-parc: to know to
apply. Syst. Manag. 11, 204–215 (2016)
5. Wiecek, M.M., Ehrgott, M., Engau, A.: Continuous multiobjective programming. In:
Multiple Criteria Decision Analysis, pp. 739–815. Springer, New York (2016)
6. Ehrgott, M.: A discussion of scalarization techniques for multiple objective integer
programming. Ann. Oper. Res. 147(1), 343–360 (2006)
7. Hwang, C.-L., Masud, A.S.M.: Multiple Objective Decision Making, Methods and
Applications: A State-of-the-Art Survey. Springer-Verlag (1979)
8. Mavrotas, G.: Effective implementation of the ε-constraint method in multi-objective mathematical programming problems. Appl. Math. Comput. 213, 455–465 (2009)
9. Miettinen, K.: Nonlinear Multiobjective Optimization. Springer (1999)
10. Holzmann, T., Smith, J.C.: Solving discrete multi-objective optimization problems using
modified augmented weighted Tchebychev scalarizations. Eur. J. Oper. Res. 271, 436–449
(2018)
11. Chertow, M.R.: Industrial symbiosis: literature and taxonomy. Ann. Rev. Energy Environ.
25, 313–337 (2000)
12. Li, B., Xiang, P., Hu, M., Zhang, C., Dong, L.: The vulnerability of industrial symbiosis: a
case study of Qijiang Industrial Park China. J. Cleaner Prod. 157, 267–277 (2017)
Intelligent Solution System Towards Parts
Logistics Optimization
1 Introduction
Parts logistics optimization, which aims at minimizing the cost of transporting all required parts to their destinations, is vital to modern industrial activities for all manufacturing enterprises; it is a key link of supply chain management (SCM) and has long been studied [17]. In SCM research, supply chain integration is considered a key factor in achieving improvement [15,16] and has already become a powerful tool in real-world economic activities [1,14].

(Jointly supported by the Natural Sciences Foundation of China under Grant No. 61673119, the key project of Shanghai Science & Technology No. 16JC1420402, the Shanghai Municipal Science and Technology Major Project No. 2018SHZDZX01 and ZHANGJIANG LAB, and the Shanghai Committee of Science and Technology Grant No. 14DZ1118700.)

© Springer Nature Switzerland AG 2020. H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 1067–1077, 2020. https://doi.org/10.1007/978-3-030-21803-4_105 (Y. Huang et al.)
The main component of the parts logistics problem is the vehicle routing problem (VRP), introduced by [2,8]: designing optimal delivery and collection routes from depots to a number of cities or customers, subject to side constraints [12]. Several VRP variants have been studied. The VRP with multiple depots, from which vehicles can choose their starting point, is studied in [3]. A replenishment concept for VRPs is introduced in [6], where vehicles can replenish capacity at several stations. VRPs with time-window constraints (VRPTW) are surveyed in [5]. Multiple use of vehicles (VRPMU) is considered in [7]. The VRP with 2-dimensional loading constraints (2L-CVRP) is considered in [11], which uses branch-and-bound for checking loading feasibility; Tabu Search is utilized to solve the 2L-CVRP in [9]. However, the above works only consider individual or partial combinations of these constraints. Because of this, SAIC Motor is still using an old-fashioned manual planning scheme.
This work incorporates the above works and hereby proposes the 2-Dimensional Loading Capacitated Multi-Depot Heterogeneous Vehicle Routing Problem with Time Windows.
2 Problem Formulation
2.1 Problem Scenario Description
Fig. 1. An example route: a truck departs from a vehicle yard, visits suppliers, warehouses and the hub in an intermittent way, and finally ends its daily task by returning to the truck yard.
A typical parts logistics setting is briefly described as follows: the supply chain system requests parts from various suppliers to be delivered to plant warehouses for assembly. Herein, the basic transportation unit is called a shipment. Each shipment is composed of a batch of bins of the same type, which should be delivered from a specific supplier to a specific plant warehouse. These transportations are carried out by the logistics department with adequate trucks of different models, starting from and ending at a specific vehicle yard. Pickup and loading at the plant warehouse occur at its limited number of docks, each of which allows several trucks to pick up/load within specific given time windows. These loading and unloading activities take fixed time lengths to complete. A hub is also considered in this problem, which is able to receive scattered shipments to be consolidated. Figure 1 illustrates a typical route of a truck. The 3D packing problem is simplified to a 2D packing problem, as the bins are initially stacked into columns following a stacking rule or with pallets. The information of a shipment contains: supplier, plant warehouse, bin quantity, bin size (length, width and height), stacking layer limitation, pallet requirement, pickup time interval and delivery time interval.
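The shipment record just listed can be captured in a simple data structure (a sketch; the field names and example values are our own, not from the system):

```python
# A shipment record as described above; field names are illustrative.
from dataclasses import dataclass

@dataclass
class Shipment:
    supplier: str
    warehouse: str
    bin_quantity: int
    bin_length: float            # bin size
    bin_width: float
    bin_height: float
    stacking_layer_limit: int
    pallet_required: bool
    pickup_interval: tuple       # (earliest, latest) pickup times
    delivery_interval: tuple     # (earliest, latest) delivery times

s = Shipment("supplier_A", "warehouse_1", 24, 600.0, 400.0, 300.0, 3, False,
             (8.0, 10.0), (13.0, 15.0))
print(s.bin_quantity)  # 24
```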
The complication of this problem owes to the numerous constraints. The main constraints are the time-window (TW) and loading constraints:
A1. Suppliers and plants have working time windows, so the whole time interval of a truck's visit, including loading or unloading, must be contained in this working time window.
A2. Because of the limited number of docks of a supplier or warehouse, only a limited number of trucks may visit a specific depot at the same time. More trucks than this number leads to queueing.
A3. Each shipment has its own pickup and delivery time intervals.
B1. Bin stacking must follow the given rules: bins of the same type from the same supplier can be stacked together, and some specific types must be loaded on pallets, which can themselves be stacked. The number of stacking layers is limited.
B2. The bins must be contained within the loading surface of the truck, and no two bins may overlap.
B3. Because the shipments are handled by forklift, a loading-sequence constraint must be considered [10]: when a location is visited, the bins of the corresponding lot must be unloadable or loadable through a sequence of straight movements parallel to the width of the loading area.
There are additional essential constraints. According to the company's rules, shipments from different cities cannot be loaded on a single truck; the docks of some suppliers restrict the truck lengths; some suppliers restrict the number of visits; a very few suppliers request to be the first site visited on a route, and a few warehouses request to be the last. We handle these with judgement statements in the code and will not discuss them further.
In overview, the following decisions define a feasible plan:
– Select which warehouses and suppliers a truck visits;
– Plan the route of this truck;
– Pack the specific bins subject to the constraints.
We use the total mileage as the cost function, and our objective is to minimize it.
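The cost of one truck's plan is just the mileage summed over consecutive legs of its route. A minimal sketch (coordinates and the Euclidean distance are illustrative assumptions; the real system would use road distances):

```python
# Sketch of the cost function: total mileage of one truck's route,
# summed over consecutive visits (vehicle yard -> stops -> yard).
import math

def total_mileage(route, coords):
    """route: list of site names starting and ending at the yard."""
    dist = lambda a, b: math.dist(coords[a], coords[b])
    return sum(dist(route[i], route[i + 1]) for i in range(len(route) - 1))

coords = {"yard": (0, 0), "supplier1": (3, 4), "hub": (3, 0)}
route = ["yard", "supplier1", "hub", "yard"]
print(total_mileage(route, coords))  # 5 + 4 + 3 = 12.0
```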
3 Solution
Given an X, the 2L-VRPTW solver generates a route and a packing scheme for each truck satisfying A1, A3 and the B-series constraints. The algorithm goes as follows: an asynchronous route-search procedure searches the whole route space, which is built as a tree. A branch is pruned if it violates A1 or A3, or if its partial mileage is larger than that of an already-found feasible leaf node. When a leaf node (representing a complete route) is reached, its feasibility with respect to the B-series constraints is judged. This packing procedure is conducted in the following steps: first, stack the bins and pallets into columns; second, pre-judge whether packing at this site is possible using a preassigned threshold on the ratio of the covered area of the columns to the total area. If the pre-judgement is passed, the 2-D Packing Problem (2PP) is solved by search, following a heuristic function with respect to the wasted area, convexity and covered area. We reduce the search space by limiting the search width.
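The pruned tree search described above can be sketched as a depth-first enumeration of routes, cutting any branch whose partial mileage already exceeds the best complete route found so far. This is a minimal illustration on made-up coordinates; the time-window and packing checks are stubbed out as a `feasible` predicate applied at leaf nodes, as in the text:

```python
# Sketch of the branch-pruned route search: DFS over partial routes,
# pruning branches worse than the best feasible leaf found so far.
import math

def search_routes(sites, coords, feasible=lambda route: True):
    best = {"route": None, "mileage": math.inf}
    dist = lambda a, b: math.dist(coords[a], coords[b])

    def dfs(route, mileage, remaining):
        if mileage >= best["mileage"]:          # prune: cannot beat best leaf
            return
        if not remaining:                       # leaf: a complete route
            full = mileage + dist(route[-1], "yard")
            if full < best["mileage"] and feasible(route):  # B-series check stub
                best["route"], best["mileage"] = route + ["yard"], full
            return
        for s in list(remaining):
            dfs(route + [s], mileage + dist(route[-1], s), remaining - {s})

    dfs(["yard"], 0.0, set(sites))
    return best["route"], best["mileage"]

coords = {"yard": (0, 0), "s1": (0, 5), "s2": (5, 5), "w1": (5, 0)}
route, mileage = search_routes(["s1", "s2", "w1"], coords)
print(route, mileage)  # the best tour around the square has mileage 20.0
```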
Table 2. Constraints

Working TW (A1). For ∀i ∈ {1, 2, …, N}, ∀k ∈ {0, 1, …, n_i}:
y_{i0} = y_{i n_i} = y_0, t_{i0} = 0;
t_{i(k+1)} = t_{ik} + T(y_{ik}, y_{i(k+1)}) + w_{i(k+1)};
t_{ik} + T(y_{ik}, y_{i(k+1)}) ∈ TW(y_{i(k+1)}).

Queue and dock (A2). For ∀i ∈ {1, 2, …, N}, ∀k ∈ {0, 1, …, n_i}:
w_{i(k+1)} = TH(y_{i(k+1)}) + ( Ψ( DC(y_{i(k+1)}), { t_{i′(k′+1)} | t_{i′k′} + T(y_{i′k′}, y_{i′(k′+1)}) ≤ t_{ik} + T(y_{ik}, y_{i(k+1)}) } ) − ( t_{ik} + T(y_{ik}, y_{i(k+1)}) ) ).

Shipment TW (A3). For ∀j ∈ {0, 1, …, N_S}:
Σ_i t_{i p_j} x_{ij} ∈ TP( Σ_i y_{i p_j} x_{ij} ), Σ_i t_{i d_j} x_{ij} ∈ TD( Σ_i y_{i d_j} x_{ij} ), p_j < d_j.

Stacking rule (B1). For ∀j ∈ {0, 1, …, N_S}, ∀θ:
Σ_θ l_{jθ} = N^B_j, l_{jθ} ≤ L_j;
for ∀j, θ, and γ with P_j = 1:
0 < u_{jθγ} ≤ W^P − W_j, 0 < v_{jθγ} ≤ L^P − L_j;
N^B_j = Σ_θ Σ_γ l_{jθγ}, l_{jθγ} ≤ L_j;
for ∀γ ≠ γ′, one of the following must hold:
u_{jθγ′} ≥ u_{jθγ} + W_j, u_{jθγ} ≥ u_{jθγ′} + W_j,
v_{jθγ′} ≥ v_{jθγ} + L_j, v_{jθγ} ≥ v_{jθγ′} + L_j.

2-D loading (B2). For ∀j, j′ ∈ {0, 1, …, N_S}, ∀θ, θ′:
0 < u_{jθ} ≤ Σ_i x_{ij} W^V_i − W_j, 0 < v_{jθ} ≤ Σ_i x_{ij} L^V_i − L_j;
one of the following must hold:
Σ_i x_{ij′} u_{j′θ′} ≥ Σ_i x_{ij} ( u_{jθ} + W_j(1 − P_j) + P_j W^P ),
Σ_i x_{ij} u_{jθ} ≥ Σ_i x_{ij′} ( u_{j′θ′} + W_{j′}(1 − P_{j′}) + P_{j′} W^P ),
Σ_i x_{ij′} v_{j′θ′} ≥ Σ_i x_{ij} ( v_{jθ} + L_j(1 − P_j) + P_j L^P ),
Σ_i x_{ij} v_{jθ} ≥ Σ_i x_{ij′} ( v_{j′θ′} + L_{j′}(1 − P_{j′}) + P_{j′} L^P ).

Loading sequence (B3). For all j ≠ j′ with d_j < d_{j′} and p_j ≤ p_{j′}, one of the following must hold:
Σ_i x_{ij} u_{jθ} ≥ Σ_i x_{ij′} ( u_{j′θ′} + W_{j′}(1 − P_{j′}) + P_{j′} W^P ),
Σ_i x_{ij′} v_{j′θ′} ≥ Σ_i x_{ij} ( v_{jθ} + L_j(1 − P_j) + P_j L^P ),
Σ_i x_{ij} v_{jθ} ≥ Σ_i x_{ij′} ( v_{j′θ′} + L_{j′}(1 − P_{j′}) + P_{j′} L^P ).

Hub shipments. For ∀j ≠ j′: 0 ≤ [ Σ_i t_{i p_j} x_{ij} − Σ_i t_{i p_{j′}} x_{ij′} ]_{h_{jj′}}.

Shipments must be loaded. For ∀j ∈ {0, 1, …, N_S}: Σ_{i=1}^{N} x_{ij} = 1.
3.3 Initialization
The optimization process fetches an initial solution and then bundles all the shipments. At each TS step, all neighborhood solutions are evaluated asynchronously, and the reverse of each bundle move is declared tabu. We also keep track of the best solution found so far. The overall algorithm is shown in Algorithm 1.
Algorithm 1 TS Optimization
Require: Initial scheme X0, bundle break threshold B
1: Bundle all shipments
2: Initialize tabu list T
3: Set X = X0, best solution keeper X*
4: repeat
5:   Evaluate all non-tabu neighbor solutions {Xk} of X, where (X → Xk) ∉ T
6:   Choose X′ ∈ {Xk} with the least mileage
7:   if mileage of X′ ≤ mileage of X* then
8:     X* ← X′
9:   end if
10:  T = T ∪ {(X′ → X)}
11:  X ← X′
12: until given computation resources reached
13: return X*
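Algorithm 1 can be sketched as the following skeleton. The neighborhood and mileage function below are toy stand-ins (a 1D integer move, a quadratic cost) for the route/packing evaluation described above; the tabu bookkeeping mirrors lines 5 and 10 of the algorithm:

```python
# Minimal tabu-search skeleton mirroring Algorithm 1, on a toy
# neighborhood. Moves are pairs (from, to); the reverse of each
# accepted move is declared tabu.

def tabu_search(x0, mileage, neighbors, max_iters=50):
    tabu = set()                 # forbidden moves (x -> xk)
    x, x_best = x0, x0
    for _ in range(max_iters):   # "until given computation resources reached"
        candidates = [xk for xk in neighbors(x) if (x, xk) not in tabu]
        if not candidates:
            break
        x_next = min(candidates, key=mileage)   # least-mileage neighbor
        if mileage(x_next) <= mileage(x_best):
            x_best = x_next
        tabu.add((x_next, x))    # forbid the reverse move
        x = x_next
    return x_best

# Toy objective with its minimum at x = 7.
mileage = lambda x: (x - 7) ** 2
neighbors = lambda x: [x - 1, x + 1]
print(tabu_search(0, mileage, neighbors))  # 7
```

Note that, as in Algorithm 1, the search keeps moving even when all neighbors are worse; the tabu list prevents it from immediately undoing a move, which is what lets it escape local minima.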
3.5 Post-optimization
We tested on a data set of 311 shipments obtained from the SAIC logistics division, containing 54 variants of bins, along with the data of 45 suppliers and 8 plant warehouses. We ran the test on an Intel Core i7 CPU, 2.70 GHz × 4. We ran TS for 5000 s and compared TS with (TS-WB) and without (TS-NB) shipment bundling. Table 3 shows the results.
Fig. 4. Results of the optimization process. (a) shows the mileage descent with and without bundling. (b) shows the robustness across 24 hyperparameter combinations (all with bundling); the x-axis indicates the ranking of the hyperparameter combination in descending order, and the y-axis indicates the total mileage of each combination.
In Table 3, we can see that post-optimization has only a limited effect. This is because optimization and post-optimization compete for time-window occupation. However, post-optimization is necessary: when a massive number of shipments is required between two very close stations, it is always better to arrange one truck to carry these shipments back and forth without returning to the truck yard, and such a circumstance is difficult to take into account in the optimization process. The mileage curve with respect to time is shown in Fig. 4(a). We can see that the bundling technique, which leads to larger TS steps, makes it easier to escape local minima and yields better convergence. The robustness experiment in Fig. 4(b) shows that the TS algorithm is very robust relative to the initial mileage scale.
SAIC Logistics Management System (SPRUCE): Our research is assisted by SAIC Motor and adopted into their auto parts logistics management system. The Shanghai Auto Incorporate Company (SAIC) Motor Co. Limited is the largest automaker in China and has its own supply chain system, which contains more than 500 suppliers and 4 plants and delivers more than 10000 shipments daily. The algorithm above is utilized by SAIC Motor to build up a parts logistics scheme management system consisting of several modules, illustrated in Fig. 5. The Data Maintenance Module maintains station and truck states and packs parts into bins. The Shipment Management Module accepts new shipments and transfers them to the Global Optimization Module, which may in turn use the Manual Planning Module to handle temporary and very important shipments. The results are eventually processed by the Graph Generator into a routing map, stowage plan, time plan and schedule of truck resources. The whole system can handle 2000 shipments in about 10 to 15 min.
5 Conclusion
In this paper, we established a parts logistics optimization model, mathematically a 2-Dimensional Loading Capacitated Multi-Depot Heterogeneous Vehicle Routing Problem with Time Windows, and presented algorithms for its systematic solution, using TS accelerated by pruning methods and shipment bundling techniques. This solution has proved effective for the optimization problem and has been utilized by SAIC to establish its parts logistics scheme management system. We will concentrate on further accelerating the computation: for instance, a deep Q-learning method is being applied to the 2PP, and another effort is to incorporate genetic algorithms using parallel computing.
References
1. Akintoye, A., McIntosh, G., Fitzgerald, E.: A survey of supply chain collaboration
and management in the UK construction industry. Eur. J. Purch. Supply Manag.
6, 159–168 (2000)
2. Anbuudayasankar, S.P., Ganesh, K., Mohapatra, S.: Survey of methodologies for
TSP and VRP. In: Models for Practical Routing Problems in Logistics, pp. 11–42.
Springer International Publishing, Cham (2014)
3. Angelelli, E., Grazia Speranza, M.: The periodic vehicle routing problem with
intermediate facilities. Eur. J. Oper. Res. 137(2), 233–247 (2002)
4. Bellman, R.: Dynamic programming treatment of the travelling salesman problem.
J. ACM (JACM) 1, 61–63 (1962)
5. Cordeau, J.F., Desaulniers, G., Desrosiers, J., Solomon, M.M., Soumis, F.: VRP
with Time Windows (1999)
6. Crevier, B., Cordeau, J.F., Laporte, G.: The multi-depot vehicle routing problem
with inter-depot routes. Eur. J. Oper. Res. 176(2), 756–773 (2007)
7. Taillard, É.D., Laporte, G., Gendreau, M.: Vehicle routeing with multiple use of
vehicles. J. Oper. Res. Soc. 47(8), 1065–1070 (1996)
8. Dantzig, G.B., Ramser, J.H.: The truck dispatching problem. Manag. Sci. 6(1),
80–91 (1959)
9. Gendreau, M., Iori, M., Laporte, G., Martello, S.: A Tabu search heuristic for the
vehicle routing problem with two-dimensional loading constraints. Networks 51(1),
4–18 (2008)
10. Iori, M., Salazar-Gonzalez, J.J., Vigo, D.: An exact approach for the vehicle routing
problem with two-dimensional loading constraints. Transp. Sci. 41, 253–264 (2007)
11. Iori, M., Salazar-González, J.J., Vigo, D.: An exact approach for the vehicle routing
problem with two-dimensional loading constraints. Transp. Sci. 41(2), 253–264
(2007)
12. Laporte, G.: The vehicle routing problem: an overview of exact and approximate
algorithms. Eur. J. Oper. Res. 59(2), 231–247 (1991)
13. Martelot, E.L., Hankin, C.: Fast multi-scale community detection based on local
criteria within a multi-threaded algorithm. Comput. Sci. (2013)
14. Olhager, J., Selldin, E.: Supply chain management survey of Swedish manufactur-
ing firms. Int. J. Prod. Econ. 89(3), 353–361 (2004)
15. Romano, P.: Co-ordination and integration mechanisms to manage logistics pro-
cesses across supply networks. J. Purch. Supply Manag. 9(3), 119–134 (2003)
16. Tan, K., Kannan, V.R., Handfield, R.B., Ghosh, S.: Supply chain management: an
empirical study of its impact on performance. Int. J. Oper. Prod. Manag. 19(10),
1034–1052 (1999)
17. van der Vaart, T., van Donk, D.P.: A critical review of survey-based research in
supply chain integration. Int. J. Prod. Econ. 111(1), 42–55 (2008)
Optimal Air Traffic Flow Management
with Carbon Emissions Considerations
1 Introduction
emissions trading system, where all airlines in Europe are required to monitor and
report their emissions. Within this emissions trading system, each airline is given an
annual tradeable emission level for its flights [3]. As a result, in addition to minimizing
costs, airlines are concerned about minimizing their emissions.
Fuel burning is the source of transportation emissions. Among the components emitted during fuel burning, carbon dioxide (CO2) has received significant focus in transportation-related research. Burning one liter of aviation fuel emits around 2.527 kg of CO2 [4]. Fuel burning is related to speed: according to [5], fuel consumption decreases as the speed increases up to a certain speed level, after which it starts to increase with speed. The speed level affects the time needed to travel a given distance and thus affects the network capacities and, consequently, the delays. Despite the importance of the aircraft speed level and its effect on CO2 emissions, network models for air traffic flow management have only used the speed to control the network capacities. This paper links the CO2 emissions with the speed level and proposes a bi-objective mixed-integer network model that considers both the total network delays and the CO2 emissions in the air traffic flow management (ATFM) problem. We illustrate the effect of CO2 emissions on the total network delays.
This paper is structured as follows. Section 2 presents the relevant work on the air
traffic flow management network models. Section 3 provides the bi-objective model
formulation. Section 4 illustrates the bi-objective model with a numerical example.
Finally, Sect. 5 concludes the paper and provides some suggestions for future work.
2 Literature Review
To the best of the authors' knowledge, the main focus of ATFM network models has been the minimization of total network delay costs, and most ATFM models are single-objective. In addition, CO2 emissions as a function of speed, and their effect on network delays, have not been studied in the literature. This paper aims to fill this gap by introducing a bi-objective mathematical model that helps in studying the trade-off between CO2 emissions and network delays.
In this section, we present a network optimization model for managing flights that takes into consideration airport capacities, en-route sector capacities, delays and CO2 emissions. The model proposed in this work considers emissions during the cruising stage of the flight; due to data unavailability, landing and climbing are assumed to be identical and are ignored in this model. In the upcoming sections, we present the linear approximation of the fuel consumption function used to calculate the CO2 emissions. Then, we provide the model sets, parameters and decision variables. After that, we present the objective functions and the constraints. Finally, we illustrate the solution technique.
[Figure: fuel consumption (L/h) as a function of the true airspeed (km/h).]
In the ATFM model, the aircraft speed is not considered a decision variable; instead, the common decision variable is the time spent in each sector. As a result, the speed is controlled by adjusting the time spent in each sector to travel a known distance. We therefore plot the relation between the fuel consumption and the inverse of the speed (see Fig. 2), which can be multiplied by the distance to obtain the time.
[Fig. 2: fuel consumption (L/h) as a function of the inverse of the airspeed (h/km), with the points P1, Popt and P2 marked.]
The fuel consumption with respect to the inverse of speed can be approximated using two linear functions. The first linear function connects the first point (P1) and the optimal point (Popt), and the second connects the second point (P2) and the optimal point (Popt). The two functions are defined as follows:

F_1(t) = F̄_1 D − S_1 t + S_1 r_1 D,   (1)

F_2(t) = F̄_2 D + S_2 t − S_2 r_2 D,   (2)

where F_1(t) and F_2(t) are the fuel consumption (in liters, L) with respect to the time needed to travel a distance D, using the first or the second linear line, respectively; F̄_1 and F̄_2 are the fuel consumption rates at P1 and P2, respectively (F̄_1 = 25.36225 L/km, F̄_2 = 92.74225 L/km); D is the distance travelled (km); S_1 and S_2 are the slope between P1 and Popt and the slope between P2 and Popt, respectively (S_1 = 14722.2 L/h, S_2 = 43228.5 L/h); t is the time spent (h); and r_1 and r_2 are the speed inverses at P1 and P2, respectively (r_1 = 0.0009 h/km and r_2 = 0.0027 h/km).
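The two linear segments (1)–(2) and the CO2 conversion can be evaluated directly. A sketch using the constants quoted above (the distance D is an illustrative value; note that at t = r_1·D the S_1 terms in (1) cancel, leaving F̄_1·D):

```python
# The two-segment linear approximation (1)-(2) of fuel burned over a
# distance D as a function of travel time t, plus the CO2 factor of
# 2.527 kg per liter quoted in the introduction.

F1_BAR, F2_BAR = 25.36225, 92.74225   # L/km at P1 and P2
S1, S2 = 14722.2, 43228.5             # L/h, slopes toward Popt
R1, R2 = 0.0009, 0.0027               # h/km, speed inverse at P1 and P2

def fuel_segment1(t, D):
    return F1_BAR * D - S1 * t + S1 * R1 * D   # Eq. (1)

def fuel_segment2(t, D):
    return F2_BAR * D + S2 * t - S2 * R2 * D   # Eq. (2)

def co2_kg(liters):
    return 2.527 * liters

D = 100.0           # km travelled (illustrative)
t1 = R1 * D         # travel time at the P1 speed
print(fuel_segment1(t1, D))           # F1_BAR * D = 2536.225 L
print(round(co2_kg(fuel_segment1(t1, D)), 1))  # 6409.0 kg of CO2
```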
3.2 Sets
• F : Set of flights.
• T : Set of time periods.
• K: Set of all airports in the ATFM network.
S. Hamdan et al.
3.4 Decision Variables
• w^f_{j,t}: A binary variable equal to one if flight f has arrived at resource j by time t. In other words, if w^f_{j,t} = 1 at some time period t, then it is equal to one for all later periods.
• R^{f,1}_j and R^{f,2}_j: Decision variables representing the total time spent in resource j by flight f using the first and second linear functions, respectively, as described in Sect. 3.1.
• Y^{f,1}_j and Y^{f,2}_j: Binary variables used to link R^{f,1}_j and R^{f,2}_j to the first and second linear functions, respectively.
min C = C_1 + C_2,   (3)

where
C_1 = Σ_{f∈F} Σ_{t∈T^f_k, k=dest_f} C_a (t − a^f_k)(w^f_{k,t} − w^f_{k,t−1}) − Σ_{f∈F} Σ_{t∈T^f_k, k=orign_f} C_a (t − d^f_k)(w^f_{k,t} − w^f_{k,t−1}),   (3.1)

C_2 = Σ_{f∈F} Σ_{t∈T^f_k, k=orign_f} C_g (t − d^f_k)(w^f_{k,t} − w^f_{k,t−1}),   (3.2)

min E = E_CO2 [ Σ_{f∈F} Σ_{j∈P^f} ( F̄_1 D^f_j Y^{f,1}_j − T_D S_1 R^{f,1}_j + S_1 r_1 D^f_j Y^{f,1}_j ) + Σ_{f∈F} Σ_{j∈P^f} ( F̄_2 D^f_j Y^{f,2}_j + T_D S_2 R^{f,2}_j − S_2 r_2 D^f_j Y^{f,2}_j ) ]   (4)
Subject to:

Σ_{f∈F: k=orign_f} (w^f_{k,t} − w^f_{k,t−1}) ≤ D_k(t),   ∀k ∈ K, t ∈ T,   (5)

Σ_{f∈F: k=dest_f} (w^f_{k,t} − w^f_{k,t−1}) ≤ A_k(t),   ∀k ∈ K, t ∈ T,   (6)

Σ_{f∈F: j∈P^f, j′∈P^f(j+1)} (w^f_{j,t} − w^f_{j′,t}) ≤ S_j(t),   ∀j ∈ P^f, t ∈ T,   (7)

w^f_{orign_f, T̄^f_{orign_f}} = w^f_{dest_f, T̄^f_{dest_f}},   ∀f ∈ F,   (8)

w^f_{j,t−1} − w^f_{j,t} ≤ 0,   ∀f ∈ F, j ∈ K ∪ P^f, t ∈ T^f_j,   (9)

Σ_{t∈T} Σ_{f∈F: j∈P^f, j′∈P^f(j+1)} (w^f_{j,t} − w^f_{j′,t}) = R^{f,1}_j + R^{f,2}_j,   ∀j ∈ P^f,   (12)

Y^{f,1}_j, Y^{f,2}_j ∈ {0,1},   R^{f,1}_j, R^{f,2}_j ≥ 0,   ∀f ∈ F, j ∈ P^f.   (17)
The objective function (3) minimizes the total network delays, which consist of the air delays in Eq. (3.1), where the ground delays are subtracted from the total delays, and the total ground delays in Eq. (3.2). The objective function (4) minimizes the total CO2 emissions as a function of the time needed to travel through each sector. Constraints (5)–(7) are the network capacity constraints for airport departures, airport arrivals and airspace sectors, respectively; they ensure that in each period the number of flights does not exceed the resource capacity. Constraint (8) ensures that if a flight departs, it arrives within its allowable time. Constraints (9) and (10) are the time and path connectivity constraints. Constraint (11) ensures that at least the minimum turnaround time between any two connected flights is respected. Constraint (12) defines the actual time spent in each sector by each flight and assigns it to either R^{f,1}_j or R^{f,2}_j, which is controlled by Constraints (13)–(15): Constraint (13) ensures that either Y^{f,1}_j or Y^{f,2}_j is selected, Constraint (14) links Y^{f,1}_j with R^{f,1}_j and the first linear line, and Constraint (15) links Y^{f,2}_j with R^{f,2}_j and the second linear line. Constraints (16) and (17) ensure that w^f_{j,t}, Y^{f,1}_j and Y^{f,2}_j are binary variables and that R^{f,1}_j and R^{f,2}_j are positive variables.
where C and E correspond to the total delay cost and the total CO2 emissions objective functions defined by Eqs. (3) and (4), respectively; C_opt is the optimal total delay cost when the model is solved considering only Eq. (3); and E_opt is the optimal total CO2 emissions obtained by solving the model using only the objective function defined in Eq. (4). The parameters α and β are the importance weights of the total cost and the total CO2 emissions, defined by decision-makers, where α + β = 1. Varying the values of the importance weights provides Pareto solutions.
In this section, we study the trade-off between minimizing the total delay cost and minimizing the total CO2 emissions by presenting all the non-dominated Pareto optimal solutions. To achieve this goal, we assume a grid of twenty-five airspace sectors and fifteen airports (see Fig. 3). Two hundred flights are assigned to the airports randomly, and their paths include all the sectors crossed by the straight line connecting their departure and arrival airports. The departure time of each flight is set randomly, the planning horizon is three and a half hours, and each period accounts for three minutes (TD = 3), resulting in T = 70 periods. In this example, the capacities of the airspace sectors, airport departures and arrivals are assumed to be nine flights per period. All flights are assumed to be performed using the same aircraft model due to data availability; thus, the fuel consumption approximation functions described in Sect. 3.1 are used in this example. The air delay (Ca) and ground delay (Cg) costs are 348 and 270 €/period, respectively. The CO2 emissions factor (E_CO2) is 2.527 kg/L.
Solving the model for the total delay cost function in (3) only, subject to the constraints (5)–(17), results in an optimal total delay cost (C_opt) of 24840 €. This solution produces a total of 12851533 kg of CO2.

Fig. 3. The network used in the illustrative example. The twenty-five square grids represent the airspace sectors. The fifteen circles represent the location of the fifteen airports.

1086 S. Hamdan et al.

Then, solving the model for the total
CO2 emissions only (Eq. (4)) subject to constraints (5)–(17) results in an optimal amount of CO2 emissions (E_opt) of 12715558 kg, but with a total delay cost of 191190 €. The differences between the two extreme cases are 166350 € and 135974 kg.
A Pareto front (see Fig. 4) was developed using C_opt and E_opt while solving the model using Eq. (18) subject to constraints (5)–(17). The value of a was varied from 0 to 1 with an increment of 0.001, and the value of b was calculated as 1 − a. As can be noticed from this Pareto set, on average, reducing 135974 kg of the emitted CO2 comes at a delay cost of 166350 € (reducing 1 kg of CO2 costs 1.22 € of delays).
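The scalarization and the reported trade-off can be reproduced with a short sketch. The `weighted_objective` function below is an illustrative normalized weighted sum in the spirit of the WCCM (the exact form of the paper's Eq. (18) is not reproduced here); the numerical constants are the values reported above:

```python
# Reported single-objective optima from the illustrative example.
C_opt = 24840       # optimal total delay cost [EUR], Eq. (3) only
E_opt = 12715558    # optimal total CO2 emissions [kg], Eq. (4) only

def weighted_objective(C, E, a):
    """Illustrative normalized weighted sum with b = 1 - a (a sketch of the
    WCCM-style scalarization; the paper's exact Eq. (18) may differ)."""
    b = 1.0 - a
    return a * (C - C_opt) / C_opt + b * (E - E_opt) / E_opt

# Average trade-off between the two extreme solutions, as reported:
delta_cost = 166350   # EUR of extra delay between the extremes
delta_co2 = 135974    # kg of CO2 between the extremes
print(round(delta_cost / delta_co2, 2))   # -> 1.22 EUR per kg of CO2
```

Sweeping `a` from 0 to 1 in steps of 0.001 and re-solving the model for each weight pair, as described above, traces the Pareto front of Fig. 4.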
Fig. 4. The Pareto front: total delay cost [€] on the horizontal axis versus total CO2 emissions [kg] on the vertical axis.
5 Conclusions
Despite its increasing importance, existing ATFM models with CO2 emissions are limited. This paper presents two linear approximation functions for the fuel consumption–speed relation and incorporates them into an ATFM model under a bi-objective configuration. The bi-objective model was solved using the WCCM technique. A numerical example was developed to study the trade-off between the network delays and the CO2 emissions. Results showed that an average reduction of 1 kg in the emitted CO2 is equivalent to increasing the delay costs by 1.22 €. This trade-off can be further utilized in developing emission reduction policies and strategies that take the network delays into consideration. As future work, using additional datasets, a better approximation can be found for the function relating the fuel consumption to the speed. Climbing and landing emissions functions can be developed if more data become available, and the different landing techniques can be studied and compared.
Optimal Air Traffic Flow Management with Carbon Emissions 1087
Acknowledgement. The authors would like to thank Prof. Ali Akgunduz, Professor of
Mechanical, Industrial and Aerospace Engineering at Concordia University – Canada, for providing the fuel consumption data used in this study.
This work was supported by the University of Sharjah [grant number 1702040585].
Scheduling Three Identical Parallel
Machines with Capacity Constraints
1 Introduction
In the unrelated parallel machine scheduling problem, we are given a set of n jobs J = {1, 2, ..., n} and m parallel machines. Each job Jj has a positive processing time pij depending on the machine i, must be processed for the respective amount of time on one of the m machines, and may be assigned to any of them. Every machine can process at most one job at a time. The completion time of job Jj in a schedule is denoted by Cj. We aim to minimize the total weighted completion time Σ_{j∈J} wj Cj, where wj denotes a given nonnegative integral weight of job Jj, which is a measure of its importance. For the sake of convenience,
Supported by Natural Science Foundation of China(Grant Nos. 11531014, 11871081,
11871280, 11471003, 11501171) and Qinglan Project.
c Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 1089–1096, 2020.
https://doi.org/10.1007/978-3-030-21803-4_107
1090 J. Sun et al.
we denote this problem as Rm|ri,j|Σ wj Cj, where ri,j is often called job j's arrival/release time on machine i. The special case of this problem, identical parallel machine scheduling, i.e. pij = pj for each job j and all machines i, is denoted by Pm|ri,j|Σ wj Cj. When machines are identical, uniformly related, or a special case of unrelated machines, PTASes are known [1,4,11].
For the case of m ≥ 1 and ri,j = 0, Skutella [9] presented a 3/2-approximation algorithm in 1998. Recently, for a small constant ε > 0, Bansal et al. [3] gave a (3/2 − ε)-approximation algorithm, improving upon the natural barrier of 3/2 which follows from independent randomized rounding. In simplified terms, their result was obtained by an enhancement of independent randomized rounding via strong negative correlation properties. In 2017, Kalaitzis et al. [7] took a different approach and proposed to use the same elegant rounding scheme for the weighted completion time objective as devised by Shmoys and Tardos [8] for optimizing a linear function subject to makespan constraints. Their main result is a 1.21-approximation algorithm for the natural special case where the weight of a job is proportional to its processing time (specifically, all jobs have the same Smith ratio), which expresses the notion that each unit of work has the same weight.
For the problem Rm|ri,j|Σ wj Cj, Skutella gave a 2-approximation algorithm in 2001 [10]. It has been a long-standing open problem whether one can improve upon this 2-approximation. Im and Li answered this question in the affirmative by giving a 1.8786-approximation [6].
Most of the parallel machine scheduling models assume that each machine
has no capacity constraints, which means every machine can process an arbi-
trary number of jobs. In general, it is quite important to balance the number of
jobs allocated to each single production facility in many flexible manufacturing
systems, for example, in VLSI chip production.
For the case of two identical parallel machines with capacity constraints, Yang, Ye and Zhang [12] presented a 1.1626-approximation algorithm, the first non-trivial approximation ratio for this problem (m = 2), via semidefinite programming relaxation. In this paper, we extend the techniques in [9,12] to complex semidefinite programming and approximate the problem for m = 3 with a performance ratio of 1.4446.
The rest of the paper is organized as follows. In Sect. 2 we introduce the approximation-preserving reduction of P3|q|Σ wj Cj to Max-(q; q; n−2q) 3-Cut and present the translated guarantee. Then, in Sect. 3 we develop a CSDP-based approximation algorithm for Max-(q; q; n−2q) 3-Cut and present our main results.
2 Preliminaries
yj ∈ {1, ω, ω 2 }, j = 1, 2, · · · , n.
The underlying intuition for the reduction from P3|q|Σ wj Cj to Max-(q; q; n−2q) 3-Cut is as follows. A solution for any instance of P3|q|Σ wj Cj can be seen as a two-phase schedule: first assign the jobs to one of the three machines; then sequence the jobs on each machine. It has been proved that once the jobs are assigned, they must be sequenced in non-descending order of pj/wj. We say i ≺ j if i ≠ j and pi/wi ≤ pj/wj. Therefore, if i ≺ j, and i and j are assigned to the same machine, then i should always be processed earlier than j. Then P3|q|Σ wj Cj is simply a partition of the n jobs. Now consider the jobs assigned to one of the machines, say jobs 1, 2, ..., s. The total weighted completion time of these s jobs is

Σ_{j=1}^{s} wj Cj = Σ_{j=1}^{s} wj pj + Σ_{j=1}^{s} Σ_{i≺j} wj pi.   (1)
If the graph is defined on n nodes corresponding to the n jobs, then the assignment of the n jobs can be seen as dividing the vertex set into three subsets. Furthermore, the total weighted completion time Σ_{j=1}^{n} wj Cj equals the total weight of the edges within each of the three sub-graphs plus Σ_{j=1}^{n} wj pj. Then, roughly speaking, minimizing Σ_{j=1}^{n} wj Cj is equivalent to minimizing the total weight of the edges within each of the three sub-graphs, or to maximizing the total weight of the edges across the three sub-graphs.
We associate each instance of P3|q|Σ wj Cj with a complete undirected graph G = (V, E) on the vertex set V = {1, 2, ..., n} that corresponds to the job set {J1, J2, ..., Jn}, the weight wij of the edge (i, j) ∈ E being given by
wij = min{wi pj , wj pi }. (2)
Each partition S = {S1, S2, S3} of the vertex set V can be interpreted as a feasible schedule of the n jobs on three machines. Note that |Si| ≤ q (i = 1, 2, 3). Moreover,

w_TO + Σ_{j=1}^{n} wj pj = Σ_{j=1}^{n} wj Cj + w(S1, S2, S3),   (3)
where Cj is the completion time of job Jj in the schedule corresponding to the partition S = {S1, S2, S3}; w_TO = Σ_{i<j} wij denotes the total weight of all the edges of G; and w(S1, S2, S3) denotes the cut value of the partition S = {S1, S2, S3}. It is worth noting that for any given instance of P3|q|Σ wj Cj, the quantity w_TO + Σ_{j=1}^{n} wj pj is a constant. Therefore, by equality (3), minimizing the total weighted completion time Σ_{j=1}^{n} wj Cj is equivalent to maximizing the cut value w(S1, S2, S3).
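Identity (3) can be checked numerically on a random instance; the sketch below builds the edge weights of Eq. (2), sequences each machine in Smith order, and compares both sides (illustrative code, not the paper's implementation):

```python
import random

# Checks w_TO + sum_j w_j p_j == sum_j w_j C_j + w(S1, S2, S3) for an
# arbitrary three-way partition with Smith sequencing on each machine.

def check_identity(n=12, seed=0):
    rng = random.Random(seed)
    p = [rng.randint(1, 10) for _ in range(n)]
    w = [rng.randint(1, 10) for _ in range(n)]
    # Edge weights of the complete graph G, Eq. (2).
    wij = {(i, j): min(w[i] * p[j], w[j] * p[i])
           for i in range(n) for j in range(i + 1, n)}
    w_TO = sum(wij.values())
    # An arbitrary partition; jobs listed in non-descending Smith ratio p/w.
    jobs = sorted(range(n), key=lambda j: p[j] / w[j])
    S = [jobs[0::3], jobs[1::3], jobs[2::3]]
    # Total weighted completion time with Smith sequencing on each machine.
    total_wc = 0
    for machine in S:
        t = 0
        for j in machine:
            t += p[j]
            total_wc += w[j] * t
    # Cut value: total weight of edges whose endpoints lie in different parts.
    part = {j: k for k, machine in enumerate(S) for j in machine}
    cut = sum(v for (i, j), v in wij.items() if part[i] != part[j])
    return w_TO + sum(w[j] * p[j] for j in range(n)), total_wc + cut

lhs, rhs = check_identity()
print(lhs == rhs)   # True
```

The key step is that for i ≺ j on the same machine, wj pi = min{wi pj, wj pi}, so the within-machine edge weights account exactly for the cross terms of Eq. (1).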
Let the minimum value of P3|q|Σ wj Cj be Z* and the maximum value of the corresponding Max-(q; q; n−2q) 3-Cut be w*. By generalizing the technique, we have the following result of this section.

Lemma 1. For any ρ(k) ≤ 1, a ρ(k)-approximation algorithm for Max-(q; q; n−2q) 3-Cut can be translated to an algorithm for P3|q|Σ wj Cj with a performance guarantee of 1 + (1 − ρ(k))/(2 − k).
(i) Sort the vertices in Ŝl such that δ(i1) ≥ ··· ≥ δ(i_{|Ŝl|}), where δ(i) = Σ_{j∉Ŝl} wij for i ∈ Ŝl.
(ii) Move the vertex i_{|Ŝl|} from Ŝl to Ŝ3, namely Ŝl = Ŝl \ {i_{|Ŝl|}} and Ŝ3 = Ŝ3 ∪ {i_{|Ŝl|}}.
2. If |Ŝ1| ≥ q ≥ |Ŝ2|, then iteratively perform the following operations (i)–(ii) until |Ŝ1| = q and |Ŝl| ≤ q for each l = 2, 3:
(i) Sort the vertices in Ŝ1 such that δ(i1) ≥ ··· ≥ δ(i_{|Ŝ1|}), where δ(i) = Σ_{j∉Ŝ1} wij for i ∈ Ŝ1.
(ii) Move the vertex i_{|Ŝ1|} from Ŝ1 to Ŝl, namely Ŝ1 = Ŝ1 \ {i_{|Ŝ1|}} and Ŝl = Ŝl ∪ {i_{|Ŝ1|}}.
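The move operations above can be sketched as follows for the case |Ŝ1| > q; the tie-breaking rule and the choice of the target part are assumptions for illustration, not taken verbatim from Algorithm 1:

```python
# Rebalancing sketch (case 2): repeatedly move the vertex of S1 with the
# smallest total edge weight to the outside of S1 into a part that still
# has room, until |S1| = q.

def rebalance_case2(S1, S2, S3, wij, q):
    S1, S2, S3 = set(S1), set(S2), set(S3)

    def delta(i, part):
        # Total weight of edges joining i to vertices outside `part`.
        return sum(w for (a, b), w in wij.items()
                   if (a == i and b not in part) or (b == i and a not in part))

    while len(S1) > q:
        mover = min(S1, key=lambda i: delta(i, S1))  # last vertex in sorted order
        S1.remove(mover)
        target = S2 if len(S2) < q else S3           # a part with spare capacity
        target.add(mover)
    return S1, S2, S3

# Example: complete graph on six vertices with unit weights, q = 2.
wij = {(i, j): 1 for i in range(6) for j in range(i + 1, 6)}
S1, S2, S3 = rebalance_case2({0, 1, 2, 3}, {4}, {5}, wij, 2)
print(sorted(map(len, (S1, S2, S3))))   # [2, 2, 2]
```

Moving the vertex with the smallest outside-degree δ loses the least cut weight per move, which is what the sorting in step (i) is for.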
We are now ready to present our main results.
For the sake of analyzing the solution returned by the algorithm, we define
a real function:
f(x) = (9/(8π²)) (arccos²(−x) − arccos²(x/2)),   x ∈ [−1, 1].
For a given 0 ≤ θ ≤ 1, let

α(θ) = min_{−1/2 ≤ x < 1} (1 − f(θx))/(1 − x),
b(θ) = 1 − f(θ),
c(θ) = min_{−1/2 ≤ x < 1} (f(θ) − f(θx))/(1 − x).
Then we let

d(θ) = max{α(θ), (2k/3)(b(θ) + c(θ))},
and for q ∈ [n/3, 2n/3),

β(θ) = b(θ) + ((n² − n)/(3q(2n − 3q))) c(θ).
Then, we have

Lemma 2. For any given θ ∈ [0, 1] and q ∈ [n/3, 2n/3), Algorithm 1 yields S at line 8 satisfying the following inequalities:
E[M] ≥ β(θ)M*, where M = |S1||S2| + |S1||S3| + |S2||S3| and M* = q(2n − 3q), and
E[w(S)] ≥ d(θ)w*.
Theorem 1. For any γ > 0, if random variable Z satisfies inequality (7), then
for the corresponding cut S̃ computed by Algorithm 1, we have
w(S̃) ≥ g · w∗ , (8)
where

g = γ − γ²/([d(θ) + γβ(θ)](1 − ε)).   (9)

Proof. When the algorithm finds a cut S satisfying inequality (7), we let xi = |Si|/n for i = 1, 2, 3 and λ = w(S)/w*. Then we have:
There are two cases for the cut S̃: either S̃ = rebalance(S) or S̃ = S. In the first case, it is easy to see that w(S̃) ≥ (1/(9x1x2)) λw*, whereas in the second case we obviously have w(S̃) ≥ λw*. Hence w(S̃) ≥ (1/(9x1x2)) λw*; then, using the inequality for λ, we get

w(S̃) ≥ f · w*,   (11)
where

f = [d(θ) + γβ(θ)] · (1 − ε)/(9x1x2) − γ · (x1x2 + (x1 + x2)(1 − x1 − x2))/(3x1x2).   (12)

In order to simplify the above equality and to remove the dependence on x = (x1, x2), we consider the function f for x1, x2 ≥ 0. It is easy to calculate that f attains a minimum at x = (x1*, x2*), where x1* = x2* = [d(θ) + γβ(θ)](1 − ε)/(3γ), where it takes the value γ − γ²/([d(θ) + γβ(θ)](1 − ε)), which is, by definition, the value of the function g.
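The minimization step can be verified numerically; D and γ below are arbitrary test values, and `f` is Eq. (12) written with D = [d(θ) + γβ(θ)](1 − ε):

```python
# Sanity check: f of Eq. (12) attains its minimum at x1 = x2 = D/(3*gamma)
# with value g = gamma - gamma**2/D. D and gamma are arbitrary test values.

def f(x1, x2, D, gamma):
    return (D / (9 * x1 * x2)
            - gamma * (x1 * x2 + (x1 + x2) * (1 - x1 - x2)) / (3 * x1 * x2))

D, gamma = 0.9, 0.7
x_star = D / (3 * gamma)        # claimed minimizer
g = gamma - gamma ** 2 / D      # claimed minimum value
print(abs(f(x_star, x_star, D, gamma) - g) < 1e-9)   # True
print(f(x_star + 0.05, x_star, D, gamma) > g)        # True: nearby points are larger
```

Along the diagonal x1 = x2 = x the expression simplifies to D/(9x²) − 2γ/(3x) + γ, whose derivative vanishes exactly at x = D/(3γ), confirming the claim.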
It is easily seen that the function g is concave and has a maximum at

γ = (d(θ)/β(θ)) (1/√(1 − β(θ)(1 − ε)) − 1).   (13)

For the sake of a simple analysis, we may suppose that ε = 0 in the rest of the proof. In the worst case, i.e. q = n/3, we have the following result:
Lemma 3. Algorithm 1, with probability almost 1, generates a schedule for P3|n/3|Σ wj Cj (the most capacitated case) whose performance guarantee is 1.4446.
Proof. Let

γ = (d(θ)/β(θ)) (1/√(1 − β(θ)) − 1);

then we have

ρ(k) ≥ d(θ)/(1 + √(1 − β(θ)))².
Consider the maximal translated guarantee for P3|n/3|Σ wj Cj:

max_{k∈[1,3/2]} 1 + (1 − ρ(k))/(2 − k)
  = max_{k∈[1,3/2]} min{ 1 + [1 − (2k/3)(b + c)/(1 + √(1 − β(θ)))²]/(2 − k),
                         1 + [1 − α(θ)/(1 + √(1 − β(θ)))²]/(2 − k) }.   (14)

One can easily verify that 1 + [1 − (2k/3)(b + c)/(1 + √(1 − β(θ)))²]/(2 − k) is a decreasing function of k ∈ [1, 3/2], and that 1 + [1 − α(θ)/(1 + √(1 − β(θ)))²]/(2 − k) is an increasing function of k ∈ [1, 3/2]. Then we can see
that the maximum value of equality (14) occurs at one of two possible points: k = 3(α(θ) − c(θ))/(2b(θ)) if 1 ≤ 3(α(θ) − c(θ))/(2b(θ)) ≤ 3/2, where the two functions have identical values; or k = 3/2 if 3(α(θ) − c(θ))/(2b(θ)) ≥ 3/2. In this particular case

k = 3(α(θ) − c(θ))/(2b(θ))

yields a maximal value less than 1.4446, for sufficiently large n.
Since it is easy to verify that the approximation ratio is always better than that of the worst case (q = n/3), which is 1.4446, we have

Theorem 2. Algorithm 1 generates a schedule for P3|q|Σ wj Cj (n/3 ≤ q ≤ n) whose performance guarantee is 1.4446, with probability almost 1.
4 Conclusions
In this paper, we have presented a CSDP-based approximation algorithm for scheduling on three identical parallel machines with a capacity constraint on each machine. It remains open whether this approach can be applied to approximating the more general problem of scheduling on m machines with capacity constraints.
References

1. Afrati, F., Bampis, E., Chekuri, C., Karger, D., et al.: Approximation schemes for minimizing average weighted completion time with release dates. In: Proceedings of the 40th Annual IEEE Symposium on Foundations of Computer Science, pp. 32–43 (1999)
2. Andersson, G.: An approximation algorithm for max p-section. In: Meinel, C., Tison, S. (eds.) STACS 1999. LNCS, vol. 1563, pp. 237–247
3. Bansal, N., Srinivasan, A., Svensson, O.: Lift-and-round to improve weighted completion time on unrelated machines. In: Proceedings of the 48th Annual ACM Symposium on Theory of Computing, pp. 156–167 (2016)
4. Chekuri, C., Khanna, S.: A PTAS for minimizing weighted completion time on uniformly related machines. In: Proceedings of the 28th International Colloquium on Automata, Languages, and Programming, pp. 848–861. Springer, Berlin (2001)
5. Goemans, M.X., Williamson, D.P.: Approximation algorithms for MAX-3-CUT and other problems via complex semidefinite programming. J. Comput. Syst. Sci. 68, 442–470 (2004)
6. Im, S., Li, S.: Better unrelated machine scheduling for weighted completion time via random offsets from non-uniform distributions. In: Proceedings of the 57th Annual Symposium on Foundations of Computer Science, pp. 138–147 (2016)
7. Kalaitzis, C., Svensson, O., Tarnawski, J.: Unrelated machine scheduling of jobs with uniform Smith ratios. In: Proceedings of the 28th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 2654–2669 (2017)
8. Shmoys, D.B., Tardos, É.: An approximation algorithm for the generalized assignment problem. Math. Program. 62(1–3), 461–474 (1993)
9. Skutella, M.: Semidefinite relaxations for parallel machine scheduling. In: Proceedings of the 39th Annual IEEE Symposium on Foundations of Computer Science, pp. 472–481 (1998)
10. Skutella, M.: Convex quadratic and semidefinite programming relaxations in scheduling. J. ACM 48(2), 206–242 (2001)
11. Skutella, M., Woeginger, G.J.: A PTAS for minimizing the total weighted completion time on identical parallel machines. Math. Oper. Res. 25(1), 63–75 (2000)
12. Yang, H., Ye, Y., Zhang, J.: An approximation algorithm for scheduling two parallel machines with capacity constraints. Discret. Appl. Math. 130(3), 449–467 (2003)
Solving the Problem of Coordination
and Control of Multiple UAVs by Using
the Column Generation Method
1 Introduction
UAVs (unmanned aerial vehicles) can nowadays be used for various civilian and military tasks. There has been considerable interest in making these unmanned vehicles completely autonomous, giving rise to an active research area. UAVs are usually seen as rather simple vehicles, acting cooperatively in teams to accomplish difficult missions in dynamic, poorly known or hazardous environments [2–4,9,10,12].
In this paper, we consider a UAV coordination problem that can be described as follows: we have a fleet of UAVs, a set of waypoints (i.e., missions), and obstacles (i.e., No-Fly-Zones). The objective is to design trajectories for the UAVs that visit the waypoints while avoiding the No-Fly-Zones, so as to maximize the overall performance. There are several constraints in our problem, such as capability constraints, capacity constraints, dynamics constraints, avoidance constraints and dependency constraints. In fact, our problem originates from the context presented in the papers [2,9]. The difference is that instead of just minimizing the completion time, our objective is to maximize the overall performance. The reason for choosing this objective function is that performing missions is usually the most important concern in military operations [5,6,8,11]; moreover, since the set of UAVs is sometimes smaller than the set of missions, one should choose to perform only the missions with higher priority, not all of them. Actually, this is an extension of our previous work [8]. The problem can be reformulated as a mixed-integer linear program (MILP) as in [2,9] and can thus be solved by available solvers; however, the computational time increases dramatically in this approach. Another approach is an approximate method that simplifies the coupling between the assignment and trajectory design problems by calculating and communicating only the key information that connects them [2].
In this work, we investigate an efficient approach based on the column generation method [1,7] for solving this problem. Column generation is nowadays a prominent method for coping with a huge number of variables, and numerous applications based on it have been developed [7]. In applications, the constraint matrices of (integer) linear programs are typically sparse and well structured: subsystems of variables and constraints appear in independent groups, linked by a distinct set of constraints and/or variables. We reformulate this problem in the column generation form, where the dependency constraint is handled in the Master Problem (MP). Since the effect of the dual variables is not strong enough to drive the sub-problem towards trajectories that lead to feasible solutions, we propose a modified dependency constraint for the sub-problem to prevent it from generating infeasible trajectories.
The rest of the paper is organized as follows. In Sect. 2, we introduce the considered UAV coordination problem and its formulation. Our column generation
approach for solving this problem is presented in Sect. 3. Numerical experiments
are reported in Sect. 4 while some conclusions and perspectives are discussed in
Sect. 5.
2 Problem Formulation
The problem data are vmax, ω, S, W, Z, K, D, G, tD.
Each aircraft p is modeled as a point mass mp moving in 2-D. Let the position of aircraft p at time-step t be given by (x_tp, y_tp) and its velocity by (ẋ_tp, ẏ_tp), forming the elements of the state vector s_tp. The aircraft is assumed to be acted upon by control forces (f^x_tp, f^y_tp) in the X and Y directions respectively, forming the force vector f_tp.
The maximum speed vmax,p is enforced by an approximation to a circular region in the velocity plane given by Eq. (1):

ẋ_tp sin(2πh/NC) + ẏ_tp cos(2πh/NC) ≤ vmax,p,   (1)
∀t = 1, ..., NT, ∀p = 1, ..., NV, ∀h = 1, ..., NC,

f^x_tp sin(2πh/NC) + f^y_tp cos(2πh/NC) ≤ fmax,p,   (2)
∀t = 0, ..., NT − 1, ∀p = 1, ..., NV, ∀h = 1, ..., NC.
1100 D. M. Nguyen et al.
where fmax,p is related to the maximum turn rate, for travel at constant speed vmax,p, by

ωp = fmax,p / (mp · vmax,p).   (3)
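Constraint (1) replaces the disc |v| ≤ vmax,p by NC half-planes, so the feasible region slightly circumscribes the disc. A small sketch (the helper function is an illustration, not from the paper):

```python
import math

def within_speed_limit(vx, vy, v_max, n_c):
    """Check the N_C half-plane constraints of Eq. (1) for one velocity."""
    return all(
        vx * math.sin(2 * math.pi * h / n_c)
        + vy * math.cos(2 * math.pi * h / n_c) <= v_max
        for h in range(1, n_c + 1)
    )

print(within_speed_limit(0.0, 1.4, 1.5, 8))   # True: inside the disc
print(within_speed_limit(0.0, 1.6, 1.5, 8))   # False: outside the polygon
# The polygon circumscribes the disc, so some vectors slightly faster than
# v_max still pass, e.g. magnitude ~1.62 midway between two facet normals:
print(within_speed_limit(0.620, 1.497, 1.5, 8))   # True
```

Increasing NC tightens the approximation at the cost of more linear constraints per vehicle and time-step.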
The discretized dynamics of the overall system, applied to all NV vehicles
up to NT time-steps, can be written in the linear form
where A and B are the system dynamics matrices for a unit point mass. In all
cases, the initial conditions are specified from the initial condition matrix S.
The constraints for avoiding a rectangular obstacle1 were developed in [10] and can be written as
where c_jptz is a set of binary decision variables and R is a positive number that is much larger than any position to be encountered in the problem. If c_jptz = 0, the vehicle p is clear of the No-Fly-Zone j in the direction z (of the four directions +X, −X, +Y, −Y) at time-step t. If c_jptz = 1, the constraint is relaxed. The final inequality ensures that no more than three of the constraints are relaxed at any time-step, so the vehicle must be clear of the obstacle in at least one direction.
where bipt is a binary decision variable, W is the waypoint location matrix, and
R is the same large, positive number used in (5). It can be seen that bipt = 1
implies that vehicle p visits waypoint i at time-step t.
While the logical constraint enforces that each waypoint without depen-
dency is visited at most once by a vehicle with suitable capabilities, the time
1 The generalization to a polygonal obstacle is easy.
Column Generation Method for Coordination and Control of Multiple UAVs 1101
Σ_{t=1}^{NT} Σ_{p=1}^{NV} Kpi b_ipt ≤ 1, ∀i ∈ NW\D,   Σ_{t=1}^{NT} Σ_{p=1}^{NV} Kpi b_ipt = 1, ∀i ∈ D,

Σ_{t=1}^{NT} Σ_{p=1}^{NV} t·b_{ik,p,t} − Σ_{t=1}^{NT} Σ_{p=1}^{NV} t·b_{jk,p,t} ≥ tDk, ∀k = 1, ..., ND.   (7)

tp ≥ Σ_{t=1}^{NT} t·b_ipt, ∀p = 1, ..., NV, ∀i = 1, ..., NW.   (9)
The objective is to maximize the total gain and also to minimize the implementation time and the resource expense. Here the gain is more important than the others, and the implementation time is more important than the resource expense. Thus, we have the following optimization problem:
max_{s,f,b,c} Σ_{t=1}^{NT} Σ_{i=1}^{NW} Σ_{p=1}^{NV} gip b_ipt − ε1 Σ_{p=1}^{NV} tp − ε2 Σ_{p=1}^{NV} Σ_{t=0}^{NT−1} (|f^x_tp| + |f^y_tp|)   (10)
subject to: the constraints from (1) to (9).
In this model, the parameters ε1, ε2 > 0 are quite small, reflecting the relative importance of the gain over the implementation time and the resource expense. This is a linear mixed 0–1 programming problem, which allows us to use available solvers to find the optimal solution, but the computation can be intensive. The next section is devoted to the description of a column generation approach for solving problem (10).
To use column generation, we reformulate the problem given in Eq. (10) in the column generation form, where the dependency constraint is handled in the Master Problem. The sub-problem is then treated with careful consideration of the impact of the dependency constraint.
In this formula, Σ_{i∈NW∩r} gip is the gain of the trajectory r, tpr is the time when the vehicle p visits its last waypoint in the trajectory r, and frp is the total force used along the trajectory r.
We define the parameter of visitation ari for each waypoint i ∈ NW by

ari = 1 if r visits waypoint i, and ari = 0 otherwise.
When the trajectory r visits the waypoint i, the time of visitation is denoted by tr(i). Note that each couple of dependency waypoints is represented as Dk = (ik, jk), where waypoint ik must be visited after waypoint jk plus tDk time units. Thus, the problem (10) can be reformulated as
max Σ_{p=1}^{NV} Σ_{r∈Ωp} g(r)·θr
s.t. Σ_{p=1}^{NV} Σ_{r∈Ωp} ari θr ≤ 1, ∀i ∈ NW\D,   Σ_{p=1}^{NV} Σ_{r∈Ωp} ari θr = 1, ∀i ∈ D,
     Σ_{r∈Ωp} θr ≤ 1, ∀p = 1, ..., NV,
     Σ_{p=1}^{NV} Σ_{r∈Ωp} a_{r,ik} tr(ik) θr − Σ_{p=1}^{NV} Σ_{r∈Ωp} a_{r,jk} tr(jk) θr ≥ tDk, ∀k = 1, ..., ND,
     θr ∈ {0, 1}, ∀r ∈ Ω.   (11)
The variable θr ∈ {0, 1} is a decision variable describing whether a trajectory r is chosen or not. Because of the first and second constraints, the condition θr ∈ {0, 1}, ∀r ∈ Ω, can be replaced by θr ∈ N, ∀r ∈ Ω. The linear relaxation of problem (11), i.e., with θr ≥ 0, ∀r ∈ Ω, is called the Master Problem (MP).
The methodology of the column generation approach can be described as follows. Let Ωp1 ⊂ Ωp, p = 1, ..., NV, and Ω1 = ∪_p Ωp1; we consider the Restricted Master Problem (RMP), denoted by MP(Ω1):
max Σ_{p=1}^{NV} Σ_{r∈Ωp1} g(r)·θr
s.t. Σ_{p=1}^{NV} Σ_{r∈Ωp1} ari θr ≤ 1, ∀i ∈ NW\D,   Σ_{p=1}^{NV} Σ_{r∈Ωp1} ari θr = 1, ∀i ∈ D,
     Σ_{r∈Ωp1} θr ≤ 1, ∀p = 1, ..., NV,
     Σ_{p=1}^{NV} Σ_{r∈Ωp1} a_{r,jk} tr(jk) θr − Σ_{p=1}^{NV} Σ_{r∈Ωp1} a_{r,ik} tr(ik) θr ≤ −tDk, ∀k = 1, ..., ND,
     θr ≥ 0, ∀r ∈ Ω1.   (12)
The dual program of (12), denoted by D(Ω 1 ), is
min Σ_{i=1}^{NW} λi + Σ_{p=1}^{NV} μp − Σ_{k=1}^{ND} tDk·αk
s.t. Σ_{i=1}^{NW} ari λi + μp + Σ_{k=1}^{ND} (tr(jk)·a_{r,jk} − tr(ik)·a_{r,ik}) αk ≥ g(r), ∀r ∈ Ωp1, p = 1, ..., NV,
     λi ≥ 0, ∀i ∈ NW\D,   λi ∈ R, ∀i ∈ D,
     μp ≥ 0, ∀p = 1, ..., NV,   αk ≥ 0, ∀k = 1, ..., ND.
Let (λ̄, μ̄, ᾱ) = (λ̄1, ..., λ̄_NW, μ̄1, ..., μ̄_NV, ᾱ1, ..., ᾱ_ND) denote the optimal solution of D(Ω1); it satisfies

Σ_{i=1}^{NW} ari λ̄i + μ̄p + Σ_{k=1}^{ND} (tr(jk)·a_{r,jk} − tr(ik)·a_{r,ik}) ᾱk ≥ g(r), ∀r ∈ Ωp1, p = 1, ..., NV.
It is clear that if this condition holds for all r ∈ Ωp , p ∈ NV , then (λ̄, μ̄, ᾱ) is
also the optimal solution of the dual program of (MP). Otherwise, we will look
for a trajectory r ∈ Ωp \Ωp1 , for a vehicle p ∈ NV such that
Σ_{i=1}^{NW} ari λ̄i + μ̄p + Σ_{k=1}^{ND} (tr(jk)·a_{r,jk} − tr(ik)·a_{r,ik}) ᾱk < g(r).   (13)
Step 2. Solve the problem (12) to obtain its optimal solution and the dual solution (λ̄, μ̄, ᾱ).
Step 3. For each vehicle p = 1, ..., NV, solve the sub-problem optimally to find a trajectory r ∈ Ωp\Ωp1, and update Ωp1 := Ωp1 ∪ {r}.
Step 4. Iterate Steps 2–3 until there is no trajectory satisfying the condition (13).
Step 5. Solve the integer programming formulation of the final RMP to obtain an approximate solution.
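The pricing condition (13) amounts to a reduced-cost test on candidate trajectories. A minimal sketch (all data below are illustrative placeholders, not values from the experiments):

```python
def reduced_cost(g_r, a_r, t_r, p, lam, mu, alpha, deps):
    """g(r) minus the dual side of (13); positive means r should be added.

    a_r[i]  -- 1 iff trajectory r visits waypoint i
    t_r[i]  -- visitation time of waypoint i (meaningful when a_r[i] == 1)
    deps    -- list of dependency pairs (i_k, j_k)
    """
    dual = sum(a_r[i] * lam[i] for i in range(len(lam))) + mu[p]
    for k, (ik, jk) in enumerate(deps):
        dual += (t_r[jk] * a_r[jk] - t_r[ik] * a_r[ik]) * alpha[k]
    return g_r - dual

# Illustrative data: 3 waypoints, 1 vehicle, 1 dependency pair (0, 2).
rc = reduced_cost(10.0, [1, 0, 1], [5, 0, 2], 0,
                  [1.0, 0.0, 2.0], [0.5], [0.25], [(0, 2)])
print(rc)   # 7.25: positive, so this trajectory would enter the RMP
```

Step 4 terminates when no trajectory has a positive reduced cost, at which point the dual solution is also optimal for the full Master Problem.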
ari = Σ_{t=1}^{NT} bit,   tr(ik)·a_{r,ik} = Σ_{t=1}^{NT} t·b_{ik,t},   tr(jk)·a_{r,jk} = Σ_{t=1}^{NT} t·b_{jk,t}.

Thus

g(r) := Σ_{i∈NW∩r} gi − ε1 tr − ε2 fr = Σ_{i=1}^{NW} gi Σ_{t=1}^{NT} bit − ε1 tr − ε2 Σ_{t=0}^{NT−1} (|f^x_t| + |f^y_t|).
Additionally, we have

Σ_{k=1}^{ND} (tr(jk)·a_{r,jk} − tr(ik)·a_{r,ik}) ᾱk = Σ_{k=1}^{ND} Σ_{t=1}^{NT} (b_{jk,t} − b_{ik,t}) t ᾱk.

The left-hand side of (14) is the objective function of the sub-problem.
An interesting aspect of this model is the dependency constraint. Although the master problem has the proper constraints, including the dependency ones, the effect of the dual variables is not strong enough to drive the sub-problem towards trajectories that lead to feasible solutions. Some numerical investigations, which we do not describe in more detail here (but hopefully in the longer paper), show this situation. Therefore, we also use the modified dependency constraints for the sub-problem, as follows:
Σ_{t=1}^{NT} t·b_{ik,t} + R(1 − Σ_{t=1}^{NT} b_{ik,t}) ≥ tDk + Σ_{t=1}^{NT} t·b_{jk,t},   k = 1, ..., ND,   (15)
where R is the same large, positive number used in (5). If the vehicle p visits both waypoints ik and jk, the constraint (15) enforces that the dependency is respected: the visitation time of waypoint ik must be larger than the visitation time of waypoint jk plus tDk time units. If the vehicle p visits ik but not jk, the constraint (15) specifies that the visitation time of waypoint ik must be larger than tDk, because only such trajectories may appear in the optimal solution. If the vehicle p visits jk but not ik, the constraint (15) is relaxed.
4 Numerical Experiment
The algorithms are written in MATLAB 2015b and tested on a MacBook Air with a 1.6 GHz Intel Core i5 and 8 GB of RAM. The solver CPLEX 12.8 is used for solving the linear program (12) and the sub-problem.
We compare the results obtained by our column generation approach with a purely CPLEX-based approach for the compact model (10). We have NV = 6 UAVs, NW = 12 waypoints, NZ = 5 no-fly zones, NC = 4, and NT = 25 time steps corresponding to 25 s. Their positions are presented in Fig. 1. The parameters of the UAVs are presented in Table 1, and the dependency parameters are given in Table 2: waypoint 6 must be visited after waypoint 9 plus one time unit, and waypoint 1 must be visited after waypoint 7 plus two time units. We suppose that each vehicle is compatible with all waypoints, i.e., Kpi = 1, ∀p = 1, ..., NV, ∀i = 1, ..., NW, and set ε₁ = ε₂ = 10⁻³. For this data, the MILP has 5760 binary variables, 1446 continuous variables and 16116 constraints.
Table 1. UAV parameters.

UAV | Mass (kg) | Init. vel. X (m/s) | Init. vel. Y (m/s) | ωmax (°/s) | vmax (m/s) | fmax (N) | F (N)
1   | 5         | 0.1                | 0                  | 15         | 1.5        | 1.9635   | 25
2   | 5         | 0.1                | 0                  | 15         | 1.5        | 1.9635   | 25
3   | 5         | 0.1                | 0                  | 15         | 1.0        | 1.3090   | 25
4   | 5         | 0.1                | 0                  | 15         | 1.5        | 1.9635   | 25
5   | 5         | 0.1                | 0                  | 15         | 1.0        | 1.3090   | 25
6   | 5         | 0.1                | 0                  | 15         | 1.0        | 1.3090   | 25
The gains are integer numbers generated uniformly in the interval [1, 20]. The
10 test problems correspond to the 10 samples of generated gains (see Table 3).
1106 D. M. Nguyen et al.
One hour is the computational time limit for solving the MILP with CPLEX. To start the column generation procedure, we generate initial trajectories as follows: we consider problem (10) restricted to the set of waypoints having dependencies, and use CPLEX to solve this problem until a feasible solution is found. The trajectories corresponding to this solution become the initial trajectories Ω¹.
Table 2. Dependency parameters.

i_k | j_k | t_{D_k}
6   | 9   | 1
1   | 7   | 2
Fig. 1. Results for test problem 5: the solution of pure CPLEX with performance 179.865901 (left) and the solution of column generation with performance 184.840890 (right)
5 Conclusion
In this paper, we have proposed a column generation approach for the coordination and control of multiple UAVs, where the dependency constraint is mainly handled in the master problem and also treated in the sub-problem to avoid generating infeasible trajectories. The comparative results have demonstrated the efficiency of our approach against CPLEX applied to the compact model. In future work, we will develop branching techniques to obtain a branch-and-price scheme for global solutions, and solve the sub-problems in parallel.
Acknowledgement. The authors would like to thank the DGA (Direction Générale
de l’Armement) for the support to this research.
Spare Parts Management in the Automotive
Industry Considering Sustainability
Abstract. Spare parts are a fundamental part of the automotive industry, even if they are intended for the out-of-series market and aftermarket products. Throughout this study, a proposal for an inventory management system applied to automotive spare parts, based on forecasting and simulation, is presented. The objective is to improve the implementation of an inventory system in order to reduce transportation, storage and production costs by studying the behavior of demand, the applicability of an inventory management policy and the use of simulations to test the proposed results. This allows correcting parameters to avoid stock shortages and integrating sustainable paradigms.
1 Introduction
components of that vehicle, including goods such as lubricants that are necessary for
the use of the motor vehicle. As part of our project, we collaborated with a France-based supplier of car producers. The collected data show the impact of spare parts and the importance of inventory management of components and finished products while ensuring the high service levels required by customers (very short delivery times and very few shortages). For this, it is necessary to know the finished products to be made in each period, the finished products to be stored, the periods in which to order components, the quantity of components to be stored and the quantity of components to use in production per period [1].
Therefore, the main purpose of this paper is first to propose an efficient demand forecast. Indeed, demand forecasts are one of the key issues in logistics, since many factors are involved in obtaining good forecast results [2]. We start from the idea developed in Hubert's PhD dissertation, which proposes testing different forecast models in order to retain the best one [3]. To measure the performance of the proposed forecasting system, we develop an Analytic Hierarchy Process (AHP), which supports good decision-making by structuring criteria into a hierarchy [4]. Saaty [5] gives a good description of AHP. Once the demand is efficiently estimated, the project has a good basis for starting the evaluation of the inventory management system.
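A common way to derive AHP weights is the row geometric-mean approximation of Saaty's eigenvector method; the sketch below uses that approximation on an invented 3×3 comparison matrix for three illustrative criteria (forecast error, applicability, adaptability). The matrix values are assumptions, not the ones used in the study.

```python
# Illustrative AHP weight computation: geometric-mean approximation of
# Saaty's eigenvector method on a made-up pairwise comparison matrix.
import math

def ahp_weights(M):
    """Approximate AHP priority vector by normalized row geometric means."""
    n = len(M)
    gms = [math.prod(row) ** (1.0 / n) for row in M]
    total = sum(gms)
    return [g / total for g in gms]

# Pairwise comparisons on Saaty's 1-9 scale (illustrative values only):
# error is judged 3x as important as applicability, 5x as adaptability.
M = [[1,   3,   5],
     [1/3, 1,   2],
     [1/5, 1/2, 1]]
w = ahp_weights(M)
assert abs(sum(w) - 1.0) < 1e-9   # weights form a probability vector
assert w[0] > w[1] > w[2]          # the error criterion dominates
```

Each forecasting model would then be scored per criterion and ranked by its weighted sum.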
To define the inventory management policy for the components to keep, different methods can be used to find the optimal quantity to order when the demand changes over time and one order is considered per period. This is known as the lot sizing approach [6]. Several models can be defined based on deterministic or stochastic methods [7], but generally the main objective is to calculate or estimate an economic lot size [8]. Therefore, we develop an integer linear mathematical model that minimizes the sum of transportation, storage and production costs, but also of carbon emission costs along the supply chain [9]. This kind of mathematical model can be complex to solve, so the resolution strategy can be based on a simulation method.
This article is organized as follows. The second section presents the industrial case study and the main hypotheses. Then, the mathematical model is presented. The fourth section develops the proposed resolution algorithm, and the fifth section gives some numerical experiments. Finally, a discussion is proposed and future works are outlined.
We consider a supplier of automobile producers. This supplier has several raw material and component suppliers and several customers. The industrial supplier is established in Europe, as is its customer, with factories and warehouses of finished products and components around the world. Some warehouses may be dedicated only to raw materials and components, only to finished products, or to a mix of both.
As part of a research project, this automobile industrial supplier asked us to provide a decision-support tool for sizing the quantity of components to order, knowing the quantity of finished products to be produced or outsourced over a finite time horizon, and to
Spare Parts Management in the Automotive Industry 1111
deduce the optimal strategy for storing these different elements on its platforms or on those of its suppliers in case of orders.
The major difficulty of this work (apart from the complexity related to the number of variables and constraints to be considered) is predicting the demand for the sold finished products, knowing that these are semi-finished products that can be used in vehicle manufacturing or maintenance. In the case of maintenance, it is necessary to store these semi-finished products for several years. It is therefore difficult to estimate future needs, knowing that we must consider the maintenance of end-of-life finished products that are no longer sold, as well as the fact that some semi-finished products will be reconditioned from several semi-finished products. In addition, the semi-finished products are made from various components that can be used in several semi-finished products, and likewise the same semi-finished product can be used in several vehicles.
To simplify the study and the data recovery, we focus on a particular customer. We are also interested in a single warehouse with ordered products (which can be components/raw materials or outsourced finished products) and finished products manufactured in the factories of our industrial supplier.
In parallel, the considerations in terms of sustainable development become more
and more important for customers considering the traceability of products (origin,
ecological footprint, composition, elimination, etc.) and society (taxation, require-
ments, lobbying, public regulations, etc.), and push companies to deploy new strategies
incorporating this concept. For firms (and individuals) the integration of sustainable
development is based on the “triple bottom line” approach where a minimum perfor-
mance must be achieved for each of the three dimensions of sustainable development
(economic, environmental and social).
Two approaches can be taken: (i) an internalization of sustainability, that is, the factors that improve the company's performance are integrated into the company's strategy (this often leads to additional costs); or (ii) an externalization of sustainability, in which case the company decides to shift the problem to its subcontractors and service providers without making any real effort to improve its internal performance. In this work, we consider both aspects according to the requirements of the industrial supplier. Thus, the factors improving sustainability are first integrated in the form of economic costs in order to simplify the study.
The next sections detail our methodology and our approach.
3 Mathematical Model
The objective of the lot size modeling is to find the optimal quantity to order when the
demand changes over time considering an order per period. So, it is necessary to take
into account the production and inventory variables and decision variables representing
a command. Notably, it is a problem that can be modeled as an integer linear math-
ematical model.
1112 D. A. B. Diaz et al.
Table 1. Notation.
Variable Description
i A finished product (to be produced by the firm)
j A production period
k A spare part of the product i
l An element of the supply chain, which can be: the transport mode, the supplier, the warehouse or the supplier's factory
Xij Quantity of i to be produced in period j
Aij Quantity of i to be stored in period j
Ckj Quantity of k to be used in period j
Bkj Quantity of k to be stored in period j
Elj Quantity of carbon emitted during period j for element l which can be the
transport mode, the supplier, the warehouse or the supplier’s factory
Oij Binary decision variable which represents the fact we should produce i during
period j (Oij=1) or not (Oij=0)
Ykj Binary decision variable which represents the fact we should order k during period
j (Ykj =1) or not (Ykj =0)
Dij Demand of product i for period j
Rkj Need of spare part k during period j
Cok Ordering cost per unit of spare part k
Cf Storage cost per stored part
Wi Production cost for the product i
Ctk Transportation cost per unit of spare part k
Ccl Carbon emission cost for the element l
$$\min Z = \sum_{i=1}^{I}\sum_{j=1}^{J} O_{ij} W_i + \sum_{i=1}^{I}\sum_{j=1}^{J} A_{ij} C_f + \sum_{k=1}^{K}\sum_{j=1}^{J} P_{kj} C_{o_k} + \sum_{k=1}^{K}\sum_{j=1}^{J} B_{kj} C_f + \sum_{l=1}^{L}\sum_{j=1}^{J} E_{lj} C_{c_l} \qquad (1)$$
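On toy data, objective (1) can be evaluated directly. The function below and all parameter values are illustrative assumptions, with `P` standing for the ordered component quantities priced at the ordering cost; none of the numbers are the confidential industrial data.

```python
# Illustrative evaluation of objective (1): production set-up costs,
# storage costs for finished products and components, ordering costs,
# and carbon emission costs. All parameter values are invented.

def total_cost(O, A, P, B, E, W, Cf, Co, Cc):
    z  = sum(O[i][j] * W[i]  for i in range(len(O)) for j in range(len(O[0])))
    z += sum(A[i][j] * Cf    for i in range(len(A)) for j in range(len(A[0])))
    z += sum(P[k][j] * Co[k] for k in range(len(P)) for j in range(len(P[0])))
    z += sum(B[k][j] * Cf    for k in range(len(B)) for j in range(len(B[0])))
    z += sum(E[l][j] * Cc[l] for l in range(len(E)) for j in range(len(E[0])))
    return z

# One finished product, one component, one supply-chain element, two periods:
Z = total_cost(O=[[1, 0]], A=[[2, 1]], P=[[10, 0]], B=[[3, 3]],
               E=[[4, 4]], W=[50], Cf=2.0, Co=[1.5], Cc=[0.25])
assert Z == 50 + 6 + 15 + 12 + 2   # = 85
```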
The constraints enforce a balance period by period, so that the problem keeps coherence between the production of the current period and the storage of the previous period. They are given as follows.
$$A_{i0} = 0 \quad \forall i \qquad (2)$$
$$B_{k0} = 0 \quad \forall k \qquad (3)$$
$$C_{kj} + B_{k(j-1)} = \sum_{i=1}^{n} X_{ij} R_{ki} + B_{kj} \quad \forall k, j \qquad (5)$$
Equation (2) represents the initial inventory of finished product i and Eq. (3) represents the initial inventory of component k. Equation (4) represents the limited storage capacity for finished product i and Eq. (5) represents the limited storage capacity for component k. Equation (6) represents the maximum amount of component k and Eq. (7) represents the maximum quantity of finished product i. A service level of 95% is chosen for the project.
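The paper does not state how the 95% service level is translated into stock parameters. One standard textbook choice, shown here purely as an illustration, is a safety stock under normally distributed demand with z ≈ 1.645 for a 95% cycle service level; all numeric inputs below are invented.

```python
# Illustrative safety-stock sizing for a 95% cycle service level under a
# normal-demand assumption (not necessarily the paper's exact method):
# SS = z * sigma_d * sqrt(L), with z = 1.645 for 95%.
import math

def safety_stock(z, sigma_d, lead_time_periods):
    return z * sigma_d * math.sqrt(lead_time_periods)

ss = safety_stock(1.645, 40.0, 4)      # sigma = 40 parts/period, L = 4 periods
reorder_point = 120 * 4 + ss           # mean demand 120 parts/period over L
assert round(ss, 1) == 131.6
assert round(reorder_point, 1) == 611.6
```

The reorder point R of a (Q, R) policy is then mean lead-time demand plus this safety stock.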
An analysis of the programming model shows 1152 variables to be considered (9 finished products, 22 components, 12 periods in the horizon and a variable for each type of component and finished product). The resolution complexity is a real problem, even if the integer linear programming model is the best option. To simplify the problem and reduce the number of variables to be considered, the project seeks the minimization of costs only for the finished products, since they have a greater economic value than the components.
The next section will present our approach to solve the proposed integer linear
mathematical model.
4 Resolution Method
The behavior of the demand conditions the type of system. In this project, we can
consider systems with deterministic or stochastic demands. To find the best demand
prediction, we chose to have a multi-criteria approach to compare different possible
modeling. This multi-criteria approach is based on an AHP model [5] not detailed in
this paper.
[Figure: Petri net model of the inventory management system. Places: P1 pending demand; P2 pending order; P3 stock of finished products; P4 quantity of delivery orders; P5 number of orders; P6 demand level. Transitions: T1 arrival of a customer request; T2 warehouse order; T3 arrival of a factory request; T4 starting a customer delivery.]
It has to be noticed that a timed transition (T4) and two places (P1 and P6) are added to simulate the delivery time, to count the quantity of orders, and to follow the evolution of the demand.
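The token game of such a net can be sketched in a few lines. Place names follow the legend above, while the arc weights, the initial marking and the firing order are invented for illustration; a real tool (e.g. a timed Petri net simulator) would also handle the timing of T4.

```python
# Minimal token-game sketch of a Petri net like the one described above.
# Arc weights and the initial marking are illustrative assumptions.

def enabled(marking, pre):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking[p] >= w for p, w in pre.items())

def fire(marking, pre, post):
    """Fire one transition: consume pre-set tokens, produce post-set tokens."""
    assert enabled(marking, pre), "transition not enabled"
    m = dict(marking)
    for p, w in pre.items():
        m[p] -= w
    for p, w in post.items():
        m[p] = m.get(p, 0) + w
    return m

# T1 (customer request arrives): consumes nothing, puts a token in P1.
# T4 (start delivery): needs a pending demand (P1) and one unit of stock (P3).
m0 = {"P1": 0, "P3": 5, "P4": 0}
m1 = fire(m0, pre={}, post={"P1": 1})                  # T1 fires
m2 = fire(m1, pre={"P1": 1, "P3": 1}, post={"P4": 1})  # T4 fires
assert m2 == {"P1": 0, "P3": 4, "P4": 1}
```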
5 Numerical Results
To collect the data, it is necessary to make a historical review of all the needs month by month (a month is the period of study for the automotive supplier). We consulted the needs of the last two years. The collected data are not given in this paper for reasons of confidentiality.
The results are given for two different inventory policies: the (Q, R) policy, which consists of ordering a quantity Q whenever the stock level falls below the reorder point R, and the lot-by-lot policy, which consists of ordering a lot in each period of the study. The lot could be an economic order quantity [6].
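The two policies can be contrasted on a toy demand stream. The EOQ formula Q* = sqrt(2DS/H) is the classical economic order quantity mentioned above; every number below (demand, costs, Q, R) is illustrative rather than the confidential industrial data, and delivery is assumed instantaneous for simplicity.

```python
# Toy comparison of the (Q, R) and lot-by-lot policies; all data invented.
import math

def eoq(annual_demand, order_cost, holding_cost):
    """Classical economic order quantity Q* = sqrt(2DS/H)."""
    return math.sqrt(2 * annual_demand * order_cost / holding_cost)

def simulate_qr(demand, Q, R, start):
    """(Q, R): order Q units whenever stock drops below R (instant delivery)."""
    stock, orders = start, 0
    for d in demand:
        stock -= d
        if stock < R:
            stock += Q
            orders += 1
    return stock, orders

def simulate_lot_by_lot(demand):
    """Lot-by-lot: order exactly the period's need; stock stays at zero."""
    return 0, len(demand)

demand = [30, 50, 20, 40]
q = eoq(annual_demand=1400, order_cost=7, holding_cost=1.0)
assert round(q) == 140                     # sqrt(2*1400*7/1) = 140
stock, n_orders = simulate_qr(demand, Q=100, R=20, start=60)
assert (stock, n_orders) == (20, 1)        # one replenishment triggered
```

Lot-by-lot trades higher ordering frequency for minimal holding stock, which is the trade-off behind the cost comparison reported next.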
The obtained results are given in Table 2.
The following table compares the current policy (the (Q, R) policy) and the proposal. The lot-by-lot calculation shows that implementing this inventory policy saves 23% of total costs (depending, of course, on the parameters). At the same time, it gives a vision of the behavior of the demand and establishes safety margins to avoid shortages while preserving the customer service level. It is a reconciliation of all the strong and weak points of the new inventory management system, not only on the economic side but also in an operative way integrating environmental considerations.
6 Conclusion
The proposed methodology is based on real data from the automotive industry, provided by a supplier of automotive producers, as well as on its needs and constraints. These collected values have highlighted the importance of having efficient forecasts of spare part demands, even more so for components arriving at the end of their life but having to be kept for the maintenance of the vehicles of the various customers. Thus, several forecasting techniques were tested, and the use of the AHP allowed choosing a prediction technique balanced between error, applicability and adaptability.
Once a forecasting technique has been selected, we must choose whether the parts are ordered, the order quantity and the replenishment date (the study horizon is finite and decomposed into periods). To do this, based on an integer linear mathematical model, we propose a resolution algorithm based on a Petri net. Note that in this paper we only present results for semi-finished products to simplify the presentation. We then propose numerical experiments to highlight our results, which show how the implementation of an inventory system is possible by considering the behavior of the demand, the industrial application and the monitoring of the results. The proposal highlights an economic and operational improvement for the company in the management of the alternative products.
This research project relies on approximate methods; in practice, working with an estimate of demand will not ensure an optimal result. However, the proposed inventory management system with continuous review offers the possibility to adapt to each situation.
The mathematical modeling highlights the degree of complexity of this industrial problem. In this paper, we have studied only one segment of the global problem. The development of inventory policies and forecasts deserves a deeper analysis, and more stochastic cases should be included to get closer to reality.
The perspectives of this work are, on the one hand, to propose resolution methods closer to the obtained mathematical model: indeed, our proposed Petri net remains simple in its definition and therefore does not allow all components to be included. On the other hand, we should identify all the induced factors and costs and integrate them into our work. We could then propose a real-time decision-making tool for the complete inventory management of spare parts for the automotive industry (suppliers and producers).
References
1. Alfarez, H., Turnadi, R.: General model for single-item lot sizing with multiple suppliers,
quantity discounts, and backordering. Proc. CIRP 56, 199–202 (2016)
2. Bussay, A., Van der Velde, M., Fumagalli, D., Seguini, L.: Improving operational maize
yield forecasting in Hungary. Agric. Syst. 141, 94–106 (2015)
3. Hubert, T.: Prévision de la demande et pilotage des flux en approvisionnement lointain. Ph.
D. Thesis. École Centrale Paris (2013). (Chapitre 1. Pages 10–12 – in French)
4. Gupta, S., Dangayach, G., Kumar, A., Rao, P.: Analytic hierarchy process (AHP) model for
evaluating sustainable manufacturing practices in Indian electrical panel industries. Proc.
Soc. Behav. Sci. 189, 208–216 (2015)
5. Saaty, T.L.: The Analytic Hierarchy Process. McGraw-Hill, New York (1980)
6. Rahjans, N., Samak, S.: Determination of optimum inventory model for minimizing total
inventory cost. Proc. Eng. 51, 803–809 (2013)
7. Erol, S., Jäger, A., Hold, P., Ott, K., Sihn, W.: Tangible industry 4.0: a scenario-based
approach to learning for the future of production. Proc. CIRP 54, 13–18 (2016)
8. Anderson, D.R., Sweeney, D.J., Williams, T.A., Camm, J.D.: An Introduction to Management Science: Quantitative Approaches to Decision Making, 15th edn., p. 912. Cengage Learning (2018)
9. Hennequin, S., Ramirez Restrepo, L.M.: Fuzzy model of a joint maintenance and production
control under sustainability constraints. In: Proceedings of the 8th IFAC MIM 2016
Conference, Troyes, France (2016)
10. Hu, H., Zhou, M.C.: A Petri net-based discrete-event control of automated manufacturing
systems with assembly operations. IEEE Trans. Control Syst. Technol. 23(2), 513–524
(2015)
11. Sanjoy, K.: Determination of exponential smoothing constant to minimize mean square error
and mean absolute deviation. Global J. Res. Eng. 11(3), 31–33 (2011)
The Method for Managing Inventory
Accounting
1 Introduction
This article considers the main sections of the logistics industry: the warehouse and transportation departments. The capacity of the commodity warehouse, the possibility of storing the fund and the ways of determining the capacity of the warehouse were also considered, and efficient transportation calculations were developed to save time. The sphere of logistics has only just begun to develop, but despite this, automation devices are developing at a good pace. The main feature of logistics is the achievement of maximum income using minimum costs. At present, logistics is a major source of the processes generating competition. In addition to warehousing, transport, information and production, logistics provides the customer with a quality and reliable service.
2 Objective
The entire path of material transfer presented in the scheme can be divided into two large sections:
– in the first section, products of the production and technical direction are transferred;
– in the second, products that people consume are transferred.
1122 D. Kulanda et al.
Thus, the model is a supporting tool for creating a business process and systems to ensure that the general requirements imposed on the business process are met [3].
The formulation of the problem of building business processes is:
• accelerate the creation of the business process and systems in terms of time: TB → min or Σᵢ TB(Opᵢ) → min;
• improve the quality indicators of the business process and the creation of systems: Kp → max or Σᵢ Kp(Opᵢ) → max;
• reduce labor efforts: Tp → min or Σᵢ Tp(Opᵢ) → min;
where TB and TB(Opᵢ) are time indicators: the total time spent on creating a business process and the time it takes to complete each operation Opᵢ; Kp and Kp(Opᵢ) are quality indicators: common to the business process (and/or system) as a whole and for the performance of each operation Opᵢ.
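The per-operation and aggregate forms of the three criteria can be illustrated on a tiny example; the operation list, durations and scores below are entirely invented.

```python
# Illustrative aggregation of the three criteria TB, Kp, Tp over the
# operations Op_i of a business process (all values invented).

ops = [
    {"name": "receive", "time": 4.0, "quality": 0.90, "labor": 2.0},
    {"name": "store",   "time": 2.0, "quality": 0.80, "labor": 1.0},
    {"name": "ship",    "time": 3.0, "quality": 0.95, "labor": 1.5},
]

TB = sum(op["time"] for op in ops)      # total creation time   -> minimize
Kp = sum(op["quality"] for op in ops)   # aggregate quality     -> maximize
Tp = sum(op["labor"] for op in ops)     # total labor effort    -> minimize

assert TB == 9.0 and Tp == 4.5
assert round(Kp, 2) == 2.65
```

Comparing candidate process designs then reduces to comparing these three aggregates.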
These requirements must be satisfied by the business process model. On the other hand, it is very difficult to build a universal model for all industries. Therefore, this paper discusses the construction of a model for a class of business processes in a specific area, namely for the LP (logistic process), which should ensure the creation of logistics business processes and an automation system for a given business process [4]:
• Creation of a model that provides descriptions and constructions of a wide class of business processes in the sector and of the automation system, i.e. KS → max, where KS is the number of business processes and automation systems.
• The list of implemented functions for each generated business process and system should be wide enough for the missions (KF → max), i.e. full functionality for each case of system creation, where KF is the number of implemented functions.
• The completeness level of each function should be sufficient for the mission (ZF → max) to complete the business process and system, where ZF is the level of completeness of each implemented function.
Define the purpose and function of each individual local model in this way. A business process is an object of the outside world, and any object of the external world is characterized by a conceptual representation, i.e. by the place that this object occupies in the "world of things" among other objects, by a set of distinctive properties, and by the nature of its communication with other objects of the external world. Therefore, the business process, as an object of the outside world, should be characterized by a concept, i.e. a conceptual representation [5].
And, as is well known, the conceptual features of an object (that is, a business
process) must be presented in the form of a special model - a conceptual model.
Note that the object is an element of a united information space (UIS), hence the
conceptual model of the business process is an element of the UIS.
It should be noted that the conceptual representation of an object depends on its purpose. The business process is intended for production and is a managed object. Therefore, the CM of a business process is characterized by its mission, its targets or purpose and criterion, and by its input and output (result) data.
• In addition, the composition of the input and output depends on the purpose for which we build the CM of the BP. It should be noted that the CM is built to solve the problem of integration. Therefore, the CM needs to ensure that it integrates the business process of logistics with the business processes of other organizations: at the top level, with the partners (suppliers and consumers of goods, machines and equipment) of logistic processes;
• at the lower level, with the business processes of other, for example neighboring, local problem areas.
Thus, our business process must be able to integrate with the business processes of other organizations. Accordingly, the inputs and outputs of the CM at the level of the logistics business process should be harmonized with the business processes of other organizations [6]. For integration, the following data are needed:
• the internal structure of the logistics sector and the local problem areas of the region, its composition and capacity;
• objects of labor, source and flow of goods, types of goods;
• means of labor, i.e. the means of transporting goods between warehouses and customers and within the warehouse;
• the available outsourcing operations, etc.
The field of logistics consists of two levels: the general problem area of logistics and the local problem areas that constitute it, each with its own environment, where the j-th local problem area provides information as:
(1) a list of specialized processes included in the created business process of the local problem area;
(2) the number and types of specialized processes, depending on the problem being solved and on the characteristics of the business process itself and its specialized processes;
(3) data on the specialized processes of a given business process in a local problem area;
(4) the metamodel (descriptions) of the integration of specialized processes within a business process for a specific purpose within the problem area;
(5) input and output data characterizing this business process as an element of the second-level united information space (UIS).
The model is designed to automate business processes. Therefore, we construct a
model for classes of business processes that are observable and controlled.
It is generally accepted that the business process model is represented in two ways: either "as it is" or "as it should be". In order to present the business process "as it should be", it is necessary to consider the current situations in production before the business process. The variant of joint performance of the operations of specialized processes may differ [7].
At time t, the production situation for a business process is determined as follows:
• SP(t) - production situation that arises before the execution of the business process,
• Z(t) - the goal or purpose of the business process,
• Jb(t) - setting at the current time,
• Sl(t) - the subject of labor at the current time,
• EP(t) - factors and objects of the external environment that have a direct impact on
the implementation of the business process,
• BP(t) - the state of the business process, characterized by the values of the business
process indicators.
To make strategic decisions, production situations are divided into classes of situations.
If for the current production situation the condition SP(t) ∈ SP1 is met, then the necessary list of specialized processes is selected (the necessary list of types of active specialized processes) to perform the task specified for this business process, together with the priorities for each of them. In the current production situation SP(t) ∈ SP2, the k-th variant of the specialized process is selected.
If SP(t) ∈ SP3, then for the selected (k-th) variant of a specialized process, the set of operations is determined, together with the meta-scheme for performing the sequence of operations, in which the allowed combinations of the sequence of operations (Oph, Opk) reflect the current situation. An admissible combination of operations is established on the basis of the semantics of a relation, which is determined from the ontological model. The expression (Oph, Opk) means that the sequence is a valid combination of operations, where Oph is the h-th class of operations and Opk is the k-th class of operations. This model plays the role of a scheduler that plans the execution of the business process for an upcoming order.
Consider the purpose and principles of action of the logical model (decision-making model) of the j-th separate specialized process from the stack of specialized processes of the business process of the local problem area, i.e. j = 1, ..., J.
The strategic model determines when and in what sequence processes are applied and executed. All of these methods form a process of organization and management. The purpose of the logical model is to define the sequence of business operations of each specialized process. Each business operation consists of two parts: an operator and a procedure. Therefore, the logical model contains two levels [8].
$$Pr_i = \langle Op_{i1}, Op_{i2}, Op_{i3}, Op_{i4}, \ldots, Op_{it}, Op_{i,t+1}, \ldots \rangle, \qquad (4.1)$$
$$Pr_j = \langle Op_{j1}, Op_{j2}, Op_{j3}, Op_{j4}, \ldots, Op_{jk}, Op_{j,k+1}, \ldots \rangle \qquad (4.2)$$
5 Conclusion
The authors of this work represent a business process as a formalized process in which all the types of resources, performers and owners of processes necessary to achieve the ultimate goal of the process are defined.
Each type of provision is achieved by separate processes, which are called specialized processes of the business process. Each specialized process is modeled by a separate model. To make the business process manageable, a control loop model is introduced, consisting of:
• a strategic model that ensures the adoption and implementation of strategic decisions on the order of execution of the specialized processes,
1128 D. Kulanda et al.
Acknowledgments. This work was supported by the Ministry of Education and Science of the Republic of Kazakhstan (Grant No. 0118PК01084).
References
1. Van der Aalst, W.M.P.: On the automatic generation of workflow processes based on product structures. Comput. Ind. 39(2), 97–111 (1999)
2. Völkner, P., Werners, B.: A decision support system for business process planning. Eur. J. Oper. Res. 125(3), 633–647 (2000)
3. Zhang, Y., Feng, S.C., Wang, X., Tian, W., Wu, R.: Object-oriented manufacturing resource modelling for adaptive process planning. Int. J. Prod. Res. 37(18), 4179–4195 (1999)
4. Zhang, F., Zhang, Y.F., Nee, A.Y.C.: Using genetic algorithms in process planning for job shop machining. IEEE Trans. Evol. Comput. 1(4), 278–289 (1997)
5. Duisebekova, K., Serbin, V., Ukubasova, G., Kebekpayeva, Z., Skakova, A., Rakhmetulayeva, S., Shaikhanova, A., Duisebekov, T., Kozhamzharova, D.: Design and development of automation system of business processes in educational activity. J. Eng. Appl. Sci. 8, 4702–4714 (2017). ISSN: 1816-949X (Medwell Journals)
6. Dabbas, R.M., Chen, H.-N.: Mining semiconductor manufacturing data for productivity improvement – an integrated relational database approach. Comput. Ind. 45(1), 29–44 (2001)
7. Musa, M.A., Oman, M.S., Al-Rahimi, W.M.: Ontology driven knowledge map for enhancing business process reengineering. J. Comput. Sci. Eng. 3(6), 11 (2013) (Academy & Industry Research Collaboration Center (AIRCC))
8. Rao, L., Mansingh, G., Osei-Bryson, K.-M.: Building ontology based knowledge maps to assist business process re-engineering. J. Decis. Support Syst. 52(3), 577–589 (2012)
The Traveling Salesman Drone Station
Location Problem
1 Introduction
Drones are on the verge of becoming a proven commercial technology for civil
applications in many public and private sectors. In particular, drones have
already been successfully applied for surveillance or monitoring tasks in agri-
culture, energy, or infrastructure, and for the delivery of packages (see [9] and
references therein).
Murray and Chu introduced two novel NP-hard problems where a truck is
assisted by a drone in last-mile delivery [7]. The first one is called the Flying Side-
kick Traveling Salesman Problem (FSTSP). In this case, if the depot is remotely
located from the demand centers, it can be beneficial to have the drone working
in close collaboration with the truck. To this end, they assume that the drone
is taken along by the truck and might be launched at some locations to initi-
ate an autonomous delivery. After fulfilling the requested order, the drone must
return to the truck in order to be resupplied with a new package. During recent
years, a fast-growing number of research papers have been published, which fol-
low the general concept of the FSTSP such that a high degree of synchronization
between trucks and drones is required (see, e.g., [1,10–13,15]).
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 1129–1138, 2020.
https://doi.org/10.1007/978-3-030-21803-4_111
1130 D. Schermer et al.
2 Problem Definition
In this section, we introduce the TSDSLP, which covers a more general case than the PDSTSP and TSP-DS [4,7]. More precisely, the TSDSLP not only integrates the possibility of drone deliveries into TSP tours, but also assumes the presence of multiple drone stations, which might be opened to be used for drone deliveries.
Suppose that a single truck located at a depot, a multitude of drone sta-
tions that can accommodate a fixed number of drones, and a set of customer
locations, each of them with a single demand, are given. In the TSDSLP, the
objective consists in minimizing the makespan (or operational cost) such that all
customers are served, either by the truck or by a drone. Furthermore, we make the following assumptions regarding the nature of drones [1,7,10–13,15]:
– When launched from a station, each drone can fulfill exactly one request and
then it needs to return to the same station from which it was launched to
be resupplied for future missions. Without loss of generality, we assume that
each customer is eligible to be served by a drone.
– We assume that a drone has a limited endurance of E distance units per
operation. After returning to the station, the battery of the drone is recharged
(or swapped) instantaneously.
– While the truck is subject to the limitations of the road network, the drones
might be able to use a different trajectory for traveling between locations.
Furthermore, based on their technical constraints, the speed of the truck and
drone might differ. Hence, without loss of generality, the average velocity of
each truck is assumed to be equal to 1 and the relative velocity of each drone
is assumed to be α ∈ R+ times the velocity of the truck.
– We do not consider explicit service times; however, such considerations might
be easily integrated into the model.
– We assume that at most C ∈ Z+ drone stations can be opened and they
are free of charge, except if we minimize the operational cost (see Sect. 2.2).
Furthermore, we require that the potential sites of these stations have been
determined already.
Let us present the notation that we are going to use throughout the paper.
Assume that a complete and symmetric graph G = (V, E) is given, where V is the
set of vertices and E is the set of edges. The set V contains n vertices associated
with the customers, named VN = {1, . . . , n}, a set of m possible drone stations
VS = {s1 , . . . , sm }, and two extra vertices 0 and n + 1 that (in order to simplify
the notation and the mathematical formulation) both represent the same depot
location at the start and end of the truck’s tour. Thus, V = {0}∪VN ∪VS ∪{n+1},
where 0 ≡ n+1. To simplify the notation, we introduce the sets VL = V \{n+1}
and VR = V \ {0}.
By the parameters $d_{ij}$ and $d'_{ij}$ we denote the distance required to travel from vertex i to vertex j by truck and by drone, respectively. In principle, as the drones are not limited to the road network, these distances might differ. For the purpose of this work, the Euclidean distance is considered for both the truck and the drones. We use the parameters $v = 1$ and $v' = \alpha \cdot v$ to define the (constant) velocities of the truck and the drones. Hence, the time required to traverse edge (i, j) is defined as $t_{ij} = d_{ij}$ and $t'_{ij} = d'_{ij}/\alpha$ for the truck and the drones, respectively.
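The travel-time definitions above can be sketched in a few lines (a minimal illustration; the function name and point representation are our own, not from the paper):

```python
import math

def travel_times(p, q, alpha):
    """Return (t_ij, t'_ij): truck and drone travel times between points p and q.

    Both vehicles use the Euclidean distance here, as in the paper's
    experiments; the truck velocity is normalized to v = 1 and the drone
    flies at v' = alpha * v, so t_ij = d_ij and t'_ij = d'_ij / alpha.
    """
    d = math.dist(p, q)  # Euclidean distance between the two points
    return d, d / alpha
```

For example, with α = 2 the drone needs half the truck's travel time between the same pair of points.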
We assume that each drone station accommodates the same limited number of drones, indexed by the set $D := \{1, \dots, D_n\}$, where $D_n \in \mathbb{Z}_{>0}$. Each drone may travel
a maximum span of E distance units per operation, where a drone operation is
characterized by a triple (d, s, j) as follows: the drone d ∈ D is launched from
a drone station s ∈ VS , fulfills a request at j ∈ VN , and returns to the same
station from which it was launched.
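Under this definition of an operation, checking whether (d, s, j) respects the endurance limit E reduces to a single comparison (a hypothetical helper of our own; with symmetric Euclidean distances the out-and-back flight covers 2 · d'_sj distance units):

```python
import math

def operation_within_endurance(station, customer, E):
    """A drone operation launches at `station`, serves `customer`, and
    returns to the same station; the round-trip flight distance
    2 * d'_sj must not exceed the endurance E."""
    return 2 * math.dist(station, customer) <= E
```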
Figure 1 shows an example of a TSDSLP instance and potential TSP and
TSDSLP solutions.
Fig. 1. A TSDSLP with a depot D, four customers VN = {1, ..., 4}, two drone stations VS = {s1, s2} that can accommodate two drones each, a TSP solution (middle figure), and a TSDSLP solution (right figure) in which a station is utilized for two deliveries
Using this notation, we have the following MILP formulation of the TSDSLP:
$$\min \; \tau, \tag{1}$$

subject to

$$\sum_{i \in V_L} \sum_{\substack{j \in V_R \\ j \neq i}} t_{ij}\, x_{ij} \;\le\; \tau, \tag{2}$$

$$\sum_{i \in V_L} \sum_{\substack{j \in V_R \\ j \neq i}} t_{ij}\, x^s_{ij} \;+\; \sum_{j \in V_N} 2\, t'_{sj}\, y^d_{sj} \;\le\; \tau \quad \forall s \in V_S,\ d \in D, \tag{3}$$

$$\sum_{\substack{i \in V_L \\ i \neq j}} x_{ij} \;+\; \sum_{s \in V_S} \sum_{d \in D} y^d_{sj} \;=\; 1 \quad \forall j \in V_N, \tag{4}$$

$$\sum_{j \in V_R} x_{0j} \;=\; \sum_{i \in V_L} x_{i,n+1} \;=\; 1, \tag{5}$$

$$\sum_{\substack{i \in V_L \\ i \neq k}} x_{ik} \;-\; \sum_{\substack{j \in V_R \\ j \neq k}} x_{kj} \;=\; 0 \quad \forall k \in V_N \cup V_S, \tag{6}$$

$$\sum_{i \in S} \sum_{\substack{j \in S \\ j \neq i}} x_{ij} \;\le\; |S| - 1 \quad \forall S \subset V,\ 0, n+1 \notin S,\ |S| > 1, \tag{7}$$
In this model, the objective function (1) minimizes the makespan τ . Con-
straints (2) and (3) describe τ mathematically. More precisely, constraint (2)
sets the time spent traveling by the truck (to serve customers and supply stations) as a lower bound on the objective value. In constraints (3), for each station
s, we account for the time until the truck has reached the station and then, for
each drone d located at the station, the time spent fulfilling requests. These
values are summed up to define lower bounds on τ .
Constraints (4) guarantee that each request j is served exactly once by either
the truck or a drone. The flow of the truck is defined through constraints (5)–(6).
More precisely, constraints (5) ensure that the truck starts and concludes its tour
exactly once. For each customer or drone station, constraints (6) guarantee that
the flow is preserved, i.e., the number of incoming arcs must equal the number
of outgoing arcs. Moreover, constraints (7) serve as classical subtour elimination
constraints, i.e., for each proper non-empty subset of vertices S (that does not
contain the depot), no more than |S| − 1 arcs can be selected within this set.
Constraints (8)–(11) specify the route of the truck that leads to each drone
station s. To this end, constraints (8) ensure that this path must follow the path
of the truck. Moreover, constraints (9) guarantee that the departure from the
depot is always a part of each route to a station. Furthermore, for each vertex
k that might be located in between the depot and the station, constraints (10)
preserve the flow. In addition, for each station s that is visited by the truck,
constraints (11) guarantee that there is exactly one arc leading to the station.
As specified by constraints (12), a drone station is opened only if it is visited
by the truck. Moreover, constraint (13) guarantees that at most C drone stations
may be opened. Constraints (14) restrict drone operations so that they can be performed only at opened drone stations. Constraints (15) determine
the drone stations’ range of operation. Note that these constraints might be
effectively handled during preprocessing. Finally, according to the definition of
the decision variables, τ ∈ R≥0 and the other decision variables are binary.
In place of constraints (7), it is possible to adapt the family of Miller-Tucker-
Zemlin (MTZ) constraints, using auxiliary integer variables ui , as follows [5]:
$$u_0 = 1, \tag{16}$$
$$2 \le u_i \le n + m + 2 \quad \forall i \in V_R, \tag{17}$$
$$u_i - u_j + 1 \le (n + m + 2)(1 - x_{ij}) \quad \forall i \in V_L,\ j \in V_R,\ i \neq j, \tag{18}$$
$$u_i \in \mathbb{Z}_+ \quad \forall i \in V. \tag{19}$$
When the operational cost is minimized instead of the makespan, the objective becomes

$$\min\; c_t \sum_{i \in V_L} \sum_{\substack{j \in V_R \\ j \neq i}} d_{ij}\, x_{ij} \;+\; c_d \sum_{s \in V_S} \sum_{d \in D} \sum_{j \in V_N} (d'_{sj} + d'_{js})\, y^d_{sj} \;+\; \sum_{s \in V_S} f_s z_s \tag{20}$$
where $f_s$ is the fixed cost of opening and using station s, and the parameters $c_t, c_d \in \mathbb{R}_+$ determine the relative cost for each mile that the truck and the drones are in operation. In this case, the model can remain unchanged, with the exception that it is not necessary to consider the variables $\tau$ and $x^s_{ij}$, nor the respective constraints associated with these variables.
3 Computational Experiments
We implemented the model (1)–(15) and solved it by the MILP solver Gurobi
Optimizer 8.1.0. Throughout the solution process, the subtour elimination con-
straints (7) were treated as lazy constraints. More precisely, whenever the solver
determines a new candidate incumbent integer-feasible solution, we examine if
it contains any subtour. If no subtour is contained, we have a feasible solution.
Otherwise, we calculate the set of vertices S that is implied by the shortest sub-
tour contained in the current candidate solution. For this set S, constraint (7) is
added as a cutting plane and the solver continues with its inbuilt branch-and-cut
procedure. For comparative purposes, we also solved the alternative formulation
of the TSDSLP in which the MTZ constraints (16)–(19), in place of (7), are
used. We carried out all experiments on single compute nodes in an Intel Xeon
E5-2670 CPU cluster where each node was allocated 8 GB of RAM. A time limit
of 10 min was imposed on the solver.
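The subtour check used with the lazy constraints can be sketched as follows (our own sketch, not the authors' code; it assumes that in the candidate solution every visited vertex has exactly one selected outgoing and one selected incoming arc, so the arcs decompose into disjoint cycles):

```python
def shortest_subtour(succ):
    """Decompose a successor map {i: j, meaning x_ij = 1} into cycles
    and return the vertex list of the shortest cycle.

    If more than one cycle exists, the shortest one yields the set S
    for which constraint (7) is added as a cutting plane.
    """
    seen, cycles = set(), []
    for start in succ:
        if start in seen:
            continue
        cycle, v = [], start
        while v not in seen:  # follow successors until the cycle closes
            seen.add(v)
            cycle.append(v)
            v = succ[v]
        cycles.append(cycle)
    return min(cycles, key=len)
```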
We generated the test instances according to the scheme shown in Fig. 2.
More precisely, we considered a 32 × 32 km² square grid where the customer locations VN = {1, . . . , n}, n ∈ {10, 30, 50}, were generated under a uniform distribution. Furthermore, we assumed that the drone stations are located at the
coordinates (x, y) ∈ (8, {8, 24}) ∪ (24, {8, 24}). Moreover, we considered a cen-
tral depot at (x, y) = (16, 16). We investigated different cases; more precisely,
the basic one follows the assumption of Murray and Chu [7], where drones have
a maximum range of operation of E = 16 km. Therefore, the radius of opera-
tion associated with each station is Er = E/2 = 8 km. In order to broaden our
experiments, we tested the model for two other values of Er .
In order to study the influence of problem parameters on the solver and the
solutions, we considered their domains as follows. We tested three different possible values for C, i.e., C ∈ {1, 2, 3}, and also carried out experiments for three distinct numbers of identical drones that a drone station can hold, i.e., |D| ∈
{1, 2, 3}. Moreover, we let the relative velocity α be one of the values from
{0.5, 1, 2, 3} and we assumed that the operational radius Er ∈ {8, 12, 16}.
For each value of n ∈ {10, 30, 50}, we generated 10 random instances, which
(along with the drone stations and the location of the depot) specify the graph G
(refer to [14] for the instances). Furthermore, based on our choice of parameters
C, |D|, α and E, we have 3 · 3 · 4 · 3 = 108 parameter vectors. Therefore, we have
a total of 30 · 108 = 3240 problems that are solved through both formulations.
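The experimental design above amounts to a simple Cartesian product over the parameter domains (an illustrative sketch; variable names are ours):

```python
from itertools import product

C_values = (1, 2, 3)            # permitted number of stations C
D_values = (1, 2, 3)            # drones per station |D|
alpha_values = (0.5, 1, 2, 3)   # relative drone velocity alpha
Er_values = (8, 12, 16)         # operational radius Er in km

# 3 * 3 * 4 * 3 = 108 parameter vectors
parameter_vectors = list(product(C_values, D_values, alpha_values, Er_values))

# 10 random instances for each n in {10, 30, 50} gives 30 * 108 problems
instances_per_n, n_values = 10, (10, 30, 50)
total_problems = len(parameter_vectors) * instances_per_n * len(n_values)
```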
Table 1 contains the numerical results of our computational experiments
using two different formulations of the TSDSLP. In particular, for each num-
ber of customers n, the permitted number of stations C, and the operational
radius Er , this table shows the average run-time t (in seconds) as well as the
average MIP gap. Comparing the results of the two formulations, we observe
that the differences on instances with n = 10 are negligible; however, on larger
instances, the prohibition of subtours through lazy constraints improves the aver-
age run-times and MIP gaps significantly. Although medium-sized instances can
be solved within reasonable time, the run-time depends strongly on n and the
parameters.
Table 1. Influence of the instance size n, the number of permitted stations C, the
radius of operation Er , and the formulation on the run-time (s.) as well as MIP gap
For the purpose of illustrating the benefits of utilizing the drone stations with regard to makespan reduction, we introduce the following metric, where $\tau$ is the objective value returned by the solver and $\tau^*_{TSP}$ is the optimal objective value of the TSP (that does not visit or use any drone station):

$$\Delta = \left(1 - \frac{\tau}{\tau^*_{TSP}}\right) \cdot 100\% \tag{21}$$
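Equation (21) translates directly into code (a trivial sketch with a function name of our own):

```python
def savings_percent(tau, tau_tsp_opt):
    """Delta: the relative makespan reduction (in percent) achieved by
    opening drone stations, compared to the optimal pure TSP tour."""
    return 100.0 * (1.0 - tau / tau_tsp_opt)
```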
Figure 3 highlights the numerical results. More precisely, this figure shows
the average savings over the TSP, i.e., Δ, based on the number of permitted
stations C, the number of drones located at each station |D|, as well as the
drones’ relative velocity α and radius of operation Er . Overall, we can distinguish
two cases. If the radius of operation is small (Er = 8, solid lines) and the number
of permitted stations C is fixed, the savings are nearly independent of the number of drones at each station and their velocity and radius of operation. In
this case, the number of customers that can be served by the drones is limited
(see Fig. 2). However, even a slow-moving drone can effectively serve most (or all)
customers within its radius of operation. An increase in the number of drones (or
their relative velocity) will in this case not improve the overall makespan. On the
other hand, if the radius of operation is large (Er = 16, dashed lines), there is a
significant impact of these parameters on the savings. In this case, the makespan
can be reduced effectively by increasing the number of drones (or their relative
velocity). Furthermore, it is worth highlighting that, in many cases, significant savings are already possible with few drones (per station) that have a relative velocity of α ∈ {0.5, 1} but a large operational range. This contrasts with problems that follow the fundamental idea of the FSTSP, where drones with relatively small endurance but fast relative velocity are often preferred [1,10–13].
Fig. 3. The savings Δ for different values of the problem parameters (averaged over all instances): one panel per number of permitted stations C ∈ {1, 2, 3}, with the relative velocity α on the horizontal axis, the savings Δ [%] on the vertical axis, and one curve per |D| ∈ {1, 2, 3}. Solid and dashed lines correspond to Er = 8 and Er = 16, respectively
4 Conclusion
In this work, we introduced the Traveling Salesman Drone Station Location Problem (TSDSLP), which combines the Traveling Salesman Problem with a Facility Location Problem in which the facilities are drone stations. After formulating the problem as a MILP, we presented the results of our computational experiments. According to the numerical results, using suitable drone stations can bring a significant reduction in the delivery time.
Since the TSDSLP defines a new concept, the future research directions are numerous; e.g., one research idea might consist in studying the case of using multiple trucks in place of a single one. Another research direction might focus on the design of efficient solution methods. In fact, standard solvers are able to solve
only small TSDSLP instances; hence, one might design effective heuristics which can address large-scale instances. Research in this direction is in progress, and the results will be reported in the future.
References
1. Agatz, N., Bouman, P., Schmidt, M.: Optimization approaches for the traveling
salesman problem with drone. Transp. Sci. 52(4), 965–981 (2018)
2. Chauhan, D., Unnikrishnan, A., Figliozzi, M.: Maximum coverage capacitated facil-
ity location problem with range constrained drones. Transp. Res. Part C: Emerg.
Technol. 1–18 (2019)
3. Dorling, K., Heinrichs, J., Messier, G.G., Magierowski, S.: Vehicle routing problems
for drone delivery. IEEE Trans. Syst. Man Cybern. Syst. 47(1), 70–85 (2017)
4. Kim, S., Moon, I.: Traveling salesman problem with a drone station. IEEE Trans.
Syst. Man Cybern. Syst. 49(1), 42–52 (2018)
5. Miller, C.E., Tucker, A.W., Zemlin, R.A.: Integer programming formulation of
traveling salesman problems. J. ACM 7(4), 326–329 (1960)
6. Min, H., Jayaraman, V., Srivastava, R.: Combined location-routing problems: a
synthesis and future research directions. Eur. J. Oper. Res. 108(1), 1–15 (1998)
7. Murray, C.C., Chu, A.G.: The flying sidekick traveling salesman problem: opti-
mization of drone-assisted parcel delivery. Transp. Res. Part C: Emerg. Technol.
54, 86–109 (2015)
8. Nagy, G., Salhi, S.: Location-routing: issues, models and methods. Eur. J. Oper.
Res. 177(2), 649–672 (2006)
9. Otto, A., Agatz, N., Campbell, J., Golden, B., Pesch, E.: Optimization approaches
for civil applications of unmanned aerial vehicles (UAVs) or aerial drones: a survey.
Networks 72(4), 411–458 (2018)
10. Schermer, D., Moeini, M., Wendt, O.: Algorithms for solving the vehicle routing
problem with drones. In: LNCS, vol. 10751, pp. 352–361 (2018)
11. Schermer, D., Moeini, M., Wendt, O.: A variable neighborhood search algorithm
for solving the vehicle routing problem with drones (Technical report), pp. 1–33.
BISOR, Technische Universität Kaiserslautern (2018)
12. Schermer, D., Moeini, M., Wendt, O.: A hybrid VNS/Tabu search algorithm for
solving the vehicle routing problem with drones and en route operations. Comput.
Oper. Res. 109, 134–158 (2019). https://doi.org/10.1016/j.cor.2019.04.021
13. Schermer, D., Moeini, M., Wendt, O.: A matheuristic for the vehicle routing prob-
lem with drones and its variants (Technical report), pp. 1–37. BISOR, Technische
Universität Kaiserslautern (2019)
14. Schermer, D., Moeini, M., Wendt, O.: Instances for the traveling salesman
drone station location problem (TSDSLP) (2019). https://doi.org/10.5281/zenodo.
2594795
15. Wang, X., Poikonen, S., Golden, B.: The vehicle routing problem with drones:
several worst-case results. Optim. Lett. 11(4), 679–697 (2016)
Two-Machine Flow Shop with a Dynamic
Storage Space and UET Operations
1 Introduction
This paper presents a proof of the NP-hardness in the strong sense and a
polynomial-time approximation scheme (PTAS) for the two-machine flow shop,
where the duration of each operation is one unit of time and where, in order
to be processed, each job requires a certain amount of an additional resource,
which will be referred to as a storage space (buffer). The storage requirement
varies from job to job, and the availability of the storage space (buffer capacity)
varies in time. The goal is to minimise the time needed to complete all jobs.
The presented computational complexity results are complemented by several
heuristics which are compared by means of computational experiments.
The considered problem arises in star data gathering networks where datasets
from the worker nodes are to be transferred to the base station for processing.
© Springer Nature Switzerland AG 2020
H. A. Le Thi et al. (Eds.): WCGO 2019, AISC 991, pp. 1139–1148, 2020.
https://doi.org/10.1007/978-3-030-21803-4_112
1140 J. Berlińska et al.
Data transfer can commence only if the available memory of the base station is
not less than the size of the dataset that is to be transferred. The amount of
memory, occupied by a dataset, varies from worker node to worker node. Only
one node can transfer data to the base station at a time, although during this
process the base station can process one of the previously transferred datasets.
The memory, consumed by this dataset, is released only at the completion of pro-
cessing the dataset by the base station. The base station has a limited memory
whose availability varies in time due to other processes.
The existing publications on scheduling in the data gathering networks
assume that the exact time, needed for transferring each dataset, and the exact
time, required by the base station for its processing after this transfer, are known
in advance (see, for example, [1–3,10]). In reality, the exact duration of trans-
ferring a dataset and the duration of its processing by the base station may be
difficult to estimate, and only an upper bound may be known. In such a situa-
tion, the allocation of dataset independent time slots for transferring data and
for processing a dataset by the base station, may be a more adequate option.
This approach is analysed in this paper. The paper also relaxes the assumption,
which is normally made in the literature (see, for example, [3]), that, during the
planning horizon, the available amount of memory remains the same.
Another area that is relevant to this paper is transportation and manufactur-
ing systems where two consecutive operations use the same storage space that is
allocated to a job at the beginning of its first operation and is released only at
the completion of this job. For example, in supply chains, where goods are trans-
ported in containers or pallets with the consecutive use of two different types
of vehicles, the unloading of one vehicle and the loading onto another normally
require certain storage space. Although the storage requirements of different con-
tainers or pallets can vary significantly, the durations of loading and unloading
by a crane practically remain the same regardless of their sizes.
The considered scheduling problem can be stated as follows. The jobs, com-
prising the set N = {1, ..., n}, are to be processed on two machines, machine
M1 and machine M2 . Each job should be processed during one unit of time on
machine M1 (the first operation of the job), and then during one unit of time
on machine M2 (the second operation of the job). Each machine can process at
most one job at a time, and each job can be processed on at most one machine
at a time. If a machine starts processing a job, it continues its processing until
the completion of the corresponding operation, i.e. no preemptions are allowed.
A schedule σ specifies for each j ∈ N the point in time Sj (σ) when job j starts
processing on machine M1 and the point in time Cj (σ) when job j completes
processing on machine M2 . In order to be processed each job j requires ωj
units of an additional resource. These ωj units are seized by job j during the
time interval [Sj (σ), Cj (σ)). At any point in time t the available amount of the
resource is specified by the function Ω(t), i.e. any schedule σ, at any point in
time t, should satisfy the condition
$$\sum_{\{j:\ S_j(\sigma) \le t < C_j(\sigma)\}} \omega_j \;\le\; \Omega(t).$$
In what follows, if it is clear what schedule is considered, the notation Sj (σ) and
Cj (σ) will be replaced by Sj and Cj .
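The storage condition above can be verified mechanically for a given schedule. In the UET schedules considered later in the paper, $C_j = S_j + 2$, so each job j occupies $\omega_j$ buffer units during $[S_j, S_j + 2)$ (a sketch under that assumption; names are ours):

```python
def respects_storage(starts, omega, Omega):
    """Check sum of omega_j over {j: S_j <= t < C_j} <= Omega(t) for every
    integer t, where C_j = S_j + 2 (unit-time operations, no waiting
    between the two operations of a job)."""
    horizon = max(starts.values()) + 2
    for t in range(horizon):
        usage = sum(omega[j] for j, s in starts.items() if s <= t < s + 2)
        if usage > Omega(t):
            return False
    return True
```

Note that this checks only the storage constraint; machine capacity is enforced separately by requiring distinct start times.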
A schedule is a permutation schedule if the order in which the jobs are pro-
cessed on machine M1 , and the order in which the jobs are processed on machine
M2 , are the same. In the case of arbitrary processing times, an optimal schedule
that is also a permutation schedule may not exist even if the availability of stor-
age space does not change [7]. Furthermore, in this case, the problem remains
NP-hard in the strong sense even if the order in which the jobs are processed on one of the machines is given [9]. In contrast to the case of arbitrary pro-
cessing times, for the problem with the unit execution time (UET) operations,
considered in this paper, there always exists an optimal schedule, where each
job starts processing on machine M2 at the moment when it completes its pro-
cessing on machine M1 . So, one can assume that the resource is allocated to
operations rather than to jobs. This makes this paper relevant to the publica-
tions on resource constrained scheduling with UET operations [4,5,11,12], which
consider a flow shop with an additional resource and assume that the resource
is allocated to operations. This paper contributes to the body of literature on
resource constrained scheduling by presenting the results for the case when the
availability of the resource changes in time.
The rest of the paper is organised as follows. Section 2 presents a proof that
the problem is strongly NP-hard. Section 3 describes a polynomial-time approxi-
mation scheme. An integer linear program approach can be found in Sect. 4. The
proposed heuristic algorithms are described in Sect. 5, and the results of their
comparison by computational experiments are given in Sect. 6. The last section
comprises conclusions.
QUESTION: do there exist permutations $(i_1, ..., i_r)$ and $(j_1, ..., j_r)$ of the indices 1, ..., r such that $z_k = x_{i_k} + y_{j_k}$ for all $k \in \{1, ..., r\}$?
Theorem 1. The considered scheduling problem is NP-hard in the strong sense.
Proof. Let
$$x = \max_{1 \le i \le r} x_i \quad \text{and} \quad Z = x + \max_{1 \le i \le r} y_i + \max_{1 \le i \le r} z_i.$$
It will be shown that the jobs {1, ..., 2r} with the storage requirements
$$\omega_i = \begin{cases} 2Z + x_i & \text{if } 1 \le i \le r; \\ Z + y_{i-r} + x & \text{if } r + 1 \le i \le 2r \end{cases} \tag{3}$$
can be completed in the time interval [0, 3r] if and only if the corresponding
instance of the Numerical Matching with Target Sums has answer YES.
Suppose that there exists a schedule whose makespan is less than or equal
to 3r. Observe that for any point in time t such that 0 ≤ t ≤ 3r and any job j,
ωj ≤ Ω(t), but if at this point in time Ω(t) = 2Z + x, then at most one job can
use the buffer at t.
Since, after the completion of a job, the next job cannot be completed earlier
than after one unit of time and since all time intervals where Ω(t) = 3Z + x + zk ,
i.e. when several jobs can use the storage space simultaneously, are disjoint
unit time intervals, at any point in time at most two jobs can use the buffer
simultaneously.
The total processing time of all jobs is 4r, whereas the total length of all
time intervals before the point in time 3r where Ω(t) = 2Z + x and therefore
where at most one job can be processed is 2r. Hence, taking into account that
the makespan is 3r, the remaining 2r units of processing time are allocated to
the disjoint unit time intervals of total length r. In other words, in each of these
unit time intervals, the machines process concurrently two jobs and each of these
two jobs is processed during the entire unit time interval.
On the other hand, if the same job is processed in two disjoint time intervals $[3k - 2, 3k - 1]$ and $[3k' - 2, 3k' - 1]$, where $1 \le k < k' \le r$, then no jobs are processed in the time interval $(3k - 1, 3k' - 2)$, where the buffer capacity is 2Z + x. This contradicts the assumption that the makespan does not exceed 3r,
because only 2r units of processing time can be allocated to the disjoint time
intervals where jobs can be processed concurrently, and the remaining 2r units
must be allocated to the time intervals where the buffer capacity is 2Z + x.
If two jobs that are processed concurrently are both from the set {r+1, ..., 2r},
then it leaves only r − 2 jobs from {r + 1, ..., 2r} for processing in the remaining
r − 1 disjoint unit time intervals. Hence, in this case, in at least one such unit
time interval, two jobs from the set {1, ..., r} are processed concurrently, which
violates the buffer capacity. Consequently, for each 1 ≤ k ≤ r, in the time interval
$[3k - 2, 3k - 1]$, some job $i_k \in \{1, ..., r\}$ is processed concurrently with some job $j_k \in \{r + 1, ..., 2r\}$, and for any two different $k$ and $k'$ from $\{1, ..., r\}$ we have $i_k \ne i_{k'}$ and $j_k \ne j_{k'}$.
For each $1 \le k \le r$, since $i_k$ and $j_k$ are processed concurrently, they satisfy the inequality
$$\omega_{i_k} + \omega_{j_k} \le 3Z + x + z_k,$$
which, by virtue of (3), implies $x_{i_k} + y_{j_k} \le z_k$, which, in turn, by virtue of (1), gives $x_{i_k} + y_{j_k} = z_k$.
Suppose now that the instance of the Numerical Matching with Target Sums
has answer YES, i.e. there exist permutations $(i_1, ..., i_r)$ and $(j_1, ..., j_r)$ of the indices 1, ..., r such that $z_k = x_{i_k} + y_{j_k}$ for all $k \in \{1, ..., r\}$. Then, the schedule where, for each $1 \le k \le r$, $S_{i_k} = 3(k - 1)$ and $S_{j_k + r} = 3k - 2$, has the required makespan of 3r.
Then, the values of F for all (q + 1)-tuples that are not boundary are computed using the following recursive equation:
$$F(n_1, ..., n_i + 1, ..., n_q, i) = \min_{\{e:\ n_e > 0\}} \big[ F(n_1, ..., n_q, e) + W_{i,e}\big(F(n_1, ..., n_q, e) - 1\big) \big].$$
4 ILP Formulation
In this section, we formulate our problem as an integer linear program. Since we
assumed that ωj ≤ Ω(t) for all j and t, the schedule length never exceeds 2n.
Let T ≤ 2n be an upper bound on the optimum schedule length Cmax . As the
available buffer size may change only at integer moments, for any nonnegative
integer t the buffer size equals Ω(t) in the whole interval [t, t + 1). For each
j = 1, . . . , n and t = 0, . . . , T − 1 we define binary variables xj,t such that
xj,t = 1 if job j starts at time t, and xj,t = 0 in the opposite case. The minimum
schedule length can be found by solving the following integer linear program.
$$\sum_{t=0}^{T-1} x_{j,t} = 1 \quad \text{for } j = 1, \dots, n \tag{8}$$
$$\sum_{t=0}^{T-1} t\, x_{j,t} + 2 \;\le\; C_{\max} \quad \text{for } j = 1, \dots, n \tag{9}$$
$$x_{j,t} \in \{0, 1\} \quad \text{for } j = 1, \dots, n,\ t = 0, \dots, T - 1 \tag{10}$$
Constraints (6) guarantee that the jobs executed in interval [t, t + 1), where
1 ≤ t ≤ T − 1, fit in the buffer. Note that for t = 0 we have only one job running
in interval [t, t + 1), and hence, the buffer limit is also observed. At most one job
starts at time t by (7), and each job starts exactly once by (8). Inequalities (9)
ensure that all jobs are completed by time Cmax .
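For very small instances, the semantics of the program can be cross-checked against an exhaustive search over start times (an illustrative sketch of our own, not the ILP itself; drawing distinct start times from a permutation enforces that at most one job starts per time unit):

```python
from itertools import permutations

def optimal_makespan_bruteforce(omega, Omega, T):
    """Minimum C_max over all assignments of distinct integer start times
    in {0, ..., T-1}, where job j occupies omega[j] buffer units during
    [S_j, S_j + 2). Exponential; only for validating tiny instances."""
    n, best = len(omega), None
    for starts in permutations(range(T), n):
        feasible = all(
            sum(omega[j] for j in range(n) if starts[j] <= t < starts[j] + 2)
            <= Omega(t)
            for t in range(T)
        )
        if feasible:
            cmax = max(s + 2 for s in starts)
            if best is None or cmax < best:
                best = cmax
    return best
```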
5 Heuristic Algorithms
Although the ILP formulation proposed in the previous section delivers opti-
mum solutions for our problem, it is impractical for larger instances because of
its high computational complexity. Therefore, in this section we propose heuris-
tic algorithms. Each of them constructs a sequence in which the jobs should be
processed. The jobs are started without unnecessary delay, as soon as the previous job completes on the first machine and a sufficient amount of buffer space is available.
Algorithm LF constructs the job sequence using a greedy largest fit rule. In every time unit, we start on the first machine the largest job that fits in the currently available buffer. If no such job can be found, the procedure continues after one time unit, when the buffer is released. This algorithm can be implemented to run in O(n log n) time, using a self-balancing binary search tree. However, if n is not very large, a simple O(n²) implementation that does not use advanced data structures may be more practical.
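The greedy rule can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name and data layout are ours, and we assume a job started at time t holds its buffer space during [t, t + 2), so the job picked at time t must fit beside the job started at t − 1.

```cpp
#include <algorithm>
#include <set>
#include <utility>
#include <vector>

// Sketch of algorithm LF (largest fit). w[j] is the buffer requirement of
// job j; Omega[t] is the buffer size in [t, t+1) (the last entry is reused
// beyond the given horizon).
std::vector<int> largestFitSequence(const std::vector<int>& w,
                                    const std::vector<int>& Omega) {
    // Multiset keyed by (size, id): a self-balancing BST, giving the
    // O(n log n) implementation mentioned in the text.
    std::multiset<std::pair<int, int>> pending;
    for (int j = 0; j < (int)w.size(); ++j) pending.insert({w[j], j});

    std::vector<int> order;
    int t = 0;
    int prev = 0;  // buffer still held by the job started at time t - 1
    while (!pending.empty()) {
        int cap = Omega[std::min<std::size_t>(t, Omega.size() - 1)] - prev;
        // Largest pending job with requirement <= cap.
        auto it = pending.upper_bound({cap, (int)w.size()});
        if (it == pending.begin()) {  // nothing fits: wait one time unit,
            prev = 0;                 // after which the previous job is done
            ++t;
            continue;
        }
        --it;
        order.push_back(it->second);
        prev = it->first;
        pending.erase(it);
        ++t;
    }
    return order;
}
```

For jobs of sizes 3, 5 and 2 with a constant buffer of 10, LF starts the size-5 job first, then the size-3 job (5 + 3 ≤ 10), then the size-2 job.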
Local search with neighborhoods based on job swaps proved to be a very
good method for obtaining high quality solutions for flow shop scheduling with
constant buffer space and non-unit operation execution times [3]. Therefore, we
also analyse algorithm LFLocal that starts with a schedule generated by LF,
and then applies the following local search procedure. For each pair of jobs, we check whether swapping their positions in the current sequence reduces the schedule length. The swap that results in the shortest schedule is executed, and the search continues until no further improvement is possible.
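The swap-based search can be sketched as below, paired with a simple evaluator of the dispatch rule from the beginning of this section. Both the buffer model (a job started at time t holds its space during [t, t + 2)) and all names are our assumptions, not the authors' code.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Evaluator sketch: schedule length of a job sequence under the dispatch
// rule of Section 5 (start each job as early as possible, at most one start
// per time unit, jobs running in [t, t+1) must fit in Omega(t)).
int scheduleLength(const std::vector<int>& seq, const std::vector<int>& w,
                   const std::vector<int>& Omega) {
    auto cap = [&](int t) {
        return Omega[std::min<std::size_t>(t, Omega.size() - 1)];
    };
    int start = -2, prev = -1;
    for (int j : seq) {
        int t = (prev < 0) ? 0 : start + 1;
        // A job started at t - 1 still holds its buffer during [t, t + 1).
        while ((t == start + 1 && prev >= 0 ? w[prev] : 0) + w[j] > cap(t)) ++t;
        start = t;
        prev = j;
    }
    return start + 2;  // UET: the last job finishes two units after it starts
}

// Best-improvement pairwise-swap local search, as described in the text.
void swapLocalSearch(std::vector<int>& seq, const std::vector<int>& w,
                     const std::vector<int>& Omega) {
    bool improved = true;
    while (improved) {
        improved = false;
        int best = scheduleLength(seq, w, Omega);
        int bi = -1, bk = -1;
        for (int i = 0; i + 1 < (int)seq.size(); ++i)
            for (int k = i + 1; k < (int)seq.size(); ++k) {
                std::swap(seq[i], seq[k]);
                int len = scheduleLength(seq, w, Omega);
                std::swap(seq[i], seq[k]);  // undo the trial swap
                if (len < best) { best = len; bi = i; bk = k; }
            }
        if (bi >= 0) {  // execute the swap giving the shortest schedule
            std::swap(seq[bi], seq[bk]);
            improved = true;
        }
    }
}
```

Each pass evaluates O(n²) swaps and keeps only the best one, so a pass either strictly shortens the schedule or terminates the search; this matches the best-improvement rule stated above.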
Algorithm Rnd constructs a random job sequence in O(n) time. This algorithm is used mainly to verify whether the remaining heuristics perform well in comparison to what can be achieved without effort.
Algorithm RndLocal starts with a random job sequence, and then improves it using the local search procedure described above. Analysing this heuristic will show what local search can achieve when started from a presumably low-quality solution.
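Rnd itself reduces to a Fisher–Yates shuffle; a minimal sketch (the function name and the explicit seed parameter are ours):

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Sketch of algorithm Rnd: a uniformly random job sequence in O(n) time
// (std::shuffle performs a Fisher-Yates shuffle).
std::vector<int> randomSequence(int n, unsigned seed) {
    std::vector<int> seq(n);
    std::iota(seq.begin(), seq.end(), 0);  // 0, 1, ..., n - 1
    std::mt19937 gen(seed);
    std::shuffle(seq.begin(), seq.end(), gen);
    return seq;
}
```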
1146 J. Berlińska et al.
6 Experimental Results
In this section, we compare the quality of the obtained solutions and the computational costs of the proposed heuristics. The algorithms were implemented in C++ and run on an Intel Core i7-7820HK CPU @ 2.90 GHz with 32 GB RAM. Integer linear programs were solved using Gurobi. Due to limited space, we report here only on a small subset of the obtained results. The generated instances and solutions can be found at http://berlinska.home.amu.edu.pl/datasets/F2-UET-buffer.zip.
[Plot data omitted: panels (a) and (b) compare ILP, LF, LFLocal, Rnd and RndLocal for δ = 1.0, 1.1, . . . , 2.0.]
Fig. 1. Results for n = 100 vs. δ. (a) Average quality, (b) average execution time.
Not all analysed instances could be solved to optimality using the ILP in reasonable time. Therefore, we measure the schedule quality by the relative percentage error from the lower bound computed by Gurobi when solving the ILP with a 1 h time limit. In most cases, this limit was sufficient to reach an optimal solution. To illustrate this, in addition to the heuristic results, we also report on the quality of solutions returned by ILP after at most 1 h. For each analysed parameter combination, we present average results over 30 instances.
The test instances were generated as follows. In tests with n jobs, the buffer requirements ωj were chosen randomly from the range [n, 2n]. Due to this choice of the ωj range, the buffer requirements are diversified, but not very unbalanced. For a given δ, the available buffer space Ω(t) was chosen randomly from the range [ω_max, δ · ω_max], where ω_max = max_{j=1,...,n} ωj, independently for each t = 0, . . . , 2n − 1.
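The generation procedure above can be sketched as follows; the integer rounding of the upper bound δ · ω_max and all names are our assumptions.

```cpp
#include <algorithm>
#include <random>
#include <vector>

// Instance generator sketch following the description in the text:
// w[j] drawn uniformly from [n, 2n]; Omega[t] drawn independently from
// [max_j w[j], delta * max_j w[j]] for t = 0, ..., 2n - 1.
struct Instance {
    std::vector<int> w;      // buffer requirements of the n jobs
    std::vector<int> Omega;  // buffer size in each unit interval
};

Instance generateInstance(int n, double delta, unsigned seed) {
    std::mt19937 gen(seed);
    Instance ins;
    std::uniform_int_distribution<int> jobDist(n, 2 * n);
    for (int j = 0; j < n; ++j) ins.w.push_back(jobDist(gen));
    int wmax = *std::max_element(ins.w.begin(), ins.w.end());
    std::uniform_int_distribution<int> bufDist(wmax, (int)(delta * wmax));
    for (int t = 0; t < 2 * n; ++t) ins.Omega.push_back(bufDist(gen));
    return ins;
}
```

Drawing Ω(t) from [ω_max, δ · ω_max] guarantees that every job fits in the buffer alone (ωj ≤ Ω(t) for all j and t), as assumed in the ILP section.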
On the one hand, when δ is very small, the instances may be easy to solve because there are few possibilities of job overlapping, and the optimum schedules are long. On the other hand, if δ is very large, there exist many pairs of jobs that fit in the buffer together, which can also make instances easier.
Therefore, we tested δ ∈ {1.0, 1.1, . . . , 2.0} for n = 100. The obtained results are
presented in Fig. 1. All instances with δ ∈ {1.0, 1.1, 2.0} were solved by ILP to
optimality within the imposed time limit. For each of the remaining values of δ,
there were some tests for which only suboptimal solutions could be found within
an hour.
Two-Machine Flow Shop with a Dynamic Storage Space 1147
The instances with δ ∈ [1.3, 1.6] seem the most difficult, as the average running time of ILP for these values is above 1000 s. In this set of instances, the
heuristic algorithms have the worst solution quality for δ = 1.6. Hence, in the
next experiment we use δ = 1.6, in order to construct demanding instances.
We analysed the performance of our heuristics for n = 10, 20, . . . , 100. The
quality of solutions delivered by the respective algorithms is presented in Fig. 2a.
All instances with n ≤ 40 were solved to optimality by the ILP algorithm within the 1 h time limit. In each remaining test group, there were several instances for
which the optimum solution was not found within this time. Still, the largest
average error of one-hour ILP, obtained for n = 90, is only 0.25%. As expected,
the worst results are obtained by algorithm Rnd. Algorithm RndLocal delivers much better schedules, which shows that our local search procedure can improve a poor initial solution. In contrast, the differences between the results delivered by LF and LFLocal are very small. For most instances, the schedules delivered by LF and LFLocal are identical, because they are local optima. The quality of results returned by all algorithms deteriorates with growing n. The
number of jobs has the strongest influence on algorithms Rnd and RndLocal. Its
impact on LF and LFLocal is much smaller, and the changes in the quality of
ILP results are barely visible. The quality of results produced by all algorithms
seems to level off for n ≥ 50.
[Plot data omitted: panels (a) and (b) compare ILP, LF, LFLocal, Rnd and RndLocal for n = 10, 20, . . . , 100.]
Fig. 2. Results for δ = 1.6 vs. n. (a) Average quality, (b) average execution time.
The average execution times of the algorithms are shown in Fig. 2b. Naturally,
algorithms Rnd and LF, each of which generates only one job sequence, are the
fastest. The impact of n on their running times is very small, because they have
low computational complexity. Local search algorithms need more time, and are
affected by the growth of n. RndLocal is significantly slower than LFLocal. This is because, starting from a random sequence, the local search has real work to do, whereas in the case of LFLocal we usually observe only 1 or 2 iterations of the search procedure. The ILP algorithm is the slowest, and its running time increases rapidly with growing n.
All in all, the one-hour limited ILP returns optimum or near-optimum solutions, but at a relatively high computational cost. In our experiments with changing n, algorithm LF delivers schedules within 12% of the optimum on average. The worst results were obtained by LF for the tests with δ = 2.0 (see Fig. 1a), but the average error was still below 14%. The running time of LF is several orders of magnitude lower than that of ILP even for small instances, and the difference between them increases with the growth of n. Therefore, ILP should only be used when getting as close as possible to the optimum is more important than the algorithm's running time. For cases when a 10–15% error is acceptable, we recommend using algorithm LF.
7 Conclusions
To the authors' knowledge, this is the first paper to study the two-machine flow shop with a dynamic storage space and job-dependent storage requirements. For the case of UET operations, the paper presents a proof of NP-hardness in the strong sense and a polynomial-time approximation scheme, together with an integer linear program and several heuristics evaluated in computational experiments. Future research should include a worst-case analysis of approximation algorithms.
References
1. Berlińska, J.: Scheduling for data gathering networks with data compression. Eur. J. Oper. Res. 246, 744–749 (2015)
2. Berlińska, J.: Scheduling data gathering with maximum lateness objective. In:
Wyrzykowski, R., et al. (eds.) PPAM 2017, Part II. LNCS, vol. 10778, pp. 135–144. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-78054-2_13
3. Berlińska, J.: Heuristics for scheduling data gathering with limited base station
memory. Ann. Oper. Res. (2019). https://doi.org/10.1007/s10479-019-03185-3. In
press
4. Błażewicz, J., Kubiak, W., Szwarcfiter, J.: Scheduling unit-time tasks on flow-shops
under resource constraints. Ann. Oper. Res. 16, 255–266 (1988)
5. Błażewicz, J., Lenstra, J.K., Rinnooy Kan, A.H.G.: Scheduling subject to resource
constraints: classification and complexity. Discret. Appl. Math. 5, 11–24 (1983)
6. Fernandez de la Vega, W., Lueker, G.S.: Bin packing can be solved within 1 + ε in
linear time. Combinatorica 1(4), 349–355 (1981)
7. Fung, J., Zinder, Y.: Permutation schedules for a two-machine flow shop with
storage. Oper. Res. Lett. 44(2), 153–157 (2016)
8. Garey, M.R., Johnson, D.S.: Computers and intractability: a guide to the theory
of NP-completeness. Freeman, San Francisco (1979)
9. Gu, H., Memar, J., Kononov, A., Zinder, Y.: Efficient Lagrangian heuristics for the
two-stage flow shop with job dependent buffer requirements. J. Discret. Algorithms
52–53, 143–155 (2018)
10. Luo, W., Xu, Y., Gu, B., Tong, W., Goebel, R., Lin, G.: Algorithms for communi-
cation scheduling in data gathering network with data compression. Algorithmica
80(11), 3158–3176 (2018)
11. Röck, H.: Some new results in no-wait flow shop scheduling. Z. Oper. Res. 28(1),
1–16 (1984)
12. Süral, H., Kondakci, S., Erkip, N.: Scheduling unit-time tasks in renewable resource
constrained flowshops. Z. Oper. Res. 36(6), 497–516 (1992)
Author Index
E I
Eddy, Foo Y. S., 247 Imanzadeh, S., 971
Einšpiglová, Daniela, 202 Imasheva, Baktagul, 820
Ellaia, Rachid, 3, 547
J
F Jarno, Armelle, 971
Fajemisin, Adejuyigbe, 617 Jemai, Zied, 1078
Fakhari, Farnoosh, 926 Jemmali, Mahdi, 949
Fampa, Marcia, 89, 267, 428 Ji, Sai, 488
Fenner, Trevor, 779 Jiang, Rujun, 145, 213
Fernandes, Edite M. G. P., 16 Jin, Zhong-Xiao, 1067
Fernández, José, 1013 Jouini, Oualid, 1078
Ferrand, Pascal, 981
Foglino, Francesco, 720 K
Frolov, Dmitry, 779 Kanzi, Nader, 702
Fuentes, Victor K., 89 Karpenko, Anatoly, 191
Fukuba, Tomoki, 937 Kasri, Ramzi, 279
Kassa, Semu Mitiku, 589
G Kaźmierczak, Anna, 128
G.-Tóth, Boglárka, 1013 Khalij, Leila, 567
Galán, M. Ruiz, 518 Khenchaf, Ali, 1097
Galuzzi, Bruno Giovanni, 751 Koliechkina, Liudmyla, 355
Gamez, Domingo, 518 Kononov, Alexander, 1139
Gao, Runxuan, 135 Korolev, Alexei, 398
Garmashov, Ilia, 398 Koudi, Jean, 831
Garralda–Guillem, A. I., 518 Kozinov, Evgeniy, 638
Gautrelet, Christophe, 567 Krishnan, Ashok, 247
Gawali, Deepak D., 58 Kronqvist, Jan, 448
Gergel, Victor, 638 Kulanda, Duisebekova, 1119
Ghaderi, Seyed Farid, 926 Kulitškov, Aleksei, 365
Ghosh, Tamal, 906 Kumar, Deepak, 257
Giordani, Ilaria, 751 Kumlander, Deniss, 365, 458
Gobbi, Massimiliano, 68
Goldberg, Noam, 871 L
Gomes, Guilherme Ferreira, 600 Le Thi, Hoai An, 289, 299, 320, 893, 1054
Granvilliers, Laurent, 99 Le, Hoai Minh, 893
Le, Thi-Hoang-Yen, 740
H Lebedev, Ilya, 48
Haddou, Mounir, 228 Lee, Jon, 89, 387, 438
Hamdan, Sadeque, 1078 Lefieux, Vincent, 893
Hartman, David, 119 Leise, Philipp, 916
Hennequin, Sophie, 1054, 1109 Lemosse, Didier, 567, 991, 1001
Henner, Manuel, 981 Leonetti, Matteo, 720
Hladík, Milan, 119 Li, Duan, 145, 213
Ho, Vinh Thanh, 1054 Li, Min, 488
Holdorf Lopez, Rafael, 238, 557 Li, Yaohui, 627
Homolya, Viktor, 109 Li, Zhijian, 730
Hu, Xi-Wei, 341 Liu, Wen-Zhuo, 330
Huang, Yaoting, 1067 Liu, Zhengliang, 691
M Q
Ma, Ran, 1089 Qiu, Ke, 468
Martinsen, Kristian, 906
Melhim, Loai Kayed B., 949 R
Melo, Wendel, 428 Raissa, Uskenbayeva, 761, 810, 842, 861, 882
Migot, Tangi, 228 Rakhmetulayeva, Sabina, 861, 882, 1119
Mikitiuk, Artur, 407 Raupp, Fernanda, 428
Mirkin, Boris, 779 Redondo, Juana L., 1013
Mishra, Priyanka, 660 Regis, Rommel G., 37
Mishra, Shashi Kant, 182, 660 Rocha, Ana Maria A. C., 16
Mizuno, Shinji, 611 Rossi, Roberto, 417
Moeini, Mahdi, 1023, 1129 Roy, Daniel, 1054, 1109
Mohapatra, Ram N., 660 Ryoo, Hong Seo, 376
Mokhtari, Abdelkader, 26 Ryskhan, Satybaldiyeva, 842
Mukazhanov, Nurzhan K., 761
Mukhanov, S.B., 810 S
Muts, Pavlo, 498 Sadeghieh, Ali, 702
Myradov, Bayrammyrat, 526 Sagratella, Simone, 720
Sakharov, Maxim, 191
N Salewski, Hagen, 1023
Nakispekov, Azamat, 820 Samir, Sara, 299
Nascentes, Fábio, 238 Sampaio, Rubens, 238
Nascimento, Susana, 779 Sarmiento, Orlando, 267
Nataraj, Paluri S.V., 58 Sato, Tetsuya, 937
Naumann, Uwe, 78 Schermer, Daniel, 1129
Ndiaye, Babacar Mbaye, 831 Seccia, Ruggiero, 720
Nguyen, Duc Manh, 1097 Sergeev, Sergeĭ, 691
Nguyen, Huu-Quang, 221 Shahi, Avanish, 182
Nguyen, Phuong Anh, 289 Sheu, Ruey-Lin, 221
Nguyen, Viet Anh, 320 Shi, Jianming, 611
Nielsen, Frank, 790 Shiina, Takayuki, 937
Niu, Yi-Shuai, 330, 341 Shoemaker, Christine A., 672, 681
Nouinou, Hajar, 1054 Sidelkovskaya, Andrey, 820
Nowak, Ivo, 498 Sidelkovskiy, Ainur, 820
Nowakowski, Andrzej, 128 Simon, Nicolai, 916
Singh, Sanjeev Kumar, 182
O Singh, Vinay, 649
Onalbekov, Mukhit, 850 Skipper, Daphne, 387
Ortigosa, Pilar M., 1013 Speakman, Emily, 387
Subba, Mohan Bir, 649
P Sun, Jian, 1089
Pagnacco, Emmanuel, 547 Syga, Monika, 175
Parra, Wilson Javier Veloz, 1001
Patil, Bhagyesh V., 58, 247 T
Pelz, Peter F., 916 Taibi, S., 971
Perego, Riccardo, 751 Talbi, El-Ghazali, 790
Phan, Anh-Cang, 740, 769 Tarim, Armagan, 417