
LARGE SCALE LINEAR OPTIMIZATION FOR

WIRELESS COMMUNICATION SYSTEMS

A Thesis

Presented in Partial Fulfillment of the Requirements for the Degree


Master of Mathematical Science in the Graduate School of The Ohio
State University

By

Sameh Hosny, M.S.

Graduate Program in Department of Mathematics

The Ohio State University

2017

Master’s Examination Committee:


Prof. Ghaith Hiary, Advisor
Prof. Facundo Memoli
© Copyright by

Sameh Hosny

2017
Abstract

Linear Programming has many applications in the domain of wireless commu-

nication. Many problems in this field consist of a very large number of variables

and constraints and therefore fit in the platform of large scale linear programming.

Advancements in computing over the past decade have allowed us to routinely solve
linear programs in thousands of variables and constraints, using specialized methods
from large scale linear programming. There are many software packages that implement
such methods, e.g. AMPL, GAMS and Matlab. This dissertation gives a

concise survey of linear programming fundamentals with a focus on techniques for

large scale linear programming problems in the context of wireless communication.

The dissertation explains some of these techniques, in particular the delayed column

generation method and the decomposition method. It also draws on examples from

the active field of wireless communication. The dissertation is concluded by giving

concrete examples of how to use various software packages to solve large scale lin-

ear programming problems stemming from our examples in the context of wireless

communication.

To the soul of my father, to my beloved mother, to my great wife, Doaa Eid

and my kids Rinad, Rawan and Mohammed.

Acknowledgments

I would like to express my special appreciation and thanks to my advisor Professor

Ghaith Hiary. You have been a tremendous mentor for me. It has been an honor for

me to be one of your students. I appreciate all your contributions of time and ideas to

make my M.Sc. experience productive and stimulating. The joy and enthusiasm you

have for your research was contagious and motivational for me, even during tough

times in the M.Sc. pursuit. I am also thankful to all the professors who taught me

from the math department. I am really grateful to them all for their dedication and
devotion to the courses they teach. These courses helped me create a strong and
rigorous background in both my major and minor fields. They allowed me to improve
my research skills and to change my perspective on many things.

The members of the IPS lab have contributed immensely to my personal and

professional time at The Ohio State University. The group has been a source of

friendships as well as good advice and collaboration. I would like to acknowledge

my colleague John Tadrous for his continuous help and generosity. He was always

supporting me with all the information I needed especially in the beginning of my

study. Moreover, I am thankful to my colleague Faisal Alotaibi for the great time we

spent working together and having useful technical discussions in our group meetings.

My time at OSU was made enjoyable in large part due to the many friends and

groups that became a part of my life. I would like to extend my special thanks to my

best Egyptian friend Sameh Shohdy and his great family for their kindness, support,

and hospitality. They supported me and my family until everything was settled
in Columbus. I also express my thanks to our great American friends, Betty Rocke

and Randy, for supporting our stay in Columbus and helping my son Mohammed in

learning so many things.

Lastly, I would like to thank my family for all their love and encouragement. For

my parents who raised me with a love of science and supported me in all my pursuits.

And most of all for my loving, supportive, encouraging, and patient wife Doaa Eid

whose faithful support during all stages of this Ph.D. is so appreciated. For spending

many nights waiting for me to accomplish my hard tasks. Thank you.

Vita

December 11, 1978 . . . . . . . . . . . . . . . . . . . . . . . . . Born - Cairo, Egypt

2001 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B.S. Electrical and Computer Engineering

2010 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . M.S. Electrical and Computer Engineering

2014-present . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ph.D. Student, Electrical and Computer Engineering, The Ohio State University

2015-present . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . M.S. Student, Mathematics, The Ohio State University

Publications

(Accepted) S. Hosny, A. Eryilmaz and H. El Gamal, "Impact of User Mobility on
D2D Caching Networks," IEEE Global Communications Conference, Washington, DC,
USA, 2016.

(Accepted) S. Hosny, A. Eryilmaz and H. El Gamal, "Mobility-Aware Centralized
D2D Caching Networks," 54th Annual Allerton Conference on Communication, Control,
and Computing, Illinois, USA, 2016.

(Submitted) S. Hosny, F. Alotaibi, J. Tadrous, A. Eryilmaz and H. El Gamal, "Content
Trading in D2D Caching Networks," IEEE/ACM Transactions on Networking.

(To be submitted) S. Hosny, A. Abouzeid, A. Eryilmaz and H. El Gamal, "Mobility-Aware
D2D Caching Networks," IEEE Transactions on Wireless Communications.

F. Alotaibi, S. Hosny, H. El Gamal and A. Eryilmaz, "A game theoretic approach to
content trading in proactive wireless networks," 2015 IEEE International Symposium
on Information Theory (ISIT), Hong Kong, 2015, pp. 2216-2220.

S. Hosny, F. Alotaibi, H. El Gamal and A. Eryilmaz, "Towards a P2P mobile contents
trading," 2015 49th Asilomar Conference on Signals, Systems and Computers, Pacific
Grove, CA, 2015, pp. 338-342.

S. Hosny, F. Alotaibi, H. El Gamal and A. Eryilmaz, "Towards a mobile content
marketplace," 2015 IEEE 16th International Workshop on Signal Processing Advances
in Wireless Communications (SPAWC), Stockholm, 2015, pp. 675-679.

F. Alotaibi, S. Hosny, J. Tadrous, H. El Gamal and A. Eryilmaz, "Towards a marketplace
for mobile content: Dynamic pricing and proactive caching," arXiv preprint
arXiv:1511.07573, 2015.

Fields of Study

Major Field: Electrical & Computer Engineering

Table of Contents

Page

Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii

Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii

Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv

Vita . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi

List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x

List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi

1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

2. Review of Linear Programming . . . . . . . . . . . . . . . . . . . . . . . 3

2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2 Geometry of a Linear Program . . . . . . . . . . . . . . . . . . . . 7
2.3 Degeneracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.4 The Simplex Method . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4.1 Implementation of the Simplex Method . . . . . . . . . . . 16
2.4.2 Comparisons and Performance Enhancements . . . . . . . . 20
2.5 The Duality Theory . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.6 Example Problems for Wireless Communication Networks . . . . . 27
2.6.1 Power Control in a Wireless Network . . . . . . . . . . . . . 27
2.6.2 Multicommodity Network Flow . . . . . . . . . . . . . . . . 28
2.6.3 D2D Caching Networks . . . . . . . . . . . . . . . . . . . . 30

3. Large Scale Linear Programs . . . . . . . . . . . . . . . . . . . . . . . . . 33

3.1 Delayed Column Generation Method . . . . . . . . . . . . . . . . . 34


3.2 Cutting Plane Method . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.3 Dantzig-Wolfe Decomposition . . . . . . . . . . . . . . . . . . . . . 40
3.4 The Cutting Stock Problem . . . . . . . . . . . . . . . . . . . . . . 45
3.5 Applications in Wireless Communication . . . . . . . . . . . . . . . 48

4. Implementation of Large Scale Linear Programs . . . . . . . . . . . . . . 51

4.1 AMPL Programming Language . . . . . . . . . . . . . . . . . . . . 52


4.1.1 Implementation of The Cutting Stock Problem using AMPL 53
4.2 GAMS Programming Language . . . . . . . . . . . . . . . . . . . . 57
4.2.1 Implementation of Dantzig-Wolfe Decomposition Method using GAMS 57
4.3 Matlab Programming Language . . . . . . . . . . . . . . . . . . . . 62
4.3.1 Matlab Implementation of D2D Caching Example . . . . . . 64

Appendices 68

A. AMPL Implementation of Column Generation . . . . . . . . . . . . . . . 68

B. GAMS Implementation of Multi-Commodity Network Flow Problem . . 71

C. Matlab Implementation of D2D Caching Example . . . . . . . . . . . . . 75

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

List of Tables

Table Page

2.1 Comparison between Simplex implementation methods . . . . . . . . . . . 22

2.2 The Different Possibilities for the Primal and Dual Problems . . . . . . . . 26

List of Figures

Figure Page

2.1 Graphical solution of a linear program example. . . . . . . . . . . . . . . . 8

2.2 Visualization of standard form problems . . . . . . . . . . . . . . . . . . . 8

2.3 Full Tableau Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

2.4 An illustration of the power control example. . . . . . . . . . . . . . . . . 28

2.5 An illustration of the multi-commodity network flow example. . . . . . . . 30

2.6 An illustration of the D2D caching networks example. . . . . . . . . . . . . 32

4.1 System Performance of the D2D Caching Network . . . . . . . . . . . . . 67

Chapter 1: Introduction

The importance of Linear Programming (LP) derives in part from its many ap-

plications and in part from the existence of efficient techniques to solve it. These

techniques are fast and reliable over a substantial range of problem sizes, inputs and

applications. Linear programming has been proven to be valuable for modeling di-

verse types of problems in planning, routing, scheduling, assignment, and design.

Industries that make use of LP and its extensions include transportation, energy,

telecommunications, health care, finance and manufacturing. In a number of these

applications, a realistic model gives rise to a LP problem with a large number of

variables and constraints. This makes the problem more complicated and requires

substantial computational resources to solve it; in particular, a substantial amount of

fast memory and higher computational speed. For this reason, a number of special-

ized procedures, such as column generation and cutting-plane methods, have been

developed to effectively solve such large-scale linear programs. Yet, in other cases,

the LP problem may have a special structure where decomposition methods can

be useful.

Linear programming has numerous and important applications in the domain of

wireless communications, e.g. network flow, power control, caching networks, etc.

Most of these applications deal with very large number of variables and constraints.

For example, caching networks deal with a very large number of users and a tremen-

dous amount of data contents. Therefore, we focus in this dissertation on linear

programming methods for such large scale problems. We also investigate some soft-

ware packages to implement and solve these problems. Thanks to the advances in

computing over the past decade, linear programs in a few thousand variables and

constraints are nowadays viewed as "small" problems. Problems having tens, or even
hundreds, of thousands of continuous variables are regularly solved using software
packages such as AMPL, GAMS, Matlab, etc. Large-scale LP software packages utilize
special techniques from numerical linear algebra, such as sparse matrix techniques,
together with refinements developed through years of experience. These techniques,
however, are not the focus of this dissertation. Instead, this dissertation presents some
common examples of wireless communication systems and illustrates how to solve them
using these software packages.

The dissertation is organized as follows: Chapter 2 is a review of the fundamentals
of linear programming and fixes the notation. In Chapter 3, we describe some important
large scale LP algorithms, focusing on algorithms that have proven themselves in
practice. In Chapter 4, we illustrate how to implement the large scale LP algorithms
discussed in Chapter 3, using software packages such as AMPL, GAMS and Matlab,
to solve some large scale LP examples from the context of wireless communication.

Chapter 2: Review of Linear Programming

2.1 Introduction

A linear program (LP) is an optimization problem in which the objective function

is linear in the unknowns and the constraints consist of linear equalities and linear

inequalities [1]. In a general linear programming problem, we are given a cost vector
$c = (c_1, \dots, c_n)$ and we seek to minimize a linear cost function
$c'x = \sum_{i=1}^{n} c_i x_i$ over all $n$-dimensional vectors $x = (x_1, \dots, x_n)$
subject to a set of linear equality and

inequality constraints. Any linear program can be transformed into the following

standard form:
\[
\begin{aligned}
\underset{x}{\text{minimize}} \quad & c_1 x_1 + c_2 x_2 + \cdots + c_n x_n \\
\text{subject to} \quad & a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n = b_1 \\
& a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n = b_2 \\
& \qquad \vdots \\
& a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n = b_m \\
& x_1 \ge 0, \; x_2 \ge 0, \; \dots, \; x_n \ge 0,
\end{aligned}
\tag{2.1}
\]

where the bi ’s, ci ’s and aij ’s are fixed real constants, and the xi ’s are real numbers to

be determined. We always assume that each equation has been multiplied by minus

unity, if necessary, so that each bi ≥ 0. A linear programming problem of the form:
\[
\begin{aligned}
\underset{x}{\text{minimize}} \quad & c'x \\
\text{subject to} \quad & Ax = b \\
& x \ge 0,
\end{aligned}
\tag{2.2}
\]

is said to be in standard form. Suppose that $x$ has dimension $n$, and let
$a_1', \dots, a_m'$ denote the rows of the $m \times n$ matrix $A$, with $b$
an $m$-dimensional column vector.

The variables x1 , · · · , xn are called decision variables, and a vector x satisfying all of

the constraints is called a feasible solution or feasible vector. The set of all feasible
solutions is called the feasible set or feasible region. The function $c'x$ is called the
objective function or cost function. A feasible solution $x^*$ that minimizes the objective
function (that is, $c'x^* \le c'x$ for all feasible $x$) is called an optimal feasible solution
or, simply, an optimal solution. The value of $c'x^*$ is then called the optimal cost.
An equality constraint $a_i'x = b_i$ is equivalent to the two constraints $a_i'x \le b_i$
and $a_i'x \ge b_i$. In addition, any constraint of the form $a_i'x \le b_i$ can be rewritten as
$(-a_i)'x \ge -b_i$. Finally, constraints of the form $x_j \ge 0$ or $x_j \le 0$ are special cases of
constraints of the form $a_i'x \ge b_i$, where $a_i$ is a unit vector and $b_i = 0$. We conclude that
the feasible set in a general linear programming problem can be expressed exclusively
in terms of inequality constraints of the form $a_i'x \ge b_i$. Suppose that there is a total
of $m$ such constraints, indexed by $i = 1, \dots, m$, let $b = (b_1, \dots, b_m)$, and let $A$ be
the $m \times n$ matrix whose rows are the row vectors $a_1', \dots, a_m'$. Then the linear
programming problem can be written as:

\[
\begin{aligned}
\underset{x}{\text{minimize}} \quad & c'x \\
\text{subject to} \quad & Ax \ge b.
\end{aligned}
\tag{2.3}
\]

Example 1. The following is a linear programming problem:

minimize x1 − x2 + x3
x

subject to x1 + x2 + x4 ≤2

x2 − x3 =5

x3 + x4 ≥3

x1 ≥0

x3 ≤ 0.

It can be rewritten as follows:

minimize x1 − x2 + x3
x

subject to − x1 − x2 − x4 ≥ −2

x2 − x3 ≥5

− x2 + x3 ≥ −5

x3 + x4 ≥3

x1 ≥0

− x3 ≥ 0.

with $c = (1, -1, 1, 0)$,
\[
A = \begin{pmatrix}
-1 & -1 & 0 & -1 \\
0 & 1 & -1 & 0 \\
0 & -1 & 1 & 0 \\
0 & 0 & 1 & 1 \\
1 & 0 & 0 & 0 \\
0 & 0 & -1 & 0
\end{pmatrix},
\]
and $b = (-2, 5, -5, 3, 0, 0)$.

Any general linear programming problem can be transformed into an equivalent

problem in standard form (2.2). We say that the two problems are equivalent; that is,

given a feasible solution to one problem, we can construct a feasible solution to the

other, with the same cost. In particular, the two problems have the same optimal cost

and given an optimal solution to one problem, we can construct an optimal solution

to the other. The problem transformation is based on two steps:

(a) Elimination of free variables: Any real number can be written as the dif-

ference of two non-negative numbers. Hence, any unrestricted variable xj in a


problem in general form can be replaced by $x_j^+ - x_j^-$, where $x_j^+$ and $x_j^-$ are new
variables on which we impose the sign constraints $x_j^+ \ge 0$ and $x_j^- \ge 0$.

(b) Elimination of inequality constraints: Given an inequality constraint of


the form $\sum_{j=1}^{n} a_{ij} x_j \le b_i$, we introduce a new variable $s_i$ and the standard form
constraints $\sum_{j=1}^{n} a_{ij} x_j + s_i = b_i$, $s_i \ge 0$. Such a variable $s_i$ is called a slack
variable. Similarly, an inequality constraint $\sum_{j=1}^{n} a_{ij} x_j \ge b_i$ can be put in standard
form by introducing a surplus variable $s_i$ and the constraints $\sum_{j=1}^{n} a_{ij} x_j - s_i = b_i$,
$s_i \ge 0$.

Example 2. Given the problem:

minimize 2x1 + 4x2

subject to x1 + x2 ≥3

3x1 + 2x2 = 14

x1 ≥ 0.

is equivalent to the standard form problem:

\[
\begin{aligned}
\text{minimize} \quad & 2x_1 + 4x_2^+ - 4x_2^- \\
\text{subject to} \quad & x_1 + x_2^+ - x_2^- - x_3 = 3 \\
& 3x_1 + 2x_2^+ - 2x_2^- = 14 \\
& x_1, \, x_2^+, \, x_2^-, \, x_3 \ge 0.
\end{aligned}
\]

For example, given the feasible solution (x1 , x2 ) = (6, −2) to the original problem, we ob-

tain the feasible solution $(x_1, x_2^+, x_2^-, x_3) = (6, 0, 2, 1)$ to the standard form problem,
which has the same cost. Conversely, given the feasible solution
$(x_1, x_2^+, x_2^-, x_3) = (8, 1, 6, 0)$ to the standard form problem, we obtain the feasible
solution $(x_1, x_2) = (8, -5)$ to the original problem with the same cost.
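To make this equivalence concrete, the following minimal sketch solves both versions of Example 2 and confirms that the optimal costs coincide. The use of SciPy's linprog here is our own choice for illustration; the software packages discussed in this dissertation (AMPL, GAMS and Matlab) appear in Chapter 4.

# A minimal numerical check of Example 2 (SciPy is assumed).
import numpy as np
from scipy.optimize import linprog

# Original problem: minimize 2*x1 + 4*x2
#   s.t. x1 + x2 >= 3 (rewritten as -x1 - x2 <= -3),
#        3*x1 + 2*x2 = 14, x1 >= 0, x2 free.
orig = linprog(c=[2, 4],
               A_ub=[[-1, -1]], b_ub=[-3],
               A_eq=[[3, 2]], b_eq=[14],
               bounds=[(0, None), (None, None)],
               method="highs")

# Standard form: variables (x1, x2plus, x2minus, x3), all >= 0.
std = linprog(c=[2, 4, -4, 0],
              A_eq=[[1, 1, -1, -1],
                    [3, 2, -2, 0]],
              b_eq=[3, 14],
              bounds=(0, None),
              method="highs")

print(orig.fun, std.fun)  # both report the same optimal cost (here -4)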

2.2 Geometry of a Linear Program

We can also visualize standard form problems geometrically. For example, consider

the problem
minimize − x1 − x2

subject to x1 + 2x2 ≤3

2x1 + x2 ≤3

x1 , x2 ≥ 0.
The feasible set is the shaded region in Figure 2.1. In order to find an optimal

solution, we identify the cost vector c = (−1, −1) and for any given z, we consider

the line described by the equation −x1 − x2 = z. We change z to move this line in the

direction of the vector −c as much as possible as long as we do not leave the feasible

region. The best we can do is z = −2 and the vector x = (1, 1) is an optimal solution

which is a corner in the feasible region.

Now assume that m ≤ n and that the constraints Ax = b force x to lie on an

(n − m)-dimensional set. For example, consider the feasible set in R3 defined by the

constraints x1 + x2 + x3 = 1 and x1 , x2 , x3 ≥ 0 and note that n = 3 and m = 1. The

plane defined by the equality constraint appears as a triangle in a two-dimensional

space. Furthermore, each edge of the triangle corresponds to one of the inequality

constraints. The optimal solution lies inside the shaded triangle in Figure 2.2.

Figure 2.1: Graphical solution of a linear program example.

Figure 2.2: Visualization of standard form problems. (a) An (n − m)-dimensional view of the same set. (b) An n-dimensional view of the feasible set.

In general, for any linear program, we have the following possibilities:

(a) There exists a unique optimal solution.

(b) There exist multiple optimal solutions; in this case, the set of optimal solutions

can be either bounded or unbounded.

(c) The optimal cost is −∞, and no feasible solution is optimal.

(d) The feasible set is empty.

As a preliminary investigation, we can say that if the problem has at least one

optimal solution, then an optimal solution can be found among the corners of the

feasible set. This idea is the core of how to solve a linear program. To develop it,

let us start with some basic definitions.

Definition 1. A polyhedron is a set that can be described in the form {x ∈ Rn |Ax ≥ b},

where A is an m × n matrix and b is a vector in Rm .

Definition 2. Let P be a polyhedron. A vector x ∈ P is an extreme point of P if we

cannot find two vectors y, z ∈ P , both different from x, and a scalar λ ∈ [0, 1] such that

x = λy + (1 − λ)z.

An alternative definition, which is also used to find the unique optimal solution, is
that of a vertex of a polyhedron.

Definition 3. Let P be a polyhedron. A vector x ∈ P is a vertex of P if there exists some


$c$ such that $c'x < c'y$ for all $y$ satisfying $y \in P$ and $y \ne x$.

In other words, x is a vertex of P if and only if P is on one side of a hyperplane

which meets P only at the point x. Consider a polyhedron P ⊂ Rn defined in terms

of the linear equality and inequality constraints


\[
\begin{aligned}
a_i'x &\ge b_i, \quad i \in M_1, \\
a_i'x &\le b_i, \quad i \in M_2, \\
a_i'x &= b_i, \quad i \in M_3,
\end{aligned}
\]
where M1 , M2 and M3 are finite index sets, each ai is a vector in Rn , and each bi is a

scalar. This allows us to see the following definition.

Definition 4. If a vector $x^*$ satisfies $a_i'x^* = b_i$ for some $i \in M_1$, $M_2$, or $M_3$, we say that the

corresponding constraint is active or binding at x∗ .

If there are n constraints that are active at a vector x∗ , then x∗ satisfies a certain

system of n linear equations in n unknowns. This system has a unique solution if

and only if these n equations are "linearly independent". Since we have m equality

constraints in the standard form problems, we need to find n−m inequality constraints

which are also active. Once we have n linearly independent active constraints, a

unique vector x∗ is determined. However, this procedure has no guarantee of leading

to a feasible vector x∗ , because some of the inactive constraints could be violated; in

the latter case we say that we have a basic (but not basic feasible) solution.

Definition 5. Consider a polyhedron P defined by linear equality and inequality con-

straints, and let x∗ be an element of Rn .

(a) The vector x∗ is a basic solution if:

i All equality constraints are active;

ii Out of the constraints that are active at x∗ , there are n of them that are linearly

independent.

(b) If x∗ is a basic solution that satisfies all of the constraints, we say that it is a basic

feasible solution.

Now, we relate these definitions together in the following theorem:

Theorem 1. Let P be a nonempty polyhedron and let x∗ ∈ P . Then, the following are

equivalent: (a) x∗ is a vertex; (b) x∗ is an extreme point; (c) x∗ is a basic feasible solution.

Every basic solution must satisfy the equality constraints Ax = b, which provides

us with m active constraints. These active constraints are linearly independent by

the assumption on the rows of A. To obtain a total of n active constraints, we need to

choose n − m of the variables xi and set them to zero, which makes the corresponding

constraint xi ≥ 0 active. However, to get a set of n linearly independent active

constraints, the choice of these n − m variables is not entirely arbitrary.

Theorem 2. A vector x ∈ Rn is a basic solution if and only if we have Ax = b and there

exist indices B(1), · · · , B(m) such that:

(a) The columns AB(1) , · · · , AB(m) are linearly independent;

(b) if i 6= B(1), · · · , B(m), then xi = 0.

Therefore, to find a basic solution, we need to choose m linearly independent

columns AB(1) , · · · , AB(m) . We let xi = 0 for all i 6= B(1), · · · , B(m). Then, we

solve the system of m equations $Bx_B = b$ for the unknowns $x_{B(1)}, \dots, x_{B(m)}$. If the

result of this procedure is nonnegative, then it is feasible. If x is a basic solution,

the variables xB(1) , · · · , xB(m) are called basic variables; the remaining variables are

called nonbasic. The columns AB(1) , · · · , AB(m) are called basic columns and, since

they are linearly independent, they form a basis of Rn . By arranging the m basic

columns next to each other, we obtain an m × m matrix B called a basis matrix. We

can also define a vector xB with the values of the basic variables. Thus,
 
\[
B = \begin{pmatrix}
| & | & & | \\
A_{B(1)} & A_{B(2)} & \cdots & A_{B(m)} \\
| & | & & |
\end{pmatrix},
\qquad
x_B = \begin{pmatrix} x_{B(1)} \\ \vdots \\ x_{B(m)} \end{pmatrix}.
\]

The basic variables are determined by solving the equation $Bx_B = b$, whose unique
solution is given by $x_B = B^{-1}b$. We end this section with the following theorem.

Theorem 3. Consider the LP problem of minimizing $c'x$ over a polyhedron $P$. Suppose that

P has at least one extreme point and that there exists an optimal solution. Then, there exists

an optimal solution which is an extreme point of P.
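To make the construction of basic solutions above concrete, the following small sketch (with toy data of our own) chooses m linearly independent columns, sets the remaining variables to zero, and solves $Bx_B = b$:

# Computing a basic solution for a toy standard form problem (NumPy).
import numpy as np

A = np.array([[1.0, 1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0, 1.0]])    # m = 2, n = 4
b = np.array([3.0, 2.0])

basic = [0, 1]                    # candidate basic indices B(1), B(2)
B = A[:, basic]                   # basis matrix
x = np.zeros(4)
x[basic] = np.linalg.solve(B, b)  # basic variables x_B = B^{-1} b

print(x)              # [1. 2. 0. 0.]
print(x.min() >= 0)   # nonnegative, so this basic solution is feasible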

2.3 Degeneracy

According to the previous definitions, at a basic solution we must have n linearly
independent active constraints. This allows for the possibility that the number of active
constraints is greater than n. Of course, in n dimensions, no more than n of them

can be linearly independent. This also means that we will have more than n − m

variables with the value of zero. In this case, we say that we have a degenerate basic

solution.

Definition 6. A basic solution x ∈ Rn is said to be degenerate if more than n of the

constraints are active at x. In other words, if more than n − m of the components of x take

the value of zero.

If the entries of A or b were chosen at random, this would almost never happen.

However, in practical problems, the entries of A and b often have a nonrandom

structure and degeneracy is more common.

Example 3. Consider the polyhedron P defined by the constraints

x1 + x2 + 2x3 ≤ 8

x2 + 6x3 ≤ 12

x1 ≤ 4

x2 ≤ 6

x1 , x2 , x3 ≥ 0.

The vector x = (2, 6, 0) is a nondegenerate basic feasible solution, because there

are exactly three active and linearly independent constraints, namely
$x_1 + x_2 + 2x_3 \le 8$, $x_2 \le 6$, and $x_3 \ge 0$. The vector $x = (4, 0, 2)$ is a
degenerate basic feasible solution, because there are four active constraints, of which
three are linearly independent,

namely x1 + x2 + 2x3 ≤ 8, x2 + 6x3 ≤ 12, x1 ≤ 4 and x2 ≥ 0.
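The counts in this example are easy to verify numerically; the following short sketch (NumPy, with our own tolerance choice) counts the constraints active at each of the two vectors:

# Counting active constraints at a point of the polyhedron in Example 3.
import numpy as np

G = np.array([[1.0, 1.0, 2.0],    # x1 + x2 + 2*x3 <= 8
              [0.0, 1.0, 6.0],    # x2 + 6*x3      <= 12
              [1.0, 0.0, 0.0],    # x1             <= 4
              [0.0, 1.0, 0.0]])   # x2             <= 6
h = np.array([8.0, 12.0, 4.0, 6.0])

def active_count(x, tol=1e-9):
    upper = np.abs(G @ x - h) < tol   # active inequality constraints
    signs = np.abs(x) < tol           # active sign constraints x_i >= 0
    return upper.sum() + signs.sum()

print(active_count(np.array([2.0, 6.0, 0.0])))  # 3 -> nondegenerate
print(active_count(np.array([4.0, 0.0, 2.0])))  # 4 > n - m ... degenerate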

2.4 The Simplex Method

If a linear program in standard form has an optimal solution, then there exists a

basic feasible solution that is optimal. The simplex method searches for an optimal

solution by moving from one basic feasible solution to another, along the edges of

the feasible set, in a cost reducing direction. For general optimization problems, a

locally optimal solution need not be globally optimal. In linear programming, local

optimality implies global optimality, because we are minimizing a convex cost function

over a convex set. Therefore, the simplex method terminates once an optimal solution

is found.

Now suppose that we are at point x ∈ P and we are moving away from x in the

direction of a vector d ∈ Rn . Clearly, we should not consider those choices of d which

take us outside the feasibility set. We say that d ∈ Rn is a feasible direction at x,

if there is a positive scalar θ for which x + θd ∈ P . We are moving away from x,

to a new vector x + θd by selecting a nonbasic variable xj (which is initially zero)

and increasing it to a positive value θ, while keeping the remaining nonbasic variables

at zero. Algebraically, dj = 1, and di = 0 for every nonbasic index i other than

j. At the same time, the vector xB of basic variables changes to xB + θdB , where

dB = (dB(1) , dB(2) , · · · , dB(m) ). Since we are only interested in feasible solutions, we

require A(x + θd) = b, and since x is feasible, we have Ax = b. Thus, for θ > 0, we

need Ad = 0. Then
\[
0 = Ad = \sum_{i=1}^{n} A_i d_i = \sum_{i=1}^{m} A_{B(i)} d_{B(i)} + A_j = B d_B + A_j.
\]
Since $B$ is invertible, we obtain
\[
d_B = -B^{-1} A_j.
\]

Now, if $d$ is the $j$th basic direction, then the rate $c'd$ of cost change along the direction
$d$ is given by $c_B'd_B + c_j$, where $c_B = (c_{B(1)}, \dots, c_{B(m)})$. This is defined as the reduced
cost $\bar{c}_j = c_j - c_B'B^{-1}A_j$ of moving in this direction. Note that $c_j$ is the cost per unit
increase in the variable $x_j$, and the term $-c_B'B^{-1}A_j$ is the cost of the compensating
change in the basic variables necessitated by the constraint $Ax = b$. Since $B$ is the
matrix $[A_{B(1)} \cdots A_{B(m)}]$, we have $B^{-1}[A_{B(1)} \cdots A_{B(m)}] = I$, where $I$ is the $m \times m$
identity matrix. Therefore, for every basic variable $x_{B(i)}$, we have
\[
\bar{c}_{B(i)} = c_{B(i)} - c_B'B^{-1}A_{B(i)} = c_{B(i)} - c_B'e_i = c_{B(i)} - c_{B(i)} = 0,
\]

that is, the reduced cost of every basic variable is zero. The following theorem illus-

trates the optimality conditions.

Theorem 4. Consider a basic feasible solution x associated with a basis matrix B, and let

c̄ be the corresponding vector of reduced costs.

(a) If c̄ ≥ 0, then x is optimal.

(b) If x is optimal and nondegenerate, then c̄ ≥ 0.

That is, in order to decide whether a nondegenerate basic feasible solution is

optimal, we need only to check whether all reduced costs are nonnegative, which is

the same as examining n − m basic directions. If x is a degenerate basic feasible

solution, an equally simple computational test for determining whether x is optimal

is not available. Therefore, to assert that a certain basic solution is optimal, we need

to satisfy two conditions: feasibility and nonnegativity of the reduced costs. This

leads us to the following definition.

Definition 7. A basis matrix $B$ is said to be optimal if:

(a) $B^{-1}b \ge 0$, and

(b) $\bar{c}' = c' - c_B'B^{-1}A \ge 0'$.

If an optimal basis is found, the corresponding basic solution is feasible, satisfies

the optimality conditions, and is therefore optimal. Let us assume that every basic

feasible solution is nondegenerate. Suppose we are at a basic feasible solution x and

that we have computed the reduced costs c̄j of the nonbasic variables. If the reduced

cost c̄j of a nonbasic variable xj is negative, the jth basic direction d is a feasible

direction of cost reduction. While moving along this direction d, the nonbasic variable

xj becomes positive and all other nonbasic variables remain at zero. We describe this

situation by saying that xj (or Aj ) enters or is brought into the basis and replaces

one of the columns in B. An iteration of the simplex method is described as follows:

Algorithm 1 An iteration of the simplex method

1. We start with a basis consisting of the basic columns $A_{B(1)}, \dots, A_{B(m)}$ and an associated basic feasible solution $x$.

2. Compute the reduced costs $\bar{c}_j = c_j - c_B'B^{-1}A_j$ for all nonbasic indices $j$. If they are all nonnegative, then $x$ is optimal and the algorithm terminates; else, choose some $j$ for which $\bar{c}_j < 0$.

3. Compute $u = B^{-1}A_j$. If no component of $u$ is positive, we have $\theta^* = \infty$, the optimal cost is $-\infty$, and the algorithm terminates.

4. If some component of $u$ is positive, let
\[
\theta^* = \min_{\{i = 1, \dots, m \,|\, u_i > 0\}} \frac{x_{B(i)}}{u_i}.
\]

5. Let $l$ be such that $\theta^* = x_{B(l)}/u_l$. Form a new basis by replacing $A_{B(l)}$ with $A_j$. If $y$ is the new basic feasible solution, the values of the new basic variables are $y_j = \theta^*$ and $y_{B(i)} = x_{B(i)} - \theta^* u_i$, $i \ne l$.

The following theorem states that, in the nondegenerate case, the simplex method
works correctly and terminates after a finite number of iterations.

Theorem 5. Assume that the feasible set is nonempty and that every basic feasible solution
is nondegenerate. Then, the simplex method terminates after a finite number of iterations.
At termination, there are the following two possibilities:

(a) We have an optimal basis $B$ and an associated basic feasible solution which is optimal.

(b) We have found a vector $d$ satisfying $Ad = 0$, $d \ge 0$, $c'd < 0$, and the optimal cost is
$-\infty$.
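As an illustration, the following compact sketch implements one iteration of Algorithm 1 in Python, written for clarity rather than efficiency. The toy data and the rule of entering on the first negative reduced cost are our own choices, and degeneracy and cycling are not handled.

# One iteration of the simplex method (Algorithm 1), naive linear algebra.
import numpy as np

def simplex_iteration(A, b, c, basic):
    B = A[:, basic]
    x_B = np.linalg.solve(B, b)             # current basic variables
    p = np.linalg.solve(B.T, c[basic])      # simplex multipliers p' = c_B' B^{-1}
    c_bar = c - A.T @ p                     # reduced costs (zero for basic columns)
    nonbasic = [j for j in range(A.shape[1]) if j not in basic]
    entering = [j for j in nonbasic if c_bar[j] < -1e-9]
    if not entering:
        return basic, "optimal"
    j = entering[0]                         # any column with negative reduced cost
    u = np.linalg.solve(B, A[:, j])         # u = B^{-1} A_j
    if np.all(u <= 1e-9):
        return basic, "optimal cost is -infinity"
    ratios = [x_B[i] / u[i] if u[i] > 1e-9 else np.inf for i in range(len(u))]
    l = int(np.argmin(ratios))              # ratio test determines the exiting index
    new_basic = list(basic)
    new_basic[l] = j                        # A_j replaces A_B(l) in the basis
    return new_basic, "pivoted"

# Toy problem: minimize -x1 - x2 s.t. x1 + 2*x2 + x3 = 3, 2*x1 + x2 + x4 = 3.
A = np.array([[1.0, 2.0, 1.0, 0.0],
              [2.0, 1.0, 0.0, 1.0]])
b = np.array([3.0, 3.0])
c = np.array([-1.0, -1.0, 0.0, 0.0])
basis, status = [2, 3], "pivoted"           # start from the slack basis
while status == "pivoted":
    basis, status = simplex_iteration(A, b, c, basis)
print(sorted(basis), status)   # [0, 1] optimal -> x = (1, 1), as in Figure 2.1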

2.4.1 Implementation of the Simplex Method

From the previous section, we notice that the vectors B−1 Aj play a key role in the

simplex method. If these vectors are available, the reduced cost c̄, the direction of

motion d, and the step size θ∗ are easily computed. Thus, the main difference between

alternative implementations lies in the way that the vectors B−1 Aj are computed and

the complexity of this computation. We introduce a comparison between alternative

implementations and performance enhancement in Section 2.4.2.

Naive Implementation

At the beginning of a typical iteration, we have the indices B(1), · · · , B(m) of the
current basic variables. For the basis matrix $B$, we compute the vector $p' = c_B'B^{-1}$,
which is called the vector of simplex multipliers associated with the basis $B$. The
reduced cost of any variable $x_j$ is then obtained by $\bar{c}_j = c_j - p'A_j$. Depending on

the pivoting rule employed, we may have to compute all of the reduced costs or we

may compute them one at a time until a variable with a negative reduced cost is

encountered. Once a column Aj is selected to enter the basis, we solve the linear

system $Bu = A_j$ in order to determine the vector $u = B^{-1}A_j$. At this point, we

can form the direction along which we will be moving away from the current basic

feasible solution. We finally determine θ∗ and the variable that will exit the basis, and

construct the new basic feasible solution. This iteration is repeated until all reduced

costs are nonnegative.

Revised Simplex Method

The computational complexity of the naive implementation is due to the need

for solving two linear systems of equations. In the revised simplex method, the

matrix B−1 is made available at the beginning of each iteration, and the vectors
$c_B'B^{-1}$ and $B^{-1}A_j$ are computed by matrix-vector multiplication. However, we

need an efficient method for updating the matrix B−1 for each basis change. Let
 
$B = [A_{B(1)} \cdots A_{B(m)}]$ be the basis matrix at the beginning of an iteration and let
$\bar{B} = [A_{B(1)} \cdots A_{B(l-1)} \; A_j \; A_{B(l+1)} \cdots A_{B(m)}]$ be the basis matrix at the beginning
of the next iteration. These two basis matrices have the same columns except
that the $l$th column $A_{B(l)}$ (the one that exits the basis) has been replaced by $A_j$

(the one that enters the basis). Thus, B−1 contains information that can be exploited

in the computation of B̄−1 . Since B−1 B = I, we see that B−1 AB(i) is the ith unit

vector ei and hence we have


 
| | | | |
B−1 B̄ =  e1 · · · el−1 u el+1 · · · em  , u = B−1 Aj
| | | | |

We can now apply a sequence of elementary row operations that will change the

above matrix to the identity matrix. This sequence of elementary row operations

is equivalent to left-multiplying B−1 B̄ by a certain invertible matrix Q. Hence, we

have QB−1 B̄ = I, which yields QB−1 = B̄−1 . So, applying the same sequence of

row operations to the matrix B−1 , we obtain B̄−1 . A typical iteration of the revised

simplex method includes the same steps as in Algorithm 1 with one more step added

at the end to compute $\bar{B}^{-1}$: form the $m \times (m+1)$ matrix $[B^{-1} \,|\, u]$, and add to each one
of its rows a multiple of the $l$th row to make the last column equal to the unit vector
$e_l$. The first $m$ columns of the result give the matrix $\bar{B}^{-1}$.
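The following short sketch (with randomly generated toy data) carries out this update and checks it against direct inversion:

# Updating the inverse basis matrix by elementary row operations (NumPy).
import numpy as np

def update_inverse(Binv, u, l):
    m = Binv.shape[0]
    T = np.hstack([Binv, u.reshape(-1, 1)])  # the m x (m+1) matrix [B^{-1} | u]
    T[l, :] /= T[l, m]                       # make the pivot element equal to one
    for i in range(m):
        if i != l:
            T[i, :] -= T[i, m] * T[l, :]     # zero out the rest of the last column
    return T[:, :m]                          # first m columns give B_bar^{-1}

# Check: replace column l of a random B by a new column Aj.
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
Aj = rng.standard_normal(4)
l = 2
Binv = np.linalg.inv(B)
u = Binv @ Aj
Bbar = B.copy()
Bbar[:, l] = Aj
print(np.allclose(update_inverse(Binv, u, l), np.linalg.inv(Bbar)))  # True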

The Full Tableau Implementation

Instead of storing and updating the matrix B−1 , we store and update the m ×

(n + 1) matrix $B^{-1}[b \,|\, A]$ with columns $B^{-1}b$ and $B^{-1}A_1, \dots, B^{-1}A_n$. This matrix
 

is called the simplex tableau. The column B−1 b, called the zeroth column, contains

the values of the basic variables. The column B−1 Ai is called the ith column of the

tableau. The column u = B−1 Aj corresponding to the variable that enters the basis

is called the pivot column. If the lth basic variable exits the basis, the lth row of

the tableau is called the pivot row. Finally, the element belonging to both the pivot

row and the pivot column is called the pivot element. Note that the pivot element

is $u_l$ and is always positive (unless $u \le 0$, in which case the algorithm has met the

termination condition in Step 3).

Note that given the current basis matrix B, the equality constraint Ax = b can

be rewritten as B−1 b = B−1 Ax, which is precisely the information in the tableau.

At the end of each iteration, we need to update the tableau $B^{-1}[b \,|\, A]$ and compute
$\bar{B}^{-1}[b \,|\, A]$. This can be accomplished following the same idea as the revised simplex

method. To determine the exiting column $A_{B(l)}$ and the step size $\theta^*$, Steps 4 and 5 in
Algorithm 1 amount to the following: $x_{B(i)}/u_i$ is the ratio of the $i$th entry in the

zeroth column of the tableau to the ith entry in the pivot column of the tableau. We

only consider those i for which ui is positive. The smallest ratio is equal to θ∗ and

determines l. We need to augment the simplex tableau by including a top row, to

be referred to as the zeroth row. The entry at the top left corner contains the value
$-c_B'x_B$, which is the negative of the current cost. The rest of the zeroth row is the
row vector of the reduced costs $\bar{c}' = c' - c_B'B^{-1}A$. The structure of the tableau is

shown in Figure 2.3.

A summary of the full tableau implementation method is described in the following

algorithm.

\[
\begin{array}{c|c}
-c_B'B^{-1}b & c' - c_B'B^{-1}A \\
\hline
B^{-1}b & B^{-1}A
\end{array}
\]

Figure 2.3: Full Tableau Structure

Algorithm 2 An iteration of the full tableau implementation

1. A typical iteration starts with the tableau associated with a basis matrix B and the
corresponding basic feasible solution x.

2. Examine the reduced costs in the zeroth row of the tableau. If they are all nonnega-
tive, the current basic feasible solution is optimal, and the algorithm terminates; else,
choose some j for which c̄j < 0.

3. Consider the vector u = B−1 Aj , which is the jth column (the pivot column) of the
tableau. If no component of u is positive, the optimal cost is −∞, and the algorithm
terminates.

4. For each i for which ui is positive, compute the ratio xB(i) /ui . Let l be the index of a
row that corresponds to the smallest ratio. The column AB(l) exits the basis and the
column Aj enters the basis.

5. Add to each row of the tableau a constant multiple of the lth row (the pivot row)
so that ul (the pivot element) becomes one and all other entries of the pivot column
become zero.

2.4.2 Comparisons and Performance Enhancements

When comparing different implementations, it is important to keep the following

facts in mind. If B is a given m × m matrix and b ∈ Rm is a given vector, computing

the inverse of B or solving a linear system of the form $Bx = b$ takes $O(m^3)$ arithmetic
operations. Computing a matrix-vector product $Bb$ takes $O(m^2)$ operations. Finally,
computing an inner product $p'b$ of two m-dimensional vectors takes $O(m)$ arithmetic

operations.

Note that, in the naive implementation, we need $O(m^3)$ arithmetic operations to
solve the systems $p'B = c_B'$ and $Bu = A_j$. In addition, computing the reduced costs
of all variables requires $O(mn)$ arithmetic operations, because we need to form the
inner product of the vector $p$ with each one of the nonbasic columns $A_j$. Thus, the
total computational effort per iteration, for the naive implementation, is $O(m^3 + mn)$.
The alternative implementations require only $O(m^2 + mn)$ arithmetic operations.

Therefore, the naive implementation is rather inefficient, in general. On the other


hand, for certain problems with a special structure, the linear systems $p'B = c_B'$ and

Bu = Aj can be solved very fast, in which case the naive implementation can be of

practical interest.

The full tableau method requires a constant (and small) number of arithmetic

operations for updating each entry of the tableau. Thus, the amount of computation

per iteration is proportional to the size of the tableau, which is O(mn). The revised
simplex method uses similar computations to update $B^{-1}$ and $c_B'B^{-1}$, and since only
$O(m^2)$ entries are updated, the computational requirements per iteration are $O(m^2)$.
In addition, the reduced cost of each variable $x_j$ can be computed by forming the inner
product $p'A_j$, which requires $O(m)$ operations. In the worst case, the reduced cost
of every variable is computed, for a total of $O(mn)$ computations per iteration. Since
$m \le n$, the worst-case computational effort per iteration is $O(mn + m^2) = O(mn)$,

under either implementation. On the other hand, if we consider a pivoting rule that

evaluates one reduced cost at a time, until a negative reduced cost is found, a typical

iteration of the revised simplex method might require a lot less work. In the best case,

if the first reduced cost computed is negative, and the corresponding variable is chosen

to enter the basis, the total computational effort is only $O(m^2)$. The conclusion is

that the revised simplex method cannot be slower than the full tableau method, and

could be much faster during most iterations.

Another advantage to the revised simplex method is that memory requirements

are reduced from $O(mn)$ to $O(m^2)$. As $n$ is often much larger than $m$, this effect can

be quite significant. It could be counterargued that the memory requirements of the

revised simplex method are also O(mn) because of the need to store the matrix A.

However, in most large scale problems that arise in applications, the matrix A is very

sparse (has many zero entries) and can be stored compactly. (Note that the sparsity

of A does not usually help in the storage of the full simplex tableau because even if A

and B are sparse, B−1 A is not sparse in general). The following table summarizes this

discussion. Here, memory is the storage space required, time is the computational
effort per iteration, and the best-case time assumes that the first reduced cost
computed is negative.

Table 2.1: Comparison between Simplex implementation methods

                   Naive            Revised     Full Tableau
Memory             $O(m)$           $O(m^2)$    $O(mn)$
Worst-case time    $O(m^3 + mn)$    $O(mn)$     $O(mn)$
Best-case time     $O(m^3)$         $O(m^2)$    $O(mn)$

Some ideas from numerical linear algebra can help us to enhance the performance

of the simplex method. The following are some examples of these ideas:

1. Recall that at each iteration of the revised simplex method, the inverse basis

matrix B−1 is updated according to certain rules. Each such iteration may

introduce roundoff or truncation errors which accumulate and may eventually

lead to highly inaccurate results. For this reason, it is customary to recompute
("reinvert") the matrix $B^{-1}$ from scratch once in a while. The efficiency of such
reinversions can be greatly enhanced by using suitable data structures and certain
techniques from numerical linear algebra, such as LU factorization and sparse matrices.

2. Now, suppose that a reinversion has been just carried out and B−1 is available.

Subsequent to the current iteration of the revised simplex method, we have the

option of generating explicitly and storing the new inverse basis matrix B̄−1 .

An alternative that carries the same information, is to store a matrix Q such

that QB−1 = B̄−1 . Note that Q basically prescribes which elementary row

operations need to be applied to B−1 in order to produce B̄−1 . It is not a full

matrix, and can be completely specified in terms of m coefficients: for each row,

we need to know what multiple of the pivot row must be added to it.

3. Subsequent to a "reinversion," one does not usually compute $B^{-1}$ explicitly, but

B−1 is instead represented in terms of sparse triangular matrices with a special

structure.

These methods are designed to accomplish two objectives: improve numerical stability

(minimize the effect of roundoff errors) and exploit sparsity in the problem data

to improve both running time and memory requirements. These methods have a

critical effect in practice. Besides having a better chance of producing numerically

trustworthy results, they can also speed up considerably the running time of the

simplex method. Duality helps us check the efficiency of the algorithm used, by

solving both primal and dual problems and comparing the obtained results. Therefore,

we discuss the duality theory in the following section.

2.5 The Duality Theory

Consider the standard form problem


\[
\begin{aligned}
\underset{x}{\text{minimize}} \quad & c'x \\
\text{subject to} \quad & Ax = b \\
& x \ge 0,
\end{aligned}
\]

which we call the primal problem, and let x∗ be an optimal solution, assuming one exists.

We introduce a relaxed problem in which the constraint Ax = b is replaced by a


penalty $p'(b - Ax)$, where $p$ is a price vector of the same dimension as $b$. Then,
we have
\[
\begin{aligned}
\underset{x}{\text{minimize}} \quad & c'x + p'(b - Ax) \\
\text{subject to} \quad & x \ge 0.
\end{aligned}
\]
Let g(p) be the optimal cost for the relaxed problem, as a function of the price vector
$p$. We see that $g(p)$ is no larger than the optimal primal cost $c'x^*$, since
\[
g(p) = \min_{x \ge 0} \left[ c'x + p'(b - Ax) \right] \le c'x^* + p'(b - Ax^*) = c'x^*,
\]
where the latter inequality follows from the fact that $x^*$ is a feasible solution to the
primal problem, and satisfies $Ax^* = b$. Thus, each $p$ leads to a lower bound $g(p)$ for
the optimal cost $c'x^*$. The problem

\[
\begin{aligned}
\underset{p}{\text{maximize}} \quad & g(p) \\
\text{subject to} \quad & \text{no constraints,}
\end{aligned}
\]

which is interpreted as a search for the tightest possible lower bound of this type, is
known as the dual problem. Now, using the definition of $g(p)$, we have
\[
g(p) = \min_{x \ge 0} \left[ c'x + p'(b - Ax) \right] = p'b + \min_{x \ge 0} (c' - p'A)x.
\]

Note that
\[
\min_{x \ge 0} (c' - p'A)x =
\begin{cases}
0, & \text{if } c' - p'A \ge 0', \\
-\infty, & \text{otherwise.}
\end{cases}
\]
To maximize g(p), we only consider the values of p for which g(p) is not equal to

−∞. Therefore, we conclude that the dual problem is as follows


\[
\begin{aligned}
\underset{p}{\text{maximize}} \quad & p'b \\
\text{subject to} \quad & p'A \le c'.
\end{aligned}
\tag{2.4}
\]

Moreover, if we transform the dual into an equivalent minimization problem and then

form its dual, we obtain a problem equivalent to the original problem, i.e. "the dual
of the dual is the primal". Since g(p) provides a lower bound for the optimal cost,

we can now state the weak duality theorem as follows.

Theorem 6. (Weak Duality) If x is a feasible solution to the primal problem and p is a


feasible solution to the dual problem, then $p'b \le c'x$.

Although the weak duality theorem is not a deep result, it does provide some

useful information about the relation between the primal and the dual as stated in

the following corollaries.

Corollary 1. If the optimal cost in the primal is −∞, then the dual problem must be

infeasible. Moreover, if the optimal cost in the dual is +∞, then the primal problem must

be infeasible.

Corollary 2. Let $x$ and $p$ be feasible solutions to the primal and dual problems, respectively,
and suppose that $p'b = c'x$. Then $x$ and $p$ are optimal solutions to the primal and

the dual problems, respectively.

The next theorem is the central result on linear programming duality.

Theorem 7. (Strong Duality) If a linear programming problem has an optimal solution,

so does its dual, and the respective optimal costs are equal.
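The following minimal sketch illustrates strong duality numerically on a small standard form problem; the data and the use of SciPy's linprog are our own choices:

# Primal and dual of a small standard form LP have equal optimal costs.
import numpy as np
from scipy.optimize import linprog

c = np.array([2.0, 3.0, 1.0, 0.0])
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [1.0, 2.0, 0.0, 1.0]])
b = np.array([4.0, 6.0])

# Primal: minimize c'x s.t. Ax = b, x >= 0.
primal = linprog(c, A_eq=A, b_eq=b, bounds=(0, None), method="highs")

# Dual: maximize p'b s.t. p'A <= c' (p free). linprog minimizes, so we
# minimize -b'p and write the constraint as A'p <= c.
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=(None, None), method="highs")

print(primal.fun, -dual.fun)   # equal optimal costs, as Theorem 7 asserts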

Recall that in a linear programming problem, exactly one of the following three

possibilities will occur:

(a) There is an optimal solution.

(b) The problem is "unbounded"; that is, the optimal cost is −∞ (for minimization
problems) or +∞ (for maximization problems).

(c) The problem is infeasible.

This leads to nine possible combinations for the primal and the dual, which are shown

in Table 2.2. By the strong duality theorem, if one problem has an optimal solution,

so does the other. Furthermore, the weak duality theorem implies that if one problem

is unbounded, the other must be infeasible. This allows us to mark some of the entries

in Table 2.2 as "impossible."

Table 2.2: The Different Possibilities for the Primal and Dual Problems

                    Finite Optimum    Unbounded     Infeasible
Finite Optimum      Possible          Impossible    Impossible
Unbounded           Impossible        Impossible    Possible
Infeasible          Impossible        Possible      Possible

There is another interesting relation between the primal and the dual which is

known as Clark’s theorem (Clark, 1961). It asserts that unless both problems are

infeasible, at least one of them must have an unbounded feasible set.

2.6 Example Problems for Wireless Communication Networks

We are interested in the connection between linear programming and wireless

communication networks. In this section we introduce some examples of wireless

communication problems which fit in the platform of linear programming. All of

them can require large scale linear programming techniques to overcome their growing

complexity with the system parameters.

2.6.1 Power Control in a Wireless Network

Consider a wireless communication system consisting of n mobile users and a

single base station as shown in Figure 2.4. For each i = 1, 2, · · · , n, user i transmits

a signal to the base station with power pi and an attenuation factor of hi (i.e., the

actual signal power received at the base station from user i is hi pi ). When the

base station is receiving from user $i$, the total power received from all other users is
considered as interference (i.e., $\sum_{j \ne i} h_j p_j$). For the communication with user $i$

to be reliable, the signal-to-interference ratio must exceed a threshold γi , where the

“signal” is the power received from user i. We are interested in minimizing the total

power transmitted by all users subject to having reliable communication for all users.

The total transmitted power is p1 + p2 + · · · + pn . The signal-to-interference ratio for

user $i$ is $h_i p_i / \sum_{j \ne i} h_j p_j$. Hence, the problem can be written as:
\[
\begin{aligned}
\text{minimize} \quad & p_1 + p_2 + \cdots + p_n \\
\text{subject to} \quad & \frac{h_i p_i}{\sum_{j \ne i} h_j p_j} \ge \gamma_i, \quad i = 1, 2, \dots, n \\
& p_1, p_2, \dots, p_n \ge 0.
\end{aligned}
\]

We can rewrite the problem as a linear programming problem as follows:

\[
\begin{aligned}
\text{minimize} \quad & p_1 + p_2 + \cdots + p_n \\
\text{subject to} \quad & h_i p_i - \gamma_i \sum_{j \ne i} h_j p_j \ge 0, \quad i = 1, 2, \dots, n \\
& p_1, p_2, \dots, p_n \ge 0.
\end{aligned}
\]

Note that the complexity of this problem increases with the number of users in the

network and hence it fits under large scale linear programming.
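A hedged numerical sketch of this LP is given below, using SciPy's linprog. Note that the formulation above is feasible with p = 0, so its optimum is trivially zero; to obtain a nontrivial optimum the sketch adds a receiver-noise term $\sigma^2$ to the interference (our own assumption, not part of the formulation in the text).

# Power control LP with an assumed noise term:
#   h_i p_i - gamma_i * sum_{j != i} h_j p_j >= gamma_i * sigma2.
import numpy as np
from scipy.optimize import linprog

h = np.array([1.0, 0.8, 0.9])      # toy attenuation factors
gamma = np.array([0.2, 0.2, 0.2])  # toy SIR thresholds
sigma2 = 0.1                       # assumed receiver noise power
n = len(h)

G = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        G[i, j] = h[j] if i == j else -gamma[i] * h[j]

# linprog uses "<=" constraints, so negate both sides of G p >= gamma*sigma2.
res = linprog(c=np.ones(n), A_ub=-G, b_ub=-gamma * sigma2,
              bounds=(0, None), method="highs")
print(res.x, res.fun)   # minimal total transmitted power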

Figure 2.4: An illustration of the power control example.

2.6.2 Multicommodity Network Flow

Consider a communication network consisting of n nodes. Nodes are connected

by communication links. A link allowing one-way transmission from node i to node

j is described by an ordered pair (i, j). Let A be the set of all links. We assume that

each link $(i, j) \in A$ can carry up to $u_{ij}$ bits per second. There is a positive charge
$c_{ij}$ per bit transmitted along that link. Each node $k$ generates data, at the rate of $b^{kl}$

bits per second, that have to be transmitted to node l, either through a direct link

(k, l) or by tracing a sequence of links. The problem is to choose paths along which

all data reach their intended destinations, while minimizing the total cost. We allow

the data with the same origin and destination to be split and be transmitted along

different paths.

In order to formulate this problem as a linear programming problem, we introduce

variables $x_{ij}^{kl}$ indicating the amount of data with origin $k$ and destination $l$ that
traverse link $(i, j)$. Let
\[
b_i^{kl} =
\begin{cases}
b^{kl}, & i = k, \\
-b^{kl}, & i = l, \\
0, & \text{otherwise.}
\end{cases}
\]
Thus, $b_i^{kl}$ is the net inflow at node $i$, from outside the network, of data with origin $k$

and destination l. We then have the following formulation:


\[
\begin{aligned}
\text{minimize} \quad & \sum_{(i,j) \in A} \sum_{k=1}^{n} \sum_{l=1}^{n} c_{ij} x_{ij}^{kl} \\
\text{subject to} \quad & \sum_{\{j \,|\, (i,j) \in A\}} x_{ij}^{kl} - \sum_{\{j \,|\, (j,i) \in A\}} x_{ji}^{kl} = b_i^{kl}, \quad i, k, l = 1, 2, \dots, n, \\
& \sum_{k=1}^{n} \sum_{l=1}^{n} x_{ij}^{kl} \le u_{ij}, \quad (i,j) \in A, \\
& x_{ij}^{kl} \ge 0, \quad (i,j) \in A, \; k, l = 1, 2, \dots, n.
\end{aligned}
\tag{2.5}
\]

The first constraint is a flow conservation constraint at node i for data with origin

k and destination l. The expression $\sum_{\{j \,|\, (i,j) \in A\}} x_{ij}^{kl}$ represents the amount of data
with origin and destination $k$ and $l$, respectively, that leave node $i$ along some link.
The expression $\sum_{\{j \,|\, (j,i) \in A\}} x_{ji}^{kl}$ represents the amount of data with the same origin and
destination that enter node $i$ through some link. Finally, $b_i^{kl}$ is the net amount of such
data that enter node $i$ from outside the network. The second constraint expresses

the requirement that the total traffic through a link (i, j) cannot exceed the link’s

capacity. The last constraint is a non-negativity constraint which is required for the

feasibility of the solution.
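A tiny single-commodity instance of formulation (2.5) can be assembled and solved directly; the network, capacities and costs below are toy assumptions of ours:

# Three nodes, one commodity (k, l) = (0, 2) sending 10 units.
import numpy as np
from scipy.optimize import linprog

links = [(0, 1), (1, 2), (0, 2)]   # the set A, as ordered pairs
cost = np.array([1.0, 1.0, 3.0])   # c_ij per bit on each link
cap = np.array([8.0, 8.0, 5.0])    # u_ij for each link
rate = 10.0                        # b^{02}

# Flow conservation rows, one per node i: out(i) - in(i) = b_i^{kl}.
n_nodes, n_links = 3, len(links)
A_eq = np.zeros((n_nodes, n_links))
for a, (i, j) in enumerate(links):
    A_eq[i, a] += 1.0              # flow leaving node i
    A_eq[j, a] -= 1.0              # flow entering node j
b_eq = np.array([rate, 0.0, -rate])

res = linprog(cost, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, u) for u in cap], method="highs")
print(res.x, res.fun)  # 8 units via 0->1->2 and 2 units via 0->2; cost 22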

Figure 2.5: An illustration of the multi-commodity network flow example.

2.6.3 D2D Caching Networks

We consider a wireless network consisting of a set of N users N = {1, 2, · · · , N }

and a carrier who supplies M data items upon demand. Each data item m ∈

{1, 2, · · · , M } has a size Sm > 0. We also consider a time-slotted system where

the carrier divides the day into T time slots. The probability that user n requests

item $m$ in time slot $t$ is denoted by $p_{n,t}^m$. We assume that users are moving around

many locations. The carrier is interested in L popular locations L = {1, 2, · · · , L}

like airports, schools, shopping malls, stadiums or governmental buildings where high

demand can be related to user mobility. The probability that user $n$ will be present
at location $l$ in time slot $t$ is denoted by $\theta_{n,t}^l$, where
$\sum_{l=1}^{L} \theta_{n,t}^l = 1$ $\forall n, t$.

Each user n has an isolated cache memory of size Zn . The carrier caches an amount

$x_n^m$ of content $m$ in the device of user $n$ at time slot 0 and then lets users share it

together for t ≥ 1. Therefore, the carrier smooths out the network load by caching

some of the data items at the network edge and exploits user mobility to enhance

its caching decision. We assume that a device-to-device (D2D) communication is

allowed and can be used to transfer data items between users. Users occupy part of

their devices' memory for caching these data items and consume some of their battery

to transfer it through the D2D communication. We capture this cost by a reward

corresponding to each cached byte denoted by r > 0. The carrier’s objective is to find

an optimal proactive service policy $x_n^{m*}$, $\forall n, m$, which minimizes the time-averaged

expected cost while delivering the requested data items on time to all users. The

optimal solution is found by solving the following problem:


T X
X N X
M  
minimize xm
n r− m
αn,t
t=1 n=1 m=1
XM
subject to xmn ≤ Zn , ∀n,
m=1
(2.6)
N
X
l
θn,t xm
n ≤ Sm , ∀m, l, t
n=1
0 ≤ xm
n ≤ Sm , ∀n, m.

where
\[
\alpha_{n,t}^m = \sum_{l=1}^{L} \theta_{n,t}^l \sum_{k=1}^{N} p_{k,t}^m \theta_{k,t}^l. \tag{2.7}
\]
Note that $\alpha_{n,t}^m$ captures the demand and mobility profiles of all users. Also, we notice
that the term $\sum_{k=1}^{N} p_{k,t}^m \theta_{k,t}^l$ captures the total expected number of requests for item
$m$ at location $l$ in time slot $t$. So, the higher the probability that item $m$ will be
requested at location $l$ in time slot $t$, and the higher the probability that user $n$ will
be present at that location in this time slot, the larger the amount of this item that will be
cached at user $n$. Also, for each user $n$ and for every content $m$, when $r > \alpha_{n,t}^m$ the
carrier decides $x_n^{m*} = 0$, and otherwise decides $x_n^{m*} \ge 0$ based on the remaining space
in the user's memory. More details and results are provided in [2].
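The following sketch builds and solves a small random instance of problem (2.6) with SciPy; all dimensions and distributions below are toy assumptions of ours (the thesis's own Matlab implementation appears in Chapter 4 and Appendix C):

# A small random instance of the D2D proactive caching LP (2.6).
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
N, M, L, T = 4, 3, 2, 5                    # users, items, locations, slots
S = rng.uniform(1.0, 2.0, M)               # item sizes S_m
Z = rng.uniform(1.0, 3.0, N)               # cache sizes Z_n
r = 0.3                                    # reward per cached byte
p = rng.dirichlet(np.ones(M), (N, T))      # p[n, t, m]: demand profile
theta = rng.dirichlet(np.ones(L), (N, T))  # theta[n, t, l]: mobility profile

# alpha[n, t, m] = sum_l theta[n,t,l] * sum_k p[k,t,m] * theta[k,t,l]  -- (2.7)
alpha = np.einsum('ntl,ktm,ktl->ntm', theta, p, theta)

# Stack x_n^m into one vector; the coefficient of x_n^m is sum_t (r - alpha).
c = (T * r - alpha.sum(axis=1)).reshape(N * M)

# Cache-size constraints: sum_m x_n^m <= Z_n, one row per user n.
A_cache = np.kron(np.eye(N), np.ones((1, M)))
# Supply constraints: sum_n theta[n,t,l] * x_n^m <= S_m for all m, l, t.
rows, rhs = [], []
for m in range(M):
    for l in range(L):
        for t in range(T):
            row = np.zeros((N, M))
            row[:, m] = theta[:, t, l]
            rows.append(row.reshape(-1))
            rhs.append(S[m])
A_ub = np.vstack([A_cache] + rows)
b_ub = np.concatenate([Z, rhs])

bounds = [(0, S[m]) for n in range(N) for m in range(M)]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
print(res.x.reshape(N, M))                 # optimal caching policy x_n^m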

Figure 2.6: An illustration of the D2D caching networks example.

Chapter 3: Large Scale Linear Programs

In the previous chapter, we discussed the basics of linear programming in de-

tail. Furthermore, we showed how to use duality theory to solve linear optimization

problems. In this chapter, we extend our discussion to consider large scale linear op-

timization problems. Many practical applications require a large number of variables

or constraints which leads to a tremendous increase in memory and computational

requirements of the system. For example, in the proactive caching problem men-

tioned in Section 2.6.3, we need to consider the case when the number of users and

the number of constraints are very large. This type of linear optimization problems

requires specialized algorithms to find an optimal solution efficiently.

Recent improvements in linear optimization techniques allow us to deal with large

scale problems. The complexity of these problems arises when the dimension of matrix

A increases. For instance, in the simplex method, identifying the entering and exiting

columns among a massive number of columns in an (m × n) matrix A consumes most

of the memory resources. A proper modification of the simplex method allows us to

find a solution to such large scale problems.

In this chapter, we present some methods for solving linear programming problems

with a large number of variables or constraints. We shed light on delayed column gen-

eration where columns of the matrix A are generated only when they are required.

We also present its dual, the "cutting plane" method, in which the feasible set is

approximated using only a subset of the constraints. We also introduce the decom-

position algorithm developed by Dantzig and Wolfe [3]. It is used for linear programming

problems whose constraints can be divided into two sets: the first set includes gen-

eral constraints Ax ≥ b, while the second set has constraints with a special structure.

These methods are illustrated through a classical application, the cutting-stock prob-

lem presented by Gilmore and Gomory [4]. We close this chapter by surveying some

of the applications to the mentioned methods in wireless communication networks.

3.1 Delayed Column Generation Method

The delayed column generation method was first presented by Dantzig and Wolfe

in 1960 [3] and by Gilmore and Gomory in 1961 [4]. This method still attracts great interest and has many recent applications in the literature [5, 6]. For example, the generalized

bin packing problem (GBPP) is a novel packing problem arising in many transporta-

tion and logistics settings, characterized by multiple item and bin attributes and the

presence of both compulsory and non-compulsory items. The computational com-

plexity and the approximability of the GBPP are discussed in [7]. A presentation of

the main mathematical models and an experimental evaluation of the main available

software tools for the one-dimensional bin packing problem is introduced in [8]. A

generalization of the classical multiple knapsack problem, in which instead of packing

single items we are packing groups of items, is discussed in [9]. Such a general model

finds applications in various practical problems, e.g., delivering bundles of goods and

also in caching networks.

Some linear programming problems become intractable because of the large num-

ber of variables involved. Moreover, it becomes more difficult to find an optimal

solution satisfying a huge number of constraints. Assuming $n \gg m$ (i.e. the number

of variables is much larger than the number of constraints), most of the variables will

be non-basic and hence only a subset of variables need to be considered when solving

the problem. Column generation leverages this idea to generate only the variables

which have the potential to improve the objective function; that is, to find variables

with negative reduced cost. The problem being solved is split into two problems: the

master problem and the sub-problem. The master problem is the original problem

with only a subset of variables being considered. The sub-problem is another problem

created to identify a new variable to enter the basis.

Now, consider the standard form problem


\[
\begin{aligned}
\underset{x}{\text{minimize}} \quad & c'x \\
\text{subject to} \quad & Ax = b \\
& x \ge 0,
\end{aligned}
\]

with $x \in \mathbb{R}^n$ and $b \in \mathbb{R}^m$, and where the rows of $A$ are linearly independent. If the number of columns of $A$ is very large, then it is not practical to generate and store the entire

matrix A in memory as done in the full tableau method for example. Moreover, in

many problems, most of the columns of A never enter the basis. Therefore, we don’t

have to generate these unused columns. In particular, the revised simplex method,

at any given iteration, requires the current basic columns and the column which is to

enter the basis. Consequently, we need an efficient method for recognizing variables

$x_i$ with negative reduced costs $\bar{c}_i$ without having to generate all columns. Sometimes,

this can be accomplished by solving the problem

\[
\underset{i}{\text{minimize}} \;\; \bar{c}_i \tag{3.1}
\]

In many cases, this optimization problem has a special structure, meaning that a smallest $\bar{c}_i$ can be found without computing every $\bar{c}_i$. If the minimum of this optimization

problem is greater than or equal to 0, all reduced costs are non-negative and we have

an optimal solution to the original linear programming problem. On the other hand,

if the minimum is negative, the variable xi corresponding to a minimizing index i

has negative reduced cost, and the column Ai can enter the basis. The key to this

approach is our ability to solve the optimization problem (3.1) efficiently without

using too much memory.

In the delayed column generation method, the columns that exit the basis are

discarded from memory. In a variant of this method, the algorithm retains in memory

all or some of the columns that have been generated in the past, and proceeds in terms

of restricted linear programming that involves only the retained columns. To clarify

the idea, let us consider a sequence of master iterations. At the beginning of a master

iteration, we have a basic feasible solution to the original problem, and an associated

basis matrix. For each master iteration, we do the steps defined in Algorithm 3.

Note that step (1) in Algorithm 3 may require going over all columns of A to find

a variable with negative reduced cost. An alternative to this method is to solve the

master problem for some set I. From this solution, we are able to obtain dual prices

for each of the constraints in the master problem (recall that the dual prices are the
elements of the vector $p' = c_B' B^{-1}$). This information is then utilized in the objective

function of the subproblem. After solving the subproblem, if the objective value of

the subproblem is negative, a variable with negative reduced cost has been identified.

Algorithm 3 Delayed Column Generation Master Iteration
1: We search for a variable with negative reduced cost, possibly by minimizing c̄i over all
i using (3.1). If none is found, the algorithm terminates and this solution is optimal.
2: Suppose that we have found some $j$ such that $\bar{c}_j < 0$. We form a collection of columns
Ai , i ∈ I, which contains all of the basic columns, the entering column Aj , and possi-
bly some other columns as well.
3: Define the restricted problem
\[
\begin{aligned}
\underset{x}{\text{minimize}} \quad & \sum_{i \in I} c_i x_i \\
\text{subject to} \quad & \sum_{i \in I} A_i x_i = b \\
& x \ge 0.
\end{aligned}
\tag{3.2}
\]

4: The basic variables at the current basic feasible solution to the original problem are
among the columns that have been kept in the restricted problem. Therefore, we have
a basic feasible solution to the restricted problem, which can be used as a starting point
for its optimal solution.
5: We perform as many simplex iterations as needed until we find an optimal solution to
the restricted problem.

This variable is then added to the master problem, and the master problem is re-

solved. Re-solving the master problem will generate a new set of dual values, and the

process is repeated until no negative reduced cost variables are identified. We can

conclude that the solution to the master problem is optimal.
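To make this loop between the master problem and the pricing sub-problem concrete, the following is a minimal sketch in Matlab (not the thesis implementation). It assumes a feasible initial set of columns and a hypothetical pricing oracle, passed as a function handle, that returns a column with negative reduced cost or an empty matrix when none exists. The sign convention of the dual multipliers returned by linprog has varied across releases, so the oracle must use the matching convention.

% A minimal sketch of delayed column generation, assuming a feasible
% set of initial columns Acols (m x n0) with costs c, demands b, and a
% hypothetical oracle: [Anew, cnew] = pricing(p) returns a new column
% with negative reduced cost, or [] when none exists.
function [x, Acols, c] = colgen(Acols, c, b, pricing)
    opts = optimset('Display', 'off');
    while true
        n = size(Acols, 2);
        % Solve the restricted master problem over the retained columns.
        [x, ~, ~, ~, duals] = linprog(c, [], [], Acols, b, ...
                                      zeros(n, 1), [], [], opts);
        p = duals.eqlin;           % dual prices of the equality constraints
        [Anew, cnew] = pricing(p); % search for a column with negative reduced cost
        if isempty(Anew)
            break;                 % all reduced costs >= 0: current solution optimal
        end
        Acols = [Acols, Anew];     % retain the generated column
        c = [c; cnew];
    end
end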

The delayed column generation method is a special case of the revised simplex method with a special rule for choosing the entering variable: priority is given to the variables $x_i$, $i \in I$, and other variables are examined only when the reduced costs of these variables are all non-negative. The motivation is to give priority to variables for which the corresponding columns have already been generated and stored in memory. There are several variants of this

method, depending on how the set I is chosen at each iteration. We summarize these

variants as follows:

(a) I is just the set of indices of the current basic variables together with the entering

variable. A variable that exits the basis is immediately dropped from the set I.

Since the restricted problem has m + 1 variables and m constraints, its feasible

set is at most one-dimensional, and it gets solved in a single simplex iteration,

that is, as soon as the column Aj enters the basis.

(b) I is the set of indices of all variables that have become basic at some point in

the past; equivalently, no variables are ever dropped, and each entering variable

is added to I. The set I keeps growing and hence this option is not preferred

when the number of master iterations is large.

(c) The set I is kept to a moderate size by dropping those variables that have exited

the basis in the remote past and have not reentered again.

These variants are guaranteed to terminate in the absence of degeneracy. In the

presence of degeneracy, cycling can be avoided by using, for example, the lexicographic tie-breaking rule [10], [11].

3.2 Cutting Plane Method

Viewed in terms of the dual variables, delayed column generation can be described as delayed constraint generation, also known as the cutting plane method. Consider the dual problem

of the standard form problem


\[
\begin{aligned}
\underset{p}{\text{maximize}} \quad & p'b \\
\text{subject to} \quad & p'A_i \le c_i, \quad i = 1, 2, \cdots, n.
\end{aligned}
\tag{3.3}
\]

We assume that it is impractical to generate and store each one of the columns Ai

because the number n is very large. Instead, we consider a subset I of {1, 2, · · · , n}

and form the relaxed dual problem
\[
\begin{aligned}
\underset{p}{\text{maximize}} \quad & p'b \\
\text{subject to} \quad & p'A_i \le c_i, \quad i \in I.
\end{aligned}
\tag{3.4}
\]

Let p∗ be an optimal basic feasible solution to the relaxed dual problem. There are

two possibilities:

(a) $p^*$ is a feasible solution to the original problem (3.3). Any other feasible solution $p$ to the original problem (3.3) is also a feasible solution to the relaxed problem (3.4), because the latter has fewer constraints. Therefore, by optimality of $p^*$ for the relaxed problem (3.4), we have $p'b \le (p^*)'b$. Hence, $p^*$ is an optimal solution to the original problem (3.3) and the algorithm terminates.

(b) If p∗ is infeasible for the original problem (3.3), we find a violated constraint,

add it to the constraints of the relaxed dual problem and continue similarly.

Therefore, we need a method for checking the feasibility of the vector $p^*$ for the original

dual problem (3.3). We also need an efficient method to identify a violated constraint.

This is known as the separation problem, because it amounts to finding a hyperplane

that separates p∗ from the dual feasible set. This can be done by solving the problem

\[
\underset{i}{\text{minimize}} \;\; c_i - (p^*)' A_i \tag{3.5}
\]

over all $i$. If the optimal cost of this problem is non-negative, we have a feasible solution to the original dual problem. If it is negative, the corresponding minimizing index $i$ satisfies $c_i < (p^*)' A_i$, and we have identified a violated constraint.

The success of this approach hinges on our ability to solve the problem (3.5) effi-

ciently; fortunately, this is sometimes possible. In addition, there are cases where the

optimization problem (3.5) is not easily solved but one can test for feasibility and

identify violated constraints using other means such as those used for integer linear

programming [12].

Applying the cutting plane method to the dual problem is identical to applying the

delayed column generation method to the primal. Furthermore, the relaxed problem

(3.4) is the dual of the restricted primal problem (3.2). In some cases, we may have

a primal problem (not in a standard form) that has relatively few variables but a

very large number of constraints. In that case, it makes sense to apply the cutting

plane algorithm to the primal; equivalently, we can form the dual problem and solve

it using the delayed column generation method.
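For illustration, the following is a minimal Matlab sketch of the cutting plane loop (not the thesis implementation); it assumes the columns $A_i$ are stored in a matrix A and that a hypothetical separation oracle is available for solving (3.5).

% A minimal sketch of the cutting plane loop of Section 3.2, assuming
% dual data (A, b, c), a starting index set I0 of retained constraints,
% and a hypothetical oracle: i = separation(p) returns an index with
% c(i) < p'*A(:,i), or [] when p is feasible for (3.3).
function p = cutting_plane(A, b, c, I0, separation)
    opts = optimset('Display', 'off');
    I = I0;
    while true
        % Relaxed dual (3.4): maximize p'b s.t. A(:,I)'*p <= c(I).
        % linprog minimizes, so we negate the objective vector b.
        p = linprog(-b, A(:, I)', c(I), [], [], [], [], [], opts);
        i = separation(p);     % solve the separation problem (3.5)
        if isempty(i)
            break;             % p is feasible, hence optimal for (3.3)
        end
        I = [I, i];            % add the violated constraint (a "cut")
    end
end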

3.3 Dantzig-Wolfe Decomposition

Another method for solving large scale linear programs is the decomposition

algorithm proposed by Dantzig and Wolfe [3]. Dantzig-Wolfe decomposition has been

an important tool to solve large structured models that could not be solved using

a standard Simplex algorithm as they exceeded the capacity of those solvers. With

the current generation of simplex and interior point LP solvers and the enormous

progress in standard hardware (both in terms of raw CPU speed and availability of

large amounts of memory), the Dantzig-Wolfe algorithm has become less popular.

The decomposition algorithm is a procedure for the solution of linear programs using

a generalized extension of the simplex method. The solution is obtained by solving

a sequence of linear programs each of smaller size than the original. Dantzig–Wolfe

decomposition relies on delayed column generation for improving the tractability of

large-scale linear programs. To illustrate the idea of this decomposition method,

consider a linear programming problem of the form
\[
\begin{aligned}
\underset{x}{\text{minimize}} \quad & c_1'x_1 + c_2'x_2 \\
\text{subject to} \quad & D_1 x_1 + D_2 x_2 = b_0 \\
& F_1 x_1 = b_1 \\
& F_2 x_2 = b_2 \\
& x_1, x_2 \ge 0.
\end{aligned}
\tag{3.6}
\]

Suppose that $x_1$ and $x_2$ are vectors of dimensions $n_1$ and $n_2$, respectively, and that

b0 , b1 , b2 have dimensions m0 , m1 , m2 , respectively. Thus, besides nonnegativity con-

straints, x1 satisfies m1 constraints, x2 satisfies m2 constraints, and x1 , x2 together

satisfy m0 coupling constraints. Note that, D1 , D2 , F1 , F2 are matrices of appropriate

dimensions. Often, the number of coupling constraints is a small fraction of the total number of constraints (i.e., $m_0$ is much smaller than $m_1 + m_2$).

The first step of this method is to introduce an equivalent problem, with fewer

equality constraints, but many more variables. The original problem is reformulated

into a master program and several subprograms. This reformulation relies on the fact that

any element of a polyhedron that has at least one extreme point can be represented

as convex combination of extreme points plus a nonnegative linear combination of

extreme rays.

Definition 8. A nonzero element $x$ of a polyhedral cone $C = \{x \in \mathbb{R}^n \mid Ax \ge 0\}$ is called an extreme ray if there are $n - 1$ linearly independent constraints that are active at $x$. Moreover, an extreme ray of the characteristic cone $C$ associated with a nonempty polyhedron $P = \{x \in \mathbb{R}^n \mid Ax \ge b\}$ is also called an extreme ray of $P$.

Now, we define
\[
P_1 := \left\{ x_1 \ge 0 : F_1 x_1 = b_1 \right\}, \qquad P_2 := \left\{ x_2 \ge 0 : F_2 x_2 = b_2 \right\},
\]
and we assume that $P_1$ and $P_2$ are non-empty. Then the problem stated in (3.6) can be

rewritten as
\[
\begin{aligned}
\underset{x}{\text{minimize}} \quad & c_1'x_1 + c_2'x_2 \\
\text{subject to} \quad & D_1 x_1 + D_2 x_2 = b_0 \\
& x_1 \in P_1, \ x_2 \in P_2.
\end{aligned}
\]
For $i = 1, 2$, let $x_i^j$, $j \in J_i$, be the extreme points of $P_i$. Let also $w_i^k$, $k \in K_i$, be a complete set of extreme rays of $P_i$. Using Minkowski's (resolution) theorem, any element $x_i$ of $P_i$ can be represented in the form


\[
x_i = \sum_{j \in J_i} \lambda_i^j x_i^j + \sum_{k \in K_i} \theta_i^k w_i^k,
\]

where the coefficients $\lambda_i^j$ and $\theta_i^k$ are nonnegative and satisfy
\[
\sum_{j \in J_i} \lambda_i^j = 1, \quad i = 1, 2. \tag{3.7}
\]

The original problem (3.6) can be rewritten as


\[
\begin{aligned}
\underset{\lambda,\,\theta}{\text{minimize}} \quad & \sum_{j \in J_1} \lambda_1^j \, c_1' x_1^j + \sum_{k \in K_1} \theta_1^k \, c_1' w_1^k + \sum_{j \in J_2} \lambda_2^j \, c_2' x_2^j + \sum_{k \in K_2} \theta_2^k \, c_2' w_2^k \\
\text{subject to} \quad & \sum_{j \in J_1} \lambda_1^j \begin{pmatrix} D_1 x_1^j \\ 1 \\ 0 \end{pmatrix} + \sum_{j \in J_2} \lambda_2^j \begin{pmatrix} D_2 x_2^j \\ 0 \\ 1 \end{pmatrix} + \sum_{k \in K_1} \theta_1^k \begin{pmatrix} D_1 w_1^k \\ 0 \\ 0 \end{pmatrix} + \sum_{k \in K_2} \theta_2^k \begin{pmatrix} D_2 w_2^k \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} b_0 \\ 1 \\ 1 \end{pmatrix} \\
& \lambda_i^j \ge 0, \ \theta_i^k \ge 0, \quad \forall i, j, k.
\end{aligned}
\tag{3.8}
\]
This problem is called the master problem. Note that the original problem has

$m_0 + m_1 + m_2$ equality constraints, while this master problem has only $m_0 + 2$ equality constraints, which are the coupling constraints plus the convexity constraints in (3.7). On

the other hand, the number of decision variables in the master problem could be ex-

tremely large because the number of extreme points and rays is usually exponential

in the number of variables and constraints. Therefore, we can see that delayed column generation is the centerpiece of the decomposition algorithm, where a column is generated only after it is found to have a negative reduced cost and is about to enter the basis. We use the revised simplex method which, at any iteration, involves only $m_0 + 2$ basic variables and a basis matrix of dimension $(m_0 + 2) \times (m_0 + 2)$.

Suppose that we have a basic feasible solution to the master problem associated

with a basis matrix $B$ and that $B^{-1}$ is available. Since we have $m_0 + 2$ equality constraints, the dual vector $p' = c_B' B^{-1}$ has dimension $m_0 + 2$. Its first $m_0$ components, denoted by $q$, are the dual variables associated with the equality coupling constraints in (3.8). The last two components, denoted by $r_1$ and $r_2$, are the dual variables associated with the convexity constraints (3.7) for $i = 1, 2$, respectively. In particular, $p = (q, r_1, r_2)$. We need to examine the reduced costs of different variables and check

whether any one of them is negative. The reduced cost of the variable λj1 is given by
\[
c_1' x_1^j - \begin{pmatrix} q' & r_1 & r_2 \end{pmatrix} \begin{pmatrix} D_1 x_1^j \\ 1 \\ 0 \end{pmatrix} = (c_1' - q' D_1) x_1^j - r_1 .
\]
Similarly, the reduced cost of the variable $\theta_1^k$ is given by
\[
c_1' w_1^k - \begin{pmatrix} q' & r_1 & r_2 \end{pmatrix} \begin{pmatrix} D_1 w_1^k \\ 0 \\ 0 \end{pmatrix} = (c_1' - q' D_1) w_1^k .
\]


Instead of evaluating the reduced cost of every variable $\lambda_1^j$ and $\theta_1^k$ and checking its sign, we form the following linear programming problem
\[
\begin{aligned}
\underset{x_1}{\text{minimize}} \quad & (c_1' - q' D_1) x_1 \\
\text{subject to} \quad & x_1 \in P_1,
\end{aligned}
\]

which is called the first subproblem and can be solved by the simplex method. Simi-

larly, for the variables $\lambda_2^j$ and $\theta_2^k$, we can form the second subproblem
\[
\begin{aligned}
\underset{x_2}{\text{minimize}} \quad & (c_2' - q' D_2) x_2 \\
\text{subject to} \quad & x_2 \in P_2,
\end{aligned}
\]
and solve it using the simplex method as well. The decomposition method is sum-

marized in Algorithm 4. Note that the sub-problems are smaller linear programming

problems that are employed as an economical search method for discovering columns

with negative reduced costs.

Algorithm 4 Dantzig-Wolfe Decomposition Algorithm


1: Start with a basic feasible solution to the master problem, the corresponding inverse
0
basis matrix B−1 and the dual vector p = (q, r1 , r2 ) = cB B−1 .
2: Form and solve the two sub-problems. If the optimal cost in the first sub-problem is
≥ r1 and the optimal cost in the second sub-problem is ≥ r2 , then all reduced costs in
the master problem are non-negative, we have an optimal solution, and the algorithm
terminates.
3: If the optimal cost in the $i$th sub-problem is $-\infty$, we obtain an extreme ray $w_i^k$, associated with a variable $\theta_i^k$, whose reduced cost is negative. This variable can enter the basis in the master problem.
4: If the optimal cost in the $i$th sub-problem is finite and less than $r_i$, we obtain an extreme point $x_i^j$, associated with a variable $\lambda_i^j$, whose reduced cost is negative. This variable can enter the basis in the master problem.
5: Having chosen a variable to enter the basis, generate the column associated with that
variable, carry out an iteration of the revised simplex method for the master problem
and update B−1 and p.
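As a rough illustration of steps 2-4 (a sketch under stated assumptions, not the thesis implementation), one pricing step for the first subproblem could be coded in Matlab as follows, assuming the data $(c_1, D_1, F_1, b_1)$ of (3.6) and the current duals $q$ and $r_1$ are available. Note that linprog signals unboundedness through its exit flag but does not return the extreme ray itself, which would have to be recovered separately.

% A hedged sketch of one pricing step of Algorithm 4 for the first
% subproblem, assuming hypothetical data c1, D1, F1, b1 and duals q, r1.
opts = optimset('Display', 'off');
cbar = c1 - D1' * q;                 % subproblem cost vector (c1' - q'D1)'
[x1, val, flag] = linprog(cbar, [], [], F1, b1, ...
                          zeros(size(c1)), [], [], opts);
if flag == -3
    % Optimal cost is -inf: an extreme ray exists and its theta
    % variable prices out negatively (step 3 of Algorithm 4).
elseif flag == 1 && val < r1
    % Extreme point x1 with negative reduced cost (step 4): generate
    % the master column [D1*x1; 1; 0] for the entering lambda variable.
end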

This method generalizes to problems of the form


\[
\begin{aligned}
\underset{x}{\text{minimize}} \quad & c_1'x_1 + c_2'x_2 + \cdots + c_t'x_t \\
\text{subject to} \quad & D_1 x_1 + D_2 x_2 + \cdots + D_t x_t = b_0 \\
& F_i x_i = b_i, \quad i = 1, 2, \cdots, t \\
& x_1, x_2, \cdots, x_t \ge 0.
\end{aligned}
\tag{3.9}
\]

The only difference is that at each iteration of the revised simplex method with the de-

layed column generation for the master problem, we may have to solve t sub-problems.

In fact, the method is applicable even if $t = 1$. Consider the linear programming problem
\[
\begin{aligned}
\underset{x}{\text{minimize}} \quad & c'x \\
\text{subject to} \quad & Dx = b_0 \\
& Fx = b \\
& x \ge 0.
\end{aligned}
\tag{3.10}
\]

The equality constraints have been partitioned into two sets, and we define the polyhedron $P = \{x \ge 0 \mid Fx = b\}$. By expressing each element of $P$ in terms of extreme

points and extreme rays, we obtain a master problem with a large number of columns,

but a smaller number of equality constraints. Searching for columns with negative

reduced cost in the master problem is then accomplished by solving a single sub-

problem, which is a minimization over the set P . This approach can be useful if the

subproblem has a special structure and can be solved very fast. Finally, note that

the decomposition method assumes that all constraints are in standard form and that the feasible sets $P_i$ of the sub-problems are also in standard form. This assumption is hardly necessary. For example, if we assume that the sets $P_i$ have at least one extreme point, the resolution theorem and the same line of development apply.

3.4 The Cutting Stock Problem

The cutting stock problem is the problem of cutting standard-sized pieces of stock

material, such as paper rolls or sheet metal, into pieces of specified sizes while min-

imizing material wasted. It is an optimization problem in mathematics that arises

from applications in industry. In terms of computational complexity, the problem

is an NP-hard problem reducible to the knapsack problem [13]. It can be formulated as an integer linear programming problem; in practice, one often solves the fractional (linear programming) relaxation and then rounds the results to integers. It was first formulated by

Kantorovich in 1939 [14]. In 1951, before computers became widely available, L. V.

Kantorovich and V. A. Zalgaller suggested solving the problem of the economical

use of material at the cutting stage with the help of linear programming [15]. The

proposed technique was later called the column generation method.

Consider a paper company that has a supply of large rolls of paper, each of width

W . We assume that W is a positive integer. The company receives customer demand

for smaller widths of paper. In particular, bi rolls of width wi , i = 1, 2, · · · , m, need to

be produced. We also assume that each wi is an integer and that wi ≤ W, ∀i. Smaller

rolls are obtained by slicing a large roll in a certain way, called a pattern. For example,

a large roll of width 70 can be cut into three rolls of widths w1 = 17 and one roll

of width w2 = 15, with a waste of 4. In general, the jth pattern can be represented

by a column vector Aj whose ith entry aij indicates how many rolls of width wi are

produced by that pattern. For example, the pattern described earlier is represented

by the vector (3, 1, 0, · · · , 0). For a vector (a1j , · · · , amj ) to be a representation of a

feasible pattern, its components must be non-negative integers and we must have
\[
\sum_{i=1}^{m} a_{ij} w_i \le W. \tag{3.11}
\]

Let n be the number of all feasible patterns and consider the m × n matrix A with

columns Aj , j = 1, 2, · · · , n. The goal of the company is to minimize the number of

large rolls used while satisfying customer demand. Let xj be the number of large rolls

cut according to pattern j. Then, the problem will be
\[
\begin{aligned}
\underset{x}{\text{minimize}} \quad & \sum_{j=1}^{n} x_j \\
\text{subject to} \quad & \sum_{j=1}^{n} a_{ij} x_j = b_i, \quad i = 1, 2, \cdots, m, \\
& x_j \ge 0, \quad j = 1, 2, \cdots, n.
\end{aligned}
\tag{3.12}
\]

Naturally, each xj should be an integer and we have an integer programming problem.

However, rounding the solution of (3.12) often provides a feasible solution to the integer

programming problem, which is fairly close to optimal at least if the demands bi are

reasonably large.

The difficulty of the problem lies in the large number of cutting patterns (columns)

that may be encountered [4]. For example, with a standard roll of 200 in. and

demands for 40 different lengths ranging from 20 in. to 80 in., the number of cutting

patterns can easily exceed 10 million or even 100 million. This happens in practical

problems and, in this case, we are facing a complicated linear programming problem.

However, the problem can be solved efficiently, by using the revised simplex method

and by generating columns of A as needed rather than in advance.

For an initial basic solution, we may let the jth pattern consist of one roll of width

wj for j = 1, 2, · · · , m, and none of the other widths. Then the first m columns of

A form a basis that leads to a basic feasible solution. Now, suppose we have a basis

matrix B and an associated basic feasible solution, and that we wish to carry out the

next iteration of the revised simplex method. Because the cost coefficient of every

variable $x_j$ is unity, every component of the vector $c_B$ is equal to 1. We compute the simplex multipliers $p' = c_B' B^{-1}$. Instead of computing the reduced cost $\bar{c}_j = 1 - p'A_j$

associated with every column (pattern) Aj , we consider the problem

\[
\underset{j}{\text{minimize}} \;\; 1 - p'A_j . \tag{3.13}
\]

This is the same as maximizing $p'A_j$ over all $j$. If the maximum is less than or

equal to 1, all reduced costs are non-negative and we have an optimal solution. On

the other hand, if the maximum is greater than 1, the column Aj corresponding to

a maximizing j has negative reduced cost and enters the basis. We now have the

problem
\[
\begin{aligned}
\underset{a}{\text{maximize}} \quad & \sum_{i=1}^{m} p_i a_i \\
\text{subject to} \quad & \sum_{i=1}^{m} w_i a_i \le W \\
& a_i \ge 0, \quad i = 1, 2, \cdots, m, \\
& a_i \ \text{integer}, \quad i = 1, 2, \cdots, m.
\end{aligned}
\tag{3.14}
\]
This problem is called the integer knapsack problem. Solving the knapsack problem

requires some effort, but for the range of numbers that arise in the cutting stock

problem, this can be done fairly efficiently. The knapsack problem has well-known

methods to solve it, such as branch and bound [16] and dynamic programming [17].
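For illustration, the following is a minimal dynamic programming sketch in Matlab (not the thesis implementation) for the unbounded integer knapsack (3.14), assuming integer widths w, prices p and an integer capacity W.

% Dynamic programming for the integer knapsack (3.14): V(c+1) holds
% the best value achievable with capacity c, choice(c+1) the item used.
function [best, a] = knapsack_dp(p, w, W)
    V = zeros(1, W + 1);
    choice = zeros(1, W + 1);
    for cap = 1:W
        for i = 1:numel(w)
            if w(i) <= cap && p(i) + V(cap - w(i) + 1) > V(cap + 1)
                V(cap + 1) = p(i) + V(cap - w(i) + 1);
                choice(cap + 1) = i;
            end
        end
    end
    best = V(W + 1);
    a = zeros(size(w));              % recover a pattern by backtracking
    cap = W;
    while cap > 0 && choice(cap + 1) > 0
        i = choice(cap + 1);
        a(i) = a(i) + 1;
        cap = cap - w(i);
    end
end

For instance, with all dual prices equal to one, knapsack_dp(ones(1,5), [20 45 50 55 75], 110) returns the pattern of five rolls of width 20, the most pieces that fit in a raw roll of width 110.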

3.5 Applications in Wireless Communication

Although the delayed column generation method goes back to the 1960s, it has recently found its way into many applications in wireless communication and machine learning. Researchers have started to pay attention to large scale linear programming techniques to solve resource allocation and caching problems. In the same vein, machine learning and big data science involve many linear problems with a very large number of variables and constraints. In this section, we shed light on

some examples to illustrate the importance of the large scale linear programming in

these fields.

Optimizing the throughput capacity over a multihop wireless network is studied

in [18]. The main thread is to apply a multi-commodity flow (MCF) formulation,

discussed in Section 2.6.2, augmented with a scheduling constraint derived from the

conflict graph associated with the network. A fundamental issue with the conflict

graph based MCF formulation is that finding all independent sets (ISs) for scheduling

is NP-hard in general. By expressing the MCF formulation in a matrix format, the

constraint matrix will contain a very large number of columns, with each IS being

associated with one column. The complexity of this approach is resolved using the

delayed column generation (DCG) method. Furthermore, the DCG method is also

applied to multi-radio multi-channel (MR-MC) networks. It was shown that the

DCG method achieves the most preferred trade-off between computation complexity

and network capacity and maintains good scalability when addressing large-scale

networks, particularly in the complex MR-MC context.

A joint power control and transmission scheduling problem in wireless networks

with average power constraints is studied in [19]. A network utility optimization problem involving time-sharing across different "transmission modes" was introduced.

The structure of the optimal solution is a time-sharing across a small set of such

modes. This structure was used to develop an efficient heuristic approach to finding

a suboptimal solution through column generation iterations. This heuristic approach

converges quite fast in simulations, and provides a tool for wireless network planning.

Routing in Delay-Tolerant Networks (DTN) has drawn much research effort re-

cently. Since many different kinds of networks fall in the DTN category, many routing

approaches have been proposed. Such systems can benefit from a previously proposed

routing algorithm based on linear programming that minimizes the average message

delay [20, 21]. This algorithm, however, is known to have performance issues that

limit its applicability to very simple scenarios. An alternative linear programming

approach for routing in Delay-Tolerant Networks is proposed in [22]. It was shown

that the proposed formulation is equivalent to that presented in a seminal work in

this area, but it contains fewer LP constraints and has a structure suitable to the

application of Column Generation (CG). Simulation shows that the proposed CG

implementation arrives at an optimal solution up to three orders of magnitude faster

than the original linear program in the considered DTN examples.

A joint caching, routing, and channel assignment for video delivery over coordi-

nated small-cell cellular systems of the future Internet is considered in [23]. The prob-

lem of maximizing the throughput of the system was formulated as a linear program

in which the number of variables is very large. To address channel interference, the

proposed formulation incorporates the conflict graph that arises when wireless links

interfere with each other due to simultaneous transmission. The column generation

method was used to solve the problem by breaking it into a restricted master subprob-

lem that involves a select subset of variables and a collection of pricing subproblems

that select the new variable to be introduced into the restricted master problem. The

proposed framework demonstrates considerable gains in average transmission rate at

which the video data can be delivered to the users, over the state-of-the-art femto-

caching systems, of up to 46%. These operational gains in system performance map

to analogous gains in video application quality, thereby enhancing the user experience

considerably.

Chapter 4: Implementation of Large Scale Linear Programs

The advances in computing in the past decades allowed us to find many software

packages to solve linear programs. Nowadays, problems with a few thousand variables and constraints can be seen as small, while problems with tens or even hundreds of thousands of variables are usually solvable. Linear programming software

packages come in two different kinds. Some of them are algorithmic codes devoted

to finding optimal solutions to specific linear programs. They take the input as a

compact list of the linear program constraint coefficients (A, b, c and related values

in the standard form) and produce the output as a compact list of optimal solution

values and related information. Other packages are considered as modeling systems

which allow people to formulate their own linear programs and analyze their solutions.

Most modeling systems support a variety of algorithmic codes, while the more popular

codes can be used with many different modeling systems. Conversion to the forms

required by algorithmic codes is done automatically in these modeling systems [24]. In

this chapter we shed light on some popular modeling software packages and illustrate

how to use them through the examples discussed in Chapter 2. We investigate how

to implement the cutting stock problem, discussed in Section 3.4, using AMPL. The

multi-commodity network flow example, discussed in Section 2.6.2, is implemented

using GAMS. The implementation of the D2D caching network example, discussed

in Section 2.6.3, is introduced using Matlab.

4.1 AMPL Programming Language

A Mathematical Programming Language (AMPL) is a modeling language that

can be used to describe and solve high-complexity problems for large-scale mathe-

matical computing (i.e., large-scale optimization and scheduling-type problems). It

was developed by Robert Fourer, David Gay, and Brian Kernighan at Bell Laborato-

ries [25]. AMPL offers an interactive command environment for setting up and solving

mathematical programming problems. A flexible interface enables several solvers to

be available, both open source and commercial software, including CBC, CPLEX,

FortMP, Gurobi, MINOS, IPOPT, SNOPT, KNITRO, and LGO. It has a syntax

very similar to the mathematical notation of optimization problems, which allows for

a very concise and readable definition of problems in the domain of optimization.

Once optimal solutions have been found, they are translated back to the modeler’s

form so that they can be viewed and analyzed easily.

AMPL has a variety of options to format data for browsing, printing reports,

or preparing input to other programs. In addition, AMPL is readily available for

experiment: the AMPL web site, www.ampl.com, provides free downloadable student

versions and representative solvers that run on Windows, Unix/Linux, and Mac OS

X. In this section, we briefly discuss how to use AMPL to model large scale linear

programs like the cutting stock problem, discussed in Section 3.4. Our objective is to cover

the main features of this software package and how to use it in solving these problems.

4.1.1 Implementation of The Cutting Stock Problem using AMPL

The cutting stock problem, discussed in Section 3.4, is a typical example to il-

lustrate the column generation method. In this problem, we wish to cut up long

raw widths of some commodity, such as rolls of paper, into a combination of smaller

widths that meet given orders with as little waste as possible. The Gilmore-Gomory

procedure defines a cutting pattern to be any feasible way in which a raw roll can be

cut. Thus, a pattern is a vector consisting of a certain number of rolls of each desired

width, such that their total width does not exceed the raw width. The Gilmore-

Gomory procedure consists of a main problem and a knapsack sub-problem. The

main problem finds the minimum number of raw rolls that need be cut, given a

collection of known cutting patterns that may be used. The sub-problem seeks to

identify a new pattern that can be used in the cutting optimization, either to reduce

the number of raw rolls needed, or to determine that no such new pattern exists. The

variables of this model are the numbers of each desired width in the new pattern; the

feasibility constraint (3.11) ensures that the total width of the pattern does not exceed

the raw width. This procedure is described in the following algorithm.

Algorithm 5 The Gilmore-Gomory Procedure


Pick initial patterns sufficient to meet demand
repeat
Solve the (fractional) cutting stock optimization problem
Let price[i] equal Fill[i].dual for each width i
Solve the pattern generation sub-problem
if the optimal value is < 0 then
add a new pattern that cuts Use[i] rolls of each width i
else
find a final integer solution and stop
end if
until stop

The complete implementation code is provided in Appendix A. AMPL allows us

to define two problem statements, one for the main problem and another one for the

sub-problem.

problem Cutting_Opt: Cut, Number, Fill;

option relax_integrality 1;

problem Pattern_Gen: Use, Reduced_Cost, Width_Limit;

option relax_integrality 0;

The first statement defines a problem named Cutting Opt that consists of the

Cut variables, the Fill constraints, and the objective Number. This is defined in

the statement

minimize Number:

sum {j in PATTERNS} Cut[j];

subject to Fill {i in WIDTHS}:

sum {j in PATTERNS} nbr[i,j] * Cut[j] >= orders[i];

Comparing the definition of the Cutting Opt problem with (3.12), we see that

Number is the objective function, Cut represents the optimization variables xj , Fill

represents the constraint where nbr are the coefficients aij and orders are the con-

straint values bi . In a similar way, we define a problem Pattern Gen that consists of

the Use variables, the Width Limit constraint, and the objective Reduced Cost.

It is defined as

minimize Reduced_Cost:

1 - sum {i in WIDTHS} price[i] * Use[i];

subject to Width_Limit:

sum {i in WIDTHS} i * Use[i] <= roll_width;

Comparing the definition of the Pattern Gen problem with (3.14), we see that

Use corresponds to ai , price corresponds to pi and roll width corresponds to W .

The for loop creates the initial cutting patterns, after which the main repeat loop

carries out the Gilmore-Gomory procedure as described previously. The statement

solve Cutting_Opt;

sets the Cutting Opt as the current problem, along with its environment, and

solves the associated linear program. A similar statement is defined for the Pattern Gen

problem. An example for this problem is for a roll width of 110 and required demands

of 48, 35, 24, 10 and 8 for finished rolls of widths 20, 45, 50, 55 and 75, respectively.

Running the script mentioned in Appendix A, we get the following result

CPLEX 12.6.3.0: optimal solution; objective 52.1

0 dual simplex iterations (0 in phase I)

CPLEX 12.6.3.0: optimal integer solution; objective -0.2

2 MIP simplex iterations

0 branch-and-bound nodes

No basis.

CPLEX 12.6.3.0: optimal solution; objective 50.5

2 dual simplex iterations (0 in phase I)

CPLEX 12.6.3.0: optimal integer solution; objective -0.2

0 MIP simplex iterations

0 branch-and-bound nodes

No basis.

CPLEX 12.6.3.0: optimal solution; objective 47

1 dual simplex iterations (0 in phase I)

CPLEX 12.6.3.0: optimal integer solution; objective -0.1

1 MIP simplex iterations

0 branch-and-bound nodes

No basis.

CPLEX 12.6.3.0: optimal solution; objective 46.25

2 dual simplex iterations (0 in phase I)

CPLEX 12.6.3.0: optimal integer solution; objective -1e-06

9 MIP simplex iterations

0 branch-and-bound nodes

No basis.

nbr [*,*]:=

: 1 2 3 4 5 6 7 8

20 5 0 0 0 0 1 1 3

45 0 2 0 0 0 0 2 0

50 0 0 2 0 0 0 0 1

55 0 0 0 2 0 0 0 0

75 0 0 0 0 1 1 0 0;

Cut [*] := 1 0 2 0 3 8.25 4 5 5 0 6 8 7 17.5 8 7.5;

The final fractional solution means that the pattern (0, 0, 2, 0, 0) is cut 8.25 times, the pattern (0, 0, 0, 2, 0) is cut 5 times, the pattern (1, 0, 0, 0, 1) is cut 8 times, the pattern (1, 2, 0, 0, 0) is cut 17.5 times and, finally, the pattern (3, 0, 1, 0, 0) is cut 7.5 times. The best fractional solution cuts 46.25 raw rolls in five different patterns, using 48 rolls if the fractional values are rounded up to the next integer.

4.2 GAMS Programming Language

The General Algebraic Modeling System (GAMS) is a high-level modeling system

for mathematical optimization. GAMS is designed for modeling and solving linear,

nonlinear, and mixed-integer optimization problems. The system is tailored for com-

plex, large-scale modeling applications and allows the user to build large maintainable

models that can be adapted to new situations. GAMS was first presented at the In-

ternational Symposium on Mathematical Programming (ISMP), Budapest, Hungary

in 1976. GAMS allows the user to concentrate on the modeling problem by making

the setup simple. The system takes care of the time-consuming details of the specific

machine and system software implementation. GAMS contains an integrated devel-

opment environment (IDE) and is connected to a group of third-party optimization

solvers. Among these solvers are BARON, COIN-OR solvers, CONOPT, CPLEX,

DICOPT, Gurobi, MOSEK, SNOPT, SULUM, and XPRESS [26]. We illustrate how

to use GAMS to implement the multi-commodity network flow example, discussed in

Section 2.6.2, using the Dantzig-Wolfe decomposition algorithm.

4.2.1 Implementation of Dantzig-Wolfe Decomposition Method using GAMS

The implementation of Dantzig-Wolfe decomposition algorithm was discussed in

[27]. In this section, we highlight how GAMS can be used to implement a multi-

commodity network flow problem. The definition of this problem was discussed in

Section 2.6.2 and the complete implementation code is provided in Appendix B. In

the beginning, we define the settings of the problem including the number of nodes

and commodities. In this example, we have 20 nodes and 5 commodities.

$if NOT set nodes $set nodes 20

$if NOT set comm $set comm 5

GAMS allows us to specify indices in a straightforward way: declare and name

the set (here, i, k and e(i, k)), and enumerate their elements.

sets i nodes / n1*n%nodes% /

k commodities / k1*k%comm% /

e(i,i) edges

alias (i,j)

Indexed parameters are defined to store the cost of each link cij , the balance bki ,

the demand bk and the capacity uij . Notice that the commodity is indexed by k only

instead of using k and l as discussed in Section 2.6.2. GAMS also allows us to place

explanatory text (shown in lower case) throughout the model, as we develop it. These

comments are automatically incorporated into the output report, at the appropriate

places.

parameters

cost(i,j) cost for edge use

bal(k,i) balance

kdem(k) demand

cap(i,j) bundle capacity ;

Decision variables are expressed with their indices specified, where cost cor-

responds to cij in (2.5), bal corresponds to bki , kdem corresponds to bk and cap

corresponds to uij . From this general form, GAMS generates each instance of the

variable in the domain. Variables are specified as to type: FREE, POSITIVE, NEG-

ATIVE, BINARY, or INTEGER. The default is FREE. The objective variable (z,

here) is simply declared without an index. Here, the optimization variable x is defined

to be a positive variable, which enforces the nonnegativity (feasibility) constraint.

variables

x(k,i,j) multi commodity flow

z objective

positive variable x;

The objective function and constraint equations are first declared by giving them

names. Then their general algebraic formulae are described. GAMS now has enough

information (from data entered above and from the algebraic relationships specified in

the equations) to automatically generate each individual constraint statement. Notice

that these equations are typically defined as mentioned in Section 2.6.2.

equations

defbal(k,i) balancing constraint

defcap(i,j) bundling capacity

defobj;

defobj.. z =e= sum((k,e), cost(e)*x(k,e));

defbal(k,i).. sum(e(i,j), x(k,e)) - sum(e(j,i),x(k,e))

=e= bal(k,i);

defcap(e).. sum(k, x(k,e)) =l= cap(e);

The model is given a unique name (here, mcf multi-commodity flow problem),

and the modeler specifies which equations should be included in this particular for-

mulation. In this case we specified ALL which indicates that all equations are part

of the model.

model mcf multi-commodity flow problem /all/;

A random instance is generated here for testing. However, we could set exact

values for the link cost cij , the balance bki , the demand bk and the capacity uij . The

model checks whether the generated instance is feasible. In this case, the model is

solved by this statement

solve mcf min z using lp;

The solve statement tells GAMS which model to solve, selects the solver to use (in

this case an LP solver), indicates the direction of the optimization, either MINIMIZ-

ING or MAXIMIZING, and specifies the objective variable. We have two problems

here to solve, a master problem and a pricing sub-problem. Corresponding indices,

parameters, variables and equations are defined for each problem. These problems are

defined by the statements

model master / mdefobj, mdefbal, mdefcap /;

model pricing / pdefobj, pdefbal /;

The steps defined in Algorithm 4 are implemented in the last part of the model.

The master problem and the pricing sub-problem are solved by the statements

solve master using lp minimizing z;

solve pricing using lp minimizing z;

where each statement is placed at its appropriate point in the code (see Appendix B). Running this code on the GAMS platform generates a report that includes many results. We emphasize here the most important messages in this report:

S O L V E S U M M A R Y

MODEL mcf OBJECTIVE z

TYPE LP DIRECTION MINIMIZE

SOLVER CPLEX FROM LINE 81

**** SOLVER STATUS 1 Normal Completion

**** MODEL STATUS 1 Optimal

**** OBJECTIVE VALUE 1726.1151

RESOURCE USAGE, LIMIT 0.078 1000.000

ITERATION COUNT, LIMIT 10 2000000000

**** REPORT SUMMARY : 0 NONOPT

0 INFEASIBLE

0 UNBOUNDED

---- 84 PARAMETER xsingle single solve

n1 n4 n5 n6 n8

k1.n2 123.400

k2.n4 29.264

k2.n5 29.264 100.296

k3.n6 51.062

k5.n2 52.463

k5.n8 52.463

---- 203 PARAMETER xall summary of flows

single serial

k1.n2.n5 123.400 123.400

k2.n4.n6 29.264 29.264

k2.n5.n4 29.264 29.264

k2.n5.n6 100.296 100.296

k3.n6.n5 51.062 51.062

k5.n2.n8 52.463 52.463

k5.n8.n1 52.463 52.463

The report shows that the CPLEX solver was used to find the optimal solution, with a final objective value of 1726.1151. The optimal flows for the randomly generated instance are also included.

4.3 Matlab Programming Language

Another important platform which allows us to solve large scale linear programs is

the Matrix Laboratory or Matlab. Matlab is a multi-paradigm numerical computing

environment and a programming language developed by MathWorks. Matlab allows

matrix manipulations, plotting of functions and data, implementation of algorithms,

creation of user interfaces, and interfacing with programs written in other languages,

including C, C++, Java, Fortran, and Python. Matlab includes an optimization tool-

box which provides functions for solving constrained and unconstrained optimization

problems. The toolbox includes solvers for linear programming, mixed-integer linear

programming, quadratic programming, nonlinear optimization, and nonlinear least

squares.

Linear optimization problems can be solved using the linprog function from

the toolbox. The optimization toolbox includes three algorithms used to solve linear

programming problems:

• The simplex algorithm is a systematic procedure for generating and testing

candidate vertex solutions to a linear program. The simplex algorithm is the

most widely used algorithm for linear programming.

• The interior point algorithm is based on a primal-dual predictor-corrector algo-

rithm used for solving linear programming problems. Interior point is especially

useful for large-scale problems that have structure or can be defined using sparse

matrices.

• The active-set algorithm minimizes the objective at each iteration over the ac-

tive set (a subset of the constraints that are locally active) until it reaches a

solution.

The syntax of this function is as follows

[x,fval] = linprog(f,A,b,Aeq,beq,lb,ub,x0,options);

which minimizes the linear objective $f'x$ subject to $Ax \le b$. It also includes equality constraints $A_{eq} x = b_{eq}$. It defines a set of lower and upper bounds on the design variables, $x$, so

that the solution is always in the range lb ≤ x ≤ ub. Moreover, x0 is the starting

point from which the algorithm starts searching for the optimal solution. The options

of this optimization function are defined in options using the optimset function.

There are many options that can be set using this function. We focus here on two

of them, Algorithm and LargeScale. This function returns the optimal value of

decision parameter x corresponding to the optimal value of objective function fval.

We can choose the optimization algorithm from one of interior-point-legacy

(default), interior-point, dual-simplex, active-set or simplex. The

first three algorithms are large-scale algorithms, while last two algorithms are not.

An optimization algorithm is large scale when it uses linear algebra that does not

need to store, nor operate on, full matrices. This may be done internally by storing

sparse matrices, and by using sparse linear algebra for computations whenever pos-

sible. Furthermore, the internal algorithms either preserve sparsity, such as a sparse

Cholesky decomposition, or do not generate matrices, such as a conjugate gradient

method. The option LargeScale can be set to ’on’ (default), with one of the

mentioned large scale algorithms, to solve large size problems or ’off’ when we

intend to solve medium or small size problems.
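As a toy usage example (not tied to the thesis models), the following solves a small linear program with the same calling convention used in Section 4.3.1.

% Minimize -x1 - 2*x2 subject to x1 + x2 <= 4, x1 <= 3, 0 <= x <= 4.
f = [-1; -2];
A = [1 1; 1 0];
b = [4; 3];
lb = zeros(2, 1);
ub = 4 * ones(2, 1);
x0 = [];                       % empty starting point; ignored by most algorithms
options = optimset('Algorithm','dual-simplex', ...
    'Display','off','LargeScale','on');
[x, fval] = linprog(f, A, b, [], [], lb, ub, x0, options);
% Expected optimum: x = [0; 4] with fval = -8.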

4.3.1 Matlab Implementation of D2D Caching Example

In this section we shed light on how Matlab can help us to solve the D2D caching

problem discussed in Section 2.6.3. When the number of users and the number of data

items grow large in this problem we have a large scale linear optimization problem.

A complete implementation code is provided in Appendix C. We consider the case

when $N = 1000$ and $M = 10^5$, and solve the problem while increasing the number of

users involved in the system. We generate a random instance of the demand and

mobility profiles using the demand gen and mobility gen functions respectively.

We assume that each user can store up to 10% of these data items in his device. We

also assume that the carrier pays back 0.5 units for each byte cached and shared by

every user (i.e. r = 0.5). We initialize some variables to store the values obtained

after each run of the loop.

X_optimal = zeros(NMax,NMax*M);

Cost_optimal = zeros(1,NMax);

Gain_optimal = zeros(1,NMax);

Memory_optimal = zeros(1,NMax);

Each time we run the loop, the statistics corresponding to involved users are

captured to prepare alpha. After that we set the parameters of the optimization

function linprog. We set an upper and lower bound on the decision parameter for

the feasibility of the solution. These bounds represent the third constraint in (2.6).

LB = zeros(1,N*M);

UB = zeros(1,N*M);

for n=1:N

UB(M*(n-1)+1:M*n) = Sm;

end

We define the inequality constraints by generating two matrices, A1 and A2.

Matrix A1 is for the memory constraint which is the first constraint in (2.6). Matrix

A2 is the second constraint in (2.6). These two matrices are merged in one matrix to

be set as an option to the linprog function.

A = [A1 ; A2];

b = [b1; b2];

An initial point x0 = (0, 0, · · · , 0) is chosen. The most important part of this code

is to choose the optimization algorithm and to turn the large scale option to ’on’

options = optimset('Algorithm','dual-simplex', ...
    'Display','off','LargeScale','on');

Now, everything is ready to call the linprog function and solve the problem

[xopt,costP]=linprog(cost_fun,A,b,[],[],LB,UB,x0,options);

This function returns the optimal solution xopt and the optimal value of the

cost function costP. The results of this system are depicted in Figure 4.1. The

carrier achieves more gain and uses less memory when more users are engaged in the

network. Moreover, we notice that memory usage decays as $O(1/N)$. More users help all parties to gain more and, at the same time, less memory is required for this caching

as the network expands. When a user requests a certain data item and more users are

located around him, he gets that item either from his local cache or from other users

through the D2D communication. This helps the carrier to smooth out the network

load and minimize the incurred service cost. Notice that the LargeScale option of

Matlab allowed us to find the optimal solution even when the number of users N and

the number of data contents M increase. We refer the reader to [2] for more details.

[Figure 4.1: System Performance of the D2D Caching Network. Two panels plot the carrier gain (%) and the memory usage (%) against the number of users N.]
Appendix A: AMPL Implementation of Column Generation

The Data File:

data;

# -----------------------------------------
# SETTING THE ROLL WIDTH AND ORDER DETAILS
# -----------------------------------------
param roll_width := 110 ;
param: WIDTHS: orders :=
20 48
45 35
50 24
55 10
75 8 ;

The Run File:

# ----------------------------------------
# SETTING MODEL FILE AND SOLVER
# ----------------------------------------
model cut.mod;
data cut.dat;
option solver cplex, solution_round 6;
option display_1col 0, display_transpose -10;

# ----------------------------------------
# DEFINING THE PROBLEMS
# ----------------------------------------
problem Cutting_Opt: Cut, Number, Fill;
option relax_integrality 1;
problem Pattern_Gen: Use, Reduced_Cost, Width_Limit;
option relax_integrality 0;

# ----------------------------------------
# GENERATING THE PATTERNS
# ----------------------------------------
let nPAT := 0;
for {i in WIDTHS} {
let nPAT := nPAT + 1;
let nbr[i,nPAT] := floor (roll_width/i);
let {i2 in WIDTHS: i2 <> i} nbr[i2,nPAT] := 0;
}

# ----------------------------------------
# RUNNING THE PROCEDURE
# ----------------------------------------
repeat {
solve Cutting_Opt;
let {i in WIDTHS} price[i] := Fill[i].dual;

solve Pattern_Gen;
if Reduced_Cost < -0.00001 then {
let nPAT := nPAT + 1;
let {i in WIDTHS} nbr[i,nPAT] := Use[i];
}
else break;
}
display nbr, Cut;

# ----------------------------------------
# FINAL ROUND AND OUTPUT DISPLAY
# ----------------------------------------
option Cutting_Opt.relax_integrality 0;
solve Cutting_Opt;
display Cut;

The Mod File:

# ----------------------------------------
# CUTTING STOCK USING PATTERNS
# ----------------------------------------
param roll_width > 0; # width of raw rolls
set WIDTHS; # set of widths to be cut
param orders {WIDTHS} > 0; # number of each width to be cut
param nPAT integer >= 0; # number of patterns
set PATTERNS = 1..nPAT; # set of patterns
param nbr {WIDTHS,PATTERNS} integer >= 0;

check {j in PATTERNS}:
sum {i in WIDTHS} i * nbr[i,j] <= roll_width;
# defn of patterns: nbr[i,j] = number
# of rolls of width i in pattern j
var Cut {PATTERNS} integer >= 0; #rolls cut using each pattern
minimize Number: # minimize total raw rolls cut
sum {j in PATTERNS} Cut[j];
subject to Fill {i in WIDTHS}:
sum {j in PATTERNS} nbr[i,j] * Cut[j] >= orders[i];
# for each width, total
# rolls cut meets total orders

# ----------------------------------------
# KNAPSACK SUBPROBLEM FOR CUTTING STOCK
# ----------------------------------------
param price {WIDTHS} default 0.0;
var Use {WIDTHS} integer >= 0;
minimize Reduced_Cost:
1 - sum {i in WIDTHS} price[i] * Use[i];
subject to Width_Limit:
sum {i in WIDTHS} i * Use[i] <= roll_width;

Appendix B: GAMS Implementation of Multi-Commodity Network
Flow Problem

# ----------------------------------------
# Problem Settings
# ----------------------------------------
$Eolcom !
$setddlist nodes comm maxtime
$if NOT set nodes $set nodes 20
$if NOT set comm $set comm 5
$if NOT set maxtime $set maxtime 50
$if NOT errorfree $abort wrong double dash parameters:
--nodes=n --comm=n --maxtime=secs

# ----------------------------------------
# Defining SETS
# ----------------------------------------
sets i nodes / n1*n%nodes% /
k commodities / k1*k%comm% /
e(i,i) edges
alias (i,j)

# ----------------------------------------
# Defining Indexed Parameters
# ----------------------------------------
parameters
cost(i,j) cost for edge use
bal(k,i) balance
kdem(k) demand
cap(i,j) bundle capacity ;

# ----------------------------------------
# Declaring Variables
# ----------------------------------------
variables

x(k,i,j) multi commodity flow
z objective
positive variable x;

# ----------------------------------------
# Objective Functions and Constraints
# ----------------------------------------
equations
defbal(k,i) balancing constraint
defcap(i,j) bundling capacity
defobj;
defobj.. z =e= sum((k,e), cost(e)*x(k,e));
defbal(k,i).. sum(e(i,j),x(k,e))-sum(e(j,i),x(k,e))=e=bal(k,i);
defcap(e).. sum(k, x(k,e)) =l= cap(e);

# ----------------------------------------
# Defining Model
# ----------------------------------------
model mcf multi-commodity flow problem /all/;

# ----------------------------------------
# Making a Random Instance
# ----------------------------------------
scalars inum, edgedensity /0.3/ ;
e(i,j) = uniform(0,1) < edgedensity; e(i,i) = no;
cost(e) = uniform(1,10);
cap(e) = uniform(50,100)*log(card(k));
loop(k,
kdem(k) = uniform(50,150);
inum = uniformInt(1,card(i));
bal(k,i)$(ord(i)=inum) = kdem(k);
inum = uniformInt(1,card(i));
bal(k,i)$(ord(i)=inum) = bal(k,i) - kdem(k);
kdem(k) = sum(i$(bal(k,i)>0), bal(k,i)) );

# ----------------------------------------
# See if the random model is feasible
# ----------------------------------------
option limrow=0, limcol=0;
option solprint=off, solvelink=%solvelink.CallModule%;
solve mcf min z using lp;
abort$(mcf.modelstat <> %modelstat.Optimal%)
'problem not feasible. Increase edge density.'
parameter xsingle(k,i,j) single solve;

xsingle(k,e) = x.l(k,e)$[x.l(k,e) > 1e-6];
display$(card(i) < 30) xsingle;

# ----------------------------------------
# Define Master Model
# ----------------------------------------
set p paths idents / p1*p100 /
ap(k,p) active path
pe(k,p,i,j) edge path incidence vector
parameter
pcost(k,p) path cost
positive variable xp(k,p), slack(k);
equations
mdefcap(i,j) bundle constraint
mdefbal(k) balance constraint
mdefobj objective;
mdefobj.. z=e=sum(ap,pcost(ap)*xp(ap))+sum(k,999*slack(k));
mdefbal(k).. sum(ap(k,p), xp(ap)) + slack(k) =e= kdem(k);
mdefcap(e).. sum(pe(ap,e), xp(ap)) =l= cap(e);
model master / mdefobj, mdefbal, mdefcap /;

# ----------------------------------------
# Define Pricing Model: Shortest Path
# ----------------------------------------
parameter ebal(i)
positive variable xe(i,j)
equations
pdefbal(i) balance constraint
pdefobj objective;
pdefobj.. z =e= sum(e, (cost(e)-mdefcap.m(e))*xe(e));
pdefbal(i).. sum(e(i,j), xe(e)) - sum(e(j,i),xe(e))=e=ebal(i);
model pricing / pdefobj, pdefbal /;

# ----------------------------------------
# Solving Master and Pricing Problems
# ----------------------------------------
Scalar done loop indicator, iter iteration counter;
Set nextp(k,p) next path to be added ;
* clear path data
done = 0; iter = 0;
ap(k,p) = no; pe(k,p,e) = no;
pcost(k,p) = 0;
nextp(k,p) = no; nextp(k,’p1’) = yes;
While(not done, iter=iter+1;

solve master using lp minimizing z;
done = 1;
loop(k$kdem(k),
ebal(i) = bal(k,i)/kdem(k);
solve pricing using lp minimizing z;
pricing.solprint=%solprint.Quiet%;
! turn off all output for the pricing model
if (mdefbal.m(k) - z.l > 1e-6, ! add new path
ap(nextp(k,p)) = yes;
pe(nextp(k,p),e) = round(xe.l(e));
pcost(nextp(k,p)) = sum(pe(nextp,e), cost(e));
nextp(k,p) = nextp(k,p-1);
! bump the path to the next free one
abort$(sum(nextp(k,p),1)=0) 'set p too small';
done = 0 ) ) );
abort$(abs(master.objval-mcf.objval)>1e-3)
'different objective values', master.objval, mcf.objval;
parameter xserial(k,i,j);
xserial(k,e) = sum(pe(ap(k,p),e), xp.l(ap));
display$(card(i) < 30) xserial;

Appendix C: Matlab Implementation of D2D Caching Example

The Main File:

% ----------------------------------------
% System Parameters
% ----------------------------------------
clc, clear all; close all; % Clearing Everything
NInit = 5; % Initial Number of Users
step = 5; % Step Size in For Loop
L = 4; % No. of Locations
NMax = 1000; % No. of Users
M = 1e05; % No. of Data Items
Theta_all = mobility_gen(NMax,L,1); % Mobility Profile
P_all = demand_gen(NMax,M,1); % Demand Profile
Sm = 100*ones(1,M); % Data Items Sizes
Zn_all = (M/10)*100*ones(NMax,1); % Memory Sizes
r = 0.50; % Reward Factor

% ----------------------------------------
% Initialization
% ----------------------------------------
X_optimal = zeros(NMax*M,NMax); % one column of caching decisions per N
Cost_optimal = zeros(1,NMax);
Gain_optimal = zeros(1,NMax);
Memory_optimal = zeros(1,NMax);

% ----------------------------------------
% Running Loop on No. of Users
% ----------------------------------------
for N=NInit:step:NMax
disp(['N = ' num2str(N)]);
% Taking Chunk
PIndex = [];
for m=1:M
PIndex = [PIndex (1:N)+(m-1)*NMax];

end
P = P_all(PIndex);
TIndex = [];
for l=1:L
TIndex = [TIndex (1:N)+(l-1)*NMax];
end
Theta = Theta_all(TIndex);
Zn = Zn_all(1:N);
% Caching Decisions
x = zeros(1,N*M);
% Preparing Matrix (S)
S = zeros(1,M*L);
for m=1:M
S((m-1)*L+1:m*L)=Sm(m)*ones(1,L);
end
% Preparing Matrix (I) & (II)
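% Each L-by-N block of I weighs the users' caching decisions for one
% item by the mobility profile Theta, giving the expected amount of
% that item available at each location; II stacks the transposed blocks.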
I = zeros(M*L,M*N);
II = zeros(M*N,M*L);
for m=1:M
I((m-1)*L+1:m*L,(m-1)*N+1:m*N) = reshape(Theta,N,L)';
II((m-1)*N+1:m*N,(m-1)*L+1:m*L) = reshape(Theta,N,L);
end
% Preparing Matrix (III)
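% III(m,l) pairs the demand profile P with the mobility profile Theta,
% i.e. the expected demand for item m generated at location l.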
III = zeros(1,M*L);
for m=1:M
for l=1:L
III((m-1)*L+l) = P((m-1)*N+1:m*N)*Theta((l-1)*N+1:l*N)';
end
end
% Preparing Alpha
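% alpha maps each caching variable to the expected demand it can serve
% locally; it offsets the reward factor r in the proactive cost below.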
alpha = III*I;
% Reactive Load Calculation
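% Reactive load: total traffic when every request is served on demand,
% i.e. item sizes weighted by the aggregate demand profile.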
loadR=0;
for m=1:M
loadR = loadR + Sm(m)*ones(1,N)*P((m-1)*N+1:m*N)';
end
% Reactive Cost (Linear)
costR = loadR;

% ----------------------------------------
% Optimization Parameters
% ----------------------------------------
% Upper and Lower Bounds
LB = zeros(1,N*M);

UB = zeros(1,N*M);
% the decision vector is item-major: entry (m-1)*N+n is user n's cache of item m
for m=1:M
UB((m-1)*N+1:m*N) = Sm(m)*ones(1,N);
end
% Constraint 1
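% Constraint 1: the total volume cached by each user, summed over all
% items, may not exceed that user's memory size Zn.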
A1 = zeros(N,N*M);
for m=1:M
A1(1:N,(m-1)*N+1:m*N)=eye(N,N);
end
b1 = Zn(1:N);
% Constraint 2
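% Constraint 2: the expected amount of each item cached per location
% (matrix I applied to x) may not exceed the item size.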
A2 = I;
b2 = S';
% Constraint Matrix
A = [A1 ; A2];
b = [b1; b2];
% Initial Point
x0 = zeros(1,N*M);
% Option Setting
options = optimset('Algorithm','dual-simplex', ...
    'Display','off','LargeScale','on');

% ----------------------------------------
% Solving the Problem - Proactive Cost
% ----------------------------------------
cost_fun = r - alpha;
[xopt,costP] = linprog(cost_fun,A,b,[],[],LB,UB,x0,options);
costP = costP + costR;
X_optimal(1:N*M,N) = xopt;
Cost_optimal(N) = costP;
Gain_optimal(N) = 100*(costR-costP)/costR;
Memory_optimal(N) = 100*sum(xopt)/sum(Zn(1:N));
end

Cost_LB = Cost_optimal;
X_LB = X_optimal;

% ----------------------------------------
% Plotting Results
% ----------------------------------------
figure(3); subplot(1,2,1);
plot(NInit:step:NMax, Gain_optimal(NInit:step:NMax), ...
    'r-','LineWidth',2);
grid on; xlabel('No. of Users (N)'); ylabel('Gain (%)');

title('Carrier Gain vs No. of Users (N)');
figure(3); subplot(1,2,2);
plot(NInit:step:NMax, Memory_optimal(NInit:step:NMax), ...
    'b-','LineWidth',2);
grid on; xlabel('No. of Users (N)'); ylabel('Memory Usage (%)');
title('Overall Memory Usage vs No. of Users (N)');

Demand Generation Function:


function P = demand_gen(N,M,S)
P = zeros(S,N*M);
for s=1:S
for m=1:M
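% zipf is a user-supplied helper (not listed here); it is assumed to
% return N Zipf-distributed popularity weights for the given exponent.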
Z = zipf(rand,1,N);
P(s,(m-1)*N+1:m*N) = Z(randperm(N))./max(Z);
end
end
end
Mobility Generation Function:
function T = mobility_gen(N,L,S)
T = rand(S,L*N);
M = zeros(N,N*L);
if N==1
M = ones(1,L); % with a single user, temp reduces to the sum over locations
else
for l=1:L
M(1:N,(l-1)*N+1:l*N)=eye(N,N);
end
end
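% Normalize each user's entries so its probabilities over the L
% locations sum to one.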
for s=1:S
temp = M*T(s,:)';
for l=1:L
T(s,(l-1)*N+1:l*N) = T(s,(l-1)*N+1:l*N)./temp';
end
end
end
