Chapter 1 Introduction
MATHEMATICS
MATH3082 Optimization
2021-22
MATH3082: Optimization (2020-21)
Lecturers
Professor Hou-Duo Qi (for Linear Programming), Email: H.Qi@soton.ac.uk
Dr Vuong Phan (for Nonlinear Programming), Email: T.V.Phan@soton.ac.uk
Lectures
See your timetable
Content
Ch.1: Introduction to Optimization
References
[1] R. Vanderbei, Linear Programming: Foundations and Extensions. (An excellent online book.)
[2] J. Nocedal and S. Wright, Numerical Optimization. (1st and 2nd ed. in Hartley Library.)
1 Introduction to Optimization
"Nothing happens in the universe that does not have a sense of either certain maximum or minimum." L. Euler, Swiss mathematician and physicist, 1707-1783.
In this chapter, we start with a few interesting problems that can be modelled by linear or
nonlinear programming. Many other examples can be found in standard textbooks on
optimization (e.g., the books recommended above). We then give a formal definition of linear
programming (LP). We finish this chapter by introducing the graphical method for LPs with
just two variables. It will motivate us to study the Simplex method in the next chapter. General
nonlinear programming will be introduced in later chapters.
Ax = b.
Suppose n ≫ m and that the number of columns of A used is very small. This means that we would
like to decode x with a small number of non-zeros in x. Let ‖x‖0 (known as the zero-norm of x)
be the count of the non-zero elements of x. We would like to seek a solution x that has small
s = ‖x‖0 .
This problem can be solved (under some reasonable conditions) by the ℓ1 minimization:

min ‖x‖1 subject to Ax = b.

For example, take m = 100, n = 1000 and s = 20. That is, the system has 100 equations and 1000 variables, and we would like to find a solution which has
only 20 non-zero elements. Hence, the solution is sparse compared to its dimension n = 1000.
You may wonder why ℓ1 minimization works for this case. Let us look at a simple example in
2 dimensions (Fig. 1.2), where we have only one linear equation. It can be clearly seen that the
optimal solution has one element (out of 2) taking the value 0.
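This can also be checked computationally. Below is a minimal sketch (an illustration, not part of the notes) that solves a 2-dimensional instance like that of Fig. 1.2: minimize |x1| + |x2| subject to a single assumed equation x1 + 2x2 = 4, rewritten as an LP with auxiliary variables ti ≥ |xi|.

```python
# Sketch: solve min ||x||_1 s.t. a^T x = b as an LP via SciPy.
# Variables: (x1, x2, t1, t2); minimize t1 + t2 with |x_i| <= t_i.
from scipy.optimize import linprog

c = [0, 0, 1, 1]                       # objective: t1 + t2
# x_i - t_i <= 0 and -x_i - t_i <= 0 encode |x_i| <= t_i
A_ub = [[1, 0, -1, 0],
        [-1, 0, -1, 0],
        [0, 1, 0, -1],
        [0, -1, 0, -1]]
b_ub = [0, 0, 0, 0]
A_eq = [[1, 2, 0, 0]]                  # assumed equation: x1 + 2*x2 = 4
b_eq = [4]
bounds = [(None, None), (None, None), (0, None), (0, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(res.x[:2])   # the optimal x = (0, 2): one element is exactly 0
```

The minimizer (0, 2) has ℓ1-norm 2, smaller than any dense solution of the equation, which is precisely the sparsity-promoting effect shown in Fig. 1.2.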
wT ai ≥ γ, for ai ∈ S1 ,
Figure 1.1: ℓ1 minimization leads to sparse solutions
and
wT bi ≤ γ, for bi ∈ S2 .
There is a trivial solution (w, γ) = (0, 0) to the above conditions. To guard against a trivial
answer, we seek to enforce the stronger conditions:
wT ai ≥ γ + 1, for ai ∈ S1 , (1.2)
and
wT bi ≤ γ − 1, for bi ∈ S2 . (1.3)
We want to find such a line characterized by (w, γ). This certainly can be reformulated as a
linear programming problem:
min 0
subject to wT ai ≥ γ + 1, for ai ∈ S1
wT bi ≤ γ − 1, for bi ∈ S2 .
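Assuming SciPy is available, this feasibility LP (the objective is identically zero) can be sketched as follows; the two point sets below are made-up illustrative data, not taken from the notes.

```python
# Sketch: find a separating line w^T x = gamma for two point sets
# by solving the feasibility LP with conditions (1.2)-(1.3).
import numpy as np
from scipy.optimize import linprog

S1 = np.array([[2.0, 2.0], [3.0, 1.0], [3.0, 3.0]])     # illustrative data
S2 = np.array([[-1.0, -1.0], [-2.0, 0.0], [0.0, -2.0]])  # illustrative data

# Variables: (w1, w2, gamma), all free.
# w^T a_i >= gamma + 1  ->  -a_i^T w + gamma <= -1
# w^T b_i <= gamma - 1  ->   b_i^T w - gamma <= -1
A_ub = np.vstack([np.hstack([-S1, np.ones((len(S1), 1))]),
                  np.hstack([S2, -np.ones((len(S2), 1))])])
b_ub = -np.ones(len(S1) + len(S2))
res = linprog(c=[0, 0, 0], A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * 3)
w, gamma = res.x[:2], res.x[2]
print(w, gamma)   # any (w, gamma) satisfying (1.2)-(1.3)
```

Because the objective is zero, any feasible point returned by the solver is a valid separating line; uniqueness is not expected.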
Note: There is an implicit assumption: the existence of a separating line. What shall we do if no
separating line exists?
Solution continued: The Inseparable Case∗ . If no line exists satisfying conditions (1.2) and
(1.3), there must exist violations, but we generally do not know which ones are violated. Let us
consider the ith inequality in (1.2):
wT ai ≥ γ + 1.
If this inequality is violated, then the violation yi can be calculated by
yi = (γ + 1) − wT ai and yi ≥ 0. (1.4)
Figure 1.2: ℓ1 minimization in 2 dimensions
We would like to find a pair (w, γ) that minimizes the total violation:

(y1 + y2 + · · · + ym ) + (z1 + z2 + · · · + zk ),
Example 1.2. (Diet Problem) A nutritionist is planning a menu consisting of two main foods
A and B. Each ounce of A contains 2 units of fat, 1 unit of carbohydrate, and 4 units of protein.
Each ounce of B contains 3 units of fat, 3 units of carbohydrates, and 3 units of protein. The
nutritionist wants the meal to provide at least 18 units of fat, at least 12 units of carbohydrate,
and at least 24 units of protein. If an ounce of A costs 20 pence and an ounce of B costs 25
pence, how many ounces of each food should be served to minimize the cost of the meal yet satisfy
the nutritionist's requirement?
Solution: LP Model:
Step 1: Set up Variables. Let x1 and x2 denote the numbers of ounces of foods A and B to be
served.
Step 2: Set up Objective Function. The cost of the meal, which is to be minimized, is
z = 20x1 + 25x2 (in pence).
Step 3: Set up Constraints.
Constraint 1: 2x1 + 3x2 ≥ 18 (fat)
Constraint 2: x1 + 3x2 ≥ 12 (carbohydrate)
Constraint 3: 4x1 + 3x2 ≥ 24 (protein)
Constraint 4: x1 ≥ 0, x2 ≥ 0 (nonnegativity constraints)
Step 4: Set up LP problem.
min z = 20x1 + 25x2
subject to 2x1 + 3x2 ≥ 18
x1 + 3x2 ≥ 12
4x1 + 3x2 ≥ 24
x1 ≥ 0, x2 ≥ 0.
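As a numerical check, the diet model can be handed to an off-the-shelf solver; a minimal sketch using SciPy's linprog (which minimizes subject to ≤ constraints, so the ≥ rows are negated):

```python
# Sketch: solve the diet problem with SciPy's linprog.
from scipy.optimize import linprog

c = [20, 25]                             # cost per ounce of A and B (pence)
A_ub = [[-2, -3],                        # fat:     2x1 + 3x2 >= 18, negated
        [-1, -3],                        # carbs:    x1 + 3x2 >= 12, negated
        [-4, -3]]                        # protein: 4x1 + 3x2 >= 24, negated
b_ub = [-18, -12, -24]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, res.fun)
```

The solver returns x1 = 3, x2 = 4 at a cost of 160 pence, which satisfies all three nutritional requirements (the fat and protein constraints hold with equality).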
We also suppose that the communication between xi and its neighbours can be converted to
distances, denoted by δij . An example of 50 sensors with 4 anchors is depicted in Figure 1.5.
Figure 1.5: Sensor network localization with partial and noisy observations
The question is to locate the unknown sensors xi such that their distances should be as close as
possible to those observed δij . This can be modelled as an optimization problem:
min f (x1 , . . . , xn ) = Σ_{i=1}^{n} Σ_{j∈N_i^x} ( ‖xi − xj‖ − δij )² + Σ_{i=1}^{n} Σ_{j∈N_i^a} ( ‖xi − aj‖ − δij )²,
where the norm is the Euclidean norm. The problem is challenging because of the following
computational issues:
(a) There are nr variables (xi ∈ IRr and there are n of them). If n is large, this is a large scale
problem.
(b) The objective function is not differentiable at some points.
Figure 1.6 presents a recovery of 50 sensors based on partially observed {δij }. How to solve such
kinds of problems is a topic of this course.
Figure 1.6: Sensor network localization with partial and noisy observations in the unit square [−0.5, 0.5]²: 4
blue points are anchors, the noise level is 10% and the radio range is 0.5. ◦ are the true locations and ∗ represents
the recovered locations. Corresponding pairs are linked by a line.
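The objective above is straightforward to code. Here is a toy sketch (made-up positions, exact noise-free distances, and a full neighbourhood structure, all illustrative assumptions) confirming that f vanishes at the true sensor positions.

```python
# Sketch: evaluate the localization objective f on toy data with
# exact (noise-free) distances; f should vanish at the true positions.
import numpy as np

rng = np.random.default_rng(0)
n, r = 5, 2
X_true = rng.uniform(-0.5, 0.5, size=(n, r))             # true sensor positions
anchors = np.array([[0.5, 0.5], [-0.5, 0.5],
                    [0.5, -0.5], [-0.5, -0.5]])          # 4 corner anchors

# For this toy example, every pair is a "neighbour".
d_xx = np.linalg.norm(X_true[:, None] - X_true[None, :], axis=2)
d_xa = np.linalg.norm(X_true[:, None] - anchors[None, :], axis=2)

def f(X):
    """Sum of squared deviations between model and observed distances."""
    t1 = np.linalg.norm(X[:, None] - X[None, :], axis=2) - d_xx
    t2 = np.linalg.norm(X[:, None] - anchors[None, :], axis=2) - d_xa
    return (t1 ** 2).sum() + (t2 ** 2).sum()

print(f(X_true))       # 0 at the truth; positive once positions are perturbed
```

Note the non-differentiability mentioned in (b): the Euclidean norm ‖xi − xj‖ is not differentiable wherever two points coincide.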
A well-known unconstrained problem is to minimize the Rosenbrock function (1960) of two variables:

f (x, y) = 100(y − x²)² + (1 − x)².

The global solution is (x, y) = (1, 1). Suppose we do not know it. It is not obvious what methods
would quickly lead to the optimal solution. The function is plotted below:
The sensor network localization problem is another example of unconstrained optimization, and it is one
of the most challenging problems to solve when n is large.
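As a sketch of how a numerical method handles the Rosenbrock function, the following minimizes it with SciPy's BFGS method from the conventional starting point (−1.2, 1) (the starting point is a standard choice in the literature, not prescribed by the notes).

```python
# Sketch: minimize the Rosenbrock function with a quasi-Newton method.
from scipy.optimize import minimize

def rosenbrock(v):
    x, y = v
    return 100.0 * (y - x ** 2) ** 2 + (1.0 - x) ** 2

res = minimize(rosenbrock, x0=[-1.2, 1.0], method="BFGS")
print(res.x)   # close to the global solution (1, 1)
```

The iterates follow the curved "banana" valley of the function, which is exactly why this problem is a classic test case for optimization methods.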
min f (x)
s.t. hi (x) = 0, i = 1, . . . , m0        (1.8)
gj (x) ≤ 0, j = 1, . . . , m1
x ∈ C,
where C is a subset of IRn . For example, in the diet problem, we can choose
C = { (x1 , x2 ) | x1 ≥ 0, x2 ≥ 0 } .
It is not hard to formulate many practical problems in the form of (1.8). One key issue,
however, is how you would solve it and what makes the problem easier to solve. There is a
common consensus that if a problem is convex, there should be some efficient algorithms to solve
it.
Definition 1.3. A set C ⊆ IRn is a convex set if it contains the entire line segment between
every pair of points in C . That is,
λx + (1 − λ)y ∈ C, whenever x, y ∈ C and 0 ≤ λ ≤ 1.
There are many convex sets.
In particular, the set is called polyhedral if it can be put in the following form:
C = { x ∈ IRn | Ax ≤ b } .
Definition 1.4. (Extreme point) For a convex set C ⊆ IRn , a point x̂ ∈ C is called an extreme
point of C if it is not in the interior of any line segment wholly contained in C .
Definition 1.5. A function f : IRn → IR is convex if for any x, y ∈ IRn and any 0 ≤ λ ≤ 1, it holds
f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y).
For example, f (x) = x² is convex.
The function
f (X) = − log det(X), X a positive definite matrix,
is convex.
Figure 1.7: Conic Sections
The problem (1.8) is called a linear programming problem if all the functions f, hi , gj are linear
and C is polyhedral.
We also note that minimizing a function f (x) is equivalent to maximizing −f (x):
min f (x) = − max (−f (x)).
1.3 Linear Programming
1.3.1 The standard form
The standard form of linear programming takes the following form:
maximize z = c1 x1 + c2 x2 + · · · + cn xn
subject to a11 x1 + a12 x2 + · · · + a1n xn ≤ b1
a21 x1 + a22 x2 + · · · + a2n xn ≤ b2
...
am1 x1 + am2 x2 + · · · + amn xn ≤ bm        (1.9)
and
x1 ≥ 0, x2 ≥ 0, . . . , xn ≥ 0.
Common terminologies used in linear programming are as follows:
(b) Constraints. The linear inequalities in the restrictions are referred to as constraints. For
example, the first constraint is: a11 x1 + . . . + a1n xn ≤ b1 . There are in total m constraints in
(1.9).
(e) x1 , . . . , xn are called the variables of the LP. n is the number of variables and m is the number
of constraints, not including the nonnegativity constraints.
In matrix notation, let

A = ( a11 · · · a1n
      a21 · · · a2n
      ⋮          ⋮
      am1 · · · amn ) ,   x = (x1 , x2 , . . . , xn )T ,   b = (b1 , b2 , . . . , bm )T ,   c = (c1 , c2 , . . . , cn )T .

Then the standard form (1.9) can be written compactly as
maximize z = cT x
subject to Ax ≤ b
x ≥ 0.
1.3.2 Other forms
Linear programming problems may appear in dierent forms other than the standard form. But
no matter what forms an LP may be formulated to, they can all be converted into the standard
form. Common forms are the following.
(a) Minimizing rather than maximizing the objective function. But we have the following
equivalence:
(d) The nonnegativity constraints for some decision variables are absent.
(a) A feasible solution is a solution for which all the constraints are satisfied.
(b) The feasible region is the set of all feasible solutions.
(c) An optimal solution is a feasible solution that maximizes the objective function in the
standard LP.
Look at the following figures to see what feasible regions may look like in LP.
Figure 1.8: Feasible region (shaded) for the constraints: x1 ≥ 0, x2 ≥ 0, x1 ≤ 4 and x2 ≤ 4.
Figure 1.9: Feasible region (shaded) for the constraints: x1 ≥ 0, x2 ≥ 0, 4x1 + 3x2 ≤ 12 and 2x1 + 5x2 ≤ 10.
Figure 1.10: Graphical method: optimal solution x1 = 15/7, x2 = 8/7 with the optimal objective
function value z = 300/7.
S.3 Decide which corner point yields the largest (the smallest for a minimization problem) objective function value. That corner point is the optimal solution.
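The algebra behind the graphical method can be sketched in a few lines: enumerate the intersections of pairs of constraint boundaries, keep those that are feasible (the corner points), and pick the best objective value. The constraints are those of Figure 1.10; the objective z = 12x1 + 15x2 is an assumed choice, picked only because it is consistent with the optimum z = 300/7 at (15/7, 8/7) quoted there.

```python
# Sketch: corner-point enumeration for a 2-variable LP.
from itertools import combinations

# Constraints as rows (a1, a2, rhs) of a1*x1 + a2*x2 <= rhs,
# with x1 >= 0 and x2 >= 0 written as -x_i <= 0.
cons = [(4, 3, 12), (2, 5, 10), (-1, 0, 0), (0, -1, 0)]

def corners(cons):
    pts = []
    for (a1, b1, c1), (a2, b2, c2) in combinations(cons, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < 1e-12:
            continue                      # parallel boundaries: no intersection
        x = (c1 * b2 - c2 * b1) / det     # Cramer's rule
        y = (a1 * c2 - a2 * c1) / det
        if all(a * x + b * y <= c + 1e-9 for a, b, c in cons):
            pts.append((x, y))            # keep only feasible intersections
    return pts

best = max(corners(cons), key=lambda p: 12 * p[0] + 15 * p[1])
print(best)   # (15/7, 8/7), giving z = 300/7
```

The feasible corner points here are (0, 0), (3, 0), (0, 2) and (15/7, 8/7); comparing the objective at each is exactly step S.3.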
An important theoretical result that comes out of the graphical method is the following.
Theorem 1.7. For a linear programming problem, at least one corner point (which is also an extreme
point) is an optimal solution, provided that the LP has an optimal solution.
Ax = b, x ≥ 0, (1.10)
Rank assumption: We assume the rank of A is m. That is, A has full row rank.
Suppose that B is an m × m sub-matrix of A and the rank of B is m. That is, B has full rank
and hence is nonsingular. We consider the equation:
BxB = b,
whose solution is
xB = B −1 b.
Now we return to our original equation in (1.10). We split the matrix A into two parts:
A = [B, N ],
so that the equation becomes
BxB + N xN = b.
Setting xN = 0 yields the solution
x = [xB , 0].
Definition 1.8. Consider the system (1.10) with the rank assumption that the rank of A is m.
Let B be an m × m submatrix of A with full rank, and let
xB = B −1 b.
Then the vector
x = [xB , 0]
is called a basic solution of (1.10). If in addition xB ≥ 0, then x is called a basic feasible
solution (BFS).
Example 1.9. Consider a system of (1.10) with

A = ( −1 2 2
       0 1 0 ) ,   b = ( 3
                          1 ) .

The rank of A is 2. Taking B = [a1 , a2 ], the first two columns of A, gives xB = B −1 b = (−1, 1)T , so
x = (−1, 1, 0)T
is a Basic Solution, but not a Basic Feasible Solution (because x1 < 0).
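The enumeration behind Example 1.9 can be automated; here is a small numpy-based sketch written for this illustration that tries every pair of columns of A and classifies the resulting basic solutions.

```python
# Sketch: enumerate all 2x2 column submatrices B of A from Example 1.9,
# solve B x_B = b when B is nonsingular, and classify each basic solution.
from itertools import combinations
import numpy as np

A = np.array([[-1.0, 2.0, 2.0],
              [0.0, 1.0, 0.0]])
b = np.array([3.0, 1.0])

for cols in combinations(range(3), 2):
    B = A[:, cols]
    if abs(np.linalg.det(B)) < 1e-12:
        print(cols, "singular: no basic solution")
        continue
    x = np.zeros(3)
    x[list(cols)] = np.linalg.solve(B, b)
    print(cols, x, "BFS" if (x >= 0).all() else "basic, not feasible")
```

Columns {1, 2} give the basic (not feasible) solution (−1, 1, 0); columns {1, 3} form a singular submatrix; columns {2, 3} give the only BFS, (0, 1, 1/2).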
The BFS is very important in our understanding of linear programming. We discuss it below using
a minimal amount of mathematics (linear algebra). Consider the linear program
max cT x
(1.11)
s.t. Ax = b, x ≥ 0,
where A is an m × n matrix. We further assume that the matrix A satises the rank assumption
(it has rank m).
Theorem 1.10. (Fundamental Theorem of Linear Programming) Consider the linear program
(1.11) with A satisfying the rank assumption. The following hold.
(i) If there is a feasible solution, there must be a basic feasible solution.
(ii) If there is an optimal feasible solution, there must be an optimal basic feasible solution.
Proof. (i) Let ai be the ith column of A, i.e.,
A = [a1 , a2 , . . . , an ].
Then Ax = b can be written as
x1 a1 + x2 a2 + · · · + xn an = b. (1.12)
Suppose that there are exactly k components in x that are not zero. Without loss of generality, we assume that the first k
are not zero:
x1 > 0, x2 > 0, · · · , xk > 0, xk+1 = · · · = xn = 0.
Then the equation (1.12) becomes
x1 a1 + x2 a2 + · · · + xk ak = b.
We consider two cases depending on whether {ai }ki=1 are linearly independent.
Case 1. {a1 , a2 , · · · , ak } are linearly independent. Then k ≤ m, and by the rank assumption we
can extend these columns to m linearly independent columns of A, which form a nonsingular
m × m submatrix B. Let
xTB = (x1 , x2 , . . . , xk , 0, . . . , 0) with (m − k) zeros.
Then,
BxB = b.
The fact that xB ≥ 0 means that x is a basic feasible solution.
Case 2. {a1 , a2 , · · · , ak } are linearly dependent. There exist y1 , y2 , . . . , yk (not all of them
zero) such that
y1 a1 + y2 a2 + · · · + yk ak = 0.
Let
yT = (y1 , y2 , · · · , yk , 0, . . . , 0) with (n − k) zeros.
Let
z = x − εy,
where ε is a number. The key observation is that
Az = b, for any ε.
We choose (note: at least one yi is not zero; without loss of generality we may assume that some
yi > 0, replacing y by −y if necessary)
ε = min{xi /yi : yi > 0}.
Then z will have at least one more zero than x, and is still a feasible solution. If the corresponding
columns are linearly independent, then z is a basic feasible solution. If the corresponding
columns are not linearly independent, we can repeat the above process to get another feasible
solution. We continue this process until we reach a set of linearly independent columns, which
must happen because of the rank assumption. We finally find a basic feasible solution.
We omit the proof of (ii), as it is similar to the first part.
Remark: This theorem reduces the task of solving the linear program (1.11) to that of searching
over all basic feasible solutions. There are at most

C(n, m) = n! / ( m!(n − m)! )

basic solutions (a finite number of BFS). However, when n and m get large, this number
grows very fast and hence the search would be slow. We must execute a clever search, which leads
to the Simplex Method in the next chapter.
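For small problems, the brute-force search the theorem justifies is easy to sketch. The LP below (max 3x1 + 5x2 under three resource constraints, with slack variables added to reach the equality form (1.11)) is an illustrative assumption, not an example from the notes.

```python
# Sketch: solve a small LP in the form (1.11) by enumerating all
# C(n, m) bases, exactly the finite search the theorem reduces us to.
from itertools import combinations
import numpy as np

# max 3x1 + 5x2  s.t.  x1 <= 4, 2x2 <= 12, 3x1 + 2x2 <= 18, x >= 0,
# written as Ax = b with slack variables x3, x4, x5.
c = np.array([3.0, 5.0, 0.0, 0.0, 0.0])
A = np.array([[1.0, 0.0, 1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0, 1.0, 0.0],
              [3.0, 2.0, 0.0, 0.0, 1.0]])
b = np.array([4.0, 12.0, 18.0])
m, n = A.shape

best_val, best_x = -np.inf, None
for cols in combinations(range(n), m):
    B = A[:, cols]
    if abs(np.linalg.det(B)) < 1e-12:
        continue                          # columns do not form a basis
    xB = np.linalg.solve(B, b)
    if (xB < -1e-9).any():
        continue                          # basic but not feasible
    x = np.zeros(n)
    x[list(cols)] = xB
    if c @ x > best_val:
        best_val, best_x = c @ x, x

print(best_x[:2], best_val)   # optimum x = (2, 6) with z = 36
```

Here C(5, 3) = 10 bases are examined, which is trivial; the Simplex Method exists precisely because this count explodes for realistic n and m.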
We state our final result without giving a proof. The result relates the BFS to the extreme points
defined by the system (1.10). Let
K = {x ∈ IRn | Ax = b, x ≥ 0} .
Theorem 1.11. (Equivalence of Extreme Points and Basic Solutions) Consider the system (1.10)
with A satisfying the rank assumption. A vector x is an extreme point of K if and only if x is a
BFS to (1.10).