
Linear and Non-Linear Optimization

Nicholas Cron

There will be six chapters of varying lengths, 2 and 6 being the longest, 1 and 5 the shortest.

1. Introduction
2. Linear Programming
3. Integer Programming
4. Networks
5. Computational Complexity
6. Non-Linear Programming

Each chapter will be divided into sections.

Chapter 1. Introduction

1.1 General and Administrative Matters
1.2 History and Scope of Optimization Problems
1.3 Typical Problems

1.1 General and Administrative Matters

Teaching Arrangements
There will be ten three-hour lectures on Friday evenings during the Autumn Term.

Assessment
Formative exercises will be handed out during the course. There will be four or five, at roughly fortnightly intervals. Solutions will be provided. It is strongly advised, but not mandatory, that you attempt them.
A summative exercise and/or small project will be provided towards the end of term, to be submitted on an agreed date. This exercise will contribute 20% of overall marks; the remaining 80% will be provided by the terminal exam.
General and Administrative Matters

Contact Arrangements
I do not have an office at Birkbeck, so it will be most convenient if you use my e-mail address: n.cron@bbk.ac.uk. Alternatively, you can write to binomial6@aol.com or n.cron@lse.ac.uk, but I have been told in very strong terms that I must use the .bbk address, so I suppose I should do so.
Use email to advise me of any special difficulties, illness etc. If there are problems with the course material, please tell me asap. I may well address these during lectures. If you are struggling with something, it is quite likely others will be too. We can also arrange to talk over coffee during the break in the lecture.

General and Administrative Matters

Textbooks
There are many books which may be helpful. The following are recommended:

1. Bazaraa M S, Jarvis J J, Sherali H D, Linear Programming and Network Flows, Wiley, 2005.
2. Bazaraa M S, Sherali H D, Shetty C M, Nonlinear Programming: Theory and Algorithms, Wiley, 2006.
3. Bertsimas D, Tsitsiklis J N, Introduction to Linear Optimization, Athena Scientific, 1997.

General and Administrative Matters

4. Ecker J G, Kupferschmid M, Introduction to Operations Research, Krieger, 2004.
5. Hillier F S, Lieberman G J, Introduction to Operations Research, McGraw-Hill, 2004.
6. Luenberger D G, Ye Y, Linear and Nonlinear Programming, Springer, 2008.
7. Taha H A, Operations Research: An Introduction, Pearson, 2006.
8. Vanderbei R J, Linear Programming: Foundations and Extensions, Springer, 2013.
9. Williams H P, Model Building in Mathematical Programming, Wiley, 2013.
10. Winston W L, Introduction to Mathematical Programming: Operations Research, Thomson/Brooks/Cole, 2003.
General and Administrative Matters

Please appreciate that the books listed represent a very small sample of all those available. There are dozens more – with titles including words such as 'Operations Research' or 'Operational Research' or 'OR' or 'linear programming' or 'nonlinear programming' or 'optimisation' and so on.
There are also very many websites, including complete sets of lecture notes. Here is one chosen almost at random after a google search:
http://www.cs.toronto.edu/~stacho/public/IEOR4004-notes1.pdf
(Looks good but different choice of topics, and no NLP.)

General and Administrative Matters

Do understand that, for books and web pages:
• The content may differ (irrelevant topics included, relevant topics omitted);
• The level may differ (too easy, too advanced);
• Notation may differ;
• Explanations may differ (e.g. proofs of results included or omitted);
• Discussion may be better or worse than these notes.
So it is strongly recommended that these sources be treated as supplementary (or complementary) rather than as the primary resource.

General and Administrative Matters

Some of the books listed have been through more than one edition – earlier editions may be all right, but try and get a recent one if possible. (Suggest Amazon or ebay!)
No single book matches the course material perfectly. In most cases, the book covers topics not in the course, and we cover areas not in the book; or the level of presentation is too high or too low. (To labour the point made earlier.)
Browse in bookshops or online to find something you are comfortable with. If you come across other helpful books, let me (and your colleagues) know about them.
If I am pressed for a single book, I would probably go for Hillier & Lieberman (no. 5) but it is not perfect! Winston (no. 10) would be my second choice.

General and Administrative Matters

Of course, there are also numerous websites, many of which will be helpful (but not all!).

Acknowledgements
These notes are built from a range of sources, including some of the texts listed, online material, notes for earlier years of this course written by Suzanne Evans, and notes for other courses taught by myself and others.
I am grateful to Suzanne Evans, and Richard Weber of Cambridge University, for permission to use some of their work.
General and Administrative Matters

Computing
We make some use of Excel and R. Many other packages are available, some certainly superior, but these two are good for pedagogic purposes.
R – probably familiar to most – is a powerful programming language which is reasonably easy to learn and has many useful built-in functions. It is also free.
You can download the R package by going to the web page http://cran.uk.r-project.org.
Please ASK if you need help running R. Use of R is not a major part of the course, but it may be useful for the main exercise.

General and Administrative Matters

Pre-requisites
A reasonable level of mathematical competence will be assumed, including calculus for functions of several variables and some linear algebra. Please ask if in difficulty.

Course Materials
I intend to post all material related to the course on Moodle. This should be your primary resource. I may supply handouts on occasion, and I can also email files directly. It is essential that you are able to access and use Moodle. If you are experiencing any problems with it, let me know.

1.2 History and Scope of Optimization Problems

Optimization problems occur very widely in mathematics, statistics and elsewhere. Typically, these problems ask for the maximum or largest or smallest or least or best or optimal. Examples:

• Simple Calculus Problems – a cylindrical can is to be constructed to contain 500g of soup. What should the dimensions be to minimise the surface area, i.e. to use the smallest amount of metal?
• Simple Calculus Problems – what is the area of the largest rectangle that can be inscribed in a circle of radius 4?

History and Scope of Optimization Problems

• Scheduling Problems – assign crews to different airline flights to minimize total cost while ensuring that a crew rotation begins and ends in the same city.
• Revenue Management – for different classes of airline tickets, determine how many seats to sell or hold back as the flight date approaches to maximize profits.
• Cutting Stock Problems – given large paper sheets, and demand for units of smaller sizes, determine the cutting pattern of large into small pieces that meets demand while minimizing waste.
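For the first of the calculus examples above, one derivative settles it. A quick sketch, treating the soup as occupying a known fixed volume V (a simplifying assumption, since 500g is a mass): for a cylinder of radius r and height h with πr²h = V,

\[
A(r) = 2\pi r^2 + 2\pi r h = 2\pi r^2 + \frac{2V}{r}, \qquad
A'(r) = 4\pi r - \frac{2V}{r^2} = 0 \;\Rightarrow\; r = \left(\frac{V}{2\pi}\right)^{1/3}, \quad h = \frac{V}{\pi r^2} = 2r.
\]

So the can of least surface area is exactly as tall as it is wide.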
History and Scope of Optimization Problems

• Electricity Planning – given forecast demand by period and operating cost for each generator, determine which generators should be run in each time interval to satisfy demand and minimize cost. Note that the forecasts are critical.
• Maximum Likelihood Estimation in Statistics.

More examples will be given as the course progresses. For now, note that:
• Most of the contexts are highly practical.
• Most problems are explicitly or implicitly in the form of maximizing or minimizing (i.e. optimizing) a function, often subject to constraints; the functions may be linear or non-linear.
• The problems are of different types from different contexts, but are capable of being represented mathematically.

History and Scope of Optimization Problems

Most of these problems would normally be considered as coming under the ambit of Operational Research. We give a tentative definition of OR shortly, but hold onto the idea that OR is practical, uses mathematics extensively and typically involves optimisation of some sort.
Even for those approaches that are specifically statistical (e.g. MLE), mathematical (e.g. optimization in non-Euclidean spaces), economic, financial and so on, OR techniques are transferrable.
It is therefore convenient to describe some of the most important and useful optimization methods against the background of OR.

History and Scope of Optimization Problems

For a rather more detailed introduction than we give here, see http://people.brunel.ac.uk/~mastjjb/jeb/or/intro.html or http://people.brunel.ac.uk/~mastjjb/jeb/or/basicor.html. Following the links provided there, a wealth of other material can be found; some relevant to this course, some not.
Information on OR can also be found at http://www.thescienceofbetter.org
The site of the Operational Research Society can be found at www.orsoc.org.uk. It has much useful and relevant material, and offers membership.

History and Scope of Optimization Problems

What is OR? (Operational Research OR Operations Research OR OR!)
Tentative definition: the use of (usually) quantitative models to analyze, predict the behaviour of, and improve the functioning of complex systems that are influenced by human decisions.
The methods of OR comprise a variety of techniques and methodologies that have been developed for particular problems, and subsequently shown to have general applicability.
History and Scope of Optimization Problems

History
Isolated ideas in related areas – mathematics, economics, physics, psychology etc. – have been developed since the Renaissance and more particularly since the Industrial Revolution. See Gass S I and Assad A A, An Annotated Timeline of Operations Research: An Informal History, Kluwer, 2005. The authors trace the roots of the subject back to 1564.
Arguably, Charles Babbage (1791-1871) was a precursor through his work on sorting and transportation, leading to the introduction of a nationwide mail service in 1840.

History and Scope of Optimization Problems

However, most authorities would maintain that OR in its present form dates from the Second World War. As tensions in Europe grew, work was carried out on radar systems in 1938. The background research was called 'Research into Military Operations', which was shortened to Operations Research, and the term was born.
The work was expanded and teams of researchers formed. At a time of acute pressure on resources (manpower, money, equipment, time etc.) it was critical to use what was available as efficiently as possible. Top British and US scientists, mathematicians and others developed methodologies which form the basis of modern OR.

History and Scope of Optimization Problems

Typical problems:
• Transatlantic convoys. Too large and they will be more visible to German U-boats, will go slower (a convoy can only go as fast as its slowest vessel) and will be more vulnerable to attack. Too small and fewer vital supplies will be transported. What is the optimal size? Conclusion: a few large convoys perform better than many small ones.
• U-boat search strategy. Allied spies were able to advise when U-boats were launched from key sites (Bordeaux, Brest etc.). Planes had a small window of opportunity to locate and attack the submarines before they submerged. What is best: to search in a spiral shape, a rectangular shape etc.? Simulations 40 years later confirm the correct approach: a spiral strategy.

History and Scope of Optimization Problems

• U-boat attack strategy. At what height should depth charges be released? Too high means less precision, too low means less chance of a 'kill'. In 1941, only 2 or 3% of attacks resulted in a sinking; this rose to 40% in 1944 and as high as 60% in the last few months of the war, thanks to OR.
• Aircraft maintenance. Too frequent means wastage of scarce resources. Too infrequent means sub-standard aircraft being used on critical missions.
• Attacks on land targets. How many planes to use? Too many means waste of resources and greater risk of anti-aircraft fire; too few means less effectiveness.
• Dogfight strategy. How close to approach an enemy fighter? Too close means a greater chance of a hit but also a greater chance of being hit. Too distant and the opposite risks arise. A complete solution had to await the development of game theory after the war.
History and Scope of Optimization Problems

Notice how many problems can be expressed in optimization terms (too many, too few etc.).
A key test of the newly formed OR team at Stanmore came in May 1940 as German troops advanced through France. The French Government requested 120 extra fighter aircraft to defend their country. Churchill passed the request to the OR team for analysis. Using (simple) mathematical and statistical tools based on current losses and knowledge of German capability, the team clearly showed the futility of sending the planes.
No aircraft were sent and most of those currently in France were recalled. The Nazis conquered France. But the aircraft and pilots were crucial during the subsequent Battle of Britain. OR had passed its first major test.

History and Scope of Optimization Problems

After the war, the methods were expanded to many non-military organisations (business, government, engineering etc.). Nowadays, most large efficient companies undertake OR, either explicitly through an OR division, or in collaboration with mathematicians, statisticians and other researchers.
Typical areas of work:
• Location problems: Where should a city locate a new airport? What is the optimal location of a fire station?
• Route scheduling: How should police officers' beats be organised? Postman delivery rounds?

History and Scope of Optimization Problems

• Inventory problems: How much of a commodity should be stored in a warehouse? What restocking policy should be used?
• Auction strategy: At what level should a bid be made to maximise chances of success for the smallest price? What is the optimal timing?

Many such problems involve us in maximising a function (e.g. profit) or minimising a function (e.g. cost or time) subject to constraints. This is the 'classic' context for OR and optimization generally, enabling researchers to determine the most efficient way to carry out some operation.

History and Scope of Optimization Problems

Typical Steps in an OR Project
1. Identify the problem (general). e.g. How can time and money spent by the Post Office be reduced?
2. Formulate the problem (specific). e.g. Minimise time spent on postal deliveries. Is the project viable?
3. Observe the system. e.g. Find layout of town, union regulations etc.
4. Meet interested parties. Be aware of 'territorial disputes'. Tact needed.
History and Scope of Optimization Problems

5. Formulate mathematical model. e.g. LP (minimise linear function subject to linear constraints), network model etc.
6. Prepare to implement. Check algorithm to be used, pilot study, clean data etc.
7. Write software – include checks.
8. Solve model.
9. Check solutions. Are outcomes realistic? Is there an improvement on the status quo?

History and Scope of Optimization Problems

10. Post-optimal (sensitivity) analysis. Are there alternative, sub-optimal solutions? These may be more practicable. What is the effect of changing input parameters?
11. Present results and options. Presentation and report-writing skills important. Preferably be non-directive.
12. Implement and evaluate. May need to 'sell' proposed modifications to management and unions.
13. Subsequent work. May need to modify or debug the model at a later stage.

History and Scope of Optimization Problems

An OR consultant/practitioner may need to undertake most or all of these tasks. Tact, common sense and a flexible approach are at least as important as technical ability.

Hard and Soft OR
Current academic thinking distinguishes two approaches.
Hard systems approaches (hard OR) assume:
– objective reality of systems in the world
– well-defined problem to be solved
– technical factors foremost
– scientific approach to problem-solving
– an ideal solution.
Soft systems approaches (soft OR) assume:
– organisational problems are 'messy' or poorly defined
– stakeholders interpret problems differently (no objective reality)
– human factors important
– creative, intuitive approach to problem-solving
– outcomes are learning, better understanding, rather than a 'solution'.

History and Scope of Optimization Problems

HARD systems provide rigid techniques and procedures to provide unambiguous solutions to well-defined data and processing problems, focused on computer implementations.
SOFT systems provide a loose framework of tools to be used at the discretion of the analyst, focused on improvements to organisational procedures.
History and Scope of Optimization Problems

Summarising further, 'hard' OR is primarily technical and mathematical; 'soft' OR is sociological and managerial.
There is often discord between the two: 'hard' OR is too 'mechanistic'; 'soft' OR is too 'touchy feely'. There should be room for both, but we concentrate on 'hard' technical approaches in this course.
Pedantic point: all optimization methods can in principle be considered as methods of OR. Some OR approaches, such as soft OR, are not optimization. So arguably optimization is a proper subset of OR … but let's not get hung up on definitions.

History and Scope of Optimization Problems

OR overlaps with other disciplines, as indicated earlier:
- Mathematics (e.g. game theory, optimization theory)
- Statistics (e.g. regression, MLE, time series and forecasting, simulation)
- Management (e.g. leadership, organizational structure)
- Economics (e.g. pricing policy).
And (for soft OR):
- Sociology (e.g. structure of organizations)
- Philosophy (e.g. approaches to problems, scientific method)
- Psychology (e.g. negotiation).

1.3 Typical Problems

Specific techniques include:
• Linear Programming (LP)
• Integer Programming (IP)
• Non-linear Programming (NLP)
• Network Modelling
• Dynamic Programming
• Queueing Models
• Inventory Control
• Game Theory
• Simulation
• Forecasting
etc.

Typical Problems

We concentrate on the first four of these (LP, IP, NLP, networks).
In general, our models are deterministic (not stochastic). That is, we assume parameters are given exactly, not in accordance with some probability distribution. This distinguishes our course from other, parallel courses. Stochastic methods exist and are, of course, widely applied – for example, in queueing theory.
Typical Problems

LP Example
A brewery makes four beers: Light, Dark, Ale and Premium. There are three main important ingredients: Malt (M), Hops (H) and Yeast (Y). [Assume limitless supply of water, sugar etc.]
1 barrel of Light requires 1kg of M, 2kg of H, 1kg of Y
1 barrel of Dark requires 1kg of M, 1kg of H, 1kg of Y
1 barrel of Ale requires 2kg of H, 1kg of Y only
1 barrel of Premium requires 3kg of M, 1kg of H, 4kg of Y
Amounts of M, H, Y available are 500kg, 1500kg, 800kg respectively.

Typical Problems

The revenue for each barrel produced is £6, £5, £3, £7 for Light, Dark, Ale, Premium respectively. What quantities of each should be produced to maximise revenue?
Let x1, x2, x3, x4 be the barrels of Light, Dark, Ale, Premium. We have the following LP:
max 6x1 + 5x2 + 3x3 + 7x4 (revenue)
s.t. x1 + x2 + 3x4 ≤ 500 (malt)
2x1 + x2 + 2x3 + x4 ≤ 1500 (hops)
x1 + x2 + x3 + 4x4 ≤ 800 (yeast)
x1, x2, x3, x4 ≥ 0 (all quantities non-negative)
[Clearly, this formulation may be oversimplified.]
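For later comparison with §2.8, here is a minimal sketch of the brewery LP in R using the lpSolve package (assuming it is installed; recall that lpSolve reads const.mat column by column):

library(lpSolve)
obj <- c(6, 5, 3, 7)              # revenue per barrel: Light, Dark, Ale, Premium
con <- matrix(c(1, 2, 1,          # column 1: malt, hops, yeast used by Light
                1, 1, 1,          # column 2: Dark
                0, 2, 1,          # column 3: Ale
                3, 1, 4),         # column 4: Premium
              nrow = 3)
res <- lp(direction = "max", objective.in = obj, const.mat = con,
          const.dir = rep("<=", 3), const.rhs = c(500, 1500, 800))
res$solution                      # barrels of each beer to brew
res$objval                        # maximum revenue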

Typical Problems

That is, we seek to maximise an objective function subject to certain constraints (including non-negativity constraints). This problem is linear. In general, we have the (linear) problem or LP

max Σj cjxj (objective function)
s.t. Σj aijxj ≤ bi for i = 1, 2, …, m
xj ≥ 0 for j = 1, 2, …, n (constraints)

(sums over j = 1, …, n), where we require the solution for unknown xj in terms of given cj, aij, bi.

Typical Problems

We can write the problem even more succinctly:
max z = cTx
s.t. Ax ≤ b
x ≥ 0
Shall see that it is easy to write related problems in this form, where we want to minimise a linear function, or have ≥ or equality constraints, or where x is unrestricted in sign. Shall also describe a general method for solving LPs.
Typical Problems

IP Example
A shipping company wishes to transport six items:

Item  Weight  Value
1     10      5
2     9       2
3     15      7
4     2       4
5     11      1
6     6       6

The total weight cannot exceed 33. Which items should be taken so as to maximise the value shipped?

Typical Problems

Let xj = 1 if item j is taken, 0 if item j is omitted, for j = 1, 2, …, 6. We have:
max 5x1 + 2x2 + 7x3 + 4x4 + x5 + 6x6
s.t. 10x1 + 9x2 + 15x3 + 2x4 + 11x5 + 6x6 ≤ 33
xj ∈ {0, 1} (j = 1, 2, …, 6)

This is an example of a knapsack problem:
max Σi cixi
s.t. Σi wixi ≤ W
xi ∈ {0, 1} (i = 1, 2, …, n)
where W, wi and ci are known for all i.
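The 0-1 restriction is handled directly by lpSolve's all.bin argument; a minimal sketch for the shipping example:

library(lpSolve)
value  <- c(5, 2, 7, 4, 1, 6)     # value of each item
weight <- c(10, 9, 15, 2, 11, 6)  # weight of each item
res <- lp(direction = "max", objective.in = value,
          const.mat = matrix(weight, nrow = 1),
          const.dir = "<=", const.rhs = 33,
          all.bin = TRUE)         # every xj restricted to {0, 1}
res$solution                      # 0-1 vector: which items to ship
res$objval                        # maximum value shipped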

Typical Problems

IPs are superficially similar to LPs. However, in practice they are often harder to solve; fresh techniques are needed. We described a 0-1 IP; others permit any positive integer values for variables.

NLP Example
An NLP extends an LP in that we relax the requirement that all functions considered are linear (or, LP is a special case of NLP). While LPs can, in general, be solved completely, NLPs are much more intractable. Often, the best we can hope for is an approximate solution. But they do frequently occur.
One context is portfolio selection, when fund managers seek to balance their expected return and risk when n stocks are being considered for inclusion in the portfolio.

Typical Problems

Let the decision variables xj (j = 1, 2, …, n) be the number of shares of stock j to be included.
Let μj and σjj be the estimated mean and variance of the return on each share of stock j. (So σjj is a measure of risk.)
For i, j = 1, 2, …, n (i ≠ j), let σij be the covariance of the return on one share each of stock i and stock j. (Difficult to estimate, but can be based on certain assumptions about market behaviour.)
We can express the expected value R(x) and the variance V(x) of the total return from the portfolio in terms of the above quantities.
Typical Problems

R(x) = Σj μjxj and V(x) = Σi σiixi² + Σi≠j σijxixj

So V(x) represents portfolio risk.
One way to consider the trade-off between the two factors is to use V as the objective function to be minimised and insist that R be no less than the minimum acceptable expected return. This gives the NLP
min V(x)
s.t. R(x) ≥ L
Σj Pjxj ≤ B
xj ≥ 0 (j = 1, 2, …, n)
Here, L is the minimum acceptable expected return, Pj is the price for each share of stock j, and B is the amount of money budgeted for the portfolio.

Typical Problems

Clearly, we have an NLP.
Various alternatives and extensions are possible. For example, we might wish to maximise the total expected return where the risk should be no greater than a specified amount. There is no guarantee the two formulations given yield the same result.
This approach is based on the work of Markowitz and Sharpe, who won the 1990 Nobel Prize in Economics.
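Because the objective V(x) is quadratic, an LP solver will not do; in R, one option is the quadprog package. A sketch with made-up data for three stocks (the numbers are purely illustrative; solve.QP minimises ½xᵀDx − dᵀx subject to Aᵀx ≥ b, so we pass D = 2Σ to minimise V(x) = xᵀΣx):

library(quadprog)
Sigma <- matrix(c(0.04, 0.01, 0.00,   # illustrative covariance matrix (σij)
                  0.01, 0.09, 0.02,
                  0.00, 0.02, 0.16), nrow = 3)
mu <- c(2, 3, 5)                      # expected return per share (μj)
P  <- c(20, 30, 50)                   # price per share (Pj)
B  <- 1000                            # budget
L  <- 90                              # minimum acceptable expected return
Amat <- cbind(mu, -P, diag(3))        # columns: R(x) >= L, -P'x >= -B, x >= 0
bvec <- c(L, -B, rep(0, 3))
sol <- solve.QP(Dmat = 2 * Sigma, dvec = rep(0, 3), Amat = Amat, bvec = bvec)
sol$solution                          # shares of each stock (fractional here)
sol$value                             # minimised portfolio variance V(x)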

Typical Problems

Network Example
A network is an arrangement of nodes (or vertices or points), arcs (or edges) and capacities (numbers on arcs).
[Diagram of a small capacitated network not reproduced here.]
Arcs may be 'directed' (can only be traversed in one direction, as for one-way streets) or 'undirected'. Some arcs may not have a capacity. There are various types of networks, or the equivalent notions of graphs and digraphs.

Typical Problems

Networks could represent:
• Road networks
• Flow of oil
• Electrical circuits
• Telephone switchboards etc.
Typical network problems are:
• Maximum flow problems, e.g. how much traffic can flow from A to Z during rush hours? Capacities are maximum numbers of cars on trunk roads.
• Shortest path problems, e.g. what is the shortest path from A to Z? Capacities are distances.
Many network problems can be rewritten and solved as LPs, but more direct approaches exist and are more efficient.
Chapter 2. Linear Programming

2.1 Formulation
2.2 Standard Form
2.3 Graphical Solution
2.4 Simplex Method
2.5 Degeneracy and Cycling
2.6 Initialisation
2.7 Further Example
2.8 Practical LP: Computing
2.9 Duality
2.10 Dual Simplex Method
2.11 Sensitivity Analysis
2.12 Interior Point Methods

2.1 Formulation

Many applications of mathematics and statistics involve modelling. That is, instead of proceeding directly
Real Problem → Real Solution
we move indirectly
Real Problem → Mathematical Representation → Solution of Representation → Real Solution.
Often, solving the representation is straightforward, but formulating it is less automatic as well as being crucial. Only limited guidelines are available for this, so it is worthwhile showing how LPs can be obtained in three less simple contexts than beer production.

Formulation

Example 1
Two warehouses A and B supply three stores C, D and E. Supplies available at A and B are 10 and 6 units respectively and cannot be exceeded. Demands at C, D and E are 3, 7 and 6 units respectively and must be met. Transportation costs per unit of supply from warehouses to stores are as follows:

     C   D   E
A    5   4   2
B    4   6   3

We formulate the problem of satisfying all requirements at minimum total transportation cost.
Formulation

Suppose xij units are taken from warehouse i to store j (i = A,B; j = C,D,E). Require
min 5xAC + 4xAD + 2xAE + 4xBC + 6xBD + 3xBE (cost)
subject to xAC + xAD + xAE ≤ 10 (supply at A)
xBC + xBD + xBE ≤ 6 (supply at B)
xAC + xBC ≥ 3 (demand at C)
xAD + xBD ≥ 7 (demand at D)
xAE + xBE ≥ 6 (demand at E)
xij ≥ 0 (i = A,B; j = C,D,E)
Please don't forget the non-negativity constraints!
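lpSolve has a routine for exactly this supply/demand structure; a minimal sketch for Example 1 (lp.transport takes the cost matrix plus the row and column constraints):

library(lpSolve)
costs <- matrix(c(5, 4, 2,
                  4, 6, 3), nrow = 2, byrow = TRUE)  # rows A, B; columns C, D, E
res <- lp.transport(costs, direction = "min",
                    row.signs = rep("<=", 2), row.rhs = c(10, 6),    # supplies
                    col.signs = rep(">=", 3), col.rhs = c(3, 7, 6))  # demands
res$solution   # optimal units shipped xij
res$objval     # minimum total transportation cost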

Formulation

Example 2 (2010 exam)
An oil firm operates three processes X, Y and Z. The input to the processes is a single type of crude oil. For process X, each tonne of crude oil requires 20 staff, and produces a daily output of ¾ tonne of a product A and ¼ tonne of a product B. For process Y, each tonne of crude oil requires 10 staff, and produces a daily output of ¼ tonne of product A and ¾ tonne of product B. For process Z, each tonne of crude oil requires 15 staff, and produces a daily output of ½ tonne of product A and ½ tonne of product B. There are costs of £60, £40 and £25 respectively for each tonne treated by processes X, Y and Z. The firm has 125 staff available and can obtain 25 tonnes of crude oil from a supplier each day. The selling prices of products A and B are respectively £285 and £105 per tonne.
How can we maximise revenue?

Formulation

Read the question carefully, maybe draw a picture…
Suppose x, y, z are tonnes of crude oil treated by processes X, Y, Z.
The amount of A produced is ¾x + ¼y + ½z, with income 285(¾x + ¼y + ½z). Similarly, the income on B is 105(¼x + ¾y + ½z). Costs are 60x + 40y + 25z.
Net revenue is 285(¾x + ¼y + ½z) + 105(¼x + ¾y + ½z) - (60x + 40y + 25z) = 180x + 110y + 170z, which should be maximised.
The staffing constraint is 20x + 10y + 15z ≤ 125, which can be divided by 5 to give 4x + 2y + 3z ≤ 25.
The crude availability constraint is x + y + z ≤ 25.
There are also the obvious non-negativity constraints.
Formulation

So the LP is:
max 180x + 110y + 170z
s.t. 4x + 2y + 3z ≤ 25
x + y + z ≤ 25
x ≥ 0, y ≥ 0, z ≥ 0

Example 3
A hospital administrator needs to schedule nurses. Numbers required in each time period are:

Period: 0800-1200  1200-1600  1600-2000  2000-2400  2400-0400  0400-0800
Number: 140        120        160        90         30         60

Each period of duty lasts eight hours. How should nurses be scheduled so that the minimum number are needed in 24 hours, while maintaining adequate levels?

Formulation

Let shift 1 be 0400-0800, shift 2 0800-1200 and so on. Let xt be the number of nurses starting on shift t (t = 1, 2, …, 6). Since a shift lasts two periods, for each t,
{number starting on shift t} + {number starting on shift (t-1)} ≥ {number required during shift t}.

Formulation

We need
min x1 + x2 + x3 + x4 + x5 + x6
s.t. x1 + x2 ≥ 140
x2 + x3 ≥ 120
x3 + x4 ≥ 160
x4 + x5 ≥ 90
x5 + x6 ≥ 30
x6 + x1 ≥ 60
xt ≥ 0 for t = 1, 2, …, 6
This could be solved as an IP. It can also be written in 'standard form', as a maximisation with ≤ constraints, as will be described in the next section.
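Example 3 is easy to hand to a solver as well; a minimal lpSolve sketch (all.int requests integer numbers of nurses):

library(lpSolve)
cover <- matrix(c(1, 1, 0, 0, 0, 0,    # x1 + x2 >= 140, and so on:
                  0, 1, 1, 0, 0, 0,    # one row per constraint above
                  0, 0, 1, 1, 0, 0,
                  0, 0, 0, 1, 1, 0,
                  0, 0, 0, 0, 1, 1,
                  1, 0, 0, 0, 0, 1), nrow = 6, byrow = TRUE)
res <- lp("min", rep(1, 6), cover, rep(">=", 6),
          c(140, 120, 160, 90, 30, 60), all.int = TRUE)
res$solution   # nurses starting on each shift
res$objval     # minimum total number of nurses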
2.2 Standard Form

Standard form for an LP is
max z = cTx (objective function)
s.t. Ax ≤ b (constraints)
x ≥ 0 (non-negativity)
where c ∈ Rn, b ∈ Rm, A is an m×n matrix, and maximisation is done for unknown x ∈ Rn (n variables, m constraints).
An n×1 vector y is feasible if Ay ≤ b and y ≥ 0. It is optimal if it is feasible and maximises z, so cTy ≥ cTx for all feasible x.
The maximum value of z is the value of the LP. The set of all feasible vectors is the feasible region.

Standard Form

Note that the definition of 'standard form' is not consistent through all texts.
Other LPs may have a minimum objective function, equality or ≥ constraints, or free variables not restricted in sign. We can always convert to standard form, by:
a) Converting a minimisation to a maximisation by negating z. [Notice that if S is a set of real numbers, max S = -min(-S), or min S = -max(-S).]
b) Reversing the sign of a ≥ constraint (multiply through by -1);
c) Converting an equality constraint to two constraints, one with ≤, the other with ≥ (alternatives exist);
d) Writing a free variable as the difference of two non-negative variables.
[Hint: when changing from min to max, check the sign of z at the end.]

Standard Form

We write the following in standard form:
min 2x1 + 3x2 - 7x3
s.t. x1 + 5x3 ≤ 2
x1 + x2 + x3 = 11
x2 ≥ 2
x1, x2 ≥ 0, x3 free
Let x3 = x3' - x3'', with x3', x3'' ≥ 0. We solve
max -2x1 - 3x2 + 7x3' - 7x3''
s.t. x1 + 5x3' - 5x3'' ≤ 2
x1 + x2 + x3' - x3'' ≤ 11
-x1 - x2 - x3' + x3'' ≤ -11
-x2 ≤ -2
x1, x2, x3', x3'' ≥ 0

2.3 Graphical Solution

With only two variables, we can obtain an LP solution graphically. With several variables, the method is impossible (although conceivable with three variables). It is of limited practical use, but useful to suggest general properties that hold for ALL LPs.
Consider the LP:
max 20x1 + 25x2
s.t. x1 + 6x2 ≤ 48
3x1 + 3x2 ≤ 30
x1 + ¼x2 ≤ 9
x1, x2 ≥ 0
(This problem arose from a practical context, but we leave formulation now to concentrate on solution.)

Graphical Solution

We can obtain the maximum by a carefully drawn graph. The constraints define a polygonal feasible region. The objective function in the form z = 20x1 + 25x2 defines (for varying z) a family of parallel lines, and we can observe the point (or, for some problems, edge of the feasible region) where the maximum is attained.
Graphical Solution

We can determine the optimum point by the intersection of the lines z = 20x1 + 25x2 with the feasible region. It occurs at the intersection of 3x1 + 3x2 = 30 and x1 + 6x2 = 48, i.e. at (2.4, 7.6), with z = 238.
Of course, for some LPs, there may be no solution (e.g. where constraints include x1 + x2 ≤ 3 and x1 + x2 ≥ 4), or there may be no finite solution (e.g. where x1 - x2 ≤ 4 is the only constraint apart from non-negativity).

Graphical Solution

We can make the following observations and comments. It is important to realise they apply equally in higher (> 2) dimensions.
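The graphical answer is easy to confirm with a solver; a minimal lpSolve sketch:

library(lpSolve)
res <- lp("max", c(20, 25),
          matrix(c(1, 6,
                   3, 3,
                   1, 0.25), nrow = 3, byrow = TRUE),
          rep("<=", 3), c(48, 30, 9))
res$solution   # expect (2.4, 7.6)
res$objval     # expect 238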

Graphical Solution

1. The feasible region for an LP is polygonal (for higher dimensions the analogue is called a convex polytope). This can be justified quite easily using the notion of convexity, to be defined later.
2. There are just three possibilities for an LP in standard form:
a) The LP is infeasible. The feasible region is empty. There is no solution.
b) The LP is unbounded. The feasible region is unbounded and the objective function can be made arbitrarily large. There is no finite optimal solution.
c) There is a unique finite value for the LP, occurring at a vertex of the feasible region, or along a hyperplane (generalising a line in 2 dimensions). There is a unique optimal value of z.

Graphical Solution

3. The solution cannot occur in the interior of the feasible region. This therefore raises the possibility of examining each vertex of the polygon (convex polytope). But with n variables and m constraints, we can show that the number of vertices may be as many as (m+n)!/(m!n!), and such enumeration is impractical for large m and n. The simplex method proceeds by examining candidate vertices in a systematic and efficient way; that is, it moves from one vertex to another in a logical sequence.
We stress: comments 1, 2, 3 apply to ALL LPs, not just n = 2.
2.4 Simplex Method

Assume standard form. The idea is to move from vertex to vertex of the feasible region, increasing (or at least, not decreasing) the value of z at each step. Shall assume at least one vertex has been located – consider later how to proceed when this has not been achieved.
Consider the previous problem where slack variables have been introduced, so the constraints are equalities:
max 20x1 + 25x2 = z
s.t. x1 + 6x2 + x3 = 48
3x1 + 3x2 + x4 = 30
x1 + ¼x2 + x5 = 9
x1, x2, x3, x4, x5 ≥ 0

Simplex Method

Possible vertices of the feasible region are found by setting 2 (in general, n, the number of original variables) of the 5 (in general, n+m, the number of variables plus the number of constraints) to zero.
The remaining 3 (in general, m) variables can then be determined.
The variables set to zero are non-basic. The remaining variables are basic.
If all basic variables are non-negative, we have a basic feasible solution.
The equations which define basic variables in terms of non-basic variables constitute a dictionary.

Simplex Method

For our example, set x1 = x2 = 0 (non-basic variables). That is, we start our algorithm at the point (0,0). The dictionary is
x3 = 48 - x1 - 6x2
x4 = 30 - 3x1 - 3x2
x5 = 9 - x1 - ¼x2
We have the solution xT = (0 0 48 30 9). The value of the objective function is z = 0.
Can we improve on this? Recall z = 20x1 + 25x2.
Yes, we can improve by relaxing x1 = 0 or x2 = 0 (i.e. making one variable or the other positive). Relax x2 because it has a larger coefficient, so is likely to lead to a larger increase in z, and keep x1 = 0.

Simplex Method

Now, the dictionary implies x2 ≤ 48/6 = 8, x2 ≤ 30/3 = 10, x2 ≤ 9/¼ = 36.
So increase x2 to 8, the maximum possible, reducing x3 to 0.
We have x1 = 0, x3 = 0 (non-basic), x2 = 8, and it is easy to check from the dictionary that x4 = 6, x5 = 7.
Now xT = (0 8 0 6 7) and z = 0 + 25×8 = 200.
We have moved from the point (0, 0) to the point (0, 8). Can we improve further?
We need to define a new dictionary; express basic x2, x4, x5 in terms of non-basic x1 and x3.
Simplex Method

x2 = 8 - (1/6)x1 - (1/6)x3
x4 = 6 - (5/2)x1 + (1/2)x3
x5 = 7 - (23/24)x1 + (1/24)x3
and z = 200 + (95/6)x1 - (25/6)x3

The LP is the same. We have simply used the variables in different ways. We have used elementary algebra. As we shall see, row operations would be better.
Now z can be increased only by increasing x1, since x3 is best kept at 0.
How much can we increase x1? From the dictionary,
x1 ≤ 8/(1/6) = 48, x1 ≤ 6/(5/2) = 12/5, x1 ≤ 7/(23/24) = 168/23.
Increase x1 to 12/5, the maximum possible, reducing x4 to 0.

Simplex Method

We have x3 = 0, x4 = 0 (non-basic), x1 = 12/5, and it is easy to check from the dictionary that x2 = 38/5, x5 = 47/10.
Now xT = (12/5 38/5 0 0 47/10) and z = 200 + (95/6)×(12/5) = 238.
We have moved from the point (0, 8) to the point (12/5, 38/5).
Can we improve further? Recalculate the dictionary:
x2 = 38/5 - (1/5)x3 + (1/15)x4
x1 = 12/5 + (1/5)x3 - (2/5)x4
x5 = 47/10 - (3/20)x3 + (23/60)x4
and z = 238 - x3 - (19/3)x4
No! Both coefficients in z are negative; there is nothing to be achieved by increasing x3 or x4.
We have the optimum, value 238 (and incidentally this agrees with the graphical solution).

Simplex Method

In practice, it is tiresome to use equations in this way. We can do the same thing more efficiently with a tableau.
Recall the use of (elementary) row operations. In a matrix, we can:
• Interchange two rows (Ri ↔ Rj)
• Multiply a row by a non-zero constant (Ri → kRi)
• Add a non-zero multiple of one row to another row (Ri → Ri + kRj)
while preserving the solutions of the underlying equations.
Simplex uses these operations, primarily the last one, to avoid awkward algebra and streamline the process. Facility with row operations is required.

Simplex Method

STEP 1: Write the problem, with slack variables, in tableau form:

x1    x2    x3   x4   x5   RHS
1     6     1    0    0    48
3     3     0    1    0    30
1     ¼     0    0    1    9
-20   -25   0    0    0    0

The first three lines give the constraints, e.g. x1 + 6x2 + x3 = 48. The last line can be read as -20x1 - 25x2 + z = 0; the entry in the bottom right hand cell is the current value of the LP.
(In some texts, a column for z is included. We do not do so.)
Simplex Method

STEP 2: Find the most negative value in the bottom row (here, -25) and then the row with the smallest positive ratio bi/aik, where k is the column chosen and aik > 0. Here we choose min{48/6, 30/3, 9/¼} = 8. [The 'minimum ratio' rule.]
This defines a pivot: the entry 6 in the chosen column x2, first row.

x1    x2    x3   x4   x5   RHS
1     6     1    0    0    48
3     3     0    1    0    30
1     ¼     0    0    1    9
-20   -25   0    0    0    0

Simplex Method

STEP 3: Using row operations, make the pivot 1 and all other column entries 0 (i.e. R1 → (1/6)R1, R2 → R2 - ½R1, R3 → R3 - (1/24)R1, Rz → Rz + (25/6)R1, where the multiples refer to the original R1 and Rz is the bottom row).
This leads to a new tableau. Proceed as before: find column, find pivot, carry out row operations.

x1       x2   x3      x4   x5   RHS
0.167    1    0.167   0    0    8
2.5      0    -0.5    1    0    6
0.958    0    -0.042  0    1    7
-15.833  0    4.167   0    0    200

Simplex Method

STEP 4: If all bottom row coefficients are non-negative, stop; we have the optimum. Otherwise, return to step 2.
Here we use the -15.833 column and find min{8/0.167, 6/2.5, 7/0.958}. This leads to 2.5 as pivot. Use row operations R1 → R1 - (1/15)R2, R2 → (2/5)R2, R3 → R3 - (23/60)R2, Rz → Rz + (19/3)R2.
Obtain a fresh tableau, in optimum form:

x1   x2   x3     x4      x5   RHS
0    1    0.2    -0.067  0    7.6
1    0    -0.2   0.4     0    2.4
0    0    0.15   -0.383  1    4.7
0    0    1      6.333   0    238

Simplex Method

Note all bottom row coefficients are non-negative. x3, x4 are non-basic; x1, x2, x5 are basic. Solution vector (12/5 38/5 0 0 47/10), i.e. we have x1 = 12/5, x2 = 38/5 in the initial problem.
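These row operations are entirely mechanical, so they are easy to script. A minimal sketch of our own in R (pivot() is not a library function), replaying the two pivots above on a tableau stored as a plain matrix:

pivot <- function(tab, r, k) {
  tab[r, ] <- tab[r, ] / tab[r, k]              # scale so the pivot entry becomes 1
  for (i in setdiff(seq_len(nrow(tab)), r))
    tab[i, ] <- tab[i, ] - tab[i, k] * tab[r, ] # clear the rest of column k
  tab
}
tab <- rbind(c(  1,   6, 1, 0, 0, 48),          # constraint rows
             c(  3,   3, 0, 1, 0, 30),
             c(  1, 1/4, 0, 0, 1,  9),
             c(-20, -25, 0, 0, 0,  0))          # objective row
tab <- pivot(tab, 1, 2)   # pivot on the 6 (row 1, column x2)
tab <- pivot(tab, 2, 1)   # pivot on the 2.5 (row 2, column x1)
tab                       # bottom row non-negative: optimum, value 238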
Simplex Method

We clarify the simplex procedure further.
STEP 1: Write the LP, using slack variables, in tableau form.
• A subset of coefficients define the columns of an identity matrix, possibly permuted;
• The bottom row coefficients for these columns should be 0;
• RHS coefficients for vector b should be non-negative.
The bottom row will usually contain a mix of positive and negative values. If or when they are all non-negative, optimum form has been attained. For now, we assume standard form; if not, convert. The method needs to be modified if any RHS coefficients are negative, or there is no identity matrix.
STEP 2: Find the most negative bottom row coefficient (with ties, can choose at random). Suppose this is for variable xk. Select the pivot row by min{bi/aik | aik > 0}, where the bi are the RHS coefficients and the aik the elements of the chosen column. (Again, tie-breaking at random.)
STEP 3: Having chosen the pivot, use row operations so that the pivot becomes 1 and all other column entries 0.
STEP 4: Repeat steps 2 and 3 until all bottom row coefficients are non-negative. Then we have optimum form. The process ensures that the bi remain non-negative.

Simplex Method

Note that the non-basic (not identity column) variables are zero in any tableau.
There may be redundant constraints, leading to a row of zeros in the tableau, which can safely be removed.
We describe later the problems that can arise and how they may be resolved.
If there is a solution, the algorithm is guaranteed to terminate after a finite number of steps (except for cycling, discussed later).
Also check that the tableau method merely streamlines the earlier calculations; the same numbers are used.
Notice too that the process terminates early if one of the following arises:
Simplex Method

1. Some bottom row coefficient is negative but every other entry in that column is non-positive. This is unbounded form; the LP has no finite optimum.
2. For some i, bi ≠ 0 (RHS coefficient non-zero) but aij = 0 for all j (all other row entries zero). The feasible region is the empty set; the constraints are incompatible. The LP is infeasible.
3. bi > 0 for some i but aij ≤ 0 for all j. Again, the constraints are incompatible and again there is no solution. The LP is infeasible.

Simplex Method

Further Example
We solve max -x1 - 2x2 + x3
s.t. x1 - x2 + x3 ≤ 1
x1 + x2 - 2x3 ≤ 4
x1 ≥ 0, x2, x3 free.
Let x2 = x2' - x2'', x3 = x3' - x3'', with x4, x5 slack. Tableaux:

x1   x2'  x2''  x3'  x3''  x4   x5   RHS
1    -1   1     1    -1    1    0    1
1    1    -1    -2   2     0    1    4
1    2    -2    -1   1     0    0    0

x1   x2'  x2''  x3'  x3''  x4   x5   RHS
1    -1   1     1    -1    1    0    1
2    0    0     -1   1     1    1    5
3    0    0     1    -1    2    0    2
Simplex Method

x1   x2'  x2''  x3'  x3''  x4   x5   RHS
3    -1   1     0    0     2    1    6
2    0    0     -1   1     1    1    5
5    0    0     0    0     3    1    7

Solution x1 = 0, x2'' = 6, x3'' = 5, i.e. (x1, x2, x3) = (0, -6, -5), value 7.

Important Point 1: The method as described assumes that all bi ≥ 0, and that some columns of the tableau form an identity matrix, with objective function coefficients zero. Sometimes that is not the case – so alternative procedures are needed.
Important Point 2: The method as described assumes the LP is in standard form. If not, the technique will not yield the LP optimum in general. Convert to standard form!

2.5 Degeneracy and Cycling

Consider the LP:
max ¾x1 - 20x2 + ½x3 - 6x4 + 3 = z
s.t. ¼x1 - 8x2 - x3 + 9x4 ≤ 0
½x1 - 12x2 - ½x3 + 3x4 ≤ 0
x3 ≤ 1
x1, x2, x3, x4 ≥ 0
and the following sequence of tableaux:

Degeneracy and Cycling

x1    x2    x3    x4   x5   x6   x7   RHS
¼     -8    -1    9    1    0    0    0
½     -12   -½    3    0    1    0    0
0     0     1     0    0    0    1    1
-¾    20    -½    6    0    0    0    3

x1   x2    x3    x4    x5   x6   x7   RHS
1    -32   -4    36    4    0    0    0
0    4     1.5   -15   -2   1    0    0
0    0     1     0     0    0    1    1
0    -4    -3.5  33    3    0    0    3

Degeneracy and Cycling

x1   x2   x3     x4     x5   x6   x7   RHS
1    0    8      -84    -12  8    0    0
0    1    0.375  -3.75  -½   ¼    0    0
0    0    1      0      0    0    1    1
0    0    -2     18     1    1    0    3

x1     x2   x3   x4     x5    x6    x7   RHS
1/8    0    1    -10.5  -1.5  1     0    0
-3/64  1    0    3/16   1/16  -1/8  0    0
-1/8   0    0    10.5   1.5   -1    1    1
1/4    0    0    -3     -2    3     0    3
Degeneracy and Cycling

x1     x2     x3   x4   x5     x6      x7   RHS
-2.5   56     1    0    2      -6      0    0
-0.25  5.333  0    1    0.333  -0.667  0    0
2.5    -56    0    0    -2     6       1    1
-0.5   16     0    0    -1     1       0    3

x1     x2   x3      x4   x5   x6     x7   RHS
-1.25  28   0.5     0    1    -3     0    0
0.167  -4   -0.167  1    0    0.333  0    0
0      0    1       0    0    0      1    1
-1.75  44   0.5     0    0    -2     0    3

Degeneracy and Cycling

x1    x2    x3    x4   x5   x6   x7   RHS
¼     -8    -1    9    1    0    0    0
½     -12   -½    3    0    1    0    0
0     0     1     0    0    0    1    1
-¾    20    -½    6    0    0    0    3

Degeneracy and Cycling

But the final tableau is identical to the initial tableau. We have made no progress. Notice that on several occasions, we have had two choices of pivot and have made the 'wrong' choice. This is the phenomenon called cycling.
The problem was in the terms where bi = 0.
An LP is called degenerate if this occurs in some tableau under some sequence of simplex pivots. Equivalently, it occurs if some basic variable is zero. It is not normally serious. If bi is not zero in any tableau, the LP is non-degenerate.
A non-degenerate LP increases the objective function at each iteration, so cannot cycle. In a degenerate LP, there may be no increase following a pivot.
Cycling implies degeneracy. Degeneracy does not imply cycling.

Degeneracy and Cycling

How to avoid cycling? One approach:
a) Check for degeneracy at each stage. If no bi is zero, cycling cannot occur.
b) If there is degeneracy, check for cycling (does any tableau recur?). Most degenerate problems will attain the optimum without cycling.
c) If there is cycling, change the pivot rule. Could choose the least negative rather than the most negative bottom row coefficient, or the candidate pivot column k with smallest index k.
d) In the extremely unlikely event that the problem has not been fixed, use a perturbation method (some bi changed from 0 to δ before we let δ → 0).
Degeneracy and Cycling

Change the pivot rule, by using the least, rather than the most, negative bottom row coefficient:
[Tableaux not reproduced here.]
Optimal form has been attained after two iterations.

Cycling is rare in practice – some LP packages don't even consider it – but be aware of the possibility.

2.6 Initialisation

We can operate the simplex algorithm for an LP in standard form given a suitable starting tableau:
• Certain columns of coefficients contain an identity matrix;
• Bottom (objective function) coefficients for these columns are zero;
• All RHS bi coefficients are non-negative.
These assumptions correspond to the existence of a basic feasible solution (BFS), typically an origin in Rn.
But sometimes there is no obvious BFS, and indeed such a solution may not even exist. Typically, this problem occurs when there are ≥ or equality constraints, rather than all ≤ constraints.

Initialisation

For example, the following LP tableau is not in the right form and cannot obviously be converted – possibly there is no BFS at all.

x1   x2   x3   x4   x5   RHS
-2   1    -4   -1   0    3
3    -2   7    0    -1   -5
-1   -4   -9   0    0    0

But there is no problem for this one. Multiply the first two rows by -1 and read off the BFS x1 = x2 = x3 = 0, x4 = 3, x5 = 5.

x1   x2   x3   x4   x5   RHS
-2   1    -4   -1   0    -3
3    -2   7    0    -1   -5
-1   -4   -9   0    0    0
Initialisation

How to proceed? Could:
• Ensure all bi ≥ 0, by multiplying rows by -1. But that could destroy the identity matrix structure.
• Pivot to create identity matrix structure. But that could lead to negative RHS coefficients.
• 'Play around' with the tableau. 'Experiment'. Use 'judicious pivoting' (for small problems). Such an ad hoc approach is clearly unacceptable.
We describe two methods to address these problems, both using 'artificial variables'.

Initialisation

Big-M Method
Consider the problem:
max x1 + 4x2 + 9x3
s.t. -2x1 + x2 - 4x3 ≥ 3
3x1 - 2x2 + 7x3 ≥ -5
x1, x2, x3 ≥ 0
This is not in the appropriate form, and there is no obvious way to convert it. Introduce artificial variable(s) R solely for the purpose of obtaining an initial basis. They have no real meaning. We aim to drive R out of the basis, so that it/they become zero. If simplex terminates with an optimal solution, with the artificial variables 0, we have solved the LP. If we attain optimal form with at least one artificial variable positive, the original LP has no feasible solution. We modify the objective function ….

Initialisation

Write the LP, with M a huge positive constant, as
max x1 + 4x2 + 9x3 - MR
s.t. -2x1 + x2 - 4x3 - x4 + R = 3
-3x1 + 2x2 - 7x3 + x5 = 5
x1, x2, x3, x4, x5, R ≥ 0
R has been introduced to create an 'artificial' identity matrix structure. The term MR is present to penalise positive values of R; it is designed to compel R = 0 if this can be done.
If the modified problem has a solution with R = 0, it will solve the original LP. If the modified problem has a solution with R > 0, there is no solution to the original LP without introducing R, so the LP is infeasible.

Initialisation

Initial tableau:
[Tableau not reproduced here.]
Pivot as shown, initially to give 'identity structure' (a small step but a crucial one), then a standard simplex pivot:
[Tableaux not reproduced here.]
Notice that we have optimal form (M is very large) but R is non-zero. Deduce that the feasible region is empty; there is no solution to the initial problem.
Initialisation

Modify the problem slightly:
max x1 + 4x2 + 9x3
s.t. -2x1 + x2 - 4x3 ≥ 3
3x1 - 2x2 + 7x3 ≥ -7
x1, x2, x3 ≥ 0
Leading to
max x1 + 4x2 + 9x3 - MR
s.t. -2x1 + x2 - 4x3 - x4 + R = 3
-3x1 + 2x2 - 7x3 + x5 = 7
x1, x2, x3, x4, x5, R ≥ 0
Now proceed as before.
[Tableaux not reproduced here.]

Initialisation

Notice that R has been driven out of the basis. At optimum R = 0. We can read off the solution: x1 = 0, x2 = 7, x3 = 1, z = 37.
With two or more artificial variables, proceed similarly; both (all) will need to be removed as basis variables, if possible.

Two-Phase Method
Phase 1: Again use artificial variables. Create a new objective function consisting of the sum of the artificial variables. Use simplex to minimise this function subject to the given constraints. If this new artificial function can be reduced to zero, then each of the (non-negative) artificial variables will be zero. Then all the original constraints are satisfied: proceed to phase 2. If not, we deduce at once that the original problem is infeasible.

Initialisation

Phase 2: Use the basic feasible solution from phase 1, ignoring the artificial variables, which no longer play any part, as the starting point for the original problem with the original objective function. Apply ordinary simplex to yield the optimum.
Return to the problem:
max x1 + 4x2 + 9x3
s.t. -2x1 + x2 - 4x3 ≥ 3
3x1 - 2x2 + 7x3 ≥ -5
x1, x2, x3 ≥ 0
Initialisation

Solve:
min R
s.t. -2x1 + x2 - 4x3 - x4 + R = 3
-3x1 + 2x2 - 7x3 + x5 = 5
x1, x2, x3, x4, x5, R ≥ 0
Tableau:
[Tableau not reproduced here.]

Initialisation

Pivot, initially for identity structure, then usual simplex:
[Tableaux not reproduced here.]
The solution has min R = ½ (note the change of sign). Since there is no solution for the modified problem with R = 0, the original problem is infeasible, confirming the earlier result.

Initialisation

Consider the modified problem again, in the form:
min R
s.t. -2x1 + x2 - 4x3 - x4 + R = 3
-3x1 + 2x2 - 7x3 + x5 = 7
x1, x2, x3, x4, x5, R ≥ 0
For phase 1, proceed as before.
[Tableaux not reproduced here.]

Initialisation

Now we have a solution to the phase 1 problem with x2 = 3, x5 = 1 and, crucially, R = 0. A solution to the original problem exists.
Find it by dropping the artificial variable, restoring the original objective function, pivoting to make x2 basic, then using standard simplex.
Initialisation

[Tableaux not reproduced here.]
We obtain the same solution as earlier.

Initialisation

Comparison
Which is better, Big-M or Two-Phase? Big-M may be simpler and has the advantage of carrying out the optimisation in one pass. However, it has a serious computational disadvantage. In running the algorithm, we would frequently need to multiply a very large number (M) by a very small number (R), and computer arithmetic can lead to serious round-off error.
For this reason, the two-phase method is more widely used.
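In practice a package performs this initialisation internally. As a check on the modified example, a minimal lpSolve sketch (lp accepts ≥ constraints directly):

library(lpSolve)
res <- lp("max", c(1, 4, 9),
          matrix(c(-2,  1, -4,
                    3, -2,  7), nrow = 2, byrow = TRUE),
          rep(">=", 2), c(3, -7))
res$solution   # expect (0, 7, 1)
res$objval     # expect 37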

2.7 Further Example

We consolidate some ideas encountered with another example. This section can be skimmed by those comfortable with the simplex method so far.
Consider the problem:
max 4x1 + 5x2 + (5/2)x3
s.t. 2x1 + 3x2 + x3 ≤ 9
2x1 + x2 + 2x3 ≤ 9
1 ≤ x1 ≤ 4, 0 ≤ x2 ≤ 1, x3 ≥ 0
This problem has bounded constraints. We could apply standard simplex were it not for the constraint 1 ≤ x1. Methods exist for bounded variables (for example, we can define x1' = x1 - 1) but we solve using the two-phase approach for illustrative and pedagogic reasons.

Further Example

First, write the LP with slack and surplus variables. There is no obvious basic feasible solution, so an artificial variable R is introduced.
max 4x1 + 5x2 + (5/2)x3
s.t. 2x1 + 3x2 + x3 + s1 = 9
2x1 + x2 + 2x3 + s2 = 9
x1 + s3 = 4
x1 - s4 + R = 1
x2 + s5 = 1
(all variables non-negative)
Now create the tableau. Since we shall use the two-phase method, the objective function becomes min R. We expect to find a solution with R = 0 if the original problem has a solution.
Further Example

[The phase 1 and phase 2 tableaux for this example are not reproduced here.]
2.8 Practical LP: Computing

We see how we can solve an LP simply by computer.
min 4x1 + 11x2 - 13x3 + 5x4
s.t. x1 + 4x2 - 9x3 - x4 ≥ 21
2x1 - 7x2 + 10x3 + 10x4 = 13
7x1 + 2x2 + 5x3 - 2x4 = 17
x1 + x2 + x3 + x4 ≥ 8
x1, x2, x3, x4 ≥ 0
Having four constraints, with awkward numbers, and ≥ and equality constraints, means that using Big-M or the two-phase method (or otherwise) directly will be tiresome and prone to error. For larger problems, manual solution is almost impossible.

Computing

Excel
Open Solver under Tools – an Add-in may be needed. One way to proceed is to assume that the variables x1, x2, x3, x4 are in cells A1-A4.
B1 contains the value of the objective function: =4*A1+11*A2-13*A3+5*A4
C1 contains the LHS of the first constraint: =A1+4*A2-9*A3-A4, and similarly D1-F1 the LHSs for the other constraints.
In the Solver dialogue window:
Set Target Cell: $B$1
Equal to: Min
By Changing Cells: $A$1:$A$4
Subject to Constraints – use Add to give $C$1>=21 etc.
Options – Assume Non-Negative and Assume Linear Model

Computing

Now using Solver gives the solution
x1 = 2.165, x2 = 5.970, x3 = 0, x4 = 5.046, z = 99.565 (to 3 d.p.)
[Solver window not reproduced here.]
Some other options are available. For example, Excel supplies a brief sensitivity analysis if required.

Computing

R
There is more than one way to do this. Maybe simplest is to use the package lpSolve. This must be downloaded from the Packages menu.
Then use the following syntax. Fairly self-explanatory, but note:
- Syntax must be used exactly as written; R is very sensitive (it is case-sensitive, for example);
- Coefficients are entered by columns rather than rows.
Computing

library(lpSolve)                                  # load the package first
this.lp = lp(direction = "min",
             objective.in = c(4, 11, -13, 5),
             const.mat = matrix(c(1, 2, 7, 1,     # column of coefficients for x1
                                  4, -7, 2, 1,    # x2
                                  -9, 10, 5, 1,   # x3
                                  -1, 10, -2, 1), # x4
                                nrow = 4),
             const.dir = c(">=", "==", "==", ">="),
             const.rhs = c(21, 13, 17, 8))
this.lp$solution

We obtain the same solution as before:
[1] 2.164557 5.970464 0.000000 5.046414
Alternatives exist in R, such as solveLP (in the linprog package) and simplex (in the boot package). Of course, many packages other than R and Excel can perform LP. Considerations should include speed, the size of the LP and the amount of output desired.

2.9 Duality

Early in the development of LP theory, it was realised that every LP has an associated LP, its dual, and the solutions to the two are closely related. This is important for both theoretical and practical reasons.
Consider an LP in the form
max cTx
s.t. Ax ≤ b
x ≥ 0
Its dual is
min bTy
s.t. ATy ≥ c
y ≥ 0

Duality

For example, the dual of
max 2x1 + x2
s.t. x1 + x2 ≤ 6
x1 - x2 ≤ 2
x2 ≤ 3
x1, x2 ≥ 0
is
min 6y1 + 2y2 + 3y3
s.t. y1 + y2 ≥ 2
y1 - y2 + y3 ≥ 1
y1, y2, y3 ≥ 0

Duality

Note the correspondence between primal L and dual L*:
• One is a maximisation, the other a minimisation;
• Both have inequality constraints, with opposite signs;
• To form the dual, matrix A is transposed;
• The objective function in one is the RHS vector in the other, and vice versa;
• Each primal variable corresponds to a dual constraint, and vice versa.
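The pair can be checked numerically; a minimal lpSolve sketch (the two optimal values agree, as the Strong Duality Theorem below asserts):

library(lpSolve)
A <- matrix(c(1,  1,
              1, -1,
              0,  1), nrow = 3, byrow = TRUE)
primal <- lp("max", c(2, 1), A, rep("<=", 3), c(6, 2, 3))
dual   <- lp("min", c(6, 2, 3), t(A), rep(">=", 2), c(2, 1))
c(primal = primal$objval, dual = dual$objval)   # equal at optimum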
Duality

Result 1
The dual of the dual of an LP is the original LP. [We usually refer to the original LP as the primal.]
Proof
Let L (primal) be the LP max cTx s.t. Ax ≤ b, x ≥ 0. Notice that every LP can be written in this way.
The dual LP L* is min bTy s.t. ATy ≥ c, y ≥ 0,
or -max(-bTy) s.t. -ATy ≤ -c, y ≥ 0.
Then (L*)* is -min(-cTx) s.t. -(AT)Tx ≥ -b, x ≥ 0,
which can be written max cTx s.t. Ax ≤ b, x ≥ 0,
and that is just the LP L, i.e. (L*)* = L.
We can therefore talk of a dual pair of LPs.

Duality

Now … what about equality constraints?
Let us find the dual of L: min cTx s.t. Ax = b, x ≥ 0.
Write L as min cTx s.t. [A; -A]x ≥ [b; -b], x ≥ 0
(stacking A on top of -A, so that Ax = b becomes the pair Ax ≥ b, -Ax ≥ -b).
Then L* is max [bT -bT][u; v] s.t. [AT -AT][u; v] ≤ c, [u; v] ≥ 0,
where y = [u; v].

Duality

Thus L* can be written
max bT(u-v) s.t. AT(u-v) ≤ c, u, v ≥ 0,
or max bTz s.t. ATz ≤ c, where z = u - v is a vector of free variables.
We can deduce a further correspondence between an LP and its dual (partial proof given):
Result 2
The dual variable defined by an equality constraint is unrestricted in sign, and vice versa.

Duality

Result 3 (Weak Duality Theorem)
Consider the primal-dual LPs: max cTx s.t. Ax ≤ b, x ≥ 0 and min bTy s.t. ATy ≥ c, y ≥ 0. Then if y is feasible for the minimisation and x is feasible for the maximisation, cTx ≤ bTy.
Proof
Since x is feasible, Ax ≤ b, x ≥ 0.
Thus (Ax)T ≤ bT, hence xTAT ≤ bT and xTATy ≤ bTy (since y ≥ 0).
But y is feasible, so ATy ≥ c, y ≥ 0. So xTATy ≥ xTc (since x ≥ 0).
It follows that xTc ≤ bTy. But xTc is a scalar, so xTc = (xTc)T = cTx. The result follows.
Duality

Result 3 is usually called the Weak Duality Theorem. In words, it says that the value of the objective function of the minimum problem is always greater than or equal to that of the maximum problem. Some consequences, for a primal-dual pair:
Corollary 3.1 (direct from Weak Duality)
The value of the objective function of the maximum problem for any feasible solution is a lower bound to the minimum value of the minimum objective function.
Corollary 3.2 (direct from Weak Duality)
The value of the objective function of the minimum problem for any feasible solution is an upper bound to the maximum value of the maximum objective function.

Duality

Corollary 3.3 (direct from Weak Duality)
If the maximum problem is feasible but its objective function is unbounded, the minimum problem cannot have a feasible solution. [If the minimum problem has a feasible solution y*, then cTx ≤ bTy* for all solutions x of the maximum problem, which cannot occur when the maximum problem is unbounded.]
Corollary 3.4 (direct from Weak Duality)
If the minimum problem is feasible but its objective function is unbounded, the maximum problem cannot have a feasible solution. [If the maximum problem has a feasible solution x*, then cTx* ≤ bTy for all solutions y of the minimum problem, which cannot occur when the minimum problem is unbounded.]

Duality

Corollary 3.5
If both problems have feasible vectors, both have optimal vectors (contrapositive of 3.3 and 3.4).

Result 4
Both primal and dual may be infeasible.
Proof
A simple example will suffice, e.g.
L: maximize 2x1 - x2
   subject to x1 - x2 ≤ 1
              -x1 + x2 ≤ -2
              x1, x2 ≥ 0
Easy to see that both L and L* are infeasible.

Duality

Result 5 (Strong Duality Theorem)
If one of the problems has an optimum, then so does the other and the optimal values are equal. (Proof omitted.)

This result lies at the heart of duality theory. Unfortunately, I am not aware of a satisfactory elementary proof; in fact I am almost convinced that no elementary proof exists. The proofs I have seen are either short and dubious, or long and hard to follow. If you find a good one, let me know. But see Bazaraa, Sherali and Shetty (not elementary).
Hillier and Lieberman do not give a proof. Some other texts give a matrix based proof which does not fit with the approach adopted in this course. See, for example
www.math.ubc.ca/~anstee/math340/340strongduality.pdf
Duality

You should know this very important result but its proof will not be required.

Corollary 5.1 (logical consequence of earlier results)
The following are the only possible relationships between the primal and dual problems:
a) Both are feasible and bounded, so have (equal) optimal solutions;
b) One is feasible and unbounded, the other infeasible;
c) Both are infeasible.

Note that the finiteness of all solutions implies the existence of an optimum for an LP. This may not be true for an NLP. Consider, for example, the problem min e^x for x ≤ 0. This does not have an optimum.

Duality

Result 6 (Complementary Slackness)
Consider a primal-dual pair of feasible LPs, one in standard form. If a constraint of either LP is slack at optimum, then in the other problem, the corresponding dual variable is zero at optimum. If a variable of either problem is non-zero at optimum, then in the dual the corresponding constraint is binding at optimum.

Example
Suppose we are asked to solve the LP (call it P):
min 12w1 + 20w2 + w3
s.t. 0.5w1 + w2 + 1/16 w3 - 1200w4 ≥ 24
     w1 + w2 + 1/24 w3 + 800w4 ≥ 20
     w1, w2, w3, w4 ≥ 0.
Duality

This is not straightforward. But consider the dual LP (say D):

max 24x1 + 20x2
s.t. 0.5x1 + x2 ≤ 12
     x1 + x2 ≤ 20
     1/16 x1 + 1/24 x2 ≤ 1
     -1200x1 + 800x2 ≤ 0
     x1, x2 ≥ 0.

It is easy to check (for example, graphically) that x1 = 12, x2 = 6, z* = 408 is optimal for D. So we know (Result 5) that P has value z = 408. But what are w1, w2, w3, w4 at optimum? We use complementary slackness.

Duality

x1 > 0 → first constraint of P is binding at optimum (holds with equality)
x2 > 0 → second constraint of P is binding at optimum
Constraint 2 of D is slack → w2 = 0
Constraint 4 of D is slack → w4 = 0

Putting the above four conditions together, we find
0.5w1 + 1/16 w3 = 24
w1 + 1/24 w3 = 20

Solving gives w1 = 6, w3 = 336, so that (w1, w2, w3, w4)T = (6, 0, 336, 0)T is optimal for P. (Check: we have z = 408.)
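The recovery step above reduces to solving a 2×2 linear system, which can be confirmed in a few lines (a sketch; all values are taken from the example):

import numpy as np

# x1, x2 > 0 at the optimum of D, so constraints 1 and 2 of P bind;
# constraints 2 and 4 of D are slack, so w2 = w4 = 0. Solve:
#   0.5 w1 + (1/16) w3 = 24
#       w1 + (1/24) w3 = 20
M = np.array([[0.5, 1/16], [1.0, 1/24]])
w1, w3 = np.linalg.solve(M, np.array([24.0, 20.0]))
print(w1, w3)                 # 6.0  336.0
print(12*w1 + 20*0 + w3)      # objective of P: 408.0, equal to z* of D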
Duality

Proof of Complementary Slackness
Present an algebraic (not matrix) proof.
Write the primal P with slacks as:
P: max Σj cjxj = z
   s.t. Σj aijxj + si = bi for all i
        xj, si ≥ 0 for all i, j
We can always write a primal in this way, including slack variables si.
Write the dual D as:
D: min Σi biyi = w
   s.t. Σi aijyi - tj = cj for all j
        yi, tj ≥ 0 for all i, j

Duality

We can always write the dual in this way, including surplus variables tj.
Both P and D are assumed feasible, so have optimal solutions z*, w*. At optimum, we have
w* - z* = Σi biyi - Σj cjxj
        = Σi (Σj aijxj + si)yi - Σj (Σi aijyi - tj)xj
        = Σi siyi + Σj tjxj
Again by duality, w* = z*, so Σi siyi + Σj tjxj = 0.
But all variables are non-negative.
Therefore siyi = 0 for all i, tjxj = 0 for all j.
Duality

If a constraint is slack at optimum, either si ≠ 0, when yi = 0, or tj ≠ 0, when xj = 0.
If a variable is non-zero at optimum, either yi ≠ 0, when si = 0, or xj ≠ 0, when tj = 0.
This is precisely the assertion of the Complementary Slackness Theorem.

Apart from its theoretical importance, duality can sometimes be put to computational advantage by drastically reducing the number of pivots needed to attain optimum.

Duality

Consider the following LP, for an arbitrary positive integer M (2 variables, 2M constraints):
max x1 + x2
s.t. (i - M)x1 + x2 ≤ i(i - 1) for i = 1, …, M
     x1 + (i - M)x2 ≤ i(i - 1) for i = 1, …, M
     x1, x2 ≥ 0
We can show that, using ordinary simplex, we can find the optimum at (M(M-1), M(M-1)), but (M+1) pivots are needed to attain this point.
Duality

Now consider the dual LP. This is
min 0y1 + 2y2 + … + M(M-1)yM + 0yM+1 + 2yM+2 + … + M(M-1)y2M
s.t. -(M-1)y1 - … - 1yM-1 + 0yM + 1yM+1 + 1yM+2 + … + 1y2M ≥ 1
     1y1 + 1y2 + … + 1yM - (M-1)yM+1 - … - 1y2M-1 - 0y2M ≥ 1
     yi ≥ 0 (i = 1, 2, …, 2M)

Apart from non-negativity, there are only two constraints, and the optimum at (0, …, 0, 1, 0, …, 0, 1) is attained with two pivots. Of course, the value 2M(M-1) is the same for both primal and dual.
But the work required to obtain optimal form is much less, for large M, through using the dual, since
pivots to solve dual / pivots to solve primal = 2/(M+1).

Summary of Duality Relations

Four Possible Primal-Dual Problems

                                   Dual
                        Finite optimum  Unbounded  Infeasible
Primal  Finite optimum       ✓              X          X
        Unbounded            X              X          ✓
        Infeasible           X              ✓          ✓
2.10 Dual Simplex Method

Shall consider a variant of the standard simplex method of section 2.4. That only operates when the RHS column is always non-negative, so the basic solution is feasible at each iteration.
We refer to the main simplex algorithm as primal simplex, and the tableau as primal feasible if all RHS elements are non-negative. When some elements are negative, we call the tableau primal infeasible.
We encountered primal infeasible tableaux when initialisation was discussed. The dual simplex method can sometimes provide a simpler way to deal with such cases.

Dual Simplex Method

For the primal simplex algorithm, some elements in row z (the objective function row) will be negative until the final iteration, when all elements of row z are nonnegative and we have attained optimum. In the event that all elements of row z are non-negative, we say that the associated tableau is dual feasible. Alternatively, if some of the elements of row z are negative, we have a dual infeasible tableau.

As described, the primal simplex method works with primal feasible, but dual infeasible (non-optimal), tableaux. At the final (optimal) solution, the tableau is both primal and dual feasible. Throughout the process we maintain primal feasibility and drive toward dual feasibility.
Dual Simplex Method

In this section, a variant of the primal approach, known as the dual simplex method, is considered that works in just the opposite fashion. Until the final iteration, each tableau examined is primal infeasible (some negative values on the right-hand side) and dual feasible (all elements in row z are nonnegative). At the final (optimal) iteration the solution will be both primal and dual feasible.
Throughout the process we maintain dual feasibility and drive toward primal feasibility. For a given problem, both the primal and dual simplex algorithms, if applicable, will terminate at the same solution but arrive there by different routes.

Dual Simplex Method

The dual simplex algorithm is used when an initial dual feasible solution (all z row coefficients non-negative) is readily available. Two contexts are: re-optimising a problem after a further constraint has been added; and when some parameters change, so that the previous optimum may no longer be feasible. It is thus a most useful tool in sensitivity analysis.
We shall not consider the full relationship between the primal and dual at each iteration, but rather focus on the mechanics of implementing the dual simplex method in tableau format. It will be seen that the application of dual simplex is quite similar to primal simplex. There is a correspondence between rows and columns (constraints and variables) which marks it out as a method rooted in duality.
Dual Simplex Method

With reference to the tableau, the algorithm must begin with a basic solution that is dual feasible, so all the elements of row z must be non-negative. As in the primal simplex algorithm, at each iteration there is an exchange: a basic variable becomes non-basic, and a non-basic variable becomes basic. It is only the rule that is different, and the change reflects the usual duality properties: rows and columns interchanged, max and min interchanged, and so on.
If the tableau is both primal infeasible and dual infeasible (negatives in both the RHS and the z row), the method cannot be used without modification.

Dual Simplex Method

In full, we consider a tableau in standard form having c ≥ 0 (all z row coefficients non-negative).
1. If b ≥ 0, we have optimal form.
2. Otherwise, select h with bh < 0 (usually the most negative bh, where there is a choice).
3. If ahj ≥ 0 for j = 1, 2, …, n, then the dual is unbounded, so the primal is infeasible.
4. Otherwise, select k such that ck/ahk = max {cj/ahj | ahj < 0}.
5. Pivot on ahk in the usual way and return to step 1.
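These steps translate directly into code. Here is a sketch (my own illustrative implementation, not prescribed by the notes) operating on a tableau stored as a NumPy array, with constraint rows first, the z row last and the RHS in the final column; it assumes the tableau is already dual feasible.

import numpy as np

def dual_simplex(T):
    T = T.astype(float)
    m = T.shape[0] - 1                        # number of constraint rows
    while True:
        rhs = T[:m, -1]
        if np.all(rhs >= 0):
            return T                          # step 1: optimal form
        h = int(np.argmin(rhs))               # step 2: most negative b_h
        row = T[h, :-1]
        if np.all(row >= 0):                  # step 3: dual unbounded
            raise ValueError("primal infeasible")
        neg = np.where(row < 0)[0]
        k = neg[int(np.argmax(T[m, neg] / row[neg]))]   # step 4: max c_j/a_hj
        T[h] /= T[h, k]                       # step 5: pivot on a_hk
        for i in range(T.shape[0]):
            if i != h:
                T[i] -= T[i, k] * T[h]

Applied to the worked example that follows, two pivots give the final tableau with RHS column (4, 4, 1, 2 | -26), i.e. min z = 26.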
Dual Simplex Method

Consider the LP:
min z = 2x1 + 15x2 + 18x3
s.t. -x1 + 2x2 - 6x3 ≤ -10
     x2 + 2x3 ≤ 6
     2x1 + 11x3 ≤ 19
     -x1 + x2 ≤ -2
     x1, x2, x3 ≥ 0

In standard form, with slack variables added, this is:
-max -2x1 - 15x2 - 18x3
s.t. -x1 + 2x2 - 6x3 + x4 = -10
     x2 + 2x3 + x5 = 6
     2x1 + 11x3 + x6 = 19
     -x1 + x2 + x7 = -2
     x1, x2, x3, x4, x5, x6, x7 ≥ 0

Dual Simplex Method

Tableau form:

x1  x2  x3  x4  x5  x6  x7  RHS
-1   2  -6   1   0   0   0  -10
 0   1   2   0   1   0   0    6
 2   0  11   0   0   1   0   19
-1   1   0   0   0   0   1   -2
 2  15  18   0   0   0   0    0

This would be in optimal form, but two RHS coefficients are negative. The tableau is primal infeasible. It would give a solution with x4, x7 < 0.
Rather than apply a two-phase method, or the Big-M method, it is simpler to apply dual simplex to the dual feasible tableau. Proceed as follows.
Dual Simplex Method

The vector b has two negative components, b1 = -10 and b4 = -2. We choose b1 = -10 because it is smaller.
Next we choose an entry of the matrix A which produces the largest value of cj/a1j with negative a1j:
max { c1/a11 = 2/(-1), c3/a13 = 18/(-6) } = -2
Then a11 = -1 is the pivot. Use standard row operations to make x1 basic, i.e. to make a11 = 1, other column entries zero.

x1  x2  x3  x4  x5  x6  x7  RHS
 1  -2   6  -1   0   0   0   10
 0   1   2   0   1   0   0    6
 0   4  -1   2   0   1   0   -1
 0  -1   6  -1   0   0   1    8
 0  19   6   2   0   0   0  -20

Dual Simplex Method

Repeat. There is only one negative b entry, b3 = -1, and only one entry a33 = -1 with a3j < 0, so a33 = -1 is the pivot. The non-basic variable x3 must be changed into (0, 0, 1, 0)T. The new tableau is

x1  x2  x3  x4  x5  x6  x7  RHS
 1  22   0  11   0   6   0    4
 0   9   0   4   1   2   0    4
 0  -4   1  -2   0  -1   0    1
 0  23   0  11   0   6   1    2
 0  43   0  14   0   6   0  -26

Since all the components of the vector b are nonnegative, this is the final tableau; the minimum value of z = -(-26) = 26 and the optimal solution is (4, 0, 1, 0, 4, 0, 2)T, or x1 = 4, x2 = 0, x3 = 1.
Dual Simplex Method

By contrast, the following tableau has no solution:

x1  x2  x3  x4  x5  x6  x7  RHS
 1  -2   6  -1   0   0   0   10
 0   1   2   0   1   0   0    6
 0   4   1   2   0   1   0   -1
 0  -1   6  -1   0   0   1    8
 0  19   6   2   0   0   0  -20

It is impossible to find a pivot using the above rules and method.
Alternatively, the third row tells us at once that the tableau is in infeasible form, since 4x2 + x3 + 2x4 + x6 = -1 is impossible.

Dual Simplex Method

Further example:
min z = x1 + 4x2 + 3x4
s.t. x1 + 2x2 - x3 + x4 ≥ 3
     -2x1 - x2 + 4x3 + x4 ≥ 2
     x1, x2, x3, x4 ≥ 0

Use dual simplex, with the pivots as shown below.
Dual Simplex Method

x1  x2  x3  x4  x5  x6  RHS
-1  -2   1  -1   1   0   -3
 2   1  -4  -1   0   1   -2
 1   4   0   3   0   0    0

Two dual simplex pivots (first on a11 = -1, then on a23 = -2) lead to the optimal tableau

x1  x2   x3  x4   x5    x6   RHS
 1  3.5   0  2.5  -2   -0.5    7
 0  1.5   1  1.5  -1   -0.5    4
 0  0.5   0  0.5   2    0.5   -7

Solution x1 = 7, x2 = 0, x3 = 4, x4 = 0, z = -(-7) = 7.
A lot easier than phase 1 and phase 2!

Dual Simplex Method

Now suppose the constraint x3 ≥ 6 is imposed. What to do? Modify the final tableau, adding a row for -x3 + x7 = -6:

x1  x2   x3  x4   x5    x6   x7  RHS
 1  3.5   0  2.5  -2   -0.5   0    7
 0  1.5   1  1.5  -1   -0.5   0    4
 0   0   -1   0    0     0    1   -6
 0  0.5   0  0.5   2    0.5   0   -7

Obtain the required simplex form by adding row 2 to row 3:

x1  x2   x3  x4   x5    x6   x7  RHS
 1  3.5   0  2.5  -2   -0.5   0    7
 0  1.5   1  1.5  -1   -0.5   0    4
 0  1.5   0  1.5  -1   -0.5   1   -2
 0  0.5   0  0.5   2    0.5   0   -7
Dual Simplex Method

A final dual simplex pivot on a36 = -0.5 leads to the optimal (both primal and dual feasible) tableau

x1  x2  x3  x4  x5  x6  x7  RHS
 1   2   0   1  -1   0  -1    9
 0   0   1   0   0   0  -1    6
 0  -3   0  -3   2   1  -2    4
 0   2   0   2   1   0   1   -9

Read off the solution x1 = 9, x2 = 0, x3 = 6, x4 = 0.
A lot simpler than completely resolving the problem, as well as Phase 1/Phase 2.

2.11 Sensitivity Analysis

Also known as post-optimal analysis. After solving an LP, some specification may change. Can sometimes solve the modified problem without starting all over again. Additionally, can sometimes find what changes can occur in a given value without altering the solution.

Recall the brewery problem - slightly modified:
max 6x1 + 5x2 + 3x3 + 7x4 (revenue)
s.t. x1 + x2 + 3x4 ≤ 50 (malt)
     2x1 + x2 + 2x3 + x4 ≤ 150 (hops)
     x1 + x2 + x3 + 4x4 ≤ 80 (yeast)
     x1, x2, x3, x4 ≥ 0
Sensitivity Analysis

Initial tableau M:

x1  x2  x3  x4  s1  s2  s3  RHS
 1   1   0   3   1   0   0   50
 2   1   2   1   0   1   0  150
 1   1   1   4   0   0   1   80
-6  -5  -3  -7   0   0   0    0

Optimal tableau M*, after standard simplex:

x1  x2  x3  x4  s1  s2  s3  RHS
 0   1   0   7   0  -1   2   10
 1   0   0  -4   1   1  -2   40
 0   0   1   1  -1   0   1   30
 0   0   0   7   3   1   1  380

Solution x1 = 40, x2 = 10, x3 = 30, x4 = 0.

Sensitivity Analysis

We shall consider:
• Changes in production, e.g. where we insist that x4 > 0, so some units of Premium must be produced, even though at optimum x4 = 0;
• Changes in resources, e.g. where the amount of available yeast alters (so the RHS constant column is different);
• Changes in selling prices, e.g. where the revenue for Dark increases (so the objective function row is altered);
• New constraints, where an extra row is added to the tableau.
We only look at changes in a single coefficient, although the methods can be extended for multiple changes.
Sensitivity Analysis

Changes in Production: Non-Basic Variables
Suppose we require x4 = 1 rather than x4 = 0. The method is to combine the x4 column with the constant column in M*, giving constant column (10 - 7x4, 40 + 4x4, 30 - x4 | 380 - 7x4).
Provided the constant column is non-negative, the tableau remains optimal.
When x4 = 1, the new optimum vector is (x1, x2, x3, x4)T = (44, 3, 29, 1)T, z = 373.

Sensitivity Analysis

Similarly, if we require 5 units of hops to be left over, s2 = 5, and combining the s2 column with the constant column in the same way, we can give the new optimum in this case as (x1, x2, x3, x4)T = (35, 15, 30, 0)T and z = 375.
Sensitivity Analysis

Supposing the modified tableau is no longer optimal? Use dual simplex. Thus, if we insist x4 = 2, the constant column becomes (10 - 14, 40 + 8, 30 - 2 | 380 - 14) = (-4, 48, 28 | 366).
This is in the appropriate form to use dual simplex, and a further pivot (on the -1 in the s2 column of the first row) gives the new optimum (x1, x2, x3, x4)T = (44, 0, 28, 2)T and z = 362.

Sensitivity Analysis

Changes in Production: Basic Variables
Now make a change such as requiring x1 = 41 instead of x1 = 40. Consider the second constraint:
x1 - 4x4 + s1 + s2 - 2s3 = 40.
To increase x1 to 41, either x4 = ¼ or s3 = ½.
(Can show algebraically that a mixture will not be profitable.)
Since z = 380 - 7x4 - 3s1 - s2 - s3, increasing x4 by ¼ lowers z by 7/4, while increasing s3 by ½ only lowers z by ½. So use s3, and combine the s3 column with the RHS, as before.
Sensitivity Analysis

Setting s3 = ½ gives (x1, x2, x3, x4)T = (41, 9, 29.5, 0)T and z = 379.5.
Similarly, suppose x2 = 8 instead of x2 = 10.
Consider the constraint x2 + 7x4 - s2 + 2s3 = 10.
The change implies x4 = 2/7 or s3 = 1.
Since z = 380 - 7x4 - 3s1 - s2 - s3, z will reduce to 378 in the former case and 379 in the latter. Choose s3 = 1. Now, using the tableau on the previous slide, we have
(x1, x2, x3, x4)T = (40 + 2s3, 10 - 2s3, 30 - s3, 0)T = (42, 8, 29, 0), and z = 379.

Reality check: all modified solutions are less than 380.

Sensitivity Analysis

Changes in Resources
Suppose the available quantity of malt/hops/yeast increases or decreases? If, at optimum, all of a resource is not used up (slack variable positive), then increasing the resource, or decreasing it by less than the amount that would make the constraint tight, will not affect the optimum. So only consider changes in zero slack variables.
Let the amount of yeast available be 80 + a instead of 80 (a could be positive or negative). Modify the initial tableau, replacing 80 by 80 + a; call this M1. Perform the same operations used to derive M* from M. This gives tableau M1*:
Sensitivity Analysis

x1  x2  x3  x4  s1  s2  s3  RHS
 0   1   0   7   0  -1   2  10+2a
 1   0   0  -4   1   1  -2  40-2a
 0   0   1   1  -1   0   1  30+a
 0   0   0   7   3   1   1  380+a

But notice that s3 is the slack variable for the yeast constraint, and the modified RHS is obtained by adding a times the s3 column – a considerable time saving.
The tableau is in optimal form provided 10+2a ≥ 0, 40-2a ≥ 0, 30+a ≥ 0, i.e. if -5 ≤ a ≤ 20.
So if the amount of available yeast increases from 80 to 90, can read off the new optimum vector as (x1, x2, x3, x4)T = (20, 30, 40, 0)T.
The method can generally be used when a single resource, with a zero (binding) slack variable, changes and the tableau remains optimal.

Sensitivity Analysis

If for some a the tableau is no longer optimal, can use dual simplex. If the quantity of yeast rises to 110, then a = 30 and the tableau is

x1  x2  x3  x4  s1  s2  s3  RHS
 0   1   0   7   0  -1   2   70
 1   0   0  -4   1   1  -2  -20
 0   0   1   1  -1   0   1   60
 0   0   0   7   3   1   1  410

A single dual simplex pivot on the '-2' entry leads to optimal form:

x1    x2  x3  x4   s1    s2   s3  RHS
 1     1   0   3    1     0    0   50
-0.5   0   0   2  -0.5  -0.5   1   10
 0.5   0   1  -1  -0.5   0.5   0   50
 0.5   0   0   5   3.5   1.5   0  400

The modified solution is (x1, x2, x3, x4)T = (0, 50, 50, 0)T (z = 400).
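The RHS-ranging shortcut above is one line of algebra, which a short sketch makes explicit (array values copied from M*):

import numpy as np

rhs = np.array([10.0, 40.0, 30.0, 380.0])   # RHS column of M*
s3_col = np.array([2.0, -2.0, 1.0, 1.0])    # s3 column of M*

def perturbed_rhs(a):
    return rhs + a * s3_col                 # RHS column of M1*

print(perturbed_rhs(10))   # [30. 20. 40. 390.] -> optimum (20, 30, 40, 0)
# Optimal form requires the first three entries to stay non-negative,
# i.e. 10+2a >= 0, 40-2a >= 0, 30+a >= 0, giving -5 <= a <= 20.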
Sensitivity Analysis

Changes in Selling Prices
The objective function coefficients can be regarded as selling prices, e.g. ₤6 is the selling price for a barrel of Light. Can represent changes in 2-D graphically. [Diagram omitted.]

Sensitivity Analysis

In general, provided the contour for the optimum does not extend beyond certain limits, the optimum vector will not change. We can determine limits of change of a price so that there is no change in the optimum vector (although of course z may change). Suppose q (positive or negative) denotes a change in the selling price of Light. The initial tableau is as M but with -6-q replacing -6:

x1    x2  x3  x4  s1  s2  s3  RHS
 1     1   0   3   1   0   0   50
 2     1   2   1   0   1   0  150
 1     1   1   4   0   0   1   80
-6-q  -5  -3  -7   0   0   0    0
Sensitivity Analysis

Performing the same operations as were used to derive M* from M gives:

x1  x2  x3  x4  s1  s2  s3  RHS
 0   1   0   7   0  -1   2   10
 1   0   0  -4   1   1  -2   40
 0   0   1   1  -1   0   1   30
-q   0   0   7   3   1   1  380

(Notice this can be written directly from M*.)
And a single further pivot leads to:

x1  x2  x3  x4    s1   s2   s3    RHS
 0   1   0   7     0   -1    2     10
 1   0   0  -4     1    1   -2     40
 0   0   1   1    -1    0    1     30
 0   0   0  7-4q  3+q  1+q  1-2q  380+40q

This is optimal provided 7-4q ≥ 0, 3+q ≥ 0, 1+q ≥ 0, 1-2q ≥ 0, i.e. -1 ≤ q ≤ ½.

Sensitivity Analysis

So if the selling price of Light drops from 6 to 5.5, q = -½ and we can read off the optimum as (x1, x2, x3, x4)T = (40, 10, 30, 0)T as before, but with profit 360 instead of 380.
If the tableau is no longer in optimal form, further pivots will be required. If the selling price of Light drops to 4.5, we have q = -1.5.
The resulting tableau is not in optimum form:

x1  x2  x3  x4  s1   s2   s3  RHS
 0   1   0   7   0   -1    2   10
 1   0   0  -4   1    1   -2   40
 0   0   1   1  -1    0    1   30
 0   0   0  13  1.5 -0.5   4  320
Sensitivity Analysis

But an ordinary simplex pivot yields optimum form:

x1   x2  x3  x4  s1  s2  s3  RHS
 1    1   0   3   1   0   0   50
 1    0   0  -4   1   1  -2   40
 0    0   1   1  -1   0   1   30
 0.5  0   0  11   2   0   3  340

so the optimum becomes (x1, x2, x3, x4)T = (0, 50, 30, 0)T with z = 340.

Now briefly consider the effect of changing prices for non-basic variables. Notice that if the variable is non-basic, it is not profitable at optimum. It will surely not become profitable (i.e. enter the basis) if the price is lowered, but if the price is raised sufficiently it may enter the basis.
Premium is not in the basis. Suppose 7 + r is the new selling price.

Sensitivity Analysis

The final tableau will be

x1  x2  x3  x4   s1  s2  s3  RHS
 0   1   0   7    0  -1   2   10
 1   0   0  -4    1   1  -2   40
 0   0   1   1   -1   0   1   30
 0   0   0  7-r   3   1   1  380

Again this can be written from M*.
The optimum is unchanged provided 7-r ≥ 0, that is, the selling price of Premium does not exceed 14. If the selling price is raised beyond this level, the tableau will not be optimal and a further pivot is needed.
Sensitivity Analysis

Suppose r = 8. The tableau is

x1  x2  x3  x4  s1  s2  s3  RHS
 0   1   0   7   0  -1   2   10
 1   0   0  -4   1   1  -2   40
 0   0   1   1  -1   0   1   30
 0   0   0  -1   3   1   1  380

A standard simplex pivot leads to:

x1  x2         x3  x4  s1  s2         s3         RHS
 0   0.142857   0   1   0  -0.142857   0.285714   1.428571
 1   0.571429   0   0   1   0.428571  -0.857143  45.71429
 0  -0.142857   1   0  -1   0.142857   0.714286  28.57143
 0   0.142857   0   0   3   0.857143   1.285714  381.4286

The new optimum is (x1, x2, x3, x4)T = (45.71, 0, 28.57, 1.43)T, or (45 5/7, 0, 28 4/7, 1 3/7)T, and z = 381 3/7.

Sensitivity Analysis

New Constraints
These are usually handled by dual simplex.
Look at the original final solution again: (x1, x2, x3, x4)T = (40, 10, 30, 0)T.
Suppose we require that the total amounts of Light and Dark are at least the total amounts of Ale and Premium, i.e. x1 + x2 ≥ x3 + x4. Since 40 + 10 ≥ 30 + 0, the constraint is non-binding, the original solution still stands and no further steps are needed.
But if the total amounts of Light and Dark are to be at least twice the total amounts of Ale and Premium, the picture changes. Since 40 + 10 < 2(30 + 0), the constraint must be used.
Sensitivity Analysis

The following sequence of tableaux leads to optimum.
Add the constraint x1 + x2 ≥ 2(x3 + x4), i.e. -x1 - x2 + 2x3 + 2x4 + s4 = 0:

x1  x2  x3  x4  s1  s2  s3  s4  RHS
 0   1   0   7   0  -1   2   0   10
 1   0   0  -4   1   1  -2   0   40
 0   0   1   1  -1   0   1   0   30
-1  -1   2   2   0   0   0   1    0
 0   0   0   7   3   1   1   0  380

Choose pivots to give a tableau suitable for dual simplex (add rows 1 and 2, minus twice row 3, to row 4):

x1  x2  x3  x4  s1  s2  s3  s4  RHS
 0   1   0   7   0  -1   2   0   10
 1   0   0  -4   1   1  -2   0   40
 0   0   1   1  -1   0   1   0   30
 0   0   0   3   3   0  -2   1  -10
 0   0   0   7   3   1   1   0  380

Now apply dual simplex, with the pivot on the -2 entry in row 4:

x1  x2  x3  x4    s1    s2  s3  s4    RHS
 0   1   0  10     3    -1   0   1      0
 1   0   0  -7    -2     1   0  -1     50
 0   0   1  2.5   0.5    0   0  0.5    25
 0   0   0  -1.5 -1.5    0   1  -0.5    5
 0   0   0  8.5   4.5    1   0  0.5   375

Optimal form: x1 = 50, x2 = 0, x3 = 25, x4 = 0.
Sensitivity Analysis

Summary
Have considered some methods of modifying an LP solution when a single input value changes. These methods are of practical importance.
We can extend the methods to give solutions when a given input takes value θ, for all real θ. This is called parametric programming.
Extensions exist to examine simultaneous changes in two or more values. Not covered in the course.
We have also seen the use of dual simplex when modifying an LP solution.
With so-called technology changes to matrix A, or in other more complex cases, resolving the whole LP may be simplest.

2.12 Interior Point Methods

A collection of related approaches. These are the subject of continuing research, in contrast to simplex methods, which are well researched and known to work well in most cases. It is only fair to say that most small or medium size LPs can in fact be solved efficiently by a number of computer packages which use simplex based methods, without the user needing to know much theory. The main problems arise when formulating the problem in an appropriate mathematical way and, of particular relevance to this section, when the problem size becomes very large and alternative methods are needed.
References:
• Ye, Y, Interior Point Algorithms: Theory and Analysis, Wiley, 1997
• Wright, S, Primal-Dual Interior Point Methods, SIAM, 1997
Interior Point Methods

• Online webpages. The first part of an Edinburgh University thesis is clearly written. The 2007 PhD thesis by Marco Colombo is entitled Advances in Interior Point Methods for Large-Scale Linear Programming.
• The best elementary account I have found is given by Hillier and Lieberman, which we broadly follow.
• For a popular account, see the ‘cover story’ in New Scientist no. 2877, dated 11/08/2012.
• For a more technical but clear account, see
www.ise.ncsu.edu/fuzzy-neural/wp-content/uploads/sites/9/2017/12/Lecture6.ppt.pdf

The main developments were initiated by Karmarkar in the 1980s. There are some similarities with simplex:
• Both are iterative procedures;
• Both usually start with an initial feasible solution.

Interior Point Methods

But there are major differences:
• Simplex methods move around the boundary of the feasible region; interior point methods move through the feasible region.
• Simplex determines the optimum; interior point methods typically terminate close to optimum, when a convergence criterion has been satisfied.
• Simplex is much more efficient for small to medium size LPs, but interior point methods are better suited to very large LPs.
Interior Point Methods

To clarify the last point, we consider the ‘Klee-Minty’ example of an LP with d variables - see
http://www.math.ubc.ca/~israel/m340/kleemin3.pdf:

min -xd
s.t. 0 ≤ x1 ≤ 1
     εx1 ≤ x2 ≤ 1 - εx1
     ………………..
     εxd-1 ≤ xd ≤ 1 - εxd-1

where ε is a small positive constant (this implies the non-negativity constraints).
The Klee-Minty class of examples requires 2^d - 1 iterations using simplex.

Interior Point Methods

For very large d, 2^d > d^k for any integer k. So simplex is potentially ‘exponential time’, whereas it can be shown the interior methods are only ‘polynomial time’. (More on this later, in chapter 5.) In fact, most interior methods have been found to converge satisfactorily after 40-80 iterations, although a single interior point iteration will take longer than a single simplex iteration.
Some research has considered a combination of algorithms, interior methods being used to get close to optimum, simplex methods for the last stage.
We note also that interior methods, unlike simplex, are not well suited to post-optimal analysis.
Interior Point Methods

The logic of most interior point techniques depends both on approaches we shall outline in NLP, and also on LP theory, in particular duality theory. Specifically:

a) Many OR algorithms – for example Steepest Descent, which we encounter in chapter 6 – use the following generic rationale:
1. Find an initial feasible point w.
2. Determine a search direction Δw.
3. Compute the distance α to move in the search direction.
4. Move to the next point w + αΔw.
5. Repeat 2, 3, 4 until some termination criterion is met.

Interior Point Methods

b) An algorithm such as that given in a) can be used to move through the interior of the feasible region, but the choice of Δw and α is crucial. For example, if α is too small, progress will be slow; if too large, there is the risk of getting ‘stuck’ near the boundary, or even traversing it. In the picture, if we start at #1, with the optimum at #5, we really don’t want to have #3 as an intermediate point. [Diagram omitted.]
Interior Point Methods

c) To avoid such problems (small step length, limited choice of direction), many algorithms transform, or scale, the feasible region to place the current iterate near its centre. This prevents a trial solution getting too close to a constraint boundary.
If a current solution is near the centre of the feasible domain (polyhedral set), on average we can make a decently long move.

Interior Point Methods

d) Another very useful approach is to use a ‘barrier function’ to keep iterates from approaching the boundary of the feasible region. This is related to penalty functions, which we encounter in chapter 6.
Suppose the problem is
max cTx s.t. Ax = b, x ≥ 0.
We can accommodate the non-negativity constraints by considering instead the problem
max cTx + μ∑log xj s.t. Ax = b
for a suitable constant μ. More generally, a barrier function will provide a heavy penalty for any points that stray too close to the boundary – at the cost of replacing the original LP with an NLP.
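As a minimal illustration of d) (the function name and data are mine, purely to show the shape of the idea):

import numpy as np

def barrier_objective(x, c, mu):
    # c^T x + mu * sum(log x_j); tends to -infinity as any x_j -> 0+,
    # so maximising it repels iterates from the boundary x_j = 0
    if np.any(x <= 0):
        return -np.inf
    return c @ x + mu * np.sum(np.log(x))

c = np.array([1.0, 2.0])
print(barrier_objective(np.array([0.5, 0.5]), c, mu=0.1))    # interior point
print(barrier_objective(np.array([1e-12, 1.0]), c, mu=0.1))  # near the boundary: penalised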
Interior Point Methods

e) We should also have some way to decide when the sequence of points is getting close to optimum. It may not be enough to use a criterion such as ||x(i) - x(i+1)|| < ε if we are moving close to a boundary.

A useful idea is to exploit the ‘duality gap’. We have seen that for a dual pair of LPs, cTx ≤ bTy with the usual notation. More precisely, cTx < bTy except at optimum, where cTx = bTy. Then proximity to optimum can be assessed by the size of the duality gap bTy - cTx.

Interior Point Methods

See https://www.princeton.edu/~rvdb/542/lectures/lec14.pdf or
http://ocw.nctu.edu.tw/course/lp992/Lecture6.pdf
for reasonably accessible, more detailed discussion.

See http://www.maths.ed.ac.uk/~gondzio/software/hopdm.html
or http://www.sztaki.hu/~meszaros/bpmpd/
for two software packages that have been used.

This discussion is extended somewhat in the Moodle miscellaneous section (non-examinable).
Linear and Non-Linear Optimization

Nicholas Cron

Chapter 3. Integer Programming

3.1 Introduction
3.2 Examples
3.3 Branch and Bound
3.1 Introduction

A linear integer program (IP) is an LP where some or all xj (j = 1, 2, …, n) are required to be integer. [Shall not consider non-linear integer programs.]
Sometimes all variables must be integer, other times just some. So can have:
- A pure integer problem (PIP)
- A mixed integer problem (MIP)
Such problems arise naturally when xj represents an indivisible unit (car, aircraft, factory, human being etc.)
Additionally, many problems occur when xj is binary. Usually we then assume xj ∈ {0, 1}. If all variables are binary, we have a binary integer problem (BIP).
Various combinations can occur for the range of each xj.

Introduction

At first glance, it may seem that IPs are easier to solve than LPs, because there are fewer potential solutions. In fact, they are harder, because there is no universal algorithm such as simplex, although there are universal approaches such as branch and bound, which may still take a very long time to attain optimum.
While LPs and IPs are superficially similar, there are major differences:
i) The IP solution may be a long way from the LP solution;
ii) Rounding the LP solution up or down may not work;
iii) Examining every possible IP solution may be impractical.
Introduction

To see i) and ii), consider the LP
max z = x + y
s.t. -2x + 2y ≤ 1
     16x - 14y ≤ 7
     x ≥ 0, y ≥ 0
and the corresponding IP with x, y integer.
The LP solution is (x, y)T = (7, 7.5), z = 14.5.
The rounded LP solutions are (x, y)T = (7, 7), z = 14 and (x, y)T = (7, 8), z = 15, but unfortunately both of these are infeasible.
The optimum IP solution is (x, y)T = (3, 3), z = 6.
In fact there are four feasible points for the IP: (0,0), (1,1), (2,2) and (3,3).

Introduction

The difficulties may be seen from a sketch. Notice that solutions can only occur at lattice points. [Diagram omitted.]
Introduction

To illustrate fact iii) – that exhaustive enumeration is usually impractical – consider a fairly modest IP where 20 integer variables can each take 20 possible values. There are 10^26 solutions to verify. Even if still within range of some machines, a small increase in size will put the method out of reach. A BIP with 87 variables requires a similar number of enumerations.
In practice, these are quite small scale problems.
Clearly, fresh methods are needed.

3.2 Examples

Have already seen some examples, e.g. the nurse rostering problem might be solved as a PIP, the knapsack problem as a BIP. Three further examples:

Facility Location (MIP)
Suppose that facilities for distributing a product to n customers can be placed at m possible locations.
If location i is used as a distribution point, there is a fixed set-up cost Fi, and cij is the cost of transporting one unit of production from location i to customer j, including loading/unloading costs at i and j (i = 1, …, m, j = 1, …, n).
Customer j demands dj units of the product.
At which of the m possible locations should facilities be placed, and how much should be shipped to each customer to meet demand and minimise total costs?
Examples

Let yi = 1 if a facility is established at location i, and 0 otherwise.
Let xij be the amount shipped from location i to customer j.
Notice that if yi = 0, nothing can be shipped from i. So xij = 0 for those i with yi = 0.
If yi = 1, total demand from all customers should surely not be exceeded by the amount shipped from i, so
Σj xij ≤ Σj dj for those i with yi = 1.
Combining the two constraints,
Σj xij ≤ yi Σj dj (1 ≤ i ≤ m)
where xij ≥ 0 for all i, j, and yi = 0 or 1.

Examples

We also need to ensure that customer demand is met, so Σi xij ≥ dj for each j. This gives the MIP:

min Σi Σj cij xij + Σi Fi yi
s.t. Σi xij ≥ dj (j = 1, …, n)
     Σj xij ≤ yi Σj dj (i = 1, …, m)
     xij ≥ 0 (i = 1, …, m, j = 1, …, n)
     yi ∈ {0, 1} (i = 1, …, m)

(Various modifications possible.)
Examples

Assignment Problem (BIP)
Suppose we wish to assign n people to n jobs, each person to do exactly one job, each job performed by exactly one person. Let cij denote the suitability of person i for job j.
Let xij = 1 if person i is assigned to job j, 0 otherwise.
We can maximise the total suitability of an assignment by solving an IP:

Examples

max z = Σi Σj cij xij
s.t. Σj xij = 1 for i = 1, …, n
and  Σi xij = 1 for j = 1, …, n
where xij = 0 or 1 for all i and j.

Again there are various modifications. The assignment problem is a classic problem of OR. While it can be written and solved as an IP, specific and more powerful methods have been developed.
Examples

Project Planning (BIP)
Suppose n projects are being evaluated over an m year horizon. We know the projected expenditure for each project over each year, viz. aij is the projected expenditure for project i in year j. We also know the expected return Ri for project i, and the allowable total expenditure Ej in year j. Which projects should be selected for execution over the m year period so as to maximise the return?
We use the binary variable xi, which equals 1 if project i is selected and 0 if project i is not selected.

Examples

The BIP is
max Σi Ri xi
s.t. Σi aij xi ≤ Ej (j = 1, …, m)
and  xi ∈ {0, 1} (i = 1, …, n).
Examples

Suppose we have the specific problem:

Expenditure each year in ₤m
Project       Year 1  Year 2  Year 3  Return (₤m)
1               5       1       8        20
2               4       7      10        40
3               3       9       2        20
4               7       4       1        15
5               8       6      10        30
Funding (₤m)   25      25      25

The problem is
max 20x1 + 40x2 + 20x3 + 15x4 + 30x5
s.t. 5x1 + 4x2 + 3x3 + 7x4 + 8x5 ≤ 25
     x1 + 7x2 + 9x3 + 4x4 + 6x5 ≤ 25
     8x1 + 10x2 + 2x3 + x4 + 10x5 ≤ 25
     x1, x2, x3, x4, x5 ∈ {0, 1}

Examples

For this small problem, it is easy to find the solution by inspection:
x1 = x2 = x3 = x4 = 1, x5 = 0.
Select projects 1-4, not project 5. Return ₤95m.
Notice that solving as an LP, relaxing the 0/1 condition, gives the meaningless solution
x1 = 0.5789, x2 = x3 = x4 = 1, x5 = 0.7368.
The rounded solution x1 = x2 = x3 = x4 = x5 = 1 is infeasible.
We can include the constraint xj = xj² but then the problem is non-linear.
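For comparison, this specific BIP can also be solved by computer. A sketch using SciPy's MILP interface (assumes SciPy ≥ 1.9; milp minimises, so the returns are negated):

import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

returns = np.array([20, 40, 20, 15, 30])
A = np.array([[5,  4, 3, 7,  8],
              [1,  7, 9, 4,  6],
              [8, 10, 2, 1, 10]])
budget = np.array([25, 25, 25])

res = milp(c=-returns,
           constraints=LinearConstraint(A, -np.inf, budget),
           integrality=np.ones(5),    # all variables integer...
           bounds=Bounds(0, 1))       # ...and restricted to {0, 1}
print(res.x, -res.fun)                # [1. 1. 1. 1. 0.]  95.0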
Examples

We can often incorporate logical constraints in a BIP. Thus:
• If both projects A and B should not be included together, then xA + xB ≤ 1
• If at least one of A and B should be included, then xA + xB ≥ 1
• If exactly one of A and B should be included, then xA + xB = 1
• If two of A, B, C, D should be included, then xA + xB + xC + xD = 2
• If C must be included if either A or B is included, then xC ≥ xA and xC ≥ xB
• If C must be included if both A and B are included, then xC ≥ xA + xB - 1

3.3 Branch and Bound

To solve an IP, two naïve approaches can be tried:
a) Solve the associated LP (the ‘LP relaxation’) and round off. Have seen this may not work.
b) Examine every integer lattice point. Have seen this may be impractical.
The branch and bound method incorporates useful features of both approaches. We successively eliminate those parts of the feasible region that cannot contain the optimum, using a tree structure.
Branch and Bound

Consider the PIP:
max -4x1 + 6x2 = z
s.t. -x1 + x2 ≤ 1
     x1 + 3x2 ≤ 9
     3x1 + x2 ≤ 15
     x1, x2 ≥ 0 and integer.

Let F denote the feasible region for the LP. The LP relaxation can be written max z, x ∈ F.
Solution (simplex or graph): x1 = 1.5, x2 = 2.5, z = 9.
This is useless for the IP.

Branch and Bound

But a solution for the IP must satisfy either x ∈ F, x1 ≤ 1 or x ∈ F, x1 ≥ 2. Construct two new LPs with these constraints.
1. max z s.t. x ∈ F, x1 ≤ 1 → x1 = 1, x2 = 2, z = 8.
2. max z s.t. x ∈ F, x1 ≥ 2 → x1 = 2, x2 = 7/3, z = 6.
The process of constructing new problems is called branching. Here, we branched on x1 to produce two subproblems from the master problem.
The solution to 1 is integer feasible, and 2 cannot lead to a higher objective value, so the solution to 1 is optimal.
[Note: adding further constraints cannot increase the value in a maximisation problem.]

Branch and Bound Branch and Bound


Notice that we could also have branched on x2:
1. max z s.t. xεF, x2≤2 → x1 = 1, x2 = 2, z = 8. A formal statement of the procedure follows.
2. max z s.t. xεF, x2≥3 → infeasibility. We produce a tree structure, where each node of
The same conclusion follows, albeit by another route. the tree represents the original problem or some
Branch and bound is an extension of this approach. Divide a subproblem of it. Assume the IP is a maximisation
large problem into two or more subproblems and show that
problem. We successively add further nodes to the
certain of them cannot contain the optimum. Ruling a subset
tree (branching) and rule out parts of the tree
out of further consideration is called fathoming.
(fathoming) because of infeasibility, or because the
The method generates a sequence of subproblems differing
only in the bounds placed on variables. Successively optimum cannot occur there (bounding).
excluding certain solutions, we eventually obtain the optimum.

21 22
Branch and Bound

STEP 0 - Initialisation:
Solve the LP relaxation of the IP. If the solution has integer values, stop. If not, let L be the objective value at any feasible integer solution. If several such are known, let L be the greatest of these. If none are known, let L be a large negative number.

STEP 1 - Branching:
Select a remaining subset of feasible solutions (on the first run, select F, the set of all feasible solutions). Choose a non-integer component of the solution to the subproblem, and divide the subset into two by adding constraints to exclude the non-integer value.

STEP 2 - Bounding:
For each subset formed, obtain an upper bound on the objective value, say z.

Branch and Bound

STEP 3 - Fathoming:
For each subset that may contain the optimum, exclude it from consideration if
a) z < L;
b) The subset has no feasible points;
c) z is attained at an integer feasible point and z > L.
In case c), call the integer feasible point the incumbent solution, let L = z, and repeat step 3 for further unfathomed subsets for which z is known.

STEP 4 - Testing:
If no subsets are unfathomed, stop; the incumbent solution is optimal. If not, return to step 1.
Branch and Bound

Apply the algorithm to the IP:
max -3x1 + 7x2 + 12x3 = z
s.t. -3x1 + 6x2 + 8x3 ≤ 12
     6x1 - 3x2 + 7x3 ≤ 8
     -6x1 + 3x2 + 3x3 ≤ 5
     x1, x2, x3 ≥ 0 and integer.

F is the set of all non-negative vectors in R3 satisfying all the constraints.
Start by solving the LP relaxation. This yields xT = (0, 0.3, 1.3), z = 17.4. Also set L = 0, corresponding to xT = (0, 0, 0). We do not yet have an integer solution; branch on x2.

Branch and Bound

This divides F into F ∩ {x | x2 = 0} and F ∩ {x | x2 ≥ 1} (could also have branched on x3). We have two subproblems:
max z, x ∈ F, x2 = 0, solution xT = (0, 0, 1.1), z = 13.7.
max z, x ∈ F, x2 ≥ 1, solution xT = (0.7, 1, 1), z = 17.
Neither can be fathomed by a), b) or c), so need to branch further. Useful to use a branching diagram, with subproblems as nodes. [Diagram omitted.]
Use the first problem for illustration and branch on x3.
The subproblem with x3 ≥ 2 is infeasible, so fathomed.
The subproblem with x3 ≤ 1 has integer solution (0, 0, 1) with value 12 > 0, so (0, 0, 1) is the new incumbent, with L = 12.
Proceed similarly by branching for the second problem until the optimum is attained.
Branch and Bound

So the IP solution is x1 = 2, x2 = 3, x3 = 0, z = 15.

Branch and Bound

Various general comments on branch and bound:
1. A good deal of choice exists in the method – which variable to branch on, which order to use in following branches? The decision will have a major impact on computer run time.
2. Can use a breadth first (examine all nodes at the highest level before those further down) or depth first strategy (extend the diagram as far down as possible before examining nodes to the left or right). No hard and fast rules.
3. Logic is fairly simple. The main consumer of computer time is solving many associated LPs.
Branch and Bound

4. In practice, can sometimes stop at a sub-optimal solution. If the incumbent has L = 25 and the smallest z at unfathomed nodes is z = 26.6, it may be acceptable to use the current incumbent, although not the strict optimum.
5. Useful to spend some time in initialisation, to obtain a good initial incumbent and speed the algorithm.
6. Can sometimes exploit special structure. Thus for 0-1 problems, it is often quicker to maintain integrality and branch on constraints, rather than vice versa in the main algorithm.
7. Logic does not depend on LP/IP structure, so can apply the method for other problems, such as integer NLPs.
8. Other IP techniques can be used, notably Gomory’s cutting plane method (not considered).
9. Solution time clearly depends on the numbers of variables and constraints, but can grow exponentially with the size of the problem.

Branch and Bound

10. In the 1970s, IBM research staff recommended the following strategies in practical IP solving to reduce computation time. They are still valid.
a) Keep the number of integer variables as small as possible. Consider treating integer variables with more than about 20 values as continuous, if possible.
b) Provide tight lower and upper bounds on integer variables, when possible.
c) Use extra constraints where this can be done. Unlike LP, new constraints in a MIP will generally reduce computation time.
d) The order in which integer variables are chosen for branching is critical. The advice is that they be processed in priority order, based on economic significance and user experience.
Branch and Bound

e) If sub-optimal solutions are acceptable, accept the first integer solution that is within 3% of the continuous optimum. That is, terminate the branch and bound procedure with IP value S, LP value T, whenever |T - S|/T < 0.03.
[Obviously the figure of 3% can be adjusted depending on context.]
11. Always be aware of context and use common sense. It is true that rounding an LP solution may not give a valid IP solution mathematically, but it may yield a practical solution. If, say, a project has a budget of £1m, a rounded solution with expenditure £1.002m is likely to be acceptable.

Linear and Non-Linear Optimization

Nicholas Cron
Chapter 4. Graphs and Networks

4.1 Introduction and Definitions
4.2 Shortest Path Algorithm
4.3 Maximum Flow Algorithm
4.4 Extensions
4.5 The Minimal Connector Problem
4.6 More about Graphs

4.1 Introduction and Definitions

Graphs and networks can be used to model the flow of transport (road, rail etc.), calls in a telephone exchange, oil flow through pipelines, current in electrical circuits and much more besides.
A graph can be defined as a pair G = (N, A) consisting of a collection N of nodes (sometimes called vertices) and a collection A of unordered pairs of elements of N called arcs (or edges).
For example, N = {1,2,3,4,5}, |N| = 5,
A = {(1,2),(2,3),(1,5),(3,5),(3,4),(3,4),(4,5),(5,5)}, |A| = 8.
Assume N and A are finite.
Introduction and Definitions

Can be represented pictorially. [Diagram omitted.] a5 and a6 are often called multiple arcs (multiple edges); a8 is a loop.

Introduction and Definitions

A path in G is a set of nodes {i1, i2, …, in} and a set of distinct arcs {a1, a2, …, an-1} such that ak = (ik, ik+1) for k = 1, 2, …, n-1. We refer to a path from i1 to in, e.g. a path from nodes 1 to 4 consists of nodes {1,2,3,4} and arcs {a1, a2, a5} in the above graph.
We could write this as 1, a1, 2, a2, 3, a5, 4.
A graph is connected if there exists a path between any two vertices. Otherwise, it is disconnected.
Introduction and Definitions

A digraph (directed graph) is a pair G = (N, A) consisting of a collection N of nodes and a collection A of ordered pairs of elements of N. For the above example, if pairs are regarded as ordered, the representation carries a direction on each arc. [Diagram omitted.]

Introduction and Definitions

A digraph is weakly connected if the underlying graph is connected.
It is strongly connected if there exists a path between any two vertices.
[If the digraph shows one-way streets in a town, it is strongly connected if we can travel from one point to any other going the correct way along streets. It is weakly connected if we are permitted to travel the wrong way along streets.]
Clearly, every strongly connected digraph is weakly connected, but not necessarily conversely.
Introduction and Definitions

We define a network as a digraph containing no loops or multiple arcs, where each arc has a capacity, one or more identified nodes are sources and one or more identified nodes are sinks. Assume all networks are weakly connected.
(There is not always complete consistency about definitions between textbooks.)
Introduction and Definitions

Very many practical problems can be considered using graphs and networks.

Example 1
What is the critical path in an activity network, identifying capacities as times to complete tasks in a complex sequence of activities? Efficient solution methods exist.

Example 2 - The Transportation Problem
There are m producers and n consumers; ai (i = 1,2,…,m) are available supplies from the producers, bj (j = 1,2,…,n) are demands from the consumers, and cij is the cost of shipping one unit from producer i to consumer j. Supply and demand requirements must be met.

Introduction and Definitions

How should we distribute goods from producers to consumers so as to minimise transportation costs?
The problem can be represented as a network, with m nodes for producers, n nodes for consumers, and an arc joining each producer node to each consumer node with capacity cij. The amount shipped from producer i to consumer j is xij.

Notice that the Transportation Problem can also be written as an LP:
min Σi Σj cij xij
s.t. Σj xij ≤ ai (i = 1,2,…,m) (supply constraints)
     Σi xij ≥ bj (j = 1,2,…,n) (demand constraints)
     xij ≥ 0 for all i, j.
Introduction and Definitions

Indeed, many network problems can be expressed as LPs, but it is usually more efficient to use algorithms that exploit the network structure. We do not consider the above problems, but rather the following three:
1. What is the shortest path from source to sink, identifying capacities (usually) as distances?
2. Finding the maximum flow in a (directed) network – for example, the flow of water through a system of aqueducts.
3. Finding the minimum spanning tree in an (undirected) graph. Given a graph with arc capacities, we seek the subgraph of minimum total capacity such that the subgraph is connected.

4.2 Shortest Path Algorithm

Suppose we have a network with a designated source A and sink Z. We give a method for finding the shortest path from A to Z. Typically, weights on arcs represent distances: we require the shortest distance from A to Z through a number of intermediate nodes, when the distance between adjacent pairs of nodes is known.
We can think of travelling from town A to town Z via other intermediate towns, to minimise the distance travelled.
In other problems, weights are times: we may require the shortest time to travel from A to Z. Or weights may be costs of certain activities. In each case the minimum total distance/time/cost can often be determined from the algorithm or an extension.
A similar algorithm exists for undirected graphs.
Shortest Path Algorithm

Describe what is often called a labelling algorithm. At any stage, nodes are either unlabelled, have a temporary label or have a permanent label. At each iteration, a fresh node receives a permanent label. The algorithm terminates when sink Z has a permanent label.

Step 0: Assign source node A the permanent label 0.

Step 1: Let X be the last node assigned a permanent label. For each arc of the form (X,Y), calculate p(X) + w(X,Y), where p(X) is the permanent label on X and w(X,Y) the weight on (X,Y). If Y is unlabelled, or if p(X) + w(X,Y) < T(Y), where T(Y) is the temporary label on Y, give Y temporary label p(X) + w(X,Y). Otherwise leave T(Y) unchanged.

Step 2: If Z has a permanent label, go to step 3. Otherwise, select the node with minimum temporary label (any, if there is a choice) and make this label permanent. Return to step 1.

Step 3: The permanent label on Z is the length of the shortest path from A to Z. Construct this shortest path thus: start from Z and find a node Y such that:
a) There is an arc from Y to Z;
b) p(Y) < p(Z);
c) p(Y) + w(Y,Z) = p(Z).
Repeat for node Y instead of node Z and continue until source A is reached.

Sometimes there is a choice of nodes in step 3, corresponding to non-uniqueness of the shortest path.
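The labelling algorithm is short to implement. A sketch (the graph encoding and names are my own), using a heap for the temporary labels:

import heapq

def shortest_path(graph, source, sink):
    # graph: dict mapping node -> list of (neighbour, weight) pairs
    perm = {}                        # permanent labels p(.)
    temp = {source: 0}               # temporary labels T(.)
    pred = {source: None}            # predecessors, for step 3
    heap = [(0, source)]
    while heap:
        d, x = heapq.heappop(heap)
        if x in perm:
            continue                 # stale temporary label: skip
        perm[x] = d                  # step 2: make the minimum label permanent
        if x == sink:
            break
        for y, w in graph.get(x, []):        # step 1: update temporary labels
            if y not in perm and d + w < temp.get(y, float("inf")):
                temp[y] = d + w
                pred[y] = x
                heapq.heappush(heap, (d + w, y))
    path, node = [], sink            # step 3: trace the path back from Z
    while node is not None:
        path.append(node)
        node = pred[node]
    return perm[sink], path[::-1]

g = {"A": [("B", 1), ("C", 10)], "B": [("C", 3)], "C": [("Z", 2)]}
print(shortest_path(g, "A", "Z"))    # (6, ['A', 'B', 'C', 'Z'])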
Shortest Path Algorithm

We find the shortest path in the following network. [Network diagram omitted.]

Iteration  Permanent Labels  Temporary Labels
1          P(A)=0            -
2          P(B)=1            T(B)=1, T(C)=10, T(D)=6, T(E)=3
3          P(E)=3            T(C)=10, T(D)=6, T(E)=3, T(F)=11
4          P(D)=5            T(C)=10, T(D)=5, T(F)=11, T(H)=9, T(J)=11
5          P(H)=8            T(C)=9, T(F)=11, T(J)=11, T(H)=8
6          P(C)=9            T(C)=9, T(F)=11, T(J)=11, T(Z)=16, T(G)=13
7          P(F)=10           T(F)=10, T(G)=13, T(J)=11, T(Z)=16
8          P(J)=11           T(G)=12, T(J)=11, T(Z)=15
9          P(G)=12           T(G)=12, T(Z)=15
10         P(Z)=14           T(Z)=14
Shortest Path Algorithm

Using step 3 of the algorithm, can now construct the required path of length 14, viz.
Z→G→F→C→D→E→A or Z→G→F→C→H→D→E→A
and the paths from A to Z by reversing the direction.
In a manual implementation, can sketch the network, placing temporary labels as numbers at nodes and permanent labels as circled numbers, replacing temporary labels as the algorithm proceeds. Notice that the path found by the algorithm may not be unique.
This algorithm, and some extensions (for graphs rather than networks, for finding the shortest path from the source to every other node), is often referred to as Dijkstra’s algorithm.

4.3 Maximum Flow Algorithm

We seek the maximum flow from a single source v to a single sink w in a network. The method can be extended for multiple sources and sinks, and for undirected graphs.
A flow is an assignment of numbers (flows) to arcs such that the amount flowing into any node (apart from source and sink) equals the amount leaving. Capacities must not be exceeded.
An arc with flow equal to its capacity is saturated. Otherwise, it is unsaturated.
One (trivial) flow is the zero flow, with no flow in each arc.
Maximum Flow Algorithm

Can represent as an LP, where fij is the flow in arc (i,j). [Network diagram omitted.]

max fvx + fvy + fvz
s.t. fvx + fyx + fzx = fxz + fxw
     fvy = fyx + fyz
     fvz + fyz + fxz = fzx + fzw
     0 ≤ fvx ≤ 4, 0 ≤ fvy ≤ 3, 0 ≤ fvz ≤ 1,
     0 ≤ fyx ≤ 4, 0 ≤ fyz ≤ 2, 0 ≤ fxz ≤ 1,
     0 ≤ fzx ≤ 2, 0 ≤ fxw ≤ 2, 0 ≤ fzw ≤ 4.
[Objective function could also be fxw + fzw.]

Maximum Flow Algorithm

There is an algorithm simpler than solving an LP. First, two important ideas. A cut is a set of arcs such that every path from source to sink contains at least one arc in the cut.
Two possible cuts are {(x,w),(z,w)} and {(v,x),(y,x),(y,z),(v,z)}.
Maximum Flow Algorithm

Max-flow Min-cut Theorem (Ford and Fulkerson, 1955; proof omitted)
In any network, the value of any maximum flow equals the minimum capacity of any cut.

For the network above, it is easy to find a flow of value 6, so max flow ≥ 6.
{(x,w),(z,w)} is a cut of capacity 6, so min cut ≤ 6.
By the theorem, max flow = min cut, each of value 6.

Maximum Flow Algorithm

Of course, max flows and min cuts may not be unique. For larger networks, both are hard to find by inspection, so we need an algorithm.
For a given network, let
I be the set of arcs in which flow may be increased,
R be the set of arcs in which flow may be decreased.

STEP 1: Find a feasible flow. [The zero flow will do, but a positive flow will speed termination of the algorithm.]
Maximum Flow Algorithm

STEP 2: Using the following procedure, try to find a chain of labelled arcs and nodes that can be used to label the sink.
Label the source. Then label nodes and arcs according to:
a) If node x is labelled, node y is unlabelled and arc (x,y) ∈ I, then label y and arc (x,y). Arc (x,y) is a forward arc.
b) If node y is labelled, node x is unlabelled and arc (x,y) ∈ R, then label x and arc (x,y). Arc (x,y) is a backward arc.
If the sink cannot be labelled, the current flow is maximum; stop. If the sink is labelled, proceed to step 3.

Maximum Flow Algorithm

STEP 3: If the chain used to label the sink consists entirely of forward arcs, increase the flow from source to sink in the chain by as much as possible.
If the chain used to label the sink consists of both forward and backward arcs, increase the flow in each forward arc as much as possible, decreasing the flow through each backward arc. This will also increase the flow from source to sink.
Return to step 2.

STEP 4 (optional): Verify the current flow is maximal by finding the capacity of a suitable cut.
Maximum Flow Algorithm

Example [diagrams omitted; capacities in blue, flow in red, each arc labelled I or R or both]
Start with the zero flow. All arcs are in I, none in R.
Label successively v, a, (v,a), b, (a,b), w, (b,w).
There is a chain of forward arcs (v,a),(a,b),(b,w) from source to sink. Can increase the flow in the chain by min(2,3,2) = 2. The resulting flow is shown.

Maximum Flow Algorithm

Label arc (v,b), node b; (v,b) is a forward arc.
Label arc (a,b), node a; (a,b) is a backward arc.
Label arc (a,c), node c; (a,c) is a forward arc.
Label arc (c,w), node w; (c,w) is a forward arc.
Have labelled the sink via the chain (v,b),(a,b),(a,c),(c,w).
All arcs other than (a,b) are forward arcs. Can increase the flow on the forward arcs by 1 (the largest possible), decreasing the flow in the backward arc by 1. Have increased the flow from source to sink by 1.
Maximum Flow Algorithm

We achieved the improvement by diverting 1 unit that was transported through (a,b) to the path a→c→w. Could then transport an extra unit from source to sink via v→b→w. The concept of a backward arc was needed to find this improvement.
The new flow is optimal, of value 3. Can check that the sink cannot be labelled; or because {(c,w),(b,w)} is a cut of capacity 3.

Maximum Flow Algorithm

Further Example [diagram omitted]
Initial flow v→x→z→w of value 2.
Maximum Flow Algorithm

v→s→t→w is a chain of forward arcs; increase the flow by 2 along this chain.

Maximum Flow Algorithm

Now find the chain (v,u) (forward), (u,z) (forward), (x,z) (backward), (x,t) (forward), (t,w) (forward). Augment the flow by 1 in these forward arcs, and decrease it by 1 in (x,z).
Maximum Flow Algorithm

This flow is optimal, of value 5, since the sink cannot be labelled. Or apply step 4: {(v,s),(v,x),(u,z)} is a cut of capacity 5.

4.4 Extensions

Various ways to go beyond the basic algorithm.
1. With several sources and/or sinks, create a ‘supersource’ and/or ‘supersink’. Capacities of arcs adjacent to the supersource and supersink are large and positive. [Diagrams omitted.]
Extensions

2. With flows permitted both ways along an arc, it is simplest to add an arc. [Diagram omitted.]
3. With node capacities, add an extra arc. [Diagram omitted.]

Extensions

A widely used extension applies when there are costs on arcs. We require the minimum cost flow of given value from source to sink. It is a hybrid of a transportation problem and a network flow problem. Solution methods are known.
Further extensions exist for networks with gains or losses (for example, due to heating in electrical circuits, taxation for money flows) and where there are lower capacities on arcs (e.g. where there are contractual obligations to use certain routes).
4.5 The Minimal Connector Problem

Example 1: A new underground system is to be built in a city. A number of stations are proposed servicing the city centre and suburban locations. What is the shortest length of track that is needed so that one can travel from any station to any other, not necessarily directly?
Example 2: Design a central heating system in a large house so that every room is in the system, yet the total length of piping is minimal.
Example 3: Design a telecommunications system (e.g. fibre optic network) so that an efficient path exists between every pair of vertices. Choosing the connections optimally could lead to significant cost savings.

The Minimal Connector Problem

These are all problems on graphs (not digraphs or networks) which share the same essential features. First, a few more definitions.
A graph is simple if it has no loops or multiple edges. A cycle in a simple graph is a path {i1, i2, …, in} with i1 = in and n > 1. A connected graph with no cycles is a tree.
The following equivalent properties hold for a tree on n vertices, and may be used as defining properties:
• There is exactly one path between any two vertices;
• The graph is connected with n-1 edges;
• The graph contains no cycles, but the addition of any new edge creates exactly one cycle.
The Minimal Connector Problem

Examples of trees. [Diagrams omitted.]
A subgraph of a graph G is a graph in which the vertices and edges are vertices and edges of G.
A subgraph of G is spanning if it contains all the vertices of G.
A spanning subtree of G is a spanning subgraph which is also a tree.

The Minimal Connector Problem

A weighted graph is a graph in which each edge has a numerical value. [Similar to capacity in a network.]
If G is a connected weighted graph, a minimum spanning tree (or minimal connector) of G is a spanning tree, the sum of the weights on the edges of which is as small as possible.
The problems outlined at the start of this section all require us to find a minimal connector, and it is that problem we address.
The Minimal Connector Problem
Two algorithms are given. We can assume the graph is simple, because we would never include a loop in a minimal connector, and a set of multiple edges can always be replaced by a single edge with weight equal to the minimum of the parallel multiple edges.

Kruskal's Algorithm
Let G be a connected simple weighted graph. Construct a spanning subtree as follows: select edges of G one at a time, always choosing one of minimum possible length of those remaining, provided only that it does not create a cycle. If there is a choice of edges, select any of them. The tree obtained is a minimum spanning tree of G. It is not necessarily unique.
The Minimal Connector Problem
[Figure: worked example of Kruskal's Algorithm]
Notice we cannot choose CE or AC for the fourth edge, or a cycle would appear. The final graph is a minimum connector with total weight 7+2+5+4=18.

The Minimal Connector Problem
Prim's Greedy Algorithm
Let G be a connected simple weighted graph on n vertices. Form a spanning tree T as follows:
1. Place any vertex of G in T.
2. Add an edge to T of minimum weight joining a vertex already in T to a vertex not in T. If more than one edge is a candidate, choose arbitrarily.
3. Repeat 2 until a spanning tree is obtained (i.e. until n-1 edges have been included).
T will then be a minimum connector for G.
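For comparison, a minimal R sketch of Prim's Greedy Algorithm, using a symmetric weight matrix with Inf marking absent edges (again an illustrative representation, not from the notes):

# Minimal Prim sketch: W is a symmetric weight matrix, Inf = no edge.
prim=function(W)
{n=nrow(W)
inT=c(TRUE,rep(FALSE,n-1))         # step 1: start from vertex 1
tree=NULL
while(sum(inT)<n)                  # step 3: repeat until n-1 edges chosen
{M=W
M[!inT,]=Inf                       # step 2: only edges leaving the tree...
M[,inT]=Inf                        # ...and entering a non-tree vertex
idx=which(M==min(M),arr.ind=TRUE)[1,]
tree=rbind(tree,c(idx[1],idx[2],W[idx[1],idx[2]]))
inT[idx[2]]=TRUE}
tree}                              # rows: (tree vertex, new vertex, weight)

W=matrix(Inf,4,4)
W[1,2]=W[2,1]=1; W[1,3]=W[3,1]=4; W[2,3]=W[3,2]=2; W[2,4]=W[4,2]=5; W[3,4]=W[4,3]=3
prim(W)    # same total weight as the Kruskal sketch above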
The Minimal Connector Problem
[Figure: worked example of Prim's Algorithm]
The final graph is a minimal spanning tree of total weight 4+5+2+7=18. The minimal spanning tree is not unique. We discuss the choice of algorithm in the next chapter.

4.6 More about Graphs
This short section contains general information only. It is intended to say a bit more, in non-mathematical terms, about the type of problems considered in graph theory and related areas, to mention a few more applications and to list some of the areas where graph theorists work. The section is not claimed to be comprehensive. Nor is it examinable.
More about Graphs
Topics in the theory of graphs:
1. In graph theory, an (undirected) graph is a planar graph if it can be drawn on the plane in such a way that its edges intersect only at their endpoints.
Here is a planar graph, known as K4:
[Figure: K4]
This one looks non-planar but it is actually planar. It is just a disguised version of (we would say isomorphic to) K4:
[Figure: a planar drawing of K4]
This graph, sometimes called the butterfly graph, is also planar:
[Figure: the butterfly graph]

More about Graphs
These two graphs, K5 and K3,3 respectively, are non-planar (try it!)
[Figure: K5 and K3,3]
The Scope of Graph Theory
Can we characterize planar graphs? Yes, through Kuratowski's Theorem:
'Let G be a graph. Then G is non-planar if and only if G contains a subgraph homeomorphic to either K5 or K3,3.'
One part of this – the non-planarity of K5 and K3,3 – is not difficult. The other part is a little trickier but is quite accessible to a mathematics undergraduate. (We also omit a formal definition of homeomorphic.)
We can use Kuratowski's Theorem to show that the Petersen graph below is non-planar.
[Figure: the Petersen graph]

More about Graphs
2. The Four Colour Theorem. Here is a map of (I think 25, but it doesn't really matter) theoretical countries. Two countries are called adjacent if they share a boundary edge. Adjacent countries must be coloured differently on a map - if they just meet at a vertex, they can be assigned the same colour.
More about Graphs
[Figure: map of theoretical countries]
It is not hard to see that three colours are insufficient to colour the map – we need four. The Four Colour Theorem states that any map in a plane can be coloured using at most four colours in such a way that regions sharing a common boundary edge do not share the same colour.
The problem was first investigated about 1850. Many attempts and fallacious proofs appeared in subsequent years, and it was only in 1976 that a full satisfactory proof was given by Appel and Haken. Their proof employs a very intricate computer analysis and includes an exhaustive search of the various possible configurations.

More about Graphs
3. Enumeration problems. As the word suggests, these are problems that can be expressed by asking 'how many…?'
For example, how many connected simple graphs are there with n vertices? When n = 4, the answer is 6, as shown. In this case, it is easy to check by exhaustion.
[Figure: the six connected simple graphs on 4 vertices]
The Scope of Graph Theory
As another example, we can ask how many non-isomorphic trees on n vertices there are. When n = 6, there are in fact six such trees, as shown:
[Figure: the six non-isomorphic trees on 6 vertices]
Solving general enumeration problems in terms of n is, in most cases, highly complex. It is unusual for a closed form of solution to be known. Sometimes, a recurrence can be used to solve the problem for any given integer n.

More about Graphs
4. Eulerian graphs. A graph is called Eulerian if it has a cycle that passes through every vertex at least once and traverses every edge exactly once. (This implies that we start and end at the same vertex.)
These graphs are named in honour of the famous mathematician Leonhard Euler, who first discovered their properties in the 18th century. At that time, the city of Konigsberg was divided into sections by the Pregel river. Various bridges connected the regions formed, and we are told that residents spent their Sunday afternoons walking around trying to cross each bridge exactly once and return to where they started.
Euler found a simple characterisation, based on the idea of the degree of a vertex v: this is the number of edges that are incident with v.
More about Graphs
Euler's Theorem states that an undirected graph G is Eulerian iff it is connected and every vertex has even degree.
We could also show, for example, that this graph is Eulerian:
[Figure: an Eulerian graph]
When the criterion is met, algorithms exist which construct an Eulerian cycle explicitly.
(The graph of Konigsberg is not Eulerian.)

More about Graphs
A superficially similar problem is to seek a Hamiltonian cycle, which visits each vertex exactly once. However, there is no useful connection between the two ideas, and no necessary and sufficient conditions for a graph to be Hamiltonian are known (although conditions that are EITHER necessary OR sufficient are known!).
A well known problem associated with Eulerian cycles is the Chinese Postman Problem (after the Chinese mathematician, Kwan Mei-Ko, who considered it in the 1960's). We seek to travel along every road in a city in order to deliver letters, with the least possible distance. The problem is how to find a shortest cycle in the graph in which each edge is traversed at least once, rather than exactly once.
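Euler's Theorem makes the check purely mechanical: count degrees. A small R sketch (the adjacency-matrix representation is an illustrative assumption; connectedness is assumed rather than tested):

# Eulerian test for an undirected graph given by an adjacency matrix A.
# The degree of vertex i is the sum of row i; check all degrees are even.
is_eulerian=function(A){all(rowSums(A)%%2==0)}

# A 4-cycle: every vertex has degree 2, so the graph is Eulerian.
A=matrix(c(0,1,0,1, 1,0,1,0, 0,1,0,1, 1,0,1,0),4,4,byrow=TRUE)
is_eulerian(A)    # TRUE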
More about Graphs
A well known problem associated with Hamiltonian cycles is the Travelling Salesman Problem. Given a list of cities and the distances between each pair of cities, we seek the shortest possible route that visits each city exactly once and returns to the starting city. This can be expressed by looking for a Hamiltonian cycle of shortest length in a graph.
Again there is no simple answer, although many thousands of hours have been spent on the problem. It is arguably one of the most famous optimization problems. It is also important for being an NP hard problem.

More about Graphs
Graph theory has many practical applications, such as:
• Sociology. Vertices represent people, edges some measure of association between them, such as familiarity, friendship or respect. By examining the graph, we can obtain information on areas such as leadership, influence or the formation of friendship groups.
• Chemistry. We can study the isomers of organic chemicals. It is not hard to show there are only two isomers of butane (C4H10) (look at the linkages between the carbon atoms, regarded as the vertices of a graph):
[Figure: the two isomers of butane]
What about pentane, hexane …?
More about Graphs
• Communications, especially military communications. Vertices represent communications centres, or towns, or groups of soldiers in the field. To what extent can contact be retained following enemy action? Intuitively, we should strive for vertices with degrees as small as can be achieved. The concept of the vulnerability of a graph is useful here.
• Electrical circuits. When a circuit contains many wires, it is important that they only cross at specified points where they can be insulated or otherwise kept separate. In graph theoretical terms, this can be considered as minimising the number of times edges intersect at points other than vertices. This can be measured by quantities such as the thickness of a graph.

More about Graphs
Graph theory has developed hugely in the last 50 years or so. The subject now has many adherents, both specialist mathematicians and non-mathematicians, and intersects with many other disciplines. Those working in the area often choose to specialize further:
• Algorithmic graph theory, looking at processes to solve certain problems such as the Shortest Path Problem or the Minimal Connector Problem
• Infinite graph theory, where the numbers of vertices and edges are no longer constrained to be finite
• Random graph theory, where graphs are described by a probability distribution, or are generated by a random process
• Graphs appearing in other branches of mathematics, such as group theory, knot theory and finite geometry.
More about Graphs
To illustrate the last of these, the Fano plane represents a projective plane of order 2.
[Figure: the Fano plane]
The vertices give rise to the 7 points in the projective plane, the edges to the 7 lines in the plane, and the notion of incidence carries across from graph to finite geometry.
Chapter 5. Computational Complexity
5.1 Comparing Algorithms
5.2 Examples and Applications

5.1 Comparing Algorithms
Algorithms are ubiquitous. For example:
• Sorting a list of numbers (or sorting in Excel)
• Finding the GCD of two numbers (Euclidean Algorithm)
• Simulating pseudorandom numbers etc.
In this course, we have met:
• Simplex Algorithm
• Interior Point Algorithm
• Kruskal's Algorithm etc.
Informally, an algorithm is simply a set of rules, like a recipe, but usually we have more specific requirements than those involved in producing a nice ham omelette.
Computational Complexity
We expect:
• A clear unambiguous description of processes
• A defined set of inputs
• A defined set of outputs
• Guaranteed termination, or a stopping rule
• A guarantee that the correct result will be found (this can sometimes be relaxed if suboptimality is acceptable)
There are several questions we can ask:
• Does an algorithm actually exist to solve the task in hand?
• Can we be sure it works for all acceptable inputs?
• How long does it take to run?
• How much memory is needed?
• Is our algorithm the best possible?

Computational Complexity
We concentrate on issues of timing: how fast is it? Does it run in a reasonably short time? Which of two possible algorithms is faster?
Algorithms are critical in many areas, for example the secure transmission of data. Suppose B wants to send a message (or pin number or whatever) to A. A should be able to read the message without difficulty, but the message should be indecipherable to an evil hacker who manages to intercept it. One method consists of:
• Selection of two large secret primes, say p and q, say 512 bits each (the private key)
• The value n = pq, which can be public (the public key).
The values p and q are known only to A and B, but n can be known more widely.
Computational Complexity
The data are encrypted by B with some mathematical method, often involving modular arithmetic. (The technical details can get a bit involved – google, for example, RSA encryption.) The method will depend on p and q. Only A can decrypt the message as she is the only one with the private key. And when she wants to reply, she simply repeats the process, encrypting her message to B using p and q.
Many might like to crack the code by finding p and q given n. In principle there is an algorithm that from a given n will find p and q: try all 2^512 possible p's – but that is an astronomical number. In practice no fast algorithm is known for this problem, and the security of many codes depends on this fact.
There is no known polynomial time algorithm to factorise large integers, so the larger n is, the more secure the coding.

Computational Complexity
Are computers not getting faster?
Yes, but not fast enough!
Moore's Law (an empirical rule) states that the number of transistors in an integrated circuit doubles about every two years.
If, by some technological miracle, we were able to find the prime factors of a huge number with 1024 bits, we could simply increase the magnitude to 2048 or 4096 bits to put the solution beyond reach, at least until the speed catches up.
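To see why brute force is hopeless, here is a trial-division sketch in R. It factorises small numbers instantly, but the number of candidate divisors grows like √n, which for a 1024-bit n is astronomically beyond reach (illustrative code, not a realistic attack):

# Naive factorisation by trial division: fine for small n,
# utterly infeasible for the 300-digit moduli used in real cryptosystems.
factorise=function(n)
{for(p in 2:floor(sqrt(n)))
{if(n%%p==0) return(c(p,n/p))}
c(n,1)}    # no divisor found: n is prime

factorise(3233)    # 53 61 (a toy RSA-style modulus)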
Computational Complexity
As mentioned earlier, there are special situations where algorithms that are sometimes wrong can still be useful.
An example is testing whether a number is prime. There is an algorithm called the Rabin-Miller test that is always correct when it reports a number as composite, but has a 25% chance of being wrong when it reports a number is prime.
One test therefore is not enough to conclude you have found a prime, but you can perform repeated tests and reduce the probability of being wrong to an arbitrarily small value (but never zero): after k independent tests, the error probability is at most (1/4)^k.
(Primality testing is 'easier' than factorisation.)
Comparing Algorithms
We often wish to analyse an algorithm, or to compare two algorithms, in terms of the number of steps needed to run the algorithm, or the time required for completion. Notice that the time taken can be expected to be proportional to the number of steps.
We can often determine the performance in terms of the size n of the problem. This can be thought of as an input parameter (number of rows of a square matrix, number of vertices of a graph, number of variables in an LP etc.).
An algorithm can run quite differently for different actual inputs of the same size. We could consider the best case, average case or worst case. The last of these may be the most useful; we seek the time (number of steps) that might potentially be needed.

Comparing Algorithms
In order to analyse the situation, it is helpful to use O notation. Alternatives exist, but this notation is widely used in complexity theory, computer science and mathematics as well as optimisation. The idea is to determine how fast a function grows as n increases. (O is short for order.)
Formally, suppose f(x) and g(x) are real functions. We say that f(x) is O(g(x)) iff there exist constants N and C such that |f(x)| ≤ C|g(x)| for all x ≥ N.
Intuitively, this means that f does not grow faster than g.
Comparing Algorithms
In mathematics, if f(x) = x^3 + 10x^2 + 100x + 1000 we could write
|f(x)| ≤ |x|^3 + 10|x|^2 + 100|x| + 1000
≤ |x|^3 + 10|x|^3 + 100|x|^3 + 1000|x|^3
= 1111|x|^3 for large x.
This is exceptionally crude bounding, but it does enable us to invoke the definition and say that f is O(x^3).
More generally, any polynomial is O(x^n) where n is the highest power of x appearing in f.
The function g, the order of f, is invariably a simple function such as √x, x^2 or log x.

Comparing Algorithms
To analyse algorithms, we adapt the definition by reframing it in terms of the input size n, ignoring the modulus sign (since, typically, positive functions with a natural number n as argument are considered), and we relate functions to execution time.
An algorithm is said to be O(f(n)) if there exist constants C and N such that, for all n > N, the execution time is at most Cf(n).
Generally, the functions f(n) are elementary functions, as in the mathematical context, that express the behaviour of algorithms in a straightforward and easily comprehended way.
Comparing Algorithms
We consider such functions as:
O(1) - Constant
O(log n) - Logarithmic
O(n) - Linear
O(n^2) - Quadratic
O(n^k) - Polynomial
O(k^n) - Exponential
These are given in increasing order of time required. In general we would expect an O(n) algorithm to take longer than an O(log n) algorithm (at least for large n).
Now O(n^k) and O(k^n) are very different. The latter grows very much faster with n, no matter what the value of k > 1. (Since k^n > n^k for all k > 1 for large enough n.)

Comparing Algorithms
The following fact is useful: if a function f(n) is a sum of functions, one of which grows faster than the others, then the fastest growing one determines the order of f(n).
For example: if f(n) = 100 log(n) + 50n + 7n log n + 3n^2 + 0.01n^3, then f(n) is O(n^3). [We write f(n) is O(n^3), not f(n) = O(n^3).]
Some authors use mathematical/logical notation:
f(n) is O(n log n) ↔ ∃C ∃N s.t. ∀n > N, f(n) ≤ Cn log n
although saying f has order n log n seems more transparent.
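A quick R sketch makes the ordering vivid (illustrative only):

# Compare the growth of common orders for a few problem sizes.
n=c(10,20,30,40)
data.frame(n, log2n=log2(n), nsq=n^2, ncube=n^3, two_n=2^n)
# The exponential column dwarfs the polynomial ones long before n=40.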
Comparing Algorithms
The efficiency of an algorithm depends on various factors, including:
• CPU (time) usage
• memory usage
• network usage
All are important but we mostly talk about time complexity (CPU usage).
Distinguish between:
• Performance: how much memory/disk/... is actually used when a program is run. This depends largely on the machine, compiler, etc. as well as the code.
• Complexity: how do the resource requirements of a program or algorithm scale, i.e., what happens as the size of the problem being solved gets larger?

Comparing Algorithms
The constant C accounts for extraneous factors, mainly related to hardware, over which we may have little control. The function of n that gives the order is inherently related to the nature of the algorithm used and is our primary concern. The constant N accounts for the fact that for small problem sizes the algorithm may not show its worst case performance, and indeed may behave quite anomalously.
Complexity affects performance but not the other way around. When we are trying to find the complexity of an algorithm, we are not mainly interested in the exact number of operations that are being performed. Rather, we are interested in the relation of the execution time to the problem size.
Comparing Algorithms
For an O(n) algorithm, doubling the problem size doubles the time taken.
For an O(n^3) algorithm, doubling the problem size increases the time taken by a factor of 8.
For an O(2^n) algorithm, doubling the problem size increases the time taken by a factor of 2^n.
The table on the next slide gives the running times of different algorithms on inputs of increasing size, for a processor performing 10^6 instructions per second. We see the deleterious effect of using a high order function. We can regard polynomial problems of order at most n^k for some k as reasonably tractable, exponential problems as intractable.

Comparing Algorithms

         n        n log2 n  n^2      n^3          1.5^n          2^n            n!
n=10     < 1 sec  < 1 sec   < 1 sec  < 1 sec      < 1 sec        < 1 sec        4 sec
n=30     < 1 sec  < 1 sec   < 1 sec  < 1 sec      < 1 sec        18 min         10^25 years
n=50     < 1 sec  < 1 sec   < 1 sec  < 1 sec      11 min         36 years       > 10^25 years
n=100    < 1 sec  < 1 sec   < 1 sec  1 sec        12892 years    10^17 years    > 10^25 years
n=1000   < 1 sec  < 1 sec   1 sec    18 min       > 10^25 years  > 10^25 years  > 10^25 years
n=10^4   < 1 sec  < 1 sec   2 min    12 days      > 10^25 years  > 10^25 years  > 10^25 years
n=10^5   < 1 sec  2 sec     3 hours  32 years     > 10^25 years  > 10^25 years  > 10^25 years
n=10^6   1 sec    20 sec    12 days  31710 years  > 10^25 years  > 10^25 years  > 10^25 years
Comparing Algorithms
It is very important to know if an algorithm runs in polynomial time or not, and whether it is O(n^2), O(n^3), etc.
We stress that we seek approximate results for large n, based on worst case performance, simply as a function of n.
Recall that, since time is roughly proportional to the number of steps, an algorithm is O(f(n)) if the execution time is at most Cf(n), or if the number of steps is at most Cf(n), for large n and suitable C.
Notice also that the value of the constant C may affect time in a major way (there are still practical reasons to use a fast and efficient computer).

Comparing Algorithms
Theoretical computer scientists classify algorithms. More can be found in a textbook on algorithms or the theory of computation. For a good elementary account, see
http://cs.stackexchange.com/questions/9556/in-basic-terms-what-is-the-definition-of-p-np-np-complete-and-np-hard/9566#9566
We provide a very simplified account that is adequate for the purposes of the course.
One class of problems (class P) consists of those problems for which an algorithm to solve the problem exists and runs in polynomial time.
Another class (class NP) consists of those problems for which any proposed solution can be verified in polynomial time.
NP stands for 'non-deterministic polynomial', not 'not polynomial'.
For example, any solution to sudoku can be checked quickly. But there is no known universal polynomial time algorithm, only strategies which may or may not work in individual cases.
Comparing Algorithms
Since it is clear that any class P problem can be verified in polynomial time, we have P ⊆ NP. Now consider an NP problem for which no polynomial time solution is known – there are many possible examples (e.g. the Travelling Salesman Problem). For these problems, is that because no polynomial solution exists, or are we just not clever enough to find one? In the former case, P is a proper subset of NP, but it is conceivable that all polynomially verifiable problems are in fact in class P, so that P=NP. This is a famous unsolved question.
A third class of problems, sometimes called class U, consists of undecidable problems, for which no universal algorithm can exist that covers all instances.

Comparing Algorithms
Theoretical computer scientists have been able to classify algorithms into several classes that extend beyond the P, NP and undecidable descriptions. Two of the most well known and important are NP complete and NP hard.
For any NP problem that is complete, there exists a polynomial-time algorithm that can transform the problem into any other NP complete problem. This transformation requirement is also called reduction.
Among the NP problems proven to be complete are the Traveling Salesman, Knapsack, and Graph Coloring problems.
Comparing Algorithms
Another class comprises those problems that are called NP hard. This class includes, but is not limited to, the NP complete problems. They are not only hard to solve but many are hard to verify as well. In fact, some of these problems aren't even decidable.
The Traveling Salesman Problem and the Graph Coloring Problem are NP hard as well as NP complete.
These problems are defined by a property similar to the one defining NP completeness – any problem in NP can be reduced to them. So they are at least as hard as any other problem in NP. A problem can be both in NP and NP hard.
The relationships may be shown in an 'Euler diagram'.

Comparing Algorithms
It can be difficult to think about these classifications. In summary:
P problems are quick to solve;
NP problems are quick to verify but slow to solve;
NP complete problems are quick to verify, slow to solve, and can be reduced to any other NP complete problem;
NP hard problems are often slow to verify, always slow to solve, and any NP problem can be reduced to them.
Comparing Algorithms
[Euler diagram of P, NP, NP Complete and NP Hard, drawn for the two cases: if P ≠ NP / if P = NP]
The picture expresses the facts that if P ≠ NP, then P ⊆ NP, NP Complete ⊆ NP, NP Complete ⊆ NP Hard.
But if P = NP, then P, NP and NP Complete are all equal to each other, and each is a subset of NP Hard.
The defining properties of these sets may also be illustrated:
[Figure: defining properties of P, NP, NP Complete and NP Hard]
Comparing Algorithms
A lot of work has been done to extend this discussion:
• Is there a 'gap' between class NP and class U? (Decidable problems not verifiable in polynomial time.) Yes, but there are no simply explained examples I know of.
• Even for class P (polynomial time) algorithms, it is important to find an efficient algorithm, in particular to ensure the polynomial is of smallest possible order.
• Class NP problems, not known to be class P, can often be addressed by heuristic methods that are tractable and will frequently yield a solution close to the true solution.
• The existence of undecidable problems was proved in the 1930s by Alan Turing. His arguments were theoretical. More recently, actual examples of such problems have been found. So U ≠ ∅.
• This discussion can be extended to include mathematical logic and Turing machines.

Summary
• Algorithms are important. They are very widely used in computing and in our general lives.
• Simply stated problems can be hard to solve. For example, TSP or the Subset Sum Problem.
• Simple ideas don't always work. Choosing the nearest city at each stage will not in general solve TSP.
• Simple algorithms can be very slow. Brute-force factoring, TSP.
• For some problems, even the best known algorithms are slow. TSP again.
Summary
This discussion is not just theoretical. Assuming a large instance of a problem, always choose a class P algorithm if possible, and then choose the one for which the polynomial is of lowest degree. For large instances of problems in class NP not known to be in class P, a heuristic method is likely to perform better than an exact method such as an exhaustive search of cases. Problems in class U should be avoided at all costs!
See http://www.bbc.co.uk/programmes/b06mtms8 for a recent Radio 4 discussion with Melvyn Bragg on P v NP. Another source:
http://bioinformatics.uchc.edu/LectureNotes_2015/Computation_Complexity_2015.pdf

5.2 Examples and Applications
Example 1: Simple Arithmetic
The addition of two numbers a and b is O(1) – calculate a+b.
The addition of n elements in a list is O(n).
Calculation of the trace of a matrix is O(n).
Calculation of the sum of all elements of a matrix is O(n^2).
These are all class P.

Example 2: Matrix Multiplication
Suppose we wish to multiply two nxn matrices. Then it is quite easy to see that we require at most n^2(n+1) operations (addition and multiplication) with the usual method. In fact, efficient techniques exist to reduce the number of operations in some cases. Notice that the discussion relates to the algorithm used, not the underlying problem. What we can say is that matrix multiplication is (at worst) O(n^3). [Class P problem.]
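A minimal R sketch of the usual triple-loop method makes the O(n^3) operation count visible (in practice R's built-in %*% should of course be used):

# Naive matrix multiplication: n^2 entries, each needing about n
# multiplications and n additions, so roughly n^3 operations in total.
matmul=function(A,B)
{n=nrow(A)
C=matrix(0,n,n)
for(i in 1:n)
for(j in 1:n)
for(k in 1:n)
C[i,j]=C[i,j]+A[i,k]*B[k,j]
C}

A=matrix(1:4,2,2); B=matrix(5:8,2,2)
all.equal(matmul(A,B), A%*%B)    # TRUE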
Examples and Applications
Example 3: Coin Weighing
Suppose we have n coins. All have the same weight except one forgery, which is lighter. How many weighings are needed to find the underweight coin? Consider 2^5=32 coins. Divide into two groups of 16 and weigh; the forgery is in the lighter group of 16. Divide this group into two groups of 8; the forgery is in the lighter of the two. Continually dividing into two groups, we can see that 5 weighings are needed. In general, with n coins, the number of weighings needed will be about log2(n), rounded up to the next integer when n is not a power of 2. With ingenuity, we may be able to improve the number of steps slightly, but the problem will still be O(log n). [This is class P.]

Examples and Applications
Example 4: Sorting Algorithms
It is important in many applications to sort a list of numbers into ascending order. Assume a permutation of {1,2,…,n}. A number of sorting algorithms exist. Which is best for large n?
a) Bubble Sort successively compares adjacent elements, exchanging them if necessary:
5 2 4 1 6 3
2 5 4 1 6 3
2 4 5 1 6 3
2 4 1 5 6 3
2 1 4 5 6 3 etc.
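A minimal R sketch of Bubble Sort as just described (for real work use the built-in sort function):

# Bubble Sort: repeatedly compare adjacent elements, swapping when
# out of order. The two nested passes give the O(n^2) behaviour.
bubblesort=function(x)
{n=length(x)
for(i in 1:(n-1))
{for(j in 1:(n-i))
{if(x[j]>x[j+1])
{tmp=x[j]; x[j]=x[j+1]; x[j+1]=tmp}}}
x}

bubblesort(c(5,2,4,1,6,3))    # 1 2 3 4 5 6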

Examples and Applications


b) Selection Sort finds the smallest element and moves it
Examples and Applications
to the front by exchange, then the next smallest and so on: d) Heap Sort is a more sophisticated version of
524163 Selection Sort using a tree structure (not described
124563
here – can be googled).
123564
1 2 3 4 6 5 etc. Alternatives exist - but which of the four performs
best? Look at complexity of each.
c) Quick Sort chooses a pivot (various methods)
and places it so elements to the left are smaller,
Type Worst Case Average Case
elements to the right larger, then quick sorts sublists:
524163 Bubble O(n2) O(n2)
213546 Selection O(n2) O(n2)
123546 Quick O(n2) O(nlogn)
1 2 3 4 5 6 Done
Heap O(nlogn) O(nlogn)
[This depends on the choice of pivot. A safe approach is to
choose the pivot randomly.] 34 35
Examples and Applications
Discussion:
• Bubble Sort is not considered a good method. Simulations show Selection Sort is generally quicker, although both are O(n^2).
• Quick Sort generally performs well, but can perform quite poorly in a few cases.
• Heap Sort can underperform Quick Sort in many cases, but its worst case is much better.
• The comparison shows that Heap Sort is the method of choice (among the four) for large n.
Other approaches exist. See
http://www.cprogramming.com/tutorial/computersciencetheory/sortcomp.html

Examples and Applications
Example 5: LP
We saw earlier that the simplex method may involve visiting 2^n vertices, so the simplex algorithm does not run in polynomial time in the worst case.
It can be shown that Karmarkar's interior point algorithm, for n variables, is O(n^3.5) [class P] – other interior point algorithms may perform differently.
This confirms the benefits of polynomial time interior point algorithms compared with exponential time simplex algorithms for very large n.
Examples and Applications
Example 6: The Shortest Path Problem
The algorithm we described (and similar algorithms, such as that for unweighted graphs) was described by Dijkstra in the 1950s. Such algorithms are O(V^2) (with V vertices), so are in class P.
Example 7: The Knapsack Problem
This was described in chapter 3 and is in class NP, in fact NP complete.
(http://people.orie.cornell.edu/shmoys/or630/notes-06/lecture27.pdf)

Examples and Applications
Example 8: Sudoku
Sudoku is NP complete when generalized to an nxn grid (obviously the usual 9x9 version is in class P).
Example 9: Halting Problem
The Halting Problem is historically important from the early days of computation theory. It aims to determine, given a computer program and an input, whether the program will finish running, or will continue to run forever. Alan Turing proved in 1936 that a general algorithm to solve the halting problem for all possible program-input pairs cannot exist, so the problem is in class U. (It is also NP hard – but not NP complete!)
Examples and Applications
Example 10: Hilbert's Tenth Problem
This concerns Diophantine equations. Suppose we have such an equation in any number of variables with integer coefficients, i.e. an equation p(x1,x2,…,xn)=0 where p is a polynomial. Is there an algorithm by which it can be determined whether the equation is solvable in integers?
[e.g. x1^2+x2^2 = 5 is solvable; x1^2+x2^2 = 6 is not]
This was long an open question, but we now know the answer is no. See Y. V. Matiyasevich, Hilbert's Tenth Problem, MIT Press, Cambridge, Massachusetts, 1993. The problem is undecidable (class U).

Examples and Applications
Example 11: Minimal Connector Problem
We have seen two algorithms, Kruskal and Prim. Which one to use? Use two results.
a) It can be shown quite easily that if G is a simple connected graph with n nodes and m arcs, then
n-1 ≤ m ≤ ½n(n-1).
The left hand inequality becomes an equality iff G is a tree, when removing any arc disconnects the graph. The right hand inequality becomes an equality for 'complete graphs' (every pair of distinct nodes joined by an arc), when adding a further arc leaves a graph that is no longer simple.
Comparing Algorithms
b) It can be shown, with a little more difficulty, that the complexity of Prim's Algorithm is O(n^2) for n nodes. The complexity of Kruskal's Algorithm is O(m log m) for m arcs.
Suppose m ≈ n, that is, the graph is sparse: there are not many more arcs than nodes. Since n log n < n^2, Kruskal is preferred.
Suppose m ≈ ½n(n-1), so the graph is dense: most pairs of nodes are joined by an arc. Then m log m ≈ ½n(n-1)log{½n(n-1)} > ½n^2 for large n; Prim is better.
These results apply for large n. Different criteria apply for smaller values of n. Note that both algorithms are class P, but the choice between them has consequences for timing.

Examples and Applications
Example 12: Subset Sum Problem
Given a finite set of natural numbers S, is it possible to partition S into two disjoint subsets A and B (so A∪B=S and A∩B=∅) such that the sum of the numbers in A is equal to the sum of the numbers in B?
[If S={2,3,4,5,14}, the answer is YES; if S={2,3,4,5,15}, it is NO.]
You will probably try all possible ways of partitioning the numbers into two sets until you find a partition where the sums are equal, or until you have tried all possible partitions and none has worked. If any of them worked you would say YES, otherwise you would say NO. There are exponentially many possible partitions, and no polynomial time algorithm is known.
However, given two sets A and B, you can easily check if the sums are equal and if A and B provide a partition of S. So this problem is in class NP, in fact NP complete.
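A brute-force R sketch of the partition test just described; it examines all 2^n subsets, which is exactly the exponential behaviour that makes the problem hard (illustrative only):

# Try every subset A of S; report one with sum(A)=sum(S)/2 if it exists.
partition=function(S)
{total=sum(S)
if(total%%2==1) return(NULL)          # odd total: no partition possible
n=length(S)
for(code in 0:(2^n-1))                # each code encodes one subset
{A=S[bitwAnd(code,2^(0:(n-1)))>0]
if(sum(A)==total/2) return(A)}
NULL}

partition(c(2,3,4,5,14))    # e.g. 2 3 4 5 (the other side is 14)
partition(c(2,3,4,5,15))    # NULL: no partition exists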
Comparing Algorithms - TSP
Example 13: The Travelling Salesman Problem
Mentioned in Chapter 4, the TSP is important in complexity theory, graph theory, optimization and mathematics in general.
To remind everyone, we consider a man (or a woman in the Traveling Salesperson Problem) who wants to visit n cities. He/she can go from any city to any other; distances between each pair of cities are known. How can he/she visit them all so that the total distance travelled is as small as possible? Alternatively, given an undirected weighted graph, find a closed path of least total weight which starts and finishes at a specified vertex after having visited each other vertex exactly once (a Hamiltonian cycle).

Comparing Algorithms - TSP
Clearly, one can examine every possible closed path, but there are ½(n-1)! of them, so such an algorithm would not run in polynomial time. No polynomial time algorithm is known, although heuristic methods approximate the solution and it is possible to find upper and lower bounds for the true minimum.
TSP is NP complete. We can express this by saying that if you can find a polynomial time algorithm to solve the Travelling Salesman Problem, then you can find a polynomial time algorithm for all problems in class NP, and P=NP. This is quite an amazing result.
Chapter 6. Non-Linear Programming
6.1 Introduction and Examples
6.2 Convexity and Concavity
6.3 Univariate Problems: Exact Methods
6.4 Univariate Problems: Approximate Methods
6.5 Multivariate Unconstrained Problems: Exact Methods
6.6 Multivariate Unconstrained Problems: Approximate Methods
6.7 Multivariate Equality Constrained Problems
6.8 Multivariate Inequality Constrained Problems
6.9 Quadratic Programming
6.10 Penalty Functions
(Some modifications are likely, dependent on timings and the preferences of students.)
6.1 Introduction and Examples
An LP is a problem of the form max c^T x s.t. Ax ≤ b, x ≥ 0 (or similar form). All functions are linear.
An NLP is of the form max f0(x) s.t. fi(x) ≤ 0 for i = 1, 2, …, m, where at least one function fi (0≤i≤m) is non-linear.
The problem may have alternative forms: thus the objective function may be a minimisation, and the constraints may be of form ≥ or =. It can always be written as described.
The exact formulation is less crucial than for LP (there is no specific need for non-negativity in all cases, only for a correct representation of the problem).
Many LPs are simplifications or approximations of NLPs. NLPs occur naturally in many areas of science, engineering, business etc.

Introduction and Examples
NLPs are much harder than LPs. The feasible region may be bounded, unbounded or non-existent (as in LP), but will obviously have curved boundaries in general. This means that the nice properties of LPs no longer apply.
• The optimum need no longer occur at a vertex, e.g. max x+y s.t. x^2+y^2 ≤ 1, x≥0, y≥0. Here the optimum is at (½√2,½√2), on the curved boundary of the feasible region.
• The optimum may occur inside the feasible region, e.g. min x^4+y^4 s.t. x^2+y^2 ≤ 1. The minimum is at (0,0).
• It is hard to distinguish a local optimum from a global optimum. Therefore, while we can often find necessary conditions for an optimum (typically of the form f'(x)=0), it is a lot trickier to find sufficient conditions (which local optimum?).
Introduction and Examples
• There may be multiple disconnected feasible regions.
• Different starting points may lead to different final solutions. We may end up at different local optima; worse, the solution may vary with the solution method used.
• There is seldom a clear determination of the optimum. We may believe we have an optimum – at best we can often only check conditions that ensure a local optimum. (Because we lack sufficient conditions for optimality.)
• There is a huge body of complex mathematical theory. Which algorithm to apply? Can we be sure two methods will give the same outcome?

Introduction and Examples
Some consequences:
• Problems in NLP are as much about solution as formulation; LP solutions via simplex or interior point are more standard.
• We frequently need numerical methods rather than precise algorithms.
• Fresh approaches are needed. NLP solutions are generally based on calculus (whereas LP is based on linear algebra).
• There is no quick fix to the necessary/sufficient issue in general.
Introduction and Examples
Terminology: the notions 'objective function', 'constraint', 'feasible region', etc. carry over from work on LPs.
For a maximisation, a point x* in the feasible region is a global optimal solution to the NLP if f(x*) ≥ f(x) for all points x in the feasible region. (Similarly for a minimisation.)
A feasible point x'=(x'1, x'2, …, x'n) is a local optimum if, for sufficiently small ε, any feasible point x=(x1, x2, …, xn) with |xi-x'i| < ε (i=1,2,…,n) satisfies f(x') ≥ f(x). (Similarly for a minimisation.)
So a global optimum is a local optimum, but not vice versa.

Introduction and Examples
Classification: it is convenient to classify NLPs.
• If there is only one variable, the problem is univariate, otherwise it is multivariate. Univariate problems are important and non-trivial.
• If there are no constraints, the problem is unconstrained, otherwise it is constrained. Unconstrained problems are important and non-trivial.
• If there is an analytic solution, the method is exact; if a numerical procedure is used, the method is approximate.
So conceptually there are 8 types of NLP. Whilst a couple are straightforward, most pose serious difficulties.
Introduction and Examples
Additionally, we can sometimes exploit special structure to aid solution:
• LP is a particular case and an obvious example
• Quadratic Programming uses simplex-type methods when the objective function is quadratic and the constraints linear
• Fractional Programming uses special techniques when the objective function is of form h1(x)/h2(x) for certain h1, h2, for example linear functions
• Convex Programming comprises an important subclass of NLP which uses properties of convexity and concavity. These can sometimes guarantee that a solution is a global optimum.

Introduction and Examples
Example 1: Facility Location
A company is trying to determine where to locate a single warehouse. The positions in the x-y plane of their four main customers, and the number of shipments made annually to each customer, are as shown. Where should the warehouse be if the total distance trucks travel annually from warehouse to customers is minimised?

Customer   x coordinate   y coordinate   Shipments
1          5              10             200
2          10             5              150
3          0              12             200
4          12             0              300
Introduction and Examples
This can be solved as an unconstrained NLP:
min{200[(x-5)^2+(y-10)^2]^½ + 150[(x-10)^2+(y-5)^2]^½ + 200[x^2+(y-12)^2]^½ + 300[(x-12)^2+y^2]^½}
Let's see how this can be achieved by computer.
In Excel, we can use Solver again.
Values for x and y are in cells A1 and A2.
B1 contains =200*SQRT(((A1-5)^2)+((A2-10)^2))
Similarly for C1, D1, E1 to give the remaining terms.
F1 contains =SUM(B1:E1)
In the Solver dialogue window:
Set Target Cell $F$1 Equal to Min
By Changing Cells $A$1:$A$2
No constraints, no need to alter Options unless required.
Solver gives solution x=9.314, y=5.029, i.e. the warehouse should be at or near the point (9.3,5.0).

Introduction and Examples
In R, use the following code (minimisation is the default):

f=function(x)
200*sqrt((x[1]-5)^2+(x[2]-10)^2) + 150*sqrt((x[1]-10)^2+(x[2]-5)^2) +
200*sqrt(x[1]^2+(x[2]-12)^2) + 300*sqrt((x[1]-12)^2+x[2]^2)
optim(c(8,8),f)$par

This uses starting point (8,8). $par extracts the solution values, suppressing the rest of the output. A range of options is available, such as changing the search method.
R gives solution (9.313, 5.029), almost the same as Excel. Do not expect complete agreement from these approximate methods. We consider some approaches later in the chapter.
Constraints can easily be accommodated.
Introduction and Examples
Example 2: Furniture Production
A company produces tables and chairs. 700 units of wood are available. A table uses 2 units of wood and gives a profit of £20; a chair uses 1 unit of wood and gives profit £15. There must be at least 80 chairs, and at least twice as many chairs as tables. How many of each to produce?
Suppose there are x1 tables, x2 chairs. We have:
max 20x1 + 15x2
s.t. 2x1 + x2 ≤ 700
x2 ≥ 2x1
x2 ≥ 80
x1, x2 ≥ 0
This is an LP, but it may be unrealistic.

Introduction and Examples
If a large number of tables are produced, scraps can be used instead of new wood for some parts. We can use a function of the form
1.5 + 0.5/(0.998 + 0.002x1)
units of wood per table when x1 tables are produced. [So x1=1 → 2 units, x1=100 → 1.92 units etc.]
There may be similar economies of scale for the chairs, so use a function
0.5 + 0.5/(0.994 + 0.004x1 + 0.002x2)
units of wood per chair, since table scraps can also be used for chairs.
Introduction and Examples
Suppose in addition a discount of 0.5% is applied for every table or chair purchased after the first. Then the profit per table is 20(1-0.005)^(x1+x2-1) and the profit per chair is 15(1-0.005)^(x1+x2-1). Instead of an LP, we now have a (messy) NLP:
max (20x1+15x2)(0.995)^(x1+x2-1)
s.t. x1(1.5 + 0.5/(0.998 + 0.002x1)) + x2(0.5 + 0.5/(0.994 + 0.004x1 + 0.002x2)) ≤ 700
x2 ≥ 2x1
x2 ≥ 80
x1, x2 ≥ 0
This is impossible to solve analytically. However, with just two variables, a graphical solution can be found. Rounding to an integral solution may be acceptable here.

6.2 Convexity and Concavity
These are important notions in NLP – in particular because they provide sufficient conditions for an optimum in certain cases.
A set S is convex if x1 ε S and x2 ε S imply that all points on the line segment joining x1 and x2 are also members of S; or, more formally, if cx1 + (1-c)x2 ε S whenever 0 ≤ c ≤ 1.
We consider sets in R^n. A set is concave if it is not convex.
For example, a circle (or an n-dimensional sphere in R^n) is convex. A cube (or hypercube in R^n) is convex. A torus in R^3 is concave (not convex).
Convexity and Concavity
The following figures are respectively convex, convex, concave, concave.
[Figure: four sets, two convex and two concave]
A function f, defined on all points in a convex set S, is a convex function if, for any x1 ε S and x2 ε S,
f(cx1 + (1-c)x2) ≤ cf(x1) + (1-c)f(x2) for all c with 0 ≤ c ≤ 1.
The function f is concave if, for any x1 ε S and x2 ε S,
f(cx1 + (1-c)x2) ≥ cf(x1) + (1-c)f(x2) for all c with 0 ≤ c ≤ 1.

Convexity and Concavity
A function is strictly convex if the inequality in the preceding definition is strict (< instead of ≤). Strict concavity is defined similarly. Consider the single variable case:
[Figure: a convex function and a concave function of one variable]
Mnemonic: concAve!
Convexity and Concavity
In the former case, f(cx1 + (1-c)x2) ≤ cf(x1) + (1-c)f(x2). In the latter, f(cx1 + (1-c)x2) ≥ cf(x1) + (1-c)f(x2).
We see that, with a single variable, f is convex iff the line segment joining any two points on f is never below the curve. And f is concave iff the line segment is never above the curve. For example:
• f(x) = x^2 is convex on R
• f(x) = e^x is convex on R
• f(x) = √x is concave on R+
• f(x) = sin x is neither convex nor concave on [0,2π] (although it is concave on [0,π])

Convexity and Concavity
Some typical general results follow. Please do not confuse convex sets and convex functions.
Result 1:
The intersection of convex sets is convex.
Proof: Let S1, S2 be convex. Suppose x1 ε S1∩S2, x2 ε S1∩S2, c ε [0,1]. Now cx1 + (1-c)x2 ε S1 (since x1, x2 ε S1) and similarly cx1 + (1-c)x2 ε S2. So cx1 + (1-c)x2 ε S1∩S2; S1∩S2 is convex.
The result extends to any number of convex sets by induction.
Result 2:
The union of convex sets is not necessarily convex.
Proof: Take (for example) S1 = {(x,y) ε R^2 | 0≤x≤2, 0≤y≤1}, S2 = {(x,y) ε R^2 | 0≤x≤1, 0≤y≤2}.
Convexity and Concavity
Result 3:
A hyperplane in R^n divides the space into two convex sets.
Proof: Apply the definition of convexity by taking two points in either set.
Result 4:
The feasible region for an LP is convex.
Proof: Apply Results 1 and 3. The polytope is the intersection of a number of sets, each of which is convex.
Result 5:
A function f is convex iff -f is concave.
Proof: Direct from the definitions of convexity and concavity.

Convexity and Concavity
Result 6:
A linear function f(x) = ax+b is both convex and concave. (a, b constant)
Proof: Consider f(cx1 + (1-c)x2) = a(cx1 + (1-c)x2) + b = c(ax1 + b) + (1-c)(ax2 + b) = cf(x1) + (1-c)f(x2).
The definitions for convexity and concavity hold with equality.
Result 7:
If f and g are convex functions, then so is f+g.
Proof: Apply the definition of convexity to both f and g, and add the resulting inequalities.
Convexity and Concavity
In practice, we often need to determine if a given function is convex or concave. How can this be done?
It is simple in the single variable case. If f(x) is convex, the line joining any two points is never below the curve, so the slope of f(x) must be non-decreasing for all x. Hence we have
Proposition
If f''(x) exists for all x in a convex set S, then f(x) is a convex function iff f''(x) ≥ 0 for all x ε S.
And similarly
Proposition
If f''(x) exists for all x in a convex set S, then f(x) is a concave function iff f''(x) ≤ 0 for all x ε S.

Convexity and Concavity
Now consider n > 1 variables. How can we tell whether f(x1,x2,…xn) is convex or concave on a convex set S in R^n? Assume throughout that f has continuous second order derivatives. Recall that the Hessian of f(x1,x2,…xn) is the nxn matrix whose (i,j) entry is ∂²f/∂xi∂xj.
[Since we know that ∂²f/∂xi∂xj = ∂²f/∂xj∂xi, the Hessian is a symmetric matrix.]
We let H(x1,x2,…xn) (or H(x), or simply H where the context is clear) denote the Hessian matrix at (x1,x2,…xn).
If f(x1,x2) = x1^3+2x1x2+x2^2 then
H(x1,x2) = ( 6x1  2 )
           (  2   2 )
Convexity and Concavity
Recall the following definition, where A is an n × n symmetric matrix. We say that A is:
1. Positive definite if x′Ax > 0 for all x ≠ 0 in R^n.
2. Positive semidefinite if x′Ax ≥ 0 for all x ≠ 0 in R^n.
3. Negative definite if x′Ax < 0 for all x ≠ 0 in R^n.
4. Negative semidefinite if x′Ax ≤ 0 for all x ≠ 0 in R^n.
5. Indefinite if none of the conditions 1-4 apply.
The following result holds for a function f:
Proposition
1. f is concave iff H(x) is negative semidefinite for all x ∈ R^n
2. If H(x) is negative definite for all x ∈ R^n then f is strictly concave
3. f is convex iff H(x) is positive semidefinite for all x ∈ R^n
4. If H(x) is positive definite for all x ∈ R^n then f is strictly convex.

Convexity and Concavity
So we see that the study of convexity and concavity is reduced to examining the definiteness of a matrix.
Theorem
The following statements about a symmetric matrix A are equivalent:
1. A is positive semidefinite;
2. All eigenvalues of A are nonnegative;
3. A = Z'Z for some real matrix Z.
Similar results hold for other types of definiteness.
We can therefore examine convexity and concavity via eigenvalues, quadratic forms or other methods of linear algebra.
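The eigenvalue criterion gives a quick computational check in R (a sketch; the tolerance guards against rounding error in the computed eigenvalues):

# Classify a symmetric matrix by the signs of its eigenvalues.
definiteness=function(A,tol=1e-8)
{ev=eigen(A,symmetric=TRUE)$values
if(all(ev>tol)) return("positive definite")
if(all(ev> -tol)) return("positive semidefinite")
if(all(ev< -tol)) return("negative definite")
if(all(ev<tol)) return("negative semidefinite")
"indefinite"}

definiteness(matrix(c(2,2,2,2),2,2))      # "positive semidefinite"
definiteness(matrix(c(-2,-1,-1,-4),2,2))  # "negative definite"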
Convexity and Concavity
We use the following matrix based approach, concentrating not on strict convexity but rather on convexity, since this is adequate for NLPs with ≤ and ≥ constraints.
An ith principal minor of an nxn matrix is the determinant of an ixi matrix obtained by deleting (n-i) rows and the corresponding (n-i) columns from the matrix.
For example, for the matrix
( -2  -1 )
( -1  -4 )
the first principal minors are -2 and -4, and the second principal minor is 8-1=7.
The kth leading principal minor of an nxn matrix is the determinant of the kxk matrix obtained by deleting the last (n-k) columns and the last (n-k) rows.

Convexity and Concavity
Denote the kth leading principal minor of the Hessian matrix H(x1,x2,…xn) by Hk(x1,x2,…xn). If f(x1,x2) = x1^3+2x1x2+x2^2 then H1(x1,x2) = 6x1 and H2(x1,x2) = |H| = 12x1 - 4.
The important test is now given. Proofs, while not very difficult, involve some linear algebra and are omitted.
Main Theorem 1 (Convexity/Concavity Testing)
Suppose f(x1,x2,…xn) has continuous second order derivatives for each point (x1,x2,…xn) ε S. Then
i) f is a convex function on S iff for all x ε S, all principal minors of H are non-negative.
ii) f is a concave function on S iff for all x ε S, all kth non-zero principal minors of H have the same sign as (-1)^k for k=1,2,…n.
Convexity and Concavity
Example 1
Suppose f1 = x1^2 + 2x1x2 + x2^2 on S = R^2.
H = ( 2  2 )
    ( 2  2 )
The first principal minors are 2 and 2. The second principal minor is 0. f1 is convex on S.
Example 2
Suppose f2 = -x1^2 - x1x2 - 2x2^2 on S = R^2.
H = ( -2  -1 )
    ( -1  -4 )
The first principal minors are -2 and -4. The second principal minor is 7. f2 is concave on S.

Convexity and Concavity
Example 3
Suppose f3 = x1^2 - 3x1x2 + 2x2^2 on S = R^2.
H = (  2  -3 )
    ( -3   4 )
The first principal minors are 2 and 4. The second principal minor is -1. f3 is neither convex nor concave on S. [Is -f3 concave?]
Example 4
Suppose f4(x1,x2,x3) = x1^2+x2^2+2x3^2-x1x2-x2x3-x1x3 on R^3.
H = (  2  -1  -1 )
    ( -1   2  -1 )
    ( -1  -1   4 )
The first principal minors are 2, 2, 4. The second principal minors are 7, 7, 3. The third principal minor is |H| = 6.
f4 is convex on R^3.
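The principal-minor test of Main Theorem 1 is easy to mechanise. A small R sketch, checked against Example 4 (the Hessian there is constant, so one evaluation suffices):

# All principal minors: determinants of the submatrices formed from
# every non-empty subset of rows and the matching columns.
principal_minors=function(H)
{n=nrow(H)
idx=unlist(lapply(1:n,function(k) combn(n,k,simplify=FALSE)),recursive=FALSE)
sapply(idx,function(s) det(H[s,s,drop=FALSE]))}

H4=matrix(c(2,-1,-1, -1,2,-1, -1,-1,4),3,3)
principal_minors(H4)   # 2 2 4 3 7 7 6: all non-negative, so f4 is convex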
Convexity and Concavity
Example 5
f5(x1,x2,x3,x4) = x1^2+2x2^2+2x3^2+x4^2+2x1x2+2x2x3+2x3x4 on R^4.
We can examine the 4x4 Hessian, but it is probably easier here to write
f5(x1,x2,x3,x4) = (x1+x2)^2+(x2+x3)^2+(x3+x4)^2.
Each of the square terms is convex (Example 1), so f5 is a sum of convex functions, hence convex (Result 7).
Example 6
f6 = -2x1^4 + 8x1^2x2^2 - 5x2^4 on S = R^2.
One first principal minor of H is -24x1^2 + 16x2^2.
When x1 = 1, x2 = 0 this is negative: f6 cannot be convex.
When x1 = 0, x2 = 1 this is positive: f6 cannot be concave.
The function is neither convex nor concave on S.

Chapter 6. Non-Linear Programming
6.1 Introduction and Examples
6.2 Convexity and Concavity
6.3 Univariate Problems: Approximate Methods
6.4 Multivariate Unconstrained Problems: Exact Methods
6.5 Multivariate Unconstrained Problems: Approximate Methods
6.6 Multivariate Equality Constrained Problems
6.7 Multivariate Inequality Constrained Problems
6.8 Quadratic Programming
6.9 Penalty Functions
(Some variation in these subchapters may occur, dependent on timings and the preferences of students.)
6.3 Univariate Problems
These problems involve maximising (or minimising) a scalar f(x) in some context. Any additional constraints can generally be incorporated by restricting x to lie in some interval. If x is unrestricted, we can still usually find an interval known to include the optimum. So we consider the problem in the form max (or min) f(x) s.t. x ε [a,b] for some constants a and b.
There are good reasons to consider such apparently simple problems:
• Some non-trivial NLPs can be expressed this way
• The methods can generalise for multivariate problems
• Univariate problems are needed as subtechniques for multivariate methods such as steepest ascent.
They are indeed not completely trivial, but exact univariate methods depend largely on univariate calculus and are quite straightforward. They are relegated to the Miscellaneous part of Moodle.

Univariate Problems
Approximate methods are far more widely used and are ideal for computer implementation. These methods locate optima with the required degree of accuracy, not precisely. They can use point or interval approximation. The best choice of technique depends on the nature of the function considered, the required degree of accuracy, available computing power etc. The rate of convergence is important. Sometimes hybrid methods are used.
We describe one point method (a sequence of points converging, we hope, to the optimum) and one interval method (a sequence of successively narrower intervals enclosing the optimum).
Univariate Problems: Approximate Methods
Newton's Method (Point Method)
This is similar to Newton-Raphson for estimating solutions to an equation. [For some reason, Joseph Raphson (1648-1715) has joint credit for solving equations, but not for optimisation.]
Assume the required function is twice differentiable. To use Newton to find a stationary point (necessary for an optimum), use a Taylor expansion for f'(x) in the vicinity of x=a, ignoring higher order terms:
f'(x) ≈ f'(a) + (x-a)f''(a)
Use this to generate an iterative scheme, starting at some x1:
f'(xn+1) ≈ f'(xn) + (xn+1-xn)f''(xn)
We require a point for which f'(xn+1) ≈ 0.
So 0 ≈ f'(xn) + (xn+1-xn)f''(xn)
Rearrange as an iteration to express xn+1 in terms of xn:
xn+1 = xn - f'(xn)/f''(xn)
Use of the iteration with a sensible value x1 should usually give a sequence converging to the value of x at a local (not necessarily global) optimum, on the basis that xn+1 is a better approximation than xn.
Iterate until some convergence criterion is met; for example until |xn+1-xn| < ε for predetermined small ε.
Univariate Problems: Approximate Methods
For example, min f(x) = e^-x + x^2 on [0,1]. Start with x1 = 0.5.
It is tedious to carry out the calculations by hand, but we can use R:

f=function(x){exp(-x)+x^2}
curve(f, from = 0, to = 1)
fprime=function(x){2*x-exp(-x)}
fprimeprime=function(x){exp(-x)+2}
x=c(0.5,rep(NA,6))
fval=rep(NA,7)
fprimeval=rep(NA,7)
fprimeprimeval=rep(NA,7)
for(i in 1:6)
{fval[i]=f(x[i])
fprimeval[i]=fprime(x[i])
fprimeprimeval[i]=fprimeprime(x[i])
x[i+1]=x[i]-fprimeval[i]/fprimeprimeval[i]}
data.frame(x,fval,fprimeval,fprimeprimeval)

[Figure: plot of f(x) = e^-x + x^2 on [0,1]]

          x      fval       fprimeval      fprimeprimeval
1  0.5000000  0.8565307   3.934693e-01    2.606531
2  0.3490448  0.8271938  -7.271912e-03    2.705362
3  0.3517328  0.8271840  -2.545888e-06    2.703468
4  0.3517337  0.8271840  -3.118616e-13    2.703467
5  0.3517337  0.8271840   0.000000e+00    2.703467
6  0.3517337  0.8271840   0.000000e+00    2.703467
7  0.3517337         NA             NA          NA
Univariate Problems: Approximate Methods
Note:
• Rapid convergence to x=0.3517, f=0.8272
• Could have used a 'while' loop to test for convergence
• Use of a data frame is optional.
In general, Newton's method has both pros and cons:
1. Conceptually simple.
2. Easy to implement on a computer.
3. Can converge fast near the optimum. But……
4. Assumes f twice differentiable.
5. May converge to local, not global optima.
6. May diverge or wander for badly behaved functions.
In particular, notice that Newton may fail if f'' ≈ 0 at any time.

Univariate Problems: Approximate Methods
Line Search
This is an interval method. It is fairly general, but it does require differentiability of f.
We seek the point where f'(x)=0 by successively narrowing the search interval. We need to assume a unique maximising point (or minimising point, by very similar methods). We can appeal to a sketch. The method should not be used with more than one stationary point.
Basic idea: if after a number of iterations we know the maximum lies in [an,bn], consider xn=½(an+bn).
If f'(xn)>0, the maximum is in [xn,bn], so discard [an,xn].
If f'(xn)≤0, the maximum is in [an,xn], so discard [xn,bn].
Univariate Problems: Approximate Methods
The method reduces the interval of uncertainty by a factor of 2 until the required tolerance is reached. Proceed as follows:
Step 0: (Initialization)
a1 = left-hand end of interval containing maximum
b1 = right-hand end of interval containing maximum
ε = maximum permitted tolerance
n=1
Step 1: (Compute midpoint)
Set xn=½(an+bn).
Step 2: (Reduce Interval)
If f'(xn)>0, an+1 = xn, bn+1 = bn
If f'(xn)≤0, an+1 = an, bn+1 = xn
Step 3: (Test)
If |bn-an|<ε, we have the required precision. Stop.
Step 4: (Repeat)
n→n+1
Return to Step 1.

Univariate Problems: Approximate Methods
Apply to min f(x) = e^-x + x^2 on [0,1]. Notice that this is a minimisation. Obvious small changes are needed, by taking different criteria in Step 2. (Interchange > and ≤.)
The following R code determines the optimum:
Univariate Problems: Approximate Methods

fprime=function(x){2*x-exp(-x)}
a=0
b=0
a[1]=0
b[1]=1
x=0
tolerance=0.001
n=1
while(abs(b[n]-a[n])>tolerance)
{x[n]=0.5*(a[n]+b[n])
if(fprime(x[n])>0)
{a[n+1]=a[n]
b[n+1]=x[n]}
if(fprime(x[n])<=0)
{a[n+1]=x[n]
b[n+1]=b[n]}
n=n+1}
width=b-a
data.frame(a,b,width)

The data frame is optional, but illustrates the progress of the algorithm.

           a         b        width
1  0.0000000 1.0000000 1.0000000000
2  0.0000000 0.5000000 0.5000000000
3  0.2500000 0.5000000 0.2500000000
4  0.2500000 0.3750000 0.1250000000
5  0.3125000 0.3750000 0.0625000000
6  0.3437500 0.3750000 0.0312500000
7  0.3437500 0.3593750 0.0156250000
8  0.3515625 0.3593750 0.0078125000
9  0.3515625 0.3554688 0.0039062500
10 0.3515625 0.3535156 0.0019531250
11 0.3515625 0.3525391 0.0009765625

Output shows the solution lies in (0.3516,0.3525). For a point estimate of the value of x at the minimum, can take ½(0.3515625+0.3525391).

Univariate Problems: Approximate Methods

Pros and cons:
• Simple
• Easy to compute. But…
• Slow convergence
• Assumes f differentiable
• Assumes a unique optimum in the interval

Alternatives exist but can be less robust. They may fail to converge at all, or may perform poorly with certain functions. There is typically a trade-off between speedy convergence and robustness. Golden Section is a widely used alternative. Details are given in the Miscellaneous section of Moodle.

6.4 Multivariate Unconstrained Problems: Exact Methods

We seek an optimal solution (if it exists) for
max (or min) f(x1, x2, …, xn) s.t. (x1, x2, …, xn) ε Rn.
Assume all first and second order derivatives exist and are continuous everywhere.

Let ∂f(x)/∂xi be the partial derivative of f evaluated at x (i=1,2,…,n).
Also define the gradient vector by
∇f(x) = (∂f/∂x1, ∂f/∂x2, …, ∂f/∂xn)^T
Multivariate Unconstrained Problems: Exact Methods

At stationary points, all partial derivatives are zero, so x is a stationary point of f iff ∇f(x) = 0.

This defines a system of n equations which may have no solution, one solution or multiple solutions. As in the univariate case, numerical methods of solution are often needed.

The nature of stationary points can be tested in various ways (eigenvalues, definiteness of a matrix, quadratic forms etc.). The following result is important. It uses Hk, the kth leading principal minor of the Hessian matrix.

Just to check on principal minors: if

H = | 8 7 6 |
    | 5 4 3 |
    | 2 1 0 |

then
H1 = 8
H2 = | 8 7 | = -3
     | 5 4 |
H3 = |H| = 0
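These leading principal minors are easily verified in R (a quick check, not part of the original example):

H=matrix(c(8,7,6,5,4,3,2,1,0),nrow=3,byrow=TRUE)
H[1,1] # H1 = 8
det(H[1:2,1:2]) # H2 = -3
det(H) # H3 = 0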

Multivariate Unconstrained Problems: Exact Methods

Main Theorem 2 (Examination of Stationary Points)
1. If Hk(x) > 0 (for all k=1,2,…,n) then a stationary point x is a local minimum for the NLP.
2. If Hk(x) ≠ 0 and has the same sign as (-1)^k (for all k=1,2,…,n), a stationary point x is a local maximum for the NLP.
3. If Hn(x) ≠ 0 and the conditions of 1 and 2 fail to hold, a stationary point x is not a local extremum.
4. If Hn(x) = 0, no conclusions can be drawn; the tests are inconclusive.
(Some extensions to part 4 can be given.)

Multivariate Unconstrained Problems: Exact Methods

These results are important, but as mentioned, what often concerns us is to test for global optima of an NLP.
Main Theorem 3 (Sufficient Condition for Global Optimum)
Consider an NLP in the form of a maximisation. Suppose the feasible region S is a convex set. If the objective function f0 is concave on S, then any local maximum is an optimal solution, so solves the NLP.
Proof
If the result is false, there is a local maximum x' that is not a global maximum. Then for some x ε S, f0(x) > f0(x').
Since f0 is concave, for 0 ≤ c ≤ 1,
f0(cx'+(1-c)x) ≥ cf0(x') + (1-c)f0(x)
              > cf0(x') + (1-c)f0(x')   because f0(x) > f0(x')
              = f0(x')
Multivariate Unconstrained Problems: Exact Methods

Now, x' is a local maximum so there is a neighbourhood N of x' such that f0(x') ≥ f0(x) for all x ε N. But we have seen f0(cx'+(1-c)x) > f0(x'), and cx'+(1-c)x ε S (0≤c≤1) since x ε S, x' ε S and S is convex. Choose c close enough to 1 so that cx'+(1-c)x ε N. Then f0(cx'+(1-c)x) ≤ f0(x'), an evident contradiction. So every local maximum must be a global maximum. (Implicitly using the fact that a convex function on Rd is continuous.)

We can similarly prove
Corollary
Consider an NLP in the form of a minimisation. Suppose the feasible region S is a convex set. If the objective function f0 is convex on S, then any local minimum is an optimal solution, so solves the NLP.

Multivariate Unconstrained Problems: Exact Methods

A stationary point that is not a local extremum is sometimes called a saddle point.
Example
min f(x1,x2) = x1² + x1x2 + x2² - 3x1 - 3x2 + 3

∇f(x1,x2) = (2x1 + x2 - 3, x1 + 2x2 - 3)^T = (0, 0)^T

Solving, the only extreme point can be (x1,x2) = (1,1).

H(x1,x2) = | 2 1 |
           | 1 2 |

By Main Theorem 2, (1,1) is a local minimum. All principal minors are positive so f is convex (Main Theorem 1). It follows that (1,1) is a global minimum (Main Theorem 3 Corollary).

Multivariate Unconstrained Problems: Exact Methods

Example
We shall find the dimensions of a rectangular box, without a top and having volume 1 unit, if the least amount of material is to be used in its manufacture.
Let x units be the length of the base of the box, let y units be the width of the base, let z be the height and let S be the surface area. Note that x, y, z ≥ 0.
We have S = 2xz + 2yz + xy and 1 = xyz (volume).
Eliminating z gives S = 2/y + 2/x + xy
Now differentiate:
∂S/∂x = y - 2/x², ∂S/∂y = x - 2/y², ∂²S/∂x² = 4/x³, ∂²S/∂x∂y = 1, ∂²S/∂y² = 4/y³

Setting first derivatives to 0 gives x²y = y²x = 2, which leads to x = y = ∛2 as the only possible optimum.

Now H = | 4/x³  1    |
        | 1     4/y³ |

and at (∛2, ∛2), 4/x³ > 0 and |H| > 0, so by Main Theorem 2 we have a local minimum at (∛2, ∛2).
We cannot safely produce an argument based on convexity, because there are points where |H| < 0.
But we can say that S→∞ as x,y→0 and S→∞ as x,y→∞. Since S is continuous as a function of x and y for x,y > 0, we can conclude that the unique minimum at (∛2, ∛2) must be a global minimum.
Since we find z = ½∛2, the box should have a square base and a depth half the length of a side of the base.
Multivariate Unconstrained Problems: Exact Methods

Example
We find the global maximum and/or global minimum of f(x,y) = 6x²e^y - 4x³ - e^6y, or show they do not exist.
Firstly, necessary conditions for an optimum are

∇f = (12xe^y - 12x², 6x²e^y - 6e^6y)^T = (0, 0)^T

so xe^y = x² and x²e^y = e^6y.
The second equation implies x≠0, hence x=e^y so e^3y = e^6y.
Therefore y=0, x=1 and (1,0) is the unique stationary point.
Now examine the Hessian.

H = | 12e^y - 24x   12xe^y          |  =  | -12  12  |  at (1,0)
    | 12xe^y        6x²e^y - 36e^6y |     |  12  -30 |

H1 = -12 < 0, H2 = 216 > 0 so (1,0) is a local maximum. But f is not concave: for example, at (0,0) the concavity test on slide 28 fails. Nor is f convex.
We cannot conclude that (1,0) is a global maximum or minimum.
In fact, suppose x=0 and let y→∞. Then f→-∞.
Suppose y=0 and let x→-∞. Then f→∞.
So f has a single stationary point (a local maximum) but no global maximum or minimum.

Multivariate Unconstrained Problems: Exact Methods

For a concave function, a single local stationary point must be a global maximum, and for a convex function a single local stationary point must be a global minimum. These results are not true for an arbitrary function on Rn. Note again that ∇f = 0 is necessary but not sufficient for an optimum.

Pros and cons of exact methods:
1. Relatively simple, when applicable.
2. Can often locate global optima. But…..
3. Usually not applicable.
4. Assumes differentiability.
5. Solution of ∇f = 0 may be impractical or impossible.
Chapter 6. Non-Linear Programming

6.1 Introduction and Examples
6.2 Convexity and Concavity
6.3 Univariate Problems: Approximate Methods
6.4 Multivariate Unconstrained Problems: Exact Methods
6.5 Multivariate Unconstrained Problems: Approximate Methods
6.6 Multivariate Equality Constrained Problems
6.7 Multivariate Inequality Constrained Problems
6.8 Quadratic Programming
6.9 Penalty Functions

(Some variation in these subchapters may occur, dependent on timings and the preferences of students.)

6.5 Multivariate Unconstrained Problems: Approximate Methods

Newton's Method (Multivariate)
Assume differentiability. In the univariate case, we approximated f using Taylor's Theorem. The multivariate case is a simple generalisation, although results are only established here for two variables. Taylor says:

f(x,y) = f(a,b) + (x-a)∂f/∂x(a,b) + (y-b)∂f/∂y(a,b)
         + ½[(x-a)²∂²f/∂x²(a,b) + 2(x-a)(y-b)∂²f/∂x∂y(a,b) + (y-b)²∂²f/∂y²(a,b)] + …

Multivariate Unconstrained Problems: Approximate Methods

This yields an approximation for f(x,y) in the vicinity of (a,b), assuming all derivatives exist, are continuous and are evaluated at (a,b).

Take (x1(n), x2(n)) as the approximate value for the optimum at the nth iteration. We want an expression for an improved estimate (x1(n+1), x2(n+1)). Ignoring higher order terms, we can write

f(x1,x2) ≈ f(x1(n),x2(n)) + (x1-x1(n))∂f/∂x1(x1(n),x2(n)) + (x2-x2(n))∂f/∂x2(x1(n),x2(n))
           + ½[(x1-x1(n))²∂²f/∂x1²(x1(n),x2(n)) + 2(x1-x1(n))(x2-x2(n))∂²f/∂x1∂x2(x1(n),x2(n)) + (x2-x2(n))²∂²f/∂x2²(x1(n),x2(n))] + …

Differentiate:

∂f/∂x1(x1,x2) ≈ ∂f/∂x1(x1(n),x2(n)) + (x1-x1(n))∂²f/∂x1²(x1(n),x2(n)) + (x2-x2(n))∂²f/∂x1∂x2(x1(n),x2(n)) + …

∂f/∂x2(x1,x2) ≈ ∂f/∂x2(x1(n),x2(n)) + (x2-x2(n))∂²f/∂x2²(x1(n),x2(n)) + (x1-x1(n))∂²f/∂x1∂x2(x1(n),x2(n)) + …

Now suppose (x1(n+1), x2(n+1)) is an improved estimate for the optimum. The terms on the LHS of the above equations are then close to zero, since the first derivatives vanish at the optimum. Setting (x1,x2) = (x1(n+1),x2(n+1)) and the LHS to zero gives (derivatives on the RHS evaluated at (x1(n),x2(n)))

| 0 |   | ∂f/∂x1 |   | ∂²f/∂x1²     ∂²f/∂x1∂x2 | | x1(n+1) - x1(n) |
|   | = |        | + |                         | |                 |
| 0 |   | ∂f/∂x2 |   | ∂²f/∂x1∂x2   ∂²f/∂x2²   | | x2(n+1) - x2(n) |
Multivariate Unconstrained Problems: Approximate Methods

In matrix form, this is 0 = ∇f + H(x(n+1) - x(n)), which can be expressed
x(n+1) = x(n) - H⁻¹∇f
This last expression provides an iteration, Newton's method, based on a suitable starting value, intended to converge to a local, if not necessarily global, optimum. Stress that at any iteration H⁻¹ and ∇f are each evaluated at (x1(n), x2(n)), and that a good starting point (x1(1), x2(1)) is crucial. There are obvious extensions with n>2 variables.

Example
min f(x1,x2) = x1² + x1x2 + x2² - 3x1 - 3x2 + 3
Take starting point (x1(1), x2(1)) = (0,0)

H = | 2 1 |   H⁻¹ = |  2/3  -1/3 |   ∇f = | 2x1 + x2 - 3 |
    | 1 2 |         | -1/3   2/3 |        | x1 + 2x2 - 3 |

| x1(n+1) |   | x1(n) |   |  2/3  -1/3 | | 2x1(n) + x2(n) - 3 |
|         | = |       | - |            | |                    |
| x2(n+1) |   | x2(n) |   | -1/3   2/3 | | x1(n) + 2x2(n) - 3 |

| x1(2) |   | 0 |   |  2/3  -1/3 | | -3 |   | 1 |
|       | = |   | - |            | |    | = |   |
| x2(2) |   | 0 |   | -1/3   2/3 | | -3 |   | 1 |
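A minimal R sketch of the Newton iteration for this example, with the gradient and Hessian coded by hand:

grad=function(x){c(2*x[1]+x[2]-3,x[1]+2*x[2]-3)}
H=matrix(c(2,1,1,2),nrow=2)
x=c(0,0)
for(n in 1:5)
{x=x-solve(H,grad(x))} # Newton step: solve(H,g) computes H^{-1}g
x # (1,1) after the first iteration, since f is quadratic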
Multivariate Unconstrained Problems: Approximate Methods

In this case, the optimum has been reached in a single iteration – because the function is quadratic. In many other cases, use of Newton's method can be problematic.

Example
min f(x1,x2) = (x1-1)⁴ + (x2-2)⁴
Clearly, the exact optimum is at (1,2), but use Newton.

H = | 12(x1-1)²  0         |   H⁻¹ = | 1/(12(x1-1)²)  0             |
    | 0          12(x2-2)² |         | 0              1/(12(x2-2)²) |

Note that H⁻¹ exists for x1≠1, x2≠2.

∇f = | 4(x1-1)³ |
     | 4(x2-2)³ |

Newton gives x(n+1) = x(n) - H⁻¹∇f, i.e.

x1(n+1) = x1(n) - (x1(n)-1)/3 = (1 + 2x1(n))/3
x2(n+1) = x2(n) - (x2(n)-2)/3 = (2 + 2x2(n))/3

Starting from (0,0) obtain a sequence of points:
Multivariate Unconstrained Problems: Approximate Methods

Iteration       1      2            3            4            5
(x1(n), x2(n))  (0,0)  (0.33,0.67)  (0.56,1.11)  (0.70,1.41)  (0.80,1.61)

Iteration       6            7            8            9
(x1(n), x2(n))  (0.87,1.74)  (0.91,1.82)  (0.94,1.88)  (0.96,1.92)

We see that the iteration is converging, but rather too slowly.

If we had naively used the starting point (1,2), the singularity of H would have meant we couldn't run the algorithm at all.

Multivariate Unconstrained Problems: Approximate Methods

Example
Apply Newton to the function f(x,y) = sin(x²/2 - y²/4)cos(2x - e^y). Run the algorithm a number of times with starting points around (1.5,0.5):

Start point   End point           f at end point
(1.4,0.4)     (0.0407,-2.5073)    -1.0000
(1.4,0.5)     (0.1180,3.3447)      0.3403
(1.4,0.6)     (-1.5532,6.0200)    -1.0000
(1.5,0.4)     (2.8371,5.3540)      0.0000
(1.5,0.5)     (0.0407,-2.5073)    -1.0000
(1.5,0.6)     (0.0000,0.0000)      0.0000
(1.6,0.4)     (-0.5584,-0.7897)    0.0000
(1.6,0.5)     (-0.2902,-0.2305)    0.0056
(1.6,0.6)     (-1.5529,-3.3326)    1.0000
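A sketch of such a multistart experiment in R, using nlm (a general Newton-type minimiser) rather than the hand-coded Newton iteration, so the end points will not match the table exactly:

f=function(p){sin(p[1]^2/2-p[2]^2/4)*cos(2*p[1]-exp(p[2]))}
starts=expand.grid(x=c(1.4,1.5,1.6),y=c(0.4,0.5,0.6))
for(i in 1:nrow(starts))
{res=nlm(f,as.numeric(starts[i,]))
cat(as.numeric(starts[i,]),"->",res$estimate,"f =",res$minimum,"\n")}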

Multivariate Unconstrained Problems: Approximate Methods

The process is clearly unstable, although Newton has been able to find the true maximum and minimum (1 and -1 respectively) for some starting values. The method may converge to saddle points, as well as to maxima and minima. Further, unless very close to a local optimum, it is possible to move in unexpected directions.
The algorithm did indeed converge each time, but to very different destinations, despite the starting points being quite close. We are no closer to finding optima!
This example, and earlier examples, are designed to make the point that Newton's Method suffers serious potential defects and is not a viable practical tool.

Summarising, the main advantage is that convergence may be very rapid in some cases. But:
1. It needs considerable computing power (e.g. storing and evaluating H⁻¹ for large n)
2. May converge to local, not global optima
3. May wander badly, converge slowly or diverge
4. Assumes f twice differentiable, and that derivatives have an explicit analytic form.
5. Not always robust; sensitive to the initial estimate.
6. Assumes H invertible and well conditioned.
7. Can sometimes encounter a stationary iteration point or an infinite cycle.
8. Discarding quadratic terms may have an adverse effect on convergence.
Multivariate Unconstrained Problems: Approximate Methods

Quasi-Newton Methods
Methods have been developed aiming to sidestep some of the difficulties of Newton's method, in particular the need to evaluate all second derivatives and invert H at each iteration, possible near-singularity of H and convergence problems.

To find a local optimum, Newton uses
x(n+1) = x(n) - H⁻¹∇f
Quasi-Newton methods use the more general formulation
x(n+1) = x(n) - αn+1Hn∇f
where αn+1 is a 'step length', usually found by univariate search, and {Hn} is a sequence of matrices, typically starting with H0 = I.
So the usual Newton method is the special case where αn = 1 for all n and Hn = H⁻¹ for all n.

There are many variants of quasi-Newton: the subject of considerable research. One (less common) approach takes Hn = (H+λnI)⁻¹ for suitable constants λn.

Three further choices are the popular method of steepest descent, the DFP (Davidon, Fletcher, Powell, 1959-1963) method and the BFGS (Broyden, Fletcher, Goldfarb, Shanno, 1970) algorithm. The algorithms are clearly meant for computer implementation rather than manual calculation.

We describe steepest descent. The (non-examinable) DFP algorithm is given as a supplementary file on Moodle. More can be found in Bazaraa, Sherali and Shetty. See also
http://www.mpri.lsu.edu/textbook/Chapter6-a.htm#quasi

Multivariate Unconstrained Problems: Approximate Methods

Steepest Descent
A very useful method for unconstrained optimisation. This uses the iterative scheme
x(n+1) = x(n) + αn+1∇f(x(n))
[The signs of ∇f(x(n)) and αn+1 determine the direction of movement from one solution to the next.]
We can think of the method as quasi-Newton with Hn = I for all n, although it is not usually classified as such.
Alternatively, more intuitively, this is an example of a generic algorithm:

Step 1: Initialise at a point x(0).
Step 2: Find a direction of movement away from the current solution so as to improve the value of f(x).
Step 3: Determine how far to move in this direction, i.e. find a suitable step size.
Step 4: Repeat steps 2 and 3 until no further improvement can be made, or until the improvement is less than a specified tolerance.
On termination, the current solution is taken as the optimum.

Since the direction of maximum rate of decrease (or increase) of a function at a point is along the gradient vector, we move in that direction at step 2. The step length tells us how far to go.
As always, a good starting point speeds the algorithm. A more formal statement follows:
Multivariate Unconstrained Problems: Approximate Methods

Step 1: Select initial solution x(0). Set n = 0.
Step 2: Evaluate ∇f(x) at x(n).
Step 3: Choose αn+1 to minimise (or maximise)
f(x(n+1)) = f(x(n) + αn+1∇f(x(n))) by univariate search.
Step 4: Obtain new solution x(n+1) = x(n) + αn+1∇f(x(n)).
If |xj(n+1) - xj(n)| < ε (small, specified) for all components xj, set x* = x(n+1) as optimum and stop; otherwise set n→n+1 and return to Step 2.

Multivariate Unconstrained Problems: Approximate Methods

Example
min f(x1,x2) = x1² + x1x2 + x2² - 3x1 - 3x2 + 3
(we know the true minimum is at (1,1)).
Take starting solution x(0) = (x1(0), x2(0)) = (1,0)^T.

Search direction ∇f(x) = | 2x1 + x2 - 3 | = | -1 |
                         | x1 + 2x2 - 3 |   | -2 |

Choose step size α1 to minimise
f(x(1)) = f(x(0) + α1∇f(x(0)))
        = f((1,0) + α1(-1,-2))
        = f(1-α1, -2α1)
        = (1-α1)² + (1-α1)(-2α1) + (-2α1)² - 3(1-α1) - 3(-2α1) + 3
        = 7α1² + 5α1 + 1

Multivariate Unconstrained Problems: Approximate Methods

To minimise f, differentiate to find α1 = -5/14. (May need Golden Section Search or similar in more complex problems.)
Now x(1) = x(0) + α1∇f(x(0)) = (1,0) - 5/14(-1,-2) = (19/14, 5/7).

From x(1), the search direction is ∇f(x) = | 2x1 + x2 - 3 | = | 3/7   |
                                           | x1 + 2x2 - 3 |   | -3/14 |

Step size α2 should minimise
f(x(2)) = f(x(1) + α2∇f(x(1)))
        = f((19/14, 5/7) + α2(3/7, -3/14))
        = f(19/14 + 3/7α2, 5/7 - 3/14α2).

After simplification and differentiation (a simple if unpleasant task) we find α2 = -5/6.
Then x(2) = x(1) + α2∇f(x(1)) = (19/14, 5/7) - 5/6(3/7, -3/14) = (1, 25/28)
Can repeat; we find a sequence of points converging to the true optimum (1,1). Can check using convexity that this point is indeed the true minimum.

Example
For f(x1,x2) = (x1-2)⁴ + (x1-2x2)² with starting point (0,3), we find αn+1 is the root of a cubic equation, so use a numerical method at each iteration.
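A sketch of the procedure in R for this example, using optimize() for the univariate line search (the search interval (0,1) and the iteration count are arbitrary choices; we step along -∇f with a positive α, equivalent to the signed-α convention used above):

f=function(x){(x[1]-2)^4+(x[1]-2*x[2])^2}
grad=function(x){c(4*(x[1]-2)^3+2*(x[1]-2*x[2]),-4*(x[1]-2*x[2]))}
x=c(0,3)
for(n in 1:60)
{g=grad(x)
alpha=optimize(function(a) f(x-a*g),c(0,1))$minimum
x=x-alpha*g}
x # approaches (2,1)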
Multivariate Unconstrained Problems: Approximate Methods

Obtain a sequence of points:

n   x(n)         ∇f(x(n))       αn+1
0   (0,3)        (44,24)        0.06
1   (2.70,1.51)  (-0.73,-1.28)  0.24
2   (2.52,1.20)  (-0.80,0.48)   0.11
3   (2.43,1.25)  (-0.18,-0.28)  0.31
4   (2.37,1.16)  (-0.30,0.20)   0.12
…   …            …              …
55  (2.00,1.00)  (0.00,0.00)    0.00

After a couple of iterations, the algorithm follows a long narrow valley, taking small steps and needing many iterations. This phenomenon, called zigzagging, causes slow convergence and is a major difficulty.

The advantages of the method of steepest descent are
• Conceptually simple
• Will converge in most cases

Multivariate Unconstrained Problems: Approximate Methods

Disadvantages:
• Tricky computing, including univariate search
• May converge to a local optimum
• Zigzagging
• Often slow convergence compared to alternatives
• Assumes differentiability

Since DFP usually has faster convergence, steepest descent is not often recommended except for 'well-conditioned' problems.

Note that steepest ascent (maximisation) is handled almost identically to steepest descent (minimisation).

The first, second and fifth objections above are in common with many other methods, so researchers have tried to find methods to tackle the third and fourth.
Options include:
• Changing the step size. Use of
x(n+1) = x(n) + 0.9αn+1∇f(x(n)) instead of
x(n+1) = x(n) + αn+1∇f(x(n)) has been suggested.
• Modifying the direction. Could use
x(n+1) = x(n) + αn+1{½(∇f(x(n)) + ∇f(x(n-1)))} in place of
x(n+1) = x(n) + αn+1∇f(x(n)).
6.6 Multivariate Equality Constrained Problems

We present three approaches: two naive methods and one method of wide applicability. Of course, there are many other approaches, both exact and approximate. One such is the Generalised Reduced Gradient Method; see Bazaraa, Sherali and Shetty for an extensive survey (or the Moodle miscellaneous section).
We illustrate using the NLP:
min f0(x) = x² + y²
s.t. f1(x) = 3x + 2y = 13
(Chosen to avoid unnecessary algebra)

Naïve Methods
The simplest approach of all is to give a sketch. Obviously, this is only possible in two (or maybe three) variables and is approximate.

Multivariate Equality Constrained Problems

Consider the above NLP represented graphically:

[Plot: circular contours of x² + y² with the line 3x + 2y = 13; the smallest contour meeting the line touches it near (3,2)]

Quite easy to see that the minimum of f0(x) is attained approximately at (3,2), with value (3² + 2²) = 13.
Of course, we can 'zoom in' to improve accuracy.
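Rather than zooming in, a quick numerical cross-check in R, substituting the constraint into f0 (anticipating the method on the next slide; the search interval is chosen by inspection):

optimize(function(x) x^2+((13-3*x)/2)^2,c(0,5)) # minimum at x = 3, value 13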

Multivariate Equality Constrained Problems

Secondly, we can sometimes substitute in f0, using the constraints, to obtain the reduced objective function. Again of limited applicability, mainly when there are a small number of linear constraints, but the method is not to be despised; it can be useful.

Using the above example, the constraint gives y = (13-3x)/2
So f0(x) = x² + (13-3x)²/4 = 13/4 x² - 39/2 x + 169/4 (called the reduced objective function because there is now only one variable).
Differentiating and setting the derivative to 0 gives x = 3, y = 2 as before. Use of the second derivative shows (3,2) is indeed the minimum, solving the NLP with value 13.

As a further example, suppose we seek the global maximum and minimum of xyz when x + 2y + 3z = 1 and 3x + 2y + z = 1.
Use of the Lagrangian (to be described) is awkward here, so we find the reduced objective function.
The two constraints lead to 2x - 2z = 0 (subtracting), so x = z.
Therefore 4z + 2y = 1 and x = z = ¼(1-2y).

Form the reduced objective function
Φ(y) = ¼(1-2y)·y·¼(1-2y) = 1/16 y(1-2y)².
Differentiating, Φ'(y) = 0 when (1-2y)² - 4y(1-2y) = 0, leading to y = 1/2 or 1/6.
Multivariate Equality Constrained Problems

Easy to see we obtain two points: (0,½,0) with value 0, and (1/6, 1/6, 1/6) with value 1/216.
So we have solved the NLP? No. The former is a local minimum, the latter a local maximum. Neither is a global optimum.
We can see this by considering (x,y,z) = (¼(1-2y), y, ¼(1-2y)).
If y = N (large and positive), xyz is large and positive.
If y = -N (large and negative), xyz is large and negative.
So Φ(y) → ±∞ as y → ±∞, so the NLP is solvable neither as a minimisation nor as a maximisation.

Lagrange Multipliers
A powerful, general method for equality constrained NLPs capable of being solved exactly, with many further applications – for example, the method extends to NLPs with inequality constraints. Detailed justifications are omitted, although an outline is provided in the Moodle miscellaneous section.
A more thorough analysis can be found in many books covering NLP, or indeed in many calculus texts.

Multivariate Equality Constrained Problems

Example
max x³y s.t. x² + 4y² = 5
Let L = x³y + λ(x² + 4y² - 5) (L is the Lagrangian)
Solve Lx = 3x²y + 2λx = 0 ……(i)
      Ly = x³ + 8λy = 0 ……(ii)
      Lλ = x² + 4y² - 5 = 0 ……(iii)
We can assume x>0, y>0 (e.g. x=y=±1 satisfies the constraint with x³y=1>0) and in particular x ≠ 0, y ≠ 0.
Now (i) gives 3xy + 2λ = 0, so λ = -3xy/2
(ii) gives λ = -x³/8y
So -3xy/2 = -x³/8y, which leads to x² = 12y²
Substituting in (iii) gives 16y² = 5, so y = √5/4 and x = √12·y = √15/2, with value x³y = (75/32)√3.
(And corresponding negative solutions.)

Example
Consider the earlier problem: min x² + y² s.t. 3x + 2y = 13.
Now L = x² + y² + λ(3x + 2y - 13).
We solve Lx = Ly = Lλ = 0, giving
2x + 3λ = 2y + 2λ = 3x + 2y - 13 = 0
leading to x = -3λ/2, y = -λ
and finally λ = -2, x = 3, y = 2, happily agreeing with earlier work.
A little more would be needed to confirm we have a minimum rather than a maximum.
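A numerical check of the first example, using the parametrisation x = √5 cos t, y = (√5/2) sin t, which satisfies the constraint automatically (this parametrisation is our own device, not part of the slides):

g=function(t){(sqrt(5)*cos(t))^3*(sqrt(5)/2)*sin(t)}
optimize(g,c(0,pi/2),maximum=TRUE) # maximum approximately 4.0595 = (75/32)*sqrt(3), at t = pi/6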
Multivariate Equality Constrained Problems

Example – Inventory Control
At regular intervals, a firm orders a quantity x of a commodity which is placed in stock. The stock is depleted at a constant rate until none remains, whereupon the firm immediately restocks with x.

[Plot: stock level against time, a repeating sawtooth falling from x to 0]

The firm requires X units of the commodity each year and, on average, the firm reorders with a frequency of y times a year. If the requirement is to be met, it is therefore necessary that xy = X.
The cost of holding one unit of the commodity in stock for a year is d. Since the average amount held in stock is ½x, the yearly holding cost is ½dx.
The cost of reordering is e. The yearly reordering cost is therefore ey.
The firm faces the problem of minimising cost C(x,y) = ½dx + ey subject to xy = X.

The Lagrangian is L = ½dx + ey + λ(xy - X).
Lx = ½d + λy = 0
Ly = e + λx = 0
Lλ = xy - X = 0.
The first two equations yield ½ed = λ²xy = λ²X
Hence λ⁻¹ = ±√(2X/ed).
Solutions are x = -e/λ = √(2eX/d) and y = -d/2λ = √(dX/2e)

The firm should therefore order (2eXd⁻¹)^½ of the commodity (dX(2e)⁻¹)^½ times a year. The yearly cost will be (2eXd)^½.
There are many more complex inventory models – shortages allowed, varying depletion rates etc. Lagrange multipliers are a key solution tool.
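For illustration, the optimal policy with some made-up numbers (the values of d, e and X below are hypothetical, not from the slides):

d=2; e=25; X=1000 # holding cost, reorder cost, annual requirement
x=sqrt(2*e*X/d) # order quantity, approximately 158.1
y=sqrt(d*X/(2*e)) # orders per year, approximately 6.3
c(x*y, 0.5*d*x+e*y) # xy recovers X = 1000; yearly cost equals sqrt(2*e*d*X)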

Multivariate Equality Constrained Problems

In general, it is possible to classify stationary points for constrained problems, as for unconstrained problems. But we cannot say that all solutions of the Lagrange conditions will optimise the objective function, or will even be stationary points. This is the problem of finding sufficient conditions for an optimum, as we have mentioned before.
If there is an optimum, it will arise from solution of the necessary Lagrange conditions. But there may be Lagrange points that do not yield the optimum.
If there are no Lagrange points, we can deduce there is no optimum. If there is a unique solution to the Lagrange equations, and there is an optimum, we must have located it.

Chapter 6. Non-Linear Programming

6.1 Introduction and Examples
6.2 Convexity and Concavity
6.3 Univariate Problems
6.4 Multivariate Unconstrained Problems: Exact Methods
6.5 Multivariate Unconstrained Problems: Approximate Methods
6.6 Multivariate Equality Constrained Problems
6.7 Multivariate Inequality Constrained Problems
6.8 Penalty Functions

(This reflects some changes that were recently put into effect. Univariate exact problems and quadratic programming have been relegated to the miscellaneous section of Moodle.)
6.7 Multivariate Inequality Constrained Problems

The most general type of NLP. Consider a problem of the form
min f0(x) for x ε Rn
subject to fi(x) ≤ 0 (i=1,2,…,m).
Fairly clear that any real programming problem can be expressed this way. Of course, many will be maximisations or have different types of constraints. The form is convenient, and it is quite important to be consistent here in view of subsequent work.
Kuhn-Tucker is the main tool, enabling exact solutions in some cases, and underlying theory in others. First, briefly consider a couple of other approaches.
As in other areas, we can again use a plot – only with 2 or 3 variables. Alternatively, observe that at the optimum, each constraint either holds as an equality, or is not binding – in which case, it can be ignored.

The following method is therefore suggested:
• If there are m constraints, solve 2^m subproblems: ignoring all constraints; ignoring all but one constraint, which is taken as an equality; ignoring all but two constraints, which are taken as equalities; and so on.
• Any solutions found which satisfy all current constraints are candidates for the optimum; any solutions which violate one or more constraints can be discarded.
Naturally, the method is only viable when m is quite small, even if modifications can reduce the labour somewhat.

Multivariate Inequality Constrained Problems

Example
Minimise (x1-4)² + (x2-4)²
s.t. x1 + x2 ≤ 4
     x1 + 3x2 ≤ 9

Problem  Constraints              Optimum    Value
1        None                     (4,4)      0     Both constraints violated
2        x1 + x2 = 4              (2,2)      8     Both constraints satisfied
3        x1 + 3x2 = 9             (3.3,1.9)  4.9   First constraint violated
4        x1 + x2 = 4,             (1.5,2.5)  8.5   Both constraints satisfied
         x1 + 3x2 = 9

Optimum is (2,2) from the second subproblem.

The procedure only runs in (worse than?) exponential time – for the number of subproblems, and for solving the subproblems themselves. Modifying Lagrange multipliers is more promising in practice.

Karush-Kuhn-Tucker Conditions (or KT, or KKT etc.)
Let fi(x) (i=0,1,2,…,m) be differentiable functions satisfying certain regularity conditions. Given the NLP
min f0(x) for x ε Rn
subject to fi(x) ≤ 0 (i=1,2,…,m)
form the Lagrangian
L = f0(x) + Σ ui fi(x)   (sum over i = 1,2,…,m)
and let u = (u1, u2,…, um), where the u's are unknown constants.
A point (x*,u*) satisfies the KKT conditions, and is called a KKT (or KT) point, if the following conditions hold:
Multivariate Inequality Constrained Problems

a) ∂L/∂xj = 0 (j=1,2,…,n) (Gradient Conditions)
b) uifi(x) = 0 (i=1,2,…,m) (Orthogonality Conditions)
c) fi(x) ≤ 0 (i=1,2,…,m) (Feasibility Conditions)
d) ui ≥ 0 (i=1,2,…,m) (Non-negativity Conditions)

A necessary condition for x* to be an optimum for the NLP is that all KKT conditions are satisfied.

Note the importance of formulating the NLP precisely as described in the previous slide (a common source of error!)

So we can solve the algebraic equations and inequalities for these four sets of conditions, and examine solutions to see whether a global minimum is achieved. [The usual problem arises; sufficiency conditions are much harder to obtain.]
A few comments on the KKT conditions:
• A full proof is necessarily tricky. We refer 'the interested reader' to Bazaraa, Sherali and Shetty. Omitted in this course.
• The KKT conditions generalise Lagrange multipliers to include inequality constraints.
• The regularity conditions are needed for mathematical precision. However, when practitioners apply the conditions (and in this course) they are not routinely checked.
• Regularity conditions can be stated in a number of forms. One of the simpler forms requires that the gradients of the active (i.e. binding) inequality constraints are linearly independent at x*.

Multivariate Inequality Constrained Problems

Example (considered earlier)
Minimise (x1-4)² + (x2-4)²
s.t. x1 + x2 ≤ 4
     x1 + 3x2 ≤ 9

L = (x1-4)² + (x2-4)² + u1(x1+x2-4) + u2(x1+3x2-9)

2(x1-4)+u1+u2=0      (Gradient) ……(a)
2(x2-4)+u1+3u2=0     (Gradient) ……(b)
u1(x1+x2-4)=0        (Orthogonality) ……(c)
u2(x1+3x2-9)=0       (Orthogonality) ……(d)
x1 + x2 ≤ 4          (Feasibility) ……(e)
x1 + 3x2 ≤ 9         (Feasibility) ……(f)
u1, u2 ≥ 0           (Non-negativity) ……(g)

A sensible strategy is to start with orthogonality and split the problem into subcases. So consider (c) and (d).

CASE 1
x1 + x2 = 4 and x1 + 3x2 = 9
Then x1 = 1.5, x2 = 2.5
Now (a) and (b) give u1 + u2 = 5, u1 + 3u2 = 3
Then u1 = 6, u2 = -1, violating (g).

CASE 2
x1 + x2 = 4 and u2 = 0
Now (a) and (b) give 2(x1-4)+u1 = 2(x2-4)+u1 = 0
So x1 = x2 = 2, u1 = 4, u2 = 0.
All conditions hold (check); we have a KT point.
Multivariate Inequality Constrained Problems

CASE 3
x1 + 3x2 = 9, u1 = 0
Now (a) and (b) give 2(x1-4)+u2 = 2(x2-4)+3u2 = 0
Subtracting the second equation from thrice the first:
6x1-24-2x2+8 = 0, or 3x1 - x2 = 8
Solving with x1 + 3x2 = 9, we find x1 = 3.3, x2 = 1.9
But then condition (e) is violated.

CASE 4
u1 = u2 = 0
Here, the gradient conditions (a) and (b) give x1 = x2 = 4, and (e) is violated.

The KKT conditions lead to a unique solution (2,2) with u = (4,0). The problem obviously has a global minimum (the objective function is non-negative); we have found it, with value 8.

We see that solving the equations and inequalities can be quite tedious with more than a few variables and constraints. Further, we still lack sufficiency conditions. However, in a few cases they can be given. In particular, if fi is convex for each i, then any KKT point is a global minimum. Otherwise, in the absence of other methods, we must examine each KKT point for optimality.

A further example of the use of KKT conditions is now given. It is important that you are able to solve problems of this type yourself.
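First, though, the example just solved can be cross-checked numerically with R's constrOptim, which handles linear inequality constraints written as ui %*% x >= ci (a sketch; the interior starting point (0.5,0.5) is an arbitrary choice):

f=function(x){(x[1]-4)^2+(x[2]-4)^2}
gr=function(x){c(2*(x[1]-4),2*(x[2]-4))}
ui=rbind(c(-1,-1),c(-1,-3)) # encodes x1+x2 <= 4 and x1+3x2 <= 9
ci=c(-4,-9)
constrOptim(c(0.5,0.5),f,gr,ui=ui,ci=ci)$par # approximately (2,2)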

Multivariate Inequality Constrained Problems

Example
min f0(x1,x2) = -3x1 + ½x2²
s.t. x1² + x2² ≤ 1
     x1, x2 ≥ 0

First, write in the correct form:
min f0(x1,x2) = -3x1 + ½x2²
s.t. x1² + x2² - 1 ≤ 0
     -x1 ≤ 0
     -x2 ≤ 0

Now form the Lagrangian:
L = -3x1 + ½x2² + u1(x1²+x2²-1) + u2(-x1) + u3(-x2)
  = -3x1 + ½x2² + u1(x1²+x2²-1) - u2x1 - u3x2

KKT conditions:
-3+2u1x1-u2 = 0                  (gradient)
x2+2u1x2-u3 = 0                  (gradient)
x1² + x2² ≤ 1                    (feasibility)
x1, x2 ≥ 0                       (feasibility)
u1(x1²+x2²-1) = u2x1 = u3x2 = 0  (orthogonality)
u1, u2, u3 ≥ 0                   (non-negativity)
Multivariate Inequality Constrained Problems

Consider u1(x1² + x2² - 1) = 0 (orthogonality).
If x1² + x2² - 1 ≠ 0 then u1 = 0, so u2 = -3 (gradient), contradicting non-negativity.
Therefore x1² + x2² - 1 = 0 ……(a)
Consider u2x1 = 0 (orthogonality).
If u2 ≠ 0 then x1 = 0, so u2 = -3 again (gradient), contradicting non-negativity.
Therefore u2 = 0, x1 ≠ 0 ……(b)
Also u3x2 = 0 (orthogonality).
If x2 ≠ 0 then u3 = 0, so x2(1+2u1) = 0 (gradient). We have assumed x2 ≠ 0, so u1 = -½, contradicting non-negativity, and we must have
x2 = 0 ……(c)

Now (a) and (c) give x1 = ±1, but by feasibility x1 ≥ 0, so x1 = 1. Also (b) and the gradient conditions tell us that u1 = 1.5 and u3 = 0.
Hence x=(1,0) and u=(1.5,0,0) satisfy the KKT conditions. But does this solve the NLP?
If we knew that f0 had a minimum, we could infer that it was at (1,0), but this is not immediate.

So consider the Hessian matrix of f0:

| 0 0 |
| 0 1 |

Principal minors 0, 0, 1 are all non-negative, so f0 is convex. A similar Hessian argument shows x1² + x2² - 1 is convex. x1 and x2 are linear, so convex.
Therefore all functions are convex. As remarked earlier, any KKT point is then a global minimum. So (1,0) solves the NLP.

Multivariate Inequality Constrained Problems

Software
There are many packages available for NLP problems. They use a number of different algorithms, some described in this course, some not. It is crucial to have the correct algorithm, especially if the NLP:
• Is large (more than about 20 variables)
• Is complex, with non-standard functions
• Needs to be run many times.
For example, the method of Nelder and Mead has the large advantage of not requiring differentiability, but it can run very slowly.

R has the function optim, which is a compendium of several methods. It may perform well, but it is important to choose the best method from the ones available (which include Nelder-Mead and BFGS) rather than use the default unthinkingly. In addition, correspondence on message boards suggests the function has some difficulties. I do not know whether the difficulties addressed have recently been resolved.
Excel Solver is very good for small problems, but struggles as the number of variables and constraints increases.
Generally, in view of the difficult nature of NLP, it seems wise to spend some time researching the options before using any 'in anger'.
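As a minimal illustration of optim on an earlier unconstrained example, with the method chosen explicitly rather than left at the default:

f=function(x){(x[1]-2)^4+(x[1]-2*x[2])^2}
optim(c(0,3),f,method="BFGS")$par # approximately (2,1)
optim(c(0,3),f,method="Nelder-Mead")$par # compare the two methods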
6.8 Penalty Functions

Introduction
This is a method for solving constrained NLPs with equality constraints, with inequality constraints, or with both. The idea is to convert a constrained problem into an unconstrained problem. In practice, because of computational issues, a sequence of unconstrained subproblems is usually solved.
Two broad approaches.
Exterior point methods include a penalty term, added to the objective function for any constraints violated. The penalty is applied for infeasibility; we force the solution towards feasibility and the subsequent optimum. This gives a sequence of infeasible points converging to an optimum solution to the original problem.

Example:
Minimise x1² + x2²
s.t. x1 + x2 = 1

We can use an (exterior) penalty function, such as
F(x1, x2, M) = x1² + x2² + M(x1 + x2 - 1)²
Here, F is a modified, unconstrained objective function, M is the penalty parameter and M(x1 + x2 - 1)² comprises the penalty term. Points for which x1 + x2 ≠ 1 are 'penalised'. Typically, we might seek a sequence of infeasible points, starting at (1,1) say, which converges to the optimum. Generally M is large – we sometimes consider M → ∞.

Penalty Functions

The other approach is to use barrier (or interior) methods with a 'barrier' function that prevents points leaving the feasible region. We obtain a sequence of feasible points converging to an optimum solution to the original problem.

Example: Minimise x1² + x2²
s.t. x1 + x2 ≤ 1

We can use a barrier function, such as
Q(x1, x2, ε) = x1² + x2² - εln(1 - x1 - x2)
Feasible points (x1, x2) are 'discouraged' from straying too close to x1 + x2 = 1, and cannot cross the barrier. Generally ε is small with barrier methods – we sometimes consider ε → 0.

Penalties or barriers?
• Some prefer barrier methods because even if they do not converge, you will still have a feasible sub-optimal solution.
• Again, barrier methods typically require fewer function evaluations, so may run faster.
• Penalty methods handle equality constraints well. Barrier methods need awkward modifications.
• Barrier methods require a feasible starting point. Finding such a point may be problematic.
• Both normally converge at least to a local minimum, and work well even in the presence of cusps and other anomalies.
Most practitioners probably prefer exterior penalty functions on balance, although both methods are viable.
Penalty Functions

We shall concentrate in this brief introduction on exterior penalty functions.
In general, we can consider an NLP of the form
min f(x)
s.t. gi(x) ≤ 0 (i=1,2,…,m)
     hj(x) = 0 (j=1,2,…,l)
Notation is slightly changed. Both equality and inequality constraints are considered, but are treated rather differently.
We show how to solve the NLP by an exterior penalty method. The idea is to write the constrained problem described above as
min F = f(x) + MΣpi(x)
where pi is the penalty function, M is a (usually large) positive constant, and pi(x) > 0 if the ith constraint is violated (i=1,2,…,m+l) and pi(x) = 0 otherwise.
The idea is to draw solutions into the feasible region by 'penalising' values for which a constraint fails to hold. An optimum solution must have pi(x) ≈ 0 for each i, to avoid a large penalty, and indeed pi(x) → 0 as M → ∞. But how should pi and M be chosen?

For equality constraints hj(x) = 0, it is convenient to choose pi(x) = [hj(x)]², so that pi(x) = 0 iff hj(x) = 0; otherwise pi(x) > 0.
For inequality constraints gi(x) ≤ 0, we can choose pi(x) = [max{gi(x),0}]². Then pi(x) = 0 iff gi(x) ≤ 0; otherwise pi(x) > 0.

Penalty Functions

Could also let pi(x) = max{gi(x),0}. But the choice we have made helps ensure differentiability of the unconstrained function. [Consider the case gi(x) = x.]
Alternative choices of penalty can be used.
Minimisation is often difficult with very large M, because of convergence problems and round-off errors. So we usually solve a sequence of minimisation problems for successively higher M, starting each stage with the optimum from the previous stage. [Even if the initial functions are well-behaved, F is generally not well-behaved.]

First, we give an example of one of the rare cases where penalty functions yield an exact solution; usually a sequence of approximations is used, based on Steepest Descent (or similar) for the unconstrained problem.

Example 1 (External)
min (x1-1)² + (x2-1)²
s.t. x1 + x2 = 1.

Consider min F = (x1-1)² + (x2-1)² + M(x1+x2-1)²  [M>0 is large]
Fx1 = 2(x1-1) + 2M(x1+x2-1)
Fx2 = 2(x2-1) + 2M(x1+x2-1)
At stationary points, both partial derivatives are zero, so x1 = x2.
Penalty Functions

Therefore 2(x1-1) + 2M(2x1-1) = 0
Solving, x1 = x2 = (1+M)/(1+2M) (can check this is a minimum)
As M → ∞, x1 → ½, x2 → ½, F → ½, the true optimum.

Example 2 (External)
min f(x1,x2) = -x1x2
s.t. x1 + x2² - 1 ≤ 0
     -x1 - x2 ≤ 0
We proceed iteratively with starting point x(1) = (1,-2) using Steepest Descent.
F = -x1x2 + M(max[0, x1 + x2² - 1]² + max[0, -x1 - x2]²)

With M=1, an unconstrained multivariate optimisation gives solution x(2) = (0.889,0.667), f(x(2)) = -0.592.
Use this value as starting point with M=10.
A further unconstrained multivariate optimisation gives solution x(3) = (0.686,0.586), f(x(3)) = -0.402.
Use this value as starting point with M=100.
A further unconstrained multivariate optimisation gives solution x(4) = (0.668,0.578), f(x(4)) = -0.386.
Use this value as starting point with M=1000.
A further unconstrained multivariate optimisation gives solution x(5) = (0.667,0.557), f(x(5)) = -0.385.

We may decide to stop at this point. The true optimum is indeed -0.385 at (0.667,0.556) to 3 d.p. (Note we have proceeded through infeasible points.)
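A sketch of this sequential scheme in R, using optim for each unconstrained subproblem in place of hand-coded Steepest Descent (so the iterates will be close to, but not identical with, those above):

F=function(x,M){-x[1]*x[2]+M*(max(0,x[1]+x[2]^2-1)^2+max(0,-x[1]-x[2])^2)}
start=c(1,-2)
for(M in c(1,10,100,1000))
{start=optim(start,F,M=M)$par} # warm-start each subproblem from the last
start # approximately (0.667,0.556)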

Penalty Functions

Example 3 (External)
This illustrates the importance of using successively larger values of M, for a maximisation.
max 4x1 - x1² + 9x2 - x2² + 10x3 - 2x3² - ½x2x3
s.t. 4x1 + 2x2 + x3 ≤ 10
     2x1 + 4x2 + x3 ≤ 20
     x1, x2, x3 ≥ 0

Consider max F = 4x1 - x1² + 9x2 - x2² + 10x3 - 2x3² - ½x2x3
- M({max[0, 4x1+2x2+x3-10]}² + {max[0, 2x1+4x2+x3-20]}² + {max[0,-x1]}² + {max[0,-x2]}² + {max[0,-x3]}²)

F is not well-behaved. Obtain the sequence:

n  M       x1(n)   x2(n)   x3(n)   F
1  1       0.4856  3.2671  1.9021  28.9710
2  10      0.4179  3.2349  1.8981  28.8362
3  100     0.4111  3.2309  1.8977  28.8221
4  1000    0.4103  3.2308  1.8974  28.8206
5  10000   0.4103  3.2308  1.8974  28.8205
6  100000  0.4102  3.2308  1.8974  28.8205
True Optimum   0.4102  3.2308  1.8975  28.8205

Using steepest descent, with each subproblem giving the starting point for the next, a total of 300 iterations were found empirically to be needed. Starting directly with M=100000, nearly 8000 were needed.
Penalty Functions

Example 4 (Barrier)
Consider the problem min x1² + x2²
s.t. 1 - x1 - x2 ≤ 0.
We use a logarithmic barrier function. This is another rare case allowing an exact solution.
Let B = x1² + x2² - εln(x1 + x2 - 1)
For an optimum of B to occur, we require ∂B/∂x1 = ∂B/∂x2 = 0, leading to
2x1 - ε/(x1 + x2 - 1) = 2x2 - ε/(x1 + x2 - 1) = 0

Solving, we find x1 = x2 and x1 = (2 + √(4 + 16ε))/8 = ¼(1 + √(1 + 4ε))
Letting ε → 0, we obtain the solution (x1, x2) = (½, ½), since the alternative solution (0,0) clearly violates the constraint.

Example 5 (Barrier)
Consider the problem min x1³ + x2⁵
s.t. 1 - x1 - x2 ≤ 0.
The exact methods used in Example 4 can no longer simply be employed, in light of the complicated algebra.
We could use the modified barrier function
B = x1³ + x2⁵ - ε(k)ln(x1 + x2 - 1)
with a sequence {ε(k)} starting at ε(1) = 1 and with ε(k) → 0 as k → ∞, where each solution is the starting point of the next problem.
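A sketch of this sequential barrier scheme in R, applied for simplicity to Example 4 (the ε sequence and starting point are arbitrary choices; infeasible points are assigned +Inf so the search cannot cross the barrier):

B=function(x,eps)
{s=x[1]+x[2]-1
if(s<=0) return(Inf) # infeasible: the barrier is impassable
x[1]^2+x[2]^2-eps*log(s)}
start=c(1,1)
for(eps in c(1,0.1,0.01,0.001))
{start=optim(start,B,eps=eps)$par} # each solution starts the next problem
start # approaches (0.5,0.5)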

Penalty Functions

Generally, the use of penalty functions provides a useful extra tool for the solution of NLPs. The biggest problem is potentially slow convergence, due to the awkward nature of the composite functions.
The decision as to which NLP algorithm to use is delicate, depending on the exact nature of the problem, the degree of accuracy required, the available software and so on. Developments over the last 50 years or so provide a range of options.

THE END!!!

https://www.youtube.com/watch?v=0b-tN0LBfOs
