
Analyzing Code with Θ, O and Ω

Owen Kaser
November 2, 2017

Originally written Fall 2012, heavily revised Fall 2016, last revised November 2, 2017.
These notes, prepared for CS2383, try to show how to use Θ and friends to analyze the time complexity of algorithms and code fragments. For an alternative explanation, see the Wiki textbook sections https://en.wikibooks.org/wiki/Data_Structures/Asymptotic_Notation and https://en.wikibooks.org/wiki/Algorithms/Mathematical_Background.

1 Definitions
The Sedgewick book used in 2016 prefers to fix a cost model for an algorithm
(frequently the number of array accesses it makes) and then use a “tilde (∼)
notation” to give a fairly accurate estimate of the cost. It briefly mentions a related approach on pages 206 and 207, then explains why its authors avoided using it. Nevertheless, this related approach is very widely used, so I think you need
to learn it. We expand on the related approach in this document.
In the related approach, our cost model is implicitly the number of primitive
operations executed at runtime, which would be closely related to the run-
ning time: we assume a unit-cost model where each primitive operation takes
the same number of nanoseconds to execute. Because we (thankfully!) write
programs in high level languages rather than by specifying only primitive opera-
tions, we have to make a few guesses about the basic operations that arise from
our high level programs. Fortunately, the details will end up not mattering.
As with the tilde notation, our focus is on the growth rate of the running
time as the amount of input becomes very large.
To describe growth rates, we use “big Oh” notation to compare the running-
time growth rate of an algorithm against the growth rate of a function (e.g., n^2) that we intuitively understand.
In what follows, n is a nonnegative integer and represents the “size” of the
input being processed. Usually, it will be the number of data items in an array or set being processed. However, it could represent the value of a single input
integer or it could even represent the total number of bits required to hold the
input. If you are not told what n is measuring, it is probably supposed to be
clear from the context.

Throughout, f and g are functions that make sense as algorithm running times. Given an input size n, f(n) is the running time, and is thus some nonnegative real number. Since we usually care only about growth rates of functions, the units (e.g., nanoseconds) are usually not relevant. In the math examples,
we sometimes play with functions that can give negative values for small n.
Pretend that they don’t . . .

Big-Oh: We write that f(n) ∈ O(g(n)) if there exists a positive integer n0 and a positive constant c such that

    f(n) ≤ c·g(n), for all n ≥ n0.
“Eventually, f (n) is less than some scaled version of g(n).”
“f ’s growth rate is no larger than g’s growth rate.”
Examples:
• n − 1 ∈ O(n) because n − 1 ≤ 1 · n for every n. (So we could take n0 to
be 1.)
• n + 1 ∈ O(n) because n + 1 ≤ 2n when n is large enough. (I could just
choose n0 = 10, or if I wanted to choose n0 as small as possible, I could
start by solving n + 1 = 2n for n. But this is not required.)
• 2n − 1 ∈ O(n). I can choose c = 3 and n0 = 10.
• n^2 + 1 ∈ O(n^2). I can choose c = 2 and n0 = 10.
• n + 1 ∈ O(n^2) since n + 1 ≤ 1 · n^2 for n ≥ 2.
• n^2 ∉ O(n). Even if I choose c = 1000, it is not true that n^2 ≤ 1000n when
n is big: any value of n > 1000 is problematic.
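If a claimed pair of witnesses feels suspicious, a quick empirical spot-check can build some confidence (it is not a proof; that needs the algebra above). Here is a minimal Java sketch for the 2n − 1 ∈ O(n) example with c = 3 and n0 = 10; the class and variable names are invented for illustration only.

public class BigOhCheck {
    public static void main(String[] args) {
        // Spot-check f(n) <= c*g(n) for f(n) = 2n - 1, g(n) = n, c = 3, n0 = 10.
        // A finite loop cannot prove the bound, but it catches a wrong guess quickly.
        int c = 3, n0 = 10;
        boolean ok = true;
        for (int n = n0; n <= 1_000_000; n++) {
            long f = 2L * n - 1;        // f(n) = 2n - 1
            long g = n;                 // g(n) = n
            if (f > c * g) {            // a violation of f(n) <= c*g(n)
                System.out.println("violated at n = " + n);
                ok = false;
                break;
            }
        }
        System.out.println(ok ? "no violations up to 1,000,000" : "witnesses look wrong");
    }
}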
A conventional abuse of the notation is that we write f(n) = O(g(n)) or say “f(n) is O(g(n))” instead of f(n) ∈ O(g(n)). Note that O(g(n)) is supposed to be an (infinite) set of functions. For instance, O(n^2) = {n^2, n^2 − 1, n^2 + 1, (1/2)·n^2, n, n + 1, n^1.5, . . .}.
Typical uses in CS:
• “The merge-sort algorithm’s average-case running time is O(n log n).” We
are not told exactly how well the merge-sort algorithm will scale up to
handle large amounts of input data, but we are told the growth rate is at
worst “linearithmic”, which is a desirable growth rate.
• “The insertion-sort algorithm’s average-case running time is O(n^2).” Again,
we are not told exactly how well the insertion sort will scale up to handle
large amounts of input data, but we are told the growth rate is at worst
“quadratic”. It might, or might not, be better than that (it could be lin-
earithmic, for instance). But the statement allows that it could actually
have a quadratic growth rate, which is pretty bad.

The person making the statement above could legitimately also have said
“The insertion-sort algorithm’s average-case running time is O(n^3)” without contradicting the first statement, since O(n^2) ⊆ O(n^3). But if they knew it is O(n^2), they are depriving you of useful information by merely telling you it is O(n^3).

Big Omega: In a sense, big-Oh allows us to state upper bounds on the growth
rate of a function. If we want to state a lower bound on a growth rate, we use
big-Omega notation. The definition is almost the same as the big-Oh definition,
except that the direction of the inequality has been reversed.
We write that f(n) ∈ Ω(g(n)) if there exists a positive integer n0 and a positive constant c such that

    f(n) ≥ c·g(n), for all n ≥ n0.
“Eventually, f (n) is more than some scaled version of g(n).”
“f ’s growth rate is no less than g’s growth rate.”
“The insertion-sort algorithm’s average-case running time is Ω(n^2).” This is a (true) statement that insertion sort is bad (given that it is possible to sort in O(n log n) time with other algorithms). We’re not saying exactly how bad, but it is at least quadratic in its slowness.
We can provide true, but less useful, results by replacing Ω(n^2) by Ω(n^1.5), since Ω(n^2) ⊆ Ω(n^1.5). This would say that insertion sort is at least somewhat
bad.

Big Theta: Big-Theta notation allows us to state both an upper and a lower
bound for the growth rate of a function.
We write that f(n) ∈ Θ(g(n)) if there exists a positive integer n0 and positive constants c1 and c2 such that

    c1·g(n) ≤ f(n) ≤ c2·g(n), for all n ≥ n0.

If f(n) ∈ Θ(g(n)), then the two functions have the same growth behaviour.
For instance, n^2 + n + 5 ∈ Θ(n^2).
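For example, to verify n^2 + n + 5 ∈ Θ(n^2) directly from the definition, one valid choice of witnesses (among many) is c1 = 1, c2 = 7 and n0 = 1, since for all n ≥ 1

    1 · n^2 ≤ n^2 + n + 5 ≤ n^2 + n^2 + 5·n^2 = 7 · n^2,

using n ≤ n^2 and 5 ≤ 5·n^2 when n ≥ 1.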
Big-Oh, Big-Theta and Big-Omega are more formally referred to as “asymp-
totic notations”: they describe the behaviours of the functions when the input n
is approaching infinity. However, we hope to use them to describe the behaviours
of algorithms on reasonably large inputs.
It is helpful to view big-Oh as letting you state a ≤ relationship between
two growth rates, while big-Omega lets you state a ≥ relationship. Big-Theta
lets you state that two functions have the same asymptotic growth rate.

2 Rarer asymptotic notations
There are other notations that are sometimes used to state < and > relationships
between growth rates. They are usually defined using limits, though the Wiki
textbooks (and others) have an alternate definition that works out the same in
many, but perhaps not all, cases.

little-oh: If lim_{n→∞} f(n)/g(n) = 0, then we write f(n) ∈ o(g(n)).

little-omega: If lim_{n→∞} f(n)/g(n) diverges toward +∞, then we write f(n) ∈ ω(g(n)).
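For instance, these limit definitions give n log n ∈ o(n^2), because

    lim_{n→∞} (n log n)/n^2 = lim_{n→∞} (log n)/n = 0,

and, reading the same ratio the other way up, n^2 ∈ ω(n log n), because n^2/(n log n) = n/(log n) diverges toward +∞.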

soft-Oh: If we can find a positive constant k such that f(n) ∈ O(log^k(n) · g(n)), then we write f(n) ∈ Õ(g(n)).

Comments:
1. Little-oh lets us say that a function definitely grows more slowly than
another.
2. Little-omega lets us say that a function definitely grows more quickly than
another.
3. If lim_{n→∞} f(n)/g(n) = c, for some positive constant c, then f(n) ∼ c · g(n) and also f(n) ∈ Θ(g(n)). However, there are cases where one can find two functions f′ and g′ where f′(n) ∈ Θ(g′(n)) but the limit is undefined. For instance, take g′(n) = n and

    f′(n) = 2n if n is even,  3n if n is odd.

Here, f′(n) ∈ Θ(n) but the limit of f′(n)/g′(n) does not exist. Fortunately, many interesting algorithms’ runtime functions are not as weird as f′.
4. In many practical situations, logarithmic factors might as well be con-
stants, and soft-Oh gives you a way to ignore them.

3 Handling Multi-variable Functions


Sometimes, the “input size” is not best expressed by a single number and we
want running time functions that might depend on several input parameters.
Consider a sorting algorithm whose speed depends both on the number of values
in the array and also on the magnitude of the largest value in the array.
This can be handled in different ways. The solution given in the widely used textbook by Cormen et al. is to define:

f(n, m) ∈ O(g(n, m)) if there are positive constants c, n0 and m0 such that

    f(n, m) ≤ c·g(n, m) for all n ≥ n0 or m ≥ m0.

4 Properties of Big-Oh, Theta & Omega
There are a variety of properties that are sometimes useful.
• f (n) ∈ O(g(n)) ∧ f (n) ∈ Ω(g(n)) ⇐⇒ f (n) ∈ Θ(g(n)). This gives an
alternative way to show a big-Theta relationship.
• (Transitivity) If f (n) ∈ Θ(g(n)) and g(n) ∈ Θ(h(n)) then f (n) ∈ Θ(h(n)).
Similarly for big-Oh and big-Omega.

• (Symmetry of Theta) f (n) ∈ Θ(g(n)) ⇐⇒ g(n) ∈ Θ(f (n)).


• (Transpose Symmetry for big Oh and Omega) f (n) ∈ O(g(n)) ⇐⇒
g(n) ∈ Ω(f (n))
• If f (n) ∼ g(n) then f (n) ∈ Θ(g(n)). This lets you adapt results from
Sedgewick’s book.
• If p(n) is a polynomial of degree k with a positive leading coefficient, then p(n) ∈ Θ(n^k).
• For any positive constant c, we have c · f (n) ∈ Θ(f (n)).

• For any constant c > 1, we have log_c(n) ∈ Θ(log_2(n)); see the change-of-base identity after this list.



• For any constant k, log^k(n) ∈ O(n). Actually, it is in O(n^(1/a)) (the a-th root of n) for any a > 0.
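The statement about logarithm bases, for example, is just the change-of-base identity in disguise: for any constant c > 1,

    log_c(n) = log_2(n) / log_2(c),

and 1/log_2(c) is a positive constant, so log_c(n) and log_2(n) differ by only a constant factor.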

5 Simplifications
(Each of the following simplifications could be proved, but most of the proofs
have been omitted.)
In what follows, c is a positive constant and f and g are functions from
integers to non-negative real numbers. The formal definition of Θ, O and Ω
in some textbooks allows functions that produce negative numbers. This could
make some simplifications invalid. Running times cannot be negative, so if our
functions describe the relationship between input sizes and running times, we
can ban negative outputs.
Recall that knowing f ∈ Θ(g(n)) means that f ∈ O(g(n)) and f ∈ Ω(g(n)),
so you can replace “Θ(g(n))” by the (less informative) “O(g(n))” if it is helpful.
Besides the common simplifications listed below, you can often use the def-
initions of Θ, O and Ω to justify other simplifications when necessary.

1. Θ(c·f(n)) should always be replaced by Θ(f(n)). There should be no unnecessary constants c.
2. Θ(log_c f(n)) should be written without the c; the base of the logarithm has at most a constant effect and thus does not matter.

3. Θ(f(n)) + Θ(g(n)) ⇒ Θ(max(f(n), g(n))). (We consider only which function is bigger when n is very large.) E.g., replace Θ(n log n) + Θ(n^2) by Θ(n^2).
4. Θ(f(n)) + Θ(g(m)) ⇒ Θ(f(n) + g(m)). This rule is better than nothing. E.g., replace Θ(x^2·y) + Θ(x·y^2) by Θ(x^2·y + x·y^2).
5. Θ(f (n)) ∗ Θ(g(n)) ⇒ Θ(f (n) ∗ g(n))
6. Σ_{i=0}^{n} Θ(g(i)) ⇒ Θ(Σ_{i=0}^{n} g(i)).
Proof: Σ_i Θ(g(i)) means Σ_i f(i), for some function f with c1·g(i) ≤ f(i) ≤ c2·g(i) for i > n0. So

    Σ_{i=0}^{n} f(i) = Σ_{i=0}^{n0−1} f(i) + Σ_{i=n0}^{n} f(i) = c3 + Σ_{i=n0}^{n} f(i).

Now

    c3 + Σ_{i=n0}^{n} f(i) ≤ c3 + Σ_{i=n0}^{n} c2·g(i)
                           ≤ c3 + Σ_{i=0}^{n} c2·g(i)
                           ≤ Σ_{i=0}^{n} (c2 + c3)·g(i), if g(i) ≥ 1.

A similar but simpler argument gives a matching lower bound, Σ_{i=0}^{n} f(i) ∈ Ω(Σ_{i=0}^{n} g(i)). This justifies the replacement by Θ(Σ_{i=0}^{n} g(i)) (providing that g(i) ≥ 1).
For a function that depends on both i and n, a sum over i can be similarly simplified. E.g.,

    Σ_{i=2}^{n} Θ(n·i^2) ⇒ Θ(Σ_{i=2}^{n} n·i^2) ⇒ Θ(n·Σ_{i=2}^{n} i^2) ⇒ Θ(n·Θ(n^3)) ⇒ Θ(n^4).
7. Σ_{i∈I} f(i) ∈ O(Σ_{i∈I∪I′} f(i)). This rule is not valid for Θ or Ω and basically says that you can add in extra items that “aren’t really there.” E.g.,

    Σ_{i=0}^{n/2} i^2 ∈ O(Σ_{i=0}^{n} i^2).
8. Σ_{i∈I} f(i) ∈ Ω(Σ_{i∈I′} f(i)), where I′ ⊆ I. This rule is not valid for O or Θ and says you can ignore troublesome terms in a sum. E.g.,

    Σ_{i=0}^{n} i ∈ Ω(Σ_{i=n/2}^{n} i).

We can then attack Ω(Σ_{i=n/2}^{n} i) because each term is at least n/2, and there are about n/2 terms. So Σ_{i=n/2}^{n} i > n^2/4. Since n^2/4 ∈ Ω(n^2), we have an easy argument that Σ_{i=0}^{n} i ∈ Ω(n^2), even if you have forgotten the exact formula for Σ_{i=0}^{n} i.

6 Direction of Attack
Recursive code requires the creation and solving of recurrences. Since this is
a little more complicated, we ignore recursion in this section. So assume that
we have structured and non-recursive code — the control flow is given by a
combination of if, while, for statements, as well as blocks of code that are
sequentially composed. It is easy to deal with method/function calls, because
if we don’t have recursion, you can always substitute the called code in place of
the method call.
Nested statements should be attacked from the inside out. Work on
getting a Θ expression for the innermost statements first. Then, once you have
determined that that innermost statement is Θ(g(n)) (or whatever), use the
rules in the following section to determine the Θ expression for the statement
it is nested within, and so forth.

7 Rules for Structures


Simple statements are Θ(1), presuming they reflect activities that can be
done in a constant number of steps. E.g., i = A[j/2]+ 5 has a cost of Θ(1).

Blocks of sequential statements: Add the costs of the statements. The sim-
plification rule for Θ(f (n)) + Θ(g(n)) comes in handy here. Essentially, you can
analyze the cost of an entire block by taking the most expensive statement in
the block.

Loops — we presume that the loop test is cheap enough that we can ignore
it. The cost of the loop is obtained by adding up the individual costs of all the
iterations.
E.g.

for j=1 to n
i = A[j/2]+ 5
The cost of this is Σ_{j=1}^{n} Θ(1), which we can simplify to Θ(Σ_{j=1}^{n} 1), which further simplifies to Θ(n).
When the cost of each iteration is not affected by the value of the index
variable (j in this case), we have a simpler rule:
For a loop that iterates Θ(f (n)) times and has a cost (that does not vary from
iteration to iteration) of Θ(g(n)) per iteration, the total cost is Θ(f (n))∗Θ(g(n)).
For O, there is a particularly simple rule. Determine the cost of the most
expensive iteration of the loop. Multiply it by the number of iterations. This
rule is only for O.
Reanalyzing the loop
for j=1 to n
i = A[j/2]+ 5

we compute the cost by noting the most expensive iteration costs O(1) and
the loop iterates O(n) times. The total cost is thus O(n ∗ 1) = O(n).
A more interesting case is
for j=1 to n
for k=1 to j
i = A[j/2]+ k
The innermost statement (the line with A) has cost Θ(1). We work on the
“for k” statement next. Once this loop is reached, it iterates j times, costing us
Θ(1) per iteration. Its total cost (when activated once) is thus Θ(j ∗ 1).
Now life gets harder, because the outermost loop needs to be analyzed. It
iterates n times, but the trick is that its first iteration (when j = 1) does not
make the “for k” loop do much. The last iteration of the outermost loop, when
j = n, makes the “for k” loop work hard.
A big-Oh analysis can just say that the “for k” loop does O(n) work. (Mathematically, if we know the loop does O(j) work and we also know j ≤ n, a simplification rule not listed above says we can replace j by n and deduce the “for k” loop does O(n) work every time it is activated.) This loop is the body of the outermost loop, which runs O(n) times. Hence, the total cost for the outermost loop is O(n) ∗ O(n).
A better, big-Theta analysis can be done as follows: The total cost of the
outermost loop is
    Σ_{j=1}^{n} Θ(j) = Θ(Σ_{j=1}^{n} j)
                     = Θ(n(n + 1)/2)
                     = Θ(n^2)

It seems like more work, and we just got the same “n^2” answer as with big-Oh. However, it is both an upper and a lower bound.
Another way to get a Θ(n^2) answer would have been to do the quick and sloppy O(n^2) analysis. If we can do a quick and sloppy Ω(n^2) analysis, then the two sloppy bounds together imply the desirable Θ(n^2) answer.
So let’s do an Ω analysis of the code. First, the innermost statement costs
Ω(1). This statement is repeated Ω(j) times, giving us a cost of Ω(j ∗ 1) for the
“for k” loop. The trick of replacing j by n is not valid in an Ω analysis, where
we could only replace j by something that will make the result smaller. The
corresponding idea to the big-Oh approach of identifying the maximum cost of
any iteration is to identify the minimum cost of any iteration, and then multiply
it by the number of iterations. Unfortunately, the minimum cost of the “for k”
loop is very small (consider when k is 1). Taking this minimum cost of Ω(1)
and multiplying it by the number of iterations, we have Ω(1) ∗ Ω(n), which does
not get the n^2 bound we want.
One solution is inspired by the example in the last simplification rule: con-
sider just the work done by the last n/2 iterations. For these iterations, we
have j ≥ n/2 and so the cost per iteration of the outermost loop is Ω(n). Since

n/2 ∈ Ω(n) we can multiply the number of (considered) iterations by the mini-
mum cost per (considered) iteration. There are Ω(n) considered iterations and
each costs Ω(n). Thus the bound is Ω(n^2). Combined with the O(n^2) analysis, we have shown the running time is in Θ(n^2).
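As an empirical sanity check on that Θ(n^2) conclusion, one can count how often the innermost statement runs. The following minimal Java sketch is not part of the original example; the array work is replaced by a counter purely for measurement.

public class NestedLoopCount {
    public static void main(String[] args) {
        // Count executions of the innermost statement of the doubly nested loop.
        // The exact count is n(n+1)/2, which grows as Theta(n^2).
        for (int n = 1000; n <= 8000; n *= 2) {
            long count = 0;
            for (int j = 1; j <= n; j++) {
                for (int k = 1; k <= j; k++) {
                    count++;                 // stands in for i = A[j/2] + k
                }
            }
            System.out.println("n = " + n + "  count = " + count
                    + "  n(n+1)/2 = " + (long) n * (n + 1) / 2);
        }
    }
}

Doubling n should roughly quadruple the count, which is the signature of quadratic growth.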

If-Then-Else statements are tricky. We shall assume the condition tested is cheap and can be ignored. First, note that a missing “else” can be modelled by an else statement that does nothing significant and costs Θ(1).
The problem is that we may have to use human cleverness (that cannot be
written into a formula) to deal properly with the fact that sometimes we execute
the “then” part, and other times we execute the “else” part.
For big-Oh, there is a sloppy but correct solution: we take the maximum (or the sum) of the two parts. Whatever actually happens, it can be no worse than our pessimistic bound. The difficulty is with code like

if (<something that rarely happens>)


<something expensive>
else
<something cheap>

For big-Omega, the similar sloppy solution is to take the smaller of the two
costs. This will give a bad bound for cases like
if (<something that rarely happens>)
<something cheap>
else
<something expensive>
For big-Theta, we generally have to use human cleverness. Consider

for i=0 to n
if i is even
for j=1 to n*n
k=k+A[j]
else
for j=1 to n
k=k+A[j]

Working from the inside out, we discover the “then” part would cost Θ(n^2), but the “else” part would cost only Θ(n). Using human cleverness, we might observe that every iteration (of the outermost loop) with i being even is followed by an iteration where i is odd. So there are about n/2 pairs of iterations, each doing Θ(n^2) + Θ(n) ⇒ Θ(n^2) work. Since n/2 is Θ(n), we get a final cost of Θ(n^3).
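An alternative to the pairing argument is to sum the two kinds of iterations separately: roughly n/2 iterations have i even and cost Θ(n^2) each, while roughly n/2 have i odd and cost Θ(n) each, so the total is

    Θ((n/2) · n^2) + Θ((n/2) · n) ⇒ Θ(n^3) + Θ(n^2) ⇒ Θ(n^3),

which agrees with the answer above.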

Method calls can be handled by first analyzing the cost of the method being
called. This cost should be a function of its parameters (e.g., how much data is
being passed in to it).

Once you have determined the cost function of the method, you can plug in the sizes of the actual arguments wherever the method is called.
E.g., suppose you have a method InsertionSort. As its parameter, it takes an array with m elements. It will do Θ(m^2) work, worst case, when invoked. So let T_ISort^worst(m) = Θ(m^2).
Consider code that repeatedly calls the method.
// assume array A with n items

for i = 1 to log(n)
create a new array B with n items
// copy A into B
for j = 1 to n do
B[j]=A[j]
call InsertionSort(B)
Analyzing it: The innermost statement is an assignment and takes Θ(1)
time. It’s nested inside a loop that runs n times, so the “for j” loop as a whole
takes Θ(n) time. Let’s assume it takes Θ(1) to create array B. Then, the body of the “for i” loop is a block of three consecutive statements: the creation of B, the “for j” loop, and the “call InsertionSort” statement. The last of these three things costs, when asked to process n items, T_ISort^worst(n). And we have been told already that T_ISort^worst(m) = Θ(m^2). So the last of these three things costs Θ(n^2).

Our block has a total cost of Θ(1) + Θ(n) + Θ(n^2), which simplifies to Θ(n^2).
Continuing to work from the inside out, we are now ready to analyze the whole “for i” loop. We’ve just determined that its body costs Θ(n^2), and this cost does not depend on i. The body is repeated Θ(log n) times, so the total cost is Θ(n^2 log n). This is a worst case, and it can arise if A is sorted backwards.
This illustrates a subtle issue: what if the worst-case behaviour of the called
method cannot happen, according to the way that the method is actually used?
Now consider what happens if we just repeatedly sort the same array, over and
over.
// assume array A with n items

for i = 1 to log(n)
create a new array B with n items
// copy A into B
for j = 1 to n do
B[j]=A[j]
call InsertionSort(A) // only change is B --> A
The first call to InsertionSort(A) can process a worst-case input and thus take Θ(n^2) time. However, that puts A into sorted order, which is a best-case scenario for Insertion Sort, where the sorting is done in Θ(n) time. The proper analysis would then be that the iteration with i = 1 costs Θ(n^2) and then we have Θ(log n) iterations that each cost Θ(n), for a total cost of Θ(n^2) + Θ(n log n) ⇒ Θ(n^2).

In a sense, big-Oh is a safer analysis here, because it is more-or-less correct to say that T_ISort^any(m) = O(m^2) for any possible use of InsertionSort. So a quick big-Oh analysis could just say that the body of the “for i” loop costs O(n^2), and since the loop repeats log n times, the total cost of the code fragment is O(n^2 log n). This is an example where the big-Oh answer is quickly obtained and mathematically correct, but still the bound is not tight: the better answer is Θ(n^2), but getting that answer required more work and some human insight.

8 Recursion
The analysis techniques so far do not suffice to analyze recursive algorithms.
Later in the course, we will need to handle recursion.
There are two major approaches:
1. Reasoning based on tracing the pattern in which the recursion unfolds,
typically using a “recursion trace”.

2. Forming a mathematical “recurrence relation” that describes the running time. The recurrence relation is a mathematical object that is recursively defined in terms of itself. Fortunately, mathematicians have developed various methods for solving recurrence relations.

8.1 Recursion Trace


We can make a diagram that shows how the recursion unfolds. (The Sedgewick
textbook does this on page 274.) At the top (“root”) of the diagram, we have a circle
(“node”) that represents the first call to the recursive method. Usually, inside
the circle we depict the parameter values for that first call. Beneath the root
node, we have circles/nodes that represent the recursive calls made (directly) by
the root node. Their left-to-right order is based on the time when the call was
made (the earliest is leftmost). Inside each node, we have the parameter values
passed to the recursive call. A line segment (called an edge) connects these
nodes to the root. Since each node that does not represent a base-case situation
will itself make some recursive calls, these second-level nodes will themselves be
joined to some third-level nodes, and so forth.
For an example, see Figure 2, which arises from the code in Figure 1, with
the top-level call to foo(4).
It may be possible to use the recursion trace to reason about the cost of the recursive method. As a simple example, for this code, the recursion trace for foo(N) has N levels. Each level has no more than twice the number of nodes as the level above it. So the total number of nodes in the recursion trace is O(2^N), which is also the total number of recursive calls made. For each non-base-case recursive call we do O(n^2) looping work, where n is the parameter value for that call. But since we can reason that n ≤ N, we can say that each of the O(2^N) nodes is associated with O(N^2) work that is done. In total, we can

int foo(int n) {
    if (n <= 1) return 67;        // base case
    int z = 0;
    for (int i = 0; i < n*n; ++i)
        z += i;
    int temp = foo(n-1);
    int temp2 = foo(n/2);
    return temp + temp2 + z;
}

Figure 1: Recursive method foo.

              4
            /   \
           3     2
          / \   / \
         2   1 1   1
        / \
       1   1

Figure 2: Recursion trace for foo(4).

conclude that the running time of foo(N) is O(N^2 · 2^N). (This result is true, but it relies on some pessimistic oversimplifications.)
The structure generated by the recursion trace is a “tree” structure (think
about family trees) and the terminology of nodes, edges and levels is commonly
used in CS. The structure is sometimes also called a “call tree”.
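To see how loose the O(2^N) node count is in practice, one can instrument foo with a call counter. The sketch below is a lightly modified copy of Figure 1; the static counter and the driver loop are additions made only for measurement.

public class FooTrace {
    static long calls = 0;                  // counts every node of the recursion trace

    static int foo(int n) {
        calls++;
        if (n <= 1) return 67;              // base case
        int z = 0;
        for (int i = 0; i < n*n; ++i)
            z += i;
        int temp = foo(n-1);
        int temp2 = foo(n/2);
        return temp + temp2 + z;
    }

    public static void main(String[] args) {
        for (int N = 4; N <= 24; N += 4) {
            calls = 0;
            foo(N);
            System.out.println("N = " + N + "  calls = " + calls
                    + "  2^N = " + (1L << N));
        }
    }
}

The printed call counts stay far below 2^N, which is consistent with the remark that the O(N^2 · 2^N) bound is correct but pessimistic.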

8.2 Recurrences
A recurrence is a way of describing a mathematical function in terms of itself.
Mathematicians often use them to describe the later elements in a sequence in
terms of elements that came earlier in the sequence. A famous example would
be the Fibonacci sequence, where the first two numbers in the sequence are 0 and 1 (although others use 1 and 1), and every later number in the sequence is the sum of the two numbers before it. So the sequence is
0,1,1,2,3,5,8,13,21,. . . . We can consider the function F that takes the input n
and then outputs the nth number in the Fibonacci sequence.
We can then describe the function as

    F(n) = 0                        , if n = 1
    F(n) = 1                        , if n = 2
    F(n) = F(n − 1) + F(n − 2)      , if n > 2
This recurrence describes the behaviour of some function F , but the recursive

nature of it is unsatisfying. In software engineering terms, it is a specification
of a function, and this specification may be met by no function, one function,
or many functions 1 . In ordinary algebra, the property 3x + 2 = 2x is satisfied
by one value of x. However, the property x = x + 1 is not satisfied by any value
of x, and the property x = x is satisfied by infinitely many values.
For specifications like 3x + 2 = 2x, ordinary algebra helps you solve x to
a numerical value. When a function is specified recursively, we would like our
solution to be (hopefully one) function that is described non-recursively and
without other pesky things like sums, often said to be a “closed-form” solution.
Depending on your previous math courses, you may know that mathematicians have found a closed-form solution (with the indexing above, where F(1) = 0):

    F(n) = (1/√5) · ( ((1 + √5)/2)^(n−1) − ((1 − √5)/2)^(n−1) )
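A quick way to gain confidence in a closed form is to compare it against the recurrence for small n. A minimal Java sketch, assuming the indexing used above (F(1) = 0, F(2) = 1); double-precision arithmetic is accurate enough for these small values.

public class FibCheck {
    public static void main(String[] args) {
        double sqrt5 = Math.sqrt(5.0);
        double phi = (1 + sqrt5) / 2;
        double psi = (1 - sqrt5) / 2;
        long a = 0, b = 1;                      // a = F(n), b = F(n+1), starting at n = 1
        for (int n = 1; n <= 20; n++) {
            long fromRecurrence = a;
            long fromFormula = Math.round(
                    (Math.pow(phi, n - 1) - Math.pow(psi, n - 1)) / sqrt5);
            System.out.println("n = " + n + "  recurrence: " + fromRecurrence
                    + "  closed form: " + fromFormula);
            long next = a + b;                  // F(n+2) = F(n+1) + F(n)
            a = b;
            b = next;
        }
    }
}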
For the example recursive method foo examined earlier, let the (unknown)
function measuring the number of operations executed be called T (n), where
the mathematical variable n represents the value of the programming parameter
that shares the same name. (The name T is very common, because the number
of operations is our model of Time.) Although we do not know a closed form
solution for T (n) yet, we can still analyze the code in method foo.
First, note the behaviour of the method when n ≤ 1. We see that Θ(1)
operations will occur. There will be a comparison and a return, and possibly a
few other minor operations. So we know to write T (n) = Θ(1) if n ≤ 1.
If n is larger, then we do a loop that takes Θ(n^2) operations. We also do an initialization and a return. The total cost of this is Θ(n^2) + Θ(1), which simplifies to Θ(n^2). But we also make two recursive calls, and we have to
account for all the operations that this will lead to. Our first recursive call is
with parameter n − 1. So, I need to write down an expression that represents
the number of operations executed when foo is called with a value of n − 1. Hmmm.
Wait! I have a name already for “the number of operations executed when foo
is called with some parameter value x”—it’s T (x). So the expression I need for
the cost of the first recursive call is T (n − 1). Similarly, my second recursive call
is with the code parameter n/2. Since Java’s integer division rounds down, and
in the world of math we don’t get this behaviour from “/”, the mathematical
expression representing the value of the parameter is ⌊n/2⌋. Thus the cost (i.e., number of operations) from the second recursive call is T(⌊n/2⌋). Therefore, when n > 1 we have T(n) = Θ(n^2) + T(n − 1) + T(⌊n/2⌋).
Putting it all together, our recurrence describing T is

    T(n) = Θ(1)                                 , if n ≤ 1
    T(n) = Θ(n^2) + T(n − 1) + T(⌊n/2⌋)         , if n > 1
Now we need to solve the recurrence, so that we can get a closed-form ex-
pression for T.
¹ More advanced math courses involving differential or difference equations also end up solving for unknown functions, and it is not surprising that the techniques we need can overlap with those used in differential equation courses.

8.3 Solving Recurrences
There are a variety of methods for solving recurrences:
1. the characteristic equation method, which is sometimes taught in CS3913
and would be comfortable to mathematicians;

2. a calculus-based approach based on “generating functions”, sometimes taught to graduate students, which would also be comfortable to mathematicians;
3. the Master Theorem, which lets you read out an asymptotic answer for a
very specific class of recurrences;
4. the Plug-and-Crunch (a.k.a. Repeated Substitution) method, tedious but very general;
5. the Recursion Tree method, a visual form of repeated substitution; and

6. the “ask Maple or Wolfram Alpha” method — some packages for doing
symbolic math are able to solve many recurrences.
We first look at the Master Theorem and then look at the Plug-and-Crunch
approach.

Master Theorem: The Master Theorem approach is used to solve the kind
of recurrences that typically arise from “divide and conquer” algorithms. Many,
perhaps most, of the best-known recursive algorithms fit into this category,
which will be studied in CS3913. In CS2383, Merge Sort is a classic example
of a divide-and-conquer algorithm. A divide and conquer algorithm takes its
input (of size n). If n is small enough (less than some constant), the algorithm
does O(1) work in its base case. Otherwise, it divides its input into some
number of subproblems. (Each subproblem asks us to solve the same kind of
problem as the overall problem, just on less data.) Let’s suppose it divides
its input into A subproblems, each of size n/B, for some constants A and B
with A ≥ 2 and B ≥ 2. Each subproblem is then solved recursively. Finally,
the algorithm combines the A subproblem solutions into an overall solution. Suppose the amount of work done to divide the problem into subproblems and then reassemble the solutions is Θ(n^k) for some constant k.
This leads to a recurrence of

    T(n) = O(1)                      , if n < c
    T(n) = A · T(n/B) + Θ(n^k)       , if n ≥ c

where A, B ≥ 2 and c ≥ 1 are constants, and k is a constant.


The Master Theorem applies to recurrences of this form. To use the theorem,
identify the values of A, B and k. Then compare A against B k and read off the
answer as below:


    T(n) = O(n^k log n)          , if A = B^k
    T(n) = O(n^k)                , if A < B^k
    T(n) = O(n^(log_B A))        , if A > B^k

For example, the Merge Sort recurrence is

    T(n) = O(1)                  , if n < 2
    T(n) = 2T(n/2) + Θ(n)        , if n ≥ 2

It fits the pattern, with A = 2, B = 2, k = 1. The first case applies (A = 2 = B^k), so we know the running time of Merge Sort is O(n log n).
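As another illustration, consider a hypothetical divide-and-conquer algorithm (not one covered in these notes) with the recurrence T(n) = 4T(n/2) + Θ(n) for n ≥ 2 and T(n) = O(1) for n < 2. Here A = 4, B = 2 and k = 1, so B^k = 2 < A and the third case applies, giving T(n) = O(n^(log_2 4)) = O(n^2).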
Consider our first recurrence example,

    T(n) = Θ(1)                                 , if n ≤ 1
    T(n) = Θ(n^2) + T(n − 1) + T(⌊n/2⌋)         , if n > 1

Unfortunately, our Master Theorem approach does not work, because the two
subproblems are not the same size, and also the first subproblem is only 1
smaller than the original problem (it needs to be some fraction of the original
problem).
In truth, after the original Master Theorem was discovered and promoted in the 1970s, increasingly complicated versions have been discovered that can handle more cases. For instance, what if A is almost, but not quite, a constant?
What if the algorithm creates some subproblems of size n/2 and some of size
n/3? We sometimes study these more powerful versions of the Master Theorem
in CS3913.

Plug and Crunch: In the plug-and-crunch method, one repeatedly substitutes the recurrence into itself, simplifies, and looks for patterns that emerge. It is somewhat inelegant, and textbook authors sometimes modify the basic idea with clever simplifications. The analysis on pages 273 and 274 of Sedgewick’s textbook has been cleaned up in this fashion.
For Plug and Crunch, we tackle an exact recurrence without internal big-Oh
or big-Theta notation. For instance, Sedgewick has an exact recurrence when
determining C(N ), which is related to the number of comparisons done when
Merge Sort processes N items. He assumes that N is a power of two, say N = 2^n, and has

    C(N) = 0                                  , if N = 1
    C(N) = C(⌊N/2⌋) + C(⌈N/2⌉) + N            , if N > 1

Because N = 2^n, repeatedly dividing by 2 never creates any fractions, so the floors and ceilings are not needed:

    C(N) = 0                    , if N = 1
    C(N) = 2C(N/2) + N          , if N > 1
A simple-minded plug and crunch for large N then writes

C(N ) = 2C(N/2) + N
and then we start substituting for occurrences of C on the right hand side.
Since the recurrence is generally true for all values greater than one, we can
replace C(N/2) with 2C((N/2)/2) + (N/2), because it essentially says C(∗) =
2C(∗/2) + ∗ and we are filling in * with N/2.
After the substitution (plug), we have

C(N ) = 2[2C((N/2)/2) + (N/2)] + N

where the part in brackets is the result of the substitution.


We simplify (crunch) this to get

C(N ) = 4C(N/4) + N + N
And then we can plug into the C(N/4), filling in * by N/4:

C(N ) = 4[2C((N/4)/2) + (N/4)] + N + N


then crunch it to

C(N ) = 8C(N/8) + N + N + N
.
Another plug ( * is N/8) and crunch gives

C(N ) = 16C(N/16) + N + N + N + N
.
At this point, a pattern should be obvious: after k plug-and-crunch steps,
we will have

    C(N) = 2^k · C(N/2^k) + k·N.
Now, eventually, we will have done so many substitutions that we will hit our base case of C(1) = 0. This happens when N/2^k = 1; i.e., N = 2^k.
At this point, we would have

    C(N) = 2^k · C(1) + kN = 2^k · 0 + kN = kN.

Since N = 2^k, we know k = log_2 N, from which we conclude

    C(N) = log_2(N) · N.
The approach used by Sedgewick is a less obvious, cleaned-up version of the
work shown above.
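A minimal Java sketch that evaluates the exact recurrence directly and compares it with N log_2 N is another way to cross-check the algebra; the class and method names are invented for this example.

public class MergeCostCheck {
    // Evaluate C(N) = 2*C(N/2) + N with C(1) = 0, assuming N is a power of two.
    static long cost(long N) {
        if (N == 1) return 0;
        return 2 * cost(N / 2) + N;
    }

    public static void main(String[] args) {
        for (long N = 2; N <= (1L << 20); N *= 2) {
            long log2N = 63 - Long.numberOfLeadingZeros(N);   // exact for powers of two
            System.out.println("N = " + N + "  C(N) = " + cost(N)
                    + "  N*log2(N) = " + N * log2N);
        }
    }
}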

Getting an Exact Recurrence: If you have a running-time recurrence, it is
likely to have internal big-Oh or big-Theta expressions. Usually, we solve a “re-
lated” exact relation. If there is an internal Θ(n^k), one would normally replace it by c·n^k, after looking carefully around to make sure no real mathematicians are in sight. If you are absolutely certain there are no mathematicians in the area, you might even replace Θ(n^k) by n^k. After getting an exact solution, if you
are especially bold you might slap a Theta around it and simplify. There’s some
chance that you will get the right answer. . . but to do this correctly, you would
need to justify each step, and I suspect that sometimes these moves cannot be
justified.

Checking a Guess: The plug-and-crunch approach is not always safe, as you need to guess a pattern, and you might guess wrong. However, if you have a
guessed solution (no matter how obtained), you can check its correctness easily:
a guessed “solution” is correct if it satisfies the specification (the recurrence).
Otherwise, it’s wrong. . .
First, let’s check that the solution obtained above is correct:
C(1) is supposed to be 0. Plug in our guessed C of N log_2 N, getting C(1) = 1 · log_2 1 = 1 · 0 = 0. So far, so good.
The second part of the specification says that

C(N ) = 2C(N/2) + N when N is a larger power of 2.

Plugging in our guessed C, we need to check that

    N log_2 N = 2[(N/2) log_2(N/2)] + N,

and after some simplification of the right-hand side:

    2[(N/2) log_2(N/2)] + N = N log_2(N/2) + N
                            = N(log_2 N − log_2 2) + N
                            = N(log_2 N − 1) + N
                            = N log_2 N − N + N
                            = N log_2 N.
So the left and right hand sides are indeed equal, and we have satisfied the
second part of the specification. Our guessed solution works.
On the other hand, if you guess incorrectly, the specification won’t be met.
For example, let’s suppose I guessed C(N ) = N − 1 as a solution.
For the first part of the specification, we need C(1) = 0. Since C(1) = 1 − 1 = 0, the first part of the specification is met.

However, for the second part of the specification, we need

C(N ) = 2C(N/2) + N when N is a larger power of 2.

Let’s try it: we would need

    N − 1 = 2(N/2 − 1) + N = N − 2 + N = 2N − 2.
However, it is not the case that N − 1 = 2N − 2 whenever N is a larger
power of 2. For instance, if N = 4, N − 1 = 3 but 2N − 2 = 6.
Since the second part of the specification is not met, we know my “solution”
is wrong.

