Professional Documents
Culture Documents
Lec 7
Lec 7
and Computability
Search
Introduction
In this lecture we will address the Naïve String Search, understand
how it works and analyze its complexity. Then, we show how efficient
string-searching algorithms, in particular the KMP algorithm , can be
used to match specimens against items in very large DNA databases.
We also go back to the ADT Binary Tree and the more specific form
of this, the ADT AVL Tree , showing how these can be used in search
problems. We will explore methods of improving efficiency by keeping
trees in balance, examine the algorithms for doing so, and perform
complexity analyses on these.
Agenda
❑Basic string search (Naïve Search) or (Pattern
Matching Algorithm - Brute Force)
❑The Knuth–Morris–Pratt algorithm
❑Search trees
▪ Binary search trees
String Matching Problem
3 -7
Brute-Force Algorithm
• The brute-force pattern
matching algorithm compares Algorithm BruteForceMatch(T, P)
the pattern P with the text T for Input text T of size n and pattern
each possible shift of P relative P of size m
to T, until either Output starting index of a
substring of T equal to P or -1
• a match is found, or if no such substring exists
• all placements of the pattern for i 0 to n - m
have been tried
{ test shift i of the pattern }
• Brute-force pattern matching j0
runs in time O(nm) while j < m T[i + j] = P[j]
• Example of worst case: jj+1
• T = aaa … ah if j = m
• P = aaah return i {match at i}
else
• may occur in images and DNA break while loop {mismatch}
sequences
return -1 {no match anywhere}
• unlikely in English text
The KMP Algorithm – Motivation
Proposed by Knuth, Morris and Pratt in 1977
Algorithm KMPMatch(T, P)
• The failure function can be represented by F failureFunction(P)
an array and can be computed in O(m) time i0
j0
• At each iteration of the while-loop, either while i < n
if T[i] = P[j]
• i increases by one, or if j = m - 1
• the shift amount i - j increases by at least return i - j { match }
one (observe that F(j - 1) < j) else
ii+1
• Hence, there are no more than 2n iterations jj+1
of the while-loop else
if j > 0
• Thus, KMP’s algorithm runs in optimal time j F[j - 1]
O(m + n) else
ii+1
return -1 { no match }
Computing the Failure Function
Example 1
a b a c a a b a c c a b a c a b a a b b
1 2 3 4 5 6
a b a c a b
7
a b a c a b
j 0 1 2 3 4 5
8 9 10 11 12
P[j] aa b ab c a a b c a b
13
a b a c a b
F(j) 0 0 1 0 1 2
14 15 16 17 18 19
a b a c a b
Example 2
a b x a b c a b c a b y
j 0 1 2 3 4 5
P a b c a b Y
F 0 0 0 1 2 0
Time Complexity of KMP Algorithm
3 -15
Binary Search
• Example: find(4): 2
>
9
• Call TreeSearch(4,root)
1 4 = 8
• The algorithms for nearest
neighbor queries are similar
Insertion
6
<
• To perform operation put(k, 2
>
9
o), we search for key k (using 1 4 8
TreeSearch) >
• Assume k is not already in the
w
tree, and let w be the (None)
leaf reached by the search 6
• We insert k at node w and
2 9
expand w into an internal
node 1 4 8
• Example: insert 5 5
w
Insertion Pseudo-code
Deletion
6
<
• To perform operation 2
>
9
remove(k), we search for key k
1 4 v 8
• Assume key k is in the tree, and w
let let v be the node storing k 5