
Knuth Morris Pratt Algorithm
Mansi A. Radke
What does it do?
• It finds the occurrences of a given pattern string s1 in the text string s2
and reports their offsets, i.e. it checks whether s1 is a substring of s2 and,
if so, at which offsets.

• The brute-force algorithm backtracks in the text (goes back and forth) and has
time complexity O(m*n), where m is the pattern length and n is the text length
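For comparison, the brute-force approach mentioned above can be sketched as follows. This is a minimal illustration, not code from the slides; the function name `naive_search` is an assumption made for this example.

```python
def naive_search(pattern: str, text: str) -> list[int]:
    """Return every 0-based offset at which pattern occurs in text.

    Hypothetical helper for illustration: it restarts the comparison
    at every text position, which is what leads to O(m*n) comparisons.
    """
    m, n = len(pattern), len(text)
    offsets = []
    for start in range(n - m + 1):
        j = 0
        # Compare the pattern against the text window beginning at `start`.
        while j < m and text[start + j] == pattern[j]:
            j += 1
        if j == m:
            offsets.append(start)
    return offsets


print(naive_search("aba", "ababa"))  # [0, 2]
```

After a mismatch the inner loop starts again from the next text position, so text characters may be compared many times. KMP avoids exactly this re-reading.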
Pattern: ababaca
LPS array: 0012301

Pattern: abacab
LPS array: 001012

Example
Pattern is: ababaca
LPS array is: 0012301
Text is: bacbababaabcbab
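To make the meaning of the LPS array concrete, the sketch below computes it directly from the definition: for each prefix of the pattern, the length of the longest proper prefix that is also a suffix. This is a deliberately slow illustration (not the linear-time compute prefix function), and the name `lps_by_definition` is an assumption made for this example.

```python
def lps_by_definition(pattern: str) -> list[int]:
    """LPS array computed straight from the definition (slow, for illustration)."""
    lps = []
    for i in range(len(pattern)):
        prefix = pattern[: i + 1]
        best = 0
        # Try every proper length k (k < len(prefix)) and keep the longest
        # one for which the first k characters equal the last k characters.
        for k in range(1, len(prefix)):
            if prefix[:k] == prefix[-k:]:
                best = k
        lps.append(best)
    return lps


print(lps_by_definition("ababaca"))  # [0, 0, 1, 2, 3, 0, 1]
print(lps_by_definition("abacab"))   # [0, 0, 1, 0, 1, 2]
```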
Note:
The running time does not depend on the number of symbols in the alphabet:
If number of symbols is 5: O(m + n)
If number of symbols is 50: O(m + n)
If number of symbols is 500: O(m + n)

It depends only on the lengths of the text and the pattern string.


Complexity of the functions

Compute prefix is O(m), i.e. O(length of the pattern string)

KMP is O(m + n)
Knuth Morris Pratt Algorithm for string matching
• Linear algorithm
• Complexity O(m+n)
• Never backtracks on the input text string T
• Makes use of prefix function
• The compute prefix function calculates the overlap of pattern string
with itself.
• The KMP matcher calculates the overlap between the text string and
the pattern string.
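The two pieces described above can be sketched as follows. This is one common way to write them, assuming 0-based indexing; the names `compute_prefix` and `kmp_search` are chosen for this example and are not taken from the slides.

```python
def compute_prefix(pattern: str) -> list[int]:
    """Overlap of the pattern with itself: lps[i] is the length of the longest
    proper prefix of pattern[:i+1] that is also a suffix of it."""
    lps = [0] * len(pattern)
    length = 0  # length of the currently matched prefix
    i = 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length > 0:
            length = lps[length - 1]  # fall back to a shorter prefix
        else:
            lps[i] = 0
            i += 1
    return lps


def kmp_search(pattern: str, text: str) -> list[int]:
    """Return all 0-based offsets of pattern in text; the text index never moves back."""
    if not pattern:
        return []
    lps = compute_prefix(pattern)
    offsets = []
    j = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while j > 0 and ch != pattern[j]:
            j = lps[j - 1]  # reuse the overlap instead of re-reading the text
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):
            offsets.append(i - j + 1)
            j = lps[j - 1]  # continue, allowing overlapping matches
    return offsets


print(kmp_search("aba", "ababa"))  # [0, 2]
```

Note that `i` only ever moves forward over the text, which is the "never backtracks" property listed above.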

25-08-2022 IR-Winter-2020 Mansi A. Radke 11


Find the longest prefix suffix array (LPS, or Pi as it is also called)
and determine whether the pattern string is present
in the text, how many times, and at what
offsets
1)
Text T is : abacaabaccabacabaa
Pattern P is : abacab
Pi values: 001012
2)
Text T is: bacbabababacaab
Pattern P is: ababaca
Pi values: 0012301
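Hand-computed answers to the two exercises above can be checked with a short script. The sketch below uses Python's built-in `str.find` rather than KMP, purely as an independent check; the helper name `find_all` is an assumption made for this example, and offsets are 0-based.

```python
def find_all(pattern: str, text: str) -> list[int]:
    """Return all 0-based offsets of pattern in text using str.find."""
    offsets = []
    start = text.find(pattern)
    while start != -1:
        offsets.append(start)
        # Search again from the next position so overlapping matches are found.
        start = text.find(pattern, start + 1)
    return offsets


print(find_all("abacab", "abacaabaccabacabaa"))  # [10]
print(find_all("ababaca", "bacbabababacaab"))    # [6]
```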



Amortized Analysis
Mansi A. Radke
The canteen chai example!
• Serving 1 person takes 10 minutes in the worst case

• So serving 10 people takes 100 minutes?

• Isn't this too pessimistic?

• What about when the teapot is full? It can serve around 20 people!

• So, we do amortized analysis, where we analyse the time complexity over a
sequence of operations.
• We take the average over successive calls of an operation/function

• So, a single operation might be costly, but when we average over a
sequence of operations, the average cost of an operation is small

• AMORTIZED ANALYSIS IS DIFFERENT FROM AVERAGE CASE ANALYSIS,
WHICH WE WILL STUDY LATER!

• No probabilities are involved here!


Methods for amortized analysis
• Aggregate analysis

• Accounting method

• Potential method
Aggregate analysis
• We determine an upper bound T(n) on the total cost of a sequence of n
operations. The average cost per operation is then T(n)/n

• This average cost is called the amortized cost of each operation.

• In this method, ALL operations considered have the same amortized
cost.
Stack example
• Push(S, x)
• Pop(S)
• Both push and pop are constant time operations O(1)
• So, n push and pop operations take O(n) time
Stack example
• MULTIPOP(S, k) pops the top k objects. If k > size of stack, it empties the
stack and stops (k is a positive integer)

• Running time of MULTIPOP?

• Proportional to the minimum of (k, size_of_stack)
• Consider a sequence of n push, pop and MULTIPOP
operations on an initially empty stack

• Worst case for a single operation is O(n), as MULTIPOP can be of order n.

• So n operations would be O(n^2), assuming we have n MULTIPOP
operations

• This is not a tight analysis!

• You can pop an object only after you have pushed it!

• What is the maximum number of pushes? n, as there are only n operations.

• If you call MULTIPOP n times, the first call empties the stack and the
rest of the calls are O(1) each

• If pop, multipop and push are all combined, the total cost is O(n)

• So, the average or amortized cost of each of the n operations is O(n)/n, i.e. O(1).
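The aggregate argument can be checked empirically with a small simulation. The sketch below is an illustration under stated assumptions: the class `CountingStack`, the helper `total_cost`, and the cost model (one unit per object pushed or popped, and at least one unit per call) are all choices made for this example.

```python
import random


class CountingStack:
    """Stack that counts elementary steps so the total cost can be inspected."""

    def __init__(self):
        self.items = []
        self.cost = 0  # total units of work performed so far

    def push(self, x):
        self.items.append(x)
        self.cost += 1

    def pop(self):
        self.cost += 1  # charged even if the stack is empty
        return self.items.pop() if self.items else None

    def multipop(self, k):
        popped = 0
        while self.items and popped < k:
            self.items.pop()
            popped += 1
        self.cost += max(popped, 1)  # at least 1 unit for the call itself
        return popped


def total_cost(n, seed=0):
    """Run n random push/pop/multipop operations on an empty stack."""
    rng = random.Random(seed)
    stack = CountingStack()
    for _ in range(n):
        op = rng.choice(["push", "pop", "multipop"])
        if op == "push":
            stack.push(0)
        elif op == "pop":
            stack.pop()
        else:
            stack.multipop(rng.randint(1, n))
    return stack.cost


n = 1000
print(total_cost(n) <= 2 * n)  # True
```

Under this cost model the total can never exceed 2n: every operation pays one unit for the call, and the extra units paid by MULTIPOP are bounded by the number of objects ever pushed, which is at most n.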


Incrementing a k bit binary counter example
Counter values under successive increments (k = 8):
00000000  00000001  00000010  00000011
00000100  00000101  00000110  00000111
00001000  00001001  00001010  00001011
00001100  00001101  00001110  00001111
00010000

INCREMENT(A)
    i = 0
    while (i < A.length and A[i] == 1)
        A[i] = 0
        i++
    if (i < A.length)
        A[i] = 1

The cost of each INCREMENT operation is linear in the number of bits flipped.
In the worst case, 01111111 is incremented to 10000000 for k = 8.

So, for n operations, we have O(n*k)


Consider a sequence of n INCREMENT operations
A[0] flips every time, i.e. n times
A[1] flips n/2 times
A[2] flips n/4 times
...
A[i] flips n/2^i times

• So total number of flips = n + n/2 + n/4 + ... + n/2^(k-1) < 2n

Thus we can see that the total cost of n operations is O(n), so the
amortized cost of each operation is O(n)/n, which is O(1)
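The bound on the total number of flips can be verified by counting them directly. The sketch below follows the INCREMENT pseudocode, with A[0] as the least significant bit; the function names `increment` and `total_flips` are assumptions made for this example.

```python
def increment(bits: list[int]) -> int:
    """Increment the counter in place (bits[0] is the least significant bit)
    and return the number of bits flipped."""
    flips = 0
    i = 0
    # Turn trailing 1s into 0s.
    while i < len(bits) and bits[i] == 1:
        bits[i] = 0
        flips += 1
        i += 1
    # Set the first 0 bit to 1, unless the counter overflowed.
    if i < len(bits):
        bits[i] = 1
        flips += 1
    return flips


def total_flips(n: int, k: int = 8) -> int:
    """Total number of bit flips over n increments of a k-bit counter starting at 0."""
    bits = [0] * k
    return sum(increment(bits) for _ in range(n))


print(total_flips(16))  # 31, which is less than 2 * 16
```

For n = 16 the individual bits flip 16, 8, 4, 2 and 1 times, giving 31 flips in total, in line with the sum above.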
Amortized analysis of KMP
What is the worst case for computelpsArray?
• Pattern abcde has LPS array 00000
• i is strictly less than m
• When len is decremented, its new value is strictly less than its current value
• len can never be greater than i
• And len never becomes negative!
• So len can be decremented only as much as i has been incremented!
• If i can be incremented m times, len can be decremented at most m times
• In the worst case, m increments and m decrements mean O(2m), which is O(m),
i.e. the loop runs at most 2m times

• Exactly parallel argument for KMPSearch
• New value of j is strictly less than its current value
• j is never greater than i
• j never becomes negative
• So j can be decremented only as much as i is incremented!
• In the worst case, n increments and n decrements mean O(2n),
i.e. the loop runs at most 2n times

So KMP algorithm = O(2m+2n) = O(m+n), i.e. linear in the
size of the text string and the pattern string
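The 2m bound on the loop in computelpsArray can be observed by counting iterations. The sketch below is an assumed reconstruction of that function, since the slides do not show its code; the variable `length` stands in for `len` (which is a Python built-in), and the name `compute_lps_counted` is chosen for this example.

```python
def compute_lps_counted(pattern: str) -> tuple[list[int], int]:
    """Return the LPS array together with the number of loop iterations."""
    m = len(pattern)
    lps = [0] * m
    length = 0  # plays the role of `len` in the analysis
    i = 1
    iterations = 0
    while i < m:
        iterations += 1
        if pattern[i] == pattern[length]:
            # i and length are incremented together.
            length += 1
            lps[i] = length
            i += 1
        elif length > 0:
            # length strictly decreases and never goes below 0.
            length = lps[length - 1]
        else:
            i += 1
    return lps, iterations


for p in ["abcde", "aaaab", "ababaca", "aabaabaaab"]:
    lps, iterations = compute_lps_counted(p)
    print(p, lps, iterations, iterations <= 2 * len(p))
```

Every iteration either increments i or decrements `length`, and `length` cannot be decremented more often than it was incremented, so the count stays within 2m for any pattern.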
Thank you!
Any Questions?
