Professional Documents
Culture Documents
07 Read Mapping
07 Read Mapping
07 Read Mapping
• Input:
– Pattern p = p1…pm of length m
– Text t = t1…tn of length n
PatternMatching(p,t)
1 m length of pattern p
2 n length of text t
3 for i 1 to (n – m + 1)
4 if ti…ti+m-1 = p
5 output i
• AGCCGCATCT
• GCAT
• AGCCGCATCT
• GCAT
• AGCCGCATCT
• GCAT
• AGCCGCATCT
• GCAT
• AGCCGCATCT
• GCAT
• AGCCGCATCT
• GCAT
• AGCCGCATCT
• GCAT
• AGCAGCTAGCTAGT
• AGCTAGT
Knuth-Morris-Pratt Algorithm
• Linear time algorithm
• Avoids comparisons with positions in the text that
we have already checked
• AGCAGCTAGCTAGT
• AGCTAGT
Knuth-Morris-Pratt Algorithm
Pattern Matched so far
AGCTAGT AGCTAG
Proper prefix Proper suffix
ATCATG
TCATG
CATG Keyword Suffix
ATG Tree Tree
TG
G
Adapted from slides available at
www.bioalgorithms.info
Suffix Trees: Runtime
• Say the length of the text is n.
• Construction of the keyword tree takes O(n2) time.
• Constructing the suffix tree takes O(n) time.
• So our method takes O(n2 + n) = O(n2) time.
• However, there exists a method of calculating the suffix tree in O(n) time
without needing the keyword tree for the suffixes.
ATCATG
TCATG
CATG Keyword Suffix
ATG Tree Tree
TG
G
Adapted from slides available at
www.bioalgorithms.info
Pattern Matching with Suffix Trees
• To find any pattern in a text:
1. Build suffix tree for text of length n—O(n) time
2. Thread the pattern of length m through the suffix tree of
the text—O(m) time.
“c”
“gc” does not
occur in the text
“c”
“gc” does not
occur in the text
Substitution
“a” “a”
“a”
“a” “a”
“c”