
Bioinformatics (BIO213)

Session 12
Finding distantly related proteins: PSI-BLAST and
DELTA-BLAST
• Many homologous proteins share only limited sequence identity, so in pairwise alignments they
may show no apparent similarity.
• Such proteins may adopt the same 3D structures.

• Scoring matrices differ in their sensitivity to protein matches at various evolutionary distances.
• E.g., comparing the PAM250 and PAM10 log-odds matrices, PAM250 provides a superior scoring system for
the detection of distantly related proteins.
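To see why the choice of matrix matters, here is a minimal Python sketch of a log-odds score. All frequencies are made-up illustrative numbers, not values from real PAM tables, and the names `log_odds`, `q_close` and `q_distant` are hypothetical. It only shows the general idea: a mismatch that is very rare among close relatives is penalized heavily, while the same mismatch is penalized mildly in a matrix built for distant relatives.

```python
import math

def log_odds(target_freq, bg_a, bg_b, scale=10.0):
    """Scaled log-odds: scale * log10(observed pair frequency / frequency expected by chance)."""
    return scale * math.log10(target_freq / (bg_a * bg_b))

# Hypothetical numbers for one mismatched residue pair (NOT from real PAM tables).
bg_a, bg_b = 0.05, 0.05   # assumed background frequencies of the two residues
q_close = 0.0001          # assumed pair frequency among closely related proteins
q_distant = 0.002         # assumed pair frequency among distantly related proteins

score_close = round(log_odds(q_close, bg_a, bg_b))
score_distant = round(log_odds(q_distant, bg_a, bg_b))
print("close-distance matrix score:", score_close)
print("long-distance matrix score: ", score_distant)
```

With these assumed inputs, the mismatch scores far lower under the close-distance model, so a matrix tuned for close relatives would reject alignments that a long-distance matrix can still tolerate.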

• In a database search with BLAST, we can adjust the scoring matrix to try to detect distantly
related proteins, but only to some extent.
• Many proteins in a database are too distantly related to a query to be detected using a standard
BLASTP search.
• In many other cases, protein matches are detected but are so distant that the inference of
homology is unclear.
PSI-BLAST: a specialized kind of BLAST search
often more sensitive than a regular BLAST search

• We would like an algorithm that can assign statistical significance to
distantly related proteins that are true positives.
• It should minimize the number of false positive and false negative results.
• False positives: reporting two proteins as related when they are not.
• False negatives: failing to report that two proteins are significantly related.
• PSI-BLAST: Position-Specific Iterated BLAST
• PSI‐BLAST can look deeper into the database to find distantly related
proteins that match your protein of interest.
Illustration of PSI-BLAST:
[Figure: PSI-BLAST illustration; label: "Weighted by evolutionary distance"]

PSSM generation:
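A PSSM (position-specific scoring matrix) holds, for each alignment column, a log-odds score for every residue. The following is a minimal Python sketch, assuming a uniform background frequency, simple add-one pseudocounts, and no sequence weighting; PSI-BLAST itself weights sequences and uses a more elaborate pseudocount scheme. The toy alignment and the function names `build_pssm` and `score` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(alignment, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> log2-odds score."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequencies
    pssm = []
    for col in range(len(alignment[0])):
        # Count residues in this column, ignoring gaps and unknown symbols.
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column_scores = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column_scores[aa] = math.log2(freq / background)
        pssm.append(column_scores)
    return pssm

def score(pssm, seq):
    """Sum the position-specific scores of an ungapped sequence of the same length."""
    return sum(col[aa] for col, aa in zip(pssm, seq))

# Hypothetical ungapped alignment of hits from a first search round.
toy_alignment = ["VHLTP", "VHLSA", "VQLSG", "VHFTA"]
pssm = build_pssm(toy_alignment)

print(round(score(pssm, "VHLTP"), 2))  # resembles the alignment: high score
print(round(score(pssm, "WWWWW"), 2))  # unrelated: low score
```

In an iterative search, the matrix built from one round's significant hits would replace the general scoring matrix as the query profile for the next round.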
DELTA-BLAST
Domain Enhanced Lookup Time Accelerated BLAST

Works when you are looking for proteins with
a known protein domain
Let's try PSI-BLAST and DELTA-BLAST!
• Human beta globin protein: NP_000509.1
MegaBLAST: a specialized tool for rapidly searching
long DNA queries against genomic DNA databases.

Here, 50 kilobases of DNA spanning the human beta globin genes were used as a query, with the search
restricted to nonredundant sequences of Pongo pygmaeus.
Matches are to orangutan globin genes and pseudogenes (area 1) as well as to repetitive sequences (area 2).
Assignment:
• Read about MegaBLAST.
• Run MegaBLAST using genes encoding M-channel proteins (KCNQs) against the
nr database, restricting the search to Mus musculus.
• You will have to find the accession numbers of the genes yourselves.
• Take the top hit of MegaBLAST, translate it, and search the nr database with
tblastn.
• Take the hits of tblastn and perform a BLASTX search on vertebrates (excluding
mouse, rat and human) and on invertebrates.
• Which are the top hits from each of these categories?
• What are the query cover (%), identities and E-values of these hits?
• Provide the aligned hits from this exercise.
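The translation step in the assignment can be illustrated with a minimal Python sketch. It assumes the standard genetic code, the forward strand, and reading frame 1 only; in practice you would use an existing translation tool and check all reading frames. The function name `translate` and the short example sequence are for illustration only.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate a DNA string in frame 1; '*' marks a stop, 'X' an unknown codon."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)

# Short illustrative sequence (not a database record).
print(translate("ATGGTGCATCTGACTCCTGAGGAG"))
```

The resulting protein string is what would be pasted into the tblastn query box.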
Other BLAST programs!
• BLAST-Like Alignment Tool (BLAT)
• MegaBLAST
• LAGAN (Limited Area Global Alignment of Nucleotides)
Next class
• Multiple Sequence Alignments
