
BLOSUM SCORING MATRIX

Tuesday, November 8th, 2011 Amity Institute of Biotechnology Amity University Rajasthan Jaipur (Rajasthan)

Presented by: Nihit Aggarwal, B.Tech. Bioinformatics, Semester V

Development and need for BLOSUMs


PAM matrices are derived from sequences with at least 85% identity.
Alignments are usually performed on sequences with less similarity.
Henikoff & Henikoff (1992) developed a scoring system based on more diverse sequences.
BLOSUM: Blocks Substitution Matrix.
Blocks are defined as ungapped regions of aligned amino acids (AAs) from related proteins.

Statistics of occurrence of AA pairs obtained


As with PAM, the frequencies of co-occurrence of AA pairs and of individual AAs are used to derive an odds ratio.
BLOSUM matrices exist for different evolutionary distances.
Unlike PAM matrices, they cannot be derived directly from one original matrix.
Each scoring matrix is derived from blocks with a different level of identity.

DIFFERENCES B/W PAM & BLOSUM


PAM is based on predictions of mutations as proteins diverge from a common ancestor (an explicit evolutionary model).
BLOSUM is based on common regions (BLOCKS) in protein families.
BLOSUM is better designed to find conserved domains.
BLOSUM uses a much larger data set than the PAM matrix.
BLOSUM matrices with a small percentage correspond to PAM matrices with large evolutionary distances.
BLOSUM64 is roughly equivalent to PAM 120.
PAM = Percent Accepted Mutations: 1500 changes in 71 groups with > 85% similarity.
BLOSUM = Blocks Substitution Matrix: 2000 blocks from 500 families.

PAM matrices are based on global alignments of closely related proteins. The PAM1 is the matrix calculated from comparisons of sequences with no more than 1% divergence. Other PAM matrices are extrapolated from PAM1.

BLOSUM matrices are based on local alignments. BLOSUM 62 is a matrix calculated from comparisons of sequences with at least 62% identity in the blocks. All BLOSUM matrices are based on observed alignments; they are not extrapolated from comparisons of closely related proteins.

ODDS & LOG ODDS


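Odds score: The ratio of the probabilities of two events or outcomes. In sequence alignment, it is the observed frequency of an aligned AA pair divided by the frequency expected by chance.

Log-odds score: The logarithm of the odds score. In sequence alignment, the log-odds scores of the aligned positions are added together; mismatches usually carry negative scores, and gap penalties are subtracted.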
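A minimal sketch of the two definitions above. The frequencies and the gap penalty are made-up numbers for illustration, and the function name `log_odds_score` is hypothetical. The scale factor of 2 assumes half-bit units, the convention used for BLOSUM62.

```python
import math

def log_odds_score(observed, expected, scale=2.0):
    """Rounded log-odds score; scale=2.0 means half-bit units (an assumption)."""
    return round(scale * math.log2(observed / expected))

# Hypothetical frequencies, for illustration only.
match_score = log_odds_score(0.02, 0.01)      # pair seen twice as often as chance
mismatch_score = log_odds_score(0.005, 0.01)  # pair seen half as often as chance

# Scores of aligned positions are added; a gap penalty is subtracted.
gap_penalty = 4  # hypothetical value
total = match_score + match_score + mismatch_score - gap_penalty
```

Working with logarithms is what makes the scores additive: multiplying odds along an alignment becomes a simple sum.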

CONSTRUCTING THE MATRIX


BLOSUM Scoring Matrices
Overall procedure to develop a BLOSUM X matrix:
1. Collect a set of multiple alignments
2. Find the Blocks (no gaps)
3. Group segments of Blocks with X% identity
4. Count the occurrence of all pairs of AAs
5. Employ these counts to obtain the (log) odds ratio
Most common BLOSUM matrices are 45, 62 & 80
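The counting and log-odds steps of this procedure can be sketched roughly as follows. This is a toy illustration, not the Henikoffs' actual program: it skips the grouping by X% identity, the function names `pair_counts` and `log_odds_matrix` are hypothetical, and half-bit units are assumed. The one-column block with nine A and one S is only a small worked example.

```python
import math
from collections import Counter
from itertools import combinations

def pair_counts(block):
    """Count unordered residue pairs in every column of an ungapped block."""
    counts = Counter()
    for column in zip(*block):
        for a, b in combinations(column, 2):
            counts[tuple(sorted((a, b)))] += 1
    return counts

def log_odds_matrix(block, scale=2.0):
    """Turn pair counts into rounded log-odds scores (half-bit units assumed)."""
    counts = pair_counts(block)
    total = sum(counts.values())
    q = {pair: n / total for pair, n in counts.items()}  # observed pair frequencies
    p = Counter()                                        # individual AA frequencies
    for (a, b), freq in q.items():
        if a == b:
            p[a] += freq
        else:
            p[a] += freq / 2
            p[b] += freq / 2
    scores = {}
    for (a, b), freq in q.items():
        # Expected frequency by chance: p_a * p_a, or 2 * p_a * p_b for a != b
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(scale * math.log2(freq / expected))
    return scores

# Toy block: one column holding nine A residues and one S residue.
block = ["A"] * 9 + ["S"]
counts = pair_counts(block)   # 36 A-A pairs and 9 A-S pairs
scores = log_odds_matrix(block)
```

A real matrix pools counts over thousands of blocks, so a single column like this one says very little on its own.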

BLOSUM62
Captures mutation rates between divergent proteins.
Why is BLOSUM62 called BLOSUM62? Because, within each block, sequences that shared at least 62% identity with ANY other member of their group were clustered together and weighted as 1 sequence before the AA pairs were counted.
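A rough sketch of this grouping step, assuming ungapped segments of equal length. The function names and the three toy segments are hypothetical. A segment joins a cluster when it reaches the threshold against any one member, so clusters grow transitively.

```python
def percent_identity(a, b):
    """Percentage of positions at which two equal-length segments agree."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

def cluster_segments(segments, threshold=62.0):
    """Group segments that are >= threshold% identical to ANY cluster member."""
    clusters = []
    for seg in segments:
        linked, unlinked = [], []
        for cluster in clusters:
            hit = any(percent_identity(seg, m) >= threshold for m in cluster)
            (linked if hit else unlinked).append(cluster)
        # Merge the new segment with every cluster it links to.
        merged = [seg] + [m for cluster in linked for m in cluster]
        clusters = unlinked + [merged]
    return clusters

# Toy segments: the first two are 90% identical, the third is unrelated.
segments = ["AAAAAAAAAA", "AAAAAAAAAC", "CCCCCCCCCC"]
clusters = cluster_segments(segments)
# Each member of a cluster of size n then counts with weight 1/n.
weights = {seg: 1.0 / len(c) for c in clusters for seg in c}
```

Down-weighting clustered sequences keeps many near-identical copies from dominating the pair counts.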

A   4
R  -1   5
N  -2   0   6
D  -2  -2   1   6
C   0  -3  -3  -3   9
Q  -1   1   0   0  -3   5
E  -1   0   0   2  -4   2   5
G   0  -2   0  -1  -3  -2  -2   6
H  -2   0   1  -1  -3   0   0  -2   8
I  -1  -3  -3  -3  -1  -3  -3  -4  -3   4
L  -1  -2  -3  -4  -1  -2  -3  -4  -3   2   4
K  -1   2   0  -1  -3   1   1  -2  -1  -3  -2   5
M  -1  -1  -2  -3  -1   0  -2  -3  -2   1   2  -1   5
F  -2  -3  -3  -3  -2  -3  -3  -3  -1   0   0  -3   0   6
P  -1  -2  -2  -1  -3  -1  -1  -2  -2  -3  -3  -1  -2  -4   7
S   1  -1   1   0  -1   0   0   0  -1  -2  -2   0  -1  -2  -1   4
T   0  -1   0  -1  -1  -1  -1  -2  -2  -1  -1  -1  -1  -2  -1   1   5
W  -3  -3  -4  -4  -2  -2  -3  -2  -2  -3  -2  -3  -1   1  -4  -3  -2  11
Y  -2  -2  -2  -3  -2  -1  -2  -3   2  -1  -1  -2  -1   3  -3  -2  -2   2   7
V   0  -3  -3  -3  -1  -2  -2  -3  -3   3   1  -2   1  -1  -2  -2   0  -3  -1   4
    A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V

Blosum62 scoring matrix

BLOSUM62
The idea of BLOSUM matrices is to get a better measure of the differences between two proteins, specifically for more distantly related proteins.

Similar AAs have high scores
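To illustrate, here is a small sketch that scores an ungapped alignment by adding up matrix entries. Only a handful of BLOSUM62 entries are included, and the helper names and the toy sequences are hypothetical.

```python
# A few BLOSUM62 entries, stored once per unordered pair.
BLOSUM62_SUBSET = {
    ("A", "A"): 4,
    ("W", "W"): 11,
    ("I", "L"): 2,   # chemically similar residues: positive score
    ("D", "E"): 2,   # chemically similar residues: positive score
    ("G", "W"): -2,  # dissimilar residues: negative score
}

def pair_score(a, b, matrix=BLOSUM62_SUBSET):
    """Look up a symmetric substitution score for residues a and b."""
    key = (a, b) if (a, b) in matrix else (b, a)
    return matrix[key]

def ungapped_score(seq1, seq2, matrix=BLOSUM62_SUBSET):
    """Sum the substitution scores over an ungapped alignment."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(a, b, matrix) for a, b in zip(seq1, seq2))

similar = ungapped_score("AWLD", "AWIE")     # 4 + 11 + 2 + 2
dissimilar = ungapped_score("AWLD", "AGIE")  # 4 - 2 + 2 + 2
```

Replacing W with G lowers the total, while the conservative changes L to I and D to E still add to it.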

With BLOSUM45 we found related and divergent sequences. With PAM30 we found only related sequences.
[Figure panels: PAM30, BLOSUM45]

Using relative entropy, it can be shown that PAM250 corresponds to BLOSUM45, PAM160 to BLOSUM62, and PAM120 to BLOSUM80.
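For reference, the relative entropy of a scoring matrix is commonly written as below. The symbols are assumptions not used in the slides: q_ij is the observed frequency of the AA pair (i, j), e_ij the frequency expected by chance, and s_ij the unrounded log-odds score in bits. Matrices with similar H carry a similar amount of information per aligned position, which is the basis for pairing a PAM matrix with a BLOSUM matrix.

```latex
H = \sum_{i \le j} q_{ij}\, s_{ij},
\qquad
s_{ij} = \log_2 \frac{q_{ij}}{e_{ij}}
```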

PAM VS. BLOSUM


PAM100 = BLOSUM90
PAM120 = BLOSUM80
PAM160 = BLOSUM60
PAM200 = BLOSUM52
PAM250 = BLOSUM45

PAM120 for general use
PAM60 for close relations
PAM250 for distant relations
[Figure label: More distant sequences]


BLOSUM62 for general use
BLOSUM80 for close relations
BLOSUM45 for distant relations


CONCLUSION
BLOSUM matrices are based on local alignments. BLOSUM stands for Blocks Substitution Matrix and is used for diverged sequences. BLOSUM62, BLOSUM45 and BLOSUM80 are matrices calculated from blocks in which sequences sharing at least 62%, 45% and 80% identity, respectively, are clustered.

THANK YOU !!!
