
Theme of the lecture: sequence comparison

Dot plot

This is the simplest way of comparing two sequences. We put one sequence horizontally and the
other vertically. In the grid between them, we make a dot (or some other mark) wherever the
two sequences have the same symbol (nucleotide or amino acid). This is very simple and fast, and
gives a good visual overview, which is useful, for example, when comparing genomes. See the dot
plot for the two bacterial genomes L. acidophilus and L. bulgaricus. A disadvantage is that the
dot plot does not identify the best possible alignment.
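
The construction described above can be sketched in a few lines of Python. This is only a minimal illustration, not a genome-scale tool: the function names `dot_plot` and `render` and the two short example sequences are assumptions made for this sketch, and real dot plot programs usually compare short windows rather than single symbols to reduce noise.

```python
def dot_plot(seq_x, seq_y):
    """Return the dot plot as a list of strings, one row per symbol of seq_y.

    seq_x runs horizontally, seq_y vertically. A '*' marks a cell where the
    two sequences carry the same symbol; '.' marks every other cell.
    """
    rows = []
    for symbol_y in seq_y:
        rows.append("".join("*" if symbol_x == symbol_y else "." for symbol_x in seq_x))
    return rows


def render(seq_x, seq_y):
    """Format the dot plot with seq_x as the header row and seq_y as row labels."""
    lines = ["  " + seq_x]
    for symbol_y, row in zip(seq_y, dot_plot(seq_x, seq_y)):
        lines.append(symbol_y + " " + row)
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical example sequences; similar regions show up as diagonals.
    print(render("GATTACA", "GATCACA"))
```

Runs of marks along a diagonal correspond to stretches where the two sequences agree without gaps, which is what makes the plot useful as a visual overview.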

Sequence alignments

- Alignment is a way of arranging DNA, RNA and protein sequences so that similar regions are
  highlighted
- Alignments are used for finding conserved and important positions, since these will be visible
  in the alignment
- Alignments are also used when doing homolog search in nucleotide or amino acid sequence
  databases

Homologs, paralogs and orthologs

Homologs are genes/proteins that have evolved from the same ancestral gene/protein. Paralogs
and orthologs are two different types of homologs. Paralogs arise from gene duplication. For
example, the human alpha-chain and human beta-chain hemoglobins are paralogs. They are
present in the same species. Orthologs arise from speciation events. For example, the human
alpha-chain hemoglobin and the chimpanzee alpha-chain hemoglobin are orthologs. They are
present in different species.

Examples of global pairwise protein alignments

The orthologs HBA_HUMAN and HBA_PANTR were aligned globally and found to be 100% identical,
so the human and chimpanzee alpha-chain hemoglobins have exactly the same sequence. Then the
paralogs HBA_HUMAN and HBB_HUMAN were aligned. They were found to be identical at 43.6% of
the positions in the alignment. Matches between identical amino acids are marked by two dots,
matches between similar amino acids are marked by one dot, and positions that pair non-similar
amino acids (i.e. mismatches) are not marked by any dots. Note that gap symbols have been
added in a few places.
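
The identity figures quoted above are the fraction of alignment positions at which both sequences carry the same symbol. A minimal sketch of that calculation follows. It assumes the alignment is given as two equal-length strings with `-` as the gap symbol; the function name `percent_identity` and the toy sequences are illustrative only and are not the hemoglobin data. Alignment programs differ in what they use as the denominator, and this sketch uses the full alignment length, including gap positions.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns holding the same non-gap symbol.

    Both arguments are rows of one alignment, so they must have the same
    length. The denominator is the alignment length (gap columns included).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # Toy alignment with one gap in each row (hypothetical data).
    print(round(percent_identity("ACGT-A", "AC-TGA"), 1))
```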

Examples of local and global alignments

The human and frog lipocalin proteins were aligned both globally and locally. In the local
alignment, the identity increased slightly, from 26.7% to 27.6%. As marked on the next slide,
the local alignment identifies a conserved subregion that was not visible in the global
alignment. It starts with TADG, beginning at position 96 in the human sequence and at
position 66 in the frog sequence. In general, local alignments can identify conserved regions
that are missed by global alignment.

Types of alignments

Alignments can be classified according to global vs. local, pairwise vs. multiple, nucleotide vs.
protein, and exact vs. heuristic. Additional categories are possible.

Ungapped and gapped alignments

In ungapped alignments we can only shift complete sequences to the left and right and compute
scores. In gapped alignments, we can add gap symbols, which represent two types of point
mutation, namely insertion and deletion. A simple scoring scheme takes only identity into
account, setting for example +1 for a match, 0 for a mismatch and -1 for a gap symbol. This
simple scoring is only adequate for nucleotide sequences; for proteins, the score should also
reflect how similar two different amino acids are.
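
To make the ungapped case concrete, the sketch below slides one sequence along the other and scores every shift with the identity scheme from the text (+1 for a match, 0 for a mismatch; no gap symbols are used, so the -1 never applies). The function names `ungapped_score` and `best_ungapped`, the offset convention and the example sequences are assumptions made for this illustration, and both sequences are assumed to be non-empty.

```python
def ungapped_score(a, b, offset, match=1, mismatch=0):
    """Score b placed against a so that b[j] is paired with a[j + offset].

    Symbols of b that fall outside a are left unpaired and contribute nothing.
    """
    score = 0
    for j, symbol_b in enumerate(b):
        i = j + offset
        if 0 <= i < len(a):
            score += match if a[i] == symbol_b else mismatch
    return score


def best_ungapped(a, b):
    """Try every shift with at least one overlapping position.

    Returns (best_score, offset); ties are broken in favor of the larger offset.
    """
    offsets = range(-(len(b) - 1), len(a))
    return max((ungapped_score(a, b, d), d) for d in offsets)


if __name__ == "__main__":
    # Hypothetical example: the short sequence fits best two positions in.
    print(best_ungapped("GATTACA", "TTAC"))
```

Because only whole-sequence shifts are tried, a single insertion or deletion in one sequence breaks the match for everything after it, which is the limitation that gapped alignment addresses.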

Dynamic programming

The standard pairwise alignment algorithms, Needleman-Wunsch (global) and Smith-Waterman
(local), both use the same strategy: dynamic programming. This means breaking down a large
problem (here, finding the best of all possible alignments) into smaller sub-problems (here,
choosing between a gap and a match or mismatch at one position). This strategy guarantees
finding the highest-scoring alignment under the chosen scoring scheme, but has the
disadvantage of being too slow for multiple alignment.
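
As a rough illustration of this strategy, the sketch below fills a score table in which each cell holds the best score for aligning two prefixes, chosen from three sub-problems: pair the two current symbols, or put a gap in either sequence. It computes scores only, without the traceback that recovers the alignment itself. The single function `alignment_score` with a `local` flag is an arrangement assumed for this sketch, not the layout of any particular tool, and the gap cost is a simple per-symbol penalty. With `local=False` it follows the Needleman-Wunsch recurrence; with `local=True` it follows the Smith-Waterman variant, where cell scores are never allowed to drop below zero and the best cell anywhere in the table is reported.

```python
def alignment_score(a, b, local=False, match=1, mismatch=0, gap=-1):
    """Best pairwise alignment score by dynamic programming (score only).

    score[i][j] is the best score for aligning a[:i] with b[:j]
    (global) or for an alignment ending at a[i-1], b[j-1] (local).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # Global alignment: leading gaps must be paid for.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(
                score[i - 1][j - 1] + pair,  # pair a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # a[i-1] against a gap
                score[i][j - 1] + gap,       # b[j-1] against a gap
            )
            if local:
                cell = max(cell, 0)          # a local alignment may restart here
            score[i][j] = cell
            best = max(best, cell)

    return best if local else score[-1][-1]


if __name__ == "__main__":
    # Hypothetical sequences sharing the core GATTACA.
    print(alignment_score("ACGT", "AGT"))
    print(alignment_score("TTTGATTACATTT", "CCGATTACACC", local=True, mismatch=-1))
```

The table has one cell per pair of positions, so time and memory grow with the product of the two sequence lengths. Extending the same table to many sequences at once adds one dimension per sequence, which is why this exact approach becomes impractical for multiple alignment.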
