Phylogenetic Analysis
Evolution
The theory of evolution is the
foundation upon which all of
modern biology is built.
From anatomy to behavior to genomics, the
scientific method requires an appreciation
of changes in organisms over time.
By looking at gene sequences, one can see
evolution in action.
Taxonomy
The study of the relationships
between groups of organisms is
called taxonomy.
Taxonomy is the art of classifying
things into groups. It was established as a
mainstream scientific field by Carolus
Linnaeus (1707-1778).
Phylogeny
Branching history of evolutionary lineages
New branches arise via speciation
Speciation occurs when gene flow is
severed between populations
Phylogenetic relationships depicted as a
tree
Phylogenetic data
Morphology
Secondary chemistry
Cytology
Allele frequencies
Protein sequences
Molecular data
Restriction sites
DNA sequences
Cladistic Methods
Cladistics was developed by Willi Hennig, a
German entomologist, in 1950.
Phylogenetic systematics (cladistics) is a
method of taxonomic classification of
organisms based on their evolutionary
history.
Cladistic methods construct a tree
(cladogram) by considering the various
possible pathways of evolution and choosing
the best possible tree from among them.
Phenetic Methods
A phylogram is a tree with
branches that are proportional to
evolutionary distances.
Mutations
Advantageous or disadvantageous
In somatic cells or germline cells
Genetic changes
affecting the germline
(mutation)
Base substitutions
Insertions
Deletions
Exon shuffling
Transposition
Phylogenetic Tree
Terminology
Distance scale: a scale that represents the
proportion of differences between sequences
(e.g. 0.1 means a 10% difference between
two sequences)
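As a minimal sketch of what a value on the distance scale means, the proportion of differing positions between two aligned sequences can be computed directly. The function name `p_distance`, the two example sequences, and the choice to skip gapped columns are assumptions made for illustration.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: columns with a gap in either sequence are ignored.
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(a != b for a, b in pairs) / len(pairs)


# Hypothetical sequences: 1 difference in 10 positions.
seq1 = "ACGTACGTAC"
seq2 = "ACGTACGTAA"
print(p_distance(seq1, seq2))  # 0.1
```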
Molecular Clocks
A molecular clock is a concept based on the
assumption that mutations occur at some
regular, more or less predictable rate. Thus,
if a certain amount of time has passed, a
certain number of mutations can be
expected to have occurred.
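The clock assumption can be sketched numerically. The rate used below is a hypothetical value, not one taken from these slides, and the factor of 2 reflects the usual assumption that both lineages accumulate changes independently after they split.

```python
def expected_substitutions(rate_per_site_per_year: float, years: float) -> float:
    """Expected substitutions per site along one lineage under a strict clock."""
    return rate_per_site_per_year * years


def divergence_time(distance: float, rate_per_site_per_year: float) -> float:
    """Time since two sequences split, given their pairwise distance per site."""
    # Both lineages change after the split, so the distance grows at 2 * rate.
    return distance / (2.0 * rate_per_site_per_year)


rate = 1e-9  # hypothetical substitutions per site per year
print(expected_substitutions(rate, 10_000_000))
print(divergence_time(0.1, rate))
```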
Phylogenetic Tree
Types of Trees
Rooted trees
A rooted tree implies the existence of an
actual common ancestor and defines the
evolutionary paths leading to the
development of each organism.
It provides an indication of the direction of
the evolutionary process, defining ancestral
and derived characters or species.
[Figure: rooted tree of Taxa A, B, C, and D.]
Unrooted trees
An unrooted tree shows only the
evolutionary relationships between the
organisms in the tree. It does not
imply the placement of a common ancestor in
the structure or the evolutionary path that led to
the current relationships. The direction
of the evolutionary process is not given.
[Figure: unrooted tree of Taxa A, B, C, and D.]
Phylogram
[Figure: phylogram of Taxa A, B, C, and D. Branch lengths represent genetic change; the other axis has no meaning.]
Ultrametric tree
[Figure: ultrametric tree of Taxa A, B, C, and D. The scale axis represents time.]
All show the same evolutionary relationships, or branching orders, between the taxa.
[Figure: an unrooted tree and the rooted trees (labelled 1b to 1e in the original) obtained by placing the root at different positions.]
These trees show five different evolutionary relationships among the taxa!
Rooting by outgroup:
Uses taxa (the outgroup) that are known
to fall outside of the group of interest (the
ingroup).
Requires some prior knowledge about the
relationships among the taxa.
The outgroup can be either species (e.g.,
birds to root a mammalian tree) or previous
gene duplicates (e.g., α-globins to root β-globins).
Rooting by midpoint or distance:
Roots the tree at the midway point between the
two most distant taxa in the tree, as
determined by branch lengths.
Assumes that the taxa are evolving in a clock-like manner. This assumption is built into some
of the distance-based tree-building methods.
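A minimal sketch of the midpoint idea: given the path length between every pair of taxa (the sum of branch lengths along the tree), find the most distant pair and place the root halfway between them. The path lengths below are hypothetical, and the function name `midpoint_root_position` is an assumption for illustration.

```python
def midpoint_root_position(path_lengths):
    """Return the two most distant taxa and the root's distance from each."""
    (taxon_a, taxon_b), longest = max(path_lengths.items(),
                                      key=lambda item: item[1])
    return taxon_a, taxon_b, longest / 2.0


# Hypothetical path lengths, taken from a tree with terminal branches
# A=1, B=2, C=2, D=5 and an internal branch of length 2 joining (A,B) to (C,D).
path_lengths = {
    ("A", "B"): 3.0,
    ("A", "C"): 5.0,
    ("A", "D"): 8.0,
    ("B", "C"): 6.0,
    ("B", "D"): 9.0,
    ("C", "D"): 7.0,
}

print(midpoint_root_position(path_lengths))  # ('B', 'D', 4.5)
```

Here the root falls on the branch leading to D, 4.5 units from both B and D.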
Four steps of
phylogenetic analysis
Alignment
Choosing algorithm
Tree building
Tree evaluation
Alignment
Input for phylogenetic analysis is
usually a multiple sequence alignment
Output only as good as input
Any problems in alignment lead to
false relationships
Code indels separately
Choosing algorithm
Character and Distance Data
Phenetic methods or
Distance methods
Compress sequence information into a
single number
The two sequences with the shortest distance
are considered the most closely related taxa.
Distance methods construct a tree in a
stepwise manner: the two most
similar sequences are grouped together,
then the next most similar sequences, and
so on.
Phenetic methods use the relative
numbers of similarities and differences to group
organisms in a branching hierarchy tree called a
phenogram.
In the phenogram, taxa separated by shorter distances
are grouped more closely than taxa separated by longer distances.
Distance methods are more concerned with the
relationships among data sets than with the evolutionary
pathway.
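UPGMA (Distance
Based)
Find the lowest score in the distance matrix (here the lowest score is 2, between A and B) and group A and B.
Then compute the distance from the new group AB to each remaining taxon as an average.
Example:
d(AB, C) = [d(A,C) + d(B,C)] / 2
= (4 + 4) / 2
= 4
Repeat on the reduced matrix: AB, C, and DE are grouped into (AB)C and DE, and finally into (AB)C(DE).
[Figure: successive distance matrices and the resulting trees.]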
Advantages
Simple
Less time consuming
Accurate when rates of evolution are roughly constant across lineages
Disadvantages
Groups sequences purely by overall similarity
Substitution neglected
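The UPGMA procedure can be sketched in a few lines. Only d(A,B) = 2 and d(A,C) = d(B,C) = 4 come from the slides; every other distance below is a hypothetical value chosen so that the grouping order matches the one shown (AB, then DE, then (AB)C, then everything). The sketch uses the standard size-weighted average, which equals the simple average in the example for the first join.

```python
from itertools import combinations


def upgma(labels, dist):
    """Return the list of (cluster, height) merges produced by UPGMA."""
    # Each current cluster maps to the original taxa it contains.
    members = {label: [label] for label in labels}
    # Working distances between clusters, keyed by unordered pairs.
    d = {frozenset(pair): float(value) for pair, value in dist.items()}
    merges = []
    while len(members) > 1:
        # Pick the closest pair of clusters.
        x, y = min(combinations(members, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((x, y))] / 2.0
        new = (x, y)
        size_x, size_y = len(members[x]), len(members[y])
        for z in members:
            if z in (x, y):
                continue
            # Average over all taxon pairs, i.e. a size-weighted mean.
            d[frozenset((new, z))] = (
                size_x * d[frozenset((x, z))] + size_y * d[frozenset((y, z))]
            ) / (size_x + size_y)
        merged = members[x] + members[y]
        del members[x], members[y]
        members[new] = merged
        merges.append((new, height))
    return merges


# d(A,B), d(A,C), d(B,C) follow the slides; the rest are hypothetical.
distances = {
    ("A", "B"): 2, ("A", "C"): 4, ("B", "C"): 4,
    ("A", "D"): 6, ("B", "D"): 6, ("C", "D"): 6,
    ("A", "E"): 6, ("B", "E"): 6, ("C", "E"): 6, ("D", "E"): 3,
}

merges = upgma(["A", "B", "C", "D", "E"], distances)
for cluster, height in merges:
    print(cluster, height)
```

The last entry of `merges` holds the full tree as nested tuples, and each height is half the distance at which the two clusters were joined.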
Neighbour Joining
Method
Neighbor joining is also called the star
decomposition method.
The phylogenetic tree is constructed from a
star-like tree by successively grouping the OTUs
separated by the shortest branch length.
This method is well suited to datasets
whose lineages have widely varying
rates of evolution.
[Table: pairwise distance matrix for the OTUs.]
Step 1: We calculate the net divergence r(i)
for each OTU from all other OTUs:
r(A) = 5+4+7+6+8 = 30
r(B) = 42
r(C) = 32
r(D) = 38
r(E) = 34
r(F) = 44
Step 2: Now we calculate a new distance
matrix, using for each pair of OTUs the
formula:
M(ij) = d(ij) - [r(i) + r(j)]/(N-2)
In the case of the pair A,B, with N = 6 OTUs:
M(AB) = d(AB) - [r(A) + r(B)]/(N-2) = 5 - (30 + 42)/4 = -13
M(ij) matrix:
      A       B       C       D       E
B   -13
C   -11.5   -11.5
D   -10     -10     -10.5
E   -10     -10     -10.5   -13
F   -10.5   -10.5   -11     -11.5   -11.5
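Steps 1 and 2 can be checked with a short sketch. The pairwise distances below are an assumption: they are back-calculated from the r(i) and M(ij) values quoted in the slides, using d(ij) = M(ij) + [r(i) + r(j)]/(N-2). The helper names are hypothetical.

```python
from itertools import combinations

taxa = ["A", "B", "C", "D", "E", "F"]

# Distances reconstructed from the r(i) and M(ij) values (an assumption).
d = {
    ("A", "B"): 5,
    ("A", "C"): 4, ("B", "C"): 7,
    ("A", "D"): 7, ("B", "D"): 10, ("C", "D"): 7,
    ("A", "E"): 6, ("B", "E"): 9, ("C", "E"): 6, ("D", "E"): 5,
    ("A", "F"): 8, ("B", "F"): 11, ("C", "F"): 8, ("D", "F"): 9, ("E", "F"): 8,
}


def dist(i, j):
    """Look up a distance regardless of the order of the pair."""
    return d[(i, j)] if (i, j) in d else d[(j, i)]


def net_divergence(otus):
    """Step 1: r(i) is the sum of distances from OTU i to all other OTUs."""
    return {i: sum(dist(i, j) for j in otus if j != i) for i in otus}


def rate_corrected(otus, r):
    """Step 2: M(ij) = d(ij) - [r(i) + r(j)] / (N - 2)."""
    n = len(otus)
    return {(i, j): dist(i, j) - (r[i] + r[j]) / (n - 2)
            for i, j in combinations(otus, 2)}


r = net_divergence(taxa)
m = rate_corrected(taxa, r)
smallest = min(m.values())
neighbors = [pair for pair, value in m.items() if value == smallest]

print(r)
print(neighbors)  # [('A', 'B'), ('D', 'E')]
```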
Step 3: Now we choose as neighbors those two
OTUs for which M(ij) is the smallest. These are A
and B, and D and E. Let's take A and B as
neighbors and form a new node called U. Now
we calculate the branch lengths from the internal
node U to the external OTUs A and B:
S(AU) = d(AB)/2 + [r(A) - r(B)]/[2(N-2)] = 1
S(BU) = d(AB) - S(AU) = 4
Step 4: Now we define new distances from
U to each other terminal node:
d(CU) = [d(AC) + d(BC) - d(AB)]/2 = 3
d(DU) = [d(AD) + d(BD) - d(AB)]/2 = 6
d(EU) = [d(AE) + d(BE) - d(AB)]/2 = 5
d(FU) = [d(AF) + d(BF) - d(AB)]/2 = 7
and we create a new matrix:
[Table: new distance matrix for U, C, D, E, and F.]
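Steps 3 and 4 can be sketched the same way. As an assumption, the pairwise distances are back-calculated from the r(i) and M(ij) values quoted in the slides; the function name `join_pair` is hypothetical.

```python
taxa = ["A", "B", "C", "D", "E", "F"]

# Distances reconstructed from the r(i) and M(ij) values (an assumption).
d = {
    ("A", "B"): 5,
    ("A", "C"): 4, ("B", "C"): 7,
    ("A", "D"): 7, ("B", "D"): 10, ("C", "D"): 7,
    ("A", "E"): 6, ("B", "E"): 9, ("C", "E"): 6, ("D", "E"): 5,
    ("A", "F"): 8, ("B", "F"): 11, ("C", "F"): 8, ("D", "F"): 9, ("E", "F"): 8,
}


def dist(i, j):
    """Look up a distance regardless of the order of the pair."""
    return d[(i, j)] if (i, j) in d else d[(j, i)]


def join_pair(i, j, otus, node="U"):
    """Join neighbors i and j at a new node; return branch lengths and new distances."""
    n = len(otus)
    r_i = sum(dist(i, k) for k in otus if k != i)
    r_j = sum(dist(j, k) for k in otus if k != j)
    # Step 3: branch lengths from the new node to each neighbor.
    s_i = dist(i, j) / 2 + (r_i - r_j) / (2 * (n - 2))
    s_j = dist(i, j) - s_i
    # Step 4: distance from the new node to every remaining OTU.
    to_node = {k: (dist(i, k) + dist(j, k) - dist(i, j)) / 2
               for k in otus if k not in (i, j)}
    return s_i, s_j, to_node


s_au, s_bu, to_u = join_pair("A", "B", taxa)
print(s_au, s_bu)  # 1.0 4.0
print(to_u)        # {'C': 3.0, 'D': 6.0, 'E': 5.0, 'F': 7.0}
```

Repeating the four steps on the reduced matrix (U, C, D, E, F) until only two nodes remain yields the full tree.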
Advantages
Is fast and thus suited for large datasets
Permits lineages with largely different
branch lengths
Permits correction for multiple substitutions
Disadvantages
Sequence information is reduced
Gives only one possible tree
Strongly dependent on the model of
evolution used
Cladistic methods
The principle behind cladistic methods is parsimony: prefer the
hypothesis that requires the fewest assumptions.
Cladistic methods group organisms that share derived
characteristics in a branching hierarchy tree called a
cladogram.
In contrast with phenetic methods, cladistic methods
place more emphasis on the evolutionary origin of species than
on the relationships among them.
They assume that a set of sequences descended from a
common ancestor through mutation and selection.
An important concept in cladistic methods
is the informative site.
A position is considered informative
when at least two different
nucleotides occur at that position in the
multiple alignment and each of these nucleotides
is present at least twice.
Informative and
Uninformative sites
[Table: informative and uninformative sites, by position across the aligned sequences.]
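The informative-site rule can be sketched as a column test over an alignment. The four sequences below are hypothetical and used only to exercise the rule; the function names are assumptions.

```python
from collections import Counter


def is_informative(column: str) -> bool:
    """True if at least two different nucleotides each occur at least twice."""
    counts = Counter(base for base in column.upper() if base != "-")
    return sum(1 for n in counts.values() if n >= 2) >= 2


def informative_positions(alignment):
    """Return the 1-based positions of informative columns."""
    columns = zip(*alignment)
    return [pos for pos, col in enumerate(columns, start=1)
            if is_informative("".join(col))]


# Hypothetical alignment of four sequences.
alignment = [
    "AAGAGTGCA",
    "AGCCGTGCG",
    "AGATATCCA",
    "AGAGATCCG",
]

print(informative_positions(alignment))  # [5, 7, 9]
```

Position 2 (A, G, G, G) is variable but uninformative, because only one nucleotide occurs at least twice.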
Maximum parsimony
Maximum parsimony assumes that the trees
requiring the minimum number of evolutionary
changes are the most preferable.
The method constructs all
possible trees and scores each by its number
of character-state changes.
The most parsimonious tree is the one with the
fewest character-state changes.
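A minimal sketch of scoring trees by character-state changes, using Fitch's counting procedure as one common way to obtain the minimum number of changes on a fixed tree (the slides do not name a specific counting method). The sequences are hypothetical, and the three candidate trees are the three possible groupings of four sequences.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, minimum changes) for one column."""
    if isinstance(tree, str):  # a leaf: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: one change is needed on this pair of branches.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over all alignment columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


# Hypothetical aligned sequences.
alignment = {
    "Seq1": "AAGAGTGCA",
    "Seq2": "AGCCGTGCG",
    "Seq3": "AGATATCCA",
    "Seq4": "AGAGATCCG",
}

trees = {
    "((Seq1,Seq2),(Seq3,Seq4))": (("Seq1", "Seq2"), ("Seq3", "Seq4")),
    "((Seq1,Seq3),(Seq2,Seq4))": (("Seq1", "Seq3"), ("Seq2", "Seq4")),
    "((Seq1,Seq4),(Seq2,Seq3))": (("Seq1", "Seq4"), ("Seq2", "Seq3")),
}

scores = {name: parsimony_score(tree, alignment) for name, tree in trees.items()}
best = min(scores, key=scores.get)
print(scores)
print(best)  # ((Seq1,Seq2),(Seq3,Seq4))
```

Enumerating every tree is feasible only for a handful of sequences, which is why long computation time is listed among the disadvantages of the method.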
[Table: number of differences (#Difference) for Outgroup, Seq1, Seq2, and Seq3.]
[Figure: tree joining Seq1 to the outgroup.]
Seq2 has the next lowest number of
derived character states, so it is
joined to the tree next. Seq2 is
connected to Seq1 because Seq2
and Seq1 differ from each other by only one
evolutionary step, whereas Seq2
requires two evolutionary steps in
comparison with the outgroup.
[Figure: tree of Outgroup, Seq2, and Seq1; panels c and d.]
Advantages
Reflects ancestral relationships.
Uses all known evolutionary information.
Disadvantages
Yields little information about branch lengths.
Requires long computation time.
Can yield a biased tree under some conditions.