Phylogenetic Analysis

Evolution
The theory of evolution is the
foundation upon which all of
modern biology is built.
From anatomy to behavior to genomics, the
scientific method requires an appreciation
of changes in organisms over time.
By looking at gene sequences, one can see
evolution in action.
Taxonomy
The study of the relationships
between groups of organisms is
called taxonomy.
Taxonomy is the art of classifying
things into groups. It was established as a
mainstream scientific field by Carolus
Linnaeus (1707-1778).
Phylogeny
Branching history of evolutionary lineages
New branches arise via speciation
Speciation occurs when gene flow is
severed between populations
Phylogenetic relationships depicted as a
tree
Phylogenetic data
Morphology
Secondary chemistry
Cytology
Allele frequencies
Protein sequences
Molecular data
Restriction sites
DNA sequences
Cladistic Methods
Cladistics was developed by Willi Hennig, a
German entomologist, in 1950.
Phylogenetic systematics (cladistics) is a
method of taxonomic classification of
organisms based on their evolutionary
history.
Cladistic methods construct a tree
(cladogram) by considering the various
possible pathways of evolution and choosing
from among these the best possible tree.
Phenetic Methods
A phylogram is a tree with
branches that are proportional to
evolutionary distances.
How Does Genetic Variation Occur?
[Diagram labels: Genome, Gene, Replication, Mutations (Advantageous, Disadvantageous), Somatic Cells, Germline Cells]
Genetic changes
affecting the germline
(mutation)
Base substitutions
Insertions
Deletions
Exon shuffling
Transposition
Mutations that do have an evolutionary effect can be divided
into two categories:
loss-of-function mutations
gain-of-function mutations
Phylogenetic Tree
Terminology
node: represents a taxonomic unit.
branch: defines the relationship between
the taxa in terms of descent and ancestry.
topology: the branching pattern.
branch length: often represents the
number of changes that have occurred in
that branch.
root: the common ancestor of all taxa.
distance scale: a scale that represents the
number of differences between sequences
(e.g. 0.1 means 10% difference between
two sequences)
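The distance scale can be illustrated with a small sketch that computes the proportion of differing positions between two aligned sequences. The function name `p_distance`, the example sequences, and the choice to skip gap columns are assumptions made for illustration, not something the slides specify.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of compared positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: columns where either sequence has a gap are skipped.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    differences = sum(1 for a, b in pairs if a != b)
    return differences / len(pairs)


# Hypothetical aligned sequences that differ at 1 of 10 positions.
seq_x = "ACGTACGTAC"
seq_y = "ACGTACGTAT"
print(p_distance(seq_x, seq_y))  # 0.1, i.e. 10% difference
```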
Molecular Clocks
A molecular clock is a concept based on the
assumption that mutations occur at some
regular, more or less predictable rate. Thus,
if a certain amount of time has passed, a
certain number of mutations can be
expected to have occurred.
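As a rough sketch of the clock idea, the expected amount of change is the rate multiplied by the elapsed time, and a divergence time can be estimated by dividing an observed distance by twice the rate (both lineages accumulate changes after the split). The rate and distance values below are hypothetical, and a strictly constant rate is assumed.

```python
def expected_substitutions(rate_per_site_per_year: float, years: float) -> float:
    """Expected substitutions per site along one lineage under a strict clock."""
    return rate_per_site_per_year * years


def divergence_time(distance_per_site: float, rate_per_site_per_year: float) -> float:
    """Years since two lineages split, given their pairwise distance.

    The factor 2 reflects that both lineages have been changing since the split.
    """
    return distance_per_site / (2 * rate_per_site_per_year)


# Hypothetical rate: 1e-9 substitutions per site per year.
rate = 1e-9
print(expected_substitutions(rate, 10_000_000))  # about 0.01 substitutions per site
print(divergence_time(0.1, rate))                # about 50 million years
```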
Phylogenetic Tree
Types of Trees
Rooted trees
A rooted tree implies the existence of an
actual common ancestor and defines the
evolutionary paths leading to the
development of each organism.
It provides an indication of the direction of
the evolutionary process, defining ancestral
and derived characters or species.
[Figure: rooted tree of Taxon A, Taxon B, Taxon C and Taxon D]
Unrooted trees
An unrooted tree shows only the
evolutionary relationships between the
organisms in the tree, and does not
imply the placement of a common ancestor in
the structure or the evolutionary path that
produced the current relationships. The direction
of the evolutionary process is not given.
[Figure: unrooted tree of Taxon A, Taxon B, Taxon C and Taxon D]
Three types of trees
Cladograms - have no scale
Phylograms or additive trees - genetic distance or amount of change
Ultrametric trees or true evolutionary trees - time
[Figure: the same four taxa (A, B, C, D) drawn three ways. Cladogram: branch lengths have no meaning. Phylogram: branch lengths show genetic change. Ultrametric tree: branch lengths show time.]
All show the same evolutionary relationships, or branching orders, between the taxa.
The number of unrooted trees increases in a greater than
exponential manner with the number of taxa.
[Figure: example unrooted trees for increasing numbers of taxa]
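To make the growth concrete, the sketch below uses the standard counting result for strictly bifurcating trees: (2n-5)!! unrooted and (2n-3)!! rooted topologies for n labelled taxa. The function names are chosen here for illustration.

```python
def double_factorial(k: int) -> int:
    """k!! = k * (k-2) * (k-4) * ... down to 1 (or 2)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def unrooted_trees(n_taxa: int) -> int:
    """Number of unrooted bifurcating topologies: (2n-5)!!, valid for n >= 3."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    return double_factorial(2 * n_taxa - 5)


def rooted_trees(n_taxa: int) -> int:
    """Number of rooted bifurcating topologies: (2n-3)!!, valid for n >= 2."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa")
    return double_factorial(2 * n_taxa - 3)


for n in (3, 4, 5, 10):
    print(n, unrooted_trees(n), rooted_trees(n))
# 3 1 3
# 4 3 15
# 5 15 105
# 10 2027025 34459425
```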
Inferring evolutionary relationships between the taxa requires rooting
the tree:
To root a tree mentally, imagine that the tree is made
of string. Grab the string at the root and tug on it until
the ends of the string (the taxa) fall opposite the root.
[Figure: an unrooted tree with the root position marked, and the rooted tree obtained from it]
An unrooted, four-taxon tree theoretically can be rooted in five
different places to produce five different rooted trees.
[Figure: the unrooted tree 1, with five numbered root positions, and the rooted trees 1a, 1b, 1c, 1d and 1e derived from it]
These trees show five different evolutionary relationships among the taxa!
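The five rootings can be enumerated with a short sketch. An unrooted bifurcating tree with n taxa has 2n-3 branches, so four taxa give five places to put the root. The topology used here, with A and B on one side of the internal branch and C and D on the other, is an assumption, since the slide figure is not available; `x` and `y` are hypothetical names for the two internal nodes.

```python
# Unrooted four-taxon tree as an adjacency list (assumed topology).
adjacency = {
    "A": ["x"], "B": ["x"], "C": ["y"], "D": ["y"],
    "x": ["A", "B", "y"],
    "y": ["C", "D", "x"],
}
# The five branches: four leading to tips and one internal branch.
branches = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"), ("D", "y")]


def subtree(node: str, parent: str) -> str:
    """Newick-style text for the part of the tree hanging below `node`."""
    children = [n for n in adjacency[node] if n != parent]
    if not children:
        return node  # a tip
    return "(" + ",".join(sorted(subtree(c, node) for c in children)) + ")"


def root_on_branch(u: str, v: str) -> str:
    """Place the root in the middle of branch u-v and write the rooted tree."""
    return "(" + ",".join(sorted([subtree(u, v), subtree(v, u)])) + ");"


rooted = [root_on_branch(u, v) for u, v in branches]
for tree in rooted:
    print(tree)
```

Rooting on the internal branch gives the balanced tree `((A,B),(C,D));`, while rooting on a branch leading to a tip makes that taxon the sister of all the others.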
There are two major ways to root trees:
By outgroup:
Uses taxa (the outgroup) that are known
to fall outside of the group of interest (the
ingroup).
Requires some prior knowledge about the
relationships among the taxa.
The outgroup can either be species (e.g.,
birds to root a mammalian tree) or previous
gene duplicates (e.g., α-globins to root β-globins).
By midpoint or distance:
Roots the tree at the midway point between the
two most distant taxa in the tree, as
determined by branch lengths.
Assumes that the taxa are evolving in a clocklike manner. This assumption is built into some
of the distance-based tree building methods.
Four steps of
phylogenetic analysis
Alignment
Choosing algorithm
Tree building
Tree evaluation
Alignment
Input for phylogenetic analysis is
usually multiple sequence alignment
Output only as good as input
Any problems in alignment lead to
false relationships
Code indels separately
Choosing algorithm
Character and Distance Data
Molecular data: 2 types of phylogenetic trees
Character based (parsimony, maximum likelihood)
Distance based (UPGMA, neighbour joining)
Phenetic methods or
Distance methods
Compress sequence information into a
single number
Two sequences with the shortest distance
are considered as closely related taxa.
Distance methods construct a tree in
stepwise manner, that is, the two most
similar sequences are grouped together,
then the next most similar sequences, and
so on.
Phenetic methods or
Distance methods
Phenetic methods are based on the relative
numbers of similarities and differences to group
organisms in a branching hierarchy tree called a
phenogram.
In the phenogram, taxa separated by shorter distances
are grouped more closely than those separated by longer distances.
Distance methods are more concerned with the
relationships among data sets than with the evolutionary
pathway.
UPGMA (Unweighted Pair Group Method with Arithmetic mean)
UPGMA is the simplest method of tree construction.
It uses a clustering approach to build a tree.
This method is only suitable for datasets
consisting of lineages with relatively constant
rates of evolution.
Steps for building a tree
1. Construct a distance matrix.
2. Cluster the two OTUs with the shortest distance into
an internal node.
3. Recalculate the distance matrix.
4. Repeat the process until all OTUs are
grouped in a single cluster.
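The four steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: the distances d(AB)=2, d(AC)=d(BC)=4 and d(DE)=4 follow the values that survive in the slides, while every distance between {A, B, C} and {D, E} is set to a hypothetical 6. New distances are averaged weighted by cluster size, which for two single OTUs reduces to the simple mean of the two distances.

```python
from itertools import combinations


def upgma(labels, dist):
    """Cluster OTUs by UPGMA; return the merges as (cluster name, node height).

    `dist` maps frozenset({a, b}) to the distance between OTUs a and b.
    """
    clusters = {name: 1 for name in labels}  # cluster name -> number of OTUs in it
    d = dict(dist)
    merges = []
    while len(clusters) > 1:
        # Step 2: find the pair of clusters with the shortest distance.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))] / 2
        merged = f"({a},{b})"
        size_a, size_b = clusters[a], clusters[b]
        # Step 3: recalculate distances from the new cluster to all others.
        for other in clusters:
            if other not in (a, b):
                d[frozenset((merged, other))] = (
                    size_a * d[frozenset((a, other))]
                    + size_b * d[frozenset((b, other))]
                ) / (size_a + size_b)
        del clusters[a], clusters[b]
        clusters[merged] = size_a + size_b
        merges.append((merged, height))
    return merges


labels = ["A", "B", "C", "D", "E"]
pairs = {
    ("A", "B"): 2, ("A", "C"): 4, ("B", "C"): 4, ("D", "E"): 4,
    # Hypothetical values, not taken from the slides:
    ("A", "D"): 6, ("A", "E"): 6, ("B", "D"): 6,
    ("B", "E"): 6, ("C", "D"): 6, ("C", "E"): 6,
}
dist = {frozenset(p): v for p, v in pairs.items()}

merges = upgma(labels, dist)
for name, height in merges:
    print(name, height)
```

With these values the distances (A,B)-C and D-E are tied at 4, so the order of the middle two merges depends on how ties are broken; the final grouping, A with B, then C, then the D/E pair, is the same either way.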
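UPGMA (Distance Based): worked example
[Figures: distance matrices and intermediate trees for OTUs A, B, C, D and E; only the steps below are recoverable]
Lowest score = 2, between A and B. Group A & B.
Recalculate the distances to the new cluster AB. Example:
d(AB,C) = [d(AC) + d(BC)] / 2
= (4 + 4) / 2
= 4
Next lowest score = d(DE) = 4. Group D & E.
The clusters are now AB, C and DE. Group AB with C to give (AB)C.
Finally, group (AB)C with DE to give (AB)C(DE).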
Advantages
Simple
Less time consuming
Accurate Method
Disadvantages
Look for more similar sequences
Substitution neglected
Neighbour Joining
Method
Neighbor joining is also called the star
decomposition method.
The phylogenetic tree is constructed from a
star-like tree by grouping together the OTUs
separated by the shortest branch length.
This method is well suited to datasets
consisting of lineages with widely varying
rates of evolution.

Steps for building a tree
1. Start with a distance matrix and a star-like tree.
2. Group the two most similar taxa into a node
and calculate the branch lengths.
3. Recalculate the distance matrix and branch
lengths and construct a new tree.
4. Repeat the process until only one terminal is
present.
Neighbour Joining
Method
The raw data of the tree are represented by the following
distance matrix:
     A    B    C    D    E
B    5
C    4    7
D    7   10    7
E    6    9    6    5
F    8   11    8    9    8
We have in total 6 OTUs (N=6).
Neighbour Joining
Method
Step 1: We calculate the net divergence r(i)
for each OTU from all other OTUs:
r(A) = 5+4+7+6+8 = 30
r(B) = 42
r(C) = 32
r(D) = 38
r(E) = 34
r(F) = 44
Neighbour Joining
Method
Step 2: Now we calculate a new distance
matrix using, for each pair of OTUs, the
formula
M(ij) = d(ij) - [r(i) + r(j)]/(N-2)
or, in the case of the pair A,B:
M(AB) = d(AB) - [r(A) + r(B)]/(N-2) = 5 - (30 + 42)/4 = -13
Neighbour Joining
Method
     A      B      C      D      E
B  -13
C  -11.5  -11.5
D  -10    -10    -10.5
E  -10    -10    -10.5  -13
F  -10.5  -10.5  -11    -11.5  -11.5
Neighbour Joining
Method
Step 3: Now we choose as neighbors those two
OTUs for which M(ij) is the smallest. These are A
and B, and D and E. Let's take A and B as
neighbors and form a new node called U. Now
we calculate the branch lengths from the internal
node U to the external OTUs A and B:
S(AU) = d(AB)/2 + [r(A) - r(B)]/[2(N-2)] = 1
S(BU) = d(AB) - S(AU) = 4
Neighbour Joining
Method
Step 4: Now we define new distances from
U to each other terminal node:
d(CU) = [d(AC) + d(BC) - d(AB)]/2 = 3
d(DU) = [d(AD) + d(BD) - d(AB)]/2 = 6
d(EU) = [d(AE) + d(BE) - d(AB)]/2 = 5
d(FU) = [d(AF) + d(BF) - d(AB)]/2 = 7
and we create a new matrix:
     U    C    D    E
C    3
D    6    7
E    5    6    5
F    7    8    9    8
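One round of the procedure (Steps 1 to 4) can be sketched in code. The distance matrix below is the one implied by the r(i), M(ij) and d(·U) values quoted in the steps; the function name `nj_step` and the dictionary layout are choices made for this illustration.

```python
from itertools import combinations

labels = ["A", "B", "C", "D", "E", "F"]
# Lower triangle of the distance matrix, row by row.
lower = {
    "B": [5],
    "C": [4, 7],
    "D": [7, 10, 7],
    "E": [6, 9, 6, 5],
    "F": [8, 11, 8, 9, 8],
}
d = {
    frozenset((row, labels[col])): value
    for row, values in lower.items()
    for col, value in enumerate(values)
}


def nj_step(labels, d, new_node="U"):
    """Run one neighbour-joining round and return its intermediate results."""
    n = len(labels)
    # Step 1: net divergence of each OTU from all others.
    r = {i: sum(d[frozenset((i, j))] for j in labels if j != i) for i in labels}
    # Step 2: rate-corrected matrix M(ij) = d(ij) - [r(i) + r(j)] / (N - 2).
    m = {
        frozenset((i, j)): d[frozenset((i, j))] - (r[i] + r[j]) / (n - 2)
        for i, j in combinations(labels, 2)
    }
    # Step 3: pick the pair with the smallest M (the first one found on ties).
    i, j = min(combinations(labels, 2), key=lambda pair: m[frozenset(pair)])
    d_ij = d[frozenset((i, j))]
    s_i = d_ij / 2 + (r[i] - r[j]) / (2 * (n - 2))
    s_j = d_ij - s_i
    # Step 4: distances from the new node to every remaining OTU.
    new_d = {
        k: (d[frozenset((i, k))] + d[frozenset((j, k))] - d_ij) / 2
        for k in labels
        if k not in (i, j)
    }
    return r, m, (i, j), (s_i, s_j), new_d


r, m, pair, (s_au, s_bu), new_d = nj_step(labels, d)
print(r)            # net divergences, r["A"] is 30
print(pair)         # ('A', 'B')
print(s_au, s_bu)   # 1.0 4.0
print(new_d)        # {'C': 3.0, 'D': 6.0, 'E': 5.0, 'F': 7.0}
```

The pairs A,B and D,E are tied at M = -13; this sketch takes the first pair encountered, which matches the choice made in Step 3.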
Advantages
Is fast and thus suited for large datasets
permits lineages with largely different
branch lengths
permits correction for multiple substitutions
Disadvantages
sequence information is reduced
gives only one possible tree
strongly dependent on the model of
evolution used.
Cladistic methods
The principle behind cladistic methods is parsimony: the
hypothesis that requires the fewest assumptions is preferred.
Cladistic methods group organisms that share derived
characteristics in a branching hierarchy tree called a
cladogram.
In contrast with phenetic methods, cladistic methods
place more emphasis on the evolutionary origin of species than
on the relationships among them.
They assume that a set of sequences descended from a
common ancestor through mutation and selection.
Cladistic methods
An important concept in cladistic methods
is informative sites.
A position is considered as informative
when there are at least two different
nucleotides in multiple alignments at that
position and each of these nucleotides must
be present at least twice.
Informative and Uninformative sites
[Table: aligned sequences by position; not recovered]
(Positions 2 and 4 are informative while positions 1
and 3 are uninformative.)
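The definition of an informative site can be checked column by column. In the sketch below the four-sequence alignment is hypothetical, constructed so that positions 2 and 4 come out informative as in the slide; it is not the original table.

```python
from collections import Counter


def informative_sites(alignment):
    """Return 1-based positions with at least two states that each occur twice or more."""
    sites = []
    for position, column in enumerate(zip(*alignment), start=1):
        counts = Counter(column)
        repeated_states = [base for base, n in counts.items() if n >= 2]
        if len(repeated_states) >= 2:
            sites.append(position)
    return sites


# Hypothetical alignment (not the table from the slides).
alignment = [
    "AAGC",
    "AAGC",
    "ATGT",
    "ATCT",
]
print(informative_sites(alignment))  # [2, 4]
```

Position 1 is invariant, and position 3 has a second nucleotide that appears only once, so neither can favour one tree over another.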
Maximum parsimony
Maximum parsimony assumes that trees
with the minimum number of evolutionary
changes are the most preferable trees.
Maximum parsimony constructs all possible
trees and scores each by the number of
character-state changes it requires.
The most parsimonious tree is the one with the
fewest character-state changes.

Steps for building a tree
1. Start with a multiple alignment.
2. Construct all possible topologies and score
each one by the number of evolutionary changes
it requires.
3. Choose the tree with the fewest evolutionary
changes as the final tree.
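Scoring a topology by its number of changes can be sketched with Fitch's small-parsimony algorithm, a standard way to count the minimum changes on a fixed tree; the slides do not name a specific scoring algorithm, so this choice is an assumption. The four sequences are the ones used in the worked example of this deck, and the three trees are the three possible groupings of four taxa, written as nested tuples.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, minimum changes) for one alignment column."""
    if isinstance(tree, str):
        return {states[tree]}, 0  # a tip: its observed state, no changes
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: one change is needed on this node's branches.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, sequences):
    """Sum the Fitch cost over every column of the alignment."""
    length = len(next(iter(sequences.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in sequences.items()})[1]
        for i in range(length)
    )


sequences = {
    "Outgroup": "ATCGT",
    "Seq1": "AACGT",
    "Seq2": "AAGGT",
    "Seq3": "CAGGT",
}
topologies = [
    (("Outgroup", "Seq1"), ("Seq2", "Seq3")),
    (("Outgroup", "Seq2"), ("Seq1", "Seq3")),
    (("Outgroup", "Seq3"), ("Seq1", "Seq2")),
]
scores = [parsimony_score(t, sequences) for t in topologies]
print(scores)                                   # [3, 4, 4]
print(topologies[scores.index(min(scores))])    # the most parsimonious grouping
```

The tree that groups Seq2 with Seq3 needs 3 changes, one fewer than either alternative, so it is the one step 3 would select.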
Table 1: Raw data
Sequence   Bases   #Difference (from outgroup)
Outgroup   ATCGT   0
Seq1       AACGT   1
Seq2       AAGGT   2
Seq3       CAGGT   3
Seq1 has the lowest number of
derived character states (differences),
so it is connected to the outgroup.
Seq2 has the next lowest number of
derived character states, so it is
joined to the tree next. Seq2 is
connected to Seq1 because Seq2
and Seq1 differ from each other by only one
evolutionary step, while Seq2
requires two evolutionary steps in
comparison with the outgroup.
[Figure: tree joining Outgroup, Seq1 and Seq2]
Seq3 has the next lowest number of differences, so it is the third to
connect to the tree. To find the position of Seq3 on the tree,
we need to evaluate every possible attachment site for Seq3.
a. To attach Seq3 to the line leading to the outgroup, 3
evolutionary steps are required.
Outgroup: ATCGT
Seq3:     CAGGT
differences: 3
b. To attach Seq3 to the line leading to Seq1, 2
evolutionary steps are required.
Seq1: AACGT
Seq3: CAGGT
differences: 2
c. To attach Seq3 to the line leading to Seq2, 1
evolutionary step is required.
Seq2: AAGGT
Seq3: CAGGT
differences: 1
d. To attach Seq3 to the line leading to the point where the
lines leading to Seq1 and Seq2 intersect, 2 evolutionary
steps are required.
Seq1/Seq2 junction: AACGT
Seq3:               CAGGT
differences: 2
Attaching Seq3 to the line leading to Seq2 requires the fewest steps, so Seq3 is joined there.
[Figure: final tree of Outgroup, Seq1, Seq2 and Seq3]
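The step counts in options a to d are simple position-by-position comparisons, which a few lines of code can reproduce. The sequences are those quoted in the example; the label "Seq1/Seq2 junction" for the inferred ancestor and the helper name `differences` are chosen here for illustration.

```python
def differences(seq_a: str, seq_b: str) -> int:
    """Count the positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


seq3 = "CAGGT"
attachment_sites = {
    "Outgroup": "ATCGT",
    "Seq1": "AACGT",
    "Seq2": "AAGGT",
    "Seq1/Seq2 junction": "AACGT",
}
steps = {site: differences(seq3, seq) for site, seq in attachment_sites.items()}
best_site = min(steps, key=steps.get)

print(steps)      # {'Outgroup': 3, 'Seq1': 2, 'Seq2': 1, 'Seq1/Seq2 junction': 2}
print(best_site)  # Seq2
```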
Advantages
Reflect the ancestral relationship.
Use all known evolutionary information.
Disadvantages
Yield little information about branch length.
Require long computation time
Yield biased tree under some conditions.
