
Phylogenetic analysis

Evolution
The theory of evolution is the
foundation upon which all of
modern biology is built.
From anatomy to behavior to genomics, the
scientific method requires an appreciation
of changes in organisms over time.
By looking at gene sequences, one can see
evolution in action.

Taxonomy
The study of the relationships
between groups of organisms is
called taxonomy.
Taxonomy is the art of classifying
things into groups. It was established as a
mainstream scientific field by Carolus
Linnaeus (1707-1778).

Phylogeny
Branching history of evolutionary lineages
New branches arise via speciation
Speciation occurs when gene flow is
severed between populations
Phylogenetic relationships depicted as a
tree

Phylogenetic data
Morphology
Secondary chemistry
Cytology
Allele frequencies
Protein sequences
Molecular data
Restriction sites
DNA sequences

Cladistic Methods
Cladistics was developed by Willi Hennig, a
German entomologist, in 1950.
Phylogenetic systematics (cladistics) is a
method of taxonomic classification of
organisms based on their evolutionary
history.
Cladistic methods construct a tree
(cladogram) by considering the various
possible pathways of evolution and choosing
from among these the best possible tree.

Phenetic Methods
A phylogram is a tree with
branches that are proportional to
evolutionary distances.

How Does Genetic Variation Occur?
[Diagram labels: Genome; Gene; Replication; Mutations; Somatic Cells; Germline Cells; Advantageous; Disadvantageous]

Genetic changes
affecting the germline
(mutation)
Base substitutions
Insertions
Deletions
Exon shuffling
Transposition

Mutations that do have an evolutionary effect can be divided
into two categories:
loss-of-function mutations
gain-of-function mutations

Phylogenetic Tree
Terminology
node: represents a taxonomic unit.
branch: defines the relationship between
the taxa in terms of descent and ancestry.
topology: the branching pattern.
branch length: often represents the
number of changes that have occurred in
that branch.
root: the common ancestor of all taxa.

Phylogenetic Tree
Terminology
distance scale: a scale that represents the
number of differences between sequences
(e.g. 0.1 means 10% difference between
two sequences)
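The distance scale can be made concrete with a small calculation. The sketch below assumes two already aligned sequences of equal length and counts the proportion of positions that differ; the function name and the example sequences are hypothetical.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


# Hypothetical 10-base sequences that differ at one position (the last one).
print(p_distance("ACGTACGTAC", "ACGTACGTAA"))  # 1 difference in 10 sites
```

A value of 0.1 on the scale bar then corresponds to one difference per ten aligned sites.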
Molecular Clocks
A molecular clock is a concept based on the
assumption that mutations occur at some
regular, more or less predictable rate. Thus,
if a certain amount of time has passed, a
certain number of mutations can be
expected to have occurred.
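As a rough illustration of the clock idea, the sketch below converts a pairwise distance into a divergence time. It assumes a strict clock with one constant rate on both lineages, so that distance = 2 x rate x time; the function name and the numbers are hypothetical.

```python
def divergence_time(distance: float, rate: float) -> float:
    """Estimate years since two lineages split under a strict molecular clock.

    distance: substitutions per site separating the two sequences.
    rate: substitutions per site per year along ONE lineage (assumed constant).
    Both lineages accumulate changes, hence the factor of 2.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return distance / (2.0 * rate)


# Hypothetical numbers: a distance of 0.1 and a rate of 1e-9 per site per year.
print(f"{divergence_time(0.1, 1e-9):.3g} years")
```

If rates differ between lineages, this estimate is no longer reliable.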

Phylogenetic Tree

Types of Trees
Rooted trees
A rooted tree infers the existence of an
actual common ancestor and defines the
evolutionary paths leading to the
development of each organism.
It provides an indication of the direction of
the evolutionary process, defining ancestral
and derived characters or species.

[Figure: rooted tree of Taxon A, Taxon B, Taxon C and Taxon D]

Types of Trees
Unrooted trees
An unrooted tree shows only the
evolutionary relationships between the
organisms in the tree, and does not actually
infer the placement of a common ancestor in
the structure or the evolutionary path used to
obtain the current relationships. The direction
of the evolutionary process is not given.

[Figure: unrooted tree of Taxon A, Taxon B, Taxon C and Taxon D]

Three types of trees


Cladograms - have no scale
Phylograms or additive trees - genetic distance or amount of change
Ultrametric trees or true evolutionary trees - time

Three types of trees
[Figure: the same four taxa (Taxon A, B, C, D) drawn as a cladogram (branch lengths have no meaning), a phylogram (branch lengths show genetic change) and an ultrametric tree (branch lengths show time)]

All show the same evolutionary relationships, or branching orders, between the taxa.

The number of unrooted trees increases in a greater than
exponential manner with the number of taxa
[Figure: example unrooted trees for increasing numbers of taxa]
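The growth in the number of trees can be checked directly. The sketch below uses the standard counts for strictly bifurcating trees, (2n-5)!! unrooted and (2n-3)!! rooted trees for n taxa; the function names are hypothetical.

```python
def num_unrooted_trees(n_taxa: int) -> int:
    """Number of distinct unrooted bifurcating trees: (2n-5)!! for n >= 3."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # odd factors 3, 5, ..., 2n-5
        count *= k
    return count


def num_rooted_trees(n_taxa: int) -> int:
    """Each unrooted tree has 2n-3 branches that can carry the root."""
    return num_unrooted_trees(n_taxa) * (2 * n_taxa - 3)


for n in (3, 4, 5, 10, 20):
    print(n, num_unrooted_trees(n), num_rooted_trees(n))
```

Ten taxa already give more than two million unrooted trees, which is why exhaustive searches become impractical quickly.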

Inferring evolutionary relationships between the taxa requires rooting
the tree:
To root a tree mentally, imagine that the tree is made of string. Grab the
string at the root and tug on it until the ends of the string (the taxa) fall
opposite the root.
[Figure: an unrooted tree with the root position marked, and the rooted tree that results]

An unrooted, four-taxon tree theoretically can be rooted in five
different places to produce five different rooted trees
[Figure: unrooted tree 1 with its branches numbered 1-5, and rooted trees 1a, 1b, 1c, 1d and 1e]

These trees show five different evolutionary relationships among the taxa!
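To see where the five rootings come from, note that an unrooted four-taxon tree has five branches (four leading to taxa, one internal). The sketch below writes out the rooted tree for each branch in Newick notation; the topology with A and B on one side and C and D on the other is an assumption for illustration, as is the function name.

```python
def rootings_of_quartet(a: str, b: str, c: str, d: str) -> dict:
    """Newick strings for the five ways to root the unrooted tree ((a,b),(c,d))."""
    return {
        "internal branch": f"(({a},{b}),({c},{d}));",
        f"branch to {a}": f"({a},({b},({c},{d})));",
        f"branch to {b}": f"({b},({a},({c},{d})));",
        f"branch to {c}": f"({c},({d},({a},{b})));",
        f"branch to {d}": f"({d},({c},({a},{b})));",
    }


for where, newick in rootings_of_quartet("A", "B", "C", "D").items():
    print(f"root on {where}: {newick}")
```

Each string describes a different order of branching events, even though all five share the same unrooted topology.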

There are two major ways to root trees:

By outgroup:
Uses taxa (the outgroup) that are known
to fall outside of the group of interest (the
ingroup).
Requires some prior knowledge about the
relationships among the taxa.
The outgroup can either be species (e.g.,
birds to root a mammalian tree) or previous
gene duplicates (e.g., α-globins to root β-globins).

[Figure: tree rooted with an outgroup]

By midpoint or distance:
Roots the tree at the midway point between the
two most distant taxa in the tree, as
determined by branch lengths.
Assumes that the taxa are evolving in a clock-like manner. This assumption is built into some
of the distance-based tree building methods.
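Midpoint rooting can be sketched in a few lines. The example assumes the unrooted tree is given as a dictionary of branches with lengths; the tree, its branch lengths and the internal node names X and Y are hypothetical.

```python
from itertools import combinations

# Hypothetical unrooted tree: (node, node) -> branch length. X and Y are internal.
EDGES = {("A", "X"): 1.0, ("B", "X"): 3.0, ("X", "Y"): 1.0,
         ("C", "Y"): 2.0, ("D", "Y"): 6.0}


def build_adjacency(edges):
    adj = {}
    for (u, v), length in edges.items():
        adj.setdefault(u, {})[v] = length
        adj.setdefault(v, {})[u] = length
    return adj


def path_between(adj, start, goal):
    """Return the unique path between two nodes of a tree as a list of nodes."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        for nxt in adj[node]:
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    raise ValueError("nodes are not connected")


def midpoint_root(edges):
    """Return ((u, v), offset): the root lies on branch u-v, `offset` away from u."""
    adj = build_adjacency(edges)
    leaves = [n for n, nbrs in adj.items() if len(nbrs) == 1]
    best_path, best_len = None, -1.0
    for a, b in combinations(leaves, 2):      # find the two most distant taxa
        path = path_between(adj, a, b)
        length = sum(adj[u][v] for u, v in zip(path, path[1:]))
        if length > best_len:
            best_path, best_len = path, length
    remaining = best_len / 2.0                # walk half-way along that path
    for u, v in zip(best_path, best_path[1:]):
        if remaining <= adj[u][v]:
            return (u, v), remaining
        remaining -= adj[u][v]


print(midpoint_root(EDGES))
```

In this hypothetical tree B and D are the most distant taxa (10 units apart), so the root falls 5 units from B, on the branch leading to D.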

Four steps of
phylogenetic analysis
Alignment
Choosing algorithm
Tree building
Tree evaluation

Alignment
Input for phylogenetic analysis is
usually multiple sequence alignment
Output only as good as input
Any problems in alignment lead to
false relationships
Code indels separately

Choosing algorithm
Character and Distance Data

Molecular data: 2 types of phylogenetic trees
Character based (parsimony, maximum likelihood)
Distance based (UPGMA, neighbour joining)

Phenetic methods or
Distance methods
Compress sequence information into a
single number
Two sequences with the shortest distance
are considered as closely related taxa.
Distance methods construct a tree in
stepwise manner, that is, the two most
similar sequences are grouped together,
then the next most similar sequences, and
so on.
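The compression step can be illustrated as follows. The sketch reduces each pair of aligned sequences to a single number (here simply the count of differing positions) and reports the closest pair, which a stepwise method would group first. The taxon names and sequences are hypothetical.

```python
from itertools import combinations


def pairwise_differences(alignment: dict) -> dict:
    """Map each pair of taxon names to the number of differing aligned positions."""
    result = {}
    for (name_a, seq_a), (name_b, seq_b) in combinations(alignment.items(), 2):
        if len(seq_a) != len(seq_b):
            raise ValueError("sequences must be aligned to the same length")
        result[(name_a, name_b)] = sum(x != y for x, y in zip(seq_a, seq_b))
    return result


# Hypothetical aligned sequences.
alignment = {"T1": "ACGTACGT", "T2": "ACGTACGA", "T3": "ACGAAGGA"}
distances = pairwise_differences(alignment)
closest = min(distances, key=distances.get)
print(distances)
print("grouped first:", closest)
```

Note that the positions at which the sequences differ are discarded; only the counts survive.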

Phenetic methods or
Distance methods
Phenetic methods use the relative
numbers of similarities and differences to group
organisms in a branching hierarchy tree called a
phenogram.
In the phenogram, taxa separated by shorter distances
are grouped more closely than taxa separated by longer ones.
Distance methods are more interested in the
relationships among data sets than in the evolutionary
pathway.

UPGMA (Unweighted Pair Group Method with Arithmetic mean)
UPGMA is the simplest method of tree construction.
It uses a clustering approach to build a tree.
This method is only suitable for datasets
consisting of lineages with relatively constant
rates of evolution.

Steps for building a tree
1. Construct a distance matrix.
2. Cluster the two OTUs with the shortest distance into
an internal node.
3. Recalculate the distance matrix.
4. Repeat the process until all OTUs are
grouped in a single cluster.
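The four steps above can be sketched as a short function. This is a minimal illustration, not a reference implementation: the distance values are illustrative assumptions, cluster names are formed by concatenating labels, and ties are broken by whichever pair is found first.

```python
from itertools import combinations


def upgma(labels, distances):
    """Cluster OTUs by UPGMA.

    labels: list of OTU names.
    distances: dict mapping frozenset({x, y}) to the distance between x and y.
    Returns a nested tuple (left, right, height), where height is half the
    distance at which the two sub-clusters were joined.
    """
    size = {name: 1 for name in labels}      # number of OTUs in each cluster
    tree = {name: name for name in labels}   # cluster name -> subtree
    d = dict(distances)
    active = list(labels)
    while len(active) > 1:
        # Step 2: find the closest pair of clusters.
        a, b = min(combinations(active, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))] / 2.0
        merged = a + b                       # e.g. "A" + "B" -> "AB"
        # Step 3: size-weighted average distance to every other cluster.
        for c in active:
            if c not in (a, b):
                d[frozenset((merged, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[merged] = size[a] + size[b]
        tree[merged] = (tree[a], tree[b], height)
        active = [c for c in active if c not in (a, b)] + [merged]
    return tree[active[0]]


# Illustrative lower-triangular distance matrix (assumed values).
labels = ["A", "B", "C", "D", "E", "F"]
rows = {"B": [2], "C": [4, 4], "D": [6, 6, 6], "E": [6, 6, 6, 4], "F": [8, 8, 8, 8, 8]}
distances = {frozenset((labels[j], name)): value
             for name, values in rows.items() for j, value in enumerate(values)}
print(upgma(labels, distances))
```

When both clusters being merged are single OTUs, the size-weighted average reduces to the plain mean of the two distances.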

UPGMA (Distance Based)
[Table: pairwise distance matrix for the OTUs; values not recoverable]
Lowest score = 2 (A, B)
Group A & B

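UPGMA (Distance Based)
[Table: distance matrix after merging A and B into AB; values not recoverable]
Next lowest score = DE = 4
Example:
d(AB, C) = [d(A, C) + d(B, C)] / 2
= (4 + 4) / 2
= 4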

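UPGMA (Distance Based)
[Tables and figure: successive distance matrices over AB, C, DE and then (AB)C, DE, with the partial tree joining D with E and A, B with C; values not recoverable]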

UPGMA (Distance Based)
[Table and figure: final cluster (AB)C(DE) and the resulting tree over A, B, C, D, E]

Advantages
Simple
Less time consuming
Accurate when rates of evolution are roughly constant

Disadvantages
Only looks for the most similar sequences
Substitution neglected

Neighbour Joining
Method
Neighbour joining is also called the star
decomposition method.
The phylogenetic tree is constructed from a
star-like tree by grouping together the OTUs
with the shortest branch-length distance.
This method is well suited to datasets
containing lineages with widely varying
rates of evolution.

Steps for building a tree
1. Start with a distance matrix and a star-like tree.
2. Group the two most similar taxa into a node
and calculate the branch length.
3. Recalculate the distance matrix and branch
length and construct a new tree.
4. Repeat the process until only one terminal is
present.

Neighbour Joining
Method

The raw data of the tree are represented by the following
distance matrix:

      A    B    C    D    E
B     5
C     4    7
D     7   10    7
E     6    9    6    5
F     8   11    8    9    8

We have in total 6 OTUs (N=6).

Neighbour Joining
Method
Step1: We calculate the net divergence r (i)
for each OTU from all other OTUs
r(A) = 5+4+7+6+8=30
r(B) = 42
r(C) = 32
r(D) = 38
r(E) = 34
r(F) = 44
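The net divergences can be recomputed with a few lines of code. The distance values below are an assumption: they are reconstructed so that they reproduce the r(i) and M(ij) values quoted in this example. The helper names are hypothetical.

```python
LABELS = ["A", "B", "C", "D", "E", "F"]
# Lower-triangular distances (assumed, reconstructed from the quoted results).
LOWER = {"B": [5], "C": [4, 7], "D": [7, 10, 7], "E": [6, 9, 6, 5], "F": [8, 11, 8, 9, 8]}


def build_matrix(labels, lower):
    """Expand a lower-triangular listing into a full symmetric matrix."""
    d = {x: {y: 0 for y in labels} for x in labels}
    for name, values in lower.items():
        for other, value in zip(labels, values):
            d[name][other] = value
            d[other][name] = value
    return d


def net_divergence(d):
    """r(i): sum of the distances from OTU i to every other OTU."""
    return {i: sum(row.values()) for i, row in d.items()}


r = net_divergence(build_matrix(LABELS, LOWER))
print(r)
```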

Neighbour Joining
Method
Step2: Now we calculate a new distance
matrix using for each pair of OTUs the
formula:
M(ij) = d(ij) - [r(i) + r(j)]/(N-2), or in the case
of the pair A,B:
M(AB) = d(AB) - [r(A) + r(B)]/(N-2) = -13
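The same formula can be applied to every pair at once. As before, the distance values are an assumption reconstructed from the results quoted in this example, and the helper names are hypothetical.

```python
LABELS = ["A", "B", "C", "D", "E", "F"]
# Lower-triangular distances (assumed, reconstructed from the quoted results).
LOWER = {"B": [5], "C": [4, 7], "D": [7, 10, 7], "E": [6, 9, 6, 5], "F": [8, 11, 8, 9, 8]}


def build_matrix(labels, lower):
    d = {x: {y: 0 for y in labels} for x in labels}
    for name, values in lower.items():
        for other, value in zip(labels, values):
            d[name][other] = value
            d[other][name] = value
    return d


def corrected_matrix(d):
    """M(ij) = d(ij) - [r(i) + r(j)] / (N - 2) for every pair i < j."""
    n = len(d)
    r = {i: sum(d[i].values()) for i in d}
    names = list(d)
    return {
        (i, j): d[i][j] - (r[i] + r[j]) / (n - 2)
        for pos, i in enumerate(names)
        for j in names[pos + 1:]
    }


m = corrected_matrix(build_matrix(LABELS, LOWER))
smallest = min(m.values())
neighbours = [pair for pair, value in m.items() if value == smallest]
print(m[("A", "B")], neighbours)
```

Two pairs tie for the smallest value, so either could be joined first.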

Neighbour Joining
Method

        A       B       C       D       E
B     -13
C     -11.5   -11.5
D     -10     -10     -10.5
E     -10     -10     -10.5   -13
F     -10.5   -10.5   -11     -11.5   -11.5

Neighbour Joining
Method
Step3: Now we choose as neighbors those two
OTUs for which M(ij) is the smallest. These are A
and B, and D and E. Let's take A and B as
neighbors and form a new node called U. Now
we calculate the branch lengths from the internal
node U to the external OTUs A and B.
S(AU) = d(AB)/2 + [r(A) - r(B)]/[2(N-2)] = 1
S(BU) = d(AB) - S(AU) = 4
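The two branch lengths follow directly from the values already given (d(AB) = 5, r(A) = 30, r(B) = 42, N = 6). A minimal sketch, with a hypothetical function name:

```python
def branch_lengths(d_ij: float, r_i: float, r_j: float, n: int) -> tuple:
    """Branch lengths from a new internal node to its two joined OTUs i and j."""
    s_i = d_ij / 2 + (r_i - r_j) / (2 * (n - 2))
    s_j = d_ij - s_i
    return s_i, s_j


s_au, s_bu = branch_lengths(5, 30, 42, 6)
print(s_au, s_bu)
```

B sits on the longer branch because its net divergence is larger than that of A.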

Neighbour Joining
Method
Step4: Now we define new distances from
U to each other terminal node:
d(CU) = [d(AC) + d(BC) - d(AB)]/2 = 3
d(DU) = [d(AD) + d(BD) - d(AB)]/2 = 6
d(EU) = [d(AE) + d(BE) - d(AB)]/2 = 5
d(FU) = [d(AF) + d(BF) - d(AB)]/2 = 7
and we create a new matrix:
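The update rule can be checked numerically. In the sketch below the d(A,k) values are taken from the terms of the r(A) sum, while the d(B,k) values are an assumption chosen to be consistent with r(B) = 42 and the results quoted above. The function name is hypothetical.

```python
def distance_to_new_node(d_ik: float, d_jk: float, d_ij: float) -> float:
    """Distance from the new node joining i and j to another OTU k."""
    return (d_ik + d_jk - d_ij) / 2


d_ab = 5
row_a = {"C": 4, "D": 7, "E": 6, "F": 8}    # d(A,k), from the r(A) sum
row_b = {"C": 7, "D": 10, "E": 9, "F": 11}  # d(B,k), assumed values
new_row = {k: distance_to_new_node(row_a[k], row_b[k], d_ab) for k in row_a}
print(new_row)
```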

Neighbour Joining
Method
      U    C    D    E
C     3
D     6    7
E     5    6    5
F     7    8    9    8

Advantages
Is fast and thus suited for large datasets
permits lineages with largely different
branch lengths
permits correction for multiple substitutions

Disadvantages
sequence information is reduced
gives only one possible tree
strongly dependent on the model of
evolution used.
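The correction for multiple substitutions depends on the chosen model. As an illustration only, the sketch below applies the Jukes-Cantor model, which is one common choice and an assumption here since no model is named above. It converts an observed proportion of differing sites p into an estimated number of substitutions per site.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75 for the correction to be defined")
    return -0.75 * math.log(1 - 4 * p / 3)


for p in (0.05, 0.1, 0.3, 0.6):
    print(p, round(jukes_cantor(p), 4))
```

The corrected distance is always at least as large as p, and the gap widens as sequences become more divergent, which is where the choice of model matters most.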

Cladistic methods
The principle behind cladistic methods is parsimony: prefer the
hypothesis that requires the fewest assumptions.
Cladistic methods group organisms that share derived
characteristics in a branching hierarchy tree called a
cladogram.
In contrast with phenetic methods, cladistic methods
place more emphasis on the evolutionary origin of species than
on the relationships among them.
They assume that a set of sequences descended from a
common ancestor through mutation and selection.

Cladistic methods
An important concept in cladistic methods
is informative sites.
A position is considered informative
when at least two different nucleotides
occur at that position in the multiple
alignment and each of these nucleotides
is present at least twice.

Informative and
Uninformative site
[Table: alignment positions against sequences; values not recoverable]
(Positions 2 and 4 are informative, while positions 1
and 3 are uninformative.)
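The definition of an informative site translates directly into code. The alignment below is hypothetical and was chosen only so that positions 2 and 4 come out informative; it is not the table from the slide.

```python
from collections import Counter


def is_informative(column) -> bool:
    """True if at least two different states each occur at least twice."""
    counts = Counter(column)
    return sum(1 for c in counts.values() if c >= 2) >= 2


def informative_positions(sequences) -> list:
    """1-based positions of the informative columns in an alignment."""
    return [i + 1 for i, column in enumerate(zip(*sequences)) if is_informative(column)]


# Hypothetical alignment of four sequences.
alignment = ["AAGC",
             "AATC",
             "ACGT",
             "ACGT"]
print(informative_positions(alignment))
```

Position 1 is constant and position 3 has a state that occurs only once, so neither can favour one topology over another.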

Maximum parsimony
Maximum parsimony assumes that trees
with the minimum number of evolutionary
changes are the most preferable trees.
Maximum parsimony constructs all possible
trees and scores each by its number of
character-state changes.
The most parsimonious tree is the one with
the fewest character-state changes.

Steps for building a tree
1. Start with a multiple alignment.
2. Construct all possible topologies and score each
of them by the number of evolutionary changes
it requires.
3. Choose the tree with the fewest evolutionary
changes as the final tree.
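These steps can be sketched for four taxa, where only three topologies exist. The scoring uses Fitch's algorithm, a standard way to count the minimum number of changes on a given tree; naming it here is an addition, not something stated in the slides. The sequences are the four short sequences of this deck's worked example, and trees are written as nested tuples.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, minimum changes) for one alignment column."""
    if isinstance(tree, str):                 # a leaf: its observed state, no changes
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:                                # children agree: no extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Total minimum number of changes over all columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


alignment = {"Outgroup": "ATCGT", "Seq1": "AACGT", "Seq2": "AAGGT", "Seq3": "CAGGT"}
topologies = {
    "(Outgroup,Seq1),(Seq2,Seq3)": (("Outgroup", "Seq1"), ("Seq2", "Seq3")),
    "(Outgroup,Seq2),(Seq1,Seq3)": (("Outgroup", "Seq2"), ("Seq1", "Seq3")),
    "(Outgroup,Seq3),(Seq1,Seq2)": (("Outgroup", "Seq3"), ("Seq1", "Seq2")),
}
scores = {name: parsimony_score(tree, alignment) for name, tree in topologies.items()}
print(scores, "->", min(scores, key=scores.get))
```

The score does not depend on where each four-taxon tree is rooted, so the arbitrary rooting used to write the tuples is harmless.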

Table 1: Raw data

            Sequence    #Difference (vs. outgroup)
Outgroup    ATCGT       0
Seq1        AACGT       1
Seq2        AAGGT       2
Seq3        CAGGT       3

Seq1 has the lowest number of derived character states (differences),
so it is connected to the outgroup.
[Figure: Outgroup joined to Seq1]
Seq2 has the next lowest number of derived character states, so it is
joined to the tree next. Seq2 is connected to Seq1 because Seq2
and Seq1 differ from each other by only one evolutionary step, while Seq2
requires two evolutionary steps in comparison with the outgroup.

[Figure: Outgroup, Seq1 and Seq2 joined in a tree]
Seq3 has the next lowest number of differences, so it is the third to
connect to the tree. To find the position of Seq3 on the tree,
we need to evaluate every possible attachment site.
a. Attaching Seq3 to the line leading to the outgroup requires 3
evolutionary steps.
Outgroup: ATCGT
Seq3    : CAGGT
differences: 3
b. Attaching Seq3 to the line leading to Seq1 requires 2
evolutionary steps.
Seq3: CAGGT
Seq1: AACGT
differences: 2
c. Attaching Seq3 to the line leading to Seq2 requires 1
evolutionary step.
Seq3: CAGGT
Seq2: AAGGT
differences: 1
d. Attaching Seq3 to the point where the lines leading to Seq1
and Seq2 intersect requires 2 evolutionary steps.
Seq3          : CAGGT
Seq1/Seq2 node: AACGT
differences: 2
Option c needs the fewest steps, so Seq3 is attached to the line leading to Seq2.
[Figure: resulting tree with the outgroup; branch labels 3, 2, 1]
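The comparison of attachment sites can be reproduced by counting differences between Seq3 and each candidate. The sketch uses the sequences quoted in options a to d; the dictionary keys and the function name are hypothetical.

```python
def differences(seq_a: str, seq_b: str) -> int:
    """Number of aligned positions at which two sequences differ."""
    return sum(a != b for a, b in zip(seq_a, seq_b))


seq3 = "CAGGT"
candidates = {
    "line to outgroup": "ATCGT",
    "line to Seq1": "AACGT",
    "line to Seq2": "AAGGT",
    "Seq1/Seq2 node": "AACGT",
}
steps = {site: differences(seq3, seq) for site, seq in candidates.items()}
best = min(steps, key=steps.get)
print(steps, "->", best)
```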

Advantages
Reflect the ancestral relationship.
Use all known evolutionary information.

Disadvantages
Yield little information about branch length.
Require long computation time
Yield biased tree under some conditions.
