
Nicole C. Mendoza Jerez
DNA sequencing has advanced to the point where it can produce comprehensive genomic
information, not just for individual representatives of a handful of species, but for
numerous individuals from various populations of the same species and even for entire
ecological communities. This wealth of data at multiple levels has led to the emergence
of new fields of research, such as population genomics and metagenomics, along with
fresh theoretical, computational, and data management challenges. An especially
exciting aspect of these advances is the opportunity to observe evolutionary dynamics
directly, thanks to the capabilities of new genomic technology. In biology, evolution is
the change in the heritable characteristics of populations over successive generations,
driven by processes such as natural selection. Evolutionary biology is being
transformed by increasing access to data about genomes, organisms, and their
environments. All of these data can be linked to the Tree of Life (phylogeny), from
populations to entire clades, and can be integrated through new biodiversity informatics
protocols and networks. One crucial tool associated with this phylogenetic analysis is
Multiple Sequence Alignment (MSA). It is used to compare and analyze multiple
biological sequences, such as DNA, RNA, or protein, and to identify conserved regions
across them: aligning the sequences reveals shared patterns, functional domains, and
regions that have evolved from a common
ancestor.
For example, using the protein and nucleotide sequences of cytochrome c oxidase I
(COX1), also known as mitochondrially encoded cytochrome c oxidase I (MT-CO1), from 9
different species: the coelacanth (Latimeria chalumnae), the great white shark
(Carcharodon carcharias), the skipjack tuna (Katsuwonus pelamis), the sea lamprey
(Petromyzon marinus), Myers' Surinam toad (Pipa myersi), the Nile crocodile (Crocodylus
niloticus), the Chinese cobra (Naja atra), the leatherback sea turtle (Dermochelys
coriacea) and the domestic pigeon (Columba livia domestica), we intend to find out
whether there are significant differences between them, considering that all of these
species belong to the subphylum Vertebrata, which we humans also share with fishes,
sharks, birds, reptiles, amphibians, and primitive jawless fishes (for example,
lampreys). These differences will be analyzed by comparing the Multiple Sequence
Alignments produced by 2 different software packages. First, we extracted the
nucleotide and amino acid sequences of all 9 species from the NCBI database in
multi-FASTA format, resulting in 2 different files.
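To illustrate the multi-FASTA format mentioned above, the following is a minimal sketch of how such a file could be read in Python. The function name parse_multi_fasta and the toy records are hypothetical and are not part of the original exercise; real COX1 sequences are far longer.

```python
from typing import Dict, Iterable


def parse_multi_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse multi-FASTA text into a {header: sequence} dictionary.

    A line starting with '>' opens a new record; the following lines,
    until the next '>', are joined into that record's sequence.
    """
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()
    return records


# Hypothetical toy records, used only to show the file layout.
example = """>toy_shark
ATGGCA
ACCCTA
>toy_tuna
ATGGCT
ATCCTA
"""

records = parse_multi_fasta(example.splitlines())
for name, seq in records.items():
    print(name, len(seq), seq)
```

Each species contributes one record, so a file for this exercise would hold 9 headers and 9 sequences.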
Various algorithms and tools are used to perform MSA, such as ClustalW, Clustal Omega,
MUSCLE, MAFFT, and T-Coffee, and each of them employs a different strategy to align
sequences based on pairwise or progressive methods. MSA algorithms typically use
scoring systems to assess the similarity between sequences and optimize the alignment;
the most common scoring components are substitution matrices (e.g., BLOSUM, PAM), gap
penalties, and similarity scores. In this exercise we will use the T-Coffee and ClustalW
algorithms.
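As an illustration of how a scoring system can rate an alignment, here is a minimal sketch of a sum-of-pairs score. It is a simplification and not the scoring actually used by T-Coffee or ClustalW: the MATCH, MISMATCH, and GAP values are assumptions standing in for a real substitution matrix (such as BLOSUM or PAM) and a real gap penalty scheme.

```python
from itertools import combinations
from typing import List

# Assumed toy scores; real tools use substitution matrices and
# separate gap-opening / gap-extension penalties.
MATCH = 1
MISMATCH = -1
GAP = -2


def pair_score(a: str, b: str) -> int:
    """Score one pair of aligned symbols."""
    if a == "-" and b == "-":
        return 0  # gap against gap carries no information
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def sum_of_pairs(alignment: List[str]) -> int:
    """Sum the pair scores over every column and every pair of rows."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            total += pair_score(a, b)
    return total


# Hypothetical toy alignment of three short sequences.
toy = ["ATG-C", "ATGAC", "A-GAC"]
score = sum_of_pairs(toy)
print(score)
```

An alignment program searches for the gap placement that makes a score of this kind as high as possible.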
MSA of the Nucleotide Multi-Fasta File in T-Coffee

In this MSA, the nucleotide sequences compared are not very similar, since only the
positions marked with an asterisk are identical in all sequences. This suggests that the
species have diverged considerably over evolutionary time, although they still share a
common ancestor, as several fragments of the sequences remain aligned.
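The similarity described above can also be expressed as a number. The following is a minimal sketch that computes the percent identity between two rows of an alignment, assuming that columns containing a gap in either row are ignored (other conventions exist). The function name and the toy sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of identical symbols among columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    identical = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # assumption: gapped columns are not counted
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical toy rows taken from an imaginary alignment.
toy = {
    "toy_a": "ATGGC-TA",
    "toy_b": "ATGAC-TA",
    "toy_c": "TTGACGTA",
}
names = list(toy)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        value = percent_identity(toy[first], toy[second])
        print(f"{first} vs {second}: {value:.1f}%")
```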
MSA of the Amino acid Multi-Fasta File in T-Coffee

In this MSA, the amino acid sequences compared appear even less similar to each other
than the nucleotide sequences. Only the columns marked with an asterisk are exactly the
same in all sequences; a colon (two dots) marks residues that are moderately similar, a
period (one dot) marks residues that are slightly similar, and a blank space means that
there is no resemblance. This difference may be due to the fact that proteins have much
more diverse and specific functions, so it is difficult for them to be the same in every
region.
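The meaning of the symbols described above can be sketched in code. The example below assigns a symbol to each column of a protein alignment. The STRONG and WEAK residue groups are the ones I recall from the Clustal documentation and should be treated as an assumption to check against the tool actually used; treating any column that contains a gap as blank is also an assumption.

```python
from typing import List

# Residue groups as recalled from the Clustal documentation (assumption).
STRONG = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
        "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column: str) -> str:
    """Return '*', ':', '.', or ' ' for one alignment column."""
    residues = set(column)
    if "-" in residues:
        return " "  # assumption: gapped columns get no symbol
    if len(residues) == 1:
        return "*"  # identical in every sequence
    if any(residues <= set(group) for group in STRONG):
        return ":"  # all residues fall in one strongly similar group
    if any(residues <= set(group) for group in WEAK):
        return "."  # all residues fall in one weakly similar group
    return " "


def conservation_line(alignment: List[str]) -> str:
    """Build the symbol line printed under an alignment block."""
    return "".join(column_symbol("".join(col)) for col in zip(*alignment))


# Hypothetical toy protein alignment.
toy = ["MSCLW", "MTSLG", "MAALP"]
for row in toy:
    print(row)
print(conservation_line(toy))
```

For nucleotide alignments only the asterisk applies, because the similarity groups are defined for amino acids.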
MSA of the Nucleotide Multi-Fasta File in ClustalW
MSA of the Amino acid Multi-Fasta File in ClustalW
Using the figures below, think of an organism that could be used as an outgroup for
inferring and deriving evolutionary trees between the above sequences.
An organism that could be used as an outgroup for the above sequences is the
leatherback sea turtle (Dermochelys coriacea).
