Module4 Session1 Prac Lucy Nakabazzi 2

Introduction to Bioinformatics online course: IBT
Practical assignment
Module topic: Multiple Sequence Alignment

Contact session title: Building a Multiple Sequence Alignment
Trainer: Ahmed M. Alzohairy
Name: Lucy Nakabazzi
Date: 9th September, 2022
Assignment – Building a Multiple Sequence Alignment
Introduction
Building multiple sequence alignments (MSA) is far from an exact science. In fact, it’s
more art than science, requiring that you use everything you know in bioinformatics and in
biology. The main idea behind building a multiple sequence alignment is to put similar
amino acids or nucleotides in the same column if they contain the same criterion. There
are four major criteria most scientists use to build a multiple alignment of sequences that
all have different properties.
Tools used in this session
The most popular programs for building MSA: we will see the differences between Cobalt,
ClustalW, MUSCLE, Tcoffee. For analysis: we will use other online tools.
Please note
 Hand-in information If you are formally enrolled in the IBT course, please upload
your completed practical assignment to the Vula ‘Practical Assignments’ tab. Take
note of the final hand-in date for each practical assignment, which will be indicated
on Vula.
Task1: Reading the slides
Task 1: instructions
Read and understand the contents of the powerpoint slides
1- State major characterized differences between Cobalt, ClustalW, MUSCLE,
Tcoffee?
Ans
Cobalt ClustalW MUSCLE Tcoffee
COBALT ClustalW MUSCLE stands It provides a simple
computes a is a for MUltiple Sequence Compariso and ¯flexible means
multiple general- n by Log- Expectation. MUSCLE of generating
protein purpose is claimed to achieve both better multiple alignments,
sequence DNA or average accuracy and better speed using heterogeneous
alignment protein than ClustalW2 or T-Coffee, data sources. The
using multiple depending on the chosen options. data from these
conserved sequence sources are
domain and alignment Can align up to 500 sequences or a provided to T-
local program maximum file size of 1 MB. Coffee via a library
sequence for three of pair-wise
similarity or more alignments.

information sequences
2- What are the Main Applications of Multiple Sequence Alignments?
Ans:
 Extrapolation
 Phylogenetic analysis
 Pattern identification
 Domain identification
 DNA regulatory elements
 Structure prediction
 nsSNP analysis
 PCR analysis
Task 2: Retrieve your protein sequences

We need to retrieve protein sequences (eg. Heat shock protein “HSP70”) from
evolutionary different organisms (Human, Rat, Mouse, Arabidopsis, Chicken, Pig)
- Go to https://www.expasy.org/proteomics
- Search for HSP70 (Heat Shock Protein70)
- Click on (UniProtKB http://www.uniprot.org/uniprot/?query=HSF1&sort=score )
- Retrieve your protein sequences (eg. Heat Shock Protein 70 “HSP70”) from different
organisms
- Select your organism (Human (P0DMV8), Rat ((P0DMW1)), Mouse (Q61696),
Arabidopsis (P22953), Bos Taurus (Q27975), Escherichia coli (P0A6Y8))
- Click Download (Download Selected) then (Go)
- Save it in FASTA format in one text file.
- Align the sequences using Clustal Omega
- Check the gene-based phylogentics tree
Based on the tree:
1- What is the name the MSA program used in UniportKB?

Ans: Clustal Omega
2- How many identical positions are there and what is the identity percentage?
3- How many similar positions are there between those sequences?
4- What is the most different organism in “HSP70” from your list?
Ans: Escherichia coli
5- What is the most similar sequence to Human HSP70 from the set of sequences?
Ans: Bos Taurus
Task 3: Extracting annotation (identifying conserved domain)
From the right-hand side panel you can extract some important information
- Identify the first four Amino Acids symbols at the Beta Strand in Human?
Ans: IGID
- Identify the first four Amino Acids symbols at the Helix in E.Coli?
Ans: QPAK
Task4: Extracting amino acid properties

From the right-hand side panel you can extract some important information
- Identify the first three conserved aromatic amino acids in all specified organisms?
Ans: FFF
- Identify the first three hydrophobic acids in Human?
Ans: AKA
* The student can choose any three
Task5: Interpretation
Check the information in the last lecture slide
6- The last line contains seemingly ClustalW, MUSCLE, or Tcoffee alignment,
cabalistic signs such as (*), (:), or (.), what do they mean?
Ans:
*- Indicates an entirely conserved column
: -indicates columns where all the residues have roughly the same size and the same
hydropathy
. - indicates columns where the size or the hydropathy has been preserved in the
course of evolution.
7- What could conserved K (Lysine), R (Arginine), D (Aspartic Acid), E (Glutamic

Acid) mean in protein MSA?
Ans:
These charged amino acids are often involved in ligand binding. Highly conserved
columns indicate a salt bridge inside the core of a protein.

Module4 Session1 Prac Lucy Nakabazzi 2

Uploaded by

Copyright:

Available Formats

You might also like

Module4 Session1 Prac Lucy Nakabazzi 2

Uploaded by

Document Information

Original Description:

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Module4 Session1 Prac Lucy Nakabazzi 2

Uploaded by

Copyright:

Available Formats

Introduction to Bioinformatics online course: IBT

Module topic: Multiple Sequence Alignment

Assignment – Building a Multiple Sequence Alignment

Tools used in this session

Task1: Reading the slides

similarity or more alignments.

2- What are the Main Applications of Multiple Sequence Alignments?

Task 2: Retrieve your protein sequences

Based on the tree:

1- What is the name the MSA program used in UniportKB?

Task 3: Extracting annotation (identifying conserved domain)

Task4: Extracting amino acid properties

7- What could conserved K (Lysine), R (Arginine), D (Aspartic Acid), E (Glutamic

You might also like