Module4 Session1 Prac Lucy Nakabazzi 2


Introduction to Bioinformatics online course: IBT

Practical assignment

Module topic: Multiple Sequence Alignment


Contact session title: Building a Multiple Sequence Alignment
Trainer: Ahmed M. Alzohairy
Name: Lucy Nakabazzi
Date: 9th September, 2022

Assignment – Building a Multiple Sequence Alignment

Introduction

Building a multiple sequence alignment (MSA) is far from an exact science. In fact, it’s
more art than science, requiring that you use everything you know in bioinformatics and in
biology. The main idea behind building a multiple sequence alignment is to place amino
acids or nucleotides in the same column when they are similar according to a chosen
criterion. Most scientists use four major criteria to build a multiple sequence alignment,
and each criterion has different properties.

Tools used in this session

This session compares the most popular programs for building an MSA: Cobalt, ClustalW,
MUSCLE, and Tcoffee. Other online tools are used for the analysis.

Please note
- Hand-in information: If you are formally enrolled in the IBT course, please upload
your completed practical assignment to the Vula ‘Practical Assignments’ tab. Take
note of the final hand-in date for each practical assignment, which will be indicated
on Vula.

Task 1: Reading the slides

Task 1: instructions
Read and understand the contents of the PowerPoint slides
1- State the major characteristic differences between Cobalt, ClustalW, MUSCLE, and
Tcoffee.
Ans:
Cobalt: COBALT computes a multiple protein sequence alignment using conserved domain and
local sequence similarity information.

ClustalW: ClustalW is a general-purpose DNA or protein multiple sequence alignment program
for three or more sequences.

MUSCLE: MUSCLE stands for MUltiple Sequence Comparison by Log-Expectation. MUSCLE is
claimed to achieve both better average accuracy and better speed than ClustalW2 or
T-Coffee, depending on the chosen options. Can align up to 500 sequences or a maximum
file size of 1 MB.

Tcoffee: It provides a simple and flexible means of generating multiple alignments, using
heterogeneous data sources. The data from these sources are provided to T-Coffee via a
library of pair-wise alignments.

2- What are the main applications of multiple sequence alignments?

Ans:
- Extrapolation
- Phylogenetic analysis
- Pattern identification
- Domain identification
- DNA regulatory elements
- Structure prediction
- nsSNP analysis
- PCR analysis

Task 2: Retrieve your protein sequences


We need to retrieve protein sequences (e.g. heat shock protein “HSP70”) from
evolutionarily different organisms (e.g. Human, Rat, Mouse, Arabidopsis, Chicken, Pig).

Task 2: instructions
- Go to https://www.expasy.org/proteomics
- Search for HSP70 (Heat Shock Protein 70)
- Click on UniProtKB ( http://www.uniprot.org/uniprot/?query=HSF1&sort=score )
- Retrieve your protein sequences (e.g. Heat Shock Protein 70 “HSP70”) from different
organisms
- Select your organisms (Human (P0DMV8), Rat (P0DMW1), Mouse (Q61696),
Arabidopsis (P22953), Bos taurus (Q27975), Escherichia coli (P0A6Y8))
- Click Download (Download Selected) then (Go)
- Save the sequences in FASTA format in one text file.
- Align the sequences using Clustal Omega
- Check the gene-based phylogenetic tree
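
The download step above can also be scripted. The following is a minimal sketch, assuming the UniProt REST endpoint https://rest.uniprot.org/uniprotkb/<accession>.fasta is reachable from your machine. The function names and the output file name are hypothetical and are not part of the course material; the accession numbers are the ones listed above.

```python
from urllib.request import urlopen

# Accessions taken from the instruction list above.
ACCESSIONS = {
    "Human": "P0DMV8",
    "Rat": "P0DMW1",
    "Mouse": "Q61696",
    "Arabidopsis": "P22953",
    "Bos taurus": "Q27975",
    "Escherichia coli": "P0A6Y8",
}


def fasta_url(accession):
    """Build the download URL for one entry (assumed UniProt REST layout)."""
    return f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def fetch_fasta(accession, timeout=30):
    """Download one FASTA record as text. Requires network access."""
    with urlopen(fasta_url(accession), timeout=timeout) as response:
        return response.read().decode("utf-8")


def write_combined_fasta(path, accessions=ACCESSIONS):
    """Write all records into one text file, one FASTA record after another."""
    with open(path, "w", encoding="utf-8") as handle:
        for accession in accessions.values():
            record = fetch_fasta(accession)
            # Make sure each record ends with a newline before the next header.
            handle.write(record if record.endswith("\n") else record + "\n")
```

Calling write_combined_fasta("hsp70.fasta") would produce a single file that can be pasted into Clustal Omega. Nothing is downloaded until that call is made.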

Based on the tree:

1- What is the name of the MSA program used in UniProtKB?


Ans: Clustal Omega
2- How many identical positions are there and what is the identity percentage?
3- How many similar positions are there between those sequences?
4- What is the most different organism in “HSP70” from your list?
Ans: Escherichia coli
5- What is the most similar sequence to Human HSP70 from the set of sequences?
Ans: Bos taurus
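
Questions 2 and 3 ask for counts that can be read off the alignment. As a rough illustration of how identical positions and the identity percentage relate, here is a small sketch. The toy sequences are made up for the example and are not the real HSP70 alignment. It also assumes the percentage is taken over the full alignment length, and tools may use a different denominator.

```python
def identical_columns(aligned):
    """Count columns where every sequence has the same residue and no gap."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    count = 0
    for column in zip(*aligned):
        if "-" not in column and len(set(column)) == 1:
            count += 1
    return count


def percent_identity(aligned):
    """Identical columns as a percentage of the alignment length (an assumption)."""
    return 100.0 * identical_columns(aligned) / len(aligned[0])


# Hypothetical toy alignment, for illustration only.
toy = ["MAKAAAIGID", "MAKNTAIGID", "M-KIIGIDLG"]
print(identical_columns(toy))   # 3
print(percent_identity(toy))    # 30.0
```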

Task 3: Extracting annotation (identifying conserved domain)

Task 3: instructions
From the right-hand side panel you can extract some important information
- Identify the first four amino acid symbols in the beta strand in Human.
Ans: IGID
- Identify the first four amino acid symbols in the helix in E. coli.
Ans: QPAK

Task 4: Extracting amino acid properties


Task 4: instructions
From the right-hand side panel you can extract some important information
- Identify the first three conserved aromatic amino acids in all specified organisms?
Ans: FFF
- Identify the first three hydrophobic amino acids in Human?
Ans: AKA
* The student can choose any three
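
The property highlighting used in this task can be mimicked with simple residue sets. The sketch below is illustrative only. The AROMATIC and HYDROPHOBIC sets follow a commonly used Venn-diagram style classification and are an assumption here, so the alignment viewer may group residues differently. The function names and toy sequences are hypothetical.

```python
# Assumed residue classes; other classifications exist.
AROMATIC = set("FHWY")
HYDROPHOBIC = set("ACFGHIKLMTVWY")


def first_residues_in_class(sequence, residue_class, n=3):
    """Return the first n residues of a sequence that belong to a class."""
    hits = [aa for aa in sequence if aa in residue_class]
    return "".join(hits[:n])


def conserved_class_columns(aligned, residue_class):
    """Return 0-based column indices that are identical in all sequences
    and whose residue belongs to the given class."""
    return [
        index
        for index, column in enumerate(zip(*aligned))
        if len(set(column)) == 1 and column[0] in residue_class
    ]


# Hypothetical toy data, for illustration only.
print(first_residues_in_class("MAKAAAIGID", HYDROPHOBIC))             # MAK
print(conserved_class_columns(["AFGW", "SFGW", "AF-W"], AROMATIC))    # [1, 3]
```

Under this classification lysine (K) counts as hydrophobic because of its long aliphatic side chain, which is one reason different residues can be valid answers here.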

Task 5: Interpretation
Task 5: instructions
Check the information in the last lecture slide
6- The last line of a ClustalW, MUSCLE, or Tcoffee alignment contains seemingly
cabalistic signs such as (*), (:), or (.). What do they mean?
Ans:
* - indicates an entirely conserved column
: - indicates columns where all the residues have roughly the same size and the same
hydropathy
. - indicates columns where the size or the hydropathy has been preserved in the
course of evolution.
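
To make the three symbols concrete, the sketch below derives a consensus line for a toy alignment. It assumes the "strong" and "weak" residue groups as they are commonly listed in Clustal documentation. The group lists are reproduced from memory and should be checked against the tool you use, and the toy sequences are made up.

```python
# Assumed conservation groups (strong -> ':', weak -> '.').
STRONG = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
        "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column):
    """Return '*', ':', '.' or ' ' for one alignment column."""
    residues = set(column)
    if "-" in residues:
        return " "          # columns with gaps get no symbol
    if len(residues) == 1:
        return "*"          # entirely conserved column
    if any(residues <= set(group) for group in STRONG):
        return ":"          # all residues fall inside one strong group
    if any(residues <= set(group) for group in WEAK):
        return "."          # all residues fall inside one weak group
    return " "


def consensus_line(aligned):
    """Build the symbol line printed under an alignment block."""
    return "".join(column_symbol(column) for column in zip(*aligned))


# Hypothetical toy alignment, for illustration only.
toy_alignment = ["MKISW", "MRLAD", "MKVGG"]
print(repr(consensus_line(toy_alignment)))   # '*::. '
```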

7- What could conserved K (Lysine), R (Arginine), D (Aspartic Acid), E (Glutamic
Acid) mean in protein MSA?
Ans:
These charged amino acids are often involved in ligand binding. Highly conserved
columns of these residues may also indicate a salt bridge inside the core of the protein.
