
PHARMACOINFORMATICS PRACTICAL RECORD 2018-2019

EXP NO: 01

DATE:

INTRODUCTION

BIO INFORMATICS:

Bioinformatics is the use of information technology (IT) in biotechnology for storing,
warehousing and analyzing biological data such as DNA sequences. It requires knowledge of
many disciplines, including biology, mathematics, computer science, physics and chemistry,
and of course a sound knowledge of IT to analyze biotech data.

Bioinformatics is an interdisciplinary field that develops and improves methods for
storing, retrieving, organizing and analyzing biological data. It is the application
of computer technology to the management of biological information: computers are used to
gather, store, analyze and integrate the data.

The term bioinformatics was coined by Paulien Hogeweg and Ben Hesper in 1970 as "the
study of informatic processes in biotic systems". Paulien Hogeweg is a Dutch theoretical
biologist and complex systems researcher who studies biological systems as dynamic
information processing systems at many interconnected levels.

BIOINFORMATICS TOOLS MAY BE CATEGORIZED INTO THE FOLLOWING CATEGORIES:

1) Homology and similarity
Homologous sequences are sequences that are related by divergence from a common
ancestor. The degree of similarity between two sequences can be measured, whereas
their homology is either true or false. This set of tools can be used to
identify similarities between novel query sequences of unknown structure and
function and database sequences whose structure and function have been elucidated.
2) Protein function analysis
This group of programs allows you to compare your protein sequence to the
secondary (or derived) protein databases that contain information on motifs and

1 DECCAN SCHOOL OF PHARMACY


protein domains. Highly significant hits against these pattern databases allow
you to approximate the biochemical function of your query protein.
3) Structural analysis
This set of tools allows you to compare structures with the known structure databases.
The function of a protein is more directly a consequence of its structure than of its
sequence, with structural homologs tending to share functions. The determination of a
protein's 2D/3D structure is therefore important in the study of its function.
4) Sequence analysis
This set of tools allows you to carry out further, more detailed analysis on your query
sequence, including evolutionary analysis and identification of mutations, CpG
islands and compositional biases. The identification of these biological properties
provides clues that aid the search to elucidate the specific function of your sequence.
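
Two of the sequence-analysis measures named above, compositional bias and CpG content, can be sketched in a few lines of Python. This is only an illustration: the function names and the toy sequence are assumptions made for this example, and the observed/expected ratio uses the commonly quoted formula (number of CG dinucleotides x sequence length) / (number of C x number of G).

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (a simple compositional-bias measure)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_observed_expected(seq: str) -> float:
    """Observed/expected CpG ratio: (count of CG * length) / (count of C * count of G)."""
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0  # ratio is undefined without both bases; report 0 by convention here
    return seq.count("CG") * len(seq) / (c_count * g_count)


if __name__ == "__main__":
    example = "CGCGAATT"  # hypothetical toy sequence, not real data
    print("GC content:", gc_content(example))
    print("CpG obs/exp:", cpg_observed_expected(example))
```

Real CpG island finders apply such measures in a sliding window together with length thresholds; the sketch only shows the arithmetic for a single stretch of sequence.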

APPLICATIONS

Bioinformatics joins mathematics, statistics, computer science and information technology
to solve complex biological problems. These problems are usually at the molecular level
and cannot be solved by other means. This field of science has applications in many
research areas.

1) Sequence analysis: Sequence analysis uses sequence information to determine which
genes encode regulatory sequences or peptides. Many powerful tools and computers are
available for sequence analysis; they also detect DNA mutations in an organism and
identify sequences that are related. Shotgun sequencing techniques are also used for
sequence analysis of numerous fragments of DNA. Special software is used to find the
overlaps between fragments and to assemble them.
2) Prediction of protein structure: It is easy to determine the primary structure of a
protein, that is, the sequence of amino acids encoded by the DNA molecule, but it is
difficult to determine the secondary, tertiary and quaternary structures of proteins. For
this purpose either crystallography or the tools of bioinformatics can be used to
determine the complete protein structure.

3) Genome annotation: In genome annotation, genomes are marked to identify the
regulatory sequences and protein-coding regions. It is a very important part of the Human
Genome Project as it determines the regulatory sequences.
4) Comparative genomics: It is the branch of bioinformatics that determines the
relationship between genomic structure and function across different biological species. For this
purpose, intergenomic maps are constructed, which enable scientists to trace the
processes of evolution that occur in the genomes of different species. These maps contain
information about point mutations as well as about the
duplication of large chromosomal segments.
5) Health and drug discovery: The tools of bioinformatics are also helpful in drug
discovery, diagnosis and disease management. Complete sequencing of human genes
has enabled scientists to make medicines and drugs that can target more than
5000 genes. Computational tools and identified drug targets have made drug
delivery easier and more specific, because only those cells that are
diseased or mutated need to be targeted. It is also easier to understand the molecular basis of disease.

EXP NO: 02

DATE:

DIFFERENT DATABASES AVAILABLE AND SEARCH TOOLS

DATABASE: A database is a collection of information organized in such a way that a
computer program can quickly select desired pieces of data, so that information can be
searched, compared, retrieved and analyzed.

TYPES OF DATABASES: Scientific databases can be classified according to the data available as
follows:

BIOLOGICAL DATABASES:

These are computer sites that organize, store and disseminate files containing information
consisting of literature, nucleic acid sequences, protein sequences and protein structures.

Another classification of biological databases includes:

SECONDARY DATABASE    PRIMARY SOURCES    STORED INFORMATION

PROSITE               SWISS-PROT         Regular expressions (patterns)
Profiles              SWISS-PROT         Weighted matrices (profiles)
PRINTS                OWL                Aligned motifs (fingerprints)
Pfam                  SWISS-PROT         Hidden Markov models (HMMs)
BLOCKS                PROSITE/PRINTS     Aligned motifs (blocks)
IDENTIFY              BLOCKS/PRINTS      Fuzzy regular expressions
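
To illustrate what "regular expressions (patterns)" means for PROSITE, the sketch below converts a PROSITE-style pattern into a Python regular expression. It is a minimal sketch that assumes only the basic syntax elements (elements separated by "-", x for any residue, [..] for allowed residues, {..} for forbidden residues, (n) or (n,m) for repeats, < and > for the sequence ends); the function name and the test sequences are hypothetical. The pattern N-{P}-[ST]-{P} is the well-known N-glycosylation site pattern.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.rstrip(".")  # PROSITE patterns conventionally end with a period
    parts = []
    for element in pattern.split("-"):
        # {ABC} means "any residue except A, B or C"
        element = element.replace("{", "[^").replace("}", "]")
        # (n) or (n,m) is a repeat count, written {n} or {n,m} in regex syntax
        element = element.replace("(", "{").replace(")", "}")
        # lowercase x is the wildcard; residues are uppercase letters
        element = element.replace("x", ".")
        # < and > anchor the pattern to the N- and C-terminus
        element = element.replace("<", "^").replace(">", "$")
        parts.append(element)
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("N-{P}-[ST]-{P}.")
    print(regex)
    print(bool(re.search(regex, "AANGSAA")))  # toy sequence containing N-G-S-A
```

Dedicated tools such as ScanProsite handle the full pattern syntax and the profile entries; this conversion covers only the simple cases listed above.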

Three major types of pharmacoinformatics databases are:

1) Pharmaceutical databases
   a) Literature databases
   b) Chemical databases
2) Biological databases
   a) Structure databases
   b) Sequence databases
3) Relational databases
   a) Structured Query Language (SQL)
   b) Practical Extraction and Report Language (Perl)

LIST OF WEBSITES:

NCBI        National Center for Biotechnology Information    www.ncbi.nlm.nih.gov/
EMBL        European Molecular Biology Laboratory            www.embl.org
DDBJ        DNA Data Bank of Japan                           www.ddbj.nig.ac.jp/
GenBank     -                                                www.ncbi.nlm.nih.gov/genbank/
PIR         Protein Information Resource                     pir.georgetown.edu/
MIPS        Martinsried Institute for Protein Sequences      www.mips.biochem.mpg.de/
Swiss-Prot  -                                                www.ebi.ac.uk/swissprot/
PROSITE     -                                                www.prosite.expasy.org/
CATH        Class, Architecture, Topology, Homology          www.cathdb.info/
Profiles    -                                                www.profileddatabase.com
PRINTS      -                                                http://www.bioinf.manchester.ac.uk/dbbrowser/PRINTS/index.php
Pfam        -                                                http://www.sanger.ac.uk/resources/database/pfam.html
SCOP        Structural Classification of Proteins            scop.mrc-lmb.cam.ac.uk/scop
PDB         Protein Data Bank                                www.rcsb.org/
Blocks      -                                                http://blocks.fhcrc.org/
GeneCards   -                                                www.genecards.org
UniGene     -                                                http://www.ncbi.nlm.nih.gov/sites/entrez?db=unigene
SGD         Saccharomyces Genome Database                    http://www.yeastgenome.org/
Ensembl     -                                                www.ensembl.org

REPORT: Different databases and search tools were studied.

EXP NO: 03 (a)

DATE:

CREATION OF DATABASE TABLE USING MS ACCESS

AIM: To create a database table using MS Access.

PRINCIPLE: Access is a relational database management system (RDBMS) that can be
used to store and manipulate large amounts of information. A relational database
consists of tables of related information that are linked together based on a key field.

MAIN ELEMENTS OF ACCESS: Access is an object-oriented program. Objects are
modules that provide information and programs, which the user can directly apply to
create applications. Access has the following objects:

• DATABASE WINDOW: In Access, all objects of a database are stored in a single file,
and the file name has an .mdb extension. These objects are managed through the
database window.
• TABLES: Tables are the primary building blocks of an Access database. All data is
stored in tables. Every table in the database focuses on one subject, such as customers,
orders, or products. Every row or record in the table is a unique instance of the subject
of the table.
• QUERIES: A query is a question that you ask of the data stored in the tables of
your database. For example, a query can be created that asks only for the customers who
reside in a particular state. Most Access databases contain more than one table, and a
query can combine specific fields from multiple tables into one datasheet. The data that a
query returns is called a recordset.
• FORMS: Forms present the data from a table or a query in the way we want it to be
presented. The fields in the table or query are made available to place on the forms
we create. We can also edit data through forms, just as we would edit a datasheet bound
to a table or query.
• REPORTS: Reports are necessary for printing the results of the data we store. With
Access we can quickly and easily design reports based on our data.
• MACROS: Macros provide an easy, effective method for automating many database
tasks. Macros can be used for everything from displaying message boxes to validating
data entered into a record before it is saved.

STEPS FOR CREATING A DATABASE IN ACCESS:

1) STEP 1: Access Icon
Starting Access is exactly the same as starting any other Windows application. Double
click on the Microsoft Access icon found in the Microsoft Office group.

2) STEP 2: Creating A Database
• Choose New Database from the File menu. The window displayed asks for the name
of the new database and the location where we want the file to be stored. The
file should be saved with the extension .mdb, for example student.mdb.
• Click on the OK button to create the database.

3) STEP 3: Database Window
• A database window is displayed on the screen. This window provides access to the tables,
queries, forms and other objects that we create.
• The next step is to create the table that will store the data. Generally forms, queries
and reports are based on one table or on multiple tables. For example, a student
database holds students’ personal details such as name, ID number and the courses
they attend.

4) STEP 4: The Table Wizard
Access comes with various wizards that help to create new objects
such as tables and queries. The Table Wizard allows us to use one of the many tables that
the designers of Access have already created. These tables suit many of the purposes
for which we want to use Access.
To create a table,
• Click on the Table button in the database window.
• Click on the create database button.
• This displays a table, which is used to hold the data.

5) STEP 5: Creating A Structure For The Table
A database consists of tables. The structure of a table has to be defined before it is
created. The table structure consists of columns, which are called fields, and rows, which
are referred to as records. A collection of records is one table.
• ADDING FIELDS:
To create a table we need to add the fields (or columns) to hold the data. A field
name can have as many as 64 characters and can contain any letters, numbers and
spaces, except for the period, the accent grave, square brackets and the exclamation mark,
because these characters have special meaning in conjunction with field names in
Access. Leading spaces are not allowed, so each field name must begin with a valid
letter or number.
• DATA TYPES:
The data type of a field determines the kind of data that the field can store. The first field
in the student details table is the ID field. Type the field name as student id in the first
row. Access provides several data types, which determine the manner in which the
data entered into the field is stored. The data types can be accessed by clicking on the
arrow at the right of the data type cell. This displays a drop-down list of the different data
types.

The student ID field will be used as the primary key for the record. Each student’s record will
have a different student ID number so as to uniquely identify it. The primary key is set later in the
session. The ID number can also be assigned automatically when a new record is created by using the
Counter data type. This means that Access will automatically assign a student ID to each new
record, beginning with 1 and incremented by 1 for each new record. To set the Counter data type,
open the data type drop-down list and select Counter. Also enter the other fields with their
corresponding data types, such as address, total marks, remarks, etc.

6) STEP 6: Setting The Primary Key
One of the major benefits of using a relational database is the ease with which we can query
the data to extract meaningful information and produce effective reports. Relational databases
can use several types of keys, the most common being primary, composite and foreign
keys. As a rule, each table in a relational database has one or more fields that uniquely
identify each record. In the table this unique identifier is called the primary key. It should be
set before the design of the table is finished. Here the student ID field will be used
as the key. Select the ID field by clicking on it; an arrow will be displayed before the field name. Click
on the Primary Key button on the toolbar to set this field as the primary key. A key symbol now
appears next to the field name.
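
The same table design can be tried outside Access. The sketch below uses Python's built-in sqlite3 module as a stand-in for Access (an assumption made purely for illustration; the record itself uses the Access interface). The table name, column names and the two sample students are hypothetical. INTEGER PRIMARY KEY AUTOINCREMENT plays the role of the Counter data type and the primary key described above.

```python
import sqlite3


def build_student_table() -> sqlite3.Connection:
    """Create an in-memory student_details table with an auto-numbered primary key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE student_details (
            student_id  INTEGER PRIMARY KEY AUTOINCREMENT,  -- like the Counter type
            name        TEXT NOT NULL,
            course      TEXT,
            total_marks INTEGER
        )
        """
    )
    # Hypothetical sample records; student_id is assigned automatically (1, 2, ...)
    conn.executemany(
        "INSERT INTO student_details (name, course, total_marks) VALUES (?, ?, ?)",
        [("Asha", "B.Pharm", 78), ("Ravi", "Pharm.D", 84)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    connection = build_student_table()
    for row in connection.execute("SELECT * FROM student_details"):
        print(row)
```

Because student_id is the primary key, an attempt to insert a second record with an existing ID is rejected by the database, which is exactly the uniqueness guarantee the primary key provides.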

7) STEP 7: Closing The Table
The window can be closed by selecting Close from the File menu. A message box will
appear asking whether or not to save the changes to the table.

Click on the Yes button in the message box. In the dialog that appears, replace Table1, the default table name,
with student details. This will be the table name.

Click on the OK button to complete the process. The database window will now include the student
details table in the list of tables.

8) STEP 8: Entering Data
Since the table has been created, we can now enter the data. To enter data, select the
student details table. A window with a blank table appears, with the Student ID field of
record number 1 highlighted. A pencil symbol at the left of the student ID indicates the record
into which we are entering data, and the asterisk indicates the last, empty record.
Repeat this process to complete each record and the table.

9) STEP 9: Deleting A Record
To delete a record in a table, select the record and choose EDIT from the menu bar. A drop-down
menu will be displayed. Select DELETE from the drop-down menu. Access will ask for
confirmation that we wish to save the table with the new changes.

10) STEP 10: Inserting A Field
• To insert a new field in the existing table, first switch back to design view using the
Design View button on the toolbar.
• Position the cursor anywhere in the course field.
• The pointer showing the selected field will now appear next to the course field
name.
• Choose Insert Row from the Edit menu; a blank line appears.
• Type the required field name.
• Press the Tab key to move to the data type column.

11) STEP 11: Closing The Database And Exiting From Access
The last step is to close the database and exit from Access. To do that,
• Close the table window by selecting CLOSE from the File menu.
• Choose EXIT from the File menu to exit from Access.

REPORT: Creation of a database table using MS Access was done.

EXP NO: 03 (b)

DATE:

GIVING A QUERY TO THE EXISTING DATABASE USING MS ACCESS QUERY

AIM: To give a query to the existing database.

SOFTWARE USED: MS ACCESS

PRINCIPLE: The real advantage of a database is the ability to see the data selectively and in the
order you want to see it. This is done by specifying conditions for the display of data, which
is referred to as a query. The data in a query can come from one or more tables. After Microsoft
Access retrieves the data that answers our query, we can view and analyze it.
A query is a conditional selection of data, for example, which employees work
in the production department. Access finds the records that meet the criteria we
specify, and orders or updates them as required.

PROCEDURE:

1) OPENING AN EXISTING DATABASE:
The first step is to open an existing database. To do so,
• Choose OPEN from the File menu.
• The Employee database contains only one table, which holds the details of the
employees of the company.
• Select the Employee database from the list of databases displayed.
• Select the Employee details table in the database window, if it is not already
highlighted.
• Click on the Open button to view the information stored in the table.
• Choose CLOSE from the File menu to close the table window so we can see the
main database window.

2) CREATING A NEW QUERY:
Assume that we want a list of all the employees who work as salesmen.
To do this we create a new query that selects only the records in which
we are interested.
• Click on the QUERY tab in the database window.
• Click on the NEW button to create a new query.
• The New Query dialog will appear.
• Click on the NEW QUERY button. A list of tables will be displayed.
• Check that the Employee details table is highlighted, and then click on the ADD
button.
Adding the Employee details table to the query means that we can use the fields
from the table in the query that we are going to create.
• Click on the CLOSE button to close the Add Table dialog box.

3) QUERY WINDOW:
In the query window we can design a query using a feature called graphical Query By Example
(QBE). With graphical QBE, we can create queries by dragging fields from the upper
portion of the query window to the QBE grid in the lower portion of the
window. In the QBE grid, each column contains information about a field included in
the query.

4) ADDING FIELDS TO THE QUERY:
The first step in creating a query is to specify the field on which we want to query the
table, and the fields that contain the data we want to display.
To add the fields to the query:
• Click the pull-down menu to display the field names.
• Click on the field last name to add it to the query.
• Add the position field to the query in the same way. Use the scroll bar to move
down the field list if position is not displayed. The query window will then show
both fields.

5) SETTING THE CRITERIA:
To display only those records where the employee's position is salesman,
set the criteria for the field as follows:
• Position the cursor in the Criteria row for the position field.
• Type salesman in the Criteria row.

6) DISPLAY THE RESULTS:
The query is now complete. To view the results, switch from Design view to
Datasheet view.
To do that:
• Click on the DATASHEET VIEW button on the toolbar.
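
The query built in steps 4 to 6 (show the last name and position fields, with the criterion salesman on position) corresponds to a simple SQL SELECT statement. The sketch below reproduces it with Python's built-in sqlite3 module instead of Access; the table name, column names and the three employees are hypothetical and serve only to make the example runnable.

```python
import sqlite3


def build_employee_db() -> sqlite3.Connection:
    """Create an in-memory employee_details table with hypothetical records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee_details ("
        "employee_id INTEGER PRIMARY KEY, last_name TEXT, position TEXT)"
    )
    conn.executemany(
        "INSERT INTO employee_details (last_name, position) VALUES (?, ?)",
        [("Khan", "salesman"), ("Rao", "manager"), ("Das", "salesman")],
    )
    conn.commit()
    return conn


def query_by_position(conn: sqlite3.Connection, position: str) -> list:
    """Return (last_name, position) rows that meet the criterion, sorted by name."""
    cursor = conn.execute(
        "SELECT last_name, position FROM employee_details "
        "WHERE position = ? ORDER BY last_name",
        (position,),  # the value typed into the Criteria row
    )
    return cursor.fetchall()


if __name__ == "__main__":
    for row in query_by_position(build_employee_db(), "salesman"):
        print(row)
```

The WHERE clause plays the role of the Criteria row in the QBE grid, and the column list after SELECT corresponds to the fields added to the grid.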

7) CLOSING AND SAVING THE QUERY:
To close and save the query:
• Choose CLOSE from the File menu.
• Click on the YES button to save the query.
• Enter the name of the query as sales representative query and click on the OK
button to complete the process.

8) PROCEDURE:
• Open MS Access from MS Office.
• Select Query from the toolbox.
• Select the table, create the query, run it, check the result and
select visualize.

REPORT: Queries were given to existing databases in MS ACCESS.

EXP NO: 04

DATE:

PAIRWISE SEQUENCE ALIGNMENT USING DYNAMIC PROGRAMMING ALGORITHM

AIM: To carry out dynamic programming and align the two given sequences by local and
global alignment.

PRINCIPLE: Sequence alignment is a way of arranging the sequences of DNA, RNA, or
protein to identify regions of similarity that may be a consequence of functional, structural, or
evolutionary relationships between the sequences. Aligned sequences of nucleotide or amino
acid residues are typically represented as rows within a matrix. Gaps are inserted between the
residues so that identical or similar characters are aligned in successive columns. Sequence
alignments are also used for non-biological sequences, such as those present in natural
language or in financial data.

Example (figure): Histone H1 (residues 120-180)

A sequence alignment, produced by Clustal, of mammalian histone proteins. The sequences are
the amino acids for residues 120-180 of the proteins. Residues that are conserved across all
sequences are highlighted in grey. Below the protein sequences is a key denoting conserved
sequence (*), conservative mutations (:), semi-conservative mutations (.), and
non-conservative mutations ( ).

There are two types of sequence alignment:

A) Pairwise sequence alignment
B) Multiple sequence alignment

Pairwise alignment

Pairwise sequence alignment methods are used to find the best-matching piecewise (local) or
global alignments of two query sequences. Pairwise alignments can only be used between
two sequences at a time, but they are efficient to calculate and are often used for methods that
do not require extreme precision (such as searching a database for sequences with high
similarity to a query).

The three primary methods of producing pairwise alignments are

1) Dot-matrix methods
2) Dynamic programming
3) Word methods
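
The first of these methods, the dot matrix, is simple enough to sketch directly. The code below marks every cell where the residue of one sequence equals the residue of the other; runs of marks along a diagonal indicate similar regions. It is a minimal sketch without the window and threshold filtering that real dot-plot programs apply, and the function names and toy sequences are assumptions made for this example.

```python
def dot_matrix(seq_a: str, seq_b: str) -> list:
    """Boolean matrix: cell [i][j] is True when seq_a[i] equals seq_b[j]."""
    return [[a == b for b in seq_b] for a in seq_a]


def render_dot_matrix(seq_a: str, seq_b: str) -> str:
    """Text rendering: seq_b across the top, seq_a down the side, '*' for a match."""
    lines = ["  " + seq_b]
    for residue, row in zip(seq_a, dot_matrix(seq_a, seq_b)):
        lines.append(residue + " " + "".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical toy sequences
    print(render_dot_matrix("GATTACA", "GATCACA"))
```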

Tools used for Pairwise Sequence Alignment

Pairwise sequence alignment is used to identify regions of similarity that may indicate
functional, structural and/or evolutionary relationships between two biological sequences
(protein or nucleic acid).

a) Global Alignment
Global alignment tools create an end-to-end alignment of the sequences to be aligned.
There are separate forms for protein and nucleotide sequences.
1. Needle (EMBOSS): EMBOSS Needle creates an optimal global alignment of two
sequences using the Needleman-Wunsch algorithm.
2. Stretcher (EMBOSS): EMBOSS Stretcher uses a modification of the
Needleman-Wunsch algorithm that allows larger sequences to be globally aligned.
b) Local Alignment
Local alignment tools find one or more alignments describing the most similar region(s)
within the sequences to be aligned.
1. Water (EMBOSS): EMBOSS Water uses the Smith-Waterman algorithm (modified
for speed enhancements) to calculate the local alignment of two sequences.
2. Matcher (EMBOSS): EMBOSS Matcher identifies local similarities between two
sequences using a rigorous algorithm based on the LALIGN application.
3. LALIGN: LALIGN finds internal duplications by calculating non-intersecting local
alignments of protein or DNA sequences.
c) Genomic Alignment
Genomic alignment tools concentrate on DNA (or to-DNA) alignments while
accounting for characteristics present in genomic data.
1. Wise2DBA: Wise2DBA (DNA Block Aligner) aligns two sequences under the
assumption that the sequences share a number of collinear blocks of conservation
separated by potentially large and varied lengths of DNA in the two sequences.
2. GeneWise: GeneWise compares a protein sequence to a genomic DNA
sequence, allowing for introns and frameshifting errors.
3. PromoterWise: PromoterWise compares two DNA sequences allowing for
inversions and translocations, ideal for promoters.

PROCEDURE:

1) The NCBI site was opened.
2) The protein database was selected. The proteins with accession numbers
NP_390424.1 and NP_782596.1 were downloaded.
3) The sequences were displayed in FASTA format and copied into Notepad.
4) The EMBOSS pairwise alignment tools page was opened through Google:
www.ebi.ac.uk/Tools/psa/

The alignment tool was selected and the parameters were set using the drop-down boxes.

Alignment tool

Global – Needleman-Wunsch algorithm

Gap open penalty – 10

Gap extension penalty – 0.5

Matrix – BLOSUM62

5) The protein sequences were pasted and the Run button was clicked.
6) The result of the global alignment was displayed as output. Similarly,
local alignment was performed using the Smith-Waterman algorithm and the result was
displayed.

a) Local alignment:
The respective protein sequences of human and yeast were obtained in FASTA format by
accessing a protein sequence database (Swiss-Prot).
Go to http://www.ebi.ac.uk/Tools/emboss/align/
Paste the respective sequences in the given input boxes.
Set the method to the water tool, keep the remaining options at their defaults, and click Run.
The results were displayed as soon as the task completed.

REPORT: The results show that residues 238 to 330 of the yeast protein sequence aligned locally
with residues 419 to 489 of the human protein sequence, with 23.1% identity (indicated by double dots :)
and 35.6% similarity (which includes both : and .).

b) Global alignment:

The respective protein sequences of human and yeast were obtained in FASTA format by
accessing protein sequence databases.

Go to http://www.ebi.ac.uk/Tools/emboss/align/
Paste the respective sequences in the given input boxes.
Set the method to the needle tool, keep the remaining options at their defaults, and click Run.
The results were displayed as soon as the task completed.

REPORT: The results show that residues 1 to 121 of the S. aureus protein sequence aligned globally
with residues 1 to 147 of the S. epidermidis protein sequence, with 63.9% identity (indicated by double
dots :) and 74.8% similarity (which includes both : and .).
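
The identity figure in such a report can be recomputed from the two aligned rows. The sketch below assumes the convention of dividing the number of identical positions by the full alignment length (gap columns included); the function name and the toy alignment are hypothetical and are not the sequences used in this experiment. Percent similarity additionally needs a substitution matrix such as BLOSUM62 and is not computed here.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns where both rows carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        return 0.0
    # A column counts as identical only if both residues match and neither is a gap
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # Hypothetical toy alignment with one gap column
    print(percent_identity("AC-GT", "ACTGA"))
```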

EXP NO: 05

DATE:

MULTIPLE SEQUENCE ALIGNMENT USING CLUSTAL O

AIM: To perform multiple sequence alignment for given protein sequences using Clustal O.

INTRODUCTION: There are various methods of sequence alignment. These methods
differ in approach, computational complexity and accuracy of results. The four major
categories of alignment methods are:

a) BRUTE FORCE: Based on exhaustive enumeration. It produces alignments without gaps
and has N² complexity, where N is the length of the sequences. This is a trivial method
with hardly any practical utility.
b) DOT MATRIX: Useful for simple alignments. It uses a graphical method that is easy to
understand and apply. However, it does not show the aligned sequences or produce an optimal
alignment.
c) DYNAMIC PROGRAMMING: Produces an optimal alignment by starting the alignment
from one end (as in a dot matrix) and keeping track of all possible best alignments up to each
point.

Dynamic programming can provide global or local sequence alignment.

(i) Global Sequence Alignment
(ii) Local Sequence Alignment

GLOBAL SEQUENCE ALIGNMENT: Sequence alignment methods predate dot matrix
searches, and all the methods in use are related to the original method of Needleman and Wunsch.
The Needleman-Wunsch method quantifies the similarity between sequences. Global alignment is
done across the entire sequence length to include as many matches as possible up to and
including the sequence ends.
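
The dynamic programming idea behind global alignment can be sketched as follows. This is a simplified illustration, not the EMBOSS Needle implementation: it assumes a flat score of +1 for a match, -1 for a mismatch and -2 for every gap position, whereas the tool used in the previous experiment scores with BLOSUM62 and separate gap open and gap extension penalties. The function name and toy sequences are hypothetical.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment with a linear gap penalty. Returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # First row and column: aligning a prefix against nothing costs one gap per residue
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    # Fill: each cell keeps the best of diagonal (match/mismatch), up (gap in b), left (gap in a)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Traceback from the bottom-right corner to the top-left corner
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))  # hypothetical toy sequences
```

Because the traceback always runs from one corner of the matrix to the other, every residue of both sequences appears in the result, which is what makes the alignment global.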

LOCAL SEQUENCE ALIGNMENT: The Smith-Waterman dynamic programming algorithm is used for
local alignment. The algorithm gives the highest-scoring local match between two sequences.
It is suited to sequences that share a matched region that is only a fraction of
their lengths, that have different lengths, that overlap, or where one sequence is a fragment of
the other.
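
Local alignment changes the global recurrence in two ways: no cell is allowed to fall below zero, and the traceback starts at the highest-scoring cell and stops at the first zero. The sketch below shows this under assumed scores of +2 for a match, -1 for a mismatch and -2 per gap position; it is an illustration only, not the EMBOSS Water implementation, and the function name and toy sequences are hypothetical.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Local alignment with a linear gap penalty. Returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at zero lets a new local alignment start anywhere
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Traceback from the best cell until a zero-scoring cell is reached
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + step:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Hypothetical toy sequences sharing only the region ACGT
    print(smith_waterman("XXXACGTXXX", "YYACGTYY"))
```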

d) HEURISTIC METHODS: Heuristic methods are computer programs that make guesses to
obtain approximate results much faster than is possible with exhaustive searching.
They are approximate, but fast and effective. BLAST and FASTA
are the most popular.

i. BLAST (Basic Local Alignment Search Tool):

BLAST was developed by Altschul et al. BLAST is a powerful tool for identifying
similarities between sequences. It finds regions of high local similarity in alignments
without gaps and longer alignments with gaps. BLAST focuses on ungapped alignments of
a certain fixed length. Rather than requiring exact matches, BLAST uses a scoring function to
measure similarity rather than distance. For a given threshold parameter, BLAST reports to the user
all database entries that have a segment pair with the query sequence that scores higher
than the threshold parameter. These pairs of segments are called high-scoring segment pairs
(HSPs).

ii. FASTA
FASTA was developed by Lipman and Pearson. FASTA considers the exact matches
between short substrings, for a given parameter. If a significant number of such exact
matches are found, FASTA uses the dynamic programming algorithm to compute the
optimal alignment. This makes the program faster but loses precision.
FASTA provides a rapid way to find short stretches of similar sequence between a
new sequence and any sequence in a database. These short stretches are called k-tups
(or k-tuples). The default value of k is 2 for protein sequences and 6 for DNA sequences.
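
The k-tup idea can be sketched as a lookup table of short words. The code below indexes every word of length k in a target sequence, then reports where each query word occurs; hits that share the same diagonal (target position minus query position) point to a candidate region for the slower dynamic programming step. This is a simplified illustration of the word-matching stage only, not the FASTA program itself, and the function names and toy sequences are hypothetical.

```python
from collections import Counter, defaultdict


def ktup_index(seq: str, k: int) -> dict:
    """Map every word of length k in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def shared_ktups(query: str, target: str, k: int = 2) -> list:
    """Return (query_pos, target_pos) pairs for every exact k-tup match."""
    index = ktup_index(target, k)
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits


def diagonal_counts(hits: list) -> Counter:
    """Count hits per diagonal; a crowded diagonal suggests a similar region."""
    return Counter(j - i for i, j in hits)


if __name__ == "__main__":
    hits = shared_ktups("ACDE", "XXCDEX", k=2)  # hypothetical toy protein fragments
    print(hits)
    print(diagonal_counts(hits))
```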

TOOLS USED IN MULTIPLE SEQUENCE ALIGNMENT:

Multiple Sequence Alignment (MSA) is generally the alignment of three or more biological
sequences (protein or nucleic acid) of similar length. From the output, homology can be
inferred and the evolutionary relationships between the sequences studied.

• Clustal Omega: New MSA tool that uses seeded guide trees and HMM profile-profile
techniques to generate alignments. Suitable for medium-large alignments.
• ClustalW2: Popular MSA tool that uses tree-based progressive alignments. Suitable
for medium alignments.
• DbClustal: Creates a multiple sequence alignment from a protein BLAST result using
the DbClustal program.
• Kalign: Very fast MSA tool that concentrates on local regions. Suitable for large
alignments.
• MAFFT: MSA tool that uses Fast Fourier Transforms. Suitable for medium-large
alignments.
• MUSCLE: Accurate MSA tool, especially good with proteins. Suitable for medium
alignments.
• MView: Transforms a sequence similarity search result into a Multiple Sequence
Alignment, or reformats a Multiple Sequence Alignment, using the MView program.
• T-Coffee: Consistency-based MSA tool that attempts to mitigate the pitfalls of
progressive alignment methods. Suitable for small alignments.
• WebPRANK: The EBI has a new phylogeny-aware multiple sequence alignment
program which makes use of evolutionary information to help place insertions and
deletions.

APPLICATIONS:

The main applications of Multiple Sequence Alignment are:

1. Structure prediction: A multiple sequence alignment can improve the prediction of
protein or RNA secondary structure, and sometimes it helps even with the 3D structure.
2. Protein family: A multiple sequence alignment can help you decide whether your
protein is a member of a known protein family.
3. Pattern identification: By looking at conserved regions or sites, you can identify
which region is responsible for a functional site.
4. Domain identification: From a multiple sequence alignment, you can extract profiles
and use them to search databases.
5. DNA regulatory elements: You can use multiple sequence alignments to locate
DNA regulatory elements such as binding sites.


6. Phylogenetic analysis: By carefully picking related sequences, you can reconstruct a
tree from the sequences used in the multiple sequence alignment (for example, with
the PHYLIP package).

Multiple sequence alignments play a major role in bioinformatics and can be used almost
anywhere. However, no method is perfect or 100% accurate, so the sequences have to be
chosen very carefully to avoid meaningless results.
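The pattern-identification idea above (finding conserved columns) can be illustrated with a minimal sketch. It assumes the alignment is already available as equal-length strings with "-" for gaps; the function names and the toy sequences are hypothetical and are not part of this record.

```python
from collections import Counter


def column_conservation(aligned):
    """For each column, return the fraction of sequences that share the
    most common residue. Gap characters ("-") never count as the winner."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*aligned):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


def conserved_positions(aligned, threshold=1.0):
    """Return 0-based column indices whose conservation meets the threshold."""
    return [i for i, s in enumerate(column_conservation(aligned)) if s >= threshold]


# Toy alignment (made up for illustration only).
toy = ["MKT-AGV", "MKTLAGI", "MRT-AGV"]
print(conserved_positions(toy))
```

Lowering the threshold (for example to 0.6) also reports columns where most, but not all, sequences agree.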

PROCEDURE:

1. The NCBI website was opened (http://www.ncbi.nlm.nih.gov/).
2. The protein database was selected and the proteins with the following accession
numbers were downloaded.
(i) NP_390424.1

(ii) NP_782596.1

(iii) NP_390425.1

(iv) NP_394546.1

(v) NP_782598.1

(vi) NP_782597.1

(vii) NP_390426.1

(viii) NP_394545.1

3. From the results displayed, the FASTA link was selected to view each sequence in
FASTA format, and the FASTA sequences were copied into Notepad.
4. ClustalW2 was opened (http://www.ebi.ac.uk/Tools/msa/clustalw2/).
5. The homepage of ClustalW2 was displayed. The previously copied protein sequences
were pasted into the query box.

Note: each sequence entered in the query box should end with *, and each new sequence
should start on a new line.
6. The Submit button was clicked. The results of the multiple sequence alignment were
displayed in the form of a cladogram.
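The record downloads the sequences through the NCBI web pages. As an alternative sketch only, the same FASTA records could be requested in one call through the NCBI E-utilities `efetch` service; this is an assumption, not the procedure used here. The code only builds the request URL and does not contact the network; the helper name `efetch_fasta_url` is hypothetical.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Accession numbers listed in step 2 of the procedure.
ACCESSIONS = [
    "NP_390424.1", "NP_782596.1", "NP_390425.1", "NP_394546.1",
    "NP_782598.1", "NP_782597.1", "NP_390426.1", "NP_394545.1",
]


def efetch_fasta_url(accessions, db="protein"):
    """Build an efetch URL that asks for the given records as plain-text FASTA."""
    params = {
        "db": db,
        "id": ",".join(accessions),
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


url = efetch_fasta_url(ACCESSIONS)
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the eight sequences in FASTA format, ready to paste into the alignment tool.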


REPORT: Multiple sequence alignment of the protein sequences was performed using Clustal
Omega.

EXP NO: 06

DATE:

SIMILARITY SEARCH BY BLAST

INTRODUCTION: BLAST stands for Basic Local Alignment Search Tool. It is a
similarity search tool available at the National Center for Biotechnology Information (NCBI),
USA. It is a heuristic algorithm for comparing primary biological sequence information,
such as the amino acid sequences of different proteins or the nucleotide sequences of DNA.
A BLAST search enables a researcher to compare a query sequence with a library or database
of sequences and identify library sequences that resemble the query sequence above a certain
threshold. It first builds a list of high-scoring words of length w from the query sequence.
BLAST then takes each word from the query sequence and locates all occurrences of that word
in the database sequence currently being tested.

Example: w is 3 for amino acid sequences and 11 for nucleotide sequences.

BLAST compares the word list to the database and identifies the matching words. When a
match is found, BLAST tries to extend the alignment into the adjacent positions without
allowing gaps. After all words are tested, a set of maximal segment pairs is chosen for each
database sequence.
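The seed-and-extend idea described above can be sketched in a few lines. This is a simplified illustration, not the BLAST implementation: it seeds only on exact word matches and uses an assumed +1/-1 match/mismatch score with a simple drop-off rule, whereas BLAST scores words and extensions with a substitution matrix. All function names and parameter values are hypothetical.

```python
def find_seeds(query, subject, w):
    """Yield (query_pos, subject_pos) for every exact word match of length w."""
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            yield i, j


def extend_ungapped(query, subject, i, j, w, match=1, mismatch=-1, x_drop=2):
    """Extend a seed left and right without gaps. Extension in a direction
    stops once the running score falls x_drop below the best score seen."""
    def walk(qi, sj, step):
        best = score = 0
        best_len = n = 0
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            score += match if query[qi] == subject[sj] else mismatch
            n += 1
            if score > best:
                best, best_len = score, n
            elif best - score >= x_drop:
                break
            qi += step
            sj += step
        return best, best_len

    right_score, right_len = walk(i + w, j + w, 1)
    left_score, left_len = walk(i - 1, j - 1, -1)
    total = w * match + left_score + right_score
    # (score, query start, subject start, segment length)
    return total, i - left_len, j - left_len, w + left_len + right_len


def best_segment_pair(query, subject, w):
    """Return the highest-scoring ungapped segment pair, or None if no seed exists."""
    best = None
    for i, j in find_seeds(query, subject, w):
        hit = extend_ungapped(query, subject, i, j, w)
        if best is None or hit[0] > best[0]:
            best = hit
    return best


print(best_segment_pair("ACGTTGCA", "TTACGTTGCATT", 4))
```

A pair is reported only if its score exceeds the chosen threshold; the threshold itself is left out of this sketch.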

BLAST output

• The BLAST output includes the graphical overview box, a matching list and a text
description of the alignment.
• The graphical overview box contains coloured horizontal bars that allow quick
identification of the number of database hits and the degree of similarity of the hits.
• The colour coding of the horizontal bars corresponds to the ranking of similarities of the
sequence hits:
Red – high similarity
Green and blue – moderately related
Black – less similar


• The length of each bar represents the span of the sequence alignment relative to the query
sequence. Each bar is hyperlinked to the actual pairwise alignment in the text portion of
the report.
• Below the graphical box is a list of matching hits ranked by E-value in ascending order.
Each hit includes the accession number, the title of the database record, the bit score and
the E-value. This list is followed by the text description, which may be divided into three
sections. The first is the header, which contains the GenInfo identifier (GI) number or the
reference number of the database hit and the one-line description of the database sequence.
The second is the statistics section, which includes the bit score, E-value and percentages
of identity, similarity and gaps. The third is the alignment, in which the query sequence is
on the top of the pair and the database sequence, labelled as subject, is at the bottom.
• Between the two sequences, matching identical residues are written out at their
corresponding positions, whereas non-identical but similar residues are labelled with a plus
sign (+).
• Any residues identified as low complexity regions (LCRs) in the query sequence are
masked with Xs or Ns so that no alignment is represented in those regions.

Statistical significance of BLAST

The BLAST output provides a list of pairwise sequence matches ranked by statistical
significance. The significance scores help to distinguish evolutionarily related sequences from
unrelated ones. In BLAST searches, this statistical indicator is known as the E-value
(expectation value). It indicates the likelihood that the resulting alignments from a database
search are caused by random chance; more precisely, it is the number of matches with an equal
or better score that would be expected by chance in a search of that database.

E = m × n × P

m = total number of residues in the database.

n = number of residues in the query sequence.

P = probability that an HSP alignment is the result of random chance.

The E-value is related to the P-value, which is used to assess the significance of a single
pairwise alignment. The E-value provides information about the likelihood that a given
sequence match is purely due to chance.


If the E-value is below 1e-50, there is extremely high confidence that the database match
is the result of a homologous relationship.

If E is between 1e-50 and 0.01, the match is considered to be the result of homology.

If E is between 0.01 and 10, the match is considered not significant.

If E > 10, the sequences are considered either unrelated or related so distantly that the
relationship cannot be detected.
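The formula and the interpretation ranges above can be put together in a short sketch. The function names, the wording of the labels and the example numbers are illustrative assumptions; only the formula and the cut-offs come from the text.

```python
def expect_value(m, n, p):
    """E = m x n x P, with m the database size in residues, n the query
    length in residues and p the chance probability of the HSP."""
    return m * n * p


def interpret_e_value(e):
    """Map an E-value to the interpretation ranges given in the text."""
    if e < 1e-50:
        return "extremely high confidence of homology"
    if e < 0.01:
        return "homology"
    if e <= 10:
        return "not significant"
    return "no detectable relationship (at most a very distant one)"


# Made-up example: a 250-residue query against a 1,000,000-residue database.
e = expect_value(1_000_000, 250, 1e-12)
print(e, interpret_e_value(e))
```

Because E grows with the database size m, the same alignment becomes less significant when the search is run against a larger database.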

BLAST Program Options

PROGRAM    QUERY SEQUENCE             DATABASE                   TYPE OF ALIGNMENT

BLASTP     Protein                    Protein                    Gapped

BLASTN     Nucleic acid               Nucleic acid               Gapped

BLASTX     Translated nucleic acid    Protein                    Each frame gapped

TBLASTN    Protein                    Translated nucleic acid    Each frame gapped

TBLASTX    Translated nucleic acid    Translated nucleic acid    Each frame gapped


EXP NO: 06(a)

DATE:

SIMILARITY SEARCH USING BLASTP

AIM: To perform BLASTP against the Protein Data Bank (PDB) database at NCBI.

QUERY SEQUENCE: P71913

PROCEDURE:

1. The ribokinase protein sequence of Mycobacterium tuberculosis was selected from the
database.
2. Go to http://blast.ncbi.nlm.nih.gov/Blast.cgi
3. Select BLASTP, as the input sequence is a protein.
4. Paste the protein sequence in FASTA format into the input box.
5. Keep the default parameters, except set the Database option to PDB, and click the
search button.
6. As soon as the search completes, the results are displayed.
7. The BLAST output includes the graphical overview box, a matching list and a text
description of the alignment.


RESULT: The similarity search for Ribokinase was performed.


EXP NO: 06(b)

DATE:

SIMILARITY SEARCH USING BLASTN

AIM: To perform BLASTN against the nucleotide database.

PROCEDURE:

1. The adenine sequence was selected from the database.
2. Go to http://blast.ncbi.nlm.nih.gov/Blast.cgi
3. Select BLASTN, as the input sequence is a nucleotide sequence.
4. Paste the nucleotide sequence in FASTA format into the input box.
5. Keep the default parameters, except set the Database option to nucleotide collection
(nr/nt), and click the search button.
6. As soon as the search completes, the results are displayed.
7. The BLAST output includes the graphical overview box, a matching list and a text
description of the alignment.


RESULT: The similarity search for Adenine was performed.


EXP NO: 07

DATE:

SIMILARITY SEARCH FOR PROTEIN SEQUENCE BY USING FASTA

INTRODUCTION:

FASTA

• FASTA is a database search algorithm.
• It was developed at the University of Virginia.
• It uses the Pearson and Lipman algorithm to search for similarities between the query
sequence and database sequences of the same type.
• FASTA is a word-based method. It looks for matching "words", or sequence
patterns, called k-tuples.
• It then builds a local alignment based upon the word matches.
• It makes a list of all the words in each sequence. It matches identical words from each
list and then creates diagonals by joining adjacent matches.
• FASTA then rescores the highest-scoring regions using a substitution matrix (e.g.
PAM, BLOSUM). The best of these scores is called init1. FASTA then joins together the
high-scoring diagonals, allowing for gaps. The best score from that step is called initn.
• FASTA finally uses the Smith-Waterman algorithm to identify an optimal local alignment
around the regions it has discovered.
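The word-matching and diagonal step described above can be sketched as follows. This covers only the first stage (identical k-tuples grouped by diagonal); rescoring with a substitution matrix, joining diagonals and the final Smith-Waterman step are omitted. The function name and the toy sequences are hypothetical.

```python
from collections import Counter


def ktup_diagonals(query, subject, ktup=2):
    """Count identical k-tuple matches per diagonal.

    A match of the word at query position i with subject position j lies on
    diagonal j - i; many hits on one diagonal suggest an ungapped region of
    similarity."""
    positions = {}
    for i in range(len(query) - ktup + 1):
        positions.setdefault(query[i:i + ktup], []).append(i)
    diagonals = Counter()
    for j in range(len(subject) - ktup + 1):
        for i in positions.get(subject[j:j + ktup], []):
            diagonals[j - i] += 1
    return diagonals


# Toy protein fragments (made up for illustration), using ktup = 2.
hits = ktup_diagonals("MKTAYIAK", "GGMKTAYIAKQQ", ktup=2)
print(hits.most_common(1))
```

Here every shared word falls on the same diagonal, which is the pattern FASTA would pick out as its best initial region.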

IMPLEMENTATIONS OF FASTA

• FASTA: It compares a protein sequence to another protein sequence or a protein
library, or a DNA sequence to another DNA sequence.
• TFASTA: It compares a protein sequence to a DNA sequence by translating the DNA
sequence in all 6 possible reading frames and then comparing each frame to the protein
sequence.
• LFASTA: It identifies one or more regions of similarity between two sequences.


• PLFASTA: It presents a dot matrix plot of regions of sequence similarity between
two sequences.
• FASTX and FASTY: They translate a probe DNA sequence in 3 reading frames and
compare all 3 frames to a protein sequence database.

FASTA FORMAT

• FASTA is one of the simplest and most popular sequence formats, since it contains
plain sequence information that is easily readable by many analysis programs.
• It has a single definition line that begins with a right angle bracket (>) followed by a
sequence name.
• The plain sequence, in standard one-letter symbols, starts on the second line. Each
line of sequence data is limited to 60-80 characters in width.
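The format rules above are simple enough to read and write with a few lines of code. The following is a minimal sketch; the function names `parse_fasta` and `format_fasta` and the example records are hypothetical, and real files may need extra checks (for example, for invalid characters).

```python
def parse_fasta(text):
    """Split FASTA text into (definition line, sequence) pairs.

    The leading ">" is dropped from the definition line and the wrapped
    sequence lines are joined into one string."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines carry no information
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def format_fasta(header, sequence, width=60):
    """Write one record with the sequence wrapped to the given line width."""
    lines = [">" + header]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines)


example = ">seq1 toy protein\nMKTAYIAK\nQRQISFVK\n\n>seq2\nACGT\n"
print(parse_fasta(example))
```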

STATISTICAL SIGNIFICANCE OF FASTA

FASTA also uses E-values and bit scores, and the estimation of these parameters in FASTA
is essentially the same as in BLAST. However, the FASTA output provides one more statistical
parameter, called the z-score. This describes the number of standard deviations from the
mean score for the database search.

Z score > 15 – extremely significant match

Z 5-15 – highly probable homologs

Z < 5 – distant relationship

The higher the reported z-score, the more significant the match.
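As a rough illustration of "standard deviations from the mean score", the sketch below computes a plain z-score from a list of background scores and applies the ranges given above. This is a simplification under stated assumptions: the FASTA program derives its z-scores from the whole database search and corrects for sequence length, which is not modelled here. The function names and numbers are hypothetical.

```python
from statistics import mean, pstdev


def z_score(score, background_scores):
    """Number of standard deviations by which score exceeds the mean
    of the background (unrelated-sequence) scores."""
    mu = mean(background_scores)
    sigma = pstdev(background_scores)
    if sigma == 0:
        raise ValueError("background scores have zero spread")
    return (score - mu) / sigma


def interpret_z(z):
    """Map a z-score to the ranges given in the text."""
    if z > 15:
        return "extremely significant match"
    if z >= 5:
        return "highly probable homologs"
    return "distant relationship"


# Made-up numbers: background scores 48 and 52 give mean 50 and deviation 2.
z = z_score(90, [48, 52])
print(z, interpret_z(z))
```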


EXP NO: 07 (a)

DATE:

SIMILARITY SEARCH FOR PROTEIN SEQUENCE BY USING FASTA

AIM: To perform FASTA for the following protein sequence.

>P71913 RIBOKINASE RBSK
MANASETNVGPMAPRVCVVGSVNMDLTFVVDALPRPGETVLAASLTRTPGGKGANQAVAAARAGAQVQFS
GAFGDDPAAAQLRAHLRANAVGLDRTVTVPGPSGTAIIVVDASAENTVLVAPGANAHLTPVPSAVANCDV
LLTQLEIPVATALAAARAAQSADAVVMVNASPAGQDRSSLQDLAAIADVVIANEHEANDWPSPPTHFVIT
LGVRGARYVGADGVFEVPAPTVTPVDTAGAGDVFAGVLAANWPRNPGSPAERLRALRRACAAGALATLVS
GVGDCAPAAAAIDAALRANRHNGS

PROCEDURE:

• The ribokinase protein sequence of Mycobacterium tuberculosis (P71913) was
selected from the database.
• Go to http://www.ebi.ac.uk/Tools/fasta33/index.html
• Paste the protein sequence in FASTA format into the input box.
• Keep the default parameters and click the RUN button.
• As soon as the search completes, the results are displayed.
• The FASTA output includes a matching list and a text description of the sequences that
showed similarity.
• Clicking the show alignment button displays the alignment of the query protein with
the similar sequences.


RESULT: The similarity search for Ribokinase was performed using FASTA
