Bio 01 PDF
EXP NO: 01
DATE:
INTRODUCTION
BIO INFORMATICS:
Bioinformatics is the use of IT in biotechnology for data storage, data warehousing and
the analysis of DNA sequences. Knowledge of many branches is required, such as biology,
mathematics, computer science, the laws of physics and chemistry, and of course a sound
knowledge of IT to analyze biotech data.
Bioinformatics is an interdisciplinary field that develops and improves upon methods for
storing, retrieving, organizing and analyzing biological data. It is the application of
computer technology to the management of biological information: computers are used to
gather, store, analyze and integrate the data.
The term bioinformatics was coined by Paulien Hogeweg and Ben Hesper in 1970 as "the
study of informatic processes in biotic systems". Paulien Hogeweg is a Dutch theoretical
biologist and complex systems researcher who studies biological systems as dynamic
information processing systems at many interconnected levels.
protein domains. Highly significant hits against these different pattern databases allow
you to approximate the biochemical function of your query protein.
3) Structural analysis
This set of tools allows you to compare structures against databases of known
structures. The function of a protein is more directly a consequence of its structure
than of its sequence, and structural homologs tend to share function, so the
determination of a protein's 2D/3D structure is important in the study of its function.
4) Sequence analysis
This set of tools allows you to carry out further, more detailed analysis on your query
sequence, including evolutionary analysis and the identification of mutations, CpG
islands and compositional biases. The identification of these biological properties
provides clues that aid the search to elucidate the specific function of your sequence.
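The compositional measures mentioned under sequence analysis can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the observed/expected CpG ratio used here (number of CG dinucleotides times sequence length, divided by the product of the C and G counts) is one commonly used formula, assumed rather than taken from this record.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_observed_expected(seq: str) -> float:
    """Return the observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    cpg_count = seq.count("CG")  # CG cannot overlap itself, so count() is exact
    return cpg_count * len(seq) / (c_count * g_count)
```

A region with a high GC fraction and a high observed/expected ratio is a candidate CpG island; the exact thresholds depend on the tool being used.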
APPLICATIONS
3) Genome annotation: In genome annotation, genomes are marked up to identify the
regulatory sequences and protein-coding regions. It is a very important part of the
Human Genome Project, as it determines the regulatory sequences.
4) Comparative genomics: This is the branch of bioinformatics that determines the
relationship of genomic structure and function between different biological species.
For this purpose, intergenomic maps are constructed, which enable scientists to trace
the processes of evolution that occur in the genomes of different species. These maps
contain information about point mutations as well as about the duplication of large
chromosomal segments.
5) Health and drug discovery: The tools of bioinformatics are also helpful in drug
discovery, diagnosis and disease management. Complete sequencing of human genes
has enabled scientists to make medicines and drugs that can target more than
5000 genes. Different computational tools and drug targets have made drug
delivery easier and more specific, because now only those cells that are diseased or
mutated can be targeted. It is also easier to learn the molecular basis of a disease.
EXP NO: 02
DATE:
BIOLOGICAL DATABASES:
These are computer sites that organize, store and disseminate files containing information
such as literature, nucleic acid sequences, protein sequences and protein structures.
1) Pharmaceutical Databases
a) Literature Databases
b) Chemical Databases
2) Biological Databases
a) Structure Databases
b) Sequence Databases
3) Relational Databases
a) Structured Query Language (SQL)
b) Practical Extraction and Report Language (PERL)
LIST OF WEBSITES:
GenBank - www.ncbi.nlm.nih.gov/genbank/
Swissprot - www.ebi.ac.uk/swissprot/
Prosite - www.prosite.expasy.org/
Profiles - www.profileddatabase.com
Prints - http://www.bionif.manchester.ac.uk/bdbrowser/PRINTS/index.php
Pfam - http://www.sanger.ac.uk/resources/database/pfam.html
Blocks - http://blocks.fthere.org/
Ensembl - www.ensembl.org
DATE:
DATABASE WINDOW: In Access, all objects of a database are stored in a single file,
and the file name has an MDB extension. These objects are managed through the
database window.
TABLES: Tables are the primary building blocks of an Access database. All data is
stored in tables. Every table in the database focuses on one subject, such as customers,
orders, or products. Every row or record in the table is a unique instance of the subject
of the table.
QUERIES: A query is a question that you ask of the data stored in the tables of
your database. For example, a query can be created that asks only for the customers who
reside in a particular state. Most Access databases contain more than one table, and a
query can combine specific fields from multiple tables into one datasheet. The data that
a query returns is called a record set.
FORMS: Forms present the data from a table or a query in the way we want it to be
represented. The fields in the table or query are made available to place on the forms
we create. We can also edit the data in a form just as we would edit a datasheet bound
to a table or query.
REPORTS: Reports are still necessary for printing the results of the data we store. With
Access we can quickly and easily design reports based on our data.
MACROS: Macros provide an easy, effective method for automating many database
tasks. Macros can be used for everything from displaying message boxes to validating
data entered into a record before it is saved.
The next step is to create the table that will store the data. Generally, forms, queries
and reports are based on one table or on multiple tables. For example, a student
database holds students' personal details such as name, ID number and the courses
they attend.
DATA TYPES:
The data type of a field determines the kind of data that the field can store. The first
field in the student details table is the ID field. Type the field name as student id in the
first row. Access provides several data types, which determine the manner in which the
data entered into the field is stored. The data types can be accessed by clicking on the
arrow at the right of the data type cell. This displays a drop-down list of the different
data types.
The student ID field will be used as the primary key for the record. Each student's record
will have a different student ID number so as to identify it uniquely. The primary key is
set later in the session. The ID number can also be assigned automatically when a new
record is created, by using the Counter data type. This means that Access will
automatically assign a student ID to each new record, beginning with 1 and incremented
by 1 for each new record. To set the Counter data type, drop down the data type list and
select Counter. Also enter the other fields with the data types corresponding to their
data, such as addresses, total marks, remarks, etc.
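The same table design can be sketched outside Access. The example below uses Python's built-in sqlite3 module, not Access itself, and the column names and the two sample rows are made up purely for illustration. An INTEGER PRIMARY KEY AUTOINCREMENT column plays the role that the Counter data type plays in the description above.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# student_id is the primary key and is assigned automatically,
# starting at 1 and increasing by 1 for each new record.
conn.execute(
    """
    CREATE TABLE student_details (
        student_id  INTEGER PRIMARY KEY AUTOINCREMENT,
        name        TEXT NOT NULL,
        address     TEXT,
        total_marks INTEGER,
        remarks     TEXT
    )
    """
)

# Hypothetical sample records; no ID is supplied by the user.
sample_rows = [
    ("Asha", "Hyderabad", 82, "Good"),
    ("Ravi", "Pune", 74, "Fair"),
]
conn.executemany(
    "INSERT INTO student_details (name, address, total_marks, remarks) "
    "VALUES (?, ?, ?, ?)",
    sample_rows,
)

student_ids = [
    row[0]
    for row in conn.execute(
        "SELECT student_id FROM student_details ORDER BY student_id"
    )
]
print(student_ids)  # [1, 2]
```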
Click on the Yes button in the message box that appears, and replace Table1, the default
table name, with student details. This will be the table name.
Click on the OK button to complete the process. The database window will now include the
student details table in the list of tables.
11) STEP 11: Closing The Database And Exiting From Access
The last step is to close the database and exit from Access. To do that,
Close the table window by selecting CLOSE from the file menu.
Choose EXIT from the file menu to exit from Access.
DATE:
PRINCIPLE: The real advantage of a database is the ability to see the data selectively and
in the order you want to see it. This is made possible by specifying conditions for the
display of data, and is referred to as a query. The data in a query can come from one or
more tables. After Microsoft Access retrieves the data that answers the query we have
created, we can view and analyze it. A query is a conditional selection of data, for
example, which employees work in the production department. Access will find the records
that meet the criteria we specify, and order or update them as discussed.
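The idea of a query as a conditional selection can be sketched with Python's built-in sqlite3 module rather than Access. The table name, column names and employee records below are hypothetical; the WHERE clause corresponds to the criterion and ORDER BY to the sort order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_details ("
    "emp_id INTEGER PRIMARY KEY, name TEXT, department TEXT)"
)
# Made-up records used only to demonstrate the selection.
conn.executemany(
    "INSERT INTO employee_details (name, department) VALUES (?, ?)",
    [
        ("Meena", "Production"),
        ("Farid", "Accounts"),
        ("Anil", "Production"),
    ],
)


def employees_in(department: str) -> list[str]:
    """Return the names of employees in one department, sorted by name."""
    cursor = conn.execute(
        "SELECT name FROM employee_details WHERE department = ? ORDER BY name",
        (department,),
    )
    return [row[0] for row in cursor]


print(employees_in("Production"))  # ['Anil', 'Meena']
```

The rows returned by the SELECT statement correspond to the record set that a query produces.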
PROCEDURE:
Adding the employee details table to the query means that we can use the fields
from the table in the query that we are going to create.
Click on the CLOSE button to close the Add Table dialog box.
3) QUERY WINDOW:
In the query window we can design a query using a feature called Graphical Query By
Example (QBE). With Graphical QBE, we can create queries by dragging fields from the
upper portion of the query window to the QBE grid in the lower portion of the
window. In the QBE grid, each column contains information about a field included in
the query.
8) PROCEDURE:
Go to MS-ACCESS from MS-OFFICE.
Select the query from the toolbox.
Create a query after selecting the table, run the query, check the result and
select visualize.
EXP NO: 04
DATE:
AIM: To carry out dynamic programming and align the two given sequences by local and
global alignment.
Ex:
Pairwise alignment
Pairwise sequence alignment methods are used to find the best-matching piecewise (local) or
global alignments of two query sequences. Pairwise alignments can only be used between
two sequences at a time, but they are efficient to calculate and are often used for methods that
do not require extreme precision (such as searching a database for sequences with high
similarity to a query). The main methods are:
1) Dot-matrix methods
2) Dynamic programming
3) Word methods
Pairwise sequence alignment is used to identify regions of similarity that may indicate
functional, structural and/or evolutionary relationships between two biological sequences
(protein or nucleic acid).
a) Global Alignment
Global alignment tools create an end-to-end alignment of the sequences to be aligned.
There are separate forms for protein and nucleotide sequences.
1. Needle (EMBOSS): EMBOSS Needle creates an optimal global alignment of two
sequences using the Needleman-Wunsch algorithm.
2. Stretcher (EMBOSS): EMBOSS Stretcher uses a modification of the Needleman-
Wunsch algorithm that allows larger sequences to be globally aligned.
b) Local Alignment
Local alignment tools find one, or more, alignment describing the most similar region(s)
within the sequences to be aligned.
1. Water (EMBOSS): EMBOSS Water uses the Smith-Waterman algorithm (modified
for speed enhancements) to calculate the local alignment of two sequences.
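The dynamic programming idea behind global alignment can be sketched as follows. This is a minimal Needleman-Wunsch illustration, assuming a simple match/mismatch score and a linear gap penalty; it does not reproduce EMBOSS Needle, which uses a substitution matrix and separate gap opening and extension penalties. The function name and default scores are assumptions.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -1):
    """Return (score, aligned_a, aligned_b) for a global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the matrix row by row.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align two residues
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right corner to recover one alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

Because the traceback always starts at the bottom-right corner and ends at the top-left, every residue of both sequences appears in the result, which is what makes the alignment global.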
PROCEDURE:
The alignment tool was selected and the parameters were set using the drop-down boxes.
Alignment tool
Gap open – 10
Matrix – BLOSUM 62
REPORT: The results show that positions 238 to 330 of the yeast protein sequence locally
aligned with positions 419-489 of the human protein sequence, with 23.1% identity
(indicated by double dots :) and 35.6% similarity (which includes both : and .).
b) Global alignment:
Respective protein sequences in fasta format of human and yeast were obtained by
accessing protein sequence databases.
Log in to http://www.ebi.ac.uk/Tools/emboss/align/
Paste the respective sequences in the given input boxes.
Keep the method set to the needle tool and the remaining options at their defaults, then click run.
Results were displayed as soon as the task completed.
REPORT: The results show that positions 1 to 121 of the S. aureus protein sequence globally
aligned with positions 1-147 of the S. epidermidis protein sequence, with 63.9% identity
(indicated by double dots :) and 74.8% similarity (which includes both : and .).
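The identity figure in reports like these is a simple ratio, which can be sketched as below. The sketch assumes that identity is the number of identical aligned positions divided by the full alignment length, gaps included; the function name is hypothetical. Similarity is not computed here because it depends on the substitution matrix used by the tool.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns where both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        return 0.0
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * identical / len(aligned_a)


print(percent_identity("ACGT", "A-GT"))  # 75.0
```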
EXP NO: 05
DATE:
AIM: To perform multiple sequence alignment for given protein sequence using Clustal O
The Smith and Waterman method is used for two sequences that have a matched region that is
only a fraction of their lengths, that have different lengths, that overlap, or where one
sequence is a fragment of the other.
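A local alignment of this kind can be sketched with a small Smith-Waterman implementation. The scoring values (match +2, mismatch -1, linear gap -1) and the function name are assumptions made for the illustration; real tools use substitution matrices and affine gap penalties. The two differences from a global alignment are that cell scores never drop below zero and that the traceback starts at the highest-scoring cell.

```python
def smith_waterman(a: str, b: str, match: int = 2,
                   mismatch: int = -1, gap: int = -1):
    """Return (best_score, local_a, local_b) for the best local alignment."""
    n, m = len(a), len(b)
    h = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            h[i][j] = max(
                0,                             # start a new local alignment
                h[i - 1][j - 1] + pair(i, j),  # align two residues
                h[i - 1][j] + gap,             # gap in b
                h[i][j - 1] + gap,             # gap in a
            )
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and h[i][j] > 0:
        if h[i][j] == h[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTTACGTAAA", "GGACGTCC"))  # (8, 'ACGT', 'ACGT')
```

Only the shared ACGT region is reported; the unrelated flanking residues of both sequences are left out of the alignment.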
BLAST was developed by Altschul et al. It is a powerful tool for identifying
similarities between two sequences. BLAST finds regions of high local similarity in
alignments without gaps, and longer alignments with gaps. It focuses on ungapped
alignments of a certain fixed length. Rather than requiring exact matches, BLAST uses a
scoring function to measure similarity rather than distance. For a given threshold
parameter, BLAST reports to the user all database entries that have a segment pair with
the query sequence scoring higher than the threshold. These pairs of segments are called
high-scoring segment pairs (HSPs).
ii. FASTA
FASTA was developed by Lipman and Pearson. FASTA looks for exact matches
between short substrings of a given length. If a significant number of such exact
matches are found, FASTA uses the dynamic programming algorithm to compute an
optimal alignment. This makes the program faster but loses precision.
FASTA provides a rapid way to find short stretches of similar sequence between a
new sequence and any sequence in a database. These short stretches are called
k-tuples, and their length is set by the k-tup parameter. The default value is 2 for
protein sequences and 6 for DNA sequences.
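The k-tuple lookup step can be sketched as follows. This is a simplified illustration, not the FASTA program itself: it indexes every k-tuple of one sequence, finds the exact matches in the other, and counts how many matches fall on each diagonal (query position minus subject position). A diagonal with many matches marks a region worth aligning in detail. The function name is hypothetical.

```python
from collections import Counter, defaultdict


def ktup_diagonals(query: str, subject: str, k: int = 2) -> Counter:
    """Count exact k-tuple matches per diagonal (query index - subject index)."""
    # Index the start positions of every k-tuple in the subject sequence.
    index = defaultdict(list)
    for j in range(len(subject) - k + 1):
        index[subject[j:j + k]].append(j)

    # Look up each k-tuple of the query and record the diagonal of each hit.
    diagonals = Counter()
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            diagonals[i - j] += 1
    return diagonals


print(ktup_diagonals("ACDEF", "XXACDEF", k=2))  # Counter({-2: 4})
```

Here all four matching 2-tuples lie on diagonal -2, meaning the query matches the subject shifted by two positions.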
Multiple Sequence Alignment (MSA) is generally the alignment of three or more biological
sequences (protein or nucleic acid) of similar length. From the output, homology can be
inferred and the evolutionary relationships between the sequences studied.
Clustal omega: New MSA tool that uses seeded guide trees and HMM profile-profile
techniques to generate alignments. Suitable for medium-large alignments.
Clustal W2: Popular MSA tool that uses tree-based progressive alignments. Suitable
for medium alignments.
DbClustal: Creates a multiple sequence alignment from a protein BLAST result using
the DbClustal program.
Kalign: Very fast MSA tool that concentrates on local regions. Suitable for large
alignments.
MAFFT: MSA tool that uses Fast Fourier Transforms. Suitable for medium-large
alignments.
MUSCLE: Accurate MSA tool, especially good with protein. Suitable for medium
alignments.
MView: Transforms a sequence similarity search result into a multiple sequence
alignment, or reformats a multiple sequence alignment, using the MView program.
T-Coffee: Consistency-based MSA tool that attempts to mitigate the pitfalls of
progressive alignment methods. Suitable for small alignments.
Web PRANK: The EBI has a new phylogeny-aware multiple sequence alignment
program which makes use of evolutionary information to help place insertions and
deletions.
APPLICATIONS:
1. Structure Prediction: A multiple sequence alignment can give you a more reliable
protein or RNA secondary structure prediction; sometimes it even helps with the 3D structure.
2. Protein Family: A multiple sequence alignment can help you decide whether or not your
protein is a member of a known protein family.
3. Pattern identification: By looking at conserved regions or sites, you can identify
which region is responsible for a functional site.
4. Domain identification: By looking at the profile provided by a multiple sequence
alignment, you can extract profiles to use against databases.
5. DNA Regulatory Elements: You can use multiple sequence alignments to locate
DNA regulatory elements such as binding sites.
Multiple sequence alignments play a major role in bioinformatics and can be used almost
anywhere, but no method is perfect or 100% accurate, so you have to choose your
sequences very carefully to avoid meaningless results.
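Pattern identification from an alignment can be sketched very simply: once sequences are aligned, fully conserved columns can be picked out by checking that every sequence has the same residue there. The three short aligned sequences in the usage line are made up for illustration, and the function name is hypothetical; real tools also report partially conserved columns.

```python
def conserved_columns(alignment: list[str]) -> list[int]:
    """Return 0-based indices of columns where all sequences share one residue."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [
        col
        for col in range(length)
        # One distinct character in the column, and it is not a gap.
        if len({row[col] for row in alignment}) == 1 and alignment[0][col] != "-"
    ]


print(conserved_columns(["ACD-F", "ACE-F", "ACDGF"]))  # [0, 1, 4]
```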
PROCEDURE:
(ii) NP_782596.1
(iii) NP_390425.1
(iv) NP_394546.1
(v) NP_782598.1
(vi) NP_782597.1
(vii) NP_390426.1
(viii) NP_394545.1
3. From the results displayed, select FASTA to get each sequence in FASTA format and
copy the FASTA sequences into Notepad.
4. ClustalW2 was opened through Google (http://www.ebi.ac.uk/Tools/msa/clustalw2/).
5. The homepage of ClustalW2 was displayed. The previously copied protein
sequences were pasted into the query box.
Note: each sequence entered in the query box should end with *, and each new sequence
should start on a new line.
6. Click on the submit button. The results of the multiple sequence alignment were
displayed in the form of a cladogram.
REPORT: Multiple sequence alignment of the protein sequences was performed using
Clustal Omega.
39 DECCAN SCHOOL OF PHARMACY
PHARMACOINFORMATICS PRACTICAL RECORD 2018-2019
EXP NO: 06
DATE:
Example- w is 3 for amino acid sequence and 11 for the nucleotide sequence.
BLAST compares the word list to the database and identifies the exact matches. If similar
words are found, BLAST tries to extend the alignment to the adjacent words without
allowing gaps. After all words are tested, a set of maximal segment pairs is chosen for
each database sequence.
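The seed-and-extend idea can be sketched as below. This is a simplified illustration rather than the BLAST implementation: it uses exact word matches only (BLAST also considers similar "neighbourhood" words scored with a substitution matrix), a plain match/mismatch score, and a hypothetical drop-off value at which extension stops. The function names and the test strings are assumptions.

```python
def seed_hits(query: str, subject: str, w: int = 3) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) for every exact w-letter word match."""
    positions: dict[str, list[int]] = {}
    for j in range(len(subject) - w + 1):
        positions.setdefault(subject[j:j + w], []).append(j)
    return [
        (i, j)
        for i in range(len(query) - w + 1)
        for j in positions.get(query[i:i + w], [])
    ]


def extend_ungapped(query: str, subject: str, i: int, j: int, w: int = 3,
                    match: int = 1, mismatch: int = -1, drop: int = 2):
    """Extend a seed in both directions without gaps.

    Extension in a direction stops once the running score falls `drop`
    below the best score seen so far; the best-scoring extent is kept.
    """
    best = w * match
    kept = {}
    for name, step in (("right", 1), ("left", -1)):
        score, taken, kept[name] = best, 0, 0
        qi = i + w if step == 1 else i - 1
        sj = j + w if step == 1 else j - 1
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            score += match if query[qi] == subject[sj] else mismatch
            taken += 1
            if score > best:
                best, kept[name] = score, taken
            elif best - score >= drop:
                break
            qi += step
            sj += step
    start_q, start_s = i - kept["left"], j - kept["left"]
    end_q, end_s = i + w + kept["right"], j + w + kept["right"]
    return best, query[start_q:end_q], subject[start_s:end_s]


hits = seed_hits("XXACGTAYY", "ZZZACGTAWW")
print(hits)  # [(2, 3), (3, 4), (4, 5)]
print(extend_ungapped("XXACGTAYY", "ZZZACGTAWW", *hits[0]))  # (5, 'ACGTA', 'ACGTA')
```

The segment returned by the extension step corresponds to one segment pair; the highest-scoring ones are what the text calls maximal segment pairs.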
BLAST output
BLAST output includes the graphical overview box, a matching list and a text
description of the alignment.
The graphical overview box contains coloured horizontal bars that allow quick
identification of the number of database hits and the degree of similarity of the hits.
The colour coding of the horizontal bars corresponds to ranking of similarities of the
sequence hits.
Red colour – high similarity
Green and blue – moderately related
Black – less similar
The length of the bars represents the spans of sequence alignments relative to the query
sequence. Each bar is hyperlinked to the actual pair wise alignment in the text portion of
the report.
Below the graphical box is a list of matching hits ranked by E-values in ascending order.
Each hit includes the accession number, title of the database record, bit score and E-
value. This list is followed by the text description which may be divided into 3 sections.
The first section is the header. This contains the gene index number or the reference
number of the database hit and a one-line description of the database sequence.
The second section is statistics. This includes the bit score, E-value, and percentages
of identity, similarity and gaps.
The third section is the alignment. In this section, the query sequence is on the top of
the pair and the database sequence, labelled as subject, is at the bottom of the pair.
In between the two sequences matching identical residues are written out at their
corresponding position whereas non-identical but similar residues are labelled with plus
mark (+).
Any residues identified as low complexity regions (LCRs) in the query sequence are
masked with Xs or Ns so that no alignment is represented in those regions.
E= m*n*p
The E-value is related to the p-value, which is used to assess the significance of a single
pairwise alignment. The E-value provides information about the likelihood that a given
sequence match is purely due to chance.
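The formula above can be evaluated directly. The record does not define its symbols, so the sketch below assumes the usual reading: m is the total number of residues in the database, n is the number of residues in the query, and p is the probability that a single alignment arises by chance. The conversion from E-value to p-value, P = 1 - e^(-E), is the standard relation and is likewise not stated in the record. All numbers in the usage line are arbitrary.

```python
import math


def e_value(m: int, n: int, p: float) -> float:
    """E = m * n * p (symbol meanings are assumed, see the text above)."""
    return m * n * p


def p_from_e(e: float) -> float:
    """Probability of at least one chance hit with expectation e: 1 - exp(-e)."""
    return -math.expm1(-e)  # numerically stable form of 1 - exp(-e)


e = e_value(m=1000, n=100, p=1e-7)
print(e, p_from_e(e))
```

For small E-values the two quantities are almost equal, which is why they are often used interchangeably when judging whether a hit is significant.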
SEQUENCE ALIGNMENT
TBLASTX    Translated nucleic acids    Translated nucleic acids    Each frame gapped
DATE:
PROCEDURE:
1. The Ribokinase protein sequence of Mycobacterium tuberculosis was selected from the
database.
2. Log in to http://blast.ncbi.nlm.nih.gov/Blast.cgi
3. Select the BLASTP as the input sequence is protein.
4. Paste the protein sequence in FASTA format in the input box.
5. Leave the parameters at their defaults, except set the Database option to PDB, and click the search button.
6. As soon as the search process completes the results are displayed.
7. BLAST output includes the graphical overview box, a matching list and a text
description of the alignment.
DATE:
PROCEDURE:
EXP NO: 07
DATE:
INTRODUCTION:
FASTA
IMPLEMENTATIONS OF FASTA
FASTA FORMAT
FASTA is one of the simplest and most popular sequence formats, since it contains
plain sequence information that is easily readable by analysis programs.
It has a single definition line that begins with a right angle bracket (>) followed by a
sequence name.
The plain sequence in standard one-letter symbols starts on the second line. Each
line of the sequence data is limited to 60-80 characters in width.
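The format described above is simple enough to read with a few lines of Python. The sketch below is a minimal reader, not a replacement for an established parser: it treats each line starting with > as a definition line and joins the following lines into one sequence. The function name and the two records in the usage example are made up.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each definition line (without the leading '>') to its sequence."""
    records: dict[str, list[str]] = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            current = line[1:].strip()
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


example = """>seq1 hypothetical protein
MKTAYIAK
QRQISFVK
>seq2
MSDNE
"""
print(parse_fasta(example))
```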
FASTA also uses E-values and bit scores, and the estimation of these parameters in FASTA
is essentially the same as in BLAST. However, FASTA output provides one more statistical
parameter, called the Z-score. This describes the number of standard deviations from the
mean score for the database search.
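The Z-score definition given above can be sketched as a plain standardisation: the hit score minus the mean of the database scores, divided by their standard deviation. The FASTA program's own calculation is more involved (for example, it takes sequence length into account), so this is only an illustration of the definition; the function name and the scores in the usage line are arbitrary.

```python
from statistics import mean, pstdev


def z_score(score: float, database_scores: list[float]) -> float:
    """Number of standard deviations by which `score` exceeds the mean score."""
    spread = pstdev(database_scores)
    if spread == 0:
        raise ValueError("database scores have zero spread")
    return (score - mean(database_scores)) / spread


print(z_score(10, [2, 4, 4, 4, 5, 5, 7, 9]))  # 2.5
```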
DATE:
PROCEDURE:
RESULT: The similarity search for Ribokinase was performed using FASTA.