Biology 171L - General Biology Lab I Lab 12: Introduction To Bioinformatics
2. Use an Internet browser to navigate to the following URL:

NCBI BLASTn

1. Use your Internet browser to go to the NCBI BLAST (Basic Local
   Alignment Search Tool) page at http://www.ncbi.nlm.nih.gov/ and
   select the BLAST link under “Popular Resources.”

2. Under the “Basic BLAST” header choose the “nucleotide blast” link.

6. Then click the “BLAST” button.

   Your request will be placed in the BLAST queue and given a request
   ID that can be used later to retrieve the data. Wait until the
   results page appears.
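
For readers curious how the request ID works outside the browser, the following is a minimal sketch, assuming NCBI's public BLAST URL API at https://blast.ncbi.nlm.nih.gov/Blast.cgi with its documented `CMD=Put` and `CMD=Get` parameters. The helper names are hypothetical, the sketch only builds URLs and parses a reply (it sends nothing over the network), and it is not part of the lab procedure.

```python
import re
from urllib.parse import urlencode

# Assumed endpoint of the NCBI BLAST URL API (not named in this handout).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_submit_url(query, program="blastn", database="nt"):
    """Build the URL that would place a search in the BLAST queue."""
    params = {"CMD": "Put", "PROGRAM": program,
              "DATABASE": database, "QUERY": query}
    return f"{BLAST_URL}?{urlencode(params)}"


def parse_request_id(response_text):
    """Pull the request ID out of a reply containing a 'RID = ...' line."""
    match = re.search(r"RID = (\S+)", response_text)
    return match.group(1) if match else None


def build_retrieve_url(request_id, format_type="Text"):
    """Build the URL that would fetch results for a queued request."""
    params = {"CMD": "Get", "RID": request_id, "FORMAT_TYPE": format_type}
    return f"{BLAST_URL}?{urlencode(params)}"


# Made-up reply fragment and request ID, for illustration only.
reply = "    RID = ABC123XYZ\n    RTOE = 20\n"
rid = parse_request_id(reply)
print(build_retrieve_url(rid))
```

The request ID plays the same role here as on the results page: it is the handle used to come back for the data once the queued search has finished.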
7. Click on the “Citation” link, which provides the specific
   bibliographic details for the paper in which the BLAST program is
   published. Copy and paste this citation into your
   Bioinfo_Assignment Word document under the heading: References.
   The reference section should later be moved to the end of the
   assignment.

8. Click on the “Search Summary” link to reveal a report on the
   search itself. Refer to this summary report to answer some of the
   questions below.

9. Throughout this lab you will be presented with sets of questions
   (italicized). Answer these questions in your Bioinfo_Assignment
   Word document under the designated headings.

Blastn Questions:

a. What databases were searched for your query? (See “Description”
   under “Database Name”)
b. What databases were not searched?
c. What was the query length of the DNA sequence searched for?
d. How many DNA bases, in the NCBI database, were compared for the
   query?
e. How many sequences were examined from the database?
f. How many BLAST hits were obtained in the search?
g. What is a BLAST hit?

each pair in the matrix. A higher positive score indicates a more
significant alignment. The entries are sorted in decreasing order of
statistical significance.

The E-value (expectation value) is the number of hits with at least
the observed degree of similarity (score) that would be expected to
arise by chance in a database of this size. For very small values it
is approximately the probability of such a chance match. An E-value
reported as zero indicates the sequences are identical or nearly
identical. For the purposes of this lab, an E-value greater than 0
but less than or equal to 0.05 will be considered a significant hit.
Depending on the sequence, researchers will consider E-values up to
0.1 as significant.

11. Within the identical hits (E-value equal to zero) is your query
    sequence. Referring to Requirement 1 in your Word document,
    identify and click on the gb number (Accession column) in the
    BLAST window for the query sequence. By clicking on this gb
    number, you will open the Genbank file for the sequence.
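
The lab's E-value cutoffs can be written out as a short sketch. The function names are hypothetical. `expected_hits` uses the standard relation E = m * n * 2^(-S') between a bit score S' and the sizes of the query (m) and database (n); it assumes raw lengths, whereas BLAST itself applies adjusted "effective" lengths, so the values will not match a real report exactly.

```python
def expected_hits(bit_score, query_length, database_length):
    """Approximate E-value: chance hits expected at this bit score or better.

    Assumption: raw lengths are used; BLAST uses adjusted effective lengths.
    """
    return query_length * database_length * 2.0 ** (-bit_score)


def classify_hit(e_value, cutoff=0.05):
    """Label a hit using this lab's rules (identical / significant / other)."""
    if e_value < 0:
        raise ValueError("E-values cannot be negative")
    if e_value == 0:
        return "identical"
    if e_value <= cutoff:
        return "significant"
    return "not significant"


print(classify_hit(0.0))    # identical
print(classify_hit(0.003))  # significant
print(classify_hit(0.2))    # not significant
```

Passing `cutoff=0.1` reproduces the looser threshold that some researchers accept.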
PubMed is the bibliographic division of ENTREZ and contains files of
abstracts of scientific articles. PubMed can be accessed on the home
page of the NCBI site or at http://www.ncbi.nlm.nih.gov/PubMed/.

13. Copy and paste the complete reference information, including
    the names of the authors, title of the submission and journal,
    into the Reference section of your Bioinfo_Assignment Word
    document.

Referencing guidelines differ depending on the publishing journal.
You can find the accepted format for any journal by going to the
journal website and clicking on Instructions for Authors. For our
purposes use the format below:

Redwood, R.G., and Jain, A.K. 1992. Code provisions for seismic
    design for concentrically braced steel frames. Can. J. Civ. Eng.
    19: 1025–1031.

ORF Finder: Open Reading Frame Search

Regions of DNA that encode proteins are first transcribed into mRNA
and then translated into protein. In translation of mRNA into
protein, each codon of three nucleotides is translated into one
amino acid, sequentially from the start codon until a stop signal is
reached. In most species, the open reading frame starts with an ATG,
which codes for the amino acid methionine (Met), and ends with a
stop codon (TAA, TAG, or TGA).

To determine the amino acid sequence of the protein directly from
the DNA sequence, the nucleotide that starts the translation and the
stop signal must be determined. This sequence is the open reading
frame (ORF). There are six possible reading frames for each gene:
three reading frames in the forward direction (5’ – 3’) and three
reading frames on the complementary strand, which runs in the
reverse direction (3’ – 5’) relative to the given sequence. The
reading frame that results in the longest protein is usually the ORF
that encodes the expressed protein.
Examples of ORFs for a DNA sequence
are presented below. Frame 1 results in the
longest protein and therefore is the ORF.
There are three reading frames in the 5’ – 3’ direction for this sequence:
ATGCCCAAGCTGAATAGCGTAGAGGGGTTTTCATCATTTGAGGACGATGTATAA

Frame 1:
ATG CCC AAG CTG AAT AGC GTA GAG GGG TTT TCA TCA TTT GAG GAC GAT GTA TAA
 M   P   K   L   N   S   V   E   G   F   S   S   F   E   D   D   V  STOP

Frame 2:
TGC CCA AGC TGA ATA GCG TAG AGG GGT TTT CAT CAT TTG AGG ACG ATG TAT
 C   P   S  STOP I   A  STOP R   G   F   H   H   L   R   T   M   Y
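
The frame-by-frame translation above can be reproduced with a short sketch. The function names are hypothetical, and the approach (standard genetic code, six frames, longest ATG-to-stop stretch) is an illustration of the idea rather than how NCBI's ORF Finder is implemented. It assumes an uppercase sequence containing only A, C, G and T.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; "*" is a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, offset=0):
    """Translate complete codons starting at offset 0, 1 or 2."""
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(offset, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Map frame labels (+1..+3 forward, -1..-3 reverse) to translations."""
    frames = {}
    for sign, strand in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{sign}{offset + 1}"] = translate(strand, offset)
    return frames


def longest_orf(dna):
    """Return (frame, protein) for the longest ATG-to-stop ORF, or None."""
    best = None
    for frame, protein in six_frames(dna).items():
        # Every piece except the last one is terminated by a stop codon.
        for piece in protein.split("*")[:-1]:
            start = piece.find("M")
            if start != -1 and (best is None
                                or len(piece) - start > len(best[1])):
                best = (frame, piece[start:])
    return best


SEQUENCE = "ATGCCCAAGCTGAATAGCGTAGAGGGGTTTTCATCATTTGAGGACGATGTATAA"
frames = six_frames(SEQUENCE)
print(frames["+1"])           # MPKLNSVEGFSSFEDDV*
print(frames["+2"])           # CPS*IA*RGFHHLRTMY
print(longest_orf(SEQUENCE))  # ('+1', 'MPKLNSVEGFSSFEDDV')
```

For the example sequence, frame +1 gives a 17-residue protein, so the ORF spans 54 nucleotides once the stop codon is counted.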
a. What does the acronym ORF stand for?
b. What are you asking the ORF Finder program to do?
c. How many ORFs were predicted for the given sequence?
d. What frame produces the longest ORF?
e. How many nucleotides long is the longest ORF?

BLASTp

4. Under the “Program Selection” header, select the “blastp
   (protein-protein BLAST)” algorithm.

5. Click the “BLAST” button.

6. After the blastp results window opens, scroll down to view
   matched amino acid sequences.
5. Copy and paste this edited sequence into
your Bioinfo_Assignment Word document
under Requirement 6.
7. Requirement 5. Using Grab, select, copy and paste all identical
   (E-value equal to zero) and significant hits (0 < E-value ≤ 0.05)
   into your Bioinfo_Assignment Word document.

Blastp Questions:

9. Set the display for “Genbank.”

10. Requirement 7. Copy (from “LOCUS” to “FEATURES” only) and paste
    the information presented into your Bioinfo_Assignment Word
    document. If necessary, adjust the font of the pasted text as
    done previously.
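
The portion of a GenBank record requested in Requirement 7 can also be picked out programmatically. The sketch below is an illustration only: the function name is hypothetical, the sample record is made up and is not a real GenBank entry, and it assumes the selection runs from the LOCUS line up to, but not including, the FEATURES line.

```python
def genbank_header(record_text):
    """Return the lines from LOCUS up to (not including) FEATURES."""
    header = []
    in_header = False
    for line in record_text.splitlines():
        if line.startswith("LOCUS"):
            in_header = True
        elif line.startswith("FEATURES"):
            break
        if in_header:
            header.append(line)
    return "\n".join(header)


# Made-up placeholder record used only to exercise the function.
SAMPLE_RECORD = """\
LOCUS       EXAMPLE01                 54 bp    DNA     linear   UNA 01-JAN-2000
DEFINITION  Made-up example record, not a real GenBank entry.
ACCESSION   EXAMPLE01
FEATURES             Location/Qualifiers
     source          1..54
ORIGIN
        1 atgcccaagc tgaatagcgt agaggggttt tcatcatttg aggacgatgt ataa
//
"""

print(genbank_header(SAMPLE_RECORD))
```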
http://www.wcc.hawaii.edu/paces/summerfiles/DNAsequences/