
Bioinformatics combines elements of biology and computer science. Simply put, it is the science of
storing, retrieving and analysing large amounts of biological information. It is a broad
interdisciplinary field involving many different types of specialists, including biologists, molecular life
scientists, computer scientists and mathematicians. It also encompasses the modelling and image analysis
methods used to compare linear sequences or three-dimensional structures. Bioinformatics is a
rapidly developing branch of biology with many practical applications in different areas of biology and
medicine, and as technology improves, so do the possibilities. For an aspiring biologist, becoming familiar
with bioinformatics techniques and sequence analysis tools is therefore important. The aim of this
exercise is to become acquainted with sequence analysis tools that can be accessed through the Internet,
specifically by working with the NCBI database.

Procedures
A. Data Retrieval using NCBI

Visit NCBI

http://www.ncbi.nlm.nih.gov/Entrez/.

Select Nucleotide search option

Enter accession number in search box and click search


Accession numbers: AF108101 and AF108102

In the GenBank display, click FASTA and select the sequence (omit the header line beginning with ">",
because analysis software treats it as a comment). Paste the sequence into MS Word, save it, and change
the font to Courier, 10 point, to obtain the proper spacing and line breaks.
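
The same retrieval can be scripted instead of clicked through. The sketch below assumes NCBI's E-utilities efetch endpoint and its db, id, rettype and retmode parameters; the function names are illustrative and not part of this exercise. Building the URL needs no network access, while fetch_fasta does.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: NCBI E-utilities "efetch" service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession, db="nuccore"):
    """Build a URL that asks for one record as plain-text FASTA."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def fetch_fasta(accession, db="nuccore"):
    """Download the FASTA text for one accession (requires network access)."""
    with urlopen(efetch_fasta_url(accession, db)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only print the URLs here; call fetch_fasta(acc) to download the records.
    for acc in ("AF108101", "AF108102"):
        print(efetch_fasta_url(acc))
```

Passing db="protein" together with a protein accession should return the protein record in the same way.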
B. Obtaining a protein FASTA sequence.

Obtain the protein FASTA output by returning to the GenBank display. Scroll down to the CDS feature, click the
/protein_id= link, then click FASTA (omit the header line beginning with ">", because analysis software treats
it as a comment). Paste the sequence into MS Word, save it, and change the font to Courier, 10 point, to obtain
the proper spacing and line breaks.
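
Separating the ">" header from the sequence, and laying the sequence out in even lines, can also be done in a few lines of code. This is a minimal sketch for a single-record FASTA text; the record in the example is made up for illustration and the 60-character line width is an assumption.

```python
def split_fasta(text):
    """Return (header, sequence) for a single-record FASTA string."""
    header = ""
    parts = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # keep the description, drop the ">"
        else:
            parts.append(line)
    return header, "".join(parts).upper()


def wrap_sequence(sequence, width=60):
    """Break a sequence into fixed-width lines for even spacing."""
    return "\n".join(sequence[i:i + width] for i in range(0, len(sequence), width))


if __name__ == "__main__":
    # Hypothetical record, not real database content.
    record = ">example_record hypothetical description\natggcc\nTAA\n"
    header, sequence = split_fasta(record)
    print(header)
    print(wrap_sequence(sequence, width=4))
```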

C. Translating nucleotide into protein: ExPASy Tool.

Select the nucleotide sequence, then copy and paste it into the Translate sequence window at the ExPASy link
(https://web.expasy.org/tools/dna.html). Under Output format select "Compact", which gives the amino acid
sequence as one-letter codes with stop codons indicated by a hyphen. (The "Verbose" output shows
start codons (ATG) in bold as Met and writes stop codons out in full, so it is an easy way to scan the outputs,
but it cannot be used in a BLAST search.) Then click Translate.

Often only one reading frame gives a translation with no stop codons. This is not always the case,
however; when several frames are open, use the BLAST program to identify the
true frame and to determine whether the sequence corresponds to any known protein sequence.
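
To see what a six-frame translation does, the following sketch translates both strands in all three offsets using the standard genetic code and marks stop codons with a hyphen, as in the "Compact" output described above. It is an illustration only, not the ExPASy implementation; the frame labels and the "X" used for unrecognised codons are choices made here.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, stop="-"):
    """Translate from the first base; stops become `stop`, unknown codons 'X'."""
    dna = dna.upper()
    residues = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        residues.append(stop if aa == "*" else aa)
    return "".join(residues)


def six_frames(dna):
    """Translate three forward and three reverse-complement frames."""
    frames = {}
    for label, strand in (("5'3'", dna), ("3'5'", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{label} Frame {offset + 1}"] = translate(strand[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frames("ATGGCCTAA").items():
        print(name, protein)
```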

Using the "Compact" output, copy the one-letter sequence of the best reading frame
(i.e. the one with no stop codons) and paste it into the window below.

Copy the longest amino acid sequence (i.e. the longest stretch with no hyphens) of one of the other
reading frames to the window below. If you have two reading frames without a stop codon, simply copy
each to the boxes below.

Copy and save each sequence to MS Word for future use.
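
Picking the longest hyphen-free stretch from each frame by eye is error-prone for long sequences. A minimal sketch of the same selection is given here; it assumes each frame is a one-letter string with hyphens at stop codons, and the frame names and sequences in the example are invented for illustration.

```python
def longest_open_stretch(protein, stop="-"):
    """Return the longest run of residues that contains no stop symbol."""
    return max(protein.split(stop), key=len)


def rank_frames(frames, stop="-"):
    """List (frame name, longest open stretch) pairs, longest stretch first."""
    best = {name: longest_open_stretch(seq, stop) for name, seq in frames.items()}
    return sorted(best.items(), key=lambda item: len(item[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical translations, not taken from AF108101 or AF108102.
    example = {
        "Frame 1": "MKTAYIAKQR",
        "Frame 2": "WP-LS-K",
        "Frame 3": "GL-AAAAQ-",
    }
    for name, stretch in rank_frames(example):
        print(name, len(stretch), stretch)
```

The first entry in the ranked list is the candidate to submit to BLAST first; ties and near-ties still need to be checked against the database as described above.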


D. Identifying sequence.

Copy and paste the longest translated sequence into the first box below and into the second
box (or call up the translated sequences previously submitted)

Open BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi). On this page select Protein-Protein BLAST (BLASTP), then
copy the sequence from the first box above and paste it into the Search box on the protein-protein BLAST page.

Scroll down to the Format section. Use the pull-down menus to change Descriptions to 10 and Alignments to 10, and
change the Layout to One Window. (The settings in the Options section stay at their default values and will be
addressed in the next, more advanced exercise.) Click the BLAST button at the bottom or top of the screen. A new
window will appear that gives an estimate of how long the search will take and lists the conserved domains in your
query sequence. You may want to copy your request ID number, although this is not necessary. After the indicated
time has passed, press the Format button to see the results.

When a similar known protein is found, a color window appears showing the degree and range of similarity. Perfect
matches show up as red, the next best as purple, mediocre as green, poor matches as blue, and very poor or no match
as black. The 10 best alignments will appear (make sure you have limited this to 10!). If the DNA sequence has
already been identified, it should show up as a perfect match: the score is generally between 200 and 400, but could
be lower depending on the size of the peptide analyzed, and the E value (the number of unrelated database sequences
expected to reach that score by chance) will be down around 10^-50 to 10^-100. See info. ().
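
The link between score and E value can be made concrete with the usual Karlin-Altschul relation for bit scores, E = m x n x 2^(-S), where m is the query length and n the total database length. The sketch below uses assumed round numbers (a 300-residue query and a database of 10^8 residues); BLAST itself applies effective lengths, so real values will differ somewhat.

```python
def expected_hits(bit_score, query_length, database_length):
    """Approximate E value from a bit score: E = m * n * 2**(-S)."""
    return query_length * database_length * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Assumed sizes for illustration only.
    query_length = 300
    database_length = 1e8
    for score in (50, 100, 200, 400):
        e_value = expected_hits(score, query_length, database_length)
        print(f"bit score {score}: E is about {e_value:.2e}")
```

Each additional bit of score halves the E value, which is why scores in the hundreds correspond to the very small E values quoted above.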
Copy the line below the color alignment window that shows the sequence producing the best alignment. This line
provides the identifiers (gi number and other identifying numbers). Remember, you will need to download the full
protein from the database for characterization in Project 5. Save this information!
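
The submit, wait and retrieve cycle used in this section (request ID, estimated time, formatted results) can also be scripted. The sketch below only prepares the requests and reads the request ID; it assumes the parameter names of NCBI's BLAST URL API (CMD, PROGRAM, DATABASE, QUERY, RID, FORMAT_TYPE, DESCRIPTIONS, ALIGNMENTS) and the "RID = ..." / "RTOE = ..." lines in the submission reply, all of which should be checked against the current NCBI documentation. The request ID and reply text in the example are invented.

```python
import re
from urllib.parse import urlencode

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def put_params(protein, database="nr"):
    """Parameters for submitting a protein-protein (blastp) search."""
    return {"CMD": "Put", "PROGRAM": "blastp", "DATABASE": database, "QUERY": protein}


def get_params(rid, hits=10):
    """Parameters for retrieving results, limited to `hits` entries."""
    return {"CMD": "Get", "RID": rid, "FORMAT_TYPE": "Text",
            "DESCRIPTIONS": hits, "ALIGNMENTS": hits}


def parse_submission(page):
    """Extract the request ID and estimated wait (seconds) from the reply."""
    rid = re.search(r"RID = (\S+)", page)
    rtoe = re.search(r"RTOE = (\d+)", page)
    if not (rid and rtoe):
        raise ValueError("no RID/RTOE found in the reply")
    return rid.group(1), int(rtoe.group(1))


if __name__ == "__main__":
    # Hypothetical reply fragment; a real one comes from posting put_params().
    reply = "QBlastInfoBegin\n    RID = EXAMPLE123\n    RTOE = 25\nQBlastInfoEnd"
    rid, wait_seconds = parse_submission(reply)
    print("wait about", wait_seconds, "seconds, then request:")
    print(BLAST_URL + "?" + urlencode(get_params(rid)))
```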
