CADD EXPERIMENT 1 Manual

EXPERIMENT 1.

AIM: RETRIEVE A PROTEIN SEQUENCE IN FASTA FORMAT

THEORY:

In bioinformatics and biochemistry, the FASTA format is a text-based format for representing
either nucleotide sequences or amino acid sequences, in which nucleotides or amino acids are
represented using single-letter codes. The format also allows for sequence names and comments
to precede the sequences.

A sequence in FASTA format begins with a single-line description, followed by lines of
sequence data. The definition line (defline) is distinguished from the sequence data by a
greater-than (>) symbol at the beginning. The word following the ">" symbol is the identifier
of the sequence, and the rest of the line is an optional description. Normally, identifiers are
simply protein accessions, names or Entrez GI numbers (e.g., Q5I7T1, AG10B_HUMAN, 129295), but
a bar-separated NCBI sequence identifier (e.g., gi|129295) is also accepted. Any arbitrary
user-specified sequence identifier can also be used (e.g., CLONE00073452), but in that case you
are advised to use sufficiently long, unique words. There should be no space between the ">"
and the first letter of the identifier. It is recommended that all lines of text be shorter
than 80 characters.
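
The defline rules above can be illustrated with a minimal parsing sketch. It assumes the simple layout described here (a ">" line followed by sequence lines); the function names `parse_fasta` and `_make_record` and the example record are made up for illustration and are not part of any NCBI tool.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (identifier, description, sequence)."""
    records = []
    header = None   # defline of the record being read, without the ">"
    chunks = []     # sequence lines collected for that record
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new defline closes the previous record, if any.
            if header is not None:
                records.append(_make_record(header, chunks))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append(_make_record(header, chunks))
    return records


def _make_record(header, chunks):
    # The first word is the identifier; the rest is the optional description.
    parts = header.split(None, 1)
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return identifier, description, "".join(chunks)


# Made-up record, used only to show the layout.
example = """>CLONE00073452 made-up example protein
MKTAYIAKQR
QISFVKSHFS
"""
records = parse_fasta(example)
```

Here `records` holds one tuple whose identifier is the first word after ">" and whose sequence is the two data lines joined together.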

PROCEDURE:

• Go to the NCBI website. The resource list is on the left-hand side; click the resource
type that matches your query.
• Select "Protein" from the drop-down menu and enter the name of the protein, since the
aim is to retrieve its sequence.
• Running the search returns a large number of results.
• Filters can be applied as needed.
• Each protein record has an accession number, which uniquely identifies that
protein in the NCBI database.
• Links to different display formats are available below the accession number, the most
popular being FASTA.
• Click on FASTA to open the sequence page, and note down the accession number.
• The FASTA sequence starts with the ">" symbol. To save it, open the "Send to"
drop-down menu and download it.
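
Once an accession number is known, the download step can also be done without the browser. The sketch below is one possible approach, not part of the procedure in this manual: it assumes the NCBI E-utilities `efetch` endpoint with the `protein` database and uses only the Python standard library. The helper names `build_efetch_url` and `fetch_fasta` are hypothetical, and the accession shown is simply the example identifier from the theory section.

```python
import urllib.parse
import urllib.request

# NCBI E-utilities endpoint for fetching records.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession):
    """Build an efetch URL that asks for a protein record as FASTA text."""
    params = {
        "db": "protein",      # search the Protein database
        "id": accession,      # accession number noted during the search
        "rettype": "fasta",   # return the record in FASTA format
        "retmode": "text",    # as plain text
    }
    return EFETCH_BASE + "?" + urllib.parse.urlencode(params)


def fetch_fasta(accession, timeout=30):
    """Download the FASTA text for one accession (needs network access)."""
    with urllib.request.urlopen(build_efetch_url(accession), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building the URL needs no network; Q5I7T1 is the example from the theory text.
url = build_efetch_url("Q5I7T1")
# To download and save the record:
#     with open("Q5I7T1.fasta", "w") as handle:
#         handle.write(fetch_fasta("Q5I7T1"))
```

The URL construction is kept separate from the download so that the request can be inspected, or pasted into a browser, before any data is fetched.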

ASSIGNMENT:
1. Retrieve the sequences for Human Lysozyme, Human Serum Albumin and
Haemoglobin (HBA1) proteins.
2. Find the gene location for each of these proteins.
3. Find the number of disulphide bonds in each protein.
4. Find the number of fluorescent amino acids in each protein.
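
For the fourth task, the count can be checked against a retrieved sequence with a short sketch. It assumes that "fluorescent amino acids" means the three intrinsically fluorescent aromatic residues: tryptophan (W), tyrosine (Y) and phenylalanine (F). The function name `count_fluorescent` and the toy sequence are made up for illustration; paste in the real sequence from the FASTA file, without the defline.

```python
# One-letter codes of the residues assumed to be fluorescent:
# W = tryptophan, Y = tyrosine, F = phenylalanine.
FLUORESCENT_RESIDUES = "WYF"


def count_fluorescent(sequence):
    """Count W, Y and F residues in a one-letter protein sequence."""
    # Remove whitespace and line breaks, and ignore letter case.
    cleaned = "".join(sequence.split()).upper()
    counts = {residue: cleaned.count(residue) for residue in FLUORESCENT_RESIDUES}
    counts["total"] = sum(counts.values())
    return counts


# Toy sequence, not a real protein.
toy_counts = count_fluorescent("MKWFYAWL")
```

For the toy sequence, `toy_counts` is `{"W": 2, "Y": 1, "F": 1, "total": 4}`.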
