EST: Expressed Sequence Tags

-Manali Mehendale

The ultimate goal of the genome project is to produce a complete and accurate sequence of the entire genetic material.

What are ESTs?

- Expressed Sequence Tags (ESTs) are short (usually about 300-500 bp), single-pass sequence reads from mRNA (cDNA).
- They are typically produced in large batches.
- They represent a snapshot of the genes expressed in a given tissue and/or at a given developmental stage.
- They are tags (some coding, others not) of expression for a given cDNA library.
- An EST is a small part of the active part of a gene, which can be used to fish the rest of the gene out of the chromosome.

Overview of how ESTs are constructed


1) Cell/Tissue
2) Isolate mRNA; reverse transcription
3) Clone cDNA into a vector to make a library
4) Pick individual clones
5) Sequence 5' and 3' ends of cDNA insert
6) Deposit the EST sequences
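
The end-sequencing step can be illustrated with a short Python sketch. It assumes the cDNA insert is available as a plain string and uses a hypothetical default read length of 400 bases (inside the 300-500 bp range typical of ESTs). The function names are illustrative and do not come from any library.

```python
from typing import Tuple

# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def est_reads(cdna_insert: str, read_length: int = 400) -> Tuple[str, str]:
    """Simulate single-pass reads from both ends of a cDNA insert.

    The 5' read is the start of the insert. The 3' read is sequenced
    from the opposite strand, so it is the reverse complement of the
    last `read_length` bases of the insert.
    """
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    insert = cdna_insert.upper()
    five_prime = insert[:read_length]
    three_prime = reverse_complement(insert[-read_length:])
    return five_prime, three_prime
```

Real reads also carry base-calling errors and vector sequence, which this sketch ignores.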

Debates over ESTs


FOR:

Merits of sequencing cDNA:
- cDNAs represent only 3% of DNA, but carry the vast majority of its information content
- It is not easy to predict gene coding regions and their mRNA from genomic sequence alone

AGAINST:

Difficult to find every mRNA

Relationship between EST sequence and mRNA transcript


[Figure: exons and introns of a gene; splicing; splice variants; 3' EST sequences]

Consequences of Methodology

- Many sequences are derived from the 3' ends of mRNA and thus mostly contain information about 3' untranslated regions.
- Average quality is low; errors are quite common.
- Genes that are highly expressed in tissues from which libraries have been made will be represented in many EST sequences.
- Genes that are expressed only in tissues that were not used for preparing cDNA will not be represented in the EST database.
- A substantial number of sequences are derived from partially spliced RNA species.
- ESTs are only as good as the clones from which they are derived.

Why ESTs?

- ESTs can act as standard markers for the physical mapping of the genome, with the additional advantage of pointing directly to an expressed gene.
- They are used intensively as a source of information for the discovery of new genes, whose function can be tentatively deduced from sequence.

The DATA

- Raw data is unorganised, unannotated, redundant and of low quality
- Present in flat file format
- Found at:
- SAMPLE ENTRY (GenBank)
- CURRENT STATISTICS

EST clustering

UniGene:
- Experimental system for automatically partitioning GenBank sequences into a non-redundant set of gene-oriented clusters.
- In addition to sequences of well-characterized genes, hundreds of thousands of EST sequences have been included.
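
To make the idea of partitioning sequences into clusters concrete, here is a minimal Python sketch. It is not UniGene's actual procedure: it assumes, purely for illustration, that two ESTs belong together when they share at least `min_shared` exact k-mers, and it joins them with a union-find structure. The names `cluster_ests`, `k` and `min_shared` are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_ests(ests, k=12, min_shared=3):
    """Group ESTs (dict of name -> sequence) that share enough k-mers.

    Assumption: reverse-complement matches and sequencing errors are
    ignored, which a real clustering system cannot afford to do.
    """
    parent = {name: name for name in ests}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Index every k-mer to the ESTs that contain it.
    index = defaultdict(set)
    for name, seq in ests.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)

    # Count shared k-mers for every pair of ESTs that co-occur.
    shared = defaultdict(int)
    for names in index.values():
        ordered = sorted(names)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                shared[(a, b)] += 1

    # Merge pairs that pass the threshold.
    for (a, b), count in shared.items():
        if count >= min_shared:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for name in ests:
        clusters[find(name)].add(name)
    return sorted(clusters.values(), key=sorted)
```

Because merging is transitive, two ESTs from opposite ends of a gene can land in the same cluster as long as a chain of overlapping ESTs connects them.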

[Figure: UniGene build. Inputs: GenBank mRNAs, GenBank genomic CDSs, dbEST. Stages shown: preliminary clusters, unanchored clusters, UniGene.]

Querying UniGene: result

OTHER SIMILAR DATABASES:
- TIGR Gene Indices (BLAST: result)
- STACK (BLAST: result)

Querying the EST database

- BLAST: result
- FASTA (only email submission allowed at http://www.ebi.ac.uk)
- Smith and Waterman

CAN ALSO PERFORM MOTIF SEARCHES
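
The Smith and Waterman search listed above is a local alignment by dynamic programming. The following Python sketch shows the recurrence and traceback for two short sequences. The scoring values (match 2, mismatch -1, linear gap -2) are assumptions chosen for illustration, not the parameters of any particular server.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned part of a, aligned part of b).

    Uses a linear gap penalty; scores never drop below zero, which is
    what makes the alignment local rather than global.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

The table costs time and memory proportional to the product of the two sequence lengths, which is why heuristic tools such as BLAST and FASTA are normally used for database-scale searches.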

EST CONTIG ASSEMBLY

GCG package
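
Contig assembly rests on finding ESTs whose ends overlap and merging them. The Python sketch below shows only that core step, using exact suffix-prefix matching. It is an illustrative simplification, not the algorithm of the GCG package, and the `min_len` threshold is a hypothetical parameter.

```python
def overlap(left, right, min_len=20):
    """Length of the longest suffix of `left` equal to a prefix of `right`.

    Returns 0 when no exact overlap of at least `min_len` bases exists.
    """
    min_len = max(min_len, 1)
    longest_possible = min(len(left), len(right))
    for size in range(longest_possible, min_len - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


def merge_pair(left, right, min_len=20):
    """Merge two overlapping reads into one contig, or return None."""
    size = overlap(left, right, min_len)
    if size == 0:
        return None
    return left + right[size:]
```

Since EST quality is low, practical assemblers score approximate overlaps and build a consensus instead of requiring exact matches.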

FINDING CODING REGIONS

ESTScan : http://www.ch.embnet.org/software/ESTScan.html
A type of hidden Markov model that explicitly allows for errors in the sequence being analyzed and incorporates a method for correcting them.

1) DNA (CDS): result
2) Full DNA: result
3) Protein: result

TrEST
TrEST is an attempt to produce contigs from UniGene clusters (see above) and to translate them into proteins. This is a two-step process: (i) assembly of contigs from a collection of ESTs; (ii) translation of the assembled contigs into protein (using ESTScan).
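
To illustrate the translation step, the Python sketch below translates a contig in all six reading frames and keeps the longest stretch without a stop codon. This naive approach is swapped in for illustration only: it is not how ESTScan works, since it neither models coding statistics nor corrects sequencing errors. All function names are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def longest_orf_peptide(contig):
    """Longest stop-free peptide over the six reading frames.

    No start codon is required, because ESTs and contigs built from
    them are often partial and may lack the true start.
    """
    contig = contig.upper()
    best = ""
    for strand in (contig, reverse_complement(contig)):
        for frame in range(3):
            for piece in translate(strand[frame:]).split("*"):
                if len(piece) > len(best):
                    best = piece
    return best
```

On short inputs the longest stop-free stretch may come from the wrong strand or frame, and a single insertion or deletion breaks the frame entirely, which is the problem an error-tolerant model is meant to address.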

Uses of ESTs

Hunting for novel genes

A large-scale project generated 7,000 ESTs representing 4,000 sequences from T. gondii; sequence comparison with public databases identified 500 potentially novel genes.

Creating Gene Indices

Using ESTs and STSs. Map location is provided by UniGene.

Gene Prediction in genomic DNA

Reannotation of C. elegans:
- In about half the cases, the computationally predicted genes were identical to the EST alignments; 25% of the genes were predicted with less accuracy, and the remainder were predicted poorly.
- The error rate of the worm genome is less than 0.0001.
- Many of the alternative splices are not annotated on the genomic sequence.
- Computational methods may predict separate genes where ESTs show that these are exons of a single gene.

Ideal source of polymorphic data

Since ESTs are sequenced redundantly from libraries prepared from different individuals, they seem an ideal source of polymorphic data.

Assessing the level of gene expression

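Since ESTs are generated by random sequencing of clones from many different libraries, the number of ESTs observed for a gene in a library gives a rough estimate of its expression level there. An example is the creation of CGAP.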

- In cDNA microarrays
- SAGE
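
Because ESTs are sampled at random from a library, counting the ESTs assigned to each gene gives a rough expression estimate. The Python sketch below assumes each EST has already been assigned to a gene or cluster ID (the IDs are hypothetical) and normalises the counts per million ESTs so that libraries of different sizes can be compared.

```python
from collections import Counter


def est_counts_per_million(cluster_ids):
    """Convert per-EST gene assignments into counts per million ESTs.

    `cluster_ids` holds one gene or cluster ID per EST in a library.
    """
    counts = Counter(cluster_ids)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}
```

For example, `est_counts_per_million(["g1", "g1", "g1", "g2"])` assigns three quarters of the library to `g1`. Estimates for rarely expressed genes are noisy, since they rest on very few reads.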
