EST - "Expressed Sequence Tags": - Manali Mehendale

EST Expressed Sequence Tags
-Manali Mehendale
The ultimate goal of a genome project is to produce a complete and accurate sequence of the entire genetic material of an organism.
What are ESTs?
Expressed Sequence Tags (ESTs) are short (usually about 300-500 bp), single-pass sequence reads from mRNA (cDNA). They are typically produced in large batches and represent a snapshot of the genes expressed in a given tissue and/or at a given developmental stage. They are tags of expression (some coding, others not) for a given cDNA library. Because an EST is a small piece of the expressed part of a gene, it can be used to fish the rest of the gene out of the chromosome.
Overview of how ESTs are constructed

1) Cell/Tissue
2) Isolate mRNA and reverse-transcribe it into cDNA
3) Clone cDNA into a vector to make a library
4) Pick individual clones
5) Sequence the 5' and 3' ends of each cDNA insert
6) Deposit the EST sequences
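As a rough illustration of the sequencing step, the Python sketch below shows how a 5' read and a 3' read relate to one cDNA insert. It assumes the insert is given as its sense strand (5' to 3') and uses a hypothetical fixed read length of 400 bp, in line with the 300-500 bp range above; real single-pass reads are error-prone and vary in length, which the sketch ignores.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def est_reads(insert: str, read_len: int = 400) -> tuple:
    """Hypothetical 5' and 3' single-pass reads from one cDNA insert.

    The 5' EST reads the sense strand from its 5' end; the 3' EST is primed
    from the opposite end, so it reads the reverse complement of the insert.
    """
    five_prime = insert[:read_len]
    three_prime = reverse_complement(insert)[:read_len]
    return five_prime, three_prime


print(est_reads("ATGCCCGGGTTTAAA", read_len=5))  # ('ATGCC', 'TTTAA')
```

The 3' read therefore starts in the 3' untranslated region, which is why 3' ESTs carry little coding sequence.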
Debates over ESTs

FOR:
- Merits of sequencing cDNA: cDNAs represent only 3% of the DNA, but carry the vast majority of its information content.
- Gene coding regions and their mRNAs are not easy to predict from genomic sequence.
AGAINST:
- It is difficult to find every mRNA.
Relationship between EST sequence and mRNA transcript

[Figure: gene with exons and introns; splicing; splice variants; 3' EST sequences]
Consequences of Methodology
- Many sequences are derived from the 3' ends of mRNAs and thus mostly contain information about the 3' untranslated regions (UTRs).
- Average quality is low; errors are quite common.
- Genes that are highly expressed in the tissues from which libraries have been made are represented by many EST sequences.
- Genes that are expressed only in tissues that were not used for preparing cDNA are not represented in the EST database.
- A substantial number of sequences are derived from partially spliced RNA species.
- ESTs are only as good as the clones from which they are derived.
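To see why 3' ESTs mostly carry 3' UTR sequence, the arithmetic below computes how much of a 3' read falls downstream of the coding sequence. The transcript length, CDS end and 400 bp read length are hypothetical values chosen for illustration; poly(A) tails and read errors are ignored.

```python
def utr_fraction_of_3prime_read(transcript_len: int, cds_end: int, read_len: int = 400) -> float:
    """Fraction of a 3' EST that lies in the 3' UTR.

    Coordinates are 0-based on the mRNA; the CDS ends at cds_end (exclusive),
    and the 3' read covers the last read_len bases of the transcript.
    """
    covered = min(read_len, transcript_len)
    read_start = transcript_len - covered
    utr_bases = transcript_len - max(cds_end, read_start)
    return max(utr_bases, 0) / covered


print(utr_fraction_of_3prime_read(2000, 1200))  # 1.0: an 800 bp UTR swallows the whole read
print(utr_fraction_of_3prime_read(2000, 1800))  # 0.5: a 200 bp UTR fills half the read
```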
Why ESTs?
ESTs can act as standard markers for the physical mapping of the genome, with the additional advantage of pointing directly to an expressed gene.
They are used intensively as a source of information for the discovery of new genes, whose function can be tentatively deduced from their sequence.
The DATA
- Raw data are unorganised, unannotated, redundant and of low quality.
- Present in flat-file format.
- Found at:
[Slides: sample entry (GenBank); current statistics]
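Because the raw data come as flat files, even a few lines of code can pull out the basic fields. The sketch below is a deliberately minimal, stdlib-only reader for a single simplified GenBank-style entry; the sample record is made up, and the parser ignores multi-line DEFINITION fields, FEATURES and every other section. For real records a dedicated parser such as Biopython's `SeqIO` would be the safer choice.

```python
# A made-up, simplified GenBank-style entry for illustration only.
SAMPLE = """\
LOCUS       EXAMPLE01                 24 bp    mRNA    linear   EST
DEFINITION  hypothetical EST entry used for illustration.
ORIGIN
        1 acgtacgtac gtacgtacgt
       21 aaaa
//
"""


def parse_genbank_entry(text: str) -> dict:
    """Extract LOCUS name, one-line DEFINITION and ORIGIN sequence from one entry."""
    record = {"locus": None, "definition": None, "sequence": ""}
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if in_origin:
            # Sequence lines: a position number followed by blocks of bases.
            record["sequence"] += "".join(line.split()[1:]).upper()
        elif line.startswith("LOCUS"):
            record["locus"] = line.split()[1]
        elif line.startswith("DEFINITION"):
            record["definition"] = line[len("DEFINITION"):].strip()
        elif line.startswith("ORIGIN"):
            in_origin = True
    return record


entry = parse_genbank_entry(SAMPLE)
print(entry["locus"], len(entry["sequence"]))  # EXAMPLE01 24
```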
EST clustering
UniGene:
- An experimental system for automatically partitioning GenBank sequences into a non-redundant set of gene-oriented clusters.
- In addition to sequences of well-characterized genes, hundreds of thousands of EST sequences have been included.
[Figure: UniGene build; inputs: GenBank mRNAs, GenBank genomic CDSs, dbESTs; stages: preliminary clusters, unanchored clusters, UniGene]
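The idea of partitioning sequences into gene-oriented clusters can be sketched with a much simpler technique than UniGene's: single-linkage clustering, where two ESTs join the same cluster if they share at least one k-mer. The k-mer length of 8, the toy sequences and the assumption that all ESTs are on the same strand are illustrative choices, not UniGene parameters; UniGene's own procedure is alignment-based and applies further quality and anchoring rules.

```python
def kmers(seq: str, k: int) -> set:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_ests(ests: dict, k: int = 8) -> list:
    """Single-linkage clustering: ESTs sharing any k-mer end up in the same cluster."""
    parent = {name: name for name in ests}

    def find(name):
        # Follow parent links to the cluster representative (with path halving).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    names = list(ests)
    profiles = {name: kmers(ests[name], k) for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if profiles[a] & profiles[b]:
                parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


toy = {
    "est1": "AAAACCCCGGGGTTTT",
    "est2": "CCCCGGGGTTTTAAAA",
    "est3": "ACACACACACACACAC",
}
print(cluster_ests(toy))
```

The all-against-all comparison is quadratic in the number of ESTs, which is fine for a toy but not for hundreds of thousands of sequences.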
Querying UniGene: result
Other similar databases:
- TIGR Gene Indices (BLAST: result)
- STACK (BLAST: result)
Querying the EST database
- BLAST: result
- FASTA (only email submission allowed at http://www.ebi.ac.uk)
- Smith-Waterman
- Motif searches can also be performed.
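Smith-Waterman is the exact local-alignment algorithm that heuristic tools such as BLAST and FASTA trade away for speed. The score-only sketch below fills the dynamic-programming matrix with a linear gap penalty; the scoring values are illustrative defaults (real searches typically use substitution matrices and affine gap penalties), and no traceback of the alignment itself is returned.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty, no traceback)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero, so an alignment can start anywhere.
            score[i][j] = max(0, diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
            best = max(best, score[i][j])
    return best


print(smith_waterman("TGTTACGG", "GGTTGACTA", match=3, mismatch=-3, gap=-2))  # 13
```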
EST CONTIG ASSEMBLY
GCG package
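The GCG assembly tools are not reproduced here; instead, the sketch below shows the general idea of contig assembly with a naive greedy overlap merger that repeatedly joins the two sequences with the longest suffix-prefix overlap. The minimum overlap of 3 and the toy reads are assumptions. It does not handle sequencing errors, reverse-complemented reads or reads contained inside other reads, all of which matter for real ESTs.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list, min_len: int = 3) -> list:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```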
FINDING CODING REGIONS
ESTScan: http://www.ch.embnet.org/software/ESTScan.html
ESTScan uses a type of hidden Markov model that explicitly deals with the possibility of errors in the sequence being analysed and incorporates a method for correcting these errors.
1) DNA (CDS): result
2) Full DNA: result
3) Protein: result
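ESTScan's error-correcting HMM is well beyond a short example. As a much cruder stand-in, the sketch below translates an EST in all six reading frames with the standard genetic code and reports the longest stretch without a stop codon, a common first guess at the coding region of a partial cDNA. Unlike ESTScan, it cannot recover from the insertion and deletion errors typical of single-pass reads.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; unknown codons (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def longest_open_stretch(est: str) -> tuple:
    """Return (strand, frame, peptide) for the longest stop-free stretch in six frames."""
    best = ("+", 0, "")
    for strand, seq in (("+", est.upper()), ("-", reverse_complement(est.upper()))):
        for frame in range(3):
            for peptide in translate(seq[frame:]).split("*"):
                if len(peptide) > len(best[2]):
                    best = (strand, frame, peptide)
    return best


print(longest_open_stretch("ATGAAACCCGGGTTTTAA"))  # ('-', 0, 'LKPGFH')
```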
TrEST
TrEST is an attempt to produce contigs from UniGene clusters (see above) and to translate them into proteins. This is a two-step process: (i) assembly of contigs from a collection of ESTs; (ii) translation of the assembled contigs into protein (using ESTScan).
Uses of ESTs
Hunting for novel genes
A large-scale project generated 7,000 ESTs representing 4,000 sequences from T. gondii; comparison with public databases identified 500 potentially novel genes.
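The comparison step in such a project amounts to flagging sequences with no significant database match. A minimal sketch, assuming best-hit E-values (for example from BLAST) have already been collected per sequence, with None meaning no hit at all; the 1e-5 cutoff and the cluster names are hypothetical.

```python
from typing import Dict, List, Optional


def potentially_novel(best_evalues: Dict[str, Optional[float]], cutoff: float = 1e-5) -> List[str]:
    """Names whose best database hit is missing or weaker than the cutoff."""
    return [name for name, evalue in best_evalues.items() if evalue is None or evalue > cutoff]


hits = {"cluster_1": 1e-30, "cluster_2": None, "cluster_3": 0.5}
print(potentially_novel(hits))  # ['cluster_2', 'cluster_3']
```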
Creating Gene Indices
Gene indices are created using ESTs and STSs (sequence-tagged sites); map locations are provided by UniGene.
Uses of ESTs
Gene prediction in genomic DNA
Reannotation of C. elegans:
- In about half the cases, the computationally predicted genes were identical to the EST alignments; 25% of the genes were predicted with less accuracy, and the remainder were predicted poorly.
- The error rate of the worm genome sequence is less than 0.0001.
- Many of the alternative splices are not annotated on the genomic sequence.
- Computational methods may predict separate genes where ESTs show that these are exons of a single gene.
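A comparison like the C. elegans reannotation can be sketched by checking whether a predicted gene model and an EST alignment agree on their introns. In the sketch below, exons are hypothetical (start, end) coordinate pairs on the genome and the three labels echo the categories above, but the 50% shared-intron threshold for "less accurate" is an assumption, not the criterion used in the study.

```python
def intron_chain(exons: list) -> set:
    """Introns as (end of one exon, start of the next) pairs."""
    ordered = sorted(exons)
    return {(ordered[i][1], ordered[i + 1][0]) for i in range(len(ordered) - 1)}


def compare_to_est(predicted: list, est_alignment: list, threshold: float = 0.5) -> str:
    """Classify a predicted gene model by how many EST-supported introns it reproduces."""
    predicted_introns = intron_chain(predicted)
    est_introns = intron_chain(est_alignment)
    if predicted_introns == est_introns:
        return "identical"
    shared = len(predicted_introns & est_introns) / max(len(est_introns), 1)
    return "less accurate" if shared >= threshold else "poor"


model = [(1, 100), (200, 300), (400, 500)]
print(compare_to_est(model, [(1, 100), (200, 300), (450, 500)]))  # less accurate
```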
Uses of ESTs
Ideal source of polymorphic data
Since ESTs are sequenced redundantly from libraries prepared from different individuals, they seem an ideal source of polymorphic data.
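Because single-pass ESTs contain many sequencing errors, a mismatch seen in only one read is weak evidence of a polymorphism. A minimal sketch, assuming the ESTs have already been aligned without gaps over the same region, reports columns where a second base appears in at least two reads; the threshold of two is an illustrative choice.

```python
from collections import Counter


def candidate_snps(aligned: list, min_minor_count: int = 2) -> list:
    """Return (position, major base, minor base, minor count) for candidate SNP columns."""
    candidates = []
    for position, column in enumerate(zip(*aligned)):
        counts = Counter(base for base in column if base in "ACGT")
        if len(counts) < 2:
            continue  # monomorphic column (or only ambiguous bases)
        (major, _), (minor, minor_count) = counts.most_common(2)
        if minor_count >= min_minor_count:
            candidates.append((position, major, minor, minor_count))
    return candidates


reads = ["ACGTA", "ACGTA", "ACTTA", "ACTTA", "ACGTC"]
print(candidate_snps(reads))  # [(2, 'G', 'T', 2)]
```

The single C in the last column is treated as a probable sequencing error rather than a polymorphism.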
Assessing the level of gene expression
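Since ESTs are generated by random sequencing of clones from many different libraries, the number of ESTs derived from a gene in a given library reflects how strongly that gene is expressed in the source tissue. An example is the creation of CGAP (Cancer Genome Anatomy Project).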
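Counting ESTs per gene in each library gives a rough "digital" expression profile. The sketch below assumes each EST has already been assigned to a gene (for example through a UniGene cluster) and uses made-up library and gene names; it normalises the counts to ESTs per 10,000 in each library so that libraries of different sizes can be compared.

```python
from collections import Counter, defaultdict


def ests_per_10k(assignments: list) -> dict:
    """assignments: (library, gene) pairs, one per EST."""
    per_library = defaultdict(Counter)
    for library, gene in assignments:
        per_library[library][gene] += 1
    result = {}
    for library, counts in per_library.items():
        total = sum(counts.values())
        result[library] = {gene: n * 10_000 / total for gene, n in counts.items()}
    return result


data = [("brain", "GENE_A")] * 3 + [("brain", "GENE_B")] + [("liver", "GENE_C")] * 2
print(ests_per_10k(data))
# {'brain': {'GENE_A': 7500.0, 'GENE_B': 2500.0}, 'liver': {'GENE_C': 10000.0}}
```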
- In cDNA microarrays
- SAGE (serial analysis of gene expression)
