EST - "Expressed Sequence Tags": - Manali Mehendale
The ultimate goal of the genome project is to produce a complete and accurate sequence of the entire genetic material.
Expressed sequence tags (ESTs) are short (usually about 300-500 bp), single-pass sequence reads derived from mRNA via cDNA. They are typically produced in large batches and represent a snapshot of the genes expressed in a given tissue and/or at a given developmental stage. They are tags (some coding, others not) of expression for a given cDNA library.
An EST is a small part of the active part of a gene, and it can be used to fish the rest of the gene out of the chromosome.
Merits of sequencing cDNA
- Coding sequences represent only 3% of the DNA, but carry the vast majority of the information content.
- It is not easy to predict gene coding regions and their mRNA.
AGAINST:
Splicing
Splice variants
3' EST sequences
Consequences of Methodology
- Many sequences are derived from the 3' ends of mRNA and thus mostly contain information about 3' untranslated regions.
- Average quality is low; errors are quite common.
- Genes that are highly expressed in the tissues from which libraries have been made will be represented in many EST sequences.
- Genes that are expressed only in tissues that were not used for preparing cDNA will not be represented in the EST database.
- A substantial number of sequences are derived from partially spliced RNA species.
- ESTs are only as good as the clones from which they are derived.
Why ESTs?
ESTs can act as standard markers for the physical mapping of the genome, with the additional advantage of pointing directly to an expressed gene.
They are used intensively as a source of information for the discovery of new genes whose function can be tentatively deduced from sequence.
The DATA
Raw data is unorganised, unannotated, redundant and of low quality.
It is present in flat file format.
Found at:
SAMPLE ENTRY (GenBank)
CURRENT STATISTICS
EST clustering
UniGene:
- An experimental system for automatically partitioning GenBank sequences into a non-redundant set of gene-oriented clusters.
- In addition to sequences of well-characterized genes, hundreds of thousands of EST sequences have been included.
Figure (UniGene): inputs are GenBank mRNAs, GenBank genomic CDSs and dbEST sequences; stages shown are preliminary clusters, unanchored clusters and UniGene.
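The idea of partitioning redundant ESTs into gene-oriented clusters can be illustrated with a minimal sketch. This is not the UniGene procedure: it assumes a simple shared k-mer criterion with single-linkage grouping, ignores strand orientation and sequencing errors, and the function names, parameters (`k`, `min_shared`) and toy reads are hypothetical.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_ests(ests, k=8, min_shared=3):
    """Group ESTs that share at least min_shared k-mers (single linkage).

    ests maps a read name to its sequence. Hypothetical criterion,
    chosen only for illustration.
    """
    names = list(ests)
    parent = {name: name for name in names}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    profiles = {name: kmers(ests[name].upper(), k) for name in names}
    for a, b in combinations(names, 2):
        if len(profiles[a] & profiles[b]) >= min_shared:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy data: two overlapping reads from one made-up transcript, one unrelated read.
transcript = "ATGGCGTACGTTAGCCTAGGATCCGATTACGGCTAAGCTT"
toy_ests = {
    "est1": transcript[:25],
    "est2": transcript[10:],
    "est3": "TTTTTTCCCCCCAAAAAAGGGGGGTTTTCCCC",
}
print(cluster_ests(toy_ests))
```

The all-against-all comparison is quadratic in the number of reads, so a sketch like this only suits small examples; real clustering systems rely on indexed similarity searches.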
STACK
BLAST result
GCG package
ESTScan: http://www.ch.embnet.org/software/ESTScan.html
ESTScan uses a type of hidden Markov model that explicitly deals with the possibility of errors in the sequence being analyzed, and incorporates a method for correcting these errors.
TrEST
TrEST is an attempt to produce contigs from UniGene clusters (see above) and to translate them into proteins. This is a two-step process: (i) assembly of contigs from a collection of ESTs; (ii) translation of the assembled contigs into protein (using ESTScan).
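Step (ii), turning an assembled contig into a protein, can be sketched in a much simpler form than ESTScan. The example below is a plain three-frame translation that reports the longest stretch without a stop codon. It performs no error correction and uses no hidden Markov model, so it is a stand-in for illustration only; the function names and the toy contig are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, frame=0):
    """Translate seq from the given forward frame (0, 1 or 2).

    Codons containing unknown bases become 'X'; a trailing partial
    codon is dropped.
    """
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def longest_open_stretch(seq):
    """Return (frame, peptide) for the longest stop-free stretch.

    Only the three forward frames are examined; ties go to the
    lowest frame number.
    """
    best = (0, "")
    for frame in range(3):
        for peptide in translate(seq, frame).split("*"):
            if len(peptide) > len(best[1]):
                best = (frame, peptide)
    return best


contig = "ATGCTAAGTAGTTGA"  # made-up contig for illustration
print(translate(contig))
print(longest_open_stretch(contig))
```

Because ESTs often contain frameshift errors, a naive translation like this one breaks at the first insertion or deletion, which is the problem ESTScan's error-tolerant model is meant to address.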
Uses of ESTs
A large-scale project generated 7,000 ESTs representing 4,000 sequences from T. gondii; sequence comparison with public databases identified about 500 potentially novel genes.
Uses of ESTs
Reannotation of C. elegans:
- In about half the cases the computationally predicted genes were identical to the EST alignments; 25% of the genes were predicted with less accuracy, and the remainder were predicted poorly.
- The error rate of the worm genome sequence is less than 0.0001.
- Many of the alternative splices are not annotated on the genomic sequence.
- Computational methods may predict separate genes, whereas ESTs show that these are exons of a single gene.
Uses of ESTs
Since ESTs are sequenced redundantly from libraries prepared from different individuals, they seem an ideal source of polymorphic data.
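A minimal sketch of how redundant EST coverage might be screened for candidate polymorphisms is given below. It assumes the reads have already been aligned to equal length (with `-` for gaps), and it requires each allele to be seen in at least `min_support` reads, so that an isolated sequencing error is not reported. The function name, threshold and toy reads are hypothetical.

```python
from collections import Counter


def candidate_snps(aligned_reads, min_support=2):
    """Return (position, {base: count}) for columns with two or more
    alleles that are each supported by at least min_support reads.

    aligned_reads: equal-length, pre-aligned, uppercase strings;
    characters other than A, C, G, T (gaps, N) are ignored.
    """
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be pre-aligned to equal length")

    candidates = []
    for pos in range(length):
        counts = Counter(read[pos] for read in aligned_reads if read[pos] in "ACGT")
        supported = {base: n for base, n in counts.items() if n >= min_support}
        if len(supported) >= 2:
            candidates.append((pos, supported))
    return candidates


# Toy alignment: position 2 varies in several reads, position 7 in only one.
reads = [
    "ACGTACGT",
    "ACGTACGT",
    "ACATACGT",
    "ACATACGT",
    "ACGTACGA",
]
print(candidate_snps(reads))
```

The support threshold matters because ESTs are single-pass reads of low average quality; a difference seen in only one read is more likely an error than a polymorphism.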
ESTs are generated by random sequencing of clones from many different libraries. An example is the creation of CGAP.