Gene Structure and Identification: Genes and Genomes, ORFs and More, Consensus Sequences, Gene Finding
• RNA genes
– rRNA
– tRNA
– snRNA, snoRNA…
Protein Coding Genes
• ORF
– long (usually >100 aa)
– similarity to “known” proteins makes a real gene likely
• Regulatory signals
– Depend on organism
• Prokaryotes vs Eukaryotes
• Vertebrates vs. fungi, e.g.
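The ORF criterion above (long, usually >100 aa) can be sketched as a simple scan. This is a minimal illustration, not a real gene finder: it assumes the standard stop codons, requires an ATG start, and reports coordinates on whichever strand was scanned. The function name `find_orfs` and the tuple layout are hypothetical choices made for this example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_aa=100):
    """List ORFs (ATG ... stop) of at least min_aa codons in all six frames.

    Each hit is (strand, frame, start, end, n_codons); start/end are
    0-based offsets on the scanned strand, and end includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    n_codons = (i - start) // 3
                    if n_codons >= min_aa:
                        orfs.append((strand, frame, start, i + 3, n_codons))
                    start = None
    return orfs


# Toy sequence: a 6-codon ORF, so the threshold is lowered for the demo.
toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "G"
print(find_orfs(toy, min_aa=5))  # [('+', 2, 2, 23, 6)]
```

With the default `min_aa=100` the toy sequence yields no hits, which is the point of the length filter: short ORFs occur by chance in any sequence.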
Prokaryotic Gene Expression
Promoter | Cistron1 | Cistron2 | ... | CistronN | Terminator
Terminator types: Rho-independent, Rho-dependent
Translation
Ribosome Binding Site (Shine-Dalgarno Site)
nnGGAGGnnnnnATG…
typical E. coli
nnaaAGGnnnnnATG
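The two patterns above can be turned into a small search. This is a sketch under stated assumptions: the spacer is fixed at 5 nt, as drawn on the slide, although spacing in real genes varies, and `find_rbs` with its parameters is a hypothetical helper, not a published tool.

```python
import re


def find_rbs(seq, core="GGAGG", min_gap=5, max_gap=5):
    """Find a Shine-Dalgarno-like core followed by a spacer and ATG.

    Returns (core_start, atg_start) pairs as 0-based offsets.
    A lookahead is used so that overlapping candidates are all reported.
    """
    pattern = re.compile(
        rf"(?=({core}[ACGT]{{{min_gap},{max_gap}}}ATG))"
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        hits.append((m.start(1), m.end(1) - 3))
    return hits


toy = "TTGGAGGACGTAATGAAA"
print(find_rbs(toy))              # [(2, 12)]  strict GGAGG core
print(find_rbs(toy, core="AGG"))  # [(4, 12)]  relaxed AGG core
```

Widening `min_gap`/`max_gap` is the obvious knob if the fixed 5 nt spacer proves too strict for a given genome.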
Eukaryotic Gene Expression
[Figure: mRNA processing. The transcript is capped (7mG), spliced, and cleaved/polyadenylated (An); the mature 7mG...An mRNA is transported and translated into a polypeptide (N to C).]
Eukaryotic Gene Complexity
• Yeast
– introns rare
– promoters adjacent
– genome dense
Exon/Intron Structure
CCACATTgtn(30-10,000)an(5-20)agCAGAA
...CCACATTCAGAA...
...ProHisSerGlu...
Alternative Splice
CCACATTgtn(30-10,000)an(5-20)agcagAA
...CCACATTAA...
...ProHisSTOP
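The two splice outcomes above can be checked with a short worked example. It is only a sketch: the intron filler (30 c's, then the branch-point a, then 5 t's) is made up to satisfy the gt...a...ag pattern, and `splice` simply joins the exon before the first GT to whatever follows the chosen AG. One-letter amino acid codes are used, with `*` for STOP.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Toy pre-mRNA: exon 1, intron (lowercase, filler is invented), exon 2.
PRE_MRNA = "CCACATT" + "gt" + "c" * 30 + "a" + "t" * 5 + "ag" + "CAGAA"


def splice(pre_mrna, acceptor_rank=0):
    """Remove everything from the first GT through the chosen AG.

    acceptor_rank=0 uses the first AG after the donor; 1 uses the next one.
    """
    s = pre_mrna.upper()
    donor = s.index("GT")
    acceptor_ends = [i + 2 for i in range(donor + 2, len(s) - 1)
                     if s[i:i + 2] == "AG"]
    return s[:donor] + s[acceptor_ends[acceptor_rank]:]


def translate(mrna):
    """Translate complete codons from position 0 (one-letter codes)."""
    return "".join(CODON_TABLE[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3))


normal = splice(PRE_MRNA, 0)
alternative = splice(PRE_MRNA, 1)
print(normal, translate(normal))            # CCACATTCAGAA PHSE
print(alternative, translate(alternative))  # CCACATTAA PH*
```

Shifting the acceptor by just three bases (ag to the ag inside cag) changes the reading frame of the junction and creates an in-frame TAA, which is why the alternative product ends after His.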
Consensus Sequences
• Promoter sites
• Intron/Exon
• Transcription Termination/PolyA
• Translation initiation
Finding Functional Sequences
Promoter elements: GC box, CAAT box, TATA box
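A consensus search for these three promoter elements might look like the sketch below. The patterns are simplified textbook consensus forms chosen for illustration (GGGCGG, CCAAT, TATA[AT]A), not taken from the slide; only the forward strand is scanned, and the name `scan_promoter_elements` is hypothetical. Matches this short occur frequently by chance, so hits are candidates at best.

```python
import re

# Simplified consensus patterns (assumptions made for this example).
PROMOTER_ELEMENTS = {
    "GC box": "GGGCGG",
    "CAAT box": "CCAAT",
    "TATA box": "TATA[AT]A",
}


def scan_promoter_elements(seq, elements=PROMOTER_ELEMENTS):
    """Return {element name: [0-based start positions]} on the forward strand."""
    seq = seq.upper()
    hits = {}
    for name, pattern in elements.items():
        # Lookahead lets overlapping occurrences be reported as well.
        regex = re.compile(f"(?=({pattern}))")
        hits[name] = [m.start(1) for m in regex.finditer(seq)]
    return hits


toy = "CC" + "GGGCGG" + "CCCC" + "CCAAT" + "CCCC" + "TATAAA" + "CC"
print(scan_promoter_elements(toy))
# {'GC box': [2], 'CAAT box': [12], 'TATA box': [21]}
```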
DATABASE SEARCH
• BLASTN
– DNA:DNA comparison (ALWAYS!)
– Not sensitive (DNA conservation low)
• BLASTX/TBLASTX
– BLASTX: 6-frame translation of the DNA query vs. a polypeptide database
– TBLASTX: 6 frames of the DNA query vs. 6 frames of a DNA database
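The "6 frames" idea can be illustrated with a small six-frame translation. This is a conceptual sketch only, not how BLAST is implemented: it uses the standard genetic code, ignores partial codons at the end of each frame, and the frame labels (+1..+3, -1..-3) and function name are choices made for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def six_frame_translation(seq):
    """Translate a DNA string in all six reading frames.

    Returns a dict mapping '+1'..'+3' and '-1'..'-3' to one-letter
    protein strings ('*' marks a stop codon).
    """
    seq = seq.upper()
    strands = {"+": seq, "-": seq.translate(COMPLEMENT)[::-1]}
    frames = {}
    for sign, s in strands.items():
        for offset in range(3):
            protein = "".join(
                CODON_TABLE[s[i:i + 3]] for i in range(offset, len(s) - 2, 3)
            )
            frames[f"{sign}{offset + 1}"] = protein
    return frames


print(six_frame_translation("ATGGCC"))
# {'+1': 'MA', '+2': 'W', '+3': 'G', '-1': 'GH', '-2': 'A', '-3': 'P'}
```

Comparing at the protein level is what makes these searches more sensitive than BLASTN: synonymous DNA changes disappear once each frame is translated.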