
Gene Structure and Identification

Genes and Genomes


ORFs and more
Consensus Sequences
Gene Finding
Genes
• Protein Coding

• RNA genes
– rRNA
– tRNA
– snRNA, snoRNA…
Protein Coding Genes
• ORF
– long (usually >100 aa)
– “known” proteins likely
• Regulatory signals
– Depend on organism
• Prokaryotes vs Eukaryotes
• Vertebrate vs. fungi, e.g.
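The ORF criterion above (a long open reading frame, usually >100 aa) can be sketched as a six-frame scan. This is a minimal illustration, not a gene finder: it assumes the standard genetic code, ATG as the only start codon, and ORFs that end at an in-frame stop. The function names and the `min_aa` parameter are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_aa=100):
    """Scan all six frames for ATG...stop ORFs of at least min_aa residues.

    Returns (strand, frame, start, protein) tuples; start is the index of
    the ATG on the scanned strand. ORFs with no in-frame stop are skipped.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # open an ORF at the first ATG
                elif start is not None and CODON_TABLE.get(codon) == "*":
                    protein = "".join(
                        CODON_TABLE.get(s[j:j + 3], "X")  # X = ambiguous codon
                        for j in range(start, i, 3)
                    )
                    if len(protein) >= min_aa:
                        orfs.append((strand, frame, start, protein))
                    start = None  # close the ORF at the stop codon
    return orfs


# Toy sequence; min_aa is lowered so the short ORF is reported.
toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
print(find_orfs(toy, min_aa=3))
```

With the default `min_aa=100` the toy sequence yields no ORFs, which is the point of the length filter: short ORFs arise by chance in any frame.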
Prokaryotic Gene Expression
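[Figure: DNA with Promoter, Cistron1, Cistron2 ... CistronN, Terminator. RNA polymerase transcribes the unit into a single mRNA (5’ to 3’) carrying cistrons 1 to N. Ribosomes, tRNAs, and protein factors translate each cistron into its own polypeptide (N to C terminus).]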
Bacterial Promoter
-35 box: T82 T84 G78 A65 C54 A45
Spacer: 16-18 bp
-10 box: T80 A95 T45 A60 A50 T96
+1: A or G
(numbers give the percent occurrence of each base at that position)
Alternate sigma factors
CCCTTGAA….CCCGATNT
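The -35 / spacer / -10 layout can be turned into a simple scan. The sketch below assumes the consensus strings TTGACA and TATAAT (read from the most frequent base at each position on the slide), the 16-18 bp spacer, and a hypothetical per-box mismatch limit. It only counts mismatches and ignores the per-position percentages, so it is an illustration rather than a calibrated promoter predictor. Promoters for alternate sigma factors would need their own consensus strings.

```python
MINUS35 = "TTGACA"  # consensus of the -35 box
MINUS10 = "TATAAT"  # consensus of the -10 box


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoters(seq, max_mismatch=2, spacers=(16, 17, 18)):
    """Return (pos35, pos10, spacer, total_mismatches) for candidate promoters.

    max_mismatch applies to each box separately; it is an assumed
    threshold, not a value taken from the slides.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        m35 = mismatches(seq[i:i + 6], MINUS35)
        if m35 > max_mismatch:
            continue
        for spacer in spacers:
            j = i + 6 + spacer  # start of the candidate -10 box
            box = seq[j:j + 6]
            if len(box) < 6:
                continue  # ran off the end of the sequence
            m10 = mismatches(box, MINUS10)
            if m10 <= max_mismatch:
                hits.append((i, j, spacer, m35 + m10))
    return hits


# Toy sequence with perfect boxes separated by a 17 bp spacer.
toy = "GG" + MINUS35 + "C" * 17 + MINUS10 + "GGGGGGA"
print(find_promoters(toy, max_mismatch=0))  # [(2, 25, 17, 0)]
```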
Terminators
Rho-independent
• Stem/loop
– structural only
• 3’-U tail

Rho-dependent
• C-rich
• G-poor
• “loose” consensus
Translation
Ribosome Binding Site, Shine-Dalgarno Site

nnGGAGGnnnnnATG…

typical E. coli

nnaaAGGnnnnnATG
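A pattern such as nnGGAGGnnnnnATG maps directly onto a regular expression. The sketch below is a minimal search for a Shine-Dalgarno motif followed by a start codon. The function name and the gap parameters are hypothetical, and the default 5-nt gap simply mirrors the five n's shown above; real spacing varies.

```python
import re


def find_rbs_starts(seq, sd="GGAGG", min_gap=5, max_gap=5):
    """Return (sd_pos, atg_pos) pairs where sd is followed by ATG.

    A lookahead is used so that overlapping candidates are all reported.
    With a gap range, only the longest valid gap is reported for each
    Shine-Dalgarno position.
    """
    pattern = re.compile(rf"(?=({sd})[ACGT]{{{min_gap},{max_gap}}}(ATG))")
    return [(m.start(1), m.start(2)) for m in pattern.finditer(seq.upper())]


toy = "AAGGAGGTTTTTATGAAA"
print(find_rbs_starts(toy))  # [(2, 12)]
```

Exact matching is strict; a weaker site would be missed unless it is passed explicitly as `sd`, so a scoring approach is usually preferred for real annotation.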
Eukaryotic Gene Expression

[Figure: DNA with Enhancer, Promoter, Transcribed Region, Terminator. RNA Polymerase II transcribes a primary transcript (5’ Exon1, Intron1, Exon2 3’). The transcript is capped (7mG), spliced, and cleaved/polyadenylated (An). The mature mRNA is transported and translated into a polypeptide (N to C terminus).]
Eukaryotic Gene Complexity
• Yeast
– introns rare
– promoters adjacent
– genome dense
Exon/Intron Structure

CCACATTgtn(30-10,000)an(5-20)agCAGAA

...CCACATTCAGAA...
...ProHisSerGlu...
Alternative Splice

CCACATTgtn(30-10,000)an(5-20)agcagAA

...CCACATTAA...
...ProHisSTOP
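The two splice outcomes above can be reproduced in a few lines. This sketch assumes the standard genetic code and takes the donor and acceptor positions as explicit arguments rather than predicting them. The intron filler bases are made up for illustration; only the gt...a...ag layout and the flanking exon sequence come from the slides.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def splice(pre_mrna, donor, acceptor):
    """Remove the intron pre_mrna[donor:acceptor] and join the exons.

    donor is the index of the intron's first base (the g of gt);
    acceptor is the index just past the intron's last base (the g of ag).
    """
    intron = pre_mrna[donor:acceptor].lower()
    if not (intron.startswith("gt") and intron.endswith("ag")):
        raise ValueError("intron must start with gt and end with ag")
    return (pre_mrna[:donor] + pre_mrna[acceptor:]).upper()


def translate(seq):
    """Translate complete codons from position 0; '*' marks a stop."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


exon1 = "CCACATT"
intron = "gt" + "c" * 30 + "a" + "t" * 10 + "ag"  # filler bases are arbitrary
exon2 = "CAGAA"
pre_mrna = exon1 + intron + exon2
donor = len(exon1)
acceptor = donor + len(intron)

normal = splice(pre_mrna, donor, acceptor)
print(normal, translate(normal))            # CCACATTCAGAA PHSE

# Alternative acceptor: the ag inside CAG, three bases further downstream.
alternative = splice(pre_mrna, donor, acceptor + 3)
print(alternative, translate(alternative))  # CCACATTAA PH*
```

The one-letter output PHSE corresponds to ProHisSerGlu, and PH* to ProHisSTOP: shifting the acceptor by three bases removes the Ser codon's last base plus AG, which brings a TAA stop into frame.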
Consensus Sequences
• Promoter sites
• Intron/Exon
• Transcription Termination/PolyA

• Translation initiation
Finding Functional Sequences

Known Consensus Sequences


Consensus Sequence Generation
Functional Tests
Consensus Inference
• Position Weight Matrices
• Sequence Logos
ProfileScan
• Hidden Markov Models
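A position weight matrix and the per-position information content that a sequence logo displays can both be derived from a set of aligned sites. The sketch below is a minimal version under stated assumptions: the aligned sites are invented for illustration, the background is uniform (0.25 per base), a pseudocount of 1 is used, and the information content omits any small-sample correction. All function names are hypothetical.

```python
import math

ALPHABET = "ACGT"


def count_matrix(sites):
    """Count each base at each position of equal-length aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    counts = [{b: 0 for b in ALPHABET} for _ in range(length)]
    for site in sites:
        for pos, base in enumerate(site.upper()):
            counts[pos][base] += 1
    return counts


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert counts to log2(frequency / background) weights."""
    pwm = []
    for col in counts:
        total = sum(col.values()) + len(ALPHABET) * pseudocount
        pwm.append({b: math.log2((col[b] + pseudocount) / total / background)
                    for b in ALPHABET})
    return pwm


def information_content(counts):
    """Bits per position (2 minus Shannon entropy), as plotted in a logo."""
    bits = []
    for col in counts:
        total = sum(col.values())
        entropy = -sum((n / total) * math.log2(n / total)
                       for n in col.values() if n)
        bits.append(2.0 - entropy)
    return bits


def best_hit(pwm, seq):
    """Return (score, position) of the highest-scoring window in seq."""
    seq = seq.upper()
    width = len(pwm)
    return max(
        (sum(col[b] for col, b in zip(pwm, seq[i:i + width])), i)
        for i in range(len(seq) - width + 1)
    )


# Invented aligned sites, for illustration only.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT", "GATAAT"]
counts = count_matrix(sites)
pwm = log_odds_matrix(counts)
print(information_content(counts))
print(best_hit(pwm, "GGGGTATAATGGGG"))
```

Unlike a plain consensus string, the matrix scores every window, so near-matches are ranked rather than simply accepted or rejected.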
Basal Promoter Analysis
Myers and Maniatis, Genes VI, 831

• TATA box: ATATAA, -30, binds TBP
• CAAT box: GGCCAATC, -75, binds CTF/NF1
• GC box: GCCACACCC, -90, binds SP1

Order upstream of +1: GC, CAAT, TATA, +1
DATABASE SEARCH
• BLASTN
– DNA:DNA comparison (ALWAYS!)
– Not sensitive (DNA conservation low)
• BLASTX/TBLASTX
– BLASTX: 6-frame translation of the query vs. a polypeptide database
– TBLASTX: 6 frames of the query vs. 6 frames of a DNA database
