Promoter Prediction

By :
Anurag Maheshwari (BTB/14/302)
Arpita Gupta (BTB/14/303)
Gurleen Singh Thind (BTB/14/304)
Jagrit Sharma (BTB/14/306)
Promoters
• Gene promoters are DNA sequences located upstream
of gene coding regions.
• They contain multiple cis-acting elements, which are
specific binding sites for transcription factors (TFs).
• They include the "core promoter" (about 40 bp upstream of the
transcription initiation site), which comprises the TATA
box.
• Chromatin folding allows distant cis-acting elements to
become spatially proximal to the regulatory
complex.
Types of Promoters
A) Constitutive promoters
• Drive roughly constant levels of gene expression in all
tissues at all times, although no promoter is truly constitutive. E.g. the
CaMV 35S promoter. Highly expressed housekeeping genes are a
good source (ubiquitin, actin, tubulin, eIF genes).
B) Spatiotemporal promoters
• Give more precise control of native genes and transgenes by
restricting gene expression to certain cells, tissues, organs, or
developmental stages. E.g. the seed-specific promoters of the hordein
and glutenin genes.
C) Inducible promoters
• Respond to environmental stimuli (biotic and abiotic
stresses) and external chemical stimuli.
Models for finding Binding Sites
A) Exact String Model
• Searches for an exact sequence in the DNA
sequence.
B) String Mismatches Model
• Tries to find an almost exact sequence, tolerating a mismatch in one of
the positions.
C) Degenerate String Model (Consensus Model)
• Tries to find a sequence while allowing several alternative bases at
specific positions of the sequence.
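The three models can be compared side by side in a short Python sketch. It is only an illustration: the function names, the test sequence and the TATA-like motifs are hypothetical, and the degenerate model is assumed here to use the standard IUPAC nucleotide codes (for example W = A or T).

```python
import re

# IUPAC nucleotide codes, assumed here as the alphabet of the degenerate model.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def exact_sites(seq, motif):
    """Exact string model: 0-based starts where the motif occurs verbatim."""
    k = len(motif)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] == motif]

def mismatch_sites(seq, motif, max_mismatches=1):
    """String mismatches model: tolerate up to max_mismatches differing positions."""
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if sum(a != b for a, b in zip(window, motif)) <= max_mismatches:
            hits.append(i)
    return hits

def degenerate_sites(seq, consensus):
    """Degenerate string model: each consensus symbol names a set of allowed bases."""
    pattern = "".join("[" + IUPAC[c] + "]" for c in consensus)
    # A lookahead is used so that overlapping sites are all reported.
    return [m.start() for m in re.finditer("(?=" + pattern + ")", seq)]

seq = "GGTATAAAGGCTATATAAGG"  # hypothetical test sequence
print(exact_sites(seq, "TATAAA"))       # [2]
print(mismatch_sites(seq, "TATAAA"))    # [2, 11, 13]
print(degenerate_sites(seq, "TATAWA"))  # [2, 11]
```

The exact model finds only the verbatim site, the mismatch model also accepts any window that differs in a single position, and the consensus model accepts variation only at the position marked as degenerate.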
Promoter Prediction Servers
URL: http://www.cbs.dtu.dk/services/Promoter/
Procedure
1. Specify the input sequences
The sequences to be processed can be supplied in either of two ways:
  ◦ Paste a single sequence (just the nucleotides) or a number of sequences in
    FASTA format into the upper window of the main server page.
  ◦ Select a FASTA file on your local disk, either by typing the file name into the
    lower window or by browsing the disk.
The allowed input alphabet is A, C, G, T and X (unknown); all other symbols
are converted to X before processing.
2. Select the output format
  ◦ Click on the "Full output" button if you want the input sequences to be
    included in the server output. The default output format shows the predictions
    only.
3. Submit the job
  ◦ Click on the "Submit" button.
  ◦ The output will be ready for review after a short time.
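Before pasting sequences into the server, it can help to preview how the alphabet rule from step 1 will treat them. The following is a minimal local sketch, not the server's own code: `parse_fasta` and `normalize` are hypothetical helper names, and upper-casing the input before the check is an assumption.

```python
ALLOWED = set("ACGTX")  # input alphabet stated for the server

def parse_fasta(text):
    """Return a list of (name, sequence) pairs; a bare sequence gets the name ''."""
    records, name, chunks = [], "", []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if chunks or name:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if chunks or name:
        records.append((name, "".join(chunks)))
    return records

def normalize(seq):
    """Upper-case (an assumption) and replace every symbol outside A, C, G, T, X with X."""
    return "".join(c if c in ALLOWED else "X" for c in seq.upper())

for name, seq in parse_fasta(">s1 demo\nACGT\nNNRY\n"):
    print(name, normalize(seq))  # s1 demo ACGTXXXX
```

Long runs of X in the normalized sequence indicate ambiguity codes or stray characters that the predictor will treat as unknown bases.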
Output format
For each input sequence, the name and length are printed first, followed by a table
with the columns Position, Score and Likelihood:
• 'Position' is a position in the sequence.
• 'Score' is the prediction score for a transcription start site occurring within 100
base pairs upstream of that position.
• 'Likelihood' is a descriptive label associated with that score.
The scores are always positive numbers; they are labelled as follows:
• below 0.5: ignored
• 0.5 - 0.8: Marginal prediction
• 0.8 - 1.0: Medium likely prediction
• above 1.0: Highly likely prediction
If "Full output" has been selected, the input sequence is included in the output,
preceding the predictions.
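The score-to-label mapping can be written as a small function. This is a sketch under one assumption: the slides do not say which label applies exactly at 0.8 or 1.0, so the boundary handling below (0.8 counts as medium, 1.0 counts as medium) is a guess, and `likelihood_label` is a hypothetical name.

```python
def likelihood_label(score):
    """Map a prediction score to its descriptive label.

    Returns None for scores below 0.5, which are ignored.
    Behaviour exactly at 0.8 and 1.0 is an assumption.
    """
    if score < 0.5:
        return None
    if score < 0.8:
        return "Marginal prediction"
    if score <= 1.0:
        return "Medium likely prediction"
    return "Highly likely prediction"

print(likelihood_label(1.063))  # Highly likely prediction
```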
EXAMPLE INPUT
• INPUT SEQUENCE:
>gi_209811_gb_J01917_ADRCG Adenovirus type 2, complete genome.
CATCATCATAATATACCTTATTTTGGATTGAAGCCAATATGATAATGAGGGGGTGGAGTTTGTGACGTGGCGCGGGGC
GTGGGAACGGGGCGGGTGACGTAGTAGTGTGGCGGAAGTGTGATGTTGCAAGTGTGGCGGAACACATGTAAGCGC
CGGATGTGGTAAAAGTGACGTTTTTGGTGTGCGCCGGTGTATACGGGAAGTGACAATTTTCGCGCGGTTTTAGGCG
GATGTTGTAGTAAATTTGGGCGTAACCAAGTAATGTTTGGCCATTTTCGCGGGAAAACTGAATAAGAGGAAGTGAAA
TCTGAATAATTCTGTGTTACTCATAGCGCGTAATATTTGTCTAGGGCCGCGGGGCTTTGACCGTTTACGTGGAGACTC
GCCCAGGTGTTTTTCTCAGGTGTTTTCCGCGTTCCGGGTCAAAGTTGGCGTTTTATTATTATAGTCAGCTGACGCGCA
GTGTATTTATACCCGGTGAGTTCCTCAAGAGGCCACTCTTGAGTGCCAGCGAGTAGAGTTTTCTCCTCCGAGCCGCTC
CGACACCGGGACTGAAAATGAGACATATTATCTGCCACGGAGGTGTTATTACCGAAGAAATGGCCGCCAGTCTTTTG
GACCAGCTGATCGAAGAGGTACTGGCTGATAATCTTCCACCTCCTAGCCATTTTGAACCACCTACCCTTCACGAACTG
TATGATTTAGACGTGACGGCCCCCGAAGATCCCAACGAGGAGGCGGTTTCGCAGATTTTTCCCGAGTCTGTAATGTT
GGCGGTGCAGGAAGGGATTGACTTATTCACTTTTCCGCCGGCGCCCGGTTCTCCGGAGCCGCCTCACCTTTCCCGG
CAGCCCGAGCAGCCGGAGCAGAGAGCCTTGGGTCCGGTTTCTATGCCAAACCTTGTGCCGGAGGTGATCGATCTTA
CCTGCCACGAGGCTGGCTTTCCACCCAGTGACGAC
GAGGATGAAGAGGGTGAGGAGTTTGTGTTAGATTATGTGGAGCACCCCGGGCACGGTTGCAGGTCTTGTCATTATC
ACCGGAGGAATACGGGGGACCCAGATATTATGTGTTCGCTTTGCTATATGAGGACCTGTGGCATGTTTGTCTACAGTA
AGTGAAAATTATGGGCAGTCGGTGAT
AGAGTGGTGGGTTTGGTGTGGTAATTTTTTTTTAATTTTTACAGTTTTGTGGTTTAAAGA
EXAMPLE OUTPUT
gi_209811_gb_J01917_ADRCG Adenovirus type 2, complete genome., 1200 nucleotides
Position  Score  Likelihood
600       1.063  Highly likely prediction
NOTE: The input sequence will precede the output if "Full output" has been selected.
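When many sequences are submitted, the prediction rows can be collected programmatically. The sketch below assumes the whitespace-separated layout seen in the example output (an integer position, a decimal score, then the label); the real server output may differ in detail, and `parse_predictions` is a hypothetical helper.

```python
def parse_predictions(text):
    """Extract (position, score, label) rows from server output text.

    Lines that do not start with an integer and a number, such as the
    sequence name line and the column header, are skipped.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 2)  # position, score, rest of the line
        if len(parts) < 3:
            continue
        try:
            position, score = int(parts[0]), float(parts[1])
        except ValueError:
            continue
        rows.append((position, score, parts[2].strip()))
    return rows

example = """gi_209811_gb_J01917_ADRCG Adenovirus type 2, complete genome., 1200 nucleotides
Position Score Likelihood
600 1.063 Highly likely prediction
"""
print(parse_predictions(example))  # [(600, 1.063, 'Highly likely prediction')]
```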
