Professional Documents
Culture Documents
Nucl. Acids Res. 2003 Thompson 3580 5
Nucl. Acids Res. 2003 Thompson 3580 5
Nucl. Acids Res. 2003 Thompson 3580 5
13
DOI: 10.1093/nar/gkg608
*To whom correspondence should be addressed. Tel: þ1 5184867882; Fax: þ1 518 473 2900; Email: thompson@wadsworth.org
Nucleic Acids Research, Vol. 31, No. 13 # Oxford University Press 2003; all rights reserved
Nucleic Acids Research, 2003, Vol. 31, No. 13 3581
HETEROGENEOUS BACKGROUND
As in previous Gibbs sampling models, the probability that a
particular position in the sequence is sampled as a site is
calculated as the ratio of the probability of the site under motif
Figure 1. The compositional variation of a 500 bp region upstream of the trans- models to the probability under a background model that
lation start site of the YDR226W/ADK1 gene from Saccharomyces cerevisiae. describes the sequence in the absence of TFBS. In previous
The probability of each base at each position is calculated by the Bayesian seg- implementations, background models assumed homogeneity in
Figure 3. The basic Gibbs Recursive Sampler data entry form, showing the
minimum amount of data required by the program.
Figure 2. The histogram shows the distribution of the distances of 182 reported
TFBS from start codons in E.coli regulatory sequences. The solid line is a mix-
ture of Gaussian functions fitted to the histogram data and represents the
default prokaryotic probability distribution for site locations.
sites) is returned at the bottom of the output, as are the 4. McCue,L.A., Thompson,W., Carmack,C.S., Ryan,M., Liu,J., Derbyshire,V.
background and null MAP terms. A full description of the and Lawrence,C.E. (2001) Phylogenetic footprinting of transcription factor
binding sites in proteobacterial genomes, Nucleic Acids Res., 39, 774–782.
output, along with other examples can be found at http:// 5. Liu,J., Neuwald,A. and Lawrence,C.E. (1995) Bayesian models for
bayesweb.wadsworth.org/gibbs/content.html#results. multiple local sequence alignment and Gibbs sampling strategies. J. Amer.
Stat. Assoc., 90, 1156–1170.
6. Lawrence,C.E. and Reilly,A. (1990) An expectation maximization (EM)
algorithm for the identification and characterization of common sites in
ADDITIONAL FEATURES unaligned biopolymer sequences. Proteins, 7, 41–51.
7. Bailey,T.L. and Elkin,C. (1994) Fitting a mixture model by expectation
The Gibbs Motif Sampler web page also allows for the analysis maximization to discover motifs in biopolymers. In Altman,R., Brutlag,D.,
of amino acid sequences. With the exception of nucleotide- Karp,P., Lathrop,R. and Searls,D. (eds), Proceedings of the Second
specific features described above, all of the sampling options International Conference on Intelligent Systems for Molecular Biology,
AAAI Press, Menlo Park, CA, pp. 28–36.
are applicable to proteins. Low complexity regions of amino 8. Webb,B.J., Liu,J.S. and Lawrence,C.E. (2002) BALSA: Bayesian algor-
acid sequences can be removed with an XNU process (20) ithm for local sequence alignment. Nucleic Acids Res., 30, 1268–1277.
using a BLOSSUM62 matrix. 9. Liu,X., Brutlag,DL. and Liu,J.S. (2001) BioProspector: discovering
conserved DNA motifs in upstream regulatory regions of co-expressed
genes. Proc. Pac. Symp. Biocomp., 6, 127–138.
10. Thijs,G., Lescot,M., Marchal,K., Rombauts,S., De Moor,B., Rouze,P. and
OTHER RESOURCES Moreau,Y. (2001). A higher order background model improves the detection
of regulatory elements by Gibbs Sampling. Bioinformatics, 17, 1113–1122.
A number of other web sites exist for the discovery of motifs in 11. Marchal,K., Thijs,G., De Keersmaecker,S., Monsieurs,P., De Moor,B. and