Mathematics of Genome Analysis, by Jerome K. Percus.

REVIEWS 191

Mathematics of genome analysis, by Jerome K. Percus. Pp. 139. £14.95 (pbk),
£40 (hbk). 2002. ISBN 0 521 58526 0 (pbk), 0 521 58517 1 (hbk) (Cambridge
University Press).
The mathematics of genome analysis or 'genomics' is a rapidly developing
field: as Percus puts it, 'traditional' here may refer to activities only two or three
years old! This short book provides an overview of the basic modelling techniques
from mathematics and statistical physics that are being used to tackle this most
fundamental problem of applied information theory, namely, how to reconstruct and
interpret the full sequence of base pairs in DNA from the avalanche of sequence data
provided by experimental fragments, including the problem of locating individual
genes. It is worth reminding ourselves at the outset of the size and complexity of
this linear jigsaw puzzle: although the alphabet of base pairs consists of only the four
letters A, T, G, C, there are some 3 × 10^9 such pairs in nuclear DNA of which only
about 3% constitute genes - the function of the remaining 97% 'junk DNA' is
unknown.
Reflecting its origin in lecture notes, Mathematics of genome analysis is
organised into five punchy chapters which are punctuated by sets of assignments.
Chapter 1 introduces the chemistry of DNA and describes statistical aspects of the
decomposition fragments produced by the action of restriction enzymes that cut at
specific short subsequences. Chapter 2 deals with the recomposition problem: given
subchains of the full DNA sequence with (unknown) overlap, can these be ordered to
produce a map of the full sequence? This includes the problem of keeping track of
alterations that occur during meiosis and of tracing where on the DNA sequence a
specific 'word' occurs. Chapter 3 examines the statistical properties of a single full
DNA sequence, including the distribution of specific (short) sequences, largest
expected subsequences, long-range structure, repeating structure and correlations
between subsequences of a given length. Here - typically - the mathematics
involved ranges widely from asymptotic analysis of maximum likelihood estimators
and multidimensional generating functions to Markov processes and diffusion,
information-theoretic entropy and the finite Walsh-Fourier transform! Chapter 4
looks at the problem of comparing two DNA sequences (from the same or different
organisms). Here one is looking for significant (non-random) motifs which occur in
both of them, which leads to probabilistic criteria for matching such as extreme value
analysis, 'scores' for imperfect matching and category analysis. All of this is
'static'; in a glimpse ahead to the next stage of genomics - the dynamics of protein
synthesis including the control aspects of the process - the final chapter contains a
tentative look at the thermodynamics involved in trying to model the separation of
DNA strands (as a precursor to transcription).
The American Joint Policy Board for Mathematics has chosen the role of
mathematics in analysing and understanding the data arising from the Human
Genome Project as the theme for Mathematics Awareness Month 2002 [1]. Percus'
state-of-the-art snapshot of what is involved in unravelling this 'cunning'st pattern of
excelling nature' (Othello, Act 5, Scene 2) could thus hardly be more timely.

Reference
1. http://www.mathform.org/mam/02/
NICK LORD
Tonbridge School, Kent TN9 HP

Downloaded from https://www.cambridge.org/core. Northwestern University Libraries, on 02 Jan 2020 at 22:01:45, subject to the
Cambridge Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/S0025557200172596
