DNA, Words and Models: Statistics of Exceptional Words

Cambridge University Press, Oct 13, 2005 - Computers - 138 pages
An important problem in computational biology is identifying short DNA sequences (mathematically, 'words') associated with a biological function. One approach consists of determining whether a particular word occurs simply by chance or is statistically significant, for example, because of its frequency or location. This book introduces the mathematical and statistical ideas used in solving this so-called exceptional word problem. It begins with a detailed description of the principal models used in sequence analysis: Markovian models are central here and capture compositional information on the sequence being analysed. There follows an introduction to several statistical methods that are used for finding exceptional words with respect to the model used. The second half of the book is illustrated with numerous examples drawn from the analysis of bacterial genomes, making this a practical guide for users facing a real situation and needing to choose an appropriate procedure.
 

Contents

Introduction to Markov chain models
Overrepresentation of Chi sites in E. coli and H. influenzae
References

Popular passages

Page 134 - Compound Poisson approximation for nonnegative random variables via Stein's method. Ann. Prob. 20, 1843-1866. Barbour, A. D., Holst, L. and Janson, S. (1992b). Poisson approximation. Oxford University Press.
Page 134 - Koutras (1994). Distribution theory of runs: a Markov chain approach. J. Amer. Statist. Assoc. 89, 1050-1058.
Page 134 - A., Rouxel, T., Gleizes, A., Moszer, I. and Danchin, A. (1996). Uneven distribution of GATC motifs in the Escherichia coli chromosome, its plasmids and its phages.
