DNA, Words and Models: Statistics of Exceptional Words

Cambridge University Press, Oct 13, 2005 - Computers - 138 pages

An important problem in computational biology is identifying short DNA sequences (mathematically, 'words') associated with a biological function. One approach is to determine whether a particular word occurs as expected by chance or is statistically significant, for example, because of its frequency or location. This book introduces the mathematical and statistical ideas used in solving this so-called exceptional word problem. It begins with a detailed description of the principal models used in sequence analysis: Markovian models are central here and capture compositional information on the sequence being analysed. There follows an introduction to several statistical methods used for finding words that are exceptional with respect to the chosen model. The second half of the book is illustrated with numerous examples drawn from the analysis of bacterial genomes, making this a practical guide for users who face a real situation and need to choose an appropriate procedure.
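To make the idea of an "exceptional" word concrete, here is a minimal Python sketch that compares a word's observed count with the count expected under two simple models. It is an illustration under stated assumptions, not the book's own procedure: the function names are hypothetical, the Bernoulli model treats letters as independent with frequencies estimated from the sequence, and the first-order Markov estimate uses the common plug-in formula built from dinucleotide and nucleotide counts. The sequence is a toy example, not real genome data.

```python
from collections import Counter
from math import prod


def count_overlapping(seq: str, word: str) -> int:
    """Count occurrences of `word` in `seq`, allowing overlaps."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)


def expected_bernoulli(seq: str, word: str) -> float:
    """Expected count if letters are independent (assumed Bernoulli model).

    Letter probabilities are estimated by their frequencies in `seq`.
    """
    n, k = len(seq), len(word)
    freq = Counter(seq)
    return (n - k + 1) * prod(freq[c] / n for c in word)


def expected_markov1(seq: str, word: str) -> float:
    """Plug-in expected count under a first-order Markov chain (assumption).

    Uses N(w1w2) * ... * N(w_{k-1}w_k) / (N(w2) * ... * N(w_{k-1})).
    Requires len(word) >= 2.
    """
    numer = prod(count_overlapping(seq, word[i:i + 2])
                 for i in range(len(word) - 1))
    denom = prod(count_overlapping(seq, c) for c in word[1:-1])
    return numer / denom if denom else 0.0


if __name__ == "__main__":
    toy_seq = "acgtacgtacgt"  # toy sequence, for illustration only
    word = "acg"
    observed = count_overlapping(toy_seq, word)
    print("observed:", observed)
    print("expected (Bernoulli):", expected_bernoulli(toy_seq, word))
    print("expected (Markov, order 1):", expected_markov1(toy_seq, word))
```

A large gap between observed and expected counts only suggests that a word may be exceptional; judging significance also requires the distribution of the count under the model, which this sketch does not compute. Note how the choice of model changes the expected count, and therefore the verdict.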

Selected pages

Introduction to Markov chain models

Overrepresentation of Chi sites in E. coli and H. influenzae

References

Common terms and phrases

amino acid, analysis, bacterium, Bernoulli model, calculate, codons, coli, complete genome, compound Poisson distribution, compound Poisson model, compound Poisson process, consider, count N(w, Cov[Y, CP model, cumulated distances, denoted, dinf, dinucleotides, distance of order, DNA sequences, dsup, eight-letter word, equation, estimated, exact distribution, exceptional words, exceptionality, expected count, first-order Markov chain, formula, frequencies, function, gctgg, gctggtgg, genes, genome, genome of H, geometric distribution, ggcct, given, heterogeneity, Hidden Markov models, independent, influenzae, letters, Markov chain, Markov chain model, Markov models, model M1, motif, Nobs, nucleotide counts, nucleotides, number of occurrences, observed process, observed sequence, occurrence at position, over-represented, p-values, palindromes, parameters, permutation model, phase, Pr{U, probability a(w, properties, protein, random sequences, replication, scores, Section, segment, simple distances, six-letter words, Sobs, stationary distribution, statistical, strand, sub-words, total variation distance, transition probabilities, translated sequences, variance, w₁, word count, words of length, Yi+d(w

Bibliographic information

Title	DNA, Words and Models: Statistics of Exceptional Words
Authors	S. Robin, F. Rodolphe, S. Schbath
Edition	illustrated
Publisher	Cambridge University Press, 2005
ISBN	052184729X, 9780521847292
Length	138 pages
Subjects	Computers / Data Science / Bioinformatics; Computers / Optical Data Processing; Science / Life Sciences / Genetics & Genomics; Science / Life Sciences / Molecular Biology
