This book explores important aspects of Markov and hidden Markov processes and the applications of these ideas to various problems in computational biology. The book starts from first principles, so that no previous knowledge of probability is necessary. However, the work is rigorous and mathematical, making it useful to engineers and mathematicians, even those not interested in biological applications. A range of exercises is provided, including drills to familiarize the reader with concepts and more advanced problems that require deep thinking about the theory. Biological applications are taken from post-genomic biology, especially genomics and proteomics.
The topics examined include standard material such as the Perron-Frobenius theorem, transient and recurrent states, hitting probabilities and hitting times, maximum likelihood estimation, the Viterbi algorithm, and the Baum-Welch algorithm. The book contains discussions of extremely useful topics not usually seen at the basic level, such as ergodicity of Markov processes, Markov Chain Monte Carlo (MCMC), information theory, and large deviation theory for both i.i.d. and Markov processes. The book also presents state-of-the-art realization theory for hidden Markov models. Among biological applications, it offers an in-depth look at the BLAST (Basic Local Alignment Search Tool) algorithm, including a comprehensive explanation of the underlying theory. Other applications such as profile hidden Markov models are also explored.
The synopsis may refer to a different edition of this title.
M. Vidyasagar is the Cecil and Ida Green Chair in Systems Biology Science at the University of Texas at Dallas. His many books include Computational Cancer Biology: An Interaction Network Approach and Control System Synthesis: A Factorization Approach.
"This book provides a terrific introduction to an important and widely studied field--Markov processes (including hidden Markov processes)--with a particular view toward applications to problems in biology. With a wonderful balance of rigor, intuition, and choice of topics, the book gives a unique treatment of the subject for those interested in both fundamental theory and important applications."--Sanjeev Kulkarni, Princeton University
"Vidyasagar uses sound scholarship to address hidden Markov processes and their application to problems in computational biology, in particular to genomics and proteomics. The well-organized book examines topics not often covered, such as realization theory and order determination for hidden Markov processes, and also looks at significant properties such as ergodicity and mixing. This work will be useful to systems researchers as well as computational biologists."--Steve Marcus, University of Maryland
Preface
PART 1. PRELIMINARIES
Chapter 1. Introduction to Probability and Random Variables
Chapter 2. Introduction to Information Theory
Chapter 3. Nonnegative Matrices
PART 2. HIDDEN MARKOV PROCESSES
Chapter 4. Markov Processes
Chapter 5. Introduction to Large Deviation Theory
Chapter 6. Hidden Markov Processes: Basic Properties
Chapter 7. Hidden Markov Processes: The Complete Realization Problem
PART 3. APPLICATIONS TO BIOLOGY
Chapter 8. Some Applications to Computational Biology
Chapter 9. BLAST Theory
Bibliography
Index
Introduction to Probability and Random Variables
1.1 INTRODUCTION TO RANDOM VARIABLES
1.1.1 Motivation
Probability theory is an attempt to formalize the notion of uncertainty in the outcome of an experiment. For instance, suppose an urn contains four balls, colored red, blue, white, and green respectively. Suppose we dip our hand in the urn and pull out one of the balls "at random." What is the likelihood that the ball we pull out will be red? If we make multiple draws, replacing the drawn ball each time and shaking the urn thoroughly before the next draw, what is the likelihood that we have to make at least ten draws before we draw a red ball for the first time? Probability theory provides a mathematical abstraction and a framework where such issues can be addressed.
When there are only finitely many possible outcomes, probability theory becomes relatively simple. For instance, in the above example, when we draw a ball there are only four possible outcomes, namely {R, B, W, G}, with the obvious notation. If we draw two balls, replacing the first ball before drawing the second, then there are $4^2 = 16$ possible outcomes, represented as {RR, ..., GG}. In such situations, one can get by with simple "counting" arguments. The counting approach can also be made to work when the set of possible outcomes is countably infinite; this situation is studied in Section 1.3. However, in probability theory infinity is never very far away, and counting arguments can lead to serious logical inconsistencies if applied to situations where the set of possible outcomes is uncountably infinite. The great Russian mathematician A. N. Kolmogorov invented axiomatic probability theory in the 1930s precisely to address the issues raised by having uncountably many possible outcomes. Subsequent developments in probability theory have been based on the axiomatic foundation laid out in [81].
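The count of 16 two-draw outcomes can be checked by brute-force enumeration. The following Python sketch assumes draws with replacement and the single-letter color codes used above; the helper name `two_draw_outcomes` is illustrative, not taken from the book.

```python
from itertools import product

COLORS = ["R", "B", "W", "G"]  # R = red, B = blue, W = white, G = green

def two_draw_outcomes(colors):
    """Return every ordered outcome of two draws with replacement, e.g. 'RB'."""
    # product(colors, repeat=2) yields all ordered pairs (first draw, second draw)
    return ["".join(pair) for pair in product(colors, repeat=2)]

outcomes = two_draw_outcomes(COLORS)
print(len(outcomes))              # 16 = 4**2
print(outcomes[0], outcomes[-1])  # RR GG
```

Enumeration of this kind is exactly the "counting" approach: it works only because the outcome set is finite and small.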
Example 1.1 Let us return to the example above. Suppose that all the four balls are identical in size and shape, and differ only in their color. Then it is reasonable to suppose that drawing any one color is as likely as drawing any other color, neither more nor less. This leads to the observation that the likelihood of drawing a red ball (or any other ball) is 1/4 = 0.25.
Example 1.2 Now suppose that the four balls are all spherical, and that their diameters are in the ratio 4 : 3 : 2 : 1 in the order red, blue, white, and green. We can suppose that the likelihood of our fingers touching and drawing a particular ball is proportional to its surface area. Since the surface area of a sphere is proportional to the square of its diameter, it follows that the likelihoods of drawing the four balls are in the proportion $4^2 : 3^2 : 2^2 : 1^2$, or $16 : 9 : 4 : 1$, in the order red, blue, white, and green. This leads to the conclusion that
$$P(R) = \frac{16}{30}, \quad P(B) = \frac{9}{30}, \quad P(W) = \frac{4}{30}, \quad P(G) = \frac{1}{30}.$$
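The arithmetic of Example 1.2 can be reproduced exactly with rational numbers. This sketch assumes, as the example does, that the chance of drawing a ball is proportional to its surface area, and hence to the square of its diameter; `draw_probabilities` is a hypothetical helper name.

```python
from fractions import Fraction

# Diameter ratios from Example 1.2, in the order red, blue, white, green
diameters = {"R": 4, "B": 3, "W": 2, "G": 1}

def draw_probabilities(diams):
    """Probabilities proportional to d**2 (a sphere's area pi*d**2 is proportional to d**2)."""
    weights = {color: d ** 2 for color, d in diams.items()}
    total = sum(weights.values())  # 16 + 9 + 4 + 1 = 30
    # Normalize each weight by the total so the probabilities sum to 1
    return {color: Fraction(w, total) for color, w in weights.items()}

probs = draw_probabilities(diameters)
print(probs["R"], probs["B"], probs["W"], probs["G"])  # 8/15 3/10 2/15 1/30
```

`Fraction` reduces automatically, so 16/30 is displayed as 8/15 and 9/30 as 3/10; the values are the same as those above.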
Example 1.3 There are instances where such analytical reasoning fails. Suppose that all balls have the same diameter, but the red ball is coated with an adhesive resin that makes it more likely to stick to our fingers when we touch it. The complicated interaction between the surface adhesion of our fingers and the surface of the ball may be too difficult to analyze, so we have no recourse other than to draw balls repeatedly and see how many times the red ball comes out. Suppose we make 1,000 draws, and the outcomes are: 451 red, 187 blue, 174 white, and 188 green. Then we can write
$$\hat{P}(R) = \frac{451}{1000} = 0.451, \quad \hat{P}(B) = 0.187, \quad \hat{P}(W) = 0.174, \quad \hat{P}(G) = 0.188.$$
The symbol $\hat{P}$ is used instead of $P$ to highlight the fact that these are simply observed frequencies, and not the true but unknown probabilities. The observed frequency of an outcome is often referred to as its empirical probability, that is, the empirical estimate of the true but unknown probability based on a particular set of experiments. It is tempting to treat the observed frequencies as true probabilities, but that would not be correct: if the experiment is repeated, the outcomes will in general be different. The reader can see the difference between frequencies and probabilities by tossing a coin ten times, and then another ten times; it is extremely unlikely that the same set of results will turn up both times. One of the important questions addressed in this book is: How close are the observed frequencies to the true but unknown probabilities, and how quickly do these observed frequencies converge to the true probabilities? Such questions are addressed in Section 1.3.3.
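The distinction between observed frequencies and true probabilities can be illustrated numerically. The sketch below first computes the empirical probabilities from the counts in Example 1.3, and then simulates fair-coin tosses with Python's `random` module; the helper names and the seed value are arbitrary choices for illustration. With only 10 tosses the observed frequency of heads typically strays noticeably from 1/2, while with 10,000 tosses it usually lies much closer.

```python
import random

# Observed counts from the 1,000 draws in Example 1.3
counts = {"R": 451, "B": 187, "W": 174, "G": 188}

def empirical_probabilities(counts):
    """Observed frequency of each outcome: count / total number of trials."""
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}

print(empirical_probabilities(counts))
# {'R': 0.451, 'B': 0.187, 'W': 0.174, 'G': 0.188}

def heads_frequency(n, rng):
    """Fraction of heads in n simulated tosses of a fair coin."""
    # rng.random() < 0.5 is True (counted as 1) with probability 1/2
    return sum(rng.random() < 0.5 for _ in range(n)) / n

rng = random.Random(0)  # fixed seed so the run is reproducible
for n in (10, 100, 10_000):
    # the printed frequencies depend on the seed; only their trend matters
    print(n, heads_frequency(n, rng))
```

Rerunning with a different seed changes every simulated frequency, which is precisely the point made in the text: a frequency is a property of one particular experiment, not of the coin.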
1.1.2 Definition of a Random Variable and Probability
Suppose we wish to study the behavior of a "random" variable $X$ that can assume one of only a finite set of values belonging to a set $A = \{a_1, \ldots, a_n\}$. The set $A$ of possible values is often referred to as the "alphabet" of the random variable. For example, in the ball-drawing experiment discussed in the preceding subsection, $X$ can be thought of as the color of the ball drawn, and assumes values in the set {R, B, W, G}. This example, incidentally, serves to highlight the fact that the set of outcomes can consist of abstract symbols, and need not consist of numbers. This usage, adopted in this book, is at variance with the convention in many mathematics texts, where it is assumed that $A$ is a subset of the real numbers $\mathbb{R}$. However, since biological applications are a prime motivator for this book, it makes no sense to restrict $A$ in this way. In genomics, for example, $A$ consists of the four-symbol set of nucleotides, usually denoted by {A, C, G, T}. Moreover, by allowing $A$ to consist of arbitrary symbols, we also allow explicitly the possibility that there is no natural ordering of these symbols. For instance, in this book the nucleotides are written in the order A, C, G, T purely to follow the English alphabetical ordering, but there is no consensus on the ordering in biology texts. Thus any method of analysis developed here must be permutation-independent. In other words, if we choose to order the symbols in the set $A$ in some other fashion, the methods of analysis must give the same answers as before.
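One simple way to respect the permutation-independence requirement is to store a distribution as a map keyed by symbol, rather than as a vector whose meaning depends on the chosen ordering of the alphabet. The Python sketch below assumes the nucleotide alphabet and a made-up toy sequence; `symbol_frequencies` is a hypothetical helper, not a method from the book.

```python
from collections import Counter

def symbol_frequencies(seq, alphabet):
    """Frequency of each symbol in seq, keyed by symbol rather than by position."""
    if not set(seq) <= set(alphabet):
        raise ValueError("sequence contains symbols outside the alphabet")
    counts = Counter(seq)  # missing symbols count as 0
    n = len(seq)
    return {a: counts[a] / n for a in alphabet}

dna = "ACGTTGCAAC"  # toy sequence, chosen only for illustration
f1 = symbol_frequencies(dna, ["A", "C", "G", "T"])  # alphabetical order
f2 = symbol_frequencies(dna, ["T", "G", "C", "A"])  # some other order
print(f1 == f2)  # True: dict equality ignores the order of the keys
```

Because each probability is attached to its symbol, reordering the alphabet changes nothing, whereas a plain list `[0.3, 0.3, 0.2, 0.2]` would silently change meaning under a different ordering.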
Now we give a general definition of the notion of probability, and introduce the notation that is used throughout the book.
Definition 1.1 Given an integer $n$, the $n$-dimensional simplex $S_n$ is defined as
$$S_n := \left\{ v \in \mathbb{R}^n : v_i \geq 0 \ \forall i, \ \sum_{i=1}^{n} v_i = 1 \right\}. \qquad (1.1)$$
Thus $S_n$ consists of all nonnegative vectors whose components add up to...
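Membership in the simplex can be tested directly. This sketch assumes $S_n$ is the standard probability simplex, i.e., the set of nonnegative vectors in $\mathbb{R}^n$ whose components sum to 1, and allows a small floating-point tolerance `tol` (an illustrative parameter, not part of the definition); `in_simplex` is a hypothetical helper name.

```python
import math

def in_simplex(v, tol=1e-9):
    """Return True if v has nonnegative entries summing to 1 (within tol)."""
    if len(v) == 0:
        return False
    # every component must be nonnegative, up to floating-point tolerance
    if any(x < -tol for x in v):
        return False
    # components must add up to 1; math.fsum reduces rounding error in the sum
    return abs(math.fsum(v) - 1.0) <= tol

# The distribution from Example 1.2 (red, blue, white, green) lies in S_4
print(in_simplex([16/30, 9/30, 4/30, 1/30]))  # True
print(in_simplex([0.5, 0.6, -0.1]))           # False: has a negative component
```

The tolerance is needed because fractions such as 16/30 are not exactly representable as floating-point numbers, so an exact test for a sum of 1 could reject a valid distribution.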
Seller: HPB-Blue, Dallas, TX, USA
Hardcover. Condition: Fine. Connecting readers with great books since 1972! May contain remainder marks. We ship orders daily and Customer Service is our top priority! Seller inventory # D_00000000000010273883
Quantity: 1 available
Seller: Your Online Bookstore, Houston, TX, USA
Hardcover. Condition: New. Seller inventory # 0691133158-11-33835106
Seller: Labyrinth Books, Princeton, NJ, USA
Condition: New. Seller inventory # 181476
Quantity: 5 available
Seller: GreatBookPrices, Columbia, MD, USA
Condition: New. Seller inventory # 5017971-n
Quantity: 2 available
Seller: PBShop.store US, Wood Dale, IL, USA
HRD. Condition: New. New Book. Shipped from UK. Established seller since 2000. Seller inventory # WP-9780691133157
Seller: PBShop.store UK, Fairford, GLOS, United Kingdom
HRD. Condition: New. New Book. Shipped from UK. Established seller since 2000. Seller inventory # WP-9780691133157
Quantity: 2 available
Seller: GreatBookPrices, Columbia, MD, USA
Condition: As New. Unread book in perfect condition. Seller inventory # 5017971
Quantity: 2 available
Seller: INDOO, Avenel, NJ, USA
Condition: As New. Unread copy in mint condition. Seller inventory # PG9780691133157
Quantity: More than 20 available
Seller: INDOO, Avenel, NJ, USA
Condition: New. Brand New. Seller inventory # 9780691133157
Quantity: More than 20 available
Seller: Rarewaves.com USA, London, LONDO, United Kingdom
Hardback. Condition: New. Seller inventory # LU-9780691133157
Quantity: 1 available