The theory of probability had its origin in gambling and games of chance. It owes much to the curiosity of gamblers who pestered their friends in the mathematical world with all sorts of questions. Unfortunately this association with gambling contributed to a very slow and sporadic growth of probability theory as a mathematical discipline. The mathematicians of the day took little or no interest in the development of any theory but looked only at the combinatorial reasoning involved in each problem.
The first attempt at some mathematical rigor is credited to Laplace. In his monumental work, Théorie analytique des probabilités (1812), Laplace gave the classical definition of the probability of an event that can occur only in a finite number of ways as the ratio of the number of favorable outcomes to the total number of all possible outcomes, provided that all the outcomes are equally likely. According to this definition, the computation of the probability of events was reduced to combinatorial counting problems. Even in those days, this definition was found inadequate. In addition to being circular and restrictive, it did not answer the question of what probability is; it only gave a practical method of computing the probabilities of some simple events.
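The classical definition reduces probability to counting, and the counting can be carried out mechanically. As an illustration (in Python, with two fair dice as a hypothetical example not taken from the text), the probability that the faces sum to 7 is the ratio of favorable outcomes to all possible outcomes:

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair dice: all 36 equally likely ordered pairs.
omega = list(product(range(1, 7), repeat=2))

# Favorable outcomes: pairs whose faces sum to 7.
favorable = [w for w in omega if sum(w) == 7]

# Laplace's classical definition: favorable / total.
p = Fraction(len(favorable), len(omega))
print(p)  # 1/6
```

There are six favorable pairs, (1, 6), (2, 5), …, (6, 1), out of 36, giving 1/6.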
An extension of the classical definition of Laplace was used to evaluate the probabilities of sets of events with infinite outcomes. The notion of equal likelihood of certain events played a key role in this development. According to this extension, if Ω is some region with a well-defined measure (length, area, volume, etc.), the probability that a point chosen at random lies in a subregion A of Ω is the ratio measure(A)/measure(Ω). Many problems of geometric probability were solved using this extension. The trouble is that one can define “at random” in any way one pleases, and different definitions therefore lead to different answers. Joseph Bertrand, for example, in his book Calcul des probabilités (Paris, 1889) cited a number of problems in geometric probability where the result depended on the method of solution. In Example 9 we will discuss the famous Bertrand paradox and show that in reality there is nothing paradoxical about Bertrand’s paradoxes; once we define “probability spaces” carefully, the paradox is resolved. Nevertheless difficulties encountered in the field of geometric probability have been largely responsible for the slow growth of probability theory and its tardy acceptance by mathematicians as a mathematical discipline.
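Bertrand's chord problem illustrates concretely how different readings of “at random” lead to different answers. The following simulation is an illustrative sketch (not from the text): for a unit circle, it estimates the probability that a random chord is longer than the side of the inscribed equilateral triangle under two different sampling schemes.

```python
import math
import random

random.seed(0)
N = 100_000
SIDE = math.sqrt(3)  # side of the equilateral triangle inscribed in the unit circle

# Scheme A: choose the chord's two endpoints uniformly on the circle.
hits_a = 0
for _ in range(N):
    t1 = random.uniform(0, 2 * math.pi)
    t2 = random.uniform(0, 2 * math.pi)
    chord = 2 * math.sin(abs(t1 - t2) / 2)
    hits_a += chord > SIDE

# Scheme B: choose the chord's midpoint uniformly along a fixed radius.
hits_b = 0
for _ in range(N):
    d = random.uniform(0, 1)          # distance of midpoint from the center
    chord = 2 * math.sqrt(1 - d * d)
    hits_b += chord > SIDE

print(hits_a / N)  # close to 1/3
print(hits_b / N)  # close to 1/2
```

Both schemes are perfectly reasonable ways to pick a chord “at random,” yet they assign different probabilities (1/3 and 1/2) to the same event; specifying the probability space removes the ambiguity.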
The mathematical theory of probability, as we know it today, is of comparatively recent origin. It was A. N. Kolmogorov who axiomatized probability in his fundamental work, Foundations of the Theory of Probability (Berlin, 1933). According to this development, random events are represented by sets and probability is just a normed measure defined on these sets. This measure-theoretic development not only provided a logically consistent foundation for probability theory but also, at the same time, joined it to the mainstream of modern mathematics.
In this book we follow Kolmogorov’s axiomatic development. In Section 1.2 we introduce the notion of a sample space. In Section 1.3 we state Kolmogorov’s axioms of probability and study some simple consequences of these axioms. Section 1.4 is devoted to the computation of probability on finite sample spaces. Section 1.5 deals with conditional probability and Bayes’s rule while Section 1.6 examines the independence of events.
In most branches of knowledge, experiments are a way of life. In probability and statistics, too, we concern ourselves with special types of experiments. Consider the following examples.
The experiments described above have certain common features. For each experiment, we know in advance all possible outcomes, that is, there are no surprises in store after the performance of any experiment. On any performance of the experiment, however, we do not know what the specific outcome will be, that is, there is uncertainty about the outcome on any performance of the experiment. Moreover, the experiment can be repeated under identical conditions. These features describe a random (or a statistical) experiment.
In probability theory we study this uncertainty of a random experiment. It is convenient to associate with each such experiment a set Ω, the set of all possible outcomes of the experiment. To engage in any meaningful discussion about the experiment, we associate with Ω a σ-field 𝒮 of subsets of Ω. We recall that a σ-field is a nonempty class of subsets of Ω that is closed under the formation of countable unions and complements and contains the null set Φ.
The elements of Ω are called sample points. Any set A ∈ 𝒮 is known as an event. Clearly A is a collection of sample points. We say that an event A happens if the outcome of the experiment corresponds to a point in A. Each one-point set is known as a simple or an elementary event. If the set Ω contains only a finite number of points, we say that (Ω, 𝒮) is a finite sample space. If Ω contains at most a countable number of points, we call (Ω, 𝒮) a discrete sample space. If, however, Ω contains uncountably many points, we say that (Ω, 𝒮) is an uncountable sample space. In particular, if Ω = R_k (k-dimensional Euclidean space) or some rectangle in R_k, we call it a continuous sample space.
Remark 1. The choice of 𝒮 is an important one, and some remarks are in order. If Ω contains at most a countable number of points, we can always take 𝒮 to be the class of all subsets of Ω. This is certainly a σ-field. Each one-point set is a member of 𝒮 and is the fundamental object of interest. Every subset of Ω is an event. If Ω has uncountably many points, the class of all subsets of Ω is still a σ-field, but it is much too large a class of sets to be of interest. It may not be possible to choose the class of all subsets of Ω as 𝒮. One of the most important examples of an uncountable sample space is the case in which Ω = R or Ω is an interval in R. In this case we would like all one-point subsets of Ω and all intervals (closed, open, or semiclosed) to be events. We use our knowledge of analysis to specify 𝒮. We will not go into details here except to recall that the class of all semiclosed intervals (a, b] generates a class ℬ₁ which is a σ-field on R. This class contains all one-point sets and all intervals (finite or infinite). We take 𝒮 = ℬ₁. Since we will be dealing mostly with the one-dimensional case, we will write ℬ instead of ℬ₁. There are many subsets of R that are not in ℬ₁, but we will not demonstrate this fact here. We refer the reader to Halmos [42], Royden [96], or Kolmogorov and Fomin [54] for further details.
Let (Ω, 𝒮) be the sample space associated with a statistical experiment. In this section we define a probability set function and study some of its properties.
In many games of chance, probability is often stated in terms of odds against an event. Thus in horse racing a two-dollar bet on a horse to win with odds of 2 to 1 (against) pays approximately six dollars (the two-dollar stake plus four dollars in winnings) if the horse wins the race. In this case the probability of winning is 1/3.
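The conversion from odds to probability can be sketched as follows (illustrative Python, not from the text; the function name is hypothetical). Odds of a to b against an event correspond to probability b/(a + b):

```python
from fractions import Fraction

def prob_from_odds_against(a, b):
    """Odds of a to b against an event correspond to probability b / (a + b)."""
    return Fraction(b, a + b)

print(prob_from_odds_against(2, 1))  # 1/3, the horse-racing example above
print(prob_from_odds_against(1, 1))  # 1/2, "even odds"
```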
In this section we restrict attention to sample spaces that have at most a finite number of points. Let Ω = {ω1, ω2, …, ωn} and let 𝒮 be the σ-field of all subsets of Ω. For any A ∈ 𝒮, P(A) = Σ_{ωj ∈ A} P{ωj}.
In games of chance we usually deal with finite sample spaces where uniform probability is assigned to all simple events. The same is the case in sampling schemes. In such instances the computation of the probability of an event A reduces to a combinatorial counting problem. We therefore consider some rules of counting.
Rule 1. Given a collection of n1 elements of one kind, n2 elements of a second kind, and so on, up to nk elements of a kth kind, it is possible to form n1 · n2 ⋯ nk ordered k-tuples containing one element of each kind.
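Rule 1 can be checked by direct enumeration. A small illustrative Python sketch (with hypothetical kinds of sizes n1 = 2, n2 = 3, n3 = 2, so the rule predicts 2 · 3 · 2 = 12 ordered triples):

```python
from itertools import product

# Three kinds of elements: n1 = 2, n2 = 3, n3 = 2.
kinds = [["a1", "a2"], ["b1", "b2", "b3"], ["c1", "c2"]]

# itertools.product enumerates every ordered k-tuple with one element of each kind.
tuples = list(product(*kinds))
print(len(tuples))  # 12 = 2 * 3 * 2
```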
Let p be the proportion of red marbles in the urn before the first draw. Show that as . Is this to be expected?
[Hint: Use (1.3.6).]
What is the probability that the wife will sit next to her husband if all possible seating arrangements are equally likely?
So far, we have computed probabilities of events on the assumption that no information was available about the experiment other than the sample space. Sometimes, however, it is known that an event H has happened. How do we use this information in making a statement concerning the outcome of another event A? Consider the following examples.
Let (Ω, 𝒮, P) be a probability space, and let A, B ∈ 𝒮, with P(B) > 0. By the multiplication rule we have P(A ∩ B) = P(B) P(A | B).
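On a finite equally likely sample space, the conditional probability P(A | B) = P(A ∩ B)/P(B) reduces to counting. An illustrative Python sketch with two fair dice (the events A and B are hypothetical examples, not from the text):

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # two fair dice, 36 equally likely pairs

A = {w for w in omega if sum(w) == 8}  # the sum of the two dice is 8
B = {w for w in omega if w[0] == 5}    # the first die shows 5

def P(event):
    return Fraction(len(event), len(omega))

# Conditional probability: P(A | B) = P(A ∩ B) / P(B), requiring P(B) > 0.
p_A_given_B = P(A & B) / P(B)
print(p_A_given_B)  # 1/6: given the first die is 5, the second must show 3
```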
In many experiments the information provided by B does not affect the probability of event A, that is, P(A | B) = P(A).
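Independence, P(A ∩ B) = P(A) P(B), can likewise be verified by counting on a finite sample space. An illustrative Python sketch with two fair dice (the events are hypothetical examples, not from the text):

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # two fair dice

def P(event):
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] % 2 == 0}  # first die is even
B = {w for w in omega if sum(w) == 7}    # sum is 7
C = {w for w in omega if sum(w) == 8}    # sum is 8

# A and B are independent: P(A ∩ B) = 3/36 = (1/2)(1/6) = P(A) P(B).
print(P(A & B) == P(A) * P(B))  # True
# A and C are not: a sum of 8 changes the chance that the first die is even.
print(P(A & C) == P(A) * P(C))  # False
```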
We wish to emphasize that independence of events is not to be confused with disjoint or mutually exclusive events. If two events, each with nonzero probability, are mutually exclusive, they are obviously dependent, since the occurrence of one will automatically preclude the occurrence of the other. Similarly, if A and B are independent and P(A) > 0, P(B) > 0, then A and B cannot be mutually exclusive.
Conversely, if this relation holds, P{A | BC} ≠ P{A | B}, and P{A} > 0, then B and C are independent (Strait [111]).
for any event A ⊆ [0, ∞), where λ > 0 is a known constant. Thus the probability that a battery fails after time t is given by P(T > t) = ∫_t^∞ λe^{-λx} dx = e^{-λt}.
If the times to failure of the batteries are independent, what is the probability that at least one battery will be operating after t₀ hours?
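A sketch of this kind of computation (illustrative Python, assuming each battery's time to failure T satisfies P(T > t) = e^{-λt}, and using hypothetical values for λ, t0, and the number of batteries n): by independence, P(at least one operating after t0) = 1 − (1 − e^{−λ t0})^n, and a Monte Carlo simulation agrees with the closed form.

```python
import math
import random

random.seed(0)
lam, t0, n = 0.5, 2.0, 3  # hypothetical values for λ, t0, and the number of batteries

# Closed form: each battery survives past t0 with probability e^{-λ t0},
# so by independence P(at least one survives) = 1 - (1 - e^{-λ t0})^n.
p_survive = math.exp(-lam * t0)
p_at_least_one = 1 - (1 - p_survive) ** n

# Monte Carlo check: draw n independent exponential failure times per trial.
N = 200_000
hits = sum(
    any(random.expovariate(lam) > t0 for _ in range(n))
    for _ in range(N)
)

print(round(p_at_least_one, 4))
print(round(hits / N, 4))  # should be close to the closed form
```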