MINISTRY OF EDUCATION AND SCIENCE OF THE REPUBLIC OF KAZAKHSTAN AL-FARABI KAZAKH NATIONAL UNIVERSITY
Nursadyk Akanbay
PROBABILITY THEORY AND MATHEMATICAL STATISTICS I Textbook Translated from Russian by L.M. Kovaleva and Z.I. Suleimenova
Almaty «Qazaq University» 2020
UDC 519.2(075.8) LBC 22.172я73 A 37 Recommended for publication by the decision of the meeting of the educational-methodical association of the Republican educational-methodological Council of the Ministry of Education and Science of the Republic of Kazakhstan on the basis of Al-Farabi Kazakh National University in the specialties of higher and postgraduate education in the groups «Natural Sciences», «Humanities», «Social Sciences, Economics and Business», «Technical Sciences and Technology», «Creativity» (Minutes No. 2 dated November 14, 2017); Scientific Council of the Mechanics and Mathematics Faculty and RISO of Al-Farabi KazNU (Minutes No. 5 dated June 27, 2019). Reviewers: Doctor of Physical and Mathematical Sciences, Professor B.E. Kanguzhin; Doctor of Physical and Mathematical Sciences, Professor M.N. Kalimoldaev; Doctor of Physical and Mathematical Sciences, Professor B.D. Koshanov. Translated from Russian by L.M. Kovaleva and Z.I. Suleimenova
Akanbay N. A 37 Probability Theory and Mathematical Statistics: in 2 parts. Part I: textbook / N. Akanbay. – Almaty: Qazaq University, 2020. – 359 p. ISBN 9786010445635 In the textbook, the mathematical foundations of probability theory are presented on the basis of Kolmogorov's axiomatics. The first chapter presents material on random events and their probabilities within the framework of a discrete probability space. The second chapter is devoted to the general probability space. In the third chapter, a random variable is defined as a measurable function. The concept of mathematical expectation, considered in the fourth chapter, is introduced as a Lebesgue integral with respect to a probability measure on a general probability space; the reader is not required to have any prior knowledge of the Lebesgue integral. The fifth chapter is devoted to limit theorems in the Bernoulli scheme. The last two chapters, the sixth and seventh, contain material on various types of convergence of sequences of random variables and on the laws of large numbers. The textbook is recommended for undergraduate, graduate and doctoral (PhD) students studying in mathematical specialties and in specialties related to mathematics. UDC 519.2(075.8) LBC 22.172я73 ISBN 9786010445635
© Akanbay N., 2020 © Al-Farabi KazNU, 2020
CONTENTS
FOREWORD .......................................................................................... 9
Chapter I. RANDOM EVENTS AND THEIR PROBABILITIES ........................ 15
§1. Sample space. Classical definition of probability. Simplest probability properties ............ 15
1.1. Discrete probability space ................................................... 16
1.1.1. Classical definition of probability ........................................ 17
1.1.2. Events. Operations on events ............................................... 19
1.2. Elements of combinatorics .................................................... 27
1.3. Distribution of balls into boxes ............................................. 35
1.4. Tasks for independent work ................................................... 42
§2. Some classical models and distributions ....................................... 45
2.1. The Bernoulli scheme. Binomial distribution .................................. 45
2.2. Polynomial scheme. The polynomial distribution ............................... 47
2.3. Hypergeometric and multidimensional hypergeometric distributions ............. 49
2.4. Tasks for independent work ................................................... 53
§3. Geometric probabilities ....................................................... 54
3.1. Tasks for independent work ................................................... 56
Chapter II. PROBABILITY SPACE ..................................................... 59
§1. Axioms of probability theory. General probability space ....................... 59
1.1. The necessity of expanding the concept of the space of elementary events ..... 59
1.2. The probability on a measurable space ........................................ 63
1.2.1. Probability properties ..................................................... 63
1.3. Tasks for independent work ................................................... 73
§2. Algebras, sigma-algebras and measurable spaces ................................ 74
2.1. Algebras and sigma-algebras .................................................. 74
2.1.1. The theorem on the continuation of probability ............................. 76
2.2. The most important examples of measurable spaces ............................. 77
2.2.1. Measurable space (R, β(R)) ................................................. 77
2.2.2. Measurable space (R^n, β(R^n)) ............................................. 78
2.2.3. Measurable space (R^∞, β(R^∞)) ............................................. 81
2.3. Tasks for independent work ................................................... 82
§3. Methods for specifying probabilistic measures on measurable spaces ............ 85
3.1. Space (R, β(R)). Distribution function ....................................... 85
3.2. Space (R^n, β(R^n)). Multidimensional distribution function .................. 92
3.3. Space (R^∞, β(R^∞)) .......................................................... 97
3.4. Tasks for independent work ................................................... 98
§4. Conditional probability. Independence ......................................... 100
4.1. Conditional probability. The formula for multiplying probabilities ........... 100
4.2. Independence ................................................................. 105
4.2.1. Independence of events ..................................................... 105
4.2.2. Independence of partitions and algebras. Independent trials. Independence of σ-algebras ... 112
4.3. Total probability and Bayes formulas ......................................... 120
4.3.1. Total probability formula .................................................. 121
4.3.2. The Bayes formulas ......................................................... 128
4.4. Tasks for independent work ................................................... 130
Chapter III. RANDOM VARIABLES ..................................................... 135
§1. Random variables and their distributions ...................................... 135
1.1. Discrete random variables .................................................... 139
1.2. Absolutely continuous random variables ....................................... 142
1.3. Equivalent definitions of a random variable .................................. 147
1.4. Functions of (one) random variable ........................................... 148
1.4.1. Distributions of functions of a random variable ............................ 148
1.4.2. Structures of measurable functions ......................................... 150
1.5. The class of extended random variables and the closure of this class with respect to pointwise convergence ... 152
1.6. Tasks for independent work ................................................... 153
§2. Multidimensional random variables ............................................. 156
2.1. Multidimensional random variables and their distributions. Marginal distributions ... 156
2.1.1. Multidimensional discrete and absolutely continuous random variables ....... 158
2.2. Independence of random variables ............................................. 167
2.3. Functions of random variables ................................................ 171
2.3.1. Distributions of sums, ratios, and products of random variables ............ 172
2.3.2. Linear transformation of random variables .................................. 176
2.4. Conditional distributions .................................................... 186
2.5. Tasks for independent work ................................................... 190
Chapter IV. MATHEMATICAL EXPECTATION .............................................. 197
§1. General definition of mathematical expectation. Properties .................... 197
1.1. The multiplicative property .................................................. 205
1.2. «Almost sure» properties ..................................................... 206
1.3. Convergence properties ....................................................... 209
1.4. Formulas for computing expectation ........................................... 212
1.4.1. Fubini theorem and some of its applications ................................ 216
1.5. Variance ..................................................................... 219
1.6. Inequalities that are related to mathematical expectation .................... 224
1.7. Mathematical expectation and variance: examples of calculation ............... 229
1.8. Tasks for independent work ................................................... 240
§2. Conditional probabilities and conditional mathematical expectations with respect to partitions and sigma-algebras ... 247
2.1. Conditional probabilities and conditional mathematical expectations with respect to partitions ... 250
2.1.1. The conditional mathematical expectation of one simple random variable relative to another simple random variable ... 252
2.2. Conditional probabilities and conditional mathematical expectations with respect to sigma-algebras ... 256
2.2.1. Existence of conditional expectation with respect to the σ-algebra ......... 258
2.2.2. Consistency of the definition of the conditional mathematical expectation with respect to a partition with the definition of conditional mathematical expectation with respect to σ-algebras ... 260
2.2.3. Properties of conditional expectation ...................................... 261
2.3. The structure of the conditional mathematical expectation of one random variable relative to the other ... 268
2.3.1. Properties and formulas for calculating the conditional expectation M(ξ/η = y) ... 269
2.4. The conditional mathematical expectation and the optimal (in the mean-square sense) estimator ... 273
2.5. Tasks for independent work ................................................... 276
Chapter V. LIMIT THEOREMS IN THE BERNOULLI SCHEME
§1. Laws of large numbers ......................................................... 281
§2. Limit theorems of Moivre-Laplace .............................................. 285
2.1. The local Moivre-Laplace theorem ............................................. 285
2.2. The Moivre-Laplace integral theorem .......................................... 290
2.2.1. Deviation of the relative frequency from the constant probability in independent trials ... 294
2.2.2. Finding the probability of the number of successes in the given interval ... 295
§3. Poisson's theorem ............................................................. 299
§4. Tasks for independent work .................................................... 301
Chapter VI. DIFFERENT TYPES OF CONVERGENCE OF SEQUENCES OF RANDOM VARIABLES ...... 303
§1. Different types of convergence of sequences of random variables and their connection ... 303
§2. Weak convergence .............................................................. 311
§3. The Cauchy criterion for convergences in probability and with probability 1 ... 315
§4. Tasks for independent work .................................................... 319
Chapter VII. THE LAWS OF LARGE NUMBERS ............................................ 323
§1. The weak law of large numbers ................................................. 323
1.1. Necessary and sufficient condition for the law of large numbers .............. 326
§2. The strong law of large numbers ............................................... 330
§3. Tasks for independent work .................................................... 341
ANSWERS TO THE TASKS FOR INDEPENDENT WORK ......................................... 343
Appendix 1. Table of values of the function φ(x) = (1/√(2π)) e^(−x²/2) ............ 353
Appendix 2. Table of values of the function Φ₀(x) = (1/√(2π)) ∫₀^x e^(−z²/2) dz ... 354
Appendix 3. Values of the Poisson distribution π_k(λ) = e^(−λ) λ^k / k! ........... 355
BIBLIOGRAPHY ...................................................................................................... 357
FOREWORD
«Probability Theory and Mathematical Statistics» is included in the cycle of mandatory fundamental disciplines of the mathematical specialties of universities. In addition, this subject is required and studied in a number of other university specialties (mathematical and computer modeling, mechanics, computer science, physics, economics, actuarial mathematics, etc.). The proposed textbook was written by the author on the basis of his many years of experience in teaching various (simplified, ordinary, advanced, etc.) courses of this discipline to students, undergraduates and doctoral students of the PhD specialty «Mathematics» of the Mechanics and Mathematics Faculty of Al-Farabi Kazakh National University. The content of the textbook is fully consistent with the standard curriculum of the discipline «Probability Theory and Mathematical Statistics» and consists of seven chapters. In accordance with tradition, the first chapter is devoted to elementary probability theory. The presentation begins with the construction of probabilistic models with a finite number of outcomes and the introduction of basic probabilistic concepts such as elementary events, events, operations on events, and probability. Then the concepts of a discrete space of elementary events and a discrete probability space are introduced. Special attention is paid to the classical definition of probability: on the basis of this definition, a number of simple but important properties of probability are proved, and the concept of conditional probability is defined. The chapter also gives the necessary information from combinatorics; considers sampling models with and without replacement and models for placing balls in boxes (the Maxwell-Boltzmann, Bose-Einstein and Fermi-Dirac statistics); and presents the Bernoulli scheme, the binomial and polynomial models, and the hypergeometric and multidimensional hypergeometric distributions.
The chapter ends with the geometric definition of probability and provides solutions to two classical examples (the meeting problem and the Buffon needle problem). At the beginning of Chapter II, the rationale for the need to expand the concept of probability is given; after that, the concepts of a measurable set (random event) and a measurable space are introduced, and the axiomatics of probability theory due to A.N. Kolmogorov is presented. The properties of probability are then proved on the basis of the axioms of probability theory. Here a theorem on the equivalence of the axioms of continuity and countable additivity is also proved, and the most important examples of
measurable spaces and methods for defining probability measures on measurable spaces by means of the distribution function are discussed. At the end of the chapter, the concepts of conditional probability and independence (of two or more events; of partitions, algebras and sigma-algebras; of trials) are introduced, the total probability formula and the Bayes formulas are proved, and a number of examples with solutions and discussion are given (the gambler's ruin problem, the problem of choosing a ticket-changing strategy at an oral exam, etc.). In §1 of Chapter III, the concept of a random variable (as a measurable function) is introduced, and other equivalent, usually easily verifiable, definitions are given. Since the distribution function of a random variable is a distribution function in the sense of the definition from Chapter II, it follows from results known from the theory of functions that a random variable can be of only three types: discrete, (absolutely) continuous, or singular. A sufficient number of examples of discrete and continuous random variables are given. Section 2 of this chapter is devoted to multidimensional (vector) random variables and their distributions, to questions of finding marginal and conditional distributions, and to distributions of functions of random variables. Here special attention is also paid to the concept of independence of random variables (the criterion for the independence of discrete and continuous random variables, the independence of (Borel) functions of independent random variables, the composition formula, etc.). In §1 of Chapter IV, the theory of mathematical expectation is described as the Lebesgue integral with respect to a probability measure (introduced axiomatically in Chapter II). Note that the reader is not assumed to know any preliminary information about Lebesgue integration.
The main properties of mathematical expectation (linearity, positivity and finiteness; the multiplicative property; the «almost surely» properties and the convergence properties) and the inequalities related to mathematical expectation (the Cauchy-Bunyakovsky, Jensen, Lyapunov, Hölder, Minkowski, Markov and Chebyshev inequalities) are given with full proofs. The formulas for calculating the mathematical expectation are obtained using the change-of-variables formula in the Lebesgue integral. Note also that the Carathéodory theorem on the continuation of a measure, known from measure theory, is given without proof. The next section (i.e., §2) of Chapter IV is devoted to the theory of conditional expectation with respect to sigma-algebras. Here the concept of conditional expectation with respect to an event is first introduced as a natural generalization of the ordinary (unconditional) expectation, and conditional probability is defined as the conditional expectation of the indicator of an event. After that, this definition is extended to the case of a partition, and a number of basic properties are proved (including the total mathematical expectation formula). Moreover, it is emphasized that this conditional expectation is, by definition, a random variable measurable relative to the smallest sigma-algebra generated by the original partition. After that, the concept of conditional mathematical expectation with respect to a sigma-algebra, like the concept of mathematical expectation, is defined axiomatically (first for nonnegative, then for arbitrary random variables). The existence of such a conditional mathematical expectation is proved with the help of the Radon-Nikodym theorem known from measure theory (this theorem is not proved in our textbook; it is only formulated in a form convenient for us). At the end of §2, the structure of the conditional mathematical
expectation of one random variable relative to another random variable is clarified, formulas for calculating conditional mathematical expectations are obtained, and special attention is paid to the relation between the conditional mathematical expectation and the optimal mean-square estimator. The limit theorems in the Bernoulli scheme presented in Chapter V are historically the first classical limit theorems of probability theory and played an important role in the formation and further development of this science. Here, for the Bernoulli scheme, direct analytical proofs of the weak and strong laws of large numbers, the local and integral Moivre-Laplace theorems, and the Poisson theorem are presented, and some of their applications are considered. Chapter VI is devoted to the definitions of various types of convergence of sequences of random variables and their mutual relations. Of the many types of convergence, the following four are considered here: convergence in probability, convergence with probability one (almost surely), convergence in mean of order r, and weak convergence (convergence in distribution). The main theorem on the relations between these types of convergence is proved, and the Cauchy criterion for the first three types of convergence is established. Counterexamples are given that show why the converse statements do not hold (for example, why convergence in probability does not imply convergence with probability one, etc.). Weak convergence is studied in more detail, and equivalent definitions are given in terms of the corresponding sequences of distribution functions. In Chapter VII, the laws of large numbers are presented from a general probabilistic point of view.
Section 1 of this chapter considers the weak law of large numbers, based on convergence in probability (usually this law is simply called the law of large numbers), and theorems giving sufficient conditions for the validity of the law of large numbers are proved. At the end of this section, a theorem on a necessary and sufficient condition for the law of large numbers is proved. In §2 we consider the strong law of large numbers, the strengthened version of the law based on convergence with probability 1. Here several theorems are proved (Cantelli's theorem, Kolmogorov's theorem, etc.), and the section ends with a proof of the theorem on the necessary and sufficient condition for the strong law of large numbers in the case of a sequence of independent identically distributed random variables with finite mathematical expectation. Let us briefly dwell on the goals and methodology of the presentation of material in the proposed textbook. Considering that in the credit system of education a very important role is given to students' independent work, at the end of each section of the textbook a sufficient number of tasks and exercises (covering all the topics of that section) are offered for independent work. The purposes of the tasks for independent work differ: in some of them it is proposed to prove statements formulated but not proved in the main text; in others, it is required to fill in the missing details of incompletely proved statements; still others contain statements used in the subsequent presentation; a fourth group is aimed at providing additional information on the range of
issues under consideration; finally, some are simple statements. It should be noted that these tasks for independent work can also be used in practical classes, as well as for educational and research work. In addition, in the textbook:
– the material under consideration is presented with preservation of the structural and logical connection with the other compulsory subjects studied previously or in parallel;
– the presentation of the various topics and subjects of the discipline is carried out with preservation of continuity, with particular attention paid to the disclosure, justification and probabilistic meaning of the introduced fundamental concepts, definitions and proved statements;
– when generalizing previously studied material (concepts, definitions, statements, etc.), special attention is paid to the justification of such a generalization and the expediency of such an expanded study of the given range of issues, and to their further role for the topic at hand and, where possible, for the field of theory under study.
Along with the foregoing, in the course of the presentation of the material, we did our best to stimulate the creative and cognitive activity of students.
To do this, when considering certain key issues (when proving statements, when introducing new concepts, etc.):
– we paid special attention to the importance of particular conditions, to ways of overcoming the difficulties encountered, to the historical sequence of the results obtained, and to the construction of examples and counterexamples confirming or refuting particular positions of the theory under study (theorems, lemmas, statements, etc.);
– we tried to achieve a gradual transition of the learning process into a creative research process and to promote the awakening of students' interest in research work, guided by the fact that our main goal is to prepare highly qualified scientific researchers and scientist-teachers;
– we constantly kept in mind that a textbook on a specific discipline is primarily intended to provide individual training for everyone; for this, the material inside the textbook is arranged so that each reader, based on his or her individual capabilities, can choose a path through this discipline with maximum benefit;
– given that the discipline's textbook is one of the important links in the process of choosing an individual trajectory for students and undergraduates, we tried to pay special attention to the development of professional interests related to this subject.
In conclusion, we note that this textbook is primarily intended for students, undergraduates and doctoral students of the PhD specialty «Mathematics», but, for the reasons given below, students in specialties close to mathematics, such as mechanics, computer and mathematical modeling, etc., can also use it.
In our opinion, the structure of the textbook allows university lecturers to choose from it a set of sections and, within a section, the necessary points and subsections (there may be many options), using which (not necessarily completely) one can develop an appropriate course in probability theory and mathematical statistics for various levels of the specialty «Mathematics» and for other specialties. For example, from §2 of Chapter III, for bachelor's students in mathematics it will be sufficient to study the material of subparagraphs 2.1.2 and 2.2.1 for discrete random variables, while, for example, for master's students in mathematics it is necessary to study the material of paragraph 1.2 and subparagraphs 1.4.1 and 1.4.2 of §1 of Chapter IV.
The numbering of paragraphs in the textbook is independent in each chapter, as is the numbering of theorems (lemmas, examples, etc.) in each paragraph. For ease of reading, the following direct system of references to theorems, lemmas, examples, formulas, etc. is used, depending on how far they are from the place being read. If reference is made to Theorem 1 or formula (3) of the current paragraph, then the reference looks like this: Theorem 1, formula (3). If reference is made to Theorem 1 or formula (3) of one of the previous paragraphs (for example, §3) of the same chapter, then the reference has the form: §3, Theorem 1; §3, formula (3). If the reference refers to another chapter, then it starts with the chapter number: for example, if in the previous case the reference refers to Chapter II, it looks like this: Chapter II, §3, Theorem 1; Chapter II, §3, formula (3). If the reference refers to a source from the bibliography, then first the number of the source is indicated in square brackets (for example, [9]), and then the data within the source are indicated in accordance with the above conventions. The sign ■ marks the end of the proof of a theorem (lemma, corollary, statement). N. Akanbay Almaty, 2019-2020
Chapter I. RANDOM EVENTS AND THEIR PROBABILITIES
§1. Sample space. Classical definition of probability. Simplest probability properties

From a historical point of view, the classical definition of probability (classical probability) is the very first definition of probability. The classical definition is based on the notion of equal probability (equiprobability) of the outcomes of the random phenomenon under study. The property of equiprobability is a formally undefined primary concept. Let us give some examples explaining the meaning of this property.
1. A single toss of a fair coin. The possible outcomes of this experiment are the occurrence of «Head» and the occurrence of «Tail». In addition, the coin may possibly stand on its edge, roll away somewhere, etc.; one could list a number of further mutually exclusive events that can occur with a real coin. In the mathematical description of this experiment, it is natural to abstract from such insignificant (practically impossible) outcomes and confine ourselves to only two (the only possible) outcomes: the occurrence of «Head» (denoted by «H») and the occurrence of «Tail» (denoted by «T»). It is clear that if the coin is symmetric, then these two outcomes have no advantage over each other; they are equally likely, in other words, equally probable.
2. A single roll of a die. A die is a regular cube made of a homogeneous material, with faces numbered from 1 to 6. When such a cube is thrown, exactly one of six outcomes is realized: one point falls out, two points fall out, ..., six points fall out, and all these outcomes are equally likely (equally probable).
As we see, in the above examples the individual outcomes have no advantage over each other: they are equally possible outcomes. The outcomes «H» and «T» when tossing a coin are mutually exclusive: they cannot occur simultaneously.
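The equiprobability described above can be illustrated computationally. The following Python sketch (ours, not the textbook's; the helper name classical_probability is a hypothetical convenience) computes the classical probability P(A) = |A| / |Ω| for the coin and die experiments by directly counting favorable and possible outcomes.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """Classical definition: P(A) = |A| / |Omega| for a finite
    sample space whose outcomes are assumed equally probable."""
    return Fraction(len(event), len(sample_space))

# A single roll of a die: Omega = {1, 2, 3, 4, 5, 6}.
omega = {1, 2, 3, 4, 5, 6}

# The composite event A = {an even number of points falls out}.
A = {w for w in omega if w % 2 == 0}
print(classical_probability(A, omega))        # prints 1/2

# A single toss of a fair coin: Omega = {"H", "T"}.
coin = {"H", "T"}
print(classical_probability({"H"}, coin))     # prints 1/2
```

Using exact fractions rather than floating point mirrors the combinatorial character of classical probability: the answer is a ratio of two counts.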
The following events (outcomes) are also mutually exclusive: A_i = {i}, i = 1, …, 6 (A_i means the occurrence of i points, i = 1, …, 6): when i ≠ j the events A_i and A_j cannot occur simultaneously. Such only possible and mutually exclusive (simultaneously non-appearing) outcomes of the experiment will be called elementary events (in what follows elementary events will be denoted by ω («omega small»)). In this sense, for example, A = {an even number fell out with one dice throwing} is a composite (complex) event: A = {2, 4, 6}. This event will occur if and only if at least one of the elementary events {2}, {4}, {6} occurs.

The set of all possible elementary events ω is called a sample space and is denoted by Ω («omega large»): Ω = {ω}. Thus, a sample space Ω is a set of all the only possible, mutually exclusive outcomes of the experiment (elementary events) such that each experimental result of interest to us can be uniquely described with the help of the elements of this set. In the simplest experiments Ω is a finite set (finite sample space). In the above example (a coin toss) it consists of two elements: Ω = {H, T}. In the example of tossing a dice it consists of six elements: Ω = {1, 2, 3, 4, 5, 6}.

But even with the tossing of a coin it is possible to relate experiments in which a finite space of elementary events is not enough. Consider, for example, such an experiment: the coin is tossed until the Tail falls out, and when it first drops out, the experiment stops. If, as above, the Tail dropping out is denoted by «T» and the Head dropping out by «H», then it is clear that the corresponding space Ω can be described as a countable set:

Ω = {T, HT, HHT, …, HH…HT, …},

where the elementary event ω_n = HH…HT (n − 1 Heads followed by a Tail), n = 1, 2, …, means that Tail first drops out at the n-th coin flip.

A finite or countable sample space is called a discrete sample space. Any subset A of a discrete sample space Ω (A ⊆ Ω) is called an event (random event). Events are usually denoted by uppercase Latin letters: A, B, C, … If as a result of the experiment any of the elementary events ω ∈ A occurs, then it is said that the event A occurred: A occurred if some ω ∈ A occurred, and vice versa. For example, the event A = {2, 4, 6}, which means the dropping out of an even number of points in one throw of a dice, will occur if and only if one of the events A₂ = {2}, A₄ = {4}, A₆ = {6} occurs.

1.1. Discrete probability space

Let
Ω = {ω₁, ω₂, …}    (1)

be a discrete sample space.

Definition. If on Ω a nonnegative numerical function P such that

Σ_{i=1}^{∞} P(ω_i) = 1    (2)

is given, then it is said that the probabilities of elementary events are given (it is also said that the function P determines a probability distribution on Ω); in this case the numbers p_i = P(ω_i) are called the probabilities of the corresponding elementary events ω_i, i = 1, 2, … . The probability of an event A ⊆ Ω is the number

P(A) = Σ_{i: ω_i ∈ A} P(ω_i) = Σ_{i: ω_i ∈ A} p_i,    (2′)

i.e. the sum of the probabilities of all elementary events leading to the event A. The definition of probability (2′) is correct (the series on the right-hand side converges absolutely). We note here that in probability theory we will not be interested in the specific numerical values of the function P (this is only a question of the practical value of one or another model). It is clear that in the coin toss model one must assume P(T) = P(H) = 1/2; in the case of a symmetric dice, P(1) = P(2) = … = P(6) = 1/6. In the experiment with coin tossing until the first Tail drops out one must assume P(ω_n) = 2^(−n), n = 1, 2, … . Since Σ_{n=1}^{∞} 2^(−n) = 1, the function P, given on the outcomes of the form ω_n = HH…HT, defines a probability distribution on Ω = {T, HT, HHT, …, HH…HT, …}. To calculate within the given probability space, for example, the probability of the event A = {the experiment will end at an even step} = {ω₂, ω₄, …}, it is necessary to calculate the sum of the corresponding probabilities: P(A) = 2^(−2) + 2^(−4) + … = 1/3.
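This geometric sum is easy to check numerically. Below is a minimal sketch in Python (our own illustration, not part of the textbook): summing the probabilities P(ω_n) = 2^(−n) over all n approximates 1, and over even n approximates 1/3.

```python
from fractions import Fraction

# P(ω_n) = 2**(-n): probability that the first Tail appears at step n.
# Event A = "the experiment ends at an even step" = {ω_2, ω_4, ...}.
p_total = sum(Fraction(1, 2**n) for n in range(1, 61))    # partial sum of all outcomes, ≈ 1
p_even = sum(Fraction(1, 2**n) for n in range(2, 61, 2))  # partial sum over event A, ≈ 1/3

print(float(p_total), float(p_even))
```

Using `Fraction` keeps the partial sums exact; truncating at n = 60 already agrees with the limits 1 and 1/3 to full float precision.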
Definition. Let Ω = {ω₁, ω₂, …} be a discrete sample space and 𝒜 = {A : A ⊆ Ω} the collection of all events (subsets) of Ω. Then a numerical function P, defined on 𝒜 and satisfying the properties (2), (2′), is called a probability or a probability function on (Ω, 𝒜), and the triple (Ω, 𝒜, P) is called a discrete probability space. (If Ω is a finite set, then (Ω, 𝒜, P) is called a finite probability space.)

Remark. Everywhere in what follows, by |A| we denote the number of elements of a finite set A. Thus, a finite probability space is a triple (Ω, 𝒜, P) with |Ω| < ∞.

1.1.1. Classical definition of probability

Consider a finite probability space (Ω, 𝒜, P). Let Ω = {ω₁, ω₂, …, ω_n}, |Ω| = n < ∞, and let all the elementary events be equiprobable:
p = P(ω_i),  i = 1, …, n.

Then by property (2)

1 = Σ_{i=1}^{n} p = np,  so that  p = 1/n,

and for any event A ⊆ Ω, by formula (2′),

P(A) = Σ_{i: ω_i ∈ A} 1/n = |A|/n = |A|/|Ω|.    (3)
The definition of the probability of an event A ⊆ Ω (|Ω| = n < ∞) by formula (3) is called the classical definition of probability (the term «uniform discrete distribution» is also often used). Thus, the classical definition of probability is used in those cases when all the elementary events (outcomes) of the experiment under consideration are equally likely (equiprobable), i.e. under the conditions of this experiment no elementary event has any advantage over the others.

The classical definition of probability can also be formulated as follows: the probability of an event A is the ratio of the number of cases (elementary events) favorable for A (i.e., leading to the occurrence of the event A) to the total number of cases (elementary events); the probability of the event A is equal to the ratio of the number |A| of elements of the event A to the number |Ω| = n of elements of Ω.

Thus, in the experiment with a symmetric coin toss the probability of the event A = {Tail occurs} = {T} is equal to P(A) = 1/2, and in the experiment with the throwing of a symmetric dice the probabilities of the events A = {an even number of points occurs} (A = {2, 4, 6}), B = {at least 3 points occur} (B = {3, 4, 5, 6}), C = {the number of points dropped is a multiple of three} (C = {3, 6}) are equal to (respectively)

P(A) = 3/6 = 1/2,  P(B) = 4/6 = 2/3,  P(C) = 2/6 = 1/3.
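Since the classical definition amounts to counting favorable outcomes, such computations mechanize easily. A short sketch in Python (the names are ours):

```python
omega = set(range(1, 7))                 # Ω for one throw of a dice
A = {w for w in omega if w % 2 == 0}     # an even number of points
B = {w for w in omega if w >= 3}         # at least 3 points
C = {w for w in omega if w % 3 == 0}     # a multiple of three

def prob(event):
    # Classical definition (3): P(A) = |A| / |Ω|
    return len(event) / len(omega)

print(prob(A), prob(B), prob(C))  # 1/2, 2/3 and 1/3 respectively
```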
From the classical definition of probability (3) it follows that the probability of any event A lies between zero and one: 0 ≤ P(A) ≤ 1 (in formula (3), 0 ≤ |A| ≤ n).

One of the first problems solved using the classical definition of probability was the following: two dice are tossed at the same time; find the probabilities of all possible sums of dropped points.

It is clear that the sum of the dropped points lies between 2 and 12. As the space of elementary events in this experiment one can take the set

Ω = {ω_{ij} = (i, j) : i, j = 1, …, 6},

where the elementary event ω_{ij} = (i, j) means that i points fell on the first dice and j points on the second one.
The total number of elementary events is |Ω| = 6 · 6 = 36, and all the elementary events are mutually exclusive (different elementary events cannot occur simultaneously) and equally likely (because of the symmetry of the dice, none of the elementary events has an advantage over the others), so the classical definition of probability can be applied to this problem. We introduce the events A_k, meaning that the sum of the dropped points is k:

A_k = {(i, j) ∈ Ω : i + j = k},  k = 2, 3, …, 12,

i.e.

A₂ = {(1,1)},  A₃ = {(1,2), (2,1)},  A₄ = {(1,3), (2,2), (3,1)}, …, A₁₁ = {(5,6), (6,5)},  A₁₂ = {(6,6)},

so

|A₂| = 1,  |A₃| = 2,  |A₄| = 3, …, |A₁₁| = 2,  |A₁₂| = 1.

Consequently, their probabilities are computed from the classical definition of probability:

P(A_k) = |A_k|/|Ω| = |A_k|/36.

Having calculated all the necessary probabilities, we obtain the following table.

Table 1

Sum of dropped points k:   2     3     4     5     6     7     8     9     10    11    12
Probabilities P(A_k):      1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

(The sum of all the probabilities is 1.)
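Table 1 can be reproduced by direct enumeration of the 36 equally likely pairs. A sketch (our illustration; `Fraction` keeps the probabilities exact):

```python
from fractions import Fraction
from itertools import product
from collections import Counter

# All 36 equally likely outcomes (i, j) of throwing two dice.
counts = Counter(i + j for i, j in product(range(1, 7), repeat=2))
probs = {k: Fraction(counts[k], 36) for k in range(2, 13)}

for k in range(2, 13):
    print(k, probs[k])
print("sum =", sum(probs.values()))  # the probabilities sum to 1
```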
1.1.2. Events. Operations on events

The concepts, definitions, and properties of probability introduced below apply not only to the case of a discrete probability space; they also remain valid for all probability spaces considered in what follows.
Certain event. Since Ω ⊆ Ω, the set Ω = {ω} is itself an event, and this event will necessarily happen as a result of the experiment. Such an event is called a certain event. Thus, a certain event (notation: Ω) is an event that will necessarily occur as a result of the experiment.

Impossible event. An impossible event is an event that will never happen as a result of the experiment. An impossible event is denoted by ∅ (the empty set).

Sum (union) of events. The sum (union) of events A and B (notation: A ∪ B) is the event consisting of the elementary events belonging to at least one of the events A and B: A ∪ B = {ω ∈ Ω : ω ∈ A or ω ∈ B}. Thus, the sum A ∪ B of the events A and B is the event which will occur if and only if at least one of them occurs.

Product of events. The product (intersection) of events A and B (notation: A ∩ B or AB) is the event consisting of the elementary events belonging to both A and B:

A ∩ B = {ω ∈ Ω : ω ∈ A, ω ∈ B}.

So, the product A ∩ B of the events A and B is the event which occurs if and only if the events A and B occur simultaneously.

Difference of events. The difference of events A and B (notation: A \ B) is the event consisting of the elementary events belonging to A but not belonging to B:

A \ B = {ω ∈ Ω : ω ∈ A, ω ∉ B}.

So, the difference A \ B of the events A and B is the event which occurs if and only if the event A occurs and B does not occur.

Opposite event. The event opposite to the event A (notation: Ā) is the event consisting of all elementary events not belonging to A:

Ā = {ω ∈ Ω : ω ∉ A}.

So, the opposite event Ā occurs if and only if the event A does not occur.

Implication of one event from another. If all elementary events belonging to an event A also belong to an event B, then it is said that the event A implies the event B (notation: A ⊆ B):

A ⊆ B ⇔ (ω ∈ A ⇒ ω ∈ B).

So, A ⊆ B (the event A implies the event B) means that each time the event A occurs, the event B will also occur.

Equality of events. If the event A implies the event B and the event B implies the event A, then such events are called equal events (notation: A = B):

A = B ⇔ (A ⊆ B and B ⊆ A).

The equality of the events A and B means that the events A and B occur or do not occur only simultaneously.

Disjoint events. If the occurrence of the event A implies the non-occurrence of the event B and vice versa, in other words, if the events A and B cannot occur simultaneously (i.e. AB = ∅), then such events are called disjoint events.

Remark. Further, for disjoint events, instead of the union sign «∪» we write the usual sign of a sum «+»: A + B = A ∪ B when AB = ∅.

The following simple but very important relationships between events follow directly from the above definitions of the operations on events:

Ω‾ = ∅,  ∅‾ = Ω,  A + Ā = Ω,  AĀ = ∅,  A \ B = AB̄,  B \ A = ĀB,  (Ā)‾ = A,  A ∪ Ω = Ω,  AΩ = A,  A ∪ ∅ = A,  A∅ = ∅,  A ∪ A = A,  AA = A.
To illustrate the meaning of the operations applied to events, we describe them using Venn diagrams. Let the experiment consist in selecting a point inside the rectangle shown in Figure 1. Denote by A the event «the selected point lies inside the left circle», and by B the event «the selected point lies inside the right circle». Then the events A, Ā, B, B̄, A ∪ B, AB, A \ B consist in the hitting of the selected point in the inner parts of the regions shaded in the corresponding pictures in Fig. 1.

Figure 1. Description of the operations applied to the events using Venn diagrams.

Let us give one more example. Let the experiment consist in throwing a dice once. Denote by A the falling of six points on the top face of the dice, by B three points, by C an even number of points, by D a number of points that is a multiple of 3. Then

Ω = {1, 2, 3, 4, 5, 6},  A = {6},  B = {3},  C = {2, 4, 6},  D = {3, 6},
A ⊆ C,  B ⊆ D,  A ⊆ D,  A ∪ B = D,  CD = A,  AB = ∅,  C̄ = {1, 3, 5},  A \ C = ∅,  C \ A = {2, 4},  AD = A.
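The dice relations above can be verified mechanically with Python sets, where complement, union, intersection and difference correspond exactly to the event operations (our illustration):

```python
omega = {1, 2, 3, 4, 5, 6}
A, B, C, D = {6}, {3}, {2, 4, 6}, {3, 6}

assert A <= C and B <= D and A <= D      # implication is set inclusion
assert A | B == D                        # A ∪ B = D
assert C & D == A                        # CD = A
assert A & B == set()                    # AB = ∅ (disjoint events)
assert omega - C == {1, 3, 5}            # the opposite event C̄
assert A - C == set() and C - A == {2, 4}
assert A & D == A
print("all relations hold")
```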
The definition of the sum and of the product for two events can be generalized to any number of events. For example, the sums

A₁ ∪ A₂ ∪ … = ⋃_{i=1}^{∞} A_i  and  A₁ + A₂ + … = Σ_{i=1}^{∞} A_i

mean, respectively, the occurrence of at least one event from the sequence of events A₁, A₂, … and the occurrence of one of the pairwise disjoint events A₁, A₂, …, while the event

A₁ ∩ A₂ ∩ … = ⋂_{i=1}^{∞} A_i  (or A₁A₂…)

means the simultaneous occurrence of all the events A₁, A₂, … .

By definition, the operations of summation and multiplication of events are associative and commutative: for any events A, B, C

A ∪ (B ∪ C) = (A ∪ B) ∪ C,  A ∪ B = B ∪ A,  A(BC) = (AB)C,  AB = BA.

In addition, the summation and multiplication operations are mutually distributive:

A ∪ (BC) = (A ∪ B)(A ∪ C),  A(B ∪ C) = AB ∪ AC.
The following duality principle plays a very important role in probability theory.

Duality principle. The event opposite to the sum of two events is equal to the product of the opposite events, and the event opposite to the product of two events is equal to the sum of the events opposite to them: for any two events A and B

(A ∪ B)‾ = Ā ∩ B̄,  (A ∩ B)‾ = Ā ∪ B̄.    (4)

Proof. Let us prove, for example, the first equality in (4) as an equality of events.

By definition, the event (A ∪ B)‾ means that the event A ∪ B (the occurrence of at least one of the events A or B) does not occur. But this means that neither A nor B occurs, i.e. both Ā and B̄ occur, i.e. the event Ā ∩ B̄ occurs. Consequently, the event (A ∪ B)‾ implies the event Ā ∩ B̄: (A ∪ B)‾ ⊆ Ā ∩ B̄. Conversely, if the event Ā ∩ B̄ occurs, then neither A nor B occurs. Therefore the event A ∪ B does not occur, i.e. the event (A ∪ B)‾ occurs. Hence Ā ∩ B̄ ⊆ (A ∪ B)‾. Thus, by the definition of the equality of events, (A ∪ B)‾ = Ā ∩ B̄.

We now prove the same thing in terms of operations on sets. To do this, it suffices to show that every element of the set (A ∪ B)‾ is contained in Ā ∩ B̄ (and thus (A ∪ B)‾ ⊆ Ā ∩ B̄), and vice versa, every element of the set Ā ∩ B̄ is contained in (A ∪ B)‾ (i.e. Ā ∩ B̄ ⊆ (A ∪ B)‾). In turn, these inclusions follow from the following chain of relations:

ω ∈ (A ∪ B)‾ ⇔ ω ∉ A ∪ B ⇔ ω ∉ A, ω ∉ B ⇔ ω ∈ Ā, ω ∈ B̄ ⇔ ω ∈ Ā ∩ B̄. ■

The following relationship is also usually referred to as the duality principle:

A ∪ B = (Ā ∩ B̄)‾.    (4′)
Obviously, the duality principle for a countable number of events A₁, A₂, … is written as:

(⋃_{i=1}^{∞} A_i)‾ = ⋂_{i=1}^{∞} Ā_i,  (⋂_{i=1}^{∞} A_i)‾ = ⋃_{i=1}^{∞} Ā_i.    (4″)
The role of the duality principle in probability theory is that for any statement related to a certain system of events an equivalent dual statement can be formulated, in which the events are replaced by the opposite ones, the sum by the product and vice versa, and the relations (4′) are taken into account.

Returning to the distributivity relations, we note that for arbitrary systems of events A, A₁, A₂, … the following equalities are true:

A ∪ (⋂_{i=1}^{∞} A_i) = ⋂_{i=1}^{∞} (A ∪ A_i),  A ∩ (⋃_{i=1}^{∞} A_i) = ⋃_{i=1}^{∞} AA_i.
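Both the duality principle (4) and the distributivity relations can be checked exhaustively on small examples. A sketch over a handful of events in a six-point Ω (our illustration):

```python
from itertools import product

omega = frozenset(range(1, 7))

def comp(X):
    # the opposite event (complement in Ω)
    return omega - X

# A few test events, including the extreme cases ∅ and Ω.
events = [frozenset(), frozenset({1, 2}), frozenset({2, 4, 6}),
          frozenset({3, 6}), omega]

for A, B, C in product(events, repeat=3):
    assert comp(A | B) == comp(A) & comp(B)      # (A ∪ B)‾ = Ā ∩ B̄
    assert comp(A & B) == comp(A) | comp(B)      # (A ∩ B)‾ = Ā ∪ B̄
    assert A | (B & C) == (A | B) & (A | C)      # distributivity
    assert A & (B | C) == (A & B) | (A & C)
print("duality and distributivity verified")
```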
If the event A is written as a sum of pairwise disjoint events B₁, B₂, …, B_n, …, i.e. if

A = B₁ + B₂ + … + B_n + …,  where B_iB_j = ∅ (i ≠ j),

then we say that the event A is decomposed into the special cases B₁, B₂, …, B_n, … . For example, with a single throwing of a dice, the event A = {fallout of an even number of points} is decomposed into the events B₂, B₄, B₆, which mean the falling out of two, four and six points (respectively):

A = B₂ + B₄ + B₆.

If for pairwise disjoint events B₁, B₂, …, B_n, …

B₁ + B₂ + … + B_n + … = Ω,

i.e. if as a result of the experiment one of the events B₁, B₂, …, B_n, … will necessarily occur, then we say that the events B₁, B₂, …, B_n, … form a complete group of events. For example, with one dice toss the events B₁, B₂, …, B₆, which mean the fallout of one, two, …, six points (respectively), form a complete group of events.

We now give a number of simple but very important properties of probability, which follow from the classical definition of probability.

1. The probability of any event is nonnegative: for any event A ∈ 𝒜,

P(A) ≥ 0.

This property is obvious (the fraction in formula (3) is nonnegative).

2. The probability of a certain event is equal to one:

P(Ω) = 1.    (5)

This property follows from the fact that, if we put A = Ω in (3), then

P(Ω) = |Ω|/|Ω| = 1.

3. For disjoint events A, B ∈ 𝒜:

P(A + B) = P(A) + P(B).    (6)
Formula (6) is called the formula (theorem) of addition of probabilities for disjoint (incompatible) events.

Proof. (6) is a corollary of the equality |A + B| = |A| + |B| and definition (3). ■

4. The probability of the event Ā opposite to A is equal to

P(Ā) = 1 − P(A).    (7)

Usually this property is formulated in the form «the sum of the probabilities of mutually opposite events is equal to one». The proof follows from the equality Ω = A + Ā and formulas (5), (6). ■

5. The probability of an impossible event is equal to zero:

P(∅) = 0.    (8)

This follows from the definition (an empty set contains no elements); alternatively, in (6) it suffices to take B = ∅.

6. If the event A implies the event B (A ⊆ B), then

P(A) ≤ P(B).    (9)

Proof. If A ⊆ B, then |A| ≤ |B|. Therefore, by definition (3),

P(A) = |A|/|Ω| ≤ |B|/|Ω| = P(B). ■

7. For any events A, B ∈ 𝒜 the probability of their difference is

P(A \ B) = P(A) − P(AB).    (10)

In the particular case when B implies A (B ⊆ A),

P(A \ B) = P(A) − P(B).    (10′)

Proof. Since

A \ B = AB̄,  B + B̄ = Ω,  A = A ∩ Ω = A(B + B̄),

we have

A = AB + AB̄.

The events AB and AB̄ are disjoint:

(AB)(AB̄) = ABB̄ = ∅.

Then, by property 3,

P(A) = P(AB + AB̄) = P(AB) + P(AB̄) = P(AB) + P(A \ B).

This implies the required property (10). To prove (10′) it suffices to note that B ⊆ A implies AB = B. ■

8. For any events A, B ∈ 𝒜:

P(A ∪ B) = P(A) + P(B) − P(AB).    (11′)

In general, for any events A₁, A₂, …, A_n ∈ 𝒜,

P(⋃_{i=1}^{n} A_i) = Σ_i P(A_i) − Σ_{i<j} P(A_iA_j) + Σ_{i<j<k} P(A_iA_jA_k) − … + (−1)^{n−1} P(A₁A₂…A_n).    (11)

Formula (11) is called the formula (theorem) of addition of probabilities. It is clear that formulas (11′) and (6) are particular cases of formula (11). To prove (11′), we first note that A ∪ B = (A \ B) + (B \ A) + AB, whence

P(A ∪ B) = P(A \ B) + P(B \ A) + P(AB) = P(A) − P(AB) + P(B) − P(AB) + P(AB) = P(A) + P(B) − P(AB).

Formula (11) can be proved by induction, or one can use the following equality:

|⋃_{i=1}^{n} A_i| = Σ_i |A_i| − Σ_{i<j} |A_iA_j| + Σ_{i<j<k} |A_iA_jA_k| − … + (−1)^{n−1} |A₁A₂…A_n|.

9. The probability P(A) of any event A lies between zero and one:

0 ≤ P(A) ≤ 1.    (12)

The proof follows immediately from

∅ ⊆ A ⊆ Ω,

property 6 and formulas (5) and (8).

Remark. The classical definition of probability allows us to solve numerous practical problems. But in the process of expanding the field of application of probability theory, a number of shortcomings of this definition were revealed. Among them are the following:
a) the classical definition of probability makes too stringent demands on the set of conditions of the experiment under consideration;

b) to apply the classical definition of probability, the space of elementary events must be a finite set, and all elementary events must be equally likely (that is, they must have equal probabilities).

In connection with the foregoing, it becomes necessary to introduce a definition of probability that would be free of the shortcomings of the classical definition and at the same time would contain the classical definition as a particular case. These questions will be discussed in the next chapter.

As we saw above, the calculation of the probabilities of events according to the classical definition reduces to counting the numbers of elements of certain finite sets, i.e. to combinatorial problems. Below we present some information from combinatorial analysis from the point of view of its application to probabilistic problems.

1.2. Elements of combinatorics
Let us consider finite sets A and B consisting of n and m elements, |A| = n < ∞, |B| = m < ∞:

A = {a₁, a₂, …, a_n},  B = {b₁, b₂, …, b_m}.

We define a new set (the Cartesian product) A × B as follows:

A × B = {(a_i, b_j) : a_i ∈ A, b_j ∈ B}.

Then the number of elements of the set (Cartesian product) A × B is |A × B| = |A| · |B| = n · m, because all the elements of this set can be arranged in n rows of m elements each, in the following way:

(a₁, b₁), (a₁, b₂), …, (a₁, b_m),
(a₂, b₁), (a₂, b₂), …, (a₂, b_m),
…………………………………………
(a_n, b₁), (a_n, b₂), …, (a_n, b_m).

This statement can be generalized in the following sense.

Theorem 1. Let finite sets

A₁ = {a₁₁, a₁₂, …, a_{1n₁}},  A₂ = {a₂₁, a₂₂, …, a_{2n₂}},  …,  A_m = {a_{m1}, a_{m2}, …, a_{mn_m}},  |A_k| = n_k < ∞,  k = 1, 2, …, m,

be given. We define a new set (the Cartesian product A₁ × A₂ × … × A_m of the sets A₁, A₂, …, A_m) as follows:

A₁ × A₂ × … × A_m = {(a_{1i₁}, a_{2i₂}, …, a_{mi_m}) : a_{ki_k} ∈ A_k, k = 1, 2, …, m; i_k = 1, 2, …, n_k}.

Then

|A₁ × A₂ × … × A_m| = |A₁| · |A₂| · … · |A_m| = n₁n₂…n_m.    (13)

Proof. For m = 2 this is the above statement. In the case m = 3 the number of triples (a_{1i₁}, a_{2i₂}, a_{3i₃}), according to the statement already proved, is equal to the product of the number of pairs (a_{1i₁}, a_{2i₂}) by the number of elements a_{3i₃}, i.e. (n₁n₂) · n₃ = n₁ · n₂ · n₃. Now, to prove the theorem definitively, it suffices to use induction. ■

Theorem 1 can be formulated differently as follows.

Theorem 1′. If we have n₁ elements of the 1st type a₁₁, …, a_{1n₁}, n₂ elements of the 2nd type a₂₁, a₂₂, …, a_{2n₂}, …, n_m elements of the m-th type a_{m1}, a_{m2}, …, a_{mn_m}, then the number of possible combinations (a_{1i₁}, a_{2i₂}, …, a_{mi_m}) containing one element of each type is equal to n₁ · n₂ · … · n_m. ■

Many applications are based on the following reformulation of the last theorem: when there are m sequential selections (decisions) with exactly n_k possible outcomes at the k-th step, one can obtain n₁ · n₂ · … · n_m different results.
Suppose now that some general population of n elements Ω₀ = {a₁, a₂, …, a_n} is given (for example, Ω₀ is a set of balls numbered from 1 to n in an urn, the set of students in a group, etc.). An ordered sample (or simply a sample) of size (number of elements) r, extracted from the given population, is an ordered set (a_{j₁}, a_{j₂}, …, a_{j_r}).

For clarity, we can imagine that the sample elements are extracted one by one. In this case, the sample can be of two types: a) a sample with replacement (with repetitions); b) a sample without replacement (without repetitions).

Sample with replacement (with repetitions). In this case, before each extraction the element selected at the previous step is returned to the general population, so each extraction is made from the whole population, and it is possible to select the same element several times. Such samples are ordered sets in which repetitions are possible.

Sample without replacement (without repetitions). In this case, each extracted element is excluded (deleted) from the general population before the removal of the next element, so that the sample is an ordered set without repetitions. Obviously, with such a choice the sample size r cannot be larger than the population size n (r ≤ n).

Let us denote by Ω the set of all samples of size r. Then for the samples with replacement

Ω = {(a_{j₁}, a_{j₂}, …, a_{j_r}) : a_{j_k} ∈ Ω₀, k = 1, 2, …, r} = Ω₀ × Ω₀ × … × Ω₀  (r factors),

and for the samples without replacement

Ω = {(a_{j₁}, a_{j₂}, …, a_{j_r}) : a_{j_k} ∈ Ω₀, a_{j_k} ≠ a_{j_l} (k ≠ l), l, k = 1, 2, …, r}.

By the theorem proved above (see formula (13)), the number of samples with replacement is |Ω| = |Ω₀| · |Ω₀| · … · |Ω₀| = n^r, and the number of samples without replacement is |Ω| = n(n−1)·…·(n−r+1). For this last product we introduce the abbreviated notation

(n)_r = n(n−1)(n−2)·…·(n−r+1).    (14)

It is clear that for nonnegative integers r and n such that r > n we have (n)_r = 0.

We state the results obtained as a theorem.

Theorem 2. Suppose that a general population consisting of n elements is given and a sample of size r is extracted from it. Then the number of different samples with replacement is n^r, and the number of samples without replacement is (n)_r.

Let us consider the special case r = n. In this case, if the sample is a sample without replacement, then it includes the whole population and is a permutation (reordering) of its elements. Thus, n elements a₁, a₂, …, a_n can be ordered in (n)_n = n! different ways. We obtain as a result

Corollary. The number of different permutations of n elements is equal to

(n)_n = n!    (14′)
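Theorem 2 and the corollary are easy to confirm by brute force for small n and r. A sketch in Python (the helper name `falling` for (n)_r is ours):

```python
from itertools import product, permutations
from math import factorial

def falling(n, r):
    # (n)_r = n(n-1)...(n-r+1); the product is 0 when r > n
    result = 1
    for k in range(r):
        result *= n - k
    return result

n, r = 5, 3
pop = range(n)
with_repl = list(product(pop, repeat=r))       # samples with replacement
without_repl = list(permutations(pop, r))      # samples without replacement

print(len(with_repl), n**r)                    # both equal n^r
print(len(without_repl), falling(n, r))        # both equal (n)_r
print(falling(n, n), factorial(n))             # (n)_n = n!
```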
By Theorem 2, the number of samples of size r without replacement, generated from the general population, is equal to (n)_r, and among them exactly r! samples have the same composition (see the Corollary). Therefore, the number of samples of size r without replacement differing in composition (at least in one element) is equal to

(n)_r / r! = n(n−1)·…·(n−r+1) / r! = n! / (r!(n−r)!) = C_n^r.    (15)

The number C_n^r is called the number of combinations of r out of n, or a binomial coefficient. In order for formula (15) to be valid for all integers r such that 0 ≤ r ≤ n, we further assume 0! = 1, and for r < 0 or r > n we set C_n^r = 0. Obviously, it follows from (15) that

C_n^r = C_n^{n−r}.

We consider subsets of size r of a given general population consisting of n elements. The number of such subsets differing in composition (at least in one element) is equal to the number of samples without replacement of size r that differ in composition, i.e. to the number C_n^r (because samples without replacement that differ in composition are exactly the subsets of the original population). Thus, the following theorem holds.

Theorem 3. A general population (set) of n elements has C_n^r different subsets of size r. In other words, a subset of r elements of a set of n elements can be chosen in C_n^r different ways. ■

Usually the assertion of Theorem 3 is formulated as follows: there are C_n^r different ways to choose r elements from n elements.

Theorem 4. Let integer numbers r₁, r₂, …, r_k such that
C nr = C nn r . We consider subsets of volume r from a given general population consisting of n elements. Then the number of such subsets differing in composition (at least in one element) is equal to the number of samples without replacement of size r , i.e. the number C nr (because samples without replacement different in their composition are subsets of the original population). Thus, it is correct. Theorem 3. General population (set) of n elements has C nr different subsets of size r . In other words, a subset of r elements of a set of n elements can be chosen ז in C nr different ways. Usually the assertion of Theorem 3 is formulated as follows: there are C nr different ways to choose r elements from n elements. Theorem 4. Let the integer numbers r1 , r2 ,..., rk such that
r1 r2 ... rk
r , ri t 0 (i = 1,2,..., k ) .
be given. Then the number of ways in which the general population of n elements can be divided into k ordered parts (partition into k subsets), of which the first contains r1 elements, the second – r2 elements, … , the k – th – rk elements, is equal to r! . r1 ! r2 ! ... rk !
Cr (r1 ,..., rk )
(16)
(these numbers Cr (r1 ,..., rk ) are called polynomial or multinomial coefficients). Note that the order is important in such a sense that (r1 2, r2 3) and (r1 3, r2 2) are the different partitions, but the order within the groups is ignored. We also note that 30
0! 1 , so that vanishing of some ri does not affect the formula (16). As vanishing of numbers – elements ri is allowed, then n elements are divided into k or less subsets. Proof of theorem 4. The required decomposition can be obtained as follows: first we have to select r1 elements from the given n elements (this can be done in Crr1 ways);
after this we have to find r2 elements from the remaining r r1 elements (this can be
r done in C r2 r1 ways), then we have to find r3 elements from the remaining r (r1 r2 ) r elements in C r2( r1 r2 ) ways, etc., finally, we have to find rk elements from the remai
ning r (r1 r2 ... rk 1 ) rk (this can be done in one way). Thus, according to formula (13), the total number of all ways of the required decomposition is r Cr (r1,..., rk ) = Crr1 ή C rr2 r1 ή C rr2( r1 r2 ) ή ...ή C rkk . (16′) Furthermore, by uncovering the combinations in (16´) by the formula (15), we ז verify that the coefficients Cr (r1,..., rk ) are indeed determined by the formula (16). The procedure for obtaining a sample of size r from a given population of size n can be considered as an experiment, the outcomes of which are samples of size r . In other words, the space of elementary events corresponding to this experiment is a set whose elements (elementary events) are samples of volume r . In this case, the size of the space of elementary events (the number of elementary events) is n r (if the samples (elementary events) are samples with a replacement) or ( n) r (if the samples are samples without replacement). In the constructed space of elementary events, the elementary events are samples (with or without replacement) of size r . Usually we consider all these samples (as the results of the experiment) to be equally likely and call them random samples. According to the accepted terminology, if we say that a random sample of size r is chosen from the general population of size n , this means that this sample has the probability n r (in case of a choice with replacement) or probability
1 (in case of (n) r
choice without replacement). Examples 1. The paradox of de Mere. Which event is more likely when throwing three dice: the sum of the points dropped is 11 (eleven) or 12 (twelve)? De Mere considered these events to be equally probable and justified this with the following reasoning. The event that «the sum of the dropped points is 11 (eleven)» can occur as a result of the following combinations: (6, 4, 1), (6, 3, 2), (5, 5, 1), (5, 4, 2), (5, 3, 3), (4, 4, 3) 31
where, e.g., (6, 4, 1) means that «6» occurred on the 1st dice, «4» on the 2nd dice and «1» on the 3rd one, etc. On the other hand, the event «the sum of the dropped points is 12» can also occur as a result of the following six combinations:

(6, 5, 1), (6, 4, 2), (6, 3, 3), (5, 5, 2), (5, 4, 3), (4, 4, 4).

Consequently, these events are equally probable. Here the mistake of de Mere is that the possible outcomes that he considered are not equally probable. For example, the outcome (6, 4, 1) can occur in 3! = 6 cases: (6, 4, 1), (6, 1, 4), …, (1, 4, 6). At the same time, the combination (4, 4, 4) can occur in only one case. In modern language, de Mere incorrectly constructed the space of elementary events corresponding to the given problem.

The solution of the problem. We define Ω as

Ω = {(i, j, k) : i, j, k = 1, …, 6} = Ω₀ × Ω₀ × Ω₀,

where Ω₀ = {1, 2, 3, 4, 5, 6}. Let us introduce the events A₁₁ = {the sum of points is equal to 11}, A₁₂ = {the sum of points is equal to 12}. Hence

A₁₁ = {(i, j, k) ∈ Ω : i + j + k = 11},
A₁₂ = {(i, j, k) ∈ Ω : i + j + k = 12}.

We have

|A₁₁| = 27,  |A₁₂| = 25,  |Ω| = 6³ = 216.

Therefore

P(A₁₁) = 27/216 > 25/216 = P(A₁₂).
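Enumerating all 216 equally likely triples confirms the counts |A₁₁| = 27 and |A₁₂| = 25 (a sketch; our illustration):

```python
from itertools import product

omega = list(product(range(1, 7), repeat=3))   # Ω₀ × Ω₀ × Ω₀, 6^3 = 216 outcomes
a11 = sum(1 for t in omega if sum(t) == 11)    # |A_11|
a12 = sum(1 for t in omega if sum(t) == 12)    # |A_12|

print(len(omega), a11, a12)     # 216 27 25
print(a11 / 216 > a12 / 216)    # P(A_11) > P(A_12)
```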
2. From the general population Ω₀ = {a₁, a₂, …, a_n} (for example, from an urn with numbered balls) of size n, a random sample with replacement of size r is extracted.

a) Find the probability that the extracted sample is a sample without replacement (that is, all the extracted balls have different numbers).

b) Find the probability that the first sample element is the first element of the general population and the second sample element is the second element of the general population (that is, the first ball extracted from the urn is ball No. 1 and the second ball is ball No. 2).

Solution. The space of elementary events is the set of all samples with replacement of size r:

Ω = {(a_{j₁}, a_{j₂}, …, a_{j_r}) : a_{j_k} ∈ Ω₀, k = 1, 2, …, r}.

a) The required probability is (n)_r / n^r, because the number of all samples with replacement is n^r, and the number of all samples without replacement is (n)_r.

b) We need to find the probability of the event

A = {(a_{j₁}, a_{j₂}, …, a_{j_r}) ∈ Ω : a_{j₁} = a₁, a_{j₂} = a₂} = {(a₁, a₂, a_{j₃}, …, a_{j_r}) : a_{j_k} ∈ Ω₀, k = 3, 4, …, r}.

According to formula (13),

|A| = n^{r−2},

consequently,

P(A) = n^{r−2}/n^r = 1/n².

Note that if in the statement of the problem the condition «a random sample with replacement of size r is extracted» is replaced by the condition «a random sample without replacement of size r is extracted», we get

P(A) = (n−2)_{r−2}/(n)_r = 1/(n(n−1)).
3. The birthday problem. Suppose there are r students in a classroom. Let us find the probability that at least two of them have the same birthday.

Solution. The r birthdays form a sample of size r from the general population of all the days of the year. Years are not all the same in duration, and it is known that the birth rate does not remain constant throughout the year. To a first approximation, we can assume that there are 365 days in a year and that each person can be born on any of the 365 days with the same probability 1/365. Note that under the assumptions made, for a group of r = 366 (or more) persons the required probability is equal to 1, i.e. for r ≥ 366 at least two birthdays certainly coincide. Therefore, consider the case r ≤ 365. In this case, the probability that the r different students in the classroom all have different birthdays is calculated by point a) of Example 2 (with n = 365):

p_r = (365)_r / 365^r = (1 − 1/365)(1 − 2/365)·…·(1 − (r−1)/365).

Then the probability that at least two students have coinciding birthdays can be found as the probability of the opposite event by the formula

q_r = 1 − p_r = 1 − (365)_r / 365^r.
In Table 2 below, the values of the probabilities $q_r$ for different $r$ are given (accurate to the third decimal place).

Table 2

  r  :    4     16     20     22     23     30     40     50     60     64
 q_r :  0.016  0.284  0.411  0.476  0.507  0.706  0.891  0.970  0.994  0.997
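The entries of the table can be reproduced with a few lines of Python (a sketch, not part of the textbook):

```python
def q(r):
    # q_r = 1 - (365)_r / 365**r: probability of at least one coincidence
    p = 1.0
    for i in range(r):
        p *= (365 - i) / 365
    return 1.0 - p

for r in (4, 16, 20, 22, 23, 30, 40, 50, 60, 64):
    print(r, round(q(r), 3))
```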
As can be seen from the table, $q_{22} < 0.5$ and $q_{23} > 0.5$, and the required probability becomes quite close to 1 already for groups that are much smaller than 366 people (a conclusion that is not obvious a priori).

4. The task about coincidences. Let there be $n$ elements arranged in some order, and let them be rearranged at random (all $n!$ permutations being equally probable). What is the probability that at least one element will remain in its place?

Solution. We introduce the events:

A = {in the permutation of $n$ elements, at least one element remains in its place},
$A_i$ = {in the permutation of $n$ elements, the $i$-th element remains in its place},
$i = 1, 2, \ldots, n$. Then $A = A_1 \cup A_2 \cup \ldots \cup A_n$, and to calculate the probability $P(A)$ we use the addition formula for probabilities (formula (11)). Since for any $i$ the number of elements of $A_i$ is $|A_i| = (n-1)!$, we have

$$P(A_i) = \frac{(n-1)!}{n!} = \frac{1}{n}, \quad i = 1, 2, \ldots, n.$$

Because $|A_i A_j| = (n-2)!$ $(i \ne j)$, we have

$$P(A_i A_j) = \frac{(n-2)!}{n!}, \quad i \ne j.$$

Similarly, for distinct $i, j, k$,

$$P(A_i A_j A_k) = \frac{(n-3)!}{n!},$$

etc. Now, by the formula (11),

$$P(A) = P\Big(\bigcup_{i=1}^{n} A_i\Big) = C_n^1 \frac{(n-1)!}{n!} - C_n^2 \frac{(n-2)!}{n!} + C_n^3 \frac{(n-3)!}{n!} - \ldots + (-1)^{n-1} \frac{1}{n!} = 1 - \frac{1}{2!} + \frac{1}{3!} - \ldots + (-1)^{n-1} \frac{1}{n!} = 1 - \Big(1 - 1 + \frac{1}{2!} - \frac{1}{3!} + \ldots + \frac{(-1)^n}{n!}\Big).$$
The expression in parentheses in the last relation is a partial sum (the first $n+1$ terms) of the series expansion of $e^{-1}$. Therefore, as $n \to \infty$,

$$P(A) \to 1 - e^{-1} \approx 0.63212.$$
The accuracy of the approximation can be seen from the following table, giving the exact values of the probabilities:

Table 3

  n   :    3        4        5        6        7
 P(A) :  0.66667  0.62500  0.63333  0.63194  0.63214
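The table values follow directly from the partial sums above; a short check (a Python sketch, with an illustrative function name):

```python
from math import factorial, exp

def p_at_least_one_match(n):
    # P(A) = sum_{k=1}^{n} (-1)^(k-1) / k!
    return sum((-1) ** (k - 1) / factorial(k) for k in range(1, n + 1))

for n in (3, 4, 5, 6, 7):
    print(n, round(p_at_least_one_match(n), 5))
print(1 - exp(-1))  # limiting value, about 0.63212
```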
1.3. Distribution of balls in boxes

Let there be $r$ balls and $n$ boxes, numbered $i = 1, 2, \ldots, n$. Denote the set of boxes by $\Omega_0 = \{1, 2, \ldots, n\}$. Let us first consider the case of distinguishable balls (i.e., balls having some differences from each other: number, color, etc.). Denote by $\Omega$ the sample space corresponding to a random distribution of $r$ balls into $n$ boxes (here and further, «random distribution of balls in boxes» means that any ball can get into any box with the same probability). If we denote by $i_j$ $(j = 1, 2, \ldots, r)$ the number of the box into which ball No. $j$ fell, then the sample space corresponding to the given experiment can be described as follows:

$$\Omega = \{(i_1, i_2, \ldots, i_r) : i_j \in \Omega_0,\ j = 1, 2, \ldots, r\} = \underbrace{\Omega_0 \times \Omega_0 \times \ldots \times \Omega_0}_{r}. \qquad (17)$$

From this we see that the experiment consisting in placing $r$ distinguishable balls into $n$ distinguishable boxes and the experiment corresponding to the choice of a random sample with replacement of size $r$ from a general population of size $n$ are described by the same sample space (see the previous paragraph 1.1).
Remark. Above we used the figurative language of «balls» and «boxes», but the sample space constructed for this scheme allows a large number of interpretations. For the convenience of further references, we now present a number of schemes that are visually very different but essentially equivalent to the abstract arrangement of $r$ balls in $n$ boxes, in the sense that the corresponding outcomes differ only in their verbal description. The probabilities attributed to elementary events may, however, be different in different examples.

Example 5. a) Birthdays. The distribution of the birthdays of $r$ students corresponds to the distribution of $r$ balls into $n = 365$ boxes (it is assumed that there are 365 days in a year).

b) When firing at targets, the bullets correspond to the balls, and the targets to the boxes.

c) In experiments with cosmic rays, the particles that hit Geiger counters play the role of balls, and the counters themselves are the boxes.

d) An elevator rises with $r$ people and stops on $n$ floors. Then the distribution of the people into groups, depending on the floor on which they exit, corresponds to the distribution of $r$ balls into $n$ boxes.

e) The experiment consisting in throwing $r$ dice corresponds to the distribution of $r$ balls into $n = 6$ boxes. If the experiment consists in throwing $r$ symmetric coins, then $n = 2$.

From formula (17) above, according to Theorem 2 of the preceding section, it follows that $|\Omega| = n^r$. The latter means that $r$ distinguishable balls can be distributed over $n$ distinguishable boxes in $n^r$ ways.

In many cases it is necessary to consider the balls indistinguishable (the balls are identical and do not differ from each other in color, shape, weight, etc.). For example, when examining the distribution of birthdays by days of the year, only the number of people born on a particular day is of interest (the number of balls that have fallen into a particular box).
Depending on whether the balls are distinguishable or indistinguishable, the number of possible distributions of the balls in the boxes may be different. Let us give an example.

Example 6. Placement (distribution) of three balls in three boxes.

a) Let the balls be distinguishable. We denote the three balls by $a, b, c$, and each box is represented as an interval between two vertical segments. Then each possible distribution is an indecomposable outcome of the experiment (an elementary event). All possible outcomes of the experiment, consisting in placing three distinguishable balls into three distinguishable boxes, are presented in Table 4.

The event A = {there is a box containing at least two balls} occurs for the distributions 1–21, and we express this by saying that the event is the set of elementary events 1–21. Similarly, the event B = {the first box is not empty} can be described as the set of points (elementary events) 1, 4–15, 22–27. The event C = AB (both A and B occurred) is the set consisting of the thirteen elementary events 1, 4–15. It is easy to see that $A \cup B = \Omega$ = {all the 27 outcomes from Table 4} is a certain event. The event $\bar{A}$ (A did not occur) consists of the points 22–27 and can be described as $\bar{A}$ = {there are no empty boxes}.

Table 4

  №   1    2    3      №   1    2    3      №   1    2    3
  1  abc   –    –     10   a   bc   –     19   –    a   bc
  2   –   abc   –     11   b   ac   –     20   –    b   ac
  3   –    –   abc    12   c   ab   –     21   –    c   ab
  4  ab    c    –     13   a    –   bc    22   a    b    c
  5  ac    b    –     14   b    –   ac    23   a    c    b
  6  bc    a    –     15   c    –   ab    24   b    a    c
  7  ab    –    c     16   –   ab    c    25   b    c    a
  8  ac    –    b     17   –   ac    b    26   c    a    b
  9  bc    –    a     18   –   bc    a    27   c    b    a
Finally, the event {the first box is empty and there is no box containing more than one ball} $= \varnothing$ (an impossible event, i.e. it cannot occur).

b) The balls are indistinguishable. All three balls are the same, which means that we no longer distinguish between such placements as 4, 5, 6; 7, 8, 9; 10, 11, 12; 13, 14, 15; 16, 17, 18; 19, 20, 21; 22–27. Thus, in this case Table 4 is reduced to the following Table 5 (in Table 5, the indistinguishable balls are indicated by asterisks (*)).

Table 5

  №   1    2    3      №   1    2    3
  1  ***   –    –      6   *   **    –
  2   –   ***   –      7   *    –   **
  3   –    –   ***     8   –   **    *
  4  **    *    –      9   –    *   **
  5  **    –    *     10   *    *    *
As we see, in this case the sample space consists of only 10 points.

If the balls are indistinguishable but the boxes are distinguishable (for example, numbered), then each placement is completely described by its «filling numbers» $(r_1, r_2, \ldots, r_n)$, where $r_j$ is the number of balls in the $j$-th box. The filling numbers satisfy the relations

$$r_1 + r_2 + \ldots + r_n = r, \quad r_i \ge 0, \quad i = 1, 2, \ldots, n, \qquad (17)$$

and, conversely, any nonnegative integers $r_1, r_2, \ldots, r_n$ satisfying these relations describe a possible combination of filling numbers.
We now prove the following lemma.

Lemma. In a random placement of $r$ indistinguishable balls into $n$ distinguishable boxes:

a) The number of distinguishable placements (that is, the number of different solutions of equation (17)) is equal to

$$A_{r,n} = C_{n+r-1}^{r} = C_{n+r-1}^{n-1}. \qquad (18)$$

b) The number of distinguishable placements in which not a single box remains empty is equal to $C_{r-1}^{n-1}$.

Proof. a) We denote the balls by asterisks (*) and draw the $n$ boxes in the form of $n$ spaces between $n+1$ vertical dashes. For example, the placement

|**|*| | |***|

denotes the placement of $r = 6$ balls into $n = 5$ boxes so that the boxes contain 2, 1, 0, 0, 3 balls, respectively. With such a notation, dashes necessarily stand at the beginning and at the end, but the remaining $n-1$ dashes and $r$ asterisks can be arranged in an arbitrary order. Hence it follows that the number of distinguishable placements is equal to the number of ways to choose $r$ places for the asterisks from the available $n+r-1$ places (or the number of ways to select $n-1$ places for the dashes from the available $n+r-1$ places), i.e. formula (18) holds.

b) The condition that there is no empty box means that no two dashes stand side by side. Between the $r$ asterisks there are $r-1$ gaps, in $n-1$ of which one dash must be placed, so we have $C_{r-1}^{n-1}$ choices.
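Both counts in the lemma can be verified by brute-force enumeration for small $r$ and $n$ (a Python sketch; the function name is illustrative):

```python
from itertools import product
from math import comb

def placements(r, n, min_fill=0):
    # count all filling-number vectors (r_1,...,r_n) with sum r
    # and every r_i >= min_fill
    return sum(1 for c in product(range(min_fill, r + 1), repeat=n)
               if sum(c) == r)

r, n = 6, 5
print(placements(r, n), comb(n + r - 1, r))              # formula (18): 210
print(placements(r, n, min_fill=1), comb(r - 1, n - 1))  # no empty box: 5
```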
Example 7. a) Simultaneous throwing of $r$ indistinguishable dice gives $C_{r+5}^{5}$ distinguishable outcomes (in formula (18), the number of boxes is $n = 6$).

b) Partial derivatives. It is known that the partial derivatives of order $r$ of an analytic function $f(x_1, x_2, \ldots, x_n)$ of $n$ variables do not depend on the order of differentiation, but only on how many times the function is differentiated with respect to each variable. Thus, each variable plays the role of a box, and, consequently, there are $C_{n+r-1}^{r}$ different partial derivatives of order $r$ (see the lemma). For instance, a function of three variables has fifteen derivatives of the fourth order and 21 derivatives of the fifth order.

c) $r$ indistinguishable balls are randomly placed into $n$ boxes. If we consider all indistinguishable distributions to be equally likely, then the probability that there will not be any empty boxes (of course, in this case $r \ge n$) is equal to (see the lemma)

$$p = \frac{C_{r-1}^{n-1}}{C_{n+r-1}^{n-1}}. \qquad (19)$$
1.3.1. Statistics of Maxwell–Boltzmann, Bose–Einstein and Fermi–Dirac

We consider several important distribution problems arising in the study of certain particle systems in physics and statistical mechanics. In statistical mechanics, phase space is usually divided into a large number $n$ of small regions, or cells, so that each particle is assigned to one cell. As a result, the state of the whole system is described as a random arrangement of $r$ particles (balls) in $n$ cells (boxes).

The Maxwell–Boltzmann system is characterized as a system of $r$ distinguishable (different) particles, each of which can be in one of $n$ cells (states), regardless of where the remaining particles are located. In such a system $n^r$ different arrangements of the $r$ particles in the $n$ cells are possible. If all such arrangements (states of the system) are considered equally probable, then we speak of Maxwell–Boltzmann statistics. Thus, in the Maxwell–Boltzmann system (statistics), the probability of each state (elementary event) is $n^{-r}$.

The Bose–Einstein system is defined as a system of $r$ indistinguishable particles, each of which, independently of the others, can be in one of $n$ cells. Since the particles are indistinguishable, each state of this system is given by the «filling numbers» $r_1, r_2, \ldots, r_n$, where $r_j$ is the number of particles in cell No. $j$. If all states of the system are considered equiprobable, then we speak of Bose–Einstein statistics. Thus, the probability of each state (elementary event) in the Bose–Einstein system is

$$\frac{1}{C_{n+r-1}^{n-1}}$$

(see formula (18)). Note that if in the Bose–Einstein system we additionally require that no cell remain empty in any state of the system (clearly, then it must be that $r \ge n$), then the number of possible states of the system is reduced to $C_{r-1}^{n-1}$ (this we proved above, in part b) of the lemma).

The Fermi–Dirac system is defined as a Bose–Einstein system which, in accordance with the Pauli exclusion principle, additionally requires that no more than one particle be in each cell. Since in this case the particles are also indistinguishable, a state of the system is characterized by the numbers $r_1, r_2, \ldots, r_n$, where $r_j = 0$ or $r_j = 1$ (because each cell can contain at most one particle), $j = 1, 2, \ldots, n$, and necessarily $r \le n$. A state of the system can be specified by indicating the filled cells. The latter can be selected in $C_n^r$ ways ($r$ cells for the $r$ particles can be selected from the $n$ cells in $C_n^r$ ways), so the Fermi–Dirac system has this number of states. If all states are equiprobable, then we speak of Fermi–Dirac statistics. Thus, the probability of each state (elementary event) in Fermi–Dirac statistics is $1/C_n^r$, $r \le n$.

Example 8. $r$ distinguishable (for example, numbered) particles are arranged in $n$ cells according to the Maxwell–Boltzmann system. Find the probabilities of the following events:

a) Exactly $k$ $(0 \le k \le r)$ particles fell into a given cell (say, cell No. 1).

b) Exactly $k$ $(0 \le k \le r)$ particles fell into some cell.
c) Particle No. 1 fell into cell No. 1, and particle No. 2 fell into cell No. 2.

d) At least one cell is empty.

Solution. a) The sample space can be described by formula (17). If $k$ particles fell into a given cell (for example, cell No. 1), then the remaining $r-k$ particles must be placed in the remaining $n-1$ cells. The $k$ particles for cell No. 1 can be selected from the $r$ particles in $C_r^k$ ways, and the remaining $r-k$ particles can be distributed into the remaining $n-1$ cells in $(n-1)^{r-k}$ ways. Then, by the classical definition of probability, the required probability is

$$p_1 = \frac{C_r^k (n-1)^{r-k}}{n^r} = C_r^k \left(\frac{1}{n}\right)^k \left(1 - \frac{1}{n}\right)^{r-k}.$$

b) One cell out of all $n$ cells can be selected in $n$ ways, so the required probability $p_2$ is $n$ times greater than the probability $p_1$, i.e. $p_2 = n p_1$.

c) It is not difficult to see that we need to find the probability of the event

$$A_3 = \{(1, 2, i_3, \ldots, i_r) \in \Omega : i_j \in \Omega_0,\ j = 3, \ldots, r\}, \quad \Omega_0 = \{1, 2, \ldots, n\}.$$

Then the required probability is

$$p_3 = \frac{|A_3|}{|\Omega|} = \frac{n^{r-2}}{n^r} = \frac{1}{n^2}.$$
d) Let us introduce the events:

A = {at least one cell is empty},
$A_k$ = {cell No. $k$ is empty}, $k = 1, 2, \ldots, n$.

Then

$$A = \bigcup_{k=1}^{n} A_k, \quad P(A_k) = \frac{(n-1)^r}{n^r} = \left(1 - \frac{1}{n}\right)^r, \quad 1 \le k \le n.$$

The event $A_k A_l$ $(k \ne l)$ means that the cells $k$ and $l$ are both empty; therefore

$$P(A_k A_l) = \frac{(n-2)^r}{n^r} = \left(1 - \frac{2}{n}\right)^r, \quad 1 \le k, l \le n, \ k \ne l.$$

Similarly, for distinct $k, l, m$:

$$P(A_k A_l A_m) = \frac{(n-3)^r}{n^r} = \left(1 - \frac{3}{n}\right)^r,$$

etc. Finally, by the addition formula for probabilities, the required probability is

$$p_4 = P(A) = n\left(1 - \frac{1}{n}\right)^r - C_n^2\left(1 - \frac{2}{n}\right)^r + \ldots + (-1)^{n-2} C_n^{n-1}\left(1 - \frac{n-1}{n}\right)^r = \sum_{j=1}^{n-1} (-1)^{j-1} C_n^j \left(1 - \frac{j}{n}\right)^r.$$
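The inclusion–exclusion answer of part d) agrees with simulation; a small sketch (Python, with illustrative parameter values):

```python
import random
from math import comb

def p_some_cell_empty(r, n):
    # p_4 = sum_{j=1}^{n-1} (-1)^(j-1) C(n,j) (1 - j/n)**r
    return sum((-1) ** (j - 1) * comb(n, j) * (1 - j / n) ** r
               for j in range(1, n))

random.seed(1)
r, n, trials = 5, 3, 100_000
empty = sum(len({random.randrange(n) for _ in range(r)}) < n
            for _ in range(trials))
print(p_some_cell_empty(r, n), empty / trials)  # both near 93/243
```

For $r = 5$, $n = 3$ the exact value is $3(2/3)^5 - 3(1/3)^5 = 93/243 \approx 0.3827$.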
Example 9. $r$ particles are distributed into $n$ cells, $0 < r \le n$. Assuming that the «particle–cell» system obeys the Maxwell–Boltzmann, the Bose–Einstein, and the Fermi–Dirac statistics, find the probabilities of the following events:

a) There is one particle in each of $r$ predetermined cells.

b) There is one particle in each of some $r$ cells.

Solution. For Maxwell–Boltzmann statistics, the number of all possible distributions is $n^r$, and the number of all possible distributions of $r$ distinguishable particles over $r$ given cells, one per cell, is $r!$. In addition, in case b) the $r$ cells (for the distribution of the $r$ particles) can be selected from the $n$ cells in $C_n^r$ ways. So the required probabilities are

a) $p = \dfrac{r!}{n^r}$;  b) $p = \dfrac{C_n^r \, r!}{n^r} = \dfrac{n!}{(n-r)! \, n^r}$.

In Bose–Einstein statistics the particles are indistinguishable, and the number of all possible distributions is $C_{n+r-1}^{n-1}$; in addition, interchanging particles among the predetermined cells does not give a new distribution. Therefore

a) $p = \dfrac{1}{C_{n+r-1}^{n-1}}$;  b) $p = \dfrac{C_n^r}{C_{n+r-1}^{n-1}} = \dfrac{n!\,(n-1)!}{(n-r)!\,(n+r-1)!}$.

It is easy to see that in the case of Fermi–Dirac statistics the required probabilities are:

a) $p = \dfrac{1}{C_n^r}$;  b) $p = 1$.
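For concrete $r$ and $n$ the three statistics give quite different numbers; the answers of Example 9 can be tabulated as follows (a sketch; the function name is illustrative):

```python
from math import comb, factorial

def example9(r, n):
    # a): one particle in each of r *given* cells; b): in *some* r cells
    mb_a = factorial(r) / n ** r              # Maxwell-Boltzmann
    be_a = 1 / comb(n + r - 1, n - 1)         # Bose-Einstein
    fd_a = 1 / comb(n, r)                     # Fermi-Dirac
    return {"MB": (mb_a, comb(n, r) * mb_a),
            "BE": (be_a, comb(n, r) * be_a),
            "FD": (fd_a, 1.0)}

for name, (a, b) in example9(2, 4).items():
    print(name, a, b)
```

For $r = 2$, $n = 4$ this gives, e.g., $2!/4^2 = 0.125$ for Maxwell–Boltzmann case a) and $1/C_5^3 = 0.1$ for Bose–Einstein case a).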
1.4. Tasks for independent work

1.4.1. Events. Operations on events

1. Winners of sports competitions are encouraged by the following awards: a certificate of honor (event A), a cash prize (event B), a medal (event C). What do the following events mean: a) $A \cup B$; b) $ABC$; c) $(A \cup B) \setminus C$?

2. Prove the equality of the events $A \cup B$ and $A \cup (B \setminus A)$.

3. Two people play chess. Let A = {the first player won}, B = {the second player won}. Describe the following events (below and everywhere further, $\Delta$ is the symmetric difference operation: for any events C and D, $C \Delta D = (C \setminus D) \cup (D \setminus C)$): a) $A \Delta B$; b) $A \Delta \bar{B}$; c) $\bar{A} \Delta \bar{B}$; d) $B \setminus A$.

4. Prove the following equalities:
a) $A \setminus (A \setminus B) = AB$;
b) $\overline{AB} = \bar{A} \cup \bar{B}$;
c) $\overline{A \cup B} = \bar{A}\bar{B}$;
d) $A \cup B = A \cup \bar{A}B$;
e) $A \Delta B = A\bar{B} \cup \bar{A}B$;
f) $A \Delta \bar{B} = \overline{A \Delta B}$;
g) $\overline{\bigcup_{i=1}^{n} A_i} = \bigcap_{i=1}^{n} \bar{A}_i$;
h) $\overline{\bigcap_{i=1}^{n} A_i} = \bigcup_{i=1}^{n} \bar{A}_i$.

5. One of many married couples is chosen at random. The events are A = {the husband is more than 30 years old}, B = {the husband is older than the wife}, C = {the wife is more than 30 years old}. a) Clarify the meaning of the events $ABC$, $A \setminus AB$, $A\bar{B}C$; b) Show that $A\bar{C} \subset B$.

6. Let A and B be some events. Prove that: a) $A \cup B = AB \,\Delta\, (A \Delta B)$; b) $A \setminus B = A \,\Delta\, (AB)$; c) $(A \cup B) \,\Delta\, (AB) = A \Delta B$.

7. Prove that the events $AB \cup A\bar{B} \cup \bar{A}B \cup \bar{A}\bar{B}$ and $A \cup \bar{A}B \cup \bar{A}\bar{B}$ are certain events, and that $(A \cup B)(\bar{A} \cup B)(A \cup \bar{B})(\bar{A} \cup \bar{B})$ is an impossible event.

8. Let $A_1, A_2, \ldots, A_N$ be any events. Prove that:

$$\bigcap_{n=1}^{N} \bigcup_{k=n}^{N} A_k = \bigcup_{n=1}^{N} \bigcap_{k=n}^{N} A_k = A_N.$$

9. Express the following events through the events $A_1, A_2, A_3$:
a) Only the event $A_1$ occurs;
b) $A_1$ and $A_2$ occur, but $A_3$ does not occur;
c) All three events occur;
d) At least one event occurs;
e) At least two events occur;
f) Only one event occurs;
g) Only two events occur;
h) No event occurs;
i) At most two events occur.

10. Let $\{A_n\}$, $n = 1, 2, \ldots$ be a sequence of events, and let $B_m$ denote the event that, among the events $A_1, A_2, \ldots$, the event $A_m$ is the first to occur.
a) Express $B_m$ through $A_1, A_2, \ldots, A_m$;
b) Prove that the events $B_1, B_2, \ldots$ are pairwise disjoint;
c) Express $\bigcup_{n=1}^{\infty} B_n$ through $A_1, A_2, \ldots$.
1.4.2. Finding probabilities by the classical definition of probability

1. The numbers $1, 2, \ldots, n$ are arranged in a random order. Find the probability that the numbers a) 1 and 2; b) 1, 2 and 3 are placed side by side in the indicated order.

2. Which event is more likely: that when throwing six dice a «unit» appears at least once (event A), or that when throwing twelve dice a «unit» appears at least twice (event B)?

3. (The problem of the gambler de Méré.) Which event is more likely: that in four tosses of a die a «six» appears at least once (event A), or that in twenty-four tosses of two dice a pair of «sixes» appears at least once (event B)?

4. a) Find the probability that among three randomly selected digits exactly 2, exactly 1, or none are repeated. b) Solve the same problem for four selected digits.

Remark. Here and further we suppose random digits to be taken from the general population of the digits 0, 1, 2, ..., 9.

5. Find the probability $p_r$ that among $r$ digits randomly selected from a table of random numbers there will not be a single repetition. Using the Stirling formula, find the approximate value of $p_{10}$.

6. $n$ balls are randomly distributed into $n$ boxes. Find the probabilities of the events: a) exactly one box will be empty; b) there will be no empty boxes, in the cases of Maxwell–Boltzmann and Bose–Einstein statistics.

7. A number $a$ is randomly selected from the set $\{1, 2, \ldots, N\}$. Denote by $P_N$ the probability that the number $a^2 - 1$ is a multiple of 10. Find the limit $\lim_{N \to \infty} P_N$.
8. Two numbers $\xi_1$ and $\xi_2$ are chosen from the set $\{1, 2, \ldots, N\}$ sequentially, without replacement. a) Find the probability that the second number will be greater than the first, i.e. the probability $P\{\xi_2 > \xi_1\}$. b) If three numbers are chosen from the given set by the selection scheme without replacement, what is the probability that the third number lies between the first two?

9. There are $n$ places for spectators in a hall. All places are numbered and all tickets are sold. If each spectator chooses a place at random, what is the limit, as $n \to \infty$, of the probability that no spectator sits in his own (indicated on the ticket) place?

10. A die is tossed $n$ times. Find the probabilities of the following events: a) all $n$ times the same number of points falls; b) a «six» occurs at least once; c) a «six» occurs exactly once; d) a «six» occurs exactly twice.
11. There are balls numbered from 1 to $n$ in an urn. If $k$ balls are randomly selected from the urn under the selection scheme without replacement, what is the probability that the numbers of the selected balls (in the order of extraction) form an increasing sequence?

12. Nine passengers are randomly seated in three cars. Find the probabilities of the following events: a) A = {there are 3 passengers in each car}; b) B = {there are 4 passengers in one car, 3 in another and 2 in the third}.

13. $r$ distinguishable balls are arranged in $n$ boxes. Find the probability that the boxes Nos. 1, 2, ..., $n$ will contain $r_1, r_2, \ldots, r_n$ balls, respectively ($r_1 + r_2 + \ldots + r_n = r$, $r_i \ge 0$, $i = 1, 2, \ldots, n$).

14. There are $2n$ cards with the numbers from 1 up to $2n$ and $2n$ envelopes, on which the same numbers are written. The cards are randomly placed in the envelopes (one card in each envelope). Find the probability that, for every envelope, the sum of the number on the envelope and the number on the card lying in it is even.

15. $n$ distinguishable balls are placed into $n$ boxes so that each ball is equally likely to be in any box. Find the probabilities of the following events: a) all the balls are in box No. 1; b) ball No. $m$ is in box No. $l$; c) the balls with the numbers $i_1, i_2, \ldots, i_k$ ($i_j \ne i_m$, $j \ne m$) are in the boxes with the numbers $j_1, j_2, \ldots, j_k$ ($j_l \ne j_m$, $l \ne m$), respectively.

16. Continuation. Suppose that under the conditions of the preceding problem there are $n$ balls and $N$ boxes. How will the corresponding probabilities change?

17. Find the probability that when $r$ balls are placed into $n$ boxes, exactly $m$ boxes will remain empty (assume the balls are indistinguishable and all distinguishable placements are equally likely).

18. $n$ persons sit down in a row in a random order. What is the probability that two specified persons will be side by side? Find the corresponding probability if the $n$ persons sit down at a round table.

19. Continuation. $n$ persons sit down in a row or at a round table in a random order. Find, in both cases, the probability that there will be exactly $r$ people between two specified persons.

20. By the conditions of a game, three dice are thrown, and if the sum of the points thrown does not exceed 10, the first player wins. Find the probability of the first player winning.

21. A person is given $n$ keys, only one of which fits his door. He tests them sequentially (choice without replacement). This process may require 1, 2, ..., $n$ tests. Show that each of these outcomes has probability $1/n$.

22. From an urn containing $n$ balls, balls are extracted successively by the selection scheme with replacement. What is the probability that in $n$ extractions all the balls were extracted?

23. Testing a statistical hypothesis. A university professor was fined twelve times for illegal night car parking. All twelve penalties were imposed on a Tuesday or a Thursday. Find the probability of this event. (Did it make sense to rent a garage only for Tuesdays and Thursdays?)

24. Continuation. Of the twelve fines, none was imposed on a Sunday. Does this testify that fines are not imposed on Sundays?

25. Each of $n$ identical sticks is randomly broken into two parts, a long one and a short one. The $2n$ fragments are then combined into $n$ pairs, each of which forms a new «stick». Find the probabilities that: a) the parts will be joined in the original order; b) every long part will be joined with a short one.
§ 2. Some classical models and distributions

2.1. The Bernoulli scheme. Binomial distribution

Let some experiment be repeated $n$ times, and let an event $A$ either occur or not occur as a result of each repetition (for example, each experiment is a coin toss, and the event $A$ is the appearance of «tails»). If the event $A$ occurred as a result of an experiment, we will say that there was a «success»; if the event $A$ did not occur, we will say that there was a «failure». If we denote the result of the $i$-th experiment by $\omega_i$ and write $\omega_i = 1$ if there was a «success» in the $i$-th experiment and $\omega_i = 0$ if there was a «failure», then the space of elementary events corresponding to the $n$-fold repetition of the original experiment can be described as follows:

$$\Omega = \{\omega = (\omega_1, \omega_2, \ldots, \omega_n) : \omega_i = 0, 1\}.$$
Now let us take two positive numbers $p, q$ such that $p + q = 1$, and define the probability $P(\omega)$ of an elementary event $\omega \in \Omega$ by the formula

$$P(\omega) = p^{\sigma(\omega)} q^{\,n - \sigma(\omega)}, \qquad (1)$$

where $\sigma(\omega) = \omega_1 + \ldots + \omega_n$ is the number of successes. First of all, let us show the correctness of the definition (1), i.e. the validity of the equality

$$P(\Omega) = \sum_{\omega \in \Omega} P(\omega) = 1.$$
Indeed,

$$P(\Omega) = \sum_{\omega \in \Omega} P(\omega) = \sum_{\omega \in \Omega} p^{\sigma(\omega)} q^{\,n-\sigma(\omega)} = \sum_{k=0}^{n} \sum_{\omega \in \Omega:\ \sigma(\omega) = k} p^k q^{\,n-k} = \sum_{k=0}^{n} |A_k|\, p^k q^{\,n-k} = \sum_{k=0}^{n} C_n^k p^k q^{\,n-k} = (p + q)^n = 1.$$

(Above we took into account that for $A_k = \{\omega \in \Omega : \sigma(\omega) = k\}$ the number of its elements is $|A_k| = C_n^k$.)

If now for any event $A \in \mathcal{A} = \{A : A \subset \Omega\}$ we put, by definition (§1, formula (1**)),

$$P(A) = \sum_{\omega \in A} P(\omega),$$

then we obtain a finite probability space $(\Omega, \mathcal{A}, P)$ (see §1).
If $n = 1$, then the sample space consists of only two points, $\omega = 1$ («success») and $\omega = 0$ («failure»): $\Omega = \{0, 1\}$. Naturally, in this case the probability $P(1) = p$ is called the probability of success, and the probability $P(0) = q = 1 - p$ the probability of failure.
The sequence of trials (experiments) described above, in which the probability is defined by formula (1), is called the Bernoulli scheme (or a Bernoulli series of independent trials; we will explain later why it is so called). For the Bernoulli scheme with probability of success $p$, the probability of the event

$$A_k = \{\text{exactly } k \text{ successes occur in } n \text{ Bernoulli trials}\}$$

is equal to

$$P_n(k) = P(A_k) = \sum_{\omega \in A_k} P(\omega) = \sum_{\omega \in A_k} p^k q^{\,n-k} = |A_k|\, p^k q^{\,n-k} = C_n^k p^k q^{\,n-k}.$$

That these numbers are indeed probabilities follows from the relation

$$\sum_{k=0}^{n} P(A_k) = \sum_{k=0}^{n} C_n^k p^k q^{\,n-k} = 1.$$
A set of probabilities $\{P(A_0), P(A_1), P(A_2), \ldots, P(A_n)\}$ is called a binomial distribution (the binomial distribution of the number of successes in a sample of size $n$). This distribution arises in a wide variety of probability models and plays an extremely important role in probability theory. To illustrate the nature of this distribution, the graph of these probabilities for $p = 1/2$ («the symmetric case») and $n = 5$, $n = 10$ is given in Fig. 1.

Fig. 1. The graph of the binomial probabilities $P_n(k)$ for $p = 1/2$ and $n = 5, 10$.

In connection with Fig. 1 we pay attention to the following: the probability $P_5(k)$ takes its greatest value at two points, $k_1^* = 2$ and $k_2^* = 3$, while the probability $P_{10}(k)$ takes its greatest value at only one point, $k^* = 5$. To understand this difference, let us consider the behaviour of the probabilities

$$P_n(k) = P(A_k) = P(n, k) = C_n^k p^k q^{\,n-k}$$

as $k$ changes.
We can write

$$\frac{P(n, k+1)}{P(n, k)} = \frac{n!\, p^{k+1} q^{\,n-k-1}}{(k+1)!\,(n-k-1)!} \cdot \frac{k!\,(n-k)!}{n!\, p^k q^{\,n-k}} = \frac{p(n-k)}{q(k+1)}.$$

Further, since the inequality $P(n, k+1) > P(n, k)$ is equivalent to the inequality $(n+1)p > k+1$ (or $np - q > k$), the probability $P(n, k)$ increases in the transition from $k$ to $k+1$ while $(n+1)p > k+1$; conversely, if $(n+1)p < k+1$ (or $np - q < k$), then $P(n, k)$ decreases in the transition from $k$ to $k+1$. If $(n+1)p = k+1$ (or $np - q = k$), then $P(n, k+1) = P(n, k)$.

Definition. The value of $k$ at which the probability $P(n, k)$, as a function of $k$, takes its greatest value $P(n, k^*) = \max_{0 \le k \le n} P(n, k)$ is called the most likely number of successes.

From the definition and the above arguments we obtain the following statement:

If $(n+1)p$ is not an integer, then $k^* = [(n+1)p]$, where $[a]$ is the integer part of the number $a$;

If $(n+1)p$ is an integer, then there are two most likely numbers of successes: $k_1^* = (n+1)p - 1$ and $k_2^* = (n+1)p$.
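The statement about the most likely number of successes is easy to check directly against the maximum of $P(n, k)$ (a Python sketch; function names are illustrative):

```python
from math import comb

def pmf(n, k, p):
    # P(n, k) = C(n, k) p^k q^(n-k)
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def most_likely(n, p):
    # two modes when (n+1)p is an integer, otherwise one
    x = (n + 1) * p
    return (int(x) - 1, int(x)) if x == int(x) else (int(x),)

print(most_likely(5, 0.5))   # (2, 3): the two peaks of P_5(k) in Fig. 1
print(most_likely(10, 0.5))  # (5,): the single peak of P_10(k)
# cross-check against the maximum of the pmf
assert max(range(11), key=lambda k: pmf(10, k, 0.5)) == 5
```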
In conclusion, we note that for a sequence of $n$ independent Bernoulli trials with probability of success $p$, the probabilities of the events a) there was not a single success, b) there was at least one success, c) there were at least $k_1$ and at most $k_2$ successes, can be found by the following formulas (prove!):

a) $P(n, 0) = q^n = (1-p)^n$,

b) $\displaystyle\sum_{k=1}^{n} P(n, k) = 1 - P(n, 0) = 1 - q^n = 1 - (1-p)^n$, $\qquad (2)$

c) $\displaystyle\sum_{k=k_1}^{k_2} P(n, k) = \sum_{k=k_1}^{k_2} C_n^k p^k q^{\,n-k}$.
2.2. Polynomial scheme. The polynomial distribution

We generalize the binomial scheme to the case where each experiment can have $r$ outcomes $A_1, A_2, \ldots, A_r$ $(r \ge 2)$. Let us denote by $\omega_i$ the result of the $i$-th experiment and write $\omega_i = a_j$ if the outcome $A_j$ occurs as a result of the $i$-th experiment $(i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, r)$.

Then the sample space (corresponding to a sequence of $n$ independent experiments) is:

$$\Omega = \{\omega = (\omega_1, \omega_2, \ldots, \omega_n) : \omega_i \in \Omega_0,\ i = 1, 2, \ldots, n\}, \quad \Omega_0 = \{a_1, \ldots, a_r\}.$$
We denote by $\nu_i(\omega)$ the number of elements of the sequence $\omega = (\omega_1, \omega_2, \ldots, \omega_n)$ equal to $a_i$. In other words, $\nu_i(\omega)$ is the number of occurrences of the outcome $A_i$ in the $n$ trials:

$$\nu_i(\omega) = \sum_{j=1}^{n} I_{\{\omega:\ \omega_j = a_i\}}(\omega), \qquad (3)$$

where $I_A(\omega)$ is the indicator of the event A:

$$I_A(\omega) = 1,\ \omega \in A; \qquad I_A(\omega) = 0,\ \omega \notin A.$$

Let us now determine the probabilities of the elementary events $\omega \in \Omega$ by the formula

$$P(\omega) = p_1^{\nu_1(\omega)} p_2^{\nu_2(\omega)} \ldots p_r^{\nu_r(\omega)}, \qquad (4)$$
where $p_i > 0$, $p_1 + p_2 + \ldots + p_r = 1$.

Let us show that the definition of probability by formula (4) is correct, i.e. that

$$P(\Omega) = \sum_{\omega \in \Omega} P(\omega) = 1.$$

Indeed,

$$\sum_{\omega \in \Omega} P(\omega) = \sum_{\omega \in \Omega} p_1^{\nu_1(\omega)} p_2^{\nu_2(\omega)} \ldots p_r^{\nu_r(\omega)} = \sum_{\substack{n_1 \ge 0, \ldots, n_r \ge 0 \\ n_1 + \ldots + n_r = n}} \ \sum_{\substack{\omega:\ \nu_1(\omega) = n_1, \\ \ldots,\ \nu_r(\omega) = n_r}} p_1^{n_1} p_2^{n_2} \ldots p_r^{n_r} = \sum_{\substack{n_1 \ge 0, \ldots, n_r \ge 0 \\ n_1 + \ldots + n_r = n}} C_n(n_1, n_2, \ldots, n_r)\, p_1^{n_1} p_2^{n_2} \ldots p_r^{n_r},$$

where $C_n(n_1, n_2, \ldots, n_r)$ is the total number of elementary events $\omega = (\omega_1, \omega_2, \ldots, \omega_n)$ with $n_1$ elements $a_1$, $n_2$ elements $a_2$, ..., $n_r$ elements $a_r$. Then, by formula (16) from §1,

$$C_n(n_1, n_2, \ldots, n_r) = \frac{n!}{n_1!\, n_2!\, \ldots\, n_r!}.$$

Therefore

$$\sum_{\omega \in \Omega} P(\omega) = \sum_{\substack{n_1 \ge 0, \ldots, n_r \ge 0 \\ n_1 + n_2 + \ldots + n_r = n}} \frac{n!}{n_1!\, n_2!\, \ldots\, n_r!}\, p_1^{n_1} \ldots p_r^{n_r} = (p_1 + p_2 + \ldots + p_r)^n = 1,$$
and this proves the correctness of the definition (4).

Let the event $A_{n_1, n_2, \ldots, n_r}$ mean that as a result of $n$ independent trials the outcome $A_1$ appeared $n_1$ times, ..., the outcome $A_r$ appeared $n_r$ times:

$$A_{n_1, n_2, \ldots, n_r} = \{\omega \in \Omega : \nu_1(\omega) = n_1,\ \nu_2(\omega) = n_2,\ \ldots,\ \nu_r(\omega) = n_r\}.$$

The probability of this event is equal to

$$P(A_{n_1, n_2, \ldots, n_r}) = P(n;\, n_1, n_2, \ldots, n_r) = \frac{n!}{n_1!\, n_2!\, \ldots\, n_r!}\, p_1^{n_1} p_2^{n_2} \ldots p_r^{n_r}. \qquad (5)$$
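Formula (5) and the normalization proved above can be checked numerically (a Python sketch, with an arbitrary choice of $p_1, p_2, p_3$):

```python
from math import factorial

def multinomial_p(n, counts, probs):
    # P(n; n_1,...,n_r) = n!/(n_1!...n_r!) * p_1^n_1 ... p_r^n_r
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)
    for c, p in zip(counts, probs):
        coef *= p ** c
    return coef

n, probs = 4, (0.2, 0.3, 0.5)
total = sum(multinomial_p(n, (a, b, n - a - b), probs)
            for a in range(n + 1) for b in range(n + 1 - a))
print(total)  # sums (up to rounding) to 1, as the correctness proof requires
```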
A set of probabilities $\{P(n;\, n_1, n_2, \ldots, n_r)\}$ is called a polynomial (or multinomial) distribution, and the constructed probability model is called a polynomial scheme. It is clear that the binomial scheme is a special case of the polynomial scheme.

If $n = 1$, i.e. only one experiment takes place, then the event $\{\nu_i = 1,\ \nu_j = 0,\ j \ne i\}$ means that only the outcome $A_i$ occurs as a result of the experiment, and by formula (5) the probability of this event is $P(A_i) = p_i$. Thus, for the polynomial scheme, the probabilities $p_1, p_2, \ldots, p_r$ have the meaning of the probabilities of occurrence (in one experiment) of the outcomes $A_1, A_2, \ldots, A_r$, respectively. Similarly, for $i \ne j$ we have $P(A_i A_j) = p_i p_j = P(A_i) P(A_j)$, etc., which means independence of the events $A_1, A_2, \ldots, A_r$.

2.3. Hypergeometric and multidimensional hypergeometric distributions

Let the general population $\Omega_0$ contain $n_1$ elements of the first kind $a_1, a_2, \ldots, a_{n_1}$;
$n_2$ elements of the second kind $b_1, b_2, \ldots, b_{n_2}$; in total $n_1 + n_2 = n$ elements:

$$\Omega_0 = \{a_1, a_2, \ldots, a_{n_1};\ b_1, b_2, \ldots, b_{n_2}\}, \quad |\Omega_0| = n_1 + n_2 = n.$$

The question is: if from this general population a random sample of size $k$ is taken without replacement, what is the probability that among its elements there will be exactly $k_1$ elements of the first kind and exactly $k_2 = k - k_1$ elements of the second kind? (It is clear that $k \le n$, $k_i \le \min(n_i, k)$, $i = 1, 2$.)
This problem can be formulated differently: from an urn containing $n_1$ white and $n_2 = n - n_1$ black balls, $k$ balls are randomly selected. What is the probability that exactly $k_1$ white and $k_2 = k - k_1$ black balls will be among them?

The space of elementary events corresponding to this problem can be described, for example, as follows:

$$\Omega = \{\omega = (\omega_1, \omega_2, \ldots, \omega_k) : \omega_i \in \Omega_0,\ \omega_i \ne \omega_j\ (i \ne j),\ i, j = 1, 2, \ldots, k\}.$$

Then $|\Omega| = (n)_k$, and the number of elements of $\Omega$ with $k_1$ elements of the first kind and $k_2$ elements of the second kind is equal to $C_k^{k_1} (n_1)_{k_1} (n_2)_{k_2}$. Then the required probability, according to the classical definition, is

$$P_{n, n_1}(k, k_1) = \frac{C_k^{k_1} (n_1)_{k_1} (n_2)_{k_2}}{(n)_k} = \frac{C_{n_1}^{k_1} C_{n_2}^{k_2}}{C_n^k} = \frac{C_{n_1}^{k_1} C_{n-n_1}^{k-k_1}}{C_n^k}. \qquad (6)$$
A set of probabilities $\{P_{n,n_1}(k, k_1)\}$ is called a hypergeometric distribution. This distribution could also be obtained as follows: from the $n$ balls in the urn we can select $k$ balls in $C_n^k$ ways, and from $n_1$ white and $n_2 = n - n_1$ black balls we can select $k_1$ white and $k_2 = k - k_1$ black balls in $C_{n_1}^{k_1} C_{n-n_1}^{k-k_1}$ ways (because any set of white balls can be combined with any set of black balls). Using the properties of binomial coefficients, we see that the probabilities $P_{n,n_1}(k, k_1)$ can also be calculated by the formula

$$P_{n,n_1}(k, k_1) = \frac{C_k^{k_1} C_{n-k}^{n_1 - k_1}}{C_n^{n_1}}. \tag{6′}$$

In formulas (6) and (6′), as we have already noted, $k_1 = 0, 1, 2, \ldots, \min(n_1, k)$.
Since $C_m^s = 0$ for $s > m$, the probabilities defined by formulas (6), (6′) are equal to zero for $k_1 > n_1$ or $k_1 > k$. Taking this into account, we may assume in formulas (6), (6′) that $k_1$ varies from 0 to $k$. The numbers $P_{n,n_1}(k, k_1)$ form a probability distribution, therefore

$$\sum_{k_1=0}^{k} P_{n,n_1}(k, k_1) = 1,$$

and, summing (6) over $k_1$ from 0 to $k$, we obtain the following property of binomial coefficients:

$$\sum_{k_1=0}^{k} C_{n_1}^{k_1} C_{n-n_1}^{k-k_1} = C_n^k. \tag{7}$$
If now the population $\Omega_0$ of size $n$ contains $n_1$ elements of the 1st type $a_1, a_2, \ldots, a_{n_1}$; $n_2$ elements of the 2nd type $b_1, b_2, \ldots, b_{n_2}$; ...; $n_r$ elements of type $r$: $c_1, c_2, \ldots, c_{n_r}$ ($r \ge 2$, $n_1 + n_2 + \cdots + n_r = n$), then, repeating the above arguments, we obtain: the probability that a sample of size $k$, selected at random from the general population without replacement, contains exactly $k_1$ elements of the 1st type, $k_2$ elements of the 2nd type, ..., $k_r$ elements of type $r$, equals

$$P_{n,n_1,\ldots,n_r}(k, k_1, \ldots, k_r) = \frac{C_{n_1}^{k_1} C_{n_2}^{k_2} \cdots C_{n_r}^{k_r}}{C_n^k}. \tag{8}$$
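Formula (8) admits the same kind of numerical sketch (the group sizes and counts below are invented for illustration):

```python
from math import comb, prod

def multi_hypergeom_pmf(ns, ks):
    """Formula (8): ns = (n1,...,nr) are the group sizes, ks = (k1,...,kr)
    the required counts in a sample of size k = sum(ks), drawn without
    replacement from a population of size n = sum(ns)."""
    n, k = sum(ns), sum(ks)
    return prod(comb(ni, ki) for ni, ki in zip(ns, ks)) / comb(n, k)

# invented example: 4 + 3 + 5 = 12 elements of three types, sample of size 5
p = multi_hypergeom_pmf((4, 3, 5), (2, 1, 2))
assert abs(p - comb(4, 2) * comb(3, 1) * comb(5, 2) / comb(12, 5)) < 1e-15
```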
The set of probabilities $\{P_{n,n_1,\ldots,n_r}(k, k_1, \ldots, k_r)\}$ is called a multidimensional hypergeometric distribution.

The structure of the hypergeometric (especially the multidimensional hypergeometric) distribution is rather complex. For example, the probability $P_{n,n_1}(k, k_1)$ contains nine factorials. Therefore, formulas for the approximate calculation of hypergeometric probabilities are very important. Let us give one result in this direction.

Theorem 1 (Approximation of the hypergeometric distribution by the binomial distribution). Let $n_1 + n_2 = n$; let $n \to \infty$, $n_1 \to \infty$ in such a way that $\frac{n_1}{n} \to p \in (0, 1)$, i.e. $\frac{n_2}{n} \to 1 - p \in (0, 1)$. Then

$$P_{n,n_1}(k, k_1) = \frac{C_{n_1}^{k_1} C_{n-n_1}^{k-k_1}}{C_n^k} \to P(k, k_1) = C_k^{k_1} p^{k_1} (1-p)^{k-k_1}. \tag{9}$$

Proof. In formula (6), obtained for the probabilities $P_{n,n_1}(k, k_1)$, divide the numerator and the denominator by $n^k$ and pass to the limit as $n \to \infty$, $n_1 \to \infty$. Using the conditions of the theorem, we have

$$P_{n,n_1}(k, k_1) = \frac{C_{n_1}^{k_1} C_{n_2}^{k_2}}{C_n^k} = \frac{k!}{k_1!\,k_2!} \cdot \frac{n_1(n_1-1)\cdots(n_1-k_1+1)\; n_2(n_2-1)\cdots(n_2-k_2+1)}{n(n-1)(n-2)\cdots(n-k+1)} =$$

$$= \frac{k!}{k_1!\,k_2!} \cdot \frac{\dfrac{n_1}{n}\left(\dfrac{n_1}{n}-\dfrac{1}{n}\right)\cdots\left(\dfrac{n_1}{n}-\dfrac{k_1-1}{n}\right) \cdot \dfrac{n_2}{n}\left(\dfrac{n_2}{n}-\dfrac{1}{n}\right)\cdots\left(\dfrac{n_2}{n}-\dfrac{k_2-1}{n}\right)}{1 \cdot \left(1-\dfrac{1}{n}\right)\left(1-\dfrac{2}{n}\right)\cdots\left(1-\dfrac{k-1}{n}\right)} \to$$

$$\to C_k^{k_1} p^{k_1} (1-p)^{k-k_1} = P(k, k_1). \qquad \square$$
The assertion of the theorem shows that under the assumptions made the hypergeometric distribution is approximated by the binomial distribution, which is intuitively clear: if $n$ and $n_1$ are large, then sampling without replacement should give almost the same result as sampling with replacement.

As we see, when calculating probabilities by the binomial or hypergeometric distributions, we must compute factorials of rather large numbers. The numbers $n!$ grow very rapidly with $n$ (e.g., $15! = 1\,307\,674\,368\,000$, and $100!$ contains 158 digits). Therefore, both from the theoretical and from the computational point of view, the well-known Stirling formula is important: for $n \gg 1$ ($n$ a sufficiently large number)

$$n! = \sqrt{2\pi n}\; n^n e^{-n} e^{\frac{\theta_n}{12n}}, \qquad 0 < \theta_n < 1. \tag{10}$$
Example (Playing the Sports Lottery). An urn contains 49 (forty-nine) identical balls with the numbers 1, 2, ..., 49, and among them there are 6 (six) balls with winning (lucky) numbers. Six balls are selected from the urn at random without replacement. The winning is determined by the number of winning balls among the selected ones. Find the probability $p_r$ of retrieving exactly $r$ ($r = 0, 1, 2, \ldots, 6$) balls with winning numbers from the urn.

Solution. As one can easily guess, the required probabilities can be found with the aid of the hypergeometric distribution: the total number of balls in the urn is $n = 49$, $n_1 = 6$ of them are winning; we select $k = 6$ balls, $k_1 = r$ of which are winning. Then, by formula (6),

$$p_r = P_{49,6}(6, r) = \frac{C_6^r\, C_{43}^{6-r}}{C_{49}^6}, \qquad r = 0, 1, \ldots, 6.$$

Calculations show that

$$p_0 \approx 0.435965,\quad p_1 \approx 0.413019,\quad p_2 \approx 0.132378,\quad p_3 \approx 0.017650,$$
$$p_4 \approx 0.000969,\quad p_5 \approx 0.000018,\quad p_6 \approx 0.00000007151.$$

So, the probability of the maximal gain is $p_6 \approx 7.2 \cdot 10^{-8}$, and the probability of a gain (i.e. the probability of drawing from the urn at least three balls with winning numbers) is

$$p = p_3 + p_4 + p_5 + p_6 = 1 - (p_0 + p_1 + p_2) = 1 - 0.981362 = 0.018638.$$
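The lottery probabilities above can be reproduced with a few lines of Python (an illustrative check of the stated values):

```python
from math import comb

total = comb(49, 6)                 # 13 983 816 equally likely draws
p = [comb(6, r) * comb(43, 6 - r) / total for r in range(7)]

assert abs(sum(p) - 1) < 1e-12
assert round(p[0], 6) == 0.435965
assert round(p[3], 6) == 0.017650
assert abs(p[6] - 7.15e-8) < 1e-9                    # maximal gain
assert round(p[3] + p[4] + p[5] + p[6], 6) == 0.018638   # any gain
```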
In connection with the hypergeometric distribution, let us make one remark about the nature of problems in probability theory and mathematical statistics. Knowing the composition of the general population, we can determine, using the hypergeometric distribution, what the composition of the sample will look like. This is a typical direct problem of probability theory. But often it is necessary to solve inverse problems, i.e. to determine the nature of the population from the composition of the samples. Such (figuratively speaking) inverse problems form the content of mathematical statistics.

2.4. Tasks for independent work
1. Show that for a polynomial distribution $\{P(n; n_1, n_2, \ldots, n_r)\}$ (see formula (5)) the greatest probability value is attained at a point $(k_1, k_2, \ldots, k_r)$ satisfying the inequalities $np_i - 1 \le k_i \le (n + r - 1)p_i$, $i = 1, 2, \ldots, r$.

2. Using probabilistic considerations, prove the following relations:

a) $\sum_{k=0}^{n} C_n^k = 2^n$;

b) $\sum_{k=0}^{n} (C_n^k)^2 = C_{2n}^n$;

c) $\sum_{k=0}^{n} k\, C_n^k = n\, 2^{n-1}$;

d) $\sum_{k=0}^{n} k(k-1)\, C_n^k = n(n-1)\, 2^{n-2}$ $(n \ge 2)$.

3. There are $n$ white and $m$ black balls in an urn ($n \ge 2$, $m \ge 2$). Two balls are taken at random (without replacement) out of the urn. Find the probabilities of the events: a) the balls have the same color; b) the balls have different colors.

4. There are $n$ tickets with the numbers $1, 2, \ldots, n$, and $r$ of them are winning. Someone bought $r$ tickets. Find the probability that at least one of his tickets is winning.

5. The numbers 2, 4, 6, 7, 8, 11, 12 and 13 are written on eight cards. Two cards are chosen at random. Find the probability that the fraction composed of the two numbers written on these cards can be reduced.

6. To reduce the number of participating teams in a sports competition, the $2n$ participating teams are divided into two equal groups. Find the probability that the two strongest teams will fall into: a) different groups; b) the same group.

7. Ten numbers are randomly chosen from the set of numbers 1, 2, ..., 20. Find the probabilities of the following events: $A = \{$all numbers are even$\}$; $B = \{$exactly three numbers are multiples of four$\}$; $C = \{$there are five even and five odd numbers, and exactly one number is a multiple of ten$\}$.

8. One of the nonempty subsets of an $n$-element set is chosen at random. Find the probability that the selected subset contains an even number of elements.

9. Five balls of different colors are in an urn. A sample of size 25 with replacement is taken out of the urn. Find the probability that the sample contains five balls of each of the five colors.

10. Two balls are randomly selected (with replacement) from an urn containing white and black balls. Prove that the probability of choosing balls of the same color is not less than 1/2.
§3. Geometric probabilities

Let $\Omega = \{\omega\}$ be a bounded subset of an $n$-dimensional Euclidean space $R^n$. We will assume that for $\Omega$ the concept of «volume» makes sense (for $n = 1$ it is length, for $n = 2$ area, for $n = 3$ the usual volume, etc.). We denote by $E = E(\Omega)$ the system of subsets of $\Omega$ (events) which have «volumes», and for any event $A \in E(\Omega)$ we define its probability by the relation

$$P(A) = \frac{mes(A)}{mes(\Omega)}, \tag{1}$$

where $mes(A)$ is the «volume» of the event (the set) $A$. The definition of probability by formula (1) is called the geometric definition of probability. The constructed model can be considered as a model of an experiment consisting in the random throwing of a point into the domain $\Omega$. (Here and in what follows we understand expressions of the type «a point is thrown at random into the domain $\Omega$» or «a random point is uniformly distributed in the domain $\Omega$» as follows: «the point thrown at random into the domain $\Omega$ can land at any point of $\Omega$, and the probability of this point falling into some part $A$ of the domain $\Omega$ is proportional to the «volume» of this part and does not depend on the form and location of this part in $\Omega$».)

Examples. 1. A random point is placed on a segment of length $l$ (say, the segment $[0, l]$); as a result, the segment is divided into two parts. Find the probability that the length of the larger part does not exceed $\frac{4}{5}l$ (event $A$).

Solution. Denote by $x$ the length of one of the parts; then the length of the second part equals $l - x$ (Fig. 1).
Fig. 1

Then the sample space is

$$\Omega = \{x : 0 \le x \le l\} = [0, l],$$

and the desired event is

$$A = \left\{x \in \Omega : \max(x, l - x) \le \tfrac{4}{5}l\right\} = \left[\tfrac{1}{5}l,\ \tfrac{4}{5}l\right].$$

Therefore, by formula (1),

$$P(A) = \frac{mes(A)}{mes(\Omega)} = \frac{\frac{3}{5}l}{l} = \frac{3}{5}.$$
2. At a random moment of time $x$ a signal of duration $\Delta$ appears on the time segment $[0, T]$. The receiver is switched on at a random time point $y \in [0, T]$ for a time $t$. Find the probability that the receiver detects the signal.

Solution. The sample space is the domain

$$\Omega = \{(x, y) : 0 \le x, y \le T\} = [0, T] \times [0, T].$$

If the signal appears first and the receiver is switched on later, i.e. if $x \le y$, then the signal is detected only when $y - x \le \Delta$. Similarly, if $y \le x$, then the signal can be detected only if $x - y \le t$. Thus, the event we need,

$$A = \{(x, y) \in \Omega : y - x \le \Delta,\ y \ge x\} \cup \{(x, y) \in \Omega : x - y \le t,\ x \ge y\},$$

is the region shaded in Fig. 2.

Fig. 2

We find the required probability by formula (1):

$$P(A) = \frac{T^2 - \frac{1}{2}(T - \Delta)^2 - \frac{1}{2}(T - t)^2}{T^2} = 1 - \frac{1}{2}\left(1 - \frac{\Delta}{T}\right)^2 - \frac{1}{2}\left(1 - \frac{t}{T}\right)^2. \tag{2}$$
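Formula (2) can be evaluated exactly with rational arithmetic. The sketch below (illustrative, not part of the textbook) uses the parameter values that appear in the meeting problem of Example 3:

```python
from fractions import Fraction as F

def detect_prob(T, delta, t):
    """Formula (2), computed exactly:
    1 - (1 - delta/T)**2 / 2 - (1 - t/T)**2 / 2."""
    return 1 - F(1, 2) * (1 - F(delta, T)) ** 2 - F(1, 2) * (1 - F(t, T)) ** 2

# T = 60 min, delta = 20 min, t = 15 min  ->  143/288
assert detect_prob(60, 20, 15) == F(143, 288)
# T = 60 min, delta = t = 15 min  ->  7/16
assert detect_prob(60, 15, 15) == F(7, 16)
```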
3. A problem about a meeting. Two people, A and B, agreed to meet in the time interval $[0, T]$. If A (or B) arrives first at the agreed place, then he waits for B (or A) for a period of time $\Delta$ (or $t$) and, if the latter does not appear, leaves. Find the probability that the meeting will take place.

Solution. Note that this is task 2 rephrased: the signal is A, the receiver is B. Consequently, the probability of the meeting is found from formula (2). For example, if $T = 1$ hour, $t = 15$ minutes, $\Delta = 20$ minutes, then $p = 143/288$. If $T = 1$ hour, $t = \Delta = 15$ minutes, then $p = 7/16$.

4. The Buffon problem. Parallel straight lines, separated by a distance $2a$, are drawn in the plane. A needle of length $2l$ ($l < a$) is thrown at random onto the plane. What is the probability that the needle crosses one of the parallel lines?

Solution. First we describe the space of elementary events corresponding to this experiment. Let $x$ be the distance from the center of the needle to the nearest straight line and $\varphi$ the angle between the needle and that line. Then the pair $(\varphi, x)$ fully determines the position of the needle relative to the nearest straight line (Fig. 3). Since it suffices to know the position of the needle relative to the nearest line, the space of elementary events $\Omega$ is the rectangle

$$\Omega = \{(\varphi, x) : 0 \le \varphi \le \pi,\ 0 \le x \le a\} = [0, \pi] \times [0, a].$$

The needle can intersect the line only if the condition $x \le l \sin\varphi$ holds. Therefore, the event we need, $A = \{(\varphi, x) \in \Omega : x \le l \sin\varphi\}$, is the region shaded in Fig. 4. Then

$$P(A) = \frac{mes(A)}{mes(\Omega)} = \frac{\int_0^{\pi} l \sin\varphi \, d\varphi}{a\pi} = \frac{2l}{a\pi}.$$

Fig. 3
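The answer $\frac{2l}{a\pi}$ can be cross-checked by simulation. The sketch below (illustrative; the values of $a$ and $l$ are invented) draws $(\varphi, x)$ uniformly from $\Omega = [0, \pi] \times [0, a]$ and counts the crossings $x \le l \sin\varphi$:

```python
import random
from math import sin, pi

def buffon_estimate(a, l, trials=200_000, seed=1):
    """Monte Carlo estimate of the crossing probability: sample (phi, x)
    uniformly from [0, pi] x [0, a] and count the event x <= l*sin(phi)."""
    rng = random.Random(seed)
    hits = sum(rng.uniform(0, a) <= l * sin(rng.uniform(0, pi))
               for _ in range(trials))
    return hits / trials

a, l = 1.0, 0.5
est, exact = buffon_estimate(a, l), 2 * l / (a * pi)
assert abs(est - exact) < 0.02    # exact value 2l/(a*pi), about 0.3183
```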
Fig. 4

3.1. Tasks for independent work
1. Three points are placed at random on the half-line $[0, \infty)$. Find the probability that a triangle can be made from the three segments joining the point zero («0») to the given three points.

2. Two points are placed at random on a segment of length $l$. Find the probability that a triangle can be made from the three segments formed.

3. Three points, one after another, are placed at random on a line segment. Find the probability that the third point falls between the first two.

4. A random point X is placed on a segment AB of length $a$, then a random point Y is placed on a segment BC of length $b$. Assuming that the points A, B, C lie on the line in this order, find the probability that a triangle can be formed from the segments AX, BY, XY.

5. A random point is thrown into a sphere of radius R. Find the probability that the distance from this point to the center of the sphere does not exceed $r$.

6. A random point is placed in a square. Find the probability that the distance from this point to the vertices of the square exceeds half of the length of the side of the square.

7. A random point A is placed in a square with side $a$. Find the probability that the distance from A to the nearest side of the square does not exceed the distance from A to the nearest diagonal of the square.

8. The point X is randomly placed on the semicircle $C = \{(x, y) : x^2 + y^2 = R^2,\ y \ge 0\}$. Find the probabilities of the following events: a) the abscissa of the point X lies on the segment $[-r, r]$; b) the ordinate of the point lies on the segment $[r, R]$.

9. The plane is ruled with parallel straight lines at the same distance $a$ from each other. A coin (circle) of radius $r$ ($r < \frac{a}{2}$) is thrown at random onto the plane. Find the probability that the coin does not intersect any of the lines.

10. The Bertrand paradox. Two points are randomly chosen on a circle of radius $r$ and connected by a chord. Find the probability that the length of the chord exceeds $\sqrt{3}\, r$ (that is, the length of the side of an equilateral triangle inscribed in the circle).

11. Continuation. A point is randomly chosen on a circle of radius $r$, and a diameter is drawn through it. A random point (the middle of the chord perpendicular to the diameter) is taken on the diameter. Find the probability that the length of the obtained chord exceeds $\sqrt{3}\, r$.

12. Continuation. A point is placed at random inside a circle of radius $r$. This point is the middle of the chord perpendicular to the diameter passing through it. Find the probability that the length of the obtained chord exceeds $\sqrt{3}\, r$.

13. Two points are placed at random on the segments $[-a, a]$ and $[-b, b]$, $a > 0$, $b > 0$; let $p$ and $q$ be their coordinates (respectively). Find the probability that the roots of the quadratic equation $x^2 + px + q = 0$ are real numbers.

14. A segment of length $a_1 + a_2$ is divided into two parts of lengths $a_1$ and $a_2$, respectively. $n$ points are placed at random on this segment. Find the probability that exactly $m$ of the $n$ points fall on the part of length $a_1$.

15. Continuation. A segment of length $a_1 + a_2 + \cdots + a_s$ is divided into $s$ parts of lengths $a_1, a_2, \ldots, a_s$. $n$ points are placed at random on this segment. Find the probability that $m_1, m_2, \ldots, m_s$ ($m_1 + m_2 + \cdots + m_s = n$) points fall on the parts of lengths $a_1, a_2, \ldots, a_s$ (respectively).
Chapter II

PROBABILITY SPACE

§1. Axioms of probability theory. General probability space

In Chapter I we considered the discrete space of elementary events and introduced the notion of a discrete probability space (Chapter I, §2). There, an event is a subset of the discrete space of elementary events $\Omega = \{\omega_1, \omega_2, \ldots\}$, and the probability of an event $A \in \mathcal{A} = \{A : A \subseteq \Omega\}$ is defined as the sum of the probabilities of all elementary events $\omega \in A$ leading to the event $A$, i.e.

$$P(A) = \sum_{\omega \in A} P(\omega).$$

After that, the classical definition of probability was given, and a number of probability properties derived from this definition were presented. For example, it was proved that the probability (probability function) $P$ on $\mathcal{A}$ has the following properties:

1) For any $A \in \mathcal{A}$, $P(A) \ge 0$;

2) $P(\Omega) = 1$;

3) If $A_1, A_2, \ldots, A_k$ are pairwise disjoint events ($A_i A_j = \varnothing$, $i \neq j$), then

$$P\left(\sum_{i=1}^{k} A_i\right) = \sum_{i=1}^{k} P(A_i).$$
1.1. The necessity of expanding the concept of the space of elementary events

As we already noted in the first chapter, the space of elementary events $\Omega = \{\omega\}$ corresponding to the experiment under consideration is not necessarily a discrete space of elementary events (that is, a finite or countable set). For example, the random throwing of a point into a segment $[t_1, t_2]$ (say, an experiment with temperature measurement) has a continuum of outcomes, because the result may be any point of the segment. While in experiments with a finite or countable set of outcomes any set of outcomes (any subset of the space of elementary events) is an event, in the example under consideration the situation is different: we run into great difficulties if we try to consider every subset of this segment an event. In order to understand the essence of these difficulties, let us consider the question of constructing a probabilistic model of an experiment consisting of an infinite number of «independent» coin tosses with the probability of «tails» at each step equal to $p$.

As the set of all outcomes (the space of elementary events) it is natural to take the set

$$\Omega = \{\omega : \omega = (\omega_1, \omega_2, \ldots, \omega_n, \ldots),\ \omega_i = 0, 1\},$$

where $\omega_i$ is the result of the $i$-th trial: if «tails» occurs in the $i$-th toss, then $\omega_i = 1$; if «heads» occurs, then $\omega_i = 0$ $(i = 1, 2, \ldots)$.

Let us now answer the question: what is the cardinality of the set $\Omega$? First of all, recall a well-known result: any number $a \in [0, 1)$ admits a unique binary expansion containing an infinite number of zeros:

$$a = \frac{a_1}{2} + \frac{a_2}{2^2} + \frac{a_3}{2^3} + \cdots \qquad (a_i = 0 \text{ or } a_i = 1;\ i = 1, 2, 3, \ldots).$$

Hence, if we put the number

$$a = \frac{\omega_1}{2} + \frac{\omega_2}{2^2} + \frac{\omega_3}{2^3} + \cdots \in [0, 1)$$

in correspondence with the point (outcome) $\omega = (\omega_1, \omega_2, \omega_3, \ldots) \in \Omega$, we see that there is a one-to-one correspondence between the set $\Omega$ and the interval $[0, 1)$, and therefore the set $\Omega$ has the cardinality of the continuum.

Now, to understand how to define the probability in the introduced model of an infinite number of «independent» tosses of a «fair» (symmetric) coin, we note the following. Since we may take $[0, 1)$ as $\Omega$, the problem of interest can be considered as a problem about the values of probabilities in the model of a random «choice of a point from the set $[0, 1)$». From considerations of symmetry it is clear that all outcomes, i.e. all points of the interval $[0, 1)$, must be «equally likely». But the set $[0, 1)$ is uncountable, and if we assume that its probability is 1, then the probability $P(\omega)$ of each elementary event $\omega \in [0, 1)$ must necessarily be zero. However, very little follows from this way of specifying the probability ($P(\omega) = 0$, $\omega \in [0, 1)$). The fact is that we are usually interested not in the probability of this or that outcome, but in the probability that the outcome of the experiment belongs to a given set of outcomes.
In a discrete probability space, from the probabilities $P(\omega)$, $\omega \in \Omega$, one can find the probability of any event $A \subseteq \Omega$: $P(A) = \sum_{\omega \in A} P(\omega)$. But in the case under consideration ($P(\omega) = 0$, $\omega \in [0, 1)$) we cannot determine, for example, the probability that a randomly chosen point from $[0, 1)$ belongs to the interval $[1/3, 2/3)$, although it is intuitively clear that this probability equals $1/3$ (Chapter I, §5).

The above reasoning suggests that in the construction of probability models in the case of uncountable spaces $\Omega$, the probability must be given not for individual outcomes, but for certain sets from a specially constructed class of subsets of $\Omega$. We also pay attention to the following circumstance: in the case of a discrete probability space, the set of all events $\mathcal{A} = \{A : A \subseteq \Omega\}$ was closed with respect to the operations of summation, intersection (product), and transition to the complement (the opposite event). Naturally, these closedness properties must also hold in the case of general probability spaces (which cover the discrete probability space as a particular case). In other words, for an arbitrary space of elementary events $\Omega = \{\omega\}$ we cannot consider every subset of $\Omega$ an event; we have to introduce a special class of subsets (the set of all events).

The above considerations serve as a justification for the introduction of the following concepts and definitions.

Let $\Omega = \{\omega\}$ be any set, and let $\mathcal{A}$ be a system of some subsets of $\Omega$.

Definition 1. If the system $\mathcal{A}$ of subsets of $\Omega$ satisfies the conditions:

A1. $\Omega \in \mathcal{A}$;

A2. If $A, B \in \mathcal{A}$, then $A \cup B \in \mathcal{A}$ and $A \cap B \in \mathcal{A}$;

A3. If $A \in \mathcal{A}$, then $\bar{A} \in \mathcal{A}$,

then such a system $\mathcal{A}$ is called an algebra.

It is not difficult to see that in this definition it suffices to require only one of the two conditions in A2: either $A \cup B \in \mathcal{A}$ or $A \cap B \in \mathcal{A}$. Indeed, if for $A, B \in \mathcal{A}$ we have $A \cap B \in \mathcal{A}$, then by condition A3, $\bar{A}, \bar{B} \in \mathcal{A}$; furthermore, $\bar{A} \cap \bar{B} \in \mathcal{A}$; therefore, again by condition A3, $A \cup B = \overline{\bar{A} \cap \bar{B}} \in \mathcal{A}$. The second case is proved similarly.

Definition 2. If in Definition 1 condition A2 is replaced by the condition

A2′. If $A_1, A_2, \ldots \in \mathcal{A}$, then

$$\bigcup_{n=1}^{\infty} A_n \in \mathcal{A}, \qquad \bigcap_{n=1}^{\infty} A_n \in \mathcal{A},$$

then the system of sets $\mathcal{A}$ is called a σ-algebra (sigma-algebra) or the Borel field (the field of events).
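For a finite $\Omega$ the closure requirements A1-A3 can be made concrete algorithmically. The sketch below (an illustration, not part of the textbook) constructs the smallest algebra containing a given collection of subsets by repeatedly adjoining complements, unions and intersections until the system stabilizes:

```python
def generated_algebra(omega, sets):
    """Smallest algebra of subsets of the finite set `omega` that
    contains every set in `sets` (closure under A1-A3)."""
    omega = frozenset(omega)
    system = {omega, frozenset()} | {frozenset(s) for s in sets}
    changed = True
    while changed:                      # terminates: the power set is finite
        changed = False
        for a in list(system):
            for b in list(system):
                for c in (omega - a, a | b, a & b):
                    if c not in system:
                        system.add(c)
                        changed = True
    return system

alg = generated_algebra({1, 2, 3, 4}, [{1, 2}])
# a single generator {1,2} yields the four-element algebra
assert alg == {frozenset(), frozenset({1, 2}),
               frozenset({3, 4}), frozenset({1, 2, 3, 4})}
```

For a finite $\Omega$ every algebra is automatically a σ-algebra; the distinction between Definitions 1 and 2 only matters for infinite $\Omega$.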
In this definition it is also easy to see that it suffices to require only one of the two conditions in A2′. The σ-algebra introduced by this definition will usually be denoted by $\mathcal{F}$.

It follows from Definitions 1 and 2 that a σ-algebra is always also an algebra, but the converse is not true: an algebra is not always a σ-algebra (give an example). Thus, an algebra is a collection of sets closed with respect to the operation of complementation and a finite number of operations of union (or intersection); in the definition of a σ-algebra, in addition, closedness with respect to a countable number of operations of union (or intersection) is required.

If a set $\Omega$ and an algebra or σ-algebra $\mathcal{F}$ of subsets of this set are given, then we say that a measurable space $(\Omega, \mathcal{F})$ is given, and the sets $A \in \mathcal{F}$ are called measurable sets.

In order to formalize any probabilistic problem, it is necessary to assign to the experiment corresponding to this problem (to construct from the corresponding experiment) a measurable space $(\Omega, \mathcal{F})$: $\Omega$ is the set of all (elementary) outcomes of the experiment (the space of elementary events), and the algebra or σ-algebra $\mathcal{F}$ singles out the system of all events. All other subsets of $\Omega$ not included in $\mathcal{F}$ are not events. The choice of this or that algebra or σ-algebra of events $\mathcal{F}$ is determined, on the one hand, by the essence of the problem under consideration and, on the other hand, by the nature of the set $\Omega$.

Since in probability theory the σ-algebra $\mathcal{F}$ of subsets of the space of elementary events $\Omega$ is understood as the system of all events, and operations on events are defined by analogy with the corresponding operations of set theory but carry their own probabilistic meanings, in probability theory, along with the usual set-theoretic terminology, a somewhat different terminology is used. Thus, the space of elementary events $\Omega$ (which is itself an event) is called a certain event. By the properties (axioms) A1 and A3, the empty set $\varnothing$ also belongs to $\mathcal{F}$, i.e. it is an event, and it is called an impossible event. The event $\bar{A}$ is called the opposite or complementary event (to $A$). The sum, the product and the difference of events, as well as the notion of one event implying another and the equality of events, are defined in exactly the same way as in Chapter I. If $AB = \varnothing$, then the events $A$ and $B$ are called disjoint. As already noted in Chapter I, for the union «$\cup$» of disjoint events the summation sign is written as an ordinary sum «+».

In conclusion, we note once again that operations on events are defined as operations on sets and possess all the properties of the latter. Taking this into account, in what follows we will apply all properties of operations on sets to the corresponding operations on events. For example, for events $A_1, A_2, \ldots \in \mathcal{F}$ the duality principle takes place:

$$\overline{\bigcup_i A_i} = \bigcap_i \bar{A}_i, \qquad \overline{\bigcap_i A_i} = \bigcup_i \bar{A}_i.$$
1.2. Probability in a measurable space

In this subsection we introduce the concept of the probability of an event and prove a number of important properties of probability.

Definition 3. Let a measurable space $(\Omega, \mathcal{A})$ be given, where $\Omega = \{\omega\}$ is a sample space and $\mathcal{A}$ is an algebra of subsets of $\Omega$. A probability (probability measure, probability function) defined on $(\Omega, \mathcal{A})$ is a numerical function $P$ which is defined on $\mathcal{A}$, assigns to each event $A \in \mathcal{A}$ its probability $P(A)$, and has the following properties:

P1. For any event $A \in \mathcal{A}$, the probability $P(A) \ge 0$;

P2. $P(\Omega) = 1$;

P3. If $A_1, A_2, \ldots \in \mathcal{A}$ is a sequence of pairwise disjoint events ($A_i A_j = \varnothing$, $i \neq j$) and $\sum_{n=1}^{\infty} A_n \in \mathcal{A}$, then

$$P\left(\sum_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} P(A_n).$$

If $\mathcal{A}$ is a σ-algebra, then in Definition 3 the requirement $\sum_{n=1}^{\infty} A_n \in \mathcal{A}$ is superfluous (by the definition of a σ-algebra it is automatically satisfied).

In probability theory, axiom P1 is called the axiom (or property) of non-negativity, axiom P2 the axiom (or property) of normalization, and axiom P3 the axiom (or property) of countable additivity or sigma (σ-) additivity.

The triple $(\Omega, \mathcal{A}, P)$, where $\mathcal{A}$ is an algebra, is called an extended probability space. The triple $(\Omega, \mathcal{F}, P)$, where $\mathcal{F}$ is a σ-algebra, is called a (general) probability space. If $\Omega$ is a discrete sample space, i.e. a finite or countable set, then obviously the system (set) of all its subsets is a sigma-algebra, and the corresponding triple $(\Omega, \mathcal{F}, P)$ is called a discrete probability space (see Chapter I, §1). In the particular case when $\Omega$ is a finite set ($|\Omega| < \infty$), the triple $(\Omega, \mathcal{F}, P)$ is called a finite probability space.

The construction of a probability space $(\Omega, \mathcal{F}, P)$ is the main stage in the process of constructing a mathematical model of an experiment.

1.2.1. Probability properties

Now let us dwell on some very important properties of probability that follow from the axioms P1, P2, P3 (see Definition 3).

1°. $P(\varnothing) = 0$. Indeed, since $\Omega = \Omega + \varnothing$, by properties P2 and P3:
$$1 = P(\Omega) = P(\Omega + \varnothing) = P(\Omega) + P(\varnothing) = 1 + P(\varnothing), \quad \text{i.e. } P(\varnothing) = 0.$$

2°. $P(\bar{A}) = 1 - P(A)$. Indeed, since $A + \bar{A} = \Omega$, by properties P2 and P3, $1 = P(A) + P(\bar{A})$.

3°. If $A \subseteq B$, then $P(A) \le P(B)$. Indeed, in this case

$$B = A + (B \setminus A), \qquad P(B) = P(A) + P(B \setminus A) \ge P(A)$$

(because, by property P1, $P(B \setminus A) \ge 0$).

4°. The probability of any event lies between zero and one: for any $A \in \mathcal{F}$, $0 \le P(A) \le 1$. Indeed, since $A \subseteq \Omega$, by axioms P1, P2 and property 3°,

$$0 \le P(A) \le P(\Omega) = 1.$$

5°. $P(A \setminus B) = P(A) - P(AB)$. Because

$$A \setminus B = A\bar{B}, \qquad A = A(B + \bar{B}) = AB + A\bar{B},$$

by axiom P3,

$$P(A) = P(AB) + P(A\bar{B}) = P(AB) + P(A \setminus B).$$

6°. Addition formula for probabilities (for two events):

$$P(A \cup B) = P(A) + P(B) - P(AB).$$

We have:

$$A \cup B = (A \setminus B) + AB + (B \setminus A).$$

Then, by axiom P3 and property 5°,

$$P(A \cup B) = P(A \setminus B) + P(AB) + P(B \setminus A) =$$
$$= P(A) - P(AB) + P(AB) + P(B) - P(AB) = P(A) + P(B) - P(AB).$$

7°. $P(A \cup B) \le P(A) + P(B)$. This property is a corollary of property 6° and axiom P1.
8°. Addition formula for probabilities (general case). For any events $A_1, A_2, \ldots, A_n$:

$$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i) - \sum_{i<j} P(A_i A_j) + \sum_{i<j<k} P(A_i A_j A_k) - \cdots + (-1)^{n-1} P(A_1 A_2 \cdots A_n).$$

We can prove this property, for example, by using property 6° and induction on the number of events.

9°. $P\left(\bigcup_{i=1}^{\infty} A_i\right) \le \sum_{i=1}^{\infty} P(A_i)$.

Let us introduce the sequence of events:
$$B_1 = A_1,\quad B_2 = A_2 \bar{A}_1,\quad B_3 = A_3 \bar{A}_2 \bar{A}_1,\quad \ldots,\quad B_n = A_n \setminus \bigcup_{i=1}^{n-1} A_i = A_n \bar{A}_{n-1} \cdots \bar{A}_1,\ \ldots$$

Then

$$B_i B_j = \varnothing\ (i \neq j), \qquad \sum_{i=1}^{\infty} B_i = \bigcup_{i=1}^{\infty} A_i,$$

and by property 3°

$$P(B_n) \le P(A_n) \qquad (n = 1, 2, \ldots).$$

Then by axiom P3

$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = P\left(\sum_{i=1}^{\infty} B_i\right) = \sum_{i=1}^{\infty} P(B_i) \le \sum_{i=1}^{\infty} P(A_i).$$
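Property 8° (the inclusion-exclusion formula) is easy to verify numerically on a finite space with equally likely outcomes; the sets below are invented for illustration:

```python
from itertools import combinations

# finite sample space with 12 equally likely outcomes
omega = set(range(12))
P = lambda s: len(s) / len(omega)

A = [{0, 1, 2, 3, 4}, {3, 4, 5, 6}, {0, 4, 6, 7, 8}]

# right-hand side of 8°: alternating sum over all nonempty index subsets
incl_excl = 0.0
for r in range(1, len(A) + 1):
    for idx in combinations(range(len(A)), r):
        inter = set.intersection(*(A[i] for i in idx))
        incl_excl += (-1) ** (r - 1) * P(inter)

assert abs(P(set().union(*A)) - incl_excl) < 1e-12
```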
Now for the probability (probability function) we introduce the following axioms P3′, P3″, P3‴ under the name of axioms (properties) of continuity:

P3′. a) The set function $P$ is finitely additive on the algebra $\mathcal{A}$, i.e. for any $A, B \in \mathcal{A}$ with $AB = \varnothing$, $P(A + B) = P(A) + P(B)$;

b) For any $A_1 \subseteq A_2 \subseteq \ldots$, $A_n \in \mathcal{A}$, $\bigcup_{n=1}^{\infty} A_n \in \mathcal{A}$,

$$P\left(\bigcup_{n=1}^{\infty} A_n\right) = \lim_{n \to \infty} P(A_n).$$

P3″. a) The set function $P$ is finitely additive on the algebra $\mathcal{A}$;

b) For any $A_1 \supseteq A_2 \supseteq \ldots$, $A_n \in \mathcal{A}$, $\bigcap_{n=1}^{\infty} A_n \in \mathcal{A}$,

$$P\left(\bigcap_{n=1}^{\infty} A_n\right) = \lim_{n \to \infty} P(A_n).$$

P3‴. a) The set function $P$ is finitely additive on the algebra $\mathcal{A}$;

b) For any $A_1 \supseteq A_2 \supseteq \ldots$, $A_n \in \mathcal{A}$, $\bigcap_{n=1}^{\infty} A_n = \varnothing$,

$$\lim_{n \to \infty} P(A_n) = 0.$$

P3′ is called the axiom (property) of continuity from below, P3″ the axiom (property) of continuity from above, and P3‴ the axiom (property) of continuity at «zero». The following theorem, which justifies the replacement of the axiom of countable additivity by the continuity axioms and of the continuity axioms by each other, has numerous applications.

Theorem (basic theorem). The axiom of countable additivity (sigma-additivity) and the continuity axioms are equivalent:

$$P3 \Leftrightarrow P3' \Leftrightarrow P3'' \Leftrightarrow P3'''.$$

Proof. P3 $\Rightarrow$ P3′. Let
$$A_1, A_2, \ldots \in \mathcal{A}, \qquad A_1 \subseteq A_2 \subseteq \ldots, \qquad \bigcup_{n=1}^{\infty} A_n \in \mathcal{A}.$$

Let us introduce a sequence of events:

$$A_0 = \varnothing,\quad B_1 = A_1 \setminus A_0,\quad B_2 = A_2 \setminus A_1,\quad B_3 = A_3 \setminus A_2,\ \ldots,\ B_n = A_n \setminus A_{n-1},\ \ldots$$

Then

$$B_i B_j = \varnothing\ (i \neq j), \qquad \bigcup_{n=1}^{\infty} A_n = \sum_{n=1}^{\infty} B_n.$$

Therefore, according to axiom P3 (which holds by the condition of the theorem) and the definition of the sum of a series, we can write:

$$P\left(\bigcup_{n=1}^{\infty} A_n\right) = P\left(\sum_{n=1}^{\infty} B_n\right) = \sum_{n=1}^{\infty} P(B_n) = \lim_{N \to \infty} \sum_{n=1}^{N} P(B_n) = \lim_{N \to \infty} \sum_{n=1}^{N} \left[P(A_n) - P(A_{n-1})\right] = \lim_{N \to \infty} P(A_N).$$
P3′ $\Rightarrow$ P3″. Let

$$A_1, A_2, \ldots \in \mathcal{A}, \qquad A_n \supseteq A_{n+1},\ n = 1, 2, \ldots, \qquad \bigcap_{n=1}^{\infty} A_n \in \mathcal{A}.$$

Let us introduce the events $B_n = \bar{A}_n$, $n = 1, 2, \ldots$. Then $B_n \subseteq B_{n+1}$, and by axiom P3′

$$P\left(\bigcup_{n=1}^{\infty} B_n\right) = \lim_{n \to \infty} P(B_n) = \lim_{n \to \infty} \left[1 - P(A_n)\right] = 1 - \lim_{n \to \infty} P(A_n).$$

On the other hand,

$$P\left(\bigcup_{n=1}^{\infty} B_n\right) = P\left(\overline{\bigcap_{n=1}^{\infty} A_n}\right) = 1 - P\left(\bigcap_{n=1}^{\infty} A_n\right).$$

Thus

$$1 - \lim_{n \to \infty} P(A_n) = 1 - P\left(\bigcap_{n=1}^{\infty} A_n\right),$$

from which it follows that

$$P\left(\bigcap_{n=1}^{\infty} A_n\right) = \lim_{n \to \infty} P(A_n).$$

P3″ $\Rightarrow$ P3‴ is obvious.

P3‴ $\Rightarrow$ P3. Let

$$A_1, A_2, \ldots \in \mathcal{A}, \qquad A_i A_j = \varnothing\ (i \neq j), \qquad \sum_{i=1}^{\infty} A_i \in \mathcal{A}.$$

Then

$$P\left(\sum_{i=1}^{\infty} A_i\right) = P\left(\sum_{i=1}^{n} A_i\right) + P\left(\sum_{i=n+1}^{\infty} A_i\right),$$

and, since

$$\sum_{i=n+1}^{\infty} A_i \downarrow \varnothing \qquad (n \to \infty),$$

we obtain

$$\sum_{i=1}^{\infty} P(A_i) = \lim_{n \to \infty} \sum_{i=1}^{n} P(A_i) = \lim_{n \to \infty} \left[P\left(\sum_{i=1}^{\infty} A_i\right) - P\left(\sum_{i=n+1}^{\infty} A_i\right)\right] = P\left(\sum_{i=1}^{\infty} A_i\right). \qquad \square$$
Examples. 1. For a sequence of events $A_1, A_2, \ldots \in \mathcal{F}$ the upper and lower limits are, respectively, the events

$$A^* = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k, \qquad A_* = \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k.$$

It is clear that $A_* \subseteq A^*$, therefore always $P(A_*) \le P(A^*)$ (probability property 3°).

If

$$A_n \uparrow A \quad \left(\text{i.e. } A_n \subseteq A_{n+1},\ \bigcup_{n=1}^{\infty} A_n = A\right),$$

or

$$A_n \downarrow A \quad \left(\text{i.e. } A_n \supseteq A_{n+1},\ \bigcap_{n=1}^{\infty} A_n = A\right),$$

then $A_* = A^* = A$ (prove!). Let us show that in this case

$$P(A_*) = P(A^*) = \lim_{n \to \infty} P(A_n).$$

We can write:

$$A^* = \bigcap_{n=1}^{\infty} B_n, \qquad B_n = \bigcup_{k=n}^{\infty} A_k \supseteq \bigcup_{k=n+1}^{\infty} A_k = B_{n+1};$$

$$A_* = \bigcup_{n=1}^{\infty} C_n, \qquad C_n = \bigcap_{k=n}^{\infty} A_k \subseteq \bigcap_{k=n+1}^{\infty} A_k = C_{n+1}.$$

Then, in accordance with the axioms (properties) P3″ and P3′:

$$P(A^*) = \lim_{n \to \infty} P(B_n) = \lim_{n \to \infty} P\left(\bigcup_{k=n}^{\infty} A_k\right); \qquad P(A_*) = \lim_{n \to \infty} P(C_n) = \lim_{n \to \infty} P\left(\bigcap_{k=n}^{\infty} A_k\right).$$

Furthermore, since $A_* = A^* = A$, applying the first of these relations when $A_n \downarrow A$ and the second when $A_n \uparrow A$, we obtain the required equality

$$P(A^*) = P(A_*) = \lim_{n \to \infty} P(A_n).$$
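The events $A^*$ and $A_*$ can be illustrated on a concrete periodic sequence of sets (a toy illustration, not part of the textbook): for $A_n$ alternating between two sets $E$ and $O$, every tail union equals $E \cup O$ and every tail intersection equals $E \cap O$, so $A^* = E \cup O$ and $A_* = E \cap O$:

```python
# A_n = E for even n, A_n = O for odd n (an invented alternating sequence)
E, O = frozenset({1, 2, 3}), frozenset({3, 4})
seq = (E, O)

def tail_union(n):        # union over k >= n; here constant, since the
    return seq[n % 2] | seq[(n + 1) % 2]   # sequence is periodic

def tail_inter(n):        # intersection over k >= n
    return seq[n % 2] & seq[(n + 1) % 2]

limsup = frozenset.intersection(*(tail_union(n) for n in range(10)))
liminf = frozenset.union(*(tail_inter(n) for n in range(10)))

assert limsup == E | O == frozenset({1, 2, 3, 4})   # occur infinitely often
assert liminf == E & O == frozenset({3})            # occur in all but finitely many A_n
```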
Note that the event A^* in the example considered means that infinitely many of the events A_1, A_2, ... occur. The validity of the foregoing follows from the following chain of relations:

ω ∈ A^* ⇔ ω ∈ ⋂_{n=1}^∞ ⋃_{k=n}^∞ A_k ⇔ ω ∈ ⋃_{k=n}^∞ A_k for every n = 1, 2, ... ⇔ for every n = 1, 2, ... there is a k_n ≥ n such that ω ∈ A_{k_n}.

(Recall: {an event A occurred} ⇔ {an outcome ω ∈ A occurred}.) Likewise, the probabilistic meaning of the event A_* is the following: all the events of the sequence A_1, A_2, ... occur, except possibly a finite number of them. We also pay attention to the fact that the events A^* and A_* are often denoted by lim sup A_n and lim inf A_n:

A^* = lim sup_n A_n,  A_* = lim inf_n A_n.

2. Geometric definition of probability. Let the experiment consist in throwing a random point into a region Ω = {ω} of the n-dimensional Euclidean space R^n. Let the concept of an n-dimensional «volume» be defined for the region Ω (for n = 1 it is length, for n = 2 area, for n = 3 ordinary volume, etc.), and let Ω have a finite volume: mes Ω < ∞. Here the set of all possible outcomes of the experiment is the set Ω, and the experiment means selecting a point ω ∈ Ω. The σ-algebra 𝔉 of subsets of Ω consists of all subsets of Ω for which the concept of n-dimensional volume is defined:

𝔉 = { A ⊆ Ω : for A the concept of n-dimensional volume is defined }.

For example, as the set of events 𝔉 we can take the so-called σ-algebra β = β(Ω) of Borel subsets of Ω (for more details, see §2). We define the probability of an event A ∈ 𝔉 as the ratio of its volume to the volume of Ω:

P(A) = mes(A) / mes(Ω).   (1)

It is clear that all the axioms of probability (axioms P1, P2, P3) are satisfied. Understanding the n-dimensional volume as the corresponding Lebesgue measure, we obtain a probability space (Ω, β(Ω), P), where the probability is defined by formula (1). This probability space is the model of problems in which a particle (point) is thrown at random into the region Ω: it is assumed that its position is uniformly distributed in this region, i.e. the probability of the point falling into a region A ⊆ Ω is proportional to the n-dimensional volume of this region.
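The geometric definition of probability is easy to illustrate by simulation. The following Python sketch is not part of the textbook: the quarter-disc example, the sample size and all names are assumptions chosen here for illustration. Points are thrown uniformly into the unit square Ω = [0, 1]², and P(A) = mes(A)/mes(Ω) is estimated for the quarter-disc A, whose probability is π/4.

```python
import math
import random

def geometric_probability(num_points=100_000, seed=42):
    """Throw points uniformly into the square [0, 1]^2 and estimate the
    probability of the quarter-disc A = {(x, y): x^2 + y^2 <= 1}.
    By the geometric definition, P(A) = mes(A) / mes(Omega) = pi / 4."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(num_points)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return hits / num_points

estimate = geometric_probability()
print(estimate, math.pi / 4)  # the two numbers should be close
```

The frequency of hits converges to the ratio of volumes, which is exactly the content of the uniform-distribution assumption behind formula (1).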
In this example we have introduced the concept of the so-called uniform distribution on the region Ω. The method described above for defining probability (see formula (1)) is called the geometric definition of probability (or geometric probability).

3. We give several properties of probability associated with «extreme» events, i.e. events having the minimum or maximum probability (zero or one). Let A, B be some events: A, B ∈ 𝔉. Then:

a) If P(B) = 0, then for any event A

P(AB) = 0,  P(A ∪ B) = P(A).

Indeed, since AB ⊆ B, we have 0 ≤ P(AB) ≤ P(B) = 0, and the second relation is a consequence of the first one and, for example, of the addition formula for probabilities: P(A ∪ B) = P(A) + P(B) − P(AB) = P(A).

b) If P(B) = 1, then for any event A

P(A ∪ B) = 1,  P(AB) = P(A),

because B ⊆ A ∪ B implies 1 = P(B) ≤ P(A ∪ B) ≤ 1, and the second relation can again be obtained as a corollary of the addition formula for probabilities.

c) If for the events A and B the probability of their product is P(AB) = 1, then P(A) = P(B) = 1, because

AB ⊆ A,  AB ⊆ B,

therefore

1 = P(AB) ≤ P(A),  1 = P(AB) ≤ P(B).

d) If for the events A and B the probability of their sum is P(A ∪ B) = 0, then P(A) = P(B) = 0, because

0 ≤ P(A) ≤ P(A ∪ B) = 0,  0 ≤ P(B) ≤ P(A ∪ B) = 0.
Remark. We know that an impossible event ∅ has zero probability, and this event never happens as a result of the experiment. It turns out that events different from ∅ but having zero probability can nevertheless occur as a result of the experiment.

Example. A point is placed at random on the segment [0, 1]. Let the event A be that the point falls exactly in the middle of the segment. Then P(A) = 0, but A can occur. Similarly, the certain event Ω has probability 1, and this event necessarily occurs as a result of the experiment. However, there are events that have probability 1 but may fail to occur in the experiment (for example, the event B = Ā in the example above).

4. Borel–Cantelli lemma. Let A_1, A_2, ... be a sequence of events and

∑_{n=1}^∞ P(A_n) < ∞.

Then P(A^*) = 0, i.e. P(Ā^*) = 1.

In other words, with probability 1 only a finite number of the events A_1, A_2, ... occur.

Proof. Indeed, by the continuity axiom and the hypothesis of the lemma,

P(A^*) = lim_{n→∞} P(⋃_{k=n}^∞ A_k) ≤ lim_{n→∞} ∑_{k=n}^∞ P(A_k) = 0,

so P(A^*) = 0. ∎
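The lemma can also be illustrated empirically. The following sketch is an assumption made here for illustration, not part of the textbook: it takes independent events A_n = {U_n < 1/n²} with independent uniform U_n, so that ∑_n P(A_n) = π²/6 < ∞, and counts how many of the A_n occur in each simulated experiment. In agreement with the lemma, the count stays small in every trial.

```python
import random

def count_occurrences(horizon=500, trials=2000, seed=0):
    """For each trial, simulate the events A_n = {U_n < 1/n^2},
    n = 1..horizon, with independent Uniform(0,1) variables U_n, and
    count how many of them occur.  Since sum P(A_n) = sum 1/n^2 < inf,
    the Borel-Cantelli lemma says that with probability 1 only finitely
    many of the A_n occur."""
    rng = random.Random(seed)
    counts = []
    for _ in range(trials):
        occurred = sum(1 for n in range(1, horizon + 1)
                       if rng.random() < 1.0 / n ** 2)
        counts.append(occurred)
    return counts

counts = count_occurrences()
mean = sum(counts) / len(counts)
print(mean)          # close to the truncated sum of 1/n^2 (about 1.64)
print(max(counts))   # even the largest count over all trials is small
```

Note that A_1 always occurs (P(A_1) = 1), while the later events occur rarely; no trial accumulates a large number of occurrences.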
As we noted above, in probability theory, along with the usual set-theoretic terminology, a somewhat different terminology is used. It is related to the fact that the subsets of Ω (belonging to an algebra or σ-algebra 𝔉) are interpreted as events. In this regard, Table 1 below gives a brief terminological dictionary of the correspondence between the basic concepts and notations of set theory and probability theory.

Table 1

Notation | Terminology in set theory | Terminology in probability theory
ω | element, point | outcome, elementary event
Ω | set of points (basic set), space | sample space (set of all outcomes), certain event
∅ | empty set | impossible event
𝔉 | σ-algebra of sets | σ-algebra of events (field of events)
A ∈ 𝔉 | set of points | event (if an outcome ω ∈ A occurred, one says that the event A occurred)
A ∪ B | sum (union) of the sets A and B, i.e. the set of points ω belonging to A or to B | sum of the events A and B; the event that A occurred or B occurred, i.e. that at least one of the events A, B occurred
A ∩ B (or AB) | intersection of the sets A and B, i.e. the set of points ω belonging to both A and B | intersection (product) of the events A and B; the event that A and B occurred simultaneously
A \ B | difference of the sets A and B, i.e. the set of points ω belonging to A but not to B | difference of the events A and B; the event that A occurred but B did not occur
Ā = Ω \ A | complement of the set A, i.e. the set of points ω not belonging to A | the event opposite to A; the event consisting in the non-occurrence of A
A ⊆ B | the set A is a subset of B, i.e. any element ω ∈ A is also an element of B | the event A implies the event B: the occurrence of A entails the occurrence of B
A = B | A and B are equal sets, i.e. A ⊆ B and B ⊆ A | A and B are equal events: they occur or do not occur simultaneously
A ∩ B = ∅ | the sets A and B do not intersect | the events A and B are mutually exclusive: they cannot occur simultaneously
A + B | sum (union) of disjoint sets | sum of disjoint events; the event that one of the two disjoint events A, B occurred
A △ B = (A \ B) ∪ (B \ A) | symmetric difference of the sets A and B | symmetric difference of the events A and B; the event that exactly one of the events A, B occurred, but not both simultaneously
⋃_{n=1}^∞ A_n | sum (union) of the sets A_1, A_2, ... | sum of the events A_1, A_2, ...; the event that at least one of the events A_1, A_2, ... occurred
∑_{n=1}^∞ A_n | sum (union) of the pairwise disjoint sets A_1, A_2, ... | sum of the pairwise disjoint events A_1, A_2, ...; the event that one of these disjoint events occurred
⋂_{n=1}^∞ A_n | intersection of the sets A_1, A_2, ... | intersection of the events A_1, A_2, ...; the event that all the events A_1, A_2, ... occurred simultaneously
A_n ↑ A | an increasing sequence of sets A_n converging to A, i.e. A_1 ⊆ A_2 ⊆ ... and A = ⋃_{n=1}^∞ A_n | an increasing sequence of events A_n converging to the event A
A_n ↓ A | a decreasing sequence of sets A_n converging to A, i.e. A_1 ⊇ A_2 ⊇ ... and A = ⋂_{n=1}^∞ A_n | a decreasing sequence of events A_n converging to the event A
⋂_{n=1}^∞ ⋃_{k=n}^∞ A_k (or A^*, or lim sup A_n) | the set lim sup A_n | the event that infinitely many of the events A_1, A_2, ... occur
⋃_{n=1}^∞ ⋂_{k=n}^∞ A_k (or A_*, or lim inf A_n) | the set lim inf A_n | the event that all the events A_1, A_2, ... occur, with the possible exception of a finite number of them
1.3. Tasks for independent work

1. Prove the following relations:

lim sup (A_n ∪ B_n) = lim sup A_n ∪ lim sup B_n,
lim inf (A_n ∩ B_n) = lim inf A_n ∩ lim inf B_n,
lim sup A_n ∩ lim inf B_n ⊆ lim sup (A_n ∩ B_n) ⊆ lim sup A_n ∩ lim sup B_n,

and the complement of lim sup A_n equals lim inf Ā_n, while the complement of lim inf A_n equals lim sup Ā_n.

2. Let (x_n) be a numerical sequence and A_n = (−∞, x_n). Then x̄ = lim sup x_n and A^* = lim sup A_n obey the following relation: A^* = (−∞, x̄) or A^* = (−∞, x̄]. Prove this assertion.

3. Let A_1, A_2, ... and B_1, B_2, ... be two sequences of events such that P(B_n) → 1 (n → ∞). Show that the equality of limits lim_{n→∞} P(A_n) = lim_{n→∞} P(A_n B_n) takes place if and only if at least one of the limits exists.

4. For A, B ⊆ Ω the sequence A_1, A_2, ... of subsets of Ω (A_n ⊆ Ω, n = 1, 2, ...) is defined in the following way: A_n = A if n is an even number, A_n = B if n is an odd number. Show that then

lim inf_n A_n = A ∩ B,  lim sup_n A_n = A ∪ B.

5. If for a sequence of subsets A_n ⊆ Ω, n = 1, 2, ..., the lower and upper limits exist and are equal, i.e. lim inf_n A_n = lim sup_n A_n, then we say that the limit exists and define it by the relation

lim_n A_n = lim inf_n A_n = lim sup_n A_n.

Show the validity of the following statements:
1) If A_1 ⊆ A_2 ⊆ ..., then lim_n A_n = ⋃_{n=1}^∞ A_n;
2) If A_1 ⊇ A_2 ⊇ ..., then lim_n A_n = ⋂_{n=1}^∞ A_n;
3) If A_i ∩ A_j = ∅ (i ≠ j), then lim_n A_n = ∅.

6. Let (Ω, 𝔉, P) be a probability space, A_n ∈ 𝔉, n = 1, 2, ..., and let the limit lim_n A_n exist (see the previous task, No. 5). Show that

P(lim_n A_n) = lim_n P(A_n).

7. Prove that for any events A_1, A_2, ..., A_n:

a) P(⋃_{i=1}^n A_i) = ∑_{i=1}^n P(A_i) − ∑_{i<j} P(A_i A_j) + ∑_{i<j<k} P(A_i A_j A_k) − ... + (−1)^{n−1} P(A_1 A_2 ... A_n)

(this formula is called the addition formula for probabilities);

b) P(A_1 △ A_2 △ ... △ A_n) = ∑_{i=1}^n P(A_i) − 2 ∑_{1≤i_1<i_2≤n} P(A_{i_1} A_{i_2}) + 4 ∑_{1≤i_1<i_2<i_3≤n} P(A_{i_1} A_{i_2} A_{i_3}) − ... + (−2)^{n−1} P(A_1 A_2 ... A_n);

c) P(⋃_{k=1}^n A_k) = ∑_{k=1}^n P(A_k) − ∑_{k=1}^{n−1} ∑_{j=k+1}^n P(A_k A_j) + ∑_{k=1}^{n−2} ∑_{j=k+1}^{n−1} ∑_{i=j+1}^n P(A_k A_j A_i) − ...
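The addition formula of part a) can be verified numerically on a small finite probability space. The sketch below is an illustration added here, not part of the textbook; the 8-point uniform space and the randomly chosen events are assumptions.

```python
import random
from itertools import combinations

def inclusion_exclusion(prob, events):
    """Evaluate the right-hand side of the addition formula:
    sum over non-empty index sets I of (-1)^(|I|+1) P(intersection of
    A_i, i in I), for events given as sets of outcomes."""
    n = len(events)
    total = 0.0
    for r in range(1, n + 1):
        for idx in combinations(range(n), r):
            inter = set.intersection(*(events[i] for i in idx))
            total += (-1) ** (r + 1) * sum(prob[w] for w in inter)
    return total

# a discrete uniform space with 8 outcomes and three random events
rng = random.Random(1)
omega = range(8)
prob = {w: 1 / 8 for w in omega}
events = [set(w for w in omega if rng.random() < 0.5) for _ in range(3)]
union = set().union(*events)
lhs = sum(prob[w] for w in union)
print(abs(lhs - inclusion_exclusion(prob, events)) < 1e-12)  # True
```

Since both sides are finite sums over the same outcomes, the agreement is exact up to floating-point rounding.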
§2. Algebras, sigma-algebras and measurable spaces

2.1. Algebras and sigma-algebras

As we have already noted, algebras and σ-algebras are constituent elements in the construction of probability spaces. In this subsection we will first give some important examples of algebras and σ-algebras, and then prove a number of statements that will be used further.

Let Ω = {ω} be some sample space. Then the set systems 𝔉* = {∅, Ω}, 𝔉** = {A : A ⊆ Ω} are algebras and σ-algebras. By definition, 𝔉** contains all subsets of the sample space Ω and is the «richest» σ-algebra, while 𝔉* is the «poorest» σ-algebra. If A ⊆ Ω, then the system 𝔉_A = {∅, A, Ā, Ω} is also a σ-algebra (it is called the σ-algebra generated by the event A).

If 𝒟 = {D_1, D_2, ...} is a countable partition of Ω (i.e. D_i ⊆ Ω, D_i ≠ ∅, D_i ∩ D_j = ∅ (i ≠ j), ∑_i D_i = Ω), then the system

α(𝒟) = { A = ∑_{j=1}^n D_{i_j} : i_j ≠ i_l (j ≠ l), n ≤ ∞ }

is an algebra, and this algebra is called the algebra generated by the partition 𝒟.
Note that if the partition 𝒟 is finite, the algebra α(𝒟) is finite as well (i.e. only a finite number of sets of the form indicated in the definition belong to it). It turns out that the converse property also holds.

Theorem 1. Any finite algebra 𝒜 of subsets of Ω is generated by some partition of Ω, i.e. in 𝒜 there are pairwise disjoint sets B_1, B_2, ..., B_n such that Ω = ⋃_{k=1}^n B_k (the B_i are called the «atoms» of 𝒜), and any element A ∈ 𝒜 is represented as a sum A = ⋃_{k=1}^m B_{i_k}.

Proof. Let 𝒜 be a finite algebra of subsets of Ω. Let us introduce for any ω ∈ Ω the set

B_ω = ⋂ { B ∈ 𝒜 : ω ∈ B }

and show that for any ω_1 ≠ ω_2 either B_{ω_1} ∩ B_{ω_2} = ∅ or B_{ω_1} = B_{ω_2}. Note first that for any ω ∈ Ω and B ∈ 𝒜 the following statement is true: if ω ∈ B, then B_ω ⊆ B. Hence, if ω_2 ∈ B_{ω_1}, then B_{ω_2} ⊆ B_{ω_1}. Furthermore, in this case also ω_1 ∈ B_{ω_2}: otherwise ω_1 would belong to the set B_{ω_1} ∩ B̄_{ω_2} ∈ 𝒜, whence B_{ω_1} ⊆ B̄_{ω_2}, which contradicts ω_2 ∈ B_{ω_1} ∩ B_{ω_2}. Therefore B_{ω_1} ⊆ B_{ω_2} as well, and B_{ω_1} = B_{ω_2}. If, on the other hand, ω_2 ∉ B_{ω_1}, then ω_2 ∈ B̄_{ω_1} ∈ 𝒜, hence B_{ω_2} ⊆ B̄_{ω_1}, i.e. B_{ω_1} ∩ B_{ω_2} = ∅.

We now choose the distinct sets B_1, B_2, ..., B_r from the (finite) collection of sets {B_ω : ω ∈ Ω}. These sets form a partition: B_i ∩ B_j = ∅ (i ≠ j), B_1 ∪ B_2 ∪ ... ∪ B_r = Ω. Since any set B ∈ 𝒜 can be represented as B = ⋃_{ω ∈ B} B_ω, this partition generates the algebra 𝒜, as required. ∎

For example, for A ⊆ Ω the partition 𝒟 = {A, Ā} generates the algebra 𝔉_A = {∅, A, Ā, Ω}, and for A_1 + A_2 + A_3 = Ω the partition 𝒟 = {A_1, A_2, A_3} generates the algebra

𝒜 = { ∅, Ω, A_1, A_2, A_3, A_1 ∪ A_2, A_1 ∪ A_3, A_2 ∪ A_3 }.

Lemma 1. If 𝒜 is a system of some subsets of Ω, then there exist the smallest algebra α(𝒜) and the smallest σ-algebra σ(𝒜) which contain all the sets of 𝒜.

Proof. There is at least one algebra and one σ-algebra containing 𝒜 (for example, the σ-algebra 𝔉** = {A : A ⊆ Ω}). Let us now form the system α(𝒜) (respectively, σ(𝒜)) consisting of those sets which belong to every algebra (σ-algebra) containing 𝒜. It is easy to show that such a system is an algebra (σ-algebra) and, moreover, the smallest one (because it is defined as the intersection of all algebras (σ-algebras) containing 𝒜). ∎

The system α(𝒜) (σ(𝒜)) is called the smallest algebra (σ-algebra) generated by the set system 𝒜. In many cases the question arises under what additional conditions an algebra or some other set system is also a σ-algebra.

Definition 1. Let ℳ be a system of some subsets of Ω. If from the facts that A_n ∈ ℳ, n = 1, 2, ..., and A_n ↑ A (or A_n ↓ A) it follows that A ∈ ℳ, then ℳ is called a monotonic class.

Theorem 2. In order that an algebra 𝒜 be a σ-algebra, it is necessary and sufficient that it be a monotonic class.

Proof. It is clear that any σ-algebra is also a monotonic class. Let now the algebra 𝒜 be a monotonic class and A_n ∈ 𝒜, n = 1, 2, .... Then

B_n = ⋃_{i=1}^n A_i ∈ 𝒜

and B_n ⊆ B_{n+1}. Hence, by the definition of a monotonic class,

B_n ↑ ⋃_{i=1}^∞ A_i ∈ 𝒜.

Similarly, we can show that ⋂_{i=1}^∞ A_i ∈ 𝒜. ∎
Remark 2. If 𝒜 is a system of some subsets of Ω, then there exists the smallest monotonic class μ(𝒜) which contains 𝒜 (the proof of the existence of μ(𝒜) is analogous to the proof of the existence of α(𝒜)).

We now present the following theorem without proof.

Theorem 3 ([9], pp. 30–31). Let 𝒜 be some algebra. Then the smallest sigma-algebra and the smallest monotonic class containing this algebra coincide: μ(𝒜) = σ(𝒜).

2.1.1. The theorem on the continuation of probability

Let us return to the definition of a probability space. Let a triple (Ω, 𝒜, P) form a probability space in the broad sense (𝒜 is an algebra). As we have seen, we can associate with the algebra 𝒜 the smallest σ-algebra σ(𝒜) containing 𝒜 (σ(𝒜) is the smallest σ-algebra generated by the algebra 𝒜). The following question is of considerable interest for probability theory: does the probability measure P on 𝒜 determine a probability measure on 𝔉 = σ(𝒜), and is it determined uniquely? In other words, is it sufficient, for the construction of a probability space (Ω, 𝔉, P), to define the probability P only on some algebra 𝒜 that generates 𝔉 (i.e. to construct a probability space (Ω, 𝒜, P) in the broad sense with σ(𝒜) = 𝔉)? The answer to this question is given by the following theorem of Carathéodory (the theorem on the extension of a probability (probability measure)).

Theorem (Carathéodory's theorem on the extension of probability). Let (Ω, 𝒜, P) be a probability space in the broad sense. Then on 𝔉 = σ(𝒜) there is a unique probability measure Q such that Q(A) = P(A) for all A ∈ 𝒜.

We do not give a proof of this theorem here. A proof adapted to the probability measure is given in [11] (see [11], pp. 308–314). Any probability space (Ω, 𝒜, P) in the broad sense automatically defines the probability space (Ω, 𝔉, P), where 𝔉 = σ(𝒜) is the smallest σ-algebra containing the algebra 𝒜.

2.2. The most important examples of measurable spaces

2.2.1. Measurable space (R, β(R))
Borel σ-algebra β(R). Let R = (−∞, +∞) be the real line and let

(a, b] = { x ∈ R : a < x ≤ b } for all −∞ ≤ a < b ≤ +∞.

We agree to understand the interval (a, +∞] as the interval (a, +∞). (This agreement is necessary in order that the complement of an interval (−∞, b] be an interval of the same kind, open on the left and closed on the right.) Let us define the set system 𝒜 as follows:

𝒜 = { A : A = ∑_{i=1}^n (a_i, b_i], n < ∞ }.

The system 𝒜, with the empty set included in it, is an algebra but not a σ-algebra (for example, A_n = (0, 1 − 1/n] ∈ 𝒜, n = 1, 2, ..., but ⋃_{n=1}^∞ A_n = (0, 1) ∉ 𝒜).

The smallest σ-algebra σ(𝒜) containing 𝒜 is called the Borel σ-algebra on the real line, and the elements of the Borel σ-algebra are called Borel sets. Everywhere further, according to tradition, the σ-algebra defined in this way will be denoted by β(R) (or β, or β_1).

If we introduce the system of intervals 𝒥 = { I : I = (a, b] } and denote by σ(𝒥) the smallest σ-algebra which contains 𝒥, then it is not difficult to verify that σ(𝒥) = β(R). In other words, one can come to the Borel σ-algebra from the system 𝒥, without reference to the algebra 𝒜, because σ(𝒥) = σ(α(𝒥)) (α(𝒥) is the smallest algebra which contains 𝒥).

Note that

(a, b) = ⋃_{n=1}^∞ (a, b − 1/n], a < b;

[a, b] = ⋂_{n=1}^∞ (a − 1/n, b], a ≤ b;

{a} = ⋂_{n=1}^∞ (a − 1/n, a].

These relations show that the Borel σ-algebra β(R) contains, in addition to the intervals of the form (a, b], the one-point sets {a} and all intervals of the forms

(a, b), [a, b], [a, b), (−∞, b), (−∞, b], (a, +∞).

From what has been said, we conclude that we can construct the Borel σ-algebra β(R) based not only on the intervals of the form (a, b] but also on any of the last six forms of intervals. Thus, the Borel σ-algebra β(R) on the real line is the smallest σ-algebra containing all possible intervals on the real line. Roughly speaking, a Borel set can be imagined as a set obtained from intervals by means of a countable number of operations of union, intersection and taking of complements. The measurable space (R, β(R)) will be denoted sometimes by (R, β), sometimes by (R_1, β_1).
2.2.2. Measurable space (R^n, β(R^n))

Borel σ-algebra β(R^n). Let

R^n = R × R × ... × R = { (x_1, x_2, ..., x_n) : x_1 ∈ R, ..., x_n ∈ R }

be the direct (Cartesian) product of n exemplars (copies) of the real line R. We define the system

𝒥^(n) = { I_1 × I_2 × ... × I_n : I_i = (a_i, b_i], a_i < b_i, i = 1, 2, ..., n }.

The system 𝒥^(n) (together with the finite sums of such rectangles) forms an algebra. The smallest σ-algebra σ(𝒥^(n)) that contains 𝒥^(n) is called the Borel σ-algebra on R^n and will be denoted by β(R^n):

β(R^n) = σ(𝒥^(n)).

The elements of the σ-algebra β(R^n) are called n-dimensional Borel sets (or Borel sets in R^n).

We will show that the definition of the Borel σ-algebra β(R^n) could be obtained differently. Indeed, along with the rectangles I^(n) = I_1 × I_2 × ... × I_n we consider the rectangles B^(n) = B_1 × B_2 × ... × B_n with Borel sides (B_k is a Borel set on the real line standing in the k-th place in the direct product R × R × ... × R). The smallest σ-algebra containing all possible rectangles with Borel sides is denoted by

β^(n) = β(R) ⊗ β(R) ⊗ ... ⊗ β(R)

and is called the direct product of the σ-algebras β(R):

β^(n) = σ({ B_1 × B_2 × ... × B_n : B_i ∈ β(R) }).

We show that in fact β^(n) = β(R^n); in other words, the smallest σ-algebra generated by the rectangles I^(n) = I_1 × I_2 × ... × I_n coincides with the σ-algebra generated by the wider class of rectangles B^(n) = B_1 × B_2 × ... × B_n with Borel sides.

To prove this statement, i.e. the equality β^(n) = β(R^n), we will first prove an auxiliary lemma.

Lemma. Let ℰ be a system of subsets of Ω, B ⊆ Ω, and let the set system ℰ ∩ B be defined as follows:

ℰ ∩ B = { A ∩ B : A ∈ ℰ }.

Then

σ(ℰ ∩ B) = σ(ℰ) ∩ B.

Proof. Since ℰ ⊆ σ(ℰ), we have ℰ ∩ B ⊆ σ(ℰ) ∩ B. The set system σ(ℰ) ∩ B is a σ-algebra (of subsets of B), therefore from the last relation we obtain the inclusion

σ(ℰ ∩ B) ⊆ σ(ℰ) ∩ B.

In order to prove the converse inclusion, we introduce the system of sets

ℰ_B = { A ∈ σ(ℰ) : A ∩ B ∈ σ(ℰ ∩ B) }.

The system ℰ_B is a σ-algebra (because σ(ℰ) and σ(ℰ ∩ B) are σ-algebras), and the inclusions ℰ ⊆ ℰ_B ⊆ σ(ℰ) take place. Since ℰ_B is a σ-algebra containing ℰ, we get σ(ℰ) ⊆ ℰ_B, i.e. ℰ_B = σ(ℰ). Therefore, for any A ∈ σ(ℰ) we have A ∩ B ∈ σ(ℰ ∩ B), hence

σ(ℰ) ∩ B ⊆ σ(ℰ ∩ B). ∎

We now prove that β(R^n) = β^(n). The case n = 1 is obvious. Let n = 2. Since β(R^2) ⊆ β(R) ⊗ β(R), it suffices to show that for any B_1, B_2 ∈ β(R) the inclusion B_1 × B_2 ∈ β(R^2) takes place. Let

R^2 = R_1 × R_2,

where R_1 and R_2 are the «first» and «second» real lines, β_i = β(R_i) (i = 1, 2),

β_1 × R_2 = { B_1 × R_2 : B_1 ∈ β_1 },  R_1 × β_2 = { R_1 × B_2 : B_2 ∈ β_2 },

let 𝒥_1, 𝒥_2 be the systems of intervals in R_1 and R_2 (respectively), and let

𝒥_1' = 𝒥_1 × R_2,  𝒥_2' = R_1 × 𝒥_2.

Then, by the above lemma, B_1 × R_2 ∈ σ(𝒥_1') and R_1 × B_2 ∈ σ(𝒥_2'); moreover, both σ(𝒥_1') and σ(𝒥_2') are contained in β(R^2) (for example, I_1 × R_2 = ⋃_{m=1}^∞ I_1 × (−m, m] ∈ β(R^2) for any interval I_1, whence 𝒥_1' ⊆ β(R^2) and therefore σ(𝒥_1') ⊆ β(R^2)). Hence

B_1 × B_2 = (B_1 × R_2) ∩ (R_1 × B_2) ∈ β(R^2),

which was to be proved. The case n > 2 is proved similarly. ∎
2.2.3. Measurable space (R^∞, β(R^∞))

Borel σ-algebra β(R^∞). This σ-algebra plays a significant role in probability theory, since it serves as the basis for constructing probabilistic models of experiments with an infinite number of outcomes. Let

R^∞ = { x = (x_1, x_2, ...) : −∞ < x_k < ∞, k = 1, 2, ... } = R × R × ....

Let us denote by I_k and B_k (respectively) the intervals (a_k, b_k] and the Borel sets of the k-th real line (with the coordinate x_k), and consider the cylindrical sets

𝒥(I_1 × ... × I_n) = { x = (x_1, x_2, ...) : x_1 ∈ I_1, x_2 ∈ I_2, ..., x_n ∈ I_n },   (1)

𝒥(B_1 × ... × B_n) = { x = (x_1, x_2, ...) : x_1 ∈ B_1, x_2 ∈ B_2, ..., x_n ∈ B_n },   (2)

𝒥(B^(n)) = { x = (x_1, x_2, ...) : (x_1, x_2, ..., x_n) ∈ B^(n) },  B^(n) ∈ β(R^n).   (3)

We can consider each of the cylinders 𝒥(B_1 × ... × B_n) or 𝒥(B^(n)) as a cylinder with base in R^{n+1}, R^{n+2}, ...:

𝒥(B_1 × ... × B_n) = 𝒥(B_1 × ... × B_n × R),  𝒥(B^(n)) = 𝒥(B^(n) × R),

etc. It follows that both systems of cylinders 𝒥(B_1 × ... × B_n) and 𝒥(B^(n)) form algebras. It is not difficult to verify that the sets composed of finite unions of disjoint cylinders 𝒥(I_1 × ... × I_n) also form an algebra.

We denote by β(R^∞), β_1(R^∞) and β_2(R^∞) the smallest σ-algebras containing all the sets of the forms (1), (2), (3), respectively. It is clear that

β(R^∞) ⊆ β_1(R^∞) ⊆ β_2(R^∞).

Let us show that in fact these three σ-algebras coincide. For the proof, let us define for any n = 1, 2, ...

ℰ_n = { A ⊆ R^n : { x : (x_1, ..., x_n) ∈ A } ∈ β(R^∞) }.

The system ℰ_n contains all the rectangles I_1 × ... × I_n, and ℰ_n is a σ-algebra; hence

β(R^n) = σ(𝒥^(n)) ⊆ ℰ_n,

i.e. for any B^(n) ∈ β(R^n) the cylinder 𝒥(B^(n)) belongs to β(R^∞), and therefore β_2(R^∞) ⊆ β(R^∞). So,

β(R^∞) = β_1(R^∞) = β_2(R^∞).

Further, the sets from β(R^∞) will be called Borel sets (in R^∞).

We give some examples of (often occurring) Borel sets in R^∞:

а) { x ∈ R^∞ : sup_n x_n > a } = ⋃_n { x : x_n > a } ∈ β(R^∞);

b) { x ∈ R^∞ : inf_n x_n < a } = ⋃_n { x : x_n < a } ∈ β(R^∞);

c) { x ∈ R^∞ : lim_n x_n = a } = ⋂_{k=1}^∞ ⋃_{n=1}^∞ ⋂_{m=n}^∞ { x : |x_m − a| < 1/k } ∈ β(R^∞);

d) { x ∈ R^∞ : lim inf_n x_n > a } = { x : sup_n inf_{m≥n} x_m > a } ∈ β(R^∞);

e) { x ∈ R^∞ : the limit lim_n x_n exists and is finite } = ⋂_{k=1}^∞ ⋃_{n=1}^∞ ⋂_{m=1}^∞ { x : |x_{n+m} − x_n| < 1/k } ∈ β(R^∞).
2.3. Tasks for independent work

1. Let Ω = {0, 1, 2}. Give an example of a σ-algebra which contains the sets A = {0, 1} and B = {1, 2}.

2. Describe the σ-algebra of subsets of the segment Ω = [0, 1] generated by the sets:
а) [0, 2/3], [1/3, 1];
b) [0, 1/2], [1/2, 1];
c) {0}, {1};
d) ∅;
e) [0, 1];
f) the set of all rational numbers of [0, 1].

3. Describe the algebra of events generated by:
а) the events with zero probabilities;
b) the events with probabilities of 1.

4. Let β_1, β_2 be two σ-algebras of subsets of some set Ω. Are the following classes of sets σ-algebras:
а) β_1 ∩ β_2;
b) β_1 ∪ β_2 = { A : A ∈ β_1 or A ∈ β_2 };
c) β_1 \ β_2 = { A : A ∈ β_1, A ∉ β_2 };
d) β_1 △ β_2 = { A : A ∈ β_1 \ β_2 or A ∈ β_2 \ β_1 }?

5. Let us introduce on the real line R = (−∞, ∞) the metric

ρ_1(x, y) = |x − y| / (1 + |x − y|),  x, y ∈ R.

а) Prove that ρ_1(x, y) is really a metric.
b) Let us denote by β_0(R) the smallest σ-algebra generated by the open balls

S_ρ(x_0) = { x ∈ R : ρ_1(x, x_0) < ρ },  ρ > 0, x_0 ∈ R.

Prove that β_0(R) = β(R).

6. Let β_0(R^n) be the smallest σ-algebra generated by the open balls

S_ρ(x^0) = { x ∈ R^n : ρ_n(x, x^0) < ρ },  x^0 ∈ R^n, ρ > 0,

in the metric

ρ_n(x, x^0) = ∑_{k=1}^n 2^{−k} ρ_1(x_k, x_k^0),  x = (x_1, x_2, ..., x_n) ∈ R^n,  x^0 = (x_1^0, x_2^0, ..., x_n^0) ∈ R^n,

where ρ_1(x_k, x_k^0) = |x_k − x_k^0| / (1 + |x_k − x_k^0|). Prove that β_0(R^n) = β(R^n).

7. For all points x, x^0 ∈ R^∞ the distance between them is defined by the formula

ρ_∞(x, x^0) = ∑_{k=1}^∞ 2^{−k} |x_k − x_k^0| / (1 + |x_k − x_k^0|).

Let β_0(R^∞) be the smallest σ-algebra generated by the open balls

S_ρ(x^0) = { x ∈ R^∞ : ρ_∞(x, x^0) < ρ },  x^0 ∈ R^∞, ρ > 0.

Prove that this σ-algebra coincides with the σ-algebra β(R^∞), i.e. β(R^∞) = β_0(R^∞).

8. The event-generated σ-algebra. Let 𝔉 be a σ-algebra, B ∈ 𝔉. Prove that the system of sets 𝔉 ∩ B = { A ∩ B : A ∈ 𝔉 } is a σ-algebra (this σ-algebra is called the σ-algebra generated by the event B).

9. Let (Ω, 𝔉, P) be a probability space, B ∈ 𝔉, P(B) > 0, and let 𝔉 ∩ B = { A ∩ B : A ∈ 𝔉 } be the σ-algebra generated by the event B (see Task 8). Let us define on the σ-algebra 𝔉 ∩ B the set function P_B by the relation

P_B(A ∩ B) = P(A ∩ B) / P(B).

Show that the function P_B is a probability on the space (B, 𝔉 ∩ B), i.e. on (B, 𝔉 ∩ B) it satisfies the axioms P1, P2, P3. (The probability space (B, 𝔉 ∩ B, P_B) defined in this way is called the probability space generated by the event B.)

10. Continuation. Let A ∈ 𝔉, B ∈ 𝔉, P(B) > 0. Let us define the set function P_B(A) by the relation

P_B(A) = P(A ∩ B) / P(B).

Prove that the so-defined set function P_B is a probability on (Ω, 𝔉) (thus, it is possible to construct a new probability space (Ω, 𝔉, P_B)).

11. Could it be that:
а) the number of elementary events is finite, but the number of events is infinite?
b) the number of events is finite, but the number of elementary events is infinite?

12. Let (Ω, 𝔉, P) be a probability space. With the help of a bijection f: Ω → Ω_1 we form the following triple: (Ω_1, 𝔉_1, P_1), where 𝔉_1 = f(𝔉) = { f(A) : A ∈ 𝔉 }, and the numerical function P_1 for B ∈ 𝔉_1 is defined in the following way: P_1(B) = P(f^{−1}(B)). Is the triple (Ω_1, 𝔉_1, P_1) a probability space?

13. Let Ω = (0, +∞), 𝔉 = { A : A ⊆ Ω }. For any A ∈ 𝔉 let us define the function P(·): 𝔉 → R as follows:

P(A) = ∑_{k ∈ A ∩ N} 2^{−k},

where N = {1, 2, ...}. Show that the triple (Ω, 𝔉, P) is a probability space.

14. Show that the sets

A = { x ∈ R^∞ : lim sup_{n→∞} x_n > a },

B = { x ∈ R^∞ : ∑_{n=1}^∞ x_n > a },

C = { x ∈ R^∞ : ∑_{k=1}^n x_k = 0 at least for one n }

are Borel sets in the space R^∞.

15. Let (Ω, 𝔉, P) be a probability space. If for A, B ∈ 𝔉 the probability of their symmetric difference is P(A △ B) = 0, then we call such events equivalent, and we denote by 𝒜 the set of the so-defined classes of equivalent events. If 𝒜_1 and 𝒜_2 are any classes of equivalent events, then the «distance» ρ(𝒜_1, 𝒜_2) between 𝒜_1 and 𝒜_2 is defined as follows: for any A_1 ∈ 𝒜_1, A_2 ∈ 𝒜_2,

ρ(𝒜_1, 𝒜_2) = P(A_1 △ A_2).

Show that the so-defined ρ is a metric on 𝒜 and that with this metric 𝒜 is a complete metric space.

16. In a probability space (Ω, 𝔉, P), for any A, B ∈ 𝔉 we define the «distances»

ρ_1(A, B) = P(A △ B),

ρ_2(A, B) = P(A △ B) / P(A ∪ B) if P(A ∪ B) ≠ 0, and ρ_2(A, B) = 0 if P(A ∪ B) = 0.

Show that these «distances» satisfy the triangle inequality.
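The «distance» P(A △ B) appearing in the last two tasks can be experimented with on a finite space before proving the triangle inequality in general. The sketch below is an illustration added here (the random 10-point space, the seed and the events are assumptions, not part of the textbook).

```python
import random

def sym_diff_distance(prob, A, B):
    """rho_1(A, B) = P(A symmetric-difference B) on a finite space,
    where prob is a dict outcome -> probability and A, B are sets of
    outcomes."""
    return sum(p for w, p in prob.items() if (w in A) != (w in B))

# a random finite probability space and three random events
rng = random.Random(7)
omega = list(range(10))
weights = [rng.random() for _ in omega]
total = sum(weights)
prob = {w: weights[w] / total for w in omega}
A, B, C = [set(w for w in omega if rng.random() < 0.5) for _ in range(3)]

# triangle inequality rho_1(A, C) <= rho_1(A, B) + rho_1(B, C),
# which holds because A delta C is contained in (A delta B) u (B delta C)
d = sym_diff_distance
print(d(prob, A, C) <= d(prob, A, B) + d(prob, B, C) + 1e-12)  # True
```

The small additive tolerance only absorbs floating-point rounding; the set-theoretic inclusion behind the inequality is exact.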
§3. Methods for specifying probabilistic measures on measurable spaces 3.1. Space (R, β(R)). Distribution function Let P P( A) be a probabilistic measure (probability) defined on σalgebra of Borel sets E R of a number scale. Consider the probability space R , E ( R), P . For the interval A
f , [email protected] E R
define the function F
F ( x)
F ( x) as
P f , [email protected] , x R .
(1)
Theorem 1. The function (1) has the following properties: F1. If x1 x2 then F x1 d F x2 (i.е. F F ( x) is a monotonically nondecreasing function); F ( x) 1 ; F2. F f lim F ( x) 0 , F f xlim n f xp f
F3. F ( x ) is a rightcontinuous function F ( x 0) limits at each point x R . 85
F ( x) , x R , it has left
Proof.
Ax1
The
property
f , x1 @ f , x2 @
F1
is
the
corollary
of
the
following. As Ax2 , then by the property of probability P Ax1 d P Ax2 .
The property F2 is the corollary of the following. From xn p f it implies that
f , xn @ p , from
yn n f it implies that f , yn @ n R . We also use a continuity from below (property Р3″) and continuity from above (property Р3′) of probability and monotonicity of the function F ( x) . It is not difficult to see that property F3 is also a consequence of the properties of continuity from below and from above. ז Definition 1. The function F F ( x) satisfying properties F1, F2, F3 is called the distribution function on the number line R . Thus, by the Theorem 1, the distribution function F F ( x) , defined by (1), corresponds to each probability function P in the space R , E ( R) . It turns out that the converse is also true. Theorem 2. Let the function F F ( x) be the distribution function on the number scale R (f , f) . Then on R , E ( R) there is the only one probabilistic measure P such that for any interval f d a b f the following takes place: P a, [email protected] F (b) F (a) .
(2)
Proof. By the theorem on the extension of the probability, for constructing a probability space R , E R , P , it suffices to specify the probability P on the algebra ࣛ generated by intervals of the form a, [email protected] (because ߪሺࣛሻ ൌ ߚሺܴሻ). But we know that any element A of algebra ࣛ can be written in the form of a finite sum of disjoint intervals of the form a, [email protected] : A
n
¦ ai , bi @ , i 1
ai bi .
( a i , bi may be infinite). Let’s by definition P0 A
n
¦ ª¬ F bi F ai º¼ . i 1
Then the properties F1, F2 are satisfied, therefore the axioms Р1, Р2 are also fulfilled. Now it remains only to verify the countable additivity (or continuity) of ܲ on ࣛ. f
Let ܤ ࣛ א, Bn1 Bn , Bn
B ࣛ.
n 1
86
Let’s show that P0 Bn o P0 B , when ݊ ՜ λ (property of continuity «from below»). Without loss of generality, we can assume that B consists of only one interval a, [email protected] : B a, [email protected] . As B Bn , then for any n there is an interval an , bn @ , such that
an , bn @ Bn ,
and such that B
a, [email protected] an , bn @ .
Let a semiinterval
an , bn @
be
contained in Bn and be a maximal halfinterval containing B . Furthemore, Bn , starting
with some n , does not contain any halfintervals outside an , bn @ (if c, d @ Bn for all
n , then c, d @ B ). Thus, from monotonicity of Bn and the fact that B a, [email protected] it
follows, that starting from some number, for all n, Bn But by the property F3 , P0 Bn
an , bn @ , where
ܽ ൌ ܽǡ ܾ ՝ ܾ.
F bn F an o F b F a .
Thus, axiom P3 is also fulfilled. ז Thus, there exists a onetoone correspondence between the probability functions P defined on R , E ( R) and the distribution functions F F ( x) that are defined on the number scale and satisfy the conditions F1, F2, F3. The probability measure P constructed by the distribution function F F ( x) is usually called the LebesgueStiltes probability measure. Particularly important is the case when
F ( x)
0 , ° ®x , ° 1, ¯
x 0, 0 d x d 1, x !1
In this case, the probability measure corresponding to F(x) and denoted by λ is called the Lebesgue measure on the segment > 0,[email protected] . It is clear that O ሺܽǡ ܾ༱ ൌ ܾ െ ܽ. In other words, the Lebesgue measure of the interval a, b @ , as well as of any of
the intervals a, b , a, [email protected] , > a, b , > a, b @ is just its length ܾ െ ܽ :
O a, [email protected] O a, b O > a, b O > a, [email protected] b a . According to condition F3, the distribution function F points of discontinuity of the first kind. The value
px
F ( x 0) F ( x 0) 87
F ( x) F ( x 0)
F ( x) can have only
is called a jump of the distribution function F ( x) at the point x. A jump of the distribution function is equal to zero at points of continuity and is strictly greater than zero at the points of discontinuity. If for any H ! 0 , the inequality F ( x H ) F ( x H ) ! 0 takes place then such a point x R is called a point of growth of the distribution function F ( x) . Theorem 3. Any distribution function F F ( x) has at most a countable number of discontinuity points. Proof. Denote by C ( F ) (here and everywhere below) the set of points of continuity of F F ( x) , by C (F ) – the complementary set (the set of points of discontinuity), and by Dk – the set of points of discontinuity, in which the jumps of the func§ 1
1º
tion F ( x) lie in the halfopen interval ¨ , , k 1, 2,... . So, by definition: © k 1 k »¼ C(F ) Dk
^ x R : F ( x)
F ( x 0)` ,
® x R \ C ( F ) : F ( x) F ( x 0) ¯
C(F )
R \ C(F ) ,
1 º½ § 1 , » ¾ , k 1,2,... . px ¨ © k 1 k ¼¿
Then the number of elements of each set D k is at most k, therefore the number of elements of the set C ( F ) R \ C ( F )
f k 1
Dk (as a countable sum of not more than a
countable number of sets) is at most countable. ז Remark 1. We obtain from the results of the theorem we have proved that the set C ( F ) is everywhere dense on the real line R . But it turns out that C F can also be a countable everywhere dense set on R . Example. Let ^r1 , r2 ,...` be a set of rational numbers on a line (it is well known that this set is everywhere dense on R ). To each point rk we put in correspondence a jump prk 2 k . Then the set of points of discontinuity of a function F ( x) ¦ prk is the k :rk d x
ªC F º R . ¬ ¼ The fact that the function defined in this way is a distribution function is verified directly. A distribution function F x that changes its values only at points of a finite or countable set X ^x1, x2 ,...` is called a discrete distribution function.
set C F
^r1, r2 ,...` , and its closure is
If the interval (aj, bj] contains only one point xj ∈ C̄(F), then

P((aj, bj]) = F(bj) − F(aj).

If aj ↑ xj and bj ↓ xj, then

pj = P({xj}) = F(xj) − F(xj − 0),  Σj pj = 1.

It is obvious that the introduced discrete distribution function F(x) can be represented in the form

F(x) = Σ(j: xj ≤ x) pj.

The set of numbers p1, p2, ..., where pj = P({xj}), Σj pj = 1, is called a discrete probability distribution. Examples of frequently occurring discrete probability distributions are given in Table 1 below.

Table 1. Discrete distributions (distribution name; probabilities pk; parameters)

– Discrete uniform: pk = 1/N, k = 1, 2, ..., N. Parameter: N = 1, 2, ... .
– Bernoulli: p1 = p, p0 = q. Parameters: 0 ≤ p ≤ 1, q = 1 − p.
– Binomial: pk = C_n^k p^k q^(n−k), k = 0, 1, 2, ..., n. Parameters: 0 ≤ p ≤ 1, q = 1 − p, n = 1, 2, ... .
– Poisson: pk = e^(−λ) λ^k / k!, k = 0, 1, 2, ... . Parameter: λ > 0.
– Geometric: pk = q^(k−1) p, k = 1, 2, ... . Parameters: 0 < p ≤ 1, q = 1 − p.
– Negative binomial (Pascal): pk = C_(k−1)^(r−1) p^r q^(k−r), k = r, r + 1, ... . Parameters: 0 < p ≤ 1, q = 1 − p.
– Hypergeometric: pk = C_M^k C_(N−M)^(n−k) / C_N^n, k = 0, 1, ..., min(n, M). Parameters: N, M, n are positive integers, N > M, N > n.
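As a quick sanity check (ours, not the textbook's), one can verify numerically that the probabilities in Table 1 sum to one; the parameter values below are arbitrary.

```python
from math import comb, exp, factorial

# Arbitrary parameter values for the check.
n, p, lam, N, M, m = 10, 0.3, 2.5, 20, 8, 6
q = 1 - p

binomial = sum(comb(n, k) * p**k * q**(n - k) for k in range(n + 1))
# The Poisson series is infinite; 100 terms are ample at lam = 2.5.
poisson = sum(exp(-lam) * lam**k / factorial(k) for k in range(100))
geometric = sum(q**(k - 1) * p for k in range(1, 2000))
hypergeom = sum(comb(M, k) * comb(N - M, m - k) / comb(N, m)
                for k in range(min(m, M) + 1))
```

All four sums come out equal to 1 up to truncation and rounding error.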
If there is a nonnegative function f(x) (f(x) ≥ 0) such that for all x ∈ R we can write down the distribution function F(x) as the integral

F(x) = ∫ from −∞ to x of f(u) du,  (3)

then such a distribution function is called an absolutely continuous distribution function.

The integral (3), in the general case, must be understood in the sense of the Lebesgue integral ([9]), but for our purposes (in this textbook) it is sufficient to understand this integral as an (improper) Riemann integral. From the definition of distribution functions we obtain that any nonnegative Riemann-integrable function f(x) such that

∫ from −∞ to +∞ of f(x) dx = 1

determines by formula (3) a distribution function F = F(x). In Table 2 we give examples of distribution densities f(x) that are especially important for probability theory and mathematical statistics, indicating their names and parameters.

Table 2. Absolutely continuous distributions (distribution name; density f(x); parameters)
– Uniform on [a, b]: f(x) = 1/(b − a) for a ≤ x ≤ b; f(x) = 0 for x ∉ [a, b]. Parameters: a, b ∈ R, a < b.
– Laplace: f(x) = (1/(2σ)) e^(−|x − a|/σ), −∞ < x < ∞. Parameters: a ∈ R, σ > 0.
– Normal or Gaussian: f(x) = (1/(√(2π) σ)) e^(−(x − a)²/(2σ²)), −∞ < x < ∞. Parameters: a ∈ R, σ > 0.
– Gamma: f(x) = (θ^λ / Γ(λ)) x^(λ−1) e^(−θx) for x > 0; f(x) = 0 for x ≤ 0. Parameters: θ > 0, λ > 0.
– Beta: f(x) = x^(r−1) (1 − x)^(s−1) / B(r, s) for 0 < x < 1; f(x) = 0 for x ∉ (0, 1). Parameters: r > 0, s > 0.
– Cauchy: f(x) = θ / (π[θ² + (x − a)²]), −∞ < x < ∞. Parameters: θ > 0, a ∈ R.
– Chi-square with n degrees of freedom, χ²n: f(x) = (1/(2^(n/2) Γ(n/2))) x^(n/2 − 1) e^(−x/2) for x ≥ 0; f(x) = 0 for x < 0. Parameter: n = 1, 2, ... .
– Student with n degrees of freedom, t: f(x) = (Γ((n + 1)/2) / (√(πn) Γ(n/2))) (1 + x²/n)^(−(n+1)/2), −∞ < x < ∞. Parameter: n = 1, 2, ... .
– F, F(m, n), Fisher–Snedecor: f(x) = ((m/n)^(m/2) / B(m/2, n/2)) x^(m/2 − 1) (1 + mx/n)^(−(m+n)/2) for x ≥ 0; f(x) = 0 for x < 0. Parameters: m, n = 1, 2, ... .
– Exponential: f(x) = λ e^(−λx) for x ≥ 0; f(x) = 0 for x < 0. Parameter: λ > 0.
– Two-sided exponential: f(x) = (λ/2) e^(−λ|x|), −∞ < x < ∞. Parameter: λ > 0.
In Table 2, Γ(x) and B(x, y) are (respectively) the gamma function and the beta function:

Γ(x) = ∫ from 0 to ∞ of t^(x−1) e^(−t) dt,  B(x, y) = ∫ from 0 to 1 of t^(x−1) (1 − t)^(y−1) dt = Γ(x)Γ(y) / Γ(x + y).

From Table 2 we note the following: if λ = 1, then the gamma distribution is the exponential distribution; if θ = 1/2 and λ = n/2 (n is an integer), then the gamma distribution is the χ²n distribution.
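As a numerical check of these relations (ours, not the textbook's), one can verify the identity B(x, y) = Γ(x)Γ(y)/Γ(x + y) by a Riemann sum and confirm that the gamma density with λ = 1 is exactly the exponential density; the argument values 2.5, 3.0 and 1.7 are arbitrary.

```python
from math import exp, gamma

def beta_num(x, y, steps=200_000):
    # Midpoint Riemann sum for B(x, y), the integral over (0, 1)
    # of t^(x-1) * (1-t)^(y-1) dt.
    h = 1.0 / steps
    return h * sum(((i + 0.5) * h) ** (x - 1) * (1.0 - (i + 0.5) * h) ** (y - 1)
                   for i in range(steps))

# B(x, y) = Γ(x)Γ(y)/Γ(x + y), checked at one arbitrary pair of arguments.
x, y = 2.5, 3.0
lhs = beta_num(x, y)
rhs = gamma(x) * gamma(y) / gamma(x + y)

def gamma_density(t, theta, lam):
    # Gamma density from Table 2; with lam = 1 it reduces to the
    # exponential density theta * e^(-theta * t).
    return theta ** lam / gamma(lam) * t ** (lam - 1) * exp(-theta * t)
```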
In probability theory, the beta distribution with the parameters r = s = 1/2 is called the arcsine law.

It turns out that there is a third type of distribution function: the so-called singular distribution function. This type is characterized by the fact that the distribution function F = F(x) is continuous, but its points of growth form a set of Lebesgue measure zero. Thus, a singular distribution function F(x) is continuous, but F′(x) = 0 almost everywhere, and F(+∞) − F(−∞) = 1. An example of a singular distribution function is the Cantor function.

Cantor function. This function is constructed as follows:
F(x) = 0 for x ≤ 0;  F(x) = 1 for x ≥ 1.

For x ∈ [0, 1] we define this function as follows. First, the segment [0, 1] is divided into three identical parts: [0, 1/3], [1/3, 2/3], [2/3, 1]. On the middle segment we set F(x) = 1/2. The remaining two extreme segments are again divided into three equal parts each, and on the inner segments we set F(x) = 1/4 and F(x) = 3/4 (respectively). Further, each of the remaining segments is again divided into three equal parts, and on the inner segments we define F(x) as a constant equal to the arithmetic mean of the adjacent, already defined values of F(x), etc. At points that do not belong to such internal segments, we define F(x) by continuity. It is not difficult to see that the total length of the «internal» segments on which F(x) is constant is equal to

1/3 + 2/9 + 4/27 + ... = (1/3)(1 + 2/3 + (2/3)² + ...) = (1/3) · 1/(1 − 2/3) = 1,
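The recursive construction above can be sketched numerically (our sketch, not part of the textbook): the same function is obtained from the ternary expansion of x by replacing every digit 2 with 1 and reading the result in binary, stopping at the first ternary digit equal to 1.

```python
def cantor(x, depth=40):
    # Evaluate the Cantor function on [0, 1] from the ternary expansion of x:
    # ternary digit 2 contributes a binary 1, digit 0 a binary 0, and the
    # first ternary digit equal to 1 contributes a final binary 1.
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    result, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3
        digit = int(x)
        x -= digit
        if digit == 1:           # x lies in a removed middle third,
            result += scale      # where F is constant
            break
        result += scale * (digit // 2)
        scale /= 2
    return result
```

For instance, the whole middle third [1/3, 2/3] maps to 1/2, and the known value C(1/4) = 1/3 is reproduced up to rounding.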
so that the function F(x) grows on a set of measure 0 (zero), but without jumps.

Remark 2. The question arises: is it possible to find types of distribution functions other than the three defined above? The answer is given by the following Lebesgue theorem, which we state without proof.

Theorem 4 (Lebesgue). Any distribution function F(x) can be uniquely represented as the sum

F(x) = λ1 F1(x) + λ2 F2(x) + λ3 F3(x),  (4)

where F1(x) is a discrete, F2(x) an absolutely continuous and F3(x) a singular distribution function, the coefficients are nonnegative (λ1 ≥ 0, λ2 ≥ 0, λ3 ≥ 0), and λ1 + λ2 + λ3 = 1.

The representation of the distribution function in the form (4) is called in the literature the Lebesgue decomposition. The functions F1(x), F2(x), F3(x) in the theorem are called the discrete, absolutely continuous and singular components of the distribution function F(x) (respectively). If in the decomposition (4) one of the coefficients is equal to one and the remaining ones are zeros, then the distribution function is called pure. For example, if λ1 = λ3 = 0 and λ2 = 1, then it is an absolutely continuous distribution, etc.

Remark 3. Usually (in particular, in university courses of probability theory and mathematical statistics) we deal only with discrete and absolutely continuous distribution functions. Therefore it has become customary to call absolutely continuous distribution functions briefly continuous distribution functions, so that in what follows the term «continuous distribution function» means that the distribution function F(x) can be written in the form of the Riemann integral (3), where f(x) is the distribution density and f(x) = F′(x) almost everywhere with respect to the Lebesgue measure.

3.2. Space (Rⁿ, B(Rⁿ)). Multidimensional distribution function

Let P be a probability (probabilistic function) defined on (Rⁿ, B(Rⁿ)). Let us define the function of n variables

Fn(x1, x2, ..., xn) = P((−∞, x1] × ... × (−∞, xn]),

or, in a more compact form,

Fn(x) = P((−∞, x]),

where x = (x1, x2, ..., xn) and (−∞, x] = (−∞, x1] × ... × (−∞, xn].
We now introduce the difference operators Δ(ai, bi) (ai ≤ bi, i = 1, 2, ..., n), acting according to the formula

Δ(ai, bi) Fn(x1, ..., xn) = Fn(x1, ..., x(i−1), bi, x(i+1), ..., xn) − Fn(x1, ..., x(i−1), ai, x(i+1), ..., xn).  (5′)

Calculations show that

Δ(a1, b1) ... Δ(an, bn) Fn(x1, ..., xn) = P((a, b]),  (5)

where (a, b] = (a1, b1] × ... × (an, bn]. We see from this relation that, unlike the one-dimensional case, the probability P((a, b]) is, generally speaking, not equal to the difference Fn(b) − Fn(a). Since P((a, b]) ≥ 0, it follows from the relation (5) that for any a = (a1, ..., an), b = (b1, ..., bn) with ai ≤ bi, i = 1, 2, ..., n, the following property takes place:

FF4. Δ(a1, b1) ... Δ(an, bn) Fn(x1, ..., xn) ≥ 0.

This property FF4 of the function Fn(x) is called the property of nonnegative definiteness. Further, using the continuity properties of the probability P on monotone sequences of sets, we obtain that the function Fn(x) is right continuous with respect to the set of variables:

FF3. If x^(k) = (x1^(k), ..., xn^(k)) ↓ x = (x1, ..., xn), then Fn(x^(k)) ↓ Fn(x).
In the same way, the following properties of the function Fn(x) are easily proved:

FF2. If x ↑ y = (+∞, ..., +∞), then lim Fn(x) = Fn(+∞, +∞, ..., +∞) = 1; if at least one component of the vector y = (y1, y2, ..., yn) takes the value −∞ (minus infinity), then

lim(x ↓ y) Fn(x1, x2, ..., xn) = 0.

FF1. If x = (x1, x2, ..., xn) ≤ y = (y1, y2, ..., yn), i.e. xi ≤ yi (i = 1, 2, ..., n), then

Fn(x) ≤ Fn(y).

This property of the function Fn(x) is called the monotonicity property.

Definition 2. A function of n variables Fn(x) = Fn(x1, ..., xn) satisfying the properties FF1–FF4 is called an n-dimensional (multidimensional) distribution function in Rⁿ.

To emphasize the importance of the property FF4, we give one example. The function

F2(x, y) = 1 for x + y ≥ 1;  F2(x, y) = 0 for x + y < 1,

satisfies the properties FF1–FF3, but does not satisfy the property FF4:

Δ(0, 1) Δ(0, 1) F2(x, y) = 1 − 1 − 1 + 0 = −1 < 0.

Thus, the function F2(x, y) is not a (two-dimensional) distribution function.

Using the same arguments as in Theorem 2, we can prove the following theorem.

Theorem 5. Let a function F(x) = Fn(x1, ..., xn) be an n-dimensional distribution function in Rⁿ. Then there exists a unique probabilistic measure P on (Rⁿ, B(Rⁿ)) such that for any n-dimensional parallelepiped (a, b] = (a1, b1] × ... × (an, bn], −∞ ≤ ai < bi < ∞, i = 1, 2, ..., n, and the difference operators defined by the formula (5′),

P((a, b]) = Δ(a1, b1) ... Δ(an, bn) Fn(x1, ..., xn).
Here are some important examples of multidimensional distribution functions.

1. Let F¹(x1), F²(x2), ..., Fⁿ(xn) be one-dimensional distribution functions on the number line. Then the n-dimensional function

Fn(x1, x2, ..., xn) = F¹(x1) F²(x2) ... Fⁿ(xn)

is a multidimensional (n-dimensional) distribution function. Really, the fulfilment of the properties FF1–FF3 is obvious (recall the properties F1–F3), and the property FF4 follows from the relation

Δ(a1, b1) ... Δ(an, bn) Fn(x1, ..., xn) = Π(k=1..n) Δ(ak, bk) F^k(xk) = Π(k=1..n) [F^k(bk) − F^k(ak)] ≥ 0.

Particularly important is the case when

F^k(xk) = 0 for xk ≤ 0;  F^k(xk) = xk for 0 ≤ xk ≤ 1;  F^k(xk) = 1 for xk ≥ 1.

In this case for all 0 ≤ xk ≤ 1, k = 1, 2, ..., n, we have

Fn(x1, x2, ..., xn) = x1 x2 ... xn.

The probability measure corresponding to this n-dimensional distribution function is called the n-dimensional Lebesgue measure on the unit cube

[0, 1]ⁿ = [0, 1] × [0, 1] × ... × [0, 1].
We obtain numerous examples of n-dimensional distribution functions from multidimensional distribution functions represented in the form

Fn(x1, ..., xn) = ∫ from −∞ to x1 ... ∫ from −∞ to xn of fn(u1, u2, ..., un) du1 ... dun,

where fn(u1, u2, ..., un) is a nonnegative function with the property

∫ from −∞ to ∞ ... ∫ from −∞ to ∞ of fn(u1, u2, ..., un) du1 ... dun = 1,

and the integrals are understood in the sense of the Riemann integral (in the general case, in the sense of the Lebesgue integral). The functions fn(u1, u2, ..., un) are called the n-dimensional distribution densities (distribution densities of the n-dimensional distribution function, n-dimensional densities).

2. Multidimensional (n-dimensional) normal distribution.
Let V = ‖σij‖ (i, j = 1, ..., n) be a strictly positive definite and symmetric n × n matrix:

Σ(i, j = 1..n) σij λi λj > 0 for any nonzero vector (λ1, ..., λn), λi ∈ R;  σij = σji (i, j = 1, ..., n).

Then the determinant of the matrix V is strictly positive (det V > 0), there exists the inverse matrix A = ‖aij‖ = V^(−1), and for such a matrix V the following function is defined:

fn(x1, x2, ..., xn) = (1 / ((2π)^(n/2) √(det V))) e^(−(1/2)(V^(−1)(x − a), x − a)),  (6)

where a = (a1, ..., an) ∈ Rⁿ, x = (x1, ..., xn) ∈ Rⁿ, and for x, y ∈ Rⁿ their scalar product is defined by the relation

(x, y) = Σ(i=1..n) xi yi.

The fact that the function fn(x) defined by (6) is nonnegative can be seen from the definition. We now prove that this function satisfies the condition

∫ over Rⁿ of fn(x) dx = 1.

To do this, we first make the change of variables x − a = Qy in the integral, where the orthogonal matrix Q is defined by the condition Q*VQ = D = diag(d1, d2, ..., dn) (* is the transposition operation), Q*Q = E is the unit matrix, det Q = 1, and diag(d1, d2, ..., dn) is the diagonal matrix with the diagonal elements d1 > 0, ..., dn > 0 (the existence of an orthogonal matrix with the above properties is known from the course of algebra). Then

(V^(−1)(x − a), x − a) = (V^(−1)Qy, Qy) = (Q*V^(−1)Qy, y) = (D^(−1)y, y) = Σ(i=1..n) yi²/di,

dx = |det Q| dy = dy,  det V = det D = d1 d2 ... dn.

We substitute these values into the required integral. Then

∫ over Rⁿ of fn(x) dx = ∫...∫ (2π)^(−n/2) (d1 d2 ... dn)^(−1/2) e^(−(1/2) Σ yi²/di) dy1 ... dyn = Π(i=1..n) ∫ from −∞ to ∞ of (2πdi)^(−1/2) e^(−yi²/(2di)) dyi = 1,

because, according to the well-known Poisson integral,

∫ from −∞ to ∞ of (1/√(2π)) e^(−x²/2) dx = 1.

Thus, the function fn(x) defined by (6) is indeed a multidimensional (n-dimensional) distribution density. This distribution density is called the density of the n-dimensional nondegenerate normal (Gaussian) distribution. For n = 2 (for the two-dimensional distribution) the density f2(x1, x2) can be reduced to the form (prove!)

f2(x1, x2) = (1 / (2π σ1 σ2 √(1 − ρ²))) exp{ −(1/(2(1 − ρ²))) [ (x1 − a1)²/σ1² − 2ρ(x1 − a1)(x2 − a2)/(σ1σ2) + (x2 − a2)²/σ2² ] },  (7)

where σi > 0, |ρ| < 1.
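As a numerical check (ours, not the textbook's), the two-dimensional density of the form (7) integrates to one; the parameter values a1 = a2 = 0, σ1 = σ2 = 1, ρ = 0.5 below are an arbitrary choice.

```python
from math import exp, pi, sqrt

def f2(x1, x2, a1=0.0, a2=0.0, s1=1.0, s2=1.0, rho=0.5):
    # Two-dimensional normal density in the form (7).
    z = ((x1 - a1) ** 2 / s1 ** 2
         - 2 * rho * (x1 - a1) * (x2 - a2) / (s1 * s2)
         + (x2 - a2) ** 2 / s2 ** 2)
    return exp(-z / (2 * (1 - rho ** 2))) / (2 * pi * s1 * s2 * sqrt(1 - rho ** 2))

# Midpoint Riemann sum over [-8, 8]^2; the mass outside is negligible.
h, L = 0.05, 8.0
n = int(2 * L / h)
total = sum(f2(-L + (i + 0.5) * h, -L + (j + 0.5) * h)
            for i in range(n) for j in range(n)) * h * h
```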
3.3. Space (R^∞, B(R^∞))

Let the family of cylindrical sets in R^∞ with «bases» B ∈ B(Rⁿ) be denoted by Jn(B):

Jn(B) = {x ∈ R^∞ : (x1, x2, ..., xn) ∈ B},  B ∈ B(Rⁿ).

Let P be a probabilistic measure on the measurable space (R^∞, B(R^∞)). For any n = 1, 2, ... denote

Pn(B) = P(Jn(B)),  B ∈ B(Rⁿ).

The sequence of probabilistic measures P1, P2, ..., defined on the measurable spaces (R, B(R)), (R², B(R²)), ... (respectively), satisfies the consistency property

P(n+1)(B × R) = Pn(B),  n = 1, 2, ... .  (8)

It turns out that the converse assertion is also true (this assertion follows from the following theorem, given without proof ([9], pp. 178–180)).

Theorem 6 (Kolmogorov's theorem on the extension of a probability measure on (R^∞, B(R^∞))). Let P1, P2, ... be a sequence of probabilistic measures on (R, B(R)), (R², B(R²)), ..., possessing the consistency property (8). Then there exists a unique probabilistic measure P on (R^∞, B(R^∞)) such that for each n = 1, 2, ...

P(Jn(B)) = Pn(B),  B ∈ B(Rⁿ).  (9)

Example 3. An example of a probability distribution on (R^∞, B(R^∞)). Let F1(x), F2(x), ... be a sequence of one-dimensional distribution functions. Let us define the functions

G1(x1) = F1(x1),  G2(x1, x2) = F1(x1) F2(x2), ...,

and denote the corresponding probabilistic measures on (R, B(R)), (R², B(R²)), ... by P1, P2, ... (respectively). Then it follows from Theorem 6 that there exists a probabilistic measure P on (R^∞, B(R^∞)) such that

P{x ∈ R^∞ : (x1, x2, ..., xn) ∈ B} = Pn(B),  B ∈ B(Rⁿ),

and in particular

P{x ∈ R^∞ : x1 ≤ a1, x2 ≤ a2, ..., xn ≤ an} = F1(a1) F2(a2) ... Fn(an).
3.4. Tasks for independent work

1. Consider a probability space (R, B(R), P) and define the function F(x) = P((−∞, x]). Show the validity of the following formulas:

P((a, b]) = F(b) − F(a),  P([a, b]) = F(b) − F(a − 0),
P((a, b)) = F(b − 0) − F(a),  P([a, b)) = F(b − 0) − F(a − 0),
P((−∞, a)) = F(a − 0),  P((a, ∞)) = 1 − F(a),
P([a, ∞)) = 1 − F(a − 0),  P({a}) = F(a) − F(a − 0).
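These formulas can be checked on a small discrete example (ours, not from the textbook); the atoms at 0, 1, 2 are an arbitrary choice.

```python
from fractions import Fraction as Fr

# A toy discrete distribution: atoms at 0, 1, 2 with exact rational weights.
atoms = {0: Fr(1, 4), 1: Fr(1, 2), 2: Fr(1, 4)}

def F(x):
    # F(x) = P((-oo, x])
    return sum(p for a, p in atoms.items() if a <= x)

def F_left(x):
    # F(x - 0) = P((-oo, x))
    return sum(p for a, p in atoms.items() if a < x)

def P(pred):
    # Probability of the set {x : pred(x)} under the atomic measure.
    return sum(p for a, p in atoms.items() if pred(a))
```

For instance, P((0, 2]) = F(2) − F(0) and P({1}) = F(1) − F(1 − 0) hold exactly.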
2. Let G(x, y) = [x + y] ([a] is the integer part of the number a). Prove that this function satisfies the properties FF1, FF2, FF3, but does not satisfy the property FF4, i.e. is not a distribution function on R².

3. If F(x) is an (absolutely) continuous distribution function and f(x) is its distribution density, then the integrals

αk = ∫ from −∞ to ∞ of x^k f(x) dx,  βk = ∫ from −∞ to ∞ of |x|^k f(x) dx,  k = 0, 1, ...,

are called (respectively) the algebraic and absolute moments of the k-th order of the function F(x). Prove that for the algebraic moment of the k-th order to be finite it is necessary and sufficient that the absolute moment of the k-th order is finite: αk < ∞ ⇔ βk < ∞.

4. Continuation. Let the absolute moment of the k-th order of the distribution function F(x) be finite: βk < ∞. Prove that then β(k−1)² ≤ β(k−2) βk. Derive from this inequality β(k−1)^(1/(k−1)) ≤ βk^(1/k), from which the chain of inequalities follows:

β1 ≤ β2^(1/2) ≤ β3^(1/3) ≤ ... ≤ βk^(1/k).

5. Continuation. If for a distribution function F(x) its algebraic (or absolute) moment of the k-th order is finite, i.e. αk < ∞ (βk < ∞), then for any s ≤ k we have αs < ∞ (βs < ∞). Prove this statement.

6. Let a distribution function be defined by the formula (below m > 0):

F(x) = 0 for x < 1;  F(x) = 1 − x^(−m) for x ≥ 1.

Show that for this distribution function, for k < m,

αk = βk = m / (m − k),

and for k ≥ m the moments of the k-th order are not finite. (This problem shows that the moments of a distribution function cannot always be finite.)

The densities of the distributions appearing in tasks 7–15 are presented in Table 2 (see above, §3, point 3.1).

7. Show that the functions listed in the second column of Table 2 are the densities of some distribution functions.

8. Let F(x) be the function of a uniform distribution on [−a, a] with the density

f(x) = 1/(2a) for x ∈ [−a, a];  f(x) = 0 for x ∉ [−a, a].

Show that then

α(2k−1) = 0,  α(2k) = a^(2k)/(2k + 1),  βk = a^k/(k + 1)  (k = 0, 1, 2, ...).

9. Show that the moments of the gamma distribution are equal to

αk = βk = θ^(−k) λ(λ + 1)...(λ + k − 1)  (k = 0, 1, 2, ...).

10. Show that the moments of the exponential distribution are equal to

αk = βk = k!/λ^k.

11. The normal distribution with the parameters a = 0, σ = 1 is called the standard normal distribution. Show that

α(2k−1) = 0,  α(2k) = β(2k) = (2k)!/(2^k k!) = (2k − 1)!!,  β(2k+1) = √(2/π) 2^k k!.

12. Show that for the Laplace distribution (the two-sided exponential distribution) with the parameters a = 0, σ = 1:

α(2k−1) = 0,  α(2k) = (2k)!,  βk = k!.

13. Show that the algebraic moments of the beta distribution with the parameters r, s are equal to

αk = Γ(r + k)Γ(r + s) / (Γ(r)Γ(r + s + k)) = r(r + 1)...(r + k − 1) / ((r + s)(r + s + 1)...(r + s + k − 1)).

14. Show that the algebraic moments of the chi-square distribution with n degrees of freedom are equal to

αk = n(n + 2)...(n + 2(k − 1)).

15. Show that for the Student distribution with n degrees of freedom

α(2k−1) = 0,  α(2k) = n^k Γ(k + 1/2) Γ(n/2 − k) / (√π Γ(n/2)),  2k < n.

§4. Conditional probability. Independence

4.1. Conditional probability. The formula for multiplying probabilities

Let a probability space (Ω, 𝓕, P) be given. Let us consider the following problem: if it is known that an event B ∈ 𝓕 occurred (P(B) > 0), then, using this additional information, how do we find the probability of some event A ∈ 𝓕 (different from B)?
Under these conditions, it is natural to regard not Ω but B as the space of elementary events, since the fact that B occurred means that we are talking only about those elementary events that belong to B. If it is known that the event B occurred, then (under this condition) those and only those elementary events that belong to AB imply A. Since we identify the event A with the set of outcomes that lead to the event A, we now need to identify the event A with the event A ∩ B = AB. We can say that the event (set) AB is the event A viewed from the point of view according to which the space of elementary events is the event B. In the new space of elementary events B, the σ-algebra of events 𝓕(B) is defined (or, as they say, induced) by the σ-algebra of events 𝓕; namely, 𝓕(B) consists of the events of the form AB:

𝓕(B) = {AB : A ∈ 𝓕}.

Check that 𝓕(B) is really a σ-algebra, i.e. that it satisfies the conditions A1, A2′, A3 from §1, point 1.1. Note that 𝓕(B) is called the σ-algebra generated by the event B.

Let us define on (B, 𝓕(B)) a function P(B)(·) through the probabilistic function P as follows: for A ∈ 𝓕 we put

P(B)(AB) = P(AB)/P(B).  (1)

It follows from this definition (1) that for any event A ∈ 𝓕 the probability P(B)(AB) ≥ 0, that P(B)(B) = 1, and that for any events A1, A2, ... ∈ 𝓕 with AiB ∩ AjB = ∅ (i ≠ j)

P(B)(Σ(i=1..∞) AiB) = Σ(i=1..∞) P(B)(AiB).

The function P(B)(·) thus introduced satisfies all the axioms P1, P2, P3 of the probabilistic function, hence it is a probability (probabilistic function) on (B, 𝓕(B)). So, by the event B ∈ 𝓕, P(B) > 0, we have built a new probability space (B, 𝓕(B), P(B)(·)). This space is called the probability space generated by the event B.

We will explain the meaning of the probability P(B)(·) with the help of the classical definition of probability. In this case

Ω = {ω1, ω2, ..., ωn},  |Ω| = n < ∞,  P(ωi) = 1/|Ω| = 1/n,  i = 1, 2, ..., n.

Suppose now that B = {ω(i1), ω(i2), ..., ω(im)} and A = {ω(j1), ω(j2), ..., ω(js)}. Then, by the classical definition of probability,

P(B) = m/n,  P(A) = s/n,

and if AB = {ω(k1), ω(k2), ..., ω(kr)}, r ≤ min(m, s), then P(AB) = r/n.

If the event B is considered as a new space of elementary events, then the probability of the event AB is defined as the ratio of the number r of outcomes that favor AB to the total number m of outcomes. Thus, in accordance with the classical definition of probability,

P(B)(AB) = r/m = (r/n)/(m/n) = P(AB)/P(B).

On the other hand, the probability P(B)(·) can also be considered on the original σ-algebra 𝓕. On 𝓕 the function P(B)(·) is also a probability and is denoted by P(·/B). This is the basis for the following definition of the so-called conditional probability.

Definition 1. The conditional probability of an event A given that an event B with P(B) > 0 occurred is the probability (designation P(A/B))

P(A/B) = P(AB)/P(B).  (2)

The fact that the function P(·/B) defined by the formula (2) is a probabilistic function (probability) on (Ω, 𝓕), i.e. that it satisfies the axioms P1, P2, P3, follows directly from the definition (2). Thus, by the event B ∈ 𝓕, P(B) > 0, we can construct the new probability space (Ω, 𝓕, P(·/B)). Therefore, for the probabilistic function P(·/B) all the properties of probability that were proved above (in §1) take place:

P(Ω/B) = 1,  P(∅/B) = 0,

P(Ā/B) = 1 − P(A/B),

P(A1 + A2 + ... /B) = P(A1/B) + P(A2/B) + ...  (for pairwise disjoint A1, A2, ...),

P(A ∪ C/B) = P(A/B) + P(C/B) − P(AC/B).

We will prove only the last property. By definition (2),

P(A ∪ C/B) = P((A ∪ C)B)/P(B) = P(AB ∪ CB)/P(B) = (P(AB) + P(CB) − P(ACB))/P(B) = P(A/B) + P(C/B) − P(AC/B).
Remark 1. We saw above that for any event A ∈ 𝓕

P(A/B) + P(Ā/B) = 1.

But in the general case it cannot be asserted that for any events A, B

P(A/B) + P(A/B̄) = 1,  or that  P(A/B) + P(Ā/B̄) = 1

(give examples!). Formula (2) implies the formula

P(AB) = P(B) P(A/B).  (3)

Formula (3) is called the formula (or theorem) of multiplication of probabilities. Formula (3) can be generalized to any finite number of products of events.

Theorem 1. Let the events A1, A2, ..., An satisfy the condition P(A1 A2 ... A(n−1)) > 0. Then

P(A1 A2 ... An) = P(A1) P(A2/A1) P(A3/A1A2) ... P(An/A1A2...A(n−1)).  (3′)

Proof. First of all, we note that for all k = 1, 2, ..., n − 1 we have A1A2...A(n−1) ⊆ A1A2...A(n−k), and this implies (by the condition of the theorem)

0 < P(A1A2...A(n−1)) ≤ P(A1A2...A(n−k)),

which means that all the conditional probabilities on the right-hand side of (3′) are defined (have meaning). To prove (3′) we use induction. For n = 2 the formula (3′) is just the formula (3). To prove (3′) for any finite n, we denote B = A1...A(n−1), A = An and apply (3) together with the induction assumption of validity when n is replaced by n − 1. ∎

The formulas of the type (3) and (3′) show that on the same space Ω with the σ-algebra 𝓕, in addition to the probability P, it is convenient to consider the conditional probabilities P(·/B).

Examples. 1. Consider a family with two children. Denote a boy and a girl by the letters «b», «g» (respectively), indicating the older child in the first place. Find the probability that both children in the family are boys (event C), given that a) the older child is a boy (event A); b) at least one of the children is a boy (event B).

Solution. There are only four possibilities bb, bg, gb, gg, so the sample space can be described as:
Ω = {bb, bg, gb, gg}. We assume that all elementary events are equally probable. We have:

A = {bb, bg},  B = {bb, bg, gb},  C = {bb}.

Then

P(C/A) = 1/2,  P(C/B) = 1/3.

Note that P(C/A) = 1/2 ≠ 1/4 = P(C), and P(C/B) ≠ P(C).
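The enumeration in this example is small enough to verify by direct counting; the sketch below (ours) computes the conditional probabilities exactly with rational arithmetic.

```python
from fractions import Fraction

omega = ["bb", "bg", "gb", "gg"]       # first letter = the older child

def P(event):
    return Fraction(len(event), len(omega))

def cond(A, B):
    # P(A / B) = P(AB) / P(B), computed by counting equiprobable outcomes.
    return Fraction(len(set(A) & set(B)), len(B))

A = [w for w in omega if w[0] == "b"]  # the older child is a boy
B = [w for w in omega if "b" in w]     # at least one boy
C = [w for w in omega if w == "bb"]    # both children are boys
```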
2. There are m white and n − m black (in total m + (n − m) = n) balls in an urn. Two balls are extracted from the urn sequentially, without replacement. Find the probabilities of the following events: a) the 1st ball is white (event A1); b) the 2nd ball is white (event A2); c) both balls are white (event A1A2).

Solution. It is clear that P(A1) = m/n. To find the probability P(A2), we represent the event A2 in the form of a sum of two mutually exclusive events:

A2 = A1A2 + Ā1A2.

Whence, applying first the formula of addition of probabilities for mutually exclusive events, then the formula for multiplying probabilities, we obtain

P(A2) = P(A1A2) + P(Ā1A2) = P(A1)P(A2/A1) + P(Ā1)P(A2/Ā1) = (m/n)·((m − 1)/(n − 1)) + ((n − m)/n)·(m/(n − 1)) = m/n.

Thus, we have also proved that

P(A1A2) = m(m − 1)/(n(n − 1)).

Here we pay attention to the fact that the events A1 and A2 turned out to be equally probable, i.e. the probability of extracting a white ball from the urn turned out to be independent of the order of extraction.
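The computation above can be reproduced exactly with rational arithmetic (our sketch, with arbitrary values of m and n).

```python
from fractions import Fraction as Fr

def urn_probs(m, n):
    # m white balls among n; two draws without replacement.
    p_a1 = Fr(m, n)                                  # 1st ball white
    # Total probability: {2nd white} = {1st white, 2nd white}
    #                               + {1st black, 2nd white}.
    p_a2 = Fr(m, n) * Fr(m - 1, n - 1) + Fr(n - m, n) * Fr(m, n - 1)
    p_both = Fr(m, n) * Fr(m - 1, n - 1)             # both balls white
    return p_a1, p_a2, p_both
```

For any m and n the first two probabilities coincide, confirming that the order of extraction does not matter.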
In another way, this example can be interpreted as follows. All n balls in the urn are exam tickets, and the white balls are the lucky tickets for a given student, i.e. those tickets to which the student knows the correct answer (with which the student will successfully pass the exam). Then what is more likely for the student: «the student goes first and successfully passes the exam» (event A1), or «the student goes second and successfully passes the exam» (event A2)? Our example shows that P(A2) = P(A1) = m/n. Conclusion: the probability of successfully passing the exam does not depend on whether the student enters the exam first or second, but depends only on his preparation for the exam.

4.2. Independence

The concept of independence of two or more events (or trials) occupies, in a certain sense, the central place in probability theory. From the mathematical point of view, this concept determines the distinctive character of probability theory within the general theory dealing with the study of measurable spaces with measure. We should also note that one of the founders of probability theory, the outstanding scientist A. Kolmogorov, paid special attention to the fundamental nature of the concept of independence in probability theory in the thirties of the last century (see [12]: A. Kolmogorov, Basic Concepts of Probability Theory, Moscow, «Science», 1974). Below we first dwell on the concept of independence of events, then extend this notion to partitions and to algebras of sets, and in conclusion we consider the independence of trials and σ-algebras.

4.2.1. Independence of events

Let a probability space (Ω, 𝓕, P) and events A, B ∈ 𝓕 be given. If the conditional probability of the event A under the condition that the event B (P(B) > 0) occurred is equal to the (unconditional) probability of the event A, i.e.
P(A/B) = P(A),  (3″)

then it is natural to assume that the event A does not depend on the event B. If this is so (i.e. (3″) takes place), then from the formula of conditional probability (3) we obtain the formula

P(AB) = P(A)P(B).  (4)

Now let P(A) > 0 and let the condition (3″) be satisfied. Then, in view of the fact that (4) holds, we obtain

P(B/A) = P(BA)/P(A) = P(B)P(A)/P(A) = P(B),  (5)

i.e. the event B does not depend on the event A. From what has been said, we come to the following conclusion: the concept of independence of two events is a symmetric concept. If the event A does not depend on the event B, then the event B does not depend on the event A. But the above formulas (3″) and (5) have one drawback: they require the strict positivity of the probabilities that stand in the denominators in (3″) and (5), P(A) > 0 or P(B) > 0. But, as we noted earlier (§1, point 1.2), an event can have probability 0 (zero) and still happen. Therefore, in the formulas (3″) and (5), the requirements P(A) > 0 or P(B) > 0 restrict the domains of applicability of these formulas and of the concept of independence of events. Therefore, relation (4), which is a consequence of (3″) and (5) but does not require the conditions P(A) > 0 or P(B) > 0, is taken for the definition of independence.

Definition 2. If the probability of the product of events A and B is equal to the product of the probabilities of the events A and B, i.e. if relation (4) is satisfied, then the events A and B are called independent events.

We obtain from the definition that if P(A) = 0, then for any B we have P(AB) = 0 = P(A)P(B) (because AB ⊆ A, therefore 0 ≤ P(AB) ≤ P(A) = 0, i.e. P(AB) = 0), so that (4) takes place. In other words, if P(A) = 0, then A and any event B are independent. We now formulate several assertions related to independence in the form of a theorem.

Theorem 2. a) If P(B) > 0, then the independence of the events A and B, i.e. relation (4), is equivalent to the condition P(A/B) = P(A).

b) If A and B are independent events, then the pairs of events Ā, B (respectively A, B̄) and Ā, B̄ are also independent;

c) If P(A) = 0 or P(A) = 1, then A and any event B are independent;

d) If A and B1 are independent events, A and B2 are independent events, and B1B2 = ∅, then A and B1 + B2 are independent events.

Proof. a) In this case (3″) implies (4) (we saw this above). If, however, (4) holds, then

P(A/B) = P(AB)/P(B) = P(A)P(B)/P(B) = P(A),

i.e. the formula (3″) is correct.

b) It suffices to show that condition (4) implies the relations

P(ĀB) = P(Ā)P(B),  P(ĀB̄) = P(Ā)P(B̄).

Indeed, since B = AB + ĀB and ĀB̄ is the complement of A ∪ B, by the properties of probabilities and the definition (4) we can write

P(ĀB) = P(B) − P(AB) = P(B) − P(A)P(B) = (1 − P(A))P(B) = P(Ā)P(B),

P(ĀB̄) = 1 − P(A ∪ B) = 1 − P(A) − P(B) + P(AB) = 1 − P(A) − P(B) + P(A)P(B) = (1 − P(A))(1 − P(B)) = P(Ā)P(B̄).

c) The case P(A) = 0 was proved above (immediately after the definition). The validity of the assertion in the case P(A) = 1 is a consequence of the assertion for the case P(A) = 0 and of assertion b): if P(A) = 1, then P(Ā) = 0. As a consequence of this statement (which is proved very simply), we get the following: an impossible event and any other event are independent; a certain event and any other event are independent.

d) This property follows from the chain of the following equalities:

P(A(B1 + B2)) = P(AB1 + AB2) = P(AB1) + P(AB2) = P(A)P(B1) + P(A)P(B2) = P(A)(P(B1) + P(B2)) = P(A)P(B1 + B2). ∎

Remark 2. If the condition B1B2 = ∅ is not satisfied, then assertion d) of Theorem 2 may turn out to be false (give an example!).

The concept of independence of two events introduced by Definition 2 is called statistical or stochastic independence (these terms are synonyms). Usually the independence of A and B is not established by means of equality (4), but is postulated on the basis of some considerations. Using equality (4), we calculate the probability P(AB), knowing the probabilities P(A) and P(B) of two independent events. When establishing the independence of events A and B, the following principle is often used: «Events A and B whose real preimages Ã and B̃ are causally independent are independent (stochastically independent)». The real meaning of this principle can be related to the property of stability of frequencies. Suppose that, in n observations, the events A, B and AB occurred n(A), n(B) and n(AB) times (respectively). The stability of the frequencies means that

n(A)/n ≈ P(A),  n(B)/n ≈ P(B),  n(AB)/n ≈ P(AB),  n(AB)/n(B) ≈ P(A/B) = P(AB)/P(B).

Then from the independence of the events A and B, i.e. from P(A/B) = P(A), it follows that

n(AB)/n(B) ≈ n(A)/n,

or, equivalently,

n(AB)/n ≈ (n(A)/n) · (n(B)/n).  (6)
^ i, j : i, j
1,...,6` , A
C
^1, j : j
1,...,6` , B
^ i, 2 : i
1,...,6` ,
^i, j : i j d 3` ^1,1 , 1,2 , 2,1 ` ,
therefore
P A
P B
1 , P AB 6
1 1 1 P A P B , P(C ) 36 6 6 1 1 1 P( AC ) z P( A) P(C ). 18 12 6
1 , 12
Thus, A and B are stochastically independent events, but A and C are not independent. Also B and C are dependent events (prove!).

4. Consider families with three children and assume that all eight possible outcomes «bbb», «bbg», «bgb», ..., «ggg» are equally likely («b» is a boy, «g» is a girl; «bgb» means that the oldest and youngest children are boys and the middle child is a girl, etc.). We introduce the events A = {there are both boys and girls in the family}, B = {there is no more than one girl in the family}. Then

P(A) = 3/4, P(B) = 1/2, P(AB) = 3/8,

therefore P(AB) = P(A)P(B).
The equality P(AB) = P(A)P(B) means that A and B are independent events. But it turns out that for a family with two or four children these events are already dependent (check!).

Assume now that pairwise independent events A, B, C are given:

P(AB) = P(A)P(B), P(BC) = P(B)P(C), P(AC) = P(A)P(C).   (7)

We pose the following question: does the independence of the events A, B, C follow from their pairwise independence, i.e. from (7)? In other words, is the formula

P(ABC) = P(AB)P(C) = P(A)P(B)P(C)   (8)
correct? The answer is negative. Let us give an example.

5. Bernstein's example. Suppose we have a tetrahedron made of a homogeneous material, three faces of which are painted in three different colors: red (event A), blue (event B) and green (event C); the fourth face is painted in all three colors (the event ABC). The experiment consists of a single toss of this tetrahedron onto a plane, and we assume that if the tetrahedron falls on a face containing some color, then the event corresponding to this color occurs. Then P(A) = 1/2, because red is present on two of the four faces of the tetrahedron. Similarly, P(B) = P(C) = 1/2. Since any two different colors occur together only on one face,

P(AB) = P(BC) = P(AC) = 1/4 = (1/2)·(1/2),

and these last relations mean the pairwise independence of the events A, B, C (conditions (7) are satisfied). Further, since only one face of the tetrahedron is painted in all three colors, P(ABC) = 1/4. So

P(ABC) = 1/4 ≠ (1/2)·(1/2)·(1/2) = P(A)P(B)P(C),

i.e. condition (8) is not satisfied.
6. Let Ω = {ω₀, ω₁, ω₂, ω₃} and P(ωᵢ) = 1/4, i = 0,1,2,3. We introduce the events Aᵢ = {ω₀, ωᵢ}, i = 1,2,3. Then

P(AᵢAⱼ) = 1/4 = (1/2)·(1/2) = P(Aᵢ)P(Aⱼ), i, j = 1,2,3, i ≠ j,

but

P(A₁A₂A₃) = 1/4 ≠ P(A₁)P(A₂)P(A₃) = 1/8.
Therefore, the events A₁, A₂, A₃ are pairwise independent, but the events AᵢAⱼ and Aₖ (i, j, k = 1,2,3 are different indices) are dependent. Note that examples 5 and 6 in fact refer to the same probabilistic model.

If events A, B, C are pairwise independent and, moreover, the probability of their product is equal to the product of their probabilities (P(ABC) = P(A)P(B)P(C), see formula (8)), then they are called mutually independent (or simply independent) events. The following definition generalizes this notion of independence to the general case.

Definition 3. Let A₁, A₂, ..., Aₙ be events defined on the same probability space (Ω, ℱ, P). If for any indices 1 ≤ i₁ < i₂ < ... < iᵣ ≤ n, r = 2,3,...,n, the equalities

P(A_{i₁}A_{i₂}...A_{iᵣ}) = P(A_{i₁})P(A_{i₂})...P(A_{iᵣ})   (9)

hold, then the events A₁, A₂, ..., Aₙ are called mutually independent (or simply independent) events.

Formula (9), which gives the condition of independence of n events, contains 2ⁿ − n − 1 conditions. Indeed, if r = 2, then Cₙ² conditions correspond to this case in (9):

P(AᵢAⱼ) = P(Aᵢ)P(Aⱼ), 1 ≤ i < j ≤ n;   (9₂)

if r = 3, then Cₙ³ conditions correspond to this case in (9):

P(AᵢAⱼAₖ) = P(Aᵢ)P(Aⱼ)P(Aₖ), 1 ≤ i < j < k ≤ n;   (9₃)

and so on; if r = n, then Cₙⁿ = 1 condition corresponds to this case in (9):

P(A₁A₂...Aₙ) = P(A₁)P(A₂)...P(Aₙ).   (9ₙ)

Thus, the total number of conditions in (9) is

Cₙ² + Cₙ³ + ... + Cₙⁿ = 2ⁿ − Cₙ⁰ − Cₙ¹ = 2ⁿ − n − 1.
The relations (9₂) are, by definition, the conditions of pairwise independence of the n events A₁, A₂, ..., Aₙ. It follows from the definition that if the events A₁, A₂, ..., Aₙ are independent, then the events of any subset A_{i₁}, A_{i₂}, ..., A_{iₖ} (2 ≤ k ≤ n − 1) are also independent. Definition 3 also implies the following property of conditional probabilities.

Theorem 3. If the events A₁, A₂, ..., Aₙ are independent, the indices i₁, i₂, ..., iᵣ and j₁, j₂, ..., jₖ taken from the set 1, 2, ..., n are all different, and P(A_{i₁}A_{i₂}...A_{iᵣ}) > 0, then

P(A_{j₁}A_{j₂}...A_{jₖ} / A_{i₁}A_{i₂}...A_{iᵣ}) = P(A_{j₁}A_{j₂}...A_{jₖ}).   (10)

Proof. Let the events A₁, A₂, ..., Aₙ be independent. Then

P(A_{j₁}A_{j₂}...A_{jₖ}) = P(A_{j₁})P(A_{j₂})...P(A_{jₖ}),

P(A_{i₁}A_{i₂}...A_{iᵣ}) = P(A_{i₁})P(A_{i₂})...P(A_{iᵣ}),

and

P(A_{i₁}...A_{iᵣ}A_{j₁}...A_{jₖ}) = P(A_{i₁})...P(A_{iᵣ})·P(A_{j₁})...P(A_{jₖ}).
Now it remains to expand the conditional probability on the left-hand side of (10) by the conditional probability formula. ∎

The following definition extends the concept of independence of events to sequences of events.

Definition 3′. Let A₁, A₂, ... be a sequence of events of some probability space (Ω, ℱ, P). If for any set of indices 1 ≤ i₁ < i₂ < ... < iᵣ ≤ n, r = 2,3,...,n, n = 2,3,..., the conditions (9) are satisfied, then such a sequence of events is called a sequence of independent events. Clearly, this definition is equivalent to saying that for any n = 2,3,... any n events taken from the sequence A₁, A₂, ... are independent.

Example 7. Let us show that if A₁, A₂, ... is a sequence of independent events, then

P(⋂_{k=1}^∞ Aₖ) = ∏_{k=1}^∞ P(Aₖ).

Solution. For n = 1, 2, ... introduce the sequence of events Bₙ = ⋂_{k=1}^n Aₖ. Then

Bₙ₊₁ ⊆ Bₙ, Bₙ ↓ B = ⋂_{k=1}^∞ Bₖ = ⋂_{k=1}^∞ Aₖ,

consequently, by the continuity axiom (axiom P3″),

P(B) = P(⋂_{k=1}^∞ Aₖ) = lim_{n→∞} P(Bₙ) = lim_{n→∞} ∏_{k=1}^n P(Aₖ) = ∏_{k=1}^∞ P(Aₖ).
4.2.2. Independence of partitions and algebras. Independent trials. Independence of σ-algebras

Let Ω = {ω} be a sample space. In §2, p. 2.1 we introduced the concept of a partition; let us recall this definition and some related results. If Dᵢ ⊆ Ω, DᵢDⱼ = ∅ (i ≠ j), Σᵢ Dᵢ = Ω, then the system

𝒟 = {D₁, D₂, ..., Dₙ, ...}

is called a partition of Ω, and the sets Dᵢ are called the atoms of this partition. The system of sets

α(𝒟) = {Σ_{j=1}^n D_{iⱼ}: iⱼ ≠ i_l for j ≠ l, D_{iⱼ} ∈ 𝒟, n ≤ ∞}

is called the algebra generated by the partition 𝒟. Also in §2, p. 2.1 the theorem stating that «any finite algebra of sets is generated by some finite partition» was proved.

Definition 4. Let partitions 𝒟₁, 𝒟₂, ..., 𝒟ₙ be given. If for any indices 1 ≤ j₁ < j₂ < ... < jₛ ≤ n, s = 2,3,...,n, the atoms D_{j₁} ∈ 𝒟_{j₁}, D_{j₂} ∈ 𝒟_{j₂}, ..., D_{jₛ} ∈ 𝒟_{jₛ} are independent, i.e. the conditions

P(D_{j₁}D_{j₂}...D_{jₛ}) = P(D_{j₁})P(D_{j₂})...P(D_{jₛ})

hold, then 𝒟₁, 𝒟₂, ..., 𝒟ₙ are called independent partitions.
Definition 5. Let algebras 𝒜₁, 𝒜₂, ..., 𝒜ₙ be given. If for any indices 1 ≤ i₁ < i₂ < ... < iᵣ ≤ n, r = 2,...,n, and any events A_{iⱼ} ∈ 𝒜_{iⱼ}, j = 1,...,r, the conditions

P(A_{i₁}A_{i₂}...A_{iᵣ}) = P(A_{i₁})P(A_{i₂})...P(A_{iᵣ})

are satisfied, then the algebras 𝒜₁, 𝒜₂, ..., 𝒜ₙ are called independent algebras.

The concepts of sequences of independent partitions and sequences of independent algebras are defined by analogy with the concept of a sequence of independent events (see Definition 3′). From Definitions 4 and 5 we obtain that any k (2 ≤ k ≤ n) partitions (algebras) out of n (n ≥ 2) independent partitions (algebras) are also independent partitions (algebras). Similarly, any subsequence of a sequence of independent partitions (algebras) is also a sequence of independent partitions (algebras).

Theorem 4. For partitions 𝒟₁, 𝒟₂, ..., 𝒟ₙ to be independent, it is necessary and sufficient that the algebras 𝒜₁ = α(𝒟₁), 𝒜₂ = α(𝒟₂), ..., 𝒜ₙ = α(𝒟ₙ) generated by them be independent.

Proof. Since 𝒟ᵢ ⊆ α(𝒟ᵢ) = 𝒜ᵢ, the independence of the algebras 𝒜₁, ..., 𝒜ₙ implies the independence of the partitions 𝒟₁, ..., 𝒟ₙ. On the other hand, by the above-mentioned Theorem 1 of §2, each event Aᵢ ∈ 𝒜ᵢ can be written as a sum of some atoms of the partition 𝒟ᵢ:

Aᵢ = D_{i,s₁} + ... + D_{i,s_l}.

From the independence of the partitions it follows that all events represented as sums of atoms of the corresponding partitions are also independent. For example, if 𝒟₁ = {D₁₁, D₁₂, D₁₃} and 𝒟₂ = {D₂₁, D₂₂, D₂₃} are independent, then D₁₁ + D₁₂ and D₂₂ + D₂₃ are independent events, because

P{(D₁₁ + D₁₂)(D₂₂ + D₂₃)} = P{D₁₁D₂₂ + D₁₁D₂₃ + D₁₂D₂₂ + D₁₂D₂₃}
= P(D₁₁)P(D₂₂) + P(D₁₁)P(D₂₃) + P(D₁₂)P(D₂₂) + P(D₁₂)P(D₂₃)
= [P(D₁₁) + P(D₁₂)][P(D₂₂) + P(D₂₃)] = P(D₁₁ + D₁₂)P(D₂₂ + D₂₃).

But this, by Definition 5, gives the independence of the algebras 𝒜₁, 𝒜₂, ..., 𝒜ₙ. ∎

Any event A ≠ ∅ generates a partition 𝒟_A = {A, Ā}, and this partition in turn generates the algebra α(𝒟_A) = 𝒜_A = {∅, A, Ā, Ω}. Since the independence of events
A and B implies the independence of A and B̄, of Ā and B, and of Ā and B̄ (§4, p. 4.2, Theorem 2), we obtain the validity of the following theorem.

Theorem 5. The independence of events A₁, A₂, ..., Aₙ is equivalent to the independence of the algebras 𝒜_{A₁}, 𝒜_{A₂}, ..., 𝒜_{Aₙ} generated by them.

Now we briefly consider the concepts of independent trials and of a sequence of independent trials. Recall that by a trial we understand an experiment whose outcomes are certain (random) events; in our axiomatics, a trial is a probability space. For simplicity, first consider the case of two trials G₁ and G₂. Let (Ω₁, ℱ₁, P₁) and (Ω₂, ℱ₂, P₂) be the probability spaces corresponding to the trials G₁ and G₂. If these probability spaces are models of causally independent trials, then the σ-algebras ℱ₁ and ℱ₂ must be independent. It is natural to define stochastic independence as follows: if any event of the probability space corresponding to the trial G₁ is independent of any event of the probability space corresponding to the trial G₂, then the trials G₁ and G₂ are called independent trials.

This statement requires clarification, because to define the (stochastic) independence of events it is necessary that these events be defined on the same probability space. In other words, we must represent the σ-algebras ℱ₁ and ℱ₂ as σ-subalgebras of some σ-algebra on a common probability space (Ω, ℱ, P). Such a probability space can always be constructed. To do this, we construct the probability space (Ω, ℱ, P) needed for the «compound» experiment G as a direct product of the probability spaces corresponding to the experiments G₁ and G₂. More precisely, in the new probability space (Ω, ℱ, P) we define the sample space Ω as the direct product of Ω₁ and Ω₂: Ω = Ω₁ × Ω₂; we construct the σ-algebra ℱ as the direct product ℱ = ℱ₁ ⊗ ℱ₂, the σ-algebra generated by the rectangles {A₁ × A₂: A₁ ∈ ℱ₁, A₂ ∈ ℱ₂}; and we define P as a probability function on (Ω, ℱ).

Definition 6. If for any events A = A₁ × A₂, A₁ ∈ ℱ₁, A₂ ∈ ℱ₂, the condition

P(A) = P(A₁ × A₂) = P₁(A₁)P₂(A₂) = P(A₁ × Ω₂)P(Ω₁ × A₂)   (11)
is satisfied, then the trials G₁ and G₂ are called independent trials.

We will show that this definition is indeed a definition of independence of the trials G₁ and G₂, i.e. that it gives the independence of any event of the trial G₁ from any event of the trial G₂. If we introduce the events A′₁ = A₁ × Ω₂, A′₂ = Ω₁ × A₂, then for the event A₁ (A₂) to occur it is necessary and sufficient that the event A′₁ (A′₂) occur, since Ω₁ (Ω₂) is a certain event of the trial G₁ (G₂). This establishes a one-to-one correspondence A′ᵢ ↔ Aᵢ in the space (Ω, ℱ). Since, by (11), the events A′₁ and A′₂ of the probability space (Ω, ℱ, P) are independent, the events A₁ (an event of the trial G₁) and A₂ (an event of the trial G₂) are independent. And this, in turn, means the independence of the σ-algebras ℱ₁ and ℱ₂.

The concept of independence of trials G₁, G₂, ..., Gₙ is defined similarly: we assign to each trial Gᵢ the probability space (Ωᵢ, ℱᵢ, Pᵢ) and form the «composite» probability space (Ω, ℱ, P) as follows:

Ω = Ω₁ × Ω₂ × ... × Ωₙ, ℱ = ℱ₁ ⊗ ℱ₂ ⊗ ... ⊗ ℱₙ (generated by {A₁ × A₂ × ... × Aₙ: Aᵢ ∈ ℱᵢ}),

the function P being defined as a probability function on (Ω, ℱ). If for any events A₁ × A₂ × ... × Aₙ ∈ ℱ the condition

P(A₁ × A₂ × ... × Aₙ) = P₁(A₁)P₂(A₂)...Pₙ(Aₙ)   (11′)
holds, then we call the trials G₁, G₂, ..., Gₙ independent trials. Furthermore, if for any indices 1 ≤ i₁ < i₂ < ... < iₖ ≤ n, k = 2,3,...,n, n = 2,3,..., and any events A_{iⱼ} ∈ ℱ_{iⱼ} the condition

P(A_{i₁} × A_{i₂} × ... × A_{iₖ}) = P_{i₁}(A_{i₁})P_{i₂}(A_{i₂})...P_{iₖ}(A_{iₖ})

is satisfied, then the sequence of trials G₁, G₂, ... is called a sequence of independent trials.

Examples. 8. Bernoulli scheme. In this model

Ω = {ω: ω = (ω₁, ω₂, ..., ωₙ), ωᵢ = 0,1}, 𝒜 = {A: A ⊆ Ω}, P(ω) = p^{Σω} q^{n−Σω},   (12)

where Σω = ω₁ + ... + ωₙ is the number of successes. Let A ∈ 𝒜. If this event is determined only by the value ωₖ, then we say that the event depends on the trial at the k-th moment of time. Examples of such events are the events
Aₖ = {ω ∈ Ω: ωₖ = 1}, Āₖ = {ω ∈ Ω: ωₖ = 0}.

Now consider the partition 𝒟ₖ and the algebra 𝒜ₖ generated by the event Aₖ (k = 1,2,...,n):

𝒟ₖ = {Aₖ, Āₖ}, 𝒜ₖ = α(𝒟ₖ) = {∅, Ω, Aₖ, Āₖ}.

Then it is not difficult to show that

P(Aₖ) = p, P(AₖA_l) = P(Aₖ)P(A_l) = p² (k ≠ l),

and so on; in general, for k = 2,3,...,n and distinct indices i₁, i₂, ..., iₖ

P(A_{i₁}A_{i₂}...A_{iₖ}) = P(A_{i₁})P(A_{i₂})...P(A_{iₖ}).
These relations show the independence of the events A₁, A₂, ..., Aₙ, and this in turn (in view of the assertions proved above) means the independence of the partitions 𝒟₁, 𝒟₂, ..., 𝒟ₙ and of the algebras 𝒜₁, 𝒜₂, ..., 𝒜ₙ generated by them, and thereby the independence of the corresponding trials. J. Bernoulli was the first to study this model; he proved for it the so-called law of large numbers (this law will be considered in Chapter V, §1). The model can be described as a «model of independent trials in each of which there are only two outcomes, with probability of success p». In the literature it is usually called a sequence of independent Bernoulli trials, or simply the Bernoulli scheme (Ch. I, §2, p. 2.1).

9. Polynomial scheme. If each trial has r outcomes with probabilities of occurrence p₁, ..., p_r (p₁ + p₂ + ... + p_r = 1, r ≥ 3), then the probability that as a result of n independent trials (in a predetermined order) the 1st outcome occurs n₁ times, the 2nd n₂ times, ..., the r-th n_r times, where n₁ + n₂ + ... + n_r = n, nⱼ ≥ 0, is equal to p₁^{n₁} p₂^{n₂} ... p_r^{n_r} (because of the independence of the trials the probability of a product of events is equal to the product of their probabilities; for example, if the 1st outcome occurs n₁ times, then p₁ occurs n₁ times in this product, etc.). In turn, the number of ways the outcomes can occur in an arbitrary order consistent with the numbers above, i.e. the number of nonnegative integer solutions of the equation

n₁ + n₂ + ... + n_r = n, nⱼ ≥ 0,

is equal to (Ch. I, §1, p. 1.1, Theorem 3)

n! / (n₁! n₂! ... n_r!).

Therefore, the probability that as a result of n independent trials the i-th outcome occurs (in an arbitrary order) nᵢ times, i = 1,2,...,r, n₁ + n₂ + ... + n_r = n, is equal to

Pₙ(n₁, n₂, ..., n_r) = [n! / (n₁! n₂! ... n_r!)] p₁^{n₁} p₂^{n₂} ... p_r^{n_r}.   (13)
The described sequence of independent trials (model) is called a polynomial scheme, and the distribution (13) a polynomial (multinomial) distribution (see Ch. I, §2, p. 2.1).

10. Negative binomial distribution. Let us find the probability that in a sequence of independent Bernoulli trials with probability of success p the success No. n occurs at the (n+k)-th trial (k = 0,1,2,...). If the success No. n occurred at the (n+k)-th trial, this means that at the last, (n+k)-th, trial a success occurred (event A), and in the previous n+k−1 trials there were n−1 successes and k failures (event B). Since the trials are independent, the events A and B are independent, and the required probability is

p(n, n+k) = P(BA) = P(B)P(A) = C_{n+k−1}^{n−1} p^{n−1} q^k · p = C_{n+k−1}^{n−1} pⁿ q^k (k = 0,1,2,...).   (14)

This distribution is called the negative binomial distribution. The name is associated with the (formally defined) binomial coefficient

C_{−n}^k = (−n)(−n−1)...(−n−k+1) / k! = (−1)^k C_{n+k−1}^k = (−1)^k C_{n+k−1}^{n−1}.   (14*)

(For negative integers −n the coefficients C_{−n}^k are defined formally, by analogy with the binomial coefficients.) The last relation allows us to write the distribution (14) in the form

p(n, n+k) = C_{−n}^k pⁿ (−q)^k, k = 0,1,2,... .
Further, since

Σ_{k=0}^∞ C_{n+k−1}^{n−1} q^k = Σ_{k=0}^∞ C_{−n}^k (−q)^k = (1 − q)^{−n} = p^{−n},

we have

Σ_{k=0}^∞ p(n, n+k) = pⁿ Σ_{k=0}^∞ C_{−n}^k (−q)^k = pⁿ · p^{−n} = 1.
The latter means that the set of numbers {p(n, n+k), k = 0,1,2,...} is indeed a probability distribution. The negative binomial distribution is sometimes called the Pascal distribution. In the case n = 1 the distribution (14) becomes a geometric distribution.

11. The Banach matchbox problem. Suppose there are 2n matches in the matchboxes a and b (n in each). Someone each time chooses matchbox a or b with probabilities P(a) = p, P(b) = q = 1 − p (respectively) and takes one match from the selected box (for example, to light a cigarette). Find the probability that at the moment when an empty box is first taken out, the other box contains r (r ≤ n) matches.

Solution. We solve the problem with the help of the negative binomial distribution. It is not difficult to see that the desired probability is the probability of the event A = {to take out the empty box a (b) for the first time, that box must be chosen n + 1 times and the other box n − r times, the box a (b) being chosen at the last trial}. To realize this event one needs to make (n + 1) + (n − r) = 2n − r + 1 extractions of matchboxes (trials). Let H₁ and H₂ be the events meaning that the box a (event H₁) or the box b (event H₂) is the box first taken out empty. We have

A = AH₁ + AH₂.

Here the event AH₁ means that in the first 2n − r trials the box b was chosen n − r times and the box a was chosen n times, and the box a was chosen at the last trial (i.e. at trial No. 2n − r + 1). Then, by the negative binomial distribution formula,

P(AH₁) = C_{2n−r}^{n} pⁿ q^{n−r} · p = C_{2n−r}^{n} p^{n+1} q^{n−r}.

Similarly,

P(AH₂) = C_{2n−r}^{n} q^{n+1} p^{n−r}.

By the addition formula, the desired probability is

P(A) = C_{2n−r}^{n} (p^{n+1} q^{n−r} + q^{n+1} p^{n−r}).
The found probability is not equal to the probability that at the moment when one of the boxes has become empty (but has not yet been taken out empty) the other box contains exactly r matches. Note also that the box first discovered to be empty need not be the box first emptied (we suggest the reader think through the described situations).

Definition 7. Let (Ω, ℱ, P) be a probability space and ℱ₁, ℱ₂, ..., ℱₙ be σ-subalgebras of the basic σ-algebra ℱ (ℱᵢ ⊆ ℱ, ℱᵢ are σ-algebras). If any events A₁ ∈ ℱ₁, A₂ ∈ ℱ₂, ..., Aₙ ∈ ℱₙ are independent, then ℱ₁, ℱ₂, ..., ℱₙ are called independent σ-algebras. If for any n = 2,3,... any n σ-algebras taken from the sequence ℱ₁, ℱ₂, ... are independent, then ℱ₁, ℱ₂, ... is called a sequence of independent σ-algebras.

It follows from the definition that if ℱ₁, ℱ₂, ... is a sequence of independent σ-algebras, then for any indices n₁ < n₂ < ... the subsequence ℱ_{n₁}, ℱ_{n₂}, ... is also a sequence of independent σ-algebras. It is obvious that σ-subalgebras of independent σ-algebras are also independent σ-algebras.

Theorem 6 (the fundamental theorem). If 𝒜₁ and 𝒜₂ are independent algebras, then the σ-algebras ℱ_{𝒜₁} and ℱ_{𝒜₂} generated by them are independent.

Before proving this theorem, we prove the so-called approximation theorem, according to which any event from the σ-algebra ℱ_𝒜 generated by some algebra 𝒜 can be approximated by a sequence of events from 𝒜.

Theorem 7 (approximation theorem). Let (Ω, ℱ, P) be a probability space and let ℱ_𝒜 be the σ-algebra generated by some algebra 𝒜 of events from ℱ. Then for any A ∈ ℱ_𝒜 there is a sequence Aₙ ∈ 𝒜, n = 1,2,..., with the property

lim_{n→∞} P(Aₙ Δ A) = lim_{n→∞} P(ĀₙA + AₙĀ) = 0.   (15)

Remark 3. Condition (15) is equivalent to the conditions

lim_{n→∞} P(A \ Aₙ) = lim_{n→∞} P(Aₙ \ A) = 0.   (15′)

Since

P(A) = P(AAₙ) + P(AĀₙ) and P(Aₙ) = P(AAₙ) + P(ĀAₙ), so that |P(A) − P(Aₙ)| ≤ P(Aₙ Δ A),

the assertion of the theorem means that P(A) = lim_{n→∞} P(Aₙ), and any event A ∈ ℱ_𝒜 can be represented as a limit of a sequence of events from the generating algebra 𝒜 (up to a set of zero probability).
Proof of the approximation theorem. If for an event A ∈ ℱ_𝒜 there is a sequence of events Aₙ ∈ 𝒜 satisfying condition (15), then we agree to call this event approximable. To prove the theorem it suffices to verify that the set of approximable events ℱ̃ = {A ∈ ℱ_𝒜: A is approximable} forms a σ-algebra and that 𝒜 ⊆ ℱ̃. The last inclusion is obvious. Besides, if A ∈ ℱ̃ and B ∈ ℱ̃, then Ā = Ω \ A ∈ ℱ̃ and A ∪ B ∈ ℱ̃, i.e. ℱ̃ is an algebra. Furthermore, if the sequence B_{n,N} approximates B_N as n = n(N) → ∞ and the event B_N approximates the event B as N → ∞ in the sense of (15), then B_{n,N} approximates the event B (in the sense of (15)) as N → ∞, n = n(N) → ∞. To see this it suffices to note, for example, that

0 ≤ P(B̄B_{n,N}) ≤ P(B̄B_N) + P(B̄_N B_{n,N}) → 0 (N → ∞, n = n(N) → ∞).

Let us now consider the event C = ⋃_{k=1}^∞ C_k, where C_k ∈ ℱ̃. Since the events D_n = ⋃_{k=1}^n C_k are approximable and P(D̄_n C) → 0 (n → ∞), it follows from the above that the event C is also approximable and, consequently, C = ⋃_{k=1}^∞ C_k ∈ ℱ̃, i.e. ℱ̃ is a σ-algebra. ∎
Proof of the fundamental theorem. If A₁ ∈ ℱ_{𝒜₁} and A₂ ∈ ℱ_{𝒜₂}, then by the approximation theorem there are sequences A_{1n} ∈ 𝒜₁ and A_{2n} ∈ 𝒜₂ such that

P(ĀᵢAᵢₙ + AᵢĀᵢₙ) → 0 (n → ∞), i = 1, 2.

With the notation B = A₁A₂, Bₙ = A_{1n}A_{2n} we obtain

P(B̄Bₙ + BB̄ₙ) ≤ P(B̄Bₙ) + P(BB̄ₙ) → 0, n → ∞,

P(A₁A₂) = lim_{n→∞} P(Bₙ) = lim_{n→∞} P(A_{1n})P(A_{2n}) = P(A₁)P(A₂). ∎
4.3. Total probability and Bayes formulas

In this item we first prove a simple but important formula called the total probability formula. This formula is the main tool for calculating the probabilities of complex events by means of conditional probabilities. At the end of the item, questions of re-estimating the probabilities of hypotheses with the use of additional information are considered, i.e. the so-called Bayes formulas are proved. Each subitem ends with a discussion of relevant examples.
4.3.1. Total probability formula

Theorem 8. Let (Ω, ℱ, P) be a probability space, A ∈ ℱ, and let the events H₁, H₂, ... ∈ ℱ be pairwise disjoint (HᵢHⱼ = ∅, i ≠ j), have positive probabilities P(Hᵢ) > 0 and satisfy the condition Σ_{i=1}^∞ Hᵢ = Ω. Then the probability of the event A can be found by the total probability formula

P(A) = Σ_{i=1}^∞ P(Hᵢ)P(A/Hᵢ).   (16)

Proof. It follows from the conditions of the theorem that A = Σ_{i=1}^∞ AHᵢ, and the events of the sequence AH₁, AH₂, ... are pairwise disjoint. Then, by the countable additivity axiom,

P(A) = Σ_{i=1}^∞ P(AHᵢ).

Further, by the multiplication formula, P(AHᵢ) = P(Hᵢ)P(A/Hᵢ). Substituting these probabilities into the previous formula, we obtain the desired formula (16). ∎

The total probability formula is usually applied in cases when (for one reason or another) it is easier to find the conditional probabilities P(A/Hᵢ) and the probabilities P(Hᵢ) than to calculate the probability P(A) directly.

Examples. 12. There are n₁ white and m₁ black balls in the first urn and n₂ white and m₂ black balls in the second. One ball is selected at random from the first urn and moved to the second. Then one ball is selected at random from the replenished second urn. Find the probability that this last ball is white (event A).

Solution. Denote by H₁ (H₂) the event that a white (black) ball was moved from the first urn to the second.
Then

P(H₁) = n₁/(n₁ + m₁), P(H₂) = m₁/(n₁ + m₁),

P(A/H₁) = (n₂ + 1)/(n₂ + m₂ + 1), P(A/H₂) = n₂/(n₂ + m₂ + 1).

Further, by the total probability formula (16) we have

P(A) = n₁(n₂ + 1)/[(n₁ + m₁)(n₂ + m₂ + 1)] + m₁n₂/[(n₁ + m₁)(n₂ + m₂ + 1)] = [n₁(n₂ + 1) + m₁n₂]/[(n₁ + m₁)(n₂ + m₂ + 1)].

Let us consider a special case. Let the compositions of the urns be the same: n₁ = n₂ = n, m₁ = m₂ = m. Then

P(A) = n/(n + m) = P(H₁).

From the above we can draw the following conclusion. Suppose there are N urns of the same composition: n white and m black balls in each. At the first step one ball is selected at random from the first urn and moved to the second urn; at the second step one ball is selected at random from the replenished second urn and moved to the third urn, and so on. At the last step one ball is selected at random from the urn No. N. Then the probability that this last ball is white is equal to n/(n + m).
13. The problem of choosing a strategy for taking an exam. Let an exam in a particular subject be conducted orally, and let the teacher (examiner) have prepared n examination tickets. The rules are such that a student passes the exam provided he draws one of his «lucky» tickets, i.e. the tickets he knows the answers to. Students come to the exam one at a time, select a ticket at random (without replacement) and answer its questions. If a student A was able to prepare only m tickets, m ≤ n, then which strategy should he choose to pass the exam with the greatest probability: to go first, second, somewhere in the middle, or at the very end?

Solution. The cases when A enters the exam first or second were already considered in Example 2 (see the remark at the end of the solution of that example), and it turned out that in both cases the probability of successfully passing the exam is the same: m/n.
We now consider the general case. Introduce the events: Aᵢ = {student A goes to the exam i-th in order and draws a «lucky» ticket}, i = 1,2,...,n; Hⱼ⁽ⁱ⁾ = {the i − 1 students who went to the exam before A drew j «lucky» tickets}, j = 0,1,...,i−1. Then for the probabilities P(Aᵢ), by the total probability formula, we can write

P(Aᵢ) = Σ_{j=0}^{i−1} P(Hⱼ⁽ⁱ⁾)P(Aᵢ/Hⱼ⁽ⁱ⁾), i = 3,...,n.   (17)

If of the first i − 1 students who entered, j drew tickets «lucky» for A, then A, entering i-th, draws a «lucky» ticket with the probability

P(Aᵢ/Hⱼ⁽ⁱ⁾) = (m − j)/(n − i + 1),

because in this case n − (i − 1) = n − i + 1 tickets remain, of which m − j are «lucky». Similarly, the probability that of the i − 1 students who entered the exam before A exactly j drew tickets «lucky» for A (the event Hⱼ⁽ⁱ⁾) is, according to the hypergeometric distribution, equal to

P(Hⱼ⁽ⁱ⁾) = Cₘʲ C_{n−m}^{i−1−j} / C_n^{i−1}.

Substituting these probabilities into formula (17) and using the identity (m − j)Cₘʲ = m C_{m−1}^{j}, we obtain

P(Aᵢ) = Σ_{j=0}^{i−1} [Cₘʲ C_{n−m}^{i−1−j} / C_n^{i−1}] · (m − j)/(n − i + 1) = [m / (C_n^{i−1}(n − i + 1))] Σ_{j=0}^{i−1} C_{m−1}^{j} C_{n−m}^{i−1−j}.   (18)

Taking into account the formula proved earlier (Ch. I, §2, p. 2.3, formula (7))

Σ_{l=0}^{k} Cₘˡ C_{n−m}^{k−l} = C_n^k,

we can write

P(Aᵢ) = m C_{n−1}^{i−1} / (C_n^{i−1}(n − i + 1)) = m(n − i + 1)/(n(n − i + 1)) = m/n.

Hence

P(A₁) = P(A₂) = ... = P(Aₙ) = m/n.
From the last relations we can draw the following conclusion: if A has no additional information in advance, then it does not matter whether he goes to the exam first, somewhere in the middle, or at the very end; the probability of successfully passing the exam is constant and equal to m/n, i.e. it depends only on the
level of his knowledge.

14. Gambler's ruin problem. Two gamblers, A and B, play some game until the complete ruin of one of them. Let the starting capitals of A and B be a and b currency units (CU), respectively. In each party A wins with probability p and B wins with probability q, where p + q = 1 (draws are not taken into account), and the winner of a party receives 1 (one) CU from the loser. Assuming the outcomes of the individual parties independent, find the probability of ruin of each of the players.

Solution. Denote by Aₙ the event that, having the capital n CU at the beginning of the game, the player A eventually loses the game (is ruined), and denote by pₙ the probability of this event: pₙ = P(Aₙ). We need to find the probability p_a (the probability of ruin of A). It is clear that p_{a+b} = 0, p₀ = 1. If the player A had the capital n CU at the beginning of some party, then two cases are possible: either he wins the next party (event H₁) or he loses it (event H₂). Then, by the total probability formula,

pₙ = P(Aₙ) = P(H₁)P(Aₙ/H₁) + P(H₂)P(Aₙ/H₂).   (18)

In our problem P(H₁) = p, P(H₂) = q = 1 − p, and, since after winning the next party A has n + 1 CU but still eventually loses the game, we have

P(Aₙ/H₁) = p_{n+1}, and similarly P(Aₙ/H₂) = p_{n−1}.

Substituting these probabilities into (18), we obtain

pₙ = p·p_{n+1} + q·p_{n−1}.

Taking into account the equality p + q = 1, the last relation can be written in the form

p(p_{n+1} − pₙ) = q(pₙ − p_{n−1}).   (19)
Now we need to solve this equation (19). First consider the case p = q = 1/2, i.e. the case when the skill of the players is the same. In this case equation (19) takes the form

p_{n+1} − pₙ = pₙ − p_{n−1},

and, setting n = 1, 2, ... successively, we find that

p_{n+1} − pₙ = pₙ − p_{n−1} = ... = p₂ − p₁ = p₁ − p₀ = c,

where c is some constant. It follows that

p₁ = p₀ + c, p₂ = p₀ + 2c, ..., pₙ = p₀ + nc.

Since p₀ = 1, this means pₙ = 1 + nc. Further, since p_{a+b} = 0,

0 = p_{a+b} = 1 + (a + b)c,

i.e. c = −1/(a + b); hence

pₙ = 1 − n/(a + b), p_a = 1 − a/(a + b) = b/(a + b).   (20)

By analogy, for the probability q_b of ruin of the second player we obtain the formula

q_b = a/(a + b).   (20′)
From formulas (20) and (20′) we can draw the following conclusion for equally skilled players (p = q = 0.5): the player who has the greater starting capital has the lower probability of ruin: a > b implies p_a < q_b, i.e. 1 − p_a > 1 − q_b. If, moreover, a ≫ b (b/a ~ 0), i.e. if the first player's starting capital is much larger than the second player's, then p_a ~ 0 (i.e. q_b ~ 1). In other words, A practically never loses (B is practically certain to be ruined).

We now consider the general case p ≠ q. Replacing n by k in formula (19) and multiplying the resulting equalities termwise from k = 1 to k = n, we obtain
pⁿ ∏_{k=1}^n (p_{k+1} − pₖ) = qⁿ ∏_{k=1}^n (pₖ − p_{k−1}).

It follows that

pⁿ (p_{n+1} − pₙ) ∏_{k=2}^n (pₖ − p_{k−1}) = qⁿ (p₁ − p₀) ∏_{k=2}^n (pₖ − p_{k−1}).

Cancelling the common factor and taking into account that p₀ = 1, we obtain the relation

p_{n+1} − pₙ = (q/p)ⁿ (p₁ − 1).

It follows that

pₙ = pₙ − p_{a+b} = Σ_{k=n}^{a+b−1} (pₖ − p_{k+1}) = (1 − p₁) Σ_{k=n}^{a+b−1} (q/p)ᵏ = (1 − p₁) · [(q/p)ⁿ − (q/p)^{a+b}] / (1 − q/p).   (21)

Calculating the value of both sides of (21) at n = 0, we see that

1 = p₀ = (1 − p₁) · [1 − (q/p)^{a+b}] / (1 − q/p).
Substituting the value found for 1 − p₁ into formula (21), we obtain the solution of equation (19):

pₙ = [(q/p)ⁿ − (q/p)^{a+b}] / [1 − (q/p)^{a+b}].   (19′)

Substituting n = a into the last formula, we find the probability p_a of ruin of the gambler A:

p_a = [(q/p)ᵃ − (q/p)^{a+b}] / [1 − (q/p)^{a+b}] = [1 − (p/q)ᵇ] / [1 − (p/q)^{a+b}] = (qᵃpᵇ − q^{a+b}) / (p^{a+b} − q^{a+b}).   (22)
Performing similar calculations for the probability of ruin of the player B, we obtain the formula

q_b = [1 − (q/p)ᵃ] / [1 − (q/p)^{a+b}].   (23)
From the formulas obtained we can draw the following conclusions. If the skill of the player A is weaker than the skill of the player B (p < q), but player A's initial capital is much larger than player B's (a ≫ b, i.e. b/a ~ 0) and (p/q)^{a+b} ~ 0, then

p_a ~ 1 − (p/q)ᵇ, q_b ~ (p/q)ᵇ.

From these relations we conclude that by adjusting the relationship between the parameters p, q, a, b one can make small the probability of ruin of a player who has the smaller initial capital but plays more skillfully, and vice versa.

15. In reliability theory, the reliability function, or simply the reliability, p(t) is the probability that a device functions correctly from the initial moment t = 0 to the current moment t. The limit of the ratio of the conditional probability of failure in the time interval (t, t + Δt) of a device that worked properly up to the moment t (we denote this probability by p(t + Δt / t)) to the length of this interval (as Δt → 0) is called the failure rate. Assuming the failure rate λ(t) known, find the reliability function p(t).

Solution. Introduce the events: A = {the device fails in the time interval (t, t + Δt)}, C = {the device is serviceable in the time interval (0, t + Δt)}, B = {the device is serviceable in the time interval (0, t)}.
Then C = B·Ā, P(C) = P(B)·P(Ā | B), and from the conditions of the problem

  p(t) = P(B),    p(t + Δt) = P(C),    λ(t) = lim_{Δt→0} P(A | B)/Δt,

so that

  P(Ā | B) = 1 − λ(t)Δt + o(Δt).

From the relations obtained it follows that

  p(t + Δt) = p(t)(1 − λ(t)Δt + o(Δt)),

  (p(t + Δt) − p(t))/Δt = −λ(t)p(t) + (o(Δt)/Δt)·p(t).

Passing to the limit as Δt → 0, we obtain the equation

  p′(t) = −λ(t)p(t),    p(0) = 1,

the solution of which is the function

  p(t) = exp{−∫₀ᵗ λ(s) ds}.

4.3.2. The Bayes formulas

Theorem 9. Let an event A and events H₁, H₂, … satisfy the conditions of Theorem 8. Then, if P(A) > 0, the following Bayes formulas take place:

  P(H_i | A) = P(H_i)P(A | H_i) / ∑_{j=1}^{∞} P(H_j)P(A | H_j),    i = 1, 2, … .        (24)
Proof. By the conditional probability formula

  P(H_i | A) = P(H_i A) / P(A).        (25)

Furthermore, by the probability multiplication formula, the numerator of (25) is equal to P(H_i A) = P(H_i)P(A | H_i), and (by the total probability formula) the probability P(A) in the denominator of (25) is equal to the denominator of (24). ∎

The general scheme of application of the Bayes formulas to the solution of practical problems is as follows. Suppose an event A can occur under different conditions, about the nature of which the assumptions (hypotheses) H₁, H₂, … can be made, and suppose that for whatever reasons we know the probabilities P(H_i) of these hypotheses. Suppose it is also known that the hypothesis H_i assigns the probability P(A | H_i) to the event A. If an experiment in which the event A has occurred is performed, this should cause a reassessment of the probabilities of the hypotheses H_i; the Bayes formulas solve this problem quantitatively. Usually the probabilities of the hypotheses P(H_i) are called a priori (determined before the experiment) probabilities, and the probabilities P(H_i | A) a posteriori (determined after the experiment) probabilities.

Examples

16. At a factory, the machines A, B, C produce 25%, 35%, 40% of all products (respectively). The defect rates of their products are 5%, 4% and 2% (respectively). All products manufactured by the factory are collected in a common warehouse.
a) What is the probability that a product taken at random from the warehouse is defective (event D)?
b) What is the probability that a selected product, which turned out to be defective, was produced by machine A (B, C)?

Solution. Let us denote by A, B, C the events that the selected product was produced by the machines A, B, C (respectively).
a) By the total probability formula,

  P(D) = P(A)P(D | A) + P(B)P(D | B) + P(C)P(D | C) = 0.25·0.05 + 0.35·0.04 + 0.40·0.02 = 0.0345.

b) By the Bayes formulas the required probabilities are equal to

  P(A | D) = P(A)P(D | A)/P(D) = 0.0125/0.0345 = 25/69,
  P(B | D) = P(B)P(D | B)/P(D) = 0.0140/0.0345 = 28/69,
  P(C | D) = P(C)P(D | C)/P(D) = 0.0080/0.0345 = 16/69.
17. Suppose there are two coins in an urn: H₁, a symmetric coin with probability of Tails equal to 1/2, and H₂, an asymmetric coin with probability of Tails equal to 1/3. One of the coins is taken out at random and tossed. Suppose that Tails occurred (event A). The question is: what is the probability that the selected coin is the symmetric (asymmetric) one?

Solution. By the conditions of the problem

  P(H₁) = P(H₂) = 1/2,    P(A | H₁) = 1/2,    P(A | H₂) = 1/3,

and by the total probability formula

  P(A) = (1/2)·(1/2) + (1/2)·(1/3) = 5/12.

A reassessment of the probabilities of the hypotheses H₁ and H₂ is obtained from the Bayes formulas:

  P(H₁ | A) = P(H₁)P(A | H₁)/P(A) = (1/4)/(5/12) = 3/5,    P(H₂ | A) = (1/6)/(5/12) = 2/5.
4.4. Tasks for independent work

1. Three numbers are selected from the set {1, 2, …, n} in the random selection scheme without replacement. Find the conditional probability that the third number lies between the first and the second, given that the first number is less than the second.

2. Three dice are tossed. If it is known that different numbers of points fell on the dice, what is the probability that a «six» fell on one of them?

3. It is known that when five dice were thrown, a «1» fell on at least one of them. What is the probability that in this case a «1» fell at least twice?

4. Three dice are tossed. Find the probability that six points fell on all dice, if it is known that: a) six points fell on one die; b) six points fell on the 1st die; c) six points fell on two dice; d) the same number of points fell on at least two dice; e) the same number of points fell on all dice; f) six points fell on at least one die.

5. Four balls are randomly placed in four boxes. If it is known that the first two balls were placed into different boxes, what is the probability that some box contains exactly three balls?

6. It is known that with the random placement of seven balls into seven boxes, exactly two boxes remained empty. Prove that then the probability that some box contains three balls is 1/4.

7. From an urn containing m white and n − m black balls, r balls are extracted according to the random selection scheme without replacement. Let A₁⁽ⁱ⁾ (A₀⁽ⁱ⁾) denote the event that the i-th extracted ball is white (black). Find the conditional probabilities

  P(A₁⁽ˢ⁺¹⁾ | A_{q₁}⁽¹⁾ A_{q₂}⁽²⁾ … A_{q_s}⁽ˢ⁾),    q_i = 0 or q_i = 1.

How will these probabilities change if the balls are extracted by random selection with replacement?

8. Let the sample space be the union of the set consisting of all r! permutations of the elements a₁, a₂, …, a_r and the set of sequences ω_j = (a_j, a_j, …, a_j), j = 1, 2, …, r. We assume that each permutation has probability 1/(r²(r − 2)!) and each sequence ω_j has probability 1/r². Let the event A_k mean that the element a_k is in its own (k-th) place (k = 1, …, r). Show that in this case the events A₁, A₂, …, A_r are pairwise independent, but for distinct indices i, j, k the events A_i, A_j, A_k are not independent.

9. There are 3 white, 5 black and 2 red balls in an urn. Two players alternately extract one ball each, without replacement, from this urn. The winner is the one who first takes out a white ball. If a red ball appears, a draw is declared. Consider the following events: A₁ = {the player who starts the game wins}, A₂ = {the second participant wins}, B = {the game ends in a draw}. Find the probabilities of the events A₁, A₂, B.

10. A die is tossed twice. Let ξ₁ and ξ₂ be the numbers of points that occurred on the 1st and the 2nd toss (respectively). Find the probabilities of the following events: A₁ = {ξ₁ is divisible by 2, ξ₂ is divisible by 3}; A₂ = {ξ₁ is divisible by 3, ξ₂ is divisible by 2}; A₃ = {ξ₁ is divisible by ξ₂}; A₄ = {ξ₂ is divisible by ξ₁}; A₅ = {ξ₁ + ξ₂ is divisible by 2}; A₆ = {ξ₁ + ξ₂ is divisible by 3}. Find all pairwise independent events A_i, A_j (i, j distinct).

11. Does the following equality hold for arbitrary events A and B (P(B) > 0): a) P(A | B) + P(Ā | B) = 1; b) P(A | B) + P(A | B̄) = 1?

12. Show that if an event A is independent of itself, then P(A) = 0 or P(A) = 1.

13. Show that if the events A and B are independent and P(A ∪ B) = 1, then P(A) = 1 or P(B) = 1.

14. Let A and B be independent events. Prove that if the events A ∪ B and A ∩ B are independent, then P(A) = 1, or P(B) = 1, or P(A) = 0, or P(B) = 0.

15. Let A₁, A₂, … be a sequence of independent events. Then, for the convergence of the series ∑_{k=1}^{∞} (1 − P(A_k)) it is necessary and sufficient that

  P(∩_{k=1}^{∞} A_k) > 0.

Is it possible to replace the condition of independence of the events by the condition of pairwise independence of the events in this statement?

16. There are 3 white and 2 black balls in the 1st urn and 2 white and 3 black balls in the 2nd urn. Two balls are randomly selected from the 1st urn and transferred to the 2nd urn. After this, one ball is randomly selected from the 2nd urn. Find the probability that it is white.

17. There are 2 groups of urns: 2 urns with 3 white and 4 black balls in each, and 3 urns with 4 white and 3 black balls in each. One of the urns is randomly selected and one ball is taken out of it at random. It turned out to be black. What is the probability that this ball was selected from an urn of the 1st group?

18. There are 2 white and 1 black ball in the 1st urn and 3 white and 2 black in the 2nd. From each urn one ball is extracted at random, and these two balls are transferred to the 3rd, empty urn. After that, one ball is removed from the last urn. What is the probability that it is white?

19. There is one ball in an urn, and it is known only that it is either white or black (with equal probabilities). One white ball is added to this urn, then one ball is removed from the replenished urn at random. If the last (extracted) ball turned out to be white, what is the probability that there was originally a white ball in the urn?

20. There were m ≥ 3 white and n black balls in an urn. Then one ball of unknown color was lost. To determine the color of the lost ball, two balls are extracted from the urn at random. If it is known that these two balls turned out to be white, what is the probability that the lost ball is also white? How will the answer change if a + b balls were randomly selected from the urn and it turned out that there are a white and b black balls among them?

21. There are three balls in an urn, each of which can be white or black. Suppose that all four assumptions (hypotheses) about the composition of balls in the urn are equally probable. Four balls are extracted from the urn under the random selection scheme with replacement. Find the probability that the balls appeared in the following sequence: black, white, white, white. Find the probabilities of the hypotheses about the initial composition of balls in the urn under the assumption that the above-described event A = {black, white, white, white} occurred.

22. Find the probability that in 2n independent Bernoulli trials with success probability p there will be m (m ≥ n) successes and all trials with even numbers will end with success.

23. Let some insect lay k eggs with probability e^{−λ} λ^k / k! (k = 0, 1, 2, …; λ > 0), and let the probability that an insect develops from an egg be p. Assuming mutual independence of the development of the eggs, find the probability that the insect will have exactly m descendants.

24. In the Bernoulli scheme, p is the probability of the outcome «1» (success) and q = 1 − p is the probability of the outcome «0» (failure). Find the probability that the chain «00» will appear before the chain «01».

25. Continuation. Find the probability that the chain «00» will appear before the chain «10».

26. Continuation. Find the probability that the chain «00» will appear before the chain «111».

27. Let the players A and B play a game consisting of separate parties under the following conditions: the player who wins a party receives 1 (one) point. Player A wins each party with probability α, and player B with probability β, where α > β, α + β = 1. The player who first gets two points ahead of the opponent wins the game. a) Find the probability P(A) that the game will be won by player A. What is P(B)? b) What is more profitable for A: to play one party, or to play to the end in order to win?

28. Continuation. Solve the previous task if the condition of the game is changed as follows: whoever wins two consecutive parties wins the whole game.

29. The Banach problem. Suppose there are m and n matches in the matchboxes a and b (respectively). Someone repeatedly chooses matchbox a or b at random with probabilities P(a) = p and P(b) = q = 1 − p (0 < p < 1) (respectively) and uses one match from the selected matchbox (for example, lights a cigarette). Find the probability that, when an empty matchbox is taken for the first time, the second box contains r matches.

30. Continuation. Suppose m = n, p = q = 1/2. Then what is the probability that the emptiness of the selected matchbox was detected when this (empty) matchbox was selected for the first time?

31. One of the sequences of letters AAAA, BBBB, CCCC is transmitted via a communication channel with probabilities p₁, p₂, p₃ (respectively), p₁ + p₂ + p₃ = 1. Each transmitted letter is received correctly with probability α and is received as each of the other two letters with probability (1 − α)/2. It is assumed that the letters are distorted independently of each other. Find the probability that AAAA was transmitted, given that ABCA was received.
Chapter ІІІ

RANDOM VARIABLES

§1. Random variables and their distributions

Definition 1. Let (Ω, 𝓕, P) be a general probability space and let ξ be a numerical function defined on this space, ξ: Ω → R. If for any Borel set B ∈ β(R) the condition

  ξ⁻¹(B) = {ω : ξ(ω) ∈ B} ∈ 𝓕        (1)

holds, then the function ξ is called a random variable.

We note that a function ξ satisfying condition (1) is called an 𝓕-measurable function (or simply a measurable function). If Ω = Rⁿ, 𝓕 = β(Rⁿ), then a measurable function is called an n-dimensional Borel function (if n = 1, simply a Borel function).

Remark 1. The requirement of measurability (1) is very important: if a probability measure P is given on (Ω, 𝓕), then it makes sense to talk about the probability of the event ξ⁻¹(B) = {ω : ξ(ω) ∈ B} that the values of the random variable belong to a Borel set B (say, to an interval or to a set B formed from intervals).

We now define a set function P_ξ on the space (R, β(R)) by the condition

  P_ξ(B) = P({ω : ξ(ω) ∈ B}) = P(ξ⁻¹(B)),    B ∈ β(R).        (2)

It is easy to see that the set function P_ξ defined by condition (2) is a probability. Indeed, for any B ∈ β(R), P_ξ(B) = P(ξ⁻¹(B)) ≥ 0 (axiom P1), and P_ξ(R) = P(Ω) = 1 (axiom P2).
For any B₁, B₂, … ∈ β(R) with B_i ∩ B_j = ∅ (i ≠ j),

  P_ξ(∑_{i=1}^{∞} B_i) = P(ξ⁻¹(∑_{i=1}^{∞} B_i)) = P(∑_{i=1}^{∞} ξ⁻¹(B_i)) = ∑_{i=1}^{∞} P(ξ⁻¹(B_i)) = ∑_{i=1}^{∞} P_ξ(B_i),

so axiom P3 is also fulfilled. (Above we used the following properties of the operation of taking the preimage of a set:

  ξ⁻¹(∪_α C_α) = ∪_α ξ⁻¹(C_α);    if C_i ∩ C_j = ∅ (i ≠ j), then ξ⁻¹(C_i) ∩ ξ⁻¹(C_j) = ∅.)
Definition 2. The probability (measure) P_ξ determined by condition (2) is called the distribution or the distribution law of the random variable ξ.

So, if a random variable ξ is defined on a probability space (Ω, 𝓕, P), then this random variable generates a new probability space (R, β(R), P_ξ). This new probability space (R, β(R), P_ξ) is called the probability space generated by the random variable ξ.

Definition 3. The function F_ξ(x) defined by the relation

  F_ξ(x) = P({ω : ξ(ω) ≤ x}) = P_ξ((−∞, x]),    x ∈ R,        (3)

is called the distribution function of the random variable ξ.

We will show that F_ξ(x) is a distribution function in the sense of the definition of a distribution function from the theory of functions, that is, it satisfies the properties F1, F2, F3 from Chapter II, §3, item 3.1. Indeed, if x₁ < x₂, then (−∞, x₁] ⊂ (−∞, x₂] and

  F_ξ(x₁) = P_ξ((−∞, x₁]) ≤ P_ξ((−∞, x₂]) = F_ξ(x₂)    (property F1).

Further, F_ξ(x) is a monotonically non-decreasing function and, by the continuity axioms from above and from below (axioms P3′, P3″): if x_n ↓ x, then

  F_ξ(x + 0) = lim_{x_n↓x} F_ξ(x_n) = lim_{x_n↓x} P_ξ((−∞, x_n]) = P_ξ((−∞, x]) = F_ξ(x)    (property F3);

if x_n ↓ −∞ or x_n ↑ +∞, then, respectively,

  lim_{x_n↓−∞} F_ξ(x_n) = F_ξ(−∞) = P_ξ(∅) = 0,    lim_{x_n↑+∞} F_ξ(x_n) = F_ξ(+∞) = P_ξ(R) = 1    (property F2).

It turns out that the converse assertion also holds.

Theorem 1. Let F(x) be a distribution function, i.e. let it satisfy the properties F1, F2, F3. Then there exist a probability space (Ω, 𝓕, P) and a random variable ξ defined on this probability space such that F(x) is the distribution function of this random variable:

  F_ξ(x) = F(x),    x ∈ R.

Proof. First we construct the probability space (Ω, 𝓕, P). As the space of elementary events Ω we take the real axis, and for the σ-algebra 𝓕 we take the Borel σ-algebra: Ω = R, 𝓕 = β(R). Further, repeating word for word the proof of Theorem 2 of item 3.1, §3 of Ch. ІІ, we construct the required probability measure P₀. Finally, as the random variable ξ we take the identity map of the real numbers to itself: ξ: R → R, ξ(x) = x. Then

  F_ξ(x) = P₀({ξ ≤ x}) = P₀((−∞, x]) = F(x).    ∎

Comment. In some textbooks (for example, [11], [25]) the distribution function is defined as the probability that the random variable falls into the interval (−∞, x), open on the right, i.e.

  F_ξ(x) = P({ω : ξ(ω) < x}) = P_ξ((−∞, x)),    x ∈ R.        (3′)

In this case (that is, if F_ξ(x) is defined by formula (3′)), the distribution function is continuous on the left: F_ξ(x − 0) = F_ξ(x).

Like any distribution function, the distribution function F_ξ(x) of a random variable ξ has at most a countable number of points of discontinuity (Chapter II, §3, item 3.2.1, Theorem 3):

  C_{F_ξ} = {x : F_ξ(x) − F_ξ(x − 0) > 0} = {x₁, x₂, …}.

Using the definition (3), we now show that for any interval the probability that the random variable ξ falls into this interval can be calculated in terms of the distribution function F_ξ(x) of this random variable.
Let us start with x₁ < x₂. Then, since

  {ω : ξ(ω) ≤ x₁} + {ω : x₁ < ξ(ω) ≤ x₂} = {ω : ξ(ω) ≤ x₂},

by the axiom (property) P3,

  P{ω : ξ ≤ x₁} + P{ω : x₁ < ξ ≤ x₂} = P{ω : ξ ≤ x₂}.

From the last relation (here and below, instead of {ω : ξ(ω) ∈ B} we write {ω : ξ ∈ B} or {ξ ∈ B}) we obtain the formula

  P{x₁ < ξ ≤ x₂} = P{ξ ≤ x₂} − P{ξ ≤ x₁} = F_ξ(x₂) − F_ξ(x₁).        (4)

Similarly, using the decomposition

  {ξ < x} = {ξ ≤ x − 1} + ∑_{n=2}^{∞} {x − 1/(n−1) < ξ ≤ x − 1/n},

formula (4) and the axiom of countable additivity (axiom P3), we can write for the probability of the event {ξ < x}:

  P{ξ < x} = F_ξ(x − 1) + ∑_{n=2}^{∞} P{x − 1/(n−1) < ξ ≤ x − 1/n} = F_ξ(x − 1) + lim_{N→∞} ∑_{n=2}^{N} [F_ξ(x − 1/n) − F_ξ(x − 1/(n−1))] = lim_{N→∞} F_ξ(x − 1/N) = F_ξ(x − 0),

so

  P{ξ < x} = F_ξ(x − 0).        (5)

Further, using the definition (3) and the properties (4)–(5), one can easily verify the validity of the following formulas:

  P{ξ = x} = F_ξ(x) − F_ξ(x − 0),
  P{x₁ ≤ ξ ≤ x₂} = F_ξ(x₂) − F_ξ(x₁ − 0),        (6)
  P{x₁ < ξ < x₂} = F_ξ(x₂ − 0) − F_ξ(x₁),
  P{x₁ ≤ ξ < x₂} = F_ξ(x₂ − 0) − F_ξ(x₁ − 0).
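Formulas (4)–(6) reduce every interval probability to values F_ξ(x) and left limits F_ξ(x − 0). As an illustration (the distribution here is a made-up example, not one from the text), one can check them for a simple discrete random variable:

```python
from fractions import Fraction

# A made-up discrete distribution: P{xi = -1} = 1/4, P{xi = 0} = 1/2, P{xi = 2} = 1/4
pmf = {-1: Fraction(1, 4), 0: Fraction(1, 2), 2: Fraction(1, 4)}

def F(x):
    """Distribution function F(x) = P{xi <= x}, definition (3)."""
    return sum(p for v, p in pmf.items() if v <= x)

def F_left(x):
    """Left limit F(x - 0) = P{xi < x}, formula (5)."""
    return sum(p for v, p in pmf.items() if v < x)

# Formula (4): P{x1 < xi <= x2} = F(x2) - F(x1)
assert F(2) - F(-1) == Fraction(3, 4)
# First formula of (6): P{xi = x} = F(x) - F(x - 0)
assert F(0) - F_left(0) == Fraction(1, 2)
# Last formula of (6): P{x1 <= xi < x2} = F(x2 - 0) - F(x1 - 0)
assert F_left(2) - F_left(-1) == Fraction(3, 4)
```

For a discrete distribution the left limit is simply the sum over the strictly smaller values, which makes the role of F_ξ(x − 0) in (5)–(6) concrete.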
We now note that, according to Lebesgue's theorem (Chapter II, §3, item 3.1, Theorem 4), the distribution function of any random variable, like any distribution function, can be uniquely represented as the sum of three «pure» distribution functions: discrete, absolutely continuous and singular. And, in accordance with Theorem 1 proved above and the definition of a random variable, we find that there are only three types of random variables (corresponding to the three kinds of «pure» distribution functions): discrete, absolutely continuous, and singular random variables. We now discuss in more detail the «pure» types of random variables and their examples.

1.1. Discrete random variables

Let ξ be a random variable whose distribution function F_ξ(x) is a discrete distribution function (see Chapter II, §3, item 3.1). Then the set of points of discontinuity of this function is C_{F_ξ} = {x₁, x₂, …}, and for the points of discontinuity x_k,

  p_k = P_ξ({x_k}) = P{ξ = x_k} = F_ξ(x_k) − F_ξ(x_k − 0),

that is, the distribution law P_ξ of the random variable ξ is concentrated on the set C_{F_ξ}, and ∑_k p_k = 1. It is clear that for such a random variable the probability of its falling into a Borel set B ∈ β(R) can be found from

  P_ξ(B) = P{ξ ∈ B} = ∑_{i: x_i∈B} P{ξ = x_i} = ∑_{i: x_i∈B} P_ξ({x_i}) = ∑_{i: x_i∈B} p_i.        (7)

From (7), as a special case, we find that the distribution function of the random variable ξ is determined by the formula

  F_ξ(x) = P{ξ ≤ x} = ∑_{i: x_i≤x} P{ξ = x_i} = ∑_{i: x_i≤x} P_ξ({x_i}) = ∑_{i: x_i≤x} [F_ξ(x_i) − F_ξ(x_i − 0)].

If for a random variable ξ its distribution law P_ξ is determined by the relation (7), then such a random variable is called a discrete random variable. In other words, a discrete random variable is a random variable whose set of values is a finite or countable set: ξ: Ω → X = {x₁, x₂, …} ⊂ R. A random variable taking a finite number of values is called a simple random variable.

We note that any numerical function defined on a discrete probability space is a discrete random variable, because for discrete probability spaces Ω = {ω₁, ω₂, …}, 𝓕 = {A : A ⊆ Ω}, ξ: Ω → X = {ξ(ω₁), ξ(ω₂), …}, and for any B ∈ β(R),

  ξ⁻¹(B) = {ω : ξ(ω) ∈ B} ∈ 𝓕,    ∑_i P_ξ({x_i}) = 1.
Examples of discrete random variables

1. The indicator of an event. For an event A ∈ 𝓕 we define the indicator of the event A by the relation

  I_A(ω) = 1 if ω ∈ A,    I_A(ω) = 0 if ω ∉ A.        (8)

Then for any Borel set B ∈ β(R),

  I_A⁻¹(B) = {ω : I_A(ω) ∈ B} =
    ∅,  if 0 ∉ B, 1 ∉ B,
    A,  if 0 ∉ B, 1 ∈ B,
    Ā,  if 0 ∈ B, 1 ∉ B,
    Ω,  if 0 ∈ B, 1 ∈ B.

Therefore, for any event A ∈ 𝓕 we have I_A⁻¹(B) ∈ 𝓕, so the indicator I_A(ω) of the event A is a random variable. Obviously, for the indicator

  P{ω : I_A(ω) = 1} = P(A),    P{ω : I_A(ω) = 0} = P(Ā) = 1 − P(A).

It is clear that if A ∉ 𝓕, then I_A(ω) is not a random variable.

2. A Bernoulli random variable is a random variable that takes only two values, zero and one, with probabilities

  P{ξ = 1} = p,    P{ξ = 0} = 1 − p,    0 < p < 1.

3. Binomial random variable. A binomial random variable with parameters n and p is a random variable that takes the values 0, 1, 2, …, n (n + 1 values in all) with probabilities

  P{ξ = k} = P_n(k) = C_n^k p^k (1 − p)^{n−k},    k = 0, 1, …, n,

where 0 < p < 1 and n is a positive integer. A binomial random variable ξ with parameters (n, p) is written symbolically in the form ξ ~ Bi(n; p). For example, the random variable defined as the number of successes in n independent Bernoulli trials with probability of success p is a binomial random variable with parameters (n, p).
4. A geometric random variable ξ with a parameter p is a random variable that takes positive integer values with probabilities

  P{ξ = k} = (1 − p)^{k−1} p,    k = 1, 2, …,    0 < p < 1.

Example. Suppose that a sequence of independent trials is carried out, and in each trial an event of interest may occur with probability p. Then the random variable ξ equal to the number of the trial in which this event occurs for the first time is a geometric random variable with the parameter p.

5. Hypergeometric random variable. Suppose that k balls are extracted from an urn containing n white and m black balls. Denote by ξ the number of white balls among the selected balls. Then the distribution of the random variable ξ is the hypergeometric distribution (Chapter I, §2, item 2.3, formula (6)):

  P{ξ = s} = P_{n,m}(k; s) = C_n^s C_m^{k−s} / C_{n+m}^k,    s = 0, 1, 2, …, min(n, k).        (8)

A random variable whose probability distribution (distribution law) is defined by the set of probabilities (8) is called a hypergeometric random variable.

6. Poisson random variable. If the distribution law of a nonnegative integer-valued random variable is defined by the formulas

  P{ξ = k} = e^{−λ} λ^k / k! = π_k(λ),    k = 0, 1, 2, …,        (9)

where λ > 0, then such a random variable is called a Poisson random variable with the parameter λ, and symbolically it is written as ξ ~ Π(λ). It is obvious that

  π_k(λ) > 0    and    ∑_{k=0}^{∞} π_k(λ) = 1.
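As a quick sanity check of (9) (the function name below is ours), the Poisson probabilities are positive and sum to 1; numerically the series can be truncated, since its tail decays faster than geometrically:

```python
import math

def poisson_pmf(k, lam):
    """pi_k(lambda) = e^{-lambda} * lambda^k / k!, formula (9)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 3.0
assert all(poisson_pmf(k, lam) > 0 for k in range(60))
# The full series sums to 1; for lambda = 3 the tail beyond k = 60 is negligible
assert abs(sum(poisson_pmf(k, lam) for k in range(60)) - 1.0) < 1e-12
```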
Sometimes it will be convenient for us to specify discrete random variables in the form of a table (in Table 1, x_i ≠ x_j for i ≠ j, and ∑_i p_i = 1):

Table 1
  Values of ξ:                      x₁   x₂   …   x_n   …
  The corresponding probabilities:  p₁   p₂   …   p_n   …

As an example, we construct an approximate graph of the distribution function of a random variable ξ taking the values x₁, x₂, x₃, where x₃ < 0, x₁ > 0, x₂ > x₁, with the corresponding probabilities p₁, p₂, p₃, p₁ + p₂ + p₃ = 1 (see Figure 1).

[Fig. 1. The graph of F_ξ(x): a step function equal to 0 for x < x₃, to p₃ on [x₃, x₁), to p₃ + p₁ on [x₁, x₂), and to p₁ + p₂ + p₃ = 1 for x ≥ x₂.]
1.2. Absolutely continuous random variables

If for a random variable ξ there exists a nonnegative function f_ξ(x) ≥ 0 such that its distribution function can be written as the integral

  F_ξ(x) = ∫_{−∞}^{x} f_ξ(y) dy,    x ∈ R,        (10)

then such a random variable is called an absolutely continuous random variable, and the function f_ξ(x) is called the distribution density of the random variable ξ. The term «absolutely continuous random variable» is based on the fact that a function representable in the form (10) is an absolutely continuous function (Chapter II, §3, item 3.1).

Remark 3. In this text the integral in (10) will everywhere be understood as an (improper) Riemann integral; in the general case this integral has to be understood as a Lebesgue integral (see Chapter IV).

It follows from the definition that the distribution density of a random variable satisfies the condition

  ∫_{−∞}^{+∞} f_ξ(x) dx = 1,        (11)

because lim_{x↑+∞} F_ξ(x) = F_ξ(+∞) = 1.

If the point x is a point of continuity of f_ξ(x), then for sufficiently small Δx we have

  P{x < ξ ≤ x + Δx} = f_ξ(x)Δx + o(Δx),    Δx → 0.

It also follows from the definition (10) that at points of continuity of f_ξ(x),

  F′_ξ(x) = f_ξ(x),

and for any a < b

  P{a < ξ ≤ b} = P_ξ((a, b]) = F_ξ(b) − F_ξ(a) = ∫_a^b f_ξ(x) dx.        (12)

From the last formula we obtain that for an absolutely continuous random variable ξ the probability that it takes some fixed value is always zero: P{ξ = a} = 0, a ∈ R. From this it follows that for any interval I (I = (a, b), I = [a, b], etc.)

  P{ξ ∈ I} = P_ξ(I) = ∫_I f_ξ(x) dx.        (13)

In general, in the case of an absolutely continuous random variable ξ, for the probability of its falling into any Borel set B ∈ β(R) we have

  P{ξ ∈ B} = ∫_B f_ξ(x) dx.        (13′)

Formula (13′), in which the integral is understood in the general case as a Lebesgue integral, is a consequence of formula (12), of the fact that the smallest sigma-algebra containing the intervals J = {I : I = (a, b]} coincides with the sigma-algebra of Borel sets (σ(J) = β(R)), and of the theorem on the extension of a probability (the Carathéodory theorem) from an algebra to the smallest sigma-algebra containing it.

Examples of absolutely continuous random variables

1. A random variable uniformly distributed on [a, b]. Let a random point be thrown on the segment [a, b]. We construct the probability space (Ω, 𝓕, P) as follows: Ω = [a, b], 𝓕 = {B ⊆ [a, b] : B ∈ β(R)} is the sigma-algebra of Borel subsets of [a, b], and we take the coordinate of the random point as the random variable: ξ(ω) = ω, ω ∈ [a, b]. This function is a measurable function, i.e. a random variable.

In this way a certain random variable is defined. If x < a, then {ξ ≤ x} = ∅ (an impossible event) and F_ξ(x) = P{ξ ≤ x} = 0. If x ∈ [a, b], then the event {ξ ≤ x} means that the point falls on the segment [a, x], so

  F_ξ(x) = P{ξ ≤ x} = (x − a)/(b − a),    a ≤ x ≤ b.

If x > b, then the event {ξ ≤ x} is a certain event and P{ξ ≤ x} = 1. In this way,

  F_ξ(x) = 0 for x ≤ a;    F_ξ(x) = (x − a)/(b − a) for a ≤ x ≤ b;    F_ξ(x) = 1 for x ≥ b.

The corresponding distribution density is

  f_ξ(x) = 1/(b − a) for a ≤ x ≤ b;    f_ξ(x) = 0 for x ∉ [a, b].

The random variable ξ defined above, i.e. a random variable ξ with this distribution density, is called a random variable uniformly distributed on the segment [a, b], and in what follows it will be written symbolically as ξ ~ R(a, b) or ξ ~ U(a, b).
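The uniform distribution function can be recovered from the density via the integral (10). A small sketch (the helper names are ours) approximates that integral by a midpoint Riemann sum and compares it with the closed form (x − a)/(b − a):

```python
def uniform_density(x, a, b):
    """f(x) = 1/(b - a) on [a, b] and 0 outside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def cdf_by_integration(x, a, b, n=10_000):
    """Midpoint Riemann sum for F(x), the integral (10) of the density."""
    lo = a - 1.0  # the density vanishes below a, so starting here loses nothing
    h = (x - lo) / n
    return sum(uniform_density(lo + (i + 0.5) * h, a, b) * h for i in range(n))

a, b = 2.0, 5.0
exact = (3.5 - a) / (b - a)  # F(3.5) = 0.5
assert abs(cdf_by_integration(3.5, a, b) - exact) < 1e-3
```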
2. An exponential random variable with a parameter λ (λ > 0) is defined as a random variable with the distribution function

  F_ξ(x) = 1 − e^{−λx} for x ≥ 0;    F_ξ(x) = 0 for x < 0.

That this function satisfies the properties F1, F2, F3 is obvious. The distribution density of an exponential random variable is

  f_ξ(x) = λe^{−λx} for x ≥ 0;    f_ξ(x) = 0 for x < 0.

3. A normal (Gaussian) random variable with parameters (a, σ²) is defined as a random variable with the distribution function

  F_ξ(x) = (1/(σ√(2π))) ∫_{−∞}^{x} e^{−(y−a)²/(2σ²)} dy,        (14)

where the parameters a ∈ R, 0 < σ < ∞. Let us show that the function defined by (14) is indeed a distribution function. The property F3 is a consequence of the continuity of the function defined by formula (14). The properties F1, F2 follow from the nonnegativity of the integrand in the integral (14) and from the fact that (substituting y = (x − a)/σ, so that y → ±∞ as x → ±∞)

  (1/(σ√(2π))) ∫_{−∞}^{+∞} e^{−(x−a)²/(2σ²)} dx = (1/√(2π)) ∫_{−∞}^{+∞} e^{−y²/2} dy = 1

(the last integral is the Poisson integral known from the analysis course).

So, the normal random variable is determined by two parameters a and σ²: a ∈ R, 0 < σ < ∞. If a = 0, σ = 1, then such a random variable is called the standard normal (Gaussian) random variable. In what follows, we will write the normal random variable with parameters (a, σ²) symbolically in the form ξ ~ N(a, σ²). The density of the distribution of a random variable ξ ~ N(a, σ²) is the function

  f_ξ(x) = (1/(σ√(2π))) e^{−(x−a)²/(2σ²)}.        (14′)

The function f_ξ(x) takes its maximum value at the point x = a; the points x = a ± σ are inflection points; as x → ±∞ the abscissa axis is an asymptote of this function. The graph of the function is symmetric with respect to the vertical line x = a. We also note that as σ decreases, the maximum of the function increases and the graph is compressed toward the vertical line x = a. Whence, for example, the probability

  p_α = ∫_{a−α}^{a+α} f_ξ(x) dx

of the random variable falling into the interval (a − α, a + α), α > 0, is the larger, the smaller σ is. Hence, the parameter σ can be considered as a characteristic of the scatter of the values of the random variable.

Traditionally, the distribution density of the standard normal random variable is denoted by φ_{0,1}(x) or φ(x):

  φ_{0,1}(x) = φ(x) = (1/√(2π)) e^{−x²/2}.        (14″)

[Fig. 2. The graph of the function φ(x).]

The distribution density of a random variable ξ ~ N(a, σ²) is usually denoted by φ_{a,σ²}(x):

  f_ξ(x) = φ_{a,σ²}(x) = (1/σ) φ((x − a)/σ).        (14‴)

The distribution function of a random variable ξ ~ N(0, 1) is denoted by Φ_{0,1}(x) or Φ(x), and usually this function is called the Laplace function or the error integral. Here we draw attention to the fact that in some textbooks the Laplace function is the function

  Φ₀(x) = ∫₀ˣ φ(t) dt = (1/√(2π)) ∫₀ˣ e^{−t²/2} dt.

We also note that the functions Φ(x) and Φ₀(x) are connected by the relation

  Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−t²/2} dt = 1/2 + Φ₀(x) = 1/2 + (1/√(2π)) ∫₀ˣ e^{−t²/2} dt.
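The relation Φ(x) = 1/2 + Φ₀(x) is convenient to check numerically. The standard library's error function gives Φ(x) = (1 + erf(x/√2))/2, a known identity; the sketch below (names ours) uses it:

```python
import math

def Phi(x):
    """Laplace function Phi(x) = P{N(0,1) <= x}, via the error function:
    Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def Phi0(x):
    """Phi_0(x) = (1/sqrt(2 pi)) * integral from 0 to x of e^{-t^2/2} dt."""
    return Phi(x) - 0.5  # the relation stated in the text

assert abs(Phi(0.0) - 0.5) < 1e-12            # Phi(0) = 1/2, so Phi_0(0) = 0
assert abs(Phi(1.96) - 0.975) < 1e-3          # the familiar normal quantile
assert abs(Phi0(-1.0) + Phi0(1.0)) < 1e-12    # Phi_0 is an odd function
```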
4. The random variable distributed according to Cauchy's law with the parameter θ, θ > 0, is defined as a random variable with the distribution function

  F_ξ(x) = (1/π) ∫_{−∞}^{x} θ/(θ² + y²) dy.        (15)

It is easy to verify that the function defined by the integral (15) satisfies all the requirements F1–F3 of a distribution function. The distribution density of such a random variable is

  f_ξ(x) = θ/(π(θ² + x²)),    x ∈ R.        (15′)

Symbolically, such a random variable is written in the form ξ ~ K(θ).

Recall that in Chapter II, §3, item 3.1, Tables 1 and 2, we gave a number of other examples of discrete and absolutely continuous distributions (note that the random variables corresponding to these distributions bear the same names).

Remark 4. Random variables corresponding to singular distribution functions (Chapter II, §3, item 3.1) are called singular random variables. We will not consider questions concerning singular random variables in this textbook. In addition, following generally accepted tradition, we usually use the abbreviated term «continuous random variable» instead of the term «absolutely continuous random variable».
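The integral (15) has the closed form F_ξ(x) = 1/2 + (1/π)·arctan(x/θ), and inverting it gives a standard way to sample a Cauchy random variable by the inversion method. A sketch (the function names are ours):

```python
import math

def cauchy_cdf(x, theta):
    """Closed form of the integral (15): F(x) = 1/2 + arctan(x/theta)/pi."""
    return 0.5 + math.atan(x / theta) / math.pi

def cauchy_quantile(u, theta):
    """Inverse of the CDF, 0 < u < 1; the basis of inversion sampling."""
    return theta * math.tan(math.pi * (u - 0.5))

theta = 2.0
assert abs(cauchy_cdf(0.0, theta) - 0.5) < 1e-12         # the median is 0
assert abs(cauchy_quantile(0.75, theta) - theta) < 1e-9  # the upper quartile is theta
u = 0.9
assert abs(cauchy_cdf(cauchy_quantile(u, theta), theta) - u) < 1e-12
```

Applying `cauchy_quantile` to uniform random numbers from (0, 1) produces Cauchy-distributed samples with parameter θ.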
1.3. Equivalent definitions of a random variable

By the general definition of a random variable (see Definition 1), in order to verify that some function ξ: Ω → R is a random variable, we must check that condition (1) holds for every Borel set B ∈ β(R). But checking condition (1) is, generally speaking, a difficult task. Therefore it is important to find other definitions of a random variable, equivalent to (1) but more or less easily verifiable. The following lemma shows that the family of Borel sets in the definition (1) can be narrowed.

Lemma. Let E be a system of sets on the real line such that σ(E) = β(R). Then, for a function ξ: Ω → R to be a random variable, it is necessary and sufficient that the condition

  ξ⁻¹(E) = {ω : ξ(ω) ∈ E} ∈ 𝓕,    E ∈ E,        (16)

be fulfilled.

Proof. Necessity is obvious. Sufficiency. We define the system of sets

  D = {D ∈ β(R) : ξ⁻¹(D) ∈ 𝓕}.

The system D forms a σ-algebra. This statement is a consequence of the fact that the operation of «taking the preimage» preserves the set-theoretic operations of union, intersection and complementation:

  ξ⁻¹(∪_α B_α) = ∪_α ξ⁻¹(B_α),    ξ⁻¹(∩_α B_α) = ∩_α ξ⁻¹(B_α),    ξ⁻¹(B̄) = complement of ξ⁻¹(B).

Consequently E ⊆ D ⊆ β(R), whence β(R) = σ(E) ⊆ σ(D) = D ⊆ β(R), i.e. D = β(R), as required. ∎

Corollary. In order for a function ξ = ξ(ω): Ω → R to be a random variable, it is necessary and sufficient that for any x ∈ R:

  {ω : ξ(ω) ≤ x} ∈ 𝓕, or {ω : ξ(ω) < x} ∈ 𝓕, or {ω : ξ(ω) > x} ∈ 𝓕, or {ω : ξ(ω) ≥ x} ∈ 𝓕.        (17)

The proof of the corollary is immediately obtained from the lemma just proved and from the fact that each of the systems of sets

  E₁ = {(−∞, c] : c ∈ R},    E₂ = {(−∞, c) : c ∈ R},    E₃ = {(c, +∞) : c ∈ R},    E₄ = {[c, +∞) : c ∈ R}

generates the Borel sigma-algebra β(R) (Chapter II, §2, item 2.2), that is,

  σ(E₁) = σ(E₂) = σ(E₃) = σ(E₄) = β(R).    ∎
Let us note that in many textbooks ([10], [11], [25]) one of the statements (17) (usually one of the first two) is taken from the outset as the definition of a random variable.

1.4. Functions of (one) random variable

The following theorem enables the construction of new random variables as functions of given random variables.

Theorem 2. Let $\varphi=\varphi(x)$ be a Borel function (that is, $\varphi:R\to R$ and $\varphi^{-1}(B)\in\mathcal{B}(R)$ for every $B\in\mathcal{B}(R)$), and let $\xi=\xi(\omega)$ be a random variable. Then the composite function (superposition) $\eta(\omega)=\varphi(\xi(\omega))$ is also a random variable.

Proof. The assertion follows immediately from the fact that for any $B\in\mathcal{B}(R)$

$\{\omega:\eta(\omega)\in B\}=\{\omega:\varphi(\xi(\omega))\in B\}=\{\omega:\xi(\omega)\in\varphi^{-1}(B)\}\in\mathcal{F}$,

because $\varphi^{-1}(B)\in\mathcal{B}(R)$ and $\xi$ is a random variable. ∎

Thus, if $\xi$ is a random variable, then functions such as $|\xi|$, $\xi^n$, $\xi^+=\max(\xi,0)$, $\xi^-=-\min(\xi,0)$ are also random variables.
1.4.1. Distributions of functions of a random variable

Let $\xi=\xi(\omega)$ be a random variable and $g=g(x)$ a Borel function. We pose the following question: knowing the distribution law of the random variable $\xi$, how can we find the distribution law of the random variable $\eta=g(\xi)$? In particular, knowing the distribution function or the distribution density of $\xi$, how can we find the distribution function or the distribution density of $\eta=g(\xi)$?

If $\xi$ is a discrete random variable, then $\eta=g(\xi)$, where $g=g(x)$ is any (not necessarily Borel) function, is also a discrete random variable, and since

$\{\omega:\eta(\omega)=y_j\}=\{\omega:g(\xi(\omega))=y_j\}=\bigcup_{i:\,g(x_i)=y_j}\{\omega:\xi(\omega)=x_i\}$,

the distribution law of the random variable $\eta=g(\xi)$ is given by

$P_\eta(\{y_j\})=P\{\eta=y_j\}=\sum_{i:\,g(x_i)=y_j}P\{\xi=x_i\}=\sum_{i:\,g(x_i)=y_j}P_\xi(\{x_i\})$.  (18)
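Formula (18) amounts to grouping the masses of all values $x_i$ that $g$ maps to the same point $y_j$. A minimal sketch in Python (the pmf and the function $g$ here are our own illustration, not taken from the text):

```python
from collections import defaultdict

def pushforward(pmf, g):
    """Distribution law of eta = g(xi), per formula (18):
    sum the masses P{xi = x} over all x with the same value y = g(x)."""
    law = defaultdict(float)
    for x, p in pmf.items():
        law[g(x)] += p
    return dict(law)

# xi takes the values -2..2 with these (illustrative) probabilities
pmf_xi = {-2: 0.1, -1: 0.2, 0: 0.4, 1: 0.2, 2: 0.1}
pmf_eta = pushforward(pmf_xi, lambda x: x * x)  # eta = xi^2
# the masses at x and -x merge: P{eta = 4} = 0.1 + 0.1 = 0.2
```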
By definition, the distribution function of the random variable $\eta=g(\xi)$ is

$F_\eta(x)=P\{\eta\le x\}=P\{g(\xi)\le x\}=P\{\xi\in g^{-1}((-\infty,x])\}$.  (19)

If, in addition, $g(x)$ is a monotonically increasing function, then the inverse function $g^{-1}(x)$ exists and

$F_\eta(x)=P\{\xi\le g^{-1}(x)\}=F_\xi(g^{-1}(x))$.  (20)

Hence, in particular, if $F_\xi(x)$ is a continuous function, then the random variable $\eta=F_\xi(\xi)$ is uniformly distributed on $[0,1]$:

$F_\eta(x)=F_\xi(F_\xi^{-1}(x))=x$, $\quad 0\le x\le 1$.

Conversely, if $\eta$ is a random variable uniformly distributed on $[0,1]$ and $F(x)$ is a given continuous distribution function, then the distribution function of the random variable $\xi=F^{-1}(\eta)$ is exactly $F(x)$:

$F_\xi(x)=P\{F^{-1}(\eta)\le x\}=P\{\eta\le F(x)\}=F(x)$.
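The converse construction just described is the familiar inverse transform sampling method. A minimal sketch, under the assumption that the target is the exponential law $F(x)=1-e^{-x}$ with $F^{-1}(u)=-\ln(1-u)$ (the target law, sample size and seed are our own illustrative choices):

```python
import math
import random

random.seed(0)

def sample_from_cdf_inverse(inv_F, n):
    """xi = F^{-1}(eta), with eta uniform on [0,1], has distribution function F."""
    return [inv_F(random.random()) for _ in range(n)]

# Target: exponential with rate 1, F(x) = 1 - exp(-x), so F^{-1}(u) = -ln(1 - u)
xs = sample_from_cdf_inverse(lambda u: -math.log(1.0 - u), 100_000)

# Empirical check of F at one point: P{xi <= ln 2} should be close to 1/2
emp = sum(1 for x in xs if x <= math.log(2)) / len(xs)
```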
Thus we have obtained a method for constructing random variables with preassigned distributions from random variables uniformly distributed on $[0,1]$.

If the function $g(x)$ is, in addition, differentiable, and the random variable $\xi$ has a distribution density $f_\xi(x)$, then the random variable $\eta=g(\xi)$ also has a distribution density $f_\eta(x)$, and this density can be found as follows:

$f_\eta(x)=F_\xi'(g^{-1}(x))\,(g^{-1}(x))'=f_\xi(g^{-1}(x))\,(g^{-1}(x))'=\dfrac{f_\xi(g^{-1}(x))}{g'(g^{-1}(x))}$.  (21)
We give some examples of the application of formula (21).

If $g(x)=a+\sigma x$, $\sigma>0$, then

$f_{a+\sigma\xi}(x)=\dfrac{1}{\sigma}\,f_\xi\!\left(\dfrac{x-a}{\sigma}\right)$.

From this, for example, we get: if $\xi\sim N(0,1)$, then $\eta=a+\sigma\xi\sim N(a,\sigma^2)$.

If $g(x)=x^3$, then

$f_{\xi^3}(x)=\dfrac{1}{3\sqrt[3]{x^2}}\,f_\xi(\sqrt[3]{x})$.

If $g(x)=x^2$, then for $x>0$

$f_{\xi^2}(x)=\dfrac{1}{2\sqrt{x}}\,\big[f_\xi(\sqrt{x})+f_\xi(-\sqrt{x})\big]$,

and for $x<0$ we have $f_{\xi^2}(x)=0$.

For example, if $\xi\sim N(0,1)$, we get

$f_{\xi^2}(x)=\dfrac{1}{\sqrt{2\pi x}}\,e^{-x/2}$ ($x>0$); $\quad f_{\xi^2}(x)=0$ ($x<0$).
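The last density (that of $\xi^2$ for standard normal $\xi$) is easy to check empirically. A hedged sketch: we compare a Monte Carlo estimate of $P\{\xi^2\le 1\}$ with the value obtained by numerically integrating $e^{-x/2}/\sqrt{2\pi x}$ over $(0,1]$ (sample size, grid and seed are arbitrary choices of ours):

```python
import math
import random

random.seed(1)

# Monte Carlo: xi ~ N(0,1), estimate P{xi^2 <= 1}
n = 200_000
mc = sum(1 for _ in range(n) if random.gauss(0.0, 1.0) ** 2 <= 1.0) / n

# Numerical integral of the transformed density f(x) = exp(-x/2)/sqrt(2*pi*x)
def f(x):
    return math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi * x)

m = 20_000
h = 1.0 / m
# the midpoint rule avoids evaluating f at the integrable singularity x = 0
integral = sum(f((k + 0.5) * h) for k in range(m)) * h
```

Both numbers should approximate $P\{|\xi|\le 1\}=2\Phi(1)-1\approx0.68$.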
1.4.2. Structures of measurable functions

Theorem 3. Denote by $\mathcal{F}_\xi$ the system of sets $\xi^{-1}(B)$, $B\in\mathcal{B}(R)$:

$\mathcal{F}_\xi=\{\xi^{-1}(B):B\in\mathcal{B}(R)\}=\{\{\omega:\xi(\omega)\in B\}:B\in\mathcal{B}(R)\}$.

Then $\mathcal{F}_\xi$ is a σ-algebra.

Proof. Let us verify the properties A1, A2', A3 of the definition of a σ-algebra (Ch. II, §1, item 1.1, Definition 2):

A1. $\Omega=\xi^{-1}(R)\in\mathcal{F}_\xi$. This is obvious.

A2'. If $C_n=\xi^{-1}(B_n)$, $B_n\in\mathcal{B}(R)$, $C_1,C_2,\ldots\in\mathcal{F}_\xi$, then

$\bigcup_{n=1}^{\infty}C_n=\bigcup_{n=1}^{\infty}\xi^{-1}(B_n)=\xi^{-1}\Big(\bigcup_{n=1}^{\infty}B_n\Big)\in\mathcal{F}_\xi$;

A3. If $C=\xi^{-1}(B)\in\mathcal{F}_\xi$, $B\in\mathcal{B}(R)$, then $\overline{C}=\xi^{-1}(\overline{B})\in\mathcal{F}_\xi$. ∎

The σ-algebra $\mathcal{F}_\xi$ is called the σ-algebra generated by the random variable $\xi$.

According to Theorem 2, if $\xi$ is a random variable and $\varphi=\varphi(x)$ is a Borel function, then $\eta(\omega)=\varphi(\xi(\omega))$ is a random variable. It turns out that it is even an $\mathcal{F}_\xi$-measurable random variable. This assertion follows from the relations

$\eta^{-1}(B)=\{\omega:\eta(\omega)\in B\}=\{\omega:\varphi(\xi(\omega))\in B\}=\{\omega:\xi(\omega)\in\varphi^{-1}(B)\}=\xi^{-1}(\varphi^{-1}(B))\in\mathcal{F}_\xi$.

The converse result also turns out to be true.

Theorem 4. If $\eta$ is an $\mathcal{F}_\xi$-measurable random variable, then there is a Borel function $\varphi=\varphi(x)$ such that $\eta(\omega)=\varphi(\xi(\omega))$.

Proof. Let $\Phi_\xi$ be the class of all $\mathcal{F}_\xi$-measurable functions $\eta=\eta(\omega)$, and $\widetilde{\Phi}_\xi$ the class of $\mathcal{F}_\xi$-measurable functions representable in the form $\varphi(\xi(\omega))$, where $\varphi$ is some Borel function. It is obvious that $\widetilde{\Phi}_\xi\subseteq\Phi_\xi$. We will show that in fact $\widetilde{\Phi}_\xi=\Phi_\xi$.

Let $A\in\mathcal{F}_\xi$ and $\eta(\omega)=I_A(\omega)$. We will show that $\eta\in\widetilde{\Phi}_\xi$. Indeed, if $A\in\mathcal{F}_\xi$, then there is $B\in\mathcal{B}(R)$ such that $A=\{\omega:\xi(\omega)\in B\}=\xi^{-1}(B)$. We introduce the function

$\chi_B(x)=\begin{cases}1,&x\in B,\\ 0,&x\notin B.\end{cases}$

Then $I_A(\omega)=\chi_B(\xi(\omega))\in\widetilde{\Phi}_\xi$. It follows that any simple $\mathcal{F}_\xi$-measurable function $\sum_{i=1}^{n}c_iI_{A_i}(\omega)$, $A_i\in\mathcal{F}_\xi$, also belongs to the class $\widetilde{\Phi}_\xi$.

Now let $\eta$ be an arbitrary $\mathcal{F}_\xi$-measurable function. Then, by Theorem 6 (proved below), there is a sequence of simple $\mathcal{F}_\xi$-measurable random variables $\{\eta_n\}$ such that $\eta_n(\omega)\to\eta(\omega)$, $n\to\infty$, $\omega\in\Omega$. As has just been established, there exist Borel functions $\varphi_n=\varphi_n(x)$ such that $\eta_n(\omega)=\varphi_n(\xi(\omega))$; in addition, $\varphi_n(\xi(\omega))\to\eta(\omega)$, $n\to\infty$, $\omega\in\Omega$.

Denote $B=\Big\{x\in R:\lim_{n\to\infty}\varphi_n(x)\text{ exists}\Big\}$. This set is Borel, so the function

$\varphi(x)=\begin{cases}\lim_{n\to\infty}\varphi_n(x),&x\in B,\\ 0,&x\notin B,\end{cases}$
is also Borel. But then, obviously, $\eta(\omega)=\lim_{n}\varphi_n(\xi(\omega))=\varphi(\xi(\omega))$ for every $\omega\in\Omega$; hence $\widetilde{\Phi}_\xi=\Phi_\xi$. ∎

1.5. The class of extended random variables and the closure of this class with respect to pointwise convergence

If $\xi_1,\xi_2,\ldots$ is a sequence of random variables defined on the same probability space, then from them one can construct new functions, for example $\sum_{k=1}^{\infty}\xi_k$, $\varlimsup_n\xi_n$, $\varliminf_n\xi_n$, etc. These functions, generally speaking, take values in the extended number line $\overline{R}=[-\infty,\infty]$. It is therefore advisable to extend the class of random variables ($\mathcal{F}$-measurable functions) by allowing them to take the values $\pm\infty$. Thus it is reasonable to give the following definition.

Definition 4. A function $\xi=\xi(\omega)$ defined on a measurable space $(\Omega,\mathcal{F})$, taking values in $\overline{R}=[-\infty,\infty]$ and satisfying property (1) for any Borel set $B\in\mathcal{B}(R)$, is called an extended random variable.

The following theorem plays a key role in the construction of the Lebesgue integral (Chapter IV).

Theorem 6. a) For any (including extended) random variable $\xi=\xi(\omega)$ there is a sequence of simple (taking only a finite number of values) random variables $\xi_1,\xi_2,\ldots$ such that $|\xi_n|\le|\xi|$ for all $\omega\in\Omega$ and $\xi_n(\omega)\to\xi(\omega)$ ($n\to\infty$).

b) If, in addition, $\xi(\omega)\ge0$, then there is a sequence of simple random variables $\xi_1,\xi_2,\ldots$ such that $0\le\xi_n(\omega)\uparrow\xi(\omega)$ ($n\to\infty$) for all $\omega\in\Omega$.

Proof. b) We set

$\xi_n(\omega)=\sum_{k=1}^{n2^n}\dfrac{k-1}{2^n}\,I_{\left\{\frac{k-1}{2^n}<\xi(\omega)\le\frac{k}{2^n}\right\}}(\omega)+n\,I_{\{\omega:\,\xi(\omega)>n\}}(\omega)$, $\quad n=1,2,\ldots$
Then it is immediately verified that the sequence $\xi_n$ is such that $0\le\xi_n(\omega)\uparrow\xi(\omega)$ ($n\to\infty$) for all $\omega\in\Omega$.

a) The assertion of this part follows from part b) and from the fact that any random variable $\xi$ can be uniquely represented in the form $\xi=\xi^+-\xi^-$, where $\xi^+=\max(\xi,0)$, $\xi^-=-\min(\xi,0)$. ∎

Theorem 7. If $\xi_1,\xi_2,\ldots$ is a sequence of extended random variables, then the functions $\sup_n\xi_n$, $\inf_n\xi_n$, $\varlimsup_n\xi_n$, $\varliminf_n\xi_n$ are also (possibly extended) random variables.

The proofs follow from the facts that

$\Big\{\omega:\sup_n\xi_n(\omega)>x\Big\}=\bigcup_n\{\omega:\xi_n(\omega)>x\}\in\mathcal{F}$,

$\Big\{\omega:\inf_n\xi_n(\omega)<x\Big\}=\bigcup_n\{\omega:\xi_n(\omega)<x\}\in\mathcal{F}$,

$\varlimsup_n\xi_n=\inf_n\sup_{m\ge n}\xi_m$, $\qquad\varliminf_n\xi_n=\sup_n\inf_{m\ge n}\xi_m$. ∎
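The dyadic approximation in the proof of Theorem 6(b) is constructive and can be rendered directly in code. An illustrative Python sketch (the test value standing in for $\xi(\omega)$ is an arbitrary choice of ours):

```python
def xi_n(xi_value, n):
    """n-th dyadic approximation of a nonnegative value xi(omega):
    round down to the grid of step 2^-n below level n, truncate to n above it."""
    if xi_value > n:
        return float(n)
    # k is the index with (k-1)/2^n < xi <= k/2^n; the approximation is (k-1)/2^n
    k = -(-xi_value * 2 ** n // 1)          # ceil(xi * 2^n)
    return max(k - 1, 0) / 2 ** n

value = 2.718                                # an arbitrary nonnegative "xi(omega)"
approximations = [xi_n(value, n) for n in range(1, 30)]
# 0 <= xi_1 <= xi_2 <= ... <= value, with error at most 2^-n once n > value
```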
Theorem 8. If $\xi_1,\xi_2,\ldots$ is a sequence of extended random variables and

$\xi(\omega)=\lim_{n\to\infty}\xi_n(\omega)$, $\quad\omega\in\Omega$,

then the limit function $\xi(\omega)$ is also an extended random variable.

Proof.

$\{\omega:\xi(\omega)<x\}=\Big\{\omega:\lim_n\xi_n(\omega)<x\Big\}=\Big\{\omega:\varlimsup_n\xi_n(\omega)<x\Big\}\cap\Big\{\omega:\varliminf_n\xi_n(\omega)<x\Big\}\in\mathcal{F}$. ∎

Theorem 9. If $\xi,\eta$ are extended random variables, then the functions $\xi\pm\eta$, $\xi\eta$, $\xi/\eta$ are also (extended) random variables, under the assumption that they are defined, i.e. there are no indeterminate forms of the type

$\infty-\infty$, $\quad\dfrac{\infty}{\infty}$, $\quad\dfrac{0}{0}$.

Proof. Let $\{\xi_n\}$, $\{\eta_n\}$ be sequences of simple random variables that converge to $\xi$ and $\eta$, respectively. Then

$\xi_n\pm\eta_n\to\xi\pm\eta$, $\quad\xi_n\eta_n\to\xi\eta$, $\quad\dfrac{\xi_n}{\eta_n+\frac{1}{n}I_{\{\eta_n=0\}}}\to\dfrac{\xi}{\eta}$.
Consequently, by Theorem 8, as limits of sequences of simple random variables, the functions $\xi\pm\eta$, $\xi\eta$, $\xi/\eta$ are also (extended) random variables. ∎

1.6. Tasks for independent work

1. The distribution law of a random variable $\xi$ is given by the probabilities

$P\{\xi=k\}=\dfrac{c}{(k+1)(k+2)}$, $\quad k=1,2,\ldots$

a) Find the constant $c$; b) find the probability $P\{\xi>3\}$.

2. Let $F(x)$ be the distribution function of a random variable $\xi$. How is the distribution function $G(x)$ of the random variable $\eta=\frac12(\xi+|\xi|)$ determined?

3. A random variable $\xi$ is defined on a probability space $(\Omega,\mathcal{F},P)$, where $\Omega=[0,1]$, $\mathcal{F}=\mathcal{B}([0,1])$ and $P$ is the Lebesgue measure (as a probability). Find the distribution function of $\xi$ if:

a) $\xi(\omega)=\omega$;  b) $\xi(\omega)=\omega^2$;  c) $\xi(\omega)=\omega^\alpha$, $\alpha<0$;  d) $\xi(\omega)=\sin\pi\omega$;

e) $\xi(\omega)=\begin{cases}2\omega,&0\le\omega\le\frac12,\\ 2(1-\omega),&\frac12<\omega\le1;\end{cases}$  f) $\xi(\omega)=\begin{cases}\omega,&0\le\omega\le\frac13,\\ 1,&\frac13<\omega\le\frac23,\\ \omega,&\frac23<\omega\le1;\end{cases}$  g) $\xi(\omega)=\begin{cases}\frac14,&0\le\omega\le\frac14,\\ 1,&\frac14<\omega\le\frac34,\\ \frac12,&\frac34<\omega\le1.\end{cases}$

4. Let $(\Omega,\mathcal{F},P)$ be a probability space, $\xi(\omega):\Omega\to R$, and let the following functions be random variables on this probability space:

a) $\xi^2$;  b) $|\xi|$;  c) $\cos\xi$;  d) $e^{\xi}$;  e) $\xi^+$;  f) $[\xi]$ ($[a]$ is the integer part of $a$).

Does it follow in each case that $\xi$ is a random variable?

5. Prove that if $\xi,\eta$ are random variables, then the sets

$A=\{\omega\in\Omega:\xi(\omega)<\eta(\omega)\}$, $\quad B=\{\omega\in\Omega:\xi(\omega)=\eta(\omega)\}$, $\quad C=\{\omega\in\Omega:\xi(\omega)\le\eta(\omega)\}$

are events.

6. $\Omega=[0,1]$, $\mathcal{F}=\mathcal{B}([0,1])$, $P$ is the Lebesgue measure on $[0,1]$, and the random variable $\xi$ is defined as follows:

a) $\xi(\omega)=\begin{cases}\frac14,&\omega\in[0,\frac14),\\ \frac12,&\omega\in[\frac14,\frac34),\\ 1,&\omega\in[\frac34,1];\end{cases}$  b) $\xi(\omega)=\dfrac{\omega}{2}$;  c) $\xi(\omega)=\dfrac12$.

Describe the σ-algebra generated by the given random variable in each case.

7. Suppose that there are $n$ balls numbered $1,2,\ldots,n$ in an urn. The balls are drawn out of the urn one at a time with replacement until some ball is drawn for the second time. Denote by $\xi$ the number of the draw at which this event occurred. Prove that

$P\{\xi=k\}=C_n^{k-1}(k-1)!\,\dfrac{k-1}{n^k}$, $\quad k=2,3,\ldots,n+1$.

8. $\xi$ is a symmetrically distributed random variable (i.e. $\xi$ and $-\xi$ are identically distributed), and $A$ is a Borel set symmetric with respect to zero (that is, if $x\in A$, then $-x\in A$). We define a new random variable $\eta$ by the relation

$\eta=\begin{cases}\xi,&\text{if }\xi\in A,\\ -\xi,&\text{if }\xi\notin A.\end{cases}$

Prove that $\xi$ and $\eta$ are identically distributed random variables.

9. The random variable $\xi$ is uniformly distributed on $[-\pi,\pi]$. Find the distribution density of the random variable $\eta=\cos\xi$.

10. $\xi$ is an exponential random variable with parameter $\alpha$. Find the distribution densities of the following random variables:

a) $\eta_1=\sqrt{\xi}$;  b) $\eta_2=\xi^2$;  c) $\eta_3=\frac{1}{\alpha}\ln\xi$;  d) $\eta_4=e^{-\alpha\xi}$.

11. The random variable $\xi$ is uniformly distributed on $[0,1]$. Find the distribution densities of the following random variables:

a) $\eta_1=2\xi+1$;  b) $\eta_2=-\ln(1-\xi)$;  c) $\eta_3=\xi^3$;  d) $\eta_4=-\ln\xi$.

12. Lognormal distribution. If $\ln\xi\sim N(a,\sigma^2)$, then the distribution law of the random variable $\xi$ is called the lognormal distribution, and the random variable $\xi$ itself is called a lognormally distributed (lognormal) random variable. Find the distribution density of the lognormal distribution.

13. The distribution density of a random variable $\xi$ is

$f_\xi(x)=cx^{-4}$, $x\ge1$; $\quad f_\xi(x)=0$, $x<1$.

Find: a) the constant $c$; b) the distribution density of the random variable $\eta=1/\xi$; c) the probability $P\{0.1<\eta\le0.3\}$.

14. $\xi$ is a random variable distributed according to the Cauchy law with parameter $\theta=1$ (see formula (15′)): $\xi\sim K(1)$. Find the distribution densities of the following random variables:

a) $\xi_1=\dfrac{1}{1+\xi^2}$;  b) $\xi_2=\dfrac{\xi^2}{1+\xi^2}$;  c) $\xi_3=\dfrac{1}{\xi}$;  d) $\xi_4=\operatorname{arctg}\xi$.

15. $\xi$ is a geometric random variable with parameter $p$: $\xi\sim G(p)$, i.e.

$P\{\xi=k\}=q^{k-1}p$, $\quad q=1-p$, $\quad k=1,2,\ldots$

Find the distribution law of the random variable $\eta=\dfrac{\xi}{2}\,[1+(-1)^{\xi}]$.

16. $\xi\sim N(0,1)$, $\eta=\xi I_{\{|\xi|\le1\}}-\xi I_{\{|\xi|>1\}}$, where $I_A$ is the indicator of the event $A$.

a) Prove that $\eta\sim N(0,1)$, i.e. $\eta$ is a standard normal random variable. b) Is $\xi+\eta$ a normal random variable?
§2. Multidimensional random variables

2.1. Multidimensional random variables and their distributions. Marginal distributions

Let $\xi_1,\xi_2,\ldots,\xi_n$ be random variables defined on the probability space $(\Omega,\mathcal{F},P)$. For each $\omega\in\Omega$ these random variables determine a vector

$\xi(\omega)=(\xi_1(\omega),\xi_2(\omega),\ldots,\xi_n(\omega))$,

which is called a random vector or a multidimensional ($n$-dimensional) random variable.

The mapping $\xi:\Omega\to R^n$ defined by means of the random vector $\xi=\xi(\omega)$ is a measurable ($\mathcal{F}$-measurable) mapping: for any $B=B_1\times\cdots\times B_n\in\mathcal{B}(R^n)$

$\xi^{-1}(B)=\{\omega:\xi(\omega)\in B\}=\{\omega:(\xi_1(\omega),\xi_2(\omega),\ldots,\xi_n(\omega))\in B\}=\bigcap_{i=1}^{n}\{\omega:\xi_i(\omega)\in B_i\}\in\mathcal{F}$.  (1)

Therefore, for each $n$-dimensional Borel set $B=B_1\times B_2\times\cdots\times B_n\in\mathcal{B}(R^n)$, we can define the set function $P_\xi=P_{\xi_1,\xi_2,\ldots,\xi_n}$ as follows:

$P_\xi(B)=P_{\xi_1,\xi_2,\ldots,\xi_n}(B_1\times B_2\times\cdots\times B_n)=P\{\omega:\xi(\omega)\in B\}=P(\xi^{-1}(B))=P\{\xi_1\in B_1,\xi_2\in B_2,\ldots,\xi_n\in B_n\}$.  (2)

$P_\xi$ is a probability on $(R^n,\mathcal{B}(R^n))$ (the proof is analogous to the proof in the one-dimensional case). So any multidimensional random variable $\xi=(\xi_1,\xi_2,\ldots,\xi_n)$ generates a new probability space $(R^n,\mathcal{B}(R^n),P_\xi)$; this probability space is called the probability space generated by the multidimensional random variable $\xi=(\xi_1,\xi_2,\ldots,\xi_n)$, and the probability measure $P_\xi=P_{\xi_1,\xi_2,\ldots,\xi_n}$ is called the distribution law of the multidimensional random variable (random vector) $\xi$.

From (1), for any $x=(x_1,x_2,\ldots,x_n)\in R^n$ we obtain
$\{\omega:\xi(\omega)\le x\}=\{\omega:\xi_1(\omega)\le x_1,\xi_2(\omega)\le x_2,\ldots,\xi_n(\omega)\le x_n\}\in\mathcal{F}$,

which allows us to speak of the probability of the event $\{\omega:\xi(\omega)\le x\}$. The function

$F_\xi(x)=F_{\xi_1,\xi_2,\ldots,\xi_n}(x_1,x_2,\ldots,x_n)=P\{\omega:\xi_1(\omega)\le x_1,\xi_2(\omega)\le x_2,\ldots,\xi_n(\omega)\le x_n\}$  (3)

is called the joint distribution function of the random variables $\xi_1,\xi_2,\ldots,\xi_n$.

This function is nondecreasing in each of its arguments (property FF1) and continuous from the right in each of its variables (property FF3); $F(+\infty,+\infty,\ldots,+\infty)=1$; if at least one of the coordinates of the vector $y=(y_1,y_2,\ldots,y_n)$ takes the value $-\infty$ ($y_i=-\infty$ for at least one $i$), then $\lim_{x\downarrow y}F(x_1,\ldots,x_n)=0$ (property FF2). In addition, this function has the following property of nonnegative definiteness:

FF4. $\Delta_{a_1,b_1}\cdots\Delta_{a_n,b_n}F(x_1,x_2,\ldots,x_n)\ge0$, where the difference operator $\Delta_{a_i,b_i}$ ($a_i\le b_i$, $i=1,2,\ldots,n$) is defined by

$\Delta_{a_i,b_i}F(x_1,\ldots,x_n)=F(x_1,\ldots,x_{i-1},b_i,x_{i+1},\ldots,x_n)-F(x_1,\ldots,x_{i-1},a_i,x_{i+1},\ldots,x_n)$.

The first three properties FF1, FF2, FF3 are proved in exactly the same way as the properties F1, F2, F3 in the one-dimensional case. Let us prove property FF4. For simplicity, we consider only the case $n=2$. Then

$\Delta_{a_1,b_1}\Delta_{a_2,b_2}F(x_1,x_2)=\Delta_{a_1,b_1}\big[F(x_1,b_2)-F(x_1,a_2)\big]=F(b_1,b_2)-F(a_1,b_2)-F(b_1,a_2)+F(a_1,a_2)$

$=P\{\xi_1\le b_1,\xi_2\le b_2\}-P\{\xi_1\le a_1,\xi_2\le b_2\}-P\{\xi_1\le b_1,\xi_2\le a_2\}+P\{\xi_1\le a_1,\xi_2\le a_2\}$

$=P\{a_1<\xi_1\le b_1,\ a_2<\xi_2\le b_2\}\ge0$.

The general case is proved similarly; we only need to note that

$\Delta_{a_1,b_1}\cdots\Delta_{a_n,b_n}F(x_1,x_2,\ldots,x_n)=P\{a_1<\xi_1\le b_1,\ldots,a_n<\xi_n\le b_n\}\ge0$.  (4)
Thus, the joint distribution function of any multidimensional random variable $\xi=(\xi_1,\xi_2,\ldots,\xi_n)$ satisfies all the properties FF1-FF4 of a multidimensional distribution function (see Ch. II, §3, item 3.2, Definition 3), i.e. it is a multidimensional distribution function (in the sense of the definition from the theory of functions).

Using the same arguments as in Theorem 1 of §1, we can prove the validity of the following theorem.

Theorem 1. Let $F(x_1,x_2,\ldots,x_n)$ be a multidimensional distribution function defined on $R^n$. Then there exist a probability space $(\Omega,\mathcal{F},P)$ and random variables $\xi_1,\xi_2,\ldots,\xi_n$ defined on this probability space such that the given multidimensional distribution function $F(x_1,x_2,\ldots,x_n)$ is the joint distribution function of the random variables $\xi_1,\xi_2,\ldots,\xi_n$:

$F_{\xi_1,\xi_2,\ldots,\xi_n}(x_1,x_2,\ldots,x_n)=F(x_1,x_2,\ldots,x_n)$, $\quad(x_1,x_2,\ldots,x_n)\in R^n$.

As in the one-dimensional case, the joint distribution function $F_{\xi_1,\ldots,\xi_n}(x_1,\ldots,x_n)$ uniquely determines the distribution law $P_\xi(B)=P_{\xi_1,\ldots,\xi_n}(B)$.

Since $\{\omega:\xi_i(\omega)\le+\infty\}=\Omega$ for any $i=1,2,\ldots,n$, letting some coordinates $x_{i_1},\ldots,x_{i_k}$ in the multidimensional distribution function $F_\xi(x)$ go to $+\infty$, we obtain the joint distribution function of the remaining components $\xi_{j_1},\ldots,\xi_{j_{n-k}}$:

$\lim_{x_{i_1}\to+\infty,\ldots,x_{i_k}\to+\infty}F_{\xi_1,\ldots,\xi_n}(x_1,\ldots,x_n)=F_{\xi_{j_1},\ldots,\xi_{j_{n-k}}}(x_{j_1},\ldots,x_{j_{n-k}})$, $\quad k=1,2,\ldots,n-1$,

where $i_l\ne i_m$ ($l\ne m$), $j_l\ne j_m$ ($l\ne m$) and $\{j_1,\ldots,j_{n-k}\}=\{1,2,\ldots,n\}\setminus\{i_1,\ldots,i_k\}$. The right-hand side of the last relation is called an $(n-k)$-dimensional marginal distribution. We give examples of some marginal distributions:

$F_{\xi_1,\xi_2}(x_1,+\infty)=F_{\xi_1}(x_1)$, $\quad F_{\xi_1,\xi_2}(+\infty,x_2)=F_{\xi_2}(x_2)$, $\quad F_{\xi_1,\xi_2,\xi_3}(x_1,+\infty,x_3)=F_{\xi_1,\xi_3}(x_1,x_3)$,
$F_{\xi_1,\ldots,\xi_{n-1},\xi_n}(x_1,\ldots,x_{n-1},+\infty)=F_{\xi_1,\ldots,\xi_{n-1}}(x_1,\ldots,x_{n-1})$.

2.1.1. Multidimensional discrete and absolutely continuous random variables

If the components of a random vector $\xi=(\xi_1,\xi_2,\ldots,\xi_n)$ are discrete random variables, then such a random vector (such a multidimensional random variable) is called a discrete random vector (a multidimensional discrete random variable). A discrete random vector is a mapping

$\xi:\Omega\to X=\{x^{(i)}=(x_1^{(i)},x_2^{(i)},\ldots,x_n^{(i)}):i=1,2,\ldots\}$,

and for $p(x)=P\{\xi=x\}$, $x\in X$, we have

$\sum_{x\in X}p(x)=1$, $\qquad P_\xi(B)=P\{\xi\in B\}=\sum_{x\in B}P\{\xi=x\}=\sum_{x\in B}p(x)$.

As an example of a multidimensional discrete random variable, we can take a random vector with a polynomial (multinomial) distribution law (Chap. I, §2, item 2.2). Let us describe how such a random vector can be obtained. Let a series of $n$ independent trials be conducted, each trial having $r$ ($r\ge3$) outcomes. Suppose that in each trial the $j$-th outcome appears with probability $p_j>0$, $\sum_{j=1}^{r}p_j=1$. Denote by $\xi_j$ the number of appearances of the $j$-th outcome ($j=1,2,\ldots,r$) in these $n$ independent trials. It is clear that $\xi_j$ can take the values $0,1,\ldots,n$ and $\xi_1+\xi_2+\cdots+\xi_r=n$. Then, as was shown in Chap. I, §2, item 2.2, the distribution law of the random vector $(\xi_1,\xi_2,\ldots,\xi_r)$ is determined by the set of probabilities

$P\{\xi_1=n_1,\xi_2=n_2,\ldots,\xi_r=n_r\}=\dfrac{n!}{n_1!\,n_2!\cdots n_r!}\,p_1^{n_1}p_2^{n_2}\cdots p_r^{n_r}$, $\quad n_j\ge0$, $\quad\sum_{j=1}^{r}n_j=n$.
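The multinomial probabilities above are straightforward to compute, and they must sum to 1 over all admissible vectors $(n_1,\ldots,n_r)$. A small sketch (the parameters $n=4$ and $p=(0.2,0.3,0.5)$ are an arbitrary illustration of ours):

```python
from itertools import product
from math import factorial

def multinomial_pmf(counts, probs):
    """P{xi_1 = n_1, ..., xi_r = n_r} = n!/(n_1!...n_r!) * p_1^n_1 ... p_r^n_r."""
    n = sum(counts)
    coef = factorial(n)
    for k in counts:
        coef //= factorial(k)          # exact integer multinomial coefficient
    p = float(coef)
    for k, pk in zip(counts, probs):
        p *= pk ** k
    return p

n, probs = 4, (0.2, 0.3, 0.5)
total = sum(multinomial_pmf((n1, n2, n - n1 - n2), probs)
            for n1, n2 in product(range(n + 1), repeat=2) if n1 + n2 <= n)
```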
As another example, we can take a random vector having a multidimensional hypergeometric distribution (Ch. II, §2, item 2.3).

If for a multidimensional random variable $\xi=(\xi_1,\xi_2,\ldots,\xi_n)$ there exists a nonnegative function of $n$ variables $f_\xi(x)=f_{\xi_1,\ldots,\xi_n}(x_1,\ldots,x_n)$ such that the joint distribution function is represented as the integral

$F_{\xi_1,\ldots,\xi_n}(x_1,\ldots,x_n)=\int_{-\infty}^{x_1}\cdots\int_{-\infty}^{x_n}f_{\xi_1,\ldots,\xi_n}(u_1,\ldots,u_n)\,du_1\ldots du_n$  (5)

(the integral in (5) is understood here as a multidimensional improper Riemann integral; in the general case this integral must be understood as an integral in the Lebesgue sense), then such a multidimensional random variable $\xi=(\xi_1,\ldots,\xi_n)$ is called a multidimensional absolutely continuous random variable, and the function $f_\xi(x)=f_{\xi_1,\ldots,\xi_n}(x_1,\ldots,x_n)$ is called the multidimensional distribution density of the random vector $\xi=(\xi_1,\ldots,\xi_n)$, or the joint distribution density of the random variables $\xi_1,\xi_2,\ldots,\xi_n$, or simply a multidimensional density. As in the one-dimensional case, in the multidimensional case, instead of «multidimensional absolutely continuous random variable», we, as a rule, omit the word «absolutely» and say «multidimensional continuous random variable».
We give some important properties of the multidimensional distribution density:

1. $f_\xi(x)=f_{\xi_1,\ldots,\xi_n}(x_1,\ldots,x_n)\ge0$, $\quad\displaystyle\int_{R^n}f_\xi(x)\,dx=1$,  (5′)

and any function satisfying property (5′) can serve as the distribution density of some multidimensional random variable.

2. At points of continuity of the distribution density,

$f_{\xi_1,\ldots,\xi_n}(x_1,\ldots,x_n)=\dfrac{\partial^n F_{\xi_1,\ldots,\xi_n}(x_1,\ldots,x_n)}{\partial x_1\partial x_2\ldots\partial x_n}$,  (6)

and

$P\{x_1<\xi_1\le x_1+\Delta x_1,\ldots,x_n<\xi_n\le x_n+\Delta x_n\}\approx f_{\xi_1,\ldots,\xi_n}(x_1,\ldots,x_n)\,\Delta x_1\ldots\Delta x_n$, $\quad\max_i\Delta x_i\to0$.

3. For any Borel set $B\in\mathcal{B}(R^n)$,

$P_\xi(B)=P\{(\xi_1,\ldots,\xi_n)\in B\}=P\{\xi\in B\}=\int_B f_{\xi_1,\ldots,\xi_n}(x_1,\ldots,x_n)\,dx_1\ldots dx_n=\int_B f_\xi(x)\,dx$.  (7)

Properties 1 and 2 follow directly from the definition of the density. Let us prove property 3 (that is, formula (7)).

If $B=I_1\times I_2\times\cdots\times I_n$, $I_j=(a_j,b_j]$, then formula (7) is the relation following from (4)-(5):

$P\{a_1<\xi_1\le b_1,\ldots,a_n<\xi_n\le b_n\}=\int_{a_1}^{b_1}\cdots\int_{a_n}^{b_n}f_{\xi_1,\ldots,\xi_n}(x_1,\ldots,x_n)\,dx_1\ldots dx_n$.

For the general case, i.e. for Borel sets $B\in\mathcal{B}(R^n)$, the validity of (7) follows from the fact that the system of sets

$\mathcal{J}^n=\{I_1\times\cdots\times I_n:I_j=(a_j,b_j],\ j=1,\ldots,n\}$

generates an algebra with $\sigma(\mathcal{J}^n)=\mathcal{B}(R^n)$, and from the theorem on the extension of a probability from an algebra to the σ-algebra containing it. It is clear that (5) follows from (7), and, by what was just proved, (7) follows from (5); thus (5) and (7) are equivalent. Therefore property (7) can be taken as another equivalent definition of the multidimensional distribution density.

We give two very important examples of multidimensional random variables.
Example 1. Let the distribution density of a multidimensional random variable $\xi=(\xi_1,\xi_2,\ldots,\xi_n)$ be defined by the formula

$f_\xi(x)=\begin{cases}\dfrac{1}{\operatorname{mes}D},&x\in D\in\mathcal{B}(R^n),\\[1mm] 0,&x\notin D\end{cases}$  (8)

(where $\operatorname{mes}D$ is the «measure» of the set $D$: for $n=1$ it is the length, for $n=2$ the area, for $n=3$ the usual volume, etc.). In this case, for $B\in\mathcal{B}(R^n)$, by definition (7),

$P\{\xi\in B\}=\dfrac{\operatorname{mes}(B\cap D)}{\operatorname{mes}D}$.

A multidimensional random variable $\xi=(\xi_1,\xi_2,\ldots,\xi_n)$ whose distribution density is defined by formula (8) is called a multidimensional random variable uniformly distributed in the region $D$ of the space $R^n$.

Example 2. Multidimensional normal (Gaussian) random variable. The distribution density of such a multidimensional random variable is (Chapter II, §3, item 3.2, Example 2)

$f_{\xi_1,\ldots,\xi_n}(x_1,\ldots,x_n)=\dfrac{1}{(2\pi)^{n/2}\sqrt{\det R}}\,e^{-\frac12(R^{-1}(x-a),\,x-a)}$,  (9)

where $a\in R^n$, $R$ is a positive definite $n\times n$ matrix, $\det R$ is the determinant of the matrix $R$, $R^{-1}$ is the inverse matrix of $R$, and for $x,y\in R^n$ the scalar product is $(x,y)=\sum_{i=1}^{n}x_iy_i$.

Further, a multidimensional random variable with density (9) will be called a multidimensional normal (Gaussian) random variable with parameters $(a,R)$, and we will write this symbolically in the form $\xi\sim N(a,R)$. A random variable $\xi\sim N(0,\sigma^2E)$, where $E$ is the unit matrix, is called a multidimensional spherical normal random variable. From (9), for a random variable $\xi\sim N(0,\sigma^2E)$ we find that its distribution density is given by

$f_{\xi_1,\ldots,\xi_n}(x_1,\ldots,x_n)=\dfrac{1}{(2\pi)^{n/2}\sigma^n}\,e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}x_i^2}=\prod_{i=1}^{n}\dfrac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{x_i^2}{2\sigma^2}}$.  (9′)
Let us now consider the following question: if a multidimensional distribution is known, how can we find distributions of smaller dimensions? For simplicity, we consider this question for the cases of two-dimensional and three-dimensional distributions.

Suppose $\xi$, $\eta$ and $\zeta$ are one-dimensional discrete random variables:

$\xi:\Omega\to X=\{x_1,x_2,\ldots\}$, $\quad\eta:\Omega\to Y=\{y_1,y_2,\ldots\}$, $\quad\zeta:\Omega\to Z=\{z_1,z_2,\ldots\}$.

Then

$\bigcup_i\{\xi=x_i\}=\bigcup_j\{\eta=y_j\}=\bigcup_k\{\zeta=z_k\}=\Omega$,

and since for any event $A$ we have $A=A\cap\Omega$, we can write

$P\{\xi=x_i\}=\sum_j P\{\xi=x_i,\eta=y_j\}=\sum_{j,k}P\{\xi=x_i,\eta=y_j,\zeta=z_k\}$,

$P\{\eta=y_j\}=\sum_i P\{\xi=x_i,\eta=y_j\}$, $\quad P\{\xi=x_i,\eta=y_j\}=\sum_k P\{\xi=x_i,\eta=y_j,\zeta=z_k\}$, etc.

We introduce the following notation:

$p_{ijk}=P\{\xi=x_i,\eta=y_j,\zeta=z_k\}$, $\quad p_{ij}=P\{\xi=x_i,\eta=y_j\}$, $\quad p_{ik}=P\{\xi=x_i,\zeta=z_k\}$, $\quad p_{jk}=P\{\eta=y_j,\zeta=z_k\}$,

$p_i=P\{\xi=x_i\}$, $\quad p_j=P\{\eta=y_j\}$, $\quad p_k=P\{\zeta=z_k\}$, etc.

Then, in the new notation,

$p_i=\sum_{j,k}p_{ijk}$, $\quad p_j=\sum_{i,k}p_{ijk}$, $\quad p_{ij}=\sum_k p_{ijk}$, etc.

If we consider only two random variables $\xi$, $\eta$ and denote $p_{ij}=P\{\xi=x_i,\eta=y_j\}$, then

$\sum_{i,j}p_{ij}=1$, $\quad P\{\xi=x_i\}=p_i=\sum_j p_{ij}$, $\quad P\{\eta=y_j\}=p_j=\sum_i p_{ij}$.
If $\xi,\eta$ are absolutely continuous random variables and $f_{\xi,\eta}(x,y)$ is their joint distribution density, then by relation (5)

$F_{\xi,\eta}(x,+\infty)=\int_{-\infty}^{x}\int_{-\infty}^{+\infty}f_{\xi,\eta}(u,v)\,dv\,du$,

and from the equality $F_\xi(x)=F_{\xi,\eta}(x,+\infty)$ we obtain that $\xi$ is an absolutely continuous random variable with the distribution density

$f_\xi(x)=\int_{-\infty}^{+\infty}f_{\xi,\eta}(x,y)\,dy$.  (10)

Similarly,

$f_\eta(y)=\int_{-\infty}^{+\infty}f_{\xi,\eta}(x,y)\,dx$, $\quad f_{\xi,\eta}(x,y)=\int_{-\infty}^{+\infty}f_{\xi,\eta,\zeta}(x,y,z)\,dz$, $\quad f_\eta(y)=\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}f_{\xi,\eta,\zeta}(x,y,z)\,dx\,dz$.  (10′)
So, using the joint distribution (the joint distribution law, the joint distribution function, the joint distribution density), it is always possible to find the marginal distributions (that is, the distributions of smaller dimensions). But the converse is not always true: with known marginal distributions, it is not always possible to restore the joint distribution. Let us give some examples.

Examples

3. Let $\Omega=\{(i,j):i,j=-1,1\}$ and let all elementary events be equally probable. We define random variables $\xi_1,\xi_2$ as follows:

$\xi_1=\xi_1(i,j)=i$, $\quad\xi_2=\xi_2(i,j)=j$, $\quad i,j=-1,1$.

Then for any $i,j=-1,1$:

$P\{\xi_1=i,\xi_2=j\}=\dfrac14$,

$P\{\xi_1=i\}=P\{\xi_1=i,\xi_2=-1\}+P\{\xi_1=i,\xi_2=1\}=\dfrac14+\dfrac14=\dfrac12$, $\quad i=-1,1$,

$P\{\xi_2=j\}=\dfrac12$, $\quad j=-1,1$.

Note that in this example

$P\{\xi_1=i,\xi_2=j\}=\dfrac14=\dfrac12\cdot\dfrac12=P\{\xi_1=i\}\,P\{\xi_2=j\}$,

i.e. here the joint distribution law can be found from the marginal distribution laws of the random variables.
4. Let $\xi_1,\xi_2$ be defined as in the previous Example 3. We now define the probabilities of the elementary events as follows:

$P\{(1,1)\}=P\{(-1,-1)\}=\dfrac12$, $\quad P\{(-1,1)\}=P\{(1,-1)\}=0$.

Then, for example, $P\{\xi_1=1,\xi_2=-1\}=0$, and therefore the joint distribution of the random variables $\xi_1,\xi_2$ does not coincide with the joint distribution of the random variables in Example 3. But the one-dimensional distributions coincide with the corresponding distributions in Example 3:

$P\{\xi_1=1\}=P\{\xi_1=1,\xi_2=1\}=\dfrac12$, $\quad P\{\xi_1=-1\}=P\{\xi_1=-1,\xi_2=-1\}=\dfrac12$,

$P\{\xi_2=1\}=P\{\xi_2=1,\xi_1=1\}=\dfrac12$, $\quad P\{\xi_2=-1\}=P\{\xi_2=-1,\xi_1=-1\}=\dfrac12$.

But at the same time we note that in our example

$P\{\xi_1=1,\xi_2=-1\}=0\ne\dfrac12\cdot\dfrac12=P\{\xi_1=1\}\,P\{\xi_2=-1\}$,

i.e. the joint distribution is not restored through the one-dimensional (marginal) distributions.

5. The two-dimensional random variable $(\xi,\eta)$ is uniformly distributed in the disk

$D=\{(x,y):x^2+y^2\le R^2\}$.

Find the marginal distributions of $\xi$, $\eta$.

Solution. By the condition of the problem, the joint distribution density is

$f_{\xi,\eta}(x,y)=\begin{cases}\dfrac{1}{\pi R^2},&(x,y)\in D,\\[1mm] 0,&(x,y)\notin D.\end{cases}$

Then by formula (10)

$f_\xi(x)=\int_{\{y:\,x^2+y^2\le R^2\}}\dfrac{1}{\pi R^2}\,dy=\begin{cases}\dfrac{2\sqrt{R^2-x^2}}{\pi R^2},&-R\le x\le R,\\[1mm] 0,&x\notin[-R,R].\end{cases}$

$f_\eta(y)$ is found similarly. If $(\xi,\eta)$ is uniformly distributed in the quarter-disk

$D=\{(x,y):x^2+y^2\le R^2,\ x\ge0,\ y\ge0\}$,

then, as is easy to verify,

$f_\xi(x)=\begin{cases}\dfrac{4\sqrt{R^2-x^2}}{\pi R^2},&0\le x\le R,\\[1mm] 0,&x\notin[0,R].\end{cases}$
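Examples 3 and 4 can be checked mechanically: both joint tables have the same marginals, but only the first factorizes into the product of its marginals. A sketch (the comparison routine is our own):

```python
def marginals_2d(joint):
    """Row and column sums of a joint pmf given as {(i, j): p}."""
    px, py = {}, {}
    for (i, j), p in joint.items():
        px[i] = px.get(i, 0.0) + p
        py[j] = py.get(j, 0.0) + p
    return px, py

def factorizes(joint, tol=1e-12):
    """True iff p_ij = p_i * p_j for all cells, i.e. xi1 and xi2 are independent."""
    px, py = marginals_2d(joint)
    return all(abs(joint.get((i, j), 0.0) - px[i] * py[j]) < tol
               for i in px for j in py)

joint3 = {(i, j): 0.25 for i in (-1, 1) for j in (-1, 1)}      # Example 3
joint4 = {(1, 1): 0.5, (-1, -1): 0.5}                           # Example 4

same_marginals = marginals_2d(joint3) == marginals_2d(joint4)   # True
```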
6. Let $(\xi,\eta)$ be a two-dimensional normal random variable. We show that then the components $\xi$, $\eta$ are also normal random variables.

Solution. As we showed earlier (see Chapter II, §3, item 3.2), the joint distribution density can be written in the form

$f_{\xi,\eta}(x,y)=\dfrac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\left\{-\dfrac{1}{2(1-\rho^2)}\left[\dfrac{(x-a_1)^2}{\sigma_1^2}-2\rho\,\dfrac{(x-a_1)(y-a_2)}{\sigma_1\sigma_2}+\dfrac{(y-a_2)^2}{\sigma_2^2}\right]\right\}$,  (10′′)

where $\sigma_i>0$, $|\rho|<1$. In

$f_\xi(x)=\int_{-\infty}^{+\infty}f_{\xi,\eta}(x,y)\,dy$

we change the variable $z=\dfrac{y-a_2}{\sigma_2}$, and afterwards in the resulting integral we introduce a new variable $u=z-\rho\,\dfrac{x-a_1}{\sigma_1}$. Then

$f_\xi(x)=\dfrac{1}{2\pi\sigma_1\sqrt{1-\rho^2}}\,e^{-\frac{(x-a_1)^2}{2\sigma_1^2}}\int_{-\infty}^{+\infty}e^{-\frac{u^2}{2(1-\rho^2)}}\,du=\dfrac{1}{\sqrt{2\pi}\,\sigma_1}\,e^{-\frac{(x-a_1)^2}{2\sigma_1^2}}$,

as a consequence of the well-known Poisson integral

$\int_{-\infty}^{+\infty}e^{-\frac{u^2}{2(1-\rho^2)}}\,du=\sqrt{1-\rho^2}\,\sqrt{2\pi}$.

The formula obtained for the density $f_\xi(x)$ shows that $\xi$ is a normal random variable: $\xi\sim N(a_1,\sigma_1^2)$. Finding the density $f_\eta(y)$ by a similar method, we can be convinced that $\eta\sim N(a_2,\sigma_2^2)$.
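The marginalization step can also be checked numerically: integrating the bivariate density (10′′) over $y$ on a grid should reproduce the $N(a_1,\sigma_1^2)$ density. A sketch with illustrative parameters $a_1=a_2=0$, $\sigma_1=\sigma_2=1$, $\rho=0.6$ (our own choices; the grid and truncation range are also arbitrary):

```python
import math

rho = 0.6

def f_joint(x, y):
    """Bivariate normal density (10'') with a1 = a2 = 0, sigma1 = sigma2 = 1."""
    c = 1.0 / (2.0 * math.pi * math.sqrt(1.0 - rho * rho))
    q = (x * x - 2.0 * rho * x * y + y * y) / (2.0 * (1.0 - rho * rho))
    return c * math.exp(-q)

def f_marginal(x, span=8.0, m=20_000):
    """Midpoint-rule integral of f_joint(x, y) dy over [-span, span]."""
    h = 2.0 * span / m
    return sum(f_joint(x, -span + (k + 0.5) * h) for k in range(m)) * h

x0 = 0.7
phi = math.exp(-x0 * x0 / 2.0) / math.sqrt(2.0 * math.pi)   # N(0,1) density at x0
```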
The result obtained in this example (the components of a two-dimensional normal random variable are themselves normal random variables) remains valid in the multidimensional case $n\ge3$ as well (this will be shown in Chapter V).

Remark 1. It turns out that the converse result is not always valid: from the fact that each component of a random vector is a normal random variable, the normality of the random vector itself does not always follow. Let us give an example.

Example 7. The lines $y=x$ and $y=-x$ and the coordinate axes divide the plane into eight regions $E_1,E_2,\ldots,E_8$, as shown in Fig. 3. Let us now define the distribution density of a two-dimensional random variable (random vector) $(\xi,\eta)$ as follows:

$f_{\xi,\eta}(x,y)=\begin{cases}0,&(x,y)\in E'=E_1\cup E_3\cup E_5\cup E_7,\\ 2\varphi(x)\varphi(y),&(x,y)\in E''=E_2\cup E_4\cup E_6\cup E_8,\end{cases}$

where

$\varphi(x)=\dfrac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}$

is the distribution density of the standard normal random variable. It is not hard to see that $f_{\xi,\eta}(x,y)$ is a two-dimensional distribution density.

Fig. 3

If $x\ge0$, then the section $\{y:(x,y)\in E''\}$ consists of the intervals $(-x,0)$ and $(x,+\infty)$, so

$f_\xi(x)=\int_{-\infty}^{+\infty}f_{\xi,\eta}(x,y)\,dy=\int_{-x}^{0}2\varphi(x)\varphi(y)\,dy+\int_{x}^{+\infty}2\varphi(x)\varphi(y)\,dy=2\varphi(x)\left[\int_{0}^{x}\varphi(y)\,dy+\int_{x}^{+\infty}\varphi(y)\,dy\right]=2\varphi(x)\cdot\dfrac12=\varphi(x)$,

where we used the symmetry of $\varphi$ and $\int_{0}^{+\infty}\varphi(y)\,dy=\frac12$. Simple calculations show similarly that $f_\xi(x)=\varphi(x)$ for $x<0$, and that $f_\eta(y)=\varphi(y)$, $y\in R$; hence each of the random variables $\xi$, $\eta$ is a standard normal random variable. But the vector $(\xi,\eta)$ is not a normal vector.
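The construction of Example 7 can be simulated: draw $(X,Y)$ from the standard two-dimensional normal law and reflect $Y\mapsto-Y$ whenever the point falls in a zero-density octant (reflection across the $x$-axis swaps the two octant families and leaves both marginals standard normal). Since Fig. 3 is not reproduced here, the choice of which octants carry the density is our assumption:

```python
import math
import random

random.seed(3)

def in_zero_region(x, y):
    """Octants assumed to carry zero density (one of the two alternating families)."""
    ax = abs(x)
    if x >= 0:
        return (0.0 < y < ax) or (y < -ax)
    return (-ax < y < 0.0) or (y > ax)

def sample():
    """Reflecting y -> -y moves any zero-density point into a density-carrying octant."""
    x, y = random.gauss(0, 1), random.gauss(0, 1)
    return (x, -y) if in_zero_region(x, y) else (x, y)

n = 50_000
pts = [sample() for _ in range(n)]
mean_x = sum(p[0] for p in pts) / n
var_y = sum(p[1] ** 2 for p in pts) / n
hit_zero = sum(1 for p in pts if in_zero_region(*p)) / n   # no mass in E'
```

The marginals stay standard normal, yet half the plane carries no mass, so the vector cannot be Gaussian.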
2.2. Independence of random variables Definition 1. If the random variables [1 , [2 ,..., [n are such that, for any Borel
sets B1 , B2 ,..., Bn E R
P ^[1 B1 , [2 B2 ,..., [n Bn `
P ^[1 B1` P ^[2 B2 ` ... P ^[n Bn `
(11)
then such random variables [ 1 , [ 2 , ... , [ n are called independent random variables. Recalling the definition of a σalgebra generated by a random variable (Paragraph 1.4.2) and independence of σalgebras (Chapter II, §4, 4.2.2) from the definition 1 we obtain the following criterion of independence (or equivalent definition independence) of random variables: Theorem 2. In order for the random variables [ 1 , [ 2 , ... , [ n to be independent it is necessary and sufficient for the σalgebra generated by them ࣠కభ ǡ ࣠కమ ǡ ǥ ǡ ࣠క to be σalgebras. The following theorem gives one more equivalent definition of the independence of random variables. Theorem 3. In order for [ 1 , [ 2 , ... , [ n to be independent random variables it is necessary and sufficient that their joint distribution function to be equal to the product of the distribution functions of individual random variables:
F[1 , [ 2 ,..., [ n x1 , x2 ,...,xn F[1 ( x1 ) F[ 2 ( x2 ) ... F[ n ( x n )
(12)
Proof. Necessity is obvious (from (11) follows (12), because the semiinfinite interval is a Borel set). Sufficiency. We introduce the following notation: a
a1, a2 ,..., an , b b1, b2 ,..., bn , ai bi , P[ a, [email protected] P ^Z : a1 [1 Z d b1 ,..., an [ n Z d bn ` , P[ ai , bi @ P ^Z : ai [i Z d bi ` i
If (12) is satisfied, then from formula (4) we obtain that P[ a, b @
n
n
ª¬ F[ bi F[ ai º¼ P[ ai , bi @ . i 1
i
In other words, for all intervals I i
i
i 1
ai , bi @ , i n
i
1,2,...,n ,
P^[1 I 1 , ..., [ n I n ` P^[ i I i `. i 1
167
Now fix I 2 , ..., I n and show that for any B1 E R n
P^[ 1 B1 , [ 2 I 2 , ... , [ n I n ` P^[ 1 B1 ` P^[ i I i `.
(11′)
i 2
Let $\mathcal{M}$ be the collection of all Borel sets for which the last relation holds. Then $\mathcal{M}$ obviously contains the algebra $\mathcal{A}$ of sets consisting of finite sums of disjoint intervals of the form $I_1=(a_1,b_1]$. Therefore $\mathcal{A}\subseteq\mathcal{M}\subseteq\mathscr{B}(R)$. From the countable additivity (and hence the continuity) of probability it also follows that the system $\mathcal{M}$ is a monotone class (Ch. II, §2, item 2.1). Therefore $\mu(\mathcal{A})\subseteq\mathcal{M}\subseteq\mathscr{B}(R)$, where $\mu(\mathcal{A})$ is the smallest monotone class containing the algebra $\mathcal{A}$. But according to Theorem 3 of Ch. II, §2, item 2.1, $\mu(\mathcal{A})=\sigma(\mathcal{A})=\mathscr{B}(R)$, whence $\mathcal{M}=\mathscr{B}(R)$. Fixing $B_1\in\mathscr{B}(R)$ and $I_3,\dots,I_n$, we prove in the same way the validity of (11') with $I_1,I_2$ replaced by arbitrary Borel sets $B_1,B_2$. Continuing this process, we arrive at equality (11). □

From the definition of independence (12) we obtain the following criterion for the independence of (absolutely) continuous random variables.

Corollary 1. For absolutely continuous random variables $\xi_1,\xi_2,\dots,\xi_n$ to be independent, it is necessary and sufficient that
$$f_{\xi_1,\xi_2,\dots,\xi_n}(x_1,x_2,\dots,x_n)=f_{\xi_1}(x_1)f_{\xi_2}(x_2)\dots f_{\xi_n}(x_n),\qquad(13)$$
where $f_{\xi_1,\dots,\xi_n}(x_1,\dots,x_n)$ is the joint distribution density of the random variables $\xi_1,\xi_2,\dots,\xi_n$ and $f_{\xi_i}(x_i)$ is the distribution density of the random variable $\xi_i$, $i=1,2,\dots,n$.

Remark 2. In the general case, the equality of densities in (13) must be understood as equality almost everywhere with respect to the $n$-dimensional Lebesgue measure (for more details see Chapter IV, §1, item 1.4.1).

Using Lemma 1, one can also show that for the components of a multidimensional normal random variable (Example 2) to be independent random variables, it is necessary and sufficient that the matrix $R=(r_{ij})_{i,j=1}^{n}$ be diagonal ($r_{ij}=0$ for $i\ne j$; $r_{ii}>0$, $i=1,2,\dots,n$).
Corollary 2. In order that the discrete random variables
$$\xi_1:\Omega\to X=\{x_1,x_2,\dots\},\quad \xi_2:\Omega\to Y=\{y_1,y_2,\dots\},\quad\dots,\quad \xi_n:\Omega\to Z=\{z_1,z_2,\dots\}$$
be independent, it is necessary and sufficient that for any $x_i\in X$, $y_j\in Y$, …, $z_k\in Z$,
$$P\{\xi_1=x_i,\ \xi_2=y_j,\dots,\xi_n=z_k\}=P\{\xi_1=x_i\}P\{\xi_2=y_j\}\dots P\{\xi_n=z_k\}.\qquad(14)$$
Proof. Without loss of generality we consider the case of only two random variables.
Necessity. Suppose that (12) holds. Obviously, for specific values $x_i,y_j$ of the discrete random variables $\xi_1,\xi_2$ there is a rectangle $T=(a_1,b_1]\times(a_2,b_2]$ containing only these values. Then, according to formula (11),
$$P\{\xi_1=x_i,\ \xi_2=y_j\}=P\{(\xi_1,\xi_2)\in T\}=P\{\xi_1\in(a_1,b_1],\ \xi_2\in(a_2,b_2]\}=P\{\xi_1\in(a_1,b_1]\}P\{\xi_2\in(a_2,b_2]\},$$
in other words,
$$P\{\xi_1=x_i,\ \xi_2=y_j\}=P\{a_1<\xi_1\le b_1,\ a_2<\xi_2\le b_2\}=P\{a_1<\xi_1\le b_1\}P\{a_2<\xi_2\le b_2\}=P\{\xi_1=x_i\}P\{\xi_2=y_j\},$$
i.e. (14) holds.
Sufficiency. Suppose that (14) holds. Then
$$F_{\xi_1,\xi_2}(x,y)=P\{\xi_1\le x,\ \xi_2\le y\}=\sum_{i:x_i\le x}\ \sum_{j:y_j\le y}P\{\xi_1=x_i,\ \xi_2=y_j\}=\sum_{i:x_i\le x}\ \sum_{j:y_j\le y}P\{\xi_1=x_i\}P\{\xi_2=y_j\}$$
$$=\Bigl(\sum_{i:x_i\le x}P\{\xi_1=x_i\}\Bigr)\Bigl(\sum_{j:y_j\le y}P\{\xi_2=y_j\}\Bigr)=P\{\xi_1\le x\}P\{\xi_2\le y\}=F_{\xi_1}(x)F_{\xi_2}(y),$$
that is, (12) holds. □
We now consider functions of independent random variables and prove the following theorem.
Theorem 4. a) If $\xi$ and $\eta$ are independent random variables and $\varphi(x)$, $\psi(x)$ are Borel functions, then the random variables $\xi_1=\varphi(\xi)$ and $\eta_1=\psi(\eta)$ are also independent random variables.
b) If $\xi_1,\xi_2,\dots,\xi_m,\xi_{m+1},\dots,\xi_n$ are independent random variables and $\varphi(x_1,\dots,x_m)$, $\psi(x_{m+1},\dots,x_n)$ are multidimensional Borel functions, then $\xi=\varphi(\xi_1,\xi_2,\dots,\xi_m)$ and $\eta=\psi(\xi_{m+1},\dots,\xi_n)$ are also independent random variables.
Proof. a) For any Borel sets $B_1,B_2\in\mathscr{B}(R)$
$$P\{\varphi(\xi)\in B_1,\ \psi(\eta)\in B_2\}=P\{\xi\in\varphi^{-1}(B_1),\ \eta\in\psi^{-1}(B_2)\}=P\{\xi\in\varphi^{-1}(B_1)\}P\{\eta\in\psi^{-1}(B_2)\}=P\{\varphi(\xi)\in B_1\}P\{\psi(\eta)\in B_2\},$$
because, by the definition of a Borel function, $\varphi^{-1}(B_1)\in\mathscr{B}(R)$, $\psi^{-1}(B_2)\in\mathscr{B}(R)$, and $\xi$ and $\eta$ are independent random variables.
Case b) is proved in a similar way; here we only need to note that if $(\xi_1,\xi_2,\dots,\xi_m)$ is a multidimensional random variable and $g=g(x_1,x_2,\dots,x_m)$ is a multidimensional Borel function, then $\eta=g(\xi_1,\xi_2,\dots,\xi_m)$ is a random variable. □

Remark 3. The results of this theorem will usually be used in the form of the statement "functions of independent random variables are also independent random variables". At the same time, we note that a function of a random variable (or of random variables) is not necessarily a random variable (give an example!); it is the Borel condition on the functions $\varphi,\psi$ in Theorem 4 that, according to Theorem 2 of §1, ensures that $\varphi(\xi)$ and $\psi(\eta)$ are indeed random variables.
It follows from the definition of the independence of random variables that if $\xi_1,\xi_2,\dots,\xi_n$ are independent random variables, then for any indices $1\le i_1<i_2<\dots<i_k\le n$, $k=2,\dots,n$, the random variables $\xi_{i_1},\xi_{i_2},\dots,\xi_{i_k}$ are also independent (for the proof it suffices to set $B_j=R$ for $j\notin\{i_1,i_2,\dots,i_k\}$ in condition (11)).
We now introduce the concept of a sequence of independent random variables.

Definition 2. Let $\xi_1,\xi_2,\dots$ be a sequence of random variables defined on one probability space $(\Omega,\mathcal{F},P)$. If for arbitrary indices $1\le i_1<i_2<\dots<i_k\le n$, $n=2,3,\dots$, the random variables $\xi_{i_1},\xi_{i_2},\dots,\xi_{i_k}$ are independent random variables, then the sequence $\xi_1,\xi_2,\dots$ is called a sequence of independent random variables.
Thus, the independence of a sequence of random variables implies the independence of any subsequence of random variables (in particular, of any finite number of random variables) of this sequence.
For random vectors, the concept of independence is defined by analogy with (11). For independent random vectors the analogue of Theorem 4 is also true: "functions of independent random vectors are also independent random vectors" (see Remark 3). A sequence of independent random vectors is likewise defined by analogy with a sequence of independent random variables.
Let us now return to the examples of item 2.1.1 from the point of view of the dependence or independence of the random variables considered there: in Example 3 the random variables (the components of a two-dimensional random vector) are independent; in Examples 4 and 5 the components of the random vectors are dependent random variables; in Example 6 the components of the random vector are independent normal random variables only when $\rho=0$.

2.3. Functions of random variables

We pose the following question: given the joint law (the joint distribution function or density) of random variables $\xi_1,\xi_2,\dots,\xi_n$ defined on the same probability space and given Borel functions $g_i(x_1,\dots,x_n)$, $i=1,2,\dots,m$, how can we determine the joint law (the joint distribution function or density) of the new random variables constructed from them,
$$\eta_1=g_1(\xi_1,\dots,\xi_n),\quad\eta_2=g_2(\xi_1,\dots,\xi_n),\quad\dots,\quad\eta_m=g_m(\xi_1,\dots,\xi_n)?$$
To understand the essence of the matter, let us consider a few particular (but important) cases. Suppose that $g=g(x_1,\dots,x_n)$ is a given Borel function and that the distribution density $f_{\xi_1,\dots,\xi_n}(x_1,\dots,x_n)=f(x_1,\dots,x_n)$ of the random vector $(\xi_1,\xi_2,\dots,\xi_n)$ is known. Then for the random variable $\eta=g(\xi_1,\dots,\xi_n)$ and any Borel sets $B,B_1,B_2\in\mathscr{B}(R)$ we can write:
$$P\{\eta\in B\}=P\{g(\xi_1,\dots,\xi_n)\in B\}=P\{(\xi_1,\dots,\xi_n)\in g^{-1}(B)\}=\int\!\!\dots\!\!\int_{g^{-1}(B)}f(x_1,x_2,\dots,x_n)\,dx_1dx_2\dots dx_n.\qquad(15)$$
Similarly, for the random variables $\eta_1=g_1(\xi_1,\dots,\xi_k)$, $\eta_2=g_2(\xi_{k+1},\dots,\xi_n)$:
$$P\{\eta_1\in B_1,\ \eta_2\in B_2\}=P\{g_1(\xi_1,\dots,\xi_k)\in B_1,\ g_2(\xi_{k+1},\dots,\xi_n)\in B_2\}$$
$$=P\{(\xi_1,\dots,\xi_k)\in g_1^{-1}(B_1),\ (\xi_{k+1},\dots,\xi_n)\in g_2^{-1}(B_2)\}=\int\!\!\dots\!\!\int_{g_1^{-1}(B_1)\times g_2^{-1}(B_2)}f(x_1,\dots,x_k,x_{k+1},\dots,x_n)\,dx_1\dots dx_n.$$
If $(\xi_1,\dots,\xi_n)$ is a discrete random vector, then
$$P\{\eta\in B\}=P\{g(\xi_1,\dots,\xi_n)\in B\}=\sum_{(i,\dots,k):\,g(x_i,\dots,z_k)\in B}P\{\xi_1=x_i,\dots,\xi_n=z_k\},\qquad(16)$$
and the cases of discrete random vectors $\eta_1=g_1(\xi_1,\dots,\xi_k)$, $\eta_2=g_2(\xi_{k+1},\dots,\xi_n)$ are treated similarly.
If $\xi_1,\dots,\xi_n$ are independent random variables, then in the integral (15)
$$f(x_1,x_2,\dots,x_n)=f_1(x_1)f_2(x_2)\dots f_n(x_n),$$
where $f_i(x_i)$ is the distribution density of the random variable $\xi_i$, and in formula (16)
$$P\{\xi_1=x_i,\dots,\xi_n=z_k\}=P\{\xi_1=x_i\}\dots P\{\xi_n=z_k\}$$
(see formulas (13), (14)). If $(\xi_1,\xi_2,\dots,\xi_k)$ and $(\xi_{k+1},\dots,\xi_n)$ are independent random vectors, then in the integral (15)
$$f_{\xi_1,\dots,\xi_n}(x_1,\dots,x_n)=f_{\xi_1,\dots,\xi_k}(x_1,\dots,x_k)\,f_{\xi_{k+1},\dots,\xi_n}(x_{k+1},\dots,x_n),$$
and so forth.

2.3.1. Distributions of sums, ratios and products of random variables

Distribution of the sum of random variables. Let $\xi_1,\dots,\xi_n$ be absolutely continuous random variables with joint distribution density $f_{\xi_1,\dots,\xi_n}(x_1,\dots,x_n)=f(x_1,\dots,x_n)$. Let us show that their sum $\eta=\xi_1+\xi_2+\dots+\xi_n$ is also an absolutely continuous random variable, and find an explicit form of its distribution density.
By definition and on the basis of formula (15),
$$F_\eta(x)=P\{\eta\le x\}=P\{\xi_1+\dots+\xi_n\le x\}=\int\!\!\dots\!\!\int_{x_1+\dots+x_n\le x}f(x_1,\dots,x_n)\,dx_1\dots dx_n.$$
First we consider the case $n=2$. In this case
$$F_\eta(x)=F_{\xi_1+\xi_2}(x)=\iint_{x_1+x_2\le x}f(x_1,x_2)\,dx_1dx_2=\int_{-\infty}^{\infty}\Bigl(\int_{-\infty}^{x-x_1}f(x_1,x_2)\,dx_2\Bigr)dx_1=\int_{-\infty}^{\infty}\Bigl(\int_{-\infty}^{x-x_2}f(x_1,x_2)\,dx_1\Bigr)dx_2.$$
(The equality of the two-dimensional integral to the repeated integral holds because the integrals are absolutely convergent; in the general case this follows from Fubini's theorem (see Chapter IV, §1, item 1.4.1).) Introducing new variables, we transform the integral to the form
$$F_\eta(x)=\int_{-\infty}^{x}\Bigl[\int_{-\infty}^{\infty}f(x_1,x_2-x_1)\,dx_1\Bigr]dx_2=\int_{-\infty}^{x}\Bigl[\int_{-\infty}^{\infty}f(x_1-x_2,x_2)\,dx_2\Bigr]dx_1.$$
Further, taking the derivative with respect to the upper limit, we conclude that the sum of two absolutely continuous random variables is also an absolutely continuous random variable, and its distribution density is determined (by the joint distribution density) by the formulas
$$f_{\xi_1+\xi_2}(x)=\int_{-\infty}^{\infty}f(x_1,x-x_1)\,dx_1=\int_{-\infty}^{\infty}f(x-x_2,x_2)\,dx_2.\qquad(17')$$
If, in addition, $\xi_1,\xi_2$ are independent random variables with respective distribution densities $f_{\xi_1}(x_1)$, $f_{\xi_2}(x_2)$, then for the distribution density of their sum $\xi_1+\xi_2$ we obtain the formula
$$f_{\xi_1+\xi_2}(x)=\int_{-\infty}^{\infty}f_{\xi_1}(x_1)f_{\xi_2}(x-x_1)\,dx_1=\int_{-\infty}^{\infty}f_{\xi_2}(x_2)f_{\xi_1}(x-x_2)\,dx_2.\qquad(17)$$
The last formula (17) is called the composition (or convolution) formula for the densities $f_{\xi_1}(x)$ and $f_{\xi_2}(x)$. The convolution is usually denoted by $(f_{\xi_1}*f_{\xi_2})(x)$ (the sign $*$ denotes the convolution operation). Thus the composition (convolution) of $f_{\xi_1}(x)$ and $f_{\xi_2}(x)$ is equal to
$$(f_{\xi_1}*f_{\xi_2})(x)=f_{\xi_1+\xi_2}(x)\qquad(18)$$
and is defined by (17).
If we consider the sum of three independent random variables $\xi_1,\xi_2,\xi_3$, then, taking into account the independence of $\xi_1+\xi_2$ and $\xi_3$, we can write
$$f_{\xi_1+\xi_2+\xi_3}(x)=(f_{\xi_1+\xi_2}*f_{\xi_3})(x)=(f_{\xi_1}*f_{\xi_2}*f_{\xi_3})(x);$$
in general, for independent random variables $\xi_1,\xi_2,\dots,\xi_n$,
$$f_{\xi_1+\xi_2+\dots+\xi_n}(x)=(f_{\xi_1}*f_{\xi_2}*\dots*f_{\xi_n})(x).\qquad(18')$$
If all the random variables $\xi_1,\dots,\xi_n$ are identically distributed with common density $f_{\xi_i}(x)=f(x)$, then
$$f_{\xi_1+\dots+\xi_n}(x)=(f*f*\dots*f)(x)=f^{*n}(x).\qquad(18'')$$
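As an illustrative numerical sketch (not part of the derivation; the integration grid and the choice of normal densities are ours), the convolution formula (17) can be checked directly: the convolution of two $N(0,1)$ densities must coincide with the $N(0,2)$ density of the sum.

```python
import math

def norm_pdf(x, var=1.0):
    # density of N(0, var)
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)

def convolve_densities(f, g, x, lo=-10.0, hi=10.0, n=4000):
    # evaluate (f * g)(x) = integral of f(t) g(x - t) dt by the trapezoidal rule
    h = (hi - lo) / n
    s = 0.5 * (f(lo) * g(x - lo) + f(hi) * g(x - hi))
    for i in range(1, n):
        t = lo + i * h
        s += f(t) * g(x - t)
    return s * h

# (17): the density of xi1 + xi2 for independent N(0,1) variables is the N(0,2) density
for x in (-1.5, 0.0, 0.7, 2.0):
    assert abs(convolve_densities(norm_pdf, norm_pdf, x) - norm_pdf(x, var=2.0)) < 1e-6
```

The trapezoidal rule is very accurate here because the Gaussian integrand is negligible at the chosen cut-offs.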
Now let $\xi_1$ and $\xi_2$ be integer-valued, i.e. independent random variables taking only non-negative integer values (the case of general discrete random variables is analogous). Then
$$\{\omega:\xi_1(\omega)+\xi_2(\omega)=n\}=\bigcup_{k=0}^{n}\{\omega:\xi_1(\omega)=k,\ \xi_2(\omega)=n-k\},$$
whence
$$P\{\xi_1+\xi_2=n\}=\sum_{k=0}^{n}P\{\xi_1=k\}P\{\xi_2=n-k\}.$$
If we introduce the notation
$$a_k=P\{\xi_1=k\},\quad b_m=P\{\xi_2=m\},\quad c_n=P\{\xi_1+\xi_2=n\},$$
then we get that
$$c_n=\sum_{k=0}^{n}a_k b_{n-k}=a_0b_n+a_1b_{n-1}+\dots+a_{n-1}b_1+a_nb_0.\qquad(19)$$
The sequence $\{c_n\}$ defined by formula (19) is called the composition, or convolution, of the sequences $\{a_n\}$ and $\{b_n\}$ and is denoted briefly by
$$\{c_n\}=\{a_n\}*\{b_n\}.\qquad(19')$$
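Formula (19) is an ordinary discrete convolution and is easy to verify numerically. The sketch below (the dice and the Poisson weights are our own illustrative choices) also previews the Poisson addition property of Example 10:

```python
import math

def convolve(a, b):
    # formula (19): c_n = sum_k a_k * b_{n-k}
    c = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

# distribution of the sum of two fair dice (faces labelled 0..5 here)
die = [1 / 6] * 6
s = convolve(die, die)
assert abs(sum(s) - 1.0) < 1e-12   # still a probability distribution
assert abs(s[5] - 6 / 36) < 1e-12  # the most probable total

# convolving Poisson(1) and Poisson(2) weights reproduces Poisson(3);
# truncation at 40 terms leaves the low-order coefficients exact
pois = lambda lam: [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(40)]
c = convolve(pois(1.0), pois(2.0))
for n in range(20):
    assert abs(c[n] - math.exp(-3.0) * 3.0 ** n / math.factorial(n)) < 1e-12
```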
Convolutions $\{d_n\}=\{a_n\}*\{b_n\}*\{c_n\}$, etc., are defined similarly.
If $\xi_1$ is an (absolutely) continuous and $\xi_2$ a discrete random variable (independent of $\xi_1$), then
$$\{\xi_1+\xi_2\le x\}=\bigcup_{j}\{\xi_1+\xi_2\le x,\ \xi_2=y_j\}=\bigcup_{j}\{\xi_1\le x-y_j,\ \xi_2=y_j\}.$$
Consequently,
$$F_{\xi_1+\xi_2}(x)=\sum_{j}P\{\xi_1\le x-y_j,\ \xi_2=y_j\}=\sum_{j}P\{\xi_1\le x-y_j\}P\{\xi_2=y_j\}=\sum_{j}P\{\xi_2=y_j\}F_{\xi_1}(x-y_j).$$
Whence, since term-by-term differentiation of the series is legitimate,
$$f_{\xi_1+\xi_2}(x)=F'_{\xi_1+\xi_2}(x)=\sum_{j}P\{\xi_2=y_j\}f_{\xi_1}(x-y_j).$$
Thus, the sum of a continuous and a discrete (independent) random variable is a continuous random variable. For example, if $\xi_1$ is a random variable uniformly distributed on the segment $[0,1]$ and $\xi_2$ is a random variable taking the integer values $0,\pm1,\pm2,\dots$ with probabilities
$$P\{\xi_2=k\}=p_k,\quad k=0,\pm1,\pm2,\dots,$$
then for $k\le x<k+1$ we have $f_{\xi_1+\xi_2}(x)=p_k$; in other words,
$$f_{\xi_1+\xi_2}(x)=\sum_{k=-\infty}^{\infty}p_k\,I_{[k,k+1)}(x).$$
Density of the distribution of the ratio. Let $f_{\xi_1,\xi_2}(x_1,x_2)$ be the joint distribution density of the random variables $\xi_1,\xi_2$, and $f_{\xi_1}(x_1)$, $f_{\xi_2}(x_2)$ their separate (marginal) distribution densities. Then, by definition, the distribution function of the ratio $\xi_1/\xi_2$ (we assume that $P\{\xi_2\ne0\}=1$) is
$$F_{\xi_1/\xi_2}(x)=P\Bigl\{\frac{\xi_1}{\xi_2}\le x\Bigr\}=\iint_{x_1/x_2\le x}f_{\xi_1,\xi_2}(x_1,x_2)\,dx_1dx_2=\int_{0}^{\infty}\!\int_{-\infty}^{xx_2}f_{\xi_1,\xi_2}(x_1,x_2)\,dx_1dx_2+\int_{-\infty}^{0}\!\int_{xx_2}^{\infty}f_{\xi_1,\xi_2}(x_1,x_2)\,dx_1dx_2.$$
Whence we see that the distribution density of the ratio $\xi_1/\xi_2$ is determined by the formula
$$f_{\xi_1/\xi_2}(x)=\int_{0}^{\infty}x_2f_{\xi_1,\xi_2}(xx_2,x_2)\,dx_2-\int_{-\infty}^{0}x_2f_{\xi_1,\xi_2}(xx_2,x_2)\,dx_2=\int_{-\infty}^{\infty}|x_2|\,f_{\xi_1,\xi_2}(xx_2,x_2)\,dx_2.\qquad(20)$$
It is clear that if $\xi_1,\xi_2$ are independent random variables, then
$$f_{\xi_1/\xi_2}(x)=\int_{-\infty}^{\infty}|x_2|\,f_{\xi_1}(xx_2)f_{\xi_2}(x_2)\,dx_2.\qquad(20')$$
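As an illustrative check of (20′) (the grid parameters below are arbitrary choices): for two independent $N(0,1)$ variables the integral in (20′) can be evaluated numerically and compared with the standard Cauchy density $1/(\pi(1+x^2))$, which the ratio of two independent standard normals is known to follow.

```python
import math

def ratio_density(x, n=4000, hi=10.0):
    # (20'): integrate |y| * phi(x*y) * phi(y) over y by the trapezoidal rule
    phi = lambda t: math.exp(-t * t / 2) / math.sqrt(2 * math.pi)
    h = 2 * hi / n
    s = 0.0
    for i in range(n + 1):
        y = -hi + i * h
        w = 0.5 if i in (0, n) else 1.0
        s += w * abs(y) * phi(x * y) * phi(y)
    return s * h

# the ratio of two independent N(0,1) variables is standard Cauchy
for x in (0.0, 0.5, 2.0):
    assert abs(ratio_density(x) - 1 / (math.pi * (1 + x * x))) < 1e-5
```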
Density of the distribution of the product. Proceeding in the same way as in the case of the ratio of two random variables, it is easy to show that the distribution density of the product $\xi=\xi_1\xi_2$ is determined by the formula
$$f_{\xi_1\xi_2}(x)=\int_{-\infty}^{\infty}\frac{1}{|x_1|}f_{\xi_1,\xi_2}\Bigl(x_1,\frac{x}{x_1}\Bigr)dx_1,\qquad(21)$$
and in the case of independence of $\xi_1,\xi_2$ we obtain the formula
$$f_{\xi_1\xi_2}(x)=\int_{-\infty}^{\infty}\frac{1}{|x_1|}f_{\xi_1}(x_1)f_{\xi_2}\Bigl(\frac{x}{x_1}\Bigr)dx_1.\qquad(21')$$
2.3.2. Linear transformation of random variables

Let $\xi=(\xi_1,\xi_2,\dots,\xi_n)$ be an $n$-dimensional random variable and $A$ a non-degenerate ($\det A\ne0$) constant matrix of order $n\times n$. We define a new random vector $\eta=(\eta_1,\eta_2,\dots,\eta_n)$ by
$$\eta=A\xi,\quad\text{i.e.}\quad\eta_i=\sum_{j=1}^{n}a_{ij}\xi_j,\quad i=1,2,\dots,n.$$
Let us show that if $f_{\xi_1,\dots,\xi_n}(x_1,\dots,x_n)=f_\xi(x)$ is the joint distribution density of the random variables $\xi_1,\dots,\xi_n$, then the joint distribution density of the random variables $\eta_1,\eta_2,\dots,\eta_n$ also exists and is expressed through the (known) density $f_\xi(x)$ by the formula
$$f_{\eta_1,\dots,\eta_n}(x_1,\dots,x_n)=f_\eta(x)=\frac{1}{|\det A|}f_\xi(A^{-1}x).\qquad(22)$$
Indeed, by definition (7), for any Borel set $B\in\mathscr{B}(R^n)$
$$P\{\eta\in B\}=\int_B f_\eta(y)\,dy,$$
and on the other hand
$$P\{\eta\in B\}=P\{A\xi\in B\}=P\{\xi\in A^{-1}B\}=\int_{A^{-1}B}f_\xi(x)\,dx=\int_B\frac{1}{|\det A|}f_\xi(A^{-1}y)\,dy,$$
and this means that formula (22) is true. (Above, in the penultimate integral, we made the change of variables $x=A^{-1}y$ and took into account that the modulus of the Jacobian of this transformation is equal to $|\det A^{-1}|=1/|\det A|$.)
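Formula (22) can be checked on a concrete case. In the sketch below (the matrix $A$ is an arbitrary choice of ours), $\xi$ has two independent $N(0,1)$ components, and the density given by (22) is compared with the $N(0,AA^{*})$ density written out directly (cf. Example 8 below).

```python
import math

# fixed non-degenerate matrix A (chosen only for this illustration)
A = [[2.0, 1.0], [0.0, 1.0]]
DET_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
A_INV = [[A[1][1] / DET_A, -A[0][1] / DET_A],
         [-A[1][0] / DET_A, A[0][0] / DET_A]]

def f_xi(x1, x2):
    # joint density of two independent N(0,1) components
    return math.exp(-(x1 * x1 + x2 * x2) / 2) / (2 * math.pi)

def f_eta(y1, y2):
    # formula (22): f_eta(y) = f_xi(A^{-1} y) / |det A|
    x1 = A_INV[0][0] * y1 + A_INV[0][1] * y2
    x2 = A_INV[1][0] * y1 + A_INV[1][1] * y2
    return f_xi(x1, x2) / abs(DET_A)

def f_normal_cov(y1, y2):
    # N(0, A A^T) density written directly; here A A^T = [[5,1],[1,1]], det = 4
    q = (y1 * y1 - 2 * y1 * y2 + 5 * y2 * y2) / 4.0  # y^T (A A^T)^{-1} y
    return math.exp(-q / 2) / (2 * math.pi * math.sqrt(4.0))

for y1, y2 in [(0.0, 0.0), (1.0, -0.5), (2.5, 1.0)]:
    assert abs(f_eta(y1, y2) - f_normal_cov(y1, y2)) < 1e-12
```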
Example 8. We show that if $\xi=(\xi_1,\xi_2,\dots,\xi_n)\sim N(a,R)$ (see Example 2 and formula (9)), then for any non-degenerate $n\times n$ matrix $A$, $\eta=A\xi\sim N(Aa,ARA^{*})$. Note that this means the following: the components $\eta_i=\sum_{j=1}^{n}a_{ij}\xi_j$, $i=1,\dots,n$, of a non-degenerate linear transformation of a normal random vector are again normal random variables.
Solution. We can write (below we use the facts that $|\det A^{*}|=|\det A|$, $\det(ARA^{*})=(\det A)^2\det R>0$):
$$f_\eta(x)=\frac{1}{|\det A|}f_\xi(A^{-1}x)=\frac{1}{|\det A|}\cdot\frac{1}{(2\pi)^{n/2}\sqrt{\det R}}\,e^{-\frac12\bigl(R^{-1}(A^{-1}x-a),\,A^{-1}x-a\bigr)}$$
$$=\frac{1}{(2\pi)^{n/2}\sqrt{\det(ARA^{*})}}\,e^{-\frac12\bigl((ARA^{*})^{-1}(x-Aa),\,x-Aa\bigr)},\qquad(22')$$
and this, by the definition of a multidimensional normal random variable (see formula (9)), means that $\eta=A\xi\sim N(Aa,ARA^{*})$.
Further, if $\xi_1,\xi_2,\dots,\xi_n$ are normal random variables (Example 6), then by what has just been proved $\eta$ is a normal vector, and therefore $\eta_1,\eta_2,\dots,\eta_n$ are normal random variables (Example 6); each of these random variables is a linear combination of normal random variables: $\eta_i=\sum_{j=1}^{n}a_{ij}\xi_j$. The question of the parameters of the normal random variable $\eta_i$ will be considered in Chapter IV.
Let us consider several special cases of application of formula (22′). If $\xi_1,\xi_2$ are independent $N(0,1)$ random variables and $\eta_1=\xi_1+\xi_2$, $\eta_2=\xi_1-\xi_2$, then from (22′) we obtain that $\eta_1\sim N(0,(\sqrt2)^2)$, $\eta_2\sim N(0,(\sqrt2)^2)$, and $\eta_1,\eta_2$ are independent random variables.
If $\xi_i\sim N(a_i,\sigma_i^2)$, $i=1,2,\dots,n$, are independent random variables, then
$$f_\xi(x)=\prod_{i=1}^{n}f_{\xi_i}(x_i)=\frac{1}{(2\pi)^{n/2}\sigma_1\sigma_2\dots\sigma_n}\,e^{-\sum_{i=1}^{n}\frac{(x_i-a_i)^2}{2\sigma_i^2}};$$
in this case $\xi=(\xi_1,\xi_2,\dots,\xi_n)\sim N(a,R)$, where $a=(a_1,a_2,\dots,a_n)$, $R=\operatorname{diag}(\sigma_1^2,\sigma_2^2,\dots,\sigma_n^2)$.
Let $A$ be an orthogonal matrix such that $ARA^{*}=D=\operatorname{diag}(\sigma_1^2,\sigma_2^2,\dots,\sigma_n^2)$ (the existence of such a matrix $A$ is known from the course of algebra). Then the components of the random vector $\eta=A\xi$ are independent random variables and $\eta_i\sim N((Aa)_i,\sigma_i^2)$ (this follows directly from (22′)).
Example 9. We show that if $\xi_j\sim Bi(n_j,p)$, $j=1,2,\dots,m$, and $\xi_1,\xi_2,\dots,\xi_m$ are independent random variables, then $\xi_1+\dots+\xi_m\sim Bi(n_1+\dots+n_m,p)$; pay attention: $n_1,n_2,\dots,n_m$ may be different, but $p$ is the same for all the random variables.
Solution. By formula (19) (below $q=1-p$):
$$P\{\xi_1+\xi_2=n\}=\sum_{k=0}^{n}P\{\xi_1=k\}P\{\xi_2=n-k\}=\sum_{k=0}^{n}C_{n_1}^{k}p^{k}q^{n_1-k}\,C_{n_2}^{n-k}p^{n-k}q^{n_2-n+k}$$
$$=\Bigl(\sum_{k=0}^{n}C_{n_1}^{k}C_{n_2}^{n-k}\Bigr)p^{n}q^{n_1+n_2-n}=C_{n_1+n_2}^{n}p^{n}q^{n_1+n_2-n},$$
which was to be shown. (Above we used the following property of binomial coefficients (Ch. I, §4, item 4.2, formula (7)):
$$C_{n_1+n_2}^{n}=\sum_{k=0}^{n}C_{n_1}^{k}C_{n_2}^{n-k}.)$$
It is obvious that the general case reduces to the case $m=2$.
10. We show that if $\xi_i\sim\Pi(\lambda_i)$, $\lambda_i>0$, $i=1,\dots,n$, are independent random variables, then $\xi_1+\xi_2+\dots+\xi_n\sim\Pi(\lambda_1+\lambda_2+\dots+\lambda_n)$, i.e. a finite sum of independent Poisson random variables is again a Poisson random variable.
Solution. We first consider the case of two random variables ($n=2$). Then, according to the composition formula (19),
$$P\{\xi_1+\xi_2=n\}=\sum_{k=0}^{n}P\{\xi_1=k\}P\{\xi_2=n-k\}=\sum_{k=0}^{n}e^{-\lambda_1}\frac{\lambda_1^{k}}{k!}\,e^{-\lambda_2}\frac{\lambda_2^{n-k}}{(n-k)!}$$
$$=e^{-\lambda_1-\lambda_2}\frac{1}{n!}\sum_{k=0}^{n}\frac{n!}{k!(n-k)!}\lambda_1^{k}\lambda_2^{n-k}=e^{-\lambda_1-\lambda_2}\frac{(\lambda_1+\lambda_2)^{n}}{n!}.$$
If $n=3$ then, by what has just been proved, $\xi_1+\xi_2\sim\Pi(\lambda_1+\lambda_2)$; moreover, $\xi_1+\xi_2$ and $\xi_3$ are independent, therefore (according to the case $n=2$)
$$\xi_1+\xi_2+\xi_3\sim\Pi(\lambda_1+\lambda_2+\lambda_3),\quad\text{etc.}$$
11. Let $\eta_1,\eta_2$ be independent random variables distributed uniformly on a segment $[a,b]$. Let us find the distribution density $f_{\eta_1+\eta_2}(x)$ of their sum.
Solution. First we find the distribution density of the sum of independent random variables $\xi_1,\xi_2$ uniformly distributed on $[0,1]$. Given that in this case
$$f_{\xi_1}(x)=f_{\xi_2}(x)=1,\ x\in[0,1];\qquad f_{\xi_1}(x)=f_{\xi_2}(x)=0,\ x\notin[0,1],$$
and using the composition formula (17), we obtain
$$f_{\xi_1+\xi_2}(x)=\int_0^1 f(x-t)\,dt=\begin{cases}0,&x\notin[0,2],\\ x,&x\in[0,1],\\ 2-x,&x\in[1,2].\end{cases}$$
We now note that the random variables $\eta_1$ and $\eta_2$ can be represented in the form
$$\eta_1=a+(b-a)\xi_1,\qquad\eta_2=a+(b-a)\xi_2,$$
and we find the density of the distribution of their sum with the help of the density $f_{\xi_1+\xi_2}(x)$ found above. We have $\eta_1+\eta_2=2a+(b-a)(\xi_1+\xi_2)$, whence
$$f_{\eta_1+\eta_2}(x)=\frac{1}{b-a}f_{\xi_1+\xi_2}\Bigl(\frac{x-2a}{b-a}\Bigr)=\begin{cases}0,&\text{if }x\le2a\text{ or }x\ge2b;\\[4pt]\dfrac{x-2a}{(b-a)^2},&\text{if }2a\le x\le a+b;\\[4pt]\dfrac{2b-x}{(b-a)^2},&\text{if }a+b\le x\le2b.\end{cases}\qquad(23)$$
The function $f_{\eta_1+\eta_2}(x)$ found here is called Simpson's distribution law (density).
This problem could also easily be solved with the help of the geometric definition of probability. Indeed, if we take the square $T=[a,b]\times[a,b]$ as the space of elementary events, then in the case $2a\le x\le a+b$ the probability that the random point $(\eta_1,\eta_2)$ falls into the triangle $\{(x_1,x_2):x_1+x_2\le x\}$ is
$$F_{\eta_1+\eta_2}(x)=P\{\eta_1+\eta_2\le x\}=\frac{(x-2a)^2}{2(b-a)^2}.$$
If $a+b\le x\le2b$, then
$$F_{\eta_1+\eta_2}(x)=1-\frac{(2b-x)^2}{2(b-a)^2}.$$
If $x\le2a$ and $x\ge2b$, then respectively $F_{\eta_1+\eta_2}(x)=0$ and $F_{\eta_1+\eta_2}(x)=1$. After differentiating $F_{\eta_1+\eta_2}(x)$ with respect to $x$, we again obtain (23).
If now $\xi_1,\xi_2,\xi_3$ are independent random variables uniformly distributed on $[0,1]$, then according to the composition formula ($\xi_1+\xi_2$ and $\xi_3$ are also independent random variables)
$$f_{\xi_1+\xi_2+\xi_3}(x)=(f_{\xi_1+\xi_2}*f_{\xi_3})(x)=\int_0^1 f_{\xi_1+\xi_2}(x-t)\,dt=\begin{cases}0,&x\notin[0,3],\\[2pt]\dfrac{x^2}{2},&x\in[0,1],\\[4pt]\dfrac{x^2-3(x-1)^2}{2},&x\in[1,2],\\[4pt]\dfrac{(3-x)^2}{2},&x\in[2,3].\end{cases}$$
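A quick Monte Carlo sanity check of Simpson's density (23) through its distribution function (the parameters $a=2$, $b=5$, the sample size and the seed below are arbitrary choices of ours):

```python
import random

def simpson_cdf(x, a, b):
    # distribution function of eta1 + eta2 from Example 11
    if x <= 2 * a:
        return 0.0
    if x >= 2 * b:
        return 1.0
    if x <= a + b:
        return (x - 2 * a) ** 2 / (2 * (b - a) ** 2)
    return 1.0 - (2 * b - x) ** 2 / (2 * (b - a) ** 2)

random.seed(1)
a, b, n = 2.0, 5.0, 200_000
samples = [random.uniform(a, b) + random.uniform(a, b) for _ in range(n)]
for x in (4.5, 7.0, 8.5):
    empirical = sum(s <= x for s in samples) / n
    assert abs(empirical - simpson_cdf(x, a, b)) < 0.01
```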
Continuing in a similar way, we see that for four independent random variables $\xi_1,\xi_2,\xi_3,\xi_4$ uniformly distributed on $[0,1]$ the distribution density of the sum $\xi_1+\xi_2+\xi_3+\xi_4$ is a four-part curve of the third order, and so on.
12. Let $\xi_1,\xi_2,\xi_3$ be independent $N(0,1)$ random variables. We show that then
$$\xi=\frac{\xi_1+\xi_2\xi_3}{\sqrt{1+\xi_3^2}}\sim N(0,1).$$
Solution. By formula (15) we have
$$F_\xi(x)=P\Bigl\{\frac{\xi_1+\xi_2\xi_3}{\sqrt{1+\xi_3^2}}\le x\Bigr\}=\iiint\limits_{\frac{x_1+x_2x_3}{\sqrt{1+x_3^2}}\le x}\frac{1}{(2\pi)^{3/2}}\exp\Bigl\{-\frac{x_1^2+x_2^2+x_3^2}{2}\Bigr\}\,dx_1dx_2dx_3.$$
We introduce new variables $y_1,y_2,y_3$ by the relations
$$y_1=\frac{x_1+x_2x_3}{\sqrt{1+x_3^2}},\qquad y_2=x_2,\qquad y_3=x_3.$$
Noting that the modulus of the Jacobian of the inverse transformation is equal to $\sqrt{1+y_3^2}$, we write the integral for $F_\xi(x)$ in terms of the variables $y_1,y_2,y_3$; after that we differentiate the resulting expression with respect to $x$. As a result we get
$$f_\xi(x)=F'_\xi(x)=\int_{-\infty}^{\infty}\sqrt{1+y_3^2}\;e^{-\frac{y_3^2+(1+y_3^2)x^2}{2}}\,g(x,y_3)\,dy_3,$$
where
$$g(x,y_3)=\frac{1}{(2\pi)^{3/2}}\int_{-\infty}^{\infty}\exp\Bigl\{-\frac{y_2^2(1+y_3^2)-2y_3y_2\sqrt{1+y_3^2}\,x}{2}\Bigr\}dy_2=\frac{1}{\sqrt{1+y_3^2}}\cdot\frac{1}{2\pi\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-\frac{z^2-2y_3xz}{2}}dz$$
(here we substituted $z=y_2\sqrt{1+y_3^2}$, $dy_2=dz/\sqrt{1+y_3^2}$). In the last integral, introducing the substitution $u=z-y_3x$, we obtain
$$g(x,y_3)=\frac{1}{\sqrt{1+y_3^2}}\cdot\frac{1}{2\pi\sqrt{2\pi}}\;e^{\frac{y_3^2x^2}{2}}\int_{-\infty}^{\infty}e^{-\frac{u^2}{2}}du=\frac{1}{\sqrt{1+y_3^2}}\cdot\frac{1}{2\pi}\;e^{\frac{y_3^2x^2}{2}}.$$
The found value of $g(x,y_3)$ is substituted into the integral for $f_\xi(x)$:
$$f_\xi(x)=\int_{-\infty}^{\infty}\sqrt{1+y_3^2}\;e^{-\frac{y_3^2+(1+y_3^2)x^2}{2}}\cdot\frac{1}{\sqrt{1+y_3^2}}\cdot\frac{1}{2\pi}\;e^{\frac{y_3^2x^2}{2}}\,dy_3=\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}\cdot\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-\frac{y_3^2}{2}}dy_3=\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}},$$
which means that $\xi\sim N(0,1)$. This example shows that nonlinear combinations of independent normal random variables can also turn out to be normal random variables (cf. Example 8).
Examples 13–16 below have numerous and very important applications in mathematical statistics.
13. Let $\xi_1,\xi_2,\dots,\xi_n$ be independent $N(0,1)$ random variables. Let us show that the distribution densities of the random variables $\chi_n=\sqrt{\xi_1^2+\xi_2^2+\dots+\xi_n^2}$ and $\chi_n/\sqrt n$ are, respectively, the functions
$$f_{\chi_n}(x)=\frac{1}{2^{\frac n2-1}\Gamma\bigl(\frac n2\bigr)}\,x^{n-1}e^{-\frac{x^2}{2}},\ x>0;\qquad f_{\chi_n}(x)=0,\ x<0;\qquad(24)$$
$$f_{\chi_n/\sqrt n}(x)=\frac{(\sqrt n)^{n}}{2^{\frac n2-1}\Gamma\bigl(\frac n2\bigr)}\,x^{n-1}e^{-\frac{nx^2}{2}},\ x>0;\qquad f_{\chi_n/\sqrt n}(x)=0,\ x<0.\qquad(25)$$
In formulas (24)–(25), and subsequently everywhere, $\Gamma(x)$ is the gamma function:
$$\Gamma(x)=\int_0^{\infty}t^{x-1}e^{-t}\,dt\quad(x>0).$$
Solution. If $x\le0$, then it is obvious that $F_{\chi_n}(x)=0$. If $x>0$, then according to formula (15)
$$F_{\chi_n}(x)=P\{\chi_n\le x\}=P\bigl\{\xi_1^2+\xi_2^2+\dots+\xi_n^2\le x^2\bigr\}=\int\!\!\dots\!\!\int_{x_1^2+x_2^2+\dots+x_n^2\le x^2}\frac{1}{(2\pi)^{n/2}}\,e^{-\frac12\sum_{i=1}^{n}x_i^2}\,dx_1dx_2\dots dx_n.$$
To calculate the integral we pass to spherical coordinates:
$$x_1=\rho\cos\theta_1\cos\theta_2\dots\cos\theta_{n-1},\quad x_2=\rho\cos\theta_1\cos\theta_2\dots\sin\theta_{n-1},\quad\dots,\quad x_n=\rho\sin\theta_1.$$
As a result we obtain (below $\rho^{n-1}D(\theta_1,\dots,\theta_{n-1})$ is the modulus of the Jacobian of the transformation)
$$F_{\chi_n}(x)=\int_{-\pi/2}^{\pi/2}\dots\int_{-\pi/2}^{\pi/2}\int_0^{x}\frac{1}{(2\pi)^{n/2}}\,e^{-\frac{\rho^2}{2}}\rho^{n-1}D(\theta_1,\dots,\theta_{n-1})\,d\rho\,d\theta_{n-1}\dots d\theta_1=C_n\int_0^{x}\rho^{n-1}e^{-\frac{\rho^2}{2}}d\rho,$$
where the constant
$$C_n=\frac{1}{(2\pi)^{n/2}}\int_{-\pi/2}^{\pi/2}\dots\int_{-\pi/2}^{\pi/2}D(\theta_1,\dots,\theta_{n-1})\,d\theta_{n-1}\dots d\theta_1$$
depends only on $n$. Letting $x\to\infty$, we get
$$1=F_{\chi_n}(\infty)=C_n\int_0^{\infty}\rho^{n-1}e^{-\frac{\rho^2}{2}}d\rho=C_n\,\Gamma\Bigl(\frac n2\Bigr)2^{\frac n2-1},\qquad C_n=\frac{1}{\Gamma\bigl(\frac n2\bigr)2^{\frac n2-1}}.$$
Hence the distribution function of the random variable $\chi_n$ is
$$F_{\chi_n}(x)=\frac{1}{\Gamma\bigl(\frac n2\bigr)2^{\frac n2-1}}\int_0^{x}\rho^{n-1}e^{-\frac{\rho^2}{2}}d\rho,\quad x>0,$$
and the distribution density of $\chi_n$ is $f_{\chi_n}(x)=F'_{\chi_n}(x)$, i.e. (24) is true.
Further, since
$$F_{\chi_n/\sqrt n}(x)=P\{\chi_n\le\sqrt n\,x\}=F_{\chi_n}(\sqrt n\,x),$$
we have
$$f_{\chi_n/\sqrt n}(x)=\sqrt n\,f_{\chi_n}(\sqrt n\,x),$$
i.e. (25) is true.
14. $\chi_n^2$ (chi-square) distribution with $n$ degrees of freedom. Let $\xi_1,\xi_2,\dots,\xi_n$ be independent $N(0,1)$ random variables. Then the distribution of the random variable $\chi_n^2=\xi_1^2+\xi_2^2+\dots+\xi_n^2$ is called the chi-square ($\chi_n^2$) distribution with $n$ degrees of freedom.
We show that the density of the distribution of the random variable $\chi_n^2$ is given by
$$f_{\chi_n^2}(x)=\frac{x^{\frac n2-1}e^{-\frac x2}}{2^{\frac n2}\,\Gamma\bigl(\frac n2\bigr)},\ x>0;\qquad f_{\chi_n^2}(x)=0,\ x<0,\qquad(26)$$
where $\Gamma(x)$ is the gamma function.
Solution. It is obvious that for $x\le0$, $F_{\chi_n^2}(x)=0$ and $f_{\chi_n^2}(x)=0$. For $x>0$
$$F_{\chi_n^2}(x)=P\{\chi_n^2\le x\}=P\{0\le\chi_n\le\sqrt x\}=\int_0^{\sqrt x}f_{\chi_n}(u)\,du,$$
whence
$$f_{\chi_n^2}(x)=F'_{\chi_n^2}(x)=\frac{1}{2\sqrt x}\,f_{\chi_n}(\sqrt x).$$
Further, using formula (24), we obtain formula (26).
15. $t$-distribution (Student's distribution with $n$ degrees of freedom). Let $\xi_0,\xi_1,\xi_2,\dots,\xi_n$ be independent normal (Gaussian) random variables with parameters $(0,\sigma^2)$. Then the distribution of the random variable
$$t_n=\frac{\xi_0\sqrt n}{\sqrt{\xi_1^2+\xi_2^2+\dots+\xi_n^2}}$$
is called the $t$-distribution, or Student's distribution, with $n$ degrees of freedom. We show that the function
$$f_{t_n}(x)=\frac{\Gamma\bigl(\frac{n+1}{2}\bigr)}{\sqrt{\pi n}\;\Gamma\bigl(\frac n2\bigr)}\cdot\frac{1}{\bigl(1+\frac{x^2}{n}\bigr)^{\frac{n+1}{2}}},\quad-\infty<x<\infty,\qquad(27)$$
is the density of Student's distribution ($t$-distribution) with $n$ degrees of freedom.
Solution. Since both the numerator and the denominator of $t_n$ can be divided by $\sigma$, the ratio does not depend on $\sigma$, and we may take $\sigma=1$. Then $t_n=\xi_0\sqrt n/\chi_n$, where $\xi_0\sqrt n\sim N(0,n)$ and
$$f_{\chi_n}(x)=\frac{x^{n-1}e^{-\frac{x^2}{2}}}{2^{\frac n2-1}\Gamma\bigl(\frac n2\bigr)},\quad x>0;$$
moreover, $\xi_0\sqrt n$ and $\chi_n$ are independent. Therefore, by formula (20′),
$$f_{t_n}(x)=f_{\xi_0\sqrt n/\chi_n}(x)=\int_0^{\infty}y\,f_{\xi_0\sqrt n}(xy)\,f_{\chi_n}(y)\,dy=\frac{1}{\sqrt{2\pi n}\;2^{\frac n2-1}\Gamma\bigl(\frac n2\bigr)}\int_0^{\infty}y^{n}e^{-\frac{y^2}{2}\bigl(1+\frac{x^2}{n}\bigr)}dy.$$
Further, in the integral we introduce a new variable by the relation $z=y\sqrt{1+\frac{x^2}{n}}$; after substituting the known value of the integral $\int_0^{\infty}z^{n}e^{-\frac{z^2}{2}}dz=2^{\frac{n-1}{2}}\Gamma\bigl(\frac{n+1}{2}\bigr)$, we obtain the desired formula (27).
16. The Snedecor distribution with $(k,m)$ degrees of freedom is the distribution of the random variable
$$F_{k,m}=\frac{\chi_k^2/k}{\chi_m^2/m}=\frac{m\,\chi_k^2}{k\,\chi_m^2},$$
where $\chi_k^2$ and $\chi_m^2$ are independent chi-square distributed random variables with $k$ and $m$ degrees of freedom respectively. Let us find the distribution density of the random variable $F_{k,m}$.
Remark 5. Sometimes the Snedecor distribution with $(k,m)$ degrees of freedom is called the $F$-distribution with $(k,m)$ degrees of freedom (or simply the $F$-distribution), or the Fisher distribution.
Solution. We denote $\xi_1=\chi_k^2/k$, $\xi_2=\chi_m^2/m$; by (26), $f_{\xi_1}(u)=k\,f_{\chi_k^2}(ku)$ and $f_{\xi_2}(u)=m\,f_{\chi_m^2}(mu)$. If we take into account the independence of $\xi_1,\xi_2$ and apply the formula for the distribution density of the ratio of two independent random variables (formula (20′)), then for $x>0$
$$f_{F_{k,m}}(x)=f_{\xi_1/\xi_2}(x)=\int_0^{\infty}y\,f_{\xi_1}(xy)f_{\xi_2}(y)\,dy=\frac{k^{\frac k2}m^{\frac m2}x^{\frac k2-1}}{2^{\frac{k+m}{2}}\,\Gamma\bigl(\frac k2\bigr)\Gamma\bigl(\frac m2\bigr)}\int_0^{\infty}y^{\frac{k+m}{2}-1}e^{-\frac{y(kx+m)}{2}}dy$$
$$=\frac{\Gamma\bigl(\frac{k+m}{2}\bigr)}{\Gamma\bigl(\frac k2\bigr)\Gamma\bigl(\frac m2\bigr)}\,k^{\frac k2}m^{\frac m2}\,\frac{x^{\frac k2-1}}{(kx+m)^{\frac{k+m}{2}}}=\frac{\Gamma\bigl(\frac{k+m}{2}\bigr)}{\Gamma\bigl(\frac k2\bigr)\Gamma\bigl(\frac m2\bigr)}\Bigl(\frac km\Bigr)^{\frac k2}\frac{x^{\frac k2-1}}{\bigl(1+\frac{kx}{m}\bigr)^{\frac{k+m}{2}}}\qquad(28)$$
(here, to evaluate the integral, we made the change of variable $z=y(kx+m)$, which reduces it to the gamma function:
$\int_0^{\infty}y^{\frac{k+m}{2}-1}e^{-\frac{y(kx+m)}{2}}dy=\Gamma\bigl(\frac{k+m}{2}\bigr)2^{\frac{k+m}{2}}(kx+m)^{-\frac{k+m}{2}}$). Of course,
$$f_{F_{k,m}}(x)=0\quad(x<0).$$
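As a numerical sanity check (illustrative only; the cut-offs and step counts are arbitrary choices of ours), the densities (26) and (27) integrate to one, and (27) with $n=1$ reduces to the standard Cauchy density:

```python
import math

def chi2_pdf(x, n):
    # formula (26), valid for x > 0
    return x ** (n / 2 - 1) * math.exp(-x / 2) / (2 ** (n / 2) * math.gamma(n / 2))

def t_pdf(x, n):
    # formula (27)
    c = math.gamma((n + 1) / 2) / (math.sqrt(math.pi * n) * math.gamma(n / 2))
    return c * (1 + x * x / n) ** (-(n + 1) / 2)

def trapezoid(f, lo, hi, steps=20_000):
    h = (hi - lo) / steps
    s = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        s += f(lo + i * h)
    return s * h

for n in (2, 5, 10):
    assert abs(trapezoid(lambda x: chi2_pdf(x, n), 1e-9, 80.0) - 1.0) < 1e-3
    assert abs(trapezoid(lambda x: t_pdf(x, n), -80.0, 80.0) - 1.0) < 1e-3

# Student's t with one degree of freedom is the standard Cauchy distribution
assert abs(t_pdf(0.0, 1) - 1 / math.pi) < 1e-12
```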
2.4. Conditional distributions

Let $(\Omega,\mathcal{F},P)$ be a probability space on which a random variable $\xi$ is defined. Then for the event $A_x=\{\omega:\xi(\omega)\le x\}$ and any event $B\in\mathcal{F}$ of non-zero probability the conditional probability $P(A_x\mid B)$ is defined. It is natural to call this conditional probability the conditional distribution function of the random variable $\xi$ with respect to the event $B$. We denote it by $F_\xi(x\mid B)$:
$$F_\xi(x\mid B)=P\{\xi\le x\mid B\}=\frac{P(\{\xi\le x\}\cap B)}{P(B)}.\qquad(29)$$
It is easy to see that this function satisfies all the properties F1, F2, F3 of (unconditional) distribution functions from Ch. II, §3, item 3.1. If for any event $A\in\mathcal{F}$ we introduce the probability $P_B$ by the relation $P_B(A)=P(AB)/P(B)$, then it generates a new probability space $(\Omega,\mathcal{F},P_B)$ (see Chapter II, §4, item 4.1), and the function $F_\xi(x\mid B)$ is an unconditional (ordinary) distribution function on this probability space.
If the $\sigma$-algebra $\mathcal{F}_\xi$ generated by the random variable $\xi$ and the event $B$ are independent, then the events $\{\xi\le x\}$ and $B$ are independent, therefore $F_\xi(x\mid B)=F_\xi(x)$.
As an example, consider the following problem. Let the operating time of a certain mechanism be a continuous random variable $\eta$ with distribution function $F(x)$. If it is known that the mechanism has worked properly for a time $a$, how is the distribution function of its remaining time of serviceable work determined?
It is clear that we need to find the conditional probability $P\{\eta-a\ge x\mid\eta\ge a\}$ (we will assume that $P(a)=P\{\eta\ge a\}>0$). We can write:
$$P\{\eta-a\ge x\mid\eta\ge a\}=\frac{P\{\eta-a\ge x,\ \eta\ge a\}}{P\{\eta\ge a\}}=\frac{P(x+a)}{P(a)}=\frac{1-F(x+a)}{1-F(a)}.\qquad(29')$$
Let us note the following: in a number of applied problems, especially when considering systems assembled from a large number of reliable parts of complex mechanisms, it is reasonable to model the time of effective work of the mechanism $\eta$ as an exponential random variable (this assumption is based on the Poisson theorem and the concept of Poisson processes). But for an exponential random variable $F(x)=1-e^{-\lambda x}$ ($x\ge0$; $\lambda>0$ is a parameter), therefore from the above relation for the conditional probability we get that $P\{\eta-a\ge x\mid\eta\ge a\}=e^{-\lambda x}=P\{\eta\ge x\}$, i.e. the distribution of the remaining time of correct operation of the mechanism coincides with the distribution of the working time of a new mechanism. In other words, a new mechanism and a mechanism that has already worked without failure for a time $a$ are equivalent in terms of their future time of good work.
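The memoryless property derived above is easy to observe by simulation (the rate, the elapsed time $a$, the sample size and the seed below are arbitrary choices of ours):

```python
import math, random

random.seed(3)
lam, a, n = 0.5, 2.0, 200_000
samples = [random.expovariate(lam) for _ in range(n)]

# condition on the mechanism having survived past time a
survived = [s for s in samples if s >= a]
for x in (0.5, 1.0, 3.0):
    # empirical conditional probability P{eta - a >= x | eta >= a}
    cond = sum(s - a >= x for s in survived) / len(survived)
    fresh = math.exp(-lam * x)  # P{eta >= x} for a brand-new mechanism
    assert abs(cond - fresh) < 0.01
```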
The exponential distribution, together with its discrete analogue, the geometric distribution $P\{\xi=k\}=q(1-q)^k$, $k=0,1,\dots$, is the only absolutely continuous distribution possessing the above-noted remarkable property. The latter is a consequence of the uniqueness of the solution of the functional equation obtained from (29′):
$$P(x+a)=P(x)P(a),\qquad P(x)=P\{\eta\ge x\}.$$
We now discuss in more detail the cases of discrete and absolutely continuous random variables. If $\xi,\eta$ are discrete random variables and $(p_{ij})$ is their joint distribution law, then for $i,j=1,2,\dots$ (provided the denominators of the fractions below are different from zero):
$$p_i=\sum_j p_{ij}=P\{\xi=x_i\},\qquad p_{\cdot j}=\sum_i p_{ij}=P\{\eta=y_j\},$$
$$p_{i\mid j}=P\{\xi=x_i\mid\eta=y_j\}=\frac{p_{ij}}{p_{\cdot j}},\qquad p_i=\sum_j p_{\cdot j}\,p_{i\mid j},\ \text{etc.},$$
and the conditional distribution function is
$$F_\xi(x\mid\eta=y_j)=\sum_{i:x_i\le x}P\{\xi=x_i\mid\eta=y_j\}=\frac{1}{p_{\cdot j}}\sum_{i:x_i\le x}p_{ij}.$$
It is clear that analogous relations for conditional distributions can also be written in the cases of three, four, and, in general, any finite number of discrete random variables.
It is clear that analogous relations for conditional distributions can also be written out in the cases of three, four, ..., and, in general of any, finite number of discrete random variables. For absolutely continuous random variables, we need to introduce the notion of conditional distribution density, taking into account the remarks made above about integrals. If [ , K are absolutely continuous random variables, then for any y R probability P ^K y` 0 , therefore we cannot determine the conditional distribution function F[ K x y P ^[ d x K y` directly by formula (29). Therefore, at this point, we first define the conditional distribution function with respect to the event B ^ y K d y 'y` according to formula (29), then we pass to the limit for 'y o 0 and it is precisely the resulting limit function that we understand as the conditional distribution function F[ K x y . So, x y 'y
F[ x B
P ^[ d x y K d y 'y`
³ ³
f
³ y
187
f[ ,K u, dud
y y 'y
fK  d
,
x
F[ K x y
³
lim F[ x B
f
f[ ,K u, y du fK y
'y o0
,
where f[ ,K x, y is the joint density of the distribution of random variables [ , K and fK y is the density of the distribution of the random variable K . The derivative with
respect to x of the conditional distribution function F[ K x y will be called the conditional density of the distribution of the random variable [ when the condition K y is satisfied and denote it by f[ K x y . So f[ K x y
From this
f[ ,K x, y fK y
f[ ,K x, y
fK y z 0 .
,
fK y f[ K x y .
(30)
(31)
The last formula resembles the formula for the multiplication of probabilities; therefore formula (31) will be referred to as the density multiplication formula. It is not difficult to guess that the following important formulas also hold:
$$F_{\xi\mid\eta}(x\mid y)=\int_{-\infty}^{x}f_{\xi\mid\eta}(u\mid y)\,du,\qquad f_\xi(x)=\int_{-\infty}^{\infty}f_{\xi,\eta}(x,y)\,dy=\int_{-\infty}^{\infty}f_\eta(y)\,f_{\xi\mid\eta}(x\mid y)\,dy.\qquad(32)$$
The second of formulas (32) is an analogue of the total probability formula for continuous random variables.
Example 17. Consider a two-dimensional normal vector $(\xi,\eta)$. Recalling that the distribution density of such a random vector is determined by formula (10″) and writing down the formula for the marginal distribution density $f_\eta(y)$, for the conditional distribution density of $\xi$ given $\eta=y$ we obtain the formula
$$f_{\xi\mid\eta}(x\mid y)=\frac{1}{\sqrt{2\pi}\;\sigma_1\sqrt{1-\rho^2}}\exp\Bigl\{-\frac{(x-m(y))^2}{2\sigma_1^2(1-\rho^2)}\Bigr\},$$
where $m(y)=a_1+\rho\dfrac{\sigma_1}{\sigma_2}(y-a_2)$. Note that this function is the distribution density of a normal $N\bigl(m(y),\ \sigma_1^2(1-\rho^2)\bigr)$ random variable.
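The conditional density of Example 17 can be verified numerically: dividing the joint two-dimensional normal density by the marginal density of $\eta$ must reproduce the stated $N(m(y),\sigma_1^2(1-\rho^2))$ density (the parameter values below are arbitrary choices of ours):

```python
import math

a1, a2, s1, s2, rho = 1.0, -2.0, 1.5, 0.8, 0.6  # sample parameters (assumption)

def joint(x, y):
    # two-dimensional normal density (formula (10''))
    u, v = (x - a1) / s1, (y - a2) / s2
    q = (u * u - 2 * rho * u * v + v * v) / (1 - rho ** 2)
    return math.exp(-q / 2) / (2 * math.pi * s1 * s2 * math.sqrt(1 - rho ** 2))

def marginal(y):
    # marginal density of eta: N(a2, s2^2)
    return math.exp(-((y - a2) / s2) ** 2 / 2) / (math.sqrt(2 * math.pi) * s2)

def conditional(x, y):
    # Example 17: N(m(y), s1^2 (1 - rho^2)) with m(y) = a1 + rho*(s1/s2)*(y - a2)
    m = a1 + rho * (s1 / s2) * (y - a2)
    var = s1 ** 2 * (1 - rho ** 2)
    return math.exp(-(x - m) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

for x in (-1.0, 0.5, 2.0):
    for y in (-3.0, -1.5):
        assert abs(joint(x, y) / marginal(y) - conditional(x, y)) < 1e-12
```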
Example 18. Let $\xi$ be a positive random variable ($\xi>0$) whose distribution density $f_\xi(x)=f(x)$ is a continuous function. Suppose that the random variable $\xi$ is observed first, and then a random variable $\eta$ is defined as a random variable distributed uniformly on the interval $(0,\xi)$. In addition, we assume that the random variables $\eta$ and $\xi-\eta$ are independent. We will show that then the distribution density of the random variable $\xi$ is given by
$$f_\xi(x)=f(x)=\begin{cases}a^2xe^{-ax},&x>0,\\ 0,&x\le0,\end{cases}$$
where $a$ is a positive constant.
Solution. From the condition of the problem we obtain that the conditional distribution density of $\eta$ given $\xi=x$ is the function
$$f_{\eta\mid\xi}(y\mid x)=\frac1x\ \ (0<y<x);\qquad f_{\eta\mid\xi}(y\mid x)=0\ \text{otherwise}.$$
If we denote $f_\eta(y)=g(y)$, then by (32)
$$g(y)=\int_{-\infty}^{\infty}f(x)\,f_{\eta\mid\xi}(y\mid x)\,dx=\int_{y}^{\infty}\frac{f(x)}{x}\,dx,$$
consequently
$$g'(y)=-\frac{f(y)}{y},\qquad f(y)=-y\,g'(y).$$
Next we write down the condition of independence of $\eta$ and $\xi-\eta$ (the map $(y,x)\mapsto(y,x-y)$ has unit Jacobian):
$$f_{\eta,\xi}(y,x)=f_{\eta,\xi-\eta}(y,x-y)=f_\eta(y)\,f_{\xi-\eta}(x-y)=g(y)\,f_{\xi-\eta}(x-y).$$
On the other hand,
$$f_{\eta,\xi}(y,x)=f_\xi(x)\,f_{\eta\mid\xi}(y\mid x)=\frac1x f(x)\quad(0<y<x),$$
which means that
$$g(y)\,f_{\xi-\eta}(x-y)=\frac1x f(x).$$
Letting $y\to x$, we obtain the relation
$$f(x)=x\,g(x)\,f_{\xi-\eta}(0)=a\,x\,g(x),$$
where $a=f_{\xi-\eta}(0)>0$. Substituting this value into the equation $f(y)=-y\,g'(y)$ found earlier, we have
$$a\,x\,g(x)=-x\,g'(x),\qquad g(x)=ce^{-ax}.$$
From the condition $\int_0^{\infty}g(x)\,dx=1$ it follows that $c=a$. So
$$f(x)=a\,x\,g(x)=a^2xe^{-ax}\quad(x>0).\ \square$$
2.5. Tasks for independent work
1. Let $\xi, \eta \sim N(0, \sigma^2)$ be independent random variables. Show that then $\xi_1 = \sqrt{\xi^2 + \eta^2}$ and $\xi_2 = \dfrac{\xi}{\eta}$ are also independent random variables.

2. $\xi \sim \chi_n^2$, $\eta \sim \chi_m^2$, $\xi$ and $\eta$ are independent. Show that then $\xi_1 = \xi + \eta$ and $\xi_2 = \dfrac{\xi}{\eta}$ are also independent random variables.

3. $\xi_1$ and $\xi_2$ are independent random variables uniformly distributed on $(0, 1)$. Show that the random variables $\eta_1 = \rho\cos\varphi$, $\eta_2 = \rho\sin\varphi$, where $\rho = \sqrt{-2\ln\xi_1}$, $\varphi = 2\pi\xi_2$, are independent normal random variables with parameters $(0, 1)$.

4. $\xi \sim Bi(\nu; p)$. In its turn, $\nu \sim Bi(m; q)$. Show that then $\xi \sim Bi(m; pq)$. Direction. First find the joint distribution law of the random variables $\xi$ and $\nu$, then find the marginal distribution law of $\xi$.

5. First a Poisson random variable $\xi$ with parameter $\lambda$ is observed. After that, $\xi$ independent Bernoulli trials with success probability $p$ are performed. Find the distribution law of the random variable $\eta$, the number of successes in these trials.

6. Suppose that for each fixed $\lambda > 0$ the random variable $\xi \sim \Pi(\lambda)$, and $\lambda$ in turn has a gamma distribution with parameter $\alpha$, i.e. its distribution density is the function

$$f_\lambda(x) = \begin{cases} \dfrac{1}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-x}, & x \ge 0, \\ 0, & x < 0, \end{cases}$$

where $\alpha$ is a fixed positive number and $\Gamma(\alpha)$ is the gamma function. Show that the distribution law of $\xi$ is defined by the formulas

$$P\{\xi = k\} = \frac{\Gamma(\alpha + k)}{\Gamma(\alpha)\,\Gamma(k+1)} \left(\frac{1}{2}\right)^{\alpha + k}, \qquad k = 0, 1, 2, \dots$$
Direction. First show that for any event $A$ and any continuous random variable $\eta$ the formula

$$P(A) = \int_{-\infty}^{\infty} P(A \mid \eta = x)\, f_\eta(x)\,dx$$

holds, then apply this formula to the case $A = \{\xi = k\}$, $\eta = \lambda$.

7. $\xi \sim \Pi(\lambda)$; in turn, $\lambda$ is an exponential random variable with parameter $c$. Find the distribution law of the random variable $\xi$.

8. Continuation. In the previous task the parameter $\lambda$, in turn, obeys the gamma distribution of order $\alpha$ with scale parameter $c$ ($c > 0$, $\alpha > -1$), i.e. the distribution density of $\lambda$ is given by

$$f_\lambda(x) = \frac{c^{\alpha+1}}{\Gamma(\alpha+1)}\, x^{\alpha} e^{-cx} \quad (x \ge 0); \qquad f_\lambda(x) = 0 \quad (x < 0).$$

Show that the distribution law of the random variable $\xi$ is determined by the relations

$$P\{\xi = k\} = \frac{\Gamma(k + \alpha + 1)}{k!\,\Gamma(\alpha+1)} \left(\frac{1}{1+c}\right)^{k+\alpha+1} c^{\alpha+1}, \qquad k = 0, 1, 2, \dots$$
9. $\xi \sim Bi(n, p)$. In turn, $p$ is a beta-distributed random variable with parameters $r$ and $s$, i.e. the distribution density of $p$ is given by

$$f_p(x) = \frac{\Gamma(r+s)}{\Gamma(r)\,\Gamma(s)}\, x^{r-1}(1-x)^{s-1}, \quad 0 < x < 1; \qquad f_p(x) = 0, \quad x \notin (0, 1).$$

a) Show that the distribution law of the random variable $\xi$ is determined by the probabilities

$$P\{\xi = k\} = C_n^k\, \frac{\Gamma(r+s)\,\Gamma(k+r)\,\Gamma(n-k+s)}{\Gamma(r)\,\Gamma(s)\,\Gamma(n+r+s)}, \qquad k = 0, 1, 2, \dots, n.$$

b) Show that for $r = s = 1$ this distribution is the uniform distribution on the set $\{0, 1, 2, \dots, n\}$: the distribution of $\xi$ is determined by the relations

$$P\{\xi = k\} = \frac{1}{n+1}.$$
10. $\xi_1 \sim Bi(n, p)$. Given the value of $\xi_1$, the random variable $\xi_2$ is a binomial random variable with parameters $(\xi_1, p)$; given the value of $\xi_2$, the random variable $\xi_3$ is a binomial random variable with parameters $(\xi_2, p)$, etc. Show that then the random variable $\xi_k$ is a binomial random variable with parameters $(n, p^k)$, i.e. show that $\xi_k \sim Bi(n, p^k)$.

11. $\xi_1, \xi_2$ are independent random variables: $\xi_1 \sim N(0, \sigma_1^2)$, $\xi_2 \sim N(0, \sigma_2^2)$. Show that then

$$\xi = \frac{\xi_1 \xi_2}{\sqrt{\xi_1^2 + \xi_2^2}} \sim N(0, \sigma^2),$$

where the parameter $\sigma^2$ is related to the given parameters $\sigma_1^2, \sigma_2^2$ by the relation

$$\frac{1}{\sigma^2} = \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}.$$

Direction. Use the relation $\dfrac{1}{\xi^2} = \dfrac{1}{\xi_1^2} + \dfrac{1}{\xi_2^2}$.
12. $\xi, \eta$ are independent random variables: $\xi \sim N\!\left(0, \dfrac{1}{n}\right)$, i.e. $\xi$ is a random variable with distribution density

$$f_\xi(x) = \sqrt{\frac{n}{2\pi}}\, e^{-nx^2/2},$$

and $\eta = \dfrac{\chi_n}{\sqrt{n}}$, with distribution density

$$f_\eta(x) = \frac{2\,n^{n/2}}{2^{n/2}\,\Gamma\!\left(\frac{n}{2}\right)}\, x^{\,n-1} e^{-nx^2/2} \quad (x \ge 0).$$

Find the distribution density of the random variable $\zeta = \dfrac{\xi}{\eta}$.

13. Let $\xi_1, \xi_2, \dots, \xi_n$ be independent identically distributed exponential random variables with parameter $\lambda$: $f_{\xi_i}(t) = \lambda e^{-\lambda t}$, $t \ge 0$ (distribution density).

a) Show the validity of the formula

$$P\{\xi_1 + \xi_2 + \dots + \xi_n > t\} = \sum_{j=0}^{n-1} e^{-\lambda t}\, \frac{(\lambda t)^j}{j!}.$$

After that, using this formula, find the distribution density of the sum $S_n = \xi_1 + \dots + \xi_n$.

b) Find the distribution density of $S_n = \xi_1 + \dots + \xi_n$ using the composition formula (formulas (17)).

14. Continuation. Find the distribution density of the random variable

$$\eta = \frac{\xi_1}{\xi_1 + \xi_2 + \dots + \xi_n}.$$
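The identity in problem 13 a) (the Erlang–Poisson duality) can be verified by simulation. The sketch below is an illustration only; $\lambda$, $n$, $t$, and the number of trials are arbitrary choices.

```python
import math, random

# Problem 13 a): P{xi_1 + ... + xi_n > t} = sum_{j=0}^{n-1} e^{-lt}(lt)^j/j!
random.seed(3)
lam, n, t, trials = 1.5, 4, 2.0, 200_000

formula = sum(math.exp(-lam * t) * (lam * t) ** j / math.factorial(j)
              for j in range(n))
hits = sum(sum(random.expovariate(lam) for _ in range(n)) > t
           for _ in range(trials))
mc = hits / trials
print(formula, mc)
```

The closed-form value and the Monte Carlo frequency agree to a few decimal places.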
15. $\xi_1, \xi_2, \dots$ is a sequence of independent identically distributed exponential random variables with parameter $\lambda$; $S_0 = 0$, $S_1 = \xi_1$, …, $S_n = \xi_1 + \xi_2 + \dots + \xi_n$, …. We define a random variable $\nu_t$ as the number of indices $k \ge 1$ satisfying the condition $S_k \le t$:

$$\nu_t = \sum_{k=1}^{\infty} I\{S_k \le t\},$$

where $I_A$ is the indicator of the event $A$. Show that $\nu_t$ is a Poisson random variable with parameter $\lambda t$: $\nu_t \sim \Pi(\lambda t)$.

16. Record values. Let $\xi_0, \xi_1, \xi_2, \dots, \xi_n$ be independent identically distributed random variables with parameter $\alpha$. We define the random variable $\nu$ by the condition

$$\nu = \min\{n : \xi_1 < \xi_0,\ \xi_2 < \xi_0,\ \dots,\ \xi_{n-1} < \xi_0,\ \xi_n > \xi_0\}.$$

Find the distribution function and the distribution density of the first record random variable $\xi_\nu$.

Direction. Notice that

$$\{\xi_\nu \le x\} = \sum_{n=1}^{\infty} \{\xi_\nu \le x,\ \nu = n\} = \sum_{n=1}^{\infty} \{\xi_n \le x,\ \xi_1 < \xi_0,\ \xi_2 < \xi_0,\ \dots,\ \xi_{n-1} < \xi_0,\ \xi_n > \xi_0\}$$

and find the probability of this event.
17. Let $\xi_1, \xi_2, \dots, \xi_n, \dots$ be a sequence of mutually independent random variables uniformly distributed on the segment $[0, a]$; $S_0 = 0$, $S_1 = \xi_1$, $S_2 = \xi_1 + \xi_2$, …, $S_n = \xi_1 + \xi_2 + \dots + \xi_n$, …;

$$F_{S_n}(x) = F_n(x) = P\{S_n \le x\}, \qquad f_n(x) = F_n'(x).$$

Then, using the composition formula, show that for $n = 1, 2, \dots$ the formula

$$f_{n+1}(x) = \frac{1}{a}\int_0^a f_n(x - y)\,dy = \frac{1}{a}\left[F_n(x) - F_n(x - a)\right]$$

holds. Further, applying induction, obtain the following formulas:

$$F_n(x) = \frac{1}{a^n n!}\sum_{k=0}^{n} (-1)^k C_n^k \left((x - ka)^+\right)^n; \qquad f_{n+1}(x) = \frac{1}{a^{n+1} n!}\sum_{k=0}^{n+1} (-1)^k C_{n+1}^k \left((x - ka)^+\right)^n,$$

where $x^+$ is the positive part of the real number $x$ ($x^+ = x$ if $x > 0$, $x^+ = 0$ if $x \le 0$) and $\left(x^+\right)^n$ denotes its $n$-th power.

18. Continuation. Show that if the random variables $\xi_1, \xi_2, \dots, \xi_n$ are independent and uniformly distributed on the segment $[-b, b]$, then the distribution density of the random variable $S_n = \xi_1 + \xi_2 + \dots + \xi_n$ is determined by the formula

$$f_n(x) = \frac{1}{(2b)^n (n-1)!}\sum_{k=0}^{n} (-1)^k C_n^k \left((x + nb - 2kb)^+\right)^{n-1}.$$
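The closed form in problem 17 can be checked against simulation. Below is a sketch (parameter values, seed, and sample size are arbitrary) comparing $F_n(x)$ with the empirical frequency of $\{S_n \le x\}$.

```python
import math, random

# Problem 17: F_n(x) = (1/(a^n n!)) * sum_{k=0}^n (-1)^k C(n,k) ((x-ka)^+)^n
def F_n(x, n, a):
    return sum((-1) ** k * math.comb(n, k) * max(x - k * a, 0.0) ** n
               for k in range(n + 1)) / (a ** n * math.factorial(n))

random.seed(11)
n, a, x, trials = 3, 1.0, 1.7, 200_000
hits = sum(sum(random.uniform(0, a) for _ in range(n)) <= x
           for _ in range(trials))
print(F_n(x, n, a), hits / trials)
```

For $n = 3$, $a = 1$, $x = 1.7$ both values are close to $0.647$.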
19. Triangular densities. Show that the distribution density of the sum of two independent random variables uniformly distributed on $[-a, a]$, i.e. of two independent random variables with distribution densities

$$f_a(x) = \frac{1}{2a}, \quad |x| < a; \qquad f_a(x) = 0, \quad |x| \ge a,$$

is the function

$$g_a(x) = \frac{1}{2a}\left(1 - \frac{|x|}{2a}\right), \quad |x| < 2a; \qquad g_a(x) = 0, \quad |x| \ge 2a,$$

i.e. a triangular distribution density; in other words, show that $f_a * f_a(x) = g_a(x)$.

20. Let $\xi, \eta$ be independent random variables distributed according to the Cauchy law with parameters $a$ and $b$ respectively, i.e. independent random variables with the densities

$$\gamma_a(x) = \frac{a}{\pi\left(a^2 + x^2\right)}, \quad -\infty < x < \infty,$$

and $\gamma_b(x)$ defined similarly. Show that then

$$\gamma_a * \gamma_b(x) = \gamma_{a+b}(x).$$

Further, it follows from what has been said that if $\xi_1, \xi_2, \dots, \xi_n$ are independent random variables equally distributed according to the Cauchy law with parameter $\alpha$, then the distribution density of the mean $\bar{\xi} = \dfrac{\xi_1 + \xi_2 + \dots + \xi_n}{n}$ coincides with $\gamma_\alpha(x)$: $f_{\bar{\xi}}(x) = \gamma_\alpha(x)$.
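The striking conclusion of problem 20 — averaging Cauchy variables does not reduce their spread — can be seen numerically. The sketch below (an illustration; $\alpha$, $n$, the seed, and the trial count are arbitrary) compares the empirical upper quartile of the mean with the theoretical upper quartile of a single Cauchy($\alpha$) variable, which equals $\alpha$; quartiles are used because a Cauchy variable has no expectation.

```python
import math, random, statistics

# Problem 20: the mean of n i.i.d. Cauchy(alpha) variables is again
# Cauchy(alpha); its upper quartile should therefore be close to alpha.
random.seed(5)
alpha, n, trials = 2.0, 10, 100_000

def cauchy(alpha):
    # tangent of a uniform angle gives a Cauchy variable with parameter alpha
    return alpha * math.tan(math.pi * (random.random() - 0.5))

means = [sum(cauchy(alpha) for _ in range(n)) / n for _ in range(trials)]
q3 = statistics.quantiles(means, n=4)[2]   # upper quartile of the mean
print(q3)   # should be near alpha, not near alpha / n
```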
21. Continuation. Let $\xi_1, \xi_2$ be independent random variables distributed according to the Cauchy law with distribution density $\gamma_\alpha(x)$. Show that the distribution density of the sum $\eta_1 + \eta_2$ of the random variables $\eta_1 = a\xi_1 + b\xi_2$, $\eta_2 = c\xi_1 + d\xi_2$ is the function $\gamma_{(a+b+c+d)\alpha}(x)$; in other words, this density is the convolution of the function $\gamma_{(a+b)\alpha}(x)$ (the distribution density of the random variable $\eta_1$) and the function $\gamma_{(c+d)\alpha}(x)$ (the distribution density of the random variable $\eta_2$).

22. Let $\xi_1, \xi_2, \dots$ be independent identically distributed random variables with distribution densities $f(x) = x e^{-x}$, $x \ge 0$; $f(x) = 0$, $x < 0$. We introduce the notation

$$S_0 = 0, \qquad S_n = \xi_1 + \xi_2 + \dots + \xi_n, \qquad n = 1, 2, \dots,$$

and define a new random variable $\nu_t$ as follows:

$$\nu_t = \min\{n : S_1 \le t,\ S_2 \le t,\ \dots,\ S_n \le t,\ S_{n+1} > t\}.$$

Find the distribution law of $\nu_t$.

23. Let

$$(\xi_1, \xi_2) \sim N\left((0, 0),\ \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right).$$

We introduce the random variables (polar coordinates) $r, \varphi$:

$$r = \sqrt{\xi_1^2 + \xi_2^2}, \qquad \varphi = \operatorname{arctg}\frac{\xi_2}{\xi_1}.$$

Show that the distribution density of the random variable $\varphi$ is the function

$$f_\varphi(\psi) = \frac{\sqrt{1 - \rho^2}}{2\pi\left(1 - 2\rho\sin\psi\cos\psi\right)}, \quad 0 < \psi < 2\pi; \qquad f_\varphi(\psi) = 0, \quad \psi \notin (0, 2\pi),$$

and that only for $\rho = 0$ does the random variable $\varphi$ turn out to be uniformly distributed and independent of $r$.

24. $\xi_1, \xi_2, \dots, \xi_n$ are independent equally distributed random variables taking the values $+1$ and $-1$ with probabilities $p$ and $1 - p = q$ respectively; $\eta_n = \xi_1 \xi_2 \cdots \xi_n$. Prove that then

$$P\{\eta_n = 1\} = \frac{1}{2}\left[1 + (p - q)^n\right], \qquad P\{\eta_n = -1\} = \frac{1}{2}\left[1 - (p - q)^n\right].$$
25. Let

$$\xi = (\xi_1, \xi_2, \xi_3) \sim N\left((0, 0, 0),\ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\right).$$

We define a new random vector

$$\eta = (\eta_1, \eta_2, \eta_3) = (\xi_1,\ \xi_2 + \xi_1,\ \xi_3 + \xi_1).$$

Find the following quantities: a) the distribution density of the vector $\eta$; b) the marginal distribution density $f_{\eta_2, \eta_3}(x_2, x_3)$; c) the conditional distribution density $f_{\eta_1, \eta_2 | \eta_3}(x_1, x_2 \mid y)$; d) the marginal distribution density $f_{\eta_2}(x_2)$; e) the conditional distribution density $f_{\eta_2 | \eta_3}(x_2 \mid y)$; f) the conditional distribution density $f_{\eta_1 | \eta_2, \eta_3}(x_1 \mid y, z)$.

26. Obtaining exponential random variables through uniformly distributed random variables. Let $\xi_1, \xi_2, \dots$ be a sequence of random variables uniformly distributed on $[0, 1]$. We define a new random variable as follows:

$$\nu = \min\{n : \xi_1 \ge \xi_2 \ge \dots \ge \xi_{n-1},\ \xi_{n-1} < \xi_n\}.$$

Prove that then

$$P\{\xi_1 \le x,\ \nu = n\} = \frac{x^{n-1}}{(n-1)!} - \frac{x^n}{n!},$$

then show that this implies $P\{\xi_1 \le x,\ \nu \text{ is an even number}\} = 1 - e^{-x}$.

27. Continuation. For the tests we take the random values $\xi_1, \xi_2, \dots, \xi_\nu$: if $\nu$ is an odd number, we will assume that there was a failure; if $\nu$ is an even number, we believe that there was a success. Independent tests are conducted until the first success occurs. The new random variable $\eta$ is defined as the sum of the number of failures and the first value in the successful test. Prove that then $\eta$ is an exponential random variable with parameter $\lambda = 1$, i.e. a random variable with distribution function $P\{\eta \le x\} = 1 - e^{-x}$.

Let $\xi_1, \xi_2, \dots, \xi_n$ ($n \ge 2$) be independent identically distributed random variables defined on a probability space $(\Omega, \mathcal{F}, P)$. For each $\omega \in \Omega$ we arrange the numbers $\xi_1(\omega), \xi_2(\omega), \dots, \xi_n(\omega)$ in ascending order and renumber them: $\xi_{(1)} \le \xi_{(2)} \le \dots \le \xi_{(n)}$. The sequence of random variables so constructed is called the variational series corresponding to the random variables $\xi_1, \xi_2, \dots, \xi_n$, and the term $\xi_{(k)}$ is called the $k$-th member of the variational series. For example, in the special cases

$$\xi_{(1)} = \min\{\xi_1, \xi_2, \dots, \xi_n\}, \qquad \xi_{(n)} = \max\{\xi_1, \xi_2, \dots, \xi_n\}.$$

It is clear that the random variables $\xi_{(1)}, \xi_{(2)}, \dots, \xi_{(n)}$ are not independent (they are subject to the above inequalities).

28. Let $\xi_1, \xi_2, \dots, \xi_n$ be independent identically distributed exponential random variables with parameter $\lambda$, and let $\xi_{(1)}, \xi_{(2)}, \dots, \xi_{(n)}$ be the corresponding variational series. Show that then $\xi_{(1)},\ \xi_{(2)} - \xi_{(1)},\ \xi_{(3)} - \xi_{(2)},\ \dots,\ \xi_{(n)} - \xi_{(n-1)}$ are independent random variables, and the distribution densities of the random variables $\xi_{(k)} - \xi_{(k-1)}$, $k = 1, 2, \dots, n$ ($\xi_{(0)} = 0$), are respectively the functions $\lambda(n - k + 1)e^{-(n-k+1)\lambda t}$ ($t > 0$), i.e. $\xi_{(k)} - \xi_{(k-1)}$ ($k = 1, 2, \dots, n$) is an exponentially distributed random variable with parameter $\lambda_k = (n - k + 1)\lambda$.

29. $\xi_1, \xi_2, \xi_3$ are independent $N(0, 1)$ random variables, $\xi_{(1)}, \xi_{(2)}, \xi_{(3)}$ is the corresponding variational series. Find the distribution densities of the following random variables and random vectors: a) $(\xi_2 - \xi_1,\ \xi_3 - \xi_1)$; b) $(\xi_2 - \xi_1,\ \xi_3 - \xi_2)$; c) $(\xi_{(2)} - \xi_{(1)},\ \xi_{(3)} - \xi_{(1)})$; d) $\eta = \dfrac{\xi_{(2)} - \xi_{(1)}}{\xi_{(3)} - \xi_{(1)}}$; e) $(\xi_{(2)} - \xi_{(1)},\ \xi_{(3)} - \xi_{(2)})$; f) $\dfrac{\xi_{(2)} - \xi_{(1)}}{\xi_{(3)} - \xi_{(2)}}$.
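The spacing property of problem 28 can be verified numerically. The sketch below (an illustration; $\lambda$, $n$, the seed, and the trial count are arbitrary) checks that $\xi_{(1)}$ behaves like an $\mathrm{Exp}(n\lambda)$ variable and $\xi_{(2)} - \xi_{(1)}$ like an $\mathrm{Exp}((n-1)\lambda)$ variable, by comparing their sample means with $1/(n\lambda)$ and $1/((n-1)\lambda)$.

```python
import random, statistics

# Problem 28: for n i.i.d. Exp(lambda) variables, the order-statistic
# spacings xi_(k) - xi_(k-1) are independent Exp((n-k+1)*lambda) variables.
random.seed(9)
lam, n, trials = 1.0, 5, 100_000
first, second = [], []
for _ in range(trials):
    s = sorted(random.expovariate(lam) for _ in range(n))
    first.append(s[0])          # xi_(1)            ~ Exp(n * lambda)
    second.append(s[1] - s[0])  # xi_(2) - xi_(1)   ~ Exp((n-1) * lambda)

print(statistics.mean(first), statistics.mean(second))
```

With $\lambda = 1$, $n = 5$ the sample means come out near $0.2$ and $0.25$.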
30. $\xi_1, \xi_2, \dots, \xi_n$ are independent identically distributed random variables, their distribution functions are equal, $F_{\xi_i}(x) = F(x)$, their distribution densities are equal, $f_{\xi_i}(x) = f(x)$, and $\xi_{(1)} \le \xi_{(2)} \le \dots \le \xi_{(n)}$ is the corresponding variational series. Show the validity of the following formulas:

a) $$F_{(k)}(x) = P\{\xi_{(k)} \le x\} = \sum_{l=k}^{n} C_n^l\, F(x)^l \left(1 - F(x)\right)^{n-l};$$

b) $$f_{(k)}(x) = F_{(k)}'(x) = k\, C_n^k\, F(x)^{k-1} \left(1 - F(x)\right)^{n-k} f(x);$$

c) $$F_{(k),(l)}(x, y) = P\{\xi_{(k)} \le x,\ \xi_{(l)} \le y\} = \sum_{i=k}^{n} \sum_{j=\max(l-i,\,0)}^{n-i} \frac{n!}{i!\,j!\,(n-i-j)!}\, F(x)^i \left[F(y) - F(x)\right]^j \left[1 - F(y)\right]^{n-i-j}, \quad \text{if } x < y;$$

$$F_{(k),(l)}(x, y) = F_{(l)}(y), \quad \text{if } x \ge y,$$

where $F_{(l)}(y)$ is the function defined in case a);

$$f_{(k),(l)}(x, y) = \frac{\partial^2 F_{(k),(l)}(x, y)}{\partial x\,\partial y} = \frac{n!}{(k-1)!\,(l-k-1)!\,(n-l)!}\, F(x)^{k-1} f(x) \left[F(y) - F(x)\right]^{l-k-1} f(y) \left(1 - F(y)\right)^{n-l}, \quad \text{if } x < y;$$

$$f_{(k),(l)}(x, y) = 0, \quad \text{if } x \ge y.$$
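Formula a) of problem 30 is easy to check for uniform variables, where $F(x) = x$. The sketch below (parameter values, seed, and sample size are arbitrary choices) compares the formula with the empirical frequency of $\{\xi_{(k)} \le x\}$.

```python
import math, random

# Problem 30 a): F_(k)(x) = sum_{l=k}^n C(n,l) F(x)^l (1 - F(x))^(n-l).
def F_order(k, n, F):
    return sum(math.comb(n, l) * F ** l * (1 - F) ** (n - l)
               for l in range(k, n + 1))

random.seed(13)
n, k, x, trials = 5, 3, 0.6, 200_000   # uniform(0,1) sample, so F(x) = x
hits = sum(sorted(random.random() for _ in range(n))[k - 1] <= x
           for _ in range(trials))
print(F_order(k, n, x), hits / trials)
```

For $n = 5$, $k = 3$, $x = 0.6$ both values are close to $0.6826$.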
31. Let $\xi_1, \xi_2, \dots$ be independent identically distributed exponential random variables with parameter $\lambda$; $S_0 = 0$, $S_1 = \xi_1$, …, $S_n = \xi_1 + \xi_2 + \dots + \xi_n$, …, and the new random variable $\nu_t$ is defined as follows: for $t > 0$

$$\nu_t = \min\{k : S_0 < t,\ S_1 < t,\ \dots,\ S_{k-1} < t,\ S_k \ge t\}.$$

Show that then the distribution density of the random variable $\xi_{\nu_t}$ is the function

$$f_t(x) = \begin{cases} \lambda^2 x e^{-\lambda x}, & 0 < x \le t, \\ \lambda(1 + \lambda t)e^{-\lambda x}, & x > t. \end{cases}$$

32. Continuation. Show that the random variable $\tilde{S}_t = S_{\nu_t} - t$ is an exponential random variable with parameter $\lambda$.
Chapter IV

MATHEMATICAL EXPECTATION
Our main goal in this chapter is to introduce the concepts of the mathematical expectation of a random variable and of the conditional mathematical expectation with respect to sigma-algebras, and to study their properties. From the point of view of measure theory, the mathematical expectation of a random variable is the Lebesgue integral of the random variable (a measurable function) with respect to a probability measure. In this chapter we derive formulas for calculating the mathematical expectation and the conditional mathematical expectation, and we also consider other numerical characteristics (variance, covariance, correlation coefficient) of random variables.

§1. General definition of mathematical expectation. Properties

The mathematical expectation (mean value) of a random variable $\xi = \xi(\omega)$, given on a general probability space $(\Omega, \mathcal{F}, P)$, is denoted by $M\xi$ and is determined gradually, in three stages: first for simple random variables, then for nonnegative ones, and finally for arbitrary random variables.

Definition 1. I. For a simple random variable

$$\xi(\omega) = \sum_{i=1}^{n} x_i I_{A_i}(\omega), \qquad A_i \in \mathcal{F}, \quad A_i \cap A_j = \varnothing\ (i \ne j), \quad \sum_{i=1}^{n} A_i = \Omega, \quad n < \infty,$$

by definition,

$$M\xi = \sum_{i=1}^{n} x_i P(A_i). \tag{1}$$

We note at once that for the random variable $I_A(\omega)$, the indicator of an event $A \in \mathcal{F}$, its mathematical expectation is equal to the probability of this event:

$$M I_A(\omega) = P(A).$$
Remark 1. We draw attention to the fact that in Definition 1 it is not required that the values $x_i$ ($i = 1, 2, \dots, n$) taken by the random variable be different; it is only needed that $A_1, A_2, \dots, A_n$ be a partition of $\Omega$. If

$$B_j = \{\omega : \xi(\omega) = y_j\}, \quad y_j \ne y_l\ (j \ne l), \quad j = 1, 2, \dots, m < \infty, \quad \sum_{j=1}^{m} B_j = \Omega,$$

i.e. if the events $B_1, B_2, \dots, B_m$ form the partition of $\Omega$ generated by the different values of the random variable $\xi = \xi(\omega)$, then we can represent the random variable $\xi$ in the form

$$\xi = \sum_{j=1}^{m} y_j I_{B_j}(\omega), \qquad B_j = \sum_{i:\ x_i = y_j} A_i.$$

In this case, by definition (1), we can write

$$M\xi = \sum_{i=1}^{n} x_i P(A_i) = \sum_{j=1}^{m} \sum_{i:\ x_i = y_j} x_i P(A_i) = \sum_{j=1}^{m} y_j \sum_{i:\ x_i = y_j} P(A_i) = \sum_{j=1}^{m} y_j P(B_j).$$
Remark 2. Here, incidentally, we explain why the mathematical expectation of a random variable is also called the mean value of the random variable. Suppose that a (simple) random variable $\xi$ takes $m$ different values $y_1, y_2, \dots, y_m$, and that the values $y_1, y_2, \dots, y_m$ appear $n_1, n_2, \dots, n_m$ times (respectively) in $n$ observations of this random variable ($n = n_1 + n_2 + \dots + n_m$). Then it is clear that as the average value of the random variable one must take the following value:

$$\bar{y} = \frac{n_1 y_1 + n_2 y_2 + \dots + n_m y_m}{n} = y_1\frac{n_1}{n} + y_2\frac{n_2}{n} + \dots + y_m\frac{n_m}{n}.$$

On the other hand, according to the statistical definition of probability, for sufficiently large $n$ ($n \gg 1$)

$$p_j = P(B_j) = P\{\xi = y_j\} \approx \frac{n_j}{n}, \qquad j = 1, 2, \dots, m,$$

which means that

$$M\xi = \sum_{j=1}^{m} y_j P(B_j) = \sum_{j=1}^{m} y_j p_j \approx \bar{y}.$$
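The frequency interpretation of Remark 2 is easy to check by simulation; the sketch below (a fair die is used purely as an example, and the seed and sample size are arbitrary) compares the empirical mean of $n$ observations with $M\xi = \sum_j y_j p_j$.

```python
import random, statistics

# Remark 2: for a large number of observations, the sample mean of a
# simple random variable approximates M(xi) = sum_j y_j * p_j.
random.seed(2)
values = [1, 2, 3, 4, 5, 6]                 # a fair die as an example
m_xi = sum(y * (1 / 6) for y in values)     # exact expectation, 3.5
sample = [random.choice(values) for _ in range(100_000)]
print(m_xi, statistics.mean(sample))
```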
Consequently, the mathematical expectation $M\xi$ defined above is indeed, for $n \gg 1$, an approximate value of the mean value of the random variable.

II. $\xi \ge 0$. Here the following lemma is known (Ch. III, §1, Theorem 6, case b)):

Lemma 1. For any nonnegative random variable $\xi \ge 0$ there is a sequence of simple random variables $\xi_n$ such that $0 \le \xi_n \uparrow \xi$.

According to stage I, the numerical sequence $M\xi_n$ is defined. Since $\xi_n \le \xi_{n+1}$, we have $M\xi_n \le M\xi_{n+1}$ (property 2°, which is proved below), and there exists the limit $\lim_{n\to\infty} M\xi_n$, finite or equal to $+\infty$.

Then, for $\xi \ge 0$, by definition we set

$$M\xi = \lim_{n\to\infty} M\xi_n. \tag{2}$$

III. Let $\xi$ be an arbitrary random variable. In this case $\xi$ can be uniquely written in the form $\xi = \xi^+ - \xi^-$, where

$$\xi^+ = \max(\xi, 0), \qquad \xi^- = -\min(\xi, 0).$$

In this case, since $\xi^+ \ge 0$, $\xi^- \ge 0$, the mathematical expectations $M\xi^+$, $M\xi^-$ are defined according to stage II, and we set by definition:

$$M\xi = M\xi^+ - M\xi^-. \tag{3}$$

We say that the mathematical expectation exists only if $\min(M\xi^+, M\xi^-) < \infty$. Thus: for simple random variables the expectation always exists and is finite; for nonnegative $\xi$ either $M\xi < \infty$ or $M\xi = +\infty$.

If $M\xi^+ < \infty$ and $M\xi^- < \infty$, then $M\xi$ is finite too; if $M\xi^+ = \infty$, $M\xi^- < \infty$, then $M\xi = +\infty$; if $M\xi^+ < \infty$, $M\xi^- = \infty$, then $M\xi = -\infty$.
The expectation is not defined only in the case $M\xi^+ = M\xi^- = \infty$.

The correctness of the definition of the mathematical expectation.

I. The case of simple random variables. If a simple random variable $\xi$ is written in two ways through finite partitions of the space of elementary events $\Omega$, i.e. if

$$\xi = \sum_{i=1}^{n} x_i I_{A_i}(\omega) = \sum_{j=1}^{l} z_j I_{C_j}(\omega),$$

where the $x_i$ and the $z_j$ are not necessarily different, then we need to prove that

$$M\xi = \sum_{i=1}^{n} x_i P(A_i) = \sum_{j=1}^{l} z_j P(C_j).$$

The proof of the last equality follows immediately from Remark 1 (see I) applied to both sums (both are equal to the same sum over the distinct values of $\xi$).

II. The case of nonnegative random variables. In this case we need to show that if $0 \le \xi_n \uparrow \xi$ and $0 \le \eta_m \uparrow \xi$, where $\xi_n, \eta_m$ are sequences of simple random variables, then $\lim_{n\to\infty} M\xi_n = \lim_{m\to\infty} M\eta_m$.

To do this, we first prove the following lemma.

Lemma 2. Let $\eta, \xi_n$ be simple (nonnegative) random variables, $n \ge 1$, and $0 \le \xi_n \uparrow \xi \ge \eta$. Then

$$\lim_{n\to\infty} M\xi_n \ge M\eta.$$

Proof. Suppose that $\varepsilon > 0$ and $A_n = \{\omega : \xi_n(\omega) > \eta(\omega) - \varepsilon\}$. It is clear that in this case $A_n \uparrow \Omega$ and

$$\xi_n = \xi_n I_{A_n} + \xi_n I_{\bar{A}_n} \ge \xi_n I_{A_n} \ge (\eta - \varepsilon) I_{A_n}.$$

Now, using property 2° of the mathematical expectation for simple random variables (which will be proved below), we can write

$$M\xi_n \ge M(\eta - \varepsilon)I_{A_n} = M\eta I_{A_n} - \varepsilon P(A_n) = M\eta - M\eta I_{\bar{A}_n} - \varepsilon P(A_n) \ge M\eta - cP(\bar{A}_n) - \varepsilon,$$

where $c = \max_{\omega \in \Omega} \eta(\omega)$. Hence, in view of the arbitrariness of $\varepsilon > 0$, the required inequality follows. □

Proof of correctness. It follows from the lemma that

$$\lim_{n\to\infty} M\xi_n \ge \lim_{m\to\infty} M\eta_m;$$

by symmetry, $\lim_{m\to\infty} M\eta_m \ge \lim_{n\to\infty} M\xi_n$; therefore

$$\lim_{n\to\infty} M\xi_n = \lim_{m\to\infty} M\eta_m.$$

III. The general case. The correctness of the definition in this case follows from the uniqueness of the representation of the random variable $\xi$ in the form $\xi = \xi^+ - \xi^-$.
Properties of the mathematical expectation.

1°. Linearity. Suppose that $M\xi$, $M\eta$ and $M\xi + M\eta$ exist, and $c$ is a constant. Then

$$M(\xi + \eta) = M\xi + M\eta, \qquad M(c\xi) = cM\xi.$$

2°. Positivity. If $\xi \ge 0$, then $M\xi \ge 0$. If $M\xi$ and $M\eta$ exist and $\xi \ge \eta$, then $M\xi \ge M\eta$.

3°. Finiteness. If $M|\xi| < \infty$, then $|M\xi| < \infty$. If $|\xi| \le \eta$ and $M\eta < \infty$, then $M|\xi| < \infty$. If $M|\xi| < \infty$, $M|\eta| < \infty$, then $M|\xi + \eta| < \infty$.

Proof of the properties.

I. The case of simple random variables.

1°. Let $\xi$ and $\eta$ be simple random variables:

$$\xi = \sum_{i=1}^{n} x_i I_{A_i}(\omega), \qquad \eta = \sum_{j=1}^{m} y_j I_{B_j}(\omega),$$

where $A_i, B_j \in \mathcal{F}$, $A_i \cap A_j = \varnothing$ ($i \ne j$), $B_i \cap B_j = \varnothing$ ($i \ne j$), $\sum_{i=1}^{n} A_i = \sum_{j=1}^{m} B_j = \Omega$, $n, m < \infty$.

If $\omega \in A_i B_j$, then $\xi(\omega) + \eta(\omega) = x_i + y_j$; in addition, $\{A_i B_j\}$ is a partition of $\Omega$ and

$$\xi(\omega) + \eta(\omega) = \sum_{i=1}^{n}\sum_{j=1}^{m} (x_i + y_j)\, I_{A_i B_j}.$$

Further, since

$$A_i = \sum_{j=1}^{m} A_i B_j, \quad P(A_i) = \sum_{j=1}^{m} P(A_i B_j), \qquad B_j = \sum_{i=1}^{n} A_i B_j, \quad P(B_j) = \sum_{i=1}^{n} P(A_i B_j),$$

then, by definition and according to Remark 1, we can write

$$M(\xi + \eta) = \sum_{i=1}^{n}\sum_{j=1}^{m} (x_i + y_j) P(A_i B_j) = \sum_{i=1}^{n} x_i \sum_{j=1}^{m} P(A_i B_j) + \sum_{j=1}^{m} y_j \sum_{i=1}^{n} P(A_i B_j) = \sum_{i=1}^{n} x_i P(A_i) + \sum_{j=1}^{m} y_j P(B_j) = M\xi + M\eta.$$

If $c$ is a constant, then $c\xi = \sum_{i=1}^{n} c x_i I_{A_i}$, hence

$$M(c\xi) = \sum_{i=1}^{n} c x_i P(A_i) = c\sum_{i=1}^{n} x_i P(A_i) = cM\xi.$$

2°. If $\xi \ge 0$, then in (1) all $x_i \ge 0$, hence $M\xi \ge 0$. If $\xi \ge \eta$, then $\xi = \eta + (\xi - \eta)$ and, by property 1°,

$$M\xi = M\eta + M(\xi - \eta).$$

Further, since $\xi - \eta \ge 0$ implies $M(\xi - \eta) \ge 0$, we obtain that $M\xi \ge M\eta$.

3°. The proof is obvious.

II. The case of nonnegative random variables.

1°. Suppose that $0 \le \xi_n \uparrow \xi$, $0 \le \eta_n \uparrow \eta$, where $\xi_n, \eta_n$ are sequences of simple random variables. Then

$$0 \le \xi_n + \eta_n \uparrow \xi + \eta, \qquad M(\xi_n + \eta_n) = M\xi_n + M\eta_n,$$

and, by definition,

$$M(\xi + \eta) = \lim_{n\to\infty} M(\xi_n + \eta_n) = \lim_{n\to\infty} M\xi_n + \lim_{n\to\infty} M\eta_n = M\xi + M\eta.$$

If $\xi \ge 0$ and $c \ge 0$, then the convergence $0 \le \xi_n \uparrow \xi$ implies the convergence $0 \le c\xi_n \uparrow c\xi$, and from this we get

$$M(c\xi) = \lim_{n\to\infty} M(c\xi_n) = c\lim_{n\to\infty} M\xi_n = cM\xi.$$

2°. The convergence $0 \le \xi_n \uparrow \xi$ implies the convergence $0 \le M\xi_n \uparrow M\xi$. If $\xi \ge \eta$, then, from the equality $\xi = \eta + (\xi - \eta)$ and the proved property 1°, we get

$$M\xi = M\eta + M(\xi - \eta) \ge M\eta.$$

3°. For $\xi \ge 0$ we have $|\xi| = \xi$ and $M|\xi| = M\xi$. If $0 \le \xi \le \eta$ and $M\eta < \infty$, then the inequality $M\xi \le M\eta$ (property 2°) implies $M\xi < \infty$.

III. The case of arbitrary random variables.

1°. If $c > 0$, then the equality $\xi = \xi^+ - \xi^-$ implies that $(c\xi)^+ = c\xi^+$, $(c\xi)^- = c\xi^-$. If $c < 0$, then $(c\xi)^+ = |c|\xi^-$, $(c\xi)^- = |c|\xi^+$. Hence we obtain

$$M(c\xi) = |c|\left(M\xi^- - M\xi^+\right) = -|c|\,M\xi = cM\xi.$$

Now we prove the additivity property $M(\xi + \eta) = M\xi + M\eta$. First of all, we note the following: if $\xi = \xi_1 - \xi_2$ with $\xi_1 \ge 0$, $\xi_2 \ge 0$, then $\xi_1 = \xi^+ + \delta$, $\xi_2 = \xi^- + \delta$, where $\delta \ge 0$. Indeed, the equality $\xi = \xi^+ - \xi^- = \xi_1 - \xi_2$ implies the equality $\xi_1 - \xi^+ = \xi_2 - \xi^- \ge 0$. If now we introduce the notation $\delta = \xi_1 - \xi^+$, then $\xi_2 = \xi^- + \delta$. Further, if $M\xi_1 < \infty$, $M\xi_2 < \infty$, then it is easy to see that the equality $\xi = \xi_1 - \xi_2$ implies the equality $M\xi = M\xi_1 - M\xi_2$. Therefore, applying the equality just proved to the equality

$$\xi + \eta = (\xi^+ + \eta^+) - (\xi^- + \eta^-),$$

we can write

$$M(\xi + \eta) = M(\xi^+ + \eta^+) - M(\xi^- + \eta^-) = M\xi^+ + M\eta^+ - M\xi^- - M\eta^- = M\xi + M\eta.$$

This conclusion is valid in the case of finite $M\xi$ and $M\eta$. The case when one of the mathematical expectations $M\xi$ or $M\eta$ is equal to infinity is analyzed in a similar way.

2°. We prove that $\xi \ge \eta$ together with the existence of $M\xi$ and $M\eta$ implies the inequality $M\xi \ge M\eta$. The case $M\eta = -\infty$ is obvious. Suppose that $M\eta > -\infty$; then $M\eta^- < \infty$. If $\eta \le \xi$, then $\eta^+ \le \xi^+$ and $\eta^- \ge \xi^-$. Therefore $M\xi^- \le M\eta^- < \infty$, which means that the expectation $M\xi$ is defined, and

$$M\eta = M\eta^+ - M\eta^- \le M\xi^+ - M\xi^- = M\xi.$$

The case $M\xi = +\infty$ is analogous to the case considered.

3°. If $\xi = \xi^+ - \xi^-$, then $|\xi| = \xi^+ + \xi^-$, $M|\xi| = M\xi^+ + M\xi^-$. Therefore $M|\xi| < \infty$ implies that $|M\xi| < \infty$. The remaining assertions of property 3° are obvious.

We note further that the following important inequality holds: if $M\xi$ exists, then always

$$|M\xi| \le M|\xi|. \tag{3*}$$

Indeed, $-|\xi| \le \xi \le |\xi|$ always, which implies, by properties 1° and 2°,

$$-M|\xi| \le M\xi \le M|\xi|, \quad \text{i.e.} \quad |M\xi| \le M|\xi|.$$
The Lebesgue integral. The mathematical expectation $M\xi(\omega)$ is denoted by

$$\int_\Omega \xi(\omega)P(d\omega) \quad \left(\text{or } \int_\Omega \xi(\omega)\,dP(\omega), \text{ or } \int_\Omega \xi\,dP\right)$$

and is called the Lebesgue integral of the $\mathcal{F}$-measurable function $\xi = \xi(\omega)$ with respect to the probability measure $P$. By definition, for an event $A \in \mathcal{F}$ the mathematical expectation

$$M(\xi I_A) = \int_\Omega \xi(\omega)I_A(\omega)P(d\omega)$$

can be written as

$$M(\xi I_A) = \int_A \xi(\omega)P(d\omega).$$

In particular, we note once again that for the indicator of an event (see formula (1))

$$MI_A = \int_A P(d\omega) = P(A). \tag{1'}$$

Remark 3. In the above definition of the Lebesgue integral, the measure $P$ is a probability measure ($P(\Omega) = 1$), and the random variable ($\mathcal{F}$-measurable function) $\xi = \xi(\omega)$ takes values in the set $R = (-\infty, \infty)$. Let now $\mu$ be a measure given on the measurable space $(\Omega, \mathcal{F})$, and suppose that it may also take the value $+\infty$, and that the $\mathcal{F}$-measurable function (extended random variable) $\xi = \xi(\omega)$ can take values in the set $\bar{R} = [-\infty, +\infty]$. In this case the (Lebesgue) integral $\int_\Omega \xi(\omega)\mu(d\omega)$ is defined as above (only with $\mu$ in place of $P$): first for simple random variables, by formula (1); then for nonnegative random variables, by the limit (2); and in the general case (in the absence of an indeterminacy of the form $\infty - \infty$) by the formula

$$\int_\Omega \xi(\omega)\mu(d\omega) = \int_\Omega \xi^+(\omega)\mu(d\omega) - \int_\Omega \xi^-(\omega)\mu(d\omega).$$

There is a very important case, when $(\Omega, \mathcal{F}) = (R, \mathcal{B}(R))$ and $\mu$ is the Lebesgue measure. In this case the integral $\int_R \xi(x)\mu(dx)$ is usually denoted by $\int_R \xi(x)\,dx$ (or $\int_{-\infty}^{\infty} \xi(x)\,dx$, or $L\!\int_{-\infty}^{\infty} \xi(x)\,dx$), and, to emphasize the difference between the integrals, the corresponding Riemann integral is denoted by $R\!\int_{-\infty}^{\infty} \xi(x)\,dx$. If $\mu$ corresponds to some extended distribution function $G = G(x)$ (Ch. II, §3) as the Lebesgue–Stieltjes measure, then the integral $\int_R \xi(x)\mu(dx)$ is called the Lebesgue–Stieltjes integral and, to distinguish this integral from the Riemann–Stieltjes integral $R\text{-}S\!\int_R \xi(x)G(dx)$, it is denoted by $L\text{-}S\!\int_R \xi(x)G(dx)$.
In this textbook we assume that the readers are familiar with the various types of integrals mentioned above and with their relationships.

1.1. The multiplicative property

Theorem 1 (Multiplicative property of the mathematical expectation). If $\xi_1, \xi_2, \dots, \xi_n$ are independent random variables with finite mathematical expectations ($M|\xi_i| < \infty$, $i = 1, 2, \dots, n$), then $M|\xi_1\xi_2\cdots\xi_n| < \infty$ and

$$M(\xi_1\xi_2\cdots\xi_n) = M\xi_1\, M\xi_2 \cdots M\xi_n. \tag{4}$$

Proof. For simplicity, we consider the case $n = 2$ ($\xi_1 = \xi$, $\xi_2 = \eta$).

I. Let $\xi, \eta$ be simple independent random variables:

$$\xi = \sum_{i=1}^{n} x_i I_{A_i}(\omega), \qquad \eta = \sum_{j=1}^{m} y_j I_{B_j}(\omega) \qquad \left(n, m < \infty,\ \sum_{i=1}^{n} A_i = \sum_{j=1}^{m} B_j = \Omega\right).$$

Then

$$\xi\eta = \sum_{i=1}^{n}\sum_{j=1}^{m} x_i y_j I_{A_i B_j}(\omega),$$

and, since $\xi$ and $\eta$ are independent, $P(A_i B_j) = P(A_i)P(B_j)$. Therefore (by definition)

$$M\xi\eta = \sum_{i=1}^{n}\sum_{j=1}^{m} x_i y_j P(A_i)P(B_j) = \sum_{i=1}^{n} x_i P(A_i) \cdot \sum_{j=1}^{m} y_j P(B_j) = M\xi \cdot M\eta.$$

II. If $\xi \ge 0$, $\eta \ge 0$, then, by Lemma 1, there are sequences of simple nonnegative random variables $\xi_n, \eta_n$ such that $0 \le \xi_n = \varphi_n(\xi) \uparrow \xi$, $0 \le \eta_n = g_n(\eta) \uparrow \eta$. In this case $\xi_n, \eta_n$ are independent (as functions of independent random variables), and $0 \le \xi_n\eta_n = \varphi_n(\xi)g_n(\eta) \uparrow \xi\eta$. Hence

$$M\xi\eta = \lim_{n\to\infty} M\xi_n\eta_n = \lim_{n\to\infty} M\xi_n\, M\eta_n = \lim_{n\to\infty} M\xi_n \cdot \lim_{n\to\infty} M\eta_n = M\xi \cdot M\eta$$

(incidentally, we used the multiplicative property proved in stage I for simple random variables).

III. In the general case $\xi = \xi^+ - \xi^-$, $\eta = \eta^+ - \eta^-$, where $\xi^\pm$ and $\eta^\pm$ are independent nonnegative random variables. Then, by the multiplicative property of the mathematical expectation for nonnegative random variables, $M\xi^\pm\eta^\pm = M\xi^\pm\, M\eta^\pm$. Therefore

$$M\xi\eta = M(\xi^+ - \xi^-)(\eta^+ - \eta^-) = M\xi^+M\eta^+ - M\xi^+M\eta^- - M\xi^-M\eta^+ + M\xi^-M\eta^- = \left(M\xi^+ - M\xi^-\right)\left(M\eta^+ - M\eta^-\right) = M\xi\, M\eta. \quad \square$$
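A quick numerical illustration of Theorem 1 (a sketch only; the two distributions, the seed, and the sample size are arbitrary choices):

```python
import random, statistics

# Theorem 1: for independent xi and eta, M(xi * eta) = M(xi) * M(eta).
random.seed(4)
n = 200_000
xi = [random.uniform(0, 2) for _ in range(n)]        # M(xi) = 1
eta = [random.expovariate(0.5) for _ in range(n)]    # M(eta) = 2
lhs = statistics.mean(x * e for x, e in zip(xi, eta))
rhs = statistics.mean(xi) * statistics.mean(eta)
print(lhs, rhs)
```

Both sides come out near $M\xi \cdot M\eta = 2$.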
1.2. Properties «almost sure»

If for an event $N = \{\omega : \text{property } \mathcal{N} \text{ is satisfied}\}$ its probability is equal to 1 (i.e. with probability 1 the property $\mathcal{N}$ is satisfied: $P(N) = 1$), then we say that the property $\mathcal{N}$ is satisfied almost surely (a.s.), and this fact is written in the form $\mathcal{N}$ (a.s.) or $\mathcal{N}$ ($P$-a.s.). For example, if $N = \{\omega : \xi(\omega) = 0\}$ and $P(N) = 1$, we write this in the form $\xi = 0$ ($P$-a.s.) or $\xi = 0$ (a.s.).

Theorem 2. 1°. If $\xi = 0$ (a.s.), then $M\xi = 0$.

2°. If $M|\xi| < \infty$ and for $A \in \mathcal{F}$ the probability $P(A) = 0$, then $M(\xi I_A) = 0$; in other words, for any event $A \in \mathcal{F}$ of probability zero ($P(A) = 0$) the integral taken over $A$ is equal to zero:

$$A \in \mathcal{F},\ P(A) = 0 \implies M(\xi I_A) = \int_A \xi(\omega)P(d\omega) = 0.$$

3°. If $\xi = \eta$ (a.s.) and $M|\xi| < \infty$, then $M|\eta| < \infty$ and $M\xi = M\eta$.

4°. If $\xi \ge 0$ and $M\xi = 0$, then $\xi = 0$ (a.s.).

5°. If for the random variables $\xi, \eta$ we have $M|\xi| < \infty$, $M|\eta| < \infty$, and for any event $A \in \mathcal{F}$ the inequality $M(\xi I_A) \le M(\eta I_A)$ holds, then $\xi \le \eta$ (a.s.).
Proofs. 1°. I. The case of simple random variables. Suppose that

$$\xi = \sum_{i=1}^{n} x_i I_{A_i}(\omega), \quad n < \infty, \quad A_i \in \mathcal{F}, \quad A_i \cap A_j = \varnothing\ (i \ne j), \quad \sum_{i=1}^{n} A_i = \Omega.$$

Then, by the condition,

$$P\{\xi \ne 0\} = P\left(\bigcup_{i:\ x_i \ne 0} \{\omega : \xi(\omega) = x_i\}\right) = 0.$$

Therefore, if $x_i \ne 0$, then $P\{\xi = x_i\} = 0$ (Ch. II, §1, Example 3). Hence

$$M\xi = \sum_{i=1}^{n} x_i P(A_i) = \sum_{i=1}^{n} x_i P\{\xi = x_i\} = \sum_{i:\ x_i \ne 0} x_i P\{\xi = x_i\} + \sum_{i:\ x_i = 0} x_i P\{\xi = x_i\} = 0.$$

II. $\xi \ge 0$. By the condition $P\{\xi > 0\} = 0$. Further, according to Lemma 1, there is a sequence of simple random variables $\xi_n$ such that $0 \le \xi_n \uparrow \xi$. Since $0 \le \xi_n \le \xi$, we have $\{\xi_n > 0\} \subseteq \{\xi > 0\}$, therefore

$$0 \le P\{\xi_n > 0\} \le P\{\xi > 0\} = 0, \qquad P\{\xi_n = 0\} = 1, \quad n = 1, 2, \dots.$$

Now, according to stage I, $M\xi_n = 0$, hence $M\xi = \lim_{n\to\infty} M\xi_n = 0$.

III. $\xi$ is any random variable. Then

$$\xi = \xi^+ - \xi^-, \qquad \xi^+ \le |\xi|, \qquad \xi^- \le |\xi|.$$

From this it follows that $\xi = 0$ (a.s.) implies $\xi^+ = 0$ (a.s.) and $\xi^- = 0$ (a.s.). According to what was proved at stage II, $M\xi^+ = M\xi^- = 0$, hence $M\xi = 0$.

2°. Let $N = A$, $P(N) = 0$. Since $\bar{N} = \{\omega : I_N(\omega) = 0\} \subseteq \{\omega : \xi I_N(\omega) = 0\}$, this implies that

$$1 = P(\bar{N}) \le P\{\xi I_N = 0\} \le 1, \qquad P\{\xi I_N = 0\} = 1,$$

i.e. $\xi I_N = 0$ (a.s.). Then, by property 1°, $M(\xi I_N) = 0$.
3°. Let $N = \{\omega : \xi(\omega) \ne \eta(\omega)\}$, $P(N) = 0$. Then

$$\xi(\omega) = \xi(\omega)I_N(\omega) + \xi(\omega)I_{\bar{N}}(\omega) = \xi(\omega)I_N(\omega) + \eta(\omega)I_{\bar{N}}(\omega).$$

Similarly,

$$\eta(\omega) = \eta(\omega)I_N(\omega) + \eta(\omega)I_{\bar{N}}(\omega).$$

Then, since $M(\xi I_N) = M(\eta I_N) = 0$ (property 2°),

$$M\xi = M(\eta I_{\bar{N}}) = M(\eta I_N + \eta I_{\bar{N}}) = M\eta.$$

4°. I. Let $\xi = \sum_{i=1}^{n} x_i I_{A_i}(\omega)$ be a simple random variable. Then, since all $x_i \ge 0$, from the equality

$$M\xi = \sum_{i=1}^{n} x_i P\{\xi = x_i\} = 0$$

we see that if $x_i \ne 0$, then $P\{\xi = x_i\} = 0$. It follows that

$$P\{\xi \ne 0\} = P\left(\bigcup_{i:\ x_i \ne 0} \{\xi = x_i\}\right) \le \sum_{i:\ x_i \ne 0} P\{\xi = x_i\} = 0,$$

and this means that $\xi = 0$ (a.s.).

II. Let $\xi$ be an arbitrary nonnegative random variable: $\xi \ge 0$. Let us introduce the events

$$A = \{\omega : \xi(\omega) > 0\}, \qquad A_n = \left\{\omega : \xi(\omega) \ge \frac{1}{n}\right\}.$$

In this case $A_n \uparrow A$ and $0 \le \xi I_{A_n} \le \xi I_A \le \xi$. This implies that

$$0 \le M(\xi I_{A_n}) \le M(\xi I_A) \le M\xi = 0, \qquad 0 = M(\xi I_{A_n}) \ge \frac{1}{n} P(A_n),$$

i.e. $P(A_n) = 0$, $n = 1, 2, \dots$. Then

$$P(A) = P\{\xi > 0\} = \lim_{n\to\infty} P(A_n) = 0, \quad \text{i.e.} \quad \xi = 0 \text{ (a.s.)}.$$

5°. Suppose that $B = \{\omega : \xi(\omega) > \eta(\omega)\}$. Then

$$M(\eta I_B) \le M(\xi I_B) \le M(\eta I_B),$$

i.e.

$$M(\xi I_B) = M(\eta I_B), \qquad M\{(\xi - \eta)I_B\} = 0.$$

We obtain from this, by property 4° (since $(\xi - \eta)I_B \ge 0$), that $(\xi - \eta)I_B = 0$ (a.s.), $P(B) = 0$, i.e.

$$P\{\xi \le \eta\} = 1. \quad \square$$
Theorem 3. If $\xi$ is an extended random variable and $M|\xi| < \infty$ (so that $|M\xi| < \infty$ by property 3°), then $P\{|\xi| < \infty\} = 1$, i.e. $\xi$ is a proper random variable.

Proof. Let us introduce the event $A = \{\omega : |\xi| = \infty\}$. Suppose that $P(A) > 0$. In this case

$$M|\xi| \ge M(|\xi| I_A) = \infty \cdot P(A) = \infty,$$

and this contradicts the condition $M|\xi| < \infty$. □
1.3. Convergence properties

Theorem 4 (The monotone convergence theorem). If $0 \le \xi_n \uparrow \xi$, then

$$M\xi = \lim_{n\to\infty} M\xi_n = M\lim_{n\to\infty} \xi_n. \tag{5}$$

Proof. From $0 \le \xi_n \le \xi$ it follows that $0 \le M\xi_n \le M\xi$ and

$$\lim_{n\to\infty} M\xi_n \le M\xi. \tag{5'}$$

Suppose now that the sequences of simple random variables $\xi_{nk}$ satisfy the conditions $0 \le \xi_{nk} \uparrow \xi_n$, $k \to \infty$. Then the random variables $\eta_k = \max_{1 \le n \le k} \xi_{nk}$ are also simple random variables. In addition,

$$0 \le \eta_k = \max_{1 \le n \le k} \xi_{nk} \le \max_{1 \le n \le k+1} \xi_{n,k+1} = \eta_{k+1}, \quad \text{i.e.} \quad 0 \le \eta_k \le \eta_{k+1}.$$

So $\eta_k$ is a monotonically increasing sequence. Let us denote by $\eta$ the limit of this sequence: $0 \le \eta_k \uparrow \eta$. For any $k$ the inequality $\eta_k \le \xi_k$ holds, therefore

$$\lim_{k\to\infty} M\eta_k = M\eta \le \lim_{k\to\infty} M\xi_k. \tag{5''}$$

Further, for $n \le k$ the double inequality $\xi_{nk} \le \eta_k \le \eta$ takes place. If $k \to \infty$, then for all $n$ we have $\xi_n \le \eta$, hence $\xi \le \eta$ and $M\xi \le M\eta$. Now the required relation (5) follows from the last inequality and relations (5'), (5''). □

Corollary 1. If $\xi_1, \xi_2, \dots$ is a sequence of nonnegative ($\xi_n \ge 0$) random variables, then

$$M\left(\sum_{n=1}^{\infty} \xi_n\right) = \sum_{n=1}^{\infty} M\xi_n. \tag{6}$$

The proof follows immediately from the condition

$$0 \le \eta_n = \sum_{k=1}^{n} \xi_k \uparrow \sum_{k=1}^{\infty} \xi_k$$

and Theorem 4. □

Corollary 2. If $\eta$ is a random variable with finite mathematical expectation ($M|\eta| < \infty$), and a sequence of events $A_n$, $n = 1, 2, \dots$, satisfies the condition $A_n \downarrow \varnothing$, then

$$\lim_{n\to\infty} \int_{A_n} \eta(\omega)P(d\omega) = \lim_{n\to\infty} M(\eta I_{A_n}) = 0. \tag{6'}$$

Proof. Since $M|\eta| < \infty$ (§1, finiteness property), we may introduce the new random variables

$$\eta_n' = |\eta| I_{\bar{A}_n}, \qquad \eta_n = |\eta| I_{A_n}.$$

In this case

$$|\eta| = \eta_n' + \eta_n, \qquad M|\eta| = M\eta_n + M\eta_n', \qquad 0 \le \eta_n' \uparrow |\eta|.$$

By the monotone convergence theorem, $\lim_{n\to\infty} M\eta_n' = M|\eta|$, therefore

$$\lim_{n\to\infty} M\eta_n = \lim_{n\to\infty} M(|\eta| I_{A_n}) = 0.$$

But $|M(\eta I_{A_n})| \le M(|\eta| I_{A_n})$, thereby $M(\eta I_{A_n}) \to 0$ ($n \to \infty$). □

We note that if, instead of the conditions of the corollary, $A_n \uparrow \Omega$ (so that $\bar{A}_n \downarrow \varnothing$), then the condition (6') is equivalent to the following property (6''):

$$\lim_{n\to\infty} \int_{A_n} \eta(\omega)P(d\omega) = \int_\Omega \eta(\omega)P(d\omega) = M\eta. \tag{6''}$$

Theorem 4 can be generalized as follows ([9], pp. 202–203).
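Theorem 4 can be illustrated numerically with the dyadic simple approximations of Lemma 1 (a sketch; the exponential target distribution, truncation levels, seed, and sample size are arbitrary choices, not from the text).

```python
import math, random, statistics

# Theorem 4 illustration: for xi >= 0 and the dyadic simple approximations
# xi_n = min(floor(2^n * xi) / 2^n, n), the means M(xi_n) increase to M(xi).
random.seed(8)
sample = [random.expovariate(1.0) for _ in range(100_000)]   # M(xi) = 1

def m_simple(n):
    # expectation of the simple variable xi_n, estimated on the same sample
    return statistics.mean(min(math.floor(2 ** n * x) / 2 ** n, n)
                           for x in sample)

approx = [m_simple(n) for n in range(1, 6)]
print(approx)   # nondecreasing, approaching M(xi) = 1
```

Because $\xi_n(\omega) \le \xi_{n+1}(\omega)$ pointwise, the printed sequence is nondecreasing, and its last terms are close to $M\xi = 1$.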
Theorem 4'. Let $\eta, \xi_1, \xi_2, \dots$ be random variables.

a) If for $n \ge 1$ the conditions $\xi_n \ge \eta$, $M\eta > -\infty$, $\xi_n \uparrow \xi$ hold, then $M\xi_n \uparrow M\xi$.

b) If for $n \ge 1$ the conditions $\xi_n \le \eta$, $M\eta < \infty$, $\xi_n \downarrow \xi$ hold, then $M\xi_n \downarrow M\xi$.

Theorem 5 (Lebesgue's theorem on majorized convergence). If for any elementary event $\omega \in \Omega$ the convergence

$$\lim_{n\to\infty} \xi_n(\omega) = \xi(\omega)$$

takes place, where $|\xi_n| \le \eta$, $M\eta < \infty$, then $M|\xi| < \infty$ and

$$\lim_{n\to\infty} M\xi_n = M\xi = M\lim_{n\to\infty} \xi_n. \tag{7}$$

Proof. First note that passing to the limit in $|\xi_n| \le \eta$ gives $|\xi| \le \eta$, hence $M|\xi| \le M\eta < \infty$. Let us introduce, for any $\varepsilon > 0$, the following sequence of events:

$$A_n = \left\{\omega : \sup_{m \ge n} |\xi_m(\omega) - \xi(\omega)| > \varepsilon\right\}, \qquad n = 1, 2, \dots.$$

In this case $A_n \downarrow \varnothing$. Further, we estimate the summands in the sum $\xi_n = \xi_n I_{A_n} + \xi_n I_{\bar{A}_n}$ as follows:

$$\xi_n I_{\bar{A}_n} \le (\xi + \varepsilon)I_{\bar{A}_n} = \xi I_{\bar{A}_n} + \varepsilon I_{\bar{A}_n} \le \xi I_{\bar{A}_n} + \varepsilon,$$

$$\xi_n I_{\bar{A}_n} \ge (\xi - \varepsilon)I_{\bar{A}_n} = \xi I_{\bar{A}_n} - \varepsilon I_{\bar{A}_n} \ge \xi I_{\bar{A}_n} - \varepsilon,$$

$$-\eta I_{A_n} \le \xi_n I_{A_n} \le \eta I_{A_n},$$

from which it follows that

$$\xi - \varepsilon - \xi I_{A_n} - \eta I_{A_n} \le \xi_n \le \xi + \varepsilon - \xi I_{A_n} + \eta I_{A_n},$$

and, since $|\xi| \le \eta$,

$$M\xi - \varepsilon - 2M(\eta I_{A_n}) \le M\xi_n \le M\xi + \varepsilon + 2M(\eta I_{A_n}).$$

Passing to the limit as $n \to \infty$ in the last inequalities and using Corollary 2, we can write

$$M\xi - \varepsilon \le \varliminf_{n\to\infty} M\xi_n \le \varlimsup_{n\to\infty} M\xi_n \le M\xi + \varepsilon,$$

and then, taking into account that $\varepsilon > 0$ is an arbitrary number, we obtain the desired relation $\lim_{n\to\infty} M\xi_n = M\xi$. □
Corollary 1. Under the hypotheses of Theorem 5,

$$M|\xi_n - \xi| \to 0 \qquad (n \to \infty).$$

Proof. To prove the corollary, it suffices to note that $|\xi_n - \xi| \le 2\eta$. In fact, if we denote $\eta_n = |\xi_n - \xi|$, then $\eta_n \le 2\eta$, $M(2\eta) = 2M\eta < \infty$ and $\eta_n \to 0$ ($n \to \infty$). Then, by Theorem 5,

$$M\eta_n = M|\xi_n - \xi| \to 0. \quad \square$$

Corollary 2. Let for the random variables $\eta, \xi, \xi_1, \xi_2, \dots$ the following conditions hold: $|\xi_n| \le \eta$, $\xi_n \to \xi$ (for all $\omega \in \Omega$) and $M\eta^p < \infty$ for some $p > 0$. Then $M|\xi|^p < \infty$ and

$$M|\xi_n - \xi|^p \to 0 \qquad (n \to \infty).$$

To prove the corollary, it suffices to note that

$$|\xi| \le \eta, \qquad |\xi_n - \xi|^p \le \left(|\xi| + |\xi_n|\right)^p \le (2\eta)^p. \quad \square$$

Remark 4. Under the conditions of Theorem 5, and hence under the conditions of Corollaries 1 and 2, the condition $\xi_n(\omega) \to \xi(\omega)$ (for any $\omega \in \Omega$) can be replaced by the weaker condition $\xi_n(\omega) \to \xi(\omega)$ (a.s.) (see [9], Ch. II, §6, Theorem 3).

1.4. Formulas for computing expectation
As we have already noted in Ch. III, a random variable defined on the probability space $(\Omega, \mathcal{F}, P)$ is (from the point of view of its probabilistic properties) completely described by its distribution law $P_\xi$; therefore it can be considered as the function $\xi = x$, $x \in R$, defined on the probability space $(R, \mathcal{B}(R), P_\xi)$. From what has been said, we can conclude that the mathematical expectation $M\xi = \int_\Omega \xi(\omega)P(d\omega)$ should practically not depend on the form of the function $\xi(\omega)$, $\omega\in\Omega$: it should depend only on the probability distribution $P_\xi$. Indeed, for a nonnegative random variable $\xi$ we defined its mathematical expectation through the limit of a sequence of mathematical expectations of simple random variables $0 \le \xi_n \uparrow \xi$ by the relation $M\xi = \lim_{n\to\infty} M\xi_n$ (§1, formula (2)). Then, for example, by Theorem 3 (case b)) of point 1.4.1, §1, Chap. III and by the properties of mathematical expectation,
$$M\xi_n = \sum_{k=1}^{n2^n}\frac{k-1}{2^n}P\Big\{\omega : \frac{k-1}{2^n} < \xi(\omega) \le \frac{k}{2^n}\Big\} + nP\{\omega : \xi(\omega) > n\}$$
$$= \sum_{k=1}^{n2^n}\frac{k-1}{2^n}P_\xi\Big(\frac{k-1}{2^n}, \frac{k}{2^n}\Big] + n\big(1 - P_\xi(-\infty, n]\big)$$
$$= \sum_{k=1}^{n2^n}\frac{k-1}{2^n}\Big[F_\xi\Big(\frac{k}{2^n}\Big) - F_\xi\Big(\frac{k-1}{2^n}\Big)\Big] + n\big(1 - F_\xi(n)\big).$$
The limit of the first sum in this sequence is the Lebesgue integral $\int_\Omega \xi(\omega)P(d\omega)$, and the limits of the sums in the second and third rows are, respectively, the Lebesgue integral $(L)\int_0^\infty xP_\xi(dx)$ and the Lebesgue–Stieltjes integral $(L\text{-}S)\int_0^\infty x\,dF_\xi(x)$ (see Remark 5 below). Further, applying these arguments to the random variables $\xi^+$ and $\xi^-$, we would obtain for the mathematical expectation of a random variable $\xi = \xi^+ - \xi^-$ a formula that depends only on the distribution (function) of the random variable $\xi$:
$$M\xi = \int_{-\infty}^{\infty} x\,dF_\xi(x). \qquad (8)$$
If $M|\xi| < \infty$, then the right-hand side of (8) can be understood as an absolutely convergent improper Riemann–Stieltjes integral.

Our goal now is to derive a general formula for calculating the mathematical expectation of a random variable and to indicate its explicit form in the two cases considered in this textbook: for discrete and (absolutely) continuous random variables.

Theorem 5 (The formula for the change of variables in the Lebesgue integral). Let $\xi = \xi(\omega)$ be a random variable defined on the probability space $(\Omega, \mathcal{F}, P)$, and let $g = g(x)$ be a Borel function. Then, if for $A \in \mathcal{B}(R)$ there exists at least one of the integrals $\int_A g(x)P_\xi(dx)$ or $\int_{\xi^{-1}(A)} g(\xi(\omega))P(d\omega)$, then there exists the second integral and they are equal:
$$\int_A g(x)P_\xi(dx) = \int_{\xi^{-1}(A)} g(\xi(\omega))P(d\omega). \qquad (9)$$
In the special case when $A = R$,
$$Mg(\xi) = \int_\Omega g(\xi(\omega))P(d\omega) = \int_R g(x)P_\xi(dx). \qquad (9')$$
Proof. Suppose first that $g(x) = I_B(x)$, $B \in \mathcal{B}(R)$. In this case $g(\xi(\omega)) = I_B(\xi(\omega)) = I_{\xi^{-1}(B)}(\omega)$, and for $A \in \mathcal{B}(R)$
$$\int_A g(x)P_\xi(dx) = \int_A I_B(x)P_\xi(dx) = \int_{AB} P_\xi(dx) = P_\xi(AB) = P\{\xi^{-1}(AB)\}$$
$$= \int_{\xi^{-1}(AB)} P(d\omega) = \int_{\xi^{-1}(A)} I_{\xi^{-1}(B)}(\omega)P(d\omega) = \int_{\xi^{-1}(A)} g(\xi(\omega))P(d\omega).$$
The validity of formula (9) for any simple, including any nonnegative simple, function $g(x)$ follows from the relation just proved. The validity of the theorem for any nonnegative Borel function $g(x)$ follows from the monotone convergence theorem. In the general case, it suffices to write the function $g(x)$ in the form $g(x) = g^+(x) - g^-(x)$, take into account the validity of the change-of-variables formula for the integrals of $g^+(x)$ and $g^-(x)$, and note that, for example, if $\int_A g^+(x)P_\xi(dx) < \infty$, then $\int_{\xi^{-1}(A)} g^+(\xi(\omega))P(d\omega) < \infty$, i.e. from the existence of the integral $\int_A g(x)P_\xi(dx)$ it follows that there exists the integral $\int_{\xi^{-1}(A)} g(\xi(\omega))P(d\omega)$. □
Remark 5. The distribution law $P_\xi$ of a random variable and its distribution function $F_\xi(x) = P\{\xi \le x\}$ uniquely determine each other (Chap. III, §1); therefore the Lebesgue integral $\int_R g(x)P_\xi(dx)$ is usually denoted by $\int_R g(x)F_\xi(dx)$ or $\int_R g(x)\,dF_\xi(x)$. In this way,
$$\int_R g(x)P_\xi(dx) = \int_R g(x)\,dF_\xi(x). \qquad (9'')$$
The last integral is called the Lebesgue–Stieltjes integral (or simply Stieltjes integral), taken with respect to the measure corresponding to the distribution function $F_\xi(x)$.

Corollary. 1. If $\xi$ is an (absolutely) continuous random variable with the distribution density $f_\xi(x)$, and $g(x)$ is a Borel function, then the mathematical expectation of the random variable $\eta = g(\xi)$ is calculated by the formula
$$Mg(\xi) = \int_{-\infty}^{\infty} g(x)f_\xi(x)\,dx, \qquad (10)$$
where the integral on the right-hand side is (in the general case) understood as the Lebesgue integral of the function $g(x)f_\xi(x)$ with respect to the Lebesgue measure on the line.
2. If $\xi$ is a discrete random variable that takes the values $x_1, x_2, \dots$, then the mathematical expectation of the random variable $\eta = g(\xi)$ is calculated by the formula
$$Mg(\xi) = \sum_i g(x_i)P\{\xi = x_i\}. \qquad (11)$$
Proof. 1. Let $f_\xi(x)$ be the density function which corresponds to the distribution function $F_\xi(x)$, i.e.
$$F_\xi(x) = \int_{-\infty}^{x} f_\xi(y)\,dy,$$
where the integral is understood as the Lebesgue integral over the set $(-\infty, x]$. We first consider the case when the function $g$ is the indicator of a Borel set: $g(x) = I_B(x)$, $B \in \mathcal{B}(R)$. Then (by the definition of the Lebesgue integral)
$$\int_{-\infty}^{\infty} g(x)f_\xi(x)\,dx = \int_{-\infty}^{\infty} I_B(x)f_\xi(x)\,dx = \int_B f_\xi(x)\,dx = P_\xi(B) = P\{\xi\in B\} = MI_{\{\xi\in B\}} = Mg(\xi).$$
(The validity of the formula $P\{\xi\in B\} = \int_B f_\xi(x)\,dx$ follows from the formula
$$F_\xi(b) - F_\xi(a) = P_\xi\big((a, b]\big) = \int_a^b f_\xi(x)\,dx$$
and the theorem on the extension of a probability measure; see Chap. III, §1, p. 1.2, formula (13').)

Further, proceeding in the same way as in the proof of Theorem 5, we verify the validity of (10) in the general case.

2. For discrete random variables, formula (11) is a direct consequence of the definition of the Lebesgue–Stieltjes integral; it suffices to note only that
$$\Delta F_\xi(x_i) = F_\xi(x_i) - F_\xi(x_i - 0) = P\{\xi = x_i\}. \qquad □$$
Remark 6. Formulas for calculating the mathematical expectation similar to the last ones are also valid in the more general case, when an $m$-dimensional Borel function $g = g(x_1, x_2, \dots, x_m)$ maps $R^m$ into $R^1$. Namely, if a multidimensional random variable (random vector) $\xi = (\xi_1, \xi_2, \dots, \xi_m)$ has a joint distribution function $F_{\xi_1,\xi_2,\dots,\xi_m}(x_1, x_2, \dots, x_m)$, then the following formula for calculating the mathematical expectation takes place:
$$Mg(\xi_1, \xi_2, \dots, \xi_m) = \int_{-\infty}^{\infty}\dots\int_{-\infty}^{\infty} g(x_1, x_2, \dots, x_m)\,dF_{\xi_1,\dots,\xi_m}(x_1, \dots, x_m). \qquad (12)$$
In particular, if $(\xi_1, \xi_2, \dots, \xi_m)$ is a multidimensional (absolutely) continuous random variable, then the formula
$$Mg(\xi_1, \dots, \xi_m) = \int_{-\infty}^{\infty}\dots\int_{-\infty}^{\infty} g(x_1, \dots, x_m)f_{\xi_1,\dots,\xi_m}(x_1, \dots, x_m)\,dx_1\,dx_2\dots dx_m \qquad (13)$$
takes place; here $f_{\xi_1,\dots,\xi_m}(x_1, \dots, x_m)$ is the joint distribution density of $\xi_1, \dots, \xi_m$. If $\xi = (\xi_1, \dots, \xi_m)$ is a multidimensional discrete random variable (discrete vector), then the following formula takes place:
$$Mg(\xi_1, \xi_2, \dots, \xi_m) = \sum_{i,j,\dots,k} g(x_i, y_j, \dots, z_k)P\{\xi_1 = x_i, \xi_2 = y_j, \dots, \xi_m = z_k\}. \qquad (14)$$
The proofs of formulas (12), (13), (14) can be carried out by analogy with the proofs of the formulas (9'), (10), (11) corresponding to them in the one-dimensional case. We especially note that for the (absolutely) continuous random variables considered in our textbook, the conditions under which the integrals (10), (13) can be regarded as Riemann integrals will always be satisfied.

Remark 7. When calculating the mathematical expectations $M\xi$, $Mg(\xi)$, techniques that circumvent the above formulas of direct computation are used very often, especially since it is not uncommon for the distribution laws of random variables to be very complex, or not written out explicitly at all. One of these methods involves representing $\xi$ as a sum of simpler random variables (for example, indicators) and using the linearity property of the mathematical expectation (below we give several examples of the application of this method). Another method of calculating mathematical expectations is associated with the use of the so-called generating and characteristic functions of random variables (these questions will be considered in the chapters devoted to those functions).

1.4.1. Fubini's theorem and some of its applications
The Fubini theorem, given below (without proof), plays the same role in our discipline as the well-known theorem from analysis on the reduction of a double Riemann integral to a repeated one. This theorem will be used in the next section to justify the formulas for finding marginal densities, the independence criterion for continuous random variables in terms of densities, and some other facts.

Suppose that there are the spaces $(\Omega_1, \mathcal{F}_1, \mu_1)$ and $(\Omega_2, \mathcal{F}_2, \mu_2)$, where $\mu_1, \mu_2$ are some finite measures. We form the new space $(\Omega, \mathcal{F}, \mu)$ as follows:
$$\Omega = \Omega_1\times\Omega_2 \ \text{(Cartesian product)},$$
$$\mathcal{F} = \mathcal{F}_1\otimes\mathcal{F}_2 = \sigma\{A_1\times A_2 : A_1\in\mathcal{F}_1,\ A_2\in\mathcal{F}_2\}$$
(the $\sigma$-algebra generated by the rectangles $A_1\times A_2$), and we define the measure $\mu = \mu_1\times\mu_2$ (the product measure) by the relation
$$(\mu_1\times\mu_2)(A_1\times A_2) = \mu_1(A_1)\mu_2(A_2), \qquad A_1\in\mathcal{F}_1,\ A_2\in\mathcal{F}_2.$$
The existence of such a measure follows from the proof of the Fubini theorem below.

Theorem 1 (Fubini theorem). Let $\xi = \xi(\omega_1,\omega_2)$ be an $\mathcal{F}_1\otimes\mathcal{F}_2$-measurable function, integrable with respect to the measure $\mu_1\times\mu_2$:
$$\int_{\Omega_1\times\Omega_2} |\xi(\omega_1,\omega_2)|\,d(\mu_1\times\mu_2) < \infty.$$
Then the integrals $\int_{\Omega_1}\xi(\omega_1,\omega_2)\,\mu_1(d\omega_1)$ and $\int_{\Omega_2}\xi(\omega_1,\omega_2)\,\mu_2(d\omega_2)$:

a) are defined for almost all $\omega_2$ and $\omega_1$ (respectively);

b) are (respectively) $\mathcal{F}_2$- and $\mathcal{F}_1$-measurable functions, and
$$\mu_2\Big\{\omega_2 : \int_{\Omega_1}|\xi(\omega_1,\omega_2)|\,\mu_1(d\omega_1) = \infty\Big\} = 0, \qquad \mu_1\Big\{\omega_1 : \int_{\Omega_2}|\xi(\omega_1,\omega_2)|\,\mu_2(d\omega_2) = \infty\Big\} = 0;$$

c) the following formulas for reducing the integral with respect to the measure $\mu_1\times\mu_2$ to repeated integrals with respect to the measures $\mu_1$ and $\mu_2$ take place:
$$\int_{\Omega_1\times\Omega_2}\xi(\omega_1,\omega_2)\,d(\mu_1\times\mu_2) = \int_{\Omega_1}\Big[\int_{\Omega_2}\xi(\omega_1,\omega_2)\,\mu_2(d\omega_2)\Big]\mu_1(d\omega_1) = \int_{\Omega_2}\Big[\int_{\Omega_1}\xi(\omega_1,\omega_2)\,\mu_1(d\omega_1)\Big]\mu_2(d\omega_2).$$

Corollary 1. If one of the conditions
$$\int_{\Omega_1}\Big[\int_{\Omega_2}|\xi(\omega_1,\omega_2)|\,\mu_2(d\omega_2)\Big]\mu_1(d\omega_1) < \infty, \qquad \int_{\Omega_2}\Big[\int_{\Omega_1}|\xi(\omega_1,\omega_2)|\,\mu_1(d\omega_1)\Big]\mu_2(d\omega_2) < \infty$$
is satisfied, then the other one is also satisfied, and the assertions of the Fubini theorem hold.
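The reduction of a double integral to repeated integrals (in either order) can be illustrated numerically. A minimal sketch, under an assumed example: the joint density $f(x,y) = x+y$ on the unit square, which integrates to 1 and has the marginal $f_\xi(x) = x + 1/2$.

```python
# Numeric illustration of Fubini's reduction to repeated integrals, and of the
# marginal obtained by integrating out one variable, for the assumed joint
# density f(x, y) = x + y on the unit square.
def f(x, y):
    return x + y

STEPS = 1000
H = 1.0 / STEPS
grid = [(j + 0.5) * H for j in range(STEPS)]        # midpoint grid on [0, 1]

inner_y = lambda x: sum(f(x, y) for y in grid) * H  # integral over y (marginal in x)
inner_x = lambda y: sum(f(x, y) for x in grid) * H  # integral over x

double_xy = sum(inner_y(x) for x in grid) * H       # iterate over y, then x
double_yx = sum(inner_x(y) for y in grid) * H       # iterate over x, then y
print(double_xy, double_yx)                         # both ≈ 1
print(inner_y(0.3))                                 # marginal at x = 0.3, ≈ 0.8
```

Both iteration orders give the same value, as the theorem asserts for integrable functions.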
Corollary 2. Let $(\xi,\eta)$ be a two-dimensional random variable with a joint distribution density $f_{\xi,\eta}(x,y)$, i.e. for $(\xi,\eta)$ there is a nonnegative $\mathcal{B}(R^2)$-measurable function $f_{\xi,\eta}(x,y)$ which satisfies the condition
$$P\{(\xi,\eta)\in B\} = \int_B f_{\xi,\eta}(x,y)\,dx\,dy, \qquad B\in\mathcal{B}(R^2).$$
Then there exist the one-dimensional densities $f_\xi(x)$ and $f_\eta(y)$ of the random variables $\xi$ and $\eta$ (respectively), and they are defined through the joint distribution density by the formulas
$$f_\xi(x) = \int_{-\infty}^{\infty} f_{\xi,\eta}(x,y)\,dy, \qquad f_\eta(y) = \int_{-\infty}^{\infty} f_{\xi,\eta}(x,y)\,dx. \qquad (12)$$
Proof. Really, if $A\in\mathcal{B}(R)$, then by the Fubini theorem
$$P\{\xi\in A\} = P\{(\xi,\eta)\in A\times R\} = \int_{A\times R} f_{\xi,\eta}(x,y)\,dx\,dy = \int_A\Big[\int_R f_{\xi,\eta}(x,y)\,dy\Big]dx,$$
but this proves both the existence of the (marginal) density $f_\xi(x)$ of the random variable $\xi$ and the first formula in (12). The existence of $f_\eta(y)$ and the validity of the second formula in (12) are proved similarly. □

We now prove the following assertion, which gives a condition (criterion) for the independence of continuous random variables in terms of distribution densities, and also strictly mathematically justifies assertion 1 of Theorem 3 from Ch. III, §2, p. 2.2.

Corollary 3. Suppose that for the random variables $\xi$ and $\eta$ there exists their joint distribution density $f_{\xi,\eta}(x,y)$. Then, in order for the random variables $\xi$ and $\eta$ to be independent, it is necessary and sufficient that their joint distribution density be equal (almost everywhere with respect to the two-dimensional Lebesgue measure) to the product of their (marginal) distribution densities:
$$f_{\xi,\eta}(x,y) = f_\xi(x)f_\eta(y). \qquad (13)$$
Proof. In Ch. III, §2, Theorem 3 we proved that in order for the random variables $\xi$ and $\eta$ to be independent it is necessary and sufficient that
$$F_{\xi,\eta}(x,y) = F_\xi(x)F_\eta(y), \qquad (x,y)\in R^2, \qquad (14)$$
i.e. that the joint distribution function $F_{\xi,\eta}(x,y)$ of the random variables $\xi$ and $\eta$ be equal to the product of their marginal (one-dimensional) distribution functions $F_\xi(x)$ and $F_\eta(y)$.

Further, if (13) holds, then by the Fubini theorem
$$F_{\xi,\eta}(x,y) = \int_{(-\infty,x]\times(-\infty,y]} f_{\xi,\eta}(u,v)\,du\,dv = \int_{(-\infty,x]\times(-\infty,y]} f_\xi(u)f_\eta(v)\,du\,dv = \int_{(-\infty,x]} f_\xi(u)\,du\int_{(-\infty,y]} f_\eta(v)\,dv = F_\xi(x)F_\eta(y),$$
i.e. (14) holds and $\xi, \eta$ are independent. Conversely, if $\xi$ and $\eta$ are independent and $f_{\xi,\eta}(x,y)$ is their joint distribution density, then again, by the Fubini theorem,
$$\int_{(-\infty,x]\times(-\infty,y]} f_{\xi,\eta}(u,v)\,du\,dv = \int_{(-\infty,x]} f_\xi(u)\,du\int_{(-\infty,y]} f_\eta(v)\,dv = \int_{(-\infty,x]\times(-\infty,y]} f_\xi(u)f_\eta(v)\,du\,dv.$$
Whence, applying the Carathéodory theorem on the continuation of probability, for any $B\in\mathcal{B}(R^2)$ we obtain the equality
$$\int_B f_{\xi,\eta}(x,y)\,dx\,dy = \int_B f_\xi(x)f_\eta(y)\,dx\,dy.$$
Finally, since $B$ is an arbitrary Borel set, by the mathematical expectation property 5° from p. 1.2 we verify the validity of (13). □

1.5. Variance
If we take as the Borel function $g(x)$ the functions $g(x) = x^n$, $g(x) = |x|^n$, $g(x) = (x - M\xi)^n$, $g(x) = |x - M\xi|^n$, then the mathematical expectations $Mg(\xi)$, i.e.
$$M\xi^n, \qquad M|\xi|^n, \qquad M(\xi - M\xi)^n, \qquad M|\xi - M\xi|^n,$$
are called (respectively) the $n$th moment (or the moment of the $n$th order); the $n$th absolute moment (or absolute moment of the $n$th order); the $n$th central moment (or the central moment of the $n$th order) and the $n$th central absolute moment (or the central absolute moment of the $n$th order) of the random variable $\xi$.

The second central moment of a random variable is called its variance (dispersion) and denoted by $D\xi$. So,
$$D\xi = M(\xi - M\xi)^2. \qquad (15)$$
Variance properties:

1°. $D\xi \ge 0$, because $(\xi - M\xi)^2 \ge 0$ implies $D\xi = M(\xi - M\xi)^2 \ge 0$.

The value $\sigma = \sqrt{D\xi}$ is called the mean-square deviation (sometimes the standard deviation).

2°. For the constants $a, b$:
$$D(a + b\xi) = b^2D\xi; \quad\text{in particular,}\quad D(a) = 0, \qquad D(b\xi) = b^2D\xi. \qquad (16)$$
The proof directly follows from the definition.

3°. If $D\xi = 0$, then $P\{\omega : \xi(\omega) = M\xi\} = 1$, i.e. $\xi(\omega) = M\xi$ (a.s.). Really, as $(\xi - M\xi)^2 \ge 0$ and $M(\xi - M\xi)^2 = D\xi = 0$, then, by the property of mathematical expectation 4° (see p. 1.2, Theorem 2), $\xi - M\xi = 0$ (a.s.), i.e. $\xi = M\xi$ (a.s.).

4°. A variance can be calculated by the formula
$$D\xi = M\xi^2 - (M\xi)^2. \qquad (17)$$
The proof of this formula follows from the relations
$$D\xi = M(\xi - M\xi)^2 = M\big(\xi^2 - 2\xi M\xi + (M\xi)^2\big) = M\xi^2 - 2(M\xi)(M\xi) + (M\xi)^2 = M\xi^2 - (M\xi)^2.$$
Note that finding a variance by formula (17) often leads to the goal faster than finding it by formula (15).

5°. For any random variables $\xi$ and $\eta$
$$D(\xi\pm\eta) = D\xi + D\eta \pm 2M\big[(\xi - M\xi)(\eta - M\eta)\big]. \qquad (18)$$
Really, by definition,
$$D(\xi\pm\eta) = M\big[(\xi\pm\eta) - (M\xi\pm M\eta)\big]^2 = M\big[(\xi - M\xi)\pm(\eta - M\eta)\big]^2$$
$$= M(\xi - M\xi)^2 + M(\eta - M\eta)^2 \pm 2M\big[(\xi - M\xi)(\eta - M\eta)\big] = D\xi + D\eta \pm 2M\big[(\xi - M\xi)(\eta - M\eta)\big].$$
The mathematical expectation $M[(\xi - M\xi)(\eta - M\eta)]$ on the right-hand side of (18) is called the covariance of the random variables $\xi$ and $\eta$ and denoted by $\operatorname{cov}(\xi,\eta)$. So,
$$\operatorname{cov}(\xi,\eta) = M\big[(\xi - M\xi)(\eta - M\eta)\big] = M\big(\xi\eta - \xi M\eta - \eta M\xi + M\xi M\eta\big) = M\xi\eta - M\xi M\eta, \qquad (19)$$
$$\operatorname{cov}(\xi,\xi) = D\xi.$$
The following properties of covariance immediately follow from its definition (below $a, b, c$ are constants):
$$\operatorname{cov}(\xi,\eta) = \operatorname{cov}(\eta,\xi), \qquad \operatorname{cov}(a\xi, b\eta) = ab\operatorname{cov}(\xi,\eta), \qquad \operatorname{cov}(a\xi + b\eta + c,\ \zeta) = a\operatorname{cov}(\xi,\zeta) + b\operatorname{cov}(\eta,\zeta). \qquad (20)$$
Taking into account the definition of $\operatorname{cov}(\xi,\eta)$, the relation (18) can now be rewritten as follows:
$$D(\xi\pm\eta) = D\xi + D\eta \pm 2\operatorname{cov}(\xi,\eta). \qquad (21)$$
It is easy to show that for all random variables $\xi_1, \xi_2, \dots, \xi_n$ the variance of the sum is equal to
$$D\Big(\sum_{i=1}^{n}\xi_i\Big) = \sum_{i,j=1}^{n}\operatorname{cov}(\xi_i,\xi_j) = \sum_{i=1}^{n} D\xi_i + \sum_{i\ne j}\operatorname{cov}(\xi_i,\xi_j) = \sum_{i=1}^{n} D\xi_i + 2\sum_{i<j}\operatorname{cov}(\xi_i,\xi_j). \qquad (22)$$
6°. If $\xi$ and $\eta$ are independent random variables, then
$$\operatorname{cov}(\xi,\eta) = 0, \qquad D(\xi\pm\eta) = D\xi + D\eta.$$
Really, in this case, by the formula (19) and the multiplicative property of the expectation,
$$\operatorname{cov}(\xi,\eta) = M\xi M\eta - M\xi M\eta = 0;$$
therefore (by the formula (21)) $D(\xi\pm\eta) = D\xi + D\eta$.

Similarly, for a finite number of pairwise independent random variables $\xi_1, \xi_2, \dots, \xi_n$, due to the relations $\operatorname{cov}(\xi_i,\xi_j) = 0$ $(i\ne j)$, the variance of their sum is equal to the sum of their variances:
$$D\Big(\sum_{k=1}^{n}\xi_k\Big) = \sum_{k=1}^{n} D\xi_k. \qquad (22')$$
7°. If each of a finite number of random variables $\xi_1, \xi_2, \dots, \xi_n$ does not depend on the sum of the previous random variables, then
$$D(\xi_1 + \xi_2 + \dots + \xi_n) = D\xi_1 + D\xi_2 + \dots + D\xi_n.$$
Really, in the case under consideration the sum $\xi_1 + \dots + \xi_{n-1}$ and $\xi_n$ are independent; $\xi_1 + \dots + \xi_{n-2}$ and $\xi_{n-1}$ are independent; …; $\xi_1$ and $\xi_2$ are independent, so, by the previous property 6°, we can write:
$$D(\xi_1 + \dots + \xi_{n-1} + \xi_n) = D(\xi_1 + \dots + \xi_{n-1}) + D\xi_n = D(\xi_1 + \dots + \xi_{n-2}) + D\xi_{n-1} + D\xi_n = \dots = D\xi_1 + D\xi_2 + \dots + D\xi_n.$$
If the covariance of random variables $\xi$ and $\eta$ is equal to zero, i.e. $\operatorname{cov}(\xi,\eta) = 0$, then such random variables are called uncorrelated random variables. We obtain from this definition that the properties 6° are also valid for uncorrelated random variables.

Remark 8. By the property 6° independent random variables are necessarily uncorrelated, but the converse is, generally speaking, incorrect: the independence of random variables does not always follow from their uncorrelatedness.

Example. A random variable $\xi$ takes the values $-\dfrac{\pi}{2}$, $0$, $\dfrac{\pi}{2}$ with the same probability $\dfrac13$. Let us define the new random variables $\xi_1 = \sin\xi$, $\xi_2 = \cos\xi$. In this case $\xi_1$ and $\xi_2$ are uncorrelated, but they are functionally dependent ($\xi_1^2 + \xi_2^2 = 1$).

If the random variables $\xi$ and $\eta$ have variances $D\xi > 0$, $D\eta > 0$, then the value $\rho = \rho(\xi,\eta)$ defined by the formula
$$\rho = \rho(\xi,\eta) = \frac{\operatorname{cov}(\xi,\eta)}{\sqrt{D\xi\,D\eta}} \qquad (23)$$
is called the correlation coefficient of the random variables $\xi$ and $\eta$. If $\xi$ and $\eta$ are uncorrelated or independent, then $\rho(\xi,\eta) = 0$, because in these cases $\operatorname{cov}(\xi,\eta) = 0$.
By the Cauchy–Bunyakovskii inequality (this inequality will be proved in the next subsection 1.6),
$$|\operatorname{cov}(\xi,\eta)| = \big|M(\xi - M\xi)(\eta - M\eta)\big| \le M\big|(\xi - M\xi)(\eta - M\eta)\big| \le \sqrt{M(\xi - M\xi)^2\,M(\eta - M\eta)^2} = \sqrt{D\xi\,D\eta}. \qquad (24)$$
It implies that always $|\rho(\xi,\eta)| \le 1$.

If $P\{\xi = a\eta + b\} = 1$, where $a\ne 0$ and $b$ is a constant, then $|\rho| = 1$; and conversely, if $|\rho| = 1$, then the random variables are linearly dependent with probability 1.

Really, if $a\ne 0$, $b$ is a constant and $\xi = a\eta + b$ (a.s.), then $D\xi = D(a\eta + b) = a^2D\eta$ and, by the property (20),
$$\operatorname{cov}(\xi,\eta) = \operatorname{cov}(a\eta + b,\ \eta) = aD\eta.$$
This implies
$$\rho(\xi,\eta) = \frac{aD\eta}{\sqrt{D\eta\cdot a^2D\eta}} = \frac{a}{|a|} = \pm1, \quad\text{i.e.}\quad |\rho| = 1.$$
Conversely, let $|\rho| = 1$. Let us introduce the new random variables
$$\tilde\xi = \frac{\xi - M\xi}{\sqrt{D\xi}}, \qquad \tilde\eta = \frac{\eta - M\eta}{\sqrt{D\eta}}.$$
In this case $M\tilde\xi = M\tilde\eta = 0$, $D\tilde\xi = D\tilde\eta = 1$, therefore
$$D(\tilde\xi\pm\tilde\eta) = D\tilde\xi + D\tilde\eta \pm 2\rho(\xi,\eta) = 2(1\pm\rho).$$
From this we obtain, by the property of variance 3°, that: if $\rho = 1$, then $\tilde\xi - \tilde\eta = 0$ (a.s.); if $\rho = -1$, then $\tilde\xi + \tilde\eta = 0$ (a.s.). Then, by using the property of expectation 4° from p. 1.2, we verify the equivalence of the last two relations to the following:

if $\rho = 1$, then $\xi = a\eta + b_1$ (a.s.); if $\rho = -1$, then $\xi = -a\eta + b_2$ (a.s.),

where
$$a = \sqrt{\frac{D\xi}{D\eta}} > 0, \qquad b_1 = M\xi - aM\eta, \qquad b_2 = M\xi + aM\eta.$$
This justifies the fact that the correlation coefficient is usually considered as a measure of the (linear) dependence of random variables: if $\rho(\xi,\eta)$ is sufficiently close to zero, then it is reasonable to assume that the random variables are nearly independent; if $|\rho(\xi,\eta)|$ is sufficiently close to $1$, then it is reasonable to assume that between the random variables there is an (a.s.) linear relation.

We now state the results obtained above in the form of a theorem.

Theorem 8. 1. From the independence of random variables their uncorrelatedness always follows, but the converse is not true: from the uncorrelatedness of random variables their independence does not always follow.

2. The correlation coefficient of independent (also of uncorrelated) random variables is always zero.

3. $|\rho(\xi,\eta)| \le 1$.

4. a) If $|\rho(\xi,\eta)| = 1$, then the random variables $\xi, \eta$ are a.s. linearly dependent. b) If the random variables $\xi, \eta$ are a.s. linearly dependent, then $|\rho(\xi,\eta)| = 1$.
1.6. Inequalities related to mathematical expectation

The inequalities proved in this subsection are very important and are often used both in probability theory and in analysis.

The Cauchy–Bunyakovsky inequality. Let the random variables $\xi, \eta$ be such that $M\xi^2 < \infty$, $M\eta^2 < \infty$. Then $M|\xi\eta| < \infty$ and
$$M|\xi\eta| \le \sqrt{M\xi^2\,M\eta^2}. \qquad (25)$$
Proof. Let $M\xi^2 > 0$, $M\eta^2 > 0$. Let us introduce the new random variables
$$\tilde\xi = \frac{\xi}{\sqrt{M\xi^2}}, \qquad \tilde\eta = \frac{\eta}{\sqrt{M\eta^2}}.$$
In this case $M\tilde\xi^2 = M\tilde\eta^2 = 1$, and from the inequality $\tilde\xi^2 + \tilde\eta^2 \ge 2|\tilde\xi\tilde\eta|$ we get that
$$2M|\tilde\xi\tilde\eta| \le M\tilde\xi^2 + M\tilde\eta^2 = 2,$$
i.e. $M|\tilde\xi\tilde\eta| \le 1$. It follows that $M|\xi\eta| \le \sqrt{M\xi^2\,M\eta^2}$, i.e. the inequality (25) holds.
If $M\xi^2 = 0$ (or $M\eta^2 = 0$), then $\xi = 0$ (a.s.) (or $\eta = 0$ (a.s.)), from which we obtain that $\xi\eta = 0$ (a.s.). Hence $M|\xi\eta| = 0$, and the required inequality (25) becomes an equality. □

Inequality of Jensen. Let $g = g(x)$ be a convex downward function, and let $\xi$ be a random variable with $M|\xi| < \infty$. Then
$$g(M\xi) \le Mg(\xi). \qquad (26)$$
If $g = g(x)$ is a convex upward function, then
$$g(M\xi) \ge Mg(\xi). \qquad (26')$$
Proof. As is known from analysis, if $g = g(x)$ is a convex downward function, then for each $x_0\in R$ there is a number $\lambda(x_0)$ such that for all $x\in R$
$$g(x) \ge g(x_0) + (x - x_0)\lambda(x_0). \qquad (27)$$
Assuming that $x = \xi$, $x_0 = M\xi$, the last inequality can be rewritten in the form
$$g(\xi) \ge g(M\xi) + (\xi - M\xi)\lambda(M\xi),$$
hence,
$$Mg(\xi) \ge g(M\xi) + M(\xi - M\xi)\lambda(M\xi) = g(M\xi).$$
For the case of a convex upward function $g = g(x)$ it suffices to note that the last inequality holds in the opposite direction. □

Lyapunov inequality. For any numbers $0 < s < t$ and any random variable $\xi$
$$\big(M|\xi|^s\big)^{1/s} \le \big(M|\xi|^t\big)^{1/t}. \qquad (29)$$
Proof. We introduce the notation $r = \dfrac{t}{s} > 1$. Then, assuming that $\eta = |\xi|^s$ and applying Jensen's inequality to the (convex downward) function $g(x) = x^r$, we find that $(M\eta)^r \le M\eta^r$, i.e.
$$\big(M|\xi|^s\big)^{t/s} \le M|\xi|^t,$$
which proves the Lyapunov inequality (29). □

The Lyapunov inequality implies the following chain of inequalities between absolute moments:
$$M|\xi| \le \big(M|\xi|^2\big)^{1/2} \le \big(M|\xi|^3\big)^{1/3} \le \dots \le \big(M|\xi|^n\big)^{1/n}. \qquad (30)$$
For the proof of (30) it is sufficient to write out the Lyapunov inequality successively for $s = 1, t = 2$; $s = 2, t = 3$; …; $s = n-1, t = n$.

Corollary. If for a natural number $n$ the $n$th absolute moment of a random variable $\xi$ is finite (i.e. $M|\xi|^n < \infty$), then all absolute moments of smaller orders are also finite:
$$M|\xi|^n < \infty \ \Rightarrow\ M|\xi|^k < \infty \quad (k = 1, 2, \dots, n-1).$$
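The moment chain (30) is easy to observe numerically. A minimal sketch, on an assumed three-point distribution chosen for illustration:

```python
# Numeric check of the chain (30): the norms (M|xi|^n)^(1/n) increase in n.
# The three-point distribution below is an illustrative assumption.
vals, probs = [1.0, 2.0, 5.0], [0.5, 0.3, 0.2]
abs_moment = lambda r: sum(p * abs(v)**r for v, p in zip(vals, probs))

norms = [abs_moment(n) ** (1.0 / n) for n in range(1, 6)]
print(norms)  # an increasing sequence, starting at M|xi| = 2.1
```

Equality throughout the chain would hold only for a degenerate (constant) $|\xi|$.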
The Hölder inequality. Suppose that for $1 < p < \infty$, $1 < q < \infty$ with $\dfrac1p + \dfrac1q = 1$ the following conditions take place: $M|\xi|^p < \infty$, $M|\eta|^q < \infty$. Then $M|\xi\eta| < \infty$ and
$$M|\xi\eta| \le \big(M|\xi|^p\big)^{1/p}\big(M|\eta|^q\big)^{1/q}. \qquad (31)$$
If $p = q = 2$, then from the Hölder inequality (31), as a special case, we obtain the Cauchy–Bunyakovsky inequality (25).

Minkowski inequality. If $M|\xi|^p < \infty$, $M|\eta|^p < \infty$, $1 \le p < \infty$, then $M|\xi + \eta|^p < \infty$ and
$$\big(M|\xi + \eta|^p\big)^{1/p} \le \big(M|\xi|^p\big)^{1/p} + \big(M|\eta|^p\big)^{1/p}. \qquad (32)$$
We do not give here the proofs of the last two inequalities (31) and (32); they are given, for example, in [1], [2], [9].

Chebyshev inequality. If $\xi$ is a nonnegative random variable $(\xi \ge 0)$, then for any $\varepsilon > 0$
$$P\{\xi \ge \varepsilon\} \le \frac{M\xi}{\varepsilon}. \qquad (33)$$
Proof. For $\xi \ge 0$ we can write:
$$\xi = \xi I_{\{\xi\ge\varepsilon\}} + \xi I_{\{\xi<\varepsilon\}} \ge \xi I_{\{\xi\ge\varepsilon\}} \ge \varepsilon I_{\{\xi\ge\varepsilon\}}.$$
Whence, on the basis of the nonnegativity (monotonicity) of the mathematical expectation, we have
$$M\xi \ge M\big(\varepsilon I_{\{\xi\ge\varepsilon\}}\big) = \varepsilon MI_{\{\xi\ge\varepsilon\}} = \varepsilon P\{\xi \ge \varepsilon\},$$
and this is the necessary inequality (33). □

Corollaries. For any random variable $\xi$ and any $\varepsilon > 0$ the following inequalities take place:
$$P\{|\xi| \ge \varepsilon\} \le \frac{M|\xi|}{\varepsilon}, \qquad (34_1)$$
$$P\{|\xi| \ge \varepsilon\} \le \frac{M\xi^{2k}}{\varepsilon^{2k}}, \qquad k = 1, 2, \dots \qquad (34_2)$$
$$P\{|\xi - M\xi| \ge \varepsilon\} \le \frac{M(\xi - M\xi)^{2k}}{\varepsilon^{2k}}, \qquad k = 1, 2, \dots \qquad (34_3)$$
$$P\{|\xi - M\xi| \ge \varepsilon\} \le \frac{M|\xi - M\xi|^\alpha}{\varepsilon^\alpha} \qquad (\alpha > 0). \qquad (34_4)$$
Proofs. The inequality $(34_1)$ is the inequality (33) rewritten for $|\xi| \ge 0$. To prove the inequalities $(34_2)$, $(34_3)$, $(34_4)$, we first note that for any $\varepsilon > 0$ the following equalities of events are satisfied:
$$\{|\xi| \ge \varepsilon\} = \{\xi^{2k} \ge \varepsilon^{2k}\}, \qquad \{|\xi - M\xi| \ge \varepsilon\} = \{(\xi - M\xi)^{2k} \ge \varepsilon^{2k}\}, \qquad \{|\xi - M\xi| \ge \varepsilon\} = \{|\xi - M\xi|^\alpha \ge \varepsilon^\alpha\} \quad (\alpha > 0).$$
Now it only remains for us to apply the inequality $(34_1)$ (or (33)) to the probabilities of the events on the right-hand sides. □

In what follows we shall call all these inequalities (33), $(34_1)$–$(34_4)$ the Chebyshev inequalities. In the particular case $k = 1$, inequality $(34_3)$ states the following: for any random variable $\xi$ and any $\varepsilon > 0$
$$P\{|\xi - M\xi| \ge \varepsilon\} \le \frac{D\xi}{\varepsilon^2}, \qquad P\{|\xi - M\xi| < \varepsilon\} \ge 1 - \frac{D\xi}{\varepsilon^2}. \qquad (35)$$
In the educational literature, the last inequality (35) is usually called the Chebyshev inequality. Using inequality (35), we can estimate the probability of a deviation of a random variable from its mathematical expectation through the known mathematical expectation and variance of the random variable under consideration.

Example. Let us estimate the deviation of the relative frequency $\dfrac{\mu_n}{n}$ from the probability of success $p$ in a sequence of $n$ independent Bernoulli trials using the Chebyshev inequality (35). We know that $\mu_n \sim Bi(n, p)$, and by the example 1 (see below, p. 1.7, example 1) $M\mu_n = np$, $D\mu_n = npq$; hence, by the properties of an expectation and a variance,
$$M\frac{\mu_n}{n} = p, \qquad D\Big(\frac{\mu_n}{n}\Big) = \frac{pq}{n}.$$
Therefore, for any $\varepsilon > 0$, by the Chebyshev inequality,
$$P\Big\{\Big|\frac{\mu_n}{n} - p\Big| \le \varepsilon\Big\} \ge 1 - \frac{pq}{n\varepsilon^2}. \qquad (36)$$
We pose the following question: how many Bernoulli trials must be carried out to ensure that the probability on the left-hand side of (36) is not less than the given probability $1 - \alpha$ ($\alpha$ is a number sufficiently close to zero)? Taking into account the inequality $pq \le \dfrac14$, we see that the required number $n$ of trials can be determined from the inequality
$$\frac{pq}{n\varepsilon^2} \le \frac{1}{4n\varepsilon^2} \le \alpha, \qquad (37)$$
and for $n$ we obtain the inequality
$$n \ge \frac{1}{4\varepsilon^2\alpha}. \qquad (38)$$
Thus, as the number $n$ we can take the smallest positive integer satisfying (38). For example, if $\alpha = 0.05$, $\varepsilon = 0.02$, then $n = 12500$; it means that conducting 12500 or more trials ensures the inequality
$$P\Big\{\Big|\frac{\mu_n}{n} - p\Big| \le 0.02\Big\} \ge 0.95$$
regardless of the unknown probability of success $p$.
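The sample-size bound (38) can be computed mechanically. A small sketch (exact rational arithmetic is used so that the ceiling is not disturbed by floating-point rounding):

```python
from fractions import Fraction
import math

# Smallest integer n satisfying inequality (38): n >= 1/(4 * eps^2 * alpha);
# this guarantees P{|mu_n/n - p| <= eps} >= 1 - alpha whatever the unknown p is.
def min_trials(eps, alpha):
    bound = 1 / (4 * Fraction(eps) ** 2 * Fraction(alpha))
    return math.ceil(bound)

print(min_trials(Fraction("0.02"), Fraction("0.05")))  # 12500, as in the text
```

With $\varepsilon = 0.02$ and $\alpha = 0.05$ this reproduces the value $n = 12500$ from the example.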
1.7. Mathematical expectation and variance: examples of calculation

A) The case of discrete random variables

1. Mathematical expectation and variance of Bernoulli, binomial, Poisson and geometric random variables.

a) $\xi$ is a Bernoulli random variable: $P\{\xi = 1\} = p$, $P\{\xi = 0\} = q = 1-p$. In this case, by the formula (1):
$$M\xi = 1\cdot p + 0\cdot q = p, \qquad M\xi^2 = 1^2\cdot p + 0^2\cdot q = p,$$
hence,
$$D\xi = M\xi^2 - (M\xi)^2 = p - p^2 = p(1-p) = pq.$$
b) $\xi$ is a binomial random variable: $\xi \sim Bi(n, p)$. In this case, by the formula (1):
$$M\xi = \sum_{k=0}^{n} kC_n^kp^kq^{n-k} = \sum_{k=0}^{n} k\frac{n!}{k!(n-k)!}p^kq^{n-k} = np\sum_{k=1}^{n}\frac{(n-1)!}{(k-1)!\big((n-1)-(k-1)\big)!}p^{k-1}q^{(n-1)-(k-1)}$$
$$= np\sum_{k=0}^{n-1} C_{n-1}^{k}p^{k}q^{n-1-k} = np(p+q)^{n-1} = np.$$
In order to find the variance, we first find the second moment of the random variable. We can write:
$$M\xi^2 = \sum_{k=0}^{n} k^2C_n^kp^kq^{n-k} = \sum_{k=1}^{n}\big[k(k-1)+k\big]\frac{n!}{k!(n-k)!}p^kq^{n-k}$$
$$= n(n-1)p^2\sum_{k=2}^{n} C_{n-2}^{k-2}p^{k-2}q^{(n-2)-(k-2)} + np = n(n-1)p^2(p+q)^{n-2} + np = n(n-1)p^2 + np = n^2p^2 - np^2 + np.$$
Here we first presented $k^2$ in the form $k^2 = k(k-1) + k$ (because such a record allowed us to cancel the factorials when expanding the number of combinations by the formula $C_n^k = \dfrac{n!}{k!(n-k)!}$); then we again recorded the new coefficients in the form of binomial coefficients; finally, we applied the Newton binomial formula $(a+b)^m = \sum_{l=0}^{m} C_m^la^lb^{m-l}$ to the sum $p + q = 1$. Now, given that for $\xi \sim Bi(n,p)$ its expectation is equal to $M\xi = np$, we obtain
$$D\xi = M\xi^2 - (M\xi)^2 = n^2p^2 - np^2 + np - n^2p^2 = np(1-p) = npq.$$
Thus, if $\xi \sim Bi(n,p)$, then $M\xi = np$, $D\xi = npq$.

Remark 9. Above, to calculate the mathematical expectation $M\xi^2$ of an integer-valued random variable $\xi$, we presented this mathematical expectation in the form
$$M\xi^2 = M\xi(\xi - 1) + M\xi, \qquad (39)$$
and this made it much easier for us to calculate $M\xi^2$. We will repeatedly use this technique further. From the formulas (17) and (39) we obtain the following useful formula for calculating the variance of an integer-valued random variable:
$$D\xi = M\xi(\xi - 1) + M\xi - (M\xi)^2. \qquad (40)$$
Another way to calculate the mathematical expectation of a random variable $\xi \sim Bi(n,p)$. Let us write down the random variable in the form of a sum of independent Bernoulli random variables $\xi_1, \xi_2, \dots, \xi_n$:
$$\xi = \xi_1 + \xi_2 + \dots + \xi_n, \qquad P\{\xi_i = 1\} = p, \quad P\{\xi_i = 0\} = q = 1 - p.$$
Then, since $M\xi_i = p$, $D\xi_i = pq$:
$$M\xi = M\xi_1 + M\xi_2 + \dots + M\xi_n = \underbrace{p + p + \dots + p}_{n\ \text{times}} = np,$$
$$D\xi = D\xi_1 + D\xi_2 + \dots + D\xi_n = \underbrace{pq + \dots + pq}_{n\ \text{times}} = npq.$$
c) $\xi \sim \Pi(\lambda)$. In this case, by definition (see the formula (11)):
$$M\xi = \sum_{k=0}^{\infty} ke^{-\lambda}\frac{\lambda^k}{k!} = \lambda e^{-\lambda}\sum_{k=1}^{\infty}\frac{\lambda^{k-1}}{(k-1)!} = \lambda e^{-\lambda}e^{\lambda} = \lambda,$$
$$M\xi(\xi - 1) = \sum_{k=2}^{\infty} k(k-1)e^{-\lambda}\frac{\lambda^k}{k!} = \lambda^2e^{-\lambda}\sum_{k=2}^{\infty}\frac{\lambda^{k-2}}{(k-2)!} = \lambda^2e^{-\lambda}e^{\lambda} = \lambda^2.$$
Hence, by the formula (40),
$$D\xi = \lambda^2 + \lambda - \lambda^2 = \lambda.$$
Thus, for $\xi \sim \Pi(\lambda)$ we obtain that $M\xi = D\xi = \lambda$.
d) $\xi \sim G(p)$, i.e. $\xi$ is a geometric random variable with the parameter $p$. In this case, with $q = 1 - p$:
$$M\xi = \sum_{k=1}^{\infty} kq^{k-1}p = p\Big(\sum_{k=1}^{\infty} q^k\Big)'_q = p\Big(\frac{q}{1-q}\Big)'_q = \frac{p}{(1-q)^2} = \frac1p,$$
$$M\xi^2 = \sum_{k=1}^{\infty} k^2q^{k-1}p = p\sum_{k=1}^{\infty} k(k-1)q^{k-1} + \sum_{k=1}^{\infty} kq^{k-1}p.$$
Here the last sum is $\dfrac1p$ (see the first relation), and the first sum is
$$p\sum_{k=2}^{\infty} k(k-1)q^{k-1} = pq\sum_{k=2}^{\infty} k(k-1)q^{k-2} = pq\Big(\sum_{k=2}^{\infty} q^k\Big)''_{qq} = pq\Big(\frac{q^2}{1-q}\Big)''_{qq} = pq\cdot\frac{2}{p^3} = \frac{2q}{p^2}.$$
Thus,
$$M\xi^2 = \frac{2q}{p^2} + \frac1p, \qquad D\xi = \frac{2q}{p^2} + \frac1p - \frac{1}{p^2} = \frac{1-p}{p^2}.$$
Finally, we obtain that if $\xi$ is a geometric random variable with the parameter $p$, then
$$M\xi = \frac1p, \qquad D\xi = \frac{1-p}{p^2} = \frac{q}{p^2}.$$
Remark 10. In some sources, a geometric random variable with a parameter $p$ means a random variable $\eta$ with the distribution law $P\{\eta = k\} = (1-p)^kp$, $k = 0, 1, 2, \dots$. There, this random variable means the number of trials before the first success in a sequence of independent Bernoulli trials. And by our definition, a geometric random variable $\xi$ is the number of the trial in which the first success occurred. Thus, $\xi = \eta + 1$. Therefore
$$M\eta = M\xi - 1 = \frac{q}{p}, \qquad D\eta = D\xi = \frac{q}{p^2}.$$
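The Poisson and geometric moment formulas can be confirmed by summing truncated series; the tails beyond a couple of hundred terms are negligible for the (illustrative, assumed) parameters $\lambda = 2.5$ and $p = 0.4$:

```python
import math

# Check M(xi) = D(xi) = lam for the Poisson law, and M(xi) = 1/p, D(xi) = q/p^2
# for the geometric law P{xi = k} = q^(k-1) * p, k = 1, 2, ...
lam, p = 2.5, 0.4
q = 1 - p
N = 200  # truncation point; remaining tail mass is negligible

pois = [math.exp(-lam)]
for k in range(1, N):
    pois.append(pois[-1] * lam / k)          # P{xi = k}, computed recursively
geo = [q ** (k - 1) * p for k in range(1, N)]

def mean_var(pmf, support):
    m = sum(k * pk for k, pk in zip(support, pmf))
    return m, sum(k * k * pk for k, pk in zip(support, pmf)) - m * m

pm, pv = mean_var(pois, range(N))
gm, gv = mean_var(geo, range(1, N))
print(pm, pv)   # ≈ 2.5, 2.5  (both equal lam)
print(gm, gv)   # ≈ 2.5, 3.75 (i.e. 1/p and q/p^2)
```

Computing the Poisson probabilities recursively avoids evaluating large factorials directly.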
2. The sum of a random number of random summands. Let $\xi_1, \xi_2, \dots, \xi_n$ be independent identically distributed random variables with $M\xi_i = a$, $D\xi_i = \sigma^2$, and let $\nu$ be an integer-valued random variable that does not depend on them and takes the values $k = 0, 1, \dots, n$. We define a new random variable $S_\nu$ (the sum of a random number of random summands) as follows:
$$S_\nu = \begin{cases}\xi_1 + \xi_2 + \dots + \xi_\nu, & \text{if } \nu > 0,\\ 0, & \text{if } \nu = 0.\end{cases}$$
Let us find the expectation $MS_\nu$ and the variance $DS_\nu$ of the random variable $S_\nu$.

Solution. Since the sum of indicators $\sum_{k=0}^{n} I_{\{\nu=k\}} = 1$, we can write down $S_\nu$ in the form
$$S_\nu = S_\nu\Big(\sum_{k=0}^{n} I_{\{\nu=k\}}\Big) = \sum_{k=0}^{n} S_\nu I_{\{\nu=k\}} = \sum_{k=0}^{n} S_kI_{\{\nu=k\}},$$
because for $\omega\in\{\omega : \nu(\omega) = k\}$ we have $S_\nu(\omega) = S_k(\omega)$.

According to the condition, $\nu$ and $\xi_1, \xi_2, \dots, \xi_n$ are independent; therefore, the random variables $I_{\{\nu=k\}}$ and $S_k = \xi_1 + \xi_2 + \dots + \xi_k$ are independent too (as functions of independent random variables). Therefore, applying first the linearity, then the multiplicative property of the mathematical expectation, we can write:
$$MS_\nu = \sum_{k=0}^{n} M\big(S_kI_{\{\nu=k\}}\big) = \sum_{k=0}^{n} MS_k\,MI_{\{\nu=k\}} = \sum_{k=0}^{n}\big(M\xi_1 + \dots + M\xi_k\big)P\{\nu = k\} = \sum_{k=1}^{n} ka\,P\{\nu = k\} = a\sum_{k=1}^{n} kP\{\nu = k\} = aM\nu.$$
To calculate the variance, we first find the mathematical expectation $MS_\nu^2$. Using the above representation for $S_\nu$, we can write
$$S_\nu^2 = \sum_{k=1}^{n}\sum_{l=1}^{n} S_kS_lI_{\{\nu=k\}}I_{\{\nu=l\}} = \sum_{k=1}^{n} S_k^2I_{\{\nu=k\}},$$
because for $k\ne l$ the product of the indicators $I_{\{\nu=k\}}I_{\{\nu=l\}} = 0$. Furthermore, by the properties of the mathematical expectation,
$$MS_\nu^2 = \sum_{k=1}^{n} MS_k^2\,P\{\nu = k\}. \qquad (41)$$
Let us calculate the mathematical expectation $MS_k^2$. Taking into account that $M\xi_j = a$, $D\xi_j = \sigma^2$ (so that $M\xi_j^2 = \sigma^2 + a^2$) and that for $j\ne l$ the random variables $\xi_j$ and $\xi_l$ are independent, we obtain:
$$MS_k^2 = \sum_{j,l=1}^{k} M\xi_j\xi_l = \sum_{j=1}^{k} M\xi_j^2 + k(k-1)a^2 = k(\sigma^2 + a^2) + k(k-1)a^2.$$
Substituting these values into formula (41) and taking (40) into account, we obtain:
$$MS_\nu^2 = (\sigma^2 + a^2)\sum_{k=1}^{n} kP\{\nu = k\} + a^2\sum_{k=1}^{n} k(k-1)P\{\nu = k\} = (\sigma^2 + a^2)M\nu + a^2\big(D\nu + (M\nu)^2 - M\nu\big) = \sigma^2M\nu + a^2D\nu + a^2(M\nu)^2.$$
From where, since $MS_\nu = aM\nu$ and $DS_\nu = MS_\nu^2 - (MS_\nu)^2$, we get that
$$DS_\nu = \sigma^2M\nu + a^2D\nu. \qquad (42)$$
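The identities $MS_\nu = aM\nu$ and (42) can be verified exactly on a small concrete model. A sketch, under illustrative assumptions: $\xi_i$ are Bernoulli with $p = 0.3$ (so $a = p$, $\sigma^2 = pq$) and $\nu$ is uniform on $\{0, 1, 2, 3\}$; then $S_k \sim Bi(k, p)$ on the event $\{\nu = k\}$, so the full distribution of $S_\nu$ can be enumerated.

```python
from math import comb

# Exact check of M(S_nu) = a*M(nu) and of formula (42):
# D(S_nu) = sigma2*M(nu) + a^2*D(nu), for Bernoulli summands and uniform nu.
p = 0.3
q = 1 - p
nmax = 3
pnu = [1 / (nmax + 1)] * (nmax + 1)          # P{nu = k}, k = 0..3

dist = {}                                    # distribution of S_nu
for k, pk in enumerate(pnu):                 # S_k ~ Bi(k, p) given nu = k
    for j in range(k + 1):
        dist[j] = dist.get(j, 0.0) + pk * comb(k, j) * p**j * q**(k - j)

mS = sum(j * pj for j, pj in dist.items())
vS = sum(j * j * pj for j, pj in dist.items()) - mS**2

mnu = sum(k * pk for k, pk in enumerate(pnu))                # M(nu) = 1.5
vnu = sum(k * k * pk for k, pk in enumerate(pnu)) - mnu**2   # D(nu) = 1.25
a, sigma2 = p, p * q
print(mS, a * mnu)                        # both ≈ 0.45
print(vS, sigma2 * mnu + a * a * vnu)     # both ≈ 0.4275
```

Both sides of each identity agree to rounding error.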
3. Hypergeometric distribution. There are $m$ black and $n-m$ white (in total $n$) balls in an urn; $r$ balls are randomly selected. Let the random variable $\xi$ be the number of black balls among the $r$ selected ones. Let us find the mathematical expectation and the variance of this random variable in two ways.

Solution. Method 1. Let us calculate the required quantities directly, using the distribution law of the random variable $\xi$. We know that (see Chap. I, §2, p. 2.3)
$$P\{\xi = k\} = \frac{C_m^kC_{n-m}^{r-k}}{C_n^r} \qquad (k = 0, 1, 2, \dots, m).$$
By definition,
$$M\xi = \sum_{k=0}^{m} k\frac{C_m^kC_{n-m}^{r-k}}{C_n^r}.$$
Writing the combination $C_m^k$ inside the sum in the form $\dfrac{m!}{k!(m-k)!}$, after performing the necessary cancellation we get that
$$M\xi = \frac{m}{C_n^r}\sum_{k=1}^{m} C_{m-1}^{k-1}C_{(n-1)-(m-1)}^{(r-1)-(k-1)} = m\frac{C_{n-1}^{r-1}}{C_n^r} = \frac{mr}{n}. \qquad (43)$$
We now calculate the mathematical expectation $M\xi(\xi - 1)$:
$$M\xi(\xi - 1) = \sum_k k(k-1)\frac{C_m^kC_{n-m}^{r-k}}{C_n^r} = m(m-1)\frac{C_{n-2}^{r-2}}{C_n^r} = \frac{rm(m-1)(r-1)}{n(n-1)}.$$
It follows that
$$D\xi = \frac{rm(m-1)(r-1)}{n(n-1)} + \frac{mr}{n} - \frac{m^2r^2}{n^2} = \frac{rm}{n}\Big(1 - \frac{m}{n}\Big)\Big(1 - \frac{r-1}{n-1}\Big).$$
Remark 11. If the balls extracted from the urn are taken one by one with returning (i.e. with replacement), then for the number $\eta$ of black balls among the $r$ selected balls we would get (prove!)
$$M\eta = \frac{rm}{n}, \qquad D\eta = \frac{rm(n-m)}{n^2} = \frac{rm}{n}\Big(1 - \frac{m}{n}\Big) > D\xi. \qquad (44)$$
Method 2. We introduce the random variables $\xi_k$ as follows: $\xi_k = 1$ if the ball chosen in the $k$th extraction (experiment) is black; $\xi_k = 0$ if the ball chosen in the $k$th extraction (experiment) is white $(k \le r)$. In this case:
$$M\xi_k = P\{\xi_k = 1\} = \frac{m}{n}, \qquad D\xi_k = P\{\xi_k = 1\}P\{\xi_k = 0\} = \frac{m(n-m)}{n^2}.$$
We further note that for $j\ne k$ we have $\xi_j\xi_k = 1$ (if the balls extracted in the $j$th and $k$th experiments both turn out to be black) or $\xi_j\xi_k = 0$ (in the other cases). Therefore (below we assume that $j < k$):
$$M\xi_j\xi_k = P\{\xi_j\xi_k = 1\} = P\{\xi_j = 1, \xi_k = 1\} = P\{\xi_j = 1\}P\{\xi_k = 1/\xi_j = 1\} = \frac{m}{n}\cdot\frac{m-1}{n-1},$$
$$\operatorname{cov}(\xi_j,\xi_k) = M\xi_j\xi_k - M\xi_jM\xi_k = -\frac{m(n-m)}{n^2(n-1)},$$
$$M\xi = \sum_{j=1}^{r} M\xi_j = \frac{rm}{n}, \qquad D\xi = \sum_{j=1}^{r} D\xi_j + \sum_{j\ne k}\operatorname{cov}(\xi_j,\xi_k) = \frac{rm}{n}\Big(1 - \frac{m}{n}\Big)\Big(1 - \frac{r-1}{n-1}\Big).$$
B) The case of continuous random variables

1. Mathematical expectation and variance of uniformly distributed, exponential and normal random variables.

Solution. a) If a random variable $\xi$ is uniformly distributed on the segment $[a, b]$, $a < b$, then, by the formula (5),

$$M\xi = \int_a^b x\,\frac{1}{b-a}\,dx = \frac{a+b}{2},$$

$$D\xi = M(\xi - M\xi)^2 = \int_a^b \left(x - \frac{a+b}{2}\right)^2\frac{1}{b-a}\,dx = \frac{(b-a)^2}{12}.$$
b) For an exponential (with the parameter $\lambda$) random variable $\xi$,

$$M\xi = \int_0^{\infty} x\,\lambda e^{-\lambda x}\,dx = \frac{1}{\lambda}, \qquad D\xi = \int_0^{\infty}\left(x - \frac{1}{\lambda}\right)^2 \lambda e^{-\lambda x}\,dx = \frac{1}{\lambda^2}.$$
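Both computations reduce to one-dimensional integrals, so they are easy to confirm by direct numerical integration; the following sketch uses a simple midpoint rule (the parameter values are arbitrary, and the exponential integral is truncated far into the tail):

```python
import math

def quad(f, a, b, n=200_000):
    # Simple midpoint rule; accurate enough for these smooth integrands.
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# Uniform on [a, b]
a, b = 2.0, 5.0
mean_u = quad(lambda x: x / (b - a), a, b)
var_u = quad(lambda x: (x - (a + b) / 2) ** 2 / (b - a), a, b)
assert abs(mean_u - (a + b) / 2) < 1e-5
assert abs(var_u - (b - a) ** 2 / 12) < 1e-5

# Exponential with rate lam (range [0, 60] leaves a negligible tail)
lam = 1.5
mean_e = quad(lambda x: x * lam * math.exp(-lam * x), 0.0, 60.0)
var_e = quad(lambda x: (x - 1 / lam) ** 2 * lam * math.exp(-lam * x), 0.0, 60.0)
assert abs(mean_e - 1 / lam) < 1e-5
assert abs(var_e - 1 / lam ** 2) < 1e-5
```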
c) For $\xi \sim N(a, \sigma^2)$,

$$M\xi = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} x\,e^{-\frac{(x-a)^2}{2\sigma^2}}\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}(\sigma y + a)\,e^{-\frac{y^2}{2}}\,dy = a$$

(we have made the change of variables $x = a + \sigma y$ in the first integral). If we rewrite the last integral in the form of a sum of two integrals, then the first integral is equal to zero (under the integral sign there is an odd function, and the integral is taken over a region symmetric with respect to the origin), and the second integral is computed by the well-known Poisson integral

$$\int_{-\infty}^{\infty} e^{-\frac{y^2}{2}}\,dy = \sqrt{2\pi}.$$

For the variance,

$$D\xi = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty}(x-a)^2\,e^{-\frac{(x-a)^2}{2\sigma^2}}\,dx = \sigma^2\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} y^2 e^{-\frac{y^2}{2}}\,dy = \sigma^2,$$

because, integrating by parts,

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} y^2 e^{-\frac{y^2}{2}}\,dy = \left.-\frac{y\,e^{-\frac{y^2}{2}}}{\sqrt{2\pi}}\right|_{-\infty}^{\infty} + \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{y^2}{2}}\,dy = 1.$$

So, if $\xi \sim N(a, \sigma^2)$, then the meaning of the parameters is as follows:

$$a = M\xi, \qquad \sigma^2 = D\xi.$$

2. For $\xi \sim N(0, \sigma^2)$ let us find its $n$-th moment $M\xi^n$.

Solution. By the formula (10):
$$a_n = M\xi^n = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} x^n e^{-\frac{x^2}{2\sigma^2}}\,dx = -\frac{\sigma^2}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} x^{n-1}\,d\!\left(e^{-\frac{x^2}{2\sigma^2}}\right) = (n-1)\,\sigma^2\cdot\frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} x^{n-2} e^{-\frac{x^2}{2\sigma^2}}\,dx = (n-1)\,\sigma^2 a_{n-2}.$$

Whence we obtain the recurrence relations

$$a_n = (n-1)\,\sigma^2 a_{n-2}, \qquad n = 2, 3, \dots$$

If $n = 2k+1$ (an odd number), then $a_n = a_{2k+1} = 0$ ($k = 0, 1, 2, \dots$). If $n = 2k$ (an even number), then for $k = 1, 2, \dots$, starting from $a_0 = 1$, the above recurrence relations give

$$a_{2k} = (2k-1)\,\sigma^2 a_{2k-2} = (2k-1)(2k-3)\,\sigma^2\sigma^2 a_{2k-4} = \dots = (2k-1)(2k-3)\cdots 3\cdot 1\,\sigma^{2k}.$$

So, for $\xi \sim N(0, \sigma^2)$:

$$M\xi^{2k+1} = a_{2k+1} = 0 \quad (k = 0, 1, \dots), \qquad M\xi^{2k} = a_{2k} = (2k-1)!!\,\sigma^{2k} \quad (k = 1, 2, \dots).$$
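The recurrence $a_n = (n-1)\sigma^2 a_{n-2}$ and the resulting double-factorial formula can be checked against direct numerical integration of $x^n$ times the $N(0,\sigma^2)$ density; a small sketch ($\sigma$ and the range of $k$ are arbitrary test choices):

```python
import math

def double_factorial_odd(j):
    # (2k-1)!! = 1 * 3 * 5 * ... * (2k-1); empty product for j < 1.
    out = 1
    while j >= 1:
        out *= j
        j -= 2
    return out

def normal_moment(n, sigma, half_width=10.0, steps=200_000):
    """Midpoint-rule integral of x^n against the N(0, sigma^2) density."""
    a, b = -half_width * sigma, half_width * sigma
    h = (b - a) / steps
    c = 1.0 / (math.sqrt(2 * math.pi) * sigma)
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        total += x ** n * c * math.exp(-x * x / (2 * sigma ** 2)) * h
    return total

sigma = 0.8
for k in range(1, 4):
    even = normal_moment(2 * k, sigma)
    odd = normal_moment(2 * k - 1, sigma)
    assert abs(even - double_factorial_odd(2 * k - 1) * sigma ** (2 * k)) < 1e-4
    assert abs(odd) < 1e-9   # odd moments vanish by symmetry
```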
3. $(\xi, \eta)$ is a two-dimensional random variable uniformly distributed in the region

$$D = \{(x, y): x^2 + y^2 \le 1,\ x \ge 0,\ y \ge 0\}.$$

Let us find the correlation coefficient of the random variables $\xi$ and $\eta$.

Solution. The joint distribution density of $\xi$ and $\eta$ is the function

$$f_{\xi,\eta}(x, y) = \frac{4}{\pi}, \quad (x, y) \in D; \qquad f_{\xi,\eta}(x, y) = 0, \quad (x, y) \notin D.$$

Then, for $0 \le x \le 1$,

$$f_\xi(x) = \int_{-\infty}^{\infty} f_{\xi,\eta}(x, y)\,dy = \int_0^{\sqrt{1-x^2}}\frac{4}{\pi}\,dy = \frac{4}{\pi}\sqrt{1-x^2}.$$

In this case, by the formulas (10) and (13),

$$M\xi = \frac{4}{\pi}\int_0^1 x\sqrt{1-x^2}\,dx = -\frac{2}{\pi}\int_0^1 \sqrt{1-x^2}\,d(1-x^2) = \frac{4}{3\pi},$$

$$M\xi^2 = \frac{4}{\pi}\int_0^1 x^2\sqrt{1-x^2}\,dx = \frac{4}{\pi}\int_0^{\pi/2}\sin^2 t\,\cos^2 t\,dt = \frac{1}{2\pi}\int_0^{\pi/2}(1-\cos 4t)\,dt = \frac{1}{4}$$

(the substitution $x = \sin t$),

$$M\xi\eta = \iint_D xy\,\frac{4}{\pi}\,dx\,dy = \frac{4}{\pi}\int_0^1\!\!\int_0^{\sqrt{1-x^2}} xy\,dy\,dx = \frac{2}{\pi}\int_0^1 x(1-x^2)\,dx = \frac{2}{\pi}\left.\left(\frac{x^2}{2}-\frac{x^4}{4}\right)\right|_0^1 = \frac{1}{2\pi}.$$

In exactly the same way (with symmetry taken into account),

$$M\eta = \frac{4}{3\pi}, \qquad M\eta^2 = \frac{1}{4}, \qquad D\eta = \frac{1}{4} - \left(\frac{4}{3\pi}\right)^2 = \frac{9\pi^2 - 64}{36\pi^2}.$$

Hence,

$$D\xi = \frac{9\pi^2 - 64}{36\pi^2}.$$

Then the correlation coefficient is

$$\rho(\xi, \eta) = \frac{M\xi\eta - M\xi\,M\eta}{\sqrt{D\xi\,D\eta}} = \frac{\dfrac{1}{2\pi} - \dfrac{4}{3\pi}\cdot\dfrac{4}{3\pi}}{\dfrac{9\pi^2 - 64}{36\pi^2}} = \frac{2(9\pi - 32)}{9\pi^2 - 64} \approx -0.30.$$
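The quarter-disk moments above can be confirmed by a crude two-dimensional midpoint-rule integration; a sketch (the grid size is chosen for rough accuracy only, so the tolerances are generous):

```python
import math

def quarter_disk_mean(g, n=1200):
    """Midpoint-rule integral of g(x, y) against the uniform density 4/pi
    over the quarter disk x^2 + y^2 <= 1, x >= 0, y >= 0."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            if x * x + y * y <= 1.0:
                total += g(x, y)
    return total * h * h * (4.0 / math.pi)

mx = quarter_disk_mean(lambda x, y: x)
mx2 = quarter_disk_mean(lambda x, y: x * x)
mxy = quarter_disk_mean(lambda x, y: x * y)
# By symmetry the moments of eta coincide with those of xi.
rho = (mxy - mx * mx) / (mx2 - mx * mx)
assert abs(mx - 4 / (3 * math.pi)) < 5e-3
assert abs(rho - 2 * (9 * math.pi - 32) / (9 * math.pi ** 2 - 64)) < 5e-2
```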
5. The joint distribution density of random variables $\xi_1, \xi_2$ is:

$$f_{\xi_1,\xi_2}(x, y) = \frac{2}{\pi (x^2+y^2)^3}, \quad \text{if } x^2 + y^2 \ge 1; \qquad f_{\xi_1,\xi_2}(x, y) = 0, \quad \text{if } x^2 + y^2 < 1.$$

Find $M\sqrt{\xi_1^2+\xi_2^2}$ and $D\sqrt{\xi_1^2+\xi_2^2}$.

Solution. By the formula (13),

$$M\sqrt{\xi_1^2+\xi_2^2} = \iint_{x_1^2+x_2^2 \ge 1}\sqrt{x_1^2+x_2^2}\;\frac{2}{\pi (x_1^2+x_2^2)^3}\,dx_1\,dx_2.$$

We pass to the polar coordinate system: $x_1 = \rho\cos\varphi$, $x_2 = \rho\sin\varphi$; the modulus of the Jacobian of the transformation is $\rho$, and we can write that

$$M\sqrt{\xi_1^2+\xi_2^2} = \int_1^{\infty}\!\!\int_0^{2\pi}\rho\,\frac{2}{\pi\rho^6}\,\rho\,d\varphi\,d\rho = 4\int_1^{\infty}\rho^{-4}\,d\rho = \frac{4}{3},$$

$$M(\xi_1^2+\xi_2^2) = \int_1^{\infty}\!\!\int_0^{2\pi}\rho^2\,\frac{2}{\pi\rho^6}\,\rho\,d\varphi\,d\rho = 4\int_1^{\infty}\rho^{-3}\,d\rho = 2.$$

From here

$$D\sqrt{\xi_1^2+\xi_2^2} = M(\xi_1^2+\xi_2^2) - \left(M\sqrt{\xi_1^2+\xi_2^2}\right)^2 = 2 - \left(\frac{4}{3}\right)^2 = \frac{2}{9}.$$
6. Let $\xi_1, \xi_2$ be independent $N(a, \sigma^2)$ random variables. We show that then

$$M\max(\xi_1, \xi_2) = a + \frac{\sigma}{\sqrt{\pi}}, \qquad M\min(\xi_1, \xi_2) = a - \frac{\sigma}{\sqrt{\pi}}.$$

Solution. Let $\eta_1, \eta_2$ be independent $N(0, 1)$ random variables. Let us introduce new random variables $\xi_1 = \sigma\eta_1 + a$, $\xi_2 = \sigma\eta_2 + a$; then

$$\xi = \max(\xi_1, \xi_2) = \max(\sigma\eta_1 + a,\ \sigma\eta_2 + a) = \sigma\left[\max(\eta_1, \eta_2) + \frac{a}{\sigma}\right].$$

Let $\eta = \max(\eta_1, \eta_2)$. In this case

$$F_\eta(x) = P\{\max(\eta_1, \eta_2) \le x\} = P\{\eta_1 \le x,\ \eta_2 \le x\} = P\{\eta_1 \le x\}\,P\{\eta_2 \le x\} = (\Phi(x))^2,$$

where

$$\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-\frac{y^2}{2}}\,dy, \qquad \Phi'(x) = \varphi(x) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}.$$

Further, integrating by parts and using $\varphi'(x) = -x\varphi(x)$,

$$M\eta = \int_{-\infty}^{\infty} x\,d(\Phi(x))^2 = 2\int_{-\infty}^{\infty} x\,\Phi(x)\varphi(x)\,dx = -2\int_{-\infty}^{\infty}\Phi(x)\,d\varphi(x) = 2\int_{-\infty}^{\infty}\varphi^2(x)\,dx = \frac{2}{2\pi}\int_{-\infty}^{\infty} e^{-x^2}\,dx = \frac{1}{\sqrt{\pi}}.$$

Therefore

$$M\max(\xi_1, \xi_2) = M\xi = \sigma\left(M\eta + \frac{a}{\sigma}\right) = \sigma\left(\frac{1}{\sqrt{\pi}} + \frac{a}{\sigma}\right) = a + \frac{\sigma}{\sqrt{\pi}}.$$

And the statement that

$$M\min(\xi_1, \xi_2) = a - \frac{\sigma}{\sqrt{\pi}}$$

is obtained directly from the relation

$$\min(\xi_1, \xi_2) = \xi_1 + \xi_2 - \max(\xi_1, \xi_2), \qquad M\xi_1 = M\xi_2 = a,$$

or, for this, it suffices to note that

$$P\{\min(\eta_1, \eta_2) \le x\} = 1 - (1 - \Phi(x))^2.$$
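The value $M\max(\xi_1,\xi_2) = a + \sigma/\sqrt{\pi}$ can be confirmed by integrating $x$ against the density $2\Phi(x)\varphi(x)$ of $\max(\eta_1,\eta_2)$; a sketch using the error function for $\Phi$ (the parameter values are arbitrary):

```python
import math

def phi(x):   # standard normal density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):   # standard normal distribution function, via erf
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))

def mean_max_two_normals(a, sigma, half_width=10.0, steps=200_000):
    """M max(xi1, xi2) for independent N(a, sigma^2) variables: the density
    of max(eta1, eta2) with eta_i ~ N(0, 1) is 2 * Phi(x) * phi(x)."""
    lo, hi = -half_width, half_width
    h = (hi - lo) / steps
    m_eta = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        m_eta += x * 2 * Phi(x) * phi(x) * h
    return a + sigma * m_eta

a, sigma = 1.0, 2.0
assert abs(mean_max_two_normals(a, sigma) - (a + sigma / math.sqrt(math.pi))) < 1e-5

# The min follows from min + max = xi1 + xi2:
mean_min = 2 * a - mean_max_two_normals(a, sigma)
assert abs(mean_min - (a - sigma / math.sqrt(math.pi))) < 1e-5
```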
7. Let us show that if

$$(\xi_1, \xi_2) \sim N\!\left((0, 0),\ \begin{pmatrix}1 & \rho\\ \rho & 1\end{pmatrix}\right),$$

then $M\xi_1^2\xi_2^2 = 1 + 2\rho^2$.

Solution. For $(\eta_1, \eta_2) \sim N\!\left((0, 0),\ \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix}\right)$, according to the example 7 of Chap. ІІІ, §2, p. 2.3.2, we can write $\xi = A\eta$, where the elements of the matrix $A$ are

$$a_{11} = a_{21} = \sqrt{\frac{1+\rho}{2}}, \qquad a_{12} = \sqrt{\frac{1-\rho}{2}}, \qquad a_{22} = -a_{12}.$$

Therefore, since the cross terms in $\xi_1\xi_2$ cancel,

$$M\xi_1^2\xi_2^2 = M\left[\frac{1+\rho}{2}\,\eta_1^2 - \frac{1-\rho}{2}\,\eta_2^2\right]^2 = \frac{1}{4}M\left[(1+\rho)^2\eta_1^4 - 2(1-\rho^2)\eta_1^2\eta_2^2 + (1-\rho)^2\eta_2^4\right] = \frac{3}{4}\left[(1+\rho)^2 + (1-\rho)^2\right] - \frac{1}{2}(1-\rho^2) = 1 + 2\rho^2$$

(we took into account that $M\eta_1^4 = M\eta_2^4 = 3$ and $M\eta_1^2\eta_2^2 = 1$).
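Because the derivation reduces to the moments $M\eta^4 = 3$ and $M\eta^2 = 1$ of a standard normal variable, the identity $M\xi_1^2\xi_2^2 = 1 + 2\rho^2$ can be checked by plain arithmetic over the same expansion:

```python
def m_xi1sq_xi2sq(rho):
    """M xi1^2 xi2^2 via xi1*xi2 = (1+rho)/2 * eta1^2 - (1-rho)/2 * eta2^2,
    using M eta^4 = 3 and M eta1^2 eta2^2 = 1 for independent N(0, 1)."""
    u, v = (1 + rho) / 2, (1 - rho) / 2
    # M (u*eta1^2 - v*eta2^2)^2 = 3*u^2 - 2*u*v + 3*v^2
    return 3 * u * u - 2 * u * v + 3 * v * v

for rho in (-0.9, -0.5, 0.0, 0.3, 1.0):
    assert abs(m_xi1sq_xi2sq(rho) - (1 + 2 * rho ** 2)) < 1e-12
```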
1.8. Tasks for independent work

1. The distribution law of a random variable $\xi$ is given by the table:

$\xi$: −2, −1, 0, 1, 2
$P$: 1/10, 1/5, 3/10, 1/5, 1/5

Find the mathematical expectations of the random variables

$$\xi_1 = -\xi, \qquad \xi_2 = \xi^2, \qquad \xi_3 = 2\xi + 1, \qquad \xi_4 = |\xi|.$$
2. Pascal distribution. The distribution law of a random variable $\xi$ is defined as follows:

$$P\{\xi = k\} = \frac{\lambda^k}{(1+\lambda)^{k+1}} \qquad (\lambda > 0,\ k = 0, 1, 2, \dots).$$

Find the mathematical expectation and variance of the random variable $\xi$.

3. The random variable $\xi$ takes the values $0, 1, 2, \dots, n, \dots$ with probabilities that decrease in geometric progression. a) Find the relationship between $M\xi$ and $D\xi$; b) assuming that $M\xi = a$ is known, find the distribution law of $\xi$.

4. Pólya distribution. A random variable $\xi$ takes only nonnegative integer values (that is, $\xi$ is an integer-valued random variable) and its distribution law is defined by the relations:

$$p_k = P\{\xi = k\} = \left(\frac{\lambda}{1+\alpha\lambda}\right)^k\frac{(1+\alpha)(1+2\alpha)\cdots(1+(k-1)\alpha)}{k!}\,p_0,$$

where

$$p_0 = P\{\xi = 0\} = (1+\alpha\lambda)^{-\frac{1}{\alpha}}, \qquad \lambda > 0,\ \alpha > 0.$$
Find the values $M\xi$ and $D\xi$.

5. Show that for a nonnegative integer-valued random variable $\xi$ the following formulas are true:

a) $M\xi = \sum_{k \ge 1}\bar p_k$, where $\bar p_k = P\{\xi \ge k\}$;

b) $D\xi = 2\sum_{k \ge 1} k\,\bar p_k - M\xi\,(M\xi + 1)$.

6. Show that if $\xi$ is a nonnegative random variable and $F_\xi(x)$ is its distribution function, then the following formulas are true:

$$M\xi = \int_0^{\infty}\bigl[1 - F_\xi(x)\bigr]\,dx,$$

and for any constant $c \ge 0$,

$$M\min(\xi, c) = \int_0^{c}\bigl[1 - F_\xi(x)\bigr]\,dx.$$
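The two formulas of task 5 are easy to verify on any concrete finite law; a short sketch with a hypothetical distribution:

```python
# Tail-sum formulas of task 5, checked on a hypothetical finite law.
law = {0: 0.3, 1: 0.2, 2: 0.2, 3: 0.2, 4: 0.1}

mean_direct = sum(k * p for k, p in law.items())
# a) M xi = sum_{k>=1} P{xi >= k}
mean_tails = sum(
    sum(p for j, p in law.items() if j >= k) for k in range(1, max(law) + 1)
)
assert abs(mean_direct - mean_tails) < 1e-12

# b) D xi = 2 * sum_{k>=1} k * P{xi >= k} - M xi * (M xi + 1)
second = sum(k * k * p for k, p in law.items())
var_direct = second - mean_direct ** 2
tail_k = sum(
    k * sum(p for j, p in law.items() if j >= k) for k in range(1, max(law) + 1)
)
var_tails = 2 * tail_k - mean_direct * (mean_direct + 1)
assert abs(var_direct - var_tails) < 1e-12
```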
7. $r$ balls are randomly distributed over $n$ boxes so that any ball can get into any box with the same probability $1/n$. Denote by $\mu_0(r, n)$ the number of empty boxes in this distribution. Calculate the variance $D\mu_0(r, n)$ and find an asymptotic formula for this variance when $r \to \infty$, $n \to \infty$, $\frac{r}{n} \to \alpha \in (0, \infty)$.

Direction. See the example 3 in p. 1.6.

8. Continuation. Suppose that the conditions of Problem 7 are satisfied. We denote by $\mu_k(r, n)$ the number of boxes into which exactly $k$ balls fall. For $k = 1, 2, \dots$ calculate the mathematical expectations $M\mu_k(r, n)$ and, under the conditions $k = \mathrm{const}$, $r \to \infty$, $n \to \infty$, $\frac{r}{n} \to \alpha \in (0, \infty)$, find an asymptotic formula for the mathematical expectation $M\mu_k(r, n)$.

9. Show that for a nonnegative integer-valued random variable $\xi$ the following formula for factorial moments is true ($k \ge 2$):

$$M(\xi)_k = M\xi(\xi-1)\cdots(\xi-k+1) = k\sum_{n=1}^{\infty}(n)_{k-1}\,P\{\xi > n\}.$$
10. If $\xi \sim N(a, \sigma^2)$, what are the absolute moments $m_n = M|\xi - a|^n$ ($n = 1, 2, \dots$)?

11. In the Bernoulli scheme, $p$ is the probability of success (outcome «1»), $q = 1-p$ is the probability of failure (outcome «0»). We will say that at the $k$-th ($k \ge 2$) trial the chain «00» appeared if zeros were the outcomes of the $(k-1)$-th and $k$-th trials. Let $\nu_{00}$ be the number of occurrences of chains «00» in $n$ trials. Find $M\nu_{00}$ and $D\nu_{00}$.

12. Continuation. In the previous task 11, find formulas for the mathematical expectation and variance of the number $\nu_{111}$ (the chain «111» appears during the $k$-th trial ($k \ge 3$) if units were the outcomes of the $(k-2)$-th, $(k-1)$-th and $k$-th trials).

13. Suppose that there are $n$ chips, numbered $1, 2, \dots, n$, in an urn. Chips are retrieved sequentially (with replacement) until some chip appears for the second time. Let $\xi$ be the number of retrievals made before this event. Show that

a) $$p_k = P\{\xi = k\} = (k-1)!\;C_n^{k-1}\,\frac{k-1}{n^k}, \qquad k = 2, 3, \dots, n+1;$$
b) $$M\xi = 2 + \left(1-\frac{1}{n}\right) + \left(1-\frac{1}{n}\right)\left(1-\frac{2}{n}\right) + \dots + \left(1-\frac{1}{n}\right)\left(1-\frac{2}{n}\right)\cdots\left(1-\frac{n-1}{n}\right).$$

14. A physical system can be in one of the states $E_1, E_2, \dots$. It is known that if at a time $t$ the system is in a state $E_i$, then the probability that at the time $t+1$ the system will be in the state $E_j$ is

$$p_{ij} = C_i\,e^{-2|i-j|}.$$
a) Find the constant $C_i$; b) if the system is in the state $E_1$ at a time $t$, then what is the mathematical expectation of its jump (that is, of the difference of the state numbers)?

15. A random variable $\xi$ can take $k$ nonnegative values. Prove that then the sequences

a) $\sqrt[n]{M\xi^n}$; b) $\dfrac{M\xi^{n+1}}{M\xi^n}$

tend to the greatest value of $\xi$ as $n \to \infty$.

16. For a given numerical sequence $x_1, x_2, \dots, x_n, \dots$, a sequence of random variables $\xi_1, \xi_2, \dots, \xi_n, \dots$ is defined by the conditions

$$P\{\xi_n = x_i\} = \frac{1}{n}, \qquad i = 1, 2, \dots, n;\ n = 1, 2, \dots,$$

in other words, the random variable $\xi_n$ takes the first $n$ values of the given sequence of numbers with the same probability ($n = 1, 2, \dots$).

a) Prove that if $\lim_{n\to\infty} x_n = a < \infty$, then $\lim_{n\to\infty} M\xi_n = a$;
b) give an example showing that the convergence of the sequence $\{x_n\}$ is not a necessary condition for the convergence of the sequence $\{M\xi_n\}$.

17. Prove that if for the moments of a random variable $\xi$ the equalities $M\xi^2 = M\xi^3 = M\xi^4$ hold, then $\xi$ can take only two possible values, 0 and 1.

18. Prove that if $M\xi^{2n}$, $M\xi^{2n+1}$, $M\xi^{2n+2}$ are consecutive terms of an arithmetic progression, then they are equal to each other and the random variable $\xi$ can take only two possible values, 0 and 1.

19. Prove that if $M\xi^{n}$, $M\xi^{n+1}$ and $M\xi^{n+2}$ are consecutive terms of an arithmetic progression, then the random variable $\xi$ is a constant or can take only two values, one of which is zero.

20. $r_1$ indistinguishable letters «alpha» and $r_2$ letters «beta» are arranged in a single row at random. We denote by $\xi$ the number of series (runs) of the letters «alpha» in such an arrangement. Find $M\xi$ and $D\xi$.

21. There are $N$ balls in an urn, numbered $1, 2, \dots, N$. A sample of size $n$, without replacement, is selected from the urn. Let $\xi$ be the largest number of a ball in the sample. Show that:

$$P\{\xi = k\} = \frac{C_{k-1}^{n-1}}{C_N^n}, \qquad k = n, n+1, \dots, N;$$

$$M\xi = \frac{n(N+1)}{n+1}, \qquad D\xi = \frac{n(N-n)(N+1)}{(n+1)^2(n+2)}.$$
22. Let $\xi_1, \xi_2, \dots, \xi_n, \xi_{n+1}$ be independent identically distributed random variables:

$$P\{\xi_i = 1\} = p, \qquad P\{\xi_i = 0\} = q = 1-p, \qquad i = 1, 2, \dots, n, n+1.$$

Let us define new random variables:

$$\eta_k = I\{\xi_k \ne \xi_{k+1}\}, \quad k = 1, 2, \dots, n; \qquad S_n = \eta_1 + \eta_2 + \dots + \eta_n.$$

Show that then

$$MS_n = 2pqn; \qquad DS_n = 2pq(1-2pq)\,n + 2pq(p-q)^2(n-1).$$
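The answers of task 22 can be verified by exhaustive enumeration of all outcome sequences for small n; a sketch (n and p are arbitrary test values):

```python
from itertools import product

def switch_moments(n, p):
    """Exact mean and variance of S_n = number of k with xi_k != xi_{k+1},
    over all 2^(n+1) Bernoulli(p) sequences of length n + 1."""
    q = 1 - p
    mean = second = 0.0
    for seq in product((0, 1), repeat=n + 1):
        prob = 1.0
        for s in seq:
            prob *= p if s == 1 else q
        s_n = sum(seq[k] != seq[k + 1] for k in range(n))
        mean += s_n * prob
        second += s_n * s_n * prob
    return mean, second - mean ** 2

n, p = 6, 0.3
q = 1 - p
mean, var = switch_moments(n, p)
assert abs(mean - 2 * p * q * n) < 1e-12
assert abs(var - (2*p*q*(1 - 2*p*q)*n + 2*p*q*(p - q)**2*(n - 1))) < 1e-12
```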
23. Let $S_n$ be the number of transitions from success to failure and from failure to success in a sequence of $n$ independent Bernoulli trials with probability of success $p$. Find $MS_n$ and $DS_n$.

Direction. See the previous task.

24. $\xi, \eta$ are random variables with zero mathematical expectations, unit variances and correlation coefficient $\rho$. Show that then

$$M\max\{\xi^2, \eta^2\} \le 1 + \sqrt{1-\rho^2}.$$

25. Show that if $\xi$ is a random variable with the distribution function $F(x)$, then

$$M\operatorname{sign}\xi = 1 - F(0) - F(0-0),$$

and the correlation coefficient of the random variables $\xi$ and $\operatorname{sign}\xi$ is nonnegative.

26. Show that if for a random variable $\xi$, $P\{\xi > 0\} = \alpha$, $P\{\xi < 0\} = \beta$, $M\xi = a$, $M|\xi| = b$, then

$$\operatorname{cov}(\xi, \operatorname{sign}\xi) = b - a(\alpha - \beta).$$
27. Suppose that the random variables $\xi$ and $\eta$ take nonnegative integer values and their joint distribution law is defined by the formulas

$$P\{\xi = n,\ \eta = k\} = e^{-\lambda}\frac{\lambda^n}{n!}\,C_n^k\,p^k(1-p)^{n-k}, \quad 0 \le k \le n; \qquad P\{\xi = n,\ \eta = k\} = 0, \quad k > n,$$

where $0 < \lambda < \infty$, $0 \le p \le 1$ are given parameters. Find $M\xi$, $M\eta$, $D\xi$, $D\eta$, $\operatorname{cov}(\xi, \eta)$ and $\rho(\xi, \eta)$ (the correlation coefficient).

28. A two-dimensional random variable $(\xi_1, \xi_2)$ is uniformly distributed in the circle

$$K = \{(x_1, x_2): x_1^2 + x_2^2 \le 1\}.$$
Find the mathematical expectations, variances, covariance and correlation coefficient of the random variables $\xi_1$, $\xi_2$, $r = \sqrt{\xi_1^2 + \xi_2^2}$.

29. The joint distribution density of random variables $\xi_1, \xi_2$ is given as follows: in the circle $K_r = \{(x_1, x_2): x_1^2 + x_2^2 \le r^2\}$ it is given by the function $f(x_1, x_2) = c\bigl(r - \sqrt{x_1^2 + x_2^2}\bigr)$; for $(x_1, x_2) \notin K_r$, $f(x_1, x_2) = 0$. Find the following values: a) the constant $c$; b) the probabilities $P\{(\xi_1, \xi_2) \in K_\rho\}$, where $K_\rho$ is the circle with center at the origin and radius $\rho$ ($0 < \rho \le r$); c) the covariance and correlation coefficient of the random variables $\xi_1, \xi_2$.

30. Let $(\xi, \eta)$ be a two-dimensional normal random vector, $M\xi = M\eta = 0$, $D\xi = D\eta = 1$, $M\xi\eta = \rho$. Show that then

$$P\{\xi\eta > 0\} = \frac{1}{2} + \frac{1}{\pi}\arcsin\rho, \qquad M\max(\xi, \eta) = \sqrt{\frac{1-\rho}{\pi}}, \qquad M\min(\xi, \eta) = -\sqrt{\frac{1-\rho}{\pi}}.$$
31. Let $\xi_1, \xi_2, \dots, \xi_n$ be independent identically distributed random variables, $M\xi_i = a$, $D\xi_i = \sigma^2$ ($i = 1, 2, \dots, n$). Find the mathematical expectations and variances of the random variables

$$\bar\xi = \frac{1}{n}\sum_{i=1}^{n}\xi_i, \qquad S^2 = \frac{1}{n}\sum_{i=1}^{n}(\xi_i - \bar\xi)^2.$$

If we assume in addition that there exists $\mu_3 = M(\xi_i - a)^3$, then show that

$$\operatorname{cov}(\bar\xi, S^2) = \frac{(n-1)\,\mu_3}{n^2}.$$
32. Let $\xi_1, \xi_2, \dots, \xi_n$ be independent $N(a, \sigma^2)$-distributed random variables. Let us define new random variables $\eta_n$, $M_n$, $S_{n,k}$, $M_{n,k}$ ($k \ge 1$) as follows:

$$\eta_n = \sum_{i=1}^{n-1}(\xi_{i+1} - \xi_i)^2, \qquad M_n = \sum_{i=1}^{n-1}\bigl|\xi_{i+1} - \xi_i\bigr|,$$

$$S_{n,k} = \sum_{i=1}^{n-1}(\xi_{i+1} - \xi_i)^{2k}, \qquad M_{n,k} = \sum_{i=1}^{n-1}\bigl|\xi_{i+1} - \xi_i\bigr|^{2k+1}.$$
a) Find the mathematical expectations of these random variables. b) If we remove the condition of independence, how will the answers change?

33. Let $\xi_1, \xi_2, \dots$ be a sequence of independent identically distributed exponential random variables with the parameter $\lambda$, $S_0 = 0$, $S_k = \xi_1 + \xi_2 + \dots + \xi_k$ ($k = 1, 2, 3, \dots$). Find the mathematical expectation and variance of the random variable $\nu_t$ which is defined by the condition

$$\nu_t = \min\{k \ge 1: S_1 < t,\ S_2 < t,\ \dots,\ S_{k-1} < t,\ S_k \ge t\}.$$

34. Continuation. Let the sequence of random variables $\xi_1, \xi_2, \dots$ and the random variable $\nu_t$ be defined as in the previous task. Find the mathematical expectation and variance of the random variable $S_{\nu_t} = \xi_1 + \xi_2 + \dots + \xi_{\nu_t}$.
35. Let $\xi_1, \xi_2, \dots$ be a sequence of independent random variables uniformly distributed on $[0, 1]$. Let us define the random variable $\nu = \min\{k: \xi_1 + \xi_2 + \dots + \xi_k > 1\}$. Show that its mathematical expectation is $M\nu = e$.

36. Let $\xi_1, \xi_2$ be independent exponential random variables with the parameters $\lambda_1, \lambda_2$ (respectively),

$$\eta_i = \frac{\xi_i}{\xi_1 + \xi_2} \qquad (i = 1, 2).$$

Find the correlation coefficient of the random variables $\eta_1, \eta_2$. How will the answer change if $\xi_1, \xi_2$ are any independent random variables with variances $D\xi_i = \sigma_i^2 < \infty$ ($i = 1, 2$)?

37. For a random variable $\xi \sim N(a, \sigma^2)$ we define new random variables by the relations:

$$\xi^{+}(a) = \begin{cases}\xi, & \xi \ge a,\\ 0, & \xi < a;\end{cases} \qquad \xi^{-}(a) = \begin{cases}\xi, & \xi < a,\\ 0, & \xi \ge a.\end{cases}$$

Find the mathematical expectations, variances and covariance of the random variables $\xi^{+}(a)$, $\xi^{-}(a)$.
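The claim $M\nu = e$ of task 35 above is convenient to check by simulation; a sketch with a fixed seed and a generous tolerance:

```python
import random

random.seed(12345)

def draw_nu():
    """One sample of nu = min{k: xi_1 + ... + xi_k > 1}, xi_i ~ U[0, 1]."""
    total, k = 0.0, 0
    while total <= 1.0:
        total += random.random()
        k += 1
    return k

n_sim = 400_000
estimate = sum(draw_nu() for _ in range(n_sim)) / n_sim
# M nu = e = 2.71828...; the standard error here is about 0.0014.
assert abs(estimate - 2.718281828) < 0.02
```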
38. Let $\xi_1, \xi_2, \dots, \xi_n$ be independent random variables uniformly distributed on $[a, b]$,

$$\underline\xi = \min\{\xi_1, \xi_2, \dots, \xi_n\}, \qquad \bar\xi = \max\{\xi_1, \xi_2, \dots, \xi_n\}.$$

Show that then the joint distribution density of these random variables is the function

$$f_{\underline\xi, \bar\xi}(x, y) = \frac{n(n-1)}{(b-a)^n}\,(y-x)^{n-2}, \qquad a \le x \le y \le b,$$

and, using this, prove the validity of the relations:

$$M\underline\xi = \frac{na+b}{n+1}, \qquad M\bar\xi = \frac{nb+a}{n+1},$$

$$D\underline\xi = D\bar\xi = \frac{n(b-a)^2}{(n+1)^2(n+2)}, \qquad \operatorname{cov}(\underline\xi, \bar\xi) = \frac{(b-a)^2}{(n+1)^2(n+2)}.$$
39. Let $\xi_1, \xi_2, \dots, \xi_n$ be independent identically distributed exponential (with the parameter $\lambda = 1$) random variables, and $\xi_{(1)} \le \xi_{(2)} \le \dots \le \xi_{(n)}$ the variational series corresponding to these random variables. Show that then the random variables

$$\eta_r = (n-r+1)\bigl(\xi_{(r)} - \xi_{(r-1)}\bigr), \qquad r = 1, 2, \dots, n, \quad \xi_{(0)} = 0,$$

are independent identically distributed exponential random variables with the parameter $\lambda = 1$. From this, using the relation

$$M\eta_r = (n-r+1)\bigl(M\xi_{(r)} - M\xi_{(r-1)}\bigr) = 1,$$

get the following formula for the mathematical expectation:

$$M\xi_{(r)} = \sum_{j=n-r+1}^{n}\frac{1}{j}.$$
How should one write the corresponding formula for the variance of the random variable $\xi_{(r)}$?

40. A random vector $(\xi, \eta)$ is uniformly distributed in the domain $D$ bounded by the ellipse $\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$, i.e. the density of the distribution of the vector is given by:

$$f_{\xi,\eta}(x, y) = \frac{1}{\pi ab}, \quad (x, y) \in D = \left\{(x, y): \frac{x^2}{a^2} + \frac{y^2}{b^2} \le 1\right\}; \qquad f_{\xi,\eta}(x, y) = 0, \quad (x, y) \notin D.$$

Find the covariance and correlation coefficient of the coordinates of this vector (i.e. of $\xi$ and $\eta$).
Find the covariance and correlation coefficient of coordinates of this vector (i.ɟ. [ and K ). 41. Continuation. Let in the previous task D
½ x 2 ( y ɫx) 2 d 1¾ . ®( x, y ) : 2 2 a b ¯ ¿
Find the correlation coefficient [ and K . 42. Let for integervalued random variables [1 , [ 2 , ... , [ r the condition [ 1 [ 2 ... [ r hold and for any ni t 0, n1 n 2 ... n r n ,
n
P^[ 1 n1 ,..., [ r n r ` n! p1n1 p 2n2 ... p rnr , n1 ! n 2 !... n r !
where pi t 0, p1 p 2 ... p r 1 . Find the correlation coefficient of random variables [ i and [ j i z j . 43. Let [1 , [ 2 , ... be a sequence of independent identically distributed random variables that
take only positive values, and there are mathematical expectations M[ k a , M [ k1 b . Show that for S n [1 [ 2 ... [ n the mathematical expectation M S n1 f and for M [ k S n1 n 1 , k 1,2,..., n . 44. Let [1 ,[ 2 ,...,[ n be random variables with finite mathematical expectations. 246
Show that the following inequalities are then satisfied:

$$M\max\{\xi_1, \xi_2, \dots, \xi_n\} \ge \max\{M\xi_1, M\xi_2, \dots, M\xi_n\},$$

$$M\min\{\xi_1, \xi_2, \dots, \xi_n\} \le \min\{M\xi_1, M\xi_2, \dots, M\xi_n\}.$$

45. Let $\xi_1, \xi_2, \dots$ be a sequence of independent identically distributed random variables with finite variances, $S_n = \xi_1 + \dots + \xi_n$, $n = 1, 2, \dots$. Show that in this case the correlation coefficient of the random variables $S_k$ and $S_{k+m}$ is equal to

$$\sqrt{\frac{k}{k+m}}.$$

46. Let $\xi, \eta$ be independent random variables,

$$f_\xi(x) = \frac{1}{\pi\sqrt{1-x^2}} \quad (|x| < 1), \qquad f_\eta(x) = x\,e^{-\frac{x^2}{2}} \quad (x > 0)$$

being their distribution densities (respectively). Show that the random variable $\zeta = \xi\eta$ is a normal random variable.
§2. Conditional probabilities and conditional mathematical expectations with respect to partitions and sigma algebras
First, let us dwell on the question of determining the conditional mathematical expectation of a random variable with respect to an event. Let $(\Omega, \mathcal{F}, P)$ be some probability space and $B \in \mathcal{F}$ some event ($P(B) > 0$). Then, with the help of the event $B$, we can define a new probability space (see Chap. II, §4, p. 4.1) $(\Omega, \mathcal{F}, P_B)$, where the probability function $P_B$ is defined in terms of the probability $P$ by the formula

$$P_B(A) = P(A/B) = \frac{P(AB)}{P(B)}, \qquad A \in \mathcal{F}.$$

Now let $\xi$ be a random variable defined on the probability space $(\Omega, \mathcal{F}, P)$. In this case, it is obvious that $\xi$ is a random variable also defined on $(\Omega, \mathcal{F}, P_B)$. Recalling now the general definition of the mathematical expectation of a random variable, the mathematical expectation of $\xi$ in the space $(\Omega, \mathcal{F}, P_B)$ will be called the conditional mathematical expectation of $\xi$ with respect to the event $B$, and we denote this conditional mathematical expectation by $M(\xi/B)$. Thus, by definition,

$$M(\xi/B) = \int_\Omega \xi(\omega)\,P_B(d\omega).$$
According to this definition, if $g = g(x)$ is a Borel function, then we can write the conditional mathematical expectation of the random variable $g(\xi)$ with respect to an event $B$ in the form of an integral

$$M(g(\xi)/B) = \int_\Omega g(\xi(\omega))\,P_B(d\omega).$$

To clarify the relationship between the mathematical expectations $Mg(\xi)$ and $M(g(\xi)/B)$, let us consider successively the three stages of the general definition of the mathematical expectation as the Lebesgue integral of §1.

Suppose that $C \in \mathcal{B}$ and $g(x) = I_C(x)$. In this case $g(\xi(\omega)) = I_{\xi^{-1}(C)}(\omega)$ and, by the definition of the Lebesgue integral,

$$M(g(\xi)/B) = \int_\Omega I_{\xi^{-1}(C)}(\omega)\,P_B(d\omega) = P_B(\xi^{-1}(C)).$$

On the other hand, by definition,

$$P_B(\xi^{-1}(C)) = \frac{P(\xi^{-1}(C)\,B)}{P(B)} = \frac{1}{P(B)}\int_B I_{\xi^{-1}(C)}(\omega)\,P(d\omega) = \frac{1}{P(B)}\int_B g(\xi(\omega))\,P(d\omega).$$

So, for the indicator $g(x) = I_C(x)$,

$$M(g(\xi)/B) = \frac{1}{P(B)}\int_B g(\xi(\omega))\,P(d\omega) = \frac{M(g(\xi)\,I_B)}{P(B)}. \qquad (*)$$

From this we immediately obtain that the formula (*) is also true for simple functions $g = g(x)$. Further, using the formula for the change of variables in the Lebesgue integral (§1, p. 1.4), we easily see that formula (*) is also true in general (that is, for any Borel function $g = g(x)$). Of course, the equality (*) could be proved directly, starting from the definition of the probability function $P_B$:

$$M(g(\xi)/B) = \int_\Omega g(\xi(\omega))\,P(d\omega/B) = \frac{1}{P(B)}\int_B g(\xi(\omega))\,P(d\omega) = \frac{M(g(\xi)\,I_B)}{P(B)}.$$

It follows from (*) that

$$M(g(\xi)\,I_B) = P(B)\,M(g(\xi)/B). \qquad (*')$$
Now let the events $B_1, B_2, \dots$ form a complete group of events, i.e.

$$B_iB_j = \varnothing\ (i \ne j), \qquad P(B_i) > 0, \qquad \sum_i B_i = \Omega.$$

In this case

$$M\xi = \int_\Omega \xi(\omega)\,P(d\omega) = \sum_i\int_{B_i}\xi(\omega)\,P(d\omega) = \sum_i M(\xi\,I_{B_i}) = \sum_i P(B_i)\,M(\xi/B_i). \qquad (**)$$

The formula (**) is called the formula of complete mathematical expectation.

If we introduce the function

$$F_\xi(x/B) = P_B(\xi \le x) = P\{\xi \le x\,/\,B\},$$

then this function is the distribution function of the random variable $\xi$ considered on the probability space $(\Omega, \mathcal{F}, P_B)$ (Chap. III, §2, p. 2.2). The function $F_\xi(x/B)$ is called the conditional distribution function of the random variable $\xi$ (considered on $(\Omega, \mathcal{F}, P)$) with respect to the event $B$. Using the assertions proved in §1, we can write the formula

$$M(g(\xi)/B) = \int_R g(x)\,dF_\xi(x/B).$$

If the sigma-algebra $\mathcal{F}_\xi$ generated by the random variable $\xi$ does not depend on the event $B$, then for any event $A \in \mathcal{F}_\xi$ we have $P_B(A) = P(A)$, so that

$$F_\xi(x/B) = F_\xi(x); \qquad M(\xi/B) = M\xi; \qquad M(\xi\,I_B) = P(B)\,M\xi.$$

If we denote by $D$ a partition $D = \{B_1, B_2, \dots\}$, then it is clear that the conditional mathematical expectation of a random variable $\xi$ with respect to the partition $D$ must be determined by the formula

$$M(\xi/D) = \sum_i M(\xi/B_i)\,I_{B_i}(\omega).$$
We now note the following. Above we assumed the fulfillment of the conditions $P(B) > 0$, $P(B_i) > 0$. Meanwhile, in probability theory it is often necessary to consider conditional mathematical expectations with respect to events having zero probability.

Consider, for example, the following experiment. A random point is placed on the segment $[0, 1]$ and, if the coordinate of this point is equal to $x$, then a coin with probability $x$ of landing Tails is tossed. Let $\nu$ be the number of fallouts of the Tail in $n$ tossings of this coin. Let us ask the question: what is the conditional probability $P\{\nu = k\,/\,\xi = x\}$, where $\xi$ is the coordinate of the point randomly placed on $[0, 1]$? Since $P\{\xi = x\} = 0$, we cannot (formally) define the probability $P\{\nu = k\,/\,\xi = x\}$ by the conditional probability formula. Meanwhile, it is intuitively clear that this probability is equal to $C_n^k x^k (1-x)^{n-k}$. In the same way, if $\xi, \eta$ are random variables, where $\eta$ is a continuous random variable, and $g(x, y)$ is a two-dimensional Borel function, then $P\{\eta = y\} = 0$, and it is clear that one should consider $M(g(\xi, \eta)\,/\,\eta = y) = Mg(\xi, y)$, although the first mathematical expectation is taken with respect to an event of zero probability.

First, we give a general definition of the conditional expectation (in particular, of the conditional probability) with respect to a finite partition, then with respect to a sub-$\sigma$-algebra $G$ of the fundamental $\sigma$-algebra $\mathcal{F}$ (that is, with respect to a $\sigma$-algebra $G$, where $G \subset \mathcal{F}$), and we prove a number of important properties of such conditional expectations. In conclusion of this chapter, we obtain formulas for calculating conditional expectations and conditional probabilities with respect to $\sigma$-algebras.

2.1. Conditional probabilities and conditional mathematical expectations with respect to partitions

Definition 1. Let $(\Omega, \mathcal{A}, P)$ be a finite probability space and $D = \{D_1, D_2, \dots, D_k\}$ some partition of $\Omega$: $D_i \in \mathcal{A}$, $P(D_i) > 0$, $D_1 + \dots + D_k = \Omega$. Then, for an event $A \in \mathcal{A}$, the (random) variable

$$P(A/D) = P(A/D)(\omega) = \sum_{i=1}^{k}P(A/D_i)\,I_{D_i}(\omega) \qquad (1)$$

is called the conditional probability of the event $A$ with respect to the partition $D$. Thus, the conditional probability $P(A/D)(\omega)$ is a random variable, and it takes the values $P(A/D_i)$ on the atoms $D_i$ of the partition $D$:

$$P(A/D)(\omega) = P(A/D_i), \qquad \omega \in D_i.$$

Properties:

$$P(A/\Omega) = P(A); \qquad P(A+B\,/\,D) = P(A/D) + P(B/D) \ \text{for disjoint } A, B.$$

Since $P(A/D)(\omega)$ is a random variable, we can raise the question of its mathematical expectation. Using definition (1) and the formula of total probability, it is easy to show that the following formula for the total mathematical expectation holds:

$$MP(A/D) = P(A).$$

Definition 2. Let $\eta$ be a (simple) random variable and let $D_\eta = \{D_1, D_2, \dots, D_k\}$, where $D_j = \{\eta = y_j\}$, be the partition of $\Omega$ generated by the random variable $\eta$. In this case, the conditional probability $P(A/\eta) = P(A/D_\eta)$ is called the conditional probability of the event $A$ with respect to the random variable $\eta$.

Definition 3. The following mathematical expectations are called the conditional mathematical expectations of a random variable $\xi = \sum_{j=1}^{l} x_j I_{A_j}(\omega)$ with respect to the event $D_i$ and the partition $D$ (respectively):

$$M(\xi/D_i) = \sum_{j=1}^{l} x_j P(A_j/D_i)\ \left(= \frac{M(\xi\,I_{D_i})}{P(D_i)}\right), \qquad (2)$$

$$M(\xi/D) = \sum_{j=1}^{l} x_j P(A_j/D). \qquad (3)$$

From formulas (1)–(3) we see that the conditional mathematical expectation with respect to the partition can also be determined by the formula

$$M(\xi/D)(\omega) = \sum_{i=1}^{k} M(\xi/D_i)\,I_{D_i}(\omega).$$
M [ D with respect to an event D is determined by formula (2), it suffices to recall the definition of (unconditional) mathematical expectation M[
l
j 1
251
¦ x j P Aj .
Then the conditional mathematical expectation M [ / D with respect a partition D is naturally defined as the sum of the products of the values of the random variable by the corresponding conditional probabilities of taking a random value of these values relative to the partition. According to Definition 3, conditional expectation M(ȟ/D)(Ȧ) is a random variable that takes the same value M [ Di for all elementary events Z belonging to the same atom Di . Definitions of conditional expectations M [ D are correct, i.e. do not depend on the way of representing (recording) a random variable [ . The following properties of (a.s.)conditional mathematical expectation follows directly from the definition:
M [ : M[ ;
M a [ b K / D
where a ,b are constants;
M c D
aM [ / D bM K / D ,
c , c is constant; M I A Z D P A D .
The last property shows, in particular, that the properties of conditional probabilities can be obtained directly from the properties of conditional mathematical expectations. The following important property generalizes the total probability formula and is called the formula of complete mathematical expectation:

$$M\bigl(M(\xi/D)\bigr) = M\xi. \qquad (3')$$

The proof follows immediately from the chain of the following equalities:

$$M\bigl(M(\xi/D)\bigr) = M\left(\sum_{j=1}^{l} x_j P(A_j/D)\right) = \sum_{j=1}^{l} x_j\,MP(A_j/D) = \sum_{j=1}^{l} x_j P(A_j) = M\xi. \qquad \square$$
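Definitions (1)–(3) and the total expectation formula (3') can be illustrated on a small finite probability space; a sketch (the space, the random variable and the partition are hypothetical examples):

```python
# Finite probability space: the conditional expectation with respect to a
# partition is itself a random variable, constant on each atom, and
# averaging it recovers M xi (formula (3')).
omega = [0, 1, 2, 3, 4, 5]
prob = {w: 1 / 6 for w in omega}       # equally likely outcomes
xi = {w: w * w for w in omega}         # a simple random variable
partition = [{0, 2, 4}, {1, 3, 5}]     # D = {D1, D2}: parity atoms

def cond_exp(xi, prob, partition):
    """Return M(xi/D) as a map omega -> value, constant on each atom."""
    out = {}
    for atom in partition:
        p_atom = sum(prob[w] for w in atom)
        m_atom = sum(xi[w] * prob[w] for w in atom) / p_atom  # M(xi/D_i)
        for w in atom:
            out[w] = m_atom
    return out

m_xi_d = cond_exp(xi, prob, partition)
lhs = sum(m_xi_d[w] * prob[w] for w in omega)   # M(M(xi/D))
rhs = sum(xi[w] * prob[w] for w in omega)       # M xi
assert abs(lhs - rhs) < 1e-12
```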
2.1.1. The conditional mathematical expectation of one simple random variable relative to another simple random variable

Let $\xi, \eta$ be simple random variables. In this case, by definition, the conditional mathematical expectation of $\xi$ with respect to $\eta$ (the notation is $M(\xi/\eta)$) is defined as the conditional mathematical expectation of the random variable $\xi$ with respect to the partition generated by the random variable $\eta$,

$$D_\eta = \{D_1, D_2, \dots, D_k\}, \qquad D_j = \{\eta = y_j\}, \qquad P(D_j) > 0:$$

$$M(\xi/\eta) = M(\xi/D_\eta). \qquad (4)$$

Properties.

1°. If the random variable $\eta$ is measurable with respect to the partition $D$, i.e. $D_\eta \preceq D$ — in other words, if $\eta$ can be represented in the form $\eta = \sum_{i=1}^{k} y_i I_{D_i}(\omega)$ — then

$$M(\xi\eta/D) = \eta\,M(\xi/D). \qquad (5)$$

2°. If $\xi$ and $\eta$ are independent, then $M(\xi/\eta) = M\xi$.

3°. For any random variable $\eta$, $M(\eta/\eta) = \eta$.

4°. For any random variables $\eta_1, \eta_2$,

$$M\bigl[M(\xi/\eta_1, \eta_2)\,/\,\eta_1\bigr] = M(\xi/\eta_1), \qquad (6)$$

where $M(\xi/\eta_1, \eta_2) = M(\xi/D_{\eta_1,\eta_2})$, and $D_{\eta_1,\eta_2}$ is the partition generated by the random variables $\eta_1$ and $\eta_2$. In the general case,

$$M\bigl[M(\xi/D_2)\,/\,D_1\bigr] = M(\xi/D_1), \qquad (7)$$

where $D_1 \preceq D_2$ (the partition $D_2$ is «finer» than the partition $D_1$).

The proofs of the properties reduce to simple verifications of the indicated relations. For example:

3°. For $\eta(\omega) = \sum_{i=1}^{k} y_i I_{D_i}(\omega)$, where $D_i = \{\omega: \eta(\omega) = y_i\}$, $D_\eta = \{D_1, \dots, D_k\}$, we have:

$$M(\eta/\eta) = M(\eta/D_\eta) = \sum_{i=1}^{k} M(\eta/D_i)\,I_{D_i}(\omega) = \sum_{i=1}^{k}\left(\sum_{j=1}^{k} y_j P(D_j/D_i)\right) I_{D_i}(\omega) = \sum_{i=1}^{k}\sum_{j=1}^{k} y_j\,\frac{P(D_jD_i)}{P(D_i)}\,I_{D_i}(\omega) = \sum_{i=1}^{k} y_i I_{D_i}(\omega) = \eta(\omega).$$

1°. If $\xi = \sum_{j=1}^{l} x_j I_{A_j}$, then $\xi\eta = \sum_{j=1}^{l}\sum_{i=1}^{k} x_j y_i I_{A_jD_i}$; hence,

$$M(\xi\eta/D) = \sum_{j=1}^{l}\sum_{i=1}^{k} x_j y_i\,P(A_jD_i/D) = \sum_{j=1}^{l}\sum_{i=1}^{k} x_j y_i\sum_{m=1}^{k}P(A_jD_i/D_m)\,I_{D_m}(\omega) = \sum_{j=1}^{l}\sum_{i=1}^{k} x_j y_i\,P(A_j/D_i)\,I_{D_i}(\omega) \qquad (5')$$

(here $P(A_jD_i/D_m) = 0$ for $m \ne i$, since $A_jD_i \cap D_m = \varnothing$, while $P(A_jD_i/D_i) = P(A_j/D_i)$). On the other hand, $I_{D_i}^2 = I_{D_i}$ and $I_{D_i}I_{D_m} = 0$ ($i \ne m$). Given this, we can write

$$\eta\,M(\xi/D) = \left[\sum_{i=1}^{k} y_i I_{D_i}(\omega)\right]\left[\sum_{j=1}^{l} x_j P(A_j/D)\right] = \left[\sum_{i=1}^{k} y_i I_{D_i}(\omega)\right]\left[\sum_{m=1}^{k}\left(\sum_{j=1}^{l} x_j P(A_j/D_m)\right) I_{D_m}(\omega)\right] = \sum_{i=1}^{k}\sum_{j=1}^{l} y_i x_j\,P(A_j/D_i)\,I_{D_i}(\omega). \qquad (5'')$$

Comparing now (5') and (5''), we see that formula (5) is true.

2°. If $\xi = \sum_i x_i I_{A_i}(\omega)$ and $\eta(\omega)$ are independent, then the relation $M(\xi/\eta) = M\xi$
is directly verified. Relations (6) and (7) are proved similarly, but they require longer calculations (the proofs of these properties in the general case, when sigma-algebras are considered instead of partitions, will be given below, in p. 2.2). □

Examples. 1. Let $\xi, \eta$ be independent Bernoulli random variables and

$$P\{\xi = 1\} = P\{\eta = 1\} = p, \qquad P\{\xi = 0\} = P\{\eta = 0\} = q = 1-p.$$

Then

$$P\{\xi+\eta = k\,/\,\eta\} = P\{\xi+\eta = k\,/\,\eta = 0\}\,I\{\eta = 0\} + P\{\xi+\eta = k\,/\,\eta = 1\}\,I\{\eta = 1\} = P\{\xi = k\}\,I\{\eta = 0\} + P\{\xi = k-1\}\,I\{\eta = 1\},$$

$$M(\xi+\eta\,/\,\eta) = M(\xi/\eta) + M(\eta/\eta) = M\xi + \eta = p + \eta.$$

The last relation can be obtained in another way as follows:

$$M(\xi+\eta\,/\,\eta) = \sum_{k=0}^{2} k\,P\{\xi+\eta = k\,/\,\eta\} = p^{1-\eta}q^{\eta} + 2p\eta = p + \eta.$$
2. If $\xi$ and $\eta$ are any independent identically distributed simple random variables, then

$$M(\xi\,/\,\xi+\eta) = M(\eta\,/\,\xi+\eta) = \frac{\xi+\eta}{2},$$

because

$$2M(\xi\,/\,\xi+\eta) = M(\xi\,/\,\xi+\eta) + M(\eta\,/\,\xi+\eta) = M(\xi+\eta\,/\,\xi+\eta) = \xi+\eta.$$

By definition, the conditional variance of a random variable $\xi$ with respect to a partition $D$ is defined as the random variable

$$D(\xi/D) = M\bigl[(\xi - M(\xi/D))^2\,/\,D\bigr]. \qquad (8)$$
Statement 1. The variance of a simple random variable $\xi$ can be calculated in terms of conditional mathematical expectations and variances with respect to a partition $D$ by the formula

$$D\xi = M\,D(\xi/D) + D\,M(\xi/D). \qquad (9)$$

Proof.

$$D(\xi/D) = M\bigl[\xi^2 - 2\xi\,M(\xi/D) + (M(\xi/D))^2\,/\,D\bigr] = M(\xi^2/D) - 2\bigl(M(\xi/D)\bigr)^2 + \bigl(M(\xi/D)\bigr)^2 = M(\xi^2/D) - \bigl(M(\xi/D)\bigr)^2.$$

It implies that

$$M\,D(\xi/D) = M\,M(\xi^2/D) - M\bigl(M(\xi/D)\bigr)^2 = M\xi^2 - M\bigl[M(\xi/D)\bigr]^2.$$

On the other hand,

$$D\,M(\xi/D) = M\bigl[M(\xi/D)\bigr]^2 - \bigl[M\,M(\xi/D)\bigr]^2 = M\bigl[M(\xi/D)\bigr]^2 - (M\xi)^2.$$

Summing the last relations, we obtain the required formula (9). □

Note that above, in passing, we obtained another formula for calculating the conditional variance:

$$D(\xi/D) = M(\xi^2/D) - \bigl(M(\xi/D)\bigr)^2.$$
Statement 2. For any function $f$, the relation

$$M\bigl[f(\eta)\,M(\xi/\eta)\bigr] = M\bigl[\xi f(\eta)\bigr] \qquad (10)$$

holds.

Proof.

$$M\bigl[f(\eta)\,M(\xi/\eta)\bigr] = M\bigl[M(f(\eta)\,\xi\,/\,\eta)\bigr] = M\bigl[\xi f(\eta)\bigr].$$
Here we used the measurability of $f(\eta)$ with respect to $D_\eta$ and the formula of complete mathematical expectation. □

Example 3 (The sum of a random number of random summands). Let $\xi_1, \xi_2, \dots, \xi_n, \tau$ be independent simple random variables, where $\xi_1, \xi_2, \dots, \xi_n$ are identically distributed and $\tau$ takes the values $1, 2, \dots, n$. Let $S_\tau = \xi_1 + \xi_2 + \dots + \xi_\tau$ (the sum of a random number of random summands). In this case

$$M(S_\tau/\tau) = \tau\,M\xi_1, \qquad D(S_\tau/\tau) = \tau\,D\xi_1,$$

$$MS_\tau = M\tau\,M\xi_1, \qquad DS_\tau = M\tau\,D\xi_1 + D\tau\,(M\xi_1)^2. \qquad (11)$$

Indeed, we can write

$$M(S_\tau/\tau) = \sum_{k=1}^{n} M(S_\tau\,/\,\tau = k)\,I\{\tau = k\} = \sum_{k=1}^{n} M(S_k\,/\,\tau = k)\,I\{\tau = k\} = \sum_{k=1}^{n} MS_k\,I\{\tau = k\} = \sum_{k=1}^{n} k\,M\xi_1\,I\{\tau = k\} = M\xi_1\,\tau.$$

Here we first used the independence of $S_k$ and $\tau$, then the identical distribution of $\xi_1, \xi_2, \dots, \xi_n$. The second relation is proved in a similar way. The last formulas in (11) are corollaries of the formula of complete mathematical expectation.

Remark. In conclusion, we note the following circumstance. Concerning the concept of the conditional probability $P(A/B) = \frac{P(AB)}{P(B)}$ (where $A, B$ are events, $P(B) > 0$) introduced in Chapter II, §4, we note that this definition, generally speaking, does not correspond to the treatment of conditional probability in Kolmogorov's axiomatics. In application to the discrete case, if one strictly adheres to this axiomatics, one would have to consider the probabilities $P(A/B)$ and $P(A/\bar B)$, and understand by the conditional probability a random variable defined on $\Omega$ and equal to $P(A/B)$ for $\omega \in B$ and $P(A/\bar B)$ for $\omega \notin B$; i.e. this probability would have to be defined as the conditional probability $P(A/D)(\omega)$, where $D = \{B, \bar B\}$ is a partition of $\Omega$.
The following theorem, which is given without proof, ([9], Chap. II, §6) plays a key role in the construction of conditional mathematical expectations. Theorem 1 (The RadonNikodym theorem). Let ሺ : ǡ ࣠ሻ be a measurable space, P V finite measure and O – measure with a sign (i.ɟ. O O1 O2 , where at least 256
one of the measures O1 or O2 is finite), which is absolutely continuous with respect to P ( O f , [email protected] such that
O A ³ f ( Z )P dZ , Ⱥ ࣠ . A
Up to sets of P measure zero, the function f Z is unique: if h h( Z ) is another ࣠ – measurable function such that
O A ³ h( Z )P dZ , Ⱥ ࣠ , A
then P ^Z : f Z z hZ ` 0 . 0 ,]. [email protected] If O is a measure, then f f (Z ) takes values in R >[0, Remark 1. The function f f (Z ) forming the RadonNikodym theorem is called the RadonNikodym derivative or the density of the measure O with respect to dO dO the measure P , and is denoted by or Z . dP dP Before introducing the notion of conditional mathematical expectation with respect to a ıalgebra, we recall that, according to §1, the expectation M[ was defined in two steps: first for nonnegative random variables [ [ Z , then generally, using equality M[
M[ M[ , where [
[ [ , [
max[ , 0 , [
min([ , 0) , and
only under the assumption that min M[ , M[ f . Such a twostorey construction is also used when determining the conditional mathematical expectation with respect to V algebras. Let ሺ : ǡ ࣠ǡ ܲሻ be a probability space, G – some ıalgebra, G ࣠ ( G – ısubalgebra of ࣠ ), [ [ Z – a random variable. Definition 1. 1) The conditional mathematical expectation of a nonnegative random variable [ with respect to the ıalgebra G is a nonnegative extended random variable, denoted by M [ G or M [ G Z , such that ɚ) M [ G is G measurable; b) for any A G ,
∫_A ξ(ω) P(dω) = ∫_A M(ξ/G)(ω) P(dω).  (1)

It is clear that condition (1) can be written in the form

M(ξ(ω) I_A(ω)) = M(M(ξ/G)(ω) I_A(ω)),  A ∈ G.  (1*)

2) The conditional mathematical expectation M(ξ/G) (or M(ξ/G)(ω)) of an arbitrary random variable ξ with respect to the σ-algebra G is considered to be defined if, P-a.s.,

min(M(ξ⁺/G), M(ξ⁻/G)) < ∞,

and it is given by the formula

M(ξ/G) ≡ M(ξ⁺/G) − M(ξ⁻/G);  (2)

on the set (of zero probability) of those elementary events for which M(ξ⁺/G) = M(ξ⁻/G) = ∞, the difference M(ξ⁺/G) − M(ξ⁻/G) is defined arbitrarily, for example, it is assumed to be equal to zero.
2.2.1. Existence of the conditional expectation with respect to the σ-algebra

Theorem 2. Suppose that a random variable ξ has the mathematical expectation Mξ. Then the set function

Q(A) = ∫_A ξ(ω) P(dω),  A ∈ G,

is a measure with a sign on (Ω, G), and Q is absolutely continuous with respect to the measure P (considered on (Ω, G), G ⊆ ℱ), i.e., if for A ∈ G we have P(A) = 0, then Q(A) = 0 (this is written as Q ≪ P).

Proof. Suppose first that ξ ≥ 0. If A₁, A₂, … ∈ G, A_i ∩ A_j = ∅ (i ≠ j), then for A = Σ_{n=1}^∞ A_n we can write

Q(A) = M(ξ I_A) = M(Σ_{n=1}^∞ ξ I_{A_n}) = Σ_{n=1}^∞ M(ξ I_{A_n}) = Σ_{n=1}^∞ Q(A_n).

(We have taken advantage here of the fact that for nonnegative ξ₁, ξ₂, … one has M(Σ_n ξ_n) = Σ_n Mξ_n, where here ξ_n = ξ I_{A_n}; see §1, Theorem 4, Corollary 1.)

If now ξ is an arbitrary random variable for which the expectation Mξ is defined, then the countable additivity of Q follows from the representation

Q(A) = Q⁺(A) − Q⁻(A),  where  Q⁺(A) = ∫_A ξ⁺(ω) P(dω),  Q⁻(A) = ∫_A ξ⁻(ω) P(dω),

from the countable additivity just established for nonnegative random variables, and from the fact that min(Q⁺(Ω), Q⁻(Ω)) < ∞. So, if Mξ is defined, then the set function Q = Q(A) is a measure with a sign, i.e. a countably additive set function representable in the form Q = Q₁ − Q₂, where at least one of the measures Q₁ or Q₂ is finite.

We now show the property Q ≪ P. For the proof it suffices to consider the case of nonnegative random variables. If ξ = Σ_{k=1}^n x_k I_{A_k} is a simple nonnegative random variable and P(A) = 0, then

Q(A) = M(ξ I_A) = Σ_{k=1}^n x_k P(A_k ∩ A) = 0.

If ξ ≥ 0 and ξ_n (n = 1, 2, …) is a sequence of simple nonnegative random variables such that 0 ≤ ξ_n ↑ ξ, then, by the monotone convergence theorem,

Q(A) = M(ξ I_A) = lim_{n→∞} M(ξ_n I_A) = 0,

because M(ξ_n I_A) = 0 for any n ≥ 1 and any A with P(A) = 0. ∎

Remark 2. Note that the Radon–Nikodym theorem given above is, in a certain sense, the converse of the statement just proved.

Corollary. Let ξ be a nonnegative random variable. Then there exists a G-measurable random variable M(ξ/G) such that for any A ∈ G
Q(A) = ∫_A M(ξ/G)(ω) P(dω).  (3)

Proof. Indeed, if ξ ≥ 0, then Theorem 2 implies that the set function Q(A) = ∫_A ξ(ω) P(dω), A ∈ G, is a measure on (Ω, G), and this measure Q is absolutely continuous with respect to the measure P considered on (Ω, G). Then, according to the Radon–Nikodym theorem, there exists a G-measurable random variable M(ξ/G)(ω) satisfying condition (3). Furthermore, condition (1) follows from the condition Q(A) = ∫_A ξ(ω) P(dω), A ∈ G, and condition (3). ∎
Remark 3. In accordance with the Radon–Nikodym theorem, the conditional mathematical expectation M(ξ/G) is uniquely determined only up to sets of zero P-measure. In other words, one can take as M(ξ/G) any random variable η satisfying the condition Q(A) = ∫_A η(ω) P(dω), A ∈ G. A random variable η = η(ω) defined in this way is called a variant of the conditional mathematical expectation M(ξ/G)(ω).

Remark 4. In accordance with the remark to the Radon–Nikodym theorem,

M(ξ/G) ≡ (dQ/dP)(ω),  (4)

i.e., the conditional mathematical expectation is nothing other than the Radon–Nikodym derivative of the measure Q with respect to the measure P (considered on (Ω, G)).

Remark 5. In connection with relation (1), we cannot assert that always M(ξ/G) = ξ, since the random variable ξ does not have to be G-measurable.

Definition 2. Suppose that B ∈ ℱ. Then the conditional mathematical expectation M(I_B/G) is denoted by P(B/G), or P(B/G)(ω), and is called the conditional probability of the event B with respect to the σ-algebra G, G ⊆ ℱ:

P(B/G) = M(I_B/G).

It follows from Definitions 1 and 2 that for any fixed B ∈ ℱ the conditional probability P(B/G)(ω) is a random variable such that:
a) P(B/G) is a G-measurable random variable;
b) for any A ∈ G,

P(A ∩ B) = ∫_A P(B/G)(ω) P(dω).  (5)
M(ξ/ℱ_η), where ℱ_η is the σ-algebra generated by the random variable η, is denoted by M(ξ/η) and is called the conditional mathematical expectation of a random variable ξ with respect to a random variable η:

M(ξ/ℱ_η) = M(ξ/η).  (6)

Similarly, the conditional probability of the event B ∈ ℱ with respect to the σ-algebra ℱ_η is denoted by P(B/η):

P(B/ℱ_η) = P(B/η).

2.2.2. Consistency of the definition of the conditional mathematical expectation with respect to a partition with the definition of the conditional mathematical expectation with respect to σ-algebras

Suppose that 𝒟 = {D₁, D₂, …} is a partition of Ω: D_i ∩ D_j = ∅ (i ≠ j), P(D_i) > 0, Σ_i D_i = Ω, and, if A ⊆ D_i, then either P(A) = 0 or P(D_i \ A) = 0.
Theorem 2. If G = σ(𝒟) is the smallest σ-algebra generated by the partition 𝒟, and ξ is a random variable for which Mξ is defined, then

M(ξ/G) = M(ξ I_{D_i}) / P(D_i)  (P-a.s. on D_i),

or, which is the same,

M(ξ/G) = (1/P(D_i)) ∫_{D_i} ξ(ω) P(dω)  (P-a.s. on D_i)

(here the notation ξ = η (P-a.s. on A) means that P(A ∩ {ξ ≠ η}) = 0).

Proof. M(ξ/G) is constant on D_i (Chap. III, §1, Theorem 5): M(ξ/G) = η_i on D_i. But, on the other side,

M(ξ I_{D_i}) = ∫_{D_i} ξ(ω) P(dω) = ∫_{D_i} M(ξ/G)(ω) P(dω) = η_i P(D_i),

from where it follows that

η_i = (1/P(D_i)) ∫_{D_i} ξ(ω) P(dω). ∎
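For a discrete sample space the formula just proved reduces to elementary averaging over the cells of the partition. A minimal numerical sketch (the sample space, partition, and random variable below are illustrative, not taken from the book):

```python
from fractions import Fraction

# Finite sample space Omega = {0,...,5} with equal probabilities 1/6 (a die).
# Partition D = {D1, D2} with D1 = {0, 1} and D2 = {2, 3, 4, 5}.
# xi(omega) = omega + 1 (the face value).
omega = list(range(6))
P = {w: Fraction(1, 6) for w in omega}
xi = {w: w + 1 for w in omega}
partition = [{0, 1}, {2, 3, 4, 5}]

def cond_exp(xi, partition, P):
    """M(xi/sigma(D)): on each cell D_i the value is M(xi I_{D_i}) / P(D_i)."""
    e = {}
    for D in partition:
        pD = sum(P[w] for w in D)
        mean = sum(xi[w] * P[w] for w in D) / pD
        for w in D:
            e[w] = mean
    return e

e = cond_exp(xi, partition, P)
# Defining property (1): the integrals of xi and of M(xi/G) agree on every cell.
for D in partition:
    assert sum(xi[w] * P[w] for w in D) == sum(e[w] * P[w] for w in D)
# Formula of total expectation: M[M(xi/G)] = M xi = 7/2.
assert sum(e[w] * P[w] for w in omega) == Fraction(7, 2)
print(sorted(set(e.values())))  # the two constant values taken on D1 and D2
```

On D₁ the conditional expectation equals (1 + 2)/2 = 3/2, on D₂ it equals (3 + 4 + 5 + 6)/4 = 9/2.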
2.2.3. Properties of conditional expectation
We assume that all the random variables considered below are defined on the same probability space (Ω, ℱ, P), that their mathematical expectations exist, and that the σ-algebras G, G₁, G₂ are sub-σ-algebras of the basic σ-algebra ℱ (G, G₁, G₂ ⊆ ℱ).

Theorem 4 (Properties of the conditional expectation).

1°. If c is a constant and ξ = c (a.s.), then M(ξ/G) = c (a.s.).

2°. If ξ ≤ η (a.s.), then M(ξ/G) ≤ M(η/G) (a.s.).

3°. |M(ξ/G)| ≤ M(|ξ|/G) (a.s.).

4°. If a, b are constants and aMξ + bMη is defined, then

M(aξ + bη / G) = a M(ξ/G) + b M(η/G)  (a.s.).

5°. Suppose that ℱ* = {∅, Ω}; then M(ξ/ℱ*) = Mξ (a.s.).

6°. M(ξ/ℱ) = ξ (a.s.).

7°. (The formula of total expectation)

M(M(ξ/G)) = Mξ.  (7)

8°. If G₁ ⊆ G₂, then

M[M(ξ/G₂)/G₁] = M(ξ/G₁)  (a.s.).  (8)

9°. If G₂ ⊆ G₁, then

M[M(ξ/G₂)/G₁] = M(ξ/G₂)  (a.s.).  (9)

10°. If a random variable ξ does not depend on the σ-algebra G (i.e. if ξ and the indicator I_B of any event B ∈ G are independent) and the expectation Mξ is defined, then

M(ξ/G) = Mξ  (a.s.).

11°. If η is a G-measurable random variable, M|η| < ∞ and M|ξη| < ∞, then

M(ξη/G) = η M(ξ/G)  (a.s.).

Proof of the properties. 1°. The random variable ξ = c (a.s.), c a constant, is a G-measurable function, therefore it suffices to show that for any A ∈ G the equality
∫_A ξ(ω) P(dω) = ∫_A c P(dω)

takes place. This, in turn, follows from the relation ξ(ω) I_A(ω) = c I_A(ω) (a.s.) and the equality M(ξ I_A) = M(c I_A) (§1, p. 1.2, property 3°).

2°. If ξ ≤ η (a.s.), then ξ(ω) I_A(ω) ≤ η(ω) I_A(ω), A ∈ G. Then, by property 5° of the expectation (see §1, p. 1.2), M(ξ I_A) ≤ M(η I_A), i.e.

∫_A ξ(ω) P(dω) ≤ ∫_A η(ω) P(dω),  A ∈ G,

hence,

∫_A M(ξ/G) P(dω) ≤ ∫_A M(η/G) P(dω),  A ∈ G.

Now, using the following property of the mathematical expectation: «if M|ξ| < ∞, M|η| < ∞ and for any event A ∈ ℱ the inequality M(ξ I_A) ≤ M(η I_A) holds, then ξ ≤ η (a.s.)», we obtain the inequality

M(ξ/G) ≤ M(η/G)  (a.s.).

3°. This property is a direct consequence of property 2° and the inequality −|ξ| ≤ ξ ≤ |ξ|.

4°. This property follows from the following chain of relations, true for any event A ∈ G and constants a, b (below we use the linearity of the mathematical expectation and the definition of the conditional mathematical expectation):

∫_A (aξ + bη) P(dω) = ∫_A aξ P(dω) + ∫_A bη P(dω) =
= ∫_A a M(ξ/G) P(dω) + ∫_A b M(η/G) P(dω) = ∫_A [a M(ξ/G) + b M(η/G)] P(dω).
5°. For any random variable ξ its mathematical expectation Mξ (as a constant) is ℱ*-measurable, and, if A = Ω or A = ∅, then

∫_A ξ(ω) P(dω) = ∫_A Mξ P(dω),

which shows the validity of the property being proved.

6°. The random variable ξ, by definition, is always ℱ-measurable and

∫_A ξ(ω) P(dω) = ∫_A M(ξ/ℱ) P(dω),  A ∈ ℱ,

therefore M(ξ/ℱ) = ξ (a.s.).

7°. This property follows from properties 5° and 8° (here it suffices to take G₁ = {∅, Ω}, G₂ = G). In the literature this property is called the formula of complete (total) mathematical expectation and is briefly read as: «the mathematical expectation of the conditional mathematical expectation is equal to the unconditional mathematical expectation».

8°. Suppose that A ∈ G₁. In this case

∫_A M(ξ/G₁) P(dω) = ∫_A ξ(ω) P(dω).

Since G₁ ⊆ G₂, we have A ∈ G₂, therefore

∫_A M[M(ξ/G₂)/G₁] P(dω) = ∫_A M(ξ/G₂) P(dω) = ∫_A ξ(ω) P(dω).

So, for A ∈ G₁,

∫_A M(ξ/G₁) P(dω) = ∫_A M[M(ξ/G₂)/G₁] P(dω).

Therefore, by the property of the expectation that was used in the proof of property 1°,

M(ξ/G₁) = M[M(ξ/G₂)/G₁]  (a.s.).

9°. If A ∈ G₁, then, by the definition of the conditional mathematical expectation M[M(ξ/G₂)/G₁],
∫_A M[M(ξ/G₂)/G₁] P(dω) = ∫_A M(ξ/G₂) P(dω).

The function M(ξ/G₂) is, by definition, G₂-measurable, and since G₂ ⊆ G₁, it is also a G₁-measurable function. It follows that the random variable M(ξ/G₂) is one of the versions of the conditional mathematical expectation M[M(ξ/G₂)/G₁], which proves property 9°.

10°. The mathematical expectation Mξ, as a constant, is G-measurable, so we need to check that for any event A ∈ G

∫_A ξ(ω) P(dω) = ∫_A Mξ P(dω),

i.e., it is necessary to check that M(ξ I_A) = Mξ · M(I_A). If M|ξ| < ∞ or ξ is nonnegative, then this relation is a corollary of the multiplicative property of the mathematical expectation. The general case follows from the representation of the random variable ξ in the form ξ = ξ⁺ − ξ⁻.

11°. The proof of this property will be given after the proof of the following Theorem 3 (with the application of statement a) of this theorem). ∎

Theorem 3 (Theorem on convergence under the sign of the conditional expectation). Let {ξ_n}, n ≥ 1, be a sequence of extended random variables.

a) If |ξ_n| ≤ η, Mη < ∞ and ξ_n → ξ (a.s.), then

M(ξ_n/G) → M(ξ/G)  (a.s.)  and  M(|ξ_n − ξ| / G) → 0  (a.s.).

b) If ξ_n ≥ η, Mη > −∞ and ξ_n ↑ ξ (a.s.), then

M(ξ_n/G) ↑ M(ξ/G)  (a.s.).

c) If ξ_n ≤ η, Mη < ∞ and ξ_n ↓ ξ (a.s.), then

M(ξ_n/G) ↓ M(ξ/G)  (a.s.).

d) If ξ_n ≥ η, Mη > −∞, then

M(lim inf ξ_n / G) ≤ lim inf M(ξ_n/G)  (a.s.).

e) If ξ_n ≤ η, Mη < ∞, then

lim sup M(ξ_n/G) ≤ M(lim sup ξ_n / G)  (a.s.).

f) If ξ_n ≥ 0, then

M(Σ_{n=1}^∞ ξ_n / G) = Σ_{n=1}^∞ M(ξ_n/G)  (a.s.).

Proof. a) Put η_n = sup_{m≥n} |ξ_m − ξ|. Since ξ_n → ξ (a.s.), we have η_n ↓ 0 (a.s.).
We have M|ξ_n| < ∞, M|ξ| < ∞; therefore, by properties 3°, 4°,

|M(ξ_n/G) − M(ξ/G)| = |M(ξ_n − ξ / G)| ≤ M(|ξ_n − ξ| / G) ≤ M(η_n/G).

Also M(η_{n+1}/G) ≤ M(η_n/G) (a.s.), therefore the (a.s.) limit h = lim_n M(η_n/G) exists. Then, as n → ∞,

0 ≤ ∫_Ω h(ω) P(dω) ≤ ∫_Ω M(η_n/G) P(dω) = M(M(η_n/G)) = Mη_n = ∫_Ω η_n(ω) P(dω) → 0

(the convergence of the last integral to zero is a consequence of the theorem on majorized convergence, because 0 ≤ η_n ≤ 2η, Mη < ∞). Hence

∫_Ω h(ω) P(dω) = 0,

and therefore, by the property of the mathematical expectation «if η ≥ 0 and Mη = 0, then η = 0 (a.s.)», we obtain h = 0 (a.s.).

b) Suppose first that η ≡ 0. We have M(ξ_n/G) ≤ M(ξ_{n+1}/G) (a.s.), therefore the (a.s.) limit ζ(ω) = lim_{n→∞} M(ξ_n/G) exists.
Then the equality

∫_A ξ_n(ω) P(dω) = ∫_A M(ξ_n/G)(ω) P(dω),  A ∈ G,

and the theorem on monotone convergence imply

∫_A ξ(ω) P(dω) = ∫_A ζ(ω) P(dω),  A ∈ G.

Hence

M(ξ I_A(ω)) = M(ζ I_A(ω)),  A ∈ G,

which implies ζ = M(ξ/G) (a.s.).

In the general case 0 ≤ ξ_n⁺ ↑ ξ⁺ and, according to the statement just proved,

M(ξ_n⁺/G) ↑ M(ξ⁺/G)  (a.s.).  (7)

But 0 ≤ ξ_n⁻ ≤ η⁻ and Mη⁻ < ∞; therefore, by property a),

M(ξ_n⁻/G) → M(ξ⁻/G)  (a.s.),

and this together with relation (7) shows the validity of property b).

Property c) follows directly from property b).

d) Put η_n = inf_{m≥n} ξ_m. Then η_n ↑ η̃ = lim inf ξ_n and, by property b),

M(η_n/G) ↑ M(η̃/G)  (a.s.),

therefore, almost surely,

M(lim inf ξ_n / G) = M(η̃/G) = lim_n M(η_n/G) = lim inf_n M(η_n/G) ≤ lim inf_n M(ξ_n/G).

Property e) is a corollary of property d).

f) If ξ_n ≥ 0, then, by property 4°,

M(Σ_{k=1}^n ξ_k / G) = Σ_{k=1}^n M(ξ_k/G)  (a.s.),

and this together with property b) proves the required assertion f). ∎

Proof of property 11°. Suppose that η = I_B, B ∈ G. In this case, for any A ∈ G,
∫_A ξ(ω)η(ω) P(dω) = ∫_A ξ(ω) I_B(ω) P(dω) = ∫_{A∩B} ξ(ω) P(dω) =
= ∫_{A∩B} M(ξ/G)(ω) P(dω) = ∫_A I_B(ω) M(ξ/G)(ω) P(dω) = ∫_A η(ω) M(ξ/G)(ω) P(dω).

From this, according to the additive property of the Lebesgue integral, we obtain that the property

∫_A ξ(ω)η(ω) P(dω) = ∫_A η(ω) M(ξ/G)(ω) P(dω),  A ∈ G,  (8)

remains true for simple random variables η = Σ_{k=1}^n y_k I_{B_k}, B_k ∈ G, n < ∞. Therefore, for simple G-measurable random variables η we have

M(ξη/G) = η M(ξ/G)  (a.s.).  (9)

Now let η be an arbitrary G-measurable random variable such that M|η| < ∞, and let η₁, η₂, … be a sequence of simple random variables satisfying the conditions |η_n| ≤ |η|, η_n → η. Then, by (9),

M(ξη_n/G) = η_n M(ξ/G)  (a.s.).

Also |ξη_n| ≤ |ξη| and M|ξη| < ∞. Therefore, according to point a) of the theorem,

M(ξη_n/G) → M(ξη/G)  (a.s.).

Further, since M|ξ| < ∞, we have |M(ξ/G)| < ∞ (a.s.), and thereby

η_n M(ξ/G) → η M(ξ/G)  (a.s.). ∎
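Properties 7° and 11° can also be observed numerically. The sketch below is purely illustrative (the distributions and tolerances are assumptions): it conditions on the σ-algebra generated by a discrete η, i.e. averages ξ over the atoms {η = y}:

```python
import random

random.seed(1)

# Simulate eta uniform on {0, 1, 2} and xi depending on eta plus Gaussian noise.
n = 200_000
eta = [random.randrange(3) for _ in range(n)]
xi = [e + random.gauss(0.0, 1.0) for e in eta]

# M(xi/eta): empirical average of xi over each atom {eta = y}.
sums = {0: 0.0, 1: 0.0, 2: 0.0}
counts = {0: 0, 1: 0, 2: 0}
for e, x in zip(eta, xi):
    sums[e] += x
    counts[e] += 1
cond = {y: sums[y] / counts[y] for y in sums}

# Property 7 (total expectation): M[M(xi/eta)] = M xi (an exact grouped-sum identity).
m_xi = sum(xi) / n
m_cond = sum(cond[e] for e in eta) / n
assert abs(m_xi - m_cond) < 1e-6

# Property 11 (taking out the G-measurable factor eta):
# M[eta * xi] = M[eta * M(xi/eta)] -- again exact up to rounding.
lhs = sum(e * x for e, x in zip(eta, xi)) / n
rhs = sum(e * cond[e] for e in eta) / n
assert abs(lhs - rhs) < 1e-6
print(cond)  # approximately {0: 0.0, 1: 1.0, 2: 2.0}
```

Both assertions hold up to floating-point rounding, because grouping the sample by atoms does not change the overall sums.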
2.3. The structure of the conditional mathematical expectation of one random variable relative to the other
Let ξ = ξ(ω) and η = η(ω) be random variables defined on the same probability space (Ω, ℱ, P). In this case, by definition, the conditional mathematical expectation M(ξ/η)(ω) is an ℱ_η-measurable random variable and, according to Theorem 4 (see p. 1.4.2, §1, Chap. III), there exists a Borel function m = m(y) (m(·): R → R̄ = [−∞, ∞]) such that for all ω ∈ Ω

m(η(ω)) = M(ξ/η)(ω).  (10)

We shall denote this function m(y) by M(ξ/η = y):

m(y) = M(ξ/η = y).  (10′)

According to the definition of the conditional mathematical expectation, for any event A ∈ ℱ_η,

∫_A ξ(ω) P(dω) = ∫_A M(ξ/η)(ω) P(dω) = ∫_A m(η(ω)) P(dω).
Further, by the formula for the change of variables under the sign of the Lebesgue integral,

∫_{ω: η(ω)∈B} m(η(ω)) P(dω) = ∫_B m(y) P_η(dy),  B ∈ ℬ(R),

where P_η is the law of the probability distribution of the random variable η. Hence, m = m(y) is a Borel function such that for every B ∈ ℬ(R),

∫_{ω: η(ω)∈B} ξ(ω) P(dω) = ∫_B m(y) P_η(dy).
The foregoing suggests that one can arrive at the definition of the conditional mathematical expectation M(ξ/η = y) in a different way.

Definition 3. The conditional mathematical expectation of a random variable ξ under the condition that a random variable η takes a value y (η = y) is any ℬ(R)-measurable function m = m(y) for which

∫_{η⁻¹(B)} ξ(ω) P(dω) = ∫_{ω: η(ω)∈B} ξ(ω) P(dω) = ∫_B m(y) P_η(dy),  B ∈ ℬ(R).  (11)
Remark 6. The existence of a function m(y) = M(ξ/η = y) follows from the Radon–Nikodym theorem, if one observes that the set function

Q(B) = ∫_{η⁻¹(B)} ξ(ω) P(dω)

is a measure with a sign, absolutely continuous with respect to the measure P_η.

Remark 7. Suppose that m(y) is a conditional mathematical expectation in the sense of the last definition. Then, applying again the formula for the change of variables under the sign of the Lebesgue integral, we find that

∫_{η⁻¹(B)} ξ(ω) P(dω) = ∫_B m(y) P_η(dy) = ∫_{η⁻¹(B)} m(η(ω)) P(dω),  B ∈ ℬ(R).  (12)

The function m = m(η) is an ℱ_η-measurable function, and all the sets in ℱ_η are exhausted by the sets η⁻¹(B) = {ω: η(ω) ∈ B}, B ∈ ℬ(R); thus the last relation is true for any A ∈ ℱ_η. By definition, it implies that m(η) is a conditional mathematical expectation M(ξ/η):

m(η(ω)) = M(ξ/η)(ω).

Thus, knowing M(ξ/η = y), one can restore M(ξ/η) and, vice versa, one can find M(ξ/η = y) from M(ξ/η). From the intuitive point of view, M(ξ/η = y) is a simpler and more understandable object than M(ξ/η). However, the conditional mathematical expectation M(ξ/η), considered as an ℱ_η-measurable random variable, is more convenient to work with.
2.3.1. Properties and formulas for calculating the conditional expectation M(ξ/η = y)
The above properties of the conditional mathematical expectation M(ξ/η), as well as the statements of the theorem on limit transitions under the sign of the conditional mathematical expectation, remain valid also in the case of the conditional mathematical expectation M(ξ/η = y) (with «P-almost surely (a.s.)» replaced by «P_η-almost surely (a.s.)»). For example:

If M|ξ| < ∞ and M|ξ f(η)| < ∞, where f = f(y) is a ℬ(R)-measurable function, then

M(ξ f(η) / η = y) = f(y) M(ξ / η = y)  (P_η-a.s.).

If ξ and η are independent, then

M(ξ / η = y) = Mξ  (P_η-a.s.).

If ξ and η are independent and B ∈ ℬ(R²), then

M[I_B(ξ, η) / η = y] = M I_B(ξ, y)  (P_η-a.s.).

If φ = φ(x, y) is a ℬ(R²)-measurable function such that M|φ(ξ, η)| < ∞, then

M[φ(ξ, η) / η = y] = M φ(ξ, y)  (P_η-a.s.).
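The last of these relations is easy to check by simulation. In the sketch below (an illustration only; the distributions and the function φ are chosen arbitrarily) ξ and η are independent, φ(x, y) = xy, and the conditional mean of φ(ξ, η) over the atom {η = y₀} is compared with Mφ(ξ, y₀):

```python
import random

random.seed(2)

# Independent xi ~ Exp(1) and eta uniform on {1, 2, 3}; phi(x, y) = x * y.
# For independent xi, eta:  M(phi(xi, eta) / eta = y) = M phi(xi, y).
n = 300_000
xi = [random.expovariate(1.0) for _ in range(n)]
eta = [random.choice([1, 2, 3]) for _ in range(n)]

phi = lambda x, y: x * y

# Left side: empirical mean of phi(xi, eta) over the atom {eta = y0}.
y0 = 3
sel = [phi(x, e) for x, e in zip(xi, eta) if e == y0]
left = sum(sel) / len(sel)

# Right side: M phi(xi, y0), estimated from the whole xi-sample.
right = sum(phi(x, y0) for x in xi) / n

print(left, right)  # both are close to y0 * M xi = 3
```

Both estimates converge to y₀ · Mξ = 3 as the sample grows.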
Definition 4. We will call the conditional mathematical expectation M(I_A/η = y) the conditional probability of the event A ∈ ℱ under the condition η = y (notation: P(A/η = y)):

P(A/η = y) = M(I_A/η = y).  (13)

It is clear that P(A/η = y) could also be defined as a ℬ(R)-measurable function such that

P(A ∩ {η ∈ B}) = ∫_B P(A/η = y) P_η(dy),  B ∈ ℬ(R).  (14)

If η is a continuous random variable, P{|η| < ∞} = 1, and f_η(y) is its distribution density, then, taking B = R, from formula (14) we obtain the formula

P(A) = ∫_{−∞}^{∞} P(A/η = y) f_η(y) dy.  (14′)
If η is a discrete random variable, P{η = y_k} > 0, Σ_{k=1}^∞ P{η = y_k} = 1, then

P(A/η = y_k) = P(A ∩ {η = y_k}) / P{η = y_k};

P(A) = Σ_k P{η = y_k} P(A/η = y_k).  (14′′)

For y ∉ {y₁, y₂, …} we can define the conditional probability P(A/η = y) in an arbitrary way (for example, assuming it to be zero).
If ξ is an (arbitrary) random variable for which the mathematical expectation Mξ exists, and η is a discrete random variable such that P{η = y_k} > 0, Σ_{k=1}^∞ P{η = y_k} = 1, then, taking B = {y_k}, we obtain from formula (11)

m(y_k) = M(ξ/η = y_k) = (1/P{η = y_k}) ∫_{ω: η(ω)=y_k} ξ(ω) P(dω).  (15)
For y ∉ {y₁, y₂, …} the conditional mathematical expectation M(ξ/η = y) is defined arbitrarily (for example, assumed to be zero). If, in addition, ξ is a discrete random variable, then from formula (15) we obtain the following formula for the conditional expectation:

M(ξ/η = y_k) = (1/P{η = y_k}) Σ_j x_j P{ξ = x_j, η = y_k} = Σ_j x_j P{ξ = x_j / η = y_k}.  (15′)
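Formula (15′) is a direct computation over a joint probability table. A small sketch (the table entries below are illustrative numbers, not an example from the book):

```python
from fractions import Fraction as F

# A small joint law P{xi = x, eta = y} given as a table.
joint = {
    (0, 0): F(1, 8), (1, 0): F(1, 8), (2, 0): F(1, 4),
    (0, 1): F(1, 4), (1, 1): F(1, 8), (2, 1): F(1, 8),
}

def cond_exp_discrete(joint, y):
    """Formula (15'): M(xi/eta = y) = sum_j x_j P{xi = x_j, eta = y} / P{eta = y}."""
    p_y = sum(p for (x, yy), p in joint.items() if yy == y)
    return sum(x * p for (x, yy), p in joint.items() if yy == y) / p_y

# M(xi/eta = 0) = (0*(1/8) + 1*(1/8) + 2*(1/4)) / (1/2) = 5/4.
assert cond_exp_discrete(joint, 0) == F(5, 4)

# Formula of total expectation: sum_k P{eta = y_k} M(xi/eta = y_k) = M xi.
m = sum(sum(p for (x, yy), p in joint.items() if yy == y) * cond_exp_discrete(joint, y)
        for y in (0, 1))
assert m == sum(x * p for (x, _), p in joint.items())
print(cond_exp_discrete(joint, 0), cond_exp_discrete(joint, 1))
```

Exact rational arithmetic (`fractions.Fraction`) keeps the checks free of rounding error.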
If ξ is a continuous random variable, then from formula (15) (or from formula (18) proved below) we obtain the formula

M(ξ/η = y_k) = ∫_{−∞}^{∞} x f_{ξ/η}(x/y_k) dx,  (15′′)

where the function f_{ξ/η}(x/y_k) is the conditional distribution density of the random variable ξ under the condition of occurrence of the event {η = y_k}:

f_{ξ/η}(x/y_k) = (d/dx) P{ξ ≤ x / η = y_k} = f_{ξ,η}(x, y_k) / P{η = y_k}.
Theorem 5. Let (ξ, η) be a two-dimensional (absolutely) continuous random variable having a joint distribution density f_{ξ,η}(x, y), and let f_ξ(x), f_η(y) be the marginal distribution densities of ξ and η (respectively). Let us define the function

f_{ξ/η}(x/y) = f_{ξ,η}(x, y) / f_η(y)  (16)

(if f_η(y) = 0, we set f_{ξ/η}(x/y) = 0). In this case, for any Borel set C ∈ ℬ(R),

P{ξ ∈ C / η = y} = ∫_C f_{ξ/η}(x/y) dx,  (17)

i.e. f_{ξ/η}(x/y) is the conditional distribution density of the random variable ξ under the condition η = y.

Proof. To prove (17), it suffices to verify the validity of formula (14) for any B ∈ ℬ(R) and A = {ω: ξ(ω) ∈ C}. On the basis of formulas (9′) and (10) in §1 and Fubini's theorem, we can write

∫_B [∫_C f_{ξ/η}(x/y) dx] P_η(dy) = ∫_B [∫_C f_{ξ/η}(x/y) dx] f_η(y) dy = ∫∫_{C×B} f_{ξ/η}(x/y) f_η(y) dx dy =
= ∫∫_{C×B} f_{ξ,η}(x, y) dx dy = P{(ξ, η) ∈ C × B} = P({ξ ∈ C} ∩ {η ∈ B}),

which proves the theorem. ∎

Corollary. Under the conditions of Theorem 5 the following assertions are true. If Mξ exists, then
M(ξ/η = y) = ∫_R x f_{ξ/η}(x/y) dx.  (18)

If M|g(ξ)| < ∞ (g(x) is a Borel function), then the following formula takes place:

M(g(ξ)/η = y) = ∫_R g(x) f_{ξ/η}(x/y) dx.  (19)

In particular, for the conditional variance

D(ξ/η = y) = M([ξ − M(ξ/η = y)]² / η = y)

we have

D(ξ/η = y) = ∫_R [x − M(ξ/η = y)]² f_{ξ/η}(x/y) dx.  (20)

The proofs of the assertions of the corollary are analogous to the proof of Theorem 5. ∎
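Formulas (16) and (18) can be checked by numerical integration. In the sketch below the joint density f(x, y) = x + y on [0, 1]² is an illustrative choice (it is not an example from the book); for it M(ξ/η = y) = (2 + 3y)/(3(2y + 1)) in closed form:

```python
# Check formulas (16) and (18) for the joint density f(x, y) = x + y on [0,1]^2.
def f_joint(x, y):
    return x + y

def f_eta(y, m=10_000):
    # Marginal density: integrate f(x, y) over x in [0, 1] (midpoint rule).
    h = 1.0 / m
    return sum(f_joint((i + 0.5) * h, y) for i in range(m)) * h

def cond_mean(y, m=10_000):
    # Formula (18): M(xi/eta = y) = integral of x * f_{xi/eta}(x/y) dx,
    # with f_{xi/eta}(x/y) = f_joint(x, y) / f_eta(y) as in formula (16).
    h = 1.0 / m
    num = sum((i + 0.5) * h * f_joint((i + 0.5) * h, y) for i in range(m)) * h
    return num / f_eta(y, m)

# Closed form for this density: M(xi/eta = y) = (2 + 3y) / (3 * (2y + 1)).
for y in (0.0, 0.25, 0.5, 1.0):
    exact = (2 + 3 * y) / (3 * (2 * y + 1))
    assert abs(cond_mean(y) - exact) < 1e-6
print(cond_mean(0.5))  # ~ 0.5833 = 7/12
```

The midpoint rule is more than accurate enough here, since the integrands are low-degree polynomials in x.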
2.4. The conditional mathematical expectation and the optimal (in the mean-square sense) estimator
Now, as an example of application of the concept of the conditional mathematical expectation, consider the following problem from the theory of estimation. Let (ξ, η) be a two-dimensional random variable, where the component ξ is observable and the component η is not subject to observation. Let us pose the question: how should one «estimate» the component η from the values of observations of ξ?

To make this task more specific, we introduce the notion of an estimator. Let φ = φ(x) be a Borel function. Then the random variable φ(ξ) will be called an estimator of η with respect to ξ, and the mathematical expectation M[η − φ(ξ)]² the mean-square error of this estimator. An estimator φ*(ξ) determined from the condition

Δ ≡ M[η − φ*(ξ)]² = inf_φ M[η − φ(ξ)]²  (21)

(the infimum in (21) is taken over the class of all Borel functions φ = φ(x)) is called the optimal estimator in the mean-square sense.

Theorem 6. If Mη² < ∞, then an optimal estimator φ* = φ*(ξ) exists, and as the function φ*(x) one can take

φ*(x) = M(η/ξ = x).  (22)
Proof. Without loss of generality, we can confine ourselves to considering only estimators φ(ξ) with finite second moments (Mφ²(ξ) < ∞). Then, if φ(ξ) is such an estimator and φ*(ξ) = M(η/ξ), then

M[η − φ(ξ)]² = M[(η − φ*(ξ)) + (φ*(ξ) − φ(ξ))]² =
= M[η − φ*(ξ)]² + 2M[(η − φ*(ξ))(φ*(ξ) − φ(ξ))] + M[φ*(ξ) − φ(ξ)]² ≥ M[η − φ*(ξ)]²,

because M[φ*(ξ) − φ(ξ)]² ≥ 0 and, by the properties of the mathematical expectation,

M[(η − φ*(ξ))(φ*(ξ) − φ(ξ))] = M{M[(η − φ*(ξ))(φ*(ξ) − φ(ξ)) / ξ]} =
= M[(φ*(ξ) − φ(ξ)) M(η − φ*(ξ) / ξ)] = 0.

Above we first applied the formula of complete mathematical expectation (property 7°), then (taking advantage of the ℱ_ξ-measurability of the function φ*(ξ) − φ(ξ)) we used property 11°, and, finally, we took into account the equality that follows from properties 4° and 6°:

M(η − φ*(ξ) / ξ) = M(η/ξ) − M(φ*(ξ)/ξ) = M(η/ξ) − φ*(ξ) = M(η/ξ) − M(η/ξ) = 0. ∎
Example 1. The structure of the estimator φ*(ξ) = M(η/ξ) in the case of a two-dimensional normal random variable (ξ, η). By formula (6) of item 3.2, §3, Chap. II and formula (16) for the conditional distribution density, we obtain

f_{η/ξ}(y/x) = (1/√(2π(1 − ρ²)σ₂²)) exp{−(y − m(x))² / (2σ₂²(1 − ρ²))},  (23)

where

m(x) = m₂ + ρ(σ₂/σ₁)(x − m₁),

m₁ = Mξ,  m₂ = Mη,  σ₁² = Dξ,  σ₂² = Dη,  ρ = (M(ξη) − m₁m₂)/(σ₁σ₂).

Further, applying formulas (18), (20), we obtain for the conditional mathematical expectation and the conditional variance:

M(η/ξ = x) = ∫_{−∞}^{∞} y f_{η/ξ}(y/x) dy = m(x),  (24)

D(η/ξ = x) = M([η − M(η/ξ = x)]² / ξ = x) = ∫_{−∞}^{∞} (y − m(x))² f_{η/ξ}(y/x) dy = σ₂²(1 − ρ²).  (25)

As we see (formula (25)), the conditional variance D(η/ξ = x) does not depend on the condition x; thus, the error in the mean-square sense is

Δ = M[η − M(η/ξ)]² = M[D(η/ξ)] = σ₂²(1 − ρ²).  (26)
Note that in formulas (24)–(26) we assumed that the conditions Dξ > 0, Dη > 0 are satisfied. It is clear that these formulas remain valid also in the case Dξ > 0, Dη = 0. We now state our results in the form of a theorem.

Theorem 7. Let (ξ, η) be a two-dimensional normal random variable with Dξ > 0. Then the optimal estimator of η by ξ in the mean-square sense is the conditional expectation

M(η/ξ) = Mη + (cov(ξ, η)/Dξ)(ξ − Mξ),  (27)

and the mean-square error of this estimator is given by

Δ ≡ M[η − M(η/ξ)]² = Dη − cov²(ξ, η)/Dξ.  (28)
Remark 8. The curve y(x) = M(η/ξ = x) is called the curve of regression of η with respect to ξ, or simply the regression of η on ξ. Taking into account definition (10′), from (27) we obtain that for normal random variables the regression of η on ξ is a linear function

M(η/ξ = x) = a + bx,

where

a = Mη − (cov(ξ, η)/Dξ) Mξ,  b = cov(ξ, η)/Dξ.
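The coefficients a and b of the regression line are readily estimated from data by replacing Mξ, Mη, Dξ, cov(ξ, η) with their sample analogues. A hedged sketch on simulated data (the true values a = 3, b = 0.5 are assumptions of the simulation, not of the text):

```python
import random

random.seed(4)

# Sample estimates of b = cov(xi, eta)/D xi and a = M eta - b * M xi (Remark 8).
n = 100_000
xi = [random.gauss(0.0, 2.0) for _ in range(n)]
eta = [3.0 + 0.5 * x + random.gauss(0.0, 1.0) for x in xi]  # true a = 3, b = 0.5

m_xi = sum(xi) / n
m_eta = sum(eta) / n
d_xi = sum((x - m_xi) ** 2 for x in xi) / n
cov = sum((x - m_xi) * (e - m_eta) for x, e in zip(xi, eta)) / n

b = cov / d_xi
a = m_eta - b * m_xi
print(a, b)  # close to 3.0 and 0.5
```

Since (ξ, η) here is jointly normal, the regression of η on ξ is exactly linear, and the estimates converge to the true coefficients.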
Example 2. Let ε₁, ε₂ be standard normal random variables and suppose that

ξ = a₁ε₁ + a₂ε₂,  η = b₁ε₁ + b₂ε₂,

where a₁, a₂, b₁, b₂ are some constants. Since linear combinations of normal random variables are also normal random variables, in this case

ξ ~ N(0, a₁² + a₂²),  η ~ N(0, b₁² + b₂²),  cov(ξ, η) = a₁b₁ + a₂b₂.

If, in addition, the condition a₁² + a₂² > 0 is satisfied, then from formulas (27), (28) for the regression of η with respect to ξ and the mean-square error we obtain (respectively)

M(η/ξ) = ((a₁b₁ + a₂b₂)/(a₁² + a₂²)) ξ,  Δ = (a₁b₂ − a₂b₁)²/(a₁² + a₂²).  (29)
2.5. Tasks for independent work

1. Let ξ, η be independent geometric random variables with a parameter p:

P{ξ = k} = P{η = k} = q^{k−1} p,  q = 1 − p,  0 < p < 1,  k = 1, 2, ….

Find the following probabilities: a) P{ξ = η}; b) P{ξ = k / ξ = η}; c) P{ξ ≥ η}; d) P{ξ = k / ξ ≥ η}; e) P{ξ = k / ξ > η}; f) P{ξ = k / ξ + η = l}; g) P{ξ = k / ξ − η = l}.

2. Let ξ be a random variable that takes nonnegative integer values, and it is known that:
a) P{ξ ≥ 0} = 1, P{ξ = k + 1 / ξ ∈ {k, k + 1}} = r/(k + r + 1), r > 0, k = 0, 1, 2, …;
b) P{ξ ≥ 0} = 1, P{ξ = k + 1 / ξ ∈ {k, k + 1}} = c = 1/2, k = 0, 1, 2, ….
Find the distribution law of ξ, i.e. find the probabilities P{ξ = k} (k = 0, 1, 2, …).

3. a) Show that the conditional variance of a random variable ξ with respect to a random variable η, i.e. the variance

D(ξ/η) = M([ξ − M(ξ/η)]² / η),

can be calculated by the formula

D(ξ/η) = M(ξ²/η) − (M(ξ/η))².

b) Continuation. Prove the validity of the formula

Dξ = M[D(ξ/η)] + D[M(ξ/η)].
4. Let ξ, η be independent identically distributed exponential random variables with a parameter λ. Find the conditional mathematical expectation M(ξ / ξ + η = y) and the conditional variance D(ξ / ξ + η = y).

5. Let ξ, η be independent identically distributed random variables with the distribution densities

f_ξ(x) = f_η(x) = λ²xe^{−λx} (x ≥ 0);  f_ξ(x) = f_η(x) = 0 (x < 0)

(parameter λ > 0). Find the conditional mathematical expectation M(ξ / ξ + η = y) and the conditional variance D(ξ / ξ + η = y).

6. Let ξ, η be independent random variables uniformly distributed on [0, 1]. Find the conditional mathematical expectations M(ξ / ξ + η = y), M(ξ / ξ + η) and the conditional variances D(ξ / ξ + η = y), D(ξ / ξ + η).

7. Continuation. Let ξ, η be any independent random variables with finite variances. Find the following conditional mathematical expectations and conditional variances:

M(ξ / ξ + η),  D(ξ / ξ + η),  M(ξ / ξ + η = y),  D(ξ / ξ + η = y).
8. The joint distribution density of random variables (ξ, η) is given by the formula

f_{ξ,η}(x, y) = 2/3 for (x, y) ∈ G = {(x, y): 0 ≤ x ≤ 3, 0 < y < 1 − x/3};  f_{ξ,η}(x, y) = 0 for (x, y) ∉ G.

Find the conditional mathematical expectation M(ξ/η = y).

9. Let ξ₁, ξ₂ be normal random variables such that Mξ₁ = Mξ₂ = 0, Dξ₁ = σ₁², Dξ₂ = σ₂², cov(ξ₁, ξ₂) = ρσ₁σ₂.
a) Find the conditional mathematical expectation M(ξ₂/ξ₁); b) find the conditional variance D(ξ₂/ξ₁).

10. A random vector (ξ, η) is uniformly distributed in the domain bounded by the ellipse x²/a² + y²/b² = 1. Find the conditional distribution density f_{ξ/η}(x/y) and the conditional mathematical expectations M(ξ/η = y), M(η/ξ = x).
11. Let ξ₁, ξ₂, …, ξ_n be independent N(0, 1)-distributed (i.e. standard normal) random variables, and let η_m² = ξ₁² + ξ₂² + … + ξ_m², η_n² = η_m² + ξ²_{m+1} + … + ξ_n² (m < n). Show that the conditional distribution density f_{η_m²/η_n²}(x/y) under the condition 0 < x < y is defined by the formula

f_{η_m²/η_n²}(x/y) = [Γ(n/2) / (Γ(m/2) Γ((n−m)/2))] (x/y)^{m/2 − 1} (1 − x/y)^{(n−m)/2 − 1} (1/y).
12. Let ξ, η be independent identically distributed random variables with finite mathematical expectations. Show that in this case the random variables M(ξ / ξ + η) and M(η / ξ + η) are identically distributed random variables.

13. A random vector (ξ, η) is uniformly distributed in the domain

D = {(x, y): x²/a² + (y − cx)²/b² ≤ 1},

where a, b, c are positive parameters. Show that in this case the conditional distribution density f_{η/ξ}(y/x) is defined by the formula

f_{η/ξ}(y/x) = 1/(2b√(1 − (x/a)²))  if |y − cx| ≤ b√(1 − (x/a)²);  f_{η/ξ}(y/x) = 0  if |y − cx| > b√(1 − (x/a)²),

and, by using this density, find the conditional mathematical expectation M(η/ξ = x) and the conditional variance D(η/ξ = x).

14. Continuation. In the conditions of the previous task, find the conditional mathematical expectation M(ξ/η = y).

15. Let ξ_i ~ N(0, 1), i = 1, 2, 3, and let ξ₁, ξ₂, ξ₃ be independent random variables. Let us introduce new random variables η₁ = ξ₂ − ξ₁, η₂ = ξ₃ − ξ₁. Find the conditional mathematical expectations M(η₁/η₂ = y), M(η₁/η₂) and the conditional variances D(η₂/η₁ = y), D(η₂/η₁).

16. Continuation. Suppose that in the conditions of the previous task η₁ = ξ₁, η₂ = ξ₂ − ξ₁, η₃ = ξ₃ − ξ₁. Find the following conditional expectations and conditional variances:

M(η₁ / η₂ = y, η₃ = z),  D(η₁ / η₂ = y, η₃ = z),  M(η₁ / η₂, η₃),  D(η₁ / η₂, η₃).
17. Let ξ be a Poisson random variable with the parameter λ, and, in its turn, the parameter λ is a random variable uniformly distributed on [0, 1]. Find the distribution law of ξ.
Direction. Use the formula

P{ξ = k} = ∫_{−∞}^{∞} P(ξ = k / η = y) f_η(y) dy.

18. Solve the previous task, provided that λ is an exponential random variable with a parameter μ > 0. Calculate the mathematical expectation of ξ.
Direction. See the direction to the previous task.

19. Let ξ₁, ξ₂, …, ξ_n, … be independent identically distributed random variables with a = Mξ_i < ∞. Show that in this case for the random variables S_n = ξ₁ + ξ₂ + … + ξ_n, n = 1, 2, …, the relations

M(ξ₁/S_n) = S_n/n,  M(S_m/S_n) = (m/n) S_n (m < n),  M(S_{n+m}/S_n) = S_n + ma (m > 0)

hold almost surely.

20. Let ξ₁, ξ₂, … be independent N(0, 1)-distributed random variables, S_n = ξ₁ + ξ₂ + … + ξ_n (n = 1, 2, …). Find the following conditional expectations and the corresponding variances:

M(S_n / S_{n+m} = y),  M(S_n / S_{n+m}),  M(S_{n+m} / S_n = y),  M(S_{n+m} / S_n).

Direction. Use the results of the previous task and Example 2.

21. Let ξ, η be independent random variables. Show that in this case the relations

M[D(ξη/ξ)] = M(ξ²) Dη,  D[M(ξη/ξ)] = (Mη)² Dξ

are true. Further, with the help of the last relations, find the corresponding mathematical expectations and variances in the cases of the following random variables: a) ξ ~ N(a, σ²), η ~ U(c, d); b) ξ ~ U(c, d), η ~ N(a, σ²); c) ξ, η are exponential random variables with the parameters λ₁ and λ₂ (respectively); d) ξ ~ χ²(n), η ~ χ²(m).
22. Show the validity of the following relations for independent random variables ξ, η:

D[M(ξ + η / ξ)] = Dξ;  D[M(ξ / ξ + η)] = (Dξ + Dη)/4;  M[D(ξ + η / ξ)] = Dη;  M[D(ξ / ξ + η)] = (M(ξ²) − 2 Mξ Mη + M(η²))/4.

23. Jensen's inequality. Let φ(x) be a convex downwards function and ξ a random variable defined on the probability space (Ω, ℱ, P) with M|φ(ξ)| < ∞. Then, for any sub-σ-algebra G of the σ-algebra ℱ, the inequality

M(φ(ξ)/G) ≥ φ(M(ξ/G))

holds.
Direction. First note that, since the function φ(x) is convex downwards, for each x₀ there is a number λ(x₀) such that the inequality φ(x) ≥ φ(x₀) + λ(x₀)(x − x₀) holds. Then put x₀ = M(ξ/G), x = ξ in this inequality and take the conditional mathematical expectation with respect to the σ-algebra G of both parts of the inequality.
24. Prove that, if G₁ and G₂ are independent σ-algebras, then for any random variables ξ, η the conditional expectations M(ξ/G₁) and M(η/G₂) are also independent random variables.

25. Let G₁ and G₂ be independent σ-algebras. Show that in this case, for any random variable ξ with finite mathematical expectation, the equality

M(ξ/G₁ ∩ G₂) = Mξ

holds almost surely.

26. Let ξ be an integer-valued random variable and it is known that P{0 < ξ < ∞} = 1, P{ξ = k + 1 / ξ > k} = p, k = 0, 1, 2, …. Find the distribution law P{ξ = k}, k = 1, 2, ….

27. ξ₁ ~ Bi(n, p); ξ₂ is such that, when the condition on ξ₁ is satisfied, it is distributed according to the binomial law with the parameters (ξ₁, p); ξ₃ is such that, when the condition on ξ₂ is satisfied, it is distributed according to the binomial law with the parameters (ξ₂, p), etc. Show that in this case ξ_k ~ Bi(n, p^k).

28. Let ξ₁, ξ₂, …, ξ_n be independent Poisson random variables with the parameters λ₁, λ₂, …, λ_n (respectively). Find the conditional probability P{ξ₁ + ξ₂ + … + ξ_l = m / ξ₁ + ξ₂ + … + ξ_n = k}.
Direction. Use a known fact: if ξ_i ~ Π(λ_i), i = 1, …, s, and the random variables are independent, then ξ₁ + … + ξ_s ~ Π(λ₁ + λ₂ + … + λ_s).

29. A random variable ξ that obeys a Poisson distribution with a parameter λ is observed. Then ξ Bernoulli trials with the probability of success p are conducted. Show that the random variable equal to the number of successes is distributed according to Poisson's law with the parameter λp.

30. Random variables ξ, η are independent identically distributed with the density f(x) = λ²xe^{−λx}, x ≥ 0. Show that in this case the conditional variance D(ξ / ξ + η = z) = z²/20.

31. Let ξ be a Poisson random variable with the parameter λ, and, in its turn, the parameter λ is a random variable uniformly distributed on [0, 1]. Find the distribution law of ξ.
Direction. Use the formula

P{ξ = k} = ∫_{−∞}^{∞} P(ξ = k / η = y) f_η(y) dy.

32. Prove that, if all the events of a σ-algebra G have probability 0 or 1, then M(ξ/G) = Mξ (a.s.).

33. Let ξ and η be independent random variables. Find D[M(ξη/η)], if ξ is uniformly distributed on [0, 1] and η ~ N(a, σ²).

34. Let a random variable ξ take at most n values. Is it true that M(ξ/G) also takes no more than n values?

35. Let ξ and η be random variables with finite mathematical expectations. Prove that, if M(ξ/η) = η and M(η/ξ) = ξ with probability 1, then ξ = η (a.s.).
Chapter V

LIMIT THEOREMS IN THE BERNOULLI SCHEME
Earlier we defined the Bernoulli scheme in the form of a triple $(\Omega, \mathcal{A}, P)$ (Chapter I, §4, item 4.1), where:
1) $\Omega = \{\omega = (\omega_1, \omega_2, \dots, \omega_n): \omega_i = 0 \text{ or } 1\}$;
2) $\mathcal{A} = \{A: A \subseteq \Omega\}$;
3) $P(\omega) = p^{\sum_{i=1}^{n}\omega_i}\, q^{\,n - \sum_{i=1}^{n}\omega_i}$ $(p + q = 1,\ p > 0)$.

We also saw that the Bernoulli scheme is a probabilistic model of a sequence of $n$ independent trials with two outcomes (Chapter II, §2, item 2.3). The study of the random variable equal to the number of successes led us to the so-called binomial distribution. The calculation of binomial probabilities for a large number of trials is very complicated because of the factorials of very large numbers appearing there and their multiplication by the very small numbers $p^k q^{n-k}$; one must also ensure that the intermediate numerical results do not leave the range of representable values. Therefore it is very important to find approximate formulas for the probabilities of the binomial distribution and for sums of such probabilities. The statements that give such approximate, or asymptotic, formulas are called limit theorems in the Bernoulli scheme.

§1. Laws of large numbers

For the Bernoulli scheme we introduce the random variables $\xi_1, \xi_2, \dots, \xi_n$ as follows:
$$\xi_i(\omega) = \xi_i(\omega_1, \dots, \omega_n) = \omega_i, \quad i = 1, \dots, n.$$
The meaning of these random variables is the following: if the $i$-th trial was a success, then $\xi_i = 1$; if the $i$-th trial was a failure, then $\xi_i = 0$. The introduced random variables are identically distributed and independent:
$$P\{\xi_i = 1\} = p, \qquad P\{\xi_i = 0\} = q,$$
$$P\{\xi_i = \alpha,\ \xi_j = \beta\} = P\{\xi_i = \alpha\}\, P\{\xi_j = \beta\}, \quad i \ne j, \quad \alpha, \beta = 0, 1.$$
Denote by $\mu_n$ the number of successes in the first $n$ trials, i.e.
$$\mu_n(\omega) = \xi_1(\omega) + \xi_2(\omega) + \dots + \xi_n(\omega) = \omega_1 + \omega_2 + \dots + \omega_n.$$
Then
$$P_n(k) = P\{\omega: \mu_n(\omega) = k\} = C_n^k p^k q^{n-k}, \quad k = 0, 1, \dots, n,$$
consequently $\mu_n \sim Bi(n, p)$, $M\mu_n = np$, $D\mu_n = npq$ (Chapter III, §1; Chapter IV, §1).

Theorem 1 (the law of large numbers for the Bernoulli scheme). For any $\varepsilon > 0$
$$P\left\{\left|\frac{\mu_n}{n} - p\right| > \varepsilon\right\} \to 0, \quad n \to \infty. \quad (1)$$

Proof. Applying the Chebyshev inequality to the probability being estimated, we obtain
$$P\left\{\left|\frac{\mu_n}{n} - p\right| > \varepsilon\right\} \le \frac{D(\mu_n/n)}{\varepsilon^2} = \frac{D\mu_n}{n^2\varepsilon^2} = \frac{npq}{n^2\varepsilon^2} = \frac{pq}{n\varepsilon^2} \to 0, \quad n \to \infty. \quad \blacksquare$$
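The Chebyshev bound $pq/(n\varepsilon^2)$ obtained in the proof can be compared with the exact tail probability, computed by summing the binomial distribution directly. A minimal sketch in Python (the parameter values $p = 0.3$, $\varepsilon = 0.05$ are illustrative choices, not from the text):

```python
import math

def binom_pmf(n, k, p):
    """P{mu_n = k} = C(n, k) p^k q^(n-k) for the Bernoulli scheme."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def tail_prob(n, p, eps):
    """Exact P{|mu_n/n - p| > eps}, by summing the binomial distribution."""
    return sum(binom_pmf(n, k, p)
               for k in range(n + 1) if abs(k / n - p) > eps)

p, eps = 0.3, 0.05
for n in (50, 200, 800):
    exact = tail_prob(n, p, eps)
    bound = p * (1 - p) / (n * eps**2)   # Chebyshev bound pq/(n eps^2)
    print(n, round(exact, 4), round(bound, 4))
```

The exact tail probability decays much faster than the bound: Chebyshev's inequality only guarantees the order $1/n$.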
Applying Chebyshev's inequality to the fourth power of the deviation, we can prove the following strengthened version of Theorem 1.

Theorem 2 (the strengthened law of large numbers for the Bernoulli scheme). For any $\varepsilon > 0$
$$P\left\{\sup_{k \ge n}\left|\frac{\mu_k}{k} - p\right| > \varepsilon\right\} \to 0, \quad n \to \infty. \quad (2)$$

Proof. We can write
$$P\left\{\sup_{k \ge n}\left|\frac{\mu_k}{k} - p\right| > \varepsilon\right\} = P\left(\bigcup_{k=n}^{\infty}\left\{\left|\frac{\mu_k}{k} - p\right| > \varepsilon\right\}\right) \le \sum_{k=n}^{\infty} P\left\{\left|\frac{\mu_k}{k} - p\right| > \varepsilon\right\} \le \sum_{k=n}^{\infty} \frac{M(\mu_k - kp)^4}{k^4\varepsilon^4}. \quad (2')$$
The last inequality is the Chebyshev inequality applied to the fourth power of the random variable $\left|\frac{\mu_k}{k} - p\right|$. But
$$M(\mu_k - kp)^4 = M\left(\sum_{j=1}^{k}(\xi_j - p)\right)^4 = \sum_{j=1}^{k} M(\xi_j - p)^4 + 6\sum_{i<j} M(\xi_i - p)^2(\xi_j - p)^2,$$
because the mathematical expectations of the products containing odd powers of $(\xi_i - p)$ are zero and the random variables are independent. Further, for different indices $i \ne j$,
$$M(\xi_i - p)^2(\xi_j - p)^2 = M(\xi_i - p)^2\, M(\xi_j - p)^2 = D\xi_i\, D\xi_j = (pq)^2,$$
$$M(\xi_j - p)^4 = (1 - p)^4 p + p^4 q = pq^4 + qp^4.$$
Consequently, in view of the obvious inequality $pq \le 1$,
$$M(\mu_k - kp)^4 = k(pq^4 + qp^4) + 6C_k^2(pq)^2 \le k + 3k(k-1) \le 3k^2.$$
So
$$P\left\{\sup_{k \ge n}\left|\frac{\mu_k}{k} - p\right| > \varepsilon\right\} \le \frac{3}{\varepsilon^4}\sum_{k=n}^{\infty}\frac{1}{k^2} \to 0 \quad (n \to \infty). \quad (2'') \quad \blacksquare$$
Remark 1. Assertion (2) is stronger than assertion (1): (2) $\Rightarrow$ (1). In fact, from (2′) and (2″) we obtain that
$$\sum_{k=n}^{\infty} P\left\{\left|\frac{\mu_k}{k} - p\right| > \varepsilon\right\} \to 0 \quad (n \to \infty),$$
and this implies (1). On the other hand, it can be shown that condition (2) is equivalent to the condition
$$P\left\{\omega: \lim_{n \to \infty}\frac{\mu_n(\omega)}{n} = p\right\} = 1. \quad (\bar{2})$$
In probability theory, the convergence $(\bar{2})$ is called convergence with probability 1 (or almost sure convergence) of the sequence of random variables $\mu_n/n$ to the constant $p$, and the convergence (1) is called convergence in probability of the sequence $\mu_n/n$ to $p$. Thus convergence $(\bar{2})$ implies (1); for the Bernoulli scheme the convergence (1) is therefore called the (weak) law of large numbers, and the convergence (2) (i.e. $(\bar{2})$) is the strong law of large numbers for the Bernoulli scheme.
Corollary 1. If the function $f(x)$ is continuous on $[0, 1]$, then the convergence
$$Mf\left(\frac{\mu_n}{n}\right) \to f(p), \quad n \to \infty, \quad (3)$$
holds uniformly in $0 \le p \le 1$.

Proof. For any $\varepsilon > 0$ and $n = 1, 2, \dots$ we introduce the events
$$A_n(\varepsilon) = \left\{\left|\frac{\mu_n}{n} - p\right| > \varepsilon\right\}.$$
Further, denoting by $I_{A_n(\varepsilon)}$ the indicator of the event $A_n(\varepsilon)$, we can write
$$\left|Mf\left(\frac{\mu_n}{n}\right) - f(p)\right| \le M\left|f\left(\frac{\mu_n}{n}\right) - f(p)\right| I_{A_n(\varepsilon)} + M\left|f\left(\frac{\mu_n}{n}\right) - f(p)\right| I_{\bar{A}_n(\varepsilon)} \le$$
$$\le 2m\,P(A_n(\varepsilon)) + \sup_{|x - p| \le \varepsilon}|f(x) - f(p)|,$$
where $m = \sup_{0 \le x \le 1}|f(x)|$. Relation (3) now follows from the convergence $P(A_n(\varepsilon)) \le pq/(n\varepsilon^2) \le 1/(4n\varepsilon^2) \to 0$ $(n \to \infty)$, which is uniform in $p$, and from the uniform continuity of $f(x)$ on $[0, 1]$. $\blacksquare$

Corollary 2. If the function $f(x)$ is continuous on $[0, 1]$, then as $n \to \infty$ the convergence
$$\sum_{k=0}^{n} f\left(\frac{k}{n}\right) C_n^k x^k (1 - x)^{n-k} \to f(x) \quad (4)$$
holds uniformly for all $x \in [0, 1]$.

The proof follows immediately from
$$Mf\left(\frac{\mu_n}{n}\right) = \sum_{k=0}^{n} f\left(\frac{k}{n}\right) C_n^k p^k (1 - p)^{n-k},$$
because (4) is just another form of (3). $\blacksquare$

It is not difficult to see that (4) yields, as a consequence, the well-known Weierstrass theorem on the uniform approximation of a function continuous on a segment by polynomials. We also note that the polynomials appearing in (4) are called Bernstein polynomials.
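Corollary 2 is easy to observe numerically: the Bernstein polynomial of a continuous function converges to it uniformly on $[0, 1]$. A sketch (the test function $|x - 1/2|$, continuous but not smooth, is an illustrative choice):

```python
import math

def bernstein(f, n, x):
    """Bernstein polynomial B_n(f; x) = sum_k f(k/n) C(n,k) x^k (1-x)^(n-k)."""
    return sum(f(k / n) * math.comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

f = lambda t: abs(t - 0.5)            # continuous on [0, 1], not smooth at 1/2
for n in (10, 100, 1000):
    err = max(abs(bernstein(f, n, i / 50) - f(i / 50)) for i in range(51))
    print(n, round(err, 4))           # the uniform error shrinks as n grows
```

Note that Bernstein polynomials reproduce linear functions exactly, since $\sum_k \frac{k}{n} C_n^k x^k (1-x)^{n-k} = x$.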
§2. The de Moivre–Laplace limit theorems

Suppose, as in §1,
$$\mu_n(\omega) = \xi_1(\omega) + \xi_2(\omega) + \dots + \xi_n(\omega).$$
Then
$$M\left(\frac{\mu_n}{n}\right) = p; \qquad D\left(\frac{\mu_n}{n}\right) = M\left(\frac{\mu_n}{n} - p\right)^2 = \frac{pq}{n}. \quad (1)$$
From this we see that
$$\frac{\mu_n}{n} \approx p, \qquad \left|\frac{\mu_n}{n} - p\right| \approx \sqrt{\frac{pq}{n}}.$$
If we consider the probabilities
$$P\left\{\left|\frac{\mu_n}{n} - p\right| \le x\sqrt{\frac{pq}{n}}\right\}, \quad (2)$$
then, introducing the notation
$$P_n(k) = C_n^k p^k q^{n-k}, \quad 0 \le k \le n, \quad (3)$$
we can write
$$P\left\{\left|\frac{\mu_n}{n} - p\right| \le x\sqrt{\frac{pq}{n}}\right\} = P\left\{\left|\frac{\mu_n - np}{\sqrt{npq}}\right| \le x\right\} = \sum_{\left\{k:\ \left|\frac{k - np}{\sqrt{npq}}\right| \le x\right\}} P_n(k). \quad (4)$$
We pose the problem of finding convenient asymptotic formulas, as $n \to \infty$, for the probabilities $P_n(k)$ and for their sums over those $k$ that satisfy the condition on the right-hand side of (4). The following result gives an answer not only for these values of $k$, but for all $k$ satisfying the condition $|k - np| = o\big((npq)^{2/3}\big)$.
2.1. The local de Moivre–Laplace theorem

Theorem 1 (the local de Moivre–Laplace theorem). Let $0 < p < 1$. Then, uniformly in all $k$ such that $|k - np| = o\big((npq)^{2/3}\big)$, the approximation
$$P_n(k) \sim \frac{1}{\sqrt{2\pi npq}}\, e^{-\frac{(k - np)^2}{2npq}}, \quad n \to \infty, \quad (5)$$
is true, i.e. as $n \to \infty$
$$\sup_{\{k:\ |k - np| \le \varphi(n)\}}\left|\frac{P_n(k)}{\frac{1}{\sqrt{2\pi npq}}\, e^{-\frac{(k - np)^2}{2npq}}} - 1\right| \to 0, \quad (5')$$
where $\varphi(n) = o\big((npq)^{2/3}\big)$.

(Here and everywhere in the sequel, for positive sequences $a_n, b_n$ the notation $a_n \sim b_n$ means that $\lim_{n \to \infty} a_n/b_n = 1$.)

Before proving the theorem, let us note the following: if we introduce the function
$$\varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}} \quad (6)$$
and denote $x_k = \frac{k - np}{\sqrt{npq}}$, Theorem 1 can be reformulated as follows.

Theorem 1′. Let $0 < p < 1$. Then for all $x_k = o\big((npq)^{1/6}\big)$
$$P_n(k) = P\{\mu_n = k\} = P\{\mu_n = np + x_k\sqrt{npq}\} \sim \frac{1}{\sqrt{npq}}\,\varphi(x_k), \quad (7)$$
i.e. as $n \to \infty$
$$\sup_{\{x_k:\ |x_k| \le \psi(n)\}}\left|\frac{\sqrt{npq}\; P\{\mu_n = np + x_k\sqrt{npq}\}}{\varphi(x_k)} - 1\right| \to 0, \quad (7')$$
where $\psi(n) = o\big((npq)^{1/6}\big)$.

Proof of the theorem. By the well-known Stirling formula, for sufficiently large $n$ ($n \gg 1$)
$$n! = \sqrt{2\pi n}\, e^{-n} n^n e^{\theta(n)} = \sqrt{2\pi n}\, e^{-n} n^n (1 + R(n)), \quad (8)$$
where
$$\frac{1}{12n + 1} < \theta(n) < \frac{1}{12n}, \qquad R(n) \to 0 \quad (n \to \infty).$$
Applying the Stirling formula, for $n \to \infty$, $k \to \infty$, $n - k \to \infty$ we can write
$$C_n^k = \frac{n!}{k!\,(n-k)!} = \frac{\sqrt{2\pi n}\, e^{-n} n^n (1 + R(n))}{\sqrt{2\pi k}\, e^{-k} k^k (1 + R(k)) \cdot \sqrt{2\pi(n-k)}\, e^{-(n-k)}(n-k)^{n-k}(1 + R(n-k))} =$$
$$= \frac{1 + \varepsilon}{\sqrt{2\pi n\,\frac{k}{n}\left(1 - \frac{k}{n}\right)}\left(\frac{k}{n}\right)^k\left(1 - \frac{k}{n}\right)^{n-k}},$$
where
$$\varepsilon = \varepsilon(k, n, n-k) = \frac{1 + R(n)}{(1 + R(k))(1 + R(n-k))} - 1 \to 0 \quad (n \to \infty,\ k \to \infty,\ n - k \to \infty).$$
Denoting $\hat{p} = \frac{k}{n}$ and taking the aforesaid into account, we rewrite formula (3) in the form
$$P_n(k) = \frac{1}{\sqrt{2\pi n\hat{p}(1 - \hat{p})}}\left(\frac{p}{\hat{p}}\right)^k\left(\frac{1 - p}{1 - \hat{p}}\right)^{n-k}(1 + \varepsilon) =$$
$$= \frac{1}{\sqrt{2\pi n\hat{p}(1 - \hat{p})}}\exp\left\{-k\ln\frac{\hat{p}}{p} - (n - k)\ln\frac{1 - \hat{p}}{1 - p}\right\}(1 + \varepsilon) =$$
$$= \frac{1}{\sqrt{2\pi n\hat{p}(1 - \hat{p})}}\exp\left\{-n\left[\frac{k}{n}\ln\frac{\hat{p}}{p} + \left(1 - \frac{k}{n}\right)\ln\frac{1 - \hat{p}}{1 - p}\right]\right\}(1 + \varepsilon). \quad (9)$$
We introduce the function
$$H(x) = x\ln\frac{x}{p} + (1 - x)\ln\frac{1 - x}{1 - p}. \quad (10)$$
Then relation (9) can be written in the form
$$P_n(k) = \frac{1}{\sqrt{2\pi n\hat{p}(1 - \hat{p})}}\exp\{-nH(\hat{p})\}(1 + \varepsilon).$$
For the considered $k$ we have $|k - np| = o\big((npq)^{2/3}\big)$, therefore
$$|\hat{p} - p| = \frac{|k - np|}{n} = \frac{o\big((npq)^{2/3}\big)}{n} = o\!\left(\frac{(pq)^{2/3}}{n^{1/3}}\right) \to 0 \quad (n \to \infty).$$
Taking this into account, we expand the function $H(\hat{p})$ in a Taylor series in a neighborhood of the point $p$:
$$H(\hat{p}) = H(p) + H'(p)(\hat{p} - p) + \frac{1}{2}H''(p)(\hat{p} - p)^2 + \frac{1}{6}H'''(p)(\hat{p} - p)^3 + \dots$$
From formula (10),
$$H'(x) = \ln\frac{x}{p} - \ln\frac{1 - x}{1 - p}, \qquad H''(x) = \frac{1}{x} + \frac{1}{1 - x}, \qquad H'''(x) = -\frac{1}{x^2} + \frac{1}{(1 - x)^2},$$
whence
$$H(p) = H'(p) = 0, \qquad H''(p) = \frac{1}{p} + \frac{1}{q} = \frac{1}{pq}, \qquad H'''(p) = -\frac{1}{p^2} + \frac{1}{q^2} = \frac{p - q}{p^2 q^2}.$$
In addition, for the considered $k$, as $n \to \infty$,
$$\frac{n}{2}H''(p)(\hat{p} - p)^2 = \frac{n}{2pq}\left(\frac{k}{n} - p\right)^2 = \frac{(k - np)^2}{2npq}, \qquad \frac{n}{6}H'''(p)(\hat{p} - p)^3 = o(1).$$
So
$$P_n(k) = \frac{1}{\sqrt{2\pi npq}}\, e^{-\frac{(k - np)^2}{2npq}}\big(1 + \tilde{\varepsilon}(k, n, n-k)\big),$$
where
$$1 + \tilde{\varepsilon}(k, n, n-k) = \big(1 + \varepsilon(k, n, n-k)\big)(1 + o(1))\sqrt{\frac{p(1 - p)}{\hat{p}(1 - \hat{p})}}.$$
Further, for $\varphi(n) = o\big((npq)^{2/3}\big)$ it is not difficult to show that
$$\sup_{\{k:\ |k - np| \le \varphi(n)\}}|\tilde{\varepsilon}(k, n, n-k)| \to 0, \quad n \to \infty,$$
i.e. relation (7) is true. $\blacksquare$

We draw attention to the fact that relations (7), (7′) can also be written in the form
$$P\{\mu_n = k\} \sim \frac{1}{\sqrt{2\pi npq}}\, e^{-\frac{(k - np)^2}{2npq}}, \qquad |k - np| = o\big((npq)^{2/3}\big), \quad (11)$$
$$P\left\{\frac{\mu_n - np}{\sqrt{npq}} = x\right\} \sim \frac{1}{\sqrt{2\pi npq}}\, e^{-\frac{x^2}{2}}, \qquad x = o\big((npq)^{1/6}\big) \quad (11')$$
(in the last formula $np + x\sqrt{npq} \in \{0, 1, 2, \dots, n\}$).

In conclusion, we recall that the function $\varphi(x)$ introduced by (6) is the distribution density of the standard normal random variable.

Example. Let the probability that a product released by some factory is defective be 0.005. From a batch of the products of this plant, 10,000 items were selected at random. Find the probability that among the selected products exactly 40 items are defective.

Solution. We have $n = 10{,}000$, $p = 0.005$, $k = 40$. Hence the required probability is
$$P_{10000}(40) = C_{10000}^{40}\,(0.005)^{40}(0.995)^{9960}.$$
To approximate this probability we apply formula (7). Then
$$npq = 10000 \cdot 0.005 \cdot 0.995 = 49.75, \qquad \sqrt{npq} \approx 7.05, \qquad x_k = \frac{k - np}{\sqrt{npq}} = \frac{40 - 50}{7.05} \approx -1.42.$$
From Appendix 1 we find that $\varphi(1.42) = 0.1456$. Therefore
$$P_{10000}(40) \approx \frac{0.1456}{7.05} \approx 0.0206.$$
A direct calculation (without using the de Moivre–Laplace theorem) gives $P_{10000}(40) \approx 0.0214$.

To understand the nature of approximate calculations using the de Moivre–Laplace theorem, we give the following table for $p = 0.2$, $n = 25$ (in the table the probability values are given to the fourth digit after the decimal point).
Table 1

  k   |  x_k  | P_n(k) | √(npq)·P_n(k) | φ(x_k)
  0   | −2.5  | 0.0037 |    0.0075     | 0.0175
  1   | −2.0  | 0.0236 |    0.0472     | 0.0540
  2   | −1.5  | 0.0708 |    0.1417     | 0.1295
  3   | −1.0  | 0.1358 |    0.2715     | 0.2420
  4   | −0.5  | 0.1867 |    0.3734     | 0.3521
  5   |  0.0  | 0.1960 |    0.3920     | 0.3989
  6   |  0.5  | 0.1633 |    0.3267     | 0.3521
  7   |  1.0  | 0.1108 |    0.2217     | 0.2420
  8   |  1.5  | 0.0623 |    0.1247     | 0.1295
  9   |  2.0  | 0.0294 |    0.0589     | 0.0540
 10   |  2.5  | 0.0118 |    0.0236     | 0.0175
 11   |  3.0  | 0.0040 |    0.0080     | 0.0044
 12   |  3.5  | 0.0012 |    0.0023     | 0.0009
 13   |  4.0  | 0.0003 |    0.0006     | 0.0001
 14   |  4.5  | 0.0000 |    0.0000     | 0.0000
 >14  | >4.5  | 0.0000 |    0.0000     | 0.0000
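The kind of comparison carried out in the example above and in Table 1 can be reproduced directly; a small sketch, with the exact binomial probability computed via `math.comb` (the numbers in the comments are approximate):

```python
import math

def binom_pmf(n, k, p):
    """Exact binomial probability P{mu_n = k}."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def local_laplace(n, k, p):
    """Normal approximation (1/sqrt(2 pi npq)) exp(-(k - np)^2 / (2 npq))."""
    npq = n * p * (1 - p)
    return math.exp(-(k - n * p)**2 / (2 * npq)) / math.sqrt(2 * math.pi * npq)

n, p, k = 10_000, 0.005, 40
print(binom_pmf(n, k, p))      # exact value, ~0.0214
print(local_laplace(n, k, p))  # normal approximation, ~0.0207
```

The relative error of a few percent is typical for $|k - np|$ of the order of $\sqrt{npq}$.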
2.2. The de Moivre–Laplace integral theorem

According to relation (11′), for $x_k = \frac{k - np}{\sqrt{npq}} = o\big((npq)^{1/6}\big)$, $\Delta x_k = \frac{1}{\sqrt{npq}}$, and for all $k = np + x_k\sqrt{npq}$,
$$P\left\{\frac{\mu_n - np}{\sqrt{npq}} = x_k\right\} = P_n\big(np + x_k\sqrt{npq}\big) = P\{\mu_n = k\} \approx \varphi(x_k)\,\Delta x_k.$$
It is clear that $\Delta x_k \to 0$ $(n \to \infty)$ and the set of points $\{x_k\}$, as it were, «fills» the entire numerical line. It is therefore natural to expect that (11′) can be used to obtain the approximate integral formula
$$P\left\{a < \frac{\mu_n - np}{\sqrt{npq}} \le b\right\} \approx \sum_{k:\ a < x_k \le b}\varphi(x_k)\,\Delta x_k \approx \int_a^b \varphi(x)\,dx = \frac{1}{\sqrt{2\pi}}\int_a^b e^{-\frac{x^2}{2}}\,dx.$$
We proceed to precise formulations.

Theorem 2 (the de Moivre–Laplace integral theorem). Let $0 < p < 1$, $P_n(k) = C_n^k p^k q^{n-k}$,
$$P_n(a, b] = \sum_{\{k:\ a < x_k \le b\}} P_n\big(np + x_k\sqrt{npq}\big) = \sum_{\{k:\ a < x_k \le b\}} P\{\mu_n = k\},$$
where $-\infty \le a < b \le \infty$ and the sum is taken over those $x_k$ for which $np + x_k\sqrt{npq}$ is an integer. Then
$$\sup_{-\infty \le a < b \le \infty}\left|P_n(a, b] - \frac{1}{\sqrt{2\pi}}\int_a^b e^{-\frac{x^2}{2}}\,dx\right| \to 0, \quad n \to \infty. \quad (12)$$
Proof. First we consider the case of finite $a, b$. In this case, by the local theorem,
$$P_n\big(np + x_k\sqrt{npq}\big) = \varphi(x_k)\,\Delta x_k\big(1 + \varepsilon(x_k, n)\big), \quad (13)$$
where, for any fixed $T > 0$,
$$\varepsilon(n) = \sup_{|x_k| \le T}|\varepsilon(x_k, n)| \to 0, \quad n \to \infty. \quad (14)$$
Consequently, for fixed $a$ and $b$ such that $-T \le a < b \le T$,
$$P_n(a, b] = \sum_{a < x_k \le b}\varphi(x_k)\,\Delta x_k + \sum_{a < x_k \le b}\varepsilon(x_k, n)\,\varphi(x_k)\,\Delta x_k.$$
By the definition of the integral,
$$\sum_{a < x_k \le b}\varphi(x_k)\,\Delta x_k = \int_a^b \varphi(x)\,dx + R_n^{(1)}(a, b), \quad (15)$$
and by the known properties of integral sums the remainder tends to zero uniformly as $n \to \infty$:
$$\sup_{-T \le a < b \le T}\big|R_n^{(1)}(a, b)\big| \to 0 \quad (n \to \infty). \quad (16)$$
On the other hand,
$$\sup_{-T \le a < b \le T}\left|\sum_{a < x_k \le b}\varepsilon(x_k, n)\,\varphi(x_k)\,\Delta x_k\right| \le \sup_{|x_k| \le T}|\varepsilon(x_k, n)|\cdot\sup_{-T \le a < b \le T}\sum_{a < x_k \le b}\varphi(x_k)\,\Delta x_k \le$$
$$\le \sup_{|x_k| \le T}|\varepsilon(x_k, n)|\cdot\sup_{-T \le a < b \le T}\left[\int_a^b \varphi(x)\,dx + \big|R_n^{(1)}(a, b)\big|\right] \le$$
$$\le \sup_{|x_k| \le T}|\varepsilon(x_k, n)|\left[\int_{-T}^{T}\varphi(x)\,dx + \sup_{-T \le a < b \le T}\big|R_n^{(1)}(a, b)\big|\right] \to 0 \quad (n \to \infty).$$
The convergence to zero in the last relation follows from (14), (16) and from the fact that
$$\int_{-T}^{T}\varphi(x)\,dx \le \int_{-\infty}^{\infty}\varphi(x)\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-\frac{x^2}{2}}\,dx = 1.$$
Taking the above into account, we can now write
$$\sup_{-T \le a < b \le T}\left|P_n(a, b] - \int_a^b \varphi(x)\,dx\right| \to 0 \quad (n \to \infty). \quad (17)$$
We introduce the notation
$$\Phi(x) = \int_{-\infty}^{x}\varphi(t)\,dt = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}e^{-\frac{t^2}{2}}\,dt, \qquad \Phi_0(x) = \int_{0}^{x}\varphi(t)\,dt = \frac{1}{\sqrt{2\pi}}\int_{0}^{x}e^{-\frac{t^2}{2}}\,dt. \quad (18)$$
Then
$$\Phi(x) = \frac{1}{2} + \Phi_0(x), \qquad \Phi(b) - \Phi(a) = \Phi_0(b) - \Phi_0(a),$$
which allows us to write formula (17) in the form
$$\sup_{-T \le a < b \le T}\big|P_n(a, b] - (\Phi(b) - \Phi(a))\big| \to 0 \quad (n \to \infty). \quad (17')$$
We now show that formula (17′) (and hence formula (17)) also holds for $T = \infty$. By virtue of (18), for any $\varepsilon > 0$ there is $T = T(\varepsilon)$ such that
$$\Phi(T) - \Phi(-T) = \frac{1}{\sqrt{2\pi}}\int_{-T}^{T}e^{-\frac{x^2}{2}}\,dx > 1 - \frac{\varepsilon}{4}. \quad (18')$$
According to (17′), one can find $N$ such that for all $n > N$ and $T = T(\varepsilon)$
$$\sup_{-T \le a < b \le T}\big|P_n(a, b] - (\Phi(b) - \Phi(a))\big| < \frac{\varepsilon}{4}. \quad (19)$$
It follows from (18′) and (19) that
$$P_n(-T, T] > \Phi(T) - \Phi(-T) - \frac{\varepsilon}{4} > 1 - \frac{\varepsilon}{2},$$
and this, in turn, implies that
$$P_n(-\infty, -T] + P_n(T, \infty) = 1 - P_n(-T, T] \le \frac{\varepsilon}{2},$$
where
$$P_n(-\infty, T] = \lim_{s \downarrow -\infty}P_n(s, T], \qquad P_n(T, \infty) = \lim_{s \uparrow \infty}P_n(T, s].$$
Thus, for any $-\infty \le a \le -T$, $T \le b \le \infty$,
$$\big|P_n(a, b] - (\Phi(b) - \Phi(a))\big| \le \big|P_n(a, -T] - (\Phi(-T) - \Phi(a))\big| + \big|P_n(-T, T] - (\Phi(T) - \Phi(-T))\big| +$$
$$+ \big|P_n(T, b] - (\Phi(b) - \Phi(T))\big| \le P_n(-\infty, -T] + \int_{-\infty}^{-T}\varphi(x)\,dx + \frac{\varepsilon}{4} + P_n(T, \infty) + \int_{T}^{\infty}\varphi(x)\,dx \le$$
$$\le \frac{\varepsilon}{2} + \frac{\varepsilon}{4} + \big(1 - (\Phi(T) - \Phi(-T))\big) < \frac{\varepsilon}{2} + \frac{\varepsilon}{4} + \frac{\varepsilon}{4} = \varepsilon$$
(in deriving the last inequalities we used relations (17′) and (18′)). Together with (17′), it follows that, uniformly over all $-\infty \le a < b \le \infty$, the probabilities $P_n(a, b]$ tend, as $n \to \infty$, to $\Phi(b) - \Phi(a)$. $\blacksquare$

In probabilistic language, the result (12) can be formulated as follows:
$$\sup_{-\infty \le a < b \le \infty}\left|P\left\{a < \frac{\mu_n - np}{\sqrt{npq}} \le b\right\} - \frac{1}{\sqrt{2\pi}}\int_a^b e^{-\frac{x^2}{2}}\,dx\right| \to 0 \quad (n \to \infty), \quad (12')$$
i.e. for all $-\infty \le a < b \le \infty$, as $n \to \infty$, uniformly
$$P\left\{a < \frac{\mu_n - np}{\sqrt{npq}} \le b\right\} \to \frac{1}{\sqrt{2\pi}}\int_a^b e^{-\frac{x^2}{2}}\,dx = \Phi(b) - \Phi(a) = \Phi_0(b) - \Phi_0(a). \quad (12'')$$
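The uniform convergence asserted by the integral theorem can be observed numerically: the largest discrepancy between the distribution function of $(\mu_n - np)/\sqrt{npq}$ and $\Phi(x)$ is attained at the jump points, so it suffices to scan those. A sketch (the choice $p = 0.1$ is illustrative; $(p^2 + q^2)/\sqrt{npq}$ is the Berry–Esseen-type bound quoted in Remark 1 below):

```python
import math

def phi_cdf(x):
    """Standard normal distribution function Phi(x), via math.erf."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def kolmogorov_distance(n, p):
    """sup_x |F_n(x) - Phi(x)|, scanned over the jump points of F_n."""
    npq = n * p * (1 - p)
    cdf, worst = 0.0, 0.0
    for k in range(n + 1):
        pmf = math.comb(n, k) * p**k * (1 - p)**(n - k)
        x = (k - n * p) / math.sqrt(npq)
        # F_n jumps at x: compare Phi with the values just before and after
        worst = max(worst, abs(cdf - phi_cdf(x)), abs(cdf + pmf - phi_cdf(x)))
        cdf += pmf
    return worst

p = 0.1
for n in (20, 80, 320):
    bound = (p**2 + (1 - p)**2) / math.sqrt(n * p * (1 - p))
    print(n, round(kolmogorov_distance(n, p), 4), round(bound, 4))
```

The distance decreases roughly like $1/\sqrt{n}$, in agreement with the bound.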
If we introduce the function
$$F_n(x) = P_n(-\infty, x] = P\left\{\frac{\mu_n - np}{\sqrt{npq}} \le x\right\}, \quad (20)$$
then from (12′) follows the convergence
$$\sup_{-\infty < x < \infty}|F_n(x) - \Phi(x)| \to 0 \quad (n \to \infty). \quad (21)$$

Remark 1. The question naturally arises as to how fast the left-hand sides of (12′) and (21) approach zero as $n$ increases. We give (without proof) a relevant result, which is a particular case of the so-called Berry–Esseen theorem:
$$\sup_{-\infty < x < \infty}|F_n(x) - \Phi(x)| \le \frac{p^2 + q^2}{\sqrt{npq}}.$$
It is important to emphasize that the order $O\!\left(\frac{1}{\sqrt{npq}}\right)$ cannot be improved, which means that the approximation of $F_n(x)$ by $\Phi(x)$ can be poor for values of $p$ close to zero or one (even for large $n$).

The function $F_n(x)$ defined by (20) is the distribution function of the random variable $\frac{\mu_n - np}{\sqrt{npq}}$, and the function $\Phi(x)$ defined by (18) is the distribution function of the standard normal random variable. Therefore the approximations obtained with the help of the local and integral limit theorems are called normal approximations. The function $\Phi_0(x)$ (or $\Phi(x)$) is called the Laplace function, or the error integral. The tabulated values of these functions are given in Appendix 2. Obviously, these functions are related by $\Phi(x) = \frac{1}{2} + \Phi_0(x)$.

2.2.1. Deviation of the relative frequency from the constant probability in independent trials

For any $\varepsilon > 0$
$$P\left\{\left|\frac{\mu_n}{n} - p\right| \le \varepsilon\right\} = P\left\{\left|\frac{\mu_n - np}{\sqrt{npq}}\right| \le \varepsilon\sqrt{\frac{n}{pq}}\right\} = P\left\{-\varepsilon\sqrt{\frac{n}{pq}} \le \frac{\mu_n - np}{\sqrt{npq}} \le \varepsilon\sqrt{\frac{n}{pq}}\right\}.$$
Then it follows from (12′), (18) that
$$\left|P\left\{\left|\frac{\mu_n}{n} - p\right| \le \varepsilon\right\} - \frac{1}{\sqrt{2\pi}}\int_{-\varepsilon\sqrt{n/pq}}^{\varepsilon\sqrt{n/pq}}e^{-\frac{x^2}{2}}\,dx\right| \to 0, \quad n \to \infty, \quad (23)$$
$$\left|P\left\{\left|\frac{\mu_n}{n} - p\right| \le \varepsilon\right\} - \left(\Phi_0\!\left(\varepsilon\sqrt{\frac{n}{pq}}\right) - \Phi_0\!\left(-\varepsilon\sqrt{\frac{n}{pq}}\right)\right)\right| \to 0, \quad n \to \infty. \quad (23')$$
If we take into account the oddness of $\Phi_0(x)$ ($\Phi_0(-x) = -\Phi_0(x)$), the last relation can be rewritten in the form
$$\left|P\left\{\left|\frac{\mu_n}{n} - p\right| \le \varepsilon\right\} - 2\Phi_0\!\left(\varepsilon\sqrt{\frac{n}{pq}}\right)\right| \to 0, \quad n \to \infty. \quad (23'')$$
From this, since $\Phi_0(\infty) = 0.5$, we have
$$P\left\{\left|\frac{\mu_n}{n} - p\right| \le \varepsilon\right\} \to 1, \quad n \to \infty, \quad \text{i.e.} \quad P\left\{\left|\frac{\mu_n}{n} - p\right| > \varepsilon\right\} \to 0, \quad n \to \infty. \quad (24)$$
We recall that relation (24) is the assertion of the law of large numbers for the Bernoulli scheme (see §1).

Usually, in order to estimate the absolute deviation of the relative frequency from the probability of success, one uses the approximate formula derived from (23″):
$$P\left\{\left|\frac{\mu_n}{n} - p\right| \le \varepsilon\right\} \approx 2\Phi_0\!\left(\varepsilon\sqrt{\frac{n}{pq}}\right). \quad (25)$$
Recall that for this deviation the Chebyshev inequality gave the estimate (Chapter IV, §1, item 1.6, formula (20))
$$P\left\{\left|\frac{\mu_n}{n} - p\right| \le \varepsilon\right\} \ge 1 - \frac{pq}{n\varepsilon^2}.$$

2.2.2. Finding the probability that the number of successes lies in a given interval

Since $M\mu_n = np$, $D\mu_n = npq$, the normalized random variable $\frac{\mu_n - np}{\sqrt{npq}}$ has mean 0 and variance 1. For this normalized random variable, relation (12″) is written in the form
$$P\left\{a < \frac{\mu_n - np}{\sqrt{npq}} \le b\right\} \to \Phi(b) - \Phi(a) = \Phi_0(b) - \Phi_0(a), \quad n \to \infty. \quad (12''')$$
On the other hand, it is often important for the researcher to find not the probability that the event in question occurs a certain number of times, but the probability that the number of its occurrences lies in a given interval, i.e. the probability of the event $\{k_1 \le \mu_n \le k_2\}$. For the probability of this event we can write
$$P\{k_1 \le \mu_n \le k_2\} = P\left\{\frac{k_1 - np}{\sqrt{npq}} \le \frac{\mu_n - np}{\sqrt{npq}} \le \frac{k_2 - np}{\sqrt{npq}}\right\}. \quad (26)$$
Using relation (12″) allows us to establish the validity of the following assertions: for any $-\infty < k_1 < k_2 < \infty$,
$$\left|P\{k_1 \le \mu_n \le k_2\} - \left[\Phi\!\left(\frac{k_2 - np}{\sqrt{npq}}\right) - \Phi\!\left(\frac{k_1 - np}{\sqrt{npq}}\right)\right]\right| \to 0, \quad n \to \infty, \quad (27)$$
$$\left|P\{k_1 \le \mu_n \le k_2\} - \left[\Phi_0\!\left(\frac{k_2 - np}{\sqrt{npq}}\right) - \Phi_0\!\left(\frac{k_1 - np}{\sqrt{npq}}\right)\right]\right| \to 0, \quad n \to \infty. \quad (27')$$
It is clear that in the approximate calculation of the probability $P\{k_1 \le \mu_n \le k_2\}$ by formulas (27), (27′) certain errors are committed. It is possible to estimate this error, but its exact estimate is very complicated, and the simpler estimates are very rough. It is known ([10], Chapter IV, §23) that this error can be reduced if the limits of integration on the right-hand sides of (27) (or (27′)) are slightly shifted:
$$P\{k_1 \le \mu_n \le k_2\} \approx \Phi\big(x_{k_2 + \frac{1}{2}}\big) - \Phi\big(x_{k_1 - \frac{1}{2}}\big) = \Phi_0\big(x_{k_2 + \frac{1}{2}}\big) - \Phi_0\big(x_{k_1 - \frac{1}{2}}\big), \quad (28)$$
where
$$x_{k_1 - \frac{1}{2}} = \frac{k_1 - \frac{1}{2} - np}{\sqrt{npq}} = x_{k_1} - \frac{1}{2\sqrt{npq}}, \qquad x_{k_2 + \frac{1}{2}} = \frac{k_2 + \frac{1}{2} - np}{\sqrt{npq}} = x_{k_2} + \frac{1}{2\sqrt{npq}}.$$
From Table 2 it can be seen that the approximate formula (28) gives the probability $P\{k_1 \le \mu_n \le k_2\}$ to within three or four decimal places even for $n$ of the order of several hundred. The usual formula (27′) does not give this accuracy.
Table 2

  k₁  |  k₂  | Exact value Σ_{k=k₁}^{k₂} P_n(k) | Approximation by (27′) | Approximation by (28)

 n = 100, p = 0.25
  15  |  35  | 0.9852 | 0.9791 | 0.9845
  20  |  30  | 0.7967 | 0.7518 | 0.7960
  30  |  40  | 0.1492 | 0.1238 | 0.1492

 n = 300, p = 0.25
  60  |  90  | 0.9615 | 0.9545 | 0.9612
  70  |  80  | 0.5366 | 0.4950 | 0.5366
  80  |  90  | 0.2510 | 0.2297 | 0.2545

 n = 500, p = 0.25
 105  | 145  | 0.9659 | 0.9611 | 0.9658
 115  | 135  | 0.7219 | 0.6983 | 0.7218
 135  | 155  | 0.1621 | 0.1499 | 0.1624
Examples. 1. A symmetric coin is tossed 200 times. Find the probability that heads comes up at least 95 and at most 105 times.

Solution. By assumption $n = 200$, $p = q = 0.5$, $k_1 = 95$, $k_2 = 105$. Let $\mu$ be the number of heads; then we need the probability $P\{95 \le \mu \le 105\}$. We have
$$x_{k_1} = \frac{k_1 - np}{\sqrt{npq}} = \frac{95 - 200 \cdot 0.5}{\sqrt{200 \cdot 0.5 \cdot 0.5}} = -\frac{1}{\sqrt{2}}, \qquad x_{k_2} = \frac{105 - 200 \cdot 0.5}{\sqrt{200 \cdot 0.5 \cdot 0.5}} = \frac{1}{\sqrt{2}}.$$
Then, using formula (27′) and Appendix 2, we obtain
$$P\{95 \le \mu \le 105\} \approx \Phi_0\!\left(\frac{1}{\sqrt{2}}\right) - \Phi_0\!\left(-\frac{1}{\sqrt{2}}\right) = 2\Phi_0(0.7071) \approx 2 \cdot 0.2611 = 0.5222.$$
If we applied formula (28), we would obtain
$$x_{k_1 - \frac{1}{2}} = -0.55\sqrt{2}, \qquad x_{k_2 + \frac{1}{2}} = 0.55\sqrt{2},$$
$$P\{95 \le \mu \le 105\} \approx 2\Phi_0\big(0.55\sqrt{2}\big) \approx 2\Phi_0(0.7778) \approx 0.56331.$$
The exact value of the sought probability is
$$P\{95 \le \mu \le 105\} = \sum_{k=95}^{105} C_{200}^k\, 2^{-200} \approx 0.56325.$$
As we see, the approximation (28) is more accurate than the approximation (27′).

2. Let $\mu_n^* = \frac{\mu_n - np}{\sqrt{npq}}$; find the number $a$ from the condition $P\{|\mu_n^*| \le a\} \approx 0.5$, i.e. from the condition $P\{|\mu_n^*| > a\} \approx 0.5$.

Solution. According to formula (25), $2\Phi_0(a) \approx 0.5$. From Appendix 2 we find that $a = 0.6745$. Therefore
$$P\big\{|\mu_n - np| \le 0.6745\sqrt{npq}\big\} \approx P\big\{|\mu_n - np| > 0.6745\sqrt{npq}\big\} \approx 0.5.$$
So, if a symmetric coin is tossed $n$ times, then the probability that the number of heads lies in the interval
$$\left(\frac{n}{2} - 0.3373\sqrt{n},\ \frac{n}{2} + 0.3373\sqrt{n}\right)$$
is approximately 0.5; likewise, if a die is thrown $n$ times, then the probability that the number of throws showing «1» lies in the interval
$$\left(\frac{n}{6} - 0.251\sqrt{n},\ \frac{n}{6} + 0.251\sqrt{n}\right)$$
is also $\approx 0.5$.

3. How many times must a die be thrown so that the probability that the relative frequency of the face «1» deviates from the probability of «1» by no more than 0.01 is not less than the probability of the opposite event?

Solution. If $\mu_n$ is the number of occurrences of «1» in $n$ throws, then we must determine $n$ from the condition
$$P\left\{\left|\frac{\mu_n}{n} - \frac{1}{6}\right| \le 0.01\right\} \ge P\left\{\left|\frac{\mu_n}{n} - \frac{1}{6}\right| > 0.01\right\}.$$
From relation (25), with $p = 1/6$, $q = 5/6$, $\varepsilon = 0.01$, we obtain
$$P\left\{\left|\frac{\mu_n}{n} - \frac{1}{6}\right| \le 0.01\right\} \approx 2\Phi_0\!\left(0.06\sqrt{\frac{n}{5}}\right), \qquad P\left\{\left|\frac{\mu_n}{n} - \frac{1}{6}\right| > 0.01\right\} \approx 1 - 2\Phi_0\!\left(0.06\sqrt{\frac{n}{5}}\right).$$
So the desired $n$ is determined from the inequality $\Phi_0\!\big(0.06\sqrt{n/5}\big) \ge 0.25$. From Appendix 2 we find that the solution of the equation $\Phi_0(x) = 0.25$ is the number $x = 0.6745$. Consequently $n$ is determined from the condition
$$0.06\sqrt{\frac{n}{5}} \ge 0.6745, \quad \text{i.e.} \quad n \ge 632.$$
Cnk p k q nk
Pn (k ) o S k (O ) eO
Ok k!
.
(1)
The proof of the theorem follows directly from the following relations: Pn (k ) Cnk p k (1 p ) n k nk
k
n(n 1)(n 2) ... ( n k 1) § On · § On · ¨ ¸ ¨1 ¸ k! n¹ © n¹ © n k Onk § 1 ·§ 2 · § k 1 · § On · § On · 1¨1 ¸¨1 ¸ ...¨1 ¸¨1 ¸ ¨1 ¸ k ! © n ¹© n ¹ © n ¹© n¹ © n¹
o
Ok k!
o
S k O ( n o f) .
e O
Finding the limits, we took into account that when n o f n
On o O ,
§ On · O ¨1 ¸ o e , n © ¹
§ On · ¨1 ¸ n¹ ©
k
o 0.
ז
Comment. We recall that in Chapter I (see Chap. I, §2) we called a set of probabilities ^S k O ` a Poisson distribution, and in Chapter III (see Chapter III, §1) a 299
random variable with a distribution ^S k O ` was called a Poisson random variable with a parameter O (symbolically this was written in the form: [ ~ 3(O ) ). The theorem just proved serves as justification for these introduced terms. It is also known ([10], Chapter IV, §20) that the accuracy of approximation (1) is
Pn (k ) e O
Ok k!
d
O2 n
,
O
np,
(2)
Examples. 1. A device consists of 1000 independently operating elements. The probability that any given element fails in the time interval $t$ is 0.003. Find the probability that at least three elements fail during the time $t$.

Solution. Here the number of elements $n = 1000$ is large enough and the probability $p = 0.003$ is small enough. If $\mu$ is the number of failed elements, then
$$P\{\mu \ge 3\} = \sum_{k=3}^{1000} P_{1000}(k) = 1 - P_{1000}(0) - P_{1000}(1) - P_{1000}(2).$$
Taking $\lambda = np = 1000 \cdot 0.003 = 3$, to calculate the probabilities $P_{1000}(0)$, $P_{1000}(1)$, $P_{1000}(2)$ we apply Poisson's theorem (the approximate formula (1)):
$$P\{\mu \ge 3\} \approx 1 - \pi_0(3) - \pi_1(3) - \pi_2(3) = 1 - \left(e^{-3} + 3e^{-3} + \frac{3^2}{2!}e^{-3}\right).$$
From Appendix 3 we find that $\pi_0(3) = 0.0498$, $\pi_1(3) = 0.1494$, $\pi_2(3) = 0.2240$. Substituting these values into the previous relation, we get
$$P\{\mu \ge 3\} \approx 0.5768.$$
The admitted error, according to formula (2), is no more than
$$\frac{\lambda^2}{n} = \frac{3^2}{1000} = 0.009.$$
Precise calculations show that
$$P_{1000}(0) = 0.0495615, \quad P_{1000}(1) = 0.1491318, \quad P_{1000}(2) = 0.224146, \quad P\{\mu \ge 3\} \approx 0.5771607.$$
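The comparison of the exact binomial probabilities with the Poisson approximation in this example, and the error bound (2), can be checked numerically; a sketch for the parameters $n = 1000$, $p = 0.003$, $\lambda = 3$:

```python
import math

def binom_pmf(n, k, p):
    """Exact binomial probability P_n(k)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """pi_k(lambda) = e^(-lambda) lambda^k / k!"""
    return math.exp(-lam) * lam**k / math.factorial(k)

n, p = 1000, 0.003
lam = n * p                       # lambda = 3
for k in range(4):
    print(k, round(binom_pmf(n, k, p), 7), round(poisson_pmf(k, lam), 7))
```

The pointwise differences here are of order $10^{-4}$, well inside the bound $\lambda^2/n = 0.009$.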
2. A device consists of a large number of independently working elements, each of which fails in the time interval $t$ with a very small probability. Given that the probability of failure of at least one element in time $t$ is 0.98, find the average number of elements failing in time $t$.

Solution. By the conditions of the example we can apply Poisson's theorem, and, if $\mu$ is the number of failed elements, then
$$P\{\mu \ge 1\} = 1 - P\{\mu = 0\} = 1 - e^{-\lambda} = 0.98.$$
Since $e^{-\lambda} = 0.02$, we get $\lambda = \ln 50 \approx 4$, and for the Poisson distribution the parameter $\lambda$ is precisely the average value.

§4. Tasks for independent work

1. A radio telegraph station receives texts consisting of digits. Because of various kinds of interference, each incoming digit is distorted with probability 0.01. Assuming that the digits are received independently of each other, find the probability that in a text of 1100 digits: a) there are exactly 7 errors; b) the number of erroneously received digits is at most 20.

2. If left-handers make up on average 1% of all people, what is the probability that among 200 randomly selected people: a) there are exactly 4 left-handers; b) there are at least 4 left-handers?

3. In 14,000 tosses of a coin, heads came up 7,228 times. Find the probability that the deviation of the number of heads from the average number of heads (i.e. from $np = 7000$) is at least as large as in the indicated experiment.

4. Consider a sequence of independent Bernoulli trials with success probability $p = 0.5$ and denote by $\mu_n$ the number of successes. Using the de Moivre–Laplace theorem with $n = 100$, find the approximate values of the probabilities
$$P\left\{\left|\mu_n - \frac{n}{2}\right| \le \frac{\sqrt{n}}{2}\right\}, \qquad P\left\{\left|\mu_n - \frac{n}{2}\right| \ge \frac{\sqrt{n}}{2}\right\}.$$

5. Continuation. Solve the previous task for the case $n = 128$.

6. A theater seating 1,000 people has two different entrances, and near each of the entrances there is a cloakroom. How many places must there be in each of the cloakrooms so that, on average in 99 cases out of 100, all spectators can leave their coats in the cloakroom of the entrance through which they entered? Assume that the spectators choose the entrances with equal probabilities and consider two cases: a) the spectators arrive in pairs; b) the spectators arrive singly.

7. Two dice are thrown simultaneously 100 times. Denote by $\mu_n$ the number of throws in which one die shows an odd and the other an even number of points. Find the probability $P\{\mu_n \ge 95\}$.

8. 2,500 adults live in a village along a railway line. Each of them travels to the city by train about 6 times per month, choosing the days of the trips at random, independently of the others. What is the minimum capacity the train must have so that it overflows on average no more than once in 100 days (the train runs once a day)?

9. In a practically unlimited collection of objects, half of the objects have property A and a fifth have property B; the properties A and B are distributed among the objects independently. A sample of 1,600 items is drawn. What is the probability that in this sample the frequencies of the properties A and B deviate from the corresponding probabilities by no more than 1%?

10. In each of $n$ independent trials a certain event occurs with probability $p$. Find the following probabilities: a) for $n = 1500$ and $p = 0.4$, the probability that the absolute deviation of the relative frequency of the event from $p$ does not exceed 0.02; b) the probability that the number of occurrences of the event lies between: 1) 570 and 630; 2) 600 and 660; 3) 620 and 680; 4) 580 and 640; c) within what limits does the relative frequency of the event lie for $n = 1200$ and $p = 2/3$ with probability 0.985? d) how many trials are needed so that the probability that the relative frequency deviates from $p = 3/8$ in one direction or the other by less than 0.01 is 0.995?
Chapter VI

DIFFERENT TYPES OF CONVERGENCE OF SEQUENCES OF RANDOM VARIABLES
§1. Different types of convergence of sequences of random variables and their connection

Let $\xi, \xi_1, \xi_2, \dots$ be random variables given on the probability space $(\Omega, \mathcal{F}, P)$.

Definition 1. If for any $\varepsilon > 0$
$$P\{\omega: |\xi_n(\omega) - \xi(\omega)| > \varepsilon\} \to 0, \quad n \to \infty, \quad (1)$$
then we say that the sequence of random variables $\xi_1, \xi_2, \dots$ converges in probability to the random variable $\xi$, and this is written in the form
$$\xi_n \xrightarrow{P} \xi. \quad (1')$$
We recall that we have already encountered this type of convergence in the proof of the law of large numbers for the Bernoulli scheme (Chapter V, §1). Obviously, definition (1) is equivalent to the following: for any $\varepsilon > 0$
$$P\{\omega: |\xi_n(\omega) - \xi(\omega)| < \varepsilon\} \to 1, \quad n \to \infty. \quad (1'')$$

Definition 2. If
$$P\{\omega: \xi_n(\omega) \to \xi(\omega)\} = 1, \quad (2)$$
i.e. if the set of elementary events for which $\xi_n(\omega) \to \xi(\omega)$ has probability 1, then we say that the sequence of random variables $\xi_1, \xi_2, \dots$ converges to the random variable $\xi$ with probability 1 (or almost surely (a.s.), or almost everywhere (a.e.)), and this convergence is denoted as
$$\xi_n \xrightarrow{a.s.} \xi, \quad \text{or} \quad \xi_n \xrightarrow{a.e.} \xi, \quad \text{or} \quad \xi_n \to \xi \ \text{(a.s.)}.$$
Thus $\xi_n \xrightarrow{a.s.} \xi$ and $P\{\omega: \xi_n(\omega) \to \xi(\omega)\} = 1$ are equivalent.

Definition 3. If for some $0 < r < \infty$
$$M|\xi_n - \xi|^r \to 0, \quad n \to \infty, \quad (3)$$
then we say that the sequence of random variables $\xi_1, \xi_2, \dots$ converges to the random variable $\xi$ in mean of order $r$. In analysis such convergence is called convergence in the sense of $L^r$, and therefore convergence (3) is usually written in the form
$$\xi_n \xrightarrow{L^r} \xi. \quad (3')$$
If $r = 2$, convergence (3) is called mean-square convergence (convergence in the mean-square sense) and is written as
$$\xi = \mathop{\mathrm{l.i.m.}}_{n \to \infty}\,\xi_n \quad (3^*)$$
(l.i.m. — limit in mean).

Definition 4. If for any continuous and bounded function $f = f(x)$
$$Mf(\xi_n) \to Mf(\xi), \quad n \to \infty, \quad (4)$$
then we say that the sequence of random variables $\xi_1, \xi_2, \dots$ converges to the random variable $\xi$ in distribution, or weakly, and this is written in the form
$$\xi_n \xrightarrow{d} \xi, \quad \text{or} \quad \xi_n \xrightarrow{w} \xi, \quad \text{or} \quad \xi_n \Rightarrow \xi. \quad (4')$$
From the definition of the mathematical expectation and the formula for the change of variables in the Lebesgue integral we immediately obtain the equivalence of convergence (4) to any of the following convergences:
$$\int_{\Omega} f(\xi_n(\omega))\,P(d\omega) \to \int_{\Omega} f(\xi(\omega))\,P(d\omega), \quad n \to \infty,$$
$$\int_{-\infty}^{\infty} f(x)\,dF_{\xi_n}(x) \to \int_{-\infty}^{\infty} f(x)\,dF_{\xi}(x), \quad n \to \infty, \quad (4^*)$$
$$\int_{-\infty}^{\infty} f(x)\,P_n(dx) \to \int_{-\infty}^{\infty} f(x)\,P(dx), \quad n \to \infty,$$
where the probability measures $P_n$, $P$ correspond to the distribution functions $F_{\xi_n}(x)$, $F_{\xi}(x)$. When the middle convergence condition in $(4^*)$ is satisfied, it is said that the sequence of distribution functions $F_{\xi_n}(x)$ converges weakly to the distribution function $F_{\xi}(x)$ (this kind of convergence of sequences of distribution functions will be considered in detail in the next section).

The connections between the different types of convergence of sequences of random variables are established by the assertions proved below.
f x
1 2
(for simplicity, we assume that f x d ) and for any G ! 0 we can write (below I A is an event indicator A ):
M f [ n f [ d
M f [ f [ I^ ` d d M f [ f [ I^ ` P^ [ [ ! G ` .
d M f [ n f [ I^ [
n [
dG `
[ n [ dG
n
[ n [ !G
n
n
Now choose a sufficiently large number N and a number G ! 0 so that the folloH
wing condition is satisfied: P ^ [ ! N ` . Since the function f 2
f x is a continuous
H
function, for any H ! 0 there is one G ! 0 that x y G follows f x f y d . 2
Therefore
M f [ n f [ I^ [ [ dG ` n
M f [ n f [ I ^ [
M f [ n f [ I^ [
H d MI^ [ ! N` MI^ [ d N` 2 So
n [
dG `
P^ [ ! N`
n [
dG `
I^ [ d N ` d
H H H d H. 2 2 2
Mf [ n Mf [ d H P^[ n [ ! G `, 305
I^ [ ! N `
P
but [ n o[ (condition (1)), H is any positive number, therefore convergence follows d
from the last inequality [ n o [ , hence the convergence (4) is true.
ז
Assertion 2. If $\xi_n \xrightarrow{P} \xi$, then for any continuous (but not necessarily bounded) function $g = g(x)$ the convergence $g(\xi_n) \xrightarrow{P} g(\xi)$ holds.

Proof. Since $g = g(x)$ is a continuous function, for any $\varepsilon > 0$ there is a $\delta > 0$ such that $|x - y| < \delta$ implies $|g(x) - g(y)| < \varepsilon$. Consequently

$$\{\omega: |\xi_n(\omega) - \xi(\omega)| < \delta\} \subseteq \{\omega: |g(\xi_n(\omega)) - g(\xi(\omega))| < \varepsilon\},$$
$$\{\omega: |g(\xi_n(\omega)) - g(\xi(\omega))| \ge \varepsilon\} \subseteq \{\omega: |\xi_n(\omega) - \xi(\omega)| \ge \delta\},$$

whence

$$P\{\omega: |g(\xi_n(\omega)) - g(\xi(\omega))| \ge \varepsilon\} \le P\{\omega: |\xi_n(\omega) - \xi(\omega)| \ge \delta\} \to 0, \quad n \to \infty. \;\square$$
Remark 1. In Assertions 1 and 2 the requirements of boundedness and continuity are essential. For example, let $|\xi(\omega)| \le 1$ and

$$\xi_n(\omega) = \begin{cases} \xi(\omega), & \omega \notin A_n, \\ n^2, & \omega \in A_n, \end{cases}$$

where $A_n \in \mathcal{F}$ and $P(A_n) = \dfrac{1}{n}$. Then

$$P\{\xi_n \ne \xi\} = P(A_n) = \frac{1}{n} \to 0 \quad (n \to \infty),$$

i.e. $\xi_n \xrightarrow{P} \xi$ as $n \to \infty$. But at the same time

$$M\xi_n = \int_\Omega \xi_n(\omega)\,P(d\omega) = \int_{\overline{A}_n} \xi(\omega)\,P(d\omega) + \int_{A_n} n^2\,P(d\omega) \ge -1 + n^2 P(A_n) = -1 + n \to \infty \quad (n \to \infty),$$

while $|M\xi| \le 1$; hence $M\xi_n \nrightarrow M\xi$ and $M|\xi_n| \nrightarrow M|\xi|$ ($n \to \infty$).
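The effect described in Remark 1 can be observed in a small simulation (a hypothetical illustration of ours, not from the textbook): take $\xi \equiv 0$ and $A_n = \{U < 1/n\}$ for a uniform random variable $U$, so that $\xi_n = n^2$ on $A_n$ and $\xi_n = 0$ otherwise; then $P\{\xi_n \ne \xi\} = 1/n \to 0$ while $M\xi_n = n \to \infty$.

```python
import random

def simulate(n, trials, rng):
    """Empirical P{xi_n != xi} and M xi_n, where xi = 0 and
    xi_n = n^2 on A_n = {U < 1/n}, xi_n = 0 otherwise."""
    mismatches = 0
    total = 0.0
    for _ in range(trials):
        u = rng.random()
        xi_n = n * n if u < 1.0 / n else 0.0  # xi(omega) = 0 outside A_n
        mismatches += (xi_n != 0.0)
        total += xi_n
    return mismatches / trials, total / trials

rng = random.Random(0)
for n in (10, 100, 1000):
    p_diff, mean = simulate(n, 100_000, rng)
    print(n, round(p_diff, 4), round(mean, 1))
```

The empirical $P\{\xi_n \ne \xi\}$ shrinks like $1/n$ while the empirical mean grows like $n$, so convergence in probability does not control the expectations here.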
Theorem 1 (The theorem on majorized (dominated) convergence). If a sequence of random variables $\eta_n$, $n = 1, 2, \ldots$, and random variables $\eta$, $\zeta$ satisfy the conditions

$$|\eta_n| \le \zeta, \quad \eta_n \xrightarrow{P} \eta \quad \text{and} \quad M\zeta < \infty,$$

then there exists the limit $M\eta = \lim_{n \to \infty} M\eta_n$.

Proof. $M\eta$ exists, because for any $N$ and $\varepsilon > 0$

$$M\min(|\eta|, N) \le \varliminf_{n \to \infty} M\min(|\eta_n|, N)\,I_{\{|\eta_n - \eta| \le \varepsilon\}} + \varepsilon \le \varliminf_{n \to \infty} M\min(|\eta_n|, N) + \varepsilon \le \varliminf_{n \to \infty} M|\eta_n| + \varepsilon \le M\zeta + \varepsilon,$$

and letting $N \to \infty$ gives $M|\eta| \le M\zeta + \varepsilon < \infty$. Further, from the condition $\eta_n \xrightarrow{P} \eta$ ($n \to \infty$) it follows that for any $\delta > 0$ and all sufficiently large $n$

$$M\zeta\,I_{\{|\eta_n - \eta| > \varepsilon\}} < \delta, \qquad M|\eta|\,I_{\{|\eta_n - \eta| > \varepsilon\}} < \delta.$$

Consequently,

$$|M\eta_n - M\eta| \le M|\eta_n - \eta|\,I_{\{|\eta_n - \eta| \le \varepsilon\}} + M|\eta_n - \eta|\,I_{\{|\eta_n - \eta| > \varepsilon\}} \le \varepsilon + 2\delta,$$

and since $\varepsilon$ and $\delta$ are arbitrary, $M\eta_n \to M\eta$. □
From the definition of almost sure (a.s.) convergence and the convergence criterion for numerical sequences, we get that condition (2) is equivalent to the condition

$$P\left(\bigcap_{\varepsilon > 0} \bigcup_{N=1}^{\infty} \bigcap_{n=N}^{\infty} \{\omega: |\xi_n(\omega) - \xi(\omega)| \le \varepsilon\}\right) = 1. \qquad (2')$$

From this it is not difficult to obtain the validity of the following criterion for convergence (a.s.).

Theorem 2. In order that the convergence $\xi_n \xrightarrow{a.s.} \xi$ hold, it is necessary and sufficient that for any $\varepsilon > 0$

$$\lim_{n \to \infty} P\left(\bigcup_{k=n}^{\infty} \{\omega: |\xi_k(\omega) - \xi(\omega)| > \varepsilon\}\right) = 0. \qquad (2^*)$$

The convergence of the series $\sum_{k=1}^{\infty} P\{|\xi_k - \xi| > \varepsilon\}$ implies condition $(2^*)$ and the condition $P\{|\xi_k - \xi| > \varepsilon\} \to 0$ ($k \to \infty$), i.e. also convergence in probability $\xi_n \xrightarrow{P} \xi$. Let us formulate these results in the form of separate statements.

Assertion 3. If for any $\varepsilon > 0$

$$\sum_{k=1}^{\infty} P\{|\xi_k - \xi| > \varepsilon\} < \infty, \qquad (5)$$

then $\xi_n \xrightarrow{a.s.} \xi$ ($n \to \infty$).
Assertion 4. $\xi_n \xrightarrow{a.s.} \xi \Rightarrow \xi_n \xrightarrow{P} \xi$.

The following lemma plays an important role in proving properties that hold with probability 1.

Lemma 1 (Borel–Cantelli lemma). Let $A_1, A_2, \ldots$ be a sequence of events defined on a probability space $(\Omega, \mathcal{F}, P)$, and let $A^*$ be the upper limit of this sequence of events, i.e. $A^*$ means the occurrence of infinitely many events of this sequence (Chapter I, §1):

$$A^* = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k.$$

Then the following assertions hold:

a) If $\sum_{n=1}^{\infty} P(A_n) < \infty$, then $P(A^*) = 0$;

b) If $\sum_{n=1}^{\infty} P(A_n) = \infty$ and $A_1, A_2, \ldots$ is a sequence of independent events, then $P(A^*) = 1$.

Proof. a) Since $B_n = \bigcup_{k=n}^{\infty} A_k \supseteq \bigcup_{k=n+1}^{\infty} A_k = B_{n+1}$, by the axiom of continuity (axiom P3″)

$$P(A^*) = P\left(\bigcap_{n=1}^{\infty}\bigcup_{k=n}^{\infty} A_k\right) = \lim_{n\to\infty} P\left(\bigcup_{k=n}^{\infty} A_k\right) \le \lim_{n\to\infty} \sum_{k=n}^{\infty} P(A_k) = 0$$

(the vanishing of the last limit is a consequence of the convergence of the series $\sum_{k=1}^{\infty} P(A_k)$).

b) If $A_1, A_2, \ldots$ is a sequence of independent events, then $\overline{A}_1, \overline{A}_2, \ldots$ is also a sequence of independent events. Therefore, for any $N \ge n$,

$$P\left(\bigcap_{k=n}^{N} \overline{A}_k\right) = \prod_{k=n}^{N} P(\overline{A}_k),$$

and by axiom P3″

$$P\left(\bigcap_{k=n}^{\infty} \overline{A}_k\right) = \lim_{N\to\infty} P\left(\bigcap_{k=n}^{N} \overline{A}_k\right) = \lim_{N\to\infty} \prod_{k=n}^{N} P(\overline{A}_k) = \prod_{k=n}^{\infty} P(\overline{A}_k).$$

Further, since for $0 \le x < 1$ the inequality $\ln(1 - x) \le -x$ holds,

$$\ln \prod_{k=n}^{\infty} P(\overline{A}_k) = \sum_{k=n}^{\infty} \ln\left(1 - P(A_k)\right) \le -\sum_{k=n}^{\infty} P(A_k).$$

By hypothesis $\sum_{k=n}^{\infty} P(A_k) = \infty$, whence

$$\prod_{k=n}^{\infty} P(\overline{A}_k) = P\left(\bigcap_{k=n}^{\infty} \overline{A}_k\right) = 1 - P\left(\bigcup_{k=n}^{\infty} A_k\right) = 0.$$

Consequently $P\left(\bigcup_{k=n}^{\infty} A_k\right) = 1$ for every $n$, i.e.

$$P(A^*) = P\left(\bigcap_{n=1}^{\infty}\bigcup_{k=n}^{\infty} A_k\right) = 1. \;\square$$

Corollary 1. If $A_1, A_2, \ldots$ is a sequence of independent events, then, depending on the convergence or divergence of the series $\sum_{n=1}^{\infty} P(A_n)$, the probability $P(A^*)$ is zero or one: $P(A^*) = 0$ or $P(A^*) = 1$.
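Both halves of the Borel–Cantelli lemma show up clearly in simulation. The sketch below (an illustrative script of ours; the cutoffs and seed are arbitrary) counts, over several runs, how many of the independent events $A_n$ with $P(A_n) = 1/n^2$ (summable) versus $P(A_n) = 1/n$ (divergent) occur for $n > 100$.

```python
import random

def tail_event_count(p_of_n, n_lo, n_hi, rng):
    """Number of independent events A_n with P(A_n) = p_of_n(n)
    that occur for n in [n_lo, n_hi]."""
    return sum(rng.random() < p_of_n(n) for n in range(n_lo, n_hi + 1))

rng = random.Random(1)
runs = 20
# sum 1/n^2 < infinity: by part a) the tail events almost never occur
summable = sum(tail_event_count(lambda n: 1 / n**2, 101, 10_000, rng)
               for _ in range(runs))
# sum 1/n = infinity: by part b) the events keep occurring arbitrarily far out
divergent = sum(tail_event_count(lambda n: 1 / n, 101, 10_000, rng)
                for _ in range(runs))
print(summable, divergent)
```

The expected tail count per run is about $\sum_{n>100} 1/n^2 \approx 0.01$ in the first case and $\sum_{101}^{10000} 1/n \approx \ln 100 \approx 4.6$ in the second, matching the zero–one dichotomy of Corollary 1.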
Corollary 2. Assertion a) of the Borel–Cantelli lemma implies Assertion 3.

Proof. If for any $\varepsilon > 0$ we introduce the events $A_n(\varepsilon) = \{\omega: |\xi_n(\omega) - \xi(\omega)| > \varepsilon\}$, then condition (5) becomes the condition $\sum_{n=1}^{\infty} P(A_n(\varepsilon)) < \infty$. Therefore, by the Borel–Cantelli lemma, $P(A^*(\varepsilon)) = 0$, where

$$A^*(\varepsilon) = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k(\varepsilon).$$

In other words,

$$\sum_{k=1}^{\infty} P\{|\xi_k - \xi| > \varepsilon\} < \infty \;\;\forall \varepsilon > 0 \;\Rightarrow\; P(A^*(\varepsilon)) = 0 \;\;\forall \varepsilon > 0 \;\Rightarrow$$
$$\Rightarrow\; P\{\omega: \xi_n(\omega) \nrightarrow \xi(\omega)\} = P\left(\bigcup_{\varepsilon > 0} A^*(\varepsilon)\right) = 0 \;\Rightarrow\; P\{\omega: \xi_n(\omega) \to \xi(\omega)\} = 1,$$
i.e. the assertion is true. □

Corollary 3. Let $\varepsilon_1, \varepsilon_2, \ldots$ be a sequence of nonnegative numbers satisfying the condition $\varepsilon_n \downarrow 0$ ($n \to \infty$). Then, if the condition

$$\sum_{n=1}^{\infty} P\{|\xi_n - \xi| > \varepsilon_n\} < \infty \qquad (6)$$

holds, then $\xi_n \xrightarrow{a.s.} \xi$ ($n \to \infty$).

Proof. Let $A_n = \{|\xi_n - \xi| > \varepsilon_n\}$. Then by the Borel–Cantelli lemma $P(A^*) = 0$; therefore for almost all $\omega \in \Omega$ there is an $N = N(\omega)$ such that for $n \ge N(\omega)$ the inequality $|\xi_n(\omega) - \xi(\omega)| \le \varepsilon_n$ holds. But $\varepsilon_n \downarrow 0$; therefore for almost every $\omega \in \Omega$ we have $\xi_n(\omega) \to \xi(\omega)$, i.e. $\xi_n \xrightarrow{a.s.} \xi$. □

Statement 5. $\xi_n \xrightarrow{L^r} \xi \Rightarrow \xi_n \xrightarrow{P} \xi$; in particular,

$$\xi = \operatorname*{l.i.m.}_{n \to \infty} \xi_n \;\Rightarrow\; \xi_n \xrightarrow{P} \xi.$$
The proof follows immediately from the Chebyshev inequality. □

Combining Statements 1–5 proved above, we obtain the following theorem.

Theorem 3 (The main theorem on the connections between the different kinds of convergence of sequences of random variables).

$$\xi_n \xrightarrow{a.s.} \xi \;\Rightarrow\; \xi_n \xrightarrow{P} \xi \;\Rightarrow\; \xi_n \xrightarrow{d} \xi; \qquad \xi_n \xrightarrow{L^r} \xi \;\Rightarrow\; \xi_n \xrightarrow{P} \xi.$$
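None of the converse implications in Theorem 3 hold in general. One classical counterexample (the "typewriter" sequence — a standard construction, not taken from this textbook) shows that convergence in probability does not imply almost sure convergence: on $\Omega = [0, 1)$ with Lebesgue measure, let $\xi_n$ be the indicator of the dyadic interval of length $2^{-k}$ determined by writing $n = 2^k + j$, $0 \le j < 2^k$. Then $P\{\xi_n \ne 0\} = 2^{-k} \to 0$, yet for every fixed $\omega$ we have $\xi_n(\omega) = 1$ infinitely often.

```python
def typewriter(n, omega):
    """xi_n(omega) on Omega = [0,1): indicator of the dyadic interval
    [j/2^k, (j+1)/2^k), where n = 2^k + j with 0 <= j < 2^k."""
    k = n.bit_length() - 1
    j = n - (1 << k)
    lo, hi = j / 2**k, (j + 1) / 2**k
    return 1.0 if lo <= omega < hi else 0.0

# P{xi_n != 0} = 2^{-k} -> 0 (convergence in probability),
# yet every fixed omega lands in exactly one interval per level k,
# so xi_n(omega) = 1 for infinitely many n: no a.s. convergence.
omega = 0.3
hits = [n for n in range(1, 4096) if typewriter(n, omega) == 1.0]
print(len(hits), hits[:5])
```

For $n < 2^{12}$ the levels $k = 0, \ldots, 11$ each contribute exactly one index $n$ with $\xi_n(\omega) = 1$, so the trajectory at $\omega = 0.3$ returns to $1$ once per level forever.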
§2. Weak convergence

In §1 (see Definition 4) we have already defined the concept of weak convergence of a sequence of random variables and of the sequence of distribution functions corresponding to these random variables. But if this question is considered from the point of view of general sequences of distribution functions (not necessarily distribution functions of random variables), then a difficulty arises: the limit of a sequence of distribution functions may fail to be a distribution function. For example, consider the following sequence of distribution functions:

$$F_n(x) = 0, \;\; x < -n; \qquad F_n(x) = \frac{1}{2}, \;\; -n \le x < n; \qquad F_n(x) = 1, \;\; x \ge n.$$

Then for $n \to \infty$ we have $F_n(x) \to \frac{1}{2} = G(x)$, but $G(x)$ is not a distribution function.

In view of the above, we give the following definition.

Definition 5. Let $F_n(x)$, $n = 1, 2, \ldots$, be a sequence of distribution functions and let $F(x)$ be a distribution function. If the convergence

$$\lim_{n \to \infty} F_n(x) = F(x)$$

holds at all points of continuity of $F(x)$, then it is said that the sequence of distribution functions $F_n(x)$ converges to the distribution function $F(x)$ in the weak sense (or: essentially), and this is denoted as follows:

$$F_n(x) \xrightarrow{W} F(x). \qquad (7)$$

Thus, by definition,

$$F_n(x) \xrightarrow{W} F(x) \;\Leftrightarrow\; \lim_{n \to \infty} F_n(x) = F(x), \;\; x \in C(F), \qquad (8)$$

where $C(F)$ is the set of points of continuity of the function $F(x)$:

$$C(F) = \{x \in R: F(x) = F(x - 0)\}.$$
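The mass-escape phenomenon in the example above is easy to compute exactly. The short deterministic check below (an illustrative script of ours) implements $F_n$, the distribution function of a random variable taking the values $\pm n$ with probability $\tfrac{1}{2}$ each, and evaluates it at a fixed point as $n$ grows.

```python
def F_n(x, n):
    """Distribution function of xi_n taking values -n and +n
    with probability 1/2 each."""
    if x < -n:
        return 0.0
    if x < n:
        return 0.5
    return 1.0

# For every fixed x the pointwise limit is 1/2, which is not a
# distribution function: half of the mass escapes to -inf, half to +inf.
print([F_n(3.0, n) for n in (1, 2, 3, 4, 100)])
```

Once $n$ exceeds the fixed evaluation point, $F_n$ is stuck at $\tfrac{1}{2}$ there, so the limit function $G \equiv \tfrac{1}{2}$ has $G(-\infty) \ne 0$ and $G(+\infty) \ne 1$.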
Theorem 5 (The theorem on weak convergence). In order that the sequence of distribution functions $F_n(x)$, $n = 1, 2, \ldots$, converge in the weak sense to a distribution function $F(x)$, it is necessary and sufficient that for any continuous and bounded function $f(x)$

$$\int_{-\infty}^{\infty} f(x)\,dF_n(x) \to \int_{-\infty}^{\infty} f(x)\,dF(x), \quad n \to \infty, \qquad (9)$$

i.e. $(8) \Leftrightarrow (9)$.

Proof. $(8) \Rightarrow (9)$. $F(x)$, $F_n(x)$ are distribution functions; therefore for every $\varepsilon > 0$ we can choose sufficiently large numbers $N$, $-N$ that are points of continuity of $F(x)$ ($N, -N \in C(F)$) such that the inequalities

$$\int_{|x| \ge N} dF_n(x) < \varepsilon, \qquad \int_{|x| \ge N} dF(x) < \varepsilon$$

hold (the first one for all sufficiently large $n$). Then

$$\left|\int_{-\infty}^{\infty} f(x)\,dF_n(x) - \int_{-\infty}^{\infty} f(x)\,dF(x)\right| \le 2\varepsilon + \left|\int_{-N}^{N} f(x)\,dF_n(x) - \int_{-N}^{N} f(x)\,dF(x)\right|,$$

where for simplicity we have taken a continuous bounded function $f(x)$ with $|f(x)| \le 1$. Next, we split the half-interval $(-N, N]$ by points $x_j \in C(F)$,

$$x_0 = -N < x_1 < x_2 < \ldots < x_k = N,$$

and approximate the function $f(x)$ by the step function

$$g(\varepsilon, x) = \sum_{j=1}^{k} f(x_j)\, I_{(x_{j-1},\, x_j]}(x)$$

so that the conditions

$$\int_{-N}^{N} |f(x) - g(\varepsilon, x)|\,dF_n(x) < \varepsilon, \qquad \int_{-N}^{N} |f(x) - g(\varepsilon, x)|\,dF(x) < \varepsilon$$

are satisfied (above, as always, $I_A(x)$ is the indicator of the set $A$). Therefore, to prove (9) for an arbitrary continuous and bounded function $f(x)$ it suffices to prove it for the function $g(\varepsilon, x)$ defined above. The validity of the latter follows from the fact that, as $n \to \infty$,

$$\int_{-N}^{N} g(\varepsilon, x)\,dF_n(x) = \sum_{j=1}^{k} f(x_j)\left[F_n(x_j) - F_n(x_{j-1})\right] \to \sum_{j=1}^{k} f(x_j)\left[F(x_j) - F(x_{j-1})\right] = \int_{-N}^{N} g(\varepsilon, x)\,dF(x).$$

$(9) \Rightarrow (8)$. We take any $\varepsilon > 0$ and define a continuous function $f_\varepsilon(t)$ as in Figure 5 ($f_\varepsilon(t) = 1$ for $t \le x$, $f_\varepsilon(t) = 0$ for $t \ge x + \varepsilon$, linear in between). We have

$$F_n(x) = \int_{-\infty}^{x} dF_n(t) \le \int_{-\infty}^{x+\varepsilon} f_\varepsilon(t)\,dF_n(t) = \int_{-\infty}^{\infty} f_\varepsilon(t)\,dF_n(t);$$

therefore, according to (9),

$$\limsup_{n\to\infty} F_n(x) \le \int_{-\infty}^{\infty} f_\varepsilon(t)\,dF(t) = \int_{-\infty}^{x+\varepsilon} f_\varepsilon(t)\,dF(t) \le F(x + \varepsilon),$$

and similarly, using the analogous function equal to $1$ for $t \le x - \varepsilon$ and to $0$ for $t \ge x$, one obtains $\liminf_{n\to\infty} F_n(x) \ge F(x - \varepsilon)$. If $x \in C(F)$, then, because of the arbitrariness of the choice of $\varepsilon$,

$$\lim_{n\to\infty} F_n(x) = F(x),$$

Fig. 5
so that relation (8) follows from (9). □

Almost word for word repeating the proof of this theorem, with obvious changes, one can verify the validity of the following theorem.

Theorem 5′. In order that the weak convergence

$$F_n(x) \xrightarrow{W} F(x)$$

take place, it is necessary and sufficient that

$$F_n(y) - F_n(x) \to F(y) - F(x), \quad x, y \in C(F).$$

Theorem 6. If

$$C(F) = \{x: F(x) = F(x - 0)\} = R,$$

i.e. the limit function $F(x)$ is continuous, then the weak convergence $F_n(x) \xrightarrow{W} F(x)$ is equivalent to the uniform convergence

$$\sup_x |F_n(x) - F(x)| \to 0 \quad (n \to \infty).$$
Proof. Since the function $F(x)$ is continuous, the pointwise convergence $F_n(x) \to F(x)$ for each $x \in R$ implies uniform convergence on any segment $[-N, N]$. This can be shown, for example, in the following way. On the segment $[-N, N]$ select points $-N = x_0 < x_1 < x_2 < \ldots < x_m = N$ so that $F(x_{i+1}) - F(x_i) \le \dfrac{\varepsilon}{5}$ for all $i$, where $\varepsilon > 0$ is an arbitrary number. Next, choose a number $n_0$ so that for $n > n_0$ the condition $|F_n(x_i) - F(x_i)| \le \dfrac{\varepsilon}{5}$ is satisfied for all $i$. Then for any point $x \in [-N, N]$ there is an index $k$ such that $x_k \le x \le x_{k+1}$, and for all $n > n_0$:

$$|F_n(x) - F(x)| \le |F_n(x) - F_n(x_k)| + |F_n(x_k) - F(x_k)| + |F(x_k) - F(x)| \le \left(F_n(x_{k+1}) - F_n(x_k)\right) + \frac{\varepsilon}{5} + \frac{\varepsilon}{5},$$

and since

$$F_n(x_{k+1}) - F_n(x_k) \le |F_n(x_{k+1}) - F(x_{k+1})| + \left(F(x_{k+1}) - F(x_k)\right) + |F(x_k) - F_n(x_k)| \le \frac{3\varepsilon}{5},$$

we get $|F_n(x) - F(x)| \le 5 \cdot \dfrac{\varepsilon}{5} = \varepsilon$.

If $x > N$, we can write:

$$|F_n(x) - F(x)| = \left|F_n(N) - F(N) + \int_{N}^{x} dF_n(t) - \int_{N}^{x} dF(t)\right| \le$$
$$\le \varepsilon + \left(1 - F_n(N)\right) + \left(1 - F(N)\right) \le 3\varepsilon,$$

where the number $N$ is chosen so that the inequalities $1 - F_n(N) < \varepsilon$ and $1 - F(N) < \varepsilon$ are satisfied (and $n$ is large enough that $|F_n(N) - F(N)| \le \varepsilon$). In the case $x < -N$ the reasoning is analogous to the case $x > N$. □

Remark 2. If the distribution functions $F_n(x)$, $F(x)$ are discrete and the sets of their discontinuity points coincide,
where we have chosen the number N in such a way that the inequalities are satisfied 1 Fn N H , 1 F N H . In the case x N , the reasoning is analogous to the case x ! N . ז REMARK 2. If the distribution functions Fn x , F x are discrete and the sets of their discontinuity points coincide C Fn C F
^x1, x2 ,...` ,
then the weak
W
convergence Fn x o F x is equivalent to the convergence Fn xk Fn xk 0 o F xk F xk 0 , 314
k 1,2,... .
To see this it suffices to choose in Theorem 5 x, y C F so that the conditions
y p xk , x n xk . If Fn x and F x are distribution functions of the random variables, respectively, [ n and [ , i.e. Fn x
F[n x
P ^[ n d x` , P
F x F[ x P ^[ d x` , d
then, according to assertion 1, [ n o [ [ n o [ . It turns out that if the limiting random variable is degenerate, then the converse assertion also holds; the following theorem is true. P
Theorem 7. If [ n o [ , P^[
c` 1 , where c is a constant, then d
P
[n o[ [n o[ . In other words, if the sequence of the distribution function Fn x converges weakly to the distribution function
F x 0 , x c ; F x 1 , x t c , then P
[ n o [ , P^[
c` 1 .
Proof. Necessity is the one proved in paragraph 2.1. statement 1. Adequacy. For any H ! 0 , taking into account degeneracy of [ , i.e. P^[ c` 1 , we can write: P ^ [n [ d H `
P ^ [n c d H `
P ^c H d [ d c H `
F[n c H F[n c H 0 o F[ c H F[ c H 1,
ז
as was required to prove. □

§3. The Cauchy criterion for convergence in probability and with probability 1

First we introduce the definitions of fundamental (Cauchy) sequences of random variables. If for any $\varepsilon > 0$

$$P\{\omega: |\xi_n(\omega) - \xi_m(\omega)| > \varepsilon\} \to 0, \quad n, m \to \infty, \qquad (10)$$

then the sequence of random variables $(\xi_n)$ is called fundamental in probability. If a sequence of random variables $(\xi_n)$ satisfies the condition

$$P\left(\bigcap_{\varepsilon > 0} \bigcup_{N=1}^{\infty} \bigcap_{n, m \ge N} \{\omega: |\xi_n(\omega) - \xi_m(\omega)| < \varepsilon\}\right) = 1, \qquad (11)$$

then it is said to be fundamental with probability 1 (almost everywhere fundamental, almost surely fundamental).

Theorem 8. In order that the sequence of random variables $\xi_n$, $n = 1, 2, \ldots$, be fundamental with probability 1, it is necessary and sufficient that for any $\varepsilon > 0$

$$P\left\{\sup_{k \ge n,\; l \ge n} |\xi_k - \xi_l| \ge \varepsilon\right\} \to 0, \quad n \to \infty, \qquad (11')$$

or, equivalently, that the convergence

$$P\left\{\sup_{k \ge 0} |\xi_{n+k} - \xi_n| \ge \varepsilon\right\} \to 0, \quad n \to \infty, \qquad (11'')$$

takes place.

Proof. Necessity. Suppose that condition (11) is satisfied. Denote

$$A = \bigcap_{\varepsilon > 0} \bigcup_{N=1}^{\infty} \bigcap_{k, l \ge N} A_{k,l}(\varepsilon), \quad \text{where } A_{k,l}(\varepsilon) = \{\omega: |\xi_k(\omega) - \xi_l(\omega)| < \varepsilon\}.$$

Since $P(A) = 1$, we have

$$P\left(\,\overline{\bigcup_{N=1}^{\infty} \bigcap_{k,l \ge N} A_{k,l}(\varepsilon)}\,\right) = 0, \qquad P\left(\bigcup_{N=1}^{\infty} \bigcap_{k,l \ge N} A_{k,l}(\varepsilon)\right) = 1.$$

Therefore, by axiom P3″ (Ch. I, §1),

$$\lim_{N \to \infty} P\left(\,\overline{\bigcap_{k,l \ge N} A_{k,l}(\varepsilon)}\,\right) = \lim_{N \to \infty} P\left\{\sup_{k \ge N,\; l \ge N} |\xi_k - \xi_l| \ge \varepsilon\right\} = 0,$$

i.e. (11′) is fulfilled. To prove sufficiency, it suffices to carry out the above argument in reverse order. The equivalence of (11′) and (11″) follows immediately from the inequalities

$$\sup_{k \ge 0} |\xi_{n+k} - \xi_n| \le \sup_{k \ge 0,\; l \ge 0} |\xi_{n+k} - \xi_{n+l}| \le 2\sup_{k \ge 0} |\xi_{n+k} - \xi_n|. \;\square$$
We now prove the Cauchy criteria for convergence in probability and with probability 1.

Theorem 9 (Cauchy's criterion for convergence in probability). In order that a sequence of random variables $\xi_n$, $n = 1, 2, \ldots$, converge in probability, it is necessary and sufficient that this sequence be fundamental in probability.

Proof. Necessity. Let $\xi_n \xrightarrow{P} \xi$. Since

$$|\xi_n - \xi_m| \le |\xi_n - \xi| + |\xi_m - \xi|,$$

we have

$$\{|\xi_n - \xi_m| > \varepsilon\} \subseteq \left\{|\xi_n - \xi| > \frac{\varepsilon}{2}\right\} \cup \left\{|\xi_m - \xi| > \frac{\varepsilon}{2}\right\},$$

whereby (taking into account the convergence $\xi_n \xrightarrow{P} \xi$)

$$P\{|\xi_n - \xi_m| > \varepsilon\} \le P\left\{|\xi_n - \xi| > \frac{\varepsilon}{2}\right\} + P\left\{|\xi_m - \xi| > \frac{\varepsilon}{2}\right\} \to 0 \quad (n, m \to \infty),$$

and this means that the sequence $(\xi_n)$ is fundamental in probability.

Before proving the sufficiency part of the theorem, we first prove an important lemma.

Lemma 2. If $\xi_n$, $n = 1, 2, \ldots$, is a sequence that is fundamental in probability, then from it one can select a subsequence converging with probability 1.

Proof. Choose any sequence of numbers $\varepsilon_k$ such that $\varepsilon_k > 0$, $k = 1, 2, \ldots$, and $\sum_{k=1}^{\infty} \varepsilon_k < \infty$ (for example, $\varepsilon_k = 2^{-k}$), and form a subsequence $(\xi_{n_k})$ as follows: set $n_1 = 1$ and define $n_k$ by induction from the condition

$$n_k = \min\left\{n > n_{k-1}: \; P\{|\xi_t - \xi_s| > \varepsilon_k\} < \varepsilon_k \;\text{ for all } t \ge n, \; s \ge n\right\}.$$

Then, by construction,

$$\sum_{k=1}^{\infty} P\{|\xi_{n_{k+1}} - \xi_{n_k}| > \varepsilon_k\} \le \sum_{k=1}^{\infty} \varepsilon_k < \infty;$$

therefore, by the Borel–Cantelli lemma, in the sequence of events $A_k = \{|\xi_{n_{k+1}} - \xi_{n_k}| > \varepsilon_k\}$ with probability 1 only a finite number of events occur; in other words, for the event $A^* = \bigcap_{m=1}^{\infty}\bigcup_{k=m}^{\infty} A_k$ we have $P(A^*) = 0$. Consequently, the series $\xi_{n_1} + \sum_{k=1}^{\infty} (\xi_{n_{k+1}} - \xi_{n_k})$ converges with probability 1:

$$P\left\{|\xi_{n_1}| + \sum_{k=1}^{\infty} |\xi_{n_{k+1}} - \xi_{n_k}| < \infty\right\} = 1.$$

We introduce the event

$$N = \left\{\omega \in \Omega: \; |\xi_{n_1}(\omega)| + \sum_{k=1}^{\infty} |\xi_{n_{k+1}}(\omega) - \xi_{n_k}(\omega)| = \infty\right\}$$

and define the random variable

$$\xi(\omega) = \begin{cases} \xi_{n_1}(\omega) + \sum_{k=1}^{\infty} \left(\xi_{n_{k+1}}(\omega) - \xi_{n_k}(\omega)\right), & \omega \notin N, \\ 0, & \omega \in N. \end{cases}$$

Then it is clear that the partial sums

$$\xi_{n_1} + \sum_{k=1}^{K-1} \left(\xi_{n_{k+1}} - \xi_{n_k}\right) = \xi_{n_K} \xrightarrow{a.s.} \xi \quad (K \to \infty). \;\square$$

Proof of the sufficiency part of Theorem 9. If $(\xi_n)$ is a sequence that is fundamental in probability, then by Lemma 2 there exist a subsequence $(\xi_{n_k})$ and a random variable $\xi$ such that $\xi_{n_k} \xrightarrow{a.s.} \xi$. In this case

$$P\{|\xi_n - \xi| > \varepsilon\} \le P\left\{|\xi_n - \xi_{n_k}| > \frac{\varepsilon}{2}\right\} + P\left\{|\xi_{n_k} - \xi| > \frac{\varepsilon}{2}\right\} \to 0 \quad (n, n_k \to \infty),$$

because the first probability on the right-hand side tends to zero by the fundamentality in probability of the sequence $(\xi_n)$, and the second tends to zero due to the fact that $\xi_{n_k} \xrightarrow{a.s.} \xi$ implies $\xi_{n_k} \xrightarrow{P} \xi$.
So $\xi_n \xrightarrow{P} \xi$, as required. □

Theorem 10 (Cauchy's criterion for convergence with probability 1). In order that a sequence of random variables $\xi_n$, $n = 1, 2, \ldots$, converge to a random variable $\xi$ with probability 1 (almost surely, almost everywhere), it is necessary and sufficient that this sequence be fundamental with probability 1.

Proof. Necessity. If $\xi_n \xrightarrow{a.s.} \xi$, then using the inequality

$$\sup_{k \ge n,\; l \ge n} |\xi_k - \xi_l| \le \sup_{k \ge n} |\xi_k - \xi| + \sup_{l \ge n} |\xi_l - \xi|,$$

we can write

$$P\left\{\sup_{k \ge n,\; l \ge n} |\xi_k - \xi_l| > \varepsilon\right\} \le P\left\{\sup_{k \ge n} |\xi_k - \xi| > \frac{\varepsilon}{2}\right\} + P\left\{\sup_{l \ge n} |\xi_l - \xi| > \frac{\varepsilon}{2}\right\}.$$

Further, by the criterion for almost sure convergence (Theorem 2 of §1; see formula (2′)), the probabilities on the right-hand side tend to zero, and this means (by Theorem 8) that the sequence $(\xi_n)$ is fundamental with probability 1.

Sufficiency. Let $(\xi_n)$ be a sequence that is fundamental with probability 1. Introduce the event $N = \{\omega \in \Omega: (\xi_n(\omega)) \text{ is not a fundamental numerical sequence}\}$; then $P(N) = 0$. For each $\omega \in \Omega \setminus N$ the numerical sequence $\xi_n(\omega)$, $n = 1, 2, \ldots$, is fundamental, and by the Cauchy criterion for numerical sequences, for each such $\omega$ there exists the limit $\lim_{n \to \infty} \xi_n(\omega)$. Define the random variable $\xi(\omega)$ as follows:

$$\xi(\omega) = \begin{cases} \lim_{n \to \infty} \xi_n(\omega), & \omega \in \Omega \setminus N, \\ 0, & \omega \in N. \end{cases}$$

Then by construction $\xi_n \xrightarrow{a.s.} \xi$. □
P
a.s.
Lr
1. Prove that if [ n o [ and [ n oK , then [ and K are equivalent random variables, i.е. P ^Z : [ Z z K Z ` 0 . 2. Prove that if [ n o [ and [ n o K , r ! 0 , then P ^[ K` 1 . P
P
3. Let [ n o [ , [ n oK and [ ~ K ( [ and K equivalent random variables).
319
Prove that for any H ! 0 , P ^ [ n Kn ! H ` o 0 , n o f . P
P
4. Prove that if [ n o [ , K n o K , then P
a[ n bK n o a[ bK ( a , b – constants), P
[n o [ , P
[ nKn o [K .
If under the conditions of this task, convergence in probability is replaced by almost sure (a.s.) convergence, are the statements of the task still true? P
5. Show that convergence [ n [ o 0 implies convergence [ n2 o[ 2 . 2
P
6. For a sequence of random variables [1 , [ 2 ,... we prove the following statements: P 1 P 1 o ; а) If [ n o a z 0 , then [n a a.s.
b) If [ n o a ≠ 0, then
1
[n
a.s.
o
1 . a weak
weak
7. Is the statement [ n o [ ฺ [ n [ o 0 ? weak
P
8. Prove that if [ n o [ , K n o 0 , then weak
а) [ n K n o [ ; P
b) [ nK n o 0 . a.s.
g x from convergence [ n o [ follows the
9. Prove that for any continuous function g a.s.
convergence g [ n o g [ . 10. Prove that for any continuous bounded function g
P
g x from convergence [ n o [
Lr
follows the convergence g [ n o g [ , r ! 0 . 11. Prove that if a sequence of random variables [1 , [ 2 ,... and a random variable [ satisfy a.s.
f
condition ¦ M [ k [ 2 f , then [ n o [ . k 1
Direction. Notice P ^ [n [ ! H ` d
M [ n [
2
and apply statement 3. H2 12. Suppose that a sequence of random variables [ n , n 1,2,... , for some r ! 0 satisfies the f
r
condition ¦ M [ n f . n 1
320
a.s.
Show that then [ n o [ . 13. Let [ n , n 1,2,... , be a sequence of independent identically distributed random variables. Prove that the following statements are true:
M [1 f ¦ P^[1 ! H n` f f
n 1
[
f
¦ P ® n1 ¯
n 1
½ !H¾f ¿
[n a.s. o0. n
14. Let [ n , n 1,2,... , be a sequence of independent identically distributed random variables. Prove that then, in order that only a finite number of events An
^[
n
t n
`
occur with
probability 1, it is necessary and sufficient that the variance of the random variables be finite. 15. [1 , [ 2 ,... is a sequence of random variables with M[ i a , D[ i V 2 f and cov[ i , [ j 0 ( i z j ). Show that then the following assertion is true:
[1 [2 ... [n2 P ®lim nof n2 ¯
½ a¾ 1 . ¿
Direction. Find variances of random variables [1 [ 2 ... [ n 2 n 2 , then apply the Chebyshev inequality and use the result of Assertion 3. 16. [1 , [ 2 ,... are sequences of random variables with cov[ i , [ j 0 ( i z j ). We define a new sequence of random variables:
Kn
max
n2 k d n 1
2
[n2 1 [n2 2 ... [k k n2 a ,
Show that then the sequence
Kn n2
M[ i
n
a , D[ i
V2 f
and
1,2,... .
a.s. converges to zero.
Direction. Find the variance D [ n2 1 [ n2 2 ... [ k , then taking n
P ^max ^K1 ,K2 ,...,Kn ` ! H ` d ¦ P ^Ki ! H ` , i 1
apply Chebyshev's inequality and use the result of Assertion 3. 17. Let [ n , n 1,2,... , be a sequence of independent identically distributed random variables with M [1 0 , M [12 1 . Show that then there is a weak convergence §[ [ [ · max ¨ 1 , 2 ,..., n ¸ o 0 , n o f . n¹ © n n
321
18. Show that if [ defined as [ ,K M [K .
l.i.m .[ n , K n of
l.i.m .Kn , then [n ,Kn o [ ,K , where the scalar product is n of
19. Let [1 , [ 2 ,... be a sequence of random variables, Q1 ,Q 2 ,... be a sequence independent of [1 , [ 2 ,... integral random variables. Prove that then the following assertions hold: P
P
P
а) Q n o f , [ n o [ [Q n o [ ; P
b) Q n o f , [ n o [ (weak.) [Q n o [ (weak.) Direction. Use the following inequalities:
^
P [Q n [ t H
^
`
f
¦ P ^ [k [
`
P [Q n d x
k 1
f
t H ` P ^Q n
¦ P^[ k d x`P^Q n k `.
k 1
322
k` ,
Chapter VII. THE LAWS OF LARGE NUMBERS

§1. The weak law of large numbers

Definition 1. Let $\xi_1, \xi_2, \ldots$ be a sequence of random variables and let $M\xi_1, M\xi_2, \ldots$ be the sequence of their expectations (respectively). If for any $\varepsilon > 0$, as $n \to \infty$, the convergence in probability

$$P\left\{\left|\frac{\xi_1 + \xi_2 + \ldots + \xi_n}{n} - \frac{M\xi_1 + M\xi_2 + \ldots + M\xi_n}{n}\right| > \varepsilon\right\} \to 0 \qquad (1)$$

takes place, then we say that the sequence of random variables $\xi_1, \xi_2, \ldots$ obeys the (weak) law of large numbers (or: the law of large numbers can be applied to this sequence).

We introduce the notation:

$$S_n = \xi_1 + \xi_2 + \ldots + \xi_n, \qquad \overline{S}_n = \frac{S_n}{n}.$$

Then relation (1) means the convergence in probability

$$\overline{S}_n - M\overline{S}_n \xrightarrow{P} 0 \quad (n \to \infty). \qquad (2)$$

It is clear that convergence (2) is equivalent to the convergence

$$P\left\{\left|\overline{S}_n - M\overline{S}_n\right| \le \varepsilon\right\} \to 1 \quad (n \to \infty).$$

Theorem 1 (Chebyshev's theorem). If $\xi_1, \xi_2, \ldots$ is a sequence of pairwise independent random variables and the variances of the terms of the sequence are uniformly bounded, i.e. $D\xi_i \le c < \infty$, $i = 1, 2, \ldots$, then this sequence obeys the law of large numbers.
Remark 1. Convergences (1) and (2), as already noted, are convergences in probability. In what follows we prove (see §2) that under certain conditions on the sequence of random variables $\xi_1, \xi_2, \ldots$ convergence with probability 1 also takes place; in this case it is said that the sequence of random variables obeys the strong law of large numbers. Since convergence with probability 1 implies convergence in probability, the law of large numbers, being weaker than the strong law of large numbers, is called the weak law of large numbers. In what follows, as is generally accepted, the weak law of large numbers will briefly be called the law of large numbers.

Proof. By the Chebyshev inequality, for any $\varepsilon > 0$

$$P\left\{\left|\overline{S}_n - M\overline{S}_n\right| > \varepsilon\right\} \le \frac{D\overline{S}_n}{\varepsilon^2} = \frac{1}{n^2\varepsilon^2}\sum_{k=1}^{n} D\xi_k \le \frac{c}{n\varepsilon^2} \to 0 \quad (n \to \infty),$$

which was to be proved. (We used the relation

$$D\overline{S}_n = D\left(\frac{1}{n}\sum_{i=1}^{n}\xi_i\right) = \frac{1}{n^2}\sum_{i,j=1}^{n} \operatorname{cov}(\xi_i, \xi_j) = \frac{1}{n^2}\sum_{i=1}^{n} D\xi_i,$$

since, due to the pairwise independence of $\xi_i, \xi_j$, $\operatorname{cov}(\xi_i, \xi_j) = 0$ for $i \ne j$ and $\operatorname{cov}(\xi_i, \xi_i) = D\xi_i$.) □

Corollary 1. If $\xi_1, \xi_2, \ldots$ is a sequence of pairwise independent, identically distributed random variables and $D\xi_n = \sigma^2 < \infty$, then this sequence obeys the law of large numbers.

Proof. In this case

$$D\overline{S}_n = \frac{1}{n^2}\sum_{i=1}^{n} D\xi_i = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n} \to 0 \quad (n \to \infty). \;\square$$
Corollary 2. Let $\mu_n$ be the number of "successes" in a sequence of $n$ independent Bernoulli trials with probability of success $p$. Then $\dfrac{\mu_n}{n} \xrightarrow{P} p$.

Proof. Introduce a sequence of random variables $\xi_1, \xi_2, \ldots$ as follows: $\xi_i = 1$ if a success occurred in the $i$-th trial, and $\xi_i = 0$ if a failure occurred in the $i$-th trial. Then

$$\mu_n = \xi_1 + \xi_2 + \ldots + \xi_n, \qquad M\mu_n = np, \qquad D\mu_n = np(1 - p).$$
Now the assertion of Corollary 2 follows from Corollary 1. □

Corollary 3. If for a sequence of pairwise independent random variables $\xi_1, \xi_2, \ldots$

$$M\xi_1 = M\xi_2 = \ldots = M\xi_n = \ldots = a, \qquad D\xi_1 \le c, \ldots, D\xi_n \le c, \ldots$$

(the variances are uniformly bounded), then for any $\varepsilon > 0$

$$P\left\{\left|\frac{\xi_1 + \xi_2 + \ldots + \xi_n}{n} - a\right| < \varepsilon\right\} \to 1 \quad (n \to \infty). \qquad (3)$$

This corollary is a special case of Chebyshev's theorem. We note that it can be considered as a justification of the arithmetic-mean rule often used in measurement practice.

Corollary 4. If in the sequence of random variables $\xi_1, \xi_2, \ldots$ each random variable $\xi_n$ ($n = 1, 2, \ldots$) is independent of the subsequent ones $\xi_{n+1}, \xi_{n+2}, \ldots$, and the condition

$$\frac{1}{n^2}\sum_{i=1}^{n} D\xi_i \to 0 \quad (n \to \infty)$$

holds, then such a sequence obeys the law of large numbers.

The proof of the corollary is immediately obtained from the Chebyshev inequality and the relation

$$D\left(\sum_{i=1}^{n}\xi_i\right) = D\xi_1 + D\left(\sum_{i=2}^{n}\xi_i\right) = D\xi_1 + D\xi_2 + D\left(\sum_{i=3}^{n}\xi_i\right) = \ldots = D\xi_1 + D\xi_2 + \ldots + D\xi_n. \;\square$$

Theorem 2 (Poisson's theorem). If in a sequence of independent trials the probability of occurrence of an event $A$ in the $k$-th trial is equal to $p_k$, and the number of occurrences of the event $A$ in the first $n$ trials is equal to $\mu_n$, then

$$P\left\{\left|\frac{\mu_n}{n} - \frac{p_1 + p_2 + \ldots + p_n}{n}\right| > \varepsilon\right\} \to 0 \quad (n \to \infty).$$

Proof. Define the random variables $\xi_1, \xi_2, \ldots$ as in Corollary 2. Then

$$M\xi_k = p_k, \qquad D\xi_k = p_k q_k \le \frac{1}{4},$$
therefore this theorem (the Poisson theorem) is a special case of Chebyshev's theorem. □

Theorem 3 (Markov's theorem). If for a sequence of random variables $\xi_1, \xi_2, \ldots$ the condition

$$\frac{1}{n^2} D\left(\sum_{i=1}^{n}\xi_i\right) \to 0 \quad (n \to \infty) \qquad (4)$$

holds, then this sequence obeys the law of large numbers.

The proof of the theorem follows directly from the Chebyshev inequality (it suffices to note that in this case $D\overline{S}_n = \frac{1}{n^2} D\left(\sum_{i=1}^{n}\xi_i\right)$). □

We note that if, under the conditions of the last theorem, the given sequence of random variables consists of pairwise independent random variables, then condition (4) becomes the condition

$$\frac{1}{n^2}\sum_{i=1}^{n} D\xi_i \to 0 \quad (n \to \infty). \qquad (4')$$
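Corollary 2 (the Bernoulli case of the law of large numbers) is easy to observe in a simulation. The sketch below (an illustrative script of ours; the seed, $p$ and sample sizes are arbitrary choices) tracks the deviation of the relative frequency $\mu_n/n$ from $p$ as $n$ grows.

```python
import random

def bernoulli_mean(n, p, rng):
    """Relative frequency mu_n / n of successes in n Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n)) / n

rng = random.Random(42)
p = 0.3
devs = [abs(bernoulli_mean(n, p, rng) - p) for n in (100, 10_000, 1_000_000)]
print([round(d, 4) for d in devs])
```

By Chebyshev's inequality the deviation is of order $\sqrt{p(1-p)/n}$, so it shrinks roughly tenfold for every hundredfold increase in $n$.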
Remark 1. In Chebyshev's theorem the condition of pairwise independence of the random variables can be replaced by the condition of their pairwise uncorrelatedness (see the proof of the theorem). It is also easy to see that Chebyshev's theorem is a special case of Markov's theorem.

1.1. Necessary and sufficient condition for the law of large numbers

All the above theorems and corollaries give only sufficient conditions for the fulfillment of the law of large numbers for given sequences of random variables. Now we prove a theorem giving a necessary and sufficient condition.

Theorem 4. Let $\xi_1, \xi_2, \ldots$ be an arbitrary sequence of (arbitrarily dependent) random variables. Then, in order that this sequence obey the law of large numbers, it is necessary and sufficient that the condition

$$M\frac{\left(\sum_{k=1}^{n}(\xi_k - M\xi_k)\right)^2}{n^2 + \left(\sum_{k=1}^{n}(\xi_k - M\xi_k)\right)^2} \to 0 \quad (n \to \infty) \qquad (5)$$

be fulfilled.

Proof. First, using the notation introduced above, we rewrite condition (5) in the form

$$M\frac{\widetilde{S}_n^2}{1 + \widetilde{S}_n^2} \to 0 \quad (n \to \infty), \qquad (5')$$

where

$$\widetilde{S}_n = \overline{S}_n - M\overline{S}_n, \qquad \overline{S}_n = \frac{1}{n}\sum_{k=1}^{n}\xi_k, \qquad M\overline{S}_n = \frac{1}{n}\sum_{k=1}^{n} M\xi_k.$$

Sufficiency. Suppose that (5′) holds. Then for any $\varepsilon > 0$

$$P\left\{\left|\frac{\xi_1 + \ldots + \xi_n}{n} - \frac{M\xi_1 + \ldots + M\xi_n}{n}\right| > \varepsilon\right\} = P\{|\widetilde{S}_n| > \varepsilon\} = M I_{\{|\widetilde{S}_n| > \varepsilon\}} =$$
$$= M\left[\frac{1 + \widetilde{S}_n^2}{\widetilde{S}_n^2}\cdot\frac{\widetilde{S}_n^2}{1 + \widetilde{S}_n^2}\, I_{\{|\widetilde{S}_n| > \varepsilon\}}\right] \le \frac{1 + \varepsilon^2}{\varepsilon^2}\, M\left[\frac{\widetilde{S}_n^2}{1 + \widetilde{S}_n^2}\, I_{\{|\widetilde{S}_n| > \varepsilon\}}\right] \le \frac{1 + \varepsilon^2}{\varepsilon^2}\, M\frac{\widetilde{S}_n^2}{1 + \widetilde{S}_n^2} \to 0 \quad (n \to \infty)$$

(here we used the fact that the function $x \mapsto (1 + x^2)/x^2$ is decreasing in $|x|$, so on the event $\{|\widetilde{S}_n| > \varepsilon\}$ it does not exceed $(1 + \varepsilon^2)/\varepsilon^2$); i.e. the sequence $\xi_1, \xi_2, \ldots$ obeys the law of large numbers.

Necessity. We can write:

$$M\frac{\widetilde{S}_n^2}{1 + \widetilde{S}_n^2} = M\left[\frac{\widetilde{S}_n^2}{1 + \widetilde{S}_n^2}\, I_{\{|\widetilde{S}_n| < \varepsilon\}}\right] + M\left[\frac{\widetilde{S}_n^2}{1 + \widetilde{S}_n^2}\, I_{\{|\widetilde{S}_n| \ge \varepsilon\}}\right] \le \varepsilon^2 + P\{|\widetilde{S}_n| \ge \varepsilon\},$$

since $\dfrac{\widetilde{S}_n^2}{1 + \widetilde{S}_n^2} \le \widetilde{S}_n^2$ and $\dfrac{\widetilde{S}_n^2}{1 + \widetilde{S}_n^2} \le 1$. So

$$0 \le M\frac{\widetilde{S}_n^2}{1 + \widetilde{S}_n^2} \le \varepsilon^2 + P\{|\widetilde{S}_n| \ge \varepsilon\}.$$

But $\widetilde{S}_n \xrightarrow{P} 0$ ($n \to \infty$), and $\varepsilon > 0$ is an arbitrary positive number; therefore (5′) holds, and hence (5) holds too. □

Remark 2. It is easy to see that all the sufficient conditions above are consequences of the theorem just proved. Indeed, for any $n$ and $\xi_k$,

$$\frac{\widetilde{S}_n^2}{1 + \widetilde{S}_n^2} \le \widetilde{S}_n^2 = \left[\frac{1}{n}\sum_{k=1}^{n}(\xi_k - M\xi_k)\right]^2;$$

therefore, in the cases where the variances exist,

$$M\frac{\widetilde{S}_n^2}{1 + \widetilde{S}_n^2} \le \frac{1}{n^2}\, D\left(\sum_{k=1}^{n}\xi_k\right).$$

From this it follows that if the conditions of Markov's theorem are satisfied, then condition (5) is satisfied, so the sequence of random variables obeys the law of large numbers. If $\xi_1, \xi_2, \ldots$ is a sequence of independent random variables, then condition (5) is equivalent to the condition

$$\sum_{k=1}^{n} M\frac{(\xi_k - M\xi_k)^2}{n^2 + (\xi_k - M\xi_k)^2} \to 0 \quad (n \to \infty).$$
Examples

Example 1 (Bernstein's theorem). If for a sequence of random variables $\xi_1, \xi_2, \ldots$ the variances are uniformly bounded ($D\xi_n \le c < \infty$) and the correlation coefficients of the random variables $\xi_i, \xi_j$ ($i \ne j$),

$$\rho_{ij} = \frac{\operatorname{cov}(\xi_i, \xi_j)}{\sqrt{D\xi_i\, D\xi_j}},$$

uniformly converge to zero as $|j - i| \to \infty$, then this sequence obeys the law of large numbers.

Solution. Let us write the variance of the sum in the form

$$D\left(\sum_{i=1}^{n}\xi_i\right) = \sum_{i=1}^{n} D\xi_i + \sum_{i \ne j} \operatorname{cov}(\xi_i, \xi_j)$$

and estimate the quantity

$$\sum_{i \ne j} \operatorname{cov}(\xi_i, \xi_j) = \sum_{i \ne j} \rho_{ij}\sqrt{D\xi_i\, D\xi_j}$$

as follows. Considering $n$ sufficiently large, for any given number $\varepsilon > 0$ we choose $N$ (clearly $N \le n$) so that the inequality $|\rho_{ij}| < \varepsilon$ holds for $|j - i| > N$. Then, taking into account that $|\rho_{ij}| \le 1$ and using the condition of the example and the choice of the number $N$, we can write:

$$\left|\sum_{i \ne j} \rho_{ij}\sqrt{D\xi_i\, D\xi_j}\right| \le c\sum_{i \ne j} |\rho_{ij}| = c\sum_{|i-j| > N} |\rho_{ij}| + c\sum_{0 < |i-j| \le N} |\rho_{ij}| \le c\varepsilon n^2 + 2cn(N + 1).$$

Hence,

$$\frac{1}{n^2}\, D\left(\sum_{i=1}^{n}\xi_i\right) \le \frac{nc + c\varepsilon n^2 + 2cn(N + 1)}{n^2} \to c\varepsilon \quad (n \to \infty),$$

and since $\varepsilon > 0$ is arbitrary, $\frac{1}{n^2} D\left(\sum_{i=1}^{n}\xi_i\right) \to 0$; by Theorem 3 (Markov's theorem) this sequence obeys the law of large numbers.

Example 2. Let $\xi_1, \xi_2, \ldots$ be a sequence of independent random variables with

$$P\{\xi_n = n^\alpha\} = P\{\xi_n = -n^\alpha\} = \frac{1}{2}.$$
Find out for which $\alpha$ this sequence obeys the law of large numbers.

Solution. $M\xi_n = 0$, $D\xi_n = n^{2\alpha}$. For the series with general term $\dfrac{D\xi_n}{n^2} = \dfrac{n^{2\alpha}}{n^2} = \dfrac{1}{n^{2-2\alpha}}$ to converge — and this is sufficient (Theorem 3) for the given sequence to obey the law of large numbers — it suffices that $2 - 2\alpha > 1$, i.e. $\alpha < \dfrac{1}{2}$.

Example 3. Suppose that for a sequence of random variables $\xi_1, \xi_2, \ldots$

$$M|\xi_i - M\xi_i| \le c < \infty, \quad i = 1, 2, \ldots,$$

and let $a_1, a_2, \ldots$ be a sequence of real numbers tending to zero. Show that then the law of large numbers can be applied to the sequence $a_1\xi_1, a_2\xi_2, \ldots$.

Solution. Taking into account that for any random variable $\eta$ and any $\varepsilon > 0$ the inequality

$$P\{|\eta - M\eta| > \varepsilon\} \le \frac{M|\eta - M\eta|}{\varepsilon}$$

always holds, we can write for any $\varepsilon > 0$:

$$P\left\{\left|\frac{1}{n}\sum_{i=1}^{n} a_i\xi_i - \frac{1}{n}\sum_{i=1}^{n} a_i M\xi_i\right| > \varepsilon\right\} \le \frac{1}{n\varepsilon}\, M\left|\sum_{i=1}^{n} a_i(\xi_i - M\xi_i)\right| \le$$
$$\le \frac{1}{n\varepsilon}\sum_{i=1}^{n} |a_i|\, M|\xi_i - M\xi_i| \le \frac{c}{n\varepsilon}\sum_{i=1}^{n} |a_i| \to 0 \quad (n \to \infty),$$

since from $a_n \to 0$, by Lemma 1′ (which will be proved in the next section), it follows that

$$\frac{1}{n}\sum_{i=1}^{n} |a_i| \to 0 \quad (n \to \infty).$$

So

$$\frac{a_1\xi_1 + a_2\xi_2 + \ldots + a_n\xi_n}{n} - \frac{a_1 M\xi_1 + a_2 M\xi_2 + \ldots + a_n M\xi_n}{n} \xrightarrow{P} 0 \quad (n \to \infty),$$

which means that the sequence $a_1\xi_1, a_2\xi_2, \ldots$ obeys the law of large numbers.
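The threshold $\alpha = \tfrac{1}{2}$ in Example 2 can be probed numerically (a hypothetical simulation of ours; seed and sample size are arbitrary): for $\alpha = 0.25 < \tfrac{1}{2}$ the averages $\overline{S}_n$ settle near $0$, while for $\alpha = 1.5$ the variance of $\overline{S}_n$ grows and the averages fluctuate on an ever-larger scale.

```python
import random

def mean_of_sequence(alpha, n, rng):
    """(xi_1 + ... + xi_n)/n for independent xi_k = +-k^alpha,
    each sign taken with probability 1/2."""
    s = 0.0
    for k in range(1, n + 1):
        s += k**alpha if rng.random() < 0.5 else -(k**alpha)
    return s / n

rng = random.Random(7)
small = abs(mean_of_sequence(0.25, 200_000, rng))  # alpha < 1/2: LLN applies
large = abs(mean_of_sequence(1.5, 200_000, rng))   # alpha > 1/2: no LLN
print(round(small, 4), round(large, 1))
```

For $\alpha = 0.25$ the standard deviation of $\overline{S}_n$ is of order $n^{-1/4}$, while for $\alpha = 1.5$ it is of order $n$, which is what the two printed magnitudes reflect.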
§2. The strong law of large numbers

Definition 1. Let $\xi_1, \xi_2, \ldots$ be a sequence of random variables and $M\xi_1, M\xi_2, \ldots$ the sequence of the corresponding mathematical expectations. If, as $n \to \infty$, the following convergence with probability 1 (i.e., almost sure convergence) holds:

$$P\left\{\frac{\xi_1 + \xi_2 + \ldots + \xi_n}{n} - \frac{M\xi_1 + M\xi_2 + \ldots + M\xi_n}{n} \to 0\right\} = 1, \qquad (1)$$

then we say that the sequence of random variables $\xi_1, \xi_2, \ldots$ obeys the strong law of large numbers (or: the strong law of large numbers can be applied to this sequence).

If, as in the preceding section, we introduce the notation

$$S_n = \xi_1 + \xi_2 + \ldots + \xi_n, \qquad \overline{S}_n = \frac{S_n}{n},$$

then relation (1), by the definition of almost sure (a.s.) convergence, is equivalent to the relations: as $n \to \infty$,

$$\frac{S_n - MS_n}{n} \to 0 \;\text{(a.s.)}, \qquad (1')$$

$$\overline{S}_n - M\overline{S}_n \to 0 \;\text{(a.s.)}. \qquad (1'')$$

Since the convergence $\overline{S}_n - M\overline{S}_n \to 0$ (a.s.) implies the convergence $\overline{S}_n - M\overline{S}_n \xrightarrow{P} 0$, we obtain that any sequence of random variables that obeys the strong law of large numbers also obeys the (weak) law of large numbers.
Theorem 1 (Cantelli). Let $\xi_1, \xi_2, \ldots$ be a sequence of independent random variables with finite fourth moments ($M\xi_n^4 < \infty$, $n = 1, 2, \ldots$) satisfying the following condition: there is a constant $c$ such that

$$M(\xi_n - M\xi_n)^4 \le c, \qquad n \ge 1.$$

Then this sequence obeys the strong law of large numbers.

Proof. Without loss of generality we may assume that $M\xi_n = 0$, $n = 1, 2, \ldots$ (otherwise we pass to the sequence $\xi_n' = \xi_n - M\xi_n$). Then, according to assertion 3 of Ch. VI, §1, it suffices to prove that for any $\varepsilon > 0$

$$\sum_{n=1}^{\infty} P\{|\overline{S}_n| > \varepsilon\} < \infty.$$

Using the following form of Chebyshev's inequality:

$$P\{|\xi| \ge \varepsilon\} \le \frac{M\xi^{2k}}{\varepsilon^{2k}} \qquad (k = 1, 2, \ldots),$$

we obtain

$$\sum_{n=1}^{\infty} P\{|\overline{S}_n| > \varepsilon\} \le \sum_{n=1}^{\infty} \frac{M\overline{S}_n^4}{\varepsilon^4}.$$

Therefore, to prove the theorem it suffices to prove the convergence of the series with general term $M\overline{S}_n^4$. We can write

$$S_n^4 = (\xi_1 + \xi_2 + \ldots + \xi_n)^4 = \sum_{i=1}^{n} \xi_i^4 + \frac{4!}{2!\,2!}\sum_{i<j} \xi_i^2 \xi_j^2 + \frac{4!}{2!\,1!\,1!}\sum_{\substack{j<k \\ i \ne j,\, i \ne k}} \xi_i^2 \xi_j \xi_k + 4!\sum_{i<j<k<l} \xi_i \xi_j \xi_k \xi_l + \frac{4!}{3!\,1!}\sum_{i \ne j} \xi_i^3 \xi_j.$$

Further, taking into account the independence of the random variables $\xi_1, \xi_2, \ldots$, the equalities $M\xi_i = 0$ ($i \le n$), the condition of the theorem and, along the way, the inequality $(M\xi_i^2)^2 \le M\xi_i^4$, we obtain

$$MS_n^4 = \sum_{i=1}^{n} M\xi_i^4 + 6\sum_{\substack{i,j=1 \\ i<j}}^{n} M\xi_i^2\, M\xi_j^2 \le nc + 6\sum_{\substack{i,j=1 \\ i<j}}^{n} \sqrt{M\xi_i^4\, M\xi_j^4} \le nc + 6C_n^2 c \le nc + 3n^2 c.$$

So

$$M\overline{S}_n^4 = \frac{1}{n^4}\, MS_n^4 \le \frac{nc + 3n^2 c}{n^4} \le \frac{4c}{n^2},$$

i.e., the series with general term $M\overline{S}_n^4$ converges,
as required. ∎

The requirement of finite fourth moments in Cantelli's theorem is too rigid, and in the Kolmogorov theorems proved below this condition is significantly weakened.

Theorem 2 (A.N. Kolmogorov). Let $\xi_1, \xi_2, \ldots$ be a sequence of independent random variables with finite second moments, and let the sequence of numbers $0 < b_n \uparrow \infty$ satisfy the condition

$$\sum_{n=1}^{\infty} \frac{D\xi_n}{b_n^2} < \infty. \quad (2)$$

Then for this sequence of random variables the convergence

$$\frac{S_n - MS_n}{b_n} \to 0 \ \text{(a.s.)} \quad (3)$$

holds. In the special case $b_n = n$, the condition

$$\sum_{n=1}^{\infty} \frac{D\xi_n}{n^2} < \infty \quad (2')$$

implies the convergence

$$\frac{S_n - MS_n}{n} \to 0 \ \text{(a.s.)}. \quad (1')$$
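As a quick numerical sanity check of condition (2), with an invented variance sequence (not one from the text): if the independent $\xi_n$ had $D\xi_n = \sqrt{n}$ and we took $b_n = n$, the series in (2) becomes $\sum n^{-3/2}$, whose partial sums stabilize, so Theorem 2 would give $(S_n - MS_n)/n \to 0$ a.s.

```python
# Hedged sketch: partial sums of sum D(xi_n) / b_n^2 with the assumed choices
# D(xi_n) = sqrt(n) and b_n = n, i.e. the convergent series sum n^(-3/2).
def condition2_partial(N, var=lambda n: n ** 0.5, b=lambda n: n):
    return sum(var(n) / b(n) ** 2 for n in range(1, N + 1))

for N in (10 ** 3, 10 ** 5, 10 ** 6):
    print(N, condition2_partial(N))  # approaches zeta(3/2) ≈ 2.612
```

The point of the check is only that the partial sums stay bounded; a divergent choice such as $D\xi_n = n$ with $b_n = n$ would grow without bound instead.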
To prove this theorem and the following Theorem 3 we need an auxiliary lemma.

Lemma 1 (Toeplitz). Let $\{a_n\}_{n=1}^{\infty}$ be a sequence of nonnegative numbers, $b_n = \sum_{i=1}^{n} a_i$, $b_n > 0$ ($n \ge 1$), $b_n \uparrow \infty$ ($n \to \infty$), and let $\{x_n\}_{n=1}^{\infty}$ be a sequence converging to some $x$. Then, as $n \to \infty$,

$$\frac{1}{b_n}\sum_{j=1}^{n} a_j x_j \to x. \quad (4)$$

In the special case $a_n = 1$, $b_n = n$:

$$\frac{x_1 + x_2 + \ldots + x_n}{n} \to x. \quad (5)$$

Proof. Let $\varepsilon > 0$ and choose a number $n_0 = n_0(\varepsilon)$ such that $|x_n - x| < \dfrac{\varepsilon}{2}$ for all $n \ge n_0$ (since $x_n \to x$ as $n \to \infty$, this is always possible). Further, taking into account that $b_n \uparrow \infty$, one can choose $n_1 > n_0$ so that the inequality

$$\frac{1}{b_{n_1}}\sum_{j=1}^{n_0} a_j |x_j - x| < \frac{\varepsilon}{2}$$

holds. Then, for all $n > n_1$,

$$\left|\frac{1}{b_n}\sum_{j=1}^{n} a_j x_j - x\right| = \frac{1}{b_n}\left|\sum_{j=1}^{n} a_j x_j - \sum_{j=1}^{n} a_j x\right| \le \frac{1}{b_n}\sum_{j=1}^{n_0} a_j |x_j - x| + \frac{1}{b_n}\sum_{j=n_0+1}^{n} a_j |x_j - x| \le$$

$$\le \frac{1}{b_{n_1}}\sum_{j=1}^{n_0} a_j |x_j - x| + \frac{\varepsilon}{2}\cdot\frac{1}{b_n}\sum_{j=n_0+1}^{n} a_j \le \frac{\varepsilon}{2} + \frac{\varepsilon}{2}\cdot\frac{b_n - b_{n_0}}{b_n} \le \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$

Since $\varepsilon > 0$ is an arbitrary number, the lemma follows from the relations just written. ∎

We formulate once again the important special case of Lemma 1 as a separate lemma.

Lemma 1′. If $\{x_n\}_{n=1}^{\infty}$ is a convergent sequence and $x_n \to x$ as $n \to \infty$, then
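A small numerical illustration of the Toeplitz lemma, with an example sequence and weights of our own choosing: for $x_n = 2 + 1/n \to 2$ and $a_n = n$ (so that $b_n = n(n+1)/2 \uparrow \infty$), the weighted averages $(1/b_n)\sum_{j\le n} a_j x_j$ tend to the same limit $2$; in fact here they equal $2 + 2/(n+1)$ exactly.

```python
# Toeplitz averaging demo (example sequence and weights of our choosing):
# x_j = 2 + 1/j converges to 2; weights a_j = j, so b_n = n(n+1)/2.
def toeplitz_mean(n):
    b_n = n * (n + 1) / 2
    return sum(j * (2 + 1 / j) for j in range(1, n + 1)) / b_n

print(toeplitz_mean(10))       # 2 + 2/11 ≈ 2.1818
print(toeplitz_mean(100_000))  # 2 + 2/100001 ≈ 2.00002
```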
$$\frac{x_1 + x_2 + \ldots + x_n}{n} \to x \qquad (n \to \infty).$$

Lemma 2 (Kronecker). Suppose that $0 < b_n \uparrow \infty$ and that the series $\sum_{n=1}^{\infty} x_n$ converges. Then

$$\frac{1}{b_n}\sum_{j=1}^{n} b_j x_j \to 0, \qquad n \to \infty. \quad (6)$$

In the special case $b_n = n$, $x_n = \dfrac{y_n}{n}$: if $\sum_{n=1}^{\infty} \dfrac{y_n}{n}$ converges, then

$$\frac{y_1 + y_2 + \ldots + y_n}{n} \to 0, \qquad n \to \infty. \quad (7)$$

Proof. Let us introduce the notations $b_0 = 0$, $S_0 = 0$, $S_n = \sum_{j=1}^{n} x_j$. In this case we can write (summation by parts):

$$\sum_{j=1}^{n} b_j x_j = \sum_{j=1}^{n} b_j (S_j - S_{j-1}) = \sum_{j=1}^{n} b_j S_j - \sum_{j=1}^{n} b_j S_{j-1} = \sum_{j=1}^{n} b_j S_j - \sum_{j=1}^{n} (b_j - b_{j-1}) S_{j-1} - \sum_{j=1}^{n} b_{j-1} S_{j-1} = b_n S_n - \sum_{j=1}^{n} (b_j - b_{j-1}) S_{j-1}.$$

It follows that

$$\frac{1}{b_n}\sum_{j=1}^{n} b_j x_j = S_n - \frac{1}{b_n}\sum_{j=1}^{n} (b_j - b_{j-1}) S_{j-1}.$$

If we denote $a_j = b_j - b_{j-1}$, then $a_j \ge 0$ and $\sum_{j=1}^{n} a_j = \sum_{j=1}^{n} (b_j - b_{j-1}) = b_n$; moreover, by the condition, the sequence $S_n = \sum_{j=1}^{n} x_j$ converges.

Therefore, if $S_n \to x$ ($n \to \infty$), then, by Lemma 1,

$$\frac{1}{b_n}\sum_{j=1}^{n} a_j S_{j-1} \to x,$$

which means that

$$\frac{1}{b_n}\sum_{j=1}^{n} b_j x_j \to x - x = 0 \qquad (n \to \infty). \ ∎$$

As in the case of the previous lemma, we formulate separately an important special case of the proved Lemma 2.

Lemma 2′. If for the sequence of numbers $\{y_n\}_{n=1}^{\infty}$ the series $\sum_{n=1}^{\infty} \dfrac{y_n}{n}$ converges, then

$$\frac{y_1 + y_2 + \ldots + y_n}{n} \to 0 \qquad (n \to \infty).$$
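A numerical illustration of Kronecker's lemma with a series of our own choosing: $x_n = (-1)^{n+1}/n$ converges (to $\ln 2$), and with $b_n = n$ the averages $(1/n)\sum_{j \le n} j x_j = (1/n)\sum_{j \le n} (-1)^{j+1}$ equal $1/n$ or $0$, hence tend to $0$.

```python
# Kronecker's lemma demo (example series of our choosing): x_j = (-1)^(j+1)/j.
def kronecker_avg(n):
    x = [(-1) ** (j + 1) / j for j in range(1, n + 1)]
    return sum(j * x_j for j, x_j in zip(range(1, n + 1), x)) / n

print(kronecker_avg(999))   # ≈ 1/999
print(kronecker_avg(1000))  # ≈ 0
```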
Proof of Theorem 2. We can write

$$\frac{S_n - MS_n}{b_n} = \frac{1}{b_n}\sum_{k=1}^{n} b_k \left(\frac{\xi_k - M\xi_k}{b_k}\right).$$

Then, by Kronecker's lemma, in order for the convergence $\dfrac{S_n - MS_n}{b_n} \to 0$ (a.s.) to hold as $n \to \infty$, it is sufficient that the series

$$\sum_{k=1}^{\infty} \frac{\xi_k - M\xi_k}{b_k}$$

converge (a.s.). But according to the Kolmogorov–Khinchin theorem, which will be proved in the next chapter (see Ch. VIII, §2, p. 2.1), the condition

$$\sum_{k=1}^{\infty} D\left(\frac{\xi_k - M\xi_k}{b_k}\right) = \sum_{k=1}^{\infty} \frac{D\xi_k}{b_k^2} < \infty$$

(i.e., condition (2)) is a sufficient condition for the (a.s.) convergence of the series

$$\sum_{k=1}^{\infty} \frac{\xi_k - M\xi_k}{b_k},$$

hence for the (a.s.) convergence in (3). ∎

If the random variables $\xi_1, \xi_2, \ldots$ are not only independent but also identically distributed, then the requirement of finite second moments in Theorem 2 is superfluous: it suffices that the first absolute moment be finite. This fact is established by the following Kolmogorov theorem.

Theorem 3 (Kolmogorov). Let $\xi_1, \xi_2, \ldots$ be a sequence of independent identically distributed random variables with finite first absolute moment ($M|\xi_1| < \infty$). Then this sequence obeys the strong law of large numbers: as $n \to \infty$,

$$\frac{\xi_1 + \xi_2 + \ldots + \xi_n}{n} \to a \ \text{(a.s.)}, \quad (8)$$

where $a = M\xi_1$.
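A Monte Carlo sketch of Theorem 3, with an illustrative distribution of our choosing: for i.i.d. exponential variables with rate 1 we have $M|\xi_1| = 1 < \infty$, so the sample mean of one long trajectory should approach $a = M\xi_1 = 1$.

```python
import random

# SLLN sketch (illustrative i.i.d. Exp(1) variables, for which M xi_1 = 1).
random.seed(2)
n = 200_000
mean = sum(random.expovariate(1.0) for _ in range(n)) / n
print(mean)  # close to 1
```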
To prove the theorem we need the following auxiliary lemma.

Lemma 3. Let $\xi$ be a random variable. Then, in order for the mathematical expectation of this random variable to be finite, it is necessary and sufficient that the series $\sum_{n=1}^{\infty} P\{|\xi| \ge n\}$ converge:

$$M|\xi| < \infty \iff \sum_{n=1}^{\infty} P\{|\xi| \ge n\} < \infty. \quad (9)$$

Proof. First of all, we note that, by the definition of the mathematical expectation, $M\xi$ is finite if and only if $M|\xi| < \infty$ (Ch. IV, §1). Therefore, under the conditions of the lemma, the random variable $\xi$ may from the very beginning be considered nonnegative. If we denote the integer part of the (nonnegative) random variable $\xi$ by $[\xi]$, then this integer part can be written in the form

$$[\xi] = \sum_{k=0}^{\infty} k\, I_{\{k \le \xi < k+1\}}.$$

Using this representation, we get that

$$M[\xi] = \sum_{k=0}^{\infty} k\, P\{k \le \xi < k+1\}.$$

But the mathematical expectation of any nonnegative integer-valued random variable $\eta$ equals $M\eta = \sum_{k=1}^{\infty} P\{\eta \ge k\}$ ([6], problem 3.2.82); in addition, $\{[\xi] \ge k\} = \{\xi \ge k\}$. Given these circumstances, we can write:

$$M\xi = \sum_{k=0}^{\infty} M\xi\, I_{\{k \le \xi < k+1\}} \le \sum_{k=0}^{\infty} (k+1)\, P\{k \le \xi < k+1\} = \sum_{k=0}^{\infty} k\, P\{k \le \xi < k+1\} + \sum_{k=0}^{\infty} P\{k \le \xi < k+1\} =$$

$$= M[\xi] + 1 = \sum_{k=1}^{\infty} P\{[\xi] \ge k\} + 1 = \sum_{k=1}^{\infty} P\{\xi \ge k\} + 1.$$

Similarly,

$$M\xi = \sum_{k=0}^{\infty} M\xi\, I_{\{k \le \xi < k+1\}} \ge \sum_{k=0}^{\infty} k\, P\{k \le \xi < k+1\} = \sum_{k=1}^{\infty} P\{\xi \ge k\}.$$

Thus, for a nonnegative random variable $\xi$ we always have

$$\sum_{k=1}^{\infty} P\{\xi \ge k\} \le M\xi \le \sum_{k=1}^{\infty} P\{\xi \ge k\} + 1. \quad (10)$$
From the last inequalities we obtain that the convergence of the series in (9) is necessary and sufficient for $M\xi < \infty$ (i.e., for $M|\xi| < \infty$). ∎

Proof of Theorem 3. Since the $\xi_n$ are identically distributed, $P\{|\xi_n| \ge n\} = P\{|\xi_1| \ge n\}$, and from Lemma 3 we get that $M|\xi_1| < \infty$ holds if and only if

$$\sum_{n=1}^{\infty} P\{|\xi_n| \ge n\} = \sum_{n=1}^{\infty} P\{|\xi_1| \ge n\} < \infty.$$

Further, since the last series converges, the Borel–Cantelli lemma gives

$$P\{|\xi_n| \ge n \ \text{infinitely often}\} = 0,$$

hence, with probability 1 only a finite number of the events $\{|\xi_n| \ge n\}$ occur. We introduce new random variables $\tilde{\xi}_n$ as follows:

$$\tilde{\xi}_n = \xi_n \ \text{if } |\xi_n| < n; \qquad \tilde{\xi}_n = 0 \ \text{otherwise}.$$

Without loss of generality, we assume that $M\xi_n = 0$. Then, since with probability 1 the summands $\xi_n$ and $\tilde{\xi}_n$ differ only for finitely many $n$, as $n \to \infty$

$$\frac{\xi_1 + \xi_2 + \ldots + \xi_n}{n} \to 0 \ \text{(a.s.)} \quad \text{if and only if} \quad \frac{\tilde{\xi}_1 + \tilde{\xi}_2 + \ldots + \tilde{\xi}_n}{n} \to 0 \ \text{(a.s.)}.$$

(Generally speaking, it is not necessary that $M\tilde{\xi}_n = 0$, but

$$M\tilde{\xi}_n = M\xi_n I_{\{|\xi_n| < n\}} = M\xi_1 I_{\{|\xi_1| < n\}} \to M\xi_1 = 0, \qquad n \to \infty.)$$

From this, applying Lemma 1′, we obtain that the convergence

$$\frac{1}{n}\sum_{k=1}^{n} M\tilde{\xi}_k \to 0, \qquad n \to \infty,$$

is fulfilled. Hence

$$\frac{\tilde{\xi}_1 + \tilde{\xi}_2 + \ldots + \tilde{\xi}_n}{n} \to 0 \ \text{(a.s.)} \quad \text{if and only if} \quad \frac{(\tilde{\xi}_1 - M\tilde{\xi}_1) + \ldots + (\tilde{\xi}_n - M\tilde{\xi}_n)}{n} \to 0 \ \text{(a.s.)}. \quad (11)$$

Let us denote $\eta_n = \tilde{\xi}_n - M\tilde{\xi}_n$. According to the Kronecker lemma, in order to establish (11) it is sufficient to show that the series $\sum_{n=1}^{\infty} \dfrac{\eta_n}{n}$ converges a.s. In its turn, according to Theorem 2, for this it suffices to show that the condition $M|\xi_1| < \infty$ ensures the convergence of the series $\sum_{n=1}^{\infty} \dfrac{D\eta_n}{n^2}$. This assertion follows from the following chain of equalities and inequalities:

$$\sum_{n=1}^{\infty} \frac{D\eta_n}{n^2} \le \sum_{n=1}^{\infty} \frac{M\tilde{\xi}_n^2}{n^2} = \sum_{n=1}^{\infty} \frac{1}{n^2}\, M\xi_n^2 I_{\{|\xi_n| < n\}} = \sum_{n=1}^{\infty} \frac{1}{n^2}\, M\xi_1^2 I_{\{|\xi_1| < n\}} =$$

$$= \sum_{n=1}^{\infty} \frac{1}{n^2}\sum_{k=1}^{n} M\left[\xi_1^2 I_{\{k-1 \le |\xi_1| < k\}}\right] = \sum_{k=1}^{\infty} M\left[\xi_1^2 I_{\{k-1 \le |\xi_1| < k\}}\right] \sum_{n \ge k} \frac{1}{n^2} \le$$

$$\le 2\sum_{k=1}^{\infty} \frac{1}{k}\, M\left[\xi_1^2 I_{\{k-1 \le |\xi_1| < k\}}\right] \le 2\sum_{k=1}^{\infty} M\left[|\xi_1|\, I_{\{k-1 \le |\xi_1| < k\}}\right] \le 2M|\xi_1| < \infty. \ ∎$$
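The sandwich (10) from Lemma 3 is easy to check numerically for a concrete distribution of our own choosing: for $\xi \sim \mathrm{Exp}(1)$ we have $M\xi = 1$ and $P\{\xi \ge k\} = e^{-k}$, so the tail series equals $1/(e-1) \approx 0.582$ and, together with its shift by 1, indeed brackets the mean.

```python
import math

# Check of the bounds (10) for xi ~ Exp(1): sum_k P{xi >= k} = sum_k e^(-k),
# truncated at k = 59 (the remaining tail is negligible).
tail_sum = sum(math.exp(-k) for k in range(1, 60))  # ≈ 1/(e - 1) ≈ 0.582
mean = 1.0  # M xi for Exp(1), known in closed form
print(tail_sum, mean, tail_sum + 1.0)
```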
Remark 1. The statement of the theorem admits a converse in the following sense: if for a sequence of independent identically distributed random variables $\xi_1, \xi_2, \ldots$, as $n \to \infty$,

$$\frac{\xi_1 + \xi_2 + \ldots + \xi_n}{n} \to c \ \text{(a.s.)},$$

where $c$ is a finite constant ($|c| < \infty$), then $M|\xi_1| < \infty$ and $M\xi_1 = c$.

Indeed, if $\dfrac{S_n}{n} \to c$ (a.s.), then

$$\frac{\xi_n}{n} = \frac{S_n}{n} - \frac{n-1}{n}\cdot\frac{S_{n-1}}{n-1} \to c - c = 0 \ \text{(a.s.)},$$

therefore

$$P\{|\xi_n| > n \ \text{for infinitely many } n\} = 0.$$

Then, by the second part of the Borel–Cantelli lemma (the events $\{|\xi_n| > n\}$ are independent), $\sum_{n=1}^{\infty} P\{|\xi_1| > n\} < \infty$ and, according to Lemma 3, $M|\xi_1| < \infty$. It then follows from the already proved Theorem 3 that $c = M\xi_1$.

This allows us to formulate Theorem 3 differently, in the following form.

Theorem 3′. Let $\xi_1, \xi_2, \ldots$ be a sequence of independent identically distributed random variables. Then, in order that, as $n \to \infty$,

$$\frac{\xi_1 + \xi_2 + \ldots + \xi_n}{n} \to a \ \text{(a.s.)},$$

it is necessary and sufficient that $M\xi_1$ exist, be finite, and equal $a$. If $M|\xi_1| = \infty$, then

$$P\left\{\limsup_{n \to \infty} \frac{|\xi_1 + \xi_2 + \ldots + \xi_n|}{n} = \infty\right\} = 1. \quad (12)$$

Proof. The first part of the theorem has already been proved above (see Theorem 3 and Remark 1). Let us prove the second part of the theorem. To do this, we take a positive number $c > 0$ and introduce the sequence of events

$$A_n = \{\omega : |\xi_n(\omega)| \ge cn\}.$$

Then

$$P(A_n) = \int_{|x| \ge cn} dF(x) = F(-cn) + 1 - F(cn),$$

where $F(x)$ is the distribution function of $\xi_1$. Hence

$$\sum_{n=1}^{\infty} P(A_n) \sim \frac{1}{c}\left[\int_{-\infty}^{0} F(x)\,dx + \int_{0}^{\infty} \bigl(1 - F(x)\bigr)\,dx\right].$$

Since for any distribution function $F(x)$ the relations

$$\int_{0}^{\infty} x\,dF(x) = \int_{0}^{\infty} \bigl(1 - F(x)\bigr)\,dx, \qquad -\int_{-\infty}^{0} x\,dF(x) = \int_{-\infty}^{0} F(x)\,dx$$

hold (prove this!), by the condition of the theorem

$$M|\xi_1| = \int_{-\infty}^{\infty} |x|\,dF(x) = -\int_{-\infty}^{0} x\,dF(x) + \int_{0}^{\infty} x\,dF(x) = \int_{-\infty}^{0} F(x)\,dx + \int_{0}^{\infty} \bigl(1 - F(x)\bigr)\,dx = \infty.$$

Thus,

$$\sum_{n=1}^{\infty} P(A_n) = \infty.$$
In our case the random variables $\xi_n$ (hence the events $A_n$) are independent; therefore, by the second part of the Borel–Cantelli lemma, for any $c > 0$

$$P\{|\xi_n| > cn \ \text{for infinitely many } n\} = 1,$$

and this easily implies (12). ∎

Theorem 4 (the strong law of large numbers for the Bernoulli scheme). In a sequence of independent Bernoulli trials with probability of success $p$, the number $\mu_n$ of successes satisfies

$$\frac{\mu_n}{n} \xrightarrow{a.s.} p \qquad (n \to \infty). \quad (13)$$

The proof of the theorem follows immediately from Theorem 3. For this it suffices to note that $\mu_n = \xi_1 + \xi_2 + \ldots + \xi_n$, where $\xi_i = 1$ if success occurs in the $i$-th trial, and $\xi_i = 0$ if failure occurs in the $i$-th trial. ∎

Example 4. If $\xi_1, \xi_2, \ldots$ is a sequence of independent identically distributed random variables with $M\xi_i = a$, $|a| < \infty$, then the sequence of random variables

$$\eta_n = \frac{\xi_1 + \xi_2 + \ldots + \xi_n}{n}$$

obeys the strong law of large numbers.

Solution. By Theorem 3,

$$\eta_n = \frac{\xi_1 + \xi_2 + \ldots + \xi_n}{n} \xrightarrow{a.s.} a.$$

Next, repeating almost word for word the proof of Lemma 1 (only taking $b_1 = 1$, $b_k = k$ and introducing the corresponding probabilistic changes), we get that the convergence

$$\frac{\eta_1 + \eta_2 + \ldots + \eta_n}{n} \xrightarrow{a.s.} a$$

holds.

Example 5. If $\xi_1, \xi_2, \ldots$ is a sequence of independent random variables, $|\xi_i| \le c < \infty$ ($i = 1, 2, \ldots$), $S_n = \xi_1 + \xi_2 + \ldots + \xi_n$, then

$$\frac{S_n - MS_n}{\sqrt{n}\,\ln n} \xrightarrow{a.s.} 0 \qquad (n \to \infty).$$

Solution. Under the conditions of Theorem 2 it suffices to take $b_n = \sqrt{n}\,\ln n$ and to notice that $D\xi_n \le c^2 < \infty$ and

$$\sum_{n=2}^{\infty} \frac{1}{n \ln^2 n} < \infty.$$
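A simulation sketch for Example 5, assuming the normalization $b_n = \sqrt{n}\,\ln n$ and an illustrative bounded distribution of our own choosing: $\xi_i$ uniform on $[-1, 1]$, so $|\xi_i| \le 1$ and $M\xi_i = 0$ (hence $MS_n = 0$).

```python
import math
import random

# Sketch for Example 5 (assumed normalization b_n = sqrt(n) * ln n; illustrative
# xi_i uniform on [-1, 1], bounded by c = 1, with M xi_i = 0 so that M S_n = 0).
random.seed(3)
n = 100_000
s_n = sum(random.uniform(-1.0, 1.0) for _ in range(n))
ratio = s_n / (math.sqrt(n) * math.log(n))
print(ratio)  # small, near 0
```

Here $S_n/\sqrt{n}$ stays of order 1 (by the central limit theorem), so the extra factor $\ln n$ in the denominator drives the ratio toward 0.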
§3. Tasks for independent work

1. Let $\xi_1, \xi_2, \ldots$ be a sequence of independent identically distributed random variables, $M|\xi_1| < \infty$. Another sequence of independent random variables $\theta_1, \theta_2, \ldots$ is also independent of the sequence $\xi_1, \xi_2, \ldots$, and $|\theta_n| \le 1$, $M\theta_n = 0$ ($n = 1, 2, \ldots$). Show that the strong law of large numbers can then be applied to the sequence $\theta_1\xi_1, \theta_2\xi_2, \ldots$, i.e., that as $n \to \infty$

$$\frac{\theta_1\xi_1 + \theta_2\xi_2 + \ldots + \theta_n\xi_n}{n} \xrightarrow{a.s.} 0.$$

2. Let $\xi_1, \xi_2, \ldots$ be a sequence of independent identically distributed random variables.

a) Show that if for some $\alpha$, $0 < \alpha < 1$, we have $M|\xi_1|^{\alpha} < \infty$, then the following convergence holds:

$$\frac{S_n}{n^{1/\alpha}} \xrightarrow{a.s.} 0 \qquad (n \to \infty);$$

b) Show that if for some $\beta$, $1 \le \beta < 2$, we have $M|\xi_1|^{\beta} < \infty$, then the following convergence holds as $n \to \infty$:

$$\frac{S_n - nM\xi_1}{n^{1/\beta}} \xrightarrow{a.s.} 0.$$

3. Let $\xi_1, \xi_2, \ldots$ be a sequence of independent identically distributed random variables,

$$P\left\{\xi_n = 2^{\,k - \ln k - 2\ln\ln k}\right\} = \frac{1}{2^k}, \qquad k = 1, 2, \ldots$$

Show that this sequence obeys the (weak) law of large numbers.

4. Let $\xi_1, \xi_2, \ldots$ be a sequence of independent random variables. Show that if for some $r \ge 1$

$$\sum_{n=1}^{\infty} \frac{M|\xi_n|^{2r}}{n^{r+1}} < \infty,$$

then the following convergence holds:

$$\frac{\xi_1 + \xi_2 + \ldots + \xi_n}{n} \xrightarrow{a.s.} 0 \qquad (n \to \infty).$$

5. Let $\xi_1, \xi_2, \ldots$ be a sequence of independent random variables,

$$P\{\xi_n = n\} = P\{\xi_n = -n\} = 2^{-n}, \qquad P\{\xi_n = 0\} = 1 - 2^{-n+1}.$$

Is it possible to apply the (weak) law of large numbers to this sequence?

6. Let $\xi_1, \xi_2, \ldots$ be a sequence of independent random variables,

$$P\{\xi_n = \sqrt{n}\} = P\{\xi_n = -\sqrt{n}\} = \frac{1}{2}.$$

Is it possible to apply the (weak) law of large numbers to this sequence?

7. Let $\xi_1, \xi_2, \ldots$ be a sequence of independent random variables,

$$P\{\xi_n = \sqrt{n}\} = P\{\xi_n = -\sqrt{n}\} = \frac{1}{2n}, \qquad P\{\xi_n = 0\} = 1 - \frac{1}{n}.$$

Is it possible to apply the (weak) law of large numbers to this sequence?

8. Let $\xi_1, \xi_2, \ldots$ be a sequence of independent random variables, $\xi_n \sim N(0,\, c\,n^{\alpha})$, where $c > 0$, $\alpha > 0$ are constants. What conditions must the constants satisfy in order for this sequence to obey the (weak) law of large numbers?

9. Show that if for the sequence of random variables $\xi_1, \xi_2, \ldots$

$$|\mathrm{cov}(\xi_k, \xi_l)| \le c < \infty, \qquad k, l = 1, 2, \ldots,$$

and $\mathrm{cov}(\xi_k, \xi_l) \to 0$ as $|k - l| \to \infty$, then this sequence obeys the law of large numbers.

10. Is the following assertion true: if the law of large numbers holds for a sequence of random variables $\xi_1, \xi_2, \ldots$, and $a_1, a_2, \ldots$ is a sequence of uniformly bounded nonnegative numbers, then the sequence of random variables $a_1\xi_1, a_2\xi_2, \ldots$ obeys the law of large numbers?

11. If in the previous task $\xi_1, \xi_2, \ldots$ is a sequence of independent identically distributed random variables, is the statement of task 10 true?

12. Let $\xi_1, \xi_2, \ldots$ be a sequence of independent identically distributed random variables, $D\xi_i = c < \infty$, and let $a_1, a_2, \ldots$ be a nondecreasing sequence of positive numbers. Prove that then, for the law of large numbers to be applicable to the sequence $a_1\xi_1, a_2\xi_2, \ldots$, it is necessary and sufficient that the condition $\dfrac{a_n}{\sqrt{n}} \to 0$ ($n \to \infty$) hold.

13. Prove that if $\xi_1, \xi_2, \ldots$ is a sequence of independent random variables with $M\xi_n = 0$, $\dfrac{|\xi_n|}{n} \le c < \infty$ ($n = 1, 2, \ldots$) and, as $n \to \infty$,

$$\frac{1}{n}\sum_{i=1}^{n} \xi_i \xrightarrow{a.s.} 0,$$

then for any $\varepsilon > 0$ the following series converges:

$$\sum_{n=1}^{\infty} \frac{D\xi_n}{n^{2+\varepsilon}} < \infty.$$
ANSWERS TO THE TASKS OF INDEPENDENT WORK
Chapter I §1, item 1.3.1. 1. a) A B means that the winners are awarded a simple bonus or cash prize or both; b) ABC means that the winner receives all three types of premium; c) AB \ C means that the winner was simultaneously awarded a simple bonus and a cash prize, but was not awarded a medal. 2. B \ A B A , A B A A B A A B A B A . 3. In all cases, a draw. 5. ABC means that both husband and wife are aged more than thirty years, and the husband is older than the wife; A \ AB AB means that the husband more than thirty, but he is younger than his wife; A B C means that the husband and wife are aged more than thirty, and the husband is younger than his wife. 9. а) A1 A2 A3 ; b) A1 A2 A3 ; c) A1 A2 A3 ; d) A1 A2 A3 ; e) A1 A2 A2 A3 A1 A3 ; f) A1 A2 A3 A1 A2 A3 A1 A2 A3 ; g) A1 A2 A3 A1 A2 A3 A1 A2 A3 ; k) A1 A2 A3 ; l) A1 A2 A3 \ A1 A2 A3 . 10. а) Bm
f
f
A1 A2 Am1 Am ; b) ¦ Bn
An ;
n 1
n 1
§1, item 1.3.2. 1.а) 1 / n; b) 1 / n(n 1). 2. P( A) ! P( B). 3. P( A) P( B). 4. а) p2
0,01, p1
0,27, p0
0,72;
b) p1 0,001, p2 0,063, p3 k repetitions, k 0,1,2,3 ). 5. pr
0,432, p4
0,504; ( pk is the probability that there will be
(10) r 10 r , p10  0,0003598.
6. а) Cn2 n!n n , nCnn12 C2nn1 ; b) n!n n , 1 C2nn1 7. If N 10k l ,then PN 2k N (l 0); PN (2k 1) N (1 d l 9); PN
2(k 1) N (l
9); PN o 1 ( N o f) . 5
8. 1/ 2; 1/ 3. 9. e 1 . 10. а) 1 / 6 n1 ; b) (6 n 5 n ) / 6 n ; c) n 5 n1 / 6 n ; d) (6 n 5 n n 5 n1 ) / 6 n. 11.
1 С nk ; . k ! nk
12. а)
9!
3!
3
3
9
; b)
9! . 4!3!2!39 343
13.
r! 1 . r1!r2!... rn ! n r
14. 1
.
C 2nn
15. а) n n ; b) n 1 ; c) n k .
16. а) N n ; b) 1 N ;c) N k . Cnm Crn1m 1 . Cnn r 1
17. a)
19.
b) n 1 ; c) n k . 18. 2 / n; 2 / n 1 .
2(n 2) r (n r 1)! n!
2(n r 1) 2 ; . n 1 n(n 1)
20. pd10 1 3 6 10 15 21 25 27 216 1 2 . n! 12 23. 2 7  0,0000003. 22. n . n 24. 6 7 12  1 6 . 25. а) 1 3 5 .... 2n 1 2 n n! 2n !; b) n! 1 3 ... 2n 1 2 n n! 2n ! n! 1 1 n n , where n n1 ... n6 . 26. а) n ; b) n ; c) ¦ n n n ! ! ! n 2 n 6 n N 6 6 6 1 2 6 1 2 6 1
pr
1
27. а) 4; b) 9. 28. 1 35 36 n t 0,5 , n t 25 . 1 1 1 n 1 , lim p n e 1 . 29. p n 1 ... 1 1! 2 ! 3! n ! n of n r 1 r 1 § 1 ·§ 2 · § r 2 · r 1 , r 2, 3,..., n 1 ; 30. а) q r q1 ¨1 ¸¨1 ¸ ... ¨1 ¸ n ¹ n nr © n ¹© n ¹ © (n) r . 1 q1 ... qr nr r 1 r 1 § 1· § 1· b) qr* ¨1 ¸ , pr* 1 q1* q2* ... qr* ¨1 ¸ , r 1, 2 , ... . n © n¹ © n¹
§2, item 4 2nm nn 1 mm 1 3. а) . b) . n m n m 1 n m n m 1 4. 1
Cnrr
7. P A
2 2
n 1
8.
.
Cnr
n
5. 5/14.
6. а)
n ; b) n 1 . 2n 1 2n 1
7 5 1 C53 C15 C10 C 21 C84 ; P B ; PC . 10 10 10 C20 C 20 C 20
1 ; 1
9.
4
25!
5!
5
5 25
n
. m
11.
1 · 1 · § § 1 · § 12. а) 1 ¨1 k ¸ ; б) C nm ¨ k ¸ ¨1 k ¸ © 2 ¹ ©2 ¹ © 2 ¹
2
4
10! § 1 · § 1 · § 3 · ¨ ¸ ¨ ¸ ¨ ¸ . 4!2!4! © 2 ¹ © 5 ¹ © 10 ¹ nm
.
§3, item 3.1 1. 0.5.
2. 0.25. 3. 1/3. 4.
ba , 3a ! 2b; 2a
3
S S §r· 5. 1 ¨ ¸ . 6. 1 ; 7. 1 tg 4 8 ©R¹
2 2.
344
4ab 5a 2 , 8a(b a)
3a 2b.
0 , q2
1 . n
r 2 2 § R ; b) ¨ 2 arcsin R r 8. а) ¨ SR R © a 2r 9. . 10. 1/3. 11. 0.5. 12. 0.25. a 2 arcsin
13.
· ¸ S R; ¸ ¹
§ a1 · ¸¸ 14. C nm ¨¨ © a1 a2 ¹
3 5 b 2 1 a2 , a 2 d 4b; , a t 4b. 2 24b 2 3a m
n! § a1 · 1 § a2 · 15. ¨ ¸ ¨ ¸ m1!m2 !... ms ! © l ¹ © l ¹
m2
m
§ a2 · ¸¸ ¨¨ © a1 a2 ¹
nm
.
m
§a · s ... ¨ s ¸ , where l © l ¹
a1 a2 ... as .
Chapter IІ §2, item 2.3 1.࣠ ^A: A :`. ª 1 · ª 1 2 º § 2 º ª 2 º ª1 º ª 1 · § 2 º 2. а) , >0,[email protected] , «0, ¸ , « , » , ¨ ,1» , «0, », « ,1», «0, ¸ ¨ ,1» ; ¬ 3 ¹ ¬3 3 ¼ © 3 ¼ ¬ 3 ¼ ¬3 ¼ ¬ 3 ¹ © 3 ¼ 1 ½ ª 1 º ª1 º ª 1 · § 1 º 1 ½ b) , >0,[email protected] , «0, » , « ,1» , «0, ¸ , ¨ ,1» , ® ¾ , >0,[email protected] \ ® ¾ ; ¯2¿ ¬ 2¼ ¬2 ¼ ¬ 2 ¹ © 2 ¼ ¯2¿ c) , >0,[email protected] , ^0`, ^1` , 0,[email protected] , >1,0 , 0,1 , ^0,1` ; ª 1 · ª1 1 º § 1 º ª 1 º ª1 º ª 1 · § 1 º d) , >0,[email protected] , «0, ¸, « , », ¨ ,1», «0, », « ,1» , «0, ¸ ¨ ,1» ; ¬ 3 ¹ ¬3 2 ¼ © 2 ¼ ¬ 2 ¼ ¬3 ¼ ¬ 3 ¹ © 2 ¼ e) , >0,[email protected] ; f) , >0,[email protected] ; g) , >0,[email protected] , a set of all rational numbers of the segment >0,[email protected] . 3. In both cases, the system of events has probabilities of zero or one. 4. a) Yes; b) Generally speaking, no; c) No; d) None. 11. a) No; b) Yes. 12. Yes. §4, item 4.4
1 25 / 6
5
1. 1 / 3 ;
2. 1 / 2 ;
3.
; 5 1 5 / 6 4. а) 11 / 75 ; b) 1 / 25 ; c) 1/ 15 ; d) 1 / 24 ; e) 1 / 6 ; f) 1/ 91. m q1 q2 ... q s m 5. 3 / 8 ; 7. 9. 83 / 210, 43 / 210, 2 / 5 ; ; ; ns n 10. Events Ai , A j , i, j ^1,2,5,6` , i, j  various. 11. No, No. 15. No. 16. 16 / 35 ; 17. 8 / 17 ; m2 ma 20. . 21. 4 / 5, 1 / 5, 0,0 . ; nm2 m na b Op m m 0,1,... . 24. q . 23. e Op 25. q 2 . m! 27. а) P( A)
18. 11 / 30 ;
19. 2 / 3 .
22.
Cnm p nm q nm .
26.
q 1 p3 . q p3
D2 E2 P ( B ) ; . b) For A profitable play to the end. D2 E2 D2 E2
28. а) P( A) D 2
1 E , P( B) D DE E 2 2
E2
1 D ; D DE E 2 2
345
DE D E ! 0 , those. A it is profitable to play the whole game to the end. D 2 DE E 2 1 29. Cmmnr p m1q nr Cmn nr p mr q nr . 30. C 2nn 2 2n1  (n !! 1) . 2 nS 2Dp1 31. . 32. e Ot . 2Dp1 1 D p2 p3 34. No. Example: P ( B ) 0 , A and C  any events. 35. No. Example: : ^1,2,3,4`, A ^1,2`, B ^4`, C ^2,3` , P^` i 1/ 4, i 1,2,3,4. A, B are dependent, B, C are dependent, but A, C are independent events. b) P( A) D
Chapter ІІІ §1, item 1.6 1. а) c 4 ; b) 1 4 . 2. Gx F x ( x t 0 ), Gx 0 ( x 0 ). 0, ° ® x, °1, ¯
3. а) F x P^[ d x`
c) F x
x d 0, 0 x 1, x t 1.
x d 1, 0, ° 1 ® °¯1 x D , x ! 1.
x d 0,
b) F x
0, ° ® x, °1, ¯
d) F x
xd0 0, °2 ° 0 x 1, ® arcsin x, °S x t 1. °¯1,
0 x 1, x t 1.
e) as in case a). 8 4 3 1 1 d x d , °3 x, 0, x , 27 ° 4 ° ° 8 °° 2 , 1 1 ° 0 , x f) F x ® 3 g) F x ® , d x 1, 27 4 °2 °2 1 x t 1. 0 d x d 1, °1, ° x, ° °3 3 ¯ °¯1, x !1 4. а) No; b) No; c) No; d) Yes; e) Yes; f) No. 1 3 º½ ª 1 · ª1 3 · ª3 º ª 3 · ª1 º 6. а) ®, :, «0, ¸ , « , ¸ , « ,1» , «0, ¸ , « ,1» , ª«0, ·¸ ª« ,1 ,1» ¾ ; 4 4 4 4 4 4 4 4 ¬ ¹ ¬ ¹ ¬ ¼ ¬ ¹ ¬ ¼ ¬ ¹ ¬ ¼¿ ¯ b) E >0,[email protected] ; c) ^, :`. 1 , x 1; fK x 0 , x t 1. 9. f K x S 1 x2 10. а) fK1 x 2DxeDx , x t 0 ; fK1 x 0, x 0 . 2
b) fK2 x De D
^
x
2 x , x ! 0; f x `, f x f ; K2
0, x d 0 ;
c) D 2 exp D x eDx d) f K4 x 1 , 0 x 1; fK4 x 0, 0 0,1 . 11. а)
1 , x >1,[email protected]; 0, x >1,[email protected] ; 2
b) e x , x ! 0 ; 0, x 0 ; 346
c)
1 3
x
2
12. fK x 13. а) c
, 0 x d 1 ; 0 , x 0,[email protected] ;
d) e x , x t 0; 0, x 0 .
2 ° ln x a ½ °, exp ® ¾ x ! 0 ; fK x 0 , x 0 . 2 ° ° 2 V Vx 2S ¯ ¿ 2 3; ) fK ( x) 3x , 0 x 1; fK ( x) 0, x 0,1 ; b) 0,026 ;
1
14. a), b) f [1 ( x)
f [ 2 ( x)
1
S x(1 x)
, 0 x 1 ; f [1 ( x)
f [2 ( x) 0 , x (0,1) ;
S S 1 1 ; d) f [ 4 ( x) , x d ; f [ 4 ( x) 0 , x ! . 2 S 2 2 S 1 x 1 q . 16. b) No. 0` ; P^K [ ` 1 q 1 q 0, x 1; FK x F 0 , 1 d x 1; FK x 1, x t 1;
c) f [3 ( x)
15. P^K 17. FK x
§2, item 2.5 7. P^[
5. K ~ 3 O p .
k`
1 c k 1
0,1,2,...
Ox n1 e Ox , x ! 0 ; f ( x) 0 , x 0 . S n 1 ! n2 f K x n 1 1 x , 0 x 1 ; f K x 0 , x 0,1 . F[Q x 1 1 Dx e Dx , x ! 0 ; f [Q x F[cQ x D 2 xeDx , x ! 0 ;
13. f S n x O
16.
,k
§ n 1 · *¨ n 1 ¸ © 2 ¹ 1 x 2 2 is the distribution of the Student's law. §n· S *¨ ¸ ©2¹
12. f[ K x
14.
c
n
F[Q ( x) 22. P ^Q t
n`
f [Q ( x)
0, x 0 .
2 n 1 t
t e t 2 n et , n 0,1,2,... * 2n 2 * 2n 1
§
§ 1 1 1 · ·
¨ ©
¨ ¸¸ © 1 1 2 ¹ ¹
25. а) K ~ N ¨¨ 0,0,0 , ¨ 1 2 1 ¸ ¸¸ ; ¨ ¸ § 1 ª2 2 2 º½ § 2 1·· exp ® « x 22 x 2 x3 x32 » ¾ , т.е. K2 ,K3 ~ N ¨ 0,0 , ¨ ¸¸ . 3 3 ¼¿ 2S 3 ¯ 2 ¬3 © 1 2¹¹ © 1 1 ½ exp ® 3x12 2 x1 x 2 2 x 2 y x 22 ¾ ; c) fK1 ,K2 K3 x1 , x 2 y 2S 2 ¯ ¿ § y 3· d) K2 ~ N 0, 2 ; e) N ¨ , ¸ , © 2 2¹ § y z 1· , ¸ random variables. f) Density of distributed N ¨ 3 3¹ ©
b) fK2 ,K3 x 2 , x3
1
>
@
§ § 2 1·· 29. а) Density of random variable distribution N ¨ 0,0 , ¨ ¸¸ ; © 1 2¹¹ ©
347
§
§ 2 1· · ¸¸ ; © 1 2 ¹ ¹
b) Density of random variable distribution N ¨ 0, 0 , ¨
© ½ 1 2 c) exp ® x1 x1 x 2 x 22 ¾ , 0 d x1 d x2 ; 0 –in other cases; S ¿ ¯ 3 3 x 3 d) P^K d x` 0 d x d 1; 0, x >0,[email protected] arctg 2x S 3 ½ 1 e) exp ® x12 x1 x 2 x 22 ¾ , x1 t 0 , x2 t 0 ; 0 –in other cases; S ¿ ¯ 3
3
f) P^K d x`
3
arctg
S
x 3 , 0 d x f; 0, x >0, f . 2 x
Chapter ІV §1, item 1.7 1. M [1 1; M [2 1,6; M [3 1, 4; M [4 2. M[ O , D[ O (O 1) .
M[ M[ 2 ; b) P^[
3. а) D[ 4. M [
n`
O , D[ O 1 DO ,
0, 2 .
an
is Pascal distribution.
1 a n1
r r r § 2· § 1· § § 1· · n(n 1) ¨1 ¸ n ¨1 ¸ ¨1 n ¨1 ¸ ¸ ne O 1 (1 O )e O (1 o(1)), © n¹ © n ¹ ¨© © n ¹ ¸¹ . r o O ,n o f n r k nC rk § 1 · Ok O r 1 n e 1 o1 , o O , n o f . 8. MP k r , n ¨ ¸ k k! n n © n¹
D P0 ( r , n)
10. m2n
2
2n 1 !!V 2n , m2n1
S
pq2 n 1 q 3n 5 , где q 1 p .
11. MQ 00
n 1 q2 ,
DQ 00
12. MQ 111
n 2 p3 ,
DQ 111
13. а) Ci
1 32
;
i 1
2 n n!V 2n 1 , n 0,1,2,.. ..
p 3q n 2 3n 8 p 5n 16 p 2 .
b) 1.
16. b) ^x n `is bounded xn C , but not convergent sequence. Then M[ n
^M[ n `
is a convergent sequence. 20. M[
r1 1 r2 , D[ r1 r2
23. MS n
2n 1 pq ,
27. M [ 28. M[1
D[
M[ 2
O,
.
r1 r2 1 r1 r2 2 2 DS n 2 pq1 2 pq n 1 2 pq p q .
MK
0;
r1r2 r1 1 r2 1
C o 0 n o f . n
DK
D[ 1
cov [ ,K O p, U
D[ 2
1 ; 4
Mr
348
2 , 3
p.
Dr
1 ; Others are zero. 18
29. а) c
3 ; Sr 3
31. M [
a, MS 2
central moment.
U 2 3r 2U
b)
6r 3
n 1 2 V , D[ n
2n 1 V
c) cov[1 , [ 2 U
;
n 1 2 § P
V 2 , DS 2
n
n
2n 1 V 2 ,
MS n,k
¨ ©
3
0. 4
n3 2· P 2 ¸ , where P k is the k th n 1 ¹
2 2k V 2k n 1 2k 1 !!, MM n,k
0. S b) In the above answers, the value 2V 2 should be replaced by 2V 2 1 U , where U is the correlation coefficient of the random variables [ i , [ j i z j .
32. а) MK n
33. MQ t 34. MSQ t 36. U
, M] n
Ot 1, DQ t 1 Ot
1
, DSQ t
O O12 O22 , U 2O1O2
37. M[ a
Ot . O2
.
V 12 V 22 . 2V 1V 2 a 2S 2SV 2 4V 2 2 2S aV ; 4S
V a , D[ a 2 2S
2SV 2 3a 2S 6V 2 2 2S aV a V ; D[ a 4S 2 2S a 2S 2V 2 . cov [ a , [ a 4S M[ a
n
39. D[ r 42. Uij
¦
j n r 1
1 . j2
40. cov[ ,K
pi p j
1 pi 1 p j
U [ ,K 0 .
41. U [ ,K
ca 2 . 4
, i, j 1, 2,...r , i z j. .
§2, item 2.4 y y2 , y ! 0 ; D[ [ K y . 2 12 y y2 5. M [ [ K y , y ! 0 ; D[ [ K y . 2 20 y [ K 6. M [ [ K , M [ [ K y , 0 y 2; 2 2 2 y 2 , 1 d y d 2 . y2 , 0 d y d 1; D[ [ K y 12 12 [ K 2 . y [ K 7. M [ [ K , ; D[ [ K M [ 2 [ K M [ [ K y 4 2 2 1 3 8. f [ K x y , x, y G ; M [ K y , 0 y 1. 21 y 31 y
4. M [ [ K
y
9. а) M [ 2 [1 10. f [ K x y
U
V2 [1 ; V1
б) D[ 2 [1
1 2
2
1 U V 2
2 2
, if x a 1 y 2 b 2 ;
2 a 1 y b f [ K x y 0 ,in other cases; M [ K
y 0 . 349
.
12. M K [
x 0 .
13. M K [
x cx , DK [
x
b2 2 a x2 , 0 d x a . 3a 2
a 2c 2 y. b2 a 2c 2 3 1 y 15. M K1 K 2 y , M K1 K 2 . K 2 ; DK1 K 2 y DK1 K 2 2 2 2 K K [ 2 [ 3 2[1 yz 16. M K1 K 2 y ,K3 z , M K1 K2 ,K3 2 3 ; 3 2 2 1 . DK1 K 2 y ,K3 z DK1 K 2 ,K3 3 1 1 §2 º ª 1º 1 § 1 2º 5 17. а) ; b) , if Z «0, » ; , if Z ¨ , » ; , if Z ¨ ,1» ; 2 6 ©3 ¼ ¬ 3¼ 2 © 3 3¼ 6 3 §1 º ª 1º c) [ , if Z «0, » ; , if Z ¨ ,1» . 4 ©2 ¼ ¬ 2¼ n n 20. M S n S n m y S nm , y , M S n S n m nm nm nm , M S n m Sn y y , M S n m S n S n , D Sn Snm DS n S n m y nm D S n m S n y m , D S n m Sn m .
14. M [ K
y
21. а) a 2 V 2 d c 2 12 , V 2 d c 2 4 ;
a d c 12 ; 27. No. 2
2
c) 2
O12 O22
,1
O12 O22
>
@
b) d c 2 3d c 2 V 2 12 ,
; d) O1O2 1 O1 ,
O1O22 .
ChapterV §4. 1. а) 0.08; b) 0.9953. 2. а) 0.090224; b) 0.15. 3. 0.00014. 4. а) 0.0078. b) 0.174. 5. 0.9615. 6. P np a npq P n np a npq  2) 0 a t 0,9995, a t 3,6, n 2206, 0 P n 6 . 7. а) 3919 d n d 16432 ; b) 5488 d n d 11634 . 8. 2) 0 1 0,68269... ; 1 2) 0 1 0,31730... . 9. The answer of the previous task. 10. а) 558; b) 541. 11. 0,1841. 12. 547. 13. 0.3933. 14. 0.9993. 15. а) 0.8859; b) 1) 0.8859; 2) 0.4991; 3) 0.1468; 4) 0.8353; b) 764 P n 836 ; c) n 18500
^
`
. Chapter VI §4. 4. Yes. 7. Generally speaking, no. Example: [ , [1 , [2 ,... are independent identically distributed random variables.
Chapter VII §3. 5. Yes. 6. No. 7. Yes. 8. Yes, if $\alpha < 1$. 9. No. Example: $\xi_n =$
(1)n [ , [ nondegenerate random variable, M [
0, a2 n
1, a2 n1
Chapter VIІІ §5. 1. P ^[ n
1` 3 n ; 0.
2. p12 (n) (1 (1)n ) / 2; p11 (n) (1 (1)n ) / 2. 3. 1) (0.385, 0.336, 0,279); 2) 0.0336; 3) (16/47, 17/47, 14/47 ). D D E D 5. а) {1,2}; b) 138/97;c) p1 52 / 97, p2 40 / 97, pi 1 pi , i 1, 2; d) S 1 S 2 0; S 3 12 / 91; S 4 46 / 291; S 5 S 6 51/194. 6. а) No, if p z q ; Yes, if p
q.
b) Yes, P ^Kn1 1/ Kn 1` c) Yes, p11
p23
p31
p; P ^Kn1 1/ Kn 1` 1 p.
p43 1 p; p12
p24
p32
p44
p.
§ 3/ 7 4/ 7 · 7. ¨ ¸. ©1/11 10 /11¹ 8. P ^P0 (n 1) k / P0 (n) k` k / N ; P ^P0 (n 1) k 1/ P0 (n) k` ( N k ) / N.
0.
APPENDICES

Appendix 1

Table of values of the function $\varphi(x) = \dfrac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$
0,0 0,1 0,2 0,3 0.4 0,5 0,6 0,7 0,8 0,9
0 0,3989 3970 3910 3814 3683 3521 3332 3123 2897 2661
1 3989 3965 3902 3802 3668 3503 3312 3104 2874 2637
2 3989 3961 3894 3790 3652 3485 3292 3079 2850 2613
3 3988 3956 3885 3778 3637 3467 3271 3056 2827 2589
4 3986 3951 3876 3765 3621 3448 3251 3034 2803 2565
5 3984 3945 3867 3752 3605 3429 3230 3011 2780 2541
6 3982 3939 3857 3739 3589 3410 3209 2989 2756 2516
7 3980 3932 3847 3726 3572 3391 3187 2966 2732 2492
8 3977 3925 3836 3712 3555 3372 3166 2943 2709 2468
9 3973 3918 3825 3697 3538 3352 3144 2920 2685 2444
1,0 1,1 1,2 1,3 1,4 1,5 1,6 1,7 1,8 1,9
0,2420 2179 1942 1714 1497 1295 1109 0940 0790 0656
2396 2155 1919 1691 1476 1276 1092 0925 0775 0644
2371 2131 1895 1669 1456 1257 1074 0909 0761 0632
2347 2107 1872 1647 1435 1238 1057 0893 0748 0620
2323 2083 1849 1626 1415 1219 1040 0878 0734 0608
2299 2059 1826 1604 1394 1200 1023 0863 0721 0596
2275 2036 1804 1582 1374 1182 1006 0848 0707 0584
2251 2012 1781 1561 1354 1163 0989 0833 0694 0573
2227 1989 1758 1539 1334 1145 0973 0818 0681 0562
2203 1965 1738 1518 1315 1127 0957 0804 0669 0551
2,0 2,1 2,2 2,3 2,4 2,5 2,6 2,7 2,8 2,9
0,0540 0440 0355 0283 0224 0175 0136 0104 0079 0060
0529 0431 0347 0277 0219 0171 0132 0101 0077 0058
0519 0422 0339 0270 0213 0167 0129 0099 0075 0056
0508 0413 0332 0264 0208 0163 0126 0096 0073 0055
0498 0404 0325 0258 0203 0158 0122 0093 0071 0053
0488 0396 0317 0252 0198 0154 0119 0091 0069 0051
0478 0387 0310 0246 0194 0151 0116 0088 0067 0050
0468 0379 0303 0241 0189 0147 0113 0086 0065 0048
0459 0371 0297 0235 0184 0143 0110 0084 0063 0047
0449 0363 0290 0229 0180 0139 0107 0081 0061 0046
3,0 3,1 3,2 3,3 3,4 3,5 3,6 3,7 3,8 3,9
0,0044 0033 0024 0017 0012 0009 0006 0004 0003 0002
0043 0032 0023 0017 0012 0008 0006 0004 0003 0002
0042 0031 0022 0016 0012 0008 0006 0004 0003 0002
0040 0030 0022 0016 0011 0008 0005 0004 0003 0002
0039 0029 0021 0015 0011 0008 0005 0004 0003 0002
0038 0028 0020 0015 0010 0007 0005 0004 0002 0002
0037 0027 0020 0014 0010 0007 0005 0003 0002 0002
0036 0026 0019 0014 0010 0007 0005 0003 0002 0002
0035 0025 0018 0013 0009 0007 0005 0003 0002 0001
0034 0025 0018 0013 0009 0006 0004 0003 0002 0001
Appendix 2

Table of values of the function $\Phi_0(x) = \dfrac{1}{\sqrt{2\pi}} \displaystyle\int_0^x e^{-z^2/2}\,dz$
x
) 0 ( x)
x
) 0 ( x)
x
) 0 ( x)
x
) 0 ( x)
0,00 0,01 0,02 0,03 0,04 0,05 0,06 0,07 0,08 0,09 0,10 0,11 0,12 0,13 0,14 0,15 0,16 0,17 0,18 0,19 0,20 0,21 0,22 0,23 0,24 0,25 0,26 0,27 0,28 0,29 0,30 0,31 0,32 0,33 0,34 0,35 0,36 0,37 0,38 0,39 0,40 0,41 0,42 0,43 0,44 0,45 0,46 0,47 0,48 0,49 0,50 0,51 0,52 0,53 0,54 0,55
0,0000 0,0040 0,0080 0,0120 0,0160 0,0199 0,0239 0,0279 0,0319 0,0359 0,0398 0,0438 0,0478 0,0517 0,0557 0,0596 0,0636 0,0675 0,0714 0,0753 0,0793 0,0832 0,0871 0,0910 0,0948 0,0987 0,1026 0,1064 0,1103 0,1141 0,1179 0,1217 0,1255 0,1293 0,1331 0,1368 0,1406 0,1443 0,1480 0,1517 0,1554 0,1591 0,1628 0,1664 0,1700 0,1736 0,1772 0,1808 0,1844 0,1879 0,1915 0,1950 0,1985 0,2019 0,2054 0,2088
0,56 0,57 0,58 0,59 0,60 0,61 0,62 0,63 0,64 0,65 0,66 0,67 0,68 0,69 0,70 0,71 0,72 0,73 0,74 0,75 0,76 0,77 0,78 0,79 0,80 0,81 0,82 0,83 0,84 0,85 0,86 0,87 0,88 0,89 0,90 0,91 0,92 0,93 0,94 0,95 0,96 0,97 0,98 0,99 1,00 1,01 1,02 1,03 1,04 1,05 1,06 1,07 1,08 1,09 1,10 1,11
0,2123 0,2157 0,2190 0,2224 0,2257 0,2291 0,2324 0,2357 0,2389 0,2422 0,2454 0,2486 0,2517 0,2549 0,2580 0,2611 0,2642 0,2673 0,2703 0,2734 0,2764 0,2794 0,2823 0,2852 0,2881 0,2910 0,2939 0,2967 0,2995 0,3023 0,3051 0,3078 0,3106 0,3133 0,3159 0,3186 0,3212 0,3238 0,3264 0,3289 0,3315 0,3340 0,3365 0,3389 0,3413 0,3438 0,3461 0,3485 0,3508 0,3531 0,3554 0,3577 0,3599 0,3621 0,3643 0,3665
1,12 1,13 1,14 1,15 1,16 1,17 1,18 1,19 1,20 1,21 1,22 1,23 1,24 1,25 1,26 1,27 1,28 1,29 1,30 1,31 1,32 1,33 1,34 1,35 1,36 1,37 1,38 1,39 1,40 1,41 1,42 1,43 1,44 1,45 1,46 1,47 1,48 1,49 1,50 1,51 1,52 1,53 1,54 1,55 1,56 1,57 1,58 1,59 1,60 1,61 1,62 1,63 1,64 1,65 1,66 1,67
0,3686 0,3708 0,3729 0,3749 0,3770 0,3790 0,3810 0,3830 0,3849 0,3869 0,3883 0,3907 0,3925 0,3944 0,3962 0,3980 0,3997 0,4015 0,4032 0,4049 0,4066 0,4082 0,4099 0,4115 0,4131 0,4147 0,4162 0,4177 0,4192 0,4207 0,4222 0,4236 0,4251 0,4265 0,4279 0,4292 0,4306 0,4319 0,4332 0,4345 0,4357 0,4370 0,4382 0,4394 0,4406 0,4418 0,4429 0,4441 0,4452 0,4463 0,4474 0,4484 0,4495 0,4505 0,4515 0,4525
1,68 1,69 1,70 1,71 1,72 1,73 1,74 1,75 1,76 1,77 1,78 1,79 1,80 1,81 1,82 1,83 1,84 1,85 1,86 1,87 1,88 1,89 1,90 1,91 1,92 1,93 1,94 1,95 1,96 1,97 1,98 1,99 2,00 2,02 2,04 2,06 2,08 2,10 2,12 2,14 2,16 2,18 2,20 2,22 2,24 2,26 2,28 2,30 2,32 2,34 2,36 2,38 2,40 2,42 2,44 2,46
0,4535 0,4545 0,4554 0,4564 0,4573 0,4582 0,4591 0,4599 0,4608 0,4616 0,4625 0,4633 0,4641 0,4649 0,4656 0,4664 0,4671 0,4678 0,4686 0,4693 0,4699 0,4706 0,4713 0,4719 0,4726 0,4732 0,4738 0,4744 0,4750 0,4756 0,4761 0,4767 0,4772 0,4783 0,4793 0,4803 0,4812 0,4821 0,4830 0,4838 0,4846 0,4854 0,4861 0,4868 0,4875 0,4881 0,4887 0,4893 0,4898 0,4904 0,4909 0,4913 0,4918 0,4922 0,4927 0,4931
2,48 2,50 2,52 2,54 2,56 2,58 2,60 2,62 2,64
0,4934 0,4938 0,4941 0,4945 0,4948 0,4951 0,4953 0,4956 0,4959
2,66 2,68 2,70 2,72 2,74 2,46 2,78 2,80 2,82
0,4961 0,4963 0,4965 0,4967 0,4969 0,4971 0,4973 0,4974 0,4976
2,84 2,86 2,88 2,90 2,92 2,94 2,96 2,98 3,00
0,4977 0,4979 0,4980 0,4981 0,4982 0,4984 0,4985 0,4986 0,49865
3,20 3,40 3,60 3,80 4,00 4,50 5,00
0,49931 0,49966 0,499841 0,499928 0,499968 0,499997 0,499997
Appendix 3

Values of the Poisson distribution $\pi_k(\lambda) = \dfrac{\lambda^k}{k!}\, e^{-\lambda}$
Ȝ
k
0,1 0,9048 0,0905 0,0045 0,0002 0,0000 0,0000 0,0000 0,0000
k
2,0 0,1353 0,2707 0,2707 0,1805 0,0902 0,0361 0,0120 0,0034 0,0009 0,0002 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000
0 1 2 3 4 5 6 7
0,2 0,8187 0,1637 0,0164 0,0011 0,0001 0,0000 0,0000 0,0000
0,3 0,7408 0,2223 0,0333 0,0033 0,0003 0,0000 0,0000 0,0000
0,4 0,6703 0,2681 0,0536 0,0072 0,0007 0,0001 0,0000 0,0000
0,5 0,6065 0,3033 0,0758 0,0126 0,0016 0,0002 0,0000 0,0000
0,6 0,5488 0,3293 0,0988 0,0198 0,0030 0,0003 0,0000 0,0000
0,7 0,4966 0,3476 0,1216 0,0284 0,0050 0,0007 0,0001 0,0000
0,8 0,4493 0,3595 0,1438 0,0383 0,0077 0,0012 0,0002 0,0000
0,9 0,4066 0,3659 0,1647 0,0494 0,0111 0,0020 0,0003 0,0000
1,0 0,3679 0,3679 0,1839 0,0613 0,0153 0,0031 0,0005 0,0001
8,0 0,0003 0,0027 0,0107 0,0286 0,0572 0,0916 0,1221 0,1396 0,1396 0,1241 0,0993 0,0722 0,0481 0,0296 0,0169 0,0090 0,0045 0,0021 0,0009 0,0004 0,0002 0,0001 0,0000 0,0000 0,0000 0,0000
9,0 0,0001 0,0011 0,0050 0,0150 0,0337 0,0607 0,0911 0,1171 0,1318 0,1318 0,1186 0,0970 0,0728 0,0504 0,0324 0,0194 0,0109 0,0058 0,0029 0,0014 0,0006 0,0003 0,0001 0,0000 0,0000 0,0000
10,0 0,0001 0,0005 0,0023 0,0076 0,0189 0,0378 0,0631 0,0901 0,1126 0,1251 0,1251 0,1137 0,0948 0,0729 0,0521 0,0347 0,0217 0,0128 0,0071 0,0037 0,0019 0,0009 0,0004 0,0002 0,0001 0,0000
Ȝ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
3,0 0,0498 0,1494 0,2240 0,2240 0,1681 0,1008 0,0504 0,0216 0,0081 0,0027 0,0008 0,0002 0,0001 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000
4,0 0,0183 0,0733 0,1465 0,1954 0,1954 0,1563 0,1042 0,0595 0,0298 0,0132 0,0053 0,0019 0,0006 0,0002 0,0001 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000
5,0 0,0067 0,0337 0,0842 0,1404 0,1755 0,1755 0,1462 0,1045 0,0653 0,0363 0,0181 0,0082 0,0034 0,0013 0,0005 0,0002 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000
6,0 0,0025 0,0149 0,0446 0,0892 0,1339 0,1606 0,1606 0,1377 0,1033 0,0689 0,0413 0,0225 0,0113 0,0052 0,0022 0,0009 0,0003 0,0001 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000
7,0 0,0009 0,0064 0,0223 0,0521 0,0912 0,1277 0,1490 0,1490 0,1304 0,1014 0,0710 0,0452 0,0264 0,0142 0,0071 0,0033 0,0015 0,0006 0,0002 0,0001 0,0000 0,0000 0,0000 0,0000 0,0000 0,0000
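The entries of the Poisson table can be recomputed directly from the formula P_k(λ) = λ^k e^(−λ) / k!. A minimal Python sketch (the helper name `poisson_pmf` is ours, not from the textbook):

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability P_k(lambda) = lambda^k * exp(-lambda) / k!."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

# Spot-check against the table:
# poisson_pmf(1, 2.0) ≈ 0.2707, poisson_pmf(8, 9.0) ≈ 0.1318.
print(round(poisson_pmf(1, 2.0), 4), round(poisson_pmf(8, 9.0), 4))
```

Rounding to four decimal places reproduces the tabulated values.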
BIBLIOGRAPHY
1. Akanbay N. Theory of Probability and Mathematical Statistics I. – Almaty: Kazakh University, 2017. – 442 p.
2. Akanbay N. Theory of Probability and Mathematical Statistics II. – Almaty: Kazakh University, 2017. – 458 p.
3. Akanbay N. Theory of Probability and Mathematical Statistics Courses I. – Almaty: Kazakh University, 2011. – 391 p.
4. Akanbay N. Mathematical Statistics. – Almaty: Kazakh University, 2011. – 324 p.
5. Akanbay N. A Collection of Problems and Exercises in Probability Theory I. – Almaty: Kazakh University, 2014. – 480 p.
6. Akanbay N. A Collection of Problems and Exercises in Probability Theory II. – Almaty: Kazakh University, 2014. – 367 p.
7. Akanbay N. A Collection of Problems and Exercises in Probability Theory III. – Almaty: Kazakh University, 2007. – 256 p.
8. Akanbay N. Fundamentals of Probability Theory, Mathematical Statistics and the Theory of Random Processes. – Almaty: Kazakh University, 2005. – 259 p.
9. Shiryaev A.N. Probability. – M.: Nauka, 1980. – 576 p.
10. Sevastyanov B.A. A Course in Probability Theory and Mathematical Statistics. – M.: Nauka, 1982. – 255 p.
11. Borovkov A.A. Probability Theory. – M.: Nauka, 1976. – 352 p.
12. Kolmogorov A.N. Basic Concepts of Probability Theory. – M.: Nauka, 1974. – 119 p.
13. Kazakh-Russian, Russian-Kazakh Terminological Dictionary. Mathematics. – Almaty: KazAkparat, 2014. – 440 p.
14. Akanbay N. Fundamentals of Probability Theory, Mathematical Statistics and the Theory of Random Processes. – Almaty: Kazakh University, 2007. – 302 p.
15. Zubkov A.M., Sevastyanov B.A., Chistyakov V.P. A Collection of Problems in Probability Theory. – M.: Nauka, 1976. – 320 p.
16. Prokhorov A.V., Ushakov V.G., Ushakov N.G. A Collection of Problems in Probability Theory. – M.: Nauka, 1989. – 328 p.
17. Feller W. An Introduction to Probability Theory and Its Applications. Vol. I. – New York–Chichester–Brisbane–Toronto: John Wiley and Sons, 1970. (Russian translation: Feller V. An Introduction to Probability Theory and Its Applications. – M.: Mir, 1984. – Vol. I. – 752 p.)
18. Feller W. An Introduction to Probability Theory and Its Applications. Vol. II. – New York–London–Sydney–Toronto: John Wiley and Sons, Inc., 1971. (Russian translation: Feller V. An Introduction to Probability Theory and Its Applications. – M.: Mir, 1986. – Vol. II. – 738 p.)
19. Lehmann E.L. Testing Statistical Hypotheses. – New York: John Wiley and Sons, Inc.; London: Chapman and Hall, Ltd., 1959. (Russian translation: Lehmann E.L. Testing of Statistical Hypotheses. – M.: Mir, 1984. – 500 p.)
20. Wentzel A.D. A Course in the Theory of Random Processes. – M.: Nauka, 2006. – 320 p.
21. Cramér H., Leadbetter M.R. Stationary Random Processes. – Princeton University Press, 1969. (Russian translation: Cramér H., Leadbetter M.R. Stationary Random Processes. – M.: Mir, 1969. – 399 p.)
22. Cramér H. Mathematical Methods of Statistics. – Princeton University Press, 1946. (Russian translation: Cramér H. Mathematical Methods of Statistics. – M.: Mir, 1975. – 648 p.)
23. Tutubalin V.N. The Theory of Probability and Random Processes: Fundamentals of the Mathematical Apparatus and Applied Aspects. – M.: Moscow State University Press, 1992. – 400 p.
24. Lukacs E. Characteristic Functions. 2nd ed., revised and enlarged. – London: Griffin, 1969. (Russian translation: Lukacs E. Characteristic Functions. – M.: Nauka, 1979. – 424 p.)
25. Gnedenko B.V. A Course in Probability Theory. – M.: Nauka, 1988. – 448 p.
26. Lamperti J. Probability. – Amsterdam, 1966. (Russian translation: Lamperti J. Probability. – M.: Nauka, 1973. – 182 p.)
27. Borovkov A.A. Mathematical Statistics. – M.: Nauka, 1984. – 472 p.
28. Schmetterer L. Einführung in die mathematische Statistik. – Wien–New York: Springer-Verlag, 1966. (Russian translation: Schmetterer L. Introduction to Mathematical Statistics. – M.: Nauka, 1976. – 520 p.)
29. Ivchenko G.I., Medvedev Yu.I. Mathematical Statistics. – M.: Vysshaya Shkola, 1984. – 248 p.
30. Sokolov G.A., Gladkikh I.M. Mathematical Statistics. – M.: Ekzamen, 2007. – 431 p.
31. Korshunov D.A., Chernova N.I. A Collection of Tasks and Exercises in Mathematical Statistics. – Novosibirsk: Institute of Mathematics Publishing House, 2001. – 120 p.
33. Shiryaev A.N. Problems in Probability Theory. – M.: MCNMO, 2011. – 416 p.
34. Shiryaev A.N. Probability – 1, 2. – M.: MCNMO, 2011. – 840 p.
Educational issue
Akanbay Nursadyk PROBABILITY THEORY AND MATHEMATICAL STATISTICS I Textbook Editor L. Strautman Typesetting G. Kaliyeva Cover design Y. Gorbunov Photos from the site www.background2672597_960_720.com were used in the cover design
IB No. 13561
Signed for publishing 10.04.2020. Format 70x100 1/12. Offset paper. Digital printing. Volume 29,91 printer's sheets. 100 copies. Order No. 3407. Publishing house «Qazaq University», Al-Farabi Kazakh National University, 71 Al-Farabi Ave., 050040, Almaty. Printed in the printing office of the «Qazaq University» publishing house.
New books of the «Qazaq University» publishing house
Akanbay N. Probability Theory and Mathematical Statistics. II: textbook / N. Akanbay. – Almaty: Qazaq University, 2017. – 458 p. ISBN 9786010422926 (common), ISBN 9786010422940 (Book 2). The textbook was written on the basis of lectures delivered to students, master's students and doctoral students of mathematics and other specialties of the Faculty of Mechanics and Mathematics and to attendees of the Institute of Advanced Training of KazNU. It is a direct continuation of Part I of the textbook of the same name, which was published in 2011 with the «Textbook» stamp awarded by decision of the Ministry of Education and Science of the RK and has since been revised and supplemented with a number of new materials. This Part II covers the following chapters, which could not be included in Part I because of its volume but are to be taught under the standard curriculum of the discipline «Probability Theory and Mathematical Statistics»: generating functions; characteristic functions; limit theorems for sequences of independent random variables; introduction to mathematical statistics; basic concepts of sampling theory; estimation of unknown distribution parameters; elements of the theory of testing statistical hypotheses; elements of the general and correlation theory of random processes; introduction to the theory of stationary processes; Markov processes: basic concepts. The structure of the textbook and the arrangement of its materials make it suitable not only for students of mathematics but also for students, master's students, doctoral students and lecturers of other specialties close to mathematics (computer science, mechanics, physics, mathematical and computer modelling, etc.).
Akanbay N. Probability Theory and Mathematical Statistics: in 2 parts. Part I: textbook / translated from Kazakh by N. Akanbay. – Almaty: Qazaq University, 2019. – 376 p. ISBN 9786010442283. In the textbook, the mathematical foundations of probability theory are presented on the basis of A.N. Kolmogorov's axiomatics. In Chapter 1, materials on random events and their probabilities are considered within the framework of a discrete probability space. Chapter 2 is devoted to the general probability space. In Chapter 3 a random variable is defined as a measurable function. The concept of mathematical expectation considered in Chapter 4 is introduced as a Lebesgue integral with respect to a probability measure on a general probability space; readers are not required to have any prior knowledge of the Lebesgue integral. Chapter 5 considers limit theorems in the Bernoulli scheme. Chapters 6 and 7 present materials on various types of convergence of sequences of random variables and on the laws of large numbers. The textbook is recommended for bachelor's, master's and PhD doctoral students majoring in «Mathematics».
To purchase books, contact the marketing and sales department of the «Qazaq University» publishing house. Tel.: 8 (727) 3773411, call center: 8 (727) 3773399. Email: [email protected], website: www.magkaznu.kz, online store: www.magkaznu.com