Probability Theory and Mathematical Statistics: educational-methodical manual 9786010438774


English Pages [150] Year 2019


AL-FARABI KAZAKH NATIONAL UNIVERSITY

I. Kovaleva

PROBABILITY THEORY AND MATHEMATICAL STATISTICS Educational-methodical manual

Almaty «Qazaq University» 2019

UDC 511 (075)
LBC 22.1 я 73
K 79

Recommended for publication by the decision of the Academic Council of the Faculty of Mechanics and Mathematics, Editorial and Publishing Council of Al-Farabi Kazakh National University (Protocol No. 3 dated 06.02.2019)

Reviewers:
Candidate of Physical and Mathematical Sciences, Associate Professor N.P. Azanov
Candidate of Pedagogical Sciences, Associate Professor L.N. Orazbek

K 79

Kovaleva I. Probability Theory and Mathematical Statistics: educational-methodical manual / I. Kovaleva. – Almaty: Qazaq University, 2019. – 150 p. ISBN 978-601-04-3877-4

This tutorial is a training and methodical complex for the discipline "Probability Theory and Mathematical Statistics", intended for students of such specialties as Information Systems, Computer Science, Automation and Control, Computing and Software, Big Data, etc. It may be used at al-Farabi KazNU and at other Kazakh universities where English is one of the languages of instruction. The tutorial consists of a preface, the main part (15 lectures and 15 practical classes, consistent with the credit system of education), tasks for the independent work of students, the program of the final exam, and a list of the literature used.


© Kovaleva I., 2019 © Al-Farabi KazNU, 2019

Contents

Preface

I. Lectures
Lecture 1. Subject of probability theory. Events and operations on events.
Lecture 2. Definitions of probability.
Lecture 3. Additive rule. Conditional probability. Product rule.
Lecture 4. Independence of events. Bayes formulas.
Lecture 5. Bernoulli scheme. Polynomial scheme.
Lecture 6. Asymptotic formulas of Bernoulli scheme.
Lecture 7. Random variables (finite case).
Lecture 8. Numerical characteristics of random variables.
Lecture 9. Random variable (general case).
Lecture 10. Characteristic functions.
Lecture 11. Large numbers law. Central limit theorem.
Lecture 12. Population and sample.
Lecture 13. Point estimators of parameters.
Lecture 14. Interval estimators of parameters.
Lecture 15. Statistical hypotheses testing.

II. Practical lessons
Practical lesson 1. Simplest probabilistic problems.
Practical lesson 2. Classical definition of probability.
Practical lesson 3. Classical definition of probability (part 2).
Practical lesson 4. Conditional probability and independence.
Practical lesson 5. Sum and product formulas.
Practical lesson 6. Bayes formulas.
Practical lesson 7. Bernoulli scheme.
Practical lesson 8. Simple random variables.
Practical lesson 9. Numerical characteristics of random variables.
Practical lesson 10. Joint distributions.
Practical lesson 11. Continuous random variable.
Practical lesson 12. Types of distributions.
Practical lesson 13. Chebyshev's inequality. Large numbers law. Central limit theorem.
Practical lesson 14. Sample characteristics.
Practical lesson 15. Parameter estimators.

III. Individual work of students (IWS)
IWS №1. The simplest problems of probability theory.
IWS №2. Geometric definition of probability. Sum and product formulas.
IWS №3. Bayes formulas +.
IWS №4. Independent trials.
IWS №5. Discrete and continuous random variables.
IWS №6. Joint distribution of discrete random variables. Mathematical statistics elements.

IV. Final exam program

References

Full-fledged scientific research in any field of knowledge, including information technology, cannot be imagined without taking into account the spread of experimental data or observations and the totality of factors affecting the results obtained. This explains the recent interest in probability theory and mathematical statistics. An analysis of how probabilistic and statistical methods are being introduced into the solution of practical problems shows that this interest will only increase with time. The reasons are the accelerating development of computing technology, the development and widespread introduction of new information technologies, and the new results obtained recently in the field of simulation modeling. The revolution in software platforms for simulation modeling has made it one of the main methods for studying complex systems, phenomena and processes as sets of many simpler interacting elements. The results obtained in this way are complete and objective only when the random nature of the interactions, factors and disturbances in such models is taken into account.

The proposed textbook is based on the lectures that the author has been delivering at the Faculty of Mechanics and Mathematics of al-Farabi KazNU for a number of years. The manual is adapted for readers for whom English is a foreign (non-native) language. It is based mainly on Russian-language literature translated into English.

There are 15 lectures (in accordance with the credit system of training), 15 practical lessons, as well as 30 tasks for independent work (30 variants of each task). Sample questions for the final exam are also presented. Each lecture begins with a plan and ends with questions for self-examination. For a better understanding of the lecture material, it is recommended to answer these questions.

Lecture content
1. Subject of probability theory.
2. Elements of combinatorial analysis.
3. Events. Operations on events.
4. Urn model.

1. The subject of probability theory is the mathematical analysis of regularities generated by random events. A random event is an empirical event which, for a given set of conditions, is characterized by two properties: (i) it has no deterministic regularity (in a single experiment it may or may not occur); (ii) it has statistical stability (the relative frequency of its occurrence stabilizes when the experiment is repeated many times).

Example. A "fair" coin is tossed. Consider the random event "heads comes up". We do not know whether this event will happen in a single toss of the coin (i). But if we make a large number of independent tosses, we notice that for a "fair" coin the relative frequency of heads (and likewise of tails) is close to 0.5 (ii).
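Property (ii) can be observed in a small simulation. The sketch below is only an illustration: it assumes that Python's pseudo-random generator is an adequate stand-in for a fair coin, and the function name and the fixed seed are arbitrary choices made here, not part of the lecture.

```python
import random

def relative_frequency_of_heads(n_tosses, seed=0):
    """Simulate n_tosses of a fair coin and return the relative frequency of heads."""
    rng = random.Random(seed)  # fixed seed (arbitrary) so that runs are reproducible
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# The frequency fluctuates for small n and settles near 0.5 as n grows.
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```

A single toss remains unpredictable, while the printed frequencies for large n stay close to 0.5.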

The statistical stability of frequencies makes it possible to estimate quantitatively the chance of a random event A. Probability theory therefore postulates the existence of a numerical characteristic of a random event, called its probability and denoted P(A). The basic property of probability is that, as the number of independent experiments increases, the relative frequency of the event approaches its probability. In our example this means that the probability of heads is equal to 0.5.

2. Elements of combinatorial analysis. In many situations one has to count the elements of a set defined by certain conditions. The general statement of a combinatorial problem is as follows. We have to select m elements (a sample) from n different elements (the general set). Depending on the sampling scheme, an element may or may not enter the sample more than once. It is required to express the number of possible samples in terms of m and n.

Let us formulate two general combinatorial rules.

Rule of sum (addition principle). If there are a ways of doing one thing and b ways of doing another, and the two actions cannot both be performed, then there are a + b ways to choose one of the actions.

Rule of product (multiplication principle). If there are a ways of doing one thing and b ways of doing another, then there are a · b ways of performing both actions.

Samples. Let n be the size of the general set and m be the size of the sample. Samples are classified in two ways. By order: ordered (changing the order of the sample elements gives a different sample) and unordered (changing the order of the sample elements gives the same sample). By repetition: with replacement (an element can enter the sample more than once) and without replacement (an element cannot enter the sample more than once).

Def. An ordered sample of size m without replacement is called an arrangement of m (objects) out of n, written $A_n^m$. The number of arrangements of m out of n is equal to

$A_n^m = n(n-1)(n-2)\cdots(n-m+1) = \frac{n!}{(n-m)!}$.

Indeed, the first element may be chosen in n ways, the second in (n - 1) ways, since the first element may not be used again, and so on; the m-th element may be chosen in (n - m + 1) ways. Then we use the rule of product.

Example. Take a general set of size 3 (n = 3), {1, 2, 3}, and form ordered samples of size 2 without replacement. We obtain $A_3^2 = 3!/(3-2)! = 6$ arrangements: 12, 13, 21, 23, 31, 32.

Note. If m = n, such an arrangement is called a permutation $P_n$: $P_n = A_n^n = n!$.

Example. From the 3 elements 1, 2, 3 one can make $P_3 = 3! = 6$ permutations: 123, 132, 213, 231, 312, 321.

Def. An unordered sample of size m without replacement is called a combination of m out of n, written $C_n^m$. The number of combinations of m out of n is equal to

$C_n^m = \frac{n(n-1)(n-2)\cdots(n-m+1)}{m!} = \frac{n!}{m!\,(n-m)!} = \frac{A_n^m}{m!}$.

Indeed, the number of ordered samples of m out of n is m! times the number of unordered samples, because m elements can be rearranged in m! ways (see Permutation).

Example. One can make $C_3^2 = 3!/(2!\,(3-2)!) = 3$ combinations of 2 out of 3: 12, 13, 23.

Def. An ordered sample with replacement is called an arrangement with replacement. The number of arrangements with replacement of m out of n is equal to

$U_n^m = \underbrace{n \cdot n \cdots n}_{m} = n^m$.

Indeed, the first element may be chosen in n ways, the second in n ways, and so on; the m-th element may be chosen in n ways. Then we use the rule of product.

Example. One can make $3^2 = 9$ arrangements with replacement of 2 out of 3: 11, 12, 13, 21, 22, 23, 31, 32, 33.

Def. An unordered sample with replacement is called a combination with replacement. The number of combinations with replacement of m out of n is equal to

$f_n^m = C_{n+m-1}^m$.

Example. One can make $C_{3+2-1}^2 = C_4^2 = 6$ combinations with replacement of 2 out of 3: 11, 12, 13, 22, 23, 33.

Let us summarize our results in the table:

| Sample    | Without replacement          | With replacement        |
|-----------|------------------------------|-------------------------|
| ordered   | $A_n^m = n!/(n-m)!$          | $U_n^m = n^m$           |
| unordered | $C_n^m = n!/(m!\,(n-m)!)$    | $f_n^m = C_{n+m-1}^m$   |
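The four entries of the table can be checked by direct enumeration for the running example n = 3, m = 2. The following is a minimal sketch using Python's standard `itertools` and `math` modules; the variable names are chosen here for illustration only.

```python
from itertools import (combinations, combinations_with_replacement,
                       permutations, product)
from math import comb, perm

elements = [1, 2, 3]   # general set, n = 3
n, m = len(elements), 2

arrangements = list(permutations(elements, m))                    # ordered, without replacement
combos = list(combinations(elements, m))                          # unordered, without replacement
arrangements_rep = list(product(elements, repeat=m))              # ordered, with replacement
combos_rep = list(combinations_with_replacement(elements, m))     # unordered, with replacement

# Compare each enumerated count with the corresponding formula from the table.
print(len(arrangements), perm(n, m))          # A_n^m = n!/(n-m)!
print(len(combos), comb(n, m))                # C_n^m = n!/(m!(n-m)!)
print(len(arrangements_rep), n ** m)          # U_n^m = n^m
print(len(combos_rep), comb(n + m - 1, m))    # f_n^m = C_{n+m-1}^m
```

Each printed pair coincides, and the enumerated lists reproduce the samples written out in the examples (12, 13, 21, ... and so on).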

3. Events. Operations on events. One of the basic concepts of probability theory is the random event, or simply event. In real life, an event is an outcome of a trial (observation, experiment) that may or may not take place.

Example. A fair die is tossed once. Possible events are A = {"6" occurs} and B = {an even number occurs}.

Let us consider some special cases of events and operations on events.

Def. A certain event (written Ω) is an event which always occurs (for a given set of conditions, i.e., in the given experiment).
Example. A fair die is tossed. Ω = {"1" or "2" or … or "6" occurs}.

Def. An impossible event (written ∅) is an event which never occurs (for a given set of conditions).
Example. A fair die is tossed. ∅ = {"7" occurs}.

Def. Events A and Ā are called mutually opposite if exactly one of them occurs in every run of the given experiment: if A occurs then Ā does not occur, and conversely.
Example. A coin is tossed. A = {heads occurs}; Ā = {tails occurs}.

Def. The union or sum of events A and B (A ∪ B or A + B) is the event that occurs if and only if A occurs, B occurs, or both A and B occur.
Example. A fair die is tossed. A = {an even number of points occurs}; B = {a number of points that is a multiple of three occurs}; A + B = {"2" or "3" or "4" or "6" occurs}.

Def. The intersection or product of events A and B (A ∩ B or AB) is the event that occurs if and only if A and B occur simultaneously.
Example. A fair die is tossed. A = {an even number of points occurs}; B = {a number of points that is a multiple of three occurs}; AB = {"6" occurs}.

Def. The difference (residual) of events A and B (A \ B or A - B) is the event that occurs if and only if A occurs and B does not occur.
Example. A fair die is tossed. A = {an even number of points occurs}; B = {a number of points that is a multiple of three occurs}; A - B = {"2" or "4" occurs}.

Def. Two events A and B are said to be mutually exclusive or disjoint if they cannot occur simultaneously: AB = ∅.
Example. A coin is tossed. The events A = {heads occurs} and B = {tails occurs} are disjoint.

4. Urn model. Currently, the most common approach in probability theory defines an event through the concept of an elementary event. In a mathematical model this concept can be taken as primitive: it has no definition and is characterized only by its properties. The most common probability-theoretic model in elementary cases is the urn model.

Urn model. Suppose there is an urn with balls of equal size. Our experiment: we randomly select a ball from this urn. Let Ω = {1, …, n} be the set of all balls in the urn. If we take out a ball i ∈ A, where A is some subset of the set Ω, we say that the event A occurs; if i ∉ A, we say that the event A does not occur. In this case the event A is identified with a subset A of the set of all possible outcomes of our experiment, or, as we will say, elementary events.

In an arbitrary probability-theoretic model, consider a basic set Ω = {ω}. Its elements ω are called elementary events, the basic set Ω is called the sample space, and certain subsets A ⊆ Ω are called events. Therefore, operations on events are set operations.

Example. A coin is tossed 3 times. The sample space consists of 8 points: Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT} = {ω1, …, ω8}. The event A = {HTT, THT, TTH, TTT}, "at least two tails occur", is the subset of Ω consisting of the points ω4, ω6, ω7, ω8.
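Since operations on events are set operations, the die and coin examples of this lecture can be reproduced with Python's built-in `set` type. This is only an illustrative sketch; the variable names are chosen here and do not come from the lecture.

```python
from itertools import product

# One toss of a die: the sample space and two events from the examples.
omega = set(range(1, 7))
A = {w for w in omega if w % 2 == 0}   # an even number of points occurs
B = {w for w in omega if w % 3 == 0}   # a multiple of three occurs

union = A | B            # A + B
intersection = A & B     # AB
difference = A - B       # A \ B
opposite_A = omega - A   # the event opposite to A

print(sorted(union), sorted(intersection), sorted(difference), sorted(opposite_A))

# Three tosses of a coin: enumerate the sample space in the order used in the text.
coin_space = ["".join(t) for t in product("HT", repeat=3)]
at_least_two_tails = [w for w in coin_space if w.count("T") >= 2]
print(coin_space)
print(at_least_two_tails)
```

The event "at least two tails occur" is obtained simply by filtering the sample space, which is exactly the identification of an event with a subset of Ω.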

QUESTIONS FOR SELF-EXAMINATION
1. We select m elements out of n. Which of the four types of samples of m out of n is the most numerous?
2. What type of sample is a 7-digit phone number? The word "digital"?
3. Three dice are tossed. How many possible outcomes are there?
4. Four coins are tossed. How many possible outcomes are there?
5. What is a certain event? An impossible event?
6. What are mutually opposite events?
7. What are the union and the intersection of events A and B?
8. What are mutually exclusive (disjoint) events?
9. What are the union and the intersection of the certain and impossible events?
10. What are the union and the intersection of two mutually opposite events?

Lecture content
1. Axiomatic definition of probability.
2. Classical definition of probability.
3. Geometric definition of probability.
4. Statistical definition of probability.

1. Axiomatic definition of probability. Let A0 be some system of sets A ⊆ Ω. Using the set-theoretic operations ∪, ∩ and \, we can build a new system of sets that are also events. By adding the certain and impossible events to these events, we get an algebra. Thus an algebra F is a system of sets such that:
1*. Ω, ∅ ∈ F (F contains the certain and impossible events).
2*. If A ∈ F, then Ā ∈ F (F is closed under complement).
3*. If A, B ∈ F, then A ∪ B, A ∩ B ∈ F (F is closed under union and intersection).
Thus, we will consider algebras as systems of events.

Examples. (i) {Ω, ∅} is the trivial algebra; (ii) {A, Ā, Ω, ∅} is the algebra generated by the event A; (iii) {A: A ⊆ Ω} is the set of all subsets of Ω.

We consider Example (iii) in more detail. Suppose, for example, that the sample space consists of two elements: Ω = {ω1, ω2}. Then the set of all subsets consists of $2^2 = 4$ elements: F = {{ω1}, {ω2}, Ω, ∅}. We show that F is an algebra. Indeed, Ω ∈ F, ∅ ∈ F (1*). Further, ∅ is the complement of Ω and vice versa; {ω1} is the complement of {ω2} and vice versa. Thus, F is closed under complements (2*).

Finally (here we have $C_4^2 = 6$ unions and 6 intersections of distinct sets; why is that, in your opinion?), ∅ ∪ {ω1} = {ω1}, ∅ ∪ {ω2} = {ω2}, ∅ ∪ Ω = Ω, {ω1} ∪ {ω2} = Ω, {ω1} ∪ Ω = Ω, {ω2} ∪ Ω = Ω, so F is closed under unions. Further, ∅ ∩ {ω1} = ∅, ∅ ∩ {ω2} = ∅, ∅ ∩ Ω = ∅, {ω1} ∩ {ω2} = ∅, {ω1} ∩ Ω = {ω1}, {ω2} ∩ Ω = {ω2}, so F is closed under intersections (3*).

Kolmogorov axioms. These are the axioms of probability theory introduced by A.N. Kolmogorov. Let the sample space Ω be a finite set of elements ω, which are called elementary events, and let F be the set of all subsets of Ω (the algebra of events). The elements of F are called events.
1. To every event A there corresponds a nonnegative number P(A), which is called the probability of the event A.
2. P(Ω) = 1 (normalization axiom).
3. If events A and B are disjoint, then P(A + B) = P(A) + P(B) (additivity axiom).

Def. A probability space is a triple (Ω, F, P), where Ω is the sample space, F is the algebra of events, and P is a probability satisfying axioms 1 to 3.

The simplest probability spaces are constructed as follows. Take an arbitrary finite set of elements Ω = {ω1, ω2, ..., ωn} and arbitrary nonnegative numbers p1, p2, …, pn such that p1 + p2 + ... + pn = 1. Here the algebra F is the set of all subsets of Ω: F = {A: A ⊆ Ω}. For any A ⊆ Ω, $A = \{\omega_{i_1}, \dots, \omega_{i_k}\}$, we set

$P(A) = p_{i_1} + \dots + p_{i_k}$.

We say that p1, p2, …, pn are the probabilities of the elementary events ω1, ω2, ..., ωn respectively.

2. Classical definition of probability (discrete uniform probability law). In the special case when Ω = {ω1, ..., ωn} and pi = 1/n, i = 1, ..., n, we have for the event $A = \{\omega_{i_1}, \dots, \omega_{i_k}\}$:

$P(A) = p_{i_1} + \dots + p_{i_k} = k/n$,

or

$P(A) = |A| / |\Omega|$,

where |Ω| is the total number of possible outcomes (the number of points in Ω) and |A| is the number of favorable outcomes (the number of points in A). Therefore, we use the classical definition of probability when we have an experiment with a finite number of outcomes and all outcomes are equally likely.

Note. It is obvious that 0 ≤ P(A) ≤ 1 for any event A.

Example. A single die is tossed. Using the classical definition of probability, find the probability of the event A = {an even number of points occurs}. We have Ω = {1, 2, 3, 4, 5, 6}, which contains 6 elementary events (all possible outcomes), and A = {2, 4, 6}, which contains 3 elementary events (favorable outcomes). By the classical definition of probability, P(A) = |A| / |Ω| = 3/6 = 1/2.

Note. Throwing, for example, a truncated quadrangular pyramid does not fit the classical scheme, because its outcomes are not equally likely.

3. Geometric definition of probability. Early in the development of probability theory the classical definition, which is based on a finite number of equally likely outcomes, was found to be insufficient. Even then, specific examples led to a modification of the definition and to a concept of probability for cases with an infinite set of possible outcomes. At the same time, the notion of "equal possibility" of all outcomes continued to play the major role.

Suppose there is some domain G (for example, on the plane) that contains another domain g. We throw a point into the domain G at random. It is required to find the probability of the event A = {the point falls into the domain g}. The thrown point can land in any part of the domain G, and the probability of landing in some part of G is proportional to the area of this part and does not depend on its location and shape. Thus, by definition, the probability that a point thrown at random into the domain G falls into the domain g is equal to

P(A) = S(g)/S(G),

where S denotes area.

[Figure: the domain g inside the domain G.]

Example (probability of two people meeting). Two people A and B agree to meet at a given place between 10 and 11 p.m. The one who comes first waits for 20 minutes and leaves if the other one does not come. What is the probability that they will meet?

Solution. We have the event C = {A and B meet}. Let x be the moment of the arrival of A and y the moment of the arrival of B (in minutes after 10 p.m.). Then the domain G is the square G = [0, 60] × [0, 60]. The domain g is the solution set of the inequality

|x - y| ≤ 20.

Let us represent x and y as Cartesian coordinates on the plane.

[Figure: the square 0 ≤ x ≤ 60, 0 ≤ y ≤ 60 in the (x, y) plane, with the band |x - y| ≤ 20 shaded.]

The part of the square outside the band consists of two right triangles with legs of length 40, whose total area is $40^2$. The probability of the event C is equal to the ratio of the area of the shaded part to the total area of the square:

$P(C) = (60^2 - 40^2)/60^2 = 5/9$.

4. Statistical definition of probability. Long-term observation of the occurrence or non-occurrence of an event A in a large number of independent experiments performed under the same conditions shows, in some cases, that the number of occurrences of A obeys stable laws. We will consider the frequency definition of probability, as it is currently the most widely held. To do this, we need to define two concepts: (i) the sample space S and (ii) the relative frequency.
(i) The sample space S is the collection (sometimes called the universe) of all possible outcomes. For a stochastic system or an experiment, the sample space is a set in which each outcome is one element.
(ii) The relative frequency is the proportion of observed outcomes in which the event A occurs. In an experiment with 100 observed outcomes in which A occurs 80 times, the relative frequency is 80/100 = 0.8.

The frequency approach is based on the notion of statistical regularity: in the long run, over replicates, the cumulative relative frequency of an event A stabilizes. The best way to illustrate this is an experiment that we run many times while measuring the cumulative relative frequency (crf). The crf is simply the relative frequency computed cumulatively over the replicates performed so far, each with sample space S. If the limit exists, we set

$P(A) = \lim_{n \to \infty} \mathrm{crf}_n(A)$.
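The meeting problem from the geometric definition gives a convenient experiment for watching the cumulative relative frequency stabilize. The sketch below assumes that both arrival times are independent and uniformly distributed on [0, 60] minutes, so that the two people meet when |x - y| ≤ 20; the function name, its parameters and the seed are illustrative choices, not part of the lecture.

```python
import random

def meeting_crf(n_trials, wait=20.0, window=60.0, seed=1):
    """Cumulative relative frequency of the event |x - y| <= wait after n_trials."""
    rng = random.Random(seed)  # fixed seed (arbitrary) for reproducibility
    met = 0
    for _ in range(n_trials):
        x = rng.uniform(0.0, window)  # arrival time of the first person
        y = rng.uniform(0.0, window)  # arrival time of the second person
        if abs(x - y) <= wait:
            met += 1
    return met / n_trials

exact = 1 - (40 / 60) ** 2  # geometric answer (60^2 - 40^2) / 60^2 = 5/9
for n in (100, 10_000, 200_000):
    print(n, meeting_crf(n), exact)
```

For growing n the simulated frequency settles near 5/9, the value obtained from the ratio of areas.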

De Méré's paradox. The Frenchman de Méré, experimenting with dice, noticed that when three dice are thrown, a sum of 11 comes up more often than a sum of 12. In his view, however, these sums should have been equally likely. De Méré reasoned that there are 6 different ways to get the sum 11: {1+4+6, 1+5+5, 2+3+6, 2+4+5, 3+3+5, 3+4+4}, and also 6 different ways to get the sum 12: {1+5+6, 2+4+6, 2+5+5, 3+3+6, 3+4+5, 4+4+4}. The events A = {sum = 11} and B = {sum = 12} thus contain the same number of outcomes, so P(A) = P(B).

Pascal found de Méré's error: de Méré's outcomes were not equally likely. For example, the outcome 1+4+6 can be obtained in 6 ways (see "Permutations", Lecture 1): 1+4+6, 1+6+4, 4+1+6, 4+6+1, 6+1+4, 6+4+1. The outcome 3+3+5, on the other hand, can be obtained in only 3 ways: 3+3+5, 3+5+3, 5+3+3. The fact that the dice are different must be taken into account. We have:

|Ω| = 6 · 6 · 6 = 216, |A| = 6 + 3 + 6 + 6 + 3 + 3 = 27, |B| = 6 + 6 + 3 + 3 + 6 + 1 = 25.

So, P(A) = 27/216 > 25/216 = P(B).

Note. De Méré confused the concept of an event with the concept of an elementary event!

QUESTIONS FOR SELF-EXAMINATION
1. What is a sample space?
2. If we have a finite sample space, then what is an event?
3. What is the difference between an elementary event and an event?
4. Formulate the Kolmogorov axioms.
5. What kind of outcomes must an experiment have for the classical definition of probability to apply?
6. What double inequality holds for the probability?
7. Formulate the classical definition of probability.
8. Formulate the geometric definition of probability.
9. Formulate the statistical definition of probability.
10. What is the mistake of de Méré?
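The counts in de Méré's paradox from this lecture can be verified by brute-force enumeration of all ordered outcomes of three distinguishable dice. This is an illustrative sketch; the variable names are chosen here.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes of three distinguishable dice: 6 * 6 * 6 elementary events.
outcomes = list(product(range(1, 7), repeat=3))

count_11 = sum(1 for o in outcomes if sum(o) == 11)
count_12 = sum(1 for o in outcomes if sum(o) == 12)

# De Méré counted unordered combinations, which are not equally likely.
unordered_11 = {tuple(sorted(o)) for o in outcomes if sum(o) == 11}
unordered_12 = {tuple(sorted(o)) for o in outcomes if sum(o) == 12}

p_11 = Fraction(count_11, len(outcomes))
p_12 = Fraction(count_12, len(outcomes))

print(len(outcomes), count_11, count_12)      # sizes of Omega, A and B
print(len(unordered_11), len(unordered_12))   # de Méré's counts
print(p_11, p_12)
```

Both sums have the same number of unordered combinations, but the numbers of elementary events differ, which is exactly Pascal's point.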

Lecture content
1. Additive rule.
2. Conditional probability.
3. Product rule (multiplicative rule).

1. Additive rule. Consider a corollary of the Kolmogorov axioms. Axiom 3 (the additivity axiom) can be generalized by induction to an arbitrary number n of disjoint events. If the events A1, A2, ..., An are mutually exclusive (Ai ∩ Aj = ∅ for i ≠ j; i, j = 1, 2, ..., n), then the probability of their union (sum) is equal to the sum of their probabilities:

P(A1 + A2 + ... + An) = P(A1) + P(A2) + ... + P(An).    (3.1)

Formula (3.1) is the addition formula for mutually exclusive events. Notice that if the events A1, A2, ..., An are mutually exclusive, then the event A1 + A2 + ... + An means that exactly one of the events A1, A2, ..., An occurs. For example, the event A1 + A2 + A3 (n = 3) is the occurrence of A1 or A2 or A3.

Def. A group of events A1, A2, ..., An with P(Ai) > 0 (i = 1, 2, …, n) is called a complete group of events (CGE) if:
1) they are mutually exclusive: Ai ∩ Aj = ∅ (i ≠ j; i, j = 1, 2, ..., n);
2) one (and only one) of them necessarily occurs: A1 + A2 + … + An = Ω.

Example. A die is tossed. The events A = {"1" or "3" occurs}, B = {an even number occurs}, C = {"5" occurs} form a CGE.

Assertion 1. The sum of the probabilities of the events of a CGE is equal to one: P(A1) + P(A2) + … + P(An) = 1.
Indeed, let A1, A2, ..., An be a CGE, i.e. these events are mutually exclusive and their sum is the certain event Ω. Then, using formula (3.1), we have: P(A1) + P(A2) + … + P(An) = P(A1 + A2 + … + An) = P(Ω) = 1 (the last equality is Kolmogorov axiom 2).

Mutually complementary events. There is another definition of mutually complementary events (the first one, under the name "mutually opposite events", was given in Lecture 1) that uses the concept of a CGE.

Def. Two events A and Ā are called mutually complementary if they form a CGE.

Assertion 2. The sum of the probabilities of mutually complementary events is equal to 1:

P(A) + P(Ā) = 1.    (3.2)

Indeed, by definition, mutually complementary events form a CGE, and the sum of the probabilities of the events forming a CGE is equal to 1 according to Assertion 1.

Note. Sometimes, when solving problems, it is convenient to find the probability of the complementary event first and then to find the probability of the event itself by (3.2).

Example. There are 10 balls in an urn, 4 of them white. Three balls are randomly taken out (without replacement). Find the probability of the event A = {at least one of them is white}.

The occurrence of the event A means the occurrence of exactly one of the events A1 = {one ball is white and two are not white}, A2 = {two balls are white and one is not white}, A3 = {all three balls are white}, i.e. A = A1 + A2 + A3. The events A1, A2, A3 are mutually exclusive, so according to formula (3.1) we have P(A1 + A2 + A3) = P(A1) + P(A2) + P(A3). Further, we use combinatorial formulas:

$P(A_1) = C_4^1 C_6^2 / C_{10}^3 = 1/2$,
$P(A_2) = C_4^2 C_6^1 / C_{10}^3 = 3/10$,
$P(A_3) = C_4^3 / C_{10}^3 = 1/30$,

and obtain P(A) = 1/2 + 3/10 + 1/30 = 5/6.

Now let us solve this problem by applying formula (3.2). We have Ā = {all three balls are not white}. The probability of this event is

$P(\bar{A}) = C_6^3 / C_{10}^3 = 1/6$.

Applying formula (3.2), we get: P(A) = 1 - P(Ā) = 1 - 1/6 = 5/6.
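Both computations for the urn example can be checked with exact fractions. The sketch below uses `math.comb` for the binomial coefficients; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

total = comb(10, 3)  # all ways to draw 3 balls out of 10

# Direct way: split A into the mutually exclusive events A1, A2, A3 and use (3.1).
p_a1 = Fraction(comb(4, 1) * comb(6, 2), total)  # one white, two non-white
p_a2 = Fraction(comb(4, 2) * comb(6, 1), total)  # two white, one non-white
p_a3 = Fraction(comb(4, 3), total)               # three white
p_direct = p_a1 + p_a2 + p_a3

# Complement way: formula (3.2) with the event "no white ball among the three".
p_complement = 1 - Fraction(comb(6, 3), total)

print(p_a1, p_a2, p_a3, p_direct, p_complement)
```

The complement route needs a single binomial ratio instead of three, which is why it is often the shorter path for "at least one" events.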

Thus, we got the same result in a more economical way.

Let us generalize formula (3.1) to arbitrary events. Let A and B be two arbitrary events (possibly joint, i.e. AB ≠ ∅). We have:

A + B = A + (B \ AB), B = AB + (B \ AB),

where the terms on the right-hand sides of both equations are disjoint events. Then, according to Kolmogorov axiom 3, we have:

P(A + B) = P(A + (B \ AB)) = P(A) + P(B \ AB), P(B) = P(AB) + P(B \ AB),

which implies

P(A + B) = P(A) + P(B) - P(AB),

i.e. the probability of the union of two events is the sum of their probabilities minus the probability of their joint occurrence. The latter formula is the addition formula for two arbitrary events. It is generalized by induction to any finite number n of events:

$P\Big(\bigcup_{i=1}^{n} A_i\Big) = \sum_{i=1}^{n} P(A_i) - \sum_{1 \le i < j \le n} P(A_i A_j) + \sum_{1 \le i < j < k \le n} P(A_i A_j A_k) - \dots + (-1)^{n+1} P\Big(\bigcap_{i=1}^{n} A_i\Big)$.
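The general addition formula can be checked numerically on a small equiprobable sample space. The sketch below uses one toss of a die with the events A (even number) and B (multiple of three) from Lecture 1, plus a third event C that is introduced here purely for illustration; the helper functions are likewise hypothetical names.

```python
from fractions import Fraction
from itertools import combinations

def prob(event, omega):
    """Classical probability |event| / |omega| on an equiprobable sample space."""
    return Fraction(len(event), len(omega))

def union_prob_by_inclusion_exclusion(events, omega):
    """Alternating sum over all intersections of 1, 2, ..., n of the events."""
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        sign = (-1) ** (r + 1)
        for group in combinations(events, r):
            common = set.intersection(*group)  # joint occurrence of the chosen events
            total += sign * prob(common, omega)
    return total

omega = set(range(1, 7))
A = {2, 4, 6}   # an even number occurs
B = {3, 6}      # a multiple of three occurs
C = {1, 2}      # hypothetical third event, assumed only for this check

by_formula = union_prob_by_inclusion_exclusion([A, B, C], omega)
directly = prob(A | B | C, omega)
print(by_formula, directly)
```

For two events the same function reduces to P(A) + P(B) - P(AB).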

2. Conditional probability. The probability P(A), as the degree of objective possibility of the occurrence of event A, is meaningful under a specific complex of conditions. If the conditions change, the probability of event A can also change. Let us add a new condition to the complex of conditions under which the probability P(A) is studied. The probability that an event A occurs when it is known that some event B (with P(B) > 0) has occurred is called the conditional probability of A and is denoted by P(A/B).

The symbol P(A/B) is usually read "the probability that A occurs given that B occurs" or simply "the probability of A, given B". Strictly speaking, the unconditional (regular) probability P(A) is also conditional, because it is obtained under a specific complex of conditions. The concept of conditional probability is one of the fundamental working tools of probability theory. It appears when we replace the sample space Ω by its nonempty subset B.

Example. Suppose that a group consists of N students, of whom NA practice swimming and NB play chess (a student may do both). One student is randomly selected from the group. Consider the following events: A = {this student practices swimming}, B = {this student plays chess}. It may happen that we are interested not in all N students, but only in those who play chess. Suppose we want to calculate the probability that the selected student practices swimming under the additional condition that he also plays chess. Denote by NAB the number of students who both swim and play chess. By the classical definition of probability, we have:

P(B) = NB / N,    (3.3)

P(AB) = NAB / N.    (3.4)

The probability of A, given B, is equal to

P(A/B) = NAB / NB.    (3.5)

In other words, this probability is the proportion of students engaged in both swimming and chess among the students engaged in chess.

By dividing the numerator and denominator of the right-hand side of (3.5) by N, we obtain (see (3.3) and (3.4)):

$$P(A/B) = \frac{N_{AB}/N}{N_B/N} = \frac{P(AB)}{P(B)}.$$

Thus,

$$P(A/B) = \frac{P(AB)}{P(B)}, \tag{3.6}$$

where P(B) > 0. Similarly,

$$P(B/A) = \frac{P(AB)}{P(A)}, \tag{3.7}$$

where P(A) > 0. This example motivates the definitions given by (3.6) and (3.7).

3. Product rule (Multiplicative rule). It is a corollary of the conditional probability formula. If P(A) > 0 and P(B) > 0, then both equations (3.6) and (3.7) are equivalent to the so-called Product rule:

$$P(AB) = P(A)P(B/A) = P(B)P(A/B). \tag{3.8}$$

The Multiplicative rule is also applicable when at least one of the events A or B is an impossible event, since in this case, e.g., from P(A) = 0 it follows that P(A/B) = 0 and P(AB) = 0. For three events we have: P(ABC) = P(A)P(B/A)P(C/AB). Formula (3.8) can be generalized by induction to an arbitrary number n of events: P(A1A2…An) = P(A1) P(A2/A1) P(A3/A1A2) … P(An/A1A2…An-1).

EXAMPLE. Three cards are taken out (without replacement) from a well-shuffled pack of 52 cards. Find the probability that no “hearts” occur. Let Ai = {the ith card taken out of the pack is not a “heart”}, i = 1, 2, 3. We have P(A1A2A3) = P(A1) P(A2/A1) P(A3/A1A2) = (39/52) · (38/51) · (37/50) ≈ 0.414. N o t e. We can find this probability by the classical definition of probability: $P(A_1A_2A_3) = C_{39}^{3} / C_{52}^{3} = (37 \cdot 38 \cdot 39)/(50 \cdot 51 \cdot 52)$.

EXAMPLE. Three cards are taken out (without replacement) from an ordinary deck. Find the probability that exactly one “spade” occurs. Let Bi = {the ith card is a “spade”}, and let $\bar B_i$ denote the opposite event. The event of interest is the sum of three mutually exclusive events, so we use the additive rule (for mutually exclusive events) and the product rule: $P(B_1 \bar B_2 \bar B_3 + \bar B_1 B_2 \bar B_3 + \bar B_1 \bar B_2 B_3)$ = (13/52)(39/51)(38/50) + (39/52)(13/51)(38/50) + (39/52)(38/51)(13/50) = $C_3^1$ · (13/52)(39/51)(38/50) ≈ 0.436. N o t e. We can find this probability by the classical definition of probability: $C_{13}^{1} C_{39}^{2} / C_{52}^{3} \approx 0.436$.
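Both card examples can be verified with exact arithmetic. The sketch below simply evaluates the product-rule expressions and the classical (combinatorial) expressions from the text and compares them; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

# No hearts among three cards: product rule vs. classical definition.
no_hearts_product = Fraction(39, 52) * Fraction(38, 51) * Fraction(37, 50)
no_hearts_classical = Fraction(comb(39, 3), comb(52, 3))

# Exactly one spade: the spade can be the 1st, 2nd or 3rd card;
# each of the three mutually exclusive cases has the same probability.
one_spade_product = 3 * Fraction(13, 52) * Fraction(39, 51) * Fraction(38, 50)
one_spade_classical = Fraction(comb(13, 1) * comb(39, 2), comb(52, 3))

print(float(no_hearts_product), float(one_spade_product))
```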

QUESTIONS FOR SELF-EXAMINATION

1. What is the probability of the sum of pairwise disjoint events? 2. What is the complete group of events? Give an example. 3. What is the sum of probabilities of mutually opposite events? Why? 4. What is the probability of the sum of three arbitrary events? 5. Formulate the sum rule for n arbitrary events. 6. Write down the conditional probability formula. When is the conditional probability formula defined? 7. From what formula does the product rule follow? 8. Formulate the product rule for two arbitrary events. 9. Formulate the product rule for three arbitrary events. 10. Formulate the product rule for n arbitrary events.

Lecture content 1. Independence of events. 2. Total probability formula. 3. Bayes formulas.

1. Independence of events. D e f. An event A does not depend on an event B (P(B) > 0) if P(A/B) = P(A), i.e. the occurrence of B does not affect the probability of A. N o t e. Suppose P(A) > 0, P(B) > 0. If an event A does not depend on an event B (i.e. P(A/B) = P(A)) then, according to (3.8) from Lecture 3,

$$P(AB) = P(A)P(B/A) = P(B)P(A/B) = P(B)P(A), \tag{4.1}$$

which implies (we divide (4.1) by P(A)) that P(B/A) = P(B), i.e. B also does not depend on A. Thus, independence of events is mutual. By (4.1), we can define the independence of events A and B by the equality

$$P(AB) = P(A)P(B) \tag{4.2}$$

(the probability of the intersection of independent events is the product of their probabilities). N o t e. This definition is always applicable, including the case when at least one of the probabilities P(A), P(B) is zero. N o t e. The formula (4.2) is usually used for practical verification of the independence of events. Let us generalize the concept of independence to a set consisting of three events. D e f. Events A, B and C are called mutually independent if 1) they are pairwise independent, i.e. P(AB) = P(A)P(B), P(AC) = P(A)P(C), P(BC) = P(B)P(C); 2) P(ABC) = P(A)P(B)P(C). N o t e that pairwise independence does not imply mutual independence. COUNTEREXAMPLE (Bernstein). A regular tetrahedron is tossed. It is painted as follows: one face is red, another one is yellow, the third one is green and the fourth one is red-yellow-green (tricolor). Let R = {the tetrahedron fell on a face containing red colour}, Y = {the tetrahedron fell on a face containing yellow colour}, G = {the tetrahedron fell on a face containing green colour}. Show that the events R, Y and G are pairwise independent, but are not mutually independent. It is obvious that P(R) = P(Y) = P(G) = 1/2, P(RY) = P(RG) = P(YG) = 1/4,

which is equivalent to P(RY) = P(R)P(Y), P(RG) = P(R)P(G), P(YG) = P(Y)P(G), i.e. the events R, Y and G are pairwise independent. However, according to the definition of mutual independence of the events R, Y and G, we must have the equality P(RYG) = P(R)P(Y)P(G). We have: 1/8 = P(R)P(Y)P(G) ≠ P(RYG) = 1/4, i.e. R, Y and G are not mutually independent. N o t e. Other variations of this example exist. Let us generalize the definition of independence by induction to a set of n events. D e f. The events A1, A2, ..., An are mutually independent if 1) P(AiAj) = P(Ai)P(Aj) if 1 ≤ i < j ≤ n; 2) P(AiAjAk) = P(Ai)P(Aj)P(Ak) if 1 ≤ i < j < k ≤ n; … n-1) P(A1A2...An) = P(A1)P(A2)…P(An).

2. Total probability formula. It is used in cases where an event A can occur only when one of the events (called hypotheses) H1, H2, ..., Hn ⊆ Ω, which make up a complete group, has occurred. Total probability formula. Let {H1, H2, ..., Hn} be a CGE (complete group of events), i.e. mutually exclusive and exhaustive events: the Hi are pairwise disjoint and their union is Ω (see Lecture 3), and let P(Hi) > 0, i = 1, 2, …, n. If A is any event, then

$$P(A) = \sum_{i=1}^{n} P(H_i) P(A/H_i).$$

P r o o f. We have: A = AΩ = A(H1 + H2 + ... + Hn) = AH1 + AH2 + ... + AHn, where the summands of the last sum are mutually exclusive. Then, by the Sum rule for mutually exclusive events (see Lecture 3), P(A) = P(AH1) + P(AH2) + ... + P(AHn). By applying the Multiplicative rule (see Lecture 3) to the right-hand side of the last equality, we obtain:

$$P(A) = \sum_{i=1}^{n} P(H_i) P(A/H_i).$$

EXAMPLE. There were n white and m black balls in the urn. Then one ball (of unknown colour) was lost. Then one ball is taken out of the urn. What is the probability that it is white? We have: A = {it is white}. Hypotheses: H1 = {a white ball was lost}, H2 = {a black ball was lost}.

$$P(H_1) = \frac{n}{n+m}, \qquad P(H_2) = \frac{m}{n+m}.$$

N o t e. P(H1) + P(H2) = 1 (always, P(H1) + P(H2) + ... + P(Hn) = 1, because H1, H2, ..., Hn is a CGE). Conditional probabilities:

$$P(A/H_1) = \frac{n-1}{n+m-1}, \qquad P(A/H_2) = \frac{n}{n+m-1}.$$

Then

$$P(A) = \frac{n}{n+m} \cdot \frac{n-1}{n+m-1} + \frac{m}{n+m} \cdot \frac{n}{n+m-1} = \frac{n}{n+m}.$$

N o t e that the probability of taking out a white ball after the loss is equal to the probability of taking it out before the loss! If two, three, …, m + n − 1 balls are lost, then the probability of extracting a white ball after the loss will still be equal to the same probability before the loss. Try to explain why this happens.

3. Bayes formulas. Suppose H1, H2, …, Hn are the hypotheses (CGE) and A is some event with P(A) > 0. Then the conditional probability that the hypothesis Hi took place, if the event A was observed as a result of an experiment, can be calculated by the formula:

$$P(H_i/A) = \frac{P(H_i) P(A/H_i)}{\sum_{j=1}^{n} P(H_j) P(A/H_j)}, \quad i = 1, 2, \ldots, n.$$

P r o o f. We use the Product rule (see Lecture 3): P(HiA) = P(Hi)P(A/Hi) = P(A)P(Hi/A). It follows that

$$P(H_i/A) = \frac{P(H_i) P(A/H_i)}{P(A)}.$$

Then we use the Total probability formula for P(A):

$$P(H_i/A) = \frac{P(H_i) P(A/H_i)}{\sum_{j=1}^{n} P(H_j) P(A/H_j)}, \quad i = 1, 2, \ldots, n.$$

N o t e. The sum of the probabilities of the hypotheses after the occurrence of A is equal to the same sum before its occurrence, i.e. it is 1:

$$\sum_{i=1}^{n} P(H_i/A) = \frac{\sum_{i=1}^{n} P(H_i) P(A/H_i)}{\sum_{j=1}^{n} P(H_j) P(A/H_j)} = 1.$$

EXAMPLE (see picture).

[Picture: three roads H1, H2, H3 leading out of point O; point A.]

A tourist, who does not know the way, sets out from point O, goes along one of the three roads and gets to point A. Find the probability that he went along the ith road (i = 1, 2, 3). S o l u t i o n. We have: A = {the tourist gets to point A}. Hypotheses: H1 = {the tourist goes along the 1st road}, H2 = {the tourist goes along the 2nd road}, H3 = {the tourist goes along the 3rd road}. Obviously, P(H1) = P(H2) = P(H3) = 1/3.

Let us calculate the conditional probabilities: P(A/H1) = 1/3, P(A/H2) = 0, P(A/H3) = 1. Therefore, by the Total Probability formula, P(A) = (1/3 + 0 + 1)/3 = 4/9. Finally, by the Bayes formulas we have: P(H1/A) = P(H1)P(A/H1)/P(A) = 1/4, P(H2/A) = P(H2)P(A/H2)/P(A) = 0, P(H3/A) = P(H3)P(A/H3)/P(A) = 3/4. N o t e that (see picture) if the tourist gets to point A, then it is most likely that he chose the 3rd road, so P(H3/A) is the highest among the probabilities P(Hi/A).
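The tourist example can be reproduced mechanically. The sketch below is a minimal implementation of the Total probability and Bayes formulas for a finite list of hypotheses; the function name `bayes` and the list-based interface are illustrative assumptions.

```python
from fractions import Fraction


def bayes(priors, likelihoods):
    """Return P(A) and the list of posterior probabilities P(H_i / A).

    priors[i]      = P(H_i), the hypotheses forming a complete group of events;
    likelihoods[i] = P(A / H_i).
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]   # P(H_i) P(A / H_i)
    p_a = sum(joint)                                       # total probability formula
    return p_a, [j / p_a for j in joint]                   # Bayes formulas


priors = [Fraction(1, 3)] * 3
likelihoods = [Fraction(1, 3), Fraction(0), Fraction(1)]
p_a, posteriors = bayes(priors, likelihoods)
print(p_a, posteriors)
```

As the note on the sum of posterior probabilities predicts, the returned posteriors always add up to 1.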

QUESTIONS FOR SELF-EXAMINATION 1. Give the definition of the independence of two events, using conditional probability. 2. If event A does not depend on event B, then does it imply that B also does not depend on A? Explain the answer. 3. What formula of independence do we use in practice for checking the independence of the events A and B? 4. Give the definition of mutual independence for three events A, B, C. 5. What is the connection of the pairwise independence of events with a mutual independence? 6. Tell about Bernstein example. 7. What is Total Probability formula used for? 8. Formulate Total Probability formula. 9. What is the sum of probabilities of hypotheses? Why? 10. What is Bayes formula used for? Formulate Bayes formula.

Lecture content 1. Bernoulli scheme. 2. Bernoulli formula. 3. The most probable number of successes in n Bernoulli trials. 4. Polynomial scheme.

1. Bernoulli scheme. D e f. Bernoulli (or binomial) trials are a sequence of n independent random experiments, each of which has only two possible outcomes: "success (S)" (if an event A of interest to us occurs) and "failure (F)" (if A does not occur), where the probability of success is the same in every experiment. E x a m p l e s:
– Flipping a coin. In this case, e.g., "heads" is success and "tails" is failure. A fair coin has the probability of success equal to 0.5 by definition. In this case, there are exactly two outcomes.
– Dice tossing, where, e.g., a “6” is "success" and everything else is "failure". In this case, the event «“6”» consists of 1 outcome; the complementary event «not “6”» consists of 5 outcomes.
– Target shooting. E.g.: “success” is hitting the target, “failure” is a miss.
We have a theoretical probabilistic model corresponding to n Bernoulli trials. Suppose p = P(S), q = P(F), and ai is the outcome of the ith trial (ai = 1 if S occurs in the ith trial and ai = 0 if F occurs in the ith trial). The triple (Ω, 𝒜, P), where Ω = {ω: ω = (a1, a2, …, an), ai = 0; 1}; 𝒜 = {A: A ⊆ Ω}; and for every ω ∈ Ω

$$p(\omega) = p^{a_1 + a_2 + \ldots + a_n}\, q^{\,n - (a_1 + a_2 + \ldots + a_n)}, \quad p + q = 1, \tag{5.1}$$

is called the Bernoulli scheme.

E x a m p l e. A coin is tossed 10 times. Let S be an occurrence of tails. Then Ω = {ω: ω = (a1, a2, …, a10), ai = 0; 1}, where ai = 1 if tails occur and ai = 0 if heads occur; 𝒜 = {A: A ⊆ Ω};

$$p(\omega) = (1/2)^{a_1 + a_2 + \ldots + a_{10}} (1/2)^{10 - (a_1 + a_2 + \ldots + a_{10})}.$$

2. Bernoulli Formula. This formula, used with Bernoulli trials, gives the probability of obtaining exactly m successes in n Bernoulli trials. E.g., the probability of obtaining m defective products among n products (if we know the probability of a defect); the probability of m hits in n shots (if we know the probability of hitting), etc. Bernoulli Formula. The probability of obtaining exactly m successes in n Bernoulli trials is equal to

$$P_n(m) = C_n^m p^m q^{n-m}, \tag{5.2}$$

where p is the probability of success and q = 1 − p is the probability of failure. P r o o f. Suppose, e.g., that we have 4 Bernoulli trials (n = 4) and it is necessary to find the probability of 3 successes (m = 3). The event B = {3 successes in 4 Bernoulli trials} contains the following elementary events:

ω1 = {S in the 1st, 2nd, 3rd trial and F in the 4th trial} = {SSSF}; similarly,

ω2 = {SSFS}; ω3 = {SFSS}; ω4 = {FSSS}.

Note that we have $C_4^3 = \frac{4!}{3!\,1!} = 4$ elementary events. Then,

P(ω1) = P{SSSF} = $p^3 q^1$; P(ω2) = P(ω3) = P(ω4) = $p^3 q^1$.

Thus, the probabilities of the elementary events of the event B are the same and equal to $p^3 q^1$. Further, because the event B contains $C_4^3$ disjoint equally likely elementary events, its probability (by the Sum rule) is equal to $P(B) = C_4^3 p^3 q^1 = C_4^3 p^3 q^{4-3}$. Consider the general case. For an elementary event ω = (a1, a2, …, an) we have: ai = 1 if S occurs in the ith trial and ai = 0 if F occurs in the ith trial. Denote Bm = {ω: a1 + a2 + … + an = m}. This event means that m successes occur in n Bernoulli trials. If ω ∈ Bm then (see (5.1)) $p(\omega) = p^m q^{n-m}$, so $P(B_m) = P_n(m) = C_n^m p^m q^{n-m}$, where $C_n^m$ is the number of elementary events constituting the event Bm, i.e. the number of possible placements of m successes among n places. So, $P_n(m) = C_n^m p^m q^{n-m}$. E x a m p l e. A coin is tossed 10 times. Find the probability that tails occur exactly 5 times.

S o l u t i o n. We have: n = 10, m = 5, p = q = 1/2. Then by the formula (5.2)

$$P_{10}(5) = C_{10}^{5} (1/2)^5 (1/2)^5 = \frac{10!}{5!\,5!} (1/2)^{10} \approx 0.25.$$
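The Bernoulli formula (5.2) translates directly into code. The sketch below uses exact fractions and the standard-library binomial coefficient; the function name `bernoulli` is an illustrative choice.

```python
from fractions import Fraction
from math import comb


def bernoulli(n, m, p):
    """P_n(m) = C_n^m * p^m * q^(n-m), the probability of exactly m successes."""
    p = Fraction(p)
    q = 1 - p
    return comb(n, m) * p**m * q**(n - m)


p10_5 = bernoulli(10, 5, Fraction(1, 2))
print(p10_5, float(p10_5))

# The probabilities P_n(0), ..., P_n(n) form a complete group, so they sum to 1.
total = sum(bernoulli(10, m, Fraction(1, 2)) for m in range(11))
```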

3. The most probable number of successes in n Bernoulli trials. Consider the next example. A coin is tossed 4 times; m is the number of occurrences of tails. We have by the formula (5.2):

$P_4(0) = C_4^0 (1/2)^0 (1/2)^4 = 1/16$;
$P_4(1) = C_4^1 (1/2)^1 (1/2)^3 = 4/16$;
$P_4(2) = C_4^2 (1/2)^2 (1/2)^2 = 6/16$;
$P_4(3) = C_4^3 (1/2)^3 (1/2)^1 = 4/16$;
$P_4(4) = C_4^4 (1/2)^4 (1/2)^0 = 1/16$.

It is clear that there exist values of m (in this case one: m = 2) having the maximal probability Pn(m). D e f. The number m0 of occurrences of success in n Bernoulli trials is called the most probable number of successes if the probability Pn(m0) is the largest among the probabilities Pn(m), m = 0, 1, …, n:

$$P_n(m_0) \ge P_n(m), \quad m = 0, 1, \ldots, n. \tag{5.3}$$

We calculate m0. We have:

$$\frac{P_n(m+1)}{P_n(m)} = \frac{n!\; m!\,(n-m)!\; p^{m+1} (1-p)^{n-m-1}}{(m+1)!\,(n-m-1)!\; n!\; p^{m} (1-p)^{n-m}} = \frac{n-m}{m+1} \cdot \frac{p}{1-p}.$$

$P_n(m+1) > P_n(m)$ if $\frac{n-m}{m+1} \cdot \frac{p}{1-p} > 1$, i.e. m < np − q; (*)

$P_n(m+1) = P_n(m)$ if $\frac{n-m}{m+1} \cdot \frac{p}{1-p} = 1$, i.e. m = np − q;

$P_n(m+1) < P_n(m)$ if $\frac{n-m}{m+1} \cdot \frac{p}{1-p} < 1$, i.e. m > np − q. (**)

Thus, while m increases and stays below the value np − q, we always have Pn(m+1) > Pn(m), i.e. as m increases the probability Pn(m) increases. When m, while increasing, crosses the border np − q, the probability Pn(m) begins to decrease and decreases down to Pn(n). By (5.3), we always have Pn(m0 + 1) ≤ Pn(m0). This takes place when (see (**)) m0 ≥ np − q. On the other hand, by (5.3), we always have Pn(m0) ≥ Pn(m0 − 1). This takes place when (see (*)) m0 − 1 ≤ np − q, i.e. m0 ≤ np − q + 1 = np + p. Therefore, the number m0 must satisfy the double inequality

$$np - q \le m_0 \le np + p. \tag{5.4}$$

Thus, (5.4) defines an interval of length 1 (indeed, (np + p) − (np − q) = p + q = 1) which contains the integer number m0.
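The double inequality (5.4) can be compared with a direct search for the maximum of Pn(m). The sketch below is an illustration under the assumption 0 < p < 1; both function names are hypothetical helpers, and exact fractions are used so that ties between two equal probabilities are detected reliably.

```python
from fractions import Fraction
from math import ceil, comb, floor


def bernoulli(n, m, p):
    """P_n(m) by the Bernoulli formula (5.2)."""
    p = Fraction(p)
    return comb(n, m) * p**m * (1 - p)**(n - m)


def most_probable_by_inequality(n, p):
    """All integers m0 with np - q <= m0 <= np + p (formula (5.4))."""
    p = Fraction(p)
    q = 1 - p
    low, high = n * p - q, n * p + p
    return list(range(max(0, ceil(low)), min(n, floor(high)) + 1))


def most_probable_by_search(n, p):
    """All m for which P_n(m) is maximal, found by direct comparison."""
    probs = [bernoulli(n, m, p) for m in range(n + 1)]
    best = max(probs)
    return [m for m, pr in enumerate(probs) if pr == best]


for n in (4, 5):
    print(n, most_probable_by_inequality(n, Fraction(1, 2)),
          most_probable_by_search(n, Fraction(1, 2)))
```

When the ends of the interval (5.4) are integers, both ends are returned, which corresponds to the case of two most probable numbers of successes.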

If np + p is an integer, then there exist two most probable numbers of successes, which have the same probabilities (by definition (5.3)). E x a m p l e. A coin is tossed 5 times. Find the most probable number of occurrences of tails. S o l u t i o n. We have n = 5, p = q = 1/2, so by formula (5.4): 5/2 − 1/2 ≤ m0 ≤ 5/2 + 1/2, i.e. 2 ≤ m0 ≤ 3, and we have two most probable numbers of occurrences of tails: 2 and 3. Calculate their probabilities: $P_5(2) = C_5^2 (1/2)^2 (1/2)^3 = 10/32$ and $P_5(3) = C_5^3 (1/2)^3 (1/2)^2 = 10/32$. We have P5(2) = P5(3).

4. Polynomial scheme. Consider n independent trials with m possible outcomes 1, …, m in each trial, where P(ith outcome) = pi and p1 + … + pm = 1. Denote by P(n1, …, nm) the probability of the event A = {the 1st outcome occurs n1 times, the 2nd outcome occurs n2 times, …, the mth outcome occurs nm times}. T h e o r e m. For any n and any nonnegative integers n1, …, nm such that n1 + … + nm = n,

$$P(n_1, \ldots, n_m) = \frac{n!}{n_1! \ldots n_m!}\, p_1^{n_1} \ldots p_m^{n_m}.$$

P r o o f. Consider one elementary event (n1 of “1”, n2 of “2”, …, nm of “m”):

$$(\underbrace{1, \ldots, 1}_{n_1};\ \underbrace{2, \ldots, 2}_{n_2};\ \ldots;\ \underbrace{m, \ldots, m}_{n_m}).$$

It is a result of n experiments in which all favorable outcomes occur in a predetermined order. The probability of this result is equal to the product of probabilities $p_1^{n_1} \ldots p_m^{n_m}$. The remaining favorable outcomes differ only in the arrangement of the numbers 1, ..., m over n places. The number of such outcomes is the number of ways to arrange n1 of “1”, n2 of “2”, …, nm of “m” over n places. This number is

$$C_n^{n_1}\, C_{n-n_1}^{n_2}\, C_{n-n_1-n_2}^{n_3} \ldots C_{n-n_1-\ldots-n_{m-1}}^{n_m} = \frac{n!}{n_1! \ldots n_m!}.$$
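The formula of the polynomial scheme can be evaluated in the same way as the Bernoulli formula. The sketch below is a minimal illustration; the function name `polynomial_probability` and the die example are assumptions made for demonstration.

```python
from fractions import Fraction
from math import factorial


def polynomial_probability(counts, probs):
    """P(n_1, ..., n_m) = n! / (n_1! ... n_m!) * p_1^n_1 ... p_m^n_m."""
    n = sum(counts)
    coeff = factorial(n)
    for c in counts:
        coeff //= factorial(c)           # every intermediate quotient is an integer
    result = Fraction(coeff)
    for c, p in zip(counts, probs):
        result *= Fraction(p) ** c
    return result


# Illustrative example: a fair die is tossed 6 times;
# probability that every face occurs exactly once.
each_face_once = polynomial_probability([1] * 6, [Fraction(1, 6)] * 6)
print(each_face_once)

# With m = 2 outcomes the polynomial scheme reduces to the Bernoulli formula.
two_outcomes = polynomial_probability([5, 5], [Fraction(1, 2), Fraction(1, 2)])
```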

QUESTIONS FOR SELF-EXAMINATION

1. What are Bernoulli trials? 2. What are “p” and “q” in the Bernoulli scheme? What is p + q? 3. What is the Bernoulli formula used for? 4. Write down the Bernoulli formula. Explain all components. 5. What is the behaviour of the probability Pn(m) in Bernoulli trials? 6. Give the definition of the most probable number of successes in n Bernoulli trials. 7. What is the double inequality that determines the most probable number of successes in n Bernoulli trials? 8. What is the length of the segment in which the most probable number of successes in n Bernoulli trials lies? 9. In what case will there be two most probable numbers of successes in n Bernoulli trials? 10. What is P(n1, …, nm)?

Lecture content 1. Poisson approximate formula. 2. Local formula of Moivre-Laplace. 3. Integral formula of Moivre-Laplace.

1. Poisson approximate formula. Suppose we need to calculate the probability Pn(m) of m successes in n Bernoulli trials, where n is large, e.g., P500(300). By the Bernoulli formula, we have:

$$P_{500}(300) = C_{500}^{300}\, p^{300} q^{200} = \frac{500!}{300!\,200!}\, p^{300} q^{200}.$$

It is clear that in this case direct calculation by the Bernoulli formula is technically difficult, especially when p and q are fractional. Therefore, it is necessary to have a simple approximate formula to calculate the required probability for large n. Such asymptotic formulas are given by the Poisson theorem and by the local and integral theorems of Moivre-Laplace. T h e o r e m (Poisson). If the probability p of occurrence of some event A in each of n independent trials tends to zero (p → 0) as the number n of trials increases without limit (n → ∞), while the product np tends to a constant number λ (np → λ), then the probability Pn(m) that the event A will occur m times in n independent trials satisfies the limit equality

$$\lim_{n \to \infty} P_n(m) = \frac{\lambda^m}{m!}\, e^{-\lambda}. \tag{6.1}$$

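P r o o f. By the Bernoulli formula

$$P_n(m) = \frac{n(n-1)\ldots(n-m+1)}{m!}\, p^m (1-p)^n (1-p)^{-m},$$

or, given that $\lim_{n \to \infty} np = \lambda$, i.e. (if n is large enough) p ≈ λ/n,

$$P_n(m) \approx \frac{\lambda^m}{m!} \cdot 1 \cdot \Big(1 - \frac{1}{n}\Big) \ldots \Big(1 - \frac{m-1}{n}\Big) \Big(1 - \frac{\lambda}{n}\Big)^{n} \Big(1 - \frac{\lambda}{n}\Big)^{-m}.$$

Because

$$\lim_{n \to \infty} \Big(1 - \frac{1}{n}\Big) = \ldots = \lim_{n \to \infty} \Big(1 - \frac{m-1}{n}\Big) = 1,$$

$$\lim_{n \to \infty} \Big(1 - \frac{\lambda}{n}\Big)^{n} = \lim_{n \to \infty} \Big(\Big(1 - \frac{\lambda}{n}\Big)^{-n/\lambda}\Big)^{-\lambda} = e^{-\lambda}$$

and

$$\lim_{n \to \infty} \Big(1 - \frac{\lambda}{n}\Big)^{-m} = 1,$$

we obtain

$$\lim_{n \to \infty} P_n(m) = e^{-\lambda}\, \frac{\lambda^m}{m!}.$$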

Strictly speaking, the condition “p → 0 as n → ∞ so that np → λ” of the Poisson theorem contradicts the original assumption in the definition of Bernoulli trials, according to which the probability of occurrence of the event A is constant in all trials. However, if the probability p is constant and low, the number n of trials is large and the number λ = np is neither big nor small (we assume that p < 0.1, np ≤ 10), then the approximate formula of Poisson follows from the limit equality (6.1):

$$P_n(m) \approx e^{-\lambda}\, \frac{\lambda^m}{m!}.$$

E x a m p l e. Suppose we have a community of 500 people. Find the probabilities of the events (i = 0, 1, …, 500): Ai = {exactly i people have a birthday on January 1}. S o l u t i o n. We have 500 Bernoulli trials (n = 500) and i successes (m = i). A success is a birthday on the 1st of January, the probability of success is p = 1/365 (for simplicity, we assume that there are 365 days in a year), the probability of failure is q = 1 − p = 1 − 1/365 = 364/365. We have: i = 0: A0 = {none of the 500 people has a birthday on January 1}. By the Bernoulli formula

$$P_{500}(0) = C_{500}^{0} (1/365)^0 (364/365)^{500} \approx 0.25366.$$

By the Poisson formula (λ = np = 500/365):

$$P_{500}(0) \approx \frac{(500/365)^0}{0!}\, e^{-500/365} \approx 0.25414.$$

Further calculations are summarized in the table:

i      P500(i) (by the Bernoulli formula)      P500(i) (by the Poisson formula)
0      0.25366                                 0.25414
1      0.34844                                 0.34814
2      0.23883                                 0.23845
3      0.10892                                 0.10888
4      0.03718                                 0.03729
5      0.01013                                 0.01022
6      0.00230                                 0.00233
…      …                                       …

If i ≥ 7 then the probabilities P500(i) are small enough and we ignore them.
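The two columns of the table can be recomputed with a few lines of code. The sketch below uses floating-point arithmetic with the values n = 500 and p = 1/365 from the example; the helper names are illustrative.

```python
from math import comb, exp, factorial

n, p = 500, 1 / 365
lam = n * p                      # parameter of the Poisson formula, lambda = np


def bernoulli(m):
    """Exact probability P_n(m) by the Bernoulli formula."""
    return comb(n, m) * p**m * (1 - p)**(n - m)


def poisson(m):
    """Approximate probability by the Poisson formula."""
    return exp(-lam) * lam**m / factorial(m)


for i in range(7):
    print(f"{i}  {bernoulli(i):.5f}  {poisson(i):.5f}")
```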

The table shows that the use of the Poisson formula instead of the Bernoulli formula gives an error in the third decimal place, which is a good approximation in this case. N o t e. The most probable number of birthdays on January 1 is m0 = 1 (P500(1) is the maximal probability among P500(i)). The use of formula (5.4) (see Lecture 5) gives the same result: 500 · 1/365 − (1 − 1/365) ≤ m0 ≤ 500 · 1/365 + 1/365, i.e. 0.37260 ≤ m0 ≤ 1.37260. This interval contains only one integer number m0 = 1, which coincides with the one obtained from the table.

2. Local formula of Moivre-Laplace. The local theorem of Moivre-Laplace gives an asymptotic formula which allows us to find the probability of occurrence of m successes in n Bernoulli trials if n is sufficiently large. This formula was found in 1730 by Moivre for the special case where the probability p of success in each trial is equal to 1/2; in 1783 Laplace generalized the Moivre formula to any p such that 0 < p < 1. The local formula of Moivre-Laplace, in the form used in the example below, is

$$P_n(m) \approx \frac{1}{\sqrt{npq}}\, \varphi\Big(\frac{m - np}{\sqrt{npq}}\Big), \tag{6.2}$$

where $\varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$ (practically it can be considered that φ(x) ≈ 0 when |x| > 4).

E x a m p l e. Find the probability that there are exactly 100 boys among 200 children admitted to the first class. S o l u t i o n. We have: n = 200, m = 100, p = q = 1/2. Then by the formula (6.2) we obtain:

$$P_{200}(100) \approx \frac{1}{\sqrt{200 \cdot 1/2 \cdot 1/2}}\, \varphi\Big(\frac{100 - 200 \cdot 1/2}{\sqrt{200 \cdot 1/2 \cdot 1/2}}\Big) = \frac{1}{5\sqrt{2}}\, \varphi(0) \approx 0.056.$$

Therefore, the probability of our event is 0.056 (note that it is sufficiently small).
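The quality of the local approximation in this example can be assessed by comparing it with the exact Bernoulli probability. The sketch below assumes the standard form of the local formula, $P_n(m) \approx \varphi(x)/\sqrt{npq}$ with $x = (m - np)/\sqrt{npq}$ and $\varphi(x) = e^{-x^2/2}/\sqrt{2\pi}$; the function names are illustrative.

```python
from math import comb, exp, pi, sqrt


def phi(x):
    """Density of the standard normal law: exp(-x^2 / 2) / sqrt(2 * pi)."""
    return exp(-x * x / 2) / sqrt(2 * pi)


def local_moivre_laplace(n, m, p):
    """Approximate P_n(m) by the local Moivre-Laplace formula."""
    q = 1 - p
    s = sqrt(n * p * q)
    return phi((m - n * p) / s) / s


approx = local_moivre_laplace(200, 100, 0.5)
exact = comb(200, 100) / 2**200      # Bernoulli formula with p = q = 1/2
print(f"{approx:.5f} {exact:.5f}")
```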

3. Integral formula of Moivre-Laplace. Now suppose that in the conditions of the previous example we need to find the probability that the number of boys will be from 90 to 110 (inclusive). By the Sum rule (for mutually exclusive events) this probability is P(90 ≤ m ≤ 110) = P200(90) + P200(91) + … + P200(110). In principle, it is possible to calculate each term by the local Moivre-Laplace formula and then find the sum, but the calculations are cumbersome. In such cases, we use the so-called Integral Moivre-Laplace theorem. Integral theorem of Moivre-Laplace. If the probability p of occurrence of an event A in each of n independent trials is constant and differs from 0 and 1, then the probability Pn(m1 ≤ m ≤ m2) that the event A will occur from m1 up to m2 times in n trials is approximately equal to

$$P_n(m_1 \le m \le m_2) \approx \Phi\Big(\frac{m_2 - np}{\sqrt{npq}}\Big) - \Phi\Big(\frac{m_1 - np}{\sqrt{npq}}\Big), \tag{6.3}$$

where

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt$$

is the Laplace function.
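Formula (6.3) is easy to evaluate without tables, since the Laplace function can be expressed through the error function: $\Phi(x) = \frac{1}{2}\big(1 + \mathrm{erf}(x/\sqrt{2})\big)$. The sketch below applies it to the problem about 90 to 110 boys among 200 children posed above and, for comparison, also sums the exact Bernoulli probabilities; the function names are illustrative.

```python
from math import comb, erf, sqrt


def laplace_function(x):
    """Ф(x) = (1 / sqrt(2*pi)) * integral from -infinity to x of exp(-t^2 / 2) dt."""
    return 0.5 * (1 + erf(x / sqrt(2)))


def integral_moivre_laplace(n, m1, m2, p):
    """Approximate P_n(m1 <= m <= m2) by formula (6.3)."""
    q = 1 - p
    s = sqrt(n * p * q)
    return laplace_function((m2 - n * p) / s) - laplace_function((m1 - n * p) / s)


def exact_sum(n, m1, m2, p):
    """Exact P_n(m1 <= m <= m2) as a sum of Bernoulli probabilities."""
    q = 1 - p
    return sum(comb(n, m) * p**m * q**(n - m) for m in range(m1, m2 + 1))


approx = integral_moivre_laplace(200, 90, 110, 0.5)
exact = exact_sum(200, 90, 110, 0.5)
print(f"{approx:.4f} {exact:.4f}")
```

The two values differ in the second decimal place, which gives a feeling for the size of the "small error" of the integral formula at npq = 50.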

Formula (6.3) is called the Integral formula of Moivre-Laplace. The greater n, the more precise it is: if npq ≥ 20 then (6.3), as well as (6.2), gives a small error. The function Ф(x) is tabulated. To use the tables, it is necessary to know the following properties of the function Ф(x): (i) Ф(x) + Ф(−x) = 1; (ii) Ф(x) increases monotonically; (iii) Ф(x) → 1 when x → +∞ (practically it can be considered that Ф(x) ≈ 1 when x > 4). E x a m p l e. Find the probability that there are 90 to 110 boys among 200 children admitted to the first class. S o l u t i o n. We have: n = 200, m1 = 90, m2 = 110, p = q = 1/2. Then by formula (6.3) we obtain:

$$P_{200}(90 \le m \le 110) \approx \Phi\Big(\frac{110 - 200 \cdot 1/2}{\sqrt{200 \cdot 1/2 \cdot 1/2}}\Big) - \Phi\Big(\frac{90 - 200 \cdot 1/2}{\sqrt{200 \cdot 1/2 \cdot 1/2}}\Big) = \Phi(\sqrt{2}) - \Phi(-\sqrt{2}) = 2\Phi(\sqrt{2}) - 1 \approx 0.843.$$

Therefore, the probability of our event is approximately 0.843 (note that it is sufficiently large).

QUESTIONS FOR SELF-EXAMINATION 1. What approximate formulas of the Bernoulli scheme do you know? 2. In what cases is the approximate formula of Poisson used instead of the Bernoulli formula? 3. What is the contradiction between the conditions of the Poisson theorem and the original assumption of the definition of Bernoulli trials? 4. Write down the approximate Poisson formula and explain its components. 5. Under what restrictions on p and n does the approximate Poisson formula give a sufficiently good approximation to the Bernoulli formula? 6. In what cases are the limit formulas of Moivre-Laplace used? 7. In what cases is the Local Moivre-Laplace formula used instead of the Bernoulli formula? 8. Under what restrictions on n does the Local Moivre-Laplace formula have a small error? 9. What is the Integral Moivre-Laplace formula used for? 10. In what cases does the Integral Moivre-Laplace formula have a small error?

Lecture content 1. Definitions. 2. Two-dimensional random variables. 3. Independence of random variables. 4. Operations on random variables.

1. Definitions. The concept of a random variable is one of the basic concepts of probability theory. Let us consider some e x a m p l e s of random variables. 1. The number of boys among 10 newborns. 2. The velocity of a gas molecule. 3. Life time. 4. Daily store revenue. D e f. Let Ω = {ω1, ω2, …, ωn} be some finite sample space. A numerical function ξ = ξ(ω), defined on Ω, is called a (simple) random variable (in abbreviated form, RV). Let us consider a RV ξ taking (different) values x1, x2, …, xm:

ξ = {x1, x2, …, xm}. D e f. The distribution law of a RV ξ is any mapping between the possible values xi of this RV and their probabilities pi: P(ξ = xi) = pi, i = 1, 2, ..., m. R e m a r k. Always p1 + p2 + … + pm = 1, because the events {ξ = xi} form a CGE, i.e. they are exhaustive mutually exclusive events (Why?), and the sum of the probabilities of events forming a CGE is 1 (see Lecture 3, assertion 1).

D e f. A mapping between the possible values of a RV and their probabilities, given in the tabular form

ξ      x1      x2      …      xm
p      p1      p2      …      pm

is called a distribution series of the RV ξ. E x a m p l e. A fair die is tossed. The RV ξ is the number of points that come up. Find the distribution law and the distribution series of ξ. S o l u t i o n. The values of this RV are 1, 2, …, 6; besides, P(ξ = 1) = P(ξ = 2) = … = P(ξ = 6) = 1/6. We have:

ξ      1      2      3      4      5      6
p      1/6    1/6    1/6    1/6    1/6    1/6

D e f. Let ξ be a RV and x be any real number. The probability that ξ takes a value which is less than or equal to x is called the distribution function (D.f.) of ξ (denoted by Fξ(x) or just F(x)):

$$F_\xi(x) = P\{\xi \le x\} = \sum_{i:\, x_i \le x} p_i.$$

E x a m p l e. Find the distribution function of the RV ξ from the previous example. S o l u t i o n.

$$F_\xi(x) = \begin{cases} 0 & \text{if } x < 1, \\ 1/6 & \text{if } 1 \le x < 2, \\ 2/6 & \text{if } 2 \le x < 3, \\ 3/6 & \text{if } 3 \le x < 4, \\ 4/6 & \text{if } 4 \le x < 5, \\ 5/6 & \text{if } 5 \le x < 6, \\ 1 & \text{if } x \ge 6. \end{cases}$$
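The definition of the distribution function as a sum of the probabilities pi over all xi ≤ x can be written as a short function. The sketch below builds F for the die example; the helper name `distribution_function` is an illustrative assumption.

```python
from fractions import Fraction


def distribution_function(values, probs):
    """Return F(x) = P{xi <= x} for a simple RV with the given distribution series."""
    pairs = list(zip(values, probs))

    def F(x):
        # Sum the probabilities of all possible values not exceeding x.
        return sum((p for v, p in pairs if v <= x), Fraction(0))

    return F


F = distribution_function(range(1, 7), [Fraction(1, 6)] * 6)
for x in (0.5, 1, 3.7, 6, 10):
    print(x, F(x))
```

The difference F(x2) − F(x1) then gives the probability that the RV falls into the half-open interval from x1 (exclusive) to x2 (inclusive).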

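Graphically: [Figure: graph of the distribution function Fξ(x), a step function with jumps of 1/6 at the points x = 1, 2, …, 6.]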

Properties of the distribution function: 1. 0 ≤ F(x) ≤ 1. This is obvious, because the distribution function is a probability. 2. P{x1 < ξ ≤ x2} = F(x2) − F(x1). We have {ξ ≤ x2} = {ξ ≤ x1} + {x1 < ξ ≤ x2}, where the events on the right-hand side are disjoint. By the formula for the sum of disjoint events, P{ξ ≤ x2} = P{ξ ≤ x1} + P{x1 < ξ ≤ x2}, i.e., by the definition of the D.f., F(x2) = F(x1) + P{x1 < ξ ≤ x2}, which is equivalent to 2. 3. The distribution function does not decrease, i.e. F(x1) ≤ F(x2) if x1 ≤ x2.

Suppose that x1 ≤ x2. We have from the proof of property 2: F(x2) = F(x1) + P{x1 < ξ ≤ x2} ≥ F(x1), because probability is non-negative. 4. The distribution function is continuous on the right. 5. F(x) → 0 when x → −∞; F(x) → 1 when x → +∞.

2. Two-dimensional random variables. Suppose that two random variables ξ = ξ(ω) and η = η(ω) are given on the same (finite) sample space Ω = {ω}. In this case it is said that a two-dimensional random variable (ξ, η) or a two-dimensional random vector ζ = (ξ, η) is given. The distribution of (ξ, η) is:

ξ \ η      y1      y2      …      yk
x1         p11     p12     …      p1k
x2         p21     p22     …      p2k
…          …       …       …      …
xm         pm1     pm2     …      pmk

where x1, x2, …, xm are the possible values of ξ, y1, y2, …, yk are the possible values of η, pij = P(ξ = xi, η = yj), and $\sum_{i,j} p_{ij} = 1$.

The function of two variables

$$F_{\xi,\eta}(x, y) = P(\xi \le x,\ \eta \le y) = \sum_{i:\, x_i \le x}\ \sum_{j:\, y_j \le y} p_{ij},$$

which is defined for all real numbers x and y, is called the joint distribution function of (ξ, η).

3. Independence of RVs. Suppose that two RVs ξ = {x1, x2, …, xm} and η = {y1, y2, …, yk} are given. These RVs are called independent if for any i = 1, 2, …, m and j = 1, 2, …, k

$$P\{\xi = x_i,\ \eta = y_j\} = P\{\xi = x_i\}\, P\{\eta = y_j\}. \tag{7.1}$$

E x a m p l e. Two coins are tossed. Let the RV ξ be the number of tails which occur in the 1st coin tossing; η is the number of tails which occur in the 2nd coin tossing. Let us verify the independence of ξ and η. S o l u t i o n. The corresponding sample space is Ω = {HH, HT, TH, TT}. Further, P{ξ = 0, η = 0} = 1/4 = 1/2 · 1/2 = P{ξ = 0}P{η = 0}; P{ξ = 0, η = 1} = 1/4 = 1/2 · 1/2 = P{ξ = 0}P{η = 1}; P{ξ = 1, η = 0} = 1/4 = 1/2 · 1/2 = P{ξ = 1}P{η = 0}; P{ξ = 1, η = 1} = 1/4 = 1/2 · 1/2 = P{ξ = 1}P{η = 1}. So (7.1) holds, hence ξ and η are independent. D e f. Let ξ1, ξ2, …, ξr be some set of RVs with values in a (finite) set X ⊆ R. These RVs are called (mutually) independent if for any x1, x2, …, xr ∈ X P{ξ1 = x1, ξ2 = x2, …, ξr = xr} = P{ξ1 = x1} P{ξ2 = x2} … P{ξr = xr}.

4. Operations on random variables. Suppose that two RVs ξ and η are given on the same (finite) sample space (pi = P{ξ = xi}, qj = P{η = yj}):

ξ      x1      x2      …      xm
pi     p1      p2      …      pm

η      y1      y2      …      yk
qj     q1      q2      …      qk

D e f. The product cξ (c = const) is the RV with the distribution series

cξ     cx1     cx2     …      cxm
pi     p1      p2      …      pm

D e f. The power ξ^n is the RV with the distribution series

ξ^n    x1^n    x2^n    …      xm^n
pi     p1      p2      …      pm

D e f. The sum ξ + η (difference ξ − η, product ξη, quotient ξ/η (if η ≠ 0)) of RVs ξ and η is the RV with the values xi + yj (xi − yj, xiyj, xi/yj), i = 1, 2, …, m, j = 1, 2, …, k and probabilities pij = P{ξ = xi, η = yj}. R e m a r k. If ξ and η are independent then (by definition (7.1)) pij = P{ξ = xi, η = yj} = P{ξ = xi}P{η = yj} = piqj.
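For independent RVs the definition above gives a direct way to build the distribution series of a sum, difference or product. The sketch below represents a distribution series as a dictionary {value: probability} (an illustrative choice) and adds up the probabilities piqj of coinciding values, so that the resulting series lists each value once.

```python
import operator
from collections import defaultdict
from fractions import Fraction


def combine(dist_x, dist_y, op):
    """Distribution series of op(xi, eta) for INDEPENDENT RVs xi and eta.

    Each pair (x_i, y_j) contributes the probability p_i * q_j;
    pairs producing the same value are merged.
    """
    result = defaultdict(Fraction)
    for x, p in dist_x.items():
        for y, q in dist_y.items():
            result[op(x, y)] += p * q
    return dict(result)


die = {k: Fraction(1, 6) for k in range(1, 7)}
total = combine(die, die, operator.add)      # sum of points on two fair dice
product = combine(die, die, operator.mul)    # product of the points
print(sorted(total.items()))
```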

QUESTIONS FOR SELF-EXAMINATION 1. What is a (simple) random variable? Give an example. 2. Give the definition of the distribution law of a random variable. Why is the sum of the probabilities of all the values of any RV equal to 1? 3. What is the distribution function of a random variable? 4. What properties does the distribution function of a random variable have? 5. What form does the graph of the distribution function of a simple random variable have? 6. When is it said that a two-dimensional random vector is given? 7. How is the joint distribution function of two random variables defined? 8. What are the probabilities pij? 9. Give the definition of independence of two random variables ξ = {x1, …, xm} and η = {y1, …, yk}. 10. Define the sum of RVs.

Lecture content 1. Expected value. 2. Variance. 3. Numerical characteristics of the measure of stochastically linear relation between two random variables.

1. Expected value. Let (Ω, F, P) be a (finite) probabilistic space and ξ = ξ(ω), ω ∈ Ω, be some RV with values in the set X = {x1, x2, …, xm}. The distribution of ξ is:

ξ: xi     x1      x2      …      xm
pi        p1      p2      …      pm

D e f. The expectation (mean value) of a RV ξ is the number

$$E\xi = \sum_{i=1}^{m} x_i p_i.$$
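The definition of the expectation is a weighted sum and can be computed directly from a distribution series. The sketch below stores the series as a dictionary {value: probability}, which is an illustrative representation; the fair die and the coin are used as examples.

```python
from fractions import Fraction


def expectation(dist):
    """E(xi) = sum of x_i * p_i over the distribution series {x_i: p_i}."""
    return sum(x * p for x, p in dist.items())


die = {k: Fraction(1, 6) for k in range(1, 7)}      # number of points on a fair die
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}       # number of tails in one toss

print(expectation(die), expectation(coin))
```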

The main p r o p e r t i e s of E (a, b, c are constants; ξ, η are RVs):

1. E(c) = c.
2. E(aξ + bη) = aEξ + bEη (linearity).
3. If ξ and η are independent then Eξη = EξEη.
4. If ξ ≥ 0 then Eξ ≥ 0 (non-negativity).
5. If ξ ≥ η then Eξ ≥ Eη (monotony).
6. |Eξ| ≤ E|ξ|.

P r o o f s. Suppose that the RV ξ has the set of values X = {x1, x2, …, xm} and the RV η has the set of values Y = {y1, y2, …, yk}. Introduce the notation Ai = {ξ = xi}, i = 1, 2, …, m; Bj = {η = yj}, j = 1, 2, …, k; then the RVs ξ and η can be represented as follows:

$$\xi = \sum_{i=1}^{m} x_i I(A_i), \qquad \eta = \sum_{j=1}^{k} y_j I(B_j), \tag{8.1}$$

where the RV I(Ai) = I_{Ai}(ω) (the indicator of the set Ai) is equal to 1 if ω ∈ Ai and 0 otherwise; similarly, I(Bj) = 1 if ω ∈ Bj and I(Bj) = 0 otherwise. Obviously: E(I(Ai)) = P(Ai), E(I(Bj)) = P(Bj). Note that the events A1, A2, …, Am (B1, B2, …, Bk) form a complete group of events (see Lecture 3). We proceed to the proof of the properties. 1. The constant c can be seen as a RV that takes the value c with probability 1, so E(c) = c · 1 = c. 2. Suppose we have the representation (8.1). Then, since $A_i = \sum_j A_i B_j$ and $B_j = \sum_i A_i B_j$,

$$a\xi + b\eta = a \sum_i x_i I(A_i) + b \sum_j y_j I(B_j) = a \sum_i x_i \sum_j I(A_i B_j) + b \sum_j y_j \sum_i I(A_i B_j) = a \sum_{i,j} x_i I(A_i B_j) + b \sum_{i,j} y_j I(A_i B_j) = \sum_{i,j} (a x_i + b y_j) I(A_i B_j),$$

$$E(a\xi + b\eta) = \sum_{i,j} (a x_i + b y_j) P(A_i B_j) = \sum_i a x_i P(A_i) + \sum_j b y_j P(B_j) = a \sum_i x_i P(A_i) + b \sum_j y_j P(B_j) = aE\xi + bE\eta.$$

3. We have E = E(  xiI(Ai))( i

=

xi yjР(AiBj)  i ,j

=

xi yjI(Ai Bj) = j yjI(Bj)) = E  i ,j

xi yjР(Ai)Р(Bj) =  xiP(Ai)  yjP(Bj) =  i ,j j i = EE,

where we used the fact that the events Ai and Bj are independent for independent RVs, i.e. Р(AiBj) = Р(Ai)Р(Bj). Property 4) is obvious. Property 5) follows from properties 2) and 4):      -    0  E - E = E( - )  0  E  E. 6.E=   xiP(Ai)  xi P(Ai) = E. i

i

D e f. Let $f(x)$ be an arbitrary numerical function defined on the set of real numbers. Then

$$Ef(\xi) = \sum_{i=1}^{m} f(x_i) p_i.$$

For example, if $f(x) = x^2$, then the value $Ef(\xi) = E(\xi^2) = x_1^2 p_1 + x_2^2 p_2 + \dots + x_m^2 p_m$ is the mean-squared value of RV $\xi$.

2. Variance. It is a characteristic of the deviation of the values of a random variable from its mean value. D e f. Variance of RV $\xi$ is the number $V\xi = E(\xi - E\xi)^2$. N o t e. The variance is non-negative, because it is the mean value of the non-negative RV $(\xi - E\xi)^2$ (see property 4 of expectation). Using the expectation properties, we have $V\xi = E(\xi - E\xi)^2 = E(\xi^2 - 2\xi E\xi + (E\xi)^2) = E(\xi^2) - 2(E\xi)^2 + (E\xi)^2$, i.e.

$$V\xi = E(\xi^2) - (E\xi)^2. \qquad (8.2)$$

D e f. The value $\sigma_\xi = \sqrt{V\xi}$ is called the standard deviation of RV $\xi$. N o t e. A standard deviation has the dimension of the random variable, while a variance has the dimension of the square of the random variable.

P r o p e r t i e s of variance ($c$ = const)
1. $Vc = 0$. P r o o f. $Vc = E(c - E(c))^2 = E(c - c)^2 = E(0) = 0$ (see property 1 of expectation).
2. $V(c\xi) = c^2 V\xi$. P r o o f. $V(c\xi) = E(c\xi - E(c\xi))^2 = E(c\xi - cE\xi)^2 = c^2 E(\xi - E\xi)^2 = c^2 V\xi$.
3. $V(c + \xi) = V\xi$. P r o o f. $V(c + \xi) = E((c + \xi) - E(c + \xi))^2 = E(c + \xi - c - E\xi)^2 = E(\xi - E\xi)^2 = V\xi$.
4. If RVs $\xi$ and $\eta$ are independent, then $V(\xi \pm \eta) = V\xi + V\eta$. P r o o f. By (8.2), $V(\xi \pm \eta) = E(\xi \pm \eta)^2 - (E(\xi \pm \eta))^2 = E(\xi^2 \pm 2\xi\eta + \eta^2) - (E\xi \pm E\eta)^2 = E\xi^2 \pm 2E\xi\eta + E\eta^2 - (E\xi)^2 \mp 2E\xi E\eta - (E\eta)^2$. In view of the independence of these RVs we have $E\xi\eta = E\xi E\eta$ (see property 3 of expectation), so the expression equals $(E\xi^2 - (E\xi)^2) + (E\eta^2 - (E\eta)^2) = V\xi + V\eta$.

N o t e (interpretation of expectation and variance in financial analysis). Suppose, for example, we know the distribution of the yield $\xi$ of some financial asset (e.g., a share), i.e. we know the values $x_i$ of the profitability of the asset and their probabilities $p_i$ during a certain period. Then the expectation $E\xi$ is the average (forecasted) yield, and the variance $V\xi$ (or the standard deviation $\sigma_\xi$) is a measure of the deviation or oscillation of the yield around the mean value, i.e. the asset risk.

3. Numerical characteristics of the measure of stochastically linear relation between two random variables. In probability theory and statistics, covariance is a measure of how much two RVs change together. If the greater values of one variable mainly correspond to the greater values of the other variable, and the same holds for the smaller values, i.e. the variables tend to show similar behaviour, the covariance is positive. In the opposite case, when the greater values of one variable mainly correspond to the smaller values of the other, i.e. the variables tend to show opposite behaviour, the covariance is negative. Therefore, the sign of the covariance shows the tendency in the linear relationship between the variables.

D e f. The covariance between two jointly distributed random variables $\xi$ and $\eta$ is the number $\operatorname{Cov}(\xi, \eta) = E(\xi - E\xi)(\eta - E\eta)$.

In view of this, we have one more property of variance: for any RVs $\xi$ and $\eta$

$$V(\xi \pm \eta) = E((\xi \pm \eta) - E(\xi \pm \eta))^2 = E((\xi - E\xi) \pm (\eta - E\eta))^2 = E(\xi - E\xi)^2 + E(\eta - E\eta)^2 \pm 2E(\xi - E\xi)(\eta - E\eta) = V\xi + V\eta \pm 2\operatorname{Cov}(\xi, \eta). \qquad (8.3)$$

N o t e. If $\operatorname{Cov}(\xi, \eta) > 0$, then, when one RV increases, the other one tends to increase. If $\operatorname{Cov}(\xi, \eta) < 0$, then, when one RV increases, the other one tends to decrease. If $\operatorname{Cov}(\xi, \eta) = 0$, then there is no linear (direct) relation between the RVs.

A more convenient formula for calculating covariance can be obtained by using the properties of expectation: $\operatorname{Cov}(\xi, \eta) = E(\xi - E\xi)(\eta - E\eta) = E(\xi\eta - \xi E\eta - \eta E\xi + E\xi E\eta) = E\xi\eta - E\xi E\eta - E\eta E\xi + E\xi E\eta = E\xi\eta - E\xi E\eta$, i.e.

$$\operatorname{Cov}(\xi, \eta) = E\xi\eta - E\xi E\eta. \qquad (8.4)$$
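As a rough numerical illustration of formulas (8.2), (8.3) and (8.4), the following Python sketch computes the variance and the covariance from a joint distribution. The joint table `joint` and the helper `mean` are hypothetical and chosen only for this example.

```python
from fractions import Fraction as Fr

# Hypothetical joint distribution: joint[(x, y)] = P{xi = x, eta = y}.
joint = {
    (0, 0): Fr(1, 4), (0, 1): Fr(1, 4),
    (1, 0): Fr(1, 8), (1, 1): Fr(3, 8),
}

def mean(f):
    """E f(xi, eta) = sum of f(x, y) * P{xi = x, eta = y} over all pairs."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

e_x = mean(lambda x, y: x)
e_y = mean(lambda x, y: y)
var_x = mean(lambda x, y: x * x) - e_x ** 2            # formula (8.2)
var_y = mean(lambda x, y: y * y) - e_y ** 2            # formula (8.2)
cov_def = mean(lambda x, y: (x - e_x) * (y - e_y))     # definition of covariance
cov_short = mean(lambda x, y: x * y) - e_x * e_y       # formula (8.4)
var_sum = mean(lambda x, y: (x + y) ** 2) - (e_x + e_y) ** 2

print(cov_def, cov_short)                    # both equal 1/16
print(var_sum, var_x + var_y + 2 * cov_def)  # both sides of (8.3): 39/64
```

Both ways of computing the covariance agree, and the variance of the sum matches the right-hand side of (8.3).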

N o t e. If $\xi$ and $\eta$ are independent, then $E\xi\eta = E\xi E\eta$ (see property 3 of expectation) and, according to (8.4), $\operatorname{Cov}(\xi, \eta) = 0$. In this case, it is said that the RVs $\xi$ and $\eta$ are uncorrelated. N o t e. If RVs $\xi$ and $\eta$ are independent, then they are uncorrelated, but the converse is (in the general case) not true.

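E x a m p l e. RV $\xi$ is given:

$$\begin{array}{c|ccc} \xi & 0 & \pi/2 & \pi \\ \hline P & 1/3 & 1/3 & 1/3 \end{array}$$

Consider the RVs $\eta = \sin\xi$ and $\zeta = \cos\xi$. Construct their distributions:

$$\begin{array}{c|cc} \eta & 0 & 1 \\ \hline p & 2/3 & 1/3 \end{array} \qquad \begin{array}{c|ccc} \zeta & -1 & 0 & 1 \\ \hline p & 1/3 & 1/3 & 1/3 \end{array}$$

We have $\operatorname{Cov}(\eta, \zeta) = E\eta\zeta - E\eta E\zeta = 0$, because $E\eta\zeta = 0$ (the RV $\eta\zeta = \sin\xi\cos\xi = \tfrac12 \sin 2\xi$ equals 0 for all $\omega$) and $E\zeta = 0$. Thus $\eta$ and $\zeta$ are uncorrelated, but they are dependent, and not only in the probabilistic sense, $P\{\eta = 1, \zeta = 1\} = 0 \ne 1/9 = P\{\eta = 1\}P\{\zeta = 1\}$, but also functionally: $\eta^2 + \zeta^2 = 1$.

N o t e. Covariance shows only the direction of a stochastically linear relation between two random variables. The normalized version of the covariance, the correlation coefficient, shows both the direction and the strength of the linear relation.

D e f. If $V\xi > 0$, $V\eta > 0$, then the number

$$\rho_{\xi,\eta} = \frac{\operatorname{Cov}(\xi, \eta)}{\sqrt{V\xi}\,\sqrt{V\eta}}$$

is called the correlation coefficient between the RVs $\xi$ and $\eta$.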
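The example above, with the RVs $\sin\xi$ and $\cos\xi$, can be checked numerically. The sketch below is only an illustration; the variable names are assumptions, and `round` is used to remove floating-point noise such as $\sin\pi \approx 1.2 \cdot 10^{-16}$.

```python
import math
from fractions import Fraction

xi_values = [0.0, math.pi / 2, math.pi]   # values of xi, each with probability 1/3
prob = Fraction(1, 3)

# Pairs (sin xi, cos xi); rounding gives the exact values 0, 1 and -1.
outcomes = [(round(math.sin(x)), round(math.cos(x))) for x in xi_values]

def mean(f):
    """Expectation of f(sin xi, cos xi)."""
    return sum(f(s, c) * prob for s, c in outcomes)

cov = mean(lambda s, c: s * c) - mean(lambda s, c: s) * mean(lambda s, c: c)

p_joint = sum(prob for s, c in outcomes if s == 1 and c == 1)
p_sin = sum(prob for s, c in outcomes if s == 1)
p_cos = sum(prob for s, c in outcomes if c == 1)

print(cov)                     # 0: the RVs are uncorrelated
print(p_joint, p_sin * p_cos)  # 0 and 1/9: the RVs are dependent
```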

N o t e. $|\rho_{\xi,\eta}| \le 1$, and if $|\rho_{\xi,\eta}| = 1$, then $\xi$ and $\eta$ are linearly dependent. How to interpret a correlation coefficient? Its sign shows the direction of the relation, and its absolute value reflects the strength of the relation. When assessing the strength of the relation, the Chaddock scale is used:

Table of analysis of the strength of the correlation between variables

Absolute value of correlation coefficient    Interpretation of strength of correlation
0 - 0.3                                      very weak
0.3 - 0.5                                    weak
0.5 - 0.7                                    average
0.7 - 0.9                                    high
0.9 - 1                                      very high

QUESTIONS FOR SELF-EXAMINATION 1. Give the definition of the expectation of a RV. What is its meaning? 2. If $\xi$ is a RV and $c$ = const, then what is $E(c\xi)$? 3. What is the expectation of the sum (difference) of two RVs? 4. In what case is the expectation of the product of two RVs equal to the product of their expectations? 5. Give the definition of the variance of a RV. What is its meaning? 6. If $\xi$ is a RV and $c$ = const, then what is $V(c\xi)$? $V(c + \xi)$? 7. What is the variance of the sum (difference) of two RVs? In what case is it equal to the sum of their variances? 8. What is the standard deviation of a RV $\xi$? 9. What does it mean: $\operatorname{Cov}(\xi, \eta) > 0$? $\operatorname{Cov}(\xi, \eta) < 0$? $\operatorname{Cov}(\xi, \eta) = 0$? 10. What is the correlation coefficient of two RVs? What does it show?

Lecture content 1. Types of random variables. 2. Types of distributions: discrete distributions. 3. Types of distributions: continuous distributions.

1. Types of random variables. Let $(\Omega, F, P)$ be an arbitrary (not necessarily finite) probability space. D e f. A numerical function $\xi = \xi(\omega)$ of the elementary event $\omega \in \Omega$ is called a Random variable (RV) if for any real number $x$ the probability of the event $\{\omega \in \Omega : \xi(\omega) \le x\} = \{\xi \le x\}$ is defined. D e f. The function

$$F_\xi(x) = P\{\xi \le x\}, \qquad (9.1)$$

which is defined for all $x \in R$, is called the distribution function of RV $\xi$ (its properties are similar to the finite case).

Three types of RVs (they differ in their distribution functions):
– discrete;
– (absolutely) continuous;
– singular.

D e f. A random variable with a finite or countable set of values is called a Discrete RV (DRV). N o t e. A simple RV is a DRV because it has a finite set of values. Suppose a DRV $\xi$ has the values $x_1, x_2, \dots$ with the probabilities $p_1, p_2, \dots$ (respectively). The distribution law of the DRV is the mapping $P(\xi = x_i) = p_i$, $i = 1, 2, \dots$ ($p_1 + p_2 + \dots = 1$).

The distribution series of the DRV is the table

$$\begin{array}{c|ccc} \xi & x_1 & x_2 & \dots \\ \hline p & p_1 & p_2 & \dots \end{array}$$

The distribution function of the DRV is the function $F(x) = F_\xi(x) = P\{\xi \le x\}$, $x \in R$. The expectation of the DRV is the number

$$E\xi = \sum_{i=1}^{\infty} x_i p_i.$$

N o t e. $E\xi$ exists if and only if the series $\sum_{i=1}^{\infty} x_i p_i$ converges absolutely (i.e. $\sum_{i=1}^{\infty} |x_i| p_i < \infty$).

The expectation of some function $f(\xi)$ of the DRV $\xi$ is the number

$$Ef(\xi) = \sum_{i=1}^{\infty} f(x_i) p_i,$$

if the series $\sum_{i=1}^{\infty} f(x_i) p_i$ converges absolutely (i.e. $\sum_{i=1}^{\infty} |f(x_i)| p_i < \infty$).

The variance of the DRV is the number $V\xi = E(\xi - E\xi)^2 = E(\xi^2) - (E\xi)^2$.

D e f. A Continuous RV (CRV) $\xi$ is a RV which has a Distribution density $p(x) = p_\xi(x)$, i.e. an integrable function $p(x)$ such that (properties of the density function):
1) $p(x) \ge 0$;
2) $\int_{-\infty}^{\infty} p(x)\,dx = 1$;
3) for all $a, b \in R$, $a < b$: $P\{a \le \xi \le b\} = \int_a^b p(x)\,dx$.

The Distribution Function of the CRV is the function

$$F(x) = \int_{-\infty}^{x} p(t)\,dt,$$

and, vice versa, $p(x) = F'(x)$ (at the points where the derivative exists). N o t e. Property 3) of the density implies $P\{a \le \xi \le b\} = F(b) - F(a)$.

The expectation of the CRV is the number

$$E\xi = \int_{-\infty}^{\infty} x\,p(x)\,dx$$

(if this integral converges absolutely, i.e. $\int_{-\infty}^{\infty} |x|\,p(x)\,dx < \infty$).

The expectation of some function $f(\xi)$ of the CRV $\xi$ is the number

$$Ef(\xi) = \int_{-\infty}^{\infty} f(x)\,p(x)\,dx$$

(if this integral converges absolutely, i.e. $\int_{-\infty}^{\infty} |f(x)|\,p(x)\,dx < \infty$).

The variance of the CRV is the number $V\xi = E(\xi - E\xi)^2$.

D e f. RVs $\xi_1, \dots, \xi_r$ defined on an arbitrary sample space $\Omega$ and taking values in a set $X \subseteq R$ are called mutually independent if for any $x_1, \dots, x_r \in R$

$$P\{\xi_1 \le x_1, \dots, \xi_r \le x_r\} = P\{\xi_1 \le x_1\} \cdots P\{\xi_r \le x_r\}.$$

N o t e. The function on the left side of the last equality is called the joint distribution function of the RVs $\xi_1, \dots, \xi_r$ (notation: $F_{\xi_1, \dots, \xi_r}(x_1, \dots, x_r)$); the functions on the right side are the marginal distribution functions (notation: $F_{\xi_i}(x_i)$, $i = 1, \dots, r$).

2. Types of distributions: discrete distributions. Let us consider the most common special types of distributions of DRVs.

1. Binomial distribution is defined by the formula ($0 < p < 1$, $q = 1 - p$, $m = 0, 1, \dots, n$)

$$P\{\xi = m\} = C_n^m p^m q^{n-m}.$$

N o t e. In the special case $n = 1$ we have the Bernoulli distribution $P\{\xi = m\} = p^m q^{1-m}$, $m = 0; 1$. The probabilities $P\{\xi = m\}$ are calculated by the Bernoulli formula, so the binomial distribution law is the law of the number $m$ of occurrences of an event $A$ in the course of $n$ independent trials, in each of which it may occur with the same probability $p$. We have for this law $E\xi = np$, $V\xi = npq$. The binomial distribution is widely used in the theory and practice of statistical quality control, in describing the operation of queuing systems, in the theory of firing and in other areas. E x a m p l e s of binomial RVs: the number $m$ of defective items in a batch of a certain size $n$, selected from mass products manufactured in the stationary mode; the number $m$ of individuals with a given combination of properties in a random sample of size $n$ from the general population.
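The binomial law and its numerical characteristics can be checked with a short Python sketch. The function name `binomial_pmf` and the parameters $n = 10$, $p = 0.3$ are assumptions chosen only for illustration.

```python
from math import comb

def binomial_pmf(n, p):
    """Return the list of P{xi = m} = C(n, m) p^m q^(n-m) for m = 0, ..., n."""
    q = 1 - p
    return [comb(n, m) * p**m * q**(n - m) for m in range(n + 1)]

n, p = 10, 0.3   # hypothetical parameters
pmf = binomial_pmf(n, p)

mean = sum(m * pm for m, pm in enumerate(pmf))
var = sum(m * m * pm for m, pm in enumerate(pmf)) - mean**2

print(round(sum(pmf), 6))   # total probability, close to 1
print(round(mean, 6))       # close to n*p = 3
print(round(var, 6))        # close to n*p*q = 2.1
```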

2. Poisson distribution is defined by the formula ($\lambda > 0$ is a parameter, $m = 0, 1, \dots$)

$$P\{\xi = m\} = e^{-\lambda} \lambda^m / m!$$

We have $E\xi = V\xi = \lambda$.

N o t e. If $p \to 0$ as $n \to \infty$ and, moreover, $np \to \lambda$ = const, this law is a limit case of the binomial law. Since in this case the probability $p$ of occurrence of the event $A$ in each trial is small, the Poisson law is called the law of "rare events". E x a m p l e s of Poisson RVs: the number of failures of a debugged production process per unit time; the number of requests that are received by a queuing system per unit time; the number of accidents occurring in a population of individuals per unit time.

3. Geometric distribution is defined by the formula ($0 < p < 1$, $q = 1 - p$, $m = 1, 2, \dots$)

$$P\{\xi = m\} = p q^{m-1}.$$

We have $E\xi = 1/p$, $V\xi = q/p^2$.

The geometric RV $\xi = m$ is the number $m$ of Bernoulli trials with the probability of success $p$ up to and including the first success. E x a m p l e s of geometric RVs: the number of calls of a radio operator to a correspondent until the call is accepted; the number of shots until the first hit.

3. Types of distributions: continuous distributions.

1. Uniform (on the closed interval $[a, b]$) distribution is defined by the density function $p(x) = 1/(b - a)$ if $x \in [a, b]$ and $p(x) = 0$ if $x \notin [a, b]$.

[Figure: graph of the density $p(x)$ of the uniform distribution, equal to $1/(b - a)$ on $[a, b]$]

We have $E\xi = (a + b)/2$, $V\xi = (b - a)^2/12$. The mechanism of formation of this RV: the probability that an observation falls into a subinterval of the range $[a, b]$ of its possible values depends only on the width of this subinterval, and does not depend on its location.

2. Exponential distribution is defined by the density function ($\lambda > 0$ is a parameter) $p(x) = \lambda e^{-\lambda x}$ if $x \ge 0$ and $p(x) = 0$ if $x < 0$.

[Figure: graph of the density $p(x)$ of the exponential distribution, decreasing from $\lambda$ at $x = 0$]

We have $E\xi = 1/\lambda$, $V\xi = 1/\lambda^2$.

3. Normal (Gauss) distribution is defined by the density function ($a \in R$, $\sigma > 0$)

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - a)^2}{2\sigma^2}}.$$

We have $E\xi = a$, $V\xi = \sigma^2$. N o t e. We write $\xi \in N(a, \sigma^2)$, and if $\xi \in N(0, 1)$, then the RV is called a Standard normal RV. This law is the one most commonly encountered in practice. The main feature that distinguishes it from other laws is that it is a limiting law which other laws approach under conditions that occur very often in practice.

N o t e. If $\xi \in N(a, \sigma^2)$, then

$$F_\xi(x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-(t - a)^2/(2\sigma^2)}\,dt,$$

and if $\xi \in N(0, 1)$, then

$$F_\xi(x) \equiv \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt \quad \text{(Laplace function)}.$$

The normal distribution function is expressed through the standard normal distribution function (i.e. the Laplace function) as follows:

$$F_\xi(x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-(t - a)^2/(2\sigma^2)}\,dt = \big(z = (t - a)/\sigma\big) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{(x - a)/\sigma} e^{-z^2/2}\,dz = \Phi\Big(\frac{x - a}{\sigma}\Big).$$

T h r e e s i g m a rule. If $\xi \in N(a, \sigma^2)$, then it is practically certain that $\xi \in (a - 3\sigma, a + 3\sigma)$. P r o o f.

$$P(a - 3\sigma \le \xi \le a + 3\sigma) = F_\xi(a + 3\sigma) - F_\xi(a - 3\sigma) = \Phi\Big(\frac{a + 3\sigma - a}{\sigma}\Big) - \Phi\Big(\frac{a - 3\sigma - a}{\sigma}\Big) = \Phi(3) - \Phi(-3) = 0.9973.$$
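The three sigma rule can be verified numerically. The sketch below relies on the standard identity $\Phi(x) = \tfrac12\big(1 + \operatorname{erf}(x/\sqrt2)\big)$, which is not stated in the lecture; the function names and the parameters $a = 10$, $\sigma = 2$ are assumptions for illustration.

```python
import math

def std_normal_cdf(x):
    """Laplace function Phi(x), expressed through the error function erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_cdf(x, a, sigma):
    """Distribution function of N(a, sigma^2): F(x) = Phi((x - a) / sigma)."""
    return std_normal_cdf((x - a) / sigma)

a, sigma = 10.0, 2.0   # hypothetical parameters
prob = normal_cdf(a + 3 * sigma, a, sigma) - normal_cdf(a - 3 * sigma, a, sigma)

print(round(prob, 4))  # 0.9973
```

The result does not depend on the chosen $a$ and $\sigma$, since both arguments reduce to $\Phi(3) - \Phi(-3)$.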

QUESTIONS FOR SELF-EXAMINATION 1. What is a random variable (RV)? What are all types of RVs? 2. Give the definition of discrete RV. How is its expectation determined? When is it defined? 3. What are the properties of density function of continuous RV? How is its expectation calculated? 4. Binomial distribution: distribution law, expectation, variance. 5. Poisson distribution: distribution law, expectation, variance. 6. Geometric distribution: distribution law, expectation, variance. 7. Uniform [a, b] distribution: density function, expectation, variance. 8. Exponential distribution: density function, expectation, variance. 9. Normal distribution: density function, expectation, variance. 10. Which normal RV is called standard normal RV?

Lecture content 1. Definition of the characteristic function. 2. Characteristic function of main distributions. 3. Properties of characteristic functions.

1. Definition of characteristic function. In this Lecture: $i = \sqrt{-1}$ is the imaginary unit, $t$ is a real variable, $e^{it} = \cos t + i \sin t$ is Euler's formula, and $E(\eta + i\zeta) = E\eta + iE\zeta$ is the rule for calculating the expectation of the complex-valued RV $\eta + i\zeta$, if the expectations of its real ($\eta$) and imaginary ($\zeta$) parts exist. As always, the modulus of the complex number $z = x + iy$ is $|z| = \sqrt{x^2 + y^2}$, so $|e^{it}| = 1$. D e f. The function $\varphi_\xi(t) = Ee^{it\xi}$ of a real variable $t$ is called the characteristic function of RV $\xi$.

2. Characteristic functions of main distributions.

(i) Bernoulli distribution with the parameter $p$.

$$\varphi_\xi(t) = Ee^{it\xi} = e^{it \cdot 0} P(\xi = 0) + e^{it \cdot 1} P(\xi = 1) = 1 - p + pe^{it}.$$

(ii) Binomial distribution with the parameters $n$ and $p$.

$$\varphi_\xi(t) = Ee^{it\xi} = \sum_{k=0}^{n} e^{itk} P(\xi = k) = \sum_{k=0}^{n} e^{itk} C_n^k p^k (1 - p)^{n-k} = \sum_{k=0}^{n} C_n^k (pe^{it})^k (1 - p)^{n-k} = (1 - p + pe^{it})^n.$$

The last equality is the binomial theorem.

(iii) Poisson distribution with the parameter $\lambda$.

$$\varphi_\xi(t) = Ee^{it\xi} = \sum_{k=0}^{\infty} e^{itk} P(\xi = k) = \sum_{k=0}^{\infty} e^{itk} \frac{\lambda^k}{k!} e^{-\lambda} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{(\lambda e^{it})^k}{k!} = e^{-\lambda} e^{\lambda e^{it}} = \exp(\lambda(e^{it} - 1)).$$

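(iv) Exponential distribution with the parameter $\lambda > 0$.

$$\varphi_\xi(t) = Ee^{it\xi} = \int_0^{\infty} e^{itx} p(x)\,dx = \int_0^{\infty} e^{itx} \lambda e^{-\lambda x}\,dx = \lambda \int_0^{\infty} e^{-x(\lambda - it)}\,dx = \frac{\lambda}{\lambda - it} \Big(-e^{-x(\lambda - it)}\Big)\Big|_0^{\infty} = \frac{\lambda}{\lambda - it},$$

because $|e^{-x(\lambda - it)}| = e^{-\lambda x} \to 0$ as $x \to \infty$.

(v) Standard normal distribution.

$$\varphi_\xi(t) = Ee^{it\xi} = \int_{-\infty}^{\infty} e^{itx} p(x)\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{itx} e^{-x^2/2}\,dx = e^{-t^2/2}.$$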
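The closed forms (ii) and (iii) can be compared with a direct evaluation of $Ee^{it\xi}$ as a sum. The sketch below is only an illustration: all function names and the test parameters are assumptions, and the Poisson series is truncated after 100 terms.

```python
import cmath
from math import comb, exp, factorial

def cf_binomial_sum(t, n, p):
    """E e^{it xi} for the binomial law, computed directly as a finite sum."""
    return sum(cmath.exp(1j * t * k) * comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1))

def cf_binomial_closed(t, n, p):
    """Closed form (1 - p + p e^{it})^n."""
    return (1 - p + p * cmath.exp(1j * t)) ** n

def cf_poisson_sum(t, lam, terms=100):
    """E e^{it xi} for the Poisson law, series truncated after `terms` terms."""
    return sum(cmath.exp(1j * t * k) * exp(-lam) * lam**k / factorial(k)
               for k in range(terms))

def cf_poisson_closed(t, lam):
    """Closed form exp(lambda (e^{it} - 1))."""
    return cmath.exp(lam * (cmath.exp(1j * t) - 1))

t = 0.7   # hypothetical argument
print(abs(cf_binomial_sum(t, 12, 0.3) - cf_binomial_closed(t, 12, 0.3)))
print(abs(cf_poisson_sum(t, 2.5) - cf_poisson_closed(t, 2.5)))
```

Both printed differences are at the level of floating-point rounding error.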

3. Properties of characteristic functions. 1. The characteristic function always exists: $|\varphi_\xi(t)| = |Ee^{it\xi}| \le 1$. P r o o f. We know that for any real RV $\eta$ we have $V\eta \ge 0$, i.e. $(E\eta)^2 \le E\eta^2$. Therefore $|\varphi_\xi(t)|^2 = |E\cos(t\xi) + iE\sin(t\xi)|^2 = (E\cos(t\xi))^2 + (E\sin(t\xi))^2 \le E\cos^2(t\xi) + E\sin^2(t\xi) = E(\cos^2(t\xi) + \sin^2(t\xi)) = E1 = 1$.

2. The distribution (distribution function, density, or distribution table) is uniquely restored from the characteristic function. If RVs have identical characteristic functions, then the distributions of these RVs coincide. The formulas by which the distribution is restored from the characteristic function are known in analysis as the “inverse Fourier transform” formulas. For example, if the modulus of the characteristic function is integrable over the whole line, then the random variable has a distribution density, and it is found by the formula

$$p(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} \varphi_\xi(t)\,dt.$$

3. The characteristic function of the RV $a + b\xi$ is related to the characteristic function of the RV $\xi$ by the equality

$$\varphi_{a + b\xi}(t) = Ee^{it(a + b\xi)} = e^{ita} Ee^{i(tb)\xi} = e^{ita} \varphi_\xi(tb).$$

E x a m p l e. Let us calculate the characteristic function of a random variable $\xi$ having the normal distribution with the parameters $a$ and $\sigma^2$. We know that the “standardized” random variable $\eta = (\xi - a)/\sigma$ has the characteristic function $\varphi_\eta(t) = e^{-t^2/2}$. Therefore, the characteristic function of the value $\xi = a + \sigma\eta$ is

$$\varphi_\xi(t) = \varphi_{a + \sigma\eta}(t) = e^{ita} \varphi_\eta(t\sigma) = e^{ita} e^{-(t\sigma)^2/2}.$$

4. The characteristic function of the sum of independent RVs is equal to the product of the characteristic functions of the terms: if the random variables $\xi$ and $\eta$ are independent, then, by the property of expectations,

$$\varphi_{\xi + \eta}(t) = Ee^{it(\xi + \eta)} = Ee^{it\xi}\,Ee^{it\eta} = \varphi_\xi(t)\,\varphi_\eta(t).$$

5. Suppose there exists a moment of order $k \in N$ of the RV $\xi$, that is, $E|\xi|^k < \infty$. Then the characteristic function $\varphi_\xi(t)$ is continuously differentiable $k$ times, and its $k$th derivative at zero is related to the moment of order $k$ by the equation

$$\varphi_\xi^{(k)}(0) = \Big(\frac{d^k}{dt^k} Ee^{it\xi}\Big)\Big|_{t=0} = \Big(E\,i^k \xi^k e^{it\xi}\Big)\Big|_{t=0} = i^k E\xi^k.$$

6. Suppose there is a moment of order $k \in N$ of the RV $\xi$, i.e. $E|\xi|^k < \infty$. Then the characteristic function $\varphi_\xi(t)$ in a neighbourhood of the point $t = 0$ can be expanded by Taylor's formula:

$$\varphi_\xi(t) = \varphi_\xi(0) + \sum_{j=1}^{k} \frac{t^j}{j!} \varphi_\xi^{(j)}(0) + o(|t|^k) = 1 + \sum_{j=1}^{k} \frac{i^j t^j}{j!} E\xi^j + o(|t|^k) = 1 + itE\xi - \frac{t^2}{2} E\xi^2 + \dots + \frac{i^k t^k}{k!} E\xi^k + o(|t|^k).$$

N o t e. Here $o(|t|^k)$ means the following. Let $f(x)$ and $g(x)$ be two functions defined in a punctured neighborhood of the point $x_0$, and suppose that $g$ does not vanish in this neighborhood. It is said that $f$ is “little o” of $g$ ($f = o(g)$) as $x$ tends to $x_0$ if for any $\varepsilon > 0$ there is a punctured neighborhood $U(x_0)$ of the point $x_0$ such that for all $x \in U(x_0)$ the following inequality holds: $|f(x)| < \varepsilon |g(x)|$.

In conclusion, we consider the concept of weak convergence of random variables, which we will need further. D e f (weak convergence of RVs). A sequence $\xi_1, \xi_2, \dots$ of RVs is said to converge in distribution, or converge weakly, to a random variable $\xi$ if

$$\lim_{n \to \infty} F_n(x) = F(x)$$

for every number $x \in R$ at which $F$ is continuous. Here $F_n$ and $F$ are the distribution functions of the RVs $\xi_n$ and $\xi$, respectively.

T h e o r e m (continuous correspondence theorem). Random variables $\xi_n$ converge weakly to a random variable $\xi$ if and only if, for any $t$, the characteristic functions $\varphi_{\xi_n}(t)$ converge to the characteristic function $\varphi_\xi(t)$. The formulated theorem establishes a continuous correspondence between the class of distribution functions with weak convergence and the class of characteristic functions with convergence at each point. The “continuity” of this correspondence means that a limit in one class, with respect to the convergence given in this class, corresponds to a limit in the other class, with respect to the convergence specified in that class.
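A small numerical sketch can illustrate this correspondence on the binomial and Poisson laws: when $np = \lambda$ is fixed and $n$ grows, the binomial characteristic function approaches the Poisson one at a fixed point $t$. The function names and the values $\lambda = 2$, $t = 1.3$ are assumptions made for this illustration.

```python
import cmath

def cf_binomial(t, n, p):
    """Characteristic function (1 - p + p e^{it})^n of the binomial law."""
    return (1 - p + p * cmath.exp(1j * t)) ** n

def cf_poisson(t, lam):
    """Characteristic function exp(lambda (e^{it} - 1)) of the Poisson law."""
    return cmath.exp(lam * (cmath.exp(1j * t) - 1))

lam, t = 2.0, 1.3            # hypothetical parameter and argument
sizes = (10, 100, 1000, 10000)

# Distance between the two characteristic functions when p = lam / n.
errors = [abs(cf_binomial(t, n, lam / n) - cf_poisson(t, lam)) for n in sizes]

for n, err in zip(sizes, errors):
    print(n, err)            # the distance shrinks as n grows
```

A check at a single point $t$ is of course not a proof; the theorem requires convergence for every $t$.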

QUESTIONS FOR SELF-EXAMINATION 1. What is the characteristic function of a RV? 2. What form does the characteristic function of the Bernoulli distribution (with the parameter $p$) have? 3. What form does the characteristic function of the binomial distribution (with the parameters $p$ and $n$) have? 4. What form does the characteristic function of the Poisson distribution (with the parameter $\lambda$) have? 5. What form does the characteristic function of the exponential distribution (with the parameter $\lambda > 0$) have? 6. What form does the characteristic function of the normal distribution (with the parameters $a$ and $\sigma^2$) have? 7. What properties of characteristic functions do you know? 8. Give the definition of weak convergence of random variables. 9. Formulate the continuous correspondence theorem. 10. What correspondence does this theorem establish?

Lecture content 1. Law of large numbers. Chebyshev’s theorem. 2. Bernoulli theorem. 3. Central limit theorem.

1. Law of large numbers. Chebyshev’s theorem. The law of large numbers in a broad sense is understood as the general principle according to which, in the formulation of academician A.N. Kolmogorov, the cumulative action of a large number of random factors leads (under certain conditions) to a result which is almost independent of chance. In other words, for a large number of RVs, their average result is no longer random and can be predicted with a high degree of certainty. The law of large numbers in a narrow sense is understood as a series of mathematical theorems, each of which establishes, under certain conditions, that the average characteristics of a large number of trials approach some definite constants.

Chebyshev’s inequality. For any RV $\xi$ and any $\varepsilon > 0$

$$P\{|\xi| \ge \varepsilon\} \le E\xi^2/\varepsilon^2. \qquad (11.1)$$

P r o o f. Define the RV

$$\xi_1 = \begin{cases} 0 & \text{if } |\xi| < \varepsilon, \\ \varepsilon^2 & \text{if } |\xi| \ge \varepsilon. \end{cases}$$

Obviously, $\xi^2 \ge \xi_1$ and (see the expectation properties) $E\xi^2 \ge E\xi_1 = \varepsilon^2 P\{|\xi| \ge \varepsilon\}$, which is equivalent to (11.1).
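Inequality (11.1) can be checked exactly on a small discrete distribution. The sketch below is an illustration only; the distribution and the function names are assumptions.

```python
from fractions import Fraction as Fr

# Hypothetical discrete RV: values and their probabilities.
values = [-2, -1, 0, 1, 2]
probs = [Fr(1, 10), Fr(2, 10), Fr(4, 10), Fr(2, 10), Fr(1, 10)]

second_moment = sum(x * x * p for x, p in zip(values, probs))   # E xi^2

def tail_probability(eps):
    """Left side of (11.1): P{|xi| >= eps}."""
    return sum(p for x, p in zip(values, probs) if abs(x) >= eps)

def chebyshev_bound(eps):
    """Right side of (11.1): E xi^2 / eps^2."""
    return second_moment / Fr(eps) ** 2

for eps in (1, Fr(3, 2), 2):
    print(eps, tail_probability(eps), chebyshev_bound(eps))
```

For small $\varepsilon$ the bound may exceed 1 and then carries no information; it becomes useful as $\varepsilon$ grows.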

C o r o l l a r y. For any RV $\xi$ such that $V\xi < \infty$ and any $\varepsilon > 0$ the following inequality holds:

$$P\{|\xi - E\xi| \ge \varepsilon\} \le V\xi/\varepsilon^2 \quad (\text{or } P\{|\xi - E\xi| < \varepsilon\} \ge 1 - V\xi/\varepsilon^2). \qquad (11.2)$$

Chebyshev’s Theorem (1867). If $\xi_1, \xi_2, \dots$ are pairwise independent RVs with uniformly bounded variances, i.e. ($c$ = const) $V\xi_1 \le c$, $V\xi_2 \le c$, …, then for any $\varepsilon > 0$ the following limit equality holds:

$$\lim_{n \to \infty} P\Big\{\Big|\frac{1}{n}\sum_{i=1}^{n} \xi_i - \frac{1}{n}\sum_{i=1}^{n} E\xi_i\Big| < \varepsilon\Big\} = 1.$$

N o t e. Such convergence is called convergence in probability.

P r o o f. Since the $\xi_i$ are pairwise independent, we have (by the variance properties)

$$V\Big(\frac{1}{n}\sum_{i=1}^{n} \xi_i\Big) = \frac{1}{n^2}\sum_{i=1}^{n} V\xi_i \le \frac{c}{n}.$$

Suppose that $\eta = \frac{1}{n}\sum_{i=1}^{n} \xi_i$. Then (see the expectation properties)

$$E\eta = E\,\frac{1}{n}\sum_{i=1}^{n} \xi_i = \frac{1}{n}\sum_{i=1}^{n} E\xi_i.$$

Further, by (11.2),

$$P\Big\{\Big|\frac{1}{n}\sum_{i=1}^{n} \xi_i - \frac{1}{n}\sum_{i=1}^{n} E\xi_i\Big| < \varepsilon\Big\} \ge 1 - V\Big(\frac{1}{n}\sum_{i=1}^{n} \xi_i\Big)\Big/\varepsilon^2 \ge 1 - \frac{c}{n\varepsilon^2}.$$

Passing to the limit, we obtain

$$\lim_{n \to \infty} P\Big\{\Big|\frac{1}{n}\sum_{i=1}^{n} \xi_i - \frac{1}{n}\sum_{i=1}^{n} E\xi_i\Big| < \varepsilon\Big\} \ge 1.$$

Since a probability cannot be greater than one, we obtain the assertion of the theorem.

The content of this important theorem is that the arithmetic mean $\frac{1}{n}\sum_{i=1}^{n} \xi_i$ of $n$ RVs $\xi_1, \xi_2, \dots, \xi_n$ for a sufficiently large $n$ differs (with large probability) arbitrarily little from the arithmetic mean $\frac{1}{n}\sum_{i=1}^{n} E\xi_i$ of their expectations. It follows that the average of a large number of RVs has little scattering. This happens because, in the course of the calculation of the average, random deviations in one or another direction mutually cancel, so that the total deviation is small. The result of Chebyshev’s theorem is called the law of large numbers in the form of Chebyshev and plays an essential role in the theory of measurement processing.
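The estimate $1 - c/(n\varepsilon^2)$ from the proof can be compared with the exact probability in a simple special case: $n$ independent trials with success probability $p$, where each term has variance $p(1 - p)$. The sketch below is only an illustration; the function names and the values $p = 1/2$, $\varepsilon = 1/10$ are assumptions.

```python
from fractions import Fraction
from math import comb

def prob_close(n, p, eps):
    """Exact P{|m/n - p| < eps}, where m is the number of successes in n trials."""
    q = 1 - p
    return sum(comb(n, m) * p**m * q**(n - m)
               for m in range(n + 1)
               if abs(Fraction(m, n) - p) < eps)

def chebyshev_lower_bound(n, p, eps):
    """Lower bound 1 - c/(n eps^2) with c = p(1 - p), the variance of one trial."""
    return 1 - p * (1 - p) / (n * eps**2)

p, eps = Fraction(1, 2), Fraction(1, 10)   # hypothetical parameters
for n in (10, 100, 1000):
    exact = prob_close(n, p, eps)
    bound = chebyshev_lower_bound(n, p, eps)
    print(n, float(exact), float(bound))
```

Exact fractions are used so that boundary cases such as $|m/n - p| = \varepsilon$ are not misclassified by floating-point rounding. The bound is crude (for $n = 10$ it is even negative), but it tends to 1 as $n$ grows, which is all the proof needs.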

2. Bernoulli theorem. A special case of Chebyshev’s theorem is the Bernoulli theorem. Let $\mu$ be the number of occurrences of an event $A$ in $n$ independent trials, and $p$ the probability of occurrence of $A$ in each trial. Then for any $\varepsilon > 0$ the following limit equality holds:

$$\lim_{n \to \infty} P\Big\{\Big|\frac{\mu}{n} - p\Big| < \varepsilon\Big\} = 1.$$

P r o o f. Let us introduce the (Bernoulli) RV $\xi_k$, which is the number of occurrences of the event $A$ in the $k$th trial ($p + q = 1$):

$$\begin{array}{c|cc} \xi_k & 0 & 1 \\ \hline p & q & p \end{array}$$

We have $E\xi_k = p$, $V\xi_k = pq$. Let us show that $pq \le 1/4$. Let $f(p) = pq = p(1 - p)$, $0 \le p \le 1$, and let us find the maximum of this function. We have $f'(p) = 1 - 2p = 0$ if $p = 1/2$. Since $f''(p) = -2 < 0$, the function $f(p)$ is concave (convex upward), so $p = 1/2$ is the maximum point and $f_{\max} = f(1/2) = 1/4$, while $f(p) = 0$ at the endpoints of $[0, 1]$. We have $\mu = \xi_1 + \dots + \xi_n$, where the variances $V\xi_k = pq \le 1/4$ are uniformly bounded; the value $\frac{1}{n}\sum_{i=1}^{n} \xi_i$ from Chebyshev’s theorem is $\frac{1}{n}\sum_{k=1}^{n} \xi_k = \mu/n$, and $\frac{1}{n}\sum_{i=1}^{n} E\xi_i = \frac{1}{n}\sum_{k=1}^{n} E\xi_k = p$, so the Bernoulli theorem is the simplest special case of Chebyshev’s theorem.

The meaning of the Bernoulli theorem is that the (relative) frequency of occurrence of an event $A$ in $n$ Bernoulli trials tends (in probability) to the probability of occurrence of this event in each trial.

3. Central limit theorem. The central limit theorem is one of the most remarkable results in probability theory. It states that the sum of a large number of independent RVs has (under certain conditions) a distribution which is approximately normal. Hence it not only provides a simple method for computing approximate probabilities for sums of independent RVs, but it also helps to explain the remarkable fact that the empirical frequencies of so many natural populations exhibit bell-shaped (that is, normal) curves. In the simplest form, the central limit theorem is formulated as follows.

T h e o r e m (Central Limit Theorem). Let $\xi_1, \xi_2, \dots$ be a sequence of independent (in the aggregate) and identically distributed (i.i.d.) RVs with finite and nonzero variance. Suppose $E\xi_1 = a$, $V\xi_1 = \sigma^2$. Then the distribution of the RV

$$U_n = \frac{S_n - na}{\sigma\sqrt{n}}, \qquad S_n = \xi_1 + \dots + \xi_n,$$

tends to the standard normal distribution as $n \to \infty$. It means that $U_n$ converges weakly (see Lecture 10) to a RV with the distribution $N(0, 1)$:

$$U_n \Rightarrow N(0, 1).$$

P r o o f. We introduce the "standardized" RVs $\zeta_i = (\xi_i - a)/\sigma$. Indeed, $E\zeta_i = E((\xi_i - a)/\sigma) = (E\xi_i - a)/\sigma = (a - a)/\sigma = 0$ and $V\zeta_i = V((\xi_i - a)/\sigma) = V\xi_i/\sigma^2 = \sigma^2/\sigma^2 = 1$. Let $Z_n$ be their sum: $Z_n = \zeta_1 + \dots + \zeta_n = (S_n - na)/\sigma$. It is required to prove that $Z_n/\sqrt{n} \Rightarrow N(0, 1)$. The characteristic function (see Lecture 10) of the random variable $Z_n/\sqrt{n}$ is

$$\varphi_{Z_n/\sqrt{n}}(t) = \varphi_{Z_n}(t/\sqrt{n}) = \big(\varphi_{\zeta_1}(t/\sqrt{n})\big)^n. \qquad (11.3)$$

The characteristic function of the RV $\zeta_1$ can be expanded by Taylor's formula, in the coefficients of which we use the known moments $E\zeta_1 = 0$, $E\zeta_1^2 = V\zeta_1 = 1$. We obtain

$$\varphi_{\zeta_1}(t) = 1 + itE\zeta_1 - \frac{t^2}{2} E\zeta_1^2 + o(t^2) = 1 - \frac{t^2}{2} + o(t^2).$$

We substitute this expansion, taken at the point $t/\sqrt{n}$, into equality (11.3) and let $n$ go to infinity. Using the second remarkable limit $(1 + x/n)^n \to e^x$, we have

$$\varphi_{Z_n/\sqrt{n}}(t) = \big(\varphi_{\zeta_1}(t/\sqrt{n})\big)^n = \Big(1 - \frac{t^2}{2n} + o\Big(\frac{t^2}{n}\Big)\Big)^n \to e^{-t^2/2} \quad \text{as } n \to \infty.$$

In the limit, we obtained the characteristic function of the standard normal distribution. By the continuous correspondence theorem (see Lecture 10), we can conclude that the following weak convergence holds:

$$\frac{Z_n}{\sqrt{n}} = \frac{S_n - na}{\sigma\sqrt{n}} \Rightarrow N(0, 1).$$

QUESTIONS FOR SELF-EXAMINATION 1. What is the law of large numbers in a broad sense? 2. What is the law of large numbers in a narrow sense? 3. Formulate Chebyshev’s inequality and its corollary. 4. Formulate Chebyshev’s theorem. 5. What is the meaning of Chebyshev’s theorem? 6. Where is Chebyshev’s theorem used? 7. Formulate the Bernoulli theorem. 8. What is the meaning of the Bernoulli theorem? 9. Formulate the central limit theorem. 10. What is the meaning of the central limit theorem?

Lecture content 1. Tasks of mathematical statistics. 2. Population and sample. 3. Histogram. 4. Sample mean and sample variance. 5. Sample correlation coefficient.

The main purpose of probability theory is the study of a given probabilistic model. Mathematical statistics solves the inverse problem: the construction of a probabilistic-statistical model based on the results of observations.

1. Tasks of mathematical statistics. The main task of mathematical statistics is to draw conclusions about mass phenomena and processes from observations of them or experiments on them. The content of mathematical statistics is the development of methods of statistical observation and of analysis of statistical data. The starting material for the statistical study of a real phenomenon is a set of results of observations of it or of special tests. Consider some of the key questions that arise in this case.

1. E s t i m a t i o n o f t h e u n k n o w n p r o b a b i l i t y o f a r a n d o m e v e n t. For example, when conducting multiple series of experiments with tossing a fair coin, it was noticed that the relative frequency of heads is close to 0.5. Based on this, the probability of heads in a single toss was estimated.

2. D e t e r m i n i n g a n u n k n o w n d i s t r i b u t i o n f u n c t i o n. As a result of $n$ independent trials of a RV $\xi$, its values $X_1, X_2, \dots, X_n$ have been obtained. It is necessary to determine, at least approximately, the unknown distribution function $F(x)$ of the RV $\xi$.

3. D e t e r m i n i n g u n k n o w n p a r a m e t e r s o f a d i s t r i b u t i o n. A RV $\xi$ has a distribution function of a certain type, which depends on $k$ unknown parameters. It is necessary to estimate the values of these parameters, based on the sequence of observations.

4. S t a t i s t i c a l h y p o t h e s i s t e s t i n g. For some reasons, we can assume that the distribution function of a RV $\xi$ is $F(x)$. The question is whether the observed values of $\xi$ are consistent with the hypothesis that the distribution function of $\xi$ really is $F(x)$.

N o t e. Modern mathematical statistics is defined as the science of decision-making under uncertainty.

2. Population and sample. D e f. The whole set of values of the RV under study is called the (general) population. D e f. The sequence $X_1, X_2, \dots, X_n$ of results of independent observations of the RV under study is called a random sample of size $n$ ($n$ is the number of sample units). Equivalent d e f. The sequence $X_1, X_2, \dots, X_n$ of RVs which are independent and have the same distribution function $F(x)$ (i.i.d.) is called a sample of size $n$ from a population with the distribution $F(x)$.

D e f. The sequence of sample values arranged in increasing order is called a variation series. Suppose we have a sample of size $n$ extracted from a population:

$$\underbrace{Z_1, Z_1, \dots, Z_1}_{n_1};\ \underbrace{Z_2, Z_2, \dots, Z_2}_{n_2};\ \dots;\ \underbrace{Z_k, Z_k, \dots, Z_k}_{n_k} \qquad (n_1 + n_2 + \dots + n_k = n).$$

D e f. The distinct sample values $Z_1, Z_2, \dots, Z_k$ are called variants; the numbers $n_1, n_2, \dots, n_k$ are their frequencies (the sum of the frequencies is equal to the sample size $n$); the numbers $w_1 = n_1/n$, $w_2 = n_2/n$, …, $w_k = n_k/n$ are their relative frequencies (the sum of the relative frequencies is equal to 1). D e f. The table

$$\begin{array}{c|cccc} Z_i & Z_1 & Z_2 & \dots & Z_k \\ \hline n_i & n_1 & n_2 & \dots & n_k \end{array}$$

is called a statistical series. N o t e. Sometimes the table

$$\begin{array}{c|cccc} Z_i & Z_1 & Z_2 & \dots & Z_k \\ \hline w_i & w_1 & w_2 & \dots & w_k \end{array}$$

is called a statistical series (or relative frequency series). N o t e. The last table (in mathematical statistics) is the analogue of the distribution series of some RV (in probability theory), where the relative frequencies $w_i$ play the role of the probabilities $p_i$ of the values of the RV.

D e f. The function $F_n(x) = m/n$, where $n$ is the sample size and $m$ is the number of sample values $X_i$ such that $X_i \le x$, is called the empirical distribution function.

D e f (a main concept of mathematical statistics). Let $X_1, X_2, \dots, X_n$ be a sample. Any function of the sample values is called an estimator or a statistic. N o t e. The empirical distribution function $F_n(x)$ is an estimator (based on the sample) of the distribution function $F(x)$ of the whole population (the general distribution function). The difference between them is that $F(x)$ is the probability of the event $\{\xi \le x\}$, while $F_n(x)$ is the relative frequency of the same event.
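The definitions above (variation series, variants, frequencies, relative frequencies and the empirical distribution function) can be sketched in Python as follows. The sample and all names are hypothetical and serve only as an illustration.

```python
from collections import Counter
from fractions import Fraction

sample = [3, 1, 2, 3, 3, 1, 2, 5, 3, 2]   # hypothetical observations
n = len(sample)

variation_series = sorted(sample)          # sample values in increasing order
counts = Counter(sample)
variants = sorted(counts)                  # distinct values Z_1 < ... < Z_k
frequencies = [counts[z] for z in variants]                  # n_i
relative_frequencies = [Fraction(k, n) for k in frequencies]  # w_i = n_i / n

def empirical_cdf(x):
    """F_n(x) = m / n, where m is the number of sample values X_i <= x."""
    return Fraction(sum(1 for value in sample if value <= x), n)

print(variants, frequencies)   # [1, 2, 3, 5] [2, 3, 4, 1]
print(empirical_cdf(2))        # 1/2
```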

P r o p e r t i e s o f Fn(x) 1. 0  Fn(x)  1. 2. Fn(x) is a non-decreasing function; its graph is a step function. 3. If Z1 is a minimal variant, then Fn(x) = 0 when х < Z1; if Zk is a maximal variant, then Fn(x) = 1 when x  Zk. 3. Histogram. Let population be continuous with the density p(x). Then a histogram for estimation of p(x) is used. A construction of a histogram is as follows. Suppose we have some sample X1, X2, ..., Xn from population. 1. We divide the sample into intervals (for example, by the Sturgess formula, the number of intervals N  1+3.322lg n, where n is the sample size). 2. The length of the interval (if we use intervals of equal length) h = (Xmax - Xmin)/N, where Xmax (Xmin) is the maximal (minimal) sample value. 3. We plot the interval borders on the x axis: Xmin, Xmin+h; Xmin+2h; ...; Xmin+Nh = Xmax. 4. Over the intervals, we construct the rectangles such that the area of each rectangle will be equal to the relative frequency of distribution: Si = hli = i/n, where li is the height of ith rectangle, i is the frequency of sample values in ith interval. We have: li = i/(nh). 

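N o t e. The density satisfies

$$\int_{-\infty}^{+\infty} p(x)\,dx = 1,$$

and, correspondingly, the total area of the rectangles of the histogram is

$$S = \sum_{i=1}^{N} h \cdot \frac{\nu_i}{nh} = 1.$$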
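The construction steps of the histogram can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `histogram`, the rounding of the Sturges number to the nearest integer, the convention that the maximal value goes into the last interval, and the sample itself are all choices made here, not prescriptions of the manual.

```python
import math

def histogram(sample):
    """Return (borders, heights) of an equal-width histogram."""
    n = len(sample)
    # Sturges formula, rounded to an integer (the rounding rule is an assumption)
    N = round(1 + 3.322 * math.log10(n))
    x_min, x_max = min(sample), max(sample)
    h = (x_max - x_min) / N
    counts = [0] * N
    for x in sample:
        # index of the interval containing x; x_max is put into the last interval
        i = min(int((x - x_min) / h), N - 1)
        counts[i] += 1
    borders = [x_min + j * h for j in range(N + 1)]
    heights = [c / (n * h) for c in counts]   # l_i = nu_i / (n h)
    return borders, heights

# Hypothetical sample, for illustration only
sample = [1.2, 3.4, 2.2, 5.0, 4.1, 2.8, 3.9, 1.7,
          2.5, 3.1, 4.6, 3.3, 2.9, 3.6, 0.8, 4.4]
borders, heights = histogram(sample)
h = borders[1] - borders[0]
total_area = sum(h * l for l in heights)
print(len(heights), total_area)
```

The total area of the rectangles equals 1 up to rounding error, which mirrors the normalization of the density.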

4. Sample mean and sample variance. Suppose we have a sample $X_1, X_2, \dots, X_n$ of size $n$ from a population $\xi$ with the distribution function $F(x)$.
D e f. The sample mean is the number

$$\bar X = \frac{1}{n}\sum_{i=1}^{n} X_i \qquad \Bigl(\bar X = \frac{1}{n}\sum_{i=1}^{k} Z_i n_i \ \text{for a statistical series}\Bigr).$$

Show that the sample mean is an estimator for a population expectation. We have

$\bar X = Z_1 n_1/n + Z_2 n_2/n + \dots + Z_k n_k/n = Z_1 w_1 + Z_2 w_2 + \dots + Z_k w_k$. If $n$ is large, then (by the Bernoulli theorem) the relative frequencies of the values $Z_i$ are close to their probabilities $p_i$: $w_i \approx p_i$, and we obtain $\bar X \approx Z_1 p_1 + Z_2 p_2 + \dots + Z_k p_k \approx E\xi$, where $\xi$ is the RV represented by the sample.
D e f. The number

$$S^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar X)^2 \qquad \Bigl(S^2 = \frac{1}{n}\sum_{i=1}^{k} (Z_i - \bar X)^2 n_i \ \text{for a statistical series}\Bigr)$$

is called the sample variance. Let us show that the sample variance is an estimator for the population variance. We have

$$S^2 = (Z_1 - \bar X)^2 n_1/n + (Z_2 - \bar X)^2 n_2/n + \dots + (Z_k - \bar X)^2 n_k/n = (Z_1 - \bar X)^2 w_1 + (Z_2 - \bar X)^2 w_2 + \dots + (Z_k - \bar X)^2 w_k.$$

If $n$ is large, then (by the Bernoulli theorem) the relative frequencies of the values $Z_i$ are close to their probabilities $p_i$: $w_i \approx p_i$, and we obtain

$$S^2 \approx (Z_1 - \bar X)^2 p_1 + (Z_2 - \bar X)^2 p_2 + \dots + (Z_k - \bar X)^2 p_k \approx V\xi,$$

where $\xi$ is the RV represented by the sample. The formula for calculating the sample variance is

$$S^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar X)^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - 2\bar X \cdot \frac{1}{n}\sum_{i=1}^{n} X_i + (\bar X)^2 \cdot \frac{1}{n}\sum_{i=1}^{n} 1 = \overline{X^2} - 2(\bar X)^2 + (\bar X)^2 = \overline{X^2} - (\bar X)^2,$$

where $\overline{X^2} = \frac{1}{n}\sum_{i=1}^{n} X_i^2$ is the mean of the squared sample values.
D e f. The number

$$S_1^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar X)^2 \qquad \Bigl(S_1^2 = \frac{1}{n-1}\sum_{i=1}^{k} (Z_i - \bar X)^2 n_i \ \text{for a statistical series}\Bigr)$$

is called the unbiased sample variance or adjusted sample variance.
D e f. A sample standard deviation (or standard error, s.e.) is the number $S = \sqrt{S^2}$ (or $S_1 = \sqrt{S_1^2}$).
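The definitions of the sample mean, the sample variance and the adjusted sample variance translate directly into code. The sketch below uses hypothetical data and hypothetical function names; it also checks the computational formula (mean of squares minus the square of the mean).

```python
def sample_mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sample variance S^2 (divides by n)."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def adjusted_sample_variance(xs):
    """Adjusted (unbiased) sample variance S_1^2 (divides by n - 1)."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Hypothetical data, for illustration only
xs = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sample_mean(xs)
s2 = sample_variance(xs)
# Computational formula: mean of squares minus squared mean
s2_alt = sample_mean([x * x for x in xs]) - mean ** 2
s1_2 = adjusted_sample_variance(xs)
print(mean, s2, s2_alt, s1_2)
```

For these data the deviations are exact in floating point, so both formulas for the sample variance give the same number; in general they agree only up to rounding.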

5. Sample correlation coefficient. Suppose that in the course of some experiment two random variables $\xi$ and $\eta$ are observed. Then in $n$ independent repetitions of the experiment we get $n$ pairs of observations $(X_1, Y_1), (X_2, Y_2), \dots, (X_n, Y_n)$, where $X_i$ are the sample values of the RV $\xi$ and $Y_i$ are the sample values of the RV $\eta$.
D e f. The sample covariance between the RVs $\xi$ and $\eta$ is the number

$$S_{\xi,\eta} = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y) = \frac{1}{n}\sum_{i=1}^{n} X_i Y_i - \frac{1}{n}\sum_{i=1}^{n} X_i \cdot \frac{1}{n}\sum_{i=1}^{n} Y_i = \overline{XY} - \bar X\,\bar Y.$$

The sample covariance is an estimator (based on the sample) for the general covariance of the RVs $\xi$ and $\eta$, and it has the same properties as the general covariance.
D e f. The sample correlation coefficient between the RVs $\xi$ and $\eta$ is the number ($S_\xi > 0$, $S_\eta > 0$)

$$r_{\xi,\eta} = \frac{S_{\xi,\eta}}{S_\xi S_\eta},$$

where $S_\xi$, $S_\eta$ are the sample standard deviations of the RVs $\xi$ and $\eta$ (respectively). The sample correlation coefficient is an estimator (based on the sample) for the general correlation coefficient of the RVs $\xi$ and $\eta$, and it has the same properties as the general correlation coefficient.
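A minimal Python sketch of the sample covariance and the sample correlation coefficient is given below. The paired observations and the function names are hypothetical; the standard deviations are taken with the divisor n, matching the definition of the sample variance used in this lecture.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_covariance(xs, ys):
    """(1/n) * sum of (X_i - mean X)(Y_i - mean Y)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def sample_correlation(xs, ys):
    # The covariance of a variable with itself is its sample variance S^2
    s_x = math.sqrt(sample_covariance(xs, xs))
    s_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)

# Hypothetical paired observations, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
cov = sample_covariance(xs, ys)
# Computational formula: mean of products minus product of means
cov_alt = mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)
r = sample_correlation(xs, ys)
print(cov, cov_alt, r)
```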

QUESTIONS FOR SELF-EXAMINATION
1. What is a population? A (random) sample?
2. What is a variation series? A variant? A statistical series?
3. What is an estimator, or a statistic?
4. Give the definition of the empirical distribution function. What does an empirical distribution function estimate?
5. What does a histogram estimate?
6. Give the definition of the sample mean. What does it estimate?
7. Give the definition of the sample variance. What does it estimate?
8. Give the definition of the unbiased sample variance. What does it estimate?
9. Give the definition of the sample covariance. What does it estimate?
10. Give the definition of the sample correlation coefficient. What does it estimate?

Lecture content 1. Point estimators: definition and basic properties. 2. Methods of obtaining point estimators: the method of moments. 3. Methods of obtaining point estimators: the maximum likelihood method.

1. Point estimators: definition and basic properties.
D e f. A point estimator $\hat\theta_n$ for the parameter $\theta \in \Theta \subseteq R$ of a population is any function of the sample values $X_1, X_2, \dots, X_n$:

$$\hat\theta_n = \hat\theta_n(X_1, X_2, \dots, X_n).$$

N o t e. $X_1, X_2, \dots, X_n$ are random variables, therefore $\hat\theta_n$ is a random variable as a function of random variables.
P r o p e r t i e s of point estimators
1. Unbiasedness.
D e f. An estimator $\hat\theta_n$ for the parameter $\theta$ of a population is called unbiased if

$$E\hat\theta_n = \theta.$$

Otherwise, it is biased. If $E\hat\theta_n > \theta$, the estimator overestimates the true value of the parameter on average; if $E\hat\theta_n < \theta$, it underestimates it. The requirement of unbiasedness guarantees the absence of systematic errors in estimation.
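The definition of unbiasedness can be made concrete for a small finite population, where the expectation of an estimator can be computed exactly by enumerating all possible samples. The sketch below is an illustration only: the population (values 0, 1, 3 with probabilities 1/2, 1/4, 1/4), the sample size n = 3 and all function names are assumptions chosen here. It checks the known relations that the sample mean and the adjusted sample variance are unbiased, while the sample variance with divisor n has expectation (n - 1)/n times the population variance.

```python
from fractions import Fraction
from itertools import product

def expected_value(statistic, values, probs, n):
    """Exact E[statistic(X_1, ..., X_n)] for an i.i.d. sample from a finite distribution."""
    total = Fraction(0)
    for indices in product(range(len(values)), repeat=n):
        weight = Fraction(1)
        for i in indices:
            weight *= probs[i]            # probability of this particular sample
        total += weight * statistic([values[i] for i in indices])
    return total

def x_bar(xs):
    return Fraction(sum(xs), len(xs))

def s2(xs):
    """Sample variance S^2 (divisor n)."""
    m = x_bar(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def s1_2(xs):
    """Adjusted sample variance S_1^2 (divisor n - 1)."""
    return s2(xs) * len(xs) / (len(xs) - 1)

# Hypothetical population, for illustration only
values = [0, 1, 3]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
n = 3
mean = sum(v * p for v, p in zip(values, probs))              # population expectation
var = sum((v - mean) ** 2 * p for v, p in zip(values, probs))  # population variance

print(expected_value(x_bar, values, probs, n), mean)
print(expected_value(s2, values, probs, n), Fraction(n - 1, n) * var)
print(expected_value(s1_2, values, probs, n), var)
```

Exact rational arithmetic is used so that the equalities hold exactly rather than up to rounding.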

N o t e. The point estimators $\bar X$ (for the expectation) and $S_1^2$ (for the variance) are unbiased, but the estimator $S^2$ for the variance is biased. Let us show this. By the definition of a sample, $X_1, X_2, \dots, X_n$ are i.i.d., so $EX_i = E\xi$, $i = 1, 2, \dots, n$, and

$$E\bar X = E\Bigl(\frac{1}{n}\sum_{i=1}^{n} X_i\Bigr) = \frac{1}{n}\sum_{i=1}^{n} EX_i = \frac{1}{n}\sum_{i=1}^{n} E\xi = \frac{1}{n}\, nE\xi = E\xi.$$

Similarly, one can prove the unbiasedness of $S_1^2$, i.e. the equality $ES_1^2 = V\xi$, which immediately implies the biasedness of $S^2$:

$$ES^2 = E\Bigl(\frac{n-1}{n} S_1^2\Bigr) = \frac{n-1}{n}\, ES_1^2 = \frac{n-1}{n}\, V\xi \ne V\xi.$$

2. Consistency.
D e f. An estimator $\hat\theta_n$ of the parameter $\theta$ is consistent if it satisfies the law of large numbers, i.e. it converges in probability to the estimated parameter: for any $\varepsilon > 0$

$$\lim_{n \to \infty} P\bigl(|\hat\theta_n - \theta| < \varepsilon\bigr) = 1.$$

When consistent estimators are used, increasing the sample size is justified, since significant estimation errors then become improbable. Therefore, only consistent estimators are of practical use.

N o t e. The sample mean $\bar X$ is a consistent estimator for the general expectation $E\xi$.
3. Efficiency. Suppose two unbiased estimators $\hat\theta_n^{(1)}$, $\hat\theta_n^{(2)}$ for the same parameter $\theta$ of a population are given. We prefer to use the estimator with the smaller variance. That is, if both estimators are unbiased, $\hat\theta_n^{(1)}$ is relatively more efficient than $\hat\theta_n^{(2)}$ if $V\hat\theta_n^{(2)} > V\hat\theta_n^{(1)}$.
D e f. An estimator $\hat\theta_n$ (from some class of estimators $G$) of the parameter $\theta$ is called effective (efficient) in this class if for any other estimator $\hat\theta_n^*$ from $G$ and any $\theta \in \Theta$

$$E(\hat\theta_n - \theta)^2 \le E(\hat\theta_n^* - \theta)^2,$$

i.e., it has the smallest mean-square deviation from the estimated parameter in the class.
N o t e. For unbiased estimators $E(\hat\theta_n - \theta)^2 = E(\hat\theta_n - E\hat\theta_n)^2 = V\hat\theta_n$. If an unbiased estimator is effective in the class of unbiased estimators, then such a statistic is usually called simply effective.
2. Methods of obtaining point estimators: the method of moments. The method of moments proposes to use the sample moments of a random variable instead of its true moments in order to find an estimator of the unknown parameter.
D e f. The k-th initial true moment of the RV $\xi$ is $E\xi^k$. The k-th central true moment of the RV $\xi$ is $E(\xi - E\xi)^k$.
D e f. The k-th initial sample moment of the RV $\xi$ is the number

$$m_k = \frac{1}{n}\sum_{i=1}^{n} X_i^k.$$

The k-th central sample moment of the RV $\xi$ is the number

$$c_k = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar X)^k.$$

The method of moments is as follows. Suppose that some true moment of the RV $\xi$ (for example, the k-th moment) is a function of the parameter $\theta$. Then the parameter $\theta$ can be expressed as a function of the true k-th moment. If we replace the unknown true k-th moment in this function by its sample analogue, then instead of the parameter $\theta$ we obtain its estimator $\hat\theta_n$.
E x a m p l e. Find an estimator for the parameter $\theta$ of the uniform distribution on the closed interval $[0; \theta]$ by the method of moments.
S o l u t i o n. We have $E\xi = \theta/2 \Rightarrow \theta = 2E\xi$. Then we replace the true moment of our distribution (the expectation $E\xi$) by the sample moment (the sample mean $\bar X$) and obtain the method-of-moments estimator $\hat\theta_{MM} = 2\bar X$.
3. Methods of obtaining point estimators: the maximum likelihood method. It is the basic method of obtaining parameter estimators from sample data, proposed by R. Fisher. Suppose $\xi$ is a discrete RV represented by the sample $X_1, X_2, \dots, X_n$ and its distribution is a function of the unknown parameter $\theta$ of the population: $P(\xi = X_i) = P(X_i; \theta)$, $i = 1, 2, \dots, n$.
D e f. The function of the argument $\theta$

$$L(\theta) = L(X_1, X_2, \dots, X_n; \theta) = P(X_1; \theta)\,P(X_2; \theta) \cdots P(X_n; \theta)$$

is called the likelihood function (for the sample $X_1, X_2, \dots, X_n$). The value of $\theta$ at which the likelihood function reaches its maximum is the maximum likelihood estimator of the parameter $\theta$ (usually denoted by $\hat\theta_{ML}$):

$$\hat\theta_{ML} = \hat\theta_{ML}(X_1, \dots, X_n) = \arg\max_{\theta} L(X_1, \dots, X_n; \theta).$$

N o t e. Since the function $\ln x$ ($x > 0$) increases monotonically, the functions $L$ and $\ln L$ reach their maximum at the same value of the parameter $\theta$, so in some cases it is convenient to find the maximum of $\ln L$ instead of the maximum of $L$.
E x a m p l e. In $n = 100$ Bernoulli trials we obtain $m = 55$ successes. Estimate the parameter $p$ (the probability of success in a single trial) by the maximum likelihood method.
S o l u t i o n. Here $\theta$ is $p$. Let us consider the Bernoulli RV $\xi$, the number of successes in one Bernoulli trial, which is a discrete RV with the distribution

$\xi$    0        1
$P$    $1 - p$    $p$

In this case, the sample $X_1, X_2, \dots, X_n$ is a sequence of 0s and 1s, where 1 occurs $m$ times, i.e. $X_1 + X_2 + \dots + X_n = m$. To construct $L(\theta)$, we express $P\{\xi = X_i\}$ through $p$ and $X_i$: $P\{\xi = X_i\} = p^{X_i}(1 - p)^{1 - X_i}$ (if success occurs in the i-th trial, i.e. $X_i = 1$, then $P\{\xi = 1\} = p$; if failure occurs in the i-th trial, i.e. $X_i = 0$, then $P\{\xi = 0\} = 1 - p$). The likelihood function for $\theta = p$ is

$$L(X_1, \dots, X_n; p) = P\{\xi = X_1\} \cdots P\{\xi = X_n\} = p^{X_1}(1 - p)^{1 - X_1} \cdots p^{X_n}(1 - p)^{1 - X_n} = p^{X_1 + \dots + X_n}(1 - p)^{n - (X_1 + \dots + X_n)} = p^m (1 - p)^{n - m}.$$

The logarithm of $L$ is $\ln L = m \ln p + (n - m) \ln(1 - p)$.
Let us find the maximum of the function $\ln L$. We have $d \ln L/dp = m/p - (n - m)/(1 - p) = 0$ if $p = m/n$. Since the second derivative of $\ln L$ is negative, the likelihood function reaches its maximum at the point $p = m/n$, and

$$\hat\theta_{ML} = m/n = 55/100 = 0.55$$

is the required maximum likelihood estimator for the probability of success in a single trial.
N o t e. If a population is continuous with the density $p(x) = p(x; \theta)$, then $L(\theta) = p(X_1; \theta)\,p(X_2; \theta) \cdots p(X_n; \theta)$. So the essence of the maximum likelihood method is that the most plausible parameter value is the value which maximizes the probability of obtaining the sample $X_1, X_2, \dots, X_n$ in $n$ experiments. This value of the parameter depends on the sample and is the required estimator.

QUESTIONS FOR SELF-EXAMINATION
1. What is a point estimator for the parameter of a population?
2. Which point estimator is called unbiased? Biased?
3. Which of the estimators $\bar X$, $S^2$, $S_1^2$ are biased? Unbiased?
4. Which point estimator is called consistent?
5. Which point estimator is called effective (in a given class)?
6. What is the k-th initial true moment of an RV? The k-th central true moment?
7. What is the k-th initial sample moment of an RV? The k-th central sample moment?
8. What is the essence of the method of moments?
9. What is the essence of the maximum likelihood method?
10. What is the likelihood function for a discrete population? For a continuous population?
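The Bernoulli example of this lecture (n = 100 trials, m = 55 successes) can be checked numerically. The sketch below maximizes the log-likelihood over a grid of candidate values of p and compares the result with the closed-form estimator m/n. The grid step 0.001 and the function name `log_likelihood` are assumptions made for this illustration; a grid search is used only because it is easy to read, not because it is how the estimator is normally computed.

```python
import math

def log_likelihood(p, m, n):
    """ln L = m ln p + (n - m) ln(1 - p) for m successes in n Bernoulli trials."""
    return m * math.log(p) + (n - m) * math.log(1 - p)

n, m = 100, 55
# Candidate values of p strictly inside (0, 1)
grid = [k / 1000 for k in range(1, 1000)]
p_grid = max(grid, key=lambda p: log_likelihood(p, m, n))
p_closed_form = m / n
print(p_grid, p_closed_form)
```

Because ln L is strictly concave in p, the grid maximum coincides with m/n whenever m/n lies on the grid, as it does here.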

Lecture content 1. Interval estimator. 2. Confidence interval for an expectation of normal population when population variance is known. 3. Confidence interval for an expectation of normal population when population variance is unknown.

1. Interval estimator. Suppose we have a sample $X_1, X_2, \dots, X_n$ from a population $\xi$ with a distribution that depends on an unknown parameter $\theta \in \Theta \subseteq R$.
D e f. An interval estimator for the unknown parameter $\theta$ of a population is the numeric interval

$$\bigl(\hat\theta_n^{(1)}, \hat\theta_n^{(2)}\bigr), \qquad (14.1)$$

which covers the unknown value of $\theta$ with a given probability $1 - \alpha$.
N o t e. The boundaries of this interval are functions of the sample, so they are RVs (the parameter $\theta$ is not an RV!).
D e f. The interval (14.1) is called a $100(1 - \alpha)\%$ confidence interval; the probability $1 - \alpha$ is called the confidence probability.
The length of a confidence interval depends strongly on the sample size $n$ (it decreases when $n$ increases) and on the probability $1 - \alpha$ (it increases when $1 - \alpha$ increases). Often the confidence interval is symmetric about a point estimator $\hat\theta_n$ of the parameter: $(\hat\theta_n - \delta, \hat\theta_n + \delta)$.
2. Confidence interval for an expectation of a normal population when the population variance is known. Suppose we have a sample $X_1, X_2, \dots, X_n$ from a normal population with the density

$$p(x; \theta) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \theta)^2}{2\sigma^2}},$$

where $E\xi = a = \theta$ is an unknown parameter. The sample mean $\bar X = \frac{1}{n}\sum_{i=1}^{n} X_i$ is the best (in a certain sense) unbiased estimator for the expectation $E\xi = \theta$ of the population from a sample of size $n$. Suppose the variance of the population (and of the sample values $X_i$) is known: $V\xi = \sigma^2$.

Let us consider the RV $\Delta = \bar X - \theta$ (the deviation of $\bar X$ from the parameter $\theta$); $|\Delta|$ is the distance from the random variable $\bar X$ to the constant $\theta$. Generally speaking, the deviation $\Delta$ may range from $-\infty$ to $+\infty$, therefore we are interested in the probability that its absolute value does not exceed the permissible level of error $\delta$:

$$P(|\Delta| < \delta) = P(-\delta < \bar X - \theta < \delta).$$

Since $\bar X$ is the only RV in this inequality, this probability depends only on the distribution of $\bar X$. Obviously,

$$P\{\theta - \delta < \bar X < \theta + \delta\} = P\{\bar X - \delta < \theta < \bar X + \delta\}.$$

Thus, if $F_{\bar X}(x)$ is the distribution function of the RV $\bar X$, then

$$P\{\theta - \delta < \bar X < \theta + \delta\} = F_{\bar X}(\theta + \delta) - F_{\bar X}(\theta - \delta).$$

Let us find $F_{\bar X}(x)$.

It is known that a linear function of independent normal RVs is normal. Therefore $\bar X$ is a normal RV, and a normal RV is defined by two parameters: expectation and variance. Using the definition of a sample and the properties of expectation and variance, we have

$$E\bar X = E\Bigl(\frac{1}{n}\sum_{i=1}^{n} X_i\Bigr) = \frac{1}{n}\sum_{i=1}^{n} EX_i = \frac{1}{n}\, nE\xi = \theta,$$

$$V\bar X = V\Bigl(\frac{1}{n}\sum_{i=1}^{n} X_i\Bigr) = \frac{1}{n^2}\sum_{i=1}^{n} VX_i = \frac{1}{n^2}\, nV\xi = \frac{\sigma^2}{n}.$$

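Therefore (here we use the formula that transforms an arbitrary normal distribution function into the standard normal distribution function, i.e. the Laplace function $\Phi(x)$, see Lecture 10)

$$P\{\theta - \delta < \bar X < \theta + \delta\} = F_{\bar X}(\theta + \delta) - F_{\bar X}(\theta - \delta) = \Phi\Bigl(\frac{\theta + \delta - \theta}{\sigma/\sqrt{n}}\Bigr) - \Phi\Bigl(\frac{\theta - \delta - \theta}{\sigma/\sqrt{n}}\Bigr) = \Phi\Bigl(\frac{\delta\sqrt{n}}{\sigma}\Bigr) - \Phi\Bigl(-\frac{\delta\sqrt{n}}{\sigma}\Bigr) = 1 - 2\Phi\Bigl(-\frac{\delta\sqrt{n}}{\sigma}\Bigr),$$

where the last step uses $\Phi(x) = 1 - \Phi(-x)$. We have:

$$P\{\theta - \delta < \bar X < \theta + \delta\} = P\{\bar X - \delta < \theta < \bar X + \delta\} = 1 - 2\Phi\Bigl(-\frac{\delta\sqrt{n}}{\sigma}\Bigr).$$

Denoting $\bar X - \delta = \hat\theta_n^{(1)}$ and $\bar X + \delta = \hat\theta_n^{(2)}$, we obtain the interval $(\hat\theta_n^{(1)}, \hat\theta_n^{(2)})$, which covers the unknown value of $\theta$ with the probability $1 - 2\Phi(-\delta\sqrt{n}/\sigma)$, i.e. a confidence interval which covers the unknown value of the parameter $\theta$ with the confidence probability

$$1 - \alpha = 1 - 2\Phi\Bigl(-\frac{\delta\sqrt{n}}{\sigma}\Bigr).$$

So $\alpha = 2\Phi(-\delta\sqrt{n}/\sigma)$, i.e. $\Phi(-\delta\sqrt{n}/\sigma) = \alpha/2$. Denoting $u_\alpha = \delta\sqrt{n}/\sigma$, we obtain $\Phi(-u_\alpha) = \alpha/2$ and

$$\delta = u_\alpha\, \sigma/\sqrt{n}.$$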

Therefore, the interval

$$\bar X - u_\alpha\, \sigma/\sqrt{n} < \theta < \bar X + u_\alpha\, \sigma/\sqrt{n}$$

(where $n$ is the sample size, $\bar X$ is the sample mean, $\sigma^2$ is the population variance, and $u_\alpha$ is the argument of the Laplace function $\Phi(x)$ for which $\Phi(-u_\alpha) = \alpha/2$) is the $100(1 - \alpha)\%$ confidence interval for the true value of the expectation $E\xi = \theta$ of a normal population if the population variance $\sigma^2$ is known.
3. Confidence interval for an expectation of a normal population when the population variance is unknown. If the variance of a normal population is unknown, then for the interval estimation of the general expectation $E\xi = \theta$ the following confidence interval is used:

$$\bar X - t_\alpha\, S_1/\sqrt{n} < \theta < \bar X + t_\alpha\, S_1/\sqrt{n}.$$

Here $S_1$ is the adjusted sample standard deviation and $t_\alpha = t_\alpha(n - 1)$ is the so-called Student coefficient (we find it in special tables when the sample size $n$ and the confidence probability $1 - \alpha$ are given).

QUESTIONS FOR SELF-EXAMINATION
1. What is an interval estimator for an unknown parameter?
2. Are the boundaries of the confidence interval RVs? Why?
3. How does the length of the confidence interval depend on the sample size?
4. How does the length of the confidence interval depend on the confidence probability?
5. What is the best point estimator for the population expectation from a sample of size n?
6. What distribution does a linear combination of normal RVs have?
7. What is the expectation of the sample mean?
8. What is the variance of the sample mean?
9. What form does the confidence interval for an expectation of a normal population with a known variance have?
10. What form does the confidence interval for an expectation of a normal population with an unknown variance have?
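The confidence interval for the expectation of a normal population with known variance (section 2 of this lecture) can be sketched in Python. The value $u_\alpha$ is obtained from the standard normal quantile function of the standard library instead of a table; the function name, the sample and the value σ = 0.3 are hypothetical and serve only as an illustration.

```python
import math
from statistics import NormalDist

def confidence_interval_known_sigma(sample, sigma, alpha):
    """100(1 - alpha)% confidence interval for the expectation, sigma known."""
    n = len(sample)
    x_bar = sum(sample) / n
    # u_alpha solves Phi(-u) = alpha / 2, i.e. u = Phi^{-1}(1 - alpha / 2)
    u = NormalDist().inv_cdf(1 - alpha / 2)
    delta = u * sigma / math.sqrt(n)
    return x_bar - delta, x_bar + delta

# Hypothetical measurements with an assumed known sigma = 0.3
sample = [9.8, 10.4, 10.1, 9.9, 10.3, 10.0, 9.7, 10.2, 10.1]
low, high = confidence_interval_known_sigma(sample, sigma=0.3, alpha=0.05)
print(low, high)
```

Halving α or quadrupling n changes only `delta`, which makes the dependence of the interval length on the confidence probability and on the sample size easy to explore.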

Lecture content 1. Types of statistical hypotheses. 2. Scheme of testing hypotheses about unknown parameters of a known general distribution. 3. Scheme of testing hypotheses about an unknown general distribution.

1. Types of statistical hypotheses. A statistical hypothesis is a hypothesis that can be tested on the basis of observing a process which is modelled via a set of RVs (hypothesis testing is sometimes called confirmatory data analysis). There are two main types of statistical hypotheses:
1. A hypothesis about an unknown distribution of a population.
2. A hypothesis about an unknown value of a parameter (when the distribution of the population is known).
N o t e. Other types of hypotheses also exist.
D e f. A hypothesis about an unknown general distribution or about unknown parameters of a known general distribution is called a statistical hypothesis.
E x a m p l e s:
– The population has a Poisson distribution.
– The variance of a normal population is equal to 1.
Hypothesis testing and the statistical inference based on it are among the central problems of mathematical and applied statistics.
2. Scheme of testing hypotheses about unknown parameters of a known general distribution. Let $X_1, X_2, \dots, X_n$ be a sample from a population $\xi$ with the distribution function $F(x) = F(x; \theta)$, $\theta \in \Theta \subseteq R^m$, where $\theta$ is an unknown parameter. Two hypotheses are put forward with respect to the parameter $\theta$:
$H_0$: $\theta \in Z_0$ and $H_1$: $\theta \in Z_1$,
where $Z_0 \subset \Theta$, $Z_1 \subset \Theta$ are some given sets. The hypothesis $H_0$ is called the null hypothesis, the hypothesis $H_1$ the alternative hypothesis. If no alternative is specified, then $Z_1 = \Theta \setminus Z_0$ (i.e. $H_1$ = not $H_0$).
D e f. A statistical test, or simply a test, is any procedure based on the observations $X_1, X_2, \dots, X_n$ whose result is one of two possible decisions: 1) not to reject (accept) the null hypothesis $H_0$; 2) to reject the null hypothesis $H_0$ in favour of the alternative hypothesis $H_1$.
Since the test uses a random sample $X_1, X_2, \dots, X_n$, wrong decisions are, of course, possible. There are two kinds of test errors:
– type I error: the null hypothesis is rejected when it is true;
– type II error: the null hypothesis is accepted when the alternative hypothesis is true.
The probability of a type I error, $\alpha = P(H_1 | H_0)$, is called the significance level of the test. The probability of a type II error is $\beta = P(H_0 | H_1)$; the value $1 - \beta$ is called the power of the test.
When building a test, it is natural to seek to reduce both error probabilities, but it is impossible to minimize them simultaneously. With a fixed sample size, only one of the values $\alpha$ or $\beta$ can be made arbitrarily small, and this is paid for by an increase of the other one. Only by increasing the sample size is it possible to reduce $\alpha$ and $\beta$ simultaneously. Therefore, one usually fixes the significance level and looks for a test with the maximum power (this is exactly where the asymmetry of the hypotheses, which divides them into the null and the alternative one, shows itself).
In practice, the following approach is often used to build tests. Suppose that one can find a statistic $t_n = t_n(X_1, X_2, \dots, X_n)$ such that, if the hypothesis $H_0$ is true, the distribution of the random variable $t_n$ is known (for example, tabulated). Then, for a given probability $\alpha$ of a type I error, one can find a region $K$ such that $P(t_n \notin K) = 1 - \alpha$ (the probability is calculated under the assumption that the null hypothesis is true). In this case, the test is defined as follows:
1) on the basis of the observations $X_1, X_2, \dots, X_n$, the value of the statistic $t_n$ is calculated;
2) for the given significance level $\alpha$ the region $K$ is found;
3) if $t_n \notin K$, then $H_0$ is not rejected (accepted); if $t_n \in K$, then $H_0$ is rejected in favour of $H_1$.
The statistic $t_n$ is called the critical statistic, and the region $K$ the critical region. Thus, the set of possible values of the critical statistic is divided into two disjoint subsets: the critical region (the hypothesis rejection region) and the region of allowable values (the hypothesis acceptance region). If the value of the critical statistic falls into the critical region, then the null hypothesis is rejected.
3. Scheme of testing hypotheses about an unknown general distribution. Suppose a sample $X_1, X_2, \dots, X_n$ is given and, based on it, we make an assumption about the distribution law of the general population $\xi$ (e.g. binomial, Poisson, uniform, exponential, normal, etc.):
$H_0$: $\xi$ has the distribution $F(x)$;
$H_1$: a competing hypothesis.
The general scheme of applying the test is as follows:
– We choose some non-negative measure $\tilde D$ of the deviation of the empirical (experimental) distribution law from the theoretical one: $\tilde D = \tilde D(\tilde F(x); F(x))$, where $\tilde F(x)$ is the empirical distribution and $F(x)$ is the theoretical one. Note that $\tilde D$ is an RV.
– Suppose $\alpha \in (0; 1)$. We find the value $\mu$ such that $P\{\tilde D > \mu\} = \alpha$.
– The decision rule is as follows. If for the given sample $\tilde D > \mu$, then we reject $H_0$; if $\tilde D < \mu$, then the experimental data are consistent with the hypothesis $H_0$ ($\mu$ is the test threshold, or critical point; $\alpha$ is the significance level: $P(H_1 | H_0) = \alpha$).
Different tests differ in the choice of the measure $\tilde D$.

Let us consider the most well-founded and the most commonly used test in practice: Pearson's $\chi^2$ (chi-square) test (1900) for the case when the distribution parameters are known. R. Fisher (1924) refined this test for the case when the parameters of the distribution are unknown and are estimated from the sample.
Scheme of the test. Let $F(x)$ be the theoretical distribution function of the population $\xi$ (hypothesis $H_0$).
1. We make a table ($X_i$ are the distinct sample values, $n_i$ are their frequencies, $p_i = P\{\xi = X_i\}$ are their theoretical probabilities, $\sum_{i=1}^{k} n_i = n$, $\sum_{i=1}^{k} p_i = 1$):

$X_i$    $X_1$    $X_2$    …    $X_i$    …    $X_k$
$n_i$    $n_1$    $n_2$    …    $n_i$    …    $n_k$
$p_i$    $p_1$    $p_2$    …    $p_i$    …    $p_k$

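2. For a continuous random variable $\xi$ we divide the set of its values into $k$ disjoint intervals with the boundaries $a_1 < a_2 < \dots < a_{k-1}$ and calculate the probability that $\xi$ falls into each interval (the intervals are chosen so that the number of sample elements in each of them is not too small, for example $np_i \ge 7$).

[Figure: the number line divided by the points $a_1, a_2, \dots, a_{k-1}$ into $k$ intervals containing $n_1, n_2, \dots, n_k$ sample values.]

$p_1 = F(a_1)$; $p_i = F(a_i) - F(a_{i-1})$ ($i = 2, \dots, k$, where for the last interval $F(a_k)$ is taken to be 1); $n_i$ are the frequencies of falling into the $i$-th interval, $\sum_{i=1}^{k} p_i = 1$, $\sum_{i=1}^{k} n_i = n$.
If the hypothesis $H_0$ is true, then the relative frequencies $p_i^* = n_i/n$ (when the sample is large) are close to $p_i$: $n_i/n \approx p_i \Rightarrow n_i \approx np_i$. Pearson proposed the following RV as a measure of deviation:

$$\chi^2 = \tilde D = \sum_{i=1}^{k} \frac{n}{p_i}\Bigl(\frac{n_i}{n} - p_i\Bigr)^2 = \sum_{i=1}^{k} \frac{(n_i - np_i)^2}{np_i}.$$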

This RV has (for large $n$, approximately) the distribution $\chi^2(r)$, where $r$ is the number of degrees of freedom.
On the number of degrees of freedom $r$. The arguments of the $\chi^2$ statistic are the frequencies $n_1, \dots, n_k$ ($n_1 + \dots + n_k = n$). By the Pearson theorem, the number of degrees of freedom (in the case when all parameters of the distribution are known) is equal to $r = k - 1$. However, Fisher proved that if $l$ is the number of unknown parameters, then the RV $\chi^2$ has $r = k - 1 - l$ degrees of freedom (the number of degrees of freedom is reduced because additional conditions are imposed on the frequencies).
Rule of hypothesis testing by $\chi^2$.
1. Choose the significance level $\alpha$.
2. The test threshold (critical point) is equal to $\mu = \chi^2_{1-\alpha}(r)$ (a quantile of the $\chi^2$ distribution), where $r = k - 1 - l$.
3. Calculate the theoretical probabilities $p_i$.
4. Find the frequencies $n_i$.
5. Construct the table

№      Boundaries of the interval    $n_i$    $p_i$    $np_i$ (theoretical frequencies)    $(n_i - np_i)^2/(np_i)$
1
…
k
Sum                                  $\sum_{i=1}^{k} n_i = n$    $\sum_{i=1}^{k} p_i = 1$    $\sum_{i=1}^{k} np_i = n$    $\chi^2_{sample} = \sum_{i=1}^{k} (n_i - np_i)^2/(np_i)$

Decision-making criterion:
– If $\chi^2_{sample} > \mu$, then we reject $H_0$;
– If $\chi^2_{sample} \le \chi^2_{1-\alpha}(r)$, then we accept $H_0$ at the significance level $\alpha$.
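The rule of hypothesis testing by χ² can be sketched in Python. The data below are hypothetical (60 rolls of a die, null hypothesis: the die is fair, so all parameters are known and l = 0). The critical point 11.07 is the tabulated 0.95 quantile of the χ² distribution with 5 degrees of freedom; the function name is an assumption of this illustration.

```python
def chi_square_statistic(frequencies, probabilities):
    """Sum over intervals of (n_i - n p_i)^2 / (n p_i)."""
    n = sum(frequencies)
    return sum((n_i - n * p_i) ** 2 / (n * p_i)
               for n_i, p_i in zip(frequencies, probabilities))

# Hypothetical data: observed frequencies of the faces 1..6 in 60 rolls
frequencies = [8, 12, 9, 11, 6, 14]
probabilities = [1 / 6] * 6        # H0: the die is fair
k = len(frequencies)
l = 0                              # no parameters estimated from the sample
r = k - 1 - l                      # degrees of freedom
alpha = 0.05
critical_point = 11.07             # tabulated quantile chi^2_{0.95}(5)

chi2_sample = chi_square_statistic(frequencies, probabilities)
reject_h0 = chi2_sample > critical_point
print(r, chi2_sample, reject_h0)
```

Here every theoretical frequency equals 10, so no intervals need to be combined before the statistic is computed.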

N o t e. If in some intervals $np_i < 3$, then it is necessary to combine adjacent intervals.
E x p l a n a t i o n: The random variable $U_i = \dfrac{n_i - np_i}{\sqrt{np_i}}$, $i = 1, 2, \dots, r$, has a distribution which is close to the standard normal one only when $np_i \ge 5$. The $\chi^2$ distribution with $r$ degrees of freedom is the distribution of the sum of squares of $r$ independent standard normal random variables: $\chi^2(r) = U_1^2 + U_2^2 + \dots + U_r^2$.

QUESTIONS FOR SELF-EXAMINATION
1. What is a statistical hypothesis?
2. What is a statistical test?
3. What is a null hypothesis? An alternative hypothesis?
4. What is a type I error? What is the test significance?
5. What is a type II error? What is the test power?
6. In general, how is the test for checking hypotheses about unknown general distributions determined? (three points)
7. What is the test most commonly used in practice for checking hypotheses about unknown general distributions?
8. What is the decision-making criterion in the case of the $\chi^2$ test?
9. If (when performing the $\chi^2$ test) in some intervals $np_i < 3$, then what do we do? Why?
10. What is the $\chi^2$ distribution?

1. How many distinct permutations can be made of the letters of the word INFINITY?
2. How many three-digit numbers can be formed from the digits 0, 1, 2, 3, 4, 5, and 6 if each digit can be used only once?
3. Define the sample space (that is, list all the elements) for each example below.
a) The set of integers between 1 and 60 divisible by 7 (without remainder);
b) The set $\Omega = \{x \mid x^2 + 4x - 5 = 0\}$;
c) The set of outcomes when a coin is tossed until one tail or three heads appear;
d) All points in the first quadrant inside a circle of radius 3 with the center at the origin.
4. There are 6 white and 4 black balls in an urn. One ball is randomly taken out of the urn. Find the probability that it is:
a) white;
b) black;
c) green;
d) white or black.
D i r e c t i v e. Here and hereinafter in this lesson use the classical definition of probability P(A) = m/n (n is the number of all equally likely outcomes of the random experiment, m is the number of favorable outcomes).
5. 4 fair coins are tossed. Find the probability that:
a) exactly one tail occurs;
b) at least one tail occurs;
c) tails occur on the 1st coin;
d) exactly two tails occur;
e) at most one tail occurs.
6. Two dice are tossed. Find the probability that:
a) a “6” occurs only on the 1st die;
b) at least one “6” occurs;
c) the sum of the dropped points is equal to 10;
d) the sum of the dropped points is less than or equal to 10;
e) the sum of the dropped points is greater than 10;
f) the sum of the dropped points is an even number;
g) the product of the dropped points is an odd number.
7. Three dice are tossed. Find the probability that:
a) a “6” occurs only on the 1st die;
b) only one “6” occurs;
c) at least one “6” occurs;
d) the sum of the dropped points is equal to 16;
e) the sum of the dropped points is less than or equal to 16;
f) the sum of the dropped points is an even number.

8. An experiment involves choosing an integer N between 0 and 9 (inclusive). Find the probabilities of the events: $A = \{N \le 5\}$; $B = \{3 \le N \le 7\}$; $C = \{N \text{ is even and } N > 0\}$.
9. The cloakroom attendant simultaneously issued checks to four persons who handed over their hats. Then he mixed up all the hats and hung them at random. Find the probability that:
a) everyone will get their own hat;
b) only three persons will get their own hats;
c) only two persons will get their own hats;
d) only one person will get his own hat;
e) nobody will get his own hat.
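The classical definition of probability P(A) = m/n used in this lesson can be checked by direct enumeration of equally likely outcomes. The sketch below is an illustration with two events that are not among the exercises; the helper name `classical_probability` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def classical_probability(outcomes, event):
    """P(A) = m / n: favourable outcomes over all equally likely outcomes."""
    outcomes = list(outcomes)
    favourable = sum(1 for w in outcomes if event(w))
    return Fraction(favourable, len(outcomes))

# All 36 outcomes of tossing two dice
two_dice = list(product(range(1, 7), repeat=2))
p_sum_7 = classical_probability(two_dice, lambda w: sum(w) == 7)

# All 8 outcomes of tossing three coins (H = heads, T = tails)
three_coins = list(product("HT", repeat=3))
p_two_heads = classical_probability(three_coins, lambda w: w.count("H") == 2)

print(p_sum_7, p_two_heads)
```

The same helper can be reused to verify answers to the exercises by changing only the list of outcomes and the event.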

1. There are 4 white and 6 black balls in an urn. Two balls are randomly selected (without replacement). Find the probability that they:
a) are both white;
b) are both black;
c) are of different color;
d) are of the same color.
2. There are 2 white, 4 black and 5 red balls in an urn. Three balls are randomly selected (without replacement). Find the probability that:
a) they are all white;
b) they are all black;
c) they are of different color;
d) they are of the same color;
e) there are 2 white and 1 red ball among them;
f) there is only 1 red ball among them.

3. There are 2 defective parts in a batch of 10 parts. Three parts are randomly selected (without replacement). Find the probability that:
a) there is only 1 defective part among them;
b) there is at least 1 defective part among them;
c) they are all good parts;
d) they are all defective parts.
4. There is a deck of 36 playing cards. Four cards are randomly selected (without replacement). Find the probability that:
a) all cards are Aces;
b) only 2 cards are Aces;
c) there is the Ace of spades among them;
d) all cards are of the same suit;
e) all cards are of different suits;
f) they are: 1 Ace and 3 Queens.
5. There is a deck of 52 playing cards. Three cards are randomly selected (without replacement). Find the probability that:
a) they are: “3”, “7”, Ace (without regard to the order);
b) there is only one Queen among them;
c) there are no Queens among them;
d) all cards are of the same suit;
e) all cards are of different suits.
6. Among 100 lottery tickets there are 5 winning tickets. Three tickets were bought. Find the probability that:
a) all of them are winning;
b) only one of them is winning;
c) all of them are non-winning.

7. Let us consider lotto 5 of 36. There are 36 numbers on the lottery card (from 1 to 36). You must cross out 5 arbitrary numbers. If you have guessed all the numbers (without taking into account the order), then you get the maximum winning. If you have guessed only 4 numbers, you get the average winning. If you have guessed only 3 numbers - the minimum winning. Find the probability that you get: a) the maximum winning; b) the average winning; c) the minimum winning. 8. There is a code on the door of an entrance. The door will open if you press the digits of the code at the same time. Find the probability that you will guess (on the first try): a) the 3-digit code; b) the 4-digit code; c) the 5-digit code; d) the 6-digit code; e) the 7-digit code. How many digits should be there in the code to make it the hardest to guess? 9. A house has a 3-digit number. Find the probability that all digits of the number are: a) different; b) even; c) the same. 10. Find the probability that the 3 last digits of a phone-number are: a) different; b) odd; c) the same.
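Problems on selection without replacement reduce to counting combinations. As a hedged illustration (the urn composition below is hypothetical and differs from the exercises, and the function name is an assumption), the probability of drawing a given number of white balls can be computed as follows.

```python
from fractions import Fraction
from math import comb

def draw_probability(whites, blacks, drawn, whites_wanted):
    """P(exactly whites_wanted white balls among `drawn` balls taken without replacement)."""
    favourable = comb(whites, whites_wanted) * comb(blacks, drawn - whites_wanted)
    return Fraction(favourable, comb(whites + blacks, drawn))

# Hypothetical urn: 3 white and 7 black balls, 2 balls are drawn
p_both_white = draw_probability(3, 7, 2, 2)
p_one_of_each = draw_probability(3, 7, 2, 1)
p_both_black = draw_probability(3, 7, 2, 0)
print(p_both_white, p_one_of_each, p_both_black)
```

The three probabilities cover all possible compositions of the draw, so they sum to 1.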


1. There are 5 letters: E, M, O, R, T. Find the probability that, with a random arrangement of the letters in a row, we will get the word «METRO».
2. Three letters are randomly selected from the letters A, B, C, D, E. Then they are arranged in a row. Find the probability that we get the word «CAB».
3. A child rearranges the letters A, A, M, M in a random order. Find the probability that he will get the word «MAMA».
4. Ruslan, Mary, and 10 other people are in a queue. Find the probability that there are 3 people between Ruslan and Mary.
5. N (distinguishable) balls are randomly distributed among k (distinguishable) boxes. Several balls (or even all the balls) can get into the same box. Find the probability that there will be N1 balls in the 1st box; N2 in the 2nd box; …; Nk in the k-th box (N1 + N2 + … + Nk = N).
6. Ten (distinguishable) balls are randomly distributed among 6 (distinguishable) boxes. Several balls (or even all the balls) can get into the same box. Find the probability that:
a) all the balls are in the 1st box;
b) all the balls are in the same box;
c) there are 5 balls in the 1st box, 3 in the 2nd box, and 2 in the 3rd box;
d) the balls are distributed uniformly among 5 boxes.
7. Find the probability that 12 randomly selected people:
a) have birthdays in different months;
b) at least 2 of them have birthdays in the same month.

8. Find the probability that the birthdays of 6 people will fall in exactly 2 months. 9. There’s an elevator that accesses n+1 floors, with k people in it. It starts on the first floor. Any person can go out on any floor (starting from the second one). Find the probability that: a) all persons go out on different floors; b) all persons go out on the same floor; c) all persons go out on the 2nd floor; d) at least 2 people go out on the same floor. 10. From a population of n elements, a sample of size r is taken. Find the probability that none of N prescribed elements will be included in the sample, assuming the sampling to be: a) without replacement; b) with replacement. Compare the numerical values for the two methods when (i) n = 100, r = N = 3; (ii) n = 100, r = N =10.

1. A die is tossed. Find the probability that a “2” occurs, given that:
a) an even number occurs;
b) an odd number occurs;
c) a prime (i.e. 2, 3 or 5) number occurs.
2. Two dice are tossed. Find the probability that two “6” occurred, given that:
a) there was a sum of points which is a multiple of 6;
b) there was a sum of points which is a multiple of 3.

3. There are 5 red, 4 blue and 3 green balls in an urn. Two balls are taken out (without replacement). Find the probability that they have: a) different colours; b) the same colour, given that there are no blue balls among them.
4. There are 100 cards with the numbers 00, 01, …, 98, 99. One card is randomly selected. Find the probability that the sum of the digits on this card is equal to i (i = 0, 1, …, 18), given that the product of the digits on this card is equal to zero.
5. Given that a throw of ten dice produced at least one “6”, what is the probability of two or more “6”?
6. A die and a coin are tossed simultaneously. Check the independence of the events: A = {heads occurs}; B = {an even number of points occurs}.
7. Three coins are tossed. Check the independence of the events: A = {heads occurs on the 1st coin}; B = {at least one tail occurs}.
8. We have a deck of 36 playing cards. One card is randomly selected. Check the independence of the events: A = {it is a Queen}; B = {it is a spade}. a) We add one “5” of spades to the deck. Check the independence of A and B in this case. b) We add four “5” cards (one of each suit) to the deck. Check the independence of A and B in this case.

9. Let the events A1, A2, …, An be independent. Find the probability that none of the events occurs.
10. A coin is tossed twice. Find the probability of getting heads both times.
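For small sample spaces, a conditional probability can be checked by direct counting, since P(A | B) = |A ∩ B| / |B| when all outcomes are equally likely. The following sketch does this for problem 2 of this set (two dice); the helper name `conditional_probability` is an illustrative assumption.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of tossing two fair dice.
OUTCOMES = list(product(range(1, 7), repeat=2))

def conditional_probability(event, given) -> Fraction:
    """P(event | given), computed by counting outcomes."""
    conditioning = [w for w in OUTCOMES if given(w)]
    favourable = [w for w in conditioning if event(w)]
    return Fraction(len(favourable), len(conditioning))

def double_six(w):
    return w == (6, 6)

p_given_multiple_of_6 = conditional_probability(double_six, lambda w: sum(w) % 6 == 0)
p_given_multiple_of_3 = conditional_probability(double_six, lambda w: sum(w) % 3 == 0)
print(p_given_multiple_of_6, p_given_multiple_of_3)
```

The same enumeration idea can be used to test independence in problems 6 to 8 by comparing P(A ∩ B) with P(A)·P(B).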

1. Two brothers play in two different football teams (there are 11 players in each team). Find the probability that they both have: a) the number 7; b) the same number.
2. The probability of hitting the target is 0.4 for the 1st shooter, 0.5 for the 2nd and 0.6 for the 3rd. The shooters fire simultaneously. Find the probability that: a) only the 1st shooter hits the target; b) only one shooter hits the target; c) at least one shooter hits the target.
3. The probability of hitting a ship is 3/4 (for one torpedo). One torpedo hit is enough to sink the ship. Four torpedoes were fired at the ship simultaneously. Find the probability that the ship sank.
4. Three dice are tossed. Find the probability that: a) a “6” occurred on each die; b) a “6” occurred only on the 1st die; c) only one “6” occurred; d) at least one “6” occurred; e) the same number of points occurred on each die; f) a different number of points occurred on each die.

5. There are 2 white and 3 black balls in the 1st urn and 4 white and 5 black balls in the 2nd one. One ball is taken out of each urn. Find the probability that: a) only one of them is white; b) at least one of them is white; c) at most one of them is white.
6. Four cards are taken out of a deck of 36 playing cards. Find the probability that: a) they are all Aces; b) only one of them is an Ace; c) at least one of them is an Ace; d) they are all of different suits; e) they are all of the same suit.
7. Find the probability that a randomly selected two-digit number is a multiple of 3 or 5.
8. A student knows 20 questions out of 30. The exam proceeds as follows: if he answers the 1st question, he passes; if he does not answer the 1st question but answers the 2nd one, he passes too. In all other cases he does not pass the exam. Find the probability that he passes the exam.
9. There are 4 white and 5 black balls in a box. Two players take turns pulling one ball out of the box (without replacement). The winner is the player who is the first to draw a white ball. Find the probability that the winner is: a) the 1st player (who began the game); b) the 2nd player. Solve the problem for the case when the choice is made with replacement.
10. The wardrobe attendant simultaneously issued checks to three persons who handed over their hats. Then he mixed up all the hats and hung them at random. Find the probability that: a) everyone will get their own hat; b) only two persons will get their own hats; c) only one person will get his own hat; d) nobody will get his own hat.
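The hat problem (problem 10) is small enough to enumerate: each way of hanging the hats is a permutation, and a person gets the right hat exactly at a fixed point of that permutation. The sketch below counts fixed points over all permutations; it assumes every permutation is equally likely, and the function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

def fixed_point_distribution(n: int) -> dict:
    """Distribution of the number of people (out of n) who get their own hat."""
    perms = list(permutations(range(n)))
    # For each permutation, count positions i where person i receives hat i.
    counts = Counter(sum(1 for i, hat in enumerate(p) if i == hat) for p in perms)
    return {k: Fraction(counts.get(k, 0), len(perms)) for k in range(n + 1)}

distribution = fixed_point_distribution(3)
for matches, prob in distribution.items():
    print(matches, prob)
```

Note that "exactly two correct hats" is impossible for three people: if two hats are right, the third must be right as well.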

1. The 1st box contains 3 white and 2 black balls, the 2nd contains 4 white and 5 black ones. One box is randomly selected and one ball is taken out of it. Find the probability that it is white.
2. There are 3 boxes: the ith box contains ai white and bi black balls, i = 1, 2, 3. One box is randomly selected and one ball is taken out of it. Find the probability that it is white.
3. There were 2 white and 4 black balls in the 1st urn and 5 white and 3 black balls in the 2nd one. We randomly moved two balls from the 1st urn into the 2nd one. Then we randomly drew three balls from the 2nd urn. Find the probability that there are 2 white balls among them.
4. There are 10 machines of brand A, 5 of brand B and 3 of brand C in a workshop. The probability of a defect is 3% for the machines of brand A, 5% for brand B and 6% for brand C. One part is randomly selected. Find the probability that it is defective.
5. There were m white and n black balls in an urn. Then one ball was lost. After that, one ball is taken out of the urn. Find the probability that it is white. What is this probability if two balls are lost? Three balls?
6. In blood transfusion, it is necessary to take into account the blood groups of the donor and the patient. It is possible to transfuse blood of any group to a patient with the 4th blood group; blood of the 1st or 2nd group to a patient with the 2nd blood group; blood of the 1st or 3rd group to a patient with the 3rd blood group; blood of the 1st group to a patient with the 1st blood group. 33.7% of people have the 1st blood group; 37.5% the 2nd; 20.9% the 3rd; the rest of the population the 4th. Find the probability that it is possible to transfuse the blood of a randomly taken donor to a randomly taken patient. Solve this problem for the case when there are two donors.
7. The 1st machine produces 40% of all parts, the 2nd machine 35% and the 3rd one the remaining parts. The shares of defective parts for these machines are 2%, 3% and 4% (respectively). One part was randomly selected. It was defective. Find the probability that it was produced by the 1st machine.
8. The probability of hitting the target is 0.3 for the 1st shooter and 0.8 for the 2nd one. The shooters fired simultaneously. After shooting, only one hit of the target was recorded. Find the probability that the 1st shooter hit the target. Solve this problem for the case when there are three shooters, and the probability of hitting the target for the 3rd shooter is 0.6.
9. We assume that 5% of all men and 0.25% of all women are colorblind. A randomly selected person is colorblind. Find the probability that this person is a man.
10. A manufacturing company employs three analytical plans for the design and development of a product. For cost reasons, all three plans are used at varying times. In fact, plans 1, 2 and 3 are used for 30%, 20% and 50% of the products (respectively). The defect rate differs for the three procedures as follows: P(D | P1) = 0.01, P(D | P2) = 0.03, P(D | P3) = 0.02, where P(D | Pj) is the probability of a defective product, given that plan j was used (j = 1, 2, 3). If a randomly selected product was defective, which plan was most likely used?
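Problems 7 to 10 of this set all combine the total probability formula with Bayes' formula: P(Hi | D) = P(Hi)·P(D | Hi) / Σ P(Hj)·P(D | Hj). A minimal sketch for problem 7 (three machines) is given below; the function name `posterior` is an assumption made for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Return (total probability of the evidence, list of posterior probabilities)."""
    joint = [p * q for p, q in zip(priors, likelihoods)]  # P(H_i) * P(D | H_i)
    total = sum(joint)                                    # total probability formula
    return total, [j / total for j in joint]              # Bayes' formula

# Problem 7: output shares of the machines and their defect rates.
priors = [Fraction(40, 100), Fraction(35, 100), Fraction(25, 100)]
defect_rates = [Fraction(2, 100), Fraction(3, 100), Fraction(4, 100)]

p_defective, post = posterior(priors, defect_rates)
print(p_defective, post[0])
```

For problem 10 the same call with the plan shares and defect rates gives three posteriors, and the largest one identifies the most likely plan.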

1. Six Bernoulli trials were conducted. The probability of occurrence of some event in each trial is 1/5. Find the probability that the event occurred at most 2 times.
2. The defect rate is 5%. Ten parts are randomly selected. Find the probability that among them there are: a) exactly 2 defective parts; b) at least one defective part.
3. There are 5 children in a family. Find the probability that among them there are from 2 to 4 boys (the probability of a boy being born is 0.5).
4. There are (on average) 6 snowy days in December in some city. Five random days in December are given. Find the probability that at least 3 of them are snowy days.
5. Which is more probable for a player facing an equally strong opponent (no draws): winning 2 games out of 4, 3 out of 6, or 4 out of 8?
6. In a test consisting of 10 questions, 5 answers are proposed for each question (only one of them is correct). Find the probability of guessing the answers to at least 5 out of 10 questions.
7. A fair coin is tossed 11 times. Find the most probable number of tails and its probability.
8. An automatic telephone exchange serves 1000 subscribers. The probability that one subscriber will call within one hour is 0.001. Find the probability that 5 subscribers will call within one hour.
9. A book of 1000 pages contains 5 misprints. Estimate the chances that a given page contains at least 2 misprints.
10. If, on average, 1 per cent of people are left-handed, estimate the chances of having at least 4 left-handers among 200 people.
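The problems of this set rely on the binomial formula P(k) = C(n, k)·p^k·(1 − p)^(n − k) and, for large n and small p, on the Poisson approximation with λ = np. The sketch below evaluates problem 1 exactly and problem 8 approximately; the function names are illustrative assumptions, and using the Poisson approximation for problem 8 is a modelling choice rather than something stated in the problem.

```python
from fractions import Fraction
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Poisson probability of k events with parameter lam."""
    return exp(-lam) * lam**k / factorial(k)

# Problem 1: at most 2 occurrences in 6 trials with p = 1/5 (exact fraction).
p_at_most_2 = sum(binomial_pmf(k, 6, Fraction(1, 5)) for k in range(3))

# Problem 8: n = 1000, p = 0.001, so lam = n * p = 1 (Poisson approximation).
p_five_calls = poisson_pmf(5, 1000 * 0.001)

print(p_at_most_2, float(p_at_most_2), p_five_calls)
```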

1. The distribution series of a random variable (RV) ξ is given:

ξ   -2    -1    0     1     2
Р   0.1   0.2   0.4   0.1   x

a) Find x; b) find P{-1 ≤ ξ < 2}; c) find P{|ξ| ≤ 1}; d) find P{ξ ≥ 0}; e) construct the distribution function F(x) (with a graph).

2. A coin is tossed 4 times. RV ξ is the number of occurrences of heads. Find: a) the distribution series of RV ξ; b) the probability that the number of heads will be less than 3; c) the distribution function of ξ (with a graph).
3. Two independent shots at the target were fired. The probability of hitting the target (in one shot) is 0.8. RV ξ is the number of hits of the target. Find the distribution series of RV ξ.
4. Three independent throws of the ball into the basket were made. The probability of a successful throw is 0.8. RV ξ is the number of successful throws. Find the distribution series of RV ξ.
5. There are 3 white and 6 black balls in an urn. Three balls are randomly selected (without replacement). RV ξ is the number of white balls among them. Find the distribution series of RV ξ.
6. Solve #5 in the case of choice with replacement.
7. Two shooters shoot at the target. The probabilities of hitting the target are 0.3 and 0.9 (respectively). Each makes two shots. RV ξ is the number of hits of the target for the 1st shooter, RV η for the 2nd one. Find the distribution series of ξ + η.
8. Find the distribution series of ξη from #7.
9. Two dice are tossed. RV ξ is the number of points occurred on the 1st die, RV η on the 2nd one. Find the distribution series of ξ / η.
10. Find the distribution series for |ξ| and 2ξ + 1, where ξ is the RV from #1.
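A distribution series is simply a table of values and their probabilities, so it can be built and checked mechanically. The sketch below does this for problem 2 (four tosses of a coin, assumed fair); the variable names are illustrative.

```python
from fractions import Fraction
from itertools import accumulate
from math import comb

n, p = 4, Fraction(1, 2)   # four tosses; the coin is assumed to be fair

# Distribution series: P(xi = k) = C(n, k) * p^k * (1 - p)^(n - k)
series = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

# Part b): probability that the number of heads is less than 3.
p_less_than_3 = sum(prob for k, prob in series.items() if k < 3)

# Running totals of the series: the levels between the jumps of F(x).
cumulative = list(accumulate(series.values()))

for k, prob in series.items():
    print(k, prob)
print(p_less_than_3, cumulative)
```

A quick sanity check for any series is that the probabilities sum to 1, which is also how the unknown x in problem 1 is found.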

1. Two independent RVs ξ and η are given. We have: Eξ = 2, Eη = -3, Vξ = 1, Vη = 16. Find: a) E(4ξ - 2η); b) V(4ξ - 2η); c) E(4ξ - 2η)².
2. The distribution series of RV ξ is given:

ξ   -1    0     1     2     4
Р   0.2   0.1   0.4   0.1   0.2

Find: a) E(3ξ - 4); b) V(3ξ - 4); c) the standard deviation σ(3ξ - 4).
3. Find the expectation and the variance of the determinant Δ = ξ11·ξ22 - ξ12·ξ21 of the 2×2 matrix with rows (ξ11, ξ12) and (ξ21, ξ22), where ξij are independent RVs with Eξij = 0, Vξij = σ².
4. Three independent shots were fired at the target. The probability of hitting the target is 0.9 (in one shot). RV ξ is the number of hits of the target. Find: a) the probability that ξ ∈ (0.5; 2]; b) the mean value of ξ; c) the variance of ξ; d) the standard deviation of ξ.
5. There are 2 false coins among 10 coins in an urn. Three coins are randomly selected (without replacement). RV ξ is the number of false coins among the selected ones. Find the mean number of false coins among the selected ones.
6. Solve #5 for the case of choice with replacement.
7. The defect rate is 1%. Four parts are randomly selected. Find the variance of the number of defective parts among the selected ones.
8. There are 2 red and 2 blue balls in an urn. Three balls are randomly selected (with replacement). RV ξ is the number of red balls among the selected ones. Find the standard deviation of ξ.
9. Solve #8 for the case of choice without replacement.
10. Two dice are tossed. Find the expectation and the variance of the sum of points occurred on them.
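Expectation and variance of a discrete RV follow directly from the definitions Eξ = Σ x·p(x) and Vξ = Eξ² − (Eξ)². As an illustration, the sketch below applies them to problem 10 (sum of points on two dice, assumed fair) by enumerating all outcomes; the variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# 36 equally likely outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))
prob = Fraction(1, len(outcomes))

expectation = sum((a + b) * prob for a, b in outcomes)          # E(S)
second_moment = sum((a + b) ** 2 * prob for a, b in outcomes)   # E(S^2)
variance = second_moment - expectation**2                       # V(S)

print(expectation, variance)
```

Because the two dice are independent, the same values also follow from E(ξ + η) = Eξ + Eη and V(ξ + η) = Vξ + Vη, which is a useful cross-check.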

1. The joint distribution of RVs ξ and η is given:

           η (yj)
ξ (xi)     -1      0       1
-2         1/12    1/4     1/6
4          1/3     1/12    x

a) Find x;
b) write down the one-dimensional distributions of ξ and η;
c) check the independence of ξ and η;
d) find Eξ, Vξ, Eη, Vη (use the formulas Eξ = Σ xi·pij, Eη = Σ yj·pij, Eξ² = Σ xi²·pij, Eη² = Σ yj²·pij, where each sum is taken over all i, j);
e) find Cov(ξ, η);
f) find the correlation coefficient ρ(ξ, η);
g) make a conclusion about the direction and strength of the stochastic linear relation between ξ and η.

2. The joint distribution of RVs ξ and η is given:

        η
ξ       -1     4
1       0.1    0.2
2       0.2    0.1
4       0.3    0.1

a) Find the joint distribution of RVs ξ+η and ξη;
b) write down the one-dimensional distributions of ξ+η and ξη;
c) check the independence of ξ+η and ξη.
d) If ξ+η and ξη are independent, then ρ(ξ+η, ξη) = ?
e) If ξ+η and ξη are dependent, then find ρ(ξ+η, ξη).
f) Make a conclusion about the direction and strength of the stochastic linear relation between ξ+η and ξη.

1. A function is given:

p(x) = 0 if x ≤ 0;
p(x) = c·x·e^(-x) if x > 0.

At what value of the parameter c will this function be the distribution density of some continuous RV?
2. The distribution function of a continuous RV ξ is given:

F(x) = 0 if x ≤ 0;
F(x) = x² if 0 < x ≤ 1;
F(x) = 1 if x > 1.

Find: a) the distribution density p(x); b) P{ξ ∈ (0.5; 1.5)}; c) Eξ; d) Vξ; e) σξ.
3. The distribution function of a continuous RV ξ is given:

F(x) = 0 if x ≤ 2;
F(x) = (x - 2)² if 2 < x ≤ 3;
F(x) = 1 if x > 3.

Find: a) the distribution density p(x); b) P{ξ ∈ (1.5; 2.5)}; c) Eξ; d) Vξ; e) σξ.
4. Uniform (on a segment) distribution. The distribution density of a continuous RV ξ is given:

р(х) = 1/2 if 1 ≤ x ≤ 3;
р(х) = 0 if x < 1 or x > 3.

Find: a) the distribution function; b) P{ξ ∈ (0.5; 1.5)}; c) Eξ; d) Vξ; e) σξ.
5. The distribution density of RV ξ is given: р(x) = 3(х + 3)² if х ∈ [-3; -2] and р(x) = 0 if х ∉ [-3; -2]. Find the distribution function.
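For a continuous RV the density is p(x) = F′(x), and Eξ = ∫ x·p(x) dx, Vξ = ∫ x²·p(x) dx − (Eξ)². Results obtained by hand can be checked numerically. The sketch below does so for problem 2 of this set, where F(x) = x² on (0, 1) and therefore p(x) = 2x there; the midpoint-rule helper `integrate` is an illustrative assumption, not a method prescribed by the manual.

```python
def integrate(f, a, b, steps=100_000):
    """Approximate the integral of f over [a, b] by the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def density(x):
    # p(x) = F'(x) = 2x on (0, 1); the density is zero outside this interval.
    return 2 * x

total = integrate(density, 0, 1)                                   # must equal 1
mean = integrate(lambda x: x * density(x), 0, 1)                   # E(xi)
variance = integrate(lambda x: x**2 * density(x), 0, 1) - mean**2  # V(xi)
std = variance ** 0.5                                              # sigma
# P{0.5 < xi < 1.5}: the density vanishes above 1, so integrate over (0.5, 1).
p_interval = integrate(density, 0.5, 1)

print(total, mean, variance, std, p_interval)
```

The interval probability can equally be read off the distribution function as F(1.5) − F(0.5).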

1. The density function of RV ξ is given:

p(х) = 0.5 if x ∈ [c; 3];
p(х) = 0 if x ∉ [c; 3].

a) Find the unknown constant c; b) construct the distribution function (with a graph); c) find the expectation of ξ; d) find the variance of ξ; e) find the standard deviation of ξ; f) what is the name of this distribution?
2. Random variable ξ ~ U[a, b]. Derive the formulas for calculating: a) F(x) (with a graph); b) Eξ; c) Vξ.
3. Random variable ξ ~ Exp(λ). Derive the formulas for calculating: a) F(x) (with a graph); b) Eξ; c) Vξ.
4. Random variable ξ ~ N(0, 1). Derive the formulas for calculating: a) F(x) (with a graph); b) Eξ.

1. The length of manufactured products is RV with an expectation 90 cm and variance 0.0225. By using Chebyshev’s inequality, estimate the probability that: a) the deviation of the length of the product from its expectation does not exceed 0.4 (in absolute value); b) the length is between 89.7 and 90.3.

2. The device consists of 10 independently working items. The probability of failure of each item in time t is 0.05. By using Chebyshev’s inequality, estimate the probability that the absolute value of the difference between the number of failed items and the average number of failures in time t is less than 2.
3. A discrete RV ξ is given:

ξ   0.3   0.6
Р   0.2   0.8

By using Chebyshev’s inequality, estimate the probability that |ξ - Eξ| < 0.2.
4. The sequence of independent RVs ξ1, ξ2, …, ξn, … is given. RV ξn (n = 1, 2, …) takes only three values -√n; 0; √n with the probabilities 1/n, 1 – 2/n, 1/n. Does Chebyshev’s theorem apply to this sequence?
5. The sequence of independent RVs ξ1, ξ2, …, ξn, … is given:

ξn   -A                 A
Р    (n+1)/(2n+1)       n/(2n+1)

Does Chebyshev’s theorem apply to this sequence?
6. A fair die is tossed independently 600 times. Give an approximation (by the central limit theorem, CLT) for the probability that the number of sixes is between 95 and 110.
7. 1000 persons arrive at the left or right entrance of a theatre independently (each chooses between the entrances with probability 0.5). How many hangers must be placed in the left and right cloakrooms if the administration of the theatre wants to allow only a 1 percent chance of the event that somebody cannot leave his/her coat in the nearest cloakroom? (Each person has a coat.)

8. 100 numbers have been randomly chosen on the segment [0; 1]; more precisely, 100 independent RVs ξ1, ξ2, …, ξ100 uniformly distributed on the segment [0; 1] are considered. Find the probability that their sum is between 51 and 60, i.e. P(51 ≤ Σξi ≤ 60).
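Two tools are used throughout this set: Chebyshev's inequality, P(|ξ − Eξ| < ε) ≥ 1 − Vξ/ε², and the normal approximation from the CLT. The sketch below applies the first to problem 1 (variance 0.0225) and the second to problem 6 (600 tosses of a die). The function names are illustrative assumptions, and the CLT estimate is computed without a continuity correction, which is one of several reasonable conventions.

```python
from math import erf, sqrt

def chebyshev_lower_bound(variance, eps):
    """Lower bound for P(|X - EX| < eps) given by Chebyshev's inequality."""
    return max(0.0, 1 - variance / eps**2)

def normal_cdf(z):
    """Standard normal distribution function, expressed through erf."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def clt_binomial(n, p, low, high):
    """Normal approximation of P(low <= S_n <= high), no continuity correction."""
    mean, sd = n * p, sqrt(n * p * (1 - p))
    return normal_cdf((high - mean) / sd) - normal_cdf((low - mean) / sd)

bound_a = chebyshev_lower_bound(0.0225, 0.4)   # problem 1a
bound_b = chebyshev_lower_bound(0.0225, 0.3)   # problem 1b: 89.7..90.3 is 90 +/- 0.3
p_sixes = clt_binomial(600, 1 / 6, 95, 110)    # problem 6

print(bound_a, bound_b, p_sixes)
```

Chebyshev's inequality gives a guaranteed but usually crude bound, while the CLT gives an approximation whose quality depends on n.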

1. A sample is given: 20, 20, 20, 80, 50, 50, 50, 60, 50, 80, 20, 60, 60, 20, 60, 60. a) Construct a variation series; b) construct a statistical series; c) construct an empirical distribution function (with a graph); d) calculate the sample mean; e) calculate the sample variance; f) calculate an unbiased estimator for the variance; g) find the standard error; h) construct a histogram (using the Sturges formula to find the number of equidistant intervals).
2. The results of simultaneous observations of RVs ξ and η are given:

ξ: Xi   8   5   5   6   4
η: Yi   2   3   4   4   8

Draw a scatterplot. Find: a) the sample mean X̄; b) the sample mean Ȳ; c) the sample variance S²ξ; d) the sample variance S²η; e) the sample covariance Sξη; f) the sample correlation coefficient r. Make a conclusion about the direction and strength of the stochastic linear relation between ξ and η.
3. A two-dimensional sample of observations of two RVs ξ and η is given: (1,2), (1,3), (1,2), (1,2), (2,1), (2,3), (2,3), (2,3), (3,1), (3,2), (3,3), (3,3), (3,3), (3,3), (4,2), (4,2). a) Construct a table of frequencies; b) find the sample correlation coefficient between RVs ξ and η; c) interpret the value of the sample correlation coefficient.
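The descriptive statistics requested in problem 1 can be verified with Python's standard `statistics` module. The sketch below is only a check of hand calculations; it assumes that "standard error" in part g) means the standard error of the mean, s/√n, which the problem statement does not spell out.

```python
import statistics
from collections import Counter

sample = [20, 20, 20, 80, 50, 50, 50, 60, 50, 80, 20, 60, 60, 20, 60, 60]

variation_series = sorted(sample)                    # a) ordered sample
statistical_series = sorted(Counter(sample).items())  # b) (value, frequency) pairs
mean = statistics.mean(sample)                       # d) sample mean
biased_variance = statistics.pvariance(sample)       # e) divides by n
unbiased_variance = statistics.variance(sample)      # f) divides by n - 1
# g) standard error of the mean (an assumed reading of "standard error")
standard_error = (unbiased_variance / len(sample)) ** 0.5

print(statistical_series)
print(mean, biased_variance, unbiased_variance, standard_error)
```

The two variances differ only by the factor n/(n − 1), which is a quick consistency check.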

1. A sample from the normal distribution with the parameters a ∈ R and σ² > 0 is given. By the method of moments, construct an estimator for: a) the unknown mean value a; b) the unknown variance σ² if the mean value a is known; c) the two-dimensional parameter (a, σ²).
2. Find an estimator for the parameter θ of the uniform distribution on the interval (θ - 1, θ + 1) by the method of moments.
3. Find an estimator for the parameter p (0 < p < 1) of the Bernoulli distribution by the method of moments.
4. Find an estimator for the parameter θ > 0 of the distribution with the density p(y) = θ·y^(θ-1) (y ∈ [0; 1]) by the maximum likelihood (ML) method.

5. Find an estimator for the parameters a and 2 of normal distribution by the ML method. 6. Let X1, ..., Xn be a sample size of n from uniform distribution on (; +a), where   R, a is known. Find ML-estimator for . 7. Let X1, ..., Xn be a sample size of n from distribution with the 3  density p(x) = 2 x e 2

x  

2

4

2

. Find ML-estimator for .

8. Let X1, ..., Xn be a sample of size n from the Poisson distribution with the parameter λ. Find the ML-estimator for λ.
9. Over a long period of analyzing some material, a standard deviation of 0.12% was established for the iron content. Find, with a confidence level of 0.95, the confidence interval for the true iron content in the sample if, according to the results of 6 analyses, the average iron content was 32.56%.
10. A sample is extracted from the population: -2; 1; 5; 4; 4; -2; 2; 2; 3; 3. Estimate, with confidence probability 0.9, the expectation of a normally distributed characteristic of the general population by the sample mean, using a confidence interval.
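When the standard deviation σ is known, as in problem 9, the confidence interval for the mean has the form x̄ ± z·σ/√n, where z is the (1 + γ)/2 quantile of the standard normal distribution. A minimal sketch follows; it assumes normally distributed measurement errors, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(mean, sigma, n, level):
    """Confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf((1 + level) / 2)   # two-sided normal quantile
    half_width = z * sigma / sqrt(n)
    return mean - half_width, mean + half_width

# Problem 9: sigma = 0.12, n = 6 analyses, sample mean 32.56, level 0.95.
low, high = z_confidence_interval(32.56, 0.12, 6, 0.95)
print(round(low, 3), round(high, 3))
```

Problem 10 differs in that σ is unknown, so the normal quantile would have to be replaced by a Student quantile with n − 1 degrees of freedom; the standard library has no built-in for it, so it is not shown here.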

IWS №1
Solve the tasks №№ 1 – 5 (for the values of the letters N, m, etc., see “VARIANTS”).

Problem 1
Two dice are tossed. Find the probability that: a) the sum of points is less than or equal to N; b) the product of points is less than or equal to N; c) the product of points is a multiple of N; d) a “6” occurs on at least one of the dice.

Problem 2
There are 4 varieties of items: ni items of the ith variety (i = 1, 2, 3, 4). We randomly select (without replacement) m items for the control. Find the probability that there are mi items of the ith variety (i = 1, 2, 3, 4; m1 + m2 + m3 + m4 = m) among them. Solve this problem for the case of the choice with replacement.

Problem 3
There are k winning lottery tickets among n lottery tickets. We bought m tickets. Find the probability that there are l winning lottery tickets among them.

Problem 4
An elevator with n people in it serves k floors. It starts on the first floor, and each person can get off at any floor starting from the second one. Find the probability that: a) all persons get off at different floors; b) all persons get off at the same floor; c) all persons get off at the 2nd floor; d) at least 2 people get off at the same floor.

Problem 5
We throw a point at random on a segment of unit length. Find the probability that the distance from the point to both ends of the segment exceeds the value of 1/k.

VARIANTS

Problem №    1    2                            3               4        5
Variant №    N    n1 n2 n3 n4   m1 m2 m3 m4    n   l  m  k     k   n    k
 1           3     1  2  3  4    1  1  2  3    10  2  4  6     6   4    4
 2           4     2  2  4  2    1  1  1  2    10  2  3  6     7   4    5
 3           5     2  3  4  1    1  2  3  1    10  3  5  7     8   5    6
 4           6     1  4  2  3    1  2  1  2    10  3  5  6     9   5    5
 5           7     4  2  2  2    3  1  2  1    11  2  5  7    10   6    6
 6           8     3  2  3  2    2  1  3  1    11  3  4  8    11   4    7
 7           9     5  1  2  2    3  1  1  1    11  3  5  7    12   4    6
 8          10     2  5  2  1    1  3  1  1    12  3  8  5    13   3    7
 9           3     4  2  3  2    2  1  2  1    12  2  8  3    14   3    8
10           4     3  3  4  1    2  1  2  1    12  2  5  4    13   4    7
11           5     2  3  3  3    1  2  3  1     9  2  4  6    12   3    8
12           6     1  3  4  3    1  2  2  1     9  3  5  6    11   3    5
13           7     2  3  4  2    1  2  3  1     9  2  3  7    10   4    6
14           8     1  2  3  5    1  1  2  3     8  2  4  5     9   4    7
15           9     2  3  4  2    1  2  2  1     8  2  5  4     8   3    8
16          10     3  2  2  4    2  1  1  1     8  3  4  5     7   3    9
17          11     4  3  2  3    2  1  2  1    10  4  6  5     6   4    8
18          12     3  3  4  2    2  1  2  2    10  5  7  7     7   4    7
19          13     2  4  5  1    2  2  3  1    10  4  6  7     8   5    6
20          14     3  4  3  2    2  2  3  2    12  4  8  6     9   5    5
21          15     2  5  2  3    1  3  1  2     8  2  3  4    10   6    4
22          16     4  4  2  2    2  2  2  1     8  2  3  5    11   4    4
23          17     2  7  2  1    1  5  2  1     8  2  4  3    12   4    5
24          18     3  1  6  2    2  1  3  1     8  3  5  4    13   3    6
25          19     2  2  2  3    1  1  1  2     8  1  4  2    14   3    7
26          20     1  3  3  2    1  3  1  1     9  2  3  5    12   3    8
27           3     1  4  2  2    0  2  1  1     9  3  4  4    11   3    9
28           4     2  3  1  3    1  2  0  1     9  2  6  3    10   4   10
29           5     3  1  2  3    0  1  1  2     9  4  5  5     9   4    9
30           6     3  2  3  1    2  2  2  0     9  3  5  4     8   3    8

IWS №2
Solve the tasks №№ 6 – 10 (for the values of the letters Ti, t, etc., see “VARIANTS”).

Problem 6
The starting moments of two events are uniformly distributed in the time interval from T1 to T2 (hours). One event lasts 10 minutes, the other lasts t minutes. Find the probability that: a) the events overlap in time; b) the events do not overlap in time.

Problem 7
A point appears at random in a circle of radius R. Find the probability that it will fall into one of two disjoint figures having the areas S1 and S2.

Problem 8
There are k1 % and k2 % of non-defective products in two batches (respectively). From each batch, one product is randomly selected. Find the probability of finding among them: a) at least one defective product; b) two defective products; c) one non-defective and one defective product.

Problem 9
Two shooters are shooting at the target. The probability of hitting the target in one shot is р1 for the first shooter and р2 for the second one. The first shooter made n1 shots, the second made n2. Find the probability that the target was not hit.

Problem 10
Two players A and B take turns tossing a coin. The winner is the one who first obtains “tails”. Player A makes the first throw, player B makes the second throw, then player A makes the third throw, etc.
a) Find the probability that:
Variants 1-8. Player A won before the k-th throw.
Variants 9-15. Player A won not later than the k-th throw.
Variants 16-23. Player B won before the k-th throw.
Variants 24-31. Player B won not later than the k-th throw.

Variants 32-40. Player A won at the k-th throw. b) Find these probabilities, if the game is infinite. VARIANTS Problem № Variant T1 № 1 9

6

7

8

9

T2

t

R

S1

S2

k1

k2

p1

p2

10 n1

n2

k

10

10

11

2.25

3.52

71

47

0.61 0.55

2

3

4

2

9

11

20

12

2.37

3.52

78

39

0.62 0.54

3

2

5

3

10

11

10

13

2.49

3.52

87

31

0.63 0.53

2

3

6

4

10

12

20

14

2.55

1.57

72

46

0.64 0.52

3

2

7

5

11

12

15

11

2.27

5.57

79

38

0.65 0.51

2

3

8

6

11

13

15

12

2.39

5.57

86

32

0.66 0.49

3

2

9

7

9

930

10

13

2.51

1.57

73

45

0.67 0.48

2

3

10

8

9

1130 20

14

2.57

3.52

81

37

0.68 0.47

3

2

11

30

15

11

2.29

3.52

85

33

0,69 0.46

2

3

4

10

1130

15

12

2.41

3.52

74

44

0.71 0.45

3

2

5

11

11

1130

5

13

2.53

3.52

82

36

0.72 0.44

2

3

6

12

11 1230

5

14

2.59

5.57

84

34

0.73 0.43

3

2

7

13

12

13

5

15

2.5

8.7

75

43

0.74 0.42

2

3

8

14

12

1230

10

16

2.6

8.5

83

35

0.75 0.41

3

2

9

15

12 1330

5

11

2.2

3.5

76

42

0.76 0.39

2

3

10

16

13

14

10

12

2.4

3.5

77

41

0.77 0.38

3

2

12

17

18

19

10

13

2.5

3.5

47

71

0.78 0.37

2

3

5

18

18

20

20

14

2.6

1.8

39

78

0.39 0.45

3

2

6

19

17

18

10

15

2.7

7.9

31

87

0.38 0.46

2

3

7

20

17

19

20

16

2.7

8.2

72

46

0.37 0.47

3

2

8

21

19

20

15

11

2.3

3.5

38

79

0.36 0.48

2

3

9

22

19

21

15

12

2.4

3,5

32

86

0.35 0.49

3

2

10

17 10 17 1830 20

13 14

2.5 2.6

3.5 5.6

73 81

45 37

0.34 0.51 0.33 0.52

2 3

3 2

11 4

9 10

23 24

10 10

1730

25

16 1630 15

15

2.5

8.7

33

85

0.32 0.53

2

3

5

26

16

1730

15

11

2.3

5.6

44

74

0.31 0.54

3

2

6

27

17 1730

5

12

2.4

5.6

36

82

0.29 0.55

2

3

7

28

17

1830

5

13

2.5

3.5

84

34

0.28 0.56

3

2

8

29

16

17

5

14

2.6

5.6

75

43

0.27 0.57

2

3

9

30

16 1630 10

15

2.7

7.9

83

35

0.26 0.58

3

2

10

IWS №3
Solve the tasks №№ 11-15 (for the values of the letters M, ni, etc., see “VARIANTS”).

Problem 11
An urn contains M balls numbered from 1 to M. The balls are drawn one by one (without replacement). We consider the following events: A = {the balls were drawn in the sequence 1, 2, ..., M}; B = {at least once, the ordinal number of the draw coincides with the number on the ball drawn}; C = {the ordinal number of the draw never coincides with the number on the ball drawn}. Find the probabilities of the events A, B, C. Find these probabilities as M tends to infinity.

Problem 12
Among 1000 lamps there are ni lamps which belong to batch #i, i = 1, 2, 3 (n1 + n2 + n3 = 1000). There are 6% defective lamps in the 1st batch, 5% in the 2nd and 4% in the 3rd one. One lamp is randomly selected. Find the probability that it is defective.

Problem 13
There are N1 white and М1 black balls in the 1st urn, and N2 white and М2 black balls in the 2nd one. We move k balls from the 1st urn to the 2nd, then we randomly select one ball from the 2nd urn. Find the probability that it is white.

Problem 14
There are k unused and l cancelled postage stamps in an album. Then m stamps are randomly selected (there may be both unused and cancelled stamps among them), they are cancelled and returned to the album. After this, n stamps are selected again. Find the probability that they are all unused.

Problem 15
A store stocks products from three factories: mi % of the products come from the i-th factory (i = 1, 2, 3). Factory #i produces ni % of its products in the first grade. One product is bought. It turns out to be of the first grade. Find the probability that it was produced by the j-th factory.

VARIANTS Problem 11 № Variant № M

12 n1

13 n2 N1 M1 N2 M2 k k

14

15

l m n m1 m2 m3 n1 n2 n3

j

1

12 100

250 4 1 2 5 3 8 10 3 2 50 30 20 70 80 90 1

2

8

430

180 7 3 5 1 4 7 6 2 3 50 30 20 70 80 90 2

3

5

170

540 2 3 5 4 1 6 8 3 1 50 30 20 70 80 90 3

4

11 520

390 8 2 3 2 5 12 5 3 2 60 20 20 70 80 90 1

5

7

600 6 4 1 7 2 13 11 2 4 60 20 20 70 80 90 2

6

10 700

90

7

6

610 5 5 4 10 6 12 7 2 4 40 30 30 80 80 90 1

360 240

3 2 4 4 2 11 8 2 5 60 20 20 70 80 90 3

8

9

80

710 3 2 4 6 4 9 6 2 3 40 30 30 80 80 90 2

9

3

630

230 1 9 3 3 4 10 7 4 1 40 30 30 80 80 90 3

10

8

600

320 3 7 5 2 3 11 7 4 4 40 20 40 90 90 80 1

11

5

810

70

12

10 450

280 2 3 7 1 2 8 7 3 3 40 20 40 90 90 80 3

13

6

640

240 2 2 3 1 1 12 10 4 2 70 20 10 70 80 90 1

14

9

470

470 2 5 3 1 6 9 6 1 3 70 20 10 70 80 90 2

15

4

80

480 6 4 3 3 4 6 8 3 2 70 20 10 70 80 90 3

16

7

570

270 5 5 4 3 3 14 13 3 3 60 10 30 80 90 80 1

17

5

200

200 5 3 5 2 5 11 10 4 5 60 10 30 80 90 80 2

18

11 190

600 12 1 4 7 5 7 5 2 2 60 10 30 80 90 80 3

19

9

100

500 2 4 5 5 5 15 9 4 3 50 20 30 90 80 90 1

20

6

200

200 5 8 2 6 4 8 10 3 3 50 20 30 90 80 90 2

21

12 690

150 4 8 1 2 3 12 5 2 2 50 20 30 90 80 90 3

22

8

550

250 5 2 2 4 2 14 11 3 5 30 30 40 70 70 80 1

23

10 700

300 2 2 4 5 3 6 7 2 2 30 30 40 70 70 80 2

24

7

440

440 5 2 2 6 5 13 9 4 4 30 30 40 70 70 80 3

25

3

360

360 7 3 5 1 4 9 6 3 3 20 40 40 90 90 80 1

26

6

230

270 2 4 2 4 5 11 10 2 5 20 40 40 90 90 80 2

27

9

160

360 5 3 5 7 4 7 8 4 3 20 40 40 90 90 80 3

28

4

270

570 4 5 5 8 2 12 11 5 4 10 50 40 70 90 80 1

29

7

620

300 4 8 2 4 4 8 3 2 2 10 50 40 70 90 80 2

30

5

140

540 2 3 4 2 4 6 6 1 2 10 50 40 70 90 80 3

4 6 7 8 5 13 8 5 2 40 20 40 90 90 80 2

IWS №4
Solve the tasks №№ 16-20 (for the values of the unknown letters, see “VARIANTS”).

Problem 16
The probability of winning the lottery with one ticket is p. Suppose n tickets are purchased. Find the most probable number of winning tickets and the corresponding probability.

Problem 17
Each lottery ticket may carry a large prize with probability p1, a small prize with probability p2, or no prize with probability p3 (p1 + p2 + p3 = 1). Suppose n tickets are purchased. Find the probability of getting n1 large and n2 small prizes.

Problem 18
The probability of failure of the telephone system at each call is p. Suppose n calls are received. Find the probability of m failures.

Problem 19
The probability of occurrence of an event in each of n independent trials is p. Find the probability that the number m of occurrences of the event satisfies the following condition:
Variants 1 – 11: k1 ≤ m ≤ k2.
Variants 12 – 21: k1 ≤ m.
Variants 22 – 31: m ≤ k2.
Variants 32 – 40: m = k1.

Problem 20
A coin is tossed until heads has appeared n times. Find the probability that tails appears m times.

VARIANTS 16 Problem №  p n Variant №  1 0.3 10

n 15

17 n1 n2 p1 1 2 0.1

p2 0.2

m 7

18 n 1000

p 0.002

2

0.3

14

15

2

1 0.15 0.15

7

1000

0.003

3

0.3

13

15

2

2 0.15 0.15

7

1000

0.004

4

0.3

12

15

1

1

0.1 0.15

7

1000

0.005

5 6 7 8

0.3 0.3 0.4 0.4

11 15 11 13

15 15 15 15

3 2 3 1

2 0.2 0.25 2 0.15 0.2 1 0.2 0.15 2 0.13 0.17

7 7 7 7

1000 1000 1000 1000

0.006 0.007 0.008 0.009

9

0.4

14

15

2

1 0.14 0.16

7

1000

0.01

10

0.4

10

15

1

3 0.16 0.24

7

1000

0.011

11

0.4

12

15

3

2 0.17 0.23

8

200

0.01

12

0.4

15

15

3

1 0.18 0.12

8

300

0.01

13

0.5

12

15

3

1 0.19 0.11

8

200

0.02

14

0,4

12

15

3

3

0.2 0.26

8

500

0.01

15

0.5

11

14

1

3 0.09 0.21

8

300

0.02

16

0.5

13

14

1

4

0.1 0.21

8

700

0.01

17

0.5

14

14

2

2 0.11 0.2

8

400

0.02

18

0.5

15

14

2

4 0.12 0.2

8

900

0.01

19 20

0.6 0.6

13 11

14 14

3 2

3 0.15 0.2 3 0.2 0.2

8 8

500 1000

0.02 0.011

21

0.6

12

14

3

4

0.3

0.2

9

500

0.004

22

0.6

10

14

2

3

0.1

0.2

9

600

0.005

23

0.6

15

14

3

4

0.2 0.25

9

400

0.01

24

0.6

14

14

5

4 0.25 0.35

9

500

0.01

25

0.7

14

14

4

4 0.21 0.39

9

600

0.01

26 27 28

0.7 0.7 0.7

10 15 11

14 14 14

4 2 1

3 0.1 0.3 2 0.25 0.35 2 0.1 0.15

9 9 9

1000 1000 1000

0.001 0.008 0.009

29

0.7

12

14

1

1 0.05 0.15

9

1000

0.01

30

0.7

13

14

1

2

9

1000

0.011

0.1

0.1

Problem №     19                          20
Variant №     n     P      k1    k2       n    m
1             100   0.8    80    90       3    2
2             100   0.8    85    95       7    3
3             100   0.8    70    95       4    7
4             100   0.7    83    93       4    3
5             100   0.7    50    60       3    6
6             100   0.7    65    75       6    5
7             100   0.7    70    80       3    5
8             100   0.6    40    50       8    3
9             100   0.75   65    80       6    4
10            100   0.75   70    85       4    5
11            100   0.75   68    78       2    7
12            100   0.7    60             5    4
13            100   0.7    70             8    6
14            100   0.7    80             2    6
15            100   0.6    65             2    3
16            100   0.6    75             4    2
17            100   0.6    50             7    6
18            100   0.8    70             5    3
19            100   0.8    80             4    6
20            100   0.8    90             8    5
21            100   0.8    95             6    3
22            100   0.3    20             5    2
23            100   0.3    30             3    7
24            100   0.3    40             6    8
25            200   0.4    80             5    6
26            200   0.4    90             7    4
27            200   0.4    100            5    7
28            300   0.8    250            6    2
29            400   0.6    270            7    5
30            400   0.7    290            8    4

IWS №5

Solve tasks №№ 21-25 (for the unknown letters, see "VARIANTS").

Problem 21
The RV ξ is given by its distribution:

ξ    -2     0      2      n+2
P    0.1    0.3    0.4    x

Find: a) x; b) P{ξ ∈ (n/2; n]}; c) F(x); d) Eξ, Vξ, σξ.

Problem 22
A discrete RV ξ takes only two values x1 and x2 (x1 < x2). Given: the probability p1 of the value x1, the expectation Eξ and the variance Vξ. Find the distribution of this RV.

Problem 23
Given: the distribution density p(x) of the RV ξ. Find: a) the parameter λ; b) Eξ; c) Vξ; d) F(x); e) P(x1 < ξ < x2).

Variants 1 – 8. $p(x) = \begin{cases} \dfrac{1}{\lambda - a}, & x \in [a, b], \\ 0, & x \notin [a, b]. \end{cases}$

Variants 9 – 16. $p(x) = \begin{cases} a, & x \in [\lambda, b], \\ 0, & x \notin [\lambda, b]. \end{cases}$

Variants 17 – 24. $p(x) = \begin{cases} \lambda, & x \in [a, b], \\ 0, & x \notin [a, b]. \end{cases}$

Variants 25 – 31. $p(x) = \begin{cases} a, & x \in \left[\dfrac{b}{2} - \lambda,\ \dfrac{b}{2} + \lambda\right], \\ 0, & x \notin \left[\dfrac{b}{2} - \lambda,\ \dfrac{b}{2} + \lambda\right]. \end{cases}$

Variants 32 – 40. $p(x) = \begin{cases} \dfrac{1}{b - \lambda}, & x \in [a, b], \\ 0, & x \notin [a, b]. \end{cases}$

Problem 24
The RV ξ is given by its distribution function F(x). Find its density function, expectation and variance.

Problem 25
Given: the expectation A and the standard deviation σ of a normally distributed RV ξ. Find the probability that ξ takes a value in the given interval (a, β).
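The steps of Problem 21 can be sketched in Python as follows. This is only an illustration under stated assumptions: the value n = 1 is taken as an example of the variant parameter, and the distribution function is computed with the convention F(t) = P(ξ ≤ t); if the course defines F(t) = P(ξ < t), the comparison in `cdf` should be made strict. All variable and function names are hypothetical.

```python
from fractions import Fraction

# Assumed illustrative value of the variant parameter n.
n = 1

# Distribution table of Problem 21; the last probability x is unknown.
values = [-2, 0, 2, n + 2]
known_probs = [Fraction(1, 10), Fraction(3, 10), Fraction(4, 10)]

# a) The probabilities must sum to 1, which determines x.
x = 1 - sum(known_probs)
probs = known_probs + [x]

# b) P(n/2 < xi <= n): add the probabilities of the values inside the half-open interval.
lo, hi = Fraction(n, 2), n
p_interval = sum(p for v, p in zip(values, probs) if lo < v <= hi)

# c) Distribution function, assuming the convention F(t) = P(xi <= t).
def cdf(t):
    return sum(p for v, p in zip(values, probs) if v <= t)

# d) Expectation, variance (E[xi^2] - (E[xi])^2) and standard deviation.
mean = sum(v * p for v, p in zip(values, probs))
variance = sum(v * v * p for v, p in zip(values, probs)) - mean ** 2
std = float(variance) ** 0.5

print(x, p_interval, cdf(0), mean, variance, std)
```

Exact fractions are used so that the check "probabilities sum to 1" is not affected by floating-point rounding.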

VARIANTS

Problem №     21      22                      23
Variant №     n       p1     E      V         a       b      x1     x2
1             1       0.1    3.9    0.09      2.5     4      3      3.3
2             2       0.3    3.7    0.21      1.5     3      2      2.6
3             3       0.5    3.5    0.25      1.5     2.5    2      2.3
4             4       0.7    3.3    0.21      1       3.5    2      2.8
5             5       0.9    3.1    0.09      -1      2      -0.7   1.1
6             6       0.9    2.2    0.36      -2      1      -1.5   0.3
7             7       0.8    3.2    0.16      -3      5      -2     2
8             8       0.6    3.4    0.24      -1.5    2.5    -1     0
9             9       0.4    3.6    0.24      1       1.8    1.3    1.6
10            10      0.2    3.8    0.16      1       2.4    1.5    2
11            1.5     0.5    5.5    0.25      2       3.5    2.5    3
12            2.5     0.4    5.2    0.96      2       2.8    2.1    2.5
13            3.5     0.3    7.7    0.21      1       2.8    -1     3
14            4.5     0.2    8.8    0.16      1       2.6    1.5    3
15            5.5     0.7    5.6    0.84      2       3      1      3
16            6.5     0.8    1.4    0.64      2       4.8    4.5    5
17            7.5     0.1    0.8    0.36      -4      -2     -1     0
18            8.5     0.2    0.4    1.44      -3      -1     -2     0
19            9.5     0.3    -0.2   3.36      2       4      0      3
20            10.5    0.4    0.6    0.24      1       3      0      2
21            0.1     0.5    -1     1         1       1.5    0      0.5
22            0.2     0.6    -0.6   0.24      -1      1.5    0      1
23            0.3     0.5    0      4         -1.5    -1     -1     2
24            0.4     0.4    0      6         -1.5    1      -1     1
25            0.5     0.3    2.1    1.89      0.5     1      0      3
26            0.6     0.2    1.6    0.64      0.2     2      0      4
27            0.7     0.1    2.7    0.81      0.5     3      0      0.5
28            0.8     0.5    1.5    0.25      0.4     4      1      5
29            0.9     0.1    0.8    0.36      0.25    1      0      3
30            11      0.2    2.6    0.64      0.02    2      0      3
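For Problem 22, the two unknown values follow from the pair of equations Eξ = p1·x1 + p2·x2 and Vξ = p1·p2·(x2 − x1)², where p2 = 1 − p1. A minimal Python sketch of this calculation is given below; the function name `two_point_distribution` is hypothetical, and the numbers in the usage line are the variant 1 data (p1 = 0.1, E = 3.9, V = 0.09).

```python
import math

def two_point_distribution(p1, mean, variance):
    """Recover a two-point distribution (x1 < x2) from p1, the mean and the variance."""
    p2 = 1.0 - p1
    # From V = p1 * p2 * (x2 - x1)^2 the gap between the two values is:
    gap = math.sqrt(variance / (p1 * p2))
    # From E = p1 * x1 + p2 * x2 with x2 = x1 + gap:
    x1 = mean - p2 * gap
    x2 = mean + p1 * gap
    return (x1, p1), (x2, p2)

# Variant 1 data: the result is approximately x1 = 3 with probability 0.1
# and x2 = 4 with probability 0.9.
(x1, prob1), (x2, prob2) = two_point_distribution(0.1, 3.9, 0.09)
print(x1, prob1, x2, prob2)
```

Substituting back confirms the answer: 0.1·3 + 0.9·4 = 3.9 and 0.1·0.9·(4 − 3)² = 0.09.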

Problem № 24

Variant №   F(x)
1    0 if $x \le 0$; $x^2$ if $0 < x \le 1$; 1 if $x > 1$
2    0 if $x \le 1$; $(x^2 - x)/2$ if $1 < x \le 2$; 1 if $x > 2$
3    0 if $x \le 0$; $x^3$ if $0 < x \le 1$; 1 if $x > 1$
4    0 if $x \le 0$; $3x^2 + 2x$ if $0 < x \le 1/3$; 1 if $x > 1/3$
5    0 if $x \le 2$; $x/2 - 1$ if $2 < x \le 4$; 1 if $x > 4$
6    0 if $x \le 0$; $x^2/9$ if $0 < x \le 3$; 1 if $x > 3$
7    0 if $x \le 0$; $x^2/4$ if $0 < x \le 2$; 1 if $x > 2$
8    0 if $x \le -\pi/2$; $\cos x$ if $-\pi/2 < x \le 0$; 1 if $x > 0$
9    0 if $x \le 0$; $2\sin x$ if $0 < x \le \pi/6$; 1 if $x > \pi/6$
10   0 if $x \le 3\pi/4$; $\cos 2x$ if $3\pi/4 < x \le \pi$; 1 if $x > \pi$
11   0 if $x < 1$; $(x-1)^2$ if $1 \le x \le 2$; 1 if $x > 2$
12   0 if $x < 0$; $x^3$ if $0 \le x \le 1$; 1 if $x > 1$
13   0 if x -1
14   0 if $x < 5$; $(x-5)^3$ if $5 \le x \le 6$; 1 if $x > 6$
15   0 if $x < -1$; $(x+1)^3$ if $-1 \le x \le 0$; 1 if $x > 0$
16   0 if $x < 2$; $x - 2$ if $x \in [2; 3]$; 1 if $x > 3$
17   0 if $x < -5$; $(x+5)^2$ if $x \in [-5; -4]$; 1 if $x > -4$
18   0 if $x < -4$; $(x+4)^2$ if $x \in [-4; -3]$; 1 if $x > -3$
19   0 if $x < -3$; $(x+3)^2$ if $x \in [-3; -2]$; 1 if $x > -2$
20   0 if $x < -5$; $(x+5)^3$ if $x \in [-5; -4]$; 1 if $x > -4$
21   0 if $x < -4$; $(x+4)^3$ if $x \in [-4; -3]$; 1 if $x > -3$
22   0 if $x < -3$; $(x+3)^3$ if $x \in [-3; -2]$; 1 if $x > -2$
23   0 if $x < 0$; $1 - e^{-2x}$ if $x \ge 0$
24   0 if $x < 0$; $1 - e^{-3x}$ if $x \ge 0$
25   0 if $x < 0$; $1 - e^{-4x}$ if $x \ge 0$
26   0 if $x < 0$; $1 - e^{-5x}$ if $x \ge 0$
27   0 if $x < 0$; $1 - e^{-x/2}$ if $x \ge 0$
28   0 if $x < 0$; $1 - e^{-x/3}$ if $x \ge 0$
29   0 if $x < 0$; $1 - e^{-x/4}$ if $x \ge 0$
30   0 if $x < 0$; $1 - e^{-x/5}$ if $x \ge 0$

Problem № 25

Variant №   A      σ      a       β
1           10     4      2       13
2           9      5      5       14
3           8      1      4       9
4           7      2      3       10
5           6      3      2       11
6           5      1      1       12
7           4      5      2       11
8           3      2      3       10
9           2      5      4       9
10          2      4      6       10
11          1      4      0       3
12          -1     2      -2      0
13          -3     5      -2      0
14          -4     1      -5      -2
15          0      10     -1      1
16          1      10     0       5
17          10     1      9       11
18          -10    5      -12     -8
19          -5     4      -7      0
20          -2     0.25   -2.5    -1.5
21          1      16     0       10
22          2      25     0       5
23          -1     1      -1.5    0.5
24          4      9      3       5
25          5      9      3       6
26          3      16     1       5
27          5      1      4       6
28          10     12     5       15
29          2      1      -1      4
30          -4     1      -5      -3
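The probability asked for in Problem 25 is the difference of the normal distribution function at the two ends of the interval. A short Python sketch, using `statistics.NormalDist` from the standard library (Python 3.8 or later), is given below. The function name is hypothetical, and the numbers are read from the variant 1 data as expectation 10, standard deviation 4 and interval (2, 13); that reading of the columns is an assumption.

```python
from statistics import NormalDist

def normal_interval_probability(mean, std, left, right):
    """P(left < xi < right) for a normal RV with the given mean and standard deviation."""
    dist = NormalDist(mu=mean, sigma=std)
    return dist.cdf(right) - dist.cdf(left)

# Assumed variant 1 data: A = 10, sigma = 4, interval (2, 13).
p = normal_interval_probability(10, 4, 2, 13)
print(round(p, 4))
```

By hand the same result is obtained from a table of the standard normal distribution function: the standardized ends of the interval are (13 − 10)/4 = 0.75 and (2 − 10)/4 = −2.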

IWS №6

Solve tasks №№ 26-30 (for the unknown letters, see "VARIANTS").

Problem 26
The joint distribution of the RVs ξ and η is given:

ξ \ η    y1      y2
x1       0.15    0.3
x2       0.25    0.15
x3       0.1     A

a. Find A.
b. Write down the joint distribution of the RVs U = ξ + η and V = ξη.
c. Check U and V for independence.
d. Find the marginal distributions of U and V.
e. Compute the correlation coefficient ρ(U, V).
f. Draw a conclusion about the direction and closeness of the stochastic linear relationship between U and V.

Problem 27
The sample is given (see VARIANTS). Find:
a. The statistical series.
b. The empirical distribution function.
c. The sample mean.
d. The sample variance.
e. The unbiased estimator of the variance.
f. The standard deviation.
g. The histogram (use the Sturges formula to find the number of equal-width intervals).

Problem 28
The results of simultaneous observations of the random variables X and Y (see VARIANTS) are given. Find the sample correlation coefficient and draw a conclusion about the direction and closeness of the stochastic linear relationship between X and Y.

Problem 29
A sample from a large batch of electric lamps consists of n lamps. The mean burning time of the lamps in this sample is equal to X̄. Find the confidence interval for the mean burning time of a lamp in the whole batch with a confidence level of 1 − α, if it is known that the standard deviation of the burning time is equal to σ.

Problem 30
Using the sample of Problem 27 (see VARIANTS), construct a (1 − α)×100% confidence interval for the true value of the expectation of a normally distributed population.

VARIANTS

26 Task № Variant № x1 x2 x3 y1 y2 1

-2 0

2

-3 -2 1

3

-1 1

4

-3 -1 0 -1 1

5

0

1

2 -3 1 0 3

2 -2 0

2 -2 1

27 Sample values

28 Х

Y (respectively) -2, 4, 4, -2, -2, 0, 0, 2, 1, 2, 4, 4, 5, 6 2, 4, 8, 8, 12 1, 4, -2 -20, 40, 40, -20, -20, 0, 0, -2, 0, 4, 5, 12, 10, 8, 8, 10 20, 10, 10, 40, -20 5 2, 4, 4, 2, 2, 0, 0, 2, 1, 1, -1, 1, 1, 2, 20, 20, 10, 12, 4, 2 2 12 20, 40, 45, 25, 20, 0, 0, 2, 2, 3, 5, 5 -5, -4, 0, 2, 4 20, 15, 10, 45, 25 -2, 3, 3, -1, -1, 0, 0, 1, 1, 1 2, 3, 4, 4, 6 0, -1, -1, -2, -10

6 7 8 9 10 11 12 13 14 15

16 17 18 19 20 21 22 23 24 25 26

1

2

3 -2 0 2, 3, 3, 2, 2, 0, 1, 3, 1, 1 -2, 1, 1, 2, 4, 4, 0, -2, -3 4 -1 1 3 -1 0 -20, 40, 40, -20, -20, 0, 0, 2, 4, 6, 8, 8 0, 1, 10, 20, 30 20, 10, 10 -2 0 2 -2 1 40, 52, 20, 20, 10, 20, 10, -5, 5, 2, 0, 10, 20, 20, 50, 10, 30, 30 1 50 -1 0 1 -1 2 -5.5; 4; 0; 4; -5.5; -5.5; 0; 1, 3, 5, 7, 9 3, 1, 1, 0, 0 0; 2; 5.5; 4 0 1 2 -1 0 100; 150; 120; 155; 160; 2, 4, 5, 5, 5 10, 8, 8, 5, 6 155; 115; 115; 120; 150 -1 1 2 1 2 2, 3, 3, 5, 4, 4, 5, 2, 1, 10, -5, -2, 0, 0, 1, 3, 7, 7, 6 1, 2, 3, 3, 5 1 -2 -1 0 -2 1 2.1; 2.2; 0; 2.5; 2.1; 2.8; 1, 1, 5, 10, 0, 1, 1, 2, 1 2.5; 2.8; 2.5; 2,.1; 2.2; 2.2 8 0 1 3 -2 0 200; 250; 220; 220; 250; 10, 12, 10, 0, 0, -1, -2, -1 200; 200; 220; 210 15, 15 1 2 3 -2 -1 1, 3, 1, 1, 3, 2, 2, 2, 5, 1, 1, 1, 0, 2, 4 5, 5, 2, 3, 0 1, 5, 5, 3, 3, 4 -1 0 1 -2 2 10, 30, 10, 10, 30, 20, 20, 2, 4, 6, 4, 8 1, 3, 1, 5, 8 20, 50, 10, 10, 50, 50, 30, 30, 40 0 1 2 -1 1 -20, 30, 30, -10, -10, 0, 0, 1, 0, 1, 2, 5 2, 5, 4, 6, 6 10, 10, -10 -2 1 2 -1 0 15, 30, 15, 15, 30, 25, 25, -5, 4, 1, 0, 15, 20, 2, 40, 25, 50, 15, 15, 50, 50, 30, 2 40 30, 45 1 2 3 -1 0 1, 4, 1, 1, 4, 2, 2, 2, 6, 1, 1, 4, 6, 5, 5 10, 8, 5, 5, 6 1, 6, 6, 3, 4, 4 -3 -2 0 1 2 -2, 2, 2, -1, -1, 0, 0, 3, 3, 3 -5, -3, 0, 1, 1, 2, 7, 6, 6 1 -1 1 2 -2 1 -25, 35, 35, -10, -10, 0, 0, 1, 3, 5, 10, 0, 1, 2, 2, 0 15, 15, -10 9 -3 -1 1 -1 2 -4.5; 4.5; 0; 4.5; -4.5; - 10, 12, 10, 0, 0, -1, -1, -3 4.5; 0; 0; 2; 4.5; 4 10, 15 -2 0 2 0 1 1, 5, 1, 1, 5, 2, 2, 2, 5, 1, 2, 1, 0, 2, 3 5, 4, 2, 3, 0 1, 5, 5, 5, 4, 4 -1 0 1 0 1 25, 30, 25, 25, 30, 35, 35, 2, 4, 5, 4, 6 1, 3, 3, 5, 9 35, 50, 25, 25, 50, 50, 30, 35, 45 0 1 2 -2 0 -3, 3, 3, -1, -1, 0, 0, 2, 2, 2 1, 0, 1, -1, -2, 5, 4, 6, 5 5 0 1 2 -1 1 1, 3, 1, 1, 3, 2, 2, 2, 3, 1, -5, 4, 1, 0, 15, 20, 2, 40, 1, 3, 3, 3, 4, 4 2 40 -1 0 1 -1 2 -30, 30, 30, -20, -20, 0, 0, 10, 9, 9, 5, 0, 2, 2, 5, 5 20, 20, -20 6

27

-1 0

2

28

-2 1

0

29

-2 -1 0

30

0

1

0

3 4 -3, -3, -3, -1, -1, 0, 0, 2, 2, 1, 2, 3, 3, 5 2 1 2 -1, 3, -1, -1, 3, 2, 2, 2, 3, - -1, -1, 0, 2, 1, -1, 3, 3, 3, 4, 4 2 1 2 -1, 3, -1, -1, 3, 2, 2, 2, 3, - 0, 0, 1, 1, 3 1, -1, 3, 3, 3 1 2 1, 3, 1, -1, 3, 2, 2, 2, 3, -1, -2, -2, -1, 0, 1, 3, 3 0

Task № Variant № 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

29

30



1-a

n

X

1-a

50 45 40 55 60 35 30 25 20 15 35 30 25 20 22 20 25 5 4 5 60 5 32 25 2 15 5 20 10 50

0.95 0.9 0.99 0.8 0.85 0.98 0.96 0.92 0.9 0.95 0.8 0.95 0.99 0.9 0.8 0.85 0.95 0.8 0.99 0.85 0.8 0.96 0.93 0.92 0.88 0.95 0.99 0.9 0.89 0.9

95 90 100 110 120 105 98 115 110 115 120 115 120 110 115 120 200 150 110 115 140 100 97 125 112 135 100 200 250 200

2000 1500 1200 1400 1300 1100 1150 990 1000 1200 1100 990 995 1105 900 950 20 15 1100 1250 13 11 15 99 10 12 10 12 10 10

0.9 0.95 0.85 0.99 0.98 0.8 0.92 0.97 0.96 0.92 0.93 0.88 0.79 0.91 0.89 0.82 0.95 0.91 0.86 0.98 0.97 0.82 0.9 0.97 0.95 0.9 0.95 0.8 0.81 0.9

8, 8, 6, 5, 5 10, 9, 6, 6, 6 3, 3, 2, 2, 0 0, 1, 1, 2, 2
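For Problem 29, with a known standard deviation the confidence interval for the mean has the form X̄ ± z·σ/√n, where z is the standard normal quantile of order 1 − α/2. A minimal Python sketch is shown below. The function name is hypothetical, and the input numbers (σ = 50, 1 − α = 0.95, n = 95, X̄ = 2000) are taken as illustrative values for variant 1; which table column holds which quantity is an assumption.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sigma, n, confidence):
    """Two-sided confidence interval for the mean when sigma is known."""
    alpha = 1.0 - confidence
    # Standard normal quantile of order 1 - alpha/2.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Assumed illustrative data: sigma = 50, n = 95, sample mean = 2000, confidence 0.95.
low, high = mean_confidence_interval(2000, 50, 95, 0.95)
print(round(low, 2), round(high, 2))
```

Problem 30 differs in that the standard deviation is estimated from the sample, so the Student quantile with n − 1 degrees of freedom would replace z; the standard library has no Student quantile function, so that case is not sketched here.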

1. Definitions of Probability
   1. Formulate the axiomatic definition of probability. What is a probability space?
   2. Formulate the classical definition of probability. Does throwing a heptahedron fit this definition? Justify your answer.
   3. Formulate the geometric definition of probability.
   4. Formulate the statistical definition of probability.

2. Additive Rule
   1. Formulate the additive rule for n mutually exclusive events. Which Kolmogorov axiom entails this rule? Describe the event А1 + А2 + … + Аn if А1, А2, …, Аn are mutually exclusive events.
   2. Define a complete group of events.
   3. What is the sum of the probabilities of the events of a complete group? Show this. What is the sum of the probabilities of two mutually exclusive events?
   4. Formulate and prove the additive rule for two arbitrary events. Describe the event А1 + А2 + … + Аn if А1, А2, …, Аn are arbitrary events. Formulate the additive rule for n arbitrary events.

3. Conditional Probability. Independence. Multiplicative Rule
   1. Write down the conditional probability formula. In which case is it defined? Define the independence of two events using it.
   2. Write down the multiplicative rule for two arbitrary events. Which formula of probability theory entails this rule? Write down the multiplicative rule for two independent events.
   3. When are three events A, B, C mutually independent? Does pairwise independence imply mutual independence? If not, give a counter-example.
   4. Write down the multiplicative rules for n arbitrary events and for n independent events.

4. Total Probability and Bayes Formulas
   1. Formulate the total probability formula.
   2. Prove this formula.
   3. Formulate the Bayes formula.
   4. Prove this formula.

5. Bernoulli Formula
   1. What are Bernoulli trials?
   2. Give an example of Bernoulli trials.
   3. We take some balls (without replacement) out of a box. Are these trials Bernoulli trials? Justify your answer.
   4. Formulate the Bernoulli formula (explain all the notation). Prove it.

6. Simple Random Variable (RV)
   1. Give the definition of a simple RV. Give some examples.
   2. The distribution law of an RV. What is the sum of the probabilities of all the values of an RV?
   3. The distribution function of an RV.
   4. What properties of the distribution function do you know?

7. Expectation and Variance of a Simple Random Variable
   1. Give the definition of expectation. What is the intuitive meaning of expectation?
   2. What are the main properties of expectation?
   3. Give the definition of variance. What is the intuitive meaning of variance?
   4. What are the main properties of variance?

8. Discrete Distributions
   1. Write out the formula of the binomial distribution law. What are its expectation and variance?
   2. Give an example of a binomial random variable.
   3. Write out the formula of the Poisson distribution law. What are its expectation and variance?
   4. Give an example of a Poisson random variable.

9. Continuous Distributions
   1. Write out the density of the uniform distribution on [a, b] (with a graph). Derive the distribution function from it (with a graph). Give an example of such a random variable (RV).
   2. What is the expectation of the uniform distribution on [a, b]? What is its variance?
   3. Write out the density of the exponential distribution with the parameter λ (with a graph). Derive the distribution function from it (with a graph). Give an example of such an RV. What are its expectation and variance?
   4. Write out the density of the normal distribution (with a graph). Derive the distribution function from it (with a graph). What are its expectation and variance? What is the feature that makes this law stand out among the rest?

10. Basic Concepts of Mathematical Statistics
   1. What are a population and a sample? What are their analogues in probability theory?
   2. Give the definition of the empirical distribution function. What is the difference between the empirical and theoretical distribution functions?
   3. Write down the formula for the sample mean. What parameter of the population does it estimate? Show this.
   4. Write down the formula for the sample variance. What parameter of the population does it estimate? Show this. What is the unbiased estimator of the population variance?
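The estimators asked about in topic 10 can be illustrated with a short Python sketch. It is not taken from the manual: the sample below is a hypothetical one chosen for easy arithmetic, and the variable names are illustrative. The sample variance divides the sum of squared deviations by n, while the unbiased estimator divides it by n − 1.

```python
# Hypothetical sample, chosen only for illustration.
sample = [2, 4, 4, 5, 5]
n = len(sample)

# Sample mean: the arithmetic average of the observations.
sample_mean = sum(sample) / n

# Sum of squared deviations from the sample mean.
squared_deviations = sum((value - sample_mean) ** 2 for value in sample)

# Sample variance (biased): divide by n.
sample_variance = squared_deviations / n

# Unbiased estimator of the population variance: divide by n - 1.
unbiased_variance = squared_deviations / (n - 1)

print(sample_mean, sample_variance, unbiased_variance)
```

The two variance estimates differ by the factor n/(n − 1), which tends to 1 as the sample grows.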


Educational issue

Kovaleva Irina
PROBABILITY THEORY AND MATHEMATICAL STATISTICS
Educational-methodical manual

Editor V. Popova
Typesetting and cover design G. Kaliyeva
Cover design photos were used from sites

IB No.12629
Signed for publishing 27.02.2019. Format 60x84 1/16. Offset paper. Digital printing. Volume 9.37 printer's sheets. 60 copies. Order No.972.
Publishing house Qazaq University, Al-Farabi Kazakh National University
KazNU, 71 Al-Farabi, 050040, Almaty
Printed in the printing office of the Qazaq University Publishing House.