Introduction to Probability with Statistical Applications, Second Edition (Instructor Solution Manual, Solutions) [2, 2nd ed, 2e] 9783319306186, 3319306189


English Pages 152 Year 2016


2. The Algebra of Events

2.1.1. a) The sample points are HH, HT, TH, TT, and the elementary events are {HH}, {HT}, {TH}, {TT}.
b) The event that corresponds to the statement "at least one tail is obtained" is {HT, TH, TT}.
c) The event that corresponds to "at most one tail is obtained" is {HH, HT, TH}.

2.1.2. a) Yes. Just ignore the third toss, that is, take {HHH, HHT}, {HTH, HTT}, {THH, THT}, {TTH, TTT} as sample points to describe two tosses of a coin.
b) X = {HHH, HHT, HTH, HTT, THH, THT, TTH}, Y = {HHH, HHT, HTH, HTT, THH, THT}, Z = {HTT, THT, TTH}.

2.1.3. a) Four different sample spaces to describe three tosses of a coin are:
S1 = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT},
S2 = {0H, 1H, 2H, 3H},
S3 = {an even # of H's, an odd # of H's},
S4 = {HHHH, HHHT, HHTH, HHTT, HTHH, HTHT, HTTH, HTTT, THHH, THHT, THTH, THTT, TTHH, TTHT, TTTH, TTTT}, where the fourth letter is to be ignored in each sample point.
b) For S1 the event corresponding to the statement "at most one tail is obtained in three tosses" is {HHH, HHT, HTH, THH}. For S2 it is {2H, 3H}, and in S3 it is not possible to find such an event. For S4 the event corresponding to the statement "at most one tail is obtained in the first three tosses" is {HHHH, HHHT, HHTH, HHTT, HTHH, HTHT, THHH, THHT}.
c) It is not possible to find an event corresponding to the statement "at most one tail is obtained in three tosses" in every conceivable sample space for the tossing of three coins, because some sample spaces are too coarse, that is, the sample points that contain this outcome also contain opposite outcomes. For instance, in S3 above, the sample point "an even # of H's" contains the outcomes HHT, HTH, THH, for which our statement is true, and the outcome TTT, for which it is not true.

2.1.4. Three different sample spaces to describe drawing a card are: S1 = {face card, number card}, S2 = {black, red}, S3 = {even numbered card, odd numbered card, face card}.

2.1.5. In the 52-element sample space for the drawing of a card,
a) the events corresponding to the statements p = "An Ace or a red King is drawn" and q = "The card drawn is neither red, nor odd, nor a face card" are P = {AS, AH, AD, AC, KH, KD} and Q = {2C, 4C, 6C, 8C, 10C, 2S, 4S, 6S, 8S, 10S};
b) statements corresponding to the events U = {AH, KH, QH, JH} and V = {2C, 4C, 6C, 8C, 10C, 2S, 4S, 6S, 8S, 10S} are u = "The Ace of hearts or a heart face card is drawn" and v = "An even numbered black card is drawn."

2.1.6. S1 = {AAA, AAB, AAN, ABA, ABB, ABN, ANA, ANB, ANN, BAA, BAB, BAN, BBA, BBB, BBN, BNA, BNB, BNN, NAA, NAB, NAN, NBA, NBB, NBN, NNA, NNB, NNN}, S2 = {0N, 1N, 2N, 3N}.

2.1.7. Three possible sample spaces are: S1 = {the 365 days of the year}, S2 = {January, February, ..., December}, S3 = {Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday}.

2.2.1. a) {1, 3, 5, 7, 9} or {k : k = 2n + 1, n = 0, 1, 2, 3, 4}, b) {2n : n = 1, 2, 3, 4, 5}, c) {JS, QS, KS, JC, QC, KC}, d) {−3, −2, −1, 1, 2, 3}, e) {x : −1 < x < 1}.

2.2.2. a) {1, 2, 3}, b) {1}, c) {2, 3, 4, 5, 6, 7, 8}, d) {1, 2, 3, 4, 5, 6, 7}, e) {7}, f) {4}, g) {3, 4, 5}.

2.2.3. ∅, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}.

2.2.4. (A ∪ B)′ = {I, II, III}′ = {IV} = {III, IV} ∩ {II, IV} = A′ ∩ B′.
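The event listings above can be generated mechanically; a small sketch for 2.1.1 and 2.1.3, enumerating the tosses with itertools:

```python
from itertools import product

# 2.1.1: two tosses of a coin.
two = ["".join(t) for t in product("HT", repeat=2)]
assert sorted(two) == sorted(["HH", "HT", "TH", "TT"])
assert [s for s in two if "T" in s] == ["HT", "TH", "TT"]           # at least one tail
assert [s for s in two if s.count("T") <= 1] == ["HH", "HT", "TH"]  # at most one tail

# 2.1.3 b): "at most one tail" in the 8-point space S1 for three tosses.
three = ["".join(t) for t in product("HT", repeat=3)]
at_most_one_tail = [s for s in three if s.count("T") <= 1]
assert set(at_most_one_tail) == {"HHH", "HHT", "HTH", "THH"}
print(len(three), len(at_most_one_tail))
```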


2.2.5. A ∩ B ∩ C = {1}, (A ∩ B) ∩ C = {1, 4} ∩ {1, 2, 3, 7} = {1}, A ∩ (B ∩ C) = {1, 3, 4, 5} ∩ {1, 2} = {1}, B ∩ (A ∩ C) = {1, 2, 4, 6} ∩ {1, 3} = {1}.

2.2.6. A ∪ (B ∪ C) = {1, 3, 4, 5} ∪ {1, 2, 3, 4, 6, 7} = {1, 2, 3, 4, 5, 6, 7}, (A ∪ B) ∪ C = {1, 2, 3, 4, 5, 6} ∪ {1, 2, 3, 7} = {1, 2, 3, 4, 5, 6, 7}.

2.2.7. a) A ∩ (B ∪ C) = {1, 3, 4, 5} ∩ {1, 2, 3, 4, 6, 7} = {1, 3, 4}, but (A ∩ B) ∪ C = {1, 4} ∪ {1, 2, 3, 7} = {1, 2, 3, 4, 7}.
b) A ∩ (B ∪ C) = {1, 3, 4, 5} ∩ {1, 2, 3, 4, 6, 7} = {1, 3, 4}, and (A ∩ B) ∪ (A ∩ C) = {1, 4} ∪ {1, 3} = {1, 3, 4}.
c) (A ∩ B) ∪ C = {1, 4} ∪ {1, 2, 3, 7} = {1, 2, 3, 4, 7} and (A ∪ C) ∩ (B ∪ C) = {1, 2, 3, 4, 5, 7} ∩ {1, 2, 3, 4, 6, 7} = {1, 2, 3, 4, 7}.

2.2.8. a) {8} = (A ∪ B ∪ C)′ = A′B′C′,
b) {3} = AB′C,
c) {1, 4, 5} = ABC ∪ ABC′ ∪ AB′C′ = A(B ∪ B′C′) = A(BC ∪ C′),
d) {1, 4, 5, 8} = ABC ∪ ABC′ ∪ AB′C′ ∪ A′B′C′ = AB ∪ B′C′,
e) {2, 6} = A′BC ∪ A′BC′ = A′B,
f) {2, 6, 7} = A′B ∪ A′B′C.

2.2.9. The Venn diagram below illustrates the relation A ∩ B = ∅. Using the region numbers from Figure 2.1, we have A′ ∩ B′ = {2, 3} ∩ {1, 3} = {3}, which is the region outside both A and B. Similarly, A′ ∪ B′ = {2, 3} ∪ {1, 3} = {1, 2, 3} = S, the whole sample space.
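Several of these identities can be checked mechanically with Python's built-in set operations; a small sketch, using the sets of Exercises 2.2.5 through 2.2.7:

```python
# Sets from Exercises 2.2.5-2.2.7 (universe S = {1, ..., 8}).
S = set(range(1, 9))
A, B, C = {1, 3, 4, 5}, {1, 2, 4, 6}, {1, 2, 3, 7}

# 2.2.5: intersection is associative.
assert A & B & C == {1}
assert (A & B) & C == A & (B & C) == B & (A & C) == {1}

# 2.2.6: union is associative.
assert A | (B | C) == (A | B) | C == {1, 2, 3, 4, 5, 6, 7}

# 2.2.7: intersection distributes over union, but the operations
# cannot be regrouped freely.
assert A & (B | C) == {1, 3, 4}
assert (A & B) | C == {1, 2, 3, 4, 7}          # not the same event
assert A & (B | C) == (A & B) | (A & C)        # distributive law
assert (A & B) | C == (A | C) & (B | C)        # distributive law

print("all set identities verified")
```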

Fig. 2.1. A ∩ B = ∅

2.2.10. If x ∈ (A ∪ B)′, then x ∉ A ∪ B, and so x ∉ A and x ∉ B. Hence x ∈ A′ and x ∈ B′, that is, x ∈ A′ ∩ B′. Thus (A ∪ B)′ ⊂ A′ ∩ B′. Conversely, if x ∈ A′ ∩ B′, then x ∈ A′ and x ∈ B′, whence x ∉ A and x ∉ B, and so x ∉ A ∪ B and x ∈ (A ∪ B)′. Thus A′ ∩ B′ ⊂ (A ∪ B)′. The two inclusions above imply (A ∪ B)′ = A′ ∩ B′.

2.2.11. 1. Assume that A ⊂ B, that is, that whenever x ∈ A, then x ∈ B. Then A ∪ B = {x : x ∈ A or x ∈ B} ⊂ {x : x ∈ B or x ∈ B} = B. On the other hand, clearly B ⊂ A ∪ B. Thus, A ⊂ B implies A ∪ B = B.
2. Conversely, assume that A ∪ B = B, that is, that {x : x ∈ A or x ∈ B} = B. Hence, if x ∈ A, then x must also belong to B, which means that A ⊂ B. Alternatively, by the definition of unions, A ⊂ A ∪ B, and so, if A ∪ B = B, then substituting B for A ∪ B in the previous relation, we obtain that A ∪ B = B implies A ⊂ B.

2.2.12. 1. Assume that A ⊂ B. Then A ∩ B = {x : x ∈ A and x ∈ B} ⊃ {x : x ∈ A and x ∈ A} = A. On the other hand, clearly A ∩ B ⊂ A. Thus, A ⊂ B implies A ∩ B = A.
2. Conversely, assume that A ∩ B = A, that is, that {x : x ∈ A and x ∈ B} = A. Hence, if x ∈ A, then x must also belong to A ∩ B, that is, to B as well, which means that A ⊂ B. Alternatively, by the definition of intersections, A ∩ B ⊂ B, and so, if A ∩ B = A, then substituting A for A ∩ B in the previous relation, we obtain that A ∩ B = A implies A ⊂ B.

2.2.13. a) (A − B) − C = {3, 5} − {1, 2, 3, 7} = {5} and (A − C) − (B − C) = {4, 5} − {4, 6} = {5}.
b) A − (B ∪ C) = {1, 3, 4, 5} − {1, 2, 3, 4, 6, 7} = {5} and (A − B) − C = {3, 5} − {1, 2, 3, 7} = {5}.
c) (AB) − C = {1, 4} − {1, 2, 3, 7} = {4} and (A − C)(B − C) = {4, 5} ∩ {4, 6} = {4}.

2.2.14. a) A − BC = A(BC)′ = A(B′ ∪ C′) = AB′ ∪ AC′ = (A − B) ∪ (A − C).
b) On the one hand, (A − B) ∪ C = AB′ ∪ C, and on the other, ((A ∪ C) − B) ∪ BC = (A ∪ C)B′ ∪ BC = AB′ ∪ B′C ∪ BC = AB′ ∪ (B′ ∪ B)C = AB′ ∪ SC = AB′ ∪ C.

2.3.1. a) The event R corresponding to r = "b is 4 or 5" is the shaded region consisting of the fourth and fifth columns in Figure 2.2, that is, R = {(b, w) : b = 4, 5 and w = 1, 2, ..., 6}.
b) The event corresponding to q or r is Q ∪ R, the shaded region in Figure 2.3, that is, {(b, w) : (b = 1, 2, ..., 6 and w = 1, 2, 3) or (b = 4, 5 and w = 4, 5, 6)}.
c) The event corresponding to r but not q is R ∩ Q′, the darkly shaded region in Figure 2.4, that is, {(b, w) : b = 4, 5 and w = 4, 5, 6}.


Fig. 2.2. R corresponding to r = “b is 4 or 5”

Fig. 2.3. Q ∪ R corresponding to q or r

d) The event corresponding to p and q and r is P ∩ Q ∩ R, the triply shaded region in Figure 2.5, that is, {(4, 3), (5, 2)}.
e) The event corresponding to q and r, but not p, is Q ∩ R ∩ P′, the darkly shaded region in Figure 2.6, that is, {(4, 1), (4, 2), (5, 1), (5, 3)}.


Fig. 2.4. R ∩ Q′ corresponding to r but not q

Fig. 2.5. P ∩ Q ∩ R corresponding to p and q and r

2.3.2. P1 = {5, 6, 7} = AB′C′ ∪ A′BC′ ∪ A′B′C. P2 = {5, 6, 7, 2, 3, 4, 1} = A ∪ B ∪ C. P3 = {5, 6, 7, 8} = P1 ∪ {8} = AB′C′ ∪ A′BC′ ∪ A′B′C ∪ A′B′C′.


Fig. 2.6. Q ∩ R ∩ P′, corresponding to q and r, but not p

2.3.3. P4 = {2, 3, 4} = A′BC ∪ AB′C ∪ ABC′. P5 = {2, 3, 4, 5, 6, 7, 8} = (ABC)′ = A′ ∪ B′ ∪ C′. P6 = {1, 2, 3, 4} = {1} ∪ P4 = ABC ∪ A′BC ∪ AB′C ∪ ABC′ = AB ∪ AC ∪ BC.

2.3.4. i. a) (A ∪ B)′ = A′ ∩ B′ corresponds to the relation "it is not the case that an ace or a red card is drawn" is equivalent to the statement "the card drawn is not an ace and not red."
b) (A ∩ B)′ = A′ ∪ B′ corresponds to the relation "it is not the case that a red ace is drawn" is equivalent to the statement "the card drawn is not an ace or not red."
ii. S corresponds to the statement "the card drawn is any card."

2.3.5. a or b (that is, at least one of them) is certain to occur.

2.3.6. a and b are mutually exclusive.

2.3.7. 1. (A △ B) △ C = {2, 3, 5, 6} △ {1, 2, 3, 7} = ({2, 3, 5, 6} ∩ {4, 5, 6, 8}) ∪ ({1, 4, 7, 8} ∩ {1, 2, 3, 7}) = {1, 5, 6, 7},
A △ (B △ C) = {1, 3, 4, 5} △ {3, 4, 6, 7} = ({1, 3, 4, 5} ∩ {1, 2, 5, 8}) ∪ ({2, 6, 7, 8} ∩ {3, 4, 6, 7}) = {1, 5, 6, 7}.


2. A ∩ (B △ C) = {1, 3, 4, 5} ∩ {3, 4, 6, 7} = {3, 4}, AB △ AC = {1, 4} △ {1, 3} = ({1, 4} ∩ {2, 4, 5, 6, 7, 8}) ∪ ({2, 3, 5, 6, 7, 8} ∩ {1, 3}) = {3, 4}.

2.3.8. No: A ∪ (B △ C) = {1, 3, 4, 5} ∪ {3, 4, 6, 7} = {1, 3, 4, 5, 6, 7}, but (A ∪ B) △ (A ∪ C) = {1, 2, 3, 4, 5, 6} △ {1, 2, 3, 4, 5, 7} = {6, 7}.
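These symmetric-difference computations can be replayed with Python's `^` operator on sets; a small sketch, using the same sets A = {1, 3, 4, 5}, B = {1, 2, 4, 6}, C = {1, 2, 3, 7}:

```python
# Sets from Exercises 2.3.7-2.3.8; ^ is symmetric difference.
A, B, C = {1, 3, 4, 5}, {1, 2, 4, 6}, {1, 2, 3, 7}

# 2.3.7: symmetric difference is associative, and intersection
# distributes over it.
assert (A ^ B) ^ C == A ^ (B ^ C) == {1, 5, 6, 7}
assert A & (B ^ C) == (A & B) ^ (A & C) == {3, 4}

# 2.3.8: union does NOT distribute over symmetric difference.
assert A | (B ^ C) == {1, 3, 4, 5, 6, 7}
assert (A | B) ^ (A | C) == {6, 7}

print("symmetric difference checks passed")
```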

3. Combinatorial Problems

3.1.1. Let A = the set of drinkers and B = the set of smokers. Then n(A) = 65, n(B) = 28, n(S) = 100, and n(A ∪ B) = 70. From Theorem 3.1.2, n(AB) = n(A) + n(B) − n(A ∪ B), and so n(AB) = 65 + 28 − 70 = 23.

3.1.2. A = {1, 2, 3}, B = {3, 4, 5}, C = {1, 5, 6}.

3.1.3. If A ∩ B = ∅, then A and B have no common element. Hence A, B, and C cannot have any common element either. Alternatively, if A ∩ B = ∅, then A ∩ B ∩ C = (A ∩ B) ∩ C = ∅ ∩ C = ∅. The proofs of the other cases just need changing letters.

3.1.4. a) n(A ∪ B ∪ C) = n((A ∪ B) ∪ C) = n(A ∪ B) + n(C) = n(A) + n(B) + n(C).
b) Use induction: First, the theorem is trivially true for k = 1. Next, assume it to be true for all k < N. Then n(A1 ∪ ··· ∪ AN) = n((A1 ∪ ··· ∪ AN−1) ∪ AN) = n(A1 ∪ ··· ∪ AN−1) + n(AN) = n(A1) + ··· + n(AN).

3.1.5. n(A) + n(B) + n(C) − n(A ∩ B) − n(A ∩ C) − n(B ∩ C) + n(A ∩ B ∩ C) = n(1, 3, 4, 5) + n(1, 2, 4, 6) + n(1, 2, 3, 7) − n(1, 4) − n(1, 3) − n(1, 2) + n(1) = n(1) + n(3) + n(4) + n(5) + n(1) + n(2) + n(4) + n(6) + n(1) + n(2) + n(3) + n(7) − [n(1) + n(4)] − [n(1) + n(3)] − [n(1) + n(2)] + n(1) = n(1) + n(2) + n(3) + n(4) + n(5) + n(6) + n(7) = n(1, 2, 3, 4, 5, 6, 7) = n(A ∪ B ∪ C).

3.1.6. a) 16, b) 36, c) 27.

3.1.7. 1. By the definition of indicators, I_AB(s) = 1 ⇔ s ∈ AB. By the definition of intersection, s ∈ AB ⇔ (s ∈ A and s ∈ B), and, by the definition of indicators, (s ∈ A and s ∈ B) ⇔ (I_A(s) = 1 and I_B(s) = 1). Since 1 · 1 = 1 and 1 · 0 = 0 · 0 = 0, clearly, (I_A(s) = 1 and I_B(s) = 1) ⇔ I_A(s)I_B(s) = 1. Now, by the transitivity of equivalence relations, I_AB(s) = 1 ⇔ I_A(s)I_B(s) = 1, which is equivalent to I_AB = I_A I_B.
2. I_A1A2···An = I_A1 I_A2 ··· I_An for any n ≥ 2, because I_A1A2···An(s) = 1 ⇔ s ∈ A1A2···An, and on the other hand,


I_A1(s) I_A2(s) ··· I_An(s) = 1 ⇔ (I_A1(s) = 1, I_A2(s) = 1, ..., I_An(s) = 1) ⇔ (s ∈ A1, s ∈ A2, ..., s ∈ An) ⇔ s ∈ A1A2···An.
3. By definition, I_A∪B(s) = 1 ⇔ s ∈ A ∪ B. Furthermore, for disjoint A and B, I_A(s) + I_B(s) = 1 ⇔ s ∈ A ∪ B, because I_A(s) + I_B(s) is 1 + 0 = 1 for s ∈ A, 0 + 1 = 1 for s ∈ B, and 0 otherwise. Thus, by transitivity, I_A∪B(s) = 1 ⇔ I_A(s) + I_B(s) = 1, which is equivalent to I_A∪B = I_A + I_B.
4. I_A′ = 1 − I_A, since I_A′(s) = 1 ⇔ s ∈ A′ and 1 − I_A(s) = 1 − 0 = 1 ⇔ s ∈ A′. So, both sides of I_A′ = 1 − I_A are equal to 1 if and only if s ∈ A′ and are 0 otherwise.

3.1.8. Let S = {1, ..., 1000}, A = {multiples of 3 in S}, B = {multiples of 6 in S}, C = {multiples of 8 in S}. Then n(A) = ⌊1000/3⌋ = 333, n(B) = ⌊1000/6⌋ = 166, n(C) = ⌊1000/8⌋ = 125, n(AB) = ⌊1000/6⌋ = 166, n(AC) = ⌊1000/24⌋ = 41 = n(BC) = n(ABC). Thus, n(A ∪ B ∪ C) = 333 + 166 + 125 − 166 − 41 − 41 + 41 = 417.

3.1.9. There are 2^n basic functions, and each one of those can be either included or not in a canonical representation. Thus there are 2^(2^n) different canonical representations, that is, this is the number of sets in the Boolean algebra generated by n sets.

3.2.1. a) S = {ASAH, ASAD, ASAC, AHAS, AHAD, AHAC, ADAS, ADAH, ADAC, ACAS, ACAH, ACAD},
b) {ASAH, ASAD, ASAC}, {AHAS, AHAD, AHAC}, {ADAS, ADAH, ADAC}, {ACAS, ACAH, ACAD},
c) {AHAS, AHAD, AHAC, ASAH, ADAH, ACAH}.

3.2.2. See Figure 3.1. The number of classes is 2 · 3 · 3 = 18.
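The inclusion-exclusion count in Exercise 3.1.8 is easy to confirm by brute force; a quick sketch:

```python
# Exercise 3.1.8: multiples of 3, 6, or 8 among 1..1000,
# counted directly and by inclusion-exclusion.
S = range(1, 1001)
direct = sum(1 for k in S if k % 3 == 0 or k % 6 == 0 or k % 8 == 0)

nA, nB, nC = 1000 // 3, 1000 // 6, 1000 // 8          # 333, 166, 125
nAB, nAC, nBC, nABC = 1000 // 6, 1000 // 24, 1000 // 24, 1000 // 24
incl_excl = nA + nB + nC - nAB - nAC - nBC + nABC

assert direct == incl_excl == 417
print(direct)
```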

Fig. 3.1.


3.2.3. See Figure 3.2.

Fig. 3.2.

3.2.4. See Figure 3.3. There are 3 · 3 · 2 = 18 possible complete dinners.

Fig. 3.3.

3.2.5. a) 30 · 29 · 28, b) 30^3.

3.2.6. a) 5 · 10^2 · 5 = 2500, b) 5 · 10^2 · 2 = 1000, c) 2 + 2^2 + 2^3 + 2^4 = 30, d) 3 · 8^3 − 1 = 1535. (Subtract 1, because 0 = 0000 is not positive.)


3.2.7. a) 2 + 2^2 + 2^3 = 14, b) 2 + 2^2 + 2^3 + 2^4 = 30.

3.2.8. a) 2 · 5 · 4 · 3 · 2 · 1 = 240, b) 2 · 5 · 4 · 3 · 2 = 240, c) 4 · 5 · 4 · 3 = 240.

3.3.1. 5P2 = 5 · 4 = 20, 6P3 = 6 · 5 · 4 = 120, 8P1 = 8, 5P0 = 1, 6P6 = 6! = 720.

3.3.2. 6P3 = 6 · 5 · 4 = 120.

3.3.3. n! = n · (n − 1) · (n − 2) ··· 1 = n · [(n − 1) · (n − 2) ··· 1] = n · (n − 1)!

3.3.4. 5C2 = 10, 6C3 = 20, 8C1 = 8, 5C0 = 1, 6C6 = 1.

3.3.5. {ABC, ACB, BAC, BCA, CAB, CBA}, {ABD, ADB, BAD, BDA, DAB, DBA}, {ACD, ADC, CAD, CDA, DAC, DCA}, {BCD, BDC, CBD, CDB, DBC, DCB}. The number of permutations is 4P3 = 4 · 3 · 2 = 24, and each of the four marked sets containing six permutations corresponds to an unordered selection, that is, to a combination. Thus, by the division principle, the number of combinations must be 4P3/3! = 24/6 = 4, and this is, indeed, how many sets we got.

3.3.6. a) C(10, 2) · C(12, 2) = 2970,
b) C(10, 1) · C(12, 3) = 2200,
c) C(10, 4) · C(12, 0) = 210,
d) C(22, 4) = 7315.

3.3.7. 6P4 = 6 · 5 · 4 · 3 = 360.

3.3.8. 1 + 5 · 1 + 5^2 · 1 + 5^3 · 1 + 5^4 · 6 = 3906.

3.3.9. a) 5!/5 = 24, b) 5!/10 = 12.

3.3.10. a) 7P5/7 = 360, b) 7P5/14 = 180.

3.3.11. a) ⌊4999/3⌋ = 1666,
b) ⌊4999/4⌋ = 1249,
c) ⌊4999/12⌋ = 416,
d) 4999 − 1666 − 1249 + 416 = 2500.
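The permutation and combination values above can be cross-checked with Python's `math` module; a small sketch covering 3.3.1, 3.3.6, and 3.3.11:

```python
from math import comb, perm

# 3.3.1: falling-factorial permutation counts.
assert perm(5, 2) == 20 and perm(6, 3) == 120 and perm(6, 6) == 720

# 3.3.6: products of binomial coefficients.
assert comb(10, 2) * comb(12, 2) == 2970
assert comb(10, 1) * comb(12, 3) == 2200
assert comb(22, 4) == 7315

# 3.3.11: integers in 1..4999 divisible by neither 3 nor 4,
# by inclusion-exclusion and by direct count.
incl_excl = 4999 - 4999 // 3 - 4999 // 4 + 4999 // 12
direct = sum(1 for k in range(1, 5000) if k % 3 and k % 4)
assert incl_excl == direct == 2500
print("combinatorial checks passed")
```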


3.4.1.
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
1 8 28 56 70 56 28 8 1
1 9 36 84 126 126 84 36 9 1

3.4.2. (a + b)^6 = a^6 + 6a^5 b + 15a^4 b^2 + 20a^3 b^3 + 15a^2 b^4 + 6ab^5 + b^6.

3.4.3. (1 + x)^5 = x^5 + 5x^4 + 10x^3 + 10x^2 + 5x + 1.

3.4.4. (2x − 3)^5 = 32x^5 − 240x^4 + 720x^3 − 1080x^2 + 810x − 243.

3.4.5. C(10, 8) = 45.

3.4.6. Consider the case of three sets A, B, C with only a single element in each, common to all of them. Then n(A) + n(B) + n(C) = (the number of ways of choosing one of the three sets) · 1 = C(3, 1), n(AB) + n(BC) + n(AC) = (the number of ways of choosing two of the three sets) · 1 = C(3, 2), and n(ABC) = C(3, 3). Furthermore, n(A ∪ B ∪ C) = 1, and the selection of A ∪ B ∪ C is equivalent to the selection of none of A, B, C, that is, to C(3, 0) = 1. Thus, in the present case, n(A ∪ B ∪ C) = n(A) + n(B) + n(C) − [n(AB) + n(BC) + n(AC)] + n(ABC) is equivalent to C(3, 0) = C(3, 1) − C(3, 2) + C(3, 3).

3.4.7. a) Σ_{k=0}^{n} C(n, k) 4^k = Σ_{k=0}^{n} C(n, k) 4^k · 1^(n−k) = (4 + 1)^n = 5^n,
b) Σ_{k=0}^{n} C(n, k) x^k = Σ_{k=0}^{n} C(n, k) x^k · 1^(n−k) = (x + 1)^n for any x ≠ 0.

3.4.8. a) With Alice: 1 · C(11, 1) · C(9, 2) = 396, without Alice: C(11, 2) · C(10, 2) = 2475, with or without Alice: 396 + 2475 = 2871.
b) With Alice: C(10, 1) · C(10, 2) = 450, without Alice: C(11, 2) · C(10, 2) = 2475, with or without Alice: 450 + 2475 = 2925.
c) With Alice: 1 · C(10, 2) = 45, without Alice: C(11, 2) · C(10, 2) = 2475, with or without Alice: 45 + 2475 = 2520.
d) With Alice: 1 · C(11, 1) · C(9, 1) = 99, without Alice: C(11, 2) · C(10, 2) = 2475, with or without Alice: 99 + 2475 = 2574.

3.4.9. a) 2^n − C(n, 0) − C(n, 1) = 2^n − 1 − n, b) C(n, 0) + C(n, 1) + C(n, 2) + C(n, 3) + C(n, 4).

3.4.10. C(n, r) equals the number of ways of choosing r objects out of n. Let x and y denote two of the n objects. Then the selected r objects will contain


0, 1, or 2 of x and y. The number of ways of selecting r objects with neither x nor y is C(2, 0) · C(n − 2, r), the number of ways of selecting r objects with exactly one of them is C(2, 1) · C(n − 2, r − 1), and the number of ways of selecting r objects with both x and y is C(2, 2) · C(n − 2, r − 2). Thus, C(n, r) = C(n − 2, r) + 2 · C(n − 2, r − 1) + C(n − 2, r − 2).

3.5.1. a) 2^6 = 64,
b) C(6, 2) = 6!/(2! · 4!) = (6 · 5)/(2 · 1) = 15,
c) 6!/(2! · 1! · 3!) = (6 · 5 · 4)/2 = 60,
d) C(6, 2) · 2^4 = 15 · 16 = 240.

3.5.2. Let r denote one step to the right and u one step up. Then each path of the rook from the lower left to the upper right corner can be represented as a string of 14 letters made up of 7 r's and 7 u's in any order. For example, the string rrrrrrruuuuuuu represents the path of moving straight to the lower right corner and then straight up. There are C(14, 7) = 3432 such strings.

3.5.3. a) 7!/(2! · 3!) = 420,
b) 5!/2! = 60,
c) 6!/2! − 60 = 300,
d) 300 − 60 = 240.

3.5.4. a) (n choose n1, n2, n3) = n!/(n1! n2! n3!) = [n!/(n1!(n − n1)!)] · [(n − n1)!/(n2! n3!)] = [n!/(n1!(n − n1)!)] · [(n − n1)!/(n2!(n − n1 − n2)!)] = C(n, n1) · C(n − n1, n2).
b) (n choose n1, n2, n3) is the number of ways of permuting n1, n2, n3 respectively of the three kinds of objects from a total of n1 + n2 + n3 = n objects. This selection can also be done by first selecting n1 places out of the n places for the first kind of object, which can be done in C(n, n1) ways, and then selecting n2 places for the second kind of object out of the n − n1 places remaining after the first step, and this can be done in C(n − n1, n2) ways. At this point, there will automatically remain n3 = n − (n1 + n2) places for the third kind of object. Thus, by the multiplication principle, the number of such permutations is C(n, n1) · C(n − n1, n2).

3.5.5. a) (7 choose 2, 3, 2) = 7!/(2! · 3! · 2!) = 210,
b) 210 · 2^2 · (−3)^3 = −22,680.
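The lattice-path count of 3.5.2 and the multinomial identity of 3.5.4 can both be checked numerically; a small sketch:

```python
from math import comb, factorial

# 3.5.2: rook paths on an 8x8 board = arrangements of 7 r's and 7 u's.
# Count them directly by dynamic programming over the grid.
ways = [[1] * 8 for _ in range(8)]
for i in range(1, 8):
    for j in range(1, 8):
        ways[i][j] = ways[i - 1][j] + ways[i][j - 1]
assert ways[7][7] == comb(14, 7) == 3432

# 3.5.4: multinomial coefficient as a product of binomials.
def multinomial(n1, n2, n3):
    n = n1 + n2 + n3
    return factorial(n) // (factorial(n1) * factorial(n2) * factorial(n3))

for (n1, n2, n3) in [(2, 3, 2), (1, 1, 5), (4, 0, 3)]:
    n = n1 + n2 + n3
    assert multinomial(n1, n2, n3) == comb(n, n1) * comb(n - n1, n2)

print("path and multinomial checks passed")
```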


3.5.6. 4 can be partitioned, apart from order, into three terms as 4 + 0 + 0, 3 + 1 + 0, 2 + 2 + 0, 2 + 1 + 1. The corresponding multinomial coefficients are computed in Example 3.5.2 in the text. Thus,

(2 + 1 + 3)^4 = 1 · (2^4 + 1^4 + 3^4) + 4 · (2^3 · 1 + 2^3 · 3 + 1^3 · 2 + 1^3 · 3 + 3^3 · 1 + 3^3 · 2) + 6 · (2^2 · 1^2 + 2^2 · 3^2 + 1^2 · 3^2) + 12 · (2^2 · 1 · 3 + 2 · 1^2 · 3 + 2 · 1 · 3^2) = 1 · 98 + 4 · 118 + 6 · 49 + 12 · 36 = 1296.

3.5.7. a) This is like putting k = 10 indistinguishable balls into n = 3 distinguishable boxes. It can be done in C(n − 1 + k, n − 1) = C(12, 2) = 66 ways.
b) There are 9 spaces between the 10 balls if we put them in a row. With two dividing bars, we can divide the balls into 3 groups. So, the number of ways of dividing them into 3 nonempty groups is C(9, 2) = 36.

3.5.8. If we put k balls down in a row, then there are k − 1 spaces between the balls. Placing n − 1 dividers in those spaces separates the balls into n groups. If no space gets more than one divider, then each group will have at least one ball. Thus, the number of ways of placing k indistinguishable balls into n ≤ k distinguishable boxes with each box getting at least one ball is C(k − 1, n − 1). Alternatively, you may put one ball into each box and then distribute the remaining k − n balls into the n boxes arbitrarily, that is, use the old formula C(n − 1 + k, n − 1) with k replaced by k − n: C(n − 1 + (k − n), n − 1) = C(k − 1, n − 1).

3.5.9. You have to choose k boxes out of n. This can be done in C(n, k) ways.

3.5.10. a) The terms are of the form (6 choose n1, n2, n3) a^n1 b^n2 c^n3, where n1 + n2 + n3 = 6, and each term is uniquely determined by the choice of n1, n2, n3. Now, if we distribute k = 6 balls into n = 3 boxes and call the numbers of balls in the boxes n1, n2, n3, then each arrangement of the balls corresponds exactly to one term in the expansion. In other words, the letters may be regarded as boxes and the exponents as balls. Thus, the number of terms in the expansion of (a + b + c)^6 is C(3 − 1 + 6, 3 − 1) = C(8, 2) = 28.
b) Similarly, the number of terms in the expansion of (a + b + c + d)^5 is C(4 − 1 + 5, 4 − 1) = C(8, 3) = 56.
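The term counts in 3.5.10 can be confirmed by enumerating exponent tuples directly; a brief sketch:

```python
from math import comb
from itertools import product

# 3.5.10 a): distinct terms of (a + b + c)^6 correspond to exponent
# triples (n1, n2, n3) with n1 + n2 + n3 = 6.
triples = [t for t in product(range(7), repeat=3) if sum(t) == 6]
assert len(triples) == comb(3 - 1 + 6, 3 - 1) == 28

# 3.5.10 b): exponent 4-tuples summing to 5 for (a + b + c + d)^5.
quads = [q for q in product(range(6), repeat=4) if sum(q) == 5]
assert len(quads) == comb(4 - 1 + 5, 4 - 1) == 56

print("stars-and-bars counts confirmed")
```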

4. Probabilities

4.1.1. a) P(A) = 13/52,
b) P(B) = 12/52,
c) P(C) = 4/52,
d) P(A ∩ B) = 3/52,
e) P(A ∪ B) = 22/52,
f) P(B ∩ C) = 4/52,
g) P(B′ ∩ C) = 0,
h) P(B′ ∪ C′) = 1 − P(B ∩ C) = 48/52.

4.1.2. a) P(A) = 6/36,
b) P(B) = 18/36,
c) P(C) = 12/36,
d) P(A ∩ B) = 3/36,
e) P(A ∪ B) = 21/36,
f) P(B ∩ C) = 6/36,
g) P(B′ ∩ C) = 6/36,
h) P(B′ ∪ C′) = 30/36.

4.1.3. By the solution of Exercise 2.3.2, the event that exactly one of A, B, C occurs is P1 = AB′C′ ∪ A′BC′ ∪ A′B′C. Since the events in this union are disjoint, Axiom 3 says that P(P1) = P(AB′C′) + P(A′BC′) + P(A′B′C). On the other hand, the Venn diagram of Figure 1.3 and Axiom 3 again give

P(A) + P(B) + P(C) − 2[P(AB) + P(AC) + P(BC)] + 3P(ABC)
= [P(ABC) + P(ABC′) + P(AB′C) + P(AB′C′)] + [P(ABC) + P(ABC′) + P(A′BC) + P(A′BC′)] + [P(ABC) + P(AB′C) + P(A′BC) + P(A′B′C)] − 2[P(ABC) + P(ABC′) + P(ABC) + P(AB′C) + P(ABC) + P(A′BC)] + 3P(ABC)
= P(AB′C′) + P(A′BC′) + P(A′B′C).

4.1.4. By Corollary 4.1.1, P(A ∪ B) ≤ 1, and so, by Theorem 4.1.2, P(A) + P(B) − P(AB) ≤ 1. Rearranging gives P(A) + P(B) − 1 ≤ P(AB).
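These card probabilities are consistent with (for instance) A = spades, B = face cards, and C = kings; that identification of the events is an assumption here, but under it a direct enumeration reproduces every value in 4.1.1:

```python
from fractions import Fraction

# Hypothetical events matching the values in 4.1.1: A = spades,
# B = face cards (J, Q, K), C = kings.  This identification is an
# assumption; the exercise's exact wording is not reproduced here.
suits = "SHDC"
ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
deck = [r + s for s in suits for r in ranks]

A = {c for c in deck if c.endswith("S")}
B = {c for c in deck if c[0] in "JQK"}
C = {c for c in deck if c[0] == "K"}

P = lambda E: Fraction(len(E), len(deck))
assert P(A) == Fraction(13, 52) and P(B) == Fraction(12, 52)
assert P(A & B) == Fraction(3, 52) and P(A | B) == Fraction(22, 52)
assert P(B & C) == Fraction(4, 52)                 # C is a subset of B
assert 1 - P(B & C) == Fraction(48, 52)
print("4.1.1 values reproduced")
```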


4.1.5. Since A − B = AB′, we have, as in Problem 3.1.3, P(A) = P(A − B) + P(AB), and so we always have P(A − B) = P(A) − P(AB). Thus, P(A − B) = P(A) − P(B) if and only if P(AB) = P(B). This relation is true, in particular, if AB = B, that is, if B ⊂ A. But P(AB) = P(B) can also hold if AB ≠ B but P(A′B) = 0, because P(B) = P(AB) + P(A′B) for any A and B.

4.1.6. 1. The required formula is P(A △ B) = P(A) + P(B) − 2P(AB). Proof: By Axiom 3, P(A) = P(AB′) + P(AB) and P(B) = P(A′B) + P(AB). Adding, we get P(A) + P(B) = P(AB′) + P(A′B) + 2P(AB). Subtract 2P(AB) on both sides. Then P(A) + P(B) − 2P(AB) = P(AB′) + P(A′B), and the right-hand side equals P(AB′ ∪ A′B) = P(A △ B) by Axiom 3.
2. Using the numbered regions of Figure 1.3 in the book, we can write P(A △ C) + P(C △ B) = P(AC′ ∪ A′C) + P(CB′ ∪ C′B) = P({2, 4, 5, 7}) + P({3, 4, 6, 7}) ≥ P({2, 3, 5, 6}) = P(A △ B).

4.1.7. a) This result follows at once from Theorem 4.1.2, because we are subtracting the (by Axiom 1) nonnegative quantity P(AB) from P(A) + P(B) on the right of Equation 4.1 to get P(A ∪ B).
b) Apply the result of Part a) with A ∪ B in place of A, and C in place of B. Then we get P((A ∪ B) ∪ C) ≤ P(A ∪ B) + P(C). Now, apply the result of Part a) to A ∪ B, and we obtain P((A ∪ B) ∪ C) ≤ P(A) + P(B) + P(C). Since unions are associative, this proves the required result.
c) This relation can be proved by induction: As seen above, it is true for n = 2 and 3. For any larger n, assume the formula to be true for n − 1. Then we can prove it for n as follows: P(A1 ∪ ··· ∪ An) = P((A1 ∪ ··· ∪ An−1) ∪ An), and by Part a), P((A1 ∪ ··· ∪ An−1) ∪ An) ≤ P(A1 ∪ ··· ∪ An−1) + P(An). By the induction hypothesis, P(A1 ∪ ··· ∪ An−1) ≤ P(A1) + ··· + P(An−1), and so, putting all these relations together, we get P(A1 ∪ ··· ∪ An) ≤ P(A1) + ··· + P(An).

4.1.8. a) P({a, b}) = 3/7, P({a, c}) = 5/7, P({a, d}) = 1/7, P({b, c}) = 6/7, P({b, d}) = 2/7, P({c, d}) = 4/7, P({a, b, c}) = 1, P({a, b, d}) = 3/7, P({a, c, d}) = 5/7, P({b, c, d}) = 6/7, P(S) = 1.
b) For example, if A = {a, b} and B = {b, d}, then AB = {b} ≠ B, but P(AB) = P(B) = 2/7.

4.2.1. a) S = {ASAH, ASKS, ASKH, AHAS, AHKS, AHKH, KSAS, KSAH, KSKH, KHAS, KHAH, KHKS}.
b) P(A and K) = 8/12 = 2/3.
c) There are 6 possible unordered pairs, 4 of which are favorable. So, P(A and K) = 4/6 = 2/3.
d) Here we are drawing without replacement, and so each pair consists of two different cards. Thus, each unordered pair corresponds to two ordered pairs and therefore each one has probability 2 · (1/12) = 1/6. In Example 3.2.2, some unordered pairs correspond to two ordered pairs and some to one.

4.2.2. P(K first and A second) = P(A first and K second) = 2^2/4^2 = 1/4. Thus P(K and A in either order) = 2 · (1/4) = 1/2. We cannot obtain this result by counting unordered pairs, because the 10 possible unordered pairs are not equally likely: For instance, P(AS, AH in either order) = 2 · (1/4^2) = 1/8, but P(AS, AS) = 1/16.

4.2.3. We did not get P(at least one six) = 1, in spite of the fact that on each throw the probability of getting a six is 1/6, and 6 times 1/6 is 1, for two reasons: First, we would be justified in taking the 1/6 six times here only if the events of getting a six on the different throws were mutually exclusive; then the probability of getting a six on one of the throws could be computed by Axiom 3 as 6 · (1/6), but these are not mutually exclusive events. Second, the event of getting at least one six is not the same as the event of getting a six on the first throw, or on the second, etc.

4.2.4. a) C(51, 12)/C(52, 13) = [51!/(12! · 39!)] · [(13! · 39!)/52!] = 13/52 = 1/4.
b) C(13, 5) · C(39, 8)/C(52, 13) = [13!/(5! · 8!)] · [39!/(8! · 31!)] · [(13! · 39!)/52!] ≈ 0.125.
c) C(13, 5)^2 · C(13, 2) · C(13, 1)/C(52, 13) ≈ 0.0026.

4.2.5. P(different numbers with three dice) = (6 · 5 · 4)/6^3 = 5/9.

4.2.6. m + n people can be seated in (m + n)! ways. The number of favorable cases is (n + 1) · m! · n!, because the number of women before the group of men can be 0, 1, 2, ..., or n, for n + 1 choices, and in each case the men can be permuted m! ways among themselves and the women n! ways. Thus P = (n + 1) · m! · n!/(m + n)!.

4.2.7. m + n people can be seated in (m + n)! ways. The number of favorable cases is (m + n) · m! · n!, because the group of men can start at any one of the


m + n seats and must be followed by the group of women, and in each case the men can be permuted m! ways among themselves and the women n! ways. Thus, P = (m + n) · m! · n!/(m + n)! = m! · n!/(m + n − 1)!.

4.2.8. This is like the birthday problem. So, P(at least 2 on same floor) = 1 − P(all on different floors) = 1 − 8P6/8^6 ≈ 0.923.

4.2.9. This problem is like sampling good and bad items without replacement. The good items are the player's numbers and the bad ones are the rest. Thus, P(jackpot) = C(6, 6) · C(36, 0)/C(42, 6) = (6! · 36!)/42! = 1/5,245,786 ≈ 2 · 10^−7, and P(match 5) = C(6, 5) · C(36, 1)/C(42, 6) = 6 · 36/5,245,786 = 216/5,245,786 = 108/2,622,893 ≈ 4 · 10^−5.

4.2.10. a) P(both) = C(2, 2) · C(98, 8)/C(100, 10) = (10 · 9)/(100 · 99) = 1/110.
b) P(neither) = C(2, 0) · C(98, 10)/C(100, 10) = (90 · 89)/(100 · 99) = 89/110.
c) P(A in sample) = C(1, 1) · C(99, 9)/C(100, 10) = 10/100 = 1/10.
d) P(A or B but not both) = 2 · C(1, 1) · C(98, 9)/C(100, 10) = 2 · (10 · 90)/(100 · 99) = 2/11.

4.2.11. a) C(3, 1) · (1/2)^3 = 3/8,
b) C(3, 1) · (3/10) · (7/10)^2 = 0.441,
c) C(3, 2) · (3/10)^2 · (7/10) = 0.189.

4.2.12. The first card can be any card and the second card must be of the same denomination. Thus,
for n = 1: P(pair) = 3/51 ≈ .059,
for n = 2: P(pair) = 7/103 ≈ .068,
for n = 4: P(pair) = 15/207 ≈ .072,
for n = 6: P(pair) = 23/311 ≈ .074,
for n = 8: P(pair) = 31/415 ≈ .075.

4.2.13. By the result of Example 2.5.3, the number of ways of placing k indistinguishable balls into n distinguishable boxes is C(k + n − 1, k). This is the total number of cases. Now, if a given cell contains exactly m particles, then the remaining n − 1 cells contain k − m particles. Thus, the number of ways of filling those is given by the same formula with k replaced by k − m and n by n − 1, that is, by C((k − m) + (n − 1) − 1, k − m) = C(k + n − m − 2, k − m). Since the selected cell can be filled with m indistinguishable particles in just one way, the number of favorable cases is the same.

4.2.14. By the result of Exercise 3.5.8, the number of ways of placing k indistinguishable balls into n distinguishable boxes with each box getting at least one ball is C(k − 1, n − 1). This is the total number of cases. Now, if exactly m cells contain no particles, then the remaining n − m cells contain all k particles, with each cell getting at least one particle. Thus, the number of ways of filling those is given by the same formula with the same k and n replaced by n − m, that is, by C(k − 1, (n − m) − 1). Since the m empty cells can be selected in C(n, m) ways, the number of favorable cases is C(n, m) · C(k − 1, n − m − 1). The total number of cases is C(k + n − 1, k) for Bose-Einstein statistics.

4.2.15. To get 5 cards of different denominations, we may first choose the 5 denominations out of the 13 possible ones and then choose one card from the 4 cards of each of the selected denominations. Thus, P(all different) = C(13, 5) · 4^5/C(52, 5) ≈ 0.507. (Note that we have included "straights" and "flushes" in the count, that is, cards with five consecutive denominations or five cards of the same suit, which are very valuable hands, while the other cases of different denominations are poor hands.)

4.2.16. In poker, P(two pairs) = C(13, 2) · C(4, 2) · C(4, 2) · C(44, 1)/C(52, 5) ≈ 0.048.

4.2.17. For the pair we have 13 possible denominations and then, for the triple, 12 possible denominations. For the pair we have C(4, 2) choices from the 4 cards of the selected denomination and for the triple C(4, 3). Thus, P(full house in poker) = 13 · 12 · C(4, 2) · C(4, 3)/C(52, 5) ≈ 0.0014.

4.2.18. In poker dice, P(four of a kind) = 6 · 5 · C(5, 4)/6^5 ≈ .019.
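The poker-hand values in 4.2.15 through 4.2.17 can be re-derived with `math.comb`; a quick sketch:

```python
from math import comb

hands = comb(52, 5)                       # 2,598,960 five-card hands

# 4.2.15: five different denominations (straights/flushes included).
p_diff = comb(13, 5) * 4**5 / hands
assert abs(p_diff - 0.507) < 0.001

# 4.2.16: two pairs.
p_two_pairs = comb(13, 2) * comb(4, 2)**2 * comb(44, 1) / hands
assert abs(p_two_pairs - 0.048) < 0.001

# 4.2.17: full house.
p_full_house = 13 * 12 * comb(4, 2) * comb(4, 3) / hands
assert abs(p_full_house - 0.0014) < 0.0001
print(round(p_diff, 3), round(p_two_pairs, 3), round(p_full_house, 4))
```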

4.2.19. In poker dice, we have 6 possible numbers for the pair and then 5 for the triple. These selections can be arranged among the five dice in (5 choose 2, 3) = C(5, 2) = 10 ways. Thus, P(full house in poker dice) = 6 · 5 · C(5, 2)/6^5 ≈ 0.0386.

4.2.20. Combinatorially: The total number of ordered selections of n items out of N different ones is NPn. The number of ordered ways of choosing n1 good items out of the total of N1 good items is N1Pn1, and similarly, for the bad items the number of ordered choices is N2Pn2. These choices can be arranged in (n choose n1, n2) different orders. Thus, by the multiplication principle, the number of favorable choices is (n choose n1, n2) · N1Pn1 · N2Pn2.
Algebraically:

(n choose n1, n2) · N1Pn1 · N2Pn2 / NPn
= [n!/(n1! · n2!)] · [N1!/(N1 − n1)!] · [N2!/(N2 − n2)!] / [N!/(N − n)!]
= {[N1!/(n1!(N1 − n1)!)] · [N2!/(n2!(N2 − n2)!)]} / {N!/(n!(N − n)!)}
= C(N1, n1) · C(N2, n2)/C(N, n).
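A quick numeric sanity check of the 4.2.20 identity for a few parameter choices:

```python
from math import comb, perm, factorial

# 4.2.20: ordered and unordered counts give the same hypergeometric
# probability P(n1 good, n2 bad) in a sample of n = n1 + n2 from
# N1 good and N2 bad items.
def ordered(N1, N2, n1, n2):
    n, N = n1 + n2, N1 + N2
    arrangements = factorial(n) // (factorial(n1) * factorial(n2))
    return arrangements * perm(N1, n1) * perm(N2, n2) / perm(N, n)

def unordered(N1, N2, n1, n2):
    return comb(N1, n1) * comb(N2, n2) / comb(N1 + N2, n1 + n2)

for args in [(6, 36, 5, 1), (2, 98, 2, 8), (13, 39, 5, 8)]:
    assert abs(ordered(*args) - unordered(*args)) < 1e-12
print("hypergeometric identity verified")
```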

4.2.21. If 0 ≤ n1 ≤ n, n1 ≤ N1, and n − n1 ≤ N2, then the last inequality is equivalent to n − N2 ≤ n1, which together with 0 ≤ n1 means that n1 is greater than or equal to both 0 and n − N2, and so max(0, n − N2) ≤ n1. The middle two inequalities say that n1 is less than or equal to both n and N1, and so n1 ≤ min(n, N1). Thus, 0 ≤ n1 ≤ n, n1 ≤ N1, and n − n1 ≤ N2 imply max(0, n − N2) ≤ n1 ≤ min(n, N1). Conversely, if max(0, n − N2) ≤ n1 ≤ min(n, N1), then the first part implies that 0 ≤ n1 and n − N2 ≤ n1, or n − n1 ≤ N2, and the second part implies that n1 ≤ n and n1 ≤ N1. Thus, max(0, n − N2) ≤ n1 ≤ min(n, N1) implies 0 ≤ n1 ≤ n, n1 ≤ N1, and n − n1 ≤ N2.

4.2.22. a) P(4 aces and 3 spades) = C(4, 4) · C(12, 2) · C(36, 7)/C(52, 13) ≈ 0.0008,
b) P(at least 3 of each suit) = 4 · C(13, 4) · C(13, 3)^3/C(52, 13) ≈ 0.105.

4.3.1. Let E = "even" and O = "odd" and consider the sample space S = {EEE, EEO, EOE, EOO, OEE, OEO, OOE, OOO} for throwing three dice. Then A = {EEE, EEO, EOE, EOO}, B = {EEE, EOO, OEE, OOO}, and AB = {EEE, EOO}. The elementary events are equally likely, and so P(A) = P(B) = 4/8 = 1/2 and P(AB) = 2/8 = 1/4. Hence, P(A)P(B) = P(AB).

4.3.2. From Figure 4.1 below, P(A) = 21/36 = 7/12, P(B) = 12/36 = 1/3, and P(AB) = 7/36 = (7/12) · (1/3).


Fig. 4.1.
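The independence check in 4.3.2 can be replayed by enumerating the 36 throws; the events are assumed here (consistently with the stated values) to be A = "the sum is at most 7" and B = "the sum is divisible by 3":

```python
from fractions import Fraction
from itertools import product

# Hypothetical events matching 4.3.2's numbers: A = sum <= 7,
# B = sum divisible by 3, over 36 equally likely dice throws.
throws = list(product(range(1, 7), repeat=2))
A = {t for t in throws if sum(t) <= 7}
B = {t for t in throws if sum(t) % 3 == 0}

P = lambda E: Fraction(len(E), 36)
assert P(A) == Fraction(7, 12) and P(B) == Fraction(1, 3)
assert P(A & B) == Fraction(7, 36) == P(A) * P(B)   # A, B independent
print("4.3.2 independence verified")
```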

4.3.3. P(A) = P(B) = 1/2 and P(C) = 4/36 = 1/9. Also, AC = ABC = {(3, 6)}, and so P(AC) = 1/36. Thus, P(AC) ≠ P(A)P(C), and A and C are not independent, but P(ABC) = 1/36 = P(A)P(B)P(C).

4.3.4. P(AB) = P(HH) = 1/4 = (1/2) · (1/2) = P(A)P(B), P(AC) = P(HT) = 1/4 = (1/2) · (1/2) = P(A)P(C), P(BC) = P(TH) = 1/4 = (1/2) · (1/2) = P(B)P(C), but P(ABC) = P(∅) = 0 ≠ (1/2) · (1/2) · (1/2).

4.3.5. a) Let A and B be independent. Then P(AB′) = P(A) − P(AB) = P(A) − P(A)P(B) = P(A)[1 − P(B)] = P(A)P(B′).
b) Similarly, P(A′B′) = P(B′) − P(AB′) = P(B′) − P(A)P(B′) = [1 − P(A)]P(B′) = P(A′)P(B′).

4.3.6. a) No: If A and B are independent and P(A) ≠ 0 and P(B) ≠ 0, then P(AB) = P(A)P(B) ≠ 0, but then A and B are not mutually exclusive, because if they were, then we would have AB = ∅ and P(AB) = 0.
b) No: If A and B are mutually exclusive, then AB = ∅ and P(AB) = 0, and if A and B were independent, then we would also have P(A)P(B) = 0, and so at least one of P(A) or P(B) would be equal to 0, in contradiction to the assumption that P(A) ≠ 0 and P(B) ≠ 0.

4.3.7. p(0) = (1/2)^5, p(1) = 5 · (1/2)^5, p(2) = 10 · (1/2)^5, p(3) = 10 · (1/2)^5, p(4) = 5 · (1/2)^5, p(5) = (1/2)^5. See Figure 4.2.


Fig. 4.2.

4.3.8.
a) P(exactly 4 sixes) = C(6,4)(1/6)⁴(5/6)² ≈ 0.00804,
b) P(exactly 5 sixes) = C(6,5)(1/6)⁵(5/6) ≈ 0.00064,
c) P(exactly 6 sixes) = (1/6)⁶ ≈ 0.00002,
d) P(at least 4 sixes) = P(4, 5, or 6 sixes) ≈ 0.00870,
e) P(at most 3 sixes) = 1 − P(at least 4 sixes) ≈ 0.99130.
4.3.9. The probability that a ball picked at random is red is 5/15 = 1/3. Similarly, the probability is the same for white and also for blue. Thus, the probability of any color combination in a given order for six independently chosen balls is (1/3)⁶. We can obtain two of each color in 6!/(2!2!2!) = 90 different orders. Thus, P(two of each color) = 90·(1/3)⁶ ≈ 0.123.
4.3.10. A ∪ B ∪ C = S implies P(A ∪ B ∪ C) = 1. Also, by Theorem 4.1.3 and the assumed independence, P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A)P(B) − P(A)P(C) − P(B)P(C) + P(A)P(B)P(C) = 1. Hence, (P(A) − 1)(P(B) − 1)(P(C) − 1) = 0, and so at least one of P(A), P(B), or P(C) must equal 1. The other two are unrestricted. Alternatively, by DeMorgan's law, (A ∪ B ∪ C)′ = A′B′C′, and so P(A ∪ B ∪ C) = 1 − P(A′B′C′) = 1. Thus, by the assumed independence, P(A′)P(B′)P(C′) = 0, and so at least one of P(A′), P(B′), or P(C′) must equal 0, that is, at least one of P(A), P(B), or P(C) must equal 1.
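A short numerical check of 4.3.8 and 4.3.9 (an illustration, not part of the text):

```python
from math import comb, factorial

# 4.3.8: the number of sixes in six rolls is binomial(n = 6, p = 1/6).
def binom_pmf(n, k, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_at_least_4 = sum(binom_pmf(6, k, 1/6) for k in (4, 5, 6))

# 4.3.9: each of the six independent draws is red/white/blue with
# probability 1/3; "two of each color" can occur in 6!/(2!2!2!) orders.
p_two_each = factorial(6) // factorial(2) ** 3 * (1 / 3) ** 6
```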


4.3.11. If A, B, and C are pairwise independent and A is independent of B ∪ C, then, on the one hand, P(A(B ∪ C)) = P(AB ∪ AC) = P(AB) + P(AC) − P(ABC) = P(A)P(B) + P(A)P(C) − P(ABC), and on the other hand, P(A(B ∪ C)) = P(A)P(B ∪ C) = P(A)[P(B) + P(C) − P(BC)] = P(A)P(B) + P(A)P(C) − P(A)P(B)P(C). Thus, P(ABC) = P(A)P(B)P(C), and this relation, together with the assumed pairwise independence, proves that A, B and C are totally independent.
4.3.12. Let A, B, and C be pairwise independent events and A be independent of BC. Then P(A)P(B) = P(AB) = P(ABC ∪ ABC′) = P(ABC) + P(ABC′) = P(ABC) + P(A)P(BC′) = P(ABC) + P(A)P(B)P(C′). (In the next-to-last step, P(ABC′) = P(AB) − P(ABC) = P(A)P(B) − P(A)P(BC) = P(A)P(BC′), since A is independent of both B and BC; in the last step, we used the result of Exercise 4.3.5: B and C independent implies B and C′ independent.) Thus, P(ABC) = P(A)P(B) − P(A)P(B)P(C′) = P(A)P(B)[1 − P(C′)] = P(A)P(B)P(C).

4.4.1. a) fA = 9/20, fB = 11/20, fAB = 6/20, fA|B = 6/11, fB|A = 6/9. b) P(A) = 1/2, P(B) = 1/2, P(AB) = 1/4, P(A|B) = 1/2, P(B|A) = 1/2.
4.4.2. P(w ≤ 3 and b + w = 7) = 3/36 = 1/12, P(w ≤ 3 | b + w = 7) = 3/6 = 1/2, P(b + w = 7 | w ≤ 3) = 3/18 = 1/6.
4.4.3. If A = {K or 2} and B = {J, Q, K}, then AB = {K} and P(A|B) = 1/3.
4.4.4. a) P(Republican) = .25, b) P(under 30) = .30, c) P(Republican if under 30) = .05/.30 = 0.166..., d) P(under 30 if Republican) = .05/.25 = .20, e) P(Democrat) = .40, f) P(Democrat if under 30) = .095/.30 = .3166..., g) P(Independent) = .35, h) P(Independent if under 30) = .155/.30 = 0.5166....


4.4.5. By Theorem 4.4.1, Part 3, P(Republican | under 30) + P(Democrat | under 30) + P(Independent | under 30) = P(Republican or Democrat or Independent | under 30) = P(S | under 30) = P(S and under 30)/P(under 30) = P(under 30)/P(under 30) = 1.
4.4.6. P(A) + P(B) − P(AB) = P(A ∪ B) ≤ 1, and so 8/10 + 9/10 − P(AB) ≤ 1. Hence, P(AB) ≥ 8/10 + 9/10 − 1 = 7/10 and P(A|B) = P(AB)/P(B) ≥ (7/10)/(9/10) = 7/9.
4.4.7. Whether the selected girl is the first, second or third child in the family, her siblings, in the order of their births, can be bb, bg, gb, or gg. In two of these cases the family has two girls and one boy. Thus, P(two girls and one boy | one child is a girl) = 2/4 = 1/2.
4.4.8. P(exactly one six) = 3·(1/6)·(5/6)² and P(at least one six) = 1 − (5/6)³. Thus, P(exactly one six | at least one six) = 3·(1/6)·(5/6)² / [1 − (5/6)³] ≈ 0.824.
4.4.9. The number of ways of drawing two Kings, which are also face cards, without replacement, is 4·3, and the number of ways of drawing two face cards is 12·11. Thus, P(two Kings | two face cards) = 4·3/(12·11) = 1/11.
4.4.10. For A and B, with P(B) ≠ 0 and P(B′) ≠ 0, assume P(A|B) = P(A|B′). This means that P(AB)/P(B) = P(AB′)/P(B′), which is equivalent to P(AB)(1 − P(B)) = P(AB′)P(B). Rearranging this equation, we get P(AB) = (P(AB) + P(AB′))P(B) = P(AB ∪ AB′)P(B), and so P(AB) = P(A)P(B), which shows that A and B are independent of each other. The argument can be reversed to prove the converse.
4.4.11. P(exactly one King | at most one King) = C(4,1)C(48,1) / [C(4,1)C(48,1) + C(4,0)C(48,2)] = 8/55.
4.4.12. With replacement: P(exactly one King | at most one King) = C(2,1)(1/13)(12/13) / [C(2,0)(12/13)² + C(2,1)(1/13)(12/13)] = 1/7.


4.4.13. a) Let A = "no spades" and B = "five hearts." Then

P(AB) = C(13,5) C(26,8) / C(52,13),  P(B) = C(13,5) C(39,8) / C(52,13),

and so

P(A|B) = P(AB)/P(B) = C(26,8)/C(39,8) = 575/22644 ≈ 0.02539.

b) From above, A′ = "at least one spade," and so P(A′B) = P(B) − P(AB), and

P(A′|B) = P(A′B)/P(B) = [P(B) − P(AB)]/P(B) = 1 − P(A|B) = 1 − C(26,8)/C(39,8) ≈ 0.9746.

4.5.1. See Figure 4.3.

Fig. 4.3.

P(W) = (1/2)·(1/4) + (1/2)·(3/5) = 17/40.
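The conditional probability in 4.4.13 can be verified directly (a sketch using Python's math.comb, not from the text):

```python
from math import comb
from fractions import Fraction

# Among 13-card hands with exactly five hearts, the remaining 8 cards
# are drawn from the 39 non-hearts; "no spades" restricts them to the
# 26 diamonds and clubs.
p = Fraction(comb(26, 8), comb(39, 8))
```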


4.5.2. See Figure 4.4.

Fig. 4.4.

P(Black and White) = (1/2)·(3/4)·(1/3) + (1/2)·(1/4)·1 + (1/2)·(3/5)·(2/4) + (1/2)·(2/5)·(3/4) = 11/20.

4.5.3. a) P(both are Aces | one is an Ace) = 4·3/(4·51 + 48·4) = 1/33.


b) P(both are Aces | one is a red Ace) = P(a red Ace plus another Ace)/P(one is a red Ace) = [(2/52)·(3/51) + (2/52)·(2/51)] / [2/52 + (50/52)·(2/51)] = 10/202 = 5/101.
c) P(both are Aces | one is AS) = (1·3 + 3·1)/(1·51 + 51·1) = 6/102 = 1/17.
d) P(one is AS | both are Aces) = P(AS plus another Ace)/P(both are Aces) = [(1/52)·(3/51) + (3/52)·(1/51)] / [(4/52)·(3/51)] = 1/2.
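The answers in 4.5.3 can be confirmed by enumerating all ordered two-card deals (a sketch, not from the text; the helper names are ours):

```python
from itertools import permutations
from fractions import Fraction

# A card is (rank, suit); rank 0 stands for ace, "H"/"D" are the red suits.
deck = [(r, s) for r in range(13) for s in "SHDC"]
pairs = list(permutations(deck, 2))  # 52*51 equally likely ordered deals

def cond_prob(event, given):
    hits = sum(1 for p in pairs if given(p) and event(p))
    total = sum(1 for p in pairs if given(p))
    return Fraction(hits, total)

both_aces = lambda p: p[0][0] == 0 and p[1][0] == 0
one_ace = lambda p: p[0][0] == 0 or p[1][0] == 0
one_red_ace = lambda p: (0, "H") in p or (0, "D") in p
one_as = lambda p: (0, "S") in p  # the ace of spades

a = cond_prob(both_aces, one_ace)      # 1/33
b = cond_prob(both_aces, one_red_ace)  # 5/101
c = cond_prob(both_aces, one_as)       # 1/17
```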

4.5.4. Let Am denote the event that Alice with initial capital m is ruined. The difference equation for P(Am) is the same as in Example 4.5.5: P(Am) = P(Am+1)·(1/2) + P(Am−1)·(1/2). It is just one of the boundary conditions that changes: we still have P(A0) = 1, but the game ends if Alice wins all of Bob's money, and so P(Am+n) = 0. Thus, the solution is given by Equation 4.81, but with n replaced by m + n, that is, the probability that Alice is ruined is P(Am) = 1 − m/(m + n) = n/(m + n), which is the ratio of Bob's initial capital to the total capital of the two players. By symmetry, the probability of Bob's ruin is m/(m + n). Furthermore, we can see from these formulas that eventually one of the two players will be ruined with certainty, because n/(m + n) + m/(m + n) = 1, that is, a draw is not possible.
4.5.5. Equation 4.80 becomes P(Am) = P(Am+1)·p + P(Am−1)·q for 0 < m < n, where q = 1 − p and Am denotes the event that the gambler with initial capital m is ruined. First, we try to find constants λ such that P(Am) = λ^m for 0 < m < n, just as in the analogous, but more familiar, case of linear homogeneous differential equations with constant coefficients. Substituting into the difference equation, we get λ^m = pλ^(m+1) + qλ^(m−1), and canceling λ^(m−1), λ = pλ² + q or, equivalently, the quadratic equation pλ² − λ + q = 0. The solutions are λ = (1 ± √(1 − 4pq))/(2p). Now, 1 − 4pq = (p − q)², and so λ = (1 ± |p − q|)/(2p). Separating the two roots, we obtain λ1 = (1 + p − q)/(2p) = 1 and λ2 = (1 + q − p)/(2p) = q/p, and for the general solution of the difference equation P(Am) = aλ1^m + bλ2^m = a + b(q/p)^m. As in Example 4.5.5, we have the boundary conditions P(A0) = 1 and P(An) = 0 and we use them to determine the constants a and b. Consequently, a + b(q/p)⁰ = a + b = 1 and a + b(q/p)ⁿ = 0. Hence,


b = 1/(1 − (q/p)ⁿ) and a = −(q/p)ⁿ/(1 − (q/p)ⁿ). Thus, the probability of the gambler's ruin is P(Am) = [(q/p)^m − (q/p)ⁿ] / [1 − (q/p)ⁿ], if he starts with m dollars and stops if he reaches n dollars. If q < p, that is, the game is favorable for our gambler, then lim(n→∞) (q/p)ⁿ = 0, and so the gambler may play forever without getting ruined, and the probability that he does not get ruined is 1 − (q/p)^m.
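The ruin formulas of 4.5.4 and 4.5.5 can be checked by iterating the difference equation numerically (a sketch; the function is ours, not the text's):

```python
# Iterate P(A_m) = p*P(A_{m+1}) + q*P(A_{m-1}) with P(A_0) = 1, P(A_n) = 0
# until the values settle (Gauss-Seidel-style sweeps).
def ruin_probs(n, p, sweeps=20000):
    q = 1 - p
    P = [0.0] * (n + 1)
    P[0] = 1.0  # ruined immediately with no capital
    for _ in range(sweeps):
        for m in range(1, n):
            P[m] = p * P[m + 1] + q * P[m - 1]
    return P

# Fair game, total capital m + n = 10: ruin probability is n/(m + n).
fair = ruin_probs(10, 0.5)

# Biased game: ruin probability ((q/p)^m - (q/p)^n) / (1 - (q/p)^n).
p, n, m = 0.6, 10, 4
r = (1 - p) / p
biased = ruin_probs(n, p)
```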

. ruined and the probability that he does not get ruined is 1 − pq 4.5.6. The prisoner should put one white marble in one of the urns, say in U1 , 74 and all the others in U2 . Then P(W ) = 12 · 1 + 12 · 49 99 = 99 . The fact that this arrangement is the best one can be seen by showing that any other arrangement of the 100 balls can be improved by transferring balls from one urn to the other, until we reach this arrangement, which cannot be further improved. So, let U1 contain a1 ≥ 1 white and b1 ≥ 1 black balls, and U2 contain a2 = 50 − a1 white and b2 = 50 − b1 black balls. Also, suppose that a1 + b1 ≤ a2 + b2 . (If not, then switch the numbering of the urns.) Now transfer all the black balls from U1 to U2 . Then the improvement in P(W |U1 ) is 1−

a1 b1 = a1 + b1 a1 + b1

and the worsening in P(W |U2 ) is a2 a2 a2 b1 − = a2 + b2 a2 + b2 + b1 (a2 + b2 ) (a2 + b2 + b1 ) b1 a2 b1 ≤ · < . a1 + b1 a2 + 50 a1 + b1 Thus, in P(W ) = 21 · P(W |U1 )+ 12 · P(W |U2 ) there is a net increase as a result of the transfer. Next, if a1 > 1, then transfer a1 − 1 balls from U1 to U2 . This move does not change P(W |U1 ) , but increases P(W |U2 ) as much as possible. 4.5.7. P(U1 |W ) =

P(W |U1 )P(U1 ) = P(W |U1 )P(U1 ) + P(W |U2 )P(U2 )

1 4

1 1 4 · 2 1 3 2 + 5

·

·

1 2

=

5 . 17

4.5.8. P(U1|BW ∪ WB) = P(BW ∪ WB|U1)P(U1) / P(BW ∪ WB) = [(3/4)·(1/3) + (1/4)·1]·(1/2) / (11/20) = 5/11.

4.5.9. P(GG|G) = P(G|GG)P(GG) / [P(G|GG)P(GG) + P(G|BG)P(BG) + P(G|GB)P(GB) + P(G|BB)P(BB)] = 1·(1/4) / [1·(1/4) + (1/2)·(1/4) + (1/2)·(1/4) + 0·(1/4)] = 1/2.

For other ways of solving this problem, see Example 4.4.5.
4.5.10. P(BGG ∪ GBG ∪ GGB|G) = 3·(2/3)·(1/8) / [0·(1/8) + 3·(1/3)·(1/8) + 3·(2/3)·(1/8) + 1·(1/8)] = 1/2.

4.5.11. With the urns of Exercise 4.5.2, P(WB|BW ∪ WB) = P(WB)/P(BW ∪ WB) = 2/11.

4.5.12. Let A = "marked correctly," B1 = "knew the answer," and B2 = B1′. Then P(B1|A) = P(A|B1)P(B1) / [P(A|B1)P(B1) + P(A|B2)P(B2)] = 1·(3/4) / [1·(3/4) + (1/5)·(1/4)] = 15/16.
4.5.13. Let A = "the witness says the hit-and-run taxi was blue," B1 = "the hit-and-run taxi was blue," and B2 = "the hit-and-run taxi was black." Then P(B1|A) = P(A|B1)P(B1) / [P(A|B1)P(B1) + P(A|B2)P(B2)] = .80·.15 / (.80·.15 + .20·.85) ≈ 0.41. Thus, the evidence against the blue taxi is very weak.
4.5.14. Let U1 = "the box with two gold coins is picked," U2 = "the box with two silver coins is picked," U3 = "the box with one gold and one silver coin is picked," and G = "a gold coin is picked." Then, by Bayes' Theorem, P(U1|G) = P(G|U1)P(U1) / [P(G|U1)P(U1) + P(G|U2)P(U2) + P(G|U3)P(U3)] = 1·(1/3) / [1·(1/3) + 0·(1/3) + (1/2)·(1/3)] = 2/3.
4.5.15. 1. In this case, the host will open door no. 3 if and only if either the car is behind door no. 1 or door no. 2, and those two cases are equally likely. Thus, P(car is behind 2 | 3 is opened) = 1/2. So, in this case it does not matter whether the player switches or not.
2. P(car is behind 2 | 3 is opened) = P(car is behind 2 and 3 is opened)/P(3 is opened) = P(3 is opened | car is behind 2)P(car is behind 2) / [P(3 is opened | car is behind 1)P(car is behind 1) + P(3 is opened | car is behind 2)P(car is behind 2)] = 1·(1/3) / [p·(1/3) + 1·(1/3)] = 1/(p + 1).


Now, p ≤ 1, hence p + 1 ≤ 2, and so 1/2 ≤ 1/(p + 1) = P(car is behind 2 | 3 is opened). Thus the player should always switch doors; only when p = 1 does it not matter.
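The biased-host Monty Hall answer of 4.5.15 can also be checked by simulation (a sketch under the stated setup; the function is ours):

```python
import random

# The player picks door 1. If the car is behind door 1, the host opens
# door 3 with probability p (door 2 otherwise); if the car is behind
# door 2 or 3, the host must open the single empty unchosen door.
def p_car2_given_3_opened(p, trials=200_000, seed=1):
    rng = random.Random(seed)
    opened3 = car2 = 0
    for _ in range(trials):
        car = rng.randint(1, 3)
        if car == 1:
            door = 3 if rng.random() < p else 2
        else:
            door = 2 if car == 3 else 3
        if door == 3:
            opened3 += 1
            car2 += (car == 2)
    return car2 / opened3
```

With p = 1/2 the formula 1/(p + 1) gives 2/3, the classical answer.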

5. Random Variables

5.1.1. The p.f. of X is given by f(x) = C(13,x) C(39,5−x) / C(52,5) for x = 0, 1, . . . , 5, with the histogram in Figure 5.1, and the d.f. of X is given by

Fig. 5.1.

F(x) ≈ 0 if x < 0; .222 if 0 ≤ x < 1; .633 if 1 ≤ x < 2; .907 if 2 ≤ x < 3; .989 if 3 ≤ x < 4; .999 if 4 ≤ x < 5; 1 if x ≥ 5.

x 2π.

Thus, FR,Θ (r, θ) = FR (r) FΘ (θ) , and R and Θ are independent. 5.5.7.

Fig. 5.35.

If X and Y denote the arrival times of Alice and Bob, respectively, then they will meet if and only if |X − Y| ≤ 1 or, equivalently, X − 1 ≤ Y ≤ X + 1. Now, (X, Y) is uniformly distributed on the square [2, 6] × [2, 6], and the above condition is satisfied by the points of the shaded region in Figure 5.35, whose area is 4² − the area of the two triangles = 16 − 9 = 7. Thus, P(A and B meet) = 7/16.
5.5.8. A triangle can be constructed if and only if the sum of any two sides is longer than the third side. In our case, this condition means the three inequalities: X + (Y − X) ≥ 2 − Y, (Y − X) + (2 − Y) ≥ X, and X + (2 − Y) ≥ Y − X. The first two are automatically satisfied and the third one can be reduced to Y ≤ X + 1. In the xy-plane the point (X, Y) must lie in the [0, 1] × [1, 2] square and the y = x + 1 line cuts this square in half. Thus, P(Y ≤ X + 1) = 1/2.
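A Monte Carlo check of the meeting probability in 5.5.7 (an illustration, not part of the text):

```python
import random

# Arrival times uniform on [2, 6]; the two people meet iff they arrive
# within 1 hour of each other.
def p_meet(trials=200_000, seed=1):
    rng = random.Random(seed)
    hits = sum(abs(rng.uniform(2, 6) - rng.uniform(2, 6)) <= 1
               for _ in range(trials))
    return hits / trials
```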


5.5.9.

Fig. 5.36.

Let the circle have radius r and choose a coordinate system with origin at the center of the circle and so that the first random point is A(0, r). See Figure 5.36. Then if the second random point is within a distance r of the point A, it must lie in the intersection of the original circle and another circle of radius r centered at A. From elementary geometry, the angle BOC of the sector BOC is 2π/3, and so the area of the sector is (1/2)r²(2π/3) = r²π/3. The area of the triangle BOC is r²√3/4. Therefore the area of the intersection of the circles is 2(r²π/3 − r²√3/4) = r²(2π/3 − √3/2). Thus, P(the two random points will be nearer to each other than r) = (1/(πr²))·r²(2π/3 − √3/2) = 2/3 − √3/(2π).

5.5.10. Clearly, P(Xi = 0) = P(Xi = 1) = 1/2 for all i, and P((X1, X2) = (0, 0)) = P((X1, X2) = (0, 1)) = P((X1, X2) = (1, 0)) = P((X1, X2) = (1, 1)) = 1/4. Thus, P((X1, X2) = (0, 0)) = P(X1 = 0)P(X2 = 0), etc. By symmetry, similar relations hold for the pairs (X1, X3) and (X2, X3) as well. On the other hand, for instance, P((X1, X2, X3) = (0, 0, 0)) = 1/4 ≠ P(X1 = 0)P(X2 = 0)P(X3 = 0) = 1/8.
5.5.11. 1. fX(x) = 1 if 0 < x < 1 and 0 otherwise, and fY(y) = 1 if 0 < y < 1 and 0 otherwise,


and so fX(z/y) = 1 if 0 < z < y and 0 otherwise, and, from Part 1 of Theorem 5.5.10, fZ(z) = ∫ over (0,1)∩(z,∞) of (1/y) dy = ∫ from z to 1 of (1/y) dy = −ln z if 0 < z < 1, and fZ(z) = 0 otherwise.
2. Now, fX(zy) = 1 if 0 < y < 1/z and 0 otherwise, and, from Part 2 of Theorem 5.5.10, fZ(z) = ∫ over (0,1)∩(0,1/z) of y dy = 1/2 if 0 < z < 1; 1/(2z²) if z ≥ 1; 0 if z ≤ 0.
5.5.12. P(2 heads in the first four tosses and five heads altogether) = P(2 heads in the first four tosses and three heads in the last six) = C(4,2)(1/2)⁴ · C(6,3)(1/2)⁶ = 15/128.

5.5.13. 1. From Definition 5.2.3, P(T > 200) = 1 − F(200) = e^(−200/100) ≈ 0.135.
2. P(T < 40) = F(40) = 1 − e^(−40/100) ≈ 0.330.
3. P(max Ti > 200) = 1 − P(max Ti ≤ 200) = 1 − P(all Ti ≤ 200) = 1 − (1 − e^(−200/100))¹⁰ ≈ 0.76640.
4. P(min Ti < 40) = 1 − P(min Ti ≥ 40) = 1 − P(all Ti ≥ 40) = 1 − (e^(−40/100))¹⁰ ≈ 0.98168.
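These four exponential-lifetime probabilities evaluate directly (a sketch, assuming T ~ Exp with mean 100 and ten independent copies, as in the exercise):

```python
from math import exp

p1 = exp(-200 / 100)                  # P(T > 200)
p2 = 1 - exp(-40 / 100)               # P(T < 40)
p3 = 1 - (1 - exp(-200 / 100)) ** 10  # P(max T_i > 200)
p4 = 1 - exp(-40 / 100) ** 10         # P(min T_i < 40) = 1 - e^(-4)
```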

5.5.14. FY,Z(y, z) = P(Y ≤ y, Z ≤ z) = P(Y ≤ y) − P(Y ≤ y, Z > z) = P(X1 ≤ y, X2 ≤ y, ..., Xn ≤ y) − P(z < X1 ≤ y, z < X2 ≤ y, ..., z < Xn ≤ y) = [FX(y)]ⁿ − [FX(y) − FX(z)]ⁿ if z < y, and [FX(y)]ⁿ otherwise.
5.5.15. By Definition 5.2.3, fTi(t) = 0 if t < 0 and λe^(−λt) if t ≥ 0, for i = 1, 2, and from Equation 5.128, fS(s) = ∫₀ˢ fT1(t) fT2(s − t) dt = ∫₀ˢ λe^(−λt) λe^(−λ(s−t)) dt = λ²e^(−λs) ∫₀ˢ 1 dt = λ²se^(−λs) if s ≥ 0 and, clearly, fS(s) = 0 if s < 0.
5.5.16. Substituting the densities from Definition 5.2.3 into Equation 5.128, we get fS(s) = ∫₀ˢ λe^(−λt) μe^(−μ(s−t)) dt = λμe^(−μs) ∫₀ˢ e^((μ−λ)t) dt = [λμe^(−μs)/(μ − λ)]·[e^((μ−λ)s) − 1] = [λμ/(μ − λ)]·[e^(−λs) − e^(−μs)] if s ≥ 0 and, clearly, fS(s) = 0 if s < 0.

5.5.17. By Definition 5.2.3, fTi(t) = 0 if t < 0 and λe^(−λt) if t ≥ 0, for i = 1, 2. If X = −T2, then FX(x) = P(−T2 ≤ x) = P(T2 ≥ −x) = 1 − FT2(−x). Hence, fX(x) = FX′(x) = FT2′(−x) = fT2(−x) = λe^(λx) if x ≤ 0, and 0 if x > 0. Since Z = T1 − T2 = X + T1, we get, by Equation 5.124, for z < 0, fZ(z) = ∫ fX(x) fT1(z − x) dx = ∫ from −∞ to z of λe^(λx) λe^(−λ(z−x)) dx = e^(−λz) ∫ from −∞ to z of λ²e^(2λx) dx = e^(−λz)·(λ/2)e^(2λz) = (λ/2)e^(λz). If z > 0, then fZ(z) = ∫ from −∞ to 0 of λe^(λx) λe^(−λ(z−x)) dx = e^(−λz) ∫ from −∞ to 0 of λ²e^(2λx) dx = (λ/2)e^(−λz).

5.5.18. 1. As in Example 5.1.9, f(z − x) = 1 if 0 ≤ z − x ≤ 1, or equivalently, if z − 1 ≤ x ≤ z, and it is 0 otherwise. Thus the convolution formula gives fn+1(z) = ∫ fn(x) f(z − x) dx = ∫ from z−1 to z of fn(x) dx.
2. From Example 5.1.9, f2(x) = 0 if x < 0; x if 0 ≤ x < 1; 2 − x if 1 ≤ x < 2; 0 if 2 ≤ x.


So, using the result of Part 1 above with n = 2, we have for 0 ≤ z < 1, f3(z) = ∫₀ᶻ x dx = z²/2. For 1 ≤ z < 2, we obtain f3(z) = ∫ from z−1 to 1 of x dx + ∫ from 1 to z of (2 − x) dx = −z² + 3z − 3/2. For 2 ≤ z < 3, we obtain f3(z) = ∫ from z−1 to 2 of (2 − x) dx = z²/2 − 3z + 9/2. Otherwise f3(z) = 0. Thus f3(z) = 0 if z < 0; z²/2 if 0 ≤ z < 1; −z² + 3z − 3/2 if 1 ≤ z < 2; z²/2 − 3z + 9/2 if 2 ≤ z < 3; 0 if 3 ≤ z, graphed in Figure 5.37.

Fig. 5.37. y = f3(z)

Notice the resemblance to a normal curve (see Section 7.2).
5.5.19. The common density of X and Y is f(x) = 1 if 0 ≤ x ≤ 1 and 0 otherwise. Let U = −Y. Then FU(u) = P(−Y ≤ u) = P(Y ≥ −u) = 1 − FY(−u), and fU(u) = FU′(u) = fY(−u) = f(−u) = 1 if −1 ≤ u ≤ 0 and 0 otherwise. Thus, by Equation 5.128, fZ(z) = ∫ over [−1,0]∩[z−1,z] of 1 dx = 0 if z < −1; z + 1 if −1 ≤ z < 0; 1 − z if 0 ≤ z < 1; 0 if 1 ≤ z.

5.6.1. The joint distribution of X and Y is trinomial (with the third possibility being that we get any number other than 1 or 6) and so, fX,Y(i, j) = P(X = i, Y = j) = [4!/(i! j! k!)] (1/6)^i (1/6)^j (4/6)^k for i, j, k = 0, 1, . . . , 4, i + j + k = 4. The table below shows the values of this function:

i\j     0         1        2       3      4       fX(i)
0       16/81     16/81    6/81    1/81   1/1296  625/1296
1       16/81     12/81    3/81    1/324  0       125/324
2       6/81      3/81     1/216   0      0       25/216
3       1/81      1/324    0       0      0       5/324
4       1/1296    0        0       0      0       1/1296
fY(j)   625/1296  125/324  25/216  5/324  1/1296  1

Now, the conditional p.f. is given by fX|Y(i|j) = fX,Y(i, j)/fY(j), and so its values are

i\j  0        1       2      3    4
0    256/625  64/125  16/25  4/5  1
1    256/625  48/125  8/25   1/5  0
2    96/625   12/125  1/25   0    0
3    16/625   1/125   0      0    0
4    1/625    0       0      0    0

5.6.2. 1. First, we tabulate the possible values of Z = X + Y:

y\x  1  2  3  4   5   6
1    2  3  4  5   6   7
2    3  4  5  6   7   8
3    4  5  6  7   8   9
4    5  6  7  8   9   10
5    6  7  8  9   10  11
6    7  8  9  10  11  12

From here, we can read off the joint p.f.: fX,Z(x, z) = 1/36 if z − x ∈ {1, 2, . . . , 6} and 0 otherwise, with marginal fZ(z) = 1/36, 2/36, 3/36, 4/36, 5/36, 6/36, 5/36, 4/36, 3/36, 2/36, 1/36 for z = 2, 3, . . . , 12. Now, fX|Z(x|z) = fX,Z(x, z)/fZ(z), and so its table is

z\x  1    2    3    4    5    6
2    1    0    0    0    0    0
3    1/2  1/2  0    0    0    0
4    1/3  1/3  1/3  0    0    0
5    1/4  1/4  1/4  1/4  0    0
6    1/5  1/5  1/5  1/5  1/5  0
7    1/6  1/6  1/6  1/6  1/6  1/6
8    0    1/5  1/5  1/5  1/5  1/5
9    0    0    1/4  1/4  1/4  1/4
10   0    0    0    1/3  1/3  1/3
11   0    0    0    0    1/2  1/2
12   0    0    0    0    0    1

2. Clearly, f(X,Y)|Z(x, y|2) = 1 if x = y = 1 and 0 otherwise, fX|Z(x|2) = 1 if x = 1 and 0 otherwise, and fY|Z(y|2) = 1 if y = 1 and 0 otherwise.

Thus, f(X,Y )|Z (x, y|2) = fX|Z (x|2) fY |Z (y|2) , which means that X and Y are independent under the condition Z = 2.


3. Now, f(X,Y)|Z(1, 2|3) = 1/2 and fX|Z(1|3) = 1/2 and fY|Z(2|3) = 1/2. Thus, f(X,Y)|Z(x, y|3) = fX|Z(x|3) fY|Z(y|3) does not hold for all (x, y), which means that X and Y are not independent under the condition Z = 3.
5.6.3. By Example 5.5.4, P(A|X = x) = P(1/2 < Y < x) + P(x − 1/2 < Y < 1/2), and so FX|A(x) = P(X ≤ x, Z ≤ 1)/(1/2) = 0 if x < 0; 1 − (1 − x)² if 0 ≤ x < 1; 1 if x ≥ 1. Hence, by differentiation, fX|A(x) = 2(1 − x) if 0 ≤ x < 1 and 0 otherwise.

Fig. 5.42.
6. Expectation, Variance, Moments

6.1.1. 9

E (X) = i=1

4 16 85 i + 10 · = ≈ 6.54. 52 52 13

6.1.2. k indistinguishable balls can go into n distinguishable boxes in n−1+k = k n−1+k 6 ways. In our case, n = 3 and k = 4, and so there are 2 = 15 n−1 ways. if the first box has 0 balls, then the second box can get 0,1,2,3, or 5 . Similarly, 4 balls, and the third box gets the rest. Thus, P(X = 0) = 15 3 2 1 4 . Hence, P(X = 1) = 15 , P(X = 2) = 15 , P(X = 3) = 15 , and P(X = 4) = 15 5 4 3 2 1 4 E(X) = 0 · 15 + 1 · 15 + 2 · 15 + 3 · 15 + 4 · 15 = 20 = . 15 3 6.1.3. ∞

E (T ) = 0

t · λ2 te−λt dt.

Substituting s = λt, we get E (T ) =

1 λ



s2 e−s ds,

0

and integration by parts, with u = s2 , dv = e−s ds, gives E (T ) =

1 λ

s2 e−s

∞ 0



+

2se−s ds =

0



2 λ

0 −s

A second integration by parts, with u = s, dv = e E (T ) =

2 λ

se−s

∞ 0



+

e−s ds =

0

6.1.4.

E (X) = 35 ·

1 37 + (−1) · ≈ −0.05. 38 38

2 . λ

se−s ds.

ds, results in

80

6. Expectation, Variance, Moments

6.1.5. From the hint, n−1

g ′ (x) =

ixi−1 =

(x − 1) nxn−1 − 1 − (xn − x) · 1

i2i−1 =

1 · n2n−1 − 1 − (2n − 2) · 1 12

(x − 1)2

i=1

and n−1

g ′ (2) = i=1

= n2n−1 − 1 − 2 2n−1 − 1 = (n − 2) 2n−1 + 1. n−1

Furthermore, by the geometric series formula, i=1 2i−1 = 2n−1 −1. Adding the two sums and their computed values, we obtain n−1 i=1

(i + 1) 2i−1 = (n − 2) 2n−1 + 1 + 2n−1 − 1 = n2n−1 − 2n + 2n−1 = (n + 1) 2n−1 − 2n ,

from which Equation 6.20 follows at once by rearrangement. 6.1.6. 1. 2.

∞ ∞ 1 1 1 1 π π = 1. −∞ π 1+x2 dx = π arctan x −∞ = π 2 − − 2 ∞ ∞ ∞ ∞ 2 x 1 du 1 1 1 dx = dx = = |x| π 1+x2 π 0 1+x2 π 1 u π ln u 1 −∞

= ∞.

6.1.7. The distribution of a discrete X is symmetric about a number α if all possible values of X are paired so that for each xi < α there is a possible value xj > α, and vice versa, such that α − xi = xj − α and f (xi ) = f (xj ) . For such X, E (X) =

xi f (xi ) =

xi f (xi ) + αf (α) + xi α

(Here f (α) = 0, if α is not a possible value of X.) In the last term, we apply the symmetry conditions: xj f (xj ) = xj >α

xi 0, hence q < 1, and so we must have √ √ SD (X) < 5. On the other hand, SD (X) gets arbitrarily close to 5 for sufficiently large n.

36 . λ4

6. Expectation, Variance, Moments

87

6.2.9. X and Y are both binomial with p = 12 . Thus, E (X) = E (Y ) = np = 2 n 2 = E X 2 = V ar (X) + [E (X)]2 = npq + (np)2 = n+n 2 . Now, E Y 4 . Furthermore, E (X + Y )2

= E X 2 + 2E (XY ) + E Y 2 = 2 · 2

2E (XY ) . On the other hand, X + Y = n and so E (X + Y ) 2

2

n+n2 4

+

= n2 .

2

Hence, E (XY ) = n2 − n+n = n 4−n . (Alternatively, Y = n − X, E (XY ) = 4 2 2 E (X (n − X)) = E (nX)− E X 2 = n · n2 − n+n = n 4−n .) Thus, E (XY ) = 4 E (X) E (Y ) .

E

6.3.1. " = X − µX and Y" = Y − µY . Then m3 (X + Y ) = Let us write X " + Y" X

3

" 3 + 3E X " 2 E Y" + 3E X " E Y" 2 + E Y" 3 , =E X

where we used the independence of X and Y, from which the independence " 2 and Y" and that of X " and Y" 2 follows. Now, E X " = E Y" = 0, and of X so the preceding equation reduces to m3 (X + Y ) = m3 (X) + m3 (Y ) . 6.3.2.

" = X−µX and Y" = Y −µY . Then m4 (X + Y ) = E Write X

" + Y" X

4

" 4 + 4E X " 3 E Y" + 6E X " 2 E Y" 2 + 4E X " E Y" 3 + E Y" 4 . E X

" = E Y" = 0, and so the second and fourth terms in the expanNow, E X sion are 0, but the middle term equals 6V ar(X)V ar(Y ), which is generally not 0. 6.3.3. ψY (t) = E et(aX+b) = E eatX ebt = ψX (at) ebt . 6.3.4.

ψ (t) =

pet −pet (1 + qet ) pet , ψ′ (t) = , ψ′′ (t) = . 2 3 t t 1 − qe (qe − 1) (qet − 1)

Hence, E X 2 = ψ′′ (0) =

−p (1 + q) 3

(q − 1)

=

1+q , p2

and V ar(X) = E X 2 − [E (X)]2 =

1+q 1 q − 2 = 2. p2 p p

=

88

6. Expectation, Variance, Moments

6.3.5. ψX−µ (t) = E et(X−µ) and so, by the result of Exercise 6.3.3, ψX−µ (t) = ψX (t) e−µt = pet + q

n −npt

e

= peqt + qe−pt

n

. Now, V ar (X) is the second moment of X − µ. From the expression above, ′ ψX−µ (t) = peqt + qe−pt

n−1

npq eqt − e−pt

and n−2

′′ (t) = npq[ peqt + qe−pt ψX−µ

+ peqt + qe−pt

n−2

+ 2 peqt + qe−pt + peqt + qe−pt

nqpe−2pt − peqt + qe−pt

n−2

n−1

npqe2qt − 2 peqt + qe−pt

n−2

pqeqt−pt − peqt + qe−pt

qeqt + peqt + qe−pt

n−1

n−2

npqeqt−pt

pqe2qt

n−2

qpe−2pt

pe−pt ]

Hence, by Theorem 6.3.1, ′′ V ar (X) = ψX−µ (0) = npq.

6.3.6. n

ψ (t) = E etX =

1 tx et ent − 1 e = . n n et − 1 x=1

G (s) = E sX =

s sn − 1 . n s−1

6.3.7.

ψZ (t) = E et(X−Y ) = E etX e−tY = E etX E e−tY = ψ (t) ψ (−t) . 6.3.8. 1. ψ (t) = =

1 2 1 2

∞ −∞ ∞

etx e−|x| dx = etx e−x dx +

0 ∞

1 2

1 2

∞ 0 0 −∞

etx e−x dx +

1 2

0 −∞

etx ex dx 0

1 1 (t−1)x 1 1 (t+1)x · e + · e 2 t−1 2 t + 1 x=0 x=−∞ 1 1 1 1 1 = · + · = if |t| < 1. 2 1−t 2 1+t 1 − t2 =

etx ex dx

6. Expectation, Variance, Moments

89

2. 1 = 1 + t2 + t4 + · · · , 1 − t2 and so, E X 2k = 1 and E X 2k−1 = 0 for k = 1, 2, 3, . . . . 6.3.9. n X

G (s) = E s

= x=0

n x n−x x p q s = (ps + q)n x

. 6.3.10.

X

G (s) = E s



=

pq

x−1 x

s =

x=1



p (qs)x q −1 =

x=1

ps . 1 − qs

6.3.11. Let X1 , X2 , X3 denote the points showing on the three dice, respectively and let S = X1 + X2 + X3 . Then, by Exercise 6.3.6, 6

GXi (s) =

1 x s s6 − 1 s = 6 6 s−1 x=1

for each i, and 6

GS (s) = =

1 x s 6 x=1

3

=

1 3 s 1 + s + s2 + s3 + s4 + s5 216

3

1 3 s 1 + 3s + 6s2 + · · · . 216

The probabilities are the coefficients of sk , and so, p3 = 6 1 and p5 = 216 = 36 . 6.4.1.

1 216 ,

p4 =

3 216

=

1 72 ,

V ar(X + Y ) = E (X + Y − (µX + µY ))2 = E ((X − µX ) + (Y − µY ))2 = E (X − µX )2 + 2 (X − µX ) (Y − µY ) + (Y − µY )2 = E (X − µX )2 + 2E ((X − µX ) (Y − µY )) + E (Y − µY )2 = V ar(X) + 2Cov (X, Y ) + V ar(Y ).

90

6. Expectation, Variance, Moments

6.4.2. 1

E X2 =

1−y

0

0

1

E Y2 =

1 2x2 dxdy = , 6

1−y

0

1 , 6

2y 2 dxdy =

0 1

1−y

E (XY ) =

2xydxdy = 0

0

1

1−y

1 2xdxdy = , 3

1−y

1 2ydxdy = , 3

E (X) = 0

0

1

E (Y ) = 0

0

1 , 12

and so, 2 σX = σY2 =

1 3

2

1 − 12

1 3

1 − 6

Cov (X, Y ) =

=

1 , 18

2

=−

1 36

and ρ (X, Y ) =

Cov (X, Y ) 1 =− . σX σY 2

6.4.3. " = X − µX and Y" = Y − µY . Then, 1. Let us write X Cov (U, V ) = Cov (X + Y, X − Y ) = E

" + Y" X

" − Y" X

" 2 − Y" 2 = E X " 2 − E Y" 2 = 0. =E X

2. For instance, P(V = 0|U = 2) = 1, but P(V = 0) = Theorem 5.6.3, U and V are not independent. 6.4.4. E X

2

= −1

E Y

2

√ 1−x2

1 0

√ 1−x2

1

= −1

0

2 2 1 x dydx = , π 4 2 2 1 y dydx = , π 4

6 36

= 1, and so, by

6. Expectation, Variance, Moments √ 1−x2

1

E (XY ) = −1

0 √ 1−x2

1

1

0 √ 1−x2

E (Y ) = −1

0

2 xydydx = 0, π

2 xdydx = 0, π

E (X) = −1

91

2 4 ydydx = . π 3π

Hence, 1 2 σX = σY2 = , Cov (X, Y ) = ρ (X, Y ) = 0. 4 6.4.5. 1. With µX =

m i=1 pi xi m

and µY =

n j=1 pj yj ,

n

Cov (X, Y ) = i=1 j=1

pij (xi − µX ) (yj − µY )

or, alternatively, m

n

Cov (X, Y ) = i=1 j=1

m

pij xi yj −µX µY =

n

i=1 j=1

m

pij xi yj −

n

pi xi i=1

pj yj . j=1

2. ρ (X, Y ) =

m i=1 m 2 i=1 pi xi

n j=1 pij xi yj

m i=1 pi xi



2 m i=1 pi xi )

−(

n 2 j=1 pj yj

n j=1 pj yj



n j=1 pj yj

6.4.6. ∂ ∂a

m i=1

m

(axi + b − yi )2 =

i=1

2 (axi + b − yi ) xi m

m

x2i + 2b

= 2a i=1

i=1

m

xi − 2

xi yi = 0, i=1

and ∂ ∂b

m i=1

m

(axi + b − yi )2 =

i=1

2 (axi + b − yi ) m

= 2a i=1

m

xi + 2mb − 2

yi = 0. i=1

2

.

92

6. Expectation, Variance, Moments

Now, µX = 2 σX =

1 m 1 m

m

xi , µY = i=1 m i=1

1 m

m

yi , i=1

x2i − µ2X , σY2 =

1 m

m i=1

yi2 − µ2Y ,

and Cov (X, Y ) =

1 m

m i=1

xi yi − µX µY .

Substituting these values for the sums changes the first two equations to 2 a σX + µ2X + bµX − Cov (X, Y ) − µX µY = 0

and b + aµX − µY = 0. Multiplying the last equation by µX and subtracting the result from the ) 2 previous equation, we get aσX − Cov (X, Y ) = 0, and so, a = Cov(X,Y = 2 σX σY ρ σX . Substituting from here into the second equation above, results in b = Y µY − ρ σσX µX . Thus, y = ax + b becomes y=ρ

σY σY x + µY − ρ µX σX σX

or, equivalently, y=ρ

σY (x − µX ) + µY . σX

6.4.7.

Cov (U, V ) = Cov (aX + b, cY + d) = E ((aX + b − (aµX + b)) (cY + d − (cµY + d))) = E (ac (X − µX ) (Y − µY )) = acCov (X, Y ) , σU = |a|σX , and σV = |c|σY. Thus, ρ (U, V ) =

Cov (U, V ) acCov (X, Y ) = = sign (ac) ρ (X, Y ) . σU σV |a|σX |c|σX.

6. Expectation, Variance, Moments

93

6.4.8. Cov (U, V ) = Cov (aX + bY, cX + dY ) = E ((aX + bY − (aµX + bµY )) (cX + dY − (cµX + dµY ))) = E ((a (X − µX ) + b (Y − µY )) (c (X − µX ) + d (Y − µY )))

= acE (X − µX )2 + (ad + bc) E ((X − µX ) (Y − µY )) + bdE (Y − µY )2 = acV ar(X) + (ad + bc) Cov (X, Y ) + bdV ar(Y )

. 6.4.9. By the result of Exercise 6.4.1, V ar (X − 3Y ) = V ar (X) + 2Cov (X, −3Y ) + V ar (−3Y ) and, by Theorem 6.2.2 and by the obvious property Cov (X, aY ) = aCov (X, Y ) , we get V ar (X − 3Y ) = V ar (X) − 6Cov (X, Y ) + 9V ar (Y ) . Now, Cov (X, Y ) = ρV ar (X) V ar (Y ) =

1 · 4 · 1 = 2, 2

and so, V ar (X − 3Y ) = 4 − 6 · 2 + 9 = 1. 6.4.10.  Cov 

m

i=1

n

ai Xi , j=1





bj Yj  = E  

=E m

m i=1 m

n

ai (Xi − µXi )

i=1 j=1 n

= i=1 j=1 m n

=

j=1

n

bj Yj − µYj  

ai bj (Xi − µXi ) Yj − µYj 

ai bj E (Xi − µXi ) Yj − µYj ai bj Cov (Xi , Yj ) .

i=1 j=1



94

6. Expectation, Variance, Moments

6.4.11. Student A B C D E Ave.

X 40 60 80 90 90 72

Y 50 55 75 80 90 70

X2 1600 3600 6400 8100 8100 5560

Y2 2500 3025 5625 6400 8100 5130

XY 2000 3300 6000 7200 8100 5320

Hence µX = 72, µY = 70, σX =

# 5560 − 722 ≈ 19.39,

# 5320 − 70 · 72 5130 − 702 ≈ 15.166, and ρ ≈ ≈ 0.952. 19.39 · 15.166 Thus, the equation of the least squares line is σY =

y = 0.952 ·

15.166 (x − 72) + 70 19.39

or y = 0.745x + 16.383 and the plot is given in Figure 6.1.

100 90 80 70 y 60 50 40 30

Fig. 6.1.

40

50

60 x 70

80

90

100

6. Expectation, Variance, Moments

95

6.5.1. By Definition 6.5.1, for discrete X and Y, Ey (X) =

xfX|Y (x|y) x:fX|Y (x|y)>0

and, by Theorem 6.1.3 with g (Y ) = EY (X), E (EY (X)) =

fY (y) Ey (X) . y

Thus, E (EY (X)) =

xfY (y) fX|Y (x|y) y x:fX|Y (x|y)>0

=

f (x, y) =

x x

y

xfX (x) = E (X) . x

6.5.2. In Exercise 5.4.2 we obtained the joint probability function fU,V (u, v) of U and V as given by the table u\v 1 2 3 4 5 6 fV (v)

1 1/36 2/36 2/36 2/36 2/36 2/36 11/36

2 0 1/36 2/36 2/36 2/36 2/36 9/36

3 0 0 1/36 2/36 2/36 2/36 7/36

4 0 0 0 1/36 2/36 2/36 5/36

5 0 0 0 0 1/36 2/36 3/36

6 0 0 0 0 0 1/36 1/36

fU (u) 1/36 3/36 5/36 7/36 9/36 11/36 1

Thus, the conditional probability function fU|V (u, v) is given by the table u\v 1 2 3 4 5 6

1 1/11 2/11 2/11 2/11 2/11 2/11

2 0 1/9 2/9 2/9 2/9 2/9

3 0 0 1/7 2/7 2/7 2/7

4 0 0 0 1/5 2/5 2/5

5 0 0 0 0 1/3 2/3

6 0 0 0 0 0 1

and the conditional probability function fV |U (u, v) is given by the table

96

6. Expectation, Variance, Moments u\v 1 2 3 4 5 6

1 1 2/3 2/5 2/7 2/9 2/11

Thus, Ev (U) = are given by

u

1 41/11

v Ev (U)

2 0 1/3 2/5 2/7 2/9 2/11

3 0 0 1/5 2/7 2/9 2/11

4 0 0 0 1/7 2/9 2/11

5 0 0 0 0 1/9 2/11

ufU|V (u, v) and Eu (V ) = 2 38/9

3 33/7

4 26/5

v

5 17/3

6 0 0 0 0 0 1/11

vfV |U (u, v), and they 6 6

and 1 1

u Eu (V )

2 4/3

3 9/5

4 16/7

5 25/9

6 36/11

Hence, E (EV (U )) =

Ev (U ) fV (v) v

=

41 11 38 9 33 7 26 5 17 3 1 161 · + · + · + · + · +6· = 11 36 9 36 7 36 5 36 3 36 36 36

and E (U ) = u

ufU (u) = 1 ·

3 5 7 9 11 161 1 +2· +3· +4· +5· +6· = 36 36 36 36 36 36 36

is the same. Similarly, E (EU (V )) =

Eu (V ) fU (u) u

=1·

4 3 9 5 16 7 25 9 36 11 91 1 + · + · + · + · + · = 36 3 36 5 36 7 36 9 36 11 36 36

and E (V ) = v

vfV (v) = 1 ·

11 9 7 5 3 1 91 +2· +3· +4· +5· +6· = , 36 36 36 36 36 36 36

the same. 6.5.3. Clearly, EH (X) = 1 and ET (X) = 27 . Thus, E (X) =

1 2

·1+

1 2

·

7 2

= 94 .

6. Expectation, Variance, Moments

97

6.5.4. We can tabulate fY |X (x, y) as follows: x\y 1 2 3 4

1

2

3

4

1 4

1 4 1 42

1 4 2 42 1 43

1 4 3 42 3 43 1 44

0 0 0

0 0

0

5 0

6 0

7 0

8 0

4 42 6 43 4 44

3 42 10 43 10 44

2 42 12 43 20 44

1 42 12 43 31 44

9 0 0

10 0 0

11 0 0

12 0 0

10 43 40 44

6 43 44 44

3 43 40 44

1 43 31 44

Thus, EX (Y ) has the following values: x Ex (Y )

1 10/4

2 80/42

3 480/43

4 2560/44

and E (Y ) = E (EX (Y )) =

1 · 4

10 80 480 2560 + 2 + 3 + 4 4 4 4 4

6.5.5. Here, f (x, y) =

2 if 0 < x, 0 < y, x + y < 1 0 otherwise.

Hence, for 0 < x < 1, fX (x) =



1−x

2dy = 2 (1 − x) ,

f(x, y)dy = 0

−∞

and fX (x) = 0 otherwise. Similarly, for 0 < y < 1, fY (y) =



1−y

f (x, y)dx =

−∞

0

2dx = 2 (1 − y) ,

and fY (y) = 0 otherwise. From Equation 6.5.10, ∞

Ey (X) =

xfX|Y (x, y) dx =

−∞

∞ −∞

x

f (x, y) dx, fY (y)

and so, 1−y

Ey (X) = 0

2x 1−y dx = if 0 < y < 1 2 (1 − y) 2

and Ey (X) = 0 otherwise. Similarly,

=

25 . 4

13 0 0 3

14 0 0 4

15 0 0 3

16 0 0 4

20 44

10 44

4 44

1 44

98

6. Expectation, Variance, Moments

Ex (Y ) =

1−x if 0 < x < 1 2

and Ex (Y ) = 0 otherwise. Now, by Theorem 6.5.1, 1

E (X) = E (EY (X)) =

1

Ey (X) fY (y)dy = 0

0

1−y 1 · 2 (1 − y) dy = . 2 3

We can also compute E (X) directly as 1

1−y

E (X) = 0

0

1 2xdxdy = . 3

By symmetry, E (Y ) = 31 , too. 6.5.6. Here, f (x, y) =

2 if 0 < x < y < 1 0 otherwise.

Hence, for 0 < x < 1,
fX(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_x^1 2 dy = 2(1 − x),
and fX(x) = 0 otherwise. Similarly, for 0 < y < 1,
fY(y) = ∫_{−∞}^{∞} f(x, y) dx = ∫_0^y 2 dx = 2y,
and fY(y) = 0 otherwise. From Equation 6.175,
Ey(X) = ∫_{−∞}^{∞} x fX|Y(x, y) dx = ∫_{−∞}^{∞} x [f(x, y)/fY(y)] dx,
and so
Ey(X) = ∫_0^y (2x/2y) dx = y/2 if 0 < y < 1,
and Ey(X) = 0 otherwise. Similarly,
Ex(Y) = ∫_{−∞}^{∞} y [f(x, y)/fX(x)] dy = ∫_x^1 [2y/(2(1 − x))] dy = (x + 1)/2 if 0 < x < 1,
and Ex(Y) = 0 otherwise. Now, by Theorem 6.5.1,
E(X) = E(EY(X)) = ∫_0^1 Ey(X) fY(y) dy = ∫_0^1 (y/2) · 2y dy = 1/3.
We can also compute E(X) directly as
E(X) = ∫_0^1 ∫_0^y 2x dx dy = 1/3.
Similarly,
E(Y) = ∫_0^1 Ex(Y) fX(x) dx = ∫_0^1 [(x + 1)/2] · 2(1 − x) dx = 2/3.
Alternatively,
E(Y) = ∫_0^1 ∫_0^y 2y dx dy = 2/3.
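The integrals in 6.5.6 are easy to sanity-check numerically. A minimal midpoint-rule sketch over the triangular support 0 < x < y < 1:

```python
# Midpoint-rule check of 6.5.6: density f(x,y) = 2 on 0 < x < y < 1,
# expected to give total mass 1, E(X) = 1/3 and E(Y) = 2/3.
n = 400
h = 1.0 / n
EX = EY = mass = 0.0
for i in range(n):
    x = (i + 0.5) * h
    for j in range(n):
        y = (j + 0.5) * h
        if x < y:                 # support of the density
            w = 2 * h * h         # f(x,y) times the cell area
            mass += w
            EX += x * w
            EY += y * w
print(mass, EX, EY)
```

The diagonal boundary introduces an O(1/n) error, so the results agree with 1, 1/3, and 2/3 only up to a small discretization error.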

6.5.7. From the solution of Exercise 5.6.7,
fX|Z(x|z) = 1/z if 0 < x ≤ z < 1, 1/(2 − z) if 1 ≤ z < 1 + x < 2, and 0 otherwise,
and
fZ|X(z|x) = 1 if 0 < x ≤ z < 1 + x < 2, and 0 otherwise.
Thus,
Ez(X) = ∫_0^z (x/z) dx = z/2 if 0 < z < 1, ∫_{z−1}^1 [x/(2 − z)] dx = z/2 if 1 ≤ z < 2, and 0 otherwise,
and
Ex(Z) = ∫_x^{1+x} z dz = x + 1/2 if 0 < x < 1, and 0 otherwise.

6.5.8. For continuous (X, Y),
Ey(EY(X)) = ∫_{−∞}^{∞} Ey(X) fX|Y(x, y) dx.
Now, Ey(X) is a function of y only, and so it can be taken out from the integral. Thus,
∫_{−∞}^{∞} Ey(X) fX|Y(x, y) dx = Ey(X) ∫_{−∞}^{∞} fX|Y(x, y) dx = Ey(X).
For discrete (X, Y),
Ey(EY(X)) = Σx Ey(X) fX|Y(x, y) = Ey(X) Σx fX|Y(x, y) = Ey(X).

In both cases, replacing y by Y yields the required result.

6.5.9. By Definition 6.5.1 and Theorem 6.1.4,
Ey(g(X, Y)) = ∫_{−∞}^{∞} g(x, y) fX|Y(x, y) dx = ∫_{−∞}^{∞} g(x, y) [f(x, y)/fY(y)] dx
and
E(EY(g(X, Y))) = ∫_{−∞}^{∞} Ey(g(X, Y)) fY(y) dy
= ∫_{−∞}^{∞} [∫_{−∞}^{∞} g(x, y) (f(x, y)/fY(y)) dx] fY(y) dy
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f(x, y) dx dy = E(g(X, Y)).

6.5.10. By the result of Exercise 6.5.9 (which is valid for arbitrary (X, Y)) and by Lemma 6.5.1,
Cov(X, Y) = E((X − µX)(Y − µY)) = E(EY((X − µX)(Y − µY))) = E((Y − µY) EY(X − µX)) = E((Y − µY)(c − µX)) = (c − µX) E(Y − µY) = 0.

6.5.11. From Theorem 6.5.1,
E(X) = E(EY(X)) = E(c) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} c f(x, y) dx dy = c.
Furthermore,
Vary(X) = Ey([X − Ey(X)]²) = Ey((X − c)²) = ∫_{−∞}^{∞} (x − c)² [f(x, y)/fY(y)] dx.
Now,
E(VarY(X)) = ∫_{−∞}^{∞} Vary(X) fY(y) dy
= ∫_{−∞}^{∞} [∫_{−∞}^{∞} (x − c)² (f(x, y)/fY(y)) dx] fY(y) dy
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − c)² f(x, y) dx dy = Var(X).


6.5.12. By definition,
Varx(Y) = Ex([Y − Ex(Y)]²) = Σ_{y} (y − Ex(Y))² fY|X(x, y),
with the conditional probabilities fY|X and conditional means Ex(Y) from the solution of Exercise 6.5.4. Thus, for x = 1,
Varx(Y) = (1/4)[(1 − 10/4)² + (2 − 10/4)² + (3 − 10/4)² + (4 − 10/4)²] = 5/4,
for x = 2, with Ex(Y) = 80/4² = 5,
Varx(Y) = (1/4²)[(2 − 5)² + 2(3 − 5)² + 3(4 − 5)² + 4(5 − 5)² + 3(6 − 5)² + 2(7 − 5)² + (8 − 5)²] = 5/2,
for x = 3, with Ex(Y) = 480/4³ = 15/2,
Varx(Y) = (1/4³)[(3 − 15/2)² + 3(4 − 15/2)² + 6(5 − 15/2)² + 10(6 − 15/2)² + 12(7 − 15/2)² + 12(8 − 15/2)² + 10(9 − 15/2)² + 6(10 − 15/2)² + 3(11 − 15/2)² + (12 − 15/2)²] = 15/4,
and for x = 4, with Ex(Y) = 2560/4⁴ = 10,
Varx(Y) = (1/4⁴)[(4 − 10)² + 4(5 − 10)² + 10(6 − 10)² + 20(7 − 10)² + 31(8 − 10)² + 40(9 − 10)² + 44(10 − 10)² + 40(11 − 10)² + 31(12 − 10)² + 20(13 − 10)² + 10(14 − 10)² + 4(15 − 10)² + (16 − 10)²] = 5.
Hence,
E(VarX(Y)) = (1/4)(5/4 + 5/2 + 15/4 + 5) = 25/8,
and, by the decomposition proved in Exercise 6.5.14,
Var(Y) = E(VarX(Y)) + Var(EX(Y)) = 25/8 + Var(5X/2) = 25/8 + (25/4)(5/4) = 175/16.
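The conditional variances above can be double-checked by exact enumeration, assuming the dice-sum reading of Exercise 6.5.4 (X uniform on {1, 2, 3, 4} and, given X = x, Y the sum of x rolls of a fair four-sided die):

```python
# Exact-arithmetic check of 6.5.12 under the dice-sum reading of 6.5.4.
from fractions import Fraction
from itertools import product

def cond_dist(x):
    # conditional p.f. of Y given X = x: sum of x fair four-sided dice
    dist = {}
    p = Fraction(1, 4 ** x)
    for rolls in product(range(1, 5), repeat=x):
        s = sum(rolls)
        dist[s] = dist.get(s, 0) + p
    return dist

cond_vars = []
for x in range(1, 5):
    d = cond_dist(x)
    m = sum(y * p for y, p in d.items())
    cond_vars.append(sum((y - m) ** 2 * p for y, p in d.items()))
E_condvar = sum(cond_vars) / 4

# total variance of Y from the mixture distribution
joint = {}
for x in range(1, 5):
    for y, p in cond_dist(x).items():
        joint[y] = joint.get(y, 0) + p * Fraction(1, 4)
EY = sum(y * p for y, p in joint.items())
VarY = sum((y - EY) ** 2 * p for y, p in joint.items())
print(cond_vars, E_condvar, VarY)
```

Under this reading the conditional variances are 5/4, 5/2, 15/4, 5, their average is 25/8, and the total variance is 175/16.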


6.5.13. For continuous (X, Y) with density f(x, y),
Vary(X) = Ey([X − Ey(X)]²) = ∫_{−∞}^{∞} [x − Ey(X)]² [f(x, y)/fY(y)] dx
and so
E(VarY(X)) = ∫_{−∞}^{∞} [∫_{−∞}^{∞} [x − Ey(X)]² (f(x, y)/fY(y)) dx] fY(y) dy
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} [x − Ey(X)]² f(x, y) dx dy.
On the other hand, Var(X) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} [x − E(X)]² f(x, y) dx dy, and since Ey(X) ≠ E(X) in general, also Var(X) ≠ E(VarY(X)) in general.

6.5.14. From the solution of Exercise 6.5.13,
Var(X) − E(VarY(X)) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} [x − E(X)]² f(x, y) dx dy − ∫_{−∞}^{∞} [∫_{−∞}^{∞} [x − Ey(X)]² fX|Y(x, y) dx] fY(y) dy
= ∫_{−∞}^{∞} [∫_{−∞}^{∞} ([E(X)]² − [Ey(X)]² + 2x(Ey(X) − E(X))) fX|Y(x, y) dx] fY(y) dy.
Now,
∫_{−∞}^{∞} 2x Ey(X) fX|Y(x, y) dx = 2[Ey(X)]²
and
∫_{−∞}^{∞} 2x E(X) fX|Y(x, y) dx = 2E(X) Ey(X).
Thus,
Var(X) − E(VarY(X)) = ∫_{−∞}^{∞} ([E(X)]² − [Ey(X)]² + 2[Ey(X)]² − 2E(X) Ey(X)) fY(y) dy = E([EY(X)]²) − [E(X)]² = Var(EY(X)).

6.6.1. 1. Let n = 2k + 1 for k = 1, 2, . . . . Then P(X = xi) = 1/(2k + 1) for all i, P(X < x_{k+1}) = k/(2k + 1) ≤ 1/2 and P(X > x_{k+1}) = k/(2k + 1) ≤ 1/2. Thus, the median is m = x_{k+1}.
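The median conditions used in 6.6.1 can be verified mechanically for small discrete uniform distributions:

```python
# Check of 6.6.1: for a uniform distribution on x_1 < ... < x_n, test
# Definition 6.6.1's conditions P(X < m) <= 1/2 and P(X > m) <= 1/2.
from fractions import Fraction

def is_median(xs, m):
    n = len(xs)
    less = Fraction(sum(1 for x in xs if x < m), n)
    more = Fraction(sum(1 for x in xs if x > m), n)
    return less <= Fraction(1, 2) and more <= Fraction(1, 2)

odd = [1, 2, 3, 4, 5]       # n = 2k+1 with k = 2: the median is x_{k+1} = 3
even = [1, 2, 3, 4, 5, 6]   # n = 2k with k = 3: any m with x_3 < m < x_4 works
print(is_median(odd, 3), is_median(odd, 2),
      is_median(even, 3.5), is_median(even, 3.1))
```

For the even case every point strictly between x_k and x_{k+1} passes the test, illustrating that the median need not be unique.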


2. Let n = 2k for k = 1, 2, . . . and let m be any number such that x_k < m < x_{k+1}. Then P(X = xi) = 1/(2k) for all i, P(X < m) = k/(2k) ≤ 1/2 and P(X > m) = k/(2k) ≤ 1/2. Thus, any such m is a median.

6.6.2. The distribution of a discrete X is symmetric about a number α if all possible values of X are paired so that for each xi < α there is a possible value xj > α, and vice versa, such that α − xi = xj − α and f(xi) = f(xj). For such X, P(X < α) = Σ_{xi < α} f(xi) = Σ_{xj > α} f(xj) = P(X > α). Clearly, P(X < α) + P(X = α) + P(X > α) = 1, and so 2P(X < α) = 1 − P(X = α) ≤ 1 and 2P(X > α) = 1 − P(X = α) ≤ 1. Thus, P(X < α) ≤ 1/2 and P(X > α) ≤ 1/2, which show, by Definition 6.6.1, that α is a median.

6.6.3. The converse of Theorem 6.6.1 says: For m a median of a random variable X, P(X < m) = 1/2 and P(X > m) = 1/2 imply P(X = m) = 0. This statement is true, because one of the statements X < m, X > m, or X = m is certain to be true, and so P(X < m) + P(X > m) + P(X = m) = 1. On the other hand, the hypothesis says that P(X < m) + P(X > m) = 1/2 + 1/2 = 1, and subtracting the latter equation from the former, we get P(X = m) = 0.

6.6.4. The inequalities P(X ≥ m) ≥ 1/2 and P(X ≤ m) ≥ 1/2 are equivalent to 1 − P(X ≥ m) ≤ 1 − 1/2 and 1 − P(X ≤ m) ≤ 1 − 1/2, that is, to P(X < m) ≤ 1/2 and P(X > m) ≤ 1/2, which are equivalent, by Definition 6.6.1, to m being a median.

6.6.5.

E(|X − c|) = ∫_{−∞}^{∞} |x − c| f(x) dx = ∫_{−∞}^{c} (c − x) f(x) dx + ∫_{c}^{∞} (x − c) f(x) dx.
Hence, using the fundamental theorem of calculus, we get
(d/dc) E(|X − c|) = ∫_{−∞}^{c} f(x) dx − ∫_{c}^{∞} f(x) dx = F(c) − [1 − F(c)].
Thus, E(|X − c|) has a critical point where 2F(c) − 1 = 0, or where F(c) = 1/2, that is, if c is a median m. Since we assumed that f is continuous and f(x) > 0, m is unique. The second derivative test shows that E(|X − c|) has a minimum at c = m, for (d²/dc²) E(|X − c|) = 2f(c) > 0 by assumption.

6.6.6. We must find m such that P(1/X ≤ m) = P(X ≥ 1/m) = 1 − F(1/m) = 1/2. Since F(x) = x for 0 < x < 1, we get 1 − 1/m = 1/2, and so m = 2.
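The minimization property in 6.6.5 is easy to see numerically. A sketch for X uniform on (0, 1), whose median is 1/2:

```python
# Grid check of 6.6.5 for X uniform on (0,1): E|X - c| should be
# minimized at the median m = 1/2.
def mean_abs_dev(c, n=10000):
    # midpoint-rule approximation of E|X - c| for the uniform density
    h = 1.0 / n
    return sum(abs((i + 0.5) * h - c) * h for i in range(n))

grid = [i / 100 for i in range(101)]
best = min(grid, key=mean_abs_dev)
print(best)
```

For the uniform density E|X − c| = c² − c + 1/2, so the grid minimum lands exactly at c = 0.5.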


6.6.7. For general X the 50th percentile is defined as the number x.5 = min{x : F(x) ≥ .5}. Since any d.f. is continuous from the right, P(X ≤ x.5) = F(x.5) ≥ .5, and so we have P(X > x.5) = 1 − F(x.5) ≤ 1 − .5 = .5. Also, by the definition of x.5 as the minimum x such that F(x) ≥ .5, we have F(x) = P(X ≤ x) < .5 for x < x.5. Hence, P(X < x.5) = lim_{x→x.5⁻} F(x) ≤ .5. Thus, x.5 satisfies the two conditions in the definition of the median.

6.6.8. The first grades in increasing order are 40, 60, 80, 80, 90 with probability 0.2 for each. By Definition 6.6.2 the first quartile is q1 = x.25 = min{x : F(x) ≥ .25}. Since F(40) = .2 and F(60) = .4, q1 = 60. Similarly, q2 = q3 = 80. The second grades in increasing order are 50, 55, 75, 80, 90, and for these, q1 = 55, q2 = 75, and q3 = 80.

6.6.9.

 

The d.f. of this r.v. is
F(x) = 0 if x < −1, (x + 1)²/4 if −1 ≤ x < 1, and 1 if 1 ≤ x.
Thus, the quantile function is F⁻¹(p) = 2√p − 1 for p ∈ (0, 1). Its graph is given in Figure 6.2.

Fig. 6.2.

6.6.10. The d.f. of this r.v. is
F(x) = 0 if x < 0, x/2 if 0 ≤ x < 1, 1/2 if 1 ≤ x < 2, (x − 1)/2 if 2 ≤ x < 3, and 1 if 3 ≤ x.

3. P(X1(1) > 1 and X2(1) > 1) = [P(X(1) > 1)]² = (1 − 2e⁻¹)² ≈ 0.0698.

4. P(X1(1) = 2 and X2(1) = 2 | X(2) = 4) = P(X1(1) = 2 and X2(1) = 2)/P(X(2) = 4) = (e⁻¹ · 1²/2!)²/(e⁻² · 2⁴/4!) = 3/8 = 0.375.
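The two probabilities just computed can be rechecked directly, assuming (as the solution indicates) two independent rate-1 Poisson processes X1, X2 with X = X1 + X2:

```python
# Recomputing parts 3 and 4 of 7.1.1 for two independent rate-1
# Poisson processes X1, X2 with X = X1 + X2.
import math

def pois(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

p_gt1 = 1 - pois(0, 1) - pois(1, 1)      # P(X(1) > 1) = 1 - 2/e
part3 = p_gt1 ** 2
part4 = pois(2, 1) ** 2 / pois(4, 2)     # reduces to 3/8 after cancellation
print(part3, part4)
```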

7. Some Special Distributions

7.1.2. λ = 1/20.

1. P(X(2) > 0) = 1 − e^{−2λ} ≈ 9.5%.
2. P(X(3) > 0) = 1 − e^{−3λ} ≈ 13.9%.
3. The coats have 1.5 times as much material as the pants, and so we would expect 1.5 · 9.5% = 14.25% flawed coats. We get slightly fewer, because bigger pieces are more likely to have multiple defects than smaller pieces, and so the defects fall on a smaller percent of pieces than expected.

7.1.3. λ = 2 per g · min.

1. P(X(1) > 2) = 1 − e^{−2}(2⁰/0! + 2¹/1! + 2²/2!) ≈ 0.323.
2. P(X(2) > 4) = 1 − e^{−4}(4⁰/0! + 4¹/1! + 4²/2! + 4³/3! + 4⁴/4!) ≈ 0.371.
3. P(X(1) > 4) = 1 − e^{−2}(2⁰/0! + 2¹/1! + 2²/2! + 2³/3! + 2⁴/4!) ≈ 0.053.
4. P(T1 > 1) = P(X(1) = 0) = e^{−2} ≈ 0.135.
5. P(T2 > 1/2) = P(X(2 · (1/2)) = 0) = e^{−2} ≈ 0.135.

7.1.4. λ = 1 per month.


1. P(X(2) = 0) = e^{−2} ≈ 0.135.
2. C(12, 2) (e^{−1})² (1 − e^{−1})^{10} ≈ 0.09.
3. (1 − e^{−1})^{12} + C(12, 1) e^{−1} (1 − e^{−1})^{11} + C(12, 2) (e^{−1})² (1 − e^{−1})^{10} ≈ 0.123.

4. The two months’ occurrences are independent, and so P(0 in Feb | 0 in Jan) = P(0 in Feb) = e^{−1} ≈ 0.368.

7.1.5.
P(even) − P(odd) = e^{−λt}((λt)⁰/0! + (λt)²/2! + ···) − e^{−λt}((λt)¹/1! + (λt)³/3! + ···)
= e^{−λt}((λt)⁰/0! − (λt)¹/1! + (λt)²/2! − (λt)³/3! + ···) = e^{−λt} e^{−λt} = e^{−2λt}.
On the other hand, P(even) + P(odd) = 1, and so, adding the two equations, we get 2P(even) = 1 + e^{−2λt}, and subtracting them, 2P(odd) = 1 − e^{−2λt}.

7.1.6.

P(XA(t) = m, XB(t) = n) = C(m + n, m) p^m q^n · (λt)^{m+n} e^{−λt}/(m + n)! = [(pλt)^m/m!] · [(qλt)^n/n!] · e^{−λt}.
Thus,
P(XA(t) = m) = Σ_{n=0}^{∞} [(pλt)^m/m!] [(qλt)^n/n!] e^{−λt} = [(pλt)^m/m!] e^{qλt} e^{−λt} = [(pλt)^m/m!] e^{−pλt}.
Similarly,
P(XB(t) = n) = [(qλt)^n/n!] e^{−qλt}.

7.1.7. Consider the instants s − ∆s < s < t < t + ∆t ≤ s′ − ∆s′ < s′ < t′ < t′ + ∆t′ and let T1 and T2 denote two distinct interarrival times. Then
fT1,T2(t − s, t′ − s′)
= lim_{∆s,∆t,∆s′,∆t′→0} (1/(∆t∆t′)) P(X(t) − X(s) = 0, X(t + ∆t) − X(t) = 1, X(t′) − X(s′) = 0, X(t′ + ∆t′) − X(t′) = 1 | X(s) − X(s − ∆s) = 1, X(s′) − X(s′ − ∆s′) = 1)
= lim_{∆s,∆t,∆s′,∆t′→0} [λ∆s e^{−λ∆s} e^{−λ(t−s)} λ∆t e^{−λ∆t} λ∆s′ e^{−λ∆s′} e^{−λ(t′−s′)} λ∆t′ e^{−λ∆t′}]/[λ∆s e^{−λ∆s} λ∆s′ e^{−λ∆s′} ∆t∆t′]
= λ² e^{−λ(t−s)} e^{−λ(t′−s′)} = fT1(t − s) fT2(t′ − s′),
where in the last step we used Part 2 of Theorem 7.1.7. If t = s′, the proof would be similar.

where in the last step, we used part 2 of Theorem 7.1.7. If t = s′ , the proof would be similar. 7.1.8. For k > 0, P (X = k) =

λk e−λ λ = k! k

λk−1 e−λ (k − 1)!

=

λ P (X = k − 1) . k

Hence, if k < λ, then P(X = k − 1) < P(X = k) > and if k > λ, then P(X = k − 1) > P(X = k) . Thus, P(X = k) is increasing for k < λ and decreasing for k > λ. If λ is an integer, then for k = λ P(X = λ − 1) = P(X = λ) , and so P(X = k) is maximum at both λ and λ − 1. If λ > 1 is not an integer, then the maximum occurs only at k = [λ] , because then λ P(X = [λ]) = [λ] P(X = [λ] − 1) > P(X = [λ] − 1) . If 0 < λ < 1, then −λ P(X = 1) = λe < e−λ = P(X = 0) and the maximum occurs at 0. 7.2.1. Using the table, we obtain 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.

P(Z < 2) ≈ .9772 P(Z > 2) ≈ 1 − .9772 = .0228 P(Z = 2) = 0 P(Z < −2) = P(Z > 2) ≈ .0228 P(−2 < Z < 2) = P(Z < 2)− P(Z < −2) ≈ .9772 − .0228 = .9544 P(|Z| > 2) = 1− P(−2 < Z < 2) ≈ 1 − 0.9544 = .045 6 P(−2 < Z < 1) = P(Z < 1)− P(Z < −2) ≈ .8413 − .0228 = .8185 P(z < Z) = .05 ⇐⇒ P(Z < z) = .95 ⇐⇒ z ≈ 1.6448 P(−z < Z < z) = .9 ⇐⇒ P(Z < z) = .95 ⇐⇒ z ≈ 1.6448 P(−z < Z < z) = .8 ⇐⇒ P(Z < z) = .90 ⇐⇒ z ≈ 1.2815 7.2.2.

1. 2. 3. 4. 5.

1 P(X < 11) = P( X−10 < 11−10 2 2 ) = P Z < 2 = Φ (.5) ≈ .6915. P(X > 11) = 1− P(X < 11) ≈ 1 − .6915 = .3085. 1 1 P(X < 9) = P( X−10 < 9−10 2 2 ) = P Z < − 2 = P Z > 2 ≈ .3085. P(9 < X < 11) = P − 12 < Z < 21 ≈ .6915 − .3085 = .3830. P(9 < X < 12) = Φ (1) − Φ (−.5) ≈ .8413 − .3085 = .5328.


6. P(x < X) = 1 − P(X < x) = 1 − P((X − 10)/2 < (x − 10)/2) = 1 − Φ(z) = .05 or, equivalently, Φ(z) = .95, with z = (x − 10)/2. From the table, z ≈ 1.645, and so x ≈ 10 + 2 · 1.645 = 13.29.
7. P(10 − x < X < 10 + x) = P(−x/2 < (X − 10)/2 < x/2) = Φ(z) − Φ(−z) = 2Φ(z) − 1 = .9, where z = x/2. Hence, Φ(z) = 1.9/2 = .95, z ≈ 1.645, and so x ≈ 3.29.
8. As in Part 7, P(10 − x < X < 10 + x) = 2Φ(z) − 1 = .8, where z = x/2. Hence, Φ(z) = 1.8/2 = .9, z ≈ 1.28, and x ≈ 2.56.

7.2.3.

1. ϕ(z) = (1/√(2π)) e^{−z²/2}, ϕ′(z) = −(1/√(2π)) z e^{−z²/2}, ϕ″(z) = (1/√(2π))(z² − 1) e^{−z²/2}. ϕ″(z) changes sign at z² = 1, that is, at z = ±1. Thus, ϕ has points of inflection at and only at z = ±1.
2. f(x) = (1/(√(2π)σ)) e^{−(x−µ)²/2σ²},
f′(x) = −(1/(√(2π)σ)) ((x − µ)/σ²) e^{−(x−µ)²/2σ²},
f″(x) = −(1/(√(2π)σ³)) e^{−(x−µ)²/2σ²} + (1/(√(2π)σ³)) ((x − µ)/σ)² e^{−(x−µ)²/2σ²} = (1/(√(2π)σ³)) [((x − µ)/σ)² − 1] e^{−(x−µ)²/2σ²}.
f″(x) changes sign at ((x − µ)/σ)² = 1, that is, at (x − µ)/σ = ±1. Thus, f has points of inflection at and only at x = µ ± σ.


7.2.4.

1. fH(x) = (1/2) fX(x) + (1/2) fY(x), where fX(x) = (1/(√(2π) · 2.6)) e^{−(x−66)²/(2·2.6²)} and fY(x) = (1/(√(2π) · 2.6)) e^{−(x−69)²/(2·2.6²)}. The graphs are in Figure 7.1.

Fig. 7.1.

2. E(H) = (66 + 69)/2 = 67.5 and E(H²) = (1/2) E(X²) + (1/2) E(Y²) = (1/2)(2.6² + 66²) + (1/2)(2.6² + 69²) ≈ 4565.26. Hence, Var(H) = E(H²) − 67.5² ≈ 9 and SD(H) ≈ 3.
3. P(66 < H < 69) = (1/2) P(66 < X < 69) + (1/2) P(66 < Y < 69) = (1/2) P((66 − 66)/2.6 < Z < (69 − 66)/2.6) + (1/2) P((66 − 69)/2.6 < Z < (69 − 69)/2.6) ≈ (1/2)[Φ(1.15) − Φ(0)] + (1/2)[Φ(0) − Φ(−1.15)] = (1/2)[Φ(1.15) − Φ(−1.15)] ≈ (1/2)[0.875 − 0.125] = 0.375.
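Values of the standard normal d.f. Φ used throughout this section can be computed from `math.erf` rather than read from the table; the snippet below rechecks the table value Φ(2) ≈ .9772 and part 3 of 7.2.4:

```python
# Standard normal d.f. via math.erf, checked against the table values
# used in 7.2.1-7.2.4.
import math

def Phi(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(Phi(2))                           # table gives .9772
mix = 0.5 * (Phi(1.15) - Phi(-1.15))    # part 3 of 7.2.4
print(mix)                              # about 0.375
```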


7.2.5.

1. Assume a < 0. Then the d.f. of Y can be computed as FY(y) = P(Y ≤ y) = P(aX + b ≤ y) = P(X ≥ (y − b)/a) = ∫_{(y−b)/a}^{∞} (1/(√(2π)σ)) e^{−(x−µ)²/2σ²} dx, and from here the chain rule and the fundamental theorem of calculus give its p.d.f. as
fY(y) = FY′(y) = (−1/a)(1/(√(2π)σ)) e^{−(((y−b)/a)−µ)²/2σ²} = (1/(√(2π)|a|σ)) e^{−(y−(aµ+b))²/2(aσ)²}.
A comparison with Definition 7.1.2 shows that this function is the p.d.f. of a normal r.v. with aµ + b in place of µ and (aσ)² in place of σ².
2. For any a ≠ 0, ψ_{aX+b}(t) = E(e^{t(aX+b)}) = e^{bt} E(e^{atX}) = e^{bt} ψX(at) = e^{bt} e^{aµt+σ²(at)²/2} = e^{(aµ+b)t+(aσ)²t²/2}. A comparison with Theorem 7.2.5 shows that this function is the m.g.f. of a normal r.v. with aµ + b in place of µ and (aσ)² in place of σ².

7.2.6. P(X < 750) = .95 and P(X < 500) = .46. Thus, P((X − µ)/σ < (750 − µ)/σ) = Φ((750 − µ)/σ) = .95 and P((X − µ)/σ < (500 − µ)/σ) = Φ((500 − µ)/σ) = .46, or Φ(−(500 − µ)/σ) = .54. From the table, (750 − µ)/σ ≈ 1.645 and −(500 − µ)/σ ≈ .10. Solving the last two equations, we get µ ≈ 514.3 and σ ≈ 143.27.

7.2.7. Comparing ce^{−(x+2)²/24} with the general normal p.d.f. (1/(√(2π)σ)) e^{−(x−µ)²/2σ²}, we can see that this distribution is normal with µ = −2, σ² = 12, and c = 1/√(24π).

7.2.8. Completing the square, we obtain ce^{−x²−4x} = ce^{−(x+2)²+4} = ce⁴ e^{−(x+2)²}. This expression is of the form (1/(√(2π)σ)) e^{−(x−µ)²/2σ²}, the p.d.f. of a normal distribution, with µ = −2, 2σ² = 1 and 1/(√(2π)σ) = ce⁴. Thus, σ = 1/√2 and c = 1/(√π e⁴).

7.2.9. Let X1 and X2 denote the two weights. Then, according to Theorem 7.2.6, X1 − X2 is normal with µ = 0 and σ = √2 · .32 ≈ 0.4525. Thus,
P(|X1 − X2| > .5) ≈ P(|X1 − X2|/0.4525 > .5/0.4525) = P(|Z| > 1.105) = 2(1 − Φ(1.105)) ≈ 0.27.

7.2.10.

1. According to Corollary 7.2.3, µ(Zn) = 0 and σ(Zn) = 1/√n. Also, fZn(x) = √(n/(2π)) e^{−nx²/2}. Thus, fZn has a maximum at 0 and inflection points at x = ±1/√n each. The graphs are in Figure 7.2.

Fig. 7.2.

2. Zn is N(0, 1/n), and so Zn/(1/√n) = √n Zn is standard normal. Hence the third quartile of Zn is the xn such that
0.75 = P(Zn ≤ xn) = P(√n Zn ≤ √n xn) = Φ(√n xn).
Thus, from the normal table, √n xn = 0.675 and so, for n = 1, 4, and 16 the third quartiles are x1 = 0.675, x4 = 0.675/2 = 0.3375, x16 = 0.675/4 = 0.16875. The first quartiles are the negatives of these, and the second quartiles are all 0.

7.2.11. If z = Φ⁻¹(1 − p), then, since Φ is strictly increasing, we can solve this equation for p, to get Φ(z) = 1 − p, or 1 − Φ(z) = p. Here 1 − Φ(z) is the area of the tail to the right of z under the standard normal curve, which equals the area of the corresponding left tail, that is, 1 − Φ(z) = Φ(−z). So, Φ(−z) = p. Solving this equation results in −z = Φ⁻¹(p), which, when we substitute z from the first equation, yields Φ⁻¹(1 − p) = −Φ⁻¹(p).

7.2.12. Let x = FX⁻¹(p). Since FX is strictly increasing, we can solve this equation for p, to get p = FX(x) or, equivalently, p = P(X ≤ x) = P((X − µ)/σ ≤ (x − µ)/σ) = Φ((x − µ)/σ). Solving p = Φ((x − µ)/σ) for x, we get x = µ + σΦ⁻¹(p), and so FX⁻¹(p) = µ + σΦ⁻¹(p).

7.3.1. We use the binomial p.f. with n = 20 and p = 1/6:
P(X = 3) = C(20, 3) (1/6)³ (5/6)^{17} ≈ 0.238.


Using the normal approximation, we have µ = 20 · (1/6) = 10/3 and σ = √(20 · (1/6) · (5/6)) = 5/3, and so
P(X = 3) ≈ Φ((3.5 − 10/3)/(5/3)) − Φ((2.5 − 10/3)/(5/3)) = Φ(0.1) − Φ(−0.5) ≈ 0.231.

7.4.2.
P(Nr = k) = P(exactly r − 1 successes and ≤ r − 1 failures at trial number k − 1) · p + P(exactly r − 1 failures and ≤ r − 1 successes at trial number k − 1) · q = C(k − 1, r − 1)(p^r q^{k−r} + p^{k−r} q^r)
for k = r, r + 1, . . . , 2r − 1.
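Returning to 7.3.1, the exact binomial value and its normal approximation with continuity correction can be compared numerically:

```python
# Exact binomial probability from 7.3.1 and its normal approximation.
import math

n, p = 20, 1 / 6
exact = math.comb(n, 3) * p ** 3 * (1 - p) ** 17

def Phi(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu = n * p                            # 10/3
sigma = math.sqrt(n * p * (1 - p))    # 5/3
approx = Phi((3.5 - mu) / sigma) - Phi((2.5 - mu) / sigma)
print(exact, approx)
```

The approximation (about 0.231) is quite close to the exact value (about 0.238) even for n = 20.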

7.4.3. If the number of failures before the rth success is k, then the total number of trials up to and including the rth success is k + r. Thus, P(Yr = k) = P(Xr = k + r), where Xr is negative binomial, and so P(Yr = k) = C(k + r − 1, r − 1) p^r q^k.

7.4.4.

1. Σ_{k=2}^{8} C(k − 1, 1) (1/6)² (5/6)^k ≈ 0.2745.
2. Σ_{k=2}^{14} C(k − 1, 1) (1/6)² (5/6)^k ≈ 0.489, and Σ_{k=2}^{15} C(k − 1, 1) (1/6)² (5/6)^k ≈ 0.514, so we need at least 15 rolls.

7.4.5.
P(Xr = k, Xr+s = l) = P(Xr = k) P(Xs = l − k) = C(k − 1, r − 1) p^r q^{k−r} · C(l − k − 1, s − 1) p^s q^{l−k−s} = C(k − 1, r − 1) C(l − k − 1, s − 1) p^{r+s} q^{l−(r+s)}
for k = r, r + 1, r + 2, . . . and l = k + s, k + s + 1, . . . .

7.4.6. Letting Xm+n and Xn denote negative binomial random variables, we have
P(Sm = k | Xm+n = r) = P(Sm = k, Xm+n = r)/P(Xm+n = r) = P(Sm = k) P(Xn = r − k)/P(Xm+n = r)
= [C(m, k) p^k q^{m−k} · C(n − 1, r − k − 1) p^{r−k} q^{n−r+k}]/[C(m + n − 1, r − 1) p^r q^{m+n−r}]
= C(m, k) C(n − 1, r − k − 1)/C(m + n − 1, r − 1)
for k = max(0, r − n), . . . , min(m, r − 1).
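Note that p and q cancel completely in 7.4.6, leaving a hypergeometric-type distribution; by Vandermonde's identity its probabilities sum to 1. A quick exact check with illustrative values m = 5, n = 7, r = 4 (chosen arbitrarily here):

```python
# The conditional probabilities in 7.4.6 sum to 1 (Vandermonde's identity).
import math
from fractions import Fraction

def cond_prob(k, m, n, r):
    return Fraction(math.comb(m, k) * math.comb(n - 1, r - k - 1),
                    math.comb(m + n - 1, r - 1))

m, n, r = 5, 7, 4
total = sum(cond_prob(k, m, n, r)
            for k in range(max(0, r - n), min(m, r - 1) + 1))
print(total)
```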

7.4.7. Letting f(x) denote the gamma density from Definition 7.4.2, we have f′(x) = (λ^α/Γ(α)) x^{α−2} e^{−λx}(α − 1 − λx) for x > 0. This expression equals 0 if x = (α − 1)/λ. Since f is positive and bounded, it must have a maximum at this critical point.

7.4.8. See Figures 7.3, . . . , 7.9.

Fig. 7.3. Gamma density for α = 1, λ = 1

7.4.9.

1. E(T^k) = ∫_0^∞ (λ^α/Γ(α)) t^{k+α−1} e^{−λt} dt = (λ^α/Γ(α)) · Γ(α + k)/λ^{α+k} = α(α + 1) ··· (α + k − 1)/λ^k for any positive integer k.
2. Var(T) = E(T²) − [E(T)]² = α(α + 1)/λ² − (α/λ)² = α/λ².
3. ψ(t) = ∫_0^∞ (λ^α/Γ(α)) x^{α−1} e^{−(λ−t)x} dx = (λ^α/Γ(α)) · Γ(α)/(λ − t)^α = (λ/(λ − t))^α for t < λ.
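The moment formula in part 1 can be confirmed by numerical integration, here with the illustrative parameters α = 3, λ = 2 (so E(T) = 3/2 and E(T²) = 3):

```python
# Numerical check of 7.4.9 part 1 for a gamma density with alpha = 3,
# lambda = 2: the k-th moment is alpha(alpha+1)...(alpha+k-1)/lambda^k.
import math

alpha, lam = 3.0, 2.0

def f(t):
    return lam ** alpha / math.gamma(alpha) * t ** (alpha - 1) * math.exp(-lam * t)

def moment(k, upper=40.0, n=100000):
    # midpoint rule on [0, upper]; the tail beyond 40 is negligible here
    h = upper / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += t ** k * f(t) * h
    return total

print(moment(1), moment(2))
```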

7.4.10. Fχn(x) = P(χn ≤ x) = P(χn² ≤ x²) = Fχ²n(x²). Thus,
fχn(x) = Fχn′(x) = (d/dx) Fχ²n(x²) = fχ²n(x²) · 2x = 0 if x ≤ 0, and [1/(2^{n/2} Γ(n/2))] x^{2(n/2−1)} e^{−x²/2} · 2x if x > 0.


Fig. 7.4. Gamma density for α = 1, λ = 2

Fig. 7.5. Gamma density for α = 2, λ = 1

7.4.11. We prove Γ((2k + 1)/2) = √π (2k)!/(2^{2k} k!) for k = 0, 1, 2, . . . by induction. For k = 0 it reduces to Equation 7.4.30, which was proved in the book. Next, assume that it is true for k − 1. Then, using also the reduction formula 7.4.11, we get
Γ((2k + 1)/2) = Γ((2(k − 1) + 1)/2 + 1) = ((2(k − 1) + 1)/2) Γ((2(k − 1) + 1)/2) = ((2k − 1)/2) · √π (2(k − 1))!/(2^{2(k−1)} (k − 1)!) = √π (2k − 1)!/(2^{2k−1} (k − 1)!) = √π (2k)!/(2^{2k} k!).

120

7. Some Special Distributions

0.5

0.4

0.3

0.2

0.1

0

1

2

x

3

4

5

3

4

5

Fig. 7.6. Gamma density for α = 1, λ = 1/2

1.4 1.2 1 0.8 0.6 0.4 0.2 0

1

2

x

Fig. 7.7. Gamma density for α = 1/2, λ = 1

7.4.12. U = X² + Y² is χ₂². Thus, from the solution of Exercise 7.4.13, U is exponential with parameter 1/2, and P(U ≤ 1) = 1 − e^{−1/2} ≈ 0.393.

7.4.13. X/σ and Y/σ are i.i.d. standard normal variables, and so U/σ² is χ² with 2 degrees of freedom. Thus,
FU(u) = P(U ≤ u) = P(U/σ² ≤ u/σ²) = Fχ₂²(u/σ²).
Hence,

121

2.5 2 1.5 1 0.5

0

1

2

x

3

4

5

3

4

5

Fig. 7.8. Gamma density for α = 1/2, λ = 2

0.8

0.6

0.4

0.2

0

1

2

x

Fig. 7.9. Gamma density for α = 4, λ = 4

fU (u) = FU′ (u) = = fχ22

u σ2

u u 1 d F 2 = Fχ′ 2 · 2 2 du χ2 σ2 σ2 σ 1 0 if u ≤ 0 u · 2 = , 1 − 2σ 2 if u > 0 e σ 2σ 2

which shows that U is exponential with parameter λ = 2σ1 2 . In particular, the χ22 distribution is the same as the exponential with parameter 12 . 7.4.14. V 2 σ2 is χ with n degrees of freedom. Thus, fV (v) =

1 v · fχ2n 2 σ σ2

=

0 1 n/2 2 Γ (n/2)σ2

v σ2

n 2 −1

if v ≤ 0 v . e− 2σ2 if v > 0

122

7. Some Special Distributions

7.4.15. By Theorem 5.5.8,

and

  0 if y < 0 n FY (y) = [FX (y)] = yn if 0 ≤ y < 1  1 if y ≥ 1 n

FZ (z) = 1 − [1 − FX (z)] = Thus,

 

0 if z < 0 n 1 − (1 − z) if 0 ≤ z < 1 .  1 if z ≥ 1

fY (y) = FY′ (y) =

ny n−1 if 0 ≤ y < 1 0 otherwise

fZ (z) = FZ′ (z) =

n (1 − z)n−1 if 0 ≤ z < 1 . 0 otherwise

and

Comparing these expressions with Definition 7.4.4, we see that Y is beta with r = n and s = 1, and Z is beta with r = 1 and s = n. 7.4.16. Write f (p) = cpk (1 − p)n−k . Then, if k = 0 and k = n, f ′ (p) = ckpk−1 (1 − p)n−k − c (n − k) pk (1 − p)n−k−1 = cpk−1 (1 − p)n−k−1 (k (1 − p) − (n − k) p) = cpk−1 (1 − p)n−k−1 (k − np) ,

and this expression is 0 if p = nk . We have f ′ (p) = 0 for p = 0 and p = 1 as well, but for those values f (p) = 0, too, which is the minimum. Since f (p) is continuous on the closed interval [0, 1] , it has an absolute maximum there, which must be at the single critical point nk , since f nk > 0. If k = 0, then f (p) = c (1 − p)n , which has its maximum clearly at p = 0 = nk . Similarly, if k = n, then f (p) = cpk , which has its maximum at p = 1 = nk . 7.4.17. See Figures 7.10,. . . , 7.15. 7.4.18. 1. E X k Γ (r+s) Γ (r)Γ (s)

= =

1 k 1 r−1 (1 − x)s−1 dx B(r,s) 0 x · x r(r+1)···(r+k−1) (r+s)(r+s+1)···(r+s+k−1) .

2. V ar (X) = E X 2 − [E (X)]2 =

r(r+1) (r+s)(r+s+1)

=



B(r+k,s) B(r,s)

r r+s

2

=

=

Γ (r+k)Γ (s) Γ (r+s+k)

·

rs . (r+s)2 (r+s+1)

7. Some Special Distributions

123

2

1.5

1

0.5

0

0.2

0.4

x

0.6

0.8

1

x

0.6

0.8

1

Fig. 7.10. Beta density for r = 1, s = 2

2

1.5

1

0.5

0

0.2

0.4

Fig. 7.11. Beta density for r = 2, s = 1

7.4.19. In Theorem 5.6.2, Equation 5.142, we substitute fX|P(k, p) = C(n, k) p^k (1 − p)^{n−k} for k = 0, 1, . . . , n and fP(p) = (1/B(r, s)) p^{r−1} (1 − p)^{s−1} if 0 ≤ p ≤ 1, and 0 otherwise. Then, multiplying these two expressions together, we obtain
fP(p, k) = cp^{k+r−1} (1 − p)^{n−k+s−1} for p ∈ [0, 1] and k = 0, 1, . . . , n, and 0 otherwise,
where we left the constant c undetermined. Its value could be determined by the coefficients in fX|P and fP and the integral in the denominator of Bayes’ Theorem, but we can find it much more easily by noting that, the variable part being a power of p times a power of 1 − p, the posterior density fP must be beta. Thus, fP is beta with parameters k + r and n − k + s, and c = 1/B(k + r, n − k + s).

124

7. Some Special Distributions

Fig. 7.12. Beta density for r = 2, s = 2

Fig. 7.13. Beta density for r = 1, s = 3

7.5.1. Clearly, Y1 and Y2, as linear combinations of normals, are normal. To show that they are standard normal, we compute their expectations and variances:
E(Y1) = (a11/σ1) E(Z1) + (a12/σ1) E(Z2) = 0
and
E(Y2) = (1/√(1 − ρ²)) [(a21/σ2 − a11ρ/σ1) E(Z1) + (a22/σ2 − a12ρ/σ1) E(Z2)] = 0,
because E(Z1) = E(Z2) = 0.


Fig. 7.14. Beta density for r = 1/2, s = 1

Fig. 7.15. Beta density for r = 11, s = 21

Now,
Var(Y1) = (a11/σ1)² Var(Z1) + (a12/σ1)² Var(Z2) = (a11/σ1)² + (a12/σ1)² = 1,
because Var(Z1) = Var(Z2) = 1 and σ1² = a11² + a12². Also,

126

7. Some Special Distributions

Var(Y2) = (1/(1 − ρ²)) [(a21/σ2 − a11ρ/σ1)² Var(Z1) + (a22/σ2 − a12ρ/σ1)² Var(Z2)]
= (1/(1 − ρ²)) [(a21/σ2 − a11ρ/σ1)² + (a22/σ2 − a12ρ/σ1)²]
= (1/((1 − ρ²) σ1²σ2²)) [(σ1a21 − σ2a11ρ)² + (σ1a22 − σ2a12ρ)²]
= (1/((1 − ρ²) σ1²σ2²)) [σ1²(a21² + a22²) − 2σ1σ2ρ(a21a11 + a22a12) + σ2²ρ²(a11² + a12²)]
= (1/((1 − ρ²) σ1²σ2²)) [σ1²σ2² − 2σ1²σ2²ρ² + σ1²σ2²ρ²] = 1.

Similarly,
Cov(Y1, Y2) = E(Y1Y2) = E[((a11/σ1) Z1 + (a12/σ1) Z2) · (1/√(1 − ρ²))((a21/σ2 − a11ρ/σ1) Z1 + (a22/σ2 − a12ρ/σ1) Z2)]
and so, using the fact that Z1 and Z2 are independent standard normal,
Cov(Y1, Y2) = (1/√(1 − ρ²)) [(a11/σ1)(a21/σ2 − a11ρ/σ1) + (a12/σ1)(a22/σ2 − a12ρ/σ1)]
= (1/(√(1 − ρ²) σ1²σ2²)) [σ1σ2(a11a21 + a12a22) − σ2²(a11² + a12²)ρ]
= (1/(√(1 − ρ²) σ1²σ2²)) [σ1²σ2²ρ − σ1²σ2²ρ] = 0.

7.5.2.

1. f(x1, x2) = (1/(2π · 0.6 · 6)) exp{(−1/0.72)[((x1 − 2)/3)² − 2 · 0.8 · ((x1 − 2)/3)((x2 + 1)/2) + ((x2 + 1)/2)²]}.
2. E(X2 | X1 = x1) = −1 + 0.8 · 2 · (x1 − 2)/3 = (−31 + 8x1)/15 and E(X1 | X2 = x2) = 2 + 0.8 · 3 · (x2 + 1)/2 = 3.2 + 1.2x2.
3. Var(X2 | X1 = x1) = 0.36 · 4 = 1.44 and Var(X1 | X2 = x2) = 0.36 · 9 = 3.24.
4. fX2|X1(x2 | x1) = (1/(√(2π) · 0.6 · 2)) exp{−(x2 + 1 − 0.8 · 2 · (x1 − 2)/3)²/(2 · 0.36 · 4)}
and
fX1|X2(x1 | x2) = (1/(√(2π) · 0.6 · 3)) exp{−(x1 − 2 − 0.8 · 3 · (x2 + 1)/2)²/(2 · 0.36 · 9)}.
5. fX1(x1) = (1/(√(2π) · 3)) exp{−(x1 − 2)²/(2 · 9)}
and
fX2(x2) = (1/(√(2π) · 2)) exp{−(x2 + 1)²/(2 · 4)}.

−1 2 x1 + x1 x2 + 2x22 − 2x1 + 6x2 + 8 . 2

This exponent differs from the given one by the constant −1 2 · 8 = −4,√which √ can be split off, and since √ 1 2 = 4π7 , we obtain A = 4π7 e−4 . 2π

(1−ρ )σ1 σ2

Thus, by Theorem 7.5.2 (X1 , X2 ) is a bivariate normal pair with the above parameters. 7.5.4. fX (x) = +

∞ −∞



f (x, y) dy =

−∞

2 1 √ −x2 /2 [ 2e − e−x 2π

√ −y2 /2 2 2 2e − e−y dye−x ] = 1 2π



2

e−y dy

−∞

√ −x2 /2 √ √ 2 2 1 −x2 /2 2e − e−x π + πe−x = e . 2π

128

7. Some Special Distributions ∞

2

1 −y /2 e . Thus, X and Y are stanSimilarly, fY (y) = −∞ f (x, y) dx = 2π dard normal, and so E (X) = E (Y ) = 0 and Cov (X, Y ) = E (XY ) . Now, ∞ ∞ E (XY ) = −∞ −∞ xyf (x, y) dxdy = 0 because f is an even function of both x and y. Hence, Cov (X, Y ) = 0 but, clearly, f (x, y) = fX (x)fY (y), that is, X and Y are not independent.

7.5.5. By the result of Exercise 6.4.8, Cov (T1 , T2 ) = cos θ sin θV ar (X1 ) + cos2 θ − sin2 θ Cov (X1 , X2 ) − cos θ sin θV ar (X2 ) . Hence, 2Cov (T1 , T2 ) = sin 2θ (V ar (X1 ) − V ar (X2 )) + 2 cos 2θCov (X1 , X2 ) . By Theorems 6.4.2, 7.5.1 and 7.5.4, T1 and T2 are independent if and only if their covariance is zero, that is, sin 2θ (V ar (X1 ) − V ar (X2 )) + 2 cos 2θCov (X1 , X2 ) = 0. This equation is equivalent to cos 2θ V ar (X2 ) − V ar (X1 ) = . sin 2θ 2Cov (X1 , X2 ) 7.5.6. X1 +X2 2

is normal with µ = 122.4. Thus, P

X1 + X2 > 80 2

µ1 +µ2 2

=1−Φ

= 70 and σ 2 = 14 σ12 + 14 σ22 +

80 − 70 √ 122.4

ρσ1 σ2 2

=

≈ 0.183.

7.5.7. The conditional expected score on the second exam is given by Equation 7.141 as E (X2 |X1 = 80) = 70+0.70·12· 80−70 = 77. The conditional variance of X2 12 is given by Equation 7.142 as V ar (X2 |X1 = 80) = 1 − 0.702 122 = 73.44. From the table, the 90th percentile of the standard normal distribution is z.90 ≈ 1.28. √ Since X2 under the condition X1 = 80 is normal with µ = 77 and σ = 73.44 ≈ 8.57, we obtain x.90 ≈ 77 + 8.57 · 1.28 ≈ 88. 7.5.8. The height of the man at the third quartile in standard units is given by z1 such that Φ (z1 ) = .75. Thus, from the table we find z1 ≈ 0.6745. Thus, by Equation 7.145, z2 ≈ 0.7 · 0.6745 ≈ 0.4721, which, according to the table, is at the 68th percentile. This is an instance of the regression effect: the height’s

7. Some Special Distributions

129

being at the 75th percentile “predicts” the weight’s being above average, but not by as much as the height. 7.5.9. If (X1 , X2 ) is bivariate normal as given by Definition 7.5.1, then aX1 +bX2 is a linear combination of the independent normals Z1 and Z2 , plus a constant, and so Theorems 7.2.4 and 7.2.6 show that it is normal. To prove the converse, assume that all linear combinations of X1 and X2 are normal, and choose two linear combinations, T1 = a1 X1 + b1 X2 and T2 = a2 X1 + b2 X2 such that Cov (T1 , T2 ) = 0. Such a choice is always possible, since if Cov (X1 , X2 ) = 0, then T1 = X1 and T2 = X2 will do, and otherwise the rotation from Exercise 7.5.5 achieves it. Next, we proceed much as in the proof of Theorem 7.5.1: Let ψ denote the bivariate moment generating function of (T1 , T2 ) , that is, let ψ (s, t) = E esT1 +tT2 . Now, Y = sT1 + tT2 is normal, because it is a linear combination of X1 and X2 . Denoting the parameters of T1 and T2 by µ1 , σ1 and µ2 , σ2 , respectively, we have µY = sµ1 + tµ2 and σY2 = s2 σ12 + t2 σ22 . (There is no term here with σ1,2 , because we have chosen T1 and T2 so that Cov (T1 , T2 ) = 0.) Denote the mgf. of Y by ψY , that is, let ψY (t) = E etY . Then, by Equation 2 2 2 2 7.2.15, ψ (s, t) = ψY (1) = esµ1 +tµ2 +(s σ1 +t σ2 )/2 , which we can factor as 2

2

2

2

ψ (s, t) = esµ1 +s σ1 /2 etµ2 +t σ2 /2 = ψT1 (s) ψT2 (t) . This equation shows that T1 and T2 are independent. Now, define Z1 and Z2 as the standardizations of T1 and T2 . Inverting the transformations that have led from X1 , X2 to the independent normals Z1 , Z2 , we can write X1 and X2 in the form given in Definition 7.5.1, showing thereby that (X1 , X2 ) is bivariate normal. 7.5.10. Let X1 denote the height of the husband and X2 the height of his wife. Then X1 − X2 is normal with µ = µ1 − µ2 = 4 and σ2 = σ12 + σ22 + 2ρσ1 σ2 = 42 + 3.62 + 2 · 0.25 · 4 · 3.6 = 36.16. P (X1 − X2 < 0) = Φ

0−4 √ 36.16

≈ 0.253.

7.5.11. µU1 = µ1 +2µ2 = 2+2·(−1) = 0, µU2 = µ1 −2µ2 +1 = 2−2·(−1)+1 = 5, 2 = σ12 +4σ22 +4σ1,2 = 32 +4·22 +4·4.8 = 44.2, σ1,2 = σ1 σ2 ρ = 3·2·0.8 = 4.8, σU 1 2 2 2 2 σU2 = σ1 + 4σ2 − 4σ1,2 = 3 + 4 · 22 − 4 · 4.8 = 5.8, σU1 ,U2 = σ12 − 4σ22 = −7 32 − 4 · 22 = −7, and ρU1 ,U2 = √44.2·5.8 ≈ −0.437. Thus fU1 ,U2 (u1 , u2 ) ≈ $

1 −1 # √ exp · 2 2 (1 − 0.4372 ) 2π (1 − 0.437 ) 44.2 · 5.8

u21 + 2 · 0.437 44.2

u √ 1 44.2

u2 − 5 √ 5.8

% (u2 − 5)2 . + 5.8

130

7. Some Special Distributions

7.5.12. Solving the given quadratic equation for x2 , we get 2 1 x2 = x1 + 23 ± 3 3

−5 (x1 − 69)2 + 9c.

Thus the average of the two solutions is 2 2 x2 = x1 + 23 = 69 + (x1 − 69) , 3 3 which is the equation of the regression line.

8. The Elements of Mathematical Statistics

8.1.1. Replacing σ² by v in Equation 7.1.10, we get
$$L(\mu,v) = \left(\frac{1}{\sqrt{2\pi v}}\right)^n e^{-\sum_{i=1}^n (x_i-\mu)^2/2v}.$$
Hence
$$\ln L(\mu,v) = -n\ln\sqrt{2\pi} - \frac{n}{2}\ln v - \frac{1}{2v}\sum_{i=1}^n (x_i-\mu)^2,$$
and to find the critical point we set
$$\frac{\partial}{\partial v}\ln L(\mu,v) = -\frac{n}{2v} + \frac{1}{2v^2}\sum_{i=1}^n (x_i-\mu)^2 = 0.$$
Solving for v results in $v = \frac{1}{n}\sum_{i=1}^n (x_i-\mu)^2$, which is σ̂².

8.1.2.
$$L(\lambda) = f_n(x;\lambda) = \prod_{i=1}^n \lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda\sum x_i}.$$
Hence
$$\ln L(\lambda) = n\ln\lambda - \lambda\sum x_i,$$
and so setting
$$\frac{d}{d\lambda}\ln L(\lambda) = \frac{n}{\lambda} - \sum x_i = 0$$
yields
$$\hat\lambda = \frac{n}{\sum x_i} = \frac{1}{\bar x_n}.$$


The function L has a maximum at this value, because
$$\frac{d^2}{d\lambda^2}\ln L(\lambda) = -\frac{n}{\lambda^2} < 0$$
at λ̂.

8.1.3. If X* is a discrete r.v. with n possible values x_i and p.f. f(x_i) = 1/n for all i, then, by Equation 6.5,
$$\mu = E(X^*) = \sum_{i=1}^n \frac{1}{n}\,x_i,$$
and, by Definition 6.2.1 and Theorem 6.1.3,
$$Var(X^*) = \sum_{i=1}^n \frac{1}{n}(x_i - \mu)^2 = \hat\sigma^2.$$
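As a quick numerical illustration of 8.1.2 (the sample below is made up, not from the text), a grid search over λ confirms that the exponential log-likelihood peaks at λ̂ = 1/x̄ₙ:

```python
from math import log

# Hypothetical exponential sample (illustrative only)
x = [0.4, 1.1, 0.7, 2.3, 0.5]
n, s = len(x), sum(x)

def loglik(lam):
    # ln L(lambda) = n ln(lambda) - lambda * sum(x_i)
    return n * log(lam) - lam * s

# Grid search around the analytic MLE 1/x-bar
grid = [0.01 * k for k in range(1, 500)]
lam_hat = max(grid, key=loglik)
print(lam_hat, n / s)  # grid maximizer should be close to 1/x-bar
```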

8.1.4.
$$L(\lambda) = f_n(x;\lambda) = \prod_{i=1}^n \frac{\lambda^{x_i} e^{-\lambda}}{x_i!}.$$
Hence
$$\ln L(\lambda) = \sum x_i \ln\lambda - n\lambda - \sum \ln(x_i!),$$
and so
$$\frac{d}{d\lambda}\ln L(\lambda) = \frac{\sum x_i}{\lambda} - n = 0$$
yields
$$\hat\lambda = \frac{\sum x_i}{n} = \bar x_n.$$
The function L has a maximum at this value, because
$$\frac{d^2}{d\lambda^2}\ln L(\lambda) = -\frac{\sum x_i}{\lambda^2} < 0$$
at λ̂, provided that Σx_i > 0. (Since each x_i is a nonnegative integer, Σx_i ≥ 0 must hold. If Σx_i = 0, then λ̂ = 0 does not give a Poisson distribution, and so no MLE exists.)


8.1.5. a)
$$L(\lambda) = f_n(x;\lambda) = \prod_{i=1}^n \lambda x_i^{\lambda-1} = \lambda^n \prod_{i=1}^n x_i^{\lambda-1}.$$
Hence
$$\ln L(\lambda) = n\ln\lambda + (\lambda-1)\sum \ln x_i,$$
and so
$$\frac{d}{d\lambda}\ln L(\lambda) = \frac{n}{\lambda} + \sum \ln x_i = 0$$
yields the critical value
$$\hat\lambda = \frac{-n}{\sum \ln x_i}.$$
The function L has a maximum at this value, because
$$\frac{d^2}{d\lambda^2}\ln L(\lambda) = -\frac{n}{\lambda^2} < 0 \quad\text{at } \lambda = \hat\lambda.$$
b)
$$E(X) = \int_0^1 \lambda x^\lambda\,dx = \left[\frac{\lambda x^{\lambda+1}}{\lambda+1}\right]_0^1 = \frac{\lambda}{\lambda+1}.$$
Hence λE(X) + E(X) = λ, and so
$$\lambda = \frac{E(X)}{1 - E(X)}.$$

Thus, the method of moments gives
$$\hat\lambda = \frac{\bar x_n}{1 - \bar x_n}.$$

8.1.6.
$$f(x;\theta_1,\theta_2) = \frac{1}{\theta_2 - \theta_1} \quad\text{for } \theta_1 \le x \le \theta_2.$$
Thus,
$$L(\theta_1,\theta_2) = f_n(x;\theta_1,\theta_2) = \left(\frac{1}{\theta_2-\theta_1}\right)^n$$
for θ1 ≤ x_i ≤ θ2 for all i. The latter condition is equivalent to θ1 ≤ min{x1, x2, ..., xn} and max{x1, x2, ..., xn} ≤ θ2. Hence, L(θ1, θ2) will be maximal when θ2 − θ1 is as small as possible, that is, when θ2 is minimal and θ1 is maximal. This happens when θ̂1 = min{x1, x2, ..., xn} and θ̂2 = max{x1, x2, ..., xn}.
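The argument in 8.1.6 can be illustrated with a short sketch (hypothetical data, not from the text): the likelihood is (θ₂ − θ₁)⁻ⁿ when the interval covers all observations and 0 otherwise, so tightening the interval to [min xᵢ, max xᵢ] maximizes it.

```python
# Hypothetical uniform sample (illustrative only)
x = [3.2, 4.7, 3.9, 4.1, 3.5]

def likelihood(t1, t2):
    n = len(x)
    # Zero unless every observation lies in [t1, t2]
    if all(t1 <= xi <= t2 for xi in x):
        return (t2 - t1) ** (-n)
    return 0.0

theta1_hat, theta2_hat = min(x), max(x)
# Widening either endpoint can only lower the likelihood:
assert likelihood(theta1_hat, theta2_hat) > likelihood(theta1_hat - 0.5, theta2_hat)
assert likelihood(theta1_hat, theta2_hat) > likelihood(theta1_hat, theta2_hat + 0.5)
print(theta1_hat, theta2_hat)  # 3.2 4.7
```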


8.1.7. By Theorem 5.5.8, if Y = max{X1, X2, ..., Xn} for i.i.d. random variables with common d.f. F_X, then F_Y(y) = [F_X(y)]^n. Thus, in the present case, F_Y(y) = (y/θ)^n for 0 < y < θ and f_Y(y) = n y^{n−1}/θ^n. Therefore,
$$E(\hat\Theta) = \frac{n+1}{n}\int_0^\theta y\,\frac{n\,y^{n-1}}{\theta^n}\,dy = \frac{n+1}{\theta^n}\left[\frac{y^{n+1}}{n+1}\right]_0^\theta = \theta,$$

which shows that Θ̂ is an unbiased estimator of θ.

8.1.8. The required confidence intervals are given as in Example 8.1.7 by (x̄_n − cσ/√n, x̄_n + cσ/√n), with x̄_n = µ̂ = 20 and, according to the remark following Definition 8.1.4, with σ̂ = 4 in place of σ. Here c = Φ^{−1}((γ+1)/2) for confidence level γ, as given by Equation 8.27. Thus, for γ = .90 we have c = Φ^{−1}(1.90/2) ≈ 1.645 and confidence interval
$$\left(20 - 1.645\cdot\frac{4}{\sqrt{50}},\ 20 + 1.645\cdot\frac{4}{\sqrt{50}}\right) = (20 - 0.93,\ 20 + 0.93).$$

Similarly, for γ = .95, c = Φ^{−1}(1.95/2) ≈ 1.96 and the confidence interval is 20 ± 1.11, and for γ = .99, c = Φ^{−1}(1.99/2) ≈ 2.576 and the confidence interval is 20 ± 1.46.

8.1.9. This problem is an instance of the general case considered in Example 8.1.8. Here p̂ = 285/500 = 0.57 and √(p̂(1−p̂)/n) = √(0.57(1−0.57)/500) ≈ 0.022. We are given the successive γ values 0.90, 0.95 and 0.99, for each of which we need to solve the equation γ = 2Φ(c) − 1 to obtain c. Hence c = Φ^{−1}((γ+1)/2). Now, Φ^{−1}(1.90/2) ≈ 1.64, Φ^{−1}(1.95/2) ≈ 1.96, and Φ^{−1}(1.99/2) ≈ 2.58. So the required approximate confidence intervals (expressed in decimals rather than percents) are (0.57 − 1.64·0.022, 0.57 + 1.64·0.022) ≈ (0.534, 0.606), (0.57 − 1.96·0.022, 0.57 + 1.96·0.022) ≈ (0.527, 0.613), and (0.57 − 2.58·0.022, 0.57 + 2.58·0.022) ≈ (0.513, 0.627).
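The intervals in 8.1.8 and 8.1.9 can be reproduced with the standard library's NormalDist (a verification sketch, not part of the original solutions):

```python
from math import sqrt
from statistics import NormalDist

inv_phi = NormalDist().inv_cdf

# Exercise 8.1.8: x-bar = 20, sigma-hat = 4, n = 50
for gamma in (0.90, 0.95, 0.99):
    c = inv_phi((gamma + 1) / 2)
    h = c * 4 / sqrt(50)
    print(f"{gamma}: 20 +/- {h:.2f}")

# Exercise 8.1.9: p-hat = 285/500, SE = sqrt(p(1-p)/n)
p = 285 / 500
se = sqrt(p * (1 - p) / 500)
c = inv_phi(0.95)
print(round(p - c * se, 3), round(p + c * se, 3))  # ~(0.534, 0.606)
```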

8.1.10. As in Example 8.1.9, the normal approximation gives
$$P\left(\frac{\bar X_n - \mu}{\sigma/\sqrt n} < c\right) = \Phi(c)$$
for any c > 0, or, equivalently,
$$P\left(\bar X_n - c\,\frac{\sigma}{\sqrt n} < \mu\right) = \Phi(c).$$


Thus, with µ̂ = 533 − 520 = 13 = x̄_n and σ̂² = 2·60², the interval (µ̂ − cσ̂/√n, ∞) = (13 − c·60√2/10, ∞) is a γ = Φ(c) level confidence interval for µ. Setting 13 − c·60√2/10 = 0, we obtain c ≈ 1.53. This value corresponds to a γ = Φ(1.53) ≈ 0.94 confidence level for the (0, ∞) interval. In other words, we can have 94% confidence that the increase is real.

8.2.1. We use a large-sample Z-test. The null hypothesis is that the sample was selected from the student population with mean grade 66 and SD 24, that is, H0 is µ = 66. The alternative HA is that the students in the sample come from a different population, for which µ < 66. The test statistic is X̄, which we take to be approximately normal, because n is sufficiently large for the CLT to apply. The rejection region is the set (−∞, 53]. We compute the P-value as
$$P(\bar X \le 53 \mid H_0) = P\left(\frac{\bar X - 66}{24} \le \frac{53-66}{24}\right) \approx \Phi(-0.542) \approx 0.294.$$
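A one-line check of this P-value (standard library only):

```python
from statistics import NormalDist

# Exercise 8.2.1: P-value = Phi((53 - 66) / 24)
p_value = NormalDist().cdf((53 - 66) / 24)
print(round(p_value, 3))  # 0.294
```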

This probability is high enough for us to accept the null hypothesis, that is, the low average of this class is due to chance; these students may well come from a population with mean grade 66.

8.2.2. We use a large-sample Z-test for the mean weight µ of all the cows after the diet. We take H0: µ = 500 and HA: µ > 500. The test statistic is X̄, the mean weight of the cows in the sample after the diet. The rejection region is {x̄ ≥ 508}. We assume that X̄ is approximately normal with SD 25/√50 ≈ 3.5. We compute the P-value as
$$P(\bar X \ge 508 \mid H_0) = P\left(\frac{\bar X - 500}{3.5} \ge \frac{508-500}{3.5}\right) \approx 1 - \Phi(2.2857) \approx 0.011.$$

Thus, we reject the null hypothesis at the 5% level: the diet is probably effective. At the 1% level the decision is close: if we apply the test rigidly, we should accept H0. Probably more testing is needed; the improvement is slight and the decision might hinge on other factors, like the price and availability of the new diet.

8.2.3. We use a large-sample paired Z-test for the mean increase µ = µ2 − µ1 of the weights, with µ1 denoting the hypothetical mean weight of the cow population before the diet and µ2 that after the diet. We take H0: µ = 0 and HA: µ > 0. The test statistic is X̄, the mean weight increase of the cows in the sample. The rejection region is {x̄ ≥ 10}. We assume that X̄ is approximately normal with SD 20/√50 ≈ 2.83. We compute the P-value as
$$P(\bar X \ge 10 \mid H_0) = P\left(\frac{\bar X - 0}{2.83} \ge \frac{10}{2.83}\right) \approx 1 - \Phi(3.53) \approx 0.0002.$$


Thus, we reject the null hypothesis: the diet is very likely to be effective; however, the improvement is slight and the decision might hinge on other factors, like the price and availability of the new diet.

8.2.4. Let p denote the probability that any randomly selected person for this jury pool is black. The hypotheses are H0: p = 0.10 and HA: p < 0.10. The test statistic we use is the number X of blacks in the jury pool. The rejection region is {x = 0}. This X is binomial and the P-value is P(X = 0 | H0) = 0.90^50 ≈ 0.005. This is highly significant evidence against H0, that is, for discrimination.

8.2.5. The ages at first marriage of 83 Roman men were found to have a sample mean of 21.17 and sample standard deviation 5.47. Assuming a random sample, size 83 is sufficiently large for safely using the Z-test, in this case with σ = 5.47 and standard deviation 5.47/√83 ≈ 0.6 for X̄. We take the null hypothesis to be that the population mean is µ = 28, and the alternative that it is less. With the above assumptions, we can compute the P-value, that is, the probability that the sample mean turns out to be 21.17 or less if the population mean is 28, as
$$P(\bar X \le 21.17) = P\left(\frac{\bar X - 28}{0.6} \le \frac{21.17-28}{0.6}\right) \approx \Phi\left(\frac{21.17-28}{0.6}\right) \approx \Phi(-11.4) \approx 2\cdot 10^{-30}.$$
If the population mean is assumed to be 24, then the P-value is
$$P(\bar X \le 21.17) = P\left(\frac{\bar X - 24}{0.6} \le \frac{21.17-24}{0.6}\right) \approx \Phi(-4.7) \approx 1.3\cdot 10^{-6},$$
still a very low value. Thus, both null hypotheses µ = 28 and µ = 24 must be rejected with practical certainty, unless the assumptions can be shown to be invalid.

8.2.6. The test statistic we use is the number X of nondefective chips in the sample. This X is binomial and, under the assumption H0, it has parameters n = 50 and p0 = 0.99. The rejection region is of the form {x ≤ c} and, to obtain the P-value for the actual sample, we use c = 49. Thus, the P-value is P(X ≤ 49 | H0) = 1 − P(X = 50 | H0) ≈ 1 − 0.99^50 ≈ 0.395, and we accept H0, although the test leaves considerable doubt about the claim, but not enough to reject it. We need a better test.
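The two exact binomial P-values in 8.2.4 and 8.2.6 reduce to single powers and can be checked directly:

```python
# Exercise 8.2.4: P(X = 0) for binomial n = 50, p = 0.90 under H0
p_jury = 0.90 ** 50
print(round(p_jury, 3))  # ~0.005

# Exercise 8.2.6: P(X <= 49) = 1 - P(X = 50) for n = 50, p0 = 0.99
p_chips = 1 - 0.99 ** 50
print(round(p_chips, 3))  # ~0.395
```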


8.2.7. From the given data, σ_X̄ = σ/√n = 2/10, and so X̄ is approximately normal with mean µ0 and SD σ_X̄ = 0.2. Hence
$$P(|\bar X - \mu_0| > c) = P\left(\frac{|\bar X - \mu_0|}{0.2} > \frac{c}{0.2}\right) \approx P\left(|Z| > \frac{c}{0.2}\right) = 2\left(1 - \Phi\left(\frac{c}{0.2}\right)\right) = 0.05,$$
or Φ(c/0.2) = 0.975, and so c = 0.2·Φ^{−1}(0.975) ≈ 0.2·1.96 = 0.392.

8.3.1. a) Here a type 2 error means that we erroneously reject an effective drug.
b) Accepting the drug as effective means the same as rejecting H0. Thus, we want π(6.5) = P(X̄ ∈ C | µ = 6.5), which, from Equation 8.3.4, approximately equals Φ((6.65 − 6.5)/0.15) = Φ(1) ≈ 0.841. If µ = 6.5, then the drug has really reduced the duration of the cold from 7 to 6.5 days, and the test will correctly show with probability 0.841 that the drug works.

8.3.2. a) Here a type 2 error means that the coin is accepted as fair when, in fact, it is not.
b) Accepting the coin as fair means accepting H0. Thus, we want P(X̄ ∉ C | p = 0.55) = 1 − π(0.55), which, by Equation 8.60, equals
$$\Phi\left(\frac{0.598-0.55}{\sqrt{0.55(1-0.55)/100}}\right) - \Phi\left(\frac{0.402-0.55}{\sqrt{0.55(1-0.55)/100}}\right) \approx \Phi(0.9648) - \Phi(-2.9749) \approx 0.8327 - 0.0015 = 0.8312.$$
Thus, our test accepts a moderately unfair coin as fair with rather high probability. Not a very good test.

8.3.3. We use a large-sample Z-test. The null hypothesis is that the sample was selected from the student population with mean grade 66 and SD 24, that is, H0 is µ = 66. The alternative HA is that the students in the sample come from a different population, for which µ < 66. The test statistic is X̄, which we take to be approximately normal, because n is sufficiently large for


the CLT to apply. The rejection region is the set (−∞, c]. We choose c so that
$$\alpha = P(\bar X \le c \mid H_0) = P\left(\frac{\bar X - 66}{24} \le \frac{c-66}{24}\right) \approx \Phi\left(\frac{c-66}{24}\right) \approx 0.05.$$
Thus, solving this equation for c yields c = 66 + 24Φ^{−1}(0.05) ≈ 66 − 24·1.645 ≈ 26.5, and the rejection region is (−∞, 26.5]. The power function is given by π(µ) = P(X̄ ∈ C | µ) = P(X̄ ≤ 26.5 | µ) ≈ Φ((26.5 − µ)/24). The graph is given in Figure 8.1 below.

Fig. 8.1. Graph of the power function π(µ) for 20 ≤ µ ≤ 100; the vertical axis runs from 0 to 1.
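The critical value and power function of 8.3.3 can be reproduced numerically (a sketch; the three alternative values of µ are arbitrary illustration points):

```python
from statistics import NormalDist

nd = NormalDist()  # standard normal

# Exercise 8.3.3: rejection region (-inf, c] with alpha = 0.05
c = 66 + 24 * nd.inv_cdf(0.05)
print(round(c, 1))  # ~26.5

# Power pi(mu) = Phi((26.5 - mu) / 24) at a few illustrative alternatives
for mu in (30, 50, 66):
    print(mu, round(nd.cdf((26.5 - mu) / 24), 3))
```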

8.3.4. We assume θ0 = µ0 = 500, and an approximately normal T = X̄ with mean θ = µ and SD = 25/√n. With the rejection region of the form C = (c, ∞), we want to determine c and n such that α = .05 and β(515) = .05 as well. These conditions amount to
$$P(\bar X > c \mid \mu = 500) = P\left(\frac{\bar X - 500}{25/\sqrt n} > \frac{c-500}{25/\sqrt n}\,\Big|\, \mu = 500\right) \approx 1 - \Phi\left(\frac{c-500}{25/\sqrt n}\right) = .05$$
and
$$P(\bar X \le c \mid \mu = 515) = P\left(\frac{\bar X - 515}{25/\sqrt n} \le \frac{c-515}{25/\sqrt n}\,\Big|\, \mu = 515\right) \approx \Phi\left(\frac{c-515}{25/\sqrt n}\right) = .05.$$
Hence
$$\frac{c-500}{25/\sqrt n} = \Phi^{-1}(0.95) \approx 1.645 \quad\text{and}\quad \frac{c-515}{25/\sqrt n} = \Phi^{-1}(0.05) \approx -1.645.$$


Thus, c − 500 = 1.645·25/√n and c − 515 = −1.645·25/√n, and so c = 507.5 and n ≈ 30. The power function is
$$\pi(\mu) = P(\bar X \in C \mid \mu) = P(\bar X \ge 507.5 \mid \mu) \approx 1 - \Phi\left(\frac{507.5-\mu}{25/\sqrt{30}}\right),$$
with graph in Figure 8.2.

Fig. 8.2. Graph of the power function π(µ) for 500 ≤ µ ≤ 520; the vertical axis runs from 0 to 1.
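The sample-size calculation in 8.3.4 can be verified by adding the two equations for c, which eliminates c and gives n (a verification sketch):

```python
from statistics import NormalDist

z = NormalDist().inv_cdf(0.95)  # ~1.645

# Exercise 8.3.4: c - 500 = z*25/sqrt(n) and 515 - c = z*25/sqrt(n).
# Adding the equations: 15 = 2*z*25/sqrt(n), so
n = (2 * z * 25 / 15) ** 2
c = (500 + 515) / 2
print(round(c, 1), round(n))  # 507.5 and ~30
```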

8.3.5. Let X denote the number of nondefective chips. The rejection region is the set of integers C = {0, 1, 2, . . . , 10}. The operating characteristic function is
$$1 - \pi(p) = P(X \notin C \mid p) = \binom{12}{12}\,p^{12}(1-p)^0 + \binom{12}{11}\,p^{11}(1-p)^1,$$
and its plot is given in Figure 8.3. This is not a very good test: for instance, when the probability p of a chip's being nondefective is .8, the graph shows that the test still accepts the lot with the fairly high probability of about .3. We could improve the test by sampling more chips or by rejecting the lot if even one defective is found in the sample.

8.4.1. For the given data, we find x̄ = 949 and σ̂ ≈ 62.4. We use the t-distribution with 4 degrees of freedom. We want to find t such that P(T < t) = .975. From a t-table we obtain t ≈ 2.78, and so P(−2.78 < T < 2.78) = .95.
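The acceptance probability read off the graph in 8.3.5 can be computed exactly (a sketch using math.comb):

```python
from math import comb

# Exercise 8.3.5: acceptance probability 1 - pi(p) = P(X in {11, 12})
# for X binomial with n = 12 trials.
def accept_prob(p):
    return comb(12, 12) * p**12 + comb(12, 11) * p**11 * (1 - p)

print(round(accept_prob(0.8), 3))  # ~0.275, the "about .3" read off the graph
```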
Since Γ(n/2) = (n/2 − 1)Γ(n/2 − 1), we have
$$E\left(\frac{n}{\chi_n^2}\right) = n\cdot\frac{\Gamma(n/2-1)}{2\,\Gamma(n/2)} = \frac{n}{n-2} \quad\text{for } n > 2.$$
Hence,
$$E(F_{m,n}) = \frac{n}{n-2}.$$

8.6.4. The sample proportions P̂1 and P̂2 of successful decreases are binomial (divided by n) with expected values p̂1 = 38/70 ≈ 0.543 and p̂2 = 57/72 ≈ 0.792. We take H0: p2 = p1 and HA: p2 > p1. Thus, under H0, P̂2 − P̂1 is approximately normal with mean µ = 0 and
$$\hat\sigma = \sqrt{\bar p(1-\bar p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{\frac{95}{142}\left(1 - \frac{95}{142}\right)\left(\frac{1}{70} + \frac{1}{72}\right)} \approx 0.079.$$
Hence,
$$P(\hat P_2 - \hat P_1 > 0.792 - 0.543) \approx 1 - \Phi\left(\frac{0.249}{0.079}\right) \approx 1 - \Phi(3.15) \approx 0.0008.$$
The effect is highly significant.
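The pooled two-proportion Z-test of 8.6.4 in a few lines (verification sketch):

```python
from math import sqrt
from statistics import NormalDist

# Exercise 8.6.4: pooled two-proportion Z-test
p1, p2 = 38 / 70, 57 / 72
p_pool = 95 / 142
se = sqrt(p_pool * (1 - p_pool) * (1 / 70 + 1 / 72))
z = (p2 - p1) / se
p_value = 1 - NormalDist().cdf(z)
print(round(se, 3), round(z, 2), round(p_value, 4))  # 0.079 3.15 0.0008
```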

8.7.1. Use the large sample formula
$$P(D_n \ge d_n) \approx 2\sum_{k=1}^{\infty} (-1)^{k-1} e^{-2k^2 c^2}.$$
From c√(2/n) = d_n, that is, c = d_n√(n/2), we get c√(2/300) = 0.06, and so c ≈ 0.735. Thus,
$$P(D_n \ge d_n) \approx 2\sum_{k=1}^{\infty} (-1)^{k-1} e^{-2k^2\cdot 0.735^2} \approx 0.65,$$
and, this value being fairly large, we accept H0.

8.7.2.
$$F(x) = \begin{cases} 0 & \text{if } x < 20 \\ 2/10 & \text{if } 20 \le x < 40 \\ 3/10 & \text{if } 40 \le x < 50 \\ 5/10 & \text{if } 50 \le x < 70 \\ 8/10 & \text{if } 70 \le x < 80 \\ 1 & \text{if } x \ge 80 \end{cases}$$

with graph in Figure 8.4.

8.7.3. Since |F_m(x) − G_n(x)| has only a finite number of values, it does assume its supremum at some values of x; that is, its supremum is its maximum.

Fig. 8.4. Graph of the empirical distribution function F(x) of Exercise 8.7.2.

Also, since F_m(x) and G_n(x) are right-continuous step functions with jumps at the z_i, max|F_m(x) − G_n(x)| is assumed at every point of an interval [z_k, z_l) and, in particular, at z_k.

8.7.4. As in Example 7.7.2, we construct the following table to find d_n:

x_i     F(x_i)  F_n(x_i)  d_i^+   d_i^-
.002    .002    .05       .048    .002
.004    .004    .10       .096    −.046
.060    .060    .15       .090    −.04
.099    .099    .20       .101    −.051
.217    .217    .25       .033    .017
.288    .288    .30       .002    .048
.366    .366    .35       −.016   .066
.391    .391    .40       .009    .041
.428    .428    .45       .022    .028
.432    .432    .50       .068    −.018
.499    .499    .55       .051    −.001
.598    .598    .60       .002    .048
.602    .602    .65       .048    .002
.618    .618    .70       .082    −.032
.627    .627    .75       .123    −.073
.630    .630    .80       .170    −.12
.766    .766    .85       .084    −.034
.852    .852    .90       .048    .002
.852    .852    .95       .098    −.048
.939    .939    1         .061    −.011

Hence d_n = 0.170. In the table, the entry for n = 20 under P = .20 is .231. Since d_n is smaller than .231, the P-value is greater than .20. We accept the null hypothesis.

8.7.5. Use the large sample formula
$$P\left(D_{mn} \ge c\sqrt{\frac{m+n}{mn}}\right) \approx 2\sum_{k=1}^{\infty}(-1)^{k-1} e^{-2k^2c^2}.$$
From c√((m+n)/(mn)) = d_mn we get c√((200+300)/(200·300)) = 0.08, and so c ≈ 0.876. Thus,


$$P(D_{mn} \ge d_{mn}) \approx 2\sum_{k=1}^{\infty}(-1)^{k-1} e^{-2k^2\cdot 0.876^2} \approx 0.43.$$

We accept H0 . 8.7.6. As in Example 8.7.3, we construct the following table to find dmn : zi Fm (zi ) Gn (zi ) |Fm (zi ) − Gn (zi ) |

25 1/6 0 7/42

28 2/6 0 14/42

zi Fm (zi ) Gn (zi ) |Fm (zi ) − Gn (zi ) |

38 2/6 1/7 8/42

44 3/6 2/7 9/42

39 3/6 1/7 15/42 51 3/6 3/7 3/42

52 4/6 3/7 10/42 66 4/6 4/7 4/42

75 5/6 4/7 11/42

89 5/6 5/7 5/42

96 1 6/7 6/42

93 5/6 6/7 1/42

98 1 1 0/42

Thus, dmn = 15/42. The critical value at m = 6 and n = 7 in the two-sample K-S table for α = 0.05 is 30/42. Since dmn is less than this, again we accept the null hypothesis at the 5% level. In fact, the P-value is still much higher than 0.05. 8.8.1. We can write y = " a + "b (x − x) = " a − "bx + "bx, and so " c=" a − "bx. Hence " =E A " − Bx " = a − bx, E C
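The Kolmogorov–Smirnov computations in 8.7.1, 8.7.5, and 8.7.6 can all be verified with one short script (a sketch; the two samples below are read off the jump points of F_m and G_n in the table above):

```python
from fractions import Fraction
from math import exp, sqrt

def ks_tail(c, terms=100):
    """Two-sided K-S tail probability: 2 * sum_{k>=1} (-1)^(k-1) e^{-2 k^2 c^2}."""
    return 2 * sum((-1) ** (k - 1) * exp(-2 * k * k * c * c)
                   for k in range(1, terms + 1))

# 8.7.1: n = 300, d_n = 0.06, c = d_n * sqrt(n/2)
print(round(ks_tail(0.06 * sqrt(300 / 2)), 2))      # ~0.65

# 8.7.5: m = 200, n = 300, d_mn = 0.08, c = d_mn / sqrt((m+n)/(m*n))
print(round(ks_tail(0.08 / sqrt(500 / 60000)), 2))  # ~0.43

# 8.7.6: d_mn computed from the two samples
xs = [25, 28, 39, 52, 75, 96]        # m = 6
ys = [38, 44, 51, 66, 89, 93, 98]    # n = 7

def ecdf(sample, z):
    return Fraction(sum(v <= z for v in sample), len(sample))

d_mn = max(abs(ecdf(xs, z) - ecdf(ys, z)) for z in sorted(xs + ys))
print(d_mn)  # 5/14, i.e. 15/42
```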

and
$$Var(\hat C) = Var(\hat A) + \bar x^2\,Var(\hat B) = \frac{\sigma^2}{n} + \bar x^2\,\frac{\sigma^2}{n\hat\sigma_1^2} = \frac{\sigma^2}{n}\left(1 + \frac{\bar x^2}{\hat\sigma_1^2}\right),$$
$$Cov(\hat B, \hat C) = Cov(\hat B, \hat A - \hat B\bar x) = Cov(\hat A, \hat B) - \bar x\,Var(\hat B) = -\frac{\bar x\sigma^2}{n\hat\sigma_1^2}.$$
The form y = ĉ + b̂x for the regression line is less desirable than the form y = â + b̂(x − x̄), because the nonzero covariance (unless x̄ = 0) implies that B̂ and Ĉ are not independent, whereas Â and B̂ are.

8.8.2. Σ(x_i − x̄)(y_i − ȳ) = Σx_i(y_i − ȳ) − x̄Σ(y_i − ȳ) = Σx_i(y_i − ȳ) − x̄·0 = Σx_i(y_i − ȳ), which implies the statement.


8.8.3.
$$SD(\hat B) = \frac{\hat\sigma_2\sqrt{1-\hat\rho^2}}{\sqrt n\,\hat\sigma_1} = \frac{2.7\sqrt{1-0.51^2}}{\sqrt{5721}\cdot 10.8} = 0.0028$$
and
$$SD(\hat C) = \frac{\hat\sigma_2}{\sqrt n}\sqrt{1 + \frac{\bar x^2(1-\hat\rho^2)}{n\hat\sigma_1^2}} = \frac{2.7}{\sqrt{5721}}\sqrt{1 + \frac{52.4^2(1-0.51^2)}{5721\cdot 10.8^2}} = 0.036.$$

and the estimate for the SD is $ σ "=

2

5130 − 70

1−

5320 − 70 · 72 19.39 · 15.166

2

282 1.2 + 5 · (5560 − 722 )

%1/2

≈ 5.89.

# # Hence c ≈ 5.89 · 5/3 · t.90 (3) ≈ 5.89· 5/3· 1.64 ≈ 12.5 and an approximate 80% confidence interval for the predicted y-score at x0 = 100 is 91 ± 12.5. 8.8.5. We obtain the regression equation of X on Y by switching the roles of X and Y in y=µ "2 + ρ"

and so it is
$$x = \hat\mu_1 + \hat\rho\,\frac{\hat\sigma_1}{\hat\sigma_2}(y - \hat\mu_2) = 70 + \frac{5140 - 70^2}{\sqrt{(5130-70^2)(5220-70^2)}}\cdot\frac{\sqrt{5220-70^2}}{\sqrt{5130-70^2}}\,(y - 70) = 70 + \frac{24}{23}(y-70),$$
and the predicted x-score for y0 = 100 is
$$x_0 = 70 + \frac{24}{23}(100-70) \approx 101.3.$$


Thus we see that in the original units there is no regression effect in this case, because x0 − 70 = 31.3 > y0 − 70 = 30. This “anomaly” has occurred because ρ̂σ̂1 > σ̂2, but in standard units we do have regression here too:
$$\frac{y_0 - \hat\mu_2}{\hat\sigma_2} = \frac{100-70}{\sqrt{5130-70^2}} = 1.9781 > \frac{x_0 - \hat\mu_1}{\hat\sigma_1} = \frac{101.3-70}{\sqrt{5220-70^2}} = 1.7497.$$
The SD of the predicted x-score is
$$\hat\sigma = \left[\hat\sigma_1^2\left(1-\hat\rho^2\right)\left(1 + \frac{1}{n} + \frac{(y_0-\bar y)^2}{n\hat\sigma_2^2}\right)\right]^{1/2} = \left[(5220-70^2)\left(1 - \frac{(5140-70^2)^2}{(5130-70^2)(5220-70^2)}\right)\left(1.2 + \frac{30^2}{5(5130-70^2)}\right)\right]^{1/2} \approx 11.74.$$
So c ≈ 11.74·√(5/3)·t_{.95}(3) ≈ 11.74·√(5/3)·2.35 ≈ 35.6, and an approximate 90% confidence interval for the predicted x-score is the interval 101.3 ± 35.6.

8.8.6. We obtain the regression equation of X on Y by switching the roles of X and Y in
$$y = \hat\mu_2 + \hat\rho\,\frac{\hat\sigma_2}{\hat\sigma_1}(x - \hat\mu_1),$$

and so it is
$$x = \hat\mu_1 + \hat\rho\,\frac{\hat\sigma_1}{\hat\sigma_2}(y - \hat\mu_2) = 72 + \frac{5320 - 70\cdot 72}{\sqrt{(5130-70^2)(5560-72^2)}}\cdot\frac{\sqrt{5560-72^2}}{\sqrt{5130-70^2}}\,(y-70) \approx 72 + 0.952(y-70),$$
and the predicted x-score for y0 = 100 is
$$x_0 \approx 72 + 0.952(100-70) \approx 100.56.$$
The SD of the predicted x-score is
$$\hat\sigma = \left[\hat\sigma_1^2\left(1-\hat\rho^2\right)\left(1 + \frac{1}{n} + \frac{(y_0-\bar y)^2}{n\hat\sigma_2^2}\right)\right]^{1/2} = \left[(5560-72^2)\left(1 - \frac{(5320-70\cdot 72)^2}{(5130-70^2)(5560-72^2)}\right)\left(1.2 + \frac{30^2}{5(5130-70^2)}\right)\right]^{1/2} \approx 8.35.$$


So c ≈ 8.35·√(5/3)·t_{.90}(3) ≈ 8.35·√(5/3)·1.638 ≈ 17.66, and an approximate 80% confidence interval for the predicted x-score is 100.56 ± 17.66.

8.8.7. Let x_i = 0 for i = 1, 2, . . . , n − 1 and x_n = 1. Then
$$\hat\sigma_1^2 = \frac{1}{n} - \left(\frac{1}{n}\right)^2 = \frac{n-1}{n^2}$$
and
$$Var(\hat B) = \frac{\sigma^2}{n\hat\sigma_1^2} = \frac{n\sigma^2}{n-1}.$$
Hence
$$P(|\hat B - b| < \varepsilon) = P\left(\frac{\sqrt{n-1}\,|\hat B - b|}{\sqrt n\,\sigma} < \frac{\varepsilon\sqrt{n-1}}{\sqrt n\,\sigma}\right) = 2\Phi\left(\frac{\varepsilon\sqrt{n-1}}{\sqrt n\,\sigma}\right) - 1 \to 2\Phi\left(\frac{\varepsilon}{\sigma}\right) - 1$$
as n → ∞, which is less than 1 for every fixed ε > 0; hence B̂ is not a consistent estimator of b for this choice of the x_i.