Connecting Discrete Mathematics and Computer Science (Instructor Solution Manual, Solutions) [2 ed.] 1009150499, 9781009150491

English Pages 248 Year 2022

Table of contents :
Basic Data Types
Logic
Proofs
Mathematical Induction
Analysis of Algorithms
Number Theory
Relations
Counting
Probability
Graphs and Trees


Connecting Discrete Mathematics and Computer Science Solution Manual

This version: January 21, 2022

David Liben-Nowell

2 Basic Data Types

2.2 Booleans, Numbers, and Arithmetic

2.1 112 and 201
2.2 111 and 201
2.3 18 and 39
2.4 17 and 38
2.5 Yes: there's no fractional part in either x or y, so when they're summed there's still no fractional part.
2.6 Yes: if we can write x = a/b and y = c/d with b ≠ 0 and d ≠ 0, then we can write x + y as (ad + cb)/(bd) (and bd ≠ 0 because neither b = 0 nor d = 0).
2.7 No, and here's a counterexample: π and 1 − π are both irrational numbers, but π + (1 − π) = 1, which is rational.
2.8 6, because ⌊2.5⌋ = 2 and ⌈3.75⌉ = 4 (and 2 + 4 = 6).
2.9 3, because ⌊3.14159⌋ = 3 and ⌈0.87853⌉ = 1 (and 3 · 1 = 3).
2.10 3^4 = 3 · 3 · 3 · 3 = 81, because ⌊3.14159⌋ = 3 and ⌈3.14159⌉ = 4.
2.11 The functions floor and truncate differ on negative numbers. For any real number x, we have ⌊x⌋ ≤ x—even if x is negative. That's not true for truncation. For example, ⌊−3.14159⌋ = −4 but trunc(−3.14159) = −3.
2.12 ⌊x + 0.5⌋
2.13 0.1 · ⌊10x + 0.5⌋
2.14 10^−k · ⌊10^k · x + 0.5⌋
2.15 10^−k · ⌊10^k · x⌋
2.16 If x is an integer, then ⌊x⌋ + ⌈x⌉ = 2x; if x isn't an integer, then ⌊x⌋ + ⌈x⌉ = 2 + 3 = 5 for any x in [2, 3]. Thus the expression x − (⌊x⌋ + ⌈x⌉)/2 is equivalent to x − x = 0 for x = 2 or 3 (yielding values 0 and 0), and x − 2.5 for noninteger x. So the largest possible value for this expression is 0.4999999 · · · , when x = 2.9999999 · · · .
2.17 By the same logic as in Exercise 2.16, the smallest value of the expression x − (⌊x⌋ + ⌈x⌉)/2 is −0.5 + ϵ, which occurs for x = 2 + ϵ, for arbitrarily small values of ϵ. For example, the value is −0.49999999 when x = 2.00000001.
2.18 ⌊x⌋
2.19 ⌈x⌉
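The rounding recipes in 2.12–2.14, and the floor-versus-truncation distinction in 2.11, can be spot-checked in Python (a quick sketch; the helper names here are ours, not the book's):

```python
import math

def round_nearest(x):
    # Exercise 2.12: round x to the nearest integer via floor(x + 0.5).
    return math.floor(x + 0.5)

def round_to_digits(x, k):
    # Exercise 2.14: round x to k digits after the decimal point.
    return 10**(-k) * math.floor(10**k * x + 0.5)

# Exercise 2.11: floor and truncation differ on negative numbers.
print(math.floor(-3.14159))  # -4
print(math.trunc(-3.14159))  # -3
print(round_nearest(2.5))    # 3
```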


2.20 ⌈x⌉
2.21 ⌊x⌋
2.22 No: for example, |⌊−3.5⌋| = |−4| = 4, but ⌊|−3.5|⌋ = ⌊3.5⌋ = 3.
2.23 Yes: we have that x − ⌊x⌋ = (x + 1) − ⌊x + 1⌋ because both x and x + 1 have the same fractional part (so both x − ⌊x⌋ and (x + 1) − ⌊x + 1⌋ denote the fractional part of x).
2.24 No: if x = y = 0.5, then ⌊x⌋ + ⌊y⌋ = ⌊0.5⌋ + ⌊0.5⌋ = 0 + 0 = 0, but ⌊x + y⌋ = ⌊0.5 + 0.5⌋ = ⌊1⌋ = 1.
2.25 The key observation is that ⌈x⌉ = ⌊x⌋ when x is an integer (and otherwise ⌈x⌉ = ⌊x⌋ + 1). Thus 1 + ⌊x⌋ − ⌈x⌉ is 1 when x is an integer, and it's 0 when x is not an integer.
2.26 Observe that ⌊(n + 1)/2⌋ = ⌈n/2⌉: it's true when n is even, and it's true when n is odd. Thus there are ⌈n/2⌉ − 1 elements less than the specified entry, and n − (⌈n/2⌉ − 1 + 1) = n − ⌈n/2⌉ = ⌊n/2⌋ elements larger.
2.27 3^10 is bigger: 3^10 > 3^9 = (3^3)^3 = 27^3, which is definitely (much) larger than 10^3.
2.28 4^8 = (2^2)^8 = 2^16 = 65,536
2.29 1/2^16 = 1/65,536, which is (by calculator) 0.00001525878.
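The floor/ceiling identities in 2.25 and 2.26 are easy to spot-check numerically (a quick sketch):

```python
import math

# Exercise 2.25: 1 + floor(x) - ceil(x) is 1 exactly when x is an integer.
for x in [2.0, 2.5, -3.0, -3.7]:
    print(x, 1 + math.floor(x) - math.ceil(x))

# Exercise 2.26: floor((n+1)/2) == ceil(n/2) for every integer n.
assert all(math.floor((n + 1) / 2) == math.ceil(n / 2) for n in range(-100, 101))
```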

2.30 65,536
2.31 −4 · 65,536 = −262,144
2.32 4, because 4^4 = 256.
2.33 8^(1/4) ≈ 1.6818, because 1.6818^4 ≈ 8.0001. (Alternatively, 8^(1/4) = (2^3)^(1/4) = 2^(3/4).)
2.34 512^(1/4) ≈ 4.7568, because 4.7568^4 ≈ 511.9877. (Alternatively, 8^(3/4) = (8^3)^(1/4) = (2^9)^(1/4) = 2^(9/4).)
2.35 Undefined (because we don't include imaginary numbers in this book): there's no real number x for which x^4 = −9.
2.36 log_2 8 = 3, because 2^3 = 8.
2.37 log_2 (1/8) = −3, because 2^−3 = 1/2^3 = 1/8.
2.38 log_8 2 = 1/3, because 8^(1/3) = 2 (which is true because 2^3 = 8).
2.39 log_{1/8} 2 = −1/3, because (1/8)^(−1/3) = 8^(1/3) = 2.
2.40 log_10 17 is larger than 1 (because 10^1 < 17), but log_17 10 is smaller than 1 (because 17^1 > 10). So log_10 17 is larger.
2.41 By the definition of logarithms, log_b 1 is the real number y such that b^y = 1. By Theorem 2.8.1, for any real number b we have b^0 = 1, so log_b 1 = 0.
2.42 By the definition of logarithms, log_b b is the real number y such that b^y = b. By Theorem 2.8.2, for any real number b we have b^1 = b, so log_b b = 1.
2.43 Let q = log_b x and r = log_b y. By the definition of logarithms, then, b^q = x and b^r = y. Using Theorem 2.8.3, we have x · y = b^q · b^r = b^(q+r). Because xy = b^(q+r), we have by the definition of logs that log_b xy = q + r.
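The logarithm identities in 2.41–2.43 can be spot-checked numerically with math.log (a quick sketch, up to floating-point error):

```python
import math

b, x, y = 2.0, 8.0, 32.0

# 2.41 and 2.42: log_b 1 = 0 and log_b b = 1.
print(math.log(1, b))  # 0.0
print(math.log(b, b))  # 1.0

# 2.43: log_b(xy) = log_b x + log_b y.
print(math.log(x * y, b), math.log(x, b) + math.log(y, b))
```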


2.44 Let q = log_b x. By the definition of logarithms, we have b^q = x. Thus, raising both sides to the yth power, we have (b^q)^y = x^y. We can rewrite (b^q)^y = b^(qy) using Theorem 2.8.4, so x^y = b^(qy). Again using the definition of logarithms, therefore log_b x^y = qy. By the definition of q, this value is y · log_b x.
2.45 Let q = log_c b and r = log_b x, so that c^q = b and b^r = x. Then x = (c^q)^r = c^(qr) by the definition of logs and Theorem 2.8.4, and therefore log_c x = qr = (log_c b) · (log_b x); rearranging yields the claim.
2.46 We show that b^(log_b x) = x by showing that log_b of the two sides are equal:
  log_b [b^(log_b x)] = [log_b x] · log_b b     by Theorem 2.10.5
                      = [log_b x] · 1           by Theorem 2.10.2
                      = log_b x.
2.47 We show that n^(log_b a) = a^(log_b n) by showing that log_b of the two sides are equal:
  log_b [n^(log_b a)] = [log_b a] · log_b n     by Theorem 2.10.5
                      = [log_b n] · log_b a     x · y = y · x for any x and y
                      = log_b [a^(log_b n)].    by Theorem 2.10.5, applied "backward"
2.48 The property log_b (x/y) = log_b x − log_b y follows directly from the rule for the log of a product:
  log_b (x/y) = log_b (x · y^−1)                x/y = x · y^−1 for any x and y
              = log_b x + log_b (y^−1)          by Theorem 2.10.3
              = log_b x + (−1) · log_b y.       by Theorem 2.10.5
2.49 The hyperceiling of n can be defined as ⌈n⌉ = 2^⌈log_2 n⌉.
2.50 The number of columns needed to write down n in standard decimal notation is
  ⌈log_10 (n + 1)⌉          if n > 0
  1                         if n = 0
  1 + ⌈log_10 (−n + 1)⌉     if n < 0,
where the "1 +" in the last case accounts for the negative sign, which takes up a column.
2.51 202 mod 2 = 0, because 202 = 0 + 101 · 2.
2.52 202 mod 3 = 1, because 202 = 1 + (67 · 3).
2.53 202 mod 10 = 2, because 202 = 2 + (20 · 10).
2.54 −202 mod 10 = 8, because −202 = 8 + (−21 · 10).
2.55 17 mod 42 = 17, because 17 = 17 + (0 · 42).
2.56 42 mod 17 = 8, because 42 = 8 + (2 · 17).
2.57 17 mod 17 = 0, because 17 = 0 + (1 · 17).
2.58 −42 mod 17 = 9, because −42 = 9 + (−3 · 17).
2.59 −42 mod 42 = 0, because −42 = 0 + (−1 · 42).
2.60 Here is a definition of the behavior of % in Python that's consistent with the reported behavior:


• If k > 0, then n mod k is n − k · ⌊n/k⌋.
• If k < 0, then n mod k is −((−n) mod (−k)) = n − k · ⌊n/k⌋.
(And Python generates a ZeroDivisionError if k = 0.)
2.61 30
2.62 1
2.63 10
2.64 68
2.65 53
2.66 A solution in Python is shown in Figure S.2.1 on p. 7.
2.67 A solution in Python is shown in Figure S.2.2 on p. 7.
2.68 A solution in Python is shown in Figure S.2.3 on p. 7. The numbers produced as output are: 4, 9, 25, 49, 121, 169, 289, 361, 529, 841, and 961. Note that these output values are 2^2, 3^2, 5^2, 7^2, 11^2, 13^2, . . . , 31^2—the squares of all sufficiently small prime numbers. For a prime number p, the three factors of p^2 are 1, p, and p^2 (and p^2 has no other factor).
2.69 6 + 6 + 6 + 6 + 6 + 6 = 36
2.70 1 + 4 + 9 + 16 + 25 + 36 = 91
2.71 4 + 16 + 64 + 256 + 1024 + 4096 = 5460
2.72 1(2) + 2(4) + 3(8) + 4(16) + 5(32) + 6(64) = 642
2.73 (1 + 2) + (2 + 4) + (3 + 8) + (4 + 16) + (5 + 32) + (6 + 64) = 147
2.74 6 · 6 · 6 · 6 · 6 · 6 = 46,656
2.75 1 · 4 · 9 · 16 · 25 · 36 = 518,400
2.76 2^2 · 2^4 · 2^6 · 2^8 · 2^10 · 2^12 = 2^42 = 4,398,046,511,104
2.77 1(2) · 2(4) · 3(8) · 4(16) · 5(32) · 6(64) = 1,509,949,440
2.78 (1 + 2) · (2 + 4) · (3 + 8) · (4 + 16) · (5 + 32) · (6 + 64) = 10,256,400
2.79 21 + 42 + 63 + 84 + 105 + 126 = 441
2.80 21 + 40 + 54 + 60 + 55 + 36 = 266
2.81 1 + 6 + 18 + 40 + 75 + 126 = 266
2.82 8 + 14 + 18 + 20 + 20 + 18 + 14 + 8 = 120
2.83 36 + 35 + 33 + 30 + 26 + 21 + 15 + 8 = 204
2.84 44 + 49 + 51 + 50 + 46 + 39 + 29 + 16 = 324
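The n − k · ⌊n/k⌋ characterization of Python's % above can be checked directly against the operator itself (a quick sketch):

```python
import math

def mod_by_definition(n, k):
    # n mod k as characterized above: n - k * floor(n / k), for k != 0.
    return n - k * math.floor(n / k)

# Spot-check positive and negative n and k against Python's own % operator.
for n, k in [(202, 10), (-202, 10), (42, 17), (-42, 17), (17, -5), (-17, -5)]:
    assert mod_by_definition(n, k) == n % k
print("all cases agree")
```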

# Determine whether a given positive integer n is prime by testing all possible divisors
# between 2 and n-1. Use your program to find all prime numbers less than 202.

def isPrime(n):
    for d in range(2, n):
        if n % d == 0:
            return False
    return True

for n in range(2, 202):
    if isPrime(n):
        print(n)

# When executed, the outputted numbers are:
# 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103
# 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199

Figure S.2.1 A Python program testing whether a given positive integer n is prime.

# A perfect number is a positive integer n that has the following property:
# n is equal to the sum of all positive integers k < n that evenly divide n.
# Write a program that finds the four smallest perfect numbers.

def perfect(n):
    '''Is n a perfect number?'''
    factorSum = 0
    for d in range(1, n):
        if n % d == 0:
            factorSum += d
    return factorSum == n

perfect_numbers = []
n = 1
while len(perfect_numbers) < 4:
    if perfect(n):
        perfect_numbers.append(n)
    n += 1
print(perfect_numbers)

# Output: [6, 28, 496, 8128]

Figure S.2.2 A Python program testing whether a given positive integer n is a perfect number.

# Find all integers between 1 and 1000 that are evenly divisible by *exactly three* integers.

def threeFactors(n):
    factorCount = 0
    for d in range(1, n + 1):
        if n % d == 0:
            factorCount += 1
    return factorCount == 3

for n in range(1001):
    if threeFactors(n):
        print(n)

Figure S.2.3 A Python program to find all integers between 1 and 1000 that are evenly divisible by exactly three integers.


2.85 (1^1 + 2^1 + 3^1 + 4^1) + (1^2 + 2^2 + 3^2 + 4^2) + (1^3 + 2^3 + 3^3 + 4^3) + (1^4 + 2^4 + 3^4 + 4^4), which is equal to (1 + 2 + 3 + 4) + (1 + 4 + 9 + 16) + (1 + 8 + 27 + 64) + (1 + 16 + 81 + 256), or 494.
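The double summation in 2.85 can be sanity-checked in a couple of lines (a quick sketch):

```python
# Sum of i**j over j in {1,...,4} and i in {1,...,4}, as in Exercise 2.85.
total = sum(i**j for j in range(1, 5) for i in range(1, 5))
print(total)  # 494
```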

2.3 Sets: Unordered Collections

2.86 Yes, 6 ∈ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f}.
2.87 No, h ∉ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f}.
2.88 No, the only elements of H are single characters, so a70e ∉ H.
2.89 |H| = 16
2.90 For the set S = {0 + 0, 0 + 1, 1 + 0, 1 + 1, 0 · 0, 0 · 1, 1 · 0, 1 · 1}:
• 0 + 0 and 0 · 0 and 0 · 1 and 1 · 0 are all equal to 0.
• 0 + 1 and 1 + 0 and 1 · 1 are all equal to 1.
• 1 + 1 is equal to 2.
So 0, 1, and 2 are elements of S, but 3 is not.
2.91 |S| = 3: although there are eight different descriptions of elements in Exercise 2.90, only three distinct values result and so there are exactly three elements in the set.
2.92 4 is in H but not in T, because 4 mod 2 = 0 but 4 mod 3 ≠ 0. Other elements in H but not T are 2, 3, 5, 8, 9, a, b, c, d, e, and f.
2.93 13 is an element of T, because 13 mod 2 = 1 = 13 mod 3. Other elements in T but not H are 12, 18, and 19.
2.94 The elements of T that are not in S are: 6, 7, 12, 13, 18, and 19.
2.95 The only element of S that is not in T is 2.
2.96 |T| = 8
2.97 {0, 1, 2, 3, 4}
2.98 {}
2.99 {4, 5, 9}
2.100 {0, 1, 3, 4, 5, 7, 8, 9}
2.101 {1, 3, 7, 8}
2.102 {0}
2.103 {1, 2, 3, 6, 7, 8}
2.104 {1, 2, 3, 4, 5, 7, 8, 9}
2.105 {4, 5}


2.106 C − ∼C is just C itself: {0, 3, 6, 9}.
2.107 C − ∼A is {3, 9}, so ∼(C − ∼A) is {0, 1, 2, 4, 5, 6, 7, 8}.
2.108 Yes, it's possible: if E = Y, then we have E − Y = Y − E = ∅. (But if E ≠ Y, then there's some element x in one set but not the other—and if x ∈ E but x ∉ Y, then x ∈ E − Y but x ∉ Y − E, so E − Y ≠ Y − E. A similar situation occurs if there's a y ∈ Y but y ∉ E.)
2.109 D ∪ E may be a subset of D: for D1 = E1 = {1}, we have D1 ∪ E1 = {1} ⊆ D1; for D2 = ∅ and E2 = {1}, we have D2 ∪ E2 = {1} ⊄ D2.
2.110 D ∩ E must be a subset of D: every element of D ∩ E is by definition an element of both D and E, which means that any x ∈ D ∩ E must by definition satisfy x ∈ D.
2.111 D − E must be a subset of D: every element of D − E is by definition an element of D (but not of E), which means that any x ∈ D − E must by definition satisfy x ∈ D.
2.112 E − D contains no elements in D, so no element x ∈ E − D satisfies x ∈ D—but if E − D is empty, then E − D is a subset of every set! For D1 = E1 = {1}, we have E1 − D1 = {} ⊆ D1; for D2 = ∅ and E2 = {1}, we have E2 − D2 = {1} ⊄ D2. Thus E − D may be a subset of D.
2.113 ∼D contains no elements in D, so no element x ∈ ∼D satisfies x ∈ D—but if ∼D is empty, then ∼D is a subset of every set, including D! For D1 = U (where U is the universe), we have ∼D1 = {} ⊆ D1; for D2 = ∅, we have ∼D2 = U ⊄ D2. Thus ∼D may be a subset of D.
2.114 Not disjoint: 1 ∈ F and 1 ∈ G.
2.115 Not disjoint: 3 ∈ ∼F and 3 ∈ G.
2.116 Disjoint: F ∩ G = {1} and 1 ∉ H.
2.117 Disjoint: by definition, no element is in both a set and the complement of that set.
2.118 S ∪ T is smallest when one set is a subset of the other. Specifically, if S = {1, 2, . . . , n} and T = {1, 2, . . . , m}, then S ∪ T = {1, 2, . . . , max(n, m)} and has cardinality max(n, m).
2.119 S ∩ T is smallest when the two sets are disjoint. Specifically, if S = {1, 2, . . . , n} and T = {−1, −2, . . . , −m}, then S ∩ T = ∅ and has cardinality 0.
2.120 S − T is smallest when as many elements of S as possible are also in T. Specifically, if S = {1, 2, . . . , n} and T = {1, 2, . . . , m}, then S − T = {m + 1, m + 2, . . . , n} if n ≥ m + 1, and otherwise S − T = ∅ and has cardinality 0.
2.121 S ∪ T is largest when the two sets are disjoint. Specifically, if S = {1, 2, . . . , n} and T = {n + 1, n + 2, . . . , n + m}, then S ∪ T = {1, 2, . . . , n + m} and has cardinality n + m.
2.122 S ∩ T is largest when one set is a subset of the other. Specifically, if S = {1, 2, . . . , n} and T = {1, 2, . . . , m}, then S ∩ T = {1, 2, . . . , min(n, m)} and has cardinality min(n, m).
2.123 S − T is largest when no element of T is also in S, so that S − T = S. Specifically, if S = {1, 2, . . . , n} and T = {−1, −2, . . . , −m}, then S − T = {1, 2, . . . , n} and has cardinality n.
2.124 |A ∩ B| = 2, |A ∩ C| = 1, and |B ∩ C| = 1.
2.125 We have |A ∩ B| = 2, |A ∩ C| = 1, and |B ∩ C| = 1; and furthermore |A ∪ B| = 5, |A ∪ C| = 3, and |B ∪ C| = 4. Thus the Jaccard similarity of A and B is 2/5 = 0.4; the Jaccard similarity of A and C is 1/3 ≈ 0.33; and the Jaccard similarity of B and C is 1/4 = 0.25.
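The Jaccard computations in 2.125 can be reproduced in Python. (The sets below are hypothetical stand-ins, chosen only so that their intersection and union sizes match the cardinalities in the solution; the exercise's actual sets are not restated here.)

```python
def jaccard(s, t):
    # |s ∩ t| / |s ∪ t|
    return len(s & t) / len(s | t)

# Hypothetical sets with |A ∩ B| = 2, |A ∪ B| = 5, etc., matching the solution above.
A = {1, 2, 3}
B = {1, 2, 4, 5}
C = {1}

print(jaccard(A, B))  # 0.4
print(jaccard(B, C))  # 0.25
```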


2.126 No. If A = {1} and B = {1, 2, 3} and C = {2, 3, 4}, then A's closest set is B but B's closest set is C, not A.
2.127 Still no! If A = {1} and B = {1, 2, 3} and C = {2, 3, 4}, then A's closest set is B (a Jaccard coefficient of 1/3 for B, versus 0/4 for C) but B's closest set is C (a Jaccard coefficient of 2/4), not A (a Jaccard coefficient of 1/3).

2.128 True. Comparing the Venn diagrams for ∼A and ∼B [diagrams not reproduced in this copy], their union includes every element not in A ∩ B. Thus ∼A ∪ ∼B contains precisely those elements not in A ∩ B, and therefore A ∩ B = ∼(∼A ∪ ∼B).
2.129 True. Comparing the Venn diagrams for ∼A and ∼B [diagrams not reproduced in this copy], their intersection includes every element not in A ∪ B. Thus ∼A ∩ ∼B contains precisely those elements not in A ∪ B, and therefore A ∪ B = ∼(∼A ∩ ∼B).
2.130 No, (A − B) ∪ (B − C) and (A ∪ B) − C are not the same. The first set includes elements of A ∩ ∼B ∩ C; the latter doesn't. [Venn diagrams of (A − B) ∪ (B − C) and (A ∪ B) − C not reproduced in this copy.] For example, suppose A = C = {1} and B = ∅. Then (A − B) ∪ (B − C) = ({1} − ∅) ∪ (∅ − {1}) = {1} − ∅ = {1}, but (A ∪ B) − C = ({1} ∪ ∅) − {1} = {1} − {1} = ∅.
2.131 Yes, (B − A) ∩ (C − A) and (B ∩ C) − A are the same. The first set contains any element that's in B but not A and also in C but not A—in other words, an element in B and C but not A. That's just the definition of (B ∩ C) − A. [Venn diagram not reproduced in this copy.]

2.132 There are five ways of partitioning {1, 2, 3}: • All separate: {{1} , {2} , {3}}. • 1 alone; 2 and 3 together: {{1} , {2, 3}}.


• 2 alone; 1 and 3 together: {{2}, {1, 3}}.
• 3 alone; 1 and 2 together: {{3}, {1, 2}}.
• All together: {{1, 2, 3}}.
2.133 A partition that does it is {{Alice, Bob, Desdemona}, {Charlie}, {Eve, Frank}}. The ABD set has intracluster distances {0.0, 1.7, 0.8, 1.1}; the EF set has intracluster distances {0.0, 1.9}; and the only intracluster distance in the C set is 0.0. The largest of these distances is less than 2.0.
2.134 Just put each person in their own subset. The only way x ∈ Si and y ∈ Si is when x = y, so the intracluster distance is 0.0.
2.135 Again, just put all people in their own subset. The intercluster distance is the largest entry in the table, namely 7.8.
2.136 No, it's not a partition of S, because the sets are not disjoint. For example, 6 ∈ W and 6 ∈ H.
2.137 {{}, {1}, {a}, {1, a}}
2.138 {{}, {1}}
2.139 {{}}
2.140 P(P({1})) = P({{}, {1}}). This set is the power set of a 2-element set, and so it contains four elements: the empty set, the two singleton sets, and the set containing both elements: {{}, {{}}, {{1}}, {{}, {1}}}.
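The power sets in 2.137–2.140 can be generated mechanically (a quick sketch; we use frozensets because ordinary Python sets can't contain sets):

```python
from itertools import chain, combinations

def power_set(s):
    # All subsets of s, each represented as a frozenset.
    items = list(s)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return {frozenset(sub) for sub in subsets}

print(len(power_set({1, 'a'})))             # 4, as in Exercise 2.137
print(len(power_set(power_set({1}))))       # |P(P({1}))| = 4, as in Exercise 2.140
```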

2.4 Sequences: Ordered Collections

2.141 {⟨1, 1⟩, ⟨1, 4⟩, ⟨1, 16⟩, ⟨2, 1⟩, ⟨2, 4⟩, ⟨2, 16⟩, ⟨3, 1⟩, ⟨3, 4⟩, ⟨3, 16⟩}
2.142 {⟨1, 1⟩, ⟨1, 2⟩, ⟨1, 3⟩, ⟨4, 1⟩, ⟨4, 2⟩, ⟨4, 3⟩, ⟨16, 1⟩, ⟨16, 2⟩, ⟨16, 3⟩}
2.143 {⟨1, 1, 1⟩}
2.144 {⟨1, 2, 1⟩, ⟨1, 2, 4⟩, ⟨1, 2, 16⟩, ⟨1, 3, 1⟩, ⟨1, 3, 4⟩, ⟨1, 3, 16⟩, ⟨2, 2, 1⟩, ⟨2, 2, 4⟩, ⟨2, 2, 16⟩, ⟨2, 3, 1⟩, ⟨2, 3, 4⟩, ⟨2, 3, 16⟩}
2.145 A = {1, 2} and B = {1}.
2.146 T = {2, 4}. Because we are told that ⟨?, 2⟩, ⟨?, 4⟩ ∈ S × T (for some values of ?), we know that 2 ∈ T and 4 ∈ T. And because, for S = {1, 2, 3, 4, 5, 6, 7, 8}, the set S × {2, 4} contains 16 elements already, there can't be any other elements in T.
2.147 T = ∅. If there were any element t ∈ T, then ⟨1, t⟩ ∈ S × T—but we're told that S × T is empty, so there can't be any t ∈ T.
2.148 We can conclude that 3 ∈ T and that 1, 2, 4, 5, 6, 7, 8 ∉ T (but we cannot conclude anything about any other possible elements in T). For an element ⟨x, x⟩ to be in S × T, we need x ∈ S ∩ T; for ⟨x, x⟩ not to be in S × T, we need x ∉ S ∩ T. Thus 3 ∈ T, and 1, 2, 4, 5, 6, 7, 8 ∉ T. But T can contain any other element not in S: for example, if T = {3, 9} then


S × T = {⟨1, 3⟩, . . . , ⟨8, 3⟩, ⟨1, 9⟩, . . . , ⟨8, 9⟩} and T × S = {⟨3, 1⟩, . . . , ⟨3, 8⟩, ⟨9, 1⟩, . . . , ⟨9, 8⟩}. The only element that appears in both of these sets is ⟨3, 3⟩.
2.149 Every pair in S × T must also be in T × S, and vice versa. This situation can happen in two ways. First, we could have T = S = {1, 2, . . . , 8}, so that S × T = T × S = S × S. Second, we could have T = ∅, because S × ∅ = ∅ × S = ∅. Those are the only two possibilities, so either T = S or T = ∅.
2.150 {a, h} × {1}
2.151 {c, f} × {1, 8}
2.152 {a, b, c, d, e, f, g, h} × {2, 7}
2.153 {a, b, c, d, e, f, g, h} × {3, 4, 5, 6}
2.154 There are 27 elements in this set: {⟨0, 0, 0⟩, ⟨0, 0, 1⟩, ⟨0, 0, 2⟩, ⟨0, 1, 0⟩, ⟨0, 1, 1⟩, ⟨0, 1, 2⟩, ⟨0, 2, 0⟩, ⟨0, 2, 1⟩, ⟨0, 2, 2⟩, ⟨1, 0, 0⟩, ⟨1, 0, 1⟩, ⟨1, 0, 2⟩, ⟨1, 1, 0⟩, ⟨1, 1, 1⟩, ⟨1, 1, 2⟩, ⟨1, 2, 0⟩, ⟨1, 2, 1⟩, ⟨1, 2, 2⟩, ⟨2, 0, 0⟩, ⟨2, 0, 1⟩, ⟨2, 0, 2⟩, ⟨2, 1, 0⟩, ⟨2, 1, 1⟩, ⟨2, 1, 2⟩, ⟨2, 2, 0⟩, ⟨2, 2, 1⟩, ⟨2, 2, 2⟩}
2.155 Omitting the angle brackets and commas, the set is: {ACCE, ACDE, ADCE, ADDE, BCCE, BCDE, BDCE, BDDE}.
2.156 Omitting the angle brackets and commas, the set is: {0, 1, 00, 01, 10, 11, 000, 001, 010, 011, 100, 101, 110, 111}.
2.157 Σ^8
2.158 (Σ − {A, E, I, O, U})^5
2.159 Here's one reasonably compact way of representing the answer: in a sequence of 6 symbols, we'll allow the ith symbol to be any element of Σ, while the others must not be vowels. The desired set is:
  ∪_{i=0}^{5} [(Σ − {A, E, I, O, U})^i × Σ × (Σ − {A, E, I, O, U})^(5−i)].
(There are many other ways to express the answer too.)
2.160 Here's one of many ways to express this set:
  ∪_{v ∈ {A, E, I, O, U}} [(Σ − {A, E, I, O, U}) ∪ {v}]^6.
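The union-over-positions construction in 2.159 can be checked by brute force on a miniature alphabet (a sketch; the three-symbol alphabet and length 3 here are hypothetical stand-ins for Σ and length 6):

```python
from itertools import product

sigma = ['a', 'b', 'c']   # miniature alphabet, with 'a' playing the role of the vowels
vowels = {'a'}
consonants = [s for s in sigma if s not in vowels]
n = 3                     # sequence length 3 instead of 6

# Direct description: sequences with at most one vowel.
direct = {w for w in product(sigma, repeat=n)
          if sum(ch in vowels for ch in w) <= 1}

# The 2.159-style union: position i is unrestricted, all other positions are non-vowels.
union = set()
for i in range(n):
    for prefix in product(consonants, repeat=i):
        for mid in sigma:
            for suffix in product(consonants, repeat=n - 1 - i):
                union.add(prefix + (mid,) + suffix)

assert union == direct
print(len(union))  # 20
```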

2.161 √(1² + 3²) = √(1 + 9) = √10
2.162 √(2² + (−2)²) = √(4 + 4) = √8
2.163 √(4² + 0²) = √(16 + 0) = √16 = 4
2.164 ⟨1 + 2, 3 + (−2)⟩ = ⟨3, 1⟩
2.165 ⟨3 · (−3), 3 · (−1)⟩ = ⟨−9, −3⟩
2.166 ⟨2 · 1 + 4 − 3 · 2, 2 · 3 + 0 − 3 · (−2)⟩ = ⟨0, 12⟩
2.167 ∥a∥ + ∥c∥ = √10 + 4 ≈ 7.1623, while ∥a + c∥ = ∥⟨5, 3⟩∥ = √(5² + 3²) = √34 ≈ 5.8310.
2.168 ∥a∥ + ∥b∥ = √10 + √8 ≈ 5.9907, while ∥a + b∥ = ∥⟨3, −1⟩∥ = √(3² + (−1)²) = √10 ≈ 3.1623.

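The vector computations in 2.161–2.168 (with a = ⟨1, 3⟩, b = ⟨2, −2⟩, and c = ⟨4, 0⟩, as used in the sums and norms above) can be verified numerically (a quick sketch):

```python
import math

def norm(v):
    # Euclidean length of a 2-dimensional vector.
    return math.sqrt(v[0]**2 + v[1]**2)

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

a, b, c = (1, 3), (2, -2), (4, 0)

print(norm(a))          # sqrt(10) ≈ 3.1623
print(norm(add(a, c)))  # ||<5, 3>|| = sqrt(34) ≈ 5.8310
# Triangle inequality: ||a|| + ||c|| >= ||a + c||.
print(norm(a) + norm(c), norm(add(a, c)))
```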

2.169 3∥d∥ = 3 · √((−3)² + (−1)²) = 3√10 ≈ 9.4868. We get the same value for ∥3d∥ = ∥⟨−9, −3⟩∥ = √((−9)² + (−3)²) = √90 ≈ 9.4868.
2.170 Consider an arbitrary vector x ∈ R^n and an arbitrary scalar a ∈ R. Here's a derivation to establish that ∥ax∥ = a∥x∥:
  ∥ax∥ = √(Σ_i [(ax)_i]²)          definition of length
       = √(Σ_i a² x_i²)            definition of vector times scalar
       = √(a² · Σ_i x_i²)          a² is the same for every i
       = √(a²) · √(Σ_i x_i²)       Theorem 2.8.5
       = a · √(Σ_i x_i²)           definition of square root
       = a∥x∥.                     definition of length
2.171 We have ∥x∥ + ∥y∥ = ∥x + y∥ whenever x and y "point in exactly the same direction"—that is, when we can write y = ax for some scalar a ≥ 0. In this case, we have ∥x∥ + ∥y∥ = ∥x∥ + ∥ax∥ = (1 + a)∥x∥, and we also have ∥x + y∥ = ∥x + ax∥ = ∥(1 + a)x∥ = (1 + a)∥x∥. When y ≠ ax, then the two sides aren't equal. Visually, the two dashed legs of the triangle from the origin to x to x + y have total length ∥x∥ + ∥y∥, while the solid segment from the origin directly to x + y has length ∥x + y∥. Because the latter "cuts the corner" whenever x and y don't point in exactly the same direction, it's smaller than the former. [figure not reproduced in this copy]

2.172 1 · 2 + 3 · (−2) = 2 − 6 = −4
2.173 1 · (−3) + 3 · (−1) = −3 − 3 = −6
2.174 4 · 4 + 0 · 0 = 16 + 0 = 16
2.175 The Manhattan distance is |1 − 2| + |3 − (−2)| = 1 + 5 = 6; the Euclidean distance is √((1 − 2)² + (3 − (−2))²) = √(1² + 5²) = √26 ≈ 5.099.
2.176 The Manhattan distance is 4 + 4 = 8; the Euclidean distance is √(4² + 4²) = √32 = 4√2 ≈ 5.6568.
2.177 The Manhattan distance is 2 + 2 = 4; the Euclidean distance is √(2² + 2²) = √8 = 2√2 ≈ 2.8284.

2.178 The largest possible Euclidean distance is 1—for example, if x = ⟨0, 0⟩ and y = ⟨1, 0⟩, then the Manhattan distance is 1, and so is the Euclidean distance.
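The Manhattan and Euclidean distance computations above (e.g., 2.175) can be reproduced directly (a quick sketch):

```python
import math

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def euclidean(p, q):
    return math.sqrt((p[0] - q[0])**2 + (p[1] - q[1])**2)

p, q = (1, 3), (2, -2)   # the two points from Exercise 2.175
print(manhattan(p, q))   # 6
print(euclidean(p, q))   # sqrt(26) ≈ 5.099
```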


2.179 The smallest possible Euclidean distance is 1/√2: if x = ⟨0, 0⟩ and y = ⟨0.5, 0.5⟩, then the Manhattan distance is indeed 1 = 0.5 + 0.5, while the Euclidean distance is √((1/2)² + (1/2)²) = √(1/2) = 1/√2.
2.180 The smallest possible Euclidean distance is 1/√n—for example, if x = ⟨0, 0, . . . , 0⟩ and y = ⟨1/n, 1/n, . . . , 1/n⟩, then
  Manhattan distance = 1/n + 1/n + · · · + 1/n = 1
  Euclidean distance = √(n · (1/n)²) = √(1/n) = 1/√n.
2.181 [The solution is a plot, on axes running from −3 to 3 in each coordinate; the image is not reproduced in this copy.]
2.182 [Likewise a plot, on axes running from −3 to 3 in each coordinate; the image is not reproduced in this copy.]

2.183 Under Manhattan distance, the point ⟨16, 40⟩ has distance 12 + 2 = 14 from g and distance 8 + 7 = 15 from s, so it's closer to g. Under Euclidean distance, the point ⟨16, 40⟩ has distance √(12² + 2²) = √146 ≈ 12.0830 from g and distance √(8² + 7²) = √113 ≈ 10.6301 from s, so it's closer to s.
2.184 The Manhattan distance from the point ⟨8 + δ, y⟩ to s is δ + |33 − y|. The Manhattan distance from this point to g is 4 + δ + |42 − y|. Because δ appears in both formulas, the point is closer to g exactly when 4 + |42 − y| < |33 − y|.
• If y < 33, then these distances are 4 + 42 − y and 33 − y; the distance to g is always larger.
• If 33 ≤ y < 42, then these distances are 4 + 42 − y and y − 33; the former is smaller when 4 + 42 − y < y − 33, which, solving for y, occurs when 2y > 79. The distance to g is smaller when y > 39.5.
• If y > 42, then these distances are 4 + y − 42 and y − 33; the distance to g is always smaller.
Thus s is closer if y < 39.5 and g is closer if y > 39.5 (and they're equidistant if y = 39.5).
2.185 A point ⟨4 − δ, y⟩ is 4 units closer to g than to s in the horizontal direction. Thus g is closer overall unless the point is 4 or more units closer to s than to g vertically. Being 4 or more units closer to s vertically means |42 − y| ≥ 4 + |33 − y|, which happens exactly when y ≤ 35.5. So s is closer if y < 35.5, and g is closer if y > 35.5.
2.186 3|x − 8| + 1.5|y − 33|
2.187 An image of the Voronoi diagram would appear here; the three different colors represent the three regions, and the dashed lines represent the bisectors of each pair of points. [image not reproduced in this copy]


2.188 As before, the three colors represent the three regions; the dashed lines represent the bisectors of each pair of points. [image not reproduced in this copy]

2.189 Here is the diagram, with notation as in the previous two solutions. [image not reproduced in this copy]

2.190 For a pair of points p = ⟨p1, p2⟩ and q = ⟨q1, q2⟩, we will need to be able to find the line that bisects them:
• The midpoint of p and q is on that line—namely the point ⟨(p1 + q1)/2, (p2 + q2)/2⟩.
• The slope of the bisecting line is the negative reciprocal of the slope of the line joining p and q—that is, −(p1 − q1)/(p2 − q2).
Given these facts, the algorithm is fairly straightforward: given three points p, q, r, we find the bisector for each pair of points. The portion of the plane closest to p is that portion that's both on p's side of the p–q bisector and on p's side of the p–r bisector. The other regions are analogous. Coding this in a programming language requires some additional detail in representing lines, but the idea is just as described here. A solution in Python is shown in Figure S.2.4, on p. 16.
2.191 6 by 3
2.192 6


import sys

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def getX(self): return self.x
    def getY(self): return self.y

def midpoint(pointA, pointB):
    '''Finds the point halfway between two Points.'''
    return Point((pointA.getX() + pointB.getX()) / 2,
                 (pointA.getY() + pointB.getY()) / 2)

def bisectorSlope(pointA, pointB):
    '''Finds the slope of the bisector between two Points.'''
    return -(pointA.getX() - pointB.getX()) / (pointA.getY() - pointB.getY())

def bisectorRegion(pointA, pointB):
    '''Returns a representation of the half plane on a's side of the bisector between a and b.'''
    halfway = midpoint(pointA, pointB)

    # Case I: the bisector of {a,b} is a vertical line.
    if pointA.getY() == pointB.getY():
        if pointA.getX() < halfway.getX():   # a is left of the bisector
            return "x < %.2f" % halfway.getX()
        else:
            return "x > %.2f" % halfway.getX()

    # Case II: the bisector of {a,b} is a horizontal line.
    slope = bisectorSlope(pointA, pointB)
    if slope == 0:
        if pointA.getY() < halfway.getY():   # a is below the bisector
            return "y < %.2f" % halfway.getY()
        else:
            return "y > %.2f" % halfway.getY()

    # Case III: the bisector of {a,b} has nonzero, noninfinite slope.
    yIntercept = -slope * halfway.getX() + halfway.getY()
    if pointA.getY() < pointB.getY():   # a is below the bisector
        return "y < %.2f*x + %.2f" % (slope, yIntercept)
    else:
        return "y > %.2f*x + %.2f" % (slope, yIntercept)

def voronoiRegion(ego, alterA, alterB):
    '''Returns a representation of the portion of the plane closer to ego than to either alter.'''
    return bisectorRegion(ego, alterA) + " and " + bisectorRegion(ego, alterB)

# Usage: python voronoi_shareable.py 0 0 4 5 3 1   [for points (0,0) and (4,5) and (3,1)]
# [error-checking code omitted for brevity.]
a = Point(float(sys.argv[1]), float(sys.argv[2]))
b = Point(float(sys.argv[3]), float(sys.argv[4]))
c = Point(float(sys.argv[5]), float(sys.argv[6]))

print(voronoiRegion(a, b, c))
print(voronoiRegion(b, a, c))
print(voronoiRegion(c, a, b))

Figure S.2.4 Python code for Voronoi diagrams.


2.193 ⟨4, 1⟩, ⟨5, 1⟩, and ⟨7, 3⟩
2.194 3M is the 6-by-3 matrix obtained by multiplying every entry of M by 3. [The matrix display itself did not survive extraction in this copy.]
2.195 [The matrix-sum display for this solution did not survive extraction in this copy.]
2.196 Undefined; the dimensions of B and F don't match.
2.197 [3 1; 0 8] + [8 4; 3 2] = [11 5; 3 10]. (Matrices here are written row by row: [3 1; 0 8] has first row 3, 1 and second row 0, 8.)
2.198 [The matrix-sum display for this solution did not survive extraction in this copy.]
2.199 −2 · [3 1; 0 8] = [−6 −2; 0 −16]
2.200 0.5 · [9 1; 0 5] = [4.5 0.5; 0 2.5]
2.201 [The matrix-product display for this solution did not survive extraction in this copy.]
2.202 [The matrix-product display for this solution did not survive extraction in this copy.]
2.203 Undefined: A has three columns but F has only two rows, so the dimensions don't match up.
2.204 Undefined: B has two columns but C has three rows, so the dimensions don't match up.
2.205 [3 1; 0 8] · [8 4; 3 2] = [27 14; 24 16]
2.206 [8 4; 3 2] · [3 1; 0 8] = [24 40; 9 19]
2.207 0.25 · [1 0 0; 1 0 0; 1 1 0] + 0.75 · [0 0 0; 1 1 0; 1 1 1] = [0.25 0 0; 1 0.75 0; 1 1 0.75]
2.208 0.5 · [1 0 0; 1 0 0; 1 1 0] + 0.5 · [0 0 0; 1 1 0; 1 1 1] = [0.5 0 0; 1 0.5 0; 1 1 0.5]

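The 2-by-2 products in 2.205 and 2.206—and the fact that the two orders give different answers, so matrix multiplication is not commutative—can be verified in a few lines of Python (a sketch with no matrix library; the names M1 and M2 are ours, to avoid clashing with the exercise's own matrix names):

```python
def mat_mul(X, Y):
    # Product of two 2-by-2 matrices, each given as a list of rows.
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

M1 = [[3, 1], [0, 8]]
M2 = [[8, 4], [3, 2]]

print(mat_mul(M1, M2))  # [[27, 14], [24, 16]], as in 2.205
print(mat_mul(M2, M1))  # [[24, 40], [9, 19]], as in 2.206
```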

2.209 Here's one example (there are many more):
  G = [0 0 0; 0 0 0; 1 1 0]   and   H′ = [1 0 0; 1 1 0; 1 1 1].
(Matrices here are written row by row: G's rows are 0 0 0, then 0 0 0, then 1 1 0.)

2.210 Here's a solution in Python (the parameter is named lam because lambda is a reserved word in Python):

# This function uses an image-processing library that supports the following operations:
#   Image(w, h)                 --> create a new empty image with width w and height h
#   image.getWidth()            --> the width of the image (in pixels)
#   image.getHeight()           --> the height of the image (in pixels)
#   image.getPixel(x, y)        --> the color value of the pixel at coordinate (x, y)
#   image.setPixel(x, y, color) --> change the pixel at coordinate (x, y) to have the given color

def average(imageA, imageB, lam):
    width = imageA.getWidth()
    height = imageA.getHeight()
    if lam < 0 or lam > 1 or imageB.getWidth() != width or imageB.getHeight() != height:
        return "Error!"
    newImage = Image(width, height)
    for x in range(width):
        for y in range(height):
            avg = lam * imageA.getPixel(x, y) + (1 - lam) * imageB.getPixel(x, y)
            newImage.setPixel(x, y, avg)
    return newImage

2.211 Let A be an m-by-n matrix, and I be the n-by-n identity matrix. We must show that AI is identical to A. Here's a derivation showing that the ⟨i, j⟩th entries of the two matrices are the same:
  (AI)_{i,j} = Σ_{k=1}^{n} A_{i,k} · I_{k,j}                definition of matrix multiplication
             = Σ_{k=1}^{n} A_{i,k} · (0 if k ≠ j; 1 if k = j)   definition of I
             = A_{i,j} · 1 = A_{i,j}.                       simplifying: there is only one nonzero term in the sum (namely for k = j)
Thus for any indices i and j we have that (AI)_{i,j} = A_{i,j}.
2.212 [2 3; 1 1] · [2 3; 1 1] = [7 9; 3 4]. (Matrices here are written row by row: [2 3; 1 1] has first row 2, 3 and second row 1, 1.)
2.213 [7 9; 3 4] · [2 3; 1 1] = [23 30; 10 13]
2.214 [2 1; 1 1] · [2 1; 1 1] = [5 3; 3 2]
2.215 [5 3; 3 2] · [5 3; 3 2] = [34 21; 21 13], and then [34 21; 21 13] · [1 1; 1 0] = [55 34; 34 21].
2.216 We're given that y + w = 0 and 2y + w = 1. From the former, we know y = −w; plugging this value for y into the latter yields −2w + w = 1, so we have w = −1 and y = 1.


From x + z = 1 and 2x + z = 0, similarly, we know z = 1 − x and thus that 2x + 1 − x = 0. Therefore x = −1 and z = 2. The final inverse is therefore [−1 1; 2 −1]. And to verify the solution: indeed, we have [1 1; 2 1] · [−1 1; 2 −1] = [1 0; 0 1], as desired. (Matrices here are written row by row: [1 1; 2 1] has first row 1, 1 and second row 2, 1.)
2.217 [1 −2; −0.5 1.5]
2.218 [0 1; 1 0]
2.219 [1 0; 0 1]
2.220 Suppose that the matrix [x y; z w] were the inverse of [1 1; 1 1]—that is, suppose [1 1; 1 1] · [x y; z w] = [1 0; 0 1]. Then we'd need x + z = 1 (to make the ⟨1, 1⟩th entry of the product correct) and x + z = 0 (to make the ⟨2, 1⟩th entry of the product correct). But it's not possible for x + z to simultaneously equal 0 and 1!
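The inverse computed in 2.216 can be double-checked by multiplying it back against the original matrix (a quick sketch):

```python
def mat_mul(X, Y):
    # Product of two 2-by-2 matrices, each given as a list of rows.
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

M = [[1, 1], [2, 1]]
M_inv = [[-1, 1], [2, -1]]

print(mat_mul(M, M_inv))  # [[1, 0], [0, 1]]
print(mat_mul(M_inv, M))  # [[1, 0], [0, 1]]
```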

2.221 [0, 0, 0, 0, 0, 0, 0] 2.222 [0, 1, 1, 0, 0, 1, 1] 2.223 [1, 0, 0, 1, 1, 0, 0]

2.5 Functions

2.224 f(3) = (3² + 3) mod 8 = 12 mod 8 = 4
2.225 f(7) = (7² + 3) mod 8 = 52 mod 8 = 4
2.226 f(x) = 3 when x = 0 and when x = 4.
2.227
  x    | 0 1 2 3 4 5 6 7
  f(x) | 3 4 7 4 3 4 7 4
2.228 quantize(n) = 52 · ⌊n/52⌋ + 28
2.229 The domain is {0, 1, . . . , 255} × {1, . . . , 256}. The range is {0, 1, . . . , 255}: all output colors are possible, because for any n ∈ {0, 1, . . . , 255} we have that quantize(n, 256) = n.

20

Basic Data Types

2.230 The step size is ⌈256/k⌉, so we have

    quantize(n, k) = ⌈256/k⌉ · ⌊n / ⌈256/k⌉⌋ + ⌊⌈256/k⌉ / 2⌋.

Note that this expression will do something a little strange for large k: for example, when k = 200, we have ⌈256/k⌉ = 2—so the first 128 quanta correspond to colors {1, 3, 5, . . . , 255}, and the remaining 72 quanta are never used (because they're bigger than 256).

2.231 A function quantize(n) : {0, 1, . . . , 255} → {a1, a2, . . . , ak} can be c-to-1 only if k | 256—that is, if k evenly divides 256. (For example, there's no way to quantize 256 colors into three precisely equal pieces, because 256 is not evenly divisible by 3.) So k must be a power of two; specifically, we have to have k ∈ {1, 2, 4, 8, 16, 32, 64, 128, 256}.

2.232 Here's a solution in Python:

    import math

    # This function uses an image-processing library that supports the following operations:
    #   image.copyGray()             --> create a [grayscale] copy of the given image
    #   image.getWidth()             --> the width of the image (in pixels)
    #   image.getHeight()            --> the height of the image (in pixels)
    #   image.getPixel(x, y)         --> the color value of the pixel at coordinate (x, y)
    #   image.setPixel(x, y, color)  --> change the pixel at coordinate (x, y) to the given color
    def quantize(image, k):
        quantizedImage = image.copyGray()
        stepSize = int(math.ceil(256 / k))
        for x in range(image.getWidth()):
            for y in range(image.getHeight()):
                color = (stepSize // 2) + stepSize * (image.getPixel(x, y) // stepSize)
                quantizedImage.setPixel(x, y, color)
        return quantizedImage

2.233 The domain is R; the range is R≥0.

2.234 The domain is R; the range is Z.

2.235 The domain is R; the range is R>0.

2.236 The domain is R>0; the range is R.

2.237 The domain is Z; the range is {0, 1}.

2.238 The domain is Z≥2; the range is {0, 1, 2}.

2.239 The domain is Z × Z≥2; the range is Z≥0.

2.240 The domain is Z; the range is {True, False}.

2.241 The domain is ∪_{n ∈ Z≥1} Rⁿ; the range is R≥0.

2.242 The domain is R; the range is the set of all unit vectors in R²—that is, the set {x ∈ R² : ∥x∥ = 1}.

2.243 The function add can be written as add(⟨h, m⟩, x) = ⟨[(h + ⌊(m + x)/60⌋ − 1) mod 12] + 1, (m + x) mod 60⟩. (The hours component is a little more complicated to write because of 0-based indexing vs. 1-based indexing: the hour is represented as [(x − 1) mod 12] + 1 because anything mod 12 is a value between 0 and 11, and we need the hours value to be between 1 and 12.)

2.244 (f ◦ f)(x) = f(f(x)) = x mod 10
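The clock arithmetic in Exercise 2.243 can be sketched directly in code. This is our addition, not the book's (and, for simplicity, the pair ⟨h, m⟩ is flattened into two separate arguments):

```python
# A sketch of the clock-addition function: add x minutes to the time <h, m>
# on a 12-hour clock, where hours are represented as 1 through 12.

def add(h, m, x):
    total = m + x
    hour = ((h + total // 60 - 1) % 12) + 1   # shift to 0-based, wrap mod 12, shift back
    minute = total % 60
    return (hour, minute)

assert add(11, 50, 15) == (12, 5)   # 11:50 plus 15 minutes is 12:05
assert add(12, 30, 90) == (2, 0)    # 12:30 plus 90 minutes is 2:00
```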


2.245 (h ◦ h)(x) = h(h(x)) = 4x

2.246 (f ◦ g)(x) = f(g(x)) = (x + 3) mod 10

2.247 (g ◦ h)(x) = g(h(x)) = 2x + 3

2.248 (h ◦ g)(x) = h(g(x)) = 2(x + 3) = 2x + 6

2.249 (f ◦ h)(x) = f(h(x)) = 2x mod 10

2.250 (f ◦ g ◦ h)(x) = f(g(h(x))) = (2x + 3) mod 10

2.251 (g ◦ h)(x) = g(h(x)) = 2 · h(x), so we need h(x) = (3x + 1)/2.

2.252 (h ◦ g)(x) = h(g(x)) = h(2x), so we need h(2x) = 3x + 1 for every x. That's true when h(x) = (3/2)x + 1.

2.253 Yes, f(x) = x is onto; every output value is hit.

2.254 The function f(x) = x² mod 4 has the values

    f(0) = 0² mod 4 = 0
    f(1) = 1² mod 4 = 1
    f(2) = 2² mod 4 = 4 mod 4 = 0
    f(3) = 3² mod 4 = 9 mod 4 = 1

Thus there is no x such that f(x) = 2 or f(x) = 3, so the function is not onto.

2.255 For the function f : {0, 1, 2, 3} → {0, 1, 2, 3} defined as f(x) = (x² − x) mod 4, the values are

    f(0) = (0² − 0) mod 4 = 0
    f(1) = (1² − 1) mod 4 = 0
    f(2) = (2² − 2) mod 4 = 2
    f(3) = (3² − 3) mod 4 = 6 mod 4 = 2

There's no x such that f(x) = 1 or f(x) = 3, so the function is not onto.

2.256 Yes, this function f : {0, 1, 2, 3} → {0, 1, 2, 3}, which we could also have written as f(x) = 3 − x, is onto; every output value is hit.

2.257 There's no x such that f(x) = 0 or f(x) = 3, so the function is not onto.

2.258 For the function f : {0, 1, 2, 3} → {0, 1, . . . , 7}, we have

    f(0) = 0² mod 8 = 0
    f(1) = 1² mod 8 = 1
    f(2) = 2² mod 8 = 4
    f(3) = 3² mod 8 = 9 mod 8 = 1

This function is not one-to-one, because f(1) = f(3).

2.259 For f(x) = x³ mod 8 as a function f : {0, 1, 2, 3} → {0, 1, . . . , 7}, we have

    f(0) = 0³ mod 8 = 0
    f(1) = 1³ mod 8 = 1
    f(2) = 2³ mod 8 = 8 mod 8 = 0
    f(3) = 3³ mod 8 = 27 mod 8 = 3

This function is not one-to-one, because f(0) = f(2).
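The onto and one-to-one claims in this batch of exercises are easy to confirm by brute force over the small domains involved. This sketch is our addition (the helper names is_onto and is_one_to_one are ours):

```python
# Brute-force checks of onto / one-to-one claims for functions on {0, 1, 2, 3}.

def is_onto(f, domain, codomain):
    """f is onto iff every codomain element is hit by some domain element."""
    return all(any(f(x) == y for x in domain) for y in codomain)

def is_one_to_one(f, domain):
    """f is one-to-one iff no output value is hit twice."""
    values = [f(x) for x in domain]
    return len(values) == len(set(values))

D = range(4)
assert is_onto(lambda x: x, D, D)                      # 2.253
assert not is_onto(lambda x: x * x % 4, D, D)          # 2.254
assert not is_one_to_one(lambda x: x * x % 8, D)       # 2.258: f(1) = f(3)
assert is_one_to_one(lambda x: (x**3 + 2*x) % 8, D)    # 2.261
```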


2.260 For f(x) = (x³ − x) mod 8 as a function f : {0, 1, 2, 3} → {0, 1, . . . , 7}, we have

    f(0) = (0³ − 0) mod 8 = 0 mod 8 = 0
    f(1) = (1³ − 1) mod 8 = 0 mod 8 = 0
    f(2) = (2³ − 2) mod 8 = 6 mod 8 = 6
    f(3) = (3³ − 3) mod 8 = 24 mod 8 = 0

This function is not one-to-one, because f(0) = f(1) = f(3).

2.261 For f(x) = (x³ + 2x) mod 8 as a function f : {0, 1, 2, 3} → {0, 1, . . . , 7}, we have

    f(0) = (0³ + 0) mod 8 = 0 mod 8 = 0
    f(1) = (1³ + 2) mod 8 = 3 mod 8 = 3
    f(2) = (2³ + 4) mod 8 = 12 mod 8 = 4
    f(3) = (3³ + 6) mod 8 = 33 mod 8 = 1

This function is one-to-one, as no output is hit more than once.

2.262 This function is not one-to-one, because f(1) = f(3).

2.263 Every element except A[1] (which is the root) has a parent. An element i is a parent only if it has a left child, which occurs when 2i ≤ n. Thus parent : {2, 3, . . . , n} → {1, 2, . . . , ⌊n/2⌋}, and:
• The domain is {2, 3, . . . , n}.
• The range is {1, 2, . . . , ⌊n/2⌋}.
• The function parent is not one-to-one: for example, parent(2) = parent(3) = 1.

2.264 An element i has a left child when 2i ≤ n and a right child when 2i + 1 ≤ n. Every element except A[1] is the child of some other element; even-numbered elements are left children and odd-numbered elements are right children. Thus:
• left : {1, 2, . . . , ⌊n/2⌋} → {k ≥ 2 : 2 | k and k ≤ n}. The domain is {1, 2, . . . , ⌊n/2⌋} and the range is the set of even integers {2, 4, . . . , n} [or n − 1 if n is odd].
• right : {1, 2, . . . , ⌊(n − 1)/2⌋} → {k ≥ 2 : 2 ∤ k and k ≤ n}. The domain is {1, 2, . . . , ⌊(n − 1)/2⌋} and the range is the set of odd integers {3, 5, . . . , n} [or n − 1 if n is even].
Both left and right are one-to-one: if i ≠ i′, then 2i ≠ 2i′ and 2i + 1 ≠ 2i′ + 1. Thus there are no two distinct elements i and i′ that have left(i) = left(i′) or right(i) = right(i′).

2.265 (parent ◦ left)(i) = parent(left(i)) = ⌊2i/2⌋ = ⌊i⌋ = i. In other words, parent ◦ left is the identity function: for any i, we have (parent ◦ left)(i) = i. In English, this composition asks for the parent of the left child of a given index—but the parent of i's left child simply is i!

2.266 (parent ◦ right)(i) = parent(right(i)) = ⌊(2i + 1)/2⌋ = ⌊i + 1/2⌋ = i. In other words, parent ◦ right is the identity function: for any i, we have (parent ◦ right)(i) = i. In English, this composition asks for the parent of the right child of a given index—but, just as for parent ◦ left, the parent of i's right child simply is i.

2.267 (left ◦ parent)(i) = left(parent(i)) = 2⌊i/2⌋ for any i ≥ 2. In other words, this function is

    (left ◦ parent)(i) = i if 2 | i;  i − 1 if 2 ∤ i;  undefined if i = 1.

In English, this composition asks for the left child of a given index's parent. An index that's a left child is the left child of its parent; for an index that's a right child, the left child of its parent is its left-hand sibling.

2.268 (right ◦ parent)(i) = right(parent(i)) = 2⌊i/2⌋ + 1 for any i ≥ 2. In other words, this function is

    (right ◦ parent)(i) = i + 1 if 2 | i;  i if 2 ∤ i;  undefined if i = 1.
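These composition identities can be confirmed mechanically; the following sketch is our addition, using the standard 1-based array encoding of a binary heap:

```python
# Check the parent/left/right composition identities of Exercises 2.265-2.268.

def parent(i): return i // 2
def left(i):   return 2 * i
def right(i):  return 2 * i + 1

for i in range(1, 1000):
    assert parent(left(i)) == i and parent(right(i)) == i      # 2.265, 2.266
for i in range(2, 1000):
    assert left(parent(i)) == (i if i % 2 == 0 else i - 1)     # 2.267
    assert right(parent(i)) == (i + 1 if i % 2 == 0 else i)    # 2.268
```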


In English, this composition asks for the right child of a given index's parent. An index that's a right child is the right child of its parent; for an index that's a left child, the right child of its parent is its right-hand sibling.

2.269 f⁻¹(y) = (y − 1)/3

2.270 g⁻¹(y) = y^(1/3)

2.271 h⁻¹(y) = log₃ y

2.272 This function isn't one-to-one (it can't be: there are 24 different inputs and only 12 different outputs—for example, f(7) = f(19)), so it doesn't have an inverse.

2.273 The degree is 3.

2.274 The degree is 3.

2.275 The degree is 3; the polynomial is p(x) = 4x⁴ − (2x² − x)² = 4x⁴ − 4x⁴ + 4x³ − x² = 4x³ − x².

2.276 The largest possible degree is still 7; the smallest possible degree is 0 (if p and q are identical—for example, for p(x) = x⁷ and q(x) = x⁷, we have p(x) − q(x) = 0).

2.277 The largest possible degree is 14, and that's the smallest, too! If p(x) = Σ_{i=0}^{7} a_i x^i and q(x) = Σ_{i=0}^{7} b_i x^i, where a₇ ≠ 0 and b₇ ≠ 0, then there's a term a₇b₇x¹⁴ in the product. And neither a₇ nor b₇ can equal zero, because p and q are both polynomials with degree 7.

2.278 The largest possible degree is 49, and that's the smallest, too! If p(x) = Σ_{i=0}^{7} a_i x^i and q(x) = Σ_{i=0}^{7} b_i x^i, where a₇ ≠ 0 and b₇ ≠ 0, then we have

    p(q(x)) = p(Σ_{i=0}^{7} b_i x^i)
            = Σ_{j=0}^{7} a_j · (Σ_{i=0}^{7} b_i x^i)^j
            = a₇ · (Σ_{i=0}^{7} b_i x^i)⁷ + [terms of degree ≤ 42]        all the terms with j ≤ 6 have exponent ≤ 7 · 6 = 42
            = a₇ · (b₇x⁷ + [terms of degree ≤ 6])⁷ + [terms of degree ≤ 42]
            = a₇b₇⁷x⁴⁹ + [terms of degree ≤ 48] + [terms of degree ≤ 42]
            = a₇b₇⁷x⁴⁹ + [many terms of degree ≤ 48]

Because a₇ ≠ 0 and b₇ ≠ 0, then, this polynomial has degree 49.

2.279 p(x) = x² + 1 (there's no value of x ∈ R for which 1 + x² = 0, because we'd need x² = −1).

2.280 p(x) = x² (which has a single root at x = 0).

2.281 p(x) = x² − 1 (which has roots at x = −1 and x = 1).

2.282 Note that if L has an even number of elements, then there are two values "in the middle" of the sorted order. The lower median is the smaller of the two; the upper median is the larger of the two. We'll find the lower median in the case of an even-length array. Here's a simple algorithm (though there are much faster solutions!), making use of findMaxIndex from Figure 2.57 (and the analogous findMinIndex, which is a small modification of findMaxIndex [changing the > to < in Line 3]):
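The degree-49 claim of Exercise 2.278 can be spot-checked numerically on a concrete pair of degree-7 polynomials. The following sketch is our addition (poly_mul and poly_compose are hypothetical helper names, not from the book), representing polynomials as coefficient lists, lowest-order term first:

```python
# Spot-check for Exercise 2.278: composing two degree-7 polynomials
# yields a degree-49 polynomial.

def poly_mul(p, q):
    """Multiply two coefficient lists."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return result

def poly_compose(p, q):
    """Compute p(q(x)) by Horner's rule on the coefficients of p."""
    result = [0]
    for a in reversed(p):
        result = poly_mul(result, q)
        result[0] += a
    return result

p = [1, 0, 0, 0, 0, 0, 0, 2]   # 1 + 2x^7
q = [0, 3, 0, 0, 0, 0, 0, 1]   # 3x + x^7
composed = poly_compose(p, q)
degree = max(i for i, c in enumerate(composed) if c != 0)
assert degree == 49
```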


findMedian(L):
  Input: A list L with n ≥ 1 elements L[1], . . . , L[n]
  Output: The (lower) median value in L
  1  while |L| > 2:
  2    maxIndex := findMaxIndex(L)
  3    minIndex := findMinIndex(L)
  4    delete L[minIndex] and L[maxIndex]
  5  return L[1]

(We could have found the upper median, instead of the lower median, by returning L[2] in the case that |L| = 2.)
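The same idea translates directly into Python. This sketch is our addition (for brevity it uses the built-in max and min rather than findMaxIndex/findMinIndex, and it returns the lower median value):

```python
# A direct translation of the findMedian idea: repeatedly delete the current
# maximum and minimum until at most two elements remain; the smaller survivor
# is the lower median.

def find_median(L):
    L = list(L)             # work on a copy
    while len(L) > 2:
        L.remove(max(L))
        L.remove(min(L))
    return min(L)           # the lower median if two elements survive

assert find_median([5, 1, 3, 2, 4]) == 3
assert find_median([4, 1, 3, 2]) == 2    # lower median of an even-length list
```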

3 Logic

3.2 An Introduction to Propositional Logic

3.1 False: 2² + 3² = 4 + 9 = 13 ≠ 16 = 4².

3.2 False: the binary number 11010010 has the value 2 + 16 + 64 + 128 = 210, not 202.

3.3 True: in repeated iterations, the value of x is 202, then 101, then 50, then 25, then 12, then 6, and then 3. When x is 3, then we do one last iteration and set x to 1, and the loop terminates.

3.4 x ** y is valid Python if and only if x and y are both numeric values: r ⇔ u ∧ v.

3.5 x + y is valid Python if and only if x and y are both numeric values, or they're both lists: p ⇔ (u ∧ v) ∨ (w ∧ z).

3.6 x * y is valid Python if and only if x and y are both numeric values, or if one of x and y is a list and the other is numeric: q ⇔ (u ∧ v) ∨ (u ∧ z) ∨ (v ∧ w).

3.7 x * y is a list if x * y is valid Python and x and y are not both numeric values: s ⇐ q ∧ ¬(u ∧ v).

3.8 If x + y is a list, then x * y is not a list: s ⇒ ¬t.

3.9 x + y and x ** y are both valid Python only if x is not a list: p ∧ r ⇒ ¬w.

3.10 She should answer "yes": p ⇒ q is true whenever p is false, and "you're over 55 years old" is false for her. (That is, False ⇒ True is true.)

3.11 To write the solution more compactly, we'll use notation for ands and ors that's similar to Σ and Π notation: we will write ⋀_{i=1}^{n} p_i to mean p₁ ∧ p₂ ∧ · · · ∧ pₙ, and ⋁_{i=1}^{n} p_i to mean p₁ ∨ p₂ ∨ · · · ∨ pₙ. To express "at least 3 of {p₁, . . . , pₙ} are true," we'll require that (for some values of i and j and k) we have that p_i, p_j, and p_k are all true. We'll then take the "or" over all the values of i, j, and k, where i < j < k. Formally, the proposition is

    ⋁_{i=1}^{n} ⋁_{j=i+1}^{n} ⋁_{k=j+1}^{n} (p_i ∧ p_j ∧ p_k).

3.12 The easiest way to write "at least n − 1 of {p₁, . . . , pₙ} are true" is to say that, for every two variables from {p₁, . . . , pₙ}, at least one of the two is true. Using the same notation as in Exercise 3.11, we can write this proposition as

    ⋀_{i=1}^{n} ⋀_{j=i+1}^{n} (p_i ∨ p_j).
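The triple-disjunction formula of Exercise 3.11 can be checked by brute force over all truth assignments. This sketch is our addition (the helper name at_least_three is ours):

```python
# Check Exercise 3.11: the OR over all i < j < k of (p_i AND p_j AND p_k)
# is true exactly when at least 3 of the variables are true.

from itertools import combinations, product

def at_least_three(assignment):
    """The proposition: some triple of variables (i < j < k) is all true."""
    n = len(assignment)
    return any(assignment[i] and assignment[j] and assignment[k]
               for i, j, k in combinations(range(n), 3))

for assignment in product([False, True], repeat=5):
    assert at_least_three(assignment) == (sum(assignment) >= 3)
```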

3.13 The identity of ∨ is False: x ∨ False ≡ False ∨ x ≡ x. 3.14 The identity of ∧ is True: x ∧ True ≡ True ∧ x ≡ x. 3.15 The identity of ⇔ is True: x ⇔ True ≡ True ⇔ x ≡ x.


3.16 The identity of ⊕ is False: x ⊕ False ≡ False ⊕ x ≡ x.

3.17 The zero of ∨ is True: x ∨ True ≡ True ∨ x ≡ True.

3.18 The zero of ∧ is False: x ∧ False ≡ False ∧ x ≡ False.

3.19 The operator ⇔ doesn't have a zero. Because True ⇔ x ≡ x, the proposition True ⇔ x is true when x = True and it's false when x = False. Similarly, because False ⇔ x ≡ ¬x, the proposition False ⇔ x is false when x = True and it's true when x = False. So neither True nor False is a zero for ⇔.

3.20 The operator ⊕ doesn't have a zero: there's no value z such that z ⊕ True ≡ z ⊕ False, so no z is a zero for ⊕.

3.21 The left identity of ⇒ is True: True ⇒ p ≡ p. But there is no right identity: p ⇒ True ≡ True (not p), and p ⇒ False ≡ ¬p (not p).

3.22 The right zero of ⇒ is True: p ⇒ True ≡ True. But there is no left zero for implies: True ⇒ p is equivalent to p, not to True; and False ⇒ p is equivalent to True, not to False.

3.23 x * y is equivalent to x ∧ y.

3.24 x + y is equivalent to x ∨ y.

3.25 1 - x is equivalent to ¬x and 1 - y is equivalent to ¬y. Addition is the equivalent of a disjunction, so 2 - x - y is equivalent to ¬x ∨ ¬y.

3.26 (x * (1 - y)) + ((1 - x) * y) is equivalent to (x ∧ ¬y) ∨ (¬x ∧ y), which is equivalent to x ⊕ y.

3.27 (p1 + p2 + p3 + · · · + pn) ≥ 3

3.28 (p1 + p2 + p3 + · · · + pn) ≥ n − 1

3.29 x3: x is greater than or equal to 8 if and only if x's binary representation is 1???.

3.30 ¬x0 ∧ ¬x1: x is evenly divisible by 4 if and only if x's binary representation is ??00.

3.31 The value of x is a + b, where a = 4x2 + 1x0 and b = 8x3 + 2x1. It takes a bit of work to persuade yourself of this fact, but it turns out that x is divisible by 5 if and only if both a and b are divisible by 5. Thus the proposition is:

    (x0 ⇔ x2) ∧ (x1 ⇔ x3),

where the first conjunct says that a is divisible by 5, and the second says that b is divisible by 5. You can verify that this proposition is correct with a truth table.

3.32 There are two values of x that are evenly divisible by 9: x = 0 and x = 9. Written in binary, those values are 0000 and 1001. Thus x is evenly divisible by 9 if the middle two bits are both zero, and the outer two bits match: (x0 ⇔ x3) ∧ ¬x1 ∧ ¬x2.

3.33 There are several ways to express that x is an exact power of two. Here's one, which simply lists the four different values that are powers of two (1000, 0100, 0010, and 0001, respectively):

    [¬x0 ∧ ¬x1 ∧ ¬x2 ∧ x3] ∨ [¬x0 ∧ ¬x1 ∧ x2 ∧ ¬x3] ∨ [¬x0 ∧ x1 ∧ ¬x2 ∧ ¬x3] ∨ [x0 ∧ ¬x1 ∧ ¬x2 ∧ ¬x3]

Another version is to say that at least one bit is 1 but that no two bits are 1:

    (x0 ∨ x1 ∨ x2 ∨ x3) ∧ (¬x0 ∨ ¬x1) ∧ (¬x0 ∨ ¬x2) ∧ (¬x0 ∨ ¬x3) ∧ (¬x1 ∨ ¬x2) ∧ (¬x1 ∨ ¬x3) ∧ (¬x2 ∨ ¬x3).

3.34 (x0 ⇔ y0) ∧ (x1 ⇔ y1) ∧ (x2 ⇔ y2) ∧ (x3 ⇔ y3)
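Instead of a truth table, the divisibility proposition of Exercise 3.31 can be verified exhaustively over all sixteen 4-bit values. This sketch is our addition (the helper name div5_prop is ours):

```python
# Check Exercise 3.31: for a 4-bit x with bits x3 x2 x1 x0, the proposition
# (x0 <-> x2) AND (x1 <-> x3) holds exactly when 5 divides x.

def div5_prop(x):
    x0, x1, x2, x3 = (x >> 0) & 1, (x >> 1) & 1, (x >> 2) & 1, (x >> 3) & 1
    return (x0 == x2) and (x1 == x3)

for x in range(16):
    assert div5_prop(x) == (x % 5 == 0)
```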


3.35 Expressing x ≤ y is a bit more complicated than expressing equality. The following proposition does so by writing out "x and y are identical before the ith position, and furthermore xi < yi" for each possible index i:

    (y3 ∧ ¬x3)
    ∨ [(y3 ⇔ x3) ∧ (y2 ∧ ¬x2)]
    ∨ [(y3 ⇔ x3) ∧ (y2 ⇔ x2) ∧ (y1 ∧ ¬x1)]
    ∨ [(y3 ⇔ x3) ∧ (y2 ⇔ x2) ∧ (y1 ⇔ x1) ∧ (y0 ∨ ¬x0)].

(The last disjunct also permits y0 = x0 when x1,2,3 = y1,2,3.)

3.36 The key point is that, when written in binary, doubling a number simply involves shifting it one position to the left. We need y ≤ 7 (otherwise 2y ≥ 16), we need x to be even, and we need the other bits to match. Thus:

    ¬y3 ∧ ¬x0 ∧ (y2 ⇔ x3) ∧ (y1 ⇔ x2) ∧ (y0 ⇔ x1).

3.37 The only values of x and y that satisfy x^y = y^x are either those for which x = y, or {x, y} = {2, 4}, as 2⁴ = 16 = 4². Thus:

    [(x0 ⇔ y0) ∧ (x1 ⇔ y1) ∧ (x2 ⇔ y2) ∧ (x3 ⇔ y3)]        [x = y]
    ∨ ([¬x3 ∧ ¬y3] ∧ [x1 ⊕ x2] ∧ [y1 ⊕ y2] ∧ [¬x0 ∧ ¬y0])   [{x, y} = {2, 4}]

(Because 2 and 4 are represented as 0010 or 0100 in binary, we need the 0th and 3rd bits to be zero, and exactly one of the middle bits to be one.)

3.38 Given a 4-bit number x written in binary, let y be (x + 1) mod 16. Then:

    y0 = ¬x0
    y1 = (¬x0 ∧ x1) ∨ (x0 ∧ ¬x1)
    y2 = ((¬x0 ∨ ¬x1) ∧ x2) ∨ (x0 ∧ x1 ∧ ¬x2)
    y3 = ((¬x0 ∨ ¬x1 ∨ ¬x2) ∧ x3) ∨ (x0 ∧ x1 ∧ x2 ∧ ¬x3)
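These four bit-formulas can be checked exhaustively against ordinary arithmetic; the following sketch is our addition (the helper name increment_bits is ours):

```python
# Check Exercise 3.38: the formulas for y0..y3 compute y = (x + 1) mod 16.

def increment_bits(x):
    """Apply the four bit-formulas to the bits of x and reassemble the result."""
    x0, x1, x2, x3 = [bool((x >> i) & 1) for i in range(4)]
    y0 = not x0
    y1 = (not x0 and x1) or (x0 and not x1)
    y2 = ((not x0 or not x1) and x2) or (x0 and x1 and not x2)
    y3 = ((not x0 or not x1 or not x2) and x3) or (x0 and x1 and x2 and not x3)
    return sum(bit << i for i, bit in enumerate([y0, y1, y2, y3]))

for x in range(16):
    assert increment_bits(x) == (x + 1) % 16
```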

3.39 A solution in Python is shown in Figure S.3.1 on p. 28, in the first block of code (which defines a Proposition class with a getVariables method).

3.40 An added method for Propositions (the evaluate method), and a corresponding TruthAssignment class, are shown in the second block of code in Figure S.3.1 on p. 28.

3.41 A function to produce all TruthAssignments for a given set of variables is shown in Figure S.3.2; sample output on a small proposition is shown in the last block in the same figure.

3.3 Propositional Logic: Some Extensions

3.42 p ⇒ p ≡ True

3.43 p ⊕ p ≡ False

3.44 p ⇔ p ≡ True

3.45 Either p ⇒ (¬p ⇒ (p ⇒ q)) or (p ⇒ ¬p) ⇒ (p ⇒ q) is a tautology.

3.46 (p ⇒ (¬p ⇒ p)) ⇒ q


    class Proposition:
        def __init__(self, parsed_expression):
            '''A class for Boolean formulas. Representations are lists in prefix notation, as in:
                   "p"     ["not", "p"]     ["and", ["not", "p"], "q"]
               (Legal connectives: not, or, and, implies, iff, xor.)'''
            self.formula = parsed_expression

        def getFormula(self):
            return self.formula

        def getVariables(self):
            '''Extract all propositional variables from this proposition.'''
            if type(self.formula) == str:
                return set([self.formula])
            elif self.formula[0] == "not":
                Phi = Proposition(self.formula[1])
                return Phi.getVariables()
            elif self.formula[0] in ["implies", "and", "or", "xor", "iff"]:
                Phi, Psi = Proposition(self.formula[1]), Proposition(self.formula[2])
                return Phi.getVariables().union(Psi.getVariables())

        def evaluate(self, truth_assignment):  # Add this method to the Proposition class.
            '''What is the truth value of this proposition under the given
               truth_assignment (a TruthAssignment mapping variables to True or False)?'''
            def helper(expr):
                if type(expr) == str:
                    return truth_assignment.lookup(expr)
                elif expr[0] == "not":
                    return not helper(expr[1])
                elif expr[0] == "and":
                    return helper(expr[1]) and helper(expr[2])
                elif expr[0] == "or":
                    return helper(expr[1]) or helper(expr[2])
                elif expr[0] == "implies":
                    return not helper(expr[1]) or helper(expr[2])
                elif expr[0] == "xor":
                    return helper(expr[1]) != helper(expr[2])
                elif expr[0] == "iff":
                    return helper(expr[1]) == helper(expr[2])
            return helper(self.formula)

    class TruthAssignment:
        '''A truth assignment, built from a dictionary mapping variable names to True or False.'''
        def __init__(self, variable_mapping):
            self.mapping = variable_mapping

        def lookup(self, variable):
            return self.mapping[variable]

        def extend(self, variable, truth_value):
            self.mapping[variable] = truth_value

Figure S.3.1 Evaluating a proposition (and finding the set of truth assignments that make it true), in Python.

3.47 The implicit parentheses make the given proposition into ((p ⇒ ¬p) ⇒ p) ⇒ q, which is logically equivalent to p ⇒ q.

3.48 Set all three of p, q, and r to be False. Then

    p ⇒ (q ⇒ r) ≡ False ⇒ (False ⇒ False) ≡ False ⇒ True ≡ True

but

    (p ⇒ q) ⇒ r ≡ (False ⇒ False) ⇒ False ≡ True ⇒ False ≡ False.

3.49 Here’s the truth table for both p ⇒ (q ⇒ q) and (p ⇒ q) ⇒ q:


    import copy

    def all_truth_assignments(variables):
        '''Compute a list of all TruthAssignments for the given set of variables.'''
        if len(variables) == 0:
            return [TruthAssignment({})]
        else:
            assignmentsT = all_truth_assignments(list(variables)[1:])
            assignmentsF = copy.deepcopy(assignmentsT)
            [rho.extend(list(variables)[0], True) for rho in assignmentsT]
            [rho.extend(list(variables)[0], False) for rho in assignmentsF]
            return assignmentsT + assignmentsF

    def satisfying_assignments(proposition):
        variable_set = proposition.getVariables()
        return [rho for rho in all_truth_assignments(variable_set) if proposition.evaluate(rho)]

    # For example, the following code outputs three truth assignments:
    #   {'q': True, 'p': True}   {'q': True, 'p': False}   {'q': False, 'p': False}
    for rho in satisfying_assignments(Proposition(["or", ["not", "p"], "q"])):
        print(rho.mapping)

Figure S.3.2 Evaluating a proposition (and finding the set of truth assignments that make it true), in Python.

    p  q  | p ⇒ q | (p ⇒ q) ⇒ q | q ⇒ q | p ⇒ (q ⇒ q)
    T  T  |   T   |      T      |   T   |      T
    T  F  |   F   |      T      |   T   |      T
    F  T  |   T   |      T      |   T   |      T
    F  F  |   T   |      F      |   T   |      T

p ⇒ (q ⇒ q) T T T T

So p ⇒ (q ⇒ q) is a tautology (because its conclusion q ⇒ q is a tautology), but (p ⇒ q) ⇒ q is false when both p and q are false. 3.50 The propositions p ⇒ (p ⇒ q) and (p ⇒ p) ⇒ q are logically equivalent; they’re both true except when p is true and q is false. Thus they are both logically equivalent to p ⇒ q. Neither proposition is a tautology; both are satisfiable. 3.51 This answer is wrong: False ⊕ False is false, but when p = q = False, then ¬(p ∧ q) ⇒ (¬p ∧ ¬q) has the value ¬(False ∧ False) ⇒ (¬False ∧ ¬False) ≡ ¬False ⇒ True ≡ True ⇒ True ≡ True. 3.52 This answer is wrong: False ⊕ False is false, but when p = q = False, then (p ⇒ ¬q) ∧ (q ⇒ ¬p) has the value (False ⇒ ¬False) ∧ (False ⇒ ¬False) ≡ (False ⇒ True) ∧ (False ⇒ True) ≡ True ∧ True ≡ True. 3.53 Let’s build a truth table: p T T F F

q T F T F

¬p

¬p ⇒ q

¬(p ∧ q)

(¬p ⇒ q) ∧ ¬(p ∧ q) [the given proposition]

F F T T

T T T F

F T T T

F T T F

The last column of this truth table matches the truth table for p ⊕ q, so the solution is correct.
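The same equivalence can be confirmed mechanically rather than by table; the following sketch is our addition (the helper names implies and given_prop are ours):

```python
# Check Exercise 3.53: (not p -> q) and not(p and q) has the same
# truth table as p xor q.

from itertools import product

def implies(a, b):
    return (not a) or b

def given_prop(p, q):
    return implies(not p, q) and not (p and q)

for p, q in product([True, False], repeat=2):
    assert given_prop(p, q) == (p != q)
```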


3.54 The given proposition is ¬[(p ∧ ¬q ⇒ ¬p ∧ q) ∧ (¬p ∧ q ⇒ p ∧ ¬q)], which has two repeated subexpressions, namely p ∧ ¬q and ¬p ∧ q, which we might call φ and ψ, respectively. Then the given proposition can be written as ¬[(φ ⇒ ψ) ∧ (ψ ⇒ φ)]. Here's a truth table:

    p  q  | φ = p ∧ ¬q | ψ = ¬p ∧ q | φ ⇒ ψ | ψ ⇒ φ | (φ ⇒ ψ) ∧ (ψ ⇒ φ) | ¬[(φ ⇒ ψ) ∧ (ψ ⇒ φ)]
    T  T  |      F     |      F     |   T   |   T   |         T          |          F
    T  F  |      T     |      F     |   F   |   T   |         F          |          T
    F  T  |      F     |      T     |   T   |   F   |         F          |          T
    F  F  |      F     |      F     |   T   |   T   |         T          |          F

The truth table for p ⊕ q is exactly the last column of this truth table—the given proposition—so the solution is correct.

3.55 One proposition that works: (p ∨ q) ⇒ (¬p ∨ ¬q). (There are many other choices too.)

3.56 Note that p ∨ (¬p ∧ q) is equivalent to p ∨ q, so

    1  if x > 20 or (x ≤ 20 and y < 0) then
    2    foo(x, y)
    3  else
    4    bar(x, y)

can be rewritten as

    1  if x > 20 or y < 0 then
    2    foo(x, y)
    3  else
    4    bar(x, y)

3.57 Note that (x − y) · y ≥ 0 if and only if x − y and y are both nonnegative or both nonpositive—in particular, (x − y) · y ≥ 0 holds whenever (x − y ≥ 0) ⇔ (y ≥ 0). Thus, writing y ≥ 0 as p, and x ≥ y as q, the given condition is implied by p ∨ q ∨ (p ⇔ q)—and this proposition is a tautology! Thus

    1  if y ≥ 0 or y ≤ x or (x − y) · y ≥ 0 then
    2    foo(x, y)
    3  else
    4    bar(x, y)

can be rewritten simply as

    1  foo(x, y)

3.58 Write p to denote 12 | x and q to denote 4 | x. Note that if p is true, then q must be true. By unnesting the conditions, we see that foo is called if p ∧ ¬q (which is impossible). Thus

    1   if x mod 12 = 0 then
    2     if x mod 4 ≠ 0 then
    3       foo(x, y)
    4     else
    5       bar(x, y)
    6   else
    7     if x = 17 then
    8       baz(x, y)
    9     else
    10      quz(x, y)

can be simplified into

    1  if x mod 12 = 0 then
    2    bar(x, y)
    3  else
    4    if x = 17 then
    5      baz(x, y)
    6    else
    7      quz(x, y)

3.59 Let's build a truth table for (¬p ⇒ q) ∧ (q ∧ p ⇒ ¬p):

    p  q  | ¬p ⇒ q | q ∧ p ⇒ ¬p | (¬p ⇒ q) ∧ (q ∧ p ⇒ ¬p)
    T  T  |    T   |      F     |            F
    T  F  |    T   |      T     |            T
    F  T  |    T   |      T     |            T
    F  F  |    F   |      T     |            F

Thus the given proposition is logically equivalent to p ⊕ q.


3.60 The expression p ⇒ p is always true, and so too is q ⇒ True. Thus we can immediately rewrite the statement as

    (p ⇒ ¬p) ⇒ ((q ⇒ (p ⇒ p)) ⇒ p) ≡ (p ⇒ ¬p) ⇒ ((q ⇒ True) ⇒ p) ≡ (p ⇒ ¬p) ⇒ (True ⇒ p).

If p is true, then (p ⇒ ¬p) ⇒ (True ⇒ p) is (True ⇒ False) ⇒ (True ⇒ True), which is just False ⇒ True, or True. If p is false, then (p ⇒ ¬p) ⇒ (True ⇒ p) is (False ⇒ True) ⇒ (True ⇒ False), which is just True ⇒ False, or False. Thus the whole expression is logically equivalent to p.

3.61 Both p ⇒ p and ¬p ⇒ ¬p are tautologies, so the given proposition (p ⇒ p) ⇒ ((¬p ⇒ ¬p) ∧ q) is equivalent to True ⇒ (True ∧ q). That proposition is true when q is true, and false when q is false—so the whole expression is logically equivalent to q.

3.62 The claim states that every proposition over the single variable p is either logically equivalent to p or it's logically equivalent to ¬p. The claim is false; a proposition over a single variable p might be logically equivalent to True or False instead of being logically equivalent to p or ¬p. (For example, p ∧ ¬p.) However, every proposition φ over p is logically equivalent to one of {p, ¬p, True, False}. To prove this, observe that there are only two truth assignments for φ, one where p is true and one where p is false. Thus there are only four possible truth tables for φ:

    p  | (1) | (2) | (3) | (4)
    T  |  T  |  T  |  F  |  F
    F  |  T  |  F  |  T  |  F

These columns are, respectively: (1) True, (2) p, (3) ¬p, and (4) False.

3.63

    p  q  | p ∨ q | p ⇒ (p ∨ q)
    T  T  |   T   |      T
    T  F  |   T   |      T
    F  T  |   T   |      T
    F  F  |   F   |      T

3.64

    p  q  | p ∧ q | (p ∧ q) ⇒ p
    T  T  |   T   |      T
    T  F  |   F   |      T
    F  T  |   F   |      T
    F  F  |   F   |      T

3.65

    p  q  | p ⇒ q | ¬q | (p ⇒ q) ∧ ¬q | ¬p | (p ⇒ q) ∧ ¬q ⇒ ¬p
    T  T  |   T   |  F |       F       |  F |         T
    T  F  |   F   |  T |       F       |  F |         T
    F  T  |   T   |  F |       F       |  T |         T
    F  F  |   T   |  T |       T       |  T |         T

3.66

    p  q  | p ∨ q | ¬p | (p ∨ q) ∧ ¬p | (p ∨ q) ∧ ¬p ⇒ q
    T  T  |   T   |  F |       F      |         T
    T  F  |   T   |  F |       F      |         T
    F  T  |   T   |  T |       T      |         T
    F  F  |   F   |  T |       F      |         T

3.67

    p  q  | p ⇒ q | ¬p ⇒ q | (p ⇒ q) ∧ (¬p ⇒ q) | [(p ⇒ q) ∧ (¬p ⇒ q)] ⇒ q
    T  T  |   T   |    T   |          T          |            T
    T  F  |   F   |    T   |          F          |            T
    F  T  |   T   |    T   |          T          |            T
    F  F  |   T   |    F   |          F          |            T


3.68

    p  q  r | p ⇒ q | q ⇒ r | (p ⇒ q) ∧ (q ⇒ r) | p ⇒ r | [(p ⇒ q) ∧ (q ⇒ r)] ⇒ (p ⇒ r)
    T  T  T |   T   |   T   |         T          |   T   |               T
    T  T  F |   T   |   F   |         F          |   F   |               T
    T  F  T |   F   |   T   |         F          |   T   |               T
    T  F  F |   F   |   T   |         F          |   F   |               T
    F  T  T |   T   |   T   |         T          |   T   |               T
    F  T  F |   T   |   F   |         F          |   T   |               T
    F  F  T |   T   |   T   |         T          |   T   |               T
    F  F  F |   T   |   T   |         T          |   T   |               T

3.69

    p  q  r | q ∧ r | p ⇒ q | p ⇒ r | p ⇒ q ∧ r | (p ⇒ q) ∧ (p ⇒ r) ⇔ p ⇒ q ∧ r
    T  T  T |   T   |   T   |   T   |     T     |  T
    T  T  F |   F   |   T   |   F   |     F     |  T
    T  F  T |   F   |   F   |   T   |     F     |  T
    T  F  F |   F   |   F   |   F   |     F     |  T
    F  T  T |   T   |   T   |   T   |     T     |  T
    F  T  F |   F   |   T   |   T   |     T     |  T
    F  F  T |   F   |   T   |   T   |     T     |  T
    F  F  F |   F   |   T   |   T   |     T     |  T

3.70

    p  q  r | q ∨ r | p ⇒ q | p ⇒ r | p ⇒ q ∨ r | (p ⇒ q) ∨ (p ⇒ r) ⇔ p ⇒ q ∨ r
    T  T  T |   T   |   T   |   T   |     T     |  T
    T  T  F |   T   |   T   |   F   |     T     |  T
    T  F  T |   T   |   F   |   T   |     T     |  T
    T  F  F |   F   |   F   |   F   |     F     |  T
    F  T  T |   T   |   T   |   T   |     T     |  T
    F  T  F |   T   |   T   |   T   |     T     |  T
    F  F  T |   T   |   T   |   T   |     T     |  T
    F  F  F |   F   |   T   |   T   |     T     |  T

3.71

    p  q  r | q ∨ r | p ∧ (q ∨ r) | p ∧ q | p ∧ r | (p ∧ q) ∨ (p ∧ r) | p ∧ (q ∨ r) ⇔ (p ∧ q) ∨ (p ∧ r)
    T  T  T |   T   |      T      |   T   |   T   |         T          |  T
    T  T  F |   T   |      T      |   T   |   F   |         T          |  T
    T  F  T |   T   |      T      |   F   |   T   |         T          |  T
    T  F  F |   F   |      F      |   F   |   F   |         F          |  T
    F  T  T |   T   |      F      |   F   |   F   |         F          |  T
    F  T  F |   T   |      F      |   F   |   F   |         F          |  T
    F  F  T |   T   |      F      |   F   |   F   |         F          |  T
    F  F  F |   F   |      F      |   F   |   F   |         F          |  T

3.72

    p  q  r | q ⇒ r | p ⇒ (q ⇒ r) | p ∧ q | p ∧ q ⇒ r | p ⇒ (q ⇒ r) ⇔ p ∧ q ⇒ r
    T  T  T |   T   |      T      |   T   |     T     |  T
    T  T  F |   F   |      F      |   T   |     F     |  T
    T  F  T |   T   |      T      |   F   |     T     |  T
    T  F  F |   T   |      T      |   F   |     T     |  T
    F  T  T |   T   |      T      |   F   |     T     |  T
    F  T  F |   F   |      T      |   F   |     T     |  T
    F  F  T |   T   |      T      |   F   |     T     |  T
    F  F  F |   T   |      T      |   F   |     T     |  T


3.73

    p  q  | p ∧ q | p ∨ (p ∧ q)
    T  T  |   T   |      T
    T  F  |   F   |      T
    F  T  |   F   |      F
    F  F  |   F   |      F

Because the p column and p ∨ (p ∧ q) column match, the proposition p ∨ (p ∧ q) ⇔ p is a tautology.

3.74

    p  q  | p ∨ q | p ∧ (p ∨ q)
    T  T  |   T   |      T
    T  F  |   T   |      T
    F  T  |   T   |      F
    F  F  |   F   |      F

Because the p column and p ∧ (p ∨ q) column match, the proposition p ∧ (p ∨ q) ⇔ p is a tautology.

3.75

    p  q  | p ⊕ q | p ∨ q | p ⊕ q ⇒ p ∨ q
    T  T  |   F   |   T   |       T
    T  F  |   T   |   T   |       T
    F  T  |   T   |   T   |       T
    F  F  |   F   |   F   |       T

3.76

    p  q  | p ∧ q | ¬(p ∧ q) | ¬p | ¬q | ¬p ∨ ¬q
    T  T  |   T   |     F    |  F |  F |    F
    T  F  |   F   |     T    |  F |  T |    T
    F  T  |   F   |     T    |  T |  F |    T
    F  F  |   F   |     T    |  T |  T |    T

3.77

    p  q  | p ∨ q | ¬(p ∨ q) | ¬p | ¬q | ¬p ∧ ¬q
    T  T  |   T   |     F    |  F |  F |    F
    T  F  |   T   |     F    |  F |  T |    F
    F  T  |   T   |     F    |  T |  F |    F
    F  F  |   F   |     T    |  T |  T |    T

3.78

    p  q  r | q ∨ r | p ∨ (q ∨ r) | p ∨ q | (p ∨ q) ∨ r
    T  T  T |   T   |      T      |   T   |      T
    T  T  F |   T   |      T      |   T   |      T
    T  F  T |   T   |      T      |   T   |      T
    T  F  F |   F   |      T      |   T   |      T
    F  T  T |   T   |      T      |   T   |      T
    F  T  F |   T   |      T      |   T   |      T
    F  F  T |   T   |      T      |   F   |      T
    F  F  F |   F   |      F      |   F   |      F

3.79

    p  q  r | q ∧ r | p ∧ (q ∧ r) | p ∧ q | (p ∧ q) ∧ r
    T  T  T |   T   |      T      |   T   |      T
    T  T  F |   F   |      F      |   T   |      F
    T  F  T |   F   |      F      |   F   |      F
    T  F  F |   F   |      F      |   F   |      F
    F  T  T |   T   |      F      |   F   |      F
    F  T  F |   F   |      F      |   F   |      F
    F  F  T |   F   |      F      |   F   |      F
    F  F  F |   F   |      F      |   F   |      F


3.80

    p  q  r | q ⊕ r | p ⊕ (q ⊕ r) | p ⊕ q | (p ⊕ q) ⊕ r
    T  T  T |   F   |      T      |   F   |      T
    T  T  F |   T   |      F      |   F   |      F
    T  F  T |   T   |      F      |   T   |      F
    T  F  F |   F   |      T      |   T   |      T
    F  T  T |   F   |      F      |   T   |      F
    F  T  F |   T   |      T      |   T   |      T
    F  F  T |   T   |      T      |   F   |      T
    F  F  F |   F   |      F      |   F   |      F

3.81

    p  q  r | q ⇔ r | p ⇔ (q ⇔ r) | p ⇔ q | (p ⇔ q) ⇔ r
    T  T  T |   T   |      T      |   T   |      T
    T  T  F |   F   |      F      |   T   |      F
    T  F  T |   F   |      F      |   F   |      F
    T  F  F |   T   |      T      |   F   |      T
    F  T  T |   T   |      F      |   F   |      F
    F  T  F |   F   |      T      |   F   |      T
    F  F  T |   F   |      T      |   T   |      T
    F  F  F |   T   |      F      |   T   |      F

3.82

    p  q  | p ⇒ q | ¬p | ¬p ∨ q
    T  T  |   T   |  F |    T
    T  F  |   F   |  F |    F
    F  T  |   T   |  T |    T
    F  F  |   T   |  T |    T

3.83

    p  q  r | q ⇒ r | p ⇒ (q ⇒ r) | p ∧ q | p ∧ q ⇒ r
    T  T  T |   T   |      T      |   T   |     T
    T  T  F |   F   |      F      |   T   |     F
    T  F  T |   T   |      T      |   F   |     T
    T  F  F |   T   |      T      |   F   |     T
    F  T  T |   T   |      T      |   F   |     T
    F  T  F |   F   |      T      |   F   |     T
    F  F  T |   T   |      T      |   F   |     T
    F  F  F |   T   |      T      |   F   |     T

3.84

    p  q  | p ⇔ q | ¬p | ¬q | ¬p ⇔ ¬q
    T  T  |   T   |  F |  F |    T
    T  F  |   F   |  F |  T |    F
    F  T  |   F   |  T |  F |    F
    F  F  |   T   |  T |  T |    T

3.85

    p  q  | p ⇒ q | ¬(p ⇒ q) | ¬q | p ∧ ¬q
    T  T  |   T   |     F    |  F |    F
    T  F  |   F   |     T    |  T |    T
    F  T  |   T   |     F    |  F |    F
    F  F  |   T   |     F    |  T |    F

3.86 If p stands for an expression that causes an error when it's evaluated, then the two blocks of code aren't equivalent. The first block causes a crash; the second doesn't. For example, if p stands for the expression 3 / 0 > 1, the first block crashes and the second merrily sets x to 51.

3.87 The tautology is p ⇔ (p ⇔ True). Observe that p ⇔ (p ⇔ True) is logically equivalent to (p ⇔ p) ⇔ True by the associativity of ⇔. And p ⇔ p is a tautology, by the idempotence of ⇔.


3.88 We must find a circuit that’s true when the true inputs are {q} or {r}, and false when the true inputs are {p} or {p, q} or {p, q, r}. The only one-gate circuit of this form is ¬p—though there are many less efficient ways to express a logically equivalent solution, or an inequivalent one, with additional gates. One equivalent example, among many, is ¬(p ∧ (r ∨ p)); one inequivalent answer is ¬p ∧ (q ∨ r). 3.89 We must find a circuit that’s true when the true inputs are {p, q} or {p, r}, and false when the true inputs are {p} or {q} or {r}. No 0- or 1-gate circuits achieve this specification, but the 2-gate circuit p ∧ (q ∨ r) does. (Again, there are many less efficient ways of expressing an equivalent formula.) 3.90 There are two possible settings that can make the light go on (the only types of inputs that aren’t fully specified in the question have zero or two true inputs): • If the light is on when the true inputs are {}, then the circuit is equivalent to ¬(p ∨ q ∨ r). • If the light is on when the true inputs are some pair of inputs (say, {p, q}, without loss of generality), then the circuit is equivalent to p ∧ q ∧ ¬r. (If it were a different pair of input variables that caused the light to go on, then those two input variables would appear unnegated in the circuit’s formula, and the third variable would appear negated.) There is no circuit with two or fewer gates that achieves this specification. 3.91 There are two possible pairs of settings that can make the light go on (the only types of inputs that aren’t fully specified in the question have zero or one true inputs): • If the light is on when the true inputs are {} or {p}, then the simplest circuit is equivalent to ¬(q ∨ r). (It doesn’t matter which singleton input causes the bulb to go on, so without loss of generality let’s call it p.) • If the light is on when the true inputs are {p} or {q} (without loss of generality, similarly), then the circuit ¬(r∨(p∧q)) matches the specification. 
(In this case, the {} input also causes the bulb to go on.) There is no circuit with two or fewer gates that matches this case.

The former circuit, of the form ¬(q ∨ r), uses only two gates; the latter requires three.

3.92 The circuit is equivalent to False, which can be written as p ∧ ¬p (and cannot be expressed with fewer than two gates).

3.93 The sixteen propositions over p and q are listed in Figure 4.34, or reproduced here:

           1     2    3    4    5    6    7    8      9      10   11   12   13   14     15      16
    p  q  True  p∨q  p⇐q   p   p⇒q   q   p⇔q  p∧q  p nand q  p⊕q  ¬q  p∧¬q  ¬p  ¬p∧q  p nor q  False
    T  T   T     T    T    T    T    T    T    T      F       F    F    F    F    F      F       F
    T  F   T     T    T    T    F    F    F    F      T       T    T    T    F    F      F       F
    F  T   T     T    F    F    T    T    F    F      T       T    F    F    T    T      F       F
    F  F   T     F    T    F    T    F    T    F      T       F    T    F    T    F      T       F

Of these, here are the ones that can be expressed with zero, one, or two ∧, ∨, and ¬ gates: • Two can be expressed with 0 gates: p and q. • Four can be expressed with 1 gate: ¬p, ¬q, p ∧ q, and p ∨ q. • Eight can be expressed with 2 gates: true (p ∨ ¬p); false (p ∧ ¬p); nand (¬(p ∧ q)), nor (¬(p ∨ q)); p ⇒ q (as ¬p ∨ q); q ⇒ p (as ¬q ∨ p); column #12 (p ∧ ¬q); and column #14 (¬p ∧ q). The only two propositions that can’t be expressed with two or fewer gates are p ⊕ q and p ⇔ q. 3.94 There are 84 different propositions, which is verified by the output of the Python program in Figure S.3.3. 3.95 The proposition p ⊕ q ⊕ r ⊕ s ⊕ t is true when an odd number of the Boolean variables {p, q, r, s, t} are true—that is, when p + q + r + s + t mod 2 = 1. 3.96 Here is the pseudocode:

for y := 1, 2, . . . , height:
    for x := 1, 2, . . . , width:
        if P[x, y] is more white than black then
            error := “white” − P[x, y]
            P[x, y] := “white”
        else
            error := “black” − P[x, y]
            P[x, y] := “black”
        if x < width then distribute E.
        if x > 1 and y < height then distribute SW.
        if y < height then distribute S.
        if x < width and y < height then distribute SE.

def equivalent_propositions(propA, propB, truth_assignments):
    '''Is propA's truth value equal to propB's in each rho in truth_assignments?'''
    return all([propA.evaluate(rho) == propB.evaluate(rho)
                for rho in truth_assignments])

def all_formulae(variables, numGates):
    '''Compute a list of all Propositions over the given variables
       with the given number of gates.'''
    if numGates == 0:
        return [Proposition(var) for var in variables]
    else:
        subformulae = {}
        for i in range(numGates):
            subformulae[i] = all_formulae(variables, i)
        result = [Proposition(["not", phi.getFormula()])
                  for phi in subformulae[numGates-1]]
        for i in range(numGates):
            result += [Proposition([operator, phi.getFormula(), psi.getFormula()])
                       for operator in ["and", "or"]
                       for phi in subformulae[i]
                       for psi in subformulae[numGates-i-1]]
        return result

def cull_equivalent_formulae(formulae, truth_assignments):
    '''Filter out any logically equivalent formulae in the given list of formulae.'''
    uniqueList = []
    for phi in formulae:
        if not any(equivalent_propositions(phi, psi, truth_assignments)
                   for psi in uniqueList):
            uniqueList.append(phi)
    return uniqueList

variables = ["p", "q", "r"]
gate_count = 3
allAssignments = all_truth_assignments(variables)
allFormulas = [phi for num_gates in range(gate_count+1)
                   for phi in all_formulae(variables, num_gates)]
allUniqueFormulas = cull_equivalent_formulae(allFormulas, allAssignments)
print(len(allUniqueFormulas), "distinct formulas are representable with", gate_count, "gates.")
# Output: 84 distinct formulas are representable with 3 gates.

Figure S.3.3 Python code finding circuits with inputs {p, q, r}, and at most three gates chosen from {∧, ∨, ¬}. The code here relies on the class and function definitions found in Figure S.3.1.

3.97 r ∨ (p ∧ q) 3.98 (q ∧ r) ∨ (¬q ∧ ¬r) ∨ (¬p ∧ ¬q ∧ r) ∨ (¬p ∧ q ∧ ¬r) 3.99 p ∨ q
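The error-diffusion pseudocode in the solution to Exercise 3.96 above translates readily into runnable Python. In this sketch, the fractions 7/16, 3/16, 5/16, and 1/16 used to "distribute" the error east, southwest, south, and southeast are an assumption (they are the classic Floyd–Steinberg weights; the pseudocode itself doesn't pin the fractions down):

```python
def dither(P):
    """Error-diffusion dithering of a grayscale image P (a list of rows,
    with values in [0.0, 1.0]); returns a 0/1 image of the same shape."""
    height, width = len(P), len(P[0])
    P = [row[:] for row in P]              # work on a copy of the input
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            old = P[y][x]
            new = 1 if old >= 0.5 else 0   # "more white than black"
            out[y][x] = new
            error = old - new
            # distribute the quantization error to unprocessed neighbors
            if x + 1 < width:
                P[y][x + 1] += error * 7 / 16                    # E
            if x > 0 and y + 1 < height:
                P[y + 1][x - 1] += error * 3 / 16                # SW
            if y + 1 < height:
                P[y + 1][x] += error * 5 / 16                    # S
            if x + 1 < width and y + 1 < height:
                P[y + 1][x + 1] += error * 1 / 16                # SE
    return out

print(dither([[0.5, 0.5], [0.5, 0.5]]))  # [[1, 0], [0, 1]]
```

As the constant-gray example shows, roughly half of the output pixels end up white, which is the point of diffusing (rather than discarding) the rounding error.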

3.3 Propositional Logic: Some Extensions


3.100 ¬p ∧ ¬q ∧ ¬r
3.101 (p ∨ r) ∧ (q ∨ r)
3.102 p ∧ (¬q ∨ ¬r)
3.103 (p ∨ ¬q) ∧ (¬q ∨ r)
3.104 p ∧ (q ∨ r)
3.105 We can produce a one-clause 3CNF tautology: for example, p ∨ ¬p ∨ q.
3.106 Using De Morgan's Laws, we know that a clause (a ∨ b ∨ c) is logically equivalent to ¬(¬a ∧ ¬b ∧ ¬c). That is, the truth assignment a = b = c = False does not satisfy any proposition · · · ∧ (a ∨ b ∨ c) ∧ · · ·. However, this particular clause is satisfied by any other truth assignment. Thus, we'll need to use a different clause to rule out each of the eight possible truth assignments. So the smallest 3CNF formula that's not satisfiable has three variables and eight clauses, each of which rules out one possible satisfying truth assignment: (p ∨ q ∨ r) ∧ (p ∨ q ∨ ¬r) ∧ (p ∨ ¬q ∨ r) ∧ (p ∨ ¬q ∨ ¬r) ∧ (¬p ∨ q ∨ r) ∧ (¬p ∨ q ∨ ¬r) ∧ (¬p ∨ ¬q ∨ r) ∧ (¬p ∨ ¬q ∨ ¬r).
3.107 Tautological DNF formulae correspond to satisfiable CNF formulae. A Boolean formula φ is a tautology if and only if ¬φ is not satisfiable, and for an m-clause φ in 3DNF, by an application of De Morgan's Laws we see that ¬φ is equivalent to an m-clause 3CNF formula. Thus the smallest 3DNF formula that's a tautology has three variables and eight clauses, each of which “rules in” one satisfying truth assignment: (p ∧ q ∧ r) ∨ (p ∧ q ∧ ¬r) ∨ (p ∧ ¬q ∧ r) ∨ (p ∧ ¬q ∧ ¬r) ∨ (¬p ∧ q ∧ r) ∨ (¬p ∧ q ∧ ¬r) ∨ (¬p ∧ ¬q ∧ r) ∨ (¬p ∧ ¬q ∧ ¬r).
3.108 We can give a one-clause non-satisfiable 3DNF formula: p ∧ ¬p ∧ q.
3.109 One way to work out this quantity is to split up the types of legal clauses based on the number of unnegated variables in them. We can have clauses with 0, 1, 2, or 3 unnegated variables:
• 3 unnegated variables: the only such clause is p ∨ q ∨ r.
• 2 unnegated variables: there are three different choices of which variable fails to appear in unnegated form (p, q, r), and there are three different choices of which variable does appear in negated form (again, p, q, r). Thus there are 3 · 3 = 9 total such clauses.
• 1 unnegated variable: again, there are three choices for the unnegated variable, and three choices for the omitted negated variable. Again there are 3 · 3 = 9 total such clauses. • 0 unnegated variables: the only such clause is ¬p ∨ ¬q ∨ ¬r. So the total is 1 + 9 + 9 + 1 = 20 clauses. 3.110 Because a formula in 3CNF is a conjunction of clauses, the entire proposition is a tautology if and only if every clause is a tautology. The only tautological clauses are of the form a ∨ ¬a ∨ b, so the largest tautological 3CNF formula has six clauses: (p ∨ ¬p ∨ q) ∧ (p ∨ ¬p ∨ r) ∧ (q ∨ ¬q ∨ p) ∧ (q ∨ ¬q ∨ r) ∧ (r ∨ ¬r ∨ p) ∧ (r ∨ ¬r ∨ q). 3.111 Suppose that φ is satisfied by the all-true truth assignment—that is, when p = q = r = True then φ itself evaluates to true. It’s fairly easy to see that if the all-true truth assignment is the only truth assignment that satisfies φ then there can be more clauses in φ than if there are many satisfying truth assignments. We cannot have the clause ¬p ∨ ¬q ∨ ¬r in φ; otherwise, the all-true assignment doesn’t satisfy φ. But any clause that contains at least one unnegated literal can be in φ. Those clauses are: • 1 clause with 3 unnegated variables: the only such clause is p ∨ q ∨ r. • 9 clauses with 2 unnegated variables: there are three choices of which variable fails to appear in unnegated form, and there are three different choices of which variable does appear in negated form.


• 9 clauses with 1 unnegated variable: again, there are three choices for the unnegated variable, and three choices for the omitted negated variable. So the total is 1 + 9 + 9 = 19 clauses: (p ∨ q ∨ r) ∧ (¬p ∨ q ∨ r) ∧ (p ∨ ¬q ∨ r) ∧ (p ∨ q ∨ ¬r) ∧ (¬p ∨ p ∨ q) ∧ (¬p ∨ p ∨ r) ∧ (¬q ∨ q ∨ p) ∧ (¬q ∨ q ∨ r) ∧ (¬r ∨ r ∨ p) ∧ (¬r ∨ r ∨ q) ∧ (¬p ∨ ¬q ∨ r) ∧ (¬p ∨ q ∨ ¬r) ∧ (p ∨ ¬q ∨ ¬r) ∧ (p ∨ ¬p ∨ ¬q) ∧ (p ∨ ¬p ∨ ¬r) ∧ (q ∨ ¬q ∨ ¬p) ∧ (q ∨ ¬q ∨ ¬r) ∧ (r ∨ ¬r ∨ ¬p) ∧ (r ∨ ¬r ∨ ¬q).
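The claim in Exercise 3.106 (and, by duality, 3.107) is small enough to confirm by exhaustive search. Here's a sketch that builds the eight clauses, one per sign pattern, and checks that no truth assignment satisfies all of them:

```python
from itertools import product

variables = ['p', 'q', 'r']
# One clause per sign pattern; sign True means the variable appears unnegated.
clauses = [list(zip(variables, signs))
           for signs in product([True, False], repeat=3)]

def satisfies(assignment, clause):
    # A clause is satisfied when at least one of its literals is.
    return any(assignment[var] == sign for var, sign in clause)

satisfiable = any(
    all(satisfies(dict(zip(variables, values)), c) for c in clauses)
    for values in product([True, False], repeat=3))
print(len(clauses), satisfiable)  # 8 False
```

Each of the eight assignments falsifies exactly the one clause whose three literals it makes false, so every assignment is ruled out, as the solution argues.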

3.4 An Introduction to Predicate Logic

3.112 P(x) = x has strong typing and x is object-oriented.
3.113 P(x) = x is imperative.
3.114 P(x) = x has strong typing and x is not object-oriented.
3.115 P(x) = x has either scope, or x is both imperative and has strong typing.
3.116 P(x) = x is either object-oriented or scripting.
3.117 P(x) = x does not have strong typing or x is not object-oriented.
3.118 n ≥ 8
3.119 ∃i ∈ {1, 2, . . . , n} : isLower(xi )
3.120 ∃i ∈ {1, 2, . . . , n} : ¬isLower(xi ) ∧ ¬isUpper(xi ) ∧ ¬isDigit(xi )
3.121 If there are no chairs at all in Bananaland, then both “all chairs in Bananaland are green” and “no chairs in Bananaland are green” are true.
3.122 The given predicate is logically equivalent to [age(x) < 18] ∨ [gpa(x) ≥ 3.0]. Here's the relevant truth table, writing p for [age(x) < 18] and q for [gpa(x) ≥ 3.0]:

p  q | ¬p ∧ q | p ∨ (¬p ∧ q) | p ∨ q
T  T |   F    |      T       |   T
F  T |   T    |      T       |   T
T  F |   F    |      T       |   T
F  F |   F    |      F       |   F

3.123 The given predicate is logically equivalent to ¬[takenCS(x)]. Here's the relevant truth table, writing p for takenCS(x) and q for [home(x) = Hawaii]:

p  q | q ∧ p | q ⇒ q ∧ p | ¬(q ⇒ q ∧ p) | p ⇒ ¬(q ⇒ (q ∧ p)) | ¬p
T  T |   T   |     T     |      F       |         F          |  F
T  F |   F   |     T     |      F       |         F          |  F
F  T |   F   |     F     |      T       |         T          |  T
F  F |   F   |     T     |      F       |         T          |  T

3.124 The given predicate is logically equivalent to hasMajor(x) ∧ ([schoolYear(x) ̸= 3] ∨ ¬onCampus(x)). Here's the relevant truth table, writing p for hasMajor(x) and q for [schoolYear(x) = 3] and r for onCampus(x):

p  q  r | p ∧ ¬q ∧ r | p ∧ ¬q ∧ ¬r | p ∧ q ∧ ¬r | (p ∧ ¬q ∧ r) ∨ (p ∧ ¬q ∧ ¬r) ∨ (p ∧ q ∧ ¬r) | p ∧ (¬q ∨ ¬r)
T  T  T |     F      |      F      |     F      |                      F                        |      F
T  T  F |     F      |      F      |     T      |                      T                        |      T
T  F  T |     T      |      F      |     F      |                      T                        |      T
T  F  F |     F      |      T      |     F      |                      T                        |      T
F  T  T |     F      |      F      |     F      |                      F                        |      F
F  T  F |     F      |      F      |     F      |                      F                        |      F
F  F  T |     F      |      F      |     F      |                      F                        |      F
F  F  F |     F      |      F      |     F      |                      F                        |      F
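Truth-table arguments like the ones in Exercises 3.122–3.124 are mechanical enough to delegate to a few lines of Python; a sketch:

```python
from itertools import product

def equivalent(f, g, nvars):
    """Do f and g agree under every truth assignment to nvars variables?"""
    return all(f(*vals) == g(*vals)
               for vals in product([True, False], repeat=nvars))

# Exercise 3.122: p ∨ (¬p ∧ q) is equivalent to p ∨ q.
print(equivalent(lambda p, q: p or (not p and q),
                 lambda p, q: p or q, 2))             # True

# Exercise 3.124: the three-clause disjunction is equivalent to p ∧ (¬q ∨ ¬r).
lhs = lambda p, q, r: ((p and not q and r) or (p and not q and not r)
                       or (p and q and not r))
rhs = lambda p, q, r: p and (not q or not r)
print(equivalent(lhs, rhs, 3))                        # True
```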

3.125 The claim is false. Let’s think of this problem as a question about a proposition involving {True, False, ∧, ∨, ¬, ⇒} and—once each—p and q. We’ll prove that the claim is false by proving that it’s impossible to express p ⇔ q and p ⊕ q with this kind of proposition. First, observe that any proposition involving p and q once each must involve a subproposition of the form φ ⋄ ψ, where φ uses p and ψ uses q. (In the tree-based representation, φ ⋄ ψ is the least common ancestor of p and q.) But φ, then, is a proposition involving p as the only variable, and ψ is a proposition involving q as the only variable. That means that φ can only be logically equivalent to one of four functions: True, False, p, ¬p. And ψ can only be logically equivalent to one of four functions: True, False, q, ¬q. (See Exercise 3.62.) Then what can φ ⋄ ψ represent, for a connective ∧, ∨, or ⇒ (or ⇐)? If φ doesn’t express either p or ¬p, or if ψ doesn’t express either q or ¬q, the entire proposition can only be equivalent to one of the six logical functions that do not involve both p and q: True, False, p, ¬p, q, ¬q. Otherwise, we are left with only these possible expressions:

            φ ≡ p                      φ ≡ ¬p
ψ ≡ q       p ∧ q                      ¬p ∧ q
            p ∨ q                      ¬p ∨ q
            p ⇒ q ≡ ¬p ∨ q             ¬p ⇒ q ≡ p ∨ q
            p ⇐ q ≡ ¬q ∨ p             ¬p ⇐ q ≡ ¬q ∨ ¬p
ψ ≡ ¬q      p ∧ ¬q                     ¬p ∧ ¬q
            p ∨ ¬q                     ¬p ∨ ¬q
            p ⇒ ¬q ≡ ¬p ∨ ¬q           ¬p ⇒ ¬q ≡ p ∨ ¬q
            p ⇐ ¬q ≡ p ∨ q             ¬p ⇐ ¬q ≡ ¬p ∨ q

But φ ⋄ ψ therefore can only compute the following functions:
• the six functions not involving both p and q: True, False, p, ¬p, q, ¬q.
• the four conjunctions: p ∧ q, ¬p ∧ q, p ∧ ¬q, ¬p ∧ ¬q.
• the four disjunctions: p ∨ q, ¬p ∨ q, p ∨ ¬q, ¬p ∨ ¬q. (The implications are all logically equivalent to one of these disjunctions, as the table shows.)
There are only 14 entries here, which leaves two unimplementable logical functions. (See Exercise 3.93, or Figure 4.34.) They are, perhaps intuitively, p ⇔ q and p ⊕ q.
3.126 “I am ambivalent about writing my program in Java.”
3.127 “I was excited to learn about a study abroad computer science program on the island of Java.”
3.128 “I love to start my day with a cup o' Java.”
3.129 “Implementing this algorithm in Java really helped me get with the program.”
3.130 We can prove that this statement is false by counterexample: the number n = 2 is prime, but n/2 = 1 is in Z.
3.131 ∃x : ¬[Q(x) ⇒ P(x)], which is logically equivalent to ∃x : Q(x) ∧ ¬P(x).
3.132 “There exists a negative entry in the array A.”
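The case analysis in Exercise 3.125 can also be confirmed exhaustively: enumerate every proposition of the form h(f(p) ⋄ g(q)), where f, g, and h range over the four one-variable Boolean functions and ⋄ over {∧, ∨, ⇒, ⇐}, and record which truth tables appear. A sketch:

```python
from itertools import product

unary = [lambda a: a, lambda a: not a, lambda a: True, lambda a: False]
binary = [lambda a, b: a and b,        # ∧
          lambda a, b: a or b,         # ∨
          lambda a, b: (not a) or b,   # ⇒
          lambda a, b: a or (not b)]   # ⇐

assignments = list(product([True, False], repeat=2))
tables = set()
for f, g, op, h in product(unary, unary, binary, unary):
    tables.add(tuple(bool(h(op(f(p), g(q)))) for p, q in assignments))

xor = tuple(p != q for p, q in assignments)
iff = tuple(p == q for p, q in assignments)
print(len(tables), xor in tables, iff in tables)  # 14 False False
```

Exactly the 14 functions named in the solution show up; p ⊕ q and p ⇔ q never do.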


3.133 The sentence is ambiguous, because the “or” of “parentheses or braces” might be inclusive or exclusive, though the much more natural reading is the former. (It's a rare programming language that has two fundamentally different ways of expressing block structure.) Negating that version yields: “There exists a decent programming language that denotes block structure with neither parentheses nor braces.”
3.134 “No odd number is evenly divisible by any other odd number.”
3.135 This sentence is ambiguous: the type of quantification expressed by “a lake” could be universal or existential. (In other words: are we talking about some particular lake, and claiming that there's some place in Minnesota that's far from that one lake? Or are we saying that there's a place in Minnesota that is far away from all lakes?) Thus there are two reasonable readings of the original sentence:
∃x ∈ Minnesota : ∃ℓ ∈ Lakes : d(x, ℓ) ≥ 10
∃x ∈ Minnesota : ∀ℓ ∈ Lakes : d(x, ℓ) ≥ 10.
The second reading seems more natural, and we can negate that reading as follows: “For every point in Minnesota, there is some lake within 10 miles of that point.”
3.136 This sentence is ambiguous; the order of quantification is unclear. Let's write A to denote the set of sorting algorithms and In to denote the set of n-element arrays. Then there are two natural readings:
∀x ∈ A : ∃y ∈ In : x(y) takes ≥ n log n steps    and    ∃y ∈ In : ∀x ∈ A : x(y) takes ≥ n log n steps.
(There's also some ambiguity in the meaning of n, which may be implicitly quantified or may have a particular numerical value from context. We'll leave this ambiguity alone.) The first reading, informally: after I see your code for sorting, it's possible for me to construct an array that makes your particular sorting algorithm slow. Negated: “There's a sorting algorithm that, for every n-element array, takes fewer than n log n steps on that array.” (Some sorting algorithm is always fast.)
The second reading, informally: there's a single devilish array that makes every sorting algorithm slow. Negated: “For every n-element array, there's some sorting algorithm that takes fewer than n log n steps on that array.” (No array is always sorted slowly.)
3.137 Assume the antecedent—that is, assume that there exists a particular value x∗ ∈ S such that P(x∗ ) ∧ Q(x∗ ). Then certainly (1) there exists an x ∈ S such that P(x) (namely, x∗ ) and (2) there exists an x ∈ S such that Q(x) (again, namely, x∗ ). But (1) just expresses ∃x ∈ S : P(x), and (2) just expresses ∃x ∈ S : Q(x).
3.138 We're asked to show that ∀x ∈ S : [P(x) ∧ Q(x)] is true if and only if [∀x ∈ S : P(x)] ∧ [∀x ∈ S : Q(x)] is. We'll be lazy and use Example 3.44 to solve the problem, by showing that the desired claim is an instance of Example 3.44:
[∀x ∈ S : P(x) ∧ Q(x)] ⇔ [∀x ∈ S : P(x)] ∧ [∀x ∈ S : Q(x)]
  ≡ ¬[∀x ∈ S : P(x) ∧ Q(x)] ⇔ ¬([∀x ∈ S : P(x)] ∧ [∀x ∈ S : Q(x)])        (p ⇔ q ≡ ¬p ⇔ ¬q)
  ≡ [∃x ∈ S : ¬(P(x) ∧ Q(x))] ⇔ ¬[∀x ∈ S : P(x)] ∨ ¬[∀x ∈ S : Q(x)]       (De Morgan's Laws and their predicate-logic analogues)
  ≡ [∃x ∈ S : ¬P(x) ∨ ¬Q(x)] ⇔ [∃x ∈ S : ¬P(x)] ∨ [∃x ∈ S : ¬Q(x)].       (De Morgan's Laws and their predicate-logic analogues)
But both ¬P and ¬Q are predicates too! So the given proposition is logically equivalent to the theorem from Example 3.44, and thus is a theorem as well.
3.139 Suppose that ∀x ∈ S : [P(x) ⇒ Q(x)] and ∀x ∈ S : P(x). We must show that ∀x ∈ S : Q(x) is true. To do so, let x∗ ∈ S be arbitrary. By the two assumptions, we have that P(x∗ ) and P(x∗ ) ⇒ Q(x∗ ). Therefore, by Modus Ponens, we have Q(x∗ ). Because x∗ was arbitrary, we've argued that ∀x ∈ S : Q(x).
3.140 The given statements, respectively, express “every condition-A-satisfying element also satisfies condition B” and “every element satisfies if-A-then-B.” These two conditions are true precisely when {x ∈ S : P(x)} ⊆ {x ∈ S : Q(x)}.
3.141 The given statements, respectively, express “there exists a condition-A-satisfying element that also satisfies condition B” and “there exists an element that satisfies both condition A and condition B.” These two conditions are true precisely when {x ∈ S : P(x)} ∩ {x ∈ S : Q(x)} is nonempty.


3.142 For intuition, here are two completely logically equivalent (English-language) statements; in both, φ is the claim “Ronald Reagan was the tallest U. S. president”:
(1) Ronald Reagan was the tallest U. S. president. (This is just φ.)
(2) For every positive integer n, Ronald Reagan was the tallest U. S. president. (This is ∀n : φ.)

Because n doesn’t appear in φ, either φ is true for every integer n (in which case φ and ∀n : φ are both true), or φ is true for no integer n (in which case φ and ∀n : φ are both false). To prove the equivalence, first suppose φ is true. Then, for an arbitrary x ∈ S, we have that φ is still true (because φ doesn’t depend on x). That is, ∀x ∈ S : φ. For the other direction, suppose that ∀x ∈ S : φ. Let a ∈ S be arbitrary. Then, plugging in a for every free occurrence of x in φ must yield a true statement—but there are no such occurrences of x, so φ itself must be true. 3.143 Let’s consider two cases: either φ is true, or φ is false. If φ is true, then the left-hand side is obviously true (it’s True ∨ something). And the right-hand side is also true: φ is true (and doesn’t depend on x), so for any particular value of x, we still have that φ is true—and therefore so is φ ∨ P(x). Because that’s true for any x ∈ S, we have ∀x ∈ S : φ ∨ P(x) too. If φ is false, then the left-hand side is true if and only if ∀x ∈ S : P(x) (because False ∨ something ≡ something). And, on the right-hand side, for any particular value of x, we have that φ ∨ P(x) ≡ P(x). Thus the right-hand side is also true if and only if ∀x ∈ S : P(x). 3.144 We’ll use Exercise 3.143 and De Morgan’s Laws to show this equivalence. h i h i φ ∧ ∃x ∈ S : P(x) ⇔ ∃x ∈ S : φ ∧ P(x)  h i h i ≡ ¬ φ ∧ ∃x ∈ S : P(x) ⇔ ¬ ∃x ∈ S : φ ∧ P(x) h i h i ≡ ¬φ ∨ ¬ ∃x ∈ S : P(x) ⇔ ¬ ∃x ∈ S : φ ∧ P(x) h i h i ≡ ¬φ ∨ ∀x ∈ S : ¬P(x) ⇔ ∀x ∈ S : ¬φ ∨ ¬P(x) .

Exercise 3.84: p ⇔ q ≡ ¬p ⇔ ¬q De Morgan’s Laws De Morgan’s Laws

But this last statement is exactly Exercise 3.143 (where the proposition is ¬φ and the predicate is ¬P(x)).
3.145 Let's consider two cases: either φ is true, or φ is false. If φ is false, then the left-hand side is vacuously true, and the right-hand side is also true: φ is false (and doesn't depend on x), so for any particular value of x, we still have that φ is false—and therefore φ ⇒ P(x) is true. Because that's true for any x ∈ S, we have ∃x ∈ S : φ ⇒ P(x) too. If φ is true, then the left-hand side is true if and only if [∃x ∈ S : P(x)]. And, on the right-hand side, for any particular value of x, we have that True ⇒ P(x) if and only if P(x), so the right-hand side is also true if and only if ∃x ∈ S : P(x).
3.146 Again, we'll be lazy: p ⇒ q ≡ ¬p ∨ q, so
[∃x ∈ S : P(x)] ⇒ φ ≡ ¬[∃x ∈ S : P(x)] ∨ φ
  ≡ [∀x ∈ S : ¬P(x)] ∨ φ            (De Morgan's Laws)
  ≡ ∀x ∈ S : [¬P(x) ∨ φ]            (Exercise 3.143)
  ≡ ∀x ∈ S : [P(x) ⇒ φ].

3.147 Let φ be “x = 0” and let P(x) be “x ̸= 0”, and let S be R. Then the left-hand side says that the predicate [x = 0] ∨ [∀x ∈ R : x ̸= 0] is true for every real number; but that predicate is true when x = 0 and false when x = 42, so the left-hand side is false. The right-hand side says that the predicate [x = 0] ∨ [x ̸= 0], which is a theorem, is true for every real number. And it is.
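Equivalences like the one in Exercise 3.143 can be sanity-checked by brute force over a small domain, quantifying over every possible predicate P as well as over φ; a sketch:

```python
from itertools import product

# Check [φ ∨ ∀x∈S : P(x)] ⇔ [∀x∈S : (φ ∨ P(x))] for every φ and every P
# on a small three-element domain; P is represented as a tuple of truth values.
S = range(3)
holds = all(
    (phi or all(P[x] for x in S)) == all(phi or P[x] for x in S)
    for phi in [True, False]
    for P in product([True, False], repeat=len(S)))
print(holds)  # True: the two sides agree in all 2 * 2**3 cases
```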


3.5 Nested Quantifiers

3.148 The proposition is true. For any c ∈ R, we choose the constant function fc that always outputs c—that is, the function defined as fc (x) = c for all x ∈ R. (For example, f3 is the function that takes an input x and, no matter what x is, returns the value 3.) Then fc (x) = c for any x—and, in particular, fc (0) = c.
3.149 The proposition is false. It would require a single function f such that we have both that f(0) = 0 and that f(0) = 1, for example—but that's not possible for a function, because f(0) can have only one value.
3.150 The proposition is true. For any c ∈ R, we choose the function fc that subtracts c from its input—that is, the function defined as fc (x) = x − c for all x ∈ R. And fc (c) = 0.
3.151 The proposition is true. The always-zero function zero defined as zero(x) = 0 for any x ∈ R satisfies the condition that ∀x ∈ R : zero(x) = 0.
3.152 ∀t ∈ T : ∀j ∈ J : ∀j′ ∈ J : scheduledAt(j, t) ∧ scheduledAt(j′ , t) ⇒ j = j′
3.153 ∀j ∈ J : ∃t ∈ T : scheduledAt(j, t)
3.154 We can require that every time at which A is run is followed by two minutes of A not being run: ∀t ∈ T : scheduledAt(A, t) ⇒ ¬scheduledAt(A, t + 1) ∧ ¬scheduledAt(A, t + 2). Alternatively, we look at pairs of times when A is scheduled and require that the times be more than two minutes apart: ∀t ∈ T : ∀t′ ∈ T : scheduledAt(A, t) ∧ scheduledAt(A, t′ ) ⇒ (t = t′ ∨ |t − t′ | > 2). (In the latter case, we have to be careful not to forbid A being scheduled at time t and time t′ where t = t′ !)
3.155 ∃t, t′ , t′′ ∈ T : scheduledAt(B, t) ∧ scheduledAt(B, t′ ) ∧ scheduledAt(B, t′′ ) ∧ t ̸= t′ ∧ t ̸= t′′ ∧ t′ ̸= t′′
3.156 One way of expressing this condition: if C is scheduled at times t and t′ > t, then there's no time later than t′ at which C is also scheduled. Formally:
∀t, t′ ∈ T : [scheduledAt(C, t) ∧ scheduledAt(C, t′ ) ∧ t < t′ ] ⇒ [∀t′′ ∈ T : t′′ > t′ ⇒ ¬scheduledAt(C, t′′ )].
Alternatively, if C is scheduled at times t, t′ , and t′′ , then at least two of {t, t′ , t′′ } are actually equal:
∀t, t′ , t′′ ∈ T : scheduledAt(C, t) ∧ scheduledAt(C, t′ ) ∧ scheduledAt(C, t′′ ) ⇒ t = t′ ∨ t = t′′ ∨ t′ = t′′ .
3.157 The given statement is equivalent to the following: every time t at which E is run is followed by a time at which D is run. Formalizing that:
∀t ∈ T : scheduledAt(E, t) ⇒ [∃t′ ∈ T : t′ > t ∧ scheduledAt(D, t′ )].
3.158 ∀t, t′ ∈ T : scheduledAt(G, t) ∧ scheduledAt(G, t′ ) ∧ t < t′ ⇒ [∃t′′ ∈ T : t < t′′ < t′ ∧ scheduledAt(F, t′′ )]
3.159 ∀t, t′ ∈ T : scheduledAt(I, t) ∧ scheduledAt(I, t′ ) ∧ t < t′ ⇒ [∀t′′ , t′′′ ∈ T : t < t′′ < t′ ∧ t < t′′′ < t′ ∧ t′′ ̸= t′′′ ∧ scheduledAt(H, t′′ ) ⇒ ¬scheduledAt(H, t′′′ )]
3.160 ∀x ∈ {1, . . . , n} : ∀y ∈ {1, . . . , m} : P[x, y] = 0
3.161 ∃x ∈ {1, . . . , n} : ∃y ∈ {1, . . . , m} : P[x, y] = 1
3.162 ∀y ∈ {1, . . . , m} : ∃x ∈ {1, . . . , n} : P[x, y] = 1
3.163 ∀x ∈ {1, . . . , n} : ∀y ∈ {1, . . . , m − 1} : P[x, y] = 1 ⇒ P[x, y + 1] = 0
3.164 ∀i, j ∈ {1, . . . , 15} : G[i, j] ⇒ (G[i − 1, j] ∨ G[i + 1, j]) ∧ (G[i, j − 1] ∨ G[i, j + 1])
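The quantified statements in Exercises 3.160–3.162 translate almost symbol-for-symbol into Python's all and any. A sketch (representing the pixel grid P as a list of columns, so that P[x][y] matches the indexing above):

```python
def all_zero(P):       # 3.160: ∀x : ∀y : P[x, y] = 0
    return all(P[x][y] == 0 for x in range(len(P)) for y in range(len(P[0])))

def some_one(P):       # 3.161: ∃x : ∃y : P[x, y] = 1
    return any(P[x][y] == 1 for x in range(len(P)) for y in range(len(P[0])))

def every_row_hit(P):  # 3.162: ∀y : ∃x : P[x, y] = 1
    return all(any(P[x][y] == 1 for x in range(len(P)))
               for y in range(len(P[0])))

print(all_zero([[0, 0], [0, 0]]), every_row_hit([[1, 0], [0, 1]]))  # True True
```

The nesting of the quantifiers is mirrored exactly by the nesting of all inside any (and vice versa), which is a handy way to internalize how quantifier order matters.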


3.165 We can write this by saying that the first open square in any horizontal word (G[i, j] and ¬G[i − 1, j]) is followed by two open squares horizontally (G[i + 1, j] and G[i + 2, j]), and a similar condition for vertical words:
∀i, j ∈ {1, . . . , 15} : [G[i, j] ∧ ¬G[i − 1, j] ⇒ G[i + 1, j] ∧ G[i + 2, j]] ∧ [G[i, j] ∧ ¬G[i, j − 1] ⇒ G[i, j + 1] ∧ G[i, j + 2]].
3.166 This condition is the simplest one to state: ∀i, j ∈ {1, . . . , 15} : G[i, j] = G[16 − i, 16 − j].
3.167 A path from ⟨x1 , y1 ⟩ to ⟨xk , yk ⟩ is a sequence ⟨x1 , y1 ⟩, ⟨x2 , y2 ⟩, . . . , ⟨xk , yk ⟩ of squares such that
[∀i ∈ {1, . . . , k} : G[xi , yi ]]                           (all the squares in the path are open)
and
[∀i ∈ {1, . . . , k − 1} : |xi − xi+1 | + |yi − yi+1 | = 1]   (we only go one square down or across in each step of the path).
Let P(i, j, x, y) be a predicate that is true exactly when there exists a path from ⟨i, j⟩ to ⟨x, y⟩. Overall interlock means that ∀i, j, x, y ∈ {1, . . . , 15} : G[i, j] ∧ G[x, y] ⇒ P(i, j, x, y).
3.168 Write K to denote {1, . . . , k}. Then the set {A1 , A2 , . . . , Ak } is a partition of a set S if:
(i) ∀i ∈ K : Ai ̸= ∅                               (all Ai s are nonempty)
(ii) A1 ∪ A2 ∪ · · · ∪ Ak = S
(iii) ∀i ∈ K : ∀j ∈ K : i ̸= j ⇒ Ai ∩ Aj = ∅       (Ai and Aj are disjoint for i ̸= j)

Alternatively, we can state (ii) directly with quantifiers: first, ∀i ∈ K : Ai ⊆ S; and, second, ∀x ∈ S : [∃i ∈ K : x ∈ Ai ].
3.169 The maximum of the array A[1 . . . n] is an integer x ∈ Z such that ∃i ∈ {1, 2, . . . , n} : A[i] = x and such that ∀i ∈ {1, 2, . . . , n} : A[i] ≤ x. (The first condition is the one more frequently omitted: it says that we aren't allowed to make up a maximum like 9 for the array ⟨1, 2, 3⟩, even though 9 is greater than or equal to every entry in A.)
3.170 right(c) means: ∀t ∈ T : c(t) = t.
3.171 keepsTime(c) means: ∃x ∈ Z : ∀t ∈ T : c(t) = add(t, x).
3.172 closeEnough(c) means: ∀t ∈ T : ∃x ∈ {−2, −1, 0, 1, 2} : c(t) = add(t, x).
3.173 broken(c) means: ∃t′ ∈ T : ∀t ∈ T : c(t) = t′ .
3.174 The adage says broken(c) ⇒ ∃t ∈ T : c(t) = t. Suppose that broken(c). That is, by definition, there exists a time t′ ∈ T such that, for all t ∈ T, we have c(t) = t′ . In particular, then, when t = t′ , we have c(t′ ) = t′ . Thus, if broken(c), then there exists a time t′ ∈ T such that c(t′ ) = t′ .
3.175 ∀i ∈ {1, 2, . . . , n} : ri = i
3.176 Suppose that we start with the sorted pile ⟨1, 2, . . . , k − 1, k, k + 1, k + 2, . . . , n⟩, and perform one flip of k ≥ 2 pancakes. The resulting pile is ⟨k, k − 1, . . . , 1, k + 1, k + 2, . . . , n⟩. That's precisely the structure of a pile of pancakes that can be sorted with one flip (undoing the unsorting flip), so:
∃k ∈ {2, 3, . . . , n} : [∀i ∈ {1, 2, . . . , k} : ri = k + 1 − i] ∧ [∀i ∈ {k + 1, k + 2, . . . , n} : ri = i]
(the pile starts with k, k − 1, . . . , 1 and ends with k + 1, k + 2, . . . , n).

3.177 As in Exercise 3.176, after a single flip of k ≥ 2 pancakes, the resulting pile is ⟨k, k − 1, . . . , 1, k + 1, k + 2, . . . , n⟩. Suppose we then perform an additional flip of j ≥ 2 pancakes:
• If j = k, the result is automatically sorted. (We just undid the first flip.)
• If j < k, the result is ⟨k − j + 1, k − j + 2, . . . , k, k − j, k − j − 1, . . . , 1, k + 1, k + 2, . . . , n⟩.
• If j > k, the result is ⟨j, j − 1, . . . , k + 1, 1, 2, . . . , k, j + 1, j + 2, . . . , n⟩.


Thus the pancake piles that can be sorted with two flips satisfy:
[∀i ∈ {1, 2, . . . , n} : ri = i]     (do anything, and then undo it)
∨ [∃j, k ∈ {2, . . . , n} : j < k ∧ [∀i ∈ {1, . . . , j} : ri = k − j + i] ∧ [∀i ∈ {j + 1, . . . , k} : ri = k + 1 − i] ∧ [∀i ∈ {k + 1, . . . , n} : ri = i]]     (first undo the j-flip, then the k-flip [for j < k])
∨ [∃j, k ∈ {2, . . . , n} : j > k ∧ [∀i ∈ {1, . . . , j − k} : ri = j + 1 − i] ∧ [∀i ∈ {j − k + 1, . . . , j} : ri = i + k − j] ∧ [∀i ∈ {j + 1, . . . , n} : ri = i]].     (first undo the j-flip, then the k-flip [for j > k])
3.178 ∀x ∈ P : [∃t ∈ T : bought(x, t) ⇒ ∃y ∈ P : ∃t′ ∈ T : t′ < t ∧ bought(y, t′ ) ∧ friends(x, y)]
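The case analysis in Exercise 3.177 can be double-checked by simulating the flips; this sketch unsorts a sorted pile with a k-flip followed by a j-flip and verifies each of the three cases:

```python
def flip(pile, k):
    """Reverse the top k pancakes (the prefix of the list)."""
    return pile[:k][::-1] + pile[k:]

n = 7
sorted_pile = list(range(1, n + 1))
for k in range(2, n + 1):
    for j in range(2, n + 1):
        pile = flip(flip(sorted_pile, k), j)
        if j == k:
            assert pile == sorted_pile                 # undoing the first flip
        elif j < k:
            assert pile[:j] == [k - j + i for i in range(1, j + 1)]
            assert pile[j:k] == [k + 1 - i for i in range(j + 1, k + 1)]
            assert pile[k:] == list(range(k + 1, n + 1))
        else:  # j > k
            assert pile[:j - k] == [j + 1 - i for i in range(1, j - k + 1)]
            assert pile[j - k:j] == [i + k - j for i in range(j - k + 1, j + 1)]
            assert pile[j:] == list(range(j + 1, n + 1))
print("Exercise 3.177's case analysis checks out for n =", n)
```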

3.179 The claim is probably false. There exists some person who was the very first person to buy an iPad. (Call her Alice.) Then Alice is in P, and there exists a unique t0 ∈ T such that bought(Alice, t0 ). But Alice certainly has no friends who bought an iPad before time t0 —in fact, nobody bought an iPad before t0 . The reason that we can't rule out the possibility that the claim is true is that a person in P could buy more than one iPad. Concretely, suppose that P = {p1 , p2 , . . . , pn }, and that T = {1, 2, . . . , 2n}. Suppose that, for each person pi , we have that bought(pi , i) and bought(pi , n + i). And suppose that everybody in P has at least one friend. Then every p ∈ P who bought an iPad has a friend who bought an iPad before (the second time that) p did.
3.180 This assertion fails on an input in which every element of A is identical—for example, ⟨2, 2, 2, 2⟩.
3.181 This assertion fails on an array A that has two initial negative entries, where the second is larger than the first—for example, ⟨−2, −1, 0, 1⟩.
3.182 This assertion fails on an array A that has nonunique entries—for example, ⟨2, 2, 4, 4⟩.
3.183 ∃! x ∈ Z : P(x) is equivalent to ∃x ∈ Z : [P(x) ∧ (∀y ∈ Z : P(y) ⇒ x = y)].
3.184 ∃∞ x ∈ Z : P(x) is equivalent to ∀y ∈ Z : [∃x ∈ Z : x > y ∧ P(x)].
3.185 Write P(n) to denote n > 2; Q(n) to denote 2 | n; and R(p, q, n) to denote isPrime(p) ∧ isPrime(q) ∧ n = p + q. Then we have:
∀n ∈ Z : [P(n) ∧ Q(n) ⇒ [∃p ∈ Z : ∃q ∈ Z : R(p, q, n)]]
  ≡ ∀n ∈ Z : ∃p ∈ Z : ∃q ∈ Z : [P(n) ∧ Q(n) ⇒ R(p, q, n)]        (Exercise 3.145)
  ≡ ∀n ∈ Z : ∃p ∈ Z : ∃q ∈ Z : [¬(P(n) ∧ Q(n)) ∨ R(p, q, n)]     (p ⇒ q ≡ ¬p ∨ q)
  ≡ ∀n ∈ Z : ∃p ∈ Z : ∃q ∈ Z : [¬P(n) ∨ ¬Q(n) ∨ R(p, q, n)].     (De Morgan's Laws)

3.186 The predicate isPrime(x) can be written as ∀d ∈ Z≥2 : d < x ⇒ d ̸ | x (a number x is prime if all potential divisors between 2 and x − 1 do not evenly divide x). Thus we can rewrite Goldbach's conjecture as:
∀n ∈ Z : ∃p ∈ Z : ∃q ∈ Z : [n ≤ 2 ∨ 2 ̸ | n ∨ [[∀d ∈ Z≥2 : (d < p ⇒ d ̸ | p) ∧ (d < q ⇒ d ̸ | q)] ∧ n = p + q]].
3.187 The predicate d | m can be written as ∃k ∈ Z : dk = m. Thus we can further rewrite Goldbach's conjecture as:
∀n ∈ Z : ∃p ∈ Z : ∃q ∈ Z : [n ≤ 2 ∨ [∀k ∈ Z : 2k ̸= n] ∨ [[∀d ∈ Z≥2 : (d < p ⇒ [∀k ∈ Z : kd ̸= p]) ∧ (d < q ⇒ [∀k ∈ Z : kd ̸= q])] ∧ n = p + q]].

3.188 A solution in Python—a slow solution; we’ll see faster algorithms for finding primes in Chapter 7—is shown in Figure S.3.4. 3.189 This proposition is sometimes false. Let S be the set of students at a certain college in the midwest, and let P(x, y) denote “x and y are taking at least one class together this term.” Then ∀x ∈ S : ∃y ∈ S : P(x, y) is true: every

def findPrimes(n):
    primes = [2]
    candidate = 3
    while candidate < n:
        if 0 not in [candidate % d for d in primes]:
            primes.append(candidate)
        candidate += 2
    return primes

def testGoldbach(n):
    primes = findPrimes(n)
    for k in range(4, n + 1, 2):  # 4, 6, 8, ..., n
        happy = False
        for prime in primes:
            if k - prime in primes:
                print("%d = %d + %d" % (k, prime, k - prime))
                happy = True
                break
        if not happy:
            print("VIOLATION FOUND!", k)

testGoldbach(10000)

Figure S.3.4 Some Python code to test for violations of Goldbach's conjecture. (It doesn't find any.)

class has at least two students, so every student must have at least one classmate. On the other hand, the statement ∃y ∈ S : ∀x ∈ S : P(x, y) is false: there is no single person taking classes with everyone (every CS major, every music major, every sociology major, etc.) on campus.
3.190 This proposition is always true. Let's prove it. Suppose that ∃y ∈ S : ∀x ∈ S : P(x, y). That means that there is a special y ∈ S—call it y∗ —so that ∀x ∈ S : P(x, y∗ ). To show that ∀x ∈ S : ∃y ∈ S : P(x, y), we need to show that, for an arbitrary x, there is some yx so that P(x, yx ). But we can just choose yx = y∗ for every x.
3.191 We want to show that ∀x ∈ S : [P(x) ⇒ (∃y ∈ S : P(y))]. To do so, let's consider a generic x ∈ S. We want to show that P(x) ⇒ ∃y ∈ S : P(y). But this is easy! Assume the antecedent—that is, assume that P(x). But then showing that ∃y ∈ S : P(y) is trivial: we can just choose y to be x itself. (To take a step back and state this less formally: for any x ∈ S that satisfies P, of course there exists some element of S that satisfies P!)
3.192 We'll prove ∃x ∈ S : [P(x) ⇒ (∀y ∈ S : P(y))] by case analysis: either ∀y ∈ S : P(y), or not.
• Suppose ∀y ∈ S : P(y). Then the given statement is equivalent to ∃x ∈ S : [P(x) ⇒ True]. But P(x) ⇒ True is true for any x—anything implies true!—and so the given statement is equivalent to ∃x ∈ S : True, which is obviously true.
• On the other hand, suppose that ¬∀y ∈ S : P(y). Distributing the negation using De Morgan's Laws, we know that
∃y ∈ S : ¬P(y).    (†)

The statement we’re trying to prove is equivalent to ∃x ∈ S : [P(x) ⇒ False] (because ∀y ∈ S : P(y) is False under our assumption), which is equivalent to ∃x ∈ S : [¬P(x)], because q ⇒ False is equivalent to ¬q. But ∃x ∈ S : ¬P(x) is just (†), which we’ve already proven. (Again, to take a step back and state this less formally: if P holds for every element of S, then the body of the existential quantifier is satisfied for every x ∈ S, but if not then it’s satisfied for every x for which ¬P(x).) 3.193 There are two natural readings: • ∃c : ∀d : c crashes on day d There’s a single computer that crashes every day. • ∀d : ∃c : c crashes on day d Every day, some computer or another crashes. 3.194 There are two natural readings:


• ∀ prime p ̸= 2 : ∃ odd d > 1 : d divides p (For every prime number p except 2, there's some odd integer d > 1 that evenly divides p.)
• ∃ odd d > 1 : ∀ prime p ̸= 2 : d divides p (There's a single odd integer d > 1 such that d evenly divides every prime number except 2.)
3.195 There are four (!) natural readings:
• ∀ student s : ∃ class c : ∀ term t : s takes c during t (For every student, there's a class c such that the student takes c every term.)
• ∀ term t : ∃ class c : ∀ student s : s takes c during t (For every term, there's a single class in that term that every student takes.)
• ∀ student s : ∀ term t : ∃ class c : s takes c during t (For every student, in every term, that student takes one class or another.)
• ∃ class c : ∀ student s : ∀ term t : s takes c during t (There's a particular class c such that every student takes c in every term.)
3.196 There are at least three natural readings:
• For every submitted program p, there's a student-submitted test case (possibly different for different programs p, each possibly submitted by a different student) that p fails on: ∀ program p : ∃ case c : ∃ student s : s submitted c, and p fails on c.
• There's a single student s such that, for every submitted program p, there's a test case (possibly different for different programs p) submitted by s that p fails on: ∃ student s : ∀ program p : ∃ case c : s submitted c, and p fails on c.
• There's a single student who submitted a single test case c such that every submitted program fails on c: ∃ student s : ∃ case c : ∀ program p : s submitted c, and p fails on c.
(A fourth interpretation, ∃ case : ∀ programs : ∃ student s : · · · , doesn't seem natural.)
3.197 The theorem is the interpretation “For every prime number p except 2, there's some odd integer d > 1 that evenly divides p.” Here's the proof: any prime number p other than 2 is necessarily odd. Thus, given any prime p ̸= 2, we can simply choose d to be equal to p itself. After all, p is odd, and p | p.
The non-theorem is the interpretation "There's a single odd integer d > 1 such that d evenly divides every prime number except 2." For d > 1 to evenly divide the prime number 3, we'd need d = 3—but 3 ∤ 5.

3.198 ∃x ∈ S : P(x)
3.199 ∃x ∈ S : ∃y ∈ S : ¬P(x, y)
3.200 ∃x ∈ S : ∀y ∈ S : P(x, y)
3.201 ∀x ∈ S : ∃y ∈ S : ¬P(x, y)
3.202 ∀x ∈ S : ∀y ∈ S : ¬P(x, y)
3.203 ∃x ∈ S : ∃y ∈ S : P(x, y)

3.204 We'll give a recursive algorithm to turn this type of quantified expression into a quantifier-free proposition. Specifically:
• translate every universal quantifier ∀x ∈ {0, 1} : P(x) into the conjunction P(0) ∧ P(1).
• translate every existential quantifier ∃x ∈ {0, 1} : P(x) into the disjunction P(0) ∨ P(1).
The resulting expression is a statement of propositional logic whose atomic propositions are P_i(0) and P_i(1), where P_1, P_2, . . . , P_k are the predicates used in φ. We can now figure out whether φ is a theorem by applying the algorithm for propositional logic: build a truth table (there are 2k atomic propositions, so there are 2^{2k} rows in the truth table), and test whether the resulting expression is true in every row. If so, φ is a theorem; if not, φ is not a theorem.
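The translate-and-check procedure of 3.204 can be sketched in Python. Rather than manipulating syntax, this sketch tests a quantified claim directly against every possible predicate over the two-element domain {0, 1}; the formula shown (an implication between the two quantifier orders, chosen here only for illustration) is an assumption, not from the text:

```python
from itertools import product

def all_predicates(arity):
    # Yield every predicate P : {0,1}^arity -> {True, False}, as a lookup table.
    inputs = list(product([0, 1], repeat=arity))
    for values in product([False, True], repeat=len(inputs)):
        yield dict(zip(inputs, values))

def is_theorem(formula, arity=2):
    # A quantified statement over the domain {0,1} is a theorem exactly when it
    # is true for every choice of predicate, mirroring the truth-table check above.
    return all(formula(P) for P in all_predicates(arity))

def implication(P):
    # "exists y, forall x, P(x,y)" implies "forall x, exists y, P(x,y)".
    antecedent = any(all(P[(x, y)] for x in [0, 1]) for y in [0, 1])
    consequent = all(any(P[(x, y)] for y in [0, 1]) for x in [0, 1])
    return (not antecedent) or consequent

print(is_theorem(implication))  # True
```

Note that a binary predicate over {0, 1} has only 2^4 = 16 possible truth tables, so this brute force is tiny.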

4 Proofs

4.2 Error-Correcting Codes

4.1 A solution in Python is shown in Figure S.4.1.

4.2 It does not allow us to detect any single substitution error. For example, a 7 in an odd-indexed digit would now increase sum by 4—just as a 2 would. Thus an odd-indexed substitution of 2 ↔ 7 wouldn't be detected. For example, 2345 6789 0123 4567 and 7345 6789 0123 4567 would produce the same value of sum.

4.3 It does not. For example, a 4 and a 1 in an odd-indexed digit have the same effect on sum: increasing sum by 3 when the digit is 1 (because 1 · 3 = 3), and also increasing sum by 3 when the digit is 4 (because 4 · 3 = 12 → 1 + 2 = 3). So we couldn't detect an odd-indexed substitution of 1 ↔ 4, among other examples.

4.4 The change to quintuple the odd-indexed digits (instead of doubling them) does work! The crucial observation, as Exercise 4.1 describes, is that the sum of the ones' and tens' digits of 5n is distinct for every digit n:

  original digit n                      0  1  2  3  4  5  6  7  8  9
  quintupled tens' digit  ⌊5n/10⌋       0  0  1  1  2  2  3  3  4  4
  quintupled ones' digit  5n mod 10     0  5  0  5  0  5  0  5  0  5
  change to sum  5n mod 10 + ⌊5n/10⌋    0  5  1  6  2  7  3  8  4  9
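The distinctness claim in this table is easy to spot-check with a couple of lines of Python:

```python
# Contribution of a quintupled digit n to sum: ones' digit plus tens' digit of 5n.
contributions = [(5 * n) % 10 + (5 * n) // 10 for n in range(10)]
print(contributions)                  # [0, 5, 1, 6, 2, 7, 3, 8, 4, 9]
print(len(set(contributions)) == 10)  # True: all ten contributions are distinct
```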

Because the 10 values in the last row of this table are all different, the quintupling version of the code successfully detects any single substitution error.

4.5 While cc-check can detect many transposition errors, it cannot detect them all. For example, swapping an adjacent 0 and 9 has no effect on the overall sum. If the first two digits of the number are 09 [doubling the first], then the contribution to sum from these digits is (0 + 0) + 9 = 9. But if we transpose those first two digits to 90 [doubling the first], then the contribution to sum is (1 + 8) + 0 = 9. Both sequences contribute 9 to sum, so they cannot be distinguished.

4.6 To argue that the Hamming distance ∆ : {0, 1}^n × {0, 1}^n → R≥0 is a metric, we will argue that the three desired properties (reflexivity, symmetry, and the triangle inequality) hold for arbitrary inputs. Let x, y, z ∈ {0, 1}^n be arbitrary n-bit strings.

Reflexivity: ∆(x, x) = 0 because x and x differ in zero positions. For x ≠ y, there must be an index i such that x_i ≠ y_i (that's what it means for x ≠ y). Thus ∆(x, y) ≥ 1.

Symmetry: we need to argue that ∆(x, y) = ∆(y, x) for all x, y. The function ∆(x, y) denotes the number of bit positions in which x and y differ—that is, ∆(x, y) = |{i : x_i ≠ y_i}|. Similarly, ∆(y, x) = |{i : y_i ≠ x_i}|. But {i : x_i ≠ y_i} = {i : y_i ≠ x_i}, because ≠ is a symmetric relation: a ≠ b means precisely the same thing as b ≠ a.

Triangle inequality: we need to argue that ∆(x, y) ≤ ∆(x, z) + ∆(z, y) for all x, y, z. Let's think about a single bit at a time. We claim that, for any x_i, y_i, z_i ∈ {0, 1}, the following holds: if x_i ≠ y_i, then either x_i ≠ z_i or y_i ≠ z_i. (This fact ought to be fairly obvious; the easiest way to see it may be by drawing out all eight cases—for each x_i ∈ {0, 1}, y_i ∈ {0, 1}, and z_i ∈ {0, 1}—and checking them all.)
Thus

  ∆(x, y) = |{i : x_i ≠ y_i}|
          ≤ |{i : x_i ≠ z_i or y_i ≠ z_i}|
          = |{i : x_i ≠ z_i} ∪ {i : y_i ≠ z_i}|
          ≤ |{i : x_i ≠ z_i}| + |{i : y_i ≠ z_i}|     (because |S ∪ T| ≤ |S| + |T|)
          = ∆(x, z) + ∆(z, y).
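The triangle inequality argued here can also be verified exhaustively for a small n; a quick check over all triples of 4-bit strings:

```python
from itertools import product

def hamming(x, y):
    # Number of positions in which x and y differ.
    return sum(a != b for a, b in zip(x, y))

strings = list(product([0, 1], repeat=4))
ok = all(hamming(x, y) <= hamming(x, z) + hamming(z, y)
         for x in strings for y in strings for z in strings)
print(ok)  # True
```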

4.7 This function is a metric.


def is_valid_credit_card_number(numstring):
    ''' Is the given number (represented as a string) a legitimate credit-card number? '''
    total = 0
    for i in range(len(numstring)):
        digit = int(numstring[i])
        if i % 2 == 0:   # Python is 0-indexed, so even i --> double the digit.
            digit = 2 * digit
        total += (digit % 10) + (digit // 10)   # ones' digit + tens' digit.
    return total % 10 == 0

def complete_credit_card_number(numstring):
    ''' Given a numerical string with zero or one "?"s, fill in the ? [if any]
        to make it a legitimate credit-card number.
        (Returns "None" in error conditions.) '''
    if numstring.count("?") == 0 and is_valid_credit_card_number(numstring):
        return numstring
    elif numstring.count("?") != 1:   # we can't handle multiple ?s, or invalid ?-less numbers.
        return None

    missing_index = numstring.index("?")
    total = 0
    for i in range(len(numstring)):
        if numstring[i] != "?":
            digit = int(numstring[i])
            if i % 2 == 0:
                digit = 2 * digit
            total += (digit % 10) + (digit // 10)
    total = total % 10

    if missing_index % 2 == 1:   # we're missing an undoubled digit.
        missing_digit = -total % 10
    elif total % 2 == 0:   # we're missing a doubled digit, and the total without it is even.
        missing_digit = (-total % 10) // 2
    else:                  # we're missing a doubled digit, and the total without it is odd.
        missing_digit = 9 - (total // 2)

    return numstring.replace("?", str(missing_digit))

Alice = "4389 5867 4645 4389".replace(" ", "")   # https://fossbytes.com/tools/credit-card-generator
Bob = "4664 6925 9756 4261".replace(" ", "")     # invalid!
for card in [Bob, Alice] + [Alice[:i] + "?" + Alice[i+1:] for i in range(16)]:
    print(card, complete_credit_card_number(card))

Figure S.4.1 Testing the validity of a credit-card number, or filling in a missing digit, in Python.

Reflexivity is easy: certainly d(x, y) = 0 if and only if x = y, because x_{1,...,n} = y_{1,...,n} means x = y. Symmetry is also easy, because the definition of d is symmetric with respect to its two arguments; x_{i+1,...,n} = y_{i+1,...,n} if and only if y_{i+1,...,n} = x_{i+1,...,n}.

For the triangle inequality, we need to argue that d(x, y) ≤ d(x, z) + d(y, z). If x = y, we're done immediately. Otherwise, suppose that x_i ≠ y_i, but x_{i+1,...,n} = y_{i+1,...,n}. Because z_i therefore cannot equal both x_i and y_i, we have that d(x, y) ≤ d(x, z) or d(x, y) ≤ d(y, z). In either case, certainly d(x, y) ≤ d(x, z) + d(y, z).

4.8 This function meets reflexivity and symmetry, but it fails the triangle inequality. For example, d(0000, 1111) = 4, but d(0000, 0101) = 1 and d(0101, 1111) = 1 (because there's never more than one consecutive disagreement between 0101 and either 0000 or 1111). But then d(0000, 1111) = 4 > 1 + 1 = d(0000, 0101) + d(0101, 1111), violating the triangle inequality.


4.9 This function fails reflexivity. The problem statement itself gives an example: d(11010, 01010) = |2 − 2| = 0, but 11010 ≠ 01010.

4.10 This function violates the triangle inequality. For example, let x = 01, y = 10, and z = 11. Then:

  d(01, 10) = 1 − (2 · 0)/(1 + 1) = 1 − 0/2 = 1
  d(01, 11) = 1 − (2 · 1)/(1 + 2) = 1 − 2/3 = 1/3
  d(10, 11) = 1 − (2 · 1)/(1 + 2) = 1 − 2/3 = 1/3.

But then d(01, 10) = 1 > 1/3 + 1/3 = d(01, 11) + d(11, 10), violating the triangle inequality.
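Reading the definition of d off the computations above—d(x, y) = 1 − 2s/(o_x + o_y), where s counts positions in which both strings have a 1 and o_x, o_y count the 1s in each string (an inference from this solution; the exercise statement isn't reproduced here)—the counterexample can be replayed directly:

```python
def d(x, y):
    # 1 - 2*(shared 1-positions) / (ones in x + ones in y); see the computation above.
    shared = sum(a == b == '1' for a, b in zip(x, y))
    return 1 - 2 * shared / (x.count('1') + y.count('1'))

print(d('01', '10'))                   # 1.0
print(d('01', '11'), d('10', '11'))    # both equal 1/3
print(d('01', '10') <= d('01', '11') + d('11', '10'))  # False: triangle inequality fails
```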

4.11 This function is a metric—in fact, it's exactly equivalent to regular-old geometric distance on the line. Reflexivity is straightforward: |a − b| = 0 if and only if a = b, and x and y represent the same number in binary if and only if x = y (as strings). Symmetry is immediate from the fact that |a − b| = |b − a|. The triangle inequality follows because

  |a − c| + |c − b| = |a − b|                  if b ≤ c ≤ a, or if a ≤ c ≤ b
                    = |a − b| + 2|b − c|       if a ≤ b ≤ c, or if c ≤ b ≤ a
                    = |a − b| + 2|a − c|       if b ≤ a ≤ c, or if c ≤ a ≤ b.

In all three cases, this value is at least |a − b|.

4.12 By definition of the minimum distance of a code C being 2t + 1, there exist two codewords x*, y* ∈ C such that ∆(x*, y*) = 2t + 1.

Suppose we receive y* as our received (possibly corrupted) codeword. We cannot distinguish between y* as the transmitted codeword (and there was no error), or x* as the transmitted codeword (and it was corrupted by precisely 2t + 1 errors into y*). So we cannot detect 2t + 1 errors.

Let i be the smallest index such that ∆(x*_{1...i}, y*_{1...i}) = t + 1. Thus ∆(x*_{i+1...n}, y*_{i+1...n}) = t. Suppose we receive the bitstring z, where z_j = x*_j for j ≤ i and z_j = y*_j for j > i. Then ∆(y*, z) = t + 1 and ∆(x*, z) = t. If up to t + 1 errors have occurred, we can't distinguish these two cases. So we cannot correct t + 1 errors.

4.13 The argument is very similar to Theorem 4.6. For error detection, we can detect 2t − 1 errors. It's still the case that the only way we'd fail to detect an error is if the transmitted codeword c were corrupted into a bitstring c′ that was in fact another codeword. Because ∆(c, c′) ≥ 2t, this mistake can only occur if there were 2t or more errors.

For error correction, we can correct t − 1 errors. (When the minimum distance is even, after t errors we might be exactly halfway between two codewords, in which case we can't correct the error.)
Let x ∈ C, let c′ ∈ {0, 1}^n be a corrupted bitstring with ∆(x, c′) ≤ t − 1, and let y ∈ C − {x} be any other codeword. Then:

  ∆(c′, y) ≥ ∆(x, y) − ∆(x, c′)     (triangle inequality)
           ≥ 2t − ∆(x, c′)          (∆(x, y) ≥ 2t by definition of minimum distance)
           ≥ 2t − (t − 1)           (∆(x, c′) ≤ t − 1 by assumption)
           = t + 1
           > ∆(x, c′).              (∆(x, c′) ≤ t − 1 by assumption)

Thus c′ is strictly closer to x than to any other codeword, and we decode correctly.

Just as in the previous exercise, we cannot detect 2t errors (a received bitstring in C could have experienced 2t errors) or correct t errors (the bitstring z precisely halfway between x, y ∈ C with ∆(x, y) = 2t has ∆(x, z) = ∆(y, z) = t).

4.14 Let C ⊆ {0, 1}^n be a code. Call a codeword x ∈ C consistent with a string x′ ∈ {0, 1, ?}^n if, for all i ∈ {1, 2, . . . , n}, we have that x′_i = x_i or x′_i = ?. If two bitstrings x, y ∈ {0, 1}^n are both consistent with x′, then x and y can differ only in those positions where x′_i = ?. Thus, if the number of ?s in x′ is ≤ t, then ∆(x, y) ≤ t.

Now suppose that C can detect t substitution errors. Thus the minimum distance of C must be at least t + 1. (See the previous two exercises.) But then, if we erase t entries in any codeword x ∈ C to produce x′, there is no other codeword consistent with x′. So we can correct t erasure errors by identifying the unique codeword consistent with the received string.


4.15 Because a deletion is just a silent erasure, we can transform any erasure into a deletion by deleting each '?'. So, if we receive a codeword with at most t erasures, we turn it into a codeword with at most t deletions; because the code can correct these deletion errors, we can use the deletion-error-correcting algorithm, whatever it is, to correct the resulting string.

4.16 The code C = {0001, 0010, 0100, 1000} can correct any single erasure error—if there's a 1 in the unerased part of the codeword, the erased bit was a 1; otherwise the erased bit was a 0. But C cannot handle a single deletion error: the corrupted codeword 000 can result from a single deletion in any of the four codewords in C.

4.17 We have n-bit codewords, and n-bit messages (because |C| = 2^n). Any two distinct codewords differ in at least one position (otherwise they'd be the same!), and 000···0 and 100···0 differ in only one position. Thus the rate of this code is n/n = 1, and the minimum distance is 1 as well. Because the minimum distance is 2t + 1 with t = 0, this code can correct up to t = 0 errors, and detect up to 2t = 0 errors. (That is, it's useless.)

4.18 There are 2 = 2^1 codewords with n bits each, so the rate is 1/n. The minimum distance is just the Hamming distance between the two codewords, which is n. We can detect up to n − 1 errors, and we can correct up to ⌊(n − 1)/2⌋ errors.

4.19 We have n-bit codewords, where the nth bit is fixed given the first n − 1 bit values. So we have (n − 1)-bit messages and n-bit codewords. Thus the rate is (n − 1)/n = 1 − 1/n. Because any two bitstrings that differ in exactly one position also have different parity, the minimum distance of this code is 2. Because the minimum distance is 2t with t = 1, this code can detect up to 2t − 1 = 1 errors, and correct up to t − 1 = 0 errors. (That is, it can detect a single error, but can't correct anything—it's got the same basic properties as the credit-card number scheme.)
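The minimum-distance claim in 4.19 is easy to confirm exhaustively for a small case—here, the parity-bit code on 4-bit messages:

```python
from itertools import product

def parity_encode(m):
    # Append one bit so that the total number of 1s in the codeword is even.
    return tuple(m) + (sum(m) % 2,)

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

codewords = [parity_encode(m) for m in product([0, 1], repeat=4)]
min_dist = min(hamming(c, d) for c in codewords for d in codewords if c != d)
print(min_dist)  # 2: one error is detectable, but none are correctable
```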
4.20 The key fact that we'll use in the proof is that for two bitstrings x, y ∈ {0, 1}^n, we have parity(x) = parity(y) if and only if ∆(x, y) is even. (To see this fact, observe that the bits where x_i = y_i contribute equally to both sides of the equation, and the bits where x_i ≠ y_i contribute a 1 to exactly one side. The two sides end up having equal parity if and only if we've added 1 to one side or the other an even number of times.)

Consider two distinct codewords x, y ∈ C. By definition of minimum distance, ∆(x, y) ≥ 2t + 1. We'll give an argument by cases that the corresponding codewords x′, y′ ∈ C′ have Hamming distance that's at least 2t + 2. If ∆(x, y) ≥ 2t + 2, then we're immediately done: we've added an extra bit to the end of x and y, but that can't possibly decrease the Hamming distance. If ∆(x, y) = 2t + 1, then we argued above that parity(x) ≠ parity(y)—but then ∆(x′, y′) = 1 + ∆(x, y) = 1 + (2t + 1) = 2t + 2. (Note that ∆(x, y) cannot be less than 2t + 1, by definition.)

4.21 Given a bitstring c′, let c* be the codeword in the Repetition_ℓ code closest to c′ in Hamming distance. By definition, the codeword c* consists of ℓ identical repetitions of some block m. Suppose for a contradiction that there exists an index i such that there are k < ℓ/2 blocks of c′ in which the ith entry of the block agrees with c*_i. Define the codeword ĉ to match c* in all bits except the ith of each block, which is flipped. It's easy to see that ∆(c*, c′) − ∆(ĉ, c′) = (ℓ − k) − k = ℓ − 2k > 0, which means that ∆(c*, c′) > ∆(ĉ, c′). This contradicts the assumption that c* was the closest codeword to c′.

4.22 We fail to correct an error in a Repetition_3 codeword only if the same error is repeated in two of the three codeword blocks. Thus we could successfully correct as many as 4 errors in the 12-bit codewords: for example, if the codeword were 0000 0000 0000 and it were corrupted to 1111 0000 0000.
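A minimal sketch of the position-by-position majority decoding from 4.21 (the function name and interface are illustrative), applied to the 4-error example in 4.22:

```python
def decode_repetition(bits, block_len, copies):
    # Majority vote, position by position, across the repeated blocks.
    return [int(sum(bits[c * block_len + i] for c in range(copies)) > copies // 2)
            for i in range(block_len)]

# 0000 repeated three times, then corrupted by four errors in the first block:
received = [1, 1, 1, 1,  0, 0, 0, 0,  0, 0, 0, 0]
print(decode_repetition(received, 4, 3))  # [0, 0, 0, 0]: all four errors corrected
```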
4.23 Rephrased, the upper-bound argument of Section 4.2.5 implies that every 7-bit string is within Hamming distance 1 of a codeword. (There are 16 codewords, and 8 unique bitstrings within Hamming distance 1 of each, and so there are 16 · 8 = 128 different bitstrings within Hamming distance 1 of their nearest codeword. But there are only 128 different 7-bit strings!) Therefore, if a codeword c is corrupted by two different errors, then the resulting string c′ is necessarily closer to a different codeword than it is to c, because c′ is within Hamming distance 1 of some codeword (and that codeword isn't c).

4.24 A solution in Python appears in Figure S.4.2.

4.25 The proof of Lemma 4.14 carries over directly. Consider any two distinct messages m, m′ ∈ {0, 1}^n. We must show that the codewords c and c′ associated with m and m′ satisfy ∆(c, c′) ≥ 3. We'll consider three cases based on ∆(m, m′):

Case I: ∆(m, m′) ≥ 3. We're done immediately; the message bits of c and c′ differ in at least three positions.

def dot_product(x, y):   # error-checking if len(x) != len(y) is omitted.
    total = 0
    for i in range(len(x)):
        total += x[i] * y[i]
    return total

def hamming_distance(x, y):   # error-checking if len(x) != len(y) is omitted.
    dist = 0
    for i in range(len(x)):
        if x[i] != y[i]:
            dist += 1
    return dist

def encode_hamming(m):   # error-checking if len(m) != 4 is omitted.
    return [dot_product(m, x) % 2 for x in [[1, 0, 0, 0],
                                            [0, 1, 0, 0],
                                            [0, 0, 1, 0],
                                            [0, 0, 0, 1],
                                            [0, 1, 1, 1],
                                            [1, 0, 1, 1],
                                            [1, 1, 0, 1]]]

# All 16 4-bit messages:
messages = [[(x // 8) % 2, (x // 4) % 2, (x // 2) % 2, x % 2] for x in range(16)]

# Print distance between encodings (under Hamming code) of all message pairs.
# Output: 16 pairs of distance 0, 112 of distance 3, 112 of distance 4, and 16 of distance 7.
# (The pairs of distance 0 are when m1 == m2.)
for m1 in messages:
    for m2 in messages:
        c1, c2 = encode_hamming(m1), encode_hamming(m2)
        print(c1, c2, hamming_distance(c1, c2))

Figure S.4.2 Python code to verify that the minimum distance of the Hamming code is 3.

Case II: ∆(m, m′) = 2. Then at least one of the three parity bits contains one of the bit positions where m_i ≠ m′_i but not the other. (This fact follows from (b).) Therefore this parity bit differs in c and c′, giving two differing message bits and at least one differing parity bit.

Case III: ∆(m, m′) = 1. Then at least two of the three parity bits contain the bit position where m_i ≠ m′_i. (This fact follows from (a).) Thus there is one differing message bit and at least two differing parity bits.

4.26 We're looking for four parity bits for 11-bit messages. There are a lot of options; here's one of them. Writing the message bits as abcdefghijk, define the following parity bits:

  parity bit #1: the parity of b, c, d, h, i, j, k
  parity bit #2: the parity of a, c, d, f, g, j, k
  parity bit #3: the parity of a, b, d, e, g, i, k
  parity bit #4: the parity of a, b, c, e, f, h, k

There are four message bits in three different parity bits (abcd); six message bits in two different parity bits (efghij); and one message bit in all four parity bits (k). Thus every message bit appears in at least two parity bits, and it's immediate from the table above that no two message bits appear in exactly the same set of parity bits.
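Both conditions for this choice can be checked mechanically; the strings below transcribe the parity-bit table above:

```python
# One string per parity bit, listing the message bits it covers.
parity_sets = ["bcdhijk", "acdfgjk", "abdegik", "abcefhk"]
membership = {bit: frozenset(i for i, s in enumerate(parity_sets) if bit in s)
              for bit in "abcdefghijk"}
print(all(len(m) >= 2 for m in membership.values()))  # True: every bit in >= 2 parity bits
print(len(set(membership.values())) == 11)            # True: all 11 membership sets distinct
```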

4.27 There are a lot of options, but here's one of them. Writing the message bits as abcdefghijklmnopqrstuvwxyz, define the following parity bits:

  parity bit #1: the parity of b, c, d, e, j, k, l, m, n, o, p, q, r, s, z
  parity bit #2: the parity of a, c, d, e, g, h, i, m, n, o, p, t, u, v, z
  parity bit #3: the parity of a, b, d, e, f, h, i, k, l, o, q, t, w, x, z
  parity bit #4: the parity of a, b, c, e, f, g, i, j, l, n, r, u, w, y, z
  parity bit #5: the parity of a, b, c, d, f, g, h, j, k, m, s, v, x, y, z

There are 5 message bits in four different parity bits (abcde); 10 message bits in three different parity bits (fghijklmno); 10 message bits in two different parity bits (pqrstuvwxy); and one message bit in all five parity bits (z). Thus every message bit appears in at least two parity bits, and it's again immediate from the list of parity bits that no two message bits appear in exactly the same set of parity bits.

4.28 We have (n − ℓ)-bit messages and n-bit codewords, so we need to define ℓ parity bits satisfying conditions (a) and (b) from Exercise 4.25. That is, the message bits are numbered 1, 2, . . . , 2^ℓ − ℓ − 1. Define the set of all ℓ-bit bitstrings that contain two or more ones:

  B = {x ∈ {0, 1}^ℓ : x contains at least 2 ones}.

Note that there are 2^ℓ bitstrings in {0, 1}^ℓ, of which ℓ contain exactly one 1 and one contains no 1 at all. (This argument is exactly the one in Lemma 4.15.) Thus |B| = 2^ℓ − ℓ − 1. Label each message bit with a unique bitstring from B. Then, for i ∈ {1, 2, . . . , ℓ}, define parity bit #i to be the parity of the message bits whose labels x satisfy x_i = 1. Because each x ∈ B has at least two ones, by definition, we satisfy condition (a). Because each bitstring in B is distinct, we satisfy condition (b).

4.29 We can solve the rat-poisoning problem using the parity bits of the Hamming code, with k = 3 rats. Call the purported poison bottles {a, b, c, d, e, f, g, h}, and feed the rats as follows:
• Rat #1 gets {a, b, c, e}.
• Rat #2 gets {a, b, d, f}.
• Rat #3 gets {a, c, d, g}.
Because each purported poison has been given to a distinct subset of the rats, we can figure out which bottle is real poison by seeing which rats died. (Observe that the key fact is that we have 2^3 = 8 subsets of rats, and ≤ 8 bottles to test.) Here is the decoding of the results of the evil experiment:

  nobody dies               =⇒ h is the real poison.
  rat #1 dies               =⇒ e is the real poison.
  rat #2 dies               =⇒ f is the real poison.
  rat #3 dies               =⇒ g is the real poison.
  rats #1 and #2 die        =⇒ b is the real poison.
  rats #1 and #3 die        =⇒ c is the real poison.
  rats #2 and #3 die        =⇒ d is the real poison.
  rats #1, #2, and #3 die   =⇒ a is the real poison.
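The decoding table amounts to inverting the feeding plan, and a few lines of Python confirm that the inversion is unambiguous:

```python
rats = {1: set("abce"), 2: set("abdf"), 3: set("acdg")}
# For each bottle, which rats die if that bottle is the poison?
dead = {bottle: frozenset(r for r in rats if bottle in rats[r]) for bottle in "abcdefgh"}
decode = {who: bottle for bottle, who in dead.items()}
print(len(decode) == 8)               # True: every outcome identifies a unique bottle
print(decode[frozenset()])            # h (nobody dies)
print(decode[frozenset({1, 2, 3})])   # a (all three rats die)
```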

4.30 One solution in Python is shown in Figure S.4.3.

4.31 We're going to use the fact that every 7-bit string is either (a) a Hamming code codeword, or (b) within Hamming distance 1 of a Hamming code codeword. We'll also rely on the fact that the codewords themselves are a small fraction of {0, 1}^7—many more bitstrings fall into category (b).

Number the people 1, 2, . . . , 7. If you are the ith person, look at the 7-bit string corresponding to the hats that you see on all other people (call blue 0 and red 1), placing a ? in your own position i. Observe that there is necessarily a way to fill in the ? so that the resulting bitstring x ∈ {0, 1}^7 is not a Hamming codeword: if you see 6 bits that are consistent with a codeword c, then ? would be ¬c_i; if you see 6 bits that differ from a codeword c in exactly one spot, then ? would be c_i. Declare that your hat color is the bit that you just filled in for ?.

• If your hats are assigned in such a way that the corresponding bitstring is a Hamming code codeword, then you will all incorrectly state the color of your hat. You lose.
• If your hats are assigned in such a way that the corresponding bitstring is not a Hamming code codeword, then you will all correctly state the color of your hat. (If your hats are not a codeword, there's one person i whose hat is wrong relative to the closest codeword c; everyone except i answers consistently with c, and i answers inconsistently with c—which means you're all correct!) You win.
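Using the Hamming encoder from Figure S.4.2, we can count the losing hat assignments directly (the team loses exactly on codewords):

```python
from itertools import product

def encode(m):
    # The Hamming [7,4] encoding from Figure S.4.2: four message bits, three parity bits.
    rows = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
            [0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1]]
    return tuple(sum(a * b for a, b in zip(m, row)) % 2 for row in rows)

codewords = {encode(m) for m in product([0, 1], repeat=4)}
losses = sum(1 for hats in product([0, 1], repeat=7) if hats in codewords)
print(losses, "out of", 2**7)  # 16 out of 128: a losing fraction of exactly 1/8
```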


def golay_code():
    output = open("golay_code.txt", "w")
    codewords = []
    candidate = 0
    while candidate < 2**23:
        okay = True
        for i in range(len(codewords), 0, -1):   # counting down!
            if hamming_distance(int_to_binary(candidate), codewords[i-1]) < 7:
                okay = False
                break
        if okay:
            codewords.append(int_to_binary(candidate))
            output.write(int_to_binary(candidate) + "\n")
        candidate += 1
    output.close()

Figure S.4.3 A Python implementation of the "greedy algorithm" for the Golay code.

There are 2^4 = 16 codewords in the Hamming code, and 2^7 = 128 hat configurations. You only lose on codewords, and there are 16 codewords. Thus you lose a 16/128 = 1/8 fraction of the time. You win with probability 7/8 = 0.875.

4.32 Consider the set of codewords with their first d − 1 bits deleted: C′ = {x_{d,d+1,...,n} : x ∈ C}. Let x and y be two distinct codewords in C. Observe that their corresponding bitstrings in C′ must still be distinct: ∆(x_{d,d+1,...,n}, y_{d,d+1,...,n}) ≥ 1, because the first d − 1 bits of x and y can't differ in more than d − 1 positions! Therefore |C| = |C′|. But C′ ⊆ {0, 1}^{n−d+1}, and so |C′| ≤ 2^{n−d+1}. Because |C| = 2^k by definition, we have 2^k = |C| = |C′| ≤ 2^{n−d+1}, and thus k ≤ n − d + 1.

4.3 Proof Techniques

4.33 Assume that x and y are rational. By definition, then, there exist integers a ≠ 0, b ≠ 0, c, and d, such that x = c/a and y = d/b. Then

  x − y = c/a − d/b = (cb − da)/(ab).

Because a, b, c, d ∈ Z, we have that cb − da ∈ Z and ab ∈ Z too; because a ≠ 0 and b ≠ 0 we have that ab ≠ 0. Therefore x − y is rational, by definition, because we can write it as n/m for integers n and m ≠ 0—namely, n = cb − da and m = ab.

4.34 Assume that x and y are rational, and y ≠ 0. By definition, then, there exist integers a ≠ 0, b ≠ 0, c, and d, such that x = c/a and y = d/b; because y ≠ 0, we also have that d ≠ 0. Then

  x/y = (c/a) / (d/b) = (c/a) · (b/d) = bc/ad.

Because a, b, c, d ∈ Z, we have that bc, ad ∈ Z; because a ≠ 0 and d ≠ 0 we have that ad ≠ 0. Therefore x/y is rational by definition.

4.35 The first statement is false, and the second is true.
• If xy and x are both rational, then y is too. This statement is false for x = 0 and y = √2: in this case x = 0 and xy = 0 are both rational, but y is not rational.
• If x − y and x are both rational, then y is too. This statement is true. To prove it, we'll simply invoke Exercise 4.33, which states that if a and b are rational, then a − b is rational too. Let a = x and b = x − y. Then Exercise 4.33 says that a − b = x − (x − y) = y is rational too.


4.36 Let n be any odd integer. We can express n in binary as ⟨b_k, b_{k−1}, . . . , b_1, b_0⟩ ∈ {0, 1}^{k+1} for some k ≥ 1, so that n = 2^k b_k + 2^{k−1} b_{k−1} + · · · + 2b_1 + b_0. Then, taking both sides modulo 2 and observing that 2^i b_i mod 2 = 0 for any i ≥ 1, we have n mod 2 = b_0 mod 2. Because n is odd, we know that n mod 2 = 1. Thus b_0 = b_0 mod 2 = 1 as well.

4.37 Let n be a positive integer. We can express the digits of n as ⟨a_k, a_{k−1}, . . . , a_1, a_0⟩ ∈ {0, 1, . . . , 9}^{k+1} for some k ≥ 1, so that n = 10^k a_k + 10^{k−1} a_{k−1} + · · · + 10a_1 + a_0. Taking both sides modulo 5 and observing that 10^i a_i mod 5 = 0 for any i ≥ 1, we have n mod 5 = a_0 mod 5. The integer n is divisible by 5 if and only if n mod 5 = 0, which therefore occurs if and only if a_0 mod 5 = 0. The only two digits that are evenly divisible by 5 are {0, 5}, so n is divisible by 5 if and only if a_0 ∈ {0, 5}.

4.38 Let d_ℓ, d_{ℓ−1}, . . . , d_1, d_0 denote the digits of n, reading from left to right, so that n = d_0 + 10d_1 + 100d_2 + 1000d_3 + · · · + 10^ℓ d_ℓ, or, factoring 10^i into 2^i · 5^i and then dividing both sides by 2^k,

  n = Σ_{i=0}^{ℓ} 2^i 5^i d_i        and        n/2^k = Σ_{i=0}^{ℓ} 2^{i−k} 5^i d_i.        (∗)

The integer n is divisible by 2^k if and only if n/2^k is an integer, which because of (∗) occurs if and only if the right-hand side of (∗) is an integer. And that's true if and only if Σ_{i=0}^{k−1} 2^{i−k} 5^i d_i is an integer, because all other terms in the right-hand side of (∗) are integers. Therefore 2^k | n if and only if 2^k | Σ_{i=0}^{k−1} 2^i 5^i d_i—but Σ_{i=0}^{k−1} 2^i 5^i d_i = Σ_{i=0}^{k−1} 10^i d_i is simply the number formed by the last k digits of n.

4.39 We can factor n^3 − n = (n^2 − 1)n = (n + 1)(n − 1)n. We can then give a proof by cases, based on the value of n mod 3:
• If n mod 3 = 0, then (n + 1)(n − 1)n is divisible by three because n is.
• If n mod 3 = 1, then (n + 1)(n − 1)n is divisible by three because n − 1 is.
• If n mod 3 = 2, then (n + 1)(n − 1)n is divisible by three because n + 1 is.

4.40 We'll give a proof by cases based on n mod 3:
• If n mod 3 = 0, then we can write n = 3k for an integer k. Then n^2 + 1 = (3k)^2 + 1 = 9k^2 + 1. Because 3k^2 is an integer—that is, because 3 | 9k^2—we have that n^2 + 1 mod 3 = 1.
• If n mod 3 = 1, then we can write n = 3k + 1 for an integer k. Then n^2 + 1 = (3k + 1)^2 + 1 = (9k^2 + 6k + 1) + 1 = (9k^2 + 6k) + 2 = 3(3k^2 + 2k) + 2. Thus we have that n^2 + 1 mod 3 = 2.
• If n mod 3 = 2, then we can write n = 3k + 2 for an integer k. Then n^2 + 1 = (3k + 2)^2 + 1 = (9k^2 + 12k + 4) + 1 = (9k^2 + 12k) + 5 = 3(3k^2 + 4k) + 5. Thus we have that n^2 + 1 mod 3 = 5 mod 3 = 2.
In all of these cases, n^2 + 1 is not evenly divisible by three.

4.41 We can prove this claim from scratch, or use Example 4.14, which states that |a − b| ≤ |a − c| + |b − c|. Let a = x, b = −y, and c = 0. Then we have |x − (−y)| ≤ |x − 0| + |(−y) − 0|, which, rewriting, states that |x + y| ≤ |x| + |−y|. The claim follows because |y| = |−y|.
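The last-k-digits rule in 4.38 can be spot-checked numerically in one line:

```python
# 2**k divides n exactly when 2**k divides the number formed by n's last k digits.
ok = all((n % 2**k == 0) == (n % 10**k % 2**k == 0)
         for n in range(1, 2000) for k in range(1, 4))
print(ok)  # True
```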


4.42 Rewriting, we have to show that |x| ≤ |y| + |x − y|. We can prove this claim from scratch, or use Example 4.14, which states that |a − b| ≤ |a − c| + |b − c|. Let a = x + y, b = y, and c = 2y. Then we have |(x + y) − y| ≤ |(x + y) − 2y| + |y − 2y|, which, rewriting, states that |x| ≤ |x − y| + |−y|. Because |−y| = |y|, we thus have |x| ≤ |x − y| + |y|. Subtracting |y| from both sides yields |x| − |y| ≤ |x − y|, as desired.

4.43 We'll prove this result by cases, depending whether both, neither, or one of {x, y} is negative:
• Suppose both are negative: that is, x < 0 and y < 0. Then |x| = −x and |y| = −y, and x · y > 0 so |x · y| = xy. Thus |x||y| = (−x)(−y) = xy = |x · y|.
• Suppose neither is negative: that is, x ≥ 0 and y ≥ 0. Then |x| = x and |y| = y, and x · y ≥ 0 so |x · y| = xy. Thus |x||y| = xy = |x · y|.
• Suppose that one of x and y is negative. Without loss of generality, suppose x < 0 and y ≥ 0. Then |x| = −x and |y| = y, and x · y ≤ 0 so |x · y| = −xy. Thus |x||y| = (−x)y = −xy = |x · y|.

4.44 Assume that |x| ≤ |y|. This claim is a straightforward consequence of Exercise 4.41:

  |x + y| ≤ |x| + |y|     (Exercise 4.41)
          ≤ |y| + |y|     (|x| ≤ |y| by assumption)
          = 2|y|.

Thus |x + y|/2 ≤ |y|.

4.45 We'll prove the result by mutual implication. First, we'll prove the right-to-left direction: if A = ∅ or B = ∅ or A = B, then A × B = B × A. We'll consider all three cases:
• If A = ∅, then A × B = ∅ and B × A = ∅. Thus A × B = B × A.
• If B = ∅, similarly, we have A × B = B × A.
• If A = B, then A × B and B × A are equal because they're actually expressing the same set (which is the same as A × A and B × B).
For the converse, we'll prove the contrapositive: if A ≠ ∅ and B ≠ ∅ and A ≠ B, then A × B ≠ B × A. Because A ≠ B, there exists an element in one set but not the other—that is, an element a ∈ A but a ∉ B, or an element b ∈ B but b ∉ A. Without loss of generality, we'll consider the former case. Suppose a ∈ A but a ∉ B. Because B ≠ ∅, there is an element b ∈ B. Then ⟨a, b⟩ ∈ A × B, but a ∉ B so ⟨a, b⟩ ∉ B × A. Thus A × B ≠ B × A.

4.46 This claim is straightforward to see with two cases:
• if x ≥ 0 then x · x = nonnegative · nonnegative ≥ 0.
• if x ≤ 0 then x · x = nonpositive · nonpositive ≥ 0.

4.47 By Exercise 4.46, we have (x − y)^2 ≥ 0. Adding 4xy to both sides, we have (x − y)^2 + 4xy ≥ 4xy. Observe that (x − y)^2 + 4xy = x^2 − 2xy + y^2 + 4xy = x^2 + 2xy + y^2 = (x + y)^2. Thus we have (x + y)^2 ≥ 4xy. So x + y ≥ √(4xy) = 2√(xy), and (x + y)/2 ≥ √(xy).

4.48 The "if" direction is simple: if x = y, then the arithmetic mean of x and y is (x + y)/2 = 2x/2 = x, and the geometric mean is √(xy) = √(x^2) = x. The "only if" direction requires us to show that if (x + y)/2 = √(xy), then x = y. We show this fact by algebra:

  (x + y)/2 = √(xy) ⇔ x + y = 2√(xy)
                    ⇒ (x + y)^2 = (2√(xy))^2
                    ⇒ x^2 + 2xy + y^2 = 4xy
                    ⇔ x^2 − 2xy + y^2 = 0
                    ⇔ (x − y)^2 = 0
                    ⇔ x = y.
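The AM–GM inequality of 4.47, and the equality condition of 4.48, are easy to sanity-check numerically (a spot check, not a proof):

```python
import random

random.seed(0)
for _ in range(1000):
    x, y = random.random() * 10, random.random() * 10
    am, gm = (x + y) / 2, (x * y) ** 0.5
    assert am >= gm - 1e-12     # AM >= GM for nonnegative x, y (4.47)
assert (3.0 + 3.0) / 2 == (3.0 * 3.0) ** 0.5  # equality exactly when x = y (4.48)
print("ok")
```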


Proofs √ 4.49 This claim follows immediately from the Arithmetic Mean–Geometric Mean inequality: because y ≥ x, we know that y ≥ 0. And we’re told that x ≥ 0, so x/y ≥ 0 too. Thus we can invoke Exercise 4.47 on y and x/y: r y + yx x ≥ y· by Exercise 4.47 2 y √ = x. 4.50 Observe that

Note that y > √x ≥ 0 by assumption, so x/y ≤ √x; thus |y − √x| = y − √x and |√x − x/y| = √x − x/y. Therefore

  |y − √x| − |√x − x/y| = (y − √x) − (√x − x/y)
                        = y − 2√x + x/y
                        = (y² − 2y√x + x)/y
                        = (y − √x)²/y.

By assumption, y > √x ≥ 0. Thus (y − √x)² > 0, and (y − √x)²/y > 0 too. Thus

  |y − √x| − |√x − x/y| > 0.   (∗)

For the second part of the statement, (∗) and Exercise 4.44 tell us that

  y − √x > [(x/y − √x) + (y − √x)]/2 = (x/y + y − 2√x)/2 = (x/y + y)/2 − √x.

4.51 We'll prove the contrapositive: if n is a perfect square, then n mod 4 ∉ {2, 3} (that is, n mod 4 ∈ {0, 1}). Assume that n is a perfect square—say n = k² for an integer k. Let's look at two cases: k is even, or k is odd.

Case I: k is even. Then k = 2d for an integer d. Therefore n = k² = (2d)² = 4d². Because d is an integer, 4d² mod 4 = 0, and thus n mod 4 = 0.

Case II: k is odd. Then k = 2d + 1 for an integer d. Therefore n = k² = (2d + 1)² = 4d² + 4d + 1. Because d is an integer, we have 4d² mod 4 = 0 and 4d mod 4 = 0, and thus n mod 4 = 1.

4.52 We'll prove the contrapositive: if n or m is evenly divisible by 3, then nm is. Without loss of generality, suppose that n is divisible by three—say, n = 3k for an integer k. Then nm = 3km for integers k and m—and 3 · (any integer) is divisible by three by definition.

4.53 We'll prove the contrapositive: if n is odd, then 2n⁴ + n + 5 is even. But 2n⁴ is even, n is odd by assumption, and 5 is odd—and even + odd + odd is even, so the claim follows.

4.54 We proceed by mutual implication. First, suppose that n is even. Any integer times an even number is even, so n · n² = n³ is even (because n is even). Conversely, we need to show that if n³ is even, then n is even. We'll prove the contrapositive: if n is odd, then n³ is odd. If n is odd, we can write n = 2k + 1 for an integer k, and thus n³ = (2k + 1)³ = 8k³ + 12k² + 6k + 1—which is odd, because the first three terms are all even.

4.55 We proceed by mutual implication. First, assume 3 | n. By definition, there exists an integer k such that n = 3k. Therefore n² = (3k)² = 9k² = 3 · (3k²). Thus 3 | n² too.

Second, for the converse, we prove the contrapositive: if 3 ∤ n, then 3 ∤ n². Assume that 3 ∤ n. Then there exist integers k and r ∈ {1, 2} such that n = 3k + r. Therefore n² = (3k + r)² = 9k² + 6kr + r² = 3(3k² + 2kr) + r². Thus 3 | n² only if 3 | r². But r ∈ {1, 2}, and 1² = 1 and 2² = 4, neither of which is divisible by 3. Therefore n² is not divisible by 3.
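The biconditional in Exercise 4.55 can also be spot-checked empirically (a sketch of my own, not part of the proof):

```python
# Empirical spot-check (not a proof) of Exercise 4.55: 3 | n^2 if and only if 3 | n.
assert all((n * n % 3 == 0) == (n % 3 == 0) for n in range(1, 10_000))
print("3 | n^2  <=>  3 | n  holds for all n < 10^4")
```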
Therefore, by Exercise 4.55, we have that n is itself divisible by 3. Because 3 | n, we can write n = 3k for an integer k, so n2 = (3k)2 = 9k2 . Thus n2 = 9k2 and n2 = 3d2 , so 3d2 = 9k2 and d2 = 3k2 . Hence d2 is divisible by 3, and—again using Exercise 4.55—we have that 3 | d. That’s a contradiction: we assumed that n/d was in lowest terms, but we have now shown that n and d are both divisible by three! 4.60 Suppose for a contradiction that x and y ̸= x are both strict majority elements. Then, by definition, the sets X = {i ∈ {1, 2, . . . , n} : A[i] = x}

and

Y = {i ∈ {1, 2, . . . , n} : A[i] = y}

satisfy |X| > n/2 and |Y| > n/2. But because X, Y ⊆ {1, 2, . . . , n}, which has only n elements, there must be an index i in X ∩ Y. But by definition of X and Y we must have A[i] = x and A[i] = y—but x ≠ y, so these conditions are contradictory.

4.61 Consider x = √2 and y = 1/√2. Then xy = 1, which is rational, but neither x nor y is rational.

4.62 Consider x = √2 and y = √2. Then x − y = 0, which is rational, but neither x nor y is rational.

4.63 Consider x = π and y = π. Then x/y = 1, which is rational, but neither x nor y is rational.

4.64 I found a counterexample to Claim 2 by writing a program (by exhaustively computing the value of x² + y² for all 1 ≤ x < y < 100, and then looking for values that were triplicated). We can express 325 as 1² + 18² = 10² + 15² = 6² + 17².

4.65 The function f(n) = n² + n + 41 does not yield a prime for every nonnegative integer n. For example, f(41) = 41² + 41 + 41 = 41(41 + 1 + 1) = 41 · 43, which is not prime.
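The exhaustive search described in Exercise 4.64 can be sketched as follows (my own program, not the author's; the variable names are assumptions):

```python
from collections import defaultdict

# Tabulate x^2 + y^2 over 1 <= x < y < 100, and look for a value that is
# "triplicated," i.e., expressible as a sum of two squares in three ways.
sums = defaultdict(list)
for x in range(1, 100):
    for y in range(x + 1, 100):
        sums[x * x + y * y].append((x, y))

triplicated = {v: pairs for v, pairs in sums.items() if len(pairs) >= 3}
print(min(triplicated), sorted(triplicated[min(triplicated)]))
# → 325 [(1, 18), (6, 17), (10, 15)]
```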

4.4 Some Examples of Proofs

4.66 First, here are propositions using only {∨, ∧, ⇒, ¬} that are logically equivalent to operators 1–8 from Figure 4.34:

  column   truth table [in standard order]   original proposition   equivalent proposition
  1        TTTT                              True                   p ⇒ p
  2        TTTF                              p ∨ q                  p ∨ q
  3        TTFT                              p ⇐ q                  q ⇒ p
  4        TTFF                              p                      p
  5        TFTT                              p ⇒ q                  p ⇒ q
  6        TFTF                              q                      q
  7        TFFT                              p ⇔ q                  (p ⇒ q) ∧ (q ⇒ p)
  8        TFFF                              p ∧ q                  p ∧ q

For the remaining columns, observe that 9 – 16 are just the negations of 1 – 8 , in reversed order: 9 is ¬ 8 ; 10 is ¬ 7 ; etc. Because we’re allowed to use ¬ in our equivalent expressions, we can express 9 – 16 by negating the corresponding translation from the above table. 4.67 Given the solution to Exercise 4.66, it suffices to show that we can express ⇒ using {∧, ∨, ¬}. And, indeed, p ⇒ q is equivalent to ¬p ∨ q. (Now, to translate i to use only {∧, ∨, ¬}, we translate i as in Exercise 4.66 to use only {∧, ∨, ¬, ⇒}, and then further translate any use of ⇒ to use ∨ and ¬.) 4.68 Given the solution to Exercise 4.67, it suffices to show that we can express ∧ using {∨, ¬}, and ∨ using {∧, ¬}. But these are just De Morgan’s Laws: p ∧ q is equivalent to ¬(¬p ∨ ¬q), and p ∨ q is equivalent to ¬(¬p ∧ ¬q). 4.69 Given the solution to Exercise 4.68, it suffices to show that we can express ∧ and ¬ using the Sheffer stroke. Indeed, p | p is logically equivalent to ¬p, and ¬(p | q) ≡ (p | q) | (p | q) is logically equivalent to p ∧ q. 4.70 Given the solution to Exercise 4.68, it suffices to show that we can express ∨ and ¬ using the Peirce arrow ↓. Indeed, p ↓ p is logically equivalent to ¬p, and ¬(p ↓ q) ≡ (p ↓ q) ↓ (p ↓ q) is logically equivalent to p ∨ q. 4.71 Call a Boolean formula φ over variables p and q truth-preserving if φ is true whenever p and q are both true. (So φ is truth-preserving when the first row of its truth table is True.) We claim that any logical expression φ that involves only {p, q, ∨, ∧}, in any combination, is truth-preserving. A fully rigorous proof needs induction (see Chapter 5), but the basic idea is fairly simple. The trivial logical expressions p or q certainly are truth-preserving. And if both φ and ψ are truth-preserving then both φ ∧ ψ and φ ∨ ψ are truth-preserving too. In sum, using only ∧s and ∨s, we can only build truth-preserving expressions. 
But there are non-truth-preserving formulas—for example, p ⊕ q is false when p = q = T—which therefore cannot be expressed solely using the operators {∧, ∨}. Therefore this set of operators is not universal. 4.72 Let’s say that a proposition in the form ∀/∃ x1 : ∀/∃ x2 : · · · ∀/∃ xk : P(x1 , x2 , . . . , xk ) is outer-quantifier-only form (OQOF). (P may use any of the logical connectives from propositional logic.) We must show that any fully quantified proposition φ of predicate logic is logically equivalent to one in OQOF. First, rename all the variables in φ to ensure that all bound variables in φ are unique. Then apply the transformations of Exercises 4.66–4.71 to ensure that the only operators in φ are ¬, ∧, ∀, and ∃. We’ll give a recursive transformation to translate φ into OQOF, where we only make a recursive call on a proposition with strictly fewer of those four operators. Case I: φ is of the form ¬ψ. First, recursively convert ψ to OQOF. If the resulting expression ψ′ has no quantifiers, then we’re done: just return ¬ψ′ . Otherwise, ψ′ is of the form ∀/∃ x : τ . Then we can return ∃/∀ x : ρ, where ρ is the OQOF of ¬τ , found recursively. Case II: φ is of the form ψ ∧ τ . Recursively convert ψ and τ to OQOF, and call the resulting expressions ψ′ and τ ′ : •If ψ′ = ∀x : ρ, then φ is [∀x : ρ] ∧ τ , which is logically equivalent to [∀x : ρ ∧ τ ]: [∀x : ρ] ∧ τ ≡ [∀x : ρ] ∧ [∀x : τ ] ≡ [∀x : ρ ∧ τ ].

(The two equivalences hold by Exercise 3.142 and Exercise 3.138, respectively.) Recursively convert ρ ∧ τ to OQOF.
•If ψ′ = ∃x : ρ, then φ ≡ [∃x : ρ] ∧ τ ≡ [∃x : ρ ∧ τ ] by Exercise 3.144. Recursively convert ρ ∧ τ to OQOF.
•If ψ′ contains no quantifiers but τ ′ does, then follow the previous two bullet points (but on τ ′ instead of ψ′).
•If neither ψ′ nor τ ′ contains quantifiers, then we're done: just return ψ′ ∧ τ ′.
Case III: φ is of the form ∀/∃ x : ψ. Recursively convert ψ to OQOF.

4.73 Let φ = x1 ∨ x2 ∨ · · · ∨ xn. This proposition has only one clause, but it is true under every truth assignment except the all-false one. Our translation thus builds 2ⁿ − 1 clauses in the equivalent DNF formula.
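The translations in Exercises 4.69 and 4.70 are easy to verify mechanically by checking all four truth assignments; here is a small sketch of my own (not from the text):

```python
from itertools import product

# Brute-force check of the translations in Exercises 4.69 and 4.70: the
# Sheffer stroke (NAND) expresses NOT and AND, and the Peirce arrow (NOR)
# expresses NOT and OR.
nand = lambda p, q: not (p and q)
nor = lambda p, q: not (p or q)

for p, q in product([True, False], repeat=2):
    assert nand(p, p) == (not p)                       # p | p  ==  ¬p
    assert nand(nand(p, q), nand(p, q)) == (p and q)   # (p|q)|(p|q)  ==  p ∧ q
    assert nor(p, p) == (not p)                        # p ↓ p  ==  ¬p
    assert nor(nor(p, q), nor(p, q)) == (p or q)       # (p↓q)↓(p↓q)  ==  p ∨ q
print("all translations verified")
```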

4.74 Draw an identical right triangle, rotated 180° from the original. The two triangles fit together to create an x-by-y rectangle, which has area xy. Thus the area of the triangle is xy/2.

[Figure: two copies of the triangle, with legs x and y, fitted together into an x-by-y rectangle.]
4.75 Draw four identical right triangles, with legs of length a and b and hypotenuse of length c, arranged as in the figure. The entire enclosed space is a c-by-c square, which has area c². The unfilled inner square has side length b − a, so its area is (b − a)² = b² − 2ab + a². Thus, writing ∆ for the area of the triangle, we have 4∆ + b² − 2ab + a² = c². By the previous exercise, we have ∆ = ab/2; thus b² + a² = c².

4.76 The right triangle in question has hypotenuse x + y and its vertical leg has length x − y. Writing d for the length of the other (horizontal) leg, the Pythagorean Theorem tells us that (x − y)² + d² = (x + y)², and thus

  x² − 2xy + y² + d² = x² + 2xy + y²   (multiplying out)
  d² = 4xy                             (subtracting x² − 2xy + y² from both sides)
  d = 2√(xy).                          (taking square roots, since d and x and y are positive)

The hypotenuse is the longest side of a right triangle, so x + y ≥ 2√(xy)—or, dividing by two, (x + y)/2 ≥ √(xy).

4.77 Writing a = |x₁ − y₁| and b = |x₂ − y₂|, the Manhattan distance is a + b and the Euclidean distance is √(a² + b²). Because a and b are both nonnegative, we have

  a + b ≥ √(a² + b²)
  ⇔ (a + b)² ≥ (√(a² + b²))²
  ⇔ a² + 2ab + b² ≥ a² + b²
  ⇔ 2ab ≥ 0.

And certainly 2ab ≥ 0, because a ≥ 0 and b ≥ 0.

4.78 As in the previous exercise, write a = |x₁ − y₁|. Let d be the Manhattan distance, so that d = a + |x₂ − y₂|. Thus the Euclidean distance is √(a² + (d − a)²). It's fairly easy to see from calculus (or from plotting the function) that the Euclidean distance is minimized when a² = (d − a)²—that is, when a = d/2. In this case, the Euclidean distance is √(2(d/2)²) = √(d²/2) = d/√2. Thus we have

  d_manhattan(x, y) ≤ √2 · d_euclidean(x, y).

The points x = ⟨0, 0⟩ and y = ⟨d/2, d/2⟩ satisfy d_manhattan(x, y) = d and d_euclidean(x, y) = d/√2, as desired.

4.79 We'll give a direct proof. The given point ⟨x, y⟩ is distance d = √(x² + y²) away from the origin, at some angle α above the x-axis where cos α = x/d and sin α = y/d. We now seek a point at distance d away from the origin, but now at angle α + θ above the x-axis:

[Figure: the point ⟨x, y⟩ at distance d from the origin and angle α above the x-axis, together with the rotated point at angle α + θ.]

Thus the new position is

  ⟨d cos(α + θ), d sin(α + θ)⟩ = ⟨d[cos α cos θ − sin α sin θ], d[sin α cos θ + cos α sin θ]⟩
                               = ⟨cos θ[d cos α] − sin θ[d sin α], cos θ[d sin α] + sin θ[d cos α]⟩
                               = ⟨x cos θ − y sin θ, y cos θ + x sin θ⟩,

as desired.

4.80 The number 6 is a perfect number: its factors are 1, 2, and 3, and 1 + 2 + 3 = 6. (One way to find this fact is by writing a program to systematically test small integers; 6 is the smallest integer that passes the test. The next one is 28 = 1 + 2 + 4 + 7 + 14.)

4.81 If p is prime, then the only positive integer factors of p² are 1, p, and p². But 1 + p = p² only for p = (1 ± √5)/2, which is not a prime integer. (See Example 6.27.)

4.82 Taking the numbers {n, n + 1, . . . , n + 5} all modulo 6, we must have the values {0, 1, 2, 3, 4, 5} (in some order). The three numbers k such that k mod 6 ∈ {0, 2, 4} are all even, and the number k such that k mod 6 = 3 is divisible by three. Because n ≥ 10, none of these k is 2 or 3 itself, and thus the only two candidate primes are the two numbers k such that k mod 6 ∈ {1, 5}.

4.83 The claim is false. I found a counterexample by writing a small Python program:

    def isPrime(n):
        for d in range(2, int(n**0.5) + 1):
            if n % d == 0:
                return False
        return True

    n = 2
    while any(map(isPrime, range(n, n+10))):
        n = n + 1
    print("n =", n, "through n+9 =", n+9)

(The range's upper bound is int(n**0.5) + 1 so that every candidate divisor up to ⌊√n⌋ is tested; without the + 1, perfect squares such as 9 and 121 are misreported as prime.) My program terminated with n = 114. Indeed, none of {114, 115, . . . , 123} is prime:

    114 = 2 · 3 · 19    115 = 5 · 23        116 = 2² · 29    117 = 3² · 13    118 = 2 · 59
    119 = 7 · 17        120 = 2³ · 3 · 5    121 = 11²        122 = 2 · 61     123 = 3 · 41.

4.84 Let k be a locker number. Note that k's state is changed once per factor of k. For example, if k is prime, then k is opened when i = 1 and closed when i = k, and left untouched at every other stage. The only time k has an odd number of factors is when √k is a factor of k—every other factor i is "balanced" by the factor k/i. Thus every locker is left closed except the perfect squares: {1, 4, 9, 16, 25, 36, 49, 64, 81, 100}.

4.85 It's a small bug, but it's still a bug: consider n = 2. The modified claim would be: 2 is composite if and only if 2 is evenly divisible by some k ∈ {2, . . . , ⌈√2⌉}. But 2 is divisible by 2, and 2 ∈ {2, . . . , ⌈√2⌉}, and 2 is not composite.

4.86 It's immediate that if n is divisible by such a k ∈ {⌈√n⌉, . . . , n − 1}, then n is composite. For the other direction, assume that n is composite. By Theorem 4.32, there exists an integer k ∈ {2, 3, . . . , ⌊√n⌋} that divides n evenly. That is, there exists an integer d such that kd = n. But k ≥ 2 implies that d < n, and k ≤ √n implies that d ≥ √n—and thus d ≥ ⌈√n⌉, because d is an integer.
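The factor-pairing argument in Exercise 4.84 can be confirmed by directly simulating the lockers; a small sketch of my own (the function name is mine, not the book's):

```python
# Simulate the locker puzzle of Exercise 4.84: on pass i, every ith locker
# is toggled; we report which lockers end up open.
def open_lockers(n):
    closed = [True] * (n + 1)          # closed[k] == True means locker k is closed
    for i in range(1, n + 1):
        for k in range(i, n + 1, i):   # toggle every ith locker
            closed[k] = not closed[k]
    return [k for k in range(1, n + 1) if not closed[k]]

print(open_lockers(100))   # → [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```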

4.87 Take the number n = 202. It's composite, but it is not evenly divisible by any number in {⌊√n/2⌋, . . . , ⌊3√n/2⌋} = {7, 8, . . . , 21}: its only nontrivial factors are 2 and 101.

4.88 Let n be p² for any prime p. Then the only divisors of n are 1, p, and p². The smallest prime factor that evenly divides n is therefore p = √n. And there are infinitely many such integers n, because there are infinitely many primes.

4.5 Proof Techniques

4.89 Invalid. The premises are of the form p ⇒ q and ¬p; the conclusion is q. But with p = False and q = False, the premises are both true and the conclusion is false.

4.90 Invalid. The premises are of the form p ⇒ q and q; the conclusion is p. But with p = False and q = True, the premises are both true and the conclusion is false.

4.91 Valid. The premises are of the form p ⇒ q and ¬q; the conclusion is ¬p. This inference is Modus Tollens, which is a valid step.

4.92 Invalid. The premises are of the form p ∨ q and p; the conclusion is q. But with p = True and q = False, the premises are both true and the conclusion is false.

4.93 Valid. The premises are of the form ∀x ∈ S : P(x) and a ∈ S; the conclusion is P(a). This is a valid inference (it's precisely what "for all" means).

4.94 Valid. The premises are of the form p ⇒ (q ∨ r) and p and ¬r; the conclusion is q. This is a valid inference, because in the only truth assignment in which all of the premises are true (when p = q = True and r = False), the conclusion is true:

  p   q   r   p ⇒ (q ∨ r)   all premises (p and ¬r and p ⇒ (q ∨ r))   conclusion (q)
  T   T   T   T             F
  T   T   F   T             T                                         T
  T   F   T   T             F
  T   F   F   F             F
  F   T   T   T             F
  F   T   F   T             F
  F   F   T   T             F
  F   F   F   T             F
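Validity arguments like those in Exercises 4.89–4.94 can be checked mechanically: an inference is valid exactly when every truth assignment satisfying all the premises also satisfies the conclusion. A small sketch of my own (the helper `valid` is hypothetical, not from the text):

```python
from itertools import product

# An inference is valid iff every truth assignment satisfying all premises
# also satisfies the conclusion.
def valid(premises, conclusion, nvars):
    return all(conclusion(*vs)
               for vs in product([True, False], repeat=nvars)
               if all(prem(*vs) for prem in premises))

# Exercise 4.94: premises p ⇒ (q ∨ r), p, ¬r; conclusion q.
print(valid([lambda p, q, r: (not p) or q or r,
             lambda p, q, r: p,
             lambda p, q, r: not r],
            lambda p, q, r: q, 3))   # → True (valid)

# Exercise 4.90: premises p ⇒ q and q; conclusion p (affirming the consequent).
print(valid([lambda p, q: (not p) or q, lambda p, q: q],
            lambda p, q: p, 2))      # → False (invalid)
```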

4.95 The problem is in the statement "Because p and q are both prime, we know that p does not evenly divide q." The stated assumptions were that p and q both evenly divide n and are prime. If p and q are different, then p ∤ q—but there's nothing in the theorem or proof that requires p ≠ q.

4.96 Let n = 15, p = 5, and q = 5. Indeed n is a positive integer, p, q ∈ Z≥2, p and q are prime, and n is evenly divisible by both p and q. But n is not divisible by pq = 25.

4.97 The problem is a fundamental confusion of the variables k and n. Example 4.7 establishes that 6! + 1 is not evenly divisible by any k ∈ {2, . . . , 6}, not any k ∈ {2, . . . , 6!}.

4.98 721 is divisible by 7, as 7 · 103 = 721. Thus 721 is not prime.

4.99 Here are two small examples: 2! + 1 = 2 + 1 = 3 is prime, and 3! + 1 = 6 + 1 = 7 is prime.

4.100 This argument affirms the consequent: Example 4.11 says that if √2/2 and 2/√2 are rational, then their product is rational—and not the converse.

4.101 Suppose for a contradiction that 8/√2 were rational—that is, 8/√2 = n/d for integers n and d ≠ 0. But then

  8d/n = 8 / (8/√2) = √2

would be rational too—but √2 is not rational, which contradicts our assumption.

4.102 The problem is in this sentence: "Because r < 12, adding r² to a multiple of 12 does not result in another multiple of 12." There may be numbers not divisible by 12 whose squares are divisible by 12 (unlike for divisibility by 2: the square of an odd number is never even): for example, 6² = 36, which is divisible by 12.

4.103 One counterexample: n = 18 is not divisible by 12, but 18² = 324 = 27 · 12 is divisible by 12. (In fact, it's true that 6 | n if and only if 12 | n².)

4.104 The problem is in the sentence "Therefore, by the same logic as in Example 4.18, we have that n is itself divisible by 4." We can have 4 | n² without 4 | n—for example, for n = 6, we have 4 | 36 but 4 ∤ 6.

4.105 The problem is in the sentence "Observe that whenever a ≤ b and c ≤ d, we know that ac ≤ bd." If we multiply by a negative number, then the inequality swaps: n ≤ m if and only if −2n ≥ −2m, for example. In fact, we're plugging in x = y = 1/√2—so both y − 4x and y − 2x are negative. Thus the multiplication based on (2) in (3) should have swapped the inequality.

4.106 The issue is that the average of r/s and t/u might be very far from (r + t)/(s + u). Thus we can't validly draw the conclusion that the last sentence infers. To be precise, suppose algorithm F is correct on a out of b members of W and c out of d members of B. Suppose algorithm G is correct on r out of s members of W and t out of u members of B. By assumption,

  a/b > r/s   and   c/d > t/u.

Implicitly the argument concludes that, therefore, we have

  (a + c)/(b + d) > (r + t)/(s + u).

But that's not necessarily true. For example, consider a = 34, b = 100, r = 1, s = 3, c = 1, d = 2, t = 49, u = 100. Then we have

  a/b = 34/100 > 1/3 = r/s   and   c/d = 1/2 > 49/100 = t/u.

But a + c = 34 + 1 = 35 and b + d = 100 + 2 = 102, so (a + c)/(b + d) = 35/102 ≈ 0.3431. Meanwhile r + t = 1 + 49 = 50 and s + u = 3 + 100 = 103, so (r + t)/(s + u) = 50/103 ≈ 0.4854.

4.107 The success rate is the number of correct answers over the number of people encountered. The success rate for F is

  (.75 · 100 + .60 · 100)/200 = (75 + 60)/200 = 135/200 = 0.675.

The success rate for G is

  (.70 · 1000 + .50 · 100)/1100 = (700 + 50)/1100 = 750/1100 = 0.6818 · · · .

4.108 The error is that the two shapes that have been produced are not right triangles—in fact, they are not triangles at all! The slope of the first constituent triangle is 3/8 = 0.375. The slope of the second constituent triangle is 2/5 = 0.4. Thus when these two pieces are lined up, the apparent hypotenuse of the composed "triangle" is not a straight line! In the left-hand configuration, it's "bowed in"; in the right-hand configuration it's "bowed out." Here are the drawings from the question, augmented with the actual diagonal (a dotted line) from the lower-left to upper-right corners of the rectangles.


The difference between these “hypotenuses” is one unit of area—enough to account for the apparent difference. To see this more explicitly, here is the same rectangle, with the two “hypotenuses” drawn and the area between them shaded.

4.109 This proof uses the fallacy of proving true. It assumes that the Pythagorean Theorem is true, and derives True (the similarity of the triangles).

5 Mathematical Induction

5.2 Proofs by Mathematical Induction

5.1 Let P(n) denote the claim that ∑_{i=0}^{n} i² = n(n+1)(2n+1)/6. We will prove that P(n) holds for all integers n ≥ 0 by induction on n.

For the base case (n = 0), we need to show that P(0) holds—that is, we need to prove that ∑_{i=0}^{0} i² = 0(0+1)(2·0+1)/6. But the left-hand side is 0² = 0, and the right-hand side is 0/6 = 0, so P(0) holds.

For the inductive case (n ≥ 1), we assume the inductive hypothesis P(n − 1), and we must prove P(n):

  ∑_{i=0}^{n} i² = n² + ∑_{i=0}^{n−1} i²     (definition of summation)
                 = n² + (n−1)(n)(2n−1)/6     (the inductive hypothesis P(n − 1))
                 = (2n³ + 3n² + n)/6         (multiplying out and collecting like terms)
                 = n(2n+1)(n+1)/6.           (factoring)

Thus we have proven ∑_{i=0}^{n} i² = n(n+1)(2n+1)/6, which is precisely P(n).
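Closed forms like the one in Exercise 5.1 are easy to sanity-check numerically before (or after) proving them; a small sketch of my own:

```python
# Numeric spot-check (not part of the proof) of the closed form in Exercise 5.1:
# the sum 0^2 + 1^2 + ... + n^2 equals n(n+1)(2n+1)/6.
for n in range(100):
    assert sum(i * i for i in range(n + 1)) == n * (n + 1) * (2 * n + 1) // 6
print("closed form verified for n = 0..99")
```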

5.2 Let P(n) denote the claim that ∑_{i=0}^{n} i³ = (n⁴ + 2n³ + n²)/4. We prove that P(n) holds for all integers n ≥ 0 by induction on n.

For the base case (n = 0), we need to show that P(0) holds, which is straightforward because

  ∑_{i=0}^{0} i³ = 0³ = 0 = (0⁴ + 2·0³ + 0²)/4.

For the inductive case (n ≥ 1), we assume the inductive hypothesis P(n − 1), and we must prove P(n):

  ∑_{i=0}^{n} i³ = n³ + ∑_{i=0}^{n−1} i³                                                      (definition of summation)
                 = n³ + [(n − 1)⁴ + 2(n − 1)³ + (n − 1)²]/4                                   (inductive hypothesis P(n − 1))
                 = n³ + [(n⁴ − 4n³ + 6n² − 4n + 1) + 2(n³ − 3n² + 3n − 1) + (n² − 2n + 1)]/4  (multiplying out)
                 = n³ + (n⁴ − 2n³ + n²)/4                                                     (collecting like terms)
                 = (n⁴ + 2n³ + n²)/4,                                                         (algebra)

precisely as required by P(n).

5.3 Let P(n) denote the claim that (−1)ⁿ = 1 if n is even, and (−1)ⁿ = −1 if n is odd. We'll prove that P(n) holds for all nonnegative integers n by induction on n.

Base case (n = 0): We must prove P(0): that is, we must prove that (−1)⁰ = 1 if 0 is even, and (−1)⁰ = −1 if 0 is odd. But 0 is even and (−1)⁰ = 1 by the definition of exponentiation, so P(0) holds.

Inductive case (n ≥ 1): We assume the inductive hypothesis P(n − 1), and we must prove P(n):

  (−1)ⁿ = −1 · (−1)ⁿ⁻¹                                   (definition of exponentiation)
        = −1 · (1 if n − 1 is even; −1 if n − 1 is odd)  (inductive hypothesis)
        = −1 · (1 if n is odd; −1 if n is even)          (n is odd ⇔ n − 1 is even)
        = (−1 if n is odd; 1 if n is even).              (multiplying through)

Thus we have shown that (−1)ⁿ is −1 if n is odd, and it's 1 if n is even—in other words, we've proven P(n).

5.4 Let P(n) denote the claim that ∑_{i=1}^{n} 1/(i(i+1)) = n/(n+1). We prove that P(n) holds for any n ≥ 0, by induction on n.

Base case (n = 0): Observe that ∑_{i=1}^{0} 1/(i(i+1)) = 0 by definition (the summation is over an empty range), and 0/(0+1) = 0 too. Thus P(0) holds.

Inductive case (n ≥ 1): We assume the inductive hypothesis P(n − 1), and we must prove that P(n) holds too:

  ∑_{i=1}^{n} 1/(i(i+1)) = 1/(n(n+1)) + ∑_{i=1}^{n−1} 1/(i(i+1))   (definition of summation)
                         = 1/(n(n+1)) + (n−1)/n                    (inductive hypothesis)
                         = [1 + (n−1)(n+1)] / (n(n+1))             (putting terms over common denominator)
                         = (1 + n² − 1) / (n(n+1))                 (multiplying out and combining like terms)
                         = n/(n+1).                                (factoring)

Thus P(n) holds.

5.5 Let P(n) denote the claim that ∑_{i=1}^{n} 2/(i(i+2)) = 3/2 − 1/(n+1) − 1/(n+2). We prove that P(n) holds for any n ≥ 0, by induction on n.

Base case (n = 0): Observe that the left-hand side is 0 (the summation is over an empty range), and 3/2 − 1/1 − 1/2 = 0 too. Thus P(0) holds.

Inductive case (n ≥ 1): We assume the inductive hypothesis P(n − 1), and we must prove that P(n) holds too:

  ∑_{i=1}^{n} 2/(i(i+2)) = 2/(n(n+2)) + ∑_{i=1}^{n−1} 2/(i(i+2))                 (definition of summation)
                         = 2/(n(n+2)) + [3/2 − 1/n − 1/(n+1)]                    (inductive hypothesis)
                         = 3/2 + [2(n+1) − (n+2)(n+1) − n(n+2)] / (n(n+2)(n+1))  (putting terms over common denominator)
                         = 3/2 − (2n² + 3n) / (n(n+2)(n+1))                      (multiplying out and collecting like terms)
                         = 3/2 − (2n + 3) / ((n+2)(n+1))                         (cancelling the n/n)
                         = 3/2 − [(n+1) + (n+2)] / ((n+2)(n+1))                  (rearranging via algebra—a choice made only by knowing the expression we're aiming for!)
                         = 3/2 − 1/(n+2) − 1/(n+1).                              (algebra)

Thus P(n) holds.

5.6 Let P(n) denote the claim ∑_{i=1}^{n} i · (i!) = (n + 1)! − 1. We'll prove that P(n) holds for any n ≥ 0 by induction.

For the base case (n = 0), we must prove P(0). The left-hand side of P(0) is an empty summation, so it's equal to 0. The right-hand side of P(0) is (0 + 1)! − 1 = 1! − 1 = 1 − 1 = 0. Thus P(0) holds because both sides of the equation are equal to 0.

For the inductive case (n ≥ 1), we assume the inductive hypothesis P(n − 1) and prove P(n):

  ∑_{i=1}^{n} i · (i!) = n · (n!) + ∑_{i=1}^{n−1} i · (i!)   (definition of summation)
                       = n · (n!) + n! − 1                   (inductive hypothesis)
                       = (n + 1) · (n!) − 1                  (factoring)
                       = (n + 1)! − 1.                       (definition of factorial)

5.7 We'll show that dₙ = 50/(√2)ⁿ mm. Let P(n) denote this claim. Here's a proof by induction that P(n) holds for all n ≥ 0:

base case (n = 0): We were told by definition that d₀ = 50 mm. And 50/(√2)⁰ = 50/1 = 50. Thus P(0) holds.

inductive case (n ≥ 1): We assume the inductive hypothesis P(n − 1)—namely, we assume that dₙ₋₁ = 50/(√2)ⁿ⁻¹ mm. We must prove P(n):

  dₙ = dₙ₋₁/√2                (definition of dₙ)
     = [50/(√2)ⁿ⁻¹ mm] / √2   (inductive hypothesis P(n − 1))
     = 50/((√2)ⁿ⁻¹ · √2) mm   (algebra)
     = 50/(√2)ⁿ mm.           (algebra)

Thus we have proven P(n), as desired.

5.8 For a circle of diameter d, the area is π · (d/2)². Thus the light-gathering area when the lens is set to dₙ is

  π · (dₙ/2)² = π · (1/2 · 50/(√2)ⁿ)² mm² = 625π/2ⁿ mm².

That is, the light-gathering area decreases by a factor of 2 with every step. The f-stop when the aperture diameter is dₙ is defined as 50 mm / dₙ, so the f-stop is

  50 mm / [50/(√2)ⁿ mm] = (√2)ⁿ.

Thus the f-stops are 1, √2, (√2)², (√2)³, etc. Or, rounding to two significant digits: 1, 1.4, 2, 2.8, 4, 5.6, 8, 11, etc. (These values are the names of the f-stop settings.)

5.9 The ith odd positive number is equal to 2i − 1. Thus the quantity that we're trying to understand is ∑_{i=1}^{n} (2i − 1). For n = 1, this quantity is 1; for n = 2, it's 1 + 3 = 4; for n = 3, it's 1 + 3 + 5 = 9. From the pattern 1, 4, 9, . . ., we might conjecture that the sum ∑_{i=1}^{n} (2i − 1) is equal to n². Let's prove it.

Let P(n) denote the claim ∑_{i=1}^{n} (2i − 1) = n². We'll prove that P(n) holds for all n ≥ 1 by induction on n.

base case (n = 1): By definition, ∑_{i=1}^{1} (2i − 1) = 2 − 1 = 1, and 1² = 1.

inductive case (n ≥ 2): We assume the inductive hypothesis P(n − 1), and prove that P(n) holds too:

  ∑_{i=1}^{n} (2i − 1) = 2n − 1 + ∑_{i=1}^{n−1} (2i − 1)   (definition of summation)
                       = 2n − 1 + (n − 1)²                 (inductive hypothesis P(n − 1))
                       = 2n − 1 + n² − 2n + 1              (algebra)
                       = n².                               (algebra)

Thus P(n) holds, and the result follows.
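The conjecture in Exercise 5.9 can likewise be spot-checked by a short program (mine, not part of the proof):

```python
# Empirical check of Exercise 5.9: the sum of the first n odd positive
# numbers, 1 + 3 + ... + (2n - 1), equals n^2.
for n in range(1, 200):
    assert sum(2 * i - 1 for i in range(1, n + 1)) == n * n
print("sum of first n odd numbers == n^2 for n = 1..199")
```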


5.10 The ith positive even integer is 2i. One way to analyze the sum of the first n positive even integers is to observe that ∑_{i=1}^{n} 2i is precisely equal to 2 · ∑_{i=1}^{n} i; we showed in Theorem 5.3 that ∑_{i=1}^{n} i = n(n+1)/2. But we can also directly prove that ∑_{i=1}^{n} 2i is n(n + 1). Let P(n) denote this claim. We proceed by induction on n:

base case (n = 0): Indeed, we have ∑_{i=1}^{0} 2i = 0 = 0 · (0 + 1).

inductive case (n ≥ 1): We assume the inductive hypothesis P(n − 1), and prove P(n):

  ∑_{i=1}^{n} 2i = 2n + ∑_{i=1}^{n−1} 2i     (definition of summation)
                 = 2n + (n − 1)(n − 1 + 1)   (inductive hypothesis P(n − 1))
                 = 2n + (n − 1)n             (algebra)
                 = n(2 + n − 1)              (factoring)
                 = n(n + 1).                 (algebra)

Thus P(n) is true too.

5.11 A summation like 1/2 + 1/4 + 1/8 + 1/16 + 1/32 is exactly 1/32 less than 1. Let P(n) denote the property that ∑_{i=1}^{n} 1/2ⁱ = 1 − 1/2ⁿ. We'll prove that P(n) holds for all integers n ≥ 0 by induction on n.

For the base case (n = 0), the property P(0) requires that ∑_{i=1}^{0} 1/2ⁱ = 1 − 1/2⁰. But the left-hand side is an empty summation, so its value is 0. The right-hand side is 1 − 1 = 0 too. Thus P(0) holds.

For the inductive case (n ≥ 1), we assume the inductive hypothesis P(n − 1) and we must prove P(n)—that is, we must prove that ∑_{i=1}^{n} 1/2ⁱ = 1 − 1/2ⁿ. We'll start from the left-hand side and prove it equal to the right-hand side:

  ∑_{i=1}^{n} 1/2ⁱ = [∑_{i=1}^{n−1} 1/2ⁱ] + 1/2ⁿ   (breaking apart the summation)
                   = [1 − 1/2ⁿ⁻¹] + 1/2ⁿ           (inductive hypothesis)
                   = 1 − 2 · (1/2ⁿ) + 1/2ⁿ         (1/2ⁿ⁻¹ = 2 · 1/2ⁿ)
                   = 1 − 1/2ⁿ.                     (algebra)

Thus P(n) holds, and the entire claim holds.

5.12 Let P(n) denote the claim that ∑_{i=0}^{n} (x₀ + ir) = (n + 1)x₀ + rn(n+1)/2. Here's a proof that P(n) holds for all n ≥ 0, by induction on n.

base case (n = 0): We have ∑_{i=0}^{0} (x₀ + ir) = x₀ + 0 · r = x₀ + r · (0 · 1)/2, as desired.

inductive case (n ≥ 1): We assume the inductive hypothesis P(n − 1), and we must prove P(n):

  ∑_{i=0}^{n} (x₀ + ir) = [x₀ + nr] + ∑_{i=0}^{n−1} (x₀ + ir)   (definition of summation)
                        = [x₀ + nr] + [n · x₀ + r(n−1)(n)/2]    (inductive hypothesis P(n − 1))
                        = (n + 1)x₀ + rn · ((n−1)/2 + 1)        (combining like terms)
                        = (n + 1)x₀ + r · n(n+1)/2.             (algebra)

Thus P(n) is true too.

5.13 Let P(n) denote the claim that a knight's walk exists for an n-by-n chessboard. We'll prove that P(n) is true for any n ≥ 4 by induction on n. Actually, we will prove something slightly stronger: let K(n) denote the claim that, for any two squares i and j in an n-by-n board, there's a knight's walk that starts at square i and goes to square j. We'll prove that K(n) holds for every n ≥ 4 by induction. (Our walks will be very inefficient—repeatedly revisiting many squares—but the repetition will make the proof somewhat simpler.)

base case (n = 4): We must show that there exists a knight's walk for a 4-by-4 board, starting at any position and ending at any position. Consider the following walk:

[Figure: an 18-move knight's walk on the 4-by-4 board, starting and ending at the bottom-left cell.]

This walk w starts at the bottom-left cell and returns to that same cell, visiting every cell of the board along the way. But that means that there's a walk starting from cell i and going to cell j, for arbitrary i and j: consider doing the given walk three times in a row. Delete the portion of the triplicated walk before the first visit to i (which is in Part I), and delete the portion after the last visit to j (which is in Part III). Because Part II of the triplicated walk is intact, the unexcised portion of the triplicated walk is an i-to-j walk that visits all of the cells one or more times each.

inductive case (n ≥ 5): We assume the inductive hypothesis K(n − 1): for any cells i and j in an (n − 1)-by-(n − 1) board, there's a knight's walk from i to j. We must prove K(n). To prove K(n), we will use consecutive walks on four (n − 1)-by-(n − 1) sub-boards, and then paste them together. Consider the n-by-n board, and divide it into four overlapping (n − 1)-by-(n − 1) boards:

[Figure: the original n-by-n board and its four overlapping (n − 1)-by-(n − 1) sub-boards A, B, C, and D, with the same square highlighted in each of the four subgrids.]

Now, to generate a knight's walk on the entire board from an arbitrary cell i to an arbitrary cell j, complete the following six consecutive walks:

•in an arbitrary one of the sub-boards (A, B, C, or D) in which i appears, a walk from cell i to the highlighted cell.
•a walk from the highlighted cell to itself in A.
•a walk from the highlighted cell to itself in B.
•a walk from the highlighted cell to itself in C.
•a walk from the highlighted cell to itself in D.
•in an arbitrary one of the sub-boards (A, B, C, or D) in which j appears, a walk from the highlighted cell to cell j.

In each of the six cases, the described walk exists by the inductive hypothesis. These six walks consecutively form a knight's walk from cell i to j in the n-by-n board. Because i and j were arbitrary, we have proven K(n).

5.14 A solution in Python is shown in Figure S.5.1.

5.15 Let R(n) denote the claim that there's a rook's tour for any n-by-n chessboard, starting from any corner position of the board. We'll prove that R(n) holds for all n ≥ 1 by induction on n.

base case (n = 1): There's nothing to do! We start in the only cell, and we're done after zero moves.

inductive case (n ≥ 2): We assume the inductive hypothesis R(n − 1)—that there is a rook's tour starting in any corner of an (n − 1)-by-(n − 1) board. We must prove R(n). Here is a construction of a rook's tour of the n-by-n board:

1. Start in any corner.
2. Go straight vertically to the adjacent corner.
3. Go straight across horizontally to the adjacent corner.
4. Move vertically by one cell away from the corner.
5. Complete a rook's tour of the (n − 1)-by-(n − 1) uncovered sub-board.

After step 4, we have visited each cell in one outside column and one outside row, and we're in the corner of the (n − 1)-by-(n − 1) remainder of the board. By the inductive hypothesis R(n − 1), a rook's tour of the (n − 1)-by-(n − 1) uncovered sub-board must exist.
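The five-step construction in Exercise 5.15 can be realized directly in code; here is a recursive sketch of my own (coordinates and parameter names are assumptions, not from the text):

```python
# Build a rook's tour of an n-by-n board as a list of (row, column) cells.
# The tour starts at corner (r0, c0); the board extends n - 1 cells in
# row-direction dr and column-direction dc (each +1 or -1).
def rook_tour(n, r0=1, c0=1, dr=1, dc=1):
    if n == 1:
        return [(r0, c0)]                                 # base case: a single cell
    edge = [(r0 + i * dr, c0) for i in range(n)]          # steps 1-2: along one outside column
    far = [(r0 + (n - 1) * dr, c0 + j * dc) for j in range(1, n)]  # step 3: across the far row
    # Steps 4-5: move one cell back toward the start row, landing in a corner of
    # the uncovered (n-1)-by-(n-1) sub-board, and recursively tour it (with the
    # directions flipped, since that corner is diagonally opposite).
    rest = rook_tour(n - 1, r0 + (n - 2) * dr, c0 + (n - 1) * dc, -dr, -dc)
    return edge + far + rest

tour = rook_tour(8)
assert len(tour) == 64 and len(set(tour)) == 64           # every cell exactly once
assert all(a[0] == b[0] or a[1] == b[1] for a, b in zip(tour, tour[1:]))  # rook moves only
print("tour visits", len(tour), "cells; first moves:", tour[:4])
```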


    def knight_walk(start, end, shift, n):
        '''
        Consider the n-by-n square in the chessboard whose upper-left coordinate is
        (1 + shift[0], 1 + shift[1]).  For example, a standard chessboard is an
        8-by-8 square with shift (0,0); the bottom right quadrant has n = 4 and
        shift = (4,4).  This function constructs a walk from the start square to
        the end square, covering every square in this n-by-n subboard.
        '''
        if n == 4:
            # A sequence of 18 moves that covers all 16 squares of a 4-by-4 board ...
            w = [(1,1), (2,3), (4,4), (3,2), (1,3), (3,4), (4,2), (2,1), (3,3), (4,1),
                 (2,2), (1,4), (2,2), (4,3), (3,1), (1,2), (2,4), (3,2), (1,1)]
            # ... adjusted by the given (x,y) shift.
            w = [(a + shift[0], b + shift[1]) for (a,b) in w]

            # Triplicate the sequence, and trim the portion before the first visit
            # to start and the last visit to end.
            return w[w.index(start):-1] + w + w[1:w.index(end)+1]
        else:
            # The square we keep returning to.
            touchstone = (2 + shift[0], 2 + shift[1])

            # What (n-1)-by-(n-1) subgrid is start in?  If it's not in the nth column or nth
            # row of the grid, then it's in the subgrid starting in the same position as this
            # whole n-by-n grid.  Else, we shift right by one column or down by one row (or both).
            start_shift = (shift[0] + (start[0] == shift[0] + n),
                           shift[1] + (start[1] == shift[1] + n))
            # ... and similarly for the end square.
            end_shift = (shift[0] + (end[0] == shift[0] + n),
                         shift[1] + (end[1] == shift[1] + n))

            w1 = knight_walk(start, touchstone, start_shift, n-1)
            w2 = knight_walk(touchstone, touchstone, (shift[0],   shift[1]),   n-1)  # cover subgrid A
            w3 = knight_walk(touchstone, touchstone, (shift[0]+1, shift[1]),   n-1)  # cover subgrid B
            w4 = knight_walk(touchstone, touchstone, (shift[0],   shift[1]+1), n-1)  # cover subgrid C
            w5 = knight_walk(touchstone, touchstone, (shift[0]+1, shift[1]+1), n-1)  # cover subgrid D
            w6 = knight_walk(touchstone, end, end_shift, n-1)

            return w1 + w2[1:] + w3[1:] + w4[1:] + w5[1:] + w6[1:]

Figure S.5.1 Finding a knight's walk in an n-by-n chessboard.

Here is an example for an 8-by-8 board:

[Figure: four 8-by-8 chessboard diagrams showing the rook after steps 1–4 of the construction: it starts in corner a1, moves straight up to a8, straight across to h8, and then down one square to h7, leaving an uncovered 7-by-7 sub-board.]

5.16 Observe that a Von Koch line of size s consists of four Von Koch lines of size s/3, each of one lower level, so the total length of the segments increases by a factor of 4/3 with each level. That is, write the length of a Von Koch line with level ℓ and size s as length(ℓ, s). Then

length(0, s) = s    and    length(ℓ, s) = 4 · length(ℓ − 1, s/3).

We claim, then, that length(ℓ, s) = (4/3)^ℓ · s. We'll prove this fact by induction. Let F(ℓ) denote the claim that length(ℓ, s) = (4/3)^ℓ · s for any s ≥ 0. We'll prove that F(ℓ) holds for all ℓ ≥ 0 by induction on ℓ.

base case (ℓ = 0): By definition, we have length(0, s) = s, and indeed (4/3)^0 · s = 1 · s = s.


5.2 Proofs by Mathematical Induction

inductive case (ℓ ≥ 1): We assume the inductive hypothesis F(ℓ − 1) and we must prove F(ℓ). Indeed

length(ℓ, s) = 4 · length(ℓ − 1, s/3)       definition of the fractal
             = 4 · (4/3)^{ℓ−1} · (s/3)      inductive hypothesis
             = (4/3) · (4/3)^{ℓ−1} · s      algebra
             = (4/3)^ℓ · s.                 algebra

The total perimeter of the Von Koch snowflake is therefore 3 · length(ℓ, s) = 3 · s · (4/3)^ℓ.

5.17 Write perimeter(ℓ, s) to denote the perimeter of a Sierpiński triangle with side length s and level ℓ. Observe that

perimeter(0, s) = 3s    and    perimeter(ℓ + 1, s) = 3 · perimeter(ℓ, s/2).

We claim by induction that perimeter(ℓ, s) = 3 · (3/2)^ℓ · s. The base case (ℓ = 0) is immediate, because perimeter(0, s) = 3s, and 3 · (3/2)^0 · s = 3s too. For the inductive case (ℓ ≥ 1), we assume the inductive hypothesis—namely perimeter(ℓ − 1, s) = 3 · (3/2)^{ℓ−1} · s. Now

perimeter(ℓ, s) = 3 · perimeter(ℓ − 1, s/2)      definition of the fractal
               = 3 · 3 · (3/2)^{ℓ−1} · (s/2)     inductive hypothesis
               = 3 · (3/2)^ℓ · s,                algebra

as desired.

5.18 Observe that the outer perimeter of a Sierpiński carpet with side length s is precisely 4s. What about the total interior perimeter? It has an interior perimeter of 4s/3—the perimeter of the central (s/3)-by-(s/3) square—plus 8 times the interior perimeter of a Sierpiński carpet of level ℓ − 1 with side length s/3. Thus the interior perimeter p(ℓ, s) satisfies

p(0, s) = 0    and    p(ℓ + 1, s) = 4s/3 + 8 · p(ℓ, s/3).

We claim that p(ℓ, s) = (4/5) · s · [(8/3)^ℓ − 1], which we'll prove by induction on ℓ. (You might be able to generate this conjecture by writing out the summation for a few levels, and then using Theorem 5.5.)
• For the base case, indeed p(0, s) = 0 by definition and (4/5) · s · [(8/3)^0 − 1] = 0, because (8/3)^0 − 1 = 1 − 1 = 0.
• For the inductive case, we assume the inductive hypothesis—namely that p(ℓ − 1, s) = (4/5) · s · [(8/3)^{ℓ−1} − 1] for any s—and we must analyze p(ℓ, s).

p(ℓ, s) = 4s/3 + 8 · p(ℓ − 1, s/3)                       definition of the fractal
        = 4s/3 + 8 · (4/5) · (s/3) · [(8/3)^{ℓ−1} − 1]   inductive hypothesis
        = (4/5) · s · (5/3 + (8/3) · [(8/3)^{ℓ−1} − 1])  algebra—making the expression look as much like the goal as possible
        = (4/5) · s · (5/3 + (8/3)^ℓ − 8/3)              algebra
        = (4/5) · s · [(8/3)^ℓ − 1].                     algebra

Thus the inductive claim is proven. Thus the total internal perimeter of the Sierpiński carpet of level ℓ and size s is (4s/5) · [(8/3)^ℓ − 1], and the outer perimeter is precisely 4s. Thus the total perimeter is 4s + (4s/5) · [(8/3)^ℓ − 1].

5.19 In the Von Koch snowflake of level ℓ, we start with a single s-sided equilateral triangle, and, for each level i = 1, . . . , ℓ, on each of 3 sides we "bump out" an additional 4^{i−1} triangles—each with side length s/3^i.
By trigonometry, an equilateral triangle with side length x has area √3 · x²/4. (Dropping a perpendicular line from a vertex to the opposite side shows that the height h of the triangle satisfies (x/2)² + h² = x², so h = √(3x²/4).)



For s = 1, then, the total area of the figure is

(area of an equilateral triangle with side length 1) + Σ_{i=1}^{ℓ} 3 · 4^{i−1} · (area of a (1/3^i)-sided equilateral triangle)
  = √3/4 + Σ_{i=1}^{ℓ} 3 · 4^{i−1} · √3 · (1/3^i)²/4       an equilateral triangle with side length x has area √3 · x²/4
  = (√3/4) · [1 + Σ_{i=1}^{ℓ} 3 · 4^{i−1} · (1/3^i)²]
  = (√3/4) · [1 + (1/3) · Σ_{i=1}^{ℓ} (4/9)^{i−1}]         3 · (1/3^i)² = 3 · (1/9^i) = (1/3) · (1/9^{i−1})
  = (√3/4) · [1 + (1/3) · Σ_{i=0}^{ℓ−1} (4/9)^i]           reindexing the summation
  = (√3/4) · [1 + (1/3) · (1 − (4/9)^ℓ)/(1 − 4/9)]         Theorem 5.5
  = (√3/4) · [1 + (3/5) · (1 − (4/9)^ℓ)]
  = (√3/4) · [8/5 − (3/5) · (4/9)^ℓ]
  = (2√3)/5 − (3√3/20) · (4/9)^ℓ.
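As a sanity check on the algebra above, the closed form can be compared against a direct summation of the triangle areas. This is a quick numerical sketch (the function names here are ours, not from the text):

```python
import math

def koch_area_direct(levels):
    """Area of the level-`levels` Von Koch snowflake with s = 1, summing
    the original triangle plus the bumped-out triangles level by level."""
    tri = lambda x: math.sqrt(3) * x**2 / 4  # area of an equilateral triangle with side x
    total = tri(1.0)
    for i in range(1, levels + 1):
        total += 3 * 4**(i - 1) * tri(1 / 3**i)
    return total

def koch_area_closed(levels):
    """The closed form derived above: 2*sqrt(3)/5 - (3*sqrt(3)/20) * (4/9)**levels."""
    return 2 * math.sqrt(3) / 5 - (3 * math.sqrt(3) / 20) * (4 / 9)**levels

# The two computations agree (to floating-point precision) at every level.
for ell in range(8):
    assert abs(koch_area_direct(ell) - koch_area_closed(ell)) < 1e-12
```

For ℓ = 0 both reduce to √3/4, the area of the original triangle, as the derivation requires.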

5.20 Write area(ℓ, s) to denote the area of a Sierpiński triangle with side length s and level ℓ. We can give a short inductive proof to show that area(ℓ, s) = (3/4)^ℓ · area(0, s). The base case (ℓ = 0) is immediate, because (3/4)^0 · area(0, s) = 1 · area(0, s) = area(0, s). For the inductive case (ℓ ≥ 1), we assume the inductive hypothesis—namely area(ℓ − 1, s) = (3/4)^{ℓ−1} · area(0, s) for any s. Therefore:

area(ℓ, s) = 3 · area(ℓ − 1, s/2)               definition of the fractal
           = 3 · (3/4)^{ℓ−1} · area(0, s/2)     inductive hypothesis
           = 3 · (3/4)^{ℓ−1} · area(0, s)/4     halving the side length of an equilateral triangle quarters the area
           = (3/4) · (3/4)^{ℓ−1} · area(0, s)   algebra
           = (3/4)^ℓ · area(0, s),              algebra

completing the inductive proof.

As in the previous exercise, an equilateral triangle with sides of length x has area √3 · x²/4, so area(0, s) = √3 · s²/4. Thus we have that the area of the Sierpiński triangle at level ℓ with size 1 is area(ℓ, 1) = (3/4)^ℓ · (√3/4).

5.21 Write area(ℓ, s) to denote the area of a Sierpiński carpet with side length s and level ℓ. Here's a short inductive proof that area(ℓ, s) = (8/9)^ℓ · s²: The base case (ℓ = 0) is immediate, because (8/9)^0 = 1, and the area of the s-by-s square (the Sierpiński carpet at level 0) is s². For the inductive case (ℓ ≥ 1), we assume the inductive hypothesis—namely area(ℓ − 1, s) = (8/9)^{ℓ−1} · s² for any s. Therefore:

area(ℓ, s) = 8 · area(ℓ − 1, s/3)       definition of the fractal
           = 8 · (8/9)^{ℓ−1} · (s/3)²   inductive hypothesis
           = (8/9)^ℓ · s²,              algebra

as desired.



5.22 The perimeter is infinite, because (4/3)^ℓ (see Exercise 5.16) grows without bound as ℓ increases.

5.23 The perimeter of the Sierpiński triangle at level ∞ is infinite, because 3 · (3/2)^ℓ (from Exercise 5.17) goes to infinity as ℓ grows.

5.24 The perimeter of the Sierpiński carpet at level ∞ is infinite, because 4s + (4s/5) · [(8/3)^ℓ − 1] (from Exercise 5.18) goes to infinity as ℓ grows.

5.25 The area of the infinite Von Koch snowflake converges:

area of the Von Koch snowflake = (√3/4) · [1 + (1/3) · Σ_{i=0}^{ℓ−1} (4/9)^i]   Exercise 5.19
                               = (√3/4) · [1 + (1/3) · 1/(1 − 4/9)]             Corollary 5.6, when ℓ is infinite
                               = (√3/4) · [1 + (1/3) · (9/5)]
                               = (2√3)/5.

5.26 The area of the Sierpiński triangle at level ∞ is zero, because (3/4)^ℓ · (√3/4) (from Exercise 5.20) tends to zero as ℓ grows.
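The perimeter recurrences behind Exercises 5.16–5.17 (and their divergence in Exercises 5.22–5.23) are easy to check exactly with rational arithmetic. A small sketch, with function names of our own choosing:

```python
from fractions import Fraction

def koch_length(level, s):
    """length(l, s) from Exercise 5.16: length(0, s) = s and
    length(l, s) = 4 * length(l - 1, s/3)."""
    return s if level == 0 else 4 * koch_length(level - 1, s / 3)

def sierpinski_perimeter(level, s):
    """perimeter(l, s) from Exercise 5.17: perimeter(0, s) = 3s and
    perimeter(l, s) = 3 * perimeter(l - 1, s/2)."""
    return 3 * s if level == 0 else 3 * sierpinski_perimeter(level - 1, s / 2)

one = Fraction(1)
for ell in range(7):
    # Closed forms proved above: (4/3)^l * s and 3 * (3/2)^l * s, with s = 1.
    assert koch_length(ell, one) == Fraction(4, 3) ** ell
    assert sierpinski_perimeter(ell, one) == 3 * Fraction(3, 2) ** ell
```

Both sequences grow geometrically with ratios 4/3 and 3/2, which is exactly why the level-∞ perimeters diverge.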

5.27 Like the Sierpiński triangle, the area of the Sierpiński carpet at level ∞ is zero, because (8/9)^ℓ (from Exercise 5.21) tends to zero as ℓ grows.

5.28 A solution in Python is shown in the second block of code in Figure S.5.2.

5.29 A solution in Python is shown in the last block of code in Figure S.5.2.

5.30 The total of the entries in an n-by-n magic square is

Σ_{i=1}^{n²} i = n²(n² + 1)/2,

by Theorem 5.3. There are n rows, so if their sums are equal then the sum of the entries in each must be (1/n)th of the total. Thus the row, column, and diagonal sums of an n-by-n magic square must all be n(n² + 1)/2. (To check this formula: for n = 3, that value is 3(3² + 1)/2 = 3 · 10/2 = 15. Indeed, each row, column, and diagonal of the example magic square given in the exercise sums to 15.)

5.31 Let P(k) denote the claim H_{2^k} ≥ k/2 + 1. We'll prove P(k) for all integers k ≥ 0 by induction on k:

base case (k = 0): We need to show that H_{2^0} = H_1 = 1 is at least 0/2 + 1 = 1—and, indeed, 1 ≥ 1.

inductive case (k ≥ 1): We assume the inductive hypothesis P(k − 1)—namely, H_{2^{k−1}} ≥ (k − 1)/2 + 1,



# Python solution using a graphics package in which:
# -- a Point is defined using a (x,y) coordinate pair.
# -- a Polygon is defined using the list of points that form its vertices.
# -- to draw a Polygon object p in a given window, we must call the p.draw(window) method.

def sierpinski_triangle(level, length, x, y):
    '''Draws a Sierpinski triangle with side length length, level level, and with
       bottom-left coordinate (x,y) in the graphics window called window.'''
    if level == 0:
        shape = Polygon(Point(x, y),
                        Point(x + length, y),
                        Point(x + length/2, y - int(length*math.sin(math.pi/3.0))))
        shape.draw(window)
        return 3 * length, 3**0.5 * length**2 / 4.0
    else:
        perimeter, area = 0, 0
        for (newX, newY) in [(x, y),
                             (x + length / 2, y),
                             (x + length / 4, y - length * 0.5 * math.sin(math.pi/3.0))]:
            p, a = sierpinski_triangle(level - 1, length / 2, newX, newY)
            perimeter += p
            area += a
        return perimeter, area

def sierpinski_carpet(level, length, x, y):
    '''Draws a Sierpinski carpet with side length length, level level, and with
       bottom-left coordinate (x,y) in the graphics window called window.'''
    if level == 0:
        square = Rectangle(Point(x, y), Point(x + length, y + length))
        square.setFill("blue")
        square.draw(window)
        return length * length
    else:
        area = 0
        for dx in range(3):
            for dy in range(3):
                if dx != 1 or dy != 1:
                    area += sierpinski_carpet(level - 1, length / 3,
                                              x + dx * (length / 3), y - dy * (length / 3))
        return area

Figure S.5.2 Drawing a Sierpi´nski triangle and Sierpi´nski carpet in Python.

and we must prove P(k):

H_{2^k} = Σ_{i=1}^{2^k} 1/i                                           definition of harmonic numbers
        = [Σ_{i=1}^{2^{k−1}} 1/i] + [Σ_{i=2^{k−1}+1}^{2^k} 1/i]       definition of summation
        = H_{2^{k−1}} + Σ_{i=2^{k−1}+1}^{2^k} 1/i                     definition of harmonic numbers
        ≥ (k − 1)/2 + 1 + Σ_{i=2^{k−1}+1}^{2^k} 1/i                   inductive hypothesis
        ≥ (k − 1)/2 + 1 + Σ_{i=2^{k−1}+1}^{2^k} 1/2^k                 every term in the summation is ≥ 1/2^k
        = (k − 1)/2 + 1 + 2^{k−1} · (1/2^k)                           there are 2^{k−1} terms in the summation
        = (k − 1)/2 + 1 + 1/2                                         algebra
        = k/2 + 1.                                                    algebra

Thus P(k) is true too.

5.32 Assuming Theorem 5.9, we can finish the proof by observing the following. Let k = ⌊log n⌋, so that 2^k ≤ n < 2^{k+1}—that is, k ≤ log n < k + 1. Therefore, because Ha > Hb if and only if a > b, we know that

H_{2^k} ≤ Hn < H_{2^{k+1}}.    (∗)

By Theorem 5.9 and (∗), then, we know that k/2 + 1 ≤ H_{2^k} ≤ Hn < H_{2^{k+1}} ≤ k + 2. Thus k/2 + 1 ≤ Hn < k + 2. By definition of k, we have that k ≤ log n < k + 1, which means that log n − 1 < k (and thus (log n − 1)/2 + 1 < k/2 + 1), and that log n ≥ k (and thus log n + 2 ≥ k + 2). Putting these facts together yields (log n − 1)/2 + 1 < k/2 + 1 ≤ Hn < k + 2 ≤ log n + 2.

5.33 Let x ≥ −1 be arbitrary. Let P(n) denote the claim that (1 + x)^n ≥ 1 + nx. We'll prove that P(n) holds for every integer n ≥ 1 by induction.

base case (n = 1): P(1) follows because (1 + x)^n and 1 + nx are both equal to 1 + x when n = 1.

inductive case (n ≥ 2): We assume the inductive hypothesis P(n − 1)—namely that (1 + x)^{n−1} ≥ 1 + (n − 1)x—and we must prove P(n).

(1 + x)^n = (1 + x) · (1 + x)^{n−1}
          ≥ (1 + x) · (1 + (n − 1)x)      inductive hypothesis
          = 1 + (n − 1)x + x + (n − 1)x²  algebra (multiplying out)
          = 1 + nx + (n − 1)x²            algebra (combining like terms)
          ≥ 1 + nx,

because both n − 1 ≥ 0 (because n ≥ 2) and x² ≥ 0 (because all squares are nonnegative; see Exercise 4.46), so (n − 1)x² ≥ 0. (Note that a ≤ b ⇒ ay ≤ by only when y ≥ 0, but because x ≥ −1 we know that 1 + x ≥ 0.)

5.34 For n = 3, we have 2^n = 8 and n! = 3 · 2 · 1 = 6. For n = 4, we have 2^n = 16 and n! = 4 · 3 · 2 · 1 = 24. So we'll prove that 2^n ≤ n! for all integers n ≥ 4.

For the base case (n = 4), we argued above that 2^n = 16 and n! = 24. And, indeed, 24 ≥ 16.

For the inductive case (n ≥ 5), we assume the inductive hypothesis 2^{n−1} ≤ (n − 1)! and we must prove 2^n ≤ n!. But

2^n = 2 · 2^{n−1}
    ≤ 2 · (n − 1)!     inductive hypothesis
    ≤ n · (n − 1)!     n ≥ 5, so definitely n ≥ 2
    = n!.
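The crossover claimed in Exercise 5.34 can be confirmed directly for small n; a brief check:

```python
from math import factorial

# 2**n <= n! fails for n = 1, 2, 3 and holds for every n from 4 onward
# (spot-checked here up to n = 29).
assert [n for n in range(1, 30) if 2**n <= factorial(n)] == list(range(4, 30))
```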

5.35 Fix a particular integer b ≥ 1. Unlike most of these exercises, probably the hardest part here is identifying the integer k after which the inequality b^n ≤ n! begins to hold. (Another way of stating it: unusually, the base case is probably harder to prove than the inductive case.) It turns out that k = 2b² will work. Let P(n) be the claim that b^n ≤ n!. Here's a proof that P(n) holds for all n ≥ 2b².



base case (n = 2b²): We must prove P(2b²)—that is, we must show that b^{2b²} ≤ (2b²)!.

(2b²)! = Π_{i=1}^{2b²} i          definition of factorial
       ≥ Π_{i=b²+1}^{2b²} i       dropping multiplicands i = 1, 2, . . . , b²
       ≥ Π_{i=b²+1}^{2b²} b²      for i = b² + 1, b² + 2, . . . , 2b², we have i ≥ b²
       = (b²)^{b²}                there are b² multiplicands, each equal to b²
       = b^{2b²}.                 Theorem 2.8.4: (b^x)^y = b^{xy}

Thus we have shown that (2b²)! ≥ b^{2b²}, which is just P(2b²).

inductive case (n ≥ 2b² + 1): We assume the inductive hypothesis P(n − 1)—that is, b^{n−1} ≤ (n − 1)!—and we must prove b^n ≤ n!. But this follows precisely as in Exercise 5.34:

b^n = b · b^{n−1}
    ≤ b · (n − 1)!     inductive hypothesis
    ≤ n · (n − 1)!     n ≥ 2b² + 1, so definitely n ≥ b
    = n!.
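Exercise 5.35's threshold k = 2b² is far from tight, but it's easy to confirm that the inequality does hold from that point on for small bases b:

```python
from math import factorial

# For each small base b, check b**n <= n! for every n from 2*b*b
# through 2*b*b + 20.
for b in range(1, 6):
    start = 2 * b * b
    assert all(b**n <= factorial(n) for n in range(start, start + 21))
```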

5.36 Note that n² ≥ 3n if and only if n² − 3n = n · (n − 3) ≥ 0. Thus n² = 3n for n ∈ {0, 3}. Let's choose k = 3. We'll prove that n² ≥ 3n for all n ≥ 3, by induction on n.

For the base case (n = 3), we have n² = 9 and 3n = 9. And, indeed, 9 ≥ 9.

For the inductive case (n ≥ 4), we assume the inductive hypothesis (n − 1)² ≥ 3(n − 1) and we must prove n² ≥ 3n:

n² = (n − 1)² + 2n − 1
   ≥ 3(n − 1) + 2n − 1     inductive hypothesis
   = 3n + 2n − 4.

Because n ≥ 4, we know 2n − 4 ≥ 0, so we have now shown that n² ≥ 3n.

5.37 Observe that n³ ≰ 2^n for n = 9, as 9³ = 729 and 2⁹ = 512. But 10³ = 1000 and 2¹⁰ = 1024. And once these functions cross, they do not cross back. Let's prove it: let's prove that n³ ≤ 2^n for all n ≥ 10 by induction on n.

For the base case (n = 10), we have n³ = 1000 and 2^n = 1024. And, indeed, 1000 ≤ 1024.

For the inductive case (n ≥ 11), we assume the inductive hypothesis (n − 1)³ ≤ 2^{n−1} and we must prove n³ ≤ 2^n. But

2^n = 2 · 2^{n−1}
    ≥ 2 · (n − 1)³     inductive hypothesis

    = 2 · (n³ − 3n² + 3n − 1)             algebra
    = n³ + (n³ − 6n² + 6n − 2)
    ≥ n³ + (11n² − 6n²) + (66 − 2)        n ≥ 11 implies both that n³ ≥ 11n² and 6n ≥ 66
    = n³ + 5n² + 64
    ≥ n³.                                 5n² ≥ 0 and 64 ≥ 0
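The crossover at n = 10 in Exercise 5.37 can be checked directly:

```python
# n**3 <= 2**n fails at n = 9 (729 > 512) and holds for all n >= 10
# (spot-checked here up to n = 100).
assert 9**3 > 2**9
assert all(n**3 <= 2**n for n in range(10, 101))
```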

5.38 Let P(n) be the stated property, that odd?(n) returns True if and only if n is odd. We’ll prove that P(n) holds for all integers n ≥ 0 by induction on n. For the base case (n = 0), observe that 0 is even and that by inspection of the algorithm, indeed odd?(0) returns False as desired.



For the inductive case (n ≥ 1), we assume the inductive hypothesis P(n − 1)—that is, odd?(n − 1) returns True if and only if n − 1 is odd. We must prove P(n).

odd?(n) returns True ⇔ odd?(n − 1) returns False     inspection of the algorithm
                     ⇔ ¬(n − 1 is odd)               inductive hypothesis
                     ⇔ n − 1 is even
                     ⇔ n is odd.

5.39 Consider the following property:

P(k) = for any integer n, sum(n, n + k) returns Σ_{i=n}^{n+k} i.

We'll prove that P(k) holds for all integers k ≥ 0, by induction on k. (An equivalent way of stating this proof is that we are proving sum(n, m) = Σ_{i=n}^{m} i for any n and m ≥ n by induction on the quantity m − n.)

For the base case, we must prove P(0). Let n be an arbitrary integer. Then

sum(n, n + 0) = n + 0      inspection of the algorithm
              = n
              = Σ_{i=n}^{n+0} i.

For the inductive case (k ≥ 1), we assume the inductive hypothesis P(k − 1), and we must prove P(k).

sum(n, n + k) = n + sum(n + 1, n + k)     inspection of the algorithm
              = n + Σ_{i=n+1}^{n+k} i     inductive hypothesis (applied on n′ = n + 1)
              = Σ_{i=n}^{n+k} i.
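The recursive sum analyzed in Exercise 5.39 translates directly into Python. This is a sketch of the "peel off the lower endpoint" version of the algorithm (the function name is ours):

```python
def rec_sum(n, m):
    """Return n + (n+1) + ... + m, assuming m >= n, by recursing on the
    lower endpoint; the recursion depth is m - n, matching the induction
    on m - n in the proof."""
    if m == n:
        return n
    return n + rec_sum(n + 1, m)

assert rec_sum(3, 3) == 3                       # base case: a one-term sum
assert rec_sum(1, 10) == sum(range(1, 11))      # 55
assert rec_sum(-2, 2) == 0
```

The variant of Exercise 5.40 would instead peel off the upper endpoint, returning `m + rec_sum(n, m - 1)`; as the proof notes, only the "key fact" about the summation changes.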

5.40 The proof of Exercise 5.39 would have to change almost not at all. The inductive hypothesis still applies (because we have (m − 1) − n = (m − n) − 1, just like m − (n + 1) = (m − n) − 1). The key fact about summations that we used in the previous proof was

Σ_{i=n}^{n+k} i = n + Σ_{i=n+1}^{n+k} i;

in the modified proof, we'd use the key fact that

Σ_{i=n}^{n+k} i = (n + k) + Σ_{i=n}^{n+k−1} i.

But no other changes would have to be made.

5.41 Let P(n) denote the proposition that 8^n − 3^n is divisible by 5. We will prove P(n) for all n ≥ 0 by induction on n.

For the base case (n = 0), we must prove that 8^0 − 3^0 is divisible by 5. Because 8^0 − 3^0 = 1 − 1 = 0, the property holds immediately because 0 is divisible by 5.

For the inductive case (n ≥ 1), we assume the inductive hypothesis P(n − 1), that 8^{n−1} − 3^{n−1} is evenly divisible by 5—say, 8^{n−1} − 3^{n−1} = 5k for some integer k. We must prove P(n), that 8^n − 3^n is evenly divisible by 5 too.

8^n − 3^n = 8 · (8^{n−1}) − 3 · (3^{n−1})
          = 8 · (8^{n−1}) − 8 · (3^{n−1}) + 8 · (3^{n−1}) − 3 · (3^{n−1})
          = 8 · (8^{n−1} − 3^{n−1}) + 5 · (3^{n−1})
          = 8 · (5k) + 5 · (3^{n−1})                inductive hypothesis
          = 5 · (8k + 3^{n−1}).



Because (8k + 3^{n−1}) is an integer, then, 8^n − 3^n is evenly divisible by 5, and the theorem follows.

5.42 By computing 9^n mod 10 for a few small values of n, a pattern begins to emerge:

9^0 mod 10 = 1 mod 10    = 1
9^1 mod 10 = 9 mod 10    = 9
9^2 mod 10 = 81 mod 10   = 1
9^3 mod 10 = 729 mod 10  = 9
9^4 mod 10 = 6561 mod 10 = 1

Thus we conjecture that the following property P(n) holds for any integer n ≥ 0:

9^n mod 10 = 1 if n is even, and 9 if n is odd.

Let's prove it by induction on n.

base case (n = 0): The number 0 is even, and indeed 9^0 mod 10 = 1. Thus P(0) holds.

inductive case (n ≥ 1): We assume the inductive hypothesis P(n − 1), and we must prove P(n). Note that, by definition of mod, the fact that x mod 10 = y means that there exists an integer k such that x = 10k + y. We'll use that fact in our proof:

9^n mod 10 = 9 · 9^{n−1} mod 10                                            definition of exponentiation
           = 9 · (10k + 1) mod 10 for some integer k, if n − 1 is even;    inductive hypothesis and the above discussion
             9 · (10k + 9) mod 10 for some integer k, if n − 1 is odd
           = (90k + 9) mod 10 for some integer k, if n is odd;             n − 1 is odd ⇔ n is even; multiplying out the product
             (90k + 81) mod 10 for some integer k, if n is even
           = 9 mod 10 if n is odd; 81 mod 10 if n is even                  (10k + b) mod 10 = b mod 10 for any integers k and b
           = 9 if n is odd; 1 if n is even.

Thus we have shown P(n), and the inductive case follows.
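Both this pattern and the one in the next exercise can be spot-checked with Python's three-argument pow, which computes modular exponentiation directly:

```python
# 9**n mod 10 alternates 1, 9, 1, 9, ... (Exercise 5.42), and
# 2**n mod 7 cycles with period three as 1, 2, 4 (Exercise 5.43).
assert all(pow(9, n, 10) == (1 if n % 2 == 0 else 9) for n in range(50))
assert all(pow(2, n, 7) == 2**(n % 3) for n in range(50))
```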

Thus we have shown P(n), and the inductive case follows. 5.43 By computing 2n mod 7 for a few small values of n, we see that the value cycles among three values: 1 (when n is a multiple of three), 2 (when n is one more than a multiple of three), and 4 (when n is two more than a multiple of three). Here’s a slightly glib way of phrasing this property: 2n mod 7 = 2n mod 3 . We’ll prove that this property holds for all n ≥ 0 by induction on n. For the base case (n = 0), we have 20 mod 7 = 1 mod 7 = 1 = 20 = 20 mod 3 , as required.



For the inductive case (n ≥ 1), we assume the inductive hypothesis 2^{n−1} mod 7 = 2^{(n−1) mod 3}, and we must prove 2^n mod 7 = 2^{n mod 3}. Here is the proof:

2^n mod 7 = (2 · 2^{n−1}) mod 7                                            definition of exponentiation
          = 2 · (7k + 1) mod 7 for some integer k, if (n − 1) mod 3 = 0;   inductive hypothesis and the discussion in Exercise 5.42
            2 · (7k + 2) mod 7 for some integer k, if (n − 1) mod 3 = 1;
            2 · (7k + 4) mod 7 for some integer k, if (n − 1) mod 3 = 2
          = 2 mod 7 if (n − 1) mod 3 = 0; 4 mod 7 if (n − 1) mod 3 = 1;    (7k + b) mod 7 = b mod 7 for any integers k and b
            8 mod 7 if (n − 1) mod 3 = 2
          = 2 if n mod 3 = 1; 4 if n mod 3 = 2; 1 if n mod 3 = 0           properties of mod
          = 2^{n mod 3}.

5.44 When we're counting from 00···0 to 11···1, how many times does the ith bit have to change? The least-significant bit changes every step. The second-least-significant bit changes every other step. The third-least-significant bit changes every fourth step. And so forth. In total, then, the number of bit flips is

Σ_{i=0}^{n−1} 2^n/2^i = Σ_{i=0}^{n−1} 2^{n−i} = Σ_{j=1}^{n} 2^j = Σ_{k=0}^{n−1} 2^{k+1} = 2 · Σ_{k=0}^{n−1} 2^k.

Thus, by Example 5.2, the total number of bit flips is 2 · (2^n − 1) = 2^{n+1} − 2. (For n = 3, as in the example in the question, this quantity is 2^4 − 2 = 16 − 2 = 14, so the formula checks out.) Thus the average number of bit flips per step is

(2^{n+1} − 2)/2^n = 2 − 1/2^{n−1},

just shy of two bit flips per step on average.

5.45 Let P(n) be the proposition that the regions defined by n fences can be labeled 0 (no dog) or 1 (dog) so that no pair of adjacent regions has the same label. We'll prove that P(n) holds for all n by induction on n.

For the base case (n = 0), there are no fences and the whole yard is a single region. Label it 1. Every pair of adjacent regions has different labels vacuously, because there are no pairs of adjacent regions. Thus P(0) is true.

For the inductive case (n ≥ 1), we assume the inductive hypothesis P(n − 1): it's possible to label the regions defined by any n − 1 fences so that no adjacent pair of regions has the same label. We must prove P(n). Consider an arbitrary configuration of n fences, numbered #1, #2, . . . , #(n − 1), #n. Ignoring fence #n, by the inductive hypothesis we can label the polygonal regions defined by only the first n − 1 fences—fences #1, #2, . . . , #(n − 1)—so that no adjacent pair of these regions has the same label. Now place the nth fence down, and invert the label for every region on one side of fence #n (0 becomes 1; 1 becomes 0). We claim that adjacent regions in this configuration must have different labels:
• If two regions are adjacent because they share a fence other than fence #n, then they have different labels because, by the inductive hypothesis, our labels for the first n − 1 fences had no adjacent regions with the same label.
• If two regions are adjacent because of fence #n, then they have different labels because they had the same label before the inversion (they were actually the same region before fence #n was added), so after the inversion they have different labels.
Therefore, P(n − 1) implies P(n), and the entire claim follows.

5.3 Proofs by Mathematical Induction

5.46 We proceed by strong induction on n.



Base cases (n = 0 and n = 1): By inspection, on inputs n = 0 and n = 1 the algorithm returns in Line 2 with only the one call to parity (that is, without making a recursive call). Thus the depth of the recursion is 1, and indeed 1 + ⌊0/2⌋ = 1 + ⌊1/2⌋ = 1 + 0 = 1.

Inductive case (n ≥ 2): We assume the inductive hypothesis—for any integer 0 ≤ k < n, the recursion depth of parity(k) is 1 + ⌊k/2⌋—and we must prove that the recursion depth of parity(n) is 1 + ⌊n/2⌋.

[recursion depth of parity(n)] = 1 + [recursion depth of parity(n − 2)]   by inspection (because n ≥ 2 and by Line 4)
                               = 1 + 1 + ⌊(n − 2)/2⌋                      by the inductive hypothesis
                               = 1 + ⌊1 + (n − 2)/2⌋                      adding an integer inside or outside the floor is equivalent
                               = 1 + ⌊n/2⌋,

as desired.

5.47 We proceed by strong induction on n.

Base cases (n = 0 and n = 1): By inspection, on inputs n = 0 and n = 1 the algorithm returns ⟨b0⟩ = ⟨n⟩ in Line 2. Indeed, exactly as desired, we have

Σ_{i=0}^{0} b_i · 2^i = b0 · 2^0 = b0 = n.

Inductive case (n ≥ 2): We assume the inductive hypothesis—namely that, for any integer 0 ≤ m < n, the value returned by toBinary(m) is a tuple ⟨aℓ, . . . , a0⟩ satisfying Σ_{i=0}^{ℓ} a_i · 2^i = m. We must show that toBinary(n) returns ⟨bk, . . . , b0⟩ such that Σ_{i=0}^{k} b_i · 2^i = n.

Let ⟨bk, b_{k−1}, . . . , b1, b0⟩ be the value returned by toBinary(n) in Line 6. By inspection of the algorithm, we see that ⟨bk, b_{k−1}, . . . , b1⟩ = ⟨a_{k−1}, . . . , a0⟩, where ⟨a_{k−1}, . . . , a0⟩ is the value returned by toBinary(⌊n/2⌋), and b0 = parity(n). Thus:

Σ_{i=0}^{k} b_i · 2^i = [Σ_{i=1}^{k} a_{i−1} · 2^i] + parity(n) · 2^0       inspection of the algorithm/definition of the b_i and a_i values
                      = 2 · [Σ_{i=0}^{k−1} a_i · 2^i] + parity(n) · 2^0     reindexing and pulling out a factor of 2
                      = 2 · ⌊n/2⌋ + parity(n) · 2^0                         inductive hypothesis (which applies because 0 ≤ ⌊n/2⌋ < n)
                      = 2 · (n/2) + 0 if n is even; 2 · ((n − 1)/2) + 1 if n is odd     definition of parity and floor
                      = n.

Thus we have shown that Σ_{i=0}^{k} b_i · 2^i = n, where toBinary(n) returns ⟨bk, . . . , b0⟩, precisely as required.

Pk−1 i=0

ai 2i =

Pk−1 i=0

bi 2i with a, b ∈ {0, 1}k , then a = b.

We’ll prove P(k) for all k ≥ 1, by induction on k. base case (k = 1): There are only two 1-bit bitstrings, ⟨0⟩ and ⟨1⟩. If a0 · 20 = b0 · 20 , then necessarily a0 = b0 . inductive case (k ≥ 1): We assume the inductive hypothesis P(k − 1), and we must prove P(k). Suppose that we have bitstrings a ∈ {0, 1}k and b ∈ {0, 1}k such that k−1 X

ai 2i =

i=0

Write n to denote the common value of these summations.

k−1 X i=0

bi 2i .

5.3 Proofs by Mathematical Induction

81

First, we argue that ak−1 = bk−1 (that is, the most-significant bits match). Suppose not—namely suppose (without loss of generality) that ak−1 = 1 but bk−1 = 0. But then k−1 X

and

ai 2i = 1 · 2k−1 +

i=0

i=0

k−1 X

k−2 X

bi 2i = 0 · 2k−1 +

i=0

by Example 5.2. But then Now observe that

Pk−1 i=0

k−2 X

k−1 X

bi 2i ≤

i=0

ai 2 ≥ 2 i

ai 2i ≥ 1 · 2k−1 = 2k−1

k−1

ai 2i =

and

Pk−1 i=0

k−1 X

bi 2 ≤ 2 i

k−2 X

2i

= 2k−1 − 1

i=0 k−1

− 1, so these two summations cannot be equal.

bi 2i

by assumption

−ak−1 2k−1 = −bk−1 2k−1

by the above argument that ak−1 = bk−1

i=0

i=0

and therefore k−2 X i=0

ai 2i =

k−2 X

bi 2i .

by subtraction

i=0

By the inductive hypothesis P(k − 1), then, we know that ⟨a_{k−2}, a_{k−3}, . . . , a0⟩ = ⟨b_{k−2}, b_{k−3}, . . . , b0⟩. Because we've already shown that a_{k−1} = b_{k−1}, we therefore know that a = b, and we've established P(k).

5.49 Here's the algorithm:

remainder(n, k):   // assume n ≥ 0 and k ≥ 1 are integers.
1  if n ≤ k − 1 then
2    return n
3  else
4    return remainder(n − k, k)

Let k ≥ 1 be an arbitrary integer. For an integer n, let Pk(n) denote the property that remainder(n, k) = n mod k. We claim that Pk(n) holds for any integer n ≥ 0. We proceed by strong induction on n:

Base cases (n ∈ {0, 1, . . . , k − 1}): By inspection, on input n ≤ k − 1, the algorithm returns n in Line 2. Indeed, by definition of mod we have that n mod k = n for any such n. Thus Pk(0), Pk(1), . . . , Pk(k − 1) all hold.

Inductive case (n ≥ k): We assume the inductive hypothesis Pk(0) ∧ Pk(1) ∧ · · · ∧ Pk(n − 1), and we must prove Pk(n).

remainder(n, k) = remainder(n − k, k)    by inspection (specifically because n ≥ k and by Line 4)
                = (n − k) mod k          by the inductive hypothesis Pk(n − k), specifically because n ≥ k ⇒ n > n − k ≥ 0
                = n mod k.               (n − k) mod k = n mod k by Definition 2.11

5.50 Here is the algorithm:

baseConvert(n, k):   // assume n ≥ 0 and k ≥ 2 are integers.
1  if n ≤ k − 1 then
2    return ⟨n⟩
3  else
4    ⟨bk, . . . , b0⟩ := baseConvert(⌊n/k⌋, k)
5    x := remainder(n, k)
6    return ⟨bk, . . . , b0, x⟩

We must prove that baseConvert(n, k) = ⟨bℓ, b_{ℓ−1}, . . . , b0⟩ implies n = Σ_{i=0}^{ℓ} b_i · k^i. We'll use strong induction on n:



Base cases (n ∈ {0, 1, . . . , k − 1}): By inspection, on input n ∈ {0, 1, . . . , k − 1} the algorithm returns ⟨b0⟩ = ⟨n⟩ in Line 2. Indeed, exactly as desired, we have

Σ_{i=0}^{0} b_i · k^i = b0 · k^0 = b0 = n.

Inductive case (n ≥ k): We assume the inductive hypothesis—for any integer 0 ≤ m < n, the value ⟨aℓ, . . . , a0⟩ returned by baseConvert(m, k) satisfies Σ_{i=0}^{ℓ} a_i · k^i = m. We must show that baseConvert(n, k) returns ⟨bt, . . . , b0⟩ such that Σ_{i=0}^{t} b_i · k^i = n.

Let ⟨bt, b_{t−1}, . . . , b1, b0⟩ be the value returned by baseConvert(n, k) in Line 6. By inspection, ⟨bt, b_{t−1}, . . . , b1⟩ = ⟨a_{t−1}, . . . , a0⟩, where ⟨a_{t−1}, . . . , a0⟩ is the value returned by baseConvert(⌊n/k⌋, k), and b0 = remainder(n, k). Thus:

Σ_{i=0}^{t} b_i · k^i = [Σ_{i=1}^{t} a_{i−1} · k^i] + remainder(n, k) · k^0   inspection of the algorithm/definition of the b_i and a_i values
                      = k · [Σ_{i=0}^{t−1} a_i · k^i] + remainder(n, k) · k^0  reindexing and pulling out a factor of k
                      = k · ⌊n/k⌋ + remainder(n, k) · k^0                      inductive hypothesis (which applies because 0 ≤ ⌊n/k⌋ < n)
                      = k · ⌊n/k⌋ + n mod k                                    Exercise 5.49
                      = n.                                                     by the definition of mod, Definition 2.11

Thus we have shown that Σ_{i=0}^{t} b_i · k^i = n, where baseConvert(n, k) returns ⟨bt, . . . , b0⟩, precisely as required.

5.51 A solution in Python is shown in Figure S.5.3.

5.52 Let P(n) denote the claim "the first player can force a win if and only if n mod 3 ≠ 0." We'll prove that P(n) holds for any integer n ≥ 1, by strong induction on n. There are two base cases: if n = 1 or n = 2, then the first player immediately wins by taking n cards (and indeed 1 mod 3 ≠ 0 and 2 mod 3 ≠ 0). For the inductive case, let n ≥ 3 be arbitrary. We'll show, assuming the inductive hypotheses P(n − 1) and P(n − 2), that P(n) holds as well. The key observation is the following:
• if n is divisible by three, then neither n − 1 nor n − 2 is.
• if n is not divisible by three, then either n − 1 or n − 2 is.

1 2 3 4 5 6 7 8

def remainder(n, k):
    ''' Compute n mod k, given n>=0 and k>=1. '''
    if n <= k - 1:
        return n
    else:
        return remainder(n - k, k)

7.105 Consider a composite number n > 2 with a divisor d satisfying 2 ≤ d < n—that is, d ∈ Zn and d ≠ 0. Then GCD(d, n) ≥ 2, so d and n are not relatively prime. By Theorem 7.24, d^{−1} does not exist in Zn.

7.106 For p = 2, we have 2^{p+1} mod p = 2^3 mod 2 = 8 mod 2 = 0. Otherwise, by Fermat's Little Theorem, when p ≠ 2 (and thus 2 is not ≡_p 0), we have

2^{p+1} mod p = 4 · 2^{p−1} mod p = 4 mod p.

For any p ≥ 5, we have 4 mod p = 4. For p = 3, we have 4 mod 3 = 1. Thus 2^{p+1} mod p = 0 if p = 2; 1 if p = 3; and 4 otherwise.

7.107 249 is not prime. If n is prime, then Fermat's Little Theorem establishes that a^{n−1} mod n = 1 for any a ≠ 0 in Zn. For a = 247, then, if n were prime, we'd have 247^{n−1} mod n = 1. But 247^{n−1} mod n ≠ 1 for n = 249, so 249 is not prime. (Though you can't conclude anything about the primality of 247 from the fact stated in the exercise, it turns out that 247 is not prime: 247 = 13 · 19.)

7.108 Let x = Σ_{i=1}^{k} a_i · N_i · d_i, where N = Π_{i=1}^{k} n_i and N_i = N/n_i and d_i is the multiplicative inverse of N_i in Z_{n_i}. We must show that x mod n_j = a_j for all 1 ≤ j ≤ k:

x mod n_j = [Σ_{i=1}^{k} a_i · N_i · d_i] mod n_j              definition of x
          = [Σ_{i=1}^{k} a_i · (Π_{ℓ≠i} n_ℓ) · d_i] mod n_j    definition of N_i
          = a_j · (Π_{ℓ≠j} n_ℓ) · d_j mod n_j                  all other terms are multiples of n_j


Number Theory

          = a_j · (N_j · d_j mod n_j) mod n_j                  definition of N_j, and (7.4.3)
          = a_j · 1 mod n_j                                    d_j was chosen as the multiplicative inverse of N_j mod n_j
          = a_j.

7.109 Observe that for any prime number n, we must have that φ(n) = n − 1 (there is no positive integer < n that shares a divisor other than 1 with a prime number). Thus the Fermat–Euler theorem says that 1 = a^φ(n) mod n = a^(n−1) mod n. That's just Fermat's Little Theorem.

7.110 The argument is direct: observe that a · a^(φ(n)−1) = a^φ(n), and a^φ(n) ≡n 1 by the Fermat–Euler Theorem. The integers less than 60 that are relatively prime to it are {1, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 49, 53, 59}, so φ(60) = 16. Note (either by calculator or using mod-exp from Figure 7.6) that 7^15 mod 60 = 43 and 17^15 mod 60 = 53 and 31^15 mod 60 = 31. So the claim is that 7^−1 = 43 and 17^−1 = 53 and 31^−1 = 31. Indeed 7 · 43 = 301 ≡60 1 and 17 · 53 = 901 ≡60 1 and 31 · 31 = 961 ≡60 1.

7.111 Here's an implementation in Python. (See Exercise 7.25 for modexp and Exercise 7.31 for euclid.)

def totient(n):
    count = 0
    for k in range(1, n):
        count += (euclid(n, k) == 1)
    return count

def inverse(a, n):
    return modexp(a, totient(n) - 1, n)
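The numbers in 7.110 are quick to double-check with Python's built-ins (this uses math.gcd and three-argument pow rather than the text's euclid and modexp):

```python
from math import gcd

# φ(60) by counting the integers in {1, ..., 59} relatively prime to 60:
phi = sum(1 for k in range(1, 60) if gcd(60, k) == 1)
assert phi == 16

# a^(φ(60)-1) mod 60 is the multiplicative inverse of a, for a relatively prime to 60:
for a in [7, 17, 31]:
    inv = pow(a, phi - 1, 60)
    assert (a * inv) % 60 == 1
assert pow(7, 15, 60) == 43 and pow(17, 15, 60) == 53 and pow(31, 15, 60) == 31
```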

7.112 Again in Python:

def verify_carmichael(n):
    divisors = [d for d in range(2, n) if n % d == 0]
    success = True
    for a in range(1, n):
        if euclid(a, n) == 1 and modexp(a, n-1, n) != 1:
            success = False
    if len(divisors) > 0 and success:
        print("Carmichael!", n, "is not prime -- divisors are", divisors,
              "-- but it passes the test.")

# verify_carmichael(561)
# --> Carmichael! 561 is not prime -- divisors are [3, 11, 17, 33, 51, 187] -- but it passes the test.

7.113 If n is composite, then for some p ∈ {2, . . . , n − 1} we have p | n. Observe that GCD(p^(n−1), n) ≠ 1 (p is a common divisor), and thus p^(n−1) mod n is also divisible by p—and thus is not equal to 1.

7.114 A Python solution (which depends on the Sieve of Eratosthenes; see Figure S.7.2):

def korselt(n):
    carmichaels = []
    primes = sieve_of_eratosthenes(n)
    for candidate in range(1, n):
        divisors = [d for d in range(2, candidate) if candidate % d == 0]
        squarefree = not any([candidate % (d * d) == 0 for d in divisors])
        korselt_test = all([(candidate - 1) % (p - 1) == 0 for p in divisors if p in primes])
        if len(divisors) > 1 and squarefree and korselt_test:
            carmichaels.append(candidate)
    return carmichaels

# korselt(10000) --> [561, 1105, 1729, 2465, 2821, 6601, 8911]

7.115 Suppose that n is an even Carmichael number. Note that n > 3 by Korselt's theorem (so that n is composite). Then, by definition 2 | n—and 4 ∤ n because, by Korselt's theorem, n must be squarefree. Let p ≥ 3 be the next-smallest prime factor of n. (There must be one, as n > 3 and 4 ∤ n.) By Korselt's theorem, p − 1 | n − 1—but p − 1 is even, and n − 1 is odd! That's a contradiction.

7.116 Here is a solution in Python:

def miller_rabin(n, k=16):
    '''Test whether n is prime, using k trials of the Miller-Rabin test.
       The probability of a false positive is < 2^-k.'''
    r = 0
    while (n - 1) % (2 * 2**r) == 0:
        r += 1
    d = (n - 1) // 2**r
    for i in range(k):
        a = random.randint(1, n-1)
        sigma = [a**d % n]
        while len(sigma)

7.5 Cryptography

b by simply performing exponentiation and comparing x^e to c. Thus, in each step, we identify the middle of the range of candidate values of x, and compare x and b. In log n steps, we've found b. In RSA, on the other hand, we are given a number c and a number e and a number n, and we must find the value b such that b^e mod n = c. The issue is that, given a candidate x, even knowing that x^e mod n < b^e mod n, we don't know that x < b. We don't know how many times x^e and b^e "wrap around" in the modular computation. (For example, 2^5 mod 11 is larger than 3^5 mod 11—but 2 < 3, so we're in trouble!)

7.142 See keygen in Figure S.7.4.

7.143 See encrypt and decrypt in Figure S.7.4.

7.144 See str2list and list2str (which rely on str2int, int2str, and baseConvert) in Figure S.7.4.

7.145 Here's Python code testing these implementations (and also testing the implementation in Exercise 7.147):


def keygen(p, q):
    '''Generate an RSA keypair given two primes p, q.'''
    n = p * q
    e = 3
    while euclid(e, (p-1) * (q-1)) != 1:
        e += 2
    d = inverse(e, (p-1) * (q-1))
    return ((e, n), (d, n))

def encrypt(plaintext, pubkey):
    '''Encrypt an integer using the RSA public key pubkey.'''
    e, n = pubkey[0], pubkey[1]
    return modexp(plaintext, e, n)

def decrypt(ciphertext, privkey):
    '''Decrypt an encoded message using the RSA private key privkey.'''
    d, n = privkey[0], privkey[1]
    return modexp(ciphertext, d, n)

def baseConvert(n, b):
    '''Convert an integer n into a base-b integer.'''
    d = []
    c = 1
    while n > 0:
        dI = (n // c) % b
        n = n - dI * c
        c = c * b
        d.append(dI)
    d.reverse()
    return d

def str2int(s):
    '''Convert an ASCII string, viewed as a base-256 number, to an integer.'''
    n = 0
    for i in range(len(s)):
        n = ord(s[i]) + 256*n
    return n

def int2str(n):
    '''Convert an integer to an ASCII string, viewed as a base-256 number.'''
    s = []
    while n > 0:
        s.append(chr(n % 256))
        n = n // 256
    s.reverse()
    return "".join(s)

def str2list(s, blockSize):
    '''Convert an ASCII string, viewed as a base-256 number, to a
       base-blockSize list of integers.'''
    return baseConvert(str2int(s), blockSize)

def list2str(L, blockSize):
    '''Convert a base-blockSize list of integers to an ASCII string,
       viewed as a base-256 number.'''
    x = 0
    for n in L:
        x = n + blockSize*x
    return int2str(x)

Figure S.7.4 An implementation of RSA in Python.
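A tiny round trip with the same structure as Figure S.7.4, but with small illustrative primes of our own choosing and Python's built-in gcd and modular arithmetic in place of euclid, inverse, and modexp (a sketch, not the text's figure):

```python
from math import gcd

p, q = 61, 53
n = p * q                           # 3233
e = 3
while gcd(e, (p - 1) * (q - 1)) != 1:
    e += 2                          # lands on e = 7 for these primes
d = pow(e, -1, (p - 1) * (q - 1))   # multiplicative inverse of e (Python 3.8+)

m = 42
c = pow(m, e, n)                    # encrypt
assert pow(c, d, n) == m            # decrypt recovers the plaintext
```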


p = 5277019477592911
q = 7502904222052693
pub, priv = keygen(p, q)
d, n = priv[0], priv[1]

plaintext = "THE SECRET OF BEING BORING IS TO SAY EVERYTHING."
ciphertext = [encrypt(m, pub) for m in str2list(plaintext, n)]
decoded = [decrypt(m, priv) for m in ciphertext]
print(decoded)
print(list2str(decoded, n))

decoded = [decryptCRT(m, d, p, q) for m in ciphertext]
print(decoded)
print(list2str(decoded, n))

7.146 Here’s Python code, using the Miller–Rabin test (see Exercise 7.116): 1 2 3 4 5 6 7 8 9

def find_prime(lo, hi): '''Repeatedly select a random odd number not divisible by 5 in [lo, hi] and return it if the Miller-Rabin test claims it's prime.''' while True: n = random.randint(lo, hi) if n % 10 in [0, 2, 4, 5, 6, 8]: # n is definitely not prime continue if miller_rabin(n): return n

7.147 Here’s the alternative implementation of encrypt: 1 2 3 4 5 6 7

def decryptCRT(ciphertext, d, p, q): '''Decrypt an encoded message using the RSA private key (d, p*q).''' cP = modexp(ciphertext, d, p) cQ = modexp(ciphertext, d, q) x, y, r = extended_euclidean_algorithm(p, q) c = (cP*y*q + cQ*x*p) % (p*q) return c

7.148 Note that

c^d mod p = [c^(d mod (p−1)) · c^(k(p−1))] mod p
          = [[c^(d mod (p−1)) mod p] · [(c^k)^(p−1) mod p]] mod p
          = [[c^(d mod (p−1)) mod p] · [1]] mod p

by Fermat's Little Theorem, unless c^k ≡p 0. But if c^k ≡p 0 then c ≡p 0 by Exercise 7.48, in which case c^d mod p = 0 = c^(d mod (p−1)) mod p.
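The identity that 7.148 establishes—exponents can be reduced mod p − 1 when working modulo a prime p—is easy to spot-check numerically (the specific numbers here are our own, not from the text):

```python
p = 101  # a prime
for c in [0, 1, 2, 57, 100]:
    for d in [3, 7, 103, 9999]:  # each has d mod (p-1) != 0
        # c^d ≡ c^(d mod (p-1)) (mod p), covering c ≡ 0 as well
        assert pow(c, d, p) == pow(c, d % (p - 1), p)
```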

8 Relations

8.2 Formal Introduction

8.1 The relation divides on {1, 2, . . . , 8} is

{⟨1, 1⟩, ⟨1, 2⟩, ⟨1, 3⟩, ⟨1, 4⟩, ⟨1, 5⟩, ⟨1, 6⟩, ⟨1, 7⟩, ⟨1, 8⟩, ⟨2, 2⟩, ⟨2, 4⟩, ⟨2, 6⟩, ⟨2, 8⟩,
 ⟨3, 3⟩, ⟨3, 6⟩, ⟨4, 4⟩, ⟨4, 8⟩, ⟨5, 5⟩, ⟨6, 6⟩, ⟨7, 7⟩, ⟨8, 8⟩}.

8.2 The subset relation on P({1, 2, 3}) is

{⟨{}, {1}⟩, ⟨{}, {2}⟩, ⟨{}, {3}⟩, ⟨{}, {1, 2}⟩, ⟨{}, {1, 3}⟩, ⟨{}, {2, 3}⟩, ⟨{}, {1, 2, 3}⟩,
 ⟨{1}, {1, 2}⟩, ⟨{1}, {1, 3}⟩, ⟨{1}, {1, 2, 3}⟩, ⟨{2}, {1, 2}⟩, ⟨{2}, {2, 3}⟩, ⟨{2}, {1, 2, 3}⟩,
 ⟨{3}, {1, 3}⟩, ⟨{3}, {2, 3}⟩, ⟨{3}, {1, 2, 3}⟩, ⟨{1, 2}, {1, 2, 3}⟩, ⟨{1, 3}, {1, 2, 3}⟩, ⟨{2, 3}, {1, 2, 3}⟩}.

8.3 The isProperPrefix relation on bitstrings of length ≤ 3 is

{⟨ϵ, 0⟩, ⟨ϵ, 1⟩, ⟨ϵ, 00⟩, ⟨ϵ, 01⟩, ⟨ϵ, 10⟩, ⟨ϵ, 11⟩, ⟨ϵ, 000⟩, ⟨ϵ, 001⟩,
 ⟨ϵ, 010⟩, ⟨ϵ, 011⟩, ⟨ϵ, 100⟩, ⟨ϵ, 101⟩, ⟨ϵ, 110⟩, ⟨ϵ, 111⟩,
 ⟨0, 00⟩, ⟨0, 01⟩, ⟨0, 000⟩, ⟨0, 001⟩, ⟨0, 010⟩, ⟨0, 011⟩,
 ⟨1, 10⟩, ⟨1, 11⟩, ⟨1, 100⟩, ⟨1, 101⟩, ⟨1, 110⟩, ⟨1, 111⟩,
 ⟨00, 000⟩, ⟨00, 001⟩, ⟨01, 010⟩, ⟨01, 011⟩, ⟨10, 100⟩, ⟨10, 101⟩,
 ⟨11, 110⟩, ⟨11, 111⟩}.

8.4 On bitstrings of length ≤ 3, isProperSubstring is

{⟨ϵ, 0⟩, ⟨ϵ, 1⟩, ⟨ϵ, 00⟩, ⟨ϵ, 01⟩, ⟨ϵ, 10⟩, ⟨ϵ, 11⟩,
 ⟨ϵ, 000⟩, ⟨ϵ, 001⟩, ⟨ϵ, 010⟩, ⟨ϵ, 011⟩, ⟨ϵ, 100⟩, ⟨ϵ, 101⟩, ⟨ϵ, 110⟩, ⟨ϵ, 111⟩,
 ⟨0, 00⟩, ⟨0, 01⟩, ⟨0, 10⟩, ⟨0, 000⟩, ⟨0, 001⟩, ⟨0, 010⟩, ⟨0, 011⟩, ⟨0, 100⟩, ⟨0, 101⟩, ⟨0, 110⟩,
 ⟨1, 01⟩, ⟨1, 10⟩, ⟨1, 11⟩, ⟨1, 001⟩, ⟨1, 010⟩, ⟨1, 011⟩, ⟨1, 100⟩, ⟨1, 101⟩, ⟨1, 110⟩, ⟨1, 111⟩,
 ⟨00, 000⟩, ⟨00, 001⟩, ⟨00, 100⟩,
 ⟨01, 001⟩, ⟨01, 010⟩, ⟨01, 011⟩, ⟨01, 101⟩,
 ⟨10, 010⟩, ⟨10, 100⟩, ⟨10, 101⟩, ⟨10, 110⟩,
 ⟨11, 011⟩, ⟨11, 110⟩, ⟨11, 111⟩}.

8.5 isProperSubsequence consists of all pairs ⟨x, y⟩ where x is a proper substring of y, plus any pairs for which the elements of x appear nonconsecutively in y. For any y ≠ ϵ, we have that ⟨ϵ, y⟩ ∈ isProperSubstring. For x ∈ {0, 1}, we have that ⟨x, y⟩ ∈ isProperSubstring ⇔ ⟨x, y⟩ ∈ isProperSubsequence (there's only one symbol in x, so it can't be "split up"). No string of length 3 is a proper substring or subsequence of a string of length ≤ 3. So the only interesting strings are 00, 01, 10, and 11. If 01 appears at all in y, it appears consecutively. (Think about why!) Similarly for 10. Thus isProperSubsequence = isProperSubstring ∪ {⟨00, 010⟩, ⟨11, 101⟩}.

8.6 isAnagram includes all pairs ⟨x, x⟩ for all bitstrings x ∈ {ϵ, 0, 1, 00, 01, 10, 11, 000, 001, 010, 011, 100, 101, 110, 111} plus all of the following:

{⟨01, 10⟩, ⟨10, 01⟩,
 ⟨001, 010⟩, ⟨001, 100⟩, ⟨010, 001⟩, ⟨010, 100⟩, ⟨100, 001⟩, ⟨100, 010⟩,
 ⟨110, 101⟩, ⟨110, 011⟩, ⟨101, 110⟩, ⟨101, 011⟩, ⟨011, 110⟩, ⟨011, 101⟩}.


8.7 ⊆

8.8 =

8.9 ∅

8.10 ⊂

8.11 The pair ⟨A, B⟩ ∈ ∼⊂ if ⟨A, B⟩ ∉ ⊂—that is, if

¬(A ⊂ B) ⇔ ¬(∀x : x ∈ A ⇒ x ∈ B)
         ⇔ ∃x : ¬(x ∈ A ⇒ x ∈ B)
         ⇔ ∃x : (x ∈ A ∧ x ∉ B)
         ⇔ A − B ≠ ∅.

8.12 R−1 = {⟨2, 2⟩, ⟨1, 5⟩, ⟨3, 2⟩, ⟨2, 5⟩, ⟨1, 2⟩}

8.13 S−1 = {⟨4, 3⟩, ⟨3, 5⟩, ⟨6, 6⟩, ⟨4, 1⟩, ⟨3, 4⟩}

8.14 R ◦ R = {⟨2, 1⟩, ⟨2, 2⟩, ⟨2, 3⟩, ⟨5, 1⟩, ⟨5, 2⟩, ⟨5, 3⟩}

8.15 R ◦ S = ∅

8.16 S ◦ R = {⟨2, 4⟩, ⟨5, 4⟩}

8.17 R ◦ S−1 = {⟨3, 1⟩, ⟨3, 2⟩}

8.18 S ◦ R−1 = {⟨1, 3⟩, ⟨2, 3⟩}

8.19 S−1 ◦ S = {⟨4, 4⟩, ⟨4, 5⟩, ⟨5, 4⟩, ⟨5, 5⟩, ⟨1, 1⟩, ⟨1, 3⟩, ⟨3, 1⟩, ⟨3, 3⟩, ⟨6, 6⟩}

8.20 The relation is shown in Figure S.8.1a.

8.21 R ◦ R−1 denotes the set of pairs of ingredients that are used together in at least one sauce. The relation is shown in Figure S.8.1b.

8.22 R−1 ◦ R denotes the set of pairs of sauces that share at least one ingredient. Because all five sauces include butter, all pairs are in the relation, so R−1 ◦ R equals

{Béchamel, Espagnole, Hollandaise, Velouté, Tomate} × {Béchamel, Espagnole, Hollandaise, Velouté, Tomate}.

8.23 at ◦ taking

8.24 at ◦ taughtIn−1

8.25 taking−1 ◦ taking

8.26 taking−1 ◦ at−1 ◦ at ◦ taking

8.27 parent ◦ parent = grandparent

8.28 (parent−1) ◦ (parent−1) = grandchild
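Solutions 8.14–8.16 can also be verified mechanically. A sketch (with R and S recovered by inverting the answers to 8.12–8.13, and the text's convention that ⟨a, c⟩ ∈ R ◦ S iff there is a b with ⟨a, b⟩ ∈ S and ⟨b, c⟩ ∈ R):

```python
R = {(2, 2), (5, 1), (2, 3), (5, 2), (2, 1)}  # inverse of the pairs in 8.12
S = {(3, 4), (5, 3), (6, 6), (1, 4), (4, 3)}  # inverse of the pairs in 8.13

def compose(R, S):
    # apply S first, then R
    return {(a, c) for (a, b) in S for (b2, c) in R if b == b2}

assert compose(R, R) == {(2, 1), (2, 2), (2, 3), (5, 1), (5, 2), (5, 3)}  # 8.14
assert compose(R, S) == set()                                             # 8.15
assert compose(S, R) == {(2, 4), (5, 4)}                                  # 8.16
```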

(a)
sauce         ingredient
Béchamel      milk
Béchamel      butter
Béchamel      flour
Espagnole     stock
Espagnole     butter
Espagnole     flour
Hollandaise   egg
Hollandaise   butter
Hollandaise   lemon juice
Velouté       stock
Velouté       butter
Velouté       flour
Tomate        tomatoes
Tomate        butter
Tomate        flour

(b)
ingredient    ingredient
milk          milk
milk          butter
milk          flour
butter        milk
butter        butter
butter        flour
butter        stock
butter        egg
butter        lemon juice
butter        tomatoes
tomatoes      tomatoes
tomatoes      butter
tomatoes      flour
lemon juice   egg
lemon juice   butter
lemon juice   lemon juice
egg           egg
egg           butter
egg           lemon juice
flour         milk
flour         butter
flour         flour
flour         stock
flour         tomatoes
stock         butter
stock         flour
stock         stock
stock         egg
stock         lemon juice
stock         tomatoes

Figure S.8.1 Some relations based on sauces in French cooking.

8.29 Writing parent−1 as child, we have parent ◦ child = sibling (or yourself): ⟨x, y⟩ is in the given relation if there exists a person z such that ⟨x, z⟩ ∈ child and ⟨z, y⟩ ∈ parent. That is, x is a child of z, who is a parent of y—in other words, x and y share a parent. (You share a parent with yourself, so x and y can be siblings or they can be the same person.)

8.30 Writing parent−1 as child, we have child ◦ parent = procreated-with (or yourself, if you have a child): ⟨x, y⟩ is in the given relation if there exists a person z such that ⟨x, z⟩ ∈ parent and ⟨z, y⟩ ∈ child. That is, x is a parent of z, who is a child of y—that is, x and y share a child. (You share a child with yourself if you've had a child, so x and y can be the joint parents of a child, or they can be the same parent of a child.)

8.31 Writing parent ◦ parent as grandparent and parent−1 ◦ parent−1 as grandchild = grandparent−1, the given relation is grandparent ◦ grandchild. That is, ⟨x, y⟩ ∈ grandparent ◦ grandchild if there's a z such that x is the grandchild of z and z is the grandparent of y—that is, x and y share a grandparent. That means that x and y are either the same person, siblings, or first cousins.

8.32 Writing parent ◦ (parent−1) as sibling-or-self, we're looking for sibling-or-self ◦ sibling-or-self. Because the sibling of my sibling is my sibling, in fact this relation also simply represents sibling-or-self.

8.33 The largest possible size is nm: for example, if R = {⟨1, x⟩ : 1 ≤ x ≤ n} and S = {⟨y, 1⟩ : 1 ≤ y ≤ m}, then R ◦ S = {⟨y, x⟩ : 1 ≤ x ≤ n and 1 ≤ y ≤ m}.


The smallest possible size is zero, if, for example, R ⊆ Z>0 × Z>0 and S ⊆ Z≤0 × Z≤0.

8.34 We claim that both sets denote the set of pairs ⟨a, d⟩ for which there exist b and c such that ⟨a, b⟩ ∈ T, ⟨b, c⟩ ∈ S, and ⟨c, d⟩ ∈ R:

⟨a, d⟩ ∈ R ◦ (S ◦ T) ⇔ ∃c : ⟨a, c⟩ ∈ S ◦ T and ⟨c, d⟩ ∈ R                       definition of composition
                     ⇔ ∃c : [∃b : ⟨a, b⟩ ∈ T and ⟨b, c⟩ ∈ S] and ⟨c, d⟩ ∈ R     definition of composition
                     ⇔ ∃b, c : [⟨a, b⟩ ∈ T and ⟨b, c⟩ ∈ S and ⟨c, d⟩ ∈ R]       properties of ∃
                     ⇔ ∃b : ⟨a, b⟩ ∈ T and [∃c : ⟨b, c⟩ ∈ S and ⟨c, d⟩ ∈ R]     properties of ∃
                     ⇔ ∃b : ⟨a, b⟩ ∈ T and ⟨b, d⟩ ∈ R ◦ S                       definition of composition
                     ⇔ ⟨a, d⟩ ∈ (R ◦ S) ◦ T.                                    definition of composition

8.35 Let a and c be generic elements. Then:

⟨a, c⟩ ∈ (R ◦ S)−1 ⇔ ⟨c, a⟩ ∈ R ◦ S                            definition of inverse
                   ⇔ ∃b : ⟨c, b⟩ ∈ S and ⟨b, a⟩ ∈ R            definition of composition
                   ⇔ ∃b : ⟨b, c⟩ ∈ S−1 and ⟨a, b⟩ ∈ R−1        definition of inverse
                   ⇔ ∃b : ⟨a, b⟩ ∈ R−1 and ⟨b, c⟩ ∈ S−1        commutativity of ∧
                   ⇔ ⟨a, c⟩ ∈ S−1 ◦ R−1.                       definition of composition

8.36 False. The pair ⟨x, x⟩ ∈ R ◦ R−1 if and only if there exists a y ∈ B such that ⟨x, y⟩ ∈ R. For example, if R = ∅, then R ◦ R−1 = ∅ too.

8.37 R × R

8.38 identity = {⟨x, x⟩ : x ∈ Z}

8.39 Consider any x ∈ A. Because R is a function, there's one and only one y ∈ B such that ⟨x, y⟩ ∈ R. Because T is a function, there's one and only one z ∈ C such that ⟨y, z⟩ ∈ T. A pair ⟨x, z⟩ is in T ◦ R if there exists a y such that ⟨x, y⟩ ∈ R and ⟨y, z⟩ ∈ T. Thus, for any x, there's one and only one such y, and one and only one such z. Therefore T ◦ R is a function.

8.40 Fix z ∈ C. Because T is one-to-one, there is at most one y ∈ B such that ⟨y, z⟩ ∈ T. Because R is one-to-one, for any y (including the lone y from the previous sentence, if there is one) there is at most one x ∈ A such that ⟨x, y⟩ ∈ R. Thus there's at most one x ∈ A such that ∃b ∈ B : ⟨x, b⟩ ∈ R and ⟨b, z⟩ ∈ T—that is, there's at most one x ∈ A such that ⟨x, z⟩ ∈ T ◦ R. Thus T ◦ R is one-to-one.

8.41 Fix z ∈ C. Because T is onto, there is at least one y ∈ B such that ⟨y, z⟩ ∈ T. Because R is onto, for any y (including the values of y from the previous sentence) there is at least one x ∈ A such that ⟨x, y⟩ ∈ R. Thus there's at least one x ∈ A such that ∃b ∈ B : ⟨x, b⟩ ∈ R and ⟨b, z⟩ ∈ T—that is, there's at least one x ∈ A such that ⟨x, z⟩ ∈ T ◦ R. Thus T ◦ R is onto.

8.42 Neither must be a function. For example, consider R ⊆ {a, b} × {0, 1} and T ⊆ {0, 1} × {A, B}:

R = {⟨a, 0⟩, ⟨a, 1⟩, ⟨b, 0⟩, ⟨b, 1⟩}     (not a function)
T = {⟨0, A⟩}                             (not a function—T(1) is undefined)

But, composing these relations, we get T ◦ R = {⟨a, A⟩, ⟨b, A⟩}.     (a function!)

8.43 T does not need to be one-to-one. For example, consider R ⊆ {a, b} × {0, 1} and T ⊆ {0, 1, 2} × {A, B}:

R = {⟨a, 0⟩, ⟨b, 1⟩}                     (one-to-one)
T = {⟨0, A⟩, ⟨1, B⟩, ⟨2, B⟩}             (not one-to-one—T(1) = T(2))

But, composing these relations, we get T ◦ R = {⟨a, A⟩, ⟨b, B⟩}.     (one-to-one!)

On the other hand, R does need to be one-to-one. Suppose not. Then R(x) = R(x′) = y for some x and x′ ≠ x. Because T is a function, T(y) = z for some particular z. But then ⟨x, z⟩, ⟨x′, z⟩ ∈ T ◦ R—making T ◦ R not one-to-one.

8.44 R does not need to be onto. For example, consider R ⊆ {a, b} × {0, 1, 2} and T ⊆ {0, 1, 2} × {A, B}:

R = {⟨a, 0⟩, ⟨b, 1⟩}                     (not onto—no x with R(x) = 2)
T = {⟨0, A⟩, ⟨1, B⟩, ⟨2, B⟩}             (onto)

But, composing these relations, we get T ◦ R = {⟨a, A⟩, ⟨b, B⟩}.     (onto!)

On the other hand, T does need to be onto. Suppose not. Then there's a z such that no y satisfies T(y) = z. But then ⟨x, z⟩ ∉ T ◦ R for any x—making T ◦ R not onto.

8.45 join(≤, ≤) ∪ join(≤−1, ≤−1)

8.46 project(select(C, f), {1}), where f(c, r, g, b) = True if r = 0, and False otherwise.

8.47 project(join(project(C, {1, 4}), [project(C, {1, 4})]−1), {1, 3})

8.48 project(select(C, f), {1}), where f(c, r, g, b) = True if b ≥ r, and False otherwise.

8.49 Blue ◦ Blue−1

8.50 [Blue ◦ ≥ ◦ Red−1] ∩ =

8.3 Properties of Relations

8.51 (figure: the relation drawn as a graph on the elements 0, 1, . . . , 12, arranged in a circle)

8.52 (figure: the relation drawn as a graph on the elements 0, 1, . . . , 14, arranged in a circle)

8.53 (figure: the relation drawn as a graph on the elements 0, 1, . . . , 14, arranged in a circle)

8.54 The relation {⟨x, x⟩ : x^5 ≡5 x} is reflexive: 0^5 = 0 ≡5 0 and 1^5 = 1 ≡5 1 and 2^5 = 32 ≡5 2 and 3^5 = 243 ≡5 3 and 4^5 = 1024 ≡5 4.

8.55 The relation R = {⟨x, y⟩ : x + y ≡5 0} is neither reflexive nor irreflexive: ⟨0, 0⟩ ∈ R because 0 + 0 = 0 ≡5 0, but ⟨1, 1⟩ ∉ R because 1 + 1 = 2 ≢5 0.

8.56 R = {⟨x, y⟩ : there exists z such that x · z ≡5 y} is reflexive: ⟨x, x⟩ ∈ R because we can always choose z = 1.

8.57 The relation R = {⟨x, y⟩ : there exists z such that x^2 · z^2 ≡5 y} is neither reflexive nor irreflexive: ⟨0, 0⟩ ∈ R because 0^2 · z^2 = 0 ≡5 0 for any z. But ⟨2, 2⟩ ∉ R because 2^2 · z^2 = 4z^2 ≢5 2 for any z: 4 · 0^2 ≡5 0 and 4 · 1^2 ≡5 4 and 4 · 2^2 ≡5 1 and 4 · 3^2 ≡5 1 and 4 · 4^2 ≡5 4.

8.58 The claim is true: ⟨x, y⟩ ∈ R if and only if ⟨y, x⟩ ∈ R−1, so ⟨x, x⟩ ∈ R if and only if ⟨x, x⟩ ∈ R−1. Thus R is reflexive (every ⟨x, x⟩ ∈ R) if and only if R−1 is reflexive (every ⟨x, x⟩ ∈ R−1).

8.59 The claim is true: consider an arbitrary x. By definition, ⟨x, x⟩ ∈ R ◦ T if and only if there exists a y such that ⟨x, y⟩ ∈ T and ⟨y, x⟩ ∈ R. But we can always just choose y = x. By the reflexivity of R and T, we have that ⟨x, x⟩ ∈ T and ⟨x, x⟩ ∈ R, and thus there exists a y such that ⟨x, y⟩ ∈ T and ⟨y, x⟩ ∈ R. Therefore ⟨x, x⟩ ∈ R ◦ T for every x, and R ◦ T is reflexive.

8.60 The claim is false: consider R = {⟨0, 1⟩, ⟨1, 0⟩}. Then R is not reflexive. But ⟨0, 0⟩ ∈ R ◦ R (because ⟨0, 1⟩ ∈ R and ⟨1, 0⟩ ∈ R) and ⟨1, 1⟩ ∈ R ◦ R (because ⟨1, 0⟩ ∈ R and ⟨0, 1⟩ ∈ R).

8.61 The claim is true: ⟨x, y⟩ ∈ R if and only if ⟨y, x⟩ ∈ R−1, so ⟨x, x⟩ ∈ R if and only if ⟨x, x⟩ ∈ R−1. Thus every ⟨x, x⟩ ∉ R if and only if every ⟨x, x⟩ ∉ R−1.

8.62 The claim is false: consider R = {⟨0, 1⟩, ⟨1, 0⟩}. Then R is irreflexive. But ⟨0, 0⟩ ∈ R ◦ R (because ⟨0, 1⟩ ∈ R and ⟨1, 0⟩ ∈ R).
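The mod-5 arithmetic in 8.54–8.57 is quick to confirm with three-argument pow (a sketch):

```python
Z5 = range(5)
# 8.54: x^5 ≡_5 x for every x in Z_5, so every pair ⟨x, x⟩ qualifies:
assert all(pow(x, 5, 5) == x for x in Z5)
# 8.56: choosing z = 1 gives x·z ≡_5 x, so ⟨x, x⟩ is always in the relation:
assert all(any(x * z % 5 == x for z in Z5) for x in Z5)
# 8.57: no z has 2^2 · z^2 ≡_5 2, so ⟨2, 2⟩ is not in the relation:
assert not any(4 * z * z % 5 == 2 for z in Z5)
```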


8.63 Symmetric and antisymmetric. This relation is in fact {⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨3, 3⟩, ⟨4, 4⟩} and thus satisfies both requirements. But the relation is not asymmetric, because ⟨0, 0⟩ ∈ R.

8.64 Symmetric. x + y ≡5 y + x, so ⟨x, y⟩ ∈ R ⇔ ⟨y, x⟩ ∈ R. It is not asymmetric because ⟨0, 0⟩ ∈ R, and not antisymmetric because ⟨2, 3⟩, ⟨3, 2⟩ ∈ R.

8.65 None of these. The relation is not antisymmetric (and not asymmetric) because ⟨1, 2⟩, ⟨2, 1⟩ ∈ R (for ⟨1, 2⟩ choose z = 2 so that 1 · z ≡5 2; for ⟨2, 1⟩ choose z = 3 so that 2 · z ≡5 1). It is not symmetric because ⟨1, 0⟩ ∈ R (choose z = 0) but ⟨0, 1⟩ ∉ R (there's no z such that 0z ≡5 1).

8.66 None of these. The relation is not symmetric because ⟨1, 0⟩ ∈ R (choose z = 0) but ⟨0, 1⟩ ∉ R (there's no z such that 0z ≡5 1). It is not antisymmetric (and not asymmetric) because ⟨1, 4⟩, ⟨4, 1⟩ ∈ R (for ⟨1, 4⟩ choose z = 2 and 1^2 · z^2 ≡5 4; for ⟨4, 1⟩ choose z = 1 and 4^2 · z^2 ≡5 1).

8.67 The key step is the fact ("the axiom of extensionality") that two sets A and B are equal if and only if x ∈ A ⇔ x ∈ B for all values of x. Here's the proof of the claim:

R is symmetric ⇔ ∀x, y : [⟨x, y⟩ ∈ R ⇔ ⟨y, x⟩ ∈ R]        definition of symmetry
              ⇔ ∀x, y : [⟨x, y⟩ ∈ R ⇔ ⟨x, y⟩ ∈ R−1]       definition of inverse
              ⇔ R = R−1.                                  "axiom of extensionality"

8.68 We'll prove that R is not antisymmetric if and only if R ∩ R−1 ⊈ {⟨a, a⟩ : a ∈ A}; the desired fact follows immediately (because p ⇔ q and ¬p ⇔ ¬q are logically equivalent).

R ∩ R−1 ⊈ {⟨a, a⟩ : a ∈ A} ⇔ there exists ⟨a, b⟩ ∈ R ∩ R−1 with a ≠ b               definition of ⊆
                           ⇔ there exists ⟨a, b⟩ ∈ R and ⟨a, b⟩ ∈ R−1 with a ≠ b    definition of ∩
                           ⇔ there exists ⟨a, b⟩ ∈ R and ⟨b, a⟩ ∈ R with a ≠ b      definition of inverse
                           ⇔ R is not antisymmetric.                                definition of antisymmetry

8.69 Analogously to the previous exercise, we'll prove that R is asymmetric if and only if R ∩ R−1 = ∅ by proving that R is not asymmetric if and only if R ∩ R−1 ≠ ∅:

R ∩ R−1 ≠ ∅ ⇔ there exists ⟨a, b⟩ ∈ R ∩ R−1
            ⇔ there exists ⟨a, b⟩ ∈ R and ⟨a, b⟩ ∈ R−1      definition of ∩
            ⇔ there exists ⟨a, b⟩ ∈ R and ⟨b, a⟩ ∈ R        definition of inverse
            ⇔ R is not asymmetric.                          definition of asymmetry

8.70 Suppose ⟨x, y⟩ ∈ R for x ≠ y. If ⟨y, x⟩ ∉ R too, then R is not symmetric, but if ⟨y, x⟩ ∈ R then R is not antisymmetric. Thus any R containing ⟨x, y⟩ ∈ R for x ≠ y cannot be both symmetric and antisymmetric. But any relation R satisfying R ⊆ {⟨x, x⟩ : x ∈ A} is both symmetric and antisymmetric: it's symmetric because ⟨x, x⟩ ∈ R implies ⟨x, x⟩ ∈ R, and it's antisymmetric by Theorem 8.8.

8.71 Using Theorem 8.8: if R is asymmetric, then R ∩ R−1 = ∅, and ∅ is certainly a subset of {⟨a, a⟩ : a ∈ A}. Thus R is antisymmetric.

8.72 {⟨0, 0⟩, ⟨0, 1⟩, ⟨1, 0⟩, ⟨1, 1⟩} or {⟨0, 0⟩, ⟨1, 1⟩}.

8.73 {⟨0, 0⟩, ⟨0, 1⟩, ⟨1, 1⟩} or {⟨0, 0⟩, ⟨1, 0⟩, ⟨1, 1⟩} or {⟨0, 0⟩, ⟨1, 1⟩}.

8.74 Impossible! To be reflexive, we must include ⟨0, 0⟩; to be asymmetric we cannot have ⟨0, 0⟩.

8.75 {⟨0, 1⟩, ⟨1, 0⟩} or ∅

8.76 {⟨0, 1⟩} or {⟨1, 0⟩} or ∅


8.77 {⟨0, 1⟩} or {⟨1, 0⟩} or ∅

8.78 {⟨0, 0⟩, ⟨0, 1⟩, ⟨1, 0⟩} or {⟨0, 0⟩} or {⟨0, 1⟩, ⟨1, 0⟩, ⟨1, 1⟩} or {⟨1, 1⟩}

8.79 {⟨0, 0⟩, ⟨1, 0⟩} or {⟨0, 0⟩, ⟨0, 1⟩} or {⟨0, 0⟩} or {⟨1, 0⟩, ⟨1, 1⟩} or {⟨0, 1⟩, ⟨1, 1⟩} or {⟨1, 1⟩}

8.80 Impossible! Asymmetry requires that we have neither ⟨0, 0⟩ nor ⟨1, 1⟩, but then the relation is irreflexive.

8.81 R = {⟨x, x⟩ : x^5 ≡5 x} is transitive. There's no trio of elements a, b, c such that ⟨a, b⟩ ∈ R and ⟨b, c⟩ ∈ R unless a = b = c, so it's immediate that the relation is transitive.

8.82 R = {⟨x, y⟩ : x + y ≡5 0} is not transitive. We have ⟨2, 3⟩ ∈ R and ⟨3, 2⟩ ∈ R, but ⟨2, 2⟩ ∉ R.

8.83 R = {⟨x, y⟩ : there exists z such that x · z ≡5 y} is transitive. Suppose that ⟨a, b⟩ ∈ R (because a·z1 ≡5 b) and ⟨b, c⟩ ∈ R (because b·z2 ≡5 c). Then ⟨a, c⟩ ∈ R: choose z = z1·z2, and thus a·z = (a·z1)·z2 ≡5 b·z2 ≡5 c.

8.84 R = {⟨x, y⟩ : there exists z such that x^2 · z^2 ≡5 y} is transitive. The relation consists of ⟨0, 0⟩ plus all pairs ⟨a, r⟩ for r ∈ {0, 1, 4} and a ∈ {1, 2, 3, 4}—that is, R = {⟨0, 0⟩} ∪ ({1, 2, 3, 4} × {0, 1, 4}). We can check exhaustively that this relation is in fact transitive.

8.85 Assume that R is irreflexive and transitive. We must show that R is asymmetric. Suppose for a contradiction that ⟨a, b⟩ ∈ R and ⟨b, a⟩ ∈ R. But then ⟨a, a⟩ ∈ R by transitivity—but ⟨a, a⟩ ∈ R violates irreflexivity! Thus the assumption was false, and ⟨a, b⟩ ∈ R ⇒ ⟨b, a⟩ ∉ R, precisely as asymmetry demands.

8.86 We'll show that R is transitive if and only if R ◦ R ⊆ R by mutual implication. First suppose R is transitive. Then

⟨a, c⟩ ∈ R ◦ R ⇔ ∃b : ⟨a, b⟩ ∈ R and ⟨b, c⟩ ∈ R        definition of ◦
               ⇒ ⟨a, c⟩ ∈ R.                           by transitivity

Thus R ◦ R ⊆ R. Conversely, suppose that R ◦ R ⊆ R. Then

⟨a, b⟩ ∈ R and ⟨b, c⟩ ∈ R ⇒ ⟨a, c⟩ ∈ R ◦ R             definition of ◦
                          ⇒ ⟨a, c⟩ ∈ R.                definition of ⊆

Thus R is transitive.

8.87 Let R = {⟨0, 1⟩}. Then R ◦ R = ∅, which is not equal to R. But R is vacuously transitive, because there is no trio of elements a, b, c such that ⟨a, b⟩ ∈ R and ⟨b, c⟩ ∈ R.

8.88 Yes, it is possible for R to be simultaneously symmetric, transitive, and irreflexive—but only when R = ∅. Suppose that R is nonempty; in other words, suppose that ⟨a, b⟩ ∈ R for some particular a and b. Then, by the symmetry of R, we know that ⟨b, a⟩ ∈ R too. Then by transitivity (and the facts that ⟨a, b⟩ ∈ R and ⟨b, a⟩ ∈ R) we have that ⟨a, a⟩ ∈ R, too, violating irreflexivity.

8.89 Suppose ⟨a, f(a)⟩ ∈ R for f(a) ≠ a. Because R is a function, there must be a pair ⟨f(a), f(f(a))⟩ ∈ R. By transitivity, then, ⟨a, f(f(a))⟩ ∈ R too. Thus, for R to be a function, we must have f(a) = f(f(a)). Therefore it is possible for R to be simultaneously transitive and a function, so long as R has the following property: for any b in the image of the function R, it must be the case that ⟨b, b⟩ ∈ R. To state it another way: for every b ∈ A, the set Sb = {a : f(a) = b} is either empty, or b ∈ Sb.

8.90 The only problem with transitivity that arises is if 0 → 1 and 1 → 0, but either 0 ̸→ 0 or 1 ̸→ 1. Thus the only nontransitive relations on {0, 1} are {⟨0, 1⟩, ⟨1, 0⟩, ⟨1, 1⟩} and {⟨0, 1⟩, ⟨1, 0⟩, ⟨0, 0⟩} and {⟨0, 1⟩, ⟨1, 0⟩}. So the transitive ones are the other 13:

• {⟨0, 0⟩, ⟨0, 1⟩, ⟨1, 0⟩, ⟨1, 1⟩}
• {⟨0, 0⟩, ⟨1, 0⟩, ⟨1, 1⟩}
• {⟨0, 0⟩, ⟨0, 1⟩, ⟨1, 1⟩}
• {⟨0, 0⟩, ⟨1, 1⟩}
• {⟨0, 0⟩, ⟨1, 0⟩}
• {⟨0, 0⟩, ⟨0, 1⟩}
• {⟨0, 0⟩}
• {⟨1, 0⟩, ⟨1, 1⟩}
• {⟨0, 1⟩, ⟨1, 1⟩}
• {⟨1, 1⟩}
• {⟨1, 0⟩}
• {⟨0, 1⟩}
• {}

8.91 Reflexivity requires both ⟨0, 0⟩ and ⟨1, 1⟩, so only the following are both transitive and reflexive:

• {⟨0, 0⟩, ⟨0, 1⟩, ⟨1, 0⟩, ⟨1, 1⟩}
• {⟨0, 0⟩, ⟨1, 0⟩, ⟨1, 1⟩}
• {⟨0, 0⟩, ⟨0, 1⟩, ⟨1, 1⟩}
• {⟨0, 0⟩, ⟨1, 1⟩}

For symmetry, we must have both ⟨0, 1⟩ and ⟨1, 0⟩ or neither, not just one of the two, which eliminates the middle two relations from this list. So only the following relations on {0, 1} are transitive, reflexive, and symmetric:

• {⟨0, 0⟩, ⟨0, 1⟩, ⟨1, 0⟩, ⟨1, 1⟩}
• {⟨0, 0⟩, ⟨1, 1⟩}

8.92 {⟨2, 4⟩, ⟨4, 3⟩, ⟨4, 4⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨3, 3⟩}

8.93 {⟨2, 4⟩, ⟨4, 3⟩, ⟨4, 4⟩, ⟨4, 2⟩, ⟨3, 4⟩}

8.94 {⟨2, 4⟩, ⟨4, 3⟩, ⟨4, 4⟩, ⟨2, 3⟩}

8.95 {⟨2, 4⟩, ⟨4, 3⟩, ⟨4, 4⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨3, 3⟩, ⟨2, 3⟩}

8.96 {⟨2, 4⟩, ⟨4, 3⟩, ⟨4, 4⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨3, 3⟩, ⟨2, 3⟩, ⟨4, 2⟩, ⟨3, 4⟩, ⟨3, 2⟩}

8.97 T ∪ {⟨1, 1⟩, ⟨2, 2⟩, ⟨3, 3⟩, ⟨4, 4⟩, ⟨5, 5⟩}

8.98 T ∪ {⟨4, 3⟩, ⟨5, 4⟩}

8.99 T ∪ {⟨1, 1⟩, ⟨1, 4⟩, ⟨1, 5⟩, ⟨2, 2⟩, ⟨2, 4⟩, ⟨2, 5⟩, ⟨3, 3⟩, ⟨3, 5⟩}

8.100 The symmetric closure of ≥ is R × R: every pair of real numbers ⟨x, y⟩ either satisfies x ≥ y or y ≥ x. (The symmetric closure of > is {⟨x, y⟩ ∈ R × R : x ≠ y}.)

8.101 There's an almost word-for-word translation from the mathematical notation if we represent a relation as a Python set of ordered pairs (for example, representing the relation {⟨3, 1⟩, ⟨5, 2⟩} as {(3, 1), (5, 2)}). So, instead, we'll give an alternative (slower) solution based on a simpler data structure: a relation simply consists of an arbitrarily ordered list of ordered pairs of elements. For example, the above relation is represented as [(3, 1), (5, 2)] or [(5, 2), (3, 1)]. See invert and compose in Figure S.8.2.

8.102 See reflexive, irreflexive, symmetric, antisymmetric, asymmetric, and transitive in Figure S.8.2.

8.103 See reflexive_closure, symmetric_closure, and transitive_closure in Figure S.8.2.

8.104 See Figure S.8.3. Running this code lists the exact same relations as in Exercises 8.72–8.80 (though, of course, in different orders).


def invert(R):
    return [(y, x) for (x, y) in R]

def compose(R, S):
    R_compose_S = []
    for (a, b) in R:
        for (c, d) in S:
            if b == c:
                R_compose_S += [(a, d)]
    return R_compose_S

def reflexive(R):
    for x in universe:  # a global variable! [a list].
        if (x, x) not in R:
            return False
    return True

def irreflexive(R):
    for (x, y) in R:
        if x == y:
            return False
    return True

def symmetric(R):
    return sorted(R) == sorted(invert(R))

def antisymmetric(R):
    for (x, y) in R:
        if (y, x) in R and x != y:
            return False
    return True

def asymmetric(R):
    for (x, y) in R:
        if (y, x) in R:
            return False
    return True

def transitive(R):
    for (x, y) in compose(R, R):
        if (x, y) not in R:
            return False
    return True

def reflexive_closure(R):
    return R + [(x, x) for x in universe if (x, x) not in R]

def symmetric_closure(R):
    return R + [(y, x) for (x, y) in R if (y, x) not in R]

def transitive_closure(R):
    result = R
    changed = True
    while changed:
        changed = False
        for (a, b) in result:
            for (c, d) in result:
                if b == c and (a, d) not in result:
                    result += [(a, d)]
                    changed = True
    return result

Figure S.8.2 An implementation of relations (including closures) in Python.
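Equivalently, transitive closure can be phrased as a fixed-point computation over sets; here is a sketch (the name is ours, not the figure's), checked against the answer to Exercise 8.94:

```python
def transitive_closure_set(R):
    '''Smallest transitive superset of R, computed as a fixed point.'''
    result = set(R)
    while True:
        new = {(a, d) for (a, b) in result for (c, d) in result if b == c}
        if new <= result:
            return result
        result |= new

assert transitive_closure_set({(2, 4), (4, 3), (4, 4)}) == {(2, 4), (4, 3), (4, 4), (2, 3)}
```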


# Compute a list of all relations on the universe [0, 1].
universe = [0, 1]
all_binary_relations = [[]]
for x in universe:
    for y in universe:
        all_binary_relations = all_binary_relations + \
            [L + [(x, y)] for L in all_binary_relations]

reflexives = [R for R in all_binary_relations if reflexive(R)]
irreflexives = [R for R in all_binary_relations if irreflexive(R)]
neithers = [R for R in all_binary_relations if not reflexive(R) and not irreflexive(R)]

for (number, description, relations) in [ \
    ("8.72", "reflexive, symmetric", [R for R in reflexives if symmetric(R)]),
    ("8.73", "reflexive, antisymmetric", [R for R in reflexives if antisymmetric(R)]),
    ("8.74", "reflexive, asymmetric", [R for R in reflexives if asymmetric(R)]),
    ("8.75", "irreflexive, symmetric", [R for R in irreflexives if symmetric(R)]),
    ("8.76", "irreflexive, antisymmetric", [R for R in irreflexives if antisymmetric(R)]),
    ("8.77", "irreflexive, asymmetric", [R for R in irreflexives if asymmetric(R)]),
    ("8.78", "not refl/irrefl., symmetric", [R for R in neithers if symmetric(R)]),
    ("8.79", "not refl/irrefl., antisymmetric", [R for R in neithers if antisymmetric(R)]),
    ("8.80", "not refl/irrefl., asymmetric", [R for R in neithers if asymmetric(R)]) ]:
    print("Exercise %s (%s):" % (number, description))
    if len(relations) == 0:
        relations = ["impossible!"]
    for R in relations:
        print("  ", R)

Figure S.8.3 Verifying Exercises 8.72–8.80 using Figure S.8.2.

8.105 Let R be a relation, and let S ⊇ R be a transitive relation. We'll argue that R^k ⊆ S for any k ≥ 1 by induction on k. For the base case (k = 1), the statement is immediate: S ⊇ R and R^1 = R, so R^1 ⊆ S. For the inductive case (k ≥ 2), we assume the inductive hypothesis R^(k−1) ⊆ S, and we must prove that if ⟨a, c⟩ ∈ R^k then ⟨a, c⟩ ∈ S. By definition, R^k = R ◦ R^(k−1). Thus:

⟨a, c⟩ ∈ R^k ⇔ ⟨a, c⟩ ∈ R ◦ R^(k−1)                      definition of R^k
             ⇔ ∃b : ⟨a, b⟩ ∈ R and ⟨b, c⟩ ∈ R^(k−1)      definition of ◦
             ⇒ ∃b : ⟨a, b⟩ ∈ R and ⟨b, c⟩ ∈ S            inductive hypothesis
             ⇒ ∃b : ⟨a, b⟩ ∈ S and ⟨b, c⟩ ∈ S            S ⊇ R by assumption
             ⇒ ⟨a, c⟩ ∈ S.                               S is transitive by assumption

8.106 Let A = {1, 2, . . . , n}, and let R = successor ∩ (A × A). Then the transitive closure of R is <, and contains n(n − 1)/2 pairs. Thus our c gets arbitrarily close to 0.5 as n gets bigger and bigger.

8.107 We claim that ⟨x, x + k⟩ is in the transitive closure T of successor, by induction on k. For the base case (k = 1), indeed ⟨x, x + 1⟩ ∈ successor itself. (And by definition T ⊇ successor.) For the inductive case (k ≥ 2), we assume the inductive hypothesis ⟨x, x + k − 1⟩ ∈ T. We must prove ⟨x, x + k⟩ ∈ T. But ⟨x + k − 1, x + k⟩ ∈ successor by the definition of successor, and successor ⊆ T by the definition of transitive closure. Thus we have both ⟨x, x + k − 1⟩ ∈ T (inductive hypothesis) and ⟨x + k − 1, x + k⟩ ∈ T (successor ⊆ T). By the transitivity of T, then, ⟨x, x + k⟩ ∈ T too.

8.108 The X closure of R is a superset of R (specifically, the smallest superset of R) that furthermore has the property of being X. But if R is not antisymmetric (that is, if ⟨a, b⟩, ⟨b, a⟩ ∈ R for some a ≠ b), then every superset of R also fails to be antisymmetric. Thus the "antisymmetric closure" would be undefined!

8.4 Special Relations

8.109 {⟨0, 0⟩, ⟨1, 1⟩} and {⟨0, 0⟩, ⟨0, 1⟩, ⟨1, 0⟩, ⟨1, 1⟩}

8.110 Because they must be reflexive, {⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨3, 3⟩} are in all equivalence relations, but what other pairs we add can vary. There are 15 possibilities:

• {} (all separate)
• {⟨0, 1⟩, ⟨1, 0⟩} (0, 1 together)
• {⟨0, 2⟩, ⟨2, 0⟩} (0, 2 together)
• {⟨0, 3⟩, ⟨3, 0⟩} (0, 3 together)
• {⟨1, 2⟩, ⟨2, 1⟩} (1, 2 together)
• {⟨1, 3⟩, ⟨3, 1⟩} (1, 3 together)
• {⟨2, 3⟩, ⟨3, 2⟩} (2, 3 together)
• {⟨0, 1⟩, ⟨1, 0⟩, ⟨2, 3⟩, ⟨3, 2⟩} (0, 1 and 2, 3 together)
• {⟨0, 2⟩, ⟨2, 0⟩, ⟨1, 3⟩, ⟨3, 1⟩} (0, 2 and 1, 3 together)
• {⟨0, 3⟩, ⟨3, 0⟩, ⟨1, 2⟩, ⟨2, 1⟩} (0, 3 and 1, 2 together)
• {⟨0, 1⟩, ⟨1, 0⟩, ⟨0, 2⟩, ⟨2, 0⟩, ⟨1, 2⟩, ⟨2, 1⟩} (0, 1, 2 together)
• {⟨0, 1⟩, ⟨1, 0⟩, ⟨0, 3⟩, ⟨3, 0⟩, ⟨1, 3⟩, ⟨3, 1⟩} (0, 1, 3 together)
• {⟨0, 3⟩, ⟨3, 0⟩, ⟨0, 2⟩, ⟨2, 0⟩, ⟨3, 2⟩, ⟨2, 3⟩} (0, 2, 3 together)
• {⟨3, 1⟩, ⟨1, 3⟩, ⟨3, 2⟩, ⟨2, 3⟩, ⟨1, 2⟩, ⟨2, 1⟩} (1, 2, 3 together)
• {⟨0, 1⟩, ⟨1, 0⟩, ⟨0, 2⟩, ⟨2, 0⟩, ⟨0, 3⟩, ⟨3, 0⟩, ⟨1, 2⟩, ⟨2, 1⟩, ⟨1, 3⟩, ⟨3, 1⟩, ⟨2, 3⟩, ⟨3, 2⟩} (all elements together)
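The count of 15 can be double-checked by brute force, in the style of Figure S.8.2. Here is a small verification sketch (not from the printed manual; the helper names are mine), which enumerates every reflexive, symmetric relation on {0, 1, 2, 3} and keeps the transitive ones:

```python
from itertools import combinations, product

A = range(4)
offdiag = list(combinations(A, 2))  # the 6 unordered pairs {a, b} with a < b

def is_transitive(R):
    return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)

count = 0
for bits in product([0, 1], repeat=len(offdiag)):
    R = {(a, a) for a in A}              # reflexive pairs are forced
    for (a, b), bit in zip(offdiag, bits):
        if bit:
            R |= {(a, b), (b, a)}        # add each chosen pair symmetrically
    if is_transitive(R):
        count += 1
print(count)  # 15
```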

8.111 Yes, R1 is an equivalence relation. The equivalence classes are:

• ∅
• {0}
• {1}, {0, 1}
• {2}, {0, 2}, {1, 2}, {0, 1, 2}
• {3}, {0, 3}, {1, 3}, {2, 3}, {0, 1, 3}, {0, 2, 3}, {1, 2, 3}, {0, 1, 2, 3}.

8.112 Yes, R2 is an equivalence relation. The equivalence classes are:

• {}, {0}
• {1}, {0, 1}
• {2}, {0, 2}
• {3}, {0, 3}, {1, 2}, {0, 1, 2}
• {1, 3}, {0, 1, 3}
• {2, 3}, {0, 2, 3}
• {1, 2, 3}, {0, 1, 2, 3}.

8.113 Yes, R3 is an equivalence relation. In fact, the intersection of any two equivalence relations must be an equivalence relation. Its equivalence classes are the intersections of the equivalence classes of the two:

• {}
• {0}
• {1}, {0, 1}
• {2}, {0, 2}
• {1, 2}, {0, 1, 2}
• {3}, {0, 3}
• {1, 3}, {0, 1, 3}
• {2, 3}, {0, 2, 3}
• {1, 2, 3}, {0, 1, 2, 3}.

8.114 No, R4 is not an equivalence relation: {0} ∩ {0, 1} ̸= ∅ and {1} ∩ {0, 1} ̸= ∅, but {0} ∩ {1} = ∅; thus the relation isn’t transitive.


8.115 Yes, R5 is an equivalence relation, with the following equivalence classes:

• {}
• {0}, {1}, {2}, {3}
• {0, 1}, {0, 2}, {0, 3}, {1, 2}, {1, 3}, {2, 3}
• {0, 1, 2}, {0, 1, 3}, {0, 2, 3}, {1, 2, 3}
• {0, 1, 2, 3}.

8.116 No, R−1 ◦ R need not be reflexive. If there's no pair ⟨•, a⟩ ∈ R, then ⟨a, a⟩ ∉ R−1 ◦ R. For example, let R = {⟨0, 1⟩} be a relation on {0, 1} × {0, 1}. Then R−1 ◦ R = {⟨1, 1⟩}, which is not reflexive because it's missing ⟨0, 0⟩.

8.117 Yes, R−1 ◦ R must be symmetric. Let ⟨a, c⟩ be a generic pair of elements. Then

⟨a, c⟩ ∈ R−1 ◦ R
  ⇔ ∃b : ⟨a, b⟩ ∈ R and ⟨b, c⟩ ∈ R−1        definition of ◦
  ⇔ ∃b : ⟨a, b⟩ ∈ R and ⟨c, b⟩ ∈ R          definition of −1
  ⇔ ∃b : ⟨b, a⟩ ∈ R−1 and ⟨c, b⟩ ∈ R        definition of −1
  ⇔ ⟨c, a⟩ ∈ R−1 ◦ R.                       definition of ◦

8.118 False, R−1 ◦ R need not be transitive. For example, let R = {⟨0, 2⟩, ⟨0, 3⟩, ⟨1, 3⟩, ⟨1, 4⟩}. Then

R−1 ◦ R = {⟨2, 0⟩, ⟨3, 0⟩, ⟨3, 1⟩, ⟨4, 1⟩} ◦ {⟨0, 2⟩, ⟨0, 3⟩, ⟨1, 3⟩, ⟨1, 4⟩} = {⟨2, 2⟩, ⟨2, 3⟩, ⟨3, 2⟩, ⟨3, 3⟩, ⟨3, 4⟩, ⟨4, 3⟩, ⟨4, 4⟩},

which is not transitive because it contains ⟨2, 3⟩ and ⟨3, 4⟩ but is missing ⟨2, 4⟩.

8.119 The equivalence relation ≡coarsest has only one equivalence class, and ≡coarsest = A × A. Every pair of elements is in A × A, so it's immediate that the relation is reflexive, symmetric, and transitive. Every equivalence relation ≡ on A refines ≡coarsest, vacuously, because (a ≡ a′) ⇒ (a ≡coarsest a′): anything implies true, and a ≡coarsest a′ is true for any a and a′.

8.120 The equivalence relation ≡finest has a separate equivalence class for each element of A, and ≡finest = {⟨a, a⟩ : a ∈ A}. The relation is reflexive by definition; it's symmetric and transitive because there's never a non-⟨a, a⟩ pair in the relation. Every equivalence relation ≡ on A is refined by ≡finest, vacuously, because (a ≡finest a′) ⇒ (a ≡ a′): the only pairs related by ≡finest are ⟨a, a⟩, and ≡ must be reflexive.

8.121 ≡n is a refinement of any equivalence relation that combines one or more of ≡n's n equivalence classes together: that is, ≡n refines ≡k if a ≡n b ⇒ a ≡k b for any integers a and b. This implication holds precisely when k | n. Thus the coarsenings of ≡60 are ≡1, ≡2, ≡3, ≡4, ≡5, ≡6, ≡10, ≡12, ≡15, ≡20, ≡30, ≡60. And there are infinitely many refinements of ≡60, namely ≡60, ≡120, ≡180, . . . .

8.122 The claim is very false. We can coarsen ≡60 by combining any equivalence classes of the relation, not just the ones that are defined by divisibility. For example, the relation R = {⟨a, b⟩ : (a mod 60 is prime) ⇔ (b mod 60 is prime)} is a two-class equivalence relation, where the two equivalence classes are

[0]≡60 ∪ [1]≡60 ∪ [4]≡60 ∪ [6]≡60 . . . ∪ [58]≡60

and

[2]≡60 ∪ [3]≡60 ∪ [5]≡60 ∪ [7]≡60 . . . ∪ [59]≡60 .

The only two-class equivalence relation of the form ≡k is ≡2, whose equivalence classes are very different from these.

8.123 Yes, "is the same object as" is a refinement of "has the same value as". Any object x has the same value as the object x itself; thus, if x and y are the same object, then x and y have the same value. That is: x ≡identical object y implies x ≡same value y. But there may be two distinct objects storing the same value, so the converse is not true. Thus, by definition, ≡identical object refines ≡same value.

8.124 The partial orders on {0, 1} are:

• ⟨0, 0⟩, ⟨1, 1⟩


• ⟨0, 0⟩, ⟨1, 1⟩, ⟨0, 1⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨1, 0⟩.

8.125 The partial orders on {0, 1, 2} are:

• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨0, 1⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨1, 2⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨0, 2⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨1, 0⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨2, 1⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨2, 0⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨0, 1⟩, ⟨0, 2⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨1, 0⟩, ⟨1, 2⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨2, 0⟩, ⟨2, 1⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨1, 0⟩, ⟨2, 0⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨0, 1⟩, ⟨2, 1⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨0, 2⟩, ⟨1, 2⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨0, 1⟩, ⟨1, 2⟩, ⟨0, 2⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨0, 2⟩, ⟨2, 1⟩, ⟨0, 1⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨1, 0⟩, ⟨0, 2⟩, ⟨1, 2⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨1, 2⟩, ⟨2, 0⟩, ⟨1, 0⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨2, 0⟩, ⟨0, 1⟩, ⟨2, 1⟩
• ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨2, 1⟩, ⟨1, 0⟩, ⟨2, 0⟩.
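The count of 19 can also be confirmed by brute force over the non-reflexive pairs (a quick verification sketch, not from the printed manual; the helper names are mine):

```python
from itertools import product

A = range(3)
pairs = [(a, b) for a in A for b in A if a != b]  # the 6 possible non-reflexive pairs

def is_partial_order(R):
    antisymmetric = all(not ((a, b) in R and (b, a) in R) for a in A for b in A if a != b)
    transitive = all((a, d) in R for (a, b) in R for (c, d) in R if b == c)
    return antisymmetric and transitive

count = 0
for bits in product([0, 1], repeat=len(pairs)):
    # Reflexive pairs are forced; choose any subset of the remaining pairs.
    R = {(a, a) for a in A} | {p for p, bit in zip(pairs, bits) if bit}
    if is_partial_order(R):
        count += 1
print(count)  # 19
```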

8.126 Neither, because it's not antisymmetric: ⟨{2, 5}, {3, 4}⟩ ∈ R1 and ⟨{3, 4}, {2, 5}⟩ ∈ R1 because 3 + 4 ≤ 2 + 5 and 2 + 5 ≤ 3 + 4.

8.127 This relation is a partial order:

• It’s antisymmetric (there are no two distinct subsets of {2, 3, 4, 5} that have the same product—though that’s only because of the particular involved). Q numbersQ

• It’s reflexive (because Qa∈A a = Qa∈A a for any Q set A). Q Q • It’s transitive (because a∈A a ≤ b∈B b ≤ c∈C c implies a∈A a ≤ c∈C c for any sets A, B, and C). 8.128 Partial order. It’s antisymmetric because A ⊆ B and B ⊆ A only if A = B; it’s reflexive because A ⊆ A; and it’s transitive because A ⊆ B ⊆ C implies A ⊆ C. 8.129 Partial order. Actually this is the same question as the previous exercise, just reversed—antisymmetry, reflexivity, and transitivity follow in precisely the same way. 8.130 Total order. It’s irreflexive (|A| ̸< |A|) but it is transitive (|A| < |B| < |C| implies |A| < |C|). 8.131 Let ⪯ ⊆ A × A. Then:

⪯ is reflexive
  ⇔ ∀a ∈ A : ⟨a, a⟩ ∈ ⪯                definition of reflexivity
  ⇔ ∀a ∈ A : ⟨a, a⟩ ∈ ⪯−1              definition of −1
  ⇔ ⪯−1 is reflexive                   definition of reflexivity

⪯ is antisymmetric
  ⇔ ∀a, b ∈ A : ⟨a, b⟩ ∈ ⪯ and ⟨b, a⟩ ∈ ⪯ ⇒ a = b            definition of antisymmetry
  ⇔ ∀a, b ∈ A : ⟨b, a⟩ ∈ ⪯−1 and ⟨a, b⟩ ∈ ⪯−1 ⇒ a = b        definition of −1
  ⇔ ⪯−1 is antisymmetric                                     definition of antisymmetry

⪯ is transitive
  ⇔ ∀a, b, c ∈ A : ⟨a, b⟩ ∈ ⪯ and ⟨b, c⟩ ∈ ⪯ ⇒ ⟨a, c⟩ ∈ ⪯                definition of transitivity
  ⇔ ∀a, b, c ∈ A : ⟨b, a⟩ ∈ ⪯−1 and ⟨c, b⟩ ∈ ⪯−1 ⇒ ⟨c, a⟩ ∈ ⪯−1          definition of −1
  ⇔ ⪯−1 is transitive                                                    definition of transitivity

Thus ⪯ is a partial order if and only if ⪯−1 is a partial order.

8.132 Suppose that the relation ⪯ is a partial order—in other words, suppose that ⪯ is transitive, reflexive, and antisymmetric. Let R = {⟨a, b⟩ : a ⪯ b and a ≠ b}. Irreflexivity of R is immediate: by definition, ⟨a, a⟩ ∉ R for any a, because R requires its two components to differ. For transitivity, let a, b, and c be arbitrary and suppose ⟨a, b⟩ ∈ R and ⟨b, c⟩ ∈ R. Then a ≠ b and a ⪯ b and b ⪯ c by definition of R, and by the transitivity of ⪯ we have a ⪯ c. If a = c, then a ⪯ b and b ⪯ a and a ≠ b, violating the antisymmetry of ⪯; thus a ⪯ c and a ≠ c. Thus R is transitive.

8.133 Suppose that R is a transitive, antisymmetric relation. We'll show that there is no cycle of length k ≥ 2 in R by induction on k. For the base case (k = 2), a cycle would be a sequence ⟨a0, a1⟩ of distinct elements where ⟨a0, a1⟩ ∈ R and ⟨a1, a0⟩ ∈ R. But that's a direct violation of antisymmetry. For the inductive case (k ≥ 3), we assume the inductive hypothesis, that there is no cycle of length k − 1. Suppose for the purposes of a contradiction that a0, a1, . . . , ak−1 ∈ A forms a cycle of length k: that is, suppose that ⟨ai, ai+1 mod k⟩ ∈ R for each i ∈ {0, 1, . . . , k − 1}, with each ai distinct. But because ⟨ak−2, ak−1⟩, ⟨ak−1, a0⟩ ∈ R, by transitivity, ⟨ak−2, a0⟩ ∈ R too. But then a0, a1, . . . , ak−2 ∈ A is a cycle of length k − 1—and, by the inductive hypothesis, there are no cycles of length k − 1.

8.134 The points x = ⟨2, 4⟩ and y = ⟨3, 3⟩ are incomparable: ⟨x, y⟩ ∉ R because 4 ≰ 3 and ⟨y, x⟩ ∉ R because 3 ≰ 2.

8.135 For reflexivity, we have ⟨⟨a, b⟩, ⟨a, b⟩⟩ ∈ R because a ≤ a and b ≤ b. For transitivity, suppose ⟨⟨a, b⟩, ⟨c, d⟩⟩ ∈ R and ⟨⟨c, d⟩, ⟨e, f⟩⟩ ∈ R. Then a ≤ c ≤ e and b ≤ d ≤ f, and thus a ≤ e and b ≤ f—so ⟨⟨a, b⟩, ⟨e, f⟩⟩ ∈ R too. Finally, for antisymmetry, suppose that ⟨⟨a, b⟩, ⟨c, d⟩⟩ ∈ R and ⟨⟨c, d⟩, ⟨a, b⟩⟩ ∈ R. But then a ≤ c and c ≤ a (so a = c), and b ≤ d and d ≤ b (so b = d). Thus ⟨a, b⟩ = ⟨c, d⟩.
8.136 ⟨1, 1⟩, ⟨1, 3⟩, ⟨1, 5⟩, ⟨2, 2⟩, ⟨2, 4⟩, ⟨2, 5⟩, ⟨3, 3⟩, ⟨3, 5⟩, ⟨4, 4⟩, ⟨4, 5⟩, ⟨5, 5⟩

8.137 ⟨1, 1⟩, ⟨1, 2⟩, ⟨1, 3⟩, ⟨1, 4⟩, ⟨1, 5⟩, ⟨2, 2⟩, ⟨2, 3⟩, ⟨2, 4⟩, ⟨2, 5⟩, ⟨3, 3⟩, ⟨3, 5⟩, ⟨4, 4⟩, ⟨4, 5⟩, ⟨5, 5⟩

8.138 ⟨1, 1⟩, ⟨1, 2⟩, ⟨1, 3⟩, ⟨1, 4⟩, ⟨1, 5⟩, ⟨2, 2⟩, ⟨2, 5⟩, ⟨3, 3⟩, ⟨3, 5⟩, ⟨4, 4⟩, ⟨5, 5⟩

8.139 ⟨1, 1⟩, ⟨1, 4⟩, ⟨1, 5⟩, ⟨2, 2⟩, ⟨2, 4⟩, ⟨2, 5⟩, ⟨3, 3⟩, ⟨3, 4⟩, ⟨3, 5⟩, ⟨4, 4⟩, ⟨4, 5⟩, ⟨5, 5⟩

8.140 [Hasse diagram of the ⊆ order on the subsets of {1, 2, 3}: {} on the bottom level; {1}, {2}, {3} above it; {1, 2}, {1, 3}, {2, 3} above those; and {1, 2, 3} at the top.]

8.141 [Hasse diagram on bit strings: 0 and 1 on the bottom level; 00, 01, 10, 11 in the middle; and 000, 001, 010, 011, 100, 101, 110, 111 on top, with each string placed above its prefixes.]


8.142 201 is the only immediate predecessor; 203 is the only immediate successor.

8.143 The immediate predecessors are {101, 2}; there are infinitely many immediate successors, namely all of the integers in the set {202p : p is prime}.

8.144 One such strict partial order is the transitive closure of R = {⟨n, 2n⟩ : n ∈ Z≥1} ∪ {⟨n, 2n + 1⟩ : n ∈ Z≥1}. [Note that the relation R is the "child of" relation for a heap stored in an array (see Figure 2.60). Each index has precisely two children.] This relation is irreflexive (no n ≥ 1 satisfies n ∈ {2n, 2n + 1}) and transitive by the definition of the transitive closure. Each n has precisely two immediate successors (2n and 2n + 1): there's no m < 2n such that n ⪯ m, so the only concern is that 2n ⪯ 2n + 1; but n ≥ 1 implies that 2n ≥ 2, and the successors of 2n are 4n > 2n + 1 and 4n + 1 > 2n + 1. Thus 2n and 2n + 1 are immediate successors of n.

8.145 Let ⪯ be a partial order on a finite set A. We're asked to prove that there must be an a ∈ A that has fewer than two immediate successors, but we'll prove something stronger: there must be an a ∈ A that has no immediate successors. Namely, let a be a maximal element, which is guaranteed to exist by Theorem 8.21. By definition, that maximal element has no immediate successors (or even non-immediate successors).

8.146 No integer n is a maximal element, because ⟨n + 1, n⟩ ∈ ≥.

8.147 Consider the infinite set R>0 and the partial order ≥. (Remember that 0 ∉ R>0.) There is no maximal element under ≥ in R>0 for the same reason as with the integers: for any x, ⟨x + 1, x⟩ ∈ ≥, so x is not maximal. There is no minimal element because, for any x, ⟨x, x/2⟩ ∈ ≥, so x is not minimal.

8.148 By definition, a minimum element x has the property that every y satisfies y ⪯ x. Thus, if a and b are both minimum elements, we have a ⪯ b and b ⪯ a. But ⪯ is antisymmetric, so from a ⪯ b and b ⪯ a we know that a = b.

8.149 By definition of a being a minimum element, for any x we have a ⪯ x.
We need to show that a is a minimal element—that is, there is no b ≠ a with b ⪯ a. Consider any b such that b ⪯ a. Then both a ⪯ b and b ⪯ a—but, by the antisymmetry of ⪯, we therefore have a = b.

8.150 A minimal element under ⪯ is a word w such that no other word is "contained in" the letters of w—that is, no subset of the letters of w can be rearranged to make a word. A maximal element under ⪯ is a word w such that no other word can be formed from the letters of w—that is, no letters can be added to w so that the resulting sequence can be rearranged to make a word.

8.151 The immediate successors of GRAMPS that I found in /usr/share/dict/words (filtered to remove any words that contained characters other than the 26 English letters) were the following. (I found them using the Python implementation in Figure S.8.4.)

CAMPGROUNDS EPIGRAMS GAMEKEEPERS HOMOGRAPHS MAGNETOSPHERE MAINSPRING MONOGRAPHS PARADIGMS PLAGIARISM POSTMARKING PRAGMATICS PRAGMATISM PRAGMATIST PROGRAMS PROMULGATES PTARMIGANS RAMPAGES

8.152 The longest minimal elements that I found in /usr/share/dict/words (filtered to remove any words that contained characters other than the 26 English letters) were the following 6-letter words:

ACACIA COCCUS JEJUNE JUJUBE LUXURY MUKLUK NUZZLE PUZZLE SCUZZY UNIQUE VOODOO

I again found them using the Python implementation in Figure S.8.4; see minimal_elements. (In a Scrabble dictionary, the longest minimal element that I found was the word VIVIFIC [adj., "imparting spirit or vivacity"].)

8.153 A minimal element is a cell that does not depend on any other cell. A maximal element is a cell whose value is not used by the formula for any other cell in the spreadsheet.

8.154 ⟨1, 2, 3, 4, 5⟩, ⟨1, 2, 4, 3, 5⟩, ⟨1, 3, 2, 4, 5⟩, ⟨2, 1, 3, 4, 5⟩, ⟨2, 1, 4, 3, 5⟩, ⟨2, 4, 1, 3, 5⟩

8.155 ⟨1, 2, 3, 4, 5⟩, ⟨1, 2, 4, 3, 5⟩


def goes_to(wordA, wordB):
    '''Can wordB be formed from wordA (plus at least one additional letter)?'''
    if len(wordA) >= len(wordB):
        return False
    for letter in wordA:
        if wordA.count(letter) > wordB.count(letter):
            return False
    return True

def immediate_successors(word, words):
    candidates = [w for w in words if goes_to(word, w)]
    result = []
    for w in candidates:
        immediate = True
        for found in candidates:
            if goes_to(found, w):
                immediate = False
                break
        if immediate:
            result.append(w)
    return sorted(result)

def minimal_elements(words):
    result = []
    # We consider the words in increasing order of length.  If you don't sort,
    # then finding the minimal elements gets WAY slower!
    for w in sorted(words, key=lambda x: len(x)):
        minimal = True
        for pred in result:
            if goes_to(pred, w):
                minimal = False
                break
        if minimal:
            result.append(w)
    return result

Figure S.8.4 An implementation of the anagram game in Python.

8.156 ⟨1, 2, 3, 4, 5⟩, ⟨1, 2, 3, 5, 4⟩, ⟨1, 2, 4, 3, 5⟩, ⟨1, 3, 2, 4, 5⟩, ⟨1, 3, 2, 5, 4⟩, ⟨1, 3, 4, 2, 5⟩, ⟨1, 4, 2, 3, 5⟩, ⟨1, 4, 3, 2, 5⟩

8.157 ⟨1, 2, 3, 4, 5⟩, ⟨1, 3, 2, 4, 5⟩, ⟨2, 1, 3, 4, 5⟩, ⟨2, 3, 1, 4, 5⟩, ⟨3, 1, 2, 4, 5⟩, ⟨3, 2, 1, 4, 5⟩

8.158 {1, 3}, {1, 5}, {3, 5}, {1, 3, 5}, {2, 4}, {2, 5}, {4, 5}, {2, 4, 5}

8.159 All subsets with two or more elements that do not contain both 3 and 4. There are 32 total subsets, of which 6 are too small (empty or singletons); there are 8 that contain both 3 and 4. The remaining 18 subsets are chains.

8.160 {1, 2}, {1, 4}, {3, 2}, {3, 4}

8.161 {3, 4} only

8.162 Yes: define ⪯ to be {⟨a, a⟩ : a ∈ A}. This relation is reflexive, symmetric, antisymmetric, and transitive—which is all that we need to have the relation be both a partial order and an equivalence relation.

9 Counting

9.2 Counting Unions and Sequences

9.1 The length of a tweet can range from i = 0 (an empty tweet) up to i = 140. The total number of possible tweets is

∑_{i=0}^{140} 256^i = (256^141 − 1)/(256 − 1) = ((2^8)^141 − 1)/255 = (2^1128 − 1)/255,

which comes out to approximately 1.43 × 10^337 tweets.

9.2 The number of digit-digit-digit-letter-letter-letter license plates is 10 · 10 · 10 · 26 · 26 · 26 = 17,576,000.

9.3 The number of letter-letter-letter-digit-digit-digit-digit license plates is 26 · 26 · 26 · 10 · 10 · 10 · 10 = 175,760,000.

9.4 The number of digit-letter-letter-letter-letter-digit license plates is 10 · 26 · 26 · 26 · 26 · 10 = 45,697,600.

9.5 There were 26^4 · 10^4 = 260^4 old license plates. Your new format has 36^k possibilities if your plates contain k symbols. So you need to ensure that 36^k ≥ 260^4. In other words, you need

k ≥ log_36(260^4) = 4 · (log 260 / log 36) ≈ 6.207.

You therefore need just over 6 symbols—which means you need to pick k = 7 (or larger).

9.6 Counting each type of license plate separately:

• digit-digit-digit-letter-digit-digit. There are 26 · 10^5 possibilities.
• digit-digit-digit-letter-letter-digit-digit (with the first letter ≤ P). Because |{A, . . . , P}| = 16, there are 16 · 26 · 10^5 possibilities.
• digit-digit-digit-digit-letter-letter-digit-digit (with the first letter ≥ Q). Similarly, |{Q, . . . , Z}| = 10, so there are 10 · 26 · 10^6 possibilities.
• digit-digit-digit-letter-letter-letter-digit-digit. There are 26^3 · 10^5 possibilities.

Thus the total number of plates is 26 · 10^5 · (1 + 16 + 100 + 26^2) = 2,061,800,000.

9.7 There are 10^3 = 1000 numbers of length 3, and 10^4 = 10,000 of length 4, and 10^5 = 100,000 of length 5, leading to a total of 1000 + 10,000 + 100,000 = 111,000. (If you interpreted the system as forbidding leading zeroes in a password, then the set of legal passwords is precisely the set of numbers between 100 and 99,999, in which case there are 99999 − 99 = 99,900 valid choices.)

9.8 The number of legal passwords is 10^4 + 10^5 + 10^6 = 10,000 + 100,000 + 1,000,000 = 1,110,000. (Again, if you forbade leading zeroes, then any number between 1000 and 999,999 will do; there are 999,999 − 999 = 999,000 passwords valid under this reading.)

9.9 There are 60 powers. (To avoid any off-by-one errors, I checked this count with a little snippet of Python:

powers = [x / 4 for x in range(-10 * 4, 8 * 4 + 1)  # x / 4 is the power (allowing x to be an int)
          if x != 0 and (abs(x / 4)

def count_rearrangements_brute_force(s):
    if len(s) > 12:
        return "??? [too long for brute force on my laptop!]"
    orderings = set([s[0]])
    for ch in s[1:]:
        orderings = {ordering[:i] + ch + ordering[i:]
                     for ordering in orderings
                     for i in range(len(ordering) + 1)}
    return len(orderings)

for s in ["PASCAL", "GRACEHOPPER", "ALANTURING", "CHARLESBABBAGE", "ADALOVELACE", "PEERTOPEERSYSTEM"]:
    print(s, count_rearrangements_formula(s), count_rearrangements_brute_force(s))

Figure S.9.5 Computing the ways to rearrange a given string’s letters.

9.3 Using Functions to Count


import math

def count_prime_factorization_multiplicities(n):
    '''
    Computes a list of the exponents of the prime factors in the prime
    factorization of n.  For example, the prime factorization of 600 is
    2^3 * 3^1 * 5^2, so the result is [3, 1, 2].
    '''
    candidate = 2
    factorCounts = []
    while n > 1:
        count = 0
        while n % candidate == 0:
            n = n // candidate
            count += 1
        if count >= 1:
            factorCounts.append(count)
        candidate += 1
    return factorCounts

def count_prime_factorizations(n):
    factorCounts = count_prime_factorization_multiplicities(n)
    result = math.factorial(sum(factorCounts))
    for count in factorCounts:
        result = result // math.factorial(count)
    return result

most_factorizations = max(range(10001), key=count_prime_factorizations)
print(most_factorizations, "has", count_prime_factorizations(most_factorizations), "factorizations")

Figure S.9.6 Counting prime factorizations.

9.107 There are only 850 unique boards:

  number of moves made:     0  1   2   3    4    5    6    7   8   9
  number of unique boards:  1  3  12  38  108  174  228  174  89  23

See Figures S.9.7 and S.9.8 for the code, especially the rotate, reflect, and matches_equivalent methods.

9.108 There are 765 boards in the tree. See Figure S.9.8, especially the stop_once_won optional argument.

9.109 Each iteration removes 2 people, so the ith person who gets to choose a partner is choosing when there are n − 2i + 2 unassigned people, including herself. So she has n − 2i + 1 choices. There are n/2 rounds of choices, so the total number of choices is

∑... rather, ∏_{i=1}^{n/2} (n − 2i + 1) = ∏_{j=0}^{(n/2)−1} (2j + 1).


import copy

class Board:
    def __init__(self, *args):
        '''
        Construct a new empty board (no arguments), a copy of a given board
        (one argument), or (four arguments: oldBoard, col, row, mover) a new
        board that extends oldBoard but with an additional move at (col, row)
        by mover.
        '''
        if len(args) == 0:
            self.cells = [[None, None, None], [None, None, None], [None, None, None]]
        elif len(args) == 1:
            self.cells = args[0].copy()
        elif len(args) == 4:
            oldBoard, xMove, yMove, mover = args
            self.cells = copy.deepcopy(oldBoard.cells)
            self.cells[xMove][yMove] = mover

    def rotate(self):
        '''Create a new Board, rotated 90 degrees from this one.'''
        return Board([[self.cells[i][0] for i in [2, 1, 0]],
                      [self.cells[i][1] for i in [2, 1, 0]],
                      [self.cells[i][2] for i in [2, 1, 0]]])

    def reflect(self):
        '''Create a new Board, reflected horizontally from this one.'''
        return Board([[self.cells[0][i] for i in [2, 1, 0]],
                      [self.cells[1][i] for i in [2, 1, 0]],
                      [self.cells[2][i] for i in [2, 1, 0]]])

    def has_winner(self):
        '''Has anyone won this game yet?'''
        return any([self.cells[xA][yA] == self.cells[xB][yB] == self.cells[xC][yC] != None
                    for ((xA,yA), (xB,yB), (xC,yC))
                    in [((0,0), (0,1), (0,2)), ((1,0), (1,1), (1,2)), ((2,0), (2,1), (2,2)),  # rows
                        ((0,0), (1,0), (2,0)), ((0,1), (1,1), (2,1)), ((0,2), (1,2), (2,2)),  # columns
                        ((0,0), (1,1), (2,2)), ((2,0), (1,1), (0,2))]])                       # diagonals

    def get_subsequent_boards(self, mover):
        '''Creates new Boards for every possible move that mover can make in this board.'''
        return [Board(self, x, y, mover)
                for x in range(3) for y in range(3)
                if self.cells[x][y] == None]

    def matches_exactly(self, other_boards):
        '''Is this Board exactly identical to any Board in other_boards?'''
        return any([self.cells == other.cells for other in other_boards])

    def matches_equivalent(self, other_boards):
        '''Is this Board equivalent (under rotation/reflection) to any Board in other_boards?'''
        return any([board.matches_exactly(other_boards)
                    or board.reflect().matches_exactly(other_boards)
                    for board in [self, self.rotate(), self.rotate().rotate(),
                                  self.rotate().rotate().rotate()]])

Figure S.9.7 Building a game tree for Tic-Tac-Toe.

9.110 Following the hint, we will start from the right-hand side n!/((n/2)! · 2^(n/2)) and rewrite both factorials and 2^(n/2) using product notation:

n!/((n/2)! · 2^(n/2))
  = [∏_{j=1}^{n} j] / ([∏_{j=1}^{n/2} j] · [∏_{j=1}^{n/2} 2])        definition of factorial/exponentiation


def build_game_tree(prevent_duplicates=False, prevent_equivalents=False, stop_once_won=False):
    '''
    Builds the full game tree for Tic-Tac-Toe.
      If prevent_duplicates: do not create multiple copies of the same board
        (which arise from the same moves in different orders).
      If prevent_equivalents: do not create multiple copies of equivalent
        boards (including by rotation and reflection).
      If stop_once_won: do not create child boards if the game is already over
        (because one of the players has won already).
    '''
    total_board_count = 0
    current_layer = [Board()]
    for move in range(10):
        print("frontier", move, "has length", len(current_layer))
        total_board_count += len(current_layer)
        next_layer = []
        for board in current_layer:
            if not stop_once_won or not board.has_winner():
                for child in board.get_subsequent_boards(move % 2):
                    if (prevent_duplicates and not child.matches_exactly(next_layer)) \
                       or (prevent_equivalents and not child.matches_equivalent(next_layer)) \
                       or (not prevent_duplicates and not prevent_equivalents):
                        next_layer.append(child)
        current_layer = next_layer
    print(total_board_count, "total boards.")

build_game_tree()                                               # Exercise 9.103 and 9.105
build_game_tree(prevent_duplicates=True)                        # Exercise 9.104 and 9.106
build_game_tree(prevent_equivalents=True)                       # Exercise 9.107
build_game_tree(stop_once_won=True, prevent_equivalents=True)   # Exercise 9.108

Figure S.9.8 Building a game tree for Tic-Tac-Toe, continued.

  = ([∏_{j=0}^{(n/2)−1} (2j + 1)] · [∏_{j=1}^{n/2} 2j]) / ([∏_{j=1}^{n/2} j] · [∏_{j=1}^{n/2} 2])
                                                splitting the numerator into even and odd multiplicands
  = ∏_{j=0}^{(n/2)−1} (2j + 1)                  cancelling 2j/(2 · j) for each j = 1 . . . n/2
  = ∏_{i=1}^{n/2} (n − 2i + 1).                 reindexing, with i = n/2 − j
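The identity just derived (which also matches the product from 9.109) is easy to confirm numerically for small even n. This is a quick sketch, not from the printed manual; the function names are mine:

```python
from math import factorial, prod

def closed_form(n):
    # n! / ((n/2)! * 2^(n/2))
    return factorial(n) // (factorial(n // 2) * 2 ** (n // 2))

def odd_product(n):
    # (2*0 + 1) * (2*1 + 1) * ... * (n - 1), i.e., the product of the odd numbers below n
    return prod(2 * j + 1 for j in range(n // 2))

for n in range(2, 21, 2):
    assert closed_form(n) == odd_product(n)
print(closed_form(10))  # 945 = 9 * 7 * 5 * 3 * 1
```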

9.111 999 prefix reversals: choose any positive number (other than 1) of genes to reverse.

9.112 If we choose i as the first index of the range we're reversing, then there are 1000 − i choices for j: any of the elements from {i + 1, i + 2, . . . , 1000} (which contains all but the first i choices). We cannot choose i = 1000. Thus the total number is

∑_{i=1}^{999} (1000 − i) = ∑_{i′=1}^{999} i′        reindexing, with i′ = 1000 − i
                         = 999 · 1000/2 = 499,500.
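The sum in 9.112 can be checked directly (a one-line sketch, in the spirit of the manual's other verification snippets):

```python
# Exercise 9.112: sum over all first indices i of the 1000 - i choices for j.
total = sum(1000 - i for i in range(1, 1000))
print(total)  # 499500
```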

9.113 If we choose j as the last index of the range we're moving to the right, then there are j choices for the first moved element i: any spot from {1, 2, . . . , j}. Having chosen j, there are 1000 − j choices for the element to the right of which the transposed segment will be placed: any spot from {j + 1, j + 2, . . . , 1000}. Thus the total number of transpositions is

∑_{j=1}^{1000} j(1000 − j) = 1000 · [∑_{j=1}^{1000} j] − [∑_{j=1}^{1000} j²]
                           = 1000 · 1000 · 1001/2 − 1000 · 1001 · 2001/6
                           = 500,500,000 − 333,833,500 = 166,666,500.
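The transposition count in 9.113 can likewise be verified by direct summation (a quick sketch):

```python
# Exercise 9.113: sum over all last indices j of j * (1000 - j).
total = sum(j * (1000 - j) for j in range(1, 1001))
print(total)  # 166666500
```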

9.114 There are 20 · 20 = 400 different opening move combinations. The total number of games in the tournament is 150 · 7/2 = 525, which exceeds 400. Thus by the Pigeonhole Principle, at least two games have the same opening pair of moves.

9.115 A sequence of 7 wins and losses is an element of {W, L}^7. This set has cardinality 2^7 = 128. There are 150 players, and 150 > 128. Thus by the pigeonhole principle there must be two players with precisely the same sequence of wins and losses.

9.116 There are only 8 different possible records—a player can win k games for any k ∈ {0, 1, . . . , 7}, which has size 8. Thus there are 8 categories of players, and there must be a category with at least ⌈150/8⌉ = 19 players in it.

9.117 If a competitor has k wins, then there are 8 − k possibilities for their number of losses: anywhere between 0 and all 7 − k remaining games. Thus the number of possible win–loss–draw records is

∑_{k=0}^{7} (8 − k) = ∑_{j=1}^{8} j = 8 · 9/2 = 36.

Thus there are 36 categories of players, and so there must be a category with at least ⌈150/36⌉ = 5 players in it.

9.118 An update rule is a function f from the states of cell u and cell u's eight neighbors to the new state of cell u. Denote a cell's state as an element of {0, 1}, where 0 denotes inactive and 1 denotes active. Thus, more formally, f is a function f : {0, 1}^9 → {0, 1}. The set of functions from A to B has cardinality |B|^|A|—for each input element a ∈ A, we specify which element b ∈ B is the corresponding output—so there are |{0, 1}|^|{0, 1}^9| possible update rules. Because |{0, 1}^9| = 2^9 = 512, there are a total of |{0, 1}|^512 = 2^512 ≈ 1.34078079 × 10^154 update rules.

9.119 A strictly cardinal update rule is a function f : {0, 1} × {0, 1, . . . , 8} → {0, 1}. Thus there are 18 different input values (on + 0, on + 1, . . . , on + 8; off + 0, . . . , off + 8), and therefore 2^18 = 262,144 different strictly cardinal update rules.

9.120 In a 10-by-10 lattice, there are only 100 cells, each of which can be in one of two states—so there are "only" 2^100 different configurations of the system. So, let K = 2^100 + 1. By the pigeonhole principle, there must be a configuration that appears more than once in the sequence M0, . . . , MK. Suppose that the configurations Mx and My are the same, for x < y. This entire system's evolution is deterministic, so Mx+1 = My+1 too, and Mx+2 = My+2, and so forth. Thus we're stuck in an infinite loop: once we reach Mx, later (y − x steps later, to be precise), we're back in the same configuration. Another y − x steps later, we're back to the same configuration again. By the pigeonhole principle argument, we know that MK is part of this infinite loop. If MK = MK+1, then the infinite loop has "period" one: we just repeat MK forever. (And thus we have eventual convergence.) If MK ≠ MK+1 then the infinite loop contains at least two distinct configurations, and the system oscillates.

9.121 The hash function maps A → B, with |A| = 1234 and |B| = 17. By the division rule, there must be a bucket with at least ⌈|A|/|B|⌉ elements—that is, with ⌈1234/17⌉ = 73 elements. If the hash function divides elements perfectly evenly, then 73 would be the maximum occupancy.
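The arithmetic in 9.117 and 9.121 can be double-checked in the manual's usual snippet style (a quick sketch, not from the text):

```python
import math

# Exercise 9.117: enumerate the win-loss-draw records for 7 games.
records = [(wins, losses, 7 - wins - losses)
           for wins in range(8) for losses in range(8 - wins)]
print(len(records), math.ceil(150 / len(records)))  # 36 5

# Exercise 9.121: the division-rule bound for 1234 keys in 17 buckets.
print(math.ceil(1234 / 17))  # 73
```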


9.122 The condition is simply that |A| ≥ 202|B|. If this condition holds, then by the generalized version of the Pigeonhole Principle, there are |B| categories of objects (the |B| different possible output values of f), and some category must therefore have at least ⌈|A|/|B|⌉ ≥ ⌈202|B|/|B|⌉ = 202 elements.

9.123 There are 5 categories and n inputs, so there must be an output that's assigned at least n/5 inputs. If that output ki is precisely in the middle of the range of values mapped to it, then there must be a point that's n/10 away.

9.4 Combinations and Permutations

9.124 C(9, 4) = C(9, 5) = 126: there are a total of 9 characters, of which 4 come from BACK.

9.125 C(8, 3) = C(8, 5) = 56: there are a total of 8 characters, of which 3 come from DAY.
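Both binomial values are easy to confirm with math.comb (a quick check, in the manual's snippet style):

```python
from math import comb
print(comb(9, 4), comb(8, 3))  # 126 56
```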

9.126 C(12, 6) = 924: the only repetition (of P and D) comes within one of the words, and that doesn't affect the number of shuffles.

9.127 There are C(9, 4) shuffles, but any shuffle starting with a shuffle of LIF and D and ending with EEATH is counted twice—once with LIFE's E first, and once with DEATH's E first. There are C(4, 3) = 4 such shuffles, so the total number of shuffles of LIFE and DEATH is C(9, 4) − 4 = 122.

9.128 There are only two shuffles, because the resulting string must start with an O and end with an N, so the only choices are ONON and OONN.

9.129 The result must start with O. If its form is OO????, then there are two shuffles, corresponding to the two shuffles of UT and UT, just as in the previous exercise. If its form is OU????, then there are three shuffles, corresponding to the three shuffles of T and OUT, depending on whether nothing, the O, or the OU come before the first T. Thus the total number of shuffles is 5.

9.130 Here is a solution in Python:

def all_shuffles_with_duplicates(x, y):
    '''Compute a list of all shuffles (with duplicates allowed) of the strings x and y.'''
    if len(x) == 0:
        return [y]
    elif len(y) == 0:
        return [x]
    else:
        return [x[0] + rest for rest in all_shuffles_with_duplicates(x[1:], y)] \
               + [y[0] + rest for rest in all_shuffles_with_duplicates(x, y[1:])]

def count_shuffles(x, y):
    return len(set(all_shuffles_with_duplicates(x, y)))
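A brute force along the same lines confirms the counts claimed in 9.127 and 9.128. This is a self-contained sketch (my own helper, so it runs independently of the code above):

```python
def shuffles(x, y):
    # The set of distinct interleavings of x and y (each word's letter order preserved).
    if not x or not y:
        return {x + y}
    return ({x[0] + rest for rest in shuffles(x[1:], y)}
            | {y[0] + rest for rest in shuffles(x, y[1:])})

print(len(shuffles("LIFE", "DEATH")))  # 122
print(len(shuffles("ON", "ON")))       # 2
```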

9.131 If no letters are shared between the two strings, then the total number of shuffles is C(|x| + |y|, |x|). The largest binomial coefficient C(n, k) is for k ∈ {⌊n/2⌋, ⌈n/2⌉}, so the largest possible number of shuffles for two strings of total length n is C(n, ⌊n/2⌋) = C(n, ⌈n/2⌉).

9.132 Suppose that x is AAA· · ·A and y is AAA· · ·A. Then, no matter the order in the shuffle, the resulting shuffle is simply n As—no matter the relative lengths of x and y. In this case, then, there is only one possible shuffle.


Counting

9.133 Suppose all symbols are distinct. Then, for each position in the shuffle, we have to choose which of the three input strings it comes from. The easiest way to count this: of the a + b + c slots, we choose a that come from the first string, and then of the b + c remaining slots we choose which b come from the second string. Thus the total number of shuffles is C(a + b + c, a) · C(b + c, b).

9.134 C(42, 16)

9.135 C(23, 0) + C(23, 1) + C(23, 2) + C(23, 3) = 1 + 23 + 253 + 1771 = 2048 = 2^11

9.136 In a 32-bit string, having a number of ones within ±2 of the number of zeros means having 15, 16, or 17 ones (and 17, 16, or 15 zeros). Thus the number of such strings is C(32, 15) + C(32, 16) + C(32, 17).

9.137 C(64, 32) is roughly 1.8326 × 10^18. The first k for which Σ_{i=0}^{k} C(64, i) is within a factor of twenty of that number is k = 23:

    Σ_{i=0}^{23} C(64, i) ≈ 3.022 × 10^17
    Σ_{i=0}^{24} C(64, i) ≈ 5.529 × 10^17
    Σ_{i=0}^{25} C(64, i) ≈ 9.539 × 10^17
    Σ_{i=0}^{26} C(64, i) ≈ 1.555 × 10^18
    Σ_{i=0}^{27} C(64, i) ≈ 2.402 × 10^18

So there are as many 32-one strings as there are (≤ 27)-one strings in {0, 1}^64.
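The cumulative-sum comparison in Exercise 9.137 can be reproduced directly with Python's math.comb (a quick sketch, not part of the original solution):

```python
from math import comb

target = comb(64, 32)  # roughly 1.8326e18
total, k = 0, -1
# find the first k for which sum_{i=0}^{k} C(64, i) reaches C(64, 32)
while total < target:
    k += 1
    total += comb(64, k)
print(k)  # 27: the (<= 27)-one strings are the first to outnumber the 32-one strings
```

This confirms that the strings with at most 26 ones are outnumbered by the strings with exactly 32 ones, while those with at most 27 ones are not.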

9.138 We seek the smallest n for which C(n, n/2) ≤ 0.10 · 2^n. It turns out that n = 64 is the breaking point:

    C(62, 31) / 2^62 = 0.1009· · ·  and  C(64, 32) / 2^64 = 0.0993· · · .

9.139 52 − 13 = 39 cards are nonspades. We must choose all 13 cards from this set, so there are C(39, 13) such hands.

9.140 52 − 13 = 39 cards are nonhearts. We must choose 12 cards from this set, and one heart, so there are C(39, 12) · C(13, 1) such hands.

9.141 There are 48 nonkings, and 9 cards left to choose, so there are C(48, 9) such hands.

9.142 There are 48 nonqueens, and 13 cards left to choose, so there are C(48, 13) such hands.

9.143 There are 48 nonjacks, and 4 jacks. We choose 11 of the former and two of the latter, so there are C(48, 11) · C(4, 2) such hands.

9.144 There are 44 nonjack-nonqueens, 4 jacks, and 4 queens. We choose 9 of the first, and two each of the latter two categories. Thus there are C(44, 9) · C(4, 2) · C(4, 2) such hands.

9.145 We choose a suit for the honors, and then the remaining 8 cards from the other 47 cards in the deck. So there are C(4, 1) · C(47, 8) choices. However, this overcounts, because it's possible to have honors in more than one suit at the same time. The number of hands with double high honors can be thought of this way: choose two suits in which to have honors, then choose the remaining three cards from the other 42 cards in the deck: C(4, 2) · C(42, 3). It's impossible to have triple honors—that would require 15 cards. Thus the total number of hands with high honors is C(4, 1) · C(47, 8) − C(4, 2) · C(42, 3).


9.4 Combinations and Permutations

9.146 Simply, we can't have any face cards. There are 9 non-face cards in each suit, for 36 total—thus we have C(36, 13) such hands.

9.147 A zero-point hand has only non-face cards, and has three or more cards in each suit. That is, the hand must have (3, 3, 3, 4) cards per suit. Thus we can represent a zero-point hand as follows: first we choose which suit gets four cards. Then we choose 4 of the 9 non-face cards in the 4-card suit, and in each 3-card suit we choose 3 of the 9 non-face cards. In total, that's 4 · C(9, 4) · C(9, 3)^3 hands. There are C(52, 13) total bridge hands. Therefore the total fraction of hands that have zero points is 0.0004704· · · . I seem to get them a lot more often than that.

9.148 202^32

9.149 202!/(202 − 32)! = 202!/170!

9.150 C(202 + 32 − 1, 32) = C(233, 32) = 233!/(201! · 32!)

9.151 C(202, 32) = 202!/((202 − 32)! · 32!) = 202!/(170! · 32!)

9.152 C(10, 5) = 252

9.153 C(14, 5) = 2002

9.154 C(29, 10) = 20,030,010

9.155 C(20, 10) = 184,756

9.156 C(2n, n)

9.157 2n · (2n − 1) · (2n − 2) · · · (n + 1) = (2n)!/n!

9.158 All that matters is how many of the n elements of x are matched by each element of y; the alignment follows immediately. We can think of this process, then, as choosing n elements out of a set of 2n options, where repetition is allowed but order doesn't matter. Thus there are C(2n + n − 1, n) = C(3n − 1, n) ways to make this choice.

9.159 All that matters is which n elements of y we match; the alignment follows immediately. There are C(2n, n) ways of doing this.

9.160 This question is equivalent to choosing 202 elements from {a, b, c}, where order doesn't matter and repetition is allowed. So the total number of choices is C(204, 202) = 20,706.

9.161 This question is equivalent to choosing 8 elements from {a, b, c, d, e}, where order doesn't matter and repetition is allowed, so the number of ways is C(12, 8) = 495.

9.162 This question is equivalent to choosing 88 elements from {a, b, c, d, e}, where order doesn't matter and repetition is allowed, so the number of ways is C(92, 88) = 2,794,155.
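The numeric answers above can be spot-checked with Python's built-in math.comb (a quick verification sketch, not part of the original solutions):

```python
from math import comb

# spot-check the closed-form answers to Exercises 9.152-9.155 and 9.160-9.162
assert comb(10, 5) == 252
assert comb(14, 5) == 2002
assert comb(29, 10) == 20_030_010
assert comb(20, 10) == 184_756
assert comb(204, 202) == 20_706
assert comb(12, 8) == 495
assert comb(92, 88) == 2_794_155
print("all binomial spot-checks pass")
```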

9.163 Suppose b = 64 − i; then a + c = 2i; by Theorem 9.16 there are C(2i + 1, 2i) = 2i + 1 ways to choose a and c to make a + c = 2i (one for each a ∈ {0, 1, . . . , 2i}). Thus the number of choices is

    Σ_{i=0}^{64} C(2i + 1, 2i) = Σ_{i=0}^{64} (2i + 1)        [above discussion]
                               = 65 + 2 · Σ_{i=1}^{64} i
                               = 65 + 2 · 32 · 65 = 65^2.

9.164 C(141, 3)

9.165 10^3

9.166 C(29, 20)

9.167 C(11, 5) locks—one to ensure that each subset of size 5 is locked out. (For a subset A of size 5, everyone other than the members of A gets the key associated with subset A; the members of A are the ones who do not get that key.) Each scientist needs a key for every subset of size 6 she's in, so she must carry C(10, 5) keys—there are 10 other scientists of which 5 are chosen in a particular subset.

9.168 There are n! orderings of the data, and then we divide by the number of orderings within each group ((n/10)! each), and the number of orderings of the groups (10!):

    n! / (((n/10)!)^10 · 10!)

9.169 One way to think about this problem: we write down n zeros, which leaves n + 1 "gaps" before/between/after the zeros. We must choose which k of these gaps to fill with a one—so there are C(n + 1, k) ways to do so.

Alternatively, let T(n, k) denote the number of valid bitstrings for a given k and n. We'll prove for any n ≥ 0 and k ≥ 0 that T(n, k) = C(n + 1, k) by induction on n.

Base case (n = 0). If k = 0, there's only one valid string (the empty string); if k = 1, there's still one valid string (1); if k ≥ 2, there's no valid string (because we have no zeros to separate the ones). Indeed C(1, 0) = C(1, 1) = 1 and C(1, k) = 0 for k ≥ 2.

Inductive case (n ≥ 1). We assume the inductive hypothesis that, for any k ≥ 0, we have T(n − 1, k) = C(n, k). We must prove that T(n, k) = C(n + 1, k) for any k ≥ 0. Every (n, k)-valid string either starts with 0 and continues with an (n − 1, k)-valid suffix, or it starts with 1, followed immediately by a 0 (otherwise there'd be two consecutive ones), and then continues with an (n − 1, k − 1)-valid suffix. In other words, T(n, k) = T(n − 1, k) + T(n − 1, k − 1). Thus we have

    T(n, k) = T(n − 1, k) + T(n − 1, k − 1)    [above discussion]
            = C(n, k) + C(n, k − 1)            [inductive hypothesis]
            = C(n + 1, k).                     [Pascal's Identity (Theorem 9.19)]
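The identity T(n, k) = C(n + 1, k) can also be confirmed by brute-force enumeration for small parameters (a verification sketch; the helper name is illustrative):

```python
from itertools import product
from math import comb

def count_separated(n, k):
    """Brute force: count bitstrings with n zeros and k ones in which
    no two ones are adjacent."""
    return sum(1 for bits in product("01", repeat=n + k)
               if bits.count("0") == n and "11" not in "".join(bits))

# agrees with C(n+1, k) for all small n and k
for n in range(7):
    for k in range(7):
        assert count_separated(n, k) == comb(n + 1, k)
print("T(n, k) = C(n+1, k) verified for n, k < 7")
```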

9.170 There is a bijection between bitstrings x ∈ {0, 1}^{n+k} with n zeros and k ones in even-length blocks and strings y ∈ {0, 2}^{n+(k/2)} with n zeros and k/2 twos, where the bijection replaces 11 with a 2. The latter set has size C(n + (k/2), n) by definition.

9.171 The combinatorial proof: suppose we have a team of n players, and we need to choose a starting lineup of k of them, of whom one will be the captain. The quantity k · C(n, k) represents first choosing the k starters—that's C(n, k) choices; then we pick one of them as the captain, with C(k, 1) = k options. The quantity n · C(n − 1, k − 1) represents first choosing a captain—with C(n, 1) = n choices; then we pick the k − 1 remaining starters from the n − 1 remaining players, with C(n − 1, k − 1) options.

The algebraic proof:

    k · C(n, k) = k · n! / ((n − k)! · k!)                          [definition of binomial coefficient]
                = k · n · (n − 1)! / ([(n − 1) − (k − 1)]! · k · (k − 1)!)
                                    [definition of factorial; n − k = (n − 1) − (k − 1)]
                = n · (n − 1)! / ([(n − 1) − (k − 1)]! · (k − 1)!)  [cancellation]
                = n · C(n − 1, k − 1).                              [definition of binomial coefficient]

9.172 We will prove Σ_{i=0}^{n} C(n, i) = 2^n for all n ≥ 0 by induction on n.

Base case (n = 0). Σ_{i=0}^{0} C(0, i) = C(0, 0) = 1, and indeed 2^0 = 1.

Inductive case (n ≥ 1). We assume the inductive hypothesis Σ_{i=0}^{n−1} C(n − 1, i) = 2^{n−1}. We must prove that Σ_{i=0}^{n} C(n, i) = 2^n.

    Σ_{i=0}^{n} C(n, i)
      = C(n, 0) + C(n, n) + Σ_{i=1}^{n−1} C(n, i)                                    [definition of summations]
      = C(n, 0) + C(n, n) + Σ_{i=1}^{n−1} [C(n − 1, i) + C(n − 1, i − 1)]            [Pascal's identity]
      = C(n, 0) + C(n, n) + Σ_{i=1}^{n−1} C(n − 1, i) + Σ_{i=1}^{n−1} C(n − 1, i − 1) [splitting the summation]
      = C(n, 0) + C(n, n) + Σ_{i=1}^{n−1} C(n − 1, i) + Σ_{i=0}^{n−2} C(n − 1, i)    [reindexing and reordering]
      = C(n − 1, 0) + C(n − 1, n − 1) + Σ_{i=1}^{n−1} C(n − 1, i) + Σ_{i=0}^{n−2} C(n − 1, i)
                                    [C(n, 0) = 1 = C(n − 1, 0) and C(n, n) = 1 = C(n − 1, n − 1)]
      = Σ_{i=0}^{n−1} C(n − 1, i) + Σ_{i=0}^{n−1} C(n − 1, i)                        [definition of summations]
      = 2^{n−1} + 2^{n−1}                                                             [inductive hypothesis]
      = 2^n.

9.173 I'm running a baking contest with 2n chefs, of whom n are professionals and n are amateurs. Everyone makes a chocolate cake, and n winners are chosen. How many ways of selecting the n winners are there?

• There are 2n competitors, and n winners. Thus there are C(2n, n) ways of selecting the winners.
• Suppose that precisely k of the n amateurs win. Then there are C(n, k) ways to pick the amateur winners. That leaves n − k winners out of the n professionals, and there are C(n, n − k) ways to pick the professional winners. Thus there are C(n, k) · C(n, n − k) ways to pick the winners if there are k amateur winners. The value of k can range from 0 to n, so the total number of ways to pick the winners is

    Σ_{k=0}^{n} C(n, k) · C(n, n − k),

which equals Σ_{k=0}^{n} C(n, k)^2 by Theorem 9.18, because C(n, k) = C(n, n − k), so C(n, k) · C(n, n − k) = C(n, k) · C(n, k).

Because we've counted the same set both ways, we have C(2n, n) = Σ_{k=0}^{n} C(n, k)^2.
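The identity proved in Exercise 9.173 is easy to confirm numerically with math.comb (a verification sketch, not part of the original solution):

```python
from math import comb

# check C(2n, n) = sum_{k=0}^{n} C(n, k)^2 for small n
for n in range(15):
    assert comb(2 * n, n) == sum(comb(n, k) ** 2 for k in range(n + 1))
print("C(2n, n) identity verified for n < 15")
```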

9.174 Here’s an algebraic proof: ! n m

! m n! m! = · k (n − m)! · m! (m − k)! · k! n! (n − m)! · (m − k)! · k! n! (n − k)! = · k! · (n − k)! (n − m)! · (m − k)! (n − k)! n! · = k! · (n − k)! ([n − k] − [m − k])! · (m − k)!

=

=

! n · k

! n−k . m−k

9.175 Let S denote the set of all choices of pairs ⟨T, M⟩ where: T ⊆ {1, . . . , n} is a team chosen from a set of n candidates, |T| = m, M ⊆ T is a set of managers chosen from the chosen team, and |M| = k. We calculate |S| in two different ways:

• choose m team members from the set of n candidates. Then choose k of those m to be managers. By the generalized product rule, then, we have |S| = C(n, m) · C(m, k).
• choose k managers from the set of n candidates. Then choose m − k non-manager team members from the remaining n − k unchosen candidates. Then we have that |S| = C(n, k) · C(n − k, m − k).

Therefore C(n, m) · C(m, k) = C(n, k) · C(n − k, m − k).

9.176 Here is a combinatorial proof. There is a binder containing n + 1 pages of tattoo designs, one design per page. You intend to get m + 1 tattoos. Here are two ways to choose:

• Just pick m + 1 of the n + 1 pages (repetition forbidden; order irrelevant). There are C(n + 1, m + 1) ways to do so.
• Choose a page k such that the design on page k + 1 will be the last tattoo you choose. Select m of the pages {1, 2, . . . , k} as the other tattoos you'll select. There are C(k, m) ways of choosing m additional tattoos from the first k pages. Your choice of k can range from 0 to n (so that page k + 1 exists in the binder).

Thus C(n + 1, m + 1) = Σ_{k=0}^{n} C(k, m).

9.177 We'll prove that the property

    Σ_{k=0}^{⌊n/2⌋} C(n − k, k) = f_{n+1},

which we'll call P(n), holds for all n ≥ 0 by strong induction on n.

Base cases (n = 0 and n = 1). For n = 0 and n = 1, we have ⌊n/2⌋ = 0. So

    Σ_{k=0}^{⌊n/2⌋} C(n − k, k) = Σ_{k=0}^{0} C(n − k, k) = C(n, 0) = 1.

And f_1 = f_2 = 1. Thus P(0) and P(1) hold.

Inductive case (n ≥ 2). We assume the inductive hypotheses, namely P(0), P(1), . . . , P(n − 1). We must prove P(n). Observe that

    Σ_{k=0}^{⌊n/2⌋} C(n − k, k)
      = Σ_{k=0}^{⌊n/2⌋} C(n − 1 − k, k) + Σ_{k=0}^{⌊n/2⌋} C(n − 1 − k, k − 1)       [Pascal's Identity]
      = Σ_{k=0}^{⌊n/2⌋} C(n − 1 − k, k) + Σ_{k=1}^{⌊n/2⌋} C(n − 1 − k, k − 1)       [C((n − 1 − 0), 0 − 1) = 0]
      = Σ_{k=0}^{⌊n/2⌋} C(n − 1 − k, k) + Σ_{j=0}^{⌊n/2⌋−1} C(n − 2 − j, j)         [reindexing the second term with j := k − 1]
      = Σ_{k=0}^{⌊n/2⌋} C(n − 1 − k, k) + Σ_{j=0}^{⌊(n−2)/2⌋} C((n − 2) − j, j)     [⌊n/2⌋ − 1 = ⌊(n − 2)/2⌋]
      = Σ_{k=0}^{⌊n/2⌋} C(n − 1 − k, k) + f_{n−1}.                                  [inductive hypothesis]

We'd be done if the first term is equal to f_n, because f_n + f_{n−1} = f_{n+1} by definition of the Fibonacci numbers. We'll show this in two cases, depending on the parity of n:

Case A: n is odd. Then there's not much to do:

    Σ_{k=0}^{⌊n/2⌋} C(n − 1 − k, k) = Σ_{k=0}^{⌊(n−1)/2⌋} C((n − 1) − k, k)   [⌊n/2⌋ = ⌊(n − 1)/2⌋ when n is odd]
                                    = f_n.                                     [inductive hypothesis]

Case B: n is even. There's slightly more work here:

    Σ_{k=0}^{⌊n/2⌋} C(n − 1 − k, k) = Σ_{k=0}^{⌊(n−1)/2⌋+1} C((n − 1) − k, k)  [n/2 = ⌊n/2⌋ = ⌊(n − 1)/2⌋ + 1 when n is even]
                                    = Σ_{k=0}^{⌊(n−1)/2⌋} C((n − 1) − k, k)    [for k = n/2, C((n − 1) − k, k) = C(n/2 − 1, n/2) = 0]
                                    = f_n.                                     [inductive hypothesis]

In either case, we've now shown that

    Σ_{k=0}^{⌊n/2⌋} C(n − k, k) = Σ_{k=0}^{⌊n/2⌋} C(n − 1 − k, k) + f_{n−1}    [original argument above]
                                = f_n + f_{n−1}                                 [even/odd case-based argument above]
                                = f_{n+1}.                                      [definition of the Fibonaccis]
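The diagonal-sum identity proved in Exercise 9.177 can be confirmed numerically (a verification sketch; it assumes the indexing convention f_1 = f_2 = 1, as used in the base case above):

```python
from math import comb

def fib(n):
    """Fibonacci numbers with the convention f_1 = f_2 = 1 (so f_0 = 0)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# check sum_{k=0}^{floor(n/2)} C(n-k, k) = f_{n+1} for small n
for n in range(25):
    assert sum(comb(n - k, k) for k in range(n // 2 + 1)) == fib(n + 1)
print("diagonal-sum identity verified for n < 25")
```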

9.178 Suppose you have a deck of n red cards and m black cards, from which you choose a hand of k total cards.

• There are n + m total cards of which you choose k, so there are C(n + m, k) ways to do it.
• If you choose r red cards, then you must choose k − r black cards. There are C(n, r) · C(m, k − r) ways to do it. You can choose any r between 0 and k, so there are Σ_{r=0}^{k} C(n, r) · C(m, k − r) ways in total.

Because we've counted the same set both ways, C(n + m, k) = Σ_{r=0}^{k} C(n, r) · C(m, k − r). (Note that this formula is a generalization of Exercise 9.173.)


9.179 There are 2^n subsequences of x and 2^n subsequences of y. Thus we execute the a = b test 2^n · 2^n = 2^{2n} = 4^n times.

9.180 There are C(n, i) subsequences of x of length i, and there are C(n, i) subsequences of y of length i. So the total number of executions of the a = b test is

    Σ_{i=0}^{n} C(n, i) · C(n, i) = C(2n, n),

by Exercise 9.173.

9.181 The performance in Figure 9.38b is

    C(2n, n) = (2n)! / (n! · n!)
             ≈ [√(2π · 2n) · (2n/e)^{2n}] / [√(2πn) · (n/e)^n · √(2πn) · (n/e)^n]
             = [√(4πn) · 4^n · (n/e)^{2n}] / [2πn · (n/e)^{2n}]
             = 4^n / √(πn).

Thus the latter is an improvement by a factor of ≈ √(πn).
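The Stirling-based estimate in Exercise 9.181 can be checked numerically. The computation below works in log space to avoid floating-point overflow for large n (a verification sketch; the helper name is illustrative):

```python
from math import comb, exp, log, pi

def stirling_ratio(n):
    """Ratio of C(2n, n) to the estimate 4^n / sqrt(pi * n), via logarithms."""
    return exp(log(comb(2 * n, n)) - (n * log(4) - 0.5 * log(pi * n)))

for n in [10, 100, 1000]:
    print(n, round(stirling_ratio(n), 6))  # approaches 1 as n grows
```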

9.182 We are asked to prove Σ_{k=0}^{n} (−1)^k · C(n, k) = 0 using the Binomial Theorem:

    Σ_{k=0}^{n} (−1)^k · C(n, k) = Σ_{k=0}^{n} (−1)^k · 1^{n−k} · C(n, k)
                                 = (1 + (−1))^n                  [Binomial Theorem]
                                 = 0^n = 0.

9.183 We are asked to prove Σ_{k=0}^{n} C(n, k)/2^k = (3/2)^n using the Binomial Theorem:

    Σ_{k=0}^{n} C(n, k)/2^k = Σ_{k=0}^{n} (1/2)^k · 1^{n−k} · C(n, k)
                            = ((1/2) + 1)^n                      [Binomial Theorem]
                            = (3/2)^n.

9.184 Consider an element x ∈ ∪_{i=1}^{k} A_i. Suppose that x appears in exactly ℓ_x of the sets A_i—in other words, suppose that |{i : x ∈ A_i}| = ℓ_x. Observe that the inner summation in the given formula, over all i-set intersections, can equivalently be written as a summation over elements: x contributes 1 to |A_{j_1} ∩ A_{j_2} ∩ · · · ∩ A_{j_i}| for each of the C(ℓ_x, i) ways of choosing i of the sets that contain x. Thus

    Σ_{i=1}^{k} (−1)^{i+1} · [Σ_{j_1 < j_2 < · · · < j_i} |A_{j_1} ∩ A_{j_2} ∩ · · · ∩ A_{j_i}|]
      = Σ_{i=1}^{k} (−1)^{i+1} · Σ_{x ∈ ∪A_i} C(ℓ_x, i)
      = Σ_{x ∈ ∪A_i} Σ_{i=1}^{k} (−1)^{i+1} · C(ℓ_x, i)    [reordering the summations]
      = Σ_{x ∈ ∪A_i} 1
            [Σ_{i=0}^{ℓ} (−1)^i · C(ℓ, i) = 0 by Exercise 9.182, so Σ_{i=1}^{ℓ} (−1)^{i+1} · C(ℓ, i) = C(ℓ, 0) = 1]
      = |∪_{i=1}^{k} A_i|.
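The alternating-sum formula of Exercise 9.184 is easy to sanity-check on concrete sets (a verification sketch; the function name is illustrative):

```python
from itertools import combinations

def inclusion_exclusion_size(sets):
    """Compute |A_1 U ... U A_k| via the alternating sum over all
    nonempty intersections of the sets."""
    total = 0
    for i in range(1, len(sets) + 1):
        sign = (-1) ** (i + 1)
        for chosen in combinations(sets, i):
            total += sign * len(set.intersection(*chosen))
    return total

A = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5, 6}]
print(inclusion_exclusion_size(A))        # 6
print(len(set.union(*A)))                 # 6, the same union size
```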

        ... >= 1 for h in hands]) / len(hands))
        sum([h.count_pairs() >= 2 for h in hands]) / len(hands))
        sum([h.count_fifteens() >= 1 for h in hands]) / len(hands))

    deck = [Card(r, s) for r in "A23456789TJQK"
                       for s in ["clubs", "diamonds", "hearts", "spades"]]
    random_hands = [Hand(random.sample(deck, 4)) for i in range(1000000)]
    all_hands = [Hand([deck[i], deck[j], deck[k], deck[l]])
                 for i in range(len(deck))
                 for j in range(i)
                 for k in range(j)
                 for l in range(k)]

    print("----- by sampling -------")
    verify(random_hands)
    print("----- exhaustively -------")
    verify(all_hands)

Figure S.10.2 Counting the number of Cribbage hands with certain properties. (See also Figure S.10.1.)


10.2 Probability, Outcomes, and Events

10.26 We lose if the median is in the first quarter of the indices. This occurs if at least two of the three sampled indices are in the first quarter, which occurs with probability

    (1/4)^3 + C(3, 2) · (1/4)^2 · (3/4) = 1/64 + 9/64 = 10/64,

where the first term counts outcomes in which all three samples are in the first quarter and the second counts outcomes in which exactly two of three are. There's a similar problem if the median is in the last quarter. Thus we have a success probability of 1 − 10/64 − 10/64 = 44/64 = 11/16.
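The 11/16 answer to Exercise 10.26 can be confirmed by enumerating which quarter each of the three sampled indices lands in (a verification sketch, treating the quarters as equally likely):

```python
from itertools import product
from fractions import Fraction

# each sampled index lands in one of the four quarters, uniformly and independently;
# we succeed unless two or more samples fall in the first quarter (0) or last quarter (3)
good = sum(1 for q in product(range(4), repeat=3)
           if sum(x == 0 for x in q) < 2 and sum(x == 3 for x in q) < 2)
print(Fraction(good, 4 ** 3))  # 11/16
```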

10.27 We lose if the median is in the first (1 − α)n of the indices, which occurs if at least two of the three sampled indices are in this range, which occurs with probability

    (1 − α)^3 + C(3, 2) · (1 − α)^2 · α = (1 − α)^2 (1 + 2α),

where the first term counts outcomes with all three samples in the first (1 − α)n and the second counts outcomes with exactly two of three there. There's a similar problem if the median is in the last (1 − α)n. Thus we have a success probability of 1 − 2(1 − α)^2 (1 + 2α).

10.28 Suppose Emacs wins with probability p. The tree diagram is shown in Figure S.10.3. Emacs wins in half of these branches. You can add up the probabilities, or see it this way:

    p^3              [AAA: three straight wins]
    + 3p^3(1 − p)    [BAAA, ABAA, AABA: one loss (three ways)]
    + 6p^3(1 − p)^2  [AABBA, ABABA, ABBAA, BAABA, BABAA, BBAAA: two losses (six ways)]

For p = 0.6, this probability is 2133/5^5 = 0.68256.

10.29 Suppose Emacs wins with probability p. Then Emacs wins in 5 games with probability 6p^3(1 − p)^2 and loses in 5 games with probability 6p^2(1 − p)^3. Thus the series goes 5 games with probability

    6p^2(1 − p)^2 (p + 1 − p) = 6p^2(1 − p)^2.

For p = 0.6, this probability is 216/5^4 = 0.3456.
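Both of these best-of-five computations can be checked by a short recursion over series states (a verification sketch; the function name is illustrative):

```python
def series_probs(p, a=0, b=0):
    """For a best-of-five series (first to 3 wins) where Emacs wins each game
    independently with probability p, return the pair
    (P[Emacs wins the series], P[the series lasts exactly 5 games])."""
    if a == 3 or b == 3:
        return (1.0 if a == 3 else 0.0, 1.0 if a + b == 5 else 0.0)
    win_a, five_a = series_probs(p, a + 1, b)
    win_b, five_b = series_probs(p, a, b + 1)
    return (p * win_a + (1 - p) * win_b, p * five_a + (1 - p) * five_b)

win, five = series_probs(0.6)
print(round(win, 5), round(five, 5))  # 0.68256 0.3456
```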

Figure S.10.3 A tree diagram for Exercise 10.28: each internal node branches on Emacs winning the next game (probability p) or losing it (probability 1 − p), with leaves such as AAA, BAAA, AABBA, BABAB, and BBB marking completed series.


Probability

10.30 Plugging p = 0.70 into the formulas from the last two exercises, we have that the probabilities are

    p^3 [1 + 3(1 − p) + 6(1 − p)^2] = (343/1000) · [1 + 9/10 + 54/100] = 0.83692
    6p^2(1 − p)^2 = 6 · (49/100) · (9/100) = 0.2646

10.31 The probability of the series having a fifth game is, as computed in the previous exercises, 6p^2(1 − p)^2. Differentiating, we have that the extreme points occur at solutions to 0 = 12p(1 − p)(1 − 2p). The solutions to this equation are p ∈ {0, 1/2, 1}. The values p = 0 and p = 1 are minima; p = 1/2 is a maximum, and the probability there is 6/2^4 = 3/8 = 0.375.

10.32 There is a fourth game if there's not a sweep—that is, if neither team wins the first three games. So the probability is 1 − p^3 − (1 − p)^3. Differentiating gives 3(1 − 2p), which is zero only at p = 1/2; checking the endpoints as well, we get that p = 0 and p = 1 are minima and p = 1/2 is a maximum. Here, then, the probability that a fourth game occurs is 1 − (1/8) − (1/8) = 3/4 = 0.75.

10.33 The probability of a fourth game is 1 − p^3 − (1 − p)^3; Emacs wins it with probability p · [1 − p^3 − (1 − p)^3] = p − p^4 − p(1 − p)^3. Multiplying out and combining terms, we have that the probability in question is 3p^2 − 3p^3. Differentiating, we have 0 = 6p − 9p^2 = 3p(2 − 3p). We have a minimum at p = 0 and a maximum at p = 2/3, when the probability is 4/9.

10.34 Let s ∈ S be an outcome. Then

Σ_{x∈S} Pr[x] = 1 by assumption, and therefore

    Pr[s] + Σ_{x∈S−{s}} Pr[x] = 1.

Subtracting all other probabilities from both sides, we have

    Pr[s] = 1 − Σ_{x∈S−{s}} Pr[x]
          ≤ 1 − Σ_{x∈S−{s}} 0

= 1, again by the assumption (Definition 10.2).

10.35 For any event A ⊆ S, we have

    1 = Σ_{x∈S} Pr[x]                      [by Definition 10.2]
      = Σ_{x∈A∪Ā} Pr[x]                    [S = A ∪ Ā]
      = Σ_{x∈A} Pr[x] + Σ_{x∈Ā} Pr[x]      [A and Ā are disjoint]
      = Pr[A] + Pr[Ā].                     [definition of probability of an event]

Thus Pr[Ā] = 1 − Pr[A].


10.36 Let A, B ⊆ S be events. We must prove that Pr[A ∪ B] = Pr[A] + Pr[B] − Pr[A ∩ B]. This property is just Inclusion–Exclusion, weighted by the probability function Pr: by definition, Pr[A ∪ B] = Σ_{x∈A∪B} Pr[x], and we can rewrite A ∪ B as A ∪ (B − A), which are disjoint and therefore Pr[A ∪ (B − A)] = Pr[A] + Pr[B − A]. And Pr[B − A] = Pr[B] − Pr[B ∩ A] because B ∩ A ⊆ B. Thus

    Pr[A ∪ B] = Pr[A ∪ (B − A)] = Pr[A] + Pr[B − A] = Pr[A] + Pr[B] − Pr[B ∩ A].

10.37 The Union Bound follows immediately from the previous exercise and induction on n. More specifically: because Pr[A ∪ B] = Pr[A] + Pr[B] − Pr[A ∩ B] and Pr[A ∩ B] ≥ 0, we have Pr[A ∪ B] ≤ Pr[A] + Pr[B]. By applying this property to the two events A_n and (A_1 ∪ A_2 ∪ · · · ∪ A_{n−1}), the Union Bound follows. (It's a good exercise to write out the rigorous inductive proof, which we've omitted here.)

10.38 Machine i tries to broadcast with probability p. All n − 1 other machines choose not to try to broadcast with probability (1 − p)^{n−1}. So machine i broadcasts successfully with probability p(1 − p)^{n−1}. There are n different possible choices of i, so the overall probability is np(1 − p)^{n−1}.

10.39 The probability np(1 − p)^{n−1} from the previous exercise is maximized when its derivative is zero. And

    (d/dp) np(1 − p)^{n−1} = n(1 − p)^{n−1} − n(n − 1)p(1 − p)^{n−2}.

So we choose p such that

    0 = n(1 − p)^{n−1} − n(n − 1)p(1 − p)^{n−2}
    n(n − 1)p(1 − p)^{n−2} = n(1 − p)^{n−1}
    (n − 1)p = 1 − p
    n − 1 = 1/p − 1
    p = 1/n.

So we maximize the probability of success when p = 1/n. (Incidentally, p = 1/n is precisely the probability setting for which we would expect exactly one machine to try to broadcast.)

10.40 The probability of success is

    np(1 − p)^{n−1} = n(1/n)(1 − 1/n)^{n−1} = (1 − 1/n)^n / (1 − 1/n).

By the given fact, this quantity is approximately e^{−1}/(1 − 1/n), which tends to 1/e as n grows large.
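The conclusions of Exercises 10.39 and 10.40 are easy to check numerically (a sketch; the function name is illustrative):

```python
import math

def success_prob(n, p):
    """Probability that exactly one of n machines broadcasts, where each
    machine independently tries with probability p."""
    return n * p * (1 - p) ** (n - 1)

# p = 1/n beats nearby choices of p...
n = 50
best = success_prob(n, 1 / n)
assert best > success_prob(n, 1 / n + 0.005)
assert best > success_prob(n, 1 / n - 0.005)

# ...and the maximum tends to 1/e as n grows
print(success_prob(10 ** 6, 10 ** -6))  # close to 1/e = 0.3678...
```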

10.41 There are 10 · 9 · 8 triples from {1, . . . , 10} with no duplications, and all 10^3 triples are equally likely. Thus the probability of no collisions is 720/1000.

10.42 All triples from {1, . . . , 10} are equally likely, and only {⟨1, 1, 1⟩, ⟨2, 2, 2⟩, . . . , ⟨10, 10, 10⟩} have all the hash values equal. Thus the probability of all three elements having the same hash value is 10/1000 = 1/100.
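Both of these hashing probabilities can be verified by exhaustively enumerating the 1000 equally likely triples of hash values (a verification sketch):

```python
from itertools import product

triples = list(product(range(1, 11), repeat=3))  # hash values of the three elements
no_collision = sum(1 for t in triples if len(set(t)) == 3)
all_same = sum(1 for t in triples if len(set(t)) == 1)
print(no_collision, "/", len(triples))  # 720 / 1000
print(all_same, "/", len(triples))      # 10 / 1000
```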

10.43 Once x1 is placed, say in slot s, there’s a 0.3 probability that x2 is in a slot ±1 of x1 : slot s − 1 or slot s or slot s + 1. (A collision between x1 and x2 leads to an adjacent slot being filled, as does x2 hashing to either unoccupied spot immediately to the left or right of x1 .) Thus there’s a 0.7 chance that x1 and x2 are placed without occupying adjacent cells. In this case, then 6 of the 10 slots are ±1 of x1 or x2 —and so there’s a 0.6 probability of an adjacency occurring when x3 is placed.

Therefore there's a 0.3 chance of the first type of adjacency (between x1 and x2), and a 0.7 · 0.6 chance that there's no x1-and-x2 adjacency but there is an adjacency between x3 and either x1 or x2. Overall, then, the probability of two adjacent cells being filled is 0.3 + 0.7 · 0.6 = 0.72.

10.44 Place x1; its location doesn't matter. (The cells are completely symmetric, so the situation is identical regardless of where it's placed.) If x2 is adjacent to it (probability 0.2) or collides with it (probability 0.1), then there are two adjacent filled cells when x3 is placed. There's a 0.4 chance x3 is within one of the pair of filled cells: the two cells adjacent to the pair, or a collision with either of the two. If x2 leaves a one-cell gap from x1 (probability 0.2), then there's a 0.2 chance x3 fills in the gap, by hitting it directly or hitting the left of the two filled cells. Otherwise—if x1 and x2 are separated by more than one cell—then there's no chance that a contiguous block of three cells will end up being filled. In sum, then, the probability that 3 adjacent slots are filled is

    0.2 · 0.2    [x1 and x2 are separated by one cell (and x3 fills that gap)]
    + 0.3 · 0.4  [x1 and x2 are adjacent or colliding (and x3 lands next to one of them)]
    = 0.16.

10.45 Hashing to any of the k full blocks, or the unoccupied adjacent cells, extends the block. Thus there are k + 2 cells within or immediately adjacent to the block; if the next element hashes into one of those then the block grows. In other words, the block is extended with probability (k + 2)/n.

10.46 A solution in Python is shown in Figure S.10.4. When I ran the tester for LinearProbingHashTable, I got statistics of x5000 moving about 1.5 slots on average (with some small variation, but again less than about ±0.1 slots across runs), with a maximum movement varying more widely but around 30 to 40 slots.

10.47 Again, see Figure S.10.4. When I ran the tester for QuadraticProbingHashTable, I got statistics of x5000 moving about 1.1 slots on average (with some small variation, about ±0.1 slots across runs), with a maximum movement varying between around 10 to 15 slots.

10.48 The major difference between clustering and secondary clustering is that every element in a clustered block with linear probing has the same probe sequence: no matter where you "enter" the block, you "exit" on the right-hand side, one slot at a time. Suppose slot a is full: although any two elements with h(x) = h(y) = a have the same probe sequence, if h(x) = a and h(y) = a + 1 then x and y have different probe sequences in quadratic probing. Thus the clustering in linear probing is worse than the clustering in quadratic probing.

10.49 Once again, a solution in Python is shown in Figure S.10.4. The tester for DoubleHashingHashTable gave me the statistics of x5000 moving between 0.9 and 1.0 slots on average, with a maximum movement of around 10 to 12 slots.

10.50 The problem of clustering drops by a large proportion with double hashing: the only time x and y have the same probe sequence is if both hash functions line up (that is, h(x) = h(y) and g(x) = g(y)), which will happen only 1/n^2 of the time instead of 1/n of the time. But it comes at a cost: we have to spend the time to compute two hash functions instead of one, so it's noticeably slower when there aren't too many collisions.

10.51 We differentiate and set the result to zero:

    (d/dp) [C(n, k) · p^k · (1 − p)^{n−k}] = 0
    ⇔ C(n, k) · [k p^{k−1} (1 − p)^{n−k} − (n − k) p^k (1 − p)^{n−k−1}] = 0
    ⇔ k p^{k−1} (1 − p)^{n−k} = (n − k) p^k (1 − p)^{n−k−1}
    ⇔ k(1 − p) = (n − k)p
    ⇔ k − kp = np − kp
    ⇔ k = np.

Thus k = np, or p = k/n, maximizes this probability. (You can check that this extreme point is a maximum.)

    import random

    class HashTable:
        def __init__(self, slots):
            self.table = [None for i in range(slots)]

        def next_slot(self, x, starting_slot, attempt_number):
            raise NotImplementedError("No probe sequence operation was implemented")

        def add_element(self, x):
            '''Adds the element x to a randomly chosen cell in this table;
            returns the **number** of cells probed (beyond the original).'''
            starting_slot = random.randint(0, len(self.table) - 1)
            slot = starting_slot
            attempt = 0
            while self.table[slot] != None:
                attempt += 1
                slot = self.next_slot(x, starting_slot, attempt)
            self.table[slot] = x
            return attempt

    class LinearProbingHashTable(HashTable):
        def next_slot(self, x, starting_slot, attempt_number):
            return (starting_slot + attempt_number) % len(self.table)

    class QuadraticProbingHashTable(HashTable):
        def next_slot(self, x, starting_slot, attempt_number):
            return (starting_slot + attempt_number**2) % len(self.table)

    class DoubleHashingHashTable(HashTable):
        def __init__(self, slots):
            self.table = [None for i in range(slots)]
            self.stepper = {}

        def next_slot(self, x, starting_slot, attempt_number):
            if attempt_number == 1:
                self.stepper[x] = random.randint(1, len(self.table) - 1)
            return (starting_slot + attempt_number * self.stepper[x]) % len(self.table)

    def sample(hash_table_class, elements, slots, samples):
        movements = []
        for run in range(samples):
            T = hash_table_class(slots)
            for x in range(elements):
                movement = T.add_element(x)
            movements.append(movement)
        print(str(hash_table_class), "average =", sum(movements) / samples,
              "; max =", max(movements))

    sample(LinearProbingHashTable, 5000, 10007, 2048)
    sample(QuadraticProbingHashTable, 5000, 10007, 2048)
    sample(DoubleHashingHashTable, 5000, 10007, 2048)

Figure S.10.4 Testing how far elements move when resolving collisions in various ways in a hash table.

10.52 We differentiate and set the result to zero:

    (d/dp) [(1 − p)^{n−1} p] = (1 − p)^{n−1} − (n − 1)p(1 − p)^{n−2} = 0
    ⇔ (n − 1)p = 1 − p
    ⇔ np − p = 1 − p
    ⇔ np = 1.

(Again, you can check that this extreme point is a maximum.) Thus the maximum likelihood estimate occurs when np = 1, that is when p = 1/n.


10.3 Independence and Conditional Probability

10.53 The events in question are:

• Even: {2, 4, 6, 8, 10, 12}. (Probability 1/2.)
• Divisible by 3: {3, 6, 9, 12}. (Probability 1/3.)
• Both: {6, 12}. (Probability 1/6.)

Because 1/6 = 1/2 · 1/3, the events are independent.

10.54 The events in question are:

• Even: {2, 4, 6, 8, 10, 12}. (Probability 1/2.)
• Divisible by 5: {5, 10}. (Probability 1/6.)
• Both: {10}. (Probability 1/12.)

Because 1/12 = 1/2 · 1/6, the events are independent.

10.55 The events in question are:

• Even: {2, 4, 6, 8, 10, 12}. (Probability 1/2.)
• Divisible by 6: {6, 12}. (Probability 1/6.)
• Both: {6, 12}. (Probability 1/6.)

Because 1/6 ≠ 1/12 = 1/2 · 1/6, the events are not independent.

10.56 The events in question are:

• Even: {2, 4, 6, 8, 10, 12}. (Probability 1/2.)
• Divisible by 7: {7}. (Probability 1/12.)
• Both: none. (Probability 0.)

Because 0 ≠ 1/24 = 1/2 · 1/12, the events are not independent.

10.57 The events in question are:

• R: {1, 2, 3, 4, 9, 10, 11, 12}. (Probability 8/12.)
• Odd: {1, 2, 3, 5, 7, 8, 10, 12}. (Probability 8/12.)
• Both: {1, 2, 3, 10, 12}. (Probability 5/12.)

Because 5/12 ≠ 4/9 = 8/12 · 8/12, the events are not independent.

10.58 The events in question are:

• Even: 0, 2, 4, or 6 heads. Probability: (1/2^6) · [C(6, 0) + C(6, 2) + C(6, 4) + C(6, 6)] = (1 + 15 + 15 + 1)/64 = 1/2.
• Divisible by 3: 0 or 3 or 6 heads. Probability: (1/2^6) · [C(6, 0) + C(6, 3) + C(6, 6)] = (1 + 20 + 1)/64 = 22/64.
• Both: 0 or 6 heads. Probability: (1/2^6) · [C(6, 0) + C(6, 6)] = (1 + 1)/64 = 1/32.

Because 1/32 ≠ 11/64 = 1/2 · 22/64, the events are not independent.

10.59 The events in question are:

• Even: 0, 2, 4, or 6 heads. Probability: (1/2^6) · [C(6, 0) + C(6, 2) + C(6, 4) + C(6, 6)] = (1 + 15 + 15 + 1)/64 = 1/2.
• Divisible by 4: 0 or 4 heads. Probability: (1/2^6) · [C(6, 0) + C(6, 4)] = (1 + 15)/64 = 1/4.
• Both: 0 or 4 heads. The probability is still 1/4.

Because 1/4 ≠ 1/8 = 1/2 · 1/4, the events are not independent.

10.60 The events in question are:

• Even: 0, 2, 4, or 6 heads. Probability: (1/2^6) · [C(6, 0) + C(6, 2) + C(6, 4) + C(6, 6)] = (1 + 15 + 15 + 1)/64 = 1/2.
• Divisible by 5: 0 or 5 heads. Probability: (1/2^6) · [C(6, 0) + C(6, 5)] = (1 + 6)/64 = 7/64.
• Both: 0 or 5 heads. The probability is still 7/64.

Because 7/64 ≠ 7/128 = 1/2 · 7/64, the events are not independent.

10.61 Here's a table of the sample space and the two events:

    outcome    # heads in {a, b} odd?    # heads in {b, c} odd?
    HHH        no                        no
    HHT        no                        yes
    HTH        yes                       yes
    HTT        yes                       no
    THH        yes                       no
    THT        yes                       yes
    TTH        no                        yes
    TTT        no                        no

Thus each event happens individually in 4 of 8 outcomes (probability 1/2), and they happen together in 2 of 8 outcomes (probability 1/4). Because 1/4 = 1/2 · 1/2, the events are independent.

10.62 The outcomes are the same as in the previous exercise, but the probabilities have changed:

    outcome    # heads in {a, b} odd?    # heads in {b, c} odd?    probability
    HHH        no                        no                        p^3
    HHT        no                        yes                       p^2(1 − p)
    HTH        yes                       yes                       p^2(1 − p)
    HTT        yes                       no                        p(1 − p)^2
    THH        yes                       no                        p^2(1 − p)
    THT        yes                       yes                       p(1 − p)^2
    TTH        no                        yes                       p(1 − p)^2
    TTT        no                        no                        (1 − p)^3


Thus the probabilities of the events are:

Pr[{a, b} has odd number of heads] = 2p^2(1 − p) + 2p(1 − p)^2 = 2p(1 − p)[p + 1 − p] = 2p(1 − p).
Pr[{b, c} has odd number of heads] = 2p(1 − p).
Pr[both {a, b} and {b, c} have odd numbers of heads] = p^2(1 − p) + p(1 − p)^2 = p(1 − p)[p + 1 − p] = p(1 − p).

The events are independent if and only if the product of the individual probabilities equals the probability of both occurring together. That is, the events are independent if and only if

p(1 − p) = 2p(1 − p) · 2p(1 − p)  ⇔  p(1 − p)[4p(1 − p) − 1] = 0.

This equation is true if and only if p = 0, p = 1, or 4p(1 − p) − 1 = 0, which occurs when 0 = 4p^2 − 4p + 1 = (2p − 1)^2, that is, when p = 1/2.

Thus the events are independent if and only if p ∈ {0, 1/2, 1}.

10.63 Pr[A] = 6/8, Pr[B] = 2/8, Pr[A ∩ B] = 0/8. Thus Pr[A ∩ B] < Pr[A] · Pr[B], so the events are dependent. Specifically, they are negatively correlated.

10.64 Pr[A] = 6/8, Pr[C] = 6/8, Pr[A ∩ C] = 6/8. Thus Pr[A ∩ C] > Pr[A] · Pr[C], so the events are dependent. Specifically, they are positively correlated.

10.65 Pr[B] = 2/8, Pr[C] = 6/8, Pr[B ∩ C] = 0/8. Thus Pr[B ∩ C] < Pr[B] · Pr[C], so the events are dependent. Specifically, they are negatively correlated.

10.66 Pr[A] = 6/8, Pr[D] = 4/8, Pr[A ∩ D] = 3/8. Thus Pr[A ∩ D] = Pr[A] · Pr[D], so the events are independent.

10.67 Pr[A] = 6/8, Pr[E] = 4/8, Pr[A ∩ E] = 3/8. Thus Pr[A ∩ E] = Pr[A] · Pr[E], so the events are independent.

10.68 Pr[A ∩ B] = 0/8, Pr[E] = 4/8, Pr[(A ∩ B) ∩ E] = 0/8. Thus Pr[(A ∩ B) ∩ E] = Pr[A ∩ B] · Pr[E], so the events are independent.

10.69 Pr[A ∩ C] = 6/8, Pr[E] = 4/8, Pr[(A ∩ C) ∩ E] = 3/8. Thus Pr[(A ∩ C) ∩ E] = Pr[A ∩ C] · Pr[E], so the events are independent.

10.70 Pr[A ∩ D] = 3/8, Pr[E] = 4/8, Pr[(A ∩ D) ∩ E] = 0/8. Thus Pr[(A ∩ D) ∩ E] < Pr[A ∩ D] · Pr[E], so the events are dependent. Specifically, they are negatively correlated.

10.71 Events A and B are independent if and only if Pr[A ∩ B] = Pr[A] · Pr[B]. But Pr[B] = 0 implies that Pr[A ∩ B] = 0, because A ∩ B ⊆ B. Thus

Pr[A ∩ B] = 0              (above discussion)
          = Pr[A] · 0      (x · 0 = 0)
          = Pr[A] · Pr[B].   (by assumption, Pr[B] = 0)

Thus A and B are independent, by definition.


10.72 Proving one direction suffices: the converse is the same claim, applied to the events A and B̄ (because the complement of B̄ is B). Assume that A and B are independent. We'll prove that A and B̄ are independent:

Pr[A ∩ B̄] = Pr[A] − Pr[A ∩ B]         (law of total probability: Pr[A] = Pr[A ∩ B] + Pr[A ∩ B̄])
          = Pr[A] − Pr[A] · Pr[B]     (A and B are independent)
          = Pr[A] (1 − Pr[B])         (factoring)
          = Pr[A] · Pr[B̄].            (Theorem 10.4)

10.73 A solution in Python is shown as the SubstitutionCipher class and get_random_substitution_cipher function in Figure S.10.5.

10.74 A solution in Python is shown as the get_letter_frequencies function in Figure S.10.5.

10.75 A solution in Python is shown as the decrypt_substitution_cipher_by_frequency function in Figure S.10.5.

10.76 A solution in Python is shown as the CaesarCipher class and get_random_caesar_cipher function in Figure S.10.6.

10.77 A solution in Python is shown as the decrypt_caesar_cipher_by_frequency function in Figure S.10.6.

10.78 There are 2^n outcomes, each of which is equally likely. Of these outcomes, Ai,j occurs in precisely 2^{n−1}: for either value of the ith flip, one and only one value of the jth flip causes Ai,j to occur; all other flips can be either heads or tails. Thus Pr[Ai,j] = 2^{n−1}/2^n = 1/2.

10.79 Either {i, j} and {i′, j′} share a single index, or they share none. We'll consider these two cases separately:

Case 1: {i, j} and {i′, j′} share one index. Write {i, j} = {x, y} and {i′, j′} = {y, z}. Then (restricting our attention to these three flips x, y, and z) the outcomes/events are:

xyz     x ⊕ y   y ⊕ z
HHH       ✗       ✗
HHT       ✗       ✓
HTH       ✓       ✓
HTT       ✓       ✗
THH       ✓       ✗
THT       ✓       ✓
TTH       ✗       ✓
TTT       ✗       ✗

Thus Pr[x ⊕ y | y ⊕ z] = 2/4 = 0.5 and Pr[x ⊕ y | ¬(y ⊕ z)] = 2/4 = 0.5.


import random

ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
alphabet = 'abcdefghijklmnopqrstuvwxyz'

class SubstitutionCipher:
    def __init__(self, replacement_alphabet):
        self.mapping = replacement_alphabet

    def encode(self, string):
        output = ""
        for ch in string:
            if ch in ALPHABET:
                output += self.mapping[ALPHABET.index(ch)]
            elif ch in alphabet:
                output += self.mapping[alphabet.index(ch)].lower()
            else:
                output += ch
        return output

def get_random_substitution_cipher():
    permutation = list(ALPHABET)
    random.shuffle(permutation)
    return SubstitutionCipher(permutation)

def get_letter_frequencies(ciphertext):
    counts = {ch : 0 for ch in ALPHABET}
    letter_count = 0
    for ch in ciphertext:
        if ch.upper() in ALPHABET:
            counts[ch.upper()] += 1
            letter_count += 1
    return {ch : counts[ch] / letter_count for ch in ALPHABET}

def decrypt_substitution_cipher_by_frequency(ciphertext, reference_text):
    cipher_frequencies = get_letter_frequencies(ciphertext)
    cipher_by_usage = sorted(ALPHABET, key=lambda ch: cipher_frequencies[ch])

    reference_frequencies = get_letter_frequencies(reference_text)
    reference_by_usage = sorted(ALPHABET, key=lambda ch: reference_frequencies[ch])

    decryption_alphabet = [reference_by_usage[cipher_by_usage.index(ch)] for ch in ALPHABET]
    return SubstitutionCipher("".join(decryption_alphabet))

Figure S.10.5 A substitution cipher in Python.

Case 2: {i, j} and {i′, j′} do not share an index. Then the two events don't even refer in any way to the same coin flip. Writing {i, j} = {w, x} and {i′, j′} = {y, z}, we have

wxyz    w ⊕ x   y ⊕ z
HHHH      ✗       ✗
HHHT      ✗       ✓
HHTH      ✗       ✓
HHTT      ✗       ✗
HTHH      ✓       ✗
HTHT      ✓       ✓
HTTH      ✓       ✓
HTTT      ✓       ✗
THHH      ✓       ✗
THHT      ✓       ✓
THTH      ✓       ✓
THTT      ✓       ✗
TTHH      ✗       ✗
TTHT      ✗       ✓
TTTH      ✗       ✓
TTTT      ✗       ✗


class CaesarCipher(SubstitutionCipher):
    def __init__(self, shift):
        self.mapping = ALPHABET[shift:] + ALPHABET[:shift]

def get_random_caesar_cipher():
    shift = random.randint(0, 25)
    return CaesarCipher(shift)

def decrypt_caesar_cipher_by_frequency(ciphertext, reference_text):
    cipher_frequencies = get_letter_frequencies(ciphertext)
    reference_frequencies = get_letter_frequencies(reference_text)
    best_shift, best_difference, best_decrypter = None, None, None
    for shift in range(26):
        decrypter = CaesarCipher(shift)
        difference = 0
        for ch in ALPHABET:
            error = cipher_frequencies[ch] - reference_frequencies[decrypter.encode(ch)]
            difference += abs(error)
        if best_shift == None or difference < best_difference:
            best_shift, best_difference, best_decrypter = shift, difference, decrypter
    return best_decrypter

Figure S.10.6 A Caesar cipher in Python (relying on the code in Figure S.10.5).

Thus Pr[w ⊕ x | y ⊕ z] = 4/8 = 0.5 and Pr[w ⊕ x | ¬(y ⊕ z)] = 4/8 = 0.5.

10.80 Consider the three events x ⊕ y, x ⊕ z, and y ⊕ z. (For example, A1,2 and A1,3 and A2,3.) Knowing the values of x ⊕ y and y ⊕ z will allow you to infer x ⊕ z with certainty:

xyz     x ⊕ y   x ⊕ z   y ⊕ z
HHH       ✗       ✗       ✗
HHT       ✗       ✓       ✓
HTH       ✓       ✗       ✓
HTT       ✓       ✓       ✗
THH       ✓       ✓       ✗
THT       ✓       ✗       ✓
TTH       ✗       ✓       ✓
TTT       ✗       ✗       ✗

Thus Pr[x ⊕ z | x ⊕ y, y ⊕ z] = Pr[x ⊕ z | ¬(x ⊕ y), ¬(y ⊕ z)] = 0/2 = 0. But Pr[x ⊕ z | x ⊕ y, ¬(y ⊕ z)] = Pr[x ⊕ z | ¬(x ⊕ y), y ⊕ z] = 2/2 = 1.

10.81 4/4 = 1

10.82 4/8 = 0.5

10.83 2/3

10.84 2/4 = 0.5

10.85 2/4 = 0.5

10.86 2/7

10.87 4/7

10.88 Let’s draw a tree to see what’s happening:

3 3

204

Probability

1 12

··· (no threes present)

1 2

1 2

1 12

1 2

1 2

Let A denote the event “the top half of the drawn domino is 3.” Let B denote the event “doubles.” Then we have Pr [A] = 1 + 241 + 241 = 81 —the three left-most drawn domino leaves in the tree. And Pr [A ∩ B] = 241 + 241 = 121 —the two 24 left-most drawn domino leaves in the tree. Thus

Pr [B|A] =

Pr [A ∩ B] 1/8 2 = = . Pr [A] 1/12 3

10.89 By definition, A and B are independent if Pr[A ∩ B] = Pr[A] · Pr[B]. Because A ∩ B = ∅, we know that Pr[A ∩ B] = 0. Thus A and B are independent if and only if Pr[A] · Pr[B] = 0, which occurs precisely when Pr[A] = 0 or Pr[B] = 0. Thus, even though Pr[A ∩ B] = 0, A and B are independent if and only if either A or B occurs with probability zero.

10.90 There's not enough information. For example, flip two fair coins, and define the following events:

A = "the first coin came up heads"    B = "the second coin came up heads"    C = "both coins came up tails."

Then Pr[A|B] = Pr[B|A] = 1/2, and A and B are independent. On the other hand, Pr[A|C] = Pr[C|A] = 0, but A and C are not independent.

10.91 (n − k)/n

10.92 The probability that the (k + 1)st element x_{k+1} does not cause a collision, conditioned on no collisions occurring before that element is inserted, is (n − k)/n, by Exercise 10.91. Say that x_i is "safe" if no collision occurs when it's inserted. Then the probability that no collision occurs is

Pr[x1 safe] · Pr[x2 safe | x1 safe] · Pr[x3 safe | x1, x2 safe] · · · Pr[xn safe | x1, . . . , xn−1 safe] = ∏_{k=0}^{n−1} (n − k)/n = n!/n^n.
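The telescoping product can be sanity-checked against both the closed form n!/n^n and a direct simulation of hashing n items into n slots (the function names are mine):

```python
import math
import random

def no_collision_probability(n):
    # Product form: prod over k = 0..n-1 of (n - k)/n, which telescopes to n!/n^n.
    prob = 1.0
    for k in range(n):
        prob *= (n - k) / n
    return prob

def simulate_no_collision(n, trials=20000):
    # Fraction of trials in which n uniformly random hashes into n slots are all distinct.
    hits = 0
    for _ in range(trials):
        slots = [random.randrange(n) for _ in range(n)]
        hits += (len(set(slots)) == n)
    return hits / trials
```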

10.93 Let D be the event that x has the disease. By the law of total probability, we have

Pr[error] = Pr[error | D] · Pr[D] + Pr[error | D̄] · Pr[D̄] = (0.01)(0.001) + (0.03)(0.999) = 0.02998.

10.94 Let D be the event that x has the disease. By the law of total probability, we have

Pr[error] = Pr[error | D] · Pr[D] + Pr[error | D̄] · Pr[D̄] = (0.999)(0.001) + (0.001)(0.999) = 0.001998.

10.95 Bob receives a message with an even number of ones if there are zero, two, or four flips in transmission. The conditional probability of zero flips is therefore

(1 − p)^4 / [(1 − p)^4 + C(4,2) p^2 (1 − p)^2 + p^4] = (1 − p)^4 / [(1 − p)^4 + 6p^2(1 − p)^2 + p^4].

For p = 0.01, this probability is

(0.99)^4 / [(0.99)^4 + 6(0.01)^2(0.99)^2 + (0.01)^4] = 0.999388···.

10.96 The conditional probability of receiving the correct message is

(0.9)^4 / [(0.9)^4 + 6(0.1)^2(0.9)^2 + (0.1)^4] = 0.930902···.

10.97 Let biased and fair denote the events of pulling out the p-biased coin and the fair coin, respectively. Let H denote the event of flipping heads. Then:

Pr[biased | H] = Pr[H | biased] · Pr[biased] / Pr[H]       (Bayes' Rule)
              = Pr[H | biased] · Pr[biased] / (Pr[H | biased] · Pr[biased] + Pr[H | fair] · Pr[fair])   (Law of Total Probability)
              = (2/3 · 1/2) / (2/3 · 1/2 + 1/2 · 1/2)      (the given process of choosing a coin; the coins are p-biased and fair)
              = 4/7.

10.98 Again let biased, fair, and H denote the events of choosing the p-biased coin, the fair coin, and flipping heads. Then:

Pr[biased | HHHT] = Pr[HHHT | biased] · Pr[biased] / Pr[HHHT]       (Bayes' Rule)
                 = Pr[HHHT | biased] · Pr[biased] / (Pr[HHHT | biased] · Pr[biased] + Pr[HHHT | fair] · Pr[fair])   (Law of Total Probability)
                 = ((3/4)^3 (1/4) · 1/2) / ((3/4)^3 (1/4) · 1/2 + (1/2)^3 (1/2) · 1/2)   (the given process)
                 = (27/256) / (27/256 + 16/256)
                 = 27/43.

10.99 Again let biased, fair, and H denote the events of choosing the p-biased coin, the fair coin, and flipping heads. Then:

Pr[biased | HTTTHT] = Pr[HTTTHT | biased] · Pr[biased] / Pr[HTTTHT]       (Bayes' Rule)
                   = Pr[HTTTHT | biased] · Pr[biased] / (Pr[HTTTHT | biased] · Pr[biased] + Pr[HTTTHT | fair] · Pr[fair])   (Law of Total Probability)
                   = ((3/4)^2 (1/4)^4 · 1/2) / ((3/4)^2 (1/4)^4 · 1/2 + (1/2)^2 (1/2)^4 · 1/2)   (the given process)
                   = (9/4096) / (9/4096 + 64/4096)
                   = 9/73.
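The posterior computations in 10.97–10.99 all follow one pattern, which can be captured in a small sketch (the helper name and the explicit flip-sequence likelihoods are mine):

```python
def posterior_biased(p_heads_biased, flips, prior_biased=0.5):
    """Posterior probability that the biased coin was drawn, given a flip
    string like 'HHHT'. The other coin is assumed fair."""
    def likelihood(p):
        result = 1.0
        for flip in flips:
            result *= p if flip == 'H' else (1 - p)
        return result
    numerator = likelihood(p_heads_biased) * prior_biased
    denominator = numerator + likelihood(0.5) * (1 - prior_biased)
    return numerator / denominator
```

For instance, posterior_biased(2/3, 'H') evaluates to 4/7, and posterior_biased(3/4, 'HTTTHT') to 9/73.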

10.100 The slot is full with probability 1/m, and it's empty with probability 1 − 1/m.

10.101 The probability that hash function i does not fill this cell is 1 − 1/m. We need all k of these non-filling events to occur, one per hash function. The events are independent, so the probability is (1 − 1/m)^k.


10.102 The event of a particular element leaving cell i empty occurs independently for each element, and we just computed that its probability is (1 − 1/m)^k. Thus the probability of this cell remaining empty is [(1 − 1/m)^k]^n = (1 − 1/m)^{kn}.

10.103 Here is the sample space (that is, the set of all 16 possible hash values for x and y under h1 and h2):

h1(x)  h2(x)  h1(y)  h2(y)  false positive?
  1      1      1      1         ✓
  1      1      1      2         ✗
  1      1      2      1         ✗
  1      1      2      2         ✗
  1      2      1      1         ✓
  1      2      1      2         ✓
  1      2      2      1         ✓
  1      2      2      2         ✓
  2      1      1      1         ✓
  2      1      1      2         ✓
  2      1      2      1         ✓
  2      1      2      2         ✓
  2      2      1      1         ✗
  2      2      1      2         ✗
  2      2      2      1         ✗
  2      2      2      2         ✓

Each outcome is equally likely, so there is a 10/16 probability of a false positive—the chance that both y hash values are contained in {h1(x), h2(x)}.

10.104 The calculated false-positive rate is

[1 − (1 − 1/m)^k]^k = [1 − (1 − 1/2)^2]^2 = (3/4)^2 = 9/16.

Thus the actual false-positive rate (10/16) is slightly higher than the calculated false-positive rate (9/16).

Another way of seeing this issue from the table: observe that Pr[B2] = 3/4 = 0.75, but Pr[B2 | B1] = 10/12 = 0.8333···, so Pr[B2 | B1] ≠ Pr[B2]. The reason for this discrepancy is that the slot h1(y) being occupied (B1) is positively correlated with h2(y) being occupied (B2), for the following somewhat subtle reason: the fact that h1(y) is occupied makes it more likely that both cells are full because of x (that is, that h1(x) ≠ h2(x))—and therefore more likely that h2(y) will also be occupied.

10.105 The (mis)calculated error rate is 0.05457···; my program's output varied across multiple runs, but across 10 runs its observed error rate was above that number 5 times and below it 5 times (ranging from 0.05297 to 0.05566). My average observed error rate was 0.05444···, just a little lower than predicted. The Python code is shown in Figure S.10.7.

10.4 Random Variables

10.106 The outcomes and the corresponding probabilities are:


import random

def create_random_function(input_size, output_size):
    '''Returns a random function {0, 1, ..., input_size - 1} --> {0, 1, ..., output_size - 1}.'''
    return [random.randint(0, output_size - 1) for i in range(input_size)]

def bloom_filter_false_positive_rate_empirical(n, m, k):
    '''Builds a Bloom filter in a m-cell hash table with k hash functions, and inserts n elements.
    Returns the observed fraction of false positives (computed for n non-inserted elements).'''
    table = [False for i in range(m)]
    hash_functions = [create_random_function(2 * n, m) for i in range(k)]

    for x in range(n):
        for h in hash_functions:
            table[h[x]] = True

    false_positives = 0
    for y in range(n, 2*n):   # y is an uninserted element!
        false_positives += all([table[h[y]] for h in hash_functions])
    return false_positives / n

def bloom_filter_false_positive_rate_calculated(n, m, k):
    expected_error_rate = (1 - (1 - 1/m)**(k*n))**k
    return expected_error_rate

Figure S.10.7 Calculating the rate of false positives in a Bloom filter.

outcome      probability   L   V
Computers       9/44       9   3
are             3/44       3   2
useless         7/44       7   3
They            4/44       4   1
can             3/44       3   1
only            4/44       4   1
give            4/44       4   2
you             3/44       3   2
answers         7/44       7   2

10.107 Pr[L = 4] = 12/44 and E[V | L = 4] = (1 + 1 + 2)/3 = 4/3.
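The word-length and vowel-count table can be regenerated programmatically (choosing a word with probability proportional to its length, as the 9/44, 3/44, . . . column indicates; the helper names are mine):

```python
from fractions import Fraction

WORDS = "Computers are useless They can only give you answers".split()
VOWELS = set("aeiouAEIOU")

total_letters = sum(len(w) for w in WORDS)   # 44

# Pr[word] = len(word)/44; L = word length; V = number of vowels.
dist = [(Fraction(len(w), total_letters), len(w), sum(c in VOWELS for c in w))
        for w in WORDS]

pr_L4 = sum(p for p, L, V in dist if L == 4)
expected_V_given_L4 = sum(p * V for p, L, V in dist if L == 4) / pr_L4
```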

10.108 No, they are not independent; for example, Pr[L = 7] = 14/44 and Pr[V = 1] = 11/44, but

Pr[L = 7 and V = 1] = 0 ≠ (14/44) · (11/44).

10.109 By definition, E[L] and E[V] are

E[L] = (9/44)·9 + (3/44)·3 + (7/44)·7 + (4/44)·4 + (3/44)·3 + (4/44)·4 + (4/44)·4 + (3/44)·3 + (7/44)·7 = 254/44
E[V] = (9/44)·3 + (3/44)·2 + (7/44)·3 + (4/44)·1 + (3/44)·1 + (4/44)·1 + (4/44)·2 + (3/44)·2 + (7/44)·2 = 93/44.

10.110 The variance of L is

var(L) = E[L^2] − (E[L])^2
       = (9/44)·81 + (3/44)·9 + (7/44)·49 + (4/44)·16 + (3/44)·9 + (4/44)·16 + (4/44)·16 + (3/44)·9 + (7/44)·49 − 254^2/44^2
       = (729 + 27 + 343 + 64 + 27 + 64 + 64 + 27 + 343)/44 − 64516/1936
       = (74272 − 64516)/1936
       = 9756/1936 = 5.039256···.
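These moment calculations are easy to mis-transcribe; a short exact-arithmetic check over the same distribution (the triples and function names are mine, transcribed from the table above):

```python
from fractions import Fraction

# (weight, L, V) for the nine words of the quotation; Pr[word] = weight/44.
DIST = [(9, 9, 3), (3, 3, 2), (7, 7, 3), (4, 4, 1), (3, 3, 1),
        (4, 4, 1), (4, 4, 2), (3, 3, 2), (7, 7, 2)]

def mean(coord):
    # coord = 1 selects L, coord = 2 selects V.
    return sum(Fraction(t[0], 44) * t[coord] for t in DIST)

def variance(coord):
    # var(X) = E[X^2] - (E[X])^2, computed with exact fractions.
    return sum(Fraction(t[0], 44) * t[coord] ** 2 for t in DIST) - mean(coord) ** 2
```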


10.111 The variance of V is

var(V) = E[V^2] − (E[V])^2
       = (9/44)·9 + (3/44)·4 + (7/44)·9 + (4/44)·1 + (3/44)·1 + (4/44)·1 + (4/44)·4 + (3/44)·4 + (7/44)·4 − 93^2/44^2
       = (81 + 12 + 63 + 4 + 3 + 4 + 16 + 12 + 28)/44 − 8649/1936
       = (9812 − 8649)/1936
       = 1163/1936 = 0.600723···.

10.112 We have H = 0 only if all coins come up tails, which happens with probability 1/2^16. Otherwise, with probability 1 − 1/2^16, we have H = 1. Thus E[H] = Pr[H = 1] = 1 − 1/2^16.

10.113 Figure S.10.8 shows Python code to compute E[R] by generating all 2^16 sequences of coin flips, and then calculating the length of the longest run in each. I get an expected length of 4.23745···.

10.114 No: Pr[R = 8] > 0 and Pr[H = 0] = 1/2^16 > 0, but

Pr[H = 0 and R = 8] = 0 ≠ Pr[R = 8] · Pr[H = 0].

(If all sixteen coins come up tails then H = 0, but then the longest run has length 16, not 8.)

10.115 There are precisely 2 of the 18 faces (among the three dice) with each number in {1, . . . , 9}. Thus the probability of rolling any of those numbers is exactly 2/18 = 1/9.

10.116 By the previous exercise and the fact that the two rolls are independent, we know that there's a 1/81 probability for each ⟨X, Y⟩ ∈ {1, 2, . . . , 9}^2. The function f(x, y) = 9x − y is a bijection between {1, 2, . . . , 9}^2 and {0, 1, . . . , 80}. Thus Pr[9X − Y = k] = 1/81 for each k ∈ {0, 1, . . . , 80}.

10.117 The expected value for each die is 5:

E[B] = (1 + 2 + 5 + 6 + 7 + 9)/6 = 30/6 = 5
E[R] = (1 + 3 + 4 + 5 + 8 + 9)/6 = 30/6 = 5
E[K] = (2 + 3 + 4 + 6 + 7 + 8)/6 = 30/6 = 5.

10.118 First, we'll show that Pr[B > R], Pr[R > K], and Pr[K > B] are all equal to 17/36, using Pr[B > R] = Pr[B > R | R = 1] · Pr[R = 1] + Pr[B > R | R = 3] · Pr[R = 3] + · · · and so forth:

Pr[B > R] = Pr[B > 1] · Pr[R = 1] + Pr[B > 3] · Pr[R = 3] + Pr[B > 4] · Pr[R = 4] + Pr[B > 5] · Pr[R = 5] + Pr[B > 8] · Pr[R = 8] + Pr[B > 9] · Pr[R = 9]
          = (1/6) · [Pr[B > 1] + Pr[B > 3] + Pr[B > 4] + Pr[B > 5] + Pr[B > 8] + Pr[B > 9]]
          = (5 + 4 + 4 + 3 + 1 + 0)/36 = 17/36.
Pr[R > K] = (1/6) · [Pr[R > 2] + Pr[R > 3] + Pr[R > 4] + Pr[R > 6] + Pr[R > 7] + Pr[R > 8]]

def longest_run_length(x):
    zeros, ones = "0", "1"
    while zeros in x or ones in x:
        zeros = zeros + "0"
        ones = ones + "1"
    return len(zeros) - 1

run_lengths = [longest_run_length(bin(index)) for index in range(2**16)]
print("The average longest run length is", sum(run_lengths) / len(run_lengths))
# Output: The average longest run length is 4.237457275390625

Figure S.10.8 A Python implementation to calculate the expected length of the longest run in a 16-bit string.

          = (5 + 4 + 3 + 2 + 2 + 1)/36 = 17/36.
Pr[K > B] = (1/6) · [Pr[K > 1] + Pr[K > 2] + Pr[K > 5] + Pr[K > 6] + Pr[K > 7] + Pr[K > 9]]
          = (6 + 5 + 3 + 2 + 1 + 0)/36 = 17/36.

Also observe that Pr[B = R] = 3/36, because there are three numbers shared between the two dice (and each pair comes up with probability 1/36). Similarly, Pr[R = K] = 3/36 and Pr[K = B] = 3/36. Thus

Pr[B > R | B ≠ R] = Pr[B > R and B ≠ R] / Pr[B ≠ R]
                  = Pr[B > R] / Pr[B ≠ R]        (if B > R then B ≠ R)
                  = (17/36) / (33/36) = 17/33.

By precisely the same argument, we have Pr[R > K | R ≠ K] = 17/33 and Pr[K > B | K ≠ B] = 17/33.
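The 17/36 and 17/33 values can be confirmed by enumerating all 36 face pairs directly (a small sketch; the variable and function names are mine):

```python
from fractions import Fraction
from itertools import product

B = [1, 2, 5, 6, 7, 9]
R = [1, 3, 4, 5, 8, 9]
K = [2, 3, 4, 6, 7, 8]

def pr_beats(X, Y):
    # Pr[X > Y] over all 36 equally likely face pairs.
    wins = sum(1 for x, y in product(X, Y) if x > y)
    return Fraction(wins, 36)

def pr_beats_given_unequal(X, Y):
    # Pr[X > Y | X != Y].
    wins = sum(1 for x, y in product(X, Y) if x > y)
    unequal = sum(1 for x, y in product(X, Y) if x != y)
    return Fraction(wins, unequal)
```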

10.119 All three dice are fair, so the expectation of each one is 1/6 of the sum of its faces. And 4 · 3 + 6 = 3 · 2 + 3 · 5 = 1 + 5 · 4 = 21. Thus all three random variables have expectation 21/6 = 3.5.

10.120 K > L unless we observe K = 3 and L = 5, which happens with probability (5/6) · (1/2) = 5/12. L > M unless we observe L = 2 and M = 4, which happens with probability (1/2) · (5/6) = 5/12. M > K unless we get M = 1 or K = 6, which happens with probability (1/6) + (1/6) − (1/36) = 11/36. (We had to subtract the 1/36 because we double-counted the ⟨1, 6⟩ outcome.) Thus Pr[K > L] = Pr[L > M] = 7/12 and Pr[M > K] = 25/36, all of which are strictly greater than 1/2.

10.121 Each die has expectation 21/6. By linearity of expectation, the sum of any two of the random variables V1 and V2 has expectation E[V1 + V2] = E[V1] + E[V2] = 42/6 = 7.

10.122 Here's a solution in Python:

K = [3,3,3,3,3,6]
L = [2,2,2,5,5,5]
M = [1,4,4,4,4,4]

KK = [k1 + k2 for k1 in K for k2 in K]
LL = [l1 + l2 for l1 in L for l2 in L]
MM = [m1 + m2 for m1 in M for m2 in M]

print("K beats L:", len([(kk,ll) for kk in KK for ll in LL if kk > ll]))   # output: K beats L: 531
print("L beats M:", len([(ll,mm) for ll in LL for mm in MM if ll > mm]))   # output: L beats M: 531
print("M beats K:", len([(mm,kk) for mm in MM for kk in KK if mm > kk]))   # output: M beats K: 625

There’s a 531/1296 = 0.40972 · · · chance that K1 + K2 > L1 + L2 . There’s a 531/1296 = 0.40972 · · · chance that L1 + L2 > M1 + M2 . There’s a 625/1296 = 0.48225 · · · chance that M1 + M2 > K1 + K2 . Indeed, all three of these probabilities are strictly less than 1/2. (And the probability of a tie—for example, of K1 + K2 = L1 + L2 —is still zero.) 10.123 The probabilities that we have P pairs in a five-card hand are:

Pr[P = 6] = 13 · 12 · 4 / C(52,5)                       (choose a 4-of-a-kind rank; choose any other rank (and suit))
Pr[P = 5] = 0                                           (impossible to have exactly five pairs)
Pr[P = 4] = 13 · C(4,3) · 12 · C(4,2) / C(52,5)         (choose a 3-of-a-kind rank (and three suits); choose another rank for a pair (and two suits))
Pr[P = 3] = 13 · C(4,3) · C(12,2) · 4^2 / C(52,5)       (choose a 3-of-a-kind rank (and three suits); choose two other ranks (of any suit))
Pr[P = 2] = C(13,2) · C(4,2)^2 · 11 · 4 / C(52,5)       (choose two pair ranks (and two suits each); choose any other rank (and suit))
Pr[P = 1] = 13 · C(4,2) · C(12,3) · 4^3 / C(52,5)       (choose a pair rank (and two suits); choose three other ranks (and any suit))
Pr[P = 0] = C(13,5) · 4^5 / C(52,5)                     (choose five distinct ranks (and any suit for each))

The expected number of pairs is Σ_{i=0}^{6} Pr[P = i] · i, which, using the above calculations, comes out to be 0.5882···.

10.124 Define an indicator random variable R_{i,j} that's 1 if and only if cards #i and #j are a pair. We have Pr[R_{i,j} = 1] = 3/51: no matter what card #i is, there are precisely 3 cards of the remaining 51 that pair with it. Thus the total number of pairs in the hand is Σ_{i=1}^{5} Σ_{j=1}^{i−1} R_{i,j}, which, by linearity of expectation, has expectation

C(5,2) · E[R_{i,j}] = 10 · 3/51 = 30/51 = 0.5882···.
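The casework in 10.123 and the indicator-variable shortcut in 10.124 should agree; here is a quick cross-check using math.comb (the variable names are mine):

```python
from math import comb

hands = comb(52, 5)

# (number of pairs, count of hands) for each case from 10.123.
cases = [
    (6, 13 * 12 * 4),                           # four of a kind
    (4, 13 * comb(4, 3) * 12 * comb(4, 2)),     # full house
    (3, 13 * comb(4, 3) * comb(12, 2) * 4**2),  # three of a kind
    (2, comb(13, 2) * comb(4, 2)**2 * 11 * 4),  # two pair
    (1, 13 * comb(4, 2) * comb(12, 3) * 4**3),  # one pair
    (0, comb(13, 5) * 4**5),                    # five distinct ranks
]

total_count = sum(count for i, count in cases)
expected_pairs = sum(i * count for i, count in cases) / hands
```

Note that the case counts sum to exactly C(52,5), a useful check that the cases partition all hands.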

10.125 Let Hi denote the high-card points of the ith card in the hand. Then we have

E[Hi] = 4 · Pr[A] + 3 · Pr[K] + 2 · Pr[Q] + 1 · Pr[J] = 4 · (1/13) + 3 · (1/13) + 2 · (1/13) + 1 · (1/13) = 10/13.

By linearity of expectation, then,

E[Σ_{i=1}^{13} Hi] = Σ_{i=1}^{13} E[Hi] = Σ_{i=1}^{13} 10/13 = 10.

The expected number of high-card points in a bridge hand is 10.

10.126 There are 13 hearts and 39 nonhearts. Thus

• The number of hands void in hearts is C(39,13): choose 13 nonhearts.
• The number of hands with one heart is C(13,1) · C(39,12): choose 1 heart, and 12 nonhearts.
• The number of hands with two hearts is C(13,2) · C(39,11): choose 2 hearts, and 11 nonhearts.

Thus the expected number of hearts distribution points is

3 · C(39,13)/C(52,13) + 2 · C(13,1) · C(39,12)/C(52,13) + 1 · C(13,2) · C(39,11)/C(52,13) = 0.404369···.

10.127 By linearity of expectation, the expected number of points in a hand is 10 + 4 · 0.404369 · · · = 11.617479 · · · , where 4 · 0.404369 is the expected number of distribution points (multiplying the result from the previous exercise by four, because there are four suits in which it’s possible to get distribution points).


10.128 The second version of the summation for expectation in Definition 10.17 results from collecting each outcome x that has the same value of the random variable X(x). Here's an algebraic proof of their equivalence:

Σ_{x∈S} X(x) · Pr[x]
  = Σ_{y∈R} Σ_{x∈S: X(x)=y} X(x) · Pr[x]   (collecting all terms in which X(x) = y)
  = Σ_{y∈R} y · Σ_{x∈S: X(x)=y} Pr[x]      (for these terms, X(x) = y, so we can pull out the X(x) = y)
  = Σ_{y∈R} y · Pr[X = y].                 (Pr[X = y] is precisely Σ_{x∈S: X(x)=y} Pr[x])

10.129 Let X and Y be independent random variables. Then

E[X · Y] = Σ_{s∈S} (X · Y)(s) · Pr[s]                     (definition of expectation)
        = Σ_{x,y} (x · y) · Pr[X = x and Y = y]           (summing over values of X, Y instead of outcomes)
        = Σ_x Σ_y (x · y) · Pr[X = x] · Pr[Y = y]         (definition of independent random variables)
        = Σ_x x · Pr[X = x] · Σ_y y · Pr[Y = y]           (factoring)
        = Σ_x x · Pr[X = x] · E[Y]                        (definition of expectation)
        = E[Y] · Σ_x x · Pr[X = x]                        (factoring)
        = E[Y] · E[X].                                    (definition of expectation)

10.130 Flip two fair coins. Let X denote the number of heads in those tosses, and let Y denote the number of tails in those tosses. Then E[X] = E[Y] = 1. But

E[X · Y] = X(HH) · Y(HH) · Pr[HH] + X(HT) · Y(HT) · Pr[HT] + X(TH) · Y(TH) · Pr[TH] + X(TT) · Y(TT) · Pr[TT]
         = 2 · 0 · Pr[HH] + 1 · 1 · Pr[HT] + 1 · 1 · Pr[TH] + 0 · 2 · Pr[TT]
         = 0 · 1/4 + 1 · 1/4 + 1 · 1/4 + 0 · 1/4 = 1/2.

Thus these random variables form an example instance in which E[X · Y] ≠ E[X] · E[Y].

10.131 Consider the following sample space and corresponding values for the random variables X and Y:

outcome   probability    X    Y
HH           0.25        0    1
HT           0.25        1    0
TH           0.25       −1    0
TT           0.25        0   −1

Then E[X · Y] = E[X] · E[Y]:

E[X] = 0.25(0 + 1 − 1 + 0) = 0
E[Y] = 0.25(1 + 0 + 0 − 1) = 0
E[XY] = 0.25(0 + 0 + 0 + 0) = 0.


But X and Y are not independent because, for example, Pr[X = 0] = 1/2 and Pr[Y = 0] = 1/2, but

Pr[X = 0 and Y = 0] = 0 ≠ 1/4 = Pr[X = 0] · Pr[Y = 0].

Thus X and Y are an example of dependent random variables for which E[X · Y] = E[X] · E[Y].

10.132 Following the hint, define a random variable Xi denoting the number of coin flips after the (i − 1)st Heads before you get another Heads. We have to wait Σ_{i=1}^{1000} Xi flips before we get the 1000th head, and we have E[Xi] = 1/p. By linearity of expectation, we have to wait 1000/p flips before we get the 1000th head.

10.133 Roughly, this process is flipping a single coin that comes up heads with probability p^2, so we'd expect a waiting time of about 1/p^2 flips. (But there's a bit of subtlety, as the fact that flips #1 and #2 didn't produce two consecutive heads makes it a bit less likely that flips #2 and #3 would.) To solve this problem more precisely, let F denote the expected number of flips before getting two consecutive heads. Then:

• there's a 1 − p chance the first flip is tails; if so, we need F more flips (after the first) to get two consecutive heads.
• there's a p · (1 − p) chance the first flip is heads but the second is tails, in which case we need F more flips (after the first two) to get two consecutive heads.
• there's a p^2 chance the first two flips are heads, in which case we're done.

Therefore the number of flips is 2 with probability p^2; it is 1 + F with probability 1 − p; and it is 2 + F with probability p(1 − p). In other words, we know that F satisfies

F = 2p^2 + (1 + F) · (1 − p) + (2 + F) · p · (1 − p)
  = 2p^2 + 1 − p + F(1 − p + p − p^2) + 2p − 2p^2
  = 1 + p + F(1 − p^2),

or p^2 F = 1 + p (subtracting F(1 − p^2) from both sides), and thus F = (1 + p)/p^2 (dividing both sides by p^2).

Another way to think about it: a p-fraction of the time that the coin comes up Heads (which happens every 1/p flips in expectation), we're done with one additional flip; a (1 − p)-fraction of the time we must start over. So the expected number F of flips satisfies F = p · (1/p + 1) + (1 − p) · (1/p + 1 + F), which also leads to F = (1 + p)/p^2.

10.134 The ith element of the array is swapped all the way back to the beginning of the array if and only if A[i] is the smallest element in A[1 . . . i]. The probability that the ith element is the smallest out of these i elements is precisely 1/i.

10.135 The number of comparisons is precisely the number of swapped pairs plus all failed (non-swap-inducing) comparisons. Write the latter as Σ_{i=1}^{n} Yi, where Yi = 1 if element i is not swapped all the way to the beginning, and Yi = 0 if it is. Observe that E[Yi] = 1 − 1/i by the previous exercise, and so

E[Y] = Σ_{i=1}^{n} E[Yi] = n − Σ_{i=1}^{n} 1/i = n − Hn,

where Hn is the nth harmonic number. We’ve already shown that the expected number of swapped pairs is the total number of comparisons, in expectation, is  n 2

2

n 2



/2. Thus

+ n − Hn .

. That value first exceeds 1 for n = 448, as 448 · 447 = 200,256, but 10.136 The expected number of collisions is n(n−1) 200,000 447 · 446 = 199,362. For 100,000 collisions, we need n(n − 1) ≥ 200,000 · 100,000, which first occurs for n = 141,422. 10.137 Using the Python implementation in Figure S.10.9, I saw 1.022 collisions on average with n = 448 elements, and 99996.74 collisions on average with n = 141,422 elements. 1 10.138 There are i − 1 filled slots, so there is a i−1 chance that the next element goes into an already filled slot, and a m−i+1 chance that the next element goes into an unfilled slot. Thus m

1 Pr [get a new type in a draw if you have i − 1 types already] m = . m−i+1

E [ Xi ] =

Example 10.41 above discussion

10.139 Let X denote the number of elements hashed before all m slots are full. Let X1 denote the time until you have one full slot; X2 denote the time between having one full slot and two full slots; X3 denote the time between having two full slots and three full slots; etc. Observe that X = X1 + X2 + X3 + · · · + Xm . But now " m # X E [X] = E Xi definition of Xi s i=1

= =

m X i=1 m X i=1

=m· =m·

E [Xi ] m m−i+1 m X i=1 m X j=1

1

linearity of expectation

Exercise 10.138

1 m−i+1 1 m

factoring

change of index of summation

import random

2 3 4 5 6 7 8 9 10 11 12 13 14

def count_collisions(slots, elements): ''' Count the number of collisions in a hash table with number of cells==slots and number of hashed items==elements, under a simple uniform hashing assumption. ''' T = [0 for i in range(slots)] collisions = 0 for x in range(elements): slot = random.randint(0, slots - 1) # choose the random slot. collisions += T[slot] T[slot] += 1 return collisions

15 16 17 18

for (num_trials, num_slots, num_elements) in [(1000, 100000, 448), (100, 100000,141422)]: total_collisions = sum([count_collisions(num_slots, num_elements) for i in range(num_trials)]) print("for", num_elements, "average number of collisions =", total_collisions / num_trials)

Figure S.10.9 Counting collisions when using chaining to resolve collisions in a hash table.

214

Probability

= m · Hm .

definition of harmonic numbers

10.140 There are 2020 equally probable possible sequences of responses; we’re happy with any of the 20! permutations of the elements. Thus we get 20 different answers in the first 20 trials with probability 20! ≈ 0.0000000232019 · · · . 2020 10.141 We expect to need 20 · H20 trials. The exact value of H20 is 1 + to need 71.95479 · · · trials to collect all 20 answers.

1 2

+

1 3

+ ··· +

= 3.5977 · · · , so we’d expect

1 20

10.142 Each trial needs to fail to get the particular answer, which happens with probability need all 200 trials to fail, which happens with probability (19/20)200 ≈ 0.00003505.

19 20

on any particular trial. We’d

10.143 Let Ei be the event that we’ve missed answer i. We’re looking to show that the probability of " 20 # 20 [ X Pr Ei ≤ Pr [Ei ] i=1

S20 i=1

Ei is small. Union Bound

i=1

=

20 X i=1

= 20 ·

 19 200 20

previous exercise

 19 200 20

= 0.00070105 · · · . Thus the probability of failure is upper bounded by ≈ 0.07%. 10.144 Consider an n-bit number. If the error occurs in bit i, which occurs with probability 1n , then the error is of size 2i−1 . Thus the expected error is n−1 X

1 n

· 2i =

1 n

· (2n − 1).

i=0

For n = 32, the expected error is therefore of size

232 −1 32

= 227 −

1 . 32

10.145 The probability is 1n : there are n slots, and i is equally likely to go into any of them. 10.146 Let Xi be an indicator random variable where Xi = 1 if and only if πi = 1. Then X = of expectation and the previous exercise, " n # n n n X X X X 1 E Xi = E [Xi ] = Pr [Xi ] = = n · n1 = 1. n i=1

i=1

i=1

Pn i=1

Xi , and, by linearity

i=1

10.147 Let X be a random variable that is always nonnegative. Let α ≥ 1 be arbitrary. Note that X either satisfies X ≥ α or 0 ≤ X < α. Thus:

E [X] = E [X|X ≥ α] · Pr [X ≥ α] + E [X|X < α] · Pr [X < α] ≥ α · Pr [X ≥ α] + 0 · Pr [X < α] = α · Pr [X ≥ α] . Dividing both sides by α yields that Pr [X ≥ α] ≤

E [ X] . α

10.148 By Markov’s inequality with α = (the median of X), we have

Pr [X ≥ median] ≤

E [X] . median

the “law of total expectation” (Theorem 10.21) X ≥ α and X ≥ 0 in the two terms, respectively

10.4 Random Variables

215

And, by definition of median, we also have

Pr [X ≥ median] ≥ Combining these inequalities, we have



1 2

E[X] median

and thus, rearranging, median ≤ 2 · E [X].

10.149 The probability that it takes k flips to get the first heads is ∞ X

1 2i

1 . 2

∞ X

· ( 32 )i =

i=1

1 . 2k

( 34 )i =

i=1

Thus your expected payment from the game is 3 4

·

∞ X

( 34 )i .

i=0

Using the formula for a geometric sum, this value is 3 4

·

1 1−

3 4

=

3 4

· 4 = 3.

You should be willing to pay $3 to play. 10.150 The probability that it takes k flips to get the first heads is X

1 2i

· 2i =

1 . 2k

Thus your expected payment from the game is

P i=1,2,...

1.

i=1,2,...

In other words, this game has infinite expected payment! You should be willing to pay any amount y to play (… assuming you care only about expectation, which is certainly not true of your true preferences for dollars. For example, suppose I gave you the choice between (i) a 50% chance at winning two trillion dollars [and a 50% chance of winning nothing], and (ii) a guaranteed one trillion dollars. You are not neutral between (i) and (ii), even though the two scenarios have exactly the same expected value!)

10.151 Note that X = i with probability C(4, i)/16 for each i, which works out to 1/16, 4/16, 6/16, 4/16, 1/16 for X = 0, 1, 2, 3, 4. Thus:

var(X) = E[X²] − (E[X])²
       = (0²·1 + 1²·4 + 2²·6 + 3²·4 + 4²·1)/16 − ((0·1 + 1·4 + 2·6 + 3·4 + 4·1)/16)²
       = (0 + 4 + 24 + 36 + 16)/16 − ((0 + 4 + 12 + 12 + 4)/16)²
       = 80/16 − 2² = 5 − 4 = 1.

The variance is 1.

10.152 Note that E[Y] = 3.5, just as for a single roll: write Y = Y₁ + Y₂, where Yᵢ is half the result of the ith of the two independent rolls of the die; then E[Yᵢ] = 0.5 · 3.5 = 1.75, and, by linearity of expectation, E[Y] = E[Y₁] + E[Y₂] = 3.5. As for E[Y²], there are 6 − i ways to get a total of 7 ± i, so

E[Y²] = (1/36) · [1·(2/2)² + 2·(3/2)² + 3·(4/2)² + 4·(5/2)² + 5·(6/2)² + 6·(7/2)² + 5·(8/2)² + 4·(9/2)² + 3·(10/2)² + 2·(11/2)² + 1·(12/2)²]
      = 987/72 ≈ 13.70833.

Thus var(Y) = 987/72 − (7/2)² = 105/72 ≈ 1.45833, half of what it was before.
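The two-dice computation above can be verified exactly by enumeration; this supplementary sketch (not part of the original solution) uses Python’s `fractions` module:

```python
from fractions import Fraction

# Y is the average of two fair six-sided dice; enumerate all 36 equally likely rolls.
rolls = [(a, b) for a in range(1, 7) for b in range(1, 7)]
ys = [Fraction(a + b, 2) for a, b in rolls]

E_Y = sum(ys) / len(ys)
E_Y2 = sum(y * y for y in ys) / len(ys)
var_Y = E_Y2 - E_Y ** 2

assert E_Y == Fraction(7, 2)
assert E_Y2 == Fraction(987, 72)
assert var_Y == Fraction(105, 72)
```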

216

Probability

10.153 Let a ∈ R, and let X be a random variable. Then

E[a · X] = ∑_{x∈S} Pr[x] · (a · X)(x)	definition of expectation
         = ∑_{x∈S} Pr[x] · a · X(x)	definition of the random variable a · X
         = a · ∑_{x∈S} Pr[x] · X(x)	algebra
         = a · E[X].	definition of expectation

10.154 Let a ∈ R, and let X be a random variable. Then

var(a · X) = E[(a · X)²] − (E[a · X])²	Theorem 10.23
           = E[a² · X²] − (E[a · X])²	algebra
           = a² · E[X²] − (a · E[X])²	linearity of expectation/Exercise 10.153
           = a² · E[X²] − a² · (E[X])²	algebra
           = a² · (E[X²] − (E[X])²)	factoring
           = a² · var(X).	Theorem 10.23

10.155 Let X and Y be two independent random variables. Then:

var(X + Y) = E[(X + Y)²] − (E[X + Y])²	definition of variance
           = E[(X + Y)²] − (E[X] + E[Y])²	linearity of expectation
           = E[X² + 2XY + Y²] − (E[X])² − 2E[X]E[Y] − (E[Y])²	algebra
           = E[X²] + E[2XY] + E[Y²] − (E[X])² − 2E[X]E[Y] − (E[Y])²	linearity of expectation
           = (E[X²] − (E[X])²) + (E[Y²] − (E[Y])²) + E[2XY] − 2E[X]E[Y]	rearranging terms
           = var(X) + var(Y) + E[2XY] − 2E[X]E[Y]	definition of variance
           = var(X) + var(Y) + E[2XY] − E[2XY]	Exercise 10.129
           = var(X) + var(Y).

10.156 Define indicator random variables X₁, X₂, …, X_n that are 1 if the ith flip comes up heads, so that X, the number of heads found in n flips of a p-biased coin, is given by X = ∑_{i=1}^{n} X_i. Note that these indicator random variables are independent, E[X_i] = p, and var(X_i) = E[X_i²] − (E[X_i])² = p − p² = p(1 − p). Thus

E[X] = E[∑_{i=1}^{n} X_i] = ∑_{i=1}^{n} E[X_i] = np

by linearity of expectation and the above discussion, and

var(X) = var(∑_{i=1}^{n} X_i) = ∑_{i=1}^{n} var(X_i) = np(1 − p)

by Exercise 10.155 and the above discussion.

10.157 Define an indicator random variable X_i that’s 1 if the ith flip came up heads. Then we can write

Y = (∑_{i=1}^{n} X_i)/n = (1/n) · X,


where X is the number of heads found in n flips of a p-biased coin. Thus

E[Y] = E[(1/n) · X]
     = (1/n) · E[X]	linearity of expectation
     = (1/n) · pn	Exercise 10.156
     = p,

and

var(Y) = var((1/n) · X)
       = (1/n²) · var(X)	Exercise 10.154
       = (1/n²) · np(1 − p)	Exercise 10.156
       = p(1 − p)/n.
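These binomial formulas can be checked exactly for small n by enumerating all coin-flip sequences (a supplementary sketch, not part of the original solution):

```python
from fractions import Fraction
from itertools import product

def binomial_moments(n, p):
    # Enumerate all 2^n flip sequences exactly; X = number of heads.
    E = Fraction(0)
    E2 = Fraction(0)
    for flips in product([0, 1], repeat=n):
        prob = Fraction(1)
        for f in flips:
            prob *= p if f == 1 else 1 - p
        x = sum(flips)
        E += prob * x
        E2 += prob * x * x
    return E, E2 - E ** 2          # (mean, variance)

p = Fraction(1, 3)
for n in range(1, 6):
    mean, var = binomial_moments(n, p)
    assert mean == n * p           # E[X] = np
    assert var == n * p * (1 - p)  # var(X) = np(1 - p)
```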

10.158 Recall the formula for a geometric summation:

∑_{i=0}^{n} r^i = (r^{n+1} − 1)/(r − 1).	(1)

Thus the first derivatives of both sides must be equal, and likewise the second derivatives. Taking the first and second derivatives of the left-hand side of (1), we have

d/dr ∑_{i=0}^{n} r^i = ∑_{i=0}^{n} i·r^{i−1}	(2)
d²/dr² ∑_{i=0}^{n} r^i = ∑_{i=0}^{n} i(i − 1)·r^{i−2}.	(3)

Differentiating the right-hand side of (1), we have

d/dr [(r^{n+1} − 1)/(r − 1)] = [(r − 1)(n + 1)r^n − (r^{n+1} − 1)]/(r − 1)²
                             = [n·r^{n+1} − (n + 1)·r^n + 1]/(r − 1)²
                             → 1/(r − 1)² = 1/(1 − r)² as n → ∞ for r ∈ [0, 1).	(4)

Similarly, taking the derivative of [n·r^{n+1} − (n + 1)·r^n + 1]/(r − 1)², we see that

d²/dr² [(r^{n+1} − 1)/(r − 1)] → −2/(r − 1)³ = 2/(1 − r)³ as n → ∞ for r ∈ [0, 1).	(5)

Assembling these observations (and taking the limit as n → ∞), we have

∑_{i=0}^{∞} i·r^{i−1} = 1/(1 − r)²	by (1), (2), and (4)
∑_{i=0}^{∞} i·r^i = r/(1 − r)²,	multiplying both sides by r

which was the first formula we were asked to show. Similarly,

∑_{i=0}^{∞} i(i − 1)·r^{i−2} = 2/(1 − r)³	by (1), (3), and (5)
∑_{i=0}^{∞} i(i − 1)·r^i = 2r²/(1 − r)³.	multiplying both sides by r²

Finally, adding the two sides of our first result to this equality, we derive our second desired formula:

∑_{i=0}^{∞} i(i − 1)·r^i + ∑_{i=0}^{∞} i·r^i = 2r²/(1 − r)³ + r/(1 − r)²
∑_{i=0}^{∞} i²·r^i = r/(1 − r)² + 2r²/(1 − r)³	i + i(i − 1) = i²
                   = [r(1 − r) + 2r²]/(1 − r)³	algebra
                   = r(1 + r)/(1 − r)³.	algebra
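Both limiting formulas can be spot-checked numerically with large partial sums (supplementary code, not part of the original derivation):

```python
def partial_sums(r, terms=10000):
    # Partial sums of sum i*r^i and sum i^2*r^i; for r in [0,1) the tails are negligible.
    s1 = sum(i * r ** i for i in range(terms))
    s2 = sum(i * i * r ** i for i in range(terms))
    return s1, s2

for r in [0.1, 0.5, 0.9]:
    s1, s2 = partial_sums(r)
    assert abs(s1 - r / (1 - r) ** 2) < 1e-6              # sum i*r^i = r/(1-r)^2
    assert abs(s2 - r * (1 + r) / (1 - r) ** 3) < 1e-6    # sum i^2*r^i = r(1+r)/(1-r)^3
```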

10.159 Let X be a geometric random variable with parameter p. Then

var(X) = E[X²] − (E[X])²
       = [∑_{i=1}^{∞} (1 − p)^{i−1} · p · i²] − (1/p)²	probability of needing i flips
       = (p/(1 − p)) · [∑_{i=1}^{∞} (1 − p)^i · i²] − 1/p²
       = (p/(1 − p)) · [(1 − p)(2 − p)/p³] − 1/p²	previous exercise
       = (2 − p)/p² − 1/p²
       = (1 − p)/p².

10.160 Each clause has three literals. There’s a 1/2 probability that any one of those literals fails to be set in the way that makes the clause true; the clause is false only if all three literals fail to be set in the way that makes the clause true. The three literals fail independently unless a variable and its negation both appear in the same clause, so the probability that a clause isn’t satisfied is (1/2) · (1/2) · (1/2) = 1/8. (If a variable and its negation both appear in the same clause, then the clause must be satisfied by any assignment.) Thus the clause is set to true with probability 7/8 or greater.

10.161 Let X_i be an indicator random variable that’s 1 if the ith clause is satisfied, and 0 if the ith clause is not satisfied. By the previous exercise, Pr[X_i = 1] ≥ 7/8, and thus E[X_i] ≥ 7/8. The number of satisfied clauses is X = ∑_{i=1}^{m} X_i; by linearity of expectation, we can conclude that E[X] = ∑_{i=1}^{m} E[X_i] ≥ 7m/8.

10.162 Let X be any random variable. Suppose for a contradiction that Pr[X ≥ E[X]] = 0. Then

E[X] = E[X | X ≥ E[X]] · Pr[X ≥ E[X]] + E[X | X < E[X]] · Pr[X < E[X]]	the “law of total expectation” (Theorem 10.21)
     = E[X | X ≥ E[X]] · 0 + E[X | X < E[X]] · 1
     = E[X | X < E[X]]
     < E[X].	X (and thus its expectation) is always less than E[X] if we condition on X < E[X]

That’s a contradiction: we can’t have E[X] < E[X]! So, for any random variable X, we must have Pr[X ≥ E[X]] > 0.

The expected number of satisfied clauses in a random assignment for an m-clause 3CNF proposition φ is at least 7m/8, by Exercise 10.161. Thus by the above fact, the probability that a random assignment satisfies at least 7m/8 clauses is nonzero—and if there’s a nonzero chance of choosing an assignment in a set S, then S ≠ ∅!
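The 7/8 figure from 10.160 can be confirmed by brute force over the eight assignments to a clause’s three (distinct) variables; the clause encoding below is our own, not the book’s:

```python
from itertools import product

def clause_satisfied(assignment, clause):
    # clause is a list of (variable_index, negated) pairs; a literal is satisfied
    # when the variable's value disagrees with its negation flag.
    return any(assignment[var] != negated for var, negated in clause)

clause = [(0, False), (1, True), (2, False)]      # x0 OR (NOT x1) OR x2
satisfying = [a for a in product([False, True], repeat=3)
              if clause_satisfied(a, clause)]
assert len(satisfying) == 7                       # probability 7/8 under a uniform assignment
```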

11

Graphs and Trees

11.2 Formal Introduction

11.1 The graph is naturally undirected. It is not simple: there’s a self-loop on 1. (Figure omitted: a drawing of the graph on the nodes 1, 2, …, 10, including the self-loop on 1.)

11.2 The graph is directed, and not simple: every node has a self-loop. (Figure omitted: a drawing of the graph on the nodes 1, 2, …, 10, with a self-loop at every node.)

11.3 The graph is directed and simple (there are no self-loops nor parallel edges). (Figure omitted: a drawing of the graph on the nodes 1, 2, …, 10.)


11.4 Edges: {A, B}, {B, C}, {B, E}, {C, D}, {C, E}, {D, E}, {D, G}, {E, F}, {F, H}, {G, H}. The highest-degree node is E, with degree 4.

11.5 Edges: {A, H}, {B, E}, {B, D}, {C, H}, {C, E}, {E, F}, {G, F}. The node of highest degree is E, with degree 3.

11.6 The node of highest in-degree is H, with in-degree 3. The nodes of highest out-degree are A, B, C, and D, all with out-degree 2.

11.7 The nodes of highest in-degree are actually all of the nodes, which are tied with in-degree 1. The node of highest out-degree is A, with out-degree 3.

11.8 |E| can be as large as C(n, 2) = n(n − 1)/2: every unordered pair of nodes can be joined by an edge. |E| can be as small as 0: there’s no requirement that there be any edges.

11.9 |E| can be as large as n(n − 1): every ordered pair of distinct nodes can be joined by an edge. (Or, to say it another way: every unordered pair can be joined in 2 directions.) |E| can still be as small as 0: there’s no requirement that there be any edges.

11.10 |E| can now be as large as n²: all pairs in V × V are legal edges, because the joined nodes no longer have to be distinct. |E| can still be as small as 0.

11.11 |E| can now be arbitrarily large: for any integer k, we can have k edges from one node to another. So there’s no upper bound in terms of n. |E| can still be as small as 0.

11.12 There can be as many as 150 + 150 · 149 = 22,500 people in S. Here is a graph with this structure: each of Alice’s 150 friends has 149 distinct friends other than Alice, and so there are 150 people at distance 1 and 150 · 149 = 22,350 people at distance 2. Writing A₁, A₂, …, A₁₅₀ to denote Alice’s friends, the structure of the network that we just described is as follows:

(Figure: Alice at the top; below her, her 150 friends A₁, A₂, …, A₁₅₀; below each Aᵢ, that friend’s 149 other friends, labeled 1, 2, …, 149.)

11.13 There can be as few as 150 people in S. If all of Alice’s friends are also friends with each other, then each of these people knows the 149 others and Alice, for a total of 150 friends. Here is a scaled-down version of the structure of the network (shown here with only 15 nodes for readability):


11.14 The number of people Bob knows via a chain of k or fewer intermediate friends can be as many as

∑_{i=0}^{k} 150 · 149^i = 150 · (149^{k+1} − 1)/(149 − 1) = (150/148) · (149^{k+1} − 1),

using Theorem 5.5. (To check: when k = 0, this value is (150/148) · (149 − 1) = 150. When k = 1, this value is (150/148) · (149² − 1) = (150/148) · (149 − 1)(149 + 1) = 150².) There are 150 people at layer 0 (Bob’s friends themselves); at each layer k ≥ 1, each of the 150 · 149^{k−1} people at layer k − 1 has precisely 149 additional new friends. (The rest of the argument matches the tree from Exercise 11.12, just with more layers.)

11.15 There can still be as few as 150 people in S_k. If all of Bob’s friends are also friends with each other, then no matter how many “layers” outward from Bob we go, we never encounter any new people. (The picture is identical to Exercise 11.13.)

11.16 Each of u’s neighbors v has degree at least one, because u is a neighbor of v. Thus

∑_{v∈neighbors(u)} degree(v) ≥ ∑_{v∈neighbors(u)} 1 = |neighbors(u)| = degree(u).

11.17 By Theorem 11.8, the quantity ∑_{u∈V} degree(u) is 2|E|, which is an even number. But a sum with an odd number of odd-valued terms would be odd, so we must have that the number of odd terms in the sum—that is, n_odd—is even.

11.18 Imagine looping over each edge to compute all nodes’ in- and out-degrees:

1 initialize in_u and out_u to 0 for each node u
2 initialize m to 0
3 for each edge ⟨u, v⟩ ∈ E:
4   out_u := out_u + 1
5   in_v := in_v + 1
6   m := m + 1

By precise analogy to the proof of Theorem 11.8, at the end of the for loop we have that in_u = in-degree(u) and out_u = out-degree(u) and m = |E|. Furthermore, it’s clear that ∑_u in_u = ∑_u out_u = m, because each of these three quantities increased by 1 in each iteration. The claim follows.

11.19 (Figure omitted: the linked list drawn as a graph, starting from the head pointer.)

11.20 (Figure omitted: the linked list drawn as a graph, starting from the head pointer.)

11.21 There can’t be parallel edges because there’s one and only one outgoing edge from any particular node, but there can be self loops: (figure omitted: a list whose final node’s next pointer is a self-loop.)

11.22 Here’s the idea of the algorithm: we’re going to remember two nodes, called one and two. Start them both at the head of the list. We’ll repeatedly advance them down the list, but we’ll move one forward by one step in the list, and two forward by two. If we ever reach the end of the list (a null pointer), then the list is not circular. If one is ever equal to two after the initialization, then there’s a loop! Here’s the algorithm:

1 if L has fewer than two elements then
2   return “circular” or “noncircular” (as appropriate)
3 one := L.head.next
4 two := L.head.next.next
5 while one ≠ two:
6   if L has < 2 elements following one or two then
7     return “circular” or “noncircular” (as appropriate)
8   one := one.next
9   two := two.next.next
10 return “circular”

(Testing whether there are fewer than 2 nodes following x is done as follows: remember x, x.next, and x.next.next, testing that none of these values are null [in which case the list is noncircular] and that none of them are identical [in which case the list is circular].) One direction of correctness is easy. Suppose that L is not circular. Number the nodes of L as 1, 2, …, n. It’s a straightforward proof by induction that one always points to a smaller-index node than two, so that we always have one ≠ two. And, after at most n/2 iterations, we reach the end of the list (two.next.next is null) and the algorithm correctly returns “noncircular.” For the other direction, suppose that L is circular—with a k-node circle after the initial “tail” of n − k nodes. Then after k iterations of the loop, both one and two must be in the circle. If two is ℓ nodes behind one on the circle after k iterations, then after an additional ℓ mod k iterations, the values of one and two will coincide and the algorithm will correctly return “circular.”
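The two-pointer idea of 11.22 can be sketched in Python as follows; the Node class and function name are ours, and the initialization differs slightly from the pseudocode above (both pointers start at the head), but the met-or-fell-off-the-end logic is the same:

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

def is_circular(head):
    # 'one' advances one step per iteration; 'two' advances two steps.
    one = two = head
    while two is not None and two.next is not None:
        one = one.next
        two = two.next.next
        if one is two:       # the pointers met: there's a loop
            return True
    return False             # we fell off the end of the list: no loop

# Usage: a 4-node list, first without and then with a cycle back into the list.
nodes = [Node(i) for i in range(4)]
for a, b in zip(nodes, nodes[1:]):
    a.next = b
assert not is_circular(nodes[0])
nodes[-1].next = nodes[1]    # create a cycle
assert is_circular(nodes[0])
```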

11.23 (Figure omitted: the five nodes 1, 2, 3, 4, 5 arranged in a circle, with their pointers.)

11.24 Let n = 1. Then node 1’s next pointer is back to 1, because (1 mod n) + 1 = 1. (Likewise, node 1’s prev pointer is back to 1.) (Figure: the single node 1, with both its next and prev pointers looping back to itself.)

11.25 Let n = 2. Then node 1’s next pointer is to 2, and 2’s next pointer is to 1 because (2 mod n) + 1 = 1. Thus 1’s next pointer is to 2 and 1’s prev pointer is also 2. (Figure: the two nodes 1 and 2, each the other’s next and prev.)

11.26
A : B
B : A, C, E
C : B, D, E
D : C, E, G
E : B, C, D, F
F : E, H
G : D, H
H : F, G

11.27
A : H
B : D, E
C : E, H
D : B
E : B, C, F
F : E, G
G : F
H : A, C

11.28
A : B, C
B : E, H
C : D, H
D : G, H
E : B
F :
G : F
H :

11.29
A : B, C, D
B :
C :
D : A

11.30
    A B C D E F G H
A   0 1 0 0 0 0 0 0
B   1 0 1 0 1 0 0 0
C   0 1 0 1 1 0 0 0
D   0 0 1 0 1 0 1 0
E   0 1 1 1 0 1 0 0
F   0 0 0 0 1 0 0 1
G   0 0 0 1 0 0 0 1
H   0 0 0 0 0 1 1 0

11.31
    A B C D E F G H
A   0 0 0 0 0 0 0 1
B   0 0 0 1 1 0 0 0
C   0 0 0 0 1 0 0 1
D   0 1 0 0 0 0 0 0
E   0 1 1 0 0 1 0 0
F   0 0 0 0 1 0 1 0
G   0 0 0 0 0 1 0 0
H   1 0 1 0 0 0 0 0

11.32
    A B C D E F G H
A   0 1 1 0 0 0 0 0
B   0 0 0 0 1 0 0 1
C   0 0 0 1 0 0 0 1
D   0 0 0 0 0 0 1 1
E   0 1 0 0 0 0 0 0
F   0 0 0 0 0 0 0 0
G   0 0 0 0 0 1 0 0
H   0 0 0 0 0 0 0 0

11.33
    A B C D
A   0 1 1 1
B   0 0 0 0
C   0 0 0 0
D   1 0 0 0
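A small supplementary helper (our code, not the book’s) that converts an undirected adjacency list to an adjacency matrix, applied to the list from 11.26:

```python
adj = {
    'A': ['B'],
    'B': ['A', 'C', 'E'],
    'C': ['B', 'D', 'E'],
    'D': ['C', 'E', 'G'],
    'E': ['B', 'C', 'D', 'F'],
    'F': ['E', 'H'],
    'G': ['D', 'H'],
    'H': ['F', 'G'],
}

labels = sorted(adj)
matrix = [[1 if v in adj[u] else 0 for v in labels] for u in labels]

# The matrix of an undirected graph is symmetric, ...
for i in range(len(labels)):
    for j in range(len(labels)):
        assert matrix[i][j] == matrix[j][i]
# ... and the sum of its entries is twice the number of edges (10 edges here, from 11.4).
assert sum(map(sum, matrix)) == 2 * 10
```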

11.34 Yes, it must be a directed graph. Suppose that G is an undirected graph with n nodes. There are n possible degrees—0, 1, …, n − 1 (with the last degree possible only if every other node is a neighbor). But it is not possible to have both a node u of degree 0 and a node v of degree n − 1: either {u, v} ∈ E (in which case u has nonzero degree) or {u, v} ∉ E (in which case v’s degree is less than n − 1). Thus there are only n − 1 possible degrees in G, and n nodes. By the pigeonhole principle, two nodes must have the same degree. Thus we have a contradiction of the assumption, and so G cannot be an undirected graph.

11.35 We have n nodes {0, 1, 2, …, n − 1}, with edges ⟨x, y⟩ when x < y. Thus node i has in-degree i (from the nodes 0, 1, …, i − 1) and out-degree n − 1 − i (to the nodes i + 1, i + 2, …, n − 1). Thus every node has a unique out-degree, as required:

(Figure: the nodes 0, 1, 2, 3, 4, 5, …, n − 1 drawn in a row, with an edge from each node to every higher-numbered node.)

11.36 There are n − 1 edges out of C(n, 2) = n(n − 1)/2 possible edges. The density is thus

(n − 1)/(n(n − 1)/2) = 2(n − 1)/(n(n − 1)) = 2/n.

11.37 There are n edges out of C(n, 2) = n(n − 1)/2 possible edges. The density is thus

n/(n(n − 1)/2) = 2n/(n(n − 1)) = 2/(n − 1).

11.38 There are 3 edges in each triangle, and n/3 triangles. So there are n edges, and the density is (just as for a cycle)

n/(n(n − 1)/2) = 2n/(n(n − 1)) = 2/(n − 1).
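The density computations in 11.36 and 11.37 can be verified exactly for a range of n (a supplementary check, ours):

```python
from fractions import Fraction

def density(num_edges, n):
    # Density = actual edges / possible edges, with C(n,2) = n(n-1)/2 possible.
    return Fraction(num_edges, n * (n - 1) // 2)

for n in range(3, 20):
    assert density(n - 1, n) == Fraction(2, n)       # a path: n - 1 edges
    assert density(n, n) == Fraction(2, n - 1)       # a cycle: n edges
```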


11.39 All nodes have identical degree in this graph, and all have the same number of potential neighbors too. Thus the density of the graph as a whole is equal to the fraction of actual neighbors to possible neighbors for any particular node. Consider any single node. It has n/3 − 1 neighbors; it could have n − 1 neighbors. Thus the density of the graph is

(n/3 − 1)/(n − 1) = (n − 3)/(3n − 3).

11.40 (Figure omitted: the 3-dimensional hypercube H₃, on the nodes 000, 001, 010, 011, 100, 101, 110, 111, with an edge between each pair of labels differing in exactly one bit.)

11.41
0000 : 0001, 0010, 0100, 1000
0001 : 0000, 0011, 0101, 1001
0010 : 0011, 0000, 0110, 1010
0011 : 0010, 0001, 0111, 1011
0100 : 0101, 0110, 0000, 1100
0101 : 0100, 0111, 0001, 1101
0110 : 0111, 0100, 0010, 1110
0111 : 0110, 0101, 0011, 1111
1000 : 1001, 1010, 1100, 0000
1001 : 1000, 1011, 1101, 0001
1010 : 1011, 1000, 1110, 0010
1011 : 1010, 1001, 1111, 0011
1100 : 1101, 1110, 1000, 0100
1101 : 1100, 1111, 1001, 0101
1110 : 1111, 1100, 1010, 0110
1111 : 1110, 1101, 1011, 0111

11.42 The adjacency matrix of H₄, with rows and columns ordered 0000, 0001, …, 1111 (an entry is 1 exactly when the two labels differ in a single bit):

0000 : 0 1 1 0 1 0 0 0 1 0 0 0 0 0 0 0
0001 : 1 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0
0010 : 1 0 0 1 0 0 1 0 0 0 1 0 0 0 0 0
0011 : 0 1 1 0 0 0 0 1 0 0 0 1 0 0 0 0
0100 : 1 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0
0101 : 0 1 0 0 1 0 0 1 0 0 0 0 0 1 0 0
0110 : 0 0 1 0 1 0 0 1 0 0 0 0 0 0 1 0
0111 : 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 1
1000 : 1 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0
1001 : 0 1 0 0 0 0 0 0 1 0 0 1 0 1 0 0
1010 : 0 0 1 0 0 0 0 0 1 0 0 1 0 0 1 0
1011 : 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1
1100 : 0 0 0 0 1 0 0 0 1 0 0 0 0 1 1 0
1101 : 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1
1110 : 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 1
1111 : 0 0 0 0 0 0 0 1 0 0 0 1 0 1 1 0
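The hypercube’s adjacency structure can be generated programmatically by flipping one bit at a time (a supplementary sketch; the function name is ours):

```python
def hypercube_neighbors(n):
    # Each n-bit label's neighbors are the labels obtained by flipping one bit.
    nodes = [format(u, '0{}b'.format(n)) for u in range(2 ** n)]
    return {u: [format(int(u, 2) ^ (1 << b), '0{}b'.format(n)) for b in range(n)]
            for u in nodes}

H4 = hypercube_neighbors(4)
assert all(len(nbrs) == 4 for nbrs in H4.values())                  # every node has degree n = 4
assert sum(len(nbrs) for nbrs in H4.values()) == 2 * (4 * 2 ** 3)   # n * 2^(n-1) edges, doubled
```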


11.43 The degree of each node is n: there is one bit to be flipped per bit position, and there are n bit positions. There are 2^n nodes. So the total of the degrees is ∑_u degree(u) = 2^n · n, and thus there are n · 2^{n−1} edges. So the density is

(n · 2^{n−1}) / (2^n(2^n − 1)/2) = n/(2^n − 1).

11.44 Yes, these graphs are isomorphic. (Actually they’re both the hypercube H₃.) One way to see it is via the following mapping (you can verify that all the edges match up using this mapping):

A ↔ J,  B ↔ O,  C ↔ P,  D ↔ I,  E ↔ L,  F ↔ M,  G ↔ K,  H ↔ N.

11.45 No, they’re not: there are four degree-3 nodes in the first graph (B, D, E, and G) but only two in the second (H, M).

11.46 No, they’re not isomorphic. The nodes 11 and 13 have degree zero in G1, but only node 23 has degree zero in G2.

11.47 The claim is true: there must be precisely two edges (the sum of the nodes’ degrees is 4), and they must connect two distinct pairs of nodes—otherwise there’d be a node with degree 2. The only graph of this form is a perfect matching: two disjoint edges on the four nodes.

11.48 The claim is true. There can be only one pair of nodes missing an edge. Just mapping the two degree-three nodes to each other establishes the isomorphism. 11.49 The claim is false. Consider these graphs:

In the latter case, the two nodes of degree three are neighbors. In the former, they’re not. 11.50 The claim is false. Consider these graphs:

In the former case, there’s a way to get from every node to every other node. (In the language that we’ll see soon: the graph is connected.) That’s not true in the latter graph, and therefore they can’t be isomorphic. 11.51 There’s a complete subgraph of size 4: D, G, E, and I. 11.52 There’s a complete subgraph of size 5: B, C, E, H, and J. 11.53 There are no triples of nodes that are all joined by an edge, so the largest clique in this graph is of size two. Any pair of nodes with an edge between them forms a complete subgraph of size 2.


11.54 There could be as few as five movies, but not fewer. Each of the three “big” cliques needs at least one movie to generate it, and the curved edges can only have been the result of a 2-person movie (because they do not participate in any larger cliques).

11.55 We can’t be certain. Any additional movie m that involved a subset of the actors of some movie m′ described in the previous exercise wouldn’t create any additional edges. (That is, if m ⊆ m′ then adding m to a list of movies including m′ would have no effect on the collaboration graph.) So we can’t tell from this representation whether any such movie exists!

11.56 This graph is bipartite for any n. All edges join an even-numbered node to an odd-numbered node, so we can separate the nodes into two columns based on parity.

11.57 This graph is bipartite for any even n. In this case, all edges join an odd-numbered node to an even-numbered node (because ⟨n − 1, (n − 1) + 1 mod n⟩ joins an odd node to 0, an even node). So for even n we can separate the nodes into two columns based on parity. But the graph is not bipartite for odd n, for essentially the same reason: 0 goes in one column, then 1 must go in the other, then 2 must go into 0’s column, etc. But we get stuck when we try to join n − 1 (which is even) to 0 (also even).

11.58 This is bipartite if and only if n ∈ {1, 2}. Once n = 3, we’re stuck: there’s a triangle!

11.59 This graph is always bipartite. Put nodes {0, 1, …, n − 1} in the first column and nodes {n, n + 1, …, 2n − 1} in the second column.

11.60 Yes, it is bipartite. Here’s a rearrangement of the nodes into two columns, with every edge joining the two columns: A, C, B, and F in the left column; H, E, D, and G in the right column.
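Bipartiteness can be checked mechanically by BFS two-coloring; this supplementary sketch (our code, not the book’s) runs on the edge list of Exercise 11.5, whose bipartition 11.60 exhibits:

```python
from collections import deque

edges = [('A','H'), ('B','E'), ('B','D'), ('C','H'), ('C','E'), ('E','F'), ('G','F')]
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

def two_color(adj):
    # BFS from each unvisited node, alternating colors 0 and 1 across edges.
    color = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return None          # an odd cycle: not bipartite
    return color

coloring = two_color(adj)
assert coloring is not None              # the graph is bipartite
# A's side of the (unique) bipartition is exactly the left column from 11.60.
assert {u for u in coloring if coloring[u] == coloring['A']} == {'A', 'B', 'C', 'F'}
```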

11.61 This graph is not bipartite. If we put B in the left column, then we must put both C and E in the right column—but there’s an edge between them, so we’ve failed.

11.62 True: every edge connects a node in L to a node in R. Thus both the sum of the degrees of the nodes in L and the sum of the degrees of the nodes in R must equal |E|.

11.63 False. For example, consider L = {1} and R = {2} and there’s an edge {1, 2}. Then the sum of the degrees of the nodes in L is 1.

11.64 True! In fact, this claim is true even if the graph isn’t bipartite, by Theorem 11.8.

11.65 There are |L| · |R| = |L| · (n − |L|) edges in the graph. You can show in many different ways (calculus, or just plotting this quantity) that this quantity is maximized when |L| = n/2. In this case, the number of edges in the graph is (n/2) · (n/2) = n²/4.

11.66 Zero! There’s nothing in the definition that prevents us from having |L| = 0 and |R| = n. And the complete bipartite graph K_{0,n} has zero edges!

11.67 False. For example, the pentagon—the cycle on the five nodes 1, 2, 3, 4, 5—is not bipartite, but contains no triangle as a subgraph.

The correct statement, incidentally: any graph that does not contain any odd-length cycle (see Section 11.4) as a subgraph is bipartite.

11.68 The total number of edges is precisely the sum of the in-degrees and is also equal to the sum of the out-degrees. (See Exercise 11.18.) That is,

∑_{u∈V} d_in = |V| · d_in = |E|  and  ∑_{u∈V} d_out = |V| · d_out = |E|.

Therefore d_in = d_out = |E|/|V|.

11.69 Here is the same graph, with the nodes rearranged so that no edges cross:

(Figure omitted: a crossing-free redrawing of the graph on the nodes A–H.)

11.70 Here is the same graph, with the nodes rearranged so that no edges cross: (figure omitted: a crossing-free redrawing of the graph on the nodes A–I.)

11.71 We’ll prove it by giving an algorithm to place the nodes without edges crossing.

1 y := 0
2 while there are undrawn nodes:
3   choose an arbitrary undrawn node v0 and place it at the point ⟨0, y⟩
4   let v1 be one of v0’s neighbors, chosen arbitrarily, and place it at the point ⟨1, y⟩
5   x := 1
6   while vx’s other neighbor (other than vx−1) is not v0:
7     place vx at the point ⟨x, y⟩
8     let vx+1 be vx’s other neighbor
9     x := x + 1
10  draw the edge from vx to v0 [without the vertical position of the arc going above y + 0.5]
11  y := y + 1

Consider an iteration for a particular value of y. Each newly encountered node must have either a “new” second neighbor, or it must have v0 as its second neighbor. (It cannot be any other previously encountered node, because all of those nodes already have their two neighbors accounted for!) When we find the node that reaches back to v0—and we must, because the set of nodes is finite—we simply draw a curved arc back to v0. There may still be nodes left, undrawn, at this point; we just repeat the above algorithm (shifted far enough up so that the newly drawn nodes won’t be anywhere near the previously drawn ones) for a newly chosen v0.


We were asked to prove that any 2-regular graph is planar, and we’ve now done so constructively: we’ve given an algorithm that finds a planar layout of any such graph!

11.3 Paths, Connectivity, and Distances

11.72 ⟨D, E, B⟩

11.73 ⟨C, H⟩ and ⟨C, D, E, B, H⟩

11.74 ⟨C, D, B⟩

11.75 ⟨A, G, H⟩ and ⟨A, G, B, H⟩

11.76 ⟨D, B, C, E, G, B, H⟩

11.77 ⟨B, A, F, D, C⟩

11.78 ⟨B, C, F⟩

11.79 ⟨B, E, H, F, D, C⟩ (among others)

11.80 {A, B, C, F}.

11.81 All of them: {A, B, C, D, E, F, G, H}.

11.82 The graph is not connected; there’s no path between A and B, for example. The connected components are:

• {A, F, G}
• {B, C, D, E, H}

11.83 The graph is not strongly connected; there’s no path from any node ≠ A to A, for example. The strongly connected components are:

• {A}
• {F}
• {H}
• {B, C, D, E, G}

11.84 The graph is connected. We can go from B to every other node except E via prefixes of the path ⟨B, A, F, D, C⟩, and get to E on the path ⟨B, A, E⟩. The graph is undirected, so each of these paths can also be followed in reverse. That’s enough to get from any node u to any node v: go from u to B as above, then go from B to v.

11.85 The graph is not strongly connected; there’s no path from D to A, for example. The strongly connected components are:

• {A, B, C, F}
• {D, E}

11.86 The adjacency matrix is symmetric, so we might as well view the graph as undirected (though it could be directed, too, just with every edge matched by a corresponding one that goes in the opposite direction). The graph is connected, which is easiest to see via picture:


(Figure omitted: the graph drawn on the nodes A, B, C, D, E, F, G, H, from which its connectivity is apparent.)

11.87 Suppose there’s a path from s to t. That is, by definition, there is a sequence ⟨u0, u1, …, uk⟩ where:

• u0 = s and uk = t, and
• for each i ∈ {0, 1, …, k − 1} we have ⟨ui, ui+1⟩ ∈ E.

Then the sequence ⟨uk, uk−1, …, u0⟩ is a path from t to s: its first entry is t = uk; its last entry is s = u0; and every pair ⟨ui+1, ui⟩ ∈ E because the graph is undirected (and thus ⟨ui, ui+1⟩ ∈ E if and only if ⟨ui+1, ui⟩ ∈ E).

11.88 In short: we can snip out the part between visits to the same node. Consider any non-simple path P = ⟨u0, u1, …, uk⟩ from u0 to uk. Let v be the first node repeated in P, and say that v = ui and v = uj for j > i. That is,

P = ⟨u0, u1, …, ui−1, v, ui+1, …, uj−1, v, uj+1, …, uk⟩.

Then we claim that

P′ = ⟨u0, u1, …, ui−1, v, uj+1, …, uk⟩

is also a path from u0 to uk (all adjacent pairs in P′ were also adjacent pairs in P, so the edges exist), and it’s shorter than P. Thus P could not have been a shortest path.

11.89 In the complete graph Kn, we have
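A BFS sketch (supplementary; our code, not the book’s) that finds a shortest path—which, as 11.88 argues, is necessarily simple:

```python
from collections import deque

def shortest_path(adj, s, t):
    # Standard BFS: each node is discovered at most once, so following the
    # parent pointers back from t yields a simple shortest path.
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj.get(u, []):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None   # no path from s to t

# Usage on a small undirected example graph (made up for illustration):
adj = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}
path = shortest_path(adj, 1, 5)
assert path == [1, 2, 4, 5] or path == [1, 3, 4, 5]
assert len(path) == len(set(path))   # the returned shortest path is simple
```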

d(s, t) = 1 if s ≠ t, and d(s, t) = 0 if s = t.

Thus the diameter can be as small as 1 (if n ≥ 2) and it can be 0 (if n = 1).

11.90 The largest possible diameter is n − 1, which occurs on the graph that’s just an n-node path (the nodes drawn in a row, each joined to the next):

It can’t be larger, because the shortest path is always simple, and thus includes at most n nodes (and thus at most n − 1 edges). And the distance between the two ends of the path is exactly n − 1.

11.91 There can be as many as n/4 connected components: each connected component is a K4, which gives each node an edge to the other 3 nodes in its component. There are precisely 4 nodes in each component, and thus n/4 total connected components.

11.92 There can be as few as one single connected component in such a graph. For example, define the graph with nodes V = {0, 1, …, n − 1} and edges {⟨u, (u + 1) mod n⟩ : u ∈ V}. This graph is 2-regular (every node has an edge to the node “one higher” and to the node “one lower”), and is connected. Add an arbitrary matching to these edges—for example, add the edges {⟨u, u + (n/2)⟩ : 0 ≤ u < n/2}—and the resulting graph is 3-regular, and still connected.

11.93 Here’s a picture of such a graph: (figure omitted: n/4 groups of four nodes, arranged in a circle.)


We have n/4 groups of 4 nodes each. The graph is 3-regular: the two middle nodes in each group have three intragroup neighbors; the top and bottom nodes have two intragroup and one intergroup neighbors. The groups are arranged in a circle; the distance from a node s in group #1 to a node t in group #(n/8) is at least n/8 (because we have to go group by group from s to t, and either direction around the circle passes through n/8 groups). 11.94 Create a complete binary tree of depth k. Add a cycle in all subtrees of four leaves to give all the leaves a degree of 3. Add one extra leaf from the bottom-most right-most node, and include that extra node in the last cycle; then add an edge from that extra node back to the root. Every node now has degree 3:

We’ll argue that there is an upper bound of 8 log n on every distance. To get from any node s to any other node t, consider the path that goes via the root, using only tree edges. That path has length at most 2k. There are 1 + 2 + 4 + ⋯ + 2^k = 2^{k+1} − 1 nodes, so n ≥ 2^k and thus

8 log n ≥ 8 log(2^k) = 8k log 2 ≥ 2k,

and we showed that path lengths were at most 2k.

And we showed that path lengths were at most 2k. 11.95 False. The claim fails on the graph L = {1, 3} and R = {2, 4} and E = {{1, 2} , {3, 4}}. 1

2

3

4

11.96 ABCDEJHFGI (among others). 11.97 We can choose any order of the n − 2 nodes other than {s, t}, and visit them in that order. So there are (n − 2)! different Hamiltonian paths from s to t in the n-node complete graph. 11.98 Suppose s is on the left side. After 2k − 1 steps, we will have visited k nodes on the left, and k − 1 nodes on the right. Thus we can only find a Hamiltonian path in the following two cases:

• each side of the graph has the same number of nodes (that is, n = m), and furthermore s and t are on opposite sides; or
• s’s side of the graph has one more node than the other side (that is, n = m + 1), and furthermore s and t are both on the larger (n-node) side of the graph.

In the former case, we can visit all n − 1 other nodes on the s-side in any order, and all m − 1 other nodes on the t-side in any order—though we must alternate sides. So there are (n − 1)! · (m − 1)! orderings. In the latter case, we can visit all m nodes on the m-node side in any order, and all n − 2 other nodes on the n-node side in any order after we start at s and before we finish at t. Again we must alternate sides. So there are (n − 2)! · m! orderings. In any other case, there are no Hamiltonian paths from s to t.

11.99 Fix a node u. There are 2 nodes at distance 1 from u (its neighbors), there are 2 nodes at distance 2 from u (its neighbors’ neighbors), and so forth—up to 2 nodes at distance (n − 1)/2. Thus

the average distance from u to other nodes
  = 2 · (1 + 2 + 3 + ⋯ + (n − 1)/2) / (n − 1)	the above argument
  = 2 · [((n − 1)/2) · ((n − 1)/2 + 1) / 2] / (n − 1)	arithmetic summation (Theorem 5.3)
  = ((n − 1)/2 + 1) / 2
  = (n + 1)/4.	algebraic simplification


11.100 Fix a node u. There are 2 nodes at distance 1 from u (its neighbors), there are 2 nodes at distance 2 from u (its neighbors’ neighbors), and so forth—up to 2 nodes at distance (n − 2)/2. There is one additional node at distance n/2 from u. Thus

the average distance from u to other nodes
  = [2 · (1 + 2 + 3 + ⋯ + (n − 2)/2) + n/2] / (n − 1)	the above argument
  = [2 · ((n − 2)/2) · ((n − 2)/2 + 1)/2 + n/2] / (n − 1)	arithmetic summation (Theorem 5.3)
  = n²/(4(n − 1)).	algebraic simplification

11.101 Following the hint, we’ll compute the number of pairs of nodes in an n-node path separated by distance k, for any particular integer k. The largest node-to-node distance is certainly ≤ n, and there are C(n, 2) pairs of nodes. Thus:

the average distance between nodes in an n-node path = [∑_{k=1}^{n} (the number of pairs separated by distance k) · k] / C(n, 2).

Looking at the nodes from left to right, observe that the first n − k nodes have a node to their right at distance k. (The node that’s k hops from the right-hand edge of the path does not have a k-hops-to-the-right node, because that would-be node is beyond the end of the chain.) Thus there are precisely n − k pairs of nodes that are separated by distance k, and therefore the average distance is

[∑_{k=1}^{n} (n − k) · k] / C(n, 2)
  = [n · ∑_{k=1}^{n} k − ∑_{k=1}^{n} k²] / C(n, 2)	multiplying out and factoring
  = [n · n(n + 1)/2 − n(n + 1)(2n + 1)/6] / C(n, 2)	sum of the first n (squares of) integers (Theorem 5.3/Exercise 5.1)
  = n · (n + 1) · [n/2 − (2n + 1)/6] / (n(n − 1)/2)	factoring; definition of C(n, 2)
  = n · (n + 1) · [(3n − 2n − 1)/6] / (n(n − 1)/2)
  = n · (n + 1) · [(n − 1)/6] / (n(n − 1)/2)
  = (n + 1)/3.	algebra

11.102 A solution in Python is shown in Figure S.11.1, which verifies the three previous exercises' formulas for each n with 2 ≤ n ≤ 511.

11.103 Observe that if G = ⟨V, E⟩ is disconnected then there must be a partition of V into disjoint sets A ∪ B with 1 ≤ |A| ≤ n − 1 such that no edge crosses from A to B. If there are k nodes in A and n − k nodes in B, then the maximum number of possible edges is

    (k choose 2) + (n − k choose 2)
        = [k(k − 1) + (n − k)(n − k − 1)]/2
        = [k(k − 1) + n² − n(2k + 1) + k(k + 1)]/2
        = [2k² − 2kn + n² − n]/2.

11.3 Paths, Connectivity, and Distances

def average_distance(nodes, distance_function):
    ''' Computes the average distance in a graph with the given nodes and
        the distance between node U and node V given by distance_function(U,V). '''
    all_distances = [distance_function(x, y) for x in nodes for y in nodes if x != y]
    num_pairs = [1 for x in nodes for y in nodes if x != y]
    return sum(all_distances) / sum(num_pairs)

def average_distance_in_path(n):
    ''' Computes the average distance in a graph (1) -- (2) -- (3) -- ... -- (n-1) -- (n). '''
    distance = lambda u, v : abs(u - v)  # The distance between node U and node V is |U - V|.
    return average_distance(range(1, n+1), distance)

def average_distance_in_cycle(n):
    ''' Computes the average distance in a graph (1) -- (2) -- (3) -- ... -- (n-1) -- (n),
        with an additional edge between node n and node 1 (closing the cycle). '''
    # The shortest path length between U and V: the smaller of
    # the clockwise distance (V - U) % n and the counterclockwise distance (U - V) % n.
    distance = lambda u, v : min((v - u) % n, (u - v) % n)
    return average_distance(range(1, n+1), distance)

for n in range(2, 512):  # All tests pass!
    assert average_distance_in_path(n) == (n + 1) / 3
    if n % 2 == 0:
        assert average_distance_in_cycle(n) == n**2 / (4 * (n - 1))
    if n % 2 == 1:
        assert average_distance_in_cycle(n) == (n + 1) / 4

Figure S.11.1 Verifying the average distance in a path and a cycle.

There are multiple ways to identify the values of k that cause extreme values, but differentiating is the easiest. This quantity has extreme points at the endpoints of the range (k = 1 and k = n − 1) and at the value of k where

    0 = d/dk [(2k² − 2kn + n² − n)/2]  ⇔  0 = (4k − 2n)/2  ⇔  k = n/2.

We can check that k = n/2 is a minimum, and k ∈ {1, n − 1} are the maxima we seek. In other words: G can be an (n − 1)-node clique K_{n−1}, plus one additional node. That graph has

    (n − 1) · (n − 2) / 2

edges, and we argued above that we can't have more edges in a disconnected graph.

11.104 We claim that the smallest possible number of edges in an n-node connected graph is n − 1. An n-node path has this many edges, so we only need to show that fewer isn't possible. Let P(n) denote the claim that no n-node graph with n − 2 or fewer edges is connected; we'll prove P(n) for all n ≥ 2 by induction on n.

base case (n = 2). Obviously, if there are zero edges, the graph is disconnected. Thus P(2) holds.

inductive case (n ≥ 3). We assume the inductive hypothesis P(n − 1), and we must prove P(n). Let G be an arbitrary n-node graph with n − 2 or fewer edges; we'll show that G is disconnected. First we claim that there must be a node u with degree 0 or 1. (If every node had degree at least 2, then the sum of the degrees would be at least 2|V| and thus |E| ≥ n, by Theorem 11.8.) If some node has degree 0, the graph is already disconnected, so suppose u has degree 1.


Now let G′ be G with both the node u and the lone edge incident to u deleted. Then G′ has n − 1 nodes and n − 3 or fewer edges. By the inductive hypothesis P(n − 1), then, G′ is not connected. Adding back in node u cannot bridge components, because u has only one neighbor. Therefore G is also disconnected, and P(n) holds.

11.105 Define the graph with nodes V = {0, 1, . . . , n − 1} and edges {⟨u, (u + 1) mod n⟩ : u ∈ V}. In other words, the graph consists of a circle of n nodes, with a directed edge from each node to its clockwise neighbor. This graph is strongly connected (we can get from i to j by "going around the loop") and has n edges. Now we'll argue that there cannot be fewer edges. For any n ≥ 2, if a graph has fewer than n edges, then there's a node u with out-degree equal to 0, and no other nodes are reachable from u—so the graph isn't strongly connected.

11.106 Define the graph with nodes V = {1, 2, . . . , n} and edges {⟨u, v⟩ : u < v}. Every node is alone in its own SCC (there are no "left-pointing" edges), and there are Σ_{i=1}^{n} (n − i) = Σ_{i=0}^{n−1} i = n(n − 1)/2 edges. Now we'll argue that there cannot be more edges. If a graph has more than n(n − 1)/2 edges, then there's a pair of nodes u and v with an edge ⟨u, v⟩ and another edge ⟨v, u⟩. (The argument is by the pigeonhole principle: there are n(n − 1)/2 unordered pairs of nodes and more than n(n − 1)/2 edges, so there must be an unordered pair of nodes containing more than one edge.) These nodes u and v are in the same SCC.

11.107 Let dG(u, v) denote the distance (shortest path length) between nodes u ∈ V and v ∈ V for a graph G = ⟨V, E⟩. Then:

Reflexivity: For any node u ∈ V, there's a path of length 0 from u to u, namely ⟨u⟩. There's no path of length 0 from u to v ̸= u, because any such path has the form ⟨u, . . . , v⟩, which contains at least two nodes.

Symmetry: Symmetry follows immediately from Exercise 11.87.

Triangle inequality: Let dG(u, z) = k and dG(z, v) = ℓ.
Consider a shortest path ⟨u, p1 , p2 , . . . , pk−1 , z⟩ from u to z and a shortest path ⟨z, q1 , q2 , . . . , qℓ−1 , v⟩ from z to v. Then the sequence

⟨u, p1 , p2 , . . . , pk−1 , z, q1 , q2 , . . . , qℓ−1 , v⟩ is a path from u to v of length k + ℓ (after we delete any segments that repeat edges, as in Exercise 11.88). This path may not be a shortest path, but it is a path, and therefore the shortest path’s length cannot exceed this path’s length. So dG (u, v) ≤ k + ℓ. 11.108 Consider G = ⟨V, E⟩ where V = {A, B, C} and E = {⟨A, B⟩, ⟨B, C⟩, ⟨C, A⟩}. Then dG (A, B) = 1 via the path ⟨A, B⟩ but dG (B, A) = 2 via the path ⟨B, C, A⟩. Thus symmetry is violated even though the graph is strongly connected. 11.109 The quantification is over all pairs in C. If there’s a path from A ∈ C to B ∈ C but not from B to A, then under the old definition C doesn’t meet the specification for ⟨s, t⟩ = ⟨A, B⟩ or for ⟨s, t⟩ = ⟨B, A⟩. Under the new definition C only fails for ⟨s, t⟩ = ⟨B, A⟩—but it fails nonetheless! 11.110 Let R(u, v) denote mutual reachability in a directed graph G. Then: Reflexivity: For any node u, we have that ⟨u⟩ is a path from u to u. Thus R(u, u). Symmetry: For any nodes u and v, if R(u, v) then (i) there’s a path from u to v, and (ii) a path from v to u. But (ii) and (i), respectively, establish that R(v, u). Transitivity: For any nodes u and v and x, if R(u, v) and R(v, x), then there’s a directed path P1 from u to v, a directed path P2 from v to u, a directed path Q1 from v to x, and a directed path Q2 from x to v. Stapling together P1 Q1 and Q2 P2 yields directed paths from u to x and from x to u. 11.111 The SCCs are:

• {A, B, F}
• {C, E, D}
• {G}

11.112 The graph is strongly connected. The following list of edges describes a way to go from node 1 back to node 1, visiting every node along the way: 1 → 5 → 1 → 9 → 1 → 2 → 10 → 4 → 7 → 8 → 12 → 8 → 11 → 6 → 7 → 0 → 3 → 1. This list means that there's a directed path from every node to every other node, and the graph has only one SCC (containing all nodes).

11.113 Node D: it's at distance 4, and no other node is that far away from A.


11.114 Nodes G and E: both are at distance 3, and no other nodes are that far away from A.

11.115 Nodes 2, 5, 9, 10, 11, and 12 are all distance 3 away from 0.

11.116 Nodes 5 and 9 are both distance 8 away from 12; no other nodes are that far away.

11.117 Here's the modification to BFS (as described in Figure 11.46b) to compute distances. Changes to the pseudocode are underlined.

Breadth-First Search (BFS):
Input: a graph G = ⟨V, E⟩ and a source node s ∈ V
Output: the set of nodes reachable from s in G, and their distances
1   Frontier := ⟨⟨s, 0⟩⟩    // Frontier will be a list of nodes to process, in order.
2   Known := ∅              // Known will be the list of already-processed nodes.
3   while Frontier is nonempty:
4       ⟨u, d⟩ := the first node and distance in Frontier
5       remove ⟨u, d⟩ from Frontier
6       for every neighbor v of u:
7           if v is in neither Frontier nor Known then
8               add ⟨v, d + 1⟩ to the end of Frontier
9       add ⟨u, d⟩ to Known
10  return Known

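As a sanity check on this modification, here is a small Python sketch of BFS with distances (our own illustration; the adjacency-list example graph is invented, not one of the textbook's figures):

```python
from collections import deque

def bfs_distances(graph, s):
    ''' Returns {node: distance from s} for every node reachable from s.
        graph[u] is the list of u's neighbors. '''
    known = {}
    frontier = deque([(s, 0)])      # pairs <node, distance>, processed in FIFO order
    enqueued = {s}                  # nodes already placed in frontier (or finished)
    while frontier:
        u, d = frontier.popleft()
        for v in graph[u]:
            if v not in enqueued:
                enqueued.add(v)
                frontier.append((v, d + 1))
        known[u] = d
    return known

# A small path-plus-branch example (our own, not from the text):
G = {'A': ['B'], 'B': ['A', 'C', 'D'], 'C': ['B'], 'D': ['B', 'E'], 'E': ['D']}
print(bfs_distances(G, 'A'))   # {'A': 0, 'B': 1, 'C': 2, 'D': 2, 'E': 3}
```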
11.118 The list is always of the form (block of nodes at distance d) + (block of nodes at distance d + 1). We can prove this fact inductively: to start, there's only one distance (zero) stored in the list, and, while we're processing a node in the distance-d block, we can only add nodes of distance d + 1 to the end of Frontier. (We could write out a fully rigorous proof by induction; it's good practice to do so, but we'll omit it here.)

11.119 Consider the following graph:

[figure: an 8-node graph on nodes A through H, in which C is at distance 1 from A, D is at distance 2, and H is at distance 3]

Starting DFS at A will discover, say, A, then B, then F. (If the order in which nodes are visited is slightly different, we'd have to tweak the example slightly—to include nodes at H's "level" of the graph that are neighbors of each of D, E, and G—but the idea is precisely the same.) At this point, all of C, D, and H will be in Frontier, and their distances are 1, 2, and 3, respectively.

11.120 The ⟨i, j⟩th entry of the product is Σ_k M_{i,k} · M_{k,j}, which counts the number of indices k such that M_{i,k} = M_{k,j} = 1: that is, the number of nodes k such that i → k and k → j are both edges in the graph. Thus (MM)_{i,j} denotes the number of paths of length 2 from i to j.

11.121 A solution in Python is shown in Figure S.11.2.
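The matrix-multiplication observation in 11.120 is easy to check directly; the small adjacency matrix below is our own illustrative example, not a graph from the textbook:

```python
def mat_mult(M, N):
    ''' Multiply two square matrices given as lists of lists. '''
    n = len(M)
    return [[sum(M[i][k] * N[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

# Directed graph on nodes 0..3 with edges 0->1, 0->2, 1->2, 2->3.
M = [[0, 1, 1, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]

M2 = mat_mult(M, M)
# Length-2 paths from 0: 0->1->2 and 0->2->3, so M2[0][2] == 1 and M2[0][3] == 1.
print(M2[0])   # [0, 0, 1, 1]
```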


def distance(wordA, wordB):
    if len(wordA) != len(wordB):
        return None
    else:
        return sum([wordA[i] != wordB[i] for i in range(len(wordA))])

def word_ladder(start, finish):
    valid = [w for w in words if len(start) == len(w)]
    def neighbors(w):
        return [x for x in valid if distance(w, x) == 1]

    previous_layer = [start]
    known = {start: None}  # known[word] = the word in the previous layer one different from word
    while finish not in known:
        next_layer = []
        for parent in previous_layer:
            for child in neighbors(parent):
                if child not in known:
                    known[child] = parent
                    next_layer.append(child)
        previous_layer = next_layer

    # Reconstruct the path by tracing backwards from finish back to start.
    path = [finish]
    while path[-1] != start:
        path.append(known[path[-1]])
    return path

with open("/usr/share/dict/words") as f:
    words = [x.strip() for x in f]
print(word_ladder("smile", "frown"))

Figure S.11.2 Finding a word ladder using BFS.

11.4 Trees

11.122 ABCA, BCEB, DEGFD, ABECA

11.123 ABGECA, BDFHGB, ABDFHGECA

11.124 CEC, FHF, ABGECA, ABDFHGECA

11.125 A simple cycle can visit each node once and only once, and thus traverses at most n edges. And a simple cycle traversing n edges is achievable, as in an n-node cycle graph.

11.126 A cycle must enter and leave each node it visits the same number of times. Thus the number of edges incident to any node u that are traversed by the cycle must be an even number. This solution is closely related to Eulerian paths, which traverse each edge of a graph once and only once. (Compare them to Hamiltonian paths, which go through each node once and only once.) Consider Kn for an odd number of nodes n. Then every node has even degree, and it turns out that we can actually traverse all of the edges in a cycle. Here’s the algorithm:


• start walking from node u by repeatedly choosing an arbitrary neighbor of the current node, deleting edges as they’re traversed, and stopping when the path is at a node with no remaining edges leaving it. Call C the resulting cycle. (Note that the only node where we can have ended is u: every node has even degree, and entering a node leaves it with odd remaining degree.) We may not have traversed every edge in the graph, but every node still has even degree. Also, we can’t be stuck anywhere unless we’ve visited every node, as the original graph was complete. So, after computing C:

• choose a node v with nonzero remaining degree. Walk from it as before until you return to v. “Splice” in the edges you just found into C, just after the first time C passes through v.

• repeat until there are no edges left.

This process will traverse every edge—that is, all (n choose 2) edges.

In an n-node graph where n is even, it's not possible to use all (n choose 2) edges, because each node's degree is odd, so each node must have at least one untraversed edge. But we can proceed exactly as above, counting a node as unvisitable as soon as it has remaining degree 1. This cycle will leave exactly n/2 unvisited edges, for a total cycle length of (n choose 2) − n/2.

11.127 Choose any subset S of the nodes other than u. If |S| ≥ 2, then there is a cycle involving u and the nodes of S. In fact, there are |S|! different orderings of those other nodes, defining |S|! different cycles—except that we've double counted in a slightly subtle way. Specifically, we've counted each cycle ⟨u, s1, s2, . . . , s|S|, u⟩ and ⟨u, s|S|, s|S|−1, . . . , s1, u⟩ separately, going "clockwise" and "counterclockwise," while they're actually the same cycle.

How many such subsets are there? We have n − 1 other nodes, and we can choose any number k ≥ 2 of them. There are (n − 1 choose k) subsets of k other nodes, and thus the number of cycles in which u participates is

    Σ_{k=2}^{n−1} (n − 1 choose k) · k!/2
        = Σ_{k=2}^{n−1} [(n − 1)!/(k! · (n − k − 1)!)] · k!/2        definition of choose
        = [(n − 1)!/2] · Σ_{k=2}^{n−1} 1/(n − k − 1)!
        = [(n − 1)!/2] · Σ_{j=0}^{n−3} 1/j!.                         reindexing: j = n − k − 1

As n gets large, it turns out that Σ_{i=0}^{n} 1/i! approaches e = 2.7182818···. (If you happen to have seen Taylor series in calculus and you also happen to remember them, you can use them to show that this summation approaches e.) So for large n the number of cycles involving a particular node is approximately e · (n − 1)!/2.

11.128 The only difference in the directed graph is that there's no double counting, because ⟨u, s1, s2, . . . , s|S|, u⟩ and ⟨u, s|S|, s|S|−1, . . . , s1, u⟩ are different cycles. Thus the answer is twice the previous exercise's solution:

    (n − 1)! · Σ_{j=0}^{n−3} 1/j!,

which tends to e · (n − 1)! as n gets large.

11.129 The problem here is with the neighbors of s themselves: they have edges that go back to s, which falsely leads to the algorithm claiming that a cycle exists. For example, in the two-node graph with an edge from A to B, start running the algorithm with s = A. After the A iteration, Known = {A} and Frontier = ⟨B⟩. Then s = A is a neighbor of the first node in Frontier, and so we claim there's a cycle—but there isn't one.

Here's what's perhaps the simplest fix: if s has k neighbors u1, u2, . . . , uk, let's run BFS k times looking for s. In the ith run, we start the search from ui but exclude the edge {s, ui} from the graph. Note that if we find any other uj for j ̸= i in the BFS starting at ui, then there's a cycle. Thus these BFS runs are on disjoint subsets of the nodes, and therefore their total running time is unchanged from the original BFS, and the algorithm still runs in O(|V| + |E|) time.

11.130 We can use the same algorithm as in the proof of Lemma 11.33:


1 let u0 be an arbitrary node in the graph, and let i := 0
2 while the current node ui has an unvisited out-neighbor:
3     let ui+1 be an out-neighbor of ui that has not previously been visited.
4     increment i

(The only change from Lemma 11.33 is underlined.) This process must terminate in at most |V| iterations, because we must visit a new node in each step. If the last node that we visited does not have out-degree zero, then we have found a cycle (just as in Lemma 11.33).

11.131 Choose any edge ⟨u, v⟩, and run the algorithm from the proof of Lemma 11.33 from u: that is, starting at u, repeatedly move to an unvisited neighbor of the current node; stop when the current node has no unvisited neighbors. You'll find a degree-1 node x when you stop, because the graph is acyclic (and you'll have gone at least one step—to v if nowhere else). Now restart the same process from x. For precisely the same reasons, you'll terminate at a node y with degree 1, and y cannot equal x for the same reason that x ̸= u.

11.132 No simple path can contain a repeated node, including s—so a path from s to s isn't a valid simple path unless it has length 0. (Under this definition, there's no such thing as a cycle of nonzero length!) Even aside from that problem, this definition also forbids other nodes being repeated in the middle of the cycle, which is allowed in non-simple cycles.

11.133 This definition says that A,B,A is a cycle in the 2-node graph with V = {A, B} and E = {{A, B}}. (Thus, under this definition, no graph with an edge would be acyclic!)

11.134 This definition says that ⟨s⟩ (a 0-length path from s to itself) is a cycle. (Thus, under this definition, no graph at all would be acyclic!)

11.135 This definition says that A,B,C,B,A is a cycle in the 3-node path graph with edges {A, B} and {B, C}. But this graph is acyclic!

11.136 This definition says that A,B,C,D,B,A is a cycle in the graph with edges {A, B}, {B, C}, {C, D}, and {D, B}. But A is not involved in a cycle!

11.137 If there’s a cycle c = ⟨s, u1 , u2 , . . . , uk , s⟩, then repeat the following until c is simple: find an i and j > i such that ui = uj , and redefine c as c′ = ⟨s, u1 , . . . , ui−1 , ui , uj+1 , . . . , s⟩. This c′ is still a cycle, and it’s shorter by at least one edge than c was. After at most |E| repetitions of this shrinking process, the cycle c is simple. (A loose upper bound on the number of shrinking rounds is the number of edges in the graph, since the original c can traverse each edge at most once and we shrink by at least one edge in each round.) Thus we’ve argued that if there’s a cycle in the graph, then there’s a simple cycle in the graph too. Conversely, if there’s a simple cycle, then there’s certainly a cycle—simple cycles are cycles. Thus G has a cycle if and only if G has a simple cycle. 11.138 The two graphs are the 2-node, 1-edge graph, and the 1-node, 0-edge graph: A

B

A

11.139 By Lemma 11.33, there must be a node in a tree with degree 0 or degree 1. So the graph is either 0-regular or 1-regular.

• For a 0-regular graph, by Theorem 11.8, we have that the number of edges is |E| = (1/2) · |V| · 0 = 0. By Theorem 11.35, we need |E| = |V| − 1. Thus |V| = 1.
• For a 1-regular graph, by Theorem 11.8, we have that the number of edges is |E| = (1/2) · |V| · 1 = |V|/2. By Theorem 11.35, we need |E| = |V| − 1. Thus |V|/2 = |V| − 1, and so |V| = 2.

11.140 In Kn, every unordered quadruple of nodes forms multiple squares: for the quadruple ABCD, the squares are ABCDA, ABDCA, and ACBDA. (We might as well start from A; the other orderings of BCD are the same as the above three cycles but going around in the opposite direction.) There are (n choose 4) quadruples of nodes, so there are 3 · (n choose 4) squares.

11.141 In Kn, every unordered trio of nodes forms a distinct triangle. There are (n choose 3) trios of nodes, so there are (n choose 3) triangles.
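The counts in 11.140 and 11.141 can be verified by brute force for small n (this snippet is our own check, not part of the textbook's solutions):

```python
from itertools import permutations
from math import comb

def count_cycles(n, length):
    ''' Count distinct simple cycles of the given length in the complete graph K_n:
        every sequence of `length` distinct nodes is a cycle, but its rotations and
        their reversals all describe the same cycle. '''
    seen = set()
    for seq in permutations(range(n), length):
        rotations = [seq[i:] + seq[:i] for i in range(length)]
        rotations += [tuple(reversed(r)) for r in rotations]
        seen.add(min(rotations))   # a canonical representative of the cycle
    return len(seen)

for n in range(3, 8):
    assert count_cycles(n, 4) == 3 * comb(n, 4)   # squares, as in 11.140
    assert count_cycles(n, 3) == comb(n, 3)       # triangles, as in 11.141
```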

11.142 Let S denote the non-v neighbors of u and let T denote the non-u neighbors of v. If there exists a node z ∈ S ∩ T, then there's a triangle on the nodes {u, v, z}; thus S ∩ T = ∅. Thus

    degree(u) + degree(v)
        = |S| + 1 + |T| + 1        definition of S and T
        = |S ∪ T| + 2              S and T are disjoint by the above, so |S ∪ T| = |S| + |T|
        ≤ n − 2 + 2                both S ⊆ V − {u, v} and T ⊆ V − {u, v}, so |S ∪ T| ≤ n − 2
        = n.

11.143 Let P(n) denote the claim that any n-node triangle-free graph has at most n²/4 edges. We'll prove that P(n) holds for all n ≥ 1 by strong induction on n.

base case #1 (n = 1): There are no edges in a 1-node graph, and indeed 0 ≤ 1²/4 = 0.25.

base case #2 (n = 2): There is at most one edge in a 2-node graph, and indeed 1 ≤ 2²/4 = 1.

inductive case (n ≥ 3): We assume the inductive hypotheses P(n′) for 1 ≤ n′ ≤ n − 1, and we must prove P(n). Let G = ⟨V, E⟩ be an arbitrary n-node triangle-free graph. If there are no edges in G, we're done. If there is an edge ⟨u, v⟩ ∈ E, then let G′ be the induced subgraph on V − {u, v}. Note that G′ is triangle-free and contains n − 2 nodes. Then

    |E| = number of edges in G′ + number of edges involving u or v
        ≤ (n − 2)²/4 + number of edges involving u or v        inductive hypothesis P(n − 2)
        = (n − 2)²/4 + degree(u) + degree(v) − 1               degree(u) + degree(v) double-counts the edge between u and v
        ≤ (n − 2)²/4 + n − 1                                   previous exercise
        = (n² − 4n + 4 + 4n − 4)/4
        = n²/4.

11.144 The complete bipartite graph K_{n/2,n/2} contains n nodes, is triangle-free because it's bipartite, and contains (n/2) · (n/2) = n²/4 edges.
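A brute-force check of this n²/4 bound for small n (our own sketch, not part of the textbook's solutions; for odd n the bound rounds down to ⌊n²/4⌋, since edge counts are integers):

```python
from itertools import combinations

def max_triangle_free_edges(n):
    ''' The largest number of edges in any triangle-free n-node graph, by brute force. '''
    possible_edges = list(combinations(range(n), 2))
    # Search from the largest edge sets downward; the first triangle-free set found is optimal.
    for r in range(len(possible_edges), 0, -1):
        for edge_set in combinations(possible_edges, r):
            edges = set(edge_set)
            if not any({(a, b), (a, c), (b, c)} <= edges
                       for a, b, c in combinations(range(n), 3)):
                return r
    return 0

for n in range(2, 7):
    assert max_triangle_free_edges(n) == (n * n) // 4
```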

11.145 No, it’s not connected. For example, there’s no path from A to C.

C

11.146 Yes; it looks like this: A

B E D

. F

240

Graphs and Trees

11.147 Yes; it looks like this: [figure: a tree on the nodes A through F]

11.148 No, it's not acyclic. For example, ACFA is a cycle.

11.149 False; the following graph is a counterexample.

11.150 False; the following graph is a counterexample.

11.151 False; the following graph is a counterexample.

11.152 It breaks on the 1-node, 0-edge graph: the root of that tree is a leaf (though it has no parent).

11.153 C, E, H, I

11.154 A, B, D, F, G

11.155 The parent is A, the children are E and F, and the only sibling is B.

11.156 E, F, G, H, I

11.157 A, D

11.158 4

11.159 False: for example, rerooting the tree with root A and children B and C at B yields the tree with root B, whose only child is A, whose only child is C. The former has two leaves; the latter has only one.

11.160 Let P(h) denote the claim that a complete binary tree of height h contains precisely 2^{h+1} − 1 nodes. We'll prove P(h) for all h ≥ 0 by induction on h.

base case (h = 0): A complete binary tree of height 0 is a single node, so it indeed has 2^1 − 1 = 1 node.


inductive case (h ≥ 1): We assume the inductive hypothesis P(h − 1) and we must prove P(h). Note that a complete binary tree of height h is a root with two subtrees, both of which are complete binary trees of height h − 1. Thus

    number of nodes in a complete binary tree of height h
        = 1 + 2 · (number of nodes in a complete binary tree of height h − 1)    above discussion
        = 1 + 2 · (2^h − 1)                                                      inductive hypothesis P(h − 1)
        = 2^{h+1} − 1.

Thus P(h) holds.

11.161 The largest number occurs in a complete binary tree of height h, which has 2^h leaves. The smallest number occurs in a nearly complete binary tree in which there is only one node at distance h from the root: this tree has the 2^{h−1} leaves of a complete binary tree of height h − 1, except that the leftmost node at level h − 1 is an internal node, not a leaf, because it has one child (the only level-h node, which is itself a leaf). Thus the number of leaves in this tree is 2^{h−1} − 1 + 1 = 2^{h−1}.

11.162 If the height of a nearly complete binary tree T is h, then there are between 1 and 2^h nodes in the bottom row of T. There is a depth-h leaf in the left subtree of T, but the crucial question is: is there also a depth-h leaf in the right subtree of T? If so, the diameter is 2h, because the only path from the leftmost leaf to the rightmost leaf goes through the root, and traverses h edges to get to the root and h edges from the root. If not, the rightmost leaf is only h − 1 away from the root, and the diameter is h + (h − 1) = 2h − 1.

11.163 Each leaf is 2h away from any leaf on the "other side" of the tree (that is, in the root's opposite subtree), as per Exercise 11.162. Thus, as each layer of the rerooted tree is at an increasing distance from the root, the new height is 2h.

11.164 The choice of root has no effect on the diameter. It's still the distance between the two nodes that are farthest apart—namely 2h.

11.165 No node of degree 2 or 3 can be a leaf, and every non-root node of degree 1 must be a leaf. Thus the only change is that the new root used to be a leaf and isn't anymore, so there are 2^h − 1 leaves.

11.166 Arrange the nodes in a line, so node 1 is the root, and each node i has one child, node i + 1. Every node has one child (except for the 1000th node), and the height of the tree is 999.

11.167 A nearly complete binary tree of height 9 can fit 1000 nodes, so it's possible to have height 9.
Can we have a binary tree of height 8 that contains 1000 nodes? No: a complete binary tree of height 8 contains precisely Σ_{i=0}^{8} 2^i nodes, and Σ_{i=0}^{8} 2^i = 2^9 − 1 = 511, which is less than 1000.

11.168 There can't be more than 500 leaves in a 1000-node graph, by Example 5.19/Example 5.20. And the nearly complete binary tree from the previous exercise has 500 leaves (see the structure below), which therefore achieves the maximum possible:

[figure: a nearly complete binary tree with 9 full layers containing 1 + 2 + 4 + 8 + 16 + 32 + 64 + 128 + 256 = 511 nodes, plus 489 nodes in the last layer (all leaves); in the 9th full layer of 256 nodes, from left to right, there are 244 internal nodes with 2 children, 1 internal node with 1 child, and 11 leaves]

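The height and leaf counts above can be double-checked with the standard heap numbering of a nearly complete binary tree (this snippet is our own sketch, not part of the textbook's solutions):

```python
# Nearly complete binary tree on nodes 1..n via heap numbering: node i's children
# are 2i and 2i+1 (when they exist), so the shape is determined by n alone.
n = 1000
children = {i: [c for c in (2 * i, 2 * i + 1) if c <= n] for i in range(1, n + 1)}

height = max(i.bit_length() - 1 for i in range(1, n + 1))   # node i sits at depth floor(log2 i)
leaves = [i for i in range(1, n + 1) if not children[i]]
one_child = [i for i in range(1, n + 1) if len(children[i]) == 1]

print(height, len(leaves), len(one_child))   # 9 500 1
```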

11.169 The "straight line" graph—every node except the 1000th has precisely one child—has only one leaf. And no acyclic 1000-node graph can have fewer than one leaf.

11.170 Consider a tree of height h in which no node that's a left child of its parent has any children, and every node that's a right child of its parent has two children. Such a tree of height h has 2 nodes at depth d, for all d ≥ 1. Thus there are 2h + 1 nodes in a tree of height h, so the height of this n-node tree is (n − 1)/2. That's the largest possible height for a binary tree that contains no nodes with one child.

11.171 ABCDEGHIF

11.172 CBAHGIEDF

11.173 CBHIGEFDA

11.174 [figure: the 5-node binary tree, on the nodes 1 through 5, reconstructed from the given traversals]

11.175 [figure: the 5-node binary tree, on the nodes 1 through 5, reconstructed from the given traversals]

11.176 The key will be to define a tree in which there's no node with a nonempty left subtree. In this case, the following two orderings are equivalent:

• traverse left; visit root; traverse right.
• visit root; traverse left; traverse right.

(Both are equivalent to "visit root; traverse right.") Here is an example of such a tree: the right-going chain in which node 1's right child is node 2, node 2's right child is node 3, and so on down to node n. Thus pre-order and in-order traversals of this tree are both 1, 2, 3, . . . , n.

11.177 If there's no node with a nonempty right subtree, then these two orderings are equivalent:

• traverse left; visit root; [traverse right].
• traverse left; [traverse right]; visit root.


(Both are equivalent to "traverse left; visit root.") Here's such a tree: the left-going chain in which node 1's left child is node 2, node 2's left child is node 3, and so on down to node n. Thus post-order and in-order traversals of this tree are both n, n − 1, . . . , 2, 1.

11.178 In both of the following trees, the pre-order traversal is CAB and the post-order traversal is BAC: the chain whose root C has the single child A, which has the single child B—in one tree with each child attached as a left child, and in the other with each child attached as a right child.

11.179 Here is such an algorithm:

reconstruct(i_{1...n}, p_{1...n}):
Input: i_{1...n} is the in-order traversal of some tree T; p_{1...n} is the post-order traversal of the same tree T
Output: the tree T
1 if n = 0 then
2     return an empty tree.
3 else
4     root := p_n
5     j := the index such that i_j = root.
6     L := reconstruct(i_{1,2,...,j−1}, p_{1,2,...,j−1})
7     R := reconstruct(i_{j+1,...,n}, p_{j,...,n−1})
8     return the tree with root root, left subtree L, and right subtree R.
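The pseudocode in 11.179 translates directly into Python; this implementation is our own sketch (it assumes distinct node labels, as the in-order index lookup requires):

```python
def reconstruct(inorder, postorder):
    ''' Rebuild a binary tree from its in-order and post-order traversals.
        Trees are represented as None (empty) or (root, left_subtree, right_subtree).
        Assumes all node labels are distinct. '''
    if not inorder:
        return None
    root = postorder[-1]        # the root is visited last in post-order
    j = inorder.index(root)     # in-order: [left subtree] root [right subtree]
    left = reconstruct(inorder[:j], postorder[:j])
    right = reconstruct(inorder[j+1:], postorder[j:-1])
    return (root, left, right)

def in_order(t):
    return [] if t is None else in_order(t[1]) + [t[0]] + in_order(t[2])

def post_order(t):
    return [] if t is None else post_order(t[1]) + post_order(t[2]) + [t[0]]

# Round-trip check on a small tree (our own example).
t = (2, (1, None, None), (3, (4, None, None), None))
assert reconstruct(in_order(t), post_order(t)) == t
```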

11.180 We must include all but two edges from G in the spanning tree, and a 5-node tree contains 4 edges, by Theorem 11.35. We must include the edge to the lone node of degree 1 in the graph. There are five other edges, of which we keep three, and there are (5 choose 3) = 10 ways to do that. However, two of those ways leave the graph disconnected: we cannot leave out both of the edges incident to either of the degree-2 vertices. That leaves 10 − 2 = 8 spanning trees.

11.181 We must remove two edges:

• BG and any one of AB/BD/DF/FH/HG/GE/EC/CA; or
• any one of AB/GE/EC/CA and any one of BD/DF/FH/HG.

That's a total of 1 · 8 = 8 plus 4 · 4 = 16 spanning trees, for a grand total of 24.

11.182 We must remove three edges:

• AC and BF and any one of AE/EC/CG/GF/FD/DB/BA; or
• AC and one of BD/DF and any one of AE/EC/CG/GF/FB/BA; or
• one of AC/AE and BF and any one of AC/CG/GF/FD/DB/BA; or
• one of AC/AE and one of BD/DF and any one of AC/CG/GF/FB/BA.

That's a total of 1 · 1 · 7 = 7 plus 1 · 2 · 6 = 12 plus 2 · 1 · 6 = 12 plus 2 · 2 · 5 = 20 spanning trees, for a grand total of 7 + 12 + 12 + 20 = 51.


11.5 Weighted Graphs

11.183 ACDE, length 2 + 1 + 3 = 6.

11.184 ABE, length 1 + 3 = 4.

11.185 ACE or ABE, both length 1 + 3 = 4.

11.186 ABFH, length 1 + 2 + 4 = 7.

11.187 ACDFH, length 3 + 1 + 2 + 10 = 16.

11.188 We'll define a graph with 1 + 3k nodes and k "diamonds"; in each diamond, both the upper path and the lower path have the same weight, so there are 2 shortest choices per diamond, and thus 2^k total shortest paths:

[figure: a chain of k diamonds from s to t; the kth diamond's upper path has edges of weights 2k − 1 and n + 2k, and its lower path has edges of weights 2k and n + 2k − 1, so both paths across the diamond have total weight n + 4k − 1]

We chose the weights of the edges carefully so that all the weights are different. Because there are n = 3k + 1 nodes in this k-diamond graph, we have that (n − 1)/3 = k. Thus there are

    2^k = 2^{(n−1)/3} = (2^{1/3})^{n−1} ≈ 1.2599^{n−1}

shortest paths through the graph. If n is not one more than a multiple of 3, then we can simply add one or two additional nodes in a chain between s and the first diamond. There are thus at least 1.2599^{n−3} = 0.5 · 1.2599^n shortest s-to-t paths in the graph.

11.189 d(A, D) = 8

11.190 d(A, D) = 8, then d(A, E) = 9, and then d(A, H) = 17.

11.191 Running Dijkstra's algorithm from H finds the following nodes and distances, in the order listed:

    node:     H   F   D   B   A   C   E
    distance: 0  10  12  16  17  17  17

(The last three nodes could have been listed in any order.)

11.192 When we're proving that the distance from s to the chosen node u∗ is at least the number du∗, we consider an arbitrary path P from s to u∗, and we consider the portion of P up to the first exit from the set S and the portion after. The argument claims that the portion of P after the first exit has length at least zero—but that can fail if the graph has negative edge weights. For example, consider this graph:

[figure: a three-node graph with an edge of weight 98 from S to A and an edge of weight 99 from S to B]


We say d(S, S) = 0 in the first iteration, and then we say d(S, A) = 98 in the second iteration. But if there's an edge of length −10 from B to A, then in fact there's a path of length 99 − 10 = 89 to A, via B. But Dijkstra's algorithm would never update its recorded value for A.

11.193 We'll take the second option, by describing a graph G′ so that Dijkstra's algorithm run on G′ finds an SMP in G. G′ has the same nodes and edges as G, but with different weights: for an edge e with cost we in G, the edge e in G′ has cost w′e = log2 we. Then on any particular path P, we have

    Σ_{e∈P} w′e = Σ_{e∈P} log2 we = log2 (Π_{e∈P} we).        Theorem 2.10 (logb xy = logb x + logb y)

Finding the path with the smallest log2(Π_{e∈P} we) is equivalent to finding the path with the smallest Π_{e∈P} we, because x < y ⇔ log x < log y. Thus finding the path with the smallest Σ_{e∈P} w′e solves the SMP problem. And therefore we can just run Dijkstra's algorithm on the graph with the modified weights w′.

11.194 We need all the logarithmic costs to be nonnegative, which means that we need all pre-logged costs to be greater than or equal to 1. (Recall that log x < 0 if and only if x < 1.) Thus we require that every we ≥ 1.

11.195 Include the edges AB, AC, CD, DE.

11.196 Include the edges AB, BD, BE, CE.

11.197 Include any two of the edges AB, BC, AC; include one of the edges BE, CE; and include the edge DE. Thus the following are all minimum spanning trees:

• AB, BC, BE, DE
• AB, BC, CE, DE
• AB, AC, BE, DE
• AB, AC, CE, DE
• AC, BC, BE, DE
• AC, BC, CE, DE

11.198 Just include the six cheapest edges and we get a tree: AB, AE, BF, CD, CE, FH.

11.199 There is no minimum spanning tree, because the graph isn't connected!

11.200 If the 8 cheapest edges form a simple path, then there is an MST that uses only these edges, and thus has cost

    Σi=1..8 i = 8 · (8 + 1)/2 = 36.

11.201 If the 8 most expensive edges are all incident to the same node, then we'll have to choose one of them, so there can be an MST with an edge of weight as high as 29 (leaving out the edges 30, 31, . . . , 36).

11.202 It's possible to arrange the edge costs so that the MST has cost 1 + 2 + 4 + 7 + 11 + 16 + 22 + 29 = 92. Here's how: call the nodes {0, 1, . . . , 8}. Arrange the edges so that, for each k = 0, 1, . . . , 8, the (k+1 choose 2) cheapest edges are all among the nodes {0, 1, . . . , k}. (This arrangement causes the largest possible number of cheapest edges to be "wasted" by being part of even cheaper cycles.) Another way to describe this arrangement: connect the 8 most expensive edges to node 8. As argued in the previous exercise, we must include the cheapest of these edges, namely the one of weight 29. Now there are 8 remaining nodes. Connect the 7 most expensive remaining edges to node 7. Again we must include the cheapest of these edges, namely the one of weight 22. And so forth. This yields an MST of cost 29 + 22 + 16 + 11 + 7 + 4 + 2 + 1 = 92.

11.203 The cheapest possible MST consists of the n − 1 cheapest edges, which has cost Σi=1..n−1 i = n(n − 1)/2.

We'll phrase the solution for the most expensive possible MST with a recurrence relation. Let C(n) denote the cost of the most expensive MST on Kn with edge weights {1, 2, . . . , (n choose 2)}. The most expensive MST has a single node that has the n − 1 most expensive edges as its neighbors, and the other n − 1 nodes have the heaviest possible minimum spanning tree on Kn−1 . That is,

    C(1) = 0
    C(n) = (n choose 2) − (n − 2) + C(n − 1),

where (n choose 2) − (n − 2) is the cost of the (n − 1)st-most expensive edge. Note that, for n ≥ 2, we have

    (n choose 2) − (n − 2) = n(n − 1)/2 − (n − 1) + 1 = (n − 2)(n − 1)/2 + 1 = (n−1 choose 2) + 1.    (∗)

Therefore, iterating the recurrence, we see that

    C(n) = Σk=2..n [(k choose 2) − (k − 2)]        [the recurrence, as described above, iterated out]
         = Σk=2..n [(k−1 choose 2) + 1]           [by (∗)]
         = n − 1 + Σk=1..n−1 (k choose 2)         [pulling out the constant; reindexing]
         = n − 1 + (n choose 3)                   [by Exercise 9.176]
         = n − 1 + n(n − 1)(n − 2)/6.

Indeed, for n = 9, we have 8 + (9 · 8 · 7)/6 = 8 + 504/6 = 8 + 84 = 92, exactly as in Exercise 11.202.
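As a quick sanity check on Exercise 11.203 (this checker is mine, not the book's), the iterated recurrence can be compared against the closed form n − 1 + n(n − 1)(n − 2)/6 for small n:

```python
from math import comb

def C(n):
    '''Cost of the most expensive MST on K_n with edge weights
    1..comb(n, 2), via the recurrence C(1) = 0 and
    C(n) = comb(n, 2) - (n - 2) + C(n - 1), iterated out as a sum.'''
    return sum(comb(k, 2) - (k - 2) for k in range(2, n + 1))

def closed_form(n):
    return n - 1 + n * (n - 1) * (n - 2) // 6

# The recurrence and the closed form agree for every small n...
for n in range(1, 12):
    assert C(n) == closed_form(n)

# ...and the K_9 case matches Exercise 11.202.
print(C(9))  # prints 92
```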

11.204 The graph is symmetric, so let's just argue that the probability that we're at node A after one step is still 1/3:

    Pr[u1 = A] = Pr[u1 = A | u0 = A] · Pr[u0 = A] + Pr[u1 = A | u0 = B] · Pr[u0 = B] + Pr[u1 = A | u0 = C] · Pr[u0 = C]
               = 0 · Pr[u0 = A] + (1/2) · Pr[u0 = B] + (1/2) · Pr[u0 = C]    [we choose the next step randomly from the current node's neighbors]
               = 0 · (1/3) + (1/2) · (1/3) + (1/2) · (1/3)                   [the first node u0 was chosen according to p—that is, p(A) = p(B) = p(C) = 1/3]
               = 0 + 1/6 + 1/6
               = 1/3.

Thus Pr[u1 = A] = 1/3 = p(A). Similarly, Pr[u1 = B] = 1/3 = p(B) and Pr[u1 = C] = 1/3 = p(C). Thus p is a stationary distribution for this graph.

11.205 Any distribution of the following form, for any q ∈ [0, 1], is stationary:

    p(D) = q/3    p(E) = q/3    p(F) = q/3    p(G) = (1 − q)/3    p(H) = (1 − q)/3    p(I) = (1 − q)/3.

The same calculation as in the previous exercise shows that the probability is stationary within each triangle. But if, for example, q = 0 then we're uniform on one of the triangles (the one containing G, H, and I), and if q = 1 then we're uniform on the other (the one containing D, E, and F). 11.206 A solution in Python is shown in Figure S.11.3, including testing on the graph from Figure 11.100a. I saw the statistics {'a': 671, 'c': 675, 'b': 654}. (The theoretical prediction says that we'd expect 2000/3 ≈ 666.7 each.) 11.207 In even-numbered steps, you'll always be on the "side" of the bipartite graph where you started; in odd-numbered steps you'll be on the opposite "side." So in this graph p1000 (J) = p1000 (L) ≈ 1/2 each, and p1000 (K) = p1000 (M) = 0 if we start at J. But if we start at K, the probabilities are reversed. Thus we haven't converged to a unique stationary distribution.

import random

def random_walk(V, E, v_zero, steps):
    '''
    Performs a random walk on the (undirected) graph G = (V, E)
    starting at node v_zero, and continuing for steps steps, and
    returns the node at which the walk ends.
    '''
    current = v_zero
    for i in range(steps):
        neighbors = [x for x in V if set([x, current]) in E]
        current = random.choice(neighbors)
    return current

# The graph from Figure 11.100a.
V = ['a', 'b', 'c']
E = [set(['a','b']), set(['a','c']), set(['b','c'])]

# Performs 2000 instances of a 2000-step random walk, starting from node a.
v_zero = 'a'
counts = {'a' : 0, 'b' : 0, 'c' : 0}
for i in range(2000):
    counts[random_walk(V, E, v_zero, 2000)] += 1
print(counts)

Figure S.11.3 Performing a 2000-step random walk 2000 times.
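The non-convergence in Exercise 11.207 can also be seen by tracking the exact step-by-step distribution instead of sampling. The graph below is an assumed 4-cycle on J, K, L, M standing in for the exercise's figure (which isn't reproduced here); any connected bipartite graph shows the same period-2 behavior.

```python
# Exact distribution of a random walk on an assumed 4-cycle J-K-L-M,
# starting at J. At every even-numbered step, all the probability mass
# sits on J's "side" of the bipartition, namely {J, L}.
neighbors = {'J': ['K', 'M'], 'K': ['J', 'L'], 'L': ['K', 'M'], 'M': ['L', 'J']}

p = {'J': 1.0, 'K': 0.0, 'L': 0.0, 'M': 0.0}
for step in range(1000):
    nxt = {v: 0.0 for v in neighbors}
    for v, mass in p.items():
        # Mass at v splits equally among v's neighbors.
        for w in neighbors[v]:
            nxt[w] += mass / len(neighbors[v])
    p = nxt

print(p)  # prints {'J': 0.5, 'K': 0.0, 'L': 0.5, 'M': 0.0}
```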

11.208 To show that p is a stationary distribution, we need to prove the following claim: if the probability distribution of where we are in step k is p, then the probability distribution of where we are in step k + 1 is p too. Consider a particular node u. Let v1 , v2 , . . . , vℓ denote the neighbors of u, where ℓ = degree(u). The key question is this: what is the probability that we will be at node u in step k + 1? Note that to be at node u at step k + 1, we had to be at one of u's neighbors at step k and we had to follow the edge from that neighbor to u in step k + 1. And the probability that we're at node vi in step k is exactly p(vi ), by the assumption, and the probability that we moved from vi to u in step k + 1 is 1/degree(vi ), because vi is a neighbor of u and we moved from vi with equal probability to each of its degree(vi ) neighbors. Thus:

    Pr[uk+1 = u] = Σi=1..ℓ Pr[uk = vi and uk+1 = u]                  [law of total probability; we can only reach u from a neighbor]
                 = Σi=1..ℓ Pr[uk = vi ] · Pr[uk+1 = u | uk = vi ]
                 = Σi=1..ℓ p(vi ) · (1/degree(vi ))                  [assumption/definition of the random walk (see above discussion)]
                 = Σi=1..ℓ (degree(vi )/(2 · m)) · (1/degree(vi ))   [definition of p]
                 = Σi=1..ℓ 1/(2 · m)
                 = ℓ/(2 · m) = degree(u)/(2 · m) = p(u).             [definition of ℓ = degree(u) and p]

Thus the probability that we're at node u in step k + 1 is precisely degree(u)/2m, which is just p(u) by definition. This proves the theorem: we assumed that p was the distribution at step k and proved that p was therefore the distribution at step k + 1, and therefore showed that p is a stationary distribution.
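This calculation can be checked numerically for any particular graph. Here's a one-step verification sketch on a small made-up graph (not one of the book's figures): applying one step of the walk to p(u) = degree(u)/2m returns the same distribution.

```python
# One step of the random-walk distribution update, applied to
# p(u) = degree(u) / (2m), on a small made-up graph.
neighbors = {
    'a': ['b', 'c', 'd'],
    'b': ['a', 'c'],
    'c': ['a', 'b', 'd'],
    'd': ['a', 'c'],
}
m = sum(len(ns) for ns in neighbors.values()) // 2  # number of edges

p = {u: len(neighbors[u]) / (2 * m) for u in neighbors}

# One step of the walk: mass at v splits equally among v's neighbors.
q = {u: 0.0 for u in neighbors}
for v, mass in p.items():
    for w in neighbors[v]:
        q[w] += mass / len(neighbors[v])

# The updated distribution q equals p, so p is stationary on this graph.
print(all(abs(p[u] - q[u]) < 1e-12 for u in neighbors))  # prints True
```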
