A Mathematical Foundation for Computer Science 9798765785676


English, 1237 pages, 2020



If A is a set, the things in A are called elements of A. The notation "a ∈ A" means "a is an element of A". We can denote a set by listing its members, separated by commas, between braces. It is common for all the elements of A to come from the same type T; note that T is a collection of things and is thus a set. In this case we say that A is "a set of elements of type T" or "a set of type T".

Example: The set A = {2, 3, 5} is a set of naturals. The number 3 is an element of A, so the statement "3 ∈ A" is true, while the statement "4 ∈ A" is false. The set B = {Jane Austen, Chinua Achebe, Patrick O'Brian} is a set of novelists, that is, a set of type novelist. But we can also say that B is a set of type person. The set C = {Lady Murasaki, 3.26, George Eliot, 7} contains some real numbers and some novelists. This is a perfectly legal set, because it is a collection of things, but again we will normally restrict ourselves to sets that have elements of an easily understandable type.

In denoting a set by a list, we don't need to write down all the elements if we can make them clear in some other way. For example {A,...,Z} is the set of all capital letters, {−128,...,127} is the set of all integers from −128 through 127, and {1,3,5,...} is the set of all odd naturals.

Definition: If w is a variable of type T, and S is a statement about a thing of type T, then {w : S} is the set of all things of type T that make S true. This is called set builder notation.

Example: Let x be a variable of type integer. Then {x : x < 3} is the set of all integers that are less than 3; −2 is an element of this set but 3 and 5 are not. Let n be a variable of type novelist. Then {n : n wrote in English} is the set of all novelists who wrote in English. George Eliot (who wrote in English) is a member of this set and Lady Murasaki (who wrote in Japanese) is not.

We can use set builder notation to define sets even when we don't have a way to test whether the statement is true. The set {n : n will write next year's best-selling novel} is a set of novelists, but we can't tell now which novelist is in it. We may have a good reason to have to work with such sets; consider the set {a : input a will cause my program to crash}.

Definition: Let A and B be sets. We say that A is a subset of B (written "A ⊆ B") if every element of A is also an element of B. We say that A and B are equal (written "A = B") if both A ⊆ B and B ⊆ A are true, that is, if every element of A is an element of B and also every element of B is an element of A. If A is a subset of B but A is not equal to B, we say that A is a proper subset of B and write A ⊂ B.

Example: Let D be the set {George Eliot, Lady Murasaki}, E be the set {Chinua Achebe, Lady Murasaki, Patrick O'Brian, George Eliot}, and F be the set {Lady Murasaki, George Eliot}. We can see that D ⊆ E because the two elements of D, George Eliot and Lady Murasaki, are each elements of E. Similarly, we can see that D ⊆ F because both these novelists are also elements of F. But the statement E ⊆ D is false, because not all the elements of E are also elements of D; for example, Chinua Achebe is not in D. Similarly E ⊆ F is false. But F ⊆ D is true, because every element of F is also an element of D.

we'll look at some of the consequences of this kind of paradox for countability and computability theory. For now, though, we'll take refuge in the fact that the sets we plan to use will only rarely be sets of sets, and then only sets of sets from some fixed type.

P2.3.5

The majority quantifier M has the following interpretation, whenever there are only finitely many members of the type over which it quantifies: Mx : P(x) means "P(x) is true for a majority of the possible x", or "P(x) is true for more x than it is false". Consider the two sentences M(x,y) : Q(x,y) and Mx : My : Q(x,y). Explain the English meaning of these two sentences, where the type of x is "election district", the type of y is "voter number", and Q(x,y) means "voter number y in election district x voted for the Silly Party". Give examples from real political systems* where either of the sentences is equivalent to "the Silly Party wins the election". Describe an example where one of the two sentences is true and the other is false.

P2.3.6

(uses Java) Let X = {x_1,...,x_n} be a set of n objects and Y = {y_1,...,y_m} be a set of m objects. Let R be a binary relation from X to Y, represented by a two-dimensional array R, so that R[i,j] represents the truth value of the statement "(x_i, y_j) ∈ R" (or, equivalently, "R(x_i, y_j)").

(a) Write a method boolean e(int j) that takes one int argument and returns the truth value of the statement "∃x : R(x, y_j)".

(b) Write a method boolean a(int j) that takes one int argument and returns the truth value of the statement "∀x : R(x, y_j)".

P2.3.7

Let D be a set of dogs, with R being the subset of retrievers, B being the subset of black dogs, and F being the subset of female dogs, with membership predicates R(x), B(x), and F(x) respectively. Suppose that the three statements ∀x : ∃y : R(x) ⊕ R(y), ∀x : ∃y : B(x) ⊕ B(y), and ∀x : ∃y : F(x) ⊕ F(y) are all true. What can you say about the number of dogs in D? Justify your answer.

P2.3.8

Let X be a set and R(x) a unary predicate on X.

(a) Write a quantified statement that says "there are exactly two elements x in X such that R(x) is true".

(b) Write a quantified statement that says "there are at least three elements x in X with R(x) true, and at least three with R(x) false".

P2.3.9

Let D be a set of dogs and let T be a subset of terriers, so that the predicate T(x) means "dog x is a terrier". Let F(x) mean "dog x is fierce" and let S(x,y) mean "dog x is smaller than dog y". Write quantified statements for the following, using only variables whose type is D:

(a) There exists a fierce terrier.

(b) All terriers are fierce.

(c) There exists a fierce dog who is smaller than all terriers.

(d) There exists a terrier who is smaller than all fierce dogs, except itself.

P2.3.10

A set of numbers is said to be dense if given any two distinct numbers x and y, there exists a number strictly between x and y.

(a) Write a quantified statement that is true if and only if the type we are quantifying over is dense.

*The Electoral College used to decide the U.S. president is similar to one of these, but more complicated because of the different sizes of the "election districts" (states).
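As a sketch of what Problem P2.3.6 is asking for, the two methods might look like this in Java, assuming the relation is stored as a two-dimensional boolean array field; the class and field names here are illustrative, not from the text:

```java
public class Relation {
    boolean[][] R;   // R[i][j] is true exactly when (x_i, y_j) is in the relation

    Relation(boolean[][] r) { this.R = r; }

    // Truth value of "∃x : R(x, y_j)": scan column j for any true entry.
    boolean e(int j) {
        for (int i = 0; i < R.length; i++)
            if (R[i][j]) return true;
        return false;
    }

    // Truth value of "∀x : R(x, y_j)": scan column j for any false entry.
    boolean a(int j) {
        for (int i = 0; i < R.length; i++)
            if (!R[i][j]) return false;
        return true;
    }
}
```

Both methods scan one column of the array, so each runs in time proportional to n, the number of elements of X.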

— just start generating all possible derivations in all possible ways until they generate n terminals, where n = |w|. As long as we take some precautions to avoid repeating ourselves²⁵, we'll finish with an enormous (but finite) list of possible derivations that will include w if and only if it can be derived. But we should be able to answer this question more intelligently and efficiently than that.

What could a derivation of the string w = w_1 ... w_n possibly look like? We have to apply some rule to the start symbol S, say "S → ABC". Then the symbols A, B, and C each have to generate substrings of w; in fact, we can say²⁶ that for some i and j (giving names to the last letters of two of these substrings), A generates the string w_1 ... w_i, B generates w_{i+1} ... w_j, and C generates w_{j+1} ... w_n. If the rule we use has k non-terminals on the right-hand side, we need k − 1 numbers to specify how w is divided up into the substrings generated by each non-terminal.

We've reduced our original problem, whether S can lead to w_1 ... w_n, to a set of problems of the same type: whether some non-terminal A, for example, can generate w_i ... w_j. We can now write a recursive boolean method gen(A,i,j) to answer this question: "Can the non-terminal A possibly generate the string w_i ... w_j?" For each possible rule, and each possible way to divide w into substrings, we call gen recursively to see whether each of the non-terminals on the right-hand side can generate its assigned substring.

The running time of this algorithm is horrible. It might call itself as many as n − 1 times, and at each level it has to consider each of the many possible divisions of the string into substrings as well as each of the rules of the grammar. But we have used dynamic programming before to tame horrible recursive algorithms, and it can be used again here. If we make a table for the boolean values of gen(A,i,j) for each of the possible choices of A, i, and j, we have to do only a polynomial amount of work to fill in each entry of the table, and we have an algorithm that runs in polynomial time²⁷.

This isn't good enough, though, for practical situations like the parsing of programs in compilers. There we want to parse a program in one pass as we read it, if at all possible. So the grammars of actual programming languages have to be chosen from various restricted types that are known to have fast parsing algorithms. These are well beyond the scope of this book, but are dealt with in some books on computability and complexity, and in most books on compiler theory.
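As a sketch of the dynamic-programming idea, here is a CKY-style table filler in Java for a grammar already in Chomsky Normal Form; the Unit/Binary rule records and the method names are illustrative, not from the text:

```java
import java.util.*;

public class CKY {
    // A grammar in Chomsky Normal Form: unit rules A -> a and binary rules
    // A -> B C. These record types are an illustrative representation.
    record Unit(String a, char t) {}
    record Binary(String a, String b, String c) {}

    static boolean generates(String w, List<Unit> units, List<Binary> bins, String start) {
        int n = w.length();
        if (n == 0) return false;   // the empty string is handled separately by S -> λ
        // table[i][j] = set of non-terminals that can generate w_i ... w_j
        @SuppressWarnings("unchecked")
        Set<String>[][] table = new Set[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                table[i][j] = new HashSet<>();
        // Base case i = j: single letters, generated by the unit rules.
        for (int i = 0; i < n; i++)
            for (Unit u : units)
                if (u.t() == w.charAt(i))
                    table[i][i].add(u.a());
        // Longer substrings: try every split point and every binary rule.
        for (int len = 2; len <= n; len++)
            for (int i = 0; i + len - 1 < n; i++) {
                int j = i + len - 1;
                for (int k = i; k < j; k++)          // split w_i..w_k | w_{k+1}..w_j
                    for (Binary r : bins)
                        if (table[i][k].contains(r.b()) && table[k + 1][j].contains(r.c()))
                            table[i][j].add(r.a());
            }
        return table[0][n - 1].contains(start);
    }
}
```

Because entries are filled in order of increasing substring length, every recursive question "can B generate w_i ... w_k?" has become a table lookup, giving the polynomial bound claimed above.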

15.3.5 Exercises

E15.3.1

Show, using the given grammar, that the string llrllrrlrrrllrr is in the language Paren.

E15.3.2

Show, using the given grammar, that the string a+(a*(a+a+(a*a+a)+a)*(a+a)) is in the arithmetic expression language.

²⁵One way to do this would be to put the grammar into Chomsky Normal Form, as described in Excursion 15.4.

²⁶There is an important technicality here in that we want to assume that no nonterminal can derive the empty string. This is the first stage of the Chomsky Normal Form construction of Excursion 15.4.

²⁷This is usually called the "CKY" algorithm after its discoverers: Cocke, Kasami, and Younger.

E15.3.3

Give the leftmost derivations equivalent to the three example derivations in the text (illustrated in Figure 15-7).

E15.3.4

Consider the grammar with rules S → aSb and S → ba. Find a string with at least eight letters generated by this grammar. Give the derivation and draw the parse tree.

E15.3.5

What does a parse tree for a right-linear (regular) grammar look like in general? What about a left-linear grammar? A linear grammar?

E15.3.6

In Problem 14.2.9 we proved the Regular Language Pumping Lemma (RLPL), which says that if L is the language of a k-state DFA, and w ∈ L is a string with |w| ≥ k, then w can be written as xyz where for all naturals i, the string xy^i z is in L. Let EQ be the set of strings over {a,b} that have an equal number of a's and b's. Prove that EQ does satisfy the conclusion of the Regular Language Pumping Lemma as stated above, although we proved in Problem 14.2.7 that it is not regular.

E15.3.7

Let T be a parse tree for a string w using a grammar G. Prove that there exists another parse tree T’ for w using G, in which whenever any non-terminal A occurs twice on one path of the tree, at least one terminal is generated from the top A along with the bottom A. (Hint: Use induction on the number of nodes in T.)

E15.3.8

The base case of the recursion in the CKY algorithm is when i = j, so we are asking whether the non-terminal A can generate the string w_i ... w_j = w_i. How do we answer this question, given G, A, w, and i as input?

E15.3.9

(uses Java) Suppose our grammar G contains the rule S → ABC, and assume that no nonterminal can generate an empty string. Write a Java code fragment indicating how the CKY algorithm answers the question gen(S, i, j) by making calls to gen with first parameter A, B, and C.
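One possible shape for such a fragment, with a toy gen base case filled in so the sketch is self-contained (the base-case grammar A → a, B → b, C → c is illustrative, not part of the exercise):

```java
public class GenFragment {
    static String w;   // the input string, as a static field for this sketch

    // Toy base cases so the fragment can run: A, B, and C each generate a
    // single letter (a, b, c respectively). Illustrative only.
    static boolean gen(String nt, int i, int j) {
        if (i == j) {
            char ch = w.charAt(i);
            return (nt.equals("A") && ch == 'a')
                || (nt.equals("B") && ch == 'b')
                || (nt.equals("C") && ch == 'c');
        }
        return false;
    }

    // The CKY answer to gen(S, i, j) for the single rule S -> ABC:
    // try every way to cut w_i ... w_j into three consecutive nonempty pieces.
    static boolean genS(int i, int j) {
        for (int k = i; k < j; k++)          // A generates w_i ... w_k
            for (int m = k + 1; m < j; m++)  // B generates w_{k+1} ... w_m
                if (gen("A", i, k) && gen("B", k + 1, m) && gen("C", m + 1, j))
                    return true;             // C generates w_{m+1} ... w_j
        return false;
    }
}
```

The two loop variables k and m are exactly the k − 1 = 2 numbers the text says are needed to describe how the substring is divided among the rule's three non-terminals.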

E15.3.10

Suppose that we are running the CKY algorithm on a string of length n. Explain why the generalization of the result of Exercise 15.3.9 to any rule of the grammar will take time polynomial in n.

15.3.6 Problems

P15.3.1

Find an example of a grammar and a string such that the string has two distinct leftmost derivations in the grammar. Such a grammar is called ambiguous³¹.

P15.3.2

In Problem 14.2.9 (b) we showed that the strings x, y, and z in the Regular Language Pumping Lemma (also restated in Exercise 15.3.6 above) may be chosen with |xy| ≤ k.

(a) Show that with this added condition, the language EQ (strings over {a,b} with equal numbers of a's and b's) no longer satisfies the conclusion of the RLPL.

(b) Let the language Z over the alphabet {a,b,c} be the concatenation of cc* with EQ. Argue using the Myhill-Nerode Theorem that this language is not regular.

³¹A language is called inherently ambiguous if every grammar for it is ambiguous. An example of an inherently ambiguous language is {a^i b^j c^k : i = j ∨ j = k}, but the proof of this is fairly hard, requiring a careful analysis of when a symbol can generate itself.

(c) Show that Z satisfies the conclusion of the RLPL even with the added condition.

P15.3.3

Prove the following CFL Pumping Lemma (CFLPL): Let A be a context-free language. Then there exists a number k such that for any string w ∈ A with at least k letters, w can be written as a concatenation uvxyz such that (1) v and y are not both empty, and (2) for all numbers i, the string uv^i x y^i z is in A.

P15.3.4

Examine the behavior of the CKY algorithm when the input grammar is a linear grammar. What are the approximate time and space requirements³²?

P15.3.5

Using the CFL Pumping Lemma from Problem 15.3.3, or directly, prove that the language {a^m b^n a^m b^n : m,n > 0} is not context-free.

P15.3.6

Let EQ3 be the set of strings over {a,b,c} with an equal number of a's, b's, and c's. We will show in Section 15.5 that EQ3 is not context-free. Show that EQ3 does satisfy the conclusion of the CFLPL.

P15.3.7

Show that the following languages are not context-free:

(a) {wcw : w ∈ {a,b}*}

(b) {ww : w ∈ {a,b}*}

P15.3.8

Show that we can strengthen the conclusion of the CFLPL by saying in addition that |vxy| ≤ k.

P15.3.9

Let L be a context-free language over the alphabet {a}. Prove that there exists a natural k such that for all i > k, if a^i ∈ L, then there exists a positive natural j ≤ k such that for all naturals t, a^{i+jt} ∈ L. (Hint: Use the CFLPL with the added provision in Problem 15.3.8.)

P15.3.10

Use the result of Problem 15.3.9 to show that any context-free language over a one-letter alphabet is almost periodic, and thus also regular. (Hint: Show that a^i ∈ L ⇒ a^{i+k!} ∈ L. Then argue that for every congruence class modulo k!, either L contains no elements whose length is in the class, or there is a shortest such element.)

³²In the following Excursion we define Chomsky Normal Form and ask the same question for Chomsky Normal Form grammars.
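For what it's worth, the first step of P15.3.10's hint follows directly from P15.3.9 as stated above: the period j satisfies 1 ≤ j ≤ k, so j divides k!, and we may pump exactly k!/j times:

```latex
% From P15.3.9: a^i \in L (with i > k) gives some j with 1 \le j \le k such that
% a^{i+jt} \in L for every natural t. Taking t = k!/j (a natural, since j \mid k!):
a^i \in L \;\Longrightarrow\; a^{\,i + j\cdot(k!/j)} = a^{\,i+k!} \in L .
```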

15.4 Excursion: Chomsky Normal Form

If we want to prove something about all context-free grammars, we have to deal with the full range of possible grammars and particularly of possible rules in those grammars. Life would be much easier if we could assume that we were given a grammar in some special normal form, so that it only had certain types of rules. In this Excursion we're going to look at one such normal form invented by Chomsky. Any grammar can be converted to an equivalent grammar³³ that only has rules of certain types:

• There may be a rule S → λ, where S is the start symbol.

• There may be rules of the form A → a, where (as usual) the capital letter denotes a non-terminal and the small letter a terminal.

• There may be rules of the form A → BC, where A, B, and C are non-terminals (possibly equal to each other).

Converting a grammar to this Chomsky Normal Form will usually require adding more non-terminals and more rules, thus making the particular grammar harder to understand. But the class of Chomsky Normal Form grammars is easier to work with. For example, we know that a parse tree in a CNF grammar has two children for each of its nodes, except for the leaves (no children) and the leaves' parents (one child). We don't have to worry about a symbol deriving itself without deriving some other terminals as well, a possibility that complicated our analysis of pumping. And when we apply the CKY algorithm to a CNF grammar, our task is simpler because we know that each rule divides a substring into just two smaller substrings, not more.

So given an arbitrary context-free grammar G, how do we convert it to a CNF grammar H? First, notice that the second and third types of rules could never produce the string λ. (This is easy to prove by induction: the start symbol is a nonempty string and no rule could change a nonempty string into an empty one.) So our first step is to:

• Determine which non-terminals can possibly derive λ in G; call these the erasable non-terminals. If the start symbol S is erasable, add the rule S → λ to H. Remove all other rules with λ on the right-hand side. But for every rule that has any erasable non-terminals on the right-hand side, add new rules with every possible combination of these erased. For example, if B and C are erasable and A → ABaCA is a rule, add the additional rules A → AaCA, A → ABaA, and A → AaA.

How do we tell which non-terminals are erasable? We make a list, starting with all non-terminals A such that A → λ is a rule of G. We then check whether any other non-terminal B appears in a rule B → C_1...C_k where C_1,...,C_k are each already on the list. If so, we add B to the list and repeat this check with the expanded list. When this process stops, we have exactly the set of erasable non-terminals.

³³Two grammars are equivalent if they generate exactly the same language of strings of terminals.
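The list-growing process just described is a simple fixed-point loop. A sketch in Java, with each rule represented as a left-hand side and a list of right-hand-side symbols (λ as an empty list; the representation is illustrative):

```java
import java.util.*;

public class Erasable {
    // One rule lhs -> rhs; the rule A -> λ has an empty rhs list.
    record Rule(String lhs, List<String> rhs) {}

    static Set<String> erasable(List<Rule> rules) {
        Set<String> list = new HashSet<>();
        boolean changed = true;
        while (changed) {              // repeat until no new non-terminal is added
            changed = false;
            for (Rule r : rules)
                // A -> λ has an empty rhs, so containsAll is vacuously true.
                // Terminals never enter the list, so a rule with a terminal
                // on its right-hand side can never fire.
                if (!list.contains(r.lhs()) && list.containsAll(r.rhs())) {
                    list.add(r.lhs());
                    changed = true;
                }
        }
        return list;
    }
}
```

Each pass either adds a non-terminal or stops the loop, so the process terminates after at most one pass per non-terminal, just as the text's argument requires.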

The next step is to get rid of long rules, with more than two letters on the right-hand side. We do this by introducing new non-terminals for each such rule, and a series of short rules that can be used only to simulate the long rule. For example, if we have a rule A → BaCA, we can add two new non-terminals X and Y and have the rules A → BX, X → aY, and Y → CA. Applying these three rules in succession to A yields BaCA, and since X and Y will never appear in any other rules, there is no other way to use these rules to derive anything else. In general:

• For every rule A → B_1...B_k with k > 2, introduce new non-terminals C_1,...,C_{k−2} and new rules A → B_1C_1, C_1 → B_2C_2, ..., C_{k−2} → B_{k−1}B_k. Then delete the original long rule.

Next we make sure that the only rules with terminals on the right are of the desired type A → b. For this we:

• For each terminal a, create a new non-terminal N_a. Add the rule N_a → a. Replace any a's on the right-hand side of other rules by N_a's.

Now all our rules are of the type A → a, A → BC, or A → B, and our only remaining problem is to get rid of the rules of this last type. To do this we first find the transitive closure of all such rules, that is:

• If A → B and B → C are rules, add the rule A → C. Continue this process as long as possible.

(We can organize this process using any of the transitive closure algorithms from Chapter 8.) Once we have a transitively closed set of A → B rules, we must add in new rules to simulate all possible ways in which these A → B rules could be used. In particular:

• If A → B is in the transitive closure, and B → c is a rule, add the rule A → c.

• If A → B is in the transitive closure, and B → CD is a rule, add the rule A → CD.

• If A → B is in the transitive closure, and C → AD is a rule, add the rule C → BD.

• If A → B is in the transitive closure, and C → DA is a rule, add the rule C → DB.

Once all these rules have been added, delete all rules of the form A → B.

We need to be sure that at each stage of the transformation of the grammar, we do not change the language of strings generated by it. Whenever we add a rule in this process, the new rule implements a rewriting operation that was already possible using other rules, so no new strings are added to the language. Whenever we delete a rule, we must be sure that no previously derivable string becomes underivable because of the deletion.

How could, for example, the rule A → B be used to derive a string? In any such derivation, the B must be expanded after the rule is used. And if A → B represents the entire sequence of rules of this form that is used in the derivation, the expansion of the B must use a rule of the form B → CD. But then we know that we can go from the A to the CD using a single new rule, added at the previous stage of the process. So the string eventually derived remains derivable and the language of the grammar does not change. The other deletions of rules can be justified similarly.

15.4.1 Writing Exercise

Two choices:

1. Carry out the construction to produce a Chomsky Normal Form grammar equivalent to the following:

S → DbCc
S → λ
C → ba
D → Cab
D → S

2. Examine the behavior of the CKY algorithm when the input grammar is in Chomsky Normal Form. What are the time and space requirements? Write pseudocode for this algorithm, being casual if necessary about data types and so forth.

Figure 15-11:

15.5 CFL's and Pushdown Automata

15.5.1 Testing Membership With a Stack

Figure 15-10: Activation records on a stack

Every method call in a language like Java involves the use of a stack. The calling procedure is suspended until the called procedure returns, and the computer stores the information, called an activation record, that will be needed to restart it. If method A calls method B, which then calls method C, the activation record for B is stored "on top of" the record for A; B must be restarted before A is. The last item stored must be the first item retrieved, requiring us to store the activation records in a stack (see Figure 15-10).

How powerful is a stack as a computing tool? In this section we'll define an abstract computer called a pushdown automaton (PDA) that is essentially an NFA enhanced by adding a stack. Since the memory available to this machine is no longer limited to a fixed number of states, it shouldn't be surprising that PDA's are more powerful than NFA's and can recognize non-regular languages. In fact we'll prove that a PDA can decide a language if and only if it is definable by a context-free grammar. So once again a means of defining languages (grammars) coincides exactly with a means of deciding them. Let's begin with our simple parenthesis grammar:

• S → lSr,
• S → SS, and
• S → λ.

In a data structures course, you may have seen some version of the following recursive algorithm that decides whether a string is in the language generated by this grammar. When the algorithm sees an l, it puts an activation record onto the stack and calls itself again. When it sees an r, as long as the stack is not empty³⁴, it knows that the r matches with an l previously seen. So it terminates the current version and returns to its calling method, thus taking an activation record off the stack.

Figure 15-12: The actions of the PDA on the string llrlrr

Let's look at this computation another way, so that we can see how to generalize it to other context-free grammars. The idea will be to use the stack to record what is left to be derived. At the beginning of the computation, we will put an S on the stack because we are looking for a string that can be derived from the initial symbol S. Our first move will be to take the S off and put on the string lSr. (Note that we are writing the stack contents with the top character on the left, so that the l is now the top-of-stack character.) This reflects the fact that we have to get an l, followed by a string derivable from S, and an r to finish. At the moment, we have successfully derived a terminal that matches the next terminal in our input string llrlrr, so we'll read the l from the input and take the corresponding l off the stack. (You can follow this process in Figure 15-11.)

Our next move³⁵ is to replace the S on the top of the stack with the string SS, so the stack contents are now SSr. Then we replace the top S by lSr, match an l from the input, replace the top S by λ, and match an r from the input. We now have Sr on the stack and lrr remaining in the input. Once again we change the top S to lSr, match an l, replace the S by λ, match an r, and match an r again. The "jobs to do" on the stack have been matched exactly with the input string. In fact, the rules of the grammar have been used to derive an exact copy of the input string, so we have proven that the input string was derivable in this grammar. We'll see the general form of this construction below, but first let's formalize our model of an automaton with a stack.

³⁴Of course if the stack is empty, this r has no matching l and we reject the string.

³⁵How do we decide to make this particular move, rather than another? It's not obvious, but we'll say more about this below, in Exercise 15.5.4, and in Problem 15.5.4.
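The recursive check described at the start of this subsection can also be written iteratively, with a counter playing the role of the stack of activation records (a sketch; l and r are the two parenthesis symbols):

```java
public class Paren {
    // Decide membership in the parenthesis language: 'depth' counts the
    // activation records that the recursive version would have on its stack.
    static boolean paren(String w) {
        int depth = 0;                         // unmatched l's seen so far
        for (char c : w.toCharArray()) {
            if (c == 'l') depth++;             // an l starts a new recursive call
            else if (depth == 0) return false; // an r with an empty stack: reject
            else depth--;                      // an r matches the most recent l
        }
        return depth == 0;                     // accept only if every l was matched
    }
}
```

Note that the empty string is accepted, matching the rule S → λ in the grammar.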

15.5.2 Definition of Pushdown Automata

We define a pushdown automaton by specifying several components:

• An input alphabet Σ,
• A stack alphabet Γ,
• A state set Q, with a start state ι and a set F of final states, and
• A set of transitions Δ. A transition is a five-tuple (p, u, v; q, w) from the set Q × Γ* × Σ* × Q × Γ*, meaning "if in state p you can pop u from the stack and read v from the input, you may do so, enter state q, and push w onto the stack".

If M is a pushdown automaton, we define the language L(M) to be the set of strings w ∈ Σ* such that M can start in state ι with an empty stack, carry out a sequence of transitions during which it reads w from the input, and finish with an empty stack and³⁶ in a state from F.

Note that this definition is inherently nondeterministic; it talks only about what the machine might do. In practice, of course, we would like a parsing algorithm to be deterministic. But the general correspondence between stack computations and context-free grammars only holds true for this nondeterministic model. It is possible to define deterministic pushdown automata, and there is an extensive body of work dealing with when a language given by a grammar can be decided by a deterministic pushdown automaton. But this area is beyond the scope of this book.

15.5.3 The Top-Down Parser

Given this definition, it is not hard to generalize our example above to prove the following:

Parsing Theorem: Any context-free language is equal to L(M) for some pushdown automaton M.

Proof: Given any context-free grammar G, we will build a PDA M for G, called the top-down parser, such that L(G) = L(M). M will have three states, ι, p, and f, with f the unique final state.

The stack alphabet will consist of all the terminals and non-terminals of G, and of course the input alphabet will be Σ, the terminals of G. M will have transitions of three kinds:

• (ι, λ, λ; p, S), to push an S onto the stack,

• (p, A, λ; p, u) for each rule A → u of G, to replace a non-terminal on the stack with a string according to a rule,

• (p, a, a; p, λ), for each letter a in Σ, to pop a terminal off the stack and read an input letter to match it, and

• (p, λ, λ; f, λ), to "declare victory" at any time³⁷.

³⁶One could also vary the definition to allow the PDA to have just an empty stack, or just be in a final state, to accept a string. It turns out that most of the important results are not affected by this choice, so we will choose a single most convenient definition in our brief treatment of the subject.

Now we must prove that M can accept a string w if and only if w ∈ L(G). Suppose first that M can accept w, and fix a particular accepting computation of M. For each time step in this accepting computation, we define a current string as follows. The current string consists of all the input letters that have been read, in order, followed by the contents of the stack, top letter first. After the first step of M, the current string is exactly S, because no input has been read and S is on the stack. At the end of the computation the current string is exactly w, because w has been read and the stack is empty. How does the current string change as M goes through its computation? When M pops a non-terminal from the stack and replaces it with the right-hand side of a rule, the current string changes in the same way, so a rule of G has been applied to it. When M pops a terminal and reads a letter, the current string does not change at all, because the same letter that leaves the stack is added to the string of letters read. So the sequence of current strings is exactly a derivation in G of w, and thus w ∈ L(G).

To complete the proof, we must show that if w ∈ L(G), then w can be accepted by M. Here all we need to do is consider a leftmost derivation of w in G, as we defined in Section 15.3 above. Since only the leftmost non-terminal is expanded in a leftmost derivation, we can have M simulate this derivation move by move. At each step of the derivation, there is a string consisting of zero or more terminals, the leftmost non-terminal, and then the rest of the string. We can ensure that at each time step M has read the initial terminals from the input, has the leftmost non-terminal at the top of the stack, and has the rest of the current string on the stack. Thus the "current string" (letters read followed by stack contents) is exactly the current string in the derivation, and M can continue until the current string is w and exactly w has been read from the input. ∎

In the real world, parsing is an important problem. A compiler, for example, has to determine whether its input is a valid string in a formal language, often given by a context-free grammar³⁸. Practical parsing algorithms must be deterministic, of course, but the two starting points for constructing the real ones are the top-down parser described above and a similar PDA called the bottom-up parser. The latter is described in Problem 15.5.1.

15.5.4 Simulating a PDA With a Grammar

So PDA's have at least as much power to define languages as do context-free grammars. A natural theoretical question is whether they have the same power, or more. Is the language of any PDA a context-free language? We'll now prove that it is; this is not a practically useful construction (like the top-down and bottom-up parsers) but does serve to tell us that these two classes of languages are the same.

PDA Simulation Theorem: If M is any PDA, then L(M) is a context-free language.

³⁷We could omit this rule and make p a final state instead.

³⁸Or in a more-or-less equivalent notation called Backus-Naur Form.

P2.3.10, continued:

(b) Write the negation of this statement (so that it means that the set is not dense).

(c) Argue that the set of rational numbers is dense.

(d) Argue (using the statement of part (b)) that the set of naturals is not dense.

Proof: We will take an arbitrary PDA M and construct a context-free grammar G such that L(G) = L(M). First, though, we will transform M into a particular normal form that will make the argument simpler. Our M will have the following two properties:

• It has exactly one final state f, and

• Every transition either pushes or pops exactly one character, but not both. That is, every transition is of either the form (p, a, u; q, λ) or (p, λ, u; q, a) for some a ∈ Γ and some u ∈ Σ*.

It is not hard to prove that given any PDA M, there is an equivalent PDA M' (i.e., L(M) = L(M')) such that M' has these two properties; see Exercise 15.5.5. For the rest of the proof we will assume that this has been done, and that the normal-form PDA has been renamed M.

The grammar G will have one non-terminal for every pair of states of M. So if p and q are states (possibly equal) we have a non-terminal V_pq. The goal of the construction will be to have a string w derivable from V_pq in the grammar if and only if M, started in state p with an empty stack, can possibly read w and finish in state q, also with an empty stack. If p is the start state ι and q is the final state f, this is exactly the definition of a string being in L(M), so our start symbol will be V_ιf. All that's left is to define the rules of G, which will be of three types.

First, for any state p we have the rule V_{p,p} → λ, reflecting the fact that we can go from a state to itself by doing nothing. Secondly, if p, q, and r are any three states of M, we have the rule V_{p,r} → V_{p,q}V_{q,r}. This reflects the fact that if we can go from p to q reading string u, and then go from q to r reading string v, we have just gone from p to r reading uv, and uv must be in the language of V_{p,r}.

The third set of rules, unfortunately, is more complicated. There is one rule in this set for every pair of transitions of M where the first transition pushes a symbol a onto the stack and the second pops the same symbol a. Without loss of generality, we can call these transitions (p, λ, u; q, a) and (r, a, v; s, λ), where p, q, r, and s are states of M, u and v are strings in Σ*, and a is a letter in Γ. The rule arising from these two transitions is

V_{p,s} → u V_{q,r} v,

reflecting the following possible sequence of events:

1. M starts in state p.
2. M applies the first transition to push a and read u, ending in state q.
3. M goes through some sequence of transitions during which it reads some word w in the language of V_{q,r}, and ends with the a (pushed in step 2) alone on the stack. This a is not touched during this step.
4. M applies the second transition to pop the a and read v, ending in state s.
5. So M has gone from p and an empty stack to s and an empty stack, and has read uwv, where w is in the language of V_{q,r}.

⁸⁹ In Excursion 15.4 we explored a normal form for grammars that made certain proofs easier.
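The three rule types can be generated mechanically from a normal-form PDA. The Python sketch below is our own; the 5-tuple transition encoding (state, pop, read, state, push) and the V[p,q] spelling of the nonterminals are assumptions of the sketch, not the text's notation.

```python
from itertools import product

def pda_to_grammar(states, transitions):
    """transitions are 5-tuples (state, pop, read, state, push), where exactly
    one of pop/push is the empty string "" (the book's normal form)."""
    V = lambda p, q: f"V[{p},{q}]"
    rules = []
    for p in states:                              # type 1: V[p,p] -> (empty)
        rules.append((V(p, p), ""))
    for p, q, r in product(states, repeat=3):     # type 2: V[p,r] -> V[p,q] V[q,r]
        rules.append((V(p, r), V(p, q) + " " + V(q, r)))
    for (p, _, u, q, push) in transitions:        # type 3: matching push/pop pairs
        if not push:
            continue
        for (r, pop, v, s, push2) in transitions:
            if not push2 and pop == push:         # first pushes a, second pops it
                rules.append((V(p, s), (u + " " + V(q, r) + " " + v).strip()))
    return rules
```

For a PDA with n states this produces n type-1 rules and n³ type-2 rules, plus one type-3 rule per matching push/pop pair of transitions, which is the kind of count asked for in Problem 15.5.3.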

To prove that this construction is correct we need to show that L(M) = L(G). The easier direction is L(G) ⊆ L(M). If w ∈ L(G), there is a derivation of w in G from the start symbol V_{ι,f}. Following our construction above, we can build a sequence of transitions of M during which M accepts w, adding two transitions to the sequence whenever we apply a rule of the third kind.

To prove that L(M) ⊆ L(G), we need to show that the rules of the grammar capture all possible computations of the PDA. It will be enough to prove the following:

Lemma: If M can go from state p and empty stack to state s and empty stack, while reading a string u, then V_{p,s} can generate u in G.

Proof: If M does this, it undergoes some sequence of zero or more transitions. We will prove by induction that the Lemma holds for all valid sequences of transitions.

• Base case: The sequence of transitions is empty. This is possible only if p = s, in which case u = λ and V_{p,s} → λ is a rule of G.

• Induction case 1: M empties its stack during the computation, in some state r. Then our sequence is the concatenation of two shorter sequences, and we can produce it using the rule V_{p,s} → V_{p,r}V_{r,s} and the inductive hypothesis.

• Induction case 2: M does not empty its stack during the computation. Then the letter a pushed by the first transition (p, λ, u′; q, a) is the same letter popped by the last transition (r, a, v; s, λ), and the sequence between these two is produced by the symbol V_{q,r}. So we use the rule V_{p,s} → u′V_{q,r}v and the inductive hypothesis. ∎

We are done, because a string u is in L(M) if and only if M can go from ι and empty stack to f and empty stack while reading u. By the Lemma this condition implies that V_{ι,f}, the start symbol of G, can generate u. So if u ∈ L(M), then u ∈ L(G). ∎

15.5.5 Exercises

E15.5.1

Consider the PDA with alphabet {a, b}, state set {ι, f}, start state ι, final state set {f}, and the following transitions: (ι, λ, a; ι, a), (ι, λ, b; f, b), (f, b, λ; f, λ), and (f, a, a; f, λ). Show that this PDA can accept the string aabaa. What is the language of this PDA?

E15.5.2

Design a PDA whose language is {aⁿb²ⁿ : n ≥ 0}. Illustrate an accepting computation of your PDA on the string aabbbb. Argue that your PDA accepts exactly the strings in this language.

E15.5.3 Describe the top-down parser corresponding to the parenthesis grammar given in this section. In particular, list all of its transitions. Describe, step by step, an accepting computation of this parser on the input string lrllrlllrrlrrr.


E15.5.4 Consider the grammar with just two rules, S → lSrS and S → λ. Prove that this grammar generates the parenthesis language. Describe the computation of the top-down parser for this grammar on the input string lrllrlllrrlrrr. This grammar, unlike the other grammar for this language, has the property that the PDA can determine which rule to apply next based on the next input letter and the top character of the stack. Explain why this is true.

E15.5.5 Let M be an arbitrary PDA. Explain how to construct a PDA M′, with L(M) = L(M′), such that M′ has exactly one final state and every transition of M′ either pushes or pops a single letter, but not both.

E15.5.6

Is the language {aⁱbʲcᵏ : i ≥ j ≥ k ≥ 0} context-free? Prove your answer.

E15.5.7 Let f be the string homomorphism from {a, b}* to itself defined by the rules f(a) = ab and f(b) = ba. For each of the following two languages, either prove that it is context-free or prove that it is not.

(a) Y = {wf(w) : w ∈ {a, b}*}
(b) Z = {w(f(w))ᴿ : w ∈ {a, b}*}

E15.5.8 If L is not context-free and R is regular, can we conclude that L ∩ R is not context-free? Prove your answer. Compare your answer to the result of Problem 15.5.5.

E15.5.9 Let Σ = {0, 1}. Let X be the set of strings w such that there exist a natural k and a string y such that w = 1ᵏ0y and y has at most k ones.

(a) Describe a grammar G such that L(G) = X.
(b) Describe a PDA M such that L(M) = X. You need not use the construction on your grammar, and you may describe the transitions of M informally.

E15.5.10

The PDA M has state set {i, p, f} with start state i and only final state f. Its input alphabet is {a, b} and its stack alphabet is {a, c}. Its transitions are (i, λ, a; p, c), (p, λ, a; p, a), (p, a, b; p, λ), and (p, c, b; f, λ).

(a) Verify that M satisfies the normal form rules for our construction of an equivalent grammar.
(b) Determine the language L(M) by directly analyzing the possible behavior of M.
(c) Describe the grammar obtained from M by our construction, indicating all the rules but identifying the relatively few rules that are needed to actually derive strings in the language.
(d) Argue that the language of your grammar is exactly L(M).

15.5.6 Problems

P15.5.1 Along with the top-down parser, we can also define a bottom-up parser. This operates by moving terminals from the input onto the stack, then applying rules backward to the stack until finally a start symbol is generated. So we have a rule (p, λ, a; p, a) for every letter a in Σ, and a rule (p, vᴿ, λ; p, A) for every rule A → v in the grammar. Give the complete description of a PDA M with these rules, such that L(M) = L(G). Prove that in fact L(M) = L(G) for your PDA. (Hint: You want to prove that the computation of M can be interpreted as a time-reversed derivation of G, and vice versa.)

P15.5.2

Carry out the given construction to make a grammar for the language of the PDA specified in Exercise 15.5.1. (Note that this PDA satisfies the rules about one final state and pushing or popping exactly one letter.) You should get exactly twelve rules in your grammar. Show how most of these rules can be eliminated without changing the language of the grammar.

P15.5.3

Suppose that M is a PDA with six states and stack alphabet {a, b}. It has eleven transitions in all: two push a, three pop a, five push b, and one pops b. If we create a grammar equivalent to this PDA by the construction in this section, how many rules result? (Ignore the possibility that the construction might produce the same rule in two different ways; how could this happen?)

P15.5.4

Suppose that a grammar G has the following two properties: (1) every right-hand side of a rule begins with a terminal, and (2) no two rules have both the same left-hand side and the same first letter of the right-hand side. Explain how to simulate the top-down parser deterministically.

P15.5.5

Let A be a context-free language and B be a regular language. Prove that A ∩ B is context-free. (Hint: Define a PDA M such that A = L(M), using our main theorem of this section. Define a DFA D such that B = L(D). Then use M and D to define a PDA N such that A ∩ B = L(N).)

P15.5.6

Let EQ₃ be the set of all strings in {a, b, c}* with an equal number of occurrences of each letter. Prove that EQ₃ is not context-free. (Hint: Use the result of Problem 15.5.5 to relate EQ₃ to the language E₃ = {aⁿbⁿcⁿ : n ≥ 0}, which we proved to be not context-free in Section 15.3.)

P15.5.7

Prove that the complement of the language E₃ = {aⁿbⁿcⁿ : n ≥ 0} is context-free. This shows that the context-free languages are not closed under complement.

P15.5.8

Show that the language E₃ = {aⁿbⁿcⁿ : n ≥ 0} is the intersection of two context-free languages, thus proving that the context-free languages are not closed under intersection.

P15.5.9

The equivalence of grammars and pushdown automata can be useful in proving closure properties of the set of context-free languages, since we can take whichever definition of the class of context-free languages is most useful for us. Here is an example. Recall that a string homomorphism is a function f from Σ* to Δ* that satisfies the rules f(λ) = λ and f(uv) = f(u)f(v).

(a) The homomorphic image f(L) of a language L under a homomorphism f is the set {f(w) : w ∈ L}. Prove that if L ⊆ Σ* is context-free and f is any homomorphism from Σ* to Δ*, then f(L) is context-free.
(b) The inverse homomorphic image f⁻¹(L) of a language L under a homomorphism f is the set {w : f(w) ∈ L}. Prove that if L ⊆ Δ* is context-free and f is any homomorphism from Σ* to Δ*, then f⁻¹(L) is context-free.

P15.5.10

Let G be an arbitrary directed graph with vertex set {v₁, ..., vₙ} and nonterminal set {V_{i,j} : 1 ≤ i, j ≤ n}.

So you might change "All babies are illogical" to "There is no baby who is not illogical". There is a similar rule to turn an existential into a negated universal, by changing ∃x: P(x) to ¬∀x: ¬P(x). Handling negations is important when rephrasing statements using the contrapositive rule to reverse implications, as when we change "If 0 = 1, then all trout live in trees" to "If it is not the case that all trout live in trees, then 0 ≠ 1" and then to "If there exists a trout that does not live in trees, then 0 ≠ 1." Negations can also move around inside an English statement and even inside words, so that "There is no baby who is not illogical" could just as easily be "There is no baby who is logical".

2.4.1 Writing Exercises

Here are a few translations, which will help test where you are as far as being able to deal with the symbolic language. For the symbol-to-English translations, make the English as clear and readable as you can. This may involve a second pass, after you've first written something, to get the meaning clear. Unlike above, all these examples are from more-or-less real mathematical text⁶.

⁶ This text contains a number of terms with which you may well not be familiar; the last exercise, for example, involves material from Chapter 8 of this book. But the translation should not depend on the meanings of these terms.


by their index numbers. For example, the instruction "load contents of register 7 into register A" would require the Turing machine to find the seventh register and copy its contents back into the A register. How would it know which register is the seventh? In Problem 15.6.3 you'll figure out how to implement a binary counter on a piece of tape. If the machine can repeatedly (1) move a marker past a register, (2) move back to a counter and decrement it, and (3) find the marker again, it can move the marker past a specified number of registers. Note that the implementations of these simple operations can be terribly ugly; for example, copying involves repeatedly reading one character at the source, moving to the destination, and writing the one character, with as many trips as there are characters to be copied. But our concern here is not efficiency but feasibility, as we just want to prove that each assembly operation can be simulated in principle by a Turing machine.

At the beginning of this chapter we said that a Turing machine was made by augmenting a DFA so that it could move both ways, write, and use unlimited memory. We saw in Section 15.1 that moving both ways alone doesn't allow a two-way DFA to recognize any non-regular languages. What about moving both ways and writing, without the additional memory? A machine with these capabilities is a Turing machine whose total memory is limited to be proportional to the size of the input, a linear-space Turing machine. We know that these machines cannot do everything that a TM with unlimited memory can do, because there are particular problems, solvable with more memory, that can be shown to be impossible for them⁵⁹. However, the {aⁿbⁿ} problem is not one of these difficult ones, because the TM we designed above is itself a linear-space TM: it never used any new tape to the right of its original input.

Linear-space Turing machines were defined in the 1960's as part of the Chomsky hierarchy of languages mentioned in Section 15.2. We'll prove in Problem 15.8.9 that nondeterministic linear-space Turing machines (NDLSTM's) can recognize exactly the same languages as "Type 1" grammars. In 1965 the following problem was posed: is the complement of the language of an NDLSTM necessarily the language of another NDLSTM? Most researchers thought that this was not true in general, but in 1987 Immerman and Szelepcsényi proved that it is⁶⁰.

15.6.5 Exercises

E15.6.1 Why is it important, for defining the action of a Turing machine on an input string, that the blank symbol ☐ is not an element of Σ?

E15.6.2 Indicate how a 2WDFA may be simulated by a Turing machine.

E15.6.3 Let M be a Turing machine with transition function δ. Given a string w, how can you decide whether w has the right form⁶¹ to denote a configuration of M? If w does have the right form, describe how to compute the string denoting the configuration of M after it runs for one step on w.

E15.6.4 Trace the given Turing machine for {aⁿbⁿ : n ≥ 0} on the inputs a, aabba, and abb.

E15.6.5 Build a Turing machine M_erase that, when started on input w (i.e., in configuration ιw), halts with the tape entirely blank. Give a complete state table for M_erase.

E15.6.6 Give (by state table) a Turing machine that halts on an input w ∈ {a, b, c}* if and only if w is of the form ucu for some string u ∈ {a, b}*. You should be able to adapt the main example of this section. (This example proves that the language of a Turing machine need not be context-free.)

⁵⁹ Such a proof is beyond the scope of this book; it is an example of the "space hierarchy theorem" to which we'll refer briefly in Section 15.11.
⁶⁰ So Type 1 languages are "closed under complement", as are Type 3 (regular) languages. But Type 0 and Type 2 (context-free) languages are not; we saw this in Problem 15.5.7 for Type 2 and we'll see it in Section 15.8 for Type 0.
⁶¹ To tell whether w could actually arise as a configuration of M is literally impossible, as we'll see in Section 15.10. But a string must follow certain rules to be a configuration of any Turing machine with a given alphabet and state set.

E15.6.7 Describe (at a high level) a Turing machine that halts on a string of the form aⁱbʲ if and only if i divides j. Don't worry about what happens if the input is not in a*b*.

E15.6.8 Let M be a Turing machine that never moves its head to the left. Prove that L(M) is a regular language.

E15.6.9 Suppose we revised the definition of a Turing machine to allow it to stay in place after reading a letter, rather than just move left or right. Describe how to take any Turing machine of this new kind and create one of the original kind that has the same behavior on every input.

E15.6.10 In designing a Turing machine for a particular task, it is often useful to allow it to mark a particular tape cell, without affecting its contents. In our {aⁿbⁿ} example, we found the place on the tape where we had been working by finding particular blank squares. But, for example, the copying machine of Problem 15.6.2 is much simpler with the ability to mark squares. How can we implement this marking with a Turing machine as we have defined it?

15.6.6 Problems

P15.6.1 Build a Turing machine M_shift that, when started on input w (i.e., in configuration ιw), halts with tape contents ☐w. Let Σ = {a, b}.

P15.6.2 Build a Turing machine M_copy that, when started on input w (i.e., in configuration ιw), halts with the tape contents ww. You need not give the entire state table as long as your description is clear and complete. Let Σ = {a, b}.

P15.6.3 Build a Turing machine M_count to implement a binary counter as follows. When started on a blank tape, M_count will write the string 1 on the tape, change it to 10, change that to 11, and so forth, writing each binary number in order on the tape. Note that M_count never halts. (In general we would use a suitably modified copy of M_count as a subprocedure in another Turing machine.)

P15.6.4 Let M be an arbitrary Turing machine. Indicate how to modify M, without changing its language, so that whenever it halts, it does so on the leftmost cell of its tape. (Hint: Add another symbol to the tape alphabet.)

P15.6.5 Give a high-level description of a Turing machine that performs binary addition. When started with tape contents u#v, where u and v are binary strings in {0, 1}*, it should halt with a single binary string on the tape, denoting the sum of the naturals denoted by u and v.
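The addition of Problem 15.6.5 can be previewed at the level of single digits. In this Python sketch (ours, with a string standing in for the tape) the column-by-column ripple add plays the role of the Turing machine's sweeps between u and v, with the carry held in the "state".

```python
def add_binary(tape):
    """tape is a string u#v of two binary numbers; return their binary sum,
    computed one column at a time from the right, as a TM head would."""
    u, v = tape.split("#")
    i, j, carry, out = len(u) - 1, len(v) - 1, 0, []
    while i >= 0 or j >= 0 or carry:
        s = carry + (int(u[i]) if i >= 0 else 0) + (int(v[j]) if j >= 0 else 0)
        out.append(str(s % 2))      # write the low bit of this column's sum
        carry = s // 2              # remember the carry for the next column
        i, j = i - 1, j - 1
    return "".join(reversed(out)) or "0"
```

For example, add_binary("10#11") computes 2 + 3 and returns "101".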

P15.6.6 Give a high-level description of a Turing machine that has input alphabet {a} (but possibly a larger tape alphabet) and halts on the input aⁿ if and only if n is prime. You may want to refer to the solution of Exercise 15.6.7.

P15.6.7 Most modern computers are based on the Von Neumann architecture, in which the machine's memory is stored in an infinite sequence {r₀, r₁, r₂, ...} of registers, each containing a fixed number of bits called a word. One of the typical actions of a Von Neumann computer involves indirect addressing. For example, it might look in register r₀, interpret the word there as a natural n, then copy the contents of rₙ into r₁. Describe, at a high level, how a Turing machine could carry out this action. (You will need the binary counter from Problem 15.6.3.)

P15.6.8

Describe, at a high level, a Turing machine that takes as input a sequence of one or more naturals, in binary, separated by # symbols, and sorts them in increasing order.

P15.6.9

Describe, at a high level, a Turing machine that takes as input two naturals u and v in binary (as in Problem 15.6.5) and halts with its tape containing the single natural uv in binary.

P15.6.10

A boustrophedonic Turing machine⁶² begins in configuration ι$w$ and has the property that it changes the direction of its tape movement only at the $ endmarkers. That is, it starts moving right, reaches the right endmarker, and starts moving left, possibly shifting the endmarker one space to the right as it changes direction⁶³. Can any ordinary TM be simulated by a boustrophedonic TM? Justify your answer at a high level.

⁶² Boustrophedonic writing, named from the Greek for "ox-plowing", has alternate lines written left to right and right to left.
⁶³ Thus, unlike the sweeping automaton of Problem 15.1.10, it has no limit on how much of its tape it can eventually use.


15.7 Excursion: Unrestricted Grammars

Another form of Church's Thesis says that any method of defining languages that is general and allows unlimited memory will define the same class of "computable languages". We've seen a number of ways to define languages with grammars: rules for rewriting strings until a desired string is obtained. Context-free grammars are not general, because we may replace only one character at a time, and they have limited memory, because the strings along the way to a word w are never longer than w. By relaxing these two restrictions, however, we do get a general model. An unrestricted grammar has rules u → v where u and v can be any strings of terminals and non-terminals, as long as there is at least one non-terminal in u. This additional freedom lets us define languages that are not context-free, as in the following example. Our terminals are a, b, and c, the nonterminals are A, B, C, S (start symbol), T, and U, and the rules are as follows:

S → SABC
BA → AB
CA → AC
CB → BC
SA → aS
S → T
TB → bT
T → U
UC → cU
U → λ

How is it possible to generate a string of terminals with this grammar? The initial S must⁶⁴ change to T, then to U, and finally to λ. Before changing to T it can generate any number of ABC strings. The A's, B's, and C's can be sorted into alphabetical order by the second, third, and fourth rules. Furthermore, unless they are sorted into alphabetical order, they cannot be changed into terminals, because only an S can change A's, only a T can change B's, and only a U can change C's, and the transitions from S to T to U are not reversible. So we can generate a terminal string if and only if it is the sorted version of a string in (ABC)*, that is, a string of the form aⁿbⁿcⁿ for some number n. Thus the language of this grammar is {aⁿbⁿcⁿ : n ≥ 0}, the language that we proved not to be context-free in Section 15.3. So we know that unrestricted grammars are strictly more powerful than context-free grammars.

Just how powerful are these unrestricted grammars? In this Excursion we'll show that they have the same power as nondeterministic Turing machines. (And in Section 15.8, we'll see that these in turn have the same power to recognize languages as do deterministic Turing machines.)

⁶⁴ By induction, there is always at most one nonterminal from the set {S, T, U} in the string.

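Since an unrestricted derivation just rewrites substrings, we can check short strings against this grammar by brute-force search over sentential forms. The Python sketch below is ours; the length bound makes it a sanity check on small inputs, not a proof.

```python
# The ten rules of the grammar above, as (left-hand side, right-hand side) pairs.
RULES = [("S", "SABC"), ("BA", "AB"), ("CA", "AC"), ("CB", "BC"),
         ("SA", "aS"), ("S", "T"), ("TB", "bT"), ("T", "U"),
         ("UC", "cU"), ("U", "")]

def generates(target, max_len=12):
    """Search all sentential forms reachable from S, up to max_len symbols,
    replacing any occurrence of a rule's left-hand side by its right-hand side."""
    seen, frontier = {"S"}, ["S"]
    while frontier:
        form = frontier.pop()
        if form == target:
            return True
        for lhs, rhs in RULES:
            start = form.find(lhs)
            while start != -1:                # rewrite at every position lhs occurs
                new = form[:start] + rhs + form[start + len(lhs):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    frontier.append(new)
                start = form.find(lhs, start + 1)
    return False
```

As expected, generates("aabbcc") succeeds while generates("acb") fails, matching the analysis above.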

Suppose that G = (Σ, V, S, R) is an unrestricted grammar. It's not hard to see that we can design an NDTM N (with an appropriately expanded tape alphabet) that does the following on input string w ∈ Σ*:

• Writes a marker $ and an S after w on the tape.
• Carries out a derivation of G on the portion of the tape to the right of the $. To do this it must guess a location, read a string there that matches the left-hand side of a rule, erase it, and insert the right-hand side of the same rule.
• Verifies that the strings on either side of the $ are equal, and if so accepts.

If w ∈ L(G), it is possible for N to generate w to the right of the $, verify that the two w's are the same string, and accept. Conversely, if N ever accepts on input w then it must have generated w to the right of the $, and since it followed the rules we know w ∈ L(G). So L(N) = L(G), and since G was arbitrary we know that every unrestricted grammar may be simulated by an NDTM.

Now suppose N is an arbitrary NDTM, with tape alphabet Γ, state set Q, start state ι, and transition relation Δ ⊆ Q × Γ × Q × Γ × {L, R}. We're going to construct a grammar G such that L(G) = L(N), proving that NDTM's and unrestricted grammars have the same computing power.

It is pretty clear how to simulate the action of N on the tape by grammar-like transformations of a string. If, for example, (q, a, r, b, R) ∈ Δ, we need a rule to change the substring qa (the head in state q looking at an a) to br (the head in state r to the right of the resulting b). If, on the other hand, (q, a, r, b, L) ∈ Δ, then we need to transform cqa to rcb, for any possible tape letter c.
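These two rule schemas can be generated mechanically from the transition relation. A Python sketch of just that step (our own encoding, which assumes one-character names for states and tape symbols so that strings can be concatenated directly):

```python
def tm_to_rules(delta, tape_alphabet):
    """delta holds tuples (q, a, r, b, d): in state q reading a, write b,
    enter state r, and move in direction d ('L' or 'R')."""
    rules = []
    for (q, a, r, b, d) in delta:
        if d == "R":
            rules.append((q + a, b + r))             # qa -> br
        else:
            for c in tape_alphabet:                  # cqa -> rcb, for every c
                rules.append((c + q + a, r + c + b))
    return rules
```

Note that each left-moving transition yields one rule per tape letter c, since the symbol to the head's left is not known in advance.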

So we want G to generate a string w if and only if N accepts w. We can quickly see that we have a problem. While N starts with w and computes until perhaps it accepts, G must start with the single symbol S and be able to generate w from nothing.

15.7.1 Writing Exercise

Describe how to implement one of the following two solutions to the problem above, and thus build G such that L(N) = L(G).

1. Design an unrestricted grammar H such that L(H) = {w#ιw : w ∈ Σ*}. Design G so that it generates a string in L(H) and then simulates N as above on the second copy of w. If the simulated N reaches the accept state, G can erase everything after the # and thus generate w. We can make # a nonterminal of G (though it is a terminal of H), so if N never accepts, G can't erase the # and thus can't generate a string. Argue that given this construction, L(G) = L(N).

2. Design G so that transformations of G on a string simulate the action of N on its tape backwards. That is, a rule of G can take a string representing a configuration c to a string representing another configuration d if and only if it is possible for N to move from d to c in one move. G will start with the final configuration of N and end with the start configuration of N on w. We need two special tricks: (1) we must modify N so that we know its exact configuration when it finishes (for example, it might erase its tape so the final configuration is just q_accept), and (2) we need rules to go from the start configuration ιw to the string w, carefully designed so that they can't be applied to generate any other strings.


Figure 15-15: Flowcharts for a recognizer and a decider

15.8 Turing Machine Semantics

15.8.1 Recognizable and Decidable Languages

So far we've defined the language of a Turing machine M to be the set of all strings w such that M eventually halts when started on w. You may have noticed a problem with this definition. If we want to know whether w is in L(M), we would naturally try to find out by starting M up. If it halts, we know that w ∈ L(M). But what happens otherwise? M keeps on going forever, and we never get our answer. Furthermore, we can't even look at M in general and decide that it is in a loop, because there's no reason it can't keep writing on more and more of its tape, never entering the same configuration twice.

If a language L is equal to L(M) for some Turing machine M, then, we call it Turing recognizable⁵⁵. This refers to the fact that M can recognize the strings in L by halting on them. (See Figure 15-14.) What would be more useful is a machine that decides L, so that when started on any string w, the machine is guaranteed to halt and leave the answer "yes" or "no" on its tape, answering whether w ∈ L. (See Figure 15-15.) If there exists a Turing machine that decides L in this way, we say that L is Turing decidable⁵⁶. Applying the Church-Turing Thesis, then, the Turing decidable languages are those whose decision problem is solvable by an algorithm, where we don't place any constraints on the algorithm's resources.

Even though it might seem hard to prove anything about arbitrary Turing machines, it's pretty easy to establish a relationship between Turing recognizable and Turing decidable languages:

Proposition: Any Turing decidable language is Turing recognizable.

Proof: If L is Turing decidable, there must be a Turing machine M that decides L. To make a new Turing machine N that recognizes L, we first have N simulate M. Then N looks at the answer it has just written on the tape. If it is "yes", N halts. If it is "no", N goes into a special state where it loops forever (for example, we could have it move right and stay in the same state no matter what letter it sees). (See Figure 15-16.) So N halts on input w if and only if M says "yes" on w, which is true if and only if w ∈ L. Thus L = L(N) and L is Turing recognizable. ∎

⁵⁵ Turing recognizable sets are also sometimes called semirecursive, recursively enumerable, or Turing acceptable.
⁵⁶ Also called recursive.

Figure 15-16: A recognizer built from a decider

TD/TR Theorem: A language L is Turing decidable if and only if both L and its complement L̄ are Turing recognizable.

Proof: There are two halves to the proof. For the easy half, assume that L is Turing decidable. The Proposition above tells us that L is Turing recognizable, so we only have to worry about L̄. But it is easy to see that L̄ is Turing decidable (and hence, by the Proposition, Turing recognizable) as well. Consider a machine M that decides L. To get a machine N that decides L̄, we have N simulate M and then look at the answer it has written. If it finds the word "yes", N erases it, writes "no", and halts. If it finds the word "no", it erases it, writes "yes", and halts. So on any input w, N gives an answer that is the negation of M's answer. (See Figure 15-17.) It thus correctly answers whether w ∈ L̄, and so it decides L̄, and L̄ is Turing decidable.
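The Proposition's recognizer-from-a-decider construction is a one-liner if we let a Python function stand in for a Turing machine (a sketch under that stand-in, with names of our own choosing, not a claim about real machines):

```python
def recognizer_from_decider(decides):
    """Wrap a total decision procedure into a machine that halts exactly
    on the members of L and runs forever on everything else."""
    def recognize(w):
        if decides(w):
            return "halt"          # w is in L: halt
        while True:                # w is not in L: loop forever
            pass
    return recognize
```

Note that, just as in the proof, calling the wrapped machine on a non-member never returns; that is the point of the construction, not a bug.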

For the other half, assume that both L and its complement are Turing recognizable, so that there are Turing machines M and N such that L(M) = L and L(N) is the complement of L. We need to define a new Turing machine D that decides L.

The main idea is for D to simulate M and N in parallel, each starting with input w. If it can do this successfully, all it needs to do is to wait for either M or N to halt. If M halts, D erases the tape and writes "yes"; if N halts, D erases the tape and writes "no". Because either w ∈ L(M) or w ∈ L(N) is true, but not both, we know that D will eventually halt on any input and give the correct answer.

But how do we make D simulate the two machines in parallel? Here is one possible implementation. When started on input w, D first makes a new copy of w to the right of the original, separated by a punctuation mark #. It also leaves a marker @ at the original head position in each copy, so the contents of the tape are @w#@w. Then it proceeds to alternately implement a step of M on the left copy and a step of N on the right copy. At each point, it remembers which state of M and which state of N it is currently in.

Figure 15-17: A decider for L

Figure 15-18: Running two Turing machines in parallel

Implementing a step of M, for example, means moving to the @ in the left string, reading the character to the right of it, determining what to do, printing the new character, moving the @ left or right, and remembering the new state. (See Figure 15-17.) Implementing a step of N means the same operations on the string to the right, using the δ function of N to decide what to do. This is totally straightforward with one exception: the case where the left string's @ is at the very right of the left string, next to the #. In this case, if the head of M is supposed to move right, we need to make room for it by adding the new blank character that should be found on M's tape in the previously untouched square. So D invokes a procedure that shifts the # and the entire right string one square to the right, and inserts a blank in the newly opened space. Turing machine procedures to shift a string were designed in the Problems of Section 15.6. □
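The control logic of D can be sketched in ordinary code. The following Python sketch (mine, not the text's) models each recognizer abstractly as a generator that yields once per step and finishes exactly when the machine halts; D simply alternates single steps until one of them finishes:

```python
# Sketch of the decider D: run recognizers M and N in lockstep, one step each
# per round, until one halts. As in the proof, we assume exactly one of M and
# N halts on any input w (their languages are complementary). The "machines"
# are stand-ins: any iterator that is exhausted exactly when the
# corresponding machine halts.

def decide(m_steps, n_steps):
    """Return "yes" if M halts first, "no" if N halts first."""
    while True:
        try:
            next(m_steps)           # one step of M on the left copy
        except StopIteration:
            return "yes"            # M halted: w is in L
        try:
            next(n_steps)           # one step of N on the right copy
        except StopIteration:
            return "no"             # N halted: w is not in L

# Toy example: "M" halts after 3 steps, "N" would run forever.
def halts_after(k):
    for _ in range(k):
        yield

def runs_forever():
    while True:
        yield

print(decide(halts_after(3), runs_forever()))   # prints yes
```

Because D only ever runs each machine one step at a time, it never gets stuck inside a simulation that does not halt, which is the whole point of the parallel construction.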

15.8.2 Nondeterministic and Multitape Machines

What about nondeterministic Turing machines? If N is an NDTM, it's natural to define L(N) to be the set of strings w such that it is possible for N to halt on input w. So we can define a language to be NDTM recognizable if it is equal to L(N) for some NDTM N. Fortunately, though, we don't have to worry very much about this new concept, because it coincides with one we've already seen:

Theorem: A language is NDTM recognizable if and only if it is Turing recognizable.

Proof: One direction is obvious, because if a language is Turing recognizable it is equal to L(M) for some ordinary Turing machine M, and we can just view M itself as an NDTM that has only one possible computation path. So L(M) is also an NDTM recognizable language. For the other direction, we assume that L = L(N) for some NDTM N. We need to design an ordinary Turing machine D such that D will halt on input w if and only if N can halt on input w. The way to do this is to have D systematically simulate N on all possible computation paths. But before we can explain how this works, we need a technical result about Turing machines.

A multitape Turing machine is similar to an ordinary one except that it has some fixed number k of tapes, rather than just one. Each tape has its own head that can move independently of the others. The transition function has a slightly different domain, Q × Γ^k, indicating that the machine sees the characters under each of the k heads, and a slightly different range, Q × (Γ × {L, R})^k, indicating that it must print one character on each tape and move each head either left or right.

If the Church-Turing Thesis is to be believed, a multitape Turing machine cannot be more powerful than an ordinary one, and this result is what we will need as we examine NDTM's.

Multitape Lemma: If M is a multitape Turing machine, there exists an ordinary Turing machine D such that on any given input string w, M halts if and only if D halts, and if they both halt they each leave the same output⁵⁷.

Proof: A simple way to do this is to have D maintain the k tapes of M next to each other on its single tape with a # symbol separating them, much as we maintained two strings on the same tape in our proof of the Theorem above. (See Figure 15-18.) A marking symbol @ in each string records the current position of the head for that string. In order to simulate a step of M, D must have its single head traverse its tape to record the symbols under each of the k heads, decide what to do based on those symbols and its current state, then traverse the tape again to implement the required print operations and head moves. As before, if any of the heads moves past the right end of its string, D makes room by shifting part of the string right to open up a space and printing a blank symbol in the space. Clearly if D simulates each step of M correctly, it will halt if and only if M does. In order to have the same output as M when it does halt, D must erase everything on its tape except for the part representing the contents of the first tape of M. □

⁵⁷We'll adopt the convention that multitape machines start with their input on the first tape and all other tapes blank, and that they leave their output on the first tape.

Figure 15-19: Simulating multiple tapes

1. (Translate to English; all variables are of type natural. The symbol "·" denotes multiplication.) ∀a: ∃b: [(b > a) ∧ ¬[∃c: ∃d: (c > 1) ∧ (d > 1) ∧ (c·d = b)]]

2. (Translate to symbols, using the following predicates (all variables are real numbers): C(a) means "a continually increases", R(a,b) means "a remains less than b", L(a,b) means "a approaches a limit b".) "If x continually increases but remains less than some number c, it approaches a limit; and this limit is either c or some lesser number." (Hint: Assign a variable to "the limit".)

3. (Translate to symbols, using "|a − b| < c" to represent "a is within c of b". If you like you may declare some variables to be of type real and some of type positive real.) "For every positive real number ε there exists a positive real number δ such that whenever a real number a is within δ of x₀, f(a) is within ε of c." What are the free variables in this statement? (Hint: Look carefully at the word "whenever".)

4. (Translate to English, where all variables are of type "node", E(a,b) means "there is an edge from a to b", P(a,b) means "there is a path from a to b", EP means "the graph has an Euler path", and O(a) means "a has an odd number of neighbors".) [∀x: ∃y: E(x,y)] → [EP ↔ ((∀x: ∀y: P(x,y)) ∧ [∃x: ∃y: ∀z: (x ≠ y) ∧ (O(z) ↔ ((z = x) ∨ (z = y)))])]
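The single-tape layout used in the Multitape Lemma's proof (tape segments separated by #, with an @ marking each head position) can be illustrated concretely. This is only a sketch of the data layout; the encode and symbols_under_heads helpers are my own illustrations, not from the text:

```python
# Illustration of the single-tape encoding from the Multitape Lemma: the k
# tape segments sit side by side separated by '#', and an '@' marks each head
# (the head reads the square just to the right of its '@'). This sketches the
# layout only, not a full step-by-step simulator.

def encode(tapes, heads):
    """tapes: list of k strings; heads: list of k head positions."""
    return "#".join(t[:h] + "@" + t[h:] for t, h in zip(tapes, heads))

def symbols_under_heads(encoded, blank="_"):
    """One left-to-right traversal, as D performs: collect the symbol just
    right of each '@', or a blank if the '@' ends its segment."""
    out = []
    for segment in encoded.split("#"):
        i = segment.index("@")
        out.append(segment[i + 1] if i + 1 < len(segment) else blank)
    return out

enc = encode(["abba", "ba", ""], [2, 0, 0])
print(enc)                        # ab@ba#@ba#@
print(symbols_under_heads(enc))   # ['b', 'b', '_']
```

The blank returned when an @ sits at the end of its segment corresponds to the case in the proof where D must shift the remaining segments rightward to open up a fresh square.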

Armed with this Lemma, we can now continue with the simulation of an NDTM by an ordinary Turing machine. We'll carry out the simulation with a three-tape machine M, and then invoke the Lemma to claim that this three-tape machine can itself be simulated by an ordinary one-tape Turing machine.

Remember that our goal is to simulate all possible computation paths of our NDTM N, so that the simulator will halt if and only if the NDTM can halt. Tape 1 will get the input string w, which we will want to keep for reference through the entire computation. Tape 2 will hold a copy of the single tape of N, to be rewritten to exactly follow N's computation. Finally, Tape 3 will hold a string called the choice sequence, which we now describe. N has a fixed number of transitions, so let's number them from the set {1,...,k} for some number k. A choice sequence will be a string over the alphabet {1,...,k}, and thus represent a sequence of transitions. In any computation of N, there is a sequence of transitions that N undergoes, and thus a string of this kind. Most strings over this alphabet represent impossible sequences of transitions, of course, but each valid sequence is represented by a string. A computation that eventually halts is represented by a finite string, whose length is the number of steps in the computation.

Our machine M will work as follows. It will successively try every possible string over the alphabet {1,...,k} on Tape 3, starting with the one-letter strings, then two-letter, three-letter, and so on. For each such string it first clears Tape 2, copies w from Tape 1 onto Tape 2, and places the Tape 2 head at the initial blank. Then it reads the choice sequence on Tape 3 letter by letter, trying for each letter to implement that transition of N on Tape 2. If the transition is impossible, or if it runs out of choice sequence before halting, it clears Tape 2, puts the next string on Tape 3, and continues⁵⁸. If M ever reaches a halt state of N, then it halts as well.

It's easy to see now that L(M) = L(N). If M halts on an input w, it can only be because it has found a choice sequence that codes an accepting computation of N on w, and thus w ∈ L(N). And if there is any possible accepting computation of N on w, it has a choice sequence coded by a finite string that will eventually be tested by M. So M will halt, and w ∈ L(M). □

⁵⁸Note that M is essentially maintaining a base-k counter on Tape 3, much like the binary counter we constructed in Problem 15.6.3.
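The length-order search over choice sequences can be prototyped directly. In this sketch (not from the text), a nondeterministic machine is abstracted as a step function mapping a configuration to its list of possible successors; the names and the max_len cutoff are illustrative only, since the real simulator searches without bound:

```python
# Sketch of the choice-sequence search: enumerate all sequences over
# {0,...,k-1} in length order (a base-k counter, as in the footnote) and
# replay each one against a nondeterministic step function.

from itertools import product

def ndtm_halts(start, step, k, max_len):
    """step(config) -> list of successor configs (empty list = halted).
    Return the first choice sequence that drives the machine to a halt,
    trying lengths 1, 2, ..., max_len; None if none is found."""
    for n in range(1, max_len + 1):
        for choices in product(range(k), repeat=n):
            config = start
            valid = True
            for c in choices:
                options = step(config)
                if not options or c >= len(options):
                    valid = False     # halted early or impossible transition
                    break
                config = options[c]
            if valid and not step(config):
                return choices        # this sequence codes a halting run
    return None

# Toy machine on configs 0..3: from i < 3 it may move to i+1 or stay put;
# config 3 has no moves, i.e. it is a halt.
step = lambda i: [] if i == 3 else [i + 1, i]
print(ndtm_halts(0, step, 2, 5))      # (0, 0, 0): three "advance" choices
```

As the text notes for the real construction, this search is enormously slower than the nondeterministic machine it simulates: the number of sequences tried grows exponentially with the length of the shortest halting run.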

Thus we could say that in the case of Turing machines, nondeterminism does not provide the power to recognize new languages. For finite-state machines it didn’t either (as we proved), but for pushdown automata it actually does (as we didn’t prove). Note, though, that there isn’t an obvious notion of an NDTM deciding a language, so we don’t have a nondeterministic version of the Turing decidable languages to compare against the deterministic one. Note also, for future reference, that in this simulation the deterministic machine takes enormously longer than the NDTM to halt on the string — we will revisit this in Section 15.11.

15.8.3 Exercises

E15.8.1 Consider the Turing machine M whose state set is {i, p, q, h}, whose input alphabet is {a, b}, whose tape alphabet is {⊔, a, b, Y, N}, and whose transition function has δ(i, ⊔) = (p, ⊔, R), δ(p, a) = (q, Y, R), δ(p, b) = (q, N, R), δ(p, ⊔) = (h, N, R), δ(q, a) = δ(q, b) = (q, ⊔, R), and δ(q, ⊔) = (h, ⊔, R) (the other values of the transition function may be set arbitrarily). Show that M decides a language and identify the language. Modify M to get a recognizer for the same language.

E15.8.2 Making reasonable assumptions about the existence of Turing machine algorithms for base-ten arithmetic, show the following:

The language of prime numbers (written in decimal) is Turing decidable.

The set of numbers {x : x is smaller than some prime p} is Turing decidable.

The set of numbers {x : both 6x + 1 and 6x − 1 are prime} is Turing decidable.

The set of numbers {y : ∃x … y …
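For the first part of E15.8.2, the algorithmic content behind the claim is just trial division, which always halts; a Turing machine with base-ten arithmetic can implement the same loop. A Python sketch of the idea (mine, not part of the exercise):

```python
# Trial division: an always-terminating test for primality. Because this
# procedure halts on every input, the language of primes is decidable, not
# merely recognizable.

def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:        # finitely many candidate divisors, so this halts
        if n % d == 0:
            return False
        d += 1
    return True

print([n for n in range(2, 20) if is_prime(n)])   # [2, 3, 5, 7, 11, 13, 17, 19]
```

The other parts reduce to this one: for example, deciding whether both 6x + 1 and 6x − 1 are prime is just two calls to the same always-halting test.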
…tells us that Pr(X ≥ 1) ≤ s, so the bound is 1/(1/s) = s.) (The …: there is no 3-clique and no 3-anticlique.) If we have any six-node graph, pick an arbitrary node a. Either a has at least three neighbors or it has at least three non-neighbors. In the first case, let b, c, and d be three of a's neighbors. If there is any edge among b, c, and d, then those two nodes form a 3-clique with a. If they do not, those three nodes form a 3-anticlique. In the other case, let b, c, and d be any non-neighbors of a. If there is any non-edge among b, c, and d, those two nodes form a 3-anticlique with a. If there is no such non-edge, then b, c, and d form a 3-clique. In the terms of Problem 10.11.6, we have shown R(3,3) ≤ R(3,2) + R(2,3) = 3 + 3 = 6. The general argument is similar.

Exercise 10.11.10 This method does not extend from n/2 types to n/2 + εn types, for any positive ε. As we increase the number of types in S, the probability of all our coupons being in any particular set S of types increases, but the number of sets decreases, from C(n, n/2) to C(n, n/2 + εn). Since our bound is the product of the probability of being in S and the number of sets, the question is whether the decrease in the latter makes up for the increase in the former. It doesn't; we consider a particular ε and estimate both numbers. For all coupons to be in S, the probability is (1/2 + ε) raised to the number of coupons drawn. If we divide the new probability by the old one, we get (1 + 2ε) raised to that same number, which we can approximate as a power of e. The change in the number of sets is C(n, n/2 + εn) divided by C(n, n/2). When we cancel, there are εn terms, each one no larger than (n/2 − εn)/(n/2 + εn), which is at most about 1 − 4ε. This makes the total ratio no more than (1 − 4ε)^(εn), or about e^(−4ε²n), much smaller than the increase in the probability. So the new bound is much larger than 1, which is useless, since we already know that any probability is at most 1.
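The six-node argument above can also be confirmed exhaustively, since a six-node graph has only C(6,2) = 15 possible edges and so there are just 2^15 graphs to test. A short verification sketch (not part of the original solution):

```python
# Exhaustive check that every graph on six nodes has a 3-clique or a
# 3-anticlique, i.e. R(3,3) <= 6. Edges are stored as sorted pairs.

from itertools import combinations

def has_mono_triangle(edges, n=6):
    for a, b, c in combinations(range(n), 3):
        trio = [(a, b) in edges, (a, c) in edges, (b, c) in edges]
        if all(trio) or not any(trio):
            return True               # a 3-clique or a 3-anticlique
    return False

pairs = list(combinations(range(6), 2))      # the 15 possible edges
assert all(
    has_mono_triangle({p for p, bit in zip(pairs, format(m, "015b")) if bit == "1"})
    for m in range(2 ** 15)
)
# The five-node cycle shows that 6 is tight: it has neither structure.
print(has_mono_triangle({(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)}, n=5))   # False
```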

S.11 Exercises From Chapter 11

Exercise 11.1.1 Pr(A|B|C) is defined to be Pr(A∩C | B∩C), which is, by the definition, Pr((A∩C)∩(B∩C))/Pr(B∩C) = Pr(A∩B∩C)/Pr(B∩C). Also by the definition, Pr(A|B∩C) is equal to Pr(A∩B∩C)/Pr(B∩C), which is the same.

Exercise 11.1.2 The expected fraction of yes answers is q = (1 + p)/2, and thus by algebra p = 2q − 1.

Exercise 11.1.3 By the Law of Total Probability, the overall chance is (0.6)(0.4) + (0.4)(0.1), or 28%.

Exercise 11.1.4 We are told that Pr(B∩L)/Pr(L) = 0.6 and that Pr(B∩L)/Pr(B) = 0.3, which tells us that Pr(B) = 2Pr(L) = (10/3)Pr(B∩L). We cannot tell the values of these three probabilities from the information given; Pr(B∩L) could be anything from 0 to 1/4. (It can't be higher because if Pr(B∩L) = 1/4, then Pr(B∪L) = (10/3)(1/4) + (5/3)(1/4) − (1/4) = 1, and thus if it were higher Pr(B∪L) would be greater than 1.)

Exercise 11.1.5 If B = U, the universal set, then Pr(A|B) = Pr(A∩U)/Pr(U) = Pr(A). In general, if A ⊆ B, then Pr(A|B) = Pr(A∩B)/Pr(B) = Pr(A)/Pr(B), which is at least as great as Pr(A) because Pr(B) ≤ 1.

Exercise 11.1.6 After U₁BB, which we reach with probability 0.15, the three balls left are one black and two white, so we have U₁BBB with 0.05 and U₁BBW with 0.10. Similarly, we compute Pr(U₁BWB) = 0.10, Pr(U₁BWW) = 0.05, Pr(U₁WBB) = 0.10, Pr(U₁WBW) = 0.05, Pr(U₁WWB) = 0.05, Pr(U₁WWW) = 0, Pr(U₂BBB) = 0, Pr(U₂BBW) = 0, Pr(U₂BWB) = 0, Pr(U₂BWW) = 0.10, Pr(U₂WBB) = 0, Pr(U₂WBW) = 0.10, Pr(U₂WWB) = 0.10, and Pr(U₂WWW) = 0.20.

Exercise 11.1.7 For the first example, we have Pr(U₁|BBB) = Pr(BBB|U₁)Pr(U₁)/Pr(BBB) = 0.05/0.05 = 1. Similarly, the probabilities for U₁|BBW, U₁|BWB, and U₁|WBB are each 1. For Pr(U₁|BWW), we have 0.05/0.15 = 1/3, and we get the same result for U₁|WBW and U₁|WWB. Finally Pr(U₁|WWW) = 0, since Pr(WWW|U₁) = 0. In each case Pr(U₂|X) = 1 − Pr(U₁|X), so the various values are 0, 2/3, and 1.

Exercise 11.1.8 At t = 1 she is in (event I) with probability 0.6 and out (event O) with probability 0.4. At t = 2 she is in for events II (probability 0.36) and OI (probability 0.08), so the total probability that she is inside is 0.36 + 0.08 = 0.44, and she is outside with probability 0.56. At t = 3, the eight possible events have probabilities 0.216 for III, 0.144 for IIO, 0.144 for IOI, 0.096 for IOO, 0.048 for OII, 0.032 for OIO, 0.064 for OOI, and 0.256 for OOO. The total probability that she is inside is 0.216 + 0.144 + 0.048 + 0.064 = 0.472, so the probability that she is outside is 1 − 0.472 = 0.528.

Exercise 11.1.9 Since X implies Y, we know Pr(Y|X) = 1. Since X is the event we called OI above, we know that Pr(X) = 0.08, and similarly Pr(Y) = 0.08 + 0.36 = 0.44 because Y is the union of the events OI and II. So we can use Bayes' Theorem to get Pr(X|Y) = (1)(0.08)/0.44 = 2/11. Without Bayes' Theorem, we would compute Pr(X|Y) as Pr(X∩Y)/Pr(Y) and get the same answer, noting that X∩Y and X are the same event.
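The Bayes computation in Exercise 11.1.9 is easy to check with exact arithmetic; here is a quick sketch (mine) using the event probabilities from Exercise 11.1.8:

```python
# Exact check of Exercise 11.1.9: X = OI, Y = "inside at t = 2" = II or OI,
# and X implies Y, so Pr(Y|X) = 1.

from fractions import Fraction

pr_II = Fraction(36, 100)
pr_OI = Fraction(8, 100)
pr_X = pr_OI
pr_Y = pr_II + pr_OI                    # 0.44

posterior = Fraction(1) * pr_X / pr_Y   # Bayes: Pr(Y|X) Pr(X) / Pr(Y)
print(posterior)                        # 2/11
```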

Exercise 11.1.10 Pr(BF|EF) is (1/4)/(1/2) = 1/2, and Pr(BF|AF) is (1/4)/(3/4) = 1/3. The interpretation depends on what we mean when "one of the dogs is female"; that is, on how we are selecting the dogs. Half of the elder-female couples have a younger female dog, but of all the couples with at least one female dog, only 1/3 have both dogs female.

Figure S-21: An Event Tree for Exercise 11.1.8

Exercise 11.2.1 (a) If q is the odds O(E), then since q = p/(1 − p), we have p = q/(1 + q). (b) Here q is slightly higher than the correct value p, and the difference q − q/(1 + q) is q²/(1 + q), which, by the quadratic formula, equals 0.01 when q is about 0.1051. (c) In this case the approximation 1 − 1/q is slightly smaller than the correct value p = 1 − 1/(1 + q), the difference being 1/(q(1 + q)). This is at most 0.01 when q is at least about 9.512, again using the quadratic formula.
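The odds-probability conversions of Exercise 11.2.1 can be checked numerically. A small sketch (mine, not the text's):

```python
# Odds q = p/(1 - p) and the inverse p = q/(1 + q), as in Exercise 11.2.1(a);
# part (b)'s error of approximating p by q is q - q/(1+q) = q^2/(1+q),
# which reaches 0.01 near q = 0.1051.

def odds(p):
    return p / (1 - p)

def prob(q):
    return q / (1 + q)

assert abs(odds(prob(0.3)) - 0.3) < 1e-12    # the two maps are inverses
error = 0.1051 - prob(0.1051)                # approximation error at q = 0.1051
print(error)                                 # just under 0.01
```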

Exercise 11.2.2 If the Locals win by exactly seven, I win both bets, pay no service charges, and collect $200. If anything else happens, I win one bet and lose the other, pay a service charge on the losing bet, and lose a net $10. My bet is thus equivalent to a normal bet of $10 at odds of 20 to one.

Exercise 11.2.3 (a) Let the bet on each horse be x. Then the track returns 85% of the 8x total take to the winners, so each winning $1 bettor gets $6.80, which is their $1 bet back plus $5.80 in winnings. The odds of the bet are therefore 5.8 to one. (b) Let t be the total take and w be the amount bet on the winning horse. As above, the amount returned to the winning bettors will be 0.85t, and we need this to be at least 1.1w so that the winners will get at least the desired return. Equality holds when 1.1w = 0.85t, or w = 17t/22, or about 0.773t, so the track should not let more than about 77% of the bets be on a single horse.

Exercise 11.2.4

Let p = 2q − 1 be the probability of a random student being guilty, as estimated in Exercise 11.1.2. The fraction q that answer "yes" consists of p/2 who are guilty and flip heads (who answer honestly), p/2 who are guilty and flip tails (following their rule), and (1 − p)/2 who are innocent but flip tails. The fraction of "yes" answers that come from guilty students is thus p/q.

Exercise 11.2.5 If the Locals win, she pays $6667 on winning day-one bets, collects $5000 from losing day-one bets, pays $4750 on winning day-two bets, and collects $6000 on losing day-two bets, for a net loss of $417. If the Locals lose, she collects $20,000 and pays $15,000 for day-one bets, and collects $19,000 and pays $24,000 for day-two bets, breaking exactly even. She cannot equalize her position exactly, since if she offers two different odds for the same game it is possible for someone to get a "middle" analogous to that in Exercise 11.2.2.

Exercise 11.2.6

Let h be the hypothesis that the coin always comes up heads. The prior odds O(h) are similar to the prior probability estimate Pr(h) = 10⁻⁶. Let e be the event that the first 20 coin flips are heads. If h is true, e will certainly happen, and if ¬h is true, Pr(e) is 2⁻²⁰. Thus the likelihood L(e|h) is, by the definition, 1/2⁻²⁰ = 2²⁰, and the resulting posterior odds are 10⁻⁶·2²⁰, close to 1, so the posterior probability estimate is close to 1/2. With 92 consecutive flips, the posterior odds are about 10⁻⁶·2⁹² = 2⁷². If we let e′ be the event that the coin flips include any that come up tails, Pr(e′|h) is 0, so L(e′|h) is also 0, and Guildenstern should reject h with certainty.

Exercise 11.2.7

(a) The graph is connected if and only if there are at least two edges, which happens with probability 1/2. (b) The graph is now connected if and only if either of the other possible edges exists, which happens with probability 3/4. (c) Now the graph is connected if and only if both of the other possible edges exist, which happens with probability 1/4. (d) We'll use the interpretation from Exercise 11.1.10. Let c be the event that the graph is connected, and d be the event that there is at least one edge. Then Pr(c) = Pr(c∩d) = 1/2, Pr(d) = 7/8, and thus Pr(c|d) = 4/7.

Exercise 11.2.8

At the current price, I must pay ($0.70)(1.05) = $0.735 for a Silly-win contract, so I should buy if my subjective probability is greater than 0.735. If I sell a contract, I get only ($0.70)(0.95) = $0.665, so I should sell if my subjective probability is less than 0.665.

Exercise 11.2.9

By the Binomial Theorem, Pr(eᵢ) in this case is … .

Exercise 11.2.10 Let s be the event of a sale, a the event of the customer having an automobile, and p the event of the customer having a piano. In both cases the prior odds O(s) are 1/499. (a) We have Pr(a) = 0.10, Pr(s∩a) = 0.001, Pr(s|a) = 1/100, Pr(¬a) = 0.90, Pr(s∩¬a) = 0.001, Pr(s|¬a) = 1/900, so L(s|a) = 9. The posterior odds O(s|a) are 9/499, and the posterior probability estimate is 9/508. (b) Similarly, we have Pr(p) = 0.20, Pr(s∩p) = 0.0001 (5% of the 0.002 probability of a sale), Pr(s|p) = 1/2000, Pr(¬p) = 0.80, Pr(s∩¬p) = 0.0019, Pr(s|¬p) = 19/8000, so L(s|p) = 4/19. The posterior odds O(s|p) are (4/19)(1/499), or about 0.00042, and the posterior probability estimate is about the same.

Exercise 11.3.1

(a) In the example above with 10% users, we would expect that there are 200 users, 180 of whom test positive, and 1800 non-users, 18 of whom test positive. We would have posterior odds of 10, and a probability of 10/11 that someone who tests positive is actually a user. (b) Now with 1% users, we would have 20 users, 18 of whom test positive, and 1980 non-users, 19.8 of whom test positive. The fraction of users among those who test positive is 10/21, and the posterior odds are 10/11. (c) Finally, with 0.1% users, there are 2 users, 1.8 of whom test positive, and 1998 non-users, 19.98 of whom test positive. The fraction of users among those who test positive is about 9%, with posterior odds of about 1/10.
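The counting argument of Exercise 11.3.1 can be reproduced directly; in this sketch (mine), the 90% detection rate and 1% false-positive rate implied by the numbers above are taken as given:

```python
# Reproduce the counts in Exercise 11.3.1: sensitivity 0.9, false-positive
# rate 0.01, population 2000, with a varying fraction of users.

def posterior_odds(user_frac, pop=2000, sens=0.9, fp=0.01):
    users = pop * user_frac
    true_pos = users * sens                  # users who test positive
    false_pos = (pop - users) * fp           # non-users who test positive
    return true_pos / false_pos              # odds that a positive is a user

print(posterior_odds(0.10))    # 10.0: probability 10/11
print(posterior_odds(0.01))    # ~0.909, i.e. odds of 10/11
print(posterior_odds(0.001))   # ~0.090
```

The collapse of the posterior odds as the base rate shrinks is exactly the point of the exercise: the same test gives very different answers depending on the prior.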

Exercise 11.3.2

L(P|U) is now 0.99/0.10 = 9.9.

Exercise 11.3.3 L(P|U) = (1 − FN)/FP. The test is of some use as long as 1 − FN > FP. Of course if FP or FN is very large, you could redefine the test to give the opposite answer, so that the new "false positive" rate becomes 1 − FN, and the new "false negative" rate becomes 1 − FP. In this case, if FP > 1 − FN, the redefined test would also be of some use. If 1 − FN = FP, you are out of luck.

Exercise 11.3.4 Let P₁ and P₂ be the events of positive results on the two tests, and let Q be the event P₁ ∧ P₂. (a) L(Q|U) = Pr(Q|U)/Pr(Q|¬U) = Pr(P₁|U)Pr(P₂|U)/(Pr(P₁|¬U)Pr(P₂|¬U)) = 0.81/0.0001 = 8100, making the posterior odds 8100/999, or about 8.1, and the probability about 89%. (b) Now L(Q|U) = L(P₁|U) = 0.90/0.01 = 90, since Q and P₁ are the same event. (c) As in part (a) we have Pr(Q|U) = 0.81, but now Pr(Q|¬U) = (0.01)(0.1) = 0.001, so L(Q|U) = 810 and the posterior odds are 810/999, or about 0.81, for a probability of about 45%.

Exercise 11.3.5

Of course for q = 0 nothing has changed. For any given q, the expected probability of death is now 0.1 + 0.9q for p = 0 and 0.2 + 1.6q for p = 1. The linear function of p through these two points is 0.1 + 0.9q + p(0.1 + 0.7q), which we can again compare to the patient's probability of death with no surgery. With q = 0.01 our function is 0.109 + 0.107p, which equals 0.1 + 0.3p when p = 0.0466 and equals 0.2 when p = 0.8505, so the surgery is better when p is between those two figures. The largest q for which the surgery makes sense is exactly 1/17 ≈ 0.0588, because there both strategies give exactly 0.2 probability of death when p = 1/3.

Exercise 11.3.6

Let's see what happens if we try Surgery A first. If the patient has Disease A, they are cured with probability 0.90. If they have Disease B, they will be cured if they first survive Surgery A with 0.90 probability, and then survive Surgery B with 0.60 probability, giving them a total chance of 0.54 of being cured. So for an arbitrary p, the patient is cured with probability (0.90)p + (0.54)(1 − p) = 0.54 + (0.36)p. If instead we try Surgery B first, then if the patient has Disease A, they have to survive Surgery B with probability 0.80, then survive Surgery A with probability 0.80, giving them a 0.64 chance of a cure overall. If they have Disease B, they are cured with probability 0.80 by the first surgery. Their overall chance of a cure is (0.64)p + (0.80)(1 − p) = 0.80 − (0.16)p. The two options have the same chance, 0.72, of a cure if p = 1/2. So we should try Surgery B first if p < 1/2, and Surgery A first if p > 1/2.

Exercise 11.3.7 If the patient has the surgery, then if they have Disease A they are cured with probability 0.90, but if they have Disease B then they survive with probability 0.54. Their overall chance of survival with surgery is (0.90)p + (0.54)(1 − p) = 0.54 + (0.36)p. With no surgery, if they have Disease A they survive with probability 0.60, and if they have Disease B they survive with probability 0.70. Their overall chance of survival with no surgery is (0.60)p + (0.70)(1 − p) = 0.70 − (0.10)p. With p = 8/23, both survival probabilities are 15.3/23, about 66.5%. So the surgery improves the chances if p > 8/23, and hurts them if p < 8/23.

P2.5.10 It is possible for two different finite languages X and Y to have the same Kleene star, that is, for X* = Y* to be true.

(a) Prove that X* = Y* if and only if both X ⊆ Y* and Y ⊆ X*.

(b) Use part (a) to show that X* = Y* if X = {a, abb, bb} and Y = {a, bb, bba}.

(c) Prove that if X* = Y* and λ ∉ X ∪ Y, then the shortest string in X and the shortest string in Y have the same length.

Exercise 11.3.8

The defense attorney claims that Pr(B|A) is still 1/1461 even in the scenario where the "random suspect" is drawn from a population that is half pirates, and thus where half have the given birthday. The correct Pr(B|A) is a bit more than 1/2 and is calculated later in that subsection.

Exercise 11.3.9

In general Pr(B) = Pr(A)(1/1461) + (1 − Pr(A))(1) = 1 − (1460/1461)Pr(A), and thus Pr(A|B) = (Pr(A)/1461)/(1 − (1460/1461)Pr(A)), or essentially (Pr(A)/1461)/(1 − Pr(A)). For Pr(A) = 0.9 this is about 0.6%, for Pr(A) = 0.99 it is about 6.8%, and for Pr(A) = 0.999 it is about 68.4%. For the last case of Pr(A) = 0.9999, we can no longer approximate 1 − (1460/1461)Pr(A) by 1 − Pr(A). Here we get Pr(A|B) = (0.9999/1461)/(1 − (1460/1461)(0.9999)), or about 87.3%. As the probability that a random suspect is innocent increases, so does the probability that a suspect with the given birthday is innocent. In the last case, there are about seven times as many innocent people with the given birthday as there are pirates.
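The figures in Exercise 11.3.9 can be recomputed directly. This sketch (mine) shows both the approximation used above and the exact form; note that the approximation overshoots as Pr(A) approaches 1 (at 0.9999 it even exceeds 1), which is why the solution switches to the exact expression for the last case:

```python
# Pr(A|B) for the birthday example, where A = innocent and B = has the
# birthday: Pr(B|A) = 1/1461 and Pr(B|not A) = 1.

def exact(pr_a):
    return (pr_a / 1461) / (1 - (1460 / 1461) * pr_a)

def approx(pr_a):                     # the shortcut denominator 1 - Pr(A)
    return (pr_a / 1461) / (1 - pr_a)

for pr_a in (0.9, 0.99, 0.999, 0.9999):
    print(pr_a, round(approx(pr_a), 4), round(exact(pr_a), 4))
```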

Exercise 11.3.10

My first question would be the fraction of all people in those areas who are liable to be charged with something. From what we know, we can't conclude that pirate-identified people are more likely to be chargeable, and it may be that the police are finding so many such chargeable suspects simply because they are looking for them rather than others.

Certainly the past and present record of differential treatment of various minorities by the police is a serious issue, and we don't intend to make light of it. Problem 11.3.10 asks you to take an open-ended look at similar issues.

Exercise 11.5.1 The prior odds that the parrot is Norwegian are 1/9. For each feature the given bird has, we multiply the odds by 0.6/0.2 = 3 as in the text. For each feature it does not have, we multiply by 0.4/0.8 = 1/2. So for three features the odds are (1/9)(3³)(1/2) = 3/2 and the probability is 60%. For two features the odds are (1/9)(3²)(1/2)² = 1/4 and the probability is 20%. For one feature the odds are (1/9)(3)(1/2)³ = 1/24 and the probability is 4%. For no features the odds are (1/9)(1/2)⁴ = 1/144 and the probability is 1/145, less than 1%.

Exercise 11.5.2

An arbitrary Norwegian Blue has a probability of C(4,i)(0.6)^i(0.4)^(4−i) of having i features, so the probability that an arbitrary blue parrot is a Norwegian with no features is 0.10 times this expression for i = 0, which is 0.00256. Similarly, the probability of a Norwegian with one feature is 0.01536, with two features 0.03456, with three features also 0.03456, and with four features 0.01296. An arbitrary Common Blue has a probability of C(4,i)(0.2)^i(0.8)^(4−i) of having i features, so the probability that an arbitrary blue parrot is a Common with i features is 0.90 times this expression. This gives us 0.36864 for both no features and one feature, 0.13824 for two features, 0.02304 for three features, and 0.00144 for four features. This matches what we computed in Exercise 11.5.1. Among no-feature parrots, Norwegian Blues are only 1/144 as common as Common Blues. Among one-feature parrots, Norwegians are only 1/24 as common. For two features Norwegians are 1/4 as common, for three features Norwegians are 3/2 times as common, and for four features Norwegians are nine times as common.

word that appears we multiply the odds by 0.04/0.2 = 1/5, and for each good word that does not appear we multiply by 0.96/0.8 = 6/5. For each bad word that appears we multiply by 0.20/0.04 = 5, and for each bad word that does not appear we multiply

by 0.8/0.96 = 5/6. If the number

of good and bad words are equal, these multiplications cancel and the

odds of spam remain 4. In general, let g be the number of good words seen and b be

the number of bad words. The posterior odds are then (4)(1/5)9(6/5)°-95°(5/6)>-6 = 4(6)°-9.

So the posterior odds depend

only on b — g, not g and b individually.

The

greatest possible odds of spam are if b—g = 5, where we have odds of 4-6? = 31104 and probability of about 0.99997, and the smallest possible odds of spam are if b—g = —5,

where we have odds of Exercise 11.5.4

4: 6~° = 1/1944 and probability of about 0.0005.

If Pr(ei|h) is estimated at 0, L(e;|h) also becomes 0, and the posterior odds of h become 0 no matter what the prior odds. Similarly, if Pr(e;|7h) is estimated to be 0, L(e;|h) becomes undefined or “positive infinity”, making the posterior odds positive infinity and the posterior probability 1. Zero estimates of Pr(-e;|h) or Pr(-7e;|7h) also force the posterior odds to 1 or to 0 respectively. Essentially, if the NBC has not seen a particular combination of event and hypothesis truth values, it assumes that the combination is impossible. It’s possible in this way for the posterior probability

to be forced to 0 by one event, and to 1 by a different event, making it impossible for the NBC to assign a probability at all. In Section 11.6 we'll look at techniques to ensure that this does not happen.

Exercise 11.5.5

For most scoring combinations in poker, the probability of a given card appearing in a uniformly chosen hand with that combination is exactly 5/52, the same as the probability of that card appearing in a uniformly chosen random hand. The only exception is the straight: a random straight has only a 2/40 chance of holding any particular ace, 3/40 for any particular two or king, 4/40 for any particular three or queen, and 5/40 for any other particular card. For any combination except the straight, the likelihood ratio calculated for each card would be expected to be 1, and any deviation computed would just be random noise. Even for the case of straights, we would look at the hand and change the original odds only slightly, reducing them for aces, twos, and kings and increasing them for fours through jacks. This is not likely to be an effective way of detecting straights!

Exercise 11.5.6

(a) L(R|S) = Pr(R|S)/Pr(R|¬S) = 0.4/0.1 = 4, and L(¬R|S) = Pr(¬R|S)/Pr(¬R|¬S) = 0.6/0.9 = 2/3. (b) With one positive and three negative reports, we multiply the prior odds by 4·(2/3)³ = 32/27. Since this is greater than one, multiplying by it will increase our estimate of the probability that S is true. (c) The prior odds for a probability of 0.01 are 0.01/0.99 = 1/99. With four positive reports, our posterior odds are (1/99)(4⁴) = 256/99, which is greater than one, so we believe that it is now more likely than not that S is true. With three positive and one negative report, our posterior odds are (1/99)(4³)(2/3) = 128/297, less than one. So four positive reports are necessary and sufficient to justify entering the building.

Exercise 11.5.7

(a) Since long texts would be more likely to include most or all of the letters, we would not be distinguishing between such texts at all. (b) Yes; if for example almost none of the Latin words had K's and most English words did, the NBC would put a large weight on that feature. (c) If the non-words were uniformly chosen from all non-word strings, then the NBC would see about 5/26 of the negative examples with Z's, but many fewer such positive examples. Similarly, it would weight having an E as a strong positive feature of a word. So it might have some bias toward accepting real words, but it would also tend to accept other non-words that use the same letters. (d) This would depend on the frequency of letters in the words from each language. The two frequencies will be different in general, and the NBC will thus probably accept words from one language more often than those from the other. But it still suffers from the problem of ignoring the order of the letters in each string.

Exercise 11.5.8

(a) Yes; certain digrams such as TH are much more common in English words than their reversals. Random strings, along with having more letters that are rarer in words, will also have digrams including those letters, which will occur rarely or even never in the positive examples. (b) Polish famously uses digrams like CZ and SZ that are rare in English, and there are presumably similar examples in the other direction. Features for such digrams would be weighted strongly by the NBC, and it would likely get the right answer most of the time.

Exercise 11.5.9

(a) There's no particular reason that words with double letters and words without them would differ in their occurrence of individual letters. (b) The features AA, BB, ..., ZZ never occur in the negative examples, and at least one will appear in each positive example. So the NBC will put an infinite positive weight on the occurrence of each double letter that occurs in the positive examples. If it's given a new string that has a double letter that does not occur in any positive example, it will treat it like any other string with no double letter.

Exercise 11.5.10

This is exactly an example of the NBC described in the text. For each item, for example an icebox, he first counts the fraction of successful calls with an icebox to estimate Pr(eᵢ|s) and the fraction of unsuccessful calls with an icebox to estimate Pr(eᵢ|¬s). Then he computes L(eᵢ|s) = Pr(eᵢ|s)/Pr(eᵢ|¬s), and multiplies his prior odds 1/499 of a sale by the product of the numbers L(eᵢ|s) for all items i. Finally he converts these odds q into a probability q/(1 + q).

Exercise 11.6.1

If the number of positive examples and the number of negative examples are both the same number n, we assign a likelihood ratio of (1/(n+1))/(1/(n+1)) = 1, which has the same effect as leaving the feature out entirely. If the numbers of positive and negative examples are different, we are no longer ignoring the feature. For example, suppose that the feature e fails to occur in a large number m of positive examples and fails to occur in a small number n of negative examples, and we get a new instance with the feature. Our likelihood ratio L(e|h) would now be (n+1)/(m+1), which is less than 1, so that we view this evidence as making h less likely. This makes sense because there is more evidence that this feature is unlikely when h is true, since we failed to see it in the larger number of positive examples.
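This smoothed estimate is easy to sketch in code. The following is a minimal illustration (the class and method names are my own, and the counts in main are made up), using the one-dummy convention discussed here: a feature seen in k of n examples is estimated as (k+1)/(n+1).

```java
// Sketch of the smoothed likelihood-ratio estimate, with hypothetical counts.
public class SmoothedRatio {
    // Add one dummy instance that has the feature: k of n becomes (k+1)/(n+1).
    public static double smoothedFraction(int withFeature, int total) {
        return (withFeature + 1.0) / (total + 1.0);
    }

    // L(e|h): smoothed fraction among positives over that among negatives.
    public static double likelihoodRatio(int posWith, int posTotal,
                                         int negWith, int negTotal) {
        return smoothedFraction(posWith, posTotal) / smoothedFraction(negWith, negTotal);
    }

    public static void main(String[] args) {
        // Feature absent from 1000 positives but only 10 negatives:
        // ratio (1/1001)/(1/11) = 11/1001, well below 1, as argued above.
        System.out.println(likelihoodRatio(0, 1000, 0, 10));
    }
}
```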

Exercise 11.6.2
We want to treat the feature e in the same way we would treat the feature ¬e. If we are adding a dummy instance to our count that has the feature, to avoid the possibility that the fraction having it is 0, then for the same reason we should add a dummy instance that does not have the feature. This makes our denominator n+2 for the n real instances and the two dummies.

Exercise 11.6.3

If two features occur with the same likelihood ratio L, then their combined effect would be to multiply the overall likelihood odds by L^2. So if feature f_i with likelihood ratio L_i should have weight w_i, we can multiply the overall odds by L_i^(w_i).

Exercise 11.6.4

Seeing each word multiplies the odds of spam by (0.40)/(0.10) = (0.04)/(0.01) = 4. Not seeing the first word multiplies the odds of spam by (0.60)/(0.90) = 2/3, while not seeing the second word multiplies the odds of spam by (0.96)/(0.99), or about 0.97; not seeing the second word is only very weak evidence against the message being spam.

Exercise 11.6.5

As above, seeing the original bad word multiplies the odds by 4 and not seeing it multiplies the odds by 2/3. Seeing each new bad word multiplies them by 4, but not seeing each multiplies them by 0.97. So seeing one bad word and not seeing the other nine multiplies the odds by 4(0.97)^9, or about 3.03.

Exercise 11.6.6

Suppose that the four observations are perfectly correlated, so that either all four detectives report movement or none of them do. If the probability was still 40% if the suspect is there and 10% if not, each detective’s probability is as reported. But the

correct analysis of getting four positive reports would now be to multiply the odds by 4 once, not four times.

Exercise 11.6.7
In legitimate messages, “computer science” appears 40% of the time, and “computer” alone appears in 20% of the remaining messages, or 12% of the total, putting “computer” in 52% of messages in all. In spam messages, “computer science” appears in 10% and “computer” alone in 10% of the remainder, or 9% of the total, for 19% in all. The analysis for “science” is exactly the same. So when our NBC sees “computer”, it multiplies the odds of spam by (0.19)/(0.52) = 0.365. When it doesn't see “computer”, it multiplies them by (0.81)/(0.48) = 1.69. It treats “computer” and “science” as independent features, which they aren't, and thus gives an exaggerated importance to both occurring together.

Exercise 11.6.8

If a player has never been present for a winning point, or never been present for a losing point, the captain's NBC will make the odds of winning with that player present zero or infinite accordingly. The most natural way to smooth would be to begin each of the counts with one rather than zero. Conditional independence could fail if two players work particularly well or particularly badly together. The odds might also depend on which opponents are on the field. If two players are usually put in at the same time, their results will of course not be independent, and the NBC will tend to exaggerate the effect of the event of both being in.
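The odds bookkeeping that runs through these exercises can be collected into a small sketch (names and numbers are hypothetical, not from the text): multiply the prior odds by each feature's likelihood ratio, then convert odds q to the probability q/(1+q).

```java
// Generic NBC odds update used throughout this section.
public class OddsUpdate {
    // Multiply the prior odds by every feature's likelihood ratio.
    public static double posteriorOdds(double priorOdds, double[] likelihoodRatios) {
        double q = priorOdds;
        for (double L : likelihoodRatios) q *= L;
        return q;
    }

    // Convert odds q to a probability q / (1 + q).
    public static double oddsToProbability(double q) {
        return q / (1 + q);
    }

    public static void main(String[] args) {
        // Hypothetical: even prior odds, one ratio-4 feature seen, one 2/3 unseen.
        double q = posteriorOdds(1.0, new double[]{4.0, 2.0 / 3});
        System.out.println(oddsToProbability(q));   // 8/11
    }
}
```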

Exercise 11.6.9

For each of the two features, the NBC sees 6 positive strings and 6 negative strings, gets a likelihood ratio of 1 for each, and leaves the prior hypothesis unchanged. There is pretty clearly a relationship between these features and the presence of spam, but the NBC completely misses it. Of course, if we revised the NBC to have four features (neither, “cat” only, “dog” only, both) rather than two, it would find good likelihood ratios for each new feature.

Exercise 11.6.10

Let C be the event that a transfer is cash-based and let B be the event that a transfer

is bad. We are told Pr(C|B) = 0.95, and we need to estimate Pr(C|¬B). We can safely estimate Pr(C|¬B) by Pr(C) = 0.19, since B is such a small fraction of the total. So L(C|B) = 5, and the same calculation gives us likelihood ratios of 5 for the other two features. Our prior odds of B were 1/2000, so an NBC would tell us that the posterior odds are 125/2000, or about 1/16, giving only about a 6% probability that a transfer with all three suspect properties is bad! If, for example, all of the cash-based transfers were also foreign and under the threshold, constituting the same 19% of the total, the correct likelihood ratio would be 5 rather than 125.

Exercise 11.7.1

Each alarm goes off with probability 0.95 if B is true and probability 0.01 if it is false. Thus Pr(A1 ∩ A2|B) = (0.95)^2 = 0.9025, Pr(A1 ∩ A2|¬B) = (0.01)^2 = 0.0001, and L(A1 ∩ A2|B) = 9025. The posterior odds are 9025/9999 = 0.9026, and the posterior probability is 0.9026/1.9026 = 0.474. If C = A1 ∩ ¬A2, we have that Pr(C|B) = (0.95)(0.05) = 0.0475, Pr(C|¬B) = (0.01)(0.99) = 0.0099, and L(C|B) = 0.0475/0.0099, or about 4.8.

Exercise 11.7.2
We need Pr(c|a ∩ b), Pr(c|a ∩ ¬b), Pr(c|¬a ∩ b), Pr(c|¬a ∩ ¬b), Pr(d|c), and Pr(d|¬c).

Exercise 11.7.3
We have an arbitrary choice of the six probabilities — let's make the first, fourth, and sixth probabilities listed above 1/3 and the rest 2/3. Number the sixteen events 0000 through 1111 in binary — we get that Pr(0000), for example, is (1/2)(1/2)(2/3)(2/3) = 1/9. Similarly Pr(1100), Pr(0110), and Pr(1010) are all 1/9. Pr(0011), Pr(0100), Pr(1000), and Pr(1111) are each 1/36, and the other eight probabilities are each 1/18.

Exercise 11.7.4

Garp is assuming that one crash makes another in the same place less likely, but if the events are independent then the house is just as likely to be hit again as it was to be hit before. His reasoning might make sense, though, if the first crash prompted new safety measures for that location, to prevent the same kind of accident.

In the given model, the prior odds that the house was high-risk were 1/99. We have Pr(C|HR) = 10^(-5), Pr(C|¬HR) = 10^(-8), and thus L(C|HR) = 1000. The posterior odds that this house is high-risk are now 1000/99 = 10.1, for a posterior probability of about 90%. The estimated chance of a new plane crash in the next year is thus (0.90)10^(-5) + (0.10)10^(-8), or about 9 × 10^(-6), about ninety times greater than the prior estimate of 10^(-7).

Exercise 11.7.5
Now we need the six probabilities Pr(b|a), Pr(b|¬a), Pr(c|b), Pr(c|¬b), Pr(d|c), and Pr(d|¬c). Let's assume that the first, third, and fifth of these are 2/3 and the rest 1/3, so that the truth of each condition makes the following one more likely to be true and vice versa.

The two most common events are thus 0000 and 1111, each with probability (1/2)(2/3)(2/3)(2/3) = 8/54. The events 0001, 0011, 0111, 1000, 1100, and 1110 each have probability 4/54. The two least common events are 0101 and 1010, with probability 1/54, and the other six events have probability 2/54.

¹A 1995 analysis by the US Office of Technology Assessment argues that such statistical inference is not useful in detecting money laundering, since only a small fraction of the total number of transfers is bad, and thus even a small false positive rate will identify a much larger number of good transfers.

Exercise 11.7.6
We need to determine the probabilities Pr(d|a) and Pr(d|¬a). If a is true, then Pr(b) = Pr(b|a) and Pr(¬b) = 1 − Pr(b|a). So in this case, Pr(c) = Pr(c|b) Pr(b|a) + Pr(c|¬b)(1 − Pr(b|a)) = Pr(c|b) Pr(b|a) + Pr(c|¬b) − Pr(c|¬b) Pr(b|a). Similarly, if a is false, Pr(b) = Pr(b|¬a) and Pr(¬b) = 1 − Pr(b|¬a), and Pr(c) = Pr(c|b) Pr(b|¬a) + Pr(c|¬b) − Pr(c|¬b) Pr(b|¬a). Finally, if a is true, Pr(d) = Pr(d|c)[Pr(c|b) Pr(b|a) + Pr(c|¬b) − Pr(c|¬b) Pr(b|a)] + Pr(d|¬c)[1 − Pr(c|b) Pr(b|a) − Pr(c|¬b) + Pr(c|¬b) Pr(b|a)], and if a is false, Pr(d) = Pr(d|c)[Pr(c|b) Pr(b|¬a) + Pr(c|¬b) − Pr(c|¬b) Pr(b|¬a)] + Pr(d|¬c)[1 − Pr(c|b) Pr(b|¬a) − Pr(c|¬b) + Pr(c|¬b) Pr(b|¬a)]. So we have expressions for Pr(d|a) and Pr(d|¬a) in terms of the six given quantities.

Exercise 11.7.7

All of the events in the graph are assumed to be independent, so the probability of any setting of true and false outcomes for the events is simply the product of the probabilities for each one.
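Chain computations like the one in Exercise 11.7.6 can be checked mechanically by propagating probabilities forward through a → b → c → d with the Law of Total Probability. This sketch (the class and method names are my own) uses the 2/3 and 1/3 values chosen in Exercise 11.7.5:

```java
// Forward propagation through the chain a -> b -> c -> d.
public class ChainNet {
    // Pr(next) from Pr(cur), Pr(next|cur), and Pr(next|not cur),
    // by the Law of Total Probability.
    public static double propagate(double pCur, double pIfTrue, double pIfFalse) {
        return pIfTrue * pCur + pIfFalse * (1 - pCur);
    }

    public static double prDGivenA(boolean a, double pBA, double pBnotA,
                                   double pCB, double pCnotB,
                                   double pDC, double pDnotC) {
        double pB = a ? pBA : pBnotA;        // Pr(b) once a is fixed
        double pC = propagate(pB, pCB, pCnotB);
        return propagate(pC, pDC, pDnotC);
    }

    public static void main(String[] args) {
        // With the 2/3 and 1/3 choices from Exercise 11.7.5: result is 14/27.
        System.out.println(prDGivenA(true, 2.0/3, 1.0/3, 2.0/3, 1.0/3, 2.0/3, 1.0/3));
    }
}
```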

Exercise 11.7.8

(a) Without looking at the flagpole, he estimates the probability of a hit by the Law of Total Probability as Pr(W) Pr(H|W) + Pr(¬W) Pr(H|¬W) = (0.3)(0.6) + (0.7)(0.8) = 0.74.

(b) With F true, the total probability of H is (0.7)(0.1)(0.8) = 0.056 with no wind plus (0.3)(0.8)(0.6) = 0.144 with it, for a total of 0.200. The total probability of ¬H is (0.7)(0.1)(0.2) = 0.014 without the wind and (0.3)(0.8)(0.4) = 0.096 with it, for a total of 0.110. The probability estimate is 0.200/(0.200 + 0.110) = 20/31 = 64.5%.

(c) With F false, for H we have (0.7)(0.9)(0.8) = 0.504 with no wind, and (0.3)(0.2)(0.6) = 0.036 with it, for a total of 0.540. For ¬H we have (0.7)(0.9)(0.2) = 0.126 with no wind, and (0.3)(0.2)(0.4) = 0.024 with it, for a total of 0.150. The probability estimate is 0.540/(0.540 + 0.150) = 18/23 = 78.3%.
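A short program can confirm these totals. This sketch (the class name is mine) encodes the probabilities used in this solution: wind 0.3, flag-up probabilities 0.1 (no wind) and 0.8 (wind), hit probabilities 0.8 and 0.6:

```java
// Posterior hit probability given the flag observation,
// summing over the hidden wind variable.
public class Flagpole {
    static final double PW = 0.3;            // probability of wind
    static final double[] PF = {0.1, 0.8};   // Pr(F | no wind), Pr(F | wind)
    static final double[] PH = {0.8, 0.6};   // Pr(H | no wind), Pr(H | wind)

    public static double hitGivenFlag(boolean flagUp) {
        double num = 0, den = 0;
        for (int w = 0; w <= 1; w++) {
            double pw = (w == 1) ? PW : 1 - PW;
            double pf = flagUp ? PF[w] : 1 - PF[w];
            num += pw * pf * PH[w];          // joint probability of evidence and H
            den += pw * pf;                  // probability of the evidence
        }
        return num / den;
    }

    public static void main(String[] args) {
        System.out.println(hitGivenFlag(true));    // 20/31, about 0.645
        System.out.println(hitGivenFlag(false));   // 18/23, about 0.783
    }
}
```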

Exercise 11.7.9
Since H has two parents, we need four conditional probabilities: Pr(H|¬C ∩ ¬D), Pr(H|¬C ∩ D), Pr(H|C ∩ ¬D), and Pr(H|C ∩ D). From the data, without smoothing, these numbers would be 1/6, 5/6, 5/6, and 1/6. With smoothing, adding a negative and a positive example for each type, we would have 1/4, 3/4, 3/4, and 1/4. In the case with exactly one of C and D, the NBC's likelihood ratio would be 25 without smoothing, or 9 with it. In the other cases it would be 1/25 without smoothing, or 1/9 with it.

Exercise 11.7.10
Node G (the patient being a Golden retriever) is a source node, with an edge to node O (obstruction). Node I (infection) is a source node, and nodes O and I each have arrows to V (vomiting) and L (loss of appetite). Finally, O has an edge to X (the X-ray report).

To make a fully realized network, we would need the prior probabilities Pr(G) and Pr(I), and the conditional probabilities Pr(O|G), Pr(O|¬G), Pr(X|O), Pr(X|¬O), Pr(V|O ∩ I), Pr(V|O ∩ ¬I), Pr(V|¬O ∩ I), Pr(V|¬O ∩ ¬I), Pr(L|O ∩ I), Pr(L|O ∩ ¬I), Pr(L|¬O ∩ I), and Pr(L|¬O ∩ ¬I).

Figure S-22: The Bayesian network for Exercise 11.7.10.

Exercise 11.9.1

A and B agree on a common pseudorandom generator and seed. Independently, they each generate a bit string s of the same length as A's binary message m. A performs a bitwise XOR on s and m to get a ciphertext c, and then sends it to B. B performs a bitwise XOR on c and s to recover m. The cost savings is that A needs to send only a short secret message, the seed, rather than the long one-time pad. To break the message, an enemy could guess or figure out the generator and seed. If the enemy could somehow obtain part of a message together with its ciphertext, they could generate portions of the pseudorandom string, which might help them generate other pseudorandom strings and decode other messages.

Exercise 11.9.2

If I have even one pair, I can find the corresponding number a which was used in the generator. I can then simply plug a into the generator repeatedly and get new pseudorandom numbers, to which I can apply a bitwise XOR to the following ciphertext strings and decode them. If I know m and a but not c, I can recover c with just two successive values, since if y = ax + c, I have c = y − ax. If I have m but not a and c, I can break the generator with three successive values. Let x, y = ax + c, and z = ay + c (each modulo m) be the three successive values of the generator. Since z − y = a(y − x), I can find a by mod m arithmetic, and then solve for c.

Exercise 11.9.3

The fixed points are 00, 10, 50, and 60. Sixty of the 100 possible range values are taken on.

Exercise 11.9.4
The values are 1, 10, 9, 2, 1, 10, 9, 2, 1, 10.

Exercise 11.9.5
The fractional part of a large floating point number will not be accurately kept by the floating-point notation, particularly if the number of bits of accuracy is smaller than the length of the integer part in binary.

Exercise 11.9.6
Here's one approach that will eventually work, but will be somewhat time-consuming because it tries cards at random each time it looks for one that has not yet come up in the list.

int [] deal (int cards) { // deals "cards" distinct numbers from 1 to 52
    Random r = new Random( );
    int [] answer = new int[cards]; // entry is 0 if not yet filled
    for (int i = 0; i < cards; i++) {
        boolean found = false;
        while (!found) {
            int card = r.nextInt(52) + 1; // try a card at random
            found = true;
            for (int j = 0; j < i; j++)
                if (answer[j] == card) found = false; // already dealt, try again
            if (found) answer[i] = card;}}
    return answer;}

Exercise 11.9.7

String nextString (int length) { // returns a string from {A,...,Z} of the given length
    Random r = new Random( );
    String result = "";
    for (int i = 0; i < length; i++)
        result += (char) ('A' + r.nextInt(26));
    return result;}

Exercise 11.9.8
Her total chance of success is thus approximately 0.493 + (0.507)(0.193 + 0.293) = 0.739, where the first term is 1281072/2598960.

Exercise 12.1.9

This is similar to the previous problem but considerably simpler. Of the 6^5 = 7776 rolls of five dice, only 6 · 5! = 720 fail to have a pair. To fail to get a pair, she would need to fail three times in succession, with probability (720/7776)^3, or about 0.000794.

Exercise 12.1.10

Since there is no town beginning with X, either Halifax or Essex is a winning move.

Exercise 12.2.1

A steady-state distribution (a, b, c) satisfies a = (2/3)b, b = (1/2)a + (2/3)c, and c = (1/2)a + (1/3)b + (1/3)c, which solves to a = 1/4 and b = c = 3/8.

Exercise 12.2.2

Let the two given stochastic matrices be A, with probabilities p_ij, and B, with probabilities q_ij. The product C has probabilities r_ij, with r_ij = Σ_k p_ik q_kj. We want to know that for each i, Σ_j r_ij = 1. But this sum is Σ_j Σ_k p_ik q_kj = Σ_k Σ_j p_ik q_kj = Σ_k p_ik [Σ_j q_kj]. The sum in square brackets is 1 because B is a stochastic matrix, so we are left with Σ_k p_ik, which is 1 because A is a stochastic matrix.
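The row-sum argument is easy to check numerically; here is a minimal sketch with two arbitrary 2 × 2 stochastic matrices (the class and method names are my own):

```java
// Numerical check: the product of two stochastic matrices is stochastic.
public class StochasticProduct {
    public static double[][] multiply(double[][] A, double[][] B) {
        int n = A.length;
        double[][] C = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    C[i][j] += A[i][k] * B[k][j];
        return C;
    }

    // A matrix is stochastic when every row sums to 1.
    public static boolean isStochastic(double[][] M) {
        for (double[] row : M) {
            double sum = 0;
            for (double p : row) sum += p;
            if (Math.abs(sum - 1.0) > 1e-9) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        double[][] A = {{0.5, 0.5}, {0.2, 0.8}};
        double[][] B = {{0.9, 0.1}, {0.3, 0.7}};
        System.out.println(isStochastic(multiply(A, B)));   // true
    }
}
```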

Exercise 12.2.3

Exercise 12.2.2 tells us that since M1 and M2 are both stochastic, so is M1M2. The transpose (M1M2)^T is easily seen to be M2^T M1^T. Since M1 and M2 are both doubly stochastic, their transposes are stochastic, and Exercise 12.2.2 also tells us that (M1M2)^T is stochastic, and hence that M1M2 is doubly stochastic.

Exercise 12.2.4
The uniform distribution, with probability 1/k for each of the k states, is a steady state. (The resulting probability for each state is 1/k times the sum of the values in a column.)

Exercise 12.2.5

The identity matrix is a doubly stochastic matrix, and any distribution is clearly a steady state, so there is more than one as long as we have more than one state.

Exercise 12.2.6

Let x and y be the steady-state probabilities for being inside and outside, respectively.

We are given that x = (3x + y)/5 and y = (4y + 2x)/5, which each tell us that y = 2x. The only possibility is x = 1/3 and y = 2/3.
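Iterating the chain confirms this steady state. A small sketch (names mine), with the transition probabilities implied by the equations above (stay inside 3/5, go out 2/5; come in 1/5, stay out 4/5):

```java
// Two-state chain: index 0 = inside, index 1 = outside.
public class InsideOutside {
    public static double[] iterate(double[] d, int steps) {
        for (int t = 0; t < steps; t++)
            d = new double[]{0.6 * d[0] + 0.2 * d[1],   // new Pr(inside)
                             0.4 * d[0] + 0.8 * d[1]};  // new Pr(outside)
        return d;
    }

    public static void main(String[] args) {
        double[] d = iterate(new double[]{1, 0}, 1000);
        System.out.println(d[0] + " " + d[1]);   // about 1/3 and 2/3
    }
}
```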

Exercise 12.2.7
At t = 0 we have (1, 0, 0) for the probabilities of count 0, 1, or 2 modulo 3. At t = 1 we have (0, 1/2, 1/2), at t = 2 we have (1/2, 1/4, 1/4), and at t = 3 we have (1/4, 3/8, 3/8). The steady-state probability is (1/3, 1/3, 1/3), which we can verify by multiplying by the matrix.

Exercise 12.2.8

At t = 0 we have (1, 0, 0). At t = 1 we have (0.5, 0.2, 0.3). At t = 2 we have (0.25, 0.40, 0.35), and at t = 3 we have (0.125, 0.400, 0.475). The steady-state probability is (0, 0.5, 0.5).

Exercise 12.2.9

The states are described in the hint — each state is a pair consisting of a board position and a number of doubles in the set {0, 1, 2}. Each state has an arrow leaving it for the possible die rolls from 2 through 12, with special arrows for the six ways to throw doubles. In most cases these just represent the move on the board, of 2 through 12 spaces. But a move landing on Chance or Community Chest may move the player elsewhere, leading to more arrows for all the things that might happen due to a card. A throw of doubles increases the second component of the state by one, unless it reaches 3, in which case the player either goes to Jail or (if she is already there) makes the given move from Just Visiting.

Exercise 12.2.10

We first need to determine the state changes for each spin:

• nun: 1 → 1, 2 → 2, 3 → 3
• gimel: 1 → 1, 2 → 1, 3 → 1
• hei: 1 → 1, 2 → 1, 3 → 1
• shin: 1 → 2, 2 → 3, 3 → 3

If x, y, and z are the steady-state probabilities, we have x = (3x + 2y + 2z)/4, y = (x + y)/4, and z = (y + 2z)/4. This gives us 2z = y and then x = 3y, telling us x = 6/9, y = 2/9, and z = 1/9.

Exercise 12.3.1

The Markov chain where every state goes to itself with probability 1 has this property. Its matrix is the identity matrix.

Exercise 12.3.2

If the steady state is (a,b, c,d), we have that a = 0.2b, b = 0.9a+0.8b, c = 0.la+0.6d, and d=c+0.4d.

with a+b+c+d

The first two equations imply a

0, and the second two, along

= 1, give us c = 3/8 and d = 5/8. Since these values are implied by

the equation v = vA, no other steady state distribution is possible.

Exercise 12.3.3
We have a = 0, b = 0.5a + b, c = 0.5c + 0.5d, and d = 0.5c + 0.5d, which tells us that a = 0 and that c = d but nothing else. We could have b take on any value from 0 to 1 inclusive, and then c and d must each be (1 − b)/2.

Figure S-23: The Successive Triangles for the Chain of Exercise 12.3.5

Exercise 12.3.4
If A^t is reducible, so are A^(2t), A^(3t), and A^(kt) for all positive naturals k. To see this, let r and s be states such that there is no path from r to s in the chain represented by A^t. There can be no path from r to s in the chain represented by A^(kt) either, because such a path would also exist (with k times as many edges) in the chain of A^t.

Exercise 12.3.5

The steady state can be derived from the equations a = 0.7b, b = 0.3b + c, and c = a as (7/24, 10/24, 7/24).

Exercise 12.3.6

The diagram is shown in Figure S-23. Each state has a transition probability of 1/2 to each other state. The initial distribution is (1, 0, 0), followed by (0, 1/2, 1/2), (1/2, 1/4, 1/4), (1/4, 3/8, 3/8), and (3/8, 5/16, 5/16). For general i, we claim that the probability for state 1 is 1/3 + (2/3)(−1/2)^i and that for each of the other two states it is 1/3 − (1/3)(−1/2)^i. We verify the base case of i = 0 as 1/3 + 2/3 for state 1 and 1/3 − 1/3 for the other two. Assuming the IH for i, the new probability for state 1 is the sum of half of each of the other two probabilities, which is 1/3 − (1/3)(−1/2)^i = 1/3 + (2/3)(−1/2)^(i+1). The new probability for either of the other states is (1/2)[(1/3 + (2/3)(−1/2)^i) + (1/3 − (1/3)(−1/2)^i)], which is 1/3 + (1/6)(−1/2)^i = 1/3 − (1/3)(−1/2)^(i+1). This turns out to be exactly the same Markov chain as in Exercise 12.2.7.
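The closed form can be checked against direct iteration; this sketch (names mine) compares the two for the first ten steps:

```java
// The three-state walk above: each state moves to the other two with prob 1/2.
public class ThreeStateWalk {
    public static double[] step(double[] p) {
        return new double[]{(p[1] + p[2]) / 2, (p[0] + p[2]) / 2, (p[0] + p[1]) / 2};
    }

    // Claimed closed form for state 1's probability at time i.
    public static double closedForm1(int i) {
        return 1.0 / 3 + (2.0 / 3) * Math.pow(-0.5, i);
    }

    public static void main(String[] args) {
        double[] p = {1, 0, 0};
        for (int i = 0; i <= 10; i++) {
            if (Math.abs(p[0] - closedForm1(i)) > 1e-12)
                throw new AssertionError("mismatch at step " + i);
            p = step(p);
        }
        System.out.println("closed form verified for 10 steps");
    }
}
```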

Exercise 12.3.7
In the lazy walk case, the transition probability for each state is 1/3 to each of the three states, including itself. In fact any initial distribution will result in the steady-state distribution of (1/3, 1/3, 1/3) on the next time step, and stay there forever.

In the very lazy walk case, each state moves to itself with probability 1/2 and to each of the others with probability 1/4. We begin with (1, 0, 0), then (1/2, 1/4, 1/4), then (3/8, 5/16, 5/16), then (11/32, 21/64, 21/64), and (43/128, 85/256, 85/256). It appears that the distribution for time i here is exactly the distribution for time 2i in Exercise 12.3.6, and we can prove this. The base case is the same, and if we have distribution (x, y, y) at time t in the ordinary walk case, one step later we are at (2y, x + y, x + y)/2, and another step later we are at (2x + 2y, x + 3y, x + 3y)/4, which is exactly the result of one step in the very lazy walk from (x, y, y).

Exercise 12.3.8
In the ordinary walk case, we begin at (1, 0, 0, 0), move to (0, 1/2, 0, 1/2), then to (1/2, 0, 1/2, 0), and clearly continue alternating between these two distributions forever.

In the lazy walk case, each state has probability 1/3 each to stay where it is or to move to either of its neighbors. The first five distributions are: (1, 0, 0, 0), (1, 1, 0, 1)/3, (3, 2, 2, 2)/9, (7, 7, 6, 7)/27, and (21, 20, 20, 20)/81. The general term for any even time step i is (1/4 + (3/4)(1/3)^i, 1/4 − (1/4)(1/3)^i, 1/4 − (1/4)(1/3)^i, 1/4 − (1/4)(1/3)^i), and the general term for any odd time step i is (1/4 + (1/4)(1/3)^i, 1/4 + (1/4)(1/3)^i, 1/4 − (3/4)(1/3)^i, 1/4 + (1/4)(1/3)^i). The proof is similar to that in Exercise 12.3.6.

In the very lazy walk case, each state has a probability of 1/2 to stay where it is and 1/4 to move to either of its neighbors. The first five distributions are: (1, 0, 0, 0), (2, 1, 0, 1)/4, (3, 2, 1, 2)/8, (5, 4, 3, 4)/16, and (9, 8, 7, 8)/32. So it appears that for general positive time step i we have (1/4 + (1/2)^(i+1), 1/4, 1/4 − (1/2)^(i+1), 1/4), and we can verify this by induction.

Exercise 12.3.9

Consider an arbitrary node of degree d. It begins with probability d/2e, and it sends equal probability 1/2e to each of its d neighbors. For the same reason, each of its neighbors will send exactly 1/2e to it, so it finishes with probability d/2e again.

Exercise 12.3.10

(a) Every tree is bipartite, and if we color the nodes red or blue, every transition goes from red to blue or vice versa. If we start at a red node, all the probability is on red nodes on even time steps and on blue nodes on odd time steps.

(b) From the above, the only possibility is if the total probabilities on red and blue nodes are each 1/2. If we look at the Markov process from taking two steps at a time in this process, we have two separate processes, one on the red nodes and one on the blue, and since each new process has a positive chance to remain where it is, it will be irreducible.
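The claim of Exercise 12.3.9 can be verified on a small example graph; this sketch (the graph and names are my own) checks that the degree-proportional distribution is unchanged by one step of the walk:

```java
// Random walk on an undirected graph: from node u, move to a uniform neighbor.
public class GraphWalk {
    // Adjacency lists for a 4-node example: path 0-1-2 plus the edge 1-3 (e = 3).
    static final int[][] ADJ = {{1}, {0, 2, 3}, {1}, {1}};

    public static double[] step(double[] p) {
        double[] next = new double[p.length];
        for (int u = 0; u < p.length; u++)
            for (int v : ADJ[u])
                next[v] += p[u] / ADJ[u].length;
        return next;
    }

    public static void main(String[] args) {
        int e = 3;                                   // number of edges
        double[] p = new double[ADJ.length];
        for (int u = 0; u < ADJ.length; u++) p[u] = ADJ[u].length / (2.0 * e);
        double[] q = step(p);                        // one step of the walk
        for (int u = 0; u < p.length; u++)
            if (Math.abs(p[u] - q[u]) > 1e-12) throw new AssertionError();
        System.out.println("deg/2e distribution is stationary");
    }
}
```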

Exercise 12.5.1
In the given game the player sees a die roll and then makes a move. If we change the game to have the player announce a complete policy before rolling, that is, what move they will make given any of the 21 possible die rolls, then this policy can be viewed as a "player action" in the new game. This solves the problem in theory, though in practice the number of possible policies is enormous.

Exercise 12.5.2

Make a state for each pair of states (i, j) in the original game. When the state would move from i to j in the old game, make this two moves in the new game. We go from i to (i, j) and collect the reward r(i, j). Then on the next move we go from (i, j) to j with probability 1, whatever the player's choice of actions, and collect a zero reward.

The new game has twice as many moves and many more states, but the strategy choices and outcomes are exactly the same as for the old game with rewards on the edges.

Exercise 12.5.3
We have a state for each amount of money (we must decide with what precision we want to keep track of our account balance). The player has three possible actions from each state, representing the three investment choices of stocks, bonds, and cash. For each action there are three arrows based on the three market outcomes, in each case going to the state representing the new balance after the gain or loss. (For example, if the player action is "cash", all three of these arrows are self-loops.) States with a balance of $1,200,000 or higher are "winning states", where the player will stay in cash in order to guarantee finishing with enough money. In other states, the best action may depend on how many years remain in the game, as we'll see in the next section.

Exercise 12.5.4

The distribution (1/n,1/n,...,1/n) is a steady state distribution, because the new probability for each state i is 1/n times the sum of the entries in the i’th column, which is just 1/n because the matrix is doubly stochastic. If the chain is irreducible and aperiodic, we know from the Steady State Theorem that there cannot be another steady-state distribution, and that the chain will approach this distribution from any starting point. Finally, if we square either the matrix from action A or the matrix

from action B, we get a matrix with no zeros. By the result of Problem 12.3.2, this guarantees that chains from both the original matrices are irreducible and aperiodic.

Exercise 12.5.5

In fact it is not possible. It is certainly possible for six of the eight policies. Because these two matrices are symmetric under rotation, we can deduce from the fact that BBA is best for 0-1-2 that ABB is best for 2-0-1 and that BAB is best for 1-2-0. Similarly, since BAA is best for 0-4-5, we know that ABA is best for 5-0-4 and that AAB is best for 4-5-0. But there remain policies AAA and BBB, which have the same steady-state distribution and thus have the same average reward as each other in the long run for any reward function. If the rewards from each of the three states are equal, then AAA and BBB do as well as any of the others, but in this case all eight are the same. If one state has a better reward, though, a policy that will spend more than a third of the time in that state will be better than both AAA and BBB.

Exercise 12.5.6

• With one black sheep and one white sheep, the Player gets a score of 1 by removing the white sheep. If the bleating goes ahead, with equal probability either the white sheep becomes black, giving him a score of 2, or the black sheep becomes white, giving him a score of 0. His expected score is still (0.5)(2 + 0) = 1.

• With b > 1 black sheep and one white one, removing the white sheep ends the game with a score of b. Allowing the bleating to go on, however, gives a probability of b/(b+1) for the white sheep to become black, ending the game with a score of b + 1. This contributes (b/(b+1))(b + 1) = b to the total score. If instead a black sheep should become white, there is still at least one remaining black sheep left, so the Player's expected score in that case is certainly positive, so the expected score overall in the bleating case is strictly greater than b.

Exercise 12.5.7
We'll omit the diagrams. The transition matrix for NNN from Problem 12.3.10 has rows (0.2, 0.8, 0.0), (0.2, 0.4, 0.4), and (0.0, 0.4, 0.6), and the new transition matrix from this problem for CCC has rows (0.4, 0.4, 0.2), (0.4, 0.2, 0.4), and (0.0, 0.2, 0.8). The other six matrices mix these rows, taking row i from the N matrix or the C matrix according to the policy's action in state i:

NNC: (0.2, 0.8, 0.0), (0.2, 0.4, 0.4), (0.0, 0.2, 0.8)
NCN: (0.2, 0.8, 0.0), (0.4, 0.2, 0.4), (0.0, 0.4, 0.6)
NCC: (0.2, 0.8, 0.0), (0.4, 0.2, 0.4), (0.0, 0.2, 0.8)
CNN: (0.4, 0.4, 0.2), (0.2, 0.4, 0.4), (0.0, 0.4, 0.6)
CNC: (0.4, 0.4, 0.2), (0.2, 0.4, 0.4), (0.0, 0.2, 0.8)
CCN: (0.4, 0.4, 0.2), (0.4, 0.2, 0.4), (0.0, 0.4, 0.6)

Exercise 12.5.8

We must assume that her move depends only on the current command, not her direction on the field or the prior history of any moves she has made. I'll omit the diagram, but we have states N, E, S, and W in clockwise order, and four arrows for each state: (L, 0.9) and (R, 0.1) in the counterclockwise direction and (L, 0.1) and (R, 0.9) in the clockwise direction.

She has made either 0, 1, 2, 3, or 4 correct turns, with probability C(4, i)(0.9)^i (0.1)^(4−i) for each number i. She is facing north if i is even, and facing south if it is odd. The probability that i is even is 0.6561 + 6(0.0081) + 0.0001 = 0.7048.

Again, she will be facing north if and only if she has made an even number of correct turns (or an even number of incorrect ones). Her expected number of errors is 10, but the variance of the number is 100(0.9)(0.1) = 9, and thus the standard deviation is 3. We would expect that such a random variable would be close to equally likely to be even or odd. If she made no errors in this case, she would be facing south. But whether she finishes facing north or south again depends on whether she makes an even or odd number of errors, and just as in part (d) we would expect these outcomes to be equally likely.

Exercise 12.5.9

If you are in A, staying gives you an expected reward of (0.8)3 + (0.1)2 + (0.1)0 = 2.6, while jumping gives you (0.0)3 + (0.5)2 + (0.5)0 = 1.0, so you should stay. In B, staying gets you (0.1)3 + (0.8)2 + (0.1)0 = 1.9, while jumping gives you (0.5)3 + (0.0)2 + (0.5)0 = 1.5, so you should stay. In C, staying gives you (0.1)3 + (0.1)2 + (0.8)0 = 0.5, while jumping gives you (0.5)3 + (0.5)2 + (0.0)0 = 2.5, so you should jump. The overall strategy is SSJ.

In strategy SSJ, the steady-state probabilities a, b, and c for the three states are given by (a, b, c) = (0.8a + 0.1b + 0.5c, 0.1a + 0.8b + 0.5c, 0.1a + 0.1b), and this solves to a = 5/11, b = 5/11, and c = 1/11. The steady-state expected reward per turn is 3(5/11) + 2(5/11) + 0(1/11) = 25/11, or about 2.27.
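Power iteration on the SSJ chain confirms both the steady state and the expected reward; a minimal sketch (names mine):

```java
// Steady-state expected reward for the SSJ policy above.
public class SsjReward {
    static final double[][] P = {        // transition matrix under SSJ
            {0.8, 0.1, 0.1},
            {0.1, 0.8, 0.1},
            {0.5, 0.5, 0.0}};
    static final double[] REWARD = {3, 2, 0};

    public static double steadyStateReward() {
        double[] d = {1, 0, 0};
        for (int t = 0; t < 10000; t++) {   // iterate d := dP until convergence
            double[] next = new double[3];
            for (int j = 0; j < 3; j++)
                for (int i = 0; i < 3; i++)
                    next[j] += d[i] * P[i][j];
            d = next;
        }
        double r = 0;
        for (int i = 0; i < 3; i++) r += d[i] * REWARD[i];
        return r;
    }

    public static void main(String[] args) {
        System.out.println(steadyStateReward());   // 25/11, about 2.27
    }
}
```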

Exercise 12.5.10

In strategy AA, with probability r of going from P to Q and s of going from Q to P, we can compute that the steady state distribution (x, y) satisfies the rule (x, y) = (x(1 − r) + ys, rx + (1 − s)y), giving us x = s/(r + s) and y = r/(r + s). Using action B in state P replaces these r's by t's, and using B in state Q replaces these s's by u's. To maximize the steady-state fraction of the time we spend in state P, we prefer r to t if it is smaller, and s to u if it is larger.

Exercise 12.6.1

The expected return from kicking is 0.5 points, and from a two-point attempt is 0.8 points. But near the end of the game, if the only important thing is to get at least one point, the kick has a 50% chance of getting it while the two-point attempt has only a 40% chance. So, for example, if we have just tied the score with a touchdown on the last play, we should still kick to maximize our chance of winning.

Exercise 12.6.2
We use induction on n, starting from the base case of n = 3, which is proved in the text. Along with showing that BAA outperforms BBA, we will also prove that the expected reward from state s0 is greater than that from s1, and that the reward from s1 is greater than the reward from s2, in the n-move horizon game. This is true for

n = 3 by the calculation in the text. For the (n + 1)-move horizon game, we play the one-move horizon game with a reward function given by the expected reward from the n-move game. By the inductive hypothesis, we can see that B gives a better reward from state s0, A from s1, and A from s2. For example, from s1, A offers (3/4)rn(1) + (1/4)rn(2) while B offers (1/4)rn(1) + (3/4)rn(2), so that A is better because rn(1) > rn(2).

Exercise 12.6.3
We simply need to note that after t turns, if the expected reward from any policy using the original function is x, then the reward under the same policy using the new function is ax + bt. This is because if we write the old reward sequence of moves as r_i1 + ... + r_it, the new reward is ar_i1 + b + ar_i2 + b + ... + ar_it + b = a(r_i1 + ... + r_it) + bt. So if one sequence of moves is better than another with the original function, it remains better with the new function. The analogous result does not hold if a is negative, or if we are comparing two state sequences of different lengths.

Exercise 12.6.4

The expected number of turns played, E, obeys the rule E = (1 − γ)(1) + γ(1 + E), which simplifies to E = 1 + γE, or E = 1/(1 − γ). Thus the expected number of turns is 2 for γ = 0.5 and 10 for γ = 0.9. Looking at our example MDP, we know that the optimal play in the 10-move horizon game is to play BAA for the first eight moves, then play BBA for the last two. We would not expect the 0.9-discounted expected return to be the same as the return from this strategy, because it corresponds to always playing BAA.

Exercise 12.6.5

The first statement follows from the definition of expected value: both the expected reward from being in a random state and the dot product are Σ_i d_i r_i. The second statement follows from the Markov Chain Theorem and the first statement: we know that dA^t is a row vector giving the probability distribution of the position after t steps, and again the dot product of this row vector with the reward function gives the expected reward for a random position chosen from that distribution.
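Both statements can be illustrated with a small example. The two-state chain A, distribution d, and reward vector r below are hypothetical, added here for illustration; they are not the MDP from the text.

```python
# Expected reward of a random state: the dot product of the distribution d
# with the reward vector r, and of d A (one step of the chain) with r.
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def step(d, A):
    # one step of the Markov chain: row vector times row-stochastic matrix
    n = len(A[0])
    return [sum(d[i] * A[i][j] for i in range(len(d))) for j in range(n)]

d = [0.5, 0.5]                    # initial distribution
A = [[0.9, 0.1], [0.2, 0.8]]      # transition matrix (rows sum to 1)
r = [1.0, 0.0]                    # reward function

expected_now = dot(d, r)          # sum_i d_i r_i
d1 = step(d, A)                   # distribution after one step
expected_after_one = dot(d1, r)
```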

Exercise 12.6.6

We use induction on n. For any i, let f(i) and g(i) be the expected reward after i turns, starting in P and Q respectively, if we use the optimal strategy. For n = 0, f(0) = 0 and g(0) = 0 no matter which strategy we use. For any natural n, we compute f(n+1) as the maximum of 1 + rg(n) + (1 − r)f(n) and 1 + tg(n) + (1 − t)f(n). Similarly, g(n+1) is the maximum of sf(n) + (1 − s)g(n) and uf(n) + (1 − u)g(n). If we know that f(n) > g(n), we know that taking the minimum of r and t, and the maximum of s and u, are the optimal strategies for this turn. Without loss of generality, with r and s as the choices, we can compute f(n+1) − g(n+1) as 1 + (f(n) − g(n))(1 − r − s). So we can prove by induction that 0 ≤ f(n) − g(n).

We want to prove (P(a) ∧ W(a)) → B(a). Since this is an implication, we can use a direct proof, assuming P(a) ∧ W(a) and trying to prove B(a). As before, we use the Rule of Specification on the premise to get W(a) → B(a) for this particular arbitrary a, whereupon we can conclude B(a) by propositional calculus rules. Since a was arbitrary, and we proved (P(a) ∧ W(a)) → B(a) without any assumptions, the Rule of Generalization allows us to conclude ∀x : [(P(x) ∧ W(x)) → B(x)].

You may have noticed that, like the propositional calculus, the predicate calculus seems to be able

to prove only obvious statements. If the truth of a statement has nothing to do with the meaning of the predicates, of course, we can't expect to get any insight about the meaning. It's somewhat more difficult (and beyond the scope of this book) to prove that the predicate calculus is complete (that is, that all true statements are provable), but this can be done. The real importance of these proof strategies, though, is that they remain valid and useful even when other proof rules are added that do depend on the meaning of the predicates. We'll see examples of this starting with the case of number theory in Chapter 3.

2.6.3 Exercises

E2.6.1 Indicate which quantifier proof rule to use in each situation, and outline how to use it:

(a) The desired conclusion is "All trout live in trees".

(b) You have the premise "Tommy lives in trees" and Tommy is a trout.

(c) You have the premise "All trout live in trees" and Tommy is a trout.

(d) You have the premise "Some trout lives in trees".

E2.6.2

Prove that the statements ∀x : ∀y : P(x, y) and ∀y : ∀x : P(x, y) are logically equivalent, by using the proof rules from this section to prove that each implies the other.

E2.6.3 Repeat Exercise 2.6.2 for the statements ∃x : ∃y : P(x, y) and ∃y : ∃x : P(x, y).

E2.6.4 Use the proof rules to prove the statement ∀y : ∃x : P(x, y) from the premise ∃u : ∀v : P(u, v). Is the converse of this implication always true?

E2.6.5 The law of vacuous proof can easily be combined with the Rule of Generalization to prove that any proposition at all holds for all members of an empty class. Demonstrate this by

Exercise 1.4.8

p → q
q → p
p ∧ (q → ¬q)
(p → q) → (¬q ∧ ¬p)

Exercise 1.4.9

¬p ∨ q, Either mackerel are not fish or trout live in trees, or both.
¬q ∨ p, Either trout do not live in trees or mackerel are fish, or both.
p ∧ (¬q ∨ ¬q), Mackerel are fish and either trout do not live in trees or trout do not live in trees, or both.
¬(p ⊕ q) ∨ (¬q ∧ ¬p), Either it is not the case that either mackerel are fish or trout live in trees, but not both, or both trout do not live in trees and mackerel are not fish, or both.

Exercise 1.4.10

Mackerel are fish if and only if trout live in trees. Trout live in trees if and only if mackerel are fish. Mackerel are fish, and trout live in trees if and only if trout do not live in trees. Mackerel are fish or trout live in trees, or both, if and only if trout do not live in trees.

Exercise 1.5.1

The set of black horses
The set of animals that are either female or are black sheep, or both
The set of black animals that are either female or sheep, or both
The set of animals that are not female horses
The set of female animals that are neither sheep nor horses
The set of animals that are either (a) horses or (b) female or sheep, but not both black and not female, but not both (a) and (b)

Exercise 1.5.2

F ∩ H
(F ∩ S) ∪ (B ∩ H)
B ∪ S
(F ∩ B ∩ S) ∪ (B ∩ H)

Exercise 1.5.3

{x : x either has five or more letters or has two a's in a row, or both}
{x : x both has five or more letters and has two a's in a row}
{x : x has five or more letters but does not have two a's in a row}
{x : x either has five or more letters or does not both have five or more letters and have two a's in a row}

Exercise 1.5.4

(a) {0, 1, 2, 3, 4, 5, 8}
(b) ∅
(c) {0, 1, 3, 5, 8}
(d) {5, 8}
(e) {x : x is even}, same as E
(f) {0, 1, 2, 3, 4, 5, 7, 9, 11, ...}
(g) {1, 3, 5}
(h) {2, 4, 6, 10, 12, 14, ...} or {x : x is even but not 0 or 8}
(i) {0, 4, 6, 7, 9, 10, 11, ...}
(j) {0, 1, 3, 4, 5, ...} or {x : x ≠ 2}, same as C
(k) {0, 1, 3}
(l) {6, 8, 10, ...} or {x : x > 6 and x is even}

Exercise 1.5.5

See Figure S-1 for the diagram.

Figure 16-1: The Venn Diagram for Exercise 1.5.5 (©Kendall Hunt Publishing Company)

Exercise 1.5.6

Exercise 1.6.1

(truth table)

Exercise 1.6.2

It is not a tautology, because the second column of this truth table (representing the →) is not all ones.

(truth table)

Exercise 1.6.3 The Venn diagram is in Figure S-2.

(truth table for Exercise 1.6.3)

Figure 16-2: The Venn Diagram for Exercise 1.6.3 (©Kendall Hunt Publishing Company)

Exercise 1.6.4

The truth tables show that the two compound propositions have different truth values in two of the eight possible situations, those corresponding to p ∧ ¬q ∧ ¬r and to p ∧ q ∧ ¬r.

p q r | p ∨ (q ∧ r) | (p ∨ q) ∧ r
0 0 0 |      0      |      0
0 0 1 |      0      |      0
0 1 0 |      0      |      0
0 1 1 |      1      |      1
1 0 0 |      1      |      0
1 0 1 |      1      |      1
1 1 0 |      1      |      0
1 1 1 |      1      |      1

Exercise 1.6.5

The table below has the sixteen possible columns labeled 0 through 15, with the explicit compound proposition at the bottom of each column.

p q |  0  |  1  |  2   | 3 |  4   | 5 |  6  |  7
0 0 |  0     0     0     0    0     0    0     0
0 1 |  0     0     0     0    1     1    1     1
1 0 |  0     0     1     1    0     0    1     1
1 1 |  0     1     0     1    0     1    0     1
    |  0    p∧q   p∧¬q   p   ¬p∧q   q   p⊕q   p∨q

p q |  8   |  9  | 10 | 11  | 12 | 13  |  14   | 15
0 0 |  1      1    1    1     1    1     1      1
0 1 |  0      0    0    0     1    1     1      1
1 0 |  0      0    1    1     0    0     1      1
1 1 |  0      1    0    1     0    1     0      1
    | ¬p∧¬q  p↔q  ¬q   p∨¬q  ¬p   ¬p∨q  ¬p∨¬q   1
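The sixteen columns can also be generated mechanically: column n, read down the rows pq = 00, 01, 10, 11, is just n written in binary. This generator is an added illustration, not part of the original solution.

```python
# Column n of the sixteen possible truth tables over p, q: its four row
# values, top to bottom, are the binary digits of n (most significant first).
columns = []
for n in range(16):
    bits = [(n >> shift) & 1 for shift in (3, 2, 1, 0)]
    columns.append(bits)

# Spot-check a few named columns against the table above.
p_and_q = columns[1]    # [0, 0, 0, 1]
p_xor_q = columns[6]    # [0, 1, 1, 0]
p_iff_q = columns[9]    # [1, 0, 0, 1]
```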

Exercise 1.6.6

x y z | ITE(x, y, z)
0 0 0 |      0
0 0 1 |      1
0 1 0 |      0
0 1 1 |      1
1 0 0 |      0
1 0 1 |      0
1 1 0 |      1
1 1 1 |      1

On the rows with x = 0 the value equals z, and on the rows with x = 1 it equals y.

Exercise 1.6.7

The columns for the atomic variables are not consistent. The column for the first p agrees with that for the second q, and vice versa. Every column with the same atomic variable at the top should be the same.
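The ITE table above is just "if x then y else z", which can be stated and checked directly; this small sketch is an added illustration.

```python
# ITE(x, y, z): value is y when x holds, z otherwise.
def ite(x, y, z):
    return y if x else z

# Rebuild the value column in the row order x y z = 000 ... 111.
table = [(x, y, z, ite(x, y, z))
         for x in (0, 1) for y in (0, 1) for z in (0, 1)]
```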

Exercise 1.6.8

The compound proposition is a contradiction if and only if the column for the final operation is all zeros. It is satisfiable if and only if there are one or more ones in that column. (That is, it is satisfiable if and only if it is not a contradiction.)
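This criterion translates directly into a brute-force checker over all truth assignments; the helper names below are my own, added as an illustration.

```python
# Satisfiability and contradiction tests via the final truth-table column.
from itertools import product

def final_column(f, nvars):
    return [f(*row) for row in product([False, True], repeat=nvars)]

def satisfiable(f, nvars):
    return any(final_column(f, nvars))       # at least one 1 in the column

def contradiction(f, nvars):
    return not any(final_column(f, nvars))   # all zeros
```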

Exercise 1.6.9

If you know that x is true, for example, you may ignore the 2^(k−1) lines that have x false, leaving only 2^(k−1) lines to consider, half the original number. If you know that p → q is false, then you know both that p is true and that q is false. Only a quarter of the lines of the table, 2^(k−2) of them, have these two properties.

Exercise 1.6.10

If the first old column (for P, say) has a 1, you may fill in a 1 for P ∨ Q without looking at Q. If P is 0, you may fill in 1 for P → Q without looking at Q. But in the other two cases, you cannot be sure of the result of P ∧ Q or P ⊕ Q without looking at both P and Q.

Exercise 1.7.1

(truth tables for parts (a) through (f))

Exercise 1.7.2

(a)–(c) (truth tables)

Exercise 1.7.3

p ∨ q                  Premise
¬(¬p) ∨ q              Double Negation
¬p → q                 Definition of Implication
¬q → ¬(¬p)             Contrapositive
¬q → p                 Double Negation
q ∨ p                  Definition of Implication

p ⊕ q                  Premise
(p ∧ ¬q) ∨ (¬p ∧ q)    Definition of Exclusive Or
(¬q ∧ p) ∨ (q ∧ ¬p)    Commutativity of AND (twice)
(q ∧ ¬p) ∨ (¬q ∧ p)    Commutativity of OR
q ⊕ p                  Definition of Exclusive Or

Exercise 1.7.4

Simplest is P = 0 and Q = 1. The compound proposition (r ⊕ 0) → (r ⊕ 1) simplifies to r → ¬r, which is not a tautology because it is false when r is true.

Exercise 1.7.5

(p ∨ q) ∨ r            Premise
¬[¬(p ∨ q) ∧ ¬r]       DeMorgan Or-To-And
¬[(¬p ∧ ¬q) ∧ ¬r]      DeMorgan Or-To-And
¬[¬p ∧ (¬q ∧ ¬r)]      Associativity of AND
¬[¬p ∧ ¬(q ∨ r)]       DeMorgan And-To-Or
p ∨ (q ∨ r)            DeMorgan And-To-Or

Exercise 1.7.6

(a) Left Separation, a ∨ ¬b for p, c → d for q

(b) Excluded Middle, r → ¬p for p

(c) Contrapositive, a ∧ b for p, b for q

Exercise 1.7.7

(a) ¬((a ∧ ¬b) ∨ (a ⊕ b)) ↔ (¬(a ∧ ¬b) ∧ ¬(a ⊕ b))

(c) ((a ∧ b) ↔ (a ∨ b)) ↔ (((a ∧ b) → (a ∨ b)) ∧ ((a ∨ b) → (a ∧ b)))

p ↔ q                        Premise
(p → q) ∧ (q → p)            Equivalence and Implication
(¬p ∨ q) ∧ (¬q ∨ p)          Definition of Implication
¬[¬(¬p ∨ q) ∨ ¬(¬q ∨ p)]     DeMorgan And-To-Or
¬[(p ∧ ¬q) ∨ (q ∧ ¬p)]       DeMorgan Or-To-And
¬(p ⊕ q)                     Definition of Exclusive Or

p → q                        Premise
¬p ∨ q                       Definition of Implication
q ∨ ¬p                       Commutativity of OR
¬(¬q) ∨ ¬p                   Double Negation
¬q → ¬p                      Definition of Implication
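The equivalences derived above can be confirmed semantically by exhausting all truth assignments; this check is an added illustration, not part of the original solution.

```python
# Verify: p <-> q is equivalent to not (p XOR q), and p -> q is equivalent
# to its contrapositive (not q) -> (not p), for every assignment.
from itertools import product

for p, q in product([False, True], repeat=2):
    assert (p == q) == (not (p != q))                    # iff vs negated xor
    assert ((not p) or q) == ((not (not q)) or (not p))  # implication vs contrapositive
```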

Exercise 1.7.8

1. ¬(q → r)     Second Premise
2. ¬(¬q ∨ r)    Definition of Implication
3. q ∧ ¬r       DeMorgan
4. q            Left Separation
5. p ∨ ¬q       First Premise
6. ¬q ∨ p       Commutativity of OR
7. q → p        Definition of Implication
8. p            Modus Ponens, lines 4 and 7
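The derivation can be double-checked semantically: in every truth assignment where both premises hold, the conclusion p holds as well. This brute-force check is an added illustration.

```python
# In every row where p OR (not q) and not(q -> r) both hold, p holds too.
from itertools import product

valid = all(p
            for p, q, r in product([False, True], repeat=3)
            if (p or not q) and not ((not q) or r))
assert valid
```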

Exercise 1.7.9

1. ¬(¬p)      Second Premise
2. p ∨ q      First Premise
3. ¬p → q     Definition of Implication
4. q          Modus Ponens, lines 1 and 3

Exercise 1.7.10

1. p ∧ (q ∨ r)                              Premise for first proof
2. p                                        Left Separation, line 1
3. q                                        Assumption for Case 1 of first proof
4. p ∧ q                                    Conjunction, lines 2 and 3
5. (p ∧ q) ∨ (p ∧ r)                        Left Joining, conclusion of Case 1
6. ¬q                                       Assumption for Case 2 of first proof
7. q ∨ r                                    Right Separation, line 1
8. r                                        Tertium Non Datur (see Exercise 1.7.8), lines 6 and 7
9. p ∧ r                                    Conjunction, lines 2 and 8
10. (p ∧ q) ∨ (p ∧ r)                       Right Joining, conclusion of Case 2
11. (p ∧ (q ∨ r)) → ((p ∧ q) ∨ (p ∧ r))     Proof by Cases, end of first proof
12. (p ∧ q) ∨ (p ∧ r)                       Premise for second proof
13. p ∧ q                                   Assumption for Case 1 of second proof
14. p                                       Left Separation
15. q                                       Right Separation, line 13
16. q ∨ r                                   Left Joining
17. p ∧ (q ∨ r)                             Conjunction, lines 14 and 16, conclusion of Case 1
18. ¬(p ∧ q)                                Assumption for Case 2 of second proof
19. p ∧ r                                   Tertium Non Datur, lines 12 and 18
20. r                                       Right Separation
21. q ∨ r                                   Right Joining
22. p ∧ (q ∨ r)                             Conjunction, lines 14 and 21, conclusion of Case 2
23. ((p ∧ q) ∨ (p ∧ r)) → (p ∧ (q ∨ r))     Proof by Cases, end of second proof
24. ((p ∧ q) ∨ (p ∧ r)) ↔ (p ∧ (q ∨ r))     Equivalence and Implication

Exercise 1.8.1

(0 ∧ p) → 0, 0 → (0 ∧ p), and therefore (0 ∧ p) ↔ 0.

Exercise 1.8.2

2. ¬r ∧ (w → d)       Second Premise
3. c                  Assumption for Proof By Cases
4. c ∨ d              Right Joining
5. ¬c                 Assumption for other half of Proof By Cases
6. ¬r                 Left Separation (from second premise)
7. (r ∨ c) ∨ w        Commutativity and Associativity of OR, from first premise
8. ¬(r ∨ c) → w       Definition of Implication
9. ¬c ∧ ¬r            Conjunction from lines 5 and 6
10. ¬(r ∨ c)          DeMorgan Or-To-And
11. w                 Modus Ponens (lines 10, 8)

12. w → d                 Right Separation, second premise
13. d                     Modus Ponens (lines 11, 12)
14. c ∨ d                 Left Joining
15. (premises) → (c ∨ d)  Proof By Cases

Exercise 1.8.3

(a) Subgoals are (p ∧ q ∧ q) → (p ∨ q) and (p ∧ q ∧ ¬q) → (p ∨ q). For the first proof, derive q by Right Separation and then p ∨ q by Left Joining. For the second, use associativity of AND and Excluded Middle to get p ∧ 0, which is 0 by Right Zero, and derive p ∨ q from 0 by Vacuous Proof.

(b) Subgoals are (p ∧ q) → p and p → (p ∨ q). But each of these subgoals can be proved by a single rule, Left Separation and Right Joining respectively.

Exercise 1.8.4

Contrapositive gives a premise of ¬(p ∨ q) and a conclusion of ¬(p ∧ q). The proof goes through ¬p ∧ ¬q (DeMorgan), ¬p (Left Separation), and ¬p ∨ ¬q (Right Joining), with the last step DeMorgan again. Contradiction gives a premise of (p ∧ q) ∧ ¬(p ∨ q) and a conclusion of 0. The premise can be used to derive p ∧ q ∧ (¬p ∧ ¬q) (by DeMorgan), and then commutativity and associativity of AND can be used to get p ∧ ¬p ANDed with other things, which is 0 by Excluded Middle and the Zero rules.
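The DeMorgan steps used in both proofs are easy to confirm over all four assignments; this check is an added illustration.

```python
# Both DeMorgan laws, verified exhaustively for p, q in {False, True}.
from itertools import product

demorgan_holds = all(
    (not (p or q)) == ((not p) and (not q)) and
    (not (p and q)) == ((not p) or (not q))
    for p, q in product([False, True], repeat=2)
)
assert demorgan_holds
```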

Exercise 1.8.5

(a) Get p by Left Separation.

(b) Get ¬p ∧ ¬(q ∨ r) by DeMorgan Or-to-And.

(c) Change the second ANDed component to q → r by Contrapositive.

Exercise 1.8.6

(a) Use the Vacuous Proof rule to get the desired implication from p.

(b) Convert r → (¬p ∧ q) to the desired OR statement.

(c) Use Modus Ponens on r → p and r to get p.

Exercise 1.8.7

Letting s mean "you say a word" and c mean "I will cut off your heads", our premise is (s → c) ∧ (¬s → c). We can easily derive c from this premise:


1. (s → c) ∧ (¬s → c)          Premise
2. s                           Assumption for Case 1
3. s → c                       Left Separation, line 1
4. c                           Modus Ponens, lines 2 and 3, conclusion of Case 1
5. ¬s                          Assumption for Case 2
6. ¬s → c                      Right Separation, line 1
7. c                           Modus Ponens, lines 5 and 6, conclusion of Case 2
8. ((s → c) ∧ (¬s → c)) → c    Proof By Cases, end of proof

Exercise 1.8.8

(a) Since we have p ∨ r as the front of an implication in the premise, we would like to form it so we can use Modus Ponens with that implication.

(b) We have only done one of the two cases: our proof of s used an assumption, but we eventually need to prove that s holds with or without that assumption.

(c) We need to put the statement of line 7 into a form where we can use Modus Ponens with line 6. If we instead used Tertium Non Datur from Exercise 1.7.8, we could use line 7 directly without this transformation.

(d) In line 3, we were operating under the assumption of the first case, and now we are not. We cannot guarantee that something true in one case can be taken to use in a different case.

Exercise 1.8.9

1. p ∨ (q ∧ r)                               Assumption for first half
2. p                                         Assumption for Case 1 of first half
3. p ∨ r                                     Left Joining, conclusion of Case 1
4. ¬p                                        Assumption for Case 2 of first half
5. q ∧ r                                     Tertium Non Datur (Exercise 1.7.8), lines 1 and 4
6. r                                         Right Separation
7. p ∨ r                                     Right Joining, conclusion of Case 2
8. (p ∨ (q ∧ r)) → (p ∨ r)                   Proof By Cases, end of first half
9. (p ∨ r) → s                               Assumption for second half (second premise)
10. (p ∨ (q ∧ r)) → s                        Hypothetical Syllogism, lines 8 and 9
11. ((p ∨ (q ∧ r)) ∧ ((p ∨ r) → s)) → s      Conclusion of second half, end of proof

Exercise 1.8.10

We need to show that P still implies C in the other case, where q ↔ r is false. So we need a new proof starting from P ∧ ¬(q ↔ r) or P ∧ (q ⊕ r).

Exercise 1.10.1

(a) Signature "real x, real y, real z"
(b) Signature "real x, real y"
(c) Signature "team X"
(d) Signature "player p, team ..."

Exercise 1.10.2

(a) "The strings a and ab start with the same letter, and either ba has exactly two letters ..."

(b) "If a and a start with the same letter, then either aaa starts and ends with the same letter or b has exactly two letters if and only if b and aba start with the same letter." This is TRUE (1 → (1 ∨ (1 ↔ 1)) = 1).

(c) "If a has exactly two letters, then aa starts and ends with the same letter and either b and bbb start with the same letter, or λ has exactly two letters, but not both." This is also TRUE (0 → (1 ∧ (1 ⊕ 0)) = 1).

... → E(n + 1) respectively, using quantifier proof rules. You do not need to justify standard facts about addition and multiplication.

Exercise 1.10.3

(a) (P(ab) ∧ R(w, ab)) → Q(w); w is the only free variable.

(b) [Q(aba) ⊕ R(aba, bab)] → [R(aa, bb) ∧ P(aa)]; there are no free variables.

(c) ¬R(u, v) → [(P(u) ∧ Q(u)) ↔ (P(v) ∧ Q(v))]; u and v are the free variables.

Exercise 1.10.4

Such a predicate is called a "proposition", since it is true or false already without any variable values needing to be supplied. There is no reason why a predicate shouldn't have an empty argument list, though it would normally be treated as a proposition (a boolean constant).

Exercise 1.10.5

public

boolean

{// Returns

between

(real

x,

if x is between

true

real

y,

real

y and z.

z)

if (y < z) return (y < x) && (x (F(d) V R(d))

F(c) @ F(d) Ble) & (B(c) V Bd)
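The between method from Exercise 1.10.5 can be sanity-checked with a direct Python translation; "real" becomes float here, and the strict comparisons are kept.

```python
# Python version of the between pseudocode: true iff x lies strictly
# between y and z, whichever order y and z are in.
def between(x: float, y: float, z: float) -> bool:
    if y < z:
        return y < x < z
    return z < x < y

assert between(2.0, 1.0, 3.0)
assert between(2.0, 3.0, 1.0)      # order of the endpoints does not matter
assert not between(5.0, 1.0, 3.0)
```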

Exercise 1.10.8

Statement (a) tells us that R(c) is true and F(d) ∨ R(d) is false, so ¬F(d) and ¬R(d) are both true. Statement (b) then tells us that F(c) is true, since F(d) is false. Finally, we can examine statement (c) by cases. If Cardie is not black, then at least one of them is black and Duncan must be black. If Cardie is black, then neither is black, which is a contradiction. So B(c) is false and B(d) is true.

Exercise 1.10.9

(a) The three statements are T(a) → ((T(a) ∨ T(b)) ⊕ T(c)), T(b) ↔ (T(c) → ¬T(a)), and T(c) → ¬(T(a) ∧ T(b) ∧ T(c)).

(b) All we have done is to rename the three boolean variables, so nothing changes.

Exercise 1.10.10

(a) CharAt(w, 0, a)

(b) CharAt(w, |w| − 1, a)

(c) CharAt(w^R, i, a) ↔ CharAt(w, |w| − i − 1, a)

16.2 Exercises From Chapter 2

Exercise 2.1.1

(c)
(d)

Exercise 2.1.2

(a) Z(ch) meaning "ch comes before e in the alphabet"

(b) {b, f, j, p, v}, with z added if you consider y a vowel

(c) Y(ch) meaning "ch is the n'th letter in the alphabet and n is divisible by 3"

(d) {a, h, i, m, o, t, u, v, w, x, y}

Exercise 2.1.3

(a) F(ch₁, ch₂) meaning "ch₁ = b and ch₂ ∈ V"

(b) {(b, e), (d, o)}

(c) the predicate meaning "ch₁ = c or ch₁ = d, and ch₂ = e or ch₂ = u"

(d) {(b, a), (b, e), (b, i), (b, u), (c, o), (d, o)}

Exercise 2.1.4

(b,e), (b,#), (b, u); (¢, 0), (d, 0)}

(a)

[[true, false, false, false],

true, false], [false, false,

[false, false,

false, false, false]]

(b)

[[true,

true,

false],

[false,

true,

[false,

true,

true, (c)

false,

false],

[[false, false,

Exercise 2.1.5

chz) meaning

false, false,

false, false],

false,

false], [false,

true],

[false, [true,

false, false,

true]]

[false, false,

false,

true],

false, false,

false,

false],

[false,

false]]

(a) {(x, y) : (y = x) ∨ (y = x + 1)}
(b) ∅
(c) A
(d) {(x, y) : (y = x) ∨ (x = y + 1)}

Exercise 2.1.6

{(0,0,0), (0,1,1), (0,2,2), (0,3,3), (1,0,1), (1,1,2), (1,2,3), (2,0,2), (2,1,3), (3,0,3)}

Exercise 2.1.7

(a)

A 100 by 100 boolean array will do the job, and no other representation is likely to be better unless the picture has a simple description.

(b) A list will have one entry for each of the 14,654 words that occur in the Bible. This will require much less storage than a boolean array, which would have an entry for each of the 26^20, or about 2 × 10^28, elements of S. And there is no clear way to determine membership in R that would be easier or faster than looking for the word in a list.

(c) Here a method can easily calculate whether (a, b) satisfies the rule.

Exercise 2.1.8

{(1,2,3), (1,2,4), (1,2,5), (1,3,4), (1,3,5), (1,4,5), (2,3,4), (2,3,5), (2,4,5), (3,4,5)}

Exercise 2.1.9

To specify a k-ary relation on an n-element set, we need to say for each of the n^k possible k-tuples whether it is in the relation or not. These n^k binary choices may be made in 2^(n^k) possible ways.
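The count 2^(n^k) can be confirmed by enumeration for small n and k; this sketch is an added illustration.

```python
# Count k-ary relations on an n-element set: one subset of the n**k
# possible k-tuples per relation, so 2**(n**k) relations in all.
from itertools import product

def count_relations(n, k):
    tuples = list(product(range(n), repeat=k))   # all n**k tuples
    return 2 ** len(tuples)                      # each subset is a relation

assert all(count_relations(n, k) == 2 ** (n ** k)
           for n in range(1, 4) for k in range(1, 3))
```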

Exercise 2.1.10

For 2^(n^k) to be less than 1000, n^k itself must be less than 10. So we could have k = 1 and any n ≤ 9, for example.

Q(w) ∧ [∀x : Q(x) → R(w, x)]

[P(w) ∧ Q(w) ∧ R(w, ab)] → (w = aa)

Exercise 2.3.3

(a) a true sentence
(b) a false sentence; ab is a counterexample
(c) a false sentence; the first square-bracketed statement is true and the second is false
(d) w is the only free variable, but the statement is false for all w
(e) w is the only free variable, but the statement is true for all w

Exercise 2.3.4

∀w : ¬P(w) ∨ ¬Q(w): Given any string, either it does not have exactly two letters or it does not start and end with the same letter, or both.

∃w : P(w) ∧ ¬Q(w): There exists a string of two letters that does not start and end with the same letter.

[∀w : ∃x : (w ≠ x) ∧ R(w, x)] ∧ [∃y : ¬P(y)]: For every string, there is another string that starts with the same letter, and there exists a string that does not have exactly two letters.

¬Q(w) ∨ [∃x : Q(x) ∧ ¬R(w, x)]: Either the string w does not start and end with the same letter, or there exists a string that starts and ends with the same letter that does not start with the same letter as w.

P(w) ∧ Q(w) ∧ R(w, ab) ∧ (w ≠ aa): The string w has exactly two letters, starts and ends with the same letter, and starts with the same letter as does ab, but w is not aa.

Exercise 2.3.5

(a) y is free, x and z are bound

(b) w is free; x, y, and z are bound

(c) y is free in the first expression and bound in the second, x is bound in the first expression and free in the second, and z is free

Exercise 2.3.6

(a) π₁(P)(x) is true if and only if ∃y : P(x, y).

Exercise 2.3.7

Exercise 2.3.8

Exercise 2.5.1

• Predecessor: (x < y) ∧ ¬∃z : (x < z) ∧ (z < y)

Figure 16-3: Diagrams of Two Relations for Exercise 2.8.1 (©Kendall Hunt Publishing Company)

Figure 16-4: Diagrams of the Relations in Exercise 2.8.3 (©Kendall Hunt Publishing Company)

Exercise 2.8.2

The statement was ¬∃a : ∃b : ∃c : ((a, b) ∈ R) ∧ ((a, c) ∈ R) ∧ (b ≠ c). If we omit the "b ≠ c" from this statement, then no relation is well-defined unless it is the empty relation. If any a and b exist such that (a, b) ∈ R, then if we take b = c the statement is falsified.

Exercise 2.8.3

(a) total, but not well-defined since 1 maps to two different elements

(b) both total and well-defined, since each element of A is mapped to exactly one element of B

(c) well-defined but not total as 1 and 3 are not mapped

(d) not total as 1 and 4 are not mapped, and not well-defined as 3 is mapped to two different elements of B

(e) well-defined but not total, as no elements of A are mapped

(f) total but not well-defined, as all elements of A are mapped to more than one element

Diagrams are in Figure S-4.

Exercise 2.8.4

All binary relations on an empty or one-element set are both symmetric and antisymmetric. A binary relation on a two-element set {a, b} is either one or the other, depending on whether the truth values of R(a, b) and R(b, a) are the same or different. But on a three-element set {a, b, c} we can have R(a, b) and R(b, a) both true, making R not antisymmetric, and have R(a, c) true and R(c, a) false, making R not symmetric.

Figure 16-5: Diagrams of Relations for Exercise 2.8.6 (©Kendall Hunt Publishing Company)

Exercise 2.8.5

Since the symmetry condition cannot be violated by a loop, a symmetric relation may have loops on any or all of its vertices, or on none. Thus it could be either reflexive (all loops) or antireflexive (no loops), or neither. Since the matching arrow for an arrow from a to b goes from b to a, the matching arrow for an arrow from a to a also goes from a to a. Thus a loop is its own matching arrow, and always has a matching arrow if it exists itself.

Exercise 2.8.6

Diagrams are in Figure S-5.

(a) symmetric, no other properties
(b) no properties
(c) reflexive, symmetric, transitive, not antisymmetric
(d) not reflexive (no loop at 5), not symmetric (5 to 3, not 3 to 5), not antisymmetric (1 to 2 and 2 to 1), not transitive (5 to 3 and 3 to 4 but not 5 to 4)

Exercise 2.8.7

As we observed in Exercise 2.8.4, if we have zero, one, or two elements in our set, every reflexive relation is also transitive and we can check that each one is either symmetric, antisymmetric, or both. So we need at least three elements, and we can complete the example from Exercise 2.8.4 to solve this problem as well. We make R(a, b), R(b, a), R(a, c), and R(b, c) true and make R(c, a) and R(c, b) false. This fails symmetry with a and c or with b and c, and fails antisymmetry with a and b. But we can check that it is transitive: in order to have R(x, y) and R(y, z) both true, we either have z = c (which makes R(x, z) true whatever x is) or have all three variables come from a and b (which also makes R(x, z) true).
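The three properties can be tested mechanically on this example; the checker functions below are an added illustration (the reflexive loops are included, since the exercise concerns reflexive relations).

```python
# Property checks for a binary relation given as a set of ordered pairs.
def symmetric(R):
    return all((b, a) in R for (a, b) in R)

def antisymmetric(R):
    return all(a == b for (a, b) in R if (b, a) in R)

def transitive(R):
    return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)

# Loops on a, b, c plus R(a,b), R(b,a), R(a,c), R(b,c); not R(c,a), R(c,b).
R = {('a', 'a'), ('b', 'b'), ('c', 'c'),
     ('a', 'b'), ('b', 'a'), ('a', 'c'), ('b', 'c')}

assert transitive(R) and not symmetric(R) and not antisymmetric(R)
```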

Exercise 2.8.8

(a) Each candidate is qualified for at least one job.

(b) No candidate is qualified for more than one job.

(c) Each candidate is qualified for exactly one job. (But note that two different candidates might be qualified for the same job.)

(d) Q must have a subset that is a function and also has the property that no two candidates are mapped to the same job. (This will be called being "one-to-one" in the next section.)

Exercise 2.8.9

If A is empty, then the only possible R is also empty and is both reflexive and antireflexive. If A has exactly one element a, then it is reflexive if R(a, a) is true and antireflexive if it is false. But if A has two distinct elements a and b, we can make it neither by setting R(a, a) true and R(b, b) false.

Exercise 2.8.10

(a) If k is odd, every real number has a unique k'th root, so for any given x, p(x) is defined and the unique possible value of y is the k'th root of p(x).

(b) With k = 2 (or actually for any other positive even k as well), there are two possible values of y if p(x) is positive, one if p(x) = 0, and none if it is negative. So if p(x) = 0, R_{k,p} is a function (the zero function). If p(x) = 1, then R_{k,p} is total but not well-defined. If p(x) = −1, R_{k,p} is well-defined but not total.

(c) With k = 0, we have to have y⁰ = p(x), which is only possible if p(x) = 1. So if for any x we have p(x) ≠ 1, then R_{0,p} is not total. But if p(x) is the constant function 1, then any y satisfies y⁰ = p(x), so R_{0,p} is not well-defined.

Exercise 2.9.1

Since f(g(x)) = (2x + 3) + 2 = 2x + 5, f ∘ g is the function taking x to 2x + 5. This is an injection but not a surjection or bijection, and has no inverse. Since g(f(x)) = 2(x + 2) + 3 = 2x + 7, g ∘ f is the function taking x to 2x + 7. This is also an injection but not a surjection or bijection, and has no inverse.

Exercise 2.9.2

(a) injection, not surjection
(b) surjection, not injection
(c) bijection, inverse is itself
(d) bijection, inverse is itself
(e) neither injection nor surjection
(f) surjection, not injection

Exercise 2.9.3

(a) f is employees to reals, g is employees to titles, no composition

(b) f is employees to titles, g is titles to reals, (g ∘ f)(x) is the salary of employee x

(c) f is employees to employees, g is employees to reals, (g ∘ f)(x) is the salary of employee x's supervisor

(d) f is reals to reals, g is employees to reals, (f ∘ g)(x) is the tax paid by employee x

Exercise 2.9.4

(a) (f ∘ g)(w) = aw^R; (f ∘ h)(w) = v^R if w = va, w^R otherwise; (g ∘ f)(w) = w^R a; (g ∘ h)(w) = w if w = va, wa otherwise; (h ∘ g)(w) = w; (f ∘ g ∘ h)(w) = w^R if w = va, aw^R otherwise.

(b) f is a bijection and is its own inverse, g is an injection but not a surjection and has no inverse, h is a surjection but not an injection and has no inverse.

(c) (f ∘ f) is the identity function; (g ∘ g)(w) = waa; (h ∘ h)(w) = v if w = vaa, vb if w = vba, λ if w = a, w otherwise.

2.6.4 Problems

P2.6.1 Following Lewis Carroll, take the premises "All angry dogs growl" (∀x : (A(x) ∧ D(x)) → G(x)), "All happy dogs wave their tails", "All angry cats wave their tails", "All happy cats growl", "All animals are either angry or happy", and "No animal both growls and waves its tail", and prove the conclusion that no animal is both a dog and a cat. Use predicate calculus and indicate which proof rule justifies each step. Proof by contradiction is probably simplest.

P2.6.2 In Robert Heinlein's novel The Number of the Beast, the following two logic puzzles occur, in which one is to derive a conclusion from six premises. (Heinlein designed these in the spirit of Lewis Carroll.) Your task is to give formal proofs that the conclusions are valid. In the first, the type of the variables is "my ideas", and the premises are:

• Every idea of mine, that cannot be expressed as a syllogism, is really ridiculous; (∀x : ¬ES(x) → RR(x))
• None of my ideas about Bath-buns are worth writing down; (∀x : B(x) → ¬WWD(x))
• No idea of mine, that fails to come true, can be expressed as a syllogism; (∀x : ¬T(x) → ¬ES(x))
• I never have any really ridiculous idea, that I do not at once refer to my solicitor; (∀x : RR(x) → RS(x))
• My dreams are all about Bath-buns; (∀x : D(x) → B(x))
• I never refer any idea of mine to my solicitor, unless it is worth writing down. (∀x : RS(x) → WWD(x))

The conclusion is "all my dreams come true", or ∀x : D(x) → T(x). Prove this from the premises using the rules of propositional and predicate calculus.
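Before writing the formal proof, the entailment can be checked semantically: over every truth assignment to the seven predicates applied to a single idea x, the six premises force D(x) → T(x). This brute-force check is an added illustration, not the requested proof.

```python
# Exhaustive semantic check of the P2.6.2 entailment for one fixed idea x.
from itertools import product

def implies(a, b):
    return (not a) or b

names = ['ES', 'RR', 'B', 'WWD', 'T', 'RS', 'D']
for values in product([False, True], repeat=len(names)):
    v = dict(zip(names, values))
    premises = [
        implies(not v['ES'], v['RR']),    # unsyllogistic ideas are ridiculous
        implies(v['B'], not v['WWD']),    # Bath-bun ideas: not worth writing down
        implies(not v['T'], not v['ES']), # untrue ideas: no syllogism
        implies(v['RR'], v['RS']),        # ridiculous ideas go to my solicitor
        implies(v['D'], v['B']),          # dreams are about Bath-buns
        implies(v['RS'], v['WWD']),       # referred ideas are worth writing down
    ]
    if all(premises):
        assert implies(v['D'], v['T'])    # so all my dreams come true
```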

P2.6.3 Heinlein's second puzzle has the same form. Here you get to figure out what the intended conclusion is to be, and prove it as above:

• Everything, not absolutely ugly, may be kept in a drawing room;
• Nothing, that is encrusted with salt, is ever quite dry;
• Nothing should be kept in a drawing room, unless it is free from damp;
• Time-traveling machines are always kept near the sea;
• Nothing, that is what you expect it to be, can be absolutely ugly;
• Whatever is kept near the sea gets encrusted with salt.

P2.6.4

We can now adjust our rules from Section 1.5 for translating set identities into the propositional calculus, by adding a quantifier to the translations of A C B and A = B to get

A(x) + B(x)

€ A”.

empty

and Va : A(a) + B(x).

Here “A(a)”

is an abbreviation for the predicate

We can also use an existential quantifier to express the statement

rv: A(a)”

is equivalent to “A 4 @”.

that a set is not

Translate the following set statements into

the predicate calculus, and prove them with the rules from this section.

°You must also translate the statements into formal predicate calculus — note for example the two different

phrasings used for “is quite dry”. In the novel, the solver of the puzzle concludes (correctly) that the nearby aircar is also a time-traveling machine, but strictly speaking this is not a valid conclusion from the given premises.


Exercise 2.9.5  We must prove that for all elements a of A, ((h ∘ g) ∘ f)(a) = (h ∘ (g ∘ f))(a). Let a be an arbitrary element of A. Let b = f(a), c = g(b), and d = h(c); we will show that both function outputs are d. The first is defined to be (h ∘ g)(f(a)), which is (h ∘ g)(b), which by definition is h(g(b)) = h(c) = d. The other function output is defined to be h((g ∘ f)(a)), which is h(g(f(a))) = h(g(b)) = h(c) = d. Since a was arbitrary, the outputs of the two functions are the same for any a in A and the two functions are equal.
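This associativity argument can be checked on concrete functions. The following sketch is not part of the text; the functions f, g, h and the test values are illustrative choices.

```java
// Sketch (not from the text): checking that both groupings of a triple
// composition agree on sample inputs, using java.util.function.Function.
import java.util.function.Function;

public class ComposeAssoc {
    static final Function<Integer, Integer> f = a -> a + 1;   // f : A -> B
    static final Function<Integer, Integer> g = b -> 2 * b;   // g : B -> C
    static final Function<Integer, Integer> h = c -> c * c;   // h : C -> D

    // f.andThen(g).andThen(h) groups as h o (g o f)
    static int left(int a)  { return f.andThen(g).andThen(h).apply(a); }
    // f.andThen(g.andThen(h)) groups as (h o g) o f
    static int right(int a) { return f.andThen(g.andThen(h)).apply(a); }

    public static void main(String[] args) {
        for (int a = -5; a <= 5; a++)
            assert left(a) == right(a);
        System.out.println("left(3) = " + left(3)); // h(g(f(3))) = (2*4)^2 = 64
    }
}
```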

Exercise 2.9.6

(a) We take the definition of "R is onto", ∀b : ∃a : R(a, b), and rewrite it in terms of R⁻¹. This gives us ∀b : ∃a : R⁻¹(b, a), which is exactly the definition of "R⁻¹ is total". The two statements are clearly equivalent.

(b) The definition of "R is one-to-one" is ∀a : ∀a′ : ∀b : (R(a, b) ∧ R(a′, b)) → (a = a′). When we rewrite this in terms of R⁻¹ we get ∀a : ∀a′ : ∀b : (R⁻¹(b, a) ∧ R⁻¹(b, a′)) → (a = a′), which is exactly the definition of "R⁻¹ is well-defined". The two statements are clearly equivalent.

(c) This follows immediately from (a) and (b). R is both onto and one-to-one if and only if R⁻¹ is both total and well-defined, which is the definition of R⁻¹ being a function.

(d) Let A = {a}, B = {b, b′}, and R be the relation from A to B given by the pairs (a, b) and (a, b′). Then R is not a function because it is not well-defined, but R⁻¹ is the function taking both b and b′ to a.

Exercise 2.9.7  The inverse is the composition g⁻¹ ∘ f⁻¹. To prove this we must show that both g⁻¹ ∘ f⁻¹ ∘ f ∘ g and f ∘ g ∘ g⁻¹ ∘ f⁻¹ are the identity function on A. If we apply the first function to an element a, we get g⁻¹(f⁻¹(f(g(a)))). This is g⁻¹(g(a)) because f⁻¹ ∘ f is the identity function, and this in turn is a because g⁻¹ ∘ g is the identity function. A similar argument applies to the second function.

Exercise 2.9.8  There are n^r possible functions, because we must make r choices, one to choose f(a) for each element a of A, and each choice is thus from n possibilities. If r = 0 then n^r = 1, and there is indeed one function that has no pairs at all. If r = 1 then n^r = n, and we have n functions because we can map the single element of A to any of the n elements of B. If n = 0 then n^r = 0, unless r is also 0, and this is correct because we cannot have a function from a non-empty set to an empty set: to map some element of A to an element of B would require the latter to exist. If n = 1 then n^r = 1, and there is exactly one function, the one that maps every element of A to the single element of B.
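The counting argument in Exercise 2.9.8 can be phrased as a recurrence: one more element of A means n more independent choices. This sketch is not part of the text.

```java
// Sketch (not from the text): the number of functions from an r-element
// set to an n-element set, computed by the "r independent choices" rule.
public class CountFunctions {
    static long count(int r, int n) {
        if (r == 0) return 1;          // exactly one empty function
        return n * count(r - 1, n);    // n choices for one more element of A
    }
    public static void main(String[] args) {
        assert count(0, 5) == 1;   // n^0 = 1
        assert count(1, 7) == 7;   // one choice from 7 options
        assert count(3, 2) == 8;   // 2^3 functions
        assert count(2, 0) == 0;   // no function from a non-empty set to an empty one
        System.out.println("all counts agree with n^r");
    }
}
```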

Exercise 2.9.9  If A has zero elements or one element, any function is an injection because we cannot violate the one-to-one condition without two different elements of A. Similarly if B has zero elements or one element, and A is non-empty, any function is a surjection because it meets the onto condition by hitting the single element if it exists. If A is empty and B is not, the unique function from A to B is not a surjection. But if both are empty, the unique function is both an injection and a surjection.

Exercise 2.9.10  Given a set S, we choose a string w such that w.charAt(i) is true if and only if i ∈ S. Given a string w, we let our set S be {i : w.charAt(i)}. If two sets are different, there is some element i in one and not the other, and the corresponding strings differ in the i'th place. If two strings are different, there is a place i in which they differ, and element i is then in one set and not the other. So both functions are injections, and are inverses of one another, so they are bijections.

©Kendall Hunt Publishing Company  Figure 16-6: The Hasse Diagram for Exercise 2.10.3

Exercise 2.10.1  Let a and b be arbitrary naturals, with a > 0. First assume that ∃c : b = ac. The natural b % a is defined to be that natural r such that b = qa + r for some q with r < a.

Exercise 3.8.1  The predicate C(x, y) means that x ≡ y (mod r). C is reflexive: C(x, x) is true because x ≡ x (mod r) (r divides x − x). C is symmetric: if C(x, y) is true, then r divides x − y, so y ≡ x (mod r) as well (because r divides y − x if it divides x − y) and thus C(y, x) is true. C is transitive: if C(x, y) and C(y, z), then r divides both x − y and y − z. So it also divides (x − y) + (y − z) = x − z, and thus C(x, z) is true.

Exercise 3.8.2  Both these facts follow from the result that for any natural x and any positive natural r, there exist naturals q and a such that x = qr + a and a < r.

If t > 1, t has multiple factorizations, and so forth. So unique factorization holds only for t = 0 and t = 1, where there are no primes at all.

Exercise 3.8.10

public boolean kenkenNumber (long n) {
   while (n % 2 == 0) n /= 2;
   while (n % 3 == 0) n /= 3;
   while (n % 5 == 0) n /= 5;
   while (n % 7 == 0) n /= 7;
   return (n == 1);}

Any natural is a Kenken number if and only if its prime factorization includes only the one-digit primes. When we have removed all the one-digit primes from the factorization, we are left with 1 if and only if the original n was a Kenken number. This code will run quickly on any long argument, since it can have at most 63 prime factors and thus there will be at most 63 divisions.

If d > 1, g^i does not generate, because its order is less than n: in particular, (g^i)^(n/d) = 1, because it equals (g^n)^(i/d).

Exercise 3.9.9  It is not possible. If S were any such set of complex numbers, it must contain both an additive identity a and a multiplicative identity m. We must have a = 0 in order to have a + m = m. Furthermore, we must have m = 1 in order to have m·m = m and m ≠ a. But then by closure under addition, S must also contain the elements 1+1, 1+1+1, 1+1+1+1, ..., and cannot be finite.

Exercise 3.9.10  If the number of elements is not a power of p, then some other prime q divides it, and Cauchy's Theorem says that an element of order q exists, contradicting Exercise 3.9.7.

Exercise 3.11.1

(a) COGITO, ERGO SUM (Descartes, "I think, therefore I am")
(b) E PLURIBUS UNUM (Great Seal of the U.S.A., "Out of many, one")
(c) ET TU, BRUTE? (Shakespeare's Julius Caesar, "And you, Brutus?")
(d) VENI, VIDI, VICI (Julius Caesar, "I came, I saw, I conquered")
(e) ROMANI ITE DOMUM (Monty Python's Life of Brian, "Romans, go home")

Exercise 3.11.2  If a is relatively prime to m, we know an integer c exists such that ac ≡ 1 (mod m) by the Inverse Theorem. If we let f(x) = ax + b and g(x) = c(x − b), then f(g(x)) = g(f(x)) = x. If a and m have a common factor r > 1, then all values of f(x) are congruent to b modulo r, and f cannot be onto as it misses the other elements.
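The inverse construction of Exercise 3.11.2 can be checked numerically. In this sketch (not part of the text) the modulus m = 26 and coefficients a = 5, b = 8 are illustrative choices; c = 21 works since 5 · 21 = 105 ≡ 1 (mod 26).

```java
// Sketch (not from the text): g(y) = c(y - b) inverts f(x) = ax + b
// modulo m when ac is congruent to 1 mod m. Here m = 26, a = 5, b = 8, c = 21.
public class AffineInverse {
    static int mod(int x, int m) { return ((x % m) + m) % m; } // non-negative remainder
    static int f(int x) { return mod(5 * x + 8, 26); }         // f(x) = ax + b
    static int g(int y) { return mod(21 * (y - 8), 26); }      // g(y) = c(y - b)
    public static void main(String[] args) {
        for (int x = 0; x < 26; x++)
            assert g(f(x)) == x && f(g(x)) == x;
        System.out.println("f and g are mutual inverses mod 26");
    }
}
```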

Exercise 3.11.3

public String rotate (String w, int k) {// rotates each letter of w by k, leaves non-letters alone
   String out = "";
   for (int i = 0; i < w.length(); i++) {
      char ch = w.charAt(i);
      char outch;
      if (('a' <= ch) && (ch <= 'z'))
         outch = (char) ('a' + ((ch - 'a') + k) % 26);
      else if (('A' <= ch) && (ch <= 'Z'))
         outch = (char) ('A' + ((ch - 'A') + k) % 26);
      else outch = ch;
      out += outch;}
   return out;}

47, and this is greater than 47n + 47 = 47(n + 1) by the inductive hypothesis. So P(n + 1) is true, and we have completed the inductive step and thus the proof.

Exercise 4.4.2  When is 2^n > 137n^3? I see no obvious way to solve this exactly, but we may try n = 10 (too small, because 2^10 = 1024 < 137000), and then n = 20 (barely too small, because 1048576 < 1096000) before noticing that n = 21 is the first natural making P(n) true (2097152 > 1268757). So it remains to assume P(n), for an arbitrary n ≥ 21, and prove P(n+1). We know that 2^(n+1) = 2^n + 2^n, while 137(n+1)^3 = 137n^3 + 411n^2 + 411n + 137. The right-hand side of this latter equation is clearly less than 137n^3 + 137n^3, so by the inductive hypothesis it is less than 2^n + 2^n = 2^(n+1). We have proved P(n+1), completing the inductive step and thus the proof.
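The trial-and-error search in Exercise 4.4.2 can be automated. This sketch (not part of the text) scans upward from n = 1.

```java
// Sketch (not from the text): finding the smallest n >= 1 with 2^n > 137 n^3.
public class Threshold {
    static boolean p(int n) {
        return (1L << n) > 137L * n * n * n;   // 2^n > 137 n^3
    }
    static int firstTrue() {
        int n = 1;
        while (!p(n)) n++;
        return n;
    }
    public static void main(String[] args) {
        assert !p(10) && !p(20) && p(21);
        assert firstTrue() == 21;
        System.out.println("first n >= 1 with 2^n > 137 n^3 is " + firstTrue());
    }
}
```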

Exercise 4.4.3  By strong induction: For n < 4 there is nothing to prove, for n = 4 we can use two 2's, for n = 5 we can use one 5. For n > 5, the strong inductive hypothesis says that we can make n − 1, so we make n + 1 by adding one 2 to n − 1.

By odd-only and even-only induction: For evens with n ≥ 4, the base case of n = 4 is done with two 2's. If we can make n, then we can make n + 2 by adding another 2, completing the induction for evens. For odds with n > 4, the base is n = 5, which we make with one 5. The inductive case goes from n to n + 2, again by adding one 2.

By ordinary induction: For n < 4 there is nothing to prove, and for n = 4 we use two 2's. Assume we can make n, with n ≥ 4. If we used a 5 to make n, replace it with three 2's to make n + 1. If there is no 5 in the sum for n, since n ≥ 4 there must be at least two 2's and we can thus make n + 1 by replacing two 2's with a 5. Either way, under the assumption that we could make n, we showed that we could make n + 1, completing the ordinary induction.

Exercise 4.4.4  The base case P(1) is true because 8 divides 1² + 7 = 8. For the inductive case, assume that 8 divides n² + 7; we will show that 8 divides (n+2)² + 7. We calculate that (n+2)² + 7 = n² + 4n + 4 + 7 = (n² + 7) + 4(n + 1). But since n is odd, n + 1 is even and 4(n + 1) is divisible by 8. Since (n+2)² + 7 is the sum of two numbers that are each divisible by 8, it is itself divisible by 8. We have completed the induction and proved the statement for all odd numbers.
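The divisibility claim of Exercise 4.4.4 is easy to spot-check. This sketch (not part of the text) tests the first several hundred odd numbers.

```java
// Sketch (not from the text): checking that 8 divides n^2 + 7 for odd n.
public class OddSquares {
    static boolean claim(long n) { return (n * n + 7) % 8 == 0; }
    public static void main(String[] args) {
        for (long n = 1; n < 1001; n += 2)
            assert claim(n);
        System.out.println("8 divides n^2 + 7 for every odd n up to 999");
    }
}
```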

Exercise 4.4.5

The base case is to prove that the first odd number is 1, and this is true by definition. The inductive case is to show that if the n'th odd number is 2n − 1, then the (n+1)'st odd number is 2(n+1) − 1 = 2n + 1. This is true because by definition the (n+1)'st odd number is two greater than the n'th odd number.

Exercise 4.4.6  Let P(n) be the statement that the second player wins the n-stone game if and only if n is divisible by 4. P(0) is true because the second player wins the game with no stones, since the first player cannot move. Now assume that P(m) is true for all m with m ≤ n, and consider the game with n + 1 stones. If n + 1 is not divisible by 4, the first player can leave a number of stones that is divisible by 4 by taking one, two, or three stones. In the game that follows (in which the second player moves first), we know that the "second player" of that game has a winning strategy. This is the first player of the original game, so P(n+1) is true in this case. (The number of stones is not divisible by 4, and the second player does not win.) The remaining case is when n + 1 is divisible by 4, and we must prove that the second player has a winning strategy. The first player's move will leave a number that is not divisible by 4, which means that the second player can leave a number that is divisible by 4, and we know by the strong inductive hypothesis that the second player has a winning strategy in this game. Thus P(n+1) holds in this case as well.
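The strategy analysis in Exercise 4.4.6 can be confirmed by dynamic programming over game positions. This sketch (not part of the text) computes, for each pile size, whether the player about to move can force a win.

```java
// Sketch (not from the text): the n-stone game where a move takes 1-3 stones.
// win[k] is true when the player to move, facing k stones, can force a win.
public class StoneGame {
    static boolean firstPlayerWins(int n) {
        boolean[] win = new boolean[n + 1];      // win[0] = false: no move left
        for (int k = 1; k <= n; k++)
            for (int take = 1; take <= 3 && take <= k; take++)
                if (!win[k - take]) win[k] = true;
        return win[n];
    }
    public static void main(String[] args) {
        for (int n = 0; n <= 100; n++)
            assert firstPlayerWins(n) == (n % 4 != 0);
        System.out.println("second player wins exactly when 4 divides n");
    }
}
```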

Exercise 4.4.7  For the base case, with n = 1, the first statement must be an axiom ... are true. Assume that the m'th statement is true for all m with m ...

Exercise 4.4.8  ... 2, and assume P(n). Consider an arbitrary string w of length n + 1 that contains both an a and a b. Write w as either xa or xb, where x is a string of length n. If x contains both an a and a b, the inductive hypothesis P(n) tells us it contains ab or ba as a substring, so w does as well. If x does not contain both letters, and w = xa, then x must be the string b^n, and w contains a substring ba at the end. Similarly, if x does not contain both letters and w = xb, then x = a^n and w has an ab substring at the end. We have proved P(n+1) by cases.

Exercise 4.4.9  We use strong induction on all positive naturals n, letting P(n) be the statement that any product of any n elements is in X. The base case P(1) is true because the product is the element a_1, which is assumed to be in X. Assume that P(m) is true for all m with m ...

S(y) > y. By a fact we proved about addition, (x + S(y)) − y = (S(x) + y) − y. Since we are assuming P(y), we know that this latter expression equals S(x), as desired.

We now turn to Q(y), the statement ∀x : (x ≥ y) → ((x − y) + y = x). For the base case Q(0), let x be arbitrary and note that (x ≥ 0) → ((x − 0) + 0 = x) follows from trivial proof, given the definitions of subtraction and addition. So assume ∀x : (x ≥ y) → ((x − y) + y = x) and we set out to prove ∀x : (x ≥ S(y)) → ((x − S(y)) + S(y) = x). Let x be arbitrary and assume that x ≥ S(y). By the Lemma, we know that x − S(y) = 0 if and only if x = S(y). In the case that x = S(y), (x − S(y)) + S(y) = 0 + S(y) = S(y) = x as desired. In the other case, we know that x − S(y) is not 0, so it is the predecessor of x − y. We thus need to compute pred(x − y) + S(y) = S(pred(x − y)) + y = (x − y) + y = x, where the next-to-last step uses the fact that x − y ≠ 0.

Exercise 4.6.4  Let x and y be arbitrary, and use ordinary induction on z. For the base case, x − (y + 0) and (x − y) − 0 are both equal to x − y. For the inductive case, assume that x − (y + z) = (x − y) − z and set out to prove x − (y + S(z)) = (x − y) − S(z). The left-hand side is x − S(y + z), which is the predecessor of x − (y + z), or 0 if x − (y + z) = 0. The right-hand side is the predecessor of (x − y) − z, or 0 if (x − y) − z = 0. By the inductive hypothesis, then, the left-hand side and right-hand side are the same.

Exercise 4.6.5  We first prove a lemma: times(x, pred(w)) = times(x, w) - x if w > 0. We use induction on all positive w. For the base of w = 1, times(x, 0) = times(x, 1) - x is true as both sides are 0. For the induction, times(x, pred(S(w))) = times(x, w) = x + times(x, pred(w)). By the inductive hypothesis this is x + (times(x, w) - x) = (x + times(x, w)) - x = times(x, S(w)) - x.

Now to the main result. We let x and y be arbitrary and use induction on z. For the base case of z = 0, times(x, y - 0) = times(x, y) - times(x, 0) as both sides equal times(x, y). So assume that times(x, y - z) = times(x, y) - times(x, z) and set out to prove that times(x, y - S(z)) = times(x, y) - times(x, S(z)). The left-hand side is x times a number that is either the predecessor of y − z, or 0 if y − z = 0. By the lemma, this is times(x, y - z) - x, or 0 if y − z = 0. The right-hand side is times(x, y) - (times(x, z) + x). If times(x, y) >= times(x, z), this is (times(x, y) - times(x, z)) - x, which by the inductive hypothesis equals the left-hand side. The other case of times(x, y) < times(x, z) implies that y < z, and in this case both sides of the equation are zero. (We are implicitly using a lemma that if x > 0, times(x, y) < times(x, z) if and only if y < z; this is easy to prove by induction on x.)

Exercise 4.6.6

(a) We must show that each operation is commutative, associative, and has an identity, and that the distributive law holds. In the case of modular arithmetic, we know that the congruence class modulo m of a sum or a product does not depend on the representative of the congruence class we choose as an input. Thus to prove any identity over congruence classes, such as a + b = b + a or a(b + c) = ab + ac, it suffices to observe that the same identity holds over the integers. If, for example, we choose any three integers a, b, and c, we know that a(b + c) and ab + ac are the same integer. If we replace any of those three integers by others that are congruent to them modulo m, the congruence class of each expression remains the same, so the two classes remain equal. The identity properties of 0 for + and 1 for × over the integers imply the same properties modulo m.

(b) Here once again we can think of an operation with threshold t as being the same operation over the naturals, followed by replacing the result by its equivalence class for the relation where all naturals t or greater are considered equivalent. For each of the properties, the left-hand and right-hand sides yield the same natural result for any particular choice of inputs, so they yield equivalent results if we choose equivalent representatives. The identities remain 0 for + and 1 for ×.

Exercise 4.6.7  We know that the minimum and addition operations are commutative and associative for ordinary real numbers, and it is easy to check that the presence of ∞ does not change any of these properties. The minimum of x and y is the same as the minimum of y and x, and the minimum of x, y, and z does not depend on any parenthesization. The sum of any two numbers, or of any three numbers, including ∞ is just ∞, no matter the order or the parentheses. For the distributive law, we must check that x plus the minimum of y and z is the minimum of x + y and x + z, which is clearly true by cases. The identity for the new "addition" (minimum) is ∞, and the identity for the new "multiplication" (addition) is 0.

Exercise 4.6.8  We've observed in Chapter 1 that the AND and OR operations are each commutative and associative, and that AND distributes over OR. (OR also distributes over AND, but that is not a semiring property.) The identities are 0 for OR and 1 for AND.

Exercise 4.6.9  A polynomial in S[x] is a finite sum of terms of the form ax^i, where a is an element of S and i is a natural. If f and g are each such polynomials, we define f + g to be the sum of terms (a + b)x^i for every i, where a and b are the x^i coefficients in f and g respectively and the sum uses the addition operation of S. (We can take any missing coefficients to be 0.) This addition of polynomials is commutative and associative because for each value of i, the coefficients on each side are computed from the same inputs using the commutative and associative operations of S. The empty sum 0 is the additive identity.

To find the product of two polynomials f and g, we take every pair of terms ax^i in f and bx^j in g and form a new term (ab)x^(i+j), with the multiplication being taken in S. We then collect all these terms, adding together (using the addition of S) any that have the same exponent of x. If we reverse f and g, we get the same terms because S's multiplication is commutative, and the same sum because S's addition is commutative and associative. The polynomial "1", with the single term 1x^0 where 1 is the identity of S, is the identity for polynomial multiplication.

Why is this multiplication associative? If we compute f(gh) or (fg)h for three polynomials f, g, and h, then either way we will produce exactly one term for each triple of terms (ax^i, bx^j, cx^k) coming from f, g, and h respectively, and the value of this term will be (abc)x^(i+j+k). Because multiplication in S and addition in ℕ are each associative, we get equal terms, and because addition in S is commutative and associative,

2.7  Excursion: Practicing Proofs

In this Excursion we will apply our proof methods, for both the propositional and predicate calculi, to prove some statements about strings and formal languages. First, recall the definitions of a string over a fixed alphabet, of the concatenation of two strings, and of the concatenation product of two languages. While we have a formal definition of the concatenation product of languages in terms of quantifiers and concatenation of strings, we lack the formal machinery to prove even some very basic facts about the latter concept.

For example, we've defined the reversal w^R of a string w to be w written backwards. Playing around with examples should convince you of the general rules (w^R)^R = w and (uv)^R = v^R u^R, but we're not in a position to prove these facts until we have a more formal definition (which will come in Chapter 4). For now, we'll assume that these two statements are true.

A palindrome is a string w such that w^R = w. Examples are λ, hannah, and a statement attributed to Napoleon, "Able was I ere I saw Elba", which we might represent as the string ablewasiereisawelba. We'll write Pal(w) to mean "w is a palindrome". Now let's try to prove the following fact:

Theorem: ∀u : ∃v : Pal(uv). That is, any string can be extended to give a palindrome.

Proof: It's usually good to start by trying a direct proof, trying to prove our goal statement with no assumptions. So let u be an arbitrary string, and we'll try to prove ∃v : Pal(uv) without any assumptions about u other than that it's a string (then, by the Rule of Generalization, we'll be done). How to prove ∃v : Pal(uv)? If we can think of a particular v such that we can prove uv is a palindrome, then we'll be done by the Rule of Existence. What v would work? It presumably depends on u, and in fact u^R seems to work in examples. Can we prove Pal(uu^R) in general? Yes, given the observations above. We need to show that (uu^R)^R = uu^R. Applying the second rule above, (uu^R)^R = (u^R)^R u^R and, since (u^R)^R = u by the first rule, this is equal to uu^R. So a suitable v always exists, though note that because of the order of the quantifiers, v may depend on u rather than being the same for all u. Since we assumed nothing about u, we may use the Rule of Generalization to conclude ∀u : ∃v : Pal(uv). ∎
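The construction v = u^R in this proof is easy to test mechanically. This sketch is not part of the text; it uses Java's StringBuilder reversal in place of the formal reversal rules.

```java
// Sketch (not from the text): for any string u, the extension u + rev(u)
// is a palindrome, matching the choice v = u^R in the proof.
public class Pal {
    static String rev(String w) { return new StringBuilder(w).reverse().toString(); }
    static boolean isPal(String w) { return w.equals(rev(w)); }
    public static void main(String[] args) {
        String[] samples = {"", "a", "ab", "hannah", "ablewasi"};
        for (String u : samples)
            assert isPal(u + rev(u));     // u u^R is always a palindrome
        System.out.println("u + rev(u) passed for all samples");
    }
}
```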

Here's another example using the concatenation product of languages from Section 2.5:

Theorem: For any language L, L∅ = ∅.

Proof: The first step is to note that this statement starts "∀L", so we let L be an arbitrary language and plan to finish with the Rule of Generalization. By the definition of set equality, we want to show that any string is in one of these languages if and only if it's in the other. Unrolling the definition of L∅, we find that we want to prove

[x ∈ {w : ∃u : ∃v : (w = uv) ∧ (u ∈ L) ∧ (v ∈ ∅)}] ↔ [x ∈ ∅].

We can simplify the statement on the left to

∃u : ∃v : (x = uv) ∧ (u ∈ L) ∧ (v ∈ ∅).

we will get the same result either way for the sum of terms with each exponent. Similarly, in computing f(g + h) and fg + fh, we will get the same terms of the form (ab)x^(i+j) or (ac)x^(i+k) either way, and because addition in S is commutative and associative we will get the same result as the coefficient of either for each possible exponent.

Exercise 4.6.10  We prove ∀x : A(x, x) by induction on all naturals x. For the base case, we are given that A(0, 0) is true. Our inductive hypothesis is A(x, x), and our inductive goal is A(Sx, Sx). The first general rule tells us that A(Sx, y) is false, and the second (specialized to Sx and y) tells us that A(Sx, Sx) is true, as desired.

Exercise 4.7.1  By the first axiom λ is a string; by using the second axiom three times we show that a = append(λ, a), ab = append(a, b), and aba = append(ab, a) are all strings.

Exercise 4.7.2

public boolean isEqual (string x, string y) {// returns true if x == y
   if (isEmpty(x)) return isEmpty(y);
   if (isEmpty(y)) return false;
   if (last(x) == last(y)) return isEqual(allButLast(x), allButLast(y));
   else return false;}

Exercise 4.7.3  Define oc(λ) to be λ, oc(w0) to be oc(w)1, and oc(w1) to be oc(w)0.

Exercise 4.7.4

public string oc (string w) {// returns one's complement of w
   if (isEmpty(w)) return emptyString;
   string ocabl = oc(allButLast(w));
   if (last(w) == '0') return append(ocabl, '1');
   if (last(w) == '1') return append(ocabl, '0');
   throw new Exception ("oc called on non-binary string");}

Exercise 4.7.5

public String rev (String w) {// returns reversal of w, computed without recursion
   String out = "";
   for (int i = w.length() - 1; i >= 0; i--)
      out += w.charAt(i);
   return out;}

Exercise 4.7.6

public String revRec (String w) {// returns reversal of w, computed recursively
   int n = w.length();
   if (n == 0) return "";
   return revRec(w.substring(1, n)) + w.substring(0, 1);}

Exercise 4.7.7  A string u is a suffix of λ if and only if it is empty. A string u is a suffix of a string wa if and only if either u is empty, or u ends in a and the result of removing that final a is a suffix of w.

public static boolean isSuffix (string u, string v) {
   if (isEmpty(u)) return true;
   if (isEmpty(v)) return false;

&

am

:

3



\

/ ° 4548

equality

qv 02

\

4ef°

order

OK)

[

universal

   if (last(u) != last(v)) return false;
   return isSuffix(allButLast(u), allButLast(v));}

©Kendall Hunt Publishing Company  Figure 16-10: Graphs of Three Relations for Exercise 4.9.1

Exercise 4.7.8  We define stars(λ) to be λ, and stars(wa) to be stars(w)*.

public static string stars (string w) {
   if (isEmpty(w)) return emptyString;
   return append(stars(allButLast(w)), '*');}

Exercise 4.7.9  The relation is false if u is empty. Otherwise, if u = vx for some string v and some character x, contains(u, a) is true if and only if either contains(v, a) or x = a.

public static boolean contains (string u, char a) {
   if (isEmpty(u)) return false;
   if (last(u) == a) return true;
   return contains(allButLast(u), a);}

Exercise 4.7.10  The key to the inductive step is that the double letter either occurs in allButLast(w) or consists of the last two letters of the string.

public static boolean hasDouble (string w) {
   if (isEmpty(w)) return false;
   if (isEmpty(allButLast(w))) return false;
   if (last(w) == last(allButLast(w))) return true;
   return hasDouble(allButLast(w));}

Exercise 4.9.1

The three graphs are shown in Figure 16-10.

Exercise 4.9.2

An undirected graph’s edge relation is anti-reflexive and symmetric (since every edge is bidirectional).

true;

(since the graph has no loops)

Exercise 4.9.3

The edge relation is reflexive if the graph has a loop at every vertex. It is anti-reflexive if the graph has no loops. It is symmetric if all non-loop edges are bidirectional (each arc has a corresponding arc in the other direction). It is antisymmetric if there is no non-loop arc with a corresponding arc in the other direction. It is transitive if every two-step path has a corresponding shortcut, an arc from the start of the first arc to the end of the second. The directed graph of an equivalence relation consists of a set of complete directed graphs, one for each equivalence class. The directed graph of a partial order has a loop at every vertex, no other cycles, and the transitivity property

above. To get from this graph to the Hasse diagram, remove the loops and remove all shortcut edges — all edges that have the same start and finish as a path of two or more edges. Exercise 4.9.4

Case 1: The path a is empty, and there is nothing to prove. Case 2: The path a is a path 8 followed by an edge e, where § is an empty path. In this case e is the first

edge, since a is e followed by an empty path. Case 3: The path a is 8 followed by an edge e, where 3 in not empty. By the inductive hypothesis, 8 has a first edge c and

consists of ¢ followed by some path y. In this case c is also the first edge of a, since a is c followed by ¥ followed by e, which since paths are transitive is c followed by the path made by composing ¥ and e.

Exercise 4.9.5

On a

directed

graph the path predicate need

not be symmetric.

On

an undirected

graph, the path predicate is reflexive by definition. It is clearly symmetric because any path can be reversed edge by edge (this is proved formally as Problem 4.9.1). It was proved to be transitive in the Theorem of Section 4.9.

To show that P(«,y)AP(y,) is an equivalence relation: It is reflexive because P(x, x) is true and thus P(x, a is given by the definition of P. It is clearly symmetric in a and y (using the commutativity of A) and thus is a symmetric relation. It is ive because if P(x,y) A P(y,) and P(y,z) A P(z,y) are given, P(a,z) and P(z,2) follow by separation and the transitivity of P. The relation P(x, on directed graphs.

y) V P(y, x) is reflexive and symmetric, but it need not be transitive For a counterexample,

consider the graph with vertex set {a, b, c}

and arcs (a,b) and (c,b). Here the relation holds for a and b (since P(a,b) is true), and for b and c (since P(c,b) is true) but not for a and c (since neither P(a,c) nor P(c,a) is true). Exercise 4.9.6

Exercise 4.9.6  Any directed graph is isomorphic to itself, via the function that takes each vertex to itself, so isomorphism is a reflexive relation. If f is an isomorphism from G to H, we know that it has an inverse function from H to G (because it is a bijection) and this function is an isomorphism because it takes arcs to arcs and non-arcs to non-arcs. This makes isomorphism a symmetric relation. Finally, if f is an isomorphism from G to H and k is an isomorphism from H to I, then the composition k ∘ f is a function from G to I. We know that the composition of two bijections is a bijection, and it is easy to see that it also takes arcs to arcs and non-arcs to non-arcs. This shows that isomorphism is a transitive relation, making it an equivalence relation.

Exercise 4.9.7

(a) An isomorphism f from an undirected graph G to another undirected graph H creates a bijection from the edges of G to the edges of H, taking edge (x, y) of G to edge (f(x), f(y)) of H. Because f has an inverse f⁻¹, the mapping on edges also has an inverse and must be a bijection.

(b) An undirected graph with three nodes could have zero, one, two, or three edges. If two such graphs each have zero edges, or each have three edges, then any bijection of the nodes is an isomorphism. If two such graphs each have one edge, we can choose a bijection of the nodes that takes the endpoints of one edge to the endpoints of the other. And if they have two edges, we choose a bijection that maps the one node with two neighbors to the other, and this will also be an isomorphism.

(c) Let G have node set {a, b, c, d} and edges (a, b), (a, c), and (a, d). Let H have node set {w, x, y, z} and edges (w, x), (x, y), and (y, z). If f were an isomorphism from G to H, the node f(a) in H would have to have edges to each of the other three nodes. But none of the four nodes in H has edges to each other node.

Exercise 4.9.8  If f is an isomorphism from G to H, and x and y are any vertices of G, then f maps any path from x to y into a path from f(x) to f(y), and the inverse of f maps any path from f(x) to f(y) to a path from x to y.

(a) If there is a path in G from any x to any y, there is a path in H from f(x) to f(y). Thus if G is connected, any two vertices in H have a path from one to the other, since they are f(x) and f(y) for some vertices x and y in G. So H is connected.

(b) Similarly any cycle in G is mapped by f to a cycle in H, and vice versa by f's inverse. So G has a cycle if and only if H does, and thus has no cycle if and only if H does.

(c) This follows directly from parts (a) and (b) by the definition of a tree as a connected forest.

Exercise 4.9.9  There is one graph with no arcs, in a class by itself, and also one graph with four arcs, in a class by itself. There are four graphs with exactly one arc, and these form two isomorphism classes depending on whether the arc is a loop. Similarly the four graphs with exactly three arcs form two classes, depending on whether the missing arc is a loop. The six graphs with exactly two arcs divide into four equivalence classes (giving us ten equivalence classes in all). To see this, note that the two arcs could include zero, one, or two loops. If there are zero or two, there is only one possible graph, but with one loop the non-loop could be directed either into or out of the node with the loop.

Exercise 4.9.10  Rename the vertices of G so that the n edges of the path become (v_0, v_1), (v_1, v_2), ..., (v_{n-1}, v_n). This may result in more than one name being given to a particular node. In fact this must happen, because there are n + 1 names (v_0, v_1, v_2, ..., v_n) and only n different nodes. This means that at least one node has two different node numbers, that is, it is both v_i and v_j where i < j. The portion of the path from v_i to v_j, consisting of j − i edges, is then a directed cycle.

Exercise 4.10.1  We add a second clause saying that if S is a tree with root s, the following is a tree: the nodes are a new node x plus the nodes of S, and the arcs are (x, s) plus the arcs of S. The root of the new tree is x.

Exercise 4.10.2  Here is pseudo-Java code for the desired method:

natural numAtoms () {// returns number of atoms in calling object
   if (isAtom) return 1;
   return left.numAtoms() + right.numAtoms();}

©Kendall Hunt Publishing Company  Figure 16-11: Six Trees for Exercise 4.10.4

Exercise 4.10.3

((4xp) #1) - ( (x*x) + (yy) ) +*SS*CC abtaa*xab*-bb*+* +kakaat*3¥*akab+*3*xa*bb*bD*DbD

abtabtabt**

(1x) + Cex) = Go)

+ CRORX)

Exercise 4.10.4  The six trees are given in Figure 16-11.

Exercise 4.10.5

Every arc enters exactly one node and so contributes 1 to the in-degree of exactly one

Exercise 4.10.6

node. Since a tree has one node of in-degree 0 (the root) and n — 1 nodes of degree 1, the sum of the in-degrees is n — 1 and there must be exactly n — 1 arcs. (a)

For the base case, the rooted directed tree has one node and no arcs, so the only possible path is the trivial one from the node to itself, which is finite. The depth of this tree is 0. For the inductive case, we have a rooted directed tree T with a root and arcs to the roots of one or more rooted directed trees, each of which by the inductive hypothesis has only finite paths. Any path in T is either a path entirely in one of the subtrees (which must be finite) or an arc from the root of T followed by a path in a subtree (which is also finite).

(b) The depth of the one-node tree is 0. If we make a rooted directed tree T by taking a root with arcs to the roots of rooted directed trees S1, S2, ..., Sk, then the depth of T is 1 plus the largest depth of any of the Si's. (This is because the longest path in T must take an arc to the root of one of the Si's, then take a longest path within Si.)

Exercise 4.10.7

public boolean contains (thing target) {
    if (isAtom) return equals (contents, target);
    return car().contains(target) || cdr().contains(target);}

Exercise 4.10.8

(a) 4(2^3) - (2^2 + 2^2) = 32 - 8 = 24

(b) (2*2) + (2*2) = 8

(c) 4(4 - 4 + 4) = 16

(d) 2^3 + 3(2^3) + 3(2^3) + 2^3 = 64

(e) (2+2)(2+2)(2+2) = 64

(f) 1 - 2 - 2^2 - 2^3 - 2^4 = -29

Exercise 4.10.9

For depth 0 we can only have 1. For depth 1, 1 + 1 has a larger value than 1 x 1, so the answer is 2. For depth 2, we can either add or multiply two maximal expressions of depth 1, and either way we get 4. For higher depth, we want to multiply: depth 3 gives 4 x 4 = 16, depth 4 gives 16 x 16 = 256, and depth 5 gives 256 x 256 = 65536.

Exercise 4.10.10
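The doubling-then-squaring pattern can be checked with a short Java sketch (the class and method names here are ours, not the book's):

```java
public class MaxExprValue {
    // largest value of an arithmetic expression built from the constant 1
    // with + and x whose tree has depth at most d
    static long maxValue(int d) {
        if (d == 0) return 1;
        long m = maxValue(d - 1); // combine two maximal subexpressions
        return Math.max(m + m, m * m);
    }
    public static void main(String[] args) {
        for (int d = 0; d <= 5; d++)
            System.out.println("depth " + d + ": " + maxValue(d));
    }
}
```

This prints 1, 2, 4, 16, 256, and 65536, matching the values above.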

(a) Could be either, for example 1 + 1 and 1 + 1 + 1.

(b) Must be even. All constant expressions are even naturals, and if we have the sum of two even naturals the result is even.

(c) Must be odd. All constant expressions are odd naturals, and if we multiply two odd naturals the result is odd.

(d) As in (b), since the product of two even naturals is even, the result must be even.

Exercise 4.11.1

If 3 divides n, the 2 x n rectangle divides into 2 x 3 rectangles, each of which can be covered by two L-shaped tiles. If 3 does not divide n, it does not divide 2n either, and the 2 x n rectangle cannot possibly be covered by L-shaped tiles of size 3 each.

Exercise 4.11.2

For the base case, we have no cuts and one piece. For the induction, we assume that there is a way to make (n^2 + n + 2)/2 pieces and no way to make more. We showed in the section that the n+1'st cut could always be chosen to add up to n + 1 more pieces, but no more. The new maximum number of pieces is thus (n^2 + n + 2)/2 + (n + 1) = (n^2 + 3n + 4)/2 = ((n+1)^2 + (n+1) + 2)/2, as desired.
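The arithmetic can be double-checked by running the recurrence against the closed form in Java (a quick sketch of ours, not from the text):

```java
public class PizzaCuts {
    public static void main(String[] args) {
        long pieces = 1; // zero cuts give one piece
        for (int n = 0; n <= 30; n++) {
            // closed form (n^2 + n + 2)/2 for the maximum with n cuts
            if (pieces != (n * (long) n + n + 2) / 2)
                throw new AssertionError("mismatch at n = " + n);
            pieces += n + 1; // the (n+1)'st cut adds at most n + 1 pieces
        }
        System.out.println("recurrence matches (n^2 + n + 2)/2 for n <= 30");
    }
}
```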

Exercise 4.11.3

Figure S-12 shows how one cut of a non-convex piece can yield three pieces. There is no limit to how many pieces could be produced if the boundary of the pizza is very wiggly.

Figure S-12: Cutting a non-convex pizza.

Exercise 4.11.4

Clearly with no lines we have one region, so the base case holds. Each new line intersects each of the old lines exactly once, so it passes through exactly n + 1 old regions (as we discussed in the section) and thus adds exactly n + 1 to the number of regions. By the arithmetic in Exercise 4.11.2, the solution is (n^2 + n + 2)/2 regions for n lines.

Exercise 4.11.5

For n = 0, we have 1 % 1 == 0 as desired. For n = 1, 2 % 1 == 0 != 1, so the desired statement is false. For n = 2, 3 % 2 == 1 as desired. For larger n, we know that F(n) < F(n+1), so the equation F(n+2) = F(n+1) + F(n) tells us that F(n+2) % F(n+1) == F(n) as desired.

For n = 0 the Euclidean Algorithm takes no steps, for n = 1 it takes one, and for n = 2 it takes one. For n = 3 it takes two, for n = 4 three, and in general for n > 2 it takes n - 1. To prove this, we take n = 2 as our base case and prove the induction by referring to the first half of this exercise: On input F(n+2) and F(n+1), we do one division and are left with F(n+1) and F(n), which by the inductive hypothesis take n - 1 more steps. Thus we have n = (n+1) - 1 total steps, and the inductive step is complete.
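The step counts can be confirmed mechanically; here is a small Java sketch (our own helper names) that runs the Euclidean Algorithm on consecutive Fibonacci numbers:

```java
public class FibEuclid {
    // number of division steps the Euclidean Algorithm takes on (a, b)
    static int steps(long a, long b) {
        int count = 0;
        while (b > 0) { long r = a % b; a = b; b = r; count++; }
        return count;
    }
    public static void main(String[] args) {
        long[] F = new long[22];
        F[1] = 1; F[2] = 1;
        for (int i = 3; i < 22; i++) F[i] = F[i - 1] + F[i - 2];
        // on input F(n+1) and F(n), the algorithm should take n - 1 steps
        for (int n = 2; n < 21; n++)
            if (steps(F[n + 1], F[n]) != n - 1)
                throw new AssertionError("n = " + n);
        System.out.println("Euclid on F(n+1), F(n) takes n - 1 steps");
    }
}
```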

Exercise 4.11.6

Let t(n) be the number of tilings. The empty tiling makes t(0) = 1, and clearly t(1) = 1 as well. If we have a 2 x n rectangle for n > 1, we can tile it by either (a) placing a domino vertically at the right end, then tiling the remaining 2 x (n-1) rectangle in one of t(n-1) ways, or (b) placing two dominoes horizontally at the right end, then tiling the remaining 2 x (n-2) rectangle in one of t(n-2) ways. Hence t(n) = t(n-1) + t(n-2) and we have the Fibonacci recurrence, though with different starting conditions. We can see that t(0) = t(1) = 1 gives us the two base cases for an inductive proof that t(n) = F(n+1), where F(n) is the Fibonacci function from Excursion 4.5.
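The claim t(n) = F(n+1) is easy to sanity-check in Java (a sketch of ours, using the same starting conditions as above):

```java
public class DominoTilings {
    public static void main(String[] args) {
        long[] t = new long[25];       // t(n): tilings of a 2 x n rectangle
        t[0] = 1; t[1] = 1;
        for (int n = 2; n < 25; n++) t[n] = t[n - 1] + t[n - 2];
        long[] F = new long[27];       // Fibonacci with F(1) = F(2) = 1
        F[1] = 1; F[2] = 1;
        for (int i = 3; i < 27; i++) F[i] = F[i - 1] + F[i - 2];
        for (int n = 0; n < 25; n++)
            if (t[n] != F[n + 1]) throw new AssertionError("n = " + n);
        System.out.println("t(n) = F(n+1) for n < 25");
    }
}
```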

Exercise 4.11.7

We can easily define a bijection from the perfect matchings of the grid graph to the tilings of the rectangle by dominoes. Given a matching, we place a domino over each edge of the matching, so each endpoint of the edge is covered by one of the squares of the domino. Since the matching is perfect, each square of the rectangle is covered exactly once, and we have a tiling. Given a tiling of the rectangle, we create a grid graph by placing a node at the center of each of the 2n squares of the rectangle and placing an edge between nodes that are adjacent horizontally or vertically. Then each domino in the tiling corresponds to an edge between the two nodes in the center of the domino's two squares. The edges for the tiling dominos include each vertex of the grid graph exactly once as an endpoint, since the tiling includes each square exactly once.


Figure S-13: Tiling a 4 x 4 square with T tetrominos.

So the edges form a perfect matching. Since each tiling corresponds to a different matching and vice versa, we have a bijection and the number of each is the same. Exercise 4.11.6 gives the number of tilings.

Exercise 4.11.8

(a) Figure S-13 shows a tiling of the 4 x 4 square with four T tetrominos. If 4 divides n, we can divide a 4 x n rectangle into n/4 such squares and tile each one separately.

(b) Clearly any tiling of a 4 x n rectangle will involve exactly n tetrominos. If we color the squares of the rectangle as suggested, each T tetromino will have three squares of one color and one of the other. If there are k tetrominos with three black squares, we will have a total of 3k + 1(n - k) black squares. If n is odd, this number must be odd, but the rectangle has 2n black and 2n white squares. (We've left open the question of whether the tiling is possible when n is even.)

Exercise 4.11.9

We first prove that F(i) and F(i+6) are always congruent modulo 4; the stated result follows immediately from this by induction on k. Using the Fibonacci rule, we can compute over the integers that F(i+6) = F(i+5) + F(i+4) = 2F(i+4) + F(i+3) = 3F(i+3) + 2F(i+2) = 5F(i+2) + 3F(i+1) = 8F(i+1) + 5F(i). Modulo 4, this last expression is congruent to 0F(i+1) + 1F(i) = F(i).

Exercise 4.11.10

We'll assume without loss of generality that i < j. If i = 0, 2^i + 1 = 2 divides itself, but fails to divide 2^j + 1 for all positive j because these numbers are odd. If i = 1, so that 2^i + 1 = 3, we can look at 2^j + 1 modulo 3 for all j. It starts at 2, and each time when we increase j by 1 we double the number and subtract 1. So 2 becomes 2(2) - 1 = 3 = 0, and 0 becomes 2(0) - 1 = -1 = 2, so we can prove by induction that (2^j + 1)%3 is 0 for odd j and 2 for even j. So 2^1 + 1 divides 2^j + 1 if and only if j is odd.

For larger i we again look at the periodic behavior of 2^j + 1 modulo 2^i + 1 as j increases. We start at 2 for j = 0, and then run through 3, 5, 9, ... (the values of 2^j + 1) until we reach 2^i + 1, which is congruent to 0 modulo 2^i + 1. We then get 2(0) - 1, which is congruent to 2^i, then go through -3, -7, -15, ... until we reach 1 - 2^i, which is congruent to 2, so that the process continues with period 2i. Thus 2^i + 1 divides 2^j + 1 if and only if j = ik for some odd natural k.
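The final characterization is easy to test exhaustively for small exponents; the following Java sketch (our own) checks it with 64-bit arithmetic:

```java
public class PlusOneDivisors {
    public static void main(String[] args) {
        for (int i = 1; i <= 6; i++)
            for (int j = 1; j <= 30; j++) {
                boolean divides = (((1L << j) + 1) % ((1L << i) + 1)) == 0;
                // the claim: j must be an odd multiple of i
                boolean oddMultiple = (j % i == 0) && ((j / i) % 2 == 1);
                if (divides != oddMultiple)
                    throw new AssertionError(i + " " + j);
            }
        System.out.println("2^i + 1 divides 2^j + 1 exactly when j = ik for odd k");
    }
}
```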


16.5 Exercises From Chapter 5

Exercise 5.1.1

Let L = L(∅*). Clearly λ ∈ L, since λ is defined to be in any star language. The only other way to get a string w into L would be if w = uv, u ∈ L, and v ∈ ∅, but this is impossible as ∅ has no elements.

Exercise 5.1.2

Let u be arbitrary, assume that u ∈ A*, and prove uv ∈ A* by induction on all strings v in A*. If v = λ it is clear that uv = u ∈ A*. So let v = xy, with x ∈ A* and y ∈ A. Then uv = (ux)y by associativity of concatenation on strings, ux is in A* by the inductive hypothesis, and (ux)y is in A* by the definition of A*. We have completed the induction, and since u was arbitrary we are done.

Exercise 5.1.3

The language A* is finite if and only if A = ∅ or A = {λ}. If A contains a string u with |u| > 0, then A* contains the string u^i for every natural i, and since these strings all have different lengths they are all different and A* contains infinitely many strings.

Exercise 5.1.4

The strings are ab, abbbb, aba, ababbb, abaa, abaabbb, bb, bbbbb, bba, bbabbb, bbaa, and bbaabbb.

Exercise 5.1.5

The language b*aab* is the set of all strings of a's and b's that have exactly two a's, which are next to one another.

Exercise 5.1.6

(a) The simplest expression is Σ*aΣ*bΣ*, because there can be arbitrary strings before the a, between the a and the b, or after the b. A more subtle analysis shows that if there is an a somewhere before a b, there must be an a immediately before a b, so we can have the simpler expression Σ*abΣ*.

(b) If we have both an a and a b, one must come before the other. So we can add the language of part (a) to the similar language of strings with a b before an a, to get Σ*aΣ*bΣ* + Σ*bΣ*aΣ*, or Σ*abΣ* + Σ*baΣ*, or even Σ*(ab + ba)Σ*.

Exercise 5.1.7

There are many examples, such as a* and (a + aa)*.

Exercise 5.1.8

Base case: n = 0, k^0 = 1, and there is exactly one string of length 0, which is λ. Inductive case: Assume that there are k^n strings of length n. We can make a string of length n + 1 by appending any of the k letters to any of the k^n strings of length n. These k^n x k = k^(n+1) concatenations each give a different string, so there are k^(n+1) total strings of length n + 1, completing the induction.

Exercise 5.1.9

There are 26 x 10 x 26 = 6760 choices for u, and 10 x 26 x 10 = 2600 choices for v, making 6760 x 2600 = 17576000. For comparison, the population of Canada in 2018 was estimated as 37.06 million, allowing about one postal code for every two people. In 2014, about 5 percent of the possible codes were in use, an average of about one code per 40 people.
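The count is just a product of independent choices, which a two-line Java check confirms (class name ours):

```java
public class PostalCodes {
    public static void main(String[] args) {
        int u = 26 * 10 * 26;   // letter-digit-letter choices for u
        int v = 10 * 26 * 10;   // digit-letter-digit choices for v
        System.out.println(u + " x " + v + " = " + (u * v));
    }
}
```

This prints 6760 x 2600 = 17576000.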

Exercise 5.1.10

(a) a* + a*ba* + a*ba*ba* + a*ba*ba*ba*

(b) a*ba*ba*ba*ba*

(c) Σ*bΣ*bΣ*bΣ*bΣ*

Exercise 5.2.1

(a) abΣ*

(b) aΣ*b

But v ∈ ∅ is false for any v, and the AND of a false statement with anything else is false. The left half thus reduces to ∃u: ∃v: 0, which is false for any x, so it is equivalent to x ∈ ∅ and we're done. (Remember that ∃x: 0 is definitely false, while ∀x: 0 is true for an empty domain and false otherwise. ∃x: 1 is true unless the domain is empty, while ∀x: 1 is always true.)

2.7.1 Writing Exercises

Here are a few practice statements to prove. For this exercise, you should go slightly overboard in justifying your steps. All the small-letter variables are of type "string in A*" (where A is a nonempty finite set, and A* is the set of all strings of letters in A) and all the capital-letter variables are of type "language over A" (or "subset of A*").

1. ∀x: Pal(x) → Pal(x^R).

2. ∀L: ∀M: ∀N: L(M ∪ N) = LM ∪ LN.

3. ∃u: ∀v: ∃w: ((uv = w) ∧ (uw = v)).

4. Assuming ∀u: ∃v: (uv ∈ L) and ∃x: ∀y: ∀z: (yxz ≠ y), prove ∃w: ∀x: [(x ∈ L) → ∃y: (w = xy)]. Is this conclusion true without the first assumption? (The second assumption is true for strings over any nonempty alphabet.)

(c) aΣ* + Σ*b

(d) (ab)*

Exercise 5.2.2

If w ∈ (b + ab*a)*, we show that w has an even number of a's by induction on the star language. If w = λ then there are zero a's, an even number. If w = uv, u has an even number of a's by the inductive hypothesis, and v ∈ b + ab*a, then v has either zero or two a's, and w also has an even number of a's.

For the other direction, we prove by induction on all even numbers n that if w has exactly n a's, then w ∈ (b + ab*a)*. If n = 0, w ∈ b* ⊆ (b + ab*a)*. Assume the result for n, and let w have exactly n + 2 a's. Define u to be the prefix of w including all letters up to but not including the next to last a, and let v be the rest of w. Then u has exactly n a's, so by the inductive hypothesis it is in (b + ab*a)*. The string v is in ab*ab*, so it is a string in ab*a followed by zero or more b's and is in (b + ab*a)*. Since star languages are closed under concatenation and w = uv, w is in the desired language and we have completed the induction.

Exercise 5.2.3

(aaaaaa)*(a + aa + aaaaa)

Exercise 5.2.4

(a) Every a has a b before and after: b* + bΣ*b.

(b) If the string is empty then the ∃ is false. If there is a b in the string, we can let x be its location and the → is true vacuously. If the string has one or more a's and no b's, however, any choice of x allows the choice of y = x, which makes the premise of the implication true and the conclusion false. So the language is Σ*bΣ*.

(c) The statement says that for every position with an a, every position to its right also has an a. The language is b*a*, which includes the empty string.

Exercise 5.2.5

(a) baba, b, bbab, bbbaba, bbb, bbbbab, ababa, ab, abbab, abbbaba, abbb, and abbbbab

(b) aba, abaa, abba, abbaa, abbba, abbbaa, abbbab, and abbbba

Exercise 5.2.6

Base case: For n = 0, the only string in the language is λ, and F(0+1) = 1. For n = 1, the only string in the language is a, and F(1+1) = 1. For the inductive step, a string of length n + 1 in the language must either end in a or in b. If it ends in a it must be wa for some w in the language of length n, and if it ends in b it must be wab for some w in the language of length n - 1. Thus given the strong IH, the number of strings of length n + 1 in the language is exactly the number of length n, which is F(n+1), plus the number of length n - 1, which is F(n). Since F(n+1) + F(n) = F(n+2), the number of strings of length n + 1 is F(n+1+1) and we have completed the inductive step.

Exercise 5.2.7

(a) (aa)*a is the set of all strings of a's whose length is not divisible by 2, and (aaa)*(a + aa) is the set of all strings of a's whose length is not divisible by 3. The union of these two languages contains all strings of a's except for those whose lengths are divisible both by 2 and by 3, which by the CRT is the set of strings of length divisible by 6.

(b) (aa)*a + (aaa)*(a + aa) + (a^5)*(a + aa + aaa + aaaa) + (a^7)*(a + a^2 + ... + a^6) + (a^11)*(a + a^2 + ... + a^10). As in part (a), a string of a's can fail to be in this language only by having a length that is divisible by 2, 3, 5, 7, and 11.

By the CRT, this set of missing strings is exactly the set of strings of a's with length divisible by 2 x 3 x 5 x 7 x 11 = 2310.

Exercise 5.2.8

Since every b except possibly the last one must be followed by at least three a's, we can take the language (a + baaa)* and then concatenate the set of possible ending strings, which is λ + ba* because the number of a's after the last b could be any natural.

Exercise 5.2.9

We can approximate the answer by the language (b + aa)*, which gives us a subset of the desired language because it also requires an even number of a's before the first b or after the last one. To get all the strings, we have to allow odd strings of a's at the beginning or the end, which we can do with the languages (a + λ)(b + aa)*(a + λ) or a*(b + aa)*a*.

Exercise 5.2.10

It turns out that 20 of the 32 strings of length 5 are in the language. With no b's we have just aaaaa. With one b we can have aaaab, aaaba, aabaa, abaaa, or baaaa. With two b's we can have no a's between them (aaabb, aabba, abbaa, or bbaaa) or two (baaba, abaab). With three b's we can have aabbb, abbba, baabb, bbaab, or bbbaa. With four b's, the single a must come at the beginning or the end (abbbb, bbbba), and with five b's we must have bbbbb.
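The count of 20 can be double-checked by brute force with Java's regex engine, transcribing the expression a*(b+aa)*a* from Exercise 5.2.9 (the class and method names are ours):

```java
import java.util.regex.Pattern;

public class CountLength5 {
    static final Pattern P = Pattern.compile("a*(b|aa)*a*"); // a*(b+aa)*a*
    // counts the binary strings over {a, b} of the given length in the language
    static int countMatches(int length) {
        int count = 0;
        for (int n = 0; n < (1 << length); n++) {
            StringBuilder sb = new StringBuilder();
            for (int bit = length - 1; bit >= 0; bit--)
                sb.append(((n >> bit) & 1) == 0 ? 'a' : 'b');
            if (P.matcher(sb).matches()) count++;
        }
        return count;
    }
    public static void main(String[] args) {
        System.out.println(countMatches(5) + " of 32 strings are in the language");
    }
}
```

This prints 20 of 32 strings are in the language, matching the list above.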

Exercise 5.4.1

If S = a and T = b, then abab is in (ST)* \ S*T*, and aab is in S*T* \ (ST)*.

Exercise 5.4.2

ST ⊆ (S+T)*, so (S + T + ST) ⊆ (S+T)*. Hence, by the Proposition of the section, (S + T + ST)* ⊆ (S+T)*. Conversely, since S + T ⊆ S + T + ST, it follows that (S+T)* ⊆ (S + T + ST)*.

Exercise 5.4.3

For the base case of n = 0, we know that (λ+S)^0 = λ because any language raised to the 0'th power is λ. So assume P(n), that (λ+S)^n is the sum for i from 0 to n of S^i. Then (λ+S)^(n+1) is equal to (λ+S)(S^0 + ... + S^n) = (S^0 + ... + S^n) + (S^1 + ... + S^(n+1)). Since S^i + S^i = S^i for any i, this is S^0 + S^1 + ... + S^(n+1) as desired.
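For a finite sanity check, we can model languages as Java sets of strings and compare both sides of the identity for a sample S (all names here are ours):

```java
import java.util.*;

public class PowerIdentity {
    // concatenation of two languages, each represented as a set of strings
    static Set<String> cat(Set<String> x, Set<String> y) {
        Set<String> r = new HashSet<>();
        for (String u : x) for (String v : y) r.add(u + v);
        return r;
    }
    public static void main(String[] args) {
        Set<String> s = new HashSet<>(Arrays.asList("a", "bb")); // sample S
        Set<String> lamPlusS = new HashSet<>(s);
        lamPlusS.add("");                                        // lambda + S
        Set<String> left = new HashSet<>(Collections.singleton(""));  // (lambda+S)^n
        Set<String> power = new HashSet<>(Collections.singleton("")); // S^n
        Set<String> right = new HashSet<>(power);                // S^0 + ... + S^n
        for (int n = 1; n <= 6; n++) {
            left = cat(left, lamPlusS);
            power = cat(power, s);
            right.addAll(power);
            if (!left.equals(right)) throw new AssertionError("n = " + n);
        }
        System.out.println("(lambda+S)^n equals S^0 + ... + S^n for n <= 6");
    }
}
```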

Exercise 5.4.4

Let w be an arbitrary string. Then w is in (S+T)(S+T) if and only if w = uv for some strings u and v that are each in S + T, meaning that they are each either in S or in T. Let us be the proposition u ∈ S, and similarly ut, vs, and vt. By the distributive law, ∃u: ∃v: (w = uv) ∧ (us ∨ ut) ∧ (vs ∨ vt) is true if and only if ∃u: ∃v: (w = uv) ∧ [(us ∧ vs) ∨ (us ∧ vt) ∨ (ut ∧ vs) ∨ (ut ∧ vt)]. We can distribute the existential quantifiers over the OR, to get (w ∈ SS) ∨ (w ∈ ST) ∨ (w ∈ TS) ∨ (w ∈ TT).

Exercise 5.4.5

Suppose that w ∈ ST, so that w = uv with u ∈ S and v ∈ T. But since u = a^i and v = a^j for some naturals i and j, uv = vu = a^(i+j). Since w = vu with v ∈ T and u ∈ S, we have that w ∈ TS. So ST ⊆ TS, and reversing S and T in this argument shows that TS ⊆ ST as well, and thus ST = TS.

Exercise 5.4.6

We first prove SS* ⊆ S*S, by induction on the definition of the star language on the left. The base case is S ⊆ S*S, which is true because λ ∈ S*, so that S = λS ⊆ S*S. For the inductive case, we consider a string of the form xyz where x ∈ S, y ∈ S*, and z ∈ S. Our inductive hypothesis is that xy ∈ S*S, so that xy can be written as uv where u ∈ S* and v ∈ S. Now we look at xyz, which equals uvz. Because S* is closed under concatenation of strings in S on the right, we see that since u is in S*, so is uv. This makes uvz a string in S*S as desired.

It remains to prove that S*S ⊆ SS*. We use induction on the definition of S*S. For the base case, we need only show that S ⊆ SS*, which is true because λ ∈ S*. So consider a string of the form xyz, where x ∈ S*, y ∈ S, and z ∈ S, and assume by the inductive hypothesis that xy ∈ SS*, so that xy = uv with u ∈ S and v ∈ S*. Then xyz = u(vz), and vz ∈ S* because S* is closed under concatenation on the right by strings in S. This makes xyz a string in SS* as desired.

Exercise 5.4.7

We first show (S*)^R ⊆ (S^R)*, which means that for any string w in S*, w^R ∈ (S^R)*. For the base case, w = λ, w^R = λ, and λ ∈ (S^R)* because λ is in any star language. For the inductive case, we consider a string wx with w ∈ S*, x ∈ S, and w^R assumed to be in (S^R)* by the inductive hypothesis. We must show that (wx)^R ∈ (S^R)*. We know that (wx)^R = x^R w^R, so (wx)^R is seen to be the concatenation of a string in S^R and a string in (S^R)*. Since ZZ* ⊆ Z* for any language Z (a special case of our Lemma that Z*Z* ⊆ Z*), this half is done. It remains to show that (S^R)* ⊆ (S*)^R. We do this by induction on the definition of the left-hand language, showing that every w ∈ (S^R)* is in (S*)^R. Again the base case is w = λ, and this is true because λ is in S* and thus λ^R = λ is in (S*)^R. Now we look at a string wx where w ∈ (S^R)*, x ∈ S^R, and by the IH w ∈ (S*)^R. We need to show wx ∈ (S*)^R, which means x^R w^R ∈ S*. But x^R ∈ S and w^R ∈ S*, so again we are finished by the rule ZZ* ⊆ Z*.

Exercise 5.4.8

An arbitrary string in (ST)^R is (wx)^R, where w is an arbitrary string in S and x an arbitrary string in T. Similarly an arbitrary string in T^R S^R is x^R w^R, for the same conditions on w and x. For each direction of the proof, we need only observe that since (wx)^R = x^R w^R, each string in one language meets the conditions to be in the other.

Exercise 5.4.9

We first show that S* ∩ A* ⊆ (S ∩ A*)*. We do this by induction on the definition of the left-hand star language: we must show that for every w ∈ S*, if w ∈ A*, then w ∈ (S ∩ A*)*. The base case of w = λ is clear because λ is in any star language. So consider wx where w ∈ S* and x ∈ S, and by the IH w ∈ A* implies w ∈ (S ∩ A*)*. For wx to be in A*, both w and x must be in A*. So w ∈ (S ∩ A*)*, x ∈ (S ∩ A*), and by the rule Z*Z ⊆ Z*, wx ∈ (S ∩ A*)*.

It remains to show that (S ∩ A*)* ⊆ S* ∩ A*. This is clear by induction on the definition of the left-hand language. If w = λ we clearly have both w ∈ S* and w ∈ A*. If we look at wx, where w ∈ (S ∩ A*)*, x ∈ S ∩ A*, and by the IH w ∈ S* ∩ A*, it is clear that wx is in both S* and A*, since the former is closed under concatenation on the right by strings in S, and the latter by strings in A* (by the Lemma that Z*Z* ⊆ Z*), and x is in both S and A*.

Exercise 5.4.10

No: we never used the fact that S and T were regular languages, only that they were languages. So all the proofs remain valid for S and T as arbitrary languages.

Exercise 5.5.1

We use induction on all strings y. If y = λ, then oc(x)oc(λ) = oc(x) = oc(xλ), so the identity holds. If y = z0, then oc(xz0) and oc(x)oc(z0) are both oc(xz)1, and the identity holds because the inductive hypothesis tells us that oc(xz) = oc(x)oc(z). Similarly, if y = z1, oc(xz1) and oc(x)oc(z1) are both oc(xz)0.

Exercise 5.5.2

We have first that rev(bac) = rev(ac)b = cab, and that rev((ac + bc)*ba(ba)*) = rev((ba)*)rev(ba)rev((ac + bc)*). The latter, by three individual calculations, is (ab)*ab(ca + cb)*, and the sum of the two results is the desired expression.

Exercise 5.5.3

If R = ∅, then Pref(R) = ∅. If R = a for some letter, then Pref(R) = λ + a. (The only two prefixes of the string a are λ and a itself.) If R = S + T, then Pref(R) = Pref(S) + Pref(T). The other two clauses require special handling in case of an argument denoting the empty language. If L(T) = ∅, then L(ST) = ∅ and thus Pref(ST) = ∅ for any S. Otherwise, if R = ST, then Pref(R) = Pref(S) + S Pref(T). (A prefix of a string in ST is either a prefix of the S-string or the entire S-string followed by a prefix of the T-string.) Finally, if R = S*, then Pref(R) = S*Pref(S) because a prefix of a concatenation of S-strings must consist of zero or more S-strings followed by a prefix of another S-string. That is, unless L(S) = ∅, in which case Pref(S*) = ∅*. Here is the code, which requires the emptyLanguage method from Exercise 5.5.4:

public RegExp prefix (RegExp r)
{// returns regular expression for prefix language of r
    if (r.isEmptySet()) return new RegExp();
    if (r.isZero()) return plus (star (new RegExp()), new RegExp('0'));
    if (r.isOne()) return plus (star (new RegExp()), new RegExp('1'));
    RegExp s = r.firstArg();
    if (r.isStar())
        if (!emptyLanguage(s)) return cat (star(s), prefix(s));
        else return star(new RegExp());
    RegExp t = r.secondArg();
    if (r.isUnion()) return plus (prefix(s), prefix(t));
    if (!emptyLanguage(t)) return plus (prefix(s), cat (s, prefix(t)));
    else return new RegExp();}

Exercise 5.5.4

public static boolean emptyLanguage (RegExp r)
{// returns true if r represents the empty language
    if (r.isEmptySet()) return true;
    if (r.isZero() || r.isOne() || r.isStar()) return false;
    RegExp s = r.firstArg();
    RegExp t = r.secondArg();
    if (r.isUnion()) return emptyLanguage(s) && emptyLanguage(t);
    else return emptyLanguage(s) || emptyLanguage(t);}

Exercise 5.5.5

public static boolean lambdaLanguage (RegExp r)
{// returns true if r represents {lambda}
    if (r.isEmptySet() || r.isZero() || r.isOne()) return false;
    RegExp s = r.firstArg();
    if (r.isStar()) return emptyLanguage(s) || lambdaLanguage(s);
    RegExp t = r.secondArg();
    if (r.isUnion())
        if (emptyLanguage(s)) return lambdaLanguage(t);
        else if (lambdaLanguage(s)) return emptyLanguage(t) || lambdaLanguage(t);
        else return false;
    else return lambdaLanguage(s) && lambdaLanguage(t);}

Exercise 5.5.6

public static boolean finiteLanguage (RegExp r)
{// returns true if r represents a finite language
    if (r.isEmptySet() || r.isZero() || r.isOne()) return true;
    RegExp s = r.firstArg();
    if (r.isStar()) return emptyLanguage(s) || lambdaLanguage(s);
    RegExp t = r.secondArg();
    if (r.isUnion()) return finiteLanguage(s) && finiteLanguage(t);
    if (finiteLanguage(s) && finiteLanguage(t)) return true;
    return (emptyLanguage(s) || emptyLanguage(t));}

Exercise 5.5.7

To test L(S) = L(T), we would use both of the algorithms to get a regular expression denoting the symmetric difference (L(S) ∩ complement of L(T)) ∪ (complement of L(S) ∩ L(T)). We then test whether this new expression denotes the empty language.

Exercise 5.5.8

public static boolean containsLambda (RegExp r)
{// returns true if L(r) contains lambda
    if (r.isEmptySet() || r.isZero() || r.isOne()) return false;
    RegExp s = r.firstArg();
    if (r.isStar()) return true;
    RegExp t = r.secondArg();
    if (r.isUnion()) return (containsLambda(s) || containsLambda(t));
    return (containsLambda(s) && containsLambda(t));}

Exercise 5.5.9

public static boolean hasStringWith0 (RegExp r)
{// returns true if L(r) has a string with a zero
    if (r.isZero()) return true;
    if (r.isEmptySet() || r.isOne()) return false;
    RegExp s = r.firstArg();
    if (r.isStar()) return hasStringWith0(s);
    RegExp t = r.secondArg();
    if (r.isUnion()) return (hasStringWith0(s) || hasStringWith0(t));
    if (hasStringWith0(s) && !emptyLanguage(t)) return true;
    if (!emptyLanguage(s) && hasStringWith0(t)) return true;
    return false;}

Exercise 5.5.10

The question is just the negation of the question of whether L(S) contains at least one string with a 1 in it. So we can adapt the answer to Exercise 5.5.9 to test for strings with 1's instead of strings with 0's, and then negate its answer by negating the result of each return statement.

Exercise 5.7.1

We prove by induction on derivations that every string derivable from MIU (a) does not have an III or UU substring, and (b) is in the regular language MIΣ*U. For the base case, MIU itself clearly has these two properties. If a string Mw satisfies (a) and (b), then only rule II can be applied to it. The only string derivable from Mw is thus Mww, and this satisfies (a) and (b) as well: (b) is obvious because w ∈ IΣ*U, and (a) is true because replacing w with ww does not create an III or UU substring. This completes the induction. Since strings satisfying (a) and (b) only allow for rule II, and all strings derivable from MIU have these properties, there is no way that the other rules can be used, and every string is derivable from MIU by rule II.

Exercise 5.7.2

For the base case, if there are no lines or circles then the map is one-colorable and hence two-colorable. For the inductive case, we assume that we have a two-colored map and consider adding another circle or line. With a circle, we reverse the color of all regions inside the circle. Then two adjacent regions that are either both inside or both outside the circle still have different colors. Adjacent regions newly created from single regions by the circle also have different colors, because one was reversed and the other was not. The same holds for lines, as we can reverse the color of all the regions on one side of the new line. Thus two-colorability holds not only for line maps and circle maps, but for maps made from both lines and circles.

Exercise 5.7.3

We use strong induction on all positive naturals q. For q = 1, we use induction on positive naturals p. If p = 1 the method returns 1. If it terminates on (p, 1), then it also terminates on (p + 1, 1) because on that input it recurses to (p, 1). Now assume that the method terminates on input (p, r) for any p and for any r ≤ q. We must show that it terminates on (p, q + 1) for arbitrary p. We show this by strong induction on p. If p < q + 1 it will recurse on (q + 1, p), and since p ≤ q in this case the strong inductive hypothesis on q says that we will terminate. If p = q + 1 the method terminates immediately. And if p > q + 1 it will recurse on (p - (q + 1), q + 1), and the strong inductive hypothesis on p says that this will terminate. This completes both inductions and proves termination for all positive naturals p and q.

Exercise 5.7.4

21/55 is 1/(2 + 13/21), or 1/(2 + 1/(1 + 8/13)), or 1/(2 + 1/(1 + 1/(1 + 5/8))), or 1/(2 + 1/(1 + 1/(1 + 1/(1 + 3/5)))), or 1/(2 + 1/(1 + 1/(1 + 1/(1 + 1/(1 + 2/3))))), which is finally 1/(2 + 1/(1 + 1/(1 + 1/(1 + 1/(1 + 1/(1 + 1/2)))))).

Exercise 5.7.5

1/(6 + 1/7) = 7/43
1/(5 + 7/43) = 43/222
1/(4 + 43/222) = 222/931
1/(3 + 222/931) = 931/3015
1/(2 + 931/3015) = 3015/6961
1/(1 + 3015/6961) = 6961/9976
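The quotients in this table are exactly the successive divisions of the Euclidean Algorithm, so the expansion can be generated mechanically (a Java sketch of ours):

```java
import java.util.ArrayList;
import java.util.List;

public class ContinuedFraction {
    // coefficients [a0; a1, a2, ...] of p/q, computed by repeated division
    static List<Long> expand(long p, long q) {
        List<Long> a = new ArrayList<>();
        while (q != 0) { a.add(p / q); long r = p % q; p = q; q = r; }
        return a;
    }
    public static void main(String[] args) {
        System.out.println(expand(21, 55));     // [0, 2, 1, 1, 1, 1, 1, 2]
        System.out.println(expand(6961, 9976)); // [0, 1, 2, 3, 4, 5, 6, 7]
    }
}
```

The second call yields the quotients 1 through 7 read off the table above, after the leading 0 for a fraction less than 1.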

Exercise 5.7.6

If β = (iα + j)/(kα + l), then 1/β = (kα + l)/(iα + j), and its number kj - il is the negation of il - kj, so if the number was 1 or -1 before it still is. If β = (iα + j)/(kα + l), then β - 1 = ((i - k)α + (j - l))/(kα + l). The new number is (i - k)l - (j - l)k = il - kl - jk + lk = il - jk and has not changed at all. Thus by induction on all derivations, the number il - kj is always 1 or -1.

Exercise 5.7.7

The mediant is clearly a rational number because it is the quotient of two naturals, with the denominator b + d definitely nonzero because neither b nor d is zero. We are told a/b < c/d, which means ad < bc. To prove a/b < (a+c)/(b+d), we multiply through to get ab + ad < ab + bc, which is true because ad < bc. To prove (a+c)/(b+d) < c/d, we multiply through to get ad + cd < bc + cd, which is again true because ad < bc.

To compute the imaginary part, we multiply top and bottom by the complex conjugate of cz + d = cx + d + cyi, which is cx + d - cyi. This makes the denominator (cx + d)^2 + (cy)^2, a positive real. The numerator is (ax + b + ayi)(cx + d - cyi), which has imaginary part acxy + ady - bcy - acxy = (ad - bc)y = y (because ad - bc = 1), a positive real. So the imaginary part of the original quotient must be positive, and the quotient is thus in the upper half-plane.

Exercise 5.7.10

If the first number a of x is smaller than the first number b of y, we know that x is smaller than y, because x is a plus something smaller than one and y is b plus something smaller than one. Suppose a = b but the second number c of x is smaller than the second number d of y. Then x is a plus a number between 1/c and 1/(c+1), while y is a plus a number between 1/d and 1/(d+1), so x is larger. In general there are two cases: one sequence is a prefix of the other, or they disagree after agreeing for some time. (We can think of the former case as one sequence having a 0 where the other has a positive number.) If this first disagreement occurs in an odd-numbered position, then the number with the smaller entry in that position is smaller. If it occurs in an even position, then the number with the smaller entry in that position is larger.

Exercise 5.8.1

Definition of (ab)*: If w = λ then w ∈ (ab)*. If w = vab then w ∈ (ab)* ⟺ v ∈ (ab)*. If w ≠ λ and w cannot be written as vab, then w ∉ (ab)*.

Definition of L = (a + ab)*: If w = λ then w ∈ L. If w = va then w ∈ L ⟺ v ∈ L. If w = vb, then if v = ua then w ∈ L ⟺ u ∈ L, otherwise w ∉ L.

Exercise 5.8.2

Bottom-up Definition: λ ∈ E, ab ∈ E, and if u ∈ E and v ∈ E, then uv ∈ E.

Top-down Definition: λ ∈ E, a ∉ E for any letter a, and otherwise let w = uab and w ∈ E ⟺ u ∈ E.

Exercise 5.8.3

Bottom-up: 1 ∈ P2, and if x ∈ P2 then 2x ∈ P2.

Top-down: 1 ∈ P2, and if x > 1 then x ∈ P2 if and only if x%2 = 0 and x/2 ∈ P2.

Exercise 5.8.4

We first prove by induction on the bottom-up definition that every derivable string

is in M(I + U)* and has an J-count not divisible by 3. (Thus it is in the MU-puzzle language by the results of Excursion 5.6.) The strings placed in the language by the first clause of the definition all have either 1 or 2 I’s, so their J-count is not divisible by 3. If w is a string whose J-count is not divisible by 3, then the string derived from w by the second clause has the same J-count, and the strings derived from w by the third clause have an J-count that is greater by 3. So these strings also have an J-count that is not divisible by 3.

Now we have to prove that any string in M(I+U)* whose I-count is not divisible by 3 is derivable from the bottom-up definition. We can do this by strong induction on the I-count: if it is 1 or 2 then the string is given by the first clause, and otherwise we can write w = xyz where y ∈ IU*IU*I and z ∈ U*. Thus w is derivable from x by one use of the second clause and zero or more uses of the third. The string x has a smaller I-count than w and one not divisible by 3, so by the inductive hypothesis it is derivable. For the top-down definition:

Look for the third-to-last I in w. If there is none, then accept the string if it has one or two I's and reject it if it has none. (I'm omitting the check that the string is in M(I+U)*.) If there is a third-to-last I in w, then let v be the substring of w including all letters up to but not including the third-to-last I. We then have that w is in the language if and only if v is.

Exercise 5.8.5

The RegExp class could define the following three members for RegExp objects. The first, type, will be one of the constants EMPTY, ZERO, ONE, PLUS, CAT, and STAR. The methods isEmpty, isZero, etc., will simply check the value of this constant to find the type of the regular expression. The other two members will be other RegExp objects, the first and second arguments of the plus, cat, or star operator at the top of the expression. These members will be null if there is no first argument or no second argument.
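The description above can be sketched as a Java class. The member and constant names follow the text; the constructor and accessors are my own assumptions, not the book's code:

```java
public class RegExp {
    // The six possible types of a regular expression, as in the text.
    public static final int EMPTY = 0, ZERO = 1, ONE = 2,
                            PLUS = 3, CAT = 4, STAR = 5;

    private final int type;       // which of the six constants this object is
    private final RegExp first;   // first argument, or null if none
    private final RegExp second;  // second argument, or null if none

    public RegExp(int type, RegExp first, RegExp second) {
        this.type = type;
        this.first = first;
        this.second = second;
    }

    // Each predicate simply checks the type constant.
    public boolean isEmpty() { return type == EMPTY; }
    public boolean isZero()  { return type == ZERO; }
    public boolean isOne()   { return type == ONE; }
    public boolean isPlus()  { return type == PLUS; }
    public boolean isCat()   { return type == CAT; }
    public boolean isStar()  { return type == STAR; }

    public RegExp getFirst()  { return first; }
    public RegExp getSecond() { return second; }
}
```

A star expression, for example, would carry its single argument as first and leave second null.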

Exercise 5.8.6

(a) A string w is a palindrome if it is empty, if it is a single letter, or if the first and last letters of w are equal and stripping off both yields a palindrome.

(b)

public static boolean isPalindrome (String w)
{  if (isEmpty(w)) return true;
   if (isEmpty(allButLast(w))) return true;
   if (first(w) != last(w)) return false;
   return isPalindrome(allButFirst(allButLast(w)));}
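With Java's standard String methods in place of the book's helper functions first, last, allButFirst, and allButLast, the same recursion can be written in self-contained, runnable form (a sketch, not the book's code):

```java
public class Palindrome {
    // Recursive palindrome test following the top-down definition:
    // empty and one-letter strings are palindromes; otherwise compare
    // the first and last letters and recurse on what is left.
    public static boolean isPalindrome(String w) {
        if (w.length() <= 1) return true;
        if (w.charAt(0) != w.charAt(w.length() - 1)) return false;
        return isPalindrome(w.substring(1, w.length() - 1));
    }
}
```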

Exercise 5.8.7 Let p be the proposition w = w^R, q be “w is a palindrome by the bottom-up definition”, and r be “w is a palindrome by the top-down definition”.

• p → q: We use induction on all w. For the base case, if w = λ the bottom-up definition says that w is a palindrome. If w = va for a string v and letter a, then if w = w^R we know that either v = λ (in which case w = a is a palindrome) or the first letter of v must be an a, so v = au and w = aua for some string u. Since w^R = au^Ra, we know that u satisfies u = u^R and is a palindrome. By the bottom-up definition, w is also a palindrome.

• q → r: If w = λ or w = a for a letter a, the top-down definition says directly that w is a palindrome. If w = ava for some palindrome v, then the top-down definition also says that w is a palindrome because it reduces to whether v is a palindrome.

• r → p: If w is empty or a single letter, then clearly w = w^R. If w is a palindrome because its first and last letters are the same and stripping them yields a palindrome, then w = ava for some letter a and palindrome v. By the IH, v = v^R, and we calculate w^R = av^Ra = ava = w, so p is true as well.

Exercise 5.8.8

(a) Clearly a single node with no edges is connected. When we add a new node, we add an edge connecting it to an existing node. Since this existing node (by the IH) has paths to all other nodes, so does the new node. And of course the existing nodes still all have paths to one another.

(b) Let T be a tree and let x be the given node of degree one. If we delete x and its edge from T to get a new graph T′, we can see that T′ is also a tree. It is connected because deleting x and its edge did not disturb any of the paths among the remaining nodes, and we cannot have created a cycle by deleting an edge. So we could have made T from T′. Now we can prove by induction on all positive naturals n that any tree with n nodes can be made by the rules of our bottom-up definition. If n = 1 we make it by the first rule. If T is a tree of n+1 nodes, we form T′ from it. By the IH, T′ can be made by the rules, so we can make T by adding one more node and edge.

(c) Given a graph G, look for a node of degree one. If there is none, G is not a tree by our assumed fact. If there is, form a graph G′ by deleting that node and its edge. If G′ is a tree, then so is G. If G′ is not a tree, then neither is G. (This last fact requires justification. If G′ is not connected, then G is not connected. And if G′ has a cycle, G has the same cycle.)

Exercise 5.8.9 For the base case, we can color the single node with either color and get a valid coloring. For the inductive case, if we have a two-colored tree and add a new node x, we can color the new node so as to make a valid coloring of the new tree, by simply choosing a color other than the color of x's neighbor.

Exercise 5.8.10 We use induction on positive naturals to show that there are 2^n total paths of length n in G_n, 2^(n-1) from s to a_n, and 2^(n-1) from s to b_n. For the base case of n = 1, there are paths of length 1 from s to a_1 and from s to b_1, for 2 = 2^1 total paths. We assume as our IH that in G_n there are 2^(n-1) paths to each node, and look at G_(n+1). There are 2^n total paths from s to a_(n+1), 2^(n-1) passing through a_n and 2^(n-1) passing through b_n. Similarly there are 2^n paths from s to b_(n+1), again half through a_n and half through b_n, for 2^(n+1) total paths.

Figure 16-14: The Flowchart for a Center-Exit Loop, for Exercise 5.10.2

Figure 16-15: The Flowchart for Exercise 5.10.3

Exercise 5.10.1 We replace “do S while (x)” with “S; while (x) S;”.

Exercise 5.10.2 We replace “loopbegin S1; looptest (x) S2; loopend;” with “S1; while (x) {S2; S1;}”. The flowchart is given as Figure 16-14.

Exercise 5.10.3 We replace “switch (n) {case a1: S1; break; case a2: S2; break; ...; case an: Sn; break;}” with “if (n == a1) S1; else if (n == a2) S2; ... else if (n == an) Sn;”. The flowchart is given as Figure 16-15.

Exercise 5.10.4

a.

y1 = true;
while (y1) {
   if (workDone) {goOut; y1 = false;}
   else { y2 = true;
      while (y2) {
         if (catMeowing) {

Source: David Mix Barrington

Figure 2-2: A relation on A = {1,2,3}, B = {w, x, y, z}.

2.8 Properties of Binary Relations

2.8.1 The Definition of a Function

In Section 2.1 we defined a binary relation to be a subset of the direct product A × B for sets A and B. For any given element a of A, there may or may not be some element b of B such that (a,b) is in the relation, and if there is one such b there may or may not be more than one. We will sometimes call such a relation “a relation from A to B”. Remember also that we identify a relation with its corresponding predicate, so we may say “R(a,b)” interchangeably with “(a,b) ∈ R”.

We can represent such a binary relation by a diagram like that in Figure 2-2. We draw a dot for each member of A on the left side of the diagram, a dot for each element of B on the right side, and an arrow from left to right for each element of the relation: if (a,b) is in the relation we draw an arrow from a to b. Note that a given dot on the left might have no arrows, one arrow, or more than one arrow leaving it.

In Section 2.1 we noted that a function can be thought of as a certain kind of relation, containing the pair (a, f(a)) for each element a of the domain. When can a relation be thought of as a function? A function must assign exactly one range element to each element of the domain. That is, for every element a there must be exactly one element b such that (a,b) is in the relation. In terms of the diagram, there must be exactly one arrow out of each dot on the left. (Thus the relation of Figure 2-2 is certainly not a function.)

You may have been taught in a calculus class to recognize whether a particular equation “defines a function”. An equation with two variables, each ranging over the real numbers, always defines a relation from R to R, namely, the set of pairs of reals that satisfy the equation. The equation y = x² defines a function from x to y, because for each value of x there is exactly one value of y making the equation true. The equation x = y², on the other hand, does not define a function because for every positive value of x there are two values of y, namely √x and −√x, that satisfy

Figure 16-16: The Flowchart for Exercise 5.10.6.

            y3 = true;
            while (y3) {
               if (catIn) {letCatOut; work; y3 = false;}
               else {letCatIn; work; y3 = false;}}
            y2 = false;}
         else {work; y2 = false;}}}}

Note that the construction introduces two wholly unnecessary while loops to deal with what could be simple if ... else constructions. Another case for plain if ... else structures would make the construction more efficient.

Exercise 5.10.5

The top box must be an action box or a decision box. If it is an action box, we are in Case 0 or Case I depending on whether any arrow goes back before the action box. If it is a decision box, we can always regard ourselves as being in Case III, since Case II is just a special case of Case III.

Exercise 5.10.6

The flowchart is given in Figure 16-16.

public String rev (String w)
{  String out = "";
   while (!isEmpty(w))
   {  out = append(first(w), out);
      w = allButFirst(w);}
   return out;}
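The same loop in self-contained form, using charAt, substring, and concatenation in place of the book's helper functions (a sketch, not the book's code). Prepending each first letter of w onto out is what reverses the string:

```java
public class Reverse {
    // Iterative string reversal: repeatedly move the first letter of w
    // to the front of out until w is used up.
    public static String rev(String w) {
        String out = "";
        while (!w.isEmpty()) {
            out = w.charAt(0) + out;  // prepend the first letter of w
            w = w.substring(1);       // drop that letter from w
        }
        return out;
    }
}
```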

Exercise 5.10.7 For the base case, we assume that atomic statements terminate. If we concatenate two statements, each of which terminates on its own, the result terminates. And if both branches of an if-else statement terminate on their own, the if-else terminates because it executes one of the two branches.


Figure 16-17: The Flowchart for Exercise 5.10.8.

Exercise 5.10.8 The flowchart is given in Figure 16-17.

public int countEdges (boolean [][] E)
{  // assumes E is n by n, with n positive
   int c = 0;
   int n = E[0].length;
   for (int i = 0; i < n; i++)
      for (int j = i + 1; j < n; j++)
         if (E[i][j]) c++;
   return c;}

x, y, and z. This path either is a cycle itself or can be reduced to a cycle by deleting edges and nodes other than x, y, and z. This contradicts the assumption that G is a tree.
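A runnable version of the edge-counting method, under the stated assumption that E is an n-by-n matrix; this sketch additionally assumes an undirected graph stored symmetrically with no self-loops, so each edge is counted once by scanning only above the diagonal:

```java
public class EdgeCount {
    // Count the edges of an undirected graph given as a symmetric
    // boolean adjacency matrix. Each edge {i, j} with i < j appears
    // exactly once above the diagonal.
    public static int countEdges(boolean[][] e) {
        int c = 0;
        int n = e[0].length;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                if (e[i][j]) c++;
        return c;
    }
}
```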

Exercise 9.1.4

Let T1 have node set {a,b,c,d} and edges (a,b), (b,c), and (c,d). Let T2 have node set {w,x,y,z} and edges (w,x), (w,y), and (w,z). Clearly both are trees, and they are not isomorphic because T1 has no node of degree 3 while w is a node of degree 3 in T2. If a four-node tree has a node of degree 3 it must be isomorphic to T2, and if it does not it must be isomorphic to T1. (An undirected graph of maximum degree 2 consists of cycles and line graphs, and a tree with this property must be a line graph.) There are three different possible trees with five nodes, up to isomorphism. If there is a node of degree 4, the other four nodes must have degree one and we have a star graph. If there is a node of degree 3, it has three neighbors and the fifth node must be a neighbor of one of these neighbors. Otherwise, as above, we must have a line graph.

Exercise 9.1.5

(a) The center node has five children.

(b) The root has one child (the center node) and that child has four children (the other four leaves).

(c) If a is the root, it has children b and r, r has only child d, d has children e and g, e has only child c, and g has children f and h. If b is the root, it has only child a, which has only child r, and r's family is as if a were the root. If c is the root, it has only child e which has only child d, d has children r and g, r has only child a which has only child b, and g has children f and h. If d is the root, it has children e, r, and g, r has only child a which has only child b, e has only child c, and g has children f and h. If e is the root, it has children c and d, d has children r and g, and the families of r and g are the same as if d were the root. If f is the root, it has only child g which has children d and h, d has children e and r, and the families of e and r are the same as if d were the root. If g is the root, it has children d, f, and h, and d's family is the same as if f were the root. If h is the root, the tree is exactly as if f were the root except that f and h change places.

Exercise 9.1.6

Every edge contributes one to the degree of each of its endpoints, and thus contributes exactly two to the sum of the degrees.

Exercise 9.1.7 If every node had degree at least 2, then the sum of the degrees would be at least 2n, but by Exercise 9.1.6 this sum must be 2n − 2. So some node has degree less than 2. But a tree is connected, and thus every node in it has degree at least 1.

Exercise 9.1.8 We will show that every graph we construct in this way has exactly one simple path between each pair of distinct nodes. This is vacuously true for the one-node graph, since there is no pair of distinct nodes. For the inductive case, let T be a tree already constructed, which by the IH we assume to have the unique path property. We take some node x of T and add both a new node y and an edge (x, y) to form T′. Now let u and v be distinct arbitrary nodes of T′. If neither u nor v is y, there was a unique simple path from u to v in T by the IH. It still exists in T′, and there is no other simple path because no simple path can use the new edge without beginning or ending at y. The other case is when u or v (let's say v, without loss of generality) is y. There was a unique simple path from u to x in T, which still exists. By adding the new edge, we can make a simple path from u to y. And this path is unique, because any path from y must go to x, whereupon there is only one way for it to reach u.

Exercise 9.1.9

(a) The line graph has sequence (4,3,3,2,2), the star graph has (2,2,2,2,1), and the other two graphs have (3,3,3,2,2).

(b) If n is even, it is (n−1, n−1, n−2, n−2, ..., n/2, n/2). If n is odd, it is (n−1, n−1, n−2, n−2, ..., (n+1)/2, (n+1)/2, (n−1)/2).

Exercise 9.1.10

(a) If u and v are any two strings in S_n, let w be their longest common prefix (possibly λ). We can change u to w by a series of zero or more deletions of final letters, and then change w to v by zero or more appending operations. This gives a simple path in G_n from u to v. There can be no other simple path, because we must delete those letters and append those letters, and if we do anything else we will have to reverse that operation and thus reuse an edge of G. Since there is a unique simple path between any pair of nodes, by the Unique Path Theorem G is a tree.

(b)

Every node has degree 3, except in the special case of n = 0 when λ is the only node and has degree 0. From λ we can append any of the three letters, and from any other node we can either delete a letter or append one of the two allowed letters.

(c) G_n consists of three complete binary trees of depth n−1, each containing 2^n − 1 nodes, and the node λ, for a total of 3(2^n − 1) + 1 = 3(2^n) − 2 nodes. We could also prove by induction on n that G_n has 3(2^n) − 2 nodes and (for positive n) 3(2^(n−1)) leaves. The inductive step would use the fact that we double the number of leaves and add two new nodes for each old leaf.

Exercise 9.3.1

We use induction on the call tree. If the call does not call another method, then n must equal 0, and we return 1, which is the right answer by the definition of factorial. Otherwise we assume that the recursive call returns the right answer. We return n times the result of the recursive call factorial(n - 1), which is n times (n − 1)! because the call is correct, which is n! by the definition of factorial.

Exercise 9.3.2 We use induction on the call tree. If there is no recursion then n must be 0 or 1, and we return n, which is equal to F(n) by the definition. Otherwise we assume that the two recursive calls each return the right answer, so we return F(n − 1) + F(n − 2), which by the definition of the Fibonacci function is the right answer.
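For reference, the two recursive methods whose correctness these arguments establish can be written as the following self-contained sketch (details such as the long return type are my own choices, not the book's):

```java
public class Recursion {
    // n! by the definition of factorial: 0! = 1, n! = n * (n-1)!.
    public static long factorial(int n) {
        if (n == 0) return 1;            // base case
        return n * factorial(n - 1);     // inductive case
    }

    // The Fibonacci function: F(0) = 0, F(1) = 1,
    // F(n) = F(n-1) + F(n-2) otherwise.
    public static long fib(int n) {
        if (n <= 1) return n;            // base cases return n itself
        return fib(n - 1) + fib(n - 2);
    }
}
```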

Exercise 9.3.3 They are not equivalent because the first is true when a is false and c is true, but the second is false in that case. The parse trees model the functional expressions or(and(a, b), c) for the first and and(a, or(b, c)) for the second. The prefix expressions are ∨∧abc and ∧a∨bc respectively, and the postfix expressions are ab∧c∨ and abc∨∧ respectively. The operator occurring first in the prefix expression, or last in the postfix expression, is the last one to be executed.

Exercise 9.3.4 For all three types, the traversal of a one-node tree is a sequence consisting of the single node. If the root r is a unary operator with single child s, the prefix traversal is r followed by the prefix traversal of s, the infix traversal is not defined, and the postfix traversal is the postfix traversal of s followed by r. If r is a binary operator with children s and t, the prefix traversal is r, followed by the prefix traversal of s, followed by the prefix traversal of t. The infix traversal is the infix traversal of s, followed by r, followed by the infix traversal of t. The postfix traversal is the postfix traversal of s, followed by the postfix traversal of t, followed by r.

Exercise 9.3.5 We use induction on the parse tree. If the tree is a leaf, the prefix and postfix strings are identical and so have the same length. Otherwise the postfix and prefix strings of the tree are each concatenations of the corresponding strings for the children of the root, together with a string for the operator at the root. By assumption, for each child, the prefix and postfix strings for that child have the same length as each other. So the prefix and postfix strings for the tree are each concatenations where the sum of the lengths of the elements of the concatenation is the same, so they have the same length.

Exercise 9.3.6 It is not true. Isomorphism of directed graphs depends on the presence or absence of edges, not on the order in which they occur. Although x ∧ y and y ∧ x are logically equivalent expressions, they are not the same expression. Yet they have identical parse trees, each with nodes for ∧, x, and y and edges from ∧ to both x and y. If our expressions include non-commutative operators such as →, the directed graph alone will not distinguish between x → y and y → x, which are not even logically equivalent.

Exercise 9.3.7 The root node represents the call to fib(5). It has two children, one for fib(4) and one for fib(3). The fib(4) child has children for fib(3) and fib(2), and the fib(3) child has children for fib(2) and fib(1). The root thus has four grandchildren, one representing fib(3), two fib(2), and one fib(1). All but the last (which is a leaf) have two children each, so the next level of the tree has six nodes, one for fib(2), three for fib(1), and two for fib(0). Finally the depth-4 level of the tree has two leaves, for the two children of the fib(2) node on the depth-3 level. The tree has 15 nodes in all.

Exercise 9.3.8

The root node E has children for the two terms (a + bc)(c + de + f) and gh. Its first child is a T node with children for the two factors (a + bc) and (c + de + f). The second child is also a T node, with children for its two factors g and h (which are atomic expressions and so leaves of the tree). Next, (a + bc) has an F node with a single child for the expression a + bc. This expression node has children for its two terms a and bc. Then a has a single factor child, which is a leaf because a is an atomic expression. And bc has two factor children, each of which is an atomic expression. Finally, (c + de + f) is an expression node with children for its three terms, which in turn have one, two, and one factors respectively. Each of these factors is a leaf of the parse tree.

Exercise 9.3.9

Let's use induction on the number of terms in the input string. Our statement is “after reading the first n terms, sum has a value equal to the sum of those terms, and the method is at the while statement”. If there is only one term (our base case, since there must be at least one), the call to evalTerm returns the correct value, and since the input is then over isPlus returns false and we return sum, which is the value of the term. Now assume as IH that we have read the first n terms, and the remaining input is the plus sign and the last term. Since by the IH we are at the while statement, isPlus is true, we read the next term and add its value to sum, and we reach the while statement again. Now sum has the correct value, and the method is in the place given by the n + 1 case of our statement. Since we are at the end of the input, we return the sum of the n + 1 terms, which is the correct output.

Exercise 9.3.10

In the solutions to Exercise 4.10.9 we computed maximum values of 1 for d = 0, 2 for d=1, 4 for d= 2, 16 for d = 3, 256 for d = 4, and 65536 for d= 5. Once we reached

d = 2, each succeeding value is the square of the last because the best way to get a large value in depth d+ 1 is to multiply together two copies of the maximum value for depth d. The pattern is easier to see if we express the values as powers of 2: for d = 2, 24 for d = 3, 28 for d= 4, and 2'° for d = 5. In each case we have 2

and this general rule

is easy to prove by induction.

We

take d = 1 (value 22

2? ts

‘) as

our base case, and for the inductive case just observe that the claimed value for d+1,

2 Exercise 9.4.1

is just the square of the value for d given by the IH, which is ge

There are 4^16, or about 4.295 billion, ways to place one of the four numbers in each of the 16 spaces. Since there are 4! = 24 ways to place the four numbers in a column without repeating, there are 24^4 = 331776 ways to place four such columns. Some of these are Latin squares (no repeats in a row either) and some of those are sudoku solutions. As suggested in the exercise, we can get an upper bound on the number of Latin squares by taking the 24 ways to place the first row and looking at the number of ways to place each additional number. In the second row, there are three possibilities for the first number, either two or three for the second, either one or two for the third, and one for the fourth, for at most 3 × 3 × 2 = 18 possibilities. For the third row there are one or two possibilities for each of the first three entries, and one for the last, for at most 8 possibilities, and the last row is then entirely determined. Our upper bound on the number of Latin squares is thus 24 × 18 × 8 = 3456. A more careful analysis shows that there are 576 Latin squares, half of which (288) are sudoku solutions.

Exercise 9.4.2

There is one empty path of length 0, and d^0 = 1. Assume that there are at most d^k paths of length exactly k. A path of length k + 1 is formed by adding an edge to a path of length k, and each path of length k can be extended in at most d ways, since the last edge must originate at a specific vertex. So there are at most d^k × d = d^(k+1) paths of length k + 1, completing the induction. For paths of length at most k, by the Sum Rule we get that there are at most 1 + d + ... + d^k, which may be easier to calculate as (d^(k+1) − 1)/(d − 1).

Exercise 9.4.3

(a) No, different positions in the first column have different effects on diagonals. (b) Yes, any tour can be expressed as a tour beginning at the top left square. (c) Yes, any solution can be changed to a solution with a 1 there by relabeling the numbers. (d) No, moving to a corner is different from moving to a side or to the center. (e) Yes, any game can be reduced to one of these three cases by rotating the board.

Exercise 9.4.4 Let the graph have nodes corresponding to the integers, with edges from i to i + 1 and i − 1. Let the start node be 0 and the goal node be −2. It is possible for a generic search to begin by exploring 1, then 2, then 3, and so on, never looking at −1 but continuing forever to search more positive integers.

Exercise 9.4.5

This is possible if the algorithm has no memory of prior visits to a node, and if it chooses nodes from the open list in an unfortunate way. Let the graph be a complete graph with three nodes a, b, and c, with a the start and c the goal. It begins by putting b and c on the open list. Suppose that it then takes b off the open list, so that it puts a and c on. Then suppose it takes a off the open list, putting b and c on, and so forth, always taking a or b off rather than c and thus never testing whether c is a goal state.

Exercise 9.4.6

If there is a path, Lemma 3 says that we will find it as long as every node added to the open list is eventually removed. If there is no path, Lemma 4 says that we will declare defeat as long as the search space is finite and each node enters the open list only finitely many times. We are given that the graph is finite, and because it is acyclic, there are only finitely many paths to each node in the graph. We know that a node is put on the open list once for each path to it. Thus the condition of Lemma 4 is fulfilled directly, and the condition of Lemma 3 is fulfilled because only finitely many actions can happen on the open list before the node is removed.

Exercise 9.4.7

(a) It means that every node will be reached before defeat is declared.

(b) If the diameter is d, any search that reaches all nodes at depth at most d has in fact reached all the nodes.

Exercise 9.4.8

Let's set n^100 = (1.01)^n and then solve for n. Taking natural logs of both sides, we get 100 ln n = n(ln 1.01), and the right-hand side is about 0.01n. This makes n about 10000 ln n. If we take n = 10^6, ln n is between 12 and 18 because 10 is between e^2 and e^3. So n is much bigger than 10000 ln n, which is between 120000 and 180000. This means that for n = 10^6, (1.01)^n is much larger than n^100, and for larger n this will be even more pronounced.
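The comparison can be checked numerically with logarithms, as in the calculation above. This small method (an illustration, not from the text) reports whether n^100 < (1.01)^n by comparing 100 ln n with n ln 1.01:

```java
public class GrowthCompare {
    // Returns true exactly when n^100 < (1.01)^n, using the fact that
    // both sides are positive, so we may compare their natural logs.
    public static boolean powerLoses(double n) {
        return 100 * Math.log(n) < n * Math.log(1.01);
    }
}
```

At n = 10^6 the exponential has already won, while at n = 1000 the polynomial is still far ahead.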

Exercise 9.4.9

Let's denote states by strings that have the disks on A, followed by those on B, followed by those on C, separated by vertical bars. Disks on the same pole will be in numerical order, as they must be on the pole. Since disk 1 and disk 2 can each be on any of the three poles, we have nine states: 12|| (the start state), 1|2|, 1||2, 2|1|, |12| (the goal state), |1|2, 2||1, |2|1, and ||12. If a state has disk 1 on top of disk 2, its node has degree 2 because disk 1 may be moved to either of the other two poles. Otherwise the two disks are on different poles and there are three moves: either disk may go to the empty pole, or 1 may go on top of 2. The state graph consists of three triangles based on where disk 2 is, connected by the edges that move disk 2. The shortest path from 12|| to |12| passes through 2||1 and |2|1, with three edges in all.

Exercise 9.4.10

A state is characterized by the set of holes that have pegs, and every set except the empty set and the set of all pegs is possible, so we have 2^9 − 2 = 510 states. There is an edge for every legal move. Let's denote the squares by the letters A, B, C for the top row, D, E, F for the middle row, and G, H, I for the bottom. We can eliminate the start state with the center square E empty, since no jump is possible from it. If we start with a side square like B empty, there is one jump possible, but the resulting state with E and H empty has no jump possible. So if we can do it at all, we start with a corner square like A empty. There are two possible moves from this position, which lead to having either B and C empty or D and G empty. Since these are similar, let's look at B and C empty. From here there are two moves. The first leaves C, E, and H empty, from which there is one move leaving A, B, E, and H empty. From there, there is only one move, leaving B, D, E, G, and H empty, a dead-end position. The second move from B and C empty leaves B, F, and I empty. From here there are three possible moves: (1) B, D, E, and I empty, with one move to a dead end, (2) E, F, H, and I empty, already a dead end, and (3) B, F, G, and H empty, with one move to a dead end. So there is no solution to this version of the puzzle.
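State graphs like these can also be explored mechanically. The following sketch (the integer encoding is my own, not the book's) runs a breadth-first search over the nine states of the two-disk Towers of Hanoi puzzle from Exercise 9.4.9, confirming the three-move shortest path:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class HanoiTwo {
    // A state is 3*p1 + p2, where p1 and p2 are the poles (0..2) holding
    // disks 1 and 2. BFS from start returns the distance to goal.
    public static int shortestPath(int start, int goal) {
        int[] dist = new int[9];
        Arrays.fill(dist, -1);
        Deque<Integer> queue = new ArrayDeque<>();
        dist[start] = 0;
        queue.add(start);
        while (!queue.isEmpty()) {
            int s = queue.remove();
            for (int t : neighbors(s))
                if (dist[t] == -1) { dist[t] = dist[s] + 1; queue.add(t); }
        }
        return dist[goal];
    }

    static List<Integer> neighbors(int s) {
        int p1 = s / 3, p2 = s % 3;
        List<Integer> out = new ArrayList<>();
        for (int p = 0; p < 3; p++) {
            // Disk 1 is the smaller disk and can always move.
            if (p != p1) out.add(3 * p + p2);
            // Disk 2 can move only when uncovered, and only to the free pole.
            if (p != p1 && p != p2 && p1 != p2) out.add(3 * p1 + p);
        }
        return out;
    }
}
```

With both disks starting on pole A (state 0) and the goal on pole B (state 4), the search finds the three-edge path described in Exercise 9.4.9.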

Exercise 9.5.1 The first solution found is 15863724. The five-queen solutions discovered and explored along the way were 13524, 13528, 13572, 13574, 13582, 13584, 13627, 13682, 13724, 13728, 13824, 13827, 13862, 13864, 14253, 14258, 14273, 14283, 14286, 14682, 14683, 14736, 14738, 14752, 14753, 14758, 14852, 14853, 15263, 15283, 15286, 15724, 15726, 15824, and 15827. Some of these extend to six or seven queens.
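The search just described can be reproduced with a short backtracking program. This sketch (my own code, not the book's) tries row values 1 through 8 for each column in increasing order and returns the first complete solution reached:

```java
public class Queens {
    // A queen in column c at row q[c] is safe if no earlier queen shares
    // its row or a diagonal.
    static boolean safe(int[] q, int col, int row) {
        for (int c = 0; c < col; c++)
            if (q[c] == row || Math.abs(q[c] - row) == col - c) return false;
        return true;
    }

    static boolean place(int[] q, int col) {
        if (col == 8) return true;               // all eight queens placed
        for (int row = 1; row <= 8; row++)
            if (safe(q, col, row)) {
                q[col] = row;
                if (place(q, col + 1)) return true;
            }
        return false;                            // backtrack
    }

    // Returns the first solution found, as a digit string such as "15863724".
    public static String firstSolution() {
        int[] q = new int[8];
        place(q, 0);
        StringBuilder sb = new StringBuilder();
        for (int r : q) sb.append(r);
        return sb.toString();
    }
}
```

Running it confirms the text: the first solution reached in this order is 15863724.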

Exercise 9.5.2 From 1, the DFS places 0 and 2 on the open list. It takes 0 off and puts 1 and 3 on — let's assume it puts 1 on the top of the stack. It then takes 1 off and puts 0 and 2 on. It will now alternate exploring 0 and 1 forever. If we then assume it can recognize 1 as the start node and not process it, it begins by taking 0 off and putting 1 and 3 on, then it skips 1, takes 3 off, and puts 0 and 2 on. If it always puts the lower number on top, it will now alternate forever between taking 0 off and taking 3 off.

Exercise 9.5.3 We start at 1 and put 0 and 2 on the queue. We take 0 off and put 1 and 3 on. We then take 2 off and put 1 and 3 on. We've completed the processing of the first level, and the queue contents are now (1, 3, 1, 3). Similarly, processing the next level leaves eight nodes on the queue, four copies of 0 and four copies of 2.

(a) There are 2^n paths of length n, and thus 2^n nodes on the queue for paths of length n.

impossible to find out its size. The empty set has size 0. If a set is finite and not empty, its size is some positive integer because it has at least one element. The set {2,3,5} is finite and has size 3. The set of all integers is infinite, because no integer is large enough to count how many there are. Similarly, the set of all positive integers is infinite, as is the set of all even positive integers. One way to show that a set is infinite is to describe a list of elements in the set that goes on forever and never repeats an element, such as {1,2,3,4,...} or {2,4,6,...}. No one knows whether the set {x : both x and x + 2 are prime numbers} is finite or not (we will define prime numbers in Chapter 3).

Definition: A set identity is a statement about sets that is always true, for any sets.

Example: The statement “A ⊆ A” is true, if A is any set at all. Why? In the rest of this chapter we will develop techniques to make formal proofs of facts like this. But for now, it's worth looking informally at why such a statement should always be true. The first step is to look at the definition to see exactly what the statement means. It tells us that “X ⊆ Y” means “every element of X is an element of Y”, so we know that “A ⊆ A” means “every element of A is an element of A”. It's pretty hard to argue against this last statement, so for now we'll accept it as “obviously true”. (Later we will want to have some sort of criterion for what is obviously true and what isn't.) Have we accomplished anything with this “proof”? We have shown that the statement about a new concept, “subset”, translates into an obviously true statement (about something being equal to itself) when we use the definition, so we have shown that the truth of the statement follows from the very meaning of “subset”. The identity isn't very profound, but this informal “proof” is a good example of the overall mathematical method we will use throughout the book.

1.1.3 Exercises

We define the following sets of naturals: A = {0, 2, 4}, B = {1, 3, 5, 8}, C = {2}, D = {0, 5, 8}, and E = {x : x is even}.

E1.1.1 Indicate whether each of these statements is true or false:

(a) 0 ∈ A

(b) 7 ∈ E

(c) C ⊆ A

(d) D ⊆ B

(e) D ⊆ E

(f) |D| = 4

(g) |C| = 1

(h) D and E have no common element.

(i) E is finite.

(a) y = x², a function

(b) x = y², not a function

Source: David Mix Barrington

Figure 2-3: The graphs of two equations on two real variables.

it. The relation defined by the equation x = y² is not a function. Figure 2-3 shows the graphs of these two relations, and the vertical line intercepting the second graph twice to indicate that it is

not the graph of a function. Using quantifiers, we can give logical statements that define this property of being a function. It turns out to be convenient to regard the “exactly one” condition as the AND of two conditions. Definition: We say that a binary relation is total if every element of the first set is associated with at least one element of the second:

Ya: 3b: (a,b) ER Remember

that

the existential

quantifier

does

not

say anything

about

how

many

values

of its

variable satisfy its quantified part, as long as there is at least one.

Definition: A binary relation R is well-defined if for every first argument a there is at most one second argument b such that R(a, b) is true. Expressing this property with quantifiers is not as straightforward as expressing totalness. One way to express it is to say what does not happen — there are never two different b’s associated to the same a:

¬∃a: ∃b: ∃c: ((a, b) ∈ R) ∧ ((a, c) ∈ R) ∧ (b ≠ c)
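These two quantified conditions are easy to test directly when the sets are finite. The following Python sketch is illustrative rather than from the text; it assumes a relation is encoded as a set of (a, b) pairs, and the function names are my own.

```python
def is_total(A, R):
    """For every a in A, there is at least one b with (a, b) in R."""
    return all(any(first == a for (first, _) in R) for a in A)

def is_well_defined(R):
    """No first argument is paired with two different second arguments."""
    seen = {}
    for (a, b) in R:
        if a in seen and seen[a] != b:
            return False
        seen[a] = b
    return True

def is_function(A, R):
    # "R is a function" is the AND of the two conditions above
    return is_total(A, R) and is_well_defined(R)
```

For example, the squaring relation {(x, x*x)} on a finite set is a function, while the same relation read backwards (as in Figure 2-3(b)) fails to be well-defined.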


(b) Now 1 still has children 0 and 2, but at the next level 0’s only child is 3 and 2’s only child is 3. At the next level, the first 3 knows not to search 0 and thus searches only 2, and the other searches only 0. Each subsequent level causes only two nodes to go on the queue.

(c) Again the second level has only two nodes, both for 3, and the third level has two nodes, one for 0 and one for 2. But now at the fourth level, the node for 0 puts 3 on the queue, but the node for 2 puts on nothing because it recognizes 1 as the start node and 3 as its parent. So the fourth level has only one node, for 3. The fifth level has one child of that node, for 2. And that is actually the entire tree, because the node for 2 recognizes 3 as its parent and 1 as its start node.

Exercise 9.5.4

The BFS could have a node in its queue for every path of length exactly d at the moment when it finishes processing all the nodes at distance d − 1. This could be as many as b^d nodes. The DFS, on the other hand, would have d nodes on its stack at any one time while it is searching to depth d.

The search truncated to depth d will take time proportional to the number of nodes in the DFS tree of depth d, which is at most 1 + b + b^2 + ... + b^d = (b^(d+1) − 1)/(b − 1). The iterative deepening method would conduct this search only after the searches for all previous depths, which would take total time about b^(d+2)/(b − 1)^2, which is only about b/(b − 1) times the time for the optimally truncated search.
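The geometric-series bound can be checked numerically. This Python sketch (not from the text; the function names are my own) compares the cost of one truncated search with the total cost of iterative deepening:

```python
def truncated_cost(b, d):
    # nodes in a branching-factor-b DFS tree of depth d:
    # 1 + b + b^2 + ... + b^d = (b^(d+1) - 1) / (b - 1)
    return (b ** (d + 1) - 1) // (b - 1)

def iterative_deepening_cost(b, d):
    # redo the truncated search at every depth from 0 up to d
    return sum(truncated_cost(b, j) for j in range(d + 1))

# for b = 2 the overhead ratio tends toward b / (b - 1) = 2
ratio = iterative_deepening_cost(2, 10) / truncated_cost(2, 10)
```

The larger the branching factor, the smaller the relative overhead of repeating the shallow searches.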

Exercise 9.5.6 Whenever we put a node onto the open list, we can record its depth, with the depth of the start node being 0 and the depth of any other node being one greater than the depth of the node causing it to be put on. If we ever find a node of depth n, we know that there is a path that contains the same node more than once, and there must be a cycle. Any cycle reachable from the start node will cause the search to eventually reach depth n, so it suffices to search from each possible start node. If there is no cycle, we proved in Exercise 9.4.6 that the search will terminate. We can use either a DFS or a BFS, but the DFS is likely to find a node of depth n more quickly if it exists, since the BFS will try all paths of length n − 1 first.

Exercise 9.5.7 If s is a sink (with out-degree 0), the only path from s is the empty path. Otherwise there is the empty path, plus one path for each path out of one of s’s neighbors.

    public int numPaths (node s) {
        int num = 1;
        for (each neighbor t of s)
            num += numPaths(t);
        return num;}

Exercise 9.5.8

Now we use the fact that there is one empty path of length 0 out of s, and a path of length k out of s for every path of length k − 1 out of a neighbor of s.

    public int numPaths (node s, int k) {
        if (k == 0) return 1;
        int num = 0;
        for (each neighbor t of s)
            num += numPaths(t, k-1);
        return num;}

This works on any directed graph, as long as each node has only finitely many neighbors, and that condition is necessary for the number of paths to be finite.
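Both recursions translate directly into runnable code. This Python sketch is illustrative (the sample DAG and names are my own, not the book’s):

```python
def num_paths(g, s):
    # one empty path, plus one path for every path out of a neighbor
    return 1 + sum(num_paths(g, t) for t in g[s])

def num_paths_k(g, s, k):
    # the empty path has length 0; a path of length k extends
    # a path of length k - 1 out of a neighbor
    if k == 0:
        return 1
    return sum(num_paths_k(g, t, k - 1) for t in g[s])

# a small DAG: s -> a -> t and s -> b -> t
dag = {"s": ["a", "b"], "a": ["t"], "b": ["t"], "t": []}
```

Summing num_paths_k over all possible lengths recovers num_paths, as the two recursions count the same set of paths.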

Exercise 9.5.9

Each search will place one node on the list for every path out of the start node, so each of the two searches will put exactly the same number of nodes on its list.

Exercise 9.5.10 When we put the object containing the node x on the list, we include in that object a path from s to x. When we put x on the list because it is a neighbor of y, the path we name from s to x is just the path we already had from s to y (located in the object we took off the open list to search y) with the edge from y to x appended.

Exercise 9.6.1

The DFS and BFS each search the entire connected component of the start node, so these trees include all the nodes in the graph if and only if the graph is connected. If there is an edge that is not in the DFS or BFS tree, then it forms a cycle with the path in the tree between its endpoints, so the original graph is not a tree. Similarly, if there is a cycle in the original graph, it is not possible for all the edges in that cycle to be included in the DFS or BFS tree, because those trees cannot contain cycles.

Exercise 9.6.2

Starting from a: a finds b which finds d which finds c which finds e which finds f, so that the DFS tree is a line graph. Starting from c: c finds a which finds b which finds d which finds f which finds e, so that again we have a line graph.

Exercise 9.6.3

The DFS tree is a line graph, and the BFS tree is a star graph.

Exercise 9.6.4

Here by “isomorphic”, we mean isomorphic as rooted directed trees. All RDT’s of two nodes are isomorphic, as they must have a root and a single child. But there are two possible RDT’s with three nodes, one where the root has a grandchild and one where it has two children. Let G have node set {a, b, c} with edges (a, b) and (b, c). If we conduct either a BFS or a DFS from a, we get a tree where b is the child of a and c is a grandchild. But if we do either a BFS or DFS from b, we get a tree where both a and c are children of b.

Exercise 9.6.5

If we start a BFS or DFS from a node x, we discover exactly those nodes that have a path from x. We could do n^2 separate searches, one to fill out each entry M_xy of the matrix by seeing whether y is discovered from x. Better is to do one search for each node x, filling out the x row of the matrix with a 1 for each node discovered and a 0 for each node not discovered. Better still is to search from x, enter a 1 in the matrix for each pair of nodes in x’s connected component, and then (if necessary) continue by starting a new DFS from some node not connected to x. This would require a number of searches equal to the number of connected components in the graph.
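The component-at-a-time method can be sketched as runnable Python (an illustrative encoding, not the book’s code: an undirected graph as an adjacency dict). One BFS per component fills a whole block of the matrix at once:

```python
from collections import deque

def reachability_matrix(adj):
    """One search per connected component; each search fills the
    whole block of matrix entries for that component."""
    nodes = sorted(adj)
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    M = [[0] * n for _ in range(n)]
    seen = set()
    for start in nodes:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:              # a plain BFS of start's component
            u = queue.popleft()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    queue.append(v)
        seen |= comp
        for u in comp:            # every pair in the component is mutually reachable
            for v in comp:
                M[index[u]][index[v]] = 1
    return M
```

The number of BFS runs equals the number of connected components, as the solution argues.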

Exercise 9.6.6 y could be in level i or any level with a smaller number. If (x, y) is a non-tree edge, we know that y has already been processed when it is discovered from x. Since we are still processing level i, any previously processed node must have level number i or smaller.

Figure 16-18: The DFS and BFS trees for Exercise 9.6.7.

Figure 16-19: The DFS and BFS trees for Exercise 9.6.8.

Exercise 9.6.7 The trees are shown in Figure 16-18.

Exercise 9.6.8 The trees are shown in Figure 16-19.

Exercise 9.6.9

(a) This must be true, because at each step the BFS and DFS explore the same set of nodes. So both will explore their last node on the same step, and the number of trees in the forests is the number of steps.

(b) This could be true, and will be if the original directed graph is a directed forest. But it fails in the simple example of node set {v1, v2, v3} and edge set {(v1, v2), (v1, v3), (v2, v3)}. The edge (v2, v3) is a tree edge of the DFS tree but a non-tree edge of the BFS tree.

(c) This cannot be true. If the number of trees in the forests is c, the number of tree edges in each forest is n − c, the same for both searches.

(d) This could be true. Consider a directed graph with node set {v1, v2, v3, v4} and edge set {(v1, v2), (v1, v3), (v3, v4), (v4, v2)}. The tree edges of the two trees are identical. The non-tree edge (v4, v2) is a cross edge of the DFS tree but does not go from one node to another at the same level.

Figure 16-20: The DFS and BFS trees for Exercise 9.6.10.
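The example in (d) can be checked by running both searches. In this Python sketch (an illustrative encoding, with node names as in the text and neighbor order fixed by the adjacency lists), the two trees have the same tree edges while the non-tree edge (v4, v2) joins different BFS levels:

```python
from collections import deque

def dfs_tree(adj, s):
    """Tree edges of a DFS from s, taking neighbors in listed order."""
    tree, seen = [], {s}
    def visit(u):
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                tree.append((u, v))
                visit(v)
    visit(s)
    return tree

def bfs_tree(adj, s):
    """Tree edges and level numbers of a BFS from s."""
    tree, level, q = [], {s: 0}, deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                tree.append((u, v))
                q.append(v)
    return tree, level

adj = {"v1": ["v2", "v3"], "v2": [], "v3": ["v4"], "v4": ["v2"]}
```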

Exercise 9.6.10 Both trees are shown in Figure 16-20.

(a) The DFS tree has eight tree edges and four cross edges, among them (d, e), (d, g), and (h, i).

(b) The BFS tree has eight tree edges and four non-tree edges: (d, e), (f, c), (h, i), and (h, g).

Exercise 9.8.1

From a: We find b from a at distance 1, c from b at total distance 3 from a, e from a at distance 5, and d from c at total distance 6. From c: We find b from c at distance 2, d from c at distance 3, a from b at total distance 3 (it is arbitrary whether a or d is found first), and e from d at total distance 7. From e: We find d from e at distance 4, a from e at distance 5, b from a at total distance 6, and c from d at total distance 7.

Exercise 9.8.2

Both search trees turn out to be star graphs, as we find each node from the root before we find any two-step paths.

Exercise 9.8.3

Every set of edges in the graph has a unique sum of weights, so it is not possible for two paths to have exactly the same cost and thus not possible for two nodes to have the same distance to the root as calculated during the search. The distance associated to a node during the search is always the weight along some path.

Exercise 9.8.4

If f(u, v) = x and f(v, w) = y, then there is a path from u to v of length x and a path from v to w of length y. There thus exists a path of length x + y from u, through v, to w. The number f(u, w), the length of the shortest path from u to w, can be no greater than the length of this specific path. Thus f(u, w) ≤ f(u, v) + f(v, w).

Exercise 9.8.5

Let the graph have node set {a, b, c, d} with edges (a, b), (a, c), and (c, d) of weight 1 and edge (b, c) of weight −3. We conduct a uniform-cost search with start node a and goal node d. From a we find b at distance 1 and c at distance 1. From b we then find c at total distance −2 and a at distance 2. Next off the priority queue is c, from which we find b at total distance −5 and d at total distance −1. We examine b first, finding c at total distance −8, and so on. The goal node d is in the priority queue but will never be examined because the latest discovery of b or c will always have a lower distance.

Exercise 9.8.6

The definition explicitly says that g(x, x) = 0, and it is clearly symmetric between x and y, so g(x, y) = g(y, x). Let x, y, and z be any three points. The left-hand side of the triangle inequality, g(x, y) + g(y, z), equals f(x, L) + f(L, y) + f(y, L) + f(L, z), and the right-hand side, g(x, z), equals f(x, L) + f(L, z). Since the left-hand side is greater by 2f(L, y), the inequality holds.

Exercise 9.8.7 Consider a graph with three nodes x, y, and z, and edges (x, y) of weight 5, (x, z) of weight 3, and (y, z) of weight 1. The DFS has x find y, which finds z, for a tree where z is the grandchild of x. The BFS makes both y and z children of x. And the UCS tree first finds z from x and then finds y from z, so that y is a grandchild of x.

Exercise 9.8.8

The first edge to enter the UCS tree is the edge of minimum cost that has the start node as one of its endpoints. So if we search from u or v, we will include (u, v) in the tree. This is not necessarily the case if we search from somewhere else. Consider a graph with nodes x, u, and v, and edges (x, u) and (x, v) of weight 3, and (u, v) of weight 1. The shortest paths from x to u and v each consist of the edges from x, and the UCS tree is complete before edge (u, v) is considered.

Exercise 9.8.9

We transform an edge with weight w into a path consisting of w edges. Now the number of edges in a path of G′ is equal to the total weight of the corresponding path in G. Since BFS finds the paths with the smallest number of edges in G′, we can use this result to find the shortest paths in G. The number of nodes and the number of edges in G′ is roughly equal to the sum of the edge weights in G — each is certainly less than mw, where m is the number of edges in G and w the maximum weight of any edge in G.

Exercise 9.8.10

(a) Let u and v be nodes defining the diameter, and let x be any node. We know that D = d(u, v) ≤ d(u, x) + d(x, v), which means that at least one of u and v is at distance at least D/2 from x, making R at least D/2. And clearly there is some shortest-path distance for a pair that is equal to R, making D at least R.

(b) Do a uniform-cost search from each node x, recording the distance of the farthest node from x. The minimum of these n distances is R, and the maximum is D.

(c) Here is a five-node example where we do not. The node set is {u, a, b, c, d} and the edge set {(u, a, 2), (u, b, 3), (u, c, 3), (u, d, 2), (a, b, 1), (b, c, 2), (c, d, 1)}. We search u, then search b and c because they have the maximum distance 3 from u. Every node is within distance 3 of u, b, and c. But the distance from a to d, by two different paths, is 4.

Exercise 9.9.1

A negative value of h can cause an incorrect conclusion if it is for the goal node. Let the graph have start node s, another node a, and goal node t, with edges (s, a) of length 1, (s, t) of length 3, and (a, t) of length 1. The shortest path is through a, but suppose that h(t) = −7 and h(a) = 1. The A* search will take t off the priority queue before a, and thus find the one-step path to t before the shorter two-step path. If h(t) = 0, on the other hand, negative values of h for other nodes could cause those nodes to be explored earlier than they should, but cannot prevent exploring all relevant nodes before accepting a path from s to t.

For the second example, let the graph have node set {s, a, b, t} with edges (s, a), (s, b), (a, t), and (b, t), where the edge (s, b) has weight 2 and the other three edges have weight 1. (As usual, s is the start node and t is the goal node.) Let h(a) = 3 (incorrectly high) and h(b) = 1. The A* search explores b before a because k(a) + h(a) = 4 while k(b) + h(b) = 3. While exploring b, it finds the path of length 3 from s to t through b, missing the shorter path through a.

Exercise 9.9.2

We have four nodes, s (the start), a, b, and t (the goal). We have edges (s, a) of length 3, (s, b) of length 1, (b, a) of length 1, and (a, t) of length 2. We let h(a) = 0 and h(b) = 3, which is admissible but violates the consistency condition, because h(b) > h(a) + d(b, a). The A* search initially computes k(a) + h(a) = 3 + 0 = 3 and k(b) + h(b) = 1 + 3 = 4, so it explores a first and sees the bad path of length 5 to t. It will still explore b before taking t off the queue, since k(b) + h(b) = 4 is still less than k(t) + h(t) = 5, but it will not explore a a second time from b and will thus miss the optimal path of length 4.

If h is a consistent heuristic without negative weights, and h(t) = 0 for any goal node t, then all we need to show h admissible is to show that h(u) is a lower bound on the true distance from u to a goal node. This follows by induction on paths from u to t — if the shortest path from u to t goes first to v, and we assume by induction that h(v) is at most the distance from v to t, then consistency tells us that h(u) ≤ h(v) + d(u, v). But the true distance from u equals the true distance from v plus d(u, v), which is at least as big as this.

Exercise 9.9.3 This heuristic is admissible because any path from v to t must have total length at least equal to the x-axis distance from v to t, which is h(v). It is consistent because if u and v are any two nodes, h(v) = |x_v − x_t| is no greater than |x_v − x_u| + |x_u − x_t|. The first of these terms is no greater than d(u, v), and the second is no greater than h(u). So we must have h(v) ≤ d(u, v) + h(u), as required.

Exercise 9.9.4

The search from s places nodes i, f, and j in the priority queue. Node j, with combined score 1 + 1, is searched first, putting g in the queue. Next is node f at 1 + √5, putting e and b on the queue. Then comes g at 2 + 2, which provides no new nodes. Then i at 1 + 3, which puts in e. Then b at 2 + √8, putting in a and c. Then e at 2 + √10, providing no new nodes. At this point the search finds the correct path, adding in succession c at 3 + √5, d at 4 + 2, h at 5 + 1, and t at 6 + 0.

Exercise 9.9.5

In any position of the 3-puzzle there are two possible moves, which means that every component of the position graph must be a cycle (since every node has degree 2). Denote a position by listing the contents of the four squares in row-major order, using X for the blank position. The start position is 123X, from which we successively move to 1X32, X132, 31X2, 312X, 3X21, X321, 23X1, 231X, 2X31, X213, 12X3, and back to 123X. Our only options are to go forward or backward in this cycle of twelve positions. There are 24 total ways to arrange four items in four positions — the other twelve form a separate cycle unconnected to this one. The suggested heuristic actually coincides with the distance in the cycle — if the start is at distance 0, the reachable nodes along the cycle are at distances 1, 2, 3, 4, 5, 6, 5, 4, 3, 2, and 1 respectively.

Exercise 9.9.6

(a) Suppose you are at town x and want to go to Boston. Using the graph, you find all towns y such that there is an edge e from x to y. You then look at the distance from x to Boston, and the distance from y to Boston, from the table. For at least one y, you will have the distance from y to Boston, plus the length of e, adding to the distance from x to Boston. You should then travel to y and repeat the process.

(b) This is just A* search with a perfect heuristic, since the table is telling you the exact distance to your goal.

Exercise 9.9.7

(a) Yes. The value h(s) is never queried by the A* algorithm.

(b) Since the actual distance from g to g is 0, the only possible value for h(g) in an admissible heuristic is 0. If we alter h to make h(g) = 0, then we make it admissible.

(c) Let x be the bad node. If h(x) < 0, we will still search every possible path, as the only consequence of the bad value is that some nodes will come off the priority queue earlier than expected. Each node that does come off is associated with a legitimate path, and since h(g) = 0 is accurate, the paths to g will still come off in the correct order. The problem comes if h(x) > d(x, g), which may put a node for a path to x deeper in the priority queue than it belongs. This might prevent us from ever checking some path from s through x to g that might be cheaper than the path we find first. But we will still check every path from s to g that does not go through x, and at worst we will find the best such path, which might not be the best path overall. If every path from s to g goes through x, we will take x off the priority queue before giving up, and we will find the correct shortest path.

Exercise 9.9.8 We assume that the nodes of the graph are numbered from 0 through n − 1. Note that we need no exception for the case of i = j, because then d[i][j] will be 0 and the condition will hold.

    public boolean consistent (int [][] d, int [] h) {
        int n = h.length;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (h[i] > h[j] + d[i][j]) return false;
        return true;}

Exercise 9.9.9

Exercise 9.10.1

We can bound the total number of pawn moves by six per pawn, or 96, and the total number of captures by 30, the number of non-king pieces. Thus the total number of moves must be at most 50(96 + 30) = 6300 before a fifty-move draw can be claimed. Just using the same-position-three-times rule we get a finite bound, of twice the possible number of positions plus one. The total number of positions is certainly bounded by 13^64, as each of the 64 possible squares could have one of six White piece types, one of six Black piece types, or be vacant.

Exercise 9.10.2

There are nine possible first moves, eight possible second moves, and so forth until the ninth move is determined, so the number of possible sequences is 9! = 362880.

Exercise 9.10.3

There are 16 × 15 × 14 = 3360 possible positions, as we can place the White king in 16 possible places, then place the White rook in one of the 15 other places, and then the Black king in one of the 14 others. Let’s use an analogue of algebraic chess notation, where the four squares in White’s home row are named a1, b1, c1, and d1, and similarly for the second, third, and fourth rows. If we start with the White king on a1, the White rook on b1, and the Black king on a4, then White has possible first moves Ka2, Rb2, Rb3, Rb4, Rc1, and Rd1. There is no legal reply to Ka2 or Rb2, so these positions are drawn (a good thing for Black here). The only legal reply to Rb3 is the capture Kxb3, which leads to an inevitable draw. In reply to Rb4 Black could play either Kxb4 (leading to a draw) or Ka2. To either of the last two White moves, Rc1 or Rd1, Black could move Ka3, Kb4, or Kb3.

Exercise 9.10.4

The first player has six possible opening moves, leaving the position respectively at 320, 311, 310, 221, 211, or 210. (Since the piles are indistinguishable, we re-sort them from largest to smallest after every move.) Each of these six moves has from three to five countermoves, which we won’t list here, but we will argue that the second player has a winning strategy. This is to make the position either 220 or 110 after her first move, which is always possible (220 from 320 or 221, 110 from each of the other four). The only moves possible from 220 are 210 and 200, and the second player replies to these by making the position 110 or 000 — the latter is an immediate win for her. If the second player makes 110, the first player must make 100, whereupon the second player wins on her next move. The key to Nim strategy (and thus the answer to Problem 9.10.5) is to identify these second-player-win positions. The second player can always win from one of these, and the first player can always move to one of these, and thus win, if the current position is not one of these.
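A brute-force game search confirms this analysis. The Python sketch below is illustrative (not the book’s code); it decides whether the player to move wins a normal-play Nim position given as a sorted tuple of pile sizes:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def first_player_wins(piles):
    """Normal-play Nim: remove any positive number of stones from
    one pile; a player who cannot move loses."""
    if sum(piles) == 0:
        return False          # no move available: the player to move loses
    for i, p in enumerate(piles):
        for take in range(1, p + 1):
            rest = list(piles)
            rest[i] = p - take
            # the piles are indistinguishable, so re-sort after the move
            if not first_player_wins(tuple(sorted(rest, reverse=True))):
                return True
    return False
```

In particular the start position 321 and the key positions 220 and 110 are all losses for the player to move, exactly as the strategy argument says.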

Exercise 9.10.5

All we need is for the best move by the second player to always be the last one considered, as in the tree whose leaves are 1,1,3; 2,2,4; 3

Exercise 9.10.6 We will call the first player White and the second player Black. For n = 0, Black wins because White cannot move. For n = 1 and n = 2 White wins by knocking down all the pins. For n = 3 White wins by knocking down the middle pin, whereupon Black must knock down one of the other two, leaving White one to knock down for the win. With n = 4 White can knock down the two middle pins, winning in the same way. With n = 5 White can knock down the middle pin, leaving two groups of two, whereupon she can win with a “mirror strategy”, taking everything Black does to one group and doing it to the other. In fact White can win any game with positive n in this way, by creating two groups of equal size by knocking down one or two middle pins.
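The mirror-strategy claim can be verified exhaustively for small n. In this Python sketch (an illustrative encoding, not the book’s), a position is the sorted tuple of contiguous group sizes, a move knocks down one pin or two adjacent pins from one group, and the player who takes the last pin wins:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def wins(groups):
    """True if the player to move wins from this tuple of group sizes."""
    moves = set()
    for gi, g in enumerate(groups):
        rest = groups[:gi] + groups[gi + 1:]
        for take in (1, 2):
            for i in range(g - take + 1):
                # knocking out pins i..i+take-1 splits the group in two
                parts = tuple(p for p in (i, g - take - i) if p > 0)
                moves.add(tuple(sorted(rest + parts)))
    return any(not wins(m) for m in moves) if moves else False
```

As the solution argues, White (the first player) wins every single-row game with a positive number of pins, and a position of two equal groups is a loss for the player to move.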

Exercise 9.10.7 Again White moves first and Black second. For n = 0 Black wins because White cannot move. For n = 1, n = 2, and n = 3, White can win by knocking down all the pins on the first move. For n = 4 White can leave a single group of either two or three pins, which Black can win by knocking down. For n = 5, White can win by knocking down the three middle pins. For n = 6 White can win by knocking down two pins at either end, leaving a single group of four which we know is a losing position. For n = 7 White can win by knocking down the three middle pins, leaving two groups of two. From that point on she can win by matching everything Black does to one group by a move in the other group, until both groups are totally knocked down, which will happen on White’s move.

Exercise 9.10.8

This game is actually isomorphic to Tic-Tac-Toe and is thus a draw under optimal play. A 3 × 3 magic square is an arrangement of the numbers from 1 through 9 so that every row, column, and diagonal sums to 15. (It’s not hard to show that these exist, by just finding one.) If we number a Tic-Tac-Toe board in this way, and map the drawing of a card into one’s hand to marking the corresponding square with one’s mark, it’s easy to see that the games are isomorphic.

Exercise 9.10.9

(a) Black can guarantee winning by making b1 = 1 and b2 = 0 so that the number is both even and greater than 2.

(b) If White makes w1 = 0, Black can make b1 = 1 and then win by making the final number either 5 or 7. But if White makes w1 = 1, Black is doomed, because White can make w2 the same as b1 so that the number is 8, 9, 14, or 15, all composite.

Switching the roles of Black and White changes a winning game to a losing one and vice versa, but that’s not what we did here. It’s possible for the same player to win both an original game and the game resulting from reversing all the leaf outcomes — consider a game of only one move where one leaf wins for each player.

We have a game tree where the values at the leaf level are 0, −1, 2, 3, −6, ..., −10, −12, and −15, the values at the third level are 2, 4, ..., and −12, the values at the fourth level are −4 and −12, and the root value is −4. So the number chosen under optimal play is 4.

Exercise 9.10.10

You wait for Grandmaster A, playing White, to make the first move in that game. ‘You then make that same move as your first White move in the other game. When Grandmaster B replies to that move, you make the same move against A on the first board. In this way, you simply facilitate A and B playing a game with one another. If one of them wins, you win the game in which you were making their moves. If they draw, you earn a draw in both games.

16-94

16.7 Exercises from Chapter 14

Exercise 14.1.1

We use induction on all strings w. For the base case, the string read so far is λ, the DFA is in state ι, and δ*(ι, λ) = ι by definition, so the value of current is as required. For the inductive step, assume that current is δ*(ι, w) = s after the string w has been read and let a be an arbitrary letter. By the given code, the DFA moves to state δ(s, a), and since by definition δ*(ι, wa) = δ(δ*(ι, w), a) = δ(s, a), current has the correct value after wa is read. This completes the inductive step and thus the proof.

Exercise 14.1.2

Let the state set be {s, p}, with s the start state and the only final state, δ(s, 0) = δ(p, 0) = s, and δ(s, 1) = δ(p, 1) = p. Then δ*(s, 01110) = δ(δ(δ(δ(δ(s, 0), 1), 1), 1), 0) = s.

Exercise 14.1.3

Call the states s (the start state) and p (the other state). We must pick a subset of the two states for the final state set (four ways to do this) and pick values for δ — we must pick one of the two states for each of the four state-input combinations. This is 4 · 16 = 64 total choices. With three states, we have 2^3 = 8 choices for the final state set. We must now choose one of the three states as the value of δ in each of the six state-input combinations, so there are 3^6 = 729 possible δ functions and 8 · 729 = 5832 different three-state DFA’s.

Exercise 14.1.4

Let the j states be named {0, 1, ..., j − 1}. Let 0 be the start state and i be the only final state. For each state k, δ(k, 0) = k and δ(k, 1) = (k + 1) % j. The state after reading w is equal to the number of ones in w, modulo j. So the DFA accepts w if and only if this number is i.

Exercise 14.1.5

The states are {0, 1, 2, 3, 4, 5}. For each i, δ(i, 0) = i and δ(i, 1) = i + 1, except that δ(5, 1) = 5. The start state is 0 and the only final state is 5.

Exercise 14.1.6

Let D be a DFA for L and make a DFA D′ from D by changing every final state of D to non-final in D′ and vice versa. (That is, the final state set of D′ is the complement of that of D.) Now a string is accepted by D′ if and only if it is not accepted by D, so L(D′) is the complement of L.

Exercise 14.1.7

Write w as w1w2...wn. We have state set {0, 1, ..., n, d}, with 0 the start state and n the only final state. We set δ(i − 1, wi) = i and set all other values of δ(s, a) to d. Now on input w the DFA goes from state 0 to state n and accepts, and on any other input it either falls short of state n or goes to d, and rejects. (We should check the special case of w = λ, where 0 = n and thus the DFA accepts λ, but goes to d on any nonempty string.) The DFA for w = 0110 has five states and is pictured in Figure 16-21.

Exercise 14.1.8 We have a start state ι and states p_a and q_a for each letter a. The final state set is {p_a : a ∈ Σ}. We define δ(ι, a) = p_a, δ(p_a, a) = δ(q_a, a) = p_a, and δ(p_a, b) = δ(q_a, b) = q_a for all b ≠ a. Once the DFA has seen its first letter a, it goes to p_a if the last letter it saw was also a, and to q_a otherwise.

Exercise 14.1.9 We have a start state ι, a state p_a for each letter a, and a single final state f. We define δ(ι, a) = p_a, δ(p_a, a) = f, δ(p_a, b) = p_b for all b ≠ a, and δ(f, a) = f for all a. After the first letter, the DFA remembers the last letter it has seen, until or unless it sees that letter again and jumps to the final state, staying there for the rest of the string.


or, using the negation-of-quantifier rule three times:

∀a: ∀b: ∀c: ¬[((a, b) ∈ R) ∧ ((a, c) ∈ R) ∧ (b ≠ c)]

or, rewriting the negated AND as an implication,

∀a: ∀b: ∀c: [((a, b) ∈ R) ∧ ((a, c) ∈ R)] → (b = c).

The last version can be informally read in English as “if any element of A is related to two elements of B, those two elements are the same,” which is another way of saying “every element of A is related to at most one element of B”. In terms of the diagram, the last statement says “for any dot on the left, any two arrows out of it go to the same place”.

Once we have quantified statements with R as a free variable that say “R is total” and “R is well-defined”, we can take the AND of these two statements to get a single statement saying “R is a function”. In Section 2.9 we will explore functions further and use quantifiers to define some important properties of functions.

2.8.2 Properties of Binary Relations on a Single Set

It is possible, in fact even common, for the two sets in a binary relation to be the same set, giving us a binary relation on a set. (If the set is A, the relation is a subset of A × A.) Two examples of such a relation are the equality relation (E(x, y) ↔ (x = y)) on any set, and the order relation L(x, y) on an ordered set (such as the naturals).

• The relation R is symmetric if ∀x: ∀y: R(x, y) → R(y, x), that is, if you can always switch the variables in a true instance and keep it true.

• The relation R is antisymmetric if ∀x: ∀y: (R(x, y) ∧ R(y, x)) → (x = y). That is, if you switch the variables in a true instance, and the variables aren’t equal, you always get a false instance. Note that “antisymmetric” does not mean the same as “not symmetric” — it is not hard to think of a relation that is neither.
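On a finite set both properties are easy to test directly, and small examples show that every combination is possible. A Python sketch (illustrative, using the same set-of-pairs encoding as before):

```python
def is_symmetric(R):
    return all((y, x) in R for (x, y) in R)

def is_antisymmetric(R):
    return all(x == y for (x, y) in R if (y, x) in R)

S = range(3)
eq = {(x, x) for x in S}                        # symmetric and antisymmetric
le = {(x, y) for x in S for y in S if x <= y}   # antisymmetric, not symmetric
ne = {(x, y) for x in S for y in S if x != y}   # symmetric, not antisymmetric
neither = {(0, 1), (1, 0), (1, 2)}              # neither property
```

The last example is the kind of relation the note above asks for: it contains a symmetric pair of unequal elements and also an unmatched pair.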


Figure 16-21: The DFA for {0110}.

Exercise 14.1.10

We have four states EE, EO, OE, and OO, where the first letter of the state name indicates whether we have seen an even or odd number of a’s, and the second letter indicates whether we have seen an even or odd number of b’s. Seeing an a causes us to change the first letter and keep the second the same, so for example δ(EO, a) = OO, and seeing a b causes us to keep the first and change the second, so for example δ(EO, b) = EE. State EE is both the start state (since 0 is an even number) and the only final state.

Exercise 14.2.1

The three strings λ, 1, and 10 are pairwise L-distinguishable, as λ distinguishes λ from both 1 and 10 (since λ ∈ L but 1 ∉ L and 10 ∉ L), and 1 distinguishes 1 from 10 (since 11 ∈ L but 101 ∉ L). A two-state DFA would have to have two of the states δ*(ι, λ), δ*(ι, 1), and δ*(ι, 10) equal, so it would fail to accept exactly the language L.

Exercise 14.2.2

We look at w = vx, where v is an arbitrary string and x an arbitrary letter. By definition δ*(ι, w) = δ(δ*(ι, v), x). We check each case:

• If v ∈ L1, x = a, then w ∈ L2 as w ends in a and has no aba

• If v ∈ L1, x = b, then w ∈ L1 as w ends in neither a nor ab and has no aba

• If v ∈ L2, x = a, then w ∈ L2 as w still ends in a and has no aba

• If v ∈ L2, x = b, then w ∈ L3 as w ends in ab and has no aba

• If v ∈ L3, x = a, then w ∈ L4 as w has an aba at the end

• If v ∈ L3, x = b, then w ∈ L1 as w ends in neither a nor ab and has no aba

• If v ∈ L4, whatever x is, w ∈ L4 because w also has an aba
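This case analysis amounts to checking that the DFA’s state always matches the class of the string read so far, which a short Python sketch can verify exhaustively on small strings (an illustrative encoding; the class numbering 1–4 follows the text):

```python
from itertools import product

# transition function of the four-state DFA from the text
delta = {(1, "a"): 2, (1, "b"): 1,   # L1: ends in neither a nor ab, no aba
         (2, "a"): 2, (2, "b"): 3,   # L2: ends in a, no aba
         (3, "a"): 4, (3, "b"): 1,   # L3: ends in ab, no aba
         (4, "a"): 4, (4, "b"): 4}   # L4: contains aba

def run(w):
    state = 1
    for c in w:
        state = delta[(state, c)]
    return state

def expected_class(w):
    if "aba" in w:
        return 4
    if w.endswith("ab"):
        return 3
    if w.endswith("a"):
        return 2
    return 1

ok = all(run("".join(p)) == expected_class("".join(p))
         for n in range(8) for p in product("ab", repeat=n))
```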

Exercise 14.2.3

Figure 16-22: The DFA for Exercise 14.2.3

Figure 16-23: The DFA for Exercise 14.2.4

The DFA is drawn in Figure 16-22. L1 is the language 0* + Σ*000*, L2 is the language Σ*1, and L3 is the language Σ*10. L(M) is the union of L1 and L3, which can also be written λ + Σ*0. If we let w = vx, as above in Exercise 14.2.2, there are six cases to check:

• If v ∈ L1, x = 0, then w ∈ L1, as w is in 0* if v was, and otherwise ends in 00

• If v ∈ L1, x = 1, then w ∈ L2 as w clearly ends in 1

• If v ∈ L2, x = 0, then w ∈ L3 as w ends in 10

• If v ∈ L2, x = 1, then w ∈ L2 as w ends in 1

• If v ∈ L3, x = 0, then w ∈ L1 as w ends in 00

• If v ∈ L3, x = 1, then w ∈ L2 as w ends in 1

Exercise 14.2.4

A DFA with language P₁ is shown in Figure S-23. Here L₁ = (LR)*, L₂ = (LR)*L, and L₃ is everything else, (LR)*[R + LL]Σ*. We have to verify the six cases for a complete proof as in the last two exercises (w = vx):

• If v ∈ L₁ and x = L, clearly w ∈ L₂
• If v ∈ L₁ and x = R, clearly w ∈ L₃ with an R after the (LR)* string
• If v ∈ L₂ and x = L, clearly w ∈ L₃ as we just created an LL
• If v ∈ L₂ and x = R, then w ∈ L₁ as we just created an LR
• If v ∈ L₃, whatever x is, w is also in L₃

Exercise 14.2.5

For the base case, with w = λ, δ*(ι, u) = δ*(ι, v) is given as the assumption. For the inductive case, let w = xa and assume δ*(ι, ux) = δ*(ι, vx) = s. Then δ*(ι, uxa) and δ*(ι, vxa) are both δ(s, a) by definition, and thus are equal to each other. This completes the inductive step and thus the proof.
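The induction above is easy to watch in code. This is a sketch of the extended transition function δ*; the three-state DFA (counting a's mod 3) is invented for illustration and is not a machine from the text.

```python
# A sketch of the extended transition function delta* from Exercise
# 14.2.5. The three-state DFA below (counting a's mod 3) is invented
# for illustration.

def delta_star(delta, state, w):
    # Matches the recursive definition delta*(q, vx) = delta(delta*(q, v), x).
    for x in w:
        state = delta[(state, x)]
    return state

delta = {(q, c): (q + (c == 'a')) % 3 for q in range(3) for c in 'ab'}

# If u and v lead to the same state, then uw and vw do too, for any w,
# exactly as the induction in the solution shows.
u, v = 'aab', 'babbab'          # both have two a's, so both reach state 2
assert delta_star(delta, 0, u) == delta_star(delta, 0, v)
for w in ['', 'a', 'ab', 'bba']:
    assert delta_star(delta, 0, u + w) == delta_star(delta, 0, v + w)
```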

Exercise 14.2.6

Let w = w₁w₂...wₙ and consider the n+1 different strings λ, w₁, w₁w₂, ..., w₁...wₙ, and any other string v, for n+2 strings in all. We claim that any two of these strings are distinguishable for the language {w}. If i and j are two different naturals each no greater than n, then we have (w₁...wᵢ)wᵢ₊₁...wₙ = w but (w₁...wⱼ)wᵢ₊₁...wₙ ≠ w. And for any i with i ≤ n we have (w₁...wᵢ)wᵢ₊₁...wₙ = w but (v)wᵢ₊₁...wₙ ≠ w.
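The distinguishability claim above can be checked by brute force. This is an illustrative sketch, not the book's construction: the choice w = 'abb', the alphabet, and the bound on suffix length are all assumptions made here.

```python
from itertools import product

# Brute-force check of Exercise 14.2.6: for L = {w}, the prefixes of w
# plus any non-prefix v are pairwise distinguishable. The word 'abb',
# the alphabet 'ab', and max_len are illustrative choices.

def distinguishable(x, y, in_L, alphabet='ab', max_len=4):
    # Search for a suffix z with xz in L but yz not in L (or vice versa).
    for n in range(max_len + 1):
        for z in map(''.join, product(alphabet, repeat=n)):
            if in_L(x + z) != in_L(y + z):
                return True
    return False

w = 'abb'
in_L = lambda s: s == w
tests = [w[:i] for i in range(len(w) + 1)] + ['ba']  # prefixes plus one non-prefix
for i in range(len(tests)):
    for j in range(i + 1, len(tests)):
        assert distinguishable(tests[i], tests[j], in_L)
```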

Exercise 14.2.7

The four strings λ, a, b, and ab are pairwise distinguishable. We distinguish λ from the other three strings by appending λ, since λ ∈ EE and the other three strings are not in EE. For each of the other three strings, appending itself gives a result in EE while appending either of the other two strings does not.

Exercise 14.2.8

The six strings λ, 1, 11, 111, 1111, and 11111 are pairwise distinguishable for this language. Since 11111 is in the language and the other five strings are not, it is distinguished from each of them by λ. If i < j ≤ 5, we can distinguish 1^i from 1^j by 1^(5−j), which puts 1^j into the language but not 1^i.

Exercise 14.2.9

Four such strings are λ, a, b, and aa. The last of these is the only one of the four with a double letter, and so is distinguished from the others by λ. Then a is distinguished from λ and b by appending a, and b is distinguished from λ by appending b.

Exercise 14.2.10

We have state set {0, 1, ..., k−1}, start state 0, and only final state 0. The idea is that after reading a binary string w with n ones, we should be in state n%k. We can accomplish this by setting δ(i, 0) = i and δ(i, 1) = (i+1)%k for each state i. Our k pairwise distinguishable strings can be λ, 1, 11, ..., 1^(k−1). If i and j are two distinct naturals with i < j < k, we can distinguish 1^i from 1^j by the string 1^(k−j).

Exercise 14.3.1

Since x ≡ y, we know that for any string z, (xz ∈ L) ↔ (yz ∈ L). Specifying z to λ, we have that (x ∈ L) ↔ (y ∈ L), so since x ∈ L, we know that y ∈ L. The same argument, taking z = λ, shows that if x ∉ L and x ≡ y, then y ∉ L.
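The k-state construction of Exercise 14.2.10 above can be sketched directly in code; the choice k = 4 and the sample strings are illustrative.

```python
# A sketch of the k-state DFA from Exercise 14.2.10, accepting the
# binary strings whose number of 1's is divisible by k. The choice
# k = 4 is illustrative.

k = 4
delta = {(i, c): (i + (c == '1')) % k for i in range(k) for c in '01'}

def accepts(w):
    q = 0                       # start state 0, only final state 0
    for c in w:
        q = delta[(q, c)]
    return q == 0

assert accepts('101101')        # four 1's
assert not accepts('10')        # one 1
# 1^i and 1^j (with i < j < k) are distinguished by the suffix 1^(k-j):
i, j = 1, 3
assert accepts('1' * j + '1' * (k - j))
assert not accepts('1' * i + '1' * (k - j))
```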

Exercise 14.3.2

There are four classes: C₀ = (a + b)*, C₁ = (a + b)*c(a + b)*, C₂ = L, and C₃ = (a + b)*c(a + b)*c(a + b)*cΣ*. Two strings are equivalent if and only if they each have three or more c's, or they have the same number of c's less than three.
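The class structure above can be phrased as a one-line invariant: a string's class is determined by min(number of c's, 3). A small sketch, with helper names that are mine rather than the text's:

```python
# A sketch of the invariant behind Exercise 14.3.2: a string's class
# is determined by min(number of c's, 3).

def class_index(w):
    return min(w.count('c'), 3)

def equivalent(u, v):
    return class_index(u) == class_index(v)

assert equivalent('abc', 'cba')            # one c each
assert equivalent('ccc', 'acccca')         # both have three or more c's
assert not equivalent('ab', 'abc')
```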

Exercise 14.3.3

a^0 = λ is in the language, a out, a^2 and a^3 in, a^4 in as a^2a^2, a^5 in, and all larger strings of a's in by an easy strong induction. Thus there are only three equivalence classes: {λ}, {a}, and aaa*. This language is certainly recognizable by a DFA, with three states.

Exercise 14.3.4

The minimal DFA is drawn in Figure S-24. The six classes are C₁ = λ, C₂ = aa*, C₃ = bb*, C₄ = aa*bb*, C₅ = bb*aa*, and C₆, which is Two-Switch itself.

Figure 16-24: The DFA for the Language Two-Switch, from Exercise 14.3.4

Figure 16-25: The Two DFA's for Exercise 14.3.5

Exercise 14.3.5

The original DFA is drawn on the left of Figure S-25. The initial partition has two classes, N = {EE, OO} and F = {EO, OE}. On either input, both states in N go to F and both states in F go to N. There is thus no further need to subdivide the classes, and the original DFA is equivalent to a two-state DFA with states N and F, start state N, only final state F, and δ(N, a) = F and δ(F, a) = N for any input letter a. This DFA is drawn on the right of Figure S-25.

Exercise 14.3.6

We have state set {0, 1, 2, 3, 4}, with 0 the start state and only final state. For each state i, we set δ(i, 0) = 2i%5 and δ(i, 1) = (2i+1)%5. By induction on the definition of the representation of naturals by binary strings, it is easy to prove that after reading any string w representing the natural n, the state of the DFA is n%5, which is a final state if and only if w ∈ L₅.
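The divisibility DFA above is a few lines of code; this sketch cross-checks it against ordinary arithmetic (the helper name is mine, not the text's).

```python
# A sketch of the DFA of Exercise 14.3.6: the state is the value of
# the bits read so far, mod 5, since reading bit b takes value n to 2n + b.

def accepts(w):
    q = 0
    for c in w:
        q = (2 * q + int(c)) % 5   # delta(i, 0) = 2i%5, delta(i, 1) = (2i+1)%5
    return q == 0

# Cross-check against ordinary arithmetic on the first 64 naturals.
for n in range(64):
    assert accepts(format(n, 'b')) == (n % 5 == 0)
```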

Exercise 14.3.7 There is only one Myhill-Nerode equivalence class containing strings from L, so there is only one accepting state in the minimal DFA. There could be any number of non-final states.

Exercise 14.3.8

Let L = Σ*a (with Σ = {a, b}) so that L^R = aΣ*. The minimal DFA for L has two states, one for strings ending in a and one for all others. The minimal DFA for L^R has three states, as the three strings λ, a, and b are pairwise distinguishable.

Exercise 14.3.9

We constructed the DFA, with states {0, 1, ..., n, d}, in Exercise 14.1.7, and showed that no smaller DFA was possible in Exercise 14.2.6. With w = w₁...wₙ, the equivalence classes are {λ}, {w₁}, {w₁w₂}, ..., {w}, and one more class containing all strings that are not prefixes of w.

Exercise 14.3.10

The statement is false. There is a DFA with kℓ states made by the Product Construction, but it may not be minimal. For example, let X be Σ*a (with Σ = {a, b}) and let Y be the complement of X. Then k = 2, ℓ = 2, but X ∪ Y = Σ* has a one-state DFA.

Exercise 14.5.1

Add a new state to N, add a λ-move from each final state of N to the new state, make those states non-final, and make the new state final. Call this new λ-NFA N′. We must show that for any string w, w ∈ L(N) if and only if w ∈ L(N′). One direction: if w ∈ L(N), there is a path labeled by w from the start state of N to some final state f. In N′ we can follow this path to f and then take the new λ-move to the new final state, so w ∈ L(N′) as well. Other direction: if w ∈ L(N′), the path from the start state to the final state in N′ has a last move, a λ-move from some state f to the final state. In N, then, there is a path labeled by w from the start state to f, and since f is a final state of N, w ∈ L(N).

Exercise 14.5.2

Make the new λ-NFA M′ as follows: add a new state to M, add λ-moves from that state to all the start states of M, and make the new state the only start state of M′. For any w, there is a path from some start state of M to a final state of M if and only if there is a path from the start state of M′ to the same final state of M′. The proof of this is very similar to the solution of Exercise 14.5.1: if there is a path in M, we make a path in M′ by adding a λ-move at the beginning, and if there is a path in M′, it must begin with a λ-move which we can remove to get a path in M.

Exercise 14.5.3

There are four ways to choose which of the two states are final. There are 2 × 2 × 2 = 8 elements of Q × Σ × Q, and any subset of these eight elements might be the relation Δ. So there are 4 · 2^8 = 4 · 256 = 1024 possible two-state NFA's.

Exercise 14.5.4

(a) The NFA is drawn in Figure S-26, on the left.


Figure 16-26: The NFA's for Exercises 14.5.4 and 14.5.5

(b) The strings λ, a, ab, aba, abab, and so forth are in L(M) because there is a path of alternating a's and b's beginning at ι. The strings b, ba, aa, abb, and so forth are not in L(M) because there is no path labeled by these strings — any path must alternate a's and b's.

Exercise 14.5.5

(a) The labeled directed graph of N is drawn in Figure S-26, on the right.

(b) In L(N) are a (path ι to p), ab (path ι to p to s), and abb (path ι to p to s to p). Outside L(N) are λ (the only path is to ι, which is non-final), b (the only paths are to q and r), and bb (the only path is ι to r to q).

(c) From the definition of Δ* and the list for Δ, we can successively prove Δ*(ι, λ, ι), Δ*(ι, a, p), Δ*(ι, ab, s), and Δ*(ι, aba, q). From ι, one can go only to p on the first a and from there only to s on the b, but on the final a one could go to either q or r. So Δ*(ι, aba, r) is true as well, but Δ*(ι, aba, x) is not true for any other x.

(d) We can delete q and the three transitions into q. No paths from ι to a final state are affected, since q is non-final and has no transitions out of it. So the strings that have paths from ι to a final state in the two machines are the same, so the languages are the same.

Exercise 14.5.6

If our single state is non-final, the language of the NFA is ∅. If it is final, then we can accept any string that can be made with letters that are labels of transitions from the state to itself. Thus our language is A*, where A is the set of such letters. It makes no difference if we are also allowed λ-moves, since the only possible λ-move is redundant — we could already go from the state to itself while reading λ by taking the empty path.
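The one-state analysis above reduces to membership in A*; in this sketch the loop-label set {'a', 'c'} is an invented example.

```python
# The one-state case of Exercise 14.5.6: with a final state, the NFA
# accepts exactly A*, where A is the set of letters with self-loops.

loops = {'a', 'c'}

def nfa_accepts(w):
    # The only run stays in the single (final) state, and survives
    # exactly when every letter read labels some self-loop.
    return all(c in loops for c in w)

def in_A_star(w):
    return set(w) <= loops

for w in ['', 'aca', 'ab', 'ccc', 'b']:
    assert nfa_accepts(w) == in_A_star(w)
```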

Exercise 14.5.7

Let the language be {λ, a}. This has an ordinary NFA with two final states, one the start state and the other at the end of an a-move from the start state, and no other states. But we cannot accept λ and a with the same state, since after λ we can accept on an a and after a we cannot.

Exercise 14.5.8

Let Σ = {a, b} and let L = a*, the language of an NFA with one state s that is final and one transition (s, a, s). If we change the final state set to F = ∅, the language of the new NFA is ∅, not the complement of L, which is Σ* − a*.

Figure 16-27: The NFA and the Resulting DFA for Exercise 14.6.3

Exercise 14.5.9

We want to show that the player has a winning strategy on input w if and only if there is a w-path from the start state to some final state. If the path exists, the player can choose the edges of the path in order and reach a final state after reading w, thus winning. If the player has a winning strategy, the result of their playing it is a w-path from the start state, which must be to a final state since the player wins.

Exercise 14.5.10

We want to show that at least one process reaches a final state after reading w if and only if there exists a w-path from the start state to some final state. If the path exists, some process will follow it, since each move of the path is a valid choice that the current process will either take itself or fork another process to take. If a process reaches a final state after reading w, it must have done so by taking a valid path, which is from the start state to a final state.

Exercise 14.6.1

The new automaton has three states, for ∅, {ι}, and {ι, p}. There is only one non-final state, so we only have to check the two final states. State {ι} goes to a final state on 0 and a non-final state on 1. State {ι, p} goes to final states on both 0 and 1. So the second partition has each of the three states in a separate class, and so the minimal automaton for the language is the same as the given one.

Exercise 14.6.2

Every set produced in the construction will contain ι, so the accessible states of D must be chosen from the 2^(n−1) sets of N's states that include ι, rather than the 2^n general sets of N's states.

Exercise 14.6.3

An NFA and the resulting DFA are shown in Figure S-27. Each has five states. The DFA has two final and three non-final states. The two final states {s} and {q} are separated in the second partition because the former goes to a final state on input a and the latter does not. Of the three non-final states, {ι} and ∅ go to non-final states on both inputs, while {p, r} goes to a final state on both inputs. There are thus four classes in the second partition, with the only non-singleton class containing ∅ and {ι}. But these two states are separated in the third partition, because on input b, state ∅ goes to ∅ and state {ι} goes to {p, r}. Since ∅ and {p, r} are separated in the second partition, ∅ and {ι} are separated in the third partition. So the third partition has each of the five states of the DFA in a separate class, and thus this DFA is the minimal one for its language.

Figure 16-28: The Six-State DFA for Exercise 14.6.4

Exercise 14.6.4

The DFA has six states and is shown in Figure S-28. This DFA is not minimal because two of its states (one of them {q}) may be merged. The minimal DFA for this language has five states.

Exercise 14.6.5

The construction gives three states: the initial state {p} is non-final, it has an arrow labeled with both a and b to {q, r}, and this state has an arrow labeled with both a and b to {p, q, r}. This DFA is not minimal because the two final states may be merged.

Exercise 14.6.6

The state set is {0, 1, 2, 1′, 2′, 3′, 4′}, with 0 the start and only final state. We have a-moves from 0 to 1, 1 to 2, 2 to 0, 0 to 1′, 1′ to 2′, 2′ to 3′, 3′ to 4′, and 4′ to 0. Thus any string in (aaa + aaaaa)* may be accepted by taking the unprimed loop for any aaa and the primed loop for any aaaaa. Any accepting path must be divisible into primed and unprimed loops, so the string read is in (aaa + aaaaa)*.

In the Subset Construction, start state {0} goes to {1, 1′}, which goes to {2, 2′}, which goes to {0, 3′}, which goes to {1, 1′, 4′}, which goes to {0, 2, 2′}, which goes to {0, 1, 1′, 3′}, which goes to {1, 2, 1′, 2′, 4′}, which goes to {0, 2, 2′, 3′}, which goes to {0, 1, 1′, 3′, 4′}, which goes to {0, 1, 2, 1′, 2′, 4′}, which goes to {0, 1, 2, 1′, 2′, 3′}, which goes to the entire state set, which goes to itself. We have thirteen states in the DFA, which we may rename {0, 1, ..., 12}, with final state set F = {0, 3, 5, 6, 8, 9, 10, 11, 12}.

Applying minimization, the set N = {1, 2, 4, 7} splits into X = {1} (with an a-arrow to N) and Y = {2, 4, 7} (with an a-arrow to F). The set F splits into Z = {0, 3, 6} (with a-arrow to N) and W = {5, 8, 9, 10, 11, 12} (with a-arrow to F). On the second round, we find that states 0, 2, and 5 behave differently from the others in their class, giving us a partition of seven classes: {{0}, {1}, {2}, {3, 6}, {4, 7}, {5}, {8, 9, 10, 11, 12}}. On the third round, we split 4 from 7, and on the fourth we split 3 from 6, giving us a stable partition of nine classes and thus a minimal DFA with nine states ({0, 1, 2, 3, 4, 5, 6, 7, 8}), with final state set {0, 3, 5, 6, 8}. Each state i has an a-arrow to i+1, except for state 8 which has an a-arrow to itself.
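As a sanity check on this minimization result (my addition, not part of the text), the nine-state DFA should accept a^n exactly when n is a sum of 3's and 5's, i.e. when a^n ∈ (aaa + aaaaa)*:

```python
# Cross-check of Exercise 14.6.6: the nine-state DFA -- a chain 0..8
# with an a-loop at state 8 and final states {0, 3, 5, 6, 8} -- should
# accept a^n exactly when n is a sum of 3's and 5's.

finals = {0, 3, 5, 6, 8}

def dfa_accepts(n):
    q = 0
    for _ in range(n):          # read a^n on the chain
        q = min(q + 1, 8)
    return q in finals

def sum_of_3s_and_5s(n):
    return any(3 * x + 5 * y == n
               for x in range(n // 3 + 1) for y in range(n // 5 + 1))

for n in range(30):
    assert dfa_accepts(n) == sum_of_3s_and_5s(n)
```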

Exercise 14.6.7

The states of the DFA are sets of NFA states. Given a DFA state S and a letter a, we define δ(S, a) to be the union, over all NFA states p in S, of the states q such that Δ(p, a, q) is true. If, before reading the a, there is a process at each of the states of S, then after reading the a there will be processes at exactly those states, since the process at p will fork to processes at each q such that Δ(p, a, q) is true. Since this function is the same as the one we defined for the Subset Construction, the DFA's are the same.

Exercise 14.6.8

We show that any state t is in the first set if and only if it is in the second. If t is in the first set, then there exists a state u that is in the set {s : Δ*(ι, v, s)}, which means exactly that Δ*(ι, v, u) is true, such that Δ(u, a, t) is also true. This is just what the definition of the second set says about u, so for a given t, a state u exists meeting one condition if and only if one exists meeting the other.

Exercise 14.6.9

The states to be included in δ({p, q}, b) are the union of the states with b-arrows from p and the states with b-arrows from q. The first set is {r} and the second set is empty. The correct value of δ({p, q}, b) is thus {r} ∪ ∅ = {r}. By adding this second arrow, the student has created an NFA rather than a DFA. The fact that there are no b-arrows out of q is already taken into account by the absence of any elements of δ({p, q}, b) coming from q.

Exercise 14.6.10

We carry out the Subset Construction, with the sole exception of the start state of the DFA, which corresponds to the set S of start states of the NFA. By induction on all strings w, the set of states where the NFA could be after reading w is equal to the set denoted by the DFA state δ*(S, w).

Exercise 14.7.1

We have that Γ*(s, a, t) if and only if Γ(s, a, t), which is true if and only if ∃x : ∃y : [Δ*(s, λ, x) ∧ Δ(x, a, y) ∧ Δ*(y, λ, t)]. On the other hand, Δ*(s, a, t) is true if and only if ∃z : ∃x : ∃y : [Δ*(s, λ, z) ∧ Δ*(z, λ, x) ∧ Δ(x, a, y) ∧ Δ*(y, λ, t)]. If the formula for Γ*(s, a, t) is true, we can instantiate x and y, let z = s, and use the Rule of Existence three times to prove the formula for Δ*(s, a, t). Conversely, if the formula for Δ*(s, a, t) is true, we may instantiate z, x, and y, note that Δ*(s, λ, x) follows from Δ*(s, λ, z) and Δ*(z, λ, x), and use the Rule of Existence twice on x and y.

Exercise 14.7.2

As suggested, we use ordinary string induction. If w = λ, the desired statement is true vacuously. If w = va, we break into two cases. If v = λ, then w = a and P(w) is true because P(a) is assumed. If v ≠ λ, the inductive hypothesis tells us that P(v) must be true. Specifying the premise to v gives us that P(v) → P(va), so P(va) = P(w) follows by modus ponens.

Exercise 14.7.3

If there is no final state, clearly L(M) = ∅. Otherwise, we can accept a string w if and only if for each letter in w, there is some letter move labeled by that letter. (We move by λ-moves from where we are to the start of the next letter move needed, then move to a final state at the end.) The language is thus B*, where B is the set of letters that occur as labels of letter moves in M.

Figure 16-29: The Ordinary NFA Constructed in Exercise 14.7.5

Exercise 14.7.4

We show the contrapositive of this implication. If ι ∈ F, then clearly λ ∈ L(M), because there is a path (of length 0), labeled by λ, from the start state ι to the final state ι.

Exercise 14.7.5

The ordinary NFA is shown in Figure S-29. Note first that λ ∈ L(M), so ι becomes a final state in N. The letter move (p, a, q) gives rise to three letter moves in N: (p, a, q), (ι, a, q), and (r, a, q). The move (q, b, p) of M also gives rise to three moves in N, which are itself, (q, b, r), and (q, b, f). Finally, the letter move (q, b, r) of M gives rise to the same three letter moves of N. So N has six letter moves, as shown.

Exercise 14.7.6

Let N be the λ-NFA resulting from the original construction, and N′ be the λ-NFA obtained from N by making s an additional final state. Note that the Δ* relation of the two machines is unaffected by the change. If w ∈ L(N), then there exists a final state f with Δ*(ι, w, f), and since f is also a final state of N′, w ∈ L(N′) as well. If w ∈ L(N′), then either Δ*(ι, w, f) for some final state f of N, or Δ*(ι, w, s). In the first case we know that w ∈ L(N). In the second, because we are given that Δ*(s, λ, f) is true for some final state f of N, and we know that Δ* has a transitive property, Δ*(ι, wλ, f) = Δ*(ι, w, f) is true and again we know w ∈ L(N).

Exercise 14.7.7

We change p to a final state if it is the start state and q is a final state. We remove the λ-move and add new letter moves as follows:

• If (r, a, p) ∈ Δ we add (r, a, q)
• If (q, a, r) ∈ Δ we add (p, a, r)

Exercise 14.7.8

We know that λ ∈ L(N) because of the two-step λ-path. The question is whether the new a-move may be used only once, or more than once. If it goes from p to q, from p to r, or from q to r, then there is only one way to use the a-move and L(N) = {λ, a}. In the other six situations, it is possible to use the a-move any number of times on a path from p to r, and L(N) = a*.

Exercise 14.7.9

If the condition on letter moves holds, every move increases the index of the state, and there can be no more than k − 1 edges on any path. Thus no string in L(N) has

Figure 2-4: Some diagrams representing binary relations on the set {1, 2, 3}: (a) not symmetric; (b) symmetric; (c) symmetric, with two-headed arrows. Source: David Mix Barrington

• The relation R is transitive if ∀x : ∀y : ∀z : (R(x, y) ∧ R(y, z)) → R(x, z). This means that a "chain" of two true instances involving the same variable y in this way can be "collapsed" to get another true instance.

The less-than-or-equals relation is reflexive, antisymmetric, and transitive. Any relation on a set that has these three properties is defined to be a partial order. Partial orders are the subject of Section 2.10 below. The equality relation is reflexive, symmetric, and transitive. Any relation on a set that has these three properties is called an equivalence relation. Equivalence relations are the subject of Section 2.11 below.

Since binary relations on a single set are a kind of binary relation, we can represent them by diagrams of the kind shown in Figure 2-2, with a dot for each element of the set on the left and another dot for each element on the right. But with relations on a set we can also draw another kind of diagram, with only one dot for each element of the set. We still draw an arrow from a to b whenever (a, b) is in the relation. Each of our new properties of relations can be described in terms of a particular property of this diagram:

• A relation is reflexive if and only if every dot in its diagram has a loop (an arrow from that dot to itself).

• A relation is antireflexive if and only if no dot in its diagram has a loop.

• A relation is symmetric if and only if every arrow in its diagram has a matching arrow in the other direction. We sometimes draw a symmetric relation using two-headed arrows as in Figure 2-4.

• A relation is antisymmetric if and only if no arrow has a matching arrow in the other direction, unless this arrow is a loop.

• A relation is transitive if and only if every two-arrow path in the diagram has a corresponding arrow. (If there is an arrow from a to b and another from b to c, then there must be an arrow from a to c, and this holds even if two or more of a, b, and c are equal.)
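These properties can also be phrased as direct checks on a relation given as a set of ordered pairs; the example relations on {1, 2, 3} below are the standard ones discussed in the text, but the code itself is only a sketch.

```python
# Direct checks of the relation properties above, for a relation R
# given as a set of ordered pairs over a set X.

def reflexive(R, X):     return all((x, x) in R for x in X)
def antireflexive(R, X): return all((x, x) not in R for x in X)
def symmetric(R):        return all((b, a) in R for (a, b) in R)
def antisymmetric(R):    return all(a == b or (b, a) not in R for (a, b) in R)
def transitive(R):
    return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)

X = {1, 2, 3}
leq = {(a, b) for a in X for b in X if a <= b}
assert reflexive(leq, X) and antisymmetric(leq) and transitive(leq)  # partial order

eq = {(a, a) for a in X}
assert reflexive(eq, X) and symmetric(eq) and transitive(eq)         # equivalence
```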


Figure 16-30: The λ-NFA for Exercise 14.8.1

Figure 16-31: The λ-NFA for Exercise 14.8.2

more than k − 1 letters, and L(N) must be finite. If the condition does not hold, there is a letter move (sᵢ, a, sⱼ) with i ≥ j. By combining this move and λ-moves, any string in the infinite language a* can be accepted by N, and so L(N) is infinite.

Exercise 14.7.10

The procedure must terminate because there are only a finite number of possible λ-moves in a finite-state machine. It must terminate in a transitive relation because the condition to add a move is exactly the negation of the transitive property. So we need only make sure that there is not a smaller transitive relation including the original moves. But this is impossible, because (as we can prove by induction) every move we add corresponds to a λ-path in the original λ-NFA, and thus must appear in any transitive relation that includes the original moves.

Exercise 14.8.1

The resulting λ-NFA is shown in Figure S-30.

Exercise 14.8.2

The λ-NFA for (aa)*(ab + bb*a)(b + ab*a)* is shown in Figure S-31.

Exercise 14.8.3

Since L ⊆ L + λ, L* ⊆ (L + λ)* by a rule from Section 5.4. We prove that (L + λ)* ⊆ L* by induction on the definition of (L + λ)*. Thus for all w ∈ (L + λ)*, we must prove w ∈ L*. For the base case, w = λ and w ∈ L* because λ is in any star language. For the inductive case, let w = uv with u ∈ (L + λ)* and v ∈ L + λ. By the inductive hypothesis, u ∈ L*. If v = λ, then w = uλ = u and w ∈ L*. Otherwise v ∈ L, so uv is in L* as well. This completes the induction and thus the proof.

Figure 16-32: A DFA and the Constructed λ-NFA for (b + ab*a)*, for Exercise 14.8.5

Exercise 14.8.4

The two-state λ-NFA has a start state, a final state, and a single λ-move from the start state to the final state. The constructed λ-NFA for λ* has four states, which we'll call 1, 2, 3, and 4. State 1 is the start state, state 4 is the only final state, and the only moves are λ-moves — four moves from 1 to 2, 2 to 3, 3 to 2, and 3 to 4.

Exercise 14.8.5

The two-state DFA has two states 0 and 1, where 0 is the start state and the only final state, there are b-loops on each state, and there is an a-move from each state to the other. It and the constructed λ-NFA for (b + ab*a)* are shown in Figure S-32. We successively build a four-state λ-NFA for b*, a six-state one for ab*a, a six-state one for b + ab*a, and finally the eight-state one shown for (b + ab*a)*.

Exercise 14.8.6

Base cases: the fixed λ-NFA's for ∅ and the single-letter languages both satisfy (1) and (2). Inductive hypothesis: we have regular expressions α and β and λ-NFA's for each satisfying our three standard assumptions plus properties (1) and (2). Inductive goals: the λ-NFA's constructed for α + β and αβ both satisfy (1) and (2). Proof: they both satisfy (1) because neither construction introduces any λ-moves. They both satisfy (2) because paths in the α + β machine correspond to paths in either the α or the β machine, and those in the αβ machine correspond to pairs of paths, one from α and one from β. In either case there are only finitely many possibilities.

Exercise 14.8.7

We have state set {p, q, r}, start state p, only final state r, and transitions (p, λ, q), (q, a, q), and (q, λ, r). It is clear that the three assumptions are followed, and that the language is a*. The λ-NFA for the given regular expression has state set {1, 2, 3, 4, 5, 6, 7, 8, 9} and 15 transitions: (1, λ, 2), (1, λ, 3), (2, a, 2), (2, λ, 4), (3, a, 3), (3, λ, 4), (4, a, 5), (5, a, 5), (5, λ, 6), (6, λ, 7), (6, λ, 8), (7, a, 7), (7, λ, 9), (8, a, 8), and (8, λ, 9). By contrast, the λ-NFA from the original construction has fourteen states and 25 total transitions (20 of them λ-moves).


Figure 16-33: The λ-NFA for (b + aa + bb)*(λ + aa*)(λ + b), for Exercise 14.8.9

Exercise 14.8.8

(a) For simplicity, we let L be a regular expression that is the sum of all the letter characters, and D the sum of all the digit characters. That makes our regular expression just L(L + D)*. The λ-NFA has state set {1, 2, 3, 4, 5}, start state 1, only final state 5, and transitions (1, L, 2), (2, λ, 3), (3, L + D, 4), (4, λ, 3), and (4, λ, 5). The ordinary NFA has the same five states and transitions (1, L, i) for i = 2, 3, 4, 5, and (i, L + D, j) for i = 3, 4 and j = 3, 4, 5. The DFA has four states: {1} (start), {2, 3, 4, 5} (final), {3, 4, 5} (final), and a death state ∅, with transitions ({1}, L, {2, 3, 4, 5}), ({2, 3, 4, 5}, L + D, {3, 4, 5}), ({3, 4, 5}, L + D, {3, 4, 5}), and all other transitions to the death state. Minimization collapses the two final states.

(b) We've seen in Problem 14.6.3 how to construct a DFA for any finite language — there is a state for any string that is a prefix of one of the words, plus a death state. For the complement of a finite language, we just swap the final and non-final states of the DFA for the finite language. The product construction of this new DFA and the DFA of part (a) maintains the state of both DFA's as it reads a single string — we accept if and only if both DFA's are in final states.

Exercise 14.8.9

The twelve-state λ-NFA is pictured in Figure S-33.

Exercise 14.8.10

We'll omit drawing the ordinary NFA, but draw the DFA, which has only six states, in Figure S-34. It's easy to check from this DFA that aba takes any state to the death state, and that the death state can only be reached for the first time by a consecutive a, b, and a. Minimization merges {3, 4, 5, 9, 10, 11, 12} with {4, 5, 10, 11, 12}, and the start state with {2, 7, 8, 12}, leaving the four-state minimal DFA for No-aba that we found in Section 14.3.

Exercise 14.10.1

The four-state DFA becomes a six-state λ-NFA when we add a new start and final state (diagram (a) in Figure S-35). We then eliminate state 01 to get diagram (b), eliminate state 11 to get diagram (c), and finally eliminate state 10 to get the three-state diagram (d). From this we can read off the final regular expression

[0 + (10 + 111*0)(10 + 111*0)*0]*[111* + (10 + 111*0)(10 + 111*0)*(λ + 111*)].

Figure 16-34: The DFA for Exercise 14.8.10

To simplify this, we first convert 10 + 111*0 to 11*0 where it occurs, getting

[0 + (11*0)(11*0)*0]*[111* + (11*0)(11*0)*(λ + 111*)].

Note that every string in the second block must have a 1 as its next-to-last letter — we prove this by checking each of the three terms in the union. To show that every string whose next-to-last letter is 1 is in this language, we note that the first starred block denotes the language λ + 0 + Σ*00 and the second block denotes the language of all strings that start with 1, have next-to-last letter 1, and have no occurrences of 00. So any string whose next-to-last letter is 1 can be broken into the prefix up to and including the last 00 (or λ or 0 if there is no 00) and the remaining suffix, which meets these conditions.

Exercise 14.10.2

Starting from the five-state r.e.-NFA in Figure 14-44, we first eliminate state 2 to get the four-state r.e.-NFA pictured in Figure S-36. Because of the simple structure of this r.e.-NFA, we can read off the regular expression directly as

b*[ab* + a + b(a + bab*)][bbb(a + bab*)]*.

Exercise 14.10.3

We first construct a three-state DFA with b-loops at each state and a-moves from states 0 to 1, 1 to 2, and 2 to 0 (0 is the start state and only final state). This DFA is shown in diagram (a) of Figure S-37. We add a new start and final state to get the five-state r.e.-NFA of diagram (b). We then eliminate state 2 to get a four-state r.e.-NFA (diagram (c)). Eliminating state 1 will then produce an r.e.-NFA with a loop on state 0 labeled by b + ab*ab*a, so the final regular expression is (b + ab*ab*a)*.

Exercise 14.10.4

For each regular expression R occurring as a label in the r.e.-NFA, use the construction of Section 14.8 to build a λ-NFA with a single final state, no moves into its start state, and no moves out of its final state, whose language is L(R). Then build a λ-NFA by taking the states of the r.e.-NFA and inserting the λ-NFA for each R in place of the R-arrow (that is, if (s, R, t) is a transition of the r.e.-NFA, we identify the start state of the λ-NFA for R with s and its final state with t).

Figure 16-35: Four r.e.-NFA's in the Construction for Exercise 14.10.1

Figure 16-36: An r.e.-NFA in the Construction for Exercise 14.10.2

Figure 16-37: A DFA and two r.e.-NFA's for Exercise 14.10.3

Exercise 14.10.5

Let A and B be two arbitrary regular languages. By Kleene's Theorem there exists a DFA M with L(M) = A and a DFA N with L(N) = B. By the result of Problem 14.1.3 we can construct a DFA (the "product DFA") whose language is L(M) ∩ L(N) = A ∩ B. Since A ∩ B is the language of a DFA, we know by Kleene's Theorem that it is regular.

Exercise 14.10.6

Eliminating the death state right away, we have start state 1, intermediate states 2, 3, and 4, and final state 5, with transitions (1, λ, 2), (2, a, 3), (2, λ, 5), (3, a, 3), (3, b, 4), (3, λ, 5), (4, b, 3), and (4, λ, 5). We first eliminate 4 to create (3, λ + b, 5) and (3, bb, 3). We then eliminate 3 to create (2, aa*(λ + b), 5) and (2, b + aa*bb, 2). Now eliminating 2 gives us exactly the regular expression from Exercise 14.8.9.

Exercise 14.10.7

We create a DFA from α and search it from the start state, returning false if there is any path to a non-final state and returning true if there is not.

Exercise 14.10.8

Consider a λ-NFA N for L satisfying our three assumptions, so that it has a single final state f. Alter the final state set to add any state s such that there is any path from s to f. We claim that a string is in Pref(L) if and only if it is in the language of this new λ-NFA N′. If u is in Pref(L), then uv is in L for some string v. There is an accepting path for uv in N, which consists of a u-path from the start state to some state s, followed by a v-path from s to f. Since there is a path from s to f, s is final in N′ and u ∈ L(N′). Conversely, if u ∈ L(N′), there is a u-path in N to some state s and some path in N from s to f. Let v be any string on which this second path can be taken. Then uv is in L(N) = L, and u ∈ Pref(L).

Exercise 14.10.9

As suggested, let D be a DFA for X. Make a new DFA D’ with the same state set, start

Exercise 14.10.10

‘We create DFA’s for X and Y, build the product of these two DFA’s, and change the final state set of the product so that it includes pairs with a final state of one DFA and a non-final state of the other. This is a DFA for XAY, which must thus be a

state, and final state set as D. For every state s and letter a, compute 6*(s, f(a)) =t in D and define 6’(s,a) to be t. As D’ reads any string w, by induction its state on reading w is exactly d*(v, f(w)). This state is final if and only if w € f-1(X).

regular language. 16-111
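The product construction in Exercise 14.10.5 can also be sketched in real Java rather than pseudo-code. The two small example DFAs in main (one accepting strings with an even number of a's, one accepting strings whose last letter is b) are illustrative choices for this sketch, not machines taken from the text.

```java
import java.util.*;

// Sketch of the product construction from Exercise 14.10.5.
public class ProductDFA {
    public static class DFA {
        // delta[state][symbol], with symbol 0 = 'a' and 1 = 'b'.
        int[][] delta; boolean[] fin; int start = 0;
        public DFA(int[][] d, boolean[] f) { delta = d; fin = f; }
        public boolean accepts(String w) {
            int s = start;
            for (char c : w.toCharArray()) s = delta[s][c - 'a'];
            return fin[s];
        }
    }

    // The product DFA: state (i, j) is encoded as i * n2 + j, and a pair is
    // final exactly when both components are final, so the product's language
    // is the intersection of the two languages.
    public static DFA product(DFA m, DFA n) {
        int n1 = m.delta.length, n2 = n.delta.length;
        int[][] d = new int[n1 * n2][2];
        boolean[] f = new boolean[n1 * n2];
        for (int i = 0; i < n1; i++)
            for (int j = 0; j < n2; j++) {
                for (int c = 0; c < 2; c++)
                    d[i * n2 + j][c] = m.delta[i][c] * n2 + n.delta[j][c];
                f[i * n2 + j] = m.fin[i] && n.fin[j];
            }
        return new DFA(d, f);
    }

    public static void main(String[] args) {
        // M: even number of a's.  N: last letter is b.
        DFA m = new DFA(new int[][]{{1, 0}, {0, 1}}, new boolean[]{true, false});
        DFA n = new DFA(new int[][]{{0, 1}, {0, 1}}, new boolean[]{false, true});
        DFA p = product(m, n);
        System.out.println(p.accepts("aab"));  // true: two a's and ends in b
        System.out.println(p.accepts("ab"));   // false: odd number of a's
    }
}
```

Since the product is again a DFA, Kleene's Theorem then gives regularity of the intersection, exactly as argued above.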

16.8 Solutions to Exercises From Chapter 15

Exercise 15.1.1

The standard computation starts in state 1 at the left of the string, a configuration we may denote by 1aabbab. It moves right to a2abbab, then moves right to aa4bbab, left to a4abbab, right to aa1bbab, right to aab3bab, left to aa3bbab, left to a3abbab, left to 4aabbab, right to a1abbab, right to aa2bbab, right to aab2bab, right to aabb2ab, right to aabba4b, left to aabb4ab, right to aabba1b, and finally right to aabbab3, where it rejects because 3 is not a final state.

To determine f_aabbab(3) we must start the machine in configuration aabba3b, from which it moves to aabb3ab, aab4bab, and aa4bbab. Since this configuration appears in the computation above and the machine is deterministic, we know that it will now follow the computation above until it leaves to the right in state 3. Therefore f_aabbab(3) = 3.

Exercise 15.1.2

If we start this machine on a string bw beginning with b, it first moves right and into state 3, from configuration 1bw to b3w. If w = λ it now rejects because 3 is a non-final state. If w = av, we move from b3av to 4bav and then move left again and hang. If w = bv, we move from b3bv to 3bbv and then move left again and hang. Since we reject or hang in all cases when the input begins with b, there are no such strings in the language. In fact in these cases we never look beyond the second letter of the input.

Exercise 15.1.3

If x is any string at all, M will also either loop or hang on both ux and vx, since it will never move past the right end of u or v. So neither ux nor vx is in L(M), and the property (ux ∈ L(M)) ↔ (vx ∈ L(M)) holds.

Exercise 15.1.4

We must decide whether the single state is final, whether to move left or right on seeing an a, and whether to move left or right on seeing a b. These three binary choices lead to eight possible 2WDFA's. For the four with no final state, the language is clearly ∅. We must check the other four individually. The machine that moves right on either a or b accepts any string, so its language is Σ*. The machine that moves left on either input will hang on any nonempty string, but on input λ it begins off the right end of the string in an accepting state, so it accepts. Thus this machine has language {λ}. Now consider the machine that moves right on a and left on b. Clearly it accepts any string in a*. But it does not accept any other string, since it can never move right past a b. So its language is a*, and the other machine's language is b*.

Exercise 15.1.5

We must decide for each of the k states whether it is final (2^k choices). For each of the 2k state-letter pairs, we must decide a move direction and a new state, so we must choose from 2k options. The total number is thus 2^k(2k)^{2k}, or 2^{3k}k^{2k}, or (8k²)^k. For k = 2 this number is 1024, for k = 3 it is 72³ = 373248, and for k = 4 it is 128⁴ = 2²⁸ or 268435456.

Exercise 15.1.6

(a) We add the endmarkers to the input and keep the entire transition function of the new-form 2WDFA, thus ensuring it will never hang. We add transitions to the two halt states so they move right on any letter, and make the former accepting halt state the only final state. So the old-form 2WDFA will always leave the input to the right, unless the computation of the new-form 2WDFA was infinite, and it will leave in a final state if and only if the new-form one accepted.

(b) We leave the input the same. If any transition from $ went left, we replace it by a move to the rejecting halt state, since the old-form machine will hang in this circumstance. If a transition from # moves right, we replace it by a move to either an accepting or a rejecting halt state, depending on whether the old-form 2WDFA went to a final or non-final state.

Exercise 15.1.7

Our 2WDFA moves right until it finds a c, then moves left exactly k + 1 times. If it finds an a, it moves right (in a non-final state) to the c, then moves right once more in a final state. If that doesn't end the computation, it goes to a rejecting state and moves right to the end of the string. If it finds a b k + 1 steps to the left of the c, it goes to a rejecting state and moves right to the end of the string. It may hang if there are not k + 1 spaces to the left of the c, but that is correct because such an input is not in the language. Of course, if it never finds a c, it rejects. How many states do we need? There are k + 1 to count the leftward moves, and some constant number of others, so the total number is proportional to k.

Exercise 15.1.8

If the language is almost periodic, we have already proved that it has an ordinary DFA, and so it also has a 2WDFA that never moves left. So let M be an arbitrary 2WDFA with n states and input alphabet {a}. Start the 2WDFA on the input string a^{n+1}. Unless it hangs, sometime in the first n moves (all of which see a's, of course) M must revisit a state it has already been in. The question is what the net movement of the head was during the cycle of states. If it was to the left, we know that on sufficiently long input M is guaranteed to hang, and so the language is finite. If there was no net movement, it will repeat the same loop in the same place forever, and so the language is also finite. If the net movement is to the right, M will eventually exit the input to the right, and accept or reject depending on the state in which it does so. Membership in the language for sufficiently long strings is thus periodic. In particular, if the cycle has a net movement of k steps to the right, and we add k letters to the input, we preserve the state in which M exits and thus preserve membership or non-membership in the language.

Exercise 15.1.9

(a) On input aca: 1aca → a1ca → ac2a → a3ca → 3aca → a4ca → ac4a → aca4, accept.
On input acb: 1acb → a1cb → ac2b → a4cb → 4acb, and the machine hangs.
On input cc: 1cc → c2c → 1cc → c2c → ..., an infinite loop.

(b) In L(M): any string where the a or b after the first c has a matching letter before the c, such as abca, abcb, bcb, or bcbaabcc. Not in L(M): any string with no c, such as baab, or ending with the first c, such as abc, or where the first c is directly followed by a second, or where there is no matching letter before the first c for the a or b after it. This makes the language almost periodic.

Exercise 15.1.10

(a)

Start state 0 has a 0-arrow and a 1-arrow to itself and a 1-arrow to state 1. For i from 1 to n, there is a 0-arrow and a 1-arrow from state i to state i + 1. There is a 1-arrow from state n + 1 to state n + 2, the only final state, and a 0-arrow and a 1-arrow from state n + 2 to itself. Clearly the only way for this NFA to accept is to leave state 0 on a 1, read exactly n letters, then move to state n + 2 on another 1 and stay there.

(b) Let u and v be any two strings with exactly n letters. Let w, of k letters, be the longest common prefix of u and v, so that u = w0u' and v = w1v' (switching the names of u and v if necessary). Let z be the string 0^{k+1}1. Then uz = w0u'0^{k+1}1 is not in K_n, because the only 1 after the first n letters has a 0 n spaces before it. But vz = w1v'0^{k+1}1 is in K_n, because v'0^{k+1} has exactly n letters.

(c) Here is pseudo-Java code of a sort for the 2WDFA:

    while (true) {
        if (see 0) move right;
        else {
            move right n spaces;
            if (see 1) accept;
            move left n-1 spaces;
        }
    }

We need a state for the start of the loop, about 2n states to implement the loop, and an accept state where we always move right. Every time through the loop has a net movement of one space to the right, so every letter is considered as the left member of a pair of ones putting the string in the language, as long as this is possible. If the machine leaves the input before reaching the accept state, we correctly reject.

Exercise 15.2.1

(a) Context-free, not regular
(b) Context-free, not regular
(c) Both context-free and regular
(d) Not context-free, two nonterminals on left
(e) Not context-free, terminal on left
(f) Context-free, not regular (it is left-linear)

Exercise 15.2.2

(Using one-letter abbreviations for the terminal words, and sometimes making several one-to-one moves at one time:)

(a) S ⇒ tNVtN ⇒ tANVtN ⇒ tldatN ⇒ tldatAN ⇒ tldatAAN ⇒ tldatbbf

(b) S ⇒ tNVtN ⇒ tANVtN ⇒ tbNVtN ⇒ tbANVtN ⇒ tbqdjotN ⇒ tbqdjotAN ⇒ tbqdjotAAN ⇒ tbqdjotqbd

Exercise 15.2.3

S ⇒ tNVtN ⇒ tdatd. S ⇒ tNVtN ⇒ tANVtN ⇒ tAANVtN ⇒ tAAANVtN ⇒ tAAAANVtN ⇒ tAAAAANVtN ⇒ tlqlblfjotN ⇒ tlqlblfjotAN ⇒ tlqlblfjotlf. For any natural i, we can construct the string the (quick)^i fox ate the dog, for example. Since this gives us a different string for any natural, it gives us infinitely many strings.

Exercise 15.2.4

The rule taking Mw to Mww, where w is an arbitrary string in (I + U)*, cannot go in a context-free grammar. But since the language of valid strings is the regular language MU*(I + IU*I)(U + IU*IU*I)*, it has some regular grammar.

Exercise 15.2.5

Have states s, f, and d, with s the start state, f the only final state, and transitions (s, a, s), (s, b, f), (f, a, f), (f, b, d), (d, a, d), and (d, b, d). We can make a regular grammar with rules S → aS, S → bF, F → aF, and F → λ. To follow the construction from the DFA slavishly, we would also have the rules F → bD, D → aD, and D → bD, but these rules are irrelevant as they cannot be used to get a string of terminals.
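The conversion used in Exercise 15.2.5 (each state becomes a nonterminal, each transition (q, c, q') becomes a rule Q → cQ', and each final state adds Q → λ) is mechanical, and can be sketched in Java. The string-based rule format and the method name rules are conveniences of this example, not notation from the text.

```java
import java.util.*;

public class DfaToGrammar {
    // delta.get(q).get(c) is the state reached from q on letter c.
    // Each transition (q, c, q') yields the rule "q -> cq'", and each
    // final state q yields "q -> lambda", giving a right-linear grammar
    // for the DFA's language.
    public static List<String> rules(Map<String, Map<Character, String>> delta,
                                     Set<String> finals) {
        List<String> out = new ArrayList<>();
        for (String q : delta.keySet()) {
            for (Map.Entry<Character, String> t : delta.get(q).entrySet())
                out.add(q + " -> " + t.getKey() + t.getValue());
            if (finals.contains(q)) out.add(q + " -> lambda");
        }
        return out;
    }

    public static void main(String[] args) {
        // The three-state DFA from the solution above: S (start), F (final), D (death).
        Map<String, Map<Character, String>> delta = new TreeMap<>();
        delta.put("S", new TreeMap<>(Map.of('a', "S", 'b', "F")));
        delta.put("F", new TreeMap<>(Map.of('a', "F", 'b', "D")));
        delta.put("D", new TreeMap<>(Map.of('a', "D", 'b', "D")));
        List<String> g = rules(delta, Set.of("F"));
        System.out.println(g.contains("S -> aS"));      // true
        System.out.println(g.contains("F -> lambda"));  // true
    }
}
```

Running it on the DFA above produces exactly the rules listed in the solution, including the irrelevant D rules.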

Figure 16-38: Two r.e.-NFA's and an Ordinary NFA for Exercise 15.2.6

Exercise 15.2.6

We first make a four-state r.e.-NFA with states S, C, D, and F, shown on the left in Figure 16-38. By adding two more states to break up the two-letter moves we get the six-state ordinary NFA shown in the middle. To get a regular expression it is easier to start with the r.e.-NFA: eliminating state C gives us the r.e.-NFA on the right, from which we may read off the regular expression (aba)*abbba.

Exercise 15.2.7

Essentially we convert the grammar to an equivalent NFA, and determine whether there is a cycle in the NFA, with at least one letter read on the cycle, and with a state in the cycle that has a path to a final state. This can be determined, for example, by a series of depth-first searches.

Exercise 15.2.8

For each letter a in Σ, we have rules S → aSa and S → a. This will generate any palindrome of odd length 2k + 1, by using the first type of rule k times and the second type once. To get palindromes of even length 2k, we need a rule S → λ, to use after k rules of the first type.
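For the two-letter alphabet {a, b}, the rules just described are S → aSa, S → bSb, S → a, S → b, and S → λ. The Java sketch below (the derive helper is a convenience of this example, not the text's notation) applies k wrapping rules and then one finishing rule, and confirms the result is a palindrome of the expected length.

```java
public class PalindromeGrammar {
    // Apply the wrapping rules S -> aSa / S -> bSb once per letter of
    // wrappers (innermost last), then finish with S -> center, where center
    // is a single letter (odd length) or the empty string (even length).
    public static String derive(String wrappers, String center) {
        String s = center;
        for (int i = wrappers.length() - 1; i >= 0; i--)
            s = wrappers.charAt(i) + s + wrappers.charAt(i);
        return s;
    }

    public static boolean isPalindrome(String w) {
        for (int i = 0, j = w.length() - 1; i < j; i++, j--)
            if (w.charAt(i) != w.charAt(j)) return false;
        return true;
    }

    public static void main(String[] args) {
        String odd = derive("ab", "a");   // k = 2 wrapping rules, then S -> a
        String even = derive("ab", "");   // k = 2 wrapping rules, then S -> lambda
        System.out.println(odd);   // ababa, length 2k + 1 = 5
        System.out.println(even);  // abba, length 2k = 4
        System.out.println(isPalindrome(odd) && isPalindrome(even));  // true
    }
}
```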

For the base cases, the grammar with no rules generates ∅, and one with a single rule S → a generates {a}. So assume we have a grammar G1 with start symbol S1 generating the language L1, and a grammar G2 with start symbol S2 generating L2. Also assume that no non-terminals are shared between these grammars. We can generate L1 ∪ L2 by adding the new start symbol S and the new rules S → S1 and S → S2 to the union of G1 and G2. If we instead add the single new rule S → S1S2, we generate L1L2. And if we add the new rules S → λ and S → S1S, we generate L1*. These constructions are the inductive steps in a proof that for every regular expression we have a context-free grammar with the same language.

Exercise 15.2.10
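The three inductive steps above can be written out directly. In this Java sketch a grammar is just a list of rule strings, with start symbols assumed to be S1 and S2 and no shared nonterminals, matching the assumption in the solution; the combined grammar's start symbol is S.

```java
import java.util.*;

// The union, concatenation, and star constructions from Exercise 15.2.9,
// with a grammar represented as a plain list of rule strings.
public class GrammarClosure {
    public static List<String> union(List<String> g1, List<String> g2) {
        List<String> g = new ArrayList<>(g1);
        g.addAll(g2);
        g.add("S -> S1");
        g.add("S -> S2");
        return g;
    }

    public static List<String> concat(List<String> g1, List<String> g2) {
        List<String> g = new ArrayList<>(g1);
        g.addAll(g2);
        g.add("S -> S1S2");
        return g;
    }

    public static List<String> star(List<String> g1) {
        List<String> g = new ArrayList<>(g1);
        g.add("S -> lambda");
        g.add("S -> S1S");
        return g;
    }

    public static void main(String[] args) {
        List<String> g1 = List.of("S1 -> a"), g2 = List.of("S2 -> b");
        System.out.println(union(g1, g2));  // [S1 -> a, S2 -> b, S -> S1, S -> S2]
        System.out.println(star(g1));       // [S1 -> a, S -> lambda, S -> S1S]
    }
}
```

Starting from the base-case grammars and applying these three operations along the structure of a regular expression yields the context-free grammar promised by the proof.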

All we need to do is reverse the right-hand side of each rule. To see that this works, consider a parse tree for any string w in L(G). If we hold this parse tree up to a mirror, we get a parse tree for the string w^R, and the rules we use in this parse tree each have the same left-hand side and the reversed right-hand side for a rule in G. And any derivation we make with the reversed rules can be held up to a mirror as well, proving that the reversal of the derived string is in L(G).

2.8.3 Exercises

E2.8.1 Give an example of a relation, from one two-element set to another, that is total but not well-defined. Give an example (using the same sets) of a relation that is well-defined but not total. Draw diagrams for both your examples.

E2.8.2 Consider the first quantified statement for "well-defined" given in the section. Explain why the condition "b ≠ c" is necessary for the statement to have the intended meaning. If we omitted this condition from the definition, exactly which relations would be "well-defined"?

E2.8.3 Let A = {1, 2, 3, 4} and let B = {w, x, y, z}. For each of the following relations from A to B, draw its diagram, indicate whether it is total, and indicate whether it is well-defined. Justify your answers.

(4, z)}

E2.8.4 Give an example of a binary relation on a set A that is neither symmetric nor antisymmetric. Make your set A as small as you can.

E2.8.5 Can a symmetric relation have loops in its diagram? Must it have loops in its diagram? Can it be reflexive? Must it be reflexive? If it can have loops, explain how the condition "every arrow in its diagram has a matching arrow in the other direction" is satisfied for the loop arrow. (Hint: How would you formally define "the matching arrow for the arrow from a to b", and what happens to this definition if a = b?)

E2.8.6 Draw diagrams (with only one dot per element) for the following binary relations on the set A = {1, 2, 3, 4, 5}. Indicate for each relation whether it is reflexive, symmetric, antisymmetric, or transitive.

E2.8.7 Find a relation that is both reflexive and transitive but is neither symmetric nor antisymmetric. Use as few elements as possible in your base set.

E2.8.8 Consider a finite set C of candidates, a finite set J of jobs, and a predicate Q(c, j) meaning "candidate c is qualified for job j". Do not assume that C and J are the same size.

(a) What does it mean, in terms of candidates and jobs, for Q to be a total relation?

(b) What does it mean for Q to be a well-defined relation?

(c) What does it mean for Q to be a function?


Exercise 15.3.2

E ⇒ T+E ⇒ F+E ⇒ a+E ⇒ a+T ⇒ a+F ⇒ a+(E) ⇒ a+(T) ⇒ a+(F*T) ⇒ a+(a*T) ⇒ a+(a*F*T) ⇒ a+(a*(E)*T) ⇒ a+(a*(T+E)*T) ⇒ a+(a*(F+E)*T) ⇒ a+(a*(a+E)*T) ⇒ a+(a*(a+T+E)*T) ⇒ a+(a*(a+F+E)*T) ⇒ a+(a*(a+a+E)*T) ⇒ a+(a*(a+a+T+E)*T) ⇒ a+(a*(a+a+F+E)*T) ⇒ a+(a*(a+a+(E)+E)*T) ⇒ a+(a*(a+a+(T+E)+E)*T) ⇒ a+(a*(a+a+(F*T+E)+E)*T) ⇒ a+(a*(a+a+(a*T+E)+E)*T) ⇒ a+(a*(a+a+(a*F+E)+E)*T) ⇒ a+(a*(a+a+(a*a+E)+E)*T) ⇒ a+(a*(a+a+(a*a+T)+E)*T) ⇒ a+(a*(a+a+(a*a+F)+E)*T) ⇒ a+(a*(a+a+(a*a+a)+E)*T) ⇒ a+(a*(a+a+(a*a+a)+T)*T) ⇒ a+(a*(a+a+(a*a+a)+F)*T) ⇒ a+(a*(a+a+(a*a+a)+a)*T) ⇒ a+(a*(a+a+(a*a+a)+a)*F) ⇒ a+(a*(a+a+(a*a+a)+a)*(E)) ⇒ a+(a*(a+a+(a*a+a)+a)*(T+E)) ⇒ a+(a*(a+a+(a*a+a)+a)*(F+E)) ⇒ a+(a*(a+a+(a*a+a)+a)*(a+E)) ⇒ a+(a*(a+a+(a*a+a)+a)*(a+T)) ⇒ a+(a*(a+a+(a*a+a)+a)*(a+F)) ⇒ a+(a*(a+a+(a*a+a)+a)*(a+a))

Exercise 15.3.3

(a) The successive strings are S, ℓSr, ℓSSr, ℓℓSrSr, ℓℓrSr, ℓℓrSSr, ℓℓrℓSrSr, ℓℓrℓℓSrrSr, ℓℓrℓℓrrSr, and ℓℓrℓℓrrr.

(b) The successive strings are E, T, F*T, a*T, a*F, a*(E), a*(T+E), a*(F+E), a*(a+E), a*(a+T), a*(a+F), and a*(a+a).

(c) The successive strings are E, T+E, F*T+E, (E)*T+E, (T+E)*T+E, (F*T+E)*T+E, (a*T+E)*T+E, (a*F+E)*T+E, (a*a+E)*T+E, (a*a+T)*T+E, (a*a+F)*T+E, (a*a+a)*T+E, (a*a+a)*F+E, (a*a+a)*a+E, (a*a+a)*a+T+E, (a*a+a)*a+F+E, (a*a+a)*a+a+E, (a*a+a)*a+a+T, (a*a+a)*a+a+F*T, (a*a+a)*a+a+a*T, (a*a+a)*a+a+a*F, and (a*a+a)*a+a+a*a.

Figure 16-39: The Parse Tree for Exercise 15.3.4

Exercise 15.3.4

The string aaababbb is generated by this grammar, with successive strings S, aSb, aaSbb, aaaSbbb, and aaababbb. The parse tree is drawn in Figure 16-39.

Exercise 15.3.5

The parse tree for a right-linear grammar has one long path down its right edge, with terminals trailing along the left of this path. The parse tree of a left-linear grammar is similar except that the long path is down the left edge of the tree and the terminals trail to the right. The tree for a general linear grammar still has a single long path, but now there may be terminals on either or both sides of this path.

Exercise 15.3.6

Let w be a sufficiently long string in EQ. Let y be any substring of w that consists of two different letters next to one another, so that w = xyz. Then xz, and all the other strings of the form xy^i z, are in EQ. (Note that this does not work if we impose the additional condition that |xy| ≤ k, because the first k letters might be all the same.)
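The final string of the Exercise 15.3.2 derivation can be machine-checked against the grammar E → T+E | T, T → F*T | F, F → (E) | a with a small recursive-descent recognizer, one method per nonterminal. This is a checking aid written for this solution set, not code from the text.

```java
// Recursive-descent recognizer for E -> T+E | T, T -> F*T | F, F -> (E) | a.
public class ExprCheck {
    private static String w;
    private static int pos;

    public static boolean parse(String s) {
        w = s; pos = 0;
        return e() && pos == w.length();
    }
    // E -> T+E | T
    private static boolean e() {
        if (!t()) return false;
        if (pos < w.length() && w.charAt(pos) == '+') { pos++; return e(); }
        return true;
    }
    // T -> F*T | F
    private static boolean t() {
        if (!f()) return false;
        if (pos < w.length() && w.charAt(pos) == '*') { pos++; return t(); }
        return true;
    }
    // F -> (E) | a
    private static boolean f() {
        if (pos < w.length() && w.charAt(pos) == 'a') { pos++; return true; }
        if (pos < w.length() && w.charAt(pos) == '(') {
            pos++;
            if (e() && pos < w.length() && w.charAt(pos) == ')') { pos++; return true; }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(parse("a+(a*(a+a+(a*a+a)+a)*(a+a))"));  // true
        System.out.println(parse("a+"));                           // false
    }
}
```

The lookahead choices work here because only e consumes '+', only t consumes '*', and neither can consume a closing parenthesis.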

Exercise 15.3.7

Base cases: If there are not at least two nodes in the tree, we cannot have one A on top of another in a single path. (Actually three nodes, since we have to also have a terminal or λ somewhere.) For the inductive case, assume that all trees with at most n nodes have the given property, and consider a tree T with n + 1 nodes that has one A above another on a path, violating the given property by not generating a terminal in between. Merge the two A's, discarding anything else generated by the top A. Since the discarded nodes do not contain a terminal, this does not change the string generated. But we have deleted at least one node, and now we can apply the inductive hypothesis (if necessary) to get the desired tree T'.

Exercise 15.3.8

We look up the letter w_i. If the rule A → w_i exists in G, we answer yes. Otherwise, we need to see whether any other rules can be used to get from A to w_i. This involves part of the work of putting the grammar in Chomsky Normal Form, as we'll see in Excursion 15.4. We need to first consider rules of the form B → C, B → w_i, and B → λ for all nonterminals B. The goal is to determine exactly which nonterminals can generate λ, and which can generate w_i. We add rules of this form as long as we can, until we can see that no other rule can be used to generate either of these strings. Then we answer yes if and only if we have added the rule A → w_i.

Exercise 15.3.9

    for (int m = i + 1; m < j; m++)
        for (each rule A -> BC)
            if (gen(B, i, m) && gen(C, m, j))
                return true;
    return false;

Exercise 15.3.10

R(x, z); that is, if three elements of A are connected by a "chain" of two elements of the relation, the first and third elements must also be related.

The ≤ and ≥ relations have these three properties on the naturals, the real numbers, characters, strings (using lexicographic order), or any other ordered set. In fact, on all these sets ≤ and ≥ have an additional property as well:

• A relation is fully comparable if ∀x: ∀y: (R(x, y) ∨ R(y, x)), that is, if any false instance can be made true by switching the variables.

This property is also sometimes called being "total", but we will reserve that word for the property of relations that is part of being a function.

Definition: A linear order, also called a total order, is a partial order that is fully comparable. The reason for this name is that a straight line has the property that among any two distinct points, one is before the other.

But there can be partial orders that are not linear orders. For example, the equality relation is a partial order: it is clearly reflexive and transitive, and it is antisymmetric because we know that if x ≠ y, E(x, y) and E(y, x) are both false.

Example: Another partial order that is not a linear order comes to us from number theory (the study of the naturals) and will be very important to us in Chapter 3. If a and b are naturals, then we say that a divides b if there exists some natural c such that b = a · c. (In symbols, D(a, b) = ∃c: b = a · c.) Equivalently, if you divide b by a you get no remainder: in Java notation, b % a == 0. (In Exercise 2.10.1 you are asked to prove that these two definitions are equivalent when a is positive.)
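The two definitions can be compared by brute force for small naturals, in the Java notation used above. The method names and the search bound in this sketch are choices made for the example.

```java
// Checking the two definitions of "a divides b" against each other:
// exists c with b == a * c, versus b % a == 0.
public class Divides {
    public static boolean dividesByWitness(int a, int b) {
        // Searching witnesses c up to b suffices: for a >= 1,
        // any c with a * c == b has c <= b (and b == 0 has witness c == 0).
        for (int c = 0; c <= b; c++)
            if (b == a * c) return true;
        return false;
    }

    public static boolean dividesByRemainder(int a, int b) {
        return b % a == 0;
    }

    public static void main(String[] args) {
        boolean agree = true;
        for (int a = 1; a <= 12; a++)
            for (int b = 0; b <= 12; b++)
                if (dividesByWitness(a, b) != dividesByRemainder(a, b)) agree = false;
        System.out.println(agree);                     // true: the definitions match for a > 0
        System.out.println(dividesByRemainder(3, 6));  // true: D(3, 6)
    }
}
```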

It's easy to check systematically that this relation D, called the division relation, is a partial order.

• Reflexive: It is always true that D(a, a), because 1 · a = a.

Figure 2-9: The Hasse diagram for the division relation on {1,...,8}. (Source: David Mix Barrington)

• Antisymmetric:

Unless b = 0, D(a, b) can only be true if a ≤ b. So if both a and b are nonzero, D(a, b) and D(b, a) together force a ≤ b and b ≤ a, and thus a = b. If b = 0, D(a, 0) is definitely true and D(0, a) is true only if a = 0, so the antisymmetry property holds.

• Transitive: Assume D(a, b) and D(b, c), and we will prove D(a, c). We know that b = a · d for some d and that c = b · e for some e. By arithmetic we have c = a · (d · e), and d · e is the number needed to show D(a, c).

In Problem 2.10.4 we'll define the substring relation on strings and show that it is also a partial order that is not total.

If a partial order is on a finite set, we can represent it pictorially by a Hasse diagram. This is a finite graph (a picture with a finite number of dots and lines) where each element of the base set is represented by a dot, and element a is below element b in the partial order (that is, P(a, b) is true) if and only if you can go from dot a to dot b by going upward along lines (we say in this case that a is "path-below" b). Another way to say this is that you draw a line from a to b if and only if a is below b and there is nothing between a and b in the partial order.

Let's have a look at the Hasse diagram for the division partial order on the set of numbers {1, 2, 3, 4, 5, 6, 7, 8}. We can easily list the pairs of numbers in the relation: every number divides itself, 1 divides all of them, 2 divides the four even numbers, 3 divides 6, and 4 divides 8. Clearly 1 will go at the bottom, but what should be on the next level? 2 must be under 4, 6, and 8, 3 under 6, and 4 under 8. So 2, 4, and 8 are all on different levels, forming a vertical chain. We can put 3 on the same level as 2, with 6 above it and also above 2. Then 5 and 7 can also go on the same level as 2, with just the lines to them up from 1. The resulting picture is shown in Figure 2-9.

2.10.2 The Hasse Diagram Theorem

Hasse diagrams are a convenient way to represent a partial order, but is it always possible to use one to represent any possible partial order? This is a serious mathematical question. We'll answer it by stating and proving a theorem, although because of our current lack of mathematical tools¹⁴ the proof won't be entirely rigorous¹⁵. With a convincing informal argument, we can see which properties of and facts about Hasse diagrams and partial orders are important for the relationship between them.

Hasse Diagram Theorem: Any finite partial order is the "path-below" relation of some Hasse diagram, and the "path-below" relation of any Hasse diagram is a partial order.

Proof: The second statement is fairly easy to prove and is left as Problem 2.10.1: we only have to verify that the "path-below" relation of any Hasse diagram is reflexive, antisymmetric, and transitive. To prove the first statement, we must show that given a finite partial order P we can always draw the diagram. (This is a useful thing to know how to do in any case.) We already know where we want to put the lines on the diagram, from the definition of a Hasse diagram: there should be a line from a to b if and only if P(a, b) is true and there is no distinct third element c such that P(a, c) and P(c, b). As a quantified statement, this becomes

L(a, b) ↔ P(a, b) ∧ ¬∃c: ((c ≠ a) ∧ (c ≠ b) ∧ P(a, c) ∧ P(c, b)).

If we have a table of values of P(x, y) for all possible x and y, we can use this definition to test L(a, b).
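The line test L(a, b) just defined translates directly into code over a table of values of P. The sketch below uses the division relation on {1,...,8} from Figure 2-9 as its table; excluding a == b is an added convention of this example, reflecting the fact that the diagram has no loop lines.

```java
// L(a, b): P(a, b) holds and no third element c lies strictly between
// a and b.  In main, index i of the array stands for the number i + 1.
public class HasseLines {
    public static boolean line(boolean[][] p, int a, int b) {
        if (a == b || !p[a][b]) return false;
        for (int c = 0; c < p.length; c++)
            if (c != a && c != b && p[a][c] && p[c][b]) return false;
        return true;
    }

    public static void main(String[] args) {
        boolean[][] p = new boolean[8][8];
        for (int a = 1; a <= 8; a++)
            for (int b = 1; b <= 8; b++)
                p[a - 1][b - 1] = (b % a == 0);  // P(a, b): a divides b
        System.out.println(line(p, 0, 1));  // true: a line from 1 up to 2
        System.out.println(line(p, 0, 3));  // false: 2 lies between 1 and 4
        System.out.println(line(p, 1, 3));  // true: a line from 2 up to 4
    }
}
```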

But how do we know that we can put the dots on our diagram in such a way that we can draw all these lines? (Remember, if P(a, b) then any line from a to b must go upward.) Here is a systematic procedure that we can use to lay out the elements. We first need to find a minimal element¹⁶ of the partial order, which is an element m such that P(a, m) is true only when it has to be, when a = m.

Lemma: Any partial order on a finite nonempty set has a minimal element.

Proof: Informally, here is how you can always find a minimal element. Since the set has elements we may start with any element at all, say, a. If a is minimal, you are done. If not, there must be an element b such that P(b, a) and b ≠ a. (Why? From the very definition of a not being minimal.) If b is minimal, you've got your minimal element; otherwise you continue in the same way, by finding some element c such that P(c, b) and c ≠ b. How could this process possibly stop? You could find a minimal element, which is what you want. But actually no other outcome is possible. Since there are only a finite number of elements, you can't go on forever without hitting the same one twice. And as we'll show in Problem 2.10.2, it's simply not possible in a partial order to have a cycle, which is what you'd have if you did hit the same element twice. So a minimal element always exists¹⁷, and we've proved the Lemma.

¹⁴Particularly mathematical induction from Chapter 4.
¹⁵This proof is in some sense the first piece of "real mathematics" in this book. Because we only just barely have the tools to make the argument, it turns out to be rather complicated, with several lemmas relegated to the Problems. Some instructors will find their students ready to tackle this proof at this point, and others may prefer to have them skim it or skip it.
¹⁶This is not the same as a minimum element, which would have P(m, a) for every element a. Similarly, a maximal element has ∀x: P(m, x) → (x = m), and a maximum element has ∀x: P(x, m).
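The descending search in the Lemma's proof terminates on any finite partial order, and it can be written out directly. The relation used in main, divisibility on {2,...,8}, is a hypothetical example chosen for illustration.

```java
// The descending search from the Lemma: start anywhere and move to a
// strictly smaller element as long as one exists.  Finiteness plus the
// absence of cycles (Problem 2.10.2) guarantees termination.  In main,
// index i stands for the number i + 2.
public class MinimalElement {
    public static int minimal(boolean[][] p, int start) {
        int a = start;
        boolean moved = true;
        while (moved) {
            moved = false;
            for (int b = 0; b < p.length; b++)
                if (b != a && p[b][a]) { a = b; moved = true; break; }
        }
        return a;
    }

    public static void main(String[] args) {
        boolean[][] p = new boolean[7][7];
        for (int x = 0; x < 7; x++)
            for (int y = 0; y < 7; y++)
                p[x][y] = ((y + 2) % (x + 2) == 0);  // divisibility on {2,...,8}
        System.out.println(minimal(p, 6) + 2);  // 2: starting from 8 we descend to 2
        System.out.println(minimal(p, 3) + 2);  // 5: 5 is already minimal here
    }
}
```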

Once we have a minimal element a, we draw a dot for it at the bottom of our diagram. We'll never need to draw a line up to a, because it's minimal. But now we have to draw all the lines up from a to other elements. To do this we find all elements z such that P(a, z) is true and there is no y such that P(a, y), P(y, z), and y ≠ z. All these lines can be drawn upward because a is below everything else. But we can't draw them until we know where to put the dots for all the other elements!

Consider the set of elements remaining if we ignore a. It's still a partial order, because taking out a minimal element can't destroy any of the partial order properties. (Why? See Problem 2.10.3.) So it has a minimal element b, by the same reasoning. We can put the dot for b just above the dot for a, below anything else, because we'll never need to draw a line up to it except maybe from a. Once b's dot is drawn, we take care of the upward lines from b and then consider the partial order obtained by removing b as well, continuing in the same way until all the dots are placed and we have a Hasse diagram¹⁸.

We are almost done with the proof of the theorem; we just have to verify that the construction worked as intended.

Lemma: The path-below relation for the constructed Hasse diagram is exactly the partial order we were given.

Proof: We must show that P(a, b) is true if and only if we have an upward path from a to b in the constructed diagram. This equivalence proof breaks down naturally into two implications. For the easier implication, assume that there is an upward path from a to b. We can use transitivity to prove P(a, b), because P holds for every line in the diagram by the construction. If, for example, the path from a to b goes through e, f, and g in turn, we know P(a, e), P(e, f), P(f, g), and P(g, b), and the transitive property allows us to derive first P(a, f), then P(a, g), and finally P(a, b).

It remains to prove that if P(a, b) is true, then there must be a path from a up to b in the diagram. Again, we'll prove that it exists by explaining how to construct it out of lines that exist. If a = b then we have a path without doing anything. Otherwise, remember that there is a line from a to b if and only if there is no c with P(a, c) and P(c, b). Either there is a line, then, in which case we have a path, or such a c exists. Now we apply the argument recursively to the pairs (a, c) and (c, b). Either there is a d between a and c, for example, or there isn't. If there is, we recurse again, and if there isn't the construction assures us of a line from a up to c. The process must stop sometime, because there are only so many elements in the partial order and we can't (as proved in Problem 2.10.2) create a cycle. We've now finished the proof of the Hasse Diagram Theorem. ∎

¹⁷This argument is, of course, a disguised form of mathematical induction; you might want to take another look at it after Chapter 4.
¹⁸Technically, we have just put the elements into a linear order that is consistent with the given partial order, by a recursive algorithm. Common sense says that this will work, and later we will be able to prove this by mathematical induction.
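The observation in footnote 18, that the construction produces a linear order consistent with the partial order, can be tested with a direct implementation: repeatedly find and remove a minimal element. The tie-breaking choice (always take the lowest-indexed minimal element) is a convenience of this sketch.

```java
import java.util.*;

// Repeatedly find and remove a minimal element, producing a linear order
// consistent with the given partial order.  Because the scan takes the
// lowest-indexed minimal element, on the division order of Figure 2-9
// the output is the usual numerical order.
public class LinearExtension {
    // p[x][y] true means P(x, y); element labels are index + 1.
    public static List<Integer> order(boolean[][] p) {
        int n = p.length;
        boolean[] removed = new boolean[n];
        List<Integer> out = new ArrayList<>();
        for (int step = 0; step < n; step++)
            for (int a = 0; a < n; a++) {
                if (removed[a]) continue;
                boolean minimal = true;
                for (int b = 0; b < n; b++)
                    if (b != a && !removed[b] && p[b][a]) { minimal = false; break; }
                if (minimal) { out.add(a + 1); removed[a] = true; break; }
            }
        return out;
    }

    public static void main(String[] args) {
        boolean[][] p = new boolean[8][8];
        for (int a = 1; a <= 8; a++)
            for (int b = 1; b <= 8; b++)
                p[a - 1][b - 1] = (b % a == 0);
        System.out.println(order(p));  // [1, 2, 3, 4, 5, 6, 7, 8]
    }
}
```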

2.10.3 Exercises

E2.10.1 Prove that the two definitions of "divides" given in the text are equivalent. That is, prove that for any two naturals a and b, with a > 0, ∃c: b = a · c if and only if b % a == 0.

E2.10.2 Which of the following binary relations are partial orders? Which are total orders?

(a) Baseball player x has a higher batting average than baseball player y.
(b) Baseball player x has a batting average equal to or higher than baseball player y.
(c) Natural x is equal to or less than half of natural y ((x = y) ∨ (2x ≤ y)).
(d) Over the alphabet {a}, string x is a prefix of string y, that is, ∃z: xz = y.
(e) Natural x is less than twice natural y (x < 2y).

E2.10.3 Follow the procedure from this section to make a Hasse diagram from the following partial order, successively finding and removing minimal elements: {(a,a), (a,d), (b,a), (b,b), (b,d), (b,e), (c,c), (c,d), (c,e), (d,d), (e,d), (e,e), (f,a), (f,b), (f,c), (f,d), (f,e), (f,f)}.

E2.10.4 Draw the Hasse diagram for the relation D(x, y) on the numbers from 1 through 24.

E2.10.5 Is an infinite partial order guaranteed to have a minimal element? Argue why or why not.

E2.10.6 Let R be a partial order on a set A and S be a partial order on another set B. Define a new relation T on the direct product A × B as follows: T((a, b), (a', b')) is true if and only if both R(a, a') and S(b, b') are true. Prove that T is a partial order.

E2.10.7 Let A and B be two disjoint sets, R be a partial order on A, and S be a partial order on B. We'll define two new binary relations on the union A ∪ B as follows.

(a) Let P be the union of R and S. Prove that P is a partial order.
(b) Let Q be the union of R, S, and A × B. Prove that Q is a partial order.

E2.10.8 Here are two more relations for you to prove to be partial orders.

(a) Let X be the power set of {0, 1, ..., n} where n is a natural. Define the binary relation S on X so that S(A, B) is true if and only if A ⊆ B. Prove that S is a partial order.
(b) Let Y be the set of binary strings of length n, where n is a natural. Define the binary relation T on Y so that T(u, v) is true if and only if for every i with 0 ≤ i < n, if u_i = 1 then v_i = 1. Prove that T is a partial order.

E2.10.9 We define a relation D on functions so that D(f, g) means "f dominates g": there is some number c such that for all x > c, f(x) ≥ g(x).

(a) Prove that D is a linear order on polynomials.
(b) Prove that D is a partial order on functions.
(c) Prove that D is not a linear order on functions. (Hint: Give an example of two functions, neither of which dominates the other.)

E2.10.10 Let P be a partial order on a finite set X. Prove that there exists a linear order L on X such that P ⊆ L. (Hint: Use the proof of the Hasse Diagram Theorem.)

(c) Repeat parts (a) and (b) for each of the other nine pairs of sets: A and C, A and D, and so forth.

E1.1.9 Is it ever possible for a set of novelists to be "equal" (as defined in this section) to a set of naturals? If so, describe all the ways that this might happen.

E1.1.10 Is it possible for two sets each to be proper subsets of the other? Either give an example or explain why there cannot be one.

1.1.4 Problems

P1.1.1 How many elements are in each of these sets of numbers? Are any of them equal to each other? Which of them, if any, are subsets of which of the others?

(a) {3, 7 − 4, 7 + 4, 11}

(b) {1492, 5² − 4², the number of players on a soccer team}

(c) {5, 35 − 4², (11)³, the number of U.S. states, 11³, 1331}

(d) {1² − 0², 2² − 1², 3² − 2², 4² − 3², 5² − 4², 6² − 5²}

(e) {1 + 10, 2 + 9, 3 + 8, ..., 9 + 2, 10 + 1}

P1.1.2 (The Russell Paradox) Suppose that the data type thing were defined so that every set of thing elements was itself another thing. Define the set R to be {x : x is a set and x is not a member of x}.

(a) Explain why ∅, {1}, and {1, {1}} are elements of R.

(b) Explain why the set of all thing elements, given our assumption, is not a member of R.

(c) Is R a member of R? Explain why both a "yes" and a "no" answer to this question are impossible.

P1.1.3 Let A be the set {1,2,3}. Give explicit descriptions (lists of elements) of each of the following sets of sets:

(a) {B : B ⊆ A}

(b) {B : B ⊆ A and |B| is even} (Remember that 0 is an even number.)

(c) {B : B ⊆ A and 3 ∉ B}

(d) {B : B ⊆ A and A ⊆ B}

(e) {B : B ⊆ A and B ∉ B}

P1.1.4 Let C be the set {0,1,...,15}. Let D be a subset of C and define the number f(D) as follows — f(D) is the sum, for every element i of D, of 2ⁱ. For example, if D is {1,6} then f(D) = 2¹ + 2⁶ = 66.

(a) What are f(∅), f({0,2,5}), and f(C)?

(b) Is there a D such that f(D) = 666? If so, find it.

(c) Explain why, if D and E are any two subsets of C such that f(D) = f(E), then D = E.
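The encoding f(D) in this problem is exactly the standard bit-vector representation of a set of small naturals. A minimal sketch of computing it in Java (the class and method names here are our own, not the book's):

```java
// Computes f(D), the sum of 2^i over the elements i of D, for subsets
// of C = {0,1,...,15}.  Each element contributes one bit of the result.
public class SubsetCode {
    // d lists the elements of D, each between 0 and 15
    public static int f(int[] d) {
        int code = 0;
        for (int i : d) {
            code += 1 << i;   // 1 << i is 2^i for i in this range
        }
        return code;
    }
}
```

With this method in hand, parts (a) and (b) can be explored by direct computation.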

2.11 Equivalence Relations

2.11.1 Definition and Examples

In Section 2.8 we defined an equivalence relation to be a binary relation on a set A (that is, a subset of the direct product A × A) that has the following three properties, expressed in quantified statements:

• Reflexivity: ∀x : R(x,x)

• Symmetry: ∀x : ∀y : R(x,y) → R(y,x)

• Transitivity: ∀x : ∀y : ∀z : (R(x,y) ∧ R(y,z)) → R(x,z) (for any elements x, y, and z, if both R(x,y) and R(y,z) are true, so is R(x,z))

These are the three most important properties of the equality relation, defined so that E(x,y) is true if and only if x = y. So the equality relation is our first and most important example of an equivalence relation. Here are two more examples. Define the parity relation P(x,y) on naturals by the predicate "x and y are both even or both odd". Define the universal relation U(x,y) on any set to always be true. We can easily check the three properties of each of these relations:

• Reflexivity: P(x,x) is true whether x is even or odd, and U(x,x) is certainly true.

• Symmetry: P(x,y) and P(y,x) are exactly the same statement, and U(x,y) and U(y,x) are both true.

• Transitivity: If P(x,y) and P(y,z) are both true, then x and z are both even if y is even and are both odd if y is odd. And of course U(x,z) is always true, so the desired implication is true.

It's easy to make up more examples of equivalence relations, such as "person a and person b live in the same state". You may have noticed that all our examples of equivalence relations have something in common — in some sense they say "x and y are the same" (they are equal, they have the same remainder when divided by 2, they are in the same universe, they live in the same state). Given the English meaning of "same", a relation of this form will normally be an equivalence relation, because any thing is "the same as" itself, if x is "the same as" y then y is "the same as" x, and transitivity also holds. With a definition and a little more work, we can prove the converse of this fact — that every equivalence relation can be restated in this form.
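For a relation on a small finite set, stored as a two-dimensional boolean array in the style of this chapter's Java problems, the three properties can be checked directly by exhaustive loops. A sketch (the class and method names are our own choices):

```java
// Exhaustive checks of the three equivalence-relation properties for a
// relation on {0,...,n-1} stored as an n-by-n boolean matrix r, where
// r[x][y] is true exactly when R(x,y) holds.
public class RelationCheck {
    public static boolean isReflexive(boolean[][] r) {
        for (int x = 0; x < r.length; x++)
            if (!r[x][x]) return false;       // R(x,x) must always hold
        return true;
    }

    public static boolean isSymmetric(boolean[][] r) {
        for (int x = 0; x < r.length; x++)
            for (int y = 0; y < r.length; y++)
                if (r[x][y] != r[y][x]) return false;
        return true;
    }

    public static boolean isTransitive(boolean[][] r) {
        for (int x = 0; x < r.length; x++)
            for (int y = 0; y < r.length; y++)
                for (int z = 0; z < r.length; z++)
                    if (r[x][y] && r[y][z] && !r[x][z]) return false;
        return true;
    }
}
```

Running all three checks on the matrix for the parity relation on {0,1,2,3}, for example, confirms that it is an equivalence relation.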

First, though, let's look at how we can describe an equivalence relation by a diagram. We'll make a dot for every element, and a line from x to y whenever R(x,y) is true. (Since the relation is symmetric, we don't need to indicate any direction on the line.) Figure 2-10 shows such diagrams for the relations E, P, and U on the set {1,2,3,4}. In each case the equivalence relation divides the elements into groups, where two elements in the same group are related to each other and elements in different groups are not. There are four groups with one element each for E, two groups of two elements each for P, and one group of four elements for U. We'll now show that every equivalence relation gives rise to groups of elements in this way.

[Figure 2-10: Diagrams of three relations, E, P, and U, on the set {1,2,3,4}. Source: David Mix Barrington]

[Figure 2-11: A partition of a finite set. The shaded area is one element of P.]

2.11.2 The Partition Theorem

Definition: Let A be any set. A partition of A is a set P, where

• Each element of P is a subset of A,

• The elements of P are pairwise disjoint, that is, if X and Y are different elements of P then X ∩ Y = ∅, and

• The union of all the elements of P is exactly A.

Figure 2-11 shows a partition of a finite set. The last two properties are equivalent to the statement that each element of A is contained in exactly one element of P (this will be proved in Problem 2.11.1). We can now state and prove a theorem which says that equivalence relations are intimately related to partitions:

Partition Theorem: Let A be a set. A relation R ⊆ A × A is an equivalence relation if and only if it is the relation "x and y are in the same set of P" for some partition P of A.

Proof:

The easy half of the proof is to show that given any partition P on a set A, the relation "x and y are in the same set in P" is an equivalence relation on A. We must show that this relation (which we will abbreviate S(x,y)) is reflexive, symmetric, and transitive. For reflexivity, let x be any element of A. Since x must be in some set in P, x and x are in the same set in P and so S(x,x) is true. For symmetry, assume that S(x,y), and let B be the set in P that contains x and y. Now y and x are both in B, so they are both in the same set in P, and S(y,x) is true. Finally, for transitivity, assume that S(x,y) and S(y,z) are both true. Let B again be the set containing x and y. Since y and z are in the same set in P by assumption, z is in B. Since x is also in B, S(x,z) is true as required. We have proven that S is an equivalence relation.[19]

The remaining half of our problem is to take an equivalence relation R on A and construct a partition P of A such that R(x,y) will be true if and only if x and y are in the same set in P. We do this by taking P to be the set of equivalence classes of R. For any element x of A, we define the equivalence class of x to be the set {w : R(x,w)}, the set of elements that are "R-equivalent to x". As we take all the possible elements x, we get a collection of different sets as equivalence classes, and this collection will be exactly the partition P.[20]

We have two things to show. First, we need to prove that R(x,y) is exactly the relation "x and y are in the same set in P", that is, that R(x,y) if and only if x and y are in the same set in P. Since this is an "if and only if" statement, we can break it into two parts. First assume R(x,y). By the definition, y is in the equivalence class of x. We also know that x is in the equivalence class of x, because R(x,x) is true (R is reflexive). For the other direction of the "if and only if", assume that x and y are in the same set in P, which must be the equivalence class of some element z. The definition tells us that R(z,x) and R(z,y) are true, from which we can conclude R(x,z) by symmetry and R(x,y), our desired conclusion, by transitivity. Note how all three properties of an equivalence relation were used in the proof.

All that is left is to show that P is a partition. Clearly it is a set of subsets of A. The union of the sets in P must be A because every element is in some equivalence class — its own, at least. Is it possible for an element of A to be in two different equivalence classes? No, as we can show by deriving a contradiction. Suppose that x were in the equivalence class of y and also in the equivalence class of z. The definitions give us that R(y,x) and R(z,x) are true. From this we can derive R(y,z) as above, using symmetry to get R(x,z) and then using transitivity. Now we can show that the class of y and the class of z are exactly the same set, so that they are not two different sets in P. By the definition of set equality, we must show that for any element w, R(y,w) ⟺ R(z,w). Given R(y,w), we use symmetry to get R(w,y), transitivity with R(y,z) to get R(w,z), and symmetry again to get R(z,w). In exactly the same way we can derive R(y,w) from R(z,w), completing the proof that the two sets are equal. Any two sets in P which share an element are thus identical, proving that these sets are pairwise disjoint and that P is therefore a partition. ∎

[19] This is a rather boring proof, but it's worth tracing out all the details carefully at least once to make sure that we're clear on the definition. Later we can say things like "it's easy to check that S is reflexive, symmetric, and transitive" and then not do it explicitly.

[20] We constructed P by taking the set of all equivalence classes, so that if two elements produce the same equivalence class, it results in only one element of P.
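For finite sets, the two defining conditions of a partition (pairwise disjointness, and union equal to all of A) can be verified mechanically. A sketch for subsets of {0,...,n−1} represented as boolean membership arrays (the class and method names are our own):

```java
// Checks the definition of a partition directly.  p[k][i] is true when
// element i of {0,...,n-1} belongs to block k.  We verify that no element
// lies in two blocks and that every element lies in some block.
public class PartitionCheck {
    public static boolean isPartition(boolean[][] p, int n) {
        boolean[] covered = new boolean[n];
        for (boolean[] block : p)
            for (int i = 0; i < n; i++)
                if (block[i]) {
                    if (covered[i]) return false; // in two blocks: not disjoint
                    covered[i] = true;
                }
        for (int i = 0; i < n; i++)
            if (!covered[i]) return false;        // union must be all of A
        return true;
    }
}
```

The parity partition of {0,1,2,3} into evens and odds, for instance, passes both checks.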

Let's see some examples of the equivalence classes of various equivalence relations. For the equality relation E (defined so that E(x,y) means x = y), the classes are the singleton sets {x} for each element x of A. For the parity relation P there are two equivalence classes: the set of all odd naturals and the set of all even naturals. For the universal relation U there is only one equivalence class, consisting of all the elements of the type. Finally, for a relation such as "x and y live in the same U.S. state" we would have fifty equivalence classes, one for each state, such as {x : x lives in Delaware}.

2.11.3 Exercises

E2.11.1 Which of the following relations are equivalence relations? For those that are, describe the equivalence classes. For those that are not, say which of the three properties fail to hold.

(a) On integers, defined by the inequality |x − y| ≤ 10.

(b) On real numbers, defined as "∃z : z is an integer and z is strictly between x and y".

(c) On strings, defined as "string u and string v have the same length".

(d) On novelists, defined as "x and y wrote a novel in the same language".

E2.11.2 Prove that R is symmetric if and only if ∀x : ∀y : R(x,y) ⟺ R(y,x). This new quantified statement can thus serve as an alternate definition of "symmetric".

E2.11.3 Find the partition and draw the diagram for the following equivalence relation on the set {a,b,c,d,e,f}: {(a,a), (a,c), (a,f), (b,b), (b,e), (c,a), (c,c), (c,f), (d,d), (e,b), (e,e), (f,a), (f,c), (f,f)}.

E2.11.4 Consider the partition {{1,4,5}, {2,6,7}, {3,8}} of the set {1,2,...,8}. List the elements of the corresponding equivalence relation and draw a diagram of it.

E2.11.5 Let R and S be two equivalence relations on the same set A. Define a new relation T such that T(x,y) if and only if both R(x,y) and S(x,y) are true. Prove that T is an equivalence relation.

E2.11.6

Show that the following relations on strings are equivalence relations, and describe the equivalence classes of each relation on the set {a,b,c}*.

(a) R(u,v), meaning "every letter that occurs in u also occurs in v, and vice versa".

(b) A(u,v), meaning "u and v are anagrams" (see Problem 1.2.9).

E2.11.7 Let D be a set of dogs, containing a subset B of black dogs, a subset F of female dogs, and a subset R of retrievers. Define a relation E on dogs so that E(x,y) means "(B(x) ⟺ B(y)) ∧ (F(x) ⟺ F(y)) ∧ (R(x) ⟺ R(y))".

(a) Prove that E is an equivalence relation.

(b) Describe each of the possible equivalence classes of E in English. (Note that depending on D, some of the possible classes might be empty.)

E2.11.8

Let {x₁,...,xₙ} be a set of atomic variables, and let C be the set of all compound propositions using those variables. Define the relation E so that if c and d are compound propositions, E(c,d) means "c ⟺ d".

(a) Prove that E is an equivalence relation.

(b) Describe the equivalence classes of E. How many are there?

E2.11.9

Let X be a finite set and let P and Q be two partitions of X. We say that P and Q are isomorphic if there is a bijection f from X to X such that for any set S in P, the set f(S) = {f(x) : x ∈ S} is in Q. Let I be the binary relation on partitions of X such that I(P,Q) means "P and Q are isomorphic". Prove that I is an equivalence relation.

E2.11.10

We can represent a rational number by a pair of integers, the numerator and denominator of a fraction, as long as the denominator is not zero. It is possible for two pairs to represent the same rational number. Define the binary relation E on pairs of integers (with the second integer nonzero) so that E((p,q),(r,s)) means that p/q and r/s are the same rational number.

(a) Prove that E is an equivalence relation.

(b) Prove that there is a bijection between the rational numbers and the equivalence classes of E.

(c) (uses Java) Write a static method that takes four int arguments p, q, r, and s and determines whether p/q and r/s are the same rational number. Do not use the Java division operator in your code.[21]
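For part (c), one natural approach (ours; the book may intend exactly this or a variant) is integer cross-multiplication: p/q and r/s are the same rational exactly when p·s = r·q, given that q and s are nonzero. A sketch:

```java
// Decides whether p/q and r/s are the same rational number without any
// division: for nonzero q and s, p/q = r/s exactly when p*s = r*q.
// Widening to long guards the products against int overflow.
public class SameRational {
    public static boolean same(int p, int q, int r, int s) {
        return (long) p * s == (long) r * q;
    }
}
```

Because only integer multiplication is used, the rounding problem described in the footnote never arises.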

2.11.4 Problems

P2.11.1 Let P be a set, each element of which is a subset of some fixed set A. Prove that the elements of P are pairwise disjoint and union together to give A if and only if every element of A is an element of exactly one element of P.

P2.11.2

Let f be any function from A to B. Let R be the binary relation on A defined so that R(x,y) is true if and only if f(x) = f(y).

(a) Prove that R is an equivalence relation.

(b) What do we know about the equivalence classes of R if f is one-to-one?

(c) What do we know about the equivalence classes of R if f is onto?

(d) Prove that the equivalence classes of R are in one-to-one correspondence with the range of f (that is, with the set f(A) = {y : ∃x : f(x) = y}).

P2.11.3

Let R and S be two equivalence relations on the same set A. Define a new relation U such that U(x,y) ⟺ [R(x,y) ∨ S(x,y)]. Is U necessarily an equivalence relation? Either prove that it is or give an example where it is not.

[21] Because division of floating point numbers introduces rounding errors, it is possible that "p/q == r/s" evaluates to false when in fact p/q = r/s is true.

P2.11.4 There is only one possible partition of the set {a}. There are two possible partitions of the set {a,b}, corresponding to our equivalence relations E and U. There are five possible partitions of the set {a,b,c} — list them. How many partitions are there of {a,b,c,d}? Of {a,b,c,d,e}?

P2.11.5

Suppose we have a set of airports and a symmetric set of direct flight connections. (That is, if you can fly directly from airport x to airport y, you can also fly directly from y to x — we will call this predicate D(x,y).) The relation D may fail to be an equivalence relation because it need not be transitive. But for any such D, it turns out that there exists an equivalence relation F such that:

• D(x,y) → F(x,y)

• If D(x,y) → G(x,y) and G is an equivalence relation, then F(x,y) → G(x,y)

Find this F, describe it in terms of D, and prove both that it is an equivalence relation and that it has the specified properties.

P2.11.6 The equivalence relation of isomorphism (see Exercise 2.11.9) divides the partitions of a given set into equivalence classes.

(a) Divide the five partitions of {a,b,c} into equivalence classes.

(b) Divide the partitions of {a,b,c,d} into equivalence classes.

(c) Describe the equivalence classes of partitions of {a,b,c,d,e} — you need not list their members.

P2.11.7

Given any partial order P, we can form its symmetric closure Pˢ by taking the union of P and P⁻¹.

(a) Explain why Pˢ is reflexive and symmetric.

(b) Give an example of a partial order P such that Pˢ is not an equivalence relation.

(c) Prove that if P has the property from Problem 2.10.8, then Pˢ is an equivalence relation.

P2.11.8

Let X be a finite set and let Y be a collection of nonempty subsets of X that satisfy the following property: If S and T are two different sets in Y, then either S ⊆ T, T ⊆ S, or S ∩ T = ∅.

(a) Give an example of such a system where X has four elements and Y has seven sets.

(b) A set in Y is called "atomic" if it has no proper subset in Y. Must the atomic sets of Y form a partition of X? Prove your answer.

(c) Define a relation E on X such that E(a,b) means "for every set S in Y, (a ∈ S) ⟺ (b ∈ S)". Prove that E is an equivalence relation.

(d) Prove that E(a,b) is true if and only if either a and b are in the same atomic set, or neither a nor b is in any atomic set.

P2.11.9

(uses Java) We have been representing a binary relation R on the set {0,1,...,n−1} by a two-dimensional boolean array r. If R is an equivalence relation, we can also represent it by a one-dimensional array that tells which equivalence class contains which element of the base set. We will name each equivalence class by the smallest number it contains, and have a one-dimensional int array c so that c[i] is the name of i's class, that is, the smallest number j in the set such that R(i,j) is true.

(a) Write a real-Java static method int[] classes(boolean[][] r) that returns the one-dimensional array listing the classes of each element under R.

(b) Write a real-Java static method boolean[][] relation(int[] c) that returns the two-dimensional array representing the equivalence relation whose classes are given by the input array.
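Part (b) has a particularly direct sketch, using the fact that R(i,j) holds exactly when c[i] and c[j] are equal (this is one possible solution, not necessarily the book's):

```java
// Rebuilds the two-dimensional representation of an equivalence relation
// from its class array c: i and j are related exactly when they are in
// the same class, i.e., when c[i] == c[j].
public class ClassesToRelation {
    public static boolean[][] relation(int[] c) {
        int n = c.length;
        boolean[][] r = new boolean[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                r[i][j] = (c[i] == c[j]);
        return r;
    }
}
```

Note that any array built this way is automatically reflexive, symmetric, and transitive, since equality of class names has all three properties.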

P2.11.10 (uses Java) Suppose we have an equivalence relation R on the set {0,1,...,n−1} represented by a one-dimensional int array c, giving the equivalence class of each element as in Problem 2.11.9. Now suppose R is to be changed to a new equivalence relation R′, by adding a new pair (i,j) and any other pairs that are needed to preserve symmetry and transitivity.

(a) Write a real-Java static method void update(int[] c, int i, int j) that alters c so that it now represents this new equivalence relation R′, including the pair (i,j).

(b) Write a real-Java static method void update(boolean[][] r, int i, int j) that alters r so that it now represents R′. You may find it convenient to use the methods of Problem 2.11.9.
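One possible sketch for part (a): when the pair (i,j) is added, the classes of i and j merge into one, and renaming every member of the absorbed class with the smaller name preserves the name-by-smallest-member convention (the class name Merge is our own; the problem only fixes the method name update):

```java
// Merges the equivalence classes of i and j in the class-array
// representation.  Since each class is named by its smallest member, the
// merged class keeps the smaller of the two old names.
public class Merge {
    public static void update(int[] c, int i, int j) {
        int keep = Math.min(c[i], c[j]);   // surviving class name
        int gone = Math.max(c[i], c[j]);   // name being absorbed
        for (int k = 0; k < c.length; k++)
            if (c[k] == gone) c[k] = keep;
    }
}
```

If c[i] and c[j] were already equal, the loop changes nothing, which is exactly the behavior we want.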


Index

antireflexive relation 2-36
antisymmetric relation 2-36
associativity of language concatenation 2-23
bijection 2-42
binary relation 2-2
binary relation on a set 2-39
binding a variable 2-10
bound variable 2-10
combinatorics 2-42
composition of functions 2-43
concatenation of languages 2-19
concatenation of strings 2-19
cycle 2-50, 2-53
database 2-8
denominator 2-59
dense set of numbers 2-15
direct product 2-2, 2-3
dividing a natural 2-48
division relation 2-48
domain 2-4
dominating a function 2-54
dual properties 2-41
equality relation 2-36
equivalence classes 2-57
equivalence relation 2-37, 2-55
even number 2-14
existential quantifier 2-10
fixing an element 2-47
free variable 2-10
fully comparable relation 2-45
function 2-4, 2-34
graph of a function 2-4
greedy algorithm 2-7
Hasse diagram 2-49
Hasse Diagram Theorem 2-50
identity element 2-20
identity function on a set 2-42
injection 2-42
inverse of a function 2-43
inverse relation 2-32
isomorphic partial orders 2-59
iterate of a function 2-46
join of relations 2-8, 2-13
k-tuple 2-2
Kleene star operation 2-20
ladder tournament 2-54
Least Number Axiom 2-14
linear order 2-48
majority quantifier 2-15
matching 2-42
maximal element of a partial order 2-50
maximum element of a partial order 2-50
membership predicate 2-2
minimal element of a partial order 2-50
minimum element of a partial order 2-50
numerator 2-59
odd number 2-14
one-to-one correspondence 2-42
one-to-one relation 2-41
onto relation 2-41
order relation 2-56
ordered set 2-48
pair 2-2
pairwise disjoint sets 2-56
palindrome 2-32
pangram 2-22
parity relation 2-55
partial order 2-37, 2-48
partition of a set 2-56
Partition Theorem 2-57
path-below relation 2-49
predecessor 2-14
predicate calculus 2-10
prefix code 2-23
preorder 2-53
projection 2-6, 2-13
quantified statement 2-10
quantifiers 2-10
range 2-4
record 2-8
reflexive relation 2-36
relation 2-2
relational database 2-8
Rule of Existence 2-26
Rule of Generalization 2-27
Rule of Instantiation 2-26
Rule of Specification 2-26
scope of a quantifier 2-11
sequence 2-2
signature of a predicate 2-3
single-elimination tournament 2-57
singleton set 2-58
Size-Function Theorem 2-43
star operation 2-20
substring relation 2-49, 2-53
surjection 2-41
symmetric closure 2-60
symmetric relation 2-36
total binary relation 2-35
total order 2-48
transitive relation 2-37
triple 2-2
tuple 2-2
unique existential quantifier 2-14
universal quantifier 2-10
universal relation 2-53
universe of discourse 2-11
well-defined binary relation 2-35

Chapter 3: Number Theory

"I didn't expect a kind of Spanish Inquisition."

"Nobody expects the Spanish Inquisition! Our chief weapon is surprise ... surprise and fear ... fear and surprise ... our two weapons are fear and surprise ... and ruthless efficiency. Our three weapons are fear and surprise and ruthless efficiency ... and an almost fanatical devotion to the Pope ... Our four ... no ... amongst our weaponry are such elements as fear, surprise ... I'll come in again."

The naturals (a term that refers in this book to the set {0,1,2,3,...} of non-negative integers) are one of the simplest and most important systems of mathematical objects. With the logical tools we've developed, we can state and prove propositions about the naturals ranging from the trivial through the important and ending with the unsolvable. Number theory is the study of the naturals and these statements, and is a major branch of mathematics with significant applications in computer science.

• We'll begin with the basic definitions of number theory, such as divisibility, prime numbers, and modular arithmetic.

• We'll give proofs from first principles of some of the most important facts about primes and modular arithmetic, culminating with the Fundamental Theorem of Arithmetic, which says that every positive natural has a unique factorization into primes.

• Finally, in some optional sections, we'll have a look at some of the properties of exponentiation in modular arithmetic, finishing with the mathematics behind the RSA public-key cryptosystem.

P1.1.5 (uses Java, harder) Again let C be {0,1,...,15}. We will use Problem 1.1.4 to define a (real) Java class Cset to represent sets of elements of C. Remember that methods within a class may freely call each other.

(a) Explain why for any set D with D ⊆ C, f(D) may be stored as a Java int variable.

(b) Write the header and instance variable of a class Cset that will store a set D by storing the number f(D).

(c) Write a method contains that takes one int argument and returns a boolean, so that d.contains(i) returns true if and only if the number i is in the set d. If i is not an element of C this method should just return false.

(d) Write a method size that takes no arguments and returns an int, so that d.size() is the number of elements in the set D.

(e) Write a method subset that takes one Cset argument and returns a boolean, so that d.subset(e) returns true if and only if D ⊆ E.
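One possible shape for this class, sketching parts (b) through (e) under the problem's conventions (the instance variable name and the bodies are our own choices, not the book's):

```java
// A subset D of C = {0,...,15} stored as the single int f(D),
// with one bit per element, as suggested by Problem 1.1.4.
public class Cset {
    private int bits;                      // the number f(D)

    public Cset(int bits) { this.bits = bits; }

    // true if and only if i is an element of the set (false outside C)
    public boolean contains(int i) {
        if (i < 0 || i > 15) return false;
        return (bits & (1 << i)) != 0;
    }

    // the number of elements of D: count the one bits of f(D)
    public int size() {
        int count = 0;
        for (int i = 0; i <= 15; i++)
            if (contains(i)) count++;
        return count;
    }

    // true if and only if D is a subset of E: no bit of D is missing from E
    public boolean subset(Cset e) {
        return (bits & ~e.bits) == 0;
    }
}
```

Part (a) holds because f(D) < 2¹⁶, which fits comfortably in a 32-bit int.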

P1.1.6 Define E to be the empty set and further define F = {E}, G = {F}, H = {E,F}, and I = {E,F,G,H}.

(a) Describe each of these five sets as lists using only the symbols ∅, {, and }.

(b) What is the size of each of the five sets?

(c) If X and Y are each any one of the five sets, we can form the statements "X ∈ Y" and "X ⊆ Y". Exactly which of these fifty possible statements are true?

P1.1.7

Let X be the set of even naturals that are greater than 2. Let Y be the set of naturals that can be written as y + z where y and z are odd naturals with y ≠ z.

(a) Explain why X is a subset of Y.

(b) Explain why Y is a subset of X.

P1.1.8 If S is any finite set of positive naturals, let sum(S) be the natural we get by adding the elements of S together.

(a) There is exactly one set S of positive naturals with sum(S) = 0. Are there any other naturals n for which there is exactly one set S with sum(S) = n? If so, list them all and argue that you have them all.

(b) List all the sets S with sum(S) = 10. (Hint: There are exactly nine of them.)

(c) Suppose S ≠ T but sum(S) = sum(T). Is it possible that S ⊆ T or that T ⊆ S? Why or why not?

P1.1.9 Let D be a set of dogs, R a set of retrievers, and G a set of golden retrievers. Assume that G ⊆ R and that R ⊆ D.

(a) Suppose we know that all three sets are different. What does this tell you about the existence of specific dogs?

(b) Suppose we know that D ≠ G, but that the three sets are not all different. What does this tell you about the existence of specific dogs?

P1.1.10 If X is any set, the power set of X is a set of sets, defined in set builder notation as {Y : Y ⊆ X}. We write the power set of X as P(X).

3.1 Divisibility and Primes

3.1.1 The Natural Numbers

Mathematics can be defined as the practice of making formal systems out of common human activities[1], and counting is perhaps the most fundamental of these. Humans count things, which led to the invention of the naturals, which are the possible results of a counting process — zero, one, two, three, and all the rest of the non-negative integers. Computers count things as well, such as the characters in a file, the pixels on a screen, or the pennies in a bank account. At some level every piece of data in a computer is really a sequence of

booleans, of course, but it is often useful to organize these bits into things which act like naturals or integers. They aren't really naturals or integers because they are limited in size, typically to the size that will fit in one machine word. In most computer languages, the programmer must learn the difference between numbers of this kind and real numbers, stored as approximations in floating-point notation. They are separate data types, each with its own operations and properties.

Number theory is the branch of mathematics that deals with naturals, and in particular with the statements about naturals that one can make with the predicate calculus. We will have formulas, statements of number theory, where the variables will each be of type natural and the atomic statements will be things like "a = b", "a > b", "a + b = c", and "a · b = c". With quantifiers, we can say fairly complicated things, such as "∃a : ∃b : ∀c : (a · c = a) ∧ (b · c = c)", a statement that two naturals exist with certain properties. (Is this statement true? If so, what are the two naturals?)

Just as in Chapter 2, we can use the predicate calculus to define properties in terms of other properties. Perhaps our most important property is the "divides" relation from Section 2.10, which is defined so that "a divides b" if and only if ∃c : a · c = b. Once this property is defined, it can be used in more number theory statements. For example, we will shortly define a prime number in terms of this relation. In this book we will be a bit informal about the exact rules of our language for formulas, though it's possible to fix such rules exactly and write some of the atomic formulas in terms of others[2].

Why study number theory? There are practical uses for it in computer science, of course, in any situation that involves counting. You may be familiar with a hash table, which is a sequence of n data locations in which items are to be inserted as they arrive. You compute (using number theory, as it happens) an address for the item and see whether that address is vacant. Under one scheme called open addressing, if it isn't you try again at the address k locations later, where k is some fixed natural. If this brings you past the last location, you "wrap around" to the beginning. If you still get a filled location, you jump k locations again, and keep doing this until there's a vacant one. One feature of this scheme is that if k and n are chosen correctly, if there is any open location at all you'll eventually find it. (Figure 3-1 shows an example with k = 3 and n = 8.) Determining which values of k and n have this property is a good example of a number-theoretic question.

[Figure 3-1: A hash table. Arrows show the jumps in looking for an empty space. ©Kendall Hunt Publishing Company]

[1] For a beautiful exposition of what mathematics is and what it is for, read Mathematics: Form and Function by Saunders Mac Lane.

[2] Hofstadter defines his "Typographical Number Theory" carefully so that everything is defined in terms of a symbol for zero, the equals sign, and functions for successor, addition, and multiplication. He goes to some trouble to get a particular exact number of symbols in his alphabet, for non-mathematical reasons.
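The probing scheme just described can be sketched in Java as follows (the class and method names are our own; this is an illustration of the jump-by-k idea, not code from the book):

```java
// Open-addressing probe: starting at the home address, jump k slots at a
// time, wrapping around modulo n, until a vacant slot appears.  After n
// probes the sequence must have started repeating, so we give up then.
public class Probe {
    // returns the index of the first vacant slot reached, or -1 if the
    // probe sequence cycles without finding one
    public static int findSlot(boolean[] occupied, int home, int k) {
        int n = occupied.length;
        int addr = home % n;
        for (int tries = 0; tries < n; tries++) {
            if (!occupied[addr]) return addr;
            addr = (addr + k) % n;   // jump k locations, wrapping around
        }
        return -1;
    }
}
```

Trying this with k = 2 and n = 8 shows the failure mode the text warns about: starting from an even address, the probe sequence only ever visits even slots, so a vacancy at an odd slot is never found.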

More philosophically, studying the most fundamental mathematical system is valuable for what it tells us about how mathematics works and how the world works. There are questions in number theory that are very easy to state but that no one currently can answer[3]. But if you believe that the naturals exist in a philosophical sense, any statement of number theory must be true or false. Either a natural with some property exists, or it doesn't. Either all naturals have a certain property, or they don't. Just as with the propositional calculus, you'd like to find a proof system that is complete (can prove all the true statements of number theory) and consistent (doesn't prove any false statements).

Although the best mathematicians in the world were seriously looking for such a thing early in this century, we now know it doesn't exist. Gödel proved in 1931 that if you set down the rules for any mechanical proof system S that can talk about number theory, then there must be a statement of number theory that says, in effect, "I am not provable in system S". If this statement were false, it would be provable, and S would thus be able to prove a false statement and be inconsistent. On the other hand, if it is true, then it must not be provable as it says, and S is incomplete. Showing that such a statement exists is well beyond the scope of this book — to begin with you have to define a particular proof system for number theory in far more detail than we will use here — but Hofstadter does this in Gödel, Escher, Bach[4].

[3] The usual example of such a question used to be Fermat's Last Theorem, which is still easy to state (see the discussion in Hofstadter) but is no longer unsolved thanks to Andrew Wiles. The Goldbach Conjecture ("Every even number greater than two is the sum of two prime numbers") is perhaps even easier to state and remains unresolved.

[4] The philosophical implications of Gödel's theorem are one of Hofstadter's main topics. For example, does the fact that some mathematical questions are provably unsolvable say anything about whether true artificial intelligence can be achieved?

3.1.2

Primes

and Primality Testing

Our goal in this chapter will be to learn some of the more interesting things that can be said within number theory, and to learn some techniques (mostly informal) to prove some of them. We begin by giving a few of the most basic definitions. We've already seen the division relation D(a,b) on naturals, which in Java terms is true if and only if b % a == 0. Some naturals, like 60, have lots of other numbers that divide them, and some, like 59, have very few. In fact the only two naturals that divide 59 are 1 (which divides every natural) and 59 (as any natural divides itself). Naturals that have this property are called prime numbers.

Definition: The predicate "P(a)", read in English as "a is prime", is defined by the rule:

P(a) ⇔ (a > 1) ∧ ∀b: (D(b,a) → [(b = 1) ∨ (b = a)])

or equivalently, by expanding the definition of "divides",

P(a) ⇔ (a > 1) ∧ ¬∃b: ∃c: [(b > 1) ∧ (c > 1) ∧ (b · c = a)].

Definition: A natural x (with x > 1) is said to be a composite number if it is not prime. (Note that we have thus defined 1 to be neither prime nor composite.) A composite number can be factored, or written as the product of two smaller naturals (each greater than one).

Given a natural a, how do we determine whether it is prime? Proving it to be prime would mean proving the non-existence of factors, which would mean ruling out all the possible naturals b between 1 and a as possible divisors of a. This trial division algorithm is the simplest test for primality.

We can take a shortcut that will improve the running time of trial division. If a = dc for some d and c, at least one of d or c must be at most the square root of a (Why? See Exercise 3.1.8). So if we check that all b between 1 and √a fail to divide a, a must be prime. Let's look at this test in pseudo-Java code:

public boolean isPrime (natural a) {
   if (a < 2) return false;
   natural b = 2;
   while (b * b <= a) {
      if (a % b == 0) return false;
      b = b + 1;}
   return true;}

To actually factor a number, we can use the recursive idea "if a is prime, keep it; if not, write a as yz where y > 1 and z > 1 and then factor y and z". This algorithm can't go into an infinite loop (Why?) and so has to stop with all the numbers being prime. Let's look at some recursive pseudo-Java code to print out a factorization:

"We don’t yet have logical tools to prove assertions about what will “eventually” happen in such a process — this will be one of the goals of Chapter 4. For the moment, we'll rely on informal intuition about, for example, what a recursive algorithm might do
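The trial-division test and the recursive factorization idea can both be rendered in real Java. The sketch below is our own illustration (the class name and the use of long in place of the book's natural type are our choices, not the text's):

```java
// Our own real-Java rendering of the pseudo-Java above, using long for the
// book's "natural" type.  A sketch for illustration, not code from the text.
public class TrialDivision {

    // Returns true if a is prime, testing divisors b with b*b <= a.
    public static boolean isPrime(long a) {
        if (a < 2) return false;          // 0 and 1 are neither prime nor composite
        for (long b = 2; b * b <= a; b++) {
            if (a % b == 0) return false; // found a nontrivial divisor
        }
        return true;
    }

    // Prints a sequence of primes whose product is x, smallest factor first.
    public static void factor(long x) {
        if (x <= 1) return;               // special cases: nothing to print
        long d = 2;
        while (x % d != 0) d++;           // least divisor > 1, which must be prime
        System.out.println(d);
        factor(x / d);
    }

    public static void main(String[] args) {
        System.out.println(isPrime(59));  // 59 is prime
        System.out.println(isPrime(60));  // 60 is composite
        factor(60);                       // prints the prime factors of 60
    }
}
```

Note that factor terminates because x/d is strictly smaller than x at each recursive call, the informal intuition the footnote mentions.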

Figure 3-3: The Hasse diagram for divisors of 60. (©Kendall Hunt Publishing Company)

public void factor (natural x) {
   // prints a sequence of primes whose product is x
   // special cases:
   if (x == 0) {print (0); return;}
   if (x == 1) return;
   natural d = 2;
   while (x % d != 0) d = d + 1;
   print (d);   // the least divisor d of x must be prime (Why?)
   factor (x / d);}

E3.3.8 The following pseudo-Java method computes a simplified version of the Euclidean Algorithm:

public natural simpleEA (natural a, natural b) {
   if (b == 0) return a;
   if (b > a) return simpleEA (b, a);
   return simpleEA (a - b, b);}

(a) Trace the execution of simpleEA on inputs 4 and 10.

(b) In the last line, how can we be sure that the subtraction a − b does not have a negative result?

(c) Explain why simpleEA outputs the greatest common divisor of a and b. (Hint: Argue that its output is the same as the last nonzero number in the sequence from the original Euclidean Algorithm on input a and b.)

E3.3.9

Let f and g be polynomials in one variable x, with real number coefficients. Define the relation D(f,g) to mean ∃h: fh = g, that is, "f divides g".

(a) Prove that D is reflexive and transitive, but is not a partial order on polynomials because it is not antisymmetric.

3-19

(b) A monic polynomial is one whose highest-degree coefficient is 1. Show that D is a partial order on monic polynomials.

(c) Prove that for any nonzero polynomial p, there exists a monic polynomial m such that D(m,p) and D(p,m) are both true.

E3.3.10 Prove carefully that if a = qb + r, where all four numbers are naturals, and a and b are each integer linear combinations of two other naturals m and n, then r is also an integer linear combination of m and n.

3.3.4 Problems

P3.3.1 Show that if x and y are relatively prime, any integer z can be written as a linear combination ax + by = z, where a and b are integers. Illustrate this by writing 343 as a linear combination of 1729 and 4096. If x and y are not relatively prime, which integers can be written as linear combinations of x and y?

P3.3.2

Let n be any natural. Define Jₙ to be the set of all naturals a such that a and n are relatively prime.

We saw in Chapter 1 that we can prove a proposition p by assuming ¬p and deriving something absolutely false (a "contradiction"). Let p be the proposition "There are infinitely many primes." If we assume ¬p, then there must be a finite list containing all the primes. But then if we multiply together all the primes on this list and add 1, we get a number which is not divisible by any prime on the list. Since every number greater than 1 has at least one prime divisor, we have contradicted the hypothesis that the list contained all possible prime numbers.
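We can watch Euclid's construction in action with a short Java sketch (our own illustration, not code from the text): multiply the primes on a finite list, add 1, and find the least prime divisor of the result, which can never be on the original list.

```java
// Illustrates the proof's construction: (2*3*5*7*11*13) + 1 = 30031 is not
// divisible by any prime on the list, so its least prime divisor is "new".
public class EuclidConstruction {

    // Least divisor greater than 1 of x; for x > 1 this is always prime.
    public static long leastPrimeDivisor(long x) {
        for (long d = 2; d * d <= x; d++) {
            if (x % d == 0) return d;
        }
        return x; // x itself is prime
    }

    public static void main(String[] args) {
        long[] primes = {2, 3, 5, 7, 11, 13};
        long product = 1;
        for (long p : primes) product *= p;        // 30030
        long q = leastPrimeDivisor(product + 1);
        System.out.println(q);                     // prints 59, a prime not on the list
    }
}
```

Note that product + 1 itself need not be prime (30031 = 59 · 509); the proof only needs that its prime divisors lie outside the original list.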

It's worth looking at what assumptions about the naturals we took for granted in this proof. We assumed that we could add and multiply numbers and always get new numbers²⁶ and that we could add congruences (as we proved in Section 3.3, using the laws of arithmetic only). The biggest assumption was that every number greater than 1 has at least one prime divisor, which we argued was true on the basis of the behavior of an algorithm ("try all possible proper divisors; if you find one, the first one must be prime; if you don't, the number is prime"). We'll see in Chapter 4 that this assumption can also be proved formally, from a suitable definition of the naturals.

Note also that this abstract proof has told us something about numbers that are too large for us to ever use in a real computation. We know now, for example, that there is a prime number that is too big to be written on a 20-megabyte hard disk, because our theorem tells us that there is a

²⁴0! is 1, because an empty product (a product of no terms) is defined to be 1.

²⁵Some mathematicians, including a school called the intuitionists, are bothered by proofs by contradiction and prefer to turn them into direct proofs wherever possible. For our purposes, we may be satisfied that it is a valid proof technique in any situation, but direct arguments are often clearer (this is a matter of individual taste). Avoiding argument by contradiction (called in Latin reductio ad absurdum) does have the advantage that you don't accidentally use the invalid proof technique of reductio ad erratum. This consists of assuming the negation of the desired proposition and deriving as many consequences as possible, until a typographical or logical error results in your deriving two consequences that contradict each other.

²⁶This isn't true of the int or long data types in Java — why?


prime number bigger than 2^160,000,000. Does it even make sense to talk about such absurdly big numbers?

3.4.4 Exercises

E3.4.1 Let a and z be naturals and suppose that no prime number b, with 2 < b < a, divides z. Prove that no composite number in this range divides z either.

E3.4.2

Calculate the number z = a! + 1 for each value of a from 1 through 10, and find a prime number greater than a that divides z. (Hint: It may be useful to calculate √z first, to see how many primes you will have to check as possible divisors.)

E3.4.3 Let f(a) be defined to be the product of all the primes less than or equal to a, plus 1. (So f(6) = 2·3·5 + 1 = 31, for example. The product of the primes itself is sometimes called the primorial of a.) Find the smallest value of a such that f(a) is composite.

E3.4.4

Prove carefully that given any set S of naturals, each greater than 1, there exists a single number n such that for any number x in S, n % x == 1. (That is, prove ∀S: (∀x: (x ∈ S) → (x > 1)) → ∃n: ∀x: (x ∈ S) → n % x = 1.)

E3.4.5

Suppose that the arithmetic progression (a, a + b, a + 2b, ...) contains infinitely many primes. Prove that a and b are relatively prime. (Hint: Argue by contradiction.)

E3.4.6 Let n be any positive natural. We say that a natural a with 0 < a < n is a perfect square modulo n if a is congruent to b² modulo n for some natural b. (We don't count 0 among the perfect squares, though it is the square of 0.)

(a) For each n from 3 through 15, determine which numbers in {0, 1, ..., n} are perfect squares.

(b) Prove that if a and b are perfect squares, so is ab.

E3.4.7 Explain why, if n is odd, there can be no more than (n − 1)/2 perfect squares modulo n.

E3.4.8 In Problem 3.9.5 we will prove that p − 1 is a perfect square modulo a prime p if and only if p = 2 or p is of the form 4k + 1. Verify this fact for the primes less than 20.

E3.4.9 It is also true that if p is a prime with p > 3, then −3 is a perfect square modulo p if and only if p is of the form 6n + 1. Verify this fact for all such p less than 20.

E3.4.10

Fix a natural n and let r be the number of primes that are less than 2ⁿ. We know that every positive natural x with x < 2ⁿ has a unique prime factorization.

P3.4.7 Let n be any natural with n > 1.

(a) Argue that N₂ = n(n + 1) must have at least two different prime factors.


(b) Define N₃ = N₂(N₂ + 1). Argue that N₃ must have at least three different prime factors (N₂'s two, plus at least one more).

(c) Continue the argument to show that for any number k, there must be a natural with at least k different prime factors, and hence that there must be infinitely many primes.

P3.4.8 Using the result of Exercise 3.4.10, we can get one more proof that there are infinitely many primes. Suppose that for any n, the number r of primes that are < 2ⁿ was bounded by some fixed number c. Show that the function given by prime factorization cannot be one-to-one if n is sufficiently large.

P3.4.9

Let

r(n)

be the number

of primes

that

are less than

or equal

to 2”,

A natural

question,

once we know that r(n) is unbounded, is to estimate how fast it grows with n. The Prime Number Theorem says that is proportional to 2”/n, but proving that is beyond us here. What can we show given the result of Exercise 3.4.10? That is, how large must r(n) be to

allow the function from {1,..., 2"} to {0,1,...,n}" to be one-to-one?

P3.4.10

Here is an argument that gets a better lower bound on the function r(n) from Problem 3.4.9, though it uses an assumption that we are not yet able to prove. Consider finding all the primes than 2” with a Sieve of Eratosthenes. We begin with 2” numbers. Removing multiples of 2 eliminates 1/2 of them. Removing multiples of 3 removes 1/3 of them. Our assumption will be that it removes 1/3 of the numbers remaining after the multiples of 2 have been removed. Then we will assume that removing multiples of 5 eliminates 1/5 it of those remaining, and so forth. We know that once we have eliminated all multiples of primes that are at most Jar = 2”/2, the numbers remaining are prime.

(a)

Given our assumptions, explain why the eventual fraction of numbers remaining is more

(b)

Explain why the result of part (a) implies that r(n) > 2"/?.

than (1/2)(2/3)(3/4)...((2” —1)/2"/?).


Figure 3-4 (calendar pages for July and August 2019): Pill days are circled — pill-Wednesdays come every five weeks. (©Kendall Hunt Publishing Company)

3.5 The Chinese Remainder Theorem

3.5.1 Two Congruences With Different Moduli

Suppose that you have to take a pill every five days. How often will you take your pill on a Wednesday? This is an example of two interacting periodic systems that can be analyzed using number theory. Assign a natural to each day, perhaps by taking the days since some arbitrary starting point²⁹, and notice that our two conditions can be described by modular arithmetic. Day number x is a Wednesday if and only if x ≡ c (mod 7), where c depends on the day of the week of our starting point. Similarly, day number x is a pill day if and only if x ≡ d (mod 5), where again d depends on the starting point. The numbers of the days that are both Wednesdays and pill days will be those naturals x that satisfy both of these congruences.

We've seen how to work with more than one congruence with the same base, but here we have two congruences with different bases. How do we solve such a system of congruences? A bit of playing around with the above example (see Figure 3-4) will show that the special days appear to occur exactly every 35 days, and this is an instance of a general phenomenon first noted in ancient China³⁰:

The Chinese Remainder Theorem (Simple Form): If m and n are relatively prime, then the two congruences x ≡ a (mod m) and x ≡ b (mod n) are satisfied if and only if x ≡ c (mod mn), where c is a natural depending on a, b, m, and n.

²⁹Astronomers, for example, start counting with 1 January 4713 B.C., the start of the "Julian Period".

³⁰The problem is solved in Master Sun's Mathematical Manual from the 3rd century C.E. (for an example with the three moduli 3, 5, and 7), and by the fifth-century Indian mathematician-astronomer Aryabhata. The earliest known detailed general solution is by the Chinese mathematician Qin Jiushao in 1247.

We'll soon prove this simple form of the theorem and then move to the full statement of it, involving more than two congruences. But first, why do we need the part about m and n being relatively prime? If we don't have it, the conclusion might be false, as in the example of the two congruences x ≡ 1 (mod 4) and x ≡ 4 (mod 6), which have no solution (Why not?). In Problem 3.5.3 we'll look at how to solve an arbitrary system of congruences (or determine that it has no solution) by converting it into a system where the bases are relatively prime.
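The failure in this example is easy to confirm by brute force (a throwaway Java check of our own): any solution would repeat with period lcm(4, 6) = 12, so it suffices to test 0 through 11.

```java
public class NoSolutionCheck {

    // Returns true if some x satisfies both x = 1 (mod 4) and x = 4 (mod 6).
    // Any solution would repeat with period lcm(4, 6) = 12, so testing
    // x = 0, ..., 11 is enough.
    static boolean hasSolution() {
        for (int x = 0; x < 12; x++) {
            if (x % 4 == 1 && x % 6 == 4) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasSolution());  // prints false
    }
}
```

The loop finds nothing because the first congruence forces x to be odd and the second forces it to be even.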

Proof of the Simple Form of the CRT: We need to show that x ≡ a (mod m) and x ≡ b (mod n) if and only if x ≡ c (mod mn), which means that we need to first define c and then show both halves of a logical equivalence. Our main technical tool will be the Inverse Theorem, which tells us (since m and n are relatively prime) that there are two integers y and z such that ym + zn = 1. This implies both ym ≡ 1 (mod n) (y is the inverse of m modulo n) and zn ≡ 1 (mod m) (z is the inverse of n modulo m).

To construct c, we'll use these congruences and our facts about multiplying and adding congruences over a single base. To get something congruent to a modulo m, we can multiply both sides of the congruence zn ≡ 1 (mod m) by a to get azn ≡ a (mod m). (If we like, we can think of this as multiplying by the congruence a ≡ a (mod m).) Similarly, multiplying the other congruence by b gives us bym ≡ b (mod n). Now we can notice that the left-hand side of each of these congruences is congruent to 0 modulo the other base. So if we add the congruence bym ≡ 0 (mod m) to azn ≡ a (mod m), we get azn + bym ≡ a (mod m), and similarly we can get azn + bym ≡ b (mod n). Setting c to be azn + bym, then, we have a solution to both congruences. Furthermore, as long as x ≡ c (mod mn), x is equal to c + kmn for some integer k, and since both m and n divide kmn we know that x will satisfy both congruences as well.

It remains to show that if x satisfies both x ≡ a (mod m) and x ≡ b (mod n), then it satisfies x ≡ c (mod mn) as well. Let d = x − c. It's easy to see that d = x − azn − bym is divisible by both m and n, using the arithmetic above. We need to show that d is divisible by mn, using the fact that m and n are relatively prime. If d = 0 this is trivially true — if not, we may run the Euclidean Algorithm³¹ to find the greatest common divisor of d and mn, which we'll name q. This q must be a common multiple of m and n, because both these numbers divide both d and mn, and the Euclidean Algorithm will preserve this property. But by Problem 3.1.5, because m and n are relatively prime, we know that mn is their least common multiple, making mn = q the only choice. Since q divides d, we are done — x and c are congruent modulo mn.

Example: Suppose that we need to solve the two congruences x ≡ 4 (mod 15) and x ≡ 8 (mod 16). Since m = 15 and n = 16 are relatively prime, the Chinese Remainder Theorem tells us that the solution will be of the form x ≡ c (mod 240). In small cases, it's often easiest to find c by trial and error — in this example checking 8, 24, 40, and so on until we run into a number that is congruent to 4 modulo 15. But let's go through the general solution method. We have a formula for c, but it requires the inverses of 15 and 16 modulo each other (y and z in the expression azn + bym). Since 16 ≡ 1 (mod 15), we can take z = 1, and since 15 ≡ −1 (mod 16) we can take y = −1 or y = 15. (If we weren't so lucky we'd have to use the Euclidean Algorithm to get the inverses, as in Section 3.3.) This gives us c = 4·1·16 + 8·(−1)·15 = 64 − 120 = −56 ≡ 184 (mod 240).

³¹Actually d may be negative, but if so we may run the Euclidean Algorithm on −d and mn.
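The c = azn + bym construction can be carried out mechanically. The following Java sketch is our own code (the class name and the extended-Euclid helper are our inventions), reproducing the example above:

```java
public class SimpleCRT {

    // Extended Euclidean Algorithm: returns {g, y, z} with y*m + z*n == g == gcd(m, n).
    static long[] extendedGcd(long m, long n) {
        if (n == 0) return new long[]{m, 1, 0};
        long[] r = extendedGcd(n, m % n);
        return new long[]{r[0], r[2], r[1] - (m / n) * r[2]};
    }

    // For relatively prime m and n, returns the unique c with 0 <= c < m*n
    // such that c = a (mod m) and c = b (mod n), namely c = a*z*n + b*y*m.
    static long crt(long a, long m, long b, long n) {
        long[] g = extendedGcd(m, n);   // g[1]*m + g[2]*n == 1
        long y = g[1], z = g[2];
        long mn = m * n;
        long c = (a * z * n + b * y * m) % mn;
        if (c < 0) c += mn;             // normalize into {0, ..., mn - 1}
        return c;
    }

    public static void main(String[] args) {
        System.out.println(crt(4, 15, 8, 16)); // prints 184, as in the example
    }
}
```

For large moduli the product a·z·n can overflow a long, so a production version would reduce intermediate results modulo mn (or use BigInteger); this sketch is fine for textbook-sized numbers.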


3.5.2 The Full Version of the Theorem

If we have more than two congruences, the condition on the bases becomes a little more complicated. If any two of the bases have a common factor greater than one, it might be impossible to satisfy those two congruences together, and thus definitely impossible to satisfy the entire system. So to have a solution, we need to have the bases be pairwise relatively prime, which means that any two of them are relatively prime to each other.

The Chinese Remainder Theorem (Full Version): Let m₁, m₂, ..., mₖ be a sequence of positive naturals that are pairwise relatively prime. Any system of congruences

x ≡ a₁ (mod m₁)
x ≡ a₂ (mod m₂)
...
x ≡ aₖ (mod mₖ)

is equivalent to a single congruence x ≡ c (mod M), where M = m₁·m₂···mₖ, and c is a natural that depends on the aᵢ's and on the mᵢ's.

Proof: If m₁, m₂, ..., mₖ are pairwise relatively prime, then the number m₁m₂ must be relatively prime to each of the numbers m₃, m₄, ..., mₖ. (We'll prove this as Exercise 3.5.1.) So if we apply the simple form of the Chinese Remainder Theorem to the first two congruences, getting a single congruence x ≡ b (mod m₁m₂), we are left with a system of k − 1 congruences whose bases are pairwise relatively prime. Similarly, we can combine this new first congruence with the third using the simple form of the theorem, and continue in this way until there is only one congruence left³². Because we are multiplying bases each time that we combine congruences, this last congruence has the desired form. And since at each step we replaced a system of congruences by an equivalent system (one which was satisfied by exactly the same values of x), the last congruence is equivalent to the original system.

Alternatively, we can calculate c directly and verify that it works, just as we did for the simple theorem. For each base mᵢ, we can calculate an inverse nᵢ for the natural M/mᵢ modulo mᵢ, because this number is relatively prime to mᵢ. Then aᵢnᵢ(M/mᵢ) is congruent to aᵢ modulo mᵢ, and congruent to 0 modulo any of the other bases. If we set c to be a₁n₁(M/m₁) + a₂n₂(M/m₂) + ··· + aₖnₖ(M/mₖ), then c satisfies all k of the congruences in the system. Conversely, if x satisfies all k congruences, then x − c is divisible by each of the bases mᵢ, and arguing as in the simple form of the theorem we can show that x − c must be divisible by M.

To illustrate the full version of the theorem, let's return to our initial example. Suppose that along with pill days whenever x ≡ 3 (mod 5) and Wednesdays whenever x ≡ 4 (mod 7), we now introduce massages every six days, whenever x ≡ 0 (mod 6). The full version of the theorem says that all three events will happen exactly when x ≡ c (mod 210), for some value of c. To calculate c, we need the numbers mᵢ (5, 6, and 7), the numbers aᵢ (3, 0, and 4), the numbers M/mᵢ (42, 35, and 30), and the numbers nᵢ (the inverse of 42 modulo 5, which is 3; the inverse of 35 modulo 6, which is 5; and the inverse of 30 modulo 7, which is 4). Then

c = a₁n₁(M/m₁) + a₂n₂(M/m₂) + a₃n₃(M/m₃)
  = 3·3·42 + 0·5·35 + 4·4·30
  = 378 + 480 = 858 ≡ 18 (mod 210).

³²This argument is a bit informal because we don't yet have formal techniques to deal with the "..." in the statement of the problem — this will be remedied in Chapter 4.


We can easily check that 18 satisfies the given three congruences.

One use of the Chinese Remainder Theorem is a method to store very large naturals on a parallel computer. If you know what x is congruent to modulo several different large prime numbers (prime numbers are necessarily pairwise relatively prime), the theorem says that this specifies x modulo the product of those primes. Suppose that x does not fit on a single machine word, but that each of the remainders (modulo the different primes) does. You can put each of the remainders on a different processor and you have in a sense stored x in a distributed way. If you want to add or multiply two numbers stored in this way, it can be done in parallel, as each processor can carry out the operation modulo its prime. The only problem is that you have to combine all the remainders in order to find out what the result really is in ordinary notation. But if you have to do lots of parallelizable operations before computing the answer, it might be worthwhile to do all of them in parallel, and convert the answer to ordinary notation at the end.
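Here is a toy Java version of that scheme (our own sketch; the class, the chosen primes 97, 101, and 103, and the helper names are illustrative assumptions): each operand is stored as its residues, the multiplication happens componentwise as it would on separate processors, and the Chinese Remainder Theorem recombines the answer at the end.

```java
public class ResidueArithmetic {
    static final long[] P = {97, 101, 103};   // pairwise relatively prime (all prime)

    // Distribute x as its residues modulo each prime.
    static long[] toResidues(long x) {
        long[] r = new long[P.length];
        for (int i = 0; i < P.length; i++) r[i] = x % P[i];
        return r;
    }

    // Recombine residues using the a_i * n_i * (M/m_i) formula from the full CRT.
    static long fromResidues(long[] r) {
        long M = 1;
        for (long p : P) M *= p;              // 97 * 101 * 103 = 1009091
        long c = 0;
        for (int i = 0; i < P.length; i++) {
            long Mi = M / P[i];
            long ni = 1;                      // inverse of M/m_i modulo m_i, by search
            while ((Mi % P[i]) * ni % P[i] != 1) ni++;
            c = (c + r[i] * ni % M * Mi) % M;
        }
        return c;
    }

    public static void main(String[] args) {
        long[] a = toResidues(1234), b = toResidues(567);
        long[] prod = new long[P.length];
        for (int i = 0; i < P.length; i++)
            prod[i] = (a[i] * b[i]) % P[i];   // each "processor" works mod its prime
        System.out.println(fromResidues(prod)); // prints 699678, which is 1234 * 567
    }
}
```

The recombination is valid because 1234 · 567 = 699678 is less than M = 1009091; as in Problem 3.5.4, the method only recovers results smaller than the product of the moduli.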

3.5.3 Exercises

E3.5.1 Prove that if m₁, m₂, ..., mₖ are pairwise relatively prime, then m₁m₂ is relatively prime to each of the numbers m₃, m₄, ..., mₖ.

E3.5.2 Find a single congruence that is satisfied if and only if x ≡ 9 (mod 11), x ≡ 6 (mod 12), and x ≡ 3 (mod 13).

E3.5.3 Here are three systems of congruences where the bases are not pairwise relatively prime. You are to find all solutions to each system, or show that no solution exists. (Hint: What do the conditions say about whether x is even or odd?)

(a) x ≡ 5 (mod 6), x ≡ 7 (mod 8), x ≡ 11 (mod 12)

(b) x ≡ 4 (mod 10), x ≡ 9 (mod 14), x ≡ 10 (mod 12)

(c) x ≡ 7 (mod 9), x ≡ 3 (mod 10), x ≡ 5 (mod 16)

E3.5.4

Suppose two integers x and y satisfy the congruences x ≡ 4 (mod 7), y ≡ 2 (mod 7), x ≡ 3 (mod 8), y ≡ 1 (mod 8), x ≡ 7 (mod 9), and y ≡ 5 (mod 9). What are the residues of xy modulo 7, 8, and 9? Find a number z less than 504 such that xyz ≡ 1 (mod 504). (Hint: Find the residues of z modulo 7, 8, and 9 first, and you need to carry out the Chinese Remainder Theorem process only once.)

E3.5.5

We say that three naturals a, b, and c are relatively prime if there does not exist a single number d > 1 that divides all three. Give an example of three naturals that are relatively prime, but not pairwise relatively prime.

E3.5.6

About a thousand soldiers are marching down a road, and their commander would like to know exactly how many there are. She orders them to line up in rows of seven, and learns that there are six left over. She then orders them to line up in rows of eight, and there are seven left over. Finally, she orders them into rows of nine, and there are three left over. How many soldiers are in the group?

E3.5.7 Someone on the internet, calling themself Mr. Rabbit, has agreed to sell me a file of government secrets for $100. However, Rabbit will accept payment only in one of two obscure

ASCII string, a Java String (in which case it is written as "", two double-quotes enclosing no characters between them), and a binary string. If the alphabet is the empty set ∅, it is impossible to have a string of one or more letters from ∅ because there aren't any letters. But it is possible to have a string of zero letters from ∅, so λ is a member of ∅*. Since λ is the only string in ∅*, we know that ∅* = {λ}.

Definition: Two sequences are defined to be equal if they consist of exactly the same elements in the same order. Hence two strings are equal if they have the same number of letters, the same first letter (if they have a first letter), the same second letter, and so on.

Example: Suppose our alphabet is {a, b, c} and consider the string abc. This string is equal to abc (itself), but is not equal to ab (which has fewer letters), aba (which has a different third letter), abcc (which has more letters), or cab (which has the same letters but in a different order). In Java, the strings "abc" and "abc " are not equal, because the first one has three letters and the second has a fourth letter, a space.

It’s this definition that lets us talk about “the empty string” instead of just “an empty string”, because any two empty strings must be equal. If two strings are empty, then they fit our definition of equal strings because they have the same number of letters (zero) and each corresponding letter is the same. It may be confusing to speak of “all the letters being the same” when there aren’t any letters, but we will have to get used to it because this is part of the mathematical meaning of “all things”. We'll see this concept in more detail in Chapter 2 when we talk about quantifiers.

Definition: The length of a string is the number of letters in it. If B is an alphabet and i is a natural number, Bⁱ is the set of strings in B* whose length is exactly i. The length of a string u is written "|u|".

Example: The string abc has length 3, so |abc| = 3. The Java string "the cat" has length 7, since the space in the middle is one of the letters. The empty string λ has length 0, so |λ| = 0. In fact for any alphabet A, λ is the only string of length 0, so the set A⁰ is equal to {λ}.
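The sets Bⁱ are easy to generate mechanically. Here is a small Java sketch (our own illustration, not code from the text) that builds Bⁱ for the binary alphabet by appending one letter at a time:

```java
import java.util.ArrayList;
import java.util.List;

public class BinaryStrings {

    // Returns the set B^i of all binary strings of length exactly i.
    static List<String> stringsOfLength(int i) {
        List<String> current = new ArrayList<>();
        current.add("");                      // B^0 = {lambda}
        for (int len = 0; len < i; len++) {
            List<String> next = new ArrayList<>();
            for (String s : current) {        // each string extends in two ways...
                next.add(s + "0");
                next.add(s + "1");
            }
            current = next;                   // ...so |B^(len+1)| = 2 * |B^len|
        }
        return current;
    }

    public static void main(String[] args) {
        for (int i = 0; i <= 4; i++)
            System.out.println(i + ": " + stringsOfLength(i).size());
        // prints sizes 1, 2, 4, 8, 16: there are 2^i strings of length i
    }
}
```

The doubling at each step is exactly the informal argument for the 2ᵏ count in the next paragraph.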

Let's consider the binary alphabet B = {0,1}. The set B* is infinite, but the set of strings of any particular length is finite. There is exactly one binary string of length zero (λ), so the set B⁰ is equal to {λ}. There are two strings of length one (0 and 1), so B¹ is {0,1}, the same set as B. If we consider all of the possibilities carefully, we find that B² has four strings (00, 01, 10, and 11), B³ has eight, and B⁴ has sixteen. What about the strings of length k, for any number k? You may know, or you may have heard, that there are exactly 2ᵏ of them. If you believe this, you should think about why you believe it, and how you might explain it to someone who didn't believe it — in Chapters 4 and 6 we'll learn formal techniques to solve counting problems like this. We can make a single infinite list containing all the strings in B* by listing first the element of B⁰, then the elements of B¹, then those of B², and so on:

B* = {λ, 0, 1, 00, 01, 10, 11, 000, 001, 010, 011, 100, 101, ...}. Another way

to generate this same list is to take the naturals in binary representation,


starting

cryptocurrencies, Batcoins (currently worth $51 each) and Twitcoins (currently worth $32 each). For technical reasons, Batcoins and Twitcoins cannot be broken into fractions like Bitcoins — they must be transferred entirely or not at all. Both Rabbit and I have plenty of each kind of coin available. How can I pay Rabbit exactly $100 by transferring integer numbers of Batcoins and/or Twitcoins from me to Rabbit and/or from Rabbit to me? E3.5.8

Mr. Lear, an elderly man with three daughters, is making arrangements for his retirement. His bank accounts are accessed by a four-digit code x, which we may think of as a natural less than 10000. He gives each of his daughters partial information about x, so that none of them can determine x on her own. He tells Cordelia the remainder x%97 from dividing x by 97. He tells Goneril x%115, and tells Regan x%119. Explain why any two of the daughters, by combining their information, can determine x.

E3.5.9

Let p and q be two relatively prime naturals, each greater than 1. Let f be the function from {0, 1, ..., pq − 1} to the set {0, 1, ..., p − 1} × {0, 1, ..., q − 1} defined by f(x) = (x%p, x%q). Prove that f is a bijection.

E3.5.10

Let n and a be positive naturals. Prove that a has a multiplicative inverse modulo n if and only if for every prime p dividing n, a has an inverse modulo p.

3.5.4 Problems

P3.5.1 The Julian calendar³⁸ has years of 365 days unless the year number satisfies x ≡ 0 (mod 4), in which case the year has 366 days (a "leap year").

(a) George Washington was born on 11 February 1732, a Friday, according to the Julian calendar. Explain why 11 February in year x of the Julian calendar is a Friday, if x ≡ 1732 (mod 28). (Note that this is not just a straightforward application of the Chinese Remainder Theorem.)

(b) What day of the week was 11 February 1492, according to the Julian calendar? Explain your reasoning.

(c) A "perpetual calendar" is a single chart including all possible calendars for a given year. How many calendars are needed? Explain how to determine which calendar is needed for year x if you know a congruence for x modulo 28.

P3.5.2

The Gregorian calendar (the one in most general use today)³⁹ is the same as the Julian calendar except that there are 365 days in year x if x is congruent to 100, 200, or 300 modulo 400.

In the Gregorian calendar, as students of World War II may recall, 7 December 1941 was a Sunday. We cannot, as in the case of the Julian calendar, guarantee that 7 December of

8 Actually no relation to the Julian Period mentioned

above —

the calendar was devised by Julius Caesar and

the Period was named by its inventor, Joseph Justus Scaliger, after his father, who happened to be named Julius as well.

The starting date for the Period,

1 January 4713

respectively, were all in their desired starting positions.

“Great Britain and its coloni

B.C., was chosen so that three cycles, of 28, 19, and 15 years

How often does this happen?

itched from the Julian to Gregorian calendar in 1752, when they were considerably

out of step with each other — to see how this was implemented enter cal 1752 on any Unix machine. Washington, who was alive at the time of this change, retroactively changed his birthday to 22 February.

3-32

George

941 (mod ec)

year x was a Sunday if a = 1941 (mod 28), but we can guarantee it if x for some value of c. Find the smallest value of c for which this is true. (b)

Determine the day of the week on which you were born, using only the fact that 7 December 1941 was a Sunday. Show all of your reasoning.

(c) What additional complications arise when designing a perpetual Gregorian calendar?

(d) In what years, during the period from 1 to 1941 A.D. (or 1 to 1941 C.E.), have the Gregorian and Julian calendars agreed for the entire year? How often does this happen?

P3.5.3

Suppose we are given a system of congruences:

x ≡ a₁ (mod m₁)
x ≡ a₂ (mod m₂)
...
x ≡ aₖ (mod mₖ),

without any guarantee that the mᵢ's are pairwise relatively prime.

(a) A prime power is a number that can be written pᵉ for some prime number p and some positive number e. A consequence of the Fundamental Theorem of Arithmetic (which we'll prove soon) is that any number has a unique factorization into prime powers. Show that we can convert any congruence into an equivalent system of congruences where each base is a prime power and the bases are pairwise relatively prime.

(b)

Let p be a prime number, and suppose that we have a system of congruences where each base is a power of p. Explain how to tell whether the system has a solution, and how to find it.

(c) Using parts (a) and (b), explain how to determine whether an arbitrary system of congruences has a solution, and if so how to find all solutions.

P3.5.4

Suppose that the naturals m₁, ..., mₖ are pairwise relatively prime and that for each i from 1 through k, the natural x satisfies x ≡ xᵢ (mod mᵢ) and the natural y satisfies y ≡ yᵢ (mod mᵢ). Explain why, for each i, xy satisfies xy ≡ xᵢyᵢ (mod mᵢ) and x + y satisfies (x + y) ≡ (xᵢ + yᵢ) (mod mᵢ). Now suppose that z₁, ..., zⱼ are some naturals and that we have an arithmetic expression in the zᵢ's (a combination of them using sums and products) whose result is guaranteed to be less than M, the product of the mᵢ's. Explain how we can compute the exact result of this arithmetic expression using the Chinese Remainder Theorem only once, no matter how large j is.

P3.5.5 (uses Java) Write a real-Java static method that takes three moduli m₁, m₂, m₃ and three residues x₁, x₂, x₃ as input. It should check that the moduli are pairwise relatively prime, and if they are, it should output a number z that satisfies all three congruences z ≡ xᵢ (mod mᵢ).

P3.5.6 In Problem 3.3.2 we defined the Euler totient function φ(n), where n is a natural, to be the number of naturals in the set {0, 1, ..., n} that are relatively prime to n. The Chinese Remainder Theorem allows us to calculate φ(n) for any n with a little work:

(a) Prove that if p is any prime and e is any positive natural, then φ(pᵉ) = pᵉ − pᵉ⁻¹ = pᵉ⁻¹(p − 1).

(b)

Prove that if r and s are any relatively prime naturals each greater than 1, then $(rs) =

¢(r)¢(s). (Hint: Use the bijection of Exercise 3.5.9.) (c) Combine (a) and (b) to get a rule to compute ¢(n) for any natural n. Illustrate your method by finding (52500).
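The arithmetic in Problem P3.5.4 can be made concrete. The sketch below (our illustration, not the book's code; the class name and the brute-force reconstruction are ours) evaluates the expression 7 · 8 through its residues and then recovers the exact answer, which is safe because 56 is below M = 4 · 3 · 5 = 60:

```java
// Illustration of P3.5.4: evaluate an expression componentwise modulo each
// base, then recover the exact result. Brute-force search is enough to show
// the idea, though the Chinese Remainder Theorem gives a faster method.
public class CrtDemo {
    // find the unique z with 0 <= z < m[0]*...*m[k-1] matching every residue
    static long reconstruct(long[] m, long[] r) {
        long M = 1;
        for (long mi : m) M *= mi;
        for (long z = 0; z < M; z++) {
            boolean ok = true;
            for (int i = 0; i < m.length; i++)
                if (z % m[i] != r[i]) { ok = false; break; }
            if (ok) return z;
        }
        return -1;  // cannot happen when the bases are pairwise relatively prime
    }
    public static void main(String[] args) {
        long[] m = {4, 3, 5};                    // pairwise relatively prime bases
        long[] r = new long[m.length];
        for (int i = 0; i < m.length; i++)
            r[i] = ((7 % m[i]) * (8 % m[i])) % m[i];  // evaluate 7 * 8 mod m[i]
        System.out.println(reconstruct(m, r));   // prints 56
    }
}
```

Note that only the final reconstruction looks at numbers as large as M; all the intermediate arithmetic stays below the individual bases, which is the point of P3.5.4.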

P3.5.7 Following Exercise 3.5.9, let p_1, ..., p_k be a pairwise relatively prime set of naturals, each greater than 1. Let X be the set {0, 1, ..., p_1 − 1} × ... × {0, 1, ..., p_k − 1}. Define a function f from {0, 1, ..., p_1p_2···p_k − 1} to X by the rule f(x) = (x % p_1, ..., x % p_k). Prove that f is a bijection.

P3.5.8 Let X be a finite set and let f be a bijection on X. Recall that the n'th iterate of f, written f^n, is the function defined so that f^n(x) is the result of applying f to x exactly n times. We define the period of f to be the smallest positive natural n such that f^n is the identity function.

(a) Why must every f have a period?

(b) Show that if X has exactly three elements, every bijection on X has period 1, 2, or 3.

(c) How large must X be before you can have a bijection with period 6?

P3.5.9 (harder) Following Problem 3.5.8, let m(n) be the largest period of any bijection on X if X has exactly n elements.

(a) Let p_1, p_2, ..., p_k be pairwise relatively prime naturals with p_1 + ... + p_k ≤ n. Show that there is a bijection of period p_1p_2···p_k on X.

…0, then a = b. (Hint: Suppose a = b + c with c > 0 and derive a contradiction. What simpler basic properties of numbers do you need?)

E3.6.4

Prove formally and carefully, using the definition of the primality predicate P(x) in terms of the division relation D(x, y), that ∀a: ∀b: (D(a, b) ∧ P(a) ∧ P(b)) → (a = b).

E3.6.5 A positive rational number is defined to be a number of the form a/b where a and b are positive naturals. Explain why every positive rational number has a factorization, where each factor is either a prime number p or the reciprocal of a prime number, 1/p. Is the factorization into factors of this kind unique?

E3.6.6 Let r be any integer. If r is positive, √r is a real number, and if r is negative, √r is an imaginary number. In either case, we can consider the set Z[√r] of numbers that can be written as a + b√r for integers a and b.

(a) Show that if x = a + b√r and y = c + d√r are two numbers in Z[√r], then the numbers x + y and xy are both in Z[√r].

(b) We define the norm n(x) of a number x = a + b√r to be a² − rb². Show that for any numbers x and y in Z[√r], n(xy) = n(x)n(y).

E3.6.7

In this problem we will work with subsets of some nonempty finite set X, and let the union operation be our "multiplication". We will say that one subset Y "divides" another subset Z, written D(Y, Z), if ∃W: W ∪ Y = Z. We will define a subset Y to be "prime" if Y ≠ ∅ and if D(Z, Y) is true, then either Z = ∅ or Z = Y.

(a) Show that D(Y, Z) is true if and only if Y ⊆ Z.

(b) Which subsets are "prime" according to this definition?

(c) Explain why any subset "factors" uniquely as a product of "primes".

(d) Prove a version of the Atomicity Lemma for these "primes".

(e) Show that any nonempty set that is not "prime" fails to satisfy the Atomicity Lemma, even in the simple version.

(Footnote: Zero is a special case in any system, and we don't worry about factoring it.)
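Looking back at Exercise 3.6.6, the multiplicativity of the norm is easy to check numerically. This sketch (our own illustration, not the book's code) represents a + b√r as a pair of integers and multiplies using (a + b√r)(c + d√r) = (ac + rbd) + (ad + bc)√r:

```java
// Checking n(xy) = n(x)n(y) for elements a + b*sqrt(r), represented as {a, b}.
public class NormCheck {
    static long[] mult(long[] x, long[] y, long r) {
        // (a + b sqrt r)(c + d sqrt r) = (ac + r*bd) + (ad + bc) sqrt r
        return new long[]{x[0]*y[0] + r*x[1]*y[1], x[0]*y[1] + x[1]*y[0]};
    }
    static long norm(long[] x, long r) { return x[0]*x[0] - r*x[1]*x[1]; }
    public static void main(String[] args) {
        long r = -5;
        long[] x = {1, 1}, y = {1, -1};          // 1 + sqrt(-5) and 1 - sqrt(-5)
        long[] p = mult(x, y, r);
        System.out.println(norm(p, r) == norm(x, r) * norm(y, r)); // prints true
        System.out.println(p[0]);                // prints 6 (and p[1] is 0)
    }
}
```

The example pair was chosen deliberately: it multiplies out to the ordinary integer 6, which becomes relevant in Problem 3.6.7 below.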

E3.6.8 Let p be a prime number and consider the set {0, 1, ..., p − 1} of integers modulo p. Using multiplication modulo p, we can define a division relation D(x, y) ⇔ ∃z: xz = y and use this to define a notion of "prime number". Which numbers divide which other numbers, and what are the "primes"? Does every nonzero number have a unique factorization into "primes"?

E3.6.9 If t is any natural, we can define the threshold-t numbers to be the set {0, 1, ..., t} where the multiplication operation makes x·y either the natural xy, if that is in the set, or t if it is not. (The rabbits of Richard Adams' Watership Down use threshold-5 numbers.) We define the division relation according to this multiplication, so that D(x, y) means ∃z: xz = y. What numbers are the primes in this version of arithmetic? Does every nonzero number factor uniquely into primes?

E3.6.10 (uses Java) In the game of Kenken, each square contains a one-digit number and you are sometimes told the product of the numbers in a particular group of squares. Define a Kenken number to be a natural that can be written as a product of one-digit numbers. Write a static real-Java method that inputs a long argument and returns a boolean telling whether the input is a Kenken number. Does your method run quickly on any possible long input?
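One possible approach to E3.6.10 (a sketch based on our own reasoning, not the book's solution): a product of one-digit numbers can involve only the prime factors 2, 3, 5, and 7, and conversely those four digits are themselves available as factors, so the test reduces to checking that no larger prime divides the input. That also answers the speed question, since each division loop runs at most about 63 times for a long:

```java
// A natural is a product of one-digit numbers exactly when its prime
// factors are all at most 7 (the digits 2, 3, 5, 7 supply those primes).
public class Kenken {
    static boolean isKenken(long n) {
        if (n <= 0) return false;
        for (long p : new long[]{2, 3, 5, 7})
            while (n % p == 0) n /= p;      // strip out the small prime factors
        return n == 1;                       // anything left is a prime above 7
    }
    public static void main(String[] args) {
        System.out.println(isKenken(7865));    // prints false: 7865 = 5*11*11*13
        System.out.println(isKenken(362880));  // prints true: 9! = 2*3*...*9
    }
}
```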

3.6.7 Problems

P3.6.1 (uses Java) In Section 3.1 we gave a pseudo-Java method that input a natural argument and printed its prime factorization.

(a) Write a recursive program in real Java that inputs a positive int or long value and prints a sequence of prime numbers that multiply together to give the input.

(b) By trial and error, determine the largest prime number you can factor in ten seconds of computer time. (Primes would appear to be the worst case for this algorithm.)

P3.6.2 Prove that any positive natural factors uniquely as a product of prime powers, where the prime powers are pairwise relatively prime (as defined in Section 3.5).
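One possible shape for P3.6.1(a) (a sketch, not the book's pseudo-Java from Section 3.1): find the smallest prime factor by trial division, print it, and recurse on the quotient.

```java
// Recursive prime factorization by trial division, smallest factor first.
public class Factor {
    static long smallestFactor(long n) {       // least prime factor of n >= 2
        long d = 2;
        while (d * d <= n && n % d != 0) d++;  // trial division up to sqrt(n)
        return (d * d > n) ? n : d;            // no divisor found: n is prime
    }
    static void printFactors(long n) {
        if (n <= 1) return;                    // 1 has the empty factorization
        long d = smallestFactor(n);
        System.out.println(d);
        printFactors(n / d);                   // recurse on the quotient
    }
    public static void main(String[] args) {
        printFactors(60);                      // prints 2, 2, 3, 5
    }
}
```

As part (b) suggests, the running time is dominated by the sqrt(n) trial divisions needed when n itself is prime.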

P3.6.3 In Exercise 3.3.9 and Problem 3.3.7 we considered the set of polynomials in one variable x with real number coefficients. These include the monic polynomials, whose highest-degree coefficient is 1. We said that a polynomial f(x) divides a polynomial g(x) if and only if there is a polynomial h(x) such that f(x)h(x) = g(x). In Problem 3.3.7 we showed that it is possible to divide a polynomial s(x) by a polynomial p(x), finding polynomials q(x) and r(x) such that s(x) = p(x)q(x) + r(x) and either r(x) = 0 or the degree of r(x) is strictly less than the degree of p(x). A monic polynomial is said to be irreducible (that is, prime) if it cannot be written as the product of two other monic polynomials, neither of them equal to 1. So x² + 3x + 2 is monic but not irreducible, because it is equal to (x + 1)(x + 2). On the other hand, x + c is irreducible for any c, and x² + x + 1 can be shown to be irreducible.

(a) Let f and g be two monic polynomials that have no monic common divisor other than 1. Show that there are polynomials h and k such that h(x)f(x) + k(x)g(x) = 1. (Hint: Adapt the Euclidean Algorithm.)

(b) Following the reasoning in this section, show that any monic polynomial factors uniquely into monic irreducible polynomials.

P3.6.4

In this problem we will work with strings over the alphabet {a, b}, and let our "multiplication" be string concatenation. We say that one string u "divides" another string v if ∃x: ∃y: xuy = v, or equivalently, if u is a substring of v. As with the naturals, we redefine the word "prime" for strings so that P(w) means "w ≠ λ and if D(x, w) is true, then either x = λ or x = w".

(a) What strings are "prime" using this definition?

(b) Explain why any string factors uniquely as a product (concatenation) of these "primes".

(c) Prove a version of the Atomicity Lemma for these "primes".

(d) Show that any nonempty string that is not one of your "primes" fails to satisfy the Atomicity Lemma, even in the simple version.

P3.6.5 (requires exposure to complex numbers.)

The Gaussian integers are the subset of the complex numbers that are of the form a + bi where a and b are integers and i is the square root of −1. (This is an example of the sets defined in Exercise 3.6.6, the set Z[√−1].) The notions of division and primality can be made to make sense in this domain as well:

(a) In Exercise 3.6.6 we defined the norm of a + bi to be a² + b². The length of a Gaussian integer a + bi is defined to be √(a² + b²), the square root of the norm. If a + bi and c + di are two Gaussian integers, we showed that the norm of their product is the product of their norms. Show that the length of their product is the product of their lengths.

(b) A unit is a Gaussian integer of length 1. What are the units?

(c) A prime is a Gaussian integer whose length is greater than 1 and which cannot be written as the product of two Gaussian integers unless one is a unit. Prove that any nonzero Gaussian integer has at least one factorization as a product of primes times a unit.

(d) Prove that 1 + i is a prime and that 2 is not.

(e) Prove that if the norm of a Gaussian integer is a prime number in the usual sense, then the Gaussian integer is a prime.

P3.6.6 (requires exposure to complex numbers) In the complex numbers,

1 has exactly three cube roots: 1 itself, ω = (−1 + √−3)/2, and ω² = (−1 − √−3)/2. The Eisenstein integers are the set of complex numbers that can be written a + bω for integers a and b.

(a) Show that the sum and product of two Eisenstein integers are each Eisenstein integers.

(b) Show that the Eisenstein integers are a proper superset of the set called Z[√−3] in Exercise 3.6.6.

(A displaced footnote to Problem 3.6.3: if x² + x + 1 had a factor, that factor would have to be of the form x − c, and if x − c divided the polynomial then c would be a root, and this polynomial has no real roots.)

(c) We can find the length of an Eisenstein integer as a complex number by writing it as x + iy and computing √(x² + y²). The norm of an Eisenstein integer is the square of this length. Find the norm of a + bω in terms of a and b.

(d) A unit in the Eisenstein integers is an element of norm 1. Find all the units.

(e) A prime is an Eisenstein integer that is not a unit and which cannot be written as the product of two Eisenstein integers unless one is a unit. Prove that any nonzero Eisenstein integer has at least one factorization as a product of primes times a unit.

(f) Prove that 2 is still prime but that 3 is not. Is 3 + ω prime?

P3.6.7 (requires exposure to complex numbers) Consider the set R = Z[√−5] as defined in Exercise 3.6.6, and the definition of units and primes as in Problems 3.6.5 and 3.6.6. What are the units in R? Show that factorization into primes is not unique by giving two different factorizations of 6. (This means showing that all the alleged primes in your factorizations are actually prime in R.)

P3.6.8

Consider the set S = Z[√2] as defined in Exercise 3.6.6, a set of real numbers.

(a) Recall that a unit is an element with norm 1. What are the units in S?

(b) Prove that 7 is not prime in S.

(c) (harder) Prove that 5 is prime in S.

P3.6.9

Because there are infinitely many primes, we can assign each one a number: p_0 = 2, p_1 = 3, p_2 = 5, and so forth. A finite multiset of naturals is like an ordinary finite set except that an element can be included more than once and we care how many times it occurs. Two multisets are defined to be equal if they contain the same number of each natural. So {2, 4, 4, 5}, for example, is equal to {4, 2, 5, 4} but not to {4, 2, 2, 5}. We define a function f so that given any finite multiset S of naturals, f(S) is the product of a prime for each element of S. For example, f({2, 4, 4, 5}) is p_2·p_4·p_4·p_5 = 5 × 11 × 11 × 13 = 7865.

(a) Prove that f is a bijection from the set of all finite multisets of naturals to the set of positive naturals.

(b) The union of two multisets is taken by including all the elements of each, retaining duplicates. For example, if S = {1, 2, 2, 5} and T = {0, 1, 1, 4}, then S ∪ T = {0, 1, 1, 1, 2, 2, 4, 5}. How is f(S ∪ T) related to f(S) and f(T)?

(c) S is defined to be a submultiset of T, written S ⊆ T, if there is some multiset U such that S ∪ U = T. If S ⊆ T, what can we say about f(S) and f(T)?

(d) The intersection of two multisets consists of the elements that occur in both, with each element occurring the same number of times as it does in the one where it occurs fewer times. For example, if S = {0, 1, 1, 2} and T = {0, 0, 1, 3}, then S ∩ T = {0, 1}. How is f(S ∩ T) related to f(S) and f(T)?

P3.6.10

(requires exposure to complex numbers) One form of the Fundamental Theorem of Algebra, which we aren't ready to prove here, says that any nonconstant polynomial in one variable over the complex numbers has at least one root.

(a) Assuming this theorem, prove that a polynomial over the complex numbers is irreducible (as defined in Problem 3.6.3) if and only if it is linear. (Thus every polynomial over the complex numbers factors into a product of linears.)
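Returning to Problem 3.6.9 above, the function f is easy to experiment with. This sketch (an illustration with our own helper nthPrime, not the book's code) computes f for small multisets:

```java
// Computing f(S) from P3.6.9: the product of the primes p_i indexed by the
// elements of the multiset S, counting from p_0 = 2, p_1 = 3, p_2 = 5, ...
public class MultisetCode {
    static long nthPrime(int n) {              // the n'th prime, with p_0 = 2
        int count = -1;
        for (long c = 2; ; c++) {
            boolean prime = true;
            for (long d = 2; d * d <= c; d++)
                if (c % d == 0) { prime = false; break; }
            if (prime && ++count == n) return c;
        }
    }
    static long f(int[] multiset) {            // order of elements is irrelevant
        long product = 1;
        for (int e : multiset) product *= nthPrime(e);
        return product;
    }
    public static void main(String[] args) {
        System.out.println(f(new int[]{2, 4, 4, 5})); // prints 7865 = 5*11*11*13
    }
}
```

Running f on a multiset and its reordering gives the same value, which is half of what part (a) asks you to prove.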

with 1:

1, 10, 11, 100, 101, 110, 111, 1000, 1001, 1010, 1011, 1100, 1101, ...

and delete the initial 1 from each string. Can you explain why every binary string will eventually occur in the resulting list?

1.2.2 String Operations

We have already defined some operations that take strings as their arguments. The equality operation takes two strings and returns a boolean value, and the length operation takes a string and returns a natural. In this section we define a few more such string operations. In Chapter 4 we will give new definitions of these operators that will be more useful in formally proving facts about them.

Definition: If v is a string, the reversal of v, written v^R, is the string obtained by writing v backward.

Example: The reversal of abc is cba. The reversal of 111010 is 010111. The reversal of λ is λ, because we can write λ backward by writing no characters at all, that is, by writing λ. Any one-character string, such as a, is equal to its own reversal. Some strings, such as hannah and radar, are equal to their own reversals; these strings are called palindromes.

Definition: If v and w are strings, the concatenation of v and w, written vw, is the string obtained by first writing v and then writing w.

Example: Let v be the string 01 and let w be the string 110. Then vw is the string 01110 and wv is the string 11001. (So the order matters when we concatenate two strings, even though the notation looks like multiplication, where the order doesn't matter.) Using the reversal operation, we can also calculate v^R = 10, w^R = 011, and v^R w^R = 10011. Note that v^R w^R is equal to (wv)^R, rather than to (vw)^R, because 10011 is the reversal of 11001. By convention, we execute reversals before concatenations unless there are parentheses to indicate otherwise.

Java uses the symbol + to denote concatenation of strings so that, for example, "abc" + "def" would denote the same string as "abcdef". A common use of this notation is in output statements, such as

System.out.println("The answer is " + x);

where x can be a variable of any type. To execute this statement, Java takes the quoted string, converts the value of x to a string, concatenates these two strings, and writes the result to the output device. Note that the output will have a space between the word is and the start of the value of x, because there is such a space in the quoted string. (Every Java object that might ever need to be output must have a method toString to do this conversion.)
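The identity v^R w^R = (wv)^R from the example above can be checked directly in Java (a small illustration; the reverse helper uses the standard StringBuilder.reverse method):

```java
// Checking that reverse(v) + reverse(w) equals reverse(w + v) for the
// example strings v = 01 and w = 110.
public class Reversal {
    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }
    public static void main(String[] args) {
        String v = "01", w = "110";
        System.out.println(v + w);                    // prints 01110
        System.out.println(reverse(v) + reverse(w));  // prints 10011
        System.out.println(reverse(w + v));           // prints 10011, the same
    }
}
```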


(b) Assuming this theorem, prove that any polynomial over the real numbers factors as a product of linears and quadratics, and thus every irreducible polynomial over the reals is linear or quadratic. (Hint: First view the polynomial as being over the complex numbers and factor it as in part (a). Then prove that if a + bi is a root of a polynomial with real coefficients, so is a − bi. Then show that (x − (a + bi))(x − (a − bi)) is a quadratic polynomial with real coefficients which (unless b = 0) is irreducible over the reals.)


3.7 Excursion: Expressing Predicates in Number Theory

So far all our concepts of number theory have been expressible as formulas in the predicate calculus, using only a few basic predicates: equality, addition, multiplication, and names for particular numbers. In Gödel, Escher, Bach Hofstadter defines a formal system he calls "Typographical Number Theory", which is just our predicate calculus but also includes a symbol for "successor", for reasons that might become clearer in our Chapter 4. He shows, as we have, that other concepts like "less than", "odd number", "perfect square", "divides", and "prime" can easily be expressed without new symbols:

C(x) ⇔ (x > 1) ∧ ∃y: ∃z: x = (y + 2)(z + 2)

P(x) ⇔ (x > 1) ∧ ¬∃y: (D(y, x) ∧ (1 < y) ∧ (y < x))

…order, the least positive number j such that j·i = 0 (footnote 52). For each number i, define n_i to be the number of elements of Z_(p−1) whose additive order is exactly i. The only nonzero values of n_i occur when i divides p − 1, because every element x of Z_(p−1) satisfies (p − 1)·x = 0. If i does divide p − 1, we can find exactly which elements have additive order exactly i. Letting j = (p − 1)/i, we can see that j is such an element, as is mj where m is any number relatively prime to i. These are all the possibilities, as if m is not relatively prime to i then mj has a smaller order, and if j does not divide a number x then ij cannot divide ix. So the number n_i, if i divides p − 1, is equal to φ(i).

This means that if we add up the numbers φ(i) over all i dividing p − 1 (including the case i = p − 1), we get exactly p − 1. The orders of the p − 1 numbers in Z*_p must then split up exactly the same way, because once we take away the at most φ(i) numbers for each i dividing p − 1 and less than p − 1, this analysis shows that there are at least φ(p − 1) numbers left, and p − 1 is the only possible order left for each of them. In particular, since φ(p − 1) is positive, we know that there is at least one such element of order p − 1, and it is the desired generator (footnote 51). (Note that if g has order p − 1, the p − 1 numbers 1, g, g², ..., g^(p−2) are all distinct, so these must be all the numbers in Z*_p.)

50: Though we'll prove in a moment that this case never actually happens.

51: Another way to say this is to say that Z*_p is a cyclic group.

52: We are switching gears slightly, talking now about order as the period of the system we get by successively adding i, as opposed to multiplying as we did above. The reason for this is that the multiplicative behavior of Z*_p turns out to be identical ("isomorphic") to the additive behavior of Z_(p−1).

Example:

Consider the set Z*_13. We can find the order of each element by finding all of its powers in order until we get 1. The element 1 has order 1, and with 2 we get the sequence 1, 2, 4, 8, 3, 6, 12, 11, 9, 5, 10, 7, 1, showing that 2 is a generator. Now we can stop, actually, and read off the remaining answers from the powers of 2. The other generators are the numbers 2^m for m relatively prime to 12, namely 2^5 = 6, 2^7 = 11, and 2^11 = 7. There are φ(6) = 2 elements of order 6 (2^2 = 4 and 2^10 = 10), φ(4) = 2 elements of order 4 (2^3 = 8 and 2^9 = 5), φ(3) = 2 elements of order 3 (2^4 = 3 and 2^8 = 9), and φ(2) = 1 element of order 2 (2^6 = 12).

What does Z*_r look like if r is composite? The Chinese Remainder Theorem tells us that the ring Z_r can be thought of as a direct product of rings Z_(p^e), where the numbers p^e are the prime-power factors of r. It's not hard to see that Z*_r is also a direct product, the direct product of the multiplicative groups Z*_(p^e) for the numbers p^e. Why? An element w of Z_r is in Z*_r if and only if it has no factor in common with r other than 1. If we think of w as a sequence of naturals, one for each prime power in the base (as we thought of 23 in Z_60 as the triple (3, 2, 3)), we can see that w ∈ Z*_r if and only if each of these individual numbers is relatively prime to its base. How do we justify this claim? One way is to use the Inverse Theorem in reverse. If the number is relatively prime to its base, it has an inverse modulo that base, and the sequence of inverses is an inverse for the original number. For example, since 3·3 ≡ 1 (mod 4), 2·2 ≡ 1 (mod 3), and 3·2 ≡ 1 (mod 5), (3, 2, 3) has an inverse (3, 2, 2) in Z_4 × Z_3 × Z_5, because these two triples multiply to give (1, 1, 1). A number with an inverse is relatively prime to its base.

Which numbers w ∈ Z_60 are relatively prime to 60? If we view w as a triple (x, y, z), we need x to be prime to 4 (thus 1 or 3), y to be prime to 3 (1 or 2), and z to be prime to 5 (1, 2, 3, or 4). There are 16 ways to choose such a triple (footnote 53), yielding the fact that Z*_60 is the 16-element set {1, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 49, 53, 59}. (In the Problems, you'll use this analysis to calculate φ(r) for arbitrary numbers r.)
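The claims in the Z*_13 example can be verified mechanically. This small sketch (our illustration, not the book's code) computes multiplicative orders by brute force:

```java
// Verifying the Z*_13 example: compute the multiplicative order of each
// element and count how many elements are generators (order p - 1).
public class Orders {
    static int order(int a, int p) {
        int x = a % p, n = 1;
        while (x != 1) { x = (x * a) % p; n++; }
        return n;                          // least n with a^n = 1 (mod p)
    }
    public static void main(String[] args) {
        int p = 13, generators = 0;
        for (int a = 1; a < p; a++)
            if (order(a, p) == p - 1) generators++;
        System.out.println(order(2, p));   // prints 12: 2 is a generator
        System.out.println(generators);    // prints 4: the generators 2, 6, 7, 11
    }
}
```

The count 4 agrees with the theory above, since the number of generators is φ(12) = 4.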

3.9.4 Exercises

E3.9.1 Let r be any number and let x and y be any two elements of Z*_r. Prove that xy ∈ Z*_r. Complete the argument that Z*_r is a group by checking that the multiplication operation is associative, has an identity, and has inverses.

E3.9.2 Let r be a natural and a be a member of Z*_r. Define the relation R(b, c), for b and c members of Z*_r, by the predicate ∃s: b = c·a^s. Prove that R is an equivalence relation on Z*_r.

E3.9.3 Consider the congruence relation on Z_3[x] modulo the polynomial x² + 1, and define C to be the set of congruence classes. (You should get nine of them.) Construct addition and multiplication tables for C under polynomial addition and multiplication modulo x² + 1. Verify that C is a field. Is it isomorphic to Z_9 as a ring?

E3.9.4 Find all the elements of Z*_63 and describe its structure as a direct product.

E3.9.5 Find all the generators of Z*_17 and Z*_19.

E3.9.6 Show that a ring with zero divisors cannot be a field.

53: Counting them by a technique to be formally presented in Section 6.1.

E3.9.7 The characteristic of a finite ring is the least positive natural t such that the sum of t copies of the multiplicative identity 1 is equal to the additive identity 0.

(a) Prove that if the characteristic of a ring is composite, the ring is not a field.

(b) Prove that the additive order of any element of a ring divides the characteristic.

(c) Argue that in any finite field, every nonzero element has the same prime additive order.

E3.9.8 Let F be a finite field with n nonzero elements. Prove that there are exactly φ(n) elements of F that generate F* under multiplication.

E3.9.9 (requires exposure to complex numbers) Is it possible to have a subset of the complex numbers that forms a finite field under the standard addition and multiplication operations?

E3.9.10 We showed in Exercise 3.9.7 that any nonzero element of a given finite field has the same prime additive order. Cauchy's Theorem in algebra (which we won't prove here) says that if any prime q divides the number of elements, there must exist an element of order q. Using these two facts, prove that if a finite field has characteristic p, it must have exactly p^e elements where e is some positive natural.

3.9.5 Problems

P3.9.1 (uses Java) In Problem 3.5.6 we showed that if p^e is a prime power (that is, p is a prime number and e is a positive natural), then φ(p^e) = (p − 1)p^(e−1).

(a) Using the analysis in this section, explain how to calculate φ(r) for any r, given its prime factorization.

(b) Use your method to calculate φ(300), φ(320), φ(323), φ(329), φ(343), φ(350), φ(353), and φ(360).

(c) Write a real-Java static method int phi(int n) that returns φ(n) given any positive natural n as input.
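One possible implementation for P3.9.1(c) (a sketch, not the book's solution): factor n by trial division and apply φ(p^e) = (p − 1)p^(e−1) one prime at a time.

```java
// Euler's totient by trial division: for each prime p dividing n, multiply
// in (p - 1) once and then p for each further copy of p.
public class Totient {
    static int phi(int n) {
        int result = 1;
        for (int p = 2; p * p <= n; p++) {
            if (n % p != 0) continue;
            n /= p;
            result *= (p - 1);               // the (p - 1) factor for this prime
            while (n % p == 0) { n /= p; result *= p; }
        }
        if (n > 1) result *= (n - 1);        // one prime above sqrt(n) may remain
        return result;
    }
    public static void main(String[] args) {
        System.out.println(phi(300));        // prints 80
        System.out.println(phi(323));        // prints 288, since 323 = 17 * 19
    }
}
```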

P3.9.2 Let p be a prime number, and define Z_p[x] to be the set of all polynomials in one variable x with coefficients in Z_p. (Such a polynomial is a sum of zero or more monomials, each of the form ax^i for some element a of Z_p and some number i.) We saw in Section 3.6 that we could perform the Euclidean Algorithm on polynomials and prove a form of the Inverse Theorem. We defined a polynomial to be monic if its highest-degree coefficient is 1, and a monic polynomial to be irreducible if it is not the product of two monic polynomials, neither of them equal to 1. If f(x) is any polynomial, we can define a congruence relation modulo f(x), and the equivalence classes of this relation form an algebraic system. Prove that if f(x) is irreducible, then this system is a field. (The system is finite (Why?) and so it is a finite field. In fact, any finite field is isomorphic to a field of this form, though to prove this would be beyond the scope of this book.)

P3.9.3 (uses Java) Write a Java program that inputs positive numbers a, b, and c and computes a^b modulo c. Use repeated squaring, so that you loop once for each bit in the binary expansion of b, rather than, say, looping b times. Also make sure that your program has integer overflow only when necessary.
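One way to structure the repeated squaring of P3.9.3 (a sketch, not the book's solution; it works in long arithmetic, so intermediate products can overflow for moduli near 2^31 or above):

```java
// Right-to-left repeated squaring: one loop pass per bit of the exponent b.
public class ModPow {
    static long modpow(long a, long b, long c) {
        long result = 1 % c, square = a % c;
        while (b > 0) {
            if ((b & 1) == 1) result = (result * square) % c;
            square = (square * square) % c;    // square once per bit of b
            b >>= 1;
        }
        return result;
    }
    public static void main(String[] args) {
        System.out.println(modpow(2, 100, 101)); // prints 1 (Fermat's Little Theorem)
        System.out.println(modpow(2, 50, 101));  // prints 100, that is, -1 mod 101
    }
}
```

These two outputs are exactly the values used in the Excursion in Section 3.10 below.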

P3.9.4 Compute a^560 modulo 561 for several values of a. Is 561 prime? Can you prove anything about a^560 modulo 561 for all values of a?

P3.9.5 Let p be an odd prime number. We asserted in Problem 3.4.3 that −1 is a perfect square modulo p (the square of some number, calculated in Z_p, also called a quadratic residue) if and only if p ≡ 1 (mod 4). Prove this claim, using the Theorem from this section about generators for the multiplicative subgroup. Can you devise a test that inputs any number in Z_p and decides whether it is a perfect square?

P3.9.6 Consider the congruence relation on Z_3[x] modulo the polynomial x² + x + 1 and construct addition and multiplication tables for the set D of nine congruence classes as in Exercise 3.9.3 above. Prove that D is a ring but not a field. Look at the other seven monic polynomials of degree 2 over Z_3 and determine which of them produce sets of congruence classes that are fields. Can you formulate a rule for determining which do and which do not? Can you prove this rule correct?

P3.9.7 Show that a finite ring R (a commutative ring with identity) that is not a field must have zero divisors. (Hint: Let a be any nonzero element that has no inverse. What can you say about the set {ab : b ∈ R}?)

P3.9.8 We defined monic and irreducible polynomials (in one variable x) above in Problem 3.9.2. Here we consider such polynomials over Z_2.

(a) Show that there are exactly 2^d different monic polynomials of degree d over Z_2.

(b) Show that exactly one of the quadratic monic polynomials is irreducible.

(c) If f is any polynomial of degree at least 2, consider the set of four polynomials {f + ax + b : a, b ∈ Z_2}. Show that at most one of these four polynomials is irreducible. (Hint: Consider the value of the polynomial with x = 0 and x = 1. If either value is zero, the polynomial has a root.)

(d) Part (c) shows that there are at most four irreducible monic polynomials of degree 4 over Z_2. Are there exactly four?
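Parts (b) and (d) of P3.9.8 can be checked by brute force. In this sketch (our own encoding, not the book's code), a polynomial over Z_2 is stored as the int whose bit i is the coefficient of x^i, so polynomial multiplication is a carry-free XOR of shifted copies:

```java
// Count monic irreducible polynomials of degree d over Z_2 by checking every
// candidate against every possible product of two lower-degree factors.
public class IrreducibleCount {
    static int mult(int f, int g) {              // polynomial product over Z_2
        int result = 0;
        for (int i = 0; (g >> i) != 0; i++)
            if (((g >> i) & 1) == 1) result ^= f << i;
        return result;
    }
    static int count(int d) {
        int total = 0;
        for (int f = (1 << d); f < (1 << (d + 1)); f++) {  // all monic, degree d
            boolean irreducible = true;
            // factors of degree 1 through d-1 are the ints from 2 to 2^d - 1
            for (int g = 2; g < (1 << d) && irreducible; g++)
                for (int h = 2; h < (1 << d); h++)
                    if (mult(g, h) == f) { irreducible = false; break; }
            if (irreducible) total++;
        }
        return total;
    }
    public static void main(String[] args) {
        System.out.println(count(2));  // prints 1: only x^2 + x + 1
        System.out.println(count(4));  // prints 3, so the answer to (d) is no
    }
}
```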

P3.9.9 Following the reasoning in Problem 3.9.8, find the exact number of monic irreducible polynomials of degree d over Z_3, for d = 2, d = 3, and d = 4.

P3.9.10 Let a be a generator of a finite field F with characteristic p and p^e elements (following Exercise 3.9.7).

(a) Prove that an element b is a generator if and only if b^(p^e) = b and b^j ≠ b for all j with 1 < j < p^e.

(b) Prove that b = a^p satisfies b^(p^e) = b.

(c) Finish the proof that b is a generator. (Hint: Since p and p^e − 1 are relatively prime, we know that p has an inverse modulo p^e − 1. And since c^(p^e − 1) = 1 for all c in F, the value of b^j depends only on j's remainder modulo p^e − 1.)

3.10 Excursion: Certificates of Primality

I claim that the number 1756205519 is composite. Why should you believe me? In Section 3.1 we discussed a method to test whether an arbitrary natural is prime: dividing it by all numbers (or just all primes) up to its square root. With a computer, you could do this for 1756205519, but it would take far too long by hand. With a number of 100 rather than 10 digits, even a computer wouldn't help you very much.

If I tell you, though, that 1756205519 is the product of 50173 and 35003, you don't have to take this on faith, because you can multiply these two numbers by hand and check the answer against 1756205519. If they match (and they should), you would now have proof that 1756205519 is composite.

It's not easy to take a large composite number and find two factors for it (in fact, in the next section we'll see how the security of the RSA cryptosystem depends on this being difficult). But if you're given the factors, they provide convincing evidence of compositeness that is easy to check. We call such evidence a certificate. It may be very hard to find a certificate, and so the existence of certificates for a property doesn't automatically make the property easy to test.

What sort of a certificate could convince you quickly that some 10-digit number is prime? We have an algorithm to test the primality of the number, but if the number is large enough we won't have enough time for this algorithm to finish (footnote 54). In this Excursion, we'll see a trick for giving certificates (due to Pratt) that uses the number theory we developed in Section 3.9.

We've just shown that if a number n is prime, then a^(n−1) ≡ 1 (mod n) for any number a relatively prime to n (for example, for any number in Z*_n). This is something we can check (with a computer, anyway) because we can use repeated squaring to limit the number of multiplications to the number of bits in the binary expansion of n, and we don't have to keep around any numbers larger than n. For small n, we can even work by hand.

As an example, let's calculate 2^100 modulo 101. 2^1 = 2, 2^2 = 4, 2^4 = 16, 2^5 = 32, 2^10 = 32·32 = 1024 ≡ 14 (mod 101), 2^20 = 14·14 = 196 ≡ 95 (mod 101), 2^25 = 95·32 = 3040 ≡ 10 (mod 101), 2^50 = 10·10 = 100 ≡ −1 (mod 101), and 2^100 = (−1)·(−1) ≡ 1 (mod 101).

If a^(n−1) modulo n ever fails to equal 1 for any such a, we have convincing evidence that n is composite.

But what do we really know if we try a bunch of a's and keep getting 1? For most (footnote 55) composite numbers, for most choices of a, you're not going to get 1. But some choices of a (like a = 1, for example) will always get you 1, and there are some unusual composite numbers (footnote 56) such as 561 for which you always get 1 for any a in Z*_n. This means that just getting 1 a lot is no proof that the number is prime.

There is a way around this problem. It turns out (for reasons we won't prove here) that for any composite number n, at least half the possible a's have a property that gives away the compositeness of n. A few, of course, aren't even in Z*_n and can't have a^(n−1) ≡ 1 (mod n). But for the rest, consider calculating a^(n−1) modulo n by repeated squaring, by writing n − 1 = 2^j·i for some odd number i, getting a^i modulo n, and squaring it j times to get a^(n−1). If you don't end up with 1, you've proved compositeness. The fact we won't prove is that for at least half of those a such that a^(n−1) ≡ 1 (mod n), you first get to 1 during the repeated squaring by squaring a number other than −1. And of course if this happens, n must be composite, because you have found a third root for the equation x² − 1 = 0 in Z_n, and thus Z_n isn't a field.

54: The new primality test of Agrawal et al. takes time proportional to about n^12 to test an n-digit number.

55: It's rather tricky to come up with a formal meaning for "most" in this context, but it does make sense.

56: 561 = 3·11·17, so by the Chinese Remainder Theorem, Z*_561 is isomorphic to a direct product Z*_3 × Z*_11 × Z*_17. The numbers φ(3), φ(11), and φ(17) all just happen to divide 560, so raising any a ∈ Z*_561 to the 560'th power will get you 1 in all three components, which is 1. Numbers with this property are called Carmichael numbers.
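One round of the test just described can be sketched in Java (an illustration, not the book's code; it uses long arithmetic and so is only safe for moduli below about 2^31):

```java
// Write n - 1 = 2^j * i with i odd, compute a^i mod n, then square j times,
// watching for a square root of 1 other than 1 or n - 1 (that is, -1).
public class Witness {
    static long modpow(long a, long b, long n) {
        long r = 1 % n, s = a % n;
        for (; b > 0; b >>= 1, s = (s * s) % n)
            if ((b & 1) == 1) r = (r * s) % n;
        return r;
    }
    static boolean provesComposite(long a, long n) {
        long i = n - 1;
        int j = 0;
        while (i % 2 == 0) { i /= 2; j++; }      // n - 1 = 2^j * i, i odd
        long x = modpow(a, i, n);
        if (x == 1 || x == n - 1) return false;  // this a reveals nothing
        for (int k = 1; k < j; k++) {
            x = (x * x) % n;
            if (x == n - 1) return false;        // -1 will square to 1 harmlessly
        }
        return true;    // a^(n-1) != 1, or 1 was reached from a third root of 1
    }
    public static void main(String[] args) {
        System.out.println(provesComposite(2, 561)); // prints true: 561 = 3*11*17
        System.out.println(provesComposite(2, 101)); // prints false: 101 is prime
    }
}
```

Note that a = 2 exposes the Carmichael number 561 even though 2^560 ≡ 1 (mod 561), exactly because a third square root of 1 appears along the way.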

This doesn’t help us to prove that n is prime, but it allows us to pile up very good practical evidence. If for 100 different values of a, this compositeness test fails to show that n is composite, there are two possibilities. Either n is really prime, or all 100 values happened to be among the

set for which the test fails. If the values were chosen randomly, and we know that the test works at least half the time, the latter is very unlikely®’ and you have a lot of confidence that n is prime. This is how large numbers are tested for primality in practice. There are primality tests that don’t have

this small residue of doubt

in their conclusions,

but so far the best ones known

still take a

time that is a large polynomial in n to test n-digit numbers. ‘We can, however, construct certificates for primality using the number theory from this chapter.

Remember that if n is prime, there is a generator of Z*_n, a number g such that every element of Z*_n can be written g^i for some i (equivalently, an element of order n − 1). No composite number n could possibly have an element with order n − 1, because if n is composite there aren't n − 1 different elements in Z*_n. So if you believe that g has order n − 1, you should be convinced that n is prime. How can you be sure that g has the claimed order?

Let's look at the example above, because it happens that 2 is a generator for Z*_101. The first requirement is that 2^100 ≡ 1 (mod 101), which we checked. This tells us that the order of 2 divides 100. If it were less than 100, it would have to be missing at least one of the prime divisors of 100, and hence would have to divide either 20 or 50. But along the way to computing 2^100 modulo 101, we found that neither 2^20 nor 2^50 is equal to 1 modulo 101. So 2 has order 100, and thus we know that 101 is prime.

To certify n as prime, then, we need the following:

e A number g such that g”~! = 1 (mod n). e A factorization of n — 1 into primes p1,... ,pk-

A check that none of g®-)/?, g-/p2

(n-1)/Pk is congruent to 1

modulo n. e Proof that the numbers pi, ..., pp are really all prime.

The last evidence is necessary to keep us from being fooled by a fraudulent certificate which tried to slip an incorrect factorization of n—1

past us. We can check that the alleged factors multiply to

57At least as unlikely as 100 consecutive flips of a fair coin all being heads 3-59

n − 1, but we need them to be prime if the certificate is to be valid. Of course, this only reduces the original problem into another instance of the same problem, raising the possibility that proving the original n will require proving more numbers prime, which themselves could require more numbers, and so on. We have shown that certificates exist, but we have a bit more work to do to show that they are short. The largest prime factor of n − 1 is at worst (n − 1)/2, which limits the depth of our recursion to the number of bits in the binary expansion of n, which we'll call t (you can only halve a number about that many times and still have anything left). The number of new primes that can show up on each level of the recursion is also limited to t, meaning that we can certify a t-bit prime by recursively certifying at most t² other primes. This keeps the length of the certificate, and hence the number of steps to check a certificate, at a reasonable number even for 100-digit or 1000-digit primes. Finding the certificate for a really large number might be far more time-consuming. Generators are fairly common, as it happens, so you might hope to get one by repeated guessing, but you also need to factor some composite numbers of almost the same length, and there is no efficient way known to do this.

3.10.1 Writing Exercise

Here is an example of a problem that can be solved reasonably quickly by hand, illustrating the method clearly. If the students have access to more computing power during the exercise, it is easy to construct examples with larger numbers.

Give a complete certificate proving that 103 is a prime number. You must find a generator of Z*_103, factor 102, prove that your generator really is a generator, and recursively certify the prime factors of 102. (Don't worry about certifying that 2 is prime.)

plaintext:  G  A  L  L  I  A
number:     6  0  10 10 8  0
add three:  9  3  13 13 11 3
ciphertext: K  D  O  O  M  D

©Kendall Hunt Publishing Company Figure 3-7: Encrypting with the Caesar cipher.

3.11 The RSA Cryptosystem

3.11.1 An Introduction to Cryptography

We'll conclude this chapter by presenting an application of number theory to cryptography, the transmission of secret messages. First, a bit of background. Our goal in setting up a cryptosystem is to take a message, called a plaintext, and convert (encrypt) it into another form, called a ciphertext, such that:

• The intended recipient will be able to convert (decrypt) it back to the plaintext, and
• Anyone else intercepting the ciphertext will find it difficult or impossible to decrypt.

There are any number of ways to do this (invisible ink, hiding the message inside something, and so forth), but we are interested in the mathematical methods known as ciphers, where we can describe the encryption and decryption as functions from strings to strings. One of the earliest ciphers is the Caesar cipher (used by Julius Caesar in the first century B.C.), which is easy to describe using number theory. Consider the 23 letters of the Latin alphabet58 and map them in the obvious way to the numbers in Z23. So A is represented by 0, B by 1, and so on until Z is represented by 22. This is a sort of cipher in itself, but it's not likely to fool anybody. Caesar then, in our terminology, encrypted his message (a sequence of numbers in Z23) by adding 3 to each number modulo 23, and converting the number back to a letter. So the plaintext

GALLIA EST OMNIS DIUISA IN PARTES TRES

becomes the ciphertext

KDOOMD HXY RPQMX GMZMXD MQ SDUYHX YUHX

(see Figure 3-7). The intended recipient could decrypt by converting to numbers, subtracting 3 from each, and converting back to letters59. This is called a single-letter substitution cipher, of a particularly simple form called a rotation. Another example with English text is the rot13 cipher60, which adds 13 to each letter, viewed as

58Latin had no W, and didn't distinguish between the letters I and J, or between U and V. K, Y, and Z were very rare, being needed only for loan-words from other languages such as Greek.

59Of course, it's simpler to just count back three letters, which is how they would have thought of it, but it will be useful for us to keep Z23 in mind.

60The Usenet newsgroup rec.humor.funny (still in existence at www.netfunny.com) has a policy of displaying particularly offensive jokes encrypted using rot13, so that the reader has to do something (typically enter a single keystroke) to decrypt it. That way the reader can't complain about being forced to read an offensive joke.

in NZ are just the usual decimal representations of the positive numbers, so we can write NZ as {1, 2, 3, ...}.

Any set of strings over D, like {046, 321334, 23, 16}, is a language over D. And for any set X of positive numbers, there is a language {w : w is the decimal representation of a number in X}, which we might also call X. For example, let E be the set of all strings denoting positive even numbers, so E = {2, 4, 6, 8, 10, ...}. Note that E is a subset of NZ. Although we could interpret the string 002 as denoting the even number 2, it is not the decimal representation of an even number and so it is not in E.

Formal language theory is the branch of mathematics where we look at languages and study how difficult their decision problems might be. In particular, we often consider whether a decision problem can be solved in a particular way. Because decision problems occur so often, and because so many other data processing problems can be rephrased as decision problems, formal language theory forms the basis of computational complexity theory, the study of what resources are required to solve particular computational problems.

Historically, much of the development of formal language theory (as well as its name) came from the problem of exactly specifying a language, so that even a machine could solve its decision problem. Linguists had long been formulating rules to describe the behavior of natural languages (such as English or Chinese), but these rules are so complex and contain so many exceptions that machine translation is still considered very difficult. But the “languages” used for programming computers could be designed so that machines could work with them. Consider some of the decisions a compiler makes when it reads your input program. Does your string finalValue meet Java’s rules for an identifier? Does every { symbol in your program have a matching } symbol? Throughout this book we’ll consider strings, languages, and decision problems as motivating examples for our mathematical techniques. At the end of the book, we’ll look explicitly at the difficulty of decision problems. In Chapter 14, for example, we'll determine exactly which languages have decision problems that can be solved using only a fixed amount of memory. And in Chapter 15 we'll take a brief look at the rest of computational complexity theory. Some sample questions will be: Which decision problems can be solved with stacks? Which can be solved by any method? (We'll prove some decision problems to be unsolvable.) And finally, what can we say about how the time and memory needed to solve a decision problem grow as we consider larger and larger input strings?

1.2.4 Exercises

E1.2.1 Determine whether each of the following statements is true or false. Here u, v, and w are string variables representing the strings cba, c, and ab respectively:

(f) w is a suffix of u.

a number in Z26. Conveniently, the encryption and decryption algorithms for rot13 are the same (Why?).
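A rotation cipher on the 23-letter Latin alphabet is a few lines of Java. This is a sketch with names of our own choosing, not the book's; spaces pass through unchanged, matching the examples above:

```java
public class LatinCaesar {
    // The 23-letter Latin alphabet: no J, V, or W.
    static final String ALPHABET = "ABCDEFGHIKLMNOPQRSTUXYZ";

    // Rotate each letter by k positions modulo 23; other characters pass through.
    static String rotate(String text, int k) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            int i = ALPHABET.indexOf(c);
            if (i < 0) out.append(c);                        // e.g. a space
            else out.append(ALPHABET.charAt(((i + k) % 23 + 23) % 23));
        }
        return out.toString();
    }
}
```

rotate(text, 3) is Caesar's cipher and rotate(text, -3) (equivalently rotate(text, 20)) undoes it; on a 26-letter alphabet a shift of 13 would be its own inverse, which is exactly the rot13 observation above.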

Single-letter substitution ciphers are convenient to use but not terribly secure, in that they can be easy for an enemy to break (decrypt without prior knowledge of the key). If you intercept a message that you know to be the ciphertext from a rotation, you could even try all the possible rotations, and trust to the fact that only one of them would look like a proper plaintext (for example, you don’t need to know much about Latin to see that the ciphertext above isn’t in Latin).

A general single-letter substitution cipher can be more difficult, but is subject to certain kinds of analysis. If you intercepted one of Caesar's letters, and you believed it to be written in a single-letter substitution cipher, a good first step might be to count frequencies — the number of occurrences of each letter in the ciphertext. You would likely find that H was the most common letter, because E is the most common letter in Latin plaintext, as it is in English61. By matching various combinations of common ciphertext letters and common plaintext letters, you would likely be able to get partial words, which would give clues as to other letters, and so forth.

Such "cryptograms" are a common recreational puzzle in English, though there are two significant differences between the puzzle situation and real-world cryptography. In the former, the designer of the puzzle will frequently choose a short message with very different letter frequencies (e.g., no E's at all) to confound potential frequency analysis. But the puzzles usually retain spaces between words and even punctuation, important clues that in a real secret message would usually be omitted. (It's not hard for an English speaker to read "WHENI NTHEC OURSE OFHUM ANEVE NTSIT...", for example, but it makes it very easy for the cryptanalyst if they see a one-letter word and know it has to be A or I.)

A natural idea to make frequency analysis more difficult is to change keys according to some fixed pattern. A simple version of this is the keyword polyalphabetic cipher. Let's return to Caesar's message. He and the intended recipient need to agree on a short keyword, say, "SPQR"62. Caesar writes this keyword repeatedly under his plaintext, like this:

GALLIA EST OMNIS DIUISA IN PARTES TRES
SPQRSP QRS PQRSP QRSPQR SP QRSPQR SPQR

and then "adds" the two letters in each column, using the addition in Z23 and the representation of letters as numbers (for example, in the first column G (6) plus S (17) is A (0) modulo 23). The ciphertext is the string of sums:

APCDCP ULN EDFCI TBOZKR CD GRLKUL NHUL

61Book I of Caesar's memoir De Bello Gallico, for example, contains 5912 E's (12.2%), 5429 I/J's (11.2%), and 4581 U/V's (9.5%). It has one K, one Y, and no Z's.

62An abbreviation for the Latin meaning "the Senate and people of Rome", the formal name of the Roman state.

The recipient writes the keyword repeatedly under the ciphertext, and subtracts the second letter in each column from the first to get the plaintext (for example, A (0) minus S (17) is G (6)):

APCDCP ULN EDFCI TBOZKR CD GRLKUL NHUL
SPQRSP QRS PQRSP QRSPQR SP QRSPQR SPQR
GALLIA EST OMNIS DIUISA IN PARTES TRES
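Both directions of the keyword cipher fit in one Java method, with sign +1 for the sender's additions and -1 for the recipient's subtractions. The names are our own; spaces pass through without consuming a key letter, matching the column layout above:

```java
public class KeywordCipher {
    static final String A = "ABCDEFGHIKLMNOPQRSTUXYZ"; // 23-letter Latin alphabet

    // Add (sign = +1) or subtract (sign = -1) the repeated keyword in Z_23.
    static String apply(String text, String key, int sign) {
        StringBuilder out = new StringBuilder();
        int k = 0;                                   // position in the keyword
        for (char c : text.toCharArray()) {
            int i = A.indexOf(c);
            if (i < 0) { out.append(c); continue; }  // spaces don't use a key letter
            int shift = A.indexOf(key.charAt(k % key.length()));
            out.append(A.charAt(((i + sign * shift) % 23 + 23) % 23));
            k++;
        }
        return out.toString();
    }
}
```

apply(plaintext, "SPQR", +1) reproduces the ciphertext in the text, and apply(ciphertext, "SPQR", -1) recovers the plaintext.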

Note that in general, different copies of the same letter in the plaintext will be encrypted into different letters in the ciphertext (for example, the four I’s turned into two C’s, a B, and a Z). This will tend to make the ciphertext letters more evenly distributed, which makes analysis harder, but

there is still hope, especially if the cryptanalyst can guess the length of the keyword. The longer the keyword, the harder the analysis. How about using an entire long text as the keyword, so it need never repeat?

GALLIA EST OMNIS DIUISA IN PARTES TRES
ARMAUI RUM QUECA NOTROI AE QUIPRI MUSA
GRYLEI XOG FHRLS QYPBHI IR GUBKXC GNYS

This is a lot better, but a determined analysis on enough ciphertext might still be able to break it, especially knowing that both the plaintext and the key were Latin text (for example, you would expect a lot of letters in each to be E's, I's, and U's, so the sums of those letters should come up a lot).

The ideal thing would be to have a key that was completely random. Simply generate a sequence of random letters (perhaps by throwing a 23-sided die) as long as the desired message, and use that as the key. The resulting cryptosystem is unbreakable in principle63. Because any possible string of the right length could be generated by the random process, and no key string is any more likely than any other, no ciphertext is any more likely than any other. Any plaintext could produce any possible ciphertext, so that knowing the ciphertext gives you no help in determining the plaintext. Of course, if you use the same random key for two different plaintexts, this is no longer the case. (For example, what happens if you subtract one ciphertext from the other?) For this reason, this cryptosystem is called a one-time pad.

You may have noticed a significant logistical problem with the one-time pad. If Caesar wants to use it to send a message from Rome to Queen Cleopatra in Egypt, say, he has to somehow see that he and Cleopatra have the same random key. But sending Cleopatra the key is exactly as difficult as sending her a message in the first place. For this reason, the one-time pad is only practical when you know in advance that you want someone to send you a message later, and you can give them the key before they go.

63This is an example of an argument in information theory, a field that is the subject of Chapter 13 of this book. You can get an excellent introduction to information theory by reading The Mathematical Theory of Communication by Shannon and Weaver, the 1948 book that first set out the theory.
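A one-time pad over the same 23-letter alphabet is a short Java sketch (the names are our own; Java's Random stands in for the 23-sided die, and a real pad would need a cryptographically strong source of randomness):

```java
import java.util.Random;

public class OneTimePad {
    static final String A = "ABCDEFGHIKLMNOPQRSTUXYZ"; // 23-letter Latin alphabet

    // Generate a random key exactly as long as the message: the 23-sided die.
    static String randomKey(int length, Random die) {
        StringBuilder k = new StringBuilder();
        for (int i = 0; i < length; i++) k.append(A.charAt(die.nextInt(23)));
        return k.toString();
    }

    // Add (sign = +1) or subtract (sign = -1) the key, letter by letter, in Z_23.
    static String combine(String text, String key, int sign) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            int t = A.indexOf(text.charAt(i)), k = A.indexOf(key.charAt(i));
            out.append(A.charAt(((t + sign * k) % 23 + 23) % 23));
        }
        return out.toString();
    }
}
```

Encrypting with combine(message, key, +1) and decrypting with combine(ciphertext, key, -1) round-trips any message, and each key is to be used exactly once.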

You can give up some security and use a key that isn't quite random. For example, in the example above the "key" was in effect "use the Aeneid, starting at line 1 of Book 1", which transmits a long key as long as the recipient can find a copy of the Aeneid64. A better approach, at a higher level of technology of course, was used by the Germans in World War II65. The Enigma machine was a device where typewriter keys were connected to electric lights through wires that ran through a collection of wheels. Push a key for a plaintext letter, and the light for the ciphertext letter lights up. After each keystroke, the wheels move so a different alphabet substitution is used for the next letter. The sender and receiver have to agree on the initial setup of the wheels for each day's transmissions. In effect, the interaction of the wheels made for a sort of keyword cipher66 with a very long key. What the Germans didn't know, however, was that the British had obtained67 a copy of the machine and were able to read the messages if they could guess or deduce the initial setup. A team of experts68 used a combination of brute-force search (inventing machines which were among the precursors of electronic computers) and exploitation of German procedural errors to get many of these daily initial setups, revealing a great deal of militarily important information to the Allies.

3.11.2 Public-Key Cryptography

With the advent of computers, more complicated ciphers became practical, as did more complicated techniques of analysis. Typically, they still work by having a relatively short secret piece of information, the key, known to the sender and receiver but kept secret from potential eavesdroppers. Ideally, the cipher will:

• be sufficiently hard to break in the absence of the key, and
• have so many different possible keys that trying all possible keys is impractical (remember that in general only the correct key will yield intelligible output).

Getting the key to the receiver, of course, is still a major problem. A novel solution was developed in the 1970's, however, called public-key cryptography69. Here's how it works. Suppose Caesar wants to send a secret message w to Cleopatra, and has a reasonably powerful computer available. He looks up Cleopatra's public key k in some kind of public directory, and computes his ciphertext c as some function of w and k, called e(w,k). Cleopatra has her own private key s that she has kept secret, and applies a decoding function d to c and s, getting back d(c,s) = w. The important properties of the system are:

64Which might have been difficult for Julius Caesar, as the Aeneid was written well after his death.

65An excellent book on the history of cryptography, especially in World War II, is The Codebreakers by David Kahn.

66Although the individual ciphers for each letter were not rotations, as they were in our example.

67From some Polish researchers who had, amazingly, reverse-engineered it based on documents stolen from the Germans by the French.

68Including Alan Turing, about whom we'll hear more in Chapter 15.

69The concept of public-key cryptography was first invented by Diffie and Hellman. We'll describe the particular system in widest use today, called "RSA" after its inventors: Rivest, Shamir, and Adleman.

• The functions e and d are easy to compute.
• It is easy to generate pairs of keys k and s such that for any message w, d(e(w,k),s) = w.
• It is difficult for someone who knows e(w,k) and k, but not s, to determine w.

The RSA cryptosystem is an implementation of public-key cryptography using number theory. Here is how Cleopatra would get her public and private keys. She first generates two large, random prime numbers p and q, by generating random numbers of the desired size and testing them for primality70 using the methods discussed in Excursion 3.10. What's the right size? She is going to keep p and q secret and publish n = pq, so the two primes must be so large that an enemy doesn't have time to run any factoring algorithm71 on n — 200 digits each for p and q is currently considered secure72.

Remember that by the Chinese Remainder Theorem, the multiplicative group Z*_n is a direct product of Z*_p and Z*_q, with φ(n) = (p−1)(q−1) elements. We're going to use the fact that any element a in Z*_n satisfies the congruence a^φ(n) = 1 (mod n). This means that the value of a^b modulo n depends only on the value of b modulo φ(n). Cleopatra knows the values of p and q and can compute φ(n) easily, while an enemy has no obvious way to do this.

Cleopatra's encoding key consists of n and e, where e is any73 number that is relatively prime to φ(n). Using the Euclidean Algorithm as in the proof of the Inverse Theorem, she finds an inverse for e modulo φ(n), and stores it secretly as d.

Caesar's instructions are to convert his message into numbers in Z_n by any method known to both (for example, "GALLIAESTOMNIS..." could just become the decimal "06001010080004...", using the representation of letters by numbers above and just concatenating the decimal representations). Each message w, then, is a single element of Z_n. The encrypting algorithm is to convert w to c = w^e, with the computation being done in Z_n by repeated squaring. The decrypting algorithm, which requires Cleopatra's secret information, is to convert c to c^d, also in Z_n. By the choice of d and e, Cleopatra winds up with w^(ed), which is equal to w in Z_n because ed = 1 (mod φ(n)). An enemy could break the cipher by factoring n and thus computing φ(n).
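The whole scheme fits in a few lines of Java using BigInteger, which supplies repeated-squaring exponentiation (modPow) and the Euclidean-algorithm inverse (modInverse). The toy parameters here (p = 61, q = 53, so n = 3233 and φ(n) = 60 · 52 = 3120) are for illustration only; real primes would be hundreds of digits:

```java
import java.math.BigInteger;

public class ToyRSA {
    static final BigInteger N = BigInteger.valueOf(3233);   // n = 61 * 53, published
    static final BigInteger E = BigInteger.valueOf(17);     // public e, coprime to 3120
    static final BigInteger D =                             // Cleopatra's secret d
        E.modInverse(BigInteger.valueOf(3120));             // e*d = 1 (mod phi(n))

    static BigInteger encrypt(BigInteger w) { return w.modPow(E, N); } // c = w^e mod n
    static BigInteger decrypt(BigInteger c) { return c.modPow(D, N); } // w = c^d mod n
}
```

Anyone can run encrypt, since N and E are public; only the holder of D (computed from φ(n), which requires knowing the factors of N) can run decrypt.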

It is conceivable that there is another way to break the cipher without factoring n, which could be implemented quickly. (At least one other proposed public-key cryptosystem, based on a different hard problem, has failed in this way.) No one knows how such an alternate attack on RSA might work, and there is some hope that if one exists it might first be discovered by a mathematician who would publish it, rather than a cryptanalyst who would keep it secret and exploit it. The same holds for factoring — an

70There is a very tiny chance that these primality tests will certify a composite number as prime, but this can be made as small as the chance of an enemy just guessing the correct secret key, and so needn't be worried about.

71This means that she has to know how much computing power and what factoring algorithms her enemy might have. She might want to check the latest results of the public factoring contests, to see what the state of the art is. But an enemy who had a secret, dramatically better factoring algorithm might be able to read her messages.

72About one in every 500 200-digit naturals is prime, so it shouldn't take too long to find two of them by guessing and testing random naturals of that length.

73The number of such e less than φ(n) is φ(φ(n)), of course, and depends on the prime factorization of φ(n), about which we know very little. We'll look at this number in the Exercises, but note that should Cleopatra have any great trouble finding such an e by trying random numbers, she could always go find a different p and q. Of course testing that e and φ(n) are relatively prime is easy for her by the Euclidean Algorithm, since she knows φ(n).

advantage of RSA is that its security depends on mathematical facts and conjectures that are open to public discussion.

RSA serves as a component of a cryptosystem called "PGP" (for "Pretty Good Privacy") that has become widely used on the Internet. Because the computations of RSA take unacceptably long for routine communications, the PGP system uses RSA to send a short key which is then used in an ordinary cipher. The cipher is only "pretty good" because the numbers involved are small enough that a determined computer attack could break the cipher by known methods. But computer privacy activists argue that if a significant fraction of routine email is encrypted with even a pretty good cipher, anyone wanting to monitor communications on the Internet would be stymied by the need to determine which of millions of encrypted messages were worth decrypting. Thus they put a "PGP public key" (represented as a few lines of random-appearing text characters) at the end of all their messages74.

3.11.3 Exercises

E3.11.1 Decrypt the following Latin messages from the original Caesar cipher, as described above:

(a) FRKMYR, HUKR XZP.
(b) H SOZUMEZX ZQZP.
(c) HY YZ, EUZYH?
(d) ZHQM, ZMGM, ZMFM.
(e) URPDQM MYH GRPZP.

E3.11.2 Show that the function that takes each element x of Z_m to the element ax + b is a bijection if a is relatively prime to m and b is any number. What is the inverse of this function? What happens if a and m have a common factor?

E3.11.3 (uses Java) Write a method rotate(String w, int k) that outputs an encrypted version of w using a single-letter substitution cipher as follows. Small letters in w should be converted to other small letters obtained by adding k to their numerical value (a = 0, z = 25) modulo 26. Capital letters should be converted to capital letters using the same system. Other characters should be unchanged. Then write a method unrotate(String w, int k) so that the rotate and unrotate functions for a fixed k are inverses. (Hint: The easiest way to write unrotate is to have it make a single call to rotate.)

E3.11.4 The ASCII encodings for the capital letters A through Z are the numbers 65 through 90 (in decimal notation). Encode the message "ALOHA" as a sequence of five numbers using ASCII, and then encode this sequence in RSA using n = 143 and e = 113.

E3.11.5 Using n = 143 and e = 113 as in Exercise 3.11.4, decrypt the sequence (65, 98, 65, 66, 65, 77) into a string of ASCII letters. (You should use your ability to factor n to find the "private" decryption key.) Repeat for the sequence (66, 118, 15, 50, 18, 45, 112, 71, 128, 49, 114). Would a long message in this cipher be difficult to decrypt without knowledge of n and e?

74For more on the cryptography policy debate in the USA at the turn of the millennium, see Privacy on the Line by Diffie and Landau.

yet. We'll do this in Problem 4.1.5.

• If y > 0, then define x + y to be the successor of x + pred(y).

Both clauses of these definitions match what we know about how addition works, which is good, but on the other hand we've defined addition in terms of addition, which looks a bit fishy. We can rewrite the definition as the following recursive algorithm:

static natural plus (natural x, natural y)
{// Returns x+y.
    if (zero(y)) return x;
    else return successor (plus (x, pred(y)));}

The recursive-algorithm form of the fifth Peano axiom tells us that this procedure will terminate. Intuitively, this is because the second argument keeps getting smaller with each call until it reaches zero, in which case there's no more recursion.

Note that it's not obvious from the definition how to prove even obvious facts like ∀x : 0 + x = x. None of our techniques from the predicate calculus will help us here, until we have some knowledge about the naturals and about addition that imply this fact (and commutativity of addition in general, and all the other familiar properties of addition and multiplication). We'll need the method of mathematical induction to prove these6.

The definition of multiplication is quite similar to that of addition:

• For any x, x · 0 = 0, and
• If y > 0, x · y = (x · pred(y)) + x.

Now multiplication is defined in terms of both itself and addition, but we know from above that the addition algorithm always terminates.

This results in the following recursive Java-syntax method:

static natural times (natural x, natural y)
{// Returns x times y.
    if (zero(y)) return 0;
    else return plus (times (x, pred(y)), x);}

Again the recursion will continue until the second argument becomes zero. As long as we know that plus is defined for all inputs that are naturals, so is times. We know that, but to prove it formally we need some form of the fifth Peano axiom — that by repeatedly taking predecessors of a natural we will eventually get to zero. There are number systems that satisfy the first four Peano axioms but not the fifth7. Such systems aren't just curiosities — their existence tells us something about the axioms. If it were possible to prove the fifth axiom starting from the first four, any system satisfying the first four would have to satisfy the fifth as well, unless it was an inconsistent system in which you could prove anything. So we can conclude that no such proof of the fifth axiom exists8.

6Can you start by proving ∀x : 0 + x = x? We'll prove the rest of the standard properties in Section 4.6.

Once you have these definitions, all sorts of properties like the commutativity and associativity of addition and multiplication can be proved from the definitions, by mathematical induction9. This raises the question, of course, of whether you're now allowed to use a fact like "2 + 2 = 4" without proving it! It's worthwhile to prove something like that once, to see how it works, but after this section you should assume the well-known facts of arithmetic to be true (unless proving them is somehow the point of an Exercise or Problem).

4.1.3 Other Recursive Systems

Recursive definition is not restricted to the naturals. We can and will define a variety of mathematical systems in this way. We give some base objects, a method for constructing new objects from old ones, and an assertion that the only objects of the type are the ones obtained from the bases by using the rules. Once we have such a definition, we can define operators on the new type as we did for the naturals above, and prove statements about all the objects of the type by a general form of mathematical induction. For example, just as all naturals are derived from zero by taking successors, all strings are derived from the empty string by appending letters. We have "Peano axioms" for the set of strings on a given alphabet, which correspond to the axioms for naturals:

• λ is a string.
• If w is a string, and a is a letter, there is a unique string wa.
• If va = wb for some strings v and w and some letters a and b, then v = w and a = b.
• Any string except λ can be written as wa for a unique string w and letter a.
• Every string is derived from λ by appending letters as above.

7For example, in Problem 4.1.1 we'll consider the ordinary naturals together with a new number ω and numbers ω + i for every positive or negative integer i. You'll still have to define addition and multiplication on these "numbers", but once you've done so you can go ahead and start doing number theory.

8Along the same lines, logicians also work with number systems called non-standard models of arithmetic that only satisfy weaker forms of the fifth axiom. The fifth axiom is somewhat vague — when you use it in proofs it really says only that the induction rule works for predicates P(x) that you can write down, so that there could be extra numbers out there such that for some reason, you can't express the fact that they're extra. Of course, such systems are rather beyond the scope of this book.

9Again, we'll do this in Section 4.6. Hofstadter does it in his Chapter VIII, more formally than we will.

Later in this chapter we'll define some operations on strings by defining them first on λ and then on wa (for arbitrary strings w and letters a) in terms of their value on w. First, though, we want to study the proof method of mathematical induction in more detail, in the special case of proving statements for all naturals.
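As a preview, here is what such a definition looks like in Java for one operation, reversal, defined by reverse(λ) = λ and reverse(wa) = a followed by reverse(w). The method name and the use of Java's built-in String type are our own:

```java
public class RecStrings {
    // Decompose a nonempty string as wa: w is everything but the last letter,
    // a is the last letter.  This mirrors the fourth "Peano axiom" for strings.
    static String reverse(String s) {
        if (s.isEmpty()) return "";                  // the lambda base case
        String w = s.substring(0, s.length() - 1);   // the unique w
        char a = s.charAt(s.length() - 1);           // the unique last letter a
        return a + reverse(w);                       // reverse(wa) = a . reverse(w)
    }
}
```

The recursion terminates for the same reason plus did: each call strips one letter, and the fifth string axiom guarantees we reach λ.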

4.1.4 Exercises

E4.1.1 Prove from the Peano axioms that successor(successor(successor(0))), usually called 3, is a natural.

E4.1.2 Prove from the definition of addition that 2 + 2 = 4, where "2" denotes the output of successor(successor(0)) and "4" denotes that of successor(successor(successor(successor(0)))).

E4.1.3

(uses Java) Write a pseudo-Java method boolean isThree (natural x) that returns true if and only if x is equal to 3. You should use the given operations for the natural data type. Make sure that your method cannot ever call pred on a zero argument.

E4.1.4

Write the expression (2 + (2 · (3 + 1))) · (4 + 0) in terms of the methods plus and times in this section. You may use the ordinary names for the numbers.

E4.1.5 Explain informally why the statement ∀x : [(x = 0) ∨ ∃y : x = successor(y)] follows from the fourth and fifth Peano axioms.

E4.1.6 We've seen two other number systems that are in some ways like the naturals, but have only finitely many "numbers". Which of the Peano axioms are true for each of these systems?

(a) The numbers modulo m (for any m with m > 1), where the numbers are {0,1,...,m—1} and the successor operation adds 1 modulo m.

(b) The "threshold-t" numbers, defined in Exercise 3.6.9, which have numbers {0,1,...,t} and a successor operation that is the same as the usual one, except that the successor of t is t.

E4.1.7

Suppose we make Peano axioms for the set Z of all integers, by saying that 0 is an integer and that every integer x has both a unique successor and a unique predecessor, each different from x. Our "fifth axiom" could then say that every number is reachable from 0 by taking predecessors or successors. Clearly Z obeys these axioms. Is it the only number system that does?

(uses Java) Write a static pseudo-Java method boolean equals(natural x, natural y) that returns true if and only if x and y are the same natural. Of course your method should not use the == operator, and should return the correct answer given any two natural inputs.

E4.1.9 (uses Java) Give a recursive definition of the exponentiation operation, so that power(x, y) returns xʸ for any naturals x and y. Write a recursive static pseudo-Java method implementing this definition. You may use the methods defined in the section.

E4.1.10

(uses Java) Give a recursive definition for the evenness property of naturals, without using addition (except successor) or multiplication. Write a static recursive pseudo-Java method boolean even(natural x) that returns true if and only if x is even, and uses only the zero and pred methods defined in this section.

©Kendall Hunt Publishing Company Figure 4-1: A strange number system. Arrows point to successors. (The figure shows the ordinary naturals 0, 1, 2, ... alongside the elements ..., ω−2, ω−1, ω, ω+1, ω+2, ....)

4.1.5 Problems

P4.1.1 Consider a number system that contains all the ordinary non-negative integers, a new element ω, and an element ω + i for every integer i (positive, negative, or zero), as illustrated in Figure 4-1. Show that this system satisfies the first four Peano axioms. Why doesn't it satisfy the fifth?

P4.1.2

Can you define addition and multiplication for the number system of Problem 4.1.1 in a way that makes sense?

P4.1.3

Prove that Versions 2 and 3 of the fifth Peano axiom are logically equivalent.

P4.1.4

Prove that the Well-Ordering Principle (Version 5 of the fifth Peano axiom) is equivalent to one of the other versions of the fifth Peano axiom (you may choose which).

P4.1.5 (uses Java) Give a recursive definition of the "less than" operator on numbers. (You may refer to equality of numbers in your definition.) Write a static pseudo-Java method boolean isLessThan(natural x, natural y) that returns true if and only if x < y and uses only our given methods. (Hint: Follow the example of the functions plus and times in the text.)

P4.1.6 (uses Java) Give a recursive definition of, and a recursive static method for, the natural subtraction function, with pseudo-Java header natural minus(natural x, natural y). On input x and y this function returns x − y if this is a natural (i.e., if x ≥ y) and 0 otherwise.

P4.1.7 Following Exercise 4.1.7, create a set of axioms that exactly define the set Z of all integers.

P4.1.8

(uses Java) As in Problems 4.1.5 and 4.1.6, write static pseudo-Java methods natural quotient(natural x, natural y) and natural remainder(natural x, natural y) that return x/y and x%y respectively, as long as x is a natural and y is a positive natural. You may use the other methods defined in this section and its Problems.

P4.1.9

(uses Java) Let’s define a stack as follows!: e The empty stack is a stack.

e If S is astack and 2 is a thing, S.push(x) e The stacks S.push(x) and T.push(y) and y are equal.

is a stack.

are equal if and only if S and T are equal and x

‘In real Java the Stack class is parametrized, using generics, but here we will define a pseudo-Java stack whose elements are from the class thing 4-8

e Every stack is derived from the empty stack by pushing things as above. Here are two problems using this definition:

(a) Explain why we can define a pop operation that returns a thing when called from any nonempty stack.

(b) Assume now that we have a pseudo-Java Stack class with instance methods boolean empty(), void push(thing x), and thing pop(). Write an instance method boolean equals(Stack T) that returns true if and only if the stack T is equal to the calling stack. Make sure that your method has no side effect, that is, make sure that you leave both T and the calling stack just as you found them.

P4.1.10

(uses Java) A queue is a data structure where we may enqueue elements on one end and dequeue them from the other.

(a) Give a recursive definition of a queue of thing elements on the model of the definition of a stack in Problem 4.1.9.

(b) Give a recursive definition of the dequeuing operation. That is, define the result of the method call Q.dequeue() in terms of your recursive definition of queues in part (a).

(c) Write a recursive instance method boolean equals(Queue Q) for a pseudo-Java Queue class that returns true if and only if the calling queue and Q are equal by your definition in part (a). You may use the instance methods boolean empty(), void enqueue(thing x), and thing dequeue(). Your method should have no side effects, meaning that both queues should be the same before and after your method is run.

E1.2.10 (a) boolean equals(String u, String v), returning true if and only if u and v are equal strings,

(b) boolean prefix(String u, String v), returning true if and only if u is a prefix of v, and

(c) boolean suffix(String u, String v), returning true if and only if u is a suffix of v.

Let A, B, and C be any three languages over the same alphabet Σ. Suppose that whenever we concatenate a string from A with a string from B, in either order, the result is in C. Is it necessarily true that the reversal of any string in C is also in C? Either explain why it is, or give an example of three languages A, B, and C for which it isn't.

Problems

P1.2.1 Let A be the set {a}. What is the set A*? If u and v are any two strings in A*, explain why the statement uv = vu must be true.

P1.2.2 Let A be any finite alphabet and let u and v be strings over A.

(a) Suppose that u is a prefix of v. Explain, using the definition of "prefix", why there must be some string w such that uw = v.

(b) Suppose that uw = v for some string w. Explain, using the definition of "prefix", why u must be a prefix of v.

(c) Suppose u is a suffix of v. Following part (b), what can we say about the existence of a string in what concatenation equation with u and v?

(d) Suppose u is a substring of v. How can we describe this relationship in terms of a concatenation equation?

P1.2.3

What should be the alphabet used to model both positive and negative decimal integers as strings? Does every string over this alphabet denote an integer?

P1.2.4 In each of these cases, determine whether the given data item should be modeled as a set or as a sequence. Justify your answer.

(a) The choice of optional features ordered on a new car.

(b) The roads to be used to drive from one place to another.

(c) The sales totals for each of the years from 1992 to 1996.

(d) The years in which the sales exceeded a million dollars.

(e) The squares on a chessboard.

P1.2.5

Let the alphabet Σ be {a, b, c}. Let the language X be the set of all strings over Σ with at least two occurrences of b. Let Y be the language of all strings over Σ that never have two occurrences of c in a row. Let Z be the language of all strings over Σ in which every c is followed by an a. (Recall that any string with no c's is thus in Z.)

(a) List the three-letter strings in each of X, Y, and Z. The easiest way to do this may be to first list all 27 strings in Σ³ and then see which ones meet the given conditions.

4.2 Excursion: Recursive Algorithms

In this Excursion we have some examples to familiarize or refamiliarize you with the notion of recursive algorithms, followed by an Exercise where you will argue that a particular recursive algorithm does the right thing. To begin, here is an example of a real Java method to go in a class called Stack¹¹. This method pops all the elements from the calling Stack object. It uses two other Stack methods: pop removes the top element from the stack and isEmpty tests the stack for emptiness:

void clear() {// Pops and discards all elements from calling Stack
    if (!isEmpty()) {pop(); clear();}}

So as long as the calling Stack isn't empty, this procedure will pop off the top element, call another version of itself to clear what's left, and stop with the stack empty. Once a version is called with an empty stack, it does nothing and finishes (it's a "no-op"). The version that called it is then done, so it finishes, and control passes through all the remaining stacked versions until the original one is reached and finishes, with the stack now successfully cleared.

There is of course a normal iterative version of this same procedure that performs the same pops in the same order: its key statement is while (!isEmpty()) pop();. In fact, any recursive algorithm that only calls itself once, and does so at the end (called a tail recursion), can easily be converted to an iterative program with a loop.
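Both versions are easy to try in real Java. Here is a self-contained sketch that stands java.util.ArrayDeque in for the book's Stack class (an assumption; the pseudo-Java Stack itself is defined elsewhere):

```java
import java.util.ArrayDeque;

// A tiny stack wrapper showing the recursive clear and its iterative twin.
public class ClearDemo {
    private final ArrayDeque<Integer> elems = new ArrayDeque<>();

    public void push(int x)  { elems.push(x); }
    public void pop()        { elems.pop(); }
    public boolean isEmpty() { return elems.isEmpty(); }

    // Tail-recursive version, as in the text.
    public void clear() {
        if (!isEmpty()) { pop(); clear(); }
    }

    // Iterative equivalent: the key statement from the text.
    public void clearIteratively() {
        while (!isEmpty()) pop();
    }

    public static void main(String[] args) {
        ClearDemo s = new ClearDemo();
        for (int i = 0; i < 5; i++) s.push(i);
        s.clear();
        System.out.println(s.isEmpty()); // prints true
    }
}
```

Either method leaves the stack empty; whether the recursive version is compiled into a loop depends on the compiler.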

Recursion doesn’t allow us to do anything we couldn’t do already without it, but it often gives a simpler way of writing down an algorithm. (You'll see many more examples in an algorithms

course.)

In many programming languages, recursive programs are less efficient than the equivalent iterative programs because the compiler doesn’t convert the recursive code to machine code in the most efficient possible way. Other languages, like the Lisp family, support recursion very well. In general, the smarter your compiler, the greater the incentive to use simple, readable, verifiable recursive

algorithms in place of iterative ones that might be slightly faster. In Section 4.1 we saw

some

pseudo-Java

examples

of recurs

code,

definitions of the plus and times methods on natural primitives.

example, was defined to be x if y was zero, and to be successor This is a tail recursion much like the clear method above.

implementing

The result of plus(x,

(plus

(x,

y), for

pred(y))) otherwise.

If we call the method to add 3 to z, this

As usual, we will assume that the rest of this Stack class has already been written elsewhere.

4-10

the recursive

method makes a call to another version of plus that adds 2 to x. That in turn calls another version that adds

1, which calls another

on that adds 0.

The

last version returns a, the next-to-last

then returns x + 1, the second version returns x + 2, and finally the original version returns x +3. How do we know that a recursive algorithm does what it should? rules:

It must obey the following three

1. There must be a base case in which the algorithm does not make a recursive call. have the correct behavior in this base case.

att

very recursive call has the correct behavior (e.g., it returns the correct value), original call has the correct behavior.

It must then the

3. The recursion must be grounded, which means that there cannot be any infinite sequence of recursive calls. That is, any sequence of recursive calls must eventually end in a base case.

These rules allow us to separate the groundedness of the recursion from its correctness. If we can show that the algorithm follows Rules 1 and 2, then it will have the correct behavior whenever it finishes, since the base case returns correct answers and each succeeding case returns correct answers because its recursive calls give it correct answers. Rule 3 takes care of the only other way it could go wrong, by entering an infinite sequence of recursive calls and never returning an answer at all. Let’s apply these rules to the clear method above. The base case is when the stack is already empty. The method obeys Rule 1 because if the stack is empty, it returns and does nothing, and this behavior is correct because the method terminates with an empty stack. It also clearly obeys Rule 2, because if the stack is not empty, the pop call will success and it will make a recursive call to clear, which by the assumption of Rule 2 empties the stack.

Why does it obey Rule 3? Here we need an assumption about stacks, in particular that any stack contains some finite number of elements. Because of the pop call, the recursive call to clear operates on a stack with fewer elements than the stack that was the subject of the original call. Further calls will be on stacks with fewer and fewer elements, until we eventually reach an empty stack and we are in the base case.

This use of the word "eventually" is of course imprecise, drawing on our intuition about what the word "finite" means. In Section 4.1 we saw the Peano Axioms, which formalize this intuition: one form of the fifth Peano Axiom says exactly that a recursive algorithm of this kind will eventually terminate.
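The same three rules can be checked against the recursive plus from Section 4.1. Here is a real-Java model, representing a natural as a non-negative int (an assumption, since the book's natural is a pseudo-Java primitive type):

```java
// Model of the definition: plus(x, 0) = x, plus(x, S(y)) = S(plus(x, y)).
public class Peano {
    static int successor(int x) { return x + 1; }

    static int pred(int x) {
        if (x == 0) throw new IllegalArgumentException("pred of zero");
        return x - 1;
    }

    static int plus(int x, int y) {
        if (y == 0) return x;               // base case: Rule 1
        return successor(plus(x, pred(y))); // each call shrinks y: Rule 3
    }

    public static void main(String[] args) {
        System.out.println(plus(3, 4)); // prints 7
    }
}
```

The recursion is grounded because each call strictly decreases y toward the base case at zero.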

In the remainder of this chapter we will consider a wide variety of examples of proof by induction. Many of these can be viewed as arguments for the correctness of a recursive algorithm, like those in this excursion.

Finally, we turn to the example algorithm to be analyzed. It is claimed that: "Given a positive number as input, this algorithm will output a sequence of primes that multiply together to equal the input." If you believe that statement (which essentially just says that this algorithm is correct) then you must believe the "existence half" of the Fundamental Theorem of Arithmetic¹².

void factor(natural x) {// Prints sequence of primes
    // Special cases:
    natural d
    if (x ...

(Suppose that x % d == 0 and that d > 1 and e < d. What is x % e?)

Why does the method obey Rule 3, that is, why must it terminate given any natural as its input? (Hint: Why could we guarantee that the Euclidean Algorithm always terminates?)
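A recursive trial-division factorizer along these lines can be sketched in real Java (the helper leastDivisor and all details below are assumptions, not necessarily the book's code):

```java
// Hedged sketch: prints a sequence of primes whose product is x.
public class Factor {
    // Least divisor greater than 1; for x > 1 this is always prime.
    static int leastDivisor(int x) {
        for (int d = 2; d * d <= x; d++)
            if (x % d == 0) return d;
        return x; // x itself is prime
    }

    static void factor(int x) {
        if (x <= 1) return;          // special case: the empty product is 1
        int d = leastDivisor(x);
        System.out.print(d + " ");
        factor(x / d);               // grounded: x/d < x whenever d > 1
    }

    public static void main(String[] args) {
        factor(60); // prints 2 2 3 5
        System.out.println();
    }
}
```

Rule 3 holds because each recursive call is on x/d, which is strictly smaller than x.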

¹²This is similar to the way we proved the Inverse Theorem in Section 3.3, by giving an algorithm that provided an inverse whenever the theorem said that one exists.


4.3 Proof By Induction for Naturals

4.3.1 What It Is and How It Works

We now come to our first detailed look at mathematical induction. Mathematical induction is a general technique for proving statements about all elements of a data type, and can be used whenever that data type has a recursive definition. We're going to begin with ordinary induction, the simplest kind, which allows us to prove statements about all naturals. Later in this chapter and the next we'll learn methods for several other data types with the same kind of definition.

Formally, mathematical induction is just another proof rule like all our propositional and predicate calculus rules, because it says that if you have proved certain statements you are allowed to conclude a certain other statement. Our two goals in this section are to learn when and how to use this rule, and to convince ourselves that it is valid (that things proved with it are actually true). First off, let's state the proof rule:

• Let P(x) be any predicate with one free variable of type natural.

• If you prove both P(0) and ∀x: P(x) → P(x+1),

• Then you may conclude ∀x: P(x).

Let's try a simple example. Define P(x) to be the statement "the sum of the first x odd numbers is x²." (A bit of experimentation, like 1 = 1², 1 + 3 = 2², 1 + 3 + 5 = 3², suggests that this might be a general rule.) If we remember various high-school rules about summing arithmetic progressions, we know how to verify this fact, but let's use our new technique to give a formal proof.

First, we are told to prove P(0), which says "the sum of the first zero odd numbers is zero". True enough, once we remember that all vacuous sums are zero just as all vacuous products are one. Next we are given a ∀ statement to prove, so we let x be an arbitrary natural and set out to prove P(x) → P(x+1). To do this by a direct proof we must assume P(x), that "the sum of the first x odd numbers is x²", and prove P(x+1), that "the sum of the first x + 1 odd numbers is (x+1)²".

How can we do this? The key point is to notice that the second sum is equal to the first sum plus one more term, the (x+1)'st¹⁵ odd number, 2x + 1. So the sum we are interested in is equal to the first sum plus 2x + 1.

We apply the inductive hypothesis by using P(x) to say that the first sum is equal to x². Then it follows that the second sum is x² + 2x + 1 = (x+1)², which is just what we wanted to prove it to be. This example illustrates several common features of inductive proofs:

¹⁵How do we know that the (x+1)'st odd number is 2x + 1? The first odd number is 1, the second is 3, and the third is 5. It appears from these three examples that the i'th odd number is 2i − 1, from which we could conclude that the (x+1)'st odd number is in fact 2(x+1) − 1 = 2x + 1. To be sure that this rule always holds, of course, we would need to do yet another mathematical induction. Unfortunately, we have a technical problem in that the method described above would require us to talk about "the 0'th odd number" to prove the base case. We'll deal with this technicality in the next section.


• We first have to prove a base case P(0), which is often something totally obvious, as it was here. It's important that we substitute x = 0 into P(x) carefully to get the correct statement P(0).

• Then we do the inductive step, by proving the quantified statement ∀x: [P(x) → P(x+1)]. Following the general rule for ∀'s, we let x be an arbitrary natural, assume that P(x) is true, and try to derive P(x+1). P(x) is called the inductive hypothesis, and P(x+1) is called the inductive goal.

• The best thing we usually have going for us is that P(x) and P(x+1) are similar statements. In this example, the two sums differed only in that the sum in P(x+1) had one extra term. Once we knew what that term was, the inductive hypothesis told us the rest of the sum and we could evaluate the whole thing.

• Once we have proved both the base case and the inductive case, we may carefully state our conclusion, which is "∀x: P(x)".

One mental barrier that comes up in learning induction is that P(x) is a statement, not a term. Many students have a hard time avoiding a phrasing like the following for the third bullet above: "P(x+1) is equal to P(x) plus the extra term..." This is clearly a type error, and is bound to get you into trouble when you have to think of P(x) as a predicate later. It may help to give a name to one of the terms in P(x) to avoid this problem and make the statements easier to talk about. (In the example above, define S(x) to be "the sum of the first x odd numbers" and rewrite P(x) as "S(x) = x²".)
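That renaming also makes a computational sanity check easy to write; the following real-Java sketch verifies S(x) = x² for small x (evidence for the small cases only, not a substitute for the induction proof):

```java
public class OddSums {
    // S(x): the sum of the first x odd numbers 1, 3, 5, ...
    static int s(int x) {
        int sum = 0;
        for (int i = 1; i <= x; i++)
            sum += 2 * i - 1;   // the i'th odd number is 2i - 1
        return sum;
    }

    public static void main(String[] args) {
        for (int x = 0; x <= 10; x++)
            if (s(x) != x * x)
                throw new AssertionError("S(" + x + ") != " + x + "^2");
        System.out.println("S(x) = x^2 checks out for x = 0..10");
    }
}
```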

4.3.2 Examples of Proof By Induction

Let’s try some more examples. How many binary strings are there of length n? the answer is 2", so let’s let P(n) be the statement “There are exactly 2” binary n.” As usual, P(0) is pretty easy to prove: “There are exactly 2° binary strings should know by now that 2° = 1, and that there is exactly one string of length string.

We’ve seen that strings of length of length 0.” We zero, the empty

Now we assume “There are exactly 2” strings of length n” and try to prove “There are exactly 2”+1 strings of length n +1”. This means that we need some method to count the strings of length n+ 1, preferably by relating them somehow to the strings of length n, the subject of the inductive hypothesis. Well, each string of length n + 1 is obtained by appending a letter (0 or 1) to a string of length n. If there is no double counting involved (and why isn’t there?) this tells us that there are exactly two strings of length n + 1 for each string of length n. We are assuming that there are exactly 2” strings of length n, so this tells us that the number of strings of length n + 1 is exactly twice as many, proof.

or 2-2"

= 2+!

We

have

completed

the inductive step and

thus completed
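The counting argument can be exercised in code; this sketch builds the strings of length n from those of length n−1, exactly as in the inductive step (real Java, names mine):

```java
import java.util.ArrayList;
import java.util.List;

public class BinaryStrings {
    // All binary strings of length n, built by appending 0 or 1
    // to each string of length n-1.
    static List<String> strings(int n) {
        List<String> result = new ArrayList<>();
        if (n == 0) { result.add(""); return result; } // one string: the empty string
        for (String w : strings(n - 1)) {
            result.add(w + "0");
            result.add(w + "1");
        }
        return result;
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 8; n++)
            if (strings(n).size() != (1 << n))
                throw new AssertionError("count != 2^" + n);
        System.out.println("2^n strings of length n, for n = 0..8");
    }
}
```

There is no double counting because the two strings built from w differ in their last letter, and strings built from different w's differ earlier.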

An almost identical proof tells us that an n-element set has exactly 2ⁿ subsets¹⁶. If we take P(n)

¹⁶In fact, it's easy to see that you can match up the subsets with binary strings one for one, so there have to be the same number of each, but let's go through the whole inductive proof again for practice.



3, that every convex polygon with n sides has a triangulation, and that every triangulation contains exactly n − 2 triangles. (Hint: When you divide an n-gon with a single line segment, you create an i-gon and a j-gon for some naturals i and j. What does your strong inductive hypothesis tell you about triangulations of these polygons?)

P4.4.8

Pig Floyd is weighed at the beginning of every month. In Month 0 he weighs 400 kilograms, in Month 1 he weighs 350 kilograms, and in later months his weight W(n+1) is equal to √2·W(n) − W(n−1) + 700 − 350·√2.

(a) Calculate W(n) for all naturals n with n < 10. Write your answers in the form a + b·√2 where a and b are integers.

(b) Prove by strong induction on all naturals n that W(n) can be written in the form a + b·√2, where a and b are integers.


(c) Determine W(84), Floyd’s weight after seven years. You will find it easiest to discover a pattern in the numbers W(n), and prove that this pattern holds for all n by strong induction. P4.4.9

Let * be a binary operation on a set X that is associative, meaning that for any three elements a, b, and c, we have a * (b * c) = (a * b) * c. (We do not assume that * is commutative.) Let n be any positive natural and let a₁, a₂, ..., aₙ be any sequence of n elements of X, not necessarily distinct. Prove that however we parenthesize the sequence "a₁ * a₂ * ... * aₙ", we get the same result. (Hint: Use strong induction on n. The cases of n = 1 and n = 2 are trivial, and n = 3 is given by our assumption. Show that any parenthesization of a₁ * ... * aₙ₊₁ is equal to some parenthesization of a₁ * ... * aₙ starred with aₙ₊₁, then apply the inductive hypothesis.)

P4.4.10 Let * be a binary operation on a set X that is commutative, meaning that a * b = b * a for any elements a and b of X, and associative, meaning that a * (b * c) = (a * b) * c for any elements a, b, and c of X. (So we know from Problem 4.4.9 that we can write the product of any sequence of elements without parentheses.) Let n be any natural with n ≥ 2, and let a₁, a₂, ..., aₙ be any sequence of n elements of X, not necessarily distinct. Let b₁, b₂, ..., bₙ be a sequence consisting of the same elements in another order. Prove that a₁ * a₂ * ... * aₙ = b₁ * b₂ * ... * bₙ. (Hint: Use strong induction on n.)


©Kendall Hunt Publishing Company Figure 4-6: Fibonacci's rabbits. Shaded rabbits are breeding pairs.

4.5 Excursion: Fibonacci Numbers

In this Excursion we study the Fibonacci 1200’s.

numbers,

His original motivating problem involved

first described by Leonardo of Pisa in the

population growth in rabbits.

At time step one

you begin with one pair of newborn rabbits. At every future time step, you have all the rabbits from the previous step (apparently they’re immortal) plus possibly some newly born ones. The rule for births is that each pair of rabbits

except those born on the last step produces a new pair.

Conveniently, these are always one male and one female and the rabbits have no objections mating with their close relatives. Figure 4-6 illustrates the first few stages of the proces The number

to

of pairs of rabbits at each time step n is called F(n) or “the n’th Fibonacci number”,

and is formally defined by the following recursive rules:

¢ F(0) =0. ° F(I)=1 e For any n > 2, F(n) = F(n—1)+ F(n- 2). It’s immediate by strong induction on n that “F(n) is defined for any n” (Proof:

The base cases

n= 0 and n= 1 are given to us by the definition, and if we know that F(n — 2) and F(n— 1) are defined then the third rule defines F'(n) for us.) We can calculate F'(2) = 1 (this value is sometimes given as part of the definition), F(3) = 2, F(4) = 3, F(5) =5, F(6) = 8, and so forth. The Fibonacci numbers are lots of fun to play with because it seems that if you do almost anything to the sequence, you get the same numbers back in some form.
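The recursive rules translate directly into real Java; here is a small iterative sketch (the naive doubly recursive version is also faithful to the rules, but takes exponentially many calls):

```java
public class Fibonacci {
    // F(0) = 0, F(1) = 1, F(n) = F(n-1) + F(n-2) for n >= 2.
    static long fib(int n) {
        long a = 0, b = 1;          // a = F(0), b = F(1)
        for (int i = 0; i < n; i++) {
            long next = a + b;      // F(i+2) = F(i) + F(i+1)
            a = b;
            b = next;
        }
        return a;                   // after n steps, a = F(n)
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 6; n++)
            System.out.print(fib(n) + " "); // 0 1 1 2 3 5 8
        System.out.println();
    }
}
```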

For example, the difference between F(n) and F(n+1) is just F(n−1), from the third rule. If we let S(n) be the sum of F(i) as i goes from 1 to n, we get S(0) = 0, S(1) = 1, S(2) = 2, S(3) = 4, S(4) = 7, S(5) = 12, S(6) = 20, and so forth. Looking at the sequence, you might notice that S(n) = F(n+2) − 1, so that the summation of the Fibonacci numbers gives more or less the Fibonacci numbers²⁰.

As another curiosity, look at the squares of the Fibonaccis: 0, 1, 1, 4, 9, 25, 64, ....

Nothing too obvious, but if we look at F(n)² − F(n−2)², starting from n = 2, we get 1, 3, 8, 21, 55, .... We can recognize all these as individual Fibonacci numbers, and in fact this sequence seems to contain every second Fibonacci number. We've spotted the identity F(n)² − F(n−2)² = F(2n−2). Is this always true, or just coincidence? With any such identity, the sensible way to proceed is to use induction. We verify the base cases and continue by assuming that the identity is true for n−1 and proving it for n, usually using the key definition F(n) = F(n−1) + F(n−2). The inductive step in this particular case is a bit tough, though.

The natural way to begin is to expand out F(n)² = F(n−1)² + F(n−2)² + 2F(n−1)F(n−2). The first two terms relate to our inductive hypothesis, but the third gives us trouble. If we look at F(n)F(n−1), we get a nice sequence (from n = 1) 0, 1, 2, 6, 15, 40, 104, .... (Look at the differences of this sequence.) In fact this new sequence appears to also satisfy an identity just like the one for the square:

F(2n−1) = F(n+1)F(n) − F(n−1)F(n−2)

The easiest way to prove this identity and the one above is to do them simultaneously. We assume both of them for the n−1 and n−2 cases, and prove them both for the n case. This is one of the two choices for a Writing Exercise below.
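Before attempting the simultaneous induction, it's worth checking both identities numerically for small n; this is evidence, not a proof (real Java, names mine):

```java
public class FibIdentities {
    // Any correct Fibonacci function will do here.
    static long fib(int n) {
        long a = 0, b = 1;
        for (int i = 0; i < n; i++) { long t = a + b; a = b; b = t; }
        return a;
    }

    public static void main(String[] args) {
        for (int n = 2; n <= 20; n++) {
            // F(2n-2) = F(n)^2 - F(n-2)^2
            if (fib(2 * n - 2) != fib(n) * fib(n) - fib(n - 2) * fib(n - 2))
                throw new AssertionError("square identity fails at n = " + n);
            // F(2n-1) = F(n+1)F(n) - F(n-1)F(n-2)
            if (fib(2 * n - 1) != fib(n + 1) * fib(n) - fib(n - 1) * fib(n - 2))
                throw new AssertionError("product identity fails at n = " + n);
        }
        System.out.println("both identities hold for n = 2..20");
    }
}
```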

One more observation about the Fibonacci numbers is their relationship to the Golden Ratio. You may have heard of this ratio from its role in art²¹: there is a unique number φ such that the ratio of one to φ is the same as that of 1+φ to one (see Figure 4-7). By algebra, this φ = (√5 − 1)/2 or about 0.61, so the ratio is about 1.61 to one. Once you get started with the Fibonacci numbers, the ratio of one to the next seems to approach this golden ratio fairly quickly. In Chapter 7 we'll see a general mathematical theory of how to solve recurrences and derive the equation

F(n) = (1/√5)((1 + φ)ⁿ − (−φ)ⁿ).

As φ < 1, as n increases the (−φ)ⁿ term gets smaller and smaller and the approximation F(n) ≈ (1 + φ)ⁿ/√5 gets increasingly close. Though we don't yet know how to derive this equation, we can prove it correct by induction on n, using the definitions of F(n) and φ together with a fair bit of arithmetic.
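The closed form is easy to check numerically against the recursive definition, using double arithmetic and rounding (a check for small n, not the Chapter 7 derivation):

```java
public class FibClosedForm {
    static long fib(int n) {
        long a = 0, b = 1;
        for (int i = 0; i < n; i++) { long t = a + b; a = b; b = t; }
        return a;
    }

    public static void main(String[] args) {
        double phi = (Math.sqrt(5) - 1) / 2; // about 0.618
        for (int n = 0; n <= 30; n++) {
            double closed = (Math.pow(1 + phi, n) - Math.pow(-phi, n)) / Math.sqrt(5);
            if (Math.round(closed) != fib(n))
                throw new AssertionError("closed form fails at n = " + n);
        }
        System.out.println("closed form matches F(n) for n = 0..30");
    }
}
```

Rounding absorbs the tiny floating-point error; the (−φ)ⁿ term is already far below 1/2 for these n.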

²⁰In Excursion 7.6 we'll look at analogies between sequences like the Fibonacci numbers and the functions occurring in calculus. When we define the operations appropriately, the Fibonacci numbers will turn out to be more or less their own "derivative" and "integral". Do other sequences besides the Fibonaccis relate to their own differences and sums in the same way?

²¹It is often claimed, for example, that the length and width of the Parthenon are in this ratio. This is apparently not true, but many other things about this ratio are: see Mario Livio's book The Golden Ratio: The Story of Phi, the World's Most Astonishing Number.

©Kendall Hunt Publishing Company Figure 4-7: The Golden Ratio.

Writing Exercises: For each statement, write a careful inductive proof that it is true for all naturals n, after an appropriate starting point.

• Both the formula F(2n−2) = F(n)² − F(n−2)² and the formula F(2n−1) = F(n+1)F(n) − F(n−1)F(n−2) hold for n.

• Defining the number φ to be (√5 − 1)/2, F(n) = (1/√5)((1 + φ)ⁿ − (−φ)ⁿ).


P1.2.9 (uses Java, harder) Two strings u and v are called anagrams if one can be made by rearranging the letters of the other, that is, if every letter that occurs in one also occurs in the other, exactly the same number of times. (An example is the pair of words looped and poodle.) Using the methods from Exercise 1.2.9 above, write a real-Java static method boolean anagram(String u, String v) that returns true if and only if u and v are anagrams.
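One standard-library approach counts letter occurrences directly (this sketch uses java.util.HashMap rather than the Exercise 1.2.9 methods the problem asks for):

```java
import java.util.HashMap;
import java.util.Map;

public class Anagrams {
    // True iff every character occurs the same number of times in u and v.
    static boolean anagram(String u, String v) {
        if (u.length() != v.length()) return false;
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : u.toCharArray()) counts.merge(c, 1, Integer::sum);  // add u's counts
        for (char c : v.toCharArray()) counts.merge(c, -1, Integer::sum); // subtract v's
        for (int n : counts.values())
            if (n != 0) return false; // some letter count differs
        return true;
    }

    public static void main(String[] args) {
        System.out.println(anagram("looped", "poodle"));  // prints true
        System.out.println(anagram("looped", "poodles")); // prints false
    }
}
```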

P1.2.10 Let Σ be an alphabet with k letters and let n be any natural. How many strings are in the language Σⁿ? Justify your answer as best you can, though we won't have formal tools to prove this until Chapter 4.

4.6 Proving the Basic Facts of Arithmetic

4.6.1 The Semiring of the Naturals

There are a number of properties of arithmetic on the naturals that we tend to take for granted as we compute. Some of them, such as that x + 0 = x or x · 0 = 0, were included in our formal definitions of the operations, but others such as x · y = y · x were not. The reason for this, as it turns out, is that we made the axioms and definitions as short as possible, leaving out any statements that could be derived from those already there. Now that we have the technique of mathematical induction, we can carry out these derivations.

Why bother to prove things that we already accept as true? For one thing, the existence of these proofs justifies the particular form of our definitions and gives us a small set of fundamental properties of the numbers from which all these other facts follow. For another, this task gives us some good practice in carrying out induction proofs, using a variety of proof strategies²².

In abstract algebra, the following properties of a system are called the semiring axioms and any system satisfying them is called a semiring²³:

1. There are two binary operations + and ·, defined for all pairs of elements.

2. These operations are each commutative, so that ∀x: ∀y: (x + y) = (y + x) and ∀x: ∀y: (x · y) = (y · x).

3. They are both associative, so that ∀x: ∀y: ∀z: (x + y) + z = x + (y + z) and ∀x: ∀y: ∀z: (x · y) · z = x · (y · z).

4. There is an additive identity 0 such that x + 0 = 0 + x = x, and a multiplicative identity 1 such that 1 · x = x · 1 = x. Also 0 · x = x · 0 = 0.

5. Multiplication distributes over addition, so that ∀x: ∀y: ∀z: x · (y + z) = x · y + x · z.
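These axioms can be spot-checked for the recursively defined operations on small values; this real-Java sketch models naturals as ints and assumes times(x, S(y)) = plus(times(x, y), x), which may differ in detail from Section 4.1 (evidence only; proving the axioms is the point of this section):

```java
// plus and times modeled on the recursive definitions:
// plus(x, 0) = x, plus(x, S(y)) = S(plus(x, y));
// times(x, 0) = 0, times(x, S(y)) = plus(times(x, y), x)  (assumed form).
public class SemiringCheck {
    static int plus(int x, int y)  { return y == 0 ? x : 1 + plus(x, y - 1); }
    static int times(int x, int y) { return y == 0 ? 0 : plus(times(x, y - 1), x); }

    public static void main(String[] args) {
        for (int x = 0; x < 6; x++)
            for (int y = 0; y < 6; y++)
                for (int z = 0; z < 6; z++) {
                    if (plus(x, y) != plus(y, x) || times(x, y) != times(y, x))
                        throw new AssertionError("commutativity");
                    if (plus(plus(x, y), z) != plus(x, plus(y, z))
                            || times(times(x, y), z) != times(x, times(y, z)))
                        throw new AssertionError("associativity");
                    if (times(x, plus(y, z)) != plus(times(x, y), times(x, z)))
                        throw new AssertionError("distributivity");
                }
        System.out.println("semiring axioms hold for the recursive plus/times on 0..5");
    }
}
```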

One of the biggest technical problems in constructing proofs of these properties is our tendency to assume that they are true and obvious while we're trying to prove them, which would be invalid circular reasoning. In particular, our standard notation for arithmetic sometimes assumes that addition and multiplication are associative, as we write x + y + z without specifying which operation is to be carried out first. For this section, you should think of arithmetic statements as being abbreviations for calls upon our formally defined functions, so that instead of x + y + z we need to say x + (y + z), representing plus(x, plus(y, z)), or (x + y) + z, representing plus(plus(x, y), z).

²²The value of such a task is something for an individual instructor to assess, of course. Note that Hofstadter does much of the same work in his Chapter VIII, but our derivations here are considerably shorter because of the more informal proof style we have developed.

²³It's called a "semiring" because these are only some of the properties of a full-fledged ring such as the integers. We gave the axioms for a ring in Section 3.8: along with the semiring axioms a ring must have an additive inverse for every element, so that it satisfies the property ∀x: ∃y: x + y = 0. Actually, if you're keeping score, this is the definition of a commutative semiring, as most authors do not require the multiplication operation in a ring or semiring to be commutative. We'll encounter a number of other semirings later in the book.

We can’t use the fact that these two calls return the same

answer until we’ve proved it from the

definitions. We can go at these proofs in either of two ways. Just as a large programming problem is going to involve various subproblems, a large proof is going to involve various subproofs. We can get at this by either a top-down method, where we start out to do the whole thing and identify a good subproblem as something we need to finish, or a bottom-up

method,

where we decide in advance

what a good subproblem might be. We'll use a mixture of the two to get experience of both?4.

4.6.2 Properties of Addition

Remember that addition is defined by the rules x + 0 = x and x + S(y) = S(x + y), using S(x) to represent the successor of x. (We don't want to use the notation "x + 1" in this context because we want to distinguish between addition and the successor operation.) We want to show that the ring properties for addition follow from this definition of addition.

Let's begin bottom-up, by looking for one of our desired properties that ought to be easy to prove. ∀x: x + 0 = x is actually given to us, but what about ∀x: 0 + x = x? It's a statement about all naturals x, so let's try induction on x. For the base case, we must show 0 + 0 = 0, which follows from the ∀x: x + 0 = x rule by specifying x = 0. For the inductive step, we let x be arbitrary, assume 0 + x = x, and try to prove 0 + S(x) = S(x). Expanding 0 + S(x) by the definition of addition, we get S(0 + x), and applying our inductive hypothesis inside the parentheses, we get S(x) as desired. So we've completed the inductive step and proved ∀x: 0 + x = x.

Now for a harder one, the commutativity of addition. Let's try to work top-down, and see where we get stuck. Write the desired property as ∀x: ∀y: (x + y) = (y + x). We have a choice of induction or the Rule of Generalization, and we're going to take a particular choice: let x be arbitrary and use induction on y. (This is the easiest way it might work out, given that we don't have any immediate way to prove the inner statement ∀y: (x + y) = (y + x) without induction. Using induction only once, for the innermost statement, turns out to be the right idea in all of the examples in this section; the other variables will be able to "go along for the ride" as we vary the innermost one. If they couldn't, we would have to consider inducting on more than one variable.) So we're trying to prove ∀y: (x + y) = (y + x), with x arbitrary, by induction on y. The base case with y = 0 turns out to be just the warmup exercise above! (We knew x + 0 = x, and we showed 0 + x = x, so x + 0 = 0 + x.) How about the inductive step? We assume that x + y = y + x and try to prove that x + S(y) = S(y) + x. Well, x + S(y) is equal to S(x + y) by the definition of addition, and then equal to S(y + x) by the inductive hypothesis. The definition of addition again gets us to y + S(x), rather than the S(y) + x we're looking for. Here is a subproblem that we can isolate and attack with another induction:

Lemma: ∀x: ∀y: S(y) + x = y + S(x)

²⁴Hofstadter is again worth reading on this point; he makes a nice analogy between finishing a subprogram and returning to a theme in a piece of music.


Proof: Actually we'd rather induct on x than y, because our definition tells us what to do with successor terms on the right of the addition, not the left. So, using the commutativity of universal quantifiers from Chapter 2, we rewrite the whole thing as ∀y: ∀x: S(y) + x = y + S(x), let y be arbitrary, and use induction on x. The base case is S(y) + 0 = y + S(0). By the definition, y + S(0) is S(y + 0) = S(y), which is also equal to S(y) + 0. For the inductive case, we assume S(y) + x = y + S(x) and try to prove S(y) + S(x) = y + S(S(x)). There are several ways we could go from here, but let's try working on the left-hand side. Applying the definition we get S(S(y) + x), which is S(y + S(x)) by applying the inductive hypothesis inside the parentheses. But then this is y + S(S(x)), as desired, by applying the definition again. ∎

Applying this lemma finishes the inductive step of the main proof, so we have proved ∀x: ∀y: (x + y) = (y + x). Let's move on to the other main property of addition:

Proposition: ∀x: ∀y: ∀z: x + (y + z) = (x + y) + z.

Proof: Let x and y be arbitrary and use induction on z. For the base case, both x + (y + 0) and (x + y) + 0 evaluate to x + y by using the definition. For the inductive step, we assume x + (y + z) = (x + y) + z and try to prove x + (y + S(z)) = (x + y) + S(z). Again, the only move immediately available is to use the definition of addition. Working on the left-hand side, we get that x + (y + S(z)) is equal to x + S(y + z), which is then S(x + (y + z)) by using the same definition again. (Note that we have to be careful not to assume associativity during this argument!) By the inductive hypothesis, this is S((x + y) + z). Using the definition in the other direction, we get (x + y) + S(z), as desired. ∎
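The two defining rules, and the properties just proved, translate directly into code. Here is a hedged real-Java sketch (the names NatAddition, succ, and plus are our own illustrative choices, not from the text) that implements addition by the two rules and spot-checks identity, commutativity, and associativity for small naturals:

```java
// Sketch: addition on naturals defined only by the two rules from the text,
//   x + 0 = x   and   x + S(y) = S(x + y),
// with succ standing in for the successor operation S.
public class NatAddition {
    static int succ(int x) { return x + 1; }   // stand-in for S(x)

    // plus follows the recursive definition literally: the second argument
    // y > 0 is treated as S(y') with y' = y - 1.
    static int plus(int x, int y) {
        if (y == 0) return x;                  // x + 0 = x
        return succ(plus(x, y - 1));           // x + S(y') = S(x + y')
    }

    public static void main(String[] args) {
        // Spot-check the proved properties on naturals 0..20.
        for (int x = 0; x <= 20; x++) {
            if (plus(0, x) != x) throw new AssertionError("0 + x = x fails");
            for (int y = 0; y <= 20; y++) {
                if (plus(x, y) != plus(y, x))
                    throw new AssertionError("commutativity fails");
                for (int z = 0; z <= 20; z++)
                    if (plus(x, plus(y, z)) != plus(plus(x, y), z))
                        throw new AssertionError("associativity fails");
            }
        }
        System.out.println("all checks passed");
    }
}
```

Of course, a finite spot-check is no substitute for the induction proofs above; it only illustrates what the proved statements say.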

4.6.3 Properties of Multiplication

The two rules defining multiplication are x · 0 = 0 and x · S(y) = x · y + x.

To start with, the axioms tell us that certain operations exist, and we will need to use these in coding our new ones (just as we used successor and pred in defining the operations on the naturals). We'll have a boolean method isEmpty that will return true if and only if its string argument is λ. Given a string w and a letter a, we know that wa is uniquely defined, by a function we'll call append(w, a). The third and fourth rules tell us that given a string x ≠ λ, x is equal to wa for a unique w and a, which we'll call²⁶ allButLast(x) and last(x). The functions allButLast and last throw exceptions if they are called with input λ.
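As a hedged illustration, these four basic operations can be modeled in real Java on top of the String class. The text's string is a primitive type, so this class (here called StringPrimitives, a name of our own) is only an illustrative stand-in, with the empty Java string "" playing the role of λ:

```java
// Sketch: modeling the text's four basic string operations on top of
// java.lang.String. The empty string plays the role of lambda.
public class StringPrimitives {
    static boolean isEmpty(String w) { return w.length() == 0; }

    // wa: the string w with the letter a appended.
    static String append(String w, char a) { return w + a; }

    // For x != lambda, x = wa for unique w and a; these recover w and a,
    // and throw exceptions on the empty string, as the text specifies.
    static String allButLast(String x) {
        if (isEmpty(x)) throw new IllegalArgumentException("empty string");
        return x.substring(0, x.length() - 1);
    }

    static char last(String x) {
        if (isEmpty(x)) throw new IllegalArgumentException("empty string");
        return x.charAt(x.length() - 1);
    }

    public static void main(String[] args) {
        String x = append(append("", 'a'), 'b');            // builds "ab"
        System.out.println(allButLast(x) + " " + last(x));
    }
}
```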

²⁵Again, this will involve formal proofs of some facts that are pretty obviously true. But we will see how all we know about strings follows from the simple definitions, and get some practice with induction proofs outside the usual setting of numbers. There will be several proofs in this format in Chapter 14.

²⁶This notation is borrowed from the programming language POP-11 and will be convenient when we deal with finite-state machines in Chapter 14. A note for the Lisp-literate: If we chose to represent strings in Lisp as lists of atoms that were unbalanced to the left (Lisp usually does it the other way), the basic Lisp operations car, cdr, and cons would correspond exactly to allButLast, last, and append.


4.7.2 Defining the String Operations

We're now ready to define operations on strings. First, though, note that we are not working with the String class of Java, but with the mathematical string class we defined in Chapter 1²⁷, for our pseudo-Java language. Our mathematical string objects will behave like Java primitives rather than objects, and we will define the operators to be static methods rather than instance methods²⁸. The operations we define will be similar to the Java ones, but not identical. (For example, we can imagine arbitrarily long strings, while the length of a String must be an int.)

That said, we define the length of a string w, written "|w|" or "length(w)", to be the number of letters in it. Formally, we can say that |λ| = 0 and that for any w and a, |wa| = |w| + 1. This definition immediately gives us a recursive algorithm:

natural length (string w)
{// Returns number of letters in w
 if (isEmpty(w)) return 0;
 else return 1 + length(allButLast(w));}

In the same way, we can define the concatenation operator, which corresponds to the + operation on Java String objects. Here we define wx by recursion on the definition of x. We let wλ = w, and for x = ya, we let wx = (wy)a. In code:

string cat (string w, string x)
{// Returns string made from w followed by x
 if (isEmpty(x)) return w;
 else return append(cat(w, allButLast(x)), last(x));}

Note that when we write this pseudo-Java code, we have to resolve some ambiguities in the mathematical notation. When we write "wx = (wy)a", for example, we're using the same notation to denote appending and concatenation, and if we left off the parentheses we'd be assuming that concatenation is associative, something we haven't yet proved. (It is true that concatenation is associative, of course, but when we write the code we have to decide exactly which order of the operations is meant.)

Reversing a string is another example. Informally, wᴿ is w written backward. Formally, λᴿ = λ, and if w = xa, wᴿ = a(xᴿ). (Note that we need to use the concatenation operator in order to define reversal, because a(xᴿ) is a concatenation rather than an appending.) In pseudo-Java²⁹ code:

²⁷Though note that the actual Java methods in Excursion 4.2 used the Java String class.

²⁸In particular, we will test strings for equality with ==, whereas with Java Strings it is possible for u == v to be false while u.equals(v) is true. Exercise 4.7.2 has you define the == operator from the other basic methods.

²⁹We are assuming an implicit type cast from characters to strings when we give the character last(w) as an argument to cat.


string rev (string w)
{// Returns w written backward
 if (isEmpty(w)) return emptystring;
 else return cat(last(w), rev(allButLast(w)));}

4.7.3 Proof By Induction for Strings

In each case the operation is defined for all strings because the recursion is guaranteed to terminate, which in turn is because each recursive call is on a smaller string, until eventually the relevant argument is the empty string. Mathematical induction³⁰ will then allow us to prove properties of these operators. Specifically, if a set of strings contains λ and is closed under the operations of appending letters, it must consist of all the strings. So if P(w) is any statement with one free variable ranging over strings, we can use the following Induction Rule For Strings:

• Prove P(λ).
• Prove ∀w: P(w) → [∀a: P(wa)]. Here the variable a ranges over letters. For binary strings, with alphabet {0,1}, this has the special equivalent form ∀w: P(w) → (P(w0) ∧ P(w1)).
• Conclude ∀w: P(w).

Our definitions of the length, concatenation, and reversal functions have the property that for each letter a, f(wa) is defined in terms of f(w) and a. This means that an inductive hypothesis telling us about f(w) will often be useful in proving things about f(wa). We'll now see a number of examples of such proofs.
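As a sanity check before the proofs, the three recursive definitions above can be rendered in real, runnable Java. This is an illustrative sketch only (String stands in for the primitive string type, and the four basic methods are inlined):

```java
// Sketch: real-Java versions of the section's three recursive definitions,
// built on the four basic operations modeled over java.lang.String.
public class StringOps {
    static boolean isEmpty(String w) { return w.length() == 0; }
    static String append(String w, char a) { return w + a; }
    static String allButLast(String x) { return x.substring(0, x.length() - 1); }
    static char last(String x) { return x.charAt(x.length() - 1); }

    // |lambda| = 0, |wa| = |w| + 1
    static int length(String w) {
        if (isEmpty(w)) return 0;
        return 1 + length(allButLast(w));
    }

    // w lambda = w, and w(ya) = (wy)a
    static String cat(String w, String x) {
        if (isEmpty(x)) return w;
        return append(cat(w, allButLast(x)), last(x));
    }

    // lambda^R = lambda, and (xa)^R = a(x^R)
    // (the char-to-string cast of footnote 29 is made explicit here)
    static String rev(String w) {
        if (isEmpty(w)) return "";
        return cat(String.valueOf(last(w)), rev(allButLast(w)));
    }

    public static void main(String[] args) {
        System.out.println(cat("aab", "bba"));  // concatenation
        System.out.println(rev("aab"));         // reversal
    }
}
```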

Proposition: For any strings u and v, |uv| = |u| + |v|. (For example, in Figure 4-6, |u| and |v| are each equal to 3 and |uv| is equal to 6. The figure shows an example of this rule in action.)

Proof: Let u be an arbitrary string and use induction on v. In the base case, v = λ, |uv| = |uλ| = |u| (by definition of concatenation) and |u| + |v| = |u| + 0 = |u| by definition of length. For the inductive case, we may assume v = wa and, by the inductive hypothesis, |uw| = |u| + |w|. We must show |u(wa)| = |u| + |wa| (being careful not to assume results that might be implicit in our notation). By definition of concatenation, u(wa) = (uw)a, so |u(wa)| = |(uw)a| = |uw| + 1 by definition of length, and this is |u| + |w| + 1 by the inductive hypothesis. Meanwhile, |u| + |wa| is also |u| + |w| + 1 by the definition of length. ∎

Proposition: For any string w, |wᴿ| = |w|.

³⁰Induction on a recursive definition, when done on something other than naturals, is often called structural induction. All these techniques can still be justified from the single Principle of Mathematical Induction, of course, so whether to call this a "new proof technique" is a matter of taste.


1.3 Excursion: What is a Proof?

The word "proof" has a variety of meanings in different contexts. In a criminal trial, the prosecution attempts to "prove beyond a reasonable doubt" that a defendant is guilty. A scientist reports experiments that "prove" that a particular theory is correct. Religious thinkers offer "proofs" of the existence of God, or of the immortality of the soul. And finally, mathematicians and computer scientists "prove" things in their own way.

American high schools usually teach geometry at some point, in part because it is the most accessible example of the mathematical idea of proof. There a "proof" is defined to be a sequence of statements, each of which is justified in some way, and there are explicit rules telling whether a statement is legally justified or not. Typical justifications are by appeal to a definition ("a triangle is a figure with three sides"), an axiom ("through any given point there is exactly one line parallel to a given line"), or a previously proved result ("opposite angles at a vertex are equal"). A finished proof could be checked for correctness by a computer program, if the rules are carefully enough defined. There has even been some success by computer programs that produce such proofs by systematically looking for sequences of statements that are valid and interesting in some way.

In practice, though, actual mathematicians virtually never make completely formal proofs. Here is an example of how things actually happen. In 1985, a graduate student at MIT¹¹ was working on his Ph.D. research problem. He wanted to show that a "branching program"¹² that solved the "majority problem" requires more "width" the longer the input string gets (don't worry about what the technical terms mean). Of the perhaps 100 people in the world who had thought seriously about this problem, to his knowledge all of them (including him) thought that what he was trying to show was true. Frustrated at his lack of progress, he tried constructing some branching programs and suddenly realized that he could solve the majority problem with width five for inputs of any length. He checked the argument and convinced himself that it was correct, despite the fact that it led to the "wrong" result. Now his problem was either to convince the world that he was right and everyone else was wrong, or to find the mistake in his argument before embarrassing himself too much.

Since his thesis advisor was out of town, he grabbed the first two other graduate students he could find who were familiar enough with the work to understand the argument. He then showed them the proof, which in fifteen minutes they believed. Within a few weeks the proof (which was in fact correct) had been spread literally around the world by word of mouth, and those 100 or so people had all changed their minds.

This proof was not the sequence of statements and justifications from high-school geometry. The prover could take advantage of the fact that both he and his listeners were trained mathematicians, and leave out huge numbers of steps that they could all agree upon. The process was essentially an interactive one, where the prover made claims, and the listeners questioned any of those claims they didn't accept. The prover then reinforced the challenged claims by showing how they followed from simpler facts, and so on. In principle, the prover was ready to answer all possible objections.

¹¹Later to write a discrete mathematics textbook.

¹²Of polynomial "size", to be precise.

u = "aab"    v = "bba"    uv = "aabbba"
uᴿ = "baa"   vᴿ = "abb"   (uv)ᴿ = "abbbaa" = vᴿuᴿ

©Kendall Hunt Publishing Company
Figure 4-8: The reversal of the concatenation of two strings.

Proof: For the base case, |λᴿ| = |λ| = 0. For the inductive step, we let w = va and assume |vᴿ| = |v|. By the definition of reversal, |wᴿ| = |(va)ᴿ| = |a(vᴿ)|. This is |a| + |vᴿ| by the previous result, and this is equal to |a| + |v| by the inductive hypothesis. On the other hand, |w| = |va| = |v| + 1 by the definition of length, and addition of naturals is commutative, so we have proved that |wᴿ| = |w|. Since we have completed the inductive step, we have completed the proof. ∎

Proposition: For any three strings x, y, and z, (xy)z = x(yz).

Proof: We let x and y be arbitrary and use induction on z. If z = λ, both (xy)λ and x(yλ) are equal to xy by the definition of concatenation. For the inductive step, we let z = wa and assume (xy)w = x(yw). By successive application of the definition of concatenation, and one use of the inductive hypothesis, we get

(xy)z = (xy)(wa) = [(xy)w]a = [x(yw)]a = x[(yw)a] = x[y(wa)] = x(yz). ∎

Proposition: For any strings u and v, (uv)ᴿ = vᴿuᴿ. (See Figure 4-8 for an example.)

Proof: Again we let u be arbitrary and use induction on all strings v. For the base case, (uλ)ᴿ and λᴿuᴿ are both equal to uᴿ. For the inductive case, we let v = wa and assume (uw)ᴿ = wᴿuᴿ. We have to determine what (uv)ᴿ is, by determining how it relates to (uw)ᴿ. Well, (uv)ᴿ is (u(wa))ᴿ (since v = wa), which is equal to ((uw)a)ᴿ by the definition of concatenation. This in turn is equal to a((uw)ᴿ) by the definition of reversal, and is then a(wᴿuᴿ) by the inductive hypothesis. If we can rewrite this as (awᴿ)uᴿ, we are done, because vᴿ = awᴿ by the definition of reversal. But we just proved the associativity of concatenation above. ∎

Another interpretation of the law of induction for strings is that a recursive program that recurses on a single argument of type string is guaranteed to terminate if (a) it doesn't call itself on input λ, and (b) it calls itself on input x only with argument allButLast(x). There is a related form of "strong induction for strings" that would allow the program to call itself with any argument that is a prefix of x.

Note that we can also recursively define a language, like the balanced parenthesis language of Problems 4.7.6 and 4.7.7 below. As long as we have a rule that strings are in the language only if they can be produced by particular other rules, we have a similar inductive technique to prove that all strings in the language have a particular property. We will see much more of this in Chapter 5 when we look at languages defined by regular expressions.
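The propositions of this subsection can also be spot-checked mechanically. The following hedged sketch (class and method names are our own) verifies the length, associativity, and reversal laws over all binary strings of length at most 4, using plain Java operations as stand-ins for |s|, concatenation, and ᴿ:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: exhaustively checking this subsection's propositions over all
// binary strings of bounded length. s.length(), s + t, and StringBuilder
// reversal stand in for |s|, concatenation, and the reversal operator.
public class StringLaws {
    static String rev(String w) {
        return new StringBuilder(w).reverse().toString();
    }

    // All binary strings of length 0..maxLen, built by appending letters.
    static List<String> binaryStrings(int maxLen) {
        List<String> out = new ArrayList<>();
        out.add("");
        int start = 0;
        for (int len = 1; len <= maxLen; len++) {
            int end = out.size();
            for (int i = start; i < end; i++) {
                out.add(out.get(i) + "0");
                out.add(out.get(i) + "1");
            }
            start = end;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> all = binaryStrings(4);
        for (String u : all)
            for (String v : all) {
                if ((u + v).length() != u.length() + v.length())
                    throw new AssertionError("|uv| = |u| + |v| fails");
                if (!rev(u + v).equals(rev(v) + rev(u)))
                    throw new AssertionError("(uv)^R = v^R u^R fails");
                for (String w : all)
                    if (!((u + v) + w).equals(u + (v + w)))
                        throw new AssertionError("associativity fails");
            }
        System.out.println("all string laws hold on the sample");
    }
}
```

Again, a finite check is only an illustration; the induction proofs above are what establish the laws for all strings.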

4.7.4 Exercises

E4.7.1 Prove from the string axioms that aba is a string.

E4.7.2 (uses Java) Write a recursive (static) pseudo-Java method boolean isEqual (string x, string y) that returns true if and only if the strings (not Java Strings) x and y are the same string. Use only equality of letters and the predefined static methods from this section. Recall that these include a static boolean method isEmpty (string w) that determines whether w is the empty string; use this rather than using == on string values.

E4.7.3 If w is a string in {0,1}*, the one's complement of w, oc(w), is the unique string, of the same length as w, that has a zero wherever w has a one and vice versa. So, for example, oc(101) = 010. Give a recursive definition of oc(w), like the definitions in this section.

E4.7.4 (uses Java) Write a recursive static pseudo-Java method string oc (string w) that returns the one's complement of a binary string, as defined in Exercise 4.7.3.

E4.7.5 (uses Java) Write a static real-Java method to reverse a String. Do this first using a loop and the charAt method in the String class. Then write another, recursive version that uses only the concatenation operator + and the substring method.

E4.7.6 If u and v are strings, we have defined u to be a suffix of v if there exists a string w such that wu = v. Write a recursive definition of this property like the ones in this section. (Hint: When is u a suffix of the empty string? If you know about suffixes of v, how do you decide about suffixes of va?)

E4.7.7 (uses Java) Using the isEmpty and allButLast methods, write a recursive pseudo-Java static method boolean isSuffix (string u, string v) that returns true if and only if u is a suffix of v as defined in Exercise 4.7.6.

E4.7.8 (uses Java) Often when you enter a password, what is displayed is not the password itself but a string of stars of the same length as the string you have entered. Given any string w, let stars(w) be this string of stars. Give a recursive definition of this stars function, and a recursive pseudo-Java static method computing it using the basic methods defined in this section.

E4.7.9 (uses Java) If u is a string and a is a letter, give a recursive definition for the relation contains(u, a), which is true if and only if a occurs at least once in u. Write a recursive pseudo-Java static method boolean contains(string u, char a) that decides this relation.

E4.7.10 (uses Java) A string is defined to have a double letter if it contains a substring of the form aa where a is any letter in the alphabet. Write a recursive static pseudo-Java method boolean hasDouble(string w) that returns true if and only if w has a double letter. Use the basic methods given in the section.

4.7.5 Problems

P4.7.1 Prove by induction on strings that for any string w, (wᴿ)ᴿ = w.

P4.7.2 Prove by induction on strings that for any binary string w, (oc(w))ᴿ = oc(wᴿ). (See Exercise 4.7.3 for the definition of one's complement.)

P4.7.3 The function first is defined to take one string argument and return the first letter of the string if there is one. (So first(w) has the same output as w.charAt(0).) The pseudo-Java function allButFirst takes one string argument and returns the substring consisting of everything but the first letter. Both first and allButFirst should throw exceptions if called with λ as their argument.
(a) Write recursive definitions for these two functions in terms of the append function.
(b) (uses Java) Write pseudo-Java recursive static methods to calculate these two functions, using any or all of the primitives isEmpty, append, last, and allButLast. Your methods should be closely based on the recursive definitions.

P4.7.4 (uses Java) Recall that in the String class in real Java, there are two functions both named substring. If i and j are naturals, w.substring(i) returns the substring of w obtained by deleting the first i characters. The two-argument function w.substring(i,j) returns the substring consisting of the characters with position numbers k such that i ≤ k and k < j.
(a) Define two pseudo-Java static methods named substring to operate on our string primitive data type. The first method should take a string w and a natural i and return w.substring(i) as defined above. It should throw an exception if i is negative or if i is greater than the length of w. The second should take a string w and two naturals i and j and return w.substring(i,j). It should throw an exception if i is negative, if i > j, or if either i or j is larger than the length of w. Give recursive definitions of these two functions in terms of the basic operations on strings and naturals.
(b) Prove by induction, using your definitions, that cat(substring(w,0,i), substring(w,i)) = w for all strings w and all naturals i such that i is less than or equal to the length of w.
(c) Prove by induction similarly that cat(substring(w,i,j), substring(w,j,k)) = substring(w,i,k) for all strings w and all naturals i, j, and k such that i ≤ j ≤ k and k is less than or equal to the length of w.

P4.7.5 (uses Java) Give a recursive definition, in terms of our given basic operations for pseudo-Java strings and naturals, of the following charAt function. Since strings are a primitive type in pseudo-Java, we must redefine charAt to take two arguments: if w is a string and i a natural, we define charAt(w, i) to be the character of w in position i, if any, where the first position is numbered 0. The function is undefined if there is no such character. (Hint: Your definition should have two cases, one for w = λ and one for w = va.) Write a pseudo-Java recursive static method to calculate this charAt function, using your definition. Throw an exception if the function value is undefined.

P4.7.6 (uses Java) We can define the balanced parenthesis language using recursion. This is the set of sequences of left and right parentheses that are balanced, in that every left paren has a matching right paren and the pairs are nested properly. We'll use "L" and "R" instead of "(" and ")" for readability. We define the language Paren by the following four rules³¹:
• λ is in Paren.
• If u is in Paren, then so is LuR.
• If u and v are in Paren, then so is uv.
• No other strings are in Paren.
Write a real-Java static method isBalanced that takes a String argument and returns a boolean telling whether the input string is in Paren. A non-recursive method is simpler.

P4.7.7 (hard) Another way to characterize the Paren language (defined in Problem 4.7.6 above) is by the following two properties: (1) the number of L's and R's in the string is equal, and (2) in any prefix of the string, the number of L's is at least as great as the number of R's. Prove, by induction on the definition of Paren, that every string in Paren has these two properties.

P4.7.8 (uses Java) Suppose we have a set of "good" strings, defined by a pseudo-Java method boolean isGood(string w) that decides whether a given string is good. We would like to know whether a given input string has any substring that is good. (We'll assume that the empty string is not good.)
(a) Prove that a string w has a good substring if and only if either (1) it is itself good or (2) it can be broken into two substrings substring(w, 0, i) and substring(w, i) (using the syntax from Problem 4.7.4 above) such that one of these has a good substring.
(b) Use this definition to write a recursive pseudo-Java method boolean hasGoodSubstring(string w) that returns true if and only if the input string has a good substring. Of course your method will call isGood.
(c) Write another method that has the same output as that of part (b), but uses a loop instead of recursion.
(d) Of the methods in parts (b) and (c), which do you think will run faster in general?

P4.7.9 (uses Java) Here is a recursive pseudo-Java method which purports to count the good substrings in a given input string, in the context of Problem 4.7.8. Is it correct? If so, argue why, and if not, write a pseudo-Java method (not necessarily recursive) that is correct.

public static int countGood(string w) {
    int c = 0;
    for (int i = 0; i ...
If n > 1, we divide n by two, let w represent the quotient (Java n/2), let a represent the remainder (Java n%2), and represent n by wa.

A few examples (see Figure 4-9) should convince you that these definitions correspond to the usual representation of naturals as binary strings. For one example, the representation of 7 is that of 3 followed by a one, that of 3 is that of 1 followed by a one, and that of 1 is a one by the base case, giving us 111, the correct binary for 7.

So now we try to code these up as pseudo-Java methods, given our standard procedures for both naturals and strings (again, recall that we are using our mathematical string primitive type rather than the Java String class):

rep(7) = rep(3)·"1" = (rep(1)·"1")·"1" = "111"
value("111") = 2 × value("11") + 1 = 2 × (2 × value("1") + 1) + 1 = 2 × (2 × 1 + 1) + 1 = 7

©Kendall Hunt Publishing Company
Figure 4-9: The functions from naturals to strings and vice versa.

static natural value (string w)
{// Returns natural number value of the given binary string.
 if (isEmpty(w)) return 0;
 string abl = allButLast(w);
 if (last(w) == '0')
   return 2 * value(abl);
 else return (2 * value(abl)) + 1;}

static string rep (natural n)
{// Returns canonical binary string representing the given natural.
 if (n == 0) return "0";
 if (n == 1) return "1";
 string w = rep(n/2);
 if (n%2 == 0)
   return append(w, '0');
 else return append(w, '1');}
Writing Exercise: Give a clear and convincing argument (using induction) that these algorithms are correct. Specifically:

1. Show by induction for all binary strings w that value(w) terminates and outputs the correct natural according to the definitions.

2. Show by (strong) induction for all naturals n that rep(n) terminates and outputs the correct string according to the definitions. You will need two separate base cases, for n = 0 and n = 1.
©Kendall Hunt Publishing Company
Figure 4-10: An undirected graph, drawn in two different ways.

4.9 Graphs

4.9.1 Types and Paths of Graphs

Our next examples of recursive definitions will take us into the realm of graph theory. We met diagrams of dots, connected by lines or arrows, in Chapter 2 as a pictorial representation of binary relations. You've probably run into several other similar diagrams to model other situations in computer science. What we're going to do now is to formally define some mathematical objects that can be represented by such diagrams, in such a way that we'll be able to prove facts about them. This will be only a brief introduction; we'll return to graph theory in Chapters 8 and 9.

• An undirected graph (Figure 4-10) is a set of points, called nodes or vertices³², and a set of lines, called edges. Each edge has two endpoints, which are two distinct nodes. No two edges have the same pair of endpoints. Furthermore, the only aspect we care about in an undirected graph is which pairs of nodes are joined by an edge: the binary edge predicate E(x, y) on nodes, meaning "there is an edge between node x and node y". If two graphs have the same edge predicate, we consider them to be equal although they might be drawn to look very different.

• A directed graph (see Figure 4-11) is a set of nodes together with a set of directed edges or arcs. Each arc is an arrow from one node to another³³. No two arcs may have both the same start node and the same end node. The directed graph may also be represented by its edge predicate, E(x, y) meaning "there is an arc from node x to node y", and two directed graphs with the same edge predicate are considered to be equal. We can think of an undirected graph as a directed graph if we like, where each edge between x and y is viewed as two arcs, one from x to y and one from y to x.

• We'll also eventually see both directed and undirected multigraphs, which are like graphs except that more than one edge or arc might have the same endpoints (see Figure 4-12).

³²The singular of "vertices" is "vertex".

³³Actually we also allow an arc from a node to itself, in which case it is also called a loop.
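As an illustrative sketch (the class and method names here are our own, not the text's), a directed graph can be stored as exactly its edge predicate E(x, y), here a boolean adjacency matrix, with an undirected edge viewed as the two arcs described above:

```java
// Sketch: a directed graph represented by its edge predicate E(x, y),
// stored as a boolean adjacency matrix over nodes 0..n-1.
public class DirectedGraph {
    final boolean[][] edge;   // edge[x][y] means "there is an arc x -> y"

    DirectedGraph(int n) { edge = new boolean[n][n]; }

    void addArc(int x, int y) { edge[x][y] = true; }

    // The edge predicate E(x, y).
    boolean E(int x, int y) { return edge[x][y]; }

    // Viewing an undirected graph as a directed one: each undirected edge
    // between x and y becomes the two arcs x -> y and y -> x.
    void addUndirectedEdge(int x, int y) { addArc(x, y); addArc(y, x); }

    public static void main(String[] args) {
        DirectedGraph g = new DirectedGraph(3);
        g.addArc(0, 1);             // arc only, so E(0,1) but not E(1,0)
        g.addUndirectedEdge(1, 2);  // both E(1,2) and E(2,1)
        System.out.println(g.E(0, 1) + " " + g.E(1, 0));
    }
}
```

Two graphs on the same node set are equal in the text's sense exactly when their matrices are equal entry by entry.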

©Kendall Hunt Publishing Company
Figure 4-11: A directed graph.

©Kendall Hunt Publishing Company
Figure 4-12: Undirected and directed multigraphs.

• Also later, we'll see labeled graphs, where the nodes and/or the edges are labeled, that is, associated with some other data item. Labeled graphs are a useful data structure to model all sorts of situations. For example, a labeled directed graph might have nodes representing airports, arcs for possible flights from one airport to another, and labels on the arcs for the departure time, price, or length of the flight (see Figure 4-13).

4.9.2 When are Two Graphs the Same?

Figure 4-14 shows two different directed graphs, one with vertex set {a, b, c} and the other with vertex set {x, y, z}. Clearly these graphs are not equal or identical, because to be identical two graphs must have the same vertex set and the same edge predicate. However, there is a sense in which these two graphs are "the same graph", and we will now make this notion precise.

In Chapter 3 we spoke of algebraic structures, such as rings, being isomorphic. The definition of "isomorphic" is specific to a particular class of structures³⁴, such as rings or undirected graphs. In general an isomorphism from one structure to another is a bijection of the base elements of the structures, which satisfies additional rules that preserve the essential properties of the structure. A set is a collection of elements with no other structure, and so an isomorphism of sets is just a bijection. We say that two sets are isomorphic if there exists a bijection between them.

³⁴A branch of mathematics called category theory starts with a formal definition of these "classes of structures", and studies the properties that are common to all of them.


We know that b > c because c, which is a % b, must be less than b. But then since a ≥ b, a/b is at least 1, and a is at least b + c and thus greater than 2c. ∎

The worst case for the Euclidean algorithm actually occurs when a and b are consecutive Fibonacci numbers, for example 21 and 13. (Try this example, if you haven't already!) From the behavior of Fibonacci numbers, one can show that the number of divisions is at most log_φ a, an improvement over the log_{√2} a shown here.
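The division count discussed above can be observed directly. Here is a hedged real-Java sketch (the class name Euclid and the divisions counter are our own illustrative choices) of the Euclidean algorithm with a counter:

```java
// Sketch: the Euclidean algorithm with a division counter, so the
// consecutive-Fibonacci worst case mentioned above can be tried directly.
public class Euclid {
    static int divisions;  // number of a % b steps in the most recent gcd call

    static int gcd(int a, int b) {
        divisions = 0;
        while (b != 0) {
            int c = a % b;   // one division step
            divisions++;
            a = b;
            b = c;
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(gcd(21, 13) + " in " + divisions + " divisions");
        System.out.println(gcd(48, 18) + " in " + divisions + " divisions");
    }
}
```

For the consecutive Fibonacci numbers 21 and 13, this reports gcd 1 after 6 divisions, while the similarly sized pair 48 and 18 takes only 3.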

4.11.4 Exercises

E4.11.1 Show that a 2 × n rectangle can be covered exactly with L-shaped tiles if and only if 3 divides n.

E4.11.2 Complete the argument in the section by using induction to prove that f(n), the maximum number of pieces that can be made from a convex pizza with n cuts, is exactly (n² + n + 2)/2.

E4.11.3 The upper bound of Exercise 4.11.2 was for convex pizzas. Give an example showing that this bound can be exceeded if the original pizza is not convex. Can you prove any upper bound in the non-convex case?

E4.11.4 A set of n lines in the plane is said to be in general position if no two lines are parallel and no three lines intersect in a single point. Prove that n lines in general position divide the plane into exactly f(n) regions, where f(n) = (n² + n + 2)/2 is the solution to the pizza problem.

E4.11.5 Prove by induction on the Fibonacci numbers that for any natural n except n = 1, F(n+2) % F(n+1) = F(n). Determine exactly how many divisions the Euclidean algorithm takes if the original numbers are F(n+1) and F(n), and prove your answer by induction.

E4.11.6 In how many different ways can we tile a 2 × n rectangle with 1 × 2 rectangles?

E4.11.7 Consider a 2 × n grid graph, an undirected graph where the nodes are arranged in a 2 × n rectangular array and there is an edge between any pair of nodes that are a unit distance apart. A perfect matching in an undirected graph is a subset of the edges such that each node in the graph is an endpoint of exactly one of the edges. Prove that the number of perfect matchings in a 2 × n grid graph is exactly equal to the answer to Exercise 4.11.6.

E4.11.8 A T tetromino is a set of four squares consisting of a single square with exactly three of its four neighbors.
(a) Prove that if n is divisible by 4, then a 4 × n rectangle can be tiled with T tetrominoes.
(b) Prove that if n is odd, then a 4 × n rectangle cannot be tiled with T tetrominoes. (Hint: Think of the squares of the rectangle being colored black and white as in a checkerboard.)

E4.11.9 Prove that if i and k are any naturals, the Fibonacci numbers F(i) and F(6k+i) are congruent modulo 4.
Prove that if i and k are any naturals, the Fibonacci numbers F'(i) and F(6k+7) are congruent: modulo 4. 4-69

more confident about the truth of something once you had proved it? What would you like to learn about proofs in this course?

2. Explain as carefully as possible why the code in our example above is partially correct, that is, why the postcondition will be true whenever the precondition is true. (Hint: What can you say about the relationship of soFar to the numbers seen so far at the start of each pass through the while loop? A statement that is true every time you start a particular loop is called a loop invariant. Can you find a useful loop invariant that helps explain why this code works?)

3. Given the number theory definitions above, prove the following six statements:

(a) If x has a predecessor that is even, then x is odd.

(b) If x is an odd natural, then x has a predecessor that is even.

(c) If x has a predecessor that is odd, then x is even.

(d) If x is an even natural, then x's predecessor (if it has one) is odd.

(e) Prove that every natural is either odd or even. (Hint: By the Least Number Axiom, if any natural is neither odd nor even, there's a least such natural. Could it be 0? If not, what about its predecessor? Use the results of (a) – (d) to get a contradiction.)

(f) Prove that no natural is both odd and even. (Similar to (e) — get a contradiction by assuming some natural is both.)

E4.11.10

For what pairs of naturals i and j does the natural 2^i + 1 divide 2^j + 1? Prove your answer.

4.11.5 Problems

P4.11.1 Show that a 3 × n rectangle can be covered exactly with L-shaped tiles if and only if n is even. (Hint: For the negative result, use induction on all odd numbers and an indirect proof in the inductive step.)

P4.11.2

(suitable for an Excursion) The "cheese problem" is a generalization of the "pizza problem". Instead of a two-dimensional pizza, we have a three-dimensional convex block of cheese that is to be cut into the maximum possible number of pieces by n straight planar cuts. Find the maximum possible number of pieces, g(n). (Hint: Clearly g(0) = 1, g(1) = 2, g(2) = 4, and g(3) = 8. But in making a fourth cut, we can't cut all eight pieces, but only seven. Why? Because the first three cuts can only divide the plane of the fourth cut into seven pieces, by our solution to the pizza problem. Generalizing this observation, you'll get a recursive definition of g(n) in terms of the answer to the pizza problem, f(n). Then it's a matter of finding the solution to this equation, which we haven't studied how to do systematically but which you might be able to manage. The answer is of the form an^3 + bn^2 + cn + d, but you'll have to find the correct real numbers a, b, c, and d and show that they're correct.)

P4.11.3 Prove the claim at the end of the section about the Euclidean Algorithm and Fibonacci numbers. Specifically, prove that if positive naturals a and b are each at most F(n), then the Euclidean Algorithm performs at most n − 2 divisions. (You may assume that n > 2.)

P4.11.4 Suppose we want to lay out a full undirected binary tree on an integrated circuit chip, with the nodes at the intersections of a rectangular grid and the edges along lines of the grid. The H-tree is a recursive method of doing this. Define the H-tree H_i by induction as follows:

• The tree H_0 has a single node and no edges.

• For any number k, H_{2k+1} is made by taking a new root node and connecting it to the roots of two copies of H_{2k}, each with roots a distance 2^k away from the new root, one copy directly above and the other directly below.

• For any positive number k, H_{2k} is made by taking a new root node and connecting it to the roots of two copies of H_{2k−1}, each with roots a distance 2^{k−1} away from the new root, one copy directly to the left and the other directly to the right.

Figure 4-27 shows the first few H-trees, through H_4.

(a) Draw H_5 and H_6.

(b) How many nodes are in H_k? How large a grid is needed to hold the layout? (For example, H_4 fits on a 7 × 7 grid.) As n increases, approximately what percentage of the nodes on the grid become nodes of the H-tree H_n?

(c) How much total wire is used in H_n? How far are the leaves from the root node?

P4.11.5

Consider the following recursively defined sequence of paths in the unit square (Figure 4-28). Path P_0 goes from the middle of the top edge to the center of the square. Each succeeding path will be laid exactly through the center of the regions not touched by the previous path.

(x == 7) && (test() > 0), and imagine that for some reason test does not return a value. If x is not equal to 7, the computer will know that the compound proposition is false, and not bother to run test. For this reason && is called a "short-circuit" operation. If the statement were (x == 7) & (test() > 0), on the other hand, even if x were not equal to 7 the computer would still try to evaluate the second part of the expression, and would not succeed. Thus the two expressions could cause different behavior. In general, programmers use the && operator unless there is a specific reason why the second half should be checked — if nothing else it can save the computer some work.

We defined the result of the ∧ operation by giving the value of p ∧ q for each possible sequence of values for p and q. This will be an important general technique for working with compound propositions:
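The short-circuit behavior can be observed directly by counting calls; in this sketch (the class, the counter, and the stand-in test method are our own invention, not from the text) the right-hand side of && runs only when the left-hand side is true:

```java
// Demonstrates why && "short-circuits": the right-hand side is skipped
// whenever the left-hand side already decides the answer.
public class ShortCircuit {
    static int calls = 0;

    // Stands in for a test() call whose evaluation we want to detect.
    static boolean test() {
        calls++;
        return true;
    }

    // Evaluates (x == 7) && test() and reports how many times test() ran.
    public static int callsMade(int x) {
        calls = 0;
        boolean result = (x == 7) && test(); // test() runs only if x == 7
        return calls;
    }
}
```

With x equal to 6 the call count stays at zero; with x equal to 7 it becomes one. Replacing && by & would make it one in both cases.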

Figure 5-2: Dividing a string in EE into substrings in EEP.

• This language is the star of another language: EEP = {w : w ∈ EE and we cannot write w = uv with u ∈ EE, v ∈ EE, and both u and v nonempty}. Although this language (called "EEP" because it consists of the "primitive" strings in EE) has a more complicated English description, it turns out to have an easier regular expression. (Can you explain why EE is the star of EEP? Figure 5-2 shows how a string in EE can be divided into strings in EEP.)

• Clearly aa and bb are in EEP. If a string starts with ab or ba, how could it wind up being in EEP? (Concentrate on the effect of two-letter blocks...)

5.4 Proving Regular Language Identities

5.4.1 Identities Involving Union and Concatenation

The whole point of a formal definition of regular expressions and languages is to allow us to prove things about them. For example, it is true that if S, T, and U are any regular expressions, the expressions S(T + U) and ST + SU denote the same language. (This distributive law is useful because it means that the + and · symbols behave just like ordinary addition and multiplication on naturals in this respect.) This is an example of a regular language identity — a statement about languages, with free variables of type "regular expression", which is true for all values of those free variables.

Normally the best way to prove that something is true for all members of a recursively defined set is to use induction on the definition. In the next section we'll use this method to prove statements about all regular languages. But we can prove many identities by simpler methods, especially identities that involve only union and concatenation. The recursive definition of the star operator will also be important.

Because the + operator in regular expressions translates to the union of the corresponding languages, everything we know about the union operator by itself also holds for +. So we get a number of identities without any further proof: S + ∅ = S, S + S = S, S + T = T + S, S + (T + U) = (S + T) + U, and S + Σ* = Σ*, where S, T, and U are any regular expressions.

Back in Section 2.5 we proved a number of facts about concatenation of languages, such as S(TU) = (ST)U, S∅ = ∅S = ∅, and Sλ = λS = S. In each case the method was to prove that the two languages in question are equal by the equational sequence method (e.g., "w ∈ S(TU) ⇔ w ∈ (ST)U"), unrolling definitions wherever necessary and using basic facts about concatenation of strings. These latter facts, such as that (uv)w = u(vw) for any three strings u, v, and w, are reasonably obvious in themselves and were proved rigorously from the recursive definition of concatenation in Section 4.7.

We've almost proved that the regular languages form a semiring (as defined in Section 4.6) under union and concatenation. The only axiom left is the distributive law, which we can prove by the same equational sequence method:

w ∈ S(T + U) ⇔ ∃u : ∃v : w = uv ∧ u ∈ S ∧ v ∈ (T + U)
⇔ ∃u : ∃v : w = uv ∧ u ∈ S ∧ (v ∈ T ∨ v ∈ U)
⇔ ∃u : ∃v : w = uv ∧ [(u ∈ S ∧ v ∈ T) ∨ (u ∈ S ∧ v ∈ U)]
⇔ (∃u : ∃v : w = uv ∧ u ∈ S ∧ v ∈ T) ∨ (∃u : ∃v : w = uv ∧ u ∈ S ∧ v ∈ U)
⇔ w ∈ ST ∨ w ∈ SU
⇔ w ∈ ST + SU
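For finite languages the distributive law can be checked mechanically by computing both sides as sets of strings; this sketch (the class name, method names, and sample languages are our own choices, not from the text) does exactly that:

```java
import java.util.HashSet;
import java.util.Set;

// A finite-language sanity check of the distributive law S(T + U) = ST + SU,
// treating languages as sets of strings.
public class Distributive {
    // Concatenation of two languages: every u in s followed by every v in t.
    public static Set<String> cat(Set<String> s, Set<String> t) {
        Set<String> result = new HashSet<>();
        for (String u : s)
            for (String v : t)
                result.add(u + v);
        return result;
    }

    public static Set<String> union(Set<String> s, Set<String> t) {
        Set<String> result = new HashSet<>(s);
        result.addAll(t);
        return result;
    }

    // Returns true if S(T + U) and ST + SU are the same set of strings.
    public static boolean distributes(Set<String> s, Set<String> t, Set<String> u) {
        return cat(s, union(t, u)).equals(union(cat(s, t), cat(s, u)));
    }
}
```

A check on small samples is of course no substitute for the equational proof above, but it is a useful way to catch a false conjectured identity quickly.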

5.4.2 Identities Involving Kleene Star

To prove identities involving the star operator, we generally use the inductive definition of the star of a language from Section 5.1, with base case λ ∈ S* and inductive case (u ∈ S*) ∧ (v ∈ S) → uv ∈ S*.

To warm up, let's prove that the strings in S* are closed under concatenation. (Note that this statement is almost identical to the inductive step of the definition, with one important character of difference.)

Lemma: If u ∈ S* and v ∈ S*, then uv ∈ S*.

Proof: Induction on all v. Base case: if v = λ, then uv = u and clearly u ∈ S* implies uv ∈ S*. Inductive case: let v = wz with w ∈ S* and z ∈ S, and assume by the inductive hypothesis that uw ∈ S*. Then¹ by the definition of S* and associativity of concatenation, uv = u(wz) = (uw)z, which is in S* because uw ∈ S* and z ∈ S. ∎
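The Lemma can be spot-checked on finite samples by building all strings of S* up to a length bound with the inductive definition and confirming that concatenations of members stay inside. This is a sanity check under our own assumptions (the class name and sample language are not from the text), not a proof:

```java
import java.util.HashSet;
import java.util.Set;

// Builds the strings of S* up to a length bound, using the inductive
// definition: lambda is in S*, and if u is in S* and v is in S, so is uv.
public class StarClosure {
    public static Set<String> starUpTo(Set<String> s, int max) {
        Set<String> star = new HashSet<>();
        star.add("");                        // base case: the empty string
        boolean changed = true;
        while (changed) {                    // repeat the inductive step to a fixed point
            changed = false;
            Set<String> next = new HashSet<>(star);
            for (String u : star)
                for (String v : s)
                    if (u.length() + v.length() <= max)
                        next.add(u + v);
            if (!next.equals(star)) { star = next; changed = true; }
        }
        return star;
    }
}
```

Concatenating any two members of the resulting set (when the result still fits under the bound) always lands back in the set, just as the Lemma promises.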

We mentioned earlier that we can remember the hierarchy of operations in regular expressions by thinking of the star operator as exponentiation. Unlike the analogy of union and concatenation to sum and product, this analogy usually does not lead us to valid identities involving the star operator. For example, (xy)^n = x^n y^n for numbers, but there is no general relationship between the languages (ST)* and S*T*.

There aren't many useful identities where one expression involving the star operator is equal to another. The ones that exist, such as (S*)* = S*, are best thought of as two separate identities involving a subset relation between languages. For example, (S*)* = S* breaks down into (S*)* ⊆ S* and S* ⊆ (S*)*. The first of these we prove by induction on strings in (S*)*: for the base case we just observe that λ ∈ S*, and for the inductive step if u ∈ (S*)* and v ∈ S*, then uv ∈ S* follows from the inductive hypothesis u ∈ S* and the Lemma above. The second statement is a special case of another useful identity, T ⊆ T*. (Why is T ⊆ T* true? From the informal definition, it's clear that any string in T is the concatenation of zero or more strings in T, since it is the concatenation of exactly one such string. From the inductive definition, if we know t ∈ T we can conclude t ∈ T* from the inductive step of the definition, because λ ∈ T* and t ∈ T implies λt ∈ T*.)

Here are a few more useful subset identities:

Proposition: If S ⊆ T*, then S* ⊆ T*. In particular, since T ⊆ T*, it follows that if S ⊆ T then S* ⊆ T*.

Proof: As above, we use induction on strings in S* to prove that all of them are in T*. For the base case, we know λ ∈ T* because λ is in the star of any language. For the inductive case, let w = uv where u ∈ S* and v ∈ S. By the inductive hypothesis we have that u is in T*, the assumption of the proposition gives us that v ∈ T*, and the result follows by the Lemma above. ∎

Corollary: (ST)* and S*T* are each subsets of (S + T)*.

Proof: For (ST)*, simply note that ST ⊆ (S + T)* and apply the Proposition. For S*T*, the Proposition gives us that both S* and T* are subsets of (S + T)*. Any string in S*T* can be written uv, where u ∈ S* and v ∈ T*. But then since both u and v are in (S + T)*, the Lemma tells us that uv is there as well. ∎

¹This proof should be starting to look familiar — compare it with the proof of the Transitivity Theorem in Section 4.9.

5.4.3 Exercises

E5.4.1 Find an example of regular expressions S and T such that neither (ST)* ⊆ S*T* nor S*T* ⊆ (ST)* is true. Prove your answer.

E5.4.2 Prove that for any regular expressions S and T, (S + T + ST)* = (S + T)*.

E5.4.3 Prove that for any regular expression S and any natural n, (λ + S)^n = λ + S + ... + S^n. (Hint: Use induction on n.)

E5.4.4 Prove that if S and T are any regular expressions, (S + T)(S + T) = SS + ST + TS + TT.

E5.4.5 Prove that if S and T are any regular expressions over a one-letter alphabet, then ST = TS.

E5.4.6 Prove that for any regular expression S, SS* = S*S. You should use induction on the definition of the Kleene star language.

E5.4.7 Prove that for any regular expression S, (S*)^R = (S^R)*, where the R superscript denotes language reversal.

E5.4.8 Prove that for any regular expressions S and T, (ST)^R = T^R S^R, where the R superscript denotes language reversal.

E5.4.9 Let Σ be a finite alphabet and let A ⊆ Σ. Prove that for any regular expression S over Σ, S* ∩ A* = (S ∩ A*)*.

E5.4.10 Exercises 5.4.2 through 5.4.9 all refer to "any regular expressions". Would any of the statements fail to be true if they instead referred to "all languages"? (The statement of Exercise 5.4.5 would still refer to a one-letter alphabet.)

5.4.4 Problems

P5.4.1 What is the general relationship between (S + T)* and S* + T*? Are they ever equal? Is one sometimes, or always, a subset of the other? Prove your answer.

P5.4.2 Prove that if S and T are any regular expressions over a one-letter alphabet, e.g., Σ = {a}, and if n is any natural, then the languages (ST)^n and S^n T^n are equal. (Hint: Use ordinary induction on n and the result of Exercise 5.4.5.) Give examples of two regular expressions S and T (again with Σ = {a}) such that the languages (ST)* and S*T* are not equal.

P5.4.3 Find a general formula for (S + T)^n, where S and T are arbitrary regular expressions over a one-letter alphabet and n is an arbitrary natural. Prove (by induction on n) that your formula is correct. Repeat for (S + T + U)^n, also over a one-letter alphabet.

P5.4.4 Let Σ be the alphabet {a, b, a⁻¹, b⁻¹}. Define an equivalence relation ~ over strings in Σ* so that uv ~ ua⁻¹av ~ uaa⁻¹v ~ ub⁻¹bv ~ ubb⁻¹v for any strings u and v. (By transitivity, then, u ~ v if and only if there is some sequence of insertions and deletions of aa⁻¹, a⁻¹a, bb⁻¹, or b⁻¹b that leads from u to v.) It should be clear that no two different strings in {a, b}* are equivalent to one another. Let R be the set of equivalence classes of this relation.

(a) Prove that concatenation of elements of R is well-defined. That is, if u ~ v and w ~ x, then uw ~ vx.

(b) Define operations of union, language concatenation, and Kleene star on subsets of R, and prove that each of these operations is well-defined.

P5.4.5 Using the definitions in Problem 5.4.4, prove that the set of subsets of R satisfies all the ring axioms from Section 3.8, except the commutativity of multiplication, if we define "addition" to be union and "multiplication" to be language concatenation.

P5.4.6 Is it true that for any regular expressions S and T, (S* ∩ T*) = (S ∩ T)*?

P5.4.7 Prove that for any two languages S and T, (ST)*S = S(TS)*. Use induction on the definition of the Kleene star languages.

P5.4.8 This problem involves strings in {a, b, a⁻¹, b⁻¹}* and the equivalence relation ~ on these strings defined in Problem 5.4.4.

(a) Describe an algorithm that will input a string u and determine whether u ~ λ.

(b) Describe an algorithm that will input two strings u and v and determine whether u ~ v. (Hint: Use the algorithm of part (a).)

P5.4.9 Consider the set of all strings over {a, b} and define the equivalence relation ~ on these strings so that u and v are equivalent if and only if one can be transformed into the other by any sequence of insertions and deletions of the strings aa or bb. What are the equivalence classes of this relation? Prove your answer. How would you determine whether two strings are equivalent?

P5.4.10 Let Σ = {a, b} and define the language A to be (a + b)*b. Prove that A* = A ∪ {λ}.

5.5 Proving Properties of the Regular Languages

5.5.1 One's Complements of Languages

Because the regular expressions, like the numbers and the strings, are defined inductively, we can often prove a proposition for all regular expressions by induction. Let P(R) be a predicate whose one free variable represents a regular expression. The proof method is:

• Prove P(∅).

• Prove P(a) for all a ∈ Σ.

• Prove that for any regular expressions S and T, if P(S) and P(T), then P(S + T). Similarly prove that P(S) and P(T) imply P(ST).

• Prove that for any S, P(S) implies P(S*).

For our first example, recall the one's complement operation on strings in {0,1}*, where oc(w) is the string that has the same length as w and has a 0 where w has a 1 and vice versa. In Exercise 4.7.3 we introduced this operation and gave a recursive definition for it. We now define an operation on languages, also called oc, such that oc(L) is defined to be {oc(w) : w ∈ L}.
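On strings, oc is easy to render directly; this sketch (the class name is our own, and it uses a loop rather than the book's recursive definition) flips each bit:

```java
// One's complement of a string over {0,1}: flip each character.
public class OnesComplement {
    public static String oc(String w) {
        StringBuilder out = new StringBuilder();
        for (char c : w.toCharArray())
            out.append(c == '0' ? '1' : '0');
        return out.toString();
    }
}
```

Note that oc(xy) = oc(x)oc(y) for any strings x and y, the identity that the star case of the upcoming proof relies on.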

Though it will be a bit tedious in places, we'll be strict about applying the operation only to strings and languages, not to regular expressions themselves. Thus when S is a regular expression, we won't define "oc(S)" for the time being, just oc(L(S)) for the one's complement of the language of S.

Theorem: If S is a regular expression, oc(L(S)) is a regular language.

Proof: There are two main parts to the proof. First, we recursively define a function f that takes a regular expression S and returns a regular expression for the language oc(L(S)). In effect, we will give a recursive algorithm to compute this function. Second, we must prove by induction that this function definition is correct — that in each of the cases of the definition the expression f(S) given by the function really does denote the language oc(L(S)). First the definition:

f(∅) = ∅
f(0) = 1
f(1) = 0
f(S + T) = f(S) + f(T)
f(ST) = f(S)f(T)
f(S*) = f(S)*

Now we prove, by induction on all regular expressions S, that f(S) denotes the language oc(L(S)). There are six cases (three base cases and three inductive steps), one for each clause of the definition. For the base cases we must determine that the one's complements of the base languages are the languages given:

• We can verify oc(∅) = ∅ by substituting into the definition: oc(∅) = {oc(w) : w ∈ ∅} = ∅.

• Similarly, oc(0) = {oc(w) : w ∈ {0}} = {1} by the definition of oc on strings.

• Again, oc(1) = {oc(w) : w ∈ {1}} = {0} by the same definition.

For the three inductive cases we make the inductive hypothesis that for the particular regular expressions S and T, f(S) = oc(L(S)) and f(T) = oc(L(T)). Then we can verify:

• f(S + T) = f(S) + f(T) = oc(L(S)) ∪ oc(L(T)) by the definition of f and the inductive hypotheses. We have to verify that oc(L(S + T)) is equal to this last expression. But if w ∈ oc(L(S + T)), then w = oc(v) for some v ∈ L(S + T). By the equational sequence method for proving sets equal, v ∈ L(S + T) if and only if (v ∈ L(S)) ∨ (v ∈ L(T)) if and only if (w ∈ oc(L(S))) ∨ (w ∈ oc(L(T))) if and only if w ∈ (oc(L(S)) ∪ oc(L(T))).

• Similarly, f(ST) = f(S)f(T) = oc(L(S))oc(L(T)) by the definition of f and the inductive hypothesis. We need to show that oc(L(ST)) is equal to this last expression. Again by the equational sequence method, w ∈ oc(L(ST)) if and only if w = oc(v) and v ∈ L(ST). But v ∈ L(ST) if and only if v = xy for some x ∈ L(S) and y ∈ L(T). By an identity we'll prove in Exercise 5.5.1 below, w = oc(v) = oc(x)oc(y), so that v ∈ L(ST) if and only if w ∈ oc(L(S))oc(L(T)), and we have proven the desired equivalence.

• Finally, f(S*) = f(S)* = oc(L(S))* by the definition of f and the inductive hypothesis, and we must show that oc(L(S))* is equal to oc(L(S)*). Informally this is pretty clear from our identity oc(xy) = oc(x)oc(y) — the one's complement of the concatenation of zero or more strings is the concatenation of their one's complements. Formally, we do an induction for each direction of the proof and use this identity in the inductive step.

To show oc(L(S))* ⊆ oc(L(S)*), we use induction on all strings w ∈ oc(L(S))*, showing for each such string that oc(w) ∈ L(S)*. The base case of w = λ is satisfied because oc(w) = λ and λ ∈ L(S)*. For the inductive step, let w = xy with x ∈ oc(L(S))* and y ∈ oc(L(S)). The inductive hypothesis is that oc(x) ∈ L(S)*. Since oc(w) = oc(x)oc(y) and oc(y) ∈ L(S), the conclusion follows because L(S)* is closed under concatenation.

The other half of the proof is similar — for all w ∈ oc(L(S)*), that is, all w such that oc(w) ∈ L(S)*, we must show w ∈ oc(L(S))*. The base case is oc(w) = λ, which implies w = λ and w ∈ oc(L(S))*. For the inductive step, let oc(w) = xy such that x ∈ L(S)* and y ∈ L(S). By the inductive hypothesis oc(x) ∈ oc(L(S))*. Then (using the identity again) w = oc(x)oc(y), and w ∈ oc(L(S))* because oc(y) ∈ oc(L(S)).

In each of the three inductive steps we have shown that if S is a regular expression made directly from subexpressions T and/or U, then we can define oc(L(S)) from oc(L(T)) and oc(L(U)) using operations that are part of the definition of regular expressions. So taking as our inductive hypothesis that the latter two languages are regular, we may conclude that oc(L(S)) is a regular language. ∎

Note that as a byproduct of our proof, we get a recursive algorithm to compute the function f, which inputs a regular expression S and outputs a regular expression for oc(L(S)). To produce our output we need to parse S, to determine its structure, specifically how it is made up of subexpressions. How can we do this in Java syntax?

We need to define a RegExp class for objects that denote regular expressions. Presumably we will use some sort of tree structure to actually store the base expressions and operators that make up the expression. But we don't need to decide on an implementation in order to describe our algorithm — we need only decide what methods we need to be available for a RegExp object. Given a RegExp object S, we first need to be able to tell whether it is a base expression or a combination of subexpressions. If it is a base expression we need to know what kind it is (empty set or letter), and if it is a combination we need to know what operator was used last and what the subexpressions are. This leads us to an interface something like this:

public class RegExp
{// Represents a regular expression.

    public RegExp ();
    // constructor returning RegExp for the base expression emptyset

    public RegExp (String w);
    // constructor returning RegExp for base expression equal to w

    public boolean isEmptySet();
    // is calling object the base expression "emptyset"?

    public boolean isZero();
    // is calling object the base expression "0"?

    public boolean isOne();
    // is calling object the base expression "1"?

    public boolean isUnion();
    // is calling object the union of two subexpressions?

    public boolean isCat();
    // is calling object the concatenation of two subexpressions?

    public boolean isStar();
    // is calling object the star of a subexpression?

    public RegExp firstArg();
    // returns first argument of union, concatenation, or star if this exists

    public RegExp secondArg();
    // returns second argument of union or concatenation if this exists

    public static RegExp plus (RegExp r, RegExp s);
    // returns r + s

    public static RegExp cat (RegExp r, RegExp s);
    // returns rs

    public static RegExp star (RegExp r);
    // returns Kleene star of r
}

If these methods are all available in the class, we can code up the algorithm for the function f like this:

public static RegExp f (RegExp s)
{// Returns a RegExp denoting the one's complement of the input's language
    if (s.isEmptySet()) return new RegExp ();
    if (s.isZero()) return new RegExp ("1");
    if (s.isOne()) return new RegExp ("0");
    RegExp oct = f (s.firstArg());
    if (s.isStar()) return star (oct);
    RegExp ocu = f (s.secondArg());
    if (s.isUnion()) return plus (oct, ocu);
    else return cat (oct, ocu);} // s.isCat() must be true

5.5.2 Reversal of Languages

We now do another similar example, using the operation of reversal on languages defined by L^R = {w^R : w ∈ L}.

Theorem: If S is a regular expression, L(S)^R is a regular language.

Proof: We'll again recursively define a function from RegExp objects to RegExp objects, and prove by induction on regular expressions that its output denotes the reversal of the language of its input. Here is a Java static method using the class definition above:

by induction on regular expressions that its output denotes the reversal of the language of its input. Here is a Java static method using the class definition above:

public static RegExp rev (RegExp s)
{// Returns a RegExp denoting the reversal of the language of s
    if (s.isEmptySet()) return new RegExp();
    if (s.isZero()) return new RegExp("0");
    if (s.isOne()) return new RegExp("1");
    RegExp trev = rev (s.firstArg());
    if (s.isStar()) return star (trev);
    RegExp urev = rev (s.secondArg());
    if (s.isUnion()) return plus (trev, urev);
    else return cat (urev, trev);} // s.isCat() must be true

So the function rev simply replicates all the regular expression operations except concatenation, which it reverses. To see it in action, consider an expression such as bac + (ab + bc)*ba(ba)*. By carrying out the recursive definition, you may verify that the output of rev on this input is cab + (ab)*ab(ba + cb)*. When there are two or more levels of concatenation, as in this example, each must be reversed separately.
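The book leaves RegExp as an interface sketch, so to watch rev run we can supply a hypothetical minimal tree implementation of our own (the class Rex, its fields, its factory methods, and its toString format are all assumptions, not the book's code):

```java
// A minimal immutable expression tree, just enough to run the rev algorithm.
// kind: 0 = base symbol, 1 = union, 2 = concatenation, 3 = star.
public class Rex {
    final int kind;
    final String sym;      // base symbol, used only when kind == 0
    final Rex left, right; // subexpressions (right is unused for star)

    Rex(int kind, String sym, Rex left, Rex right) {
        this.kind = kind; this.sym = sym; this.left = left; this.right = right;
    }

    public static Rex base(String s) { return new Rex(0, s, null, null); }
    public static Rex plus(Rex r, Rex s) { return new Rex(1, null, r, s); }
    public static Rex cat(Rex r, Rex s) { return new Rex(2, null, r, s); }
    public static Rex star(Rex r) { return new Rex(3, null, r, null); }

    // rev replicates every operation except concatenation, which it swaps.
    public static Rex rev(Rex s) {
        switch (s.kind) {
            case 0: return s;                              // a single letter reverses to itself
            case 1: return plus(rev(s.left), rev(s.right));
            case 2: return cat(rev(s.right), rev(s.left)); // the one reversed case
            default: return star(rev(s.left));
        }
    }

    public String toString() {
        switch (kind) {
            case 0: return sym;
            case 1: return "(" + left + "+" + right + ")";
            case 2: return left.toString() + right.toString();
            default: return "(" + left + ")*";
        }
    }
}
```

For example, reversing the tree for (ab)*c yields the tree for c(ba)*, matching the hand computation above in miniature.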

Now we must prove, by induction on all regular expressions S, that the regular expression returned by rev(S) denotes the reversal of the language of S. This proof has two base cases and three inductive cases, one case for each line of the function:

• The function is correct on input ∅ because the reversal of the empty language is the empty language. Why? ∅^R is the set of all strings w such that the string w^R is in ∅, and clearly there can be no such strings.

• If a is a letter, the regular expression a denotes the language {a}, and thus L(a)^R is defined to be {w : w^R ∈ {a}}. Since the only string whose reversal is in {a} is a itself, we have that the reversal of a single-letter language is itself and the function is correct on single-letter languages.

• Now for the inductive cases we assume that the function rev correctly computes L(S)^R and L(T)^R on inputs S and T respectively. On input S + T, then, rev(S + T) returns the union of rev(S) and rev(T), which by this inductive hypothesis is equal to L(S)^R ∪ L(T)^R. All we need to do is verify that this language is equal to L(S + T)^R. This is easy by the equational sequence method — for any string w, w ∈ L(S)^R ∪ L(T)^R if and only if (w ∈ L(S)^R) ∨ (w ∈ L(T)^R) if and only if (w^R ∈ L(S)) ∨ (w^R ∈ L(T)) if and only if w^R ∈ (L(S) ∪ L(T)) if and only if w^R ∈ L(S + T) (by the definition of the + operator) if and only if w ∈ L(S + T)^R.

• With the same assumptions about rev(S) and rev(T), the next case reduces to showing that L(ST)^R is equal to L(T)^R L(S)^R. We can prove this by again using the equational sequence method, the definition of concatenation of languages, and a string identity from Section 4.7² — for any string w:

w ∈ L(ST)^R ⇔ w^R ∈ L(ST)
⇔ w^R ∈ L(S)L(T)
⇔ ∃u : ∃v : (uv = w^R) ∧ (u ∈ L(S)) ∧ (v ∈ L(T))
⇔ ∃u : ∃v : (v^R u^R = w) ∧ (u^R ∈ L(S)^R) ∧ (v^R ∈ L(T)^R)
⇔ ∃x : ∃y : (xy = w) ∧ (x ∈ L(T)^R) ∧ (y ∈ L(S)^R)
⇔ w ∈ L(T)^R L(S)^R

Source: David Mix Barrington

Figure 1-1: The truth table for AND.

Source: David Mix Barrington

Figure 1-2: The truth tables for OR and XOR.

Definition: A truth table is a representation of the values of one or more compound propositions. Each row of the table represents one of the possible sequences of values for the base propositions involved, and each column represents a compound proposition. Each value in the table is the value of the compound proposition for its column, with the base values given for its row. We use 0 and 1 to denote the boolean values, as these are sometimes easier to distinguish than F and T.

Example: Figure 1-1 shows the truth table for the compound proposition p ∧ q.

The natural next compound proposition to consider is "the value of x is 7 or test returns a value". But should we consider this to be true if both its base propositions are true? In English the answer often depends on the context. In mathematics we cannot afford to be imprecise, and so we define two different "or" operations for the two possibilities:

Definition: Let p and q be propositions. The disjunction or inclusive or of p and q, written "p ∨ q" and read as "p or q", is the proposition that is true if either or both of p and q are true, and false only if both are false. The exclusive or of p and q, written "p ⊕ q" and read as "p exclusive or q", is the proposition that is true if one of p and q is true and the other false.

Example: Using our examples for p and q, we can read p ∨ q as "x is 7, or test returns a value, or both" and p ⊕ q as "x is 7, or test returns a value, but not both". If x = 6 and test returns a value, for example, both p ∨ q and p ⊕ q are true. If p ∧ q is true, on the other hand, then p ∨ q is true but p ⊕ q is false.

Figure 1-2 describes the complete behavior of both of these operators by a truth table. Java represents the inclusive OR operation by the symbol ||. As with AND, there is an alternate operation | that evaluates the second input even if the first input is true. So if both inputs are defined, || and | give the same answer. But if p is true and q causes an error, evaluating p | q will trigger the error but p || q will not. We will use || in this book, following the general practice. The exclusive OR operation is denoted in Java by the symbol ^.
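Java's &&, ||, and ^ realize exactly the AND, OR, and XOR columns of these truth tables, so a few lines of code can print the tables; this small sketch (the class and its row format are our own, not from the text) does so:

```java
// Prints a truth table for AND, OR, and XOR using Java's &&, ||, and ^.
public class TruthTable {
    // One row of the table, encoded as "p q  AND OR XOR" with 0/1 values.
    public static String row(boolean p, boolean q) {
        return (p ? 1 : 0) + " " + (q ? 1 : 0) + "  "
             + ((p && q) ? 1 : 0) + " " + ((p || q) ? 1 : 0) + " " + ((p ^ q) ? 1 : 0);
    }

    public static void main(String[] args) {
        System.out.println("p q  AND OR XOR");
        boolean[] vals = {false, true};
        for (boolean p : vals)          // one row per assignment of values
            for (boolean q : vals)      // to the base propositions p and q
                System.out.println(row(p, q));
    }
}
```

Each printed row corresponds to one row of Figures 1-1 and 1-2, with the base values on the left and the compound values on the right.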

• For the final case, we need as inductive hypothesis only that rev(S) returns a regular expression denoting L(S)^R. From the code, we see that rev(star(S)) returns the star of rev(S), which thus denotes (L(S)^R)*. Our task, then, is to show that (L(S)^R)* is equal to the desired (L(S)*)^R. Informally this is fairly clear, because the reversal of the concatenation of zero or more strings from L(S) is going to be the concatenation of their reversals, in reverse order, as we can see from the identity (uv)^R = v^R u^R. Formally, we need to prove that for any string w, w ∈ (L(S)^R)* ↔ w ∈ (L(S)*)^R, which we can convert to w ∈ (L(S)^R)* ↔ w^R ∈ L(S)*. Unfortunately the equational method doesn't work very well with the definition of a star language, so we'll do the two halves of this equivalence separately, each by an induction on all strings in a star language.

Let's first prove that w ∈ (L(S)^R)* → w^R ∈ L(S)*. The base case is w = λ, and since w^R = λ and λ is in any star language we have w^R ∈ L(S)*, as desired. For the inductive step, we assume w = uv, u ∈ (L(S)^R)*, v ∈ L(S)^R, and (as inductive hypothesis) u^R ∈ L(S)*. Our concatenation identity tells us that w^R = v^R u^R, and thus w^R is the concatenation of two strings in L(S)*. (The inductive hypothesis says that u^R ∈ L(S)*, and the assumption says that v^R is in L(S), which we know to be a subset of L(S)*.) Star languages are closed under concatenation (as shown in Section 5.4), so we may conclude that w^R ∈ L(S)* as desired.

The other direction is very similar, except that now we induct on the definition of L(S)* with w^R as our arbitrary string. The base case is that w^R = λ, in which case w = λ as well and thus w ∈ (L(S)^R)*. For the inductive step, we assume w^R = uv, u ∈ L(S)*, v ∈ L(S), and (as inductive hypothesis) u^R ∈ (L(S)^R)*. Then as before, w = v^R u^R can be written as the concatenation of two strings in (L(S)^R)* and is thus in (L(S)^R)* itself.

The induction is complete, and the correctness of the algorithm is proved for all regular expressions.
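The concatenation identity (uv)^R = v^R u^R that drives this argument is easy to spot-check in Java (the method names rev and identityHolds are ours, not the book's):

```java
public class Reversal {
    // Reverse a string: the ^R operation on strings.
    static String rev(String w) {
        return new StringBuilder(w).reverse().toString();
    }

    // Check the identity (uv)^R = v^R u^R for given strings u and v.
    static boolean identityHolds(String u, String v) {
        return rev(u + v).equals(rev(v) + rev(u));
    }

    public static void main(String[] args) {
        System.out.println(rev("bac"));                // cab
        System.out.println(identityHolds("ab", "bc")); // true
    }
}
```

A spot check is not a proof, of course; the induction above is what establishes the identity for all strings.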

It's instructive to note that the method of induction on all regular expressions doesn't always work, just as ordinary mathematical induction doesn't always work to prove a statement about numbers. For example, we'll prove in Chapter 14 that if S and T are regular expressions, then L(S) ∩ L(T) is a regular language. But we don't know enough about regular languages to prove this now, and induction won't help us. What happens if we try?

We can start easily enough by induction on T: S ∩ ∅ = ∅, and either S ∩ a = a or S ∩ a = ∅, so that in any case S ∩ a is regular. By set identities, S ∩ (T + U) = (S ∩ T) + (S ∩ U), so we have our first inductive case. But we have a problem calculating S ∩ (TU). To get a string in this language we take a string from T, concatenate it with a string in U, and just happen to get a string in S. The two substrings needn't be in S themselves, so the inductive hypothesis (that S ∩ T and S ∩ U are regular languages) seems completely irrelevant. In Problem 5.5.4 we'll try to salvage this proof, by making some additional assumptions on T and getting a weaker result.

(Footnote: This step uses the fact that every string has a unique reversal, so that a string exists if and only if its reversal exists.)

5.5.3 Exercises

E5.5.1 Using the recursive definition of the oc function on strings from Section 4.7, prove that for any strings x and y, oc(xy) = oc(x)oc(y).

E5.5.2 Verify carefully from the definition of rev that its value on input bac + (ab + bc)*ba(ba)* is in fact cab + (ab)*ab(ba + cb)*.

E5.5.3 Recall that a string u is a prefix of a string v if v = uw for some string w. (So the prefixes of aba are λ, a, ab, and aba.) If L is a language, we define the prefix language of L, written Pref(L), to be the set of all prefixes of all words in L; formally, Pref(L) = {u : ∃v : v ∈ L ∧ ∃w : uw = v}. Prove that if S is any regular expression, then Pref(L(S)) is a regular language. Give a recursive algorithm that on input S will produce a regular expression for Pref(L(S)).

E5.5.4 (uses Java) Give a recursive boolean method that inputs a regular expression S and determines whether L(S) is the empty language. Note that other regular expressions besides ∅, for example, a(b + a)*(∅ + a*∅)(bb)*, denote the empty language.

E5.5.5 (uses Java) Making calls to the solution of Exercise 5.5.4 if necessary, give a recursive algorithm that inputs a regular expression S and determines whether L(S) = {λ}. Note that ∅* is far from the only regular expression denoting this language; consider for example the star of the regular expression for ∅ above, or ∅ + ((ab)*baa∅ba)* + ∅*.

E5.5.6 (uses Java) Give a recursive boolean method that inputs a regular expression S and determines whether L(S) is a finite language. You may make calls to the solutions of Exercises 5.5.4 and 5.5.5 if necessary.
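These exercises all follow one pattern: structural recursion over the five cases of a regular expression. As a sketch of that pattern (using our own minimal AST, not the book's representation), here is the kind of method Exercise 5.5.4 asks for:

```java
public class RegexEmpty {
    // Minimal regular-expression AST, one node per case of the recursive definition.
    // kind: 'e' = empty language, 'a' = single letter, '+' = union, '.' = concatenation, '*' = star.
    static class Rex {
        char kind; Rex left, right;
        Rex(char kind, Rex left, Rex right) { this.kind = kind; this.left = left; this.right = right; }
    }

    // Recursive emptiness test: a union is empty iff both branches are,
    // a concatenation is empty iff either factor is, and a starred
    // expression always contains the empty string, so it is never empty.
    static boolean isEmpty(Rex s) {
        switch (s.kind) {
            case 'e': return true;
            case 'a': return false;
            case '+': return isEmpty(s.left) && isEmpty(s.right);
            case '.': return isEmpty(s.left) || isEmpty(s.right);
            default:  return false;   // '*'
        }
    }

    public static void main(String[] args) {
        Rex a = new Rex('a', null, null), empty = new Rex('e', null, null);
        Rex s = new Rex('.', a, new Rex('+', empty, new Rex('.', a, empty))); // a(empty + a empty)
        System.out.println(isEmpty(s));                      // true: both union branches are empty
        System.out.println(isEmpty(new Rex('*', s, null)));  // false: the star contains lambda
    }
}
```

The same skeleton, with different base cases and combining rules, handles Exercises 5.5.5 through 5.5.10.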

E5.5.7 Suppose we had two algorithms: one that, given two regular expressions S and T, gave a regular expression denoting the language L(S) ∩ L(T), and another that, given a regular expression S, gives a regular expression for the complement of L(S). How could we use these to test whether two arbitrary regular expressions denote the same language?

E5.5.8 (uses Java) Give a recursive boolean method that inputs a regular expression S and determines whether λ is a member of L(S). (Note that this is a different question from that of Exercise 5.5.5.)

E5.5.9 (uses Java) Write a recursive boolean method that inputs a regular expression S, over the alphabet Σ = {0,1}, and outputs whether L(S) contains at least one string in the language Σ*0Σ*, that is, at least one string that contains at least one 0.

E5.5.10 (uses Java) Write a recursive boolean method that inputs a regular expression S, over the alphabet Σ = {0,1}, and outputs whether L(S) ⊆ 0*, that is, whether every string in L(S) consists only of 0's.

5.5.4 Problems

P5.5.1 If L is a language and a is a letter, we define the quotient of L by a, written La⁻¹, as follows. La⁻¹ consists of all those strings that would be in L if you appended an a to them; formally, La⁻¹ = {w : wa ∈ L}. Prove that if S is any regular expression, and a is any letter, then L(S)a⁻¹ is a regular language. Give a recursive algorithm to produce a regular expression for this language.

P5.5.2

Can you write an algorithm, similar to those of Exercises 5.5.4 and 5.5.5, that inputs a regular expression and tells whether its language is Σ*? What goes wrong? (We'll be able to solve this problem with the results in Chapter 14.)

P5.5.3

(Hard) In Problem 5.2.5 we considered one-letter regular languages (subsets of a*), and defined almost periodic languages. Prove that every regular expression over a one-letter alphabet denotes an almost periodic language. (Hint: Use induction over regular expressions with only one letter. The base languages are clearly almost periodic, so the work comes in the three induction steps. Given that S and T denote almost periodic languages with thresholds t_S and t_T and moduli q_S and q_T respectively, you need to show that there is an appropriate threshold and modulus for each of ST, S ∪ T, and S*. It turns out that you can let q_{ST} and q_{S∪T} each be the least common multiple of q_S and q_T, and let q_{S*} be the same as q_S.)

P5.5.4

Suppose a language X satisfies the properties of closure under substring (whenever uvw ∈ X for any strings u, v, and w we may conclude that v ∈ X) and closure under concatenation (whenever u and v are in X, so is uv). Prove by induction on all regular expressions T that X ∩ L(T) is a regular language.

P5.5.5

Prove that if X ⊆ Σ* is a non-empty language satisfying the properties of closure under substrings and closure under concatenation, then X = A* for some set A ⊆ Σ (that is, X is the set of strings in Σ* that only use certain letters).

P5.5.6

If L is any language, we define its substring language Sub(L) to be the set of all strings y such that y is a substring of some string x ∈ L. Prove that if S is any regular expression, then Sub(L(S)) is a regular language. Give a recursive algorithm to produce a regular expression for this language.

P5.5.7

(uses Java) Write a recursive boolean method that inputs a regular expression S, over some finite alphabet Σ, and a string w ∈ Σ*, and outputs whether w ∈ L(S). (Hint: Use the results of both Exercise 5.5.8 and Problem 5.5.1.)

P5.5.8

If Σ is any non-empty finite alphabet, we can partition the languages over Σ into four sets:

• The set containing only the empty language ∅,
• The set of nonempty languages that contain only strings of even length,
• The set of nonempty languages that contain only strings of odd length, and
• The set of languages that contain both a string of even length and a string of odd length.

Describe a recursive algorithm that inputs a regular expression S and determines to which of the four sets L(S) belongs.

Let Σ and Δ be two nonempty finite alphabets. A string homomorphism from Σ* to Δ* is a function f defined by first specifying a string f(a) ∈ Δ* for every letter a ∈ Σ, then using the recursive rules f(λ) = λ and f(wa) = f(w)f(a) for every string w ∈ Σ* and every letter a ∈ Σ. If X is any subset of Σ*, we define f(X) to be the subset of Δ* consisting of all strings that are equal to f(w) for some string w ∈ X. Prove that if X is the language of any regular expression over Σ, then f(X) is the language of some regular expression over Δ. Describe an algorithm to compute an expression for f(X) given one for X, and argue that your algorithm is correct.

P5.5.10

(uses Java) Write a recursive boolean method that takes a regular expression S (over Σ = {a,b}) as input and returns whether L(S) contains at least one string that is in the language Σ*aΣ*bΣ*. You will need the results of Exercises 5.5.4 and 5.5.9.

5.6 Excursion: Hofstadter's MU-Puzzle

In the first chapter of Gödel, Escher, Bach, Hofstadter presents the following puzzle. He defines a language L over the alphabet {M, I, U} by the following rules, which are here translated into the language we've been using in this book:

0. MI ∈ L.
1. If wI ∈ L, then wIU ∈ L.
2. If Mw ∈ L, then Mww ∈ L.
3. If wIIIx ∈ L, then wUx ∈ L.
4. If wUUx ∈ L, then wx ∈ L.
5. A string is in L only if it can be shown to be so by Rules 0 through 4.

Note that this is a perfectly good recursive definition of the kind we've been using; the language L is defined in terms of itself, but in a consistent way. We can use the rules to develop a collection of strings known to be in L. For example, starting with MI we could show MIU ∈ L (Rule 1), MIUIU ∈ L (Rule 2), MIUIUIUIU ∈ L (Rule 2 again), and so forth. If we take some of our other choices at the beginning, we get a chance to use Rules 3 and 4, which shorten the string we have; for example we might get MII (Rule 2), MIIII (Rule 2 again), MIIIIU (Rule 1), MIIIIUIIIIU (Rule 2), MIUUIIIIU (Rule 3), MIUUIUU (Rule 3 again), MIUUI (Rule 4), and MIUUIU (Rule 1).

Hofstadter's puzzle is to determine whether the string MU is in L. The obvious first thing to try is to start listing all possible derivations in some systematic way, maybe in a tree with a string on each node and a child for each one-rule derivation from each string. Then we'd have to see whether MU ever came up. But if we went for a while without deriving MU (which we would, as it happens) we'd be no more confident that it couldn't show up later. Suppose MU were in L, but the shortest derivation took thousands of steps; it could take years to find and we'd never know that we'd searched far enough¹³.

The solution to the puzzle, as revealed later in the book, is NO. MU is not in L, because no possible derivation leads to it. How can we prove such a thing? We find a property that all strings in L have, but that MU doesn't have.

Just in the course of trying out derivations, we would quickly come to the conclusion that every string in L begins with an M. This is a fact we can prove by induction on the definition: the base string MI begins with an M, and each of Rules 1 through 4 preserves the property of beginning with an M. That is, if you start with a string beginning with an M and apply one of these rules, the resulting string also begins with an M. (We can check this case by case. For example, for Rule 1 we derive a string wIU from a string wI. If we assume that wI begins with an M, then it follows that w, and thus wIU, also begin with an M. The other three cases are similar.) So every string derived by these rules begins with an M, and by Rule 5 these are all the strings in L. But this fact doesn't help us with the puzzle, because the string MU also begins with an M. We need a more specific property to solve the puzzle, involving a little bit of number theory:

Theorem: A string is in L if and only if it is in M(I+U)* and it contains a number of I's that is not divisible by three.

5.6.1 Writing Exercise

The assignment is to write up some part of the proof of the above Theorem. It clearly breaks up into two parts:

1. Prove that every string in L is in M(I+U)* and has a number of I's not divisible by three. This is very similar in form to the "begins with an M" result above. Note that this is sufficient to solve Hofstadter's original puzzle, since the number of I's in MU is divisible by three.

2. Prove that every string in M(I+U)*, whose number of I's is not divisible by three, is in L. To do this we have to show how to derive an arbitrary string of this type. This is most easily done in two parts. First, show how to derive every string in MI* with an acceptable number of I's. (Hint: You can easily get any number of I's that is a power of two from MI by Rule 2, then use the other three rules to subtract three I's as many times as needed.) Then show that any one of the desired strings can be obtained from one of these strings in MI*.

¹³It's worth mentioning Hofstadter's motivation for introducing this puzzle, which is as a metaphor for formal logic. If MI is an "axiom" and Rules 0 through 4 are the "rules of inference", the language L is the set of "theorems" of this particular logical system. We can "prove" certain strings to be theorems, but working within the system we can't prove that MU, for example, isn't a theorem. For more on this see Hofstadter's book.

Recursion

5.7.1

Induction

and Induction

in General

on a Recursive

Definition

‘We've seen several examples of a general phenomenon now, where we define a concept by:

e Defining one or more base cases, e Defining one or more induction rules — ways to create new objects from old objects, e Declaring that the only possible objects are those created by a finite number induction rules, starting from the base cases.

of uses of the

Any time you do this, you will define something. Of course, what is important is to show that the thing you define is exactly the thing you are trying to define. This means showing that the set of “defined objects” is equal to the set of “target objects”, and usually reduces to two subgoals — proving that every target object is defined and that every defined object really is a target object. For example, in Excursion 5.6 we had a recursive definition for a particular language (the set of strings derivable from MI by the rules of the MU-puzzle) and we had to prove that these were exactly the strings with a certain property (exactly one M at the beginning, and a number of J’s not divisible by three). For one half of the proof, we took an arbitrary string with the property and showed how to derive it from MI by the rules. The proof of the other half used a general principle that deserves a name of its own: Principle

of Induction

on a Recursive

Definition:

If a property holds for all the base cases

of a recursive definition (of the form specified above),

and that property

is preserved by all the

inductive steps, then it holds for all objects in the defined class. All of the induction rules we’ve seen so far can be thought of as examples of this principle. It’s reasonably clear that the principle is valid, but it’s worth noting that we can prove it using the

definition and the single, original Principle of Mathematical Induction for the naturals. truth follows from our definition of the naturals.)

(Hence, its

To see this, we assign a number called the age to every object in the defined class. The age of an object is defined to be the minimum number of steps needed to produce the object from one of the base cases. For example, the age of the string MIIU in the MU-puzzle language is two,

because

we

can

make

MIJIU

in two

each rule can verify that it can’t be made

for MIIU,

such as MI

MIIIIUI + MIUUI ~ shortest derivation only.)

steps from

in one step.

MJ,

+ MII > MIIII > MIIIIU MIUUIU

— MIIU,

So we prove the Principle of Induction on a

proposition by ordinary induction:

going

through

MII,

and

a check of

(We could also have a longer derivation

— MIIIIUITIIU

> MIIIIUIUU

>

of length nine, but the age is determined by the

particular recursive definition by proving a particular

“For any natural n, any object of age n or less has the prop-

5-30

erty”. The base case, with n = 0, follows because the objects of age zero are exactly the base objects, and we are given that all of them have the property. For the induction, let x be an arbitary object of age n+ 1. It is produced by some rule from objects of age at most n. By the (strong) inductive hypothesis, all these objects have the property, property, « must have it as well.

5.7.2

A Digression:

Continued

and then because the rule preserves the

Fractions

Let’s look at another example. Define a real number to be buildable if it is equal to 1, if it is a buildable number plus 1, or if it is the reciprocal of a buildable number!®. Which real numbers are

buildable?

Again we can try examples, finding that 2, 1/2, 3, 1/3, 3/2, and 2/3 are all buildable.

We might notice that all of these are positive rational numbers, to prove that every buildable number is a positive rational:

and use induction on the definition

e 1 is a positive rational.

e If « =p/q is a positive rational, so is a+ 1= oo e If «

=p/q is a positive rational, so is 1/2 = q/p.

q can be zero.)

So is every positive rational number buildable?

What

(Since x is a positive number, neither p nor

about, for example,

41/17?

It’s not obvious

where to start to make 41/17 from 1, but we can get some ideas from looking at how the process might finish. We could get to 41/17 by either taking the reciprocal of 17/41, or by adding one to 24/17. So we need to figure out how to construct either 17/41 or 24/17 — the latter looks easier because it involves smaller numbers in the fraction. So how could we make 24/17? From either 17/24 or 7/17, and again the latter looks better. Now we couldn’t get 7/17 by adding one to a positive rational, so we have to get it from 17/7. And now we can play the same game again,

getting 17/7 by adding one to 10/7, and 10/7 by adding one to 3/7. which is two plus 1/3, and we saw above that 1/3 is buildable.

3/7 is the reciprocal of 7/3

So we have a general process, much like the Euclidean Algorithm from Section 3.3, that will allow us to build any positive rational. If p/q is less than one, we build q/p and then take its reciprocal; if p/q is greater than one, we build (q/p) — 1 and then add one. The pro must terminate

because every time we take a reciprocal, the denominator gets smaller. a recursive algorithm:

public

rational

buildRational

(natural

num,

natural

{// Returns rational number num/den by adding

This is easily described as

den)

1 and taking reciprocals.

“Or, if you like, we’re proving the statement “For any natural n, any object of age n has the property” by strong

induction.

© This is really, of course, a standard recursive definition with one base case, “1 is buildable”, and two inductive

cases, “If x is buildable, so are x +1 and 1/2”. But recursive definitions often appear in this shortened form, so we'd better get used to it.

5-31

if

(num

==

den)

if (num < den) return

return

return

reciprocal

1;

(buildRational

1 + buildRational

(num

(den,

- den,

den);}

num));

From this algorithm, we can represent any positiv rational number in a particular form, called a continued fraction!®. For example, our analysis of 41/17 shows that it can be written:

EH

1 41/17 = 24+ —_— 2+5

Any positive integer, since it is buildable, can be written this way, where the numbers at the beginning of the denominators are the number of times we add one between each taking of reciprocals. If we look at just parts of the continued

fraction,

1 and 2+ ma

like 2, 2 + 3,

2

better!” approximations of 41/17, in particular 2, 5/2, and 12/5.

5.7.3

Periodic

Continued

Fractions

This suggests an intriguing idea. What real number?

we get successively

if we apply the buildRational

procedure to an irrational

An immediate problem is that it will never terminate, because if it did we would have

a continued fraction, and continued fractions denote rational numbers. So we’ll need to change our algorithm to exit with a partial answer, just as we would if we were computing digits of a rational number. Let’s try it on 2, which is about 1.414. Well, subtracting one gives us V2 — 1, which is less than one, so we take the reciprocal. Remembering some high-school algebra, we evaluate —+— by

multiplying top and bottom by V2 +1, back to where

we started with

/2.

We’ve

v2=1+

41 = /2+1.

giving

v2-1

Now when we subtract one, we're

shown:

1

1 =14+>——

a+r

There is an infinite continued fraction that is periodic even though it will never terminate. This is very similar to decimal or binary expansions of real numbers, where some are terminating (those rationals that can be written with a denominator

that is a power

of the base), some are periodic

(the rest of the rationals) and the rest go on forever without repeating a pattern.

What

good is

'6The remainder of the section is something of a digression, though it provides further examples of induction on a

recursive definition,

'7In fact (though we won’t go into it here) these are the best possible approximations with denominators of those

sizes.

5-32

this? Well, just as for 41/17 we can use the initial parts of this infinite continued fraction to get good rational approximations for /2: 1, 3/2, 7/5, 19/12, 41/29, and so on!®. ‘We might now wonder exactly which numbers have periodic continued fractions. We'll now get half the answer with an inductive proof. (The converse of this Theorem is Problem 5.7.5.) Definition: A non-trivial quadratic equation where a, b, and c are integers that are not all zero.

Theorem

is an equation

(The Periodic Continued Fraction (PCF)

of the form ax? + ba +¢

Theorem):

= 0

Every positive real number

that has a periodic continued fraction is also the solution of a non-trivial quadratic equation.

5.7.4

Proof of the PCF

Theorem

We begin by isolating a special case of the problem: Definition:

A real number has a pure periodic

continued

using the operations of adding one and taking reciprocals!®. fraction if it is buildable,

using these operations,

fraction if it is buildable from itself

A number has a periodic continued

from a number

with a pure periodic continued

fraction. ‘We need to find a property of real numbers that isn’t changed by taking reciprocals or adding one. Then if our starting point has that property, by induction, so will any number buildable from that

starting point. Lemma:

If a real number

one and taking reciprocals,

¥ is buildable from a number

{ by a sequence

of operations of adding

then + satisfies the equation y = “3+ for some naturals w,

x, y, and

2.

Proof:

We'll show

definition.

this by induction for all real numbers

buildable from ,

using the recursive

For the base case, we need only note that { itself can be written at.

For the inductive

step, let y be built directly from some number 6 by adding one or taking the reciprocal, and assume

by the inductive hypothesis that | = “4+, If y is the reciprocal of 6, it clearly can be written as Se,

The four naturals for y are the same ones as for 6, just in different places.

the other hand, we may write y as fem

So if

If y = 6+ 1, on

ets) by simple algebra.

a

is defined by a pure periodic continued fraction, it satisfies an equation of the form

Multiplying through

by y3 + z, we get the integer quadratic

equation

yb + (z-—w)8—a«

=0.

case at least’?

Are

we done,

for the pure

'SIt happens, for example, that dividing an musical octave

periodic

3? + 28 = Not

quite,

because

into twelve equal intervals is a good idea because seven

of these intervals combine to give you notes with a three-to-two frequency ratio — 7/12 is an excellent rational approximation to the base-2 logarithm of 3/2. There are other ways to divide the octave, some of which sound better than others — if you are interested look up the composer Easley Blackwood.

‘°Of course we must rule out the possibility of using no operations, or just an even number of reciprocals, or every

number would have a pure periodic continued fraction.

5-33

Source: David Mix Barrington Figure 1-3: The truth table for NOT. Definition: Let p be a proposition. The negation of p, written sp and read as “not p” or "it is not the case that p”, is the proposition that is true when p is false and false when p is true. Example: Again using our given meanings for p and q, sp means “the value of x is not 7” and aq means “test does not return a value”. Note that there are several equivalent ways in English to express the negation, such as “It is not true that test returns a value”. Figure 1-3 is a truth table giving the values of the two compound propositions sp and gq. Note that each column has two 0’s and two 1’s — for example, —p is true for both the rows where p is false, because the value of q has no effect on the value of ap. ‘When we use the — operator along with other operators, we adopt the convention that the negation applies only to the next proposition in the compound proposition. Thus sp V q, for example, means “either p is false, or q is true, or both”, while (pV q) means “it is not true that p is true or q is true, or both”. Java denotes negation by the symbol “not equal”.

!. This character also occurs in the Java symbol

!= meaning

Our last two boolean operators are particularly important because they model steps taken in proofs.

The first captures the notion of a “legal move” in a proof. If we have established that p is true, for example, when would we be justified in saying that q must be true? Definition: Let p and q be propositions. The implicationp — q, read as “p implies q” or “if p, then q” is the proposition that is true unless p is true and q is false. Another way to say this is

that p — q

is true if p

is false or if q is true, but not otherwise.

Example: In our running example, p > q is the proposition “if x is 7, then test returns a value”. On the other hand, q — p is the proposition “if test returns a value, then a is 7”, which is a different proposition. Each one says that a certain situation is impossible. The first, p + q, means that it is not possible for 2 to be 7 and for test not to return a value. This would be true if test always returned a value, or if it checked x and continued and returned a value only if2 were 7. The second statement, gq — p, says that it is impossible that test returns a value and that « is not 7. Perhaps test sets x to 7 just before it returns, or test contains a statement while (x != 7) yt++; that sends it into an infinite loop unless z is 7. Implication can be rather confusing on a first encounter.

1-27

First, there are many

ways to express

we haven’t ruled out the possibility that this quadratic equation reduces to just 0 = 0 (note what happens, for example, if you start with 6 = et). But we can do this, by noting a stronger statement that can be made about the equation in the previous lemma:

Lemma: If ¥ is buildable from f as above, and thus y = wie yB+z , then the number”° wz — xy is either 1 or —1.

Proof: The base case has wz—xy = 1. Taking a reciprocal changes wz—a ry into ye—zw, multiplying this quantity by —1. Adding one to the whole fraction changes wz — ry to (w+ y)z—(a%+2z)y which equals wz —

So if this number is already 1 or —1, either operation keeps this fact true.@

This Lemma allows us to conclude that our equation y8? + (z — w)3 — z = 0 is non-trivial meaning that the three coefficients are not all zero. If y # 0, then clearly the equation is non-trivial. If y = 0, on the other hand, the Lemma says that wz — = wz is equal to 1 or —1. Sow —z can only be zero if both

w and

z are zero, or both are —1.

This

makes

the fraction

6 =

it

or

B att. which reduces to 8 = 8 + x, which is either trivially true (if 0) or has no solution. Could a pure periodic continued fraction reduce to 8 = 8? We'll show in Problem 5.7.3 that this

can happen only if the operations used are an even number of reciprocals (a case we’ve ruled out) — if adding one is ever used then at least three of the four naturals w, x, y, and z are positive. So much for pure periodic continued fractions — our remaining problem is to insure that if a is buildable from a number { with a pure periodic continued fraction, then a as well as { is the solution of a quadratic. Our Lemma tells us that a can be written as were but in order to use

this to get a quadratic equation in a we need to somehow write 8 in terms of a. Can we run the process of making a from { in reverse? We can, because actually there was nothing special about adding one in the Lemmas above — subtracting one would work as well: Lemma:

If 3 is made from a by taking reciprocals and subtracting one, then 8 =

j, k, and & are integers and if — jk Proof:

eS

where i,

is either 1 or —1.

We follow the proofs of the above Lemmas

a

exactly

So 6 is a quotient of linear polynomials in a, and we want to use this to get a in terms of a and thus get our quadratic. If we apply the given sequence of operations to this quotient for 3, of course, we'll just reduce to the use! equation a =a. But what if we take our expression for 8, and apply the operations to make { from itself? We'll show this in an example, then finish our argument.

Define a to be given by the continued fraction

4

.



2

?°Some readers will no doubt recognize this number as the determinant of the matrix about matrices in Chapter 8

5-34

(

w

¥

z* ) — we'll talk

1 Soa = >—7 ay

and B=

First we write 3 in terms of a: start with a (dato ), take the reciprocal Da

era

(9254), subtract two (=),

take the reciprocal again (4),

and subtract one, getting Seek.

Now the other equation tells us that we can get 8 from { by adding three and taking the reciprocal, so working with this last fraction we add three to get =8¢+2 =2a--1 and take the y reciprocal to get 6 = at. Now we apply the operations needed to convert 3 to a: add one (S33), take the reciprocal

and take the reciprocal again to get a = aes.

(=3543): add two (358),

non-trivial

quadratic

a= ise (/1so= 106!

equation

—13a? + 13a — 3 =

0, or 13a? — 13a + 3 =

Both roots of this equation are positive, as it happens,

derivation of a from 8 = =

Evi

This gives us the 0, which

solves

to

but working through the

confirms that the correct value for a is 1.

So this process, of (1) computing § in terms of a, (2) substituting into the pure continued fraction to get another expression for 7 in terms of a, and (3) using the expression for a to get a in terms of a, will always get us a quadratic equation.

How can we guarantee that this equation is non-trivial?

‘We'll show in Problem 5.7.4 that as long as the pure continued fraction has at least one addition of one, then the resulting equation for a is all right.

5.7.5

Exercises

E5.7.1

Prove that under the rules of the MU-puzzle of Excursion 5.6, every string derivable from MIU is derivable from it by Rule 2 alone. (Hint: Prove by induction on the definition that every such string has the property that the other three rules are not applicable.)

E5.7.2

We've defined a rooted directed tree to be either a single node (the root) or a single node (also a root) with ares to one or more nodes, each of which is the root of a rooted directed tree. A directed graph is said to be two-colorable if we can assign one of two colors to each node so that no arc goes from a node to another node of the same color. Prove, by induction on the definition of rooted directed trees, that every such tree is two-colorable.

E5.7.3

Let p and q be any positive naturals. when run with inputs p and g. (Hint:

Prove that the function buildRational terminates Use induction on q, then induction on p within each

step of that induction.) E5.7.4

Convert 21/55 into a continued fraction.

E5.7.5 Convert the continued fraction with denominators 1, 2, 3, ..., 7 to an ordinary fraction.

E5.7.6 Write out the proof of the last Lemma of this Section.

E5.7.7 If a/b and c/d are two non-negative rational numbers, with a, b, c, and d all naturals and a/b < c/d, we define the mediant of these two numbers to be (a + c)/(b + d). Prove that the mediant is a rational number strictly between the two numbers.

E5.7.8 Here we will use the mediant (defined in Exercise 5.7.7 above) to define a series of sequences of rational numbers, with each number in the interval from 0 to 1. The first sequence S_0 is (0, 1). We make S_{i+1} by inserting the mediant of each two adjacent numbers between those numbers. For example, to make S_1 we insert the mediant of 0 and 1 in the middle to get (0, 1/2, 1). Similarly S_2 is made by inserting two mediants to get (0, 1/3, 1/2, 2/3, 1).

(a) What are S_3, S_4, and S_5?

(b) How many numbers are in the sequence S_n, for general n? Prove your answer by induction.

(c) For what n, if any, does the number 16/25 first appear in S_n?

E5.7.9

(requires exposure to complex numbers) Let z be a complex number and let a, b, c, and d be integers with ad − bc = 1. The upper half-plane is the set of complex numbers a + bi with b > 0. Prove that if z is in the upper half-plane, so is (az + b)/(cz + d).

E5.7.10

Suppose I am given two positive rational numbers as finite continued fractions, in terms of their denominator sequence. (So in our example earlier in the section, we showed that 41/17 can be represented by the sequence (2, 2, 2, 3).) How can I determine which number is larger from the sequences alone, without converting them into ordinary fractions?

5.7.6 Problems

P5.7.1 Define the language M ⊆ {a, b}* by the following two rules: (base) bb ∈ M, and (inductive) if wbx ∈ M for any strings w and x, then wabax ∈ M. Characterize the strings in M by giving a property that holds for exactly those strings. Prove by induction on this definition that your property holds for all strings in M. Then prove that every string with this property can be formed by the rules.

P5.7.2 Find the numbers denoted by the periodic continued fractions with the following sequences of numbers in the denominators (so, for example, the sequence for √2 was 1, 2, 2, 2, ...):

(a) 1, 1, 1, 1, ...
(b) 1, 3, 1, 3, 1, 3, ...
(c) 1, 2, 3, 2, 3, 2, 3, ...
(d) 4, 2, 1, 5, 6, 1, 5, 6, ...

P5.7.3 Prove that if γ is buildable from β by taking reciprocals and adding one, and if at least one operation of adding one is used, then γ = (wβ + x)/(yβ + z), where at least three of the four naturals w, x, y, and z are nonzero.

P5.7.4

Suppose that β is buildable from itself by some sequence g of operations of taking reciprocals and adding one. Suppose also that α is buildable from β by some similar sequence f. Let h be the sequence of operations that consists of undoing f, then doing g, then doing f. It is clear from the Lemmas in this section that h yields a fraction of the form α = (iα + j)/(kα + l), with il − jk equal to 1 or −1. Prove that it is not possible for this fraction to yield a trivial equation. (Hint: Prove by induction on the sequence f that i = l = 1 and j = k = 0 is impossible. For the inductive step, the new h consists of subtracting one, doing the old h, and then adding 1.)

P5.7.5 (Hard) Prove that every positive real number that is the solution of an integer quadratic equation has a periodic continued fraction. (Hint: Assume that the number is given as (i + j√r)/k where i, j, and k are integers — it must be of this form because of the quadratic formula. Show (a) that by subtracting one and taking reciprocals, you can force the number into this form with i < 0, j > 0, and k > 0, (b) that by rewriting the formula you can force j²r − i² to be divisible by k, (c) that once these two conditions are true, taking reciprocals and subtracting one leads to positive numbers that can still be expressed in this form, with these conditions still true, and with the same value of j, and finally (d) that since there are only finitely many numbers satisfying all the conditions of (c), continuing the process of taking the continued fraction must eventually produce the same number twice.)

P5.7.6 Prove that every rational number in the unit interval appears in the sequence S_n (defined in Exercise 5.7.8) for some natural n.

P5.7.7 Prove that in the sequences defined in Exercise 5.7.8, every rational number appears first in lowest terms. That is, no adjacent numbers a/b and c/d create a mediant where a + c and b + d have a common factor.

P5.7.8

Prove that every positive irrational real number corresponds to an infinite continued fraction, representable by an infinite sequence of positive naturals (like those in Problem 5.7.2, but not necessarily periodic).

P5.7.9

(requires exposure to complex numbers) Consider the upper half-plane defined in Exercise 5.7.9, and the particular subset R of this half-plane consisting of complex numbers a + bi where |a| ≤ 1/2 and a² + b² ≥ 1.

(a) If f is a function from C to C defined by f(z) = (az + b)/(cz + d) for integers a, b, c, and d with ad − bc = 1, we define f(R) to be the set of points f(z) for z ∈ R. Show that the boundaries of f(R), for any such f, must consist of vertical lines and/or circles with centers on the real axis.

(b) Find f(R) for the functions ...

(c) (harder) Prove that the images f(R), for all possible f of this form, cover the entire half-plane except for points that are the images under some f of points on the boundary of R.

P5.7.10

(uses Java) Write a pseudo-Java method that inputs two positive rational numbers x and y as sequences of positive naturals as in Exercise 5.7.10, and returns a sequence of positive naturals representing x + y in the same way.

5-37

5.8 Top-Down and Bottom-Up Definitions

5.8.1 Another Form For Recursive Definitions

In Section 5.7 we presented a general form for inductive definitions: base cases putting objects in the set, inductive cases putting objects in the set if certain other objects are already there, and a rule declaring that the only objects in the set are those forced to be so by these rules. If an object is in the set, then, there is some proof of this fact using the rules. But as we saw with the MU-puzzle in Excursion 5.6, such rules don't give us an obvious way to show that an object isn't in the set. There is an alternate form of recursive definition that is far more useful for deciding membership. Here is an example for the even numbers:

• 0 is an even number.
• 1 is not an even number.
• If n > 1, n is an even number if and only if n − 2 is an even number.

This translates almost verbatim into a recursive decision procedure for evenness. If a number is 0 or 1 we return the answer, otherwise we subtract two and make a recursive call to the same procedure. Our definitions of arithmetic operations in Section 4.6 all had this form.
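Read as a decision procedure, the definition above can be coded directly. Here is a minimal sketch in real Java (the class and method names are ours, not the book's):

```java
public class Evenness {
    // Top-down decision procedure for evenness: answer the two base
    // questions directly, otherwise reduce to the question about n - 2.
    public static boolean isEven(int n) {
        if (n < 0) throw new IllegalArgumentException("n must be a natural");
        if (n == 0) return true;   // base case: 0 is even
        if (n == 1) return false;  // base case: 1 is not even
        return isEven(n - 2);      // n > 1: even if and only if n - 2 is
    }

    public static void main(String[] args) {
        System.out.println(isEven(10)); // prints "true"
        System.out.println(isEven(7));  // prints "false"
    }
}
```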

We can think of the recursive algorithm as taking the original "high-level" question and reducing it to "lower-level" questions, which may be reduced to still lower-level questions, and so forth until we reach "base questions" that can be answered immediately. Because of this terminology, a definition such as this one for evenness is called a top-down recursive definition. By contrast, our previous recursive definitions began with the base cases and then asserted membership for objects that would be higher in this derivation tree, so they are called bottom-up definitions²¹. Here are some examples of top-down versions of some of our earlier definitions:

• 0 is a natural.
• If n is not 0, then n is a natural if and only if its predecessor is a natural.

Of course, if we allow things other than naturals to have predecessors, the algorithm based on this definition will not terminate if it is given a negative integer or a non-integer as input.

²¹Most beginning programming texts emphasize top-down programming, where you reduce your original problem into smaller problems until you can implement these in code. An equally important paradigm in practice is bottom-up programming, where you expand the repertoire of what you are able to code until you see how to put implementable tasks together to accomplish the desired task. The concepts are exactly analogous to top-down and bottom-up recursive definitions.

5-38

• λ is a string.
• If w is not λ, w is a string if and only if allButLast(w) is.
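Read top-down, these clauses support recursive algorithms on strings. As one illustration, here is recursive string reversal in real Java; the helper methods last and allButLast are our stand-ins for the book's string package, implemented with ordinary String operations:

```java
public class StringRec {
    // Stand-ins for the book's inverse functions on non-empty strings.
    static char last(String w) { return w.charAt(w.length() - 1); }
    static String allButLast(String w) { return w.substring(0, w.length() - 1); }

    // The reversal of the empty string is itself; otherwise the reversal
    // of w is last(w) followed by the reversal of allButLast(w).
    public static String reverse(String w) {
        if (w.isEmpty()) return w;
        return last(w) + reverse(allButLast(w));
    }

    public static void main(String[] args) {
        System.out.println(reverse("abc")); // prints "cba"
    }
}
```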

The key point enabling us to make the top-down version is that for numbers and strings, we have inductive operations that are invertible. With naturals, the inductive operation is the successor function, which has as an inverse function (almost) the predecessor function. Specifically, every number except zero has a unique predecessor. Similarly, we (almost) have an inverse to the append function for strings. Given any string w except for the empty string, we have a unique string v and letter a such that w = append(v, a). We called v allButLast(w) and a last(w), and these functions were crucial parts of all our recursive algorithms on strings. Even

for a more complicated recursive definition, like that of regular expressions, we can design such inverses. A given regular expression is either a base expression (∅ or a letter) or is made from one or two other regular expressions by the sum, concatenation, or star operators. In Section 5.5 we gave headers for a package of methods for a Java class called RegExp, whose objects denote regular expressions. These included:

• Boolean test methods that allow us to determine how the input regular expression was made, such as isEmptySet, isZero, isOne, isUnion, isCat, and isStar. For example, if s were a RegExp object denoting the regular expression S = 01 + 10*1, s.isSum() would return true and s.isStar() would return false. Although the regular expression S contains a star operator, it is not the last operator applied, so that S is not the star of another regular expression.

• Inverse functions firstArg and secondArg that return the first or second arguments of the operation used to make a compound expression. In the example above, s.firstArg() would return a RegExp object denoting the regular expression 01, and s.secondArg() would return an object denoting the regular expression 10*1. Note that we assume that the expression denoted by a RegExp object is fully parenthesized, so that there are only two arguments to the last operation.

• Methods to allow us to construct RegExp objects: constructors to produce the base objects denoting ∅ or a single letter, and methods plus, cat, and star to make larger RegExp objects from smaller ones.
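Since the book's RegExp class is only described here, the sketch below uses a hypothetical minimal stand-in (the field layout and constructor names are our own assumptions) just to show how the boolean tests and the inverse functions firstArg and secondArg support recursion on the structure of an expression, here counting star operators:

```java
public class MiniRegExp {
    // A hypothetical stand-in for the book's RegExp class.
    private final String op;          // "letter", "sum", "cat", or "star"
    private final MiniRegExp arg1, arg2;

    private MiniRegExp(String op, MiniRegExp a, MiniRegExp b) {
        this.op = op; this.arg1 = a; this.arg2 = b;
    }

    public static MiniRegExp letter() { return new MiniRegExp("letter", null, null); }
    public static MiniRegExp sum(MiniRegExp a, MiniRegExp b) { return new MiniRegExp("sum", a, b); }
    public static MiniRegExp cat(MiniRegExp a, MiniRegExp b) { return new MiniRegExp("cat", a, b); }
    public static MiniRegExp star(MiniRegExp a) { return new MiniRegExp("star", a, null); }

    public boolean isSum()  { return op.equals("sum"); }
    public boolean isCat()  { return op.equals("cat"); }
    public boolean isStar() { return op.equals("star"); }
    public MiniRegExp firstArg()  { return arg1; }
    public MiniRegExp secondArg() { return arg2; }

    // Recursion on the structure: count the star operators in the expression.
    public int starCount() {
        if (isStar()) return 1 + firstArg().starCount();
        if (isSum() || isCat()) return firstArg().starCount() + secondArg().starCount();
        return 0;  // a base expression contains no stars
    }

    public static void main(String[] args) {
        // S = 01 + 10*1, built fully parenthesized as in the text
        MiniRegExp s = sum(cat(letter(), letter()),
                           cat(letter(), cat(star(letter()), letter())));
        System.out.println(s.isSum());     // prints "true": the last operator is the sum
        System.out.println(s.isStar());    // prints "false"
        System.out.println(s.starCount()); // prints "1"
    }
}
```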

5.8.2 Ambiguity of Definitions

All these definitions have the property that a given object can be produced by only one sequence of steps — we call such a definition unambiguous. However, many recursive definitions are ambiguous. Consider the set S of numbers defined by these rules: 0 ∈ S, and if n ∈ S then both n + 2 ∈ S and n + 3 ∈ S. The number 5, for example, can be derived in two separate ways, from 2 and from 3. Similarly, in the case of the MU-puzzle from Excursion 5.6 we had strings that were in the language of strings derivable from MI, but that could be derived in more than one way.
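Because this definition of S is ambiguous, there is no single inverse step to undo, but we can still decide membership by trying both ways a number could have been formed, a small instance of the backtracking technique discussed later in this section. A sketch in Java (names ours):

```java
public class SetS {
    // Membership test for S (0 is in S; if n is in S, so are n + 2 and n + 3)
    // by backtracking: try both possible predecessors of n.
    public static boolean inS(int n) {
        if (n == 0) return true;
        if (n < 0) return false;          // no derivation can pass below 0
        return inS(n - 2) || inS(n - 3);  // n came from n - 2 or from n - 3
    }

    public static void main(String[] args) {
        System.out.println(inS(5)); // prints "true": derivable from 2 or from 3
        System.out.println(inS(1)); // prints "false": 1 is not derivable
    }
}
```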

5-39

Sometimes it is possible to take an ambiguous definition and come up with an unambiguous recursive definition for the same set. In the case of S, we can say: 0 ∈ S, 2 ∈ S, 3 ∈ S, and if n ≥ 3 and n ∈ S, then n + 1 ∈ S.

In the case of the MU-puzzle, we have a general solution to the decision problem that tells us that a string is derivable if and only if it is in M(I + U)* and has a number of I's not divisible by three. With a little work, we can use this characterization to get an unambiguous bottom-up definition of the language:

• Any string in MU*I or MU*IU*I is derivable.
• If w is derivable, so is wU.
• If w is derivable and x ∈ IU*IU*I, then wx is derivable.
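The characterization quoted from Excursion 5.6 also gives a direct, non-recursive membership test for the puzzle; a sketch in Java (method name ours):

```java
public class MUTest {
    // A string is derivable in the MU-puzzle if and only if it is an M
    // followed only by I's and U's, with its count of I's not divisible by 3.
    public static boolean derivable(String s) {
        if (s.isEmpty() || s.charAt(0) != 'M') return false;
        int countI = 0;
        for (int i = 1; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == 'I') countI++;
            else if (c != 'U') return false;  // not in M(I + U)*
        }
        return countI % 3 != 0;
    }

    public static void main(String[] args) {
        System.out.println(derivable("MI"));   // prints "true": one I
        System.out.println(derivable("MU"));   // prints "false": zero I's
        System.out.println(derivable("MIUI")); // prints "true": two I's
    }
}
```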

But in general it may be difficult or even impossible to convert an ambiguous definition to an unambiguous one. If we are left with an ambiguous definition, we have no hope of finding inverse functions for the construction rules. (For example, there is no way to take a derivable string from the MU-puzzle and determine how it was derived, because there is no single answer.) In some cases we can use a technique called backtracking, where we look at all the possibilities of how an object might have been formed, then all the possibilities for how those predecessors might have been formed, and so on until we reach a base case. If we have some limit on how far back we might have to go, this gives us a correct (though usually horribly inefficient) decision procedure. But in the MU-puzzle case there is no limit on how long a derivation might be for a given string, because strings can get both longer and shorter in the course of a derivation (but see Problem 5.8.4).

5.8.3 Examples From Graph Theory

Our various inductive definitions from graph theory in Section 4.9 provide us with more examples of bottom-up definitions. The definition of a rooted directed binary tree is easy to make top-down, because we have inverses for the constructing operation:

• A single node is a rooted directed binary tree.
• If a graph is not a single node, it is a rooted directed binary tree if and only if it consists of a root, with edges to the roots of two rooted directed binary trees.

In Lisp terms, a list is either an atom or the cons of two lists, and in the latter case the functions car and cdr can obtain these two lists for us.

How about undirected trees, which we defined to be connected undirected graphs with no cycles? We had a bottom-up definition for rooted trees that could be converted to top-down much as we did for rooted directed binary trees above. If we are simply given the undirected tree, we don't know how it was assembled, but we can determine a way in which it might have been assembled. We just take any node, imagine deleting it and its edges, and look at the connected components

5-40

of the graph that remains, one for each neighbor of the node we picked. We have broken the tree down into subtrees, and this process can continue recursively only so long before we reach the base case of single nodes.

For our final example, consider our recursive definition of paths in graphs. Here our bottom-up definition is not easily converted to a top-down one, because we don't know how to go backwards. Given two nodes x and y with a path from x to y of length n + 1, it's a difficult problem to find out which node the path hits just before y — in fact there may be more than one possibility. This makes it difficult to test for a path of a given length from x to y, if we insist on using the definition.

It is possible to look for the path by backtracking, by converting the following top-down definition into code:

• There is a path of length 0 from x to y if and only if x = y.
• If n > 0, then there is a path of length n from x to y if and only if there exists some node z such that there is a path of length n − 1 from x to z and an arc from z to y.

So we would try all possible nodes z with arcs to y, then all possible nodes with arcs to each of those z's, and so on. The process would eventually stop, because eventually we'd be looking for paths of length 0 that are easy to test. But the time taken could be horrible — for example, even if each node had only two arcs into it we would have to look at 2^n possibilities to test for a path of length n. Of course, there are better ways to test for paths in a directed graph, and we will look at these in Chapter 8.
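The top-down definition above converts directly into backtracking code. Here is a sketch in Java, assuming the graph is given as a boolean adjacency matrix arc, with arc[z][y] true when there is an arc from z to y (this representation is our assumption):

```java
public class PathTest {
    // Backtracking test for a path of length n from x to y: length 0 means
    // x == y; otherwise try every node z with an arc into y, and look
    // recursively for a path of length n - 1 from x to z.
    public static boolean hasPath(boolean[][] arc, int n, int x, int y) {
        if (n == 0) return x == y;
        for (int z = 0; z < arc.length; z++)
            if (arc[z][y] && hasPath(arc, n - 1, x, z))
                return true;
        return false;
    }

    public static void main(String[] args) {
        // A small directed graph: 0 -> 1, 1 -> 2, 0 -> 2
        boolean[][] arc = new boolean[3][3];
        arc[0][1] = true; arc[1][2] = true; arc[0][2] = true;
        System.out.println(hasPath(arc, 2, 0, 2)); // prints "true": 0 -> 1 -> 2
        System.out.println(hasPath(arc, 2, 0, 1)); // prints "false"
    }
}
```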

5.8.4 Exercises

E5.8.1

Give a top-down definition of the language (ab)*. Can you give a similar top-down definition for the language (a + ab)*?

E5.8.2 Give bottom-up and top-down definitions for the set of strings that have even length.

E5.8.3 Give bottom-up and top-down definitions for the set of naturals that are powers of 2 (naturals whose only prime divisor is 2) in terms of multiplication and/or division.

E5.8.4

Prove that the section’s unambiguous

bottom-up

recursive definition of the MU-puzzle

guage is correct. (You may use the results of Excursion 5.6 without proof.) recursive definition derived from this unambiguous bottom-up definition. E5.8.5

Give a top-down

Informally describe an implementation of the RegExp class and how the boolean test methods isEmptySet,

E5.8.6

lan-

(uses Java)

isZero, Here

isOne, isPlus,

is a bottom-up

string w such that w=

isCat, and isStar would

definition

of a palindrome,

work. which

w":

e \ and all single-letter strings are palindromes. e If w is a palindrome and a is

any letter, then awa is a palindrome.

e Nothing else is a palindrome. 5-41

can also defined

as a

E5.8.7

(a) Give a top-down recursive definition of palindromes, using our pseudo-Java string functions from Chapter 4, including first and allButFirst if you like.

(b) Write a recursive boolean pseudo-Java static method to test whether a string is a palindrome, using your top-down definition.

After Exercise 5.8.6 we have three definitions of palindromes: w = w^R, the bottom-up one given there, and the top-down one you are asked for. Prove by induction that all three definitions are equivalent.

E5.8.8

Recall that a graph-theoretic tree is a connected undirected graph with no cycles. Here is a bottom-up definition of graph-theoretic trees:

• A single node with no edges is a tree.
• If T is a tree, and T' is made from T by adding one new node with exactly one edge, which is to a node of T, then T' is a tree.
• Nothing else is a tree.

(a) Verify from this bottom-up definition that every tree is connected. (In Section 9.1 we will show that every tree, as given by this definition, has no cycles.)

(b) We'll also prove in Section 9.1 that every graph-theoretic tree has at least one node of degree one. Given this fact, argue that any such tree is a tree according to the bottom-up definition.

(c) Give a top-down recursive definition of trees corresponding to this bottom-up definition. You may assume that every tree has at least one node of degree one.

E5.8.9 Using either the top-down or bottom-up definition from Exercise 5.8.8, prove by induction on all trees that every tree is two-colorable (as defined in Exercise 5.7.2).

E5.8.10

Consider a family of directed acyclic graphs G_n with nodes s, a_1, a_2, ..., a_n, b_1, b_2, ..., b_n, arcs from s to a_1 and b_1, and arcs from each of a_i and b_i to each of a_{i+1} and b_{i+1} for every i from 1 to n − 1. (Figure 5-3 shows the graph G_4.) For each i, how many paths are there in G_i from s to a_i, or from s to b_i? Prove your answer by induction.

5.8.5 Problems

P5.8.1 Let n be a natural. Define an n-vertex graph recursively as follows. The set of vertices {1, ..., n} is an n-vertex graph. If G is an n-vertex graph, and e = (i, j) is an undirected edge, then G ∪ {e} is an n-vertex graph. Can this definition be made top-down? If so, do it, and if not, determine whether there is an equivalent bottom-up definition that can be made top-down.

P5.8.2

It is not obvious how to give a top-down definition of the Fibonacci numbers from Excursion 4.5. But define a Fibonacci pair by the following bottom-up definition: (0, 1) is a Fibonacci pair, and if (x, y) is a Fibonacci pair then so is (y, x + y).

(a) Prove that a pair of numbers is a Fibonacci pair if and only if it equals (F(n), F(n + 1)) for some number n.

5-42

Figure 5-3: The graph G_4 from Exercise 5.8.10.

(b) Give a top-down definition of the Fibonacci pairs.

(c) Give a recursive algorithm to decide whether a pair of numbers is a Fibonacci pair.

(d) The bottom-up way to decide whether a number is a Fibonacci number is to compute all the Fibonacci numbers in turn, until the target is either reached or exceeded. Give a top-down recursive procedure to determine whether a number is a Fibonacci number. Justify the correctness of your algorithm. (Hint: Recall the characterization of the Fibonacci numbers in terms of the number φ = (1 + √5)/2 from Excursion 4.5. If y were a Fibonacci number, then (x, y) would be a Fibonacci pair if we could find the right x, which should be about y/φ. Can you find all possible candidates for x?)

P5.8.3

Let {w_1, w_2, ..., w_k} be a finite set of strings. It is easy to make a bottom-up definition of the language L = (w_1 + w_2 + ... + w_k)*: λ is in L, if u is in L then so is uw_i for any i, and these are the only strings in L. Explain how to get a top-down definition of L if the w_i strings have the following property: if i ≠ j, neither w_i nor w_j is a suffix of the other. (A string x is a suffix of a string y if and only if ∃z : zx = y.) Show that (a + ab + ba)* does not have this property.

P5.8.4

Show that any derivable string in the MU-puzzle from Excursion 5.6 has arbitrarily long derivations (that is, show that for any n there is a derivation of more than n steps). Then use the solution of the puzzle to find a limit on the length of the shortest derivation for a given string. That is, find a function f such that any string of n letters can be derived in f(n) or fewer steps if it can be derived at all. The existence of this function means that a backtracking decision procedure is possible, though it would be horribly slow.

P5.8.5

Recall the language M from Problem 5.7.1, which was defined there by a bottom-up recursive definition. Although this particular definition is ambiguous (for example, we could derive abaaba from bb by first deriving either abab or baba), there is an equivalent set of rules that lead to an unambiguous definition. Describe this set of rules, formulate an unambiguous top-down definition of M, and describe a recursive algorithm to test membership in M. Is there only one way to do this?

P5.8.6 Here is a bottom-up definition of the path predicate in an undirected graph:

5-43

Source: David Mix Barrington

Figure 1-4: The truth tables for → and ↔.

an implication in English, such as "p implies q", "p only if q", and "if p, then q". Their use in English may or may not capture the exact mathematical meaning of the symbol → (given by the truth table in Figure 1-4). For example, saying "if your shoes are muddy, it must be raining" in English carries the extra implication that the two facts have something to do with each other. Mathematically, it doesn't matter whether the propositions are connected as long as their truth values make the implication true. Also, we don't often think about what that English statement means if your shoes are not muddy, but mathematically it is still true. The compound proposition p → q is true whenever p is false. So mathematically we could say "If 0 = 1, then I am Elvis". The great logician Bertrand Russell was allegedly once asked to prove a similar statement and came up with the following (adapted) proof: "Assume 0 = 1. Add one to both sides, getting 1 = 2. Elvis and I are two people (obvious). But since 1 = 2, Elvis and I are one person. QED." This convention may appear silly on a first encounter, but it matches actual mathematical practice. The key purpose of implications in proofs is to get true statements from other true statements. If you are starting from false premises, it is not the fault of the proof method if you reach false conclusions.

Our last boolean operator is also extremely useful in proofs because it captures the notion of two propositions being the same in all possible circumstances.

Definition: Let p and q be propositions. The equivalence of p and q, written p ↔ q and read as "p if and only if q", is the proposition that is true if p and q have the same truth value (are both true or both false), and false otherwise.

Example: Again using our values for p and q, p ↔ q means "x is 7 if and only if test returns a value".

Facts: If p and q are any propositions, p ↔ q has the same truth value as (p → q) ∧ (q → p), and the same truth value as ¬(p ⊕ q).

The most important thing about equivalence is that it works like equality in equations — if you know that two compound propositions are equivalent, you can replace one by the other in any context without changing any truth values. For example, if we know p ↔ q and p → (r ∨ (q ∧ p))

div {
    while (x >= r) { x -= r; q++; }
    return q;
}

E5.10.4

Carry out the Böhm-Jacopini construction on the flowchart of Figure 5-9.

E5.10.5

Argue carefully that Cases 0, I, II, and III are exhaustive — that any flowchart with n + 1 boxes must fit into one of the cases.

5-52

Figure 5-10: A flowchart for use in Exercise 5.10.4.

E5.10.6

(uses Java) Write a non-recursive pseudo-Java method to reverse a string, using some or all of the methods isEmpty, append, last, allButLast, first, and allButFirst. Draw a flowchart for your method.

E5.10.7

(uses Java) Consider the subset of toy-Java programs made without while loops. Assuming that all atomic statements terminate, prove by induction on the definition of these programs that they all also terminate.

E5.10.8 (uses Java) Suppose we are given an undirected graph with vertex set {0, 1, ..., n − 1} by a two-dimensional boolean array encoding its edge relation. That is, E[i][j] is a boolean that is true if and only if there is an edge between nodes i and j. Write a real-Java method that will input such an array and return the number of edges in the undirected graph. Draw a flowchart for your method.

E5.10.9 (uses Java) In real Java, a break statement may occur inside any loop. If the execution reaches the break statement, it continues from the end of the loop. Consider a version of the toy-Java language that allows a break statement inside any loop. Show how to translate a statement of this language into an ordinary toy-Java statement.

E5.10.10 (uses Java) In real Java, we have for loops as well as while loops. Consider a version of the toy-Java language that allows for loops with their real-Java syntax and meaning. Show how to translate a statement of this language into an ordinary toy-Java statement.

5.10.5 Problems

P5.10.1 Carry out the Böhm-Jacopini construction on the flowchart constructed in Exercise 5.10.2, getting a (long and complicated) toy-Java statement²⁵.

P5.10.2 Prove that every toy-Java statement can be simulated by a flowchart. (Use induction on the definition of a statement.)

²⁵Note the general phenomenon that when you turn an "x" into a "y" by some general construction and then turn the resulting "y" back into an "x" by another, you get a much nastier "x".

5-53

then we can replace p by q in the second compound proposition and conclude that q → (r ∨ (q ∧ q)) is true.

We've now defined a large set of boolean operators. By using them repeatedly, we can construct larger compound propositions from smaller ones. Once values are chosen for each of the atomic propositions in a compound proposition, we can evaluate the compound proposition by repeatedly applying the rules for evaluating the results of individual operators.

Example: Suppose p and q are true and r is false, and we want to evaluate the compound proposition p ↔ (q ∧ (r ⊕ ¬(q ∨ p))). We can see that q ∨ p is true, so ¬(q ∨ p) is false. Because r is also false, r ⊕ ¬(q ∨ p) is false. By the definition of ∧, then, q ∧ (r ⊕ ¬(q ∨ p)) is false. Finally, since p is true, it is not equivalent to q ∧ (r ⊕ ¬(q ∨ p)) and the entire compound proposition is false.
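This evaluation can be checked mechanically. In Java, exclusive or on booleans is the ^ operator and equivalence of two booleans can be tested with ==, so the example works out as follows (the snippet is ours, not the book's):

```java
public class Evaluate {
    public static void main(String[] args) {
        boolean p = true, q = true, r = false;
        // Evaluate p <-> (q AND (r XOR NOT(q OR p))) from the inside out,
        // exactly as in the text.
        boolean notQOrP = !(q || p);   // q || p is true, so this is false
        boolean xor = r ^ notQOrP;     // false ^ false is false
        boolean inner = q && xor;      // true && false is false
        boolean whole = (p == inner);  // true == false, so false
        System.out.println(whole);     // prints "false"
    }
}
```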

In order to evaluate a compound proposition in this way, we have to know in which order to apply the operations. We'll see in Problem 1.4.1 that the meaning of p ∨ q ∧ r, for example, is not clear unless we use parentheses to indicate which operation is to take place first. (That is, do we mean p ∨ (q ∧ r) or (p ∨ q) ∧ r?) In ordinary arithmetic and algebra we have a set of precedence rules telling us to perform multiplication before addition and so forth. Programming languages have a more formal set of rules telling which operation happens first in all situations. In compound expressions with boolean operators, we will insist that parentheses be used to tell us the order of operations in all cases, with only two exceptions. We give the ¬ operator the highest precedence, so that it is always applied to the next thing on its right (which may be a parenthesized expression). Also, we know that the operators ∧, ∨, ⊕, and ↔ are associative, so we don't need parentheses in expressions like p ∨ q ∨ r.

What if we have a compound proposition and we don't know the values of the atomic propositions in it? In general the compound proposition will be true for some settings of the atomic propositions and false for others, and we are interested in exactly which settings make it true. In Section 1.6 we will learn a systematic method to do this called the method of truth tables.

Definition: A tautology is a compound proposition that is true for all possible values of its base propositions.

Example: Recall our sample propositions p and q. A simple tautology is p ∨ ¬p, meaning "x is equal to 7, or x is not equal to 7". We can be confident that this is true without knowing anything about x. A more complicated tautology is (¬(p ∧ q) ∧ p) → ¬q. Let's see how to translate this. We first locate the last operation to be applied, which is the implication, and get "If (¬(p ∧ q) ∧ p), then ¬q". Then we attack the two pieces, using "it is not the case that" to express the complex use of ¬ in the first piece. The final translation is "If it is not the case that x = 7 and test returns a value, and x = 7, then test will not return a value". If you puzzle over this statement carefully you should be convinced that it is true, and that its truth depends not on any knowledge about x or test, but essentially on the definition of the word "and". All tautologies have this quality of obviousness to them, because if they depended on the content of their component propositions, they would be false for some possible contents and thus not be

1-29

Figure 5-11: A flowchart to decide membership in Σ*abΣ*.

P5.10.3 A crucial point in our proof of the Böhm-Jacopini theorem is that when we add the new boolean flags, this doesn't count as adding additional boxes to the flowchart.

(a) Explain where the reasoning of the proof fails if the flag actions do count as ordinary boxes.

(b) The original proof in the paper by Böhm and Jacopini solved this problem as follows: the quantity on which we inducted was 3r + f, where r was the number of regular boxes and f was the number of "flag" boxes. Explain carefully how this modification makes the proof valid.

P5.10.4 Suppose that our atomic statements include a data type large enough to contain a unique name for every box in a given flowchart. Show that by introducing a "program counter" of this type, this flowchart can be simulated by a toy-Java statement with a single loop.

P5.10.5

we can bound BJ(n +1) by no more than 2BJ(n) +11. Prove by induction for all naturals n that BJ(n) < 11(2" —1), remembering that BJ(0) = 0. P5.10.6

Consider the following flowchart (Figure 5-10), which describes an algorithm to determine whether the input string is in the language L*aby*: Carry out the Béhm-Jacopini

P5.10.7

construction on this flowchart to get a toy-Java statement.

In the solution to Exercise 5.10.4,

it is noted that the Bohm-Jacopini

construction could be

improved by separating out the case where the first box of the flowchart is the decision box of a statement that is already in if-then-else form. Modify the proof of the theorem so that this case is dealt with separately. Be precise about how to tell when you are in the new case. P5.10.8

(uses Java) Many

classes in real Java have compare

methods,

which take two arguments

«

and y and return an int value that is negative if x is less than y, 0 if x and y are equal, and positive if x is greater than y. If we were to model the result of such a method in a flowchart,

we would have a decision box with three arrows out of it rather than two.

5-54

P5.10.9

P5.10.10

(a)

Show how such a boxes.

decision box could be simulated in a flowchart with ordinary decision

(b)

Indicate how the proof of the Béhm-Jacopini Theorem would have to be altered to deal with flowcharts containing such three-way decision boxes.

Unlike toy-Java statements, pseudo-Java and real Java methods that return a value may do so at a return statement located anywhere in the code. How could we take such a method and alter it so that it could be modeled by a toy-Java statement, which always terminates at the end? In Exercise

5.10.7 we proved

that all toy-Java statements

made

without

while

loops must

always terminate. We know that this is not true of statements with while loops, as the simple example while (true) x++; shows. What kind of loops can we handle, and still guarantee termination? Hofstadter defines a programming language called BlooP in which the only loops allowed

natural, then “loop

are bounded

at most

loops.

x times:

If s is a statement

s”

and x is a variable of type

is defined to be a statement.

BlooP’s variables

are all of type natural, and its atomic statements can initialize variables to 0, assign them, increment them, add them, or multiply them.

(a)

Assume

that if we have any loop of this form, the variable x cannot be changed inside

the code of s.

Prove,

by induction on toy-Java programs

with bounded

loops, that all

such programs terminate. (b)

Explain why the extra assumption is necessary for the proof in (a) to be valid.

(c)

(harder)

The

Ackermann

function is a very rapidly growing

function from pairs of

naturals to naturals, defined by the rules A(0,n) = n+ 1, A(m+1,0) = A(m,1), and A(m+1,n+1) = A(m, A(m+1,n)). Prove that for any fixed BlooP program computing a function f(n) on naturals, the ome number m such that for all n, f(n) < A(m,n). Hence no BlooP program can possibly compute the Ackermann function.
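For readers who want to experiment, the three rules defining the Ackermann function translate directly into real Java. The sketch below is ours, not the text's (Problem 5.11.8 later asks for such a method); the class and method names are invented here.

```java
// A sketch of the Ackermann function as defined in P5.10.10(c).
// The three recursive rules are applied directly; the names
// AckermannDemo and ack are our own, not from the text.
public class AckermannDemo {
    public static long ack(long m, long n) {
        if (m == 0) return n + 1;            // A(0,n) = n+1
        if (n == 0) return ack(m - 1, 1);    // A(m+1,0) = A(m,1)
        return ack(m - 1, ack(m, n - 1));    // A(m+1,n+1) = A(m, A(m+1,n))
    }
    public static void main(String[] args) {
        // Small cases: A(1,n) = n+2, A(2,n) = 2n+3, A(3,n) = 2^(n+3) - 3.
        System.out.println(ack(1, 5)); // 7
        System.out.println(ack(2, 3)); // 9
        System.out.println(ack(3, 3)); // 61
    }
}
```

Already A(4,2) has 19,729 decimal digits, which illustrates why no program whose loops are all bounded in advance can keep up with this function.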

5-55

5.11 Correctness of Imperative Programs

5.11.1 Talking About Toy-Java Programs

Once we have a formal definition of a programming language, we can begin to think about proving statements about the behavior of programs. Back in Excursion 1.3 we gave an example of an informal proof of the correctness of a short Java program. Here we'll do the same thing a bit more rigorously, by defining a set of rules corresponding to our toy-Java imperative language from Section 5.10. This language has some set of base statements and three methods for combining statements into larger statements — sequencing, if-then-else, and the while loop. A serious treatment of proofs of program correctness is beyond the scope of this book. But the principles used are exactly those we have been studying — formal definition, recursion, and induction. And even at this basic level, we should get some insight into some of the factors influencing the design of real programming languages. Our toy language will not have recursion, but note that we have already seen how to use induction to prove statements about the behavior of recursive algorithms. In general the interplay of recursion and induction makes recursive programs easier to verify, which is one key advantage of using them²⁶.

What does it mean for a program to be correct? We normally break this down into subassertions. First, the program must terminate, because looping is always bad (though note that this restriction constrains the sort of programs we can talk about to the "data processing" world, as opposed to many control applications where a program never terminates in normal operation). Secondly, it must do whatever it's supposed to do. We say that a program is partially correct if whenever it terminates, it does the right thing.

5.11.2 Hoare Assertions

Our formalism for making assertions about what a program does will be that of Hoare assertions. These are named after the British computer scientist C. A. R. "Tony" Hoare, who invented them in the late 1960s. If S is a block of code, and p and q are any statements about the variables or other subject matter of S, then "p{S}q" is a Hoare assertion, and is interpreted: "If p is true before S runs, and S terminates, then q is true afterward." This is an assertion of partial correctness because it doesn't say that termination will happen, and also doesn't say what will happen if you run S without p being true.

The statements p and q ought to be familiar as a precondition and a postcondition of the method S. Common (and useful) advice for programming in any language is to state a precondition and postcondition for every subprogram of a program. This allows you to determine, without looking at the rest of the program, whether the subprogram does what it should. This practice was

²⁶Although our proofs of correctness for recursive programs hide the serious problems involved in proving a procedure call correct, such as parameter passing and aliasing.

5-56

a direct outgrowth of Hoare’s logic for proving programs correct. Now note that in order to prove an entire program partially correct, we need to break down a Hoare

assertion about the main program into Hoare assertions about subprograms, until (we hope) we reach the point of verifying Hoare assertions about atomic statements. If our program is one of the recursively defined toy-Java “statements” from Section 5.10, we can hope to establish systematic methods of doing this. Just as in other proof settings, the structure of the program can guide us as to what smaller assertions we need to make and verify.

5.11.3 Proof Rules for Hoare Assertions

Since we have no formal definition of atomic statements, we'll assume here that we can verify Hoare assertions about them. (In a real situation we’d need a definition of the semantics of the language, which would tell us how to do these verifications.) We need proof rules to cover the three ways to make large toy-Java statements from small ones:

• If S is the sequence "{ T; U }", then we can prove the Hoare assertion p{S}q by proving both the assertions p{T}r and r{U}q for any statement r. This means we need to look for what we expect to be true at the point in the execution when T has finished and U has not yet started, a natural thing for us to do.

• If S is the statement "if (x) T else U", we can prove p{S}q by proving both the assertions (p ∧ x){T}q and (p ∧ ¬x){U}q.

• If S is the statement "while (x) T", then we need to look for a loop invariant: a statement p for which we can prove (p ∧ x){T}p. From this we are allowed to conclude p{S}(p ∧ ¬x).

Note that the loop introduces the possibility of non-termination, so if we want to be confident that the program will terminate we have to prove this separately. The most common way to do this is to find some quantity that is guaranteed to change in some way that eventually causes x to be false.
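As a concrete illustration of the while rule (our example, not the text's), here is a real-Java loop annotated with a precondition, a loop invariant p, and the postcondition that follows from p ∧ ¬x when the loop exits:

```java
// Computing 0 + 1 + ... + n with a loop invariant, annotated in
// Hoare style. The class and method names here are our own.
public class InvariantDemo {
    // Precondition: n >= 0.
    // Postcondition: the returned value equals n*(n+1)/2.
    public static int sumToN(int n) {
        int i = 0, sum = 0;
        // Loop invariant p: sum == i*(i+1)/2 and i <= n.
        while (i != n) {          // guard x: (i != n)
            i++;
            sum += i;
            // (p ∧ x){body}p: the invariant is restored here.
        }
        // On exit we know p ∧ ¬x: sum == i*(i+1)/2 and i == n,
        // which gives the postcondition.
        return sum;
    }
    public static void main(String[] args) {
        System.out.println(sumToN(10)); // 55
    }
}
```

Termination is the separate argument: the quantity n − i is a natural that decreases on every pass, so the guard must eventually fail.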

Of course these rules only tell us which moves are legal to make in the game of proving the original statement.

Just as in Chapters 1 and 2, we can benefit from Solow's "forward-backward" method. Here a "forward" move consists of taking a precondition and seeing what you can prove from it. If you don't know specifically what you want to prove, you might form the strongest postcondition, which is the AND of all the possible things you can prove (we can hope that there aren't too many of these). More commonly, you work backwards from the desired postcondition, to get the weakest precondition required to make it true. Ideally, you can show this weakest precondition to be equivalent to true, or to the given preconditions of your program.

Software engineers, programming language designers, and other computer scientists are constantly debating how important formal proof is in the programming process. Some place great stock in the possibilities of machine proof, and others do not, but there is general agreement that reasoning

about programs at the level we’ve done here is a vital tool for any programmer in an imperative language. You should also be able to begin to see how this notion of formal proof has influenced

5-57

the design of languages, at least to the point where Algol-type languages (such as Pascal, C, C++, Ada, or Java) have replaced the earlier less structured languages (such as FORTRAN).

5.11.4 Exercises

E5.11.1

Let S be the toy-Java atomic statement x = 2*y + 1; and T be the atomic statement y = 2*x + 3;, where x and y are variables of type natural. Prove the Hoare assertion (y == 4){S;T}(y == 21).

E5.11.2

With S and T as in Exercise 5.11.1, let U be the toy-Java statement if (y % 2 == 0) S else T. Prove the Hoare assertion (y == 4){U}(y == 4).

E5.11.3

(uses Java) Here is a toy-Java procedure inspired by a Zen koan quoted by Hofstadter. Recall that Boolean is a wrapper class in Java, so that a Boolean object contains a single boolean value.

public static void ganto (boolean sayWord, Boolean cutOffHead) {
    if (sayWord) cutOffHead = new Boolean(true);
    else cutOffHead = new Boolean(true);}

Prove the Hoare assertion true{ganto}cutOffHead.booleanValue(), using the given proof rule for if-then-else.

E5.11.4

Formulate a proof rule for assertions about a Java do ... while loop.

E5.11.5

Formulate a proof rule for assertions about a Java switch statement.

E5.11.6

Formulate a proof rule for assertions about a Java for statement.

E5.11.7

Prove that the answer to Exercise 5.10.6 terminates, given that the input is a finite string.

E5.11.8

Write the answer to Exercise 5.10.6 as a toy-Java program R, and prove the Hoare assertion p{R}q, where p is the proposition "w is a string" and q is the proposition "the output string is we".

E5.11.9

Formulate and prove correctness and termination statements for the toy-Java statement of Exercise 5.10.8, which counts the edges in an undirected graph.

E5.11.10

A triangle in an undirected graph is a set of three distinct nodes x, y, and z such that the edges (x,y), (x,z), and (y,z) are all in the graph. Following Exercise 5.10.8, write a real-Java method that counts the triangles in a given undirected graph. Prove, using Hoare assertions, that your method is correct.

5.11.5 Problems

P5.11.1

Let A be the following toy-Java statement:

5-58

{ y = 1;
  x = 6;
  while (y != 32) {
    y *= 2;
    x -= 1;} }

Prove the Hoare assertion true{A}(x == 1).

P5.11.2

Write a toy-Java statement called fib, using a loop, that takes a natural n as input and sets a variable f to the n'th Fibonacci number F(n) according to the definition in Excursion 4.5. Prove the Hoare assertion ("n is a natural"){fib}(f == F(n)). (A toy-Java statement is given an input by having that variable defined when it begins to execute.)

P5.11.3

Here is a toy-Java method that uses some instance methods for the pseudo-Java class Stack, which represents a stack of thing objects. If s is a Stack, the boolean method s.isEmpty() tests s for emptiness, and the method s.pop() returns the top thing on s and removes it from s.

public static natural clearAndCount (Stack s)
{// Empties stack s and returns the number of things that were in it
    thing x;
    natural size = 0;
    while (!s.isEmpty()) {
        size++;
        x = s.pop();}
    return size;}

Prove that this method is partially correct for the precondition "Stack s is a stack containing n thing objects" and the postcondition "Stack s is empty and the returned value is n." Use the proof rule for loop invariants.
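For experimentation, here is a real-Java rendering of clearAndCount (our sketch; we substitute java.util.ArrayDeque for the pseudo-Java Stack class and Object for thing):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A real-Java rendering of clearAndCount, with the loop invariant
// noted in comments. The stack type here is our substitution.
public class ClearAndCountDemo {
    public static int clearAndCount(Deque<Object> s) {
        int size = 0;
        // Invariant: size == number of things popped so far, and the
        // original contents of s were those things plus what remains.
        while (!s.isEmpty()) {
            size++;
            s.pop();
        }
        // Invariant plus the false guard give: s is empty and
        // size == n, the original number of things.
        return size;
    }
    public static void main(String[] args) {
        Deque<Object> s = new ArrayDeque<>();
        s.push("a"); s.push("b"); s.push("c");
        System.out.println(clearAndCountDemoRun(s));
    }
    private static int clearAndCountDemoRun(Deque<Object> s) {
        return clearAndCount(s); // 3 for the stack built above
    }
}
```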

P5.11.4

Carefully prove the partial correctness of the method div from Exercise 5.10.3, for the preconditions and postconditions given there.

P5.11.5

Prove by induction that the method div from Exercise 5.10.3 will terminate, given that the precondition is true.

P5.11.6

Using the code for the Euclidean Algorithm from Exercise 3.3.7 (a), prove (using Hoare assertions) that the returned value is actually the greatest common divisor of the two inputs.

P5.11.7

Using the code for the Euclidean Algorithm from Exercise 3.3.7 (a), formulate and verify Hoare assertions to prove the result of Problem 4.11.3, that if the two inputs to the algorithm are each at most the Fibonacci number F(n), the algorithm performs at most n − 2 divisions.

P5.11.8

Write a real-Java method to compute the Ackermann function as defined in Problem 5.10.10. Prove by induction that your code terminates for any two natural inputs. (The output numbers may well be large enough to cause integer overflow, but this should affect only the correctness of the result and not the termination.)

P5.11.9

Here is the most interesting part of the Java code for the well-known bubble sort algorithm:

5-59

public static void bubblesort (Comparable [] A) {
    int n = A.length;
    for (int i = 0; i < n-1; i++)
        for (int j = 0; j < n-1; j++)
            swap (A, j, j+1);}
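For experimentation, here is a self-contained real-Java version of this sketch (ours); the conditional swap helper is our guess at the method described below:

```java
import java.util.Arrays;

// A runnable version of the bubblesort sketch; swap compares two
// adjacent entries and exchanges them if they are out of order.
public class BubbleDemo {
    static void swap(Comparable[] A, int j, int k) {
        if (A[j].compareTo(A[k]) > 0) {   // out of order?
            Comparable t = A[j];
            A[j] = A[k];
            A[k] = t;
        }
    }
    public static void bubblesort(Comparable[] A) {
        int n = A.length;
        for (int i = 0; i < n - 1; i++)       // n-1 full passes
            for (int j = 0; j < n - 1; j++)   // each pass bubbles the max right
                swap(A, j, j + 1);
    }
    public static void main(String[] args) {
        Integer[] A = {5, 3, 1, 4, 2};
        bubblesort(A);
        System.out.println(Arrays.toString(A)); // [1, 2, 3, 4, 5]
    }
}
```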

Here swap is a method that compares two given entries of a given array and exchanges them if they are out of order. Assuming that swap is correct and that no two elements in the array are equal, prove (using Hoare assertions) that after bubblesort has run, the items in A are in the correct order.

P5.11.10

Here is a toy-Java statement div (with informal atomic statements) that inputs a string of digits and determines whether it represents an integer that is divisible by 3:

int r = 0;
while (input digits remain) {
    s = integer value of next digit;
    r = (r + s) % 3;}

Prove the Hoare assertion "(the string represents an integer x) {div} ((x % 3 == 0) if and only if (r == 0))".
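The informal atomic statements can be made concrete in real Java; this sketch (ours) keeps r equal to the digit sum of the input so far, taken mod 3:

```java
// Concrete version of div: r holds the digit sum of the digits read
// so far, mod 3, so at the end r == 0 exactly when 3 divides the number.
public class DivBy3Demo {
    public static boolean divisibleBy3(String digits) {
        int r = 0;
        for (char c : digits.toCharArray()) {
            int s = c - '0';   // integer value of next digit
            r = (r + s) % 3;
        }
        return r == 0;
    }
    public static void main(String[] args) {
        System.out.println(divisibleBy3("123"));  // true  (123 = 3 * 41)
        System.out.println(divisibleBy3("1000")); // false
    }
}
```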

5-60

Index Ackermann function 5-55 action box 5-48 age of an object 5-30 almost periodic language 5-11, 5-26 ambiguous definition 5-39 arithmetic expression 5-45

backtracking 5-40 base case of a recursive definition 5-30 Böhm-Jacopini Theorem 5-49

bottom-up programming 5-38 bottom-up recursive definition 5-38 bounded loop 5-55

bubble sort 5-59 buildable real number 5-31 center-exit loop 5-52

closure under concatenation 5-26 closure under substring 5-26 concatenation product of languages 5-2 continued fraction 5-32 decision box 5-48 determinant of a 2 x 2 matrix 5-34

double letter 5-6

EE language 5-13 EEP language 5-14 Fibonacci

pair 5-42

flowchart 5-47 finite language 5-7

graph (recursive definition) 5-42 graph-theoretic tree 5-42 Hoare assertion 5-56 induction rule of a recursive definition 5-30 invertible operation 5-39 Kleene star operator 5-2

logically describable language 5-9 loop invariant 5-57 MU-puzzle

5-28

one’s complement of a language 5-19 one’s complement of a string 5-19 output box 5-48 palindrome 5-41

parsing a string 5-21 partially correct program 5-36 periodic continued fraction 5-32, 5-33 Periodic Continued Fraction Theorem 5-33 postcondition 5-56 precondition 5-56 prefix language 5-25 prefix of a string 5-25 Principle of Induction on a

Recursive Definition 5-30 pure periodic continued

fraction 5-33

quotient of a language by a letter 5-25 RegExp pseudo-Java class 5-21 regular expression 5-2 regular language 5-2 regular language identity 5-15 reversal of a language 5-22 rooted tree 5-40 star-free regular expression 5-6 star-free regular language 5-9 start box 5-48 string homomorphism 5-26 strongest postcondition 5-57 suffix of a string 5-43 top-down programming 5-38 top-down recursive definition 5-38

toy-Java language 5-47 toy-Java statement 5-47 triangle 5-58 triple letter 5-11

two-colorable graph 5-35 unambiguous definition 5-39 upper half-plane 5-36 weakest precondition 5-57

non-trivial quadratic equation 5-33

5-61

Chapter 6: Fundamental Counting Problems

"Morning." "Morning." "What have you got, then?" "Well, there's egg and bacon; egg, sausage and bacon; egg and spam; egg, bacon and spam; egg, bacon, sausage and spam; spam, bacon, sausage and spam; spam, egg, spam, spam, bacon and spam; spam, spam, spam, egg and spam; spam, spam, spam, spam, spam, spam, baked beans, spam, spam, spam and spam; or lobster thermidor aux crevettes with a mornay sauce garnished with truffle paté, brandy and a fried egg on top and spam."

"Have you got anything without spam in it?"

"Well, there's spam, egg, sausage and spam. That's not got much spam in it."

"I don't want any spam."

Many mathematical problems come down to determining the number of elements in some finite set. How many different telephone numbers obey a particular set of naming rules? How many of the possible cards in the deck will improve this poker hand? How many possible passwords might a hacker have to try to find the right one? Combinatorics is the branch of mathematics dealing with such problems. In this chapter we'll see a number of techniques to answer the question "how many?", as well as a number of applications.

• We usually count the elements of a finite set by finding a bijection, or one-to-one correspondence, between it and some other set that is easier to count. We thus need techniques for finding these bijections, as well as a "library" of counting problems that we know how to solve. The sum rule and the product rule will allow us to count sets that are made up from simpler sets.

• We'll use the sum and product rules to solve the four basic counting problems — how many combinations of k elements can be chosen from a universe of size n, with or without replacement and with or without an order on the elements chosen.

• An important example of our techniques will be the counting of the strings (of a particular length) in a given formal language. In particular, counting the strings in the balanced parenthesis language will lead us to the Catalan Numbers, the solution to a wide variety of counting problems.

©Kendall Hunt Publishing Company

Figure 6-1: How Many Fish?

6.1 Counting: Sum and Product Rules

6.1.1 Correct Counting and Bijections

Suppose you want to count the fish in an aquarium (Figure 6-1). Your two main problems are to:

• Count every fish (including those hiding behind rocks, camouflaged against the bottom of the tank, and so forth), and

• Not count any fish twice (even if they move).

The counting of mathematical objects poses the same two problems. For example, how many naturals less than 12 are divisible by either 2 or 3? We might add the six even numbers {0, 2,4, 6,8, 10} to the four divisible-by-three numbers {0,3,6,9} and say “10”, but this would be counting 0 and

6 twice each!¹

On the other hand, if we go through all twelve numbers and test each one in turn,

we get {0,2,3,4,6,8,9, 10}, a set of size 8. Since we considered each number separately, we can’t

have counted any of them twice, and since we considered them all, we must have the right answer.

Such a correct counting of a finite set is actually a bijection between the set to be counted and the naturals

{1,2,3,...,k},

where

k is the size of the set.

Checking

that we have

a bijection is

just seeing that we’ve counted correctly. For the function from naturals to elements to be onto, we have to count everything, and for it to be one-to-one, we can’t have counted anything twice.

¹By subtracting 2 for the two instances of double counting, we get the correct answer of 8 — we'll discuss double counting in detail in Section 6.2.
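The count in this example can be checked mechanically; this sketch (ours) computes the answer both directly and by the two-set rule |A ∪ B| = |A| + |B| − |A ∩ B|:

```java
// Counting the naturals below 12 divisible by 2 or 3, two ways:
// directly, and by the two-set Rule of Inclusion-Exclusion.
public class CountDemo {
    // Tests each of the twelve numbers in turn.
    public static int direct() {
        int count = 0;
        for (int n = 0; n < 12; n++)
            if (n % 2 == 0 || n % 3 == 0) count++;
        return count;
    }
    // The same count as |A| + |B| - |A ∩ B|, where A is the evens
    // and B the multiples of 3.
    public static int byRule() {
        int a = 0, b = 0, both = 0;
        for (int n = 0; n < 12; n++) {
            if (n % 2 == 0) a++;
            if (n % 3 == 0) b++;
            if (n % 6 == 0) both++;   // in both A and B
        }
        return a + b - both;
    }
    public static void main(String[] args) {
        System.out.println(direct()); // 8
        System.out.println(byRule()); // 6 + 4 - 2 = 8
    }
}
```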

6-2

tautologies. But tautologies can still be useful in two ways. First, any tautology gives us a potential tool to use in proofs, because it generates a true statement that we might then use to justify other statements. In particular, tautologies of the form R → S and R ↔ S, where R and S are themselves compound propositions, can be particularly useful. If R → S is a tautology, and we know that R is true, then S must be true. If R ↔ S is a tautology, we say that R and S are logically equivalent and we may replace R with S in any statement without changing its truth value. In Section 1.7 we'll give some examples of such useful tautologies. If a compound

proposition is not a tautology

(always true),

then it is either sometimes true or

never true.

Definition: A contradiction is a compound proposition that is never true for any of the possible values of its base propositions. A compound proposition is satisfiable if it is true for at least one of the possible settings of the variables. Thus a compound proposition is satisfiable if and only if it is not a contradiction.

1.4.3 Exercises

E1.4.1

Identify the following statements as true or false:

(a) If p is true, we know that p → q must be true.

(b) The statement "p ∨ q" is a tautology.

(c) If p ∧ q is true, we know that p ∨ q is also true.

(d) The statement "p → (q ∧ (r ∨ p))" is a compound proposition.

(e) You can never determine the truth value of a compound proposition without knowing the truth values of all of its base propositions.

E1.4.2

Which of the following sentences are propositions?

(a) Montréal is the capital of Québec.

(b) Tell me the capital of Québec.

(c) The next city I will name is the capital of Québec.

(d) I don't know what the capital of Québec is.

E1.4.3

It was claimed in this section that "This statement is false" is not a proposition. What exactly is the problem with assigning a truth value to this statement?

E1.4.4

Evaluate the following Java boolean expressions:

(a) (@B a. Can you find a compound proposition using only the NAND operator that is equivalent to a ∧ y? How about a ∨ y?

1-31

©Kendall Hunt Publishing Company

Figure 6-9: Venn Diagrams for Three-Set Inclusion-Exclusion

that an element might be in — there are six pairs taken from the four sets. Later in this chapter we'll learn systematic techniques to list pairs or other subsets of a fixed size, because this is an example of our third counting problem.)

We reasoned above that with just counting and double-counting, an element that was in three of the sets made a net contribution of zero to the sum, so we had to add it back in. This argument applies to any of the four possible ways to choose three of our four sets. Thus we improve our sum by adding

|A ∩ B ∩ C| + |A ∩ B ∩ D| + |A ∩ C ∩ D| + |B ∩ C ∩ D|.

Now what about an element that is in all four of our sets? We counted it four times, removed it six times to correct double-counting, and added it back in four times to account for triple-counting. Its net contribution is now two, and it should be one. So we subtract it one more time, to give us:

Rule of Inclusion-Exclusion (four-set case):

|A ∪ B ∪ C ∪ D| = |A| + |B| + |C| + |D|
    − |A ∩ B| − |A ∩ C| − |A ∩ D| − |B ∩ C| − |B ∩ D| − |C ∩ D|
    + |A ∩ B ∩ C| + |A ∩ B ∩ D| + |A ∩ C ∩ D| + |B ∩ C ∩ D|
    − |A ∩ B ∩ C ∩ D|.

Can we see a pattern emerging in the three-set and four-set rules? The size of every possible intersection of some of the sets occurs in the formula, with a coefficient of either +1 or −1. When do we add the size, and when do we subtract? Intersections of two sets are subtracted, intersections of three are added, and intersections of four are subtracted. It looks like we add the size of each intersection of an odd number of the sets, and subtract the size of each intersection of an even number of sets. Let's formalize this as a rule, and then try to prove it:

Rule of Inclusion-Exclusion (general case):

6-13

Let A1, A2, ..., Ak be any k finite sets. |A1 ∪ ... ∪ Ak| is equal to the sum of the sizes of all intersections of an odd number of the sets, minus the sum of the sizes of all intersections of an even number of the sets. That is,

|A1 ∪ ... ∪ Ak| = |A1| + ... + |Ak|
    − |A1 ∩ A2| − ... − |A(k−1) ∩ Ak|
    + |A1 ∩ A2 ∩ A3| + ... + |A(k−2) ∩ A(k−1) ∩ Ak|
    − ...
    + (−1)^(k+1) |A1 ∩ ... ∩ Ak|.

Proof of the Rule (general case): We use ordinary induction on k, the number of sets. With k = 0, the size of the union is definitely zero, and the sum has no terms at all and thus is also zero. We have verified the k = 1, k = 2, k = 3, and k = 4 cases above as well. It remains to prove the inductive case. Let k be an arbitrary natural and assume that the general-case rule holds for any collection of k finite sets. Let A1, ..., A(k+1) be an arbitrary collection of k+1 finite sets. Let B be the set A1 ∪ ... ∪ Ak.

By the two-set version of the rule, which we have proved, |B ∪ A(k+1)| = |B| + |A(k+1)| − |B ∩ A(k+1)|. By the inductive hypothesis, |B| is equal to the sum of all sizes of intersections of an odd number of the first k sets, minus the sum of all sizes of intersections of an even number of those sets. These intersection sizes are some of the intersection sizes that occur in the desired formula for |A1 ∪ ... ∪ A(k+1)|, and occur with the correct sign. In fact they are exactly the sizes of the intersections that do not involve A(k+1).

What we still need to do is to show that the other intersection sums, those that do involve A(k+1), exactly equal the remainder of our expression for |B ∪ A(k+1)|, which is |A(k+1)| − |B ∩ A(k+1)|. The key step in this argument is to apply the inductive hypothesis again to the k sets A1 ∩ A(k+1), A2 ∩ A(k+1), ..., Ak ∩ A(k+1), which we may call B1, ..., Bk. The union of these k sets is B ∩ A(k+1), and thus |B ∩ A(k+1)| is the sum of the sizes of all intersections of an odd number of the Bi's, minus the sum of the sizes of all intersections of an even number of the Bi's. Furthermore, the intersection of a collection of the Bi's is exactly the intersection of the corresponding Ai's, intersected with A(k+1).

Assembling these facts together, we have that:

|B ∪ A(k+1)| = |B| + |A(k+1)| − |B ∩ A(k+1)|
    = [(sum of all odd intersections of Ai's) − (sum of all even intersections of Ai's)] + |A(k+1)|
        − [(sum of all odd intersections of Bi's) − (sum of all even intersections of Bi's)]
    = [sum of odd Ai's + sum of even Bi's + |A(k+1)|] − [sum of even Ai's + sum of odd Bi's]
    = (sum of all odd intersections from A1, ..., A(k+1)) − (sum of all even intersections from A1, ..., A(k+1)).

6-14

Thus the size of the union of k +1 arbitrary sets is exactly as given by the rule. We have completed the inductive step and thus completed the proof.
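The general rule can also be tested exhaustively for small k. In this sketch (ours), sets are subsets of {0, ..., 15} stored as 16-bit masks, and the 2^k − 1 nonempty subcollections are enumerated by bitmask:

```java
import java.util.Random;

// Checks the general Rule of Inclusion-Exclusion on subsets of
// {0,...,15} represented as 16-bit bitmasks.
public class InclusionExclusionDemo {
    // Returns true if |A1 u ... u Ak| equals the alternating sum of
    // the sizes of all nonempty intersections, as the rule claims.
    public static boolean check(int[] sets) {
        int k = sets.length;
        int union = 0;
        for (int s : sets) union |= s;
        long rhs = 0;
        for (int choice = 1; choice < (1 << k); choice++) {
            int inter = 0xFFFF;   // start from the whole universe
            for (int i = 0; i < k; i++)
                if ((choice & (1 << i)) != 0) inter &= sets[i];
            // odd-sized subcollections are added, even-sized subtracted
            int sign = (Integer.bitCount(choice) % 2 == 1) ? 1 : -1;
            rhs += sign * Integer.bitCount(inter);
        }
        return Integer.bitCount(union) == rhs;
    }
    public static void main(String[] args) {
        Random rng = new Random(42);
        boolean allOk = true;
        for (int trial = 0; trial < 100; trial++) {
            int[] sets = new int[4];
            for (int i = 0; i < 4; i++) sets[i] = rng.nextInt(1 << 16);
            allOk &= check(sets);
        }
        System.out.println(allOk); // true
    }
}
```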

6.2.4 Exercises

E6.2.1

Use the two-set Rule of Inclusion-Exclusion to determine the number of U.S. states whose names either begin with I or end in A, or both.

E6.2.2

Prove the three-set case of the Rule of Inclusion-Exclusion given above. As in the proof for the two-set case, break A, B, C, and their intersections into disjoint subsets, and then use arithmetic and the Sum Rule.

E6.2.3

A basketball team has three players who can play center, seven who can play forward, and five who can play guard. Of these, two can play both center and forward, and three can play both forward and guard. One player can play all three positions (and she is the only one who can play both center and guard). How many total players are on the team? Illustrate the calculation with a Venn diagram.

E6.2.4

There are 6 × 6 × 6 = 216 different ways to throw three six-sided dice. How many of these have a six on the first die? On the second die? On the third? On both the first and second, and so forth? Use the three-set Rule of Inclusion-Exclusion to determine the number of these possible outcomes that have at least one six.

E6.2.5

There are 4 x 4 x 4 = 64 strings of three letters over the alphabet {a,b, c,d}. How many of these contain a double letter, that is, how many have either the first and second letters the same, or the second and third the same, or all three the same?

E6.2.6

Explain carefully why the n = 1 case of the general Rule of Inclusion-Exclusion holds.

E6.2.7

Suppose I have two lists A and B, each containing a list of prior donors to my non-profit organization. I would like to send a fundraising letter to each person who appears on either list, but I don't want to send any person more than one letter.

(a) How can I compute the number of letters I need to send?

(b) How do I compute this number if I have k lists instead of just two?

E6.2.8

Suppose I have a finite collection of sets {A1, ..., Ak}, each a subset of a single set G. I am given a sequence of naturals that are alleged to be the sizes of each intersection of these sets, that is, |A1|, |A2|, ..., |Ak|, |A1 ∩ A2|, ..., |A(k−1) ∩ Ak|, and so on through intersections of three, four, and up to k. How can I determine whether these numbers are consistent with an actual collection of sets? (For example, if the claimed value of |A1 ∩ A2| is larger than that of |A1|, the numbers are not consistent.)

E6.2.9

If a set of data about k sets, as in Exercise 6.2.8, is consistent, then we can use the general case of the Rule of Inclusion-Exclusion to compute the sizes of all the unions of all the subcollections of the collection. Prove that the mapping, from consistent sets of intersection data to sets of union data, is a bijection.

E6.2.10

A quilter is making a pattern by sewing together two kinds of pieces as shown in Figure 6-10, made from an m by n array of overlapping circles. Each circle has radius √2 and intersects

6-15

©Kendall Hunt Publishing Company Figure 6-10:

A quilting pattern made from overlapping circles

the four vertices of one of the squares in an m by n rectangle. It is clear that there are mn of the larger pieces, since one of them comes from each of the circles, but the smaller pieces ("orange peels") are more challenging to count. Each circle is associated with four of them, but most of the orange peels form part of more than one circle. Give an argument to count the orange peels in the figure for general m and n.

6.2.5 Problems

P6.2.1

Prove the Double-Counting Rule directly from the recursive definition of the size of sets in Section 6.1. That is, let A be an arbitrary set of size m and prove the rule |A ∪ B| + |A ∩ B| = |A| + |B| by induction on the elements of B. You will need two cases in the inductive step, one for x ∈ A and one for x ∉ A.

P6.2.2

A student can satisfy the university's foreign language requirement by passing a test, taking a language in high school, passing a university course, or passing a transfer course. Given the following information, how many students in all satisfied the requirement?

• 235 students passed the test.
• 314 students took a language in high school.
• 453 students passed a university course.
• 37 students passed a transfer course.
• 217 students both passed the test and took a language in high school.
• 64 students passed both the test and a university course.
• 21 students passed both the test and a transfer course.
• 83 students took a language in high school and passed a university course.
• 9 students took a language in high school and passed a transfer course.
• 4 students passed both a university and a transfer course.
• 55 students passed the test, took a language in high school, and passed a university course.
• 3 students passed the test, took a language in high school, and passed a transfer course.
• 2 students passed the test, a university course, and a transfer course.
• 2 students took a language in high school and passed both a university and a transfer course.

6-16

• 2 students met all four requirements.

P6.2.3

In this problem we want to count the binary strings of various lengths that have 111 as a substring.

(a) How many such strings are there for length 0, 1, 2, or 3?

(b) How many binary strings of length 4 fit the pattern 111X, where each X may be 0 or 1? How many fit the pattern X111? Use the two-set Rule of Inclusion-Exclusion to determine how many strings of length 4 have 111 as a substring.

(c) Use the three-set Rule of Inclusion-Exclusion to determine how many strings of length 5 have 111 as a substring. Count the strings fitting the patterns 111XX, X111X, and XX111, and the strings that fit any two or all three of these patterns, then use the rule to find how many strings fit at least one of the patterns.

(d) Repeat this process for strings of length 6.

P6.2.4

Let a, b, and c be arbitrary real numbers each strictly between 0 and 1. Consider the unit cube, consisting of all points (x,y,z) in three-dimensional space where x, y, and z are each real numbers between 0 and 1, inclusive. We want to find the volume of the set of points that have x

(q ⊕ ¬(p → r)) as Java boolean expressions.

Java does not define relational operators such as "less than" on booleans, but it does define the equality operator ==. Is this operator of any help in expressing → or