Algorithms and Complexity: A supplement to Algorithm Design by Jon Kleinberg & Éva Tardos
ISBN 978-1-84776-412-6


ALGORITHMS AND COMPLEXITY
A SUPPLEMENT TO ALGORITHM DESIGN BY JON KLEINBERG & EVA TARDOS

Compiled by: Viggo Kann, KTH CSC

PEARSON Custom Publishing




We work with leading authors to develop the strongest educational materials bringing cutting-edge thinking and best learning practice to a global market. Under a range of well-known imprints, including Financial Times/Prentice Hall, Addison Wesley and Longman, we craft high quality print and electronic publications which help readers to understand and apply their content, whether studying or at work. Pearson Custom Publishing enables our customers to access a wide and expanding range of market-leading content from world-renowned authors and develop their own tailor-made book. You choose the content that meets your needs and Pearson Custom Publishing produces a high-quality printed book. To find out more about custom publishing, visit www.pearsoncustom.co.uk

A Pearson Custom Publication

Algorithms and Complexity: A supplement to Algorithm Design by Jon Kleinberg & Éva Tardos

Compiled from:

Introduction to the Design and Analysis of Algorithms, Second Edition, by Anany Levitin

Algorithmics: The Spirit of Computing, Third Edition, by David Harel with Yishai Feldman

Skip Lists: A Probabilistic Alternative to Balanced Trees, article from Communications of the ACM, June 1990, 33(6), by William Pugh




7 Space and Time Tradeoffs

Things which matter most must never be at the mercy of things which matter less.
-Johann Wolfgang von Goethe (1749-1832)

Space and time tradeoffs in algorithm design are a well-known issue for both theoreticians and practitioners of computing. Consider, as an example, the problem of computing values of a function at many points in its domain. If it is time that is at a premium, we can precompute the function's values and store them in a table. This is exactly what human computers had to do before the advent of electronic computers, in the process burdening libraries with thick volumes of mathematical tables. Though such tables have lost much of their appeal with the widespread use of electronic computers, the underlying idea has proven to be quite useful in the development of several important algorithms for other problems. In somewhat more general terms, the idea is to preprocess the problem's input, in whole or in part, and store the additional information obtained to accelerate solving the problem afterward. We call this approach input enhancement¹ and discuss the following algorithms based on it:

• counting methods for sorting (Section 7.1)
• the Boyer-Moore algorithm for string matching and its simplified version suggested by Horspool (Section 7.2)

The other type of technique that exploits space-for-time tradeoffs simply uses extra space to facilitate faster and/or more flexible access to the data. We call this approach prestructuring. This name highlights two facets of this variation of the space-for-time tradeoff: some processing is done before a problem in question is actually solved but, unlike the input-enhancement variety, it deals with access structuring. We illustrate this approach by

• hashing (Section 7.3)
• indexing with B-trees (Section 7.4)

¹ The standard terms used synonymously for this technique are preprocessing and preconditioning. Confusingly, these terms can also be applied to methods that use the idea of preprocessing but do not use extra space (see Chapter 6). Thus, in order to avoid confusion, we use "input enhancement" as a special name for the space-for-time tradeoff technique being discussed here.

There is one more algorithm design technique related to the space-for-time tradeoff idea: dynamic programming. This strategy is based on recording solutions to overlapping subproblems of a given problem in a table from which a solution to the problem in question is then obtained. We discuss this well-developed technique separately, in the next chapter of the book.

Two final comments about the interplay between time and space in algorithm design need to be made. First, the two resources, time and space, do not have to compete with each other in all design situations. In fact, they can align to bring an algorithmic solution that minimizes both the running time and the space consumed. Such a situation arises, in particular, when an algorithm uses a space-efficient data structure to represent a problem's input, which leads, in turn, to a faster algorithm. Consider, as an example, the problem of traversing graphs. Recall that the time efficiency of the two principal traversal algorithms, depth-first search and breadth-first search, depends on the data structure used for representing graphs: it is Θ(n²) for the adjacency matrix representation and Θ(n + m) for the adjacency list representation, where n and m are the numbers of vertices and edges, respectively. If input graphs are sparse, i.e., have few edges relative to the number of vertices (say, m ∈ O(n)), the adjacency list representation may well be more efficient from both the space and the running-time points of view. The same situation arises in the manipulation of sparse matrices and sparse polynomials: if the percentage of zeros in such objects is sufficiently high, we can save both space and time by ignoring zeros in the objects' representation and processing.

Second, one cannot discuss space-time tradeoffs without mentioning the hugely important area of data compression. Note, however, that in data compression, size reduction is the goal rather than a technique for solving another problem. We discuss just one data compression algorithm, in the next chapter. The reader interested in this topic will find a wealth of algorithms in such books as [Say00].

7.1 Sorting by Counting

As a first example of applying the input enhancement technique, we discuss its application to the sorting problem. One rather obvious idea is to count, for each element of a list to be sorted, the total number of elements smaller than this element and record the results in a table. These numbers will indicate the positions of the elements in the sorted list: e.g., if the count is 10 for some element, it should be in the 11th position (with index 10, if we start counting with 0) in the sorted array. Thus, we will be able to sort the list by simply copying its elements to their appropriate positions in a new, sorted list. This algorithm is called comparison counting sort (Figure 7.1).

ALGORITHM ComparisonCountingSort(A[0..n - 1])
//Sorts an array by comparison counting
//Input: An array A[0..n - 1] of orderable elements
//Output: Array S[0..n - 1] of A's elements sorted in nondecreasing order
for i ← 0 to n - 1 do Count[i] ← 0
for i ← 0 to n - 2 do
    for j ← i + 1 to n - 1 do
        if A[i] < A[j]
            Count[j] ← Count[j] + 1
        else Count[i] ← Count[i] + 1
for i ← 0 to n - 1 do S[Count[i]] ← A[i]
return S
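For readers who want to experiment, the pseudocode translates directly into Python; the following is an illustrative sketch, not part of the original text.

def comparison_counting_sort(a):
    """Python transcription of ComparisonCountingSort (a sketch for
    experimentation; quadratic in the number of comparisons)."""
    n = len(a)
    count = [0] * n
    for i in range(n - 1):
        for j in range(i + 1, n):
            if a[i] < a[j]:
                count[j] += 1      # a[j] is larger than a[i]
            else:
                count[i] += 1      # a[i] is larger than (or equal to) a[j]
    s = [None] * n
    for i in range(n):
        s[count[i]] = a[i]         # count[i] smaller elements precede a[i]
    return s

print(comparison_counting_sort([62, 31, 84, 96, 19, 47]))   # [19, 31, 47, 62, 84, 96]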

What is the time efficiency of this algorithm? It should be quadratic because the algorithm considers all the different pairs of an n-element array. More formally, the number of times its basic operation, the comparison A[i] < A[j], is executed is equal to the sum we have encountered several times already:

    C(n) = Σ(i=0 to n-2) Σ(j=i+1 to n-1) 1 = Σ(i=0 to n-2) [(n - 1) - (i + 1) + 1] = Σ(i=0 to n-2) (n - 1 - i) = n(n - 1)/2.

Since the algorithm makes the same number of key comparisons as selection sort and in addition uses a linear amount of extra space, it can hardly be recommended for practical use.

Array A[0..5]        62    31    84    96    19    47

                               Count[]
Initially             0     0     0     0     0     0
After pass i = 0      3     0     1     1     0     0
After pass i = 1      3     1     2     2     0     1
After pass i = 2      3     1     4     3     0     1
After pass i = 3      3     1     4     5     0     1
After pass i = 4      3     1     4     5     0     2
Final state           3     1     4     5     0     2

Array S[0..5]        19    31    47    62    84    96

FIGURE 7.1 Example of sorting by comparison counting


But the counting idea does work productively in a situation in which elements to be sorted belong to a known small set of values. Assume, for example, that we have to sort a list whose values can be either 1 or 2. Rather than applying a general sorting algorithm, we should be able to take advantage of this additional information about values to be sorted. Indeed, we can scan the list to compute the number of 1's and the number of 2's in it and then, on the second pass, simply make the appropriate number of the first elements equal to 1 and the remaining elements equal to 2. More generally, if element values are integers between some lower bound l and upper bound u, we can compute the frequency of each of those values and store them in array F[0..u - l]. Then the first F[0] positions in the sorted list must be filled with l, the next F[1] positions with l + 1, and so on. All this can be done, of course, only if we can overwrite the given elements.

Let us consider a more realistic situation of sorting a list of items with some other information associated with their keys so that we cannot overwrite the list's elements. Then we can copy elements into a new array S[0..n - 1] to hold the sorted list as follows. The elements of A whose values are equal to the lowest possible value l are copied into the first F[0] elements of S, i.e., positions 0 through F[0] - 1, the elements of value l + 1 are copied to positions from F[0] to (F[0] + F[1]) - 1, and so on. Since such accumulated sums of frequencies are called a distribution in statistics, the method itself is known as distribution counting.

EXAMPLE Consider sorting the array

13  11  12  13  12  12

whose values are known to come from the set {11, 12, 13} and should not be overwritten in the process of sorting. The frequency and distribution arrays are as follows:

Array values            11    12    13
Frequencies              1     3     2
Distribution values      1     4     6

Note that the distribution values indicate the proper positions for the last occurrences of their elements in the final sorted array. If we index array positions from 0 to n - 1, the distribution values must be reduced by 1 to get corresponding element positions. It is more convenient to process the input array right to left. For the example, the last element is 12, and, since its distribution value is 4, we place this 12 in position 4 - 1 = 3 of the array S that will hold the sorted list. Then we decrease


                 D[0..2]          S[0..5]
A[5] = 12       1   4*  6         S[3] = 12
A[4] = 12       1   3*  6         S[2] = 12
A[3] = 13       1   2   6*        S[5] = 13
A[2] = 12       1   2*  5         S[1] = 12
A[1] = 11       1*  1   5         S[0] = 11
A[0] = 13       0   1   5*        S[4] = 13

FIGURE 7.2 Example of sorting by distribution counting. The distribution values being decremented are marked with an asterisk.

the 12's distribution value by 1 and proceed to the next (from the right) element in the given array. The entire processing of this example is depicted in Figure 7.2.



Here is pseudocode of this algorithm.

ALGORITHM DistributionCounting(A[0..n - 1], l, u)
//Sorts an array of integers from a limited range by distribution counting
//Input: An array A[0..n - 1] of integers between l and u (l ≤ u)
//Output: Array S[0..n - 1] of A's elements sorted in nondecreasing order
for j ← 0 to u - l do D[j] ← 0                        //initialize frequencies
for i ← 0 to n - 1 do D[A[i] - l] ← D[A[i] - l] + 1   //compute frequencies
for j ← 1 to u - l do D[j] ← D[j - 1] + D[j]          //reuse for distribution
for i ← n - 1 downto 0 do
    j ← A[i] - l
    S[D[j] - 1] ← A[i]
    D[j] ← D[j] - 1
return S
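A runnable Python version of the same algorithm may be helpful; this sketch mirrors the pseudocode, including the right-to-left pass over A.

def distribution_counting_sort(a, l, u):
    """Distribution counting (a sketch): sorts a list of integers known to
    lie in the range l..u, stably, using the frequency/distribution idea."""
    d = [0] * (u - l + 1)
    for value in a:                       # compute frequencies
        d[value - l] += 1
    for j in range(1, u - l + 1):         # turn frequencies into distribution values
        d[j] += d[j - 1]
    s = [None] * len(a)
    for value in reversed(a):             # right-to-left pass keeps the sort stable
        d[value - l] -= 1
        s[d[value - l]] = value
    return s

print(distribution_counting_sort([13, 11, 12, 13, 12, 12], 11, 13))
# [11, 12, 12, 12, 13, 13]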

Assuming that the range of array values is fixed, this is obviously a linear algorithm because it makes just two consecutive passes through its input array A. This is a better time-efficiency class than that of the most efficient sorting algorithms we have encountered: mergesort, quicksort, and heapsort. It is important to remember, however, that this efficiency is obtained by exploiting the specific nature of input lists on which sorting by distribution counting works, in addition to trading space for time.

Exercises 7.1

1. Is it possible to exchange numeric values of two variables, say, u and v, without using any extra storage?

2. Will the comparison counting algorithm work correctly for arrays with equal values?

3. Assuming that the set of possible list values is {a, b, c, d}, sort the following list in alphabetical order by the distribution counting algorithm: b, c, d, c, b, a, a, b.

4. Is the distribution counting algorithm stable?

5. Design a one-line algorithm for sorting any array of size n whose values are n distinct integers from 1 to n.

6. The ancestry problem asks to determine whether a vertex u is an ancestor of vertex v in a given binary (or, more generally, rooted ordered) tree of n vertices. Design a O(n) input enhancement algorithm that provides sufficient information to solve this problem for any pair of the tree's vertices in constant time.

7. The following technique, known as virtual initialization, provides a time-efficient way to initialize just some elements of a given array A[0..n - 1] so that for each of its elements, we can say in constant time whether it has been initialized and, if it has been, with which value. This is done by utilizing a variable counter for the number of initialized elements in A and two auxiliary arrays of the same size, say B[0..n - 1] and C[0..n - 1], defined as follows. B[0], ..., B[counter - 1] contain the indices of the elements of A that were initialized: B[0] contains the index of the element initialized first, B[1] contains the index of the element initialized second, and so on. Furthermore, if A[i] was the kth element (0 ≤ k ≤ counter - 1) to be initialized, C[i] contains k.
   a. Sketch the state of arrays A[0..7], B[0..7], and C[0..7] after the three assignments A[3] ← x; A[7] ← z; A[1] ← y.
   b. In general, how can we check with this scheme whether A[i] has been initialized and, if it has been, with which value?

8. a. Write a program for multiplying two sparse matrices, a p-by-q matrix A and a q-by-r matrix B.
   b. Write a program for multiplying two sparse polynomials p(x) and q(x) of degrees m and n, respectively.

9. Tic-tac-toe. Write a program that plays the game of tic-tac-toe with the human user by storing all possible positions on the game's 3-by-3 board along with the best move for each of them.


7.2 Input Enhancement in String Matching

In this section, we see how the technique of input enhancement can be applied to the problem of string matching. Recall that the problem of string matching requires finding an occurrence of a given string of m characters (called the pattern) in a longer string of n characters (called the text). We discussed the brute-force algorithm for this problem in Section 3.2: it simply matches corresponding pairs of characters in the pattern and the text left to right and, if a mismatch occurs, shifts the pattern one position to the right for the next trial. Since the maximum number of such trials is n - m + 1 and, in the worst case, m comparisons need to be made on each of them, the worst-case number of character comparisons is m(n - m + 1). This puts the worst-case performance of the brute-force algorithm in the Θ(nm) class. On average, however, we should expect just a few comparisons before a pattern's shift, and for random natural-language texts, the average-case efficiency indeed turns out to be in Θ(n).

The good average-case efficiency of the brute-force algorithm can be considered both good and bad news. It is good news from a practical standpoint because it makes the brute-force solution a passable candidate for practical applications (especially for short patterns). It is bad news for a theoretician who would like to invent a faster algorithm. Nevertheless, several better algorithms have been discovered. Most of them exploit the input enhancement idea: preprocess the pattern to get some information about it, store this information in a table, and then use this information during an actual search for the pattern in a given text. This is exactly the idea behind the two best-known algorithms of this type: the Knuth-Morris-Pratt algorithm [Knu77] and the Boyer-Moore algorithm [Boy77].

The principal difference between these two algorithms lies in the way they compare characters of a pattern with their counterparts in a text: the Knuth-Morris-Pratt algorithm does it left to right, while the Boyer-Moore algorithm does it right to left. Since the latter idea leads to simpler algorithms, it is the only one we will pursue here. (Note that the Boyer-Moore algorithm starts by aligning the pattern against the beginning characters of the text; if the first trial fails, it shifts the pattern to the right. It is comparisons within a trial that the algorithm does right to left, starting with the last one in the pattern.)

Although the underlying idea of the Boyer-Moore algorithm is simple, its actual implementation in a working method is less so. Therefore, we start our discussion with a simplified version of the Boyer-Moore algorithm suggested by R. Horspool [Hor80]. In addition to being simpler, Horspool's algorithm is not necessarily less efficient than the Boyer-Moore algorithm on random strings.

Horspool's Algorithm

Consider, as an example, searching for the pattern BARBER in some text:

s0 ...        c  ...  sn-1
    B A R B E R


Starting with the last R of the pattern and moving right to left, we compare the corresponding pairs of characters in the pattern and the text. If all the pattern's characters match successfully, a matching substring is found. (Then the search can be either stopped altogether or continued if another occurrence of the same pattern is desired.) If, however, we encounter a mismatch, we need to shift the pattern to the right. Clearly, we would like to make as large a shift as possible without risking the possibility of missing a matching substring in the text. Horspool's algorithm determines the size of such a shift by looking at the character c of the text that was aligned against the last character of the pattern. In general, the following four possibilities can occur.

Case 1  If there are no c's in the pattern (e.g., c is letter S in our example), we can safely shift the pattern by its entire length (if we shift less, some character of the pattern would be aligned against the text's character c that is known not to be in the pattern):

s0 ...        S  ...
    B A R B E R
                B A R B E R

Case 2  If there are occurrences of character c in the pattern but it is not the last one there (e.g., c is letter B in our example), the shift should align the rightmost occurrence of c in the pattern with the c in the text:

s0 ...        B  ...
    B A R B E R
        B A R B E R

Case 3  If c happens to be the last character in the pattern but there are no c's among its other m - 1 characters, the shift should be similar to that of Case 1: the pattern should be shifted by the entire pattern's length m, e.g.,

s0 ...    M E R  ...
    L E A D E R
                L E A D E R

Case 4  Finally, if c happens to be the last character in the pattern and there are other c's among its first m - 1 characters, the shift should be similar to that of Case 2: the rightmost occurrence of c among the first m - 1 characters in the pattern should be aligned with the text's c, e.g.,

s0 ...        O R  ...
    R E O R D E R
          R E O R D E R

These examples clearly demonstrate that right-to-left character comparisons can lead to farther shifts of the pattern than the shifts by only one position always made by the brute-force algorithm. However, if such an algorithm had to check all the characters of the pattern on every trial, it would lose much of this superiority. Fortunately, the idea of input enhancement makes repetitive comparisons unnecessary. We can precompute shift sizes and store them in a table. The table will be indexed by all possible characters that can be encountered in a text, including, for natural language texts, the space, punctuation symbols, and other special characters. (Note that no other information about the text in which eventual searching will be done is required.) The table's entries will indicate the shift sizes computed by the formula

    t(c) = the pattern's length m,
               if c is not among the first m - 1 characters of the pattern;
           the distance from the rightmost c among the first m - 1 characters          (7.1)
               of the pattern to its last character, otherwise.

For example, for the pattern BARBER, all the table's entries will be equal to 6, except for the entries for E, B, R, and A, which will be 1, 2, 3, and 4, respectively. Here is a simple algorithm for computing the shift table entries. Initialize all the entries to the pattern's length m and scan the pattern left to right repeating the following step m - 1 times: for the jth character of the pattern (0 ≤ j ≤ m - 2), overwrite its entry in the table with m - 1 - j, which is the character's distance to the right end of the pattern. Note that since the algorithm scans the pattern from left to right, the last overwrite will happen for a character's rightmost occurrence, exactly as we would like it to be.

ALGORITHM ShiftTable(P[0..m - 1])
//Fills the shift table used by Horspool's and Boyer-Moore algorithms
//Input: Pattern P[0..m - 1] and an alphabet of possible characters
//Output: Table[0..size - 1] indexed by the alphabet's characters and
//        filled with shift sizes computed by formula (7.1)
initialize all the elements of Table with m
for j ← 0 to m - 2 do Table[P[j]] ← m - 1 - j
return Table


Now, we can summarize the algorithm as follows.

Horspool's algorithm

Step 1  For a given pattern of length m and the alphabet used in both the pattern and text, construct the shift table as described above.
Step 2  Align the pattern against the beginning of the text.
Step 3  Repeat the following until either a matching substring is found or the pattern reaches beyond the last character of the text. Starting with the last character in the pattern, compare the corresponding characters in the pattern and text until either all m characters are matched (then stop) or a mismatching pair is encountered. In the latter case, retrieve the entry t(c) from the c's column of the shift table where c is the text's character currently aligned against the last character of the pattern, and shift the pattern by t(c) characters to the right along the text.

Here is pseudocode of Horspool's algorithm.

ALGORITHM HorspoolMatching(P[0..m - 1], T[0..n - 1])
//Implements Horspool's algorithm for string matching
//Input: Pattern P[0..m - 1] and text T[0..n - 1]
//Output: The index of the left end of the first matching substring
//        or -1 if there are no matches
ShiftTable(P[0..m - 1])          //generate Table of shifts
i ← m - 1                        //position of the pattern's right end
while i ≤ n - 1 do
    k ← 0                        //number of matched characters
    while k ≤ m - 1 and P[m - 1 - k] = T[i - k] do
        k ← k + 1
    if k = m
        return i - m + 1
    else i ← i + Table[T[i]]
return -1
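The two routines above translate directly into Python. The following sketch (not part of the original text) stores the shift table as a dictionary, with characters absent from the pattern getting the default shift m rather than an explicit table entry.

def shift_table(pattern):
    """Bad-symbol shift table of formula (7.1), kept as a dictionary;
    characters not present among the first m - 1 pattern characters are
    handled by the default value m at lookup time."""
    m = len(pattern)
    table = {}
    for j in range(m - 1):               # first m - 1 characters only
        table[pattern[j]] = m - 1 - j    # the rightmost occurrence wins
    return table

def horspool_matching(pattern, text):
    """Python version of HorspoolMatching (a sketch); returns the index of
    the left end of the first match, or -1 if there is none."""
    m, n = len(pattern), len(text)
    table = shift_table(pattern)
    i = m - 1                            # text index under the pattern's last character
    while i <= n - 1:
        k = 0                            # number of matched characters
        while k <= m - 1 and pattern[m - 1 - k] == text[i - k]:
            k += 1
        if k == m:
            return i - m + 1
        i += table.get(text[i], m)
    return -1

print(horspool_matching("BARBER", "JIM_SAW_ME_IN_A_BARBERSHOP"))   # 16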

EXAMPLE As an example of a complete application of Horspool's algorithm, consider searching for the pattern BARBER in a text that comprises English letters and spaces (denoted by underscores). The shift table, as we mentioned, is filled as follows:

character c     A   B   C   D   E   F   ...   R   ...   Z   _
shift t(c)      4   2   6   6   1   6    6    3    6    6   6


The actual search in a particular text proceeds as follows:

J I M _ S A W _ M E _ I N _ A _ B A R B E R S H O P
B A R B E R
        B A R B E R
          B A R B E R
                      B A R B E R
                          B A R B E R
                                B A R B E R



A simple example can demonstrate that the worst-case efficiency of Horspool's algorithm is in Θ(nm) (Problem 4 in the exercises). But for random texts, it is in Θ(n), and, though in the same efficiency class, Horspool's algorithm is obviously faster on average than the brute-force algorithm. In fact, as mentioned, it is often at least as efficient as its more sophisticated predecessor discovered by R. Boyer and J. Moore.

Boyer-Moore Algorithm

Now we outline the Boyer-Moore algorithm itself. If the first comparison of the rightmost character in the pattern with the corresponding character c in the text fails, the algorithm does exactly the same thing as Horspool's algorithm. Namely, it shifts the pattern to the right by the number of characters retrieved from the table precomputed as explained earlier.

The two algorithms act differently, however, after some positive number k (0 < k < m) of the pattern's characters are matched successfully before a mismatch is encountered:

s0  ...        c      s(i-k+1) ... s(i)   ...  s(n-1)      text
    p0  ...  p(m-k-1)  p(m-k)  ... p(m-1)                  pattern

In this situation, the Boyer-Moore algorithm determines the shift size by considering two quantities. The first one is guided by the text's character c that caused a mismatch with its counterpart in the pattern. Accordingly, it is called the bad-symbol shift. The reasoning behind this shift is the reasoning we used in Horspool's algorithm. If c is not in the pattern, we shift the pattern to just pass this c in the text. Conveniently, the size of this shift can be computed by the formula t1(c) - k, where t1(c) is the entry in the precomputed table used by Horspool's algorithm and k is the number of matched characters:

s0  ...        c      s(i-k+1) ... s(i)   ...  s(n-1)      text
    p0  ...  p(m-k-1)  p(m-k)  ... p(m-1)
                       p0      ...       p(m-1)            pattern


For example, if we search for the pattern BARBER in some text and match the last two characters before failing on letter S in the text, we can shift the pattern by t1(S) - 2 = 6 - 2 = 4 positions:

s0 ...    S E R  ...
    B A R B E R
            B A R B E R

The same formula can also be used when the mismatching character c of the text occurs in the pattern, provided t1(c) - k > 0. For example, if we search for the pattern BARBER in some text and match the last two characters before failing on letter A, we can shift the pattern by t1(A) - 2 = 4 - 2 = 2 positions:

s0 ...    A E R  ...
    B A R B E R
        B A R B E R

If t1(c) - k ≤ 0, we obviously do not want to shift the pattern by 0 or a negative number of positions. Rather, we can fall back on the brute-force thinking and simply shift the pattern by one position to the right. To summarize, the bad-symbol shift d1 is computed by the Boyer-Moore algorithm either as t1(c) - k if this quantity is positive or as 1 if it is negative or zero. This can be expressed by the following compact formula:

    d1 = max{t1(c) - k, 1}.                                                   (7.2)

The second type of shift is guided by a successful match of the last k > 0 characters of the pattern. We refer to the ending portion of the pattern as its suffix of size k and denote it suff(k). Accordingly, we call this type of shift the good-suffix shift. We now apply the reasoning that guided us in filling the bad-symbol shift table, which was based on a single alphabet character c, to the pattern's suffixes of sizes 1, ..., m - 1 to fill in the good-suffix shift table.

Let us first consider the case when there is another occurrence of suff(k) in the pattern or, to be more accurate, there is another occurrence of suff(k) not preceded by the same character as in its last occurrence. (It would be useless to shift the pattern to match another occurrence of suff(k) preceded by the same character because this would simply repeat a failed trial.) In this case, we can shift the pattern by the distance d2 between such a second rightmost occurrence (not preceded by the same character as in the last occurrence) of suff(k) and its rightmost occurrence. For example, for the pattern ABCBAB, these distances for k = 1 and 2 will be 2 and 4, respectively:


k        pattern        distance
1        ABCBAB             2
2        ABCBAB             4

What is to be done if there is no other occurrence of suff(k) not preceded by the same character as in its last occurrence? In most cases, we can shift the pattern by its entire length m. For example, for the pattern DBCBAB and k = 3, we can shift the pattern by its entire length of 6 characters:

s0 ...  c B A B  ...
    D B C B A B
                D B C B A B

Unfortunately, shifting the pattern by its entire length when there is no other occurrence of suff(k) not preceded by the same character as in its last occurrence is not always correct. For example, for the pattern ABCBAB and k = 3, shifting by 6 could miss a matching substring that starts with the text's AB aligned with the last two characters of the pattern:

s0 ...  c B A B C B A B  ...
    A B C B A B
                A B C B A B

Note that the shift by 6 is correct for the pattern DBCBAB but not for ABCBAB, because the latter pattern has the same substring AB as its prefix (beginning part of the pattern) and as its suffix (ending part of the pattern). To avoid such an erroneous shift based on a suffix of size k, for which there is no other occurrence in the pattern not preceded by the same character as in its last occurrence, we need to find the longest prefix of size l < k that matches the suffix of the same size l. If such a prefix exists, the shift size d2 is computed as the distance between this prefix and the corresponding suffix; otherwise, d2 is set to the pattern's length m. As an example, here is the complete list of the d2 values, the good-suffix table of the Boyer-Moore algorithm, for the pattern ABCBAB:

k        pattern        d2
1        ABCBAB          2
2        ABCBAB          4
3        ABCBAB          4
4        ABCBAB          4
5        ABCBAB          4
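The good-suffix table can be computed mechanically from the two rules just described. The following Python sketch (not from the original text) is a brute-force transcription of the definition, not the optimized linear-time construction used in practice.

def good_suffix_table(pattern):
    """Good-suffix shifts d2(k) for k = 1..m-1, computed directly from the
    two rules above (a brute-force sketch)."""
    m = len(pattern)
    d2 = [0] * (m + 1)                      # d2[k]; index 0 is unused
    for k in range(1, m):
        suffix = pattern[m - k:]            # suff(k)
        preceding = pattern[m - k - 1]      # character before the last occurrence
        shift = None
        # Rule 1: rightmost other occurrence of suff(k) not preceded by `preceding`
        for start in range(m - k - 1, -1, -1):
            if pattern[start:start + k] == suffix and \
               (start == 0 or pattern[start - 1] != preceding):
                shift = (m - k) - start
                break
        if shift is None:
            # Rule 2: longest prefix of size l < k that matches the suffix of size l
            shift = m
            for l in range(k - 1, 0, -1):
                if pattern[:l] == pattern[m - l:]:
                    shift = m - l
                    break
        d2[k] = shift
    return d2

print(good_suffix_table("ABCBAB"))   # [0, 2, 4, 4, 4, 4]
print(good_suffix_table("BAOBAB"))   # [0, 2, 5, 5, 5, 5]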


Now we are prepared to summarize the Boyer-Moore algorithm in its entirety.

The Boyer-Moore algorithm

Step 1  For a given pattern and the alphabet used in both the pattern and the text, construct the bad-symbol shift table as described earlier.
Step 2  Using the pattern, construct the good-suffix shift table as described.
Step 3  Align the pattern against the beginning of the text.
Step 4  Repeat the following step until either a matching substring is found or the pattern reaches beyond the last character of the text. Starting with the last character in the pattern, compare the corresponding characters in the pattern and the text until either all m character pairs are matched (then stop) or a mismatching pair is encountered after k ≥ 0 character pairs are matched successfully. In the latter case, retrieve the entry t1(c) from the c's column of the bad-symbol table where c is the text's mismatched character. If k > 0, also retrieve the corresponding d2 entry from the good-suffix table. Shift the pattern to the right by the number of positions computed by the formula

    d = d1              if k = 0,
        max{d1, d2}     if k > 0,                                             (7.3)

where d1 = max{t1(c) - k, 1}.

Shifting by the maximum of the two available shifts when k > 0 is quite logical. The two shifts are based on two observations, the first one about a text's mismatched character and the second one about a matched group of the pattern's rightmost characters, that imply that shifting by less than d1 and d2 characters, respectively, cannot lead to aligning the pattern with a matching substring in the text. Since we are interested in shifting the pattern as far as possible without missing a possible matching substring, we take the maximum of these two numbers.
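Putting the pieces together, here is a Python sketch of the whole algorithm; it is an illustration only, and it reuses the shift_table and good_suffix_table functions from the earlier sketches.

def boyer_moore_matching(pattern, text):
    """Boyer-Moore matching as summarized in Steps 1-4 and formula (7.3);
    returns the index of the first match, or -1 if there is none."""
    m, n = len(pattern), len(text)
    t1 = shift_table(pattern)              # bad-symbol table
    d2 = good_suffix_table(pattern)        # good-suffix table
    i = m - 1                              # text index under the pattern's last character
    while i <= n - 1:
        k = 0                              # number of matched character pairs
        while k <= m - 1 and pattern[m - 1 - k] == text[i - k]:
            k += 1
        if k == m:
            return i - m + 1
        c = text[i - k]                    # the text's mismatched character
        d1 = max(t1.get(c, m) - k, 1)      # bad-symbol shift, formula (7.2)
        i += d1 if k == 0 else max(d1, d2[k])   # formula (7.3)
    return -1

print(boyer_moore_matching("BAOBAB", "BESS_KNEW_ABOUT_BAOBABS"))   # 16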

EXAMPLE As a complete example, let us search for BAOBAB in a text made of English letters and spaces. The bad-symbol table looks as follows:

character c     A   B   C   D   ...   O   ...   Z   _
t1(c)           1   2   6   6    6    3    6    6   6


B E S S _ K N E W _ A B O U T _ B A O B A B S
B A O B A B
d1 = t1(K) - 0 = 6
            B A O B A B
            d1 = t1(_) - 2 = 4
            d2 = 5
            d = max{4, 5} = 5
                      B A O B A B
                      d1 = t1(_) - 1 = 5
                      d2 = 2
                      d = max{5, 2} = 5
                                B A O B A B

FIGURE 7.3 Example of string matching with the Boyer-Moore algorithm

The good-suffix table is filled as follows:

k        pattern        d2
1        BAOBAB          2
2        BAOBAB          5
3        BAOBAB          5
4        BAOBAB          5
5        BAOBAB          5

The actual search for this pattern in the text given in Figure 7.3 proceeds as follows. After the last B of the pattern fails to match its counterpart K in the text, the algorithm retrieves t1(K) = 6 from the bad-symbol table and shifts the pattern by d1 = max{t1(K) - 0, 1} = 6 positions to the right. The new try successfully matches two pairs of characters. After the failure of the third comparison on the space character in the text, the algorithm retrieves t1(_) = 6 from the bad-symbol table and d2 = 5 from the good-suffix table to shift the pattern by max{d1, d2} = max{6 - 2, 5} = 5. Note that on this iteration it is the good-suffix rule that leads to a farther shift of the pattern.

The next try successfully matches just one pair of B's. After the failure of the next comparison on the space character in the text, the algorithm retrieves t1(_) = 6 from the bad-symbol table and d2 = 2 from the good-suffix table to shift the pattern by max{d1, d2} = max{6 - 1, 2} = 5. Note that on this iteration it is the bad-symbol rule that leads to a farther shift of the pattern. The next try finds a matching substring in the text after successfully matching all six characters of the pattern with their counterparts in the text.

When searching for the first occurrence of the pattern, the worst-case efficiency of the Boyer-Moore algorithm is known to be linear. Though this algorithm


runs very fast, especially on large alphabets (relative to the length of the pattern), many people prefer its simplified versions, such as Horspool's algorithm, when dealing with natural language-like strings.


11 Limitations of Algorithm Power

Intellect distinguishes between the possible and the impossible; reason distinguishes between the sensible and the senseless. Even the possible can be senseless.
-Max Born (1882-1970), My Life and My Views, 1968

In the preceding chapters of this book, we encountered dozens of algorithms for solving a variety of different problems. A fair assessment of algorithms as problem-solving tools is inescapable: they are very powerful instruments, especially when they are executed by modern computers. But the power of algorithms is not unlimited, and its limits are the subject of this chapter. As we shall see, some problems cannot be solved by any algorithm. Other problems can be solved algorithmically but not in polynomial time. And even when a problem can be solved in polynomial time by some algorithms, there are usually lower bounds on their efficiency.

We start, in Section 11.1, with methods for obtaining lower bounds, i.e., estimates on a minimum amount of work needed to solve a problem. In general, obtaining a nontrivial lower bound even for a simple-sounding problem is a very difficult task. As opposed to ascertaining the efficiency of a particular algorithm, the task here is to establish a limit on the efficiency of any algorithm, known or unknown. This also necessitates a careful description of the operations such algorithms are allowed to perform. If we fail to define carefully the "rules of the game," so to speak, our claims may end up in the large dustbin of impossibility-related statements as, for example, the one made by the celebrated British physicist Lord Kelvin in 1895: "Heavier-than-air flying machines are impossible."

Section 11.2 discusses decision trees. This technique allows us, among other applications, to establish lower bounds on the efficiency of comparison-based algorithms for sorting and for searching in sorted arrays. As a result, we will be able to answer such questions as whether it is possible to invent a faster sorting algorithm than mergesort and whether binary search is the fastest algorithm for


searching in a sorted array. (What does your intuition tell you the answers to these questions will turn out to be?) Incidentally, decision trees are also a great vehicle for directing us to a solution of some puzzles, such as the fake-coin problem discussed in Section 5.5.

Section 11.3 deals with the question of intractability: which problems can and cannot be solved in polynomial time. This well-developed area of theoretical computer science is called computational complexity theory. We present the basic elements of this theory and discuss informally such fundamental notions as P, NP, and NP-complete problems, including the most important unresolved question of theoretical computer science about the relationship between P and NP problems.

The last section of this chapter deals with numerical analysis. This branch of computer science concerns algorithms for solving problems of "continuous" mathematics: solving equations and systems of equations, evaluating such functions as sin x and ln x, computing integrals, and so on. The nature of such problems imposes two types of limitations. First, most cannot be solved exactly. Second, solving them even approximately requires dealing with numbers that can be represented in a digital computer with only a limited level of precision. Manipulating approximate numbers without proper care can lead to very inaccurate results. We will see that even solving a basic quadratic equation on a computer poses significant difficulties that require a modification of the canonical formula for the equation's roots.

11.1 Lower-Bound Arguments

We can look at the efficiency of an algorithm two ways. We can establish its asymptotic efficiency class (say, for the worst case) and see where this class stands with respect to the hierarchy of efficiency classes outlined in Section 2.2. For example, selection sort, whose efficiency is quadratic, is a reasonably fast algorithm, whereas the algorithm for the Tower of Hanoi problem is very slow because its efficiency is exponential. One can argue, however, that this comparison is akin to the proverbial comparison of apples to oranges because these two algorithms solve different problems. The alternative and possibly "fairer" approach is to ask how efficient a particular algorithm is with respect to other algorithms for the same problem. Seen in this light, selection sort has to be considered slow because there are O(n log n) sorting algorithms; the Tower of Hanoi algorithm, on the other hand, turns out to be the fastest possible for the problem it solves.

When we want to ascertain the efficiency of an algorithm with respect to other algorithms for the same problem, it is desirable to know the best possible efficiency any algorithm solving the problem may have. Knowing such a lower bound can tell us how much improvement we can hope to achieve in our quest for a better algorithm for the problem in question. If such a bound is tight, i.e., we already know an algorithm in the same efficiency class as the lower bound, we can hope for a constant-factor improvement at best. If there is a gap between the efficiency


of the fastest algorithm and the best lower bound known, the door for possible improvement remains open: either a faster algorithm matching the lower bound could exist or a better lower bound could be proved.

In this section, we present several methods for establishing lower bounds and illustrate them with specific examples. As we did in analyzing the efficiency of specific algorithms in the preceding chapters, we should distinguish between a lower-bound class and a minimum number of times a particular operation needs to be executed. As a rule, the second problem is more difficult than the first. For example, we can immediately conclude that any algorithm for finding the median of n numbers must be in Ω(n) (why?), but it is not simple at all to prove that any comparison-based algorithm for this problem must do at least 3(n - 1)/2 comparisons in the worst case (for odd n).

Trivial Lower Bounds

The simplest method of obtaining a lower-bound class is based on counting the number of items in the problem's input that must be processed and the number of output items that need to be produced. Since any algorithm must at least "read" all the items it needs to process and "write" all its outputs, such a count yields a trivial lower bound. For example, any algorithm for generating all permutations of n distinct items must be in Ω(n!) because the size of the output is n!. And this bound is tight because good algorithms for generating permutations spend a constant time on each of them except the initial one (see Section 5.4).

As another example, consider the problem of evaluating a polynomial of degree n,

    p(x) = a_n x^n + a_{n-1} x^{n-1} + ... + a_1 x + a_0,

at a given point x, given its coefficients a_n, a_{n-1}, ..., a_0. It is easy to see that all the coefficients have to be processed by any polynomial-evaluation algorithm. Indeed, if it were not the case, we could change the value of an unprocessed coefficient, which would change the value of the polynomial at a nonzero point x. This means that any such algorithm must be in Ω(n). This lower bound is tight because both the right-to-left evaluation algorithm (Problem 2 in Exercises 6.5) and Horner's rule (Section 6.5) are linear.

In a similar vein, a trivial lower bound for computing the product of two n-by-n matrices is Ω(n²) because any such algorithm has to process 2n² elements in the input matrices and generate n² elements of the product. It is still unknown, however, whether this bound is tight.

Trivial lower bounds are often too low to be useful. For example, the trivial bound for the traveling salesman problem is Ω(n²), because its input is n(n - 1)/2 intercity distances and its output is a list of n + 1 cities making up an optimal tour. But this bound is all but useless since there is no known algorithm for this problem with the running time being a polynomial function of any degree.
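For reference, here is Horner's rule in Python, the linear-time evaluation algorithm that shows the Ω(n) bound to be tight (a sketch; the coefficient list is assumed to hold a_n, a_{n-1}, ..., a_0 in that order).

def horner(coefficients, x):
    """Evaluate a polynomial at x by Horner's rule, processing each of the
    n + 1 coefficients exactly once."""
    value = 0
    for a in coefficients:
        value = value * x + a
    return value

print(horner([2, 0, -1, 5], 3))   # p(x) = 2x^3 - x + 5 at x = 3 gives 56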


There is another obstacle to deriving a meaningful lower bound by this method. It lies in determining which part of an input must be processed by any algorithm solving the problem in question. For example, searching for an element of a given value in a sorted array does not require processing all its elements (why?). As another example, consider the problem of determining connectivity of an undirected graph defined by its adjacency matrix. It is plausible to expect that any such algorithm would have to check the existence of each of the n(n - 1)/2 potential edges, but the proof of this fact is not trivial.

Information-Theoretic Arguments

While the approach outlined above takes into account the size of a problem's output, the information-theoretical approach seeks to establish a lower bound based on the amount of information it has to produce. Consider, as an example, the well-known game of deducing a positive integer between 1 and n selected by somebody by asking that person questions with yes/no answers. The amount of uncertainty that any algorithm solving this problem has to resolve can be measured by ⌈log2 n⌉, the number of bits needed to specify a particular number among the n possibilities. We can think of each question (or, to be more accurate, an answer to each question) as yielding at most one bit of information about the algorithm's output, i.e., the selected number. Consequently, any such algorithm will need at least ⌈log2 n⌉ such steps before it can determine its output in the worst case.

The approach we just exploited is called the information-theoretic argument because of its connection to information theory. It has proved to be quite useful for finding the so-called information-theoretic lower bounds for many problems involving comparisons, including sorting and searching. Its underlying idea can be realized much more precisely through the mechanism of decision trees. Because of the importance of this technique, we discuss it separately and in more detail in Section 11.2.
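A binary-search questioner achieves this ⌈log2 n⌉ bound exactly; the following Python sketch illustrates it (the callback answer_is_at_most is a hypothetical stand-in for the person answering the yes/no questions).

def guess_number(n, answer_is_at_most):
    """Identify a number selected from 1..n using at most ceil(log2 n)
    yes/no questions of the form "is the number <= x?"."""
    low, high, questions = 1, n, 0
    while low < high:
        mid = (low + high) // 2
        questions += 1                 # one yes/no question asked
        if answer_is_at_most(mid):
            high = mid
        else:
            low = mid + 1
    return low, questions

secret = 73
print(guess_number(100, lambda x: secret <= x))   # (73, 7); ceil(log2 100) = 7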

Adversary Arguments

Let us revisit the same game of "guessing" a number used to introduce the idea of an information-theoretic argument. We can prove that any algorithm that solves this problem must ask at least ⌈log2 n⌉ questions in its worst case by playing the role of a hostile adversary who wants to make an algorithm ask as many questions as possible. The adversary starts by considering each of the numbers between 1 and n as being potentially selected. (This is cheating, of course, as far as the game is concerned, but not as a way to prove our assertion.) After each question, the adversary gives an answer that leaves him with the largest set of numbers consistent with this and all the previously given answers. (This strategy leaves him with at least one half of the numbers he had before his last answer.) If an algorithm stops before the size of the set is reduced to one, the adversary can exhibit a number that could be a legitimate input the algorithm failed to identify.


It is a simple technical matter now to show that one needs ⌈log2 n⌉ iterations to shrink an n-element set to a one-element set by halving and rounding up the size of the remaining set. Hence, at least ⌈log2 n⌉ questions need to be asked by any algorithm in the worst case.

This example illustrates the adversary method for establishing lower bounds. It is based on following the logic of a malevolent but honest adversary: the malevolence makes him push the algorithm down the most time-consuming path, while his honesty forces him to stay consistent with the choices already made. A lower bound is then obtained by measuring the amount of work needed to shrink a set of potential inputs to a single input along the most time-consuming path.

As another example, consider the problem of merging two sorted lists of size n,

a1 < a2 < ... < an    and    b1 < b2 < ... < bn,

into a single sorted list of size 2n. For simplicity, we assume that all the a's and b's are distinct, which gives the problem a unique solution. We encountered this problem when discussing mergesort in Section 4.1. Recall that we did merging by repeatedly comparing the first elements in the remaining lists and outputting the smaller among them. The number of key comparisons in the worst case for this algorithm for merging is 2n - 1.

Is there an algorithm that can do merging faster? The answer turns out to be no. Knuth ([KnuIII], p. 198) quotes the following adversary method for proving that 2n - 1 is a lower bound on the number of key comparisons made by any comparison-based algorithm for this problem. The adversary will employ the following rule: reply true to the comparison ai < bj if and only if i < j. This will force any correct merging algorithm to produce the only combined list consistent with this rule:

b1 < a1 < b2 < a2 < ... < bn < an.

To produce this combined list, any correct algorithm will have to explicitly compare 2n - 1 adjacent pairs of its elements, i.e., b1 to a1, a1 to b2, and so on. If one of these comparisons has not been made, e.g., a1 has not been compared to b2, we can transpose these keys to get

b1 < b2 < a1 < a2 < ... < bn < an,

which is consistent with all the comparisons made but cannot be distinguished from the correct configuration given above. Hence, 2n - 1 is indeed a lower bound for the number of key comparisons needed for any merging algorithm.
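The adversary's rule is easy to simulate. The Python sketch below (an illustration, not part of the original text) plays it against the standard two-pointer merge and, as the argument predicts, forces 2n - 1 comparisons.

def adversary_merge(n):
    """Simulate Knuth's adversary against the two-pointer merge of
    a1 < ... < an and b1 < ... < bn: each comparison "ai < bj?" is answered
    "yes" exactly when i < j, forcing 2n - 1 comparisons and the order
    b1 < a1 < b2 < a2 < ... < bn < an."""
    i, j = 1, 1                        # next unmerged elements a_i and b_j
    comparisons = 0
    merged = []
    while i <= n and j <= n:
        comparisons += 1               # the algorithm compares a_i with b_j
        if i < j:                      # adversary's rule: a_i < b_j  iff  i < j
            merged.append(f"a{i}"); i += 1
        else:
            merged.append(f"b{j}"); j += 1
    merged += [f"a{k}" for k in range(i, n + 1)]
    merged += [f"b{k}" for k in range(j, n + 1)]
    return comparisons, merged

print(adversary_merge(3))   # (5, ['b1', 'a1', 'b2', 'a2', 'b3', 'a3'])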

Problem Reduction

We have already encountered the problem-reduction approach in Section 6.6. There, we discussed getting an algorithm for problem P by reducing it to another problem Q solvable with a known algorithm. A similar reduction idea can be used for finding a lower bound. To show that problem P is at least as hard as another


TABLE 11.1 Problems often used for establishing lower bounds by problem reduction

Problem                                  Lower bound       Tightness
sorting                                  Ω(n log n)        yes
searching in a sorted array              Ω(log n)          yes
element uniqueness problem               Ω(n log n)        yes
multiplication of n-digit integers       Ω(n)              unknown
multiplication of square matrices        Ω(n²)             unknown

problem Q with a known lower bound, we need to reduce Q to P (not P to Q!). In other words, we should show that an arbitrary instance of problem Q can be transformed (in a reasonably efficient fashion) to an instance of problem P, so any algorithm solving P would solve Q as well. Then a lower bound for Q will be a lower bound for P. Table 11.1 lists several important problems that are often used for this purpose.

We will establish the lower bounds for sorting and searching in the next section. The element uniqueness problem asks whether there are duplicates among n given numbers. (We encountered this problem in Sections 2.3 and 6.1.) The proof of the lower bound for this seemingly simple problem is based on a very sophisticated mathematical analysis that is well beyond the scope of this book (see, e.g., [Pre85] for a rather elementary exposition). As to the last two algebraic problems in Table 11.1, there the lower bounds quoted are trivial, but whether they can be improved remains unknown.

As an example of establishing a lower bound by reduction, let us consider the Euclidean minimum spanning tree problem: given n points in the Cartesian plane, construct a tree of minimum total length whose vertices are the given points. As a problem with a known lower bound, we use the element uniqueness problem. We can transform any set x1, x2, ..., xn of n real numbers into a set of n points in the Cartesian plane by simply adding 0 as the points' y coordinate: (x1, 0), (x2, 0), ..., (xn, 0). Let T be a minimum spanning tree found for this set of points. Since T must contain a shortest edge, checking whether T contains a zero-length edge will answer the question about uniqueness of the given numbers. This reduction implies that Ω(n log n) is a lower bound for the Euclidean minimum spanning tree problem, too.

Since the final results about the complexity of many problems are not known, the reduction technique is often used to compare the relative complexity of problems. For example, the formulas

    x · y = [(x + y)² - (x - y)²] / 4    and    x² = x · x


show that the problems of computing the product of two n-digit integers and squaring an n-digit integer belong to the same complexity class, despite the latter being seemingly simpler than the former.

There are several similar results for matrix operations. For example, multiplying two symmetric matrices turns out to be in the same complexity class as multiplying two arbitrary square matrices. This result is based on the observation that not only is the former problem a special case of the latter one, but we also can reduce the problem of multiplying two arbitrary square matrices of order n, say A and B, to the problem of multiplying two symmetric matrices

    X = [ 0    A ]        and        Y = [ 0   B^T ]
        [ A^T  0 ]                       [ B    0  ],

where A^T and B^T are the transpose matrices of A and B (i.e., A^T[i, j] = A[j, i] and B^T[i, j] = B[j, i]), respectively, and 0 stands for the n-by-n matrix whose elements are all zeros. Indeed,

    XY = [ 0    A ] [ 0   B^T ]   =   [ AB       0      ]
         [ A^T  0 ] [ B    0  ]       [ 0    A^T B^T    ],

from which the needed product AB can be easily extracted. (True, we will have to multiply matrices twice the original size, but this is just a minor technical complication with no impact on the complexity classes.) Though such results are interesting, we will encounter even more important applications of the reduction approach to comparing problem complexity in Section 11.3.
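A quick numerical check of this reduction can be done in a few lines (a sketch assuming NumPy is available; any square matrices A and B of the same order will do).

import numpy as np

n = 3
A = np.random.randint(-5, 5, (n, n))
B = np.random.randint(-5, 5, (n, n))
Z = np.zeros((n, n), dtype=int)

X = np.block([[Z, A], [A.T, Z]])          # symmetric 2n-by-2n matrix
Y = np.block([[Z, B.T], [B, Z]])          # symmetric 2n-by-2n matrix

P = X @ Y                                 # one product of two symmetric matrices
assert np.array_equal(P[:n, :n], A @ B)   # AB sits in the top-left n-by-n block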

Exercises 11.1

1. Prove that any algorithm solving the alternating-disk puzzle (Problem 11 in Exercises 3.1) must make at least n(n + 1)/2 moves to solve it. Is this lower bound tight?

2. Prove that the classic recursive algorithm for the Tower of Hanoi puzzle (Section 2.4) makes the minimum number of disk moves needed to solve it.

3. Find a trivial lower-bound class for each of the following problems and indicate, if you can, whether this bound is tight.
   a. finding the largest element in an array
   b. checking completeness of a graph represented by its adjacency matrix
   c. generating all the subsets of an n-element set
   d. determining whether n given real numbers are all distinct

4. Consider the problem of identifying a lighter fake coin among n identical-looking coins with the help of a balance scale. Can we use the same information-theoretic argument as the one in the text for the number of questions in the guessing game to conclude that any algorithm for identifying the fake will need at least ⌈log2 n⌉ weighings in the worst case?

5. Prove that any comparison-based algorithm for finding the largest among n given numbers must make n - 1 comparisons in the worst case.

6. Find a tight lower bound for sorting an array by exchanging its adjacent elements.

7. Give an adversary-argument proof that the time efficiency of any algorithm that checks connectivity of a graph with n vertices is in Ω(n²), provided the only operation allowed for an algorithm is to inquire about the presence of an edge between two vertices of the graph. Is this lower bound tight?

8. What is the minimum number of comparisons needed for a comparison-based sorting algorithm to merge any two sorted lists of sizes n and n + 1 elements, respectively? Prove the validity of your answer.

9. Find the product of matrices A and B through a transformation to a product of two symmetric matrices if

   A = [ 1  -1 ]        and        B = [  0  1 ]
       [ 2   3 ]                       [ -1  2 ]

10. a. Can we use this section's formulas that indicate the complexity equivalence of multiplication and squaring of integers to show the complexity equivalence of multiplication and squaring of square matrices?
    b. Show that multiplication of two matrices of order n can be reduced to squaring a matrix of order 2n.

11. Find a tight lower-bound class for the problem of finding two closest numbers among n real numbers x1, x2, ..., xn.

11.2 Decision Trees

Many important algorithms, especially those for sorting and searching, work by comparing items of their inputs. We can study the performance of such algorithms with a device called the decision tree. As an example, Figure 11.1 presents a decision tree of an algorithm for finding a minimum of three numbers. Each internal node of a binary decision tree represents a key comparison indicated in the node, e.g., k < k'. The node's left subtree contains the information about subsequent comparisons made if k < k', while its right subtree does the same for the case of k > k'. (For the sake of simplicity, we assume throughout this section that all input items are distinct.) Each leaf represents a possible outcome of the algorithm's run on some input of size n. Note that the number of leaves can be greater than the number of outcomes because, for some algorithms, the same outcome can be arrived at through a different chain of comparisons. (This happens


                 a < b
             yes/     \no
            a < c     b < c
          yes/  \no  yes/  \no
           a     c    b     c

FIGURE 11.1 Decision tree for finding a minimum of three numbers

to be the case for the decision tree in Figure 11.1.) An important point is that the number of leaves must be at least as large as the number of possible outcomes. The algorithm's work on a particular input of size n can be traced by a path from the root to a leaf in its decision tree, and the number of comparisons made by the algorithm on such a run is equal to the number of edges in this path. Hence, the number of comparisons in the worst case is equal to the height of the algorithm's decision tree. The central idea behind this model lies in the observation that a tree with a given number of leaves, which is dictated by the number of possible outcomes, has to be tall enough to have that many leaves. Specifically, it is not difficult to prove that for any binary tree with l leaves and height h,

    h ≥ ⌈log2 l⌉.                                                           (11.1)

Indeed, a binary tree of height h with the largest number of leaves has all its leaves on the last level (why?). Hence, the largest number of leaves in such a tree is 2^h. In other words, 2^h ≥ l, which immediately implies (11.1). Inequality (11.1) puts a lower bound on the heights of binary decision trees and hence the worst-case number of comparisons made by any comparison-based algorithm for the problem in question. Such a bound is called the information-theoretic lower bound (see Section 11.1). We illustrate this technique below on two important problems: sorting and searching in a sorted array.
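Inequality (11.1) is easy to evaluate for concrete problems; the following Python sketch (an illustration, not part of the original text) computes the resulting bound when the number of possible outcomes is n!, as it is for sorting n distinct keys.

import math

def min_decision_tree_height(outcomes):
    """Lower bound (11.1): a binary decision tree with at least `outcomes`
    leaves must have height at least ceil(log2(outcomes))."""
    return math.ceil(math.log2(outcomes))

# Sorting n distinct keys has n! possible outcomes, so any comparison-based
# sorting algorithm needs at least ceil(log2(n!)) comparisons in the worst case.
for n in (3, 4, 10):
    print(n, min_decision_tree_height(math.factorial(n)))   # 3, 5, 22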

Decision Trees for Sorting Algorithms

Most sorting algorithms are comparison-based, i.e., they work by comparing elements in a list to be sorted. Moreover, with the notable exception of binary insertion sort (Problem 9 in Exercises 5.1), comparing two elements is the basic operation of such algorithms. Therefore, by studying properties of decision trees for such algorithms, we can derive important lower bounds on their efficiency.


(Figure: decision tree for sorting three elements a, b, and c; each leaf is labeled with the resulting ordering, from abc to cba.)

a 2, output a set of tile types T of size N, for which max(T) is the largest among all tile types of size at most N.

8.10.

(a) Write a program which, for a given X, terminates with the result "yes" precisely when the 3 x X + 1 program halts for X. (b) For each of the following inputs, calculate the highest number reached during the run of the 3 x X + 1 program, and the number of iterations it takes for the program to terminate: 256, 101, 55, 103, 151, 383, 71209.

8.11.

(a) Does the variant of the 3 x X + 1 program, in which the assignment "X ← 3 x X + 1" is replaced by "X ← 2 x X + 1", halt for every positive integer X? (b) Prove that the following program terminates for every positive integer X.

X # Z.

while X is odd do the following: X ← (3 x X + 1)/2. (Hint: consider the binary representation of an odd X.)

Noncomputability and Undecidability

8.12.

Formulate the reduction of the halting problem to the verification problem, by writing down the transforming algorithm in detail.

8.13.

(a) Assume we never allow a program to be run on precisely its own text. How would you prove the undecidability of the halting problem nevertheless? (b) Assume we never allow a program in our programming language L to be run on any program text in L. Show that even so the halting problem for programs in L is undecidable. (c) Assume that programs in the programming language L must always be run on integer numbers. Show that the halting problem for programs in L is undecidable nevertheless.

8.14.

Prove that the following problems are undecidable: (a) Given a program P and an input X, determine whether either P does not halt on X, or it halts and produces 8 as its result. (b) Given two programs P and Q, and an input X, determine whether either both P and Q halt on X or both do not halt on X. (c) Given a program P and two different inputs X and Y, determine whether P halts on both X and Y.

8.15.

Let us consider only programs that halt on all inputs. (a) Here is a diagonalization "proof" that it is undecidable whether given such a program P and an input X to P, the result is 17. The "proof" is exactly as the proof in the text that the halting problem is undecidable, but with the list of all always-halting programs (instead of all programs) plotted against all inputs, and the yes/no entries asserting whether the result is 17. What is wrong with this proof? (b) Show that the problem in part (a) is in fact decidable.

8.16.

What would happen to the decidability status of a problem if we were guaranteed that its certificate was not merely finite, but bounded in size (to be specific, let us say that its certificate was no larger than 2^(2N), for inputs of size N)?

8.17.

Assume that a problem P has the following strange certificates. An input is a yes-input if it has at least 7 yes-certificates, and it is a no-input if it has at least 3 no-certificates. (All certificates are verifiable in finite time.) (a) Is P decidable? (b) What can you say about the problem if we replace "at least 7" with "at least 7 x N ," where N is the size of the input? (c) What if we replace "at least 7" with "at most 7"?

8.18.

Show how to solve the ordinary unbounded domino problem given the recurring domino problem as an oracle.

8.19.

Devise polynomial-time algorithms for the fixed-width bounded versions of the following problems. (a) The domino problem (i.e., K is fixed, T and N are given, and we are asked whether we can tile an N x K rectangle). (b) The word correspondence problem (i.e., K is fixed, the Xs, Ys, and N are given, and we are asked whether we can concatenate K words from the Xs forming a compound word that can also be formed by concatenating at most N words from the Ys).

8.20.

(a) Is the fixed-width unbounded domino problem decidable? (That is, K is fixed, and we are given T, and we are asked whether there is a tiling of an infinite strip of width K.) (b) How about the analogous word correspondence problem?


8.21.

Show that the regular unbounded version of the tiling problem for equilateral triangles is undecidable.

they shall never see light    PSALM 49:20

Skip Lists: A Probabilistic Alternative to Balanced Trees


Skip lists are a data structure that can be used in place of balanced trees. Skip lists use probabilistic balancing rather than strictly enforced balancing and as a result the algorithms for insertion and deletion in skip lists are much simpler and significantly faster than equivalent algorithms for balanced trees.

William Pugh

Binary trees can be used for representing abstract data types such as dictionaries and ordered lists. They work well when the elements are inserted in a random order. Some sequences of operations, such as inserting the elements in order, produce degenerate data structures that give very poor performance. If it were possible to randomly permute the list of items to be inserted, trees would work well with high probability for any input sequence. In most cases queries must be answered on-line, so randomly permuting the input is impractical. Balanced tree algorithms re-arrange the tree as operations are performed to maintain certain balance conditions and assure good performance.

Skip lists are a probabilistic alternative to balanced trees. Skip lists are balanced by consulting a random number generator. Although skip lists have bad worst-case performance, no input sequence consistently produces the worst-case performance (much like quicksort when the pivot element is chosen randomly). It is very unlikely a skip list data structure will be significantly unbalanced (e.g., for a dictionary of more than 250 elements, the chance that a search will take more than 3 times the expected time is less than one in a million). Skip lists have balance properties similar to that of search trees built by random insertions, yet do not require insertions to be random.

Balancing a data structure probabilistically is easier than explicitly maintaining the balance. For many applications, skip lists are a more natural representation than trees, also leading to simpler algorithms. The simplicity of skip list algorithms makes them easier to implement and provides significant constant factor speed improvements over balanced tree and self-adjusting tree algorithms. Skip lists are also very space efficient. They can easily be configured to require an average of 1 1/3 pointers per element (or even less) and do not require balance or priority information to be stored with each node.

SKIP LISTS

We might need to examine every node of the list when searching a linked list (Figure 1a). If the list is stored in sorted order and every other node of the list also has a pointer to the node two ahead of it in the list (Figure 1b), we have to examine no more than ⌈n/2⌉ + 1 nodes (where n is the length of the list).


Also giving every fourth node a pointer four ahead (Figure 1c) requires that no more than ⌈n/4⌉ + 2 nodes be examined. If every (2^i)th node has a pointer 2^i nodes ahead (Figure 1d), the number of nodes that must be examined can be reduced to ⌈log₂ n⌉ while only doubling the number of pointers. This data structure could be used for fast searching, but insertion and deletion would be impractical.

A node that has k forward pointers is called a level k node. If every (2^i)th node has a pointer 2^i nodes ahead, then levels of nodes are distributed in a simple pattern: 50% are level 1, 25% are level 2, 12.5% are level 3 and so on. What would happen if the levels of nodes were chosen randomly, but in the same proportions (e.g., as in Figure 1e)? A node's i-th forward pointer, instead of pointing 2^(i-1) nodes ahead, points to the next node of level i or higher. Insertions or deletions would require only local modifications; the level of a node, chosen randomly when the node is inserted, need never change. Some arrangements of levels would give poor execution times, but we will see that such arrangements are rare. Because these data structures are linked lists with extra pointers that skip over intermediate nodes, I named them skip lists.

SKIP LIST ALGORITHMS

This section gives algorithms to search for, insert and delete elements in a dictionary or symbol table. The Search operation returns the contents of the value associated with the desired key or failure if the key is not present. The Insert operation associates a specified key with a new value (inserting the key if it had not already been present). The Delete operation deletes the specified key. It is easy to support additional operations such as "find the minimum key" or "find the next key".

Each element is represented by a node, the level of which is chosen randomly when the node is inserted without regard for the number of elements in the data structure. A level i node has i forward pointers, indexed 1 through i. We do not need to store the level of a node in the node. Levels are capped at some appropriate constant MaxLevel. The level of a list is the maximum level currently in the list (or 1 if the list is empty). The header of a list has forward pointers at levels one through MaxLevel. The forward pointers of the header at levels higher than the current maximum level of the list point to NIL.


FIGURE 1 - Linked lists with additional pointers (parts a through e)

Initialization

An element NIL is allocated and given a key greater than any legal key. All levels of all skip lists are terminated with NIL. A new list is initialized so that the level of the list is equal to 1 and all forward pointers of the list's header point to NIL.
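For concreteness, here is a minimal C sketch of the representation and initialization just described (our own naming; this is not the author's distributed library). It uses integer keys and values, 0-based indexing for the forward pointers, and a NIL sentinel whose key is larger than any legal key:

#include <limits.h>
#include <stdlib.h>

#define MAX_LEVEL 16                  /* MaxLevel: suits up to 2^16 elements for p = 1/2 */

typedef struct node {
    int key;
    int value;
    struct node *forward[];           /* a level-i node has forward[0..i-1] */
} node;

typedef struct {
    int level;                        /* current level of the list (1 if empty) */
    node *header;                     /* header has MAX_LEVEL forward pointers */
    node *nil;                        /* sentinel: key greater than any legal key */
} skiplist;

/* Allocate a node with `level` forward pointers. */
static node *make_node(int level, int key, int value) {
    node *n = malloc(sizeof(node) + (size_t)level * sizeof(node *));
    n->key = key;
    n->value = value;
    return n;
}

/* A new list has level 1, and every forward pointer of the header points to NIL. */
static skiplist *skiplist_new(void) {
    skiplist *list = malloc(sizeof(skiplist));
    list->level = 1;
    list->nil = make_node(1, INT_MAX, 0);       /* INT_MAX plays the role of NIL's key */
    list->header = make_node(MAX_LEVEL, INT_MIN, 0);  /* header key is never examined */
    for (int i = 0; i < MAX_LEVEL; i++)
        list->header->forward[i] = list->nil;
    return list;
}

Because the forward array is allocated with exactly as many slots as the node's level, the space per node stays close to the 1/(1-p) pointers-per-element average discussed later (1.33 pointers for p = 1/4).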


Search Algorithm


We search for an element by traversing forward pointers that do not overshoot the node containing the element being searched for (Figure 2). When no more progress can be made at the current level of forward pointers, the search moves down to the next level. When we can make no more progress at level 1, we must be immediately in front of the node that contains the desired element (if it is in the list).

Choosing a Random Level

Initially, we discussed a probability distribution where half of the nodes that have level i pointers also have level i+1 pointers. To get away from magic constants, we say that a fraction p of the nodes with level i pointers also have level i+1 pointers (for our original discussion, p = 1/2). Levels are generated randomly by an algorithm equivalent to the one in Figure 5. Levels are generated without reference to the number of elements in the list.

Insertion and Deletion Algorithms

To insert or delete a node, we simply search and splice, as shown in Figure 3. Figure 4 gives algorithms for insertion and deletion. A vector update is maintained so that when the search is complete (and we are ready to perform the splice), update[i] contains a pointer to the rightmost node of level i or higher that is to the left of the location of the insertion/deletion. If an insertion generates a node with a level greater than the previous maximum level of the list, we update the maximum level of the list and initialize the appropriate portions of the update vector. After each deletion, we check if we have deleted the maximum element of the list and if so, decrease the maximum level of the list.

Search(list, searchKey)
    x := list→header
    -- loop invariant: x→key < searchKey
    for i := list→level downto 1 do
        while x→forward[i]→key < searchKey do
            x := x→forward[i]
    -- x→key < searchKey ≤ x→forward[1]→key
    x := x→forward[1]
    if x→key = searchKey then return x→value
    else return failure

FIGURE 2 - Skip list search algorithm

At what level do we start a search? Defining L(n)

In a skip list of 16 elements generated with p = 1/2, we might happen to have 9 elements of level 1, 3 elements of level 2, 3 elements of level 3 and 1 element of level 14 (this would be very unlikely, but it could happen). How should we handle this? If we use the standard algorithm and start our search at level 14, we will do a lot of useless work.

Where should we start the search? Our analysis suggests that ideally we would start a search at the level L where we expect 1/p nodes. This happens when L = log_{1/p} n. Since we will be referring frequently to this formula, we will use L(n) to denote log_{1/p} n.

There are a number of solutions to the problem of deciding how to handle the case where there is an element with an unusually large level in the list.

• Don't worry, be happy. Simply start a search at the highest level present in the list. As we will see in our analysis, the probability that the maximum level in a list of n elements is significantly larger than L(n) is very small. Starting a search at the maximum level in the list does not add more than a small constant to the expected search time. This is the approach used in the algorithms described in this paper.

FIGURE 3 - Pictorial description of steps involved in performing an insertion (the search path through the original list with 17 to be inserted, the update[i]→forward[i] pointers, and the list after insertion with updated pointers shown in grey)

• Use less than you are given. Although an element may contain room for 14 pointers, we don't need to use all 14. We can choose to utilize only L(n) levels. There are a number of ways to implement this, but they all complicate the algorithms and do not noticeably improve performance, so this approach is not recommended. • Fix the dice. If we generate a random level that is more than one greater than the current maximum level in the list, we simply use one plus the current maximum level in the list as the level of the new node. In practice and intuitively, this change seems to work well. However, it totally destroys our ability to analyze the resulting algorithms, since the level of a node is no longer completely random. Programmers should probably feel free to implement this, purists should avoid it.

Determining MaxLevel

Since we can safely cap levels at L(n), we should choose MaxLevel = L(N) (where N is an upper bound on the number of elements in a skip list). If p = 1/2, using MaxLevel = 16 is appropriate for data structures containing up to 2^16 elements.
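Since MaxLevel is just ⌈L(N)⌉ = ⌈log_{1/p} N⌉, it can be computed directly. The following small C check (the helper name max_level_for is ours) reproduces the suggestion above:

#include <math.h>
#include <stdio.h>

/* MaxLevel = ceil(L(N)), where L(N) = log_{1/p}(N) and N bounds the number of elements. */
static int max_level_for(double p, double n_max) {
    return (int)ceil(log(n_max) / log(1.0 / p));
}

int main(void) {
    printf("p = 1/2, N = 2^16: MaxLevel = %d\n", max_level_for(0.5, 65536.0));   /* prints 16 */
    printf("p = 1/4, N = 2^16: MaxLevel = %d\n", max_level_for(0.25, 65536.0));  /* prints 8  */
    return 0;
}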

ANALYSIS OF SKIP LIST ALGORITHMS

The time required to execute the Search, Delete and Insert operations is dominated by the time required to search for the appropriate element. For the Insert and Delete operations, there is an additional cost proportional to the level of the node being inserted or deleted. The time required to find an element is proportional to the length of the search path, which is determined by the pattern in which elements with different levels appear as we traverse the list.

Probabilistic Philosophy

The structure of a skip list is determined only by the number of elements in the skip list and the results of consulting the random number generator. The sequence of operations that produced the current skip list does not matter. We assume an adversarial user does not have access to the levels of nodes; otherwise, he could create situations with worst-case running times by deleting all nodes that were not level 1. The probabilities of poor running times for successive operations on the same data structure are NOT independent; two successive searches for the same element will both take exactly the same time. More will be said about this later.

randomLevel()
    lvl := 1
    -- random() returns a random value in [0...1)
    while random() < p and lvl < MaxLevel do
        lvl := lvl + 1
    return lvl

FIGURE 5 - Algorithm to calculate a random level
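A C rendering of the generator above, plus a quick empirical check that the levels come out in the advertised proportions (for p = 1/2, about 50% level 1, 25% level 2, 12.5% level 3, and so on), might look like the following sketch; frand() and the use of rand() are our stand-ins for the paper's random():

#include <stdio.h>
#include <stdlib.h>

#define MAX_LEVEL 16
static const double P = 0.5;                 /* the fraction p of the discussion above */

/* Uniform value in [0, 1); stands in for random() in Figure 5. */
static double frand(void) { return rand() / ((double)RAND_MAX + 1.0); }

static int random_level(void) {
    int lvl = 1;
    while (frand() < P && lvl < MAX_LEVEL)
        lvl++;
    return lvl;
}

int main(void) {
    long count[MAX_LEVEL + 1] = {0};
    const long trials = 1000000;
    for (long i = 0; i < trials; i++)
        count[random_level()]++;
    for (int lvl = 1; lvl <= 5; lvl++)       /* print the first few levels */
        printf("level %d: %.1f%%\n", lvl, 100.0 * count[lvl] / trials);
    return 0;
}

Seeding the generator and swapping in a better source of randomness is straightforward; all that the algorithm needs is a uniform value in [0, 1).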

Insert(list, searchKey, newValue)
    local update[1..MaxLevel]
    x := list→header
    for i := list→level downto 1 do
        while x→forward[i]→key < searchKey do
            x := x→forward[i]
        -- x→key < searchKey ≤ x→forward[i]→key
        update[i] := x
    x := x→forward[1]
    if x→key = searchKey then x→value := newValue
    else
        lvl := randomLevel()
        if lvl > list→level then
            for i := list→level + 1 to lvl do
                update[i] := list→header
            list→level := lvl
        x := makeNode(lvl, searchKey, newValue)
        for i := 1 to lvl do
            x→forward[i] := update[i]→forward[i]
            update[i]→forward[i] := x

Delete(list, searchKey)
    local update[1..MaxLevel]
    x := list→header
    for i := list→level downto 1 do
        while x→forward[i]→key < searchKey do
            x := x→forward[i]
        update[i] := x
    x := x→forward[1]
    if x→key = searchKey then
        for i := 1 to list→level do
            if update[i]→forward[i] ≠ x then break
            update[i]→forward[i] := x→forward[i]
        free(x)
        while list→level > 1 and list→header→forward[list→level] = NIL do
            list→level := list→level - 1

FIGURE 4 - Skip list insertion and deletion algorithms


Analysis of expected search cost

We analyze the search path backwards, travelling up and to the left. Although the levels of nodes in the list are known and fixed when the search is performed, we act as if the level of a node is being determined only when it is observed while backtracking the search path. At any particular point in the climb, we are at a situation similar to situation a in Figure 6 - we are at the ith forward pointer of a node x and we have no knowledge about the levels of nodes to the left of x or about the level of x, other than that the level of x must be at least i. Assume that x is not the header (this is equivalent to assuming the list extends infinitely to the left). If the level of x is equal to i, then we are in situation b. If the level of x is greater than i, then we are in situation c. The probability that we are in situation c is p. Each time we are in situation c, we climb up a level. Let C(k) = the expected cost (i.e., length) of a search path that climbs up k levels in an infinite list:

    C(0) = 0
    C(k) = (1-p) (cost in situation b) + p (cost in situation c)

By substituting and simplifying, we get:

    C(k) = (1-p) (1 + C(k)) + p (1 + C(k-1))
    C(k) = 1/p + C(k-1)
    C(k) = k/p

Our assumption that the list is infinite is a pessimistic assumption. When we bump into the header in our backwards climb, we simply climb up it, without performing any leftward movements. This gives us an upper bound of (L(n)-1)/p on the expected length of the path that climbs from level 1 to level L(n) in a list of n elements. We use this analysis to go up to level L(n) and use a different analysis technique for the rest of the journey. The number of leftward movements remaining is bounded by the number of elements of level L(n) or higher in the entire list, which has an expected value of 1/p. We also move upwards from level L(n) to the maximum level in the list. The probability that the maximum level of the list is greater than k is equal to 1-(1-p^k)^n, which is at most np^k. We can calculate that the expected maximum level is at most L(n) + 1/(1-p). Putting our results together, we find

    Total expected cost to climb out of a list of n elements ≤ L(n)/p + 1/(1-p),

which is O(log n).
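As a concrete check of this bound (our worked numbers): for p = 1/2 and n = 65,536 we have L(n) = log₂ 65,536 = 16, so

\[ \frac{L(n)}{p} + \frac{1}{1-p} \;=\; \frac{16}{1/2} + \frac{1}{1/2} \;=\; 34, \]

while for p = 1/4 the same n gives L(n) = 8 and a bound of 8/(1/4) + 1/(3/4) ≈ 33.3 expected hops along the search path.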

Number of comparisons

Our result is an analysis of the "length" of the search path. The number of comparisons required is one plus the length of the search path (a comparison is performed for each position in the search path; the "length" of the search path is the number of hops between positions in the search path).

Probabilistic Analysis

It is also possible to analyze the probability distribution of search costs. The probabilistic analysis is somewhat more complicated (see box). From the probabilistic analysis, we can calculate an upper bound on the probability that the actual cost of a search exceeds the expected cost by more than a specified ratio. Some results of this analysis are shown in Figure 8.


Choosing p

Table 1 gives the relative times and space requirements for different values of p.

FIGURE 6 - Possible situations in backwards traversal of the search path. In situation a we still need to climb k levels; with probability 1-p we are in situation b (a leftward move, still k levels to climb), and with probability p we are in situation c (an upward move, only k-1 levels left to climb).


Decreasing p also increases the variability of running times. If 1/p is a power of 2, it will be easy to generate a random level from a stream of random bits (it requires an average of (log₂ 1/p)/(1-p) random bits to generate a random level). Since some of the constant overheads are related to L(n) (rather than L(n)/p), choosing p = 1/4 (rather than 1/2) slightly improves the constant factors of the speed of the algorithms as well. I suggest that a value of 1/4 be used for p unless the variability of running times is a primary concern, in which case p should be 1/2.

Sequences of operations

The expected total time for a sequence of operations is equal to the sum of the expected times of each of the operations in the sequence. Thus, the expected time for any sequence of m searches in a data structure that contains n elements is O(m log n). However, the pattern of searches affects the probability distribution of the actual time to perform the entire sequence of operations. If we search for the same item twice in the same data structure, both searches will take exactly the same amount of time. Thus the variance of the total time will be four times the variance of a single search. If the search times for two elements are independent, the variance of the total time is equal to the sum of the variances of the individual searches. Searching for the same element over and over again maximizes the variance.

ALTERNATIVE DATA STRUCTURES

Balanced trees (e.g., AVL trees [Knu73] [Wir76]) and self-adjusting trees [ST85] can be used for the same problems as skip lists. All three techniques have performance bounds of the same order. A choice among these schemes involves several factors: the difficulty of implementing the algorithms, constant factors, type of bound (amortized, probabilistic or worst-case) and performance on a non-uniform distribution of queries.

Implementation difficulty

For most applications, implementers generally agree skip lists are significantly easier to implement than either balanced tree algorithms or self-adjusting tree algorithms.

TABLE 1 - Relative search speed and space requirements, depending on the value of p.

p       Normalized search time           Avg. # of pointers per node
        (i.e., normalized L(n)/p)        (i.e., 1/(1-p))
1/2     1                                2
1/e     0.94...                          1.58...
1/4     1                                1.33...
1/8     1.33...                          1.14...
1/16    2                                1.07...

Constant factors

Constant factors can make a significant difference in the practical application of an algorithm. This is particularly true for sub-linear algorithms. For example, assume that algorithms A and B both require O(log n) time to process a query, but that B is twice as fast as A: in the time algorithm A takes to process a query on a data set of size n, algorithm B can process a query on a data set of size n^2.

There are two important but qualitatively different contributions to the constant factors of an algorithm. First, the inherent complexity of the algorithm places a lower bound on any implementation. Self-adjusting trees are continuously rearranged as searches are performed; this imposes a significant overhead on any implementation of self-adjusting trees. Skip list algorithms seem to have very low inherent constant-factor overheads: the inner loop of the deletion algorithm for skip lists compiles to just six instructions on the 68020. Second, if the algorithm is complex, programmers are deterred from implementing optimizations. For example, balanced tree algorithms are normally described using recursive insert and delete procedures, since that is the most simple and intuitive method of describing the algorithms. A recursive insert or delete procedure incurs a procedure call overhead. By using non-recursive insert and delete procedures, some of this overhead can be eliminated. However, the complexity of non-recursive algorithms for insertion and deletion in a balanced tree is intimidating and this complexity deters most programmers from eliminating recursion in these routines.

FIGURE 8 - This graph shows a plot of an upper bound on the probability of a search taking substantially longer than expected, for p = 1/4 and p = 1/2 with n = 256, 4,096 and 65,536. The vertical axis shows the probability that the length of the search path for a search exceeds the average length by more than the ratio on the horizontal axis. For example, for p = 1/2 and n = 4,096, the probability that the search path will be more than three times the expected length is less than one in 200 million. This graph was calculated using our probabilistic upper bound.


Table 2 - Timings of implementations of different algorithms

Implementation                               Search Time         Insertion Time      Deletion Time
Skip lists                                   0.051 msec (1.0)    0.065 msec (1.0)    0.059 msec (1.0)
non-recursive AVL trees                      0.046 msec (0.91)   0.10 msec (1.55)    0.085 msec (1.46)
recursive 2-3 trees                          0.054 msec (1.05)   0.21 msec (3.2)     0.21 msec (3.65)
Self-adjusting trees: top-down splaying      0.15 msec (3.0)     0.16 msec (2.5)     0.18 msec (3.1)
Self-adjusting trees: bottom-up splaying     0.49 msec (9.6)     0.51 msec (7.8)     0.53 msec (9.0)

Skip list algorithms are already non-recursive and they are simple enough that programmers are not deterred from performing optimizations.

Table 2 compares the performance of implementations of skip lists and four other techniques. All implementations were optimized for efficiency. The AVL tree algorithms were written by James Macropol of Contel and based on those in [Wir76]. The 2-3 tree algorithms are based on those presented in [AHU83]. Several other existing balanced tree packages were timed and found to be much slower than the results presented below. The self-adjusting tree algorithms are based on those presented in [ST85]. The times in this table reflect the CPU time on a Sun-3/60 to perform an operation in a data structure containing 2^16 elements with integer keys. The values in parentheses show the results relative to the skip list time. The times for insertion and deletion do not include the time for memory management (e.g., in C programs, calls to malloc and free).

Note that skip lists perform more comparisons than other methods (the skip list algorithms presented here require an average of L(n)/p + 1/(1-p) + 1 comparisons). For tests using real numbers as keys, skip lists were slightly slower than the non-recursive AVL tree algorithms and search in a skip list was slightly slower than search in a 2-3 tree (insertion and deletion using the skip list algorithms was still faster than using the recursive 2-3 tree algorithms). If comparisons are very expensive, it is possible to change the algorithms so that we never compare the search key against the key of a node more than once during a search. For p = 1/2, this produces an upper bound on the expected number of comparisons of 7/2 + 3/2 log₂ n. This modification is discussed in [Pug89b].
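As a quick worked instance of that comparison bound (our arithmetic): for p = 1/2 and n = 65,536 = 2^16,

\[ \tfrac{7}{2} + \tfrac{3}{2}\log_2 n \;=\; 3.5 + \tfrac{3}{2}\cdot 16 \;=\; 27.5 \]

expected comparisons per search.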

Type of performance bound

These three classes of algorithm have different kinds of performance bounds. Balanced trees have worst-case time bounds, self-adjusting trees have amortized time bounds and skip lists have probabilistic time bounds. With self-adjusting trees, an individual operation can take O(n) time, but the time bound always holds over a long sequence of operations. For skip lists, any operation or sequence of operations can take longer than expected, although the probability of any operation taking significantly longer than expected is negligible.

In certain real-time applications, we must be assured that an operation will complete within a certain time bound. For such applications, self-adjusting trees may be undesirable, since they can take significantly longer on an individual operation than expected (e.g., an individual search can take O(n) time instead of O(log n) time). For real-time systems, skip lists may be usable if an adequate safety margin is provided: the chance that a search in a skip list containing 1,000 elements takes more than 5 times the expected time is about 1 in 10^18.


Non-uniform query distribution

Self-adjusting trees have the property that they adjust to non-uniform query distributions. Since skip lists are faster than self-adjusting trees by a significant constant factor when a uniform query distribution is encountered, self-adjusting trees are faster than skip lists only for highly skewed distributions. We could attempt to devise self-adjusting skip lists. However, there seems little practical motivation to tamper with the simplicity and fast performance of skip lists; in an application where highly skewed distributions are expected, either self-adjusting trees or a skip list augmented by a cache may be preferable [Pug90].

ADDITIONAL WORK ON SKIP LISTS

I have described a set of algorithms that allow multiple processors to concurrently update a skip list in shared memory [Pug89a]. These algorithms are much simpler than concurrent balanced tree algorithms. They allow an unlimited number of readers and n busy writers in a skip list of n elements with very little lock contention.

Using skip lists, it is easy to do most (all?) the sorts of operations you might wish to do with a balanced tree, such as use search fingers, merge skip lists and allow ranking operations (e.g., determine the kth element of a skip list) [Pug89b].

Tom Papadakis, Ian Munro and Patricio Poblette [PMP90] have done an exact analysis of the expected search time in a skip list. The upper bound described in this paper is close to their exact bound; the techniques they needed to use to derive an exact analysis are very complicated and sophisticated. Their exact analysis shows that for p = 1/2 and p = 1/4, the upper bound given in this paper on the expected cost of a search is not more than 2 comparisons more than the exact expected cost.

I have adapted the idea of probabilistic balancing to some other problems arising both in data structures and in incremental computation [PT89]. We can generate the level of a node based on the result of applying a hash function to the element (as opposed to using a random number generator). This results in a scheme where for any set S, there is a unique data structure that represents S and with high probability the data structure is approximately balanced. If we combine this idea with an applicative (i.e., persistent) probabilistically balanced data structure and a scheme such as hashed-consing [All78] which allows constant-time structural equality tests of applicative data structures, we get a number of interesting properties, such as constant-time equality tests for the representations of sequences. This scheme also has a number of applications for incremental computation.


Since skip lists are somewhat awkward to make applicative, a probabilistically balanced tree scheme is used.


RELATED WORK

James Driscoll pointed out that R. Sprugnoli suggested a method of randomly balancing search trees in 1981 [Spr81]. With Sprugnoli's approach, the state of the data structure is not independent of the sequence of operations which built it. This makes it much harder or impossible to formally analyze his algorithms. Sprugnoli gives empirical evidence that his algorithm has good expected performance, but no theoretical results.

A randomized data structure for ordered sets is described in [BLLSS86]. However, a search using that data structure requires O(n^(1/2)) expected time.

Cecilia Aragon and Raimund Seidel describe a probabilistically balanced search tree scheme [AC89]. They discuss how to adapt their data structure to non-uniform query distributions.


SOURCE CODE AVAILABILITY

Skip list source code libraries for both C and Pascal are available for anonymous ftp from ftp.cs.umd.edu.


CONCLUSIONS

From a theoretical point of view, there is no need for skip lists. Balanced trees can do everything that can be done with skip lists and have good worst-case time bounds (unlike skip lists). However, implementing balanced trees is an exacting task and as a result balanced tree algorithms are rarely implemented except as part of a programming assignment in a data structures class.

Skip lists are a simple data structure that can be used in place of balanced trees for most applications. Skip list algorithms are very easy to implement, extend and modify. Skip lists are about as fast as highly optimized balanced tree algorithms and are substantially faster than casually implemented balanced tree algorithms.

ACKNOWLEDGEMENTS

Thanks to the referees for their helpful comments. Special thanks to all those people who supplied enthusiasm and encouragement during the years in which I struggled to get this work published, especially Alan Demers, Tim Teitelbaum and Doug McIlroy. This work was partially supported by an AT&T Bell Labs Fellowship and by NSF grant CCR-8908900.

REFERENCES

[AC89] Aragon, Cecilia and Raimund Seidel, "Randomized Search Trees," Proceedings of the 30th Ann. IEEE Symp. on Foundations of Computer Science, pp. 540-545, October 1989.
[AHU83] Aho, A., Hopcroft, J. and Ullman, J., Data Structures and Algorithms, Addison-Wesley Publishing Company, 1983.
[All78] Allen, John, Anatomy of LISP, McGraw-Hill Book Company, NY, 1978.
[BLLSS86] Bentley, J., F. T. Leighton, M. F. Lepley, D. Stanat and J. M. Steele, A Randomized Data Structure For Ordered Sets, MIT/LCS Technical Memo 297, May 1986.
[Knu73] Knuth, D., "Sorting and Searching," The Art of Computer Programming, Vol. 3, Addison-Wesley Publishing Company, 1973.
[PMP90] Papadakis, Thomas, Ian Munro and Patricio Poblette, Exact Analysis of Expected Search Cost in Skip Lists, Tech Report #????, Dept. of Computer Science, Univ. of Waterloo, January 1990.
[PT89] Pugh, W. and T. Teitelbaum, "Incremental Computation via Function Caching," Proc. of the Sixteenth Conference on the Principles of Programming Languages, 1989.
[Pug89a] Pugh, W., Concurrent Maintenance of Skip Lists, Tech Report TR-CS-2222, Dept. of Computer Science, University of Maryland, College Park, 1989.
[Pug89b] Pugh, W., Whatever You Might Want to Do Using Balanced Trees, You Can Do It Faster and More Simply Using Skip Lists, Tech Report CS-TR-2286, Dept. of Computer Science, University of Maryland, College Park, July 1989.
[Pug90] Pugh, W., Slow Optimally Balanced Search Strategies vs. Cached Fast Uniformly Balanced Search Strategies, to appear in Information Processing Letters.
[Spr81] Sprugnoli, R., "Randomly Balanced Binary Trees," Calcolo, V 17 (1981), pp. 99-117.
[ST85] Sleator, D. and R. Tarjan, "Self-Adjusting Binary Search Trees," Journal of the ACM, Vol. 32, No. 3, July 1985, pp. 652-666.
[Wir76] Wirth, N., Algorithms + Data Structures = Programs, Prentice-Hall, 1976.


Probabilistic analysis of search cost

PROBABILISTIC ANALYSIS

In addition to analyzing the expected performance of skip lists, we can also analyze the probabilistic performance of skip lists. This will allow us to calculate the probability that an operation takes longer than a specified time. This analysis is based on the same ideas as our analysis of the expected cost, so that analysis should be understood first.

A random variable has a fixed but unpredictable value and a predictable probability distribution and average. If X is a random variable, Prob{X = t} denotes the probability that X equals t and Prob{X > t} denotes the probability that X is greater than t. For example, if X is the number obtained by throwing an unbiased die, Prob{X > 3} = 1/2.

It is often preferable to find simple upper bounds on values whose exact value is difficult to calculate. To discuss upper bounds on random variables, we need to define a partial ordering and equality on the probability distributions of non-negative random variables.

Definitions (=_prob and ≤_prob). Let X and Y be non-negative independent random variables (typically, X and Y would denote the time to execute algorithms A_X and A_Y). We define X ≤_prob Y to be true if and only if for any value t, the probability that X exceeds t is less than the probability that Y exceeds t. More formally:

    X =_prob Y  iff  ∀t, Prob{X > t} = Prob{Y > t}, and
    X ≤_prob Y  iff  ∀t, Prob{X > t} ≤ Prob{Y > t}.

□

For example, the graph in Figure 7 shows the probability distributions of three random variables X, Y and Z. Since the probability distribution curve for X is completely under the curves for Y and Z, X ≤_prob Y and X ≤_prob Z. Since the probability curves for Y and Z intersect, neither Y ≤_prob Z nor Z ≤_prob Y. Since the expected value of a random variable X is simply the area under the curve Prob{X > t}, if X ≤_prob Y then the average of X is less than or equal to the average of Y.

We make use of two probability distributions:

Definition (binomial distributions - B(t, p)). Let t be a non-negative integer and p be a probability. The term B(t, p) denotes a random variable equal to the number of successes seen in a series of t independent random trials where the probability of a success in a trial is p. The average and variance of B(t, p) are tp and tp(1-p) respectively. □

Definition (negative binomial distributions - NB(s, p)). Let s be a non-negative integer and p be a probability. The term NB(s, p) denotes a random variable equal to the number of failures seen before the s-th success in a series of random independent trials where the probability of a success in a trial is p. The average and variance of NB(s, p) are s(1-p)/p and s(1-p)/p^2 respectively. □

FIGURE 7 - Plots of three probability distributions: Prob{X > t}, Prob{Y > t} and Prob{Z > t}

The number of leftward movements we need to make before we move up a level (in an infinite list) has a negative binomial distribution: it is the number of failures (situation b's) we see before we see the first success (situation c) in a series of independent random trials, where the probability of success is p. Using the probabilistic notation introduced above:

    Cost to climb one level in an infinite list =_prob 1 + NB(1, p).

We can sum the costs of climbing each level to get the total cost to climb up to level L(n):

    Cost to climb to level L(n) in an infinite list =_prob (L(n) - 1) + NB(L(n) - 1, p).

Our assumption that the list is infinite is a pessimistic assumption:

    Cost to climb to level L(n) in a list of n elements ≤_prob (L(n) - 1) + NB(L(n) - 1, p).

Once we have climbed to level L(n), the number of leftward movements is bounded by the number of elements of level L(n) or greater in a list of n elements. The number of elements of level L(n) or greater in a list of n elements is a random variable of the form B(n, 1/np).

Let M be a random variable corresponding to the maximum level in a list of n elements. The probability that the level of a node is greater than k is p^k, so Prob{M > k} = 1 - (1 - p^k)^n < np^k. Since np^k = p^(k - L(n)) and Prob{NB(1, 1-p) + 1 > i} = p^i, we get a probabilistic upper bound of M ≤_prob L(n) + NB(1, 1-p) + 1. Note that the average of L(n) + NB(1, 1-p) + 1 is L(n) + 1/(1-p).

This gives a probabilistic upper bound on the cost once we have reached level L(n) of B(n, 1/np) + (L(n) + NB(1, 1-p) + 1) - L(n). Combining our results to get a probabilistic upper bound on the total length of the search path (i.e., cost of the entire search):

    total cost to climb out of a list of n elements
        ≤_prob (L(n) - 1) + NB(L(n) - 1, p) + B(n, 1/np) + NB(1, 1-p) + 1.

The expected value of our upper bound is equal to

    (L(n) - 1) + (L(n) - 1)(1-p)/p + 1/p + p/(1-p) + 1 = L(n)/p + 1/(1-p),

which is the same as our previously calculated upper bound on the expected cost of a search.
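The simplification can be checked term by term (our expansion of the arithmetic above):

\[
(L(n)-1) + \frac{(L(n)-1)(1-p)}{p} = \frac{L(n)-1}{p}, \qquad
\frac{L(n)-1}{p} + \frac{1}{p} = \frac{L(n)}{p}, \qquad
\frac{p}{1-p} + 1 = \frac{1}{1-p},
\]

so the five terms sum to L(n)/p + 1/(1-p).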

The variance of our upper bound is

    (L(n) - 1)(1-p)/p^2 + (1 - 1/np)/p + p/(1-p)^2 < (1-p)L(n)/p^2 + p/(1-p)^2 + (2p-1)/p^2.

Figure 8 shows a plot of an upper bound on the probability of an actual search taking substantially longer than average, based on our probabilistic upper bound.
