CS 880: Quantum Information Processing
9/2/2010
Lecture 1: Introduction
Instructor: Dieter van Melkebeek
Scribe: Dalibor Zelený
Welcome to CS 880, a course on quantum information processing. This area has seen a lot of development in the last twenty years, with interest increasing after Peter Shor found a polynomial-time quantum algorithm for factoring in 1994. In this course, we focus on computational aspects of quantum computing, quantum information, error correction, and fault-tolerant computing. We do not focus on the issue of physical realizability of a quantum computer. Today we begin with a brief account of the history of quantum computing that is relevant to our course, give an overview of topics covered in this course, and start developing a formal model of quantum computation.
1 Brief Historical Overview
1.1 Quantum Computing
In the 1980s, physicists Richard Feynman and Paul Benioff observed that simulating quantum systems on classical computers did not work well, and suggested that a different kind of computer, based on quantum principles, could perform better. They also suggested that such a computer could speed up computation for other problems solved on classical computers. David Deutsch, inspired by the Church-Turing thesis (see footnote 1), gave the first formal model of a quantum computer in 1985. It is now known that the Church-Turing thesis holds for quantum computers; however, it is an open problem whether the strong version of the Church-Turing thesis holds for quantum computers. The consensus in the community is that the strong Church-Turing thesis does not hold for quantum computers.

A significant piece of evidence for the latter came in 1994 when Peter Shor gave polynomial-time quantum algorithms for factoring integers and for finding discrete logarithms. Today's best known classical algorithms for these problems still have superpolynomial running times.

Shor's algorithm motivated a vast amount of research in quantum algorithms. We briefly mention only a few results, as a full account would take too much time. A key ingredient in Shor's algorithm, Fourier sampling, can be generalized to an efficient procedure for estimating (the phases of) eigenvalues of unitary operators. This phase/eigenvalue estimation technique has found several applications, including a polylogarithmic algorithm for solving well-conditioned systems of linear equations. There are also some negative results in this line of research: it has been shown that we cannot solve graph isomorphism using Fourier sampling.

In 1995, Lov Grover showed how to search for an element in an unordered list of n elements using $O(\sqrt{n})$ queries. In the deterministic and randomized classical settings, we require $\Theta(n)$ queries in the worst case. Applying Grover's algorithm to exhaustive search for a satisfying assignment to a Boolean formula yields an $O(2^{n/2})$ algorithm for satisfiability, while we don't know of any such algorithms in the classical setting. However, finding a subexponential algorithm for satisfiability is an open problem even on quantum computers. The consensus is that such algorithms do not exist.

Given that, a lot of the research in quantum computing focuses on NP-intermediate problems because some problems we suspect to be NP-intermediate do have efficient quantum algorithms. Examples of problems believed to be NP-intermediate are factoring, the discrete logarithm problem, the approximate shortest vector problem in lattices, and graph isomorphism. It may be the case, however, that all these problems have efficient quantum algorithms because they are actually in P.

Footnote 1: The Church-Turing thesis states that Turing machines can decide the same set of languages as any other computational model, such as lambda calculus. The strong Church-Turing thesis, in addition, says that we can simulate any other computational model on a Turing machine with only polynomial overhead.
1.2 Quantum Information
While the progress in realizing quantum computers has been slow, there has been more success in the area of quantum information. Shor's algorithm was a negative result for cryptography, as cryptosystems such as RSA rely on the hardness of factoring. On the other hand, quantum computation yields some more powerful cryptographic tools. For example, there is a key distribution protocol where an eavesdropper cannot observe a message without irreversibly altering it, thus making her presence known to the communicating parties. Some research has focused on zero-knowledge proofs (see footnote 2) in the quantum setting. It turns out that the zero-knowledge protocol for graph isomorphism, as well as several other zero-knowledge protocols, remain zero-knowledge even if we give the power of quantum computation to the verifier. Other aspects studied in quantum information are teleportation, superdense coding, nonlocality, and many more.
1.3 Quantum Error Correcting and Fault Tolerant Computing
Quantum behavior of algorithms is negatively influenced by decoherence, which is a term for the interaction between the quantum system and the outside environment. We would like to design error-correcting algorithms that cope with it. Unfortunately, even the error-correcting algorithms themselves may suffer from decoherence. Fault-tolerant computing solves the latter problem, provided the amount of interaction between the error-correcting procedure and the environment stays below a certain threshold. As we will see later, error correction is harder in the quantum setting. In the classical setting, we only have two states, zero and one, whereas there are more states in the quantum setting.
Footnote 2: In a zero-knowledge proof, the goal of the prover is to convince the verifier that a particular statement is true, without revealing any additional information. For example, an NP witness that two graphs are isomorphic reveals additional information (the isomorphism from the vertex set of one graph to the vertex set of the other graph), so it is not a zero-knowledge proof. We want the proof to be such that if the verifier can compute something efficiently after interacting with the prover, it could have computed it efficiently even before the interaction took place.
2 Topics Covered in This Course
This course covers the topics listed below.
1. The model of a quantum computer
2. Paradigms for efficient quantum algorithms
   (a) Phase estimation
   (b) Hidden subgroup problem. In this problem we have a group action that, for each coset, behaves the same on all elements of that coset. We are interested in the subgroup used to define the cosets.
   (c) Quantum random walks
3. Quantum communications and other interactive processes
4. Time permitting: Quantum error correction and fault-tolerant computation
3 Modeling Computation
Today we describe classical computation in terminology that is more suitable for describing quantum computation. Next time, we will give formal descriptions of probabilistic computation and quantum computation.

Let $R \subseteq \{0,1\}^* \times \{0,1\}^*$ be a relation. In this course, we think of it as a relation between inputs and outputs for some problem. Computation is a process that evolves the state of a system from an initial state to a final state, while transforming the input into an output. If, for every input x, our computation produces an output y such that $(x, y) \in R$, we say that it realizes R. Note that a relation is more general than a function. A function is a relation such that at most one pair (x, y) is in R for each x. We consider relations because many problems allow multiple strings y to be associated with a given string x. For example, in the shortest path problem, we can have multiple shortest paths y in a single instance x.

We consider three stages of computation.
1. Initialization of the system with the input x.
2. Evolution of the system by performing a sequence of elementary steps according to some rules.
3. Observation of the final state of the system, from which we extract the output y.

We need not pay much attention to stages 1 and 3 above in classical computation. These stages require more consideration in the case of quantum computation, but we are not going to discuss them in this course. We focus on the second stage, which should have the following properties.
1. It should be physically realizable. This means that each elementary operation should act on a small part of the system, i.e., on a small (constant) number of bits that are physically close to each other.
2. The sequence of elementary operations should follow "easily" from a short description. For example, we could describe it by a program of constant length.

We usually use Turing machines to describe classical computation. Unfortunately, quantum Turing machines are much more cumbersome to deal with. Thus, we present an alternative way of describing a computation. In particular, we use gates to describe the evolution of the state of a system from the initial configuration to the final configuration. We apply the gates in a sequence in order to carry out the evolution of state, thus producing a circuit. We then argue that this circuit is easily computable from a short description.
3.1 Circuits Describe Classical Computation
We first discuss how to use gates to describe computation, and then put them together to form a circuit. We also give a mathematical description of computation using linear transformations, which will be useful later.

We view the state of a classical computer as a concatenation of m subsystems, each of which has two possible states, 0 and 1. This allows us to describe the state of the computer as a binary string s of length m. We encode the input of length n into the first n parts of the system, and set all the other parts to 0. Once the computation terminates, we read the final state to extract the output. See Figure 1.

Figure 1: Computation of a classical computer (a register of m bits, holding the encoded input followed by 0s, evolves into a state holding the encoded output)

We often use the "ket" notation $|s\rangle$ to denote a state s of length m, and we think of $|s\rangle$ as a column vector of length $2^m$ with zeros everywhere except in the position indexed by the number represented by s in binary (so $|s\rangle$ is the characteristic vector of the state s, with positions in the vector corresponding to states in lexicographical ordering). We then think of the action of an operator on a state s as a linear transformation acting on the vector $|s\rangle$. We can represent this linear transformation by a $2^m \times 2^m$ matrix.

Recall that we want operators to be physically realizable. We can represent them as gates, for example NAND gates. In Figure 2, we show a NAND gate with two inputs and two outputs, where the first output is the NAND of the inputs, and the second output is the same as the second input.

Figure 2: A NAND gate (inputs $b_1$, $b_2$; outputs $\overline{b_1 \wedge b_2}$, $b_2$)
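Example: The NAND gate from Figure 2 operates on a state s of length m = 2. We represent the states using the "ket" notation as follows:
$$|00\rangle = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \quad |01\rangle = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \quad |10\rangle = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \quad \text{and} \quad |11\rangle = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}.$$
The NAND gate causes the following changes to the current state:
$$|00\rangle \to |10\rangle, \quad |01\rangle \to |11\rangle, \quad |10\rangle \to |10\rangle, \quad |11\rangle \to |01\rangle.$$
The matrix corresponding to this transformation is
$$T = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}.$$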
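As a sanity check on the example above, the following minimal Python sketch builds the matrix of a gate column by column from its action on base states and applies it to a ket. The helper names (`ket`, `gate_matrix`, `apply`) are illustrative choices, not notation from the lecture.

```python
def nand_gate(b1, b2):
    """Gate of Figure 2: first output is NAND(b1, b2), second output is b2."""
    return 1 - (b1 & b2), b2

def ket(s):
    """Characteristic (one-hot) vector of the bit string s, of length 2^len(s)."""
    v = [0] * (2 ** len(s))
    v[int(s, 2)] = 1
    return v

def gate_matrix(gate, m):
    """2^m x 2^m 0/1 matrix: column j has its only 1 in the row of the output state."""
    n = 2 ** m
    T = [[0] * n for _ in range(n)]
    for j in range(n):
        bits = [int(c) for c in format(j, f"0{m}b")]
        out = gate(*bits)
        i = int("".join(str(b) for b in out), 2)
        T[i][j] = 1
    return T

def apply(T, v):
    """Matrix-vector product T v."""
    return [sum(T[i][j] * v[j] for j in range(len(v))) for i in range(len(T))]

T = gate_matrix(nand_gate, 2)
```

For instance, `apply(T, ket("11"))` returns the characteristic vector of the state 01, matching the last transition listed above.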
What the matrix T really does to $|s\rangle$ is that it moves the only one in $|s\rangle$ to the position whose index, in binary, corresponds to the new state. If $|s\rangle$ has its only one in position j, and s gets mapped to s′, the jth column of the matrix T has its only one in position i, where i is the only index where $|s'\rangle$ has a one. In our case, we can see that T moves the one from position $0 = (00)_2$ to position $2 = (10)_2$ (so it adds 2 to 0). Similarly, it moves the one from position 1 to 3 (so it adds 2 again), does nothing when the one is in position 2, and moves the one two positions back when it is in position 3.

Our goal now is to describe the matrix for the transformation of states of length m when two consecutive states are connected by a NAND gate. The matrix changes depending on the bits the NAND gate acts on. Consider the case where the states are represented by m bits, and where the NAND gate acts on two consecutive bits at positions m − k − 2 and m − k − 1 (where the last bit has index m − 1). We use $T_{m,k}$ to denote the matrix. Figure 3 shows the case k = 0. We start by describing $T_{m,0}$ and then use the intuition behind its description to describe $T_{m,k}$.
Figure 3: Two consecutive states ("before" and "after") with the last two bits connected by a gate T

The output of the transformation represented by $T_{m,0}$ depends only on the last two bits of the state, modifies only those last two bits, and keeps the first m − 2 bits fixed. We discussed how the only one in $|s\rangle$ is moved by T for each of the combinations of the last two bits in $|s\rangle$. The combinations of the last two bits repeat "periodically", which immediately gives us the matrix $T_{m,0} = I_{2^{m-2}} \otimes T$. Figure 4 shows $T_{m,0}$ for m = 4. We get $T_{m,k} = I_{2^{m-2-k}} \otimes T \otimes I_{2^k}$ by similar reasoning. ⊠

Figure 4: The matrix $T_{4,0} = I_4 \otimes T$ (rows and columns indexed by the states 0000 through 1111; four copies of the matrix of T on the diagonal and zeros elsewhere)

If we consider putting together multiple copies of Figure 3 in a series, with the "after" part of one application of a gate being the "before" part of the next application of a gate (perhaps at different positions of the state), we obtain a circuit that describes computation. We give more details after a brief discussion of Turing machines.
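The tensor-product formula $T_{m,k} = I_{2^{m-2-k}} \otimes T \otimes I_{2^k}$ can be checked against the direct action of the gate on bit strings. The sketch below is a minimal pure-Python illustration; `kron`, `T_mk`, `nand_on_state`, and `matches_direct_action` are hypothetical helper names introduced here.

```python
def kron(A, B):
    """Kronecker (tensor) product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

# Matrix of the NAND gate of Figure 2 on two bits.
T = [[0, 0, 0, 0], [0, 0, 0, 1], [1, 0, 1, 0], [0, 1, 0, 0]]

def T_mk(m, k):
    """I_{2^(m-2-k)} (x) T (x) I_{2^k}."""
    return kron(kron(identity(2 ** (m - 2 - k)), T), identity(2 ** k))

def nand_on_state(s, k):
    """Apply the gate directly to the bits at positions m-k-2 and m-k-1 of s."""
    i = len(s) - k - 2
    b1, b2 = int(s[i]), int(s[i + 1])
    return s[:i] + str(1 - (b1 & b2)) + s[i + 1:]

def matches_direct_action(m, k):
    """True if column j of T_mk has its only 1 in the row of the new state."""
    M = T_mk(m, k)
    for j in range(2 ** m):
        s = format(j, f"0{m}b")
        i = int(nand_on_state(s, k), 2)
        column = [M[r][j] for r in range(2 ** m)]
        if column[i] != 1 or sum(column) != 1:
            return False
    return True
```

Calling `matches_direct_action(4, 0)` corresponds to the case shown in Figure 4; other values of k with 0 ≤ k ≤ m − 2 shift the gate to the left.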
3.2 Turing Machines
In the previous section, we described computation using a circuit. However, it is possible that the circuit is different for each input length, and maybe even for two inputs of the same length. Therefore, the next question we must answer is how to compute the description of any of these circuits from one short description.

The Turing machine is the standard formal model of computation. Here we show that we can represent the computation of a Turing machine by a circuit, and that these two models of computation are equivalent, which implies that we can use either one to represent computation. In fact, we need a little more. Recall that we want our basic operations (e.g., gates in Section 3.1) to be computable from a short description. We show that the circuit described in Section 3.1 is computable from a short description (we call such circuits uniform, and we use the term polynomial-time uniform if we can compute them in time polynomial in the circuit size).

A Turing machine acts on a register of m bits, and the actions are governed by some finite control. The finite control has a pointer p to one bit in the register. In each step, the finite control reads the bit of the register pointed to by p, and looks at the current state. Afterwards, it writes a bit at the position pointed to by p, goes to another state, and moves p to the left, to the right, or chooses not to move it. We can describe the action in each step by a transition function
$$\delta : (Q \setminus \{q_{halt}\}) \times \Gamma \to Q \times \Gamma \times \{L, P, R\},$$
where Q is the finite set of states, $q_{halt}$ is the state in which computation terminates, Γ is the finite alphabet of the register, and the set {L, P, R} represents the three possibilities for the movement of the pointer p: move left, stay put, and move right.

Note that the actions of the transition function are local (they only affect the bit of the register pointed to by p, and move p to an adjacent bit). This satisfies the physical realizability requirement mentioned earlier.
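To make the role of the transition function concrete, here is a minimal Python sketch of a finite control stepping over a finite register. The state names, the blank symbol `_`, the clamping of the pointer at the register ends, and the example machine (which flips every bit until it reaches a blank) are all assumptions made for illustration.

```python
def run_tm(delta, tape, state="q0", pos=0, max_steps=1000):
    """Run a machine given by the table delta: (state, symbol) -> (state, symbol, move)."""
    tape = list(tape)
    shift = {"L": -1, "P": 0, "R": 1}  # move left, stay put, move right
    steps = 0
    while state != "q_halt" and steps < max_steps:
        # Each step is local: read one cell, write one cell, move by at most one.
        state, symbol, direction = delta[(state, tape[pos])]
        tape[pos] = symbol
        pos = min(max(pos + shift[direction], 0), len(tape) - 1)
        steps += 1
    return "".join(tape), state, steps

# Hypothetical example machine: flip bits while moving right, halt on the blank.
flip_delta = {
    ("q0", "0"): ("q0", "1", "R"),
    ("q0", "1"): ("q0", "0", "R"),
    ("q0", "_"): ("q_halt", "_", "P"),
}

result = run_tm(flip_delta, "0110_")
```

The table `flip_delta` has three entries regardless of the register length, which is the sense in which the whole computation follows from a short description.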
Furthermore, we only need to describe the finite control in order to describe a Turing machine. Since the transition function has finite size, Turing machines satisfy the computability from a short description requirement as well. We can describe the transition function by a circuit with a fixed number of gates, and we can think of that circuit as just one big gate. We can also encode where p points as part of the state, and modify our gate to account for that. Now we use this gate to construct the circuit in Section 3.1. Since the gate is easily computable, so is the circuit, provided we can compute which bits of two consecutive states are connected by that gate for each pair of consecutive states. It turns out that this is possible and can be done in time that is polynomial in the Turing machine's running time. For another construction of a circuit that represents a Turing machine's computation, see Chapter 6 of [1].

It is easy to show that computation done on polynomial-time uniform circuits can be carried out on a Turing machine. A Turing machine can compute a polynomial-time uniform circuit in time polynomial in the circuit size, and then evaluate that circuit, again in time polynomial in the circuit size. Therefore, we get the following theorem.

Theorem 1. Turing machine computations and polynomial-time uniform circuits are equivalent in power up to polynomial factors in the running time.

It turns out that Theorem 1 transfers to the quantum setting as well. Note that we use Turing machines in two ways here. One use is as a model of computation that is equivalent to uniform circuits, and another use is to compute the description of a uniform circuit. This is not circular logic.
4 Next Time
Next time we will describe probabilistic and quantum computation using the formalism developed in this lecture.
References
[1] Sanjeev Arora and Boaz Barak. Computational Complexity: A Modern Approach. Cambridge University Press, 2009.
CS 880: Quantum Information Processing
9/7/10
Lecture 2: From Classical to Quantum Model of Computation
Instructor: Dieter van Melkebeek
Scribe: Tyson Williams
Last class we introduced two models for deterministic computation. We discussed Turing Machines, which are models of sequential computation, and then families of uniform circuits, which are models of parallel computation. In both models, we required the operators to be physically realizable and imposed a uniformity condition; namely, that the state transitions could be described by a finite set of rules independent of the input. In this lecture, we shall develop a model for probabilistic computation, from which our model for quantum computation will follow.
1 Model for Probabilistic Computation
1.1 Overview
Probabilistic computers can use randomness to determine which operations to perform on their inputs. Thus, the state at any given moment and the final output of a computation are both random variables. One way to represent a state $|\psi\rangle$ of an m-bit system is as a probability distribution over base states $|s\rangle$ for $s \in \{0,1\}^m$,
$$|\psi\rangle = \sum_{s \in \{0,1\}^m} p_s |s\rangle, \qquad 0 \le p_s \le 1, \qquad \sum_{s \in \{0,1\}^m} p_s = 1,$$
where $p_s$ denotes the probability of observing base state $|s\rangle$. These state vectors have an L1 norm of 1.

Since the output is now a random variable, we require a computation to provide the correct answer with high probability. That is, given relation R, input x, and output y,
$$(\forall x) \ \Pr[(x, y) \in R] \ge 1 - \epsilon,$$
where $\epsilon$ denotes the probability of error. If $\epsilon$ is smaller than the probability of another bad event, such as the computer crashing during the computation, then we are satisfied. In contrast, $\epsilon = 1/2$ is no good for decision problems, because an algorithm can achieve it by just flipping a fair coin and returning the result. If R is a function, then $\epsilon = 1/3$ is good enough because we can rerun the algorithm a polynomial number of times, take the majority answer, and achieve exponentially small error via the Chernoff bound. In fact, any $\epsilon$ bounded away from 1/2 will suffice.
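The effect of taking a majority vote can be computed exactly for small cases. The sketch below assumes a decision problem with a single correct answer per input, independent runs, and an odd number of runs k; the function name `majority_error` is introduced here for illustration. It evaluates the binomial tail directly rather than the Chernoff bound.

```python
from math import comb

def majority_error(eps, k):
    """Probability that more than half of k independent runs err,
    when each run errs with probability eps (k assumed odd)."""
    return sum(
        comb(k, i) * eps ** i * (1 - eps) ** (k - i)
        for i in range(k // 2 + 1, k + 1)
    )

# Error of the majority answer for eps = 1/3 and a growing number of runs.
errors = [majority_error(1 / 3, k) for k in (1, 3, 11, 101)]
```

With $\epsilon = 1/3$ the values in `errors` decrease as k grows, whereas with $\epsilon = 1/2$ the majority is wrong with probability 1/2 no matter how many runs we take.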
1.2 Local Operations
In the probabilistic setting, a transition operator can depend on probabilistic outcomes (i.e., coin flips). Thus, the local effect of a transition operator can be described as the multiplication of a (left) stochastic matrix T with a state vector $|\psi\rangle$, where
$$(\forall j) \ \sum_i T_{ij} = 1, \qquad 0 \le T_{ij} \le 1.$$
We interpret $T_{ij}$ as the probability of entering state i after applying T to state j. As before, the state after an operation is $T|\psi\rangle$ because
$$(|\psi_{after}\rangle)_i = (T|\psi_{before}\rangle)_i = \sum_j T_{ij} (|\psi_{before}\rangle)_j.$$
The matrix for a deterministic operator, which is an all zeros matrix except for a single 1 per column, is just a special case of a stochastic matrix. See Figure 1 for examples of stochastic matrices for a fair coin flip and a biased coin flip.
$$\text{(a) Fair coin flip: } C = \begin{pmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \end{pmatrix} \qquad \text{(b) Biased coin flip: } C_p = \begin{pmatrix} p & p \\ 1-p & 1-p \end{pmatrix}$$
Figure 1: Coin flip gates

The following exercise shows that, in a strong sense, coin flips are the only genuinely probabilistic operations we need.

Exercise 1. Given a probabilistic circuit C of size t and depth d, there is an equivalent probabilistic circuit C′ of size O(t) and depth O(d) such that the first level of C′ consists only of biased coin flips and all other levels of C′ are deterministic. Here, equivalent means that for any input x the distribution of outputs y is the same for C and C′.
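The stochastic-matrix view of this section can be sketched in a few lines of Python. The helper names (`is_stochastic`, `apply`, `coin`) are illustrative; the NOT matrix is included as an example of a deterministic operator with a single 1 per column.

```python
def is_stochastic(T, tol=1e-12):
    """Left stochastic: entries in [0, 1] and every column sums to 1."""
    rows, cols = len(T), len(T[0])
    entries_ok = all(-tol <= T[i][j] <= 1 + tol for i in range(rows) for j in range(cols))
    columns_ok = all(abs(sum(T[i][j] for i in range(rows)) - 1) <= tol for j in range(cols))
    return entries_ok and columns_ok

def apply(T, psi):
    """State after the operation: (T psi)_i = sum_j T_ij psi_j."""
    return [sum(T[i][j] * psi[j] for j in range(len(psi))) for i in range(len(T))]

def coin(p):
    """Biased coin flip gate C_p; coin(0.5) is the fair coin flip C."""
    return [[p, p], [1 - p, 1 - p]]

NOT = [[0, 1], [1, 0]]  # a deterministic gate, written as a stochastic matrix

psi_before = [0.3, 0.7]
psi_after = apply(coin(0.25), psi_before)
```

Because every column of a stochastic matrix sums to 1, the entries of `psi_after` again sum to 1, which is the L1-norm preservation used throughout this section.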
1.3 Uniformity Condition
We can think of a deterministic Turing machine as having a Boolean measure of validity associated with every possible transition between configurations. A 1 signifies a completely valid transition, while a 0 denotes a completely invalid transition:
$$\delta : (Q \setminus \{q_{halt}\} \times \Gamma) \times (Q \times \Gamma \times \{L, P, R\}) \to \{0, 1\}.$$
A probabilistic Turing machine has a probability of validity associated with every transition:
$$\delta_p : (Q \setminus \{q_{halt}\} \times \Gamma) \times (Q \times \Gamma \times \{L, P, R\}) \to [0, 1].$$
It is important to note that, in order to satisfy the uniformity condition, these probabilities must be easily definable. In particular, we require that the nth bit of any bias be computable in time poly(n). If we did not impose this constraint, we could use the probabilities to encode information, such as a bias p whose binary expansion is "0." followed by the characteristic sequence of the halting language. To decide if the nth Turing machine halts, we could repeatedly sample from such a biased coin flip gate in order to estimate p. After we are confident in the value of the nth bit, we return that bit, thereby deciding the halting language.

This uniformity condition allows for an infinite number of basic operations. If this is a problem, then we can also consider having just the fair coin flip gate as the only source of randomness. In this case, we would use this gate to get good estimates for any biased coin flip gates that we need. However, we would also have to relax the universality condition. Instead of being required to sample exactly from the distribution of any probabilistic circuit, we would only be required to sample approximately. We will discuss this notion of universality in the next lecture.
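To see why sampling can recover bits of a bias, one can estimate how many coin flips suffice to pin p down to a given precision. The sketch below uses Hoeffding's inequality, $\Pr[|\hat{p} - p| \ge t] \le 2e^{-2Nt^2}$, which is an assumption brought in for illustration and is not part of the lecture. Note also that precision $2^{-(n+1)}$ only determines the nth bit when p is not too close to a boundary between two candidate expansions.

```python
from math import ceil, log

def samples_for_precision(n, delta=0.01):
    """Number N of flips so that the empirical frequency is within 2^-(n+1)
    of the true bias, except with probability at most delta (Hoeffding bound)."""
    t = 2.0 ** -(n + 1)
    return ceil(log(2 / delta) / (2 * t * t))

# Sample counts needed for precision 2^-(n+1), for n = 1, ..., 10.
counts = [samples_for_precision(n) for n in range(1, 11)]
```

The counts grow by roughly a factor of 4 per additional bit, so the number of samples is exponential in n but finite, which is all the argument above needs.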
1.4 A More Abstract View
We define a pure state $|\psi\rangle$ as a convex combination of base states $|s\rangle$. That is, $|\psi\rangle = \sum_s p_s |s\rangle$, where $p_s$ is the probability of being in base state $|s\rangle$, $\sum_s p_s = 1$, and $0 \le p_s \le 1$. A mixed state is a discrete probability distribution over pure states. We can think of the probabilistic model as allowing two operations on any pure state $|\psi\rangle$.
1. Local, stochastic transformations, as specified by probabilistic transition matrices. These are L1-norm preserving.
2. A terminal observation, which is a probabilistic process that transforms a mixed state into a base state after all transformations have been applied. That is, $|\psi\rangle \to |s\rangle$, where the probability of achieving $|s\rangle$ is $p_s$.

Exercise 2. What happens if we allow observations at any point in time, that is, in between transitions? As motivation, consider the problem of composing two procedures, both of which observe their respective states after their transformations are complete.
2 Model for Quantum Computation
2.1 Overview
As with the probabilistic model, the state of a system is described by a superposition of base states, but here:
1. the coefficients are complex numbers (usually denoted by α, which stands for amplitude), and
2. vectors have an L2 norm of 1 (i.e., $\sum_s |\alpha_s|^2 = 1$).

A qubit is the quantum analog of a classical bit and satisfies the above two conditions. The interpretation is that $\Pr[\text{observing } |s\rangle] = |\alpha_s|^2$. Note that this is a valid interpretation because the above defines a valid probability distribution.
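A minimal Python sketch of this interpretation: given a list of complex amplitudes indexed by base states, check the L2-norm condition and return the observation probabilities. The function name `probabilities` and the tolerance are illustrative choices.

```python
def probabilities(amplitudes, tol=1e-9):
    """Return [|alpha_s|^2 for each s], after checking that they sum to 1."""
    probs = [abs(a) ** 2 for a in amplitudes]
    if abs(sum(probs) - 1) > tol:
        raise ValueError("amplitudes do not have L2 norm 1")
    return probs

# Two single-qubit states: the phases differ, the observation probabilities may not.
p_real = probabilities([0.6, -0.8])
p_complex = probabilities([1 / 2 ** 0.5, 1j / 2 ** 0.5])
```

Negative or complex amplitudes give the same kind of probability distribution as positive ones; the sign and phase matter only once further gates are applied.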
2.2 Local Operations
For consistency of interpretation, global operations have to preserve the 2-norm. It is necessary and sufficient that local operations are unitary transformations. That is, $T^* T = I = T T^*$, where $T^*$ is the conjugate transpose (see footnote 1) of T. Unitary matrices have a full basis of eigenvectors with eigenvalues satisfying $|\lambda| = 1$. Since the determinant is the product of the eigenvalues, $|\det T| = 1$ as well.

Example: Does the classical "coin-flip" transformation describe a valid quantum gate? No, because its transition matrix is not unitary. It does not even have full rank.

Footnote 1: The notation $T^*$ for the conjugate transpose is more common in linear algebra, while $T^\dagger$ is more common in quantum mechanics.
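The unitarity condition $T^* T = I = T T^*$ is easy to test numerically. The following pure-Python sketch checks it for the fair coin flip matrix, for the Hadamard matrix introduced next, and for a diagonal phase matrix; the helper names and the angle 0.3 are arbitrary illustrative choices.

```python
import cmath
from math import sqrt

def dagger(T):
    """Conjugate transpose T^*."""
    return [[complex(T[j][i]).conjugate() for j in range(len(T))] for i in range(len(T[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def is_unitary(T, tol=1e-9):
    """Check T^* T = I = T T^* entry by entry."""
    n = len(T)
    for product in (matmul(dagger(T), T), matmul(T, dagger(T))):
        for i in range(n):
            for j in range(n):
                if abs(product[i][j] - (1 if i == j else 0)) > tol:
                    return False
    return True

coin = [[0.5, 0.5], [0.5, 0.5]]                       # classical fair coin flip
hadamard = [[1 / sqrt(2), 1 / sqrt(2)], [1 / sqrt(2), -1 / sqrt(2)]]
phase = [[1, 0], [0, cmath.exp(1j * 0.3)]]            # diagonal phase matrix
```

Here `is_unitary(coin)` is false, in line with the example above, while the other two matrices pass the check.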
The quantum analog of a fair coin flip is the Hadamard gate. It is described by the following matrix, which is unitary:
$$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.$$
If we apply the Hadamard gate to base states, we get the intuitive "fair coin" result. That is, regardless of which base state we are in, we end up with 50% probability of being in base state $|0\rangle$ and 50% probability of being in base state $|1\rangle$:
$$H(|0\rangle) = \frac{1}{\sqrt{2}} |0\rangle + \frac{1}{\sqrt{2}} |1\rangle,$$
$$H(|1\rangle) = \frac{1}{\sqrt{2}} |0\rangle - \frac{1}{\sqrt{2}} |1\rangle.$$
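What if we apply the Hadamard gate to a superposition of base states?
$$H\left(\frac{1}{\sqrt{2}} |0\rangle + \frac{1}{\sqrt{2}} |1\rangle\right) = \frac{1}{2}(|0\rangle + |1\rangle) + \frac{1}{2}(|0\rangle - |1\rangle) = |0\rangle,$$
$$H\left(\frac{1}{\sqrt{2}} |0\rangle - \frac{1}{\sqrt{2}} |1\rangle\right) = \frac{1}{2}(|0\rangle + |1\rangle) - \frac{1}{2}(|0\rangle - |1\rangle) = |1\rangle.$$
Unlike in the probabilistic setting, we do not necessarily get a "fair coin" result. The above is an example of destructive interference, the key ingredient of quantum algorithm design. Quantum algorithms that run faster than their classical counterparts make constructive use of destructive interference, effectively canceling out wrong computation paths.

The transformation matrix for the quantum analog of a biased coin flip is
$$\begin{pmatrix} \sqrt{p} & \sqrt{1-p} \\ \sqrt{1-p} & -\sqrt{p} \end{pmatrix},$$
which is unitary and, applied to $|0\rangle$, yields $|0\rangle$ with probability p and $|1\rangle$ with probability 1 − p.

Another prevalent quantum gate is the rotation
$$R_\theta = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\theta} \end{pmatrix},$$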
which effectively adds θ to the phase of the $|1\rangle$ component.

Example: Can we use deterministic gates in the quantum setting? Consider the NAND gate. The matrix associated with the NAND gate's transformation is not unitary, as both $|00\rangle$ and $|10\rangle$ map to the same output state, $|10\rangle$. In general, deterministic gates are unitary if and only if they are permutations of base states, that is, if they are reversible. An important gate is the CNOT gate, which is shown schematically in Figure 2.

Figure 2: CNOT gate (inputs $b_1$, $b_2$, with $b_2$ as the control; outputs $b_1 \oplus b_2$, $b_2$)

The matrix associated with this transformation is given below:
$$T = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}.$$
This gate flips its first input bit if the second bit, also known as the control bit, is a 1; otherwise it leaves the first input bit unchanged. Note that if $b_1 = 0$, then the CNOT gate effectively copies $b_2$.
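The gates of this section can be tried out on small state vectors. The sketch below, in plain Python, applies the Hadamard gate to the two superpositions discussed above, applies a rotation $R_\theta$, and builds the CNOT matrix from its action on base states with the bit ordering $b_1 b_2$. The helper names are illustrative.

```python
import cmath
from math import sqrt

S = 1 / sqrt(2)
H = [[S, S], [S, -S]]

def apply(T, v):
    """Matrix-vector product T v."""
    return [sum(T[i][j] * v[j] for j in range(len(v))) for i in range(len(T))]

def probs(v):
    """Observation probabilities |alpha_s|^2."""
    return [abs(a) ** 2 for a in v]

def r_theta(theta):
    """Rotation gate: adds theta to the phase of the |1> component."""
    return [[1, 0], [0, cmath.exp(1j * theta)]]

def cnot(b1, b2):
    """CNOT of Figure 2: b2 is the control, b1 is flipped when b2 = 1."""
    return b1 ^ b2, b2

def permutation_matrix(gate):
    """4 x 4 matrix of a reversible two-bit gate, states ordered 00, 01, 10, 11."""
    T = [[0] * 4 for _ in range(4)]
    for j in range(4):
        o1, o2 = gate(j >> 1, j & 1)
        T[2 * o1 + o2][j] = 1
    return T

CNOT = permutation_matrix(cnot)
plus_out = apply(H, [S, S])     # H applied to (|0> + |1>)/sqrt(2)
minus_out = apply(H, [S, -S])   # H applied to (|0> - |1>)/sqrt(2)
```

Up to floating-point rounding, `plus_out` is the vector of $|0\rangle$ and `minus_out` is the vector of $|1\rangle$: the two computation paths leading to the other base state cancel. Applying `r_theta` alone changes no observation probabilities, since it only alters a phase.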
2.3 Simulating classical gates
Even though classical gates, such as the NAND gate, do not translate directly into the quantum setting, they can be simulated. Given a transformation $f : \{0,1\}^* \to \{0,1\}$, we can define a new transformation
$$\tilde{f} : \{0,1\}^* \times \{0,1\} \to \{0,1\}^* \times \{0,1\} : (x, b) \mapsto (x, b \oplus f(x)).$$
Essentially, $\tilde{f}$ maintains a copy of its input in order to make the transformation reversible. One can perform this transformation on all classical gates.

Example: A reversible NAND gate is shown schematically in Figure 3. The additional third bit, which we need to simulate the classical gate, is called an ancilla bit.
Figure 3: Reversible NAND gate (RNAND maps inputs $b_1$, $b_2$, $b_3$ to outputs $b_1$, $b_2$, $b_3 \oplus \overline{b_1 \wedge b_2}$)

We can apply the above idea to an entire classical circuit. Sometimes, the "garbage" output due to the ancilla bits is problematic, as it is not defined by the original classical transformation. Specifically, this garbage output will prevent the destructive interference from happening as desired. We can circumvent this difficulty by copying the output of the circuit and then running the circuit in reverse, as illustrated in Figure 4.

Theorem 1. If f can be computed by a deterministic circuit of size t and depth d, then $\tilde{f}$ can be computed by a reversible circuit of size O(t) and depth O(d) using O(t) ancilla bits.

There are transformations that use space more efficiently than the one in the above theorem, but this efficiency comes at the expense of time efficiency. It is an open question whether one can simulate a classical circuit reversibly with only constant-factor overhead in both time and space relative to the original circuit.
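Figure 4: Computation of $\tilde{f}$ for arbitrary classical circuit C (the reversible circuit C′ is applied to the input x and ancilla bits initialized to 0, its output y is copied using CNOT gates, and C′⁻¹ is then applied to restore the ancilla bits to 0)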
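The construction $(x, b) \mapsto (x, b \oplus f(x))$ can be sketched directly. The Python below wraps an arbitrary Boolean function, here NAND on two bits, and the checks confirm that the wrapped map is its own inverse and hence a permutation of base states. The names `make_reversible`, `nand`, and `rnand` are illustrative.

```python
from itertools import product

def make_reversible(f):
    """Return f~ : (x, b) -> (x, b XOR f(x)), where x is a tuple of bits."""
    def f_tilde(x, b):
        return x, b ^ f(x)
    return f_tilde

def nand(x):
    return 1 - (x[0] & x[1])

rnand = make_reversible(nand)

# All base states (x, b) of the three-bit gate and their images.
inputs = [(x, b) for x in product((0, 1), repeat=2) for b in (0, 1)]
images = [rnand(x, b) for x, b in inputs]
```

Setting the ancilla bit b to 0 makes the last output bit equal to f(x), which is how the reversible gate is used to simulate the classical one.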
9/9/2010
CS 880: Quantum Information Processing
Lecture 3: Constructing a Quantum Model
Instructor: Dieter van Melkebeek
Scribe: Brian Nixon
This lecture focuses on quantum computation by contrasting it with the deterministic and probabilistic models. We define the model and discuss how it benefits from quantum interference and how observations affect the quantum behavior. Then we talk about the smallest sets of gates that can be used to generate the quantum model and approximations of that model (after defining an approximation). This should set us up to discuss quantum algorithms in the near future.
1 Defining the Quantum Model
We’re building a model of quantum computing as a set of linear operations on a register of qubits. We have previously done this for deterministic and probabilistic models of computation; here we compare them to get our desired model.
1.1 Probabilistic Model
As shown last time, the probabilistic model consists of the following.
1. State: We consider the data register to exist in a "pure" state, a superposition $|\psi\rangle = \sum_s p_s |s\rangle$ where $\sum_s p_s = 1$ and $\forall s$, $0 \le p_s \le 1$. Each $|s\rangle$ is the representative for a given bit vector.
2. Operations:
   (a) Local probabilistic gates T. These can take the form of stochastic matrices.
   (b) Observation. Look at the register and see what bit vector it contains. We can think of this as a map $|\psi\rangle \to |s\rangle$ with likelihood $p_s$.
3. Uniformity issues. The gates have to be described concisely. Refer to the last lecture for details; uniformity can be imposed by using a deterministic Turing machine to control the sequence of gates.

We can ask what effect observation has on the computation process. While the result after an observation is always a single bit vector, it is useful to treat it abstractly by considering the likelihood of all possibilities. This leads to the following definition.

Definition 1. A "mixed" state is a probability distribution over a set of pure states. Given a pure state, $\mathrm{obs}(\sum_s p_s |s\rangle) = \{(|s\rangle, p_s)\}_s$, where our notation describes the set of possibilities as tuples of the state and its likelihood.

We can run an algorithm on a mixed state by using each $|s\rangle$ as the input and weighting the output by the likelihood $p_s$.

Exercise 1. Prove that intermediate observations in the probabilistic model can be postponed in favor of one observation at the end without altering the likelihood of each output.
1.2 Quantum Model
Using this as a template we can describe the quantum model of computing similarly.
1. State: The "pure" state is a single superposition $|\psi\rangle = \sum_{s \in \{0,1\}^m} \alpha_s |s\rangle$ where $\forall s$, $\alpha_s \in \mathbb{C}$ and $\sum_s |\alpha_s|^2 = 1$. We can also consider the register to be in a mixed state.
2. Operations:
   (a) Local unitary operations. These take the form of unitary matrices (i.e., $T^* T = I = T T^*$).
   (b) Observation. Collapses $\sum_s \alpha_s |s\rangle$ to a distribution where $|s\rangle$ occurs with probability $|\alpha_s|^2$.
3. Uniformity conditions that we'll talk about later.

Unlike the probabilistic model, in the quantum model intermediate observations will affect the final distribution. Consider the following.

Example: Let H be the Hadamard matrix $\frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$. We saw last lecture that $H^2 |x\rangle = |x\rangle$. Now $\mathrm{obs}(H|x\rangle) = \{(|0\rangle, \frac{1}{2}), (|1\rangle, \frac{1}{2})\}$, where our notation for the mixed state matches that of the definition above. Then $H(\mathrm{obs}(H|x\rangle)) = \{(\frac{|0\rangle + |1\rangle}{\sqrt{2}}, \frac{1}{2}), (\frac{|0\rangle - |1\rangle}{\sqrt{2}}, \frac{1}{2})\}$, not $|x\rangle$. The intermediate observation has a distinct effect on our output. ⊠

Just as interference is lost by making observations in the quantum model, interference can also be gained by failing to make an observation step. In particular, this affects composition of quantum machines. We can remove the necessity of these intermediate observation steps if we use CNOT to copy the output after each subfunction terminates. Consider the following diagram. Here C1 and C2 are the machines we want to compose. By saving the state of the register after C1, we remove the potential interference. Subsequent observation of that register state is not necessary so long as it exists in a state where it can be observed.
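[Figure: two circuits side by side. "Original computation w/ int. observation": the x qubits enter C1 (q. circuit), the output y is measured to give the basis state y = x', and C2 maps it to y'. "New computation with deferred observation": the x qubits enter C1 (q. circuit), a "copy" of y is made onto ancilla bits initialized to 0...0 via |y, b> = |y, y XOR b>, and C2 then acts on y to give y'.]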
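The Hadamard example and the CNOT remedy can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names are illustrative and not part of the lecture. It compares the output distribution of $H$ followed by $H$ with no observation in between, with an intermediate observation, and with a CNOT copy onto a fresh ancilla in place of the observation.

```python
import math

S = 1 / math.sqrt(2)
H = [[S, S], [S, -S]]  # Hadamard matrix


def basis(x):
    """Amplitude vector of the single-qubit basis state |x>."""
    return [1.0 if s == x else 0.0 for s in (0, 1)]


def apply(gate, state):
    """Apply a 2x2 gate to a single-qubit amplitude vector."""
    return [sum(gate[r][c] * state[c] for c in (0, 1)) for r in (0, 1)]


def obs(state):
    """Probability of each outcome s, namely |alpha_s|^2."""
    return [abs(a) ** 2 for a in state]


def no_intermediate_observation(x):
    """obs(H H |x>): interference restores |x>."""
    return obs(apply(H, apply(H, basis(x))))


def intermediate_observation(x):
    """obs(H obs(H |x>)): run H on each observed branch, weighted by p_s."""
    final = [0.0, 0.0]
    for s, p_s in enumerate(obs(apply(H, basis(x)))):
        for out, p_out in enumerate(obs(apply(H, basis(s)))):
            final[out] += p_s * p_out
    return final


def cnot_copy_instead_of_observation(x):
    """H, then CNOT onto a fresh ancilla, then H; report the first qubit."""
    amp = {(q, a): 0.0 for q in (0, 1) for a in (0, 1)}
    amp[(x, 0)] = 1.0
    for step in ("H", "CNOT", "H"):
        new = {key: 0.0 for key in amp}
        for (q, a), value in amp.items():
            if step == "CNOT":
                new[(q, a ^ q)] += value            # |q, a> -> |q, a XOR q>
            else:
                for r in (0, 1):
                    new[(r, a)] += H[r][q] * value  # H on the first qubit
        amp = new
    return [sum(abs(amp[(q, a)]) ** 2 for a in (0, 1)) for q in (0, 1)]
```

Under these assumptions, the first function returns the input $x$ with certainty, while the other two both return a fair coin, which is the sense in which the CNOT copy stands in for the intermediate observation.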
As discussed in the last lecture, simulating the deterministic model using the quantum model by making the algorithm reversible requires only a little more resources. If we make an observation after every step in the quantum model we can simulate any probabilistic machine. Of course, this sacrifices the destructive interference we obtained by using the phase information in the quantum model.
1.3
Uniformity
Certain conditions have to be placed on our circuits to make them physically realizable; we call these the uniformity conditions. We defined our uniformity conditions in the deterministic model by using Turing machines to describe behavior (or the equivalent circuit model). For the probabilistic model we imposed the restriction that any “coin flip” choices had to be done with a fair coin. For quantum machines we can impose two conditions. First, we specify that the gates used need to be generated in polynomial time by a deterministic Turing machine. We could use a quantum Turing machine instead, but that would require some orthogonality conditions to restrict the transition function, which we are not going to go into. Second, the amplitudes in our circuits can’t be too complicated. While theoretically we have access to all unitary matrices, practically there should be a limit to the degree of precision we can use in generating the matrix entries. A randomly selected complex number is unlikely to be generated exactly from a base set using simple operations; however, it is possible to get close while maintaining restrictions on the number and type of operations used. If we allow arbitrary precision we can encode objects such as the characteristic sequence of the halting function in the amplitudes, which would make the model unrealistically powerful. We should note that the power of quantum computing doesn’t come from there somehow being more computational paths that it can take but from the interference between these paths. If you look at the possible state diagrams of a quantum machine vs. a probabilistic machine, they are formed on the same decision tree over the register. However, the types of the coefficients defining how probable a given state is are different.
2
Universal Sets
Uniformity can be more formally expressed using universal sets of gates.
Definition 2. S is an exact universal set for a model if any gate from that model can be realized exactly using only combinations of gates from S.
For the deterministic model, {NAND} is an exact universal set (i.e. any deterministic gate can be composed of NAND gates operating on the correct bits). For the probabilistic model, {NAND, COINFLIP$_p$} generates an exact universal set where COINFLIP$_p$ is a coin flip operation on a single bit operating with bias $p$. Exact universal sets require a degree of precision that is not always helpful when trying to physically realize circuits for a model. We can relax the constraints to get sets that are “good enough” for general purposes.
Definition 3. S is a universal set for a model if any gate from that model can be realized arbitrarily precisely using only combinations of gates from S.
For practical reasons, we don’t want our generating set to be infinite and so try to get finite universal sets to approximate our model. For example, for the probabilistic model, {NAND, COINFLIP$_{1/2}$} is a universal set and, being finite, is better suited for circuit realization than the exact universal set. For the quantum model, {CNOT, SINGLEQUBITGATE} generates the exact universal set where SINGLEQUBITGATE refers to any unitary operation on a single qubit (i.e. any $2 \times 2$ unitary matrix). Every local unitary operation on a constant number of qubits can be formed by a composition of these operators. However, this too is an infinite set and we would like to be able to approximate its behavior using a finite generating set.
It is necessary to be formal when we talk of approximating. Given a fixed error term $\epsilon > 0$ we want a generating set that lets us get close to any operation of a fixed complexity with error less than $\epsilon$. If $U$ denotes the exact unitary transformation and $\tilde{U}$ an approximation, we want it to be the case that for each initial state $|\psi\rangle$, the probability distributions obtained by observing $U|\psi\rangle$ and $\tilde{U}|\psi\rangle$ are at most $\epsilon$ apart, i.e., for $p = \mathrm{obs}(U|\psi\rangle)$ and $\tilde{p} = \mathrm{obs}(\tilde{U}|\psi\rangle)$, $D(p, \tilde{p}) = \|p - \tilde{p}\|_1 = \sum_s |p_s - \tilde{p}_s| \le \epsilon$. The sequence of unitary operations before the final observation gives rise to an overall unitary transformation that is the product of unitary transformations $U_i$ of the form $U_i = T_i \otimes I_m$, where $T_i$ denotes an elementary quantum gate. Letting $D(U, \tilde{U}) = \|U - \tilde{U}\|_2$, it is left as an exercise to show the following:
Exercise 2. Prove the relations.
• $D(p, \tilde{p}) \le 2 D(U, \tilde{U})$.
• $D(U_t U_{t-1} \cdots U_1, \tilde{U}_t \tilde{U}_{t-1} \cdots \tilde{U}_1) \le \sum_{i=1}^{t} D(U_i, \tilde{U}_i)$.
• $D(T \otimes I_m, \tilde{T} \otimes I_m) = D(T, \tilde{T})$.
Note that $\|A\|_2 = \max_{x \ne 0} \|Ax\|_2 / \|x\|_2$, which is the square root of the maximal eigenvalue of $A^* A$. These relations tell us that if our operations are restricted to be combinations of at most $t$ elements of S, then to get error at most $\epsilon$ it suffices to show $\|T_i - \tilde{T}_i\|_2 \le \epsilon'$, where $\epsilon'$ is taken to be $\epsilon/(2t)$. The following theorems have their proofs omitted.
Theorem 1. For any universal set S and any $\epsilon > 0$, we can approximate any $T$ to within $\epsilon' = \epsilon/(2t)$ by a combination of $\mathrm{polylog}(1/\epsilon') = \mathrm{polylog}(2t/\epsilon)$ gates from S and $S^{-1}$ (any gates from S and their inverses).
Theorem 2. S = {CNOT, $H$, $R_{\pi/4}$} is universal. Here $H$ is the Hadamard matrix used above and $R_\theta = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\theta} \end{pmatrix}$.
CS 880: Quantum Information Processing
9/13/2010
Lecture 4: Elementary Quantum Algorithms Instructor: Dieter van Melkebeek
Scribe: Kenneth Rudinger
This lecture introduces several simple quantum algorithms. The particular algorithms introduced here solve their respective problems faster than their corresponding classical algorithms (deterministic or probabilistic). Specifically, we examine the Deutsch, Deutsch-Jozsa, Bernstein-Vazirani, and Simon problems.
1
Solving Black Box Problems
For the time being, we concern ourselves only with black box problems. A black box problem is one in which we are given a function $f$. All we know for certain about this function is that $f : \{0,1\}^n \to \{0,1\}^m$. It is our goal to determine some property of $f$. (If we are given further information about this function, we call it a promise function, as some property of the function has been “promised” to us.) To determine the desired property, we are given access to a transformation (the black box) which acts on an input of our choosing (residing in $\{0,1\}^n$); the transformation’s output is the function in question applied to our input. However, because valid quantum computations must correspond to invertible (more specifically, unitary) transformations, the black box actually performs the function $\tilde{f} : \{0,1\}^n \times \{0,1\}^m \to \{0,1\}^n \times \{0,1\}^m : (x, y) \mapsto (x, y \oplus f(x))$. We denote the unitary transformation that realizes $\tilde{f}$ by $U_f$.
One example is the case $m = 1$. If we start in the state $|\psi_0\rangle = |x\rangle|0\rangle$, then we see that $U_f|\psi_0\rangle = |x\rangle|f(x) \oplus 0\rangle = |x\rangle|f(x)\rangle$. We then apply $R_\pi$ to the ancilla qubit, mapping $|x, f(x)\rangle$ to $(-1)^{f(x)}|x, f(x)\rangle$. Lastly, we apply $U_f$ to the state again, sending the ancilla back to its original state of $|0\rangle$, and keeping the non-ancilla qubits in the state we desired to create, $(-1)^{f(x)}|x\rangle$. Pictured below is the corresponding circuit.
[Circuit: the input wires $x_1, x_2, \ldots, x_n$ and an ancilla initialized to $|0\rangle$ enter $U_f$; $R_\pi$ acts on the ancilla; a second $U_f$ follows. The state evolves as $|x, 0\rangle \to |x, f(x)\rangle \to (-1)^{f(x)}|x, f(x)\rangle \to (-1)^{f(x)}|x, 0\rangle$.]
This particular method is referred to as phase kickback, as $f(x)$ is encoded in the phase of our final state.
This transformation required two queries of our oracle $U_f$. Can we achieve this phase kickback with only one query? It turns out that we can. Let’s instead initialize our ancilla qubit to $|-\rangle$. (Recall: $|-\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle) = H|1\rangle$.) Now when we apply $U_f$, the ancilla remains in the state $|-\rangle$, and we pick up the desired phase, that is: $U_f|x\rangle|-\rangle = (-1)^{f(x)}|x\rangle|-\rangle$. Thus we can achieve phase kickback with only one query of our oracle, as shown below.
[Circuit: the input wires $x_1, x_2, \ldots, x_n$ and an ancilla initialized to $|-\rangle$ enter a single $U_f$; the state $|x\rangle|-\rangle$ becomes $(-1)^{f(x)}|x\rangle|-\rangle$.]
Keeping in mind this phase kickback trick, we now turn our attention to the four problems mentioned above.
2
Deutsch’s Problem
Consider a function $f : \{0,1\}^n \to \{0,1\}^m$ with $n = m = 1$. The question we wish to answer is: Does $f(0) = f(1)$? Classically, we must make two queries of our oracle, one to determine $f(0)$, the other to determine $f(1)$. (Even if we use probabilistic computation instead of deterministic, we are still required to make two queries.) However, with a quantum circuit, we need only one query. We will use the phase kickback technique, recalling that:
\[ |0\rangle|-\rangle \to (-1)^{f(0)}|0\rangle|-\rangle, \qquad |1\rangle|-\rangle \to (-1)^{f(1)}|1\rangle|-\rangle. \]
If we apply the phase kickback to a superposition of these states, we find
\[ \frac{1}{\sqrt{2}}\bigl(|0\rangle + |1\rangle\bigr)|-\rangle \to \frac{1}{\sqrt{2}}\bigl((-1)^{f(0)}|0\rangle + (-1)^{f(1)}|1\rangle\bigr)|-\rangle. \]
If $f(0) = f(1)$, then this state reduces to $\pm\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)|-\rangle = \pm|+\rangle|-\rangle$. If $f(0) \ne f(1)$, the resulting state is $\pm\frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)|-\rangle = \pm|-\rangle|-\rangle$. We note then that these two states are orthogonal. Therefore, there exists a unitary transformation which will map one of these states to $|0\rangle$ and the other to $|1\rangle$. This transformation is the Hadamard gate. Therefore, the resulting quantum circuit is:
[Circuit: the register starts in $|+, -\rangle$, with the query qubit in $|+\rangle$ and the ancilla in $|-\rangle$; both enter $U_f$; a Hadamard gate is then applied to the query qubit, which is measured.]
Thus if we observe the $|0\rangle$ state, we know that $f(0) = f(1)$, and if we observe the $|1\rangle$ state, we know that $f(0) \ne f(1)$. Therefore, we see that with only one query, we are able to determine the nature of $f$ with 100% accuracy.
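As a sanity check, Deutsch’s algorithm can be simulated by tracking the two amplitudes of the query qubit. This is a minimal sketch, assuming the ancilla stays in $|-\rangle$ so that $U_f$ acts as a pure phase; the name `deutsch` and the dictionary of test functions are illustrative. The simulation evaluates `f` on both basis states, but this models a single application of $U_f$ to a superposition.

```python
import math


def deutsch(f):
    """Return outcome probabilities [Pr(0), Pr(1)] for the query qubit."""
    s = 1 / math.sqrt(2)
    amp = [s, s]                                   # query qubit in |+>
    # One application of U_f with the ancilla in |->: phase kickback.
    amp = [(-1) ** f(x) * amp[x] for x in (0, 1)]
    # Hadamard gate on the query qubit.
    amp = [s * (amp[0] + amp[1]), s * (amp[0] - amp[1])]
    return [abs(a) ** 2 for a in amp]


FUNCTIONS = {
    "zero": lambda x: 0,
    "one": lambda x: 1,
    "identity": lambda x: x,
    "negation": lambda x: 1 - x,
}
results = {name: deutsch(f) for name, f in FUNCTIONS.items()}
```

For the two constant functions all of the probability lands on outcome 0, and for the two non-constant functions it lands on outcome 1.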
3
Deutsch-Jozsa Problem
Now we consider a more general problem, where our function takes a string of $n$ bits: $f : \{0,1\}^n \to \{0,1\}$. Additionally, this is a promise problem; we are told some additional information about $f$. In this particular case, we know that $f$ is either balanced (it takes the value 1 on exactly half of its inputs) or constant. (In the $n = 1$ case, we see that this reduces to just the Deutsch problem.) It is our task to determine whether $f$ is balanced or constant. How many queries are required to answer this question correctly? Deterministically, $2^{n-1} + 1$ queries are required. The number of queries required using a classical randomized algorithm will depend on the allowed error. If $k$ inputs are chosen at random, $k$ queries will yield a correct answer with error less than or equal to $2^{1-k}$. (If the function is balanced, we could discover this in a minimum of two queries, if we are lucky. The most unlucky we can be is to choose $k$ inputs that have the same output, leading us to guess incorrectly that the function is constant. The probability of choosing $k$ inputs with the same output, and hence guessing wrong, is $2^{1-k}$.) However, there exists a quantum algorithm which yields 100% accuracy (modulo experimental error) with only 1 query, providing a tremendous speedup, from exponential to constant!
How does this algorithm work? Start with an initial state which is a uniform superposition of $n$ qubits, tensored together with one ancilla in the $|-\rangle$ state, or:
\[ |\psi_0\rangle = \frac{1}{\sqrt{2^n}} \Bigl( \sum_{x \in \{0,1\}^n} |x\rangle \Bigr) |-\rangle \]
Apply $U_f$ to this state. If $f$ is constant, we find that:
\[ U_f|\psi_0\rangle = \pm \frac{1}{\sqrt{2^n}} \Bigl( \sum_{x \in \{0,1\}^n} |x\rangle \Bigr) |-\rangle \]
If $f$ is balanced, $U_f|\psi_0\rangle$ takes on a more complicated form, but as in the Deutsch algorithm, $U_f^{\text{balanced}}|\psi_0\rangle$ is orthogonal to $U_f^{\text{constant}}|\psi_0\rangle$. Therefore, all we need is, as before, a unitary transformation that will send $U_f^{\text{constant}}|\psi_0\rangle$ to a string of all 0s. Again, the Hadamard gate will serve this purpose; applying it to each (non-ancilla) wire yields the desired result. If $f$ is constant, the final readout will be the all-0 string. As $U_f^{\text{balanced}}|\psi_0\rangle$ is orthogonal to $U_f^{\text{constant}}|\psi_0\rangle$, the final readout for a balanced $f$ cannot be the all-0 string. Thus in one query we have determined the nature of $f$ with 100% accuracy. Shown below is the quantum circuit diagram for this algorithm.
[Circuit: $n$ query qubits, each starting in $|+\rangle$, and an ancilla in $|-\rangle$ enter $U_f$; a Hadamard gate is then applied to each query qubit, which is measured. The state $|+\rangle^{\otimes n}|-\rangle$ becomes $\pm|0\rangle^{\otimes n}|-\rangle$ if $f$ is constant.]
3.1
Note on the Hadamard Gate
Before proceeding, let us consider the operation of $n$ Hadamard gates on a base state of length $n$: $H^{\otimes n}|x\rangle$. We see that such an operation has the following action:
\[ H^{\otimes n}|x\rangle = \bigotimes_{i=1}^{n} \frac{|0\rangle + (-1)^{x_i}|1\rangle}{\sqrt{2}} = \frac{1}{\sqrt{2^n}} \sum_{y} (-1)^{x \cdot y} |y\rangle \tag{1} \]
($x \cdot y$ denotes an inner product, that is, $x \cdot y = \sum_{i=1}^{n} x_i y_i$.) Thus by appropriate choice of $x$, we may generate all possible base states in the $\{|+\rangle, |-\rangle\}$ basis. Additionally, we can generate a uniform superposition of states if we choose $x = 0$.
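Equation (1) can be verified numerically for small $n$. The sketch below is illustrative only: it stores a state as a list of $2^n$ amplitudes indexed by the integer whose binary digits are the bit string, applies a Hadamard gate to each qubit in turn, and compares the result with the right-hand side of (1). The helper names are assumptions, not notation from the lecture.

```python
import math


def hadamard_all(state, n):
    """Apply a Hadamard gate to each of the n qubits of a state vector."""
    state = list(state)
    s = 1 / math.sqrt(2)
    for q in range(n):
        bit = 1 << q
        for i in range(len(state)):
            if not i & bit:
                a, b = state[i], state[i | bit]
                state[i], state[i | bit] = s * (a + b), s * (a - b)
    return state


def formula(x, n):
    """Right-hand side of equation (1) for the basis state |x>."""
    norm = 1 / math.sqrt(2 ** n)
    # x . y is the number of positions where both bit strings have a 1.
    return [(-1) ** bin(x & y).count("1") * norm for y in range(2 ** n)]


def basis_state(x, n):
    """Amplitude vector of the n-qubit basis state |x>."""
    return [1.0 if i == x else 0.0 for i in range(2 ** n)]


n = 3
max_gap = max(
    abs(u - v)
    for x in range(2 ** n)
    for u, v in zip(hadamard_all(basis_state(x, n), n), formula(x, n))
)
```

Here `max_gap` is the largest discrepancy between the two sides over all basis states; it should vanish up to floating-point rounding.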
4
Bernstein-Vazirani Problem
Now consider the following function.
\[ f : \{0,1\}^n \to \{0,1\}, \qquad x \mapsto (a \cdot x + b) \bmod 2 \qquad (a \in \{0,1\}^n,\ b \in \{0,1\}) \]
The promise is that $f$ conforms to the above constraints. Our task is to determine the values of $a$ and $b$. (We note that the Bernstein-Vazirani promise is a restriction on Deutsch-Jozsa, but the question is harder to solve.) With a classical deterministic algorithm, $n + 1$ queries are required: 1 to determine $b$ (query with $x = 0$), and $n$ to determine $a$ (one query per bit of $a$). (We also note that this is, classically, the optimal algorithm, as $n + 1$ bits of information are required.) A classical randomized algorithm will do no better, also requiring $n + 1$ queries.
Exercise 1. Show that any classical randomized algorithm that solves the Bernstein-Vazirani problem with reasonable error makes at least $n + 1$ queries.
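Turning to the quantum algorithm, we will discover that there exists a solution that requires only two queries and solves the problem with 100% accuracy. As before, one query is required for $b$, but now also only one query is required for $a$. $b$ is determined in the same manner as in the classical algorithm. To determine $a$, we initialize to $|\psi_0\rangle = |+\rangle^{\otimes n}|-\rangle$. We note then that:
\[ U_f|\psi_0\rangle = \frac{1}{\sqrt{2^n}} \Bigl( \sum_{x \in \{0,1\}^n} (-1)^{f(x)} |x\rangle \Bigr) |-\rangle = \frac{(-1)^b}{\sqrt{2^n}} \Bigl( \sum_{x \in \{0,1\}^n} (-1)^{a \cdot x} |x\rangle \Bigr) |-\rangle \]
If we simply apply a Hadamard gate to each non-ancilla qubit, then the final readout will be $|a\rangle$ (along with the ancilla), as illustrated by equation (1).
[Circuit: $n$ query qubits, each starting in $|+\rangle$, and an ancilla in $|-\rangle$ enter $U_f$; a Hadamard gate is then applied to each query qubit, which is measured. The state evolves as $|+\rangle^{\otimes n}|-\rangle \to \frac{(-1)^b}{\sqrt{2^n}} \sum_x (-1)^{a \cdot x}|x\rangle|-\rangle \to |a\rangle|-\rangle$, up to the global phase $(-1)^b$.]
It turns out that this is an optimal quantum algorithm for this problem; one cannot do better.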
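The two-query procedure can be simulated classically for small $n$. The following is a minimal sketch under the same conventions as equation (1): bit strings are stored as integers, the ancilla is assumed to stay in $|-\rangle$ so that $U_f$ only contributes the phase $(-1)^{f(x)}$, and all names are illustrative.

```python
import math


def popcount(v):
    """Number of 1 bits in the integer v."""
    return bin(v).count("1")


def hadamard_all(state, n):
    """Apply a Hadamard gate to each of the n qubits of a state vector."""
    state = list(state)
    s = 1 / math.sqrt(2)
    for q in range(n):
        bit = 1 << q
        for i in range(len(state)):
            if not i & bit:
                u, v = state[i], state[i | bit]
                state[i], state[i | bit] = s * (u + v), s * (u - v)
    return state


def bernstein_vazirani(a, b, n):
    """Recover (a, b) for f(x) = (a . x + b) mod 2 with two oracle uses."""
    def f(x):
        return (popcount(a & x) + b) % 2

    size = 2 ** n
    recovered_b = f(0)                               # first query: x = 0
    # Second query: U_f on |+>^n |->, which kicks back the phase (-1)^f(x).
    state = [(-1) ** f(x) / math.sqrt(size) for x in range(size)]
    state = hadamard_all(state, n)
    probs = [abs(v) ** 2 for v in state]
    recovered_a = max(range(size), key=lambda y: probs[y])
    return recovered_a, recovered_b, probs[recovered_a]
```

The third returned value is the probability of reading out the recovered string; under these assumptions it equals 1 up to rounding, reflecting that the readout is deterministic.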
5
Simon’s Algorithm
Now we examine a function which maps not to $\{0,1\}$, but to $\{0,1\}^n$ instead. More specifically, $f : \{0,1\}^n \to \{0,1\}^n$ such that either:
(1) $f$ is one-to-one ($f$ is a permutation), or
(2) $\exists s \ne 0\ \forall x \ne y \colon f(x) = f(y) \iff x \oplus y = s$ ($f$ is two-to-one).
(It should be noted that (1) is a special case of (2), with $s = 0$.) Our goal here is to determine if $f$ falls into category (1) or (2), and also determine the value of $s$. If one were lucky, and $f$ fell into the second category, one could solve this problem classically with only two queries (by appropriately choosing $x$ and $y$ such that $f(x) = f(y)$). However, this is not a reliable method, nor could it tell us if $f$ is one-to-one. The general lower bound is obtained by noting that $k$ queries yield $\binom{k}{2}$ different spacings $x \oplus y$, eliminating at most $\binom{k}{2}$ values for $s$. Thus the lower bound is approximately $2^{n/2}$ queries. A randomized algorithm requires $\Omega(2^{n/2})$ queries. However, there exists a quantum algorithm which can solve this problem with $O(n)$ queries. The demonstration of this technique is forthcoming in the next lecture (Lecture 5).
9/14/10
CS 880: Quantum Information Processing
Lecture 5: More Elementary Quantum Algorithms Instructor: Dieter van Melkebeek
Scribe: John Gamble
Last class, we turned our attention to elementary quantum algorithms, examining the Deutsch, Deutsch-Jozsa, and Bernstein-Vazirani problems. In all three of these cases, we showed that a quantum algorithm could do better than the best classical scheme. We then briefly introduced Simon’s problem. In this lecture, we will develop the quantum algorithm that solves Simon’s problem in $O(n)$ queries. We then begin analyzing Grover’s quantum search algorithm.
1
Simon’s Problem
In the following section, we analyze Simon’s problem, a particular promise problem that is solvable exponentially faster on a quantum computer than on a classical computer. The problem statement is that, given some function $f : \{0,1\}^n \to \{0,1\}^n$ as a black box and the promise that either:
• $f$ is one-to-one, or else
• there exists $s \ne 0^n$ such that for all distinct $x, y \in \{0,1\}^n$, $f(x) = f(y)$ if and only if $x \oplus y = s$,
determine into which case $f$ falls, and if the second, determine $s$. Note that the second case corresponds to $f$ being two-to-one, with each pair of inverse image points separated by exactly $s$.
Our analysis proceeds as follows. First, we discuss the classical complexity of this problem. Next, we will propose a quantum circuit to solve the problem. Finally, we examine the circuit to show that it has query complexity $O(n)$, a result first obtained by D. R. Simon in 1994 [2].
1.1
Classical complexity
We first turn our attention to deterministic, classical algorithms. In order to solve the problem, we need to find the correct value of $s$, or show that no such $s \ne 0^n$ exists. Hence, we need to “scan” through all possible values of $s$ as quickly as possible. Since there are $2^n$ choices for $s$, we have a trivial upper bound of $O(2^n)$. Now, suppose that we have queried our function $q$ times. Then, assuming that we have not repeated a query, we have tried $\binom{q}{2}$ pairs of inputs. Hence, we have eliminated at most $\binom{q}{2}$ possible choices of $s$. Since we need to check all possible values of $s$ to answer the problem with certainty, we need to satisfy
\[ \binom{q}{2} \ge 2^n. \tag{1} \]
Applying the definition of the binomial coefficient, we find
\[ \frac{q(q-1)}{2} \ge 2^n. \tag{2} \]
Hence, $q \gtrsim 2^{n/2}$, so that we require $\Omega(2^{n/2})$ queries to solve the problem as a lower bound. Probabilistically, the problem is not much different. We still need to rule out all possible values of $s$, of which there are $2^n$. It can be shown that this also has a lower bound of $\Omega(2^{n/2})$.
1.2
Simon’s quantum algorithm
Now, we construct and analyze the quantum circuit that will enable us to solve Simon’s problem in $O(n)$ evaluations, exponentially faster than the classical case. Extending the general scheme that we used in the Bernstein-Vazirani algorithm, we consider
[Circuit (3): a register of $n$ wires in $|+\rangle^{\otimes n}$ and a register of $n$ wires in $|0\rangle^{\otimes n}$ enter $U_f$; point (a) is after $U_f$; $H^{\otimes n}$ is then applied to the first register; point (b) is after $H^{\otimes n}$; the first register is measured to give $y$ and the second to give $z$.]
where the $n$ label on a wire indicates the presence of $n$ such wires. First, we initialize our system in the state
\[ |+\rangle^{\otimes n}|0\rangle^{\otimes n} = \frac{1}{\sqrt{N}} \sum_{x \in \{0,1\}^n} |x\rangle|0\rangle^{\otimes n}, \tag{4} \]
where $N = 2^n$. Next, we apply the unitary operator corresponding to the application of $f$, giving us the state
\[ |(a)\rangle = \frac{1}{\sqrt{N}} \sum_{x} |x, f(x)\rangle \tag{5} \]
at point (a). Recall from last time that acting $H^{\otimes n}$ on any computational basis state $|x\rangle$ gives
\[ H^{\otimes n}|x\rangle = \frac{1}{\sqrt{N}} \sum_{y} (-1)^{x \cdot y} |y\rangle. \tag{6} \]
Hence, at point (b) we have
\[ \begin{aligned} |(b)\rangle &= \bigl(H^{\otimes n} \otimes I^{\otimes n}\bigr)|(a)\rangle \\ &= \frac{1}{\sqrt{N}} \sum_{x} H^{\otimes n}|x\rangle\,|f(x)\rangle \\ &= \frac{1}{N} \sum_{x,y} (-1)^{x \cdot y} |y, f(x)\rangle. \end{aligned} \tag{7} \]
We let $f(x) = z$ and define $\alpha_{y,z}$ by
\[ |(b)\rangle = \sum_{y,z} \alpha_{y,z} |y, z\rangle, \tag{8} \]
where each sum runs over the usual $\{0,1\}^n$. We now perform a measurement, so we must analyze the form of $\alpha_{y,z}$ to determine the result. To do this, note that our function can either be one-to-one or two-to-one. If it is the latter, half of the $z \in \{0,1\}^n$ fall outside the range of $f$, and so the corresponding coefficients $\alpha_{y,z}$ are zero. First, we consider the case where $f$ is one-to-one. Then, we have
\[ \alpha_{y,z} = \frac{(-1)^{x \cdot y}}{N}, \tag{9} \]
where $x$ is the unique element such that $f(x) = z$. Next, consider the second case, where $f$ is two-to-one. We have two subcases: either $z$ is in the range of $f$ or it is not. If it is, then
\[ \alpha_{y,z} = \frac{(-1)^{x \cdot y} + (-1)^{(x \oplus s) \cdot y}}{N}, \tag{10} \]
where $x$ and $x \oplus s$ are the two inputs with $f(x) = f(x \oplus s) = z$. Note that by the promise of the problem, we know that there will be exactly two elements in the inverse image, and that one can be obtained from the other by adding $s$. Finally, as mentioned before, if $z$ is outside the range of $f$, then $\alpha_{y,z} = 0$. Now that we have worked out the form of $\alpha_{y,z}$, we can analyze the resulting outputs probabilistically, which we do in the next section.
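The coefficients can be tabulated by brute force for a small instance. The sketch below is illustrative only: it reads $\alpha_{y,z}$ off the expansion of the state at point (b), sums $|\alpha_{y,z}|^2$ over $z$ to get the probability of measuring $y$ in the first register, and does so for one hypothetical two-to-one function (with $n = 3$ and $s = 101$, both chosen arbitrarily) and for the identity as a one-to-one function.

```python
def dot(x, y):
    """Inner product x . y mod 2 of two bit strings stored as integers."""
    return bin(x & y).count("1") % 2


def output_distribution(f, n):
    """Pr(y) = sum_z |alpha_{y,z}|^2, with alpha_{y,z} read off from (7)."""
    size = 2 ** n
    probs = []
    for y in range(size):
        total = 0.0
        for z in range(size):
            # alpha_{y,z} collects (-1)^(x.y) / N over all x with f(x) = z.
            alpha = sum((-1) ** dot(x, y) for x in range(size) if f(x) == z) / size
            total += alpha ** 2
        probs.append(total)
    return probs


n, s = 3, 0b101
# min(x, x XOR s) takes the same value on x and x XOR s, and on no other pair.
two_to_one = output_distribution(lambda x: min(x, x ^ s), n)
one_to_one = output_distribution(lambda x: x, n)
```

In this example the one-to-one case gives probability $1/N$ for every $y$, while the two-to-one case gives $2/N$ when $s \cdot y = 0$ and $0$ otherwise.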
1.3
Analysis of the measured output
Now that we know the possible values of $\alpha_{y,z}$, we first need to use them to deduce what our circuit outputs, and with what probability. Specifically, we are interested in the top half of our register, which, after measurement, will be in the computational basis state $|y\rangle$ with probability
\[ \Pr(y) = \sum_{z} |\alpha_{y,z}|^2. \tag{11} \]
Hence, from equation (9), if $f$ is one-to-one we expect that for any $y$, $\Pr(y) = 1/N$. If $f$ is two-to-one, then from equation (10) we get probability
\[ \begin{aligned} \Pr(y) &= \sum_{z} \begin{cases} \left| \dfrac{(-1)^{x \cdot y} + (-1)^{(x \oplus s) \cdot y}}{N} \right|^2 & \text{if } z \in \mathrm{range}(f) \\ 0 & \text{if } z \notin \mathrm{range}(f) \end{cases} \\ &= \frac{N}{2} \left| \frac{1 + (-1)^{s \cdot y}}{N} \right|^2 \\ &= \begin{cases} \dfrac{2}{N} & \text{if } s \cdot y = 0 \\ 0 & \text{if } s \cdot y \ne 0, \end{cases} \end{aligned} \tag{12} \]
where $x$ is such that $f(x) = f(x \oplus s) = z$. In summary, if our function falls in the first case of our problem and is one-to-one, we get a uniform probability distribution across all possible outputs. However, if our function falls in the second case and is two-to-one, we get a uniform distribution across all outputs that are orthogonal to $s$. Hence, outputs that are not orthogonal to $s$ are never observed. By running our circuit repeatedly, we can record what output values we receive, and thus eventually determine $s$ (or that we are in case one). However, we would like to know how many times we need to iterate before being certain of $s$. In order to accomplish this, we present the following lemma.
Lemma 1. For any vector space $S$ of dimension $d$ over GF(2), pick $d$ vectors $y_1, y_2, \ldots, y_d$ from $S$ uniformly at random. Then, $\Pr(\mathrm{span}(y_1, y_2, \ldots, y_d) = S) \ge \delta$, where $\delta$ is a universal constant independent of $S$.
Proof. First, note that since we require a set of $d$ vectors to span a space of dimension $d$, all must be chosen to be linearly independent. We then consider choosing the $d$ vectors, one at a time,
uniformly at random. In the first step, we must only avoid choosing the zero vector (the string of all zeros), so our success probability at the first step is
\[ P_1 = \frac{2^d - 1}{2^d}. \tag{13} \]
Now that we have fixed $y_1$, we need to make sure $y_2 \notin \mathrm{span}(0, y_1)$, which has two members. Hence, the probability of successfully choosing both the first and second vectors is
\[ P_2 = \frac{2^d - 1}{2^d} \cdot \frac{2^d - 2}{2^d}. \tag{14} \]
To pick $y_3$, we need to require $y_3 \notin \mathrm{span}(0, y_1, y_2)$, which has $2^{3-1} = 4$ members. Continuing like this, the probability for choosing all $d$ vectors successfully is
\[ \begin{aligned} P_d &= \frac{2^d - 1}{2^d} \cdot \frac{2^d - 2}{2^d} \cdots \frac{2^d - 2^{d-1}}{2^d} \\ &= \prod_{i=1}^{d} \left( 1 - \frac{1}{2^i} \right) \\ &\ge \prod_{i=1}^{\infty} \left( 1 - \frac{1}{2^i} \right). \end{aligned} \tag{15} \]
Taking the logarithm of both sides, we have
\[ \ln(P_d) \ge \sum_{i=1}^{\infty} \ln\left( 1 - \frac{1}{2^i} \right). \tag{16} \]
Asymptotically, as $i \to \infty$, we note that
\[ \ln\left( 1 - \frac{1}{2^i} \right) \sim -\frac{1}{2^i}. \tag{17} \]
Hence, since
\[ -\sum_{i=1}^{\infty} \frac{1}{2^i} = -1, \tag{18} \]
we know that the series in equation (16) converges to some finite value. So, there is a positive $\delta$ such that $P_d \ge \delta$, independent of $d$, as desired. (Numerically, $\sum_{i=1}^{\infty} \ln\left(1 - \frac{1}{2^i}\right) \approx -1.24206$, providing a bound of $\delta \approx 0.288788$.)
Now, we return to the analysis of our quantum algorithm. We run the circuit $n - 1$ times, each time receiving a random output $y$. If we are in the second case (where $f$ is two-to-one), the outputs are uniformly distributed amongst all basis states orthogonal to $s$. Hence, we have picked $n - 1$ uniformly distributed vectors from a space of dimension $n - 1$, so our lemma applies, and we know that our vectors span the space orthogonal to $s$ with constant probability. Since we can easily check if the dimension of the space spanned by the output vectors is indeed $n - 1$, with constant probability we can determine a unique candidate for $s$. We can then check that this is the correct value of $s$ by checking $f(0) = f(s)$. If the equality holds then we know with certainty that we are in case two and have the correct value of $s$. Conversely, if the equality does not hold, then we know that we are in case one. Hence, with constant probability we will know that we have the correct answer. Otherwise, we will know that we need to start over, so repeating the whole procedure a constant number of times boosts our probability of success to any desired constant below one. This is an example of a Las Vegas algorithm, an algorithm that will succeed with constant probability, and will always alert us to failures. Las Vegas algorithms themselves are a special class of Monte Carlo algorithms, where we will still get a correct answer with constant probability, but we may not be able to tell if an answer is incorrect.
Exercise 1. Consider a function $f : \{0,1\}^2 \to \{0,1\}$ with the promise that $|f^{-1}(\{1\})| = 1$. That is, we are promised that the inverse image set of $\{1\}$ has size one. Construct a quantum algorithm that determines $f^{-1}(\{1\})$ with zero error while using only one function query.
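The constant $\delta$ in Lemma 1 can be evaluated directly from the product in equation (15). This is a small numerical sketch; the function name is illustrative, and a finite product stands in for the infinite one, which is harmless in double precision.

```python
import math


def success_probability(d):
    """P_d = prod_{i=1}^{d} (1 - 2^{-i}), as in equation (15)."""
    result = 1.0
    for i in range(1, d + 1):
        result *= 1.0 - 2.0 ** (-i)
    return result


# Terms with i > 60 equal 1.0 in double precision, so d = 100 stands in for
# the infinite product.
delta = success_probability(100)
log_delta = math.log(delta)
```

The value of `delta` lower-bounds `success_probability(d)` for every `d`, which is the content of the lemma.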
2
Introduction to quantum search
In this section, we begin considering Grover’s algorithm for quantum search, first formulated by L. K. Grover in 1996 [1]. In this problem, we are given some black-box function $f : \{0,1\}^n \to \{0,1\}$ and the size of the inverse image of 1, $t = |f^{-1}(1)|$. Our goal is to find an $x$ such that $f(x) = 1$. Later on, we will consider relaxing the assumption that $t$ is known, but for now will assume that it is given.
Before we start looking for quantum methods to solve the search problem, we will briefly investigate its classical complexity, writing $N = 2^n$ for the number of inputs. Deterministically, the worst-case scenario is that $f$ evaluates to zero for as many queries as possible before finally outputting one. Hence, we may need to try $N - t$ times before we know that there are only $t$ options remaining, so they must evaluate to one. If we do not know the value of $t$, then we must try once more, since we would not be sure where to stop. If we allow randomization and select our trials uniformly at random, then we require $\Omega(N/t)$ function queries.
It turns out that the quantum algorithm will run in $O(\sqrt{N/t})$, and that we will be able to argue that this is optimal. Next time, we will talk about implementation in a circuit, but for now we will develop a conceptual picture for how such an algorithm could work by studying amplitudes of a quantum state. First, we consider initializing a quantum register of $n$ qubits, corresponding to $N$ computational basis states, in a uniform superposition,
\[ |\psi\rangle = \sum_{x} \alpha_x |x\rangle, \tag{19} \]
where initially $\alpha_x = 1/\sqrt{N}$, as shown in the first panel of figure 1. Next, we apply phase kickback, which multiplies each $\alpha_x$ by $-1$ whenever $f(x) = 1$. An example of this for which $t = 3$ is depicted in the second panel (step (a)) of figure 1. After that, we reflect the value of each amplitude about the average (shown in the figure by a blue dashed line). If $t \ll N$, then the application of the phase kickback did not have much of an effect on the average value of $1/\sqrt{N}$, so the new values of the target state amplitudes are approximately $3/\sqrt{N}$, as depicted in the third panel of figure 1 (step (b)).
[Figure 1: three panels labeled "Initialize", "Step (a)", and "Step (b)", each plotting the amplitude $\alpha_x$ against $x$ from $0$ to $N - 1$, with the level $1/\sqrt{N}$ and the positions where $f(x) = 1$ marked.]
Figure 1: A graphical picture of Grover’s search algorithm by studying quantum amplitudes. In the first panel, we initialize a quantum register of $n$ qubits to a uniform superposition. In the second panel, we apply a phase kickback, assuming that $f$ evaluates to one three times. In the third panel, we reflect all the amplitudes about the average amplitude.
By repeating steps (a) and (b), we can boost the amplitudes of the target states further. For instance, after applying (a) and (b) once more, the amplitudes on the target states are approximately $5/\sqrt{N}$. Note, however, that eventually it will not be a good approximation that the average does not move. In fact, eventually the amplitudes of the target states will become sufficiently high that the average amplitude after an iteration of step (a) will be negative, causing our system to return toward the uniform initial state.
It turns out that the repeated application of any unitary operator will eventually return us close to our original state. In order to see this, consider an arbitrary unitary operator $U$. Since we are dealing only with finite matrices, and hence spaces of finite volume, it follows that for any $l$ and $\epsilon > 0$, there exists some $k > l$ such that
\[ \| U^k - U^l \| \le \epsilon. \tag{20} \]
That is, if we apply $U$ enough, we will return arbitrarily close to a previous state. But now,
\[ \| U^k - U^l \| = \| U^l ( U^{k-l} - I ) \| = \| U^{k-l} - I \| \le \epsilon, \tag{21} \]
so there is some power of $U$ that is arbitrarily close to the identity. Hence, we will need to develop a strategy for determining when to stop iterating and measure. Next time, we will discuss the implementation of this process as a quantum circuit. We will then discuss how to determine the optimal stopping point so that we do not go back toward our initial state.
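The amplitude picture can be reproduced with a few lines of arithmetic on the list of amplitudes. The following is a minimal sketch, not an implementation of the circuit: the register size, the set of marked inputs, and all names are illustrative assumptions, with $t = 3$ marked states out of $N = 1024$ so that $t \ll N$.

```python
import math


def grover_amplitudes(size, marked, iterations):
    """Track every amplitude alpha_x through repeated steps (a) and (b)."""
    amp = [1 / math.sqrt(size)] * size
    for _ in range(iterations):
        # Step (a): phase kickback flips the sign where f(x) = 1.
        amp = [-a if x in marked else a for x, a in enumerate(amp)]
        # Step (b): reflect every amplitude about the average.
        average = sum(amp) / size
        amp = [2 * average - a for a in amp]
    return amp


size, marked = 1024, {5, 200, 777}      # t = 3 targets, t << N
scale = math.sqrt(size)
# Target amplitude in units of 1/sqrt(N) after one and two iterations.
after_one = grover_amplitudes(size, marked, 1)[5] * scale
after_two = grover_amplitudes(size, marked, 2)[5] * scale
norm = sum(a * a for a in grover_amplitudes(size, marked, 2))
```

With these parameters `after_one` and `after_two` come out slightly below 3 and 5 respectively, since the average does move a little, and `norm` stays at 1 because both steps preserve the length of the state vector.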
References
[1] Lov K. Grover. A fast quantum mechanical algorithm for database search. In STOC ’96: Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, pages 212–219, New York, NY, USA, 1996. ACM.
[2] D. R. Simon. On the power of quantum computation. Foundations of Computer Science, Annual IEEE Symposium on, 0:116–123, 1994.
9/16/10
CS 880: Quantum Information Processing
Lecture 6: Quantum Search Instructor: Dieter van Melkebeek
Scribe: Mark Wellons
In the previous class, we began to explore Grover’s quantum search algorithm. Today, we will illustrate the algorithm and analyze its runtime complexity.
1
Grover’s Quantum Search Overview
Grover’s algorithm is an excellent example of the potential power of a quantum computer over a classical one, as it can search an unsorted array of $N$ elements in $O(\sqrt{N})$ operations with constant error. A classical computer requires $\Omega(N)$ operations, as it must traverse the entire array in the worst case. Formally, Grover’s algorithm solves the following problem: Given some function $f : \{0,1\}^n \to \{0,1\}$, and possibly the value $t = |f^{-1}(1)|$, find any $x \in f^{-1}(1)$. We begin by placing the register in a superposition of all possible inputs so that the state vector looks like
\[ |\psi\rangle = \sum_{x} \alpha_x |x\rangle \tag{1} \]
where
\[ \alpha_x = \frac{1}{\sqrt{2^n}}. \tag{2} \]
For brevity, we will define
\[ N \equiv 2^n. \tag{3} \]
If we were to plot the amplitude $\alpha_x$ for each $|x\rangle$, it would look as shown in figure 1.
[Figure 1: plot of $\alpha_x$ against $x$; every amplitude sits at $1/\sqrt{N}$.]
Figure 1: The initial state of the system. Every state is equally likely to be observed if a measurement is taken. We sometimes refer to this state as the uniform distribution.
At this point, we introduce a new operator, $U_1$, which performs a phase kick only on states where $f(x) = 1$ and leaves states where $f(x) = 0$ unchanged. The resulting amplitudes are shown in figure 2.
[Figure 2: plot of $\alpha_x$ against $x$; amplitudes sit at $1/\sqrt{N}$ except for three states at $-1/\sqrt{N}$.]
Figure 2: The state of the system after a phase kick on all states where $f(x) = 1$. In this example, there were three states affected, which were reflected across the x-axis.
We also introduce another operator, $U_2$, which reflects each amplitude across the average value of $\alpha_x$, as shown in figure 3.
[Figure 3: two plots of $\alpha_x$ against $x$, before and after applying $U_2$, with the levels $1/\sqrt{N}$ and $-1/\sqrt{N}$ marked.]
Figure 3: The state of the system after being reflected across the average, which is indicated by a dotted line. Note that the states where $f(x) = 1$ are now much more probable if a measurement is taken.
We now repeatedly apply $U_1$ and $U_2$ until the amplitudes of the desired states vastly exceed the amplitudes of the other states. We then take a measurement, and with high probability will get some state $x$ such that $f(x) = 1$. Which particular $x$ we get will be uniformly random among the valid states.
2
Unitary Property of U1 and U2
We omit the proof that $U_1$ is unitary as it is simply a phase kick, which was shown to be unitary in a previous lecture. To show $U_2$ is unitary, we first show it is linear. To understand how $U_2$ might be implemented, we note that reflecting around the average is equivalent to subtracting the average, reflecting across the x-axis, and then adding the average back. In formal notation, we can describe $U_2$ as
\[ U_2 |\psi\rangle \equiv -\Bigl( |\psi\rangle - \mathrm{AVG}(\alpha_x) \sum_{x} |x\rangle \Bigr) + \mathrm{AVG}(\alpha_x) \sum_{x} |x\rangle \tag{4} \]
where
\[ \mathrm{AVG}(\alpha_x) \equiv \frac{1}{N} \sum_{x} \alpha_x. \tag{5} \]
This is clearly linear in the $\alpha_x$, as $\mathrm{AVG}(\alpha_x)$ is simply a linear combination of the $\alpha_x$’s and all of the operators are linear. We finish this proof by showing that all the eigenvalues of $U_2$ have magnitude 1, a condition required of unitary matrices. First consider what happens when we apply $U_2$ to the initial state shown in figure 1. Nothing will happen, as the reflection across the average transforms this state to itself. Thus, the uniform distribution is an eigenvector and the eigenvalue is 1. Now consider the case shown in figure 4. On the left, we have a system where the average is zero, and after applying $U_2$, we have the system mirrored across the x-axis. Thus this state is another eigenvector and the eigenvalue is $-1$. In fact, all the eigenvectors orthogonal to the uniform distribution will be states that $U_2$ simply reflects across the x-axis. Therefore, all eigenvalues are either 1 or $-1$, as we can consider $U_2$ to be a reflection across the $\sum_x |x\rangle$ axis. Since these eigenvectors can be chosen to form an orthonormal basis, $U_2$ is unitary.
[Figure 4 plots: the amplitudes $\alpha_x$ against $x$, taking the values $\pm 1/\sqrt{N}$, before (left) and after (right) applying $U_2$.]
Figure 4: The state of the system before applying $U_2$ is on the left, and the system afterwards is on the right. As the average is zero, the system is merely reflected across the x-axis.
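The reflection about the average can also be checked numerically. The following is a minimal sketch, assuming NumPy and an illustrative size N = 8 (neither comes from the lecture). It builds $U_2$ as the matrix $2|u\rangle\langle u| - I$, where $|u\rangle$ is the uniform superposition, confirms that this matches equation (4), and inspects its eigenvalues.

```python
import numpy as np

N = 8  # number of basis states (illustrative choice, not from the lecture)
u = np.full(N, 1 / np.sqrt(N))        # uniform superposition
U2 = 2 * np.outer(u, u) - np.eye(N)   # reflection about the average

# Applying U2 to amplitudes alpha gives 2*AVG(alpha) - alpha, as in equation (4).
alpha = np.arange(N, dtype=float)
assert np.allclose(U2 @ alpha, 2 * alpha.mean() - alpha)

# U2 is unitary; its eigenvalues are +1 (once) and -1 (N - 1 times).
is_unitary = np.allclose(U2.T @ U2, np.eye(N))
eigenvalues = np.sort(np.linalg.eigvalsh(U2))
```

The single eigenvalue 1 corresponds to the uniform superposition, and the eigenvalue $-1$ covers every state whose amplitudes average to zero.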
3
Quantum Circuit
Now we would like to construct the quantum circuit that implements Grover's algorithm. We naturally start with the uniform superposition. Since $U_1$ is simply a phase kick, it can be implemented by adding an additional $|-\rangle$ qubit as described in previous lectures. To implement $U_2$, recall that $U_2$ is a reflection about an axis. If this axis were a basis axis, this reflection would be easy to realize. Unfortunately, it is instead the axis determined by the uniform superposition. However, we can change basis via Hadamard gates, which map the uniform superposition to the basis state corresponding to the all-zeros vector. Now we simply reflect across this basis state, and then change back to the uniform superposition, and we have implemented $U_2$. We can repeat $U$, the combination of $U_1$ followed by $U_2$, as many times as desired. The full circuit is shown below.
[Circuit diagram: the input qubits $x_1, x_2, \ldots, x_n$ start in $|+\rangle$ and the ancilla qubit $y$ starts in $|-\rangle$. A block repeated $k$ times applies $U_1$ (the gate $U_f$) followed by $U_2$ (Hadamard gates on each $x_j$, a reflection across $|0^n\rangle$, and Hadamard gates on each $x_j$ again). The wires are measured at the end.]
There are alternative ways to implement $U$, but this is adequate for our purposes.
4
Algorithm Complexity
4.1
For a known t
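We now seek to determine the optimal value of $k$, where $k$ is the number of applications of $U$. Consider that the amplitude $\alpha_x$ of $|x\rangle$ at any point in time depends only on whether $f(x) = 0$ or $f(x) = 1$. Since $\alpha_x$ only depends on $f(x)$, we can describe the system state after $i$ iterations of $U$ as
\[ |\psi^{(i)}\rangle = \beta_i \frac{1}{\sqrt{N-t}} \sum_{x : f(x)=0} |x\rangle + \gamma_i \frac{1}{\sqrt{t}} \sum_{x : f(x)=1} |x\rangle, \tag{6} \]
where $t$ is the number of $x$ with $f(x) = 1$, and $\beta_i$ and $\gamma_i$ are constants constrained by
\[ \beta_i^2 + \gamma_i^2 = 1. \tag{7} \]
It follows that
\[ \beta_0 = \sqrt{\frac{N-t}{N}}, \qquad \gamma_0 = \sqrt{\frac{t}{N}}. \]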
We can thus describe the system as a two-dimensional system with parameters $\beta$ and $\gamma$, where $(\beta, \gamma)$ lies on the unit circle, as shown in figure 5. Here we plot $\beta$ on the B axis and $\gamma$ on the C axis. This unit circle allows us to introduce a new variable $\theta$, which is the angle between the B axis and the point $(\beta, \gamma)$ as measured from the origin. We can describe the initial value of $\theta$ as
\[ \sin(\theta_0) = \sqrt{\frac{t}{N}}. \tag{8} \]
[Figure 5 plot: the unit circle with horizontal axis B and vertical axis C; the point $(\beta_0, \gamma_0)$ lies at angle $\theta_0$ above the B axis.]
Figure 5: $\beta$ and $\gamma$ can be mapped to the unit circle, with $\beta$ on the B axis and $\gamma$ on the C axis.
Given some point $(\beta, \gamma)$ on this unit circle, what will the effect of the $U_1$ and $U_2$ operators be on this point? Since $U_1$ is a phase kick, it transforms $(\beta, \gamma)$ by
\[ (\beta, \gamma) \to (\beta, -\gamma) \tag{9} \]
which is simply a reflection across the B axis. $U_2$ reflects the point across the line defined by the origin and the point $(\beta_0, \gamma_0)$. Taken together, these two reflections form a rotation of $2\theta_0$. That is, every application of $U$ rotates the point $2\theta_0$ counterclockwise. It follows that after $i$ iterations, $\theta_i = (2i + 1)\theta_0$, $\beta_i = \cos(\theta_i)$, $\gamma_i = \sin(\theta_i)$. From looking at the unit circle, it should be clear that the best time to make a measurement is when $(\beta, \gamma)$ is on or very close to the C axis, as that is when the amplitudes of the valid states are highest. It follows that the ideal value of $k$ would satisfy
\[ (2k + 1)\theta_0 = \frac{\pi}{2}, \tag{10} \]
which leads to
\[ k = \frac{1}{2}\left(\frac{\pi}{2\theta_0} - 1\right). \tag{11} \]
This may not be an integer, so we simply choose the closest integer value. We now claim that if we choose a $k$ such that
\[ k = \left\lfloor \frac{1}{2}\left(\frac{\pi}{2\theta_0} - 1\right) \right\rceil, \tag{12} \]
where $\lfloor \cdot \rceil$ denotes rounding to the nearest integer, then
\[ \Pr\left[\text{observe } x \in f^{-1}(1)\right] \geq \frac{1}{2}. \tag{13} \]
[Figure 6 plot: the unit circle with axes B and C, showing the points at angles $\theta_0$, $3\theta_0$ and $5\theta_0$; the top quarter of the circle is shaded.]
Figure 6: Here is an example where we have applied $U$ three times, which brings us into the shaded part of the unit circle. Each application of $U$ rotates us by $2\theta_0$, and there is no value of $\theta_0 < \pi/2$ that will allow us to completely jump over the shaded area when applying $U$. Measurements taken in the shaded region have probability $\geq 1/2$ of observing a valid state.
We know this as $k$ must bring us within the top quarter of the unit circle, as shown in figure 6. The advantage of being in the shaded area is that, in terms of absolute value, the amplitudes of the valid states exceed the amplitudes of the invalid states, thus giving us a probability $\geq 1/2$ when taking a measurement. We can now show that
\[ k = O\left(\sqrt{\frac{N}{t}}\right) \tag{14} \]
for small values of $t$. Using the small angle approximation we can rewrite equation (8) as
\[ \theta_0 \approx \sqrt{\frac{t}{N}}, \tag{15} \]
which can be substituted into equation (12), giving us equation (14).
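The rotation picture can be compared against a direct simulation of the amplitudes. The sketch below is an illustration under stated assumptions: it uses NumPy, a hypothetical instance with N = 1024 and three arbitrarily chosen marked items, and a helper named `grover_success_probability` that is not part of the lecture. It applies $U_1$ and $U_2$ to real amplitudes and compares the result with $\sin^2((2k+1)\theta_0)$ for the $k$ of equation (12).

```python
import math
import numpy as np

def grover_success_probability(N, marked, k):
    """Simulate k iterations of U (U1 then U2) and return Pr[f(x) = 1]."""
    alpha = np.full(N, 1 / math.sqrt(N))
    for _ in range(k):
        alpha[marked] *= -1               # U1: phase kick on the valid states
        alpha = 2 * alpha.mean() - alpha  # U2: reflection about the average
    return float(np.sum(alpha[marked] ** 2))

N, marked = 1024, [3, 77, 500]            # hypothetical instance with t = 3
t = len(marked)
theta0 = math.asin(math.sqrt(t / N))                 # equation (8)
k = round(0.5 * (math.pi / (2 * theta0) - 1))        # equation (12)
simulated = grover_success_probability(N, marked, k)
predicted = math.sin((2 * k + 1) * theta0) ** 2      # gamma_k squared
```

For this instance the simulated and predicted probabilities agree up to floating-point error, and both are well above the bound of 1/2 in equation (13).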
4.2
For an unknown t
If $t$ is unknown, there are several things we can try. For now, let us assume that $t$ is positive.
4.2.1
First Attempt
We can try k = 1, and then double k with each step until the first success. This algorithm will have some iteration i∗ where Prob[success] ≥ 1/2. This is clearly true, as if we double k every step, then there is no way we can skip the top quarter of the unit circle.
How many times do we use $U$ in the algorithm? As each iteration doubles the number of times $U$ is applied, this is simply the sum of a geometric series. So the number of applications of $U$ until iteration $i^*$ is still $O(\sqrt{N/t})$. However, this algorithm does not quite work, as $i^*$ is only guaranteed to have probability of success $\geq 1/2$. So it is very possible that we will reach $i^*$, fail the measurement, and then move past $i^*$. If we move past $i^*$, the amplitudes of the valid states begin decreasing, thus lowering the probability of measuring a valid state. In other words, the problem with this algorithm is that we do not know when to stop if we do not get a success.
4.2.2
Second Attempt
In our first attempt, the amplitudes of the valid states were improving until we reached $i^*$, at which point they declined. In our second attempt, we correct for that by trying to maintain our position in the desirable region. We do that by setting $l = 1$ and doubling $l$ in each iteration, and each time, we pick a $k$ uniformly at random from the set $\{1, 2, 3, \ldots, l\}$. This has the advantage that if we overstep $i^*$, there is still a probability of at least 1/2 that we will pick a point in the good region. It follows that we expect to overstep $i^*$ by only one iteration. In any case, the expected number of applications of $U$ is
\[ \langle \text{number of } U \rangle = \langle \text{number of } U \text{ up to } i^* \rangle + \langle \text{number of } U \text{ after } i^* \rangle. \tag{16} \]
We showed in the first attempt that
\[ \langle \text{number of } U \text{ up to } i^* \rangle = O\left(\sqrt{\frac{N}{t}}\right), \tag{17} \]
which just leaves us to bound the right term in equation (16). Since the number of applications of $U$ doubles every step, we can express this term as
\[ \langle \text{number of } U \text{ after } i^* \rangle \leq \sum_{i > i^*} 2^i \left(\frac{3}{4}\right)^{i - i^*}. \tag{18} \]
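The 3/4 arises from the fact that each iteration has a probability of 1/2 of landing in the good region and points in the good region have a probability of at least 1/2 of being a success, so each iteration after $i^*$ fails with probability at most 3/4. However, this series diverges, as the ratio in our geometric series is greater than 1. This can easily be fixed by not doubling between each iteration. Instead, we choose some other factor $\lambda < 4/3$, and now the series converges as shown.
\[ \begin{aligned}
\langle \text{number of } U \text{ after } i^* \rangle &\leq \sum_{i > i^*} \lambda^i \left(\frac{3}{4}\right)^{i - i^*} \\
&= \sum_{i > i^*} \lambda^{i^*} \lambda^{i - i^*} \left(\frac{3}{4}\right)^{i - i^*} \\
&= \lambda^{i^*} \sum_{i > 0} \left(\frac{3}{4}\lambda\right)^{i}.
\end{aligned} \]
The remaining sum converges, as it is a simple geometric series with ratio $\frac{3}{4}\lambda < 1$. We also know that $\lambda^{i^*} = O(\sqrt{N/t})$ from equation (17). As we now know both of the terms on the right side of equation (16), it follows that Grover's algorithm runs in $O(\sqrt{N/t})$.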
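The second attempt can be sketched as a classical simulation. This is only an illustration under assumptions that are not in the lecture: the function name `search_unknown_t`, the growth factor $\lambda = 1.2$ (any value below 4/3 would do), and the instance size are all hypothetical, and each measurement is sampled from the closed-form success probability $\sin^2((2k+1)\theta_0)$ instead of simulating a quantum state.

```python
import math
import random

def search_unknown_t(N, t, lam=1.2, rng=None):
    """Return the total number of applications of U until the first success.

    Each measurement succeeds with probability sin^2((2k + 1) * theta0),
    the closed form from Section 4.1, so no quantum state is simulated.
    """
    rng = rng or random.Random()
    theta0 = math.asin(math.sqrt(t / N))
    total, l = 0, 1.0
    while True:
        k = rng.randint(1, max(1, math.floor(l)))   # k uniform in {1, ..., floor(l)}
        total += k                                  # this round applies U k times
        if rng.random() < math.sin((2 * k + 1) * theta0) ** 2:
            return total
        l *= lam                                    # grow the range by lambda < 4/3

rng = random.Random(0)                              # fixed seed for repeatability
N, t = 1 << 16, 4                                   # hypothetical instance
runs = [search_unknown_t(N, t, rng=rng) for _ in range(200)]
average = sum(runs) / len(runs)
```

Comparing `average` with $\sqrt{N/t} = 128$ for different values of `t` gives a rough empirical feel for the $O(\sqrt{N/t})$ bound; the constant depends on the choice of $\lambda$.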
CS 880: Quantum Information Processing
9/20/10
Lecture 7: Query Lower Bounds Instructor: Dieter van Melkebeek
Scribe: Cong Han Lim
Last class, we covered Grover’s quantum search algorithm, which gives us a quadratic speedup over classical and probabilistic algorithms. Today, we review some applications of Grover’s algorithm and prove the optimality of the algorithm (up to a constant factor).
1
Applications of Grover's Algorithm
1.1
Amplitude Amplification
Suppose we are given a function $f$ and a quantum algorithm $A$ that produces a superposition over the base states, which can be partitioned into a good set of states ($f(x) = 1$) and a bad set of states ($f(x) = 0$):
\[ \text{Output of } A = \sum_{x : f(x)=1} \alpha_x |x\rangle |\mathrm{Garbage}(x)\rangle + \sum_{x : f(x)=0} \alpha_x |x\rangle |\mathrm{Garbage}(x)\rangle \]
(Grover's quantum search algorithm produces precisely such an output, with the additional restriction that our $\alpha_x$ are uniform within each of the partitions.) Our goal is to boost the probability of observing a good state, which is
\[ \Pr[\text{observing a good state}] = \sum_{x \in \mathrm{GOOD}} |\alpha_x|^2. \]
This can be done by applying the technique of Amplitude Amplification, which we will briefly outline since it is a generalization of Grover's algorithm. The notation here is the same as the one used in our previous lecture. Consider the state after we apply $A$. Just as in Grover's algorithm we can think of the state as a point on the unit circle in $\mathbb{R}^2$ with axes denoted by B and C (bad and correct), where the angles between the point and the axes give the probabilities of observing a bad or good state.
Figure 1: State after one run of A
As in Grover's algorithm, in each step we will be applying
1. a flip about the B axis, and
2. a reflection about the axis defined by the point.
To implement the flip, we simply need to apply $U_f$ (phase kickback). For the reflection about the axis defined by the point, we will
1. apply $A^{-1}$ to 'shift' the axis defined by the point to the $|0^n\rangle$ axis,
2. perform a reflection about this axis, denoted by $U_{|0^n\rangle}$, and finally
3. apply $A$.
Hence, we can describe each iteration of amplitude amplification as:
\[ A \, U_{|0^n\rangle} \, A^{-1} \, U_f \]
(Note that if we consider Grover's algorithm in this framework, the $A$ here is simply the Hadamard transform $H^{\otimes n}$.) Repeating the same analysis as last lecture, we know the number of iterations required is $O(1/\sqrt{p})$, where $p$ is the probability of observing a good state after one run of $A$. This is again a quadratic speedup over the classical case, which requires $\Omega(1/p)$ trials.
1.2
Finding a Witness for an NP Problem
Consider the Satisfiability problem. In the classical setting, the brute force approach would take $O(2^n)$ trials to obtain a valid assignment. However, using Grover's algorithm, we can search over the space of all assignments and obtain a valid assignment in $O(\sqrt{2^n})$ trials. While this does not necessarily mean that for any classical algorithm for SAT we can always find a quantum algorithm that gives a quadratic speedup, this has been true for known algorithms. Currently known deterministic methods for solving SAT give us a search space that we can recast in a quantum setting to allow Grover's algorithm to work efficiently.
1.3
Unstructured Database Search
One can view Grover's algorithm as a way to search over an unstructured database where the keys are precisely the Boolean strings $x$ of length $n$, corresponding to the $2^n$ base states $|x\rangle$. While this is often presented as an application of the algorithm (Grover's original paper does this), it is impractical in reality. Firstly, it is unlikely that a set of real-world data has no intrinsic structure. Secondly, both the number of quantum gates required and the time needed to transform the real-world data into the appropriate form for the algorithm will be at least linear in the number of inputs.
1.4
Deciding OR(f )
We have been considering the search problem of finding an input $x$ such that a given function $f : \{0,1\}^n \to \{0,1\}$ evaluates to $f(x) = 1$. We can consider a related decision problem OR(f): Given the function $f : \{0,1\}^n \to \{0,1\}$, does there exist an $x \in \{0,1\}^n$ such that $f(x) = 1$? (This is equivalent to computing the Boolean OR over all possible $f(x)$.) It is clear that the search problem is at least as difficult as deciding OR(f), and we will make use of this fact to obtain a lower bound for quantum search.
2
Tight Lower Bound for Quantum Search
From the previous lecture we know that Grover's algorithm runs in $O(\sqrt{N})$, where $N = 2^n$ denotes the number of binary strings of length $n$. We will show that this is optimal by proving the following theorem:
Theorem 1. Any quantum black-box algorithm that decides OR(f) with constant error $\epsilon < \frac{1}{2}$ needs to make $\Omega(\sqrt{N})$ queries to $f$.
2.1
Structure of Quantum Circuit
To prove Theorem 1, we first need to consider the structure of any quantum circuit that makes q many queries. Each quantum circuit consists of two types of operators: 1. Unitary operators Vi , independent of f , 2. Oracle Uf , which provides blackbox access to f . We will also assume that we postpone observation of the system till the end. Hence, our quantum circuits have the following form:
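[Circuit diagram: starting from the initial state $|\psi^{(0)}\rangle$, the circuit alternates the unitaries $V_1, V_2, \ldots, V_q$, each followed by an oracle gate $U_f$ (this pair is repeated $q$ times), producing the intermediate states $|\psi^{(1)}\rangle, |\psi^{(2)}\rangle, \ldots, |\psi^{(q)}\rangle$. A last unitary $V_{final}$ yields the final state $|\psi^{final}\rangle$, which is then measured.]
Hence, the state $|\psi^{(i)}\rangle$ after the $i$th $U_f$ gate is given by
\[ |\psi^{(i)}\rangle = (U_f \otimes I) \cdot V_i \cdot \ldots \cdot (U_f \otimes I) \cdot V_1 \cdot |\psi^{(0)}\rangle, \]
where $|\psi^{(0)}\rangle = |0 \ldots 0\rangle$ without loss of generality.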
2.2
Proof Idea for Theorem 1
Given any two functions $f, g$ such that $\mathrm{OR}(f) \neq \mathrm{OR}(g)$, the corresponding final states have to be almost orthogonal for us to observe the correct answer with high probability. Therefore, we can prove the theorem by picking an 'adversarial' set of functions $f$ such that any quantum circuit that correctly decides OR(f) for this set will require $\Omega(\sqrt{N})$ queries.
We let $|\psi^{(i)}\rangle$ denote the states for the function $f$ such that $\mathrm{OR}(f) = 0$, and $|\psi_{\tilde{x}}^{(i)}\rangle$ the states for the function $f_{\tilde{x}}$ such that $f_{\tilde{x}}(x) = 1 \Leftrightarrow x = \tilde{x}$. Note that $|\psi_{\tilde{x}}^{(0)}\rangle = |\psi^{(0)}\rangle$, but $|\psi_{\tilde{x}}^{final}\rangle$ and $|\psi^{final}\rangle$ have to be nearly orthogonal. Only the oracle gate $U_f$ can affect the angles, and we will show that any query can increase the angle between $|\psi^{(i)}\rangle$ and $|\psi_{\tilde{x}}^{(i)}\rangle$ by only a small amount on average over all $\tilde{x}$. This means many queries are required, giving us a lower bound.
2.3
Proof of Theorem 1
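We will begin by making formal the 'almost orthogonal' condition. Consider the distance between these two probability distributions:
\[ \begin{aligned}
D\left(\Pr\nolimits_{\psi^{final}}, \Pr\nolimits_{\psi_{\tilde{x}}^{final}}\right) \geq{} & \left| \Pr[\text{algorithm outputs 0 on } f] - \Pr[\text{algorithm outputs 0 on } f_{\tilde{x}}] \right| \\
& + \left| \Pr[\text{algorithm outputs 1 on } f] - \Pr[\text{algorithm outputs 1 on } f_{\tilde{x}}] \right|.
\end{aligned} \]
Since we want the error rate of the algorithm to be a constant $\epsilon$, we obtain
\[ D\left(\Pr\nolimits_{\psi^{final}}, \Pr\nolimits_{\psi_{\tilde{x}}^{final}}\right) \geq 2(1 - 2\epsilon), \]
which implies we need
\[ \left\| |\psi^{final}\rangle - |\psi_{\tilde{x}}^{final}\rangle \right\| \geq 2(1 - \delta) \tag{1} \]
for some $\delta > 0$ that is dependent only on $\epsilon$.
Exercise 1. Verify Equation 1 and determine $\delta(\epsilon)$ (there is a simple expression for $\delta(\epsilon)$).
Finally, we can begin the proof of Theorem 1.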
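Proof. We will now put an upper bound on how much the norm $\left\| |\psi^{(i)}\rangle - |\psi_{\tilde{x}}^{(i)}\rangle \right\|$ can change in any one step of the quantum algorithm (where each step consists of a unitary gate $V_i$ and the $U_f$ gate directly after it), thereby showing any quantum algorithm requires $q = \Omega(\sqrt{N})$ iterations to satisfy Equation 1. Since
\[ |\psi^{(i)}\rangle = (I \otimes I) V_i |\psi^{(i-1)}\rangle, \qquad |\psi_{\tilde{x}}^{(i)}\rangle = (U_{f_{\tilde{x}}} \otimes I) V_i |\psi_{\tilde{x}}^{(i-1)}\rangle, \]
we have
\[ \left\| |\psi^{(i)}\rangle - |\psi_{\tilde{x}}^{(i)}\rangle \right\| = \left\| (I \otimes I) V_i |\psi^{(i-1)}\rangle - (U_{f_{\tilde{x}}} \otimes I) V_i |\psi_{\tilde{x}}^{(i-1)}\rangle \right\|. \]
This equation can be simplified by applying the triangle inequality and removing the unitary terms (which does not affect the norm):
\[ \begin{aligned}
\left\| |\psi^{(i)}\rangle - |\psi_{\tilde{x}}^{(i)}\rangle \right\| &\leq \left\| (I \otimes I) V_i |\psi^{(i-1)}\rangle - (U_{f_{\tilde{x}}} \otimes I) V_i |\psi^{(i-1)}\rangle \right\| + \left\| (U_{f_{\tilde{x}}} \otimes I) V_i \left( |\psi^{(i-1)}\rangle - |\psi_{\tilde{x}}^{(i-1)}\rangle \right) \right\| \\
&= \underbrace{\left\| (I \otimes I) V_i |\psi^{(i-1)}\rangle - (U_{f_{\tilde{x}}} \otimes I) V_i |\psi^{(i-1)}\rangle \right\|}_{B} + \left\| |\psi^{(i-1)}\rangle - |\psi_{\tilde{x}}^{(i-1)}\rangle \right\|.
\end{aligned} \tag{2} \]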
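We now analyze the term $B$ in Equation (2), which gives us an upper bound on the change in norm. Let
\[ V_i |\psi^{(i-1)}\rangle = \sum_z \alpha_z |z\rangle \]
where $|z\rangle = |xbu\rangle$, such that $|x\rangle|b\rangle$ represents the input into $U_f$ and $|b\rangle$ represents the ancilla qubit that records the output of $U_f$. Note that
\[ U_{f_{\tilde{x}}} \otimes I : |xbu\rangle \mapsto \begin{cases} |x\bar{b}u\rangle & \text{if } x = \tilde{x}, \\ |xbu\rangle & \text{if } x \neq \tilde{x}. \end{cases} \]
We can now proceed to bound $B$:
\[ \begin{aligned}
B &= \left\| \left[ (I \otimes I) - (U_{f_{\tilde{x}}} \otimes I) \right] \sum_z \alpha_z |z\rangle \right\| \\
&= \left\| \left[ (I \otimes I) - (U_{f_{\tilde{x}}} \otimes I) \right] \sum_{z = \tilde{x}bu} \alpha_z |z\rangle \right\| \\
&= \sqrt{\sum_{z = \tilde{x}bu} \left| \alpha_{\tilde{x}bu} - \alpha_{\tilde{x}\bar{b}u} \right|^2} \\
&\leq \sqrt{\sum_{z = \tilde{x}bu} \left( 2\left| \alpha_{\tilde{x}bu} \right|^2 + 2\left| \alpha_{\tilde{x}\bar{b}u} \right|^2 \right)} \\
&= \sqrt{4 \sum_{z = \tilde{x}bu} \left| \alpha_{\tilde{x}bu} \right|^2} \\
&= 2\sqrt{\Pr[i\text{th query for } f \equiv 0 \text{ is } \tilde{x}]}.
\end{aligned} \tag{3} \]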
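We return to the term $\left\| |\psi^{final}\rangle - |\psi_{\tilde{x}}^{final}\rangle \right\|$ which, by removing unitary transformations, is simply $\left\| |\psi^{(q)}\rangle - |\psi_{\tilde{x}}^{(q)}\rangle \right\|$. By combining Equations (2) and (3), this gives us
\[ \left\| |\psi^{final}\rangle - |\psi_{\tilde{x}}^{final}\rangle \right\| = \left\| |\psi^{(q)}\rangle - |\psi_{\tilde{x}}^{(q)}\rangle \right\| \leq 2 \sum_{i=1}^{q} \sqrt{\Pr[i\text{th query for } f \equiv 0 \text{ is } \tilde{x}]} + \underbrace{\left\| |\psi^{(0)}\rangle - |\psi_{\tilde{x}}^{(0)}\rangle \right\|}_{=0}, \tag{4} \]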
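which is true for every possible $\tilde{x}$. While the probability term in Equation (4) might be large for particular $\tilde{x}$, on average they have to be small since they add up to 1. So, we sum over all $\tilde{x}$ and apply the Cauchy-Schwarz inequality to obtain:
\[ \begin{aligned}
\sum_{\tilde{x}} \left\| |\psi^{final}\rangle - |\psi_{\tilde{x}}^{final}\rangle \right\| &\leq 2 \sum_{i=1}^{q} \sum_{\tilde{x}} \sqrt{\Pr[i\text{th query for } f \equiv 0 \text{ is } \tilde{x}]} \\
&\leq 2 \sum_{i=1}^{q} 1 \cdot \sqrt{N} \\
&\leq 2q\sqrt{N}.
\end{aligned} \tag{5} \]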
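Finally, we combine the lower and upper bounds (Equations (1) and (5) respectively), summing Equation (1) over all $N$ choices of $\tilde{x}$, to get
\[ N \cdot 2(1 - \delta) \leq 2q\sqrt{N} \;\Rightarrow\; q \geq (1 - \delta)\sqrt{N}, \]
so $q = \Omega(\sqrt{N})$, as desired.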
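The Cauchy-Schwarz step behind Equation (5) can be spelled out as a short worked inequality. In this sketch, $p_{\tilde{x}}$ is shorthand introduced here (it is not notation from the lecture) for $\Pr[i\text{th query for } f \equiv 0 \text{ is } \tilde{x}]$, so that $\sum_{\tilde{x}} p_{\tilde{x}} \leq 1$ for each fixed $i$:

```latex
\[
\sum_{\tilde{x}} \sqrt{p_{\tilde{x}}}
  = \sum_{\tilde{x}} 1 \cdot \sqrt{p_{\tilde{x}}}
  \leq \sqrt{\sum_{\tilde{x}} 1^2} \cdot \sqrt{\sum_{\tilde{x}} p_{\tilde{x}}}
  \leq \sqrt{N} \cdot 1 .
\]
```

Applying this bound once for each of the $q$ queries gives the factor $q\sqrt{N}$ in Equation (5).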
2.4
Conclusion
In the proof above, we used an adversarial argument: we chose a relatively small set of functions that is easy to analyze, where distinguishing between those with different outputs requires many queries. This is a simplified form of the quantum adversarial argument, which is a generalization of the method used in the classical setting. In the next lecture, we will outline two other methods to obtain lower bounds in the quantum black-box model: the generalized quantum adversarial method and the polynomial method.
09/21/2010
CS 880: Quantum Information Processing
Lecture 8: Quantum Fourier Transform Instructor: Dieter van Melkebeek
Scribe: Balasubramanian Sivan
Last class we established a lower bound on the number of queries needed for quantum search in an unsorted database. Today, we give general techniques for proving quantum lower bounds in the blackbox oracle model. We also begin our discussion on the quantum Fourier transform.
1
Generalized adversarial argument
1.1
Oracle as a characteristic sequence
An oracle is a function $f : \{0,1\}^n \to \{0,1\}$ to which we have black-box access. We view the values of this function $f$ at different inputs as variables, i.e., $f(0^n) = y_0$, $f(0^{n-1}1) = y_1$, ..., $f(1^n) = y_{2^n - 1}$. This sequence of $y_i$'s is called the characteristic sequence of the oracle $f$. Given this, the complexity of quantum search, which returns an $x$ such that $f(x) = 1$, is at least the complexity of computing the OR of all these $y_i$'s. We proved yesterday that the query complexity (i.e., number of queries) of any quantum algorithm for computing $\mathrm{OR}(y_0, y_1, \ldots, y_{N-1})$ with bounded error is $\Omega(\sqrt{N})$, where $N = 2^n$. More generally, we ask, given a function $F : \{0,1\}^N \to \{0,1\}$, what is the complexity of computing $F$? The function $F$ here is to be viewed as a function of the characteristic sequence of the oracle. We give a theorem for general $F$ today, which is a generalization of the adversary argument we gave yesterday for the special case of $F = \mathrm{OR}$.
1.2
Quantum lower bound for general functions
The intuition behind the theorem is that the state vectors for inputs that map to different values under $F$ must start out identical and end up almost orthogonal. The only operations that can induce a difference in the state vectors are the oracle queries. If we can construct a collection of input pairs from $F^{-1}(0) \times F^{-1}(1)$ and argue that on average each individual oracle query can only induce a small difference in the state vector, we know that a lot of oracle queries are needed. In order to construct that collection, we make use of the fact that for a given state vector and two inputs $x$ and $y$, the oracle query can only induce a significant difference if the state vector puts a lot of weight on the positions where $x$ and $y$ differ. Thus, in order to make that difference small on average, we consider pairs of inputs $(x, y) \in F^{-1}(0) \times F^{-1}(1)$ that have small Hamming distance. This leads to the following quantitative statement, in which $R$ denotes the pairs of inputs we consider and which we choose adversarially so as to obtain as large a lower bound on the query complexity as possible.
Theorem 1. Given $F : \{0,1\}^N \to \{0,1\}$, $X \subseteq F^{-1}(0)$, $Y \subseteq F^{-1}(1)$, and $R \subseteq X \times Y$. Let $d_{left} = \min_x |\{y : (x, y) \in R\}|$ and $d_{right} = \min_y |\{x : (x, y) \in R\}|$. Let $d'_{left}$ and $d'_{right}$ be such that
• $(\forall x \in X)(\forall i \in \{1, 2, \ldots, N\})\ |\{y \in Y \mid (x, y) \in R \text{ and } x_i \neq y_i\}| \leq d'_{left}$, and
• $(\forall y \in Y)(\forall i \in \{1, 2, \ldots, N\})\ |\{x \in X \mid (x, y) \in R \text{ and } x_i \neq y_i\}| \leq d'_{right}$.
Then, any constant error quantum algorithm for computing $F$ must make
\[ \Omega\left( \sqrt{\frac{d_{left}\, d_{right}}{d'_{left}\, d'_{right}}} \right) \]
queries.
The set $R$ in the above theorem can be viewed as a bipartite graph, with $X$ as the left partite set and $Y$ as the right partite set, with an edge between $x \in X$ and $y \in Y$ whenever $(x, y) \in R$. With this view, $d_{left}$ is the minimum degree of a vertex in $X$ and $d_{right}$ is the minimum degree of a vertex in $Y$. The $R$ in the theorem is to be viewed as that subset of $X \times Y$ such that we care about the behavior of the algorithm only in $R$. We do not prove the theorem, but show how to apply it for two functions, namely the OR function and the AND of ORs function. Note that the theorem does not place any restriction on how we pick our $X$, $Y$ and $R$. It holds irrespective of this choice of $X$, $Y$ and $R$. But in order to get a high lower bound, we have to be clever in picking these quantities so that $d_{left}$, $d_{right}$ are high, and $d'_{left}$, $d'_{right}$ are low.
Example: $F = \mathrm{OR}$. To get a good lower bound, we make the same choice as in last lecture for $X$, $Y$ and $R$:
• $X = \{0^N\}$ (note that this is the only choice possible for $X$);
• $Y = \{y \mid y \text{ has exactly one } 1\}$;
• $R = X \times Y$.
From this, we compute the relevant quantities for the lower bound: $d_{left} = N$, $d_{right} = 1$, $d'_{left} = 1$, and $d'_{right} = 1$. On substituting these values in the formula for the lower bound in Theorem 1, we get an $\Omega(\sqrt{N})$ lower bound. ⊠
Example: $F$ = AND of ORs, i.e., $F$ is a Boolean formula in conjunctive normal form with $\sqrt{N}$ clauses, and each clause has $\sqrt{N}$ literals. The first clause is an OR of the first $\sqrt{N}$ variables $y_0, y_1, \ldots, y_{\sqrt{N}-1}$ (called the first block), the second clause has the next $\sqrt{N}$ variables, and so on. We make the following choices for $X$, $Y$ and $R$.
• $X = \{x \mid x \text{ has one block of all zeros and each of the remaining } \sqrt{N} - 1 \text{ blocks has exactly one } 1\}$;
• $Y = \{y \mid \text{each of the } \sqrt{N} \text{ blocks of } y \text{ has exactly one } 1\}$;
• $R = \{(x, y) \in X \times Y \mid x \text{ and } y \text{ differ in exactly one position}\}$.
From these choices, we can compute: $d_{left} = \sqrt{N}$, $d_{right} = \sqrt{N}$, $d'_{left} = 1$, and $d'_{right} = 1$. On substituting these values in the formula for the lower bound in Theorem 1, we get an $\Omega(\sqrt{N})$ lower bound. ⊠
Exercise 1. Give an algorithm for computing the above $F$, i.e., AND of ORs, in $O(\sqrt{N} \log N)$ queries. Hint: Use Grover's algorithm. Recall that Grover's algorithm can compute OR, as well as AND. Use it to compute the AND of ORs.
We remark that there exists a stronger version with weights on the edges, including a version where the weights can be negative, which gives tight results up to a constant factor for the bounded-error query complexity of any function $F$.
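The quantities in the two examples can be checked by brute force on small instances. The sketch below is illustrative only: the helper names `adversary_bound`, `or_instance` and `and_of_ors_instance` are hypothetical, and $d'_{left}$, $d'_{right}$ are taken to be the smallest values satisfying the conditions of Theorem 1 (the maximum count over all $x$ or $y$ and all positions $i$).

```python
from itertools import product
from math import sqrt

def adversary_bound(X, Y, R, N):
    """Return sqrt(d_left * d_right / (d'_left * d'_right)) from Theorem 1."""
    d_left = min(sum((x, y) in R for y in Y) for x in X)
    d_right = min(sum((x, y) in R for x in X) for y in Y)
    dp_left = max(sum((x, y) in R and x[i] != y[i] for y in Y)
                  for x in X for i in range(N))
    dp_right = max(sum((x, y) in R and x[i] != y[i] for x in X)
                   for y in Y for i in range(N))
    return sqrt(d_left * d_right / (dp_left * dp_right))

def or_instance(N):
    """X, Y, R for F = OR: the all-zeros string against strings with one 1."""
    X = [(0,) * N]
    Y = [tuple(int(i == j) for i in range(N)) for j in range(N)]
    return X, Y, {(x, y) for x in X for y in Y}

def and_of_ors_instance(m):
    """X, Y, R for F = AND of ORs with m blocks of size m, so N = m * m."""
    def ones_per_block(s):
        return [sum(s[b * m:(b + 1) * m]) for b in range(m)]
    strings = list(product((0, 1), repeat=m * m))
    X = [s for s in strings if sorted(ones_per_block(s)) == [0] + [1] * (m - 1)]
    Y = [s for s in strings if ones_per_block(s) == [1] * m]
    R = {(x, y) for x in X for y in Y
         if sum(a != b for a, b in zip(x, y)) == 1}
    return X, Y, R

or_bound = adversary_bound(*or_instance(16), 16)              # N = 16
and_or_bound = adversary_bound(*and_of_ors_instance(3), 9)    # N = 9
```

In both cases the computed value equals $\sqrt{N}$, matching the lower bounds derived in the examples. The enumeration is exponential in $N$, so this is only practical for very small instances.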
2
The polynomial method
In this section we give another general technique, known as the polynomial method, for establishing quantum lower bounds.
Theorem 2. Given a quantum query algorithm with $q$ queries, the amplitudes at the end of the algorithm are multivariate polynomials in the $y_i$'s, of degree at most $q$.
Proof. We prove this by induction on the number of queries made, $q$.
$q = 0$: This case is trivial as no $y_i$ is revealed with zero queries, and hence we have a polynomial of degree zero in the $y_i$'s.
Inductive step: Recall our model of quantum algorithms:
1. a sequence of unitary operations that do not depend on the oracle,
2. followed by an oracle query.
[Circuit diagram: $V_q$, then $U_f$, then $V_{q+1}$; the state is $\sum_z \alpha_z |z\rangle$ just before $U_f$ and $\sum_z \alpha'_z |z\rangle$ just after it.]
We repeat the above two steps $q$ times. The $i$th application of unitary operations is denoted by the operator $V_i$, and the $i$th query operation is $U_f$. Let the state of the system after the operation $V_q$ but before the application of the $q$th query operation $U_f$ be $\sum_z \alpha_z |z\rangle$, and the state after the $q$th query operation be $\sum_z \alpha'_z |z\rangle$. Note that the base states $|z\rangle$ tell us what queries we make. Consider an arbitrary $z = xbu$ (where $|x| = n$). The $n$ bits in $x$ are the query bits, the bit $b$ is the answer bit, and $u$ contains the rest. How are $\alpha'_{xbu}$ and $\alpha_{xbu}$ related? They are related as:
\[ \alpha'_{xbu} = \begin{cases} \alpha_{xbu} & \text{if } f(x) = 0, \\ \alpha_{x\bar{b}u} & \text{if } f(x) = 1. \end{cases} \]
Note that $f(x)$ is simply $y_x$, which is one of our variables. Thus we have
\[ \alpha'_{xbu} = \alpha_{xbu}(1 - y_x) + \alpha_{x\bar{b}u}\, y_x, \]
i.e., $\alpha'_{xbu}$ is a linear function in $y_x$ and the $\alpha$'s. By the inductive hypothesis, we have that $\alpha_{xbu}$ and $\alpha_{x\bar{b}u}$ are multivariate polynomials of degree at most $q - 1$. Thus $\alpha'_{xbu}$ is of degree at most $q$. There is just one more thing left to prove, namely the effect of $V_{q+1}$ on the degree. But $V_{q+1}$ is a linear operation, and hence won't increase the degree. This proves the theorem.
Now, we use this theorem to establish query lower bounds for computing the OR function. We do this for the bounded error case as well as the exact algorithm case.
Bounded error: Given an $\epsilon$-error algorithm for the OR function with $q$ queries, we have
\[ \Bigg| F(y) - \underbrace{\sum_{z :\ \text{output bit is 1}} \left| \alpha_z^{(final)} \right|^2}_{\text{Probability that the algorithm outputs 1}} \Bigg| \leq \epsilon. \]
The quantity subtracted from $F(y)$ in the above equation is a polynomial of degree at most $2q$. This means that there is a multivariate polynomial $p$ of degree at most $2q$ such that
\[ \forall y \in \{0,1\}^N : \quad |F(y) - p(y)| \leq \epsilon. \]
When $F$ is the OR function, we can show that the minimum degree of such a polynomial is $\Omega(\sqrt{N})$, thus proving the desired lower bound.
Exact algorithm: We can similarly show that $\Omega(N)$ queries are required to compute the OR function exactly because the exact representation of the OR function requires a polynomial of degree $N$.
Exercise 2. Prove that $N$ queries are needed for computing the OR function exactly.
Using the polynomial method, it can be shown that for any function $F : \{0,1\}^N \to \{0,1\}$, if the optimal deterministic algorithm makes $D$ queries in the worst case, then a quantum algorithm with zero error must make $\Omega(D^{1/2})$ queries and a constant error quantum algorithm must make $\Omega(D^{1/6})$ queries. For symmetric Boolean functions (functions whose value depends only on the number of 1's in the input), it is known that any constant error algorithm is required to make $\Omega(D^{1/2})$ queries, and it is conjectured that the same holds for any function. Note that the polynomial relationship between the quantum and classical query complexity of functions $F : \{0,1\}^N \to \{0,1\}$ does not contradict the exponential gap in query complexity we have seen for (the decision version of) Simon's problem, for example. This is because the $F$ underlying Simon's problem is not a function but only a partial function, corresponding to the promise we are given in Simon's problem that the input oracle is guaranteed to be of one of two special forms.
3
Fourier transform
3.1
Fourier transform over the reals
We briefly discuss the Fourier transform in the classical setting in this section. In the classical setting, we define the Fourier transform as follows. Let $f : \mathbb{R} \to \mathbb{C}$ be such that $\int |f(x)|^2 \, dx < \infty$, i.e., $f$ is square integrable. Then, we write the Fourier transform of $f$ as $\hat{f}(\omega)$, which is given by
\[ \hat{f}(\omega) = \frac{1}{\sqrt{2\pi}} \int f(x) e^{i\omega x} \, dx \tag{1} \]
and we have that
\[ f(x) = \frac{1}{\sqrt{2\pi}} \int \hat{f}(\omega) e^{-i\omega x} \, d\omega. \tag{2} \]
Equation (2) says that we can write any function $f$ as a superposition of the harmonics, represented by the $e^{i\omega x}$ terms. The coefficients of this superposition are the $\hat{f}(\omega)$'s, given by (1). We can think of the Fourier transform operation from $f$ to $\hat{f}$ as an orthonormal basis transformation, namely a transformation from the standard basis of $\delta$ functions, which itself is orthonormal, to another orthonormal basis. Apart from orthonormality, the most interesting property of the Fourier transform is the fact that the Fourier transform of the convolution of two functions is the product of the Fourier transforms of the individual functions. The convolution $f * g$ of two functions $f$ and $g$ is defined as
\[ (f * g)(x) = \frac{1}{\sqrt{2\pi}} \int f(x - y) g(y) \, dy. \]
We have that
\[ \widehat{f * g}(\omega) = \hat{f}(\omega) \cdot \hat{g}(\omega), \]
where $\hat{f}(\omega) \cdot \hat{g}(\omega)$ is a pointwise product. The Fourier transform operation can be generalized to groups other than $\mathbb{R}$. Note that $\mathbb{R}$ is a group under addition.
3.2
Fourier transform over general groups
Let $f : G \to \mathbb{C}$ be a function from a group $G$ to the complex numbers. Can we come up with an orthonormal basis such that there exists a transformation from convolutions to products? It turns out that we can do this for any finite group. The situation becomes easier for finite abelian groups, which are groups for which the group operation is commutative. For finite abelian groups, the harmonics are the scaled versions of their characters.
Definition 1. A character of a group is a homomorphism from the group to the complex numbers (under multiplication). A homomorphism is a function that preserves the group operation, i.e., if $\chi$ is a homomorphism, then $\chi(a \cdot b) = \chi(a) \cdot \chi(b)$.
Properties: For any two characters $\chi \neq \chi'$, and any $g \in G$,
1. $|\chi(g)| = 1$
2. $(\chi, \chi') = 0$, where $(\chi, \chi') = \sum_{g \in G} \chi(g) \overline{\chi'(g)}$
Proof. To prove the first property, recall that any group element, raised to a sufficiently high power, gives the unit element of the group, and we know that $\chi(\text{unit element of } G) = \text{unit element of } \mathbb{C} = 1$. Now, we use the fact that $\chi(g^n) = (\chi(g))^n$. Thus, some power of $\chi(g)$ must be 1, implying that $|\chi(g)| = 1$.
We prove the second property in two steps. In the first step, let $\chi'$ be the trivial character, which maps all elements to 1. We must now prove that for any $\chi$ which does not map every element to 1, $\sum_{g \in G} \chi(g) \cdot 1 = 0$, which is the same as proving $\sum_{g \in G} \chi(g) = 0$. We know that for every group element $a$,
\[ \sum_{g \in G} \chi(g) = \sum_{g \in G} \chi(g \cdot a) = \sum_{g \in G} \chi(g)\chi(a) = \chi(a) \sum_{g \in G} \chi(g). \]
We pick an $a$ such that $\chi(a) \neq 1$. This means that $\sum_{g \in G} \chi(g) = 0$.
Now consider the case where $\chi'$ does not map every element to 1. For any two characters $\chi$ and $\chi'$, the function $\chi \overline{\chi'}$ is also a character. Observe that $\chi \overline{\chi'}$ cannot map every element to 1, as that would imply that $\chi = \chi'$. Now we use the proof in the first step for our character $\chi \overline{\chi'}$, thus yielding $\sum_{g \in G} \chi(g) \overline{\chi'(g)} = 0$.
The question now is whether we have enough of these characters. If the number of characters equals the size of the group, then we have an orthogonal basis, because the dimension of the space of all functions from $G$ to $\mathbb{C}$ is $|G|$. To make the basis orthonormal, we scale the characters by a factor of $1/\sqrt{|G|}$, because $(\chi, \chi) = |G|$.
4
Next time
In the next lecture, we will see that a finite abelian group is isomorphic to a direct sum of finite cyclic groups, and discuss the characters of a cyclic group. We will prove that there are $|G|$ characters for any finite abelian group by explicitly constructing these characters. We will also discuss some applications of the Fourier transform, like phase estimation and eigenvalue estimation.
CS 880: Quantum Information Processing
9/23/2010
Lecture 9: Phase Estimation Instructor: Dieter van Melkebeek
Scribe: Hesam Dashti
Last lecture we reviewed the classical setting of the Fourier Transform over the reals. In this class we discuss the Fourier Transform over finite Abelian groups, and show how to compute it efficiently in the Quantum setting. At the end of this lecture, we start introducing Phase Estimation as an application of the Quantum Fourier Transform.
1
Fourier Transform over Finite Abelian Groups
For a general group $G$, the Fourier Transform is an orthonormal basis transformation from the standard basis of delta functions $\delta_g$, $g \in G$, for the space of functions $f : G \to \mathbb{C}$, such that convolutions are transformed into pointwise products. Such a transformation exists for any finite group, but the construction is significantly simpler when the group is Abelian. In that case the new orthonormal basis is formed by the normalized characters of the group. A character of $G$ is a homomorphism from $G$ to the group of complex numbers under multiplication, i.e., a function $\chi : G \to \mathbb{C}$ such that for all $a, b \in G$, $\chi(ab) = \chi(a) \cdot \chi(b)$. Let us recall the two facts about the characters from the previous lecture:
Facts: For any two distinct characters $\chi, \chi'$ and any $g \in G$, the following holds:
1. $|\chi(g)| = 1$; this means that the range of the function $\chi$ lies on the unit circle in the complex plane.
2. $\chi$ and $\chi'$ are perpendicular to each other with respect to the inner product
\[ (\chi, \chi') = \sum_{g \in G} \chi(g) \overline{\chi'(g)}, \]
i.e., $(\chi, \chi') = 0$.
Recall that orthogonality of nonzero vectors implies linear independence, and that the dimension of the space of functions from $G$ to $\mathbb{C}$ is $|G|$. Thus, if we can find $|G|$ distinct characters, then they form a basis for that space. By Fact 2, the basis is orthogonal. By Fact 1, $(\chi, \chi) = |G|$, so in order to have an orthonormal basis we need to normalize the characters and use the functions $\frac{1}{\sqrt{|G|}} \chi$. If we then index the characters as $\chi_y$ where $y$ ranges over $G$, every function $f : G \to \mathbb{C}$ can be written as a linear combination of the characters $\chi_y$, or equivalently, of the normalized characters $\frac{1}{\sqrt{|G|}} \chi_y$:
\[ f(x) = \frac{1}{\sqrt{|G|}} \sum_{y \in G} \hat{f}(y) \chi_y(x). \]
By taking the inner product with $\chi_{y^*}$ on both sides and using orthogonality, we find the following expression for $\hat{f}$:
\[ \hat{f}(y^*) = \frac{1}{\sqrt{|G|}} \sum_{x \in G} f(x) \overline{\chi_{y^*}(x)}. \]
We wanted our new basis functions to have two properties: they should be orthonormal (as we showed), and the transformation should map convolutions to pointwise products. The new basis functions have the latter property as well; we leave the proof as an exercise.
Exercise 1. For $(f * g)(x) = \sum_{y \in G} f(y)\,g(x y^{-1})$, show that $\widehat{f * g}(y) = \sqrt{|G|}\,\hat{f}(y)\,\hat{g}(y)$.
Next, we show that for finite Abelian groups $G$, the number of distinct characters is indeed equal to $|G|$, and we find all the characters. Let us start with a very simple group: the cyclic group on $N$ elements, $G = (\mathbb{Z}_N, +)$. In this case $|G| = N$, so we need to find $N$ characters. Since $(\mathbb{Z}_N, +)$ is generated by the element 1, each character of $(\mathbb{Z}_N, +)$ is fully determined by its value at 1: $\chi(x) = \chi(1)^x$. Moreover, the value $\chi(1)$ can be any $z \in \mathbb{C}$ that satisfies $z^N = 1$, i.e., any vertex of the regular $N$-gon inscribed in the unit circle and containing 1 (see Figure 1).
Figure 1: Choices for $\chi(1)$ for $G = (\mathbb{Z}_8, +)$.
Since these points are of the form $e^{2\pi i y/N}$ for $y \in \mathbb{Z}_N$, we set $\chi_y(1) = e^{2\pi i y/N}$. This generates the $N$ distinct characters
\[ \chi_y(x) = e^{2\pi i yx/N}, \]
which we need for the cyclic group $(\mathbb{Z}_N, +)$.
Now consider the case of a group $G$ that is the direct product of a finite number of cyclic groups: $G = \bigoplus_{j=1}^{k} (\mathbb{Z}_{m_j}, +)$. Every element $x \in G$ can be written as a $k$-tuple $x = (x_1, x_2, \ldots, x_k)$ where $x_j \in \mathbb{Z}_{m_j}$, and $|G| = \prod_{j=1}^{k} m_j$. In order to find characters of this group, we can pick a character from each of the constituent groups $(\mathbb{Z}_{m_j}, +)$ and multiply them to get a single character of $G$. For example, we can pick $(\chi_{y_1}(x_1), \chi_{y_2}(x_2), \ldots, \chi_{y_k}(x_k))$ and obtain
\[ \prod_{j=1}^{k} \chi_{y_j}(x_j) = e^{2\pi i \left( \frac{x_1 y_1}{m_1} + \frac{x_2 y_2}{m_2} + \cdots + \frac{x_k y_k}{m_k} \right)}. \]
For different choices of $(y_1, \ldots, y_k)$ the product gives us different characters of $G$. As the number of distinct choices equals $\prod_{j=1}^{k} m_j = |G|$, we obtain all characters of $G$ this way. Since every finite Abelian group is isomorphic to a direct product of cyclic groups, the latter case is the general one.
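The character construction above can be checked numerically. The following is a minimal sketch, assuming the example group $\mathbb{Z}_2 \times \mathbb{Z}_3 \times \mathbb{Z}_4$; the helper names `characters` and `inner` are illustrative and do not come from the lecture.

```python
import cmath
import itertools
import math


def characters(moduli):
    """Return (elements, chars) for G = Z_{m_1} x ... x Z_{m_k}.

    chars maps each tuple y to the character chi_y, a function from
    group elements (tuples) to complex numbers on the unit circle.
    """
    elements = list(itertools.product(*[range(m) for m in moduli]))

    def make_chi(y):
        def chi(x):
            # exponent x_1 y_1 / m_1 + ... + x_k y_k / m_k from the product formula
            phase = sum(xj * yj / mj for xj, yj, mj in zip(x, y, moduli))
            return cmath.exp(2j * math.pi * phase)
        return chi

    return elements, {y: make_chi(y) for y in elements}


def inner(chi_a, chi_b, elements):
    """Inner product (chi_a, chi_b) = sum_g chi_a(g) * conj(chi_b(g))."""
    return sum(chi_a(g) * chi_b(g).conjugate() for g in elements)


moduli = (2, 3, 4)          # example group, chosen only for illustration
elements, chars = characters(moduli)
order = len(elements)       # |G| = 2 * 3 * 4
```

Evaluating `inner` on every pair of characters should give $|G|$ on the diagonal and 0 elsewhere, up to floating-point error, which is the content of Facts 1 and 2.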
2
Quantum Fourier Transform
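By viewing a base state $|x\rangle$ as the delta function $\delta_x$ for $x \in \{0, 1\}^n$, we define the Quantum Fourier Transform as mapping $|x\rangle$ onto the Fourier expansion of $\delta_x$:
\[ F : |x\rangle \mapsto \sum_{y} \hat{\delta}_x(y)\,|y\rangle. \]
By linearity, if we apply $F$ to a superposition $|\Psi\rangle = \sum_x \alpha(x)\,|x\rangle$ we get
\[ F|\Psi\rangle = \sum_{y} \hat{\alpha}(y)\,|y\rangle. \]
For the group $G = (\mathbb{Z}_N, +)$ this gives
\[ F : |x\rangle \mapsto \frac{1}{\sqrt{N}} \sum_{y=0}^{N-1} e^{2\pi i xy/N}\,|y\rangle. \]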
2.1
Computing the Fourier Transform over $\mathbb{Z}_N$ for $N = 2^n$
Now we see how to compute the Fourier Transform in the quantum setting for the special case of $(\mathbb{Z}_N, +)$ where $N = 2^n$ is a power of 2. Let us start by comparing the time complexity of computing the Fourier Transform in the classical setting and in the quantum setting.
Classical setting: The trivial algorithm takes $O(N^2)$ operations. The Fast Fourier Transform takes $O(N n)$ operations.
Quantum setting:
1. A simple algorithm, which we develop below, takes $O(n^2)$ operations.
2. A better algorithm takes $O(n (\log n)^2 \log\log n)$ operations.
3. An approximate algorithm with error at most $\epsilon$ takes $O(n \log(n/\epsilon))$ operations.
The simple quantum algorithm in time $O(n^2)$: Recall that we can represent the Quantum Fourier Transform by
\[ F|x\rangle = \frac{1}{\sqrt{N}} \sum_{y} e^{2\pi i xy/N}\,|y\rangle. \qquad (1) \]
Let us start by considering the binary representations $x = x_1 x_2 \ldots x_n$ and $y = y_1 y_2 \ldots y_n$, where $x_j, y_j \in \{0, 1\}$. Note that we can write $y$ as $y = \sum_{j=1}^{n} y_j 2^{n-j}$ and plug it into (1). We can rewrite the exponential function in the amplitude as a product
\[ e^{2\pi i xy/N} = \prod_{j=1}^{n} e^{2\pi i\, x\, y_j\, 2^{n-j}/2^n} = \prod_{j=1}^{n} e^{2\pi i (0.x_{n-j+1} \ldots x_n)\, y_j}, \]
where the latter step follows because, in binary, $\frac{x\, 2^{n-j}}{2^n} = x_1 x_2 \ldots x_{n-j}\,.\,x_{n-j+1} \ldots x_n$ and $e^{2\pi i \cdot (\text{integer})} = 1$.
Using this equation we can write $F|x\rangle$ as the tensor product $F|x\rangle = \bigotimes_{j=1}^{n} |y_j\rangle$, where
\[ |y_j\rangle = \frac{|0\rangle + e^{2\pi i (0.x_{n-j+1} \ldots x_n)}\,|1\rangle}{\sqrt{2}}. \]
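Let us start by computing the first qubit of the output, $|y_1\rangle$:
\[ |y_1\rangle = \frac{|0\rangle + e^{2\pi i (0.x_n)}\,|1\rangle}{\sqrt{2}}. \]
If $x_n$ is zero we get $\frac{|0\rangle + |1\rangle}{\sqrt{2}}$, and otherwise we get $\frac{|0\rangle - |1\rangle}{\sqrt{2}}$. In fact, this is the Hadamard transform on $|x_n\rangle$, so
\[ |y_1\rangle = H|x_n\rangle. \]
Now, let us consider the second qubit,
\[ |y_2\rangle = \frac{1}{\sqrt{2}} \left( |0\rangle + e^{2\pi i (0.x_{n-1} x_n)}\,|1\rangle \right). \]
The Hadamard transform on $|x_{n-1}\rangle$ gives us
\[ H|x_{n-1}\rangle = \frac{1}{\sqrt{2}} \left( |0\rangle + e^{2\pi i (0.x_{n-1})}\,|1\rangle \right). \]
When $x_n$ is zero, $H|x_{n-1}\rangle$ is the output. But if $x_n$ is 1 we need to apply the rotation $R_{2\pi/4}$ to $H|x_{n-1}\rangle$. Hence, we define a conditional rotation $CR_\theta$ based on the regular rotation $R_\theta$ as follows:
\[ R_\theta = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\theta} \end{pmatrix}, \qquad CR_\theta = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & e^{i\theta} \end{pmatrix}. \]
For $|y_3\rangle$ we need two conditional rotations based on the values of $x_n$ and $x_{n-1}$. We can keep going and construct our quantum circuit as pictured below.
[Circuit diagram: the wire $x_n$ passes through $H$ and outputs $|y_1\rangle$; the wire $x_{n-1}$ passes through $H$ and an $R_{\pi/2}$ controlled by $x_n$, and outputs $|y_2\rangle$; the wire $x_{n-2}$ passes through $H$, an $R_{\pi/2}$ controlled by $x_{n-1}$, and an $R_{\pi/4}$ controlled by $x_n$, and outputs $|y_3\rangle$; and so on.]
This gives us the correct output except that the $y_j$’s are in the reverse order. We can easily swap them to get them in the right order. The number of gates to compute $|y_j\rangle$ is $O(j)$. In total, the circuit contains $O(n^2)$ gates.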
We can construct a simpler circuit with a good approximation and lower complexity by ignoring the smaller rotation gates, which have a minor effect on the outputs. Thus, we found the Fourier transform for the group $\mathbb{Z}_N$, where $N = 2^n$. In particular, when $n = 1$ the Fourier transform is the Hadamard transform. The Fourier Transform over $(\mathbb{Z}_2)^n$ is the tensor product of $n$ Hadamard transforms: $F = H^{\otimes n}$. Each time we applied $H^{\otimes n}$ in the past, we were really applying the Fourier Transform over $(\mathbb{Z}_2)^n$.
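As a sanity check on the tensor-product form of $F|x\rangle$, the following sketch (assuming NumPy; the function names are illustrative) builds the state qubit by qubit from the binary fractions $0.x_{n-j+1}\ldots x_n$ and compares it with the defining sum (1). It simulates amplitudes classically and is not a gate-level circuit.

```python
import numpy as np


def qft_product_state(x, n):
    """F|x> over Z_{2^n}, built as a tensor product of single-qubit states."""
    state = np.array([1.0 + 0j])
    for j in range(1, n + 1):
        # The j low-order bits of x give the binary fraction 0.x_{n-j+1}...x_n
        frac = (x % 2 ** j) / 2 ** j
        qubit = np.array([1.0, np.exp(2j * np.pi * frac)]) / np.sqrt(2)
        # y_1 is the most significant bit of y, so it is the leftmost factor
        state = np.kron(state, qubit)
    return state


def qft_direct(x, n):
    """F|x> computed directly from the definition (1)."""
    N = 2 ** n
    y = np.arange(N)
    return np.exp(2j * np.pi * x * y / N) / np.sqrt(N)
```

For any `x` in `range(2 ** n)`, `np.allclose(qft_product_state(x, n), qft_direct(x, n))` should hold.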
3
Phase Estimation
Phase estimation is the following problem.
Problem: Given $|\Psi\rangle = \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} e^{2\pi i \omega x}\,|x\rangle$, where $0 \le \omega < 1$, the goal is to estimate the value of $\omega$. We consider two cases.
Case 1: $\omega$ is of the form $z/N$, where $z \in \{0, \ldots, N-1\}$. In this case $|\Psi\rangle = F|z\rangle$, so we can find $\omega$ exactly by applying the inverse Fourier transform to the superposition $|\Psi\rangle$ and observing $z$:
\[ F^{-1}|\Psi\rangle = |z\rangle. \]
Case 2: In general, without the constraint on $\omega$, we can find an approximation that is close to $\omega$. We claim that the above procedure yields one with high probability, and now analyze its probability of success. We have
\[ F^{-1}|\Psi\rangle = \sum_{z=0}^{N-1} \alpha_z\,|z\rangle, \]
where
\[ \alpha_z = \frac{1}{N} \sum_{y=0}^{N-1} e^{2\pi i \omega y}\, e^{-2\pi i zy/N} = \frac{1}{N} \sum_{y=0}^{N-1} \left( e^{2\pi i (\omega - z/N)} \right)^{y} = \frac{1}{N} \cdot \frac{1 - e^{2\pi i (\omega - z/N) N}}{1 - e^{2\pi i (\omega - z/N)}}. \]
The probability of observing $z$ is $|\alpha_z|^2$. Using the above expression for $\alpha_z$ we will show in the next lecture that the probability of obtaining the closest approximation to $\omega$ of the form $z/N$ is high, and that the probability of obtaining one that is far from $\omega$ is small. We will also discuss an important application of phase estimation, namely estimating the eigenvalues of unitary operations, which are all of the form $e^{2\pi i \omega}$ for some real $\omega \in [0, 1)$.
9/27/2010
CS 880: Quantum Information Processing
Lecture 10: Eigenvalue Estimation Instructor: Dieter van Melkebeek
Scribe: Dalibor Zelený
Last time we discussed the quantum Fourier transform, and introduced the problem of phase estimation. Today we conclude the discussion of phase estimation, and apply it to the problem of eigenvalue estimation.
1
Phase Estimation
Recall that in the phase estimation problem, we are given a state $|\psi\rangle$ of the form
\[ |\psi\rangle = \frac{1}{\sqrt{N}} \sum_{x} e^{2\pi i \omega x}\,|x\rangle \]
for some real $0 \le \omega < 1$. Our goal is to determine the approximate value of $\omega$. In order to find $\omega$, we apply the inverse Fourier transform to the state $|\psi\rangle$ and observe the system. We interpret the observation as an integer $z$, and output $z/N$ as our approximation of $\omega$. Define $\Delta$ to be the smallest real number $d$ (in terms of absolute value) such that $e^{2\pi i (z/N + d) x} = e^{2\pi i \omega x}$ for all integers $x$, and note that $|\Delta| \le 1/2$. We define $\Delta$ this way to make some facts easier to state, and to make their proofs simpler.
We saw last time that if $\omega = z/N$ for some integer $z$, this algorithm finds $\omega$ exactly. When $\omega$ does not have the form $z/N$ for any integer $z$, we can only approximate $\omega$ by outputting a value $z/N$ that is close to $\omega$ in the sense that $e^{2\pi i \omega x}$ and $e^{2\pi i (z/N) x}$ are close. The best we can hope to find is a $z$ that minimizes $|\Delta|$, i.e., for which $|\Delta| \le 1/(2N)$. The reason we cannot do better is that we only have a limited number (namely $n$) of qubits to work with. We showed last time that
\[ F^{-1}|\psi\rangle = \sum_{z} \alpha_z\,|z\rangle \quad \text{with} \quad \alpha_z = \frac{1}{N} \cdot \frac{1 - e^{2\pi i \Delta N}}{1 - e^{2\pi i \Delta}}. \qquad (1) \]
Unlike in the case where $\omega$ has the form $z/N$, we are not guaranteed to observe the $z$ that minimizes $|\Delta|$. We observe a good $z$ with high probability, as shown in the following claims. We prove the first and the third claim. The proof of the second claim is left as an exercise.
Claim 1. $\Pr[\text{we observe } z \text{ that minimizes } |\Delta|] \ge 4/\pi^2$.
Claim 2. $\Pr[\text{we observe } z \text{ such that } |\Delta| \le 1/N] \ge 8/\pi^2$.
Claim 3. $\Pr[\text{we observe } z \text{ such that } |\Delta| \ge \delta] \le O(1/(\delta N))$.
Proof of Claim 1. When we observe a $z$ that minimizes $|\Delta|$, we have $|\Delta| \le 1/(2N)$. The probability of observing this $z$ is $|\alpha_z|^2$. We give a lower bound on this probability using (1) and a geometric argument. We get from (1) that
\[ |\alpha_z| = \frac{1}{N} \cdot \frac{|1 - e^{2\pi i \Delta N}|}{|1 - e^{2\pi i \Delta}|}. \qquad (2) \]
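The numerator in (2) is the distance between points $a$ and $b$ in Figure 1 with $b = e^{2\pi i \Delta N} = e^{i\theta}$, where $-\pi \le \theta \le \pi$. Taking the right triangle formed by the points $0$, $a$, and $c$, we see that $|b - a| = 2|\sin(\theta/2)|$. Note that for any $\theta \in [-\pi, \pi]$, we have $|\sin(\theta/2)| \ge |\theta/2|/(\pi/2) = |\theta|/\pi$, so $|1 - e^{2\pi i \Delta N}| = 2|\sin(\theta/2)| \ge 2|\theta|/\pi$. Therefore, $|1 - e^{2\pi i \Delta N}| \ge 2 \cdot (2\pi |\Delta| N)/\pi = 4|\Delta| N$. For the denominator, since the arc between two points is longer than the line segment between those two points, we have $|1 - e^{2\pi i \Delta}| \le 2\pi |\Delta|$. Combining the two yields
\[ |\alpha_z| \ge \frac{1}{N} \cdot \frac{4|\Delta| N}{2\pi |\Delta|} = \frac{2}{\pi}, \]
so $|\alpha_z|^2 \ge 4/\pi^2$ as we wanted.
[Figure: unit circle centered at 0 with the points $a$ and $b$ on it, the angle $\theta$ between them, and the point $c$ on the chord from $a$ to $b$.]
Figure 1: A geometric aid for the proof of Claims 1 and 3. The circle has radius 1, and we have $a = 1$ and $b = e^{i\theta}$. The point $c$ is in the middle of the line segment from $a$ to $b$.
Exercise 1. Prove Claim 2.
Proof of Claim 3. We need an upper bound for $|\alpha_z|$ now. First note that the numerator in (2) is at most 2 because it is the distance between two points on a unit circle. Since $|\Delta| \le 1/2$, we have $|1 - e^{2\pi i \Delta}| = 2|\sin(\pi \Delta)| \ge 2(\pi |\Delta|)/(\pi/2) = 4|\Delta|$, so $|\alpha_z| \le \frac{1}{N} \cdot \frac{2}{4|\Delta|} \le \frac{2}{|\Delta| N}$, and
\[ |\alpha_z|^2 \le \left( \frac{2}{|\Delta| N} \right)^2. \qquad (3) \]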
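We need to sum (3) over all $z$ that cause a large value of $|\Delta|$. The smallest $|\Delta|$ can be in order to count towards that sum is $\delta$. Since we output integers, the next possible values of $|\Delta|$ are $\delta + 1/N$, $\delta + 2/N$, and so on. Each of those values occurs for two values of $z$ (once in an overestimate and once in an underestimate). The smallest value of $|\Delta| N$ is then $\delta N$, and the other possible values are $\delta N + k$ for positive integers $k$. Hence
\[ \Pr[\text{we observe } z \text{ such that } |\Delta| \ge \delta] \le 2 \sum_{k=0}^{\infty} \left( \frac{2}{N\delta + k} \right)^2 \le 2 \int_{x=0}^{\infty} \left( \frac{2}{N\delta + x} \right)^2 dx \le 2 \int_{x=N\delta}^{\infty} \left( \frac{2}{x} \right)^2 dx \le O\!\left( \frac{1}{N\delta} \right). \]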
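The claims can be explored numerically. The sketch below assumes NumPy and uses `np.fft.fft`, whose sign convention ($e^{-2\pi i xz/N}$) matches the inverse Fourier transform used here; the function name `phase_estimation_probs` is illustrative. It returns the distribution $|\alpha_z|^2$ over outcomes $z$.

```python
import numpy as np


def phase_estimation_probs(omega, n):
    """Probabilities |alpha_z|^2 of observing z after applying F^{-1}.

    The input state is (1/sqrt(N)) * sum_x exp(2 pi i omega x) |x>, N = 2^n.
    """
    N = 2 ** n
    x = np.arange(N)
    psi = np.exp(2j * np.pi * omega * x) / np.sqrt(N)
    # alpha_z = (1/sqrt(N)) * sum_x exp(-2 pi i x z / N) * psi_x
    alpha = np.fft.fft(psi) / np.sqrt(N)
    return np.abs(alpha) ** 2
```

For $\omega = z/N$ the distribution is concentrated on $z$. For other $\omega$, the outcome nearest to $\omega N$ should carry probability at least $4/\pi^2$, and the two outcomes bracketing $\omega N$ together at least $8/\pi^2$, in line with Claims 1 and 2.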
2
Eigenvalue Estimation
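In eigenvalue estimation, we are given a unitary operator $U$ acting on $m$ qubits and an eigenvector $|\varphi\rangle$ of $U$. Since $U$ is unitary and $|\varphi\rangle$ is an eigenvector of $U$, it follows that
\[ U|\varphi\rangle = e^{2\pi i \omega}\,|\varphi\rangle \quad \text{for some } \omega \in [0, 1). \qquad (4) \]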
Our goal is to estimate the eigenvalue corresponding to $|\varphi\rangle$, which really means we just need a good estimate of $\omega$ in (4). As we will see soon, we can use phase estimation to find $\omega$. Before we can do so, we need to create a superposition that admits the use of phase estimation, namely something that looks like a Fourier transform. We do so using an idea similar to phase kickback: we inject the eigenvalue into the amplitude. We apply a controlled $U$ operator to construct the necessary superposition. The new operator, $CU$, has the following behavior on an eigenvector $|\varphi\rangle$:
\[ (CU)\,|0\rangle|\varphi\rangle = |0\rangle|\varphi\rangle, \qquad (CU)\,|1\rangle|\varphi\rangle = |1\rangle\, U|\varphi\rangle = e^{2\pi i \omega}\,|1\rangle|\varphi\rangle. \]
Then if we apply $CU$ to the superposition $|+\rangle|\varphi\rangle$, we get
\[ (CU)\,|+\rangle|\varphi\rangle = \frac{|0\rangle + e^{2\pi i \omega}\,|1\rangle}{\sqrt{2}}\,|\varphi\rangle. \]
Recall that our goal is to get something that looks like a Fourier transform. To that end, construct $|y_j\rangle$ as follows:
\[ |y_j\rangle|\varphi\rangle = (CU)^{2^j}\,|+\rangle|\varphi\rangle = \frac{|0\rangle + e^{2\pi i 2^j \omega}\,|1\rangle}{\sqrt{2}}\,|\varphi\rangle, \]
so, letting the qubit that controls $U^{2^j}$ hold the bit of $x$ of weight $2^j$, we have
\[ |y\rangle|\varphi\rangle = \bigotimes_{j=0}^{n-1} \frac{|0\rangle + e^{2\pi i 2^j \omega}\,|1\rangle}{\sqrt{2}}\,|\varphi\rangle = \frac{1}{\sqrt{N}} \sum_{x} e^{2\pi i \omega x}\,|x\rangle|\varphi\rangle. \qquad (5) \]
Figure 2 shows the circuit that produces the superposition (5). We construct it as a concatenation of $CU$ gates raised to powers of two from $1$ to $2^{n-1}$, each controlled by a different qubit in the $|+\rangle$ state. After that, the $n$ control qubits are in the right superposition for phase estimation, so we apply the inverse Fourier transform to them, make an observation, and get an approximation of $\omega$ like we did in phase estimation. Note that the three claims we stated for phase estimation carry over to this setting.
[Circuit diagram: $n$ control qubits, each initialized to $|+\rangle$, control the gates $U, U^2, \ldots, U^{2^{n-1}}$ acting on the $m$-qubit register $|\psi\rangle$; the inverse Fourier transform $F^{-1}$ is then applied to the $n$ control qubits, which are measured.]
Figure 2: The Eigenvalue Estimation Circuit
We make a few remarks about the circuit in Figure 2. First, this is efficient only if we can construct higher powers of the controlled $U$ gates efficiently. For example, if we only have oracle access to $U$, we are out of luck and need $k$ consecutive applications of the $CU$ gate to get $(CU)^k$. But even that may be sufficient in some applications, as we will see in the next section.
Second, when we apply eigenvalue estimation, we are not always going to have access to an eigenvector $|\varphi\rangle$, so let us see what happens when we use some general state $|\psi\rangle$ instead. We can write this state as a linear combination $\sum_j \alpha_j |\varphi_j\rangle$ of eigenvectors of $U$. After we apply the inverse Fourier transform in Figure 2, the state is $\sum_j \alpha_j |\widetilde{\omega_j}\rangle|\varphi_j\rangle$. With probability $|\alpha_j|^2$, we observe a good approximation of $\omega_j$. Thus, we get an estimate of some eigenvalue out of the algorithm. Whether this is useful or not depends on the application. In the next section we will see some applications where this is useful information.
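The behavior on a general state $|\psi\rangle$ can be illustrated with a small classical simulation. This is a sketch under simplifying assumptions: it assumes NumPy, treats the control register as holding an integer $x$ (so that the controlled powers apply $U^x$), and applies $U$ by repeated multiplication rather than by efficient powering. The name `eigenvalue_estimation_probs` is illustrative.

```python
import numpy as np


def eigenvalue_estimation_probs(U, psi, n):
    """Distribution of the observed integer z on the n control qubits.

    U   : unitary matrix acting on the target register
    psi : initial target state (need not be an eigenvector of U)
    """
    N = 2 ** n
    rows = []
    v = psi.astype(complex)
    for _ in range(N):
        rows.append(v)     # row x holds U^x |psi>
        v = U @ v
    # joint[x, :] is the target state attached to control value x
    joint = np.array(rows) / np.sqrt(N)
    # inverse Fourier transform on the control register (axis 0)
    joint = np.fft.fft(joint, axis=0) / np.sqrt(N)
    # probability of z: sum of squared amplitudes over the target register
    return np.sum(np.abs(joint) ** 2, axis=1)
```

For example, with `U = np.diag(np.exp(2j * np.pi * np.array([3/8, 5/8])))`, `n = 3`, and `psi = np.array([0.5, np.sqrt(0.75)])`, the outcomes 3 and 5 should appear with probabilities $|\alpha_j|^2$, i.e. 0.25 and 0.75.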
3
Applications of Eigenvalue Estimation
Eigenvalue estimation has many applications. We list a few here.
• An implementation of Grover’s algorithm
• Approximating the Fourier transform over $\mathbb{Z}_N$ for $N$ other than powers of 2
• Solving well-conditioned sparse systems of linear equations
• Order finding and integer factorization
• Computing discrete logarithms
We describe the first application today. We will discuss the other applications in the coming lectures.
3.1
Grover’s Algorithm
Recall that in Grover’s algorithm, we are given oracle access to $f : \{0, 1\}^m \to \{0, 1\}$ and our goal is to find an input $x$ such that $f(x) = 1$. During the analysis, we noted that all positive inputs (those where $f(x) = 1$) had the same amplitude, and also that all negative inputs (those where $f(x) = 0$) had the same amplitude. Let $M = 2^m$ and let $t$ be the number of positive inputs. We defined superpositions $|B\rangle = \frac{1}{\sqrt{M - t}} \sum_{f(x) = 0} |x\rangle$ and $|C\rangle = \frac{1}{\sqrt{t}} \sum_{f(x) = 1} |x\rangle$ representing all the negative and all the positive inputs, respectively, and viewed the state as a superposition of $|B\rangle$ and $|C\rangle$. The goal of the algorithm was to increase the amplitude of the positive inputs and decrease the amplitude of the negative inputs. We achieved that by describing an operator $G$ and applying it the right number of times.
For the analysis, we viewed the $B$ component of our state on the horizontal axis, the $C$ component on the vertical axis, and the state itself as a point on the unit circle. In fact, looking at Figure 1, the state would be at point $b$ and would have an angle of $\theta$ with the positive $B$ axis. Applying $G$ had the effect of rotating the state counterclockwise by $2\theta$.
Exercise 2. The eigenvalues of $G$ and their corresponding eigenvectors are
\[ \lambda_+ = e^{2i\theta}, \quad |\varphi_+\rangle = \frac{1}{\sqrt{2}}|B\rangle + \frac{i}{\sqrt{2}}|C\rangle, \qquad \lambda_- = e^{-2i\theta}, \quad |\varphi_-\rangle = \frac{i}{\sqrt{2}}|B\rangle + \frac{1}{\sqrt{2}}|C\rangle. \]
Using the eigenvectors from Exercise 2 above, we can write the state as $|\psi\rangle = \alpha_+ |\varphi_+\rangle + \alpha_- |\varphi_-\rangle$. We apply the eigenvalue estimation algorithm to $|+\rangle^{\otimes n}|\psi\rangle$ to get
\[ \alpha_+\,|\widetilde{2\theta}\rangle|\varphi_+\rangle + \alpha_-\,|\widetilde{-2\theta}\rangle|\varphi_-\rangle. \qquad (6) \]
Therefore, we observe a good estimate $\gamma$ that is within $\delta$ of either $2\theta$ or $-2\theta$ with probabilities $|\alpha_+|^2$ and $|\alpha_-|^2$, respectively (modulo a loss of a tiny constant factor depending on $\delta$ coming from Claim 3). Finally, we approximate the number of positive inputs $t$ by $\tilde{t} = M \cdot \sin^2(\gamma/2)$, which is the same regardless of which of the two angles $\gamma$ was approximating. The actual size of the set of positive inputs is $t = M \sin^2\theta$, and we would like to bound the difference $|t - \tilde{t}|$:
\[ |t - \tilde{t}| = M\,|\sin^2\theta - \sin^2(\gamma/2)| \]
\[ \le M \cdot \frac{\delta}{2} \left( 2\sin\theta + \frac{\delta}{2} \right) \qquad (7) \]
\[ \le \delta\sqrt{tM} + M\,\frac{\delta^2}{4} \qquad (8) \]
\[ = O(\sqrt{t}) \quad \text{for } \delta = \frac{1}{\sqrt{M}}. \qquad (9) \]
We get (7) by factoring the line above and using the fact that the sine is Lipschitz continuous with Lipschitz constant at most 1. We get (8) by the definition of $t$, i.e., $\sin\theta = \sqrt{t/M}$. If we pick $\delta$ as in (9), we get an estimate of $t$ within an additive error of $O(\sqrt{t})$, which is a very good approximation. We need $\sqrt{M}$ applications of $G$ (and thus $\sqrt{M}$ queries to $f$) to get this accuracy. Then to run Grover’s algorithm, we can approximate $t$ with $\tilde{t}$, apply $G$ the number of times determined by $\tilde{t}$, namely $O(\sqrt{M/\tilde{t}})$ times, to do the search, and make an observation. To do the approximation, we initialize $|\psi\rangle$ in Figure 2 with $|+\rangle^{\otimes m}$, and use the controlled version of $G$ in place of $CU$.
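To make the error bound concrete, here is a small numerical sketch (standard library only; the function names are illustrative, and the values of $M$ and $t$ in the usage note are arbitrary choices, not taken from the lecture). It computes $\tilde{t} = M \sin^2(\gamma/2)$ for an angle estimate $\gamma$ and the right-hand side of (8).

```python
import math


def estimate_count(M, gamma):
    """Estimate of the number of positive inputs from an angle estimate gamma."""
    return M * math.sin(gamma / 2) ** 2


def error_bound(M, t, delta):
    """Right-hand side of (8): delta * sqrt(t * M) + M * delta^2 / 4."""
    return delta * math.sqrt(t * M) + M * delta ** 2 / 4
```

For instance, with `M = 1024`, `t = 37`, `theta = math.asin(math.sqrt(t / M))`, and `delta = 1 / math.sqrt(M)`, any `gamma` within `delta` of `2 * theta` or `-2 * theta` should satisfy `abs(t - estimate_count(M, gamma)) <= error_bound(M, t, delta)`.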
4
Next Time
Now suppose we observe the bottom $m$ wires in the circuit from Figure 2 instead of the top $n$ wires. The state has the form $\alpha_+ |\widetilde{2\theta}\rangle|\varphi_+\rangle + \alpha_- |\widetilde{-2\theta}\rangle|\varphi_-\rangle$, and the two components of the state are almost orthogonal, so they do not interfere with each other too much. Now both $|\varphi_+\rangle$ and $|\varphi_-\rangle$ cause an observation of a positive example with probability 1/2 because they are both uniform superpositions of positive and negative inputs. Then we might as well not use the inverse Fourier transform on the top wires because it has no effect when performing Grover’s algorithm. We make this intuition more formal in the next lecture.
09/28/2010
CS 880: Quantum Information Processing
Lecture 11: Applications of Eigenvalue Estimation Instructor: Dieter van Melkebeek
Scribe: Balasubramanian Sivan
Last class we saw eigenvalue estimation as an application of phase estimation. We also discussed an application of eigenvalue estimation to Grover’s algorithm. Today, we complete our discussion of Grover’s algorithm from the perspective of eigenvalue estimation, and see two other applications, namely, the quantum Fourier transform over $\mathbb{Z}_M$ when $M \ne 2^m$, and solving systems of linear equations over the reals.
1
Grover’s algorithm: an eigenvalue estimation perspective
Recall that in the eigenvalue estimation problem, we are given a unitary operator $U$ and one of its eigenvectors. Our goal is to get a good approximation for the eigenvalue of $U$ corresponding to this eigenvector. Some applications of eigenvalue estimation are:
1. An implementation of Grover’s algorithm.
2. Approximate quantum Fourier transform over $\mathbb{Z}_M$ when $M$ is not a power of 2. (We will see that the approach we used when $M$ was a power of 2 does not quite work here.)
3. An efficient algorithm for “solving” sparse and well-conditioned systems of linear equations in time polylogarithmic in the number of dimensions of the system. We mentioned this application in our first lecture. See Section 3 for why we put solving between quotes.
4. Order finding: given $m$ and $a$, find the order of $a$ in $\mathbb{Z}_m$. We remark that order finding is a critical component in Shor’s algorithm for factoring integers.
We now proceed to discuss Grover’s algorithm from the eigenvalue estimation perspective. Recall that the task of Grover’s algorithm is to compute an input $x$ such that $f(x) = 1$, where $f : \{0, 1\}^m \to \{0, 1\}$. In Grover’s algorithm as we saw it in Lecture 6, we had to know $t$, the number of strings $x$ such that $f(x) = 1$, in order to run in time $O(\sqrt{M/t})$, where $M = 2^m$, and find an $x$ that maps to 1. In yesterday’s class, we mentioned a way to compute the value of $t$ approximately using eigenvalue estimation. The circuit we used yesterday for estimating $t$ in Grover’s algorithm was an $(n + m)$-qubit circuit. We refer to the first $n$ qubits as the top part, and the remaining $m$ qubits as the bottom part. Yesterday we used the result obtained from the first $n$ qubits to estimate $t$. Today, we will observe the other part, namely the remaining $m$ qubits. The circuit we used yesterday (see Figure 1) was simply the circuit for the eigenvalue estimation procedure, with the operator being the Grover iterate, and the input being $|+\rangle^{\otimes n}|+\rangle^{\otimes m}$.
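From Exercise 2 in Lecture 10, we know that the Grover iterate has two eigenvectors, namely, $|\varphi_+\rangle = \frac{1}{\sqrt{2}}|B\rangle + \frac{i}{\sqrt{2}}|C\rangle$ and $|\varphi_-\rangle = \frac{i}{\sqrt{2}}|B\rangle + \frac{1}{\sqrt{2}}|C\rangle$, with respective eigenvalues $e^{2i\theta}$ and $e^{-2i\theta}$, where $\theta$ is the angle made by the uniform superposition state $\frac{1}{\sqrt{M}} \sum_x |x\rangle$ with the $B$ axis (i.e., the horizontal axis), $M = 2^m$, and $\sin(\theta) = \sqrt{t/M}$. Here $|B\rangle$ and $|C\rangle$ are, respectively, the superpositions of all the inputs that map to zero, and all the inputs that map to one, i.e., we have
\[ |B\rangle = \frac{1}{\sqrt{M - t}} \sum_{f(x) = 0} |x\rangle, \qquad |C\rangle = \frac{1}{\sqrt{t}} \sum_{f(x) = 1} |x\rangle. \]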
[Circuit diagram: $n$ control qubits, each initialized to $|+\rangle$, control the gates $G, G^2, \ldots, G^{2^{n-1}}$ acting on the $m$-qubit register $|\psi\rangle$; the inverse Fourier transform $F^{-1}$ is applied to the $n$ control qubits; all wires are measured.]
Figure 1: Grover’s algorithm through the eigenvalue estimation circuit. The first $n$ qubits of the circuit act as control wires for Grover iterates. The inverse Fourier transform is applied only on these first $n$ qubits. The Grover iterates act on the remaining $m$ qubits.
What is the state of the system before observation? It is the state resulting from applying the eigenvalue estimation circuit to $|+\rangle^{\otimes n}|+\rangle^{\otimes m}$, i.e.,
\[ \alpha_+\,|\widetilde{2\theta}\rangle|\varphi_+\rangle + \alpha_-\,|\widetilde{-2\theta}\rangle|\varphi_-\rangle, \qquad (1) \]
where $|\tilde{x}\rangle$ is a superposition concentrated around $x$, which results from phase/eigenvalue estimation. What do we get if we observe the second register? First, note that if we were to observe $|\varphi_+\rangle$ alone or $|\varphi_-\rangle$ alone, then with probability one half we would observe an input that maps to 1. This is because both $|\varphi_+\rangle$ and $|\varphi_-\rangle$ have a squared amplitude of 1/2 for $|C\rangle$. But we only get to observe the second register for the state in (1). If $|\widetilde{2\theta}\rangle$ and $|\widetilde{-2\theta}\rangle$ were orthogonal, the probability of observing an input that maps to one would be the sum of the probabilities of observing the same along with a constituent base state of $|\widetilde{2\theta}\rangle$ and observing the same along with a constituent base state of $|\widetilde{-2\theta}\rangle$, which is exactly $|\alpha_+|^2/2 + |\alpha_-|^2/2$, which is 1/2. But since $|\widetilde{2\theta}\rangle$ and $|\widetilde{-2\theta}\rangle$ are not necessarily orthogonal, to obtain the probability with which we observe an input that maps to 1, we find an upper bound on the overlap between $|\widetilde{2\theta}\rangle$ and $|\widetilde{-2\theta}\rangle$. From our analysis of the phase estimation procedure, we know that the probability that the value $\widetilde{2\theta}$ that we observe is more than $\delta$ away from $2\theta$ is $O(\frac{1}{\delta N})$. Using this, we note that
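\[ \Pr[\widetilde{2\theta} \notin (0, 4\theta)] = O\!\left( \frac{1}{\theta N} \right), \qquad \Pr[\widetilde{-2\theta} \notin (-4\theta, 0)] = O\!\left( \frac{1}{\theta N} \right). \]
If $0 < \theta < \pi/4$, then $(0, 4\theta)$ and $(-4\theta, 0)$ do not overlap. From the above equations, the total probability mass of base states constituting $|\widetilde{2\theta}\rangle$ that lie outside $(0, \pi)$ is at most $O(\frac{1}{\theta N})$. Similarly, the total probability mass of the base states constituting $|\widetilde{-2\theta}\rangle$ that lie outside $(-\pi, 0)$ is at most $O(\frac{1}{\theta N})$. Hence $\Pr[\text{success}] \ge \frac{1}{2} - O(\frac{1}{\theta N})$. So our algorithm is the following: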
1. For $n = 1, 2, \ldots$
(a) Run the eigenvalue estimation circuit on $|+\rangle^{\otimes n}|+\rangle^{\otimes m}$, with the Grover iterate as the operator.
(b) If $f(\text{observed } y) = 1$, then halt.
Once $N = 2^n$ exceeds a sufficiently large constant multiple of $1/\theta$, we have a good probability of success, say 1/3. If we denote by $i^*$ the trial number after which the probability of success exceeds 1/3, then the number of oracle calls up to trial $i^*$, as we saw in Lecture 6, is $O(\sqrt{2^m/t})$. The expected number of oracle calls after $i^*$ is given by
\[ \sum_{i > i^*} 2^i \left( \frac{2}{3} \right)^{i - i^*}, \]
i.e., the event that we do trial number $i > i^*$ happens only when we fail in all the previous trials. In particular, we fail in all the trials after $i^*$, each of which happens with probability at most 2/3. Hence the above expression. This series does not converge. The trick we adopted in Lecture 6 was to change our doubling to “$\lambda$-ing” for some appropriate $\lambda$ that ensures convergence of the series. But we cannot do that here because we can only choose $n$: once we increase $n$ by 1, we are automatically doubling. So instead of going from $n$ to $n + 1$ immediately after failing at $n$, we try each $n$ twice before going to the next $n$. Now, the trial number after which the probability of success exceeds 1/3 is $2i^*$. Up to trial $2i^*$, the number of oracle calls is $O(\sqrt{2^m/t})$. The expected number of oracle calls after trial number $2i^*$ is as follows:
\[ \sum_{i > 2i^*} 2^{\lceil i/2 \rceil} \left( \frac{2}{3} \right)^{i - 2i^*}. \]
This is a geometric series with a ratio of $\frac{2\sqrt{2}}{3} < 1$, and thus converges. The leading term is $O(\sqrt{2^m/t})$. Thus, our final algorithm is as follows:
1. For $n = 1, 2, \ldots$ Repeat twice:
(a) Run the eigenvalue estimation circuit on $|+\rangle^{\otimes n}|+\rangle^{\otimes m}$, with the Grover iterate as the operator.
(b) If $f(\text{observed } y) = 1$, then halt.
Note that this algorithm is the same as our algorithm in Lecture 6, except that
• We did not have the top portion in our old circuit, namely the one corresponding to the first $n$ qubits. The top portion of the new circuit does two things. One is to serve as control for the Grover iterates, and the other is to do an inverse Fourier transform on the first $n$ qubits.
• In the earlier algorithm, for any given trial, we did not pick the number of iterations of the Grover iterate to be $2^n$ exactly. Instead, we picked a number uniformly at random from 1 to $2^n$ to be the number of times the Grover iterate must be applied.
Now, in our new circuit, since we do not use the first $n$ answer bits, we need not apply the inverse Fourier transform. Removing this inverse Fourier transform does not affect the probability of observing $y$ in the last $m$ qubits, for any given base state $y$. This is because the probability of observing $y$ is the sum of squares of the amplitudes of the states $|x\rangle|y\rangle$, where $|x\rangle$ ranges over the $n$-qubit base states. This quantity is not altered by removing the inverse Fourier transform on the first $n$ qubits, since that transform is unitary and acts only on the first register. So, the only remaining difference from the algorithm in Lecture 6 is the control part. But note that using wires from a uniform superposition over all states from 0 to $2^n - 1$ as a control for Grover iterates is equivalent to picking the number of applications of Grover iterates uniformly at random between 0 and $2^n - 1$. So, both algorithms are the same.
2
Quantum Fourier transform over ZM when M is not a power of two
We saw how to effect the Quantum Fourier transform over $\mathbb{Z}_M$ when $M$ is an exact power of 2 in an earlier lecture. Today we discuss the case of general $M$. When $M$ is not a power of 2, the first thing we do is to embed $\{0, 1, 2, \ldots, M - 1\}$ into an $n$-qubit system with base states in $\{0, 1\}^n$. For example, choose $n = \lceil \log M \rceil$. When $M$ is not an exact power of two, this means that we do not use some base states in $\{0, 1\}^n$, namely those states that correspond to $x$ such that $x \ge M$. Our goal is to effect the following transformation $F_M$:
\[ F_M : |x\rangle \mapsto \begin{cases} |\hat{x}\rangle = \frac{1}{\sqrt{M}} \sum_{y=0}^{M-1} e^{2\pi i xy/M}\,|y\rangle & \text{for } 0 \le x < M, \\ |x\rangle & \text{otherwise.} \end{cases} \]
We do not care about the action of $F_M$ for $x \ge M$. It is worth pausing here to reflect that the idea of tensor products that we used for the QFT when $M$ is an exact power of 2 does not work here. We wrote the Fourier transform of a state $|x\rangle$ as the tensor product $\bigotimes_{j=1}^{n} |y_j\rangle$, where
\[ |y_j\rangle = \frac{|0\rangle + e^{2\pi i (0.x_{n-j+1} \ldots x_n)}\,|1\rangle}{\sqrt{2}}. \]
Note that we needed $2^n$ base states to use the tensor product idea. To realize the transformation $F_M$, we first achieve a transformation from $|x\rangle|0\rangle|0\rangle$ to $|x\rangle|\hat{x}\rangle|0\rangle$ as follows:
\[ |x\rangle|0\rangle|0\rangle \to |x\rangle \frac{1}{\sqrt{M}} \sum_{y=0}^{M-1} |y\rangle|0\rangle \to |x\rangle \frac{1}{\sqrt{M}} \sum_{y=0}^{M-1} |y\rangle|xy\rangle \to |x\rangle \frac{1}{\sqrt{M}} \sum_{y=0}^{M-1} e^{2\pi i xy/M}\,|y\rangle|xy\rangle \to |x\rangle \frac{1}{\sqrt{M}} \sum_{y=0}^{M-1} e^{2\pi i xy/M}\,|y\rangle|0\rangle = |x\rangle|\hat{x}\rangle|0\rangle. \]
Note that operation one just produces a uniform superposition of the base states from 0 to $M - 1$. Operation two is a multiplication, and hence a deterministic operation. Operation three is a controlled rotation. Operation four simply undoes the reversible simulation of the multiplication we did. Thus we have effected a transformation from $|x\rangle|0\rangle|0\rangle$ to $|x\rangle|\hat{x}\rangle|0\rangle$, which is the same as having effected the transformation from $|x\rangle|0\rangle$ to $|x\rangle|\hat{x}\rangle$. But this is not enough for us, as the $|x\rangle$ in the first register will meddle with the interference pattern that must be exhibited by $|\hat{x}\rangle$. We need our final state to be $|\hat{x}\rangle|0\rangle$. Note that it is enough if we get the final state to be $|0\rangle|\hat{x}\rangle$, as we can swap the two registers. We now show how to transform $|x\rangle|\hat{x}\rangle$ to $|0\rangle|\hat{x}\rangle$. Actually, we show a unitary transformation from $|0\rangle|\hat{x}\rangle$ to $|x\rangle|\hat{x}\rangle$, and this is enough, as we know that unitary transformations are reversible. To achieve this, we first note that $|\hat{x}\rangle$ is an eigenvector of the following operator $U$:
\[ U : |x\rangle \mapsto \begin{cases} |x - 1 \bmod M\rangle & \text{for } 0 \le x < M, \\ |x\rangle & \text{otherwise.} \end{cases} \]
To see that $|\hat{x}\rangle$ is an eigenvector of $U$, note that
\[ U|\hat{x}\rangle = \frac{1}{\sqrt{M}} \sum_{y=0}^{M-1} e^{2\pi i xy/M}\,|y - 1 \bmod M\rangle = \frac{1}{\sqrt{M}} \sum_{y=0}^{M-1} e^{2\pi i x(y+1)/M}\,|y\rangle = e^{2\pi i x/M}\,|\hat{x}\rangle. \]
Thus, $|\hat{x}\rangle$ is an eigenvector of $U$ with eigenvalue $e^{2\pi i x/M}$. In the notation we used for eigenvalue estimation, $\omega = \frac{x}{M}$. So, we run eigenvalue estimation on $|\hat{x}\rangle$ with $U$ as the operator. This gives us $|\tilde{x}\rangle|\hat{x}\rangle$. Recall that the notation $|\tilde{x}\rangle$ denotes a state that is concentrated around $|x\rangle$. If $|\tilde{x}\rangle$ were exactly $|x\rangle$, then we would have achieved the required transformation. But we will not get $|x\rangle$ exactly. For $k$ bits of accuracy, we need to apply our operator $U$ on the order of $2^k$ times. Moreover, for our operator $U$, iterates are easy to compute because we have
\[ U^r : |x\rangle \mapsto \begin{cases} |x - r \bmod M\rangle & \text{for } 0 \le x < M, \\ |x\rangle & \text{otherwise.} \end{cases} \]
So we can afford high values of $k$. But is any error tolerated at all? The answer is yes, because we know that successive applications of unitary operators never make the error grow, and thus any further operations that we perform on this approximate state will only carry the tiny error forward, without blowing it up.
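The eigenvector computation can be checked numerically. The sketch below assumes NumPy and works directly on the $M$-dimensional space spanned by $|0\rangle, \ldots, |M-1\rangle$, ignoring the unused base states $x \ge M$ of the embedding; the function names are illustrative.

```python
import numpy as np


def fourier_state(x, M):
    """Amplitudes of |x^> = (1/sqrt(M)) * sum_y exp(2 pi i x y / M) |y>."""
    y = np.arange(M)
    return np.exp(2j * np.pi * x * y / M) / np.sqrt(M)


def shift_power(state, r=1):
    """Apply U^r, where U|y> = |y - 1 mod M>.

    The new amplitude at index y equals the old amplitude at index
    y + r (mod M), which is what np.roll with shift -r computes.
    """
    return np.roll(state, -r)
```

With `M = 6` and any `x`, `shift_power(fourier_state(x, M), r)` should equal `np.exp(2j * np.pi * x * r / M) * fourier_state(x, M)`, so $U^r$ has eigenvalue $e^{2\pi i xr/M}$ on $|\hat{x}\rangle$ and costs no more to apply than $U$ itself.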
3
“Solving” systems of linear equations
We explain why we wrote solving in quotes by describing the problem and the solution we develop. Consider a system of linear equations $Ax = b$, where $A$ is an invertible $N \times N$ matrix. Our goal is to effect the following transformation: start with a normalized state $|b\rangle$ on $n = \log N$ qubits, and transform it into the normalized solution $|x\rangle$. As with the Fourier transform, what we have at the end is a state that contains all the information about the solution to the system, but we do not have access to all that information: after all, we can only make a measurement. Nevertheless, the transformation to $|x\rangle$ might be useful in some situations. We could for example apply a linear transformation to $|x\rangle$ to obtain some other state, and then make a measurement.
We also require that the matrix $A$ is Hermitian, i.e., $A^* = A$. Note that $A$ being Hermitian is not a restriction, as we can get around it by solving the following system instead:
\[ \begin{pmatrix} 0 & A \\ A^* & 0 \end{pmatrix} \begin{pmatrix} x' \\ x \end{pmatrix} = \begin{pmatrix} b \\ 0 \end{pmatrix}. \]
The coefficient matrix in the above equation is clearly Hermitian, and $x'$ is a set of variables we do not care about.
The quantum algorithm for this problem runs in time $\tilde{O}((\log N)\, s^2 \kappa^2 \frac{1}{\epsilon})$, where the $\tilde{O}$ notation is just the $O$ notation with polylogarithmic terms ignored. In this particular case, polylogarithmic terms in $s$, $\kappa$ and $\epsilon$ have been ignored. We now explain the parameters in the expression for the runtime.
1. $s$ is the sparseness of the matrix $A$, i.e., $A$ contains at most $s$ nonzero elements per row. We require that these nonzero elements can be located efficiently, i.e., in time negligible compared to the runtime mentioned above.
2. $\|A\|$ is bounded as $\Omega(1) \le \|A\| \le 1$, and $\|A^{-1}\| \le \kappa$. Note that for a Hermitian matrix the 2-norm is just the largest eigenvalue in absolute value. We require the largest eigenvalue of $A$ to be at most 1 in absolute value. If this is not the case, we scale the matrix to achieve this. We need to have good estimates of the largest eigenvalue to do this. $\kappa$ is the notation for the condition number in numerical analysis. We use it here for the following reason: the condition number of a matrix is $\|A\| \cdot \|A^{-1}\|$, i.e., the ratio of the largest eigenvalue and the smallest eigenvalue (in absolute value). Since we have $\|A\| \le 1$ and $\|A^{-1}\| \le \kappa$, it follows that the condition number is at most $\kappa$. We require a system with a small condition number to have an edge over running a classical algorithm for solving the system. This is because $\kappa$ could potentially be as large as $N$, in which case we do not derive any benefit from running a quantum algorithm.
3. $\epsilon$ is the error parameter, i.e., we would like our final distribution to be at most $\epsilon$ away from the exact distribution.
The runtime of any naive classical algorithm is $O(\mathrm{poly}(N)\, \kappa\, \mathrm{polylog}(\frac{1}{\epsilon}))$. Thus, the dependence on $\kappa$ is better in the classical algorithm than in the quantum algorithm. The dependence on $\epsilon$ is also exponentially better in the classical algorithm. But the runtime of the classical algorithm grows polynomially in $N$, which is exponentially worse than the quantum algorithm’s $\log(N)$ growth.
3.1
Idea behind the algorithm
We now discuss the idea behind the algorithm. The first point to observe is that the matrix $A$ is Hermitian, and thus it has a full basis of eigenvectors, all with real eigenvalues. Thus any state, in particular our $|b\rangle$, can be written as
$$|b\rangle = \sum_j \beta_j |\varphi_j\rangle, \qquad (2)$$
where the $|\varphi_j\rangle$'s form the eigenbasis of $A$, with the $\lambda_j$'s being their eigenvalues. Thus our target state $|x\rangle$ can be written as
$$|x\rangle = \sum_j \beta_j \lambda_j^{-1} |\varphi_j\rangle. \qquad (3)$$
To see this is right, note that if we operate with $A$ on both sides of (3), we get (2). Equation (3) suggests that the next goal is to find these $\lambda_j$'s. Given $|\varphi_j\rangle$, we need to be able to extract $\lambda_j$. In order to do this, we need an operator $U$ with eigenvector $|\varphi_j\rangle$ and eigenvalue $e^{i\lambda_j} = e^{2\pi i \omega_j}$. (For phase estimation, we require $\omega_j$ to be between 0 and 1. That's why we restricted the largest eigenvalue to be at most 1.) Now, what is the operator with the above property? It is $U = e^{iA}$, i.e., the matrix $iA$ exponentiated, where $e^X$ is defined through its Taylor expansion:
$$e^X = \sum_{k=0}^{\infty} \frac{X^k}{k!}. \qquad (4)$$
We know that (4) converges for every point in the complex plane when $X$ is a number. In fact, it also converges when $X$ is a matrix.
Exercise 1. Prove the following statements.
• The series defined in (4) converges for every matrix $X$.
• If $X$ has an eigenvector $|\varphi_j\rangle$ with eigenvalue $\lambda_j$, then $e^X$ also has $|\varphi_j\rangle$ as an eigenvector, with $e^{\lambda_j}$ as the corresponding eigenvalue.
• If $X$ and $Y$ commute, i.e., $XY = YX$, then $e^X e^Y = e^{X+Y}$.
• If $A$ is Hermitian and $U = e^{iA}$, then $U^* = e^{-iA}$ and $U U^* = I$.
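The last item of the exercise can be checked numerically on a small example. The following is a minimal sketch, assuming a hand-picked 2×2 Hermitian matrix `A` and a truncation of the series (4) after 40 terms; the helper names `mat_mul`, `mat_exp` and `dagger` are illustrative and not part of the lecture.

```python
def mat_mul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(X, terms=40):
    """Approximate e^X by the first `terms` terms of the Taylor series (4)."""
    n = len(X)
    result = [[1 + 0j if i == j else 0j for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # holds X^k / k!, starting with k = 0
    for k in range(1, terms):
        term = mat_mul(term, X)
        term = [[entry / k for entry in row] for row in term]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


def dagger(X):
    """Conjugate transpose of a square matrix."""
    n = len(X)
    return [[X[j][i].conjugate() for j in range(n)] for i in range(n)]


# Assumed example: a Hermitian matrix with eigenvalues of magnitude below 1.
A = [[0.5 + 0j, 0.2 - 0.1j],
     [0.2 + 0.1j, -0.3 + 0j]]
iA = [[1j * entry for entry in row] for row in A]
U = mat_exp(iA)

# U U^* should be the identity up to rounding error.
product = mat_mul(U, dagger(U))
unitarity_error = max(abs(product[i][j] - (1 if i == j else 0))
                      for i in range(2) for j in range(2))
print(unitarity_error)
```

The printed deviation from the identity should be at the level of floating-point rounding, which is consistent with (but of course does not prove) the claim that $e^{iA}$ is unitary for Hermitian $A$.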
4
Next time
In the next class, we will see how to use $e^{iA}$ in effecting the transformation we need for solving the system. We will also discuss another application of eigenvalue estimation, namely order finding.
CS 880: Quantum Information Processing
9/30/2010
Lecture 12: Order Finding Instructor: Dieter van Melkebeek
Scribe: Kenneth Rudinger
We continue our discussion of quantum algorithms based on eigenvalue estimation. We finish up the topic of solving sparse well-conditioned systems of linear equations over the reals. Additionally, we examine the topic of order finding and begin the development of an efficient order finding algorithm.
1
“Solving” Linear Equations
To briefly recap from last lecture, we wish to solve a system of linear equations $Ax = b$, where $A$ is an $N \times N$ invertible Hermitian matrix. We are given $A$ and $|b\rangle$ (normalized); our goal is to find $|x\rangle$ (renormalized, as we don't require $A$ to be unitary). Note that this is not quite the same as solving for $x$, because we are not actually solving for all components of $x$; rather, we hope to get as output the normalized state $|x\rangle$. The quantum algorithm we will develop runs in time $\tilde{O}((\log N) s^2 \kappa^2 \frac{1}{\epsilon})$. Here $s$ is the sparseness of the matrix (the maximum number of nonzero elements in any row of $A$), $\epsilon$ is the allowed error ($\| |\tilde{x}\rangle - |x\rangle \| < \epsilon$, where $|\tilde{x}\rangle$ is the actual output state), and $\|A^{-1}\| \le \kappa$. We also require that $\Omega(1) \le \|A\| \le 1$. Because $A$ is Hermitian, we know we can decompose $|b\rangle$ into a linear combination of the eigenvectors of $A$:
$$|b\rangle = \sum_j \beta_j |\varphi_j\rangle, \qquad (1)$$
where the $|\varphi_j\rangle$ are the eigenvectors of $A$, with eigenvalues $\lambda_j$. It is this state we wish to transform into $|x\rangle$, as we know $|x\rangle$ can be decomposed in the following manner (up to normalization):
$$|x\rangle = \sum_j \beta_j \lambda_j^{-1} |\varphi_j\rangle. \qquad (2)$$
The proof of (2) is straightforward: $A|x\rangle = |b\rangle = \sum_j \beta_j |\varphi_j\rangle$; applying $A^{-1}$ yields $|x\rangle = A^{-1}A|x\rangle = \sum_j \beta_j A^{-1}|\varphi_j\rangle = \sum_j \beta_j \lambda_j^{-1} |\varphi_j\rangle$. Step 1 in creating $|x\rangle$ is performing the transformation $|b\rangle|0\rangle \to \sum_j \beta_j |\varphi_j\rangle |\tilde{\lambda}_j\rangle$ using eigenvalue estimation. Step 2 will transform this new state into $|x\rangle$.
1.1
Step 1
To perform the first step of this algorithm, we need a unitary operator $U$ with the same eigenvectors as $A$, and with eigenvalues in close relation to the eigenvalues of $A$. If we choose $U = e^{iA}$, these conditions are satisfied, as $U|\varphi_j\rangle = e^{i\lambda_j}|\varphi_j\rangle$. (The process of matrix exponentiation is discussed in Lecture 11.) Because $\|A\| \le 1$, we know that each eigenvalue $\lambda_j$ satisfies $|\lambda_j| \le 1$. Therefore, there is a one-to-one correspondence between $\lambda_j$ and $e^{i\lambda_j} = e^{2\pi i \omega_j}$ where $|\omega_j| \le \frac{1}{2}$. (In fact, this even holds if we relax the condition $\|A\| \le 1$ to $\|A\| < \pi$.) We can, for a given eigenvector $|\varphi_j\rangle$ of $U$, obtain $|\tilde{\omega}_j\rangle$ through eigenvalue estimation on $U$. We are then able to obtain the state $|\tilde{\lambda}_j\rangle$ because of the one-to-one correspondence between $\lambda_j$ and $\omega_j$.
If $A$ is sparse, we can efficiently compute $U$ with fixed error in time approximately $\tilde{O}(\log(N) s^2)$. (This computation will be discussed at greater length in the upcoming lecture on random walks.) Once we have our hands on $U$, we can run eigenvalue estimation on $\sum_j \beta_j |\varphi_j\rangle|0\rangle$ to get $\sum_j \beta_j |\varphi_j\rangle|\tilde{\lambda}_j\rangle$ (which, we recall, we will not actually observe). Each $|\tilde{\lambda}_j\rangle$ will be concentrated near $|\lambda_j\rangle$, but they will not be exactly equal. In order to realize our overall error of $\epsilon$, we must ensure that the relative error for each $\tilde{\lambda}_j$ is at most $\epsilon$. If we apply $U$ $k$ times in the eigenvalue estimation step, then the absolute error is at most $k^{-1}$ with high probability. To ensure that the relative error for all $j$ is at most $\epsilon$ with high probability, we want $k^{-1}/|\lambda_j| \le \epsilon$, i.e., $k \ge 1/(\epsilon|\lambda_j|)$ for all $j$. Because of our condition on $\kappa$, this inequality is satisfied if $k \ge \kappa/\epsilon$. (Recall that $\|A^{-1}\| \le \kappa$ means that $|\lambda_j^{-1}| \le \kappa$.)
This is the first step in our algorithm. The complexity cost is the cost of eigenvalue estimation, so we have a complexity of $\tilde{O}(\frac{\kappa}{\epsilon} \log(N) s^2)$: the cost of running $U$ once is $\tilde{O}(\log(N) s^2)$, and we do it $\kappa/\epsilon$ times. As we are now assured that each $|\tilde{\lambda}_j\rangle$ is sufficiently concentrated near its respective $|\lambda_j\rangle$ for our total error to be less than $\epsilon$, we will simplify the rest of our analysis by assuming $|\tilde{\lambda}_j\rangle = |\lambda_j\rangle$.
1.2
Step 2
Next we discuss the second step of our algorithm. We now have the state $\sum_j \beta_j |\varphi_j\rangle|\lambda_j\rangle$. We cannot simply extract $|x\rangle$ by multiplying each term in the sum by $\lambda_j^{-1}$, as this would not be a unitary transformation. However, if we attach a single $|0\rangle$ qubit to our state, we can perform a pseudo-non-unitary action on the $|\lambda_j\rangle$ part of the state, absorbing the “non-unitary” part of the transformation into the added $|0\rangle$ qubit and preserving unitarity. Such a transformation can be achieved with elementary operations, and will have the following action for some $C$ to be determined:
$$|\lambda_j\rangle|0\rangle \to |\lambda_j\rangle \left( \sqrt{1 - \left(\frac{C}{\lambda_j}\right)^2}\, |0\rangle + \frac{C}{\lambda_j} |1\rangle \right).$$
For this operation to be unitary, we need $\left|\frac{C}{\lambda_j}\right| \le 1$ for all $j$, so we choose $C = \frac{1}{\kappa}$. We now make a partial observation, only observing the last qubit, measuring either $|0\rangle$ or $|1\rangle$, and thus collapsing the rest of the (now renormalized) state into the subspace consistent with our observation. If we observe $|1\rangle$, the new state of the system is (modulo normalization) $\sum_j \beta_j \lambda_j^{-1} |\varphi_j\rangle|\lambda_j\rangle$, which is the state $|x\rangle$ we wanted to extract from the beginning, with the addition of the $|\lambda_j\rangle$ qubits. To reset these $|\lambda_j\rangle$ registers to their initial state $|0\rangle$ (and thus make them independent of $j$), we simply run eigenvalue estimation in reverse. As $|1\rangle$ is the “good” outcome, we want to be guaranteed a sufficiently high probability of observing it. The probability of this event is
$$\Pr[\text{observing } |1\rangle] = \sum_j \left| \frac{\beta_j C}{\lambda_j} \right|^2 \ge \min_j \left| \frac{C}{\lambda_j} \right|^2 \ge C^2 = \frac{1}{\kappa^2}.$$
Therefore, if we run this algorithm $\kappa^2$ times, we will have a good probability of observing $|1\rangle$. This would make our overall runtime $\tilde{O}(\kappa^3 \log(N) s^2 \frac{1}{\epsilon})$. However, we can use amplitude amplification (as is done in Grover's algorithm) to decrease the runtime, as it is desirable to increase the probability of observing a “good” state (and decrease the probability of observing a “bad” state). We recall that if, after one run, the probability of observing a good state is $p$, amplitude amplification reduces the number of iterations needed for a good probability of success from $O(\frac{1}{p})$ to $O(\frac{1}{\sqrt{p}})$. Therefore, as the probability of success (observing $|1\rangle$) after one run is on the order of $\frac{1}{\kappa^2}$, we can save a factor of $\kappa$ in runtime, yielding a final runtime of $\tilde{O}(\kappa^2 \log(N) s^2 \frac{1}{\epsilon})$.
2
Order Finding
We now turn our attention to the problem of order finding. We are given integers $M$ and $a$ such that $M > 0$ and $0 \le a < M$. Our goal is to find the order of $a$ mod $M$, that is, the smallest $r > 0$ such that $a^r \equiv 1 \bmod M$.
Exercise 1. Prove that $r$ exists if and only if $\gcd(a, M) = 1$ (i.e., $M$ and $a$ are relatively prime).
From now on, we shall assume that $\gcd(a, M) = 1$, and will not concern ourselves with the need to check this condition, which can be done in $O(\mathrm{polylog}(M))$ time. Classically, the best known algorithms solve this problem in time exponential in the bit length of $M$. However, we will demonstrate the existence of a quantum algorithm that solves this problem in time polynomial in the bit length of $M$. Our algorithm will be an application of eigenvalue estimation. However, the problem of order finding can also be solved as a special case of period finding, as was done by Shor (which will be discussed in a future lecture, when we discuss Shor's algorithm for factoring).
In order to find $r$ using eigenvalue estimation, we need a unitary operator whose eigenvalues are connected to $a$ in some manner. The following is such an operator:
$$U_a |x\rangle = \begin{cases} |ax \bmod M\rangle & \text{if } 0 \le x < M, \\ |x\rangle & \text{otherwise.} \end{cases}$$
For future reference, we note that we can efficiently compute high powers of $U_a$. If we simply applied $U_a$ as a black box, this would take time linear in the exponent, and recall that we need an exponent of $N = 2^n$ if we want $n$ bits of accuracy in the eigenvalue estimation procedure for $U_a$. However, we can compute the powers of $U_a$ more efficiently because $U_a^N |x\rangle = |a^N x \bmod M\rangle$. Since computing $a^N \bmod M$ is modular exponentiation, we can do it in a number of steps polynomial in $\log(M + N)$. Therefore, computing high powers of $U_a$ is relatively easy.
Now, let us note that $U_a^r = I$, because we know that $a^r \bmod M = 1$. Therefore, for any eigenvalue $\lambda$ of $U_a$, $\lambda^r = 1$, and thus $\lambda = \lambda_j = e^{2\pi i j / r}$ for some $0 \le j < r$.
Exercise 2. Show that the state $|\varphi_j\rangle = \frac{1}{\sqrt{r}} \sum_{l=0}^{r-1} e^{-2\pi i j l / r} |a^l \bmod M\rangle$ is an eigenvector of $U_a$ with eigenvalue $\lambda_j = e^{2\pi i j / r}$ for $0 \le j < r$. Also, find the remaining eigenvectors of $U_a$.
Note that $\sum_{j=0}^{r-1} |\varphi_j\rangle = \sqrt{r}\, |1\rangle$. So, if we run eigenvalue estimation on $U_a$ starting from the state $|1\rangle$, we obtain the state $\frac{1}{\sqrt{r}} \sum_{j=0}^{r-1} |\varphi_j\rangle|\tilde{\omega}_j\rangle$, where $\omega_j = \frac{j}{r}$. The more bits of accuracy $n$ we use, the more concentrated around $\frac{j}{r}$ the state $|\tilde{\omega}_j\rangle$ will be. When we observe this state, we will get a good approximation of $\omega_j$ for some $j$, where $j$ is uniformly distributed in $\{0, 1, \ldots, r-1\}$. The following is a circuit diagram of this procedure, with $|\psi\rangle = |1\rangle$ (the bit string representing the integer 1).
[Circuit diagram: $n$ control qubits, each prepared in $|+\rangle$, control the powers $U, U^2, \ldots, U^{2^{n-1}}$ applied to the $m$-qubit register $|\psi\rangle$; the inverse Fourier transform $F^{-1}$ is applied to the control qubits, which are then measured. The point labeled (a) marks the state referred to below.]
The state at point (a) is the desired state $\frac{1}{\sqrt{r}} \sum_{j=0}^{r-1} |\varphi_j\rangle|\tilde{\omega}_j\rangle$. Our goal is now to recover $r$. We will do this using the following two facts.
Fact 1. If $\left| \tilde{\omega}_j - \frac{j}{r} \right| \le \frac{1}{2M^2}$, then we can efficiently recover $\frac{j}{r}$ in reduced form, i.e., we can obtain $j'$ and $r'$ where $j' = \frac{j}{\gcd(j, r)}$ and $r' = \frac{r}{\gcd(j, r)}$.
Fact 2. If we pick $j_1$ and $j_2$ independently and uniformly from $\{0, 1, \ldots, r-1\}$, then
$$\Pr[\gcd(j_1, j_2) = 1] \ge 1 - \sum_{p \text{ prime}} \frac{1}{p^2} \ge 0.54.$$
As we will explain in the next lecture, if $\gcd(j_1, j_2) = 1$, then $r = \mathrm{lcm}(r_1', r_2')$, where $r_1'$ and $r_2'$ are obtained by running the procedure from Fact 1 for $j_1$ and $j_2$, respectively. The proofs of the two facts, along with the rest of the algorithm, will also be provided in the next lecture.
10/4/10
CS 880: Quantum Information Processing
Lecture 13: Factoring Integers Instructor: Dieter van Melkebeek
Scribe: Mark Wellons
In this lecture, we review order finding and use this to develop a method for factoring integers efficiently. With the exception of order finding, none of today’s derivations rely on quantum computing.
1
Order Finding
In the previous class we covered order finding, which solves the following problem: Given integers $M > 0$ and $0 < a < M$ with $\gcd(a, M) = 1$, find the smallest integer $r > 0$ such that $a^r \equiv 1 \bmod M$. Recall from the previous lecture that we used eigenvalue estimation to develop a quantum procedure that runs in time $\mathrm{polylog}(M + N)$ and returns $\tilde{\omega}_j$ such that $j$ is uniformly distributed in $\{1, 2, \ldots, r\}$ and
$$\Pr\left[ \left| \tilde{\omega}_j - \frac{j}{r} \right| \le \frac{1}{N} \right] \ge \frac{8}{\pi^2} \approx 0.81. \qquad (1)$$
Here $N$ is defined as
$$N = 2^n, \qquad (2)$$
where $n$ is the number of qubits used in the eigenvalue estimation; a larger $n$ gives greater accuracy. There are two important facts about this procedure that we can exploit.
1.1
Fact 1
If the eigenvalue estimation is precise enough that
$$\left| \tilde{\omega}_j - \frac{j}{r} \right| \le \frac{1}{2M^2}, \qquad (3)$$
then we can recover $j/r$ in reduced terms in time $\mathrm{polylog}(M)$. By reduced terms, we mean that we can find $j'$ and $r'$ such that $\gcd(j', r') = 1$ and $j'/r' = j/r$. To recover $j/r$ in reduced terms, we use continued fraction expansion (CFE). By definition, a continued fraction takes the form
$$a_0 + \cfrac{b_1}{a_1 + \cfrac{b_2}{a_2 + \cfrac{b_3}{\ddots}}} \qquad (4)$$
where $a_i, b_i \in \mathbb{Z}$. A continued fraction is sometimes denoted as
$$\sum_{i=0}^{\infty} \frac{b_i\,|}{|\,a_i} \qquad (5)$$
and the $k$th convergent is
$$\sum_{i=0}^{k} \frac{b_i\,|}{|\,a_i} = \frac{p_k}{q_k}, \qquad (6)$$
where $p_k, q_k \in \mathbb{Z}$ and $\gcd(p_k, q_k) = 1$. To construct the CFE of some $x \in \mathbb{R}$, we write $x$ as
$$x = \lfloor x \rfloor + (x - \lfloor x \rfloor) = \lfloor x \rfloor + \frac{1}{1/(x - \lfloor x \rfloor)}. \qquad (7)$$
Since $1/(x - \lfloor x \rfloor) > 1$, it can itself be expanded into a CFE. The expansion ends if at some iteration $x - \lfloor x \rfloor = 0$, which happens if and only if $x$ is rational. If $x$ is irrational, the expansion continues forever, but the sequence of convergents quickly converges to $x$. As an example of CFE, consider the case where $x = \pi$:
$$\pi = 3.14\ldots$$
$$\pi = 3 + 0.14\ldots \;\Rightarrow\; \frac{p_0}{q_0} = 3$$
$$\pi = 3 + \frac{1}{1/0.14\ldots} = 3 + \frac{1}{7 + 0.06\ldots} \;\Rightarrow\; \frac{p_1}{q_1} = 3 + \frac{1}{7} = \frac{22}{7}$$
1.1.1
Properties of Continued Fraction
Recall from equation (6) that $q_k$ is the denominator of the $k$th convergent. For the expansion constructed via (7), the denominators at least double every two steps,
$$q_{k+2} \ge 2 q_k, \qquad (8)$$
and
$$\left| \frac{p_k}{q_k} - x \right| \le \frac{1}{q_k^2}. \qquad (9)$$
From these two equations, it should be clear that CFE converges very quickly. Additionally, if
$$\left| \frac{p}{q} - x \right| \le \frac{1}{2q^2} \qquad (10)$$
and $\gcd(p, q) = 1$, then $p/q$ appears as a convergent for some iteration of the CFE of $x$. Note the similarity between equation (10) and equation (3). If we set $N = 2M^2$ and perform the order finding procedure to get some $\tilde{\omega}_j$, we can use CFE on $\tilde{\omega}_j$ to recover $j/r$ in reduced terms. It follows from equation (8) that the number of convergents we need to calculate is logarithmic in the size of $M$.
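To make the construction in (7) concrete, here is a minimal sketch that computes the convergents of a rational number using exact arithmetic. It assumes the simple continued fraction (all $b_i = 1$) and uses the standard recurrence $p_k = a_k p_{k-1} + p_{k-2}$, $q_k = a_k q_{k-1} + q_{k-2}$; the function name `convergents` and the decimal truncation of $\pi$ used as input are illustrative choices, not part of the lecture.

```python
from fractions import Fraction


def convergents(x):
    """Return the convergents p_k/q_k of the simple continued fraction of a rational x."""
    result = []
    p_prev, q_prev = 1, 0    # p_{-1}, q_{-1}
    p_prev2, q_prev2 = 0, 1  # p_{-2}, q_{-2}
    while True:
        a = x.numerator // x.denominator  # floor(x)
        p = a * p_prev + p_prev2
        q = a * q_prev + q_prev2
        result.append(Fraction(p, q))
        p_prev2, q_prev2, p_prev, q_prev = p_prev, q_prev, p, q
        fractional_part = x - a
        if fractional_part == 0:  # expansion of a rational number terminates
            return result
        x = 1 / fractional_part


# Assumed input: a rational truncation of pi.
x = Fraction(314159265, 100000000)
cs = convergents(x)
print(cs[:4])
```

The first convergents reproduce the values 3 and 22/7 from the example above, and the final convergent equals the input exactly, since the expansion of a rational number terminates.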
1.2
Fact 2
If we pick two integers $j_1$ and $j_2$ independently and uniformly at random from $\{1, 2, \ldots, r\}$, then
$$\Pr[\gcd(j_1, j_2) = 1] \ge 1 - \sum_{\substack{p \text{ prime} \\ p \le r}} \frac{1}{p^2} > 1 - \sum_{p \text{ prime}} \frac{1}{p^2} \ge 0.54. \qquad (11)$$
To show the inequality, consider that for any $j$ we pick, the probability that it is divisible by some prime $p$ is asymptotically $1/p$, and is always $\le 1/p$. Since $j_1$ and $j_2$ are picked independently, the chance that they are both divisible by the prime $p$ is $\le 1/p^2$. If we sum over all primes, we get an upper bound on the probability that they share any prime factor, and thus the inequality shown in equation (11).
In the case that $j_1$ and $j_2$ are relatively prime, $r = \mathrm{lcm}(r_1', r_2')$. The reason is that any factor of $r$ that was canceled in the fraction $j_1'/r_1'$ also divides $j_1$, hence does not divide $j_2$, and so could not have been canceled in the fraction $j_2'/r_2'$. Thus $r = \mathrm{lcm}(r_1', r_2')$.
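The bound in equation (11) can be sanity-checked by exhaustive counting for small $r$. The sketch below is only an illustration under that assumption (it checks finitely many values of $r$ and proves nothing in general); the function name `coprime_fraction` is hypothetical.

```python
from math import gcd


def coprime_fraction(r):
    """Exact probability that gcd(j1, j2) = 1 for j1, j2 uniform in {1, ..., r}."""
    count = sum(1 for j1 in range(1, r + 1)
                for j2 in range(1, r + 1)
                if gcd(j1, j2) == 1)
    return count / (r * r)


# Smallest observed probability over a range of small r.
smallest = min(coprime_fraction(r) for r in range(1, 60))
print(smallest)
```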
1.3
Quantum Algorithm
We can now describe our order finding algorithm. We first run the eigenvalue estimation twice and get $\tilde{\omega}_1$ and $\tilde{\omega}_2$. Using CFE, we can determine $r_1'$ and $r_2'$ and compute $r = \mathrm{lcm}(r_1', r_2')$. Using modular exponentiation, we check whether $a^r \equiv 1 \bmod M$ in time $\mathrm{polylog}(M)$. If so, we know that $r$ equals the order of $a$ modulo $M$, or is a nontrivial multiple of it. The probability of the former is at least the probability that we have success in equation (1) for both independent runs, and success in equation (11). This puts the total probability of success above $0.81^2 \cdot 0.54 \approx 0.35$, provided $N \ge 2M^2$. With this probability, we can simply repeat this algorithm several times and output the smallest $r$ retained. This gives the correct result with very high probability. Let us consider the total running time of this algorithm. We naturally choose $N = 2M^2$, so the running time is in terms of $M$, and using the naive implementation of multiplication, this runs in $O((\log M)^3)$. The most efficient known algorithm runs in time $O((\log M)^2 (\log \log M)(\log \log \log M))$.
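The classical post-processing described above can be sketched as follows. This is a toy simulation under stated assumptions: the quantum eigenvalue estimation is replaced by `simulate_estimate`, which simply rounds $j/r$ to $n$ bits, and the CFE step is replaced by Python's `Fraction.limit_denominator`, which finds the closest fraction with a bounded denominator. All function names and the example values $a = 2$, $M = 21$ are illustrative.

```python
from fractions import Fraction
from math import gcd


def multiplicative_order(a, M):
    """Brute-force order of a modulo M (needs gcd(a, M) = 1); used only as a reference."""
    r, x = 1, a % M
    while x != 1:
        x = (x * a) % M
        r += 1
    return r


def simulate_estimate(j, r, n):
    """Stand-in for eigenvalue estimation: j/r rounded to n bits of accuracy."""
    N = 2 ** n
    return Fraction((2 * j * N + r) // (2 * r), N)


def recover_denominator(omega, M):
    """Stand-in for the CFE step: closest fraction to omega with denominator < M."""
    return omega.limit_denominator(M - 1).denominator


def order_from_estimates(omega1, omega2, a, M):
    """Combine two estimates into a candidate order; return None if the check fails."""
    r1 = recover_denominator(omega1, M)
    r2 = recover_denominator(omega2, M)
    r = r1 * r2 // gcd(r1, r2)  # lcm(r1', r2')
    return r if pow(a, r, M) == 1 else None


a, M = 2, 21
n = 10                                 # N = 2^n = 1024 >= 2 M^2 = 882
true_r = multiplicative_order(a, M)
omega1 = simulate_estimate(2, true_r, n)   # j1 = 2
omega2 = simulate_estimate(3, true_r, n)   # j2 = 3, coprime to j1
print(order_from_estimates(omega1, omega2, a, M), true_r)
```

Here $j_1/r = 1/3$ and $j_2/r = 1/2$ each lose a factor of $r$ when reduced, but because $\gcd(j_1, j_2) = 1$ the least common multiple of the two recovered denominators restores $r$.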
2
Factoring Integers
When asked to factor some number M , we should first check if it is a prime or a prime power. This check can be done in polynomial time, and if M is a prime or prime power, we are done. If M is composite, we need only a means to find a single nontrivial factor, as we can simply divide M by this factor and repeat our factoring algorithm as needed. To find a nontrivial factor, we use two lemmas.
2.1
Necessary Lemmas
The first lemma lets us factor $M$ if we can find some $x$ such that $x^2 \equiv 1 \bmod M$ and $x \not\equiv \pm 1 \bmod M$.
Lemma 1. For any integers $x, M > 0$ such that $x^2 \equiv 1 \bmod M$ and $x \not\equiv \pm 1 \bmod M$, $\gcd(x \pm 1, M)$ is a nontrivial factor of $M$.
Proof. Since
$$x^2 \equiv 1 \bmod M, \qquad (12)$$
this implies that
$$x^2 - 1 \equiv 0 \bmod M. \qquad (13)$$
Factoring the left side gives
$$(x - 1)(x + 1) \equiv 0 \bmod M. \qquad (14)$$
Thus $M$ divides the product $(x - 1)(x + 1)$. Furthermore, we know that $x \not\equiv \pm 1 \bmod M$, so $M$ divides neither $x - 1$ nor $x + 1$. If $\gcd(x - 1, M)$ were 1, then $M$ would have to divide $x + 1$, a contradiction; hence $\gcd(x - 1, M) \neq 1$, and $\gcd(x - 1, M) \neq M$ because $M$ does not divide $x - 1$. The same argument applies with the roles of $x - 1$ and $x + 1$ exchanged. Thus $\gcd(x \pm 1, M)$ is a nontrivial factor of $M$.
The second lemma, combined with our order-finding algorithm, lets us find the $x$ that we use in Lemma 1.
Lemma 2. If $M$ has $k$ distinct prime factors, then the probability that $\mathrm{order}_M(y)$ is even and that $y^{\mathrm{order}_M(y)/2} \not\equiv \pm 1 \bmod M$ is at least $1 - 1/2^{k-1}$, where $y$ is picked uniformly at random from the set of integers modulo $M$ that are relatively prime to $M$.
Proof. We omit the proof, but it uses the Chinese remainder theorem. The full proof can be found in appendix four of [1].
2.2
Factoring Algorithm
We can now describe our factoring algorithm for an integer $M$ with $k$ distinct prime factors. If $k = 1$, then $M$ must be either prime or a prime power, and checking whether $M$ is a prime or a prime power can be done in polynomial time. Otherwise, we pick $y \in \{1, 2, \ldots, M - 1\}$ uniformly at random. If $\gcd(y, M) \neq 1$, then we are done, as the GCD is a nontrivial factor. Otherwise, we check that $\mathrm{order}_M(y)$ is even, and if so, compute $\gcd(y^{\mathrm{order}_M(y)/2} + 1, M)$ and see if this is a nontrivial factor of $M$. If so, we are done. Otherwise, we pick another $y$ and try again. An attempt can only fail if $\mathrm{order}_M(y)$ is not even or $y^{\mathrm{order}_M(y)/2} \equiv \pm 1 \bmod M$, which by Lemma 2 occurs with probability at most $1/2^{k-1}$. As $k$ is at least 2, the probability of failure is at most $1/2$.
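The reduction from factoring to order finding can be sketched classically as follows. This is a toy illustration, assuming an odd $M$ with at least two distinct prime factors; the quantum order-finding subroutine is replaced by a brute-force loop, so the sketch is only practical for very small $M$. The function names are hypothetical.

```python
import random
from math import gcd


def multiplicative_order(y, M):
    """Brute-force stand-in for the quantum order-finding subroutine (needs gcd(y, M) = 1)."""
    r, x = 1, y % M
    while x != 1:
        x = (x * y) % M
        r += 1
    return r


def find_factor(M, rng):
    """Return a nontrivial factor of an odd M that is not a prime or a prime power."""
    while True:
        y = rng.randrange(1, M)
        d = gcd(y, M)
        if d != 1:
            return d                 # lucky guess: y shares a factor with M
        r = multiplicative_order(y, M)
        if r % 2 == 1:
            continue                 # odd order: pick another y
        x = pow(y, r // 2, M)        # x^2 = 1 mod M by construction
        d = gcd(x + 1, M)
        if 1 < d < M:
            return d                 # Lemma 1 applies: x is not +-1 mod M
        # otherwise x = -1 mod M; pick another y


rng = random.Random(1)
factors = {M: find_factor(M, rng) for M in (15, 21, 91, 221)}
print(factors)
```

For example, with $M = 15$ and $y = 7$ the order is 4, $7^2 \equiv 4 \bmod 15$, and $\gcd(4 + 1, 15) = 5$.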
3
Breaking RSA
Besides purely academic interest in factoring numbers, there are also some applications for this algorithm, particularly in cryptographic systems. The best known is breaking the RSA public-key system, which is widely used in electronic commerce protocols. If Bob wants to communicate with Alice using the RSA system, Alice will generate two keys. The first is a public key, which she will publish and Bob will use to send messages. The second is a private key, which Alice shares with no one.
3.1
Construction
The private key consists of two distinct primes $p$ and $q$ and an integer $d$ such that
$$\gcd(d, (p - 1)(q - 1)) = 1. \qquad (15)$$
$p$ and $q$ are typically chosen at random in a manner that an eavesdropper would have difficulty guessing. Equation (15) implies the existence of an integer $e$ such that $de \equiv 1 \bmod (p - 1)(q - 1)$. Alice can classically compute $e$ efficiently using Euclid's algorithm. The public key that Alice publishes consists simply of $e$ and $n$, where $n = pq$.
3.2
Encryption
Suppose that Bob wants to send a message $M$ to Alice where $M \in \{0, 1, \ldots, n - 1\}$, but is concerned somebody might eavesdrop on the communication channel and discover $M$. Instead, Bob will compute the ciphertext
$$C = M^e \bmod n \qquad (16)$$
and send $C$ to Alice. Alice then computes
$$C^d \bmod n \equiv (M^e)^d \bmod n \qquad (17)$$
$$\equiv M^{1 + k(p-1)(q-1)} \bmod n \qquad (18)$$
for some integer $k$, since $de \equiv 1 \bmod (p - 1)(q - 1)$. Recall Fermat's little theorem, which states that if $p$ is a prime and $a$ is an integer coprime with $p$, then
$$a^{p-1} \equiv 1 \bmod p. \qquad (19)$$
Additionally, note that if $\gcd(p, q) = 1$ and $n = pq$, then
$$a \equiv b \bmod n \;\Leftrightarrow\; a \equiv b \bmod p \text{ and } a \equiv b \bmod q. \qquad (20)$$
From these two equations, we can simplify equation (18) to
$$C^d \bmod n \equiv M \bmod n \qquad (21)$$
$$= M. \qquad (22)$$
Going from equation (21) to (22) is trivially true as $M < n$. To illustrate RSA, consider the following figure, which shows Bob sending a message to Alice while an eavesdropper listens in on the communication channel. Bob encrypts his message $M$ as $C$ using equation (16), and sends it through the channel as shown in the figure.
[Figure 1: Bob (B) transmits C to Alice (A). Alice and Eve (E) both receive it.]
At this point Alice can recover $M$ using equation (22). Eve, having access to $C$, $e$ and $n$, can in principle recover $M$, but the only known ways to do so involve factoring $n$ into $p$ and $q$, for which no efficient classical algorithm is known. However, if Eve has a quantum computer, she can efficiently factor $n$, and thus recover $M$. Therefore, efficient factoring breaks RSA: once we factor $n$ into $p$ and $q$, we can use $e$ to compute $d$, giving us Alice's private key. This has far-reaching implications, as a malicious party with a sufficiently powerful quantum computer can break many modern electronic commerce and email encryption systems.
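The key construction, encryption, decryption, and Eve's attack can be traced on a toy instance. The sketch below assumes very small primes ($p = 61$, $q = 53$) and a private exponent $d = 2753$ chosen for illustration only; Eve's factoring step is done by trial division as a stand-in for the quantum factoring algorithm. It relies on Python 3.8 or later for modular inverses via `pow(x, -1, m)`. All function names are hypothetical.

```python
from math import gcd


def make_keys(p, q, d):
    """Alice's key generation: private (p, q, d), public (e, n)."""
    phi = (p - 1) * (q - 1)
    assert gcd(d, phi) == 1          # condition (15)
    e = pow(d, -1, phi)              # d * e = 1 mod (p-1)(q-1), via Euclid
    return e, p * q


def encrypt(message, e, n):
    """Bob's step, equation (16)."""
    return pow(message, e, n)


def decrypt(cipher, d, n):
    """Alice's step, equations (17) to (22)."""
    return pow(cipher, d, n)


def eve_attack(cipher, e, n):
    """Eve factors n (trial division here), rebuilds d, and decrypts."""
    p = next(f for f in range(2, n) if n % f == 0)
    q = n // p
    d = pow(e, -1, (p - 1) * (q - 1))
    return pow(cipher, d, n)


p, q, d = 61, 53, 2753               # toy private key (assumed values)
e, n = make_keys(p, q, d)
message = 65
cipher = encrypt(message, e, n)
print(decrypt(cipher, d, n), eve_attack(cipher, e, n))
```

Both Alice and Eve recover the original message; the only thing protecting Alice in practice is that factoring $n$ is infeasible classically at realistic key sizes.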
References
[1] Michael A. Nielsen and Isaac L. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, 2000.
10/05/10
CS 880: Quantum Information Processing
Lecture 14: Computing Discrete Logarithms Instructor: Dieter van Melkebeek
Scribe: John Gamble
Last class, we discussed how to apply order finding to efficiently factor numbers using a quantum computer. We then outlined how this capability presents problems to certain cryptographic systems, such as RSA. In this lecture, we will discuss a quantum algorithm for period finding, of which order finding is a special case. We then develop a similar algorithm to efficiently compute the discrete logarithm. Finally, we conclude by outlining Diffie-Hellman key exchange, whose security relies on the discrete logarithm being difficult to compute.
1
Period finding
Suppose that we are given some function $f : \mathbb{Z} \to \{0, 1\}^m$, and are promised that it is periodic with period $r > 0$. That is, for all $x$, we know that $f(x + r) = f(x)$. Further, we are guaranteed that no value repeats within a period, so that for all $s$ such that $0 < s < r$, $f(x) \neq f(x + s)$. Our task is to determine the value of $r$. In order to develop the algorithm, we will first assume that we are also given another integer $N$ such that $r \mid N$. We will then relax this assumption, so that $N$ need only be an upper bound for $r$.
1.1
Special case: r divides N
First, we initialize our system in the state
$$|\psi_0\rangle = |0^n\rangle |0^m\rangle, \qquad (1)$$
where $n = \lceil \log N \rceil$, so that $N$ can be represented in the first register. Next, we create a uniform superposition over $\{0, \ldots, N - 1\}$ in the first register. Operating on our state with the quantum oracle and storing its output in the second register gives us
$$|\psi_1\rangle = \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} |x\rangle |f(x)\rangle. \qquad (2)$$
Then, we observe the second register, leaving us in the state
$$|\psi_2\rangle = \frac{1}{\sqrt{N/r}} \sum_{x} |x\rangle |z\rangle, \qquad (3)$$
where $z$ is chosen uniformly at random from $\{f(0), f(1), \ldots, f(r - 1)\}$ and the sum runs over all $x$ such that $f(x) = z$. Note that there are $N/r$ such values of $x$, hence the normalization. We now call the smallest $x$ in our sum $x_0$ and note that since $f$ has period $r$, each $x$ can be written as $x = x_0 + jr$ for an integer $j$. Hence, our state is
$$|\psi_2\rangle = \frac{1}{\sqrt{N/r}} \sum_{j=0}^{N/r-1} |x_0 + jr\rangle, \qquad (4)$$
where we have suppressed the second register, as we will not need it again. Now, we apply the inverse discrete Fourier transform over $\mathbb{Z}_N$ to the state, which leads to
$$|\psi_3\rangle = F_N^{-1} |\psi_2\rangle = \frac{1}{\sqrt{N/r}} \sum_{j=0}^{N/r-1} \frac{1}{\sqrt{N}} \sum_{y=0}^{N-1} e^{-2\pi i (x_0 + jr) y / N} |y\rangle = \frac{\sqrt{r}}{N} \sum_{y=0}^{N-1} e^{-2\pi i x_0 y / N} \left( \sum_{j=0}^{N/r-1} \left( e^{-2\pi i r y / N} \right)^j \right) |y\rangle. \qquad (5)$$
Note that the second summation constitutes a geometric sum, so if $ry/N$ is an integer,
$$\sum_{j=0}^{N/r-1} \left( e^{-2\pi i r y / N} \right)^j = \frac{N}{r}, \qquad (6)$$
otherwise it is zero. Thus, we only ever observe a state $|y\rangle$ such that $ry/N \in \mathbb{Z}$. This means that we observe $y$, a multiple of $N/r$, distributed uniformly among the multiples with $0 \le y < N$. Now, we proceed as we did in order finding. We run the algorithm twice, generating two outcomes $y_1 = a(N/r)$ and $y_2 = b(N/r)$. Since $a$ and $b$ are picked uniformly at random, with good probability $\gcd(a, b) = 1$ and so $\gcd(y_1, y_2) = N/r$. Since we were given $N$, we can compute $r$ and we are done.
Next, we relax the assumption that $r \mid N$. Intuitively, instead of yielding only random multiples of $N/r$, the above algorithm will return a distribution of values that are concentrated around integer multiples of $N/r$, where the concentration will be higher for higher $N$.
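The claim that only multiples of $N/r$ are observed can be checked numerically for a small instance. The following sketch assumes $N = 16$, $r = 4$ and offset $x_0 = 1$, computes the outcome probabilities directly from the amplitudes in (5), and then recovers $r$ from two sample outcomes; the function name `outcome_probabilities` is illustrative.

```python
import cmath
from math import gcd


def outcome_probabilities(N, r, x0):
    """Probability of observing each y after the inverse DFT over Z_N is applied to state (4)."""
    xs = [x0 + j * r for j in range(N // r)]
    norm = 1 / ((N / r) ** 0.5 * N ** 0.5)
    probs = []
    for y in range(N):
        amplitude = norm * sum(cmath.exp(-2j * cmath.pi * x * y / N) for x in xs)
        probs.append(abs(amplitude) ** 2)
    return probs


N, r, x0 = 16, 4, 1
probs = outcome_probabilities(N, r, x0)
support = [y for y, pr in enumerate(probs) if pr > 1e-9]
print(support)

# Two sample outcomes y1 = a(N/r), y2 = b(N/r) with gcd(a, b) = 1.
y1, y2 = 4, 12
recovered_r = N // gcd(y1, y2)
print(recovered_r)
```

Each multiple of $N/r$ appears with probability $1/r$, independently of the offset $x_0$, which only contributes a phase.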
1.2
General analysis
We now analyze the above algorithm for arbitrary $r$ and $N$, where $N$ is an upper bound for $r$. We start from the state
$$|\psi_1\rangle = \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} |x\rangle |f(x)\rangle. \qquad (7)$$
Next, for analysis purposes we create a modified Fourier transform that makes sense for our periodic function $f$. Specifically, if we restrict ourselves to the domain $x \in \{0, 1, \ldots, r - 1\}$, we can use $|f(x)\rangle$ rather than $|x\rangle$ as our base states. This is because $f$ is not permitted to take any value twice within a single period. This modified Fourier transform is now defined as
$$|\hat{f}(y)\rangle = \frac{1}{\sqrt{r}} \sum_{x=0}^{r-1} e^{2\pi i x y / r} |f(x)\rangle, \qquad (8)$$
which is defined for $0 \le y < r$. By the general properties of the Fourier transform, we can write
$$|f(x)\rangle = \frac{1}{\sqrt{r}} \sum_{y=0}^{r-1} e^{-2\pi i x y / r} |\hat{f}(y)\rangle, \qquad (9)$$
for $0 \le x < r$. However, note that since both $f$ and the amplitudes in (9) are periodic with period $r$, expression (9) actually holds for any $x$. Thus, we can write (7) as
$$|\psi_1\rangle = \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} |x\rangle \frac{1}{\sqrt{r}} \sum_{y=0}^{r-1} e^{-2\pi i x y / r} |\hat{f}(y)\rangle = \frac{1}{\sqrt{r}} \sum_{y=0}^{r-1} \left( \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} e^{-2\pi i x y / r} |x\rangle \right) |\hat{f}(y)\rangle. \qquad (10)$$
Then, we measure the second register, which gives us a $y$ uniformly at random in $\{0, \ldots, r - 1\}$. Neglecting the second register, our state is now
$$|\psi_2\rangle = \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} e^{-2\pi i x y / r} |x\rangle. \qquad (11)$$
Next, we apply a standard inverse Fourier transform over $\mathbb{Z}_N$ to this state, in a procedure that we identify as phase estimation with $\omega = y/r$. Hence, measuring the first register gives a state $|\widetilde{y/r}\rangle$, which is concentrated around $|y/r\rangle$. Now, just as we did for order finding, we can apply the continued fraction expansion to obtain $r$ with constant probability after two independent runs. As anticipated, the only difference between this general treatment and the special case where $r \mid N$ is that we needed to use phase estimation and a continued fraction expansion. In order to see that order finding is a special case of period finding, consider the function $f(x) \equiv a^x \bmod M$, which has period equal to the order of $a$. In fact, if we draw the period finding circuit, we can identify an exact correspondence to the eigenvalue estimation and order-finding algorithms:
[Circuit diagram (12): a first register of $n$ qubits, each prepared in $|+\rangle$, and a second register of $m$ qubits prepared in $|0\rangle$ pass through the oracle $U_f$; the inverse Fourier transform $F^{-1}$ is applied to the first register, and both registers are measured. The point labeled (a) marks the state referred to below.]
Point (a) in the above diagram, corresponding to expression (11), is the same state as (5) in lecture 10, with $\omega = y/r$. Thus, our oracle $U_f$ is equivalent to running the controlled powers of $U$ in figure 2 of lecture 10 (for period finding) and in the diagram of lecture 12 (for order finding). Note also that since we never use the result of the lower register, measuring it is not necessary for the algorithm.
2
The discrete logarithm
Now, we turn our attention to computing the discrete logarithm. Suppose we are given an integer $M > 0$ and a generator $g$ of $\mathbb{Z}_M^*$, the group of all integers mod $M$ that are relatively prime to $M$, under multiplication. Further, suppose that we know the order of $\mathbb{Z}_M^*$, say $R = |\mathbb{Z}_M^*|$. (Note that this is not a further restriction, as $R$ is given by the Euler totient function, and it can be shown that the totient can be efficiently computed using the factoring algorithm.)
Note that this is a cyclic group in which every element is a power of $g$. Further, suppose we are given some $a \in \mathbb{Z}_M^*$. Then, our goal is to find the smallest integer $\ell \ge 0$ such that $g^\ell \equiv a \bmod M$. Note that checking a candidate value of $\ell$ is efficient, as it only requires modular exponentiation. However, classically the best known algorithms for finding $\ell$ are exponential. In order to construct an efficient quantum algorithm for this problem, consider the function $f : \mathbb{Z}_R \times \mathbb{Z}_R \to \mathbb{Z}_M$, $f(x, y) = a^x g^y \bmod M$. Then, we begin with the state
$$|\psi_1\rangle = \frac{1}{R} \sum_{x,y=0}^{R-1} |x\rangle |y\rangle |f(x, y)\rangle. \qquad (13)$$
Note that $f(x, y) = a^x g^y = g^{\ell x + y}$. Now, suppose we observe the third register, resulting in a value $g^b$, with $b$ chosen uniformly from $\{0, \ldots, R - 1\}$. Hence, our first two registers are in the state
$$|\psi_2\rangle = \frac{1}{\sqrt{R}} \sum_{(x,y) \in \Lambda} |x\rangle |y\rangle, \qquad (14)$$
where $\Lambda = \{(x, y) : 0 \le x, y < R \text{ and } \ell x + y \equiv b \bmod R\}$, which is just
$$|\psi_2\rangle = \frac{1}{\sqrt{R}} \sum_{x=0}^{R-1} |x\rangle |b - \ell x \bmod R\rangle. \qquad (15)$$
Next, we apply the inverse Fourier transform over $\mathbb{Z}_R$ to both registers, resulting in
$$|\psi_3\rangle = \frac{1}{R^{3/2}} \sum_{x=0}^{R-1} \sum_{\xi,\eta=0}^{R-1} e^{-2\pi i [\xi x + \eta (b - \ell x)] / R} |\xi\rangle |\eta\rangle. \qquad (16)$$
The amplitude of $|\xi\rangle |\eta\rangle$ in this state is
$$\alpha_{\xi,\eta} = \frac{1}{R^{3/2}} e^{-2\pi i \eta b / R} \sum_{x=0}^{R-1} \left( e^{-2\pi i (\xi - \ell \eta) / R} \right)^x. \qquad (17)$$
This is now a geometric sum, so we know that we will only observe states that satisfy $\xi \equiv \ell \eta \bmod R$, since otherwise the sum is zero. What this gives us is a state $|\ell \eta \bmod R\rangle |\eta\rangle$ with $\eta$ chosen uniformly from $0 \le \eta < R$. Hence, if we can compute $\eta^{-1} \bmod R$, we can extract $\ell = \xi \eta^{-1} \bmod R$ and we are done. For $\eta^{-1}$ to exist, we need $\gcd(\eta, R) = 1$, which can be shown to occur with sufficiently high probability:
$$\Pr[\gcd(\eta, R) = 1] = \frac{|\mathbb{Z}_R^*|}{R} \ge \Omega\left( \frac{1}{\log \log R} \right). \qquad (18)$$
Note that the above procedure can be generalized to other groups. Specifically, the algorithm works for any cyclic group with unique representation such that the group operation is efficiently computable. In the next section, we detail how computing the discrete logarithm can pose a threat to certain cryptographic systems.
3
Breaking Diffie-Hellman
Consider the Diffie-Hellman protocol, which enables two parties to exchange keys over an insecure channel. The protocol is as follows:
1. The two parties (Alice and Bob) agree publicly on a prime $p$ and a generator $g$ for $\mathbb{Z}_p^*$.
2. Alice picks $a$ randomly from $\mathbb{Z}_p^*$ and sends $A = g^a \bmod p$ to Bob.
3. Bob picks $b$ randomly from $\mathbb{Z}_p^*$ and sends $B = g^b \bmod p$ to Alice.
Now, the key is $K = g^{ab} \bmod p$. Note that $K = A^b \bmod p = B^a \bmod p$, so that both Alice and Bob can easily generate $K$. However, since $a$ and $b$ were never transmitted and were random, it is thought that it is classically difficult to construct $K$ due to the discrete logarithm being hard to compute. While this remains an open question, what is clear is that being able to compute the discrete logarithm efficiently trivially breaks the system. The best known classical algorithm for computing the discrete logarithm takes time $2^{O(n)}$ on $n$-bit inputs, while for factoring it takes time $2^{O(n^{1/3})}$, so classically this protocol seems more robust than RSA. However, this system of key exchange is still vulnerable to an attack by a quantum computer. This also means that certain cryptographic systems, such as ElGamal encryption, are broken by a quantum computer as well. Further, many more sophisticated key exchange protocols involving groups over elliptic curves are still vulnerable to a quantum computer, as the quantum algorithm for the discrete logarithm breaks them, too.
In the next lecture, we will discuss the hidden subgroup problem, which is sufficiently general to encompass almost all the applications of efficient quantum algorithms we have seen so far. We will then develop an efficient quantum algorithm to solve it for finitely generated Abelian groups.
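As a toy illustration of the Diffie-Hellman exchange described in this section, the sketch below runs the three protocol steps with assumed small parameters ($p = 23$, $g = 5$) and then shows how an eavesdropper who can compute discrete logarithms recovers the key. The discrete logarithm is found here by brute force, standing in for the quantum algorithm; all function names are hypothetical.

```python
import random


def diffie_hellman(p, g, rng):
    """Run the exchange; return the public messages and both parties' keys."""
    a = rng.randrange(1, p)          # Alice's secret
    b = rng.randrange(1, p)          # Bob's secret
    A = pow(g, a, p)                 # sent by Alice
    B = pow(g, b, p)                 # sent by Bob
    key_alice = pow(B, a, p)         # K = B^a mod p
    key_bob = pow(A, b, p)           # K = A^b mod p
    return A, B, key_alice, key_bob


def eve_recovers_key(p, g, A, B):
    """Eve sees only p, g, A, B; a discrete log of A yields the key."""
    a = next(l for l in range(p - 1) if pow(g, l, p) == A)
    return pow(B, a, p)


p, g = 23, 5                          # assumed toy parameters; 5 generates Z_23^*
A, B, key_alice, key_bob = diffie_hellman(p, g, random.Random(7))
print(key_alice, key_bob, eve_recovers_key(p, g, A, B))
```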
CS 880: Quantum Information Processing
10/7/2010
Lecture 15: The Hidden Subgroup Problem Instructor: Dieter van Melkebeek
Scribe: Hesam Dashti
The Hidden Subgroup Problem is a particular type of symmetry finding problem. It captures a lot of quantum algorithms and instantiations of this problem give many of the previous problems that we have seen in this course. In this lecture we see how we can cast the previous problems as instantiations of this problem. We also consider an efficient Quantum algorithm to solve the Hidden Subgroup Problem on finite Abelian groups.
1
The Hidden Subgroup Problem
Let us start with the definition of the Hidden Subgroup Problem (HSP):

Definition 1. We are given access to an oracle function $f : G \to R$, from a known group $G$ to its range $R$, such that there exists a subgroup $H \le G$ with $\forall x, y \in G$: $f(x) = f(y) \Leftrightarrow Hx = Hy$. The problem is to find a set $S$ of generators of $H$.

This means that the function $f$ returns the same value on $x$ and $y$ iff $x$ and $y$ belong to the same coset of $H$ within $G$. Note that $Hx = Hy \Leftrightarrow yx^{-1} \in H$. In general we can use right or left cosets; here we use right cosets, where $Hx = \{hx \mid h \in H\}$. Clearly, when $G$ is Abelian or $H$ is a normal subgroup of $G$, there is no difference between the right and left cosets of $H$. Before delving into efficient algorithms for finding a set $S$ of generators for $H$, a more basic question is whether a small set of generators exists. The answer is "yes" and its proof is left as an exercise.

Exercise 1. Show that for every subgroup $H$ of $G$, there exists a set of generators of size at most $\log_2 |H|$.

We will use $\langle S \rangle$ to denote the set of elements generated by $S$. Next, we show how the HSP captures many of the problems we have seen before.
1.1
Deutsch Problem
The Deutsch problem is a simple instantiation of the HSP. Let us recall the problem: for a given $f : \{0,1\} \to \{0,1\}$, decide whether $f(0) = f(1)$ or not. In order to cast this as an HSP instance, we take the group $G = \mathbb{Z}_2$ and use the function $f$ from the Deutsch problem as our oracle function. When $f$ is constant, the hidden subgroup $H$ equals $G$; otherwise it is $\{0\}$. Hence, the HSP amounts to distinguishing between $H = \{0\}$ and $H = \mathbb{Z}_2$, where $G = \mathbb{Z}_2$.
1.2
Bernstein-Vazirani Problem
Let us recall the problem: we are given $f : \{0,1\}^n \to \{0,1\} : x \mapsto a \cdot x + b$, and the goal is to find $a$. First, let us define our group $G = \mathbb{Z}_2^n$; the oracle function is $f$ itself. Now we use the defining condition $f(x) = f(y)$ to determine the subgroup $H$. In the Bernstein-Vazirani problem, $f(x) = f(y)$ iff $a \cdot (x - y) = 0$, so $H := \{z \in \mathbb{Z}_2^n \mid a \cdot z = 0\}$. Once we have a set of generators for $H$, we can find $a$ as the nontrivial solution to a homogeneous system of linear equations.
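The following Python sketch checks this casting by brute force on a small instance. The secret string `a_hidden`, the bit `b_hidden`, and the size $n = 4$ are hypothetical values chosen for illustration; a real algorithm would work from generators of $H$ rather than enumerating all of $\mathbb{Z}_2^n$.

```python
from itertools import product

n = 4
a_hidden = (1, 0, 1, 1)   # hypothetical secret string a
b_hidden = 1              # hypothetical offset bit b

def dot(u, v):
    """Inner product over Z_2."""
    return sum(x * y for x, y in zip(u, v)) % 2

def add(u, v):
    """Componentwise addition in Z_2^n (same as subtraction)."""
    return tuple((x + y) % 2 for x, y in zip(u, v))

def f(x):
    """The Bernstein-Vazirani oracle x -> a.x + b."""
    return (dot(a_hidden, x) + b_hidden) % 2

G = list(product((0, 1), repeat=n))
zero = (0,) * n

# Hidden subgroup: all z on which f agrees with f(0), i.e. a.z = 0.
H = {z for z in G if f(z) == f(zero)}

# HSP condition: f(x) = f(y) iff x and y lie in the same coset of H.
hsp_property_holds = all(
    (f(x) == f(y)) == (add(x, y) in H) for x in G for y in G
)

# Nontrivial solutions c of the homogeneous system c.z = 0 for all z in H.
candidates = [c for c in G if c != zero and all(dot(c, z) == 0 for z in H)]
```

For a nonzero secret string, the only nontrivial solution is the secret string itself.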
1.3
Simon’s Problem
In Simon's Problem we were given a function $f : \mathbb{Z}_2^n \to R$ such that $f(x) = f(y) \Leftrightarrow x + y \in \{0, s\}$. The function is either one-to-one ($s = 0$) or two-to-one ($s \neq 0$), and the goal was to find the shift $s$. To cast it as an HSP instance, we take the group $G = \mathbb{Z}_2^n$, the oracle function is $f$, and $H$ is the subgroup generated by $S = \{s\}$.
1.4
Period Finding
For a given function $f : \mathbb{Z} \to R$, we knew that $f$ is periodic with period $r$, i.e., $(\forall x)\ f(x) = f(x + r)$ and $(\forall\, 0 \le x < y < r)\ f(x) \neq f(y)$, and the goal was to find the period $r$. Hence, the group is $G = \mathbb{Z}$, the oracle is $f$, and $H$ is the subgroup generated by $r$. We note that the Order Finding Problem is a special case of the Period Finding Problem and falls in this category of HSP instances as well.
1.5
Finding Discrete Logarithms
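Let us first set up the problem by introducing notation: an integer $M > 0$, a generator $g$ for $\mathbb{Z}_M^*$, an element $a \in \mathbb{Z}_M^*$, and $R = |\mathbb{Z}_M^*|$. The goal is to find the smallest $l$ such that $g^l = a$. The function we used in the previous lecture was $f : \mathbb{Z}_R \times \mathbb{Z}_R \to \mathbb{Z}_M : (x, y) \mapsto a^x g^y \bmod M$. This function was designed such that it is constant on the cosets of $H = \langle (1, -l) \rangle$ in $G = \mathbb{Z}_R \times \mathbb{Z}_R$. Indeed, $f(x_1, y_1) = f(x_2, y_2) \Leftrightarrow l x_1 + y_1 = l x_2 + y_2 \Rightarrow l(x_1 - x_2) = -(y_1 - y_2)$, where the arithmetic in the exponents is modulo $R$. We use the function $f$ as an HSP instance and can extract $l$ from the generator $(1, -l)$ of $H$.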
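A small brute-force check of this construction, as a sketch under assumed toy parameters: $M = 23$ (prime, so $R = 22$), $g = 5$, and a hypothetical hidden exponent `l_hidden = 13`. None of these values come from the lecture. The code verifies that $f$ is constant exactly on the cosets of $\langle (1, -l) \rangle$ and reads $l$ off the element of $H$ whose first coordinate is 1.

```python
M, g = 23, 5           # toy modulus and generator of Z_23^* (assumed)
R = M - 1              # |Z_M^*| for a prime M
l_hidden = 13          # hypothetical discrete logarithm to recover
a = pow(g, l_hidden, M)

def f(x, y):
    """f(x, y) = a^x * g^y mod M, which equals g^(l*x + y) mod M."""
    return pow(a, x, M) * pow(g, y, M) % M

# Tabulate f on all of G = Z_R x Z_R.
table = {(x, y): f(x, y) for x in range(R) for y in range(R)}
pairs = list(table)

# The claimed hidden subgroup <(1, -l)>.
H = {(k % R, (-l_hidden * k) % R) for k in range(R)}

# f(p) = f(q) iff p - q lies in H.
constant_exactly_on_cosets = all(
    (table[p] == table[q]) == (((p[0] - q[0]) % R, (p[1] - q[1]) % R) in H)
    for p in pairs for q in pairs
)

# Recover H from f alone (the coset containing the identity), then read off l
# from the unique element of the form (1, -l).
H_from_f = {p for p in pairs if table[p] == table[(0, 0)]}
l_recovered = next((-y) % R for (x, y) in H_from_f if x == 1)
```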
1.6
Graph Automorphism and Graph Isomorphism
For a graph $A(V, E)$, say with $|V| = n$, a graph automorphism is a relabeling of the vertices that preserves the edges. The graph automorphism problem for $A$ is to find a set of generators for the group of automorphisms $\mathrm{Aut}(A)$ of $A$. We can cast the problem as an HSP over the group $G$ of all permutations of $\{1, 2, \ldots, n\}$, i.e., $G$ is the symmetric group $S_n$. In addition, our function $f$ should be constant on the cosets of $\mathrm{Aut}(A)$ in $G$; we can achieve this by defining $f(\pi) = \pi(A)$ for a permutation $\pi$:
$$f(\pi) = f(\sigma) \Leftrightarrow \pi(A) = \sigma(A) \Leftrightarrow (\sigma^{-1}\pi)(A) = A \Leftrightarrow \sigma^{-1}\pi \in \mathrm{Aut}(A) \Leftrightarrow \sigma\,\mathrm{Aut}(A) = \pi\,\mathrm{Aut}(A).$$
We note that in this case the group $G$ is not Abelian and the subgroup $H = \mathrm{Aut}(A)$ is not always normal. With the above definition, $f$ is constant on the left cosets of $H$. Hence, the Graph Automorphism Problem can be cast as an instantiation of the HSP. Next we consider the Graph Isomorphism Problem.

The Graph Isomorphism Problem reduces to the Graph Automorphism Problem. Here, we claim that if there is an efficient algorithm to solve the Graph Automorphism Problem, we can use it to solve the Graph Isomorphism Problem.

Two connected graphs $A_1$ and $A_2$ are isomorphic iff there exists an automorphism of $A_1 \,\dot\cup\, A_2$ that maps a vertex from $A_1$ to a vertex from $A_2$. Given a set of generators for $\mathrm{Aut}(A_1 \,\dot\cup\, A_2)$, we can check the latter condition by verifying that there is a generator that maps a vertex from $A_1$ to a vertex from $A_2$.

In the general case, in which $A_1$ and $A_2$ are not necessarily connected, we can add a new node to each graph and connect it to every other vertex of that graph, which makes each graph connected. The new node then has maximum degree in its graph. Next, by adding an extra node to each graph that is connected only to the new node, we can also handle the case in which the graphs are fully connected. Then, if we can find a permutation that maps a vertex of one graph to a vertex of the other and also preserves edges, the graphs are isomorphic.
Figure 1: The extra nodes are shown with filled circles and new edges are shown by curves. We only showed some of the new edges to show general idea of using extra vertexes.
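The reduction for connected graphs can be illustrated with a brute-force Python sketch. It enumerates all automorphisms of the disjoint union instead of working from a generating set, so it only makes sense for tiny graphs; the function names and the example graphs are assumptions made for illustration.

```python
from itertools import permutations

def automorphisms(n, edges):
    """Yield every permutation of range(n) that maps the edge set onto itself."""
    edge_set = {frozenset(e) for e in edges}
    for perm in permutations(range(n)):
        if {frozenset((perm[u], perm[v])) for u, v in edges} == edge_set:
            yield perm

def isomorphic_via_union(n1, edges1, n2, edges2):
    """Decide isomorphism of two connected graphs via Aut of their disjoint union.

    Vertices of the second graph are shifted by n1. The graphs are isomorphic
    iff some automorphism of the union sends a vertex of the first graph to a
    vertex of the second.
    """
    union_edges = list(edges1) + [(u + n1, v + n1) for u, v in edges2]
    return any(
        any(perm[v] >= n1 for v in range(n1))
        for perm in automorphisms(n1 + n2, union_edges)
    )

# Example graphs on 3 vertices (assumed for illustration).
path = [(0, 1), (1, 2)]
relabeled_path = [(0, 2), (2, 1)]
triangle = [(0, 1), (1, 2), (0, 2)]

same = isomorphic_via_union(3, path, 3, relabeled_path)
different = isomorphic_via_union(3, path, 3, triangle)
```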
2
Efficient Quantum Algorithm for The Hidden Subgroup Problem Over Finite Abelian Groups
Each finite Abelian group is isomorphic to a direct sum of cyclic groups:
$$G = \bigoplus_{j=1}^{k} \mathbb{Z}_{N_j}.$$
In Lecture 8 we developed the Fourier transform over such $G$, which maps the standard ($\delta$) basis to an orthonormal basis and maps convolutions to pointwise products. We use the Fourier transform here because the Fourier transform of a group interacts very nicely with the symmetries in the group. In particular, it works perfectly for a coset state $|Hg\rangle$, which is the uniform superposition over all elements of the coset $Hg$:
$$|Hg\rangle = \frac{1}{\sqrt{|H|}} \sum_{h \in H} |hg\rangle.$$
Now, let us compute the Fourier transform of a coset state:
$$F|Hg\rangle = \frac{1}{\sqrt{|H|}} \sum_{h \in H} \frac{1}{\sqrt{|G|}} \sum_{y \in G} \chi_y(hg)\, |y\rangle \qquad (1)$$
$$= \frac{1}{\sqrt{|H||G|}} \sum_{y \in G} \chi_y(g) \Big( \sum_{h \in H} \chi_y(h) \Big) |y\rangle, \qquad (2)$$
where $\chi_y(h) = e^{2\pi i \sum_{j=1}^{k} \frac{y_j h_j}{N_j}}$.

Exercise 2. Show that
$$\sum_{h \in H} \chi_y(h) = \begin{cases} |H| & \text{if } y \in H^\perp \\ 0 & \text{otherwise,} \end{cases}$$
where
$$H^\perp = \Big\{ y \in G \;\Big|\; (\forall h \in H)\ \sum_{j=1}^{k} \frac{y_j h_j}{N_j} \in \mathbb{Z} \Big\}.$$

Plugging Exercise 2 into Equation (2) gives us:
$$F|Hg\rangle = \sqrt{\frac{|H|}{|G|}} \sum_{y \in H^\perp} \chi_y(g)\, |y\rangle. \qquad (3)$$
This is the reason for using the Fourier transform: we started with the coset state $|Hg\rangle$, and by applying the Fourier transform we get a superposition over $H^\perp$ in which all amplitudes have equal magnitude. In particular, if $g = 0$ we get the uniform superposition over $H^\perp$.

The quantum algorithm for solving the HSP over $G$ starts with the uniform superposition over $G$:
$$|G\rangle = \frac{1}{\sqrt{|G|}} \sum_{x \in G} |x\rangle.$$
By applying our black box for $f$ we obtain
$$\frac{1}{\sqrt{|G|}} \sum_{x \in G} |x\rangle |f(x)\rangle.$$
Next, we observe the second register $|f(x)\rangle$, which leaves us in the first register the coset state $|Hg\rangle$ for a uniformly at random chosen $g \in G$. The next step is applying $F^{-1}$, which by Equation (3) gives us (up to complex conjugation of the phases $\chi_y(g)$, which does not affect the measurement statistics):
$$\sqrt{\frac{|H|}{|G|}} \sum_{y \in H^\perp} \chi_y(g)\, |y\rangle.$$
Then, we observe the first register to get a uniform $y \in H^\perp$.

Lemma 1. When we run these steps $\log |H^\perp|$ times and collect the $y$'s, then
$$\Pr[\langle y_1, y_2, \ldots, y_{\log |H^\perp|} \rangle = H^\perp] \ge \delta^*,$$
where $\delta^*$ is a universal positive constant.

The proof is the same as what we explained for Simon's problem in Lecture 5. Once we have a set of generators for $H^\perp$, we can use it to find $(H^\perp)^\perp = H$, the hidden subgroup, by solving a linear system of modular equations. This way we solve the HSP over a finite Abelian group $G$ in time polylogarithmic in $|G|$. To solve the problem over finitely generated Abelian groups that are not finite, we perform a similar process as we did for the Period Finding problem.
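The classical post-processing can be illustrated with a small Python sketch. It does not simulate the quantum circuit: it simply draws uniform samples from $H^\perp$, as the measurement would, and recovers $H$ as the orthogonal complement of the subgroup they generate, by brute-force enumeration instead of solving modular linear equations. The group $\mathbb{Z}_4 \times \mathbb{Z}_6$, the generator $(2, 3)$, the sample count, and all function names are assumptions chosen for illustration.

```python
import random
from itertools import product
from math import prod

MODULI = (4, 6)                                   # G = Z_4 x Z_6 (assumed example)
G = list(product(*(range(n) for n in MODULI)))
P = prod(MODULI)

def pairing_is_integer(y, h):
    """True iff sum_j y_j h_j / N_j is an integer (checked in exact arithmetic)."""
    return sum(yj * hj * (P // nj) for yj, hj, nj in zip(y, h, MODULI)) % P == 0

def perp(subset):
    """All y in G whose pairing with every element of subset is an integer."""
    return {y for y in G if all(pairing_is_integer(y, h) for h in subset)}

def add(x, y):
    return tuple((a + b) % n for a, b, n in zip(x, y, MODULI))

def generated_subgroup(generators):
    """Closure of the generators under addition (enough in a finite group)."""
    identity = tuple(0 for _ in MODULI)
    group, frontier = {identity}, [identity]
    while frontier:
        x = frontier.pop()
        for s in generators:
            z = add(x, s)
            if z not in group:
                group.add(z)
                frontier.append(z)
    return group

H = generated_subgroup([(2, 3)])                  # hypothetical hidden subgroup
H_perp = perp(H)

# Stand-in for the quantum measurements: uniform samples from H_perp.
rng = random.Random(0)
samples = [rng.choice(sorted(H_perp)) for _ in range(8)]

# The samples generate a subgroup of H_perp, so its complement contains H,
# with equality once the samples generate all of H_perp.
recovered = perp(generated_subgroup(samples))
```

The subgroup `recovered` always contains $H$; by Lemma 1 it equals $H$ with constant probability once enough samples have been collected.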
3
Hidden Subgroup Problem for Non-Abelian Groups
In general, the HSP over any finite group $G$ can be solved using only $\mathrm{polylog}(|G|)$ many queries to the black box $f$. This is something we will prove in the next homework. However, this does not mean that the overall algorithm runs in time polylogarithmic in $|G|$. In fact, we only know how to achieve the latter for a few non-Abelian finite groups. We do not know how to do it for the following groups, for which efficient solutions to the HSP would have interesting consequences.
1. Dihedral groups ($D_N$): The group $D_N$ consists of the symmetries of a regular $N$-gon (rotations and reflections). A solution to the HSP over the dihedral groups would allow us to solve the $g(N)$-unique SVP (Shortest Vector Problem) for polynomial functions $g(N)$. The SVP is a lattice problem on a real $N$-dimensional space with $N$ basis vectors, where every element in the lattice is a linear combination of the basis vectors with integer coefficients. The SVP asks for a shortest nonzero vector in the lattice. The SVP does not have a unique solution because for every vector in the lattice its negation is also in the lattice. The unique SVP is a promise version of SVP in which we are told that the solution is unique up to sign. The term "$g(N)$-unique" means that the second shortest lattice vector up to sign is of length at least $g(N)$ times the length of the shortest nonzero lattice vector. Solving the $g(N)$-unique SVP for polynomial $g(N)$ is considered hard in the classical setting, and is used to design lattice-based cryptosystems. Efficiently solving the HSP over the dihedral group would break those cryptosystems.
2. Symmetric group: As we saw, Graph Isomorphism reduces to the HSP over the symmetric group, and the coset states $|Hg\rangle$ contain enough information to solve the problem in principle. However, it turns out that Fourier sampling loses that information. More specifically, there are positive and negative instances of Graph Isomorphism for which the distributions that result after Fourier sampling are exponentially close. This means we would need exponentially many runs in order to have a good chance of distinguishing the positive from the negative instances.

Now we are done with the Hidden Subgroup Problem; in the next lecture, we will talk about quantum walks.
CS 880: Quantum Information Processing
10/14/2010
Lecture 16: Quantum Walks Instructor: Dieter van Melkebeek
Scribe: Brian Nixon
In this lecture we start our examination of quantum walks, the quantum equivalent of classic random walks. There are two types of quantum walk: discrete time (which is mainly considered by computer scientists) and continuous time (which is mainly considered by physicists). Here we focus on the former and leave the latter for future lectures.
1
Classic Random Walks
A random walk is performed on a graph. Given a starting vertex $s$, we move to a randomly chosen neighboring vertex. From there we can step again to a new neighbor and so on. An example is provided by a man who steps out onto the street and walks either one block north or one block south depending on the flip of a coin (here the graph is a line). We can ask questions about his position after $t$ steps. Random walks can be modelled as Markov chains. There are two big motivations for random walks.
1. We can model random processes with a walk (e.g. there is an algorithm to solve 2-SAT that can be analyzed as a random walk on a line).
2. Random walks also form the core of some algorithms (e.g. sampling methods for statistical physics).
For such an algorithm we might ask under what conditions a stationary distribution exists. If it exists, what is required for it to be unique, and how quickly do walks converge to it (if ever)?

We first consider regular graphs (all vertices have the same degree). Let $A$ be the normalized adjacency matrix. The distribution at step $k + 1$ is the vector $p_{k+1} = A p_k$, where $p_k = (a_s)_{s \in G}$ with $\sum_s a_s = 1$ and $a_s \ge 0$ for all $s$. We set our initial position vector $p_0$ centered on a single node $s$, i.e., $a_s = 1$ and $a_t = 0$ for $t \neq s$. A stationary distribution is represented by a vector $v$ such that $Av = v$.

Exercise 1. Prove the following propositions for the above $A$.
1. The uniform distribution is an eigenvector with eigenvalue 1.
2. The eigenvalue 1 has multiplicity greater than one iff $G$ is disconnected.
3. $-1$ is an eigenvalue iff $G$ is bipartite.

Definition 1. The spectral gap is $\delta = \min\{1 - |\lambda| : \exists x \text{ not a multiple of } u \text{ s.t. } Ax = \lambda x\}$, where $u$ is the uniform distribution.

Finding $\delta$ amounts to finding the eigenvalue of greatest magnitude less than 1. Note that if $-1$ is not an eigenvalue and there is only one eigenvector for eigenvalue 1, then we will have convergence to a unique stationary distribution at a speed dictated by the spectral gap.
Given an initial vector $p$, we want to bound $A^k p - u = A^k (p - u)$. We note that $A$ is a symmetric matrix, so we can build an orthonormal basis of eigenvectors $\{e_i\}$ that includes the normalized uniform distribution $\sqrt{N} u = e_1$, given that it is an eigenvector for eigenvalue 1. Now for any probability distribution $p$, $\langle p, u \rangle$ equals the sum over each entry multiplied by $\frac{1}{N}$. As the inner product is bilinear we get $\langle p - u, u \rangle = \langle p, u \rangle - \langle u, u \rangle = \frac{1}{N} - \frac{1}{N} = 0$. Since $p - u$ has no component in the $u$ direction we get
$$A^k (p - u) = A^k \sum_{i=2}^{N} \gamma_i e_i = \sum_{i=2}^{N} \gamma_i \lambda_i^k e_i,$$
where $\lambda_i$ is the eigenvalue corresponding to $e_i$. Thus
$$\|A^k p - u\|_2^2 = \sum_{i=2}^{N} |\gamma_i \lambda_i^k|^2 \le (1 - \delta)^{2k} \sum_{i=2}^{N} |\gamma_i|^2 = (1 - \delta)^{2k} \|p - u\|_2^2 \le (1 - \delta)^{2k}.$$
This last step is reached by noting that orthogonality gives us $\|p - u\|_2^2 + \|u\|_2^2 = \|p\|_2^2$, and as this is the sum of probabilities squared (all elements of $[0, 1]$) we must have $\|p\|_2 \le 1$. Considering the 1-norm, we want
$$\|A^k p - u\|_1 \le \sqrt{N}\, \|A^k p - u\|_2 \le \sqrt{N} (1 - \delta)^k \le \epsilon.$$
To guarantee this we need to choose $k \ge \frac{\log(\sqrt{N}/\epsilon)}{\log(1/(1 - \delta))}$ for our running time. This is $\Theta\big(\frac{1}{\delta} \log(\sqrt{N}/\epsilon)\big)$ for small $\delta$. There are graphs where this bound is tight. The important part to take away is that the dependency on the spectral gap is $\frac{1}{\delta}$.
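The bound can be checked numerically on a small example. The sketch below uses the cycle on $N = 7$ vertices (odd, so the graph is not bipartite) and relies on the standard fact that the normalized adjacency matrix of the $N$-cycle has eigenvalues $\cos(2\pi j / N)$; the choice of graph, of $\epsilon = 0.01$, and the function names are assumptions made for illustration.

```python
import math

def walk_step(p):
    """One step p -> A p of the simple random walk on a cycle."""
    n = len(p)
    return [(p[(i - 1) % n] + p[(i + 1) % n]) / 2 for i in range(n)]

def cycle_spectral_gap(n):
    """Spectral gap 1 - max |lambda| over the non-uniform eigenvalues cos(2 pi j / n)."""
    return 1 - max(abs(math.cos(2 * math.pi * j / n)) for j in range(1, n))

n, eps = 7, 0.01
delta = cycle_spectral_gap(n)

# Number of steps prescribed by the analysis above.
k_needed = math.ceil(math.log(math.sqrt(n) / eps) / math.log(1 / (1 - delta)))

p = [1.0] + [0.0] * (n - 1)      # start concentrated on a single vertex
bound_holds = True
for k in range(k_needed + 1):
    dist2 = math.sqrt(sum((x - 1 / n) ** 2 for x in p))
    # Check ||A^k p - u||_2 <= (1 - delta)^k, with a small floating-point slack.
    bound_holds = bound_holds and dist2 <= (1 - delta) ** k + 1e-12
    if k < k_needed:
        p = walk_step(p)

# After k_needed steps the 1-norm distance to uniform is at most eps.
l1_error = sum(abs(x - 1 / n) for x in p)
```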
2
Quantum Model
When adapting to the quantum setting, we again need to pick neighbors at random. If we use the adjacency structure directly we get
$$|v\rangle \mapsto |N_v\rangle = \frac{1}{\sqrt{|N_v|}} \sum_{(v,w) \in E} |w\rangle,$$
where $N_v$ is the neighborhood of $v$. This is not unitary (consider a square graph, where opposite vertices have the same neighborhood and hence are mapped to the same state). Our solution is to distinguish between picking a neighbor and moving to it. Let our state be $|v, e\rangle$ where $e$ is an edge containing $v$. In phase I, we perform a unitary operation such that an observation afterwards would yield a random choice of edge. Now there is no interference, as we have saved the $v$ information and the states that would interfere operate in different subspaces. This is enough to be unitary. Call this transition $C_v$. In phase II, we implement $S|v, e\rangle = |w, e\rangle$ where $e = (v, w)$. Note that this transition is its own inverse. If we observe after each pair of steps, this method provides a classic random walk.

We have options in choosing $C_v$. If the degree is two (i.e. a walk on a line or circle) we could use the Hadamard matrix. This approach can be generalized to higher values of the degree, where each entry has the same amplitude at a different phase. Another option would be to ask for two amplitudes, one for the edge to change and one for the edge to go to itself. This is modelled by a square matrix of side length $N = |N_v|$ with value $a$ on the diagonal and value $b$ off the diagonal. These variables are subject to the conditions $a^2 + (N - 1) b^2 = 1$ and $2ab + (N - 2) b^2 = 0$ by our orthonormality requirement. Taking $b = 0$ returns the trivial case $\pm I$. Another solution is $b = \pm \frac{2}{N}$, $a = \pm (\frac{2}{N} - 1)$ (with matching signs). With the positive signs, this choice of $C_v$ has one eigenvector for eigenvalue 1 (the uniform vector) and $N - 1$ eigenvectors for eigenvalue $-1$. This corresponds to the Grover iterate, where we are reflecting around $\sum_{w \in N_v} |w\rangle$. This suggests a connection between quantum walks and quantum search that we will explore in a bit. Note that when $N = 2$ this corresponds to movement in one direction only.

Compare classic and quantum walks in one dimension. For the classic model, what we get will resemble a normal distribution with width $\Theta(\sqrt{k})$. For the quantum model, the resulting distribution after observation will depend on the coin operation $C_v$ we chose. If the gate was symmetric we can expect to have two bulges in the distribution that are $\Theta(k)$ apart. This is why we expect to find answers quicker with quantum walks.
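Both claims can be explored with a short Python sketch. The first part checks that the matrix with $a = \frac{2}{N} - 1$ on the diagonal and $b = \frac{2}{N}$ elsewhere has orthonormal rows. The second part simulates a standard coined Hadamard walk on a line, where the coin register directly encodes "step left" or "step right"; this is an assumed, commonly used formulation rather than the exact edge-based encoding of the text. The symmetric initial coin state and all names are likewise illustrative assumptions.

```python
import math

def grover_coin(n):
    """Matrix with a = 2/n - 1 on the diagonal and b = 2/n off the diagonal."""
    a, b = 2 / n - 1, 2 / n
    return [[a if i == j else b for j in range(n)] for i in range(n)]

def is_orthogonal(mat, tol=1e-12):
    """Check that the rows of a real matrix are orthonormal."""
    n = len(mat)
    return all(
        abs(sum(mat[i][k] * mat[j][k] for k in range(n)) - (1.0 if i == j else 0.0)) < tol
        for i in range(n) for j in range(n)
    )

def hadamard_walk(steps):
    """Return (total probability, standard deviation of position) after `steps` steps."""
    h = 1 / math.sqrt(2)
    # amp[(position, coin)]; coin 0 means step left, coin 1 means step right.
    amp = {(0, 0): h, (0, 1): 1j * h}        # symmetric initial coin state
    for _ in range(steps):
        new = {}
        for (x, c), val in amp.items():
            for c2 in (0, 1):
                # Hadamard entry: -1/sqrt(2) only for |1> -> |1>.
                coef = -h if (c == 1 and c2 == 1) else h
                x2 = x - 1 if c2 == 0 else x + 1
                new[(x2, c2)] = new.get((x2, c2), 0) + coef * val
        amp = new
    prob = {}
    for (x, _), val in amp.items():
        prob[x] = prob.get(x, 0.0) + abs(val) ** 2
    total = sum(prob.values())
    mean = sum(x * q for x, q in prob.items())
    variance = sum((x - mean) ** 2 * q for x, q in prob.items())
    return total, math.sqrt(variance)

coin_is_unitary = is_orthogonal(grover_coin(5))
k = 100
total_probability, quantum_spread = hadamard_walk(k)
classical_spread = math.sqrt(k)   # standard deviation of the classical walk
```

After $k = 100$ steps, the quantum spread is several times larger than the classical value $\sqrt{k} = 10$, consistent with spreading linearly in $k$.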
3
Search
Suppose we have a black box on the vertices of a graph that separates “good” vertices from “bad” vertices. Given a random starting vertex, we want to walk until we find a good vertex.

Definition 2. The hitting time is the number of steps at which the probability of hitting a good vertex rises above a certain threshold. Alternatively, we can consider it the expected number of steps before getting our first good hit.

Theorem 1. Let $G$ be a regular graph with spectral gap $\delta$. Let $\epsilon$ be the fraction of good vertices. Classically the hitting time is $O(\frac{1}{\delta\epsilon})$, and with a quantum random walk based on Grover diffusion it is $O(\frac{1}{\sqrt{\delta\epsilon}})$. With a quantum algorithm, the hitting time will depend on the choice of $C_v$.

Proof. Coming in the next lecture.
3.1
Applications
1. Grover search. Here $G$ is the complete graph on $N = 2^n$ vertices, as we allow transitions from each state to all other states uniformly at random. We have $\delta \sim 1$ and $\epsilon = \frac{t}{N}$, where $t$ is the number of good vertices, so we get the expected speedup to $O(\sqrt{N/t})$.
Exercise 2. Show $\delta = 1 - \frac{1}{N-1}$ and determine the eigenstructure for Grover search.
2. Spatial search. This differs from Grover search in that we only allow certain transitions. Examples include grids (1-dimensional, 2D, 3D, ...). This appears in cases such as database search.
Exercise 3. For the circle graph, determine the eigenvalues and show that $\delta = O(\frac{1}{N^2})$ and $\epsilon = \frac{1}{N}$.
Classically, spatial search on 1-dimensional graphs takes $O(N^2)$; quantum takes $O(N)$. In 2D, classic takes $O(N \log N)$ and quantum takes $O(\sqrt{N \log N})$. For 3D and higher, classic is $O(N)$ and quantum is $O(\sqrt{N})$.
3. Element distinctness problem: given $N$ elements, answer whether they are distinct by returning a boolean value (we can also look at a search version which returns a matching pair if one exists). Our initial attempt will be to pick $\ell$ elements uniformly at random, test that no two elements within the subset collide, and use Grover to test that no outside element collides with the subset. The number of queries is then $\ell + \sqrt{N}$, with probability of success $\ge \frac{\ell}{N}$. We can optimize it by choosing $\ell = \sqrt{N}$, but this leads to an expected linear running time, which does not offer any improvement over the classic model. We can modify this idea to boost the success rate using amplitude amplification. The expected number of trials then goes from $\frac{N}{\ell}$ to $\sqrt{\frac{N}{\ell}}$. The choice $\ell = \sqrt{N}$ is still optimal, so we get a running time of $O(N^{3/4})$. We can further modify our algorithm by not clearing all $\ell$ elements every time we start over, but instead swapping out a single element chosen uniformly at random. This corresponds to a walk on a Johnson graph, a regular graph with $|V| = \binom{N}{\ell}$ and degree $\ell(N - \ell)$, as we have $\ell$ elements in our current subset node to switch and $N - \ell$ options to switch to. It is well known for the Johnson graph that $\epsilon = \binom{N-2}{\ell-2} / \binom{N}{\ell} = \Theta(\frac{\ell^2}{N^2})$ and $\delta = \frac{N}{\ell(N - \ell)}$. So the cost is $O(\frac{1}{\sqrt{\delta\epsilon}}) + \ell = O(\frac{N}{\sqrt{\ell}}) + \ell$. The optimal choice is $\ell = \Theta(N^{2/3})$, yielding an $O(N^{2/3})$ running time. Classically we do not get the square root, so this method yields an $O(N)$ running time.

We note that this method offers another way of seeing the solution to question 6 in assignment 1, which was stated: given a function $f$ on $\{0, 1\}^n$ that is two-to-one, give an algorithm that returns two inputs that map to the same element in $O(\sqrt[3]{N})$ time. By the birthday paradox we have a high probability of finding a collision amongst $\sqrt{N}$ elements, so let $N' = \sqrt{N}$. Running the element distinctness algorithm returns an answer in $O((N')^{2/3}) = O(N^{1/3})$ time. This running time matches the known lower bound for this problem.
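The optimization over $\ell$ in the Johnson-graph walk can be checked numerically. The sketch below plugs the exact expressions $\epsilon = \frac{\ell(\ell-1)}{N(N-1)}$ and $\delta = \frac{N}{\ell(N-\ell)}$ into the cost $\frac{1}{\sqrt{\delta\epsilon}} + \ell$ with all hidden constants set to 1, which is an assumption made for illustration, as are the value $N = 8000$ and the function names.

```python
import math

def johnson_parameters(n, l):
    """Spectral gap and fraction of good vertices for subsets of size l out of n."""
    eps = l * (l - 1) / (n * (n - 1))     # equals C(n-2, l-2) / C(n, l)
    delta = n / (l * (n - l))
    return delta, eps

def quantum_cost(n, l):
    """Cost 1/sqrt(delta * eps) + l, with the constants in the O() taken as 1."""
    delta, eps = johnson_parameters(n, l)
    return 1 / math.sqrt(delta * eps) + l

n = 8000
best_l = min(range(2, n), key=lambda l: quantum_cost(n, l))
best_cost = quantum_cost(n, best_l)
scale = n ** (2 / 3)                      # predicted order of the optimal l
```

The minimizing $\ell$ is within a constant factor of $N^{2/3}$, and the resulting cost is far below the classical $O(N)$.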
10/18/10
CS 880: Quantum Information Processing
Lecture 17: Classical vs. Quantum Hitting Time Instructor: Dieter van Melkebeek
Scribe: Cong Han Lim
In the previous lecture, we introduced the idea of using quantum walks in search algorithms to obtain quadratic speedups over classical random walks. This was done by using a theorem that related the spectral gap of a transition matrix with the classical and quantum hitting times:

Theorem 1. Let $G$ be a regular graph with spectral gap $\delta$. Let $\epsilon$ represent the fraction of good vertices. The expected number of steps to first reach a good vertex is $O(\frac{1}{\delta\epsilon})$ in the classical random walk, and $O(\frac{1}{\sqrt{\delta\epsilon}})$ in the quantum random walk (based on Grover's diffusion operator).

Here we will prove a weaker form of the theorem. For a proof of Theorem 1, refer to [4].
1
Main Theorem and Proof
The version of the theorem we will prove in this lecture will allow us to distinguish between the cases where our graph $G$ has either no good vertices or at least an $\epsilon$-fraction of good vertices. We obtain the proof by performing a simpler analysis of the same algorithm used in the full version. This weaker formulation still allows us to obtain certain desired results. For example, in the element distinctness problem, we can still detect whether we have duplicate elements, but we cannot return an offending pair.
1.1
Setup for Analysis
Let $A$ be the matrix describing the random walk as a Markov process where, without loss of generality, the vertices are ordered such that all the bad vertices occur before the good ones. Since we want to compute the expected time to first hit a good vertex, we will consider the variation where we never leave a good vertex once we reach it. Let $A'$ be the matrix obtained by setting the transitions for the good vertices to be the identity transition. We can write this matrix as
$$A' = \begin{pmatrix} A_{BB} & 0 \\ A_{GB} & I \end{pmatrix},$$
where $A_{BB}$ denotes the transitions between the bad states and $A_{GB}$ denotes the transitions from the bad to the good states. Given the transition matrix $A'$, we never leave a good vertex. (In the literature, $A'$ is known as the transition matrix for an absorbing Markov chain, and the bad and good vertices are known as leaking and absorbing states respectively.)
1.2
The Classical Random Walk
We begin the classical random walk with a uniform distribution over the states. After taking one step, if we hit a good vertex we are done. Otherwise, by conditional probability, we are left with a uniform distribution over the bad vertices. From this position we analyze the probability of success (i.e. hitting a good vertex) within $k$ steps:
$$\Pr[\text{we do not succeed in the next } k \text{ steps}] = \frac{1}{|B|} \sum_{v, w \in B} \big((A')^k\big)_{vw} = \Big(\frac{1}{\sqrt{|B|}} \cdot \mathbf{1}_B\Big)^T (A')^k \Big(\frac{1}{\sqrt{|B|}} \cdot \mathbf{1}_B\Big) \le \|(A_{BB})^k\| \le \|A_{BB}\|^k.$$
This gives us
$$E[\text{number of steps needed to first success}] = \sum_{k=0}^{\infty} \Pr[\text{first success occurs after the } k\text{th step}] \le \sum_{k=0}^{\infty} \|A_{BB}\|^k = \frac{1}{1 - \|A_{BB}\|}. \qquad (1)$$

Lemma 1. $\|A_{BB}\| \le 1 - \frac{\delta\epsilon}{2}$.

Proof. (Outline) Consider a vector $x$ and the norm $\|A_{BB} x\|$. We can bound this norm by $\|A \tilde{x}\|_2$, where the vector $\tilde{x}$ is simply the vector $x$ padded with zeroes for the coordinates corresponding to the good vertices. By the spectral theorem, we can decompose $\tilde{x}$ into two components: the component parallel to the uniform distribution (the all-ones vector, which has eigenvalue 1) and the component consisting of linear combinations of eigenvectors perpendicular to the uniform distribution. The spectral gap gives a lower bound on the reduction in the perpendicular component each time we apply the matrix $A$, and repeated applications of $A$ would reduce the parallel component as certain states get weeded out. The analysis here is similar to the analysis done in Lecture 16.

Combining (1) with Lemma 1 bounds the expected hitting time by $\frac{2}{\delta\epsilon} = O(\frac{1}{\delta\epsilon})$.
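The quantities in this argument can be computed explicitly for a small graph. The sketch below uses the cycle on 9 vertices with a single good vertex; the graph, the good set, the truncation of the infinite sum, and the use of power iteration to estimate $\|A_{BB}\|$ are all assumptions made for illustration. The spectral gap again uses the standard eigenvalues $\cos(2\pi j / n)$ of the cycle.

```python
import math

def mat_vec(m, x):
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]

n = 9
good = {0}                                   # hypothetical set of good vertices
bad = [v for v in range(n) if v not in good]

# Normalized adjacency matrix of the n-cycle, restricted to the bad vertices.
A = [[0.5 if (i - j) % n in (1, n - 1) else 0.0 for j in range(n)] for i in range(n)]
A_BB = [[A[v][w] for w in bad] for v in bad]

# Expected hitting time from the uniform distribution over bad vertices:
# sum over k >= 0 of Pr[no success within k steps], truncated after many terms.
dist = [1 / len(bad)] * len(bad)
expected_steps = 0.0
for _ in range(5000):
    expected_steps += sum(dist)
    dist = mat_vec(A_BB, dist)

# Estimate ||A_BB|| by power iteration (A_BB is symmetric here).
x = [1.0] * len(bad)
norm_estimate = 0.0
for _ in range(2000):
    y = mat_vec(A_BB, x)
    y_norm = math.sqrt(sum(t * t for t in y))
    norm_estimate = y_norm / math.sqrt(sum(t * t for t in x))
    x = [t / y_norm for t in y]

delta = 1 - max(abs(math.cos(2 * math.pi * j / n)) for j in range(1, n))
eps = len(good) / n
```

In this example the expected hitting time is 15 steps, below both the bound $\frac{1}{1 - \|A_{BB}\|}$ from (1) and the bound $\frac{2}{\delta\epsilon}$ from Lemma 1.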
1.3
Analysis of Quantum Random Walk
Recall that for the quantum random walk, we act on directed edges $|v, w\rangle$ instead of vertices. Every step consists of two stages:
1. Coin flip stage
2. Swap stage
We will begin by analysing the coin flip stage. Note that the coin flip (Grover diffusion) operator $C_v$ on vertex $v$ is the reflection around the state
$$|N_v\rangle = \frac{1}{\sqrt{|N_v|}} \sum_{(v,w) \in E} |w\rangle,$$
where $N_v$ denotes the set of neighbors of $v$. Let $\pi_v$ denote the projection onto $|N_v\rangle$. Then we can write $C_v = 2\pi_v - I$. The general coin flip operator for every vertex can be denoted by
$$C : |v, w\rangle \mapsto (I \otimes C_v) |v, w\rangle.$$
As for the swap stage, we have the swap operator $S$ that gives $S|v, w\rangle = |w, v\rangle$. By introducing the operator $\pi : |v, w\rangle \mapsto (I \otimes \pi_v) |v, w\rangle = |v, \pi_v w\rangle$, we can combine the two stages into a single operator:
$$U = SC = S(2\pi - I).$$
We want to make an observation after every application of $U$ to check whether we have hit a good vertex. Directly observing the register describing our superposition over the edges will cause the state to collapse, reducing our algorithm to the classical random walk. This problem can be avoided by using an oracle query gate with an ancilla qubit to store the query result. If we observe a 1 in the ancilla qubit, we cause our superposition to collapse to a superposition over only the good vertices, and the opposite happens when we observe a 0.

We will now describe the quantum walk algorithm. We initialize the qubits to represent a uniform superposition over the graph. Let $|\psi_v\rangle = \frac{1}{\sqrt{N_v}} \sum_w A_{wv} |w\rangle$ (reading $A_{wv}$ here as the 0/1 adjacency indicator and $N_v$ as the number of neighbors of $v$, this is the uniform superposition over the neighbors of $v$). Our initial state is
$$|\psi\rangle = \frac{1}{\sqrt{N}} \sum_v |v, \psi_v\rangle.$$
We first make an oracle query over this superposition and observe the ancilla qubit. If we observe a 1, we are done. Otherwise, we are left with a uniform superposition over the bad vertices:
$$|\psi'\rangle = \frac{1}{\sqrt{|B|}} \sum_{v \in B} |v, \psi_v\rangle.$$
We now consider the separate cases where our graph either has no good vertices or an $\epsilon$-fraction of them. If our graph has no good vertices, note that applying $U$ has no effect on $|\psi'\rangle$, so the resulting vector always corresponds to the eigenvalue 1. If our graph has at least one good vertex, then $|\psi'\rangle$ will have no component corresponding to the eigenvalue 1. This allows us to distinguish between the two cases by applying eigenvalue estimation on $U$. When we compute the phase of the eigenvalues of $U$, in the case with no good vertices we will obtain a phase of zero, whereas in the other case we will obtain a nonzero phase. Hence, we can apply phase estimation with sufficient precision (depending on the spectral gap) to differentiate between the cases.

In order to perform this process, it is enough to have a nonzero lower bound for the phases of the eigenvalues of $U$. The following lemma gives us a relation between the eigenvalues of $U$ and $A_{BB}$.

Lemma 2. The eigenvalues of $U$ other than $\pm 1$ are given by
$$\lambda \pm i \sqrt{1 - \lambda^2} = e^{\pm i \arccos \lambda},$$
where $\lambda$ is an eigenvalue of the discriminant matrix $D$ given by
$$D_{vw} = \sqrt{(A')_{vw} \cdot (A')_{wv}}.$$
We will defer our proof of the lemma to the end of the section. The matrix $D$ can be viewed as a symmetrized version of $A'$, and note that it has the form
$$D = \begin{pmatrix} A_{BB} & 0 \\ 0 & I \end{pmatrix}.$$
The eigenvectors that have eigenvalue $\neq \pm 1$ have to correspond to the $A_{BB}$ portion of the matrix. By applying the lemma we obtain
$$|\text{Phase of any eigenvalue of } U| \ge |\arccos \|A_{BB}\||,$$
and by Taylor expansion we know that $\cos x \ge 1 - \frac{x^2}{2}$, i.e., $x \ge \sqrt{2} \sqrt{1 - \cos x}$, giving us
$$|\text{Phase of any eigenvalue of } U| \ge \sqrt{2} \sqrt{1 - \|A_{BB}\|} \ge \sqrt{\delta\epsilon}.$$
Hence, when we perform eigenvalue estimation, in the $\epsilon$-fraction case it suffices to distinguish a phase of 0 from a phase of magnitude at least $\sqrt{\delta\epsilon}$. The number of operations we need to perform is $O\big(\frac{1}{\sqrt{\delta\epsilon}}\big)$, as desired. While this might seem like a roundabout process to obtain the result, it is actually a trick that allows us to obtain the weaker theorem in a fairly simple manner. The proof for the full version is significantly more involved.
1.4
Proof Outline of Lemma 2
One thing to note about the operator $U$ is that it acts on two registers. Let $T$ be the operator such that $T|v\rangle = |v, \psi_v\rangle$.

Exercise 1. Prove that $T$ satisfies
1. $T T^\dagger = \pi$,
2. $T^\dagger T = I$,
3. $T^\dagger S T = D$.

Let $|v\rangle$ be an eigenvector of $D$ and let $\lambda$ be the corresponding eigenvalue. We can “extend” $|v\rangle$ to a two-register version by letting $|v'\rangle = T|v\rangle$. We will show that the eigenvectors of $U$ that do not have eigenvalues $\pm 1$ are given by linear combinations of $|v'\rangle$ and $S|v'\rangle$. Note that we have
$$U|v'\rangle = S|v'\rangle,$$
$$U S|v'\rangle = 2\lambda S|v'\rangle - |v'\rangle,$$
so the space spanned by $|v'\rangle$ and $S|v'\rangle$ is invariant under the operator $U$. This allows us to break up the problem into two parts: considering eigenvectors contained in this space and those that have a component in the orthogonal space.

We first consider the case of an eigenvector $|w\rangle$ of $U$ in this space, which we can assume to be of the form $|w\rangle = |v'\rangle - \mu S|v'\rangle$ for some $\mu$. Applying the operator $U$, we obtain
$$U|w\rangle = U|v'\rangle - \mu U S|v'\rangle = S|v'\rangle - \mu (2\lambda S|v'\rangle - |v'\rangle) = \mu |v'\rangle + (1 - 2\mu\lambda) S|v'\rangle.$$
Comparing coefficients with $\mu|w\rangle = \mu|v'\rangle - \mu^2 S|v'\rangle$, we know that $-\mu^2 = 1 - 2\mu\lambda$. So
$$\mu^2 - 2\lambda\mu + 1 = 0 \Rightarrow \mu = \lambda \pm i \sqrt{1 - \lambda^2},$$
which is our desired result. Solving for $\mu$ gives us two eigenvectors for each choice of eigenvector $|v\rangle$ of $D$. This is ideal, since we have doubled the dimension in considering $U$ instead of $D$ and we are doubling the number of eigenvectors accordingly. As for eigenvectors that have a component in the orthogonal space, note that $U$ simply performs $-S$ on the orthogonal component, so these have eigenvalues $\pm 1$.
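The eigenvalue relation can be verified numerically on a small instance. The sketch below takes the cycle on 6 vertices with vertex 0 marked as good, and assumes the states $|\psi_v\rangle = \sum_w \sqrt{(A')_{wv}}\, |w\rangle$, which satisfy $T^\dagger S T = D$ for the absorbing walk $A'$. It also uses the known eigenvectors $x_m = \sin(\pi k m / N)$ of $A_{BB}$ for this particular graph. These choices, and all names in the code, are assumptions made for illustration. For each such eigenvector the code builds $|w\rangle = |v'\rangle - \mu S|v'\rangle$ and measures how far $U|w\rangle$ is from $\mu|w\rangle$.

```python
import cmath
import math

N = 6
GOOD = {0}                                  # hypothetical good vertex

def psi(v):
    """Second-register state |psi_v> for the absorbing walk on the N-cycle."""
    vec = [0.0] * N
    if v in GOOD:
        vec[v] = 1.0                        # a good vertex only maps to itself
    else:
        vec[(v - 1) % N] = vec[(v + 1) % N] = 1 / math.sqrt(2)
    return vec

PSI = [psi(v) for v in range(N)]

def T(x):
    """T|v> = |v, psi_v>, applied linearly; states are N x N amplitude tables."""
    return [[x[v] * PSI[v][w] for w in range(N)] for v in range(N)]

def S(phi):
    """Swap the two registers."""
    return [[phi[w][v] for w in range(N)] for v in range(N)]

def U(phi):
    """U = S(2 pi - I), where pi projects the second register onto |psi_v>."""
    reflected = []
    for v in range(N):
        overlap = sum(PSI[v][w] * phi[v][w] for w in range(N))
        reflected.append([2 * overlap * PSI[v][w] - phi[v][w] for w in range(N)])
    return S(reflected)

max_error = 0.0
max_phase_mismatch = 0.0
for k in range(1, N):
    lam = math.cos(math.pi * k / N)                          # eigenvalue of D
    x = [math.sin(math.pi * k * m / N) for m in range(N)]    # eigenvector, x[0] = 0
    mu = complex(lam, math.sqrt(1 - lam * lam))              # lam + i sqrt(1 - lam^2)
    v1 = T(x)
    sv1 = S(v1)
    w = [[v1[a][b] - mu * sv1[a][b] for b in range(N)] for a in range(N)]
    uw = U(w)
    err = max(abs(uw[a][b] - mu * w[a][b]) for a in range(N) for b in range(N))
    max_error = max(max_error, err)
    max_phase_mismatch = max(max_phase_mismatch, abs(cmath.phase(mu) - math.acos(lam)))
```

Up to floating-point error, each $|w\rangle$ is an eigenvector of $U$ with eigenvalue $\mu = e^{i \arccos \lambda}$.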
2
General Framework
This brings us to the general framework for using quantum walks. There are three main steps:
1. Setup
2. Update
3. Check
The setup is performed once at the start, and we loop the update and check steps until we hit a good vertex. Let $S$, $U$, $C$ denote the respective costs of these steps. The overall cost of such a quantum walk algorithm is $S + O\big(\frac{1}{\sqrt{\delta\epsilon}} (U + C)\big)$. One could reduce the overall cost of the algorithm by noting that the checking cost $C$ tends to be significantly higher than the update cost $U$. We could check less frequently, the tradeoff being a lower probability of hitting a good vertex. The optimal algorithm one could obtain has cost $S + O\big(\frac{1}{\sqrt{\epsilon}} \big(\frac{1}{\sqrt{\delta}} U + C\big)\big)$. For a full description of how to obtain it, refer to [3].

We will briefly describe how the element distinctness problem fits in this framework. For the quantum walk algorithm for this problem, we were performing a walk on the Johnson graph $J_{n,m}$, where each vertex represents a subset of $m$ elements out of the original $n$ elements and two vertices are adjacent if and only if the subsets they represent share exactly $m - 1$ elements. Each step of the random walk can be seen as replacing an element in our chosen subset. Our goal is to hit a vertex representing a subset containing a collision. In this framework we can describe the associated costs as:
1. $S$: cost of collecting the initial $m$ elements
2. $U$: cost of replacing an element in the subset
3. $C$: cost of checking the subset to see if there are any collisions.
3
Further Reading
The paper by Szegedy [4] presents similar material in a slightly different form. Magniez et al. [3] elaborates on the general framework and discusses how to optimize the algorithm with regards to the update and checking processes. The papers by Ambainis [1] and Kempe [2] provide an introductory overview to quantum walks in general.
References
[1] A. Ambainis. Quantum walks and their algorithmic applications. arXiv:quant-ph/0403120v3, 2004.
[2] J. Kempe. Quantum random walks - an introductory overview. Contemporary Physics, Vol. 44, pages 307-327, 2004.
[3] F. Magniez, A. Nayak, J. Roland, M. Santha. Search via Quantum Walks. Proceedings of the 39th Annual ACM Symposium on Theory of Computing, pages 575-584, 2007.
[4] M. Szegedy. Quantum speed-up of Markov chain based algorithms. Proceedings of the 45th Symposium on Foundations of Computer Science, pages 32-41, 2004.
10/19/2010
CS 880: Quantum Information Processing
Lecture 18: Simulating Hamiltonian Dynamics Instructor: Dieter van Melkebeek
Scribe: Tyson Williams
Last lecture, we finished our discussion of discrete quantum walks. Today we discuss continuous-time quantum walks and how to simulate them using quantum gates. Their applications include solving sparse linear systems of equations and formula evaluation.
1
Continuous-time Walks
1.1
Classical
A classical continuous-time random walk on a graph $G$ is specified by a vector $P(t)$ where
$$P_v(t) = \Pr[\text{walk is at vertex } v \text{ at time } t]. \qquad (1)$$
Each vertex sends probability to its neighbors proportional to its own probability. This process is described by
$$\frac{dP(t)}{dt} = (A - D)P(t), \qquad (2)$$
where $A$ is the adjacency matrix of $G$ (not normalized) and $D$ is a diagonal matrix with $D_{ii} = \deg(v_i)$. The matrix $L = A - D$ is known as the Laplacian of $G$.
Exercise 1. Prove that (2) describes a valid probabilistic process. That is, show for all $t$ that each component is never negative and all components sum to 1.
There is, in fact, a closed-form solution for (2), which is
$$P(t) = e^{Lt}P(0). \qquad (3)$$
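As a numerical illustration of the closed form $P(t) = e^{Lt}P(0)$, the sketch below evaluates it on a path with three vertices. The matrix exponential is computed with a truncated Taylor series, which is an assumption made to keep the example dependency-free and is adequate only for small matrices and moderate $t$; the helper names are hypothetical.

```python
import math

def mat_mul(A, B):
    """Product of two matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_exp(M, terms=60):
    """Truncated Taylor series: sum over k < terms of M^k / k!."""
    n = len(M)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, M)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def walk_distribution(A, p0, t):
    """P(t) = exp((A - D) t) P(0) for adjacency matrix A."""
    n = len(A)
    deg = [sum(row) for row in A]
    Lt = [[(A[i][j] - (deg[i] if i == j else 0)) * t for j in range(n)]
          for i in range(n)]
    E = mat_exp(Lt)
    return [sum(E[i][j] * p0[j] for j in range(n)) for i in range(n)]

# Path 0 - 1 - 2, with the walk starting at vertex 0.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
p = walk_distribution(path, [1.0, 0.0, 0.0], t=0.5)
print(p, sum(p))
```

The printed components are nonnegative and sum to 1 up to rounding, in line with Exercise 1.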
1.2
Quantum
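The transition from classical to quantum is easier in the continuous-time setting than in the discrete-time setting. The quantum analogue of (2) is
$$i\,\frac{d}{dt}|\psi(t)\rangle = (A - D)\,|\psi(t)\rangle. \qquad (4)$$
While (2) preserves probability, (4) preserves the 2-norm. We can state this using bra-ket notation as
$$\langle\psi(t)|\psi(t)\rangle = \langle\psi(t)|\,|\psi(t)\rangle = |\psi(t)\rangle^{\dagger}\,|\psi(t)\rangle = 1, \qquad (5)$$
which is just the inner product of $|\psi(t)\rangle$ with itself. Equation (4) then becomes
$$i\,\frac{d}{dt}|\psi(t)\rangle = L\,|\psi(t)\rangle. \qquad (6)$$
Proof that equation (6) is a valid quantum process. By (6), $\frac{d}{dt}|\psi(t)\rangle = -iL|\psi(t)\rangle$ and hence $\frac{d}{dt}\langle\psi(t)| = i\langle\psi(t)|L^{\dagger}$. Therefore
$$\frac{d}{dt}\langle\psi(t)|\psi(t)\rangle = \left(\frac{d}{dt}\langle\psi(t)|\right)|\psi(t)\rangle + \langle\psi(t)|\left(\frac{d}{dt}|\psi(t)\rangle\right)$$
$$= i\,\langle\psi(t)|L^{\dagger}|\psi(t)\rangle - i\,\langle\psi(t)|L|\psi(t)\rangle$$
$$= i\,\langle\psi(t)|(L^{\dagger} - L)|\psi(t)\rangle$$
$$= i\,\langle\psi(t)|(L - L)|\psi(t)\rangle \quad \text{(since $L$ is Hermitian)}$$
$$= 0,$$
so the 2-norm of this quantum process is constant. Thus, if $\langle\psi(0)|\psi(0)\rangle = 1$, then $\langle\psi(t)|\psi(t)\rangle = 1$ for all $t$.
Physicists will recognize (6) as Schrödinger's equation, which holds even when $L$ is replaced with any Hermitian matrix $H$ that varies over time. The closed-form solution for constant $H$ in the quantum case is similar to the classical one, namely
$$|\psi(t)\rangle = e^{-iHt}\,|\psi(0)\rangle. \qquad (7)$$
When discussing how to solve well-conditioned systems of linear equations, $e^{-iHt}$ was the operator $U$ that we used. We used the fact that if $H$ is efficiently sparse, then we can compute $U$ efficiently. We now show how to do that.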
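A small worked example may help contrast (3) and (7). Take $G$ to be a single edge between two vertices $v_1$ and $v_2$ (a choice made here purely for illustration) and start the walk at $v_1$.

```latex
% Single edge: A = X (Pauli X), D = I, so L = A - D = X - I.
\[
L = \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix} = X - I,
\qquad
e^{-iLt} = e^{it}\, e^{-iXt} = e^{it}\bigl(\cos t \cdot I - i \sin t \cdot X\bigr).
\]
% Quantum walk starting at v_1:
\[
|\psi(t)\rangle = e^{it}\bigl(\cos t\,|v_1\rangle - i\sin t\,|v_2\rangle\bigr),
\qquad
\Pr[v_1] = \cos^2 t, \quad \Pr[v_2] = \sin^2 t .
\]
% Classical walk starting at v_1, from P(t) = e^{Lt} P(0):
\[
P(t) = \tfrac{1}{2}\begin{pmatrix} 1 + e^{-2t} \\ 1 - e^{-2t} \end{pmatrix}.
\]
```

In this example the classical walk converges to the uniform distribution, while the quantum walk oscillates between the two vertices and never converges.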
2
Simulating Sparse Hamiltonians
To simulate a sparse Hamiltonian $H$ efficiently, we need a handle on $H$. It is not enough for $H$ to be sparse. It needs to be sparse in an "efficient" way: when looking at a row, we need to be able to efficiently locate and approximately compute the nonzero entries. We say that $H$ is sparse when it has at most $s$ nonzero entries per row/column, where $s = \mathrm{poly}\log(N)$. We can efficiently approximate $U = e^{-iHt}$ when $H$ is efficiently sparse. Our algorithm will have slightly worse parameters than the one we used while discussing well-conditioned systems of linear equations.
2.1
H is Diagonal
If $H$ is diagonal, then $e^{-iHt}$ is just a combination of rotations: it is the diagonal matrix with entries $e^{-iH_{jj}t}$.
2.2
H is Efficiently Diagonalizable
Being a Hermitian matrix, $H$ has an orthonormal basis of eigenvectors. This implies that there exists a matrix $V$ such that $HV = VD$, where $D$ is a diagonal matrix and the columns of $V$ are the eigenvectors of $H$. We assume here that $V$ is efficiently computable. Then
$$e^{-iHt} = V e^{-iDt} V^{-1} \qquad (8)$$
and we have reduced the problem to the case with a diagonal matrix.
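Equation (8) can be checked on the smallest interesting case. The sketch below takes $H$ to be the Pauli matrix $X$, whose eigenvectors $|+\rangle$ and $|-\rangle$ form the columns of the Hadamard matrix, and compares $V e^{-iDt} V^{-1}$ with the known closed form $\cos t \cdot I - i \sin t \cdot X$. The choice of $H$ and the helper names are assumptions made for illustration.

```python
import cmath
import math

def mat_mul(A, B):
    """Product of two matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def dagger(A):
    """Conjugate transpose; equals the inverse when A is unitary."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

def exp_via_diagonalization(V, eigenvalues, t):
    """V * diag(exp(-i * lambda_j * t)) * V^dagger for unitary V."""
    n = len(eigenvalues)
    phases = [[cmath.exp(-1j * eigenvalues[i] * t) if i == j else 0j
               for j in range(n)] for i in range(n)]
    return mat_mul(mat_mul(V, phases), dagger(V))

r = 1 / math.sqrt(2)
V = [[r, r], [r, -r]]      # columns are |+> and |->
eigenvalues = [1.0, -1.0]  # X|+> = |+>, X|-> = -|->
t = 0.7

U = exp_via_diagonalization(V, eigenvalues, t)
expected = [[math.cos(t), -1j * math.sin(t)],
            [-1j * math.sin(t), math.cos(t)]]
error = max(abs(U[i][j] - expected[i][j]) for i in range(2) for j in range(2))
print(error)
```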
2.3
H is a Matching
This is actually a special case of $H$ being efficiently diagonalizable. We single it out and discuss it further because we will use it later. If the graph underlying $H$ is a matching, then $H$ has at most one nonzero entry in each row/column. We can simultaneously permute the rows and columns to get a block-diagonal matrix of the form
$$\begin{pmatrix} & * & & & & \\ * & & & & & \\ & & & * & & \\ & & * & & & \\ & & & & & * \\ & & & & * & \end{pmatrix}.$$
Since $2 \times 2$ matrices are always efficiently diagonalizable when their entries are efficiently computable, we are done.
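For a single $2 \times 2$ block, the exponential can even be written in closed form. The following worked equation assumes a block with zero diagonal and off-diagonal entry $a \neq 0$, which is the shape a Hermitian matching without self loops produces.

```latex
\[
B = \begin{pmatrix} 0 & a \\ \bar{a} & 0 \end{pmatrix},
\qquad
B^2 = |a|^2 I ,
\]
% Splitting the Taylor series of e^{-iBt} into even and odd powers of B:
\[
e^{-iBt}
= \cos(|a|t)\, I \;-\; i\,\frac{\sin(|a|t)}{|a|}\, B .
\]
```

Each block is therefore a rotation whose angle is determined by $|a|t$, and $e^{-iHt}$ is the direct sum of these rotations.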
2.4
Closure Under Addition
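If we can efficiently compute $U_1 = e^{-iH_1 t}$ and $U_2 = e^{-iH_2 t}$, then we can efficiently compute $U = e^{-i(H_1 + H_2)t}$. This is easy when $H_1$ and $H_2$ commute because then
$$U = e^{-i(H_1 + H_2)t} = e^{-iH_1 t}\, e^{-iH_2 t} = U_1 U_2. \qquad (9)$$
When $H_1$ and $H_2$ do not commute, we take advantage of the fact that we only need to approximate $e^{-i(H_1 + H_2)t}$. The Taylor series expansion for $e^{-iHt}$ is
$$e^{-iHt} = I - iHt + O\left(\|H\|^2 t^2\right). \qquad (10)$$
Since
$$e^{-iHt} = \left(e^{-iHt/n}\right)^n, \qquad (11)$$
we are also interested in the Taylor series expansion for $e^{-iHt/n}$, which is
$$e^{-iHt/n} = I - iH\frac{t}{n} + O\left(\|H\|^2 \frac{t^2}{n^2}\right). \qquad (12)$$
Then
$$e^{-iH_1 t/n}\, e^{-iH_2 t/n} = I - i(H_1 + H_2)\frac{t}{n} + O\left(\left(\|H_1\|^2 + \|H_2\|^2\right)\frac{t^2}{n^2}\right)$$
$$= e^{-i(H_1 + H_2)t/n} + O\left(\left(\|H_1\|^2 + \|H_2\|^2\right)\frac{t^2}{n^2}\right),$$
so
$$e^{-i(H_1 + H_2)t} = \left(e^{-i(H_1 + H_2)t/n}\right)^n = \left(e^{-iH_1 t/n}\, e^{-iH_2 t/n}\right)^n + O\left(\left(\|H_1\|^2 + \|H_2\|^2\right)\frac{t^2}{n}\right).$$
Closure under addition generalizes to
$$e^{-i\sum_{j=1}^{k} H_j t} = \left(\prod_{j=1}^{k} e^{-iH_j t/n}\right)^n + O\left(k \sum_{j=1}^{k} \|H_j\|^2\, \frac{t^2}{n}\right). \qquad (13)$$
To make the error term in (13) no more than $\epsilon$, it suffices for $n$ to be at least
$$\mathrm{poly}\left(\max_j \|H_j\|,\; k,\; t,\; \frac{1}{\epsilon}\right).$$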
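The $1/n$ decay of the error in the product formula can be observed on a pair of non-commuting $2 \times 2$ Hamiltonians. The sketch below uses $H_1 = X$ and $H_2 = Z$ (Pauli matrices, chosen here only for illustration), for which all three exponentials have closed forms because $X^2 = Z^2 = I$ and $(X+Z)^2 = 2I$. The error is measured as the largest entrywise difference, which is an assumption of convenience rather than the operator norm used in the text.

```python
import cmath
import math

def mat_mul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def exp_x(theta):
    """exp(-i * theta * X) = cos(theta) I - i sin(theta) X."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -1j * s], [-1j * s, c]]

def exp_z(theta):
    """exp(-i * theta * Z) = diag(e^{-i theta}, e^{i theta})."""
    return [[cmath.exp(-1j * theta), 0j], [0j, cmath.exp(1j * theta)]]

def exp_x_plus_z(t):
    """Exact exp(-i (X + Z) t), using (X + Z)^2 = 2 I."""
    r = math.sqrt(2)
    c, s = math.cos(r * t), math.sin(r * t) / r
    return [[c - 1j * s, -1j * s], [-1j * s, c + 1j * s]]

def product_formula(t, n):
    """(exp(-i X t/n) exp(-i Z t/n))^n."""
    step = mat_mul(exp_x(t / n), exp_z(t / n))
    result = [[1 + 0j, 0j], [0j, 1 + 0j]]
    for _ in range(n):
        result = mat_mul(result, step)
    return result

def max_entry_error(t, n):
    exact, approx = exp_x_plus_z(t), product_formula(t, n)
    return max(abs(exact[i][j] - approx[i][j])
               for i in range(2) for j in range(2))

errors = {n: max_entry_error(1.0, n) for n in (1, 10, 100)}
for n, err in errors.items():
    print(n, err)
```

Increasing $n$ by a factor of 10 should shrink the printed error by roughly the same factor, consistent with the $t^2/n$ term.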
2.5
H is Sparse
When $H$ is sparse, the idea is to efficiently write $H$ as a sum of efficient matchings and apply cases 2.3 and 2.4.
First Attempt. Let $H_j$ be the $j$th nonzero entry in $H$. This will not work because the $H_j$'s will not always be Hermitian.
Second Attempt. Decompose $H$ into matchings. By Vizing's Theorem, every graph $G$ can be edge colored with at most $\Delta(G) + 1$ colors, where $\Delta(G)$ is the maximum degree of the graph. Notice that the set of edges for each color is a matching. Unfortunately, we need to efficiently compute an edge coloring, but none of the known constructive proofs of this result is efficient. If we use $O(s^2 \log^2 N)$ colors, then efficient constructions are known.
For a graph $G = (V, E)$, label the vertices with $1, \ldots, N = |V|$. If $G$ has any self loops, we can take care of them by adding a diagonal matrix, which is efficiently computable by cases 2.1 and 2.4. Thus we can assume that $G$ has no self loops. To the edge $(v, w) \in E$ with $v < w$, assign the color
$$c(v, w) = \bigl(\text{index of } v \text{ as a neighbor of } w,\ \text{index of } w \text{ as a neighbor of } v,\ m(v, w),\ w \bmod m(v, w)\bigr),$$
where
$$m(v, w) = \min\{\mu \in \mathbb{Z}^{+} \mid v \not\equiv w \pmod{\mu}\},$$
which exists and is $O(\log N)$ since $0 < v < w \le N$. This can be seen by contradiction. Suppose that $v \not\equiv w \pmod{N}$ but $m(v, w) = \omega(\log N)$. Then $v \equiv w \pmod{\mu}$ for all $\mu$ from 1 to $O(\log N)$. In particular, $v$ and $w$ are equivalent modulo the primes in that range. However, a nontrivial fact from number theory is that the product of all primes less than a number $n$ is at least $2^n$. Thus by the Chinese remainder theorem, we get that $v \equiv w \pmod{N}$, a contradiction.
For $v > w$ we set $c(v, w) = c(w, v)$, so the coloring is consistent, and it is efficient to compute. It remains to show that it is a valid coloring.
Proof that this coloring is valid. It suffices to show that $c(v, w) = c(v, w') \implies w = w'$.
Case 1: $v < w$ and $v < w'$. The second component of the color implies that $w = w'$.
Case 2: $v > w$ and $v > w'$. The first component of the color implies that $w = w'$ (this case is also symmetric to case 1).
Case 3: $v < w$ and $v > w'$. The third component of the color is $\mu = m(v, w) = m(w', v)$. The fourth component is $w \bmod \mu$ for $c(v, w)$ and $v \bmod \mu$ for $c(v, w') = c(w', v)$, so $w \equiv v \pmod{\mu}$, which contradicts the construction of $\mu = m(v, w)$.
Case 4: $v > w$ and $v < w'$. This case is symmetric to case 3.
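The coloring and its validity condition can be exercised on a small graph. The sketch below represents the graph as a dictionary from vertex labels to sorted neighbor lists (a representation assumed here for illustration) and checks that no two edges meeting at a vertex share a color. The search for $m(v, w)$ starts at $\mu = 2$ because every pair is congruent modulo 1.

```python
def smallest_separating_modulus(v, w):
    """m(v, w): the least mu >= 2 with v != w (mod mu); requires v != w."""
    mu = 2
    while (v - w) % mu == 0:
        mu += 1
    return mu

def edge_color(adj, v, w):
    """Color of the edge {v, w}, with c(v, w) = c(w, v)."""
    if v > w:
        v, w = w, v
    m = smallest_separating_modulus(v, w)
    return (adj[w].index(v), adj[v].index(w), m, w % m)

def is_valid_coloring(adj):
    """True if all edges incident to each vertex have distinct colors."""
    for v, neighbors in adj.items():
        colors = [edge_color(adj, v, w) for w in neighbors]
        if len(set(colors)) != len(colors):
            return False
    return True

# Complete graph on the vertices 1..6 (no self loops).
vertices = range(1, 7)
complete = {v: [w for w in vertices if w != v] for v in vertices}
print(is_valid_coloring(complete))
```

Each color class is then a matching, so the corresponding piece of $H$ falls under case 2.3.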
Based on these techniques, our runtime for efficiently (and approximately) computing $e^{-iHt}$ when $H$ is sparse is
$$\mathrm{poly}\left(s,\; \log N,\; \max_j \|H_j\|,\; t,\; \frac{1}{\epsilon}\right)$$
for an accuracy of $\epsilon$. We refrain from specifying the exact polynomial since it is not as good as the one we mentioned while discussing well-conditioned systems of linear equations.
3
Application: Formula Evaluation
Consider the formula for determining which player in a two-player game (where each player always has two possible moves unless the game is over) has a winning strategy. Say that the formula evaluates to 1 if the first player to move has a winning strategy and 0 if the second player to move has a winning strategy. This type of formula is known as a game tree and has alternating levels of OR and AND gates with leaf nodes that indicate which player wins. The question is this: how many of the $N$ leaves do we need to query to determine who wins? Deterministically, we need to query $\Theta(N)$. Using randomness to recursively determine which branch to evaluate first, the expected number of leaves we need to query is $\Theta(N^d)$, where $d \approx 0.753$. This improvement comes from the fact that while evaluating an OR (respectively AND) gate, we might find a branch that evaluates to 1 (respectively 0) before evaluating the other branch (as long as such a branch exists).
The best known quantum algorithm uses $O(\sqrt{N} \log N)$ queries. The $\sqrt{N}$ term looks like Grover search. The $\log N$ term would intuitively come from amplifying Grover's success probability. However, Grover cannot get this result. This result is actually an application of discrete random walks that were inspired by continuous random walks.
4
Next Time
In our next lecture, we will discuss adiabatic quantum computation. This is an alternate model of quantum computation that is similar to continuous-time quantum walks. We will show that this model is universal and can be simulated by the quantum circuit model with a polynomial amount of overhead in time.
CS 880: Quantum Information Processing
10/21/2010
Lecture 19: Adiabatic Quantum Computing Instructor: Dieter van Melkebeek
Scribe: Hesam Dashti
Today we will talk about Adiabatic Quantum Computing, which is an alternate model for Quantum computing other than the circuit model which we have been working with. The Adiabatic Quantum model is closely related to the continuous-time Quantum walk which we discussed in the previous lecture. In this lecture we show how one can simulate the Adiabatic model by using the Circuit model and vice versa with a polynomial overhead in time.
1
Adiabatic Evolution
"Adiabatic" is a term from thermodynamics referring to a process in which there is no heat transfer. In the quantum setting the term refers to a process in which the system changes gradually such that it always is close to its equilibrium state. For the Adiabatic Evolution, we are looking at a Quantum system that is described by a Hamiltonian $H(t)$. The evolution is prescribed by Schrödinger's equation:
$$i\,\frac{d\,|\Psi(t)\rangle}{dt} = H(t)\,|\Psi(t)\rangle.$$
When the Hamiltonian is constant the evolution of the system is simple: $|\Psi(t)\rangle = e^{-iHt}\,|\Psi(0)\rangle$. But in general, the Hamiltonian could depend on time, in which case the evolution becomes more complex. Here, we consider a case in which the Hamiltonian depends on time but only changes gradually. In this case the evolution is again relatively simple in the following sense. If the initial state of the system is close to one of the eigenstates of the initial Hamiltonian, then, provided there is no degeneracy, the state of the system follows the eigenstate and at every point in time it will be close to the corresponding eigenstate of the Hamiltonian. If there is degeneracy, then there is no guarantee. The closer to degeneracy we are, the slower the change in the Hamiltonian needs to be.
The eigenvalues of the Hamiltonian correspond to the energy levels of the system, which are always real numbers. A ground state is a state of minimum energy. Starting in the ground state of the system, we remain close to the ground state provided that there is no degeneracy and we move slowly whenever the gap between the lowest energy level and the next one becomes small.
We consider the evolution process in an interval $[0, T]$ and rescale it by $s = t/T$, $t \in [0, T]$. So $s = 0$ would be the initial state and $s = 1$ the end of the process. Let $|\Phi(s)\rangle$ denote the ground state of $H(s)$ and $\Delta(s)$ the difference between the lowest energy level and the next one.
For a given process profile $H(s)$, $s \in [0, 1]$, the Adiabatic theorem tells us how much time $T$ we need to run the process in order to end up in a state that is no more than $\epsilon$ away from the ground state of $H(1)$.
Theorem 1 (Adiabatic Theorem). If we set up our initial state $|\Psi(0)\rangle$ equal to the ground state $|\Phi(0)\rangle$ of the Hamiltonian, then
$$\bigl\|\, |\Psi(1)\rangle - |\Phi(1)\rangle \,\bigr\| \le \epsilon$$
provided
$$T \ge \Omega\left(\frac{1}{\epsilon}\left[\frac{\|\dot{H}(0)\|}{\Delta^2(0)} + \frac{\|\dot{H}(1)\|}{\Delta^2(1)} + \int_0^1 \left(\frac{\|\dot{H}(s)\|^2}{\Delta^3(s)} + \frac{\|\ddot{H}(s)\|}{\Delta^2(s)}\right) ds\right]\right), \qquad (1)$$
where $\dot{H}(s) = \frac{dH(s)}{ds}$ and $\ddot{H}(s) = \frac{d^2 H(s)}{ds^2}$.
When the Hamiltonian changes linearly as a function of $s$, its second derivative ($\|\ddot{H}(s)\|$) is zero and its first derivative ($\|\dot{H}(s)\|$) is a constant, so the value of $T$ would be
$$\Omega\left(\frac{1}{\epsilon} \times \frac{1}{\Delta_{\min}^3}\right).$$
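The linear case can be spelled out from (1) as follows. This is a sketch under two assumptions that the text leaves implicit: $\|\dot{H}\|$ is treated as a constant, and $\Delta_{\min} = \min_s \Delta(s)$ is bounded above by a constant.

```latex
% Linear interpolation between H(0) and H(1):
\[
H(s) = (1-s)\,H(0) + s\,H(1)
\;\Longrightarrow\;
\dot{H}(s) = H(1) - H(0), \qquad \ddot{H}(s) = 0 .
\]
% Substituting into the bracket in (1) and bounding each gap by Delta_min:
\[
\frac{\|\dot{H}\|}{\Delta^2(0)} + \frac{\|\dot{H}\|}{\Delta^2(1)}
+ \|\dot{H}\|^2 \int_0^1 \frac{ds}{\Delta^3(s)}
\;\le\;
\frac{2\,\|\dot{H}\|}{\Delta_{\min}^2} + \frac{\|\dot{H}\|^2}{\Delta_{\min}^3}
\;=\; O\!\left(\frac{1}{\Delta_{\min}^3}\right).
\]
```

So a running time on the order of $\frac{1}{\epsilon} \cdot \frac{1}{\Delta_{\min}^3}$ is enough to satisfy the condition of the theorem in this case.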
Next, we are going to use the Adiabatic Theorem in computations, where a natural usage of it would be optimization as follows.
2
Adiabatic Optimization
We are given a function $f$ as a black box, $f : \{0, 1\}^n \to \mathbb{R}$, and the goal is to find $x^* \in \{0, 1\}^n$ such that $f(x^*) = \min(f)$. In order to solve this problem using the Adiabatic evolution, we assume that the function $f$ has a unique minimum to avoid degeneracy. We start with a Hamiltonian for which we can easily construct its ground state, and let the system evolve Adiabatically to a Hamiltonian whose ground state is $|x^*\rangle$.
Algorithm Setup: In order to set up, we need to define our Hamiltonian as well as its ground state. There are several possibilities for the initial Hamiltonian, but one is to define it as a small sum of Hamiltonians that act locally, i.e., act on a constant number of qubits, in this case a single qubit:
$$H(0) = -\sum_{j=1}^{n} I \otimes \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \otimes I, \qquad (2)$$
where for every $j$, the middle matrix acts on the $j$th qubit. This middle matrix has the eigenstate $|+\rangle$ for eigenvalue $+1$ and $|-\rangle$ for eigenvalue $-1$. Because of the minus sign before the sum, the eigenstate $|+\rangle^{\otimes n}$ gives us the lowest energy state, namely $-n$, and all other eigenvalues are at least $-n + 1$. So, our initial ground state is
$$|\Phi(0)\rangle = \frac{1}{\sqrt{N}} \sum_{x \in \{0,1\}^n} |x\rangle.$$
We want the ground state at the end to be the $x$ that minimizes $f$. We can set
$$H(1) = \sum_{x \in \{0,1\}^n} f(x)\, \Pi_x,$$
where $\Pi_x$ is the projection on $|x\rangle$.
Process of evolution: After setting our system up, we need to clarify how it evolves by defining an interpolation function between $H(0)$ and $H(1)$:
$$H(s) = (1 - g(s))\,H(0) + g(s)\,H(1),$$
where $g$ is a monotone and smooth function with $g(0) = 0$, $g(1) = 1$. As an example, consider searching for a unique marked item. We can cast the problem as an Adiabatic optimization problem by choosing $f$ to be the indicator for being non-marked. To determine the time $T$ needed to guarantee success, we need to compute $\Delta(s)$.
If we choose a linear function for $g$, then calculations show that the integral in Equation 1 is $\Theta(N)$, which is no good. We are not going to do calculations to find a better $g$, but intuitively from Equation 1, $\dot{H}$ should be small whenever $\Delta$ is small, and $\dot{H}$ can be larger when $\Delta$ is larger. Thus, in the above figure, $g$ can grow quickly at the beginning and the end of the interval. But in the middle, where $\Delta$ is close to zero, $g$ can only grow slowly. By adapting $g$ optimally, it turns out we only need time $T = \Theta(\sqrt{N})$. Thus, by using the Adiabatic optimization we can get the same type of results as in the Quantum circuit model, namely a quadratic speedup for unstructured search.
In general, we do not know whether the Adiabatic optimization is a universal model or not, but the Adiabatic evolution is more general and is a universal model of Quantum computation. It is easy to see that we can simulate our Adiabatic evolution on our Quantum circuit model: the Hamiltonian evolution process can be divided into small pieces, and we assume that in every piece the Hamiltonian is constant. Then
$$|\Psi(\text{end of piece})\rangle = e^{-iH \cdot (\text{length of piece})}\, |\Psi(\text{beginning of piece})\rangle$$
and we can apply $U = e^{-iHt}$ to it, as described in the previous lecture. For that to be possible efficiently, the Hamiltonian should satisfy some conditions like being efficiently sparse. So to simulate the Adiabatic evolution we need to choose the Hamiltonian from one of the good classes of Hamiltonians introduced in the previous lecture. Another good category is that of local Hamiltonians, like the ones used in Equation 2: $H = H' \otimes I$, where $H'$ acts on a constant number of qubits. Then, by using the Taylor expansion of the matrix exponential, we can write the unitary operator as
$$U = e^{-iHt} = e^{-iH't} \otimes I.$$
Since $e^{-iH't}$ acts on a constant number of qubits, we can implement it efficiently. Using the closure under addition from the previous lecture, we can then efficiently construct the unitary operator for a small sum of local Hamiltonians. Hence, we can efficiently simulate such an Adiabatic evolution process with our Quantum circuit model. In the other direction, we know how to simulate Quantum circuits with Adiabatic evolution using a small sum of local Hamiltonians, but we do not know how to do it by Adiabatic optimization.
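The Hamiltonians $H(0)$, $H(1)$ and the interpolation $H(s)$ from this section can be written out explicitly for a small number of qubits. The sketch below builds them as dense matrices for $n = 3$ and a hypothetical marked item, and checks that the uniform superposition is an eigenvector of $H(0)$ with energy $-n$. The dense representation and all function names are assumptions made for illustration; the matrices have size $2^n$, so this is not an efficient method.

```python
import math

def initial_hamiltonian(n):
    """H(0) = -sum_j X_j as a dense 2^n x 2^n matrix over bit strings."""
    N = 1 << n
    H = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for j in range(n):
            H[x][x ^ (1 << j)] = -1.0  # X_j flips bit j of x
    return H

def final_hamiltonian(f, n):
    """H(1) = sum_x f(x) |x><x|, a diagonal matrix."""
    N = 1 << n
    return [[float(f(x)) if x == y else 0.0 for y in range(N)]
            for x in range(N)]

def interpolate(H0, H1, g):
    """H(s) = (1 - g) H(0) + g H(1) for a given value g = g(s)."""
    N = len(H0)
    return [[(1 - g) * H0[x][y] + g * H1[x][y] for y in range(N)]
            for x in range(N)]

def apply(H, vec):
    """Matrix-vector product."""
    return [sum(H[x][y] * vec[y] for y in range(len(vec)))
            for x in range(len(H))]

n = 3
N = 1 << n
marked = 5  # hypothetical marked item


def f(x):
    """Indicator for being non-marked."""
    return 0 if x == marked else 1


H0 = initial_hamiltonian(n)
H1 = final_hamiltonian(f, n)
uniform = [1 / math.sqrt(N)] * N
image = apply(H0, uniform)
energy = sum(u * v for u, v in zip(uniform, image))
H_half = interpolate(H0, H1, 0.5)
print(energy)
```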
3
Universality of Sums of local Hamiltonians
Universality of a model means we can simulate every computation in the Quantum circuit model with a polynomial overhead in time. In this section we consider the universality of sums of local Hamiltonians. We see that we can simulate every computation by Quantum circuits of size $k$ using a sum of a polynomial number of local Hamiltonians, where the time overhead is $\mathrm{poly}(k)$. In this section we briefly consider how to do this with a) a Hamiltonian Quantum Walk and b) a Hamiltonian Adiabatic Evolution process, both of which are based on sums of local Hamiltonians. The general sketches are as follows:
a) Quantum Walk: We only need to set up a Quantum system, starting from a certain state, and let it evolve according to a Hamiltonian which is a sum of a small number of local Hamiltonians. At the end, we need to observe the state.
b) Adiabatic Evolution: We start the system in the ground state of a Hamiltonian and then evolve it Adiabatically. At the end, we can extract the result from the ground state of the system.
3.1
Quantum Walks
We can simulate this process with a sum of time-independent local Hamiltonians, in time polynomial in $k$. This simulation is sometimes called a Feynman computer, because Feynman came up with this idea for simulating classical reversible computation. But Feynman's idea works for simulating any Quantum computation.
Setup: We have $U_k U_{k-1} \cdots U_1$ where each $U_j$ is a local unitary, corresponding to a Quantum gate acting on a constant number of qubits. Let us define our Hamiltonian $H = \sum_{j=1}^{k} H_j$ such that we have one local Hamiltonian for each step of the computation. The system consists of two components: one for the state of our system and one which is used as a clock, so the Hamiltonian acts on two registers. We define
$$H_j\, |\Psi\rangle |j-1\rangle = U_j |\Psi\rangle |j\rangle \quad \text{and} \quad H_j\, |\Psi\rangle |j\rangle = U_j^{\dagger} |\Psi\rangle |j-1\rangle.$$
$H_j$ acts locally on the first register. If the second register is represented in binary then the process is not local. To make it local, we represent our clock in unary.
After defining the Hamiltonian, we start the process in $|\Psi\rangle |0\rangle$ and evolve according to the Hamiltonian $H$ as defined above. We can show that the state remains in the span of
$$|\Psi_j\rangle |j\rangle = U_j U_{j-1} \cdots U_1 |\Psi\rangle |j\rangle, \quad 0 \le j \le k.$$
Here, we are interested in the final state $|\Psi_k\rangle$. We can show that
$$\Bigl| \bigl(\langle\Psi_k| \langle k|\bigr)\, e^{-iH\frac{k}{2}}\, \bigl(|\Psi_0\rangle |0\rangle\bigr) \Bigr|^2 = \Omega\left(\frac{1}{k^{2/3}}\right),$$
where $H$ is our Hamiltonian. This means that if we start the system with the state $|\Psi\rangle = |0\rangle$ and we observe the second register at time $\frac{k}{2}$, then the second register equals $k$ with probability $\Omega\left(\frac{1}{k^{2/3}}\right)$, and in that case we have the final state $U_k U_{k-1} \cdots U_1 |0\rangle$ in the first register. In other words, after time $\frac{k}{2}$ we observe only the second register; if it is not equal to $k$, we restart the process, and otherwise we observe the first register, which holds the final state. We repeat the process $\Theta(k^{2/3})$ times to have a good probability of success. This way, we show that we can simulate Quantum circuits using Quantum walks with a Hamiltonian that is a sum of a small number of local Hamiltonians.
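The smallest case $k = 1$ shows why this construction is called a walk. The worked example below assumes that each $H_j$ annihilates every clock value other than $j - 1$ and $j$, which the definition above leaves implicit.

```latex
% Action of H_j on the invariant subspace spanned by |Psi_j>|j>:
\[
H_j\,|\Psi_{j-1}\rangle|j-1\rangle = U_j|\Psi_{j-1}\rangle|j\rangle = |\Psi_j\rangle|j\rangle,
\qquad
H_j\,|\Psi_j\rangle|j\rangle = U_j^{\dagger}|\Psi_j\rangle|j-1\rangle = |\Psi_{j-1}\rangle|j-1\rangle .
\]
% For k = 1, in the basis { |Psi_0>|0>, |Psi_1>|1> }:
\[
H = H_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},
\qquad
e^{-iHt}\,|\Psi_0\rangle|0\rangle
= \cos t\,|\Psi_0\rangle|0\rangle - i \sin t\,|\Psi_1\rangle|1\rangle ,
\]
\[
\Pr[\text{clock reads } 1 \text{ at time } t] = \sin^2 t .
\]
```

For general $k$, the same computation shows that $H$ acts on the invariant subspace as the adjacency matrix of a path with $k + 1$ vertices, so the evolution is a continuous-time quantum walk along the steps of the circuit.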
3.2
Adiabatic Evolution
In this simulation, we start from the ground state of an evolving Hamiltonian and we want the final ground state of the Hamiltonian to be the state that we are interested in. The first attempt is to simply use the same Hamiltonian as above. However, it turns out that each uniform superposition of the form
$$\frac{1}{\sqrt{k+1}} \sum_{j=0}^{k} U_j \cdots U_1 |\Psi\rangle |j\rangle$$
is a ground state for any choice of $|\Psi\rangle$. So we would like to enforce additionally that $|\Psi\rangle$ corresponds to the initial state of our Quantum algorithm; typically all qubits should be zero. For that reason we change the Hamiltonian slightly by adding a penalty when $|\Psi\rangle \ne |0\rangle$:
$$H(1) = H + H_{\text{penalty}}, \quad \text{where} \quad H_{\text{penalty}} = \sum_{j=1}^{n} \Pi_{x_j = 1} \otimes \Pi_0.$$
In this manner, we set the Hamiltonian as we like, and at the end of the process we observe $U_k U_{k-1} \cdots U_1 |\Psi\rangle$ with $|\Psi\rangle = |0\rangle$ with probability $\frac{1}{k+1}$. And again, we repeat this process $k + 1$ times to get a good probability of success.
At the beginning of the process we set up our Hamiltonian as $H(0) = -I \otimes \Pi_0 + H_{\text{penalty}}$, with the same penalty, so that the state $|0\rangle |0\rangle$ is the unique ground state. We evolve the system by using a linear function $g(s)$. We need to know how long we should run this Adiabatic process to get the final state. This is governed by the Adiabatic Theorem, for which we need to know a lower bound for the gap function. With this setup one can show $\Delta(s) \ge \Omega\left(\frac{1}{k^2}\right)$, so $T = O(\mathrm{poly}(k))$ suffices. Hence, we have shown that Adiabatic Evolution with a sum of local Hamiltonians can simulate Quantum circuits with a polynomial overhead in time.
This finishes the first part of the course, where we considered the standard computational setting in which we want to realize a certain transformation from inputs to outputs. Next lecture we will start talking about processes with more than one party involved: Quantum communication and other interactive processes.
10/25/2010
CS 880: Quantum Information Processing
Lecture 20: Density Operator Formalism Instructor: Dieter van Melkebeek
Scribe: Dalibor Zelen´ y
So far in this course we have been working in the setting where the goal is to realize a relation by means of some computation. This involved only one “party” that was performing the computation. In today’s lecture and several following lectures, we will focus on systems where multiple parties participate in the computation. We develop the density operator formalism which is suitable for describing multiparty systems. It turns out that we can use this formalism to describe the evolution of a quantum system, too, and that it is in some sense superior to our original way of describing things.
1
Density Operator
We start with the definition of the density operator, give some examples, and prove some properties of density operators. To conclude this section, we show how to represent the evolution of a quantum system using density operators.
Definition 1 (Density operator). For a pure state $|\psi\rangle$, the density operator is $\varrho = |\psi\rangle\langle\psi|$. For a mixed state $\{(p_i, |\psi_i\rangle)\}_i$, the density operator is $\varrho = \sum_i p_i\, |\psi_i\rangle\langle\psi_i|$.
When we apply the density operator of a pure state $|\psi\rangle$ to a state $|\phi\rangle$, we get the projection of $|\phi\rangle$ onto $|\psi\rangle$, that is, $\varrho\,|\phi\rangle = |\psi\rangle\langle\psi|\phi\rangle$. Also note that the density operator corresponding to a mixed state is just a convex combination of the density operators for the individual pure states that form the mixed state.
1.1
Examples of Density Operators
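We now present some examples of density operators. As the next two examples show, two different mixed states can have the same density operator.
Example: Let's compute the density operator corresponding to $\{(\frac{1}{2}, |0\rangle), (\frac{1}{2}, |1\rangle)\}$. The density operators for $|0\rangle$ and $|1\rangle$ are
$$\varrho_0 = (1, 0)^T (1, 0) = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \quad \text{and} \quad \varrho_1 = (0, 1)^T (0, 1) = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.$$
Now we take their convex combination based on the probabilities describing our mixed state and get that $\varrho = \frac{1}{2}\varrho_0 + \frac{1}{2}\varrho_1 = \frac{1}{2}I$. ⊠
Example: Now we compute the density operator corresponding to $\{(\frac{1}{2}, |+\rangle), (\frac{1}{2}, |-\rangle)\}$. The density operators for $|+\rangle$ and $|-\rangle$ are
$$\varrho_+ = \frac{1}{\sqrt{2}}(1, 1)^T \frac{1}{\sqrt{2}}(1, 1) = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \quad \text{and} \quad \varrho_- = \frac{1}{\sqrt{2}}(1, -1)^T \frac{1}{\sqrt{2}}(1, -1) = \frac{1}{2}\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}.$$
Now we take their convex combination based on the probabilities describing our mixed state and get that $\varrho = \frac{1}{2}\varrho_+ + \frac{1}{2}\varrho_- = \frac{1}{2}I$. ⊠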
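The two examples above can be reproduced mechanically from Definition 1. The sketch below represents a mixed state as a list of (probability, state vector) pairs, which is a representation assumed here for illustration, and computes $\sum_i p_i |\psi_i\rangle\langle\psi_i|$ directly.

```python
import math

def outer(psi):
    """|psi><psi| as a nested list."""
    return [[a * complex(b).conjugate() for b in psi] for a in psi]

def density_operator(mixture):
    """sum_i p_i |psi_i><psi_i| for a list of (p_i, psi_i) pairs."""
    dim = len(mixture[0][1])
    rho = [[0j] * dim for _ in range(dim)]
    for p, psi in mixture:
        projector = outer(psi)
        for i in range(dim):
            for j in range(dim):
                rho[i][j] += p * projector[i][j]
    return rho

r = 1 / math.sqrt(2)
zero, one = [1, 0], [0, 1]
plus, minus = [r, r], [r, -r]

rho_standard = density_operator([(0.5, zero), (0.5, one)])
rho_hadamard = density_operator([(0.5, plus), (0.5, minus)])
print(rho_standard)
print(rho_hadamard)
```

Both printed matrices agree with $\frac{1}{2}I$ up to floating-point rounding.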
We see that the mixed states $\{(\frac{1}{2}, |0\rangle), (\frac{1}{2}, |1\rangle)\}$ and $\{(\frac{1}{2}, |+\rangle), (\frac{1}{2}, |-\rangle)\}$ have the same corresponding density operators. As we will see later, this implies that no quantum procedure can distinguish between these two mixed states. We conclude our list of examples with a density operator corresponding to a two-qubit state.
Example: The density operator corresponding to $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle) = \frac{1}{\sqrt{2}}(1, 0, 0, 1)^T$ is
$$\frac{1}{\sqrt{2}}(1, 0, 0, 1)^T \frac{1}{\sqrt{2}}(1, 0, 0, 1) = \frac{1}{2}\begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}.$$
⊠
The state in the last example, $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$, is called an EPR pair.
1.2
Properties of Density Operators
We use traces extensively when describing properties of density operators, so we start with some properties of the trace of a matrix. Afterwards, we state three properties of density operators.
Recall that the trace of a matrix $M$, denoted $\mathrm{Tr}(M)$, is the sum of the diagonal entries of $M$, i.e., $\mathrm{Tr}(M) = \sum_i M_{ii}$. It is easy to see that $\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$. Note, however, that it is not true in general that $\mathrm{Tr}(ABC) = \mathrm{Tr}(ACB)$. The equality holds only for cyclic shifts of a product of matrices. For example, $\mathrm{Tr}(ABC) = \mathrm{Tr}(CAB)$ holds.
An equivalent definition of the trace is $\mathrm{Tr}(A) = \sum_i \lambda_i$ where the $\lambda_i$ are the eigenvalues of $A$. To see this for the case where $A$ has a basis of eigenvectors, let $V$ be the matrix whose columns are $A$'s eigenvectors, and note that $AV = V\Lambda$. Since $V$'s columns form a basis, $V$ is invertible, and we have $A = V \Lambda V^{-1}$ where $\Lambda$ is the matrix with $\Lambda_{ii} = \lambda_i$ and with zeros in the off-diagonal entries. Now $\mathrm{Tr}(A) = \mathrm{Tr}(V \Lambda V^{-1}) = \mathrm{Tr}(V^{-1} V \Lambda) = \mathrm{Tr}(\Lambda) = \sum_i \lambda_i$.
Also recall that a matrix $M$ is positive semidefinite if $\langle x| M |x\rangle \ge 0$ for all $x$.
Claim 1. Let $\varrho$ be a density operator. Then $\mathrm{Tr}(\varrho) = 1$.
Proof. First consider a pure state $|\psi\rangle = \sum_x \alpha_x |x\rangle$. Then the diagonal entry $\varrho_{xx}$ of the corresponding density operator is $|\alpha_x|^2$, and we know that $\sum_x |\alpha_x|^2 = 1$. For a mixed state, just notice that the resulting density operator is a convex combination of matrices with trace 1.
Claim 2. The density operator is Hermitian.
Proof. For a state $\sum_x \alpha_x |x\rangle$, the corresponding density operator is $\varrho = (\sum_x \alpha_x |x\rangle)(\sum_y \overline{\alpha_y} \langle y|)$. Then $\varrho_{xy} = \alpha_x \overline{\alpha_y}$, and $\varrho_{yx} = \alpha_y \overline{\alpha_x} = \overline{\alpha_x \overline{\alpha_y}} = \overline{\varrho_{xy}}$. This shows that $\varrho$ is Hermitian for pure states. For mixed states, just note that a convex combination of Hermitian matrices is Hermitian.
Claim 3. The density operator is positive semidefinite.
Proof. Consider a state $|\phi\rangle$. Then $\langle\phi| \varrho |\phi\rangle = \langle\phi| \left(\sum_i p_i |\psi_i\rangle\langle\psi_i|\right) |\phi\rangle = \sum_i p_i \langle\phi|\psi_i\rangle\langle\psi_i|\phi\rangle = \sum_i p_i |\langle\phi|\psi_i\rangle|^2 \ge 0$, where the inequality follows because $p_i \ge 0$ for all $i$.
Exercise 1. It turns out that we don't need Claim 2 because every positive semidefinite operator is Hermitian. Prove this assertion.
The combination of the necessary conditions from Claim 1 and Claim 3 actually yields a sufficient condition for a matrix to be a density operator. We have the following theorem.
Theorem 1. The matrix $\varrho$ describes a density operator if and only if $\mathrm{Tr}(\varrho) = 1$ and $\varrho$ is positive semidefinite.
Proof. We argued the forward direction in the proofs of Claims 1 and 3. For the reverse direction, assume $\mathrm{Tr}(\varrho) = 1$ and $\varrho$ is positive semidefinite. We need to find a mixed state whose density operator is described by $\varrho$. Since $\varrho$ is positive semidefinite, it is Hermitian, and thus has an orthonormal basis of eigenvectors, say $|\psi_1\rangle, \ldots, |\psi_k\rangle$, with corresponding eigenvalues $\lambda_1, \ldots, \lambda_k$. This means that we can write $\varrho = \sum_i \lambda_i |\psi_i\rangle\langle\psi_i|$. Since $\varrho$ is Hermitian, it has real eigenvalues. Since it is also positive semidefinite, all eigenvalues are nonnegative. Finally, since the trace of $\varrho$ is 1, the eigenvalues define a probability distribution, so $\varrho$ is the density operator corresponding to the mixed state $\{(\lambda_i, |\psi_i\rangle)\}_i$.
1.3
Describing the Evolution of a Quantum System
Now we show that we can describe quantum computation using density operators. For that, we need to describe the density operator $\varrho'$ corresponding to the state $|\psi'\rangle$ obtained from state $|\psi\rangle$ either by applying a unitary operation to $|\psi\rangle$ or by making a measurement of $|\psi\rangle$.
Let's start with applying a unitary operation $U$ to the state $|\psi\rangle$. The new state is $|\psi'\rangle = U|\psi\rangle$, so the corresponding density operator is $\varrho' = U|\psi\rangle (U|\psi\rangle)^* = U|\psi\rangle\langle\psi|U^* = U \varrho U^*$. We use linearity to get the density operator in the case of mixed states.
Now suppose we make a measurement of a state $|\psi\rangle$ whose density operator is $\varrho$. We measure with respect to some orthogonal basis $\{|\phi_1\rangle, \ldots, |\phi_k\rangle\}$. The state is a linear combination of the basis vectors, say $|\psi\rangle = \sum_i \alpha_i |\phi_i\rangle$. We observe the state $|\phi_i\rangle$ with probability $|\alpha_i|^2$, so the new state is a mixed state $\{(|\alpha_i|^2, |\phi_i\rangle)\}_i$, and its corresponding density operator is
$$\varrho' = \sum_i |\alpha_i|^2\, |\phi_i\rangle\langle\phi_i| = \sum_i |\phi_i\rangle\, \alpha_i \overline{\alpha_i}\, \langle\phi_i|. \qquad (1)$$
Note that if we multiply $\varrho$ on the right with $|\phi_j\rangle$ and on the left with $\langle\phi_j|$, we get the probability that we observe $\phi_j$. This follows from the second summation in (1) because $\langle\phi_j|\phi_i\rangle = 1$ if $i = j$, and is zero otherwise. Thus, another way of writing (1) is $\varrho' = \sum_i \langle\phi_i| \varrho |\phi_i\rangle\, |\phi_i\rangle\langle\phi_i|$. Once again, we can apply linearity to get the resulting density operator when we observe a mixed state. With this in hand, we can prove the following theorem.
Theorem 2. Two states are distinguishable by some quantum process if and only if their density operators are different.
Proof. Assume that two density operators are the same. We just showed in the previous paragraphs that we only need the density operator in order to describe the outcome of some quantum process, and gave an expression for the density operator corresponding to the next state of the system. Thus, any quantum process operating on two states with the same density operators evolves the same for both of the states, results in the same final density operator for the two final states, and, most importantly, the probability of observing a string $x$ is the same for both states. Thus, since we rely on observations to decide on the output of quantum algorithms, we cannot tell from the distribution of the observations which state we were in at the beginning.
Now suppose the density operators of two states are different. Since they are both density operators, they have a different orthogonal basis of eigenvectors, or the eigenvalue corresponding to some eigenvector is different for the two density operators. In either case, we get a different distribution of observed basis vectors, and we can distinguish between the two states. Exercise 2. Make the second paragraph in the proof of Theorem 2 more formal.
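The two update rules of Section 1.3, $\varrho' = U \varrho U^*$ and the outcome probabilities $\langle\phi_j| \varrho |\phi_j\rangle$, can be applied directly to small matrices. The sketch below uses the Hadamard gate and the standard basis as an illustrative choice; the function names are hypothetical.

```python
import math

def mat_mul(A, B):
    """Product of two matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def dagger(A):
    """Conjugate transpose."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

def evolve(rho, U):
    """rho' = U rho U^*."""
    return mat_mul(mat_mul(U, rho), dagger(U))

def outcome_probabilities(rho, basis):
    """<phi_j| rho |phi_j> for each vector phi_j of an orthonormal basis."""
    probs = []
    for phi in basis:
        dim = len(phi)
        rho_phi = [sum(rho[i][k] * phi[k] for k in range(dim))
                   for i in range(dim)]
        value = sum(complex(phi[i]).conjugate() * rho_phi[i]
                    for i in range(dim))
        probs.append(value.real)
    return probs

r = 1 / math.sqrt(2)
hadamard = [[r, r], [r, -r]]
standard_basis = [[1, 0], [0, 1]]

rho_zero = [[1, 0], [0, 0]]       # pure state |0>
rho_mixed = [[0.5, 0], [0, 0.5]]  # the density operator I/2

rho_plus = evolve(rho_zero, hadamard)
probs = outcome_probabilities(rho_plus, standard_basis)
rho_mixed_after = evolve(rho_mixed, hadamard)
print(probs)
print(rho_mixed_after)
```

The operator $\frac{1}{2}I$ is left unchanged by the unitary, which matches the observation that the two mixed states from Section 1.1 cannot be told apart.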
2
Two-Party Systems
In a two-party system, two parties, Alice and Bob, have access to two different parts of a quantum register. Alice applies unitary transformations and observations to her part of the register without affecting Bob's part, and vice versa. The general form of the state is $\sum_{s,t} \alpha_{s,t} |s\rangle |t\rangle$, where the first component (the state $|s\rangle$) belongs to Alice and the second component belongs to Bob. To Alice, the state of the system looks like a mixed state over all possible states that Bob's part of the quantum register could be in. Thus, Alice's state is
$$\left\{ \left( \Pr[t],\ \frac{\sum_s \alpha_{s,t} |s\rangle}{\sqrt{\Pr[t]}} \right) \right\}_t \quad \text{where} \quad \Pr[t] = \sum_s |\alpha_{s,t}|^2,$$
and there is a symmetric expression for Bob's state.
Let's now find the density operator $\varrho_A$ for Alice. We call this the reduced density operator.
$$\varrho_A = \sum_t \Pr[t] \cdot \frac{\sum_s \alpha_{s,t} |s\rangle}{\sqrt{\Pr[t]}} \cdot \frac{\sum_{s'} \overline{\alpha_{s',t}} \langle s'|}{\sqrt{\Pr[t]}}$$
$$= \sum_{s,s'} \left( \sum_t \alpha_{s,t}\, \overline{\alpha_{s',t}} \right) |s\rangle\langle s'|$$
$$= \sum_{s,s'} \left( \sum_t (\varrho_{AB})_{(s,t),(s',t)} \right) |s\rangle\langle s'|, \qquad (2)$$
where the inner sum in (2) is denoted $(\mathrm{Tr}_B(\varrho_{AB}))_{s,s'}$ and is called the trace with respect to $B$. The matrix $\varrho_{AB}$ in (2) is the density operator for the whole system. It follows from (2) that for states (not superpositions) $s$, $t$, $s'$, and $t'$ we have $\mathrm{Tr}_B(|s\rangle\langle s'| \otimes |t\rangle\langle t'|) = \langle t|t'\rangle\, |s\rangle\langle s'|$, with $\langle t|t'\rangle = 1$ if $t = t'$ and zero otherwise, and we use linearity for superpositions. This may look a little confusing, so let's look at an example.
Example: Suppose Alice and Bob operate on a two-qubit system, where the first qubit belongs to Alice and the second qubit belongs to Bob. The density operator is
$$\varrho = \begin{pmatrix} \varrho_{00,00} & \varrho_{00,01} & \varrho_{00,10} & \varrho_{00,11} \\ \varrho_{01,00} & \varrho_{01,01} & \varrho_{01,10} & \varrho_{01,11} \\ \varrho_{10,00} & \varrho_{10,01} & \varrho_{10,10} & \varrho_{10,11} \\ \varrho_{11,00} & \varrho_{11,01} & \varrho_{11,10} & \varrho_{11,11} \end{pmatrix}.$$
Then the trace with respect to $B$ is the matrix
$$\mathrm{Tr}_B(\varrho) = \begin{pmatrix} \varrho_{00,00} + \varrho_{01,01} & \varrho_{00,10} + \varrho_{01,11} \\ \varrho_{10,00} + \varrho_{11,01} & \varrho_{10,10} + \varrho_{11,11} \end{pmatrix}.$$
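We see that $(\mathrm{Tr}_B(\varrho))_{(s,s')}$ is the trace of a submatrix of $\varrho$ where Alice's part of the first index (i.e., the first bit of the first index in our case) is fixed to $s$ and Alice's part of the second index is fixed to $s'$. Using this observation, we see that the trace with respect to $A$ is the matrix
$$\mathrm{Tr}_A(\varrho) = \begin{pmatrix} \varrho_{00,00} + \varrho_{10,10} & \varrho_{00,01} + \varrho_{10,11} \\ \varrho_{01,00} + \varrho_{11,10} & \varrho_{01,01} + \varrho_{11,11} \end{pmatrix}.$$
⊠
Example: Let
$$\varrho = \frac{1}{2}\begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}$$
be the density operator for the EPR pair. Then we have $\mathrm{Tr}_A(\varrho) = \mathrm{Tr}_B(\varrho) = \frac{1}{2}I$.
⊠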
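The index bookkeeping in the two partial traces is easy to get wrong, so a short sketch may help. It assumes the joint index $(s, t)$ is stored at position $s \cdot \dim_B + t$, so that Alice's part is the leading part of the index as in the example above; the function names are hypothetical.

```python
def partial_trace_B(rho, dim_a, dim_b):
    """(Tr_B rho)[s][s'] = sum_t rho[(s, t)][(s', t)]."""
    return [[sum(rho[s * dim_b + t][sp * dim_b + t] for t in range(dim_b))
             for sp in range(dim_a)] for s in range(dim_a)]

def partial_trace_A(rho, dim_a, dim_b):
    """(Tr_A rho)[t][t'] = sum_s rho[(s, t)][(s, t')]."""
    return [[sum(rho[s * dim_b + t][s * dim_b + tp] for s in range(dim_a))
             for tp in range(dim_b)] for t in range(dim_b)]

# Density operator of the EPR pair (|00> + |11>) / sqrt(2).
epr = [[0.5, 0, 0, 0.5],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0.5, 0, 0, 0.5]]

rho_alice = partial_trace_B(epr, 2, 2)
rho_bob = partial_trace_A(epr, 2, 2)
print(rho_alice)
print(rho_bob)
```

Both reduced operators come out as $\frac{1}{2}I$, as stated for the EPR pair.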
2.1
Schmidt Decomposition
Suppose we have a system where Alice can act on some part of it and Bob acts on the rest. The state is described by $|\psi_{AB}\rangle$. We can always write this state as a linear combination of the standard basis vectors. The next theorem states that we can do better. It is possible to write the state as a linear combination of tensor products $|\psi_i\rangle|\phi_i\rangle$ of vectors coming from two orthonormal bases, one for Alice and one for Bob, that are potentially much smaller. Moreover, the resulting reduced density operators for Alice and Bob have the same set of nonzero eigenvalues.

Theorem 3. Given a state $|\psi_{AB}\rangle$, there exist orthonormal bases $\{|\psi_i\rangle\}_i$ for Alice's part of the state and $\{|\phi_i\rangle\}_i$ for Bob's part of the state, and $\lambda_i \in [0,1]$ such that
\[ |\psi_{AB}\rangle = \sum_i \lambda_i |\psi_i\rangle |\phi_i\rangle \quad \text{with} \quad \sum_i \lambda_i^2 = 1. \]

Before we prove Theorem 3, let's see how we can use it to obtain reduced density operators $\varrho_A$ and $\varrho_B$ for Alice and Bob. It turns out that we can use traces with respect to $B$ and $A$, respectively. To see that, note
\[ \varrho_{AB} = |\psi_{AB}\rangle\langle\psi_{AB}| = \sum_{i,j} \lambda_i \lambda_j \, (|\psi_i\rangle\langle\psi_j|) \otimes (|\phi_i\rangle\langle\phi_j|), \]
and if we trace out the $B$ component we get
\[ \varrho_A = \mathrm{Tr}_B(\varrho_{AB}) = \sum_{i,j} \lambda_i \lambda_j \, \langle\phi_j|\phi_i\rangle \, |\psi_i\rangle\langle\psi_j| = \sum_i \lambda_i^2 \, |\psi_i\rangle\langle\psi_i|, \]
where the last equality follows because $\langle\phi_j|\phi_i\rangle = 1$ if $i = j$ and zero otherwise. Similarly, we get $\varrho_B = \mathrm{Tr}_A(\varrho_{AB}) = \sum_i \lambda_i^2 \, |\phi_i\rangle\langle\phi_i|$.

Proof Sketch for Theorem 3. Look at a superposition $\sum_{s,t} \alpha_{s,t} |s\rangle|t\rangle$. The values $\alpha_{s,t}$ form a matrix $A = (\alpha_{s,t})_{s,t}$, which we express using singular value decomposition as $A = U \Lambda V$, where $U$ and $V$ are unitary and $\Lambda$ is the matrix containing the singular values of $A$ on the diagonal. We now have $\alpha_{s,t} = \sum_i U_{si} \Lambda_{ii} V_{it}$, so we can use the columns of $U$ and the rows of $V$ as the bases for Alice's and Bob's parts of the state, respectively.
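To make the reduced density operators concrete, here is a minimal Python sketch for two-qubit pure states. It is an illustration under stated assumptions rather than part of the lecture: the function names are hypothetical, the state is given as its amplitude matrix $A = (\alpha_{s,t})$, and the Schmidt coefficients are read off as square roots of the eigenvalues of $\varrho_A$ (using the closed form for a 2x2 Hermitian matrix) instead of running a full singular value decomposition.

```python
import math

def reduced_density_operators(amplitudes):
    """Return (rho_A, rho_B) for the pure state sum_{s,t} amplitudes[s][t] |s>|t>."""
    rows, cols = len(amplitudes), len(amplitudes[0])
    a = [[complex(v) for v in row] for row in amplitudes]
    # Trace out Bob: rho_A[s][s'] = sum_t a[s][t] * conj(a[s'][t]).
    rho_a = [[sum(a[s][t] * a[s2][t].conjugate() for t in range(cols))
              for s2 in range(rows)] for s in range(rows)]
    # Trace out Alice: rho_B[t][t'] = sum_s a[s][t] * conj(a[s][t']).
    rho_b = [[sum(a[s][t] * a[s][t2].conjugate() for s in range(rows))
              for t2 in range(cols)] for t in range(cols)]
    return rho_a, rho_b

def hermitian_eigenvalues_2x2(m):
    """Eigenvalues of a 2x2 Hermitian matrix, largest first (closed form)."""
    mean = (m[0][0].real + m[1][1].real) / 2
    half_gap = (m[0][0].real - m[1][1].real) / 2
    radius = math.sqrt(half_gap ** 2 + abs(m[0][1]) ** 2)
    return [mean + radius, mean - radius]

def schmidt_coefficients(amplitudes):
    """Schmidt coefficients lambda_i of a two-qubit pure state (illustrative helper)."""
    rho_a, _ = reduced_density_operators(amplitudes)
    # max(..., 0.0) guards against tiny negative values caused by rounding.
    return [math.sqrt(max(ev, 0.0)) for ev in hermitian_eigenvalues_2x2(rho_a)]

s = 1 / math.sqrt(2)
epr = [[s, 0], [0, s]]        # (|00> + |11>) / sqrt(2)
product = [[s, s], [0, 0]]    # |0> tensor (|0> + |1>) / sqrt(2)

rho_a, rho_b = reduced_density_operators(epr)
print(rho_a, rho_b)                   # both equal I/2 up to rounding
print(schmidt_coefficients(epr))      # two equal coefficients
print(schmidt_coefficients(product))  # a single nonzero coefficient
```

For the EPR pair both reduced operators come out as $\frac{1}{2} I$, matching the example above, while the product state has a single nonzero Schmidt coefficient.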
2.2
Purification
We use the Schmidt decomposition to go from a density operator representing the state of the system to a reduced density operator corresponding to what is seen by one of the parties participating in the computation. Our goal here is the opposite. We start with a mixed state described by the density operator $\varrho_A$ and want to construct from it a pure state $|\psi_{AB}\rangle$ of a bigger system so that $\varrho_A = \mathrm{Tr}_B(|\psi_{AB}\rangle\langle\psi_{AB}|)$. We have $\varrho_A = \sum_i \lambda_i |\psi_i\rangle\langle\psi_i|$, and let $|\psi_{AB}\rangle = \sum_i \sqrt{\lambda_i}\, |\psi_i\rangle|\psi_i\rangle$. As we can see, we are defining Bob's part of the state to be the same as Alice's part.
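As a quick check that this choice works, the following worked derivation traces out Bob's part. It only assumes that the $|\psi_i\rangle$ are orthonormal eigenvectors of $\varrho_A$ with eigenvalues $\lambda_i$, as in the spectral decomposition above.

```latex
\begin{align*}
|\psi_{AB}\rangle\langle\psi_{AB}|
  &= \sum_{i,j} \sqrt{\lambda_i \lambda_j}\;
     |\psi_i\rangle\langle\psi_j| \otimes |\psi_i\rangle\langle\psi_j|, \\
\mathrm{Tr}_B\big(|\psi_{AB}\rangle\langle\psi_{AB}|\big)
  &= \sum_{i,j} \sqrt{\lambda_i \lambda_j}\;
     \langle\psi_j|\psi_i\rangle\; |\psi_i\rangle\langle\psi_j|
   = \sum_i \lambda_i\, |\psi_i\rangle\langle\psi_i|
   = \varrho_A .
\end{align*}
```

The cross terms vanish because $\langle\psi_j|\psi_i\rangle = 0$ for $i \neq j$, which is the same orthonormality argument used for the Schmidt decomposition.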
3
Next Time
In the following lectures, we will see some applications of density operators.
CS 880: Quantum Information Processing
10/26/2010
Lecture 21: Quantum Communication Instructor: Dieter van Melkebeek
Scribe: Mark Wellons
Last lecture, we introduced EPR pairs, which we will use in this lecture to perform quantum communication. In the typical quantum communication setting, two parties, Alice and Bob, want to communicate. Beforehand, they create an entangled EPR pair and give one qubit to Alice and the other to Bob. Alice and Bob will exploit the entanglement between the two qubits to exchange information. In this lecture, we show that quantum communication can outperform its classical counterpart and also explore the limits of quantum communication.
1
Teleportation
Teleportation is a procedure that allows Alice to send a qubit to Bob using only two classical bits and one EPR pair. Recall that the EPR pair we use is
\[ |\Phi^+\rangle = \frac{1}{\sqrt{2}} \left( |00\rangle + |11\rangle \right). \tag{1} \]
Suppose that Alice has some state
\[ |\Psi\rangle = \sum_{b \in \{0,1\}} \alpha_b |b\rangle \tag{2} \]
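that she wishes to send to Bob, and that she and Bob each hold one qubit of an EPR pair. We will denote Alice's qubit as $|A\rangle$ and Bob's as $|B\rangle$. For our first attempt at teleportation, we can try the following circuit, in which Alice entangles $|A\rangle$ with $|\Psi\rangle$.

[Circuit: three wires $|\Psi\rangle$, $|A\rangle$, $|B\rangle$. A CNOT with control $|\Psi\rangle$ and target $|A\rangle$ is followed by a measurement of $|A\rangle$. The points (1), (2), and (3) mark the state before the CNOT, after the CNOT, and after the measurement.]

At the point (1) in our circuit, the system state (omitting normalization factors) looks like
\[ |\Psi\rangle |\Phi^+\rangle = \sum_{b,c \in \{0,1\}} \alpha_b |b, c, c\rangle. \tag{3} \]
At the point (2), Alice has applied a CNOT gate to her qubit, so the system state becomes
\[ \sum_{b,c \in \{0,1\}} \alpha_b |b, b \oplus c, c\rangle. \tag{4} \]
Alice then measures her qubit, and gets state $d = b \oplus c$. At (3), she then transmits $d$ to Bob, who will use this to affect his qubit. If $d = 0$, Bob does nothing, otherwise he flips his qubit, giving us the state
\[ \sum_{b \in \{0,1\}} \alpha_b |b, b \oplus d\rangle \;\Rightarrow\; \sum_{b \in \{0,1\}} \alpha_b |b, b\rangle. \tag{5} \]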
Note that Alice's EPR qubit is omitted from the state equation, as it has been measured and is no longer useful. At this point, Bob's qubit is almost where we want it to be. However, it is still entangled with Alice's qubit. Alice could measure her state to remove the entanglement, but this collapses Bob's as well, which defeats the purpose of sending it to him in the first place. To resolve this in our second attempt, we will use a similar circuit to the one in the first attempt.

[Circuit: as in the first attempt, a CNOT with control $|\Psi\rangle$ and target $|A\rangle$ followed by a measurement of $|A\rangle$; in addition, a Hadamard gate $H$ is applied to the $|\Psi\rangle$ wire, which is then measured. The output on Bob's wire $|B\rangle$ is $|\Psi\rangle$.]

This circuit behaves much as the previous one, except that Alice uses a Hadamard gate before taking a measurement. This functionally means that she is measuring in the $\{|+\rangle, |-\rangle\}$ basis. Thus, the system state change from the Hadamard is
\[ \sum_{b \in \{0,1\}} \alpha_b |b, b\rangle \;\Rightarrow\; \sum_{a,b} \alpha_b (-1)^{ab} |a, b\rangle. \tag{6} \]
Alice then takes her second measurement and sends $a$ to Bob. If Alice measured a 0, then the system state would be
\[ \alpha_0 (-1)^{(0)(0)} |0\rangle + \alpha_1 (-1)^{(0)(1)} |1\rangle = \alpha_0 |0\rangle + \alpha_1 |1\rangle = |\Psi\rangle, \tag{7} \]
which means Bob has the state Alice wanted to send him, so we are done. In the case that Alice measured a 1, we have
\[ \alpha_0 (-1)^{(1)(0)} |0\rangle + \alpha_1 (-1)^{(1)(1)} |1\rangle = \alpha_0 |0\rangle - \alpha_1 |1\rangle, \tag{8} \]
which means Bob only needs to apply a phase flip to his qubit, and then he will have $|\Psi\rangle$.
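The second circuit can be checked with a small state-vector simulation. The sketch below is an illustration, not the lecture's own code: the helper names are hypothetical, the three qubits are ordered as ($|\Psi\rangle$, Alice's EPR qubit, Bob's EPR qubit), and the measurement outcomes $d$ and $a$ are passed in explicitly so that every branch can be examined.

```python
import math

S = 1 / math.sqrt(2)
H = [[S, S], [S, -S]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]

def bits_of(index):
    """Bits (qubit 0, qubit 1, qubit 2) of a basis-state index in 0..7."""
    return [(index >> 2) & 1, (index >> 1) & 1, index & 1]

def index_of(bits):
    return bits[0] * 4 + bits[1] * 2 + bits[2]

def apply_cnot(state, control, target):
    new = [0j] * 8
    for index, amp in enumerate(state):
        bits = bits_of(index)
        if bits[control]:
            bits[target] ^= 1
        new[index_of(bits)] += amp
    return new

def apply_gate(state, qubit, gate):
    """Apply a single-qubit gate (2x2 matrix) to the given qubit."""
    new = [0j] * 8
    for index, amp in enumerate(state):
        bits = bits_of(index)
        old = bits[qubit]
        for out in (0, 1):
            bits[qubit] = out
            new[index_of(bits)] += gate[out][old] * amp
    return new

def project(state, qubit, outcome):
    """Post-measurement state given that `qubit` was observed as `outcome`."""
    kept = [amp if bits_of(i)[qubit] == outcome else 0j
            for i, amp in enumerate(state)]
    norm = math.sqrt(sum(abs(amp) ** 2 for amp in kept))
    return [amp / norm for amp in kept]

def teleport(alpha, d, a):
    """Return Bob's final amplitudes when Alice observes d (EPR qubit) and a."""
    state = [0j] * 8
    for b in (0, 1):
        for c in (0, 1):
            state[index_of([b, c, c])] = alpha[b] * S   # |Psi>|Phi+>, eq. (3)
    state = apply_cnot(state, 0, 1)                     # eq. (4)
    state = project(state, 1, d)                        # Alice measures d
    if d == 1:
        state = apply_gate(state, 2, X)                 # Bob's bit flip, eq. (5)
    state = apply_gate(state, 0, H)                     # eq. (6)
    state = project(state, 0, a)                        # Alice measures a
    if a == 1:
        state = apply_gate(state, 2, Z)                 # Bob's phase flip, eq. (8)
    return [state[index_of([a, d, 0])], state[index_of([a, d, 1])]]

alpha = [0.6, 0.8j]
for d in (0, 1):
    for a in (0, 1):
        print(d, a, teleport(alpha, d, a))
```

In all four branches Bob's amplitudes agree with `alpha` up to rounding, so two classical bits suffice for the corrections.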
2
No Cloning Theorem
It is important to note that Alice was not able to duplicate the qubit that she wanted to send to Bob during the teleportation procedure, as her copy was destroyed in the transfer process when she measured it. This is not a coincidence, as it is impossible for any quantum process to duplicate an arbitrary state. Formally, we can show that there cannot exist a quantum operation $Q$ that performs the transformation
\[ |\psi\rangle |\psi_0\rangle \;\Rightarrow\; |\psi\rangle |\psi\rangle. \tag{9} \]

Proof. Suppose that such a $Q$ exists. Let
\[ |\psi\rangle = \alpha_0 |0\rangle + \alpha_1 |1\rangle. \tag{10} \]
Then
\[ Q |\psi\rangle |\psi_0\rangle = |\psi\rangle |\psi\rangle = \alpha_0^2 |00\rangle + \alpha_0 \alpha_1 |01\rangle + \alpha_0 \alpha_1 |10\rangle + \alpha_1^2 |11\rangle. \tag{11} \]
Since $Q$ is a quantum operation, it must be linear, so
\[ Q |\psi\rangle |\psi_0\rangle = Q (\alpha_0 |0\rangle + \alpha_1 |1\rangle) |\psi_0\rangle = \alpha_0 Q |0\rangle |\psi_0\rangle + \alpha_1 Q |1\rangle |\psi_0\rangle = \alpha_0 |00\rangle + \alpha_1 |11\rangle. \tag{12} \]
Since equations (11) and (12) must be equal, it follows that $\alpha_0 \alpha_1 = 0$, which implies that the only states we can possibly clone are the basis states $|0\rangle$ and $|1\rangle$.

Note that this proof shows that in the $\{|0\rangle, |1\rangle\}$ basis, only the states $|0\rangle$ and $|1\rangle$ can be cloned. If we changed into a different basis, the $\{|+\rangle, |-\rangle\}$ basis for example, we could repeat the proof to show that we could make a quantum operator to copy the $|+\rangle$ and $|-\rangle$ states, but nothing else. This can be generalized to the statement that a quantum operator can only be constructed to clone basis states for a specific basis.
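The gap between equations (11) and (12) can be seen numerically. The sketch below is only an illustration with hypothetical function names: it takes the linear map that copies the basis states, $|b\rangle|0\rangle \mapsto |b\rangle|b\rangle$ (a CNOT is one such map, and $|\psi_0\rangle = |0\rangle$ is an assumption), and compares its output with the ideal clone $|\psi\rangle|\psi\rangle$.

```python
import math

def linear_copy(alpha0, alpha1):
    """Right-hand side of (12): amplitudes on |00>, |01>, |10>, |11>."""
    return [alpha0, 0, 0, alpha1]

def ideal_clone(alpha0, alpha1):
    """Right-hand side of (11): amplitudes of |psi>|psi>."""
    return [alpha0 * alpha0, alpha0 * alpha1, alpha1 * alpha0, alpha1 * alpha1]

def fidelity(u, v):
    """Squared overlap |<u|v>|^2 of two normalized states."""
    inner = sum(complex(a).conjugate() * b for a, b in zip(u, v))
    return abs(inner) ** 2

s = 1 / math.sqrt(2)
print(fidelity(linear_copy(1, 0), ideal_clone(1, 0)))  # basis state |0>
print(fidelity(linear_copy(0, 1), ideal_clone(0, 1)))  # basis state |1>
print(fidelity(linear_copy(s, s), ideal_clone(s, s)))  # the state |+>
```

The basis states are copied perfectly, while for $|+\rangle$ the linear map produces an EPR pair whose overlap with $|+\rangle|+\rangle$ is only about one half, consistent with $\alpha_0 \alpha_1 \neq 0$.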
3
Superdense Coding
In teleportation, we used two classical bits and an EPR pair to send a qubit. We can also do the reverse in a process known as superdense coding. In this context, Alice will use an EPR pair and a single qubit to communicate two classical bits to Bob. To start, Alice and Bob jointly prepare their EPR pair and, as before, Alice takes $|A\rangle$ and Bob takes $|B\rangle$. At this point, the system state is
\[ |\Phi^+\rangle = \frac{1}{\sqrt{2}} \left( |00\rangle + |11\rangle \right). \tag{13} \]
Later, when Alice wants to send a two-bit message $b_1 b_2$ to Bob, she transmits $|A\rangle$ to Bob, but first she applies some transformations to it. If $b_1 = 1$, she applies the phase-flip operation, and if $b_2 = 1$, she applies the bit-flip operation. These two operations in matrix form are
\[ \text{phase-flip} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \qquad \text{bit-flip} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. \tag{14} \]
Depending on which operations Alice applied, the EPR pair will be in one of the four states
\[ |\Phi\rangle \in \left\{ \frac{1}{\sqrt{2}} \left( |00\rangle \pm |11\rangle \right), \; \frac{1}{\sqrt{2}} \left( |10\rangle \pm |01\rangle \right) \right\}. \tag{15} \]
These states are called the Bell states, and they are all orthogonal. Thus Bob merely needs to measure in the appropriate basis (the Bell basis in this case), and he can determine the system state with perfect accuracy. From this, he can infer Alice's message.
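The encoding and decoding steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: amplitudes are listed in the order $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ with Alice's qubit first, the phase flip is applied before the bit flip (the text does not fix an order, and the choice only affects a global sign), and Bob's Bell-basis measurement is modeled by computing the overlap with each Bell state. The function names are hypothetical.

```python
import math

S = 1 / math.sqrt(2)

# Bell states indexed by the message (b1, b2) they encode.
BELL = {
    (0, 0): [S, 0, 0, S],    # (|00> + |11>) / sqrt(2)
    (1, 0): [S, 0, 0, -S],   # (|00> - |11>) / sqrt(2)
    (0, 1): [0, S, S, 0],    # (|10> + |01>) / sqrt(2)
    (1, 1): [0, S, -S, 0],   # (|01> - |10>) / sqrt(2)
}

def phase_flip_alice(state):
    """Apply the phase flip to Alice's (first) qubit."""
    return [state[0], state[1], -state[2], -state[3]]

def bit_flip_alice(state):
    """Apply the bit flip to Alice's (first) qubit."""
    return [state[2], state[3], state[0], state[1]]

def encode(b1, b2):
    state = list(BELL[(0, 0)])
    if b1:
        state = phase_flip_alice(state)
    if b2:
        state = bit_flip_alice(state)
    return state

def decode(state):
    """Return the message whose Bell state has the largest overlap with `state`."""
    def probability(message):
        return abs(sum(x * y for x, y in zip(BELL[message], state))) ** 2
    return max(BELL, key=probability)

for b1 in (0, 1):
    for b2 in (0, 1):
        print((b1, b2), decode(encode(b1, b2)))
```

Because the four encoded states are orthogonal, the overlap is 1 for exactly one Bell state and 0 for the others, so decoding never errs.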
4
Bounds on Quantum Communication
Superdense coding allows us to transmit two bits with a single qubit and an EPR pair. It immediately follows that any $n$-bit message can be transmitted with $n/2$ qubits if we allow prior entanglement. Naturally, we are interested in whether we can do better. The answer is no: a reduction by a factor of two is the best quantum communication can do over classical communication.

Theorem 1. Given a quantum communication protocol that allows Alice to send any message $x \in \{0,1\}^n$ to Bob with probability of correctness¹ at least $p$, let $m_{AB}$ be the number of qubits Alice sends to Bob, and $m_{BA}$ be the number of qubits Bob sends to Alice. Then without prior entanglement
\[ m_{AB} + m_{BA} \ge n - \log \frac{1}{p}, \tag{16} \]
and with prior entanglement
\[ m_{AB} \ge \frac{1}{2} \left( n - \log \frac{1}{p} \right). \tag{17} \]
We now prove the special case of one-way communication without prior entanglement.
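Proof. Given message $x$ and some optimal protocol, Alice will send $|\psi_x\rangle$ over the channel to Bob, where $|\psi_x\rangle$ consists of $m_{AB}$ qubits. Bob then applies some quantum operation $D$ on $|\psi_x\rangle$, and then performs a projective measurement onto $P_y$ for $y \in \{0,1\}^n$. The probability that Bob gets the correct result is
\[ \Pr[\text{Bob reads the correct } x] = \| P_x D |\psi_x\rangle \|^2. \tag{18} \]
The average probability over all possible messages $x$ would then be
\[ p \le \frac{1}{2^n} \sum_x \| P_x D |\psi_x\rangle \|^2. \tag{19} \]
Since $|\psi_x\rangle$ lives in a subspace of dimension $d \le 2^{m_{AB}}$, so does $D |\psi_x\rangle$. Let $|\phi_i\rangle$, $i = 1, 2, \ldots, d$, be an orthonormal basis for the subspace containing the states $D |\psi_x\rangle$. Then
\[ D |\psi_x\rangle = \sum_{i=1}^{d} \alpha_{x,i} |\phi_i\rangle. \tag{20} \]
Substituting this back into equation (19) and choosing our projection operators such that all $P_x$'s have orthogonal ranges gives
\begin{align}
p &\le \frac{1}{2^n} \sum_x \Big\| P_x \sum_{i=1}^{d} \alpha_{x,i} |\phi_i\rangle \Big\|^2, \tag{21} \\
p &\le \frac{1}{2^n} \sum_x \sum_{i=1}^{d} |\alpha_{x,i}|^2 \, \| P_x |\phi_i\rangle \|^2, \tag{22} \\
p &\le \frac{1}{2^n} \sum_{i=1}^{d} \underbrace{\Big\| \sum_x \alpha_{x,i} P_x |\phi_i\rangle \Big\|^2}_{\le 1}. \tag{23}
\end{align}
The last bound holds because the $P_x$ have orthogonal ranges and $|\alpha_{x,i}| \le 1$, so the bracketed quantity is at most the squared length of a projection of $|\phi_i\rangle$, which is at most $\| |\phi_i\rangle \|^2 = 1$. Thus
\begin{align}
p &\le \frac{d}{2^n} \tag{24} \\
  &\le 2^{m_{AB} - n}, \tag{25}
\end{align}
which leads to
\[ m_{AB} \ge n - \log \frac{1}{p}. \tag{26} \]

¹ This is the probability that Bob receives the message that Alice actually sent. Obviously, this needs to be very close to 1.

5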
Holevo’s Theorem
The previous proof can be generalized into a more powerful theorem, but first let us review some information theory terminology. Given some random variable $X$ with range $R$, we define its classical entropy $H$ to be
\[ H(X) \equiv \sum_{x \in R} p_x \log \frac{1}{p_x}, \quad \text{where } p_x = \Pr[X = x]. \tag{27} \]
$H$ has the property that
\[ 0 \le H(X) \le \log |R|, \tag{28} \]
where $H(X) = \log |R|$ if $X$ is uniformly distributed and $H(X) = 0$ if $X$ is completely deterministic. For a mixed state with density operator $\rho$, we define the von Neumann entropy to be
\[ S(\rho) \equiv H(\text{probability distribution induced by the eigenvalues of } \rho). \tag{29} \]
Finally, we define the mutual information of random variables $X$ and $Y$ to be
\[ I(X, Y) = H(X) + H(Y) - H(X, Y). \tag{30} \]
Informally, the mutual information says how much information $X$ gives about $Y$, and vice versa. We can now state Holevo's Theorem.

Theorem 2. Suppose Alice has a random variable $X$ and, when $X = x$, sends $\rho_x$ over the channel. Bob receives $\rho_x$ and applies some operations on it to obtain $Y$. Then
\[ I(X, Y) \le S(\rho) - \sum_x p_x S(\rho_x), \quad \text{where } \rho = \sum_x p_x \rho_x. \]

Let's consider the example where $X$ is uniform over $\{0,1\}^n$ and we want zero error in our channel. Note that zero error is equivalent to $x = y$ in Holevo's Theorem. In this example, we can see that $I(X, Y) = \log 2^n = n$, and $S(\rho) \le m_{AB}$ because $\rho$ is a state on $m_{AB}$ qubits. Putting these into Holevo's Theorem gives
\[ n \le m_{AB} - \sum_x p_x S(\rho_x) \le m_{AB}. \tag{31} \]
Thus Bob cannot learn more classical bits of information than the number of qubits Alice transmits. Next lecture, we will expand on quantum communication protocols where Alice and Bob want to jointly compute a Boolean function rather than just transmit a string.
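The quantities in Holevo's Theorem are easy to evaluate for single-qubit ensembles. The following Python sketch is illustrative only: the function names are hypothetical, logarithms are base 2, and the von Neumann entropy is computed from the closed-form eigenvalues of a 2x2 Hermitian matrix, so it applies to one qubit only.

```python
import math

def shannon_entropy(probs):
    """H(X) = sum_x p_x log2(1/p_x), skipping (numerically) zero probabilities."""
    return sum(p * math.log2(1 / p) for p in probs if p > 1e-12)

def qubit_eigenvalues(rho):
    """Eigenvalues of a 2x2 Hermitian matrix (closed form)."""
    a, d = complex(rho[0][0]).real, complex(rho[1][1]).real
    radius = math.sqrt(((a - d) / 2) ** 2 + abs(rho[0][1]) ** 2)
    return [(a + d) / 2 + radius, (a + d) / 2 - radius]

def von_neumann_entropy(rho):
    """S(rho): Shannon entropy of the eigenvalue distribution of rho."""
    return shannon_entropy(qubit_eigenvalues(rho))

def holevo_bound(ensemble):
    """S(rho) - sum_x p_x S(rho_x) for a list of (p_x, rho_x) pairs."""
    rho = [[sum(p * r[i][j] for p, r in ensemble) for j in range(2)]
           for i in range(2)]
    return von_neumann_entropy(rho) - sum(p * von_neumann_entropy(r)
                                          for p, r in ensemble)

zero = [[1, 0], [0, 0]]          # |0><0|
one = [[0, 0], [0, 1]]           # |1><1|
plus = [[0.5, 0.5], [0.5, 0.5]]  # |+><+|

print(holevo_bound([(0.5, zero), (0.5, one)]))   # orthogonal states: one full bit
print(holevo_bound([(0.5, zero), (0.5, plus)]))  # non-orthogonal: strictly less
```

With orthogonal signal states the bound is one bit per qubit, matching (31) with $n = m_{AB} = 1$; with the non-orthogonal pair $|0\rangle, |+\rangle$ the bound drops to roughly 0.6 bits.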
10/28/10
CS 880: Quantum Information Processing
Lecture 22: Communication Complexity Instructor: Dieter van Melkebeek
Scribe: Kenneth Rudinger
Last lecture we saw how quantum communication protocols can lead to constant-factor savings over classical communication protocols. Given the task of transmitting an $n$-bit classical string, we still require $n$ qubits if there is no prior entanglement; $n/2$ qubits are required if prior entanglement is allowed. Thus, for such a task we cannot do better than savings of a constant factor. However, in different communication settings, such as communication complexity, we can achieve better than constant-factor savings. This lecture discusses the communication complexity of three different problems (equality, disjointness, and inner product); we will see how quantum communication can lead to better-than-constant savings.
1
Communication Complexity Overview
Instead of concerning ourselves with the task of transmitting a string over a quantum communication channel, we now concern ourselves with attempting to compute a function where two parties, Alice and Bob, are each given exclusive access to half of the input, and one (or both parties) wish to know the output of the function. In other words, consider the following general function f : f : {0, 1}n × {0, 1}n → {0, 1} : (x, y) → f (x, y)
(1)
Alice is given access to the string x, while Bob is given access to the string y. Our goal is to find a communication protocol of minimum cost (one that minimizes the quantity of data transmitted between Alice and Bob) which allows one or both parties to know f (x, y). For any given function f , there are several different variations in the kind of communication allowed, which may (or may not) affect the complexity of computing f . We now describe the possible variants.
1.1
Allowed Directions of Communication
There are three kinds of allowed directions of communication. They are the following, listed in order of increasing restrictiveness.
1. Two-way: In two-way communication, both Alice and Bob may transmit and receive information.
2. One-way: In one-way communication, only one of the two parties (say Bob) may receive information from the other.
3. Simultaneous: In simultaneous communication, both Alice and Bob may transmit information to a third party, Charlie, but cannot receive any information.
1.2
Communication Models
These are the three computational models that we concern ourselves with:
1. Deterministic: All parties' actions are determined by their part of the input and the communication they have received thus far.
2. Randomized: All parties may use randomness to decide on their actions. There are two kinds of randomized models we examine.
   (a) Private random bits: The parties have their own sources of randomness; each may generate random bits, but there is a communication cost for one to know what the other's random bits are.
   (b) Public random bits: The random bits are generated publicly; both Alice and Bob have access to the same random inputs.
3. Quantum: Lastly, we examine quantum communication, in which quantum algorithms and quantum communication channels are allowed when examining a problem's communication complexity. There are two kinds of quantum communication we consider.
   (a) Prior entanglement allowed. Alice and Bob are each given half of one or more EPR pairs.
   (b) Prior entanglement not allowed. EPR pairs do not come for free.
We also note that one may consider quantum communication in which Alice and Bob can share entanglement but only have a classical channel of communication. However, we will not further concern ourselves with this particular model of communication.
1.3
Error
Lastly, we may consider communication protocols with different allowed errors.
1. Exact. No error is allowed.
2. Bounded error. Error is allowed but bounded. It may be bounded from one or both sides.
2
Problems
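We focus on the following three functions $f(x, y)$:
1. Equality predicate:
\[ EQ(x, y) = \begin{cases} 1 & \text{if } x = y \\ 0 & \text{otherwise} \end{cases} \]
2. Disjointness predicate:
\[ DISJ(x, y) = \begin{cases} 0 & \text{if } \exists\, i \text{ such that } x_i = y_i = 1 \\ 1 & \text{otherwise} \end{cases} \]
3. Inner product:
\[ IP(x, y) = \sum_{i=1}^{n} x_i y_i \bmod 2 \]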
2.1
Equality Predicate
2.1.1
Deterministic Protocol
In the trivial protocol Alice sends all of her input to Bob, who can then evaluate any predicate on the combined input. If Alice also needs to know the answer, Bob then sends that bit to Alice. The resulting communication cost is $n$ or $n + 1$. For the equality predicate no deterministic protocol can do better.
2.1.2
Randomized Protocol
Private coins. Here is a randomized protocol with private coins. Alice picks a prime number $p$ of $O(\log n)$ bits at random, and sends $x \bmod p$ and $p$ to Bob, at a cost of $O(\log n)$. Bob computes $y \bmod p$ and compares it to $x \bmod p$ for equality. If the two are different, Bob knows with certainty that $x \neq y$; if the two are equal, it is likely that $x = y$. This protocol only needs one-way communication.

For simultaneous communication, it turns out that the complexity is $\Omega(\sqrt{n})$; this bound is tight, as there exists a simultaneous protocol that has a cost of $O(\sqrt{n})$. One way to obtain a protocol of cost $O(\sqrt{n} \log n)$ uses error-correcting codes. An error-correcting code is a mapping $E : \{0,1\}^n \to \{0,1\}^m$ with certain characteristics, namely the absolute and relative Hamming distance, respectively denoted by $\Delta$ and $\delta$, and the rate $\rho$:
\[ \Delta = \min \{ \text{number of positions in which } E(x) \text{ and } E(y) \text{ differ} : x, y \in \{0,1\}^n,\ x \neq y \}, \]
\[ \delta = \frac{\Delta}{m}, \qquad \rho = \frac{n}{m}. \]
The purpose of this error-correcting code $E$ is to amplify the difference between each of our distinct strings that reside in $\{0,1\}^n$, by mapping them all to a larger space where they can differ by more bits; even elements that differ by only a single bit in $\{0,1\}^n$ will be mapped to strings that differ by many bits in $\{0,1\}^m$. The number of differing bits is at least $\Delta$. If there is some degradation in the fidelity of some string $E(x)$ (that is, some of its bits are flipped), we may detect this error if fewer than $\Delta$ bits are flipped; if fewer than $\Delta/2$ bits are flipped, we may recover $x$ in its entirety, as the degraded string will still be closer (in Hamming distance) to $E(x)$ than to $E(y)$ for all other $y$ in $\{0,1\}^n$.

It turns out that there exist families of error-correcting codes with constant $\delta > 0$ and $\rho > 0$. Such families are sometimes referred to as "good" error-correcting codes. One such family is known as the Justesen code. We will omit the details of the code, except to say that it is efficiently computable and decodable.
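The private-coin protocol described at the start of this subsection (comparing $x \bmod p$ and $y \bmod p$ for a random prime $p$) can be sketched as follows. This is a minimal illustration, not a tuned implementation: the function names are hypothetical, inputs are bit strings, primes are found by trial division, and the choice of $2 \log n + 2$ bits for $p$ is an assumed constant that the notes do not specify.

```python
import random

def is_prime(k):
    """Trial division; adequate for the O(log n)-bit primes used here."""
    if k < 2:
        return False
    f = 2
    while f * f <= k:
        if k % f == 0:
            return False
        f += 1
    return True

def random_prime(bits, rng):
    """Sample a uniformly random prime with exactly `bits` bits."""
    while True:
        candidate = rng.randrange(2 ** (bits - 1), 2 ** bits)
        if is_prime(candidate):
            return candidate

def fingerprints_match(x, y, rng):
    """One run of the protocol: Alice sends (p, x mod p); Bob compares with y mod p."""
    n = len(x)
    bits = 2 * n.bit_length() + 2   # O(log n) bits; the constant is an assumption
    p = random_prime(bits, rng)     # Alice's private coins
    message = (p, int(x, 2) % p)    # what Alice transmits to Bob
    return message[1] == int(y, 2) % p

rng = random.Random(2010)
x = "1011" * 16
y = x[:-1] + "0"                    # differs from x in the last bit only
print(fingerprints_match(x, x, rng))  # equal inputs always match
print(fingerprints_match(x, y, rng))  # here x - y = 1, so no prime can hide the difference
```

In general a mismatch proves $x \neq y$, while a match can be wrong only when $p$ divides $x - y$, which happens for few of the candidate primes.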
We can use any good family of error-correcting codes to develop a simultaneous protocol in the following manner. We know that if $x = y$, then $E(x) = E(y)$. However, if $x \neq y$, then not only will $E(x) \neq E(y)$, but $E(x)$ and $E(y)$ will differ in a fraction at least $\delta$ of the bits. Thus, Alice and Bob compute $E(x)$ and $E(y)$, respectively. Both select bit positions at random, and send the bit positions and their corresponding bit values to Charlie. If both Alice and Bob selected position $i$, then Charlie gets to compare the $i$th bit of $E(x)$ and $E(y)$, giving a good indication of whether or not $x = y$. Charlie outputs 1 iff there is agreement in all the overlapping positions. How many such pairs need to be sent to Charlie for a good chance of overlap? By the birthday paradox, we know that sending $O(\sqrt{m})$ random positions to Charlie (which is $O(\sqrt{n})$) will yield a good probability of overlap. Since each position requires $O(\log n)$ bits of communication for the index, this yields an $O(\sqrt{n} \log n)$ simultaneous protocol that has a good probability of successfully computing $EQ(x, y)$.

Public coins. With shared public coins, Alice and Bob can select the same random bit from $E(x)$ and $E(y)$, and transmit it to Charlie. Charlie makes the comparison, and knows with good probability whether or not $x = y$. Thus, using a good error-correcting code and $O(\log n)$ public coins, we obtain a protocol of cost $O(1)$. In fact, at the expense of more public coins, we can use simpler error-correcting codes with worse rate. In particular, we can use the Hadamard code, which maps a string $x$ to the sequence of inner products of $x$ with every string of the same length. Both Alice and Bob have access to some random string $r$ of length $n$. They compute $IP(x, r)$ and $IP(y, r)$, respectively, and send their one-bit answers to Charlie. If $IP(x, r) \neq IP(y, r)$, then Charlie knows $x \neq y$. If $IP(x, r) = IP(y, r)$, then Charlie concludes that $x = y$, with error probability bounded by 1/2 (which can be made arbitrarily small by having Alice and Bob compute and transmit their inner products with fresh random strings $O(1)$ times).
2.1.3
Quantum Protocol
Based on the classical simultaneous protocol using a good error-correcting code, we can construct a quantum simultaneous communication protocol of cost $O(\log n)$, utilizing Alice's and Bob's ability to transmit qubits in superpositions. Alice and Bob prepare the following states:
\[ |\psi_{\mathrm{Alice}}\rangle = \frac{1}{\sqrt{m}} \sum_{i=1}^{m} |i\rangle |(E(x))_i\rangle, \]
\[ |\psi_{\mathrm{Bob}}\rangle = \frac{1}{\sqrt{m}} \sum_{i=1}^{m} |i\rangle |(E(y))_i\rangle. \]
These states are transmitted to Charlie. If $x = y$, $\langle\psi_{\mathrm{Alice}}|\psi_{\mathrm{Bob}}\rangle = 1$; otherwise, $|\langle\psi_{\mathrm{Alice}}|\psi_{\mathrm{Bob}}\rangle| \le 1 - \delta$. We leave it as an exercise (similar to problem 2 on HW 2) to exhibit a quantum operation Charlie can perform on these states that accepts with probability $(1 + |\langle\psi_{\mathrm{Alice}}|\psi_{\mathrm{Bob}}\rangle|^2)/2$. Since each of these states contains $O(\log n)$ qubits, this gives a simultaneous quantum protocol without entanglement with a cost of $O(\log n)$.
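The overlap between the two fingerprint states is simply the fraction of positions where the codewords agree, which the sketch below computes classically. It is an illustration under stated assumptions: the Hadamard code from the public-coin discussion stands in for $E$ (it has $\delta = 1/2$ but exponentially small rate, so it does not give the $O(\log n)$ cost of a good code), the acceptance probability is the formula quoted above, and the function names are hypothetical.

```python
from itertools import product

def inner_product_mod2(x, r):
    """IP(x, r) = sum_i x_i r_i mod 2."""
    return sum(a & b for a, b in zip(x, r)) % 2

def hadamard_codeword(x):
    """Hadamard code: inner products of x with every string of the same length."""
    return [inner_product_mod2(x, r) for r in product((0, 1), repeat=len(x))]

def fingerprint_overlap(code_x, code_y):
    """<psi_Alice|psi_Bob> = fraction of positions i with E(x)_i = E(y)_i."""
    agreements = sum(1 for a, b in zip(code_x, code_y) if a == b)
    return agreements / len(code_x)

def acceptance_probability(overlap):
    """Charlie's acceptance probability (1 + |overlap|^2) / 2 from the text."""
    return (1 + overlap ** 2) / 2

x = (1, 0, 1)
y = (1, 1, 1)
same = fingerprint_overlap(hadamard_codeword(x), hadamard_codeword(x))
diff = fingerprint_overlap(hadamard_codeword(x), hadamard_codeword(y))
print(same, acceptance_probability(same))
print(diff, acceptance_probability(diff))
```

For equal inputs Charlie always accepts; for distinct inputs the overlap is at most $1 - \delta$, so repeating the test a constant number of times separates the two cases with high probability.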
2.2
Disjointness Predicate
Now we examine the various communication settings to evaluate the disjointness predicate, $DISJ(x, y)$.
2.2.1
Deterministic Protocol
As in the case of the equality predicate, $n$ bits must be communicated in order to compute $DISJ(x, y)$.
2.2.2
Randomized Protocol
As there might be only one intersection between $x$ and $y$, which would make $DISJ(x, y) = 0$, we cannot take advantage of the birthday paradox. In fact, it can be shown that any randomized protocol has cost $\Omega(n)$, even in the public coin setting.
2.2.3
Quantum Protocol
As it is our goal to find an $i$ such that $x_i = y_i = 1$, our protocol is essentially a Grover search on $O(\log n)$ qubits. However, both Alice and Bob have their own unique parts of the input, so we cannot run Grover immediately. Instead, whenever Grover's algorithm needs to make a query on $\sum_i \alpha_i |i\rangle$, we ship that state with a few more qubits back and forth between Alice and Bob so as to obtain $\sum_i \alpha_i |i\rangle |x_i y_i\rangle$. To do so, the following algorithm is executed.
1. Alice begins with the state $|\psi_0\rangle = \sum_i \alpha_i |i\rangle |0\rangle |0\rangle |0\rangle$.
2. By XORing $x_i$ to the second register, Alice makes the state $|\psi_1\rangle = \sum_i \alpha_i |i\rangle |x_i\rangle |0\rangle |0\rangle$, which she sends to Bob.
3. Bob follows the second step in a similar fashion, writing $y_i$ to the third register, yielding $|\psi_2\rangle = \sum_i \alpha_i |i\rangle |x_i\rangle |y_i\rangle |0\rangle$.
4. Bob then transforms $|\psi_2\rangle$ to the following state: $|\psi_3\rangle = \sum_i \alpha_i |i\rangle |x_i\rangle |y_i\rangle |x_i y_i\rangle$.
5. Bob XORs $y_i$ again to the third register, yielding $|\psi_4\rangle = \sum_i \alpha_i |i\rangle |x_i\rangle |0\rangle |x_i y_i\rangle$; this state he sends to Alice.
6. Alice XORs $x_i$ again to the second register, yielding the desired state $|\psi_5\rangle = \sum_i \alpha_i |i\rangle |0\rangle |0\rangle |x_i y_i\rangle$.
Now following Grover, $O(\sqrt{n})$ queries are made, ultimately yielding an evaluation of $DISJ(x, y)$. Because it costs $O(\log n)$ qubits of communication per query, the final cost is found to be $O(\sqrt{n} \log n)$. There does exist a method for eliminating the $O(\log n)$ factor in the cost, yielding an overall cost of $O(\sqrt{n})$, which turns out to be optimal up to constant factors.
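The six steps above act on basis states as a reversible classical computation, so by linearity it is enough to follow a single index $i$. The sketch below does exactly that; it is an illustration with a hypothetical function name, and the per-query cost it reports (the index register plus three extra qubits, shipped once in each direction) is an assumed accounting consistent with the $O(\log n)$ bound.

```python
def distributed_query(i, x, y):
    """Follow |i>|0>|0>|0> through steps 1-6; return final registers and qubits sent."""
    index_qubits = max(1, (len(x) - 1).bit_length())
    second = third = fourth = 0        # step 1: Alice's ancilla registers
    sent = 0

    second ^= x[i]                     # step 2: Alice XORs x_i into register 2
    sent += index_qubits + 3           #         and ships all registers to Bob
    third ^= y[i]                      # step 3: Bob XORs y_i into register 3
    fourth ^= second & third           # step 4: Bob writes x_i * y_i into register 4
    third ^= y[i]                      # step 5: Bob uncomputes register 3
    sent += index_qubits + 3           #         and ships all registers back
    second ^= x[i]                     # step 6: Alice uncomputes register 2
    return (i, second, third, fourth), sent

x = [1, 0, 1, 1, 0, 0, 1, 0]
y = [0, 0, 1, 0, 1, 0, 1, 1]
for i in range(len(x)):
    print(distributed_query(i, x, y))

# DISJ(x, y) = 0 exactly when some query answer x_i * y_i equals 1.
disj = 0 if any(distributed_query(i, x, y)[0][3] for i in range(len(x))) else 1
print(disj)
```

Uncomputing the second and third registers matters: leaving $x_i$ or $y_i$ behind would keep the index register entangled with garbage and spoil the interference that Grover's algorithm relies on.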
2.3
Inner Product
Now we turn to the inner product function, $IP(x, y)$. It turns out that the costs for deterministic, randomized, and quantum protocols are all $\Omega(n)$, even in the bounded-error two-way communication setting. The following is a proof of this claim for the case of exact protocols.

Proof. Suppose we can evaluate $IP(x, y)$ with $m$ qubits of communication on inputs of length $n$. Then we can perform phase kickback, and apply the following transformation using $m$ qubits of communication:
\[ |x\rangle |y\rangle \to (-1)^{x \cdot y} |x\rangle |y\rangle. \]
Then Alice can transmit $n$ classical bits to Bob using $m$ qubits of communication as follows. Given an input $x$, Alice prepares the state $|x\rangle$; Bob creates a uniform superposition $\frac{1}{\sqrt{N}} \sum_y |y\rangle$, where $N = 2^n$. We can then apply our above-described inner product protocol in superposition to yield:
\[ |x\rangle \frac{1}{\sqrt{N}} \sum_y |y\rangle \to |x\rangle \frac{1}{\sqrt{N}} \sum_y (-1)^{x \cdot y} |y\rangle. \]
Now Alice has the first register with the state $|x\rangle$; Bob has the second register with the state $\frac{1}{\sqrt{N}} \sum_y (-1)^{x \cdot y} |y\rangle$. Note that the latter is simply the Hadamard transform of $|x\rangle$. Bob can then apply a Hadamard gate to each qubit in his register to invert the transformation:
\[ \frac{1}{\sqrt{N}} \sum_y (-1)^{x \cdot y} |y\rangle \to H^{\otimes n} \frac{1}{\sqrt{N}} \sum_y (-1)^{x \cdot y} |y\rangle = |x\rangle. \]
Thus Bob can recover $x$ (which is $n$ bits long) with $m$ qubits of communication. By Holevo's theorem, $m \ge n$, and we are done.
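Bob's decoding step can be verified numerically for small $n$. The sketch below is illustrative, with hypothetical function names: strings $x$ and $y$ are encoded as integers, $x \cdot y$ is the parity of their bitwise AND, and $H^{\otimes n}$ is applied with a normalized fast Walsh-Hadamard transform.

```python
import math

def phase_state(x, n):
    """Amplitudes of (1/sqrt(N)) * sum_y (-1)^(x.y) |y> with N = 2^n."""
    size = 2 ** n
    return [(-1) ** (bin(x & y).count("1") % 2) / math.sqrt(size)
            for y in range(size)]

def hadamard_transform(amplitudes):
    """Apply H to every qubit (normalized fast Walsh-Hadamard transform)."""
    amps = list(amplitudes)
    h = 1
    while h < len(amps):
        for start in range(0, len(amps), 2 * h):
            for j in range(start, start + h):
                a, b = amps[j], amps[j + h]
                amps[j] = (a + b) / math.sqrt(2)
                amps[j + h] = (a - b) / math.sqrt(2)
        h *= 2
    return amps

n = 4
x = 0b1011
recovered = hadamard_transform(phase_state(x, n))
best = max(range(2 ** n), key=lambda y: abs(recovered[y]))
print(best == x, round(recovered[x], 6))
```

All of the amplitude ends up on the basis state $|x\rangle$, so a measurement of Bob's register returns $x$ with certainty.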
3
Summary, Promise Problems
We summarize our findings of the separations in the various forms of communication protocols in the following table:

Setting        Classical    Quantum      Comments
Two-way        Ω(n)         O(√n)        From disjointness predicate; a quadratic separation
One-way        -            -            No separation known
Simultaneous   Ω(√n)        O(log n)     From equality predicate; an exponential separation
The above results are the best separations known for total functions, i.e., functions that are defined for all strings. Stronger separations are known for promise problems. In fact, there exists a promise problem that has a one-way quantum communication cost of $O(\log n)$ but requires $\Omega(n^{1/3})$ classical communication, even in the two-way setting. The following is a (continuous) example of such a problem. Alice is given $x \in \mathbb{R}^n$; Bob is given a subspace $H \subseteq \mathbb{R}^n$, where $\dim H = n/2$. The promise is that either $x \in H$ or $x \in H^\perp$; the task is to determine which subspace ($H$ or $H^\perp$) $x$ resides in. The quantum protocol is the following: Alice encodes $x$ in $\log n$ qubits, and sends the qubits to Bob. Bob then measures the projection of $x$ onto $H$, determining where $x$ resides.
CS 880: Quantum Information Processing
11/1/10
Lecture 23: Cryptographic Protocols Instructor: Dieter van Melkebeek
Scribe: Cong Han Lim
In today's and tomorrow's lectures we will be looking at cryptographic protocols. The protocols enable Alice and Bob to communicate with each other in a way where we can ensure the following goals.
1. Authenticity: Alice can authenticate a message she sends such that when Bob receives it, Bob knows that it is from Alice.
2. Secrecy: An eavesdropper (Eve) on the communication channel cannot decode the secret message.
To achieve these goals we will be looking at some basic primitives. In this lecture we will cover secret key exchange and bit commitment, and tomorrow we will be talking about zero-knowledge protocols.

We had already encountered the concept of secret key exchange when we talked about the Diffie-Hellman key exchange. The goal there was for Alice and Bob to share a randomly chosen secret key, which can subsequently be used for encryption and decryption. For security reasons, only Alice and Bob should know the key, and the domain the key was chosen from has to be sufficiently large to prevent simple exhaustive search attacks. As for bit commitment, the goal is to allow Alice to commit to a value without revealing it. The real-world analogue would be Alice sending a message inside a locked box to Bob while she retains the key to it.

In the classical setting, we do not know how to achieve unconditional security when implementing these primitives. Instead, we only know how to implement all of these primitives under certain hardness assumptions, such as the existence of one-way functions. In principle, this means that the message being transmitted actually contains all the information necessary to decode it, though it is computationally difficult to break with a classical computer.

Things change once we are in the quantum setting. On the one hand, the list of problems that are "computationally difficult" changes once we have access to a quantum computer. For example, the Diffie-Hellman protocol, ElGamal cryptosystems and RSA cryptosystems can be broken by quantum algorithms. On the other hand, a quantum computer can actually aid cryptography. As we shall see in this lecture, it allows us to realize secret key exchange unconditionally in an information-theoretic sense. However, we will also prove that it is impossible to unconditionally realize bit commitment in the same sense.
1
Secret Key Exchange
We will describe two protocols. The first one will be covered in detail. We will also briefly describe another (BB84), which is easier to implement and has in fact been realized physically (albeit with easy possible side-channel attacks). Both protocols capitalize on entanglement between the halves of EPR pairs, and presume a classical public channel that cannot be tampered with.
1.1
First Protocol (by Lo & Chau)
This protocol utilizes the key property that for an EPR pair where Alice and Bob have one part each, when Alice and Bob measure their respective qubits, they will both observe the same random bit. This is a way to share a single bit of a secret key, but it presupposes that there is some precommunication: the EPR pair has to be generated by one party and split up, one qubit has to be sent to the other party, and during the transmission one could tamper with the qubit. We will describe how to get around this issue. We will be using the Bell basis throughout this protocol:
\[ |\Phi^+\rangle = \frac{1}{\sqrt{2}} (|00\rangle + |11\rangle), \qquad |\Phi^-\rangle = \frac{1}{\sqrt{2}} (|00\rangle - |11\rangle), \]
\[ |\Psi^+\rangle = \frac{1}{\sqrt{2}} (|01\rangle + |10\rangle), \qquad |\Psi^-\rangle = \frac{1}{\sqrt{2}} (|01\rangle - |10\rangle). \]
There are three steps in this protocol. First, Alice will generate $|\Phi^+\rangle^{\otimes n}$ (the tensor product of $n$ EPR pairs) and send Bob his half of the state. Then both Alice and Bob will check the qubits they have to determine if the state is equal or close to $|\Phi^+\rangle^{\otimes n}$ or if it has been tampered with. If the states are perfect, the check should always accept. Otherwise, if the state is not close to $|\Phi^+\rangle^{\otimes n}$, then we want the check to return negative, after which Alice and Bob will discard the qubits and restart the protocol. After a successful check, Alice and Bob will measure the remaining components to obtain the secret key, which will be (almost) uniformly distributed.

How are we going to perform the checking process? Alice and Bob could measure some of the qubits directly, which would actually destroy them but would leave enough of the joint state to be used in obtaining the random key. To see how direct measurement helps, we first consider a single two-qubit state in the Bell basis. To detect whether there is a high weight on the $|\Psi\rangle$ terms, Alice and Bob both measure their qubit and compare their measurement outcomes using the secure public channel. They reject if the outcomes differ, and continue the protocol otherwise. Conceptually, this step can be thought of as performing a CNOT on the pair and observing the first qubit (the non-control qubit).

[Circuit: a CNOT whose control is the second qubit $b_2$ and whose target is the first qubit $b_1$, mapping $(b_1, b_2)$ to $(b_1 \oplus b_2, b_2)$.]

In the case of the $|\Phi\rangle$ states, the first qubit becomes $|0\rangle$, whereas for the $|\Psi\rangle$ states we get $|1\rangle$. Thus, if we measure that qubit and observe a 1, we can reject. Otherwise, we know that our state now only has $|\Phi\rangle$ components.

The above procedure allows us to eliminate the unwanted $|\Psi\rangle$ components. What about $|\Phi^-\rangle$? Applying the Hadamard gate to both qubits of our base states helpfully preserves $|\Phi^+\rangle$ but switches $|\Phi^-\rangle$ and $|\Psi^+\rangle$:
\[ (H \otimes H) |\Phi^+\rangle = |\Phi^+\rangle, \qquad (H \otimes H) |\Phi^-\rangle = |\Psi^+\rangle, \]
\[ (H \otimes H) |\Psi^+\rangle = |\Phi^-\rangle, \qquad (H \otimes H) |\Psi^-\rangle = -|\Psi^-\rangle, \]
where the global sign in the last identity is irrelevant.
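The action of $H \otimes H$ and of the CNOT check on the four Bell states can be confirmed with a few lines of Python. This is a sketch with hypothetical names: amplitudes are listed in the order $|00\rangle, |01\rangle, |10\rangle, |11\rangle$, and the probability that the CNOT check rejects is computed as the weight on basis states whose two bits differ.

```python
import math

S = 1 / math.sqrt(2)
BELL = {
    "Phi+": [S, 0, 0, S],
    "Phi-": [S, 0, 0, -S],
    "Psi+": [0, S, S, 0],
    "Psi-": [0, S, -S, 0],
}

def hadamard_both(state):
    """Apply H to both qubits of a two-qubit state."""
    amps = list(state)
    for h in (1, 2):                       # h = 1: second qubit, h = 2: first qubit
        for start in range(0, 4, 2 * h):
            for j in range(start, start + h):
                a, b = amps[j], amps[j + h]
                amps[j], amps[j + h] = (a + b) * S, (a - b) * S
    return amps

def reject_probability(state):
    """Probability that the CNOT check observes a 1, i.e. that the two bits differ."""
    return abs(state[1]) ** 2 + abs(state[2]) ** 2

for name, state in BELL.items():
    plain = reject_probability(state)                     # test without Hadamards
    rotated = reject_probability(hadamard_both(state))    # test with Hadamards first
    print(name, round(plain, 6), round(rotated, 6))
```

Only $|\Phi^+\rangle$ has rejection probability 0 under both variants of the check; each of the other three Bell states is rejected with certainty by at least one of them.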
We can now apply the CNOT and observe the first qubit to eliminate the $|\Phi^-\rangle$ state. So, depending on whether we apply the Hadamard gate before the CNOT, we have two separate tests that allow us to eliminate the unwanted states. Alice picks one of the two tests uniformly at random and sends a bit over the public channel to Bob to coordinate the check. For this combined test we have that
\[ \Pr[|\Phi^+\rangle \text{ passes}] = 1, \]
\[ \Pr[|\Psi\rangle \text{ passes}] \le |\langle\Phi^+|\Psi\rangle|^2 + \frac{1}{2} \left( 1 - |\langle\Phi^+|\Psi\rangle|^2 \right) = \frac{1}{2} \left( 1 + |\langle\Phi^+|\Psi\rangle|^2 \right). \]
The first equality follows because $|\Phi^+\rangle$ passes both tests. The inequality follows because the Bell states other than $|\Phi^+\rangle$ are rejected by at least one of the two tests.

We now consider the entire $2n$-qubit state. This can be seen as a superposition over all $n$-fold tensor products of Bell basis states. The direct approach would be to perform the check on some of the pairs of qubits, but the probability of detecting tampering if Eve only tampers with a small number of pairs is low. We need to use a better approach to increase the probability.

View the situation as follows. There are $n$ possible combined tests Alice and Bob can perform, namely one for each of their $n$ paired qubits. Each of those tests involves comparing two classical bits, one from Alice and one from Bob, which are obtained by measuring the corresponding qubits. Each individual test passes iff the bits are the same. We'd like to design a new test that rejects with high probability if at least one of those individual tests rejects, and accepts otherwise. We can do so by taking a random subset of the individual tests and rejecting iff the number of those that reject is odd. If all individual tests accept, the parity is always even; otherwise, it is odd with probability 50%. Moreover, Alice and Bob only need to sacrifice one qubit pair in order to execute this new test. They use the public channel to select the subset, and then XOR into one of the selected positions (say the first one) all the other selected qubits. They then measure the first selected position, compare the measurement outcomes using the public channel, and reject if they differ.

Let us analyze what happens to the state of the system when we apply several new tests. We can write the initial state as a superposition over all possible tensors of Bell states. Consider the evolution without renormalization. The weight of the component $|\Phi^+\rangle^{\otimes n}$ remains unaffected as that state passes all tests.
The expected weight of all other components decreases by a factor of ⊗n 1/2 with each new test. If the initial weight of Φ+ i is very small, then chances are at least one of the new tests rejects. Otherwise, the tests will make the expected weight of the other components ⊗n small compared to the weight of the Φ+ i component, so in the end the normalized state is very ⊗n close to Φ+ i . Somewhat more quantitatively, if we haven’t rejected after n/2 new tests, the ⊗n remaining n/2 qubits are exponentially close to Φ+ i with exponentially high confidence. One subtle concern we have yet to address is whether the information broadcast over the public channel can allow Eve to obtain any additional knowledge about the key. Since the only information we send over th subset, the type of check we are performing, and the results of the measurements, one can see that Eve cannot benefit from this additional knowledge.
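The random-subset parity test can be checked classically on the outcomes of the individual tests. The following is a minimal sketch, assuming the individual outcomes are given as a list of 0/1 "reject" flags; the function names are illustrative and not part of the protocol. It enumerates all subsets rather than sampling one, so the 50% figure can be confirmed exactly.

```python
from itertools import product


def parity_rejects(rejects, subset):
    """Reject iff an odd number of the selected individual tests reject."""
    return sum(r for r, chosen in zip(rejects, subset) if chosen) % 2 == 1


def rejection_fraction(rejects):
    """Fraction of all subsets for which the parity test rejects."""
    subsets = list(product((0, 1), repeat=len(rejects)))
    return sum(parity_rejects(rejects, s) for s in subsets) / len(subsets)
```

If no individual test rejects, the fraction is 0; as soon as one does, exactly half of the subsets give odd parity.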
1.2
Second Protocol (BB84)
While the first protocol gives us a fairly efficient way of performing information-theoretically secure quantum key exchange, it is not that easy to implement. In particular, implementing the XOR process is complicated as it is a quantum operation that involves many qubits. In comparison, the design and implementation of the very first quantum key exchange protocol (BB84), introduced by Charles Bennett and Gilles Brassard, is rather elegant, though it took about fifteen years before the protocol was proven to be information-theoretically secure. We will briefly describe the BB84 protocol here.

For the initial communication over a quantum channel:
1. Alice randomly generates two binary strings $a = a_1 a_2 \ldots a_n$ and $b = b_1 b_2 \ldots b_n$ of length n. String a will form the basis of the secret key, and string b will be used to decide how we encode a as qubits. If $b_i$ is 0 we encode $a_i$ in the standard basis $\{|0\rangle, |1\rangle\}$; otherwise we encode $a_i$ in the Hadamard basis $\{|+\rangle, |-\rangle\}$.
2. Alice sends the encoded qubits to Bob, who has no way of knowing which basis was used for each qubit. Instead, Bob generates a random string $b'$ and uses the ith bit of $b'$ to decide which basis he is going to measure the ith qubit in. This gives Bob another string $a'$.

We know for sure that the ith bit of $a'$ is the same as $a_i$ if $b'$ and $b$ agree on the corresponding bit. Since the choice of $b'$ is random, Bob is expected to match $b$ on about half of the positions.

Now we want to decide on our secret key. This is done over the public channel:
1. Alice and Bob share their $b$ and $b'$ over the public channel.
2. The bits of $a$ and $a'$ where $b$ and $b'$ do not coincide are discarded.

If Eve has not observed or tampered with the encoded qubits, the resulting $a$ and $a'$ will be the same; otherwise they may not be in total agreement. Alice and Bob now check if tampering has occurred:
1. Alice and Bob disclose a portion of their strings $a$ and $a'$ over the public channel. These bits are removed from the secret key.
2. If there are any differences, then we know that Eve has been tampering with or observing the state, and the protocol is aborted.
3. Otherwise, the remaining bits in $a$ and $a'$ form (two copies of) the secret key.

While this protocol is simpler to implement, the correctness proof is more difficult. One needs to argue that the more information an eavesdropper has extracted from the channel, the more likely it is for more components of $a$ and $a'$ to differ. One disadvantage shared by both protocols is that they do not work in an online fashion, since one needs to send over the entire key before any processing can occur. There are other protocols that address this issue.
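The sifting step above can be illustrated with a purely classical simulation of the measurement statistics. This is a rough sketch under stated assumptions: measuring in the preparation basis returns the encoded bit, measuring in the other basis returns a uniformly random bit, and the eavesdropper is a simple intercept-and-resend attacker (one particular attack chosen for illustration, not the general adversary of the security proof). All function names are hypothetical.

```python
import random


def measure(bit, prep_basis, meas_basis, rng):
    """Outcome of measuring a BB84 qubit: exact in the matching basis, random otherwise."""
    return bit if prep_basis == meas_basis else rng.randrange(2)


def bb84_sift(n, rng, eavesdrop=False):
    """Return the sifted strings (Alice's, Bob's) after discarding mismatched bases."""
    a = [rng.randrange(2) for _ in range(n)]        # key material
    b = [rng.randrange(2) for _ in range(n)]        # Alice's bases
    b_prime = [rng.randrange(2) for _ in range(n)]  # Bob's bases
    a_prime = []
    for bit, basis, bob_basis in zip(a, b, b_prime):
        if eavesdrop:
            # Assumed attack: Eve measures in a random basis and resends her result.
            eve_basis = rng.randrange(2)
            bit, basis = measure(bit, basis, eve_basis, rng), eve_basis
        a_prime.append(measure(bit, basis, bob_basis, rng))
    keep = [i for i in range(n) if b[i] == b_prime[i]]
    return [a[i] for i in keep], [a_prime[i] for i in keep]
```

Without Eve the sifted strings agree exactly; with this particular attack roughly a quarter of the sifted bits differ, which is what the disclosure step is meant to detect.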
2
Bit Commitment
Bit commitment is a two-stage protocol for committing to a value without revealing it. In the commitment phase, Alice picks a value for a bit b and sends some information to Bob that commits her to this value of b without revealing it. The next phase is the reveal phase, where Alice sends some additional information to Bob that allows him to figure out the bit Alice committed to. We want this protocol to satisfy two properties:
1. Binding: Once Alice has committed to a bit, she cannot change it afterwards.
2. Hiding: The information that Bob receives in the first phase cannot be used to obtain the bit before the reveal phase.

In the classical setting, we can again realize bit commitment in a computational-hardness sense by using one-way functions. One would hope that the quantum setting allows us to achieve this in an information-theoretic sense, but that unfortunately is not true.

Theorem 1. There is no information-theoretically secure quantum bit commitment scheme that has both the binding and hiding properties.

Proof. The proof is an application of the Schmidt decomposition. Consider the state after the commitment phase. The state can be described by a density operator. Consider purifications $|\psi_{AB}^{(b)}\rangle$ for $b \in \{0, 1\}$, where b is the bit that is committed by Alice. Note that information-theoretic hiding means that Bob's reduced density operator is exactly the same in both cases:

$$\varrho_B^{(0)} = \varrho_B^{(1)}. \qquad (1)$$

By applying the Schmidt decomposition, we get orthonormal bases $\{|\phi_{A,i}^{(b)}\rangle\}$ for Alice's part of the state and $\{|\phi_{B,i}^{(b)}\rangle\}$ for Bob's part of the state, such that we can write

$$|\psi_{AB}^{(b)}\rangle = \sum_i \lambda_i^{(b)} |\phi_{A,i}^{(b)}\rangle |\phi_{B,i}^{(b)}\rangle$$

and

$$\varrho_B^{(b)} = \sum_i \left(\lambda_i^{(b)}\right)^2 |\phi_{B,i}^{(b)}\rangle \langle \phi_{B,i}^{(b)}|.$$

Because of (1) we can assume that $\lambda_i^{(0)} = \lambda_i^{(1)} \doteq \lambda_i$ and $|\phi_{B,i}^{(0)}\rangle = |\phi_{B,i}^{(1)}\rangle \doteq |\phi_{B,i}\rangle$, so we can rewrite the purifications as

$$|\psi_{AB}^{(b)}\rangle = \sum_i \lambda_i |\phi_{A,i}^{(b)}\rangle |\phi_{B,i}\rangle.$$

Since both $\{|\phi_{A,i}^{(b)}\rangle\}$ are orthonormal bases over the same space, Alice can simply apply some local unitary transformation that maps $|\phi_{A,i}^{(0)}\rangle$ to $|\phi_{A,i}^{(1)}\rangle$ (or vice versa). This allows Alice to cheat and modify the bit, so our protocol fails to be binding.
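The cheating strategy in the proof can be seen on a toy example. The sketch below assumes a hypothetical two-qubit scheme (not one from the lecture) in which Alice commits to 0 with $(|00\rangle + |11\rangle)/\sqrt{2}$ and to 1 with $(|01\rangle + |10\rangle)/\sqrt{2}$, keeping the first qubit and sending the second to Bob. States are stored as amplitude tables `m[a][b]` for Alice's bit `a` and Bob's bit `b`.

```python
import math

S = 1 / math.sqrt(2)
PSI_0 = [[S, 0], [0, S]]  # (|00> + |11>)/sqrt(2): commitment to 0 in the toy scheme
PSI_1 = [[0, S], [S, 0]]  # (|01> + |10>)/sqrt(2): commitment to 1 in the toy scheme
X = [[0, 1], [1, 0]]      # Alice's local unitary in this example


def bob_density(m):
    """Bob's reduced density operator, obtained by tracing out Alice's qubit."""
    return [[sum(m[a][j] * m[a][k].conjugate() for a in range(2))
             for k in range(2)] for j in range(2)]


def apply_on_alice(u, m):
    """Apply the 2x2 unitary u to Alice's qubit only."""
    return [[sum(u[a][c] * m[c][b] for c in range(2))
             for b in range(2)] for a in range(2)]
```

Both commitments leave Bob with the maximally mixed state, so the toy scheme is perfectly hiding, yet `apply_on_alice(X, PSI_0)` equals `PSI_1`: Alice can switch her committed bit without touching Bob's qubit.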
3
References
The first quantum key exchange method is described in Lo and Chau’s article [2], which contains a good amount of exposition. The BB84 protocol [1] is the first quantum key exchange protocol designed and the first proof of its security can be found in [3].
[1] C. Bennett and G. Brassard. Quantum cryptography: Public key distribution and coin tossing. Proceedings of IEEE International Conference on Computers, Systems and Signal Processing (Bangalore, India, Dec.), pages 175-179, 1984.
[2] H. Lo and H. Chau. Unconditional security of quantum key distribution over arbitrarily long distances. arXiv:quant-ph/9803006v5, 1999.
[3] D. Mayers. Unconditional security in quantum cryptography. Journal of the ACM, vol. 48, no. 3, pages 351-406, 2001.
CS 880: Quantum Information Processing
11/2/2010
Lecture 24: Zero Knowledge Instructor: Dieter van Melkebeek
Scribe: Tyson Williams
Last lecture, we discussed cryptographic protocols. In particular, we gave a quantum protocol for secret key exchange that is secure in an information theoretical sense provided there is a secure public classical channel. We also discussed bit commitment and showed that no quantum protocol has information theoretic security. Today we will discuss zero knowledge systems and give an example of a classical zero knowledge protocol that remains zero knowledge even in the quantum setting.
1
Interactive Proof Systems
To introduce zero knowledge, we first need to introduce the notion of an interactive proof system.

Definition 1. An interactive proof system (IPS) for a language L is a protocol between a computationally unrestricted prover P and a probabilistic polynomial-time verifier V such that on input x, which is available to both parties,

$$(\forall x \in L)\quad \Pr[(V \leftrightarrow P)(x) \text{ accepts}] = 1 \qquad \text{(completeness)}$$
$$(\forall x \notin L)(\forall P')\quad \Pr[(V \leftrightarrow P')(x) \text{ accepts}] \le \frac{1}{2} \qquad \text{(soundness)}$$
where $(V \leftrightarrow P)(x)$ means "the verifier's view while running the protocol with P on input x." The view of the verifier contains his coin flips, communication received from the prover, and communication sent to the prover (although this last type of communication can be recreated by the verifier using the same random bits). The completeness does not have to be perfect (that is, equal to 1) but we will only discuss such IPSs. If soundness of 1/2 is too high, just repeat the protocol a polynomial number of times for exponentially small soundness. The soundness condition must hold for all provers $P'$, even ones that deviate from the protocol and try to convince the verifier that x is in the language when it is not.

An IPS is a generalization of the proof system associated with the class NP. For NP, the prover provides the witness as the proof and the verifier checks it deterministically in polynomial time. The difference here is that the verifier is allowed randomness and may interact with the prover several times. Without randomness, multiple rounds of interaction add no power.

An example of an IPS is, of course, standard NP proofs. An interesting example is GraphNonIsomorphism. We do not know if this problem is in NP, but it has a very simple IPS. A yes instance is a pair of graphs $G_0$ and $G_1$ that are not isomorphic. If the numbers of vertices in the graphs differ, then the verifier does not need the help of the prover, so let both graphs have n vertices. The verifier picks a bit $b \in \{0, 1\}$ and $\sigma \in S_n$ (both uniformly at random), sends $\sigma(G_b)$ to the prover, and asks the prover to state which b he used. If the prover responds correctly, then the verifier accepts; otherwise, he rejects.

If the graphs are not isomorphic, then the prover is always able to correctly identify b because $\sigma(G_b)$ is only isomorphic with $G_b$ and not with $G_{1-b}$. Thus, this IPS has perfect completeness. If the graphs are isomorphic, then the prover has no way of knowing which graph $G_b$ was selected: given any graph he received from the verifier, the probability that b = 0 is 50%. Whatever the prover does, he will be correct with probability 1/2, which matches our soundness bound.

In general, any language L has an IPS iff L can be decided in polynomial space. The direction that L having an IPS implies L ∈ PSPACE is easy. The other direction is a nontrivial result of complexity theory.
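The GraphNonIsomorphism protocol is small enough to simulate directly. The following is a minimal sketch, assuming graphs are stored as sets of two-element vertex sets and that the unbounded prover is modeled by brute force over all n! permutations (only feasible for tiny n). The helper names are illustrative.

```python
import itertools
import random


def make_graph(edges):
    """Undirected graph as a frozenset of frozenset edges."""
    return frozenset(frozenset(e) for e in edges)


def permute(graph, sigma):
    """Relabel every vertex v as sigma[v]."""
    return frozenset(frozenset(sigma[v] for v in edge) for edge in graph)


def isomorphic(g, h, n):
    """Brute-force isomorphism test standing in for the unbounded prover."""
    return any(permute(g, s) == h for s in itertools.permutations(range(n)))


def run_round(g0, g1, n, rng):
    """One round of the protocol; returns True iff the verifier accepts."""
    b = rng.randrange(2)
    sigma = rng.sample(range(n), n)
    h = permute((g0, g1)[b], sigma)
    matches = [isomorphic(g, h, n) for g in (g0, g1)]
    # If h matches exactly one graph the prover knows b; otherwise he can only guess.
    guess = matches.index(True) if sum(matches) == 1 else rng.randrange(2)
    return guess == b
```

On a non-isomorphic pair every round accepts; on an isomorphic pair the acceptance rate hovers around 1/2, in line with the soundness bound.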
2
Classical Zero Knowledge
2.1
Informal Definition
A zero knowledge interactive proof system (ZKIPS) is a special kind of IPS. There is an additional condition, namely, when x ∈ L, the verifier does not learn anything other than being convinced that x is indeed in L. In an IPS, the soundness condition protects the verifier from accepting an incorrect claim. In a ZKIPS, the new condition protects the prover from having to reveal any information (other than the correctness of the claim). When the prover follows the protocol for an input x ∈ L, the verifier will learn nothing beyond the fact that x ∈ L.

Most standard NP proofs are not zero knowledge under standard complexity-theoretic assumptions like P ≠ NP. Consider the standard NP proof that a graph is 3-colorable. The proof is a 3-coloring. Intuitively, this is not a zero knowledge proof system because the verifier has learned more than just the fact that the graph is 3-colorable. The verifier now knows a 3-coloring, which he is unable to compute under the assumptions. Now the verifier can act as the prover and convince a different verifier that this graph is 3-colorable, something that he could not have done previously.
2.2
Motivation
A ZKIPS can be used for authentication. The most popular form of authentication today is via a password that is given to the verifier. Anyone who watches the prover enter the password has broken the security. They can now successfully authenticate as the prover. If the authentication used a ZKIPS and the prover follows the protocol, then anyone can watch the prover's interaction with the verifier, but they will learn nothing besides the fact that the prover is who he says he is. In particular, no one will be able to authenticate as the prover (unless they were able to previously). This holds even for the computer system that the prover was using to communicate with the verifier.

Cryptographic protocols typically require secret keys for various parties. We would like to know that all parties correctly follow the cryptographic protocol, but to know this for certain requires knowledge of their secret keys. Instead, we can phrase it as an NP question: does there exist a secret key that would have caused the behavior we observed in the other party? Now we can use a ZKIPS to be convinced of this fact without learning the value of the secret key.
2.3
Formal Definition for a ZKIPS
We formalize the property of zero knowledge for an IPS in a strong way – that whatever can be efficiently computed from some prior knowledge and interaction with the honest prover on any input x ∈ L, can be efficiently computed from the prior knowledge without interaction with the prover.
Definition 2. A zero knowledge interactive proof system (ZKIPS) for a language L is an interactive proof system between a prover P and a verifier V where for all probabilistic polynomial-time verifiers $V'$, there exists a probabilistic polynomial-time simulator $S_{V'}$ such that

$$(\forall x \in L)(\forall a \in \Sigma^*)\quad (V' \leftrightarrow P)(x, a) \sim S_{V'}(x, a),$$

where the relation $\sim$ between the two distributions can take one of three meanings:
1. the distributions are perfectly identical, which is called perfect zero knowledge,
2. the distributions are close in the $L_1$ norm, which is called statistical zero knowledge, or
3. the distributions are computationally indistinguishable to a probabilistic polynomial-time machine, which is called computational zero knowledge.

In this definition, $S_{V'}$ simulates the interaction between P and $V'$, and a represents the prior knowledge.

Let's discuss why this definition is what we want. The only source the (dishonest) verifier $V'$ has to gain any information is his view of the interaction with the prover, which is denoted by $(V' \leftrightarrow P)(x, a)$. However, the definition says that $V'$ can instead ignore the prover and gain the same information by running $S_{V'}(x, a)$, which does not require interaction with the prover. The verifier is able to do this since $S_{V'}$ is also a probabilistic polynomial-time algorithm.
2.4
Examples of a ZKIPS
With such strong definitions, there is the risk that no examples exist. However, our definition is not that strong. We give two examples of ZKIPSs, one for GraphIsomorphism (unconditionally) and one for 3-Colorability (assuming bit commitment). We intuitively argued above that the standard NP proof that a graph is 3-colorable is not zero knowledge. The same reasoning applies to the standard NP proof that two graphs are isomorphic, which is the isomorphism itself. Note that formally proving those claims would imply separations like P ≠ NP, and is therefore beyond the current techniques of complexity theory. In contrast, proving that a protocol is zero knowledge just requires a construction like the ones below.
2.4.1
Graph Isomorphism has a ZKIPS
The input is two graphs $G_0$ and $G_1$, both with n vertices.
1. The prover picks $b \in \{0, 1\}$ and $\sigma \in S_n$ uniformly at random and sends $H = \sigma(G_b)$ to the verifier.
2. The verifier picks $c \in \{0, 1\}$ uniformly at random and sends it to the prover.
3. The prover picks some $\rho \in S_n$ and sends it to the verifier.
4. The verifier accepts iff $H = \rho(G_c)$.

Suppose the graphs are isomorphic, say $G_0 = \pi(G_1)$. Then the completeness is perfect because the prover will pick $\rho$ to be
• $\sigma$ when $b = c$,
• $\sigma \circ \pi$ when $0 = b \ne c = 1$, and
• $\sigma \circ \pi^{-1}$ when $1 = b \ne c = 0$.

The soundness is exactly 1/2 because the only way for the prover to send a valid isomorphism when the graphs are not isomorphic is when b = c, which happens with probability 1/2.

We will show that this protocol is perfectly zero knowledge by giving the simulator $S_{V'}$ on inputs $\langle G_0, G_1 \rangle$ and a. The simulator $S_{V'}(\langle G_0, G_1 \rangle, a)$ begins by running the same actions as the prover in step 1. In step 2, it behaves like $V'$ to get the bit c. If b = c, it outputs $(H, c, \sigma)$. If $b \ne c$, it starts over.

When $S_{V'}(\langle G_0, G_1 \rangle, a)$ succeeds and gets b = c, the output distributions are identical since $S_{V'}(\langle G_0, G_1 \rangle, a)$ followed the protocol. Conditioned on H and c, the probability that b = c is 1/2, so the expected number of iterations until $S_{V'}(\langle G_0, G_1 \rangle, a)$ succeeds is 2. Thus we have a probabilistic, expected polynomial-time simulator, which is good enough to achieve perfect zero knowledge. If a definite runtime is required (instead of an expected one), then we can modify $S_{V'}(\langle G_0, G_1 \rangle, a)$ to obtain statistical zero knowledge by iterating some large fixed number k of times before outputting some fixed string if all iterations failed, which happens with probability $2^{-k}$. This distribution will be very close to the actual distribution created by the protocol, as required.
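The protocol and its simulator can be sketched side by side. This is a minimal illustration under a few assumptions: graphs are sets of two-element vertex sets, permutations are lists, the (possibly dishonest) verifier is modeled as a hypothetical callable `verifier(h, rng)` returning the challenge bit, and the prior knowledge a is omitted.

```python
import random


def permute(graph, sigma):
    """Relabel every vertex v as sigma[v]."""
    return frozenset(frozenset(sigma[v] for v in edge) for edge in graph)


def compose(f, g):
    """The permutation f o g, mapping v to f[g[v]]."""
    return [f[g[v]] for v in range(len(g))]


def inverse(p):
    inv = [0] * len(p)
    for i, x in enumerate(p):
        inv[x] = i
    return inv


def honest_round(g0, g1, pi, verifier, rng):
    """Transcript (H, c, rho) of the real protocol, where g0 == permute(g1, pi)."""
    n = len(pi)
    b = rng.randrange(2)
    sigma = rng.sample(range(n), n)
    h = permute((g0, g1)[b], sigma)
    c = verifier(h, rng)
    if b == c:
        rho = sigma
    elif b == 0:
        rho = compose(sigma, pi)           # H = sigma(G0) = sigma(pi(G1))
    else:
        rho = compose(sigma, inverse(pi))  # H = sigma(G1) = sigma(pi^-1(G0))
    return h, c, rho


def simulator(g0, g1, n, verifier, rng):
    """Retry until the guessed bit b matches the verifier's challenge c."""
    while True:
        b = rng.randrange(2)
        sigma = rng.sample(range(n), n)
        h = permute((g0, g1)[b], sigma)
        c = verifier(h, rng)
        if b == c:
            return h, c, sigma


def accepts(g0, g1, transcript):
    h, c, rho = transcript
    return h == permute((g0, g1)[c], rho)
```

Note that `simulator` never uses the isomorphism `pi`, yet every transcript it outputs is accepted, even for a verifier whose challenge depends on H.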
2.4.2
3-Colorability has a ZKIPS
A ZKIPS exists for 3-Colorability assuming bit commitment. Last lecture, we showed that no bit commitment protocol has information-theoretic security, but such protocols do exist in the classical computational setting under computational assumptions, like the existence of one-way functions.

Note that it is better to base a ZKIPS on hard problems because the zero knowledge property only guarantees that a computationally efficient party cannot do anything more after running the protocol than before. If the underlying computational problem is easy, then there is no need for interaction to break the security. For that reason, zero knowledge protocols based on 3-Colorability are safer than those based on GraphIsomorphism, as the former problem is NP-complete but the latter is believed not to be.

Suppose the prover has a 3-coloring $\gamma : V(G) \to \{R, Y, B\}$ of the input graph G. The protocol then proceeds as follows.
1. The prover selects a uniformly random permutation $\pi$ of $\{R, Y, B\}$, commits to $\pi(\gamma(v))$ for all $v \in V(G)$, and sends those commitments to the verifier using the bit commitment scheme.
2. The verifier then selects $(u, v) \in E(G)$ uniformly at random and sends the edge to the prover.
3. The prover checks that (u, v) is indeed an edge in E(G). If $(u, v) \notin E(G)$, the prover aborts. If $(u, v) \in E(G)$, then the prover continues the protocol by revealing $a = \pi(\gamma(u))$ and $b = \pi(\gamma(v))$.
4. The verifier accepts iff $a, b \in \{R, Y, B\}$ and $a \ne b$.

If $\gamma$ is a valid 3-coloring, then the verifier will always accept since the colors assigned to adjacent vertices are different choices of R, Y, and B, so we have perfect completeness. If G is not 3-colorable, then there exists at least one edge where the incident vertices have the same color or one has an invalid color. The hardest case to detect is when all colors are valid and exactly one edge has incident vertices of the same color; the verifier picks that edge with probability $1/|E|$, so our soundness is at most $(|E| - 1)/|E|$. This argument also relies on the prover's bit commitments. After the verifier picks the edge (u, v), we cannot allow the prover to change to a coloring that is locally valid. In order to boost our confidence, we can repeat this protocol poly(|E|) times to achieve another protocol with soundness of at most 1/2.

Furthermore, this protocol is zero knowledge, which we show by constructing the simulator $S_{V'}$ on inputs G and a. The simulator $S_{V'}(G, a)$ begins by running the same actions as the prover in step 1. In step 2, it behaves like $V'$ to get the pair (u, v). If $(u, v) \notin E(G)$, it aborts. If $(u, v) \in E(G)$, then it outputs two distinct colors from $\{R, Y, B\}$ chosen uniformly at random.

When the verifier does not cheat and selects a pair of vertices that form an edge in G, two colors are revealed. Conditioned on the bit commitments and the edge (u, v), these two colors are fixed. However, these two colors are computationally indistinguishable from two distinct colors selected uniformly at random because the verifier does not have the computational ability to break the security of the bit commitments. Thus, this simulator proves that the protocol is computational zero knowledge.

Notice how simple this ZKIPS is. Every step only contains basic computations. This protocol could easily be implemented on a smart card. Also note that it is crucial that the prover check that the verifier's pair (u, v) is an edge. Without the check, this protocol is zero knowledge iff NP = RP.
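One round of the 3-coloring protocol can be sketched as follows. The commitment here is a salted SHA-256 hash, which is only an illustrative stand-in for a bit commitment scheme and is an assumption of this sketch, not the construction intended in the lecture; the verifier is honest and all names are hypothetical.

```python
import hashlib
import random
import secrets

COLORS = ("R", "Y", "B")


def commit(value):
    """Hash-based stand-in for a commitment: returns (digest, nonce)."""
    nonce = secrets.token_hex(16)
    return hashlib.sha256((nonce + value).encode()).hexdigest(), nonce


def opens_to(digest, nonce, value):
    return hashlib.sha256((nonce + value).encode()).hexdigest() == digest


def run_round(edges, coloring, rng):
    """One round with an honest verifier; returns True iff the verifier accepts."""
    # Step 1: prover permutes the colors and commits to every vertex color.
    perm = dict(zip(COLORS, rng.sample(COLORS, 3)))
    shuffled = {v: perm[c] for v, c in coloring.items()}
    sealed = {v: commit(c) for v, c in shuffled.items()}
    commitments = {v: digest for v, (digest, _) in sealed.items()}
    # Step 2: verifier picks a uniformly random edge.
    u, v = rng.choice(edges)
    # Step 3: prover checks the edge, then opens the two commitments.
    if (u, v) not in edges:
        return False
    a, b = shuffled[u], shuffled[v]
    # Step 4: verifier checks the openings and that the colors differ.
    valid = (opens_to(commitments[u], sealed[u][1], a)
             and opens_to(commitments[v], sealed[v][1], b))
    return valid and a in COLORS and b in COLORS and a != b
```

With a proper coloring every round accepts. If the committed coloring has exactly one monochromatic edge out of |E|, a round accepts with probability (|E| - 1)/|E|, which is why the protocol is repeated.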
3
Quantum ZKIPS
In a quantum IPS, the prover and verifier can perform quantum computations and their communication can be quantum. The prior knowledge will now be modeled by a quantum register $|\alpha\rangle$. We will now prove the following theorem.

Theorem 1. The zero knowledge interactive proof system for GraphIsomorphism remains perfectly zero knowledge in the quantum setting. Furthermore, the simulator runs in worst-case polynomial time.

A theorem like this is important because it says that the prover can continue to use a cheap, common classical computer and remain secure against a dishonest verifier who has quantum computing power.

Proof. Since the verifier can observe every message from the prover, the arguments for completeness and soundness from the classical setting still hold. What remains is to show that this protocol is still zero knowledge, which is not obvious.

Why does our argument from the classical setting fail? It is because of the prior knowledge. The standard simulation procedure runs the basic simulator until the first success. For each trial we need a fresh copy of $|\alpha\rangle$, but the no-cloning theorem forbids copying $|\alpha\rangle$. Another idea is to run the protocol backwards and try to recover $|\alpha\rangle$. However, checking for success involves a measurement, so we will not be able to recover $|\alpha\rangle$ exactly. We will show that the modified state of $|\alpha\rangle$ obtained by rewinding after a failed attempt nevertheless allows us to rerun the basic protocol with high probability of success, and keep doing so until the first success.

The key property we need of the classical zero knowledge protocol is that the probability of success of the basic simulator is independent of $|\alpha\rangle$, namely p = 1/2 in the case of the protocol for graph isomorphism. By assuming that $S_{V'}$ postpones all measurements until the end, we can represent $S_{V'}$ as a unitary matrix U applied to $|\alpha\rangle|0^m\rangle$ followed by a projective measurement $(P_0, P_1)$, where $P_1$ corresponds to success. Of course U also acts on the input, but this will not affect the analysis.

For all $|\alpha\rangle$, we have

$$\| P_1 U |\alpha\rangle|0^m\rangle \|_2^2 = p \qquad \text{(probability of success)}$$

and

$$\| P_0 U |\alpha\rangle|0^m\rangle \|_2^2 = 1 - p. \qquad \text{(probability of failure)}$$

We can rewrite the left-hand side of the latter equation as

$$\langle\alpha|\langle 0^m| U^\dagger P_0^\dagger P_0 U |\alpha\rangle|0^m\rangle = \langle\alpha|\langle 0^m| U^\dagger P_0^2 U |\alpha\rangle|0^m\rangle = \langle\alpha|\langle 0^m| U^\dagger P_0 U |\alpha\rangle|0^m\rangle,$$

because a projection matrix is Hermitian and projecting twice is the same as projecting once. Thus, writing $A = (I \otimes \langle 0^m|)\, U^\dagger P_0 U\, (I \otimes |0^m\rangle)$, we have that for all $|\alpha\rangle$,

$$\langle\alpha| A |\alpha\rangle = 1 - p.$$

The operator A does the following to $|\alpha\rangle$: it takes $|\alpha\rangle$, extends it by m zeros, applies $U^\dagger P_0 U$, and extracts the components that end in m zeros. This operator is Hermitian, so it has a full orthonormal basis of eigenvectors, and applying the identity above to each eigenvector shows that every eigenvalue must be $1 - p$. The only way that can happen is if

$$(I \otimes \langle 0^m|)\, U^\dagger P_0 U\, (I \otimes |0^m\rangle) = (1 - p) I,$$

so A maps every $|\alpha\rangle$ to $(1 - p)|\alpha\rangle$.

Another way to view all of this is that the projection of $U^\dagger P_0 U |\alpha\rangle|0^m\rangle$ onto components with $0^m$ at the end is $(1 - p)|\alpha\rangle$, which is parallel to $|\alpha\rangle$. The latter is the property we will actually use. Note that the independence of the success probability from the state $|\alpha\rangle$ is what allowed us to argue it.

Now let's see, via a two-dimensional diagram, what happens when we run our simulator $S_{V'}$. We start with the vector $|\alpha\rangle|0^m\rangle$, which we place on an axis (see Figure 1(a)). After applying U, we are in a state in which the observation can either lead to success, denoted by $|1\rangle$, or failure, denoted by $|0\rangle$ (see Figure 1(b)). If we project and measure a 1, then we are done, so assume that we measure a 0. This means we are now (after normalizing) in the state $|0\rangle|\beta_0(\alpha)\rangle$ (see Figure 1(d)).

Since we failed, we are going to try to return to the initial state by applying $U^\dagger$. There is a vector that we can pick for the vertical axis in Figure 1(a) so that $U^\dagger |0\rangle|\beta_0(\alpha)\rangle$ lies in the plane of the figure. From the first plane in Figure 1(a) to the second plane in Figure 1(b), the unitary operator U caused a rotation by $\theta$. Thus, going in the reverse direction will send us back by $\theta$ (see Figure 1(c)).

Now look at the parts of $U^\dagger |0\rangle|\beta_0(\alpha)\rangle$ that end in $0^m$. We know that this part is parallel to $|\alpha\rangle$, so if we do a phase flip for all of the components which do not end in $0^m$, then we reflect across the $|\alpha\rangle|0^m\rangle$ axis and get some state $|\phi\rangle$ (see Figure 1(e)). At this point, we have a state that is different from the state $|\alpha\rangle|0^m\rangle$ we started from, but we can still use it in the simulation.

If we apply U to $|\phi\rangle$, we return to the second diagram at an angle of $2\theta$ (see Figure 1(f)). If we fail again, we return to the state in Figure 1(c) and repeat the above process (see Figure 1(g)).
Figure 1: Two-dimensional depiction of the simulator $S_{V'}$. Panels: (a) initial state $|\alpha\rangle|0^m\rangle$; (b) applying $U$; (c) reverse computation, applying $U^\dagger$; (d) if the projection fails; (e) phase flip with respect to $|\alpha\rangle|0^m\rangle$; (f) applying $U$ again; (g) if the projection fails.
Since the probability of success is the square of the projection on the vertical axis, the probability of success in the first trial is

$$\Pr[\text{success in first trial}] = \sin^2\theta = p,$$

and the probability of success in any subsequent trial is

$$\Pr[\text{success in any subsequent trial}] = \sin^2(2\theta) = 4\sin^2\theta\cos^2\theta = 4p(1 - p).$$

In the case of graph isomorphism, the probability of success in the first trial is p = 1/2, and in the second trial is 4p(1 − p) = 1, so our simulator always halts and the running time is polynomial. As in the classical setting, the output distribution on success is identical to the view of the verifier. Thus, the protocol is perfect zero knowledge.
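The rewinding argument can be traced numerically in the two-dimensional picture. The sketch below is only a model of that picture, under the assumption that U acts as a rotation by θ between the plane spanned by $|\alpha\rangle|0^m\rangle$ and its chosen orthogonal direction and the plane spanned by the failure and success directions; the function name is illustrative.

```python
import math


def second_trial_success(p):
    """Success probability after one failed trial, rewinding, and the phase flip."""
    theta = math.asin(math.sqrt(p))
    c, s = math.cos(theta), math.sin(theta)
    # Coordinates are (failure, success) after U and (start axis, orthogonal) before U.
    # U sends the start axis to (c, s) and the orthogonal direction to (-s, c).
    u = [[c, -s], [s, c]]
    failed = (1.0, 0.0)  # normalized state after measuring "failure"
    # Apply U^dagger (the transpose, since u is real) to rewind.
    back = (u[0][0] * failed[0] + u[1][0] * failed[1],
            u[0][1] * failed[0] + u[1][1] * failed[1])
    # Phase flip on everything orthogonal to the start axis: reflect across that axis.
    phi = (back[0], -back[1])
    # Apply U again and read off the squared success amplitude.
    again = (u[0][0] * phi[0] + u[0][1] * phi[1],
             u[1][0] * phi[0] + u[1][1] * phi[1])
    return again[1] ** 2
```

The returned value agrees with 4p(1 − p); for p = 1/2 it is 1, so the second trial always succeeds in this model.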
4
Next Time
In the next lecture, we will continue our discussion of quantum interactive proofs. After this, we will begin talking about error correction.
CS 880: Quantum Information Processing
11/4/2010
Lecture 25: Error Correction Instructor: Dieter van Melkebeek
Scribe: Brian Nixon
In this lecture, we complete our discussion of interactive proof systems and begin talking about error correcting codes. We gradually build a system that will secure a single qubit from errors and prove its correctness. Next lecture we will discuss a different method of single qubit error correction.
1
Quantum Interactive Proof Systems, continued
Recall from last lecture that classically a language L has an interactive proof system (IPS) iff L ∈ PSPACE. In the quantum setting, both the verifier and the prover gain additional power, so it is not immediately clear how the class of languages that have a quantum interactive proof system (QIPS) compares. We know that we do not lose power, as the verifier can simply force the prover to adopt a classical protocol by making an observation after each step. So we at least have a QIPS for all of PSPACE. A recent result proved that we get nothing more than PSPACE by demonstrating that any QIPS can be reduced to a form similar to the one we used in our graph isomorphism protocol last lecture. In this standard form there are 4 steps:
1. Prover sends a register X of qubits to Verifier,
2. Verifier picks $b \in \{0, 1\}$ uniformly at random and sends it to Prover,
3. Prover sends register Y to Verifier,
4. Verifier decides whether to accept.

Given this form we want to know the probability that the verifier accepts. We can capture this behavior with a semidefinite program. Semidefinite programs (SDPs) are similar to linear programs: they optimize a linear objective function over the reals under linear inequality constraints; in addition, they can use constraints that require certain variables to form a positive semidefinite matrix. The solution to an SDP can be approximated to within $\epsilon$ in time polynomial in the size of the program and $1/\epsilon$. In our case the objective function can be written as

$$\frac{1}{2} \sum_{i=0}^{1} \Pr(\text{Verifier accepts} \mid b = i) = \frac{1}{2} \mathrm{Tr}(\pi_0 \rho_0) + \frac{1}{2} \mathrm{Tr}(\pi_1 \rho_1),$$

where $\pi_i$ denotes the projection onto the accepting subspace in the case b = i, and $\rho_i$ denotes the density operator related to the combination of X and Y for b = i. The constraints are the following.
• $\rho_0$ and $\rho_1$ are density operators. This can be expressed in an SDP by stipulating that $\rho_0$ and $\rho_1$ are positive semidefinite matrices, together with the linear equality that they have trace 1.
• X is independent of b. This can be expressed by the linear equalities stating that tracing out Y from $\rho_0$ and $\rho_1$ yields the same matrix: $\mathrm{Tr}_Y(\rho_0) = \mathrm{Tr}_Y(\rho_1)$.

The resulting SDP has exponential size but has enough structure that it can be solved in polynomial space.
2
Multiple Prover IPS
How does the model change if we allow multiple provers? Such provers are allowed to collaborate prior to the start of the protocol but are not allowed to communicate once it has begun (or else there would effectively be a single prover). In the classical system, the verifier would benefit from such a situation by being able to check for consistency between the various provers. This way the verifier can effectively force the provers to act in a nonadaptive manner where their answers do not depend on the questions asked before. It is known that classically a 2-prover IPS exists exactly for languages in NEXP (nondeterministic exponential time), and that more than 2 provers do not buy any more power. With quantum systems, the effect of multiple provers is an open question. The key difficulty here is that provers can share entangled qubits prior to the start of the protocol and cannot be viewed as completely disconnected.
3
Bloch Sphere and Pauli operators
Our discussion of quantum error correcting codes will be limited to codes that can correct errors on a single qubit. Before starting that discussion, we first review an interesting geometric representation of single-qubit systems and operations.

Recall that an operation on a single qubit can be represented by a 2-by-2 matrix. Let

$$X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$

These are the Pauli operators. Each of them is Hermitian and unitary, so each induces a legal quantum operation on a single qubit. The following properties of the operators are useful and you should confirm them for yourself.

Exercise 1.
1. XY = iZ, YZ = iX, ZX = iY.
2. HXH = Z, where H is the Hadamard matrix.
3. {I, X, Y, Z} form a basis for all 2 × 2 matrices.

As operations on qubits, X performs a bit flip, Z performs a phase flip, and Y corresponds to a combination of the two. The second item in Exercise 1 shows that Z corresponds to a bit flip in the Hadamard basis. The third item implies that all single-qubit density operators can be decomposed into $\rho = \alpha I + \beta X + \gamma Y + \delta Z$. We can easily show that $\alpha = 1/2$ in this case, as Tr(ρ) = 1, Tr(I) = 2, Tr(X) = Tr(Y) = Tr(Z) = 0, and the trace is a linear operator on matrices. This allows us to write any single-qubit density operator as

$$\rho = \frac{1}{2} I + \frac{1}{2} (c_x X + c_y Y + c_z Z). \qquad (1)$$
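The first two items of Exercise 1 can be confirmed numerically. The following is a small sketch using plain nested lists for 2 × 2 complex matrices; the helper names are illustrative.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def scale(c, m):
    return [[c * x for x in row] for row in m]


def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
H = scale(2 ** -0.5, [[1, 1], [1, -1]])  # Hadamard matrix
```

For instance, `close(matmul(X, Y), scale(1j, Z))` checks XY = iZ, and `close(matmul(matmul(H, X), H), Z)` checks HXH = Z.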
A pure state of a single qubit can be written uniquely as $|\psi\rangle = \cos(\theta)|0\rangle + e^{i\phi}\sin(\theta)|1\rangle$ up to a global phase shift, where $0 \le \phi < 2\pi$ and $0 \le \theta \le \pi/2$. Calculating the corresponding density operator $\rho = |\psi\rangle\langle\psi|$ yields

$$\rho = \begin{pmatrix} \cos^2(\theta) & e^{-i\phi}\sin(\theta)\cos(\theta) \\ e^{i\phi}\sin(\theta)\cos(\theta) & \sin^2(\theta) \end{pmatrix} = \frac{1}{2} \begin{pmatrix} 1 + \cos(2\theta) & e^{-i\phi}\sin(2\theta) \\ e^{i\phi}\sin(2\theta) & 1 - \cos(2\theta) \end{pmatrix}. \qquad (2)$$

Equating (1) and (2) yields

$$c_x = \sin(2\theta)\cos(\phi), \qquad c_y = \sin(2\theta)\sin(\phi), \qquad c_z = \cos(2\theta).$$

These are the polar coordinates of a point on a three-dimensional sphere with radius 1 centered at the origin. The representation is known as the Bloch sphere.

Figure 1: Bloch Sphere

Every pure single-qubit state corresponds uniquely to a point on the surface of the Bloch sphere. What about mixed states? These are convex combinations of pure states, so they yield points in the interior of the sphere.

What is the effect of the Pauli operators on the Bloch sphere? Applying Z adds π to φ, which amounts to a 180° rotation around the z-axis (or if you prefer, a reflection through the z-axis). A little calculation will reveal that X performs a rotation of 180° around the x-axis, and Y performs a rotation of 180° around the y-axis. Any curiosity you might have had about why the Pauli operators have their specific names should now be resolved. In general, any unitary operation on $|\psi\rangle$ corresponds to a rotation (not necessarily of 180°) around some axis in the Bloch sphere.

Exercise 2. Show that the Hadamard matrix performs a 180° rotation around the (x + z) axis, that is, it swaps the x and z coordinates and negates the y coordinate.

We note that $|\langle\phi_1|\phi_2\rangle|^2 = \mathrm{Tr}(\rho_1\rho_2) = \frac{1}{2} + \frac{1}{2}(c_{x1}c_{x2} + c_{y1}c_{y2} + c_{z1}c_{z2})$. In particular, orthogonal states map to antipodal points on the Bloch sphere.
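The Bloch coordinates can also be computed directly from the density operator as $c_x = \mathrm{Tr}(\rho X)$, $c_y = \mathrm{Tr}(\rho Y)$, $c_z = \mathrm{Tr}(\rho Z)$, which follows from (1) and the trace identities for the Pauli operators. The sketch below does this for a pure state given by the angles θ and φ; the function name is illustrative.

```python
import cmath
import math


def bloch_vector(theta, phi):
    """Bloch coordinates of cos(theta)|0> + e^{i phi} sin(theta)|1>."""
    psi = (complex(math.cos(theta)), cmath.exp(1j * phi) * math.sin(theta))
    # Density operator rho[i][j] = psi_i * conj(psi_j).
    rho = [[psi[i] * psi[j].conjugate() for j in range(2)] for i in range(2)]
    cx = (rho[0][1] + rho[1][0]).real         # Tr(rho X)
    cy = (1j * (rho[0][1] - rho[1][0])).real  # Tr(rho Y)
    cz = (rho[0][0] - rho[1][1]).real         # Tr(rho Z)
    return cx, cy, cz
```

The output matches the closed-form coordinates above and always has unit length. Orthogonal states such as θ = 0 and θ = π/2 land on antipodal points (0, 0, 1) and (0, 0, −1).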
4
Error Correction
Error correction seems considerably harder in the quantum setting than in the classical setting, and we will only discuss error correction on a single qubit. For comparison, correction on a single bit is easy: consider the code that repeats each bit thrice. To recover any bit, simply take the majority value of its triple (taking the view that errors are unlikely to affect more than one bit out of three).

Why is error correction for a qubit so much harder than the simple code we gave for the classical bit? There are three main reasons. First, a qubit has a continuum of possible states, whereas a bit's value lies in a discrete set. Second, it is easy to copy a bit, but in the quantum setting the no-cloning rule restricts us. Third, if we do a measurement in the course of detecting and correcting errors, that will collapse the state and may destroy information.

In the face of these apparent difficulties, we first restrict our question further to correcting only a possible bit flip error, noting that this corresponds to the only possible error in the classical setting. Taking our cue from the classical code, let us consider a system of three qubits that we bind together by applying CNOTs to get $\alpha|000\rangle + \beta|111\rangle$. See the first half of Figure 2. There are four possible states after at most a single bit flip error, namely $\alpha|000\rangle + \beta|111\rangle$, $\alpha|100\rangle + \beta|011\rangle$, $\alpha|010\rangle + \beta|101\rangle$, and $\alpha|001\rangle + \beta|110\rangle$. Fortunately, these lie in orthogonal subspaces, so we can separate them perfectly. In fact, we can figure out the error and correct it efficiently, as indicated in the second half of Figure 2. Consider the effect of reapplying the CNOT gates. If no bit flip occurred, or the bit flip occurred on one of the extra qubits, this results in the first qubit returning to $\alpha|0\rangle + \beta|1\rangle$. We can correct the remaining error case by adding a gate that flips the first qubit and is controlled by the other two (a Toffoli gate). After this procedure, each of the other two qubits takes on a pure value of either $|0\rangle$ or $|1\rangle$. In the field of error correction these two extra bits are called the "syndrome", and they tell us exactly what error occurred:
00 for no error
01 for a bit flip on the third qubit
10 for a bit flip on the second qubit
11 for a bit flip on the first qubit.
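The encoding, error, and decoding steps can be traced on a small state-vector simulation. This is only a sketch under our own conventions: a state is a dictionary from 3-bit tuples to amplitudes, qubits are indexed 0, 1, 2 (the first qubit is index 0), and all function names are hypothetical.

```python
def flip(bits, q):
    """Return a copy of the bit tuple with position q flipped."""
    return bits[:q] + (bits[q] ^ 1,) + bits[q + 1:]

def apply_x(state, q):
    """Bit flip (Pauli X) on qubit q."""
    return {flip(bits, q): amp for bits, amp in state.items()}

def apply_cnot(state, control, target):
    """Flip the target qubit on basis states where the control is 1."""
    return {(flip(bits, target) if bits[control] else bits): amp
            for bits, amp in state.items()}

def apply_toffoli(state, c1, c2, target):
    """Flip the target qubit on basis states where both controls are 1."""
    return {(flip(bits, target) if bits[c1] and bits[c2] else bits): amp
            for bits, amp in state.items()}

def encode(alpha, beta):
    """(alpha|0> + beta|1>)|00>  ->  alpha|000> + beta|111>."""
    state = {(0, 0, 0): alpha, (1, 0, 0): beta}
    return apply_cnot(apply_cnot(state, 0, 1), 0, 2)

def decode(state):
    """Undo the CNOTs, then fix the first qubit with a doubly controlled flip."""
    state = apply_cnot(apply_cnot(state, 0, 1), 0, 2)
    state = apply_toffoli(state, 1, 2, 0)
    (syndrome,) = {bits[1:] for bits in state}  # extra qubits hold one definite value
    data = {bits[0]: amp for bits, amp in state.items()}
    return data, syndrome

for error_qubit in (None, 0, 1, 2):
    state = encode(0.6, 0.8)
    if error_qubit is not None:
        state = apply_x(state, error_qubit)
    print(error_qubit, decode(state))
```

In every case the recovered first qubit has amplitudes 0.6 and 0.8 again, and the syndrome identifies the flipped qubit as in the list above.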
Figure 2: Bit flip correction (circuit on $|\phi\rangle|0\rangle|0\rangle$ with an encoding stage, the error $E_{bit}$, and a decoding stage; the outputs are $|\phi\rangle$ and the syndrome)

It is important to note that if there is a phase flip during the error stage, the circuit does not correct it: the phase flip appears on the first qubit in the final state $|\phi\rangle$. You should verify this for yourself. To correct a phase flip we can use the fact that $HZH = X$ and apply our circuit for correcting bit flips with the error zone flanked by $H^{\otimes 3}$. As a bit flip here corresponds to a phase flip in the previous circuit, it passes through as a phase flip did before.
Figure 3: Phase flip correction (the bit flip circuit of Figure 2 with Hadamard gates on all three qubits before and after the error $E_{phase}$)

We can handle combined errors by concatenating both codes as in Figure 4. We use the bit flip code internally to correct a bit flip and transfer a phase flip onto the first of the three qubits in the block. The dashed line in Figure 4 encircles one of the internal bit flip codes. There are three such blocks, one for each of the qubits of the external code, for which we use our phase flip code.
Figure 4: Full Single Qubit Error Correction (nine-qubit circuit on $|\phi\rangle$ and eight ancilla qubits $|0\rangle$: an outer phase flip code whose three qubits are each encoded with an inner bit flip code, surrounding the error $E$)

The resulting 9-qubit code can correct a single bit flip $X$, a single phase flip $Z$, and their combination $Y$. By linearity, this means that the code can correct any single-qubit error $E$, as any such $E$ can be written as a linear combination of $I$, $X$, $Y$, and $Z$. Here we can see the magic of quantum linearity at work: although there is a continuum of possible single-qubit errors, it suffices to correct a discrete set (bit flips, phase flips, and their combinations) in order to correct them all. Next lecture we will discuss a different method of single-qubit error correction that uses only 7 qubits to represent a single logical qubit.
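The claim that any single-qubit error $E$ is a linear combination of $I$, $X$, $Y$, and $Z$ can be made concrete. Since $\mathrm{Tr}(PQ) = 2$ if $P = Q$ and $0$ otherwise for Pauli matrices $P, Q$, the coefficient of $P$ in $E$ is $\mathrm{Tr}(PE)/2$. The sketch below uses plain nested lists for $2 \times 2$ matrices; the helper names and the sample matrix are ours.

```python
import math

PAULIS = {
    "I": [[1, 0], [0, 1]],
    "X": [[0, 1], [1, 0]],
    "Y": [[0, -1j], [1j, 0]],
    "Z": [[1, 0], [0, -1]],
}

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def pauli_coefficients(E):
    """Coefficients c_P such that E = sum over P of c_P * P, via c_P = Tr(P E) / 2."""
    coeffs = {}
    for name, P in PAULIS.items():
        PE = matmul(P, E)
        coeffs[name] = (PE[0][0] + PE[1][1]) / 2
    return coeffs

def recombine(coeffs):
    """Rebuild the matrix sum over P of c_P * P from its coefficients."""
    return [[sum(coeffs[name] * PAULIS[name][i][j] for name in PAULIS)
             for j in range(2)] for i in range(2)]

# An arbitrary (not even unitary) 2x2 matrix, chosen only for illustration.
E = [[0.9, 0.1 + 0.2j], [-0.3j, 1.1]]
coeffs = pauli_coefficients(E)
print(coeffs)
print(recombine(coeffs))  # reproduces E up to rounding

# The Hadamard matrix is (X + Z) / sqrt(2).
r = 1 / math.sqrt(2)
print(pauli_coefficients([[r, r], [r, -r]]))
```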
CS 880: Quantum Information Processing
11/09/10
Lecture 26: Error Correcting Codes Instructor: Dieter van Melkebeek
Scribe: John Gamble
Last class we began talking about quantum error correcting codes, and the difficulties that arise when trying to apply classical error correcting techniques in a quantum setting. At first, the task seems rather daunting, since in a classical setting each bit has only two possible values, while on a quantum computer each qubit can take a continuum of values. Nonetheless, we were able to show that it is sufficient to be able to correct only bit flips, phase flips, and their combination. Linearity then enabled us to correct an arbitrary error. We used this to develop a code that represented one logical qubit as nine physical qubits, and could correct an arbitrary single-qubit error. Today, we will examine the Calderbank-Shor-Steane (CSS) procedure for generating quantum codes based on classical codes. Using this, we will be able to find a code that uses seven physical qubits to represent one logical qubit and corrects an arbitrary single-qubit error (an improvement over our previous nine-qubit code). Using the stabilizer formalism, it is possible to generate a five-qubit code, which is optimal, but we will not cover that here.
1
Background on Classical Codes
A code is a mapping $C : \{0,1\}^k \to \{0,1\}^n$ that maps information words in the set $\{0,1\}^k$ to (longer) codewords in the set $\{0,1\}^n$. The basic idea is that we add some redundancy to become robust to some errors. Two key properties of codes are their rate $\rho = k/n$ and their relative distance $\delta = d/n$, where $d$ is the minimum Hamming distance between any two distinct valid codewords. Since we do not want the encodings to be too expensive, good codes should have high rates. Also, as we would like to be able to correct many errors, we would like high relative distances. Codes are typically denoted by $(n, k)$ or $(n, k, d)$. As before, if the absolute distance satisfies $d \ge 2t + 1$, where $t$ is the maximum number of errors that can occur, received words that come from different codewords cannot collide and we can correct the errors. More specifically, consider a received word $r = C(x) + e$, which is an encoding of the information word $x$ with some error $e$. As long as $2 \cdot \mathrm{weight}(e) + 1 \le d$, we can recover $C(x)$ from $r$ simply by choosing the valid codeword closest to $r$.
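Nearest-codeword decoding can be illustrated on the repetition code. The following sketch represents a code as a dictionary from information words to codewords; this representation and the function names are our own choices for illustration.

```python
def hamming_distance(u, v):
    """Number of positions in which the two words differ."""
    return sum(a != b for a, b in zip(u, v))

def repetition_code(n):
    """Codebook of the (n, 1, n) repetition code: information word -> codeword."""
    return {(bit,): (bit,) * n for bit in (0, 1)}

def minimum_distance(codebook):
    """Smallest Hamming distance between two distinct codewords."""
    words = list(codebook.values())
    return min(hamming_distance(u, v)
               for i, u in enumerate(words) for v in words[i + 1:])

def decode_nearest(received, codebook):
    """Return the information word whose codeword is closest to the received word."""
    return min(codebook, key=lambda x: hamming_distance(codebook[x], received))

code = repetition_code(5)
d = minimum_distance(code)
print("d =", d, "so up to", (d - 1) // 2, "errors are corrected")
print(decode_nearest((1, 0, 1, 1, 0), code))  # codeword 11111 with two errors
```

With $d = 5$ the condition $2 \cdot \mathrm{weight}(e) + 1 \le d$ allows errors of weight up to 2, which is why the received word with two flipped positions still decodes to the information bit 1.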
1.1
Linear Codes
Many codes are linear, meaning that $C$ is a linear mapping. In this case, the absolute distance $d$ of the code is just the minimum weight of any nonzero codeword. To see this, first note that by linearity the zero vector must map to the zero vector: $C(0^k) = 0^n$, so $d$ is at most the minimum weight of a nonzero codeword. Now, suppose that two codewords, $\alpha$ and $\beta$, are the closest in the code. Then, by linearity, $\gamma = \alpha - \beta$ is also a codeword. Since the distance between $\alpha$ and $\beta$ is the smallest in the code and is also the same as the distance between $\gamma$ and $0^n$, which is the weight of $\gamma$, we have that $d$ is at least the minimum weight. Another interesting feature of linear codes is that the code generation and decoding can be done in a generic way:
1. For encoding, use $x \to Gx$, where $G$ is an $n \times k$ generating matrix.
2. For decoding, use the $(n - k) \times n$ parity check matrix $P$:
\[ y \in C \Leftrightarrow Py = 0. \tag{1} \]
We can view $P$ as a set of homogeneous linear equations that exactly characterize the codewords. Also, we can think of $P^T$ as the generator matrix of the orthogonal complement of $C$, its dual code $C^\perp$. Vectors in this dual code are orthogonal to all codewords in $C$, and are generated from information words of length $n - k$. The parity check matrix can be used not only to detect errors, but also to correct them. For instance, suppose that we had a received word $r = C(x) + e$ with error $e$, generated from an information word $x$. Then, applying the parity check matrix to $r$ gives us $Pr = 0 + Pe$. Here, $Pe$ is called the error syndrome, as it will allow us to diagnose the error. Note that for any two $e_1 \ne e_2$, as long as neither has weight more than $(d - 1)/2$, we know that $Pe_1 \ne Pe_2$. If they were the same, then we would have $P(e_1 - e_2) = 0$, so $e_1 - e_2$ would be a nonzero codeword (by linearity) of weight less than $d$, which is a contradiction. Hence, if no more than $(d - 1)/2$ errors occurred, the syndrome tells us the location of every error. The distance $d$ can also be interpreted in terms of $P$. Specifically, $d$ is the smallest weight of a nonzero string $y$ such that $Py = 0$. In other words, $d$ is the smallest nonzero number of columns of $P$ that add up to the all-zero column. Linear codes are typically denoted with square brackets: $[n, k]$ or $[n, k, d]$.
1.2
Examples of linear codes
The trivial example we mentioned last time is the repetition code, where we just repeat one bit a number of times. In that case, we are trying to encode one bit, so $k = 1$, and $n$ is the number of times we repeat. The relative distance is just $\delta = 1$, which is as good as it can be, since the only two valid codewords are all zeros and all ones. However, the rate is $\rho = 1/n$, as bad as it can be. As we mentioned in an earlier lecture, there are also families of codes where both $\delta$ and $\rho$ are positive constants. One example is the Justesen code, but we will not develop it here.

Next, consider the Hamming codes, a class of codes defined by their parity check matrices. Take $P$ to have $s$ rows, with columns consisting of all nonzero strings of length $s$:
\[ P = \begin{pmatrix} 0 & 0 & 0 & \cdots & 1 \\ 0 & 0 & 0 & \cdots & 1 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 1 & 1 & \cdots & 1 \\ 1 & 0 & 1 & \cdots & 1 \end{pmatrix} \tag{2} \]
$P$ has dimension $s \times (2^s - 1)$. Since $P$ operates on codewords of length $n$, we know $n = 2^s - 1$. Also, since $P$ has $n - k$ rows, $k = 2^s - 1 - s$. The columns of $P$ are distinct and nonzero, so no one or two of them add up to the zero vector, while some three of them do (for example, the first three). Hence the code has distance 3. This is a $[2^s - 1, 2^s - 1 - s, 3]$ code, and it can correct a single error. Further, determining where the error occurred is very easy. This is because for any $e$ of weight one, $Pe = i$, where $i$ is the binary representation of the location of the error.
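The syndrome computation for a Hamming code is short enough to sketch directly. We assume the column ordering shown in (2), where column $i$ (counting from 1) is the binary representation of $i$ with the most significant bit in the top row; the function names are ours.

```python
def hamming_parity_check(s):
    """s x (2^s - 1) matrix whose column i (1-indexed) is i in binary, MSB on top."""
    n = 2 ** s - 1
    return [[(col >> (s - 1 - row)) & 1 for col in range(1, n + 1)]
            for row in range(s)]

def syndrome(P, word):
    """Compute P * word over GF(2)."""
    return [sum(p * w for p, w in zip(row, word)) % 2 for row in P]

def correct_single_error(P, word):
    """Read the syndrome as a binary number: 0 means no error, otherwise the position."""
    position = int("".join(str(bit) for bit in syndrome(P, word)), 2)
    fixed = list(word)
    if position:
        fixed[position - 1] ^= 1
    return fixed, position

P = hamming_parity_check(3)
for row in P:
    print(row)

codeword = [1, 1, 1, 0, 0, 0, 0]   # columns 1, 2, 3 of P sum to zero
received = list(codeword)
received[5] ^= 1                    # flip position 6 (1-indexed)
print(syndrome(P, received))        # binary representation of 6
print(correct_single_error(P, received))
```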
A common instantiation of this is for $s = 3$, where we obtain the $[7, 4, 3]$ code $H_3$, whose dual code $H_3^\perp$ is a $[7, 3]$ code. One can also show that in this case $H_3^\perp \subseteq H_3$.

Exercise 1. Verify that $H_3^\perp \subseteq H_3$.
2
CSS Codes
CSS codes are a family of codes generated by a procedure for translating classical codes to quantum codes.

Theorem 1. Suppose we have a (classical) $[n, k_1]$-code $C_1$ with distance $d(C_1) \ge 2t + 1$ and a classical $[n, k_2]$-code $C_2 \subseteq C_1$ such that the distance of the dual code $C_2^\perp$ satisfies $d(C_2^\perp) \ge 2t + 1$. Then, there exists a quantum code $\mathrm{CSS}(C_1, C_2)$ that maps $k_1 - k_2$ logical qubits to $n$ physical qubits and that can correct arbitrary errors on $t$ qubits.

As a specific example of this, pick $C_1 = C_2^\perp = H_3$. Then $k_1 - k_2 = 4 - 3 = 1$, $t = 1$, and $n = 7$, which gives a seven-qubit code that encodes a single logical qubit and can correct an arbitrary error on one qubit.

Proof. First, note that it is enough to specify the encoding of base states, as we can then generate an arbitrary information word from a linear combination of base states. The basis for the codewords will all be of the form
\[ |C_2 + x\rangle = \frac{1}{\sqrt{|C_2|}} \sum_{y \in C_2} |y + x\rangle, \tag{3} \]
where $x \in C_1$. So, our codewords are coset states of $C_2$. Two of these states $|C_2 + x_1\rangle$ and $|C_2 + x_2\rangle$ are distinct when $x_1$ and $x_2$ belong to distinct cosets of $C_2$ in $C_1$, i.e., iff their difference $x_1 - x_2 \notin C_2$. Thus, the number of available codewords is $2^{k_1}/2^{k_2} = 2^{k_1 - k_2}$.

The question now is how we correct errors on these encodings. We will proceed as we did before, by showing that we can correct bit flips, phase flips, and their combination, which is sufficient to correct arbitrary errors by linearity. We begin with bit flips, supposing that we have a state $|C_2 + x + e\rangle$, where $e$ is a bit flip error. Note that since we required that $C_2 \subseteq C_1$, for each $y \in C_2$, $y + x$ is also a codeword in $C_1$. Hence, on each component of the superposition of our coset state, we can correct for $e$ as long as $\mathrm{weight}(e) \le t$, since $C_1$ can correct $t$ errors. By linearity, this means that we can correct up to $t$ bit flip errors on $|C_2 + x\rangle$. Note that, as before, this procedure does not affect phase flips. Next, we look at phase flip errors.
Recall that in the Hadamard domain, phase flips become bit flips. Referring to our discussion of the Fourier transform of a coset state and the fact that $H^{\otimes n}$ is the Fourier transform over $\mathbb{Z}_2^n$, we obtain the following for the Fourier transform of a valid codeword:
\[ H^{\otimes n} |C_2 + x\rangle = F_{\mathbb{Z}_2^n} |C_2 + x\rangle = \frac{1}{\sqrt{|C_2^\perp|}} \sum_{y \in C_2^\perp} \chi_y(x) |y\rangle, \tag{4} \]
where in this case the character $\chi_y(x) = (-1)^{x \cdot y}$. If we apply this to a superposition of encoded states, we have
\[ H^{\otimes n} \sum_{x \in C_1} \alpha_x |C_2 + x\rangle = \sum_{x \in C_1} \frac{\alpha_x}{\sqrt{|C_2^\perp|}} \sum_{y \in C_2^\perp} \chi_y(x) |y\rangle. \tag{5} \]
If we have some phase flips before the Hadamard transform, then they map to bit flips after the Hadamard transform, i.e., we replace $|y\rangle$ on the right-hand side of (5) by $|y + e\rangle$, where $e$ indicates the positions of the phase flips. As long as $e$ has weight at most $t$, we can correct the errors, since $C_2^\perp$ was required to be able to correct $t$ errors. Next, we apply the Hadamard transform again, which brings us back to a valid encoding. Since phase flips are not affected by the bit flip correction, applying the bit flip error correction procedure followed by the phase flip error correction procedure allows us to correct bit flips, phase flips, and combined bit/phase flips on $t$ qubits. By the linearity argument from last lecture, this means the procedure corrects arbitrary errors on up to $t$ qubits.

As it turns out, the simple nine-qubit code we developed last time can also be viewed as a CSS code through use of classical repetition codes.

Exercise 2. Show that the nine-qubit code we developed last time can be expressed using the CSS formalism.
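To make the coset states of the proof concrete, the sketch below enumerates the cosets of $C_2$ in $C_1$ for the seven-qubit example $C_1 = C_2^\perp = H_3$. It assumes the $s = 3$ Hamming parity check matrix with columns ordered as the binary numbers 1 through 7; the variable and function names are ours. Each coset is the set of basis strings appearing in one logical basis state $|C_2 + x\rangle$.

```python
from itertools import product

# Parity check matrix of H_3; its rows generate the dual code H_3^perp.
P = [[0, 0, 0, 1, 1, 1, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [1, 0, 1, 0, 1, 0, 1]]

def span(rows):
    """All GF(2) linear combinations of the given rows."""
    n = len(rows[0])
    return {tuple(sum(c * r[i] for c, r in zip(coeffs, rows)) % 2 for i in range(n))
            for coeffs in product([0, 1], repeat=len(rows))}

def in_code(word):
    """Membership in C_1 = H_3: every parity check evaluates to zero."""
    return all(sum(p * w for p, w in zip(row, word)) % 2 == 0 for row in P)

C2 = span(P)                                              # H_3^perp
C1 = {w for w in product([0, 1], repeat=7) if in_code(w)}  # H_3

def coset(x):
    """The set C_2 + x."""
    return frozenset(tuple((a + b) % 2 for a, b in zip(y, x)) for y in C2)

cosets = {coset(x) for x in C1}
print(len(C1), len(C2), len(cosets))   # sizes 2^k1, 2^k2, and 2^(k1 - k2)
```

The number of cosets printed is the number of logical basis states, matching $2^{k_1 - k_2}$ with $k_1 = 4$ and $k_2 = 3$.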
CS 880: Quantum Information Processing
11/11/2010
Lecture 27: Nonlocality Instructor: Dieter van Melkebeek
Scribe: Balasubramanian Sivan
The general idea in quantum communication protocols we have seen was to exploit the power of entanglement present in EPR pairs. Today, through various other examples, we illustrate several strange tasks that can be accomplished through that same power of entanglement.
1
What does nonlocal mean?
We begin by discussing what nonlocal doesn’t stand for. Let two parties Alice and Bob share one qubit each of a Bell state and separate. One interpretation of nonlocality could be that as soon as Alice makes a measurement, the resultant change in the system is instantaneously reflected in Bob’s view of the world. This is not the right interpretation of nonlocality though it seems true at first sight. In fact, this would contradict the postulate that no information can be transmitted faster than the speed of light. To see that instantaneous information transmission is indeed not happening here, consider the reduced density operator of Bob, which completely describes Bob’s view of the world. We discussed earlier that the reduced density operator contains all information that Bob needs to know about the rest of the world, to find how his state evolves, the effect of the operations that he performs, etc. Now, when Alice performs an operation on her qubit, the reduced density operator of Bob does not change. One way to see this is through the Schmidt decomposition. When Alice performs a unitary operation, the orthonormal basis given by the Schmidt decomposition changes, but this does not have any effect on Bob’s reduced density operator. (This was a fact that we exploited when we proved that perfect bit commitment is not possible even with quantum computers.) Similarly, when Alice performs a measurement, it can be interpreted as purification of the system on Alice’s part, and this does not affect Bob’s reduced density operator. Thus nonlocality is not immediate signaling. Nonlocality refers to the fact that the entanglement exhibited by the EPR pairs cannot be explained by the theory of local hidden variables. Hidden variable theory, proposed by Einstein, Podolsky and Rosen, states that for an entangled system, somehow, the two parties Alice and Bob have some local variables, which describe how the system behaves for each possible measurement. 
That is, before separating, Alice and Bob make a list of how the system behaves for every possible measurement, and thus, the information contained in these “hidden variables” explains entanglement. Today we illustrate through various examples that this is not the case.
2
GHZ paradox
The GHZ paradox is named after Greenberger, Horne and Zeilinger. We discuss this from a computer science perspective. Consider a three-party game, with Alice, Bob and Carol. They receive one bit each, r, s, and t, respectively, with the promise that r ⊕ s ⊕ t = 0. Thus there are four possible inputs: all zeroes, or exactly two ones. The input is drawn uniformly at random from these four possibilities. The goal is, without any communication after the inputs were given, for Alice, Bob and Carol to output a single bit each, a, b, and c, such that a ⊕ b ⊕ c = r ∨ s ∨ t. What is the probability of success achievable classically?

We can achieve a probability of success of 0.75 through a very simple protocol. Two of Alice, Bob and Carol shall always output zero, and the third person shall always output one. This way, the XOR of all their outputs will always be one. We require the XOR of the outputs to be one for three out of the four inputs, namely for the three inputs apart from the all-zeroes input. Thus this simple protocol is correct for three of the four possible inputs, giving a success probability of 0.75.

We now prove that this is in fact the highest possible probability of success that can be achieved classically. Let a0, b0 and c0 denote the outputs of Alice, Bob and Carol when they receive an input of zero, and let a1, b1, and c1 denote the same when they receive an input of one. We require the following:
Input = 000 ⇒ a0 ⊕ b0 ⊕ c0 = 0
Input = 011 ⇒ a0 ⊕ b1 ⊕ c1 = 1
Input = 101 ⇒ a1 ⊕ b0 ⊕ c1 = 1
Input = 110 ⇒ a1 ⊕ b1 ⊕ c0 = 1
This is something that cannot be true, because if we XOR all the four implied equalities, we have a one in the RHS, but the LHS must be zero as we have each variable appearing exactly twice. Thus, it is inevitable to err on one input, making it impossible to exceed a success probability of 0.75. Randomized strategies cannot exceed 0.75 either, as any randomized strategy is just a convex combination of deterministic strategies, and hence cannot yield a higher success probability. Hidden local variables are not of any help either, since the above argument still remains valid.
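The parity argument above can also be confirmed by exhaustive search over all deterministic strategies. In this sketch a strategy for one player is a pair (output on input 0, output on input 1); the names are ours.

```python
from fractions import Fraction
from itertools import product

# The four inputs allowed by the promise r XOR s XOR t = 0.
INPUTS = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

def success_probability(a, b, c):
    """Fraction of inputs on which a[r] XOR b[s] XOR c[t] equals r OR s OR t."""
    wins = sum((a[r] ^ b[s] ^ c[t]) == (r | s | t) for r, s, t in INPUTS)
    return Fraction(wins, len(INPUTS))

single_player = list(product([0, 1], repeat=2))   # 4 strategies per player
best = max(success_probability(a, b, c)
           for a, b, c in product(single_player, repeat=3))
print(best)  # 3/4
```

Since randomized strategies are convex combinations of these 64 deterministic ones, the maximum over them is the classical optimum.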
2.1
Quantum Setting
We now show that we can get a success probability of one in the quantum setting! Alice, Bob and Carol prepare the entangled state $|\psi\rangle = \frac{1}{2}(|000\rangle - |011\rangle - |101\rangle - |110\rangle)$, take one qubit each, and separate. They decide on the following strategies. Alice's strategy is that if she gets an input of 1, she applies a Hadamard operation on her qubit, then measures her qubit and outputs the outcome. If her input is zero, she does not perform the Hadamard operation, and just outputs the outcome of the measurement on her qubit. Bob and Carol have identical strategies.

We now verify that for each input, the output satisfies the required condition. For example, when the input is 000, no Hadamard operation is performed, leaving the state $|\psi\rangle$ undisturbed. For the state $|\psi\rangle$, irrespective of who measures first, the output will have an even number of ones, thus XORing to zero, which is what we require. When the input is 011, then, after the Hadamard operations, the state can be verified to be $|\psi'\rangle = \frac{1}{2}(|001\rangle + |010\rangle - |100\rangle + |111\rangle)$. Here, irrespective of who gets to measure first, the XOR of the outputs will be one. By symmetry, the other two inputs are similar to this input. This means that we have a success probability of one. This proves that the local hidden variable theory cannot explain the power of entanglement, for otherwise, the probability of success with hidden variables should have been one. But as we proved, even with hidden variables, the probability of success is at most 0.75.
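The verification for all four inputs can be carried out mechanically with a small state-vector simulation. This is a sketch under our own conventions: the state is a dictionary from 3-bit tuples to amplitudes, qubit 0 belongs to Alice, qubit 1 to Bob, qubit 2 to Carol, and the function names are hypothetical.

```python
import math

# The shared state (|000> - |011> - |101> - |110>) / 2.
PSI = {(0, 0, 0): 0.5, (0, 1, 1): -0.5, (1, 0, 1): -0.5, (1, 1, 0): -0.5}

def apply_h(state, q):
    """Apply a Hadamard gate to qubit q of the state."""
    out = {}
    for bits, amp in state.items():
        for new_bit in (0, 1):
            sign = -1 if bits[q] == 1 and new_bit == 1 else 1
            key = bits[:q] + (new_bit,) + bits[q + 1:]
            out[key] = out.get(key, 0.0) + sign * amp / math.sqrt(2)
    return out

def success_probability(inputs):
    """Probability that the XOR of the three outcomes equals the OR of the inputs."""
    state = PSI
    for q, bit in enumerate(inputs):
        if bit:                      # a player with input 1 applies H first
            state = apply_h(state, q)
    target = inputs[0] | inputs[1] | inputs[2]
    return sum(abs(amp) ** 2 for bits, amp in state.items()
               if (bits[0] ^ bits[1] ^ bits[2]) == target)

for inputs in [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]:
    print(inputs, round(success_probability(inputs), 12))
```

Each printed probability is 1 up to floating-point rounding.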
3
Bell inequality
The Bell inequality was specifically designed to refute the local hidden variable theory. It is an inequality which has to be true if nature were to obey the local hidden variable model, i.e., if the following two assumptions were true.
1. Observables of a system have an intrinsic value irrespective of any measurement being performed.
2. Local measurements by any one party in an entangled system do not have any effect on the result of measurements made by the other party.
However, this inequality is violated according to the quantum mechanical model of nature. Experiments have verified that nature violates the Bell inequality, which shows that the local hidden variable theory does not explain entanglement. In particular, one or both of the above two assumptions must be false.

We now derive Bell's inequality, which, as we noted, holds only if we disbelieve the quantum mechanical model. Thus, for deriving this inequality, we assume that both of the above assumptions are true. The setup here is somewhat similar to the GHZ paradox setup. We have two parties, Alice and Bob, who have a qubit each. They perform their measurements far away from each other, so there is not enough time for any signal to have reached Bob from Alice (and vice versa), and thus Alice's measurement had no chance of influencing the result of Bob's measurement. (Recall that by our assumption 2, local measurements do not influence the result of the other party's measurements. Thus sending an explicit signal is the only way to have an influence, which we prevent by designing the experiment as above.) There are two possible observables that each of Alice and Bob can measure. The outcomes of the measurements are bits. Let $M_A^0$ and $M_A^1$ denote the two measurements that Alice can perform, and let $a_0$ and $a_1$ denote the respective outputs of these measurements. As noted, $a_0, a_1 \in \{0, 1\}$. The quantities $M_B^0$, $M_B^1$ and $b_0$, $b_1$ have similar meanings. We define the quantities $A_i = (-1)^{a_i}$ and $B_i = (-1)^{b_i}$ for $i = 0, 1$. Note that by our assumption 1, $a_0$ (and the others) being 0 or 1 is something intrinsic to the system, and will take the same value irrespective of the number of measurements we do. Thus the only randomness in the $a_i$'s and $b_i$'s (and hence the $A_i$'s and $B_i$'s) is the randomness of the initial state of the system. That is, initially, the system is set such that the $a_i$'s and $b_i$'s are assigned specific values with some joint probability.

Consider the following quantity: $A_0B_0 + A_0B_1 + A_1B_0 - A_1B_1$. This can be rewritten as $A_0(B_0 + B_1) + A_1(B_0 - B_1)$. Since $B_0$ equals either $B_1$ or $-B_1$, exactly one of $B_0 + B_1$ and $B_0 - B_1$ is zero and the other is $\pm 2$, so $-2 \le A_0B_0 + A_0B_1 + A_1B_0 - A_1B_1 \le 2$. Thus,
\[ E[A_0B_0 + A_0B_1 + A_1B_0 - A_1B_1] = E[A_0B_0] + E[A_0B_1] + E[A_1B_0] + E[-A_1B_1] \le 2, \]
where the expectation is over the randomness in the initial state of the system. Thus, by running the experiment many times, we can estimate each of these four expectations individually, and then sum them up. If our belief that nature obeys the local hidden variable theory were true, this sum must be at most 2.
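The bound on the quantity $A_0B_0 + A_0B_1 + A_1B_0 - A_1B_1$ can be checked by enumerating all sixteen assignments of $\pm 1$ values. The function name below is ours.

```python
from itertools import product

def chsh_value(A0, A1, B0, B1):
    """The quantity A0*B0 + A0*B1 + A1*B0 - A1*B1 for fixed +-1 values."""
    return A0 * B0 + A0 * B1 + A1 * B0 - A1 * B1

values = {chsh_value(*assignment) for assignment in product([1, -1], repeat=4)}
print(sorted(values))  # [-2, 2]
```

Every assignment gives exactly $+2$ or $-2$, so any probability distribution over assignments has expectation between $-2$ and $2$.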
3.1
Quantum Setting
Here Alice and Bob have one qubit each of the entangled system $|\phi^-\rangle = \frac{1}{\sqrt{2}}(|00\rangle - |11\rangle)$. Measurement $M_A^0$ corresponds to Alice performing a rotation by an angle $\theta_A^0 = -\pi/16$, and then performing a measurement in the standard basis. Measurement $M_A^1$ corresponds to Alice performing a rotation by an angle $\theta_A^1 = 3\pi/16$ and then performing a measurement in the standard basis. We similarly define $M_B^0$ and $M_B^1$ with the values $\theta_B^0 = \theta_A^0$ and $\theta_B^1 = \theta_A^1$. We now drop the superscripts in the $\theta$'s, and write the state of the system after these rotations:
\[ \frac{1}{\sqrt{2}}\Big( \cos(\theta_A + \theta_B)(|00\rangle - |11\rangle) + \sin(\theta_A + \theta_B)(|01\rangle + |10\rangle) \Big). \tag{1} \]
Alice and Bob now each measure their qubit, resulting in outcomes $a, b \in \{0, 1\}$. Notice that for the first two components in (1) the quantity $(-1)^a \cdot (-1)^b$ has value $1$, whereas for the last two it is $-1$. Thus, $E[(-1)^a \cdot (-1)^b] = \cos^2(\theta_A + \theta_B) - \sin^2(\theta_A + \theta_B)$.

Figure 1: Figure depicting the value of $\theta_A + \theta_B$ for the four possible measurement types.

The value of $|\theta_A + \theta_B|$ is $\pi/8$ for the three measurement types $(M_A^0, M_B^0)$, $(M_A^0, M_B^1)$ and $(M_A^1, M_B^0)$. For the fourth measurement type $(M_A^1, M_B^1)$, the value of $|\theta_A + \theta_B|$ is $3\pi/8$. (See Figure 1.) Thus for the first three types of measurement pairs, the corresponding quantities, namely $E[A_0B_0]$, $E[A_0B_1]$, and $E[A_1B_0]$, are all equal to $\cos^2(\pm\pi/8) - \sin^2(\pm\pi/8) = 1/\sqrt{2}$. For the fourth type of measurement, the quantity $E[-A_1B_1] = \sin^2(3\pi/8) - \cos^2(3\pi/8) = 1/\sqrt{2}$. Thus, the sum of the expectations of all these four quantities is $2\sqrt{2} > 2$. That is, Bell's inequality does not hold in the quantum mechanical model. As we mentioned, experiments verify that the sum of these expectations indeed matches what is predicted by quantum mechanics, and in particular, is above 2.
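The four expectations can be recomputed from the amplitudes rather than from the closed-form expression. The sketch below assumes that "rotation by $\theta$" means the real $2 \times 2$ rotation matrix with rows $(\cos\theta, -\sin\theta)$ and $(\sin\theta, \cos\theta)$, applied by each party to their own qubit of $(|00\rangle - |11\rangle)/\sqrt{2}$; this convention and the function names are our assumptions.

```python
import math

THETA = {0: -math.pi / 16, 1: 3 * math.pi / 16}   # same angles for Alice and Bob

def rotation(theta):
    """Real rotation matrix [[cos, -sin], [sin, cos]]."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def correlation(i, j):
    """E[(-1)^a (-1)^b] when Alice measures M_A^i and Bob measures M_B^j."""
    RA, RB = rotation(THETA[i]), rotation(THETA[j])
    total = 0.0
    for a in (0, 1):
        for b in (0, 1):
            # Amplitude of |ab> after rotating both halves of (|00> - |11>)/sqrt(2).
            amp = (RA[a][0] * RB[b][0] - RA[a][1] * RB[b][1]) / math.sqrt(2)
            total += (-1) ** (a + b) * amp ** 2
    return total

chsh = correlation(0, 0) + correlation(0, 1) + correlation(1, 0) - correlation(1, 1)
print(chsh, 2 * math.sqrt(2))
```

The two printed numbers agree up to rounding, so the sum of expectations exceeds the classical bound of 2.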
3.2
Computer Science interpretation of the Bell inequality
Consider the following two party game with Alice and Bob. Alice and Bob receive a one bit input each, namely r and s, chosen uniformly at random. These input bits say which measurement is going to be performed. The goal is for both of them to output one bit each, namely a and b, such that a ⊕ b = r ∧ s, without any communication after the inputs have been given. Classically (deterministic or randomized), the probability of success is at most 0.75.
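The classical bound of 0.75 for this game can be confirmed by checking all sixteen deterministic strategies, since randomized strategies are convex combinations of them. Here a strategy is a pair (output on input 0, output on input 1) for each player; the function name is ours.

```python
from fractions import Fraction
from itertools import product

def best_classical_success():
    """Maximum over deterministic strategies of Pr[a XOR b = r AND s]."""
    best = Fraction(0)
    for a0, a1, b0, b1 in product([0, 1], repeat=4):
        a, b = (a0, a1), (b0, b1)
        wins = sum((a[r] ^ b[s]) == (r & s)
                   for r, s in product([0, 1], repeat=2))
        best = max(best, Fraction(wins, 4))
    return best

print(best_classical_success())  # 3/4
```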
In the quantum setting, Alice and Bob have a single qubit each of the entangled system $|\phi^-\rangle = \frac{1}{\sqrt{2}}(|00\rangle - |11\rangle)$. Alice's strategy is to apply the measurement $M_A^r$ above, i.e., when $r = 0$ she rotates her qubit by $\theta_A^0 = -\pi/16$ before measurement, and when $r = 1$ she rotates her qubit by $\theta_A^1 = 3\pi/16$ before measurement. Bob follows a similar strategy, using the angles $\theta_B$. By the above analysis, in the three cases where $r \wedge s = 0$, the probability of success is $\Pr[a \oplus b = 0] = \cos^2(\theta_A + \theta_B) = \cos^2(\pi/8)$, whereas in the one case where $r \wedge s = 1$, the probability of success is $\Pr[a \oplus b = 1] = \sin^2(\theta_A + \theta_B) = \sin^2(3\pi/8) = \cos^2(\pi/8)$. So, the overall probability of success is $\cos^2(\pi/8) \approx 0.85$, which exceeds the classical limit of 0.75.
4
Magic Square
This is the final example that we give to refute the local hidden variable theory. This is again a two-party game with Alice and Bob. Alice's and Bob's strategies can each be described by a 3-by-3 matrix over {0, 1}. Alice and Bob each receive a number from {1, 2, 3}, chosen independently and uniformly at random. Alice outputs the values of the entries in that row of her matrix, and Bob outputs the values of the entries in that column of his matrix. They succeed if the sum of the elements in Alice's row is even, the sum of the elements in Bob's column is odd, and the row and column agree at the point of intersection in the matrix.

Classically, no protocol can have a success probability more than 8/9. This is because it cannot be the case that all the row sums are even in Alice's matrix, all the column sums are odd in Bob's matrix, and both matrices agree everywhere, as that would mean that the sum of all the elements in the matrix is simultaneously odd and even! Thus, for any deterministic strategies there is at least one input on which they fail. It follows that the success probability of deterministic (or even randomized) strategies is at most 8/9, a value that can be achieved. But quantum mechanically, we can achieve a success probability of 1. We omit the proof here.
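The value 8/9 can be confirmed by a small exhaustive search. The sketch below restricts attention to matrices that already satisfy the parity conditions (even rows for Alice, odd columns for Bob); this restriction is our simplification, justified because a row or column with the wrong parity loses on all three inputs that involve it, which caps the success probability at 6/9. The names are ours.

```python
from fractions import Fraction
from itertools import product

EVEN = [t for t in product([0, 1], repeat=3) if sum(t) % 2 == 0]
ODD = [t for t in product([0, 1], repeat=3) if sum(t) % 2 == 1]

def best_classical():
    """Largest fraction of the 9 (row, column) inputs on which the matrices agree."""
    best = 0
    for rows in product(EVEN, repeat=3):        # Alice: rows[i][j], every row even
        for cols in product(ODD, repeat=3):     # Bob: cols[j][i], every column odd
            agree = sum(rows[i][j] == cols[j][i]
                        for i in range(3) for j in range(3))
            best = max(best, agree)
    return Fraction(best, 9)

print(best_classical())  # 8/9
```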