Math Prerequisites for Quantum Computing: A visual guide to the mathematics you need to master before you start learning quantum computing


English Pages [1295] Year 2021

Table of contents :
Why Quantum Computing ?
BOOLEAN ALGEBRA
Working with True and False
Boolean Variables and Operators
Truth Tables
Logic Gates
Logic Circuits
AND Gate
OR Gate
NOT Gate
Multiple Input Gates
Equivalent Circuits 1
Equivalent Circuits 2
Universal Gate NAND
Exclusive-OR
XOR for Assignment
XOR of Bit Sequences 1
XOR of Bit Sequences 2
Introduction to Cryptography
Cryptography with XOR
Shared Secret
Importance of Randomness
Breaking the Code
PROBABILITY
Predicting the Future
Probability of a Boolean Expression
Mutually Exclusive Events
Independent Events
Manipulating Probabilities with Algebra
P( Mutually Exclusive Events )
P( Independent Events )
Complete Set of MutEx Events
P( A OR B )
Worked Examples 1
Worked Examples 2
P( Bit Values )
Analysis with Venn Diagrams
Venn Diagram: P(A AND B)
Venn Diagram: P( A OR B )
Venn Diagram: P( NOT A )
Worked Examples 1
Worked Examples 2
Conditional Probability
Worked Examples
STATISTICS
Introduction to Statistics
Random Variables
Mapping Random Variables
Mean, Average, Expected Value, ...
Worked Examples 1
Worked Examples 2
Beyond Mean
Standard Deviation
Worked Examples
Combinations of Random Variables
Correlation
Analysis of Correlation
COMPLEX NUMBERS
Introduction to Complex Numbers
Imaginary i
Addition of Complex Numbers
Subtraction
Multiplication by a Real
Division by a Real
Complex Multiplication
Worked Examples
Complex Conjugate
Squared Magnitude
Complex Division
Worked Examples
Euler's Formula
Polar Form
Worked Examples
Fractional Powers
Complex Cube Roots of 1
Square Root of i
2D Coordinates
LINEAR ALGEBRA
Introduction to Matrices
Matrix Dimensions
Matrix Addition
Matrix Subtraction
Scalar Multiplication
Matrix Multiplication
Worked Examples 1
Worked Examples 2
3x3 Example
Worked Examples 1
Worked Examples 2
When is Multiplication Possible?
Worked Examples
Not Commutative
Associative and Distributive
Dimension of Result
Odd Shaped Matrices
Worked Examples 1
Worked Examples 2
Worked Examples 3
Inner Product
Worked Examples
Identity Matrix
Matrix Inverse
Transpose
Worked Examples
Transpose of Product
Complex Conjugate of Matrices
Adjoint
Unitary
Hermitian
Hermitian and Unitary
Why Hermitian or Unitary ?
Vectors and Transformations
Rotation in 2D
Special Directions
Eigen Vectors and Eigen Values
More Eigen Vectors
Dirac Bra-ket Notation
QUBITS AND SUPERPOSITION
From Bits to Qubits
Polarized Photons of Light
Photons and Polarizing Filters
More on Photons and Polarizing Filters
Filters Change Polarization
Quantum Behavior of Polarizers
Polarizing Single Photons
Using Calcite
Loss of Information
Finding Angle of Polarization
Finding Polarization of a Single Photon
Limitations of Measurement
Repeated Measurement 1
Repeated Measurement 2
Repeated Measurement with Filters
Running the Java Code
Simulation of Classical Bit
Simulation of Qubit (Quantum Bit)
No Cloning Theorem 1
No Cloning Theorem 2
Measurement is Irreversible
Deterministic vs Probabilistic
Simulating Measurements
Superposition
Collapse of Superposition
Conclusion

Introduction

To learn quantum computing, you need a strong foundation in linear algebra, complex numbers, statistics, and boolean logic. You must master these prerequisites before you can begin learning about quantum computing. There is nothing 'quantum' about these topics. You probably learned this material back when you were in high school. But over the years you might have forgotten important details. This book will help you revise the basic high-school level mathematics you must know before you can start learning about quantum computing.

As a bonus, I have included a section on quantum physics. These chapters will give you a clear understanding of quantum bits (qubits), quantum measurement, and superposition. For those who know Java, I have also provided a qubit simulator. The qubit simulator can help you develop an intuitive understanding of how qubits behave.

An Easier Way to Learn Math

Math books can be dense. They usually try to pack as much information as possible into as few pages as possible. That reduces printing costs, but makes the content harder to understand. This book takes an opposite approach. It is published as an e-book, so there are no page limits. The subject is explained over as many pages as necessary to make it easy to understand. Complex concepts have been split into small ideas, and each small idea has been explained with its own illustration.

There are over 1000 illustrations in this book. These illustrations have been organized as slides, one on each page. Explanations are displayed as text below the slides. You will never get stuck because each slide presents a small idea that is easy to understand. Learn at your own pace. Flip quickly through sections you find easy, and slow down for unfamiliar material.

Videos

Videos and animations can help you master new concepts. Most chapters in this book contain narrated videos in the author's voice. Open the links from within the Kindle App. Videos can be played on most devices except Kindle e-Ink Readers.

Why Quantum Computing ?

Quantum computing is a new field that promises vastly greater computing power. Now when you hear "greater computing power", you might be thinking "So what? My laptop is fast enough." But there are problems that are so complex, they cannot be solved by the fastest digital computers we have today.

Medical researchers have long wanted to understand the molecular behavior of cells in our body.

Cells are like machines. The proteins within cells are like the gears and components that make the machine work.

The behavior of proteins is related to the shape of the protein molecule. But unlike the simple molecules we study in high-school, the shape of a protein molecule is not easy to compute. A protein molecule is like a huge tapestry that folds into a complicated shape. This complicated shape is essential to how the internals of a cell work.

If we can compute the shape of a protein molecule from its structure, it is expected that we will be able to design better treatments for infections, cure certain types of cancer, and reverse some of the effects of aging.

It is believed that quantum computers will be able to solve such problems in biochemistry.

Unlike traditional computing, which is built on intuitive logic and step-by-step algorithms, quantum computing is based on mathematical models of quantum physics.

To understand quantum computing, you need to become familiar with the weird behavior of quantum systems and how the physics of the quantum world is represented in Math.

Fortunately, though quantum physics requires a strong foundation in Math, you only need to know relatively basic Math for quantum computing. This book helps you build a strong foundation in basic Math. We will be discussing Boolean logic, logic circuits, Cryptography, Probability and Statistics, complex numbers, and linear algebra.

For Quantum computing you need crossover skills.

You will need to use probability combined with Boolean logic.

Similarly, you will need to know how to use Matrix algebra in combination with complex numbers.

This book gives you the necessary crossover skills.

We focus only on what you need for Quantum computing. You don't waste time learning unnecessary material.

This fast-paced book will help you learn the basics of Math for Quantum Computing in less than 4 hours.

This book also has prerequisites.

I assume that you have already completed high-school through 12th grade, and that you elected to take Math and Physics during the final two years of high school. Though this book covers linear algebra, probability theory, and complex numbers, it is primarily a refresher. It builds upon the foundation you obtained from studying these topics in 12th grade high-school.

Finally, this book is meant for readers who enjoy Math. If you were in the habit of reading your Math lessons before your high school teacher discussed them in class, then you will enjoy this book. Click here to view a video narration of this chapter. Videos will play on most devices except Kindle e-Ink Readers.

Working with True and False

The first section of this book will discuss boolean algebra. This is a refresher of the boolean logic you might have studied in high-school. Even if you have already studied boolean algebra in school, do not skip this entire section. Instead, focus on the later lessons in this section.

Quantum cryptography, which is built upon simple boolean logic and shared secret codes, is discussed in the later chapters.

We will begin with Boolean logic and Boolean Algebra. Most of us are familiar with the algebra of real numbers. That is, how we add, subtract, multiply, and divide real numbers. In a similar manner, Boolean logic consists of operations for logical statements. Instead of arithmetic expressions we have Boolean statements.

Boolean statements are statements that can be true or false.

Instead of numbers we have Boolean values which are true or false.

Here are some sample Boolean statements:

Alice owns a red car

Bob is older than 40 years

Alice ate mushrooms at dinner yesterday

We are familiar with such statements from our conversations with friends and family. These statements can be either true or false.

This is important, so I will repeat that. A Boolean statement can either be true or false. No other values are possible.

When we speak with our friends, we can combine these statements. We might say, "Alice owns a red car, and she ate mushrooms at dinner yesterday." We have combined two different ideas using the rules of English grammar:

Alice owns a red car, and

Alice ate mushrooms at dinner yesterday.

But English grammar is imprecise. For Math, we need a precise grammar on how ideas are to be combined. This mathematical grammar is Boolean Algebra.

In Boolean Algebra, ideas are combined using operators: AND, OR, and NOT. These operators have a precise meaning. There is no ambiguity.

Using the grammar of Boolean logic the two ideas we discussed can be combined like this: "Alice owns a red car" AND "Alice ate mushrooms at dinner yesterday"

The boolean operator AND combines two different boolean statements.

The combination is also a boolean statement. The combination statement can be either true or false.

At this point, you might be wondering, what is the big deal? AND is an English word. Instead of writing a statement using English grammar, we have written it using a Math operator called AND. The operator AND has the same meaning as the English word "and". So what?

The reason we use Math is that we can use Math rules to manipulate expressions. In other words, after we have represented these ideas with symbols, that is, two boolean statements connected with an AND operator, we can manipulate these Boolean expressions using a set of rules, without thinking about the real-world meaning of these expressions.

For arithmetic you had rules like a(b + c) = ab + ac. You were able to rewrite a(b + c) as ab + ac without thinking about the real-world meaning of a, b, and c. A similar set of rules for Boolean statements is called Boolean Algebra.

Boolean Variables and Operators

In the algebra of numbers, you can use letters to represent any arbitrary value. You might write an expression like x+2y and then substitute values as x = 3 and y = 4 . Similarly, in Boolean Algebra we can say:

A = "Alice has a red car"

B = "Bob is older than 40 years"

These variables A and B can be combined with operators.

A AND B = Alice has a red car AND Bob is older than 40 years.

A OR B = Alice has a red car OR Bob is older than 40 years.

NOT A = NOT(Alice has a red car) = Alice does not have a red car.

These operators AND, OR, and NOT have very precise meanings. The precise meanings enable symbolic manipulation of these expressions.

This expression A AND ( B OR C) is similar to expressions in Algebra.

We can expand this to: (A AND B) OR (A AND C)

This is similar to the algebraic expansions you perform in the algebra of numbers. For arithmetic we have a sequence in which operators have to be evaluated: multiplication first, followed by addition and subtraction.

Similarly in Boolean Algebra the evaluation sequence is: NOT, followed by AND, and then followed by OR. We use parentheses to clarify the sequence of evaluation.
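
As a rough illustration (not taken from the original slides), the Python sketch below checks the expansion of A AND (B OR C) for every combination of truth values, and shows that the NOT, AND, OR evaluation order matters. The function names are hypothetical and chosen only for this example; Python's own `not`, `and`, and `or` happen to follow the same evaluation order.

```python
from itertools import product

def distributive_law_holds():
    """Check A AND (B OR C) == (A AND B) OR (A AND C) for all 8 input combinations."""
    for a, b, c in product([False, True], repeat=3):
        if (a and (b or c)) != ((a and b) or (a and c)):
            return False
    return True

def default_grouping(a, b, c):
    """NOT A AND B OR C, evaluated as NOT first, then AND, then OR."""
    return ((not a) and b) or c

def other_grouping(a, b, c):
    """The same symbols, with parentheses forcing a different order."""
    return not (a and (b or c))

def python_order_matches():
    """Python's unparenthesized expression agrees with the NOT, AND, OR order."""
    return all(
        (not a and b or c) == default_grouping(a, b, c)
        for a, b, c in product([False, True], repeat=3)
    )

print(distributive_law_holds())              # True
print(python_order_matches())                # True
print(default_grouping(True, False, True))   # True
print(other_grouping(True, False, True))     # False
```

The last two lines use the same inputs but different groupings and produce different results, which is why parentheses are used to make the intended sequence explicit.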

Boolean expressions can be manipulated just like the algebra of real numbers.

Consider this expression here:

When evaluating Boolean expressions you can treat AND like multiplication and OR like addition.

Intuitively you might imagine that the not operator behaves like the negative sign in Arithmetic. However that is not the case. The rules for NOT are a little tricky as you can see here:

When we expand parentheses using NOT, an OR operator becomes an AND operator: NOT (A OR B) = (NOT A) AND (NOT B).

Similarly, applying NOT causes an AND operator to become an OR operator: NOT (A AND B) = (NOT A) OR (NOT B). I will repeat that. AND is manipulated similarly to multiplication. OR is manipulated similarly to addition.

But NOT has special rules. It is not the same as a negative sign.

NOT of NOT of A is the same as A
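
The special rules for NOT can be checked by brute force. The sketch below is only an illustration, with hypothetical function names. It tries every combination of truth values and confirms that NOT turns OR into AND, turns AND into OR, and cancels itself when applied twice.

```python
from itertools import product

def not_swaps_or_and_and():
    """Check NOT(A OR B) == NOT A AND NOT B, and NOT(A AND B) == NOT A OR NOT B."""
    for a, b in product([False, True], repeat=2):
        if (not (a or b)) != ((not a) and (not b)):
            return False
        if (not (a and b)) != ((not a) or (not b)):
            return False
    return True

def double_not_cancels():
    """Check NOT(NOT A) == A for both truth values."""
    return all((not (not a)) == a for a in [False, True])

print(not_swaps_or_and_and())  # True
print(double_not_cancels())    # True
```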

Let's simplify this expression:

The color highlights indicate how each sub expression is simplified.

Here is another expression we can simplify using Boolean algebra: The highlights show each step of the simplification

One more exercise in boolean algebra. Watch the highlights.

Another boolean expression expanded using boolean algebra.

The important takeaway from these exercises is that we can manipulate complex boolean statements with just the rules of boolean algebra. We don't need to think about what each expression means in the real world. As long as we apply the rules correctly, the simplified boolean expression will have exactly the same meaning in the real world as the original boolean expression.

Truth Tables

In this lesson we talk about truth tables. Truth tables are a way to define Boolean functions and operators. Truth tables are especially useful for other less intuitive boolean operators. We have already seen some operators: AND, OR, and NOT.

But there are more operators, with less intuitive meanings: Exclusive OR, and NAND.

The idea of a truth table is quite simple. It is a table listing every possible input combination and the corresponding output.

If you have a function or operator with 'n' inputs, the truth table will have 2 raised to the power of n rows. For 2-input operators like AND and OR, the truth table has 2 raised to the power of 2, that is, 4 rows.

This is the truth table for AND:

The boolean TRUE is represented by 1

The boolean FALSE is represented by 0.

When both inputs are 0 , the output is 0.

When the first input is 0 and the second input is 1, the output is 0.

When the first input is 1 and the second input is 0, the output is 0.

When both inputs are 1 the output is 1.

In this expression C = A AND B, C can be true only when both A and B are true.

I will repeat that: 1 represents Boolean TRUE, and 0 means Boolean FALSE.

Here is the truth table for OR:

As you can see from the first row, the output is false, (that is 0), only when both inputs are false.

If either input is TRUE, then the output is TRUE

The NOT operator has only one input.

If the input is false the output is true.

Conversely, when the input is true the output is false.

Finally we have the NAND operator.

The NAND operator is like an AND operator combined with a NOT. C = A NAND B is the same as C = NOT ( A AND B)

The output is false (that is 0), only when both inputs are true.
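
The truth tables described in this chapter can be generated mechanically. The following sketch is an illustration only; the helper `truth_table` and the gate function names are hypothetical. It lists every input combination for an operator with n inputs, which gives 2 raised to the power of n rows, and represents TRUE by 1 and FALSE by 0.

```python
from itertools import product

def and_op(a, b):
    return a & b

def or_op(a, b):
    return a | b

def not_op(a):
    return 1 - a

def nand_op(a, b):
    return 1 - (a & b)

def truth_table(func, n_inputs):
    """Return one (inputs, output) row per input combination: 2**n_inputs rows."""
    return [(inputs, func(*inputs)) for inputs in product([0, 1], repeat=n_inputs)]

for name, func, n in [("AND", and_op, 2), ("OR", or_op, 2),
                      ("NOT", not_op, 1), ("NAND", nand_op, 2)]:
    print(name)
    for inputs, output in truth_table(func, n):
        print("  ", inputs, "->", output)
```

The rows come out in the order 00, 01, 10, 11, so the AND column reads 0, 0, 0, 1 and the NAND column reads 1, 1, 1, 0.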

Logic Gates

I have mentioned this before.

The Boolean value true corresponds to the number 1 and that also corresponds to an electrical circuit being on.

The Boolean value false corresponds to 0, which is also the same as a logic circuit being off.

In logic circuits, wires carry the on and off values.

Electronic circuits called Gates take on and off values as input, and produce on or off values as output. Gates are available for operators like: AND, OR, NOT, and so on. The idea is that if boolean values can be represented by electrical signals, and if boolean expressions can be represented by electronic circuits, then Boolean expressions can be evaluated by electronic circuits. This idea forms the basis of modern digital computers.

Gates are building blocks for constructing circuits that evaluate complex logic. We will learn more about Gates in the next lesson.

Logic Circuits

The diagram shown here is a representation of a logic circuit.

The logical expression corresponding to this circuit is NOT ( (A AND B) OR C ).

According to this expression, the not operation is performed last.

The not operation corresponds to this Gate here.

Before the NOT operation, an OR operation is performed. The two inputs to the OR are the input C and the result of A AND B.

The value of A AND B is calculated by this gate here.
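
The order of operations described for this circuit (AND first, then OR with C, then NOT last) can be sketched as a chain of small functions. This is a minimal illustration with hypothetical function names, using 1 for on and 0 for off.

```python
def and_gate(a, b):
    return a & b

def or_gate(a, b):
    return a | b

def not_gate(a):
    return 1 - a

def circuit(a, b, c):
    """Evaluate NOT((A AND B) OR C) gate by gate."""
    a_and_b = and_gate(a, b)         # first gate: A AND B
    or_output = or_gate(a_and_b, c)  # second gate: (A AND B) OR C
    return not_gate(or_output)       # last gate: NOT of the OR output

print(circuit(0, 0, 0))  # 1
print(circuit(1, 1, 0))  # 0
print(circuit(0, 0, 1))  # 0
```

Each intermediate variable plays the role of a wire carrying the output of one gate to the input of the next.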

AND Gate

This is the logic symbol for the AND gate.

There are two inputs and one output.

We have seen this truth table earlier.

The output is ON only when both inputs are ON.

Otherwise the output is OFF.

In the logic circuit we saw earlier, the AND operation is performed first on the inputs A and B. The output of the AND gate is then sent to another gate, called the OR gate which we will see in the next chapter.

OR Gate

This is the OR gate.

The output is ON if either of the inputs is ON. The output is OFF only if both inputs are OFF.

The truth table of the OR gate is shown here. When both inputs are OFF the output is OFF. Otherwise, if either input is on, the output is on.

In the circuit here, the output of the AND gate feeds into one input of the OR gate.

The other input of the OR gate is the input C.

NOT Gate

This is the symbol for a NOT gate in logic circuits. There is one Input and one output.

I would like to draw your attention to this little circle at the end of the triangle. That circle will be used in combination with other Gates to indicate an implicit NOT operation. We will see examples of that later.

The truth table for the not gate is shown here:

When the input is zero the output is one.

Conversely, when input is 1 the output is 0. The not gate inverts the input.

Returning to the logic circuit we had seen earlier, the not gate operates on the output of the OR gate. This is the last operation to be performed in this logic circuit.

Multiple Input Gates

So far we have seen logic gates with two inputs. But gates like AND and OR can be built with more inputs as shown here.

This is an AND gate with 3 inputs.

Similarly, this is an OR gate with 3 inputs.

Gates with multiple inputs are convenient for constructing complex circuits.

In this expression, the NOT F is computed here.

D AND E is computed by this AND gate.

A AND B AND C is computed by this 3-input gate.

Finally, all the intermediate values are sent to this OR gate. The OR gate combines the 3 inputs to produce the final output.

Gates with multiple inputs can be constructed by combining 2-input gates as shown here.

The three input AND is equivalent to the two 2-input AND gates connected like this. First we compute A AND B. Then we AND C to it.

Combining OR gates is similar. These 2-input OR gates together are equivalent to this 3-input OR gate.
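
To illustrate how multiple-input gates can be built from 2-input gates, the sketch below chains 2-input gates with `functools.reduce` and then evaluates the expression from this lesson, (A AND B AND C) OR (D AND E) OR (NOT F). The function names are hypothetical; this is one possible way to model the circuit, not the book's own code.

```python
from functools import reduce
from itertools import product

def and2(a, b):
    return a & b

def or2(a, b):
    return a | b

def not_gate(a):
    return 1 - a

def and_n(*inputs):
    """Chain 2-input AND gates: ((a AND b) AND c) and so on."""
    return reduce(and2, inputs)

def or_n(*inputs):
    """Chain 2-input OR gates: ((a OR b) OR c) and so on."""
    return reduce(or2, inputs)

def circuit(a, b, c, d, e, f):
    """(A AND B AND C) OR (D AND E) OR (NOT F)."""
    return or_n(and_n(a, b, c), and2(d, e), not_gate(f))

# The chained gates agree with the direct definition of 3-input AND and OR.
chained_matches = all(
    and_n(a, b, c) == (1 if a + b + c == 3 else 0)
    and or_n(a, b, c) == (1 if a + b + c > 0 else 0)
    for a, b, c in product([0, 1], repeat=3)
)
print(chained_matches)            # True
print(circuit(0, 0, 0, 1, 1, 1))  # 1
print(circuit(1, 0, 1, 0, 1, 1))  # 0
```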

Equivalent Circuits 1

It is possible to re-configure gates so that an AND gate is built using OR and NOT gates.

The symbol here is of an AND Gate.

Below that we have an OR Gate configured with NOT Gates so that it behaves like an AND Gate.

The expression evaluated by this configuration is NOT ( (NOT A) OR (NOT B) ). Evaluating this expression using the rules of Boolean Algebra we find that it is A AND B.

Another way to verify that this configuration of OR and NOT Gates is equivalent to an AND gate is by building a truth table for the entire circuit. A and B are the inputs. C is the output. We make a table for every possible combination of A and B.

That would be 00, 01, 10, and 11.

Alongside each of the input combinations, we write the output. When the table is complete, we compare it with the truth table for an AND gate, and find that the two tables are identical.

If two functions have identical truth tables, then the functions are identical. It doesn't matter how the functions were constructed. If the truth tables are identical, the functions are identical.

Equivalent Circuits 2

In a similar manner to what we saw in the last lesson, we can construct an OR gate using an AND Gate with NOT Gates.

Here is the configuration. The Boolean expression for that logic circuit is NOT ( (NOT A) AND (NOT B) ).

When we simplify this expression we find that it is A OR B.

We can verify that the circuit is equivalent to the OR gate by building a truth table. As before, we enumerate all the possible combinations of A and B and write the corresponding output C. The final truth table is shown here, and as you can see, it is identical to the truth table of an OR gate.
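
The truth-table comparison used in this lesson and the previous one can be sketched in a few lines. The code below is an illustration with hypothetical function names. It builds an AND from an OR gate with NOT gates, builds an OR from an AND gate with NOT gates, and compares each truth table with that of the plain gate.

```python
from itertools import product

def and_gate(a, b):
    return a & b

def or_gate(a, b):
    return a | b

def not_gate(a):
    return 1 - a

def and_from_or(a, b):
    """NOT((NOT A) OR (NOT B)): an OR gate wrapped in NOT gates."""
    return not_gate(or_gate(not_gate(a), not_gate(b)))

def or_from_and(a, b):
    """NOT((NOT A) AND (NOT B)): an AND gate wrapped in NOT gates."""
    return not_gate(and_gate(not_gate(a), not_gate(b)))

def truth_table(func):
    """List the outputs for inputs 00, 01, 10, 11, in that order."""
    return [func(a, b) for a, b in product([0, 1], repeat=2)]

print(truth_table(and_from_or) == truth_table(and_gate))  # True
print(truth_table(or_from_and) == truth_table(or_gate))   # True
```

Because the tables match row for row, the circuits compute the same functions, regardless of how they were constructed.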

Universal Gate NAND

This is a NAND gate. The symbol is similar to an AND gate, but there is a little circle at the output.

In boolean expressions, we write NAND like this: C = A NAND B

The behavior of a NAND gate is the same as an AND gate followed by a NOT gate. Like this.

Intuitively, you can think about the little circle from the NOT gate getting attached to the AND gate to form a NAND gate.

This is the truth table for the NAND gate.

As you can see, the output is 1 unless both inputs are 1. When both inputs are 1, the output is zero.

The NAND is special because it is a Universal Gate. Using the NAND gate alone, we can build AND, OR, and NOT gates.

We can make a NOT gate using a NAND gate like this: We connect the two inputs of the NAND gate together as shown here.

When the input is 0, both inputs of the NAND gate are 0 and the output is a one. Conversely, when the input is 1, both inputs of the NAND Gate are one, and the output is zero. This is the behavior we expect from a NOT gate.

Let's build an AND gate from the NAND gate. We already know how to build a not gate. Let's connect that not gate to the output of a NAND Gate. The result is an AND gate.

In this diagram, there is a NAND gate, and the output has been connected to a NAND gate that has been configured as a NOT gate.

Similarly an OR gate can be built from 3 NAND gates that are configured like this: These two wires here are the inputs, and this wire here is the output of the OR gate.

The key takeaway from this lesson is that any logic that can be expressed with AND, OR, and NOT can be expressed with NAND alone. The NAND is a Universal Gate.
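
The constructions in this lesson can be checked with a short sketch. The function names are hypothetical. The OR construction shown is the standard three-NAND arrangement (two NAND gates used as NOT gates feeding a third NAND); it is assumed, not confirmed, to match the configuration in the book's diagram.

```python
from itertools import product

def nand(a, b):
    return 1 - (a & b)

def not_from_nand(a):
    """Tie both NAND inputs together to get NOT."""
    return nand(a, a)

def and_from_nand(a, b):
    """A NAND gate followed by a NAND-based NOT gate."""
    return not_from_nand(nand(a, b))

def or_from_nand(a, b):
    """Three NAND gates: invert each input, then NAND the results."""
    return nand(nand(a, a), nand(b, b))

for a, b in product([0, 1], repeat=2):
    print(a, b, and_from_nand(a, b), or_from_nand(a, b))
```

For every input pair, the third column matches A AND B and the fourth matches A OR B.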

Exclusive-OR

The Exclusive-OR is an interesting gate. The truth table is shown here. The first three rows are similar to an OR gate, but the last row is different. When both inputs are 1, the output is zero.

The Boolean symbol for Exclusive-OR is XOR, or you can also use the + sign with a circle around it (⊕).

The circuit symbol for the XOR gate is shown here. The symbol is similar to the OR gate, but it has an extra curve at the input.

In the case of exclusive-OR, we often talk of exclusive-ORing one boolean variable onto another. To say "exclusive OR B onto A" means the same as A = A XOR B. In other words, A is changed to the result of A XOR B. The difference is that this is done in a single atomic operation. We are not computing the value A XOR B separately in a temporary variable, and then assigning the temporary value into the variable A. Instead, the value of B is XOR-ed onto the value in A, and the result remains in A.

This might seem like a lot of fuss about a small matter. The reason we are discussing it here is that in many quantum computing architectures, an atomic XOR is available, but an assignment is not. That is, the preferred way to compute A XOR B is to XOR B onto A in an atomic operation. The side-effect of this atomic operation is that we lose the data that was originally in A.
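
As a small illustration, the sketch below builds the XOR truth table, finds the row where it differs from OR, and shows "XOR B onto A" using Python's `^=` operator. The `^=` notation is only a stand-in for the idea: it is not an atomic hardware operation, and the variable names are hypothetical.

```python
def xor(a, b):
    return a ^ b

# Rows are (A, B, output) for inputs 00, 01, 10, 11.
xor_table = [(a, b, xor(a, b)) for a in (0, 1) for b in (0, 1)]
or_table = [(a, b, a | b) for a in (0, 1) for b in (0, 1)]

for row in xor_table:
    print(row)

# XOR and OR disagree only when both inputs are 1.
differing_rows = [x[:2] for x, o in zip(xor_table, or_table) if x[2] != o[2]]
print(differing_rows)  # [(1, 1)]

# "XOR B onto A": the result stays in A, and the old value of A is lost.
a, b = 1, 1
a ^= b
print(a)  # 0
```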

XOR for Assignment

As I mentioned earlier, XOR is available in most quantum computing architectures, but assignment is not.

Fortunately, the Exclusive-OR gate can be used to perform variable assignment. Something like this: A = B.

We first initialize A to be 0. Next we Exclusive OR B onto A. This is equivalent to setting A to be A XOR B. At the end of both these steps, the value of B has been assigned to A.

What this has accomplished is to replace assignment with two steps. First we initialize to 0. Next we perform an XOR operation. The act of Exclusive ORing B onto A is a single operation.

A can be initialized to 0 very early in a computation before the value of B is available.

When B is available, we XOR it onto A.

The end result is that we have set the value of A to be the value of B. This might seem like an abstract idea with no practical application. But replacing assignment with XOR is very important for quantum computations. Quantum assignment is an irreversible operation and can be performed only once, at the beginning of quantum computations. But Quantum-XOR is a reversible operation, which can be performed multiple times during the course of quantum computations.
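
The two-step replacement for assignment can be sketched with ordinary classical bits. This is only an illustration of the logic; it does not model a quantum device, and the function names are hypothetical.

```python
def assign_via_xor(b):
    """Copy b into a fresh variable using only 'initialize to 0' and 'XOR onto'."""
    a = 0    # step 1: initialize A to 0 (can happen before B is known)
    a ^= b   # step 2: XOR B onto A, so A becomes 0 XOR B, which is B
    return a

def xor_onto(a, b):
    """XOR b onto a and return the new value of a."""
    return a ^ b

for b in (0, 1):
    a = assign_via_xor(b)
    # XOR-ing B onto A a second time undoes the step and restores 0.
    print(b, a, xor_onto(a, b))
```

Each printed row shows B, the value copied into A, and the value of A after B is XOR-ed onto it again, which is 0 in both cases.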

XOR of Bit Sequences 1

Just as we XOR single bits, entire sequences of bits can be XOR-ed with each other. The result of XOR-ing two bit sequences is another bit sequence of the same length.

Consider the two sequences shown here highlighted in yellow. These are the input sequences.

The output sequence is below, highlighted in blue. The output is the result of XOR-ing the two input sequences.

For your reference, the truth table of XOR is shown on the right here.

To calculate the output, we process the inputs element by element.

Let's begin with the first elements of the inputs. These two bits highlighted in green.

The result of 1 XOR 0 is 1. The output bit is here highlighted in red.

Let’s move on to the next bits in the sequences. We XOR 0 and 1. The result is 1.

Next, we XOR 1 and 1. The result is 0. In this way we XOR each element in the first bit sequence with the corresponding element in the second bit sequence.

The outputs of the XOR operations produce the result bit sequence.

Let's go through another example for XOR-ing bit sequences.

These two bit sequences are the inputs.

This is the output. Each bit of the output is obtained by XOR-ing the corresponding bits of the input as shown in the following sequence of illustrations.
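
The element-by-element procedure can be written as a short function. The sketch below is illustrative; the function name and the two input sequences are made up for this example, chosen so that their first three bit pairs match the ones walked through above (1 XOR 0, 0 XOR 1, 1 XOR 1).

```python
def xor_bits(seq_a, seq_b):
    """XOR two bit sequences of equal length, one position at a time."""
    if len(seq_a) != len(seq_b):
        raise ValueError("bit sequences must have the same length")
    return [x ^ y for x, y in zip(seq_a, seq_b)]

# Hypothetical input sequences, not the ones from the book's slides.
first = [1, 0, 1, 1, 0, 0, 1, 0]
second = [0, 1, 1, 0, 1, 0, 1, 1]

result = xor_bits(first, second)
print(result)  # [1, 1, 0, 1, 1, 0, 0, 1]
```

The output has the same length as the inputs, and each output bit depends only on the two input bits in the same position.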


XOR of Bit Sequences 2

Consider the function: C = A XOR B XOR B. A and B are the inputs. C is the output of the boolean function. Observe we XOR the value of B twice.

Let's examine the truth table for this function. We see that the output is identical to the input A.

Whenever A is 0, the output is 0, and whenever A is 1, the output is 1. The value of B does not matter.

In other words, the function C = A XOR B XOR B, is the same as C = A. This tells us that if we XOR a value twice, we get the original value before any XOR operation was performed. The second XOR essentially reverses the effect of the first XOR operation.

Let's see how this applies to bit sequences.

This bit sequence highlighted in yellow is input sequence A.

And this sequence highlighted in orange is input sequence B. When we XOR these two bit sequences,

we get this sequence here, highlighted in blue, as the result.

This is A XOR B.

Next we will XOR this sequence again with the sequence B.

The two inputs for the second XOR operation are highlighted in yellow. The upper sequence is A XOR B. The lower sequence is B.

When we XOR these two sequences, we get this sequence here, highlighted in blue as the result. This is A XOR B XOR B. Examining this sequence,

we see that it is identical to the bit sequence A that we started with. By XOR-ing the input sequence B twice, we have obtained the bit sequence A which we started with.

In general, if A and B are bit sequences of the same length, and we XOR B onto A twice, we get back the sequence we started with: A XOR B XOR B = A.
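The identity A XOR B XOR B = A can be checked mechanically. The sketch below first verifies every row of the single-bit truth table, then repeats the check on two bit sequences. The sequences `seq_a` and `seq_b` are hypothetical values picked for illustration, not the ones in the book's figures.

```python
from itertools import product

# Single bits: check every row of the truth table for C = A XOR B XOR B.
for a, b in product((0, 1), repeat=2):
    c = a ^ b ^ b
    assert c == a  # the output always equals A, whatever B is

# Bit sequences: XOR B onto A, then XOR B onto the result again.
seq_a = [1, 0, 1, 1, 0, 0, 1, 0]
seq_b = [0, 1, 1, 0, 1, 0, 1, 1]
a_xor_b = [x ^ y for x, y in zip(seq_a, seq_b)]
a_xor_b_xor_b = [x ^ y for x, y in zip(a_xor_b, seq_b)]

print(a_xor_b)        # [1, 1, 0, 1, 1, 0, 0, 1]
print(a_xor_b_xor_b)  # [1, 0, 1, 1, 0, 0, 1, 0], identical to seq_a
```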

This concept is important for quantum cryptography, so let's work out more examples.

Inputs A and B are highlighted in yellow.

The result of A XOR B is highlighted in blue here. We XOR this result (in blue) with input B again.

The final result here, highlighted in orange, is the same as the input A. XOR-ing the bit sequence B twice has resulted in the original bit sequence A that we started with. This shows us that A XOR B XOR B = A.

Let's do this exercise again.

These are the inputs A and B, highlighted in yellow.

The result of XOR-ing the two bit sequences is this bit sequence here, highlighted in blue. We XOR this result again with B.

The result is here, highlighted in orange. This is A XOR B XOR B, which we can see,

is the same as the bit sequence A we started with.

Introduction to Cryptography

Quantum physics can be used for cryptography. The advantage of using quantum physics for cryptography is that messages encrypted using the principles of quantum physics are completely secure. The encryption algorithms in common use today, like RSA, can be broken by sufficiently powerful quantum computers. But quantum cryptography protocols remain secure against all known attacks.

Cryptography is used for sending secret messages over insecure channels. There are two ideas here.

The channel is insecure. Other people may eavesdrop on the data that is sent.

But the message remains a secret.

The sender of a secret message can encrypt the message. Only the receiver can decrypt the message. Actually let's relax that last condition. It is OK if both the sender and the receiver can decrypt the message. But no one else should be able to decrypt the encrypted message.

There are many real-world uses for secure messaging. It can be used to protect sensitive information, to maintain the secrecy of financial transactions until they are carried out, and to keep private data secure. This includes passwords, bank details, and so on.

Cryptography with XOR

The XOR operation we discussed earlier can be used for cryptography. There are two tasks we need to perform for cryptography. First, the sender encrypts. Second, the receiver decrypts. Let's start with the first task: encryption.

The first step of encryption is to convert the message that is to be protected into a sequence of bits.

The second step is to get a random sequence of bits that only the sender and the receiver know. This bit sequence has nothing to do with the message. It is a completely random sequence of bits. The special thing about this random sequence is that it is a shared secret. Only the sender and the receiver must know this random bit sequence.

How do we establish a shared secret? One way is to give the sender and the receiver, each, a copy of a book of random bit sequences. Only the sender and receiver get this book. Other than the two copies of the book with the sender and the receiver, no one else has the book. Essentially the book is a shared secret. Both the sender and receiver operate using the bit sequence on the same page of the book.

The final encryption operation is performed using XOR.

This row of bits, highlighted in yellow, is the message to be encrypted.

This second row of bits, highlighted in green, is a shared secret. The sender and receiver alone have access to this shared secret.

The result of the XOR operation is this bit sequence here highlighted in blue. This is the encrypted message.

The sender transmits this encrypted message, highlighted in blue, to the receiver.

Shared Secret

To decrypt, we use another XOR operation.

This bit sequence here highlighted in yellow is the data received from the sender. It is the encrypted message.

The second bit sequence here, highlighted in orange, is the same random bit sequence that the sender used. This is from the same page of the secret book that only the sender and receiver have. We XOR the received data with the secret random bit sequence.

The result is highlighted here in blue. This is the original message. The message has been decrypted.

To summarize, we encrypted by XORing a random bit sequence onto the message. We decrypted by XORing the same bit sequence onto the encrypted data. The result of the second XOR is the original message.

Mathematically, the message after encryption is (MESSAGE XOR SHARED_SECRET). The sender sends this encrypted data to the receiver. The receiver XORs the shared secret onto it again. That is, the receiver computes (MESSAGE XOR SHARED_SECRET) XOR SHARED_SECRET.

As we saw earlier, this is the original message. The receiver has decrypted the message.
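The full round trip, (MESSAGE XOR SHARED_SECRET) XOR SHARED_SECRET, can be sketched as follows. Several things here are assumptions made for illustration: the message text, the helper name `xor_bytes`, and the use of Python's `secrets.token_bytes` as a stand-in for one page of the book of random bit sequences. The sketch works on bytes (groups of 8 bits) rather than individual bits, which gives the same result as XOR-ing the underlying bit sequences.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the corresponding byte of key."""
    if len(key) < len(data):
        raise ValueError("the shared secret must be at least as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))


# Step 1: convert the (hypothetical) message into a sequence of bits.
message = "MEET AT NOON".encode("ascii")

# Step 2: a random sequence known only to sender and receiver.
shared_secret = secrets.token_bytes(len(message))

# Sender: MESSAGE XOR SHARED_SECRET
encrypted = xor_bytes(message, shared_secret)

# Receiver: (MESSAGE XOR SHARED_SECRET) XOR SHARED_SECRET
decrypted = xor_bytes(encrypted, shared_secret)

print(decrypted.decode("ascii"))  # MEET AT NOON
```

The same function is used for both encryption and decryption, because the second XOR undoes the first.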

I will list the takeaways from this chapter:

The book of random bit sequences that the sender and receiver have is called a shared secret.

The shared secret is a random bit sequence that only the sender and receiver know.

The sender encrypts the message by XORing it with the shared secret. The receiver decrypts by XORing the same shared secret onto the encrypted data. The result of the second XOR operation is the original message itself.

Importance of Randomness

There are some properties of random bit sequences that help us with cryptography. If we XOR a random sequence of bits with a message sequence, then the resulting sequence is also a random sequence of bits. Because the encrypted data is a random sequence of bits, there is no discernible pattern. And if there is no pattern in the data, then there is no way for an eavesdropper to decode the message.

The requirement that the bit sequence obtained after encoding should be random means that we should not reuse the shared secret for sending more than one encrypted message. The idea here is that the random sequence should be truly random. A reused sequence no longer behaves like a random one: every message encrypted with it shares the same sequence, and that shared structure shows up as patterns in the encrypted data.

The key takeaway is that as long as you use a new random sequence for each encryption, the resulting bit sequence after XOR encryption is essentially random with no patterns.

Breaking the Code

Let's look at the complete single use shared secret protocol for encryption. The first requirement is that both the sender and receiver have a book of shared secrets. Both the sender and the receiver operate on the same page of the book. The book is secret. Only the sender and receiver have access to the bit sequences in the book.

Each page of the shared secret book contains a truly random sequence of bits with no discernible patterns. Both copies of the book contain identical random bit sequences with no errors.

To encrypt, the sender XORs the message with the shared secret. The resulting encrypted data appears to be random with no discernible patterns. The sender sends the encrypted data to the receiver. The receiver XORs the shared secret with the received encrypted data. The result of the second XOR performed by the receiver is the original message.

This single use shared secret protocol is unbreakable. Observe that all conditions must be met.

The shared secret should be used only once for a single message. And the shared secret should be truly random. If the shared secret is reused, then patterns will appear in the encrypted sequence.

Single use shared secret encryption was popular during the Cold War. But there were instances when the sender and receiver got lazy and reused their shared secret. This allowed some encrypted messages to be decoded.

Let's see what happens if a shared secret is reused across multiple messages:

Suppose three messages "The supply truck did not arrive",

"The unit does not have enough food",

"We are starving"

are sent using the same shared secret.

Then the encoded versions of the first two messages will have the same first four characters (corresponding to "The ").

"The" is a very common English word. Someone trying to break the encryption might look at the encoded versions of the first two messages, notice that the first four encoded characters are the same, and guess that both messages begin with "The" followed by a space.

The code breaker can easily verify whether their guess is correct. First they XOR "The " onto the first 4 characters of the first encoded message. This will provide the first 4 characters of the shared secret.

Then they XOR these first 4 characters of the shared secret onto the encoded third message. They see that the partially decoded 3rd message is "We a...".

It is reasonable for the code-breaker to guess that "We a....." is "We are ...". They use this guess to decode another 3 letters of the shared secret and verify that their guess is correct by trying to decode the first two messages.

The partially decoded 1st and 2nd messages are: "The sup....", and "The uni...".

The code breaker guesses again. This time they guess that a word starting with "sup" is probably "supply ". They verify their guess on the 2nd and 3rd messages and see that they are correct.

So the partially decoded messages are:

"The supply ..."
"The unit do..."
"We are star..."

Proceeding in this way, a code breaker can decipher messages if a shared secret is used to encrypt more than one message. But as long as each random shared secret is used only once, the protocol we discussed here is completely secure.

The reason we discussed this single use shared secret protocol here is that these ideas are at the core of unbreakable quantum encryption. Quantum encryption is provably unbreakable. As long as the laws of physics do not change, data encrypted with quantum protocols will remain secure.
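The first steps of this code-breaking procedure can be reproduced in a short sketch. The three messages are the ones from the example above. The shared secret below is a fixed, made-up byte pattern used only so that the run is repeatable (a real shared secret must be truly random), and the helper name `xor_bytes` is hypothetical. The code breaker's side of the sketch uses only the encoded messages and the guessed words.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings position by position, stopping at the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


messages = [
    b"The supply truck did not arrive",
    b"The unit does not have enough food",
    b"We are starving",
]

# Assumption for illustration: a fixed 40-byte pattern standing in for the secret.
shared_secret = bytes((i * 37 + 11) % 256 for i in range(40))

# The mistake: the same shared secret is reused for all three messages.
encoded = [xor_bytes(m, shared_secret) for m in messages]

# --- From here on, the code breaker sees only `encoded`. ---
# The first two encoded messages start with the same four characters.
same_prefix = encoded[0][:4] == encoded[1][:4]

# Guess "The " and XOR it onto the first encoded message to get 4 secret bytes.
recovered_4 = xor_bytes(b"The ", encoded[0])
partial_third = xor_bytes(encoded[2], recovered_4)

# Guess "We are " for the third message to get 7 secret bytes, then check
# the guess against the first two messages.
recovered_7 = xor_bytes(b"We are ", encoded[2])
partial_first = xor_bytes(encoded[0], recovered_7)
partial_second = xor_bytes(encoded[1], recovered_7)

print(same_prefix, partial_third, partial_first, partial_second)
```

Each correct guess extends the known part of the shared secret, which in turn exposes more of every other message encrypted with it.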

Predicting the Future

In the last section we discussed an algebra of events that happened in the past. Boolean algebra works with events about which we have definite knowledge. For events that have already happened, we know that the corresponding boolean statement can be only true or false. We have definite knowledge about events in the past, so we allow boolean statements to only be either true or false. There is no "maybe". There is no "perhaps". Our knowledge is assumed to be complete and definite. But what about events that haven't yet happened? The future is uncertain. Boolean statements might be either true or false, but we don't know for certain which it will be. Though the future is uncertain, we do have some knowledge about the present that determines the future. This knowledge about the future outcomes can be codified in the form of likelihoods of events. An event A might or might not occur in the future. But we often know how likely A is to occur, and conversely how likely A is to not occur. Our knowledge of the likelihood of future events is expressed through a math model called probability theory.

Quantum physics is built upon probability theory. In quantum theory, we don't directly talk about the physical properties of a system like location, momentum, and so on. Instead, we talk about the probability distributions of each of these physical properties. Probability gives us tools to predict the outcome of events that are uncertain. Let me rephrase that. Probability theory works with events whose outcome cannot be predicted with certainty. Outcomes of events may be uncertain because of insufficient information, poorly understood physical mechanisms, or the events may inherently be a matter of chance. In the case of quantum physics, outcomes of events are uncertain because outcomes are inherently a matter of chance. With the most complete information available, and with unlimited computational resources for simulating physical processes, outcomes will still be a matter of chance. Many quantum events are inherently probabilistic. There is nothing we can do to make such quantum events deterministic.

We can't predict outcomes. For instance, when we roll a die, there are 6 possible outcomes. We can't predict the outcome with certainty. Instead, we predict the likelihood of each possible outcome. For a fair die, each of the possible outcomes is equally likely, and the probability of each is 1/6.

Sometimes we can obtain the true likelihood of uncertain outcomes. Suppose we have a coin. We can run physical tests to determine the density distribution of the material of the coin, its balance, and other attributes. If we find through physical tests that the coin is perfectly balanced, then we know that heads and tails are equally likely. Probability(heads) = Probability(tails) = 0.5

In other situations, we can't model the outcome with physics. Financial market outcomes are driven by the behavior of unpredictable humans. There is no way to model the behavior of humans with math. In such cases, we run experiments or collect data about actual historical outcomes. This data helps us estimate the probability of future events. These are only estimates. They are not as accurate as the physical analysis we perform on dice and coins.
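Estimating a probability from observed data usually means counting how often the event occurred and dividing by the number of observations. The sketch below illustrates this with simulated coin tosses instead of real historical data. The seed, the number of tosses, and the variable names are arbitrary choices made for this illustration.

```python
import random

# A seeded generator so that the simulated "historical data" is repeatable.
rng = random.Random(2021)

# Simulate 10,000 observations of a balanced coin.
tosses = [rng.choice("HT") for _ in range(10_000)]

# Relative frequency of heads is our estimate of Probability(heads).
estimate_heads = tosses.count("H") / len(tosses)
estimate_tails = tosses.count("T") / len(tosses)

print(estimate_heads, estimate_tails)
```

The estimate lands close to 0.5 but will generally not be exactly 0.5, which is the sense in which data-based probabilities are only estimates.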

Probability of a Boolean Expression

Probability is about predicting the likelihood of events. Will something happen or not? How likely is it to happen? How unlikely? Which is the most likely outcome? And so on.

The idea of events can be linked to boolean logic. A boolean statement being true is an event that might occur or not occur. So we can talk about the probability of a boolean statement. Just remember that when we apply probability techniques to boolean statements, we are actually talking about the probability of the boolean statement being true or false.

We use a mathematical shorthand when we talk of probability. The notation P with parentheses is to be read as "probability of".

As an illustration, probability of event A is written like this: P(A)

Probability of event A AND event B is written like this: P(A AND B)

Probability of event A OR event B is written like this: P(A OR B)

In general we can use this notation with any boolean function or expression. This notation refers to the probability of the boolean function being true.

Mutually Exclusive Events

Events are mutually exclusive if the occurrence of one implies that the other cannot occur. Consider a throw of a die. If the die comes to rest at 3, then it means that it did not show any of the other numbers 1, 2, 4, 5 or 6. Similarly, when a coin is tossed, the coin will show heads or tails. It can never be both. To be more formal, we say that A and B are mutually exclusive when:

If A occurs then B does not occur, and, If B occurs, then A does not occur.

Consider these three statements carefully. Only one of them can be true at any time. If one is true, then the other two are necessarily false.

Observe that it is possible for all of these statements to be false. Saying that these statements are mutually exclusive implies only that at most one of them can be true.

Now consider these statements. These are mutually exclusive, so at most one statement will be true. But in addition, observe that at least one of these statements must be true.

In other words, exactly one of these statements must be true. This is a complete set of mutually exclusive events. Every possible eventuality is covered by this set.

The interesting property of a complete set of mutually exclusive events is that if you add up the probabilities of each individual event, the total will be 1. This is an important idea. In quantum physics, a quantum state is represented by a matrix of probabilities. All the quantum probabilities of a state add up to one. This happens only when we have a complete set of mutually exclusive events. That is, when the events are mutually exclusive, and at least one of the events must be true.

Independent Events

Events that are completely unrelated to each other are called independent events.

Events are independent if the occurrence or non-occurrence of one event has absolutely no influence on the occurrence or non-occurrence of the other event. As an example, two consecutive throws of a die are independent. So are two coin tosses.

Let's consider a real-world example. Alice and Bob don't know each other. They don't communicate with each other. They have no common friends. In that case, the color of the car Alice buys is completely independent of the color of the car Bob buys. So the events "Alice owns a red car" and "Bob owns a blue car" are independent events.

Manipulating Probabilities with Algebra

We have already learned how boolean expressions can be simplified using techniques of boolean algebra. In the next few lessons we will learn how to apply similar techniques to the probabilities of boolean expressions.

Our task is to work with the probability of a boolean expression. Any boolean expression. Something like:

probability of (A AND B) OR C

Another example:

probability of ((A AND B) OR (NOT C) OR D)

Our goal is to express the probability of a boolean expression in terms of the probabilities of the individual events.

An important point to remember: from this point onward we will be working with two different algebras at the same time. One algebra is what we have already seen: boolean algebra. This algebra is a set of rules that helps us manipulate boolean expressions. In addition, we will now learn a second algebra which will help us manipulate probability expressions. This algebra is a set of rules that helps us combine or separate different probabilities. It will help us express the probability of a boolean expression in terms of the probabilities of individual events. Conversely, it will help us compute the probability of a boolean expression from the probabilities of individual events.

This confuses beginners, so I will repeat. You will be working with two different algebras, each with a different set of rules: the algebra of boolean expressions, and the algebra of probabilities. The rules for manipulating probabilities are different from the rules for manipulating boolean expressions.

P( Mutually Exclusive Events )

Suppose A and B are mutually exclusive events. That means A implies NOT B, and B implies NOT A. In other words, A and B cannot both be true.

When we see an expression like this: Probability of A AND B, when A and B are mutually exclusive, it means we are asking about the probability of an event that cannot occur.

A and B cannot both be true. So P(A AND B) = 0.

P( Independent Events )

Suppose instead, that A and B were independent. That is, A and B have no influence on each other. In that case, P(A AND B) = P(A)xP(B)

Let's work through an example.

Suppose A is the event "Alice owns a red car", and B is the event "Bob owns a blue car". Alice and Bob do not know each other, and they don't influence each other in any way. These two events are independent of each other.

If P(A) = 0.3 and P(B) = 0.4, then P(A AND B) = P(A)xP(B) = 0.3x0.4 = 0.12
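The product rule for independent events can be checked by brute-force counting. The sketch below uses the earlier example of two consecutive throws of a die rather than the car example, because all 36 outcomes can be listed explicitly. The particular events (first throw shows 3, second throw shows 5) and the helper name `prob` are choices made for this illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely results of two consecutive throws of a fair die.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event) -> Fraction:
    """Probability of an event = favourable outcomes / all outcomes."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))


p_a = prob(lambda o: o[0] == 3)                    # A: first throw shows 3
p_b = prob(lambda o: o[1] == 5)                    # B: second throw shows 5
p_a_and_b = prob(lambda o: o[0] == 3 and o[1] == 5)

print(p_a, p_b, p_a_and_b)     # 1/6 1/6 1/36
print(p_a_and_b == p_a * p_b)  # True
```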

Complete Set of MutEx Events

Let's go back to the situation where A and B are mutually exclusive. If A occurs then B doesn't, and if B occurs then A doesn't. Then P(A OR B) = P(A) + P(B). That is, the probability of the combined event A OR B is the sum of the individual probabilities.

Consider these two events A and B.

Suppose we know that P(A) = 0.1 and P(B) = 0.3.

Using our formula, P(A OR B) = P(A) + P(B) = 0.1 + 0.3 = 0.4

If A, B, C and D are a complete set of mutually exclusive events, then by definition, exactly one of them must occur. So P(A OR B OR C OR D) = 1. Since these are mutually exclusive, we can use the addition formula: P(A OR B OR C OR D) = P(A) + P(B) + P(C) + P(D) = 1

Since P(A) + P(B) + P(C) + P(D) = 1, we say that the probabilities of a complete set of mutually exclusive events add up to 1.
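As a small numerical illustration, the six faces of a fair die form a complete set of mutually exclusive events, so their probabilities should add up to 1, while a subset of them adds up to less. The dictionary name and the choice of a fair die are assumptions made for this sketch.

```python
from fractions import Fraction

# One event per face: "the die shows 1", ..., "the die shows 6".
die_probabilities = {face: Fraction(1, 6) for face in range(1, 7)}

# Complete set of mutually exclusive events: the probabilities sum to 1.
total = sum(die_probabilities.values())

# Mutually exclusive but not complete: P(1 OR 2) = P(1) + P(2).
p_1_or_2 = die_probabilities[1] + die_probabilities[2]

print(total)     # 1
print(p_1_or_2)  # 1/3
```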

P( A OR B )

In the general case, for any two events A and B, this relation holds: P(A OR B) = P(A) + P(B) - P(A AND B). This is valid regardless of the relation between the events A and B. The events A and B can be mutually exclusive, or independent, or related in some other manner. This formula is always valid.

In the special case when A and B are mutually exclusive, we know that P(A AND B) = 0. So P(A OR B) = P(A) + P(B).

Another special case is when A and B are independent. When A and B are independent, we know P(A AND B) = P(A)xP(B). So P(A OR B) = P(A) + P(B) - P(A)xP(B).

These relations are very useful in the real world, because the majority of events we work with are either mutually exclusive, or independent of each other.
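The general formula can be checked by counting outcomes for two events that are neither mutually exclusive nor independent. The events below (a single throw of a fair die, with A = "the result is even" and B = "the result is greater than 3") are hypothetical examples chosen for this sketch.

```python
from fractions import Fraction

faces = range(1, 7)  # one throw of a fair die


def prob(event) -> Fraction:
    """Probability of an event = favourable faces / all faces."""
    return Fraction(sum(1 for f in faces if event(f)), len(faces))


def is_even(f: int) -> bool:        # event A
    return f % 2 == 0


def is_above_3(f: int) -> bool:     # event B
    return f > 3


p_a = prob(is_even)                                    # {2, 4, 6}
p_b = prob(is_above_3)                                 # {4, 5, 6}
p_a_and_b = prob(lambda f: is_even(f) and is_above_3(f))  # {4, 6}
p_a_or_b = prob(lambda f: is_even(f) or is_above_3(f))    # {2, 4, 5, 6}

print(p_a_or_b)                    # 2/3
print(p_a + p_b - p_a_and_b)       # 2/3
```

Counting A OR B directly and applying the formula give the same value, because the formula subtracts the overlap that would otherwise be counted twice.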

Worked Examples 1

Let's work through an example for independent events.

Define event A to be "Alice owns a red car" and event B to be "Bob owns a blue car".

Since A and B are independent, we know that P(A OR B) = P(A) + P(B) - P(A)P(B)

If P(A) is 0.2 and P(B) is 0.3, then P(A OR B) is 0.2 + 0.3 - 0.06 = 0.44

Worked Examples 2

Sometimes we will know the probabilities of composite events. That is, we might know P(A AND B) and need to find P(A) and P(B). Let's work through an example. Define A to be "Alice owns a red car" and B to be "Alice owns a blue car". Observe that in both these events we are talking about Alice's cars. These are not independent events. If Alice purchased a red car first, she might not have enough money to buy an additional blue car. Similarly, if she purchased the blue car first, she might not be willing to buy an additional red car. The events are not mutually exclusive either. It is quite possible for Alice to own two cars, one red and the other blue. Another possibility is that Alice might not own any car at all.

By analyzing the car purchase data of people similar to Alice, we find out these composite probabilities: P(A AND B) = 0.3 and P(A AND (NOT B)) = 0.2. Our task is to use this data to find P(A).

Remember A and B are not independent. Alice's decision to buy one car influences her decision to buy the second car. So P(A AND B) is not P(A)xP(B).

We notice that the two events A AND B and the event A AND (NOT B) are mutually exclusive. When one occurs, the other cannot occur. Study the two events carefully. You will see why they are mutually exclusive.

Another thing to observe is that for any event A, A is the same as (A AND TRUE) in boolean algebra. What does this mean? Think of TRUE as an event that we know for certain to be true. Such as "The sun rose in the East this morning." So the event "Alice owns a red car AND the sun rose in the East this morning" is the same as the event "Alice owns a red car". In other words, the situation in which Alice owns a red car is the same as the situation where Alice owns a red car and the sun rose in the East.

In boolean algebra, for any event B, B OR NOT(B) is always TRUE. This should be obvious. Either something is true, or it is false. Our definition of boolean algebra doesn't allow for any other possibility.

Now let's return to the problem. We are told that P(A AND B) = 0.3 and P(A AND (NOT B)) = 0.2, and we are asked to find P(A).

We can write P(A) as P(A AND TRUE). Within the P() operator, the rules of boolean algebra apply. We know that A = A AND TRUE, so P(A) = P(A AND TRUE).

Further, we know that B OR NOT(B) = TRUE

Substituting within the P() operator, using the rules of boolean algebra,

P(A AND TRUE) = P(A AND (B OR NOT(B))) = P( (A AND B) OR (A AND NOT(B)) )

Work it out for yourself to make sure you understood this. In the next step we are going to use a probability rule to break the single P() operator into two.

We established earlier that A AND B and A AND NOT(B) are mutually exclusive.

When events X and Y are mutually exclusive, we know P(X OR Y) = P(X) + P(Y)

So P( (A AND B) OR (A AND NOT(B)) ) = P(A AND B) + P(A AND NOT(B))

Reflect for a moment on what we did here. We broke the single P() operator into two.

This part highlighted in green, became this, highlighted in red.

This part highlighted in blue, became this, highlighted in yellow. To split the single P() operator into two, we used the probability formula for mutually exclusive events.

In this problem, we are given these two values: P(A AND B) = 0.3 and P(A AND (NOT B)) = 0.2. So P(A) = P(A AND B) + P(A AND NOT(B)) = 0.3 + 0.2 = 0.5. We have found the probability of the individual event A from probabilities of composite events.

In this calculation, we used two different kinds of algebra.

Within the P() operator, we used boolean algebra. We rewrote A as (A AND B) OR (A AND NOT(B)). But boolean algebra alone is not enough to solve this problem.

We then used the algebra of probabilities: P(X OR Y) = P(X) + P(Y) when X and Y are mutually exclusive. Substituting A AND B for X and A AND NOT(B) for Y, we obtained P( (A AND B) OR (A AND NOT(B)) ) = P(A AND B) + P(A AND NOT(B)).

Finally, we had P(A) = P(A AND B) + P(A AND NOT(B)), which evaluates to 0.5.

In this problem we used two different kinds of algebra.

We used boolean algebra within each P() operator.

Next, we used the algebra of probabilities to manipulate expressions with the P() operator. By using these two algebras together, we were able to solve the problem. Many problems in quantum computing will require that both boolean algebra and probability rules be used together like this.
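The same calculation can be written as a lookup in a table of joint probabilities. In the sketch below, the two entries with A true come from the problem statement. The two entries with A false are not given in the problem, so the values used here are assumptions, picked only so that all four probabilities add up to 1.

```python
from fractions import Fraction

# Keys are (A is true, B is true). A = "Alice owns a red car",
# B = "Alice owns a blue car".
joint = {
    (True, True): Fraction(3, 10),    # P(A AND B), given in the problem
    (True, False): Fraction(2, 10),   # P(A AND NOT B), given in the problem
    (False, True): Fraction(1, 10),   # assumed for illustration
    (False, False): Fraction(4, 10),  # assumed for illustration
}

# The four combinations are a complete set of mutually exclusive events.
assert sum(joint.values()) == 1

# P(A) = P(A AND B) + P(A AND NOT B)
p_a = joint[(True, True)] + joint[(True, False)]

# With the assumed values, P(B) = P(A AND B) + P(NOT A AND B).
p_b = joint[(True, True)] + joint[(False, True)]

print(p_a)                               # 1/2
print(joint[(True, True)] == p_a * p_b)  # False: A and B are not independent
```

The value of P(A) depends only on the two given entries, so it is 0.5 whatever the assumed entries are. The independence check, in contrast, depends on the assumed values.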

P( Bit Values )

Bits in a computer's memory can be 0 or 1. A bit being 1 is an event that can be analyzed through probability theory. In quantum computing, we will be analyzing the probabilities of quantum bits. There will be situations where we know the probability of groups of bits taking certain values, and from that we need to find the probabilities for individual bits. Suppose we have a system with two bits. The possible combinations of two bits are shown here:

00, 01, 10, and 11

We will call the bit on the left the first bit, and the bit on the right the second bit. Each combination can be considered an event.

00 is the event: first bit = 0 and second bit = 0.

01 is the event: first bit = 0 and second bit = 1.

10 is the event: first bit = 1 and second bit = 0.

11 is the event where both bits are 1.

These are a complete set of mutually exclusive events. Exactly one of these events 00, 01, 10, or 11 will always be true.

Let's find the probability of (first bit = 1 AND second bit = 1) OR (first bit = 1 AND second bit = 0). This is P(11 OR 10).

A bit being 0 is the same as the bit NOT being 1. So we rewrite this as: (first bit = 1 AND second bit = 1) OR (first bit = 1 AND (NOT second bit = 1))

Recall the example about Alice's cars from the previous lesson. The expression here is of the same form: (A AND B) OR (A AND (NOT B)), which simplified to A.

So the boolean expression (first bit = 1 AND second bit = 1) OR (first bit = 1 AND (NOT second bit = 1)) simplifies to (first bit = 1). We have established that P(11 OR 10) = P(first bit = 1). But the events 11 and 10 are mutually exclusive. So P(11 OR 10) = P(11) + P(10)

This means that P(first bit = 1) = P(11) + P(10)

We can write similar relations for the other bit combinations:

P(first bit = 1) = P(10) + P(11)
P(second bit = 1) = P(01) + P(11)
P(first bit = 0) = P(00) + P(01)
P(second bit = 0) = P(00) + P(10)

These relations are important for analyzing the behavior of groups of quantum bits.
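These four relations all follow one pattern: add up the probabilities of the two-bit combinations in which the chosen bit has the chosen value. A minimal sketch follows. The joint probabilities in `joint` are made-up values that sum to 1, and the helper name `bit_probability` is hypothetical.

```python
from fractions import Fraction

# Assumed probabilities for the complete set of events 00, 01, 10, 11.
joint = {
    "00": Fraction(1, 8),
    "01": Fraction(3, 8),
    "10": Fraction(1, 4),
    "11": Fraction(1, 4),
}
assert sum(joint.values()) == 1


def bit_probability(position: int, value: str) -> Fraction:
    """Sum P(combination) over combinations whose bit at `position` equals `value`.

    position 0 is the first (left) bit, position 1 is the second (right) bit.
    """
    return sum(p for bits, p in joint.items() if bits[position] == value)


p_first_1 = bit_probability(0, "1")   # P(10) + P(11)
p_second_1 = bit_probability(1, "1")  # P(01) + P(11)
p_first_0 = bit_probability(0, "0")   # P(00) + P(01)
p_second_0 = bit_probability(1, "0")  # P(00) + P(10)

print(p_first_1, p_second_1, p_first_0, p_second_0)  # 1/2 5/8 1/2 3/8
```

For each bit, the probabilities of 0 and 1 add up to 1, as expected for a complete set of mutually exclusive events.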

Analysis with Venn Diagrams

So far we have solved problems that involve probabilities of boolean expressions using two different algebras. For boolean expressions within the P() operator, we used boolean algebra. To manipulate the P() operators, we used the algebra of probabilities. Using two different algebras together can be difficult. Fortunately, there is an alternative method to help us solve such problems visually. We can represent events and their probabilities on diagrams.

A large rectangle represents all possible events. The probability of all possible events = 1. In other words, the rectangle represents a complete set of events.

The event A can be drawn as a blob inside the rectangle. We label the blob with P(A).

Similarly for event B.

We can draw both events A and B on the same diagram like this.

Venn Diagram: P(A AND B)

We can represent composite events on these diagrams.

The region colored red is the event A AND B.

The area of the colored region is P(A AND B).

If you have studied Venn diagrams in high school, you will see some similarities with what we are doing here.


Venn Diagram: P( A OR B )

Let's consider the event A OR B. This is the colored region on this diagram. The area of the colored region is P(A OR B). In terms of sets, Event A is the set of possibilities inside the blob marked A. Event B is the set of possibilities inside the blob marked B. Event A OR B is the set of possibilities where either A or B occur. That is the union of possibilities for A and B. And that is the region marked in red here.


Venn Diagram: P( NOT A )

The final boolean operation we will represent on the diagram is NOT. The colored region is NOT A. The rectangle is the complete set of all possible events. Event A is the set of possibilities inside the blob marked A. So NOT A is simply the region outside the blob A in the diagram.


Worked Examples 1

Probability diagrams can help us solve problems visually.

Suppose we have P(A) = 0.2, P(B) = 0.3, and P(A AND B) = 0.05. A and B are not independent because P(A AND B) is not P(A) x P(B). They are not mutually exclusive because P(A AND B) is not zero.

Our task is to find P(A AND (NOT B)). Visually we see that NOT B is the area outside B. The intersection of that with A is A AND (NOT B). This is the colored region. Visually, we see that the probability of the colored region is P(A) - P(A AND B). This is 0.2 - 0.05 = 0.15

Our next task is to compute P((NOT A) AND B). Visually, we see that it is the colored area. That is, the intersection of everything outside A, NOT A, with B. The area represents the probability. So the colored area is P(B) - P(A AND B) = 0.3 - 0.05 = 0.25

Let's find P(A OR B). Visually, A OR B is the colored region. The area can be computed as P(A) + P(B) - P(A AND B). The reasoning is that we add the areas in A and B. But then we have counted the overlap, A intersection B, twice. So we subtract that out to compensate. That is, 0.2 + 0.3 - 0.05 = 0.45
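The three results in this chapter are easy to check with a few lines of code. The following is a minimal Python sketch. The variable names are our own, and the rounding in the last line is only there to hide floating-point noise.

```python
# Given values from the worked example.
p_a = 0.2
p_b = 0.3
p_a_and_b = 0.05

# A and B are independent only if P(A AND B) equals P(A) x P(B).
independent = abs(p_a_and_b - p_a * p_b) < 1e-12
# A and B are mutually exclusive only if P(A AND B) is zero.
mutually_exclusive = p_a_and_b == 0

# Areas read off the diagram.
p_a_and_not_b = p_a - p_a_and_b      # part of A that lies outside B
p_not_a_and_b = p_b - p_a_and_b      # part of B that lies outside A
p_a_or_b = p_a + p_b - p_a_and_b     # union, with the overlap counted once

print(independent, mutually_exclusive)
print(round(p_a_and_not_b, 2), round(p_not_a_and_b, 2), round(p_a_or_b, 2))
```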

Worked Examples 2

We use these diagrams because they help us convert expressions involving probabilities of boolean expressions into arithmetic expressions involving areas on the diagram. Let's work through another example. We are given P(A) = 0.3, P(B) = 0.4, and P(A OR B) = 0.6.

Find P(A AND B). We know that P(A OR B) = P(A) + P(B) - P(A AND B). That means P(A AND B) = P(A) + P(B) - P(A OR B) = 0.3 + 0.4 - 0.6 = 0.1

Find P(A AND (NOT B)). This is something like a Venn diagram in sets. AND corresponds to set intersection. OR corresponds to set union. NOT corresponds to set complement. A AND (NOT B) on the diagram is the intersection of A with everything outside B. That is this colored area. In terms of areas on the diagram, this is the same as (A union B) - B. In probabilities, that is P(A OR B) - P(B) = 0.6 - 0.4 = 0.2

Find P((NOT A) AND B). This is similar to the previous problem, but we will solve it using a different method. We can say that the area we want, shaded here in this diagram, is the same as P(B) - P(A AND B) = 0.4 - 0.1 = 0.3
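The same bookkeeping can be sketched in Python. The function name `venn_regions` is our own invention. It also returns the area outside both blobs, which the chapter does not ask for but which follows directly from the diagram.

```python
def venn_regions(p_a, p_b, p_a_or_b):
    """Split the rectangle into its four regions, given P(A), P(B) and P(A OR B)."""
    p_both = p_a + p_b - p_a_or_b
    return {
        "A AND B": p_both,
        "A AND (NOT B)": p_a_or_b - p_b,       # (A union B) minus B
        "(NOT A) AND B": p_b - p_both,         # B minus the overlap
        "(NOT A) AND (NOT B)": 1 - p_a_or_b,   # everything outside both blobs
    }


regions = venn_regions(0.3, 0.4, 0.6)
for name, area in regions.items():
    print(name, round(area, 2))
```

The four regions are mutually exclusive and complete, so their areas add up to 1.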

Conditional Probability

In this lesson we will learn about conditional probabilities. Let's start with the general diagram of events A and B. We want to find out the probability of B, given that A has occurred.

We are told that A has occurred. So in the diagram we are now restricted to the area shaded in yellow. Within this yellow region, we need to find the occurrence of B.

That would be this orange region, which represents both events A and B occurring. So how does this help us find the probability of B given that A has occurred? The answer we want is the fraction of the yellow region that is orange. That is, the probability of B within the area that represents event A. Visually, this is P(A AND B)/P(A).

Another way to visualize the same thing is to say that we are stretching A to fill the entire rectangle. Anything inside A stretches proportionally. The scaling factor is 1/P(A). Why? Because we need to multiply the yellow area by some scaling factor so that it becomes 1. The yellow area is P(A). If we multiply it by 1/P(A), the product is 1. So 1/P(A) is the scaling factor. Applying this scaling to the orange area, P(A AND B), we get P(A AND B)/P(A).

Worked Examples

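In terms of algebra, P(B/A) = P(A AND B)/P(A). Similarly, P(A/B) = P(A AND B)/P(B). These are the conditional probability formulas. Here P(B/A) is read as "the probability of B given A". It is not a division of B by A, and it is often written as P(B | A).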

Let's look at a real-world example.

In a group of 100 buyers at a grocery store, 10 purchased apples, 20 purchased oranges, and 5 purchased both apples and oranges.

We are told that one of the customers, Alice, purchased an apple. What is the probability that she also purchased an orange?

This is P(Orange/Apple) = P(Orange AND Apple)/P(Apple) = 0.05/0.1 = 0.5
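Here is the grocery store example as a minimal Python sketch. The function name `conditional_probability` is our own. The second result, the probability of an apple given an orange, is not computed in the text but uses the same formula.

```python
def conditional_probability(p_a_and_b, p_a):
    """P(B given A) = P(A AND B) / P(A). Undefined when P(A) is zero."""
    if p_a == 0:
        raise ValueError("P(A) must be greater than zero")
    return p_a_and_b / p_a


buyers = 100
p_apple = 10 / buyers
p_orange = 20 / buyers
p_both = 5 / buyers

p_orange_given_apple = conditional_probability(p_both, p_apple)
p_apple_given_orange = conditional_probability(p_both, p_orange)

print(round(p_orange_given_apple, 2), round(p_apple_given_orange, 2))
```

Observe that the two conditional probabilities are different. Half of the apple buyers also bought oranges, but only a quarter of the orange buyers also bought apples.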

Introduction to Statistics

So far we have been talking about probabilities of boolean functions. Boolean functions are either true or false, so it is easy to use them to represent events that either occur or don't occur. The result is uncertain, but the result is always either true or false.

Statistics deals with probabilities of functions that return values other than true or false. In the real world, boolean functions are quite rare, but random functions that return real numbers are very common. The maximum temperature on any day is a random function. The outcome is uncertain, and the result is not true or false. Similarly, the wind speed at the location where you are now is a random function. The value is uncertain, and it is a real number. Statistics helps us analyze the behavior of such functions. We will learn more about statistics in the next few chapters.

Random Variables

Consider a throw of dice. This is an uncertain event. The result is not merely occurrence, or non-occurrence.

Instead, the result is a number between 1 and 6. Uncertain functions whose result is something other than true or false are called Random Variables.

A coin toss can also be a random variable. Instead of mapping the outcomes to true or false, we map the outcomes to numbers.

Heads maps to +1 and tails maps to -1. The random variable can be either -1 or +1.

We can analyze the aggregate behavior of random variables. In this case, we can calculate the average value of a coin toss. If we throw the coin 100 times, we expect 50 heads and 50 tails. With our mapping, we expect to get +1 fifty times and -1 fifty times. The average of the coin toss random variable is (50 - 50)/100 = 0

Mapping Random Variables

Let's learn how to map real world events to random variables.

The first step is to consider a complete set of mutually exclusive events. This is important: The set of events is both complete and mutually exclusive. That is, exactly one of the set of events will always be true. The second step is to map a real number to each event in the set. That is, to each outcome. The result is a random variable.

Let's consider a throw of dice. There are six possible outcomes. These six outcomes form a complete set of six mutually exclusive events. The dice will always show exactly one number from 1 to 6.

Let's map the outcome of a throw of dice like this: each face maps to the number it displays, so 1 maps to 1, 2 maps to 2, and so on up to 6.

Now that we have defined the random variable, let's analyze its behavior in the aggregate. The average value is (1+2+3+4+5+6)/6 = 3.5. This average we have computed is for this specific mapping where the number displayed on the dice is the result of the random variable. But other mappings are also possible.

Let's map a display of 1 on the dice to the number -3. Similarly, we map 2 to -2, 3 to -1, 4 to +1, 5 to +2, and 6 to +3.

In this case, the aggregate behavior is different. The average is (-3 + -2 + -1 + 1 + 2 + 3)/6 = 0. The takeaway here is that the mapping determines the aggregate behavior of random variables. When we change the mapping, then aggregate properties like average also change.
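The effect of the mapping can be seen in a short Python sketch. The names `face_value`, `centered`, and `fair_average` are our own, and the sketch assumes a fair dice, so every face is equally likely and a plain average is enough.

```python
from fractions import Fraction

# First mapping: each face maps to the number it displays.
face_value = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6}
# Second mapping: the same six faces, mapped to values centered around zero.
centered = {1: -3, 2: -2, 3: -1, 4: 1, 5: 2, 6: 3}


def fair_average(mapping):
    """Average of the mapped values when every outcome is equally likely."""
    return Fraction(sum(mapping.values()), len(mapping))


avg_face_value = fair_average(face_value)
avg_centered = fair_average(centered)

print(float(avg_face_value), float(avg_centered))
```

The dice is the same in both cases. Only the mapping changed, and the average changed with it.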

Let's create a random variable based on car colors. Alice is planning to buy a car. She might choose red, blue, black, white or some other color. How do we define a random variable based on Alice's choice of car color? Alice's choice of color is a complete set of mutually exclusive events. It is a mutually exclusive set because choosing one color means that the other colors do not occur. There will be exactly one color chosen, so it is a complete set. We map each color to a real number. Then Alice's color choice can be represented as a number. This is the final random variable representing Alice's choice of color for her car.

Let's take a quick look at some other examples of random variables:

Rainfall is a random variable.

The change in stock market valuation on a specific date is a random variable.

The winner of the football World Cup is a random variable.

Observe that in the case of the football cup, we need to map the outcomes to numbers to get the final random variable.

Mean, Average, Expected Value, ...

In this lesson we will discuss aggregate properties of random variables. The most common aggregate property is variously called the average, mean, estimate, or expected value. In this book, these terms all refer to the same thing. This concept will be clear after we work through some examples.

Consider a throw of dice.

This is a random variable with six possible outcomes. Let's suppose we have mapped the faces on the dice to the real numbers v1, v2, v3, and so on to v6.

Each outcome has a probability: P(v1), P(v2), P(v3), and so on.

The mean is calculated as: v1 x P(v1) + v2 x P(v2) + and so on to v6 x P(v6). In other words, we multiply each possible value of the outcome by the probability of that outcome, and then we add up all the products.

To say it formally, for a random variable that has m possible outcomes, we multiply each of the m values by the corresponding probability, and then we add it all up: mean = Sigma of v x P(v), taken over all m values. The v x P(v) here refers to each of the m values multiplied by the corresponding probability. The Sigma adds up all the products.
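The formula translates almost word for word into code. Here is a minimal Python sketch. The function name `expected_value` and the sanity checks inside it are our own additions.

```python
def expected_value(values, probabilities):
    """Multiply each value by its probability and add up the products."""
    if len(values) != len(probabilities):
        raise ValueError("need exactly one probability per value")
    # The outcomes form a complete, mutually exclusive set, so the
    # probabilities must add up to 1 (allowing for rounding error).
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must add up to 1")
    return sum(v * p for v, p in zip(values, probabilities))


# Fair dice: each face maps to its own number and has probability 1/6.
fair_dice_mean = expected_value([1, 2, 3, 4, 5, 6], [1 / 6] * 6)
print(round(fair_dice_mean, 4))
```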

Worked Examples 1

Let's consider a more interesting example: loaded dice. That is, dice which favors one face. It is also called unfair dice because it can be used for cheating in games.

The probabilities are shown here. Observe that the sum of all these probabilities is 1. This is because the outcomes are a mutually exclusive and complete set.

This column has the mappings. These numbers are the possible values of the random variable. Let's calculate the mean.

We substitute the numbers into our formula. Multiply each possible value by the probability of that value.

For the value 1, we multiply by its probability 0.1.
For the value 2, we multiply by its probability 0.1.
For the value 3, we multiply by its probability 0.1.
For the value 4, we multiply by its probability 0.2.
For the value 5, we multiply by its probability 0.2.
For the value 6, we multiply by its probability 0.3.

Adding up the products, we get 4.2

Recall, from an earlier chapter, that the mean or average of a fair dice was 3.5. The unfair loaded dice favors 6, which explains why the mean is higher than for a fair dice. The mean of a random variable can be estimated through experiments, by averaging the results of many throws, without enumerating the probability of each possible outcome. If we find through experiments that the mean is significantly different from what we would expect for a fair dice, then we might have reason to believe that the dice is loaded.
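To make the idea of finding the mean through experiments concrete, here is a rough Python sketch. It computes the exact mean from the probabilities, and then estimates the mean by simulating many throws. The number of throws and the random seed are arbitrary choices of ours, and a simulation is of course only a stand-in for throwing a real dice.

```python
import random

values = [1, 2, 3, 4, 5, 6]
loaded_probabilities = [0.1, 0.1, 0.1, 0.2, 0.2, 0.3]

# Exact mean: multiply each value by its probability and add up the products.
exact_mean = sum(v * p for v, p in zip(values, loaded_probabilities))

# Experimental estimate: simulate many throws and average the results.
rng = random.Random(0)  # fixed seed so that the run is repeatable
throws = rng.choices(values, weights=loaded_probabilities, k=100_000)
estimated_mean = sum(throws) / len(throws)

print(round(exact_mean, 2))
print(round(estimated_mean, 2))
```

With this many throws, the estimate should land close to 4.2 and clearly away from the 3.5 we would expect of a fair dice.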

Worked Examples 2

Let's define a random variable for an unfair coin, one which favors heads. The mapping is -1 for tails, and +1 for heads.

The probability of tails, that is -1, is 0.48, and the probability of heads, that is +1, is 0.52. Let's calculate the mean, also known as average, estimate, or expected value.

We multiply each possible value by its corresponding probability. Multiply -1 by P(-1) and +1 by P(+1).

P(-1) is 0.48 and P(+1) is 0.52. Substitute and add up the products: (-1 x 0.48) + (+1 x 0.52) = 0.04. The average or mean of this random variable is 0.04. If the coin had been fair, then with this mapping of -1 to tails and +1 to heads, the average would have been 0. The non-zero average we obtained for this coin is an indication that it might not be fair.

Beyond Mean

The mean alone doesn't tell us enough about a random variable. Consider a random variable defined around a fair coin like this: tails is mapped to -1 and heads to +1. For a fair coin, the mean is 0.

Let's define a different mapping for the same coin: tails to -10 and heads to +10. This is a very different random variable from the previous one. The numbers are an order of magnitude different. But the mean is the same. It is still 0.

Consider a fair dice with the values mapped as shown here. The mean of this random variable is also 0. This random variable is completely different from the previous random variables with fair coins. This has 6 possible outcomes instead of 2. The values are different. Yet the mean is the same.

The takeaway here is that different random variables with very different behavior can have the same mean. The mean doesn't give us a complete picture of the behavior of a random variable.

To be precise, the mean doesn't tell us about the spread of the random variable. It doesn't tell us how different the possible values are, or the magnitude of the possible values.

Standard Deviation

In this chapter we will discuss another aggregate property of random variables called standard deviation. The name tells us something about this property. It is a measure of how much the random variable deviates from its mean. We will learn about this property through examples. Consider the mapping for dice shown here. This is a fair dice, so the probability of each outcome is 1/6. The values are mapped as shown here. As an exercise, stop here and calculate its mean. Did you get 3.5? That is the mean of a fair dice with this mapping.

Now let's discuss variations or spread of the random variable from its mean.

If the outcome is 1, then the variation or spread is (1-3.5) squared.

We square the difference to ensure we never have a negative number.

If the outcome is 2, the variation from the mean is (2-3.5) squared.

If the outcome is 3, the variation is (3-3.5) squared.

If the outcome is 4, the variation is (4-3.5) squared.

If the outcome is 5, the variation is (5-3.5) squared.

If the outcome is 6, the variation is (6-3.5) squared.

Now that we have the variations, let's calculate the mean of the variations. The idea is the same. Multiply each value by the corresponding probability and add it all up. The probability of each outcome is 1/6 because this is a fair dice. The six squared differences are 6.25, 2.25, 0.25, 0.25, 2.25, and 6.25, which add up to 17.5. The mean is 17.5/6 = 2.91667

This mean we have calculated, that is, the average of the variations or spread, is called "variance".

While calculating the variations, we squared the difference between the random variable and its mean. This squaring had the benefit of removing negative numbers. But it has also skewed the numbers. Large variations have a disproportionately larger impact on the variance, and the variance is measured in squared units. Usually we don't want this skewing effect we see in the variance. So we define a new quantity called standard deviation, which is the square root of the variance. In this example, the standard deviation is the square root of 2.91667. That works out to 1.708. This number is a measure of the spread of the random variable away from its mean.

This concept of standard deviation is important because in quantum physics measurements of physical properties are uncertain. The results are a matter of chance. The quantum physics term "uncertainty" refers to the standard deviation of a measured physical property.
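The steps of this chapter can be sketched in Python as follows. The function names `mean`, `variance`, and `standard_deviation` are our own, and the sketch follows the recipe described above rather than any particular statistics library.

```python
import math


def mean(values, probabilities):
    """Multiply each value by its probability and add up the products."""
    return sum(v * p for v, p in zip(values, probabilities))


def variance(values, probabilities):
    """Mean of the squared differences between each value and the mean."""
    mu = mean(values, probabilities)
    return sum(p * (v - mu) ** 2 for v, p in zip(values, probabilities))


def standard_deviation(values, probabilities):
    """Square root of the variance."""
    return math.sqrt(variance(values, probabilities))


fair_values = [1, 2, 3, 4, 5, 6]
fair_probabilities = [1 / 6] * 6

fair_mean = mean(fair_values, fair_probabilities)
fair_variance = variance(fair_values, fair_probabilities)
fair_std = standard_deviation(fair_values, fair_probabilities)

print(round(fair_mean, 3), round(fair_variance, 5), round(fair_std, 3))
```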

Worked Examples

Let's work through more examples of standard deviation. Consider a fair coin toss.

We map tails to -1 and heads to +1. Since the coin is fair, the mean of this random variable is 0.

Now let's enumerate all the possible values of the random variable, subtract the mean, and square it. For tails, we have (-1 - 0) squared. This is 1. For heads, we have (+1 - 0) squared. This is also 1.

We multiply each of the values computed in the second column by the corresponding probability. Probability of tails is 0.5, and probability of heads is 0.5. Adding, we get 1.

So variance is 1. The standard deviation is the square root of the variance. Square root of 1 is 1. So standard deviation is 1.

Let's create a different random variable from the same fair coin.

We map tails to -10 and heads to +10. The mean is 0.

In the second column we will subtract the mean from each possible value of the random variable, and square it. For tails we have (-10 - 0) squared. This is 100. For heads we have (+10 - 0) squared. This is also 100.

In the third column we multiply by the probability of each outcome. For a fair coin, the probability of tails is 0.5, and heads is 0.5. For tails we multiply 100 by 0.5. And for heads we multiply 100 by 0.5.

Adding, we get the value of variance as 100. Standard deviation is the square root of variance. That is 10.

Next, let's analyze an unfair coin.

We map tails to -10 as before, but the probability of tails is 0.48. We map heads to +10. The probability of heads is 0.52. This coin favors heads.

The mean of this unfair coin is not 0. It is (-10 x 0.48) + (+10 x 0.52) = 0.4

In the second column we subtract the mean from each outcome and square it.

In the third column, we multiply by the probability of each outcome.

Adding, we find that variance is 99.84 and standard deviation is the square root of the variance, which works out to 9.99.

Let's analyze a loaded dice.

We map the number of the face of the dice as the random variable outcome, as shown in this column.

This is a loaded dice, so the probabilities of some values are different as you can see in this column here. The dice favors 6. The probability is 0.3

The mean of this loaded dice is 4.2.

In the second column we subtract the mean from each outcome and square it.

In the third column we multiply the probability of each outcome with the result of the second column. Adding, we find that variance is 2.76. Computing the square root, we find that standard deviation is 1.66.
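All four examples in this chapter follow the same three-column recipe, so they can be checked with one small helper. This is a minimal Python sketch. The function name `summarize` and the labels in `examples` are our own.

```python
import math


def summarize(values, probabilities):
    """Return (mean, variance, standard deviation) of a discrete random variable."""
    mean = sum(v * p for v, p in zip(values, probabilities))
    # Second column: (value - mean) squared. Third column: times the probability.
    variance = sum(p * (v - mean) ** 2 for v, p in zip(values, probabilities))
    return mean, variance, math.sqrt(variance)


examples = {
    "fair coin, -1 and +1": ([-1, 1], [0.5, 0.5]),
    "fair coin, -10 and +10": ([-10, 10], [0.5, 0.5]),
    "unfair coin, -10 and +10": ([-10, 10], [0.48, 0.52]),
    "loaded dice": ([1, 2, 3, 4, 5, 6], [0.1, 0.1, 0.1, 0.2, 0.2, 0.3]),
}

results = {name: summarize(v, p) for name, (v, p) in examples.items()}
for name, (mean, variance, std) in results.items():
    print(f"{name}: mean={mean:.2f} variance={variance:.2f} std={std:.2f}")
```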

Combinations of Random Variables

A random variable is like a function. The difference is that the outcome is uncertain. We can perform arithmetic on functions. We can add, subtract, and multiply functions. The result of these operations is a new function. In a similar manner, we can add and subtract random variables. When we combine random variables using arithmetic, the result is a new random variable. For any two random variables, it can be proved that:

The mean of the sum of the two random variables is the sum of the means of the two random variables.

To rephrase that, if we calculate the individual mean of random variable X, then calculate the individual mean of another random variable Y, and add the two means, the result is the same as the mean of the sum of the two random variables. Writing E() for the mean, this is E(X + Y) = E(X) + E(Y).

This might look simple when written as an expression like this, but it is not a trivial result. Suppose the random variable X is the result of a fair coin toss: -1 for tails, and +1 for heads. The mean is 0. Further, suppose Y is the result of a throw of dice. Face value 1 is mapped to 1, 2 is mapped to 2, and so on. The mean is 3.5. If we add these two random variables, the result is a new random variable with complicated behavior. The value can range from 0 to 7 with varying probabilities for each outcome. The value 0 occurs when the coin toss is tails and the dice value is 1. The value 7 occurs when the coin toss is heads and the dice value is 6. It is possible to figure out the probability of each outcome between 0 and 7. Using that, we can calculate the mean of the random variable X+Y. But this expression gives us a simpler way to compute the mean of X+Y. The mean of X+Y is the mean of X + the mean of Y.

That is 0 + 3.5 = 3.5. We can calculate the mean of the combined random variable, without calculating the probabilities of each outcome of the combined random variable.
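For readers who want to see the long way and the short way side by side, here is a minimal Python sketch. To list the probabilities of X+Y we have to know how the coin and the dice behave together, so the sketch assumes that they are independent. That assumption is ours and is needed only for the long way. The variable names are also our own.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# X: fair coin, tails -> -1, heads -> +1.
coin = {-1: Fraction(1, 2), 1: Fraction(1, 2)}
# Y: fair dice, each face maps to its own number.
dice = {face: Fraction(1, 6) for face in range(1, 7)}


def mean(distribution):
    """Multiply each value by its probability and add up the products."""
    return sum(value * p for value, p in distribution.items())


# The long way: work out the probability of every possible value of X + Y.
# Multiplying px by py assumes the coin and the dice are independent.
sum_distribution = defaultdict(Fraction)
for (x, px), (y, py) in product(coin.items(), dice.items()):
    sum_distribution[x + y] += px * py

long_way = mean(sum_distribution)
short_way = mean(coin) + mean(dice)  # mean of X + mean of Y

for value in sorted(sum_distribution):
    print(value, sum_distribution[value])
print(long_way, short_way)
```

Both routes give the same mean, but the short way never needs the table of probabilities for X+Y.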

Correlation

When events are independent, their probabilities are easy to analyze: P(A AND B) = P(A) x P(B). But this is true only when boolean events A and B are independent. In this chapter, we will learn to work with dependent events. These are events for which P(A AND B) is not equal to P(A) times P(B). Consider these two events:

Alice earns more than $200,000 a year.
Alice owns a Porsche car.

These two events are related. If Alice earns more than $200,000, then the probability that she owns a Porsche is higher than if she earns less than $200,000.

Let's look at events that are closely related to each other.

Suppose Charles takes two coins. One coin is red in color, and the other is green.

He places the two coins in 2 sealed envelopes.

He then shuffles the envelopes so that no one knows which envelope contains the red coin, and which envelope contains the green coin.

Charles gives one sealed envelope to Alice, and the other to Bob.

Now consider these two events: Alice has a red coin, and Bob has a red coin. These two events are mutually exclusive. If Alice has a red coin, then Bob will have a green coin, not a red coin. Similarly, If Alice has a green coin, then Bob will have a red coin. Essentially, one event determines the other.

Now consider these two events. Alice finds a red coin in her envelope, and Bob finds a green coin in his envelope. These events are also related. If one event is true, then the other is also true. If one is false, then the other is also false. In the next chapter, we will explore techniques for analyzing such events and their probabilities.

Analysis of Correlation

In this chapter we will learn how to analyze random variables that are closely related. To refresh your memory, I will repeat the situation we considered in the previous chapter.

Charles places one red coin, and one green coin, in two envelopes. He shuffles the envelopes so that no one knows which envelope has which coin. Alice and Bob each take one envelope.

Let's define random variable RVA based on what Alice finds in her envelope. If Alice finds a red coin, then the random variable is +1. Instead, if Alice finds a green coin, then the value of RVA is -1.

Similarly, we define a random variable RVB for Bob's envelope. If Bob finds a red coin, then RVB is +1. If Bob finds a green coin, then RVB is -1.

Observe that RVA and RVB are strongly related. If either of them is +1, the other is -1. Conversely, if one of them is -1, the other is +1.

Now let's consider the product of the two random variables. That is, RVA x RVB. As I mentioned earlier, if RVA is +1, then RVB is -1. If RVA is -1, then RVB is +1. So this product is always -1. RVA and RVB are random variables. Their outcome is a matter of chance. But their product is not random. It is merely the constant -1. The product of the two closely related random variables is deterministic.

Next, let's consider a slightly different pair of random variables.

Suppose Charles places coins of the same color in both envelopes. Charles chooses the color (that is, red or green) at random, but both envelopes have the same color. Then RVA and RVB will always have the same value. If RVA is -1, then RVB is also -1. If RVA is +1, then RVB is also +1.

So their product RVA x RVB is always +1. That is, a constant, +1. So far we have seen what happens when there is a strong correlation between random variables.

Instead, suppose Charles chooses the coin for each envelope independently, and at random.

The first two columns here show the possible outcomes of RVA and RVB. Since we have two random variables, each with two possible values, there are 2 raised to the power of 2 = 4 possible combinations. These combinations are the rows of this table here.

In the third column, we have written the product of the two random variables. RVA x RVB.

The fourth column is the probability of each combination. Each individual outcome of -1 or +1 has a 0.5 probability. Since the two are independent random variables, we can calculate the joint probability of each combination by multiplying the individual probabilities. Recall, that for independent events P(A AND B) = P(A) x P(B). So the numbers in the fourth column are 0.25. Which is 0.5 times 0.5. Now let's calculate the mean of RVA x RVB.

The mean is the value of each outcome multiplied by the probability of that outcome, with all the products summed up. We do that here: (+1 x 0.25) + (-1 x 0.25) + (-1 x 0.25) + (+1 x 0.25). And the answer is 0.

We see that E(RVA x RVB), the mean of the product, is very different depending on how the two random variables are correlated. Sometimes RVA x RVB can be constant. In that case, the mean of RVA x RVB is also that same constant.

In other cases, the mean of RVA x RVB is 0. This is when they are uncorrelated.

A common way to test if random variables X and Y are correlated is to compute the mean of XY. If X and Y are uncorrelated, then the mean of XY equals the mean of X times the mean of Y. If they are correlated, then the mean of XY will be different. Observe that the specific values we got here for the mean of XY depend on how we map the outcomes to values. If we had chosen different mappings, the mean would have been different. The actual numbers are not significant. Instead, I want you to remember that by computing the mean of the product of two random variables, we can test if the random variables are correlated or not. That is the key takeaway of this chapter.
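The three envelope scenarios can be compared in a few lines of Python. This is a minimal sketch. Each dictionary lists the possible (RVA, RVB) pairs with their probabilities, using the mapping of red to +1 and green to -1, and the names are our own.

```python
def mean_of_product(joint):
    """Mean of RVA x RVB, given the probability of each (RVA, RVB) pair."""
    return sum(p * a * b for (a, b), p in joint.items())


# One red and one green coin: the two values are always opposite.
opposite_colors = {(1, -1): 0.5, (-1, 1): 0.5}
# Both envelopes hold the same color: the two values are always equal.
same_color = {(1, 1): 0.5, (-1, -1): 0.5}
# Each coin chosen independently at random: four equally likely pairs.
independent = {(a, b): 0.25 for a in (-1, 1) for b in (-1, 1)}

print(mean_of_product(opposite_colors))
print(mean_of_product(same_color))
print(mean_of_product(independent))
```

In all three scenarios the individual means of RVA and RVB are 0, so only the mean of the product tells the scenarios apart.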

Introduction to Complex Numbers

In this section we will talk about complex numbers. You might remember from high school that complex numbers are these imaginary quantities that are derived from the square root of -1. You might wonder what complex numbers have to do with quantum physics. To start with, I suggest you work with this intuitive explanation: Complex numbers are intermediate values that are not directly used in the real world. But as intermediate values, they are very useful. When transformations of the physical world are expressed in terms of complex numbers, the math becomes simpler than what it would otherwise be. In other words, complex numbers are a conceptual shortcut that helps us simplify the math. In addition, complex numbers have some nice theoretical properties that make them useful for modeling quantum systems. They are closed with respect to algebra. Any algebraic operation on complex numbers (other than division by zero) will yield a complex number. This is unlike subsets like integers, rational numbers, and real numbers. Division of integers can produce rational numbers that are not integers. Square roots of positive rational numbers can be irrational. Fractional powers of negative real numbers can be complex. But for complex numbers, any algebraic operation we perform will always yield another complex number. In a sense, the set of complex numbers is a superset of all possible numbers we can compute through algebra.

That was the theoretical reason for using complex numbers. But for the sake of intuition, there is a much simpler reason. Using complex numbers makes the math of physical transformations much easier. And that is enough of a reason to use complex numbers.

Quantum physics uses complex numbers to represent the physical state of any system. We use complex numbers because state transformations can be performed by linear mathematical operations. Instead, if we chose to represent the quantum state using real numbers, then the mathematical operations for state transformations would be more complicated. Most of us first encounter complex numbers when we try to solve an equation like this: x squared = -1. It is clear that there is no real number that can satisfy this equation. In other words, x is a meaningless quantity in the real world.

But x is meaningless only if x were to directly represent a quantity in the real world.

Instead, if x were an intermediate value, then this equation will make sense.

Suppose we define x to be the square root of some quantity we call the correction factor. The correction factor is a real world quantity. It might represent some correction we need to apply on a physical system. It can be positive or negative.

With this definition of x, the equation here makes sense. x is the square root of the correction factor. That is the same as saying x squared is the correction factor. The correction factor can be negative. So the equation x squared = -1, where x squared is the correction factor, makes sense. In this example, x squared represents a real-world quantity. x squared can be negative. x by itself doesn't have meaning in the real world. But it can be an intermediate quantity to be used in further calculations.

The algebraic expressions for the further calculations might be simple when expressed in terms of x, and complicated when expressed in terms of x squared. In such situations, we want to be able to talk of the value of x, instead of the value of x squared. And that is what complex numbers are about. It is a way for us to talk of the value of x instead of x squared. Math for quantum physics is simpler when expressed in terms of x instead of x squared. x is only an intermediate quantity. It has no meaning in the real world. But the math expressions are simpler when written in terms of the imaginary quantity x than if they were written in terms of the real quantity x squared.

Imaginary i

In this chapter we will learn how to represent quantities like x, where x squared = -1. Ideally, we want to be able to represent any fractional power of any negative number. Furthermore, we also want to compute fractional powers of fractional powers.

Suppose x squared is -1. We know that x is not real.

What about square root of x? Can we compute fractional powers of quantities that are not real? Amazingly, all of these fractional powers can be represented in terms of just one imaginary quantity.

We define i to be the square root of negative 1. i is meaningless in the real world as a physical quantity. But this definition of i is useful because every other meaningless quantity can be represented in terms of i. These quantities represented in terms of i, that are meaningless in the real world, are called complex numbers.

The form of a complex number is shown here: a + bi

a and b are real numbers. a is the real part of the complex number.

bi is the imaginary part. This is a universal form that is sufficient to represent any fractional power of any number.

Once we have defined i = square root of -1, any nth root of any number can be written in the form a + bi. a + bi is the general form of any complex number. Observe that a and b are both real numbers. So we are saying that any complex number can be written as the sum of a real number and an imaginary quantity that is a real multiple of i.

Furthermore, any fractional root of any complex number is also a complex number, which can be represented as a + bi.

The key takeaway is that once we have given a name to the quantity square root of -1, any other fractional power that is meaningless in the real world can be written in terms of i, with the general form a + bi.

Some examples will make the concept clear. What is the square root of -4? -4 can be written as (+4) x (-1). We apply the square root separately to each of these factors. The square root of +4 is 2. The square root of -1 has been given a name: it is called i. So the square root of -4 is 2 times i, or 2i.

Can we find the fourth root of -1 ? That is the same as the square root of i. We will learn how to compute the square root of i later in this book. For now, I am going to give you the answer: square root of i is (1+i)/root(2). Now I will verify that by squaring this quantity.

To square a number, we multiply it with itself.

(1+i)/root(2) multiplied with itself is (1+i)(1+i)/2 = (1 + 2i + i squared)/2.

Replacing i squared with -1, this is (1 + 2i - 1)/2 = i. So the square root of i is (1+i)/root(2). Think about what we have done here. i is an imaginary quantity. We have applied a square root operation on an imaginary quantity. The result is also not real. But interestingly, it can be expressed in terms of i.

All we did was to give a name to one imaginary fractional root of a negative number. We defined i to be the square root of -1

Now any imaginary quantity can be written in terms of i.

The general form is a + bi. a and b are real numbers. Every complex number can be written as the sum of a real part and an imaginary part. The imaginary part is a real multiple of i.
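The two examples in this chapter can be checked numerically. What follows is a minimal sketch in Python, which is a choice made here for illustration and not something the book itself uses. Python's built-in complex type writes i as 1j; the variable names are our own.

```python
import cmath
import math

# Python writes the imaginary unit i as 1j.
i = 1j

# i squared is -1 (Python stores it as the complex number -1 + 0i).
i_squared = i * i

# The square root of -4 should be 2i.
root_of_minus_4 = cmath.sqrt(-4)

# Candidate for the square root of i: (1 + i) / root(2).
candidate = (1 + i) / math.sqrt(2)
candidate_squared = candidate * candidate

print(i_squared)
print(root_of_minus_4)
print(candidate_squared)  # very close to i, up to floating-point rounding
```

The last value is only approximately i because root(2) cannot be stored exactly as a floating-point number.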

Addition of Complex Numbers

We can perform math operations on complex numbers just as we would on real numbers. Let's add (1 + 2i) and (3 + 4i). We expand this just like any other expression in algebra. Treat i as an unknown. Like x. The pure real numbers 1 and 3 are added together.

The multiples of i, that is 2i and 4i, are added together. The final answer is 4 + 6i.

When performing math operations, treat i like any unknown. You are familiar with algebra using unknowns like x and y. This is similar. Treat i the same way you would an unknown x. i is almost unknown. The only thing we know about i is that i squared is -1. So treat i as an unknown. Whenever you see i squared, replace it with -1.

Add (4-3i) and (-2 + 7i). Expand treating i like any unknown. The way you would an algebra expression with x or y.

Group the purely real numbers separately, and the multiples of i separately. Simplifying, it is 2 + 4i

Add (-6 + 7i) and (-1 - 2i). We simplify this in the same way. The final result is -7 + 5i.
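The grouping rule used in these three additions can be written as a tiny function. This is a sketch under our own naming (add_complex is hypothetical, not a library function); it stores a + bi as the pair (a, b) and cross-checks one result against Python's built-in complex type.

```python
def add_complex(a, b, c, d):
    """Add (a + bi) and (c + di); return (real part, coefficient of i)."""
    # Real numbers are grouped together, multiples of i are grouped together.
    return (a + c, b + d)

sum_1 = add_complex(1, 2, 3, 4)      # (1 + 2i) + (3 + 4i)
sum_2 = add_complex(4, -3, -2, 7)    # (4 - 3i) + (-2 + 7i)
sum_3 = add_complex(-6, 7, -1, -2)   # (-6 + 7i) + (-1 - 2i)

# Cross-check the first sum with Python's built-in complex numbers.
builtin_sum = (1 + 2j) + (3 + 4j)

print(sum_1, sum_2, sum_3)  # (4, 6) (2, 4) (-7, 5)
print(builtin_sum)          # (4+6j)
```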

Subtraction

Subtraction of complex numbers is similar. Consider (6 + 3i) - (4 + 2i).

Removing the parentheses, we get 6 + 3i - 4 - 2i

We group terms without i and the terms with i. (6 - 4) + (3i - 2i)

which works out to 2 + i

Another example.

Another example.

Multiplication by a Real

We can multiply a real number and a complex number. In the general case, we can multiply the complex number a + bi by a real quantity k: k(a + bi) = ka + kbi

Let's calculate 3(2 + 3i) = 6 + 9i

Let's calculate (-4) x (2 - 3i). That is -8 + 12i. In algebra, i can be treated the same way you would treat an unknown x or y.

Division by a Real

In the same way, we can divide a complex number by a real number. 4 + 6i divided by 2 = 2 + 3i

One third of (6 - 9i) = 2 - 3i

(7 + 2i) divided by -3 = -7/3 - (2/3)i
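Multiplying or dividing by a real number acts on the real part and on the coefficient of i separately. The sketch below illustrates this with two hypothetical helper functions of our own (scale and divide_by_real). It uses Python's Fraction type so that results such as -7/3 stay exact rather than becoming rounded decimals.

```python
from fractions import Fraction

def scale(k, a, b):
    """Multiply a + bi by the real number k: k(a + bi) = ka + kbi."""
    return (k * a, k * b)

def divide_by_real(a, b, e):
    """Divide a + bi by the nonzero integer e, keeping exact fractions."""
    return (Fraction(a, e), Fraction(b, e))

product = scale(-4, 2, -3)            # (-4) x (2 - 3i)
quotient = divide_by_real(7, 2, -3)   # (7 + 2i) divided by -3

print(product)                   # (-8, 12)
print(quotient[0], quotient[1])  # -7/3 -2/3
```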

Complex Multiplication

In this chapter we will learn how to multiply one complex number by another complex number. When performing math operations, treat i like any unknown. You are familiar with algebra using unknowns like x and y. This is similar. Treat i the same way you would an unknown x. i is almost unknown. The only thing we know about i is that i squared is -1. So treat i as an unknown. Whenever you see i squared, replace it with -1.

Let's calculate i times (1 + 2i). Both i and (1 + 2i) are complex numbers. We expand the product by treating i like an unknown, just as you would an expression with x or y: i(1 + 2i) = i + 2(i x i). i is almost an unknown. We don't know what i is. But we do know that i squared is -1.

Let's replace the i x i with -1: i + 2(-1) = -2 + i. We usually write complex numbers in the form real part plus imaginary part. This is the convention. To avoid confusion, you should always write the real part first, followed by the imaginary part. -2 is the real part. i is the imaginary part.

Find the product: (3 + 4i) x (5 + 6i). Expanding, we get 15 + 18i + 20i + 24 i squared. i is treated just like an unknown. But we do know that i squared is -1.

There is an i squared term here. The i squared is -1. Simplifying, we get 15 + 38i - 24. We rewrite the expression so that the real part appears first. The final answer is -9 + 38i.

Worked Examples

Let's look at more complicated examples of complex number multiplication: (-7 + 2i)(3 - 4i). We expand this using the usual rules of algebra. i can be treated as an unknown like x or y. = -21 + 28i + 6i - 8 i squared

Observe the i squared term here. We can replace i squared with -1, which gives -21 + 34i + 8. Simplifying further, we get -13 + 34i. We always write the real part first and the imaginary part after that.

Another example: (6 - 2i)(-9 + 4i) = -54 + 24i + 18i - 8 i squared

= -54 + 42i - 8(-1) = -54 + 42i + 8 = -46 + 42i
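Expanding (a + bi)(c + di) and replacing i squared with -1 always leaves the same pattern: the real part is ac - bd and the coefficient of i is ad + bc. The sketch below packages that pattern in a hypothetical function of our own, multiply_complex, and applies it to the three products worked out above.

```python
def multiply_complex(a, b, c, d):
    """Multiply (a + bi)(c + di) by expanding and replacing i*i with -1."""
    real = a * c - b * d   # a*c from the plain terms, -b*d from (bi)(di)
    imag = a * d + b * c   # the two cross terms that each carry one i
    return (real, imag)

p1 = multiply_complex(3, 4, 5, 6)     # (3 + 4i)(5 + 6i)
p2 = multiply_complex(-7, 2, 3, -4)   # (-7 + 2i)(3 - 4i)
p3 = multiply_complex(6, -2, -9, 4)   # (6 - 2i)(-9 + 4i)

print(p1, p2, p3)  # (-9, 38) (-13, 34) (-46, 42)
```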

Complex Conjugate

There is an operation we can perform on complex numbers called the complex conjugate. When we are first introduced to this operation, it appears to be quite useless. But it is at the core of many operations in quantum physics. For now, there is another reason for us to learn about complex conjugates: it helps us divide one complex number by another. Finding the complex conjugate is simple. The complex conjugate of a + bi is a - bi.

Just change the sign of the imaginary part.

Let's look at some examples. The complex conjugate of 2 + 3i is 2 - 3i. All we did here was to change the sign of the imaginary part, the part which has the i.

Find the complex conjugate of 7 - 6i. That would be 7 + 6i. We have changed the sign of the imaginary part.

One last example: Find the complex conjugate of -2 - 3i. That is -2 + 3i. Complex conjugation is a simple operation. It is also surprisingly useful as we will see when we learn about division of complex numbers.

Squared Magnitude

There is another simple operation on complex numbers you need to know about. This is the squared magnitude. For a complex number a + bi, the squared magnitude is a squared + b squared. That is, the sum of the square of the real part and the square of the coefficient of the imaginary part.

The interesting thing is, if you multiply a complex number by its complex conjugate, you get its squared magnitude. Consider a complex number a + bi. Multiply it by its complex conjugate a - bi. This product (a + bi)(a - bi) = a squared - abi + abi - b squared i squared, which simplifies to a squared plus b squared.

Find the squared magnitude of 2 + 3i. Let's calculate this through complex conjugates. (2 + 3i)(2 - 3i) = 4 - 6i + 6i - 9 i squared. Observe how the imaginary terms here cancel out, and -9 i squared is +9. The answer is 4 + 9 = 13.

Find the squared magnitude of 2 + 3i through squaring the coefficients. 2 squared + 3 squared = 4 plus 9 = 13

Find the squared magnitude of 7 - 6i using both methods.

Through squaring the coefficients, we have 7 squared + (-6) squared = 49 + 36, which simplifies to 85.

Through complex conjugates, we have (7 - 6i)(7 + 6i) = 49 + 42i - 42i - 36 i squared, which also simplifies to 85.
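Both methods for the squared magnitude can be compared side by side. This is a small sketch using Python's built-in complex type; conjugate(), real and imag are part of that type, while the variable names are our own.

```python
z = 7 - 6j

# Complex conjugate: change the sign of the imaginary part.
conj = z.conjugate()

# Method 1: square the real part and the coefficient of i, then add.
by_coefficients = z.real ** 2 + z.imag ** 2

# Method 2: multiply the number by its complex conjugate.
by_conjugate = z * conj

print(conj)             # (7+6j)
print(by_coefficients)  # 85.0
print(by_conjugate)     # (85+0j)
```

The second method returns a complex number whose imaginary part is zero, which is another way of seeing that the squared magnitude is real.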

Complex Division

In this chapter, we will learn how to divide complex numbers.

We want to compute a division of the form a + bi divided by c + di. This looks difficult. But there is one kind of division we already know how to compute.

We know how to divide a complex number by a real number. a + bi divided by e is (a/e) + (b/e)i

So our method is to convert this complex number division: (a + bi)/(c + di) into a division by a real number. That is, we need to convert the denominator into a real number, without changing the value of the expression.

How do we do that? The trick is to multiply both the numerator and the denominator by the complex conjugate of the denominator. We are multiplying both the numerator and the denominator by the same value. So the value of the expression is not changed. Let's see what happens when we multiply both numerator and denominator by the complex conjugate of the denominator.

(a + bi)/(c + di) is the original expression. We multiply by (c - di)/(c - di). Let's simplify the denominator first. (c + di)(c - di) is c squared + d squared.

This is the squared magnitude of the denominator. And it is a real number. The squared magnitude of a complex number is always a real number. Now we have a complex number expression in the numerator and a real number in the denominator.

To simplify the numerator, we need to perform complex number multiplication, which we already know how to do. The denominator is a real number. We already know how to divide a complex number by a real number.

Let's work through an example. Divide 1 + 2i by 3 + 4i. The complex conjugate of the denominator is 3 - 4i. We multiply both the numerator and the denominator by 3 - 4i. The denominator now becomes a real number: (3 + 4i)(3 - 4i) = 9 + 16 = 25. We expand the multiplication in the numerator: (1 + 2i)(3 - 4i) = 3 - 4i + 6i - 8 i squared = 11 + 2i. We divide 11 + 2i by 25 and we finally get (11/25) + (2/25)i.

Worked Examples

Let's work through more examples. Divide 4 - 5i by 6 + 7i. The complex conjugate of the denominator is 6 - 7i. Multiply both numerator and denominator by the complex conjugate. The denominator simplifies to the real number 36 + 49 = 85. The numerator simplifies to 24 - 28i - 30i + 35 i squared = -11 - 58i. So the final answer is -11/85 - (58/85)i.

Divide 8 + 9i by 1 - 2i. The complex conjugate of the denominator is 1 + 2i. Multiplying, the denominator simplifies to 1 + 4 = 5. The numerator simplifies to 8 + 16i + 9i + 18 i squared = -10 + 25i. Dividing by 5, we get -2 + 5i as the final answer.
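The conjugate trick can be followed step by step in code. The sketch below assumes integer inputs and uses a hypothetical function of our own, divide_complex, with Python's Fraction type so the answers match the exact fractions in the worked examples.

```python
from fractions import Fraction

def divide_complex(a, b, c, d):
    """Divide (a + bi) by (c + di) using the conjugate of the denominator."""
    # Denominator: (c + di)(c - di) = c squared + d squared, a real number.
    denominator = c * c + d * d
    if denominator == 0:
        raise ZeroDivisionError("cannot divide by 0 + 0i")
    # Numerator: (a + bi)(c - di), expanded with i squared replaced by -1.
    real = a * c + b * d
    imag = b * c - a * d
    # Division by a real number acts on each part separately.
    return (Fraction(real, denominator), Fraction(imag, denominator))

q1 = divide_complex(1, 2, 3, 4)    # (1 + 2i) / (3 + 4i)
q2 = divide_complex(4, -5, 6, 7)   # (4 - 5i) / (6 + 7i)
q3 = divide_complex(8, 9, 1, -2)   # (8 + 9i) / (1 - 2i)

print(q1[0], q1[1])  # 11/25 2/25
print(q2[0], q2[1])  # -11/85 -58/85
print(q3[0], q3[1])  # -2 5
```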

Euler's Formula

There is a remarkable formula which connects trigonometry and exponentials through complex numbers.

e raised to the power of ix = Cos x + i Sin x. Here x is a real number. On the left hand side, we have an exponential. On the right we have Cosine and Sine.

The e here is the mathematical constant, approximately 2.71828, which is also the base of natural logarithms.

Observe that the squared magnitude of e raised to the power ix is always 1. This is because e power i x = Cos x + i Sin x, and we know the squared magnitude of this is Cos squared x + Sin squared x. And there is a trigonometric formula that Cos squared x + Sin squared x = 1. So squared magnitude of e power ix is 1

The converse is also true. That is, any complex number a + bi whose squared magnitude a squared + b squared = 1 can be written in the form a + bi = Cos x + i Sin x = e power ix. Let's prove this.

Given a complex number a + bi whose squared magnitude is 1, we can write b in terms of a: b is plus or minus the square root of (1 - a squared). Since a squared is at most 1, a lies between -1 and 1, so there is an angle x with Cos x = a. From trigonometry, Sin x is then plus or minus square_root(1-a^2), and we pick the x whose Sin has the same sign as b.

So a + bi can be written as Cos x + i Sin x where Cos x = a, and Sin x = b. In other words, we can write a + bi as e power ix, if the squared magnitude of (a + bi) is 1.
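Euler's formula can be spot-checked numerically. This is a sketch, not a proof: it picks one arbitrary angle (0.75 radians, our own choice) and compares both sides using Python's cmath and math modules.

```python
import cmath
import math

x = 0.75  # an arbitrary real angle in radians, chosen only for illustration

left = cmath.exp(1j * x)                   # e raised to the power ix
right = complex(math.cos(x), math.sin(x))  # Cos x + i Sin x

# Squared magnitude: real part squared plus coefficient of i squared.
squared_magnitude = left.real ** 2 + left.imag ** 2

print(left)
print(right)
print(squared_magnitude)  # 1, up to floating-point rounding
```

Trying other values of x gives the same agreement, which is what the formula predicts.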

Polar Form

Now let's generalize what we learned in the last chapter to any complex number a + bi other than 0. That is, to complex numbers where a squared + b squared is NOT 1. The trick is to rewrite a + bi like this: a + bi = square_root(a^2 + b^2) x (a + bi)/square_root(a^2 + b^2).

We are multiplying and dividing by the square root of the squared magnitude, so the value is unchanged. This is, you guessed it, called the magnitude of the complex number. Since a and b are real numbers, the magnitude is also a real number.

Let's focus on the second factor, (a + bi)/square_root(a^2 + b^2), which is highlighted in yellow in the illustration. For now, forget about the first factor.

The squared magnitude of the highlighted factor is a^2/(a^2 + b^2) + b^2/(a^2 + b^2), which upon simplifying, works out to 1.

So when the complex number a + bi is written in this form, it is a product of a real number, highlighted in blue, and a complex number whose squared magnitude is 1, highlighted in yellow. The blue highlighted factor is a real number. Let's call it r. The yellow highlighted factor is a complex number of magnitude 1, and it can be written as e power ix, or as Cos x + i Sin x.

To recap, the complex number a + bi is written as a product of two factors. The first factor, highlighted in blue, is a real number. We call it r. The second factor can be written as Cos x + i Sin x because its squared magnitude is 1. It can also be written as e power ix.

The key takeaway is that any complex number a + bi can be written in the form r e power ix, where r = square_root(a^2 + b^2) is the magnitude, Cos x = a/r and Sin x = b/r. r e power ix is called the polar form of the complex number. The polar form is very important for quantum physics because many transformations can be visualized as changes in the angle x.

Worked Examples

Let's work through some problems. First, we will learn how to convert polar form to standard form. Convert 10 e power i pi/4 to the form a + bi. The e power i pi/4 can be written as Cos pi/4 + i Sin pi/4. We evaluate 10 times Cos pi/4 to get 10/root(2). Similarly, 10 Sin pi/4 is 10/root(2). So the final answer is 10/root(2) + (10/root(2))i. Observe that angles are in radians, not degrees.

Convert 5 e power i pi/3 to the form a + bi. We use the same procedure as before. Write e power ix as Cos x + i Sin x, which gives 5 Cos pi/3 + (5 Sin pi/3)i. Then we simplify, using Cos pi/3 = 1/2 and Sin pi/3 = root(3)/2. The final answer is 5/2 + (5 root(3)/2)i.

Next we learn how to go the other way, from the form a + bi to polar form. Convert root(2) + root(2)i to polar form. The first step is to compute r. r is the magnitude: the square root of a squared + b squared. In this case, it is square_root(2 + 2) = 2. The next step is to compute the angle x. We know Cos x = a/r = root(2)/2 = 1/root(2). We got this from the formula for Cos x that was presented in the previous chapter. So x is pi/4. The polar form is r e power ix = 2 e power i pi/4.

Let's work through one more problem. Convert root(3) + i to polar form. First we calculate r: r = square_root(3 + 1) = 2. Next we find the angle x: Cos x = a/r = root(3)/2, so x is pi/6. So root(3) + i = 2 e power i pi/6.
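The four conversions above can be reproduced with Python's cmath module, which provides rect (polar form to a + bi) and polar (a + bi to polar form). This is a minimal sketch; the variable names are our own and the results are floating-point approximations.

```python
import cmath
import math

# Polar form to a + bi: r e^(ix) = r Cos x + (r Sin x) i
z1 = cmath.rect(10, math.pi / 4)   # 10 e power i pi/4
z2 = cmath.rect(5, math.pi / 3)    # 5 e power i pi/3

# a + bi to polar form: cmath.polar returns the pair (r, x), x in radians
r3, x3 = cmath.polar(complex(math.sqrt(2), math.sqrt(2)))  # root(2) + root(2)i
r4, x4 = cmath.polar(complex(math.sqrt(3), 1))             # root(3) + i

print(z1, z2)
print(r3, x3, math.pi / 4)
print(r4, x4, math.pi / 6)
```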

Fractional Powers

In this chapter we will learn how to compute powers of complex numbers. Computing any positive integer power of a complex number is easy. We can just multiply it out. The cube of a + bi is (a + bi)(a + bi)(a + bi). This expression can be expanded using algebra.

For other powers, especially fractional powers, it is easier if you first convert the complex number to its polar form. It is easy to find any power of a complex number when it is in the form r e power ix: raising it to a power p gives (r power p) e power i p x. Let's work through some examples to see how powerful this method is. Find the cube roots of 1. 1 can be written in the form r e power ix, where r is 1 and x is 0. Raising e power i0 to the power of 1/3 gives e power i0, which is 1. As we expected, 1 is a cube root of 1.

Let's find the cube root of 1 again, but we will proceed a little differently this time. e power i0 is Cos 0 + i Sin 0. Observe that Cos 0 is the same as Cos 2 pi, and Sin 0 is the same as Sin 2 pi. This is from trigonometry. So e power i0 is the same as e power i 2 pi.

Let's find the cube root of 1 again, but this time we will use the result e power i0 = e power i 2 pi. So 1 power 1/3 = e power i 2 pi/3 = Cos 2 pi/3 + i Sin 2 pi/3 = -1/2 + (root(3)/2)i. That is strange! We have obtained a completely different value for the cube root of 1.

Let's take this idea further. 1 = e power i0 = e power i 2 pi = e power i 4 pi. So a cube root of 1 is e power i 4 pi/3 = Cos 4 pi/3 + i Sin 4 pi/3 = -1/2 - (root(3)/2)i. Stranger still! This is different from the earlier two values we obtained for the cube root of 1. Is this really a cube root of 1? We will answer this question in the next chapter.

Complex Cube Roots of 1

Let's verify that this complex number cubed is really 1. We expand (-1/2 - (root(3)/2)i) cubed through algebra. Squaring first gives 1/4 + (root(3)/2)i + (3/4) i squared = -1/2 + (root(3)/2)i. Multiplying once more by -1/2 - (root(3)/2)i gives 1/4 + 3/4 = 1. This number actually is a cube root of 1. If you think about it some more, this is not counterintuitive. A linear equation has one solution. A quadratic equation has two solutions. An equation with cubes will have 3 solutions. The cube roots of 1 are the solutions of the equation x^3 = 1. This ought to have 3 solutions. 1 is one solution.

The two complex numbers -1/2 + (root(3)/2)i and -1/2 - (root(3)/2)i are the other two solutions of the equation x^3 = 1. These are commonly called the complex cube roots of unity. The key takeaway from this chapter is that to compute fractional powers, first express the complex number in polar form. Not just one polar form. Add multiples of 2 pi to the angle to get the other polar forms. Then compute the fractional power of each.

Square Root of i

We can use the polar form to find roots of any complex number. Let's find the square root of i. That's right. We want to find the square root of the fundamental imaginary quantity i.

The key is to observe that i = i Sin pi/2 = Cos pi/2 + i Sin pi/2. This is because Cos pi/2 is 0 and Sin pi/2 is 1. From Euler's formula, this is e power i pi/2. This is the polar form of i.

Now that we have the polar form of i, finding the square root is simple. SquareRoot(i) = e power i pi/2 raised to the power of 1/2 = e power i pi/4 = Cos pi/4 + i Sin pi/4 = 1/root(2) + i/root(2). This is the square root of i. As an exercise, evaluate the square of this complex number through algebraic expansion. You will find the answer is i.
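The procedure from the last few chapters (write the number in polar form, add multiples of 2 pi to the angle, then take the fractional power) can be sketched as one function. The name nth_roots is our own, and the results are floating-point approximations. Applied to i, the same procedure also produces a second square root, the negative of the one found above.

```python
import cmath
import math

def nth_roots(z, n):
    """Return the n complex n-th roots of a nonzero complex number z."""
    r, x = cmath.polar(z)  # z = r e^(ix)
    roots = []
    for k in range(n):
        # Adding 2*pi*k to the angle leaves z unchanged but gives a new root.
        angle = (x + 2 * math.pi * k) / n
        roots.append(cmath.rect(r ** (1 / n), angle))
    return roots

cube_roots_of_1 = nth_roots(1, 3)
square_roots_of_i = nth_roots(1j, 2)

for w in cube_roots_of_1:
    print(w, w ** 3)   # each cube is 1, up to rounding
for w in square_roots_of_i:
    print(w, w ** 2)   # each square is i, up to rounding
```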

2D Coordinates

We can represent complex numbers on a 2 dimensional coordinate space. A complex number in the form a + bi can be represented as the point (a,b). That is, a is the x-coordinate, and b is the y-coordinate on a standard X and Y plot. In other words, the real part of the complex number is the x-coordinate. The coefficient of the imaginary part is the y-coordinate.

Here is a representation of two complex numbers: 1 + i, which is the point (1,1), and 2 - 0.5i, which is the point (2, -0.5). As an exercise, try plotting a few complex numbers on the XY plane on your own.

Interestingly, the polar form of a complex number can also be represented on the XY coordinate plane. Let's write the polar form as r e power i theta. We use theta for the angle to avoid confusion with the X axis of the coordinate plane.

Consider the line joining the point (a,b) to the origin.

The length of this line is sqroot(a squared + b squared). This is the same as the value of r in the polar form.

Consider the angle between the line and the X-axis.

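The angle is cosine inverse of a/r when the point lies on or above the X-axis. For a point below the X-axis, the angle is the negative of that value. This is the angle Theta.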

So the final representation of a complex number is shown here. For the polar form, the length of this line is the value of r, and the angle theta is the angle between this line and the X-axis. The x-coordinate of the point is a, the real part of the complex number. The y-coordinate of the point is b, the coefficient of the imaginary part of the complex number.

From trigonometry, we see that a = r Cos Theta and b = r Sin Theta.

This picture can help you convert between the r e power i theta polar form and the a + bi form of a complex number.
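The picture translates directly into two small conversion functions. This is a sketch with names of our own (to_polar and to_point). It uses math.atan2 rather than cosine inverse to find the angle, a substitution made here because atan2 looks at the signs of both a and b and so also handles points below the X-axis.

```python
import math

def to_polar(a, b):
    """Return (r, theta) for the point (a, b), with theta in radians."""
    r = math.hypot(a, b)      # length of the line from the origin to (a, b)
    theta = math.atan2(b, a)  # angle between that line and the X-axis
    return (r, theta)

def to_point(r, theta):
    """Return (a, b) using a = r Cos theta and b = r Sin theta."""
    return (r * math.cos(theta), r * math.sin(theta))

r, theta = to_polar(1, 1)              # the complex number 1 + i
a, b = to_point(r, theta)              # back to (1, 1), up to rounding
r_low, theta_low = to_polar(2, -0.5)   # 2 - 0.5i, a point below the X-axis

print(r, theta)
print(a, b)
print(r_low, theta_low)  # the angle is negative for this point
```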

Introduction to Matrices

In quantum physics, the state of a system is specified by a collection of values. When the state of the system changes, this collection of numbers also changes. So when we analyze quantum systems, we need mathematical tools to help us work with collections of numbers. Fortunately, such tools already exist, and you have probably learned how to use them in high school. The toolset we use to work with collections of numbers is linear algebra, also called matrix algebra. A matrix is a table of numbers. The numbers are enclosed with square brackets. These are matrices.

Matrices can have real numbers, or complex numbers. Or both. In this matrix, two of the elements are real: the two along the diagonal, highlighted in yellow. The other two elements, highlighted in blue, are complex numbers.

We can define operations like addition, subtraction, and multiplication on matrices. These operations, together forming an algebra for matrices, are called linear algebra.

Matrix Dimensions

We defined a matrix as a table of numbers. A table has a shape. And dimensions. The shape or dimension of a matrix is the number of rows by columns.

When we write the dimension, we write it like this: rows x columns. But this is not the multiplication operator. This is to be read as rows by columns. The sequence is important. In the dimension, it is rows first, followed by columns. I suggest you memorize this expression: rows by columns. In many operations on matrices, not just in dimensions, it will be rows first followed by columns. It is important you remember this. Rows first. Followed by columns.

This matrix has 2 rows, and 1 column.

The dimension is 2 by 1

The dimension is always rows first, followed by columns.

In this matrix, there are 3 rows, and 3 columns. The dimension is 3 by 3

This is important, so I am repeating it. You must never mix up the rows and columns. It is always rows first, followed by columns.

Try to keep this expression highlighted in yellow in your mind: ROWS BY COLUMNS

This matrix has 1 row, and 3 columns. The dimension is 1 by 3

This matrix has 3 rows, and 2 columns. So it is a 3 by 2 matrix.

This has 2 rows, and 3 columns. We call it a 2 by 3 matrix.

Finally, this matrix has 4 rows and 4 columns. It is a 4 by 4 matrix.
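The rows by columns convention carries over naturally to code. The sketch below assumes a matrix is stored as a Python list of rows, which is one common convention and not something the book specifies. The function name dimension and the sample entries are our own, since the matrices in the illustrations are not reproduced in the text.

```python
def dimension(matrix):
    """Return (rows, columns) for a matrix stored as a list of rows."""
    rows = len(matrix)
    columns = len(matrix[0]) if rows else 0
    return (rows, columns)

column_vector = [[1], [2]]            # 2 rows, 1 column
row_vector = [[1, 2, 3]]              # 1 row, 3 columns
tall = [[1, 2], [3, 4], [5, 6]]       # 3 rows, 2 columns

print(dimension(column_vector))  # (2, 1)
print(dimension(row_vector))     # (1, 3)
print(dimension(tall))           # (3, 2)
```

Rows come first in the returned pair, followed by columns, matching the order used throughout this chapter.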

Matrix Addition

In this lesson, we will learn about matrix addition. Matrices can be added if they have the same dimension.

These two matrices are both of dimension 2 by 2, and they can be added.

But these two matrices cannot be added. The first matrix is 2 by 1, and the second is 1 by 2. Since the dimension is different, they cannot be added.

Matrices are added by adding the corresponding elements. The yellow highlight shows the elements that are being added. The blue highlight is for the result. Follow along with the sequence of illustrations.

1+5=6

2+6=8

3 + 7 = 10

4 + 8 = 12

Here is an example. As an exercise, try the addition yourself. For instance, 10 + (-3) = 7 and 11 + 2 = 13.

This example shows you how to add matrices with complex numbers. Verify the calculations on your own.

Another example with complex numbers. Verify the calculations on your own.
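Element by element addition, together with the same-dimension requirement, can be sketched as follows. The function names are our own, matrices are stored as lists of rows, and the layout of the first example (rows 1, 2 and 3, 4 added to rows 5, 6 and 7, 8) is assumed from the order of the sums shown above.

```python
def dimension(matrix):
    """Return (rows, columns) for a matrix stored as a list of rows."""
    return (len(matrix), len(matrix[0]))

def add_matrices(first, second):
    """Add two matrices of the same dimension, element by element."""
    if dimension(first) != dimension(second):
        raise ValueError("matrices must have the same dimension")
    return [[x + y for x, y in zip(row_1, row_2)]
            for row_1, row_2 in zip(first, second)]

total = add_matrices([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(total)  # [[6, 8], [10, 12]]

# The entries may be complex numbers as well (these values are made up).
complex_total = add_matrices([[1 + 2j, 3]], [[4 - 1j, 1j]])
print(complex_total)  # [[(5+1j), (3+1j)]]
```

Trying to add a 2 by 1 matrix to a 1 by 2 matrix with this function raises a ValueError, mirroring the rule that such matrices cannot be added.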

Matrix Subtraction

Matrix subtraction is similar to matrix addition. The matrix dimensions have to be the same, otherwise subtraction will not be possible. Subtraction is performed for each element. These two matrices shown here are both of the same dimension, 2 by 2, so subtraction is possible. Follow along with the sequence of illustrations. The elements to be subtracted are highlighted in yellow. The result is highlighted in blue.

6-4=2

7-3=4

8-2=6

9-1=8

Subtraction can be performed with complex numbers as well. Verify this calculation.

Here is another example. Verify that it is correct.

Here is another example. Verify that it is correct.

Scalar Multiplication

It is possible to multiply matrices with scalars. Scalars are plain numbers. Scalars may be real numbers or complex numbers. Multiplying a matrix by a scalar is the same as multiplying each element of the matrix by that scalar. Consider this multiplication. We are multiplying a 2 by 2 matrix by a scalar number, 2. The result is another 2 by 2 matrix, where each element has been multiplied by 2.

Here is another example. Each element in the matrix is multiplied by -4 to get the result. Verify the computation on your own.

The scalar can be a complex number as shown here.

Here is another example of multiplication, where both the scalar and the matrix are complex.
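Scalar multiplication is a one-line operation in code. In this sketch the function name scalar_multiply and all matrix entries are our own illustrative choices, because the matrices from the illustrations are not reproduced in the text.

```python
def scalar_multiply(scalar, matrix):
    """Multiply every element of the matrix by the scalar."""
    return [[scalar * element for element in row] for row in matrix]

doubled = scalar_multiply(2, [[1, 2], [3, 4]])
negated = scalar_multiply(-4, [[1, -2], [0, 3]])

# Both the scalar and the matrix may be complex.
times_i = scalar_multiply(1j, [[1, 1j], [2 - 1j, 0]])

print(doubled)  # [[2, 4], [6, 8]]
print(negated)  # [[-4, 8], [0, -12]]
print(times_i)
```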

Matrix Multiplication

We can also multiply one matrix with another. Matrix multiplication is possibly the most complex operation we will discuss in this book. Matrix multiplication is absolutely essential for quantum computing. Every transformation of physical state in quantum computing is modeled by matrix multiplication. We will begin with simple examples of matrix multiplication and proceed step by step. When we multiply two matrices, we multiply an entire row of the first matrix, with an entire column of the second matrix. Observe the "rows first, followed by columns" rule being applied here. We multiply a ROW from the first matrix, with a COLUMN from the second matrix. ROWS first. Then COLUMNS. Let's multiply these two matrices: the first has rows (1, 2) and (3, 4), and the second has rows (5, 6) and (7, 8).

We take the first row of the first matrix,

and multiply it with the first column of the second matrix. The row and column are highlighted in yellow and orange respectively. This is important, so I will repeat it. We will always be taking entire ROWS from the first matrix, and entire COLUMNS from the second matrix. This row highlighted here with yellow in the first matrix has two elements, 1 and 2. The column highlighted here with orange in the second matrix also has two elements, 5 and 7. We are going to multiply this row and this column.

We start by multiplying the first element in the row with the first element in the column. That is 1 multiplied by 5. Then we move on to the next element. The second element of the row and the second element of the column. Multiply 2 and 7. We add up the two products to get 19.

This is the element in the first row first column of the result.

Next we multiply the first row of the first matrix

with the second column of the second matrix. Observe that we always select a row from the first matrix and a column from the second matrix. Now let's multiply the highlighted row with the highlighted column. We multiply the first element of the row with the first element of the column. 1 x 6. Then the second element of the row with the second element of the column. 2 x 8.

We add the products to get 22. This is the result element in the first row, second column.

We have completed one pass over all the columns in the second matrix.

So we move to the second row of the first matrix. We multiply the second row of the first matrix with the first column of the second matrix, as highlighted here. That is, 3 x 5 + 4 x 7 = 43. This is the second row, first column of the result.

Next we multiply the second row of the first matrix with the second column of the second matrix, highlighted here. 3 x 6 + 4 x 8 = 50. This is the element in the second row, second column of the result.

I will recap the procedure we followed in computing the matrix product. Multiply the first row of the first matrix with the 1st, 2nd, ... columns of the second matrix to get the 1st, 2nd, ... elements of the first row of the result. Multiply the second row of the first matrix with the 1st, 2nd, ... columns of the second matrix to get the 1st, 2nd, ... elements of the second row of the result. And if there are more rows in the first matrix, we continue until we process all of them. This procedure is useful when you want to compute all the elements of the result in one sweep. But sometimes, what you want is some specific element of the result matrix.

There is an alternative procedure you can use to compute specific elements of the result matrix.

Suppose you want to compute the element highlighted in blue. You see a blank there. We are yet to compute it. This is in the second row of the result. The second row of the result is obtained from the second row of the first matrix.

Let's highlight the row in yellow. The blue highlight is in the second column of the result. The second column of the result is obtained from the second column of the second matrix.

Let's highlight that in orange. We multiply the yellow row with the orange column. That is, 3x6 + 4x8 = 50

Let's do it again.

The result element we want is highlighted in blue. This is in the first row of the result. The first row of the result is obtained from the first row of the first matrix.

Let's highlight the row in yellow. The result we want is in the second column of the result. The second column of the result is obtained from the second column of the second matrix.

Let's highlight it in orange. Remember, we always take rows from the first matrix, and columns from the second matrix. We multiply the yellow row and the orange column. 1x6 + 2x8 = 22

The general rule for multiplication is: The result element at ith row, jth column = ith row of first matrix multiplied with jth column of second matrix. Study this rule for a few minutes before proceeding.

Let's use the rule to compute the element highlighted in blue. We need to specify it as ith row, and jth column. Here i = 2 and j = 1. Let's highlight the ith row in yellow.

Remember the row is always from the first matrix.

Let's highlight the jth column in orange. The column is always from the second matrix. Multiplying, 3x5 + 4x7 = 43

As an exercise, use the rule and verify each element in the final result matrix.
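The general rule, and the full sweep built from it, can be sketched in a few lines. The function names element and multiply_matrices are our own, matrices are stored as lists of rows, and i and j are counted from 1 to match the text.

```python
def element(first, second, i, j):
    """Result element at row i, column j (both counted from 1)."""
    row = first[i - 1]                      # ith row of the first matrix
    column = [r[j - 1] for r in second]     # jth column of the second matrix
    # Multiply matching elements and add up the products.
    return sum(x * y for x, y in zip(row, column))

def multiply_matrices(first, second):
    """Compute every element of the product, one row of the result at a time."""
    # Each row of the first matrix must be as long as each column of the second.
    if len(first[0]) != len(second):
        raise ValueError("row length of first must equal column length of second")
    rows, columns = len(first), len(second[0])
    return [[element(first, second, i, j) for j in range(1, columns + 1)]
            for i in range(1, rows + 1)]

first = [[1, 2], [3, 4]]
second = [[5, 6], [7, 8]]

print(element(first, second, 2, 1))      # 43
print(multiply_matrices(first, second))  # [[19, 22], [43, 50]]
```

Calling element directly corresponds to the alternative procedure for a single result element, while multiply_matrices corresponds to the full sweep over all rows and columns.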

Worked Examples 1

Let's work through an example with complex numbers.

We want to compute the element highlighted in blue.

This is the product of the yellow row and the orange column. That is 9 + i.

Let's compute the element highlighted in blue.

That is the product of the yellow row and the orange column. Which works out to 16 + 2i.

Let's compute the element highlighted in blue.

That is the product of the yellow row and the orange column. Which works out to 14.

Let's compute the element highlighted in blue.

That is the product of the yellow row and the orange column. Which works out to 10.

Worked Examples 2

This chapter is a set of exercises. Work out the multiplication on your own and verify the result.

Next exercise.

Final exercise.

3x3 Example

You can multiply larger matrices. Follow along with this sequence of illustrations to learn how you can multiply a 3 by 3 matrix with another 3 by 3 matrix.

As an exercise, compute each element again on your own.

Worked Examples 1

Work through this exercise on your own. Verify your computation with the answer shown here.

Here is another exercise for you to work through.

Worked Examples 2

So far we have multiplied square matrices. That is, matrices where the number of rows = number of columns. We can also multiply matrices that are not square, as shown here. The first matrix is 3 by 2, which is not square. Follow along with the sequence of illustrations to understand how the multiplication is computed.

When is Multiplication Possible?

Let's revisit the matrix multiplication we performed in the previous lesson. We multiplied the rows of the first matrix with the columns of the second matrix.

To be precise, we multiplied each element of a row from the first matrix with the corresponding element of a column from the second matrix, and added the products. For this to be possible, the number of elements in a row of the first matrix must equal the number of elements in a column of the second matrix.

The number of elements in a row of the first matrix is exactly the same as the number of columns in the first matrix. The number of elements in a column of the second matrix is the number of rows in the second matrix.

So the rule for a multiplication to be possible is that: Number of columns of the first matrix should be equal to the number of rows in the second matrix.

We can represent this rule visually like this: Write the dimension of the first matrix.

Here it is highlighted in yellow. Next to it, write the dimension of the second matrix.

Here it is highlighted in orange. For multiplication to be possible, the two adjacent numbers,

here enclosed in a red box, should be equal. This is an important rule. Make sure you understand it before you continue.

I will repeat the rule. When the dimension of the first matrix is written, followed by the dimension of the second matrix,

the two middle numbers must be equal for multiplication to be possible. That is, the numbers enclosed in the red box must be equal.

Worked Examples

Let's determine if these two matrices can be multiplied. Observe that the matrices are the same as in the previous chapter, but the first and second matrices have been swapped. Let's apply the rule. The dimension of the first matrix is 2 by 2. The dimension of the second matrix is 3 by 2.

The numbers in the red box must be equal for multiplication to be possible. But they are not the same. So this multiplication is not possible.

There is another way to visualize this. To multiply, we would take a row from the first matrix,

highlighted in yellow. And a column from the second matrix,

highlighted here in orange. The row and the column will need to be multiplied. But you can see that the number of elements in the yellow row is not the same as the number of elements in the orange column. The row cannot be multiplied with the column. And so, the entire matrix multiplication is not possible.

Not Commutative

In the previous chapters we saw that it was possible to multiply matrices that were not square. We were able to multiply a 3 by 2 matrix and a 2 by 2 matrix. But not all matrix multiplications are possible. When we reversed the two matrices to be multiplied, that is, when we tried to multiply the 2 by 2 matrix with the 3 by 2 matrix, we found that it was not possible. In general we can say that given two matrices A and B, if AB is possible, it does not imply that BA is possible. This can be a strange idea for beginners. We have an algebra where sometimes the product AB can be computed, but the product BA might not exist. The key takeaway here is that the sequence in which we multiply matrices is important.

In general, for two matrices A and B, AB is not the same as BA. Sometimes AB = BA, but not always. Sometimes it might be possible to compute AB though it isn't possible to compute BA. And sometimes, though both AB and BA can be computed, they are not equal.

Matrix multiplication is not commutative. In general, AB is not equal to BA.
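Here is a small Python sketch of a case where both AB and BA can be computed but are not equal. The matrices A and B are values I picked for illustration. They are not taken from the illustrations in this book.

```python
def matmul(first, second):
    """Row-times-column product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*second)]
            for row in first]

A = [[1, 2], [3, 4]]   # illustrative values (an assumption)
B = [[0, 1], [1, 0]]

AB = matmul(A, B)
BA = matmul(B, A)
print(AB)        # [[2, 1], [4, 3]]
print(BA)        # [[3, 4], [1, 2]]
print(AB == BA)  # False
```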

Associative and Distributive

Matrix multiplication is associative. That is, given three matrices A, B, and C, the product (AB)C is the same as A(BC). In other words, which multiplication you perform first doesn't matter, as long as the sequence A, then B, then C is preserved.

Matrix multiplication is distributive over addition. A(B+C) = AB + AC

The associative and distributive properties make it possible for us to perform algebraic manipulations on matrices. For example, (A+B)(C+D) can be expanded to AC + AD + BC + BD. But remember that matrix multiplication is not commutative. In general AB is not equal to BA.
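You can check both properties numerically. The sketch below uses three small integer matrices that I chose for illustration (they are assumptions, not examples from the book), and helper names `matmul` and `matadd` that are my own.

```python
def matmul(first, second):
    """Row-times-column product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*second)]
            for row in first]

def matadd(first, second):
    """Element-by-element sum of two matrices of the same dimensions."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(first, second)]

A = [[1, 2], [3, 4]]   # illustrative values (an assumption)
B = [[0, 1], [1, 0]]
C = [[2, 0], [1, 3]]

associative = matmul(matmul(A, B), C) == matmul(A, matmul(B, C))
distributive = matmul(A, matadd(B, C)) == matadd(matmul(A, B), matmul(A, C))
print(associative, distributive)  # True True
```

A check with one set of numbers is not a proof, but it is a quick way to convince yourself that the rules behave as stated.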

Dimension of Result

In previous chapters we discussed a rule to determine if matrix multiplication is possible or not. If we have two matrices of dimensions m by n and n by p, then they can be multiplied.

Visually, the numbers in the red box have to be the same. The interesting thing is that there is a rule for the dimension of the result as well. When we multiply an m by n matrix and an n by p matrix,

the result is always an m by p matrix. That is, the dimensions of the result are given by the numbers outside the red box. Let's work through some examples.

Suppose we multiply a 3 by 2 matrix and a 2 by 2 matrix. The multiplication is possible

because the numbers inside the red box are equal. The result of the multiplication is 3 by 2.

This is determined by the numbers outside the red box, here highlighted in blue.

If we multiply a 3 by 3 matrix and a 3 by 1 matrix,

then the multiplication is possible as we can see from the red box.

The result is 3 by 1. These are the numbers highlighted in blue, the numbers outside the red box.

How about multiplying a 2 by 2 matrix and a 3 by 2 matrix? That is not possible. The adjacent numbers are not equal.

How about a 3 by 1 matrix and a 3 by 3 matrix? This isn't possible either. The adjacent numbers are not equal.

How about a 3 by 1 matrix and a 1 by 3 matrix? This is possible.

The numbers in the red box are equal. The result is a 3 by 3 matrix.
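The red box rule and the blue number rule can be written as one small function. This is a sketch. The name `result_shape` is hypothetical, and a dimension is written as a pair (rows, columns).

```python
def result_shape(first_shape, second_shape):
    """Return the dimensions of the product, or None if it is not possible."""
    m, n = first_shape
    n2, p = second_shape
    if n != n2:       # the two middle numbers (the red box) must be equal
        return None
    return (m, p)     # the two outer numbers give the result's dimensions

# The five examples from this chapter.
print(result_shape((3, 2), (2, 2)))  # (3, 2)
print(result_shape((3, 3), (3, 1)))  # (3, 1)
print(result_shape((2, 2), (3, 2)))  # None
print(result_shape((3, 1), (3, 3)))  # None
print(result_shape((3, 1), (1, 3)))  # (3, 3)
```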

Odd Shaped Matrices

In quantum physics, it might be necessary to multiply some odd shaped matrices. Consider this multiplication shown here. The first matrix is 3 by 2. The second is 2 by 1. The multiplication is possible

because the numbers in the red box are equal.

The dimensions of the result are the numbers outside the red box, highlighted in blue. Now follow the sequence of illustrations to learn how the matrices are multiplied.

Perform this multiplication on your own and verify that you get the same answer.

Worked Examples 1

Here are some more odd shaped multiplications. Follow along with the sequence of illustrations.

Then verify that you can perform this multiplication on your own.

Another exercise. Follow along with the sequence of illustrations.

Then verify that you can perform this multiplication on your own.

Worked Examples 2

Consider this multiplication. Verify that you are able to compute this answer correctly. This kind of multiplication is important.

A matrix which is vertical like this is called a column matrix. It consists of a single column of numbers.

A matrix which is horizontal like this is called a row matrix. It consists of a single row of numbers.

When we multiply a column with a row, the multiplication is called an "outer product". To understand why,

let's write the column and row matrices over the result like this. These numbers shown in green are the column and row matrices.

The result is a table of all possible products of the elements in the column with elements in the row. Watch the sequence of illustrations to learn why we call it an outer product.
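The "table of all possible products" idea translates directly into code. In this sketch the function name `outer` and the numbers are my own illustrative choices, not the values in the illustrations.

```python
def outer(column, row):
    """Outer product: each element of the column times each element of the row."""
    return [[c * r for r in row] for c in column]

column = [1, 2, 3]   # a 3 by 1 column matrix (illustrative values)
row = [4, 5]         # a 1 by 2 row matrix (illustrative values)
table = outer(column, row)
print(table)  # [[4, 5], [8, 10], [12, 15]]
```

The result has one row for each element of the column and one column for each element of the row, which agrees with the rule for the dimension of the result: 3 by 1 times 1 by 2 gives 3 by 2.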

Worked Examples 3

Here is another exercise for you to work through. Verify that you get this answer.

Inner Product

Recall from an earlier chapter, when we multiplied a column matrix and a row matrix, we called it an outer product. Instead, if we reverse the sequence, and multiply a row matrix with a column matrix, we call it an inner product. Consider this multiplication shown here.

We multiply this row matrix, highlighted in yellow,

with this column matrix, highlighted in orange. The result is a 1 by 1 matrix, shown here,

highlighted in blue. I will repeat, a row matrix multiplied by a column matrix with the same number of elements always results in a 1 by 1 matrix. This is called the inner product. Contrast this with the outer product we studied earlier.

Here is another inner product with complex numbers. Work through the multiplication on your own.

In the general case, when we have a row matrix a b c, and a column matrix d e f, the result is a 1 by 1 matrix. The result is a matrix with just one element,

ad + be + cf
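The general case ad + be + cf is a sum of element-by-element products. Here is a minimal Python sketch of it. The function name `inner` and the sample numbers are assumptions made for illustration. Python writes the imaginary unit as `j` instead of i.

```python
def inner(row, column):
    """Inner product of a row matrix and a column matrix of equal length."""
    if len(row) != len(column):
        raise ValueError("row and column must have the same number of elements")
    return sum(r * c for r, c in zip(row, column))

print(inner([1, 2, 3], [4, 5, 6]))     # 1x4 + 2x5 + 3x6 = 32
print(inner([1j, 2], [1j, 1 + 1j]))    # i x i + 2(1 + i) = (1+2j)
```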

Worked Examples

Another common kind of matrix multiplication you will see in quantum computing is to multiply a square matrix with a column matrix. Work through this exercise on your own and verify that you get the correct answer.

Here is another exercise. Work through this on your own and verify your answer.

Identity Matrix

When we talk of multiplication of real numbers, the number 1 is special. It is the identity. Any number multiplied by 1 is the number itself. For example, 5 times 1 is 5, 11 times 1 is 11, and so on. There is a special matrix that plays a similar role in matrix multiplication. It is called the identity matrix and is usually named I. The capital letter I. If A is a matrix, then AI = A and IA = A. I is the matrix equivalent of the scalar number 1.

For a 2 by 2 matrix, I is shown here. Verify these multiplications on your own. Observe that when the matrix I is multiplied with another matrix, the result is that other matrix.

Multiplication by the identity matrix is not always possible.

This product exists. (highlighted in yellow)

But this product is not possible. (highlighted in orange)

A 3 by 3 identity matrix looks like this.

The general rule for identity matrices is:

The diagonal elements, from top-left to bottom-right are 1. Other elements are 0.

This is a 2 by 2 identity matrix.

This is a 3 by 3 identity matrix.

This is a 4 by 4 identity matrix.
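The general rule (1 on the diagonal, 0 elsewhere) can be sketched as a short Python function. The names `identity` and `matmul`, and the matrix A, are my own illustrative choices.

```python
def identity(n):
    """n by n matrix with 1 on the diagonal and 0 everywhere else."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(first, second):
    """Row-times-column product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*second)]
            for row in first]

A = [[1, 2], [3, 4]]   # illustrative values (an assumption)
I2 = identity(2)
print(identity(3))     # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(matmul(A, I2) == A, matmul(I2, A) == A)  # True True
```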

Matrix Inverse

In real number multiplication, there is the concept of an inverse.

The inverse of a number like 6 is another number that when multiplied by 6 will result in 1. For real numbers, we know that the inverse of 6 is 1/6.

Similarly, the inverse of 11 is 1/11,

the inverse of 3 is 1/3,

and so on. There is also a notation in real number arithmetic where we use negative powers. 3 power -1 is 1/3. 6 power -1 is 1/6. 11 power -1 is 1/11. Any number x raised to the power of -1 is 1/x. But we have just seen that 6 power -1 = 1/6 is the multiplicative inverse of 6. Similarly, 3 power -1 is the inverse of 3, 11 power -1 is the inverse of 11, and so on. Because of this, the notation, raising to the power of -1, means finding the inverse.

In matrix multiplication also we have the concept of inverse. The notation is the superscript -1. Observe that this is only notation. We are not actually raising the matrix to any power. Given a matrix A, we define an inverse of A, denoted by "A superscript -1", as shown here.

A multiplied by its inverse is I, the identity matrix.

A inverse multiplied by A is also I, the identity matrix.

Some examples will make this concept clear.

If A is this, highlighted in yellow,

then A inverse is this, highlighted in orange.

Verify this for yourself. Multiplying the two matrices results in the identity matrix.

Here is another example. Verify that multiplying the matrix with its inverse results in the identity matrix.
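This chapter does not describe how to compute an inverse. For readers who want to experiment, the sketch below uses the standard closed-form formula for a 2 by 2 matrix, which is my addition and not a method from this book. The matrix A is also an illustrative assumption. Fractions are used so that the arithmetic is exact.

```python
from fractions import Fraction

def matmul(first, second):
    """Row-times-column product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*second)]
            for row in first]

def inverse_2x2(m):
    """Inverse of [[a, b], [c, d]] using the standard 2 by 2 formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("this matrix has no inverse")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

A = [[4, 7], [2, 6]]            # illustrative values (an assumption)
A_inv = inverse_2x2(A)
I = [[1, 0], [0, 1]]
print(matmul(A, A_inv) == I)    # True
print(matmul(A_inv, A) == I)    # True
```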

Transpose

Another transformation of matrices you need to know about is the transpose. The transpose operation is represented by a superscript of T. To find the transpose of a matrix, you interchange the rows and columns. The meaning of interchanging rows and columns can be seen in this sequence of illustrations.

The first row becomes the first column. Yellow highlight.

The second row becomes the second column. Orange highlight.

Another way to visualize the transpose operation is as a mirroring about the diagonal. As seen here, if you mirror the matrix about the dashed red line, you get its transpose.

Another example of transpose. Verify for yourself that mirroring around the red line is the same as swapping the rows and columns.

Worked Examples

The transpose operation is very common in quantum physics. So here are some more examples to help you become familiar with the concept. Work this out on your own. It will take you less than a minute.

Another exercise for you to work through. Work through this using both methods. By mirroring, and by interchanging rows and columns.

Another exercise for you to work through. Work through this using both methods. By mirroring, and by interchanging rows and columns.

Transpose of Product

Suppose you have two matrices A and B. The transpose of their product, that is AB transpose, is equal to B transpose times A transpose. Observe how the A followed by B on the left hand side has become B followed by A on the right hand side.

Let's verify this.

Suppose A and B are matrices as shown here.

We calculate AB

and then we calculate transpose of AB.

This is the left hand side of the equation.

Let's work on the right hand side.

B transpose is this.

And A transpose is this.

B transpose times A transpose is this.

As we can see, AB transpose = B transpose times A transpose.
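The same verification can be repeated in Python. The matrices A and B below are illustrative values of my own, and the helper names `transpose` and `matmul` are hypothetical.

```python
def matmul(first, second):
    """Row-times-column product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*second)]
            for row in first]

def transpose(m):
    """Interchange rows and columns: each column becomes a row."""
    return [list(col) for col in zip(*m)]

A = [[1, 2], [3, 4]]   # illustrative values (an assumption)
B = [[5, 6], [7, 8]]

left = transpose(matmul(A, B))               # transpose of AB
right = matmul(transpose(B), transpose(A))   # B transpose times A transpose
print(left)            # [[19, 43], [22, 50]]
print(left == right)   # True
```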

Complex Conjugate of Matrices

You might recall, when studying complex numbers, we learned about an operation called complex conjugation. The notation was a superscript of *. The complex conjugate of a+bi is a-bi. We simply reverse the sign of the imaginary part. Complex conjugation can be performed on matrices. To find the complex conjugate of a matrix, we find the complex conjugate of each element in the matrix independently.

Consider this example.

Observe how the sign of imaginary part of each complex number in the matrix has changed.

Real numbers, like this 2, are not affected by complex conjugation.

Another example.

Adjoint

We can combine the complex conjugation and transpose operations. Transpose of the complex conjugate is called adjoint.

The notation is a superscript dagger †

A adjoint

= transpose of complex conjugate of A

= complex conjugate of transpose of A

The adjoint is an important operation in quantum physics. Let's look at some examples to become familiar with it.

Consider this example. Work through it on your own.

Another example. Work through it on your own.
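Python has complex numbers built in, so the adjoint is easy to sketch. The function name `adjoint` and the matrix A are illustrative assumptions. Python writes the imaginary unit as `j`.

```python
def adjoint(m):
    """Transpose of the complex conjugate: conjugate each element, swap rows and columns."""
    return [[value.conjugate() for value in col] for col in zip(*m)]

A = [[1 + 2j, 3], [4j, 5 - 1j]]   # illustrative values (an assumption)
A_dagger = adjoint(A)
print(A_dagger == [[1 - 2j, -4j], [3, 5 + 1j]])   # True
print(adjoint(A_dagger) == A)                     # True
```

Taking the adjoint twice returns the original matrix, which is a useful check on your own hand calculations.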

Unitary

In quantum physics, state transformations are modeled through matrix multiplications. Quantum state transformations can be of two types. Reversible and irreversible. Reversible quantum transformations are modeled through a special kind of matrix called a unitary matrix.

A matrix is unitary if its adjoint is its inverse. That is, A adjoint = A inverse. To repeat, a matrix A is unitary if A adjoint equals A inverse.

Another way to visualize a unitary matrix is that A x A inverse = A x A adjoint = the identity matrix I.

Here is an example of a unitary matrix. When we multiply the matrix by its adjoint, we get the identity matrix. This proves that the adjoint is the inverse.

Here is another example. When we multiply the matrix by its adjoint, we get the identity matrix.
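The test "matrix times its adjoint equals the identity" can be sketched in Python. The three matrices below are examples I chose for illustration, not the ones in the illustrations, and the function names are hypothetical. The sketch assumes a square matrix and uses a small tolerance because square roots are not exact in floating point.

```python
import math

def matmul(first, second):
    """Row-times-column product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*second)]
            for row in first]

def adjoint(m):
    """Conjugate each element and swap rows and columns."""
    return [[value.conjugate() for value in col] for col in zip(*m)]

def is_unitary(m, tol=1e-9):
    """True if m times its adjoint is the identity (m assumed square)."""
    product = matmul(m, adjoint(m))
    n = len(m)
    return all(abs(product[i][j] - (1 if i == j else 0)) < tol
               for i in range(n) for j in range(n))

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]            # illustrative example (an assumption)
S = [[1, 0], [0, 1j]]            # illustrative example (an assumption)
not_unitary = [[1, 1], [0, 1]]
print(is_unitary(H), is_unitary(S), is_unitary(not_unitary))  # True True False
```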

Hermitian

In the previous chapter we learned about unitary matrices for reversible quantum state transformations. In this lesson we will learn about another kind of matrix, which in this book's simplified picture is associated with irreversible quantum operations such as measurement. These matrices have the property:

A adjoint = A That is, the matrix is its own adjoint. Such matrices are called Hermitian matrices.

Here is an example of a Hermitian. When we compute its adjoint, we find that it is equal to the matrix itself.

Here is another example. As an exercise, compute the adjoint of this matrix. You will find that it is equal to the matrix itself.

Hermitian and Unitary

A matrix can be both Unitary and Hermitian. That means that a matrix A = A inverse = A adjoint

As an exercise, verify that this matrix is both Hermitian and Unitary. That is, you must show that the inverse and the adjoint are the same as the matrix itself. If you show that A multiplied by A is the Identity matrix, then you have proved that A = A inverse. Next, if you show that A adjoint is A itself, then it proves that A = A inverse = A adjoint.

Here is another matrix that is both Unitary and Hermitian. To recap, you must show that the inverse and the adjoint are the same as the matrix itself. If you show that A multiplied by A is the Identity matrix, then you have proved that A = A inverse. Next, if you show that A adjoint is A itself, then it proves that A = A inverse = A adjoint.

Here is another exercise to work through.
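The two checks described in this chapter (A multiplied by A is the identity, and A adjoint is A) can be sketched in Python. The matrix Y below is a well-known example that I picked for illustration. It is an assumption and may not be one of the matrices in the exercises above.

```python
def matmul(first, second):
    """Row-times-column product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*second)]
            for row in first]

def adjoint(m):
    """Conjugate each element and swap rows and columns."""
    return [[value.conjugate() for value in col] for col in zip(*m)]

Y = [[0, -1j], [1j, 0]]        # illustrative matrix (an assumption)
identity = [[1, 0], [0, 1]]

is_hermitian = adjoint(Y) == Y                  # A adjoint = A
is_own_inverse = matmul(Y, Y) == identity       # A x A = I, so A = A inverse
print(is_hermitian, is_own_inverse)  # True True
```

Since both checks pass, A = A inverse = A adjoint for this matrix, so it is both Hermitian and Unitary.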

Why Hermitian or Unitary ?

I want to give you some context about why Unitary and Hermitian matrices are important. A quantum computation usually has this structure.

The first step is to initialize quantum bits. Highlighted in yellow. This is an irreversible operation and is modeled by Hermitian matrices.

The main computation is a sequence of reversible operations. Highlighted in orange. These are modeled by Unitary matrices. The final step is to measure the quantum bits

that contain the results of computation. Highlighted in blue. This is an irreversible operation and it is modeled by Hermitian matrices.

Vectors and Transformations

In quantum physics you will frequently see column matrices. That is, matrices that consist of a column of numbers. These have multiple rows, but only one column. Like these examples shown here. Column matrices are also called vectors.

A vector in matrix algebra, and when we analyze quantum computing problems, is simply a column matrix. I will repeat that. For our purposes, a vector is not the physics concept you have learned in high school of a magnitude with direction. A vector is merely a column matrix.

These are vectors.

These are a 2x1 vector, 3x1 vector, and 4x1 vector respectively. This causes difficulties for beginners. So I will say it again. A vector in quantum computing is not about magnitude and direction. It is just a name for a special kind of matrix. A matrix that has only one column. A vector is just another name for a column matrix.

I am now going to introduce you to the concept of Eigen vectors. This is an important concept, but beginners often have trouble with it. The idea of an Eigen vector is related to linear transformations. So I will begin by discussing linear transformations, and then we will move on to Eigen vectors.

A linear transformation can be visualized on a coordinate space. Here we have the X and Y axes.

A point on this plane is represented by coordinates (x1, y1).

x1 represents the x-coordinate of the point, and y1 is the y-coordinate of the point.

Suppose we wish to apply a linear transformation on this point. The linear transformation can be represented like this.

The new point after transformation is (x2, y2).

And we have: x2 = Ax1 + By1, and y2 = Cx1 + Dy1. The four coefficients A, B, C and D determine the nature of the linear transformation. Any linear transformation of a point on a 2-dimensional plane can be specified by choosing appropriate values for A, B, C, and D.

These equations of transformation can also be written in matrix form.

Consider this square matrix multiplied by this column matrix. Work out the expansion of this matrix multiplication for yourself.

This is the same as saying x2 = Ax1 + By1, and y2 = Cx1 + Dy1

In terms of matrices, this column matrix here

has been transformed by multiplying it by this square matrix, and the result of the transformation

is the matrix result, which is this column matrix. This is an important idea. I will repeat that. A linear transformation of the form x2 = Ax1 + By1, y2 = Cx1 + Dy1 can be represented in matrix form like this. The square matrix specifies the transformation. The column matrix is the coordinate that is being transformed. The result of the matrix multiplication is the transformed coordinate. Observe that the coordinates are vectors. That is, they are column matrices. When we multiply a vector (x1,y1) by a square matrix, it gets transformed into a new vector (x2,y2).
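The pair of equations x2 = Ax1 + By1, y2 = Cx1 + Dy1 can be sketched as a short Python function. The function name `transform` and the sample coefficients are hypothetical values chosen for illustration.

```python
def transform(matrix, point):
    """Apply the 2 by 2 matrix [[A, B], [C, D]] to the point (x1, y1)."""
    (a, b), (c, d) = matrix
    x1, y1 = point
    x2 = a * x1 + b * y1
    y2 = c * x1 + d * y1
    return (x2, y2)

example = [[2, 1], [0, 3]]           # A=2, B=1, C=0, D=3 (illustrative values)
print(transform(example, (1, 2)))    # (4, 6)
```

Here x2 = 2x1 + 1x2 = 4 and y2 = 0x1 + 3x2 = 6, which is exactly what multiplying the square matrix by the column matrix gives.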

Rotation in 2D

Let's look at matrix transformations in detail. Suppose you have a point (x1, y1). If you want to rotate it by an angle n anti-clockwise about the origin,

then the matrix transformation is as shown here.

Let's work through an example.

We start with the point (1,0) on the x-axis.

We rotate it through 30 degrees anti-clockwise by using this matrix.

We get this point.

Let's rotate this point again by another 30 degrees.

That is, this multiplication.

We get this point.

Let's rotate the point through a further 30 degrees.

We get this point (0,1). This makes sense. By rotating thrice by 30 degrees, we have ended up with a rotation through 90 degrees. Rotating (1,0) by 90 degrees results in (0,1). This is as we would expect. We aren't going to be doing many geometric transformations like this in quantum computing. But I wanted you to get an intuitive sense for how matrix multiplications correspond to linear transformations on a coordinate space.
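The three rotations can be replayed in Python. This sketch assumes the standard anti-clockwise rotation matrix, with cos and -sin in the first row and sin and cos in the second row. The function name `rotate` is my own. Floating point arithmetic is approximate, so the printed values will be very close to, but not exactly, the values in the illustrations.

```python
import math

def rotate(point, degrees):
    """Rotate a point anti-clockwise about the origin by the given angle."""
    t = math.radians(degrees)
    x1, y1 = point
    x2 = math.cos(t) * x1 - math.sin(t) * y1
    y2 = math.sin(t) * x1 + math.cos(t) * y1
    return (x2, y2)

p = (1.0, 0.0)
for _ in range(3):        # three rotations of 30 degrees each
    p = rotate(p, 30)
    print(p)              # ends approximately at (0, 1)
```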

Special Directions

We have seen how a square matrix corresponds to a linear transformation on coordinate space. The coordinate of a point is represented as a column matrix. A column matrix is also called a vector.

Now let's look at this transformation matrix.

If we transform the point (1,2), we get (2,1).

If we transform (2,5) we get (5,2).

This is not a rotation.

Instead, it is a reflection about the y=x line.

Another way to think about this transformation is that we are swapping x and y. The equations for this transformation are: x2 = y1, and y2 = x1.

This transformation does something interesting to any point on the y=x line.

If we transform (1,1), we get (1,1). The transformation doesn't change the vector.

Let's try with another point on the y=x line. We transform (3,3) and get (3,3).

So the direction of the line y=x is a special direction for this transformation.

Similarly, points on the line y = -x also have a special behavior with this transformation. A point on the y = -x line is transformed into another point on the same line.

So a point on the line y = -x, (3, -3) becomes (-3, 3), which is also a point on the line y = -x

and the point (-2, 2) becomes (2 , -2), both points are on the line y=-x.

So the direction of the line y= -x is also a special direction for this transformation.

We have been talking about lines and directions. Let's look closer at what we mean by direction. A direction on coordinate space

can be represented by a point that is on a circle of radius 1 around the origin. All points on this circle are a distance of 1 from the origin. Each point on the circle represents a unique direction. Now let's represent the two special directions of this transformation matrix on this circle.

We get this point, and this point.

Each of these points can be represented as a column matrix. We get two column matrices as shown here. Remember a column matrix is also called a vector. And these are special vectors representing special directions for this transformation.

These two vectors are the Eigen vectors of this transformation matrix. Next we will discuss a more formal definition of Eigen vectors.

Eigen Vectors and Eigen Values

In the previous chapter we examined this transformation matrix

and we saw how this transformation had special behavior for points on the line y=x and on the line y=-x. For both these lines, a point on the line, is transformed to a point on the same line.

A point on y=x, like the point (2,2), is transformed to (2,2), which is also on the line y=x.

Similarly, let's take a point on the line y=-x. The point (2, -2) is transformed to (-2, 2). The new point (-2, 2) is also on the line y=-x.

The two special lines can be represented by two points of unit length.

That is, these two points.

We can write these points as column matrices, or vectors, like this.

We know these vectors represent special directions in coordinate space. In matrix multiplication, the special property is shown here: transformation matrix times the vector is a scalar times the vector. This is true for both the vectors, though the scalar value is different for each.

To repeat: Transformation matrix times vector equals the scalar 1 times the vector.

Here we have: Transformation matrix times vector equals the scalar -1 times the vector.

Any vector which has this behavior, that the transformation matrix multiplied by the vector is a scalar times the same vector is called an eigen vector of the transformation matrix. This scalar is the eigen value corresponding to the eigen vector.

This is a transformation matrix.

This is an eigen vector of this transformation.

And this is its corresponding eigen value.

Similarly, this is another eigen vector for the same transformation.

And this is the corresponding eigen value of the second eigen vector.

Some things to remember.

An eigen vector represents a direction. There can be an infinite number of vectors along a particular direction. So by convention, we choose an eigen vector to be the vector that represents a point on the unit circle about the origin.

Each different transformation will have different eigen vectors. An eigen vector has special properties only for that particular transformation. Not every transformation will have eigen vectors made of real numbers. The rotation matrix we saw earlier doesn't have any real eigen vectors (except for the special case when the angle is 0 or 180 degrees). A 2x2 matrix will typically have 2 eigen vectors. A 3x3 matrix will typically have 3 eigen vectors, and so on.

Eigen values and eigen vectors might appear to be a useless abstract concept. But in Quantum Physics, the concept is very important. Eigen vectors of Unitary and Hermitian matrices reveal the physical meaning of applying those state transformations on a quantum system.
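The defining property (transformation matrix times vector equals scalar times the same vector) can be checked in Python for the swap transformation x2 = y1, y2 = x1 discussed above. The function names `apply` and `is_eigenpair` are hypothetical, and a small tolerance is used because the unit-length vectors involve a square root.

```python
import math

def apply(matrix, vector):
    """Multiply a square matrix by a column vector stored as a flat list."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def is_eigenpair(matrix, vector, value, tol=1e-9):
    """True if matrix times vector equals value times vector."""
    transformed = apply(matrix, vector)
    return all(abs(t - value * v) < tol for t, v in zip(transformed, vector))

swap = [[0, 1], [1, 0]]      # the transformation x2 = y1, y2 = x1
s = 1 / math.sqrt(2)         # puts the vectors on the unit circle

print(is_eigenpair(swap, [s, s], 1))     # True: direction of the line y = x
print(is_eigenpair(swap, [s, -s], -1))   # True: direction of the line y = -x
print(is_eigenpair(swap, [1, 0], 1))     # False: (1, 0) is not a special direction
```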

More Eigen Vectors

Here is an example of a 3x3 matrix. As I had mentioned earlier, for a 3x3 matrix, we should expect 3 eigen vectors.

These are the eigen vectors for the 3x3 matrix. For simplicity, we haven't normalized the vectors. That is, the vectors are not of unit length. Just remember that these vectors represent directions, and only the direction is important.

Let's multiply the matrix by each vector and see what happens.

This multiplication evaluates to this vector.

When we factor out the eigen value, we get this.

This, highlighted in yellow is the eigen vector.

And highlighted in green is the eigen value.

Similarly, for the next eigen vector,

the multiplication evaluates to this, highlighted in blue.

Factoring out the eigen value, we get this. The number highlighted in green

is the eigen value corresponding to this eigen vector.

For the third eigen vector, we follow a similar procedure,

and this, highlighted in green,

is the eigen value corresponding to this eigen vector.

Eigen vectors and eigen values are not always real numbers. Consider this matrix. All the elements are real numbers. But the eigen vectors and eigen values are not all real numbers.

One of its eigen vectors is this.

And the corresponding eigen value is 1 which is a real number.

But here is an eigen vector and eigen value which are not real numbers.

And here is the third eigen vector and the corresponding eigen value. Verify these eigen vector equations yourself. In quantum computing you will be performing matrix operations like this very often, and it is necessary for you to be familiar with matrices that have complex number elements.
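Here is a smaller case you can check in Python of a matrix whose elements are all real but whose eigen vectors and eigen values are complex. The 2x2 matrix below (a rotation through 90 degrees) is my own illustrative choice, not the 3x3 matrix from the illustrations. Python writes the imaginary unit as `j`.

```python
def apply(matrix, vector):
    """Multiply a square matrix by a column vector stored as a flat list."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

quarter_turn = [[0, -1], [1, 0]]   # all elements are real (illustrative matrix)

v1, value1 = [1, -1j], 1j          # first eigen vector and its eigen value
v2, value2 = [1, 1j], -1j          # second eigen vector and its eigen value

# Matrix times vector should equal eigen value times vector.
print(apply(quarter_turn, v1) == [value1 * x for x in v1])  # True
print(apply(quarter_turn, v2) == [value2 * x for x in v2])  # True
```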

Dirac Bra-ket Notation

In this chapter we will learn about the Dirac Bra-ket notation. The Bra-ket notation is used everywhere in quantum computing.

The Bra-ket notation consists of two operators.

The first is the ket operator. This is the notation for Ket A.

The second is the Bra operator. This is the notation for Bra A.

The Bra and the Ket are used with vectors, that is column matrices.

The Ket operator does nothing. It is a 'no operation' or NOP. Ket A is written with this notation, and it just means the vector A.

The Bra operator is written with a left angle bracket like this, and Bra A is the same as Adjoint A or transpose of complex conjugate of A.

When we multiply a bra and a ket, we use a short form notation as shown here.

Bra A multiplied by Ket B is written like this. This is called a bra-ket.

Let's work through an example.

Suppose A is this column matrix.

Then adjoint of A, which is bra A, is this matrix.

Suppose B is this column matrix.

Then ket B is the same matrix. Ket is a 'no operation'.

To recap, we have bra A is this matrix, and ket B is this matrix.

To find this bra-ket, we multiply bra A and ket B. This is the result. The result is a 1 by 1 matrix. Usually 1 by 1 matrices are treated as scalars. Recall that multiplying a row matrix by a column matrix is called an inner product. So the bra-ket is an inner product of matrices.

We can also multiply Ket B and Bra A as shown here.

The result is a square matrix. When we multiply a column matrix and a row matrix, it is called an outer product.
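As a worked illustration of both products, take two hypothetical vectors, A with elements 1 and i, and B with elements 2 and 3. These values are chosen here for illustration and are not the matrices from the example above.

```latex
\begin{gather*}
|A\rangle = \begin{pmatrix} 1 \\ i \end{pmatrix}, \qquad
\langle A| = \begin{pmatrix} 1 & -i \end{pmatrix}, \qquad
|B\rangle = \begin{pmatrix} 2 \\ 3 \end{pmatrix} \\[6pt]
\langle A|B\rangle = 1 \cdot 2 + (-i) \cdot 3 = 2 - 3i \\[6pt]
|B\rangle\langle A|
  = \begin{pmatrix} 2 \\ 3 \end{pmatrix} \begin{pmatrix} 1 & -i \end{pmatrix}
  = \begin{pmatrix} 2 & -2i \\ 3 & -3i \end{pmatrix}
\end{gather*}
```

Note the conjugation in bra A: the element i of A becomes -i. The inner product is a single complex number, while the outer product is a 2 by 2 matrix.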

From Bits to Qubits

A classical computer operates on bits. In a similar manner, quantum computers operate on Quantum Bits, or qubits. The behavior of qubits is weird. Their value can be 0, 1, or one of an infinite number of in-between superposed states. Understanding superposition of qubits can help when you are getting started with quantum computing.

The chapters in this section will begin by describing the quantum behavior of light. We will focus on a property of light called polarization. We study polarization because it is very similar to the behavior of quantum bits (qubits). Next, we will study how polarization can be measured by calcite crystals and polarizing filters. From measurement we will move on to the study of superposition. Finally, I have provided Java source code for a photon-qubit simulator. If you know Java, you can run the simulator to get an intuitive understanding of how superposition works.

Polarized Photons of Light

We will begin our study of quantum physics with a property of light known as "Polarization". You might have studied in high school that polarization is caused by the angle of electric and magnetic fields. But for our purpose, we don't need to worry about how light gets polarized. You just need to know that photons of light have a property called polarization. The property is the angle of polarization. We will look at this in detail later.

We study polarization because it has quantum behavior that we can experience through everyday objects. Many of us have used polarized sunglasses to reduce glare. Photographers use polarized lens filters to reduce reflections in their photos.

Polarizers are easily available. Apart from sunglasses and camera lens filters, the easiest way to get a polarizer is to purchase a polarizing filter from Amazon. These polarizing filters, also sold as polarizing sheets, are very inexpensive.

The most important reason we study photon polarization is because polarized photons behave exactly like Quantum Bits. A quantum bit, or qubit is the fundamental unit of computation in a quantum computer. It plays a role similar to bits in classical computers.

A quantum bit has two important behaviors that we can study through photons: superposition and entanglement.

Photons and Polarizing Filters

We are going to conduct some imaginary experiments on photons. And one of the devices we will use for our imaginary experiments is your phone or laptop. Consider the screen on which you are reading this page. This screen is a source of light. It emits photons which travel away from the screen and reach your eye.

These photons have a quantum property called polarization. This property is an angle. The angle of polarization. Photons are polarized at an angle theta.

When polarization angle theta is 0 degrees, we call it horizontal polarization. When polarization angle theta is 90 degrees, we call it vertical polarization. Theta can also be some angle other than 0 or 90 degrees.

In classical computers we know about the concept of bits. A boolean value of FALSE is encoded as a bit value of 0 which is represented at a circuit level with a 0 volt potential. A boolean value of TRUE is encoded as a bit value of 1 which is often represented at a circuit level with a +5 volt potential.

In quantum computers we have quantum bits or qubits. I had mentioned earlier that photon polarization behaves exactly like qubits. So let's define vertical polarization to represent a qubit value of 1, and horizontal polarization to represent a qubit value of 0. We could have defined any two angles that are separated by 90 degrees as qubits 1 and 0.

In this case, we have chosen 90 degrees, that is vertical polarization for representing qubit value 1,

and 0 degrees, that is horizontal polarization for representing qubit value 0.

Now let's use your screen, this screen, where you are reading this book, in our virtual experiments.

See this rectangle here? Assume for this virtual experiment that it is a source of vertically polarized photons.

That is, vertically polarized photons are produced by this rectangular patch on your screen. The photons are directed away from your screen and toward your eyes.

Next we place a vertically polarizing filter over this light source.

This inner rectangle you see here represents the vertically polarizing filter. The filter appears to be completely transparent.

That is because a vertically polarizing filter allows all photons which are vertically polarized to pass through. In other words, a vertically polarizing filter is completely transparent to vertically polarized photons. There are two arrows you see in this picture.

This inner arrow is the direction of polarization of the filter.

The outer arrow, here, is the direction of polarization of the light source.

Next we replace the vertically polarizing filter with a horizontally polarizing filter. A horizontally polarizing filter is not made of some strange substance that is different from a vertically polarizing filter. They are both essentially the same. If we rotate the vertically polarizing filter by 90 degrees, we get a horizontally polarizing filter. As you can see here, the light source is still vertically polarized. That is, it continues to emit photons that are vertically polarized. But the polarizing filter we placed on top has been rotated so that it is horizontal. That is, we have placed a horizontally polarizing filter on top of the light source.

This arrow which is horizontal indicates the orientation of the filter.

The polarizing filter in this configuration is completely opaque to the photons. That is why it appears black. The main rule is this: When a polarizing filter is aligned in the same direction as a photon, the photon always passes through. Earlier, both the photons and the filter were vertical. So the photons passed through. Instead, as shown here,

if the polarizing filter is at a 90 degree angle to the photon's polarization, then the photon is always blocked by the filter. In the diagram shown here, the light source is producing vertically polarized photons, but the filter is now horizontal. That is, the filter and the photons are at 90 degrees to each other. So the photons are always blocked. The filter in this configuration is opaque. So it appears black.

Next we rotate the filter so that it is oriented at 45 degrees, as shown here. In this case, 50% of the photons pass through, and the remaining 50% are blocked. That is, speaking in aggregates, half the photons pass through, and the rest are blocked. But if we look at this configuration from the point of view of a single photon, then we say that a photon has a probability of 0.5 of passing through, and a probability of 0.5 of being blocked. Since half the light gets through, the filter appears gray in this picture.

In general, if the photons and the filter are a small angle apart, say 10 degrees, then most of the light passes through. And if the angle between the filter and the photon polarization is large, say 80 degrees, then very little light gets through.
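To put numbers on "most" and "very little", here is a minimal sketch. It assumes the cosine-squared rule that the simulator code at the end of this section uses to compute its alignment probability. The class name PassProbability and the method name passProbability are illustrative and do not come from that simulator.

```java
public class PassProbability {

    // Probability that a single photon passes the filter, given the angle
    // (in degrees) between the photon's polarization and the filter.
    // Assumption: the cosine-squared rule.
    static double passProbability(double angleDegrees) {
        double c = Math.cos(Math.toRadians(angleDegrees));
        return c * c;
    }

    public static void main(String[] args) {
        double[] angles = {0, 10, 45, 80, 90};
        for (double a : angles) {
            System.out.printf("angle %4.1f degrees -> P(pass) = %.3f%n",
                    a, passProbability(a));
        }
    }
}
```

Under this assumption, about 97% of the light passes at 10 degrees and about 3% at 80 degrees, with exactly half at 45 degrees.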

More on Photons and Polarizing Filters

To reinforce this concept, take a look at this sequence of illustrations. When the angle between the filter and the light source is small, most of the light gets through. That is, the filter appears almost transparent. When the angle is large, and closer to 90 degrees, very little light gets through, and the filter appears almost opaque.

To recap: When polarization of light source and polarizing filter are aligned at the same angle, all the light passes through the filter.

When photons and polarizing filter are anti-aligned, that is 90 degrees apart, all the photons are blocked at the filter.

In other words: Aligned means transparent. Anti-Aligned means opaque.

I use the term anti-aligned as the opposite of aligned. For photon polarization, anti-aligned means 90 degrees apart.

When the light source and the filter are at some angle other than 0 or 90, a fraction of the light gets through.

When the angle theta is closer to 0 degrees, more light gets through.

At theta = 45 degrees, half the light gets through.

When the angle theta is closer to 90, very little light passes through.

I will say the same thing again, but this time from the point of view of a single photon.

When the light source and filter are aligned, probability of pass-through = 1.0

When light source and filter are anti-aligned, that is when light source and filter are at 90 degrees, probability of pass-through = 0.0

For angles between 0 and 90, probability of pass-through is between 1 and 0.

Filters Change Polarization

Now we get to some quantum weirdness. Suppose we have an experiment set up as shown here.

There is a light source which emits vertically polarized photons.

The photons are directed at a polarizing filter aligned at 45 degrees.

We know that 50% of the photons pass through. The question is, what is the angle of polarization of photons that have passed through the polarizer? Your intuition from the world of classical physics might suggest that since the photons were originally vertical, they should continue to be vertical after passing through the polarizer. Your classical intuition would be wrong. When you send photons through a polarizer at 45 degrees, then all photons that pass through the filter will be polarized at 45 degrees. Essentially, the polarizer changes the angle of polarization of every photon that passes through.

In the general case, if we send photons with any arbitrary angle of polarization at a polarizing filter, only some of the photons pass through.

But the photons that have passed through the polarizer are now all polarized at the same angle as the polarizer.

If you send photons that are vertically polarized at a filter aligned at 45 degrees, then only 50% of the photons pass through. But all of these 50% which have made it past the filter will now be polarized at 45 degrees, the same angle as the filter.
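The state change can be sketched in code. This is a minimal illustration, not the book's simulator: the class FilteredPhoton and its methods are hypothetical names, the pass probability assumes the cosine-squared rule, and the random draw u is passed in by the caller so that a run can be reproduced.

```java
import java.util.Random;

public class FilteredPhoton {
    private double polarizationAngle; // degrees
    private boolean blocked = false;

    FilteredPhoton(double angleDegrees) {
        polarizationAngle = angleDegrees;
    }

    // Sends the photon at a filter. u is a random draw in [0, 1).
    // Returns true if the photon passes. A photon that passes leaves
    // polarized at the filter's angle, whatever its angle was before.
    boolean sendThroughFilter(double filterAngleDegrees, double u) {
        if (blocked) {
            return false; // an absorbed photon cannot pass anything
        }
        double c = Math.cos(Math.toRadians(filterAngleDegrees - polarizationAngle));
        double pPass = c * c; // assumption: cosine-squared rule
        if (u < pPass) {
            polarizationAngle = filterAngleDegrees; // the filter changes the state
            return true;
        }
        blocked = true;
        return false;
    }

    double getPolarizationAngle() {
        return polarizationAngle;
    }

    boolean isBlocked() {
        return blocked;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        int trials = 10000;
        int passed = 0;
        for (int i = 0; i < trials; i++) {
            FilteredPhoton p = new FilteredPhoton(90.0); // vertical photon
            if (p.sendThroughFilter(45.0, rng.nextDouble())) {
                passed++; // p.getPolarizationAngle() is now 45.0
            }
        }
        System.out.printf("fraction passed: %.3f%n", (double) passed / trials);
    }
}
```

Every photon counted in passed started at 90 degrees and ended at 45 degrees, which is the point of this chapter: the survivors are not the "already 45 degree" photons, because there were none.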

The first time you hear about the polarization of a photon getting changed when it passes through a filter, it might not sound too weird. After all, when you send white light through a red filter, the light coming out through the filter is red. This doesn't sound different. Consider the experiment shown here. This is a source of white light. White light is a mix of different colored photons. Some are red. Some are green, blue, and so on.

The red filter doesn't change the color of any photon. It blocks photons that are not red. It allows red photons to go through.

I will repeat that because it is important. The red filter does not change the color of any photon. It merely blocks other colors, and allows red to pass through. Contrast this with the polarizing filter. The polarizer actually changed the polarization of the photons that passed through.

Quantum Behavior of Polarizers

To emphasize that the behavior of the polarizer is not easily explained by classical physics, I want to discuss this experimental arrangement shown here. We have a source of vertically polarized photons. We are directing these photons at a horizontally polarizing filter. As we expect, none of the photons get through.

Next we insert a new polarizing filter between the light source and the horizontal polarizer. This filter is highlighted with red marching ants. The new filter is aligned at 45 degrees. Interestingly, light now gets through to the screen. How did that happen? We inserted a new filter, and that filter made a previously opaque arrangement partially transparent!

Let's carefully analyze what is happening here. The vertically polarized photons from the light source hit the 45 degree filter.

50% of the photons pass through. The photons that pass through this filter are polarized at 45 degrees. These 45 degree photons then hit the horizontally polarizing filter. The angle between the photons and the filter is now 45 degrees.

So 50% of these photons pass through and reach the screen.

To recap, after the first filter, 50% of photons have been blocked, and 50% have passed through. The photons that have passed through are now polarized at 45 degrees. These photons hit the second filter. Half the photons are blocked here. So 25% of the original photons pass through the second filter and reach the screen. The weird thing here is that by introducing the 45 degree filter, we have made it possible for some of the photons to reach the screen. This never happens in classical physics. Filters with classical behavior block light. They don't make things more transparent.
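The arithmetic of this recap can be checked with a short sketch. It assumes the cosine-squared rule for the fraction that passes each filter, and it uses the fact stated above that photons leave a filter polarized at the filter's angle. The class and method names are illustrative.

```java
public class ThreeFilters {

    // Fraction of photons polarized at fromDegrees that pass a filter
    // oriented at filterDegrees (assumption: cosine-squared rule).
    static double passFraction(double fromDegrees, double filterDegrees) {
        double c = Math.cos(Math.toRadians(filterDegrees - fromDegrees));
        return c * c;
    }

    // Fraction of the source photons that survive a chain of filters.
    // After each filter the surviving photons are polarized at that
    // filter's angle, so that angle is the input to the next filter.
    static double fractionThrough(double sourceDegrees, double... filterDegrees) {
        double fraction = 1.0;
        double current = sourceDegrees;
        for (double f : filterDegrees) {
            fraction *= passFraction(current, f);
            current = f;
        }
        return fraction;
    }

    public static void main(String[] args) {
        // Vertical source (90 degrees), horizontal filter (0 degrees) only.
        System.out.printf("horizontal filter only:      %.2f%n",
                fractionThrough(90.0, 0.0));
        // Same arrangement with a 45 degree filter inserted in between.
        System.out.printf("45 degree filter inserted:   %.2f%n",
                fractionThrough(90.0, 45.0, 0.0));
    }
}
```

The first line prints 0.00 and the second prints 0.25, matching the 25% worked out above.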

Polarizing Single Photons

Let's look at polarizing filters again but with an awareness that this is quantum behavior. From the point of view of a single photon,

in some situations, the behavior of the photon with a polarizer is deterministic. That is, there is no randomness.

When the photon and the polarizing filter are aligned, probability of pass-through is 1. Deterministic behavior. The photon always passes through.

When the photon and the polarizing filter are anti-aligned, that is at 90 degrees apart, then probability of pass-through is 0. Again deterministic. The photon is always blocked.

In other situations, the behavior of the photon has a random component. When the light source and the filter are at an angle between 0 and 90 degrees, probability of pass-through is between 1 and 0. That is, the photon's behavior is a matter of chance. It might or might not pass through.

The behavior is probabilistic.

Furthermore, any photon that passes through will be aligned with the polarizing filter.

Using Calcite

So far we have been using polarizing filters in all of our "virtual experiments". But there is another material we can use to experiment with polarization. This is a calcite crystal. Calcite has an interesting property. When photons are directed at a calcite crystal, all the photons pass through. But the light beam gets split into two different light beams that emerge from the calcite crystal.

One stream of photons emerging from the calcite is polarized to be in alignment with the calcite.

Another separate stream of photons leaves the calcite, but these are polarized to be anti-aligned with the calcite.

In other words, the calcite splits the incoming stream of photons into two streams. One stream's photon polarizations are aligned with the calcite. The other stream is anti-aligned.

Let's compare the behavior of calcite with the polarizing filters we discussed earlier. When we direct a stream of photons at a filter, there is only one stream coming out of the filter. That stream contains photons which are aligned with the filter. The second stream of anti-aligned photons that we obtained from the calcite is missing here. Essentially, the photons that would have exited through the anti-aligned stream are being blocked by the polarizing filter.

Let's return to calcite crystals and recap what we have learned.

The calcite crystal redirects the photons that would have been blocked by a polarizing filter.

One stream of photons exiting the crystal is aligned with the crystal.

The other stream is anti-aligned. That is, at 90 degrees to the former.

Consider the experimental arrangement shown here. Polarizations of the photons we send towards the calcite are aligned with the calcite.

So all the photons exit the calcite crystal in the aligned path shown. This is deterministic behavior. The probability of a photon exiting along the aligned exit path is 1.

Conversely, if we send photons that are anti-aligned with the calcite, the behavior is again deterministic. But instead of emerging through the aligned path, these photons exit through the anti-aligned path. The probability of a photon exiting along the anti-aligned path is 1.

Now we will look at probabilistic behavior. If we shine light that is neither aligned, nor anti-aligned, at the calcite, then we will see that some photons exit the calcite along the aligned path, while other photons exit the crystal along the anti-aligned path. This is probabilistic behavior. Probability of a photon exiting by the aligned path is between 0 and 1.

Loss of Information

Let's consider a single photon of unknown polarization. We direct this photon at a calcite crystal.

The photon will exit the calcite either along the aligned path or along the anti-aligned path. If it emerges along the aligned path, then the photon's polarization is now aligned with the calcite crystal. If it emerges along the anti-aligned path, the photon's polarization is now anti-aligned with the calcite crystal.

In both situations, all information about the polarization of the photon before entering the calcite is lost.

This is an important point, so I will repeat. If a photon of unknown polarization is sent at a calcite crystal, then upon exiting the crystal, the photon is either aligned or anti-aligned with the crystal. While the photon's polarization could have been anything before the photon entered the calcite, when it exits, there are only two possibilities. It can be aligned or anti-aligned.

This means that all information about the polarization state of the photon before it entered the calcite has been lost.
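The loss of information can be made concrete with a small sketch. It is an illustration under stated assumptions, not the book's simulator: the names CalciteDemo, Outcome and measure are hypothetical, the probability of the aligned path assumes the cosine-squared rule, and the random draw u is supplied by the caller so the example is reproducible.

```java
public class CalciteDemo {

    // Result of sending one photon through a calcite crystal.
    static final class Outcome {
        final boolean aligned;        // which exit path the photon took
        final double newAngleDegrees; // polarization after the crystal

        Outcome(boolean aligned, double newAngleDegrees) {
            this.aligned = aligned;
            this.newAngleDegrees = newAngleDegrees;
        }
    }

    // u is a random draw in [0, 1). The photon always exits, either
    // aligned with the crystal or at 90 degrees to it.
    static Outcome measure(double photonAngleDegrees,
                           double crystalAngleDegrees,
                           double u) {
        double c = Math.cos(Math.toRadians(crystalAngleDegrees - photonAngleDegrees));
        boolean aligned = u < c * c; // assumption: cosine-squared rule
        double newAngle = aligned ? crystalAngleDegrees
                                  : crystalAngleDegrees + 90.0;
        return new Outcome(aligned, newAngle);
    }

    public static void main(String[] args) {
        // Two photons with different prior polarizations, same crystal.
        Outcome a = measure(10.0, 0.0, 0.1);
        Outcome b = measure(50.0, 0.0, 0.1);
        System.out.println("photon at 10 degrees: aligned=" + a.aligned
                + ", new angle=" + a.newAngleDegrees);
        System.out.println("photon at 50 degrees: aligned=" + b.aligned
                + ", new angle=" + b.newAngleDegrees);
    }
}
```

With the draw used here, both photons exit on the aligned path with the same new angle. Nothing in the returned Outcome distinguishes the photon that started at 10 degrees from the one that started at 50 degrees.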

Finding Angle of Polarization

The idea of loss of information is very important. Let's look at two virtual experiments to help us understand this concept better. Suppose this rectangle on your screen represents a polarized light source. We want to experimentally find the angle of polarization.

The experimental procedure is obvious. We place a polarizing filter on top of the light source. We rotate the filter until it appears transparent. The angle of the filter when it appears transparent is the angle of polarization of photons emitted by this light source.

We found that the filter was transparent at an angle of 45 degrees. So 45 degrees is the angle of polarization of the light source. This experiment was trivial. The procedure and results were obvious.

The same experiment can also be performed with a calcite crystal. We set up the apparatus as shown here. We rotate the crystal until all the light exits the crystal from the aligned path, and none of the light exits the anti-aligned path. The angle of the crystal is the angle of polarization of the light source.

Finding Polarization of a Single Photon

So far the experiment has been trivial. The results have been what we would expect from classical physics. There have been no surprises. But now we will make a small, seemingly innocuous change in the experiment. Consider the apparatus shown here.

Instead of a light source that can emit billions of photons, we have a single photon of unknown polarization directed at the calcite.

After passing through the calcite, the photon will be either

aligned with the crystal, or

anti-aligned with the crystal. That is, it will exit through one of the two paths shown. We don't know the polarization of the photon before it enters the crystal. Its behavior as it passes through the calcite is probabilistic. We cannot predict which path it will follow.

The key takeaway from the single photon experiment is that information about the photon's prior polarization, before the photon passed through the calcite, is lost. I will repeat that. Sending the photon through calcite destroys information about the prior polarization state of the photon.

Before the photon passed through the calcite, its polarization could have been any angle. The angle could be described by a real number.

After passing through the calcite, the photon polarization is either aligned or anti-aligned. It is in one of these two states. In other words, the photon state after it emerges from the calcite can be described by a single bit of information. Aligned or anti-aligned. Just one bit.

So a state that was earlier described by a real number has changed and become describable by a single bit. When a real number becomes a single bit, information is lost.

There is a probability connection between the angle of polarization of the photon before entering the calcite, and the path it takes out of the calcite. Unfortunately, we cannot follow a probability connection in a reverse direction. If the photon exits through the aligned path, then we know that there is a higher probability that the prior polarization angle of the photon was closer to the aligned angle than the anti-aligned angle. But this is merely a probability. We can't tell anything definite about the prior state.

Let's compare this single photon experiment with the earlier experiment where we could find the polarization easily. The critical difference is that in the earlier situation we had a near infinite supply of identical photons entering the calcite. By counting the fraction of photons that exit through the aligned path, versus the anti-aligned path, we could compute the original polarization.

Here is the inference we draw from our single photon experiment. For a single photon of unknown polarization, we cannot find its polarization using a calcite crystal.
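The contrast between many identical photons and a single photon can be sketched as follows. This is an illustration under assumptions: the aligned-path probability follows the cosine-squared rule, the crystal is fixed at 0 degrees, and the class and method names are hypothetical. Because the rule depends only on the squared cosine, counting with one fixed crystal recovers the angle only within the range 0 to 90 degrees.

```java
import java.util.Random;

public class EstimateAngle {

    // Sends `photons` identical photons, all polarized at trueAngleDegrees,
    // at a calcite crystal oriented at 0 degrees, counts how many exit on
    // the aligned path, and inverts the cosine-squared rule to estimate
    // the angle in degrees.
    static double estimateAngle(double trueAngleDegrees, int photons, Random rng) {
        double c = Math.cos(Math.toRadians(trueAngleDegrees));
        double pAligned = c * c; // assumption: cosine-squared rule
        int alignedCount = 0;
        for (int i = 0; i < photons; i++) {
            if (rng.nextDouble() < pAligned) {
                alignedCount++;
            }
        }
        double fraction = (double) alignedCount / photons;
        return Math.toDegrees(Math.acos(Math.sqrt(fraction)));
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        System.out.printf("1,000,000 photons: estimate = %.2f degrees%n",
                estimateAngle(30.0, 1_000_000, rng));
        System.out.printf("1 photon:          estimate = %.2f degrees%n",
                estimateAngle(30.0, 1, rng));
    }
}
```

With a large supply of photons the estimate lands close to the true 30 degrees. With one photon the observed fraction can only be 0 or 1, so the "estimate" can only be 90 or 0 degrees, whatever the true angle was.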

Limitations of Measurement

We cannot use calcite to find the polarization of a single photon. How about using a different apparatus for measuring polarization? Something other than calcite or polarizing filters?

Here is some weirdness. Any apparatus that measures polarization can only report "aligned" or "anti-aligned". No other measurement result is physically possible. This is important, so I will repeat. Any measurement of polarization can only produce a binary result. Aligned or anti-aligned.

This is a physical limitation. No better measurement is physically possible. That is not all.

The act of measurement changes the state of the photon polarization. After measurement, the photon's polarization will be either aligned or anti-aligned with the measurement apparatus. Information about the state of the photon polarization before measurement is lost. The polarization state which had been representable by a real number changes to a binary state whose information content is 1 bit.

This is a limitation imposed by physics.

Given a single photon of unknown polarization, it is physically impossible to find its angle of polarization.

We can get some probability based clues. But we can get no definite knowledge.
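The measurement rule described in this chapter can be summarized in symbols. This is a sketch under two assumptions: the cosine-squared probability that the simulator code at the end of this section computes, and the symbol \phi for the orientation of the measuring apparatus, which is introduced here and does not appear in the original text. \theta is the photon's polarization angle before measurement and \theta' is the angle after.

```latex
\begin{align*}
P(\text{aligned}) &= \cos^2(\phi - \theta) \\
P(\text{anti-aligned}) &= 1 - \cos^2(\phi - \theta) = \sin^2(\phi - \theta) \\
\theta' &=
  \begin{cases}
    \phi & \text{if the result is ``aligned''} \\
    \phi + 90^\circ & \text{if the result is ``anti-aligned''}
  \end{cases}
\end{align*}
```

The new angle \theta' depends only on the apparatus angle \phi and the one-bit result. The prior angle \theta influences the probabilities but does not appear in the final state.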

Repeated Measurement 1

Recall that after a photon passes through a calcite crystal, its polarization is changed. The calcite crystal is a measurement apparatus. Measurement changes the state of the system being measured. This is one of the key differences between quantum physics and classical physics, so I will spend some time on this idea.

Sending a photon through calcite is the same as measuring the photon's polarization using calcite. After the photon is measured by the calcite, its polarization is changed. The key takeaway is that every measurement of a quantum system can change the state of the system. Because of our intuition developed with classical physics, we might think that if we used a different apparatus, we might get better measurements.

If we use a calcite crystal to measure polarization of a photon, the only result we can get from the measurement is "aligned" or "anti-aligned". After measurement, the state of the photon is changed so that every subsequent measurement by an identical apparatus will produce the same result. This last point is important. To repeat, when we measure a quantum system, the state of the system is changed by the measurement in such a manner that every subsequent measurement with identical apparatus produces the same result as the first measurement.

Consider this experimental apparatus shown here.

We have three identically oriented calcite crystals. The crystals are arranged so that any photon exiting crystal 1 in the aligned path will hit crystal 2. Similarly, any photon that exits crystal 2 in its aligned path will hit crystal 3.

We send a photon of unknown polarization at the first crystal. The photon might exit the first crystal in its aligned path.

This path is highlighted in yellow.

Or it might exit the first crystal in its anti-aligned path, here highlighted in green. Either path is possible.

Let's suppose that the photon exits by the aligned path, the one highlighted in yellow. Then the photon, after exiting the first crystal, has a polarization that is aligned with it. This aligned photon hits crystal 2. The photon's behavior at crystal 2 is deterministic. Crystal 2 is identical to crystal 1. So the photon and the crystal are aligned. The photon will always exit crystal 2 along its aligned path. Similarly, the photon then hits crystal 3, and in a deterministic manner exits by the aligned path of crystal 3.

Let's analyze this behavior again. When an unknown photon hits crystal 1, its behavior is probabilistic. It might exit the first crystal either on its aligned path, or along its anti-aligned path. Suppose the photon exits crystal 1 along its aligned path. That is, the photon's polarization was measured by crystal 1, and the result of measurement was "aligned".

The photon which exits crystal 1 along the path highlighted in yellow is now aligned with crystal 1.

Recall that crystals 2 and 3 are also oriented identically to crystal 1. So the photon is now aligned with crystals 2 and 3 as well.

When the photon hits crystal 2 and crystal 3, its behavior is deterministic. It is aligned with the crystals. The photon exits crystal 2 in a deterministic manner along the aligned path highlighted in red. Recall what I had said earlier: When we measure a quantum system, the state of the system is changed by the measurement in such a manner that every subsequent measurement with identical apparatus produces the same result as the first measurement. In this case, the photon's polarization is changed at the first measurement, that is at crystal 1, in such a manner that at every subsequent identically oriented calcite crystal, it behaves the same way as it did at the first crystal.

Repeated Measurement 2

Now let's consider a similar experimental apparatus. The difference is that crystals 2 and 3 are arranged along the anti-aligned path.

That is, a photon exiting crystal 1 on its anti-aligned path highlighted in green is directed at crystal 2,

and a photon exiting crystal 2 on its anti-aligned path highlighted in blue is directed at crystal 3.

In this experiment we send another photon of unknown polarization at crystal 1.

Suppose the photon exits crystal 1 along its anti-aligned path, highlighted in green. In other words, the photon has been measured by the first calcite and the result of measurement was "anti-aligned". When the photon reaches crystal 2, that would be a second measurement. After the first measurement, the photon's polarization was changed, so that in every subsequent calcite measurement, the result is the same. That is, the photon's polarization changed at the first measurement to "anti-aligned", and in every subsequent measurement, the result is "anti-aligned".

Note that this happens only because crystals 1, 2 and 3 are oriented at the same angle.
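Both repeated-measurement experiments can be sketched in a few lines. This is an independent illustration, not the book's simulator: the class MeasuredPhoton and its measure method are hypothetical names, and the probability of the aligned result assumes the cosine-squared rule.

```java
import java.util.Random;

public class MeasuredPhoton {
    private double polarizationAngle; // degrees

    MeasuredPhoton(double angleDegrees) {
        polarizationAngle = angleDegrees;
    }

    // Measures the photon with a calcite crystal at crystalAngleDegrees.
    // Returns true for "aligned" and false for "anti-aligned", and updates
    // the photon's state to match the result.
    boolean measure(double crystalAngleDegrees, Random rng) {
        double c = Math.cos(Math.toRadians(crystalAngleDegrees - polarizationAngle));
        boolean aligned = rng.nextDouble() < c * c; // assumption: cosine-squared rule
        polarizationAngle = aligned ? crystalAngleDegrees
                                    : crystalAngleDegrees + 90.0;
        return aligned;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        MeasuredPhoton photon = new MeasuredPhoton(90.0); // vertical photon
        // Three identically oriented crystals, all at 45 degrees.
        for (int crystal = 1; crystal <= 3; crystal++) {
            boolean aligned = photon.measure(45.0, rng);
            System.out.println("crystal " + crystal + ": "
                    + (aligned ? "aligned" : "anti-aligned"));
        }
    }
}
```

The first line of output varies from run to run. The second and third lines repeat the first, because after the first measurement the photon is at 0 or 90 degrees to the crystal, where the outcome is deterministic up to floating-point rounding. If the later crystals were given a different angle, the repetition would no longer be guaranteed.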

Repeated Measurement with Filters

The same experiment can be done with polarizing filters.

All the filters shown here are oriented at the same angle.

We send a photon of unknown polarization at the first filter.

The photon might be blocked, or it might get through. If it gets through the first polarizing filter, then the photon's polarization will now be aligned with the second and third filters. So a photon that passes through the first filter will also pass through all subsequent filters that are at the same angle as the first filter.

Running the Java Code

We will be discussing a simulator written in Java. The simulator is in the file PhotonPolarizationMeasurement.java. Click on the above link to download the Java source file. Source code can be viewed at the end of this chapter. Open the file in your favorite Java IDE and scroll down to the main() function. I have used the Netbeans IDE here.

See this code about "Experiment 1"? Remove the comment markers so that this code runs.

This code simulates the experiment we performed with calcite and polarizing filters.

This line here creates a simulated photon at some specified angle. Here it is 90 degrees. The photon is vertically polarized.

The next line measures the photon's polarization with five identical calcite crystals. The result of each measurement is printed. In this code, the calcite is oriented at 45 degrees.

When you run the code, it simulates a sequence of 5 measurements with identical apparatus.

Each time you run the code, the result of the first measurement is a matter of chance. Highlighted here.

But the subsequent 4 measurements always produce the same result as the first measurement. Highlighted here. We will take a closer look at this simulator code shortly.

/* Simulator for Photon Qubit Measurements */
import java.lang.Math.*;

public class PhotonPolarizationMeasurement {
    private double polarizationAngle;

    public PhotonPolarizationMeasurement(double angle) {
        polarizationAngle = angle;
    }

    //Measurement returns boolean true for "aligned" and false for "anti-aligned"
    //Qubit value 1 = Aligned (true)
    //Qubit value 0 = Anti-Aligned (false)
    //Observe how measurement causes the state to change
    public boolean measurePolarization(double angle) {
        double diffAngle = angle-polarizationAngle;
        double cosDiffAngle = Math.cos(Math.toRadians(diffAngle));
        double probabilityAlign = cosDiffAngle*cosDiffAngle;
        double probabilityAntiAlign = 1 - probabilityAlign;

}

if(Math.random()