Functional analysis [draft ed.]


FUNCTIONAL ANALYSIS CHRISTIAN REMLING

Contents

1. Metric and topological spaces
2. Banach spaces
3. Consequences of Baire’s Theorem
4. Dual spaces and weak topologies
5. Hilbert spaces
6. Operators in Hilbert spaces
7. Banach algebras
8. Commutative Banach algebras
9. C*-algebras
10. The Spectral Theorem

These are lecture notes that have evolved over time. Center stage is given to the Spectral Theorem for (bounded, in this first part) normal operators on Hilbert spaces; this is approached through the Gelfand representation of commutative C*-algebras. Banach space topics are ruthlessly reduced to the mere basics (some would argue, less than that); topological vector spaces aren’t mentioned at all.


1. Metric and topological spaces

A metric space is a set on which we can measure distances. More precisely, we proceed as follows: let X ≠ ∅ be a set, and let d : X × X → [0, ∞) be a map.

Definition 1.1. (X, d) is called a metric space if d has the following properties, for arbitrary x, y, z ∈ X:
(1) d(x, y) = 0 ⇐⇒ x = y
(2) d(x, y) = d(y, x)
(3) d(x, y) ≤ d(x, z) + d(z, y)

Property 3 is called the triangle inequality. It says that a detour via z will not give a shortcut when going from x to y. The notion of a metric space is very flexible and general, and there are many different examples. We now compile a preliminary list of metric spaces.

Example 1.1. If X ≠ ∅ is an arbitrary set, then
\[ d(x, y) = \begin{cases} 0 & x = y \\ 1 & x \neq y \end{cases} \]
defines a metric on X.

Exercise 1.1. Check this.

This example does not look particularly interesting, but it does satisfy the requirements from Definition 1.1.

Example 1.2. X = C with the metric d(x, y) = |x − y| is a metric space. X can also be an arbitrary non-empty subset of C, for example X = R. In fact, this works in complete generality: If (X, d) is a metric space and Y ⊂ X, then Y with the same metric is a metric space also.

Example 1.3. Let $X = \mathbb{C}^n$ or $X = \mathbb{R}^n$. For each p ≥ 1,
\[ d_p(x, y) = \left( \sum_{j=1}^n |x_j - y_j|^p \right)^{1/p} \]

defines a metric on X. Properties 1, 2 are clear from the definition, but if p > 1, then the verification of the triangle inequality is not completely straightforward here. We leave the matter at that for the time being, but will return to this example later. An additional metric on X is given by
\[ d_\infty(x, y) = \max_{j=1,\ldots,n} |x_j - y_j| . \]
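The metrics $d_p$ and $d_\infty$ from Example 1.3 are easy to evaluate numerically. The following is a minimal sketch in Python; the function names `d_p` and `d_inf` and the sample points are chosen here for illustration only. For the sample points, $d_1 = 4$ and $d_\infty = 3$, and the printed values of $d_p$ approach $d_\infty$ as p grows, which is the statement one is asked to prove in general.

```python
def d_p(x, y, p):
    """d_p(x, y) = (sum_j |x_j - y_j|^p)^(1/p) for 1 <= p < infinity."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def d_inf(x, y):
    """d_inf(x, y) = max_j |x_j - y_j|."""
    return max(abs(a - b) for a, b in zip(x, y))


# Sample points in R^3 (illustrative values, not from the notes).
x = (1.0, -2.0, 0.5)
y = (0.0, 1.0, 0.5)

for p in (1, 2, 4, 16, 64):
    print(p, d_p(x, y, p))
print("inf", d_inf(x, y))
```

A numerical check of this kind is of course no substitute for a proof of the triangle inequality; it only shows how the formulas behave on concrete vectors.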


Exercise 1.2. (a) Show that $(X, d_\infty)$ is a metric space. (b) Show that $\lim_{p\to\infty} d_p(x, y) = d_\infty(x, y)$ for fixed x, y ∈ X.

Example 1.4. Similar metrics can be introduced on function spaces. For example, we can take X = C[a, b] = {f : [a, b] → C : f continuous} and define, for 1 ≤ p < ∞,
\[ d_p(f, g) = \left( \int_a^b |f(x) - g(x)|^p \, dx \right)^{1/p} \]
and
\[ d_\infty(f, g) = \max_{a \le x \le b} |f(x) - g(x)| . \]

Again, the proof of the triangle inequality requires some care if 1 < p < ∞. We will discuss this later.

Exercise 1.3. Prove that $(X, d_\infty)$ is a metric space.

Actually, we will see later that it is often advantageous to use the spaces
\[ X_p = L^p(a, b) = \left\{ f : [a, b] \to \mathbb{C} : f \text{ measurable}, \int_a^b |f(x)|^p \, dx < \infty \right\} \]

instead of X if we want to work with these metrics. We will discuss these issues in much greater detail in Section 2.

On a metric space, we can define convergence in a natural way. We just interpret “d(x, y) small” as “x close to y”, and we are then naturally led to make the following definition.

Definition 1.2. Let (X, d) be a metric space, and $x_n$, x ∈ X. We say that $x_n$ converges to x (in symbols: $x_n \to x$ or $\lim x_n = x$, as usual) if $d(x_n, x) \to 0$. Similarly, we call $x_n$ a Cauchy sequence if for every ε > 0, there exists an N = N(ε) ∈ N so that $d(x_m, x_n) < \varepsilon$ for all m, n ≥ N.

We can make some quick remarks on this. First of all, if a sequence $x_n$ is convergent, then the limit is unique because if $x_n \to x$ and $x_n \to y$, then, by the triangle inequality, $d(x, y) \le d(x, x_n) + d(x_n, y) \to 0$, so d(x, y) = 0 and thus x = y. Furthermore, a convergent sequence is a Cauchy sequence: If $x_n \to x$ and ε > 0 is given, then we can find an N ∈ N so that $d(x_n, x) < \varepsilon/2$ if n ≥ N. But then we also have that
\[ d(x_m, x_n) \le d(x_n, x) + d(x, x_m) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon \qquad (m, n \ge N), \]


so $x_n$ is a Cauchy sequence, as claimed. The converse is false in general metric spaces. Consider for example X = Q with the metric d(x, y) = |x − y| from Example 1.2. Pick a sequence $x_n$ ∈ Q that converges in R (that is, in the traditional sense) to an irrational limit ($\sqrt{2}$, say). Then $x_n$ is a Cauchy sequence in (X, d) because it is convergent in the bigger space (R, d), so, as just observed, $x_n$ must be a Cauchy sequence in (R, d). But then $x_n$ is also a Cauchy sequence in (Q, d) because this is actually the exact same condition (only the distances $d(x_m, x_n)$ matter, we don’t need to know how big the total space is). However, $x_n$ cannot converge in (Q, d) because then it would have to converge to the same limit in the bigger space (R, d), but by construction, in this space, it converges to a limit that was not in Q.

Please make sure you understand exactly how this example works. There’s nothing mysterious about this divergent Cauchy sequence. The sequence really wants to converge, but, unfortunately, the putative limit fails to lie in the space. Spaces where Cauchy sequences do always converge are so important that they deserve a special name.

Definition 1.3. Let X be a metric space. X is called complete if every Cauchy sequence converges.

The mechanism from the previous example is in fact the only possible reason why spaces can fail to be complete. Moreover, it is always possible to complete a given metric space by including the would-be limits of Cauchy sequences. The bigger space obtained in this way is called the completion of X. We will have no need to apply this construction, so I don’t want to discuss the (somewhat technical) details here. In most cases, the completion is what you think it should be; for example, the completion of (Q, d) is (R, d).

Exercise 1.4. Show that $(C[-1, 1], d_1)$ is not complete. Suggestion: Consider the sequence
\[ f_n(x) = \begin{cases} -1 & -1 \le x < -1/n \\ nx & -1/n \le x \le 1/n \\ 1 & 1/n < x \le 1 \end{cases} . \]

A more general concept is that of a topological space.
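Before moving on, the divergent Cauchy sequence in (Q, d) described above can be made concrete. The sketch below is one possible choice, not the only one: it uses the Newton iteration $x_{n+1} = (x_n + 2/x_n)/2$ with $x_0 = 1$ (an assumption made here for illustration; the notes only ask for some rational sequence with irrational limit). Exact rational arithmetic shows that every term lies in Q, that consecutive terms get closer, and that no term squares to 2.

```python
from fractions import Fraction


def newton_sqrt2(steps):
    """Rational iterates x_{n+1} = (x_n + 2/x_n) / 2, starting from x_0 = 1."""
    x = Fraction(1)
    out = [x]
    for _ in range(steps):
        x = (x + 2 / x) / 2  # stays in Q: only field operations are used
        out.append(x)
    return out


xs = newton_sqrt2(6)
gaps = [abs(xs[n + 1] - xs[n]) for n in range(len(xs) - 1)]

for x, gap in zip(xs, gaps):
    # x*x - 2 is never zero, since sqrt(2) is irrational
    print(x, float(gap), float(x * x - 2))
```

Shrinking consecutive gaps alone do not prove the Cauchy property; that follows, as in the text, from convergence in the bigger space R.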
By definition, a topological space X is a non-empty set together with a collection T of distinguished subsets of X (called open sets) with the following properties:


(1) ∅, X ∈ T
(2) If $U_\alpha \in T$, then also $\bigcup U_\alpha \in T$.
(3) If $U_1, \ldots, U_N \in T$, then $U_1 \cap \ldots \cap U_N \in T$.

This structure allows us to introduce some notion of closeness also, but things are fuzzier than on a metric space. We can zoom in on points, but there is no notion of one point being closer to a given point than another point. We call V ⊂ X a neighborhood of x ∈ X if x ∈ V and V ∈ T. (Warning: This is sometimes called an open neighborhood, and it is also possible to define a more general notion of not necessarily open neighborhoods. We will always work with open neighborhoods here.) We can then say that $x_n$ converges to x if for every neighborhood V of x, there exists an N ∈ N so that $x_n \in V$ for all n ≥ N. However, on general topological spaces, sequences are not particularly useful; for example, if T = {∅, X}, then (obviously, by just unwrapping the definitions) every sequence converges to every limit.

Here are some additional basic notions for topological spaces. Please make sure you’re thoroughly familiar with these (the good news is that we won’t need much beyond these definitions).

Definition 1.4. Let X be a topological space.
(a) A ⊂ X is called closed if $A^c$ is open.
(b) For an arbitrary subset B ⊂ X, the closure of B ⊂ X is defined as
\[ \overline{B} = \bigcap_{A \supset B;\ A \text{ closed}} A ; \]

this is the smallest closed set that contains B (in particular, there always is such a set).
(c) The interior of B ⊂ X is the biggest open subset of B (such a set exists). Equivalently, the complement of the interior is the closure of the complement.
(d) K ⊂ X is called compact if every open cover of K contains a finite subcover.
(e) ℬ ⊂ T is called a neighborhood base of X if for every neighborhood V of some x ∈ X, there exists a B ∈ ℬ with x ∈ B ⊂ V.
(f) Let Y ⊂ X be an arbitrary, non-empty subset of X. Then Y becomes a topological space with the induced (or relative) topology $T_Y = \{U \cap Y : U \in T\}$.
(g) Let f : X → Y be a map between topological spaces. Then f is called continuous at x ∈ X if for every neighborhood W of f(x) there exists a neighborhood V of x so that f(V) ⊂ W. f is called continuous


if it is continuous at every point.
(h) A topological space X is called a Hausdorff space if for every pair of points x, y ∈ X, x ≠ y, there exist disjoint neighborhoods $V_x$, $V_y$ of x and y, respectively.

Continuity on the whole space could have been (and usually is) defined differently:

Proposition 1.5. f is continuous (at every point x ∈ X) if and only if $f^{-1}(V)$ is open (in X) whenever V is open (in Y).

Exercise 1.5. Do some reading in your favorite (point set) topology book to brush up your topology. (Folland, Real Analysis, Ch. 4 is also a good place to do this.)

Exercise 1.6. Prove Proposition 1.5.

Metric spaces can be equipped with a natural topology. More precisely, this topology is natural because it gives the same notion of convergence of sequences. To do this, we introduce balls
\[ B_r(x) = \{ y \in X : d(y, x) < r \}, \]
and use these as a neighborhood base for the topology we’re after. So, by definition, U ∈ T if for every x ∈ U, there exists an ε > 0 so that $B_\varepsilon(x) \subset U$. Notice also that on R or C with the absolute value metric (see Example 1.2), this gives just the usual topology; in fact, the general definition mimics this procedure.

Theorem 1.6. Let X be a metric space, and let T be as above. Then T is a topology on X, and (X, T) is a Hausdorff space. Moreover, $B_r(x)$ is open and
\[ x_n \xrightarrow{d} x \iff x_n \xrightarrow{T} x . \]

Proof. Let’s first check that T is a topology on X. Clearly, ∅, X ∈ T. If $U_\alpha \in T$ and $x \in \bigcup U_\alpha$, then $x \in U_{\alpha_0}$ for some index $\alpha_0$, and since $U_{\alpha_0}$ is open, there exists a ball $B_r(x) \subset U_{\alpha_0}$, but then $B_r(x)$ is also contained in $\bigcup U_\alpha$. Similarly, if $U_1, \ldots, U_N$ are open sets and $x \in \bigcap U_j$, then $x \in U_j$ for all j, so we can find N balls $B_{r_j}(x) \subset U_j$. It follows that $B_r(x) \subset \bigcap U_j$, with $r := \min r_j$.

Next, we prove that $B_r(x) \in T$ for arbitrary r > 0, x ∈ X. Let $y \in B_r(x)$. We want to find a ball about y that is contained in the original ball. Since $y \in B_r(x)$, we have that ε := r − d(x, y) > 0, and I now claim that $B_\varepsilon(y) \subset B_r(x)$. Indeed, if $z \in B_\varepsilon(y)$, then, by the triangle inequality,
\[ d(z, x) \le d(z, y) + d(y, x) < \varepsilon + d(y, x) = r, \]


so $z \in B_r(x)$, as desired. The Hausdorff property also follows from this, because if x ≠ y, then r := d(x, y) > 0, and $B_{r/2}(x)$, $B_{r/2}(y)$ are disjoint neighborhoods of x, y.

Exercise 1.7. It seems intuitively obvious that $B_{r/2}(x)$, $B_{r/2}(y)$ are disjoint. Please prove it formally.

Finally, we discuss convergent sequences. If $x_n \xrightarrow{d} x$ and V is a neighborhood of x, then, by the way T was defined, there exists ε > 0 so that $B_\varepsilon(x) \subset V$. We can find N ∈ N so that $d(x_n, x) < \varepsilon$ for n ≥ N, or, equivalently, $x_n \in B_\varepsilon(x)$ for n ≥ N. So $x_n \in V$ for large enough n. This verifies that $x_n \xrightarrow{T} x$. Conversely, if this is assumed, it is clear that we must also have that $x_n \xrightarrow{d} x$ because we can take $V = B_\varepsilon(x)$ as our neighborhood of x and we know that $x_n \in V$ for all large n. □

In metrizable topological spaces (that is, topological spaces where the topology comes from a metric, in this way) we can always work with sequences. This is a big advantage over general topological spaces.

Theorem 1.7. Let (X, d) be a metric space, and introduce a topology T on X as above. Then:
(a) A ⊂ X is closed ⇐⇒ If $x_n \in A$, x ∈ X, $x_n \to x$, then x ∈ A.
(b) Let B ⊂ X. Then $\overline{B} = \{x \in X : \text{there exists a sequence } x_n \in B,\ x_n \to x\}$.
(c) K ⊂ X is compact precisely if every sequence $x_n \in K$ has a subsequence that is convergent in K.

These statements are false in general topological spaces (where the topology does not come from a metric).

Proof. We will only prove part (a) here. If A is closed and x ∉ A, then, since $A^c$ is open, there exists a ball $B_r(x)$ that does not intersect A. This means that if $x_n \in A$, $x_n \to x$, then we also must have that x ∈ A. Conversely, if the condition on sequences from A holds and x ∉ A, then there must be an r > 0 so that $B_r(x) \cap A = \emptyset$ (if not, pick an $x_n$ from $B_{1/n}(x) \cap A$ for each n; this gives a sequence $x_n \in A$, $x_n \to x$, but x ∉ A, contradicting our assumption). This verifies that $A^c$ is open and thus A is closed. □

Exercise 1.8. Prove Theorem 1.7 (b), (c).


Similarly, sequences can be used to characterize continuity of maps between metric spaces. Again, this doesn’t work on general topological spaces.

Theorem 1.8. Let (X, d), (Y, e) be metric spaces, let f : X → Y be a function, and let x ∈ X. Then the following are equivalent:
(a) f is continuous at x (with respect to the topologies induced by d, e).
(b) For every ε > 0, there exists a δ > 0 so that e(f(x), f(t)) < ε for all t ∈ X with d(x, t) < δ.
(c) If $x_n \to x$ in X, then $f(x_n) \to f(x)$ in Y.

Proof. If (a) holds and ε > 0 is given, then, since $B_\varepsilon(f(x))$ is a neighborhood of f(x), there exists a neighborhood U of x so that $f(U) \subset B_\varepsilon(f(x))$. From the way the topology on a metric space is defined, we see that U must contain a ball $B_\delta(x)$, and (b) follows.

If (b) is assumed, $x_n \to x$, and ε > 0 is given, pick δ > 0 according to (b) and then N ∈ N so that $d(x_n, x) < \delta$ for n ≥ N. But then we also have that $e(f(x), f(x_n)) < \varepsilon$ for all n ≥ N, that is, we have verified that $f(x_n) \to f(x)$.

Finally, if (c) holds, we argue by contradiction to obtain (a). So assume that, contrary to our claim, we can find a neighborhood V of f(x) so that for every neighborhood U of x, there exists t ∈ U so that f(t) ∉ V. In particular, we can then pick an $x_n \in B_{1/n}(x)$ for each n, so that $f(x_n) \notin V$. Since V is a neighborhood of f(x), there exists ε > 0 so that $B_\varepsilon(f(x)) \subset V$. Summarizing, we can say that $x_n \to x$, but $e(f(x_n), f(x)) \ge \varepsilon$; in particular, $f(x_n) \not\to f(x)$. This contradicts (c), and so we have to admit that (a) holds. □

The following fact is often useful:

Proposition 1.9. Let (X, d) be a metric space and Y ⊂ X. As above, write T for the topology generated by d on X. Then (Y, d) is a metric space, too (this is obvious and was also already observed above). Moreover, the topology generated by d on Y is the relative topology of Y as a subspace of (X, T).

Exercise 1.9. Prove this (this is done by essentially chasing definitions, but it is a little awkward to write down).
We conclude this section by proving our first fundamental functional analytic theorem. We need one more topological definition: We call a set M ⊂ X nowhere dense if its closure $\overline{M}$ has empty interior. If X is a metric space, we can also say that M ⊂ X is nowhere dense if $\overline{M}$ contains no (non-empty) open ball.


Theorem 1.10 (Baire). Let X be a complete metric space. If the sets $A_n \subset X$ are nowhere dense, then $\bigcup_{n \in \mathbb{N}} A_n \neq X$.

Completeness is crucial here:

Exercise 1.10. Show that there are (necessarily: non-complete) metric spaces that are countable unions of nowhere dense sets. Suggestion: X = Q

Proof. The following proof is similar in spirit to Cantor’s famous diagonal trick, which proves that [0, 1] is uncountable. We will construct an element that is not in $\bigcup A_n$ by avoiding these sets step by step. First of all, we may assume that the $A_n$’s are closed (if not, replace $A_n$ with $\overline{A_n}$; note that these sets are still nowhere dense). Then, since $A_1$ is nowhere dense, we can find an $x_1 \in A_1^c$. In fact, $A_1^c$ is also open, so we can even find an open ball $B_{r_1}(x_1) \subset A_1^c$, and here we may assume that $r_1 \le 2^{-1}$ (decrease $r_1$ if necessary). In the next step, we pick an $x_2 \in B_{r_1/2}(x_1) \setminus A_2$. There must be such a point because $A_2$ is nowhere dense and thus cannot contain the ball $B_{r_1/2}(x_1)$. Moreover, we can again find $r_2 > 0$ so that
\[ B_{r_2}(x_2) \cap A_2 = \emptyset, \qquad B_{r_2}(x_2) \subset B_{r_1/2}(x_1), \qquad r_2 \le 2^{-2} . \]
We continue in this way and construct a sequence $x_n \in X$ and radii $r_n > 0$ with the following properties:
\[ B_{r_n}(x_n) \cap A_n = \emptyset, \qquad B_{r_n}(x_n) \subset B_{r_{n-1}/2}(x_{n-1}), \qquad r_n \le 2^{-n} . \]

It follows that $x_n$ is a Cauchy sequence. Indeed, if m ≥ n, then $x_m$ lies in $B_{r_n/2}(x_n)$, so
\[ d(x_m, x_n) \le \frac{r_n}{2} . \tag{1.1} \]
Since X is complete, $x := \lim x_n$ exists. Moreover, $d(x_n, x) \le d(x_n, x_m) + d(x_m, x)$ for arbitrary m ∈ N. For m ≥ n, (1.1) shows that $d(x_n, x_m) \le r_n/2$, so if we let m → ∞, it follows that
\[ d(x_n, x) \le \frac{r_n}{2} . \tag{1.2} \]
By construction, $B_{r_n}(x_n) \cap A_n = \emptyset$, so (1.2) says that $x \notin A_n$ for all n. □

Baire’s Theorem can be (and often is) formulated differently. We need one more topological definition: We call a set M ⊂ X dense if $\overline{M} = X$. For example, Q is dense in R. Similarly, $\mathbb{Q}^c$ is also dense in R. However, note that (of course) $\mathbb{Q} \cap \mathbb{Q}^c = \emptyset$.
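The nested-ball construction from the proof of Theorem 1.10 can be traced by hand in a simple case. The sketch below takes X = R and lets each $A_n$ be a single point (a one-point set is closed and nowhere dense in R); the function name `baire_avoid`, the particular choices of $x_n$ and $r_n$, and the finite list of points are assumptions made for illustration. Only finitely many steps are run, so the code produces the centers and radii $(x_n, r_n)$ rather than the limit x itself.

```python
from fractions import Fraction


def baire_avoid(points):
    """Run the construction from the proof of Theorem 1.10 in X = R
    with A_n = {points[n-1]}; returns the list of pairs (x_n, r_n)."""
    x = points[0] + 1      # x_1 lies outside A_1 (distance 1 from it)
    r = Fraction(1, 2)     # B_{r_1}(x_1) misses A_1, and r_1 <= 2^{-1}
    history = [(x, r)]
    for n, a in enumerate(points[1:], start=2):
        # choose x_n in B_{r_{n-1}/2}(x_{n-1}) with x_n not in A_n
        new_x = x if a != x else x + r / 4
        dist = abs(new_x - a)
        # r_n: the ball misses A_n, stays inside the previous half-ball,
        # and r_n <= 2^{-n}
        new_r = min(Fraction(1, 2 ** n), dist, r / 2 - abs(new_x - x))
        x, r = new_x, new_r
        history.append((x, r))
    return history


# Fractions p/q in [0, 1] with small denominators (repetitions are harmless).
points = [Fraction(p, q) for q in range(1, 8) for p in range(q + 1)]
history = baire_avoid(points)
last_x, last_r = history[-1]
print(last_x, last_r)
```

By the inclusions of the balls, the last center stays within $r_n/2$ of every earlier center $x_n$, in line with (1.1), and therefore avoids all of the finitely many points treated so far.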


Theorem 1.11 (Baire). Let X be a complete metric space. If $U_n$ (n ∈ N) are dense open sets, then $\bigcap_{n \in \mathbb{N}} U_n$ is dense.

Exercise 1.11. Derive this from Theorem 1.10. Suggestion: If U is dense and open, then $A = U^c$ is nowhere dense (prove this!). Now apply Theorem 1.10. This will not quite give the full claim, but you can also apply Theorem 1.10 on suitable subspaces of the original space.

An immediate consequence of this, in turn, is the following slightly stronger looking version. By definition, a $G_\delta$ set is a countable intersection of open sets.

Exercise 1.12. Give an example that shows that a $G_\delta$ set need not be open (but, conversely, open sets are of course $G_\delta$ sets).

Theorem 1.12 (Baire). Let X be a complete metric space. Then a countable intersection of dense $G_\delta$ sets is a dense $G_\delta$ set.

Exercise 1.13. Derive this from the previous theorem.

Given this result, it makes sense to interpret dense $G_\delta$ sets as big sets, in a topological sense, and their complements as small sets. Theorem 1.12 then says that even a countable union of small sets will still be small. Call a property of elements of a complete metric space generic if it holds at least on a dense $G_\delta$ set. Theorem 1.12 has a number of humorous applications, which say that certain unexpected properties are in fact generic. Here are two such examples:

Example 1.5. Let X = C[a, b] with metric d(f, g) = max |f(x) − g(x)| (compare Example 1.4). This is a complete metric space (we’ll prove this later). It can now be shown, using Theorem 1.12, that the generic continuous function is nowhere differentiable.

Example 1.6. The generic coin flip refutes the law of large numbers. More precisely, we proceed as follows. Let $X = \{(x_n)_{n \ge 1} : x_n = 0 \text{ or } 1\}$ and $d(x, y) = \sum_{n=1}^\infty 2^{-n} |x_n - y_n|$. This is a metric and X with this metric is complete, but we don’t want to prove this here. In fact, this metric is a natural choice here; it generates the product topology on X.
From probability theory, we know that if the xn are independent random variables and the coin is fair, then, with probability 1, we have that Sn /n → 1/2, where Sn = x1 + . . . + xn is the number of heads (say) in the first n coin tosses.
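The probabilistic statement just quoted is easy to observe in a simulation. The following sketch uses Python's pseudo-random generator with a fixed seed; the function name `fraction_of_heads` and the sample sizes are illustrative choices, and a simulation can only suggest, not prove, almost sure convergence.

```python
import random


def fraction_of_heads(n, seed=0):
    """Simulate n fair coin tosses x_1, ..., x_n in {0, 1} and return S_n / n."""
    rng = random.Random(seed)  # fixed seed, so the run is reproducible
    heads = 0
    for _ in range(n):
        heads += rng.randint(0, 1)  # 1 = heads, 0 = tails
    return heads / n


for n in (10, 1_000, 100_000):
    print(n, fraction_of_heads(n))
```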


The generic behavior is quite different: For a generic sequence x ∈ X,
\[ \liminf_{n \to \infty} \frac{S_n}{n} = 0, \qquad \limsup_{n \to \infty} \frac{S_n}{n} = 1 . \]
Since these examples are for entertainment only, we will not prove these claims here.

Baire’s Theorem is fundamental in functional analysis, and it will have important consequences. We will discuss these in Chapter 3.

Exercise 1.14. Consider the space X = C[0, 1] with the metric $d(f, g) = \max_{0 \le x \le 1} |f(x) - g(x)|$ (compare Example 1.4). Define $f_n \in X$ by
\[ f_n(x) = \begin{cases} 2^n x & 0 \le x \le 2^{-n} \\ 1 & 2^{-n} < x \le 1 \end{cases} . \]
Work out $d(f_n, 0)$ and $d(f_m, f_n)$, and deduce from the results of this calculation that S = {f ∈ X : d(f, 0) = 1} is not compact.

Exercise 1.15. Let X, Y be topological spaces, and let f : X → Y be a continuous map. True or false (please give a proof or a counterexample):
(a) U ⊂ X open =⇒ f(U) open
(b) A ⊂ Y closed =⇒ $f^{-1}(A)$ closed
(c) K ⊂ X compact =⇒ f(K) compact
(d) L ⊂ Y compact =⇒ $f^{-1}(L)$ compact

Exercise 1.16. Let X be a metric space, and define, for x ∈ X and r > 0,
\[ \overline{B}_r(x) = \{ y \in X : d(y, x) \le r \} . \]
(a) Show that $\overline{B}_r(x)$ is always closed.
(b) Show that $\overline{B_r(x)} \subset \overline{B}_r(x)$. (By definition, the first set is the closure of $B_r(x)$.)
(c) Show that it can happen that $\overline{B_r(x)} \neq \overline{B}_r(x)$.


2. Banach spaces

Let X be a complex vector space. So the elements of X (“vectors”) can be added and multiplied by complex numbers (“scalars”), and these operations obey the usual algebraic rules.

Definition 2.1. A map ‖·‖ : X → [0, ∞) is called a norm (on X) if it has the following properties for arbitrary x, y ∈ X, c ∈ C:
(1) ‖x‖ = 0 ⇐⇒ x = 0
(2) ‖cx‖ = |c| ‖x‖
(3) ‖x + y‖ ≤ ‖x‖ + ‖y‖

We may interpret a given norm as assigning a length to a vector. Property (3) is again called the triangle inequality. It has a similar interpretation as in the case of a metric space. A vector space with a norm defined on it is called a normed space. If (X, ‖·‖) is a normed space, then d(x, y) := ‖x − y‖ defines a metric on X.

Exercise 2.1. Prove this remark.

Therefore, all concepts and results from Chapter 1 apply to normed spaces also. In particular, a norm generates a topology on X. We repeat here some of the most basic notions: A sequence $x_n \in X$ is said to converge to x ∈ X if $\|x_n - x\| \to 0$ (note that these norms form a sequence of numbers, so it’s clear how to interpret this latter convergence). We call $x_n$ a Cauchy sequence if $\|x_m - x_n\| \to 0$ as m, n → ∞. The open ball of radius r > 0 about x ∈ X is defined as
\[ B_r(x) = \{ y \in X : \|y - x\| < r \} . \]
This set is indeed open in the topology mentioned above; more generally, an arbitrary set U ⊂ X is open precisely if for every x ∈ U, there exists an r = r(x) > 0 so that $B_r(x) \subset U$. Finally, recall that a space is called complete if every Cauchy sequence converges. Complete normed spaces are particularly important; for easier reference, they get a special name:

Definition 2.2. A Banach space is a complete normed space.

The following basic properties of norms are relatively direct consequences of the definition, but they are extremely important when working on normed spaces.

Exercise 2.2. (a) Prove the second triangle inequality:
\[ \bigl| \|x\| - \|y\| \bigr| \le \|x - y\| \]


(b) Prove that the norm is a continuous map X → R; put differently, if $x_n \to x$, then also $\|x_n\| \to \|x\|$.

Exercise 2.3. Prove that the vector space operations are continuous. In other words, if $x_n \to x$ and $y_n \to y$ (and c ∈ C), then also $x_n + y_n \to x + y$ and $c x_n \to c x$.

Let’s now collect some examples of Banach spaces. It turns out that most of the examples for metric spaces that we considered in Chapter 1 actually have a natural vector space structure and the metric comes from a norm.

Example 2.1. The simplest vector spaces are the finite-dimensional spaces. Every n-dimensional (complex) vector space is isomorphic to $\mathbb{C}^n$, so it will suffice to consider $X = \mathbb{C}^n$. We would like to define norms on this space, and we can in fact turn to Example 1.3 for inspiration. For $x = (x_1, \ldots, x_n) \in X$, let
\[ \|x\|_p = \left( \sum_{j=1}^n |x_j|^p \right)^{1/p} , \tag{2.1} \]
for 1 ≤ p < ∞, and
\[ \|x\|_\infty = \max_{j=1,\ldots,n} |x_j| . \tag{2.2} \]

I claim that this defines a family of norms (one for each p, 1 ≤ p ≤ ∞), but we will not prove this in this setting. Rather, we will right away prove a more general statement in Example 2.2 below. (Only the triangle inequality for 1 < p < ∞ needs serious proof; everything else is fairly easy to check here anyway.)

Example 2.2. We now consider infinite-dimensional versions of the Banach spaces from the previous example. Instead of finite-dimensional vectors $(x_1, \ldots, x_n)$, we now want to work with infinite sequences $x = (x_1, x_2, \ldots)$, and we want to continue to use (2.1), (2.2), or at least something similar. We first of all introduce the maximal spaces on which these formulae seem to make sense. Let
\[ \ell^p = \left\{ x = (x_n)_{n \ge 1} : \sum_{n=1}^\infty |x_n|^p < \infty \right\} \]
(for 1 ≤ p < ∞) and
\[ \ell^\infty = \left\{ x = (x_n)_{n \ge 1} : \sup_{n \ge 1} |x_n| < \infty \right\} . \]


Then, as expected, for $x \in \ell^p$, define
\[ \|x\|_p = \left( \sum_{n=1}^\infty |x_n|^p \right)^{1/p} \quad (p < \infty), \qquad \|x\|_\infty = \sup_{n \ge 1} |x_n| . \]

Proposition 2.3. $\ell^p$ is a Banach space for 1 ≤ p ≤ ∞.

Here, the algebraic operations on $\ell^p$ are defined in the obvious way: we perform them componentwise; for example, x + y is the sequence whose nth element is $x_n + y_n$.

Proof. We will explicitly prove this only for 1 < p < ∞; the cases p = 1, p = ∞ are easier and can be handled by direct arguments. First of all, we must check that $\ell^p$ is a vector space. Clearly, if $x \in \ell^p$ and c ∈ C, then also $cx \in \ell^p$. Moreover, if $x, y \in \ell^p$, then, since
\[ |x_n + y_n|^p \le (2|x_n|)^p + (2|y_n|)^p , \]
we also have that $x + y \in \ell^p$. So addition and multiplication by scalars can be defined on all of $\ell^p$, and it is clear that the required algebraic laws hold because all calculations are performed on individual components, so we just inherit the usual rules from C.

Next, we verify that $\|\cdot\|_p$ is a norm on $\ell^p$. Properties (1), (2) from Definition 2.1 are obvious. The proof of the triangle inequality will depend on the following very important inequality:

Theorem 2.4 (Hölder’s inequality). Let $x \in \ell^p$, $y \in \ell^q$, where p, q satisfy
\[ \frac{1}{p} + \frac{1}{q} = 1 \]
(1/0 := ∞, 1/∞ := 0 in this context). Then $xy \in \ell^1$ and
\[ \|xy\|_1 \le \|x\|_p \|y\|_q . \]

Proof of Theorem 2.4. Again, we focus on the case 1 < p < ∞; if p = 1 or p = ∞, an uncomplicated direct argument is available. The function ln x is concave, that is, the graph lies above line segments connecting any two of its points (formally, this follows from the fact that $(\ln x)'' = -1/x^2 < 0$). In other words, if a, b > 0 and 0 ≤ α ≤ 1, then
\[ \alpha \ln a + (1 - \alpha) \ln b \le \ln\left( \alpha a + (1 - \alpha) b \right) . \]
We apply the exponential function on both sides and obtain that $a^\alpha b^{1-\alpha} \le \alpha a + (1 - \alpha) b$. If we introduce the new variables c, d by writing $a = c^p$,


$b = d^q$, with 1/p = α (so 1/q = 1 − α), then this becomes
\[ cd \le \frac{c^p}{p} + \frac{d^q}{q} . \tag{2.3} \]
This holds for all c, d ≥ 0 (the original argument is valid only if c, d > 0, but of course (2.3) is trivially true if c = 0 or d = 0). In particular, we can use (2.3) with $c = |x_n| / \|x\|_p$, $d = |y_n| / \|y\|_q$ (at least if $\|x\|_p, \|y\|_q \neq 0$, but if that fails, then the claim is trivial anyway) and then sum over n ≥ 1. This shows that
\[ \sum_{n=1}^\infty \frac{|x_n y_n|}{\|x\|_p \|y\|_q} \le \sum_{n=1}^\infty \frac{|x_n|^p}{p \|x\|_p^p} + \sum_{n=1}^\infty \frac{|y_n|^q}{q \|y\|_q^q} = \frac{1}{p} + \frac{1}{q} = 1, \]
so $xy \in \ell^1$, as claimed, and we obtain Hölder’s inequality. □
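Inequality (2.3) and Hölder’s inequality can be spot-checked on finitely supported sequences. The sketch below is only a numerical sanity check under the assumption 1 < p < ∞; the helper names `p_norm`, `young_gap` and `holder_gap` are introduced here and do not appear in the notes. Both gap functions return right-hand side minus left-hand side, so the inequalities predict non-negative values (up to rounding).

```python
def p_norm(x, p):
    """||x||_p = (sum_n |x_n|^p)^(1/p) for a finite sequence x."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def young_gap(c, d, p):
    """c^p/p + d^q/q - c*d, with q the conjugate exponent of p; see (2.3)."""
    q = p / (p - 1.0)
    return c ** p / p + d ** q / q - c * d


def holder_gap(x, y, p):
    """||x||_p * ||y||_q - ||xy||_1, with 1/p + 1/q = 1."""
    q = p / (p - 1.0)
    xy_norm_1 = sum(abs(a * b) for a, b in zip(x, y))
    return p_norm(x, p) * p_norm(y, q) - xy_norm_1


x = [1, -2, 3 + 4j]
y = [0.5, 4, -1]
for p in (1.5, 2, 3, 10):
    print(p, holder_gap(x, y, p))
```

For p = q = 2 and y = x the gap vanishes (up to rounding), so the constant 1 in Hölder’s inequality cannot be improved.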



We are now in a position to establish the triangle inequality on $\ell^p$:

Theorem 2.5 (Minkowski’s inequality = triangle inequality on $\ell^p$). Let $x, y \in \ell^p$. Then $x + y \in \ell^p$ and
\[ \|x + y\|_p \le \|x\|_p + \|y\|_p . \]

Proof of Theorem 2.5. Again, we will discuss explicitly only the case 1 < p < ∞. We already know that $x + y \in \ell^p$. Hölder’s inequality with the given p (and thus q = p/(p − 1)) shows that
\begin{align*}
\|x + y\|_p^p &= \sum |x_n + y_n|^p = \sum |x_n + y_n| \, |x_n + y_n|^{p-1} \\
&\le \sum |x_n| \, |x_n + y_n|^{p-1} + \sum |y_n| \, |x_n + y_n|^{p-1} \\
&\le \left( \|x\|_p + \|y\|_p \right) \|x + y\|_p^{p-1} .
\end{align*}
If x + y ≠ 0, we can divide by $\|x + y\|_p^{p-1}$ to obtain the desired inequality, and if x + y = 0, then the claim is trivial. □

It remains to show that $\ell^p$ is complete. So let $x^{(n)} \in \ell^p$ be a Cauchy sequence (since the elements of $\ell^p$ are themselves sequences, we really have a sequence whose members are sequences; we use a superscript to label the elements of the Cauchy sequence from $X = \ell^p$ to avoid confusion with the index labeling the components of a fixed element of $\ell^p$). Clearly,
\[ \left| x_j^{(m)} - x_j^{(n)} \right|^p \le \| x^{(m)} - x^{(n)} \|_p^p \]
for each fixed j ≥ 1, so $\bigl( x_j^{(n)} \bigr)_{n \ge 1}$ is a Cauchy sequence of complex numbers. Now C is complete, so these sequences have limits in C. Define
\[ x_j = \lim_{n \to \infty} x_j^{(n)} . \]


I claim that $x = (x_j) \in \ell^p$ and $x^{(n)} \to x$ in the norm of $\ell^p$. To verify that $x \in \ell^p$, we observe that for arbitrary N ∈ N,
\[ \sum_{j=1}^N |x_j|^p = \lim_{n \to \infty} \sum_{j=1}^N \left| x_j^{(n)} \right|^p \le \limsup_{n \to \infty} \| x^{(n)} \|_p^p . \]

Exercise 2.4. Let $x_n \in X$ be a Cauchy sequence in a normed space X. Prove that $x_n$ is bounded in the following sense: There exists C > 0 so that $\|x_n\| \le C$ for all n ≥ 1.

Exercise 2.4 now shows that
\[ \sum_{j=1}^N |x_j|^p \le C \]

for some fixed, N-independent constant C, so $x \in \ell^p$, as required. It remains to show that $\|x^{(n)} - x\|_p \to 0$. Let ε > 0 be given and pick $N_0 \in \mathbb{N}$ so large that $\|x^{(n)} - x^{(m)}\|_p < \varepsilon$ if m, n ≥ $N_0$ (this is possible because $x^{(n)}$ is a Cauchy sequence). Then, for fixed N ∈ N, we have that
\[ \sum_{j=1}^N \left| x_j^{(n)} - x_j \right|^p = \lim_{m \to \infty} \sum_{j=1}^N \left| x_j^{(n)} - x_j^{(m)} \right|^p \le \varepsilon^p \]
if n ≥ $N_0$. Since N ∈ N was arbitrary, it also follows that $\|x^{(n)} - x\|_p^p \le \varepsilon^p$, that is, $\|x^{(n)} - x\|_p \le \varepsilon$, for n ≥ $N_0$. □

Similar spaces can be defined for arbitrary index sets I instead of N. For example, by definition, the elements of $\ell^p(I)$ are complex valued functions x : I → C with
\[ \sum_{j \in I} |x_j|^p < \infty . \tag{2.4} \]

If I is uncountable, this sum needs interpretation. We can do this by hand, as follows: (2.4) means that $x_j \neq 0$ only for countably many j ∈ I, and the corresponding sum is finite. Alternatively, and more elegantly, we can also use the counting measure on I and interpret the sum as an integral. If we want to emphasize the fact that we’re using N as the index set, we can also denote the spaces discussed above by $\ell^p(\mathbb{N})$. When no confusion has to be feared, we will usually prefer the shorter notation $\ell^p$. We can also consider finite index sets I = {1, 2, . . . , n}. We then obtain that $\ell^p(\{1, 2, \ldots, n\}) = \mathbb{C}^n$ as a set, and the norms on these spaces are the ones that were already introduced in Example 2.1 above.
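Elements of $\ell^p(I)$ with only finitely many non-zero entries can be modelled as dictionaries that map indices j ∈ I to the values $x_j$, with absent indices standing for zero. This is a simplifying assumption made for the sketch (a general element may have countably infinite support), and the names `lp_norm` and `add` as well as the sample index set are hypothetical. The example also spot-checks Minkowski’s inequality (Theorem 2.5) for two such elements.

```python
def lp_norm(x, p):
    """||x||_p for a finitely supported x: I -> C, stored as {index: value}."""
    return sum(abs(v) ** p for v in x.values()) ** (1.0 / p)


def add(x, y):
    """Componentwise sum; indices missing from a dict count as zero entries."""
    keys = set(x) | set(y)
    return {j: x.get(j, 0) + y.get(j, 0) for j in keys}


# The index set I may be any set of hashable labels, not just {1, ..., n}.
x = {"a": 3 + 4j, ("b", 2): -1.0}
y = {"a": -3.0, 0.25: 2j}

for p in (1, 1.5, 2, 5):
    # left and right sides of Minkowski's inequality
    print(p, lp_norm(add(x, y), p), lp_norm(x, p) + lp_norm(y, p))

# With I = {1, 2} this is the norm (2.1) on C^2.
print(lp_norm({1: 3, 2: 4}, 2))
```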


Example 2.3. Two more spaces of sequences are in common use. In both cases, the index set is usually N (or sometimes Z). Put
\[ c = \left\{ x : \lim_{n \to \infty} x_n \text{ exists} \right\}, \qquad c_0 = \left\{ x : \lim_{n \to \infty} x_n = 0 \right\} . \]
It is clear that $c_0 \subset c \subset \ell^\infty$. In fact, more is true: the smaller spaces are (algebraic linear) subspaces of the bigger spaces. On c and $c_0$, we also use the norm $\|\cdot\|_\infty$ (as on the big space $\ell^\infty$).

Proposition 2.6. c and $c_0$ are Banach spaces.

Proof. We can make use of the observation made above, that $c_0 \subset c \subset \ell^\infty$, and then refer to the following fact:

Proposition 2.7. Let (X, ‖·‖) be a Banach space, and let Y ⊂ X. Then (Y, ‖·‖) is a Banach space if and only if Y is a closed (linear) subspace of X.

Exercise 2.5. Prove Proposition 2.7. Recall that on metric (and thus also normed and Banach) spaces, you can use sequences to characterize topological notions. So a subset is closed precisely if all limits of convergent sequences from the set lie in the set again.

So we only need to show that c and $c_0$ are closed in $\ell^\infty$.

Exercise 2.6. Complete the proof of Proposition 2.6 along these lines. □

Example 2.4. Function spaces provide another very important class of Banach spaces. The discussion is in large parts analogous to our treatment of sequence spaces (Examples 2.2, 2.3); sometimes, sequence spaces are somewhat more convenient to deal with and, as we will see in a moment, they can actually be interpreted as function spaces of a particular type.

Let (X, M, µ) be a measure space (with a positive measure µ). The discussion is most conveniently done in this completely general setting, but if you prefer a more concrete example, you could think of $X = \mathbb{R}^n$ with Lebesgue measure, as what is probably the most important special case. Recalling what we did above, it now seems natural to introduce (for 1 ≤ p < ∞)
\[ \mathcal{L}^p(X, \mu) = \left\{ f : X \to \mathbb{C} : f \text{ measurable}, \int_X |f(x)|^p \, d\mu(x) < \infty \right\} . \]


Note that this set also depends on the σ-algebra M, but this dependence is not made explicit in the notation. We would then like to define
\[ \|f\|_p = \left( \int_X |f|^p \, d\mu \right)^{1/p} . \]

This, however, does not give a norm in general because $\|f\|_p = 0$ precisely if f = 0 almost everywhere, so usually there will be functions of zero “norm” that are not identically equal to zero. Fortunately, there is an easy fix for this problem: we simply identify functions that agree almost everywhere. More formally, we introduce an equivalence relation on $\mathcal{L}^p$, as follows:
\[ f \sim g \iff f(x) = g(x) \text{ for } \mu\text{-almost every } x \in X \]
We then let $L^p$ be the set of equivalence classes:
\[ L^p(X, \mu) = \{ (f) : f \in \mathcal{L}^p(X, \mu) \} , \]
where $(f) = \{ g \in \mathcal{L}^p : g \sim f \}$. We obtain a vector space structure on $L^p$ in the obvious way; for example, (f) + (g) := (f + g) (it needs to be checked here that the equivalence class on the right-hand side is independent of the choice of representatives f, g, but this is obvious from the definitions). Moreover, we can put $\|(f)\|_p := \|f\|_p$; again, it doesn’t matter which function from (f) we take on the right-hand side, so this is well defined.

In the same spirit (“ignore what happens on null sets”), we define
\[ \mathcal{L}^\infty(X, \mu) = \{ f : X \to \mathbb{C} : f \text{ essentially bounded} \} . \]
A function f is called essentially bounded if there are a constant C ≥ 0 and a null set N ⊂ X so that |f(x)| ≤ C for x ∈ X \ N. Such a C is called an essential bound. If f is essentially bounded, its essential supremum is defined as the best essential bound:
\[ \operatorname{ess\,sup} |f(x)| = \inf_{N : \mu(N) = 0} \ \sup_{x \in X \setminus N} |f(x)| = \inf \{ C \ge 0 : \mu(\{ x \in X : |f(x)| > C \}) = 0 \} \]

Exercise 2.7. (a) Prove that both formulae give the same result. (b) Prove that ess sup |f| is itself an essential bound: |f| ≤ ess sup |f| almost everywhere.

Finally, we again let
\[ L^\infty = \{ (f) : f \in \mathcal{L}^\infty \} , \]


and we put $\|(f)\|_\infty = \operatorname{ess\,sup} |f(x)|$.

Strictly speaking, the elements of the spaces $L^p$ are not functions, but equivalence classes of functions. Sometimes, it is important to keep this distinction in mind; for example, it doesn’t make sense to talk about $f(0)$ for an $(f) \in L^1(\mathbb{R}, m)$, say, because $m(\{0\}) = 0$, so we can change $f$ at $x = 0$ without leaving the equivalence class $(f)$. However, for most purposes, no damage is done if, for convenience and as a figure of speech, we simply refer to the elements of $L^p$ as “functions” anyway (as in “let $f$ be a function from $L^1$”, rather than the pedantic and clumsy “let $F$ be an element of $L^1$ and pick a function $f \in \mathcal{L}^1$ that represents the equivalence class $F$”). This convention is in universal use (it is similar to, say, “right lane must exit”).

Proposition 2.8. $L^p(X, \mu)$ is a Banach space for $1 \le p \le \infty$.

We will not give the complete proof of this because the discussion is reasonably close to our previous treatment of $\ell^p$. Again, the two main issues are the triangle inequality and completeness. The proof of the triangle inequality follows the pattern of the above proof very closely. To establish completeness, we (unsurprisingly) need facts from the theory of the Lebesgue integral, so this gives us a good opportunity to review some of these tools. We will give this proof only for $p = 1$ ($1 < p < \infty$ is similar, and $p = \infty$ can again be handled by a rather direct argument).

So let $f_n \in L^1$ be a Cauchy sequence. Pick a subsequence $n_k \to \infty$ so that $\|f_{n_{k+1}} - f_{n_k}\| < 2^{-k}$.

Exercise 2.8. Prove that $n_k$’s with these properties can indeed be found.

Let

$$S_j(x) = \sum_{k=1}^{j} \left| f_{n_{k+1}}(x) - f_{n_k}(x) \right| .$$

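Then $S_j$ is measurable, non-negative, and $S_{j+1} \ge S_j$. So, if we let $S(x) = \lim_{j\to\infty} S_j(x) \in [0, \infty]$, then the Monotone Convergence Theorem shows that

$$\int_X S \, d\mu = \lim_{j\to\infty} \int_X S_j \, d\mu = \lim_{j\to\infty} \sum_{k=1}^{j} \int_X \left| f_{n_{k+1}} - f_{n_k} \right| d\mu = \lim_{j\to\infty} \sum_{k=1}^{j} \left\| f_{n_{k+1}} - f_{n_k} \right\| < \sum_{k=1}^{\infty} 2^{-k} = 1.$$

In particular, $S \in L^1$, and this implies that $S < \infty$ almost everywhere.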


The same conclusion can be obtained from Fatou’s Lemma; let us do this too, to get some additional practice:

$$\int_X S \, d\mu = \int_X \lim_{j\to\infty} S_j \, d\mu = \int_X \liminf_{j\to\infty} S_j \, d\mu \le \liminf_{j\to\infty} \int_X S_j \, d\mu$$

We can conclude the argument as in the preceding paragraph, and we again see that $\int S \, d\mu < 1$, so $S < \infty$ almost everywhere.

For almost every $x \in X$, we can define

$$f(x) := f_{n_1}(x) + \sum_{k=1}^{\infty} \left( f_{n_{k+1}}(x) - f_{n_k}(x) \right);$$

indeed, we just verified that this series actually converges absolutely for almost every $x \in X$. Moreover, the sum is telescoping, so in fact

$$f(x) = \lim_{j\to\infty} f_{n_j}(x)$$

for a.e. $x$. Also,

$$\left| f(x) - f_{n_j}(x) \right| \le \sum_{k=j}^{\infty} \left| f_{n_{k+1}}(x) - f_{n_k}(x) \right| .$$

Since this latter sum is dominated by $S \in L^1$, this shows, first of all, that $|f - f_{n_j}| \in L^1$ and thus also $f \in L^1$ (because $|f| \le |f_{n_j}| + |f - f_{n_j}|$). Moreover, the functions $|f - f_{n_j}|$ satisfy the hypotheses of Dominated Convergence, so we obtain that

$$\lim_{j\to\infty} \int_X \left| f - f_{n_j} \right| d\mu = 0.$$

To summarize: given the Cauchy sequence $f_n \in L^1$, we have constructed a function $f \in L^1$, and $\|f_{n_j} - f\| \to 0$. This is almost what we set out to prove. For the final step, we can refer to the following general fact.

Exercise 2.9. Let $x_n$ be a Cauchy sequence from a metric space $Y$. Suppose that $x_{n_j} \to x$ for some subsequence (and some $x \in Y$). Prove that then in fact $x_n \to x$.

We also saw in this proof that $f_{n_j} \to f$ pointwise almost everywhere. This is an extremely useful fact, so it’s a good idea to state it again (for general $p$).

Corollary 2.9. If $\|f_n - f\|_p \to 0$, then there exists a subsequence $f_{n_j}$ that converges to $f$ pointwise almost everywhere.

Exercise 2.10. Give a (short) direct argument for the case $p = \infty$. Show that in this case, it is not necessary to pass to a subsequence.
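To see why Corollary 2.9 promises only a subsequence when $p < \infty$, it can help to look at the classical “typewriter” sequence of indicator functions on $[0, 1)$. The following Python sketch is an illustration only, not part of the argument above; the helper names (`typewriter_interval`, `f`, `l1_norm`) are made up for this example, and exact rational arithmetic is used so that no rounding is involved.

```python
from fractions import Fraction

def typewriter_interval(n):
    """Interval [a, b) whose indicator function is f_n (n >= 1).

    Write n = 2**k + m with 0 <= m < 2**k; then [a, b) = [m/2**k, (m+1)/2**k).
    """
    k = n.bit_length() - 1
    m = n - 2**k
    return Fraction(m, 2**k), Fraction(m + 1, 2**k)

def f(n, x):
    """Value of the indicator function f_n at the point x."""
    a, b = typewriter_interval(n)
    return 1 if a <= x < b else 0

def l1_norm(n):
    """L^1 norm of f_n with respect to Lebesgue measure: the interval length."""
    a, b = typewriter_interval(n)
    return b - a

x = Fraction(1, 3)

# The L^1 norms tend to 0, so f_n -> 0 in L^1 ...
norms = [l1_norm(n) for n in (1, 2, 4, 8, 16, 1024)]

# ... but in every block 2**k <= n < 2**(k+1) exactly one f_n equals 1 at x,
# so f_n(x) does not converge.
hits_per_block = [sum(f(n, x) for n in range(2**k, 2**(k + 1))) for k in range(8)]

# Along n_j = 2**j the intervals are [0, 2**-j), so f_{n_j}(x) = 0 as soon as
# 2**-j <= x; this subsequence converges to 0 at every x > 0.
subsequence_values = [f(2**j, x) for j in range(8)]
```

The same computation works for every $x \in [0, 1)$ in place of $1/3$: the full sequence $f_n(x)$ takes the value 1 infinitely often, while the subsequence $f_{2^j}$ converges to 0 at every point except $x = 0$, which is a null set.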


If $I$ is an arbitrary set (the case $I = \mathbb{N}$ is of particular interest here), $\mathcal{M} = \mathcal{P}(I)$ and $\mu$ is the counting measure on $I$ (so $\mu(A)$ equals the number of elements of $A$), then $L^p(I, \mu)$ is the space $\ell^p(I)$ that was discussed earlier, in Example 2.2. Note that on this measure space, the only null set is the empty set, so there’s no difference between $\mathcal{L}^p$ and $L^p$ here.

Example 2.5. Our final example can perhaps be viewed as a mere variant of $L^\infty$, but this space will become very important for us later on. We start out with a compact Hausdorff space $K$. A popular choice would be $K = [a, b]$, with the usual topology, but the general case will also be needed. We now consider

$$C(K) = \{ f : K \to \mathbb{C} : f \text{ continuous} \},$$

with the norm

$$\|f\| = \|f\|_\infty = \max_{x \in K} |f(x)| .$$

The maximum exists because $|f|(K)$, being a continuous image of a compact space, is a compact subset of $\mathbb{R}$. As anticipated, we then have the following:

Proposition 2.10. $\|\cdot\|_\infty$ is a norm on $C(K)$, and $C(K)$ with this norm is a Banach space.

The proof is very similar to the corresponding discussion of $L^\infty$; I don’t want to discuss it in detail here. In fact, if there is a measure on $K$ that gives positive weight to all non-empty open sets (such as Lebesgue measure on $[a, b]$), then $C(K)$ can be thought of as a subspace of $L^\infty$.

Exercise 2.11. Can you imagine why we want the measure to give positive weight to open sets? Hint: Note that the elements of $C(K)$ are genuine functions, while the elements of $L^\infty(K, \mu)$ were defined as equivalence classes of functions, so if we want to think of $C(K)$ as a subset of $L^\infty$, we need a way to identify continuous functions with equivalence classes.

Exercise 2.12. Prove that $C(K)$ is complete.

In the sequel, we will be interested mainly in linear maps between Banach spaces (and not so much in the spaces themselves). More generally, let $X$, $Y$ be normed spaces. Recall that a map $A : X \to Y$ is called linear if $A(x_1 + x_2) = Ax_1 + Ax_2$ and $A(cx) = cAx$. In functional analysis, we usually refer to linear maps as (linear) operators. The null


space (or kernel) and the range (or image) of an operator $A$ are defined as follows:

$$N(A) = \{ x \in X : Ax = 0 \}, \qquad R(A) = \{ Ax : x \in X \}$$

Theorem 2.11. Let $A : X \to Y$ be a linear operator. Then the following are equivalent:
(a) $A$ is continuous (everywhere);
(b) $A$ is continuous at $x = 0$;
(c) $A$ is bounded: There exists a constant $C \ge 0$ so that $\|Ax\| \le C\|x\|$ for all $x \in X$.

Proof. (a) $\Longrightarrow$ (b): This is trivial.

(b) $\Longrightarrow$ (c): Suppose that $A$ were not bounded. Then we can find, for every $n \in \mathbb{N}$, a vector $x_n \in X$ so that $\|Ax_n\| > n\|x_n\|$. Let $y_n = (1/(n\|x_n\|))\,x_n$. Then $\|y_n\| = 1/n$, so $y_n \to 0$, but $\|Ay_n\| > 1$, so $Ay_n$ cannot go to the zero vector, contradicting (b).

(c) $\Longrightarrow$ (a): Suppose that $x_n \to x$. We want to show that then also $Ax_n \to Ax$, and indeed this follows immediately from the linearity and boundedness of $A$:

$$\|Ax_n - Ax\| = \|A(x_n - x)\| \le C\|x_n - x\| \to 0$$

□

Given two normed spaces $X$, $Y$, we introduce the space $B(X, Y)$ of bounded (or continuous) linear operators from $X$ to $Y$. The special case $X = Y$ is of particular interest; in this case, we usually write $B(X)$ instead of $B(X, X)$. $B(X, Y)$ becomes a vector space if addition and multiplication by scalars are defined in the obvious way (for example, $(A + B)x := Ax + Bx$). We can go further and also introduce a norm on $B(X, Y)$, as follows:

$$\|A\| = \sup_{x \ne 0} \frac{\|Ax\|}{\|x\|}$$

Since $A$ is assumed to be bounded here, the supremum will be finite. We call $\|A\|$ the operator norm of $A$ (that this is a norm will be seen in Theorem 2.12 below). There are a number of ways to compute $\|A\|$.
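For a concrete feel for the definition, one can estimate the supremum numerically in a small finite-dimensional case. The Python sketch below is only an illustration under assumptions that are not in the notes: it takes $X = Y = \mathbb{R}^2$ with the norm $\|x\|_1 = |x_1| + |x_2|$, an arbitrarily chosen $2 \times 2$ matrix `A`, and a finite sample of directions. Every sampled quotient $\|Ax\|_1/\|x\|_1$ is a lower bound for $\|A\|$; for this particular norm, the supremum is known to equal the largest absolute column sum of the matrix, which the code uses for comparison.

```python
import math

# An arbitrary example matrix, acting on R^2 equipped with the 1-norm.
A = [[1.0, 2.0],
     [3.0, 4.0]]

def norm1(v):
    """The norm ||v||_1 = sum of the absolute values of the entries."""
    return sum(abs(t) for t in v)

def apply(matrix, x):
    """Matrix-vector product."""
    return [sum(a * t for a, t in zip(row, x)) for row in matrix]

def quotient(matrix, x):
    """The quotient ||Ax|| / ||x|| from the definition of the operator norm."""
    return norm1(apply(matrix, x)) / norm1(x)

# Sample non-zero vectors in 360 equally spaced directions.
samples = [(math.cos(2 * math.pi * k / 360), math.sin(2 * math.pi * k / 360))
           for k in range(360)]

# Largest sampled quotient: a lower bound for the operator norm.
estimate = max(quotient(A, x) for x in samples)

# Largest absolute column sum: the exact operator norm for the 1-norm.
exact = max(sum(abs(A[i][j]) for i in range(2)) for j in range(2))
```

Here the sample happens to contain (up to rounding) the unit vector $(0, 1)$, at which the supremum is attained, so `estimate` agrees with `exact` to within floating-point error. In general, sampling only gives a lower bound.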


Exercise 2.13. Prove the following formulae for $\|A\|$ (for $A \in B(X, Y)$):

$$\|A\| = \inf \{ C \ge 0 : \|Ax\| \le C\|x\| \text{ for all } x \in X \} = \min \{ C \ge 0 : \|Ax\| \le C\|x\| \text{ for all } x \in X \},$$

$$\|A\| = \sup_{\|x\| = 1} \|Ax\|$$

In particular, we have that $\|Ax\| \le \|A\|\,\|x\|$, and $\|A\|$ is the smallest constant for which this inequality holds.

Exercise 2.14. However, it is not necessarily true that $\|A\| = \max_{\|x\|=1} \|Ax\|$. Provide an example of such an operator $A$. Suggestion: $X = Y = c_0$ (or $\ell^1$ if you prefer, this also works very smoothly), and define $(Ax)_n = a_n x_n$, where $a_n$ is a suitably chosen bounded sequence.

Theorem 2.12. (a) $B(X, Y)$ with the operator norm is a normed space.
(b) If $Y$ is a Banach space, then $B(X, Y)$ (with the operator norm) is a Banach space.

The special case $Y = \mathbb{C}$ (recall that this is a Banach space if we use the absolute value as the norm) is particularly important. We use the alternative notation $X^* = B(X, \mathbb{C})$, and we call the elements of $X^*$ (continuous, linear) functionals. $X^*$ itself is called the dual space (or just the dual) of $X$. This must not be confused with the dual space from linear algebra, which is defined as the set of all linear maps from the original vector space back to its base field (considered as a vector space also). This is of limited use in functional analysis. The (topological) dual $X^*$ consists only of continuous maps; it is usually much smaller than the algebraic dual described above.

Proof. (a) We observed earlier that $B(X, Y)$ is a vector space, so we need to check that the operator norm satisfies the properties from Definition 2.1. First of all, we will have $\|A\| = 0$ precisely if $Ax = 0$ for all $x \in X$, that is, precisely if $A$ is the zero map or, put differently, $A = 0$ in $B(X, Y)$. Next, if $c \in \mathbb{C}$ and $A \in B(X, Y)$, then

$$\|cA\| = \sup_{\|x\|=1} \|cAx\| = \sup_{\|x\|=1} |c|\,\|Ax\| = |c|\,\|A\| .$$

A similar calculation establishes the third property from Definition 2.1:

$$\|A + B\| = \sup_{\|x\|=1} \|(A + B)x\| \le \sup_{\|x\|=1} \left( \|Ax\| + \|Bx\| \right) \le \|A\| + \|B\|$$


(b) Let $A_n$ be a Cauchy sequence from $B(X, Y)$. We must show that $A_n$ converges. Observe that for fixed $x$, $A_n x$ will be a Cauchy sequence in $Y$. Indeed, $\|A_m x - A_n x\| \le \|A_m - A_n\|\,\|x\|$ can be made arbitrarily small by taking both $m$ and $n$ large enough. Since $Y$ is now assumed to be complete, the limits $Ax := \lim_{n\to\infty} A_n x$ exist, and we can define a map $A$ on $X$ in this way. We first check that $A$ is linear:

$$A(x_1 + x_2) = \lim_{n\to\infty} A_n(x_1 + x_2) = \lim_{n\to\infty} (A_n x_1 + A_n x_2) = \lim_{n\to\infty} A_n x_1 + \lim_{n\to\infty} A_n x_2 = Ax_1 + Ax_2,$$

and a similar (if anything, this is easier) argument shows that $A(cx) = cAx$. $A$ is also bounded because

$$\|Ax\| = \left\| \lim_{n\to\infty} A_n x \right\| = \lim_{n\to\infty} \|A_n x\| \le \left( \sup_n \|A_n\| \right) \|x\| ;$$

the supremum is finite because $\big| \|A_m\| - \|A_n\| \big| \le \|A_m - A_n\|$, so $\|A_n\|$ forms a Cauchy sequence of real numbers and thus is convergent and, in particular, bounded. Notice also that we used the continuity of the norm for the second equality (see Exercise 2.2(b)).

Summing up: we have constructed a map $A$ and confirmed that in fact $A \in B(X, Y)$. The final step will be to show that $A_n \to A$, with respect to the operator norm in $B(X, Y)$. Let $x \in X$, $\|x\| = 1$. Then, by the continuity of the norm again,

$$\|(A - A_n)x\| = \lim_{m\to\infty} \|(A_m - A_n)x\| \le \limsup_{m\to\infty} \|A_m - A_n\| .$$

Since $x$ was arbitrary, it also follows that

$$\|A - A_n\| \le \limsup_{m\to\infty} \|A_m - A_n\| .$$

Since $A_n$ is a Cauchy sequence, the lim sup can be made arbitrarily small by taking $n$ large enough. □

Perhaps somewhat surprisingly, there are discontinuous linear maps if the first space, $X$, is infinite-dimensional. We can then even take $Y = \mathbb{C}$. An abstract construction can be done as follows: Let $\{e_\alpha\}$ be an algebraic basis of $X$ (that is, every $x \in X$ can be written in a unique way as a linear combination of (finitely many) $e_\alpha$’s). For arbitrary complex numbers $c_\alpha$, there exists a linear map $A : X \to \mathbb{C}$ with $Ae_\alpha = c_\alpha \|e_\alpha\|$.
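A concrete instance of this construction may be helpful. The Python sketch below rests on assumptions that are not in the notes: it takes for $X$ the normed (but not complete) space of finitely supported sequences with the sup norm, for which the unit sequences $e_n$ form an algebraic basis with $\|e_n\| = 1$, and it makes the particular choice $c_n = n$. Sequences are stored as dictionaries `{index: value}`; the names `sup_norm`, `e` and `A` are made up for this illustration.

```python
def sup_norm(x):
    """Sup norm of a finitely supported sequence stored as {index: value}."""
    return max((abs(v) for v in x.values()), default=0.0)

def e(n):
    """The basis vector e_n: entry 1 in position n, zero elsewhere."""
    return {n: 1.0}

def A(x):
    """The linear functional determined by A e_n = c_n * ||e_n|| with c_n = n."""
    return sum(n * v for n, v in x.items())

# Quotients |A e_n| / ||e_n|| for a few basis vectors: they grow without bound,
# so no constant C with |Ax| <= C ||x|| for all x can exist.
quotients = [abs(A(e(n))) / sup_norm(e(n)) for n in (1, 10, 100, 1000)]
```

Since each $e_n$ has norm 1 while $|Ae_n| = n$, the quotients in `quotients` are exactly $1, 10, 100, 1000$, and they continue to grow with $n$.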


Exercise 2.15. This problem reviews the linear algebra fact needed here. Let $V$, $W$ be vector spaces (over $\mathbb{C}$, say), and let $\{e_\alpha\}$ be a basis of $V$. Show that for every collection of vectors $w_\alpha \in W$, there exists a unique linear map $A : V \to W$ so that $Ae_\alpha = w_\alpha$ for all $\alpha$.

Since $\|Ae_\alpha\| / \|e_\alpha\| = |c_\alpha|$, we see that $A$ cannot be bounded if $\sup_\alpha |c_\alpha| = \infty$.

On the other hand, if $\dim X < \infty$, then linear operators $A : X \to Y$ are always bounded. We will see this in a moment; before we do this, we introduce a new concept and prove a related result.

Definition 2.13. Two norms on a common space $X$ are called equivalent if they generate the same topology.

This can be restated in a less abstract way:

Proposition 2.14. The norms $\|\cdot\|_1$, $\|\cdot\|_2$ are equivalent if and only if there are constants $C_1, C_2 > 0$ so that

$$C_1 \|x\|_1 \le \|x\|_2 \le C_2 \|x\|_1 \tag{2.5}$$

for all $x \in X$.
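As a quick sanity check of what (2.5) asks for, here is a small Python sketch. It is an illustration under assumptions of our own choosing, not part of the notes: the space is $\mathbb{R}^n$ with $n = 5$, the first norm is the sum norm $\sum |x_j|$, and the second norm is the maximum norm $\max |x_j|$. For this pair, (2.5) holds with $C_1 = 1/n$ and $C_2 = 1$, which is the same as the chain $\max |x_j| \le \sum |x_j| \le n \max |x_j|$ tested below on random vectors.

```python
import random

def sum_norm(x):
    """The sum norm: sum of the absolute values of the entries."""
    return sum(abs(t) for t in x)

def max_norm(x):
    """The maximum norm: largest absolute value of an entry."""
    return max(abs(t) for t in x)

random.seed(0)  # fixed seed so that the experiment is reproducible
n = 5
vectors = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(1000)]

# Check max_norm <= sum_norm <= n * max_norm on the sample
# (the tiny tolerance only guards against floating-point rounding).
inequalities_hold = all(
    max_norm(x) <= sum_norm(x) <= n * max_norm(x) + 1e-12 for x in vectors
)

# Both constants are attained, so they cannot be improved:
first_basis_vector = [1.0, 0.0, 0.0, 0.0, 0.0]   # max_norm = sum_norm
all_ones = [1.0] * n                             # sum_norm = n * max_norm
```

A random test of this kind proves nothing, of course; it merely shows the shape of the statement. The two special vectors at the end show that neither constant can be improved for this pair of norms.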

Proof. Consider the identity as a map from $(X, \|\cdot\|_1)$ to $(X, \|\cdot\|_2)$. Clearly, this is bijective, and, by Theorem 2.11, this map and its inverse are continuous precisely if (2.5) holds. Put differently, (2.5) is equivalent to the identity map being a homeomorphism (a bijective continuous map with continuous inverse), and this holds if and only if $(X, \|\cdot\|_1)$ and $(X, \|\cdot\|_2)$ have the same topology. □

Exercise 2.16. (a) Let $\|\cdot\|_1$, $\|\cdot\|_2$ be equivalent norms on $X$. Show that then $(X, \|\cdot\|_1)$ and $(X, \|\cdot\|_2)$ are either both complete or both not complete.
(b) Construct a metric $d$ on $\mathbb{R}$ that produces the usual topology, but $(\mathbb{R}, d)$ is not complete. (Since $(\mathbb{R}, |\cdot|)$ has the same topology and is complete, this shows that the analog of (a) for metric spaces is false.)

Theorem 2.15. Let $X$ be a (complex) vector space with $\dim X < \infty$. Then all norms on $X$ are equivalent.

In particular, by combining Example 2.1 with Exercise 2.16, we see that finite-dimensional normed spaces are automatically complete and thus Banach spaces.

Proof. By fixing a basis on $X$, we may assume that $X = \mathbb{C}^n$. We will show that every norm on $\mathbb{C}^n$ is equivalent to $\|\cdot\|_1$. We will do this by verifying (2.5). So let $\|\cdot\|$ be a norm. Then, first of all,

$$\|x\| = \left\| \sum_{j=1}^{n} x_j e_j \right\| \le \sum_{j=1}^{n} |x_j| \, \|e_j\| \le \left( \max_{j=1,\ldots,n} \|e_j\| \right) \|x\|_1 . \tag{2.6}$$


To obtain the other inequality, consider again the identity as a map from $(\mathbb{C}^n, \|\cdot\|_1)$ to $(\mathbb{C}^n, \|\cdot\|)$. As we have just seen in (2.6), this map is bounded, thus continuous. Since a norm always defines a continuous map, we obtain that the composite map from $(\mathbb{C}^n, \|\cdot\|_1)$ to $\mathbb{R}$, $x \mapsto \|x\|$, is also continuous. Now $\{x \in \mathbb{C}^n : \|x\|_1 = 1\}$ is a compact subset of $\mathbb{C}^n$, with respect to the topology generated by $\|\cdot\|_1$ (which is just the usual topology on $\mathbb{C}^n$). Therefore, the image under our map, which is given by $\{\|x\| : \|x\|_1 = 1\}$, is a compact subset of $\mathbb{R}$, and it doesn’t contain zero, so

$$\inf_{\|x\|_1 = 1} \|x\| = \min_{\|x\|_1 = 1} \|x\| =: c > 0,$$

and the homogeneity of norms now implies that $\|x\| \ge c\|x\|_1$ for all $x \in \mathbb{C}^n$, as required. □

Corollary 2.16. Suppose that $\dim X < \infty$, and let $A : X \to Y$ be a linear operator. Then $A$ is bounded.

Proof. By Theorem 2.15, it suffices to discuss the case $X = \mathbb{C}^n$, equipped with the norm $\|\cdot\|_1$. As above, we estimate

$$\|Ax\| = \left\| A \left( \sum_{j=1}^{n} x_j e_j \right) \right\| \le \sum_{j=1}^{n} |x_j| \, \|Ae_j\| \le \left( \max_{j=1,\ldots,n} \|Ae_j\| \right) \|x\|_1 .$$

□

We conclude this chapter by discussing sums and quotients of Banach spaces. Let $X_1, \ldots, X_n$ be Banach spaces. We form their direct sum (as vector spaces). More precisely, we introduce $X = \{(x_1, \ldots, x_n) : x_j \in X_j\}$; this becomes a vector space in the obvious way: the algebraic operations are defined componentwise. Of course, we want more: We want to introduce a norm on $X$ that makes $X$ a Banach space, too. This can be done in several ways; for example, the following works.

Theorem 2.17. $\|x\| = \sum_{j=1}^{n} \|x_j\|_j$ defines a norm on $X$, and with this norm, $X$ is a Banach space.

Exercise 2.17. Prove Theorem 2.17.

We will denote this new Banach space by $X = \bigoplus_{j=1}^{n} X_j$.

Moving on to quotients now, we consider a Banach space $X$ and a closed subspace $M \subset X$.


Exercise 2.18. (a) In general, subspaces need not be closed. Give an example of a dense subspace $M \subset \ell^1$, $M \ne \ell^1$ (in other words, we want $\overline{M} = \ell^1$, $M \ne \ell^1$; in particular, such an $M$ is definitely not closed). (b) What about open subspaces of a normed space?

Exercise 2.19. However, show that finite-dimensional subspaces of a normed space are always closed. Suggestion: Use Theorem 2.15.

As a vector space, we define the quotient $X/M$ as the set of equivalence classes $(x)$, $x \in X$, where $x, y \in X$ are equivalent if $x - y \in M$. So $(x) = x + M = \{x + m : m \in M\}$, and to obtain a vector space structure on $X/M$, we do all calculations with representatives. In other words, $(x) + (y) := (x + y)$, $c(x) := (cx)$, and this is well defined, because the right-hand sides are independent of the choice of representatives $x$, $y$.

Theorem 2.18. $\|(x)\| := \inf_{y \in (x)} \|y\|$ defines a norm on $X/M$, and $X/M$ with this norm is a Banach space.

Proof. First of all, we must check the conditions from Definition 2.1. We have that $\|(x)\| = 0$ precisely if there are $m_n \in M$ so that $\|x - m_n\| \to 0$. This holds if and only if $x \in \overline{M}$, but $M$ is assumed to be closed, so $\|(x)\| = 0$ if and only if $x \in M$, that is, if and only if $x$ represents the zero vector from $X/M$ (equivalently, $(x) = (0)$). If $c \in \mathbb{C}$, $c \ne 0$, then

$$\|c(x)\| = \|(cx)\| = \inf_{m \in M} \|cx - m\| = \inf_{m \in M} \|cx - cm\| = |c| \inf_{m \in M} \|x - m\| = |c| \, \|(x)\| .$$

If $c = 0$, then this identity ($\|0(x)\| = 0\,\|(x)\|$) is also true and in fact trivial. The triangle inequality follows from a similar calculation:

$$\|(x) + (y)\| = \|(x + y)\| = \inf_{m \in M} \|x + y - m\| = \inf_{m, n \in M} \|x + y - m - n\| \le \inf_{m, n \in M} \left( \|x - m\| + \|y - n\| \right) = \|(x)\| + \|(y)\|$$

Finally, we show that $X/M$ is complete. Let $(x_n)$ be a Cauchy sequence. Pass again to a subsequence, so that $\|(x_{n_{j+1}}) - (x_{n_j})\| < 2^{-j}$ (see Exercise 2.8). Since the quotient norm was defined as the infimum of the norms of the representatives, we can now also (inductively) find representatives (we may assume that these are the $x_n$’s themselves) so that $\|x_{n_{j+1}} - x_{n_j}\| < 2^{-j}$. Since $\sum 2^{-j} < \infty$, it follows that $x_{n_j}$ is a Cauchy sequence in $X$, so $x = \lim_{j\to\infty} x_{n_j}$ exists. But then we also have that

$$\|(x) - (x_{n_j})\| \le \|x - x_{n_j}\| \to 0,$$


so a subsequence of the original Cauchy sequence $(x_n)$ converges, and this forces the whole sequence to converge; see Exercise 2.9. □

Exercise 2.20. Let $X$ be a normed space, and define $\overline{B}_r(x) = \{y \in X : \|x - y\| \le r\}$. Show that $\overline{B}_r(x) = \overline{B_r(x)}$, where the right-hand side is the closure of the (open) ball $B_r(x)$. (Compare Exercise 1.16, which discussed the analogous problem on metric spaces.)

Exercise 2.21. Call a subset $B$ of a Banach space $X$ bounded if there exists $C \ge 0$ so that $\|x\| \le C$ for all $x \in B$. (a) Show that if $K \subset X$ is compact, then $K$ is closed and bounded. (b) Consider $X = \ell^\infty$, $B = \overline{B}_1(0) = \{x \in \ell^\infty : \|x\| \le 1\}$. Show that $B$ is closed and bounded, but not compact (in fact, the closed unit ball of an infinite-dimensional Banach space is never compact).

Exercise 2.22. If $x_n$ are elements of a normed space $X$, we define, as usual, the series $\sum_{n=1}^{\infty} x_n$ as the limit as $N \to \infty$ of the partial sums $S_N = \sum_{n=1}^{N} x_n$, if this limit exists (of course, this limit needs to be taken with respect to the norm, so $S = \sum_{n=1}^{\infty} x_n$ means that $\|S - S_N\| \to 0$). Otherwise, the series is said to be divergent. Call a series absolutely convergent if $\sum_{n=1}^{\infty} \|x_n\| < \infty$. Prove that a normed space is complete if and only if every absolutely convergent series converges.

Exercise 2.23. Find the operator norm of the identity map ($x \mapsto x$) as an operator (a) from $(\mathbb{C}^n, \|\cdot\|_1)$ to $(\mathbb{C}^n, \|\cdot\|_2)$; (b) from $(\mathbb{C}^n, \|\cdot\|_2)$ to $(\mathbb{C}^n, \|\cdot\|_1)$.

Exercise 2.24. Find the operator norms of the following operators on $\ell^2(\mathbb{Z})$. In particular, prove that these operators are bounded.

$$(Ax)_n = x_{n+1} + x_{n-1}, \qquad (Bx)_n = \frac{n^2}{n^2 + 1} \, x_n$$

Exercise 2.25. Let $X$, $Y$, $Z$ be Banach spaces, and let $S \in B(X, Y)$, $T \in B(Y, Z)$. Show that the composition $TS$ lies in $B(X, Z)$ and $\|TS\| \le \|T\|\,\|S\|$. Show also that strict inequality is possible here. Give an example; as always, it’s sound strategy to try to keep this as simple as possible. Here, finite-dimensional spaces $X$, $Y$, $Z$ should suffice.

Exercise 2.26. Let $X$, $Y$ be Banach spaces and let $M$ be a dense subspace of $X$ (there is nothing unusual about that on infinite-dimensional


spaces; compare Exercise 2.18). Prove the following: Every $A_0 \in B(M, Y)$ has a unique continuous extension to $X$. Moreover, if we call this extension $A$, then $A \in B(X, Y)$ (by construction, $A$ is continuous, so we’re now claiming that $A$ is also linear), and $\|A\| = \|A_0\|$.

Exercise 2.27. (a) Let $A \in B(X, Y)$. Prove that $N(A)$ is a closed subspace of $X$.
(b) Now assume that $F$ is a linear functional on $X$, that is, a linear map $F : X \to \mathbb{C}$. Show that $F$ is continuous if $N(F)$ is closed (so, for linear functionals, continuity is equivalent to $N(F)$ being closed). Suggestion: Suppose $F$ is not continuous, so that we can find $x_n \in X$ with $\|x_n\| = 1$ and $|F(x_n)| \ge n$, say. Also, fix another vector $z \notin N(F)$ (what if $N(F) = X$?). Use these data to construct a sequence $y_n \in N(F)$ that converges to a vector not from $N(F)$. (If this doesn’t seem helpful, don’t give up just yet, but try something else; the proof is quite short.)


3. Consequences of Baire’s Theorem

In this chapter, we discuss four fundamental functional analytic theorems that are direct descendants of Baire’s Theorem (Theorem 1.10). All four results have a somewhat paradoxical character; the assumptions look too weak to give the desired conclusions, but somehow we get these anyway.

Theorem 3.1 (Uniform boundedness principle). Let $X$ be a Banach space and let $Y$ be a normed space. Assume that $\mathcal{F} \subset B(X, Y)$ is a family of bounded linear operators that is bounded pointwise in the following sense: For each $x \in X$, there exists $C_x \ge 0$ so that $\|Ax\| \le C_x$ for all $A \in \mathcal{F}$. Then $\mathcal{F}$ is uniformly bounded, that is, $\sup_{A \in \mathcal{F}} \|A\| < \infty$.

Proof. Let $M_n = \{x \in X : \|Ax\| \le n \text{ for all } A \in \mathcal{F}\}$. Then $M_n$ is a closed subset of $X$. Indeed, we can write

$$M_n = \bigcap_{A \in \mathcal{F}} \{x \in X : \|Ax\| \le n\},$$

and these sets are closed because they are the inverse images under $A$ of the closed ball $\overline{B}_n(0)$. Moreover, the assumption that $\mathcal{F}$ is pointwise bounded says that $\bigcup_{n \in \mathbb{N}} M_n = X$. Therefore, by Baire’s Theorem, at least one of the $M_n$’s is not nowhere dense. Fix such an $n$, and let $B_r(x_0)$ be an open ball contained in $M_n$. In other words, we now know that if $\|y - x_0\| < r$, then $\|Ay\| \le n$ for all $A \in \mathcal{F}$. In particular, if $x \in X$ is arbitrary with $\|x\| = 1$, then $y = x_0 + (r/2)x$ is such a vector and thus

$$\|Ax\| = \frac{2}{r} \|A(y - x_0)\| \le \frac{2}{r} \left( \|Ay\| + \|Ax_0\| \right) \le \frac{2}{r} \left( n + C_{x_0} \right) \equiv D.$$

The constant $D$ is independent of $x$, so we also obtain that $\|A\| \le D$, and since $D$ is also independent of $A \in \mathcal{F}$, this is what we claimed. □

Theorem 3.2 (The open mapping theorem). Let $X$, $Y$ be Banach spaces, and assume that $A \in B(X, Y)$ is surjective (that is, $R(A) = Y$). Then $A$ is an open map: if $U \subset X$ is open, then $A(U)$ is also open (in $Y$).

The condition defining an open map is of course similar to the corresponding property of continuous maps (see Proposition 1.5), but it goes in the other direction. In particular, that means that the inverse of an open map, if it exists, is continuous. Therefore, the open mapping theorem has the following consequence:


Corollary 3.3. Let $X$, $Y$ be Banach spaces, and assume that $A \in B(X, Y)$ is bijective. Then $A^{-1} \in B(Y, X)$.

Exercise 3.1. Prove the following linear algebra fact: The inverse of an invertible linear map is linear.

Proof. By Exercise 3.1, $A^{-1}$ is linear. By the open mapping theorem and the subsequent remarks, $A^{-1}$ is continuous. □

Proof of Theorem 3.2. Let $U \subset X$ be an open set, and let $y \in A(U)$, so $y = Ax$ for some $x \in U$ (perhaps there are several such $x$, but then we just pick one of these). We want to show that there exists $r > 0$ so that $B_r(y) \subset A(U)$. Since $y \in A(U)$ was arbitrary, this will prove that $A(U)$ is open.

We know that $B_\epsilon(x) \subset U$ for some $\epsilon > 0$, so it actually suffices to discuss the case where $U = B_\epsilon(x)$. In fact, this can be further reduced: it is enough to consider $x, y = 0$, and it then suffices to show that for some $R > 0$, the set $A(B_R(0))$ contains a ball $B_r(0)$ for some $r > 0$. Indeed, if this holds, then, using the linearity of $A$, we will also obtain that

$$A(B_\epsilon(x)) = Ax + \frac{\epsilon}{R} A(B_R(0)) \supset Ax + \frac{\epsilon}{R} B_r(0) = B_{\epsilon r/R}(Ax) = B_{\epsilon r/R}(y),$$

and this is exactly what we originally wanted to show.

Since $A$ is surjective, we have that

$$Y = \bigcup_{n \in \mathbb{N}} A(B_n(0)) = \bigcup_{n \in \mathbb{N}} \overline{A(B_n(0))} .$$

By Baire’s Theorem, one of the closed sets in the second union has to contain an open ball, say $B_r(v) \subset \overline{A(B_n(0))}$. In other words, $B_r(0) \subset \overline{A(B_n(0))} - v$. Now again $v = Au$ for some $u \in X$, so

$$B_r(0) \subset \overline{A(B_n(0))} - Au = \overline{A(B_n(-u))}, \tag{3.1}$$

and if we take $N \ge n + \|u\|$, then $B_N(0) \supset B_n(-u)$, so

$$B_r(0) \subset \overline{A(B_N(0))} . \tag{3.2}$$

Except for the closure, this is what we wanted to show.

Exercise 3.2. In (3.1), we used the following fact: If $M \subset X$ and $x \in X$, then $\overline{M + x} = \overline{M} + x$. Prove this and also the analogous property that $\overline{cM} = c\overline{M}$ ($c \in \mathbb{C}$).

We will now finish the proof by showing that $\overline{A(B_N(0))} \subset A(B_{2N}(0))$. So let $y \in \overline{A(B_N)}$ (since all balls will be centered at 0, we will use this simplified notation).


We can find an $x_1 \in B_N$ with $\|y - Ax_1\| < r/2$. Since, by (3.2) and Exercise 3.2,

$$B_{r/2} = \frac{1}{2} B_r \subset \frac{1}{2} \overline{A(B_N)} = \overline{A(B_{N/2})},$$

we then also have that $y - Ax_1 \in \overline{A(B_{N/2})}$. Thus there exists an $x_2 \in B_{N/2}$ with $\|y - Ax_1 - Ax_2\| < 2^{-2} r$. We continue in this way and obtain a sequence $x_n$ with the following properties:

$$x_n \in B_{2^{-n+1} N}, \qquad \left\| y - \sum_{j=1}^{n} Ax_j \right\| < 2^{-n} r \tag{3.3}$$

This shows, first of all, that the series $\sum_{n=1}^{\infty} x_n$ is absolutely convergent. Indeed, $\sum_{n=1}^{\infty} \|x_n\| < 2N \sum_{n=1}^{\infty} 2^{-n} = 2N < \infty$. By Exercise 2.22, $x := \sum_{n=1}^{\infty} x_n$ exists. Moreover, by the calculation just carried out, $\|x\| \le \sum \|x_n\| < 2N$, so $x \in B_{2N}$. Since $A$ is continuous, we obtain that $Ax = \lim_{n\to\infty} \sum_{j=1}^{n} Ax_j$, and the second property from (3.3) now shows that $Ax = y$. In other words, $y \in A(B_{2N})$, as desired. □

The graph of an operator $A : X \to Y$ is defined as the set $G(A) = \{(x, Ax) : x \in X\}$. We can think of $G(A)$ as a subset of the Banach space $X \oplus Y$ that was introduced in Chapter 2; see especially Theorem 2.17.

Exercise 3.3. Show that $G(A)$ is a (linear) subspace of $X \oplus Y$ if $A$ is a linear operator.

Definition 3.4. Let $X$, $Y$ be Banach spaces. A linear operator $A : X \to Y$ is called closed if $G(A)$ is closed in $X \oplus Y$.

If we recall how the norm on $X \oplus Y$ was defined, we see that $(x_n, y_n) \to (x, y)$ in $X \oplus Y$ precisely if $x_n \to x$ and $y_n \to y$. Therefore, using sequences, we can characterize closed operators as follows: $A : X \to Y$ is closed precisely if the following holds: If $x_n \to x$ and $Ax_n \to y$, then $y = Ax$.

On the other hand, $A$ is continuous precisely if $x_n \to x$ implies that $Ax_n \to y$ and $y = Ax$ (formulated in a slightly roundabout way here to facilitate the comparison). This looks clearly stronger than the condition from above: what was part of the hypothesis has become part of the conclusion. In particular, continuous operators are always closed. When viewed against this background, the following result is quite stunning.

Theorem 3.5 (The closed graph theorem). Let $X$, $Y$ be Banach spaces and assume that $A : X \to Y$ is linear and closed. Then $A \in B(X, Y)$.


Proof. We introduce the projections $P_1 : X \oplus Y \to X$, $P_2 : X \oplus Y \to Y$, $P_1(x, y) = x$, $P_2(x, y) = y$. It is clear that $P_1$, $P_2$ are linear and continuous. By hypothesis and Exercise 3.3, $G(A)$ is a closed linear subspace of $X \oplus Y$. By Proposition 2.7, it is therefore a Banach space itself (with the same norm as $X \oplus Y$). Now $P_1$, restricted to $G(A)$, is a bijection onto $X$. Corollary 3.3 shows that the inverse $P_1^{-1} : X \to G(A)$, $P_1^{-1} x = (x, Ax)$, is continuous. It follows that $A = P_2 P_1^{-1}$ is a composition of continuous maps and thus continuous itself. □

Exercise 3.4. Let $X$, $Y$ be Banach spaces and $A_n, A \in B(X, Y)$. We say that $A_n$ converges (to $A$) strongly if $A_n x \to Ax$ for all $x \in X$. In this case, we write $A_n \xrightarrow{s} A$. Prove that this has the following properties:
(a) $\|A_n - A\| \to 0 \Longrightarrow A_n \xrightarrow{s} A$;
(b) The converse does not hold;
(c) If $A_n \xrightarrow{s} A$, then $\sup_n \|A_n\| < \infty$ (Hint: use the uniform boundedness principle).

Exercise 3.5. Suppose that for some measure space $(X, \mu)$ and exponents $p$, $q$, we have that $L^p(X, \mu) \subset L^q(X, \mu)$. Show that then there exists a constant $C > 0$ so that $\|f\|_q \le C\|f\|_p$ for all $f \in L^p(X, \mu)$. Suggested strategy: If $L^p \subset L^q$, we can define the inclusion map $I : L^p \to L^q$, $If = f$. Use Corollary 2.9 to show that this map is closed, and then apply the closed graph theorem.


4. Dual spaces and weak topologies

Recall that if $X$ is a Banach space, we write $X^*$ for its dual. This was defined as the space of all continuous (or bounded) linear functionals $F : X \to \mathbb{C}$. We know from the special case $Y = \mathbb{C}$ of Theorem 2.12 that $X^*$ itself is a Banach space, too, if we use the operator norm

$$\|F\| = \sup_{\|x\| = 1} |F(x)| \qquad (F \in X^*).$$

The following fundamental result makes sure that there is a large supply of bounded linear functionals on every normed space.

Theorem 4.1 (Hahn-Banach). Let $X$ be a normed space and let $M$ be a subspace of $X$. Suppose that $F : M \to \mathbb{C}$ is a linear map satisfying $|F(x)| \le C\|x\|$ ($x \in M$). Then there exists a linear extension $G : X \to \mathbb{C}$ of $F$ so that $|G(x)| \le C\|x\|$ for all $x \in X$.

In other words, a bounded linear functional on a subspace can always be extended to the whole space without increasing the norm. This latter property is the point here; it is easy, at least in principle, to linearly extend a given functional. (Sketch: Fix a basis of $M$ as a vector space, extend to a basis of the whole space and assign arbitrary values on these new basis vectors.)

Proof. We first prove a real version of the theorem. So, for the time being, let $X$ be a real vector space, and assume that $F : M \to \mathbb{R}$ is a bounded linear functional on a subspace. Roughly speaking, the extension will be done one step at a time. So our first goal is to show that $F$ can be extended to a one-dimensional extension of $M$ in such a way that the operator norm is preserved.

We are assuming that $|F(x)| \le C\|x\|$; in fact, since $C\|x\|$ defines a new norm on $X$ (if $C > 0$), we can assume that $C = 1$ here. Now let $x_1 \in X$, $x_1 \notin M$. We want to define a linear extension $F_1$ of $F$ on the (slightly, by one dimension) bigger space

$$M_1 = \{x + cx_1 : x \in M, \ c \in \mathbb{R}\}.$$

Such a linear extension is completely determined by the value $f = F_1(x_1)$ (and, conversely, every $f \in \mathbb{R}$ will define an extension). Since we also want an extension that still satisfies $|F_1(y)| \le \|y\|$ ($y \in M_1$), we’re looking for an $f \in \mathbb{R}$ so that

$$-\|x + cx_1\| \le F(x) + cf \le \|x + cx_1\| \tag{4.1}$$

for all $c \in \mathbb{R}$, $x \in M$. By assumption, we already have this for $c = 0$, and by discussing the cases $c > 0$ and $c < 0$ separately, we see that


(4.1) is equivalent to

$$-\left\| \frac{x}{c} + x_1 \right\| - F\!\left( \frac{x}{c} \right) \le f \le \left\| \frac{x}{c} + x_1 \right\| - F\!\left( \frac{x}{c} \right)$$

for all $c \ne 0$, $x \in M$. In other words, there will be an extension $F_1$ with the desired properties if (and only if, but of course that is not our concern here)

$$\|z + x_1\| - F(z) \ge -\|y + x_1\| - F(y)$$

for arbitrary $y, z \in M$. This is indeed the case, because

$$F(z) - F(y) = F(z - y) \le \|z - y\| \le \|z + x_1\| + \|x_1 + y\| .$$

We now use Zorn’s Lemma to obtain a norm preserving extension to all of $X$ (this part of the proof can be safely skipped if you’re not familiar with this type of argument). We consider the set of all linear extensions $G$ of $F$ that satisfy $|G(x)| \le \|x\|$ on the subspace on which they are defined. This set can be partially ordered by declaring $G \prec G'$ if $G'$ is an extension of $G$. Now if $\{G_\alpha\}$ is a totally ordered subset (any two $G_\alpha$’s can be compared) and if we denote the domain of $G_\alpha$ by $M_\alpha$, then $G : \bigcup M_\alpha \to \mathbb{R}$, $G(x) = G_\alpha(x)$ (for $x \in M_\alpha$) defines an extension of all the $G_\alpha$’s, that is, $G \succ G_\alpha$ for all $\alpha$. Note that there are no consistency problems in the definition of $G$ because if there is more than one possible choice for $\alpha$ for any given $x$, then the corresponding $G_\alpha$’s must give the same value on $x$ because one of them is an extension of the other.

We have verified the hypotheses of Zorn’s Lemma. The conclusion is that there is a $G$ that is maximal in the sense that if $H \succ G$, then $H = G$. This $G$ must be defined on the whole space $X$ because otherwise the procedure described above would give an extension $H$ to a strictly bigger space. We have proved the real version of the Hahn-Banach Theorem.

The original, complex version can be derived from this by some elementary, but ingenious trickery, as follows: First of all, we can think of $X$ and $M$ as real vector spaces also (we just refuse to multiply by non-real numbers and otherwise keep the algebraic structure intact). Moreover, $L_0(x) = \operatorname{Re} F(x)$ defines an $\mathbb{R}$-linear functional $L_0 : M \to \mathbb{R}$. By the real version of the theorem, there exists an $\mathbb{R}$-linear extension $L : X \to \mathbb{R}$, $|L(x)| \le \|x\|$.
I now claim that the following definition will work: G(x) = L(x) − iL(ix). Indeed, it is easy to check that G(x + y) = G(x) + G(y), and if
c = a + ib ∈ C, then G(cx) = L(ax + ibx) − iL(iax − bx) = aL(x) + bL(ix) − iaL(ix) + ibL(x) = (a + ib)L(x) + (b − ia)L(ix) = c(L(x) − iL(ix)) = cG(x). So G is C-linear. It is also an extension of F because if x ∈ M, then L(x) = L_0(x) = Re F(x) and thus G(x) = Re F(x) − i Re F(ix) = Re F(x) − i Re(iF(x)) = Re F(x) + i Im F(x) = F(x). Finally, if we write G(x) = |G(x)|e^{iϕ(x)}, we see that |G(x)| = G(x)e^{−iϕ(x)} = G(e^{−iϕ(x)}x) = Re G(e^{−iϕ(x)}x) = L(e^{−iϕ(x)}x) ≤ ‖e^{−iϕ(x)}x‖ = ‖x‖. □

Here are some immediate important consequences of the Hahn-Banach Theorem. They confirm that much can be learned about a Banach space by studying its dual. For example, part (b) says that norms can be computed by testing functionals on the given vector x.

Corollary 4.2. Let X, Y be normed spaces. (a) X∗ separates the points of X, that is, if x, y ∈ X, x ≠ y, then there exists an F ∈ X∗ with F(x) ≠ F(y). (b) For all x ∈ X, we have that ‖x‖ = sup{|F(x)| : F ∈ X∗, ‖F‖ = 1}. (c) If T ∈ B(X, Y), then ‖T‖ = sup{|F(Tx)| : x ∈ X, F ∈ Y∗, ‖F‖ = ‖x‖ = 1}.

Proof of (b). If F ∈ X∗, ‖F‖ = 1, then |F(x)| ≤ ‖x‖. This implies that sup |F(x)| ≤ ‖x‖. On the other hand, F_0(cx) = c‖x‖ defines a linear functional on the one-dimensional subspace L(x) that satisfies |F_0(y)| ≤ ‖y‖ for all y = cx ∈ L(x) (in fact, we have equality here). By the Hahn-Banach Theorem, there exists an extension F ∈ X∗, ‖F‖ = 1 of F_0; by construction, |F(x)| = ‖x‖, so sup |F(x)| ≥ ‖x‖ and the proof is complete. In fact, this argument has also shown that the supremum is attained; it is a maximum. □

Exercise 4.1. Prove parts (a) and (c) of Corollary 4.2.

Let X be a Banach space. Since X∗ is a Banach space, too, we can form its dual X∗∗ = (X∗)∗. We call X∗∗ the bidual or second dual of X. We can identify the original space X with a closed subspace of X∗∗ in
a natural way, as follows: Define a map j : X → X∗∗, j(x)(F) = F(x) (x ∈ X, F ∈ X∗). In other words, vectors x ∈ X act in a natural way on functionals F ∈ X∗: we just evaluate F on x.

Proposition 4.3. We have that j(x) ∈ X∗∗, and the map j is a (linear) isometry. In particular, j(X) ⊂ X∗∗ is a closed subspace of X∗∗.

An operator I : X → Y is called an isometry if ‖Ix‖ = ‖x‖ for all x ∈ X.

Exercise 4.2. (a) Show that an isometry I is always injective, that is, N(I) = {0}. (b) Show that S : ℓ^1 → ℓ^1, Sx = (0, x_1, x_2, ...) is an isometry that is not onto, that is, R(S) ≠ ℓ^1.

Proof. We will only check that j is an isometry and that j(X) is a closed subspace.

Exercise 4.3. Prove the remaining statements from Proposition 4.3. More specifically, prove that j(x) is a linear, bounded functional on X∗ for every x ∈ X, and prove that the map x ↦ j(x) is itself linear.

By the definition of the operator norm and Corollary 4.2(b), we have that ‖j(x)‖ = sup{|j(x)(F)| : F ∈ X∗, ‖F‖ = 1} = sup{|F(x)| : F ∈ X∗, ‖F‖ = 1} = ‖x‖, so j indeed is an isometry. Clearly, j(X) is a subspace (being the image of a linear map). If y_n ∈ j(X), that is, y_n = j(x_n), and y_n → y, then also x_n → x for some x ∈ X, because y_n is a Cauchy sequence, and since j preserves norms, so is x_n. Since j is continuous, it follows that j(x) = lim j(x_n) = y, so y ∈ j(X). □

A linear isometry preserves all structures on a Banach space (the vector space structure and the norm), and thus provides an identification of its domain with its image. Using j and Proposition 4.3, we can therefore think of X as a closed subspace of X∗∗. If, in this sense, X = X∗∗, we call X reflexive. This really means that j(X) = X∗∗. In particular, note that for X to be reflexive, it is not enough to have X isometrically isomorphic to X∗∗; rather, we need this isometric isomorphism to be specifically j.

We now use dual spaces to introduce new topologies on Banach spaces.
If T_1, T_2 are two topologies on a common space X, we say that T_1 is weaker than T_2 (or T_2 is stronger than T_1) if T_1 ⊂ T_2. In topology, coarse and fine mean the same thing as weak and strong,
respectively, but it would be uncommon to use these alternative terms in functional analysis.

Given a set X and a topological space (Y, T) and a family ℱ of maps F : X → Y, there exists a weakest topology on X that makes all F ∈ ℱ continuous. Let us try to give a description of this topology (and, in fact, we also need to show that such a topology exists). We will denote it by T_w. Clearly, we must have F^{−1}(U) ∈ T_w for all F ∈ ℱ, U ∈ T. Conversely, any topology that contains these sets will make the F’s continuous. So we could stop here and say that T_w is the topology generated by these sets. (Given any collection of sets, there always is a weakest topology containing these sets.) However, we would like to be somewhat more explicit. It is clear that finite intersections of sets of this type have to be in T_w, too; in other words,

$$\{ x \in X : F_1(x) \in U_1, \ldots, F_n(x) \in U_n \} \tag{4.2}$$

belongs to T_w for arbitrary choices of n ∈ N, F_j ∈ ℱ, U_j ∈ T. If these sets are open, then arbitrary unions of such sets need to belong to T_w, and, fortunately, the process stops here: we don’t get additional sets if we now take finite intersections again. So the claim is that

$$T_w = \{ \text{arbitrary unions of sets of the type (4.2)} \}. \tag{4.3}$$
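On a finite set, the two steps behind (4.2) and (4.3) can be carried out mechanically, which may help to see why nothing new appears after the unions have been taken. The following Python sketch is only an illustration under that finiteness assumption; the function name `weak_topology` and the sample data (a four-point set X, a two-point space Y with three open sets, and the maps `F1`, `F2`) are made up for this purpose and do not come from the text.

```python
from itertools import combinations

def weak_topology(X, maps, T_Y):
    """Weakest topology on the finite set X making every map in `maps` continuous.

    X    : iterable of points
    maps : list of dicts sending each point of X to a point of Y
    T_Y  : list of open subsets of Y (frozensets)
    """
    X = frozenset(X)
    # Preimages F^{-1}(U) for F in the family and U open in Y.
    subbase = {frozenset(x for x in X if F[x] in U) for F in maps for U in T_Y}
    # Sets of type (4.2): finite intersections of such preimages.
    base = {X}
    for r in range(1, len(subbase) + 1):
        for combo in combinations(subbase, r):
            base.add(frozenset.intersection(*combo))
    # (4.3): all unions of sets of type (4.2); the empty union is the empty set.
    topology = {frozenset()}
    for B in base:
        topology |= {U | B for U in topology}
    return topology

# Hypothetical sample data.
X = {0, 1, 2, 3}
T_Y = [frozenset(), frozenset({"a"}), frozenset({"a", "b"})]
F1 = {0: "a", 1: "a", 2: "b", 3: "b"}
F2 = {0: "a", 1: "b", 2: "a", 3: "b"}
T_w = weak_topology(X, [F1, F2], T_Y)
print(sorted(sorted(U) for U in T_w))
```

In this small example the result is closed under finite intersections without any further step, in line with the claim made above; the general verification is of course the content of the exercise that follows.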

We must show that T_w is a topology; by its construction, any other topology that makes all F ∈ ℱ continuous must then be stronger than T_w. This verification is quite straightforward, but a little tedious to write down, so I’ll make this an exercise:

Exercise 4.4. Prove that (4.2), (4.3) define a topology.

We now apply this process to a Banach space X, with Y = C and ℱ = X∗. Of course, we already have a topology on X (the norm topology); this new topology will be different, unless X is finite-dimensional. Here’s the formal definition:

Definition 4.4. Let X be a Banach space. The weak topology on X is defined as the weak topology T_w generated by X∗.

If we denote the norm topology by T, then, since all F ∈ X∗ are continuous if we use T (by definition of X∗!), we see that T_w ⊂ T; in other words, the weak topology is weaker than the norm topology. By the discussion above, (4.3) gives a description of T_w. A slightly more convenient variant of this can be obtained by making use of the vector space structure. First of all, the sets

$$U(F_1, \ldots, F_n; \varepsilon_1, \ldots, \varepsilon_n) = \{ x \in X : |F_j(x)| < \varepsilon_j \ (j = 1, \ldots, n) \} \tag{4.4}$$
are in T_w for arbitrary n ∈ N, F_j ∈ X∗, ε_j > 0. In fact, they are of the form (4.2), with U_j = {z : |z| < ε_j}. I now claim that V ∈ T_w if and only if for every x ∈ V, there exists a set U = U(F_j; ε_j) of this form with x + U ⊂ V.

Exercise 4.5. Prove this claim.

We can rephrase this as follows: The U’s form a neighborhood base at 0 (that is, any neighborhood of x = 0 contains some U) and the neighborhoods of an arbitrary x ∈ X are precisely the translates x + W of the neighborhoods W of 0.

We’ll make two more observations on the weak topology and then leave the matter at that. First of all, T_w is a Hausdorff topology: If x, y ∈ X, x ≠ y, then there exist V, W ∈ T_w with x ∈ V, y ∈ W, V ∩ W = ∅. To prove this, we use the fact that X∗ separates the points of X; see Corollary 4.2(a). So there is an F ∈ X∗ with F(x) ≠ F(y). We can now take V = x + U(F; ε), W = y + U(F; ε) with a sufficiently small ε > 0.

Exercise 4.6. Provide the details for this last step. You can (and should) make use of the description of T_w established above, in Exercise 4.5.

Finally, if x_n is a sequence from X, then x_n → x in T_w (this is usually written as $x_n \xrightarrow{w} x$; x_n goes to x weakly) if and only if F(x_n) → F(x) for all F ∈ X∗.

Exercise 4.7. Prove this. Again, by the results of Exercise 4.5, $x_n \xrightarrow{w} x$ precisely if for every U of the form (4.4), we eventually have that x_n − x ∈ U.

This gives a characterization of convergent sequences and thus some idea of what the topology does. However, it can happen that T_w is not metrizable and then the topological notions (closed sets, compactness, continuity etc.) cannot be characterized using sequences.

Definition 4.5. Let X be a Banach space. The weak-∗ topology T_{w∗} on X∗ is defined as the weak topology generated by X, viewed as a subset of X∗∗.

Put differently, T_{w∗} is the weakest topology that turns all point evaluations j(x) : X∗ → C, F ↦ F(x) (x ∈ X) into continuous functions on X∗. We have an analogous description of T_{w∗}. The sets U(x_1, ..., x_n; ε_1, ..., ε_n) = {F ∈ X∗ : |F(x_j)| < ε_j} are open, and V ⊂ X∗ is open in the weak-∗ topology if and only if for every F ∈ V, there exists such a U so that F + U ⊂ V.

Exercise 4.8. Prove that T_{w∗} is a Hausdorff topology. Hint: If F ≠ G, then F(x) ≠ G(x) for some x ∈ X. Now you can build disjoint neighborhoods of F, G as above, using this x; see also Exercise 4.6.

Exercise 4.9. Let X be a Banach space, and let F_n, F ∈ X∗. Show that F_n → F in the weak-∗ topology if and only if F_n(x) → F(x) for all x ∈ X.

Since X∗ is a Banach space, we can also define a weak topology on X∗. This is the topology generated by X∗∗. The weak-∗ topology is generated by X, which in general is a smaller set of maps, so the weak-∗ topology is weaker than the weak topology. It can only be defined on a dual space. If X is reflexive, then there’s no difference between the weak and weak-∗ topologies.

Despite its clumsy and artificial looking definition, the weak-∗ topology is actually an extremely useful tool. All the credit for this is due to the following fundamental result.

Theorem 4.6 (Banach-Alaoglu). Let X be a Banach space. Then the closed unit ball B̄_1(0) = {F ∈ X∗ : ‖F‖ ≤ 1} is compact in the weak-∗ topology.

Proof. This will follow from Tychonoff’s Theorem: The product of compact topological spaces is compact in the product topology. To get this argument started, we look at the Cartesian product set

$$K = \prod_{x \in X} \{ z \in \mathbb{C} : |z| \le \|x\| \}.$$

As a set, this is defined as the set of maps F : X → C with |F(x)| ≤ ‖x‖. The individual factors {|z| ≤ ‖x‖} come with a natural topology, and we endow K with the product topology, which, by definition, is the weak topology generated by the projections p_x : K → {|z| ≤ ‖x‖}, p_x(F) = F(x) (equivalently, you can also produce it from cylinder sets, if this is more familiar to you). By Tychonoff’s Theorem, K is compact. Now B̄_1(0) ⊂ K; more precisely, B̄_1(0) consists of those maps F ∈ K that are also linear. I now claim that the topology induced by K on B̄_1(0) is the same as the induced topology coming from the weak-∗ topology on X∗ ⊃ B̄_1(0). This should not come as a surprise because both the product topology and T_{w∗} are weak topologies generated by the point evaluations F ↦ F(x). Writing it down is a slightly unpleasant task that is best delegated to the reader.

Exercise 4.10. Show that K and (X∗, T_{w∗}) indeed induce the same topology on B̄_1(0). Come to think of it, we perhaps really want to prove the following abstract fact: Let (Z, T) be a topological space, ℱ a
family of maps F : X → Z and let Y ⊂ X. Then we can form the weak topology T_w on X; this induces a relative topology on Y. Alternatively, we can restrict the maps F ∈ ℱ to Y and let the restrictions generate a weak topology on Y. Prove that both methods lead to the same topology. As usual, this is mainly a matter of unwrapping definitions. You could use the description (4.2), (4.3) of the weak topologies and look at what happens when these induce relative topologies.

Exercise 4.11. (a) Let Y be a compact topological space and let A ⊂ Y be closed. Prove that then A is compact, too. (b) Let Y be a topological space. Show that a subset B ⊂ Y is compact if and only if B with the relative topology is a compact topological space. (Sometimes compactness is defined in this way; recall that we defined compact sets by using covers by open sets U ⊂ Y. It is now in fact almost immediate that we get the same condition from both variants, but this fact will be needed here, so I thought I’d point it out.)

With these preparations out of the way, it now suffices to show that B̄_1(0) is closed in K. So let F ∈ K \ B̄_1(0). We want to find a neighborhood of F that is contained in K \ B̄_1(0) (note that we cannot use sequences here because there is no guarantee that our topologies are metrizable). Since F ∉ B̄_1(0), F is not linear and thus there are c, d ∈ C, x, y ∈ X so that ε ≡ |F(cx + dy) − cF(x) − dF(y)| > 0. But then

$$V = \left\{ G \in K : |G(cx + dy) - F(cx + dy)| < \frac{\varepsilon}{3},\ |c|\,|G(x) - F(x)| < \frac{\varepsilon}{3},\ |d|\,|G(y) - F(y)| < \frac{\varepsilon}{3} \right\}$$

is an open set in K with F ∈ V, and if G ∈ V, then still |G(cx + dy) − cG(x) − dG(y)| > 0, so V does not contain any linear maps and thus is the neighborhood of F we wanted to find. □

We have already seen how the fact that T_{w∗} need not be metrizable makes this topology a bit awkward to deal with. The following result provides some relief.
We call a metric space X separable if X has a countable dense subset (that is, there exist x_n ∈ X so that if x ∈ X and ε > 0 are arbitrary, then d(x, x_n) < ε for some n ∈ N).

Exercise 4.12. (a) Show that ℓ^p (with index set N, as usual) is separable for 1 ≤ p < ∞. You can use the result of Exercise 4.13 below if you want. (b) Show that ℓ^∞ is not separable. Suggestion: Consider all x ∈ ℓ^∞ that only take the values 0 and 1. How big is this set? What can you say about ‖x − x′‖ for two such sequences?

Exercise 4.13. Let X be a Banach space. Show that X will be separable if there is a countable total subset, that is, if there is a countable set M ⊂ X so that the (finite) linear combinations of elements from M are dense in X (in other words, if x ∈ X and ε > 0, we must be able to find m_j ∈ M and coefficients c_j ∈ C so that $\left\| \sum_{j=1}^{N} c_j m_j - x \right\| < \varepsilon$).

Theorem 4.7. If X is a separable Banach space, then the weak-∗ topology on B̄_1(0) ⊂ X∗ (more precisely: the relative topology induced by T_{w∗}) is metrizable.

We don’t want to prove this in detail, but the basic idea is quite easy to state. The formula

$$d(F, G) = \sum_{n=1}^{\infty} 2^{-n} \frac{|F(x_n) - G(x_n)|}{1 + |F(x_n) - G(x_n)|}$$

(say), where {x_n} is a dense subset of X, defines a metric that generates the desired topology.

Corollary 4.8. Let X be a separable Banach space. If F_n ∈ X∗, ‖F_n‖ ≤ 1, then there exist F ∈ X∗, ‖F‖ ≤ 1 and a subsequence n_j → ∞ so that F_{n_j}(x) → F(x) for all x ∈ X.

Proof. This follows by just putting things together. By the Banach-Alaoglu Theorem, B̄_1(0) is compact in the weak-∗ topology. By Theorem 4.7, this can be thought of as a metric space. By Theorem 1.7(c), compactness is therefore equivalent to sequences having convergent subsequences. By using Exercise 4.9, we now obtain the claim. □

To make good use of the results of this chapter, we need to know what the dual space of a given space is. We now investigate this question for the Banach spaces from our list that was compiled in Chapter 2.

Example 4.1. If X = C^n with some norm, then Corollary 2.16 implies that all linear functionals on X are bounded, so in this case X∗ coincides with the algebraic dual space. From linear algebra we know that as a vector space, X∗ can be identified with C^n again; more precisely, y ∈ C^n can be identified with the functional $x \mapsto \sum_{j=1}^{n} y_j x_j$. It also follows from this that X is reflexive. The norm on X∗ depends on the
norm on X; Example 4.3 below will throw some additional light on this.

Exercise 4.14. Show that the weak topology on X = C^n coincides with the norm topology. Suggestion: It essentially suffices to check that open balls B_r(0) (say) are in T_w. Show this and then use the definition of the norm topology T to show that T ⊂ T_w. Since always T_w ⊂ T, this will finish the proof.

This Exercise says that we really don’t get anything new from the theory of this chapter on finite-dimensional spaces; recall also in this context that T_w = T_{w∗} on C^n, thought of as the dual space X∗ of X = C^n, because X is reflexive.

Example 4.2. Let K be a compact Hausdorff space and consider the Banach space C(K). Then C(K)∗ = M(K), where M(K) is defined as the space of all complex, regular Borel measures on K. Here, we call a (complex) measure μ (inner and outer) regular if its total variation ν = |μ| is regular in the sense that

$$\nu(B) = \sup_{L \subset B:\ L \text{ compact}} \nu(L) = \inf_{U \supset B:\ U \text{ open}} \nu(U)$$

for all Borel sets B ⊂ K. M(K) becomes a vector space if the vector space operations are introduced in the obvious way as (μ + ν)(B) = μ(B) + ν(B), (cμ)(B) = cμ(B). In fact, M(K), equipped with the norm

$$\|\mu\| = |\mu|(K), \tag{4.5}$$

is a Banach space. This is perhaps most elegantly deduced from the main assertion of this Example, namely the fact that C(K)∗ can be identified with M(K), and, as we will see in a moment, the operator norm on M(K) = C(K)∗ turns out to be exactly (4.5). More precisely, the claim is that every μ ∈ M(K) generates a functional F_μ ∈ C(K)∗ via

$$F_\mu(f) = \int_K f(x) \, d\mu(x), \tag{4.6}$$

and we also claim that the corresponding map M(K) → C(K)∗, μ ↦ F_μ is an isomorphism between Banach spaces (in other words, a bijective, linear isometry). The Riesz Representation Theorem does the lion’s share of the work here; it implies that μ ↦ F_μ is a bijection from M(K) onto C(K)∗; see for example, Folland, Real Analysis, Corollary 7.18. It is also clear that this map is linear.

Exercise 4.15. Suppose we introduce a norm on M(K) by just declaring ‖μ‖ = ‖F_μ‖ (operator norm), that is, we just move the norm on C(K)∗ over to M(K). Show that this leads to (4.5); put differently, show that the operator norm of F_μ from (4.6) equals |μ|(K).

This identification of the dual of C(K) (which is basically a way of stating (one version of) the Riesz Representation Theorem) is an extremely important fundamental result; we have every right to be very excited about this. One pleasing consequence is the fact that M(K), being a dual space, can be equipped with a weak-∗ topology. This, in turn, has implications of the following type:

Exercise 4.16. Show that C[a, b] (so K = [a, b], with the usual topology) is separable. Suggestion: Deduce this from the Weierstraß approximation theorem.

Exercise 4.17. Let μ_n be a sequence of complex Borel measures on [a, b] with |μ_n|([a, b]) ≤ 1 (in particular, these could be arbitrary positive measures with μ_n([a, b]) ≤ 1). Show that there exists another Borel measure μ on [a, b] with |μ|([a, b]) ≤ 1 and a subsequence n_j so that

$$\lim_{j \to \infty} \int_{[a,b]} f(x) \, d\mu_{n_j}(x) = \int_{[a,b]} f(x) \, d\mu(x)$$

for all f ∈ C[a, b]. Hint: This in fact follows quickly from Corollary 4.8 and Exercise 4.16.

Example 4.3. We now move on to ℓ^p spaces. We first claim that if 1 ≤ p < ∞, then (ℓ^p)∗ = ℓ^q, where 1/p + 1/q = 1. More precisely, the claim really is that every y ∈ ℓ^q generates a functional F_y ∈ (ℓ^p)∗, as follows:

$$F_y(x) = \sum_{j=1}^{\infty} y_j x_j \tag{4.7}$$

Moreover, the corresponding map y ↦ F_y is an isomorphism between the Banach spaces ℓ^q and (ℓ^p)∗. Let us now prove these assertions. First of all, Hölder’s inequality shows that the series from (4.7) converges, and in fact |F_y(x)| ≤ ‖y‖_q ‖x‖_p. Since (4.7) is also obviously linear in x, this shows that F_y ∈ (ℓ^p)∗ and ‖F_y‖ ≤ ‖y‖_q. We will now explicitly discuss only the case 1 < p < ∞. If p = 1 (and thus q = ∞), the same basic strategy works and actually the technical details get easier, but some slight adjustments are necessary.

To compute ‖F_y‖, we set

$$x_n = \begin{cases} \dfrac{|y_n|^q}{y_n} & n \le N,\ y_n \ne 0 \\ 0 & \text{else} \end{cases} \tag{4.8}$$

with N ∈ N. It is then clear that x ∈ ℓ^p,

$$\|x\|_p^p = \sum_{n=1}^{N} |y_n|^{(q-1)p} = \sum_{n=1}^{N} |y_n|^q,$$

and thus

$$F_y(x) = \sum_{n=1}^{N} |y_n|^q = \left( \sum_{n=1}^{N} |y_n|^q \right)^{1/q} \|x\|_p.$$

Thus $\|F_y\| \ge \left( \sum_{n=1}^{N} |y_n|^q \right)^{1/q}$, and this holds for arbitrary N ∈ N, so it follows that ‖F_y‖ ≥ ‖y‖_q, so ‖F_y‖ = ‖y‖_q.

This says that the identification y ↦ F_y is isometric, and it is obviously linear (in y!), so it remains to show that it is surjective, that is, every F ∈ (ℓ^p)∗ equals some F_y for suitable y ∈ ℓ^q. To prove this, fix F ∈ (ℓ^p)∗. It is clear from (4.7) that only y_n = F(e_n) can work, so define a sequence y in this way (here, e_n(j) = 1 if j = n and e_n(j) = 0 otherwise). Consider again the x ∈ ℓ^p from (4.8). Then we have that

$$F(x) = F\left( \sum_{n=1}^{N} \frac{|y_n|^q}{y_n} e_n \right) = \sum_{n=1}^{N} |y_n|^q.$$

As above, $\|x\|_p = \left( \sum_{n=1}^{N} |y_n|^q \right)^{1/p}$, so

$$\sum_{n=1}^{N} |y_n|^q \le \|F\| \left( \sum_{n=1}^{N} |y_n|^q \right)^{1/p}$$

or $\left( \sum_{n=1}^{N} |y_n|^q \right)^{1/q} \le \|F\|$. Again, N is arbitrary here, so y ∈ ℓ^q. By construction of y, we have that F(z) = F_y(z) if z is a finite linear combination of e_n’s. These vectors z, however, are dense in ℓ^p, so it follows from the continuity of both F and F_y that F(w) = F_y(w) for all w ∈ ℓ^p, that is, F = F_y.

As a consequence, ℓ^p is reflexive for 1 < p < ∞, basically because (ℓ^p)∗∗ = (ℓ^q)∗ = ℓ^p, by applying the above result on the dual of ℓ^r twice.

Exercise 4.18. Give a careful version of this argument, where the identification j : X → X∗∗ from the definition of reflexivity is taken seriously.
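The key computation of Example 4.3, namely that the vector x from (4.8) turns Hölder’s inequality into an equality, is easy to test numerically on finite sequences. The following Python sketch is only a sanity check under that truncation; the helper names `extremal_vector` and `p_norm`, the choice p = 3, the vector length and the random seed are assumptions made for the illustration.

```python
import numpy as np

def extremal_vector(y, q):
    """The vector from (4.8): x_n = |y_n|^q / y_n where y_n != 0, and 0 otherwise."""
    y = np.asarray(y, dtype=complex)
    x = np.zeros_like(y)
    nz = y != 0
    x[nz] = np.abs(y[nz]) ** q / y[nz]
    return x

def p_norm(v, p):
    """The l^p norm of a finite sequence."""
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

p = 3.0
q = p / (p - 1.0)                     # conjugate exponent: 1/p + 1/q = 1
rng = np.random.default_rng(0)        # arbitrary seed
y = rng.normal(size=8) + 1j * rng.normal(size=8)

x = extremal_vector(y, q)
F_y_x = np.sum(y * x)                 # F_y(x) as in (4.7)
ratio = F_y_x.real / (p_norm(y, q) * p_norm(x, p))

# Hoelder's inequality for an unrelated test vector z.
z = rng.normal(size=8) + 1j * rng.normal(size=8)
holder_ok = abs(np.sum(y * z)) <= p_norm(y, q) * p_norm(z, p) + 1e-12

print(ratio, holder_ok)
```

Up to rounding, `ratio` equals 1, which is the finite-dimensional shadow of ‖F_y‖ = ‖y‖_q.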

We can’t be sure about ℓ^1 and ℓ^∞ at this point because we don’t know yet what (ℓ^∞)∗ is. It will turn out that these spaces are not reflexive.

Example 4.4. Similar discussions let us identify the duals of c_0 and c. We claim that c_0∗ = ℓ^1 = ℓ^1(N) and c∗ = ℓ^1(N_0); as usual, we really mean that there are Banach space isomorphisms (linear, bijective isometries) y ↦ F_y that provide identifications between these spaces, and in the case at hand, these are given by:

$$F_y(x) = \sum_{j=1}^{\infty} y_j x_j \qquad (y \in \ell^1(\mathbb{N}),\ x \in c_0),$$

$$F_y(x) = y_0 \cdot \lim_{n \to \infty} x_n + \sum_{j=1}^{\infty} y_j \left( x_j - \lim_{n \to \infty} x_n \right) \qquad (y \in \ell^1(\mathbb{N}_0),\ x \in c)$$

Since this discussion is reasonably close to Example 4.3, I don’t want to do it here. The above representations of the dual spaces as ℓ^1(N) and ℓ^1(N_0) seem natural, but note that these are of course isometrically isomorphic as Banach spaces: (y_n)_{n≥1} ↦ (y_{n+1})_{n≥0} is an isometry from ℓ^1(N) onto ℓ^1(N_0). Roughly speaking, the one additional dimension of ℓ^1(N_0) doesn’t alter the Banach space structure. (At the same time, ℓ^1(N) can of course also be identified with the codimension 1 subspace {(x_n)_{n≥0} : x_0 = 0} of ℓ^1(N_0); there is nothing unusual about this in infinite-dimensional situations: the whole space can be isomorphic to a proper subspace.)

Example 4.5. We now discuss (ℓ^∞)∗. We will obtain an explicit looking description of this dual space, too, but, actually, this result will not be very useful. This is so because the objects that we will obtain are not particularly well-behaved and there is no well developed machinery that would recommend them for further use.

It will turn out that (ℓ^∞)∗ = M_fa(N), the space of bounded, finitely additive set functions on N. More precisely, the elements of M_fa(N) are set functions μ : P(N) → C that satisfy sup_{M⊂N} |μ(M)| < ∞ and if M_1, M_2 ⊂ N are disjoint, then μ(M_1 ∪ M_2) = μ(M_1) + μ(M_2). Note that the complex measures on the σ-algebra P(N) are precisely those μ ∈ M_fa(N) that are σ-additive rather than just finitely additive.

Finitely additive bounded set functions will act on vectors x ∈ ℓ^∞ by an integration of sorts. We discuss this new integral first (as far as I can see, this integral does not play any major role in analysis except in this particular context). Let x ∈ ℓ^∞. We subdivide the disk {z ∈ C : |z| ≤ ‖x‖} into squares (say) Q_j, and we fix a number z_j ∈ Q_j from each square. Let M_j = {n ∈ N : x_n ∈ Q_j} be the inverse image
under x_n. It’s quite easy to check that for μ ∈ M_fa(N), the sums $\sum z_j \mu(M_j)$ will approach a limit as the subdivision gets finer and finer, and this limit is independent of the choice of the Q_j and z_j. We call this limit the Radon integral of x_n with respect to μ, and we denote it by

$$R\text{-}\!\int_{\mathbb{N}} x_n \, d\mu(n).$$

Next, we show how to associate a set function μ ∈ M_fa(N) with a given functional F ∈ (ℓ^∞)∗. Define μ(M) = F(χ_M) (M ⊂ N). Then |μ(M)| ≤ ‖F‖ ‖χ_M‖ ≤ ‖F‖, so μ is a bounded set function. Also, if M_1 ∩ M_2 = ∅, then μ(M_1 ∪ M_2) = F(χ_{M_1 ∪ M_2}) = F(χ_{M_1} + χ_{M_2}) = F(χ_{M_1}) + F(χ_{M_2}) = μ(M_1) + μ(M_2). Thus μ ∈ M_fa(N). Moreover, if squares Q_j and points z_j ∈ Q_j are chosen as above and if we again set M_j = {n ∈ N : x_n ∈ Q_j}, then $\left\| x - \sum z_j \chi_{M_j} \right\|$ is bounded by the maximal diameter of the Q_j’s, so this will go to zero if we again consider a sequence of subdivisions becoming arbitrarily fine. It follows that

$$F(x) = \lim F\left( \sum z_j \chi_{M_j} \right) = \lim \sum z_j \mu(M_j) = R\text{-}\!\int_{\mathbb{N}} x_n \, d\mu(n).$$

Conclusion: Every F ∈ (ℓ^∞)∗ can be represented as a Radon integral. Conversely, one can show that every μ ∈ M_fa(N) generates a functional F_μ on ℓ^∞ by Radon integration:

$$F_\mu(x) = R\text{-}\!\int_{\mathbb{N}} x_n \, d\mu(n)$$

(The boundedness of F_μ requires some work; Exercise 4.19 below should help to clarify things.) We obtain a bijection M_fa(N) → (ℓ^∞)∗, μ ↦ F_μ, and, as in the previous examples, it’s now a relatively easy matter to check that this actually sets up an isometric isomorphism between Banach spaces if we endow M_fa(N) with the natural vector space structure ((μ + ν)(M) := μ(M) + ν(M) etc.) and the norm

$$\|\mu\| = \sup_{\|x\|=1} \left| R\text{-}\!\int_{\mathbb{N}} x_n \, d\mu(n) \right|.$$

I’ll leave the details of these final steps to the reader.
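The approximating sums Σ z_j μ(M_j) from Example 4.5 can be tried out in the simplest possible situation. The Python sketch below assumes a set function of the special form μ(M) = Σ_{n∈M} y_n with finitely many weights y_n (so μ is in fact a measure, which is not the interesting case, but it is one that can be written down), uses a grid of half-open squares of side h, and takes z_j to be the centre of Q_j. The function name `radon_sum` and all sample data are made up for this illustration.

```python
import cmath
import math
from collections import defaultdict

def radon_sum(x, weights, h):
    """Approximating sum  sum_j z_j * mu(M_j)  for a grid of squares of side h.

    x       : list of complex numbers (a bounded sequence, truncated)
    weights : list of complex numbers y_n, with mu(M) = sum of y_n over n in M
    h       : side length of the squares Q_j
    """
    mu_of_M = defaultdict(complex)            # mu(M_j), indexed by the square Q_j
    for x_n, y_n in zip(x, weights):
        j = (math.floor(x_n.real / h), math.floor(x_n.imag / h))
        mu_of_M[j] += y_n                     # n lies in M_j = {n : x_n in Q_j}
    # z_j is the centre of the square Q_j.
    return sum(complex((k + 0.5) * h, (l + 0.5) * h) * m
               for (k, l), m in mu_of_M.items())

# Hypothetical sample data.
x = [cmath.exp(1j * n) * (1 - 1 / (n + 2)) for n in range(50)]
y = [(1 + 1j) * (-0.5) ** n for n in range(50)]

exact = sum(x_n * y_n for x_n, y_n in zip(x, y))
y_norm = sum(abs(y_n) for y_n in y)
errors = {h: abs(radon_sum(x, y, h) - exact) for h in (0.1, 0.01, 0.001)}
print(errors)
```

In this special case the sums approach Σ x_n y_n, with an error of at most h/√2 times Σ |y_n|, since every x_n lies within h/√2 of the centre of its square. This matches the diameter estimate used in the text.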

Exercise 4.19. Show that $\|\mu\| = \sup \sum_{j=1}^{N} |\mu(M_j)|$, where the supremum is over all partitions of N into finitely many sets M_1, ..., M_N. Moreover,

$$\sum_{n=1}^{\infty} |\mu(\{n\})| \le \|\mu\|;$$

can you also show that strict inequality is possible? (Exercise 4.21 might be helpful here.)

Exercise 4.20. Show that ℓ^1 ⊂ M_fa(N) in the sense that if y ∈ ℓ^1, then $\mu(M) = \sum_{n \in M} y_n$ defines a bounded, finitely additive set function. Show that in fact these μ’s are exactly the (complex) measures on (N, P(N)). Remark: Since X can be identified with a subspace of X∗∗ for any Banach space X and since ℓ^∞ = (ℓ^1)∗, we knew right away that ℓ^1 ⊂ (ℓ^∞)∗, provided this is suitably interpreted.

Exercise 4.21. The fact that ℓ^1 ⊊ (ℓ^∞)∗ can also be seen more directly, without giving a description of (ℓ^∞)∗, as follows: (a) Show that every y ∈ ℓ^1 generates a functional F_y ∈ (ℓ^∞)∗ by letting

$$F_y(x) = \sum_{n=1}^{\infty} y_n x_n \qquad (x \in \ell^\infty). \tag{4.9}$$

(b) Show that not every F ∈ (ℓ^∞)∗ is of this form, by using the Hahn-Banach Theorem. More specifically, choose a subspace Y ⊂ ℓ^∞ and define a bounded functional F_0 on Y in such a way that no extension F of F_0 can be of the form (4.9). (This is an uncomplicated argument if done efficiently; it all depends on a smart choice of Y and F_0.)

Example 4.6. I’ll quickly report on the spaces L^p(X, μ) here. The situation is similar to the discussion above; see Examples 4.3, 4.5. If 1 ≤ p < ∞, then (L^p)∗ = L^q, where 1/p + 1/q = 1. This holds in complete generality for 1 < p < ∞, but if p = 1, then we need the additional hypothesis that μ is σ-finite (which means that X can be written as a countable union of sets of finite measure). Again, this is an abbreviated way of stating the result; it really involves an identification of Banach spaces: the function f ∈ L^q is identified with the functional F_f ∈ (L^p)∗ defined by $F_f(g) = \int_X f g \, d\mu$.

(L^∞)∗ is again a complicated space that can be described as a space of finitely additive set functions, but this description is only moderately useful. In particular, except in special cases, (L^∞)∗ is (much) bigger than L^1. In fact, for example L^1(R, m) is not the dual space of any Banach space: there is no Banach space X for which X∗ is isometrically isomorphic to L^1(R, m)!

Exercise 4.22. What is wrong with the following sketch of a “proof” that (ℓ^∞)∗ = ℓ^1: Follow the strategy from Example 4.3. Obviously, if y ∈ ℓ^1, then F_y ∈ (ℓ^∞)∗, if F_y is defined as in (4.7). Conversely, given an F ∈ (ℓ^∞)∗, let y_n = F(e_n). Define x ∈ ℓ^∞ by

$$x_n = \begin{cases} \dfrac{|y_n|}{y_n} & n \le N,\ y_n \ne 0 \\ 0 & \text{otherwise} \end{cases}.$$

Then ‖x‖_∞ ≤ 1, so $F(x) = \sum_{n=1}^{N} |y_n| \le \|F\|$, and it follows that y ∈ ℓ^1. By construction, F = F_y.

Exercise 4.23. Let X be a Banach space. Show that every weakly convergent sequence is bounded: If x_n, x ∈ X, F(x_n) → F(x) for all F ∈ X∗, then sup ‖x_n‖ < ∞. Hint: Think of the x_n as elements of the bidual X∗∗ ⊃ X and apply the uniform boundedness principle.

Exercise 4.24. (a) Show that $e_n \xrightarrow{w} 0$ in ℓ^2. (b) Construct a sequence f_n with similar properties in C[0, 1]: we want that ‖f_n‖ = 1, $f_n \xrightarrow{w} 0$.
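The difference between weak and norm convergence in part (a) can be made visible with a small numerical experiment (which is, of course, not a proof). By Example 4.3, every functional on ℓ^2 is of the form F_y with y ∈ ℓ^2, and F_y(e_n) = y_n. The Python sketch below truncates to N coordinates and uses the sample vector y_n = 1/n; both choices, as well as the helper name `e`, are assumptions made for the illustration.

```python
import numpy as np

N = 2000                              # truncation length (assumption)
y = 1.0 / np.arange(1, N + 1)         # a sample element of l^2, y_n = 1/n

def e(n):
    """The n-th unit vector e_n (1-based index), truncated to N coordinates."""
    v = np.zeros(N)
    v[n - 1] = 1.0
    return v

indices = (1, 10, 100, 1000)
pairings = [float(y @ e(n)) for n in indices]        # F_y(e_n) = y_n
norms = [float(np.linalg.norm(e(n))) for n in indices]

print(pairings)   # tends to 0 along the sequence
print(norms)      # stays equal to 1
```

The values F_y(e_n) decay, while ‖e_n‖ = 1 throughout, so e_n cannot converge to 0 in norm.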

5. Hilbert spaces

Definition 5.1. Let H be a (complex) vector space. A scalar product (or inner product) is a map ⟨·, ·⟩ : H × H → C with the following properties:
(1) ⟨x, x⟩ ≥ 0 and ⟨x, x⟩ = 0 ⇐⇒ x = 0;
(2) $\langle x, y \rangle = \overline{\langle y, x \rangle}$;
(3) ⟨x, y + z⟩ = ⟨x, y⟩ + ⟨x, z⟩;
(4) ⟨x, cy⟩ = c⟨x, y⟩.

(3), (4) say that a scalar product is linear in the second argument, and by combining this with (2), we see that it is antilinear in the first argument, that is, ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩ as usual, but $\langle cx, y \rangle = \overline{c} \langle x, y \rangle$.

Example 5.1. It is easy to check that

$$\langle x, y \rangle = \sum_{n=1}^{\infty} \overline{x_n}\, y_n$$

defines a scalar product on H = ℓ^2. Indeed, the series converges by Hölder’s inequality with p = q = 2, and once we know that, it is clear that (1)–(4) from above hold. In fact, this works for arbitrary index sets I: there is a similarly defined scalar product on ℓ^2(I). I mention this fact here because we will actually make brief use of this space later in this chapter. Similarly,

$$\langle f, g \rangle = \int_X \overline{f(x)}\, g(x) \, d\mu(x)$$

defines a scalar product on L^2(X, μ).

Theorem 5.2. Let H be a space with a scalar product. Then: (a) $\|x\| := \sqrt{\langle x, x \rangle}$ defines a norm on H; (b) The Cauchy-Schwarz inequality holds: |⟨x, y⟩| ≤ ‖x‖ ‖y‖; (c) We have equality in (b) if and only if x, y are linearly dependent.

Proof. We first discuss parts (b) and (c). Let x, y ∈ H. Then, by property (1) from Definition 5.1, we have that

$$0 \le \langle cx + y, cx + y \rangle = |c|^2 \|x\|^2 + \|y\|^2 + c \langle y, x \rangle + \overline{c} \langle x, y \rangle, \tag{5.1}$$

for arbitrary c ∈ C. If x ≠ 0, we can take c = −⟨x, y⟩/‖x‖² here (note that (1) makes sure that $\|x\| = \sqrt{\langle x, x \rangle} > 0$, even though we don’t
know yet that this really is a norm). Then (5.1) says that

$$0 \le \|y\|^2 - \frac{|\langle x, y \rangle|^2}{\|x\|^2},$$

and this implies the Cauchy-Schwarz inequality. Moreover, we can get equality in (5.1) only if cx + y = 0, so x, y are linearly dependent in this case. Conversely, if y = cx or x = cy, then it is easy to check that we do get equality in (b).

We can now prove (a). Property (1) from Definition 5.1 immediately implies condition (1) from Definition 2.1. Moreover, $\|cx\| = \sqrt{\langle cx, cx \rangle} = \sqrt{|c|^2 \langle x, x \rangle} = |c|\, \|x\|$, and the triangle inequality follows from the Cauchy-Schwarz inequality, as follows:

‖x + y‖² = ⟨x + y, x + y⟩ = ‖x‖² + ‖y‖² + 2 Re ⟨x, y⟩ ≤ ‖x‖² + ‖y‖² + 2‖x‖ ‖y‖ = (‖x‖ + ‖y‖)². □

Notice that we recover the usual norms on ℓ^2 and L^2, respectively, if we use the scalar products introduced in Example 5.1 here.

It now seems natural to ask if every norm is of the form $\|x\| = \sqrt{\langle x, x \rangle}$ for some scalar product ⟨·, ·⟩. This question admits a neat, satisfactory answer (although it must be admitted that this result doesn’t seem to have meaningful applications):

Exercise 5.1. Let H be a vector space with a scalar product, and introduce a norm ‖·‖ on H as in Theorem 5.2(a). Then ‖·‖ satisfies the parallelogram identity:

$$\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2 \tag{5.2}$$
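To get a feeling for how restrictive (5.2) is, one can evaluate both sides for a few p-norms on R². The Python sketch below is only an illustration; the helper name `parallelogram_defect` and the choice of the two unit vectors are assumptions. It computes the left-hand side of (5.2) minus the right-hand side.

```python
import numpy as np

def parallelogram_defect(x, y, p):
    """||x+y||^2 + ||x-y||^2 - 2||x||^2 - 2||y||^2 for the p-norm on R^n."""
    def norm(v):
        return np.linalg.norm(v, ord=p)
    return norm(x + y) ** 2 + norm(x - y) ** 2 - 2 * norm(x) ** 2 - 2 * norm(y) ** 2

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
defects = {p: float(parallelogram_defect(x, y, p)) for p in (1, 2, 3, np.inf)}
print(defects)
```

For these two vectors the defect vanishes for p = 2 only; it equals 4 for p = 1 and −2 for p = ∞.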

One can now show that (5.2) is also a sufficient condition for a norm to come from a scalar product (the Jordan-von Neumann Theorem). This converse is much harder to prove; we don’t want to discuss it in detail here. However, I will mention how to get this proof started. The perhaps somewhat surprising fact is that the norm already completely determines its scalar product (assuming now that the norm comes from a scalar product). In fact, we can be totally explicit, as Proposition 5.3 below will show. A slightly more general version is often useful; to state this, we need an additional definition: A sesquilinear form is a map s : H × H → C that is linear in the second argument and antilinear in the first (“sesquilinear” = one and a half linear): s(x, cy + dz) = cs(x, y) + ds(x, z) s(cx + dy, z) = cs(x, z) + ds(y, z)


A scalar product has these properties, but this new notion is more general.

Proposition 5.3 (The polarization identity). Let $s$ be a sesquilinear form, and let $q(x) = s(x, x)$. Then
\[ s(x, y) = \frac{1}{4}\left[ q(x + y) - q(x - y) + i\,q(x - iy) - i\,q(x + iy) \right]. \]

Exercise 5.2. Prove Proposition 5.3, by a direct calculation.
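A numerical sanity check of Proposition 5.3 on $\mathbb{C}^2$ can be sketched as follows. It is no substitute for the calculation asked for in Exercise 5.2. The form is taken to be $s(x, y) = \sum_{j,k} \overline{x_j} A_{jk} y_k$ for a matrix $A$; the matrix, the vectors and the function names are assumptions made for this illustration. Note that $A$ is deliberately not Hermitian, so $s$ is sesquilinear but not a scalar product.

```python
def s(x, y, A):
    """Sesquilinear form: antilinear in x, linear in y."""
    return sum(complex(x[j]).conjugate() * A[j][k] * y[k]
               for j in range(len(x)) for k in range(len(y)))


def q(x, A):
    """The diagonal values q(x) = s(x, x)."""
    return s(x, x, A)


def combine(u, v, c):
    """The vector u + c v."""
    return [a + c * b for a, b in zip(u, v)]


def polarization(x, y, A):
    """Right-hand side of the polarization identity."""
    return 0.25 * (q(combine(x, y, 1), A) - q(combine(x, y, -1), A)
                   + 1j * q(combine(x, y, -1j), A)
                   - 1j * q(combine(x, y, 1j), A))


A = [[1 + 2j, -1j], [3, 0.5 - 1j]]   # arbitrary, not Hermitian
x = [1 + 1j, 2 - 1j]
y = [-0.5j, 1 + 3j]
lhs = s(x, y, A)
rhs = polarization(x, y, A)
print(lhs, rhs)
```

The two printed values agree up to rounding, so the off-diagonal values of $s$ are indeed recovered from $q$ alone.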

The polarization identity is an extremely useful tool and has many applications. It suggests the principle "it is often enough to know what happens on the diagonal." In the context of the Jordan-von Neumann Theorem, it implies that the scalar product can be recovered from its norm, as already mentioned above. This is in fact immediate now because if $s(x, y) = \langle x, y\rangle$, then $q(x) = \|x\|^2$, so the polarization identity gives $\langle x, y\rangle$ in terms of the norms of four other vectors.

Exercise 5.3. Use the result from Exercise 5.1 to prove that the norms $\|\cdot\|_p$ on $\ell^p$ are not generated by a scalar product for $p \ne 2$.

Given a scalar product on a space $H$, we always equip $H$ with the norm from Theorem 5.2(a) also. In particular, all constructions and results on normed spaces can be used in this setting, and we have a topology on $H$. The following observation generalizes the result from Exercise 2.2(b) in this setting:

Corollary 5.4. The scalar product is continuous: if $x_n \to x$, $y_n \to y$, then also $\langle x_n, y_n\rangle \to \langle x, y\rangle$.

Exercise 5.4. Deduce this from the Cauchy-Schwarz inequality.

As usual, complete spaces are particularly important, so they again get a special name:

Definition 5.5. A complete space with scalar product is called a Hilbert space.

Or we could say a Hilbert space is a Banach space whose norm comes from a scalar product. By Example 5.1, $\ell^2$ and $L^2$ are Hilbert spaces (we of course know that these are Banach spaces, so there's nothing new to check here). On the other hand, Exercise 5.3 says that $\ell^p$ cannot be given a Hilbert space structure (that leaves the norm intact) if $p \ne 2$.

Hilbert spaces are very special Banach spaces. Roughly speaking, the scalar product allows us to introduce angles between vectors, and this


additional structure makes things much more pleasant. There is no such notion on a general Banach space. In particular, a scalar product leads to a natural notion of orthogonality, and this can be used to great effect. In the sequel, $H$ will always be assumed to be a Hilbert space.

We say that $x, y \in H$ are orthogonal if $\langle x, y\rangle = 0$. In this case, we also write $x \perp y$. If $M \subset H$ is an arbitrary subset of $H$, we define its orthogonal complement by
\[ M^\perp = \{ x \in H : \langle x, m\rangle = 0 \text{ for all } m \in M \}. \]

Exercise 5.5. Prove the following formula, which is reminiscent of the Pythagorean theorem: If $x \perp y$, then
\[ \|x + y\|^2 = \|x\|^2 + \|y\|^2. \tag{5.3} \]

Theorem 5.6. (a) $M^\perp$ is a closed subspace of $H$.
(b) $M^\perp = L(M)^\perp = \overline{L(M)}^\perp$

Here, $L(M)$ denotes the linear span of $M$, that is, $L(M)$ is the smallest subspace containing $M$. A more explicit description is also possible: $L(M) = \{ \sum_{j=1}^n c_j m_j : c_j \in \mathbb{C},\ m_j \in M,\ n \in \mathbb{N} \}$

Proof. (a) To show that $M^\perp$ is a subspace, let $x, y \in M^\perp$. Then, for arbitrary $m \in M$, $\langle x + y, m\rangle = \langle x, m\rangle + \langle y, m\rangle = 0$, so $x + y \in M^\perp$ also. A similar argument works for multiples of vectors from $M^\perp$.

If $x_n \in M^\perp$, $x \in H$, $x_n \to x$ and $m \in M$ is again arbitrary, then, by the continuity of the scalar product (Corollary 5.4),
\[ \langle x, m\rangle = \lim_{n\to\infty} \langle x_n, m\rangle = 0, \]
so $x \in M^\perp$ also and $M^\perp$ turns out to be closed, as claimed.

(b) From the definition of $A^\perp$, it is clear that $A^\perp \supset B^\perp$ if $A \subset B$. Since obviously $M \subset L(M) \subset \overline{L(M)}$, we obtain that $\overline{L(M)}^\perp \subset L(M)^\perp \subset M^\perp$. On the other hand, if $x \in M^\perp$, then $\langle x, m\rangle = 0$ for all $m \in M$. Since the scalar product is linear in the second argument, this implies that $\langle x, y\rangle = 0$ for all $y \in L(M)$. Since the scalar product is also continuous, it now follows that in fact $\langle x, z\rangle = 0$ for all $z \in \overline{L(M)}$, that is, $x \in \overline{L(M)}^\perp$. □

Exercise 5.6. (a) Show that the closure of a subspace is a subspace again. (This shows that $\overline{L(M)}$ can be described as the smallest closed subspace containing $M$.)
(b) Show that $L(\overline{M}) \subset \overline{L(M)}$.
(c) Show that it can happen that $L(\overline{M}) \ne \overline{L(M)}$.
Suggestion: Consider $M = \{e_n : n \ge 1\} \subset \ell^2$.


Theorem 5.7. Let $M \subset H$ be a closed subspace of $H$, and let $x \in H$. Then there exists a unique best approximation to $x$ in $M$, that is, there exists a unique $y \in M$ so that
\[ \|x - y\| = \inf_{m \in M} \|x - m\|. \]

Proof. Write $d = \inf_{m \in M} \|x - m\|$ and pick a sequence $y_n \in M$ with $\|x - y_n\| \to d$. The parallelogram identity (5.2) implies that
\begin{align*}
\|y_m - y_n\|^2 &= \|(y_m - x) - (y_n - x)\|^2 \\
&= 2\|y_m - x\|^2 + 2\|y_n - x\|^2 - \|y_m + y_n - 2x\|^2 \\
&= 2\|y_m - x\|^2 + 2\|y_n - x\|^2 - 4\|(1/2)(y_m + y_n) - x\|^2 .
\end{align*}
Now if $m, n \to \infty$, then the first two terms in this final expression both converge to $2d^2$, by the choice of $y_n$. Since $(1/2)(y_m + y_n) \in M$, we have that $\|(1/2)(y_m + y_n) - x\| \ge d$ for all $m, n$. It follows that $\|y_m - y_n\| \to 0$ as $m, n \to \infty$, so $y_n$ is a Cauchy sequence. Let $y = \lim_{n\to\infty} y_n$. Since $M$ is closed, $y \in M$, and by the continuity of the norm, $\|x - y\| = \lim \|x - y_n\| = d$, so $y$ is a best approximation.

To prove the uniqueness of $y$, assume that $y' \in M$ also satisfies $\|x - y'\| = d$. Then, by the above calculation, with $y_m, y_n$ replaced by $y, y'$, we have that
\[ \|y - y'\|^2 = 2\|y - x\|^2 + 2\|y' - x\|^2 - 4\|(1/2)(y + y') - x\|^2 = 4d^2 - 4\|(1/2)(y + y') - x\|^2 . \]
Again, since $(1/2)(y + y') \in M$, this last norm is $\ge d$, so the whole expression is $\le 0$ and we must have that $y = y'$, as desired. □

These best approximations can be used to project orthogonally onto closed subspaces of a Hilbert space. More precisely, we have the following:

Theorem 5.8. Let $M \subset H$ be a closed subspace. Then every $x \in H$ has a unique representation of the form $x = y + z$, with $y \in M$, $z \in M^\perp$.

Proof. Use Theorem 5.7 to define $y \in M$ as the best approximation to $x$ from $M$, that is, $\|x - y\| \le \|x - m\|$ for all $m \in M$. Let $z = x - y$. We want to show that $z \in M^\perp$. If $w \in M$, $w \ne 0$, and $c \in \mathbb{C}$, then
\[ \|z\|^2 \le \|x - (y + cw)\|^2 = \|z - cw\|^2 = \|z\|^2 + |c|^2 \|w\|^2 - 2\,\mathrm{Re}\, c\langle z, w\rangle . \]
In particular, with $c = \frac{\langle w, z\rangle}{\|w\|^2}$, this shows that $|\langle w, z\rangle|^2 \le 0$, so $\langle w, z\rangle = 0$, and since this holds for every $w \in M$, we see that $z \in M^\perp$, as desired.

To show that the decomposition from Theorem 5.8 is unique, suppose that $x = y + z = y' + z'$, with $y, y' \in M$, $z, z' \in M^\perp$. Then $y - y' = z' - z \in M \cap M^\perp = \{0\}$, so $y = y'$ and $z = z'$. □
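In the finite-dimensional space $H = \mathbb{C}^3$, the decomposition $x = y + z$ from Theorem 5.8 can be computed explicitly. The sketch below does this with a Gram-Schmidt step, which is a different route than the best-approximation argument of the proof; the function names and the sample data are assumptions made for this illustration. The scalar product is antilinear in the first argument, as in the text.

```python
import math


def inner(x, y):
    """Scalar product on C^n, antilinear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(x, y))


def norm(x):
    return math.sqrt(inner(x, x).real)


def orthonormalize(vectors):
    """Gram-Schmidt: an orthonormal basis of the span of the given vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            c = inner(e, w)
            w = [a - c * b for a, b in zip(w, e)]
        n = norm(w)
        if n > 1e-12:          # skip vectors already in the span
            basis.append([a / n for a in w])
    return basis


def decompose(x, vectors):
    """Return (y, z) with y in M = span(vectors) and z = x - y."""
    y = [0j] * len(x)
    for e in orthonormalize(vectors):
        c = inner(e, x)
        y = [a + c * b for a, b in zip(y, e)]
    z = [a - b for a, b in zip(x, y)]
    return y, z


M = [[1, 1j, 0], [0, 1, 1]]      # spanning set of a two-dimensional subspace
x = [2, -1j, 3 + 1j]
y, z = decompose(x, M)
print([abs(inner(m, z)) for m in M])   # both entries are numerically zero
```

The residual $z$ is orthogonal to the spanning vectors, and $\|z\|$ does not exceed the distance from $x$ to any particular element of $M$ that one cares to test.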


Corollary 5.9. For an arbitrary subset $A \subset H$, we have that $A^{\perp\perp} = \overline{L(A)}$.

Proof. From the definition of $(\ldots)^\perp$, we see that $B \subset B^{\perp\perp}$, so Theorem 5.6(b) implies that $\overline{L(A)} \subset A^{\perp\perp}$. On the other hand, if $x \in A^{\perp\perp}$, we can use Theorem 5.8 to write $x = y + z$ with $y \in \overline{L(A)}$, $z \in \overline{L(A)}^\perp = A^\perp$. The last equality again follows from Theorem 5.6(b). As just observed, we then also have that $y \in A^{\perp\perp}$ and thus $z = x - y \in A^\perp \cap A^{\perp\perp} = \{0\}$, so $x = y \in \overline{L(A)}$. □

We now introduce a linear operator that produces the decomposition from Theorem 5.8. Let $M \subset H$ be a closed subspace. We then define $P_M : H \to H$, $P_M x = y$, where $y \in M$ is as in Theorem 5.8; $P_M$ is called the (orthogonal) projection onto $M$.

Proposition 5.10. $P_M \in B(H)$, $P_M^2 = P_M$, and if $M \ne \{0\}$, then $\|P_M\| = 1$.

Proof. We will only compute the operator norm of $P_M$ here.

Exercise 5.7. Prove that $P_M$ is linear and $P_M^2 = P_M$.

Write $x = P_M x + z$. Then $P_M x \in M$, $z \in M^\perp$, so, by the Pythagorean formula (5.3),
\[ \|x\|^2 = \|P_M x\|^2 + \|z\|^2 \ge \|P_M x\|^2 . \]
Thus $P_M \in B(H)$ and $\|P_M\| \le 1$. On the other hand, if $x \in M$, then $P_M x = x$, so $\|P_M\| = 1$ if $M \ne \{0\}$. □

We saw in Chapter 4 that $(\ell^2)^* = \ell^2$, $(L^2)^* = L^2$. This is no coincidence.

Theorem 5.11 (Riesz Representation Theorem). Every $F \in H^*$ has the form $F(x) = \langle y, x\rangle$, for some $y = y_F \in H$. Moreover, $\|F\| = \|y_F\|$.

We can say more: conversely, every $y \in H$ generates a bounded, linear functional $F = F_y$ via $F_y(x) = \langle y, x\rangle$. So we can define a map $I : H \to H^*$, $y \mapsto F_y$. This map is injective (why?), and, by the Riesz representation theorem, $I$ is also surjective and isometric, so we obtain an identification of $H$ with $H^*$. We need to be a little careful here, because $I$ is antilinear, that is, $F_{y+z} = F_y + F_z$, as usual, but $F_{cy} = \overline{c} F_y$.

Exercise 5.8. Deduce from this that Hilbert spaces are reflexive. If we ignore the identification maps and just pretend that $H = H^*$ and proceed formally, then this becomes obvious: $H^{**} = (H^*)^* = H^* = H$. Please give a careful argument. Recall that you really need to


show that $j(H) = H^{**}$, where $j$ was defined in Chapter 4. (This is surprisingly awkward to write down; perhaps you want to use the fact that $F : X \to \mathbb{C}$ is antilinear precisely if $\overline{F}$ is linear.)

Exercise 5.9. Let $X$ be a (complex) vector space and let $F : X \to \mathbb{C}$ be a linear functional, $F \ne 0$.
(a) Show that $\operatorname{codim} N(F) = 1$, that is, show that there exists a one-dimensional subspace $M \subset X$, $M \cap N(F) = \{0\}$, $M + N(F) = X$. (This is an immediate consequence of linear algebra facts, but you can also do it by hand.)
(b) Let $F, G$ be linear functionals with $N(F) = N(G)$. Then $F = cG$ for some $c \in \mathbb{C}$, $c \ne 0$.

Proof of Theorem 5.11. This is surprisingly easy; Exercise 5.9 provides the motivation for the following argument and also explains why this procedure (take an arbitrary vector from $N(F)^\perp$) works. If $F = 0$, we can of course just take $y = 0$. If $F \ne 0$, then $N(F) \ne H$, and $N(F)$ is a closed subspace because $F$ is continuous. Therefore, Theorem 5.8 shows that $N(F)^\perp \ne \{0\}$. Pick a vector $z \in N(F)^\perp$, $z \ne 0$. Then, for arbitrary $x \in H$, $F(z)x - F(x)z \in N(F)$, so
\[ 0 = \langle z, F(z)x - F(x)z\rangle = F(z)\langle z, x\rangle - F(x)\|z\|^2 . \]
Rearranging, we obtain that $F(x) = \langle y, x\rangle$, with $y = \frac{\overline{F(z)}}{\|z\|^2}\, z$.

Since $|\langle y, x\rangle| \le \|y\|\,\|x\|$, we have that $\|F\| \le \|y\|$. On the other hand, $F(y) = \|y\|^2$, so $\|F\| = \|y\|$. □
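On $H = \mathbb{C}^n$ the representing vector of Theorem 5.11 can be read off in coordinates: since $F(x) = \sum_j F(e_j) x_j$, the vector $y$ with $y_j = \overline{F(e_j)}$ satisfies $F(x) = \langle y, x\rangle$. The sketch below uses this coordinate shortcut rather than the argument of the proof; the functional `F` and all helper names are hypothetical examples.

```python
def inner(x, y):
    """Scalar product on C^n, antilinear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(x, y))


def riesz_vector(F, n):
    """The vector y with F(x) = <y, x>, built from y_j = conj(F(e_j))."""
    y = []
    for j in range(n):
        e_j = [1 if k == j else 0 for k in range(n)]
        y.append(complex(F(e_j)).conjugate())
    return y


def F(x):
    """A sample linear functional on C^3 (hypothetical)."""
    return (2 - 1j) * x[0] + 3j * x[1] - x[2]


y = riesz_vector(F, 3)
x = [1 + 1j, -2, 0.5j]
print(F(x), inner(y, x))        # the two values coincide
print(F(y), inner(y, y))        # F(y) = ||y||^2, as used in the proof
```

The second printed pair reflects the last step of the proof: $F(y) = \|y\|^2$, so the bound $\|F\| \le \|y\|$ is attained.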

Exercise 5.10. Corollary 4.2(b), when combined with the Riesz Representation Theorem, implies that
\[ \|x\| = \sup_{\|y\| = 1} |\langle y, x\rangle| . \]
Give a quick direct proof of this fact.

Exercise 5.11. We don't need the Hahn-Banach Theorem on Hilbert spaces because the Riesz Representation Theorem gives a much more explicit description of the dual space. Show that it in fact implies the following stronger version of Hahn-Banach: If $F_0 : H_0 \to \mathbb{C}$ is a bounded linear functional on a subspace $H_0$, then there exists a unique bounded linear extension $F : H \to \mathbb{C}$ with $\|F\| = \|F_0\|$.
Remark: If you want to avoid using the Hahn-Banach Theorem here, you could as a first step extend $F_0$ to $\overline{H_0}$, by using Exercise 2.26.

The Riesz Representation Theorem also shows that on a Hilbert space, $x_n \xrightarrow{w} x$ if and only if $\langle y, x_n\rangle \to \langle y, x\rangle$ for all $y \in H$; compare Exercise 4.7.

Exercise 5.12. Assume that $x_n \xrightarrow{w} x$.
(a) Show that $\|x\| \le \liminf_{n\to\infty} \|x_n\|$.
(b) Show that it can happen that $\|x\| < \liminf \|x_n\|$.
(c) On the other hand, if $\|x_n\| \to \|x\|$, then $\|x_n - x\| \to 0$ (prove this).

As our final topic in this chapter, we discuss orthonormal bases in Hilbert spaces.

Definition 5.12. A subset $\{x_\alpha : \alpha \in I\}$ is called an orthogonal system if $\langle x_\alpha, x_\beta\rangle = 0$ for all $\alpha \ne \beta$. If, in addition, all $x_\alpha$ are normalized (so $\langle x_\alpha, x_\beta\rangle = \delta_{\alpha\beta}$), we call $\{x_\alpha\}$ an orthonormal system (ONS). A maximal ONS is called an orthonormal basis (ONB).

Theorem 5.13. Every Hilbert space has an ONB. Moreover, any ONS can be extended to an ONB.

This looks very plausible (if an ONS isn't maximal yet, just keep adding vectors). The formal proof depends on Zorn's Lemma; we don't want to do it here.

Theorem 5.14. Let $\{x_\alpha\}$ be an ONB. Then, for every $y \in H$, we have that
\[ y = \sum_{\alpha \in I} \langle x_\alpha, y\rangle x_\alpha , \qquad \|y\|^2 = \sum_{\alpha \in I} |\langle x_\alpha, y\rangle|^2 \quad \text{(Parseval's identity)}. \]
If, conversely, $c_\alpha \in \mathbb{C}$, $\sum_{\alpha \in I} |c_\alpha|^2 < \infty$, then the series $\sum_{\alpha \in I} c_\alpha x_\alpha$ converges to an element $y \in H$.

To make this precise, we need to define sums over arbitrary index sets. We encountered this problem before, in Chapter 2, when defining the space $\ell^2(I)$, and we will use the same procedure here: $\sum_{\alpha \in I} w_\alpha = z$ means that $w_\alpha \ne 0$ for at most countably many $\alpha \in I$ and if $\{\alpha_n\}$ is an arbitrary enumeration of these $\alpha$'s, then $\lim_{N\to\infty} \sum_{n=1}^N w_{\alpha_n} = z$. In this definition, we can have $w_\alpha, z \in H$ or $\in \mathbb{C}$. In this latter case, we can also again use counting measure on $I$ to obtain a more elegant formulation.

Theorem 5.14 can now be rephrased in a more abstract way. Consider the map $U : H \to \ell^2(I)$, $(Uy)_\alpha = \langle x_\alpha, y\rangle$. Theorem 5.14 says that this is well defined, bijective, and isometric. Moreover, $U$ is also obviously linear. So, summing up, we have a bijection $U \in B(H, \ell^2(I))$ that also preserves the scalar product: $\langle Ux, Uy\rangle = \langle x, y\rangle$.


Exercise 5.13. Prove this last statement. Hint: Polarization!

Such maps are called unitary; they preserve the complete Hilbert space structure. In other words, we can now say that Theorem 5.14 shows that $H \cong \ell^2(I)$ for an arbitrary Hilbert space $H$; more specifically, $I$ can be taken as the index set of an ONB. So we have a one-size-fits-all model space, namely $\ell^2(I)$; there is no such universal model for Banach spaces.

There is a version of Theorem 5.14 for ONS; actually, we will prove the two results together.

Theorem 5.15. Let $\{x_\alpha\}$ be an ONS. Then, for every $y \in H$, we have that
\[ P_{\overline{L(x_\alpha)}}\, y = \sum_{\alpha \in I} \langle x_\alpha, y\rangle x_\alpha , \qquad \|y\|^2 \ge \sum_{\alpha \in I} |\langle x_\alpha, y\rangle|^2 \quad \text{(Bessel's inequality)}. \]

Proof of Theorems 5.14, 5.15. We start by establishing Bessel's inequality for finite ONS $\{x_1, \ldots, x_N\}$. Let $y \in H$ and write
\[ y = \sum_{n=1}^N \langle x_n, y\rangle x_n + \left( y - \sum_{n=1}^N \langle x_n, y\rangle x_n \right) . \]
A calculation shows that the two terms on the right-hand side are orthogonal, so
\begin{align*}
\|y\|^2 &= \left\| \sum_{n=1}^N \langle x_n, y\rangle x_n \right\|^2 + \left\| y - \sum_{n=1}^N \langle x_n, y\rangle x_n \right\|^2 \\
&= \sum_{n=1}^N |\langle x_n, y\rangle|^2 + \left\| y - \sum_{n=1}^N \langle x_n, y\rangle x_n \right\|^2 \ge \sum_{n=1}^N |\langle x_n, y\rangle|^2 .
\end{align*}
This is Bessel's inequality for finite ONS. It now follows that the sets $\{\alpha \in I : |\langle x_\alpha, y\rangle| \ge 1/n\}$ are finite, so $\{\alpha : \langle x_\alpha, y\rangle \ne 0\}$ is countable. Let $\{\alpha_1, \alpha_2, \ldots\}$ be an enumeration. Then, by Bessel's inequality (we're still referring to the version for finite ONS), $\lim_{N\to\infty} \sum_{n=1}^N |\langle x_{\alpha_n}, y\rangle|^2$ exists, and, since we have absolute convergence here, the limit is independent of the enumeration. If we recall how $\sum_{\alpha \in I} \ldots$ was defined, we see that we have proved the general version of Bessel's inequality.

As the next step, define $y_n = \sum_{j=1}^n \langle x_{\alpha_j}, y\rangle x_{\alpha_j}$. If $n \ge m$ (say), then
\[ \|y_m - y_n\|^2 = \left\| \sum_{j=m+1}^n \langle x_{\alpha_j}, y\rangle x_{\alpha_j} \right\|^2 = \sum_{j=m+1}^n \left| \langle x_{\alpha_j}, y\rangle \right|^2 . \]

This shows that $y_n$ is a Cauchy sequence. Let $y' = \lim y_n = \sum_{j=1}^\infty \langle x_{\alpha_j}, y\rangle x_{\alpha_j}$. By the continuity of the scalar product,
\[ \langle x_{\alpha_k}, y - y'\rangle = \langle x_{\alpha_k}, y\rangle - \sum_{j=1}^\infty \langle x_{\alpha_j}, y\rangle \delta_{jk} = 0 \]
for all $k \in \mathbb{N}$, and if $\alpha \in I \setminus \{\alpha_j\}$, then we also obtain that
\[ \langle x_\alpha, y - y'\rangle = -\langle x_\alpha, y'\rangle = -\sum_{j=1}^\infty \langle x_{\alpha_j}, y\rangle \langle x_\alpha, x_{\alpha_j}\rangle = 0 . \]
So $y - y' \in \{x_\alpha\}^\perp = \overline{L(x_\alpha)}^\perp$, and, by its construction, $y' \in \overline{L(x_\alpha)}$. Thus $y' = P_{\overline{L(x_\alpha)}}\, y$, as claimed in Theorem 5.15. It now also follows that $\sum_{\alpha \in I} \langle x_\alpha, y\rangle x_\alpha$ exists because we always obtain the same limit $y' = P_{\overline{L(x_\alpha)}}\, y$, no matter how the $\alpha_j$ are arranged.

To obtain Theorem 5.14, we observe that if $\{x_\alpha\}$ is an ONB, then $\overline{L(x_\alpha)} = H$. Indeed, if this were not true, then the closed subspace $\overline{L(x_\alpha)}$ would have to have a non-zero orthogonal complement, by Theorem 5.8, and we could pass to a bigger ONS by adding a normalized vector from this orthogonal complement. So $\overline{L(x_\alpha)} = H$ if $\{x_\alpha\}$ is an ONB, but then also $y' = y$, and Parseval's identity now follows from the continuity of the norm:
\[ \|y\|^2 = \lim_{N\to\infty} \left\| \sum_{i=1}^N \langle x_{\alpha_i}, y\rangle x_{\alpha_i} \right\|^2 = \lim_{N\to\infty} \sum_{i=1}^N |\langle x_{\alpha_i}, y\rangle|^2 = \sum_{\alpha \in I} |\langle x_\alpha, y\rangle|^2 \]
Finally, similar arguments show that $\sum c_\alpha x_\alpha$ exists for $c \in \ell^2(I)$ (consider the partial sums and check that these form a Cauchy sequence). □

We can try to summarize this as follows: once an ONB is fixed, we may use the coefficients with respect to this ONB to manipulate vectors; in particular, there is an easy formula (Parseval's identity) that will give the norm in terms of these coefficients. The situation is quite similar to linear algebra: coefficients with respect to a fixed basis are all we need to know about vectors. Note, however, that ONBs are not bases in the sense of linear algebra: we use infinite linear combinations (properly defined as limits) to represent vectors. Algebraic bases on infinite-dimensional Hilbert spaces exist, too, but they are almost entirely useless (for example, they can never be countable).

Exercise 5.14. Show that $\{e_n : n \in \mathbb{N}\}$ is an ONB of $\ell^2 = \ell^2(\mathbb{N})$.
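The difference between Bessel's inequality (Theorem 5.15) and Parseval's identity (Theorem 5.14) is already visible in $\mathbb{C}^3$. The sketch below takes an ONS with two elements, extends it to an ONB by one more vector, and compares the coefficient sums with $\|y\|^2$. The particular vectors and the names `ons`, `onb`, `coefficient_sum` are assumptions chosen for this illustration.

```python
import math


def inner(x, y):
    """Scalar product on C^n, antilinear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(x, y))


def coefficient_sum(system, y):
    """Sum of |<x_alpha, y>|^2 over the given orthonormal system."""
    return sum(abs(inner(e, y)) ** 2 for e in system)


def expansion(system, y):
    """The vector sum_alpha <x_alpha, y> x_alpha."""
    out = [0j] * len(y)
    for e in system:
        c = inner(e, y)
        out = [a + c * b for a, b in zip(out, e)]
    return out


r = 1 / math.sqrt(2)
ons = [[1, 0, 0], [0, r, r]]     # orthonormal, but not maximal in C^3
onb = ons + [[0, r, -r]]         # one more vector makes it an ONB
y = [1 + 2j, -1j, 3]

norm_sq = inner(y, y).real
bessel = coefficient_sum(ons, y)
parseval = coefficient_sum(onb, y)
print(norm_sq, bessel, parseval)
```

For the ONS the coefficient sum stays strictly below $\|y\|^2$, and `expansion(ons, y)` is only the projection of $y$ onto the span; for the ONB the sum equals $\|y\|^2$ and `expansion(onb, y)` returns $y$ itself, up to rounding.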


Exercise 5.15. Show that $\{e^{inx} : n \in \mathbb{Z}\}$ is an ONB of $L^2\left( (-\pi, \pi); \frac{dx}{2\pi} \right)$.
Suggestion: You should not try to prove the maximality directly, but rather refer to suitable facts from Analysis (continuous functions are dense in $L^2$, and they can be uniformly approximated, on compact sets, by trigonometric polynomials).

Exercise 5.16. (a) For $f \in L^2(-\pi, \pi)$, define the $n$th Fourier coefficient as
\[ \widehat{f}_n = \int_{-\pi}^{\pi} f(x) e^{-inx} \, dx , \]
and use the result from Exercise 5.15 to establish the following formula, which is also known as Parseval's identity:
\[ \sum_{n=-\infty}^{\infty} \left| \widehat{f}_n \right|^2 = 2\pi \int_{-\pi}^{\pi} |f(x)|^2 \, dx \]
(b) Prove that
\[ \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6} . \]
Suggestion: Part (a) with $f(x) = x$.

Exercise 5.17. The Rademacher functions $R_0(x) = 1$,
\[ R_n(x) = \begin{cases} 1 & x \in \bigcup_{k=0}^{2^{n-1}-1} \left[ k 2^{1-n}, (2k+1) 2^{-n} \right) \\ -1 & \text{else} \end{cases} \]
form an ONS, but not an ONB in $L^2(0, 1)$. (Please plot the first few functions to get your bearings here.)

Exercise 5.18. (a) Let $U : H_1 \to H_2$ be a unitary map between Hilbert spaces, and let $\{x_\alpha\}$ be an ONB of $H_1$. Show that $\{U x_\alpha\}$ is an ONB of $H_2$.
(b) Conversely, let $U : H_1 \to H_2$ be a linear map that maps an ONB to an ONB again. Show that $U$ is unitary.

Exercise 5.19. Show that a Hilbert space is separable precisely if it has a countable ONB.
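In the spirit of the suggestion in Exercise 5.17 to plot the first few Rademacher functions, here is a small text-based sketch. It uses the observation that $R_n$ is constant on the dyadic cells $[j 2^{-N}, (j+1) 2^{-N})$ for $n \le N$, where it equals $1$ exactly when $\lfloor 2^n x \rfloor$ is even, so integrals of products reduce to finite averages. The function names and the grid size are choices made for this illustration, and the check covers only finitely many functions; it does not address the question of maximality.

```python
def rademacher(n, j, N):
    """Value of R_n on the dyadic cell [j 2^-N, (j+1) 2^-N), for 0 <= n <= N.

    On this cell floor(2^n x) equals j >> (N - n), and R_n = 1 iff it is even.
    """
    return 1 if ((j >> (N - n)) & 1) == 0 else -1


def sign_pattern(n, N):
    """One character per dyadic cell: '+' where R_n = 1, '-' where R_n = -1."""
    return "".join("+" if rademacher(n, j, N) == 1 else "-"
                   for j in range(2 ** N))


def gram(max_n):
    """Matrix of the integrals of R_m R_n over (0, 1), for 0 <= m, n <= max_n."""
    N = max_n
    cells = 2 ** N
    return [[sum(rademacher(m, j, N) * rademacher(n, j, N)
                 for j in range(cells)) / cells
             for n in range(max_n + 1)]
            for m in range(max_n + 1)]


for n in range(4):
    print(n, sign_pattern(n, 3))

G = gram(4)
print(G)
```

The Gram matrix comes out as the identity, which is the orthonormality of $R_0, \ldots, R_4$.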


6. Operators in Hilbert spaces

Let $H$ be a Hilbert space. In this chapter, we are interested in basic properties of operators $T \in B(H)$ on this Hilbert space. First of all, we would like to define an adjoint operator $T^*$, and its defining property should be given by $\langle T^* y, x\rangle = \langle y, Tx\rangle$. It is not completely clear, however, that this indeed defines a new operator $T^*$. To make this idea precise, we proceed as follows: Fix $y \in H$ and consider the map $H \to \mathbb{C}$, $x \mapsto \langle y, Tx\rangle$. It is clear that this is a linear map, and $|\langle y, Tx\rangle| \le \|y\|\,\|Tx\| \le \|y\|\,\|T\|\,\|x\|$, so the map is also bounded. By the Riesz Representation Theorem, there exists a unique vector $z = z_y \in H$ so that $\langle y, Tx\rangle = \langle z_y, x\rangle$ for all $x \in H$. We can now define a map $T^* : H \to H$, $T^* y = z_y$. By construction, we then indeed have that $\langle T^* y, x\rangle = \langle y, Tx\rangle$ for all $x, y \in H$; conversely, this condition uniquely determines $T^* y$ for all $y \in H$. We call $T^*$ the adjoint operator (of $T$).

Theorem 6.1. Let $S, T \in B(H)$, $c \in \mathbb{C}$. Then:
(a) $T^* \in B(H)$;
(b) $(S + T)^* = S^* + T^*$, $(cT)^* = \overline{c}\, T^*$;
(c) $(ST)^* = T^* S^*$;
(d) $T^{**} = T$;
(e) If $T$ is invertible, then $T^*$ is also invertible and $(T^*)^{-1} = (T^{-1})^*$;
(f) $\|T\| = \|T^*\|$, $\|TT^*\| = \|T^*T\| = \|T\|^2$ (the $C^*$ property)

Here, we call $T \in B(H)$ invertible (it would be more precise to say: invertible in $B(H)$) if there exists an $S \in B(H)$ so that $ST = TS = 1$. In this case, $S$ with these properties is unique and we call it the inverse of $T$ and write $S = T^{-1}$. Notice that this version of invertibility requires more than just injectivity of $T$ as a map: we also require the inverse map to be continuous and defined everywhere on $H$ (and linear, but this is automatic). So we can also say that $T \in B(H)$ is invertible (in this sense) precisely if $T$ is bijective on $H$ and has a continuous inverse. Actually, Corollary 3.3 shows that this continuity is automatic also, so $T \in B(H)$ is invertible precisely if $T$ is a bijective map.

Exercise 6.1. (a) Show that it is not enough to have just one of the equations $ST = 1$, $TS = 1$: Construct two non-invertible maps $S, T \in B(H)$ (on some Hilbert space $H$; $H = \ell^2$ seems a good choice) that nevertheless satisfy $ST = 1$.
(b) However, if $H$ is finite-dimensional and $ST = 1$, then both $S$ and $T$ will be invertible. Prove this.


Proof. (a) The (anti-)linearity of the scalar product implies that $T^*$ is linear; for example, $\langle cT^* y, x\rangle = \langle T^* y, \overline{c} x\rangle = \langle y, T(\overline{c} x)\rangle = \langle cy, Tx\rangle$ for all $x \in H$, so $T^*(cy) = cT^* y$. Furthermore,
\[ \sup_{\|y\|=1} \|T^* y\| = \sup_{\|x\|=\|y\|=1} |\langle T^* y, x\rangle| = \sup_{\|x\|=\|y\|=1} |\langle y, Tx\rangle| = \|T\| , \]
so $T^* \in B(H)$ and $\|T^*\| = \|T\|$.

Parts (b), (c) follow directly from the definition of the adjoint operator. For example, to verify (c), we just observe that $\langle y, STx\rangle = \langle S^* y, Tx\rangle = \langle T^* S^* y, x\rangle$, so $(ST)^* y = T^* S^* y$.

(d) We have that $\langle y, T^* x\rangle = \overline{\langle T^* x, y\rangle} = \overline{\langle x, Ty\rangle} = \langle Ty, x\rangle$, thus $T^{**} y = Ty$.

(e) Obviously, $1^* = 1$. So if we take adjoints in $TT^{-1} = T^{-1}T = 1$ and use (c), we obtain that $(T^{-1})^* T^* = T^* (T^{-1})^* = 1$, and since $(T^{-1})^* \in B(H)$, this says that $T^*$ is invertible and $(T^*)^{-1} = (T^{-1})^*$.

(f) We already saw in the proof of part (a) that $\|T^*\| = \|T\|$. It is then also clear that $\|T^*T\| \le \|T^*\|\,\|T\| = \|T\|^2$. On the other hand,
\[ \|T^*T\| = \sup_{\|x\|=\|y\|=1} |\langle y, T^*Tx\rangle| \ge \sup_{\|x\|=1} |\langle x, T^*Tx\rangle| = \sup_{\|x\|=1} \|Tx\|^2 = \|T\|^2 , \]
so $\|T^*T\| = \|T\|^2$. If applied to $T^*$ in place of $T$, this also gives that $\|TT^*\| = \|T^{**}T^*\| = \|T^*\|^2 = \|T\|^2$. □

Theorem 6.2. Let $T \in B(H)$. Then $N(T^*) = R(T)^\perp$.

Proof. We have $x \in N(T^*)$ precisely if $\langle T^* x, y\rangle = 0$ for all $y \in H$, and this happens if and only if $\langle x, Ty\rangle = 0$ ($y \in H$). This, in turn, holds if and only if $x \in R(T)^\perp$. □

We will be especially interested in Hilbert space operators with additional properties.

Definition 6.3. Let $T \in B(H)$. We call $T$ self-adjoint if $T = T^*$, unitary if $TT^* = T^*T = 1$ and normal if $TT^* = T^*T$.

So self-adjoint and unitary operators are also normal. We introduced unitary operators earlier, in Chapter 5, in the more general setting of operators between two Hilbert spaces; recall that we originally defined these as maps that preserve the complete Hilbert space structure (that is, the algebraic structure and the scalar product). Theorem 6.4(b) below will make it clear that the new definition is equivalent to the old one (for maps on one space). Also, notice that $U$ is unitary precisely if $U$ is invertible (in $B(H)$, as above) and $U^{-1} = U^*$. Here are some additional reformulations:


Theorem 6.4. Let $U \in B(H)$. Then the following statements are equivalent:
(a) $U$ is unitary;
(b) $U$ is bijective and $\langle Ux, Uy\rangle = \langle x, y\rangle$ for all $x, y \in H$;
(c) $U$ is surjective and isometric (that is, $\|Ux\| = \|x\|$ for all $x \in H$).

Exercise 6.2. Prove Theorem 6.4. Suggestion: Use polarization to derive (b) from (c).

We now take a second look at (orthogonal) projections. Recall that, by definition, the projection on $M$ (where $M \subset H$ is a closed subspace) is the operator that sends $x \in H$ to $y \in M$, where $y$ is the part from $M$ in the (unique) decomposition $x = y + z$, $y \in M$, $z \in M^\perp$. If $P = P_M$ is such a projection, then it has the following properties: $P^2 = P$ (see Proposition 5.10), $R(P) = M$, $N(P) = M^\perp$.

Exercise 6.3. Prove these latter two properties. Also, show that $Px = x$ if and only if $x \in M = R(P)$.

Theorem 6.5. Let $P \in B(H)$. Then the following are equivalent:
(a) $P$ is a projection;
(b) $1 - P$ is a projection;
(c) $P^2 = P$ and $R(P) = N(P)^\perp$;
(d) $P^2 = P$ and $P$ is self-adjoint;
(e) $P^2 = P$ and $P$ is normal.

Proof. (a) $\Longrightarrow$ (b): It is clear from the definition of $P_M$ and the fact that $M^{\perp\perp} = M$ that $1 - P$ is the projection onto $M^\perp$ if $P$ is the projection onto $M$.

(b) $\Longrightarrow$ (a): This is the same statement, applied to $1 - P$ in place of $P$.

(a) $\Longrightarrow$ (c): This was already observed above, see Exercise 6.3.

(c) $\Longrightarrow$ (a): If $y \in R(P)$, so $y = Pu$ for some $u \in H$, then we obtain that $Py = P^2 u = Pu = y$. On the other hand, if $z \in R(P)^\perp = N(P)$ (we make use of the fact that $N(P)$ is a closed subspace, because $P$ is continuous), then $Pz = 0$. Now let $x \in H$ be arbitrary and use Theorem 5.8 to decompose $x = y + z$, $y \in R(P)$, $z \in R(P)^\perp$. Note that $R(P)$ is a closed subspace because it is the orthogonal complement of $N(P)$ by assumption. By our earlier observations, $Px = Py + Pz = y$, so indeed $P$ is the projection on $R(P)$.

(a) $\Longrightarrow$ (d): Again, we already know that $P^2 = P$. Moreover, for arbitrary $x, y \in H$, we have that
\[ \langle Px, Py\rangle = \langle x, Py\rangle = \langle Px, y\rangle , \tag{6.1} \]


because, for example, $x = Px + (1 - P)x$, but $(1 - P)x \perp Py$. The second equality in (6.1) says that $P^* = P$, as desired.

(d) $\Longrightarrow$ (e) is trivial.

(e) $\Longrightarrow$ (c): Since $P$ is normal, we have that
\[ \|Px\|^2 = \langle Px, Px\rangle = \langle P^*Px, x\rangle = \langle PP^*x, x\rangle = \|P^*x\|^2 . \]
In particular, this implies that $N(P) = N(P^*)$, and Theorem 6.2 then shows that $N(P) = R(P)^\perp$. We could finish the proof by passing to the orthogonal complements here if we also knew that $R(P)$ is closed. We will establish this by showing that $R(P) = N(1 - P)$ (which is closed, being the null space of a continuous operator). Clearly, if $x \in R(P)$, then $x = Py$ for some $y \in H$ and thus $(1 - P)x = Py - P^2 y = 0$, so $x \in N(1 - P)$. Conversely, if $x \in N(1 - P)$, then $x = Px \in R(P)$. □

For later use, we also note the following technical property of projections:

Proposition 6.6. Let $P, Q$ be projections. Then $PQ$ is a projection if and only if $PQ = QP$. In this case, $R(PQ) = R(P) \cap R(Q)$.

Proof. If $PQ$ is a projection, then it satisfies condition (d) from Theorem 6.5, so $PQ = (PQ)^* = Q^*P^* = QP$. Conversely, if we assume that $PQ = QP$, then the same calculation shows that $PQ$ is self-adjoint. Since we also have that $(PQ)^2 = PQPQ = PPQQ = P^2Q^2 = PQ$, it now follows from Theorem 6.5 that $PQ$ is a projection.

To find its range, we observe that $R(PQ) \subset R(P)$, but also $R(PQ) = R(QP) \subset R(Q)$, so $R(PQ) \subset R(P) \cap R(Q)$. On the other hand, if $x \in R(P) \cap R(Q)$, then $Px = Qx = x$, so $PQx = x$ and thus $x \in R(PQ)$. □

On the finite-dimensional Hilbert space $H = \mathbb{C}^n$, every operator $T \in B(\mathbb{C}^n)$ (equivalently, every matrix $T \in \mathbb{C}^{n\times n}$) can be brought to a relatively simple canonical form (the Jordan normal form) by a change of basis. In fact, usually operators are diagonalizable.

Exercise 6.4. Can you establish the following precise version: The set of diagonalizable matrices is a dense open subset of $\mathbb{C}^{n\times n}$, where we use the topology generated by the operator norm. (In fact, by Theorem 2.15, any other norm will give the same topology.)

The situation on infinite-dimensional Hilbert spaces is much more complicated. We cannot hope for a normal form theory for general Hilbert space operators. In fact, the following much more modest question is a famous long-standing open problem:


Does every $T \in B(H)$ have a non-trivial invariant subspace? (the invariant subspace problem)

Here, a closed subspace $M \subset H$ is called invariant if $TM \subset M$; the trivial invariant subspaces are $\{0\}$ and $H$.

Exercise 6.5. (a) Show that every $T \in \mathbb{C}^{n\times n} = B(\mathbb{C}^n)$ has a non-trivial invariant subspace.
(b) Show that $\overline{L(\{T^n x : n \ge 1\})}$ is an invariant subspace (possibly trivial) for every $x \in H$.
(c) Deduce from (b) that every $T \in B(H)$ on a non-separable Hilbert space $H$ has a non-trivial invariant subspace.

Of course, we wouldn't really gain very much even from a positive answer to the invariant subspace problem; this would just make sure that every operator has some smaller part that could be considered separately. The fact that the invariant subspace problem is universally recognized as an exceedingly hard problem makes any attempt at a general structure theory for Hilbert space operators completely hopeless.

We will therefore focus on normal operators, which form an especially important subclass of Hilbert space operators. Here, we will be able to develop a powerful theory. The fundamental result here is the spectral theorem; we will prove this in Chapter 10, after a few detours.

It is also useful to recall from linear algebra that a normal matrix $T \in \mathbb{C}^{n\times n}$ can be diagonalized; in fact, this is done by changing from the original basis to a new ONB, consisting of the eigenvectors of $T$. Generally speaking, the eigenvalues and eigenvectors of a matrix take center stage in the analysis in the finite-dimensional setting, so it seems a good idea to try to generalize these notions. We do this as follows (actually, we only generalize the concept of an eigenvalue here):

Definition 6.7. For $T \in B(H)$, define
\[ \rho(T) = \{ z \in \mathbb{C} : T - z \text{ is invertible in } B(H) \}, \qquad \sigma(T) = \mathbb{C} \setminus \rho(T) . \]
We call $\rho(T)$ the resolvent set of $T$ and $\sigma(T)$ the spectrum of $T$.

Exercise 6.6. Show that $\sigma(T)$ is exactly the set of eigenvalues of $T$ if $T \in B(\mathbb{C}^n)$ is a matrix.

This confirms that we may hope to have made a good definition, but perhaps the more obvious try would actually have gone as follows: Call $z \in \mathbb{C}$ an eigenvalue of $T \in B(H)$ if there exists an $x \in H$, $x \ne 0$, so


that $Tx = zx$, and introduce $\sigma_p(T)$ as the set of eigenvalues of $T$; we also call $\sigma_p(T)$ the point spectrum of $T$. However, this doesn't work very well in the infinite-dimensional setting:

Exercise 6.7. Consider the operator $S \in B(\ell^2(\mathbb{Z}))$, $(Sx)_n = x_{n+1}$ ($S$ as in shift), and prove the following facts about $S$:
(a) $S$ is unitary;
(b) $\sigma_p(S) = \emptyset$.
We can also obtain an example of a self-adjoint operator with no eigenvalues from this, by letting $T = S + S^*$. Then $T = T^*$ (obvious), and again $\sigma_p(T) = \emptyset$ (not obvious, and in fact you will probably need to use a few facts about difference equations to prove this; this part of the problem is optional).

Exercise 6.8. Show that $\sigma_p \subset \sigma$.

Exercise 6.9. Here's another self-adjoint operator with no eigenvalues; compare Exercise 6.7. Define $T : L^2(0, 1) \to L^2(0, 1)$ by $(Tf)(x) = x f(x)$.
(a) Show that $T \in B(L^2(0, 1))$ and $T = T^*$, and compute $\|T\|$.
(b) Show that $\sigma_p(T) = \emptyset$. Can you also show that $\sigma(T) = [0, 1]$?

Exercise 6.10. Let $s(x, y)$ be a sesquilinear form that is bounded in the sense that
\[ M \equiv \sup_{\|x\|=\|y\|=1} |s(x, y)| < \infty . \]
Show that there is a unique operator $T \in B(H)$ so that $s(x, y) = \langle x, Ty\rangle$. Show also that $\|T\| = M$.
Hint: Apply the Riesz Representation Theorem to the map $x \mapsto \overline{s(x, y)}$, for fixed but arbitrary $y \in H$.

Exercise 6.11. Let $T : H \to H$ be a linear operator, and assume that $\langle Tx, y\rangle = \langle x, Ty\rangle$ for all $x, y \in H$. Show that $T$ is bounded (the Hellinger-Toeplitz Theorem).
Suggestion: Show that $T$ is closed and apply the closed graph theorem.
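To make some of the notions of this chapter concrete in the finite-dimensional case $H = \mathbb{C}^2$, where operators are matrices and the adjoint is the conjugate transpose, the following sketch checks the defining relation $\langle T^* y, x\rangle = \langle y, Tx\rangle$, the rule $(ST)^* = T^*S^*$ from Theorem 6.1(c), and condition (d) of Theorem 6.5 for the rank-one matrix $P = v v^* / \|v\|^2$. All matrices, vectors and helper names are assumptions made for this illustration.

```python
def inner(x, y):
    """Scalar product on C^n, antilinear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(x, y))


def apply(T, x):
    """Matrix-vector product T x."""
    return [sum(T[i][k] * x[k] for k in range(len(x))) for i in range(len(T))]


def matmul(S, T):
    """Matrix product S T."""
    return [[sum(S[i][k] * T[k][j] for k in range(len(T)))
             for j in range(len(T[0]))] for i in range(len(S))]


def adjoint(T):
    """Conjugate transpose of T."""
    return [[complex(T[k][i]).conjugate() for k in range(len(T))]
            for i in range(len(T[0]))]


def close(A, B, tol=1e-12):
    """Entrywise comparison of two matrices up to rounding."""
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))


T = [[1 + 1j, 2], [0, -1j]]
S = [[0, 1j], [3, 1 - 2j]]
x = [1, 1j]
y = [2 - 1j, 3]

lhs = inner(apply(adjoint(T), y), x)      # <T* y, x>
rhs = inner(y, apply(T, x))               # <y, T x>
print(lhs, rhs)

# Orthogonal projection onto the line spanned by v: P x = <v, x> v / ||v||^2.
v = [1, 1j]
nv2 = inner(v, v).real
P = [[v[i] * complex(v[k]).conjugate() / nv2 for k in range(2)] for i in range(2)]
print(close(matmul(P, P), P), close(adjoint(P), P))
```

Both printed scalar products agree, and the two Boolean checks confirm that this $P$ is idempotent and self-adjoint.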


7. Banach algebras

Definition 7.1. $A$ is called a Banach algebra (with unit) if:
(1) $A$ is a Banach space;
(2) There is a multiplication $A \times A \to A$ that has the following properties:
\[ (xy)z = x(yz), \quad (x + y)z = xz + yz, \quad x(y + z) = xy + xz, \quad c(xy) = (cx)y = x(cy) \]
for all $x, y, z \in A$, $c \in \mathbb{C}$. Moreover, there is a unit element $e$: $ex = xe = x$ for all $x \in A$;
(3) $\|e\| = 1$;
(4) $\|xy\| \le \|x\|\,\|y\|$ for all $x, y \in A$.

So a Banach algebra is an algebra (a vector space with multiplication, satisfying the usual algebraic rules) and also a Banach space, and these two structures are compatible (see conditions (3), (4)).

At the end of the last chapter, we decided to try to analyze normal operators on a Hilbert space $H$. Banach algebras will prove useful here, because of the following:

Example 7.1. If $X$ is a Banach space, then $A = B(X)$ is a Banach algebra, with the composition of operators as multiplication and the operator norm. Indeed, we know from Theorem 2.12(b) that $A$ is a Banach space, and composition of operators has the properties from (2) of Definition 7.1. The identity operator $1$ is the unit element; of course $\|1\| = \sup_{\|x\|=1} \|x\| = 1$, as required, and (4) was discussed in Exercise 2.25.

Of course, there are more examples:

Example 7.2. $A = \mathbb{C}$ with the usual multiplication and the absolute value as norm is a Banach algebra.

Example 7.3. $A = C(K)$ with the pointwise multiplication $(fg)(x) = f(x)g(x)$ is a Banach algebra. Most properties are obvious. The unit element is the function $e(x) \equiv 1$; clearly, this has norm 1, as required. To verify (4), notice that
\[ \|fg\| = \max_{x \in K} |f(x)g(x)| \le \max_{x \in K} |f(x)| \, \max_{x \in K} |g(x)| = \|f\|\,\|g\| . \]

Example 7.4. Similarly, $A = L^\infty$ and $A = \ell^\infty$ with the pointwise multiplication are Banach algebras.

Notice that the last three examples are in fact commutative Banach algebras, that is, $xy = yx$ for all $x, y \in A$. On the other hand, $B(X)$ is not commutative if $\dim X > 1$.


Example 7.5. $A = L^1(\mathbb{R})$ with the convolution product
\[ (fg)(x) = (f * g)(x) = \int_{-\infty}^{\infty} f(t) g(x - t) \, dt \]
satisfies most of the properties from Definition 7.1, but does not have a unit element, so this would provide an example of a Banach algebra without a unit. On the other hand, the discrete analog $A = \ell^1(\mathbb{Z})$ with the convolution product
\[ (xy)_n = \sum_{j=-\infty}^{\infty} x_j y_{n-j} \]
is a Banach algebra with unit. Both $L^1$ and $\ell^1$ are commutative.

Exercise 7.1. Prove the claims about the unit elements: Show that there is no function $f \in L^1(\mathbb{R})$ so that $f * g = g * f = g$ for all $g \in L^1(\mathbb{R})$. Also, find the unit element $e$ of $\ell^1(\mathbb{Z})$.

We now start to develop the general theory of Banach algebras.

Theorem 7.2. Multiplication is continuous in Banach algebras: If $x_n \to x$, $y_n \to y$, then $x_n y_n \to xy$.

Proof.
\[ \|x_n y_n - xy\| \le \|(x_n - x) y_n\| + \|x (y_n - y)\| \le \|x_n - x\|\,\|y_n\| + \|x\|\,\|y_n - y\| \to 0 \]
□

We call $x \in A$ invertible if there exists $y \in A$ so that $xy = yx = e$. Note that on the Banach algebra $B(H)$, this reproduces the definition of invertibility in $B(H)$ that was given earlier, in Chapter 6. Returning to the general situation, we observe that if $x \in A$ is invertible, then $y$ with these properties is unique. We write $y = x^{-1}$ and call $x^{-1}$ the inverse of $x$.

We denote the set of invertible elements by $G(A)$. Here, the choice of symbol is motivated by the fact that $G(A)$ is a group, with multiplication as the group operation. Indeed, we have that $xy \in G(A)$ and $x^{-1} \in G(A)$ if $x, y \in G(A)$; this can be verified by just writing down the inverses: $(xy)^{-1} = y^{-1} x^{-1}$, $(x^{-1})^{-1} = x$. Moreover, $e \in G(A)$ ($e^{-1} = e$), and of course multiplication is associative.

If $A, B$ are algebras, then a map $\varphi : A \to B$ is called a homomorphism if it preserves the algebraic structure. More precisely, we demand that $\varphi$ is linear (as a map between vector spaces) and $\varphi(xy) = \varphi(x)\varphi(y)$ for all $x, y \in A$. If $B = \mathbb{C}$ here and $\varphi \not\equiv 0$, we call $\varphi$ a complex homomorphism.


Proposition 7.3. Let φ be a complex homomorphism. Then φ(e) = 1 and φ(x) ≠ 0 for all x ∈ G(A).

Proof. Since φ ≢ 0, there is a y ∈ A with φ(y) ≠ 0. Since φ(y) = φ(ey) = φ(e)φ(y), it follows that φ(e) = 1. If x ∈ G(A), then φ(x)φ(x^{-1}) = φ(e) = 1, so φ(x) ≠ 0. □

Exercise 7.2. Let A be an algebra, with unit e. True or false:
(a) fx = x for all x ∈ A ⟹ f = e;
(b) 0x = 0 for all x ∈ A;
(c) xy = 0 ⟹ x = 0 or y = 0;
(d) xy = zx = e ⟹ x ∈ G(A) and y = z = x^{-1};
(e) xy, yx ∈ G(A) ⟹ x, y ∈ G(A);
(f) xy = e ⟹ x ∈ G(A) or y ∈ G(A);
(g) If B is another algebra with unit e′ and φ : A → B is a homomorphism, φ ≢ 0, then φ(e) = e′.

Theorem 7.4. Let A be a Banach algebra. If x ∈ A, ‖x‖ < 1, then e − x ∈ G(A) and
\[ (e - x)^{-1} = \sum_{n=0}^{\infty} x^n . \tag{7.1} \]
Moreover, if φ is a complex homomorphism, then |φ(x)| < 1.

Here, we define x^n = xx···x as the n-fold product of x with itself, and x^0 := e. The series from (7.1) is then defined, as usual, as the norm limit of the partial sums (existence of this limit is part of the statement, of course). It generalizes the geometric series to the Banach algebra setting and is called the Neumann series.

Proof. Property (4) from Definition 7.1 implies that ‖x^n‖ ≤ ‖x‖^n. Since ‖x‖ < 1, we now see that Σ ‖x^n‖ converges. It follows that the Neumann series converges, too (see Exercise 2.22). By the continuity of the multiplication in A,
\begin{align*}
(e - x) \sum_{n=0}^{\infty} x^n &= (e - x) \lim_{N\to\infty} \sum_{n=0}^{N} x^n = \lim_{N\to\infty} (e - x) \sum_{n=0}^{N} x^n \\
&= \lim_{N\to\infty} \left( \sum_{n=0}^{N} x^n - \sum_{n=0}^{N} x^{n+1} \right) = \lim_{N\to\infty} \left( e - x^{N+1} \right) = e .
\end{align*}
A similar calculation shows that (Σ_{n=0}^∞ x^n)(e − x) = e, so indeed e − x ∈ G(A) and the inverse is given by (7.1).


If c ∈ C, |c| ≥ 1, then, by what has just been shown, e − (1/c)x ∈ G(A), so φ(e − (1/c)x) = 1 − (1/c)φ(x) ≠ 0 by Proposition 7.3, that is, φ(x) ≠ c. □

Corollary 7.5. (a) G(A) is open. More precisely, if x ∈ G(A) and ‖h‖ < 1/‖x^{-1}‖, then x + h ∈ G(A) also.
(b) If φ is a complex homomorphism, then φ ∈ A* and ‖φ‖ = 1.

Proof. (a) Write x + h = x(e + x^{-1}h). Since ‖x^{-1}h‖ ≤ ‖x^{-1}‖ ‖h‖ < 1, Theorem 7.4 shows that e + x^{-1}h ∈ G(A). Since also x ∈ G(A) and G(A) is a group, it follows that x + h ∈ G(A), too.
(b) The last part of Theorem 7.4 says that φ is bounded and ‖φ‖ ≤ 1. Since φ(e) = 1 and ‖e‖ = 1, it follows that ‖φ‖ = 1. □

Exercise 7.3. We can also run a more quantitative version of the argument from (a) to obtain the following: Inversion in Banach algebras is a continuous operation. More precisely, if x ∈ G(A) and ε > 0, then there exists δ > 0 so that y ∈ G(A) and ‖y^{-1} − x^{-1}‖ < ε if ‖y − x‖ < δ. Prove this.

We now introduce the Banach algebra version of Definition 6.7.

Definition 7.6. Let x ∈ A. Then we define
ρ(x) = {z ∈ C : x − ze ∈ G(A)},
σ(x) = C \ ρ(x),
r(x) = sup{|z| : z ∈ σ(x)}.
We call ρ(x) the resolvent set, σ(x) the spectrum, and r(x) the spectral radius of x. Also, (x − ze)^{-1}, which is defined for z ∈ ρ(x), is called the resolvent of x.

Theorem 7.7. (a) ρ(x) is an open subset of C.
(b) The resolvent R(z) = (x − ze)^{-1} admits power series representations about every point z_0 ∈ ρ(x). More specifically, if z_0 ∈ ρ(x), then there exists r > 0 so that {z : |z − z_0| < r} ⊂ ρ(x) and
\[ (x - ze)^{-1} = \sum_{n=0}^{\infty} (x - z_0 e)^{-n-1} (z - z_0)^n \]
for all z with |z − z_0| < r. Here we define y^{-n}, for n ≥ 0 and invertible y, as y^{-n} = (y^{-1})^n.

More succinctly, we can say that the resolvent R(z) is a holomorphic function (which takes values in a Banach algebra) on ρ(x); we then simply define this notion by the property from Theorem 7.7(b).
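The Neumann series from Theorem 7.4, which is also the engine behind the expansion in Theorem 7.7(b), can be tried out numerically. The following is only a sketch and not part of the notes: it assumes A = B(C^2), realized as 2×2 matrices with the maximum row sum norm (the operator norm coming from the max norm on C^2), and the helper names mat_mul, mat_add, mat_sub, norm, neumann_partial_sum are made up for this illustration.

```python
# Illustration only: 2x2 matrices as a Banach algebra, with the maximum
# row sum norm. All helper names are hypothetical, chosen for this sketch.

def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def mat_sub(a, b):
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]

def norm(a):
    """Maximum row sum norm; it is submultiplicative and norm(E) == 1."""
    return max(sum(abs(v) for v in row) for row in a)

E = [[1.0, 0.0], [0.0, 1.0]]  # unit element e

def neumann_partial_sum(x, terms):
    """Return e + x + x^2 + ... + x^(terms - 1)."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    power = E
    for _ in range(terms):
        total = mat_add(total, power)
        power = mat_mul(power, x)
    return total

x = [[0.2, 0.3], [0.1, 0.4]]          # norm(x) = 0.5 < 1
s = neumann_partial_sum(x, 60)        # partial sum of (7.1)
# (e - x) s should be very close to e
residual = norm(mat_sub(mat_mul(mat_sub(E, x), s), E))
print(norm(x), residual)
```

Since (e − x)(e + x + ··· + x^{N}) = e − x^{N+1}, the residual is bounded by ‖x‖^{N+1}, up to rounding errors; this is exactly the estimate used in the proof of Theorem 7.4.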


Proof. (a) This is an immediate consequence of Corollary 7.5 because ‖(x − ze) − (x − z_0 e)‖ = |z − z_0|.
(b) As in (a) and the proof of Corollary 7.5(a), we see that B_r(z_0) ⊂ ρ(x) if we take r = 1/‖(x − z_0 e)^{-1}‖. Moreover, we can use the Neumann series to expand R(z), as follows:
\begin{align*}
(x - ze)^{-1} &= \left[ \left( e - (z - z_0)(x - z_0 e)^{-1} \right) (x - z_0 e) \right]^{-1} \\
&= (x - z_0 e)^{-1} \left[ e - (z - z_0)(x - z_0 e)^{-1} \right]^{-1} \\
&= (x - z_0 e)^{-1} \sum_{n=0}^{\infty} (x - z_0 e)^{-n} (z - z_0)^n \\
&= \sum_{n=0}^{\infty} (x - z_0 e)^{-n-1} (z - z_0)^n .
\end{align*}

We have used the continuity of the multiplication in the last step. □

Theorem 7.8. (a) σ(x) is a compact, non-empty subset of C.
(b) r(x) = inf_{n∈N} ‖x^n‖^{1/n} = lim_{n→∞} ‖x^n‖^{1/n}.

The existence of the limit in part (b) is part of the statement. Note also that ‖x^n‖ ≤ ‖x‖^n, by using property (4) from Definition 7.1 repeatedly, so we always have that r(x) ≤ ‖x‖. Strict inequality is possible here.

The inconspicuous spectral radius formula from part (b) has a rather remarkable property: r(x) is a purely algebraic quantity (to work out r(x), find the biggest |z| for which x − ze does not have a multiplicative inverse), but nevertheless r(x) is closely related to the norm on A via the spectral radius formula.

Proof. (a) We know from Theorem 7.7(a) that σ(x) = C \ ρ(x) is closed. Moreover, if |z| > ‖x‖, then x − ze = (−z)(e − (1/z)x) ∈ G(A) by Theorem 7.4, so σ(x) is also bounded and thus a compact subset of C. We also obtain the representation
\[ (x - ze)^{-1} = - \sum_{n=0}^{\infty} z^{-n-1} x^n \tag{7.2} \]
from Theorem 7.4; this is valid for |z| > ‖x‖.

Suppose now that we had σ(x) = ∅. For an arbitrary F ∈ A*, we can introduce the function g : ρ(x) → C, g(z) = F((x − ze)^{-1}). Since we are assuming that σ(x) = ∅, this function is really defined on all of C. Moreover, by using Theorem 7.7(b) and the continuity of F, we see that g has convergent power series representations about every point and thus is holomorphic (in the traditional sense). If |z| ≥ 2‖x‖, then (7.2) yields
\[ |g(z)| = \left| F\left( \sum_{n=0}^{\infty} z^{-n-1} x^n \right) \right| \le \|F\| \sum_{n=0}^{\infty} |z|^{-n-1} \|x\|^n \le \frac{\|F\|}{|z|} \sum_{n=0}^{\infty} 2^{-n} = \frac{2\|F\|}{|z|} . \]

So g is a bounded entire function. By Liouville's Theorem, g must be constant. Since g(z) → 0 as |z| → ∞, this constant must be zero. This, however, is not possible, because F(y) = 0 for all F ∈ A* would imply that y = 0, by Corollary 4.2(b), but clearly the inverse (x − ze)^{-1} cannot be the zero element of A. The assumption that σ(x) = ∅ must be dropped.

(b) Let n ∈ N and let z ∈ C be such that z^n ∈ ρ(x^n). We can write
\[ x^n - z^n e = (x - ze)\left( z^{n-1} e + z^{n-2} x + \cdots + x^{n-1} \right), \]
and now multiplication from the right by (x^n − z^n e)^{-1} shows that x − ze has a right inverse. A similar calculation provides a left inverse also, so it follows that z ∈ ρ(x) (we are using Exercise 7.2(d) here!). Put differently, z^n ∈ σ(x^n) if z ∈ σ(x). The proof of part (a) has shown that |z| ≤ ‖y‖ for all z ∈ σ(y), so we now obtain that |z^n| ≤ ‖x^n‖ for all z ∈ σ(x). Since the spectral radius r(x) was defined as the maximum of the spectrum (we cautiously worked with the supremum in the original definition, but we now know that σ(x) is a compact set), this says that r(x) ≤ inf_{n∈N} ‖x^n‖^{1/n}.

Next, consider again the function g(z) = F((x − ze)^{-1}), with F ∈ A*. This is holomorphic on ρ(x) ⊃ {z ∈ C : |z| > r(x)}. Furthermore, for |z| > ‖x‖, we have the power series expansion (in z^{-1})
\[ g(z) = - \sum_{n=0}^{\infty} F(x^n) \left( z^{-1} \right)^{n+1} . \]
This shows that g is holomorphic near z = ∞; more precisely, if we let ζ = 1/z and h(ζ) = g(1/ζ), then h has a convergent power series expansion, h(ζ) = −Σ_{n=0}^∞ F(x^n) ζ^{n+1}, which is valid for small |ζ|. Moreover, by our earlier remarks, h also has a holomorphic extension to the disk {ζ : |ζ| < 1/r(x)} (the extension is provided by the original definition of g). A power series converges on the biggest disk to which the function can be holomorphically extended; thus the radius of convergence of the series Σ F(x^n) ζ^{n+1} is at least 1/r(x). In particular, if 0 < a < 1/r(x), then
\[ F(x^n) a^n = F(a^n x^n) \to 0 \qquad (n \to \infty). \]


Since this is true for arbitrary F ∈ A*, we have in fact shown that a^n x^n → 0 weakly. Weakly convergent sequences are bounded (Exercise 4.23), so there exists C = C(a) > 0 so that ‖a^n x^n‖ ≤ C (n ∈ N). Hence
\[ \|x^n\|^{1/n} \le \frac{1}{a}\, C^{1/n} \to \frac{1}{a}, \]
and here a < 1/r(x) was arbitrary and we can take the limit on any subsequence, so r(x) ≥ lim sup_{n→∞} ‖x^n‖^{1/n}. On the other hand, we have already proved that
\[ r(x) \le \inf_{n\in\mathbb{N}} \|x^n\|^{1/n} \le \liminf_{n\to\infty} \|x^n\|^{1/n}, \]
so we now obtain the full claim. □
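The spectral radius formula from Theorem 7.8(b) is easy to watch in action. The following sketch is an illustration under assumptions of my own choosing, not part of the notes: it works in B(C^2) with the maximum row sum norm, takes the upper triangular matrix T below (whose only eigenvalue is 0.5, so that r(T) = 0.5 by the identification of spectrum and eigenvalues mentioned in Exercise 7.6), and tabulates ‖T^n‖^{1/n}. The names mat_mul, norm, root_norms are hypothetical.

```python
# Illustration only; helper names are hypothetical.

def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def norm(a):
    """Maximum row sum norm (an operator norm on 2x2 matrices)."""
    return max(sum(abs(v) for v in row) for row in a)

T = [[0.5, 1.0], [0.0, 0.5]]   # only eigenvalue is 0.5, so r(T) = 0.5

def root_norms(t, n_max):
    """Return the list of ||t^n||^(1/n) for n = 1, ..., n_max."""
    values = []
    power = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, n_max + 1):
        power = mat_mul(power, t)
        values.append(norm(power) ** (1.0 / n))
    return values

values = root_norms(T, 200)
# first entry is ||T||; later entries creep down towards r(T)
print(values[0], values[9], values[199])
```

Here ‖T‖ = 1.5 is three times as large as r(T), and the convergence ‖T^n‖^{1/n} → r(T) is slow: one checks that ‖T^n‖ = 0.5^n (1 + 2n) for this norm, so the n-th root is 0.5 (1 + 2n)^{1/n}.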



You should now work out some spectra in concrete examples. The first example is particularly important for us, so I'll state this as a Proposition:

Proposition 7.9. Consider the Banach algebra A = C(K). Then, for f ∈ C(K), we have that σ(f) = f(K), where f(K) = {f(x) : x ∈ K}. Moreover, r(f) = ‖f‖ for all f ∈ C(K).

Exercise 7.4. Prove Proposition 7.9.

Exercise 7.5. (a) Show that on A = ℓ^∞, we have that σ(x) is the closure of {x_n : n ∈ N}. Also, show that again r(x) = ‖x‖ for all x ∈ ℓ^∞.
(b) Show that on A = L^∞(X, µ), we have that
σ(f) = {z ∈ C : µ({x ∈ X : |f(x) − z| < ε}) > 0 for all ε > 0}.
(This set is also called the essential range of f; roughly speaking, it is the range of f, but we ignore what happens on null sets, in keeping with the usual philosophy. Also, it is again true that r(f) = ‖f‖.)

Exercise 7.6. Show that on A = B(C^n), the spectrum σ(T) of a matrix T ∈ B(C^n) = C^{n×n} is the set of eigenvalues of T (this was discussed earlier, in Chapter 6). Now find a matrix T ∈ C^{2×2} for which r(T) < ‖T‖.

The fact that spectra are always non-empty has the following consequence:

Theorem 7.10 (Gelfand-Mazur). If A is a Banach algebra with G(A) = A \ {0}, then A ≅ C.


More specifically, the claim is that there is an identification map between A and C (thought of as a Banach algebra, with the usual multiplication and the absolute value as the norm) that preserves the complete Banach algebra structure: There is a map ϕ : A → C that is bijective (= preserves sets), a homomorphism (= preserves the algebraic structure), and an isometry (= preserves the norm).

Proof. By Theorem 7.8(a), we can pick a number z(x) ∈ σ(x) for each x ∈ A. So x − z(x)e ∉ G(A), but the only non-invertible element of A is the zero vector, so we must have that x = z(x)e (and we also learn that in fact σ(x) = {z(x)}). The map ϕ : A → C, ϕ(x) = z(x) has the desired properties. □

In the last part of this chapter, we discuss the problem of how the spectrum of an element changes when we pass to a smaller Banach algebra. Let B be a Banach algebra, and let A ⊂ B be a subalgebra. By this we mean that A with the structure inherited from B is a Banach algebra itself. We also insist that e ∈ A. Note that this latter requirement could be dropped, and in fact that would perhaps be the more common version of the definition of a subalgebra. The following Exercise discusses the difference between the two versions:

Exercise 7.7. Let B be a Banach algebra, and let C ⊂ B be a subset that also is a Banach algebra with unit element with the structure (algebraic operations, norm) inherited from B. Give a (simple) example of such a situation where e ∉ C.
Remark: This is very straightforward. Just make sure you don't get confused. C is required to have a unit (call it f, say), but what exactly is f required to do? Exercise 7.2(g) might also provide some inspiration.

If we now fix an element x ∈ A of the smaller algebra, we can consider its spectrum with respect to both algebras.
From the definition, it is clear that σ_A(x) ⊃ σ_B(x): everything that is invertible in A remains invertible in B, but we may lose invertibility when going from B to A simply because the required inverse may no longer be part of the algebra. Furthermore, Theorem 7.8(b) shows that r_A(x) = r_B(x). More can be said about the relation between σ_A(x) and σ_B(x), but this requires some work. This material will be needed later, but is of a technical character and can be given a light reading at this point.

We need the notion of connected components in a topological space; actually, we only need this for the space X = C. Recall that we call a topological space X connected if the only decomposition of X into two disjoint open sets is the trivial one: if X = U ∪ V, U ∩ V = ∅ and U, V are open, then U = X or V = X. A subset A ⊂ X is called connected if A with the relative topology is a connected topological space. A connected component is a maximal connected set. These connected components always exist and in fact every point lies in a unique connected component, and the whole space can be written as the disjoint union of its connected components.

For a detailed reading of this final section, the following topological warm-up should be helpful. You can either try to solve this directly or do some reading.

Exercise 7.8. (a) Prove these facts. More specifically, show that if x ∈ X, then there exists a unique set C_x so that x ∈ C_x, C_x is connected, and if also x ∈ D, D connected, then D ⊂ C_x. Also, show that if x, y ∈ X, then either C_x ∩ C_y = ∅ or C_x = C_y.
(b) Call A ⊂ X arcwise connected if any two points can be joined by a continuous curve: If x, y ∈ A, then there exists a continuous map ϕ : [0, 1] → A with ϕ(0) = x, ϕ(1) = y. Show that an arcwise connected set is connected.
(c) Show that if U ⊂ C is open, then all connected components of U are open subsets of C.

We are heading towards the following general result:

Theorem 7.11. We have a representation of the following type:
σ_A(x) = σ_B(x) ∪ C,
where C is a (necessarily disjoint) union of connected components of ρ_B(x) (C = ∅ is possible, of course).

This has the following consequences (whose relevance is more obvious):

Corollary 7.12. (a) If ρ_B(x) is connected, then σ_A(x) = σ_B(x). In particular, this conclusion holds if σ_B(x) ⊂ R.
(b) If σ_A(x)° = ∅, then σ_A(x) = σ_B(x).

Here, C° denotes the interior of a set C, defined as the largest open subset of C. To prove the Corollary (given the Theorem), note that the hypothesis that ρ_B(x) is connected means that the only connected component of this set is ρ_B(x) itself, but we cannot have σ_A(x) = σ_B(x) ∪ ρ_B(x) because ρ_B(x) is unbounded (being the complement of the compact set σ_B(x)), and σ_A(x) needs to be compact. If σ_B(x) is a (compact!) subset of R, then clearly its complement ρ_B(x) is arcwise connected, thus connected. Compare Exercise 7.8(b).


Part (b) follows from the fact that the connected components of the open set ρ_B(x) are open (Exercise 7.8(c)), so if we had C ≠ ∅, then automatically σ_A(x) would have non-empty interior.

To prove Theorem 7.11, we need the following topological fact.

Lemma 7.13. Let U, V ⊂ X be open sets and assume that U ⊂ V, (Ū \ U) ∩ V = ∅. Then U = ⋃ V_α, where the V_α are connected components of V (but not necessarily all of these, of course).

Proof. We must show that if W is a connected component of V with W ∩ U ≠ ∅, then W ⊂ U (assuming this, we can then indeed write U as the union of those components of V that intersect U). So let W be such a component. From the assumption of the Lemma, we have that W ∩ (Ū \ U) = ∅. Therefore,
\[ W = (W \cap U) \cup \left( W \cap \overline{U}^{\,c} \right). \]
This is a decomposition of W into two disjoint relatively (!) open subsets. Since W is connected by assumption, one of these must be all of W, and since W ∩ U ≠ ∅, the first set is this set, that is, W ∩ U = W or W ⊂ U. □

We are now ready for the

Proof of Theorem 7.11. We will verify the hypotheses of Lemma 7.13 for U = ρ_A(x), V = ρ_B(x). The Lemma will then show that ρ_A(x) = ⋃_{α∈I_0} V_α, where the V_α are connected components of ρ_B(x). Also, ρ_B(x) = ⋃_{α∈I} V_α, and I_0 ⊂ I, so we indeed obtain that
\[ \sigma_A(x) = \mathbb{C} \setminus \rho_A(x) = \sigma_B(x) \cup \bigcup_{\alpha \in I \setminus I_0} V_\alpha . \]

Clearly, ρ_A(x) ⊂ ρ_B(x), so, with U = ρ_A(x) and Ū its closure, we must check that (Ū \ U) ∩ ρ_B(x) = ∅. Let z ∈ Ū \ U. Then there are z_n ∈ ρ_A(x), z_n → z. I now claim that
\[ \left\| (x - z_n e)^{-1} \right\| \to \infty \qquad (n \to \infty). \tag{7.3} \]
Suppose this were wrong. Then the norms stay bounded along a subsequence, and since z_n → z, we have |z − z_n| ‖(x − z_n e)^{-1}‖ < 1 for some (large) n, and hence (x − z_n e)^{-1}(x − ze) = e − (z − z_n)(x − z_n e)^{-1} would be in G(A) by Theorem 7.4, but then also x − ze ∈ G(A), and this contradicts z ∉ ρ_A(x). Thus (7.3) holds.

Now (7.3) also prevents x − ze from being invertible in B, because inversion is a continuous operation in Banach algebras (Exercise 7.3). More precisely, if we had x − ze ∈ G(B), then, since x − z_n e → x − ze, it would follow that (x − z_n e)^{-1} → (x − ze)^{-1}, but this convergence is ruled out by (7.3). So x − ze ∉ G(B), or, put differently, z ∉ ρ_B(x). □

Exercise 7.9. Show that r(xy) = r(yx). Hint: Use the formula (xy)^n = x(yx)^{n-1}y.

Exercise 7.10. Prove that σ(xy) and σ(yx) can at most differ by the point 0. (In particular, this again implies the result from Exercise 7.9, but of course the direct proof suggested there was much easier.)
Suggested strategy: This essentially amounts to showing that e − xy is invertible if and only if e − yx is invertible. So assume that e − xy ∈ G(A). Assume also that ‖x‖, ‖y‖ < 1 and write (e − xy)^{-1}, (e − yx)^{-1} as Neumann series. Use the formula from the previous problem to obtain one inverse in terms of the other. Then show that this formula actually works in complete generality, without the assumptions on x, y.
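To see that the set C in Theorem 7.11 can really be non-empty, here is a standard example. It is not taken from these notes and is offered only as an illustration, under the following assumptions: S is the unit circle, B = C(S), A ⊂ B is the closure (in the maximum norm) of the polynomials in z, and x ∈ A is the function x(z) = z. The letter w denotes a point of C.

```latex
\begin{align*}
\sigma_B(x) &= x(S) = S && \text{(Proposition 7.9)},\\
\rho_B(x) &= \{ w : |w| < 1 \} \cup \{ w : |w| > 1 \} && \text{(two connected components)},\\
\sigma_A(x) &= S \cup \{ w : |w| < 1 \} = \{ w : |w| \le 1 \}.
\end{align*}
```

For the last line, note that σ_A(x) ⊂ {w : |w| ≤ ‖x‖ = 1}, so only the bounded component of ρ_B(x) can be added. It is indeed added: if |w| < 1, then the only candidate for an inverse of x − we is the function 1/(z − w), and this is not in A, because the contour integral of a polynomial over S is zero and this property survives uniform limits, while the integral of 1/(z − w) over S equals 2πi. So here C is the open unit disk, in agreement with Corollary 7.12: ρ_B(x) is not connected, and σ_A(x) has non-empty interior.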


8. Commutative Banach algebras

In this chapter, we analyze commutative Banach algebras in greater detail. So we always assume that xy = yx for all x, y ∈ A here.

Definition 8.1. Let A be a (commutative) Banach algebra. A subset I ⊂ A is called an ideal if I is a (linear) subspace and xy ∈ I whenever x ∈ I, y ∈ A. An ideal I ≠ A is called maximal if the only ideals J ⊃ I are J = I and J = A.

Ideals are important for several reasons. First of all, we can take quotients with respect to ideals, and we again obtain a Banach algebra.

Theorem 8.2. Let I ≠ A be a closed ideal. Then A/I is a Banach algebra.

This needs some clarification. The quotient A/I consists of the equivalence classes (x) = x + I = {x + y : y ∈ I}, and we define the algebraic operations on A/I by working with representatives; the fact that I is an ideal makes sure that everything is well defined (independent of the choice of representative). Since I is in particular a closed subspace, we also have the quotient norm available, and we know from Theorem 2.18 that A/I is a Banach space with this norm. Recall that this norm was defined as
\[ \|(x)\| = \inf_{y \in I} \|x + y\| . \]

Proof. From the above remarks, we already know that A/I is a Banach space and a commutative algebra with unit (e). We need to discuss conditions (3), (4) from Definition 7.1. To prove (4), let x_1, x_2 ∈ A, and let ε > 0. We can then find y_1, y_2 ∈ I so that ‖x_j + y_j‖ < ‖(x_j)‖ + ε. It follows that
\[ \|(x_1)(x_2)\| = \|(x_1 x_2)\| \le \|[x_1 + y_1][x_2 + y_2]\| \le \|x_1 + y_1\| \, \|x_2 + y_2\| \le \left( \|(x_1)\| + \epsilon \right) \left( \|(x_2)\| + \epsilon \right) . \]
Since ε > 0 is arbitrary here, we have that ‖(x_1)(x_2)‖ ≤ ‖(x_1)‖ ‖(x_2)‖, as required. Next, notice that ‖(e)‖ ≤ ‖e‖ = 1. On the other hand, for all x ∈ A, we have that ‖(x)‖ = ‖(x)(e)‖ ≤ ‖(x)‖ ‖(e)‖; since I ≠ A is closed, we can choose x ∉ I here, so that ‖(x)‖ > 0, and it follows that ‖(e)‖ ≥ 1. □

Theorem 8.3. (a) If I ≠ A is an ideal, then I ∩ G(A) = ∅.
(b) The closure of an ideal is an ideal.
(c) Every maximal ideal is closed.
(d) Every ideal I ≠ A is contained in some maximal ideal J ⊃ I.


Proof. (a) If x ∈ I ∩ G(A), then y = x(x^{-1}y) ∈ I for all y ∈ A, so I = A.
(b) The closure of a subspace is a subspace, and if x ∈ Ī, y ∈ A, then there are x_n ∈ I, x_n → x. Thus x_n y ∈ I and x_n y → xy by the continuity of the multiplication, so xy ∈ Ī, as required.
(c) Let I be a maximal ideal. Then, by (b), Ī is another ideal that contains I. Since I ∩ G(A) = ∅, by (a), and since G(A) is open, Ī still doesn't intersect G(A). In particular, Ī ≠ A, so Ī = I because I was maximal.
(d) This follows in the usual way from Zorn's Lemma. Also as usual, we don't want to discuss the details of this argument here. □

Definition 8.4. The spectrum or maximal ideal space ∆ of a commutative Banach algebra A is defined as ∆ = {φ : A → C : φ complex homomorphism}.

The term maximal ideal space is justified by parts (a) and (b) of the following result, which set up a one-to-one correspondence between complex homomorphisms and maximal ideals.

Theorem 8.5. (a) If I is a maximal ideal, then there exists a unique φ ∈ ∆ with N(φ) = I.
(b) Conversely, if φ ∈ ∆, then N(φ) is a maximal ideal.
(c) x ∈ G(A) ⟺ φ(x) ≠ 0 for all φ ∈ ∆.
(d) x ∈ G(A) ⟺ x does not belong to any ideal I ≠ A.
(e) z ∈ σ(x) ⟺ φ(x) = z for some φ ∈ ∆.

Proof. (a) A maximal ideal is closed by Theorem 8.3(c), so the quotient A/I is a Banach algebra by Theorem 8.2. Let x ∈ A, x ∉ I, and put J = {ax + y : a ∈ A, y ∈ I}. It's easy to check that J is an ideal, and J ⊃ I, because we can take a = 0. Moreover, x = ex + 0 ∈ J, but x ∉ I, so, since I is maximal, we must have that J = A. In particular, e ∈ J, so there are a ∈ A, y ∈ I so that ax + y = e. Thus (a)(x) = (e) in A/I. Since x ∈ A was an arbitrary vector with x ∉ I, we have shown that every (x) ∈ A/I, (x) ≠ 0 is invertible. By the Gelfand-Mazur Theorem, A/I ≅ C. More precisely, there exists an isometric homomorphism f : A/I → C. The map A → A/I, x ↦ (x) also is a homomorphism (the algebraic structure on A/I is defined in such a way that this would be true), so the composition φ(x) := f((x)) is another homomorphism: φ ∈ ∆. Since f is injective, its kernel consists of exactly those x ∈ A that are sent to zero by the first homomorphism, that is, N(φ) = I.
It remains to establish uniqueness. If N(φ) = N(ψ), then x − ψ(x)e ∈ N(φ) for all x ∈ A, so 0 = φ(x) − ψ(x).


(b) Homomorphisms are continuous, so N(φ) is a closed linear subspace. If x ∈ N(φ), y ∈ A, then φ(xy) = φ(x)φ(y) = 0, so xy ∈ N(φ) also, and N(φ) is an ideal. Since φ : A → C is a linear map to the one-dimensional space C, we have that codim N(φ) = 1, so N(φ) is already maximal as a subspace (the only strictly bigger subspace is A).
(c) ⟹: This was proved earlier, in Proposition 7.3.
⟸: Suppose that x ∉ G(A). Then I_0 = {ax : a ∈ A} is an ideal with I_0 ≠ A (because e ∉ I_0). By Theorem 8.3(d), there exists a maximal ideal I ⊃ I_0. By part (a), there is a φ ∈ ∆ with N(φ) = I. In particular, φ(x) = 0.
(d) This follows immediately from what we have shown already, plus Theorem 8.3(d) again.
(e) We have z ∈ σ(x) if and only if x − ze ∉ G(A), and by part (c), this holds if and only if φ(x − ze) = φ(x) − z = 0 for some φ ∈ ∆. □

In particular, this says that a commutative Banach algebra always admits complex homomorphisms, that is, we always have ∆ ≠ ∅. Indeed, notice that Theorem 8.3(d) with I = {0} shows that there are maximal ideals, so we obtain the claim from Theorem 8.5(a). Alternatively, we could use Theorem 8.5(e) together with the fact that spectra are always non-empty (Theorem 7.8(a)).

The situation can be quite different on non-commutative algebras:

Exercise 8.1. Consider the algebra C^{2×2} = B(C^2) of 2×2-matrices (this becomes a Banach algebra if we fix an arbitrary norm on C^2 and use the corresponding operator norm; however, as this is a purely algebraic exercise, the norm plays no role here). Show that there are no complex homomorphisms φ ≢ 0 on this algebra.

Here is a rather spectacular application of the ideas developed in Theorem 8.5:

Example 8.1. Consider the Banach algebra of absolutely convergent trigonometric series:
\[ A = \left\{ f(e^{ix}) = \sum_{n=-\infty}^{\infty} a_n e^{inx} : a \in \ell^1(\mathbb{Z}) \right\} \]
We have written f(e^{ix}) rather than f(x) because it will be convenient to think of f as a function on the unit circle S = {z ∈ C : |z| = 1} = {e^{ix} : x ∈ R}. Notice that the series converges uniformly, so A ⊂ C(S).

Exercise 8.2. Show that if f ≡ 0, then a_n = 0 for all n ∈ Z. Suggestion: Recall that {e^{inx}} is an ONB of L^2((−π, π), dx/(2π)). Use this fact to derive a formula that recovers the a_n's from f.


The algebraic operations on A are defined pointwise; for example, (f + g)(z) := f(z) + g(z). It is not entirely clear that the product of two functions from A will be in A again, but this issue will be addressed later.

Consider the map ϕ : ℓ^1 → A, ϕ(a) = Σ a_n e^{inx}. It is clear that ϕ is linear and surjective. Moreover, Exercise 8.2 makes sure that ϕ is injective. Therefore, we can define a norm on A by ‖ϕ(a)‖ = ‖a‖_1. This makes A isometrically isomorphic to ℓ^1(Z) as a Banach space. We claim that these spaces are actually isometrically isomorphic as Banach algebras, where we endow ℓ^1 with the convolution product, as in Example 7.5:
\[ (a * b)_n = \sum_{j=-\infty}^{\infty} a_j b_{n-j} \]

Exercise 8.3. Show that ϕ is a homomorphism. Since we already know that ϕ is linear, you must show that ϕ(a ∗ b) = ϕ(a)ϕ(b).

In particular, this does confirm that fg ∈ A if f, g ∈ A (the sequence corresponding to fg is a ∗ b if a and b correspond to f and g, respectively). Since ℓ^1(Z) is a Banach algebra, A is a Banach algebra also, or perhaps it would be more appropriate to say that A is another realization of the same Banach algebra.

Proposition 8.6. Every φ ∈ ∆ on this Banach algebra is an evaluation: There exists a z ∈ S so that φ(f) = f(z). Conversely, this formula defines a complex homomorphism for every z = e^{it} ∈ S.

Exercise 8.4. Prove Proposition 8.6, by using the following strategy: Let φ ∈ ∆. What can you say about |φ(e^{ix})| and |φ(e^{-ix})|? Conclude that |φ(e^{ix})| = 1, say φ(e^{ix}) = e^{it}. Now use the continuity of φ to prove that for an arbitrary f ∈ A, we have that φ(f) = f(e^{it}). The converse is much easier, of course.

This material leads to an amazingly elegant proof of the following result:

Theorem 8.7 (Wiener). Consider an absolutely convergent trigonometric series: f(e^{ix}) = Σ_{n=-∞}^∞ a_n e^{inx}, a ∈ ℓ^1(Z). Suppose that f(z) ≠ 0 for all z ∈ S. Then 1/f also has an absolutely convergent trigonometric expansion: There exists b ∈ ℓ^1(Z) so that 1/f(e^{ix}) = Σ_{n=-∞}^∞ b_n e^{inx}.

This result is interesting because it is usually very hard to tell whether the expansion coefficients ("Fourier coefficients") of a given function lie in ℓ^1.
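Wiener's Theorem can be illustrated in a simple case where the inverse is explicit. The sketch below is my own illustration, not part of the notes, and rests on these assumptions: f(z) = 1 − z/2, which has no zeros on S and corresponds to the sequence a with a_0 = 1, a_1 = −1/2; the geometric series suggests b_n = 2^{-n} for n ≥ 0 (and b_n = 0 for n < 0) as the coefficients of 1/f, and the code checks that a ∗ b is the sequence of the constant function 1, up to the truncation of b at index N. Finitely supported sequences are stored as dictionaries, and the names convolve and norm1 are hypothetical.

```python
# Illustration only; sequences in l^1(Z) with finite support are stored as
# dicts {index: coefficient}. All names here are hypothetical.

def convolve(a, b):
    """(a * b)_n = sum_j a_j b_(n-j) for finitely supported a, b."""
    result = {}
    for j, aj in a.items():
        for k, bk in b.items():
            result[j + k] = result.get(j + k, 0.0) + aj * bk
    return result

def norm1(a):
    """The l^1 norm of a finitely supported sequence."""
    return sum(abs(v) for v in a.values())

N = 40
a = {0: 1.0, 1: -0.5}                      # f(z) = 1 - z/2, no zeros on S
b = {n: 0.5 ** n for n in range(N + 1)}    # truncated coefficients of 1/f

product = convolve(a, b)
delta = {0: 1.0}                           # sequence of the constant function 1
# l^1 distance between a * b and delta
error = norm1({n: product.get(n, 0.0) - delta.get(n, 0.0)
               for n in set(product) | set(delta)})
print(norm1(b), error)
```

A short calculation shows that a ∗ b differs from the sequence of the constant function 1 only in the entry with index N + 1, which equals −2^{-(N+1)}. So the error tends to zero as N → ∞, and the full sequence b has norm ‖b‖_1 = 2. Of course, this is just the Neumann series again: f = e − x with ‖x‖ = 1/2. The point of Theorem 8.7 is that the conclusion holds even when no such explicit series is available.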


Proof. By Proposition 8.6, the hypothesis says that φ(f) ≠ 0 for all φ ∈ ∆. By Theorem 8.5(c), f ∈ G(A). Clearly, the inverse is given by the function 1/f. □

We now come to the most important topic of this chapter. With each x ∈ A, we can associate a function x̂ : ∆ → C, x̂(φ) = φ(x). We have encountered this type of construction before (see Proposition 4.3); it will work especially well in this new context. We call x̂ the Gelfand transform of x. The Gelfand topology on ∆ ⊂ A* is defined as the relative topology that is induced by the weak-∗ topology on A*. By Exercise 4.10, this is also the weak topology that is generated by the maps {x̂ : ∆ → C : x ∈ A}. We also write Â for this collection of maps. Here are the fundamental properties of the Gelfand transform.

Theorem 8.8. (a) ∆ with the Gelfand topology is a compact Hausdorff space.
(b) Â ⊂ C(∆) and the Gelfand transform ^ : A → C(∆) is a homomorphism between Banach algebras.
(c) σ(x) = x̂(∆) = {x̂(φ) : φ ∈ ∆}; in particular, ‖x̂‖_∞ = r(x) ≤ ‖x‖.

Note that we use the term Gelfand transform for the function x̂ ∈ C(∆), but also for the homomorphism ^ : A → C(∆) that sends x to x̂. Recall from Proposition 7.9 that in the Banach algebra C(∆), σ(x̂) = x̂(∆), so part (c) of the Theorem really says that the Gelfand transform preserves spectra: σ(x̂) = σ(x). It also preserves the algebraic structure (by part (b)) and is continuous (by part (c) again).

Proof. (a) This is very similar to the proof of the Banach-Alaoglu Theorem, so we will just provide a sketch. From that result, we know that ∆ ⊂ B̄_1(0) = {F ∈ A* : ‖F‖ ≤ 1} is a subset of the compact Hausdorff space B̄_1(0), and so it again suffices to show that ∆ is closed in the weak-∗ topology. A procedure very similar to the one used in the original proof works again: If ψ ∈ B̄_1(0) \ ∆, then either ψ ≡ 0 or there exist x, y ∈ A so that ε := |ψ(xy) − ψ(x)ψ(y)| > 0. Let us indicate how to finish the proof in the second case: Let
\[ U = \left\{ \phi \in \overline{B}_1(0) : |\phi(xy) - \psi(xy)| < \frac{\epsilon}{3},\ |\psi(x)| \, |\phi(y) - \psi(y)| < \frac{\epsilon}{3},\ |\phi(y)| < |\psi(y)| + 1,\ (|\psi(y)| + 1) \, |\phi(x) - \psi(x)| < \frac{\epsilon}{3} \right\} . \]


Then U is an open set in the weak-∗ topology that contains ψ. Moreover, if φ ∈ U, then
\begin{align*}
|\phi(xy) - \phi(x)\phi(y)| &\ge |\psi(xy) - \psi(x)\psi(y)| - |\phi(xy) - \psi(xy)| - |\phi(y)| \, |\phi(x) - \psi(x)| - |\psi(x)| \, |\phi(y) - \psi(y)| \\
&> \epsilon - \frac{\epsilon}{3} - \frac{\epsilon}{3} - \frac{\epsilon}{3} = 0,
\end{align*}
so φ ∉ ∆ either and indeed ∆ ∩ U = ∅. We have shown that B̄_1(0) \ ∆ is open, as claimed.
(b) It is clear that Â ⊂ C(∆), from the second description of the Gelfand topology as the weakest topology that makes all maps x̂ ∈ Â continuous. To prove that ^ : A → C(∆) is a homomorphism of algebras, we compute
(xy)^(φ) = φ(xy) = φ(x)φ(y) = x̂(φ)ŷ(φ) = (x̂ŷ)(φ);
in other words, (xy)^ = x̂ŷ. Similar arguments show that ^ is also linear.
(c) This is an immediate consequence of Theorem 8.5(e). □

Let us summarize this one more time and also explore the limitations of the Gelfand transform. The maximal ideal space ∆ with the Gelfand topology is a compact Hausdorff space, and the Gelfand transform provides a map from the original (commutative!) Banach algebra A to C(∆) that
• preserves the algebraic structure: it is a homomorphism;
• preserves spectra: σ(x̂) = σ(x);
• is continuous: ‖x̂‖ ≤ ‖x‖.
However, in general, it
• does not preserve the norm: it need not be isometric; in fact, it can have a non-trivial null space;
• need not be surjective; worse still, its range Â need not be a closed subspace of C(∆).

Another remarkable feature of the Gelfand transform is the fact that it is a purely algebraic construction: it is independent of the norm that is being used on A. Indeed, all we need to do is construct the complex homomorphisms on A and then evaluate these on x to find x̂. We also let the x̂ generate a weak topology on ∆, but again, if formulated this way, this procedure does not involve the norm on A. We are using the fact that there is some norm on A, though, for example to make sure that ∆ is a compact space in the Gelfand topology. However, the Gelfand transform does not change if we switch to a different norm on A (in many situations, there will be only one norm that makes A a Banach algebra).

The following examples illustrate the last two properties from the above list.

Example 8.2. Let A be the set of matrices of the form
\[ T = \begin{pmatrix} a & b \\ 0 & a \end{pmatrix} . \]
This is a commutative Banach algebra if we use matrix multiplication and an arbitrary operator norm on A; in fact, A is a (commutative) subalgebra of C^{2×2} = B(C^2).

Exercise 8.5. Find all complex homomorphisms. Then show that there are T ∈ A, T ≠ 0 with φ(T) = 0 for all φ ∈ ∆. In other words, T̂ = 0, so the Gelfand transform on A is not injective.
Remark: To get this started, you could use the fact that homomorphisms are in particular linear functionals, and we know what these are on a finite-dimensional vector space.

Example 8.3. We consider again the Banach algebra of absolutely convergent trigonometric series from Example 8.1. We saw in Proposition 8.6 that as a set, ∆ may be identified with the unit circle S = {z : |z| = 1}. To extract this identification from Proposition 8.6, notice also that if z, z′ ∈ S, z ≠ z′, then there will be an f ∈ A with f(z) ≠ f(z′). Actually, there will be a trigonometric polynomial (that is, a_n = 0 for all large |n|) with this property. So if z ≠ z′, then the corresponding homomorphisms are also distinct.

With this identification of ∆ with S, the Gelfand transform f̂ of an f ∈ A is the function that sends z ∈ S to φ_z(f) = f(z); in other words, f̂ is just f itself. The Gelfand topology on S is the weakest topology that makes all f̂ continuous. Clearly, these functions are continuous if we use the usual topology on S. Moreover, S with both topologies is a compact Hausdorff space. Now the following simple but very important Lemma shows that the Gelfand topology is just the usual topology on S.

Lemma 8.9. Let T_1 ⊂ T_2 be topologies on a common space X. If X is a compact Hausdorff space with respect to both topologies, then T_1 = T_2.

Proof.
We use the fact that on a compact Hausdorff space, a subset is compact if and only if it is closed. Now let $U \in \mathcal{T}_2$. Then $U^c$ is closed in $\mathcal{T}_2$, thus compact. But then $U^c$ is also compact with respect to $\mathcal{T}_1$, because $\mathcal{T}_1$ is a weaker topology (there are fewer open covers to consider). Thus $U^c$ is $\mathcal{T}_1$-closed, so $U \in \mathcal{T}_1$. □

$\hat{A} = A$ is dense in $C(S) = C(\Delta)$ because, by (a suitable version of) the Weierstraß approximation theorem, every continuous function on $S$ can be uniformly (that is, with respect to $\|\cdot\|_\infty$) approximated by trigonometric polynomials, and these manifestly are in $A$. However, $A \ne C(S)$. This is a well known fact from the theory of Fourier series. The following Exercise outlines an argument of a functional analytic flavor.

Exercise 8.6. Suppose that we had $\hat{A} = C(S)$. First of all, use Corollary 3.3 to show that then
$$
\|a\|_1 \le C \|f\|_\infty \tag{8.1}
$$
for all $a \in \ell^1$ and $f(x) = \sum a_n e^{inx}$, for some $C > 0$. However, (8.1) can be refuted by considering approximations $f_N$ to the (discontinuous!) function $f(e^{ix}) = \chi_{(0,\pi)}(x)$. More precisely, proceed as follows: Notice that if $f = \sum a_n e^{inx}$ with $a \in \ell^1$, then the series also converges in $L^2(-\pi, \pi)$. Recall that $\{e^{inx}\}$ is an ONS (in fact, an ONB) in $L^2((-\pi, \pi), dx/(2\pi))$, so it follows that
$$
a_n = \langle e^{inx}, f \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(e^{ix}) e^{-inx}\, dx
$$
for all $f \in C(S)$. Use this to approximately compute the $a_n^{(N)}$ for functions $f_N \in C(S)$ that satisfy $0 \le f_N \le 1$, $f_N(e^{ix}) = 1$ for $0 < x < \pi$ and $f_N(e^{ix}) = 0$ for $-\pi + 1/N < x < -1/N$. Show that $\|a^{(N)}\|_1$ can be made arbitrarily large by taking $N$ large enough. Since $\|f_N\|_\infty = 1$, this contradicts (8.1).

Exercise 8.7. (a) Show that $c$ with pointwise multiplication is a Banach algebra. (b) Show that $\ell^1 \subset c$ is an ideal. (c) Show that there is a unique maximal ideal $I \supset \ell^1$. Find $I$ and also the unique $\phi \in \Delta$ with $N(\phi) = I$.

Exercise 8.8. Consider the Banach algebra $\ell^\infty$. Show that $I_n = \{x \in \ell^\infty : x_n = 0\}$ is a maximal ideal for every $n \in \mathbb{N}$. Find the corresponding homomorphisms $\phi_n \in \Delta$ with $N(\phi_n) = I_n$. Finally, show that there must be additional complex homomorphisms. (Suggestion: Find another ideal $J$ that is not contained in any $I_n$.)

Exercise 8.9. Let $A$ be a commutative Banach algebra. Show that the spectral radius satisfies
$$
r(xy) \le r(x) r(y), \qquad r(x + y) \le r(x) + r(y)
$$
for all $x, y \in A$.


Exercise 8.10. Show that the inequalities from Exercise 8.9 can fail on non-commutative Banach algebras. More specifically, show that they fail on $A = \mathbb{C}^{2\times 2}$. Remark: Recall that on this Banach algebra, the spectrum of a matrix is the set of its eigenvalues, so $r(T)$ is the largest of the absolute values of the eigenvalues of $T$.


9. $C^*$-algebras

We are especially interested in the Banach algebra $B(H)$, and here we have an additional structure that we have not taken into account so far: we can form adjoints $T^*$ of operators $T \in B(H)$. We now discuss such an operation in the abstract setting. Unless stated otherwise, the algebras in this chapter are not assumed to be commutative.

Definition 9.1. Let $A$ be a Banach algebra. A map $* : A \to A$ is called an involution if it has the following properties:
$$
(x + y)^* = x^* + y^*, \qquad (cx)^* = \overline{c}\, x^*, \qquad (xy)^* = y^* x^*, \qquad x^{**} = x
$$
for all $x, y \in A$, $c \in \mathbb{C}$. We call $x \in A$ self-adjoint (normal) if $x = x^*$ ($xx^* = x^*x$).

Example 9.1. Parts (a)–(d) of Theorem 6.1 show that the motivating example “adjoint operator on $B(H)$” indeed is an involution on $B(H)$ in the sense of Definition 9.1.

Example 9.2. $f^*(x) := \overline{f(x)}$ defines an involution on $C(K)$ and also on $L^\infty(X, \mu)$. Similarly, $(x^*)_n := \overline{x_n}$ defines an involution on $\ell^\infty$.

Theorem 9.2. Let $A$ be a Banach algebra with involution, and let $x \in A$. Then:
(a) $x + x^*$, $-i(x - x^*)$, $xx^*$ are self-adjoint;
(b) $x$ has a unique representation of the form $x = u + iv$ with $u, v$ self-adjoint;
(c) $e = e^*$;
(d) If $x \in G(A)$, then also $x^* \in G(A)$ and $(x^*)^{-1} = (x^{-1})^*$;
(e) $z \in \sigma(x) \iff \overline{z} \in \sigma(x^*)$.

Proof. (a) can be checked by direct calculation; for example, $(x + x^*)^* = x^* + x^{**} = x^* + x$.

(b) We can write
$$
x = \frac{1}{2}(x + x^*) + i\, \frac{-i}{2}(x - x^*),
$$
and by part (a), this is a representation of the desired form. To prove uniqueness, assume that $x = u + iv = u' + iv'$, with self-adjoint elements $u, u', v, v'$. Then both $w := u - u'$ and $iw = i(u - u') = v - v'$ are self-adjoint, too, so $iw = (iw)^* = -iw$ and hence $w = 0$.

(c) $e^* = ee^*$, and this is self-adjoint by part (a). So $e^* = e^{**} = e$, and thus $e$ itself is self-adjoint, too.


(d) Let $x \in G(A)$. Then we can take adjoints in $xx^{-1} = x^{-1}x = e$; by part (c), $e^* = e$, so we obtain that
$$
(x^{-1})^* x^* = x^* (x^{-1})^* = e,
$$
and this indeed says that $x^* \in G(A)$ and $(x^*)^{-1} = (x^{-1})^*$.

(e) If $z \notin \sigma(x)$, then $x - ze \in G(A)$, so $(x - ze)^* = x^* - \overline{z}e \in G(A)$ by part (d), that is, $\overline{z} \notin \sigma(x^*)$. We have established “⇐=”, and the converse is the same statement, applied to $x^*$ in place of $x$. □

The involution on $B(H)$ has an additional property that does not follow from the conditions of Definition 9.1: we always have that $\|TT^*\| = \|T^*T\| = \|T\|^2$; see Theorem 6.1(f). This innocuous looking identity is so powerful and has so many interesting consequences that it deserves a special name:

Definition 9.3. Let $A$ be a Banach algebra with involution. $A$ is called a $C^*$-algebra if $\|xx^*\| = \|x\|^2$ for all $x \in A$ (the $C^*$-property).

From this, we automatically get analogs of the other properties from Theorem 6.1(f) also; in other words, these could have been included in the definition.

Proposition 9.4. Let $A$ be a $C^*$-algebra. Then $\|x\| = \|x^*\|$ and $\|x^*x\| = \|x\|^2$ for all $x \in A$.

Exercise 9.1. Prove Proposition 9.4.

Example 9.3. $B(H)$, $C(K)$, $L^\infty(X, \mu)$, and $\ell^\infty$ with the involutions introduced above are $C^*$-algebras. For $B(H)$ (which again was the motivating example) this of course follows from Theorem 6.1(f), and on the other algebras, we obtain the $C^*$-property from an easy direct argument. For example, if $f \in C(K)$, then
$$
\|ff^*\| = \max_{x \in K} |f(x)\overline{f(x)}| = \max_{x \in K} |f(x)|^2 = \Bigl( \max_{x \in K} |f(x)| \Bigr)^2 = \|f\|^2.
$$

Example 9.4. This really is a non-example. Consider again the Banach algebra
$$
A = \Bigl\{ f(e^{ix}) = \sum_{n=-\infty}^{\infty} a_n e^{inx} : a \in \ell^1(\mathbb{Z}) \Bigr\}
$$
of absolutely convergent trigonometric series. Recall that we multiply functions from $A$ pointwise (equivalently, we take the convolution product of the corresponding sequences from $\ell^1$), and we use the norm $\|f\| = \|a\|_1$.


It is not very difficult to verify that $f^*(z) := \overline{f(z)}$ again defines an involution on $A$. The algebraic properties from Definition 9.1 are in fact obvious; we just need to make sure that $f^* \in A$ again, but this is easy: if $f = \sum a_n e^{inx}$, then $f^* = \sum b_n e^{inx}$, with $b_n = \overline{a_{-n}}$ (or we can rephrase and say that this last formula defines an involution on $\ell^1(\mathbb{Z})$).

Exercise 9.2. Show that this involution does not have the $C^*$-property, that is, $A$ is not a $C^*$-algebra.

We can now formulate and prove the central result of this chapter.

Theorem 9.5 (Gelfand-Naimark). Let $A$ be a commutative $C^*$-algebra. Then the Gelfand transform $\widehat{\ } : A \to C(\Delta)$ is an isometric $*$-isomorphism between the $C^*$-algebras $A$ and $C(\Delta)$.

We call a map $\varphi : A \to B$ between $C^*$-algebras an isometric $*$-isomorphism if $\varphi$ is bijective, a homomorphism, an isometry, and preserves the involution: $\varphi(x^*) = (\varphi(x))^*$. In other words, such a map preserves the complete $C^*$-algebra structure (set, algebraic structure, norm, involution). It now becomes clear that the Gelfand-Naimark Theorem is a very powerful structural result; it says that $C(K)$ provides a universal model for arbitrary commutative $C^*$-algebras. Every commutative $C^*$-algebra can be identified with $C(K)$; in fact, we can be more specific: $K$ can be taken to be the maximal ideal space $\Delta$ with the Gelfand topology, and then the Gelfand transform provides an identification map. Note also that the Gelfand transform on $C^*$-algebras has much better properties than on general Banach algebras; see again our discussion at the end of Chapter 8.

For the proof, we will need the following result.

Theorem 9.6 (Stone-Weierstraß). Let $K$ be a compact Hausdorff space, and suppose that $A \subset C(K)$ has the following properties:
(a) $A$ is a subalgebra (possibly without unit);
(b) If $f \in A$, then $\overline{f} \in A$;
(c) $A$ separates the points of $K$: if $x, y \in K$, $x \ne y$, then there is an $f \in A$ with $f(x) \ne f(y)$;
(d) For every $x \in K$, there exists an $f \in A$ with $f(x) \ne 0$.
Then $\overline{A} = C(K)$.
This closure is taken with respect to the norm topology. So we could slightly rephrase the statement as follows: if $g \in C(K)$ and $\epsilon > 0$ are given, then we can find an $f \in A$ so that $\|f - g\|_\infty < \epsilon$. This result is a far-reaching generalization of the classical Weierstraß approximation theorem, which says that every continuous function on a compact interval $[a, b]$ can be uniformly approximated by polynomials. To obtain this as a special case of Theorem 9.6, just put $K = [a, b]$ and check that
$$
A = \Bigl\{ p(x) = \sum_{n=0}^{N} a_n x^n : a_n \in \mathbb{C},\ N \in \mathbb{N}_0 \Bigr\}
$$
satisfies hypotheses (a)–(d). We don’t want to prove the Stone-Weierstraß Theorem here; a proof can be found in most topology books. Or see Folland, Real Analysis, Theorem 4.51. We are now ready for the

Proof of the Gelfand-Naimark Theorem. We first claim that $\phi(u) \in \mathbb{R}$ for all $\phi \in \Delta$ if $u \in A$ is self-adjoint. To see this, write $\phi(u) = c + id$, with $c, d \in \mathbb{R}$, and put $x = u + ite$, with $t \in \mathbb{R}$. Then $\phi(x) = c + i(d + t)$ and $xx^* = u^2 + t^2 e$, so
$$
c^2 + (d + t)^2 = |\phi(x)|^2 \le \|x\|^2 = \|xx^*\| \le \|u^2\| + t^2.
$$
It follows that $2dt \le C$, with $C := \|u^2\| - d^2 - c^2$, and this holds for arbitrary $t \in \mathbb{R}$. Clearly, this is only possible if $d = 0$, so $\phi(u) = c \in \mathbb{R}$, as asserted.

It now follows that the Gelfand transform preserves the involution: if $x \in A$, then we can write $x = u + iv$ with $u, v$ self-adjoint, and it follows that
$$
\phi(x^*) = \phi(u - iv) = \phi(u) - i\phi(v) = \overline{\phi(u) + i\phi(v)} = \overline{\phi(x)}.
$$
Recall that the involution on $C(\Delta)$ was defined as the pointwise complex conjugate, so, since $\phi \in \Delta$ is arbitrary here, this calculation indeed says that $\widehat{x^*} = \overline{\hat{x}} = (\hat{x})^*$.

We also learn from this that $\hat{A} \subset C(\Delta)$ satisfies assumption (b) from the Stone-Weierstraß Theorem. It is straightforward to establish the other conditions, too; for example, to verify (c), just note that if $\phi, \psi \in \Delta$, $\phi \ne \psi$, then $\phi(x) \ne \psi(x)$ for some $x \in A$, so $\hat{x}(\phi) \ne \hat{x}(\psi)$. So Theorem 9.6 shows that $\overline{\hat{A}} = C(\Delta)$.

As the next step, we want to show that the Gelfand transform is isometric. Let $x \in A$, and put $y = xx^*$. Then $y$ is self-adjoint, and therefore the $C^*$-property gives that $\|y^2\| = \|y\|^2$, $\|y^4\| = \|y^2 y^2\| = \|y^2\|^2 = \|y\|^4$, and so forth. The general formula is $\|y^n\| = \|y\|^n$, if $n = 2^k$ is a power of 2. Now we can compute the spectral radius by using the formula from Theorem 7.8(b) along this subsequence. It follows that $r(y) = \lim_{n \to \infty} \|y^n\|^{1/n} = \|y\|$. Since $\|\hat{y}\| = r(y)$ by Theorem 8.8(c), this shows that $\|\hat{y}\| = \|y\|$ for $y$ of the form $y = xx^*$.
We can now use the $C^*$-property on both algebras $C(\Delta)$ and $A$ to conclude that also $\|\hat{x}\| = \|x\|$ for arbitrary $x \in A$.
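The last step can be spelled out as a short chain of equalities. The following is a sketch that only combines facts already established in this proof (the transform is a homomorphism, it preserves the involution, and it is isometric on elements of the form $xx^*$); the annotations on the right are added here for orientation.

```latex
\begin{align*}
\|\hat{x}\|^2
  &= \|\hat{x}\,(\hat{x})^*\|   && \text{($C^*$-property in $C(\Delta)$)} \\
  &= \|\hat{x}\,\widehat{x^*}\| && \text{(the transform preserves the involution)} \\
  &= \|\widehat{xx^*}\|         && \text{(the transform is a homomorphism)} \\
  &= \|xx^*\|                   && \text{(isometry on elements of the form $y = xx^*$)} \\
  &= \|x\|^2                    && \text{($C^*$-property in $A$)}
\end{align*}
```

Taking square roots gives $\|\hat{x}\| = \|x\|$.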


So the Gelfand transform is an isometry, and this implies that this map is injective (obvious, because only the zero vector can get mapped to zero) and its range $\hat{A}$ is a closed subspace of $C(\Delta)$ (not completely obvious, but we have encountered this argument before; see the proof of Proposition 4.3). We proved earlier that $\overline{\hat{A}} = C(\Delta)$, so it now follows that $\hat{A} = C(\Delta)$. We have established all the properties of the Gelfand transform that were stated in Theorem 9.5. □

We now discuss in detail the Gelfand transform for the three commutative $C^*$-algebras $C(K)$, $c$, $L^\infty(0, 1)$.

Example 9.5. Let $K$ be a compact Hausdorff space and consider the $C^*$-algebra $A = C(K)$. We know from the Gelfand-Naimark Theorem that $C(K) \cong C(\Delta)$, but we would like to explicitly identify $\Delta$ and the Gelfand transforms of functions $f \in C(K)$. We will need the following tool:

Lemma 9.7 (Urysohn). Let $K$ be a compact Hausdorff space. If $A, B$ are disjoint closed subsets of $K$, then there exists $f \in C(K)$ with $0 \le f \le 1$, $f = 0$ on $A$ and $f = 1$ on $B$.

See, for example, Folland, Real Analysis, Lemma 4.15 (plus Proposition 4.25) for a proof.

It is clear that the point evaluations $\phi_x(f) = f(x)$ are complex homomorphisms for all $x \in K$. So we obtain a map $\Psi : K \to \Delta$, $\Psi(x) = \phi_x$. Urysohn’s Lemma shows that $\Psi$ is injective: if $x, y \in K$, $x \ne y$, then there exists $f \in C(K)$ with $f(x) \ne f(y)$ (just take $A = \{x\}$, $B = \{y\}$ in Lemma 9.7). So $\phi_x(f) \ne \phi_y(f)$ and thus $\phi_x \ne \phi_y$.

I now claim that $\Psi$ is also surjective. If this were wrong, then there would be a $\phi \in \Delta$, $\phi \notin \{\phi_x : x \in K\}$. Let $I = N(\phi)$, $I_x = N(\phi_x) = \{f \in C(K) : f(x) = 0\}$ be the corresponding maximal ideals. By assumption and (the uniqueness part of) Theorem 8.5(a), $I \ne I_x$ for all $x \in K$. Since $I$ is also maximal, this implies that $I$ is not contained in any $I_x$. So for every $x \in K$, there exists an $f_x \in I$ with $f_x(x) \ne 0$. Since the $f_x$ are continuous, we can find neighborhoods $U_x$ of $x$ so that $f_x(y) \ne 0$ for all $y \in U_x$.
By compactness, $K$ is covered by finitely many of these, say $K = \bigcup_{j=1}^{N} U_{x_j}$. Now let $g = \sum_{j=1}^{N} \overline{f_{x_j}} f_{x_j}$. Then $g \in I$ and $g > 0$ on $K$ (because on $U_{x_j}$, the $j$th summand is definitely positive), so $g$ is invertible in $C(K)$ (with inverse $1/g$). This is a contradiction because the ideal $I \ne C(K)$ cannot contain invertible elements; see Theorem 8.3(a).
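The two claims about $g$ in the last step can be checked in one line each. This is only a sketch of the verification; the index $k$ is introduced here for illustration and stands for any index with $y \in U_{x_k}$.

```latex
\begin{align*}
g &= \sum_{j=1}^{N} \overline{f_{x_j}}\, f_{x_j} \in I
  && \text{since $f_{x_j} \in I$, $\overline{f_{x_j}} \in C(K)$, and $I$ is an ideal;} \\
g(y) &= \sum_{j=1}^{N} |f_{x_j}(y)|^2 \ge |f_{x_k}(y)|^2 > 0
  && \text{for } y \in U_{x_k}.
\end{align*}
```

Since the sets $U_{x_1}, \ldots, U_{x_N}$ cover $K$, every $y \in K$ lies in some $U_{x_k}$, so $g > 0$ everywhere on $K$.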


We conclude that $\Delta = \{\phi_x : x \in K\}$. This identifies $\Delta$ as a set with $K$. Moreover, $\hat{f}(\phi_x) = \phi_x(f) = f(x)$, so if we use this identification, then the Gelfand transform of a function $f \in C(K)$ is just $f$ itself.

We now want to show that the identification map $\Psi$ is a homeomorphism, so in fact $\Delta$ (with the Gelfand topology) can be identified with $K$ as a topological space. We introduce some notation: write $\mathcal{T}_G$ for the Gelfand topology on $\Delta$, and let $\mathcal{T}_K$ be the given topology on $K$, but moved over to $\Delta$. More precisely, $\mathcal{T}_K = \{\Psi(U) : U \subset K \text{ open}\}$. Since $\Psi$ is a bijection, it preserves the set operations and thus $\mathcal{T}_K$ indeed is a topology.

Notice that every $\hat{f} : \Delta \to \mathbb{C}$ is continuous if we use the topology $\mathcal{T}_K$ on $\Delta$. This is almost a tautology because $\mathcal{T}_K$ is essentially the original topology and $\hat{f}$ is essentially $f$, and these were continuous functions to start with. For a more formal verification, notice that $\hat{f} = f \circ \Psi^{-1}$, so if $V \subset \mathbb{C}$ is open, then $\hat{f}^{-1}(V) = \Psi(f^{-1}(V))$, which is in $\mathcal{T}_K$.

So $\mathcal{T}_K$ is a topology that makes all $\hat{f}$ continuous. This implies that $\mathcal{T}_G \subset \mathcal{T}_K$, because $\mathcal{T}_G$ can be defined as the weakest such topology. Moreover, $\Delta$ is a compact Hausdorff space with respect to both topologies. This follows from Theorem 8.8(a) (for $\mathcal{T}_G$) and the fact that by construction of $\mathcal{T}_K$, $(\Delta, \mathcal{T}_K)$ is homeomorphic to $K$. Lemma 8.9 now shows that $\mathcal{T}_G = \mathcal{T}_K$. We summarize:

Theorem 9.8. Let $K$ be a compact Hausdorff space. Then the maximal ideal space $\Delta$ of the $C^*$-algebra $C(K)$ is homeomorphic to $K$. A homeomorphism between these spaces is given by $\Psi : K \to \Delta$, $\Psi(x) = \phi_x$, $\phi_x(f) = f(x)$. Moreover, if $\Delta$ is identified in this way with $K$, then the Gelfand transform of a function $f \in C(K)$ is just $f$ itself.

At least with hindsight, this does not come as a big surprise. The Gelfand transform gives a representation of a commutative $C^*$-algebra $A$ as continuous functions on a compact Hausdorff space (namely, $\Delta$), but if the algebra is already given in this form, there is no work left to be done, and indeed the Gelfand transform does not do anything (except change names) on $C(K)$.

From that point of view, Theorem 9.8 seems somewhat disappointing, but we can in fact draw interesting conclusions:

Theorem 9.9. Let $K$ and $L$ be compact Hausdorff spaces. Then $K$ is homeomorphic to $L$ if and only if the algebras $C(K)$ and $C(L)$ are (algebraically!) isomorphic. In this case, $C(K)$ and $C(L)$ are in fact isometrically $*$-isomorphic as $C^*$-algebras.


Here, we say that $A$ and $B$ are algebraically isomorphic if there exists a bijective homomorphism (in other words, an isomorphism) $\varphi : A \to B$. We do not require $\varphi$ to be isometric or preserve the conjugation.

Proof. Suppose that $C(K)$ and $C(L)$ are isomorphic as algebras. By Theorem 9.8, $K \cong \Delta_K$, $L \cong \Delta_L$, but the construction of $\Delta$ and its Gelfand topology only uses the algebraic structure (we already discussed this feature of the Gelfand transform in Chapter 8), so $\Delta_K \cong \Delta_L$. Or, to spell this out somewhat more explicitly, if $\varphi : C(K) \to C(L)$ is an algebraic isomorphism, then $\phi_L \mapsto \phi_K = \phi_L \circ \varphi$ defines a homeomorphism from $\Delta_L$ onto $\Delta_K$.

Exercise 9.3. Prove the converse statement. Actually, prove right away the stronger version that $C(K)$ and $C(L)$ are isometrically $*$-isomorphic if $K \cong L$. Also, if the above sketch doesn’t convince you, try to write this down in greater detail. More specifically, give a more detailed argument that shows that the map defined at the end of the proof indeed is a homeomorphism. □

Example 9.6. Our next example is $A = c$. This is a $C^*$-algebra with the conjugation $(x^*)_n = \overline{x_n}$; in fact, $c$ is a subalgebra of the $C^*$-algebra $\ell^\infty$. We want to discuss its Gelfand representation $c \cong C(\Delta)$. We start out by finding $\Delta$. I claim that we can identify $\Delta$ with $\mathbb{N}_\infty \equiv \mathbb{N} \cup \{\infty\}$ (this is just $\mathbb{N}$ with an additional point, which we choose to call “$\infty$”). More precisely, $n \in \mathbb{N}$ corresponds to the complex homomorphism $\phi_n(x) = x_n$, and $\phi_\infty(x) = \lim_{n \to \infty} x_n$. It’s easy to check that these $\phi$’s are indeed complex homomorphisms. Moreover, these are in fact all homomorphisms. This could be seen as in Example 9.5, but we can also just recall that the dual space $c^*$ can be identified with $\ell^1(\mathbb{N}_\infty)$: we associate with $y \in \ell^1(\mathbb{N}_\infty)$ the functional
$$
F_y(x) = \sum_{n=1}^{\infty} y_n x_n + y_\infty \cdot \lim_{n \to \infty} x_n.
$$
See Example 4.4; we called the additional point 0 there (rather than $\infty$), but that of course is irrelevant.

Exercise 9.4. Show that $F_y$ is a homomorphism precisely if $y = e_n$ or $y = e_\infty$.

With this identification of $\Delta$ with $\mathbb{N}_\infty$, the Gelfand transform of an $x \in c$ becomes the function $\hat{x}(n) = \phi_n(x) = x_n$, $\hat{x}(\infty) = \lim x_n$. So $\hat{x}$ is just the sequence $x_n$ itself, with the limit added as the value at the additional point $\infty$.
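As a small illustration, here is the Gelfand transform of one concrete convergent sequence. The choice $x_n = n/(n+1)$ is made only for this example and does not come from the notes; the second line records why the supremum norm of the transform agrees with the norm on $c$, as the Gelfand-Naimark Theorem predicts.

```latex
% Illustrative choice (an assumption for this example): x_n = n/(n+1), so x is in c with limit 1.
\begin{align*}
\hat{x}(n) &= \phi_n(x) = \frac{n}{n+1} \quad (n \in \mathbb{N}), \qquad
\hat{x}(\infty) = \phi_\infty(x) = \lim_{n \to \infty} \frac{n}{n+1} = 1, \\
\|\hat{x}\|_\infty &= \sup_{n \in \mathbb{N}_\infty} |\hat{x}(n)|
  = \max\Bigl\{ \sup_{n \in \mathbb{N}} |x_n|,\ \bigl| \lim_{n \to \infty} x_n \bigr| \Bigr\}
  = \sup_{n \in \mathbb{N}} |x_n| = \|x\|.
\end{align*}
```

The third equality in the second line holds for every $x \in c$, because the absolute value of the limit cannot exceed $\sup_n |x_n|$.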


Now what is the Gelfand topology on $\mathbb{N}_\infty$? First of all, all subsets of $\mathbb{N}$ are open. To see this, just note that (with $e = (1, 1, 1, \ldots)$)
$$
\{m\} = \{n \in \mathbb{N}_\infty : |\hat{e}(n) - \widehat{e_m}(n)| < 1\} = \bigl(\widehat{e - e_m}\bigr)^{-1}(\{z : |z| < 1\}),
$$
so this is indeed an open set for all $m \in \mathbb{N}$. Similarly, the sets $\{n \in \mathbb{N} : n \ge k\} \cup \{\infty\}$ are open for all $k \in \mathbb{N}$ because they are again inverse images of open sets $U \subset \mathbb{C}$ under suitable functions $\hat{x}$. For example, we can take $U = \{|z| < 1\}$ (again) and $x_n = 1$ for $n < k$ and $x_n = 0$ for $n \ge k$.

By combining these observations, we see that a subset $U \subset \mathbb{N}_\infty$ is open in the Gelfand topology if:
• $\infty \notin U$ or
• $U \supset \{n : n \ge k\} \cup \{\infty\}$ for some $k \in \mathbb{N}$

This actually gives a complete list of the open sets. We can prove this remark as follows: First of all, the collection of sets $U$ described above clearly defines a topology on $\mathbb{N}_\infty$. It now suffices to show that every $\hat{x} : \mathbb{N}_\infty \to \mathbb{C}$ is continuous with respect to this topology, because the Gelfand topology was defined as the weakest topology with this property. Continuity of $\hat{x}$ at $n \in \mathbb{N}$ is obvious because $\{n\}$ is a neighborhood of $n$. To check continuity at $\infty$, let $\epsilon > 0$ be given. Since $\hat{x}(\infty) = \lim_{n \to \infty} \hat{x}(n)$, there exists $k \in \mathbb{N}$ so that
$$
|\hat{x}(n) - \hat{x}(\infty)| < \epsilon \quad \text{for } n \ge k.
$$
Since $U = \{n : n \ge k\} \cup \{\infty\}$ is a neighborhood of $\infty$, this verifies that $\hat{x}$ is continuous at $\infty$ also.

This topology $\mathcal{T}_G$ is a familiar object: the space $(\mathbb{N}_\infty, \mathcal{T}_G)$ is called the 1-point compactification of $\mathbb{N}$; please refer to a topology book for further information. Here, the compactness of $(\mathbb{N}_\infty, \mathcal{T}_G)$ also follows from Theorem 8.8(a). In the case at hand, $\mathcal{T}_G$ also has the following characterization:

Exercise 9.5. Show that $\mathcal{T}_G$ is the only topology on $\mathbb{N}_\infty$ that induces the given topology on $\mathbb{N}$ (all sets open) and makes $\mathbb{N}_\infty$ a compact space.

We summarize:

Theorem 9.10. The maximal ideal space $\Delta$ of $c$ is homeomorphic to the 1-point compactification $\mathbb{N}_\infty$ of $\mathbb{N}$. The Gelfand transform of an $x \in c$ is just the original sequence, supplemented by its limit: $\hat{x}(n) = x_n$, $\hat{x}(\infty) = \lim x_n$.

Example 9.7. In the previous two examples, the final results could have been guessed at the very beginning: it was not very hard to realize the given $C^*$-algebra as continuous functions on a compact Hausdorff space.


Matters are very different for $A = L^\infty(0, 1)$, which is our final example. Neither $\Delta$ as a set nor its Gelfand topology are directly accessible, but we will obtain useful information anyway. It will turn out that the topological space $(\Delta, \mathcal{T}_G)$ has rather exotic properties.

We introduce a measure on $\Delta$ as follows: Consider the functional $C(\Delta) \to \mathbb{C}$, $\hat{f} \mapsto \int_0^1 f(x)\, dx$. This is well defined because every continuous function on $\Delta$ is the Gelfand transform of a unique element of $L^\infty(0, 1)$, by the Gelfand-Naimark Theorem. Moreover, the functional is also linear and positive: if $\hat{f} \ge 0$, then $f \ge 0$ almost everywhere, because the Gelfand transform preserves spectra, and on $C(\Delta)$ and $L^\infty$, these are given by the range and essential range of the function, respectively (see Proposition 7.9 and Exercise 7.5(b)). Therefore, $\int_0^1 f\, dx \ge 0$ if $\hat{f} \ge 0$. The Riesz Representation Theorem now shows that there is a unique regular positive Borel measure $\mu \in M(\Delta)$ so that
$$
\int_0^1 f(x)\, dx = \int_\Delta \hat{f}(\phi)\, d\mu(\phi)
$$
for all $f \in L^\infty(0, 1)$. See Folland, Real Analysis, Theorem 7.2 (and Proposition 7.5 for the regularity). We can think of $\mu$ as Lebesgue measure on $(0, 1)$, moved over to $\Delta$. Notice also that $\hat{1} = 1$, so $\mu(\Delta) = \int_0^1 dx = 1$.

We will now use $\mu$ as our main tool to establish the following properties of $\Delta$ and the Gelfand topology. Taken together, these are rather strange.

Theorem 9.11. (a) If $V \subset \Delta$, $V \ne \emptyset$ is open, then $\mu(V) > 0$.
(b) If $g : \Delta \to \mathbb{C}$ is a bounded, (Borel) measurable function, then there exists an $\hat{f} \in C(\Delta)$ so that $g = \hat{f}$ $\mu$-almost everywhere.
(c) If $V \subset \Delta$ is open, then $\overline{V}$ is also open.
(d) If $E \subset \Delta$ is a Borel set, then $\mu(E^\circ) = \mu(E) = \mu(\overline{E})$.
(e) $\Delta$ does not have isolated points, that is, $\{\phi\}$ is not open for any $\phi \in \Delta$.
(f) $\Delta$ does not have non-trivial convergent sequences: If $\phi_n, \phi \in \Delta$, $\phi_n \to \phi$, then $\phi_n = \phi$ for all large $n$.

Some comments are in order. Parts (a) and (b) imply that $L^\infty(\Delta, \mu) = C(\Delta)$: every bounded measurable function has exactly one continuous representative. The property stated in part (c) is sometimes referred to by saying that $\Delta$ is extremally disconnected. Part (c) in particular implies that $\Delta$ is totally disconnected: the only connected subsets of $\Delta$ are the single points.


Exercise 9.6. Prove this fact. In fact, please prove the corresponding general statement: If $X$ is a topological Hausdorff space in which the closure of every open set is open and $M \subset X$ has more than one point, then there are disjoint open sets $U, V$ that both intersect $M$ with $M \subset U \cup V$.

So far, none of this is particularly outlandish; indeed, discrete topological spaces such as $\mathbb{N}$ or finite collections of points (all subsets are open) have all these properties. However, part (e) says that $\Delta$ is decidedly not of this type. We must give up all attempts at visualizing $\Delta$ and admit that $\Delta$ is such a complicated space that no easy intuition will do justice to it. Note also that some of the above properties (for example, (b), (c), and (d)) seem to suggest that $\Delta$ might have many open subsets, but we also know that $\Delta$ is compact, and that works in the other direction.

Proof. (a) Let $V \subset \Delta$ be a non-empty open set. Pick $\phi \in V$. By Urysohn’s Lemma, there exists $\hat{f} \in C(\Delta)$ with $0 \le \hat{f} \le 1$, $\hat{f}(\phi) = 1$, and $\hat{f} = 0$ on $V^c$. Again, since the Gelfand transform preserves spectra, we then also have that $f \ge 0$, but $f$ is not equal to zero (Lebesgue) almost everywhere. Thus
$$
0 < \int_0^1 f(x)\, dx = \int_\Delta \hat{f}\, d\mu \le \mu(V),
$$
so $\mu(V) > 0$.

(b) Let $g : \Delta \to \mathbb{C}$ be a Borel function with $|g(\phi)| \le M$. We now use the fact that continuous functions are dense in $L^p$ spaces ($p < \infty$) if (like here) the underlying measure is a regular Borel measure on a compact space. See Folland, Real Analysis, Proposition 7.9 for a slightly more general version of this result. In particular, we can find $\hat{f}_n \in C(\Delta)$ so that $\|\hat{f}_n - g\|_2 \to 0$. In fact, we may assume that $|\hat{f}_n| \le M$ also.

Exercise 9.7. Prove this remark. Suggestion: If $|\hat{f}| > M$ at certain points, we could just redefine $\hat{f}$ on this set so that the new function is bounded by $M$, and we would in fact obtain a better approximation to $g$. However, we also need to make sure that the new function is still continuous. Use Urysohn’s Lemma to give a careful version of this argument.


By the basic properties of the Gelfand transform, we now obtain that
$$
\begin{aligned}
\int_0^1 |f_m(x) - f_n(x)|^2\, dx &= \int_0^1 \bigl(\overline{f_m(x)} - \overline{f_n(x)}\bigr)\bigl(f_m(x) - f_n(x)\bigr)\, dx \\
&= \int_\Delta \bigl( (\overline{f_m} - \overline{f_n})(f_m - f_n) \bigr)\widehat{\ }\ d\mu \\
&= \int_\Delta \bigl| \hat{f}_m(x) - \hat{f}_n(x) \bigr|^2\, d\mu(x) \to 0 \quad (m, n \to \infty).
\end{aligned}
$$
So $f := \lim_{n \to \infty} f_n$ exists in $L^2(0, 1)$. On a suitable subsequence, we can obtain $f(x)$ as a pointwise limit. This shows that $|f| \le M$ almost everywhere, so $f \in L^\infty(0, 1)$. By the same calculation as above, we now see that
$$
\int_\Delta \bigl| \hat{f}_n(x) - \hat{f}(x) \bigr|^2\, d\mu(x) = \int_0^1 |f_n(x) - f(x)|^2\, dx \to 0,
$$
that is, $\hat{f}_n \to \hat{f}$ in $L^2(\Delta, \mu)$. On the other hand, $\hat{f}_n \to g$ in this space by construction of the $\hat{f}_n$, so $g = \hat{f}$ in $L^2(\Delta, \mu)$, that is, almost everywhere with respect to $\mu$, and $\hat{f} \in C(\Delta)$, as desired.

(c) $g = \chi_{V^c}$ is a bounded Borel function because the only preimages that can occur here are $\emptyset$, $V$, $V^c$, $\Delta$. By part (b), there exists $\hat{f} \in C(\Delta)$ so that $g = \hat{f}$ $\mu$-almost everywhere. Now $\hat{f}^{-1}(\mathbb{C} \setminus \{0, 1\})$ is an open set of $\mu$ measure zero. By part (a), the set is actually empty, and thus $\hat{f}$ only takes the values 0 and 1. This argument also shows that the sets $V \cap \hat{f}^{-1}(\mathbb{C} \setminus \{0\})$ and $\overline{V}^c \cap \hat{f}^{-1}(\mathbb{C} \setminus \{1\})$ are empty. Put differently, we have that $\hat{f} = 0$ on $V$ and $\hat{f} = 1$ on $\overline{V}^c$. Therefore, $V \subset \hat{f}^{-1}(\{0\}) \subset \overline{V}$. Now $\hat{f}^{-1}(\{0\})$ is also closed (it is the preimage of a closed set), and since $\overline{V}$ is the smallest closed set that contains $V$, we must have that $\hat{f}^{-1}(\{0\}) = \overline{V}$. We can also obtain this set as $\hat{f}^{-1}(\mathbb{C} \setminus \{1\})$, which is open, so indeed $\overline{V}$ is an open set.

(d) First of all, let $V \subset \Delta$ be open. Consider again the function $g = \chi_{V^c}$ and its continuous representative $\hat{f}$ from the proof of part (c). We saw above that $\hat{f} = 0$ exactly on $\overline{V}$. On the other hand, $g = 0$ exactly on $V$, and since $g = \hat{f}$ almost everywhere, this implies that $\mu(V) = \mu(\overline{V})$. By passing to the complements, we also obtain from this that $\mu(A^\circ) = \mu(A)$ if $A \subset \Delta$ is closed.

If $E \subset \Delta$ is an arbitrary Borel set and $\epsilon > 0$ is given, we can use the regularity of $\mu$ to find a compact set $K \subset E$ and an open set $V \supset E$ so that $\mu(V) < \mu(K) + \epsilon$. It then follows that
$$
\mu(\overline{E}) \le \mu(\overline{V}) = \mu(V) < \mu(K) + \epsilon = \mu(K^\circ) + \epsilon \le \mu(E^\circ) + \epsilon.
$$
Now $\epsilon > 0$ was arbitrary, so $\mu(\overline{E}) \le \mu(E^\circ)$. Since clearly $\mu(E^\circ) \le \mu(E) \le \mu(\overline{E})$, we obtain the claim.

(e) Suppose that $\{\phi_0\}$ were an open set. Since points in Hausdorff spaces are always closed, the function $\chi_{\{\phi_0\}}$ would then be continuous and thus be equal to $\hat{f}$ for some $f \in L^\infty(0, 1)$. We can now again use the fact that the Gelfand transform preserves spectra to deduce that $f$ itself is the characteristic function of some measurable set $M \subset (0, 1)$, $|M| > 0$: $f = \chi_M$ (this follows because the essential range of $f$ has to be $\{0, 1\}$). Pick a subset $M' \subset M$ so that both $M'$ and $M \setminus M'$ have positive Lebesgue measure.

Exercise 9.8. Prove the existence of such a set $M'$. Does a corresponding result hold on arbitrary measure spaces (do positive measure sets always have subsets of strictly smaller positive measure)?

Let $g = \chi_{M'}$. Then clearly $fg = g$, so $\hat{f}\hat{g} = \hat{g}$. Since $\hat{f}(\phi) = 0$ for $\phi \ne \phi_0$, this says that $\hat{g} = c\hat{f}$ for some $c \in \mathbb{C}$. On the other hand, it is not true that $g = cf$ almost everywhere, so we have reached a contradiction. We have to admit that $\{\phi_0\}$ is not open.

(f) Let $\phi_n \to \phi$ be a convergent sequence, and assume that $\phi_n$ is not eventually constant. By passing to a subsequence, we may then in fact assume that $\phi_n \ne \phi$ for all $n \in \mathbb{N}$. Pick disjoint neighborhoods $U_1$ and $V_1$ of $\phi_1$ and $\phi$, respectively. Since $\phi_n \to \phi$, we can find an index $n_2$ so that $\phi_{n_2} \in V_1$. Now pick disjoint neighborhoods $U_2'$ and $V_2'$ of $\phi_{n_2}$ and $\phi$, respectively, and put $U_2 = U_2' \cap V_1$, $V_2 = V_2' \cap V_1$. These are still (possibly smaller) neighborhoods of the same points.

We can continue this procedure. We obtain pairwise disjoint neighborhoods $U_1, U_2, U_3, \ldots$ of the members of the subsequence $\phi_1, \phi_{n_2}, \phi_{n_3}, \ldots$. Since all the $U_j$’s are in particular open, the formula
$$
g(\phi) = \begin{cases} 1 & \phi \in \bigcup_{j \in \mathbb{N}} U_{2j-1} \\ -1 & \phi \in \bigcup_{j \in \mathbb{N}} U_{2j} \\ 0 & \text{otherwise} \end{cases}
$$
defines a (bounded) Borel function $g$. By part (b), $g = \hat{f}$ almost everywhere for some $\hat{f} \in C(\Delta)$.
We observe that we also must have that $\hat{f}(\phi_{n_{2j-1}}) = 1$, $\hat{f}(\phi_{n_{2j}}) = -1$, because if $\hat{f}$ took a different value at one of these points, then $\hat{f}$ and $g$ would differ on an open set, and this has positive measure by (a).


Exercise 9.9. Let $f : X \to Y$ be a continuous function between topological spaces. Show that $f$ is also sequentially continuous, that is, if $x_n \to x$, then $f(x_n) \to f(x)$.

From this Exercise, we obtain that $\hat{f}(\phi_n) \to \hat{f}(\phi)$, but clearly this is not possible if these values alternate between 1 and $-1$. □

We now return to the general theory of $C^*$-algebras.

Theorem 9.12. Suppose that $A$ is a commutative $C^*$-algebra that is generated by one element $x \in A$. Then $\Delta \cong \sigma(x)$.

If $A$ is a (not necessarily commutative) $C^*$-algebra and $C \subset A$, then we define the $C^*$-algebra generated by $C$ to be the smallest $C^*$-subalgebra $B \subset A$ that contains $C$. It is very important to recall here that we are using the convention that subalgebras always contain the original unit $e \in A$. The following Exercise clarifies basic aspects of this definition:

Exercise 9.10. (a) Show that there always exists such a $C^*$-algebra $B \subset A$ by defining $B$ to be the intersection of all $C^*$-algebras $B'$ with $e \in B'$ and $C \subset B' \subset A$.
(b) Prove that $B$ has the following somewhat more explicit alternative description:
$$
B = \overline{\{ p(b_1, \ldots, b_M, b_1^*, \ldots, b_N^*) : p \text{ polynomial},\ b_j \in C \}}
$$
More precisely, the $p$’s are polynomials in non-commuting variables; these are, as usual, linear combinations of products of powers of the variables, but the order of the variables matters, and we need to work with all possible arrangements.

Back to the case under consideration: The hypothesis of Theorem 9.12 means that the only $C^*$-algebra $B \subset A$ with $e, x \in B$ is $B = A$. Equivalently, the polynomials $p(x, x^*) = \sum_{j,k=0}^{N} c_{jk} x^j (x^*)^k$ are dense in $A$; notice also that we don’t need to insist on non-commuting variables in $p$ here because $A$ is commutative. The conclusion of the Theorem states that $\Delta$ and $\sigma(x)$ (with the relative topology coming from $\mathbb{C}$) are homeomorphic.

Proof of Theorem 9.12. The Gelfand transform of $x$ provides the homeomorphism we are looking for: $\hat{x} : \Delta \to \sigma(x)$ is continuous and onto.
If $\hat{x}(\phi_1) = \hat{x}(\phi_2)$ or, equivalently, $\phi_1(x) = \phi_2(x)$, then also
$$
\phi_1(x^*) = \overline{\phi_1(x)} = \overline{\phi_2(x)} = \phi_2(x^*),
$$


and thus φ1 (p) = φ2 (p) for all polynomials in x, x∗ . Since these are dense in A by assumption and φ1 , φ2 are continuous, we conclude that φ1 (y) = φ2 (y) for all y ∈ A. So x b is also injective. Summing up: x b : ∆ → σ(x) is a continuous bijection between compact Hausdorff spaces. In this situation, the inverse is automatically continuous also, so we have our homeomorphism. To prove this last remark, we can argue as in Lemma 8.9 (or we could in fact use this result itself): Suppose A ⊂ ∆ is closed. Then A is compact, so x b(A) ⊂ σ(x) is compact, thus closed. We have shown that the inverse image of a closed set under x b−1 is closed, which is one of the characterizations of continuity.  Exercise 9.11. (a) Let B ⊂ A be the C ∗ -algebra that is generated by C ⊂ A. Show that B is commutative if and only if xy = yx,

xy ∗ = y ∗ x

for all x, y ∈ C. (b) Show that the C ∗ -algebra generated by x is commutative if and only if x is normal. Theorem 9.12 in particular shows that A ∼ = C(σ(x)) if the commutative C ∗ -algebra is generated by a single element. We can be a little more specific here: Theorem 9.13. Suppose that the commutative C ∗ -algebra A is generated by the single element x ∈ A. Then there exists a unique isometric ∗-isomorphism Ψ : C(σ(x)) → A with Ψ(id) = x. Here, id refers to the function id(z) = z (“identity”). Proof. Uniqueness is clear because x generates the algebra, so Ψ−1 is determined as soon as we know Ψ−1 (x). To prove existence, we can simply define Ψ−1 as the Gelfand transform, where we also identify ∆ with σ(x), as in Theorem 9.12. More precisely, let Ψ−1 (y) = yb◦b x−1 .  Exercise 9.12. If you have doubts about this definition of Ψ−1 , the following should be helpful: Let ϕ : K → L be a homeomorphism between compact Hausdorff spaces. Show that then Φ : C(L) → C(K), Φ(f ) = f ◦ ϕ is an isometric ∗-isomorphism between C ∗ -algebras. (“Change of variables on K preserves the C ∗ -algebra structure of C(K).”) We will use Theorem 9.13 to define f (x) := Ψ(f ), for f ∈ C(σ(x)) and x ∈ A as above. We interpret f (x) ∈ A as “f , applied to x”, as is already suggested by the notation. There is some logic to this terminology; indeed, if we move things over to the realization C(σ(x))


of $A$, then $f$ is applied to the variable (which corresponds to $x$) in a very literal sense. So we can talk about continuous functions of elements of $C^*$-algebras, at least in certain situations. We have just made our first acquaintance with the functional calculus.

It may appear that the previous results are rather limited in scope because we specifically seem to need commutative $C^*$-algebras that are generated by a single element. That, however, is not the case because we can often use these tools on smaller subalgebras of a given $C^*$-algebra. Here are some illustrations of this technique.

Definition 9.14. Let $A$ be a $C^*$-algebra. An element $x \in A$ is called positive (notation: $x \ge 0$) if $x = x^*$ and $\sigma(x) \subset [0, \infty)$.

Theorem 9.15. Let $A$ be a (not necessarily commutative) $C^*$-algebra.
(a) If $x = x^*$, then $\sigma(x) \subset \mathbb{R}$.
(b) If $x$ is normal, then $r(x) = \|x\|$.
(c) If $x, y \ge 0$, then $x + y \ge 0$.
(d) $xx^* \ge 0$ for all $x \in A$.

Proof. (a) Consider the $C^*$-algebra $B \subset A$ that is generated by $x$. Since $x$ is normal (even self-adjoint), $B$ is commutative by Exercise 9.11(b). So the Gelfand theory applies to $B$. In particular, $\sigma_B(x) = \{\phi(x) : \phi \in \Delta_B\}$, and this is a subset of $\mathbb{R}$, because $\phi(x) = \phi(x^*) = \overline{\phi(x)}$. Since $\sigma_A(x) \subset \sigma_B(x)$, this gives the claim.

(b) Consider again the commutative $C^*$-algebra $B \subset A$ that is generated by $x$. By the Gelfand theory (on $B$), $r_B(x) = \|x\|$, but, as observed earlier, in Chapter 7, the spectral radius formula shows that $r_A(x) = r_B(x)$.

(c) We will make use of the following simple transformation property of spectra, which follows directly from the definition:

Exercise 9.13. Show that if $c, d \in \mathbb{C}$, $x \in A$, then $\sigma(cx + de) = c\sigma(x) + d$; this second set is of course defined as the collection of numbers $cz + d$, with $z \in \sigma(x)$.

By hypothesis, $\sigma(x) \subset [0, \|x\|]$. By the Exercise, $\sigma(x - \|x\|e) \subset [-\|x\|, 0]$ also, and now (b) implies that $\|x - \|x\|e\| \le \|x\|$. Similarly, $\|y - \|y\|e\| \le \|y\|$. Thus
$$\|x + y - (\|x\| + \|y\|)e\| \le \|x\| + \|y\|,$$
and now a final application of the Exercise yields $\sigma(x + y) \subset [0, 2(\|x\| + \|y\|)]$.
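The last step of part (c) is rather compressed, so here is one way to spell it out. The abbreviation $c$ is introduced only for this sketch and is not part of the notes; the argument uses part (a), the general bound $r(\cdot) \le \|\cdot\|$, and Exercise 9.13.

```latex
% Sketch of the final step of Theorem 9.15(c).
% Assumption for this sketch only: c := \|x\| + \|y\|.
\begin{align*}
  &x + y - ce \text{ is self-adjoint, so } \sigma(x + y - ce) \subset \mathbb{R}
     \text{ by part (a);} \\
  &r(x + y - ce) \le \|x + y - ce\| \le c
     \;\Longrightarrow\; \sigma(x + y - ce) \subset [-c, c]; \\
  &\sigma(x + y) = \sigma(x + y - ce) + c \subset [0, 2c] \subset [0, \infty)
     \quad \text{(Exercise 9.13)}.
\end{align*}
```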


(d) Obviously, $y = xx^*$ is self-adjoint. We will again consider the commutative $C^*$-algebra $B \subset A$ that is generated by $y$. We know that $B \cong C(\Delta_B)$. The function $|\widehat{y}| - \widehat{y}$ is continuous, so there exists $z \in B$ so that $\widehat{z} = |\widehat{y}| - \widehat{y}$. Since $\widehat{z}$ is also real valued, this function is a self-adjoint element of $C(\Delta_B)$, so we also have that $z = z^*$. Let $w = zx$ and write $w = u + iv$, with $u, v$ self-adjoint. Then $ww^* = zxx^*z = zyz = z^2y$; in the last step, we used the fact that $y$ and $z$ both lie in the commutative algebra $B$. On the other hand,
$$ww^* = (u + iv)(u - iv) = u^2 + v^2 + i(vu - uv),$$
$$w^*w = (u - iv)(u + iv) = u^2 + v^2 + i(uv - vu),$$
so $w^*w = 2u^2 + 2v^2 - ww^* = 2u^2 + 2v^2 - z^2y$. We now claim that $u^2, v^2 \ge 0$. Since $u, v$ are self-adjoint, this can again be seen by investigating the ranges of the Gelfand transforms on suitable commutative subalgebras, as in the proof of part (a). Moreover, we also have that
$$-\widehat{z}^2\,\widehat{y} = -(|\widehat{y}| - \widehat{y})^2\,\widehat{y} = 2\widehat{y}^2(|\widehat{y}| - \widehat{y}) \ge 0, \qquad (9.1)$$
so $-z^2y \ge 0$. By part (c), $w^*w \ge 0$. Now Exercise 7.10 implies that $ww^* \ge 0$, and by Corollary 7.12(a), this also holds in the subalgebra $B$. But, as computed earlier, $ww^* = z^2y$, so by combining this with (9.1), we conclude that $\widehat{z}^2\,\widehat{y} \equiv 0$, so at all points of $\Delta_B$, either $\widehat{y} = 0$ or $\widehat{z} = 0$. In both cases, $\widehat{y} \ge 0$, so we obtain that $\sigma_A(y) \subset \sigma_B(y) \subset [0, \infty)$, as claimed. $\square$

Here's a very important and pleasing consequence of this material:

Theorem 9.16. Let $B$ be a $C^*$-algebra and let $A \subset B$ be a $C^*$-subalgebra. Then $\sigma_A(x) = \sigma_B(x)$ for all $x \in A$.

Proof. It is clear that $\sigma_A(x) \supset \sigma_B(x)$ (see also our discussion in Chapter 7), so it suffices to show that if $y \in A \cap G(B)$, then also $y \in G(A)$. Now if $y \in A \cap G(B)$, then $y^* \in A \cap G(B)$ and thus also $yy^* \in A \cap G(B)$. In particular, $0 \notin \sigma_B(yy^*)$. Theorem 9.15(d) now shows that $\sigma_B(yy^*) \subset (0, \infty)$. By Corollary 7.12(a), $\sigma_A(yy^*) = \sigma_B(yy^*)$. Hence $0 \notin \sigma_A(yy^*)$, so $(yy^*)^{-1} \in A$, and thus also $y^{-1} = y^*(yy^*)^{-1} \in A$. $\square$

We conclude this chapter with a short digression. Suppose that $xu = ux$. Does this imply that also $x^*u = ux^*$? For arbitrary $u$, this can only be true if $x$ is normal (take $u = x$). This condition is indeed sufficient, and in fact we can prove a more general result along these lines.

Theorem 9.17. Let $A$ be a $C^*$-algebra and let $x, y, u \in A$. Suppose that $x, y$ are normal and $xu = uy$. Then we also have that $x^*u = uy^*$.


Proof. We need some preparation. For $w \in A$, define $e^w := \sum_{n=0}^{\infty} \frac{1}{n!} w^n$. This series converges absolutely, and just as for the ordinary exponential function, one shows that $e^{v+w} = e^v e^w = e^w e^v$ if $vw = wv$. Involution is a continuous operation (it is in fact isometric), and this implies that $(e^w)^* = e^{w^*}$. When applied to $w = t - t^*$ (where $t \in A$ is arbitrary), these formulae show that
$$e^w (e^w)^* = e^w e^{w^*} = e^w e^{-w} = e^{w-w} = 1;$$
here we denote the unit element of $A$ by $1$ (rather than $e$, as usual), to avoid confusion with the base of the exponential function. It follows that $1 = \|e^w (e^w)^*\| = \|e^w\|^2$ or
$$\|e^{t-t^*}\| = 1 \quad \text{for all } t \in A. \qquad (9.2)$$
The assumption that $xu = uy$ can be used repeatedly, and we also obtain that $x^n u = u y^n$ for all $n \ge 0$. Multiplication is continuous, so this implies that $e^x u = u e^y$ or $u = e^{-x} u e^y$. We now multiply this identity by $e^{x^*}$ and $e^{-y^*}$ (from the left and right, respectively). Since $x, y$ are normal, this gives
$$e^{x^*} u e^{-y^*} = e^{x^* - x}\, u\, e^{y - y^*},$$
and now (9.2) shows that $\|e^{x^*} u e^{-y^*}\| \le \|u\|$. This whole argument can be repeated with $x, y$ replaced by $\overline{z}x, \overline{z}y$, with $z \in \mathbb{C}$, so it is also true that $\|f(z)\| \le \|u\|$, where $f(z) = e^{zx^*} u e^{-zy^*}$. For every $F \in A^*$, the new function $g(z) = F(f(z))$ is an entire function; the analyticity follows from the series representations of the exponential functions. Since $g$ is also bounded ($|g(z)| \le \|F\|\,\|u\|$), this function is constant by Liouville's theorem. Since this is true for every $F \in A^*$, $f$ itself has to be constant:
$$f(z) = e^{zx^*} u e^{-zy^*} = u = f(0),$$
or $e^{zx^*} u = u e^{zy^*}$ for all $z \in \mathbb{C}$. We obtain the claim by comparing the first order terms in the series expansions of both sides (more formally, subtract $u$, divide by $z$ and let $z \to 0$). $\square$

Exercise 9.14. Let $A$ be a commutative algebra with unit. True or false: (a) There exist at most one norm and one involution on $A$ so that $A$ becomes a $C^*$-algebra. (b) There exist a norm and an involution on $A$ so that $A$ becomes a $C^*$-algebra.
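As a numerical aside on identity (9.2) from the proof of Theorem 9.17: in the $C^*$-algebra of $2 \times 2$ matrices, $e^{t - t^*}$ should come out unitary, and hence of norm 1, even when $t$ itself is not normal. The following is only a minimal sketch under assumptions that are not in the notes: the helper names (`matmul`, `adjoint`, `mat_exp`), the particular matrix `t`, and the truncation of the exponential series at 40 terms are all choices made for illustration.

```python
import math

def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def adjoint(A):
    """Conjugate transpose, i.e. the involution on n x n matrices."""
    n = len(A)
    return [[complex(A[j][i]).conjugate() for j in range(n)] for i in range(n)]

def mat_exp(W, terms=40):
    """Partial sum of the exponential series sum_k W^k / k! (truncated)."""
    n = len(W)
    result = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]      # current term W^k / k!, starting at k = 0
    for k in range(1, terms):
        term = [[entry / k for entry in row] for row in matmul(term, W)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

# Hypothetical test element: t is not normal, but w = t - t* satisfies w* = -w.
t = [[0, 1], [0, 0]]
t_star = adjoint(t)
w = [[t[i][j] - t_star[i][j] for j in range(2)] for i in range(2)]

U = mat_exp(w)                      # approximates e^{t - t*}
UUstar = matmul(U, adjoint(U))      # should be (numerically) the identity
```

For this particular `t`, one has $w^2 = -1$, so the series sums to $\cos(1)\,1 + \sin(1)\,w$, a rotation matrix; this gives an independent check on the truncated series.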


Exercise 9.15. Let $A$ be a $C^*$-algebra and let $x, y$ be normal elements of $A$ that commute: $xy = yx$. Show that
$$\sigma(x + y) \subset \sigma(x) + \sigma(y) := \{w + z : w \in \sigma(x),\ z \in \sigma(y)\},$$
$$\sigma(xy) \subset \sigma(x)\sigma(y) := \{wz : w \in \sigma(x),\ z \in \sigma(y)\}.$$
Also show that both inclusions can fail if $x, y$ don't commute. Suggestion: Consider suitable $2 \times 2$-matrices for the counterexamples.

Exercise 9.16. Let $A$ be a $C^*$-algebra and let $x \in A$ be normal. Then we can define $f(x) \in A$, for $f \in C(\sigma(x))$, as follows: Consider the commutative $C^*$-algebra $B \subset A$ that is generated by $x$, and then use Theorem 9.16 and the original definition of $f(x) \in B$, which was based on Theorem 9.13. Prove the spectral mapping theorem: $\sigma(f(x)) = f(\sigma(x))$. Hint: This follows very quickly from Theorem 9.16 and the fact that the map $f \mapsto f(x)$ sets up an isometric $*$-isomorphism between $C(\sigma(x))$ and $B$. Just make sure you don't get confused.

Exercise 9.17. Consider the following subalgebra of $\mathbb{C}^{2 \times 2} = B(\mathbb{C}^2)$:
$$A = \left\{ y = \begin{pmatrix} a & b \\ b & a \end{pmatrix} : a, b \in \mathbb{C} \right\}$$
(a) Show that $A$ is a commutative $C^*$-algebra (with the structure inherited from $B(\mathbb{C}^2)$; in particular, $\begin{pmatrix} a & b \\ b & a \end{pmatrix}^* = \begin{pmatrix} \overline{a} & \overline{b} \\ \overline{b} & \overline{a} \end{pmatrix}$). Remark: Most of this is already clear because we know that the bigger algebra $B(\mathbb{C}^2)$ is a $C^*$-algebra.
(b) Show that $A$ is generated by $x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$.
(c) Show that $\Delta = \{\phi_1, \phi_2\}$, where $\phi_1(y) = a + b$, $\phi_2(y) = a - b$.
(d) Find $\sigma(x)$ and confirm the (here: obvious) fact that $\Delta \cong \sigma(x)$, as asserted by Theorem 9.12.
(e) Find $f(x) \in A$, for the functions $f(z) = |z|$ and $f(z) = \frac{1}{2}(|z| + z)$.


10. The Spectral Theorem

The big moment has arrived, and we are now ready to prove several versions of the spectral theorem for normal operators in Hilbert spaces. Throughout this chapter, it should be helpful to compare our results with the more familiar special case when the Hilbert space is finite-dimensional. In this setting, the spectral theorem says that every normal matrix $T \in \mathbb{C}^{n \times n}$ can be diagonalized by a unitary transformation. This can be rephrased as follows: There are numbers $z_j \in \mathbb{C}$ (the eigenvalues) and orthogonal projections $P_j \in B(\mathbb{C}^n)$ so that $T = \sum_{j=1}^{m} z_j P_j$. The subspaces $R(P_j)$ are orthogonal to each other. From this representation of $T$, it is then also clear that $P_j$ is the projection onto the eigenspace belonging to $z_j$.

In fact, we have already proved one version of the (general) spectral theorem: The Gelfand theory of the commutative $C^*$-algebra $A \subset B(H)$ that is generated by a normal operator $T \in B(H)$ provides a functional calculus: We can define $f(T)$, for $f \in C(\sigma(T))$, in such a way that the map $C(\sigma(T)) \to A$, $f \mapsto f(T)$ is an isometric $*$-isomorphism between $C^*$-algebras, and this is the spectral theorem in one of its many disguises! See Theorem 9.13 and the discussion that follows.

As a warm-up, let us use this material to give a quick proof of the result about normal matrices $T \in \mathbb{C}^{n \times n}$ that was stated above. Consider the $C^*$-algebra $A \subset \mathbb{C}^{n \times n}$ that is generated by $T$. Since $T$ is normal, $A$ is commutative. By Theorem 9.13, $A \cong C(\sigma(T)) = C(\{z_1, \ldots, z_m\})$, where $z_1, \ldots, z_m$ are the eigenvalues of $T$. We also use the fact that by Theorem 9.16, $\sigma_A(T) = \sigma_{B(H)}(T)$. All subsets of the discrete space $\{z_1, \ldots, z_m\}$ are open, and thus all functions $f : \{z_1, \ldots, z_m\} \to \mathbb{C}$ are continuous. We will make use of the functional calculus notation: $f(T) \in A$ will denote the operator that corresponds to the function $f$ under the isometric $*$-isomorphism that sends the identity function $\mathrm{id}(z) = z$ to $T \in A$.
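In this finite-dimensional setting, $f(T)$ can be computed by hand, because every function on the finite set $\{z_1, \ldots, z_m\}$ agrees there with a polynomial (Lagrange interpolation). The sketch below takes this route for one small normal matrix. It is an illustration under assumptions that are not in the notes: the helper names, the sample matrix `T` (a rotation by 90 degrees, with eigenvalues $\pm i$), and the use of interpolation polynomials to realize indicator functions are choices made here, and the list of eigenvalues must be supplied and must consist of distinct numbers.

```python
def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[complex(r == c) for c in range(n)] for r in range(n)]

def adjoint(A):
    """Conjugate transpose of a square matrix."""
    n = len(A)
    return [[complex(A[c][r]).conjugate() for c in range(n)] for r in range(n)]

def spectral_projection(T, eigenvalues, j):
    """f_j(T) for the Lagrange polynomial f_j with f_j(z_j) = 1 and f_j(z_k) = 0, k != j.

    Assumes `eigenvalues` lists the distinct eigenvalues of the normal matrix T.
    """
    n = len(T)
    zj = eigenvalues[j]
    P = identity(n)
    for k, zk in enumerate(eigenvalues):
        if k != j:
            # factor = (T - z_k * 1) / (z_j - z_k)
            factor = [[(T[r][c] - zk * (r == c)) / (zj - zk) for c in range(n)]
                      for r in range(n)]
            P = matmul(P, factor)
    return P

def apply_function(f, T, eigenvalues):
    """f(T) := sum_j f(z_j) P_j, where P_j are the projections computed above."""
    n = len(T)
    result = [[0j] * n for _ in range(n)]
    for j, zj in enumerate(eigenvalues):
        P = spectral_projection(T, eigenvalues, j)
        result = [[result[r][c] + f(zj) * P[r][c] for c in range(n)]
                  for r in range(n)]
    return result

def close(A, B, tol=1e-12):
    """Entrywise comparison up to a small tolerance."""
    n = len(A)
    return all(abs(A[r][c] - B[r][c]) <= tol for r in range(n) for c in range(n))

# Hypothetical example: a normal (in fact unitary) matrix with eigenvalues i and -i.
T = [[0, -1], [1, 0]]
zs = [1j, -1j]
P0 = spectral_projection(T, zs, 0)
P1 = spectral_projection(T, zs, 1)
```

With these definitions, `apply_function(lambda z: z, T, zs)` should reproduce `T`, and `apply_function(abs, T, zs)` should give the identity matrix, since $|z| = 1$ on the spectrum of this particular `T`.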
Write $f_j = \chi_{\{z_j\}}$ and let $P_j = f_j(T)$. Since $\overline{f_j} = f_j$ and $f_j^2 = f_j$, we also have that $P_j^* = P_j$ and $P_j^2 = P_j$, so each $P_j$ is an orthogonal projection by Theorem 6.5. Furthermore, $f_j f_k = 0$ if $j \ne k$, so $P_j P_k = 0$, and thus $\langle P_j x, P_k y \rangle = \langle x, P_j P_k y \rangle = 0$ for all $x, y \in H$ if $j \ne k$. This says that $R(P_j) \perp R(P_k)$ for $j \ne k$. Also, $P_1 + \ldots + P_m = 1$ because we have the same identity for the $f_j$'s. It follows that $\bigoplus_{j=1}^{m} R(P_j) = H = \mathbb{C}^n$. Finally, since $\mathrm{id} = \sum_{j=1}^{m} z_j f_j$, we obtain the representation $T = \sum_{j=1}^{m} z_j P_j$, as asserted.

On infinite-dimensional Hilbert spaces, we have a continuous analog of this representation: every normal $T \in B(H)$ can be written as


$T = \int z \, dP(z)$. We first need to address the question of how such an integral can be meaningfully defined. We will also switch to the more common symbol $E$ (rather than $P$) for these "measures" (if that's what they are).

Definition 10.1. Let $\mathcal{M}$ be a $\sigma$-algebra on a set $\Omega$, and let $H$ be a Hilbert space. A resolution of the identity (or spectral resolution) on $(\Omega, \mathcal{M})$ is a map $E : \mathcal{M} \to B(H)$ with the following properties:
(1) Every $E(\omega)$ ($\omega \in \mathcal{M}$) is a projection;
(2) $E(\emptyset) = 0$, $E(\Omega) = 1$;
(3) $E(\omega_1 \cap \omega_2) = E(\omega_1)E(\omega_2)$ ($\omega_1, \omega_2 \in \mathcal{M}$);
(4) For all $x, y \in H$, the set function $\mu_{x,y}(\omega) = \langle x, E(\omega)y \rangle$ is a complex measure on $(\Omega, \mathcal{M})$.
If $\Omega$ is a locally compact Hausdorff space and $\mathcal{M} = \mathcal{B}$ is the Borel $\sigma$-algebra, then we also demand that every $\mu_{x,y}$ is a regular (Borel) measure.

We can think of $E$ as a projection valued measure (of sorts) on $(\Omega, \mathcal{M})$: the "measure" $E(\omega)$ of a set $\omega \in \mathcal{M}$ is a projection. The $E(\omega)$ are also called spectral projections.

Let's start out with some quick observations. For every $x \in H$, we have that
$$\mu_{x,x}(\omega) = \langle x, E(\omega)x \rangle = \langle x, E(\omega)^2 x \rangle = \langle E(\omega)x, E(\omega)x \rangle = \|E(\omega)x\|^2,$$
so $\mu_{x,x}$ is a finite positive measure with $\mu_{x,x}(\Omega) = \|x\|^2$. Property (3) implies that any two spectral projections $E(\omega_1), E(\omega_2)$ commute. Moreover, if $\omega_1 \subset \omega_2$, then $R(E(\omega_1)) \subset R(E(\omega_2))$. If $\omega_1 \cap \omega_2 = \emptyset$, then $R(E(\omega_1)) \perp R(E(\omega_2))$, as the following calculation shows:
$$\langle E(\omega_1)x, E(\omega_2)y \rangle = \langle x, E(\omega_1)E(\omega_2)y \rangle = \langle x, E(\omega_1 \cap \omega_2)y \rangle = 0$$
for arbitrary $x, y \in H$.

$E$ is finitely additive: If $\omega_1, \ldots, \omega_n \in \mathcal{M}$ are disjoint sets, then $E\left(\bigcup_{j=1}^{n} \omega_j\right) = \sum_{j=1}^{n} E(\omega_j)$. To prove this, notice that (4) implies that
$$\left\langle x, E\Big(\bigcup_{j=1}^{n} \omega_j\Big) y \right\rangle = \mu_{x,y}\Big(\bigcup_{j=1}^{n} \omega_j\Big) = \sum_{j=1}^{n} \mu_{x,y}(\omega_j) = \sum_{j=1}^{n} \langle x, E(\omega_j)y \rangle = \left\langle x, \sum_{j=1}^{n} E(\omega_j)y \right\rangle$$
for arbitrary $x, y \in H$, and this gives the claim.

Is $E$ also $\sigma$-additive (as it ought to be, if we are serious about interpreting $E$ as a new sort of measure)? In other words, if $\omega_n \in \mathcal{M}$ are


disjoint sets, does it follow that
$$E\Big(\bigcup_{n \in \mathbb{N}} \omega_n\Big) = \sum_{n=1}^{\infty} E(\omega_n)? \qquad (10.1)$$
The answer to this question depends on how one defines the right-hand side of (10.1). We observe that if $E(\omega_n) \ne 0$ for infinitely many $n$, then this series can never be convergent in operator norm. Indeed, $\|E(\omega_n)\| = 1$ if $E(\omega_n) \ne 0$, and thus the partial sums do not form a Cauchy sequence. However, (10.1) will hold if we are satisfied with strong operator convergence: We say that $T_n \in B(H)$ converges strongly to $T \in B(H)$ (notation: $T_n \xrightarrow{s} T$) if $T_n x \to Tx$ for all $x \in H$.

To prove that (10.1) holds in this interpretation, fix $x \in H$ and use the fact that the $E(\omega_n)x$ form an orthogonal system (because the ranges of the projections are orthogonal subspaces for disjoint sets). We normalize the non-zero vectors: let $y_n = \frac{E(\omega_n)x}{\|E(\omega_n)x\|}$ if $E(\omega_n)x \ne 0$. Then the $y_n$ form an ONS, and thus, by Theorem 5.15, the series $\sum \langle y_n, x \rangle y_n = \sum E(\omega_n)x$ converges. Now if $y \in H$ is arbitrary, then the continuity of the scalar product and the fact that $\mu_{x,y}$ is a complex measure give that
$$\left\langle y, \sum_{n=1}^{\infty} E(\omega_n)x \right\rangle = \sum_{n=1}^{\infty} \langle y, E(\omega_n)x \rangle = \left\langle y, E\Big(\bigcup_{n \in \mathbb{N}} \omega_n\Big) x \right\rangle.$$
Since this holds for every $y \in H$, it follows that $\sum_{n=1}^{\infty} E(\omega_n)x = E\left(\bigcup_{n \in \mathbb{N}} \omega_n\right)x$, and this is (10.1), with the series interpreted as a strong operator limit.

Definition 10.2. A set $N \in \mathcal{M}$ with $E(N) = 0$ is called an $E$-null set. We define $L^\infty(\Omega, E)$ as the set of equivalence classes of measurable, essentially bounded functions $f : \Omega \to \mathbb{C}$. Here, $f \sim g$ if $f$ and $g$ agree off an $E$-null set. Also, as usual, we say that $f$ is essentially bounded if $|f(x)| \le M$ ($x \in \Omega \setminus N$) for some $M \ge 0$ and some $E$-null set $N \subset \Omega$.

Exercise 10.1. Prove that a countable union of $E$-null sets is an $E$-null set.

Recall that for an arbitrary positive measure $\mu$ on $X$, the space $L^\infty(X, \mu)$ only depends on what the $\mu$-null sets are and not on the specific choice of the measure $\mu$. For this reason and because of Exercise 10.1, we can also, and in fact without any difficulties, introduce $L^\infty$ spaces that are based on resolutions of the identity. These spaces have the same basic properties: $L^\infty(\Omega, E)$ with the essential supremum of $|f|$ as the norm and the involution $f^*(x) = \overline{f(x)}$ is a commutative


$C^*$-algebra. The spectrum of a function $f \in L^\infty(\Omega, E)$ is its essential range.

Exercise 10.2. Write down precise definitions of the essential supremum and the essential range of a function $f \in L^\infty(\Omega, E)$.

We would like to define an integral $\int_\Omega f(t)\, dE(t)$ for $f \in L^\infty(\Omega, E)$. This integral should be an operator from $B(H)$, and it also seems reasonable to demand that
$$\left\langle x, \Big(\int_\Omega f(t)\, dE(t)\Big) y \right\rangle = \int_\Omega f(t)\, d\mu_{x,y}(t)$$
for all $x, y \in H$. It is clear that this condition already suffices to uniquely determine $\int_\Omega f(t)\, dE(t)$, should such an operator indeed exist. As for existence, we have the following result; it will actually turn out that the integral with respect to a resolution of the identity has many other desirable properties, too.

Theorem 10.3. Let $E$ be a resolution of the identity. Then there exists a unique map $\Psi : L^\infty(\Omega, E) \to A$ onto a $C^*$-subalgebra $A \subset B(H)$, so that
$$\langle x, \Psi(f)y \rangle = \int_\Omega f(t)\, d\mu_{x,y}(t) \qquad (10.2)$$
for all $f \in L^\infty(\Omega, E)$, $x, y \in H$. Moreover, $\Psi$ is an isometric $*$-isomorphism from $L^\infty(\Omega, E)$ onto $A$, and
$$\|\Psi(f)x\|^2 = \int_\Omega |f(t)|^2\, d\mu_{x,x}(t). \qquad (10.3)$$

So we can (and will) define $\int_\Omega f(t)\, dE(t) := \Psi(f)$. Let us list the properties of the integral that are guaranteed by Theorem 10.3 one more time, using this new notation:
$$\int (f + g)\, dE = \int f\, dE + \int g\, dE, \qquad \int (cf)\, dE = c \int f\, dE,$$
$$\int fg\, dE = \int f\, dE \int g\, dE,$$
$$\int \overline{f}\, dE = \Big(\int f\, dE\Big)^*, \qquad \Big\|\int f\, dE\Big\| = \|f\|_\infty.$$
The multiplicativity of the integral (see the second line) may seem a bit strange at first, but it becomes plausible again if we recall that all $E(\omega)$ are projections.
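To make this plausibility remark concrete, here is the multiplicativity checked by hand for two characteristic functions. This is only a sketch of the simplest case, not a replacement for the proof; it uses that $\int \chi_\omega\, dE = E(\omega)$, which follows from (10.2) because $\int \chi_\omega\, d\mu_{x,y} = \mu_{x,y}(\omega) = \langle x, E(\omega)y \rangle$, together with property (3) of Definition 10.1.

```latex
% Multiplicativity of the integral for f = \chi_{\omega_1}, g = \chi_{\omega_2}.
\begin{align*}
  \int \chi_{\omega_1}\, dE \int \chi_{\omega_2}\, dE
    &= E(\omega_1)\, E(\omega_2)
       && \text{since } \textstyle\int \chi_\omega\, dE = E(\omega) \\
    &= E(\omega_1 \cap \omega_2)
       && \text{by property (3) of Definition 10.1} \\
    &= \int \chi_{\omega_1 \cap \omega_2}\, dE
     = \int \chi_{\omega_1} \chi_{\omega_2}\, dE .
\end{align*}
```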


Exercise 10.3. Show that $\left(\sum_{j=1}^{m} f_j P_j\right)\left(\sum_{j=1}^{m} g_j P_j\right) = \sum_{j=1}^{m} f_j g_j P_j$ if the $P_j$ are projections with orthogonal ranges, as at the beginning of this chapter, and $f_j, g_j \in \mathbb{C}$.

Proof of Theorem 10.3. This is not a particularly short proof, but it follows a standard pattern. First of all, we certainly know how we want to define $\int f\, dE$ for simple functions $f \in L^\infty(\Omega, E)$, that is, functions of the form $f = \sum_{j=1}^{n} c_j \chi_{\omega_j}$ with $c_j \in \mathbb{C}$ and $\omega_j \in \mathcal{M}$. For such an $f$, put $\Psi(f) = \sum_{j=1}^{n} c_j E(\omega_j)$. For $x, y \in H$, we then obtain that
$$\langle x, \Psi(f)y \rangle = \sum_{j=1}^{n} c_j \langle x, E(\omega_j)y \rangle = \sum_{j=1}^{n} c_j \mu_{x,y}(\omega_j) = \int_\Omega f(t)\, d\mu_{x,y}(t).$$
This is (10.2) for simple functions $f$, and this identity also confirms that $\Psi(f)$ was indeed well defined ($\Psi(f)$ is determined by the function $f$, and it is independent of the particular representation of $f$ that was chosen to form $\Psi(f)$).

We also have that $\Psi(f)^* = \sum \overline{c_j} E(\omega_j) = \Psi(\overline{f})$, and if $g = \sum_{k=1}^{m} d_k \chi_{\omega_k'}$ is a second simple function, then
$$\Psi(f)\Psi(g) = \sum_{j,k} c_j d_k E(\omega_j)E(\omega_k') = \sum_{j,k} c_j d_k E(\omega_j \cap \omega_k') = \Psi(fg).$$
For the last equality, we use the fact that $fg$ is another simple function, with representation $fg = \sum_{j,k} c_j d_k \chi_{\omega_j \cap \omega_k'}$. Similar arguments show that $\Psi$ is linear (on simple functions). Finally, (10.3) (for simple functions) follows from the identity $\Psi(f)^*\Psi(f) = \Psi(\overline{f})\Psi(f) = \Psi(|f|^2)$:
$$\|\Psi(f)x\|^2 = \langle x, \Psi(f)^*\Psi(f)x \rangle = \langle x, \Psi(|f|^2)x \rangle = \int_\Omega |f(t)|^2\, d\mu_{x,x}(t).$$
This also implies that $\|\Psi(f)x\|^2 \le \|f\|_\infty^2 \|x\|^2$, so $\|\Psi(f)\| \le \|f\|$. On the other hand, the sets $\omega_j$ in the representation $f = \sum c_j \chi_{\omega_j}$ can be taken to be disjoint (just take $\omega_j = f^{-1}(\{c_j\})$). Now if $E(\omega_j) \ne 0$, then there exists $x \in R(E(\omega_j))$, $x \ne 0$. Clearly, $\Psi(f)x = c_j x$, and since $\|f\|_\infty = \max_{j : E(\omega_j) \ne 0} |c_j|$, we now see that $\|\Psi(f)\| = \|f\|$. So $\Psi$ is isometric (on simple functions).

We now want to extend these results to arbitrary functions $f \in L^\infty(\Omega, E)$ by using an approximation procedure.

Exercise 10.4. Let $f \in L^\infty(\Omega, E)$. Show that there exists a sequence of simple functions $f_n \in L^\infty(\Omega, E)$ so that $\|f_n - f\| \to 0$.

Let $f \in L^\infty(\Omega, E)$ and pick an approximating sequence $f_n$ of simple functions, as in Exercise 10.4. Notice that $\Psi(f_n)$ converges in $B(H)$:


indeed, $\|\Psi(f_m) - \Psi(f_n)\| = \|\Psi(f_m - f_n)\| = \|f_m - f_n\|$, so this is a Cauchy sequence. The same argument shows that the limit is independent of the specific choice of the approximating sequence, so we can define $\Psi(f) := \lim \Psi(f_n)$. The continuity of the scalar product gives
$$\langle x, \Psi(f)y \rangle = \lim_{n \to \infty} \langle x, \Psi(f_n)y \rangle = \lim_{n \to \infty} \int_\Omega f_n(t)\, d\mu_{x,y}(t).$$
Every $E$-null set is a $|\mu_{x,y}|$-null set, so $f_n$ converges $\mu_{x,y}$-almost everywhere to $f$. Moreover, $|f_n| \le \|f_n\|_\infty \le C$ off an $E$-null set, so again $\mu_{x,y}$-almost everywhere. The constant function $C$ lies in $L^1(\Omega, d|\mu_{x,y}|)$ because $|\mu_{x,y}|$ is a finite measure. We have just verified the hypotheses of the Dominated Convergence Theorem. It follows that $\lim_{n \to \infty} \int_\Omega f_n\, d\mu_{x,y} = \int_\Omega f\, d\mu_{x,y}$, and we obtain (10.2) (for arbitrary $f \in L^\infty(\Omega, E)$).

Exercise 10.5. Establish (10.3) in a similar way.

The remaining properties follow easily by passing to limits. For example, if $f, g \in L^\infty$, pick approximating simple functions $f_n, g_n$ and use the continuity of the multiplication to deduce that
$$\Psi(f)\Psi(g) = \lim \Psi(f_n) \lim \Psi(g_n) = \lim \Psi(f_n)\Psi(g_n) = \lim \Psi(f_n g_n) = \Psi(fg).$$
In the last step, we use the fact that $f_n g_n$ is a sequence of simple functions that converges to $fg$ in the norm of $L^\infty(\Omega, E)$.

Exercise 10.6. Prove at least two more properties of $\Psi$ ($\Psi$ linear, isometric, $\Psi(f)^* = \Psi(\overline{f})$) in this way.

Finally, since $\Psi$ is an isometry, its image $A = \Psi(L^\infty(\Omega, E))$ is closed (compare the proof of Proposition 4.3), and it is also a subalgebra that is closed under the involution $*$ because $\Psi$ is a $*$-homomorphism. $\square$

We now have the tools to prove the next version of the Spectral Theorem (the first version being the existence of a functional calculus for normal operators). We actually obtain a more abstract version for a whole algebra of operators from our machinery; we discuss this first and then specialize to a single operator later on, in Theorem 10.5.

Theorem 10.4. Suppose $A \subset B(H)$ is a commutative $C^*$-subalgebra of $B(H)$. Let $\Delta$ be its maximal ideal space.


(a) There exists a unique resolution of the identity $E$ on the Borel sets of $\Delta$ (with its Gelfand topology) so that
$$T = \int_\Delta \widehat{T}(t)\, dE(t) \qquad (10.4)$$
for all $T \in A$.
Moreover, $E$ has the following additional properties:
(b) $B = \left\{ \int_\Delta f(t)\, dE(t) : f \in L^\infty(\Delta, E) \right\}$ is a commutative $C^*$-algebra satisfying $A \subset B \subset B(H)$.
(c) The finite linear combinations of the $E(\omega)$, $\omega \in \mathcal{M}$ (the Borel sets of $\Delta$), are dense in $B$.
(d) If $\omega \subset \Delta$ is a non-empty open set, then $E(\omega) \ne 0$.

Proof. By the Gelfand-Naimark Theorem, $A \cong C(\Delta)$. We will now use the Riesz Representation Theorem: $C(\Delta)^* = M(\Delta)$, the space of regular complex Borel measures on $\Delta$. See Example 4.2. The uniqueness of $E$ follows immediately from this: If $E$ satisfies (10.4), then $\int_\Delta \widehat{T}(t)\, d\mu_{x,y}(t) = \langle x, Ty \rangle$, and every continuous function on $\Delta$ is of the form $\widehat{T}$ for some $T \in A$, so the functionals (on $C(\Delta)$) associated with the measures $\mu_{x,y}$ and thus also the measures themselves are already determined by (10.4). Since $x, y \in H$ are arbitrary here, $E$ itself is determined by (10.4).

To prove existence of $E$, we fix $x, y \in H$ and consider the map $C(\Delta) \to \mathbb{C}$, $\widehat{T} \mapsto \langle x, Ty \rangle$. Since the inverse of the Gelfand transform, $\widehat{T} \mapsto T$, is linear, this map is linear, too, and also bounded, as we see from
$$|\langle x, Ty \rangle| \le \|x\|\,\|Ty\| \le \|x\|\,\|T\|\,\|y\| = \|x\|\,\|y\|\,\|\widehat{T}\|_\infty.$$
By the Riesz Representation Theorem, there is a regular complex Borel measure on $\Delta$ (call it $\mu_{x,y}$) so that
$$\langle x, Ty \rangle = \int_\Delta \widehat{T}(t)\, d\mu_{x,y}(t) \qquad (10.5)$$
for all $T \in A$. Our goal is to construct a resolution of the identity $E$ for which $\langle x, E(\omega)y \rangle = \mu_{x,y}(\omega)$. That will finish the proof of part (a).

As a function of $x, y$, $\langle x, Ty \rangle$ is sesquilinear. From this, it follows that $(x, y) \mapsto \mu_{x,y}$ is sesquilinear, too. This means that $\mu_{x+y,z} = \mu_{x,z} + \mu_{y,z}$, $\mu_{cx,y} = \overline{c}\,\mu_{x,y}$, and $\mu_{x,y}$ is linear in $y$.

Exercise 10.7. Prove this claim.

If now $f : \Delta \to \mathbb{C}$ is a bounded measurable function, then $(x, y) \mapsto \int_\Delta f(t)\, d\mu_{x,y}(t)$ defines another sesquilinear form. In fact, this form is


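bounded in the sense that
$$\left| \int_\Delta f(t)\, d\mu_{x,y}(t) \right| \le \sup_{t \in \Delta} |f(t)|\; |\mu_{x,y}|(\Delta) \le \sup_{t \in \Delta} |f(t)|\; \|x\|\,\|y\|.$$
By Exercise 6.10, there is a unique operator $\Phi(f) \in B(H)$, so that
$$\langle x, \Phi(f)y \rangle = \int_\Delta f(t)\, d\mu_{x,y}(t)$$
for all $x, y \in H$. If $f \in C(\Delta)$ here, then a comparison with (10.5) shows that $\Phi(f) = T$, where $T \in A$ is the unique operator with $\widehat{T} = f$. Now
$$\int_\Delta \widehat{T^*}\, d\mu_{x,y} = \langle x, T^*y \rangle = \overline{\langle y, Tx \rangle} = \overline{\int_\Delta \widehat{T}\, d\mu_{y,x}} = \int_\Delta \overline{\widehat{T}}\, d\overline{\mu_{y,x}},$$
and, since $\widehat{T^*} = \overline{\widehat{T}}$, this holds for all functions $\widehat{T} \in C(\Delta)$, so we conclude that $\mu_{x,y} = \overline{\mu_{y,x}}$, where, as expected, the measure $\overline{\nu}$ is defined by $\overline{\nu}(\omega) = \overline{\nu(\omega)}$. But then we can use this for integrals of arbitrary bounded Borel functions $f$:
$$\langle x, \Phi(\overline{f})y \rangle = \int_\Delta \overline{f}\, d\mu_{x,y} = \overline{\int_\Delta f\, d\mu_{y,x}} = \overline{\langle y, \Phi(f)x \rangle} = \langle \Phi(f)x, y \rangle,$$
so $\Phi(\overline{f}) = \Phi(f)^*$. Next, for $S, T \in A$, we have that
$$\int_\Delta \widehat{S}\,\widehat{T}\, d\mu_{x,y} = \int_\Delta (ST)^{\wedge}\, d\mu_{x,y} = \langle x, STy \rangle = \int_\Delta \widehat{S}\, d\mu_{x,Ty},$$
so $\widehat{T}\, d\mu_{x,y} = d\mu_{x,Ty}$. Again, we can apply this to integrals of arbitrary bounded Borel functions $f$: $\int f\,\widehat{T}\, d\mu_{x,y} = \int f\, d\mu_{x,Ty}$, and this implies that
$$\int_\Delta f\,\widehat{T}\, d\mu_{x,y} = \langle x, \Phi(f)Ty \rangle = \langle \Phi(f)^*x, Ty \rangle = \int_\Delta \widehat{T}\, d\mu_{\Phi(f)^*x,\,y}.$$
Since $\widehat{T} \in C(\Delta)$ is arbitrary here, this says that $f\, d\mu_{x,y} = d\mu_{\Phi(f)^*x,\,y}$, so $\int fg\, d\mu_{x,y} = \int g\, d\mu_{\Phi(f)^*x,\,y}$ for all bounded Borel functions $g$. Now $\int fg\, d\mu_{x,y} = \langle x, \Phi(fg)y \rangle$ and
$$\int_\Delta g\, d\mu_{\Phi(f)^*x,\,y} = \langle \Phi(f)^*x, \Phi(g)y \rangle = \langle x, \Phi(f)\Phi(g)y \rangle,$$
so we finally obtain the desired conclusion that $\Phi(fg) = \Phi(f)\Phi(g)$.

We can now define $E(\omega) = \Phi(\chi_\omega)$. I claim that $E$ is a resolution of the identity. Clearly, by construction,
$$\langle x, E(\omega)y \rangle = \langle x, \Phi(\chi_\omega)y \rangle = \int_\Delta \chi_\omega\, d\mu_{x,y} = \mu_{x,y}(\omega),$$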


as required. This also verifies (4) from Definition 10.1. It remains to check the conditions (1)-(3). Notice that $E(\omega)^* = \Phi(\chi_\omega)^* = \Phi(\overline{\chi_\omega}) = \Phi(\chi_\omega) = E(\omega)$, so $E(\omega)$ is self-adjoint. Similarly, $E(\omega)^2 = \Phi(\chi_\omega)^2 = \Phi(\chi_\omega^2) = \Phi(\chi_\omega) = E(\omega)$. By Theorem 6.5, $E(\omega)$ is a projection, so (1) holds. A similar computation lets us verify (3). Finally, moving on to (2), it is clear that $E(\emptyset) = \Phi(0) = 0$, and $E(\Delta) = \Phi(1)$. Now the constant function $1$ is continuous, so, as observed above, $\Phi(1)$ is the operator whose Gelfand transform is identically equal to one, but this is the identity operator $1 \in A \subset B(H)$ (the multiplicative unit of $A$ and $B(H)$). So $E(\Delta) = 1$, as desired.

(b) We know from Theorem 10.3 that $B$ is a $C^*$-subalgebra of $B(H)$, and since continuous functions are in $L^\infty(\Delta, E)$, we clearly have that $B \supset A$.

(c) This is immediate from the way the integral $\int f\, dE$ was constructed, in the proof of Theorem 10.3.

(d) Let $\omega \subset \Delta$ be a non-empty open set. Pick $t_0 \in \omega$ and use Urysohn's Lemma to find a continuous function $f$ with $f(t_0) = 1$, $f = 0$ on $\omega^c$. Then $f = \widehat{T}$ for some $T \in A$, and if we had $E(\omega) = 0$, then, since $\widehat{T}$ vanishes off $\omega$, $T = \int_\Delta \widehat{T}\, dE = 0$, but this is impossible because $\widehat{T} = f$ is not the zero function. $\square$

We now specialize to (algebras generated by) a single normal operator.

Theorem 10.5 (The Spectral Theorem for normal operators). Let $T \in B(H)$ be a normal operator. Then there exists a unique resolution of the identity $E$ on the Borel sets of $\sigma(T)$ so that
$$T = \int_{\sigma(T)} z\, dE(z). \qquad (10.6)$$

Proof. Consider, as usual, the commutative $C^*$-algebra $A \subset B(H)$ that is generated by $T$. Existence of $E$ now follows from Theorem 10.4(a) because we can make the following identifications: By Theorems 9.12 and 9.13, $\Delta_A$ is homeomorphic to $\sigma(T)$, and $A \cong C(\sigma(T))$. Here we may interpret $\sigma(T)$ as $\sigma_{B(H)}(T)$ because $\sigma_A(T)$ is the same set by Theorem 9.16.

From a formal point of view, perhaps the most satisfactory argument runs as follows: Reexamine the proof of Theorem 10.4 to confirm that we obtain the representation $T = \int_K f\, dE$ as soon as we have an isometric $*$-isomorphism between $A$ and $C(K)$ that sends $T$ to $f$ (it is not essential that this isomorphism is specifically the Gelfand transform). In the case at hand, $A \cong C(\sigma(T))$, by Theorem 9.13, and the corresponding isomorphism sends $T$ to $\mathrm{id}(z) = z$, so we


obtain (10.6). For later use, we also record that, by the same argument, $f(T) = \int_{\sigma(T)} f(z)\, dE(z)$ for all $f \in C(\sigma(T))$, where $f(T) \in A$ is defined as in Chapter 9; see especially the discussion following Theorem 9.13.

Let us now prove uniqueness of $E$. By Theorem 10.3, if (10.6) holds, then also $p(T, T^*) = \int_{\sigma(T)} p(z, \overline{z})\, dE(z)$ for all polynomials $p$ in two variables. When viewed as functions of $z$ only, this set
$$\{f : \sigma(T) \to \mathbb{C} : f(z) = p(z, \overline{z}),\ p \text{ polynomial in two variables}\}$$
satisfies the hypotheses of the Stone-Weierstraß Theorem. Therefore, if $f \in C(\sigma(T))$ is arbitrary, there are polynomials $p_n$ so that $\|f(z) - p_n(z, \overline{z})\|_\infty \to 0$. Alternatively, this conclusion can also be obtained from the fact that $T$ generates $A$, so $\{p(T, T^*)\}$ is dense in $A$, and we can then move things over to $C(\sigma(T))$. The Dominated Convergence Theorem shows that
$$\int_{\sigma(T)} f(z)\, d\mu_{x,y}(z) = \lim_{n \to \infty} \int_{\sigma(T)} p_n(z, \overline{z})\, d\mu_{x,y}(z) = \lim_{n \to \infty} \langle x, p_n(T, T^*)y \rangle.$$
So the measures $\mu_{x,y}$ and thus also $E$ itself are uniquely determined. $\square$

This proof has also established the following fact, which we state again because it will prove useful in the sequel:

Proposition 10.6. If $E$ is the spectral resolution of $T \in B(H)$, as in the Spectral Theorem, then $E(U \cap \sigma(T)) \ne 0$ for all open sets $U \subset \mathbb{C}$ with $U \cap \sigma(T) \ne \emptyset$.

This follows from Theorem 10.4(d) and our identification of $\Delta_A$ with $\sigma(T)$.

We introduce some new notation. It will occasionally be convenient to write $d\langle x, E(z)y \rangle$ for the measure $d\mu_{x,y}(z)$. Similarly, $d\langle x, E(z)x \rangle$ and $d\|E(z)x\|^2$ both refer to the measure $d\mu_{x,x}(z)$. This notation is reasonable because $\langle x, E(\omega)x \rangle = \|E(\omega)x\|^2$.

We can now also extend the functional calculus from Chapter 9. More precisely, for a normal $T \in B(H)$ and $f \in L^\infty(\sigma(T), E)$, where $E$ is the resolution of the identity of $T$, as in the Spectral Theorem, let
$$f(T) := \int_{\sigma(T)} f(z)\, dE(z). \qquad (10.7)$$
As observed above, in the proof of Theorem 10.5, this is consistent with our earlier definition of $f(T)$ for $f \in C(\sigma(T))$ from Chapter 9. By Theorem 10.3, the functional calculus $f \mapsto f(T)$ is an isometric $*$-isomorphism between $L^\infty(\sigma(T), E)$ and a subalgebra of $B(H)$. Note also that if $p(z)$ is a polynomial, $p(z) = \sum_{j=0}^{n} c_j z^j$, then $p(T)$ could have been defined directly as $p(T) = \sum_{j=0}^{n} c_j T^j$, and the functional


calculus gives the same result. A similar remark applies to functions of the form $p(z, \overline{z})$.

We state the basic properties of the functional calculus one more time:
$$(cf + dg)(T) = cf(T) + dg(T), \qquad (fg)(T) = f(T)g(T) = g(T)f(T),$$
$$f(T)^* = \overline{f}(T), \qquad \|f(T)\| = \|f\|_\infty, \qquad \|f(T)x\|^2 = \int_{\sigma(T)} |f(z)|^2\, d\|E(z)x\|^2.$$
Moreover, if $f$ is continuous, then we have the spectral mapping theorem: $\sigma(f(T)) = f(\sigma(T))$. This was discussed in Exercise 9.16.

We want to prove still another version of the Spectral Theorem. This last version will be an analog of the statement: a normal matrix can be diagonalized by a unitary transformation. We will need sums of Hilbert spaces to formulate this result, so we discuss this topic first.

If $H_1, \ldots, H_n$ are Hilbert spaces, then we can construct a new Hilbert space $H = \bigoplus_{j=1}^{n} H_j$, as follows: As a vector space, $H$ is the sum of the vector spaces $H_j$, and if $x, y \in H$, say $x = (x_1, \ldots, x_n)$, $y = (y_1, \ldots, y_n)$, then we define $\langle x, y \rangle = \sum_{j=1}^{n} \langle x_j, y_j \rangle_{H_j}$.

Exercise 10.8. Verify that this defines a scalar product on $H$ and that $H$ is complete with respect to the corresponding norm.

Note that each $H_j$ can be naturally identified with a closed subspace of $H$, by sending $x_j \in H_j$ to $x = (0, \ldots, 0, x_j, 0, \ldots, 0)$. In fact, the $H_j$, viewed in this way as subspaces of $H$, are pairwise orthogonal. Conversely, if $H$ is a Hilbert space and the $H_j$ are orthogonal subspaces of $H$, then $\bigoplus H_j$ can be naturally identified with a subspace of $H$ (by mapping $(x_j)$ to $\sum x_j$).

An analogous construction works for infinitely many summands $H_\alpha$, $\alpha \in I$. We now define $H$ to be the set of vectors $x = (x_\alpha)_{\alpha \in I}$ ($x_\alpha \in H_\alpha$) that satisfy $\sum_{\alpha \in I} \|x_\alpha\|^2 < \infty$. If $I$ is uncountable, then, as usual, this means that $x_\alpha \ne 0$ for only countably many $\alpha$ and the corresponding series is required to converge. We can again define $\langle x, y \rangle = \sum_{\alpha \in I} \langle x_\alpha, y_\alpha \rangle$; the convergence of this series follows from the definition of $H$ and the Cauchy-Schwarz inequality for both the individual scalar products and then also the sum over $\alpha \in I$.

Exercise 10.9. Again, prove that this defines a scalar product and that $H$ is a Hilbert space.

Theorem 10.7 (Spectral representation of normal operators). Let $T \in B(H)$ be a normal operator.
Then there exists a collection $\{\rho_\alpha : \alpha \in I\}$ of finite positive Borel measures on $\sigma(T)$ and a unitary map $U : H \to \bigoplus_{\alpha \in I} L^2(\sigma(T), d\rho_\alpha)$ so that
$$U T U^{-1} = M_z, \qquad (M_z f)_\alpha(z) = z f_\alpha(z).$$

The minimal cardinality of such a set $I$ is called the spectral multiplicity of $T$; if $H$ is separable (as almost all Hilbert spaces that occur in practice are), then $I$ can always be taken to be a countable set (say $I = \mathbb{N}$). Sometimes, a finite $I$ will suffice or even an $I$ consisting of just one element, so that $T$ would then be unitarily equivalent to a multiplication operator by the variable on a single $L^2(\rho)$ space.

Exercise 10.10. Let $T \in \mathbb{C}^{n \times n}$ be a normal matrix, with eigenvalues $\sigma(T) = \{z_1, \ldots, z_m\}$. Prove the existence of a spectral representation directly, by providing the details in the following sketch: Choose the $\rho_\alpha$ as counting measures on (subsets of) $\sigma(T)$, and to define $U$, send a vector $x \in \mathbb{C}^n$ to its expansion coefficients with respect to an ONB consisting of eigenvectors of $T$.

Exercise 10.11. Use the discussion of the previous Exercise to show that for a normal $T \in \mathbb{C}^{n \times n}$, the spectral multiplicity (as defined above) is the maximal degeneracy of an eigenvalue, or, put differently, it is equal to $\max_{z \in \sigma(T)} \dim N(T - z)$.

The measures $\rho_\alpha$ from Theorem 10.7 are called spectral measures. They are not uniquely determined by the operator $T$; Exercise 10.17 below will shed some additional light on this issue.

Proof. For $x \in H$, $x \ne 0$, let $H_x = \overline{\{f(T)x : f \in C(\sigma(T))\}}$. We also define an operator $U_x : H_x \to L^2(\sigma(T), d\mu_{x,x})$, as follows: For $f \in C(\sigma(T))$, put $U_x^{(0)} f(T)x = f$. Then
$$\|U_x^{(0)} f(T)x\|^2 = \int_{\sigma(T)} |f(z)|^2\, d\mu_{x,x}(z) = \|f(T)x\|^2,$$
by Theorem 10.3. By Exercise 2.26, the operator $U_x^{(0)} : \{f(T)x\} \to L^2(\mu_{x,x})$ has a unique continuous extension to $H_x$ (call it $U_x$). Since the norm is continuous, $U_x$ will also be isometric. In particular, $R(U_x)$ is closed, but clearly $R(U_x)$ also contains every continuous function on $\sigma(T)$, and these are dense in $L^2(\sigma(T), d\mu_{x,x})$, so $R(U_x) = L^2(\sigma(T), d\mu_{x,x})$. Summing up: $U_x$ is a unitary map (a linear bijective isometry) from $H_x$ onto $L^2(\sigma(T), d\mu_{x,x})$.

Now let $f \in C(\sigma(T))$ and write $zf(z) = g(z)$. Then
$$U_x T U_x^{-1} f = U_x T f(T)x = U_x g(T)x = g = M_z f,$$


where M_z denotes the operator of multiplication by z (here: in L²(σ(T), dμ_{x,x})). Since these functions f are dense in L²(σ(T), dμ_{x,x}) and both operators U_x T U_x^{-1} and M_z are continuous, it follows that U_x T U_x^{-1} = M_z.

We now consider those collections of such spaces {H_x : x ∈ I} for which the individual spaces are orthogonal: H_x ⊥ H_y if x, y ∈ I, x ≠ y. One can now use Zorn's Lemma to show that there is such a collection of H_x spaces so that ⊕_{x∈I} H_x = H. As always, we don't want to discuss the details of this argument. The crucial fact is this: If ⊕_{x∈I} H_x ≠ H, then there is another space H_y that is orthogonal to all H_x (x ∈ I). This can be proved as follows: Just pick an arbitrary y ∈ (⊕_{x∈I} H_x)^⊥, y ≠ 0. Then ⟨y, g(T)x⟩ = 0 for all x ∈ I and continuous functions g. But then it also follows that for all continuous f
\[ \langle f(T)y, g(T)x \rangle = \langle y, \overline{f}(T) g(T) x \rangle = 0 , \]
because $\overline{f}g$ is another continuous function. So f(T)y ⊥ H_x and thus H_y ⊥ H_x by the continuity of the scalar product.

We can now define the unitary map U as U = ⊕_{x∈I} U_x, where I is chosen so that ⊕_{x∈I} H_x = H, as discussed in the preceding paragraph. More precisely, by this we mean the following:
\[ U : H \to \bigoplus_{x\in I} L^2(\sigma(T), d\mu_{x,x}) , \]
and if y = ∑_{x∈I} y_x is the unique decomposition of y ∈ H into components y_x ∈ H_x, then we put (Uy)_x = U_x y_x. This map has the desired properties.

Exercise 10.12. Check this in greater detail. □

We have now discussed three versions of the Spectral Theorem. We originally obtained the functional calculus for normal operators from the theory of C*-algebras, especially the Gelfand-Naimark Theorem. This was then used to derive the existence of a spectral resolution E and a spectral representation. Conversely, spectral resolutions can be used to construct (in fact: an extended version of) the functional calculus, and it is also easy to recover E, starting from a spectral representation U T U^{-1} = M_z (we sometimes write this as T ≅ M_z). We summarize symbolically:
\[ \text{functional calculus} \iff T = \int_{\sigma(T)} z \, dE(z) \iff T \cong M_z \]


Every version has its merits, and it's good to have all three statements available. Note, however, that the original functional calculus (obtained from the theory of C*-algebras) becomes superfluous now because we obtain more powerful versions from the other statements (this was already pointed out above).

The spectrum of T will not always be known, and so it is sometimes more convenient to have statements that do not explicitly involve σ(T). This is very easy to do: Given E, we can also get a spectral resolution on the Borel sets of C by simply declaring E(C \ σ(T)) = 0. Similarly, in a spectral representation, we can think of the ρ_α as measures on C (with ρ_α(C \ σ(T)) = 0). In this case, we can recover the spectrum from the measures ρ_α. We discuss the case of one space L²(C, dρ) and leave the discussion of the effect of the orthogonal sum to an exercise.

Given a Borel measure ρ on C, we define its topological support as the smallest closed set A that supports ρ in the sense that ρ(A^c) = 0. We denote it by A = top supp ρ.

Exercise 10.13. Prove that such a set exists. Suggestion: It is tempting to try to define (top supp ρ)^c = ⋃ U, where the union is over all open sets U ⊂ C with ρ(U) = 0. This works, but note that the union will be uncountable, which could be a minor nuisance because we want to show that it has ρ measure zero.

Proposition 10.8. If T = M_z on L²(C, dρ), then σ(T) = top supp ρ.

Proof. Abbreviate S = top supp ρ. We must show that M_z − w is invertible in B(L²) precisely if w ∉ S. Now if w ∉ S, then |w − z| ≥ ε > 0 for ρ-almost every z ∈ C (by definition of S), and this implies that M_{(z−w)^{-1}} is a bounded linear operator. Obviously, it is the inverse of M_z − w. Conversely, if w ∈ S, then ρ(B_n) > 0 for all n ∈ N, where B_n = {z ∈ C : |z − w| < 1/n}. Again, this follows from the definition of S. This means that ‖χ_{B_n}‖ > 0 in L²(C, dρ). Let f_n = χ_{B_n}/‖χ_{B_n}‖, so ‖f_n‖ = 1. Then ‖(M_z − w)f_n‖ < 1/n, and this shows that M_z − w is not invertible: if it were, then it would follow that
\[ 1 = \|f_n\| = \|(M_z - w)^{-1}(M_z - w) f_n\| \le C \|(M_z - w) f_n\| < \frac{C}{n} , \]
a statement that seems hard to believe. □

As for the orthogonal sum, we have the following result:


Proposition 10.9. Let H_α be Hilbert spaces, and let T_α ∈ B(H_α), with sup_{α∈I} ‖T_α‖ < ∞. Write H = ⊕_{α∈I} H_α and define T : H → H as follows: (Tx)_α = T_α x_α (if x = (x_α)_{α∈I}). Then T ∈ B(H) and
\[ \sigma(T) = \overline{\bigcup_{\alpha\in I} \sigma(T_\alpha)} . \]

It is customary to write this operator as T = ⊕_{α∈I} T_α, and actually we already briefly mentioned this notation in the proof of Theorem 10.7. If I is finite, then no closure is necessary in the statement of Proposition 10.9. The situation of Theorem 10.7 is as discussed in the Proposition, with T_α = M_z for all α. So we can now say that the spectrum of M_z on ⊕ L²(C, dρ_α) is the closure of the union of the topological supports of the ρ_α.

Exercise 10.14. Prove Proposition 10.9.

The following basic facts are very useful when dealing with spectral representations. They provide further insight into the functional calculus and also a very convenient way of performing these operations once a spectral representation has been found.

Proposition 10.10. Let f : C → C be a bounded Borel function. Then:
(a) f(M_z) = M_{f(z)};
(b) Let U : H_1 → H_2 be a unitary map and let T ∈ B(H_1) be a normal operator. Then f(U T U^{-1}) = U f(T) U^{-1}.

Sketch of proof. We argue as in the second part of the proof of Theorem 10.5. First of all, the assertions hold for functions of the type $f(z) = p(z, \overline{z})$, with a polynomial p, because for such functions we have an alternative direct description of f(T), which lets us verify (a), (b) directly. Again, by the Stone-Weierstraß Theorem, these functions are dense in C(K) for compact subsets K ⊂ C. Since f_n(T) → f(T) in B(H) if ‖f_n − f‖_∞ → 0, this gives the claim for continuous functions. Now ‖(f(T) − g(T))x‖² = ∫ |f − g|² dμ_{x,x} and continuous functions are dense in L² spaces. From this, we obtain the statements for arbitrary bounded Borel functions. □

Exercise 10.15. Give a detailed proof by filling in the details.

If T is of the form M_z on L²(C, dρ), as in a spectral representation (where we assume, for simplicity, that there is just one L² space), what is the spectral resolution E of this operator? In general, we can recover E from T as E(A) = χ_A(T), so Proposition 10.10 shows that E(A) = M_{χ_A} if A ⊂ C is a Borel set.

Exercise 10.16. Verify directly that this defines a resolution of the identity on the Borel sets of C (and the Hilbert space L²(C, dρ)).

We observed earlier that the spectral measures ρ_α are (in fact: highly) non-unique. The following Exercise helps to clarify the situation. We call two operators T_j ∈ B(H_j) unitarily equivalent if T_2 = U T_1 U^{-1} for some unitary map U : H_1 → H_2. So, if we use this terminology, then Theorem 10.7 says that every normal operator is unitarily equivalent to the operator of multiplication by the variable in a sum of spaces L²(C, ρ_α).

Exercise 10.17. Consider the multiplication operators T_1 = M_z^{(μ)} and T_2 = M_z^{(ν)} on L²(μ) and L²(ν), respectively, where μ, ν are finite Borel measures on C. Show that T_1, T_2 are unitarily equivalent if and only if μ and ν are equivalent measures (that is, they have the same null sets). Suggestion: For one direction, use the fact that μ and ν are equivalent if and only if dμ = f dν, with f ∈ L¹(ν) and f > 0 almost everywhere with respect to μ (or ν).

Example 10.1. Let us now discuss the operator (Tx)_n = x_{n+1} on ℓ²(Z). By Exercise 6.7(a), T is unitary, so the Spectral Theorem applies. It is easiest to start out with a spectral representation because this can be guessed directly. Consider the operator
\[ F : L^2(S, dx/(2\pi)) \to \ell^2(\mathbb{Z}), \qquad (Ff)_n = \frac{1}{2\pi} \int_0^{2\pi} f(e^{ix}) e^{inx} \, dx \]
(F as in Fourier transform). Here, S = {z ∈ C : |z| = 1} denotes again the unit circle; when convenient, we also use x ∈ [0, 2π) to parametrize S by writing z = e^{ix}. Note that (Ff)_n = ⟨e_n, f⟩, with e_n(z) = z^{-n}. Since these functions form an ONB (compare Exercise 5.15), Theorem 5.14 shows that F is unitary. Observe that the function g(z) = zf(z) has Fourier coefficients (Fg)_n = (Ff)_{n+1}. In other words, F^{-1} T F = M_z, and this is a spectral representation, with U = F^{-1}. The spectral measure dx/(2π) has the unit circle as its topological support, so σ(T) = S. Since only one L² space is necessary here, the operator T has spectral multiplicity one.

What is the spectral resolution of T? We already identified this spectral resolution on L²(S, dx/(2π)), the space from the spectral representation, and we can now map things back to the original Hilbert space ℓ²(Z) by using Proposition 10.10. More specifically,
\[ E(A) = \chi_A(T) = \chi_A(F M_z F^{-1}) = F \chi_A(M_z) F^{-1} = F M_{\chi_A(z)} F^{-1} . \]


We can rewrite this if we recall that (F^{-1}y)(z) = ∑ y_n z^{-n}, so (M_{χ_A} F^{-1} y)(z) = ∑ y_n χ_A(z) z^{-n} (both series converge in L²(S)), and thus
\[ (E(A)y)_n = \sum_{m=-\infty}^{\infty} \widehat{\chi}_A(n-m) \, y_m , \]
where $\widehat{\chi}_A(k) = \frac{1}{2\pi} \int_0^{2\pi} \chi_A(e^{ix}) e^{ikx} \, dx$. Formally, this follows immediately from the preceding formulae, and for a rigorous argument, we use the fact that (Ff)_n may be interpreted as a scalar product, the continuity of the scalar product and the L² convergence of the series that are involved here.

We now prove some general statements that illustrate how the Spectral Theorem helps to analyze normal operators.

Theorem 10.11. Let T ∈ B(H) be normal. Then:
(a) T is self-adjoint ⟺ σ(T) ⊂ R;
(b) T is unitary ⟺ σ(T) ⊂ S = {z ∈ C : |z| = 1}.

The assumption that T is normal is needed here: if, for example, $T = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \in B(\mathbb{C}^2)$, then σ(T) = {0} ⊂ R, but T is not self-adjoint.

Proof. (a) ⟹: This was established earlier, in Theorem 9.15(a).
⟸: By the Spectral Theorem and functional calculus,
\[ T^* = \int_{\sigma(T)} \overline{z} \, dE(z) = \int_{\sigma(T)} z \, dE(z) = T . \]
(b) ⟸: This follows as in (a) from
\[ T T^* = T^* T = \int_{\sigma(T)} z \overline{z} \, dE(z) = \int_{\sigma(T)} dE(z) = 1 . \]
⟹: If z ∈ σ(T), then E(B_{1/n}(z)) ≠ 0 for all n ∈ N by Proposition 10.6, so we can pick x_n ∈ R(E(B_{1/n}(z))), ‖x_n‖ = 1. Then
\[ \mu_{x_n,x_n}((B_{1/n}(z))^c) = \langle x_n, E((B_{1/n}(z))^c) x_n \rangle = \langle x_n, E((B_{1/n}(z))^c) E(B_{1/n}(z)) x_n \rangle = 0 , \]
so it follows that
\[ \big( \|T x_n\| - |z| \, \|x_n\| \big)^2 \le \|(T - z) x_n\|^2 = \int_{\sigma(T)} |t - z|^2 \, d\mu_{x_n,x_n}(t) \le \frac{1}{n^2} . \tag{10.8} \]
Since ‖Ty‖ = ‖y‖ for all y ∈ H for a unitary operator, this shows that |z| = 1, as claimed. □
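The equivalences of Theorem 10.11, and the role of the normality assumption, can be checked by hand for 2×2 matrices. The following is a minimal sketch in plain Python; the helper names and the three sample matrices are illustrative choices (not taken from the notes), and the eigenvalues are computed from the characteristic polynomial with the quadratic formula.

```python
import cmath

def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def adjoint(A):
    """Conjugate transpose A*."""
    return [[complex(A[j][i]).conjugate() for j in range(2)] for i in range(2)]

def is_normal(A, tol=1e-12):
    """Check A*A = AA* entrywise up to a tolerance."""
    left, right = matmul(adjoint(A), A), matmul(A, adjoint(A))
    return all(abs(left[i][j] - right[i][j]) < tol
               for i in range(2) for j in range(2))

def eigenvalues_2x2(A):
    """Roots of the characteristic polynomial z^2 - tr(A) z + det(A)."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    d = cmath.sqrt(tr * tr - 4 * det)
    return ((tr + d) / 2, (tr - d) / 2)

N = [[0, 1], [0, 0]]        # not normal: spectrum {0} is real, yet N != N*
A = [[2, 1j], [-1j, 2]]     # self-adjoint: spectrum is real
R = [[0, -1], [1, 0]]       # unitary: spectrum lies on the unit circle

for name, M in (("N", N), ("A", A), ("R", R)):
    print(name, "normal:", is_normal(M), "spectrum:", eigenvalues_2x2(M))
```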


Theorem 10.12. If T ∈ B(H) is normal, then
\[ \|T\| = \sup_{\|x\|=1} |\langle x, Tx \rangle| . \]
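Normality matters in Theorem 10.12 as well. As a rough numerical sketch (plain Python, sampling unit vectors of C² on a grid; the function names, the grid size and the two sample matrices are ad hoc assumptions), one can compare sup |⟨x, Tx⟩| with ‖T‖ for a non-normal nilpotent matrix and for a normal diagonal matrix.

```python
import cmath
import math

def unit_vectors(steps=200):
    """Grid of unit vectors (cos t, e^{ip} sin t) in C^2; a global phase is irrelevant."""
    for i in range(steps + 1):
        t = (math.pi / 2) * i / steps
        for j in range(steps):
            p = 2 * math.pi * j / steps
            yield (complex(math.cos(t)), cmath.exp(1j * p) * math.sin(t))

def apply(T, x):
    """Matrix-vector product for a 2x2 matrix."""
    return (T[0][0] * x[0] + T[0][1] * x[1],
            T[1][0] * x[0] + T[1][1] * x[1])

def inner(x, y):
    """Scalar product, linear in the second argument."""
    return x[0].conjugate() * y[0] + x[1].conjugate() * y[1]

def numerical_radius(T):
    """Sampled value of sup |<x, Tx>| over unit vectors."""
    return max(abs(inner(x, apply(T, x))) for x in unit_vectors())

def operator_norm(T):
    """Sampled value of sup ||Tx|| over unit vectors."""
    return max(math.sqrt(inner(apply(T, x), apply(T, x)).real)
               for x in unit_vectors())

N = [[0, 1], [0, 0]]     # not normal
D = [[2, 0], [0, -1]]    # normal (diagonal)

print(numerical_radius(N), operator_norm(N))
print(numerical_radius(D), operator_norm(D))
```

For the nilpotent matrix, ⟨x, Tx⟩ is the product of the conjugate of x_1 with x_2, so the supremum is 1/2 while the norm is 1; for the diagonal matrix both quantities equal 2, in line with the theorem.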

Proof. Clearly, |⟨x, Tx⟩| ≤ ‖T‖ ‖x‖², so the sup is ≤ ‖T‖. On the other hand, we know from Theorem 9.15(b) that ‖T‖ = r(T), so there exists a z ∈ σ(T) with |z| = ‖T‖. As in the previous proof, if ε > 0 is given, then E(B_ε(z)) ≠ 0, so we can find an x ∈ R(E(B_ε(z))), ‖x‖ = 1. Then
\[ |\langle x, Tx \rangle - z| = |\langle x, (T - z)x \rangle| = \left| \int (t - z) \, d\|E(t)x\|^2 \right| < \varepsilon \]
because (again, as in the previous proof) μ_{x,x}((B_ε(z))^c) = 0 (and μ_{x,x}(C) = ‖x‖² = 1). So sup |⟨x, Tx⟩| ≥ ‖T‖ − ε, and ε > 0 is arbitrary here. □

Theorem 10.13. Let T ∈ B(H). Then T ≥ 0 (in the C*-algebra B(H); see Definition 9.14) if and only if ⟨x, Tx⟩ ≥ 0 for all x ∈ H.

Proof. If T ≥ 0, then T is self-adjoint and σ(T) ⊂ [0, ∞), so the Spectral Theorem shows that
\[ \langle x, Tx \rangle = \int_{[0,\infty)} t \, d\mu_{x,x}(t) \ge 0 \]
for all x ∈ H. Conversely, if this condition holds, then in particular ⟨x, Tx⟩ ∈ R for all x ∈ H, so
\[ \langle x, T^* x \rangle = \langle Tx, x \rangle = \overline{\langle x, Tx \rangle} = \langle x, Tx \rangle . \]
Polarization now shows that ⟨x, T*y⟩ = ⟨x, Ty⟩ for all x, y ∈ H, that is, T = T* and T is self-adjoint. Now if t > 0, then
\[ t\|x\|^2 = \langle x, tx \rangle \le \langle x, (T + t)x \rangle \le \|x\| \, \|(T + t)x\| , \]
so it follows that
\[ \|(T + t)x\| \ge t \|x\| . \tag{10.9} \]
This shows, first of all, that N(T + t) = {0}. Moreover, we also see from (10.9) that R(T + t) is closed: if y_n ∈ R(T + t), say y_n = (T + t)x_n and y_n → y ∈ H, then (10.9) shows that x_n is a Cauchy sequence, so x_n → x for some x ∈ H and thus y = (T + t)x ∈ R(T + t) also, by the continuity of T + t. Finally, we observe that R(T + t)^⊥ = N((T + t)*) = N(T + t) = {0} (by Theorem 6.2). Putting things together, we see that R(T + t) = H, so T + t is bijective and thus −t ∉ σ(T). This holds for every t > 0, so, since T is self-adjoint, σ(T) ⊂ [0, ∞) and T ≥ 0. □

Theorem 10.14. Let T ∈ B(H), T ≥ 0. Then there exists a unique S ∈ B(H), S ≥ 0 so that S² = T.
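For matrices, the square root in Theorem 10.14 can be computed from the spectral projections: if T = ∑ λ_k P_k, then S = ∑ λ_k^{1/2} P_k. The following is a small sketch in plain Python under simplifying assumptions that are not part of the notes: T is a real symmetric 2×2 matrix with two distinct non-negative eigenvalues, and the names `spectral_decomposition` and `positive_sqrt` are made up for this illustration.

```python
import math

def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def spectral_decomposition(T):
    """Eigenvalues and spectral projections of a real symmetric 2x2 matrix
    with two distinct eigenvalues lam1 < lam2."""
    tr = T[0][0] + T[1][1]
    det = T[0][0] * T[1][1] - T[0][1] * T[1][0]
    d = math.sqrt(tr * tr - 4 * det)
    lam1, lam2 = (tr - d) / 2, (tr + d) / 2
    ident = [[1.0, 0.0], [0.0, 1.0]]
    # P1 = (T - lam2)/(lam1 - lam2) projects onto N(T - lam1); similarly for P2.
    P1 = [[(T[i][j] - lam2 * ident[i][j]) / (lam1 - lam2) for j in range(2)]
          for i in range(2)]
    P2 = [[(T[i][j] - lam1 * ident[i][j]) / (lam2 - lam1) for j in range(2)]
          for i in range(2)]
    return [(lam1, P1), (lam2, P2)]

def positive_sqrt(T):
    """S = sum of sqrt(lam) * P over the spectral decomposition of T >= 0."""
    parts = spectral_decomposition(T)
    return [[sum(math.sqrt(lam) * P[i][j] for lam, P in parts)
             for j in range(2)] for i in range(2)]

T = [[2.0, 1.0], [1.0, 2.0]]   # eigenvalues 1 and 3, so T >= 0
S = positive_sqrt(T)
print(S)
print(matmul(S, S))            # agrees with T up to rounding
```

By construction the eigenvalues of S are the non-negative square roots of those of T, so S ≥ 0.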


Proof. Existence is very easy: By the Spectral Theorem, T = ∫_{[0,∞)} t dE(t). The operator S = ∫_{[0,∞)} t^{1/2} dE(t) has the desired properties (here, t^{1/2} of course denotes the positive square root).

Uniqueness isn't hard either, but more technical, and we just sketch this part: If S_0 is another operator with S_0 ≥ 0, S_0² = T, write S_0 = ∫_{[0,∞)} s dE_0(s), so T = ∫_{[0,∞)} s² dE_0(s). Now we can run a “substitution” s² = t (of sorts) and rewrite this as $T = \int_{[0,\infty)} t \, d\widetilde{E}_0(t)$, where $\widetilde{E}_0(M) = E_0(\{s \ge 0 : s^2 \in M\})$ (this part would need a more serious discussion if a full proof is desired). By the uniqueness of the spectral resolution E (see Theorem 10.5), $\widetilde{E}_0 = E$, and this will imply that S_0 = S. □

Exercise 10.18. Let T ∈ C^{n×n} be a normal matrix with n distinct, non-zero eigenvalues. Show that there are precisely 2^n normal (!) matrices S ∈ C^{n×n} with S² = T.

Exercise 10.19. Recall that σ_p(T) was defined as the set of eigenvalues of T; equivalently, z ∈ σ_p(T) precisely if N(T − z) ≠ {0}. Show that if T ∈ B(H) is normal, then z ∈ σ_p(T) if and only if E({z}) ≠ 0 (here, as usual, E denotes the spectral resolution of T).

Exercise 10.20. Let T ∈ B(H) be normal. Show that z ∈ σ(T) if and only if there exists a sequence x_n ∈ H, ‖x_n‖ = 1, so that (T − z)x_n → 0.

Exercise 10.21. Suppose that T ∈ B(H) is both unitary and self-adjoint. Show that T is of the form T = 2P − 1, for some orthogonal projection P. Show also that, conversely, every such operator T is unitary and self-adjoint. Suggestion: Use the Spectral Theorem and Theorem 10.11 for the first part.

Exercise 10.22. Let T ∈ B(H). Recall that a closed subspace M ⊂ H is called invariant if TM ⊂ M, that is, if Tx ∈ M for all x ∈ M. Call M a reducing subspace if both M and M^⊥ are invariant. Show that if T is normal with spectral resolution E, then R(E(B)) is a reducing subspace for every Borel set B ⊂ C. Hint: E(B) = χ_B(T); now use the functional calculus.

Math Department, University of Oklahoma, Norman, OK 73019
E-mail address: [email protected]
URL: www.math.ou.edu/~cremling

11. Unbounded operators

Many important operators on Hilbert spaces are not bounded. For example, differential operators on L²(R^n) are never bounded. Therefore, we now want to analyze general linear operators T : D(T) → H, where the domain D(T) is assumed to be a subspace of H, not necessarily equal to H. Of course, if T ∈ B(H), then we do have that D(T) = H. Conversely, the closed graph theorem shows that if T is closed and D(T) = H, then T ∈ B(H), so closed unbounded operators are never defined on all of H. The vast majority of the operators that occur in applications are closed or at least have closed extensions, so the added flexibility of a domain D(T), not necessarily equal to the whole space, is a crucial part of the set-up. More generally, the same argument, applied to the Hilbert space H_0 = D(T), shows that a closed unbounded operator can never have a closed domain.

The existence of a domain is actually the main reason why unbounded operators can become quite awkward to deal with. It must always be taken into account when manipulating operators. For example, if S, T are linear operators on H, then we define sum and product as follows:
D(S + T) := D(S) ∩ D(T), (S + T)x := Sx + Tx,
D(ST) := {x ∈ D(T) : Tx ∈ D(S)}, (ST)x := S(Tx).

Next, we want to define an adjoint operator T*. We will assume that T is densely defined, that is, $\overline{D(T)} = H$. The following definition looks natural:
D(T*) = {x ∈ H : There exists z = z_x ∈ H so that ⟨x, Ty⟩ = ⟨z, y⟩ for all y ∈ D(T)},
T*x := z (x ∈ D(T*)).

The assumption that T is densely defined makes sure that such a z, if it exists at all, is unique, so this is well defined. Notice that we have defined D(T*) as the largest set of vectors x for which T can be moved over to the other argument in the scalar product in the expression ⟨x, Ty⟩. As before, we call T* the adjoint operator (of T).

Exercise 11.1. Prove that D(T*) is a subspace and that T* is a linear operator. Also, prove that if T ∈ B(H), then this new definition just recovers the operator T* ∈ B(H) that was introduced earlier, in Chapter 6.

One possible concern about this definition is the possibility of D(T*) being rather small, and, indeed, it can happen that D(T*) = {0}, and
then we don't really get any operator at all. See Exercises 11.19 and especially 11.20. Fortunately, for large classes of operators T, the adjoint operator T* will be densely defined also. We need some additional terminology:

Definition 11.1. The graph of an operator T is defined as the set G(T) = {(x, Tx) : x ∈ D(T)}. We call T closed if G(T) is a closed subset of H ⊕ H. We call T closable if $\overline{G(T)}$ is the graph of some linear operator T_0. In this case, T_0 is unique, and we call this new operator the closure of T and denote it by $T_0 = \overline{T}$.

Except for the notions of closable operators and closures of operators, this is already familiar to us. As always, the sequence characterizations are often easier to work with, so let us state these, too: T is closed precisely if x_n ∈ D(T), x_n → x, Tx_n → y implies that x ∈ D(T) and Tx = y. The details are important here: The fact that T is closed does not imply that x ∈ D(T) if x_n ∈ D(T) and x_n → x. Indeed, that would mean that D(T) itself is closed, but we already know that this never happens for unbounded closed operators. We only obtain this conclusion (x ∈ D(T)) if, in addition, Tx_n also converges.

To rephrase the condition that was used to define closable operators, observe the following: if $\overline{G(T)}$ is the graph of some operator T_0 and $(x, y) \in \overline{G(T)}$, then clearly T_0 x = y. So, if T is closable, then $\overline{G(T)}$ certainly must have the property that if $(x, y), (x, y') \in \overline{G(T)}$, then y = y′. On the other hand, if this is satisfied, then T is closable. Indeed, we can just define $D(T_0) = \{x \in H : (x, y) \in \overline{G(T)} \text{ for some } y \in H\}$, T_0 x = y. Then, by definition, $G(T_0) = \overline{G(T)}$, and T_0 is linear because of the following fact (and because closures of subspaces are subspaces again).

Exercise 11.2. Let S : D(S) → H be an arbitrary map. Show that D(S) is a (linear) subspace and S is linear if and only if G(S) is a (linear) subspace of H ⊕ H.

So T is closable if and only if $(x, y), (x, y') \in \overline{G(T)}$ implies that y = y′. Because $\overline{G(T)}$ is a subspace, this admits a simpler reformulation: T is closable if and only if $(0, y) \in \overline{G(T)}$ implies that y = 0.

Exercise 11.3. Prove this remark.

This, finally, leads to the following sequence characterization of closability: T is closable if and only if x_n ∈ D(T), x_n → 0, Tx_n → y
implies that y = 0. Moreover, if T is closable, then
$D(\overline{T})$ = {x ∈ H : There exists a sequence x_n ∈ D(T) so that x_n → x, Tx_n → y for some y ∈ H}.
If $x \in D(\overline{T})$ and y is as above, then $\overline{T}x = y$. The condition that T is closable makes sure that this y is uniquely determined by x, so this is well defined.

Occasionally, the following reformulation is also useful. Call an operator S an extension of T if D(S) ⊃ D(T) and Sx = Tx for all x ∈ D(T). In this case, we also write S ⊃ T.

Exercise 11.4. Prove that T is closable if and only if T has a closed extension. If T is closable, then $\overline{T}$ is the smallest closed extension of T.

Again, there are pitfalls for the unwary: While the operator $\overline{T}$ is the smallest closed extension of T, if T is unbounded, it is never true that $D(\overline{T})$ is the closure of D(T) (if it were, $\overline{T}$ and hence also T would be bounded by the closed graph theorem). The closure operation is applied to the graph of T.

Theorem 11.2. Let T be a densely defined operator. Then T* is closed.

Proof. Assume that x_n ∈ D(T*), x_n → x, T*x_n → y, and let z ∈ D(T). Then
\[ \langle x, Tz \rangle = \lim_{n\to\infty} \langle x_n, Tz \rangle = \lim_{n\to\infty} \langle T^* x_n, z \rangle = \langle y, z \rangle . \]
This calculation (which uses the continuity of the scalar product and the fact that x_n ∈ D(T*)) shows that x ∈ D(T*) and T*x = y, as required. □

Here's another useful fact (compare also Theorem 6.2).

Theorem 11.3. Let T be a densely defined operator. Then N(T*) = R(T)^⊥.

For unbounded operators, we of course define N(S) = {x ∈ D(S) : Sx = 0} and R(S) = {Sx : x ∈ D(S)}.

Proof. From the definition of T*, we immediately obtain that x ∈ N(T*) precisely if ⟨x, Ty⟩ = 0 for all y ∈ D(T), that is, if and only if x ∈ R(T)^⊥. □

Exercise 11.5. Let S, T be densely defined operators with S ⊂ T. Show that then T* ⊂ S*.

Theorem 11.4. Let T be a densely defined operator. Then T is closable if and only if D(T*) is a dense subspace. In this case, $\overline{T} = T^{**}$ and $\overline{T}^{\,*} = T^*$.


Proof. We will show that $(0, y) \in \overline{G(T)}$ if and only if y ∈ D(T*)^⊥. This will establish the equivalence asserted in the Theorem, because a subspace is dense precisely if it has zero orthogonal complement (see also Corollary 5.9). Now if $(0, y) \in \overline{G(T)}$, then Tx_n → y for some sequence x_n ∈ D(T) with x_n → 0. So for every z ∈ D(T*), we have that
\[ \langle z, y \rangle = \lim_{n\to\infty} \langle z, Tx_n \rangle = \lim_{n\to\infty} \langle T^* z, x_n \rangle = 0 , \]
so y ∈ D(T*)^⊥. To prove the converse, we first of all make the following observation (recall that on H ⊕ H, we use the scalar product ⟨(u, v), (x, y)⟩ = ⟨u, x⟩ + ⟨v, y⟩):
\[ (u, v) \in \overline{G(T)}^{\perp} = G(T)^{\perp} \iff v \in D(T^*) \text{ and } T^* v = -u \tag{11.1} \]
Indeed, (u, v) ∈ G(T)^⊥ if and only if ⟨u, x⟩ + ⟨v, Tx⟩ = 0 for all x ∈ D(T) and this says that v ∈ D(T*) and T*v = −u, as claimed. Now if y ∈ D(T*)^⊥, then (11.1) shows that $(0, y) \in G(T)^{\perp\perp} = \overline{G(T)}$ (use Corollary 5.9 again for this last step), as asserted earlier.

If T* is densely defined, then the definition of the adjoint operator shows that T** ⊃ T. Since T** is also closed, by Theorem 11.2, this implies that $\overline{T} \subset T^{**}$. On the other hand, if x ∈ D(T**) and v ∈ D(T*), then ⟨x, T*v⟩ = ⟨T**x, v⟩. By (11.1), this means that $(x, T^{**}x) \in G(T)^{\perp\perp} = \overline{G(T)}$, so $x \in D(\overline{T})$, and thus $\overline{T} = T^{**}$. Finally, this also implies that if T is closable, then
\[ \overline{T}^{\,*} = T^{***} = \overline{T^*} = T^* . \]
The last step is by Theorem 11.2. □

As before, we want to call an operator self-adjoint if it can go anywhere in a scalar product, that is, if we have the identity ⟨x, Ty⟩ = ⟨Tx, y⟩. The presence of domains complicates this issue considerably.

Definition 11.5. We call an operator T symmetric if T is densely defined and T ⊂ T*. If, in addition, T = T*, then we call T self-adjoint.

More explicitly, the symmetry of a densely defined T is equivalent to the condition ⟨x, Ty⟩ = ⟨Tx, y⟩ for all x, y ∈ D(T). Notice that this implies that T* ⊃ T. If, in addition, we also have that D(T) = D(T*), then T is self-adjoint.

Exercise 11.6. Let T be a symmetric operator. Show that T is closable and that $\overline{T}$ is also symmetric.


Example 11.1. Let H = L²(0, 1) and define Tf = if′ on D(T) = C_0^∞(0, 1), the smooth functions on (0, 1) whose support is a compact subset of (0, 1). Since these are dense in L²(0, 1), T is densely defined. It is easy to check that T is symmetric: an integration by parts shows that if f, g ∈ C_0^∞(0, 1), then
\[ \langle f, Tg \rangle = \int_0^1 \overline{f(x)} \, i g'(x) \, dx = -i \int_0^1 \overline{f'(x)} \, g(x) \, dx = \langle Tf, g \rangle . \]
However, T is not self-adjoint. The above calculation in fact shows that if f is an arbitrary C¹ function, then we still have that ⟨f, Tg⟩ = ⟨if′, g⟩, so D(T*) is strictly larger than C_0^∞ = D(T).

Let us try to find T* explicitly. First of all, if f ∈ AC[0, 1], then the integration by parts calculation from above still goes through. See Folland, Real Analysis, Theorem 3.36 and Exercise 3.5.35. The space AC[0, 1] of absolutely continuous (on [0, 1]) functions can be defined in various ways; here is one possible version: f ∈ AC[0, 1] if and only if there exists h ∈ L¹(0, 1) so that f(x) = f(0) + ∫_0^x h(t) dt for all x ∈ [0, 1]. Absolutely continuous functions are differentiable almost everywhere, and if h is as above, then f′ = h almost everywhere. Please see Folland, Section 3.5 for (much) more on absolutely continuous functions.

We conclude that f ∈ D(T*) if f ∈ AC[0, 1] and f′ ∈ L²(0, 1), and T*f = if′ for these f. Conversely, assume that f ∈ D(T*). This means that there exists h ∈ L²(0, 1) so that
\[ i \int_0^1 \overline{f(x)} \, g'(x) \, dx = \int_0^1 \overline{h(x)} \, g(x) \, dx \tag{11.2} \]
for all g ∈ C_0^∞(0, 1). Now one possible interpretation of (11.2) is: the distributional derivative of f equals −ih. In particular, f′ ∈ D′(0, 1) is an integrable function (since L²(0, 1) ⊂ L¹(0, 1)). This implies that f ∈ AC[0, 1] and h = if′, so
\[ D(T^*) = \{ f \in AC[0,1] : f' \in L^2(0,1) \}, \qquad T^* f = i f' . \tag{11.3} \]
If you are not familiar with the distributional characterization of absolute continuity, then the use of distributions can be avoided. Here's an alternative argument. Suppose that f and h are as in (11.2), and let
\[ F(x) = f(x) + i \int_0^x h(t) \, dt . \tag{11.4} \]
Clearly, F ∈ L²(0, 1). A calculation using the Fubini-Tonelli Theorem and (11.2) shows that ⟨F, g′⟩ = 0 for all g ∈ C_0^∞(0, 1). Fix g_0 ∈ C_0^∞(0, 1) with ∫ g_0 = 1. Also, observe that h ∈ C_0^∞(0, 1) is of the form h = g′ for some g ∈ C_0^∞(0, 1) if (and only if, but this is not needed here)
∫ h = 0, so we can now rephrase and say that ⟨F, g − cg_0⟩ = 0 for all g ∈ C_0^∞(0, 1), where c = ∫ g = ⟨1, g⟩. Or, put differently, ⟨F − c_0, g⟩ = 0 for all g ∈ C_0^∞(0, 1), and here c_0 = ⟨g_0, F⟩ ∈ C is a constant (function). However, C_0^∞(0, 1)^⊥ = {0}, because C_0^∞ is dense, so F = c_0. Now (11.4) again confirms that (11.3) holds.

In particular, T* is densely defined, so T is closable. What is its closure?

Exercise 11.7. Use similar arguments to show that
D(T**) = {f ∈ AC[0, 1] : f′ ∈ L²(0, 1), f(0) = f(1) = 0}, T**f = if′.
Recall also that AC[0, 1] functions are continuous on [0, 1], so it makes sense to evaluate these at 0 and 1.

Let $S = \overline{T} = T^{**}$. Then S is closed and symmetric because S* = T* and thus S* ⊃ S by (11.3). However, S is not self-adjoint, as D(S*) is strictly larger than D(S). When we make the domain larger (but in such a way that we still have a restriction of S*), the domain of the adjoint operator will decrease, so perhaps self-adjoint operators can be obtained in this way. It is clear that D(S*) is too large; in fact, S* is not even symmetric because S** = S ⊅ S*. However, the intermediate domains
D(S_a) = {f ∈ AC[0, 1] : f′ ∈ L²(0, 1), f(1) = e^{ia} f(0)}, S_a f = if′
work: S_a is self-adjoint for every a ∈ [0, 2π) (we don't want to prove this here, but you can try to give a proof that is modeled on the discussion above). Notice that S ⊂ S_a ⊂ S* for all a.

This situation is typical. It is often easy to find domains on which operators are symmetric, but to build self-adjoint operators, the domains must be chosen very carefully. The so-called von Neumann-theory provides an easy systematic approach to these issues; we will not discuss it here. Instead, we will prove the following abstract criterion.

Theorem 11.6. Let T be a symmetric operator, and let z ∈ C \ R. Then the following statements are equivalent:
(a) T is self-adjoint;
(b) T is closed and $N(T^* - z) = N(T^* - \overline{z}) = \{0\}$;
(c) $R(T - z) = R(T - \overline{z}) = H$.

In the proof, we will make use of the following fact: If T is a densely defined operator and z ∈ C, then $(T - z)^* = T^* - \overline{z}$.

Exercise 11.8. Prove this.


Proof. (a) ⟹ (b): T is closed because T = T* and adjoint operators are always closed (Theorem 11.2). Suppose that x ∈ N(T* − z) = N(T − z). Then
\[ \overline{z} \langle x, x \rangle = \langle Tx, x \rangle = \langle x, Tx \rangle = z \langle x, x \rangle , \]
so x = 0 (recall that z ∉ R). Of course, a similar argument works for $N(T^* - \overline{z})$, so we have established (b).

(b) ⟹ (c): By Theorem 11.3, $R(T - z)^\perp = N(T^* - \overline{z}) = \{0\}$, so R(T − z) is dense. So it now suffices to show that this space is closed. Let $y_n \in R(T - z)$, so y_n = (T − z)x_n with x_n ∈ D(T), and suppose that y_n → y. Write z = a + ib; by assumption, b ≠ 0. If u ∈ D(T), then
\[ \|(T - z)u\|^2 = \langle (T - a - ib)u, (T - a - ib)u \rangle = \|(T - a)u\|^2 + b^2 \|u\|^2 \]
because T* ⊃ T, so ⟨u, (T − a)u⟩ = ⟨(T − a)u, u⟩. It follows that
\[ \|u\| \le \frac{1}{|b|} \|(T - z)u\| , \]
and by applying this with u = x_m − x_n, we see that x_n is a Cauchy sequence, so x = lim x_n exists. Since Tx_n also converges, to y + zx, and T is closed, we conclude that x ∈ D(T) and Tx = y + zx or y = (T − z)x ∈ R(T − z), as desired. An analogous argument handles $R(T - \overline{z})$.

(c) ⟹ (a): Let x ∈ D(T*). By hypothesis, we can find a y ∈ D(T) so that (T − z)y = (T* − z)x. Since T ⊂ T*, we have that x − y ∈ N(T* − z). However, $N(T^* - z) = R(T - \overline{z})^\perp = \{0\}$ (by Theorem 11.3 and assumption), so x = y ∈ D(T). We have shown that D(T*) = D(T), so T* = T. □

Definition 11.7. Let T be a closed operator. Define
ρ(T) = {z ∈ C : N(T − z) = {0}, R(T − z) = H},
σ(T) = C \ ρ(T).
We call ρ(T) the resolvent set and σ(T) the spectrum of T.

Notice that if z ∈ ρ(T), then (T − z)^{-1} ∈ B(H). This follows from the closed graph theorem because (T − z)^{-1} is a closed operator that is defined everywhere. Here we use the fact that an injective operator S is closed if and only if S^{-1} is closed; in fact, this is obvious because G(S) = {(x, Sx) : x ∈ D(S)} and G(S^{-1}) = {(Sx, x) : x ∈ D(S)}. Conversely, if T − z is invertible as a map and (T − z)^{-1} ∈ B(H), then obviously z ∈ ρ(T).


These remarks confirm that the definition is natural. The resolvent set consists of those z ∈ C for which T − z is invertible as a map and the inverse map lies in B(H), and this is a direct generalization of our earlier definition for bounded operators. As before, we call R(z) = (T − z)^{-1} (z ∈ ρ(T)) the resolvent of T.

Proposition 11.8. ρ(T) is an open subset of C.

This is proved as in the case of bounded operators. If T − z_0 is invertible in B(H), then we can write T − z = (T − z_0)(1 − (z − z_0)(T − z_0)^{-1}) (as usual, the domains require constant attention, but they do not cause any trouble in this formula) and then use the Neumann series to show that the second factor is invertible in B(H) if |z − z_0| is small. See our discussion in Chapter 7, especially Corollary 7.5 and Theorem 7.7(a).

Proposition 11.9 (First resolvent identity). Let T be a closed operator and w, z ∈ ρ(T). Then R(w) − R(z) = (w − z)R(w)R(z); in particular, R(w) and R(z) commute.

Proof. We have that
(w − z)R(w)R(z) = R(w)((T − z) − (T − w))R(z) = R(w) − R(z),
which appears to verify the claim. However, this is a formal calculation and we also need to take the domains into account. More precisely, the domain of R(w)((T − z) − (T − w))R(z) is the space of those x ∈ H for which R(z)x ∈ D(T), but, fortunately, this is all of H because R(z), being the inverse of T − z, has range D(T − z) = D(T). So the above calculation is sound. □

Exercise 11.9. Here's an illustration of the kind of trouble we might run into if we let our guard down and just manipulate formally, without watching domains: Show that RS + RT ⊂ R(S + T), and give an example (perhaps an easy abstract example) where the two sides don't have the same domain.

Corollary 11.10. Let T be self-adjoint. Then σ(T) ⊂ R.

Proof. By combining (b) and (c) of Theorem 11.6, we see that N(T − z) = {0}, R(T − z) = H for every z ∉ R. □

The following example once again demonstrates the dramatic effect that domain issues can have.


Example 11.2. Consider again the operator $f \mapsto i f'$ on $L^2(0,1)$, on the following domains:
\[ D(S) = \{ f \in AC[0,1] : f' \in L^2(0,1) \}, \qquad D(T) = \{ f \in D(S) : f(0) = 0 \}. \]
So $S$ is the operator $T^*$ from Example 11.1.

Exercise 11.10. Prove that both operators are closed.

I claim that $\sigma(S) = \mathbb{C}$, $\sigma(T) = \emptyset$. The claim on $\sigma(S)$ is very easy to confirm. Just notice that $e_z(x) = e^{-izx} \in D(S)$ for all $z \in \mathbb{C}$ and $(S - z)e_z = 0$. To find $\sigma(T)$, fix $z \in \mathbb{C}$ and let
\[ (R_z f)(x) = -i e^{-izx} \int_0^x e^{izt} f(t)\, dt. \]
This is defined for all $f \in L^2(0,1)$, and in fact $(R_z f)(x)$ is an absolutely continuous function of $x \in [0,1]$. In particular, $R_z f \in L^2(0,1)$. An easy calculation shows that $(R_z f)'(x) = -iz(R_z f)(x) - i f(x)$ (almost everywhere). This implies that $(R_z f)' \in L^2(0,1)$, and since clearly $(R_z f)(0) = 0$, we have that $R_z f \in D(T)$. Moreover, $(T - z)R_z = 1$ (note that the observations about $R_z$ mapping to $D(T)$ are needed here to be able to define the left-hand side on all of $H$), so $R(T - z) = H$. Similar arguments (use an integration by parts!) show that $R_z(T - z)f = f$ for all $f \in D(T)$, so we also obtain that $N(T - z) = \{0\}$ (since $R_z 0 = 0$). Putting things together, we see that $z \in \rho(T)$, and $z \in \mathbb{C}$ was arbitrary here.

We want to formulate and prove the Spectral Theorem for unbounded self-adjoint operators also. From a purely formal point of view, things look very familiar:

Theorem 11.11 (The Spectral Theorem for self-adjoint operators). Let $T$ be a self-adjoint operator. Then there exists a unique spectral resolution $E$ (on the Borel sets of $\sigma(T)$) so that
\[ T = \int_{\sigma(T)} t\, dE(t). \tag{11.5} \]

However, we must tread very carefully here. If T is unbounded, then σ(T ) could be an unbounded subset of R (in fact, as we will prove later, σ(T ) is never bounded unless T ∈ B(H)), and thus (11.5) involves a


new kind of integral that we haven't even defined yet (the integrand is an unbounded function). Clearly, we first need to address this issue, and we will do this in an abstract setting. So let $E : \Omega \to B(H)$ be a resolution of the identity on an arbitrary space $(\Omega, \mathcal{M})$. We want to extend our earlier definition of $\int_\Omega f\, dE$ to unbounded measurable functions $f$. If $f : \Omega \to \mathbb{C}$ is such an arbitrary measurable function, we let
\[ D_f = \left\{ x \in H : \int_\Omega |f(t)|^2 \, d\|E(t)x\|^2 < \infty \right\}. \]
As a preliminary, we observe the following:

Lemma 11.12. $D_f$ is a dense subspace of $H$. If $x \in D_f$ and $y \in H$, then
\[ \int_\Omega |f| \, d|\mu_{y,x}| \le \|y\| \left( \int_\Omega |f|^2 \, d\mu_{x,x} \right)^{1/2}. \tag{11.6} \]



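Here, we use the same notation as in Chapter 10: $\mu_{y,x}$ denotes the complex measure $\mu_{y,x}(\omega) = \langle y, E(\omega)x \rangle$. We sometimes write $d\mu_{y,x} = d\langle y, Ex \rangle$ instead.

Proof. Suppose that $z = x + y$, with $x, y \in D_f$. Then, for all $\omega \in \mathcal{M}$, we have that
\[ \|E(\omega)z\|^2 \le \bigl( \|E(\omega)x\| + \|E(\omega)y\| \bigr)^2 \le 2\|E(\omega)x\|^2 + 2\|E(\omega)y\|^2 . \]
This says that $\mu_{z,z}(\omega) \le 2\mu_{x,x}(\omega) + 2\mu_{y,y}(\omega)$, so $z \in D_f$. If $c \in \mathbb{C}$ and $x \in D_f$, then $\mu_{cx,cx}(\omega) = |c|^2 \mu_{x,x}(\omega)$, so $cx \in D_f$ also.

To prove that $D_f$ is dense, put $\omega_n = \{ t \in \Omega : |f(t)| < n \}$. If $y \in R(E(\omega_n))$, then $\mu_{y,y}(\omega_n^c) = 0$ (why?), so
\[ \int_\Omega |f|^2 \, d\mu_{y,y} = \int_{\omega_n} |f|^2 \, d\mu_{y,y} \le n^2 \|y\|^2 < \infty, \]
and thus $y \in D_f$. I now claim that $E(\omega_n)x \to x$ for arbitrary $x \in H$. To verify this, notice that $\|x - E(\omega_n)x\|^2 = \mu_{x,x}(\omega_n^c)$. This goes to zero because the sets $\omega_n^c$ decrease to the empty set: $\omega_1^c \supset \omega_2^c \supset \ldots$, $\bigcap \omega_n^c = \emptyset$ (you can also apply Monotone Convergence to the functions $1 - \chi_{\omega_n^c} = \chi_{\omega_n}$).

We first prove (11.6) for bounded $f$. Write $d|\mu_{y,x}| = u \, d\mu_{y,x}$, with $|u| = 1$. Then
\[ \int_\Omega |f| \, d|\mu_{y,x}| = \int_\Omega u|f| \, d\mu_{y,x} = \langle y, \Psi(u|f|)x \rangle \le \|y\| \, \|\Psi(u|f|)x\| = \|y\| \left( \int_\Omega |f|^2 \, d\mu_{x,x} \right)^{1/2}, \]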


as claimed. Here, we make use of the notation $\Psi(g) = \int g \, dE$ (as in Chapter 10). For general measurable $f$, we can apply this to fn = χ{|f | 0, then z ∈ σ(T ). Then establish this property, by constructing a sequence $x_n \in D(T)$ with $\|x_n\| = 1$ and $(T - z)x_n \to 0$.


Finally, we sketch a possible proof of the uniqueness assertion: We can, conversely, go from a representation $T = \int t \, dE(t)$ of $T$ to a representation $(T - i)^{-1} = \int z \, dF(z)$ of the resolvent, by a change of variables again. Moreover, it is possible to recover $E$ from $F$. Since we know that $F$ is unique (Theorem 10.5), $E$ must be unique, too. □

From the Spectral Theorem, we again obtain a functional calculus for self-adjoint operators. More precisely, if $T = \int t \, dE(t)$ and $f : \sigma(T) \to \mathbb{C}$ is measurable, then we put $f(T) = \int f(t) \, dE(t)$. Note that this is well defined even if both $f$ and $T$ are unbounded. If $f$ is (essentially) bounded, then $f(T) \in B(H)$, whether or not $T$ is bounded. In fact, it was exactly this property of self-adjoint operators (with $f(t) = 1/(t - i)$) that made our proof of the Spectral Theorem work.

Unbounded self-adjoint operators also have spectral representations, that is, they are unitarily equivalent to multiplication by the variable on a sum of $L^2(\mathbb{R}, d\rho)$ spaces, but we will not develop this result here.

Exercise 11.15. Let $\rho$ be a finite (positive) Borel measure on $\mathbb{R}$. Let
\[ D(M) = \{ f \in L^2(\mathbb{R}, d\rho) : t f(t) \in L^2(\mathbb{R}, d\rho) \}, \qquad (Mf)(t) = t f(t). \]
Prove that $M$ is a self-adjoint operator on $L^2(\mathbb{R}, d\rho)$.

Theorem 11.15. Let $T$ be a self-adjoint operator. Then $T \in B(H)$ if and only if $\sigma(T)$ is a bounded set.

Exercise 11.16. Prove this. One direction is of course already known to us, and the other direction follows quickly from the Spectral Theorem and Theorem 11.13(b).

Exercise 11.17. Let $T$ be a linear operator. Let, for $x, y \in D(T)$, $[x, y] = \langle x, y \rangle + \langle Tx, Ty \rangle$. Prove that this defines a new scalar product on $D(T)$. Then show that $T$ is closed if and only if $D(T)$ is complete with respect to $[\cdot, \cdot]$.

Exercise 11.18. Here's an example of an operator that is not closable. Let $H = \ell^2$, $D_0 = \{ y \in \ell^2 : y_n = 0 \text{ for all } n \ge N = N(y) \}$. Fix a vector $x \notin D_0$, let's say $x_n = 1/n$, and put
\[ D(T) = L(x) \dotplus D_0, \qquad T(cx + y) = cx \]
(where $c \in \mathbb{C}$, $y \in D_0$). Show that $T$ is a densely defined linear operator that is not closable.

Exercise 11.19. Consider again the operator $T$ from the previous Exercise. Show that $D(T^*) = \{x\}^\perp$, which is not dense (this again implies that $T$ is not closable, by Theorem 11.4), and $T^* y = 0$ for all $y \in D(T^*)$.


Exercise 11.20. Here's a more spectacular example of an operator with non-densely defined $T^*$. Let $H = L^2(-1, 1)$,
\[ D(T) = \left\{ f \in C^\infty(-1,1) \cap L^2(-1,1) : |f^{(j)}(0)| \le C_f \, 2^{-j} j! \ (j \ge 0) \right\}, \qquad (Tf)(x) = \sum_{j=0}^\infty \frac{f^{(j)}(0)}{j!} x^j ; \]
so $T$ sends $f$ to its Taylor series, and the domain only contains functions for which this series converges uniformly and absolutely. Show that $T$ is a densely defined operator with $D(T^*) = \{0\}$.

Suggested strategy: It may be easier to prove an abstract version of this statement first. Establish the following statements:
(a) If $T$ is an arbitrary linear operator and $N(T)$ is dense, then $D(T^*) = N(T^*)$ (in a concrete setting, we already encountered this statement in Exercise 11.19).
(b) If $R(T)$ is also dense, then $D(T^*) = \{0\}$.
(c) Finally, show that $N(T)$ and $R(T)$ are dense for the operator defined above.

Exercise 11.21. Prove that the operator $T$ from Example 11.1 is not bounded. Do this directly, by constructing functions $f_n \in C_0^\infty(0,1)$ with $\|f_n\| = 1$, $\|f_n'\| \to \infty$.

Exercise 11.22. Let $T$ be a self-adjoint operator and let $M \subset H$ be a closed subspace. We call $M$ a reducing subspace if $D(T) = (M \cap D(T)) + (M^\perp \cap D(T))$ and both $M$ and $M^\perp$ are invariant under $T$: if $x \in M \cap D(T)$, then $Tx \in M$, and similarly on $M^\perp \cap D(T)$. Roughly speaking, these conditions say that $T$ can be split into two parts, one on $M$ and a second part on $M^\perp$.
(a) Show that $M$ is reducing if and only if $PT \subset TP$, where $P$ denotes the projection onto $M$.
(b) Let $E$ be the spectral resolution of $T$. Show that $R(E(B))$ is a reducing subspace for every Borel set $B \subset \mathbb{R}$. Hint: Theorem 11.13(c)

Exercise 11.23. Show that if $T$ is self-adjoint, $z \notin \sigma(T)$, and $f(t) = 1/(t - z)$, then $f(T) = R(z)$ (as expected).

Exercise 11.24. Similarly, show that if $f(t) = t^2$, then $f(T)$ agrees with the direct definition of $T^2 = TT$ that was discussed at the beginning of this chapter. (More generally, if $f = p$ is a polynomial, then the functional calculus just reproduces the direct definition of $p(T)$.)
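The statements of Exercises 11.23 and 11.24 can be previewed in finite dimensions, where the spectral resolution is a finite sum of eigenprojections and $f(T) = \sum_j f(\lambda_j) P_j$. The sketch below assumes hypothetical spectral data on $\mathbb{C}^2$ (the eigenvalues, the angle `theta`, and the point `z` are illustration choices, not from the notes); it does not replace the domain arguments needed for unbounded $T$.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def max_abs_diff(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(2) for j in range(2))

# Hypothetical spectral data: eigenvalues and a real orthonormal eigenbasis.
theta = 0.7
eigenvalues = [-1.0, 2.5]
eigenvectors = [(math.cos(theta), math.sin(theta)),
                (-math.sin(theta), math.cos(theta))]

def f_of_T(f):
    """Functional calculus: f(T) = sum_j f(lambda_j) P_j with P_j = v_j v_j^*."""
    return [[sum(f(lam) * v[i] * v[j]
                 for lam, v in zip(eigenvalues, eigenvectors))
             for j in range(2)] for i in range(2)]

T = f_of_T(lambda t: t)
z = 0.3 + 1.0j                                   # z is not in sigma(T)
Rz = f_of_T(lambda t: 1 / (t - z))               # candidate for R(z)
T_minus_z = [[T[i][j] - (z if i == j else 0) for j in range(2)]
             for i in range(2)]
identity = [[1, 0], [0, 1]]

# (T - z) f(T) should be the identity when f(t) = 1/(t - z)
resolvent_error = max_abs_diff(matmul(T_minus_z, Rz), identity)
# f(T) with f(t) = t^2 should agree with the matrix product T T
square_error = max_abs_diff(f_of_T(lambda t: t * t), matmul(T, T))
print(resolvent_error, square_error)
```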

12. The formalism of quantum mechanics

In this chapter, we discuss some mathematical issues related to the theory of quantum mechanics. We will first give a quick description of the formal structure of quantum mechanics and then prove a number of mathematical (functional analytic) results that are relevant in this context. I will also make a few (slightly off-topic) more philosophical remarks on the interpretation of the formalism; I should perhaps point out that these are very close in spirit to the so-called Copenhagen interpretation of quantum mechanics, which, in turn, is close in spirit to Kant's idealistic philosophy and theory of cognition. There have been other and radically different attempts, too (for example, the notorious many worlds interpretation), and I'm not particularly familiar with these. Quantum mechanics according to the Copenhagen interpretation, if nothing else, gives a consistent picture, and it opens up a strange new world of breathtaking elegance and awe-inspiring beauty.

We base our discussion of quantum mechanics on the following principles:

(K1) The state space of a quantum mechanical system is a Hilbert space $H$; the quantum mechanical states correspond to the one-dimensional subspaces of $H$. We will usually use normalized vectors ($\|\psi\| = 1$) to represent states.

(K2) Physical observables correspond to self-adjoint operators on $H$. The system interacts with the observer through measurements of observables (and in no other way). The outcome of a measurement is genuinely random. If the system is in the state $\psi \in H$, $\|\psi\| = 1$, at the time of the measurement of the observable $A = A^*$, then the probability to observe a value in $M \subset \mathbb{R}$ is given by $P_A(M) = \|E_A(M)\psi\|^2$, where $E_A$ denotes the spectral resolution of $A$: $A = \int_{\mathbb{R}} t \, dE_A(t)$.

(U) When a measurement has been performed and a value in $M$ of the observable $A$ was observed, then the state must be updated according to
\[ \psi_{\mathrm{new}} = \frac{E_A(M)\psi}{\|E_A(M)\psi\|}. \]

(D) If no measurement is carried out, then the state evolves according to a group of unitary operators:
\[ \psi(t) = U(t)\psi(0), \qquad U(s + t) = U(s)U(t), \]
and here $U(t)$ is a unitary operator on $H$ for every $t \in \mathbb{R}$.
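To make (K2) and (U) concrete, here is a minimal finite-dimensional sketch. It assumes a hypothetical observable $A = \mathrm{diag}(1, 1, 2)$ on $\mathbb{C}^3$, so that $E_A(M)$ simply keeps the coordinates whose eigenvalue lies in $M$; the state `psi` is likewise an illustration choice.

```python
import math

# Hypothetical observable A = diag(1, 1, 2) on C^3 (not from the notes).
eigenvalues = [1.0, 1.0, 2.0]

def spectral_projection(M, psi):
    """E_A(M) psi: keep the coordinates whose eigenvalue lies in the set M."""
    return [c if lam in M else 0.0 for lam, c in zip(eigenvalues, psi)]

def norm(v):
    return math.sqrt(sum(abs(c) ** 2 for c in v))

def probability(M, psi):
    """(K2): P_A(M) = ||E_A(M) psi||^2 for a normalized state psi."""
    return norm(spectral_projection(M, psi)) ** 2

def update(M, psi):
    """(U): psi_new = E_A(M) psi / ||E_A(M) psi||."""
    projected = spectral_projection(M, psi)
    length = norm(projected)
    return [c / length for c in projected]

psi = [0.5, 0.5, math.sqrt(0.5)]          # normalized state
p_one = probability({1.0}, psi)           # probability of observing the value 1
psi_new = update({1.0}, psi)              # state after observing the value 1
p_repeat = probability({1.0}, psi_new)    # probability of observing 1 again
print(p_one, p_repeat)
```

In this sketch the value 1 is observed with probability 1/2, and an immediate repetition of the measurement reproduces it with probability 1 (up to rounding).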


(K1), (K2) show how a system is described at a fixed point in time (“kinematics”), (U) shows how to update the state when new information becomes available through a measurement, and (D) describes the dynamics of ψ for an unobserved system. Note that ψ is a purely formal construct; it acquires physical meaning only through (K2), when a measurement is performed. It is a list of potential knowledge about the system, perhaps similar to a weather forecast (but note the ingenious way in which probabilities for all possible measurements are encoded in ψ, making the crude weather forecast pale in comparison).

On a more philosophical level, we can say that ψ does not describe the “system as such,” but only how it is perceived through measurements. This simple precaution immediately gets rid of a number of alleged “paradoxes” such as Schrödinger’s cat, and it also demystifies the “discontinuity” in the evolution of ψ that comes with (U) and that has occasionally been perceived as troublesome: of course, probabilities have to be updated instantaneously as new information becomes available. In fact, far from introducing any mysterious discontinuity, (U) implies a very desirable continuity property: if we measure an observable, and, having completed this measurement, immediately measure the same observable again, then, with probability 1, we will observe the outcome of the first measurement again.

Notice how the emphasis shifts here, compared to classical physics. Some scientists and philosophers have found it tempting, for reasons that are unclear to me, to decree that, whether or not we choose to observe a classical system, it will evolve according to a set of physical laws, and if we do observe it, we just read off what the equations tell the system to do. Adherents of such a realistic (in the technical sense, as the opposite of idealistic) philosophy usually proudly declare that physical objects “exist,” independently of their being observed, but fail to explain what verifiable consequences this “existence” might have (or what exactly it refers to). One can probably avoid immediate disaster by a judicious application of this unwarranted “existence” assumption in classical physics, but this does not seem to be good philosophical practice. More to the point, this philosophy becomes untenable in quantum mechanics. Matters are very different: The theory explicitly only talks about how the system interacts with the observer, and in fact it would be cleaner to not insist on the existence of a system or anything else having a life of its own, independent of the observer.

(K1)–(D) describe the skeleton of quantum mechanics. While this style of developing the theory is sometimes referred to as axiomatic quantum mechanics, (K1)–(D) are not axioms in the sense the word is


usually used in mathematics; for a careful development of the theory, further analysis and clarifying comments are required. For example, (U) might cause difficulties if $A$ is such that $E_A(\{x\}) = 0$ for all $x \in \mathbb{R}$ (many self-adjoint operators are of this type). Indeed, one suggestion is to just admit projections as observables, corresponding to yes-no questions as the only admissible measurements. We don't want to discuss these issues here.

Moreover, and perhaps more to the point for the working physicist, (K1)–(D) show how the theory works in principle, but not how to model concrete systems. For example, if I want to study the hydrogen atom in its ground state, and perform a measurement of the angular momentum of the electron, what are $H$, $\psi$, $A$, $U(t)$? Clearly, this is a totally separate issue. We will not say much about this here.

As our first mainly mathematical topic in this chapter, we discuss an important result that gives a reformulation of (D). To prepare for this, let $T$ be a self-adjoint operator and let $U(t) = e^{-itT}$. Recall that this is defined via the functional calculus as $U(t) = \int_{\mathbb{R}} e^{-its} \, dE(s)$, where $E$ is the spectral resolution of $T$. By the properties of the functional calculus, $U(t)$ is unitary for every $t \in \mathbb{R}$ and $U(s + t) = U(s)U(t)$.

Exercise 12.1. Prove these properties of $U(t)$.

In other words, $U(t)$ is a unitary group as in (D). Moreover, $U(t)$ is also strongly continuous: this means that for every fixed $x \in H$, the map $\mathbb{R} \to H$, $t \mapsto U(t)x$ is continuous. To see this, notice that
\[ \|U(t)x - U(s)x\|^2 = \int_{\mathbb{R}} \left| e^{-itv} - e^{-isv} \right|^2 d\mu_{x,x}(v) \]
by the properties of the functional calculus. The Dominated Convergence Theorem shows that the right-hand side goes to zero as $s \to t$, as claimed.

Exercise 12.2. Let $T$ be a self-adjoint operator on a Hilbert space $H$, and let $U(t) = e^{-itT}$. Show that the map $\mathbb{R} \to B(H)$, $t \mapsto U(t)$ is continuous if and only if $T \in B(H)$.

Exercise 12.3. Let again $T$ be self-adjoint and $U(t) = e^{-itT}$. Suppose that $x \in D(T)$. Show that then $U(t)x \in D(T)$ for all $t \in \mathbb{R}$. Moreover, show that the map $t \mapsto U(t)x$ is differentiable and $(d/dt)U(t)x = -iTU(t)x$ (this of course means that $\lim_{h \to 0} (1/h)(U(t + h) - U(t))x$ exists (in $H$) and equals $-iTU(t)x$). Hint: Use an argument similar to the one that was used above to establish the strong continuity of $U(t) = e^{-itT}$.
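In finite dimensions all of these properties can be checked directly, since $U(t) = \sum_j e^{-it\lambda_j} P_j$ is a finite sum. The following sketch assumes hypothetical spectral data on $\mathbb{C}^2$ (eigenvalues, the angle `theta`, the times `s`, `t` and the vector `x` are illustration choices); it checks the group law, unitarity, and the derivative formula from Exercise 12.3 by a difference quotient.

```python
import cmath
import math

# Hypothetical spectral data for a self-adjoint T on C^2.
theta = 0.4
eigenvalues = [0.5, 3.0]
eigenvectors = [(math.cos(theta), math.sin(theta)),
                (-math.sin(theta), math.cos(theta))]

def calculus(f):
    """f(T) = sum_j f(lambda_j) P_j for the spectral data above."""
    return [[sum(f(lam) * v[i] * v[j]
                 for lam, v in zip(eigenvalues, eigenvectors))
             for j in range(2)] for i in range(2)]

def U(t):
    return calculus(lambda lam: cmath.exp(-1j * t * lam))

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def adjoint(A):
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]

def apply(A, x):
    return [sum(A[i][k] * x[k] for k in range(2)) for i in range(2)]

def max_abs_diff(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(2) for j in range(2))

s, t = 0.8, -1.7
group_error = max_abs_diff(U(s + t), matmul(U(s), U(t)))
unitarity_error = max_abs_diff(matmul(adjoint(U(t)), U(t)), [[1, 0], [0, 1]])

# Difference quotient (U(t+h) - U(t)) x / h compared with -i T U(t) x.
x, h = [1.0, 2.0], 1e-6
T = calculus(lambda lam: lam)
quotient = [(p - q) / h for p, q in zip(apply(U(t + h), x), apply(U(t), x))]
expected = [-1j * c for c in apply(T, apply(U(t), x))]
derivative_error = max(abs(p - q) for p, q in zip(quotient, expected))
print(group_error, unitarity_error, derivative_error)
```

The first two errors are at rounding level; the third is of order `h`, as expected for a one-sided difference quotient.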


Stone's Theorem asserts that, conversely, every strongly continuous unitary group is of this type.

Theorem 12.1 (Stone). Let $U(t)$ be a strongly continuous unitary group. Then there exists a unique self-adjoint operator $T$ so that $U(t) = e^{-itT}$.

We call $T$ the (infinitesimal) generator of $U(t)$.

Exercise 12.4. Let $U(t)$ be a unitary group. Prove that $U(0) = 1$. Also, prove that $U(t)^* = U(-t)$.

Proof. From Exercise 12.3, we already have at least a vague idea of how to find such a $T$: we have to “differentiate” $U(t)$. It thus seems natural to define
\[ D(S) = \left\{ x \in H : \lim_{h \to 0} \frac{U(h) - 1}{h} x \text{ exists} \right\}, \]
\[ Sx = i \lim_{h \to 0} \frac{U(h) - 1}{h} x. \tag{12.1} \]
It is easy to see that $D(S)$ is a subspace and $S$ is linear. We next claim that $D(S)$ is dense. To show this, we will make use of Hilbert space valued integrals. However, we will not develop this subject carefully here; instead, we will leave some of the details to the reader. For $x \in H$ and $f \in C_0^\infty(\mathbb{R})$, we want to define
\[ x_f = \int_{-\infty}^{\infty} f(t) U(t) x \, dt. \tag{12.2} \]
What exactly do we mean by this? Since the integrand takes values in $H$, this question certainly has to be asked. Fortunately, several good answers are available. Here, we only need to be able to integrate continuous functions, so (generalized) Riemann sums provide a convenient interpretation of (12.2): We take $R \in \mathbb{N}$ so large that $\operatorname{supp} f \subset (-R, R)$, then form
\[ \frac{1}{N} \sum_{n = -RN}^{RN} f(n/N) \, U(n/N) x \]
and finally take the limit $N \to \infty$ to define the right-hand side of (12.2). Existence of this limit is an easy consequence of the continuity of the integrand, just as in the elementary theory of the Riemann integral. In the sequel, we will make use of some (very plausible) basic properties of this new integral without worrying too much about their formal verification; we leave this to the reader (see Exercise 12.7 below).
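The Riemann sums just described are easy to compute in a toy case. The sketch below assumes a hypothetical diagonal unitary group on $\mathbb{C}^2$ and the standard bump function $f(t) = \exp(-1/(1 - t^2))$ for $|t| < 1$ (both are illustration choices, not from the notes), and compares the sums for two values of $N$.

```python
import cmath
import math

def bump(t):
    """A standard C_0^infty bump function with support [-1, 1]."""
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1 else 0.0

# Hypothetical unitary group on C^2: U(t) = diag(e^{-i t * 0}, e^{-i t * 2}).
eigenvalues = [0.0, 2.0]

def U_apply(t, x):
    return [cmath.exp(-1j * t * lam) * c for lam, c in zip(eigenvalues, x)]

def riemann_sum(f, x, R, N):
    """(1/N) * sum_{n=-RN}^{RN} f(n/N) U(n/N) x, as in the text."""
    total = [0j] * len(x)
    for n in range(-R * N, R * N + 1):
        t = n / N
        weight = f(t)
        if weight != 0.0:
            total = [acc + weight * c
                     for acc, c in zip(total, U_apply(t, x))]
    return [c / N for c in total]

x = [1.0, 1.0]
coarse = riemann_sum(bump, x, R=2, N=200)   # supp f = [-1, 1] lies in (-2, 2)
fine = riemann_sum(bump, x, R=2, N=400)
change = max(abs(p - q) for p, q in zip(coarse, fine))
print(fine, change)
```

Since the first eigenvalue is 0, the first component of the sum approximates $\int f$, which is roughly 0.444 for this bump function; the change between $N = 200$ and $N = 400$ is tiny, in line with the claimed existence of the limit.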


First of all, we claim that $x_f \in D(S)$ whenever $f \in C_0^\infty(\mathbb{R})$. This follows from the following calculation:
\[ \frac{U(h) - 1}{h} x_f = \frac{1}{h} \int_{\mathbb{R}} f(t) \bigl( U(t + h) - U(t) \bigr) x \, dt = \int_{\mathbb{R}} \frac{f(t - h) - f(t)}{h} \, U(t) x \, dt \]
Now as $h \to 0$, we have that $(f(t - h) - f(t))/h \to -f'(t)$, uniformly in $t \in \mathbb{R}$.

Exercise 12.5. Prove this. The point here of course is the uniform convergence; convergence for fixed $t$ just follows from the definition of the derivative.

From this, it follows that $(1/h)(U(h) - 1)x_f \to x_{-f'}$; this shouldn't be very hard to believe because Riemann integration can be interchanged with uniform limits. In particular, $x_f \in D(S)$, as claimed.

Now if $x \in H$ is arbitrary, fix an $f \in C_0^\infty(\mathbb{R})$ with $\int f = 1$, and let $f_n(t) = n f(nt)$, $x_n = x_{f_n}$. Then
\[ \|x_n - x\| = \left\| \int_{\mathbb{R}} f_n(t) \bigl( U(t)x - x \bigr) \, dt \right\| \le \int_{\mathbb{R}} |f_n(t)| \, \|U(t)x - x\| \, dt. \]

Notice that the $f_n$ are supported by $(-R/n, R/n)$, for suitable fixed $R > 0$, and sup|t| 0, we can find a finite set $M \subset \mathbb{Z}^d$ so that $P_{M^c}(t) < \epsilon$ for all $t \in \mathbb{R}$, then we call $\psi$ a bound state. We write $H_{ss}$, $H_{ws}$, $H_b$ for the corresponding subsets of $\ell^2$.

So, roughly speaking, if the system is in a scattering state, then the particle will leave every bounded set if you just wait long enough (the usual qualifications apply: the system of course doesn't do anything other than respond to questions in the form of measurements; it is really the information in the form of probabilities as encoded in $\psi(t)$ that evolves, not something with an existence of its own). In a weak scattering state, we can only make such a statement about the time averaged probabilities. If the system is in a bound state, on the other hand, it can essentially be confined to a bounded set for all times.

Obviously, $H_{ss} \subset H_{ws}$. More can be said:

Proposition 12.4. $H_{ss}$, $H_{ws}$, and $H_b$ are closed subspaces and $\ell^2 = H_{ws} \oplus H_b$.

We postpone the proof because other results that we will develop later will come in handy here.

We want to relate the dynamically defined subspaces $H_{ss}$, $H_{ws}$, $H_b$ to spectral subspaces, so we need to discuss this topic first. We do this in an abstract setting. So let $T$ be a self-adjoint operator on a Hilbert space $H$. Recall that a Borel measure $\rho$ on $\mathbb{R}$ is called absolutely continuous if $\rho(B) = 0$ for all Borel sets $B \subset \mathbb{R}$ of Lebesgue measure zero. By the Radon-Nikodym Theorem, $\rho$ is absolutely continuous if and only if $d\rho(t) = f(t) \, dt$ for some density $f \in L^1_{\mathrm{loc}}(\mathbb{R})$, $f \ge 0$. If $\rho$ is supported by a Lebesgue null set (that is, there exists a Borel set $B \subset \mathbb{R}$ with $m(B) = \rho(B^c) = 0$), then we say that $\rho$ is singular. If $\rho$ is even supported by a countable set, then we call $\rho$ a (pure) point measure. We call $\rho$ continuous if $\rho(\{x\}) = 0$ for all $x \in \mathbb{R}$, and a singular continuous measure is a


measure that is both singular and continuous (the standard example being the Cantor measure). Any Borel measure $\rho$ on $\mathbb{R}$ can be uniquely decomposed into absolutely continuous, singular continuous, and point parts:
\[ \rho = \rho_{ac} + \rho_{sc} + \rho_{pp} \tag{12.3} \]

Exercise 12.9. Derive this refined decomposition from Lebesgue's decomposition theorem (see, for example, Folland, Real Analysis, Theorem 3.8), which says that we can, in a unique fashion, write $\rho = \rho_{ac} + \rho_s$, where $\rho_{ac}$ is absolutely continuous and $\rho_s$ is singular. In other words, you need to further break up $\rho_s$.

We now apply these notions to spectral measures to define the spectral subspaces. We write $d\mu_x(t) = d\|E(t)x\|^2$ for the spectral measure of $T$ and $x$ (we used to denote this by $\mu_{x,x}$).

Definition 12.5. The absolutely continuous, singular continuous, and pure point subspaces are defined as follows:
\[ H_{ac} = \{ x \in H : \mu_x \text{ absolutely continuous} \} \]
\[ H_{sc} = \{ x \in H : \mu_x \text{ singular continuous} \} \]
\[ H_{pp} = \{ x \in H : \mu_x \text{ pure point measure} \} \]

Theorem 12.6. $H_{ac}$, $H_{sc}$, $H_{pp}$ are closed subspaces; in fact, they are reducing subspaces for $T$. Moreover,
\[ H = H_{ac} \oplus H_{sc} \oplus H_{pp}. \tag{12.4} \]

Proof. We first show that the $H_j$ ($j = ac, sc, pp$) are closed subspaces and that (12.4) holds. We will make use of the following fact, or rather the version for three subsets.

Exercise 12.10. Let $A$, $B$ be subsets of a Hilbert space $H$ and suppose that $A \perp B$ and $H = A + B$ (that is, every $x \in H$ can be written in the form $x = a + b$ with $a \in A$, $b \in B$). Show that then $A = B^\perp$, $B = A^\perp$, and thus $H = A \oplus B$.

Let $x \in H$, and decompose $\mu_x$ as in (12.3): $\mu_x = \mu_{ac} + \mu_{sc} + \mu_{pp}$. By the defining properties of the individual parts, we can find disjoint Borel sets $S_{ac}$, $S_{sc}$, $S_{pp}$ that support the corresponding $\mu$'s. Then their union supports $\mu_x$ (or we can just assume that this union is all of $\mathbb{R}$), so
\[ x = E(S_{ac} \cup S_{sc} \cup S_{pp})x = E(S_{ac})x + E(S_{sc})x + E(S_{pp})x. \]
Notice that, for example,
\[ \mu_{E(S_{ac})x}(M) = \|E(M)E(S_{ac})x\|^2 = \|E(M \cap S_{ac})x\|^2 = \mu_x(M \cap S_{ac}) = \mu_{ac}(M), \]


and similarly for the other parts, so $E(S_j)x \in H_j$ for $j = ac, sc, pp$. This proves that $H = H_{ac} + H_{sc} + H_{pp}$.

To prove that these sets are orthogonal to each other, let $x \in H_{ac}$, $y \in H_{sc}$, say. As above, the corresponding spectral measures admit disjoint supports $S_{ac}$, $S_{sc}$ (because one measure is absolutely continuous, the other is singular). It follows that
\[ \langle x, y \rangle = \langle E(S_{ac})x, E(S_{sc})y \rangle = \langle x, E(S_{ac})E(S_{sc})y \rangle = 0. \]
An argument of this type works in all cases.

To prove that these subspaces are reducing, we will use the criterion from Exercise 11.22(a). So let $P$ be the projection onto $H_{ac}$, say; we want to show that $PT \subset TP$. Notice that Theorem 11.13(c) implies that
\[ E(A)T \subset TE(A) \tag{12.5} \]
for all Borel sets $A \subset \mathbb{R}$. Let $x \in D(T)$. Fix again a Borel set $S \subset \mathbb{R}$ that supports $(\mu_x)_{ac}$ and is given zero weight by the singular part of $\mu_x$. Then, as above, $Px = E(S)x$. Moreover, $E(S)x \in D(T)$, too, so $x \in D(TP)$ and $TPx = TE(S)x$. By (12.5), this equals $E(S)Tx$, so it just remains to show that $E(S)Tx = PTx$. Now (12.5) also implies that
\[ d\mu_{Tx}(t) = t^2 \, d\mu_x(t). \tag{12.6} \]
From the first part of the proof, we know that we can obtain $P(Tx)$ as $E(M)(Tx)$, where the set $M \subset \mathbb{R}$ needs to be chosen so that it supports the ac part of $\mu_{Tx}$ and is given zero weight by the singular part of the same measure. By (12.6), a set that works for $x$ will also work for $Tx$, so we can take $M = S$. □

It is useful to note that $H_{pp}$ has an alternative description. As usual, we call $x \in H$, $x \ne 0$ an eigenvector with eigenvalue $t \in \mathbb{R}$ if $x \in D(T)$ and $Tx = tx$.

Proposition 12.7. $H_{pp}$ is the closed linear span of the eigenvectors of $T$.

Exercise 12.11. Prove that $x \ne 0$ is an eigenvector with eigenvalue $t$ if and only if $E(\{t\})x = x$.

Proof. If $x$ is an eigenvector with eigenvalue $t$, then, by the Exercise, $x = E(\{t\})x$, so $\mu_x(M) = \|E(M)x\|^2 = \|E(M \cap \{t\})x\|^2$ is supported by $\{t\}$ and thus definitely a pure point measure. In other words, $x \in H_{pp}$ for all eigenvectors $x$. Since $H_{pp}$ is a closed subspace,


this implies that the closed linear span of the eigenvectors is contained in $H_{pp}$. Conversely, if $x \in H_{pp}$ and $\{t_j : j \in \mathbb{N}\}$ supports $\mu_x$, then
\[ x = E(\{t_j : j \in \mathbb{N}\})x = \lim_{N \to \infty} \sum_{j=1}^{N} E(\{t_j\})x, \tag{12.7} \]
and by Exercise 12.11 again, this shows that $x$ is in the closed linear span of the eigenvectors. □

Exercise 12.12. Give a careful argument for the second equality in (12.7).

We now return to the situation where $H = \ell^2(\mathbb{Z}^d)$ and consider the dynamical subspaces from Definition 12.3.

Theorem 12.8. $H_{ws} = H_c$, $H_b = H_{pp}$. Moreover, $H_{ac} \subset H_{ss}$.

Here, $H_c$ denotes the continuous subspace $H_c = H_{ac} \oplus H_{sc}$; in other words, $x \in H_c$ if and only if $\mu_x$ is a continuous measure.

Theorem 12.8 depends on two easy classical results on the Fourier transform of measures; in fact, it could be argued that Theorem 12.8 is essentially a rephrasing of these results. To see why Fourier transforms are relevant here, denote the standard ONB of $\ell^2(\mathbb{Z}^d)$ by $\{\delta_n : n \in \mathbb{Z}^d\}$. So $\delta_n(m) = 1$ if $m = n$ and $\delta_n(m) = 0$ otherwise. Write
\[ \hat{\mu}(t) = \int_{\mathbb{R}} e^{-its} \, d\mu(s) \]
for the Fourier transform of a finite (possibly complex) Borel measure $\mu$. Then
\[ P_M(t) = \sum_{n \in M} \left| \langle \delta_n, e^{-itT}\psi \rangle \right|^2 = \sum_{n \in M} |\hat{\rho}_n(t)|^2, \]
where we use the notation $\rho_n(B) = \langle \delta_n, E(B)\psi \rangle$ for the (complex) spectral measure associated with the vectors $\delta_n$, $\psi$.

Here are the two results about Fourier transforms that will form the basis of our discussion. Our measures are still assumed to be complex Borel measures on $\mathbb{R}$; in particular, they must be finite if they are positive.

Theorem 12.9 (Riemann-Lebesgue Lemma). Let $\mu$ be an absolutely continuous measure. Then $\lim_{|t| \to \infty} \hat{\mu}(t) = 0$.

Theorem 12.10 (Wiener).
\[ \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} |\hat{\mu}(t)|^2 \, dt = \sum_{x \in \mathbb{R}} |\mu(\{x\})|^2 \]
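As a quick numerical illustration of Wiener's formula, consider a measure with finitely many atoms, $\mu = \sum_j w_j \delta_{x_j}$. Then $\hat{\mu}(t) = \sum_j w_j e^{-itx_j}$, and the time average can be evaluated in closed form. The atoms and weights in the sketch below are hypothetical illustration data, and the sketch is of course no substitute for a proof.

```python
import math

# Hypothetical pure point measure mu = sum_j w_j * delta_{x_j}: pairs (x_j, w_j).
atoms = [(0.0, 0.5), (1.0, 0.3), (math.sqrt(2.0), 0.2)]

def sinc(u):
    return 1.0 if u == 0.0 else math.sin(u) / u

def averaged_square(T):
    """(1/2T) * integral_{-T}^{T} |mu_hat(t)|^2 dt, in closed form.

    Expanding |mu_hat(t)|^2 = sum_{j,k} w_j w_k exp(-i t (x_j - x_k)) and
    integrating term by term gives sum_{j,k} w_j w_k sinc((x_j - x_k) T).
    """
    return sum(wj * wk * sinc((xj - xk) * T)
               for xj, wj in atoms for xk, wk in atoms)

wiener_limit = sum(w * w for _, w in atoms)     # sum of |mu({x})|^2
averages = {T: averaged_square(T) for T in (1e1, 1e2, 1e3, 1e4)}
for T, value in averages.items():
    print(T, value, wiener_limit)
```

The off-diagonal terms decay like $1/T$, so the averages approach $\sum_j w_j^2 = 0.38$ as $T$ grows.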


Existence of the limit is part of the statement in Wiener's Theorem. Note that $\mu(\{x\}) \ne 0$ for at most countably many $x \in \mathbb{R}$, so there are no difficulties involved in defining the sum. We will not use the formula from Wiener's Theorem, but the following immediate Corollary:

Corollary 12.11.
\[ \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} |\hat{\mu}(t)|^2 \, dt = 0 \]
if and only if $\mu_{pp} = 0$.

Exercise 12.13. Provide proofs of the Riemann-Lebesgue Lemma and Wiener's Theorem. You can of course look these up in the literature, but it might be more fun to try for yourself first.

Proof of Theorem 12.8 and Proposition 12.4. Notice that $|\rho_n(B)| = |\langle \delta_n, E(B)\psi \rangle| \le \|E(B)\psi\|$, so $\rho_n \ll \mu_\psi$. Thus, if $\psi \in H_{ac}$, then the $\rho_n$ ($n \in \mathbb{Z}^d$) are absolutely continuous measures, too. The Riemann-Lebesgue Lemma now shows that
\[ P_M(t) = \sum_{n \in M} |\hat{\rho}_n(t)|^2 \to 0 \qquad (t \to \pm\infty) \]
for all finite $M \subset \mathbb{Z}^d$, so $H_{ac} \subset H_{ss}$.

A similar argument, with Corollary 12.11 replacing the Riemann-Lebesgue Lemma, shows that $H_c \subset H_{ws}$. Conversely, if $\psi \in H_{ws}$, then in particular
\[ \frac{1}{2T} \int_{-T}^{T} |\hat{\rho}_n(t)|^2 \, dt \to 0 \]
for all $n \in \mathbb{Z}^d$, so, by Corollary 12.11 again, every $\rho_n$ is a continuous measure. But then $\mu_\psi$ is a continuous measure, too, because if $x \in \mathbb{R}$ is an arbitrary point, then
\[ \mu_\psi(\{x\}) = \langle \psi, E(\{x\})\psi \rangle = \sum_n \overline{\psi(n)} \, \langle \delta_n, E(\{x\})\psi \rangle = \sum_n \overline{\psi(n)} \, \rho_n(\{x\}) = 0. \]
In other words, $\psi \in H_c$. This identification $H_{ws} = H_c$ also proves that $H_{ws}$ is a closed (in fact: reducing) subspace, as claimed in Proposition 12.4.

Next, we show directly that $H_b$ is a closed subspace also. Obviously, $c\psi \in H_b$ if $\psi \in H_b$, and the estimate
\[ \left| \langle \delta_n, e^{-itT}(\psi_1 + \psi_2) \rangle \right|^2 \le 2 \left| \langle \delta_n, e^{-itT}\psi_1 \rangle \right|^2 + 2 \left| \langle \delta_n, e^{-itT}\psi_2 \rangle \right|^2 \]


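makes it clear that also $\psi_1 + \psi_2 \in H_b$ if $\psi_{1,2} \in H_b$. If $\psi_j \in H_b$, $\psi_j \to \psi$, and $\epsilon > 0$ is given, pick first $j \in \mathbb{N}$ (large enough) so that $\|\psi_j - \psi\|^2 < \epsilon$ and then $M \subset \mathbb{Z}^d$, $M$ finite, so large that
\[ \sum_{n \notin M} \left| \langle \delta_n, e^{-itT}\psi_j \rangle \right|^2 < \epsilon \]
for all $t \in \mathbb{R}$ (and this $j$). Since
\[ \sum_{n \in \mathbb{Z}^d} \left| \langle \delta_n, e^{-itT}\psi_j \rangle - \langle \delta_n, e^{-itT}\psi \rangle \right|^2 = \|e^{-itT}(\psi_j - \psi)\|^2 = \|\psi_j - \psi\|^2, \]
by Parseval's identity, it then follows that
\[ \sum_{n \notin M} \left| \langle \delta_n, e^{-itT}\psi \rangle \right|^2 < 4\epsilon \]
for all $t \in \mathbb{R}$. Thus $\psi \in H_b$, and $H_b$ is closed.

Exercise 12.14. Prove in a similar way that $H_{ss}$ is a closed subspace.

Next, let $\psi$ be an eigenvector, with eigenvalue $s$, say. Then $\langle \delta_n, e^{-itT}\psi \rangle = e^{-ist} \langle \delta_n, \psi \rangle$, so
\[ P_{M^c}(t) = \sum_{n \notin M} |\langle \delta_n, \psi \rangle|^2, \]
and Parseval's identity ensures that this can be made arbitrarily small by taking $M$ sufficiently large. So all eigenvectors belong to $H_b$, and $H_b$ is a closed subspace, so Proposition 12.7 shows that $H_{pp} \subset H_b$.

Finally, we claim that $H_b \perp H_{ws}$. To prove this, let $\varphi \in H_b$, $\psi \in H_{ws}$. Then
\[ \langle \varphi, \psi \rangle = \langle e^{-itT}\varphi, e^{-itT}\psi \rangle = \frac{1}{2T} \int_{-T}^{T} \langle e^{-itT}\varphi, e^{-itT}\psi \rangle \, dt = \frac{1}{2T} \int_{-T}^{T} \sum_{n \in \mathbb{Z}^d} \langle e^{-itT}\varphi, \delta_n \rangle \langle \delta_n, e^{-itT}\psi \rangle \, dt. \]
We split this sum into two parts, as follows. Given $\epsilon > 0$, we can find a finite set $M \subset \mathbb{Z}^d$ so that $\|\psi\|^2 \sum_{n \notin M} |\langle e^{-itT}\varphi, \delta_n \rangle|^2 < \epsilon^2$ for all $t \in \mathbb{R}$. With the help of the Cauchy-Schwarz inequality and Parseval's identity, we thus see that
\[ \left| \sum_{n \notin M} \langle e^{-itT}\varphi, \delta_n \rangle \langle \delta_n, e^{-itT}\psi \rangle \right| < \epsilon, \]
and thus the corresponding time averaged quantity is also $< \epsilon$. Since $\psi \in H_{ws}$, we have that $\frac{1}{2T} \int_{-T}^{T} |\langle \delta_n, e^{-itT}\psi \rangle|^2 \, dt \to 0$ for all $n \in \mathbb{Z}^d$.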


Now $M$ is finite, so we can take $T > 0$ so large that also
\[ \left| \sum_{n \in M} \frac{1}{2T} \int_{-T}^{T} \langle e^{-itT}\varphi, \delta_n \rangle \langle \delta_n, e^{-itT}\psi \rangle \, dt \right| < \epsilon. \]
We again use the Cauchy-Schwarz inequality here, and in fact we use it twice, to estimate the integral as well as the sum. Our estimates have shown that $|\langle \varphi, \psi \rangle| < 2\epsilon$, and $\epsilon > 0$ was arbitrary here, so $\langle \varphi, \psi \rangle = 0$, as claimed.

Put differently, we have that $H_b \subset H_{ws}^\perp$. We know already that $H_{ws} = H_c$ (see the first part of this proof) and $H_c^\perp = H_{pp}$ (see (12.4)), so it follows that $H_b \subset H_{pp}$. Putting things together, we obtain that $H_b = H_{pp}$, and this completes the proof. □

The decomposition from Theorem 12.6 also induces decompositions of the other quantities that are involved here. This makes use of the fact that we have reducing subspaces. More specifically, we can write $T = T_{ac} \oplus T_{sc} \oplus T_{pp}$, where, for example, $T_{ac} : H_{ac} \to H_{ac}$, $D(T_{ac}) = D(T) \cap H_{ac}$, and if $x \in D(T_{ac})$, then $T_{ac}x = Tx$. Or we can say that $T_{ac} = TP_{ac}$, but restricted to the smaller Hilbert space $H_{ac}$. We also define $\sigma_j(T) := \sigma(T_j)$, where $j = ac, sc, pp$. As in Chapter 10 (see especially Proposition 10.9), we then have that
\[ \sigma(T) = \sigma_{ac}(T) \cup \sigma_{sc}(T) \cup \sigma_{pp}(T). \]
The union is not necessarily disjoint, as the following Example shows. We will make use of the following fact, which is of considerable independent interest.

Proposition 12.12. The (pure) point spectrum, $\sigma_{pp}$, is the closure of the set of eigenvalues.

Recall that (much) earlier, we introduced the notation $\sigma_p$ for the set of eigenvalues; so we can now say that $\sigma_{pp} = \overline{\sigma_p}$. In particular, $\sigma_{pp}$ can be much larger than $\sigma_p$.

Proof. $H_{pp}$ contains all eigenvectors, so every eigenvalue of $T$ is an eigenvalue of $T_{pp}$ also, so $\sigma_p \subset \sigma_{pp}$. Since $\sigma_{pp}$, being a spectrum, is closed, we in fact obtain that $\overline{\sigma_p} \subset \sigma_{pp}$. Conversely, if $t \in \sigma_{pp} = \sigma(T_{pp})$, then $E((t - r, t + r))x = x$ for some $x = x_r \in H_{pp}$ for all $r > 0$, but $\mu_x$ is a pure point measure if $x \in H_{pp}$ and an $s \in \mathbb{R}$ with $E(\{s\}) \ne 0$ is an eigenvalue by Exercise 12.11, so this means that there are eigenvalues arbitrarily close to $t$, so $t \in \overline{\sigma_p}$. □


Example 12.1. This is in fact very easy from an abstract point of view. We can start out with, let's say, an absolutely continuous operator on one Hilbert space and a pure point operator on a second space and then assemble these by taking the orthogonal sum. We just need to make sure that the spectra were not disjoint to start with.

For example, let $H_0 = L^2(0, 1)$, $(T_0 f)(x) = x f(x)$. Note that $T_0 \in B(H_0)$, $T_0 = T_0^*$. In fact, $T_0$ is already given as multiplication by the variable, as in a Spectral Representation, so the spectral theory of $T_0$ is easy to work out. We claim that $H_{ac} = H_0$ for this operator. Indeed, if $f \in H_0$ is arbitrary, then
\[ \mu_f(M) = \|E(M)f\|^2 = \int |\chi_M f|^2 \, dx. \]
In this context, recall the discussion following Proposition 10.10, where the spectral resolution was identified as $E(M)f = \chi_M f$. So $d\mu_f = |f|^2 \, dx$, and, as claimed, all spectral measures are absolutely continuous. Finally, notice that $\sigma(T_0) = [0, 1]$.

Next, let $H_1$ be an arbitrary separable Hilbert space. Let $\{x_n\}$ be an ONB of $H_1$, and fix an enumeration $\{q_n\}$ of $\mathbb{Q} \cap [0, 1]$. Define $T_1 x_n = q_n x_n$ and extend linearly.

Exercise 12.15. Prove that $T_1$ is bounded on $L(x_n)$ and thus extends continuously to $\overline{L(x_n)} = H_1$. Show that (the extended) $T_1 \in B(H_1)$, $H_1 = H_{pp}$, $\sigma(T_1) = [0, 1]$.

We can now let $H = H_0 \oplus H_1$, $T = T_0 \oplus T_1$. Then, basically by construction, $T_0$ and $T_1$ are the absolutely continuous and pure point parts, respectively, of $T$. In particular, $\sigma_{ac}(T) = \sigma_{pp}(T) = [0, 1]$.

Exercise 12.16. Let $T$ be self-adjoint. Prove that every isolated point of $\sigma(T)$ is an eigenvalue of $T$.

Exercise 12.17. Let $H = \mathbb{C}^n$ be a finite-dimensional Hilbert space. Show that then $H_{pp} = H$, $H_{ac} = H_{sc} = \{0\}$ for every self-adjoint $T$ on $H$.

Exercise 12.18. Let $P$ be the projection onto the closed subspace $M \subset H$, and let $T \in B(H)$ be self-adjoint, $T = \int_{\mathbb{R}} t \, dE(t)$. Prove that $M$ is a reducing subspace for $T$ if and only if $PE(B) = E(B)P$ for all Borel sets $B \subset \mathbb{R}$. Hint: By Exercise 11.22(a), $M$ is reducing if and only if $PT = TP$. Remark: The statement also holds for unbounded self-adjoint $T$, but the proof becomes more technical in this case.


Exercise 12.19. Let $\mu$ be a finite (positive) Borel measure on $\mathbb{R}$, with Lebesgue decomposition $\mu = \mu_{ac} + \mu_{sc} + \mu_{pp}$. Consider the (self-adjoint) operator of multiplication by the variable on $L^2(\mathbb{R}, \mu)$:
\[ D(M_t) = \{ f \in L^2(\mu) : t f(t) \in L^2(\mu) \}, \qquad (M_t f)(t) = t f(t). \]
Prove that $H_j$ may be naturally identified with $L^2(\mathbb{R}, \mu_j)$, for $j = ac, sc, pp$. More precisely, proceed as follows: Pick disjoint supports $S_j$ of $\mu_j$, and show that $P_j f = \chi_{S_j} f$, where $P_j$ denotes the projection onto $H_j$ ($j = ac, sc, pp$). Given this, we then indeed have that $H_j = \{ f \in L^2(\mu) : f = 0 \ \mu\text{-a.e. on } S_j^c \}$, which may be identified with $L^2(S_j, \mu)$ and also with $L^2(\mathbb{R}, \mu_j)$.

13. Jacobi matrices and orthogonal polynomials

We now want to analyze one-dimensional discrete Schrödinger operators in some detail. These are given by $S : \ell^2(I) \to \ell^2(I)$,
\[ (Su)_n = u_{n+1} + u_{n-1} + V_n u_n . \tag{13.1} \]
Here, usually $I = \mathbb{Z}$ or $I = \mathbb{N}$, and both cases are in fact of considerable interest. If $I = \mathbb{N}$, then we need to slightly modify the definition of $S$: we then let $(Su)_1 = u_2 + V_1 u_1$ (or, put differently, we declare the now meaningless expression $u_0$ to be zero). Other intervals $I \subset \mathbb{Z}$ could be considered, too, if we make similar adjustments at the boundary.

The sequence $V_n$ is called the potential. We assume that $V_n \in \mathbb{R}$ and $V \in \ell^\infty$. The theory could be developed without this latter assumption, but if $V \in \ell^\infty$, then $S \in B(\ell^2)$ and matters become somewhat easier technically.

Exercise 13.1. Prove that $S \in B(\ell^2(I))$ and $S = S^*$ if $V \in \ell^\infty(I)$; here, $I = \mathbb{N}$ or $I = \mathbb{Z}$.

Physically, the operator $S$ provides a discrete model for a quantum mechanical particle, subject to an external field, which is described by the potential energy $V$. From a mathematical point of view, it is often advantageous to consider the larger class of Jacobi matrices (or Jacobi operators) $J : \ell^2(I) \to \ell^2(I)$,
\[ (Ju)_n = a_n u_{n+1} + a_{n-1} u_{n-1} + b_n u_n . \]
Again, if $I = \mathbb{N}$, then we put $(Ju)_1 = a_1 u_2 + b_1 u_1$ to obtain an operator on $\ell^2(\mathbb{N})$. We also assume that $a_n > 0$, $b_n \in \mathbb{R}$ and, for simplicity, that $a, b \in \ell^\infty$.

Exercise 13.2. Prove again that under these assumptions, $J \in B(\ell^2)$ and $J = J^*$.

Clearly, the Schrödinger operator $S$ from (13.1) is a special Jacobi matrix, with $a_n \equiv 1$ and $V_n = b_n$. The most basic situation arises when $I = \mathbb{N}$, so we begin our discussion with this case. As $J$ is self-adjoint, there is a spectral representation of $J$, that is, we can find (finite) Borel measures $\mu_j$ on $\mathbb{R}$ and a unitary transformation $U : \ell^2 \to \bigoplus L^2(\mathbb{R}, \mu_j)$ so that $UJU^{-1} = M_t$. This is what we get from the abstract theory, but in this concrete setting, it is usually better to work with a specific spectral representation that is especially natural here.
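For readers who like to experiment, the action of $J$ on $\ell^2(\mathbb{N})$ can be imitated by a finite section. The following is only a sketch under assumptions that are not part of the notes: the helper name `jacobi_matrix` and the choice to truncate to the first $N$ coordinates are ours, and NumPy is assumed to be available.

```python
import numpy as np

def jacobi_matrix(a, b):
    """N x N finite section of J (hypothetical helper).

    a = [a_1, ..., a_{N-1}] (off-diagonal), b = [b_1, ..., b_N] (diagonal).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.diag(b) + np.diag(a, 1) + np.diag(a, -1)

# Free case a_n = 1, b_n = 0 (the discrete Schroedinger operator with V = 0).
N = 6
J = jacobi_matrix(np.ones(N - 1), np.zeros(N))

# Apply J to u = (1, 2, ..., N); the first row implements (Ju)_1 = a_1 u_2 + b_1 u_1.
u = np.arange(1.0, N + 1)
Ju = J @ u
```

The matrix is real symmetric, which is the finite-dimensional shadow of $J = J^*$, and its operator norm is bounded by $2\sup a_n + \sup |b_n|$, in line with Exercise 13.2.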


Introduce the difference expression $\tau$ by
\[ (\tau u)_n = a_n u_{n+1} + a_{n-1} u_{n-1} + b_n u_n . \]
Formally, this looks the same as $J$, but we will apply $\tau$ to arbitrary sequences $u$, not necessarily from $\ell^2$. To evaluate $(\tau u)_1$, we need $a_0$, and we can just assign an arbitrary (positive) value, say $a_0 = 1$. The following elementary observation will be very important for what follows: If $u \in \ell^2$, then $(\tau u)_n = (Ju)_n$ for $n \geq 2$, but $(\tau u)_1 = (Ju)_1 + a_0 u_0$. (If $u \notin \ell^2$, then of course it is meaningless to talk about $Ju$, but we can consider $\tau u$, without any difficulties.)

Let $z \in \mathbb{C}$ and consider the difference equation $(\tau - z)u = 0$. We seek solutions in the space of all sequences $(u_n)_{n \geq 0}$. The following properties of the difference equation are elementary, but very important for what follows.

Proposition 13.1. Let $c, d \in \mathbb{C}$ and $k \in \mathbb{N}_0$, and let $f$ be an arbitrary sequence. Then there exists a unique solution $u$ of $(\tau - z)u = f$ with $u_k = c$, $u_{k+1} = d$.

To prove this, just define $u_k = c$, $u_{k+1} = d$ and use the difference equation to recursively find $u_{k+2}, u_{k+3}, \ldots$ and $u_{k-1}, u_{k-2}, \ldots$. Since only this $u$ can work, we also have uniqueness.

Exercise 13.3. Show that the set of solutions $u$ to $(\tau - z)u = 0$ is a two-dimensional vector space.

Exercise 13.4. Let $f$ be an arbitrary sequence. Show that the general solution of the inhomogeneous equation $(\tau - z)y = f$ can be obtained as $y = u + y^{(0)}$, where $u$ is the general solution of the homogeneous equation $(\tau - z)u = 0$ and $y^{(0)}$ is a fixed solution of the inhomogeneous problem.

Exercise 13.5. Suppose that $u, v$ both solve the equation $(\tau - z)w = 0$. Show that then the Wronskian
\[ W_n(u, v) = a_n (u_n v_{n+1} - u_{n+1} v_n) \]
is independent of $n \in \mathbb{N}$.

Exercise 13.6. Prove the variation of constants formula: Fix two solutions $u, v$ of $(\tau - z)y = 0$ with $W(u, v) = 1$, and let $K(n, j) = u_j v_n - u_n v_j$. Then
\[ y_n = \sum_{j=1}^{n} K(n, j) f_j \quad (n \geq 1), \qquad y_0 = 0 \]
solves the inhomogeneous equation $(\tau - z)y = f$.
Remark: This is called variation of constants because the formula may be obtained from such an ansatz (make the constants in the general solution to the homogeneous problem $n$-dependent).

Theorem 13.2. Let $z \in \mathbb{C} \setminus \mathbb{R}$. Then $(\tau - z)u = 0$ has exactly one linearly independent solution $u \in \ell^2(\mathbb{N})$.


By this we mean that the set of solutions $u \in \ell^2(\mathbb{N})$ forms a one-dimensional subspace, but the slightly less precise formulation given above is very common. In fact, one often just says that $(\tau - z)u = 0$ has precisely one $\ell^2$ solution.

Proof. $J$ is self-adjoint, so $\sigma(J) \subset \mathbb{R}$, and thus $J - z$ is invertible in $B(\ell^2)$. Let $u = (J - z)^{-1}\delta_1$. Then $u \in \ell^2$ and $(J - z)u = \delta_1$. In particular, this says that $((\tau - z)u)_n = 0$ for $n \geq 2$. By suitably defining $u$ at $n = 0$, we can also achieve that $((\tau - z)u)_1 = 0$, so this (extended) $u$ is an $\ell^2$ solution.

If there were a second, linearly independent $\ell^2$ solution $v$, then all solutions of $(\tau - z)y = 0$ would be in $\ell^2$. In particular, there would be an $\ell^2$ solution $y$ with $y_0 = 0$ (but $y \not\equiv 0$). We obtain that $(J - z)y = (\tau - z)y = 0$, but this is a contradiction because the self-adjoint operator $J$ cannot have $z \notin \mathbb{R}$ as an eigenvalue. $\square$

It is convenient to fix a basis of the solution space. For $z \in \mathbb{C}$, let $p_n(z)$, $q_n(z)$ be the solutions of $(\tau - z)u = 0$ with the initial values
\[ a_0 p_0(z) = 0, \quad p_1(z) = 1; \qquad a_0 q_0(z) = -1, \quad q_1(z) = 0 . \]
By iterating the difference equation, we see that for fixed $n \in \mathbb{N}$, $p_n(z)$ and $q_n(z)$ are polynomials in $z$ of degree $n - 1$ and $n - 2$, respectively. Notice that $W(p, q) = 1$.

Definition 13.3. Let $z \in \mathbb{C}^+ = \{ z \in \mathbb{C} : \operatorname{Im} z > 0 \}$. The Titchmarsh-Weyl $m$ function is defined as the unique number $m(z) \in \mathbb{C}$ for which
\[ f_n(z) = q_n(z) + m(z) p_n(z) \in \ell^2(\mathbb{N}). \tag{13.2} \]
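The basis $p_n$, $q_n$ fixed above is easy to generate by iterating the difference equation. The sketch below is illustrative only: the function names `solve_forward` and `wronskian`, the random test coefficients, and the array layout (index $n$ holds $a_n$, with `b[0]` unused) are assumptions of this example, not notation from the notes.

```python
import numpy as np

def solve_forward(z, a, b, u0, u1):
    """Solve (tau - z)u = 0 forward from the initial values u_0, u_1.

    a[n] = a_n for n = 0..N-1 (a_0 is the arbitrary positive boundary value),
    b[n] = b_n for n = 1..N-1 (b[0] is unused). Returns [u_0, ..., u_N].
    """
    u = [complex(u0), complex(u1)]
    for n in range(1, len(a)):
        # a_n u_{n+1} = (z - b_n) u_n - a_{n-1} u_{n-1}
        u.append(((z - b[n]) * u[n] - a[n - 1] * u[n - 1]) / a[n])
    return np.array(u)

def wronskian(u, v, a):
    """W_n(u, v) = a_n (u_n v_{n+1} - u_{n+1} v_n) for n = 0..N-1."""
    return a * (u[:-1] * v[1:] - u[1:] * v[:-1])

rng = np.random.default_rng(0)
N = 8
a = rng.uniform(0.5, 1.5, N)     # a_0, ..., a_{N-1} > 0
b = rng.uniform(-1.0, 1.0, N)    # b[0] unused, then b_1, ..., b_{N-1}
z = 0.3 + 0.4j

p = solve_forward(z, a, b, 0.0, 1.0)          # a_0 p_0 = 0,  p_1 = 1
q = solve_forward(z, a, b, -1.0 / a[0], 0.0)  # a_0 q_0 = -1, q_1 = 0
W = wronskian(p, q, a)                        # should be constant, equal to 1

# Free case a_n = 1, b_n = 0 at z = 3: p = 0, 1, 3, 8, 21.
p_free = solve_forward(3.0, np.ones(4), np.zeros(4), 0.0, 1.0)
```

Up to rounding, every entry of `W` equals 1, which is the statement $W(p, q) = 1$ together with the $n$-independence from Exercise 13.5.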

To see that this is indeed well defined, recall from Theorem 13.2 that there is a unique $\ell^2$ solution if $z \in \mathbb{C}^+$. Moreover, $p, q$ clearly form a basis of the solution space, so a representation of the type (13.2) is possible, and $m(z)$ will be unique, unless $p$ itself is the $\ell^2$ solution. This, however, is not possible, because $p_0 = 0$, so $(J - z)p = (\tau - z)p = 0$ if $p \in \ell^2$, and $z$ would be a non-real eigenvalue of $J$.

Theorem 13.4. Let $z \in \mathbb{C}^+$. If $(\tau - z)f = 0$, $f \neq 0$, and $f \in \ell^2$, then
\[ m(z) = -\frac{f_1(z)}{a_0 f_0(z)} . \tag{13.3} \]
Moreover,
\[ m(z) = \langle \delta_1, (J - z)^{-1} \delta_1 \rangle . \tag{13.4} \]
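Both formulas of Theorem 13.4 can be tried out on a finite section $J_N$, which serves here only as a stand-in for $J$. On $\{1, \ldots, N\}$, the role of the $\ell^2$ solution is played by the solution with $f_{N+1} = 0$, and for this solution the quotient from (13.3) and the resolvent matrix element from (13.4) agree exactly. The code below is a sketch under these assumptions; the names `m_by_recursion` and `m_by_resolvent`, the random coefficients, and the array layout are ours.

```python
import numpy as np

def m_by_recursion(z, a, b):
    """-f_1 / (a_0 f_0) for the solution of (tau - z)f = 0 with f_{N+1} = 0, f_N = 1.

    a[n] = a_n for n = 0..N, b[n] = b_n for n = 1..N (b[0] is unused).
    """
    N = len(b) - 1
    f = np.zeros(N + 2, dtype=complex)
    f[N] = 1.0
    for n in range(N, 0, -1):
        # a_{n-1} f_{n-1} = (z - b_n) f_n - a_n f_{n+1}
        f[n - 1] = ((z - b[n]) * f[n] - a[n] * f[n + 1]) / a[n - 1]
    return -f[1] / (a[0] * f[0])

def m_by_resolvent(z, a, b):
    """<delta_1, (J_N - z)^{-1} delta_1> for the N x N finite section J_N."""
    N = len(b) - 1
    J = np.diag(b[1:]) + np.diag(a[1:N], 1) + np.diag(a[1:N], -1)
    e1 = np.zeros(N, dtype=complex)
    e1[0] = 1.0
    return np.linalg.solve(J - z * np.eye(N), e1)[0]

rng = np.random.default_rng(0)
N = 40
a = rng.uniform(0.5, 1.5, N + 1)                            # a_0, ..., a_N
b = np.concatenate(([0.0], rng.uniform(-1.0, 1.0, N)))      # b_1, ..., b_N
z = 0.3 + 0.7j

m_rec = m_by_recursion(z, a, b)
m_res = m_by_resolvent(z, a, b)
```

The two numbers agree up to rounding, and the imaginary part is positive, as it must be for $z \in \mathbb{C}^+$. Note that the arbitrary value $a_0$ drops out of the quotient $-f_1/(a_0 f_0)$, since the recursion determines $a_0 f_0$ rather than $f_0$.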


Proof. If $f$ is specifically the $\ell^2$ solution from (13.2), then $a_0 f_0(z) = -1$, $f_1(z) = m(z)$, so (13.3) clearly holds. An arbitrary $\ell^2$ solution is a constant multiple of this solution, by Theorem 13.2, and the constant drops out when forming the quotient from (13.3).

Now let $f_n$ be as in (13.2), and let $g = (J - z)^{-1}\delta_1$. Then $(J - z)g = \delta_1$, so $((\tau - z)g)_n = 0$ for $n \geq 2$. Moreover, $g \in \ell^2$. By Theorem 13.2, $g_n = c f_n$ for all $n \geq 1$ for some $c \in \mathbb{C}$. By comparing values at $n = 1$, we obtain that $\langle \delta_1, (J - z)^{-1}\delta_1 \rangle = c\,m(z)$. To find $c$ here, we compare values at $n = 2$. From the difference equation, we obtain that
\[ a_1 g_2(z) = (z - b_1) g_1(z) + 1, \qquad a_1 f_2(z) = (z - b_1) f_1(z) - a_0 f_0(z), \]
and $a_0 f_0(z) = -1$, so $c = 1$, and (13.4) holds. $\square$

Exercise 13.7. Write
\[ G(m, n; z) = \langle \delta_m, (J - z)^{-1}\delta_n \rangle \qquad (z \in \mathbb{C} \setminus \mathbb{R}) \]
for the Green function of $J$, and let $f$ be the $\ell^2$ solution from (13.2). Prove that
\[ G(m, n; z) = \begin{cases} p_m(z) f_n(z) & m \leq n \\ p_n(z) f_m(z) & m > n \end{cases} . \tag{13.5} \]
Suggestion: You can try to check this directly, but a systematic approach is also possible, along the following lines: Fix $n \in \mathbb{N}$, and let $g = (J - z)^{-1}\delta_n$. Then $G(m, n) = g_m$. If we also put $g_0 = 0$, then $g$ solves the inhomogeneous problem $(\tau - z)g = \delta_n$. Use the variation of constants formula (Exercise 13.6) with $u = p$, $v = f$ and Exercise 13.4 to write down a representation of $g$, which will involve two unknown constants. Then find these constants from the conditions $g_0 = 0$, $g \in \ell^2$, and confirm (13.5).
Remark: Notice that (13.5) for $m = n = 1$ gives another proof of (13.4).

By the functional calculus, (13.4) shows that
\[ m(z) = \int_{\mathbb{R}} \frac{d\mu(t)}{t - z}, \qquad d\mu(t) = d\|E(t)\delta_1\|^2 ; \tag{13.6} \]
here, $E$ denotes the spectral resolution of $J$. This spectral measure $\mu$ can now also be used to give a very convenient spectral representation of $J$. We could do this by referring to the general theory (see especially the proof of Theorem 10.7), but we prefer to give a direct argument.

Let
\[ (Uf)(t) = \sum_{n=1}^{\infty} p_n(t) f_n . \]
This formula will eventually provide us with a map $U : \ell^2 \to L^2(\mathbb{R}, \mu)$, but at this point, we only define $U$ on finitely supported sequences $f$; then convergence of the sum is not an issue and $(Uf)(t)$ is well defined for every $t \in \mathbb{R}$. To analyze $U$, we will use the following elegant statement:

Lemma 13.5. $p_n(J)\delta_1 = \delta_n$

Proof. We use induction on $n$. Since $p_1(t) = 1$, the claim is obvious for $n = 1$. Since $p(t)$ solves the difference equation $(\tau - t)p = 0$, we have that
\[ a_n p_{n+1}(t) = -a_{n-1} p_{n-1}(t) + (t - b_n) p_n(t). \]
Apply this function to $J$, and then apply the resulting operator to $\delta_1$. We obtain that
\[ a_n p_{n+1}(J)\delta_1 = -a_{n-1} p_{n-1}(J)\delta_1 + (J - b_n) p_n(J)\delta_1 . \]
Use the induction hypothesis on the right-hand side. This gives
\[ a_n p_{n+1}(J)\delta_1 = -a_{n-1}\delta_{n-1} + (J - b_n)\delta_n . \]
Now $(J - b_n)\delta_n = a_n \delta_{n+1} + a_{n-1}\delta_{n-1}$ and $a_n > 0$, so we obtain the desired conclusion that $p_{n+1}(J)\delta_1 = \delta_{n+1}$. $\square$

Now let's look at the norm of $Uf$ in $L^2(\mathbb{R}, d\mu)$. Since $Uf$ is a polynomial in $t$ and $\mu$ is a compactly supported finite measure, this norm is finite, and even if it wasn't, we could still consider $\int |Uf|^2\,d\mu$, because the integrand is non-negative. Write $(Uf)(t) = g(t)$ and recall that $d\mu(t) = d\|E(t)\delta_1\|^2$; so the functional calculus shows that
\[ \int_{\mathbb{R}} |g(t)|^2\,d\mu(t) = \|g(J)\delta_1\|^2 = \Bigl\| \sum_{n=1}^{N} f_n \delta_n \Bigr\|^2 . \]
The last step is by Lemma 13.5. Now clearly this last expression is equal to $\|f\|^2$. So $U$ is isometric on the subspace of finitely supported sequences $f$. Since this subspace is dense, $U$ extends uniquely to an isometry on all of $\ell^2$, which we will still denote by $U$. Moreover, since $U\delta_n$ is a polynomial of degree $n - 1$, all polynomials are in $R(U) \subset L^2(\mathbb{R}, \mu)$. Since these are dense and since $U$ is an isometry, $R(U) = L^2(\mathbb{R}, \mu)$, so $U : \ell^2(\mathbb{N}) \to L^2(\mathbb{R}, \mu)$ is in fact a unitary map.

Exercise 13.8. Let $T \in B(H_1, H_2)$ be an isometry. Show that $R(T)$ is a closed subspace of $H_2$.
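Lemma 13.5 can be checked numerically on a finite section $J_N$: since $p_n$ has degree $n - 1$, the vector $p_n(J)\delta_1$ only involves the first $n$ rows and columns, so the identity $p_n(J_N)\delta_1 = \delta_n$ holds for all $n \leq N$. The following sketch uses a hypothetical helper `p_of_J_delta1` and random test coefficients; it follows the induction step of the proof rather than forming matrix polynomials explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 7
a = rng.uniform(0.5, 1.5, N - 1)   # a_1, ..., a_{N-1}
b = rng.uniform(-1.0, 1.0, N)      # b_1, ..., b_N
J = np.diag(b) + np.diag(a, 1) + np.diag(a, -1)

def p_of_J_delta1(J, a, b):
    """Matrix whose n-th column is p_n(J) delta_1, n = 1..N (hypothetical helper)."""
    N = J.shape[0]
    delta1 = np.zeros(N)
    delta1[0] = 1.0
    vecs = [delta1]          # p_1(J) delta_1 = delta_1
    prev = np.zeros(N)       # stands for a_0 p_0(J) delta_1 = 0
    for n in range(1, N):
        cur = vecs[-1]
        a_prev = a[n - 2] if n >= 2 else 0.0
        # a_n p_{n+1}(J) d1 = (J - b_n) p_n(J) d1 - a_{n-1} p_{n-1}(J) d1
        nxt = (J @ cur - b[n - 1] * cur - a_prev * prev) / a[n - 1]
        prev = cur
        vecs.append(nxt)
    return np.column_stack(vecs)

P = p_of_J_delta1(J, a, b)   # expected: the identity matrix
```

The resulting matrix is the identity up to rounding, which is exactly the statement that the $n$-th column $p_n(J_N)\delta_1$ equals $\delta_n$.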


Theorem 13.6. Let $\mu$ be the measure associated with the $m$ function of $J$, as in (13.6) (equivalently, let $d\mu(t) = d\|E(t)\delta_1\|^2$). Define $U : \ell^2(\mathbb{N}) \to L^2(\mathbb{R}, \mu)$,
\[ (Uf)(t) = \lim_{N \to \infty} \sum_{n=1}^{N} p_n(t) f_n . \]
Then the limit exists as a norm limit in $L^2(\mathbb{R}, \mu)$. Moreover, $U$ is unitary and $UJU^{-1} = M_t$ in $L^2(\mathbb{R}, \mu)$.

In other words, $U$ and $\mu$ can be used as the data in a spectral representation. We will call $\mu$ the spectral measure of $J$. The use of the definite article here slightly contradicts our earlier use of the term spectral measure as any measure of the form $d\|E(t)x\|^2$, but since $\mu$ really plays such a distinguished role here, we will hardly ever feel the urge to use other spectral measures, so that's a very small price to pay.

Proof. Except for the final claim, this has already been discussed above. Note also that $f_N = \chi_{\{1,2,\ldots,N\}} f$ is finitely supported and converges to $f$ in $\ell^2$, and the sum in the definition of $U$ is $Uf_N$, so we do have the norm convergence that is asserted above. To verify the final claim, observe that $M_t U\delta_n = t p_n(t)$ and
\[ UJ\delta_n = U(a_n \delta_{n+1} + a_{n-1}\delta_{n-1} + b_n \delta_n) = a_n p_{n+1}(t) + a_{n-1} p_{n-1}(t) + b_n p_n(t) = t p_n(t), \]
because $p_n$ solves the difference equation $\tau p(t) = t p(t)$. If $n = 1$, the terms containing $a_{n-1}$ have to be omitted in the intermediate steps, but the final result of the calculation is still valid, because $a_0 p_0 = 0$. Now both $M_t U$ and $UJ$ are bounded linear operators, and the linear combinations of the $\delta_n$ are dense in $\ell^2$, so $M_t U = UJ$. We can multiply from the right by $U^{-1}$ to obtain the claim. $\square$

This procedure of setting up a spectral representation is very plausible, at least with hindsight. We introduce the generalized eigenfunctions $p_n$ ("generalized" because there is no guarantee that $p_n \in \ell^2$) and then we just evaluate what formally looks like the scalar product $\langle p(t), f \rangle$; in other words, we compute the "expansion coefficients" of $f$ with respect to the generalized eigenfunctions. This is exactly how one diagonalizes a self-adjoint matrix. Obviously, great care has to be exercised here because the $p(t)$ need not lie in the Hilbert space $\ell^2$, but our analysis shows that one can make the procedure work anyway.
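The remark that this is exactly how one diagonalizes a self-adjoint matrix can be made concrete. For an $N \times N$ finite section $J_N$ (used here as a finite-dimensional analog, which is an assumption of this sketch and not a construction from the notes), the spectral measure of $\delta_1$ is $\mu = \sum_k w_k \delta_{\lambda_k}$ with $w_k = |\langle \delta_1, v_k \rangle|^2$, where $\lambda_k$, $v_k$ are the eigenvalues and normalized eigenvectors. The code checks that $f \mapsto \sum_n p_n(\lambda_k) f_n$ is isometric into $L^2(\mu)$ and intertwines $J_N$ with multiplication by $t$.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8
a = rng.uniform(0.5, 1.5, N - 1)      # a_1, ..., a_{N-1}
b = rng.uniform(-1.0, 1.0, N)         # b_1, ..., b_N
J = np.diag(b) + np.diag(a, 1) + np.diag(a, -1)

lam, V = np.linalg.eigh(J)            # eigenvalues, orthonormal eigenvectors (columns)
weights = V[0, :] ** 2                # mu({lam_k}) = |<delta_1, v_k>|^2

def p_values(t):
    """[p_1(t), ..., p_N(t)] from the recurrence with p_0 = 0, p_1 = 1."""
    p = [0.0, 1.0]
    for n in range(1, N):
        a_prev = a[n - 2] if n >= 2 else 1.0   # a_0 is arbitrary because p_0 = 0
        p.append(((t - b[n - 1]) * p[n] - a_prev * p[n - 1]) / a[n - 1])
    return np.array(p[1:])

# U[k, n-1] = p_n(lam_k), so (U @ f)[k] is (Uf)(lam_k).
U = np.array([p_values(t) for t in lam])

gram = U.T @ np.diag(weights) @ U     # <U delta_m, U delta_n> in L^2(mu)
lhs = U @ J                           # U J
rhs = np.diag(lam) @ U                # M_t U
```

Here `gram` is the identity matrix and `lhs` equals `rhs` up to rounding. The rows of `U` are unnormalized eigenvectors of $J_N$, and the weights $w_k$ are precisely the normalization factors.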


As a by-product of this analysis, we see that a Jacobi matrix on $\ell^2(\mathbb{N})$ has simple spectrum, that is, one $L^2(\mathbb{R}, \mu)$ space suffices for a spectral representation. That could have been observed earlier, based on the following general facts.

Exercise 13.9. Let $T \in B(H)$ be a self-adjoint operator. We call $x \in H$ a cyclic vector for $T$ if $\overline{\{ f(T)x : f \in C(\sigma(T)) \}} = H$. Show that $T$ has simple spectrum if and only if $T$ has a cyclic vector.
Hint: If $T$ has a cyclic vector, return to the proof of Theorem 10.7 to construct a spectral representation consisting of just one $L^2(\mathbb{R}, \mu)$ space. For the converse, it suffices to find a cyclic vector of $M_t$ as an operator on $L^2(\mathbb{R}, \mu)$.

Exercise 13.10. Find a cyclic vector of $J \in B(\ell^2(\mathbb{N}))$.

There is an interesting rephrasing of this material.

Theorem 13.7. Let $\mu$, $p_n(t)$ be as above. Then $\mu(\mathbb{R}) = 1$, and the $p_n$ are the normalized orthogonal polynomials with respect to $\mu$:
\[ \langle p_m, p_n \rangle_{L^2(\mu)} = \int_{\mathbb{R}} p_m(t) p_n(t)\,d\mu(t) = \delta_{mn} \]
There is no complex conjugation in the integral defining the scalar product because the $p_n$ are real valued polynomials (for $t \in \mathbb{R}$). We call the $p_n$ normalized because $\|p_n\| = 1$, and in fact $p_n$ has the additional property that the leading coefficient is positive.

Exercise 13.11. Prove that
\[ p_n(t) = \frac{1}{a_1 a_2 \cdots a_{n-1}}\, t^{n-1} + \text{lower order terms}. \]
In the sequel, the term normalized orthogonal polynomials will refer to polynomials having these two properties: the $p_n$ form an ONS and have positive leading coefficients.

Proof. We have that $p_n = U\delta_n$, so, since $U$ is unitary,
\[ \langle p_m, p_n \rangle_{L^2(\mu)} = \langle U\delta_m, U\delta_n \rangle_{L^2(\mu)} = \langle \delta_m, \delta_n \rangle_{\ell^2} = \delta_{mn} . \qquad \square \]
So a Jacobi matrix produces a probability measure $\mu$, and the $p_n$ are the normalized orthogonal polynomials with respect to this measure. Conversely, suppose that a compactly supported Borel measure $\rho$ with $\rho(\mathbb{R}) = 1$ is given. We also assume that $\rho$ is not supported by a finite set. Then the monomials $1, t, t^2, \ldots$ form a linearly independent subset of $L^2(\mathbb{R}, \rho)$.


Exercise 13.12. Prove this remark.

The Gram-Schmidt procedure yields orthogonal polynomials $p_1, p_2, p_3, \ldots$, of norm one and degree $\deg p_n = n - 1$. If we also insist on positive leading coefficients, then the $p_n$ are uniquely determined by $\rho$. The other defining properties are: $\langle p_m, p_n \rangle = 0$ if $m \neq n$, where this scalar product is taken in $L^2(\rho)$, and $L(p_1, \ldots, p_N) = L(1, \ldots, t^{N-1})$. Notice also that the Gram-Schmidt procedure produces real valued polynomials.

Exercise 13.13. This problem provides a quick review of the Gram-Schmidt procedure. Let $x_1, x_2, x_3, \ldots$ be a linearly independent collection of vectors from a Hilbert space $H$. Define $z_1 = x_1$, $y_1 = z_1 / \|z_1\|$. In general, if $y_1, \ldots, y_{n-1}$ have been constructed, put
\[ z_n = x_n - \sum_{j=1}^{n-1} \langle y_j, x_n \rangle y_j, \qquad y_n = \frac{z_n}{\|z_n\|} . \]
Prove the following statements: $z_n \neq 0$ for all $n \geq 1$, $\{y_n\}$ is an ONS, and $L(y_1, \ldots, y_N) = L(x_1, \ldots, x_N)$ for all $N \geq 1$. Moreover, if $\{y_n'\}$ is another ONS with this property, then $y_n' = e^{i\alpha_n} y_n$ for all $n \geq 1$.

I now claim that the $p_n$ satisfy a three-term (Jacobi type) recurrence relation. Consider $t p_n(t)$. This is a polynomial of degree $n$, and thus must be a linear combination of $p_1, p_2, \ldots, p_{n+1}$:
\[ t p_n = \sum_{j=1}^{n+1} d_j^{(n)} p_j \]
Now $\langle p_j, t p_n \rangle = \langle t p_j, p_n \rangle$, and this is zero if $j \leq n - 2$, because $t p_j$ is of degree $j$ and, by its construction, $p_n$ is orthogonal to every polynomial of degree $< n - 1$. So $d_j^{(n)} = 0$ for $j \leq n - 2$, and the expansion really reads
\[ t p_n = a_n p_{n+1} + b_n p_n + c_n p_{n-1} \tag{13.7} \]
(we have relabeled the coefficients here). By orthogonality,
\[ c_n = \langle p_{n-1}, t p_n \rangle = \langle t p_{n-1}, p_n \rangle = \langle a_{n-1} p_n + b_{n-1} p_{n-1} + c_{n-1} p_{n-2}, p_n \rangle = a_{n-1} . \]
So
\[ t p_n = a_n p_{n+1} + a_{n-1} p_{n-1} + b_n p_n , \tag{13.8} \]
and $a_n, b_n \in \mathbb{R}$ here. This follows because these coefficients can be obtained as scalar products,
\[ a_n = \langle p_{n+1}, t p_n \rangle, \qquad b_n = \langle p_n, t p_n \rangle, \tag{13.9} \]


and, as observed above, the $p_n$ are real valued. Moreover, since $\|p_n\| = 1$ by construction, these formulae also show that $a, b \in \ell^\infty$; here, we make use of the assumption that $\rho$ is compactly supported (so $t$ is an essentially bounded function). Finally, (13.8) also implies, by an argument very similar to Exercise 13.11, that the leading coefficient of $p_n$ is $1/(a_1 \cdots a_{n-1})$, so $a_n > 0$.

If $n = 1$, then again a slight adjustment in the above argument is necessary. In this case, we don't have the third term on the right-hand side of (13.7), so $t p_1 = a_1 p_2 + b_1 p_1$. Or, better yet, we could just declare $p_0 = 0$, and then no adjustment is necessary in the formulae.

Starting out from a measure $\rho$, we have produced a Jacobi difference equation (13.8) that is related to $\rho$ in the following way: If we form the $p_n$ for this Jacobi matrix, then these are the normalized orthogonal polynomials for $\rho$. Is it also true that $\rho = \mu$, the spectral measure of $J$? This turns out to be the case. We will make use of the fact that a compactly supported finite measure is uniquely determined by its moments $M_n = \int t^n\,d\rho(t)$.

Proposition 13.8. Let $\mu, \nu$ be compactly supported finite measures, and suppose that
\[ \int_{\mathbb{R}} t^n\,d\mu(t) = \int_{\mathbb{R}} t^n\,d\nu(t) \]
for all $n \in \mathbb{N}_0$. Then $\mu = \nu$.

Proof. We may view $\mu, \nu$ as measures on $[-R, R]$; here, we take $R > 0$ so large that $\mu([-R, R]^c) = \nu([-R, R]^c) = 0$. If $f \in C[-R, R]$, then, by the Weierstraß approximation theorem, there exists a sequence of polynomials $p_n$ so that $p_n \to f$ uniformly on $[-R, R]$. By hypothesis, $\int p_n\,d\mu = \int p_n\,d\nu$, and Dominated Convergence lets us pass to the limit to obtain that $\int f\,d\mu = \int f\,d\nu$. By the uniqueness part of the Riesz Representation Theorem, $\mu = \nu$. $\square$

Given this, we can now prove easily that $\rho = \mu$. Indeed, for all $n \geq 2$, we have that
\[ \int_{\mathbb{R}} p_n(t)\,d\rho(t) = \langle 1, p_n \rangle = \langle p_1, p_n \rangle = 0, \]
and similarly for $\mu$. Moreover, $\int 1\,d\rho = \int 1\,d\mu = 1$. Since $t^n$ is a linear combination of $p_1 = 1, p_2, \ldots, p_{n+1}$, it follows that $\mu$ and $\rho$ have the same moments, so $\mu = \rho$. We summarize:

Theorem 13.9. The map
\[ (a, b) \mapsto J_{(a,b)} \mapsto d\mu(t) = d\|E(t)\delta_1\|^2 \]
sets up a one-to-one correspondence between bounded Jacobi coefficients $a_n > 0$, $b_n \in \mathbb{R}$, $a, b \in \ell^\infty(\mathbb{N})$ and probability Borel measures of compact, infinite support.

Proof. The discussion above has shown that every such measure $\mu$ is the spectral measure of some Jacobi matrix with bounded coefficients. Conversely, the normalized orthogonal polynomials $p_n$ are uniquely determined by $\mu$, and we saw above that the coefficients $a, b$ in the recursion then have to obey (13.9), so the map $(a, b) \mapsto \mu$ is also injective. $\square$

Example 13.1. Let us now discuss the free Jacobi matrix $a_n = 1$, $b_n = 0$ in some detail. This is in fact a discrete Schrödinger operator, with zero potential; it models a discrete version of a free quantum mechanical particle, on a half line. We will use the $m$ function as the main tool in our analysis. We will compute $m(z)$ with the help of formula (13.3) from Theorem 13.4. So we need to solve the difference equation
\[ f_{n+1} + f_{n-1} = z f_n . \tag{13.10} \]

We can find two linearly independent solutions using the ansatz $f_n = w^n$ (at least, if $z \neq \pm 2$). This $f$ solves (13.10) if $w + 1/w = z$. This quadratic equation has the solutions $w = z/2 \pm \sqrt{z^2/4 - 1}$. Notice that the two solutions satisfy $w_1 w_2 = 1$, so unless $|w_j| = 1$, there is exactly one solution in the open unit disk. Call this solution $w(z)$; also, observe that $|w_j| = 1$, that is, $w_j = e^{i\varphi}$ if and only if $z = 2\cos\varphi$, which happens if and only if $z \in [-2, 2]$. In particular, if $z \in \mathbb{C}^+$, there indeed is a unique solution $w(z)$ with $|w| < 1$.

Clearly, $f_n = w^n \in \ell^2(\mathbb{N})$, and now (13.3) shows that $m(z) = -w(z)$ (recall that $a_0 > 0$ can in principle be given an arbitrary value, but if we use (13.10) for $n = 1$ also, that means that we have chosen $a_0 = 1$).

We now want to extract information on $\mu$ from our knowledge of $m(z)$. A lot can be said about this in a completely general situation; here is the basic result.

Theorem 13.10. Let
\[ F(z) = \int_{\mathbb{R}} \frac{d\mu(t)}{t - z}, \]
where $\mu$ is a finite (positive) Borel measure on $\mathbb{R}$. Then:
(a) $F(t) \equiv \lim_{y \to 0+} F(t + iy)$ exists for (Lebesgue) almost every $t \in \mathbb{R}$;
(b) $d\mu(t) = w^*\text{-}\lim_{y \to 0+} \frac{1}{\pi} \operatorname{Im} F(t + iy)\,dt$; more precisely, this holds when integrated against continuous functions of compact support;
(c) $d\mu_{ac}(t) = \frac{1}{\pi} \operatorname{Im} F(t)\,dt$;
(d) $\mu_s$ is supported by $S = \{ t \in \mathbb{R} : \lim_{y \to 0+} \operatorname{Im} F(t + iy) = \infty \}$, that is, $\mu_s(S^c) = 0$;
(e) $\mu(\{t\}) = \lim_{y \to 0+} -iy F(t + iy)$.

We postpone the proof; let us first use this result to finish the discussion of the free Jacobi matrix. We want to analyze $m(t + iy) = -w(t + iy)$ as $y \to 0+$.

Exercise 13.14. Show that $m(t) = \lim_{y \to 0+} m(t + iy)$ exists for all $t \in \mathbb{R}$ and
\[ m(t) = \begin{cases} -\frac{t}{2} + i\sqrt{1 - \frac{t^2}{4}} & -2 \leq t \leq 2 \\ -\frac{t}{2} - \sqrt{\frac{t^2}{4} - 1} & t < -2 \\ -\frac{t}{2} + \sqrt{\frac{t^2}{4} - 1} & t > 2 \end{cases} ; \]
here, $\sqrt{\ \ }$ refers to the positive square root in all three cases.

Parts (c) and (d) of Theorem 13.10 now imply that $\mu_s = 0$,
\[ d\mu(t) = d\mu_{ac}(t) = \frac{1}{2\pi} \chi_{(-2,2)}(t) \sqrt{4 - t^2}\,dt . \tag{13.11} \]
Recall that the spectrum can be read off from a spectral representation as the topological support of $\mu$ (Proposition 10.8), so
\[ \sigma = \sigma_{ac} = [-2, 2], \qquad \sigma_{sc} = \sigma_{pp} = \emptyset . \]
Since the operator is purely absolutely continuous, Theorem 12.8 shows that all states are strong scattering states. This seems plausible from a physical point of view; note, however, that we also learn that there is no quantum mechanical analog of a classical free particle at rest.

Finally, what are the orthogonal polynomials $p_n$ for the measure $\mu$ from (13.11)? By the general theory, these are the solutions to the difference equation (13.10) with the initial values $p_0 = 0$, $p_1 = 1$. Our discussion above has shown that the two solutions $w^n$, $w^{-n}$ span the solution space, and the linear combination with the specified initial values is
\[ p_n(z) = \frac{w(z)^n - w(z)^{-n}}{w(z) - w(z)^{-1}} . \tag{13.12} \]
In particular, if $z = t \in [-2, 2]$ and we write $t = 2\cos\varphi$ with $\varphi \in [0, \pi]$, then $w(2\cos\varphi) = e^{-i\varphi}$ (this is the correct formula if we take the $w(z)$ from above for $z \in \mathbb{C}^+$ and $w(t) = \lim_{y \to 0+} w(t + iy)$, but actually this is not really an issue here because the right-hand side of (13.12) is invariant under $w \to w^{-1}$). So with this new notation, (13.12) now says that
\[ p_n(2\cos\varphi) = \frac{\sin n\varphi}{\sin\varphi} . \]
Since both sides are entire functions, this formula in fact holds for all $\varphi \in \mathbb{C}$. If we put $P_n(t) = p_n(2t)$, then of course
\[ P_n(\cos\varphi) = \frac{\sin n\varphi}{\sin\varphi}, \]
and this identifies the $P_n$ as the so-called Chebyshev polynomials of the second kind. The $P_n$ are the normalized orthogonal polynomials for the measure $d\nu(t) = (2/\pi)\chi_{(-1,1)}(t)\sqrt{1 - t^2}\,dt$:
\[ \int_{-1}^{1} P_m(t) P_n(t)\,d\nu(t) = \delta_{mn} \]
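The computation in this example lends itself to a quick numerical sanity check. The sketch below (function names, the evaluation points, and the truncation size $N = 200$ are choices made for this illustration) evaluates $m(z) = -w(z)$, compares $\frac{1}{\pi}\operatorname{Im} m(t + iy)$ for small $y$ with the density from (13.11), confirms that the density has total mass 1, and compares $m(z)$ with the resolvent matrix element of a large finite section.

```python
import numpy as np

def w(z):
    """Root of w + 1/w = z inside the open unit disk (z not in [-2, 2])."""
    r = np.sqrt(z * z / 4 - 1 + 0j)
    w1, w2 = z / 2 + r, z / 2 - r
    return w1 if abs(w1) < 1 else w2

def m_free(z):
    """m function of the free Jacobi matrix, m(z) = -w(z)."""
    return -w(z)

def density(t):
    """Density (1 / 2 pi) sqrt(4 - t^2) on (-2, 2), zero elsewhere."""
    t = np.asarray(t, dtype=float)
    return np.sqrt(np.clip(4 - t * t, 0.0, None)) / (2 * np.pi)

def m_truncated(z, N=200):
    """<delta_1, (J_N - z)^{-1} delta_1> for the free N x N finite section."""
    J = np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)
    e1 = np.zeros(N, dtype=complex)
    e1[0] = 1.0
    return np.linalg.solve(J - z * np.eye(N), e1)[0]

y = 1e-6
ts = np.array([-1.5, 0.0, 1.0])
boundary = np.array([m_free(t + 1j * y).imag / np.pi for t in ts])

grid = np.linspace(-2.0, 2.0, 200001)
mass = np.sum(density(grid)) * (grid[1] - grid[0])   # Riemann sum for mu(R)

z0 = 0.3 + 0.5j
gap = abs(m_free(z0) - m_truncated(z0))
```

Away from the real axis the finite section reproduces $m(z)$ to machine precision, because the $\ell^2$ solution $w^n$ decays geometrically; close to $[-2, 2]$ a much larger $N$ would be needed.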
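The identity $p_n(2\cos\varphi) = \sin n\varphi / \sin\varphi$ and the orthonormality with respect to (13.11) can be verified numerically. In the sketch below, `p_free` is a hypothetical helper that iterates (13.10); the quadrature uses the substitution $t = 2\cos\varphi$, under which $d\mu = (2/\pi)\sin^2\varphi\,d\varphi$, and a midpoint rule in $\varphi$, which is exact for the trigonometric polynomials that occur here.

```python
import numpy as np

def p_free(t, n_max):
    """[p_0(t), ..., p_{n_max}(t)] for a_n = 1, b_n = 0: p_{n+1} = t p_n - p_{n-1}."""
    t = np.asarray(t, dtype=float)
    p = [np.zeros_like(t), np.ones_like(t)]
    for _ in range(1, n_max):
        p.append(t * p[-1] - p[-2])
    return np.array(p)

# Check p_n(2 cos phi) = sin(n phi) / sin(phi) at one angle.
phi = 0.7
n = np.arange(0, 7)
from_recursion = p_free(2 * np.cos(phi), 6)
from_formula = np.sin(n * phi) / np.sin(phi)

# Gram matrix of p_1, ..., p_5 in L^2(mu), mu from (13.11), via t = 2 cos(phi).
K = 400
phis = (np.arange(K) + 0.5) * np.pi / K            # midpoint nodes on (0, pi)
P = p_free(2 * np.cos(phis), 5)[1:]                # rows p_1, ..., p_5 at the nodes
quad_weights = (2 / np.pi) * np.sin(phis) ** 2 * (np.pi / K)
gram = (P * quad_weights) @ P.T                    # expected: identity
```

Evaluating the recursion at $2t$ instead of $t$ gives the $P_n$ directly, for example $P_3(t) = p_3(2t) = 4t^2 - 1$.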

The first few polynomials are given by
\[ P_1(t) = 1, \quad P_2(t) = 2t, \quad P_3(t) = 4t^2 - 1, \quad P_4(t) = 8t^3 - 4t . \]
The next three Exercises discuss the Chebyshev polynomials of the first kind from this point of view. Let
\[ P_1 = 1, \qquad P_{n+1}(\cos\varphi) = \sqrt{2}\cos n\varphi \quad (n \geq 1). \]
Exercise 13.15. Prove that $P_{n+1}(t)$ is a polynomial of degree $n$.
Hint: $P_{n+1}(\cos\varphi) = \sqrt{2}\operatorname{Re}(e^{i\varphi})^n = \sqrt{2}\operatorname{Re}(\cos\varphi + i\sin\varphi)^n$

Exercise 13.16. Use the formula
\[ \int_0^{\pi} \cos m\varphi \cos n\varphi\,d\varphi = \begin{cases} \pi & m = n = 0 \\ \pi/2 & m = n \geq 1 \\ 0 & m \neq n \end{cases} \qquad (m, n \in \mathbb{N}_0) \]
to discover a probability Borel measure $\mu$ on $\mathbb{R}$ with $\int P_m(t) P_n(t)\,d\mu(t) = \delta_{mn}$ ($m, n \geq 1$).

Exercise 13.17. Find the Jacobi matrix whose spectral measure is the $\mu$ from the previous Exercise (in other words, find the coefficients $a_n$, $b_n$).
Hint: Use the formula
\[ \cos\varphi \cos(n-1)\varphi = \frac{1}{2}\bigl(\cos n\varphi + \cos(n-2)\varphi\bigr). \]

Finally, we return to Theorem 13.10. Our discussion will be based on the following important facts about measures and locally integrable functions.

Theorem 13.11. Let $\mu$ be a Borel measure on $\mathbb{R}$. Then:
(a)
\[ (D\mu)(x) = \lim_{h \to 0+} \frac{\mu((x - h, x + h))}{2h} \]
exists for (Lebesgue) almost every $x \in \mathbb{R}$. Moreover, $d\mu_{ac}(x) = (D\mu)(x)\,dx$.
(b) The singular part of $\mu$ is supported by $T = \{ x \in \mathbb{R} : (D\mu)(x) = \infty \}$. In other words, $\mu_s(T^c) = 0$.
(c) If $f \in L^1_{loc}(\mathbb{R})$, then almost every point $x \in \mathbb{R}$ is a Lebesgue point of $f$, that is, for almost every $x \in \mathbb{R}$ we have that
\[ \lim_{h \to 0+} \frac{1}{2h} \int_{x-h}^{x+h} |f(t) - f(x)|\,dt = 0 . \]
See Rudin, Real and Complex Analysis, Chapter 7, and Folland, Real Analysis, Section 3.4.

Proof of Theorem 13.10(b)–(e). First of all, notice that
\[ \operatorname{Im} F(x + iy) = \int_{\mathbb{R}} \frac{y}{(t - x)^2 + y^2}\,d\mu(t) \geq \int_{|t - x| \leq y} \frac{y}{(t - x)^2 + y^2}\,d\mu(t) \geq \frac{\mu((x - y, x + y))}{2y} . \]
This immediately gives part (d) because it now follows that $T \subset S$ and, by Theorem 13.11(b), $T$ already supports $\mu_s$.

It is also not difficult to confirm (b). Let $f$ be a compactly supported continuous function on $\mathbb{R}$. Fubini-Tonelli shows that
\[ \frac{1}{\pi} \int_{\mathbb{R}} f(x) \operatorname{Im} F(x + iy)\,dx = \frac{1}{\pi} \int_{\mathbb{R}} d\mu(t) \int_{\mathbb{R}} dx\, f(x)\, \frac{y}{(t - x)^2 + y^2} . \tag{13.13} \]


Note that, together with the factor $1/\pi$, the second integral on the right-hand side is of the form $\int_{\mathbb{R}} P_y(t - x) f(x)\,dx$, where
\[ P_y(t) = \frac{1}{y} P\Bigl(\frac{t}{y}\Bigr), \qquad P(t) = \frac{1}{\pi}\,\frac{1}{1 + t^2} \]
(the Poisson kernel). Notice that $\int_{\mathbb{R}} P\,dt = 1$, so we have an approximate identity (compare Folland, Real Analysis, Section 8.2). Since $f$ is continuous, the following holds:

Exercise 13.18. Prove that if $f : \mathbb{R} \to \mathbb{C}$ is a continuous, bounded function, then
\[ \lim_{y \to 0+} \int_{\mathbb{R}} P_y(t - x) f(x)\,dx = f(t) \]

for all t ∈ R. R R Since Py ≥ 0 and Py = P = 1, we also have that these integrals are ≤ kf k∞ in absolute value, for all y > 0 and t ∈ R. Constants are integrable with respect to the finite measure µ, so we have verified the hypotheses of the Dominated Convergence Theorem. It follows that R the right-hand side of (13.13) goes to f dµ as y → 0+, as claimed. Exercise 13.19. Prove part (e) of Theorem 13.10. Suggestion: Consider separately y Re F (x + iy) and y Im F (x + iy). Write down integral representations for these expressions and then use Dominated Convergence to analyze what happens as y → 0+. We next claim that Im F (x + iy) → 0 if (Dµ)(x) = 0. To (slightly) simplify the notation, we will assume that x = 0 here. Let  > 0 be given. We can then find h0 > 0 so that µ([−h, h]) < h for all 0 < h ≤ 2h0 . Now Z Z y y (13.14) Im F (iy) = dµ(t) + dµ(t), 2 2 2 2 |t|≤h0 t + y |t|>h0 t + y and the second integral is ≤ µ(R)y/h20 . Clearly, this goes to zero as y → 0+, and thus we can fix Y > 0 so that this second integral from (13.14) is <  if 0 < y ≤ Y . Let us now take a look at the first integral. We split this further into smaller pieces, as follows: Z ∞ Z X y y dµ(t) ≤ dµ(t) 2 2 2 2 −n −n+1 y t + y |t|≤h0 t + y n=−N 2 y 0 so that Z h |f (t) − f (0)| dt < h < 2

−h

if 0 < h ≤ 2h0 . We want to show that Z Z 1 1 y y f (t) dt − f (0) = (f (t) − f (0)) dt π R t2 + y 2 π R t2 + y 2 goes to zero as y → 0+. As above, we can make the contribution coming from |t| > h0 small by taking y > 0 small enough, and as for the part |t| ≤ h0 of the region of integration, we again use the dyadic decomposition into pieces where |t| ≈ 2−n y. Similar estimates as before work. Exercise 13.20. Give a careful proof of (13.15) (if x ∈ Lf ) that is based on this sketch. We are now in a position to establish part (c) of Theorem 13.10. Suppose that dµ = f dx+dµs . We must show that (1/π)Im F (x+iy) → f (x) for (Lebesgue) almost every x ∈ R. We can use the Lebesgue decomposition ofR µ to similarly F into two parts: F = F1 + F2 , R dµsplit f dt s where F1 (z) = t−z , F2 = t−z . We know that (Dµs )(x) = 0 for Lebesgue almost every x ∈ R (Theorem 13.11(a)). So, by what has just been shown, Im F2 (x + iy) → 0 for almost every x ∈ R. Moreover, (13.15) says that (1/π)Im F1 (x + iy) → f (x) at every Lebesgue point of f , so, in particular, almost everywhere. Putting things together, we deduce that (1/π)Im F (x + iy) → f (x) almost everywhere, as required. 


Part (a) could be proved using similar tools, but this proof is more involved. Notice also that part (c) of Theorem 13.10 implies that $\operatorname{Im} F(x) \in L^1(\mathbb{R})$. The real part of the boundary value of $F$, on the other hand, need not be locally integrable. An easy counterexample is provided by $F(z) = -1/z$ (or $\mu = \delta_0$, the Dirac measure at 0). Then $F(x) \equiv \lim_{y \to 0+} F(x + iy) = -1/x$ almost everywhere (in fact, the limit exists at every $x \neq 0$), so $\operatorname{Im} F(x) = 0$ (which is consistent with $\mu$ being a purely singular measure), but $\operatorname{Re} F(x) = -1/x$ is not integrable on any neighborhood of zero.

Theorem 13.10 actually holds somewhat more generally. Here's the relevant definition.

Definition 13.12. A Herglotz function is a holomorphic function $F : \mathbb{C}^+ \to \mathbb{C}^+$.

Exercise 13.21. Let $F$ be as in Theorem 13.10. Show that $F$ is a Herglotz function. Since we of course already know that $\operatorname{Im} F > 0$, you have to show that $F$ is holomorphic.

There is a similar integral representation formula for Herglotz functions.

Theorem 13.13 (Herglotz Representation). $F$ is a Herglotz function if and only if $F$ is of the form
\[ F(z) = a + bz + \int_{\mathbb{R}} \Bigl( \frac{1}{t - z} - \frac{t}{t^2 + 1} \Bigr)\,d\mu(t) \qquad (z \in \mathbb{C}^+) \]
for some (positive) Borel measure $\mu$ on $\mathbb{R}$ with $\int \frac{d\mu(t)}{t^2 + 1} < \infty$ and numbers $a \in \mathbb{R}$, $b \geq 0$. Moreover, these data $(\mu, a, b)$ are uniquely determined by $F$.

Notice that if $\mu$ is a finite measure or if $\int \frac{d\mu}{|t| + 1} < \infty$, then the two integrals exist separately, and the second integral is just a constant, so we then obtain the slightly simpler representation $F(z) = A + bz + \int \frac{d\mu}{t - z}$.

Now, as already indicated above, Theorem 13.10 holds for general Herglotz functions $F$. The proof is essentially the same as before; some minor adjustments become necessary because now $\mu$ need no longer be a finite measure. This material also gives an elegant and quick alternative proof of (a). Let us sketch this argument (it of course depends heavily on those other results that we didn't prove here).
If $F$ is a Herglotz function, then we can take a holomorphic square root $G(z) = \sqrt{F(z)}$, and with the right choice of square root, this will be a Herglotz function also. In fact, $\operatorname{Re} G$ and $\operatorname{Im} G$ are both positive on $\mathbb{C}^+$, so $iG$ is another Herglotz function. We did prove above that $\lim_{y \to 0+} \operatorname{Im} F(x + iy)$ exists almost everywhere, so, by the Herglotz function version of this result, $\operatorname{Im} G$ and $\operatorname{Im}(iG) = \operatorname{Re} G$ also have boundary limits almost everywhere. In other words, $G(x + iy)$ itself converges almost everywhere, and thus $F(x + iy) = G(x + iy)^2$ converges almost everywhere, as asserted.

Exercise 13.22. Find the $m$ function $m_b(z)$ of the Jacobi matrix $J_b$ whose coefficients are given by:
\[ a_n = 1, \qquad b_1 = b, \qquad b_n = 0 \quad (n \geq 2) \]
More precisely, show that
\[ m_b(z) = \frac{m_0(z)}{1 + b\,m_0(z)} ; \tag{13.16} \]
the $m$ function $m_0(z)$ for $b = 0$ was of course discussed in detail in Example 13.1.

Exercise 13.23. Use (13.16) and Theorem 13.10 to discuss the spectral properties of Jb . More precisely, prove the following: σac (Jb ) = [−2, 2], σsc (Jb ) = ∅, and σpp (Jb ) = ∅ if |b| ≤ 1 and σpp (Jb ) = {b+1/b} if |b| > 1.
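Before attempting Exercises 13.22 and 13.23, it can be reassuring to see the claims numerically. The following sketch is not a proof and rests on assumptions of our own: it replaces $J_b$ by a finite section of size $N = 300$, uses $m_0(z) = -w(z)$ from Example 13.1, and all function names are hypothetical.

```python
import numpy as np

def jacobi_b(b, N):
    """N x N finite section of J_b: a_n = 1, b_1 = b, b_n = 0 for n >= 2."""
    J = np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)
    J[0, 0] = b
    return J

def m_trunc(J, z):
    """<delta_1, (J - z)^{-1} delta_1> for a finite matrix J."""
    n = J.shape[0]
    e1 = np.zeros(n, dtype=complex)
    e1[0] = 1.0
    return np.linalg.solve(J - z * np.eye(n), e1)[0]

def m0(z):
    """m function of the free Jacobi matrix, m_0(z) = -w(z), |w(z)| < 1."""
    r = np.sqrt(z * z / 4 - 1 + 0j)
    w1, w2 = z / 2 + r, z / 2 - r
    return -(w1 if abs(w1) < 1 else w2)

N, z, b = 300, 0.4 + 0.3j, 2.0
lhs = m_trunc(jacobi_b(b, N), z)          # numerical m_b(z)
rhs = m0(z) / (1 + b * m0(z))             # right-hand side of (13.16)

top = np.linalg.eigvalsh(jacobi_b(b, N)).max()          # expected near b + 1/b = 2.5
top_small = np.linalg.eigvalsh(jacobi_b(0.5, N)).max()  # stays inside [-2, 2]
```

For $b = 2$ the largest eigenvalue of the finite section sits at $b + 1/b = 2.5$ to high accuracy, while for $b = 0.5$ no eigenvalue leaves $[-2, 2]$, in agreement with the statement of Exercise 13.23.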

14. Compact operators We now return, for the time being, to the general theory of linear operators. Let X, Y be Banach spaces, and let T : X → Y be a linear operator (defined everywhere). Definition 14.1. T is called compact if T (B) is a compact set; here, B = B1 (0) = {x ∈ X : kxk < 1}. We denote the set of compact operators by B∞ (X, Y ); if X = Y , we write B∞ (X) instead of B∞ (X, X). Exercise 14.1. Let T ∈ B∞ (X, Y ). Show that T (B) is compact for any bounded set B ⊂ X. Since compact sets are bounded, it follows that compact operators are always bounded: B∞ (X, Y ) ⊂ B(X, Y ). Here’s a convenient rephrasing of the definition. Proposition 14.2. T : X → Y is compact if and only if every sequence xn ∈ B has a subsequence xnj for which T xnj converges. Exercise 14.2. Prove the Proposition. Theorem 14.3. Suppose that S, T ∈ B∞ (X), A ∈ B(X), and c ∈ C. Then S + T, cT, AT, T A ∈ B∞ (X). Put differently, this says that B∞ (X) ⊂ B(X) is a two-sided ideal in the Banach algebra B(X) (“two-sided” refers to the fact that we may multiply by A ∈ B(X) from either side). Proof. We verify the criterion from Proposition 14.2. Given a sequence xn ∈ B, pick a subsequence x0n so that Sx0n converges and then a subsubsequence x00n so that T x00n converges, too. Then (S +T )x00n , cT x00n , and AT x00n converge. Furthermore, since A is bounded, Axn is just another bounded sequence, so T (Axn ) can also be made convergent by passing to a subsequence.  Theorem 14.4. B∞ (X) is a closed subset of B(X). So I = B∞ (X) is a closed two-sided ideal of B(X). If X = H is a separable Hilbert space, then it can be shown that B∞ (H) is the only closed two-sided ideal 6= {0}, H. Proof. Suppose that Tn ∈ B∞ (X), T ∈ B(X), kTn − T k → 0, and let xn be a sequence from B. We must show that T xn has a convergent subsequence. 
For fixed $m$, we can of course make $T_m x_n$ convergent as $n \to \infty$ by passing to a suitable subsequence, and we can do better than this: a diagonal process lets us find a subsequence $x'_n$ with the property that $\lim_{n\to\infty} T_m x'_n$ exists for all $m$.

Now if $\epsilon > 0$ is given, fix an $n \in \mathbb{N}$ with $\|T_n - T\| < \epsilon$. Then take $N \in \mathbb{N}$ so large that (for this $n$) $\|T_n(x'_j - x'_k)\| < \epsilon$ for all $j, k \ge N$. For these $j, k$, we then also have that
\[ \|T(x'_j - x'_k)\| \le \|T_n(x'_j - x'_k)\| + \|T_n - T\| \, \|x'_j - x'_k\| < \epsilon + 2\|T_n - T\| < 3\epsilon, \]
so $Tx'_n$ is a Cauchy sequence and thus convergent. □



We say that $T \in B(X, Y)$ is a finite rank operator if $\dim R(T) < \infty$. In this case, if $x_n \in B$, then $Tx_n$ is a bounded sequence from the finite-dimensional space $R(T) \cong \mathbb{C}^N$, so we can extract a convergent subsequence by the classical Bolzano-Weierstraß Theorem. Recall also that all norms on a finite-dimensional space are equivalent, so it suffices to identify $R(T)$ with $\mathbb{C}^N$ as a vector space and then automatically the induced topology must be the usual topology on $\mathbb{C}^N$. So every finite rank operator is compact. In particular, $B(\mathbb{C}^n) = B_\infty(\mathbb{C}^n)$. Further examples of compact operators are provided by the following Exercise.

Exercise 14.3. Suppose that $t_n \to 0$, and let $T : \ell^p \to \ell^p$ ($1 \le p \le \infty$) be the operator of multiplication by $t_n$. More precisely, $(Tx)_n = t_n x_n$. Show that $T$ is compact. Suggestion: Consider the finite rank truncations $T_N$ corresponding to the truncated sequence $t_n^{(N)}$ and use Theorem 14.4; here, $t_n^{(N)} = t_n$ if $n \le N$ and $t_n^{(N)} = 0$ if $n > N$.

We now focus on compact operators on a Hilbert space $H$.

Theorem 14.5. Let $T \in B(H)$. Then $T \in B_\infty(H) \iff T^* \in B_\infty(H) \iff T^*T \in B_\infty(H)$.

Proof. Theorem 14.3 shows that $T^*T \in B_\infty(H)$ if $T^* \in B_\infty(H)$ (or $T \in B_\infty(H)$). Next, assume that $T^*T \in B_\infty(H)$, and let $x_n \in B$. Then $T^*Tx_n$ converges on a suitable subsequence, which, for convenience, we will again denote by $x_n$. The following calculation shows that $Tx_n$ converges on the same subsequence, so $T \in B_\infty(H)$.
\[ \|T(x_m - x_n)\|^2 = \langle T(x_m - x_n), T(x_m - x_n)\rangle = \langle T^*T(x_m - x_n), x_m - x_n\rangle \le \|T^*T(x_m - x_n)\| \, \|x_m - x_n\| \le 2\|T^*T(x_m - x_n)\| \]
Finally, if $T$ is compact, then $TT^* = T^{**}T^* \in B_\infty(H)$ by Theorem 14.3 again, so the argument from the preceding paragraph now shows that $T^* \in B_\infty(H)$, too. □
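The truncation idea suggested in Exercise 14.3 can be illustrated numerically. The following is only a sketch, not a proof: for a multiplication operator, the operator norm of $T - T_N$ equals the tail supremum $\sup_{n>N} |t_n|$, and the code scans a finite window to approximate it. The sequence $t_n = 1/n$, the window size `n_max`, and the helper name `tail_sup` are assumptions chosen for illustration.

```python
def tail_sup(t, N, n_max=10_000):
    """Approximate sup_{n > N} |t(n)| by scanning n = N+1, ..., n_max.

    For a multiplication operator (T x)_n = t_n x_n on l^p, this tail
    supremum is the operator norm of T - T_N, where T_N keeps only the
    first N entries of the sequence (a finite rank operator).
    The finite window is exact here only because |t(n)| is decreasing.
    """
    return max(abs(t(n)) for n in range(N + 1, n_max + 1))


# Hypothetical example sequence with t_n -> 0 (chosen only for illustration).
def t(n):
    return 1.0 / n


# Approximation errors ||T - T_N|| for a few truncation levels N.
errors = [tail_sup(t, N) for N in (1, 10, 100, 1000)]
```

The errors decrease to zero as $N$ grows, which is the hypothesis that Theorem 14.4 needs in order to pass compactness from the $T_N$ to $T$.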


Exercise 14.4. Let $P \in B(H)$ be the projection onto the subspace $M \subset H$. Show that $P$ is compact if and only if $\dim M < \infty$.

Compactness of operators on a Hilbert space admits an especially neat sequence characterization.

Theorem 14.6. Let $T : H \to H$ be a linear operator (with $D(T) = H$).
(a) The following statements are equivalent:
(i) $T \in B(H)$;
(ii) $x_n \to 0 \implies Tx_n \to 0$;
(iii) $x_n \xrightarrow{w} 0 \implies Tx_n \xrightarrow{w} 0$;
(iv) $x_n \to 0 \implies Tx_n \xrightarrow{w} 0$.
(b) The following statements are equivalent:
(i) $T \in B_\infty(H)$;
(ii) $x_n \xrightarrow{w} 0 \implies Tx_n \to 0$.

Here, we of course need to remember that $x_n \xrightarrow{w} x$ if and only if $\langle y, x_n\rangle \to \langle y, x\rangle$ for all $y \in H$.

Exercise 14.5. Let $x_n \in H$ and suppose that $\lim_{n\to\infty} \langle y, x_n\rangle$ exists for every $y \in H$. Show that then $x_n$ is bounded, that is, there exists $C > 0$ so that $\|x_n\| \le C$ for all $n \in \mathbb{N}$. Hint: Apply the uniform boundedness principle to the maps $F_n(y) = \langle x_n, y\rangle$.

Note that every weakly convergent sequence $x_n$ satisfies the assumption from this Exercise; conversely, it can be shown that such a sequence $x_n$ is weakly convergent, so we could have assumed this instead. The version given here will prove useful in a moment.

In the proof of Theorem 14.6, we will need the following Lemma, which is of considerable independent interest.

Lemma 14.7. Every bounded sequence $x_n \in H$ has a weakly convergent subsequence.

Proof. For every fixed $m$, the sequence $(\langle x_m, x_n\rangle)_n$ is a bounded sequence of complex numbers, so it has a convergent subsequence by the Bolzano-Weierstraß Theorem. Again, a diagonal process lets us in fact find a subsequence $x'_n$ for which $\langle x_m, x'_n\rangle$ converges, as $n \to \infty$, for all $m$. The (anti-)linearity of the scalar product now implies that $\lim \langle y, x'_n\rangle$ exists for all $y \in L(x_m)$.

Exercise 14.6. Show that this limit exists for all $y \in \overline{L(x_m)}$. Suggestion: Show that the scalar products form a Cauchy sequence.


Finally, if $w \in H$ is arbitrary, write $w = y + z$ with $y \in M = \overline{L(x_m)}$ and $z \in M^\perp$. Then $\langle w, x'_n\rangle = \langle y, x'_n\rangle$, so this sequence converges, too.

To show that $x'_n$ is weakly convergent, we still need to produce an $x \in H$ so that $\lim \langle w, x'_n\rangle = \langle w, x\rangle$ for all $w \in H$. To do this, notice first of all that the sequence $x'_n$ is bounded, by Exercise 14.5. Therefore, the linear functional $F(w) = \lim \langle x'_n, w\rangle$ is bounded: $|F(w)| \le \limsup \|x'_n\| \, \|w\| \le C\|w\|$. The Riesz Representation Theorem now shows that $F(w) = \langle x, w\rangle$ for some $x \in H$, as desired. □

Proof of Theorem 14.6. (a) (i) $\implies$ (ii): This is obvious, because (ii) is just the sequence version of continuity at $x = 0$, and so (i) and (ii) are in fact equivalent.

(ii) $\implies$ (iii): As just observed, $T \in B(H)$. If $x_n \xrightarrow{w} 0$, then also
\[ \langle y, Tx_n\rangle = \langle T^*y, x_n\rangle \to 0 \]
for all $y \in H$, so $Tx_n \xrightarrow{w} 0$.

(iii) $\implies$ (iv) is trivial.

(iv) $\implies$ (i): Suppose that $T \notin B(H)$. Then we can find $x_n \in H$, $\|x_n\| = 1$, with $\|Tx_n\| \ge n^2$. Let $y_n = (1/n)x_n$. Then $y_n \to 0$, but $\|Ty_n\| \ge n$, so, by Exercise 14.5, the sequence $Ty_n$ cannot be weakly convergent.

(b) (i) $\implies$ (ii): Let $x_n \in H$, $x_n \xrightarrow{w} 0$. Then $x_n$ is bounded (Exercise 14.5 again), so there exists a subsequence for which $Tx'_n$ converges, say $Tx'_n \to y$. In particular, $Tx'_n \xrightarrow{w} y$, and now part (a), condition (iii) shows that we must have $y = 0$ here. This whole argument has in fact shown that every subsequence $x'_n$ of $x_n$ has a sub-subsequence $x''_n$ so that $Tx''_n \to 0$. It follows that $Tx_n \to 0$, without the need of passing to a subsequence.

(ii) $\implies$ (i): Let $x_n \in B$. By Lemma 14.7, we can extract a weakly convergent subsequence, which we denote by $x_n$ also. So $x_n \xrightarrow{w} x$, and thus $x_n - x \xrightarrow{w} 0$. By hypothesis, $T(x_n - x) \to 0$, so indeed $Tx_n$ converges (to $Tx$). □

We now discuss the spectral theory of compact operators. We first deal with compact normal operators. The following two results give a complete spectral theoretic characterization of these.

Theorem 14.8. Let $T \in B(H)$ be a compact, normal operator. Then $\sigma(T)$ is countable. Write $\sigma(T) \setminus \{0\} = \{z_n\}$. Then each $z_n$ is an eigenvalue of $T$ of finite multiplicity: $1 \le \dim N(T - z_n) < \infty$. Moreover, $z_n \to 0$ if $\{z_n\}$ is infinite.


If $P_n$ denotes the projection onto the eigenspace $N(T - z_n)$, then
\[ T = \sum z_n P_n. \tag{14.1} \]
This series converges in $B(H)$, for an arbitrary arrangement of the $z_n$. Finally, if $\dim H = \infty$, then $0 \in \sigma(T)$.

Proof. Denote the open disk about 0 of radius $r$ by $D_r = \{z \in \mathbb{C} : |z| < r\}$, and let $P = E(D_r^c)$, where $E$ is the spectral resolution of $T$. Let $M = R(P)$, which is a reducing subspace for $T$ by Exercise 10.22. I claim that $\dim M < \infty$. Indeed, if this were wrong, we could find a sequence $x_n \in M$, $\|x_n\| = 1$, $x_n \xrightarrow{w} 0$ (pick any ONS in $M$). Theorem 14.6(b) then shows that $Tx_n \to 0$. This, however, is impossible because the functional calculus shows that
\[ \|Tx_n\|^2 = \int_{\mathbb{C}} |z|^2 \, d\|E(z)x_n\|^2 \ge r^2 > 0. \]

Now since $M$ is reducing, we can decompose $T = T_M \oplus T_{M^\perp}$, and $M^\perp = R(E(D_r))$, so $\|T_{M^\perp}\| \le r$, and thus $T_{M^\perp} - z$ is definitely invertible in $B(M^\perp)$ if $|z| > r$. So such a $z$ will be in $\rho(T)$, unless $z \in \sigma(T_M)$, but $T_M$ is an operator on the finite-dimensional space $M$, so its spectrum consists of eigenvalues only, and there are only finitely many of these. Conversely, it is clear that every eigenvalue of $T_M$ is an eigenvalue of $T$ also, so we have shown the following: $\sigma(T) \cap D_r^c$ is finite for every $r > 0$ and contains only eigenvalues of $T$. Moreover, these are of finite multiplicity because $N(T - z) = R(E(\{z\})) \subset R(E(D_r^c)) = M$.

It now follows that $\sigma(T)$ is countable, and we also obtain the statements about the sequence $z_n$. If $\dim H = \infty$, then either $E(\{0\}) \ne 0$ or the sequence $z_n$ is infinite and thus converges to 0. In both cases, $0 \in \sigma(T)$.

It remains to establish (14.1). Notice that $P_n = E(\{z_n\})$; in particular, the $P_n$ have mutually orthogonal ranges. The Spectral Theorem shows that
\[ \langle x, Ty\rangle = \int_{\mathbb{C}} z \, d\mu_{x,y}(z) = \sum z_n \langle x, P_n y\rangle. \tag{14.2} \]
In the second step, we use the fact that $E$ is supported by $\{z_n\} \cup \{0\}$, so $\mu_{x,y}$ is a density times counting measure on this set and thus the integral is a sum. Next, we verify that (14.1) converges in $B(H)$. More precisely, we will prove that the partial sums form a Cauchy sequence.


Let $x \in H$, and consider
\[ \left\| \sum_{n=N+1}^{N'} z_n P_n x \right\|^2 = \sum_{n=N+1}^{N'} |z_n|^2 \|P_n x\|^2 \le \left( \sup_{n>N} |z_n|^2 \right) \cdot \sum_{n=N+1}^{N'} \|P_n x\|^2 \le \left( \sup_{n>N} |z_n|^2 \right) \cdot \|x\|^2. \]
This implies that
\[ \left\| \sum_{n=1}^{N'} z_n P_n - \sum_{n=1}^{N} z_n P_n \right\| \le \sup_{n>N} |z_n|, \]
and this supremum goes to zero as $N \to \infty$, as desired. So the right-hand side of (14.1) has a limit, and now (14.2) shows that this limit must be $T$. □

So normal compact operators have representations of the type (14.1). It is also true that, conversely, if we are given data $z_n$ and $P_n$ with the properties stated in the Theorem, then we can use (14.1) to define a normal compact operator $T$. In other words, (14.1) for sequences $z_n \to 0$ and mutually orthogonal finite-dimensional projections $P_n$ lists exactly all normal compact operators.

To formulate this converse, we slightly change the notation. We let $\langle x, \cdot\rangle x$ denote the operator that maps $y \mapsto \langle x, y\rangle x$.

Exercise 14.7. Show that $\langle x, \cdot\rangle x = \|x\|^2 P_{L(x)}$. Also, show that if $\{x_1, \ldots, x_N\}$ is an ONB of the (finite-dimensional) subspace $M$, then
\[ P_M = \sum_{n=1}^{N} \langle x_n, \cdot\rangle x_n. \]

Theorem 14.9. Let $\{x_n\}$ be an ONS, and let $z_n \in \mathbb{C}$, $z_n \ne 0$, $z_n \to 0$ (if the sequence is infinite). Then the series
\[ T = \sum z_n \langle x_n, \cdot\rangle x_n \]
converges in $B(H)$ (if infinite) to a compact normal operator $T$. We have that $\sigma(T) \setminus \{0\} = \sigma_p(T) \setminus \{0\} = \{z_n\}$.

Note that Exercise 14.7 guarantees that the series from (14.1) are of this form; if $\dim R(P_n) > 1$, then we need to pick an ONB of this space and repeat the corresponding eigenvalue $z_n$ that number of times.

Proof. By Exercise 14.7, the operators $\langle x_n, \cdot\rangle x_n$ are projections onto the mutually orthogonal subspaces $L(x_n)$, so convergence of the series in $B(H)$ follows as in the previous proof. For each fixed $N$, the operator $\sum_{n=1}^{N} z_n \langle x_n, \cdot\rangle x_n$ is of finite rank, thus compact, and hence $T$ is compact by Theorem 14.4.

To prove that $T$ is normal, we temporarily change our notation again and write $\langle x_n, \cdot\rangle x_n = P_n$. We compute
\[ TT^* = \lim_{N\to\infty} \sum_{n=1}^{N} z_n P_n \sum_{m=1}^{N} \overline{z_m} P_m = \lim_{N\to\infty} \sum_{n=1}^{N} |z_n|^2 P_n = \lim_{N\to\infty} \sum_{n=1}^{N} \overline{z_n} P_n \sum_{m=1}^{N} z_m P_m = T^*T, \]

so $T$ is normal. It is also clear that $Tx_n = z_n x_n$, and since $T$ is compact, any other non-zero point from the spectrum would have to be an eigenvalue, too, so the following Exercise finishes the proof.

Exercise 14.8. Show that if $z \notin \{z_n\} \cup \{0\}$, then $Tx = zx$ has no solution $x \ne 0$. □

We now move on to arbitrary compact operators $T \in B(H)$, not necessarily normal. Actually, we are going to start with some introductory observations that apply to arbitrary bounded operators $T \in B(H)$. We will consider $T^*T$, and this is a positive operator by Theorem 9.15.

Exercise 14.9. Give an easier proof of this statement ($T^*T \ge 0$ if $T \in B(H)$) that is based on Theorem 10.13.

By Theorem 10.14, $T^*T$ has a unique positive square root, which we will denote by $|T| := (T^*T)^{1/2}$.

Exercise 14.10. Show that if $T$ is normal, then this definition of $|T|$ coincides with the one obtained from the functional calculus. In other words, show that
\[ |T| = \int_{\mathbb{C}} f(z) \, dE(z), \]
where $E$ is the spectral resolution of $T$ and $f(z) = |z|$.

This operator $|T|$ has the important property that
\[ \||T|x\| = \|Tx\| \tag{14.3} \]
for all $x \in H$. We see this from the calculation
\[ \||T|x\|^2 = \langle |T|x, |T|x\rangle = \langle x, |T|^2 x\rangle = \langle x, T^*Tx\rangle = \langle Tx, Tx\rangle = \|Tx\|^2. \]
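Property (14.3) can be checked numerically in the simplest case $H = \mathbb{R}^2$. The sketch below is an illustration under assumptions, not part of the notes: it uses the closed form $\sqrt{A} = (A + sI)/\sqrt{\operatorname{tr} A + 2s}$ with $s = \sqrt{\det A}$, which follows from the Cayley-Hamilton theorem for a positive semidefinite $2 \times 2$ matrix $A \ne 0$. The example matrix, the test vector, and the helper names are hypothetical.

```python
import math


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(a):
    """Transpose of a 2x2 matrix (the adjoint in the real case)."""
    return [[a[j][i] for j in range(2)] for i in range(2)]


def abs_op(t):
    """|T| = (T^* T)^{1/2} for a real 2x2 matrix T != 0.

    Uses sqrt(A) = (A + s I) / sqrt(tr A + 2 s) with s = sqrt(det A),
    valid for a positive semidefinite 2x2 matrix A != 0.
    """
    a = matmul(transpose(t), t)          # A = T^* T
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    s = math.sqrt(max(det, 0.0))         # guard against tiny negative round-off
    tau = math.sqrt(a[0][0] + a[1][1] + 2.0 * s)
    return [[(a[i][j] + (s if i == j else 0.0)) / tau for j in range(2)]
            for i in range(2)]


def norm_of_image(m, x):
    """Euclidean norm of the vector m x."""
    y = [m[i][0] * x[0] + m[i][1] * x[1] for i in range(2)]
    return math.hypot(y[0], y[1])


T = [[1.0, 2.0], [0.0, 1.0]]   # hypothetical example matrix (not normal)
A = abs_op(T)                  # candidate for |T|
x = [0.3, -1.7]                # arbitrary test vector
lhs, rhs = norm_of_image(A, x), norm_of_image(T, x)   # (14.3): these agree
```

Note that $|T|x$ and $Tx$ are different vectors in general; only their norms coincide.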

Exercise 14.11. Compute $|T|$ for
\[ T = \begin{pmatrix} 0 & -2 \\ 0 & 0 \end{pmatrix} \quad \text{and} \quad T = \begin{pmatrix} 1 & 1 \\ \sqrt{2} & -\sqrt{2} \end{pmatrix}. \]

Theorem 14.10. Let $T$ be a compact operator. Then $|T|$ is also compact. Moreover, there exists a unique unitary map $V : \overline{R(|T|)} \to \overline{R(T)}$ so that $T = V|T|$.

This representation $T = V|T|$ is called the polar decomposition of $T$. This terminology emphasizes the analogy to the polar representation of complex numbers $z = e^{i\varphi}|z|$.

Proof. $T^*T$ is compact by Theorem 14.3 (or Theorem 14.5). So Theorem 14.8 gives a representation of the type $T^*T = \sum t_n P_n$. Since $T^*T \ge 0$, we must have that $t_n > 0$; if the sequence is infinite, then $t_n \to 0$. It now follows that $|T| = \sum t_n^{1/2} P_n$ (positive square roots) because this operator is positive by Theorem 14.9 (its spectrum consists of the $t_n^{1/2}$ and possibly 0) and its square equals $T^*T$, and positive square roots are unique. Theorem 14.9 then also shows that $|T| \in B_\infty(H)$.

To construct $V$, define $V_0 : R(|T|) \to R(T)$ by $V_0(|T|x) = Tx$. This is indeed well defined because if $|T|x = |T|y$, then $|T|(x - y) = 0$, so, by (14.3), $T(x - y) = 0$, that is, $Tx = Ty$. Moreover, (14.3) also shows that $V_0$ is isometric. In particular, $V_0$ is continuous, and thus there exists an isometric extension $V$ to $\overline{R(|T|)}$. Since $R(V_0) = R(T)$ and isometries have closed ranges, it follows that $R(V) = \overline{R(T)}$. By the construction of $V_0$, we have the identity $V_0|T| = T$, so $V|T| = T$ (note that $|T|x \in R(|T|)$ for all $x$, so as far as this identity is concerned, it doesn't matter if or how we extend $V_0$).

Finally, if also $W|T| = T$, then the restriction of $W$ to $R(|T|)$ must agree with $V_0$, and there is only one continuous extension to the closure, so $W = V$ and $V$ is unique. □

To obtain series representations for arbitrary compact operators, we introduce additional data. Let $s_1(T) \ge s_2(T) \ge s_3(T) \ge \ldots > 0$ be the non-zero eigenvalues of $|T|$ (what we called $t_n^{1/2}$ in the previous proof), repeated according to their (finite) multiplicities. The $s_n(T)$ are called the singular values of $T$.
If the sequence of singular values is infinite, then $s_n(T) \to 0$.

Theorem 14.11. Let $T \in B_\infty(H)$. Then $s_n(T) = s_n(T^*) = s_n(|T|) = s_n(|T^*|)$. Moreover, there exist ONS $\{x_n\}$ and $\{y_n\}$, consisting of eigenvectors of $|T|$ and $|T^*|$, respectively (more precisely, $|T|x_n = s_n x_n$, $|T^*|y_n = s_n y_n$), so that
\[ |T| = \sum s_n \langle x_n, \cdot\rangle x_n, \qquad |T^*| = \sum s_n \langle y_n, \cdot\rangle y_n, \]
\[ T = \sum s_n \langle x_n, \cdot\rangle y_n, \qquad T^* = \sum s_n \langle y_n, \cdot\rangle x_n. \]
These sums converge in $B(H)$ (if they are infinite).

Proof. We see as in the proof of Theorem 14.8 that these series converge in $B(H)$ if $\{x_n\}$, $\{y_n\}$ are (arbitrary) ONS. From this Theorem, we also know that $|T|$ can indeed be written in this way, if we interpret $s_n = s_n(T)$ and $|T|x_n = s_n x_n$. Also, from the definition of the singular values, it is already clear that $s_n(T) = s_n(|T|)$ and $s_n(T^*) = s_n(|T^*|)$.

If we again let $\{x_n\}$ be an ONS of eigenvectors of $|T|$ (so $|T|x_n = s_n x_n$), then Theorem 14.10 shows that
\[ Tx = V|T|x = V \sum s_n \langle x_n, x\rangle x_n = \sum s_n \langle x_n, x\rangle y_n; \]
here, we have put $y_n = Vx_n$. Since $x_n$ is an ONS from $\overline{R(|T|)}$ and $V$ is unitary on this space, $y_n$ is an ONS, too. Moreover, for arbitrary $x, y \in H$, we have that
\[ \langle x, T^*y\rangle = \langle Tx, y\rangle = \sum s_n \overline{\langle x_n, x\rangle} \langle y_n, y\rangle = \sum s_n \langle x, x_n\rangle \langle y_n, y\rangle = \left\langle x, \sum s_n \langle y_n, y\rangle x_n \right\rangle. \]
This establishes the formula for $T^*$, except that we haven't shown yet that the $y_n$'s are eigenvectors of $|T^*|$. A similar calculation reveals that
\[ TT^*y = T\left( \sum s_n \langle y_n, y\rangle x_n \right) = \sum_{m,n} s_m s_n \langle y_n, y\rangle \langle x_m, x_n\rangle y_m = \sum s_n^2 \langle y_n, y\rangle y_n. \]
This says that $|T^*| = \sum s_n \langle y_n, \cdot\rangle y_n$, and this formula clarifies everything: First of all, the $s_n = s_n(T)$ are indeed the eigenvalues of $|T^*|$, so $s_n(T) = s_n(T^*)$. Moreover, we also see that the $y_n$ are eigenvectors corresponding to these eigenvalues, and we obtain the asserted formula for $|T^*|$. □

Corollary 14.12. Let $T \in B(H)$. Then $T$ is compact if and only if there are finite rank operators $T_n \in B(H)$ so that $\|T_n - T\| \to 0$.

Proof. Finite rank operators are compact, so one direction follows from Theorem 14.4. Conversely, if $T$ is compact, then $T = \sum s_n \langle x_n, \cdot\rangle y_n$, and the partial sums $T_N = \sum_{n=1}^{N} s_n \langle x_n, \cdot\rangle y_n$ form a sequence of finite rank operators that converges to $T$ in operator norm. □


The singular values can be used to introduce subclasses of compact operators. More precisely, for $1 \le p < \infty$, let
\[ B_p(H) = \{T \in B_\infty(H) : s_n(T) \in \ell^p\}. \]
In fact, this is consistent with our notation $B_\infty(H)$ for the compact operators and it also finally makes this choice of symbol more transparent. The spaces $B_p$ are sometimes called von Neumann-Schatten classes or trace ideals. Of particular interest are $B_2(H)$, the Hilbert-Schmidt operators, and $B_1(H)$, the trace class operators. It can be shown that $B_p(H)$ is in fact a Banach space with the norm $\|T\|_p = \|s_n(T)\|_{\ell^p}$ (that this indeed defines a norm is not obvious, either). Much more could be said, but we will not pursue these topics here.

Exercise 14.12. Prove that if $T$ is compact, then $\|T\| = s_1(T)$. So $\|T\|_\infty = \|T\|$ ($= \|T\|_{B(H)}$) and $\|T\| \le \|T\|_p$ for all $1 \le p \le \infty$.

Exercise 14.13. Consider the operator $T \in B(\ell^2)$ that is given by
\[ (Tx)_n = \begin{cases} 0 & n = 1 \\ \dfrac{x_{n-1}}{n} & n \ge 2 \end{cases}. \]
(a) Prove that $T$ is compact.
(b) Prove that $\sigma(T) = \{0\}$, $\sigma_p(T) = \emptyset$.

Exercise 14.14. Consider again the operator $T$ from Exercise 14.13. Find $T^*$ and $|T|$ and prove that $s_n(T) = \frac{1}{n+1}$ (so, in particular, $T \in B_p$ for $p > 1$, but $T \notin B_1$).

Exercise 14.15. Consider again the multiplication operator $(Tx)_n = t_n x_n$ on $\ell^2$ from Exercise 14.3. Show that $T \in B_1$ if and only if $\sum |t_n| < \infty$.

Exercise 14.16. Let $\mu$ be a finite Borel measure on $[0, 1]$, and let $K : [0, 1] \times [0, 1] \to \mathbb{C}$ be a continuous function. Show that the operator $T : L^2([0, 1], \mu) \to L^2([0, 1], \mu)$,
\[ (Tf)(x) = \int_{[0,1]} K(x, y) f(y) \, d\mu(y) \]

is compact.
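As a sanity check on the singular values claimed in Exercise 14.14, one can form a finite truncation of the weighted shift from Exercise 14.13 and compute $T^*T$ exactly. This is only a sketch and does not replace the proof: the truncation size and the helper names are assumptions, and the last diagonal entry of the truncated $T^*T$ is an artifact of cutting the matrix off.

```python
from fractions import Fraction


def weighted_shift(size):
    """size x size truncation of (T x)_n = x_{n-1} / n (n >= 2), (T x)_1 = 0.

    Rows and columns are 0-indexed here, so the entry in row i, column i - 1
    is 1 / (i + 1).
    """
    m = [[Fraction(0)] * size for _ in range(size)]
    for i in range(1, size):
        m[i][i - 1] = Fraction(1, i + 1)
    return m


def gram(m):
    """T^* T for a real matrix T (here T^* is the transpose)."""
    size = len(m)
    return [[sum(m[k][i] * m[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


size = 6   # hypothetical truncation size
G = gram(weighted_shift(size))
# Diagonal of T^* T; the square roots of its entries are the candidates
# for the singular values s_n(T).
diagonal = [G[j][j] for j in range(size)]
```

In this truncation, $T^*T$ is diagonal with entries $1/(n+1)^2$ for $n = 1, \ldots, 5$, in line with $s_n(T) = 1/(n+1)$.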

15. Perturbations by compact operators

In this chapter, we study the stability (or lack thereof) of various spectral properties under small perturbations. Here's the type of situation we have in mind: Let $T \in B(H)$ be a self-adjoint operator, and let $V \in B(H)$ be another self-adjoint operator that will be assumed to be small in a suitable sense. We then want to compare the spectral properties of $T + V$ with those of $T$.

Definition 15.1. Let $T$ be a self-adjoint operator, with spectral resolution $E$. The essential spectrum $\sigma_{ess}(T)$ is the set of all $t \in \mathbb{R}$ for which $\dim R(E((t - r, t + r))) = \infty$ for all $r > 0$.

Recall that if $t \notin \sigma(T)$, then $E((t - r, t + r)) = 0$ for all small $r > 0$, so $\sigma_{ess} \subset \sigma$. Also, it is clear that $\sigma_{ess}$ is a closed subset of $\mathbb{R}$ because if $t \notin \sigma_{ess}$, then $R(E((t - r, t + r)))$ is finite-dimensional for some $r > 0$, but this implies that $(t - r, t + r) \cap \sigma_{ess} = \emptyset$, so the complement of $\sigma_{ess}$ is open, as claimed.

Proposition 15.2. $t \in \sigma_{ess}$ precisely if $t$ is an accumulation point of $\sigma$ or an eigenvalue of infinite multiplicity.

Here, we define the multiplicity of an eigenvalue $t$ as $\dim N(T - t)$, as expected. Of course, if $T$ has finite spectral multiplicity, then the second alternative cannot occur, so in this case, $\sigma_{ess}$ is just the set of accumulation points of $\sigma$. For example, this remark applies to Jacobi matrices.

Proof. If $t \in \sigma$ is not an accumulation point of the spectrum, then $t$ is an isolated point of $\sigma$. So, for small enough $r > 0$, $E((t - r, t + r)) = E(\{t\})$. Since this is the projection onto $N(T - t)$, it will be finite-dimensional if $t$ is not an eigenvalue of infinite multiplicity. Hence $t \notin \sigma_{ess}$.

Conversely, if $t$ is an eigenvalue of infinite multiplicity, then $R(E(\{t\})) = N(T - t)$ is infinite-dimensional, so $t \in \sigma_{ess}$. If $t$ is an accumulation point of $\sigma$, then, for any $r > 0$ and $N \in \mathbb{N}$, $(t - r, t + r)$ contains $N$ distinct points $t_n \in \sigma$ and thus also $N$ disjoint open subsets $I_n$ that all intersect $\sigma$ (just take small neighborhoods of the $t_n$'s).
Now $E(I_n) \ne 0$, so $\dim R(E(I_n)) \ge 1$, and, moreover, these subspaces are mutually orthogonal. Therefore, $\dim R(E((t - r, t + r))) \ge N$. Since $N$ was arbitrary here, this space is in fact infinite-dimensional, so $t \in \sigma_{ess}$. □

Exercise 15.1. Let $T \in B(H)$ be a self-adjoint operator on an infinite-dimensional Hilbert space $H$. Show that then $\sigma_{ess}(T) \ne \emptyset$.

It is sometimes also convenient to introduce a symbol for the complement, $\sigma_d = \sigma \setminus \sigma_{ess}$. We call $\sigma_d$ the discrete spectrum; it consists of the isolated points of the spectrum (these are automatically eigenvalues) of finite multiplicity.

Here is our first result on perturbations.

Theorem 15.3 (Weyl). Let $T$ be a self-adjoint operator, and assume that $V$ is compact and self-adjoint. Then $\sigma_{ess}(T + V) = \sigma_{ess}(T)$.

There is a very useful criterion for a point to lie in the essential spectrum, which will lead to an effortless proof of Weyl's Theorem. We call a sequence $x_n \in H$ a Weyl sequence (for $T$ and $t$) if $\|x_n\| = 1$, $x_n \xrightarrow{w} 0$, and $(T - t)x_n \to 0$.

Theorem 15.4. $t \in \sigma_{ess}(T)$ if and only if there exists a Weyl sequence for $T$ and $t$.

It is tempting to compare this with the result of Exercise 10.20: $t \in \sigma(T)$ if and only if there exists a sequence $x_n \in H$, $\|x_n\| = 1$, so that $(T - t)x_n \to 0$.

Proof. If $t \in \sigma_{ess}$, pick $x_1 \in R(E((t - 1, t + 1)))$, then $x_2 \in R(E((t - 1/2, t + 1/2)))$ with $x_2 \perp x_1$, then $x_3 \in R(E((t - 1/3, t + 1/3)))$ with $x_3 \perp x_1, x_2$, etc. We can also insist that $\|x_n\| = 1$. Then this procedure yields an ONS $x_n$, so $x_n \xrightarrow{w} 0$, and $\|(T - t)x_n\| \le 1/n$.

Conversely, assume that a Weyl sequence $x_n$ has been constructed. We will argue by contradiction, so assume also that $\dim R(E((t - r, t + r))) < \infty$ for some $r > 0$. We abbreviate $P = E((t - r, t + r))$. Since $R(P)$ is finite-dimensional, $P$ is a compact operator, and we assumed that $x_n \xrightarrow{w} 0$, so it follows that $\|Px_n\| \to 0$. Therefore,
\[ \|(T - t)x_n\| \ge \|(T - t)(1 - P)x_n\| - \|(T - t)Px_n\| \ge r\|(1 - P)x_n\| - \|(T - t)Px_n\| \to r, \]
but this contradicts our assumption that $x_n$ is a Weyl sequence. We have to admit that $t \in \sigma_{ess}$. □

Proof of Theorem 15.3. This is very easy now. If $x_n \xrightarrow{w} 0$, then $Vx_n \to 0$ by Theorem 14.6(b), so $T$ and $T + V$ have the same Weyl sequences. □

Here are some typical applications of this result to Jacobi matrices.

Theorem 15.5. Let $J$ be a Jacobi matrix whose coefficients satisfy $a_n \to 1$, $b_n \to 0$. Then $\sigma_{ess}(J) = [-2, 2]$.

Proof. Let $J_0$ be the Jacobi matrix with coefficients $a_n = 1$, $b_n = 0$. We know that $\sigma(J_0) = \sigma_{ess}(J_0) = [-2, 2]$. Now $J = J_0 + K$, where
\[ (Ku)_n = (a_n - 1)u_{n+1} + (a_{n-1} - 1)u_{n-1} + b_n u_n \qquad (n \ge 2). \]

Exercise 15.2. Show that $K$ is compact. Suggestion: Show that we can write $K = K_0 + K_1$, where $K_0$ is a finite rank operator and $\|K_1\| < \epsilon$.

Now Weyl's Theorem gives the claim. □
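The fact $\sigma(J_0) = [-2, 2]$ used in this proof can be made plausible with finite truncations. The sketch below is an illustration only and assumes the standard explicit eigenpairs of the $N \times N$ truncation of $J_0$, namely eigenvalues $2\cos(k\pi/(N+1))$ with eigenvectors $u_j = \sin(jk\pi/(N+1))$; the code verifies the eigenvalue equation directly rather than taking these formulas on trust. The truncation size and helper names are hypothetical, and a truncation is of course not the half line operator itself.

```python
import math


def apply_free_jacobi(u):
    """Apply the N x N truncation of J_0 (a_n = 1, b_n = 0) to the vector u.

    (J_0 u)_j = u_{j+1} + u_{j-1}, with the convention u_0 = u_{N+1} = 0.
    """
    n = len(u)
    return [(u[j + 1] if j + 1 < n else 0.0) + (u[j - 1] if j >= 1 else 0.0)
            for j in range(n)]


def eigenpair(n, k):
    """Candidate eigenvalue 2 cos(k pi / (n + 1)) and eigenvector sin(j k pi / (n + 1))."""
    theta = k * math.pi / (n + 1)
    return 2.0 * math.cos(theta), [math.sin(j * theta) for j in range(1, n + 1)]


n = 50   # hypothetical truncation size
residuals = []
eigenvalues = []
for k in range(1, n + 1):
    lam, u = eigenpair(n, k)
    ju = apply_free_jacobi(u)
    # Largest entry of J_0 u - lam u; should vanish up to round-off.
    residuals.append(max(abs(a - lam * b) for a, b in zip(ju, u)))
    eigenvalues.append(lam)
```

All eigenvalues of the truncation lie in $(-2, 2)$, and the gaps between neighbors are at most of order $1/N$, so they fill out the interval as $N$ grows.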



The same argument shows that if, more generally, $J, J'$ are Jacobi matrices whose coefficients satisfy $a_n - a'_n \to 0$, $b_n - b'_n \to 0$, then $\sigma_{ess}(J) = \sigma_{ess}(J')$. In particular, the essential spectrum only depends on what happens asymptotically, "at infinity."

We also obtain a decomposition theorem for whole line problems. By this, we mean the following: Consider a whole line Jacobi matrix $J : \ell^2(\mathbb{Z}) \to \ell^2(\mathbb{Z})$, and let $J_\pm$ be its half line restrictions. More precisely, let
\[ (J_+ u)_n = \begin{cases} a_1 u_2 + b_1 u_1 & n = 1 \\ a_n u_{n+1} + a_{n-1} u_{n-1} + b_n u_n & n \ge 2 \end{cases}, \]
\[ (J_- u)_n = \begin{cases} a_{-1} u_{-1} + b_0 u_0 & n = 0 \\ a_n u_{n+1} + a_{n-1} u_{n-1} + b_n u_n & n \le -1 \end{cases}. \]
We interpret $J_\pm$ as an operator on $\ell^2(\mathbb{Z}_\pm)$, where $\mathbb{Z}_+ = \mathbb{N}$, $\mathbb{Z}_- = \mathbb{Z} \setminus \mathbb{N}$.

Theorem 15.6. $\sigma_{ess}(J) = \sigma_{ess}(J_+) \cup \sigma_{ess}(J_-)$

Proof. We will describe the argument somewhat informally, rather than try to set up elaborate notation for what is a fairly simple argument. We cut $\mathbb{Z}$ into two half lines and set $a_0 = 0$, which is a rank two perturbation of $J$, and thus preserves the essential spectrum by Weyl's Theorem. Call this new operator $J_1$. Since $\ell^2(\mathbb{Z}_+)$ is a reducing subspace for $J_1$, we may naturally identify $J_1 = J_+ \oplus J_-$. Therefore, the following observation finishes the proof.

Exercise 15.3. Let $T_j \in B(H_j)$ ($j = 1, 2$) be self-adjoint operators, and let $T = T_1 \oplus T_2$. Show that then $\sigma_{ess}(T) = \sigma_{ess}(T_1) \cup \sigma_{ess}(T_2)$. □

Theorem 15.4 is also often useful as a tool to investigate $\sigma_{ess}$. As an illustration, we will now discuss such an application. We need some notation. For simplicity, we only treat one-dimensional Schrödinger operators here; however, analogous results could be formulated and proved for Jacobi matrices also.

Let $W \in \ell^\infty(\mathbb{Z})$, and denote the corresponding Schrödinger operator on $\ell^2(\mathbb{Z})$ by $H_W$. In other words, $(H_W u)_n = u_{n+1} + u_{n-1} + W_n u_n$. Suppose that $V \in \ell^\infty(\mathbb{N})$ contains arbitrarily large chunks of $W$, in the following sense. There are numbers $c_n, L_n \in \mathbb{N}$, $L_n \to \infty$, so that the sets $\{c_n - L_n, \ldots, c_n + L_n\}$ are disjoint subintervals of $\mathbb{N}$, and
\[ V_{c_n + j} = W_j \qquad (|j| \le L_n). \]
We denote the corresponding Schrödinger operator by $H_V^+$. The superscript $+$ reminds us that this is a half line operator, on $\ell^2(\mathbb{N})$.

Theorem 15.7. $\sigma(H_W) \subset \sigma_{ess}(H_V^+)$

Proof. Let $t \in \sigma(H_W)$. We will construct a Weyl sequence for $H_V^+$ and this $t$; this will finish the proof by Theorem 15.4.

By Exercise 10.20, there is a sequence $u^{(n)} \in \ell^2(\mathbb{Z})$ so that $\|u^{(n)}\| = 1$, $\|(H_W - t)u^{(n)}\| \to 0$. Since $\chi_{\{-N, \ldots, N\}} u \to u$ in $\ell^2$ as $N \to \infty$ and since $H_W - t$ is a continuous operator, we may in fact also assume that the $u^{(n)}$ have finite supports. Since $L_n \to \infty$, there are $n_j \to \infty$ so that $u^{(j)}$ is supported by $\{-L_{n_j}, \ldots, L_{n_j}\}$. To keep the notation simple, we will just assume that $n_j = j$ works.

Then the shifted sequence $v_j^{(n)} = u_{j - c_n}^{(n)}$ is a Weyl sequence: the $v^{(n)}$ have disjoint supports, so form an ONS, and hence $v^{(n)} \xrightarrow{w} 0$. Moreover, $\|(H_V^+ - t)v^{(n)}\| = \|(H_W - t)u^{(n)}\| \to 0$. □

These results give information on the spectrum as a set. We are also interested in finer properties of the spectrum, such as the ac, sc, pp decomposition. We start with rank one perturbations, and we will in fact again work in an abstract framework, for general Hilbert space operators.

So let $T \in B(H)$ be self-adjoint, and assume that $T$ has simple spectrum. Fix a cyclic vector $x \in H$, $\|x\| = 1$. Recall that this means that $\{f(T)x : f \in C(\sigma(T))\}$ is dense in $H$. We want to consider the family of rank one perturbations
\[ T_g = T + g\langle x, \cdot\rangle x \qquad (g \in \mathbb{R}). \]

The following observations confirm that this is a good choice of setup.

Exercise 15.4. Let $T \in B(H)$ be normal, and let $M \subset H$ be a closed subspace. Show that $M$ is reducing if and only if $M$ is invariant under both $T$ and $T^*$.

Now suppose that we are given an arbitrary self-adjoint operator $T \in B(H)$ and an arbitrary vector $x \in H$, $\|x\| = 1$. Form the subspace $H_1 = \overline{\{f(T)x : f \in C(\sigma(T))\}}$. Then $H_1$ is clearly invariant under $T$, thus reducing by the Exercise. Thus we can decompose $T = T_1 \oplus T_2$, where $T_2 : H_1^\perp \to H_1^\perp$. Then
\[ T + g\langle x, \cdot\rangle x = (T_1 + g\langle x, \cdot\rangle x) \oplus T_2. \]


Since it is also clear that $x$ is cyclic for $T_1$, we have reduced the situation of a general rank one perturbation to the one outlined above.

We also discover such a scenario in the theory of Jacobi matrices: If $T = J$, a Jacobi matrix on $H = \ell^2(\mathbb{N})$, then $x = \delta_1$ is a cyclic vector. Note that the perturbed operator $J_g = J + g\langle \delta_1, \cdot\rangle \delta_1$ is again a Jacobi matrix. In fact, we obtain it from $J$ by simply replacing $b_1 \to b_1 + g$.

Proposition 15.8. For every $g \in \mathbb{R}$, $x$ is a cyclic vector for $T_g$.

Proof. An inductive argument shows that $T_g^n x = T^n x + y$, where $y$ is a linear combination of $x, Tx, \ldots, T^{n-1}x$. So $L(x, Tx, \ldots, T^n x) = L(x, T_g x, \ldots, T_g^n x)$, or, put differently, $\{p(T)x\} = \{p(T_g)x\}$, where $p$ varies over all polynomials. However, every continuous function on the compact set $\sigma(T) \subset \mathbb{R}$ can be uniformly approximated by polynomials, so $\{p(T)x\}$ is already dense in $H$. □

Since $x$ is cyclic, we know from Theorem 10.7 and its proof that $T_g$ is unitarily equivalent to multiplication by $t$ on $L^2(\mathbb{R}, \mu_g)$, where $d\mu_g(t) = d\|E_g(t)x\|^2$, and here $E_g$ of course denotes the spectral resolution of $T_g$. By the functional calculus,
\[ F_g(z) \equiv \langle x, (T_g - z)^{-1} x\rangle = \int_{\mathbb{R}} \frac{d\mu_g(t)}{t - z} \qquad (z \notin \mathbb{R}). \]
These functions $F_g$ satisfy the following identity, which will be crucial for everything that follows.

Theorem 15.9.
\[ F_g(z) = \frac{F(z)}{1 + gF(z)} \tag{15.1} \]
Here, $F(z) = F_0(z) = \langle x, (T - z)^{-1} x\rangle$.

Proof. Write $P = \langle x, \cdot\rangle x$ and notice that (for $z \notin \mathbb{R}$)
\[ (T_g - z)^{-1} - (T - z)^{-1} = -g(T_g - z)^{-1} P (T - z)^{-1}, \]
so
\[ F_g(z) - F(z) = -g\langle x, (T_g - z)^{-1} P (T - z)^{-1} x\rangle = -g\langle x, (T_g - z)^{-1} x\rangle \langle x, (T - z)^{-1} x\rangle = -gF_g(z)F(z), \]
and we obtain (15.1) by rearranging. □
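Identity (15.1) is easy to test in finite dimensions. The following sketch is an illustration under assumptions: $T$ is taken to be a real diagonal $3 \times 3$ matrix, $x$ a real unit vector, and the values of $g$ and $z$ are arbitrary; all names and data are hypothetical. The resolvent is applied by solving a linear system with a small Gaussian elimination routine.

```python
def solve(a, b):
    """Solve a y = b by Gaussian elimination with partial pivoting (complex entries)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    y = [0j] * n
    for i in reversed(range(n)):
        tail = sum(m[i][c] * y[c] for c in range(i + 1, n))
        y[i] = (m[i][n] - tail) / m[i][i]
    return y


def resolvent_entry(t_diag, x, g, z):
    """F_g(z) = <x, (T_g - z)^{-1} x> for T = diag(t_diag), T_g = T + g <x, .> x.

    x is assumed real, so no complex conjugation is needed in the inner product.
    """
    n = len(x)
    a = [[g * x[i] * x[j] + ((t_diag[i] - z) if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    y = solve(a, [complex(v) for v in x])
    return sum(x[i] * y[i] for i in range(n))


# Hypothetical data: three distinct eigenvalues and a unit vector with
# nonzero entries (so that x is cyclic for T).
t_diag = [-1.0, 0.5, 2.0]
x = [2.0 / 3.0, 1.0 / 3.0, 2.0 / 3.0]
g, z = 0.7, 0.3 + 0.2j

F = resolvent_entry(t_diag, x, 0.0, z)    # unperturbed F(z)
Fg = resolvent_entry(t_diag, x, g, z)     # perturbed F_g(z)
rhs = F / (1.0 + g * F)                   # right-hand side of (15.1)
```

Up to round-off, `Fg` and `rhs` agree, and `F` matches the sum $\sum_k x_k^2/(t_k - z)$, which is the integral formula for $F$ with a point measure.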



We can now use (15.1) to show that the ac part of a self-adjoint operator is invariant under rank one perturbations. We need some preliminary observations. Let $\rho$ be an absolutely continuous (positive) Borel measure on $\mathbb{R}$. For simplicity, we also assume that $\rho$ is finite. Then a Borel set $M \subset \mathbb{R}$ is called an essential support of $\rho$ if $\rho(M^c) = 0$ and if $N \subset M$, $\rho(N) = 0$, then $|N| = 0$, where $|\cdot|$ denotes Lebesgue measure. By the Radon-Nikodym Theorem, we can write $d\rho(x) = f(x)\,dx$, with $f \in L^1(\mathbb{R})$, $f \ge 0$, and now $M = \{x \in \mathbb{R} : f(x) > 0\}$ provides an essential support.

Essential supports are unique up to null sets: If $M, M'$ are essential supports, then $|M \Delta M'| = 0$, where $M \Delta M' = (M \setminus M') \cup (M' \setminus M)$. Moreover, essential supports determine the measure class, in the following sense: Let $M_\rho, M_\nu$ be essential supports of the (absolutely continuous) measures $\rho, \nu$. Then $\rho$ and $\nu$ are equivalent (have the same null sets) if and only if $|M_\rho \Delta M_\nu| = 0$, which happens if and only if $\rho, \nu$ have a common essential support $M$.

Recall from Exercise 10.17 that two simple self-adjoint operators $S, T$ are unitarily equivalent if and only if they have equivalent spectral measures $\mu, \nu$. So we can now say that $S_{ac} \cong T_{ac}$ if and only if $\mu_{ac}, \nu_{ac}$ admit a common essential support.

With these preparations out of the way, it will now be an easy matter to establish the following fact:

Theorem 15.10. $T_g$ and $T$ have unitarily equivalent absolutely continuous parts.

This of course implies that $\sigma_{ac}(T_g) = \sigma_{ac}(T)$, but the actual statement is stronger than this because, in general, the ac spectra can be equal without the ac parts of the operators being unitarily equivalent.

Exercise 15.5. Explain this in more detail. Suggestion: Construct two ac measures $\mu, \nu$, so that $M_t^{(\mu)}$ (in $L^2(\mu)$) and $M_t^{(\nu)}$ (in $L^2(\nu)$) have the same spectra, but are not unitarily equivalent. Equivalently, you need to construct two ac measures that have the same topological support but not the same null sets.

Proof. We work with the measures $d\mu_g(t) = d\|E_g(t)x\|^2$ that were introduced above. By Theorem 13.10(a), (c), $F_g(t) \equiv \lim_{y\to 0+} F_g(t + iy)$ exists for almost every $t \in \mathbb{R}$, and $d(\mu_g)_{ac}(t) = (1/\pi)\,\mathrm{Im}\, F_g(t)\,dt$. As discussed above, $M_g = \{t \in \mathbb{R} : \mathrm{Im}\, F_g(t) > 0\}$ is an essential support of this measure. Fix $g \in \mathbb{R}$ and assume that $t \in M = M_0$.
By throwing away a null set N ⊂ R, we may also assume that F (t) = lim F (t + iy) and Fg (t) exist; since t ∈ M , we have that Im F (t) > 0. From (15.1), we see that Im F (z) Im Fg (z) = . |1 + gF (z)|2 Take z = t + iy and let y → 0+. It follows that Im Fg (t) > 0, too. In terms of the supports, this calculation has shown that we can take Mg ⊃ M . By symmetry, we also obtain that Mg ⊂ M .  This result can be improved. First of all, any self-adjoint finite rank PN perturbation is of the form V = n=1 vn hxn , ·ixn and thus may be

190

Christian Remling

interpreted as $N$ successive rank one perturbations. So the ac part of a self-adjoint operator is invariant, up to unitary equivalence, under (self-adjoint) finite rank perturbations. A stronger result holds, but this is not so easy to prove, so I’ll just report on this:

Theorem 15.11 (Kato-Rosenblum). Suppose that $T \in B(H)$ is self-adjoint and $V \in B_1(H)$ is self-adjoint. Then the ac parts of $T$ and $T + V$ are unitarily equivalent.

Exercise 15.6. Prove that the ac spectrum also obeys a decomposition law: If $J$ is a Jacobi matrix on $\ell^2(\mathbb{Z})$, then $\sigma_{ac}(J) = \sigma_{ac}(J_+) \cup \sigma_{ac}(J_-)$ (the notation is as in Theorem 15.6).

The trace class condition in Theorem 15.11 is sharp. This is demonstrated by the following rather spectacular result (which we don’t want to prove here).

Theorem 15.12 (Weyl-von Neumann). Let $T \in B(H)$ be a self-adjoint operator on a separable Hilbert space $H$. Then, for every $p > 1$ and $\epsilon > 0$, there exists a self-adjoint $K \in B_p(H)$ with $\|K\|_p < \epsilon$ so that $\sigma_{ac}(T + K) = \sigma_{sc}(T + K) = \emptyset$.

So $T + K$ has pure point spectrum. Since the essential spectrum is preserved by the compact perturbation $K$, the closure of the eigenvalues of $T + K$ has to contain $\sigma_{ess}(T)$, so we will often get dense point spectrum here.

We have seen that the ac spectrum has reasonably good stability properties under small perturbations. What about the sc, pp parts? The following examples make short work of any hopes one might have. As a preparation, we first prove a criterion that will allow us to conveniently detect point spectrum.

Proposition 15.13. Let
\[
G(x) = \int_{\mathbb{R}} \frac{d\mu(t)}{(x - t)^2} \in (0, \infty] .
\]
Then, for all $g \neq 0$, the following statements are equivalent:
(a) $\mu_g(\{x\}) > 0$;
(b) $G(x) < \infty$, $F(x) = -1/g$.

Here, $F(x) = -1/g$ could be interpreted as an abbreviation for the statement that $F(x) = \lim_{y\to 0+} F(x + iy)$ exists and equals $-1/g$, but actually existence of this limit is automatic if $G(x) < \infty$.

Exercise 15.7. Prove this remark. More precisely, prove the following: If $G(x) < \infty$, then $F(x) = \lim_{y\to 0+} F(x + iy)$ exists and $F(x) \in \mathbb{R}$.
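For a finitely supported measure, the criterion of Proposition 15.13 can be checked numerically. The following Python sketch is only an illustration and not part of the argument. It assumes the conventions $T = \mathrm{diag}(t_1, \dots, t_n)$, a cyclic vector $x$ with components $\sqrt{w_k}$ (so that $\mu = \sum w_k \delta_{t_k}$), $T_g = T + g\langle x, \cdot\rangle x$ and $F(z) = \int d\mu(t)/(t - z)$; the function name `rank_one_check` and the sample data are hypothetical. In this setting $G < \infty$ off the support of $\mu$, so each eigenvalue $\lambda$ of $T_g$ should satisfy $F(\lambda) = -1/g$, and its weight should be $\mu_g(\{\lambda\}) = 1/(g^2 G(\lambda))$, the limit found in the proof below.

```python
import numpy as np

def rank_one_check(t, w, g):
    """Compare eigen-data of T_g = T + g<x,.>x with F and G (finite-dimensional toy model)."""
    t = np.asarray(t, dtype=float)    # support points of mu (assumed distinct)
    w = np.asarray(w, dtype=float)    # weights of mu (assumed positive)
    x = np.sqrt(w)                    # cyclic vector whose spectral measure is mu
    T_g = np.diag(t) + g * np.outer(x, x)
    lam, V = np.linalg.eigh(T_g)      # eigenvalues, orthonormal eigenvectors as columns
    diff = t[None, :] - lam[:, None]  # diff[k, j] = t_j - lam_k
    F = (w / diff).sum(axis=1)        # F(lam_k) = sum_j w_j / (t_j - lam_k)
    G = (w / diff**2).sum(axis=1)     # G(lam_k) = sum_j w_j / (t_j - lam_k)^2
    mass = (V.T @ x) ** 2             # mu_g({lam_k}) = |<x, v_k>|^2
    return lam, F, G, mass

# Hypothetical sample data.
t = [0.0, 0.3, 0.7, 1.0]
w = [0.1, 0.2, 0.3, 0.4]
g = 0.5
lam, F, G, mass = rank_one_check(t, w, g)
print("F(lambda) = -1/g at every eigenvalue:", np.allclose(F, -1.0 / g))
print("mu_g({lambda}) = 1/(g^2 G(lambda)):  ", np.allclose(mass, 1.0 / (g**2 * G)))
```

Since the $t_k$ are distinct and all weights are positive, no eigenvalue of $T_g$ coincides with a support point, so the divisions above are well defined.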

Proof. Recall that $\mu_g(\{x\}) = \lim_{y\to 0+} -iyF_g(x + iy)$ (Theorem 13.10(e)). So, if $\mu_g(\{x\}) > 0$, then
\[
F(x + iy) = \frac{F_g(x + iy)}{1 - gF_g(x + iy)} = \frac{yF_g(x + iy)}{y - gyF_g(x + iy)} \to -\frac{1}{g} .
\]
Moreover,
\[
\frac{\operatorname{Im} F(x + iy)}{y} = \frac{y \operatorname{Im} F_g(x + iy)}{|y - gyF_g(x + iy)|^2}
\]
also approaches a finite, positive limit as $y \to 0+$. On the other hand,
\[
\frac{\operatorname{Im} F(x + iy)}{y} = \int_{\mathbb{R}} \frac{d\mu(t)}{y^2 + (x - t)^2} ,
\]
and this converges to $G(x)$ by the Monotone Convergence Theorem, so $G(x) < \infty$.

Conversely, if $G(x) < \infty$, then the same calculation shows that $\operatorname{Im} F(x + iy)/y \to G(x)$. Moreover, $1/|t - x| \in L^1(\mu)$, so Dominated Convergence shows that
\[
F(x) = \int_{\mathbb{R}} \frac{d\mu(t)}{t - x}
\]
(compare Exercise 15.7). Hence
\[
\frac{F(x + iy) - F(x)}{y} = i \int_{\mathbb{R}} \frac{d\mu(t)}{(t - x - iy)(t - x)} \to iG(x) ,
\]
by Dominated Convergence again. In other words, if also $F(x) = -1/g$, then $(1 + gF(x + iy))/y \to igG(x)$. It now follows that
\[
y \operatorname{Im} F_g(x + iy) = \frac{y^{-1} \operatorname{Im} F(x + iy)}{y^{-2} |1 + gF(x + iy)|^2} \to \frac{1}{g^2 G(x)} > 0 ,
\]
so $\mu_g(\{x\}) > 0$, as claimed. □
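The two limits used in the proof can be observed numerically on a toy example. The sketch below is an illustration under stated assumptions: it takes the hypothetical measure $\mu = \frac12 \delta_1 + \frac12 \delta_2$, the point $x = 0$, and assumes that (15.1) reads $F_g = F/(1 + gF)$, which is consistent with the formula $F = F_g/(1 - gF_g)$ used above. The coupling $g$ is chosen so that $F(0) = -1/g$. Here $G(0) = 5/8$ and $1/(g^2 G(0)) = 9/10$.

```python
# Hypothetical toy measure mu = (1/2) delta_1 + (1/2) delta_2, evaluated at x0 = 0.
t = [1.0, 2.0]
w = [0.5, 0.5]
x0 = 0.0

def F(z):
    """Borel transform F(z) = sum_k w_k / (t_k - z)."""
    return sum(wk / (tk - z) for tk, wk in zip(t, w))

G = sum(wk / (x0 - tk) ** 2 for tk, wk in zip(t, w))  # G(0) = 5/8
g = -1.0 / F(x0)                                      # forces F(x0) = -1/g

def F_g(z):
    """Assumed form of (15.1): F_g = F / (1 + g F)."""
    return F(z) / (1.0 + g * F(z))

def limits(y):
    """Return (Im F(x0+iy)/y, y Im F_g(x0+iy)), which should approach (G, 1/(g^2 G))."""
    z = complex(x0, y)
    return F(z).imag / y, y * F_g(z).imag

for y in (1e-2, 1e-4, 1e-6):
    print(y, limits(y))
print("targets:", G, 1.0 / (g**2 * G))
```

The second limit is the point mass $\mu_g(\{0\})$ of the perturbed measure, so in this example the perturbed operator should have an eigenvalue at $0$.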



In the following examples, we will just give the measure $\mu$. This will determine the measures $\mu_g$ completely, via $F$, $F_g$ and (15.1). Moreover, we can just let $H = L^2(\mathbb{R}, d\mu)$, $T = M_t$, $x \equiv 1$ to confirm that there indeed is a self-adjoint operator and a cyclic vector for which this measure $\mu$ is the corresponding spectral measure. Alternatively, we could let $T = J$ be the Jacobi matrix with spectral measure $\mu$ (use Theorem 13.9!) and $x = \delta_1$.

Example 15.1. Let $d\mu(x) = (1/2)\chi_{[0,1]}(x)\, dx + \sum_{n\ge 1} 2^{-n-1} \delta_{x_n}$, where $\{x_n\}$ is a countable dense subset of $[0, 1]$. Then $\sigma_{ac}(T) = \sigma_{pp}(T) = [0, 1]$. However, for all $0 \le x \le 1$, we have that
\[
G(x) \ge \frac{1}{2} \int_0^1 \frac{dt}{(x - t)^2} = \infty ,
\]
so $\sigma_{pp}(T_g) \cap [0, 1] = \emptyset$ for all $g \neq 0$ by Proposition 15.13.

Example 15.2. Let $\rho_n = 2^{-n} \sum_{j=1}^{2^n} \delta_{j2^{-n}}$ and $\mu = \sum_{n\ge 1} 2^{-n} \rho_n$. Then $\sigma_{pp}(T) = [0, 1]$, $\sigma_{ac}(T) = \sigma_{sc}(T) = \emptyset$. If $x \in [0, 1]$, then there is a $j$ so that $|x - j2^{-n}| \le 2^{-n}$, so
\[
\int_{\mathbb{R}} \frac{d\rho_n(t)}{(x - t)^2} \ge 2^{-n} 2^{2n} = 2^n ,
\]
and $G(x) \ge \sum_{n\ge 1} 2^{-n} 2^n = \infty$. Proposition 15.13 shows that $\sigma_{pp}(T_g) \cap [0, 1] = \emptyset$ for all $g \neq 0$. Since both the essential and the ac spectrum are preserved, it follows that $\sigma_{sc}(T_g) = [0, 1]$.

Exercise 15.8. Show that $\sigma_p(T_{g_1}) \cap \sigma_p(T_{g_2}) = \emptyset$ if $g_1 \neq g_2$ (we are working with $\sigma_p$ here, the set of eigenvalues, not its closure $\sigma_{pp}$).

Exercise 15.9. Let $J$ be a Jacobi matrix on $\ell^2(\mathbb{N})$, and let $d\mu(t) = d\|E(t)\delta_1\|^2$, as usual. Consider the family of rank one perturbations $J_g$ (corresponding to the coefficient change $b_1 \to b_1 + g$).
(a) Show that $x \in \mathbb{R}$ is an eigenvalue of $J_g$ for some $g \in \mathbb{R}$ if and only if $(\tau - x)u = 0$ has an $\ell^2$ solution $u$ with $u_1 \neq 0$.
(b) Show that $G(x) < \infty$ if and only if $(\tau - x)u = 0$ has an $\ell^2$ solution $u$ with $u_0, u_1 \neq 0$.

Exercise 15.10. Let $T \in B(H)$ be a self-adjoint operator. Show that $T$ is compact if and only if $\sigma_{ess}(T) \subset \{0\}$.
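Returning to Example 15.2, the divergence of $G(x)$ can be made visible by computing partial sums. The Python sketch below is only an illustration; the function names `rho_integral` and `partial_sums` and the test point $x = 1/3$ are hypothetical choices. It evaluates $\sum_{n \le N} 2^{-n} \int d\rho_n(t)/(x - t)^2$ directly from the definition of $\rho_n$. By the estimate in the example, every term is at least $1$, so the $N$-th partial sum should be at least $N$.

```python
def rho_integral(n, x):
    """Integral of (x - t)^(-2) against rho_n = 2^(-n) * sum_{j=1}^{2^n} delta_{j 2^(-n)}."""
    h = 2.0 ** (-n)
    return h * sum(1.0 / (x - j * h) ** 2 for j in range(1, 2 ** n + 1))

def partial_sums(x, N):
    """Partial sums of G(x) = sum_{n>=1} 2^(-n) * rho_integral(n, x), for n = 1..N."""
    sums, total = [], 0.0
    for n in range(1, N + 1):
        total += 2.0 ** (-n) * rho_integral(n, x)
        sums.append(total)
    return sums

x = 1.0 / 3.0               # not a dyadic rational, so every individual term is finite
sums = partial_sums(x, 12)
for n, s in enumerate(sums, start=1):
    print(n, s)             # each partial sum is bounded below by n
```

At a dyadic rational $x = j2^{-n}$ the measure $\mu$ has a point mass and the corresponding term is already infinite, which is why the sketch uses a non-dyadic test point.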