This graduate-level textbook introduces the reader to the area of inverse problems, vital to many fields including geophysics.
English, 400 [412] pages, 2021
Table of contents:
Preface to the Third Edition
Preface to the Second Edition
Preface to the First Edition
Contents
1 Introduction and Basic Concepts
1.1 Examples of Inverse Problems
1.2 Ill-Posed Problems
1.3 The Worst-Case Error
1.4 Problems
2 Regularization Theory for Equations of the First Kind
2.1 A General Regularization Theory
2.2 Tikhonov Regularization
2.3 Landweber Iteration
2.4 A Numerical Example
2.5 The Discrepancy Principle of Morozov
2.6 Landweber's Iteration Method with Stopping Rule
2.7 The Conjugate Gradient Method
2.8 Problems
3 Regularization by Discretization
3.1 Projection Methods
3.2 Galerkin Methods
3.2.1 The Least Squares Method
3.2.2 The Dual Least Squares Method
3.2.3 The Bubnov–Galerkin Method for Coercive Operators
3.3 Application to Symm's Integral Equation of the First Kind
3.4 Collocation Methods
3.4.1 Minimum Norm Collocation
3.4.2 Collocation of Symm's Equation
3.5 Numerical Experiments for Symm's Equation
3.6 The Backus–Gilbert Method
3.7 Problems
4 Nonlinear Inverse Problems
4.1 Local Ill-Posedness
4.2 The Nonlinear Tikhonov Regularization
4.2.1 Existence of Solutions and Stability
4.2.2 Source Conditions and Convergence Rates
4.2.3 A Parameter-Identification Problem
4.2.4 A Glimpse on Extensions to Banach Spaces
4.3 The Nonlinear Landweber Iteration
4.4 Problems
5 Inverse Eigenvalue Problems
5.1 Introduction
5.2 Construction of a Fundamental System
5.3 Asymptotics of the Eigenvalues and Eigenfunctions
5.4 Some Hyperbolic Problems
5.5 The Inverse Problem
5.6 A Parameter Identification Problem
5.7 Numerical Reconstruction Techniques
5.8 Problems
6 An Inverse Problem in Electrical Impedance Tomography
6.1 Introduction
6.2 The Direct Problem and the Neumann–Dirichlet Operator
6.3 The Inverse Problem
6.4 The Factorization Method
6.5 Problems
7 An Inverse Scattering Problem
7.1 Introduction
7.2 The Direct Scattering Problem
7.3 Properties of the Far Field Patterns
7.4 Uniqueness of the Inverse Problem
7.5 The Factorization Method
7.6 The Interior Transmission Eigenvalue Problem
7.6.1 The Radially Symmetric Case
7.6.2 Discreteness and Existence in the General Case
7.6.3 The Inverse Spectral Problem for the Radially Symmetric Case
7.7 Numerical Methods
7.7.1 A Simplified Newton Method
7.7.2 A Modified Gradient Method
7.7.3 The Dual Space Method
7.8 Problems
A Basic Facts from Functional Analysis
A.1 Normed Spaces and Hilbert Spaces
A.2 Orthonormal Systems
A.3 Linear Bounded and Compact Operators
A.4 Sobolev Spaces of Periodic Functions
A.5 Sobolev Spaces on the Unit Disc
A.6 Spectral Theory for Compact Operators in Hilbert Spaces
A.7 The Fréchet Derivative
A.8 Convex Analysis
A.9 Weak Topologies
A.10 Problems
B Proofs of the Results of Section 2.7
Applied Mathematical Sciences
Andreas Kirsch
An Introduction to the Mathematical Theory of Inverse Problems Third Edition
Applied Mathematical Sciences Volume 120
Series Editors: Anthony Bloch, Department of Mathematics, University of Michigan, Ann Arbor, MI, USA; C. L. Epstein, Department of Mathematics, University of Pennsylvania, Philadelphia, PA, USA; Alain Goriely, Department of Mathematics, University of Oxford, Oxford, UK; Leslie Greengard, New York University, New York, NY, USA
Advisory Editors: J. Bell, Center for Computational Sciences and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA, USA; P. Constantin, Department of Mathematics, Princeton University, Princeton, NJ, USA; R. Durrett, Department of Mathematics, Duke University, Durham, NC, USA; R. Kohn, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA; R. Pego, Department of Mathematical Sciences, Carnegie Mellon University, Pittsburgh, PA, USA; L. Ryzhik, Department of Mathematics, Stanford University, Stanford, CA, USA; A. Singer, Department of Mathematics, Princeton University, Princeton, NJ, USA; A. Stevens, Department of Applied Mathematics, University of Münster, Münster, Germany; S. Wright, Computer Sciences Department, University of Wisconsin, Madison, WI, USA
Founding Editors: F. John, New York University, New York, NY, USA; J. P. LaSalle, Brown University, Providence, RI, USA; L. Sirovich, Brown University, Providence, RI, USA
The mathematization of all sciences, the fading of traditional scientific boundaries, the impact of computer technology, the growing importance of computer modeling, and the necessity of scientific planning all create the need, both in education and research, for books that are introductory to and abreast of these developments. The purpose of this series is to provide such books, suitable for the user of mathematics, the mathematician interested in applications, and the student scientist. In particular, this series will provide an outlet for topics of immediate interest because of the novelty of the treatment of an application, or of mathematics being applied or lying close to applications. These books should be accessible to readers versed in mathematics or science and engineering, and will feature a lively tutorial style, a focus on topics of current interest, and clear exposition of broad appeal. A complement to the Applied Mathematical Sciences series is the Texts in Applied Mathematics series, which publishes textbooks suitable for advanced undergraduate and beginning graduate courses.
More information about this series at http://www.springer.com/series/34
Andreas Kirsch Department of Mathematics Karlsruhe Institute of Technology (KIT) Karlsruhe, Germany
ISSN 0066-5452; ISSN 2196-968X (electronic)
Applied Mathematical Sciences
ISBN 978-3-030-63342-4; ISBN 978-3-030-63343-1 (eBook)
https://doi.org/10.1007/978-3-030-63343-1
Mathematics Subject Classification: 45Q05, 47A52, 47J06, 65J20, 65J22, 65L08, 65L09, 65N20, 65N21, 65R30, 65R32, 78A45, 78A46, 81U40, 86A22, 31A25, 34A55, 34B24, 35J05, 35R25, 35R30
1st edition: © Springer-Verlag New York, Inc. 1996
2nd edition: © Springer Science+Business Media, LLC 2011
3rd edition: © Springer Nature Switzerland AG 2021
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.
Preface to the Third Edition The field of inverse problems is growing rapidly, and during the 9 years since the appearance of the second edition of this book many new aspects and subfields have been developed. Since, obviously, not every subject can be treated in a single monograph, the author had to make a decision. As I pointed out already in the preface of the first edition, my intention was—and still is—to introduce the reader to some of the basic principles and developments of this field of inverse problems rather than going too deeply into special topics. As I continued to lecture on inverse problems at the University of Karlsruhe (now Karlsruhe Institute of Technology), new material has been added to the courses and thus also to this new edition because the idea of this book is still to serve as a type of textbook for a course on inverse problems. I have decided to extend this monograph in two directions. For some readers, it was perhaps a little unsatisfactory that only the abstract theory for linear problems was presented but the applications to inverse eigenvalue problems, electrical impedance tomography, and inverse scattering theory are of a nonlinear type. For that reason, and also because the abstract theories for Tikhonov’s method and Landweber’s iteration for nonlinear equations have come to a certain completion, I included a new chapter (Chapter 4) on these techniques for locally improperly posed nonlinear equations in Hilbert spaces with an outlook into some rather new developments for Banach spaces. The former Chapters 4, 5, and 6 are moved to 5, 6, and 7, respectively. The additional functional analytic tools needed in this new Chapter 4 result in two new sections of Appendix A on convex analysis and weak topologies. As a second new topic, a separate section (Section 7.6) on interior transmission eigenvalues is included. 
These eigenvalue problems arise naturally in the study of inverse scattering problems for inhomogeneous media and were introduced already in the former editions of this monograph. Besides their importance in scattering theory, the transmission eigenvalue problem is an interesting subject in itself, mainly because it fails to be selfadjoint. The investigation of the spectrum is a subject of the present
research. Special issues of Inverse Problems [37] and recent monographs [34, 55] have addressed this topic already. For the study of complex eigenvalues, one is until now restricted to radially symmetric refractive indices, which reduces the partial differential equations to ordinary differential equations. Classical tools from complex analysis make it possible to prove the existence of complex eigenvalues (Subsection 7.6.1) and uniqueness for the corresponding inverse spectral problem (Subsection 7.6.3). I think that this analogue of the inverse Sturm–Liouville problem of Chapter 5 is a natural completion of the study of interior transmission eigenvalues. Finally, a rather large number of mistakes, ambiguities, and misleading formulations have been corrected in every chapter. In particular, the proofs of Theorems 4.22 (a) and 6.30 (d) (referring to the numbering of the second edition) contained major mistakes and have been corrected. I want to thank all of my colleagues and the readers of the first two editions for the overwhelmingly positive responses and, last but not least, the publisher for its encouragement to write this third edition.

Karlsruhe, Germany
December 2020
Andreas Kirsch
Preface to the Second Edition The first edition of the book appeared 14 years ago. The area of inverse problems is still a growing field of applied mathematics and an attempt at a second edition after such a long time was a difficult task for me. The number of publications on the subjects treated in this book has grown considerably and a new generation of mathematicians, physicists, and engineers has brought new concepts into the field. My philosophy, however, has never been to present a comprehensive book on inverse problems that covers all aspects. My purpose was (as I pointed out in the preface of the first edition), and still is, to present a book that can serve as a basis for an introductory (graduate) course in this field. The choice of material covered in this book reflects my personal point of view: students should learn the basic facts for linear ill-posed problems including some of the present classical concepts of regularization and also some important examples of more modern nonlinear inverse problems. Although there has been considerable progress made on regularization concepts and convergence properties of iterative methods for abstract nonlinear inverse problems, I decided not to include these new developments in this monograph. One reason is that these theoretical results on nonlinear inverse problems are still not applicable to the inverse scattering problems that are my major field of interest. Instead, I refer the reader to the monographs [92, 149] where regularization methods for nonlinear problems are intensively treated. Also, in my opinion, every nonlinear inverse problem has its own characteristic features that should be used for a successful solution. With respect to the inverse scattering problem to determine the shape of the support of the contrast, a whole class of methods has been developed during the last decade, sometimes subsumed under the name Sampling Methods.
Because they are very popular not only in the field of inverse scattering theory but also in the field of electrical impedance tomography (EIT), I decided to include the Factorization Method as one of the prominent members in this monograph.
The Factorization Method is particularly simple for the problem of EIT, and this field has attracted a lot of attention during the past decade; therefore, a chapter on EIT has been added to this monograph as Chapter 5, and the chapter on inverse scattering theory now becomes Chapter 6. The main changes of this second edition compared to the first edition concern only Chapters 5 and 6 and Appendix A. As just mentioned, in Chapter 5 we introduce the reader to the inverse problem of electrical impedance tomography. This area has become increasingly important because of its applications in medicine and engineering sciences. Also, the methods of EIT serve as tools and guidelines for the investigation of other areas of tomography such as optical and photoacoustic tomography. The forward model of EIT is usually set up in the weak sense, that is, in appropriate Sobolev spaces. Although I expect that the reader is familiar with the basic facts on Sobolev spaces such as the trace theorem and Friedrichs' inequality, a tutorial section on Sobolev spaces on the unit disk has been added in Appendix A, Section A.5. The approach using Fourier techniques is not very common but fits well with the presentation of Sobolev spaces of fractional order on the boundary of the unit disk in Section A.4 of Appendix A. In Chapter 5 on electrical impedance tomography, the Neumann–Dirichlet operator is introduced and its most important properties such as monotonicity, continuity, and differentiability are shown. Uniqueness of the inverse problem is proven for the linearized problem only, because it was this example for which Calderón presented his famous proof of uniqueness. (The fairly recent uniqueness proof by Astala and Päivärinta in [10] is far too complicated to be treated in this introductory work.) As mentioned above, the Factorization Method was developed during the last decade.
It is a completely new and mathematically elegant approach to characterize the shape of the domain where the conductivity differs from the background by the Neumann–Dirichlet operator. The Factorization Method is an example of an approach that uses special features of the nonlinear inverse problem under consideration and has no analogy for traditional linear inverse problems. Major changes are also made in Chapter 6 on inverse scattering problems. A section on the Factorization Method has been added (Section 6.4) because inverse scattering problems are the type of problem for which it is perfectly applicable. The rigorous mathematical treatment of the Factorization Method makes it necessary to work with weak solutions of the scattering problem. Therefore, here we also have to use (local) Sobolev spaces rather than spaces of continuously differentiable functions. I took the opportunity to introduce the reader to a (in my opinion) very natural approach to proving existence of weak solutions by the Lippmann–Schwinger equation in L2(D) (where D contains the support of the contrast n − 1). The key is the fact that the volume potential with any L2-density solves the corresponding inhomogeneous Helmholtz equation in the weak sense (just as in the case of smooth densities); this fact can easily be proved by using the classical result and a density argument. The notion of weak solutions has the advantage of allowing
arbitrary L∞-functions as indices of refraction but makes it necessary to modify almost all of the arguments in this chapter slightly. In Section 6.7 we dropped the motivating example for the uniqueness of the inverse scattering problem (Lemma 6.8 in the first edition) because it has already been presented for the uniqueness of the linearized inverse problem of impedance tomography. Finally, I want to thank all the readers of the first edition of the monograph for their extraordinarily positive response. I hope that with this second edition I have added some course material suitable for being presented in a graduate course on inverse problems. In particular, I have found that my students like the problem of impedance tomography and, in particular, the Factorization Method, and I hope that this is true for others!

Karlsruhe, Germany
March 2011
Andreas Kirsch
Preface to the First Edition Following Keller [152] we call two problems inverse to each other if the formulation of each of them requires full or partial knowledge of the other. By this definition, it is obviously arbitrary which of the two problems we call the direct and which we call the inverse problem. But usually, one of the problems has been studied earlier and, perhaps, in more detail. This one is usually called the direct problem, whereas the other is the inverse problem. However, there is often another, more important difference between these two problems. Hadamard (see [115]) introduced the concept of a well-posed problem, originating from the philosophy that the mathematical model of a physical problem has to have the properties of uniqueness, existence, and stability of the solution. If one of the properties fails to hold, he called the problem ill-posed. It turns out that many interesting and important inverse problems in science lead to ill-posed problems, whereas the corresponding direct problems are well-posed. Often, existence and uniqueness can be forced by enlarging or reducing the solution space (the space of “models”). For restoring stability, however, one has to change the topology of the spaces, which is in many cases impossible because of the presence of measurement errors. At first glance, it seems to be impossible to compute the solution of a problem numerically if the solution of the problem does not depend continuously on the data, that is, in the case of ill-posed problems. Under additional a priori information about the solution, such as smoothness and bounds on the derivatives, however, it is possible to restore stability and construct efficient numerical algorithms. We make no claim to cover all of the topics in the theory of inverse problems. Indeed, with the rapid growth of this field and its relationship to many fields of natural and technical sciences, such a task would certainly be impossible for a single author in a single volume.
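The instability described here is easy to observe numerically. The following sketch is my own illustration, not taken from the book; the Gaussian kernel, the grid sizes, and the noise level are arbitrary choices. It discretizes a first-kind Fredholm integral equation (Af)(x) = ∫₀¹ exp(−(x−t)²) f(t) dt by the midpoint rule and shows that the resulting linear systems become catastrophically ill-conditioned, so that a naive solve amplifies even tiny data noise:

```python
import numpy as np

def smoothing_matrix(n):
    # Midpoint-rule discretization of (Af)(x) = integral_0^1 exp(-(x-t)^2) f(t) dt
    t = (np.arange(n) + 0.5) / n
    return np.exp(-(t[:, None] - t[None, :]) ** 2) / n

# The condition number explodes as the discretization is refined.
for n in (10, 20, 40):
    print(n, np.linalg.cond(smoothing_matrix(n)))

# Naive inversion: a data perturbation of size 1e-6 destroys the solution.
n = 20
A = smoothing_matrix(n)
t = (np.arange(n) + 0.5) / n
f_true = np.sin(np.pi * t)
g_noisy = A @ f_true + 1e-6 * np.random.default_rng(0).standard_normal(n)
f_naive = np.linalg.solve(A, g_noisy)
err = np.linalg.norm(f_naive - f_true) / np.linalg.norm(f_true)
print("relative error of naive solution:", err)
```

Restoring stability under additional a priori information, as described above, is exactly what the regularization methods studied in this book accomplish.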
The aim of this book is twofold: first, we introduce the reader to the basic notions and difficulties encountered with ill-posed problems. We then study the basic properties of regularization methods for linear ill-posed problems. These methods can roughly be classified into two groups, namely, whether the regularization
parameter is chosen a priori or a posteriori. We study some of the most important regularization schemes in detail. The second aim of this book is to give a first insight into two special nonlinear inverse problems that are of vital importance in many areas of the applied sciences. In both inverse spectral theory and inverse scattering theory, one tries to determine a coefficient in a differential equation from measurements of either the eigenvalues of the problem or the field “far away” from the scatterer. We hope that these two examples clearly show that a successful treatment of nonlinear inverse problems requires a solid knowledge of characteristic features of the corresponding direct problem. The combination of classical analysis and modern areas of applied and numerical analysis is, in the author’s opinion, one of the fascinating features of this relatively new area of applied mathematics. This book arose from a number of graduate courses, lectures, and survey talks during my time at the universities of Göttingen and Erlangen/Nürnberg. It was my intention to present a fairly elementary and complete introduction to the field of inverse problems, accessible not only to mathematicians but also to physicists and engineers. I tried to include as many proofs as possible as long as they required knowledge only of classical differential and integral calculus. The notions of functional analysis make it possible to treat different kinds of inverse problems in a common language and to extract their basic features. For the convenience of the reader, I have collected the basic definitions and theorems from linear and nonlinear functional analysis at the end of the book in an appendix. Results on nonlinear mappings, in particular for the Fréchet derivative, are only needed in Chapters 4 and 5. The book is organized as follows. In Chapter 1, we begin with a list of pairs of direct and inverse problems. Many of them are quite elementary and should be well known.
We formulate them from the point of view of inverse theory to demonstrate that the study of particular inverse problems has a long history. Sections 1.3 and 1.4 introduce the notions of ill-posedness and the worst-case error. Although ill-posedness of a problem (roughly speaking) implies that the solution cannot be computed numerically — which is a very pessimistic point of view — the notion of the worst-case error leads to the possibility that stability can be recovered if additional information is available. We illustrate these notions with several elementary examples. In Chapter 2, we study the general regularization theory for linear ill-posed equations in Hilbert spaces. The general concept in Section 2.1 is followed by the most important special examples: Tikhonov regularization in Section 2.2, Landweber iteration in Section 2.3, and spectral cutoff in Section 2.4. These regularization methods are applied to a test example in Section 2.5. While in Sections 2.1–2.5 the regularization parameter has been chosen a priori, that is, before starting the actual computation, Sections 2.6–2.8 are devoted to regularization methods in which the regularization parameter is chosen implicitly by the stopping rule of the algorithm. In
Sections 2.6 and 2.7, we study Morozov’s discrepancy principle and, again, Landweber’s iteration method. In contrast to these linear regularization schemes, we will investigate the conjugate gradient method in Section 2.8. This algorithm can be interpreted as a nonlinear regularization method and is much more difficult to analyze. Chapter 2 deals with illposed problems in infinitedimensional spaces. However, in practical situations, these problems are first discretized. The discretization of linear illposed problems leads to badly conditioned finite linear systems. This subject is treated in Chapter 3. In Section 3.1, we recall basic facts about general projection methods. In Section 3.2, we study several Galerkin methods as special cases and apply the results to Symm’s integral equation in Section 3.3. This equation serves as a popular model equation in many papers on the numerical treatment of integral equations of the first kind with weakly singular kernels. We present a complete and elementary existence and uniqueness theory of this equation in Sobolev spaces and apply the results about Galerkin methods to this equation. In Section 3.4, we study collocation methods. Here, we restrict ourselves to two examples: the moment collocation and the collocation of Symm’s integral equation with trigonometric polynomials or piecewise constant functions as basis functions. In Section 3.5, we compare the different regularization techniques for a concrete numerical example of Symm’s integral equation. Chapter 3 is completed by an investigation of the Backus–Gilbert method. Although this method does not quite fit into the general regularization theory, it is nevertheless widely used in the applied sciences to solve moment problems. In Chapter 4, we study an inverse eigenvalue problem for a linear ordinary differential equation of second order. 
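As a concrete illustration of the a posteriori strategy mentioned above, here is a minimal numerical sketch; it is mine, not the book's, and the Gaussian kernel, the noise level, and the safety factor tau = 1.5 are illustrative assumptions. It computes the Tikhonov solution f_alpha = (AᵀA + alpha·I)⁻¹Aᵀg for a midpoint-rule discretization of a smoothing integral operator and picks alpha as the largest value satisfying Morozov's discrepancy principle, i.e. the residual does not exceed tau times the noise level delta:

```python
import numpy as np

def tikhonov(A, g, alpha):
    # Tikhonov-regularized solution f_alpha = (A^T A + alpha I)^{-1} A^T g
    return np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ g)

def morozov(A, g_noisy, delta, tau=1.5):
    # Scan alpha from large to small; return the largest alpha whose
    # residual ||A f_alpha - g_noisy|| is at most tau * delta.
    for alpha in 10.0 ** -np.arange(0, 14):
        f = tikhonov(A, g_noisy, alpha)
        if np.linalg.norm(A @ f - g_noisy) <= tau * delta:
            return alpha, f
    return alpha, f

# Severely ill-posed test problem: midpoint-rule discretization of a Gaussian kernel.
n = 20
t = (np.arange(n) + 0.5) / n
A = np.exp(-(t[:, None] - t[None, :]) ** 2) / n
f_true = np.sin(np.pi * t)
noise = 1e-4 * np.random.default_rng(1).standard_normal(n)
delta = np.linalg.norm(noise)          # noise level, assumed known
g_noisy = A @ f_true + noise

alpha, f_alpha = morozov(A, g_noisy, delta)
print("alpha =", alpha)
print("relative error:", np.linalg.norm(f_alpha - f_true) / np.linalg.norm(f_true))
```

Unlike a naive solve, which magnifies the noise by the condition number of A, the discrepancy-based choice stops fitting the data once the residual reaches the noise level, which is the essence of the a posteriori parameter choice.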
In Sections 4.2 and 4.3, we develop a careful analysis of the direct problem, which includes the asymptotic behaviour of the eigenvalues and eigenfunctions. Section 4.4 is devoted to the question of uniqueness of the inverse problem, that is, the problem of recovering the coefficient in the differential equation from the knowledge of one or two spectra. In Section 4.5, we show that this inverse problem is closely related to a parameter identification problem for parabolic equations. Section 4.6 describes some numerical reconstruction techniques for the inverse spectral problem. In Chapter 5, we introduce the reader to the field of inverse scattering theory. Inverse scattering problems occur in several areas of science and technology, such as medical imaging, nondestructive testing of material, and geological prospecting. In Section 5.2, we study the direct problem and prove uniqueness, existence, and continuous dependence on the data. In Section 5.3, we study the asymptotic form of the scattered field as r → ∞ and introduce the far field pattern. The corresponding inverse scattering problem is to recover the index of refraction from a knowledge of the far field pattern. We give a complete proof of uniqueness of this inverse problem in Section 5.4. Finally, Section 5.5 is devoted to the study of some recent reconstruction techniques for the inverse scattering problem.
Chapter 5 differs from previous ones in the unavoidable fact that we have to use some results from scattering theory without giving proofs. We only formulate these results, and for the proofs we refer to easily accessible standard literature. There exists a tremendous amount of literature on several aspects of inverse theory ranging from abstract regularization concepts to very concrete applications. Instead of trying to give a complete list of all relevant contributions, I mention only the monographs [17, 105, 110, 136, 168, 173, 174, 175, 182, 197, 198, 215, 263, 264], the proceedings [5, 41, 73, 91, 117, 212, 237, 259], and the survey articles [88, 148, 152, 155, 214] and refer to the references therein. This book would not have been possible without the direct or indirect contributions of numerous colleagues and students. But, first of all, I would like to thank my father for his ability to stimulate my interest and love of mathematics over the years. Also, I am deeply indebted to my friends and teachers, Professor Dr. Rainer Kress and Professor David Colton, who introduced me to the field of scattering theory and influenced my mathematical life in an essential way. This book is dedicated to my long friendship with them! Particular thanks are given to Dr. Frank Hettlich, Dr. Stefan Ritter, and Dipl.-Math. Markus Wartha for carefully reading the manuscript. Furthermore, I would like to thank Professor William Rundell and Dr. Martin Hanke for their manuscripts on inverse Sturm–Liouville problems and conjugate gradient methods, respectively, on which parts of Chapters 4 and 2 are based.

Karlsruhe, Germany
April 1996
Andreas Kirsch
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
315 315 321 324 332 340 345 351 357 363 365
B Proofs of the Results of Section 2.7 . . . . . . . . . . . . . . . . . . . . . . . . 367 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377 Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
Chapter 1

Introduction and Basic Concepts

1.1 Examples of Inverse Problems
In this section, we present some examples of pairs of problems that are inverse to each other. We start with some simple examples that are normally not even recognized as inverse problems. Most of them are taken from the survey article [152] and the monograph [111].

© Springer Nature Switzerland AG 2021. A. Kirsch, An Introduction to the Mathematical Theory of Inverse Problems, Applied Mathematical Sciences 120, https://doi.org/10.1007/978-3-030-63343-1_1

Example 1.1 Find a polynomial $p$ of degree $n$ with given zeros $x_1, \ldots, x_n$. This problem is inverse to the direct problem: find the zeros $x_1, \ldots, x_n$ of a given polynomial $p$. In this example, the inverse problem is easier to solve. Its solution is $p(x) = c\,(x - x_1)\cdots(x - x_n)$ with an arbitrary constant $c$.

Example 1.2 Find a polynomial $p$ of degree $n$ that assumes given values $y_1, \ldots, y_n \in \mathbb{R}$ at given points $x_1, \ldots, x_n \in \mathbb{R}$. This problem is inverse to the direct problem of evaluating a given polynomial at the given points $x_1, \ldots, x_n$. The inverse problem is the Lagrange interpolation problem.

Example 1.3 Given a real symmetric $n \times n$ matrix $A$ and $n$ real numbers $\lambda_1, \ldots, \lambda_n$, find a diagonal matrix $D$ such that $A + D$ has the eigenvalues $\lambda_1, \ldots, \lambda_n$. This problem is inverse to the direct problem of computing the eigenvalues of the given matrix $A + D$.

Example 1.4 This inverse problem is used in intelligence tests: given the first few terms $a_1, a_2, \ldots, a_k$ of a sequence, find the law of formation of the sequence; that is, find $a_n$ for all $n$. Usually, only the next two or three terms are asked for to show
that the law of formation has been found. The corresponding direct problem is to evaluate the sequence $(a_n)$ given the law of formation. It is clear that such inverse problems always have many solutions (from the mathematical point of view), and for this reason their use in intelligence tests has been criticized.

Example 1.5 (Geological prospecting) In general, this is the problem of determining the location, shape, and/or some parameters (such as conductivity) of geological anomalies in the Earth's interior from measurements at its surface. We consider a simple one-dimensional example and describe the following inverse problem: determine changes $\rho = \rho(x')$, $0 \le x' \le 1$, of the mass density of an anomalous region at depth $h$ from measurements of the vertical component $f_v(x)$ of the change of force at $x$. Here $\rho(x')\,\Delta x'$ is the mass of a "volume element" at $x'$, and $\sqrt{(x - x')^2 + h^2}$ is its distance from the instrument. The change of gravity is described by Newton's law of gravity $f = \gamma\, m/r^2$ with gravitational constant $\gamma$. For the vertical component, we have
\[
  \Delta f_v(x) \;=\; \gamma\,\frac{\rho(x')\,\Delta x'}{(x - x')^2 + h^2}\,\cos\theta
  \;=\; \gamma\,\frac{h\,\rho(x')\,\Delta x'}{\bigl((x - x')^2 + h^2\bigr)^{3/2}}\,.
\]

[Figure: a mass element at horizontal position x' and depth h below the surface; θ is the angle at the measurement point x ∈ [0, 1] between the vertical and the line of sight to the element.]

This yields the following integral equation for the determination of $\rho$:
\[
  f_v(x) \;=\; \gamma\, h \int_0^1 \frac{\rho(x')}{\bigl((x - x')^2 + h^2\bigr)^{3/2}}\, dx'
  \quad \text{for } 0 \le x \le 1. \tag{1.1}
\]
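The character of equation (1.1) can be explored numerically. The following Python sketch (with the arbitrary illustrative choices $\gamma = 1$, depth $h = 0.2$, and a midpoint-rule grid of $n = 50$ points) evaluates the direct problem and shows that the discretized kernel matrix is severely ill-conditioned, which is the numerical face of the ill-posedness discussed in Section 1.2:

```python
import numpy as np

# Midpoint-rule discretization of the gravimetry equation (1.1):
#   f_v(x) = gamma * h * int_0^1 rho(x') / ((x - x')^2 + h^2)^(3/2) dx'
# Illustrative choices (not from the text): gamma = 1, h = 0.2, n = 50.
n, h = 50, 0.2
s = (np.arange(n) + 0.5) / n          # quadrature nodes x'_j
x = s.copy()                          # measurement points x_i
K = (h / n) / ((x[:, None] - s[None, :]) ** 2 + h**2) ** 1.5

rho = np.sin(np.pi * s)               # a test density change
fv = K @ rho                          # direct problem: evaluate the integral

# The direct problem smooths the data; the price is that the matrix K is
# severely ill-conditioned, so the inverse problem amplifies data errors.
print(np.linalg.cond(K))
```

Evaluating `K @ rho` is harmless, while any attempt to invert `K` naively would multiply measurement noise by the (enormous) condition number printed above.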
We refer to [6, 105, 277] for further reading on this and related inverse problems in geological prospecting.

Example 1.6 (Inverse scattering problem) Find the shape of a scattering object, given the intensity (and phase) of sound or electromagnetic waves scattered by this object. The corresponding direct problem is that of calculating the scattered wave for a given object.

[Figure: an incident plane wave u^i hits the obstacle D and produces the scattered wave u^s.]

More precisely, the direct problem can be described as follows. Let a bounded region $D \subset \mathbb{R}^N$ ($N = 2$ or $3$) with smooth boundary $\partial D$ (the scattering object) be given, together with a plane incident wave $u^i(x) = e^{ik\,\hat\theta\cdot x}$, where $k > 0$ denotes the wave number and $\hat\theta$ is a unit vector that describes the direction of the incident wave. The direct problem is to find the total field $u = u^i + u^s$ as the sum of the incident field $u^i$ and the scattered field $u^s$ such that
\[
  \Delta u + k^2 u \;=\; 0 \ \text{ in } \mathbb{R}^N \setminus \overline{D}\,, \qquad u \;=\; 0 \ \text{ on } \partial D, \tag{1.2a}
\]
\[
  \frac{\partial u^s}{\partial r} - ik\,u^s \;=\; \mathcal{O}\bigl(r^{-(N+1)/2}\bigr) \quad \text{for } r = |x| \to \infty, \ \text{uniformly in } \frac{x}{|x|}. \tag{1.2b}
\]
For acoustic scattering problems, $v(x,t) = u(x)\,e^{-i\omega t}$ describes the pressure and $k = \omega/c$ is the wave number with speed of sound $c$. For suitably polarized time-harmonic electromagnetic scattering problems, Maxwell's equations reduce to the two-dimensional Helmholtz equation $\Delta u + k^2 u = 0$ for the components of the electric (or magnetic) field $u$. The wave number $k$ is given in terms of the dielectric constant $\varepsilon$ and permeability $\mu$ by $k = \sqrt{\varepsilon\mu}\,\omega$. In both cases, the radiation condition (1.2b) yields the following asymptotic behavior:
\[
  u^s(x) \;=\; \frac{\exp(ik|x|)}{|x|^{(N-1)/2}}\, u_\infty(\hat x) \;+\; \mathcal{O}\bigl(|x|^{-(N+1)/2}\bigr) \quad \text{as } |x| \to \infty,
\]
where $\hat x = x/|x|$. The inverse problem is to determine the shape of $D$ when the far field pattern $u_\infty(\hat x)$ is measured for all $\hat x$ on the unit sphere in $\mathbb{R}^N$. These and related inverse scattering problems have various applications, for example in computer tomography, seismic and electromagnetic exploration in geophysics, and nondestructive testing of materials. An inverse scattering problem of this type is treated in detail in Chapter 7. Standard literature on these direct and inverse scattering problems comprises the monographs [53, 55, 176] and the survey articles [50, 248].

Example 1.7 (Computer tomography) The most spectacular application of the Radon transform is in medical imaging. For example, consider a fixed plane through a human body. Let $\rho(x,y)$ denote the change of density at the point $(x,y)$, and let $L$ be any line in the plane. Suppose that we direct a thin beam of X-rays into the body along $L$ and measure how much of the intensity is attenuated by going through the body.
[Figure: the ray L in the (x, y)-plane, parametrized by its signed distance s from the origin and its angle δ.]

Let $L$ be parametrized by $(s,\delta)$, where $s \in \mathbb{R}$ and $\delta \in [0,\pi)$. The ray $L_{s,\delta}$ consists of the points $s\,e^{i\delta} + iu\,e^{i\delta} \in \mathbb{C}$, $u \in \mathbb{R}$, where we have identified $\mathbb{C}$ with $\mathbb{R}^2$. The attenuation of the intensity $I$ is approximately described by $dI = -\gamma\,\rho\, I\, du$ with some constant $\gamma$. Integration along the ray yields
\[
  \ln I(u) \;=\; -\gamma \int_{u_0}^{u} \rho\bigl(s\,e^{i\delta} + it\,e^{i\delta}\bigr)\, dt
\]
or, assuming that $\rho$ is of compact support, the relative intensity loss is given by
\[
  \ln I(\infty) \;=\; -\gamma \int_{-\infty}^{\infty} \rho\bigl(s\,e^{i\delta} + it\,e^{i\delta}\bigr)\, dt\,.
\]
In principle, from the attenuation factors we can compute all line integrals
\[
  (R\rho)(s,\delta) \;:=\; \int_{-\infty}^{\infty} \rho\bigl(s\,e^{i\delta} + iu\,e^{i\delta}\bigr)\, du, \qquad s \in \mathbb{R},\ \delta \in [0,\pi). \tag{1.3}
\]
$R\rho$ is called the Radon transform of $\rho$. The direct problem is to compute the Radon transform $R\rho$ when $\rho$ is given. The inverse problem is to determine the density $\rho$ for a given Radon transform $R\rho$ (that is, from measurements of all line integrals).

The problem simplifies in the following special case, where we assume that $\rho$ is radially symmetric and we choose only vertical rays. Then $\rho = \rho(r)$, $r = \sqrt{x^2 + y^2}$, and the ray $L_x$ passing through $(x,0)$ can be parametrized by $(x,u)$, $u \in \mathbb{R}$. This leads to (the factor 2 is due to symmetry)
\[
  V(x) \;:=\; \ln I(\infty) \;=\; -2\gamma \int_0^{\infty} \rho\bigl(\sqrt{x^2 + u^2}\bigr)\, du\,.
\]
Again, we assume that $\rho$ is of compact support in $\{x : |x| \le R\}$. The change of variables $u = \sqrt{r^2 - x^2}$ leads to
\[
  V(x) \;=\; -2\gamma \int_x^{\infty} \frac{r}{\sqrt{r^2 - x^2}}\,\rho(r)\, dr
  \;=\; -2\gamma \int_x^{R} \frac{r}{\sqrt{r^2 - x^2}}\,\rho(r)\, dr\,. \tag{1.4}
\]
A further change of variables $z = R^2 - r^2$ and $y = R^2 - x^2$ transforms this equation into the following Abel integral equation for the function $z \mapsto \rho\bigl(\sqrt{R^2 - z}\bigr)$:
\[
  V\bigl(\sqrt{R^2 - y}\bigr) \;=\; -\gamma \int_0^y \frac{\rho\bigl(\sqrt{R^2 - z}\bigr)}{\sqrt{y - z}}\, dz, \qquad 0 \le y \le R^2. \tag{1.5}
\]
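Formula (1.4) is easy to check numerically. The following Python sketch uses the illustrative density $\rho(r) = \max(R^2 - r^2, 0)$ (not from the text), for which the vertical line integral can be computed by hand as $V(x) = -\gamma\,\tfrac{4}{3}(R^2 - x^2)^{3/2}$ for $|x| \le R$, and compares this with a simple quadrature of the defining integral:

```python
import numpy as np

# Numerical check of (1.4) for a radially symmetric test density.
# Illustrative choice: rho(r) = max(R^2 - r^2, 0), gamma = R = 1, for which
#   V(x) = -2*gamma*int_0^inf rho(sqrt(x^2+u^2)) du
#        = -gamma*(4/3)*(R^2 - x^2)^(3/2)   for |x| <= R.
gamma, R = 1.0, 1.0
rho = lambda r: np.maximum(R**2 - r**2, 0.0)

def V_numeric(x, n=20001, umax=2.0):
    # trapezoidal rule for the u-integral; rho vanishes for r > R
    u = np.linspace(0.0, umax, n)
    vals = rho(np.sqrt(x**2 + u**2))
    du = u[1] - u[0]
    return -2.0 * gamma * du * (vals.sum() - 0.5 * (vals[0] + vals[-1]))

exact = -gamma * (4.0 / 3.0) * (R**2 - 0.5**2) ** 1.5
print(V_numeric(0.5), exact)   # both approximately -0.8660
```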
The standard mathematical literature on the Radon transform and its applications comprises the monographs [128, 130, 206]. We refer also to the survey articles [131, 183, 185, 192]. The following example is due to Abel himself.

Example 1.8 (Abel's integral equation) Let a mass element move along a curve $\Gamma$ from a point $p_1$ on level $h > 0$ to a point $p_0$ on level $h = 0$. The only force acting on this mass element is the gravitational force $mg$.

[Figure: the curve Γ: x = ψ(y) in the (x, y)-plane, running from p_1 at height h through an intermediate point p down to p_0 at height 0.]

The direct problem is to determine the time $T$ in which the element moves from $p_1$ to $p_0$ when the curve $\Gamma$ is given. In the inverse problem, one measures the time $T = T(h)$ for several values of $h$ and tries to determine the curve $\Gamma$. Let the curve be parametrized by $x = \psi(y)$, and let $p$ have the coordinates $(\psi(y), y)$. Conservation of energy, that is,
\[
  E + U \;=\; \frac{m}{2}\,v^2 + m g y \;=\; \mathrm{const} \;=\; m g h\,,
\]
yields for the velocity
\[
  \frac{ds}{dt} \;=\; v \;=\; \sqrt{2g(h - y)}\,.
\]
The total time $T$ from $p_1$ to $p_0$ is therefore
\[
  T \;=\; T(h) \;=\; \int_{p_0}^{p_1} \frac{ds}{v}
  \;=\; \int_0^h \frac{\sqrt{1 + \psi'(y)^2}}{\sqrt{2g\,(h - y)}}\, dy \quad \text{for } h > 0\,.
\]
Set $\varphi(y) = \sqrt{1 + \psi'(y)^2}$ and let $f(h) := T(h)\,\sqrt{2g}$ be known (measured). Then we have to determine the unknown function $\varphi$ from Abel's integral equation
\[
  \int_0^h \frac{\varphi(y)}{\sqrt{h - y}}\, dy \;=\; f(h) \quad \text{for } h > 0\,. \tag{1.6}
\]
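The forward map in (1.6) can be evaluated stably after the substitution $y = h - v^2$, which removes the square-root singularity: $f(h) = 2\int_0^{\sqrt{h}} \varphi(h - v^2)\, dv$. The following Python sketch checks this against the constant profile $\varphi \equiv 1$, for which $f(h) = 2\sqrt{h}$ by direct computation:

```python
import numpy as np

# Forward evaluation of Abel's integral equation (1.6),
#   int_0^h phi(y)/sqrt(h-y) dy = f(h),
# via the substitution y = h - v^2 removing the singularity:
#   f(h) = 2 * int_0^sqrt(h) phi(h - v^2) dv.
def abel_forward(phi, h, n=4000):
    v = np.linspace(0.0, np.sqrt(h), n)
    vals = phi(h - v**2)
    dv = v[1] - v[0]
    return 2.0 * dv * (vals.sum() - 0.5 * (vals[0] + vals[-1]))  # trapezoid

h = 0.7
# phi = 1 gives f(h) = 2*sqrt(h); a convenient consistency check.
print(abel_forward(lambda y: np.ones_like(y), h), 2.0 * np.sqrt(h))
```

The same routine can be used on measured data $f$ when testing numerical inversion schemes for (1.6).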
A similar, but more important, problem occurs in seismology, where one studies the determination of the velocity distribution $c$ of the Earth from measurements of the travel times of seismic waves (see [29]). For further examples of inverse problems leading to Abel integral equations, we refer to the lecture notes by R. Gorenflo and S. Vessella [108], the monograph [198], and the papers [179, 270].

Example 1.9 (Backwards heat equation) Consider the one-dimensional heat equation
\[
  \frac{\partial u(x,t)}{\partial t} \;=\; \frac{\partial^2 u(x,t)}{\partial x^2}, \qquad (x,t) \in (0,\pi) \times \mathbb{R}_{>0}, \tag{1.7a}
\]
with boundary conditions
\[
  u(0,t) \;=\; u(\pi,t) \;=\; 0, \qquad t \ge 0, \tag{1.7b}
\]
and initial condition
\[
  u(x,0) \;=\; u_0(x), \qquad 0 \le x \le \pi. \tag{1.7c}
\]
Separation of variables leads to the (formal) solution
\[
  u(x,t) \;=\; \sum_{n=1}^{\infty} a_n\, e^{-n^2 t}\, \sin(nx)
  \quad \text{with} \quad a_n \;=\; \frac{2}{\pi}\int_0^{\pi} u_0(y)\,\sin(ny)\, dy\,. \tag{1.8}
\]
The direct problem is to solve the classical initial boundary value problem: given the initial temperature distribution $u_0$ and the final time $T$, determine $u(\cdot,T)$. In the inverse problem, one measures the final temperature distribution $u(\cdot,T)$ and tries to determine the temperature at earlier times $t < T$, for example the initial temperature $u(\cdot,0)$. From the solution formula (1.8), we see that we have to determine $u_0 := u(\cdot,0)$ from the following integral equation:
\[
  u(x,T) \;=\; \frac{2}{\pi}\int_0^{\pi} k(x,y)\, u_0(y)\, dy, \qquad 0 \le x \le \pi, \tag{1.9}
\]
where
\[
  k(x,y) \;:=\; \sum_{n=1}^{\infty} e^{-n^2 T}\, \sin(nx)\,\sin(ny)\,. \tag{1.10}
\]
We refer to the monographs [17, 175, 198] and papers [31, 43, 49, 80, 81, 94, 193, 247] for further reading on this subject.
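The severe ill-posedness here is visible directly in the kernel (1.10): the operator in (1.9) maps $\sin(mx)$ to $e^{-m^2 T}\sin(mx)$, so inverting it multiplies the $m$-th data mode by $e^{m^2 T}$. A short Python check (with the illustrative choices $T = 1$, $m = 2$, and a truncation of the series at 60 terms):

```python
import numpy as np

# The operator in (1.9) with kernel (1.10) has eigenfunctions sin(m*x)
# and eigenvalues exp(-m^2*T); we verify this numerically at one point.
# Illustrative choices: T = 1, m = 2, series truncated at 60 terms.
T, n_terms, n_quad = 1.0, 60, 2000
y = np.linspace(0.0, np.pi, n_quad)
x = np.pi / 3                                   # an arbitrary test point

n = np.arange(1, n_terms + 1)[:, None]
k_xy = np.sum(np.exp(-n**2 * T) * np.sin(n * x) * np.sin(n * y), axis=0)

m = 2
integrand = k_xy * np.sin(m * y)
dy = y[1] - y[0]
lhs = (2.0 / np.pi) * dy * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
print(lhs, np.exp(-m**2 * T) * np.sin(m * x))   # both approximately 0.0159

# The eigenvalue for m = 5 is exp(-25), about 1.4e-11: a data error of
# that size already destroys the fifth Fourier mode of u_0.
```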
Example 1.10 (Diffusion in an inhomogeneous medium) The equation of diffusion in an inhomogeneous medium (now in two dimensions) is described by
\[
  \frac{\partial u(x,t)}{\partial t} \;=\; \frac{1}{c}\,\operatorname{div}\bigl(\gamma(x)\,\nabla u(x,t)\bigr), \qquad x \in D,\ t > 0, \tag{1.11}
\]
where $c$ is a constant and $\gamma = \gamma(x)$ is a parameter describing the medium. In the stationary case, this reduces to
\[
  \operatorname{div}\bigl(\gamma\,\nabla u\bigr) \;=\; 0 \quad \text{in } D\,. \tag{1.12}
\]
The direct problem is to solve the boundary value problem for this equation for given boundary values $u|_{\partial D}$ and a given function $\gamma$. In the inverse problem, one measures $u$ and the flux $\gamma\,\partial u/\partial\nu$ on the boundary $\partial D$ and tries to determine the unknown function $\gamma$ in $D$. This is the problem of impedance tomography, which we consider in more detail in Chapter 6; it is an example of a parameter identification problem for a partial differential equation. Among the extensive literature on parameter identification problems, we mention only the classical papers [166, 225, 224], the monographs [15, 17, 198], and the survey article [200].

Example 1.11 (Sturm–Liouville eigenvalue problem) Let a string of length $L$ and mass density $\rho = \rho(x) > 0$, $0 \le x \le L$, be fixed at the endpoints $x = 0$ and $x = L$. Plucking the string produces tones due to vibrations. Let $v(x,t)$, $0 \le x \le L$, $t > 0$, be the displacement at $x$ and time $t$. It satisfies the wave equation
\[
  \rho(x)\,\frac{\partial^2 v(x,t)}{\partial t^2} \;=\; \frac{\partial^2 v(x,t)}{\partial x^2}, \qquad 0 < x < L,\ t > 0, \tag{1.13}
\]
subject to the boundary conditions $v(0,t) = v(L,t) = 0$ for $t > 0$. A periodic displacement of the form
\[
  v(x,t) \;=\; w(x)\,\bigl(a\cos\omega t + b\sin\omega t\bigr)
\]
with frequency $\omega > 0$ is called a pure tone. This form of $v$ solves the boundary value problem (1.13) if and only if $w$ and $\omega$ satisfy the Sturm–Liouville eigenvalue problem
\[
  w''(x) + \omega^2\,\rho(x)\, w(x) \;=\; 0, \quad 0 < x < L, \qquad w(0) \;=\; w(L) \;=\; 0\,. \tag{1.14}
\]
The direct problem is to compute the eigenfrequencies $\omega$ and the corresponding eigenfunctions for a known function $\rho$. In the inverse problem, one tries to determine the mass density $\rho$ from a number of measured frequencies $\omega$. We see in Chapter 5 that parameter estimation problems for parabolic and hyperbolic initial boundary value problems are closely related to inverse spectral problems.
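The direct problem in (1.14) can be solved approximately by finite differences. The following Python sketch (an illustration, with the constant density $\rho \equiv 1$ on $(0,\pi)$, for which the eigenfrequencies are $\omega_k = k$) turns (1.14) into a symmetric matrix eigenvalue problem:

```python
import numpy as np

# Finite-difference sketch of the direct Sturm-Liouville problem (1.14):
#   -w'' = omega^2 * rho * w,  w(0) = w(L) = 0,
# i.e. A w = omega^2 * diag(rho) w; with D = diag(rho)^(1/2) this becomes
# the symmetric problem  D^{-1} A D^{-1} (D w) = omega^2 (D w).
L, n = np.pi, 400
h = L / (n + 1)
x = np.linspace(h, L - h, n)
rho = np.ones_like(x)                       # constant density: omega_k = k

A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2  # standard -w'' stencil
Dinv = np.diag(1.0 / np.sqrt(rho))
omega2 = np.linalg.eigvalsh(Dinv @ A @ Dinv)
print(np.sqrt(omega2[:4]))                  # approximately [1, 2, 3, 4]
```

Replacing `rho` by a nonconstant profile shows how the spectrum responds to the density, which is exactly the data used in the inverse problem of Chapter 5.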
Example 1.12 (Inverse Stefan problem) The physicist Stefan (see [253]) modeled the melting of arctic ice in the summer by a simple one-dimensional model. In particular, consider a homogeneous block of ice filling the region $x \ge \ell$ at time $t = 0$. The ice starts to melt by heating the block at the left end. Thus, at time $t > 0$ the region between $x = 0$ and $x = s(t)$ for some $s(t) > 0$ is filled with water, and the region $x \ge s(t)$ is filled with ice.

[Figure: in the (x, t)-plane, the free boundary x = s(t) separates the water region (left) from the ice region (right) for 0 ≤ t ≤ T.]

Let $u(x,t)$ be the temperature at $0 < x < s(t)$ and time $t$. Then $u$ satisfies the one-dimensional heat equation
\[
  \frac{\partial u(x,t)}{\partial t} \;=\; \frac{\partial^2 u(x,t)}{\partial x^2}
  \quad \text{in } D := \{(x,t) \in \mathbb{R}^2 : 0 < x < s(t),\ t > 0\} \tag{1.15}
\]
subject to the boundary conditions $\frac{\partial}{\partial x}u(0,t) = f(t)$ and $u(s(t),t) = 0$ for $t \in [0,T]$, and the initial condition $u(x,0) = u_0(x)$ for $0 \le x \le \ell$. Here, $u_0$ describes the initial temperature and $f(t)$ the heat flux at the left boundary $x = 0$. The speed at which the interface between water and ice moves is proportional to the heat flux. This is described by the following Stefan condition:
\[
  \frac{ds(t)}{dt} \;=\; -\,\frac{\partial u(s(t),t)}{\partial x} \quad \text{for } t \in [0,T]\,. \tag{1.16}
\]
The direct problem is to compute the curve $s$ when the boundary data $f$ and $u_0$ are given. In the inverse problem, one is given a desired curve $s$ and tries to reconstruct $u$ and $f$ (or $u_0$). We refer to the monographs [39, 198] and the classical papers [40, 95] for a detailed introduction to Stefan problems.
In all of these examples, we can formulate the direct problem as the evaluation of an operator $K$ acting on a known "model" $x$ in a model space $X$, and the inverse problem as the solution of the equation $K(x) = y$:

    Direct problem:  given $x$ (and $K$), evaluate $K(x)$.
    Inverse problem: given $y$ (and $K$), solve $K(x) = y$ for $x$.

In order to formulate an inverse problem, the definition of the operator $K$, including its domain and range, has to be given. The formulation as an operator equation allows us to distinguish among finite-, semifinite-, and infinite-dimensional problems, and among linear and nonlinear problems.
In general, the evaluation of K(x) means solving a boundary value problem for a differential equation or evaluating an integral. For more general and "philosophical" aspects of inverse theory, we refer to [7, 214].
1.2 Ill-Posed Problems
For all of the pairs of problems presented in the last section, there is a fundamental difference between the direct and the inverse problems. In all cases, the inverse problem is ill-posed or improperly posed in the sense of Hadamard, while the direct problem is well-posed. In his lectures published in [115], Hadamard claims that a mathematical model for a physical problem (he was thinking in terms of a boundary value problem for a partial differential equation) has to be properly posed or well-posed in the sense that it has the following three properties:

1. There exists a solution of the problem (existence).
2. There is at most one solution of the problem (uniqueness).
3. The solution depends continuously on the data (stability).

Mathematically, the existence of a solution can be enforced by enlarging the solution space. The concept of distributional solutions of a differential equation is an example. If a problem has more than one solution, then information about the model is missing. In this case, additional properties, such as sign conditions, can be built into the model. The requirement of stability is the most important one. If a problem lacks the property of stability, then its solution is practically impossible to compute, because any measurement or numerical computation is polluted by unavoidable errors: the data of a problem are always perturbed by noise! If the solution of a problem does not depend continuously on the data, then in general the computed solution has nothing to do with the true solution. Indeed, there is no way to overcome this difficulty unless additional information about the solution is available. Here, we remind the reader of the following statement (see Lanczos [171]): a lack of information cannot be remedied by any mathematical trickery!

Mathematically, we formulate the notion of well-posedness in the following way.

Definition 1.13 (Well-posedness) Let $X$ and $Y$ be normed spaces and $K : X \to Y$ a linear operator. The equation $Kx = y$ is called properly posed or well-posed if the following holds:

1. Existence: For every $y \in Y$, there is (at least one) $x \in X$ such that $Kx = y$.
2. Uniqueness: For every $y \in Y$, there is at most one $x \in X$ with $Kx = y$.
3. Stability: The solution $x$ depends continuously on $y$; that is, for every sequence $(x_n)$ in $X$ with $Kx_n \to Kx$ $(n \to \infty)$, it follows that $x_n \to x$ $(n \to \infty)$.
Equations for which (at least) one of these properties does not hold are called improperly posed or ill-posed. In Chapter 4, we will extend this definition to local ill-posedness of nonlinear problems. It is important to specify the full triple $(X, Y, K)$ and the norms of the spaces. Existence and uniqueness depend only on the algebraic nature of the spaces and the operator, that is, on whether the operator is onto or one-to-one. Stability, however, also depends on the topologies of the spaces, that is, on whether the inverse operator $K^{-1} : Y \to X$ is continuous. These requirements are not independent of each other. For example, due to the open mapping theorem (see Theorem A.27 of Appendix A.3), the inverse operator $K^{-1}$ is automatically continuous if $K$ is linear and continuous and $X$ and $Y$ are Banach spaces.

As an example of an ill-posed problem, we study the classical example given by Hadamard in his famous paper [115].

Example 1.14 (Cauchy's problem for the Laplace equation) Find a solution $u$ of the Laplace equation
\[
  \Delta u(x,y) \;:=\; \frac{\partial^2 u(x,y)}{\partial x^2} + \frac{\partial^2 u(x,y)}{\partial y^2} \;=\; 0 \quad \text{in } \mathbb{R} \times [0,\infty) \tag{1.17a}
\]
that satisfies the "initial conditions"
\[
  u(x,0) \;=\; f(x), \qquad \frac{\partial}{\partial y}u(x,0) \;=\; g(x), \qquad x \in \mathbb{R}, \tag{1.17b}
\]
where $f$ and $g$ are given functions. Obviously, the (unique) solution for $f(x) = 0$ and $g(x) = \frac{1}{n}\sin(nx)$ is given by
\[
  u(x,y) \;=\; \frac{1}{n^2}\,\sin(nx)\,\sinh(ny), \qquad x \in \mathbb{R},\ y \ge 0\,.
\]
Therefore, we have
\[
  \sup_{x \in \mathbb{R}} \bigl\{ |f(x)| + |g(x)| \bigr\} \;=\; \frac{1}{n} \;\longrightarrow\; 0, \qquad n \to \infty,
\]
but
\[
  \sup_{x \in \mathbb{R}} |u(x,y)| \;=\; \frac{1}{n^2}\,\sinh(ny) \;\longrightarrow\; \infty, \qquad n \to \infty,
\]
for all $y > 0$. The error in the data tends to zero while the error in the solution $u$ tends to infinity! Therefore, the solution does not depend continuously on the data, and the problem is improperly posed.
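The two suprema can be tabulated for concrete $n$; a few lines of Python make the divergence explicit (the height $y = 1$ is an arbitrary choice):

```python
import numpy as np

# Magnitudes in Hadamard's example: the data g_n(x) = sin(n*x)/n have
# sup-norm 1/n, while the solution u_n(x, y) = sin(n*x)*sinh(n*y)/n^2
# has sup-norm sinh(n*y)/n^2 at height y.
y = 1.0
for n in [1, 5, 10, 20]:
    data_norm = 1.0 / n
    sol_norm = np.sinh(n * y) / n**2
    print(n, data_norm, sol_norm)
# already for n = 20 the data norm is 0.05 while the solution norm
# is of the order 10^5
```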
Many inverse problems, including some of the examples of the last section (for further examples, we refer to [111]), lead to integral equations of the first kind with continuous or weakly singular kernels. Such integral operators are compact with respect to any reasonable topology. The following example will often serve as a model case in these lectures.

Example 1.15 (Differentiation) The direct problem is to find the antiderivative $y$ with $y(0) = 0$ of a given continuous function $x$ on $[0,1]$, that is, to compute
\[
  y(t) \;=\; \int_0^t x(s)\, ds, \qquad t \in [0,1]\,. \tag{1.18}
\]
In the inverse problem, we are given a continuously differentiable function $y$ on $[0,1]$ with $y(0) = 0$ and want to determine $x = y'$. This means we have to solve the integral equation $Kx = y$, where $K : C[0,1] \to C[0,1]$ is defined by
\[
  (Kx)(t) \;:=\; \int_0^t x(s)\, ds, \qquad t \in [0,1], \quad \text{for } x \in C[0,1]\,. \tag{1.19}
\]
Here, we equip $C[0,1]$ with the supremum norm $\|x\|_\infty := \max_{0 \le t \le 1} |x(t)|$. The solution of $Kx = y$ is just the derivative $x = y'$, provided $y(0) = 0$ and $y$ is continuously differentiable! If $x$ is the exact solution of $Kx = y$ and we perturb $y$ in the norm $\|\cdot\|_\infty$, then the perturbed right-hand side $\tilde y$ need not be differentiable, and even if it is, the solution of the perturbed problem is not necessarily close to the exact solution. We can, for example, perturb $y$ by $\delta\,\sin(t/\delta^2)$ for small $\delta > 0$. Then the error in the data (with respect to $\|\cdot\|_\infty$) is $\delta$, while the error in the solution is $1/\delta$. The problem $\bigl(K, C[0,1], C[0,1]\bigr)$ is therefore ill-posed.
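The perturbation $\delta\,\sin(t/\delta^2)$ can be fed through a naive numerical differentiation to watch this amplification directly. A short Python sketch (the grid size and the reference data $y(t) = t^2/2$, i.e. $x(t) = t$, are illustrative choices):

```python
import numpy as np

# The perturbation y_delta(t) = y(t) + delta*sin(t/delta^2) changes the
# data by at most delta in the sup-norm, while its derivative changes
# by (1/delta)*cos(t/delta^2), i.e. by 1/delta.
t = np.linspace(0.0, 1.0, 200001)
y = t**2 / 2.0                          # data for the solution x(t) = t
for delta in [1e-1, 1e-2]:
    y_pert = y + delta * np.sin(t / delta**2)
    x_pert = np.gradient(y_pert, t)     # a naive numerical derivative
    data_err = np.max(np.abs(y_pert - y))
    sol_err = np.max(np.abs(x_pert - t))
    print(delta, data_err, sol_err)     # sol_err is roughly 1/delta
```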
Now we choose a different space $Y := \{y \in C^1[0,1] : y(0) = 0\}$ for the right-hand side and equip $Y$ with the stronger norm $\|y\|_{C^1} := \max_{0 \le t \le 1} |y'(t)|$. If the right-hand side is perturbed with respect to this norm $\|\cdot\|_{C^1}$, then the problem $\bigl(K, C[0,1], Y\bigr)$ is well-posed because $K : C[0,1] \to Y$ is boundedly invertible. This example again illustrates the fact that well-posedness depends on the topology.

In the numerical treatment of integral equations, a discretization error cannot be avoided. For integral equations of the first kind, a "naive" discretization usually leads to disastrous results, as the following simple example shows (see also [267]).

Example 1.16 The integral equation
\[
  \int_0^1 e^{ts}\, x(s)\, ds \;=\; y(t), \qquad 0 \le t \le 1, \tag{1.20}
\]
with $y(t) = \bigl(\exp(t+1) - 1\bigr)/(t+1)$, is uniquely solvable by $x(t) = \exp(t)$. We approximate the integral by the trapezoidal rule
\[
  \int_0^1 e^{ts}\, x(s)\, ds \;\approx\; h \left( \frac{1}{2}\,x(0) + \frac{1}{2}\,e^{t}\,x(1) + \sum_{j=1}^{n-1} e^{jht}\, x(jh) \right)
\]
with $h := 1/n$. For $t = ih$, we obtain the linear system
\[
  h \left( \frac{1}{2}\,x_0 + \frac{1}{2}\,e^{ih}\,x_n + \sum_{j=1}^{n-1} e^{ijh^2}\, x_j \right) \;=\; y(ih), \qquad i = 0,\ldots,n\,. \tag{1.21}
\]
Then $x_i$ should be an approximation to $x(ih)$. The following table lists the error between the exact solution $x(t)$ and the approximate solution $x_i$ for $t = 0$, $0.25$, $0.5$, $0.75$, and $1$, where $i$ is chosen such that $ih = t$.

      t      n = 4    n = 8    n = 16    n = 32
      0       0.44     0.47     1.30     41.79
      0.25    0.67     2.03    39.02     78.39
      0.5     0.95     4.74    15.34      1.72
      0.75    1.02     3.08    15.78      2.01
      1       1.09     1.23     0.91     20.95
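The experiment behind this table is easy to reproduce. The following Python sketch implements the system (1.21) and solves it directly; the individual error values depend on the linear solver and on rounding and will differ from the table, but the exploding condition number of the matrix explains the blow-up:

```python
import numpy as np

# Reproduction sketch for the naive discretization (1.21): trapezoidal
# rule for int_0^1 e^{ts} x(s) ds = y(t), collocated at t = i*h.
def naive_solve(n):
    h = 1.0 / n
    i = np.arange(n + 1)
    A = h * np.exp(np.outer(i * h, i * h))      # A[i, j] = h * e^{(ih)(jh)}
    A[:, 0] *= 0.5                              # trapezoidal half-weights
    A[:, n] *= 0.5
    t = i * h
    yv = (np.exp(t + 1.0) - 1.0) / (t + 1.0)    # exact right-hand side
    return A, np.linalg.solve(A, yv)

for n in [4, 8, 16, 32]:
    A, xv = naive_solve(n)
    t = np.arange(n + 1) / n
    print(n, np.max(np.abs(xv - np.exp(t))), np.linalg.cond(A))
```

The errors stay of the order one for $n = 4$ (pure discretization error) and then grow wildly, while the condition number of the matrix reaches the reciprocal of machine precision.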
We see that the approximations have nothing to do with the true solution and become even worse for finer discretization schemes.

In the previous two examples, the problem was to solve integral equations of the first kind. Integral operators are compact operators in many natural topologies under very weak conditions on the kernels. The next theorem implies that linear equations of the form $Kx = y$ with compact operators $K$ are always ill-posed.

Theorem 1.17 Let $X$, $Y$ be normed spaces and $K : X \to Y$ a linear compact operator with nullspace $N(K) := \{x \in X : Kx = 0\}$. Let the dimension of the factor space $X/N(K)$ be infinite. Then there exists a sequence $(x_n)$ in $X$ such that $Kx_n \to 0$ but $(x_n)$ does not converge. We can even choose $(x_n)$ such that $\|x_n\|_X \to \infty$. In particular, if $K$ is one-to-one, the inverse $K^{-1} : Y \supset \mathcal{R}(K) \to X$ is unbounded. Here, $\mathcal{R}(K) := \{Kx \in Y : x \in X\}$ denotes the range of $K$.

Proof: We set $N = N(K)$ for abbreviation. The factor space $X/N$ is a normed space with norm $\|[x]\| := \inf\{\|x + z\|_X : z \in N\}$, since the nullspace $N$ is closed. The induced operator $\tilde K : X/N \to Y$, defined by $\tilde K([x]) := Kx$, $[x] \in X/N$, is well-defined, compact, and one-to-one. The inverse $\tilde K^{-1} : Y \supset \mathcal{R}(K) \to X/N$ is unbounded, since otherwise the identity $I = \tilde K^{-1}\tilde K : X/N \to X/N$ would be compact as a composition of a bounded and a compact operator (see Theorem A.34). This would contradict the assumption that the dimension of $X/N$ is infinite (see again Theorem A.34). Because $\tilde K^{-1}$ is unbounded, there exists a sequence $([z_n])$ in $X/N$ with $Kz_n \to 0$ and $\|[z_n]\| = 1$. We choose $v_n \in N$ such that $\|z_n + v_n\|_X \ge \frac{1}{2}$ and set $x_n := (z_n + v_n)/\sqrt{\|Kz_n\|_Y}$. Then $\|Kx_n\|_Y = \sqrt{\|Kz_n\|_Y} \to 0$, while $\|x_n\|_X \ge \frac{1}{2}/\sqrt{\|Kz_n\|_Y} \to \infty$.
1.3 The Worst-Case Error
We come back to Example 1.15 of the previous section: determine $x \in C[0,1]$ such that $\int_0^t x(s)\, ds = y(t)$ for all $t \in [0,1]$. An obvious question is: how large can the error in the solution be in the worst case if the error in the right-hand side $y$ is at most $\delta$? The answer is already given by Theorem 1.17: if the errors are measured in norms such that the integral operator is compact, then the solution error can be arbitrarily large. For Example 1.15, we have constructed explicit perturbations with this property. However, the situation is different if additional information is available. Before we study the general case, we illustrate this observation for a model example.

Let $y$ and $\tilde y$ be twice continuously differentiable, and let a number $E > 0$ be available with
\[
  \|y''\|_\infty \;\le\; E \quad \text{and} \quad \|\tilde y''\|_\infty \;\le\; E\,. \tag{1.22}
\]
Set $z := \tilde y - y$ and assume that $z'(0) = z(0) = 0$ and $z''(t) \ge 0$ for $t \in [0,1]$. Then $0 \le z''(t) \le 2E$ by (1.22), and $z'(t) \ge 0$ since $z'(0) = 0$. We estimate the error $\tilde x - x$ in the solution of Example 1.15 by
\[
  |\tilde x(t) - x(t)|^2 \;=\; |z'(t)|^2 \;=\; \int_0^t \frac{d}{ds}\,|z'(s)|^2\, ds
  \;=\; 2\int_0^t z''(s)\, z'(s)\, ds \;\le\; 4E \int_0^t z'(s)\, ds \;=\; 4E\, z(t)\,.
\]
Therefore, under the above assumptions on $z = \tilde y - y$, we have shown that $\|\tilde x - x\|_\infty \le 2\sqrt{E\,\delta}$ if $\|\tilde y - y\|_\infty \le \delta$ and $E$ is a bound as in (1.22). In this example, $2\sqrt{E\,\delta}$ is a bound on the worst-case error for an error $\delta$ in the data and the additional information $\|x'\|_\infty = \|y''\|_\infty \le E$ on the solution. Quite generally, we define the following.

Definition 1.18 Let $K : X \to Y$ be a linear bounded operator between Banach spaces, $\hat X \subset X$ a subspace, and $\|\cdot\|_{\hat X}$ a "stronger" norm on $\hat X$; that is, there exists $c > 0$ such that $\|x\|_X \le c\,\|x\|_{\hat X}$ for all $x \in \hat X$. Then we define
\[
  \mathcal{F}\bigl(\delta, E, \|\cdot\|_{\hat X}\bigr) \;:=\; \sup\bigl\{ \|x\|_X : x \in \hat X,\ \|Kx\|_Y \le \delta,\ \|x\|_{\hat X} \le E \bigr\} \tag{1.23}
\]
and call $\mathcal{F}\bigl(\delta, E, \|\cdot\|_{\hat X}\bigr)$ the worst-case error for the error $\delta$ in the data and the a priori information $\|x\|_{\hat X} \le E$.
The quantity $\mathcal{F}\bigl(\delta, E, \|\cdot\|_{\hat X}\bigr)$ depends on the operator $K$ and the norms in $X$, $Y$, and $\hat X$. It is desirable that this worst-case error not only converge to zero as $\delta$ tends to zero but that it be of order $\delta$. This is certainly true (even without a priori information) for boundedly invertible operators, as is readily seen from the inequality $\|x\|_X \le \|K^{-1}\|_{\mathcal{L}(Y,X)}\, \|Kx\|_Y$. For compact operators $K$ and the norm $\|\cdot\|_{\hat X} = \|\cdot\|_X$, however, this worst-case error does not converge to zero (see the following lemma), and one is forced to take a stronger norm $\|\cdot\|_{\hat X}$.

Lemma 1.19 Let $K : X \to Y$ be linear and compact, and assume that $X/N(K)$ is infinite-dimensional. Then for every $E > 0$ there exist $c > 0$ and $\delta_0 > 0$ such that $\mathcal{F}\bigl(\delta, E, \|\cdot\|_X\bigr) \ge c$ for all $\delta \in (0, \delta_0)$.

Proof: Assume, on the contrary, that there exists a sequence $\delta_n \to 0$ such that $\mathcal{F}\bigl(\delta_n, E, \|\cdot\|_X\bigr) \to 0$ as $n \to \infty$. Let $\tilde K : X/N(K) \to Y$ again be the induced operator in the factor space. We show that $\tilde K^{-1}$ is bounded. Let $\tilde K[x_m] = Kx_m \to 0$. Then there exists a subsequence $(x_{m_n})$ with $\|Kx_{m_n}\|_Y \le \delta_n$ for all $n$. We set
\[
  z_n \;:=\; \begin{cases} x_{m_n}, & \text{if } \|x_{m_n}\|_X \le E, \\[4pt] E\,\|x_{m_n}\|_X^{-1}\, x_{m_n}, & \text{if } \|x_{m_n}\|_X > E. \end{cases}
\]
Then $\|z_n\|_X \le E$ and $\|Kz_n\|_Y \le \delta_n$ for all $n$. Because the worst-case error tends to zero, we also conclude that $\|z_n\|_X \to 0$. From this, we see that $z_n = x_{m_n}$ for sufficiently large $n$; that is, $x_{m_n} \to 0$ as $n \to \infty$. This argument, applied to every subsequence of the original sequence $(x_m)$, yields that $x_m$ tends to zero as $m \to \infty$; that is, $\tilde K^{-1}$ is bounded on the range $\mathcal{R}(K)$ of $K$. This, however, contradicts the assertion of Theorem 1.17.

In the following analysis, we make use of the singular value decomposition of the operator $K$ (see Appendix A.6, Definition A.56). Therefore, we assume from now on that $X$ and $Y$ are Hilbert spaces. In many applications $X$ and $Y$ are Sobolev spaces, that is, spaces of measurable functions whose (generalized) derivatives are square integrable. Sobolev spaces of functions of one variable can be characterized as follows:
\[
  H^p(a,b) \;:=\; \Bigl\{ x \in C^{p-1}[a,b] \;:\; x^{(p-1)}(t) = \alpha + \int_a^t \psi(s)\, ds,\ \alpha \in \mathbb{R},\ \psi \in L^2(a,b) \Bigr\}
\]
for $p \in \mathbb{N}$.

Example 1.20 (Differentiation) As an example, we study differentiation and set $X = Y = L^2(0,1)$,
\[
  (Kx)(t) \;=\; \int_0^t x(s)\, ds, \qquad t \in (0,1),\ x \in L^2(0,1), \tag{1.24}
\]
and
\[
  \hat X_1 \;:=\; \{x \in H^1(0,1) : x(1) = 0\}, \tag{1.25a}
\]
\[
  \hat X_2 \;:=\; \{x \in H^2(0,1) : x(1) = 0,\ x'(0) = 0\}. \tag{1.25b}
\]
We define $\|x\|_1 := \|x'\|_{L^2}$ for $x \in \hat X_1$ and $\|x\|_2 := \|x''\|_{L^2}$ for $x \in \hat X_2$. Then the norms $\|\cdot\|_j$, $j = 1, 2$, are stronger than $\|\cdot\|_{L^2}$ (see Problem 1.2), and we can prove for every $E > 0$ and $\delta > 0$:
\[
  \mathcal{F}\bigl(\delta, E, \|\cdot\|_1\bigr) \;\le\; \sqrt{\delta\,E}
  \quad \text{and} \quad
  \mathcal{F}\bigl(\delta, E, \|\cdot\|_2\bigr) \;\le\; \delta^{2/3}\, E^{1/3}. \tag{1.26}
\]
From this result, we observe that the possibility to reconstruct $x$ depends on the smoothness of the solution. We come back to this remark in a more general setting (Theorem 1.21). We will also see that these estimates are asymptotically sharp; that is, the exponent of $\delta$ cannot be increased.

Proof of (1.26): First, assume that $x \in H^1(0,1)$ with $x(1) = 0$. Partial integration, which is easily seen to be allowed for $H^1$ functions, and the Cauchy–Schwarz inequality yield
\[
  \|x\|_{L^2}^2 \;=\; \int_0^1 x(t)\,x(t)\, dt
  \;=\; -\int_0^1 x'(t) \Bigl[\int_0^t x(s)\, ds\Bigr] dt \;+\; \Bigl[\, x(t) \int_0^t x(s)\, ds \Bigr]_{t=0}^{t=1}
\]
\[
  \;=\; -\int_0^1 x'(t)\,(Kx)(t)\, dt \;\le\; \|Kx\|_{L^2}\, \|x'\|_{L^2}\,. \tag{1.27}
\]
This yields the first estimate. Now let $x \in H^2(0,1)$ be such that $x(1) = 0$ and $x'(0) = 0$. Using partial integration again, we estimate
\[
  \|x'\|_{L^2}^2 \;=\; \int_0^1 x'(t)\,x'(t)\, dt
  \;=\; -\int_0^1 x(t)\,x''(t)\, dt \;+\; \bigl[\, x(t)\,x'(t) \bigr]_{t=0}^{t=1}
  \;=\; -\int_0^1 x(t)\,x''(t)\, dt \;\le\; \|x\|_{L^2}\, \|x''\|_{L^2}\,.
\]
Now we substitute this into the right-hand side of (1.27):
\[
  \|x\|_{L^2}^2 \;\le\; \|Kx\|_{L^2}\, \|x'\|_{L^2} \;\le\; \|Kx\|_{L^2}\, \sqrt{\|x\|_{L^2}\, \|x''\|_{L^2}}\,.
\]
From this, the second estimate of (1.26) follows.
16
Introduction
This example is typical in the sense that integral operators are often smoothing. We can define an abstract "smoothness" of an element $x \in X$ with respect to a compact operator $K : X \to Y$ by requiring that $x \in \mathcal{R}(K^*)$ or $x \in \mathcal{R}(K^*K)$ or, more generally, $x \in \mathcal{R}\big((K^*K)^{\sigma/2}\big)$ for some real $\sigma > 0$. The operator $(K^*K)^{\sigma/2}$ from $X$ into itself is defined as
\[
(K^*K)^{\sigma/2} x \,=\, \sum_{j \in J} \mu_j^\sigma\,(x, x_j)_X\, x_j\,, \quad x \in X\,,
\]
where $\{\mu_j, x_j, y_j : j \in J\}$ is a singular system for $K$ (see Appendix A.6, Theorem A.57, and formula (A.47)). We note that $\mathcal{R}(K^*) = \mathcal{R}\big((K^*K)^{1/2}\big)$. Picard's theorem (Theorem A.58) yields that $x \in \mathcal{R}\big((K^*K)^{\sigma/2}\big)$ is equivalent to $\sum_j |(x, x_j)_X|^2 / \mu_j^{2\sigma} < \infty$, which is indeed a smoothness assumption in concrete applications. We refer to Example A.59 and the definition of Sobolev spaces of periodic functions (Section A.4), where smoothness is expressed by a decay of the Fourier coefficients.

Theorem 1.21 Let $X$ and $Y$ be Hilbert spaces, and $K : X \to Y$ linear, compact, and one-to-one with dense range $\mathcal{R}(K)$. Let $K^* : Y \to X$ be the adjoint operator.

(a) Set $\hat X_1 := \mathcal{R}(K^*)$ and $\|x\|_1 := \big\|(K^*)^{-1} x\big\|_Y$ for $x \in \hat X_1$. Then
\[
\mathcal{F}\big(\delta, E, \|\cdot\|_1\big) \le \sqrt{\delta E}\,.
\]
Furthermore, for every $E > 0$ there exists a sequence $\delta_j \to 0$ such that $\mathcal{F}\big(\delta_j, E, \|\cdot\|_1\big) = \sqrt{\delta_j E}$; that is, this estimate is asymptotically sharp.

(b) Set $\hat X_2 := \mathcal{R}(K^*K)$ and $\|x\|_2 := \big\|(K^*K)^{-1} x\big\|_X$ for $x \in \hat X_2$. Then
\[
\mathcal{F}\big(\delta, E, \|\cdot\|_2\big) \le \delta^{2/3} E^{1/3}\,,
\]
and for every $E > 0$ there exists a sequence $\delta_j \to 0$ such that $\mathcal{F}\big(\delta_j, E, \|\cdot\|_2\big) = \delta_j^{2/3} E^{1/3}$.

(c) More generally, for some $\sigma > 0$ define $\hat X_\sigma := \mathcal{R}\big((K^*K)^{\sigma/2}\big)$ and $\|x\|_\sigma := \big\|(K^*K)^{-\sigma/2} x\big\|_X$ for $x \in \hat X_\sigma$. Then
\[
\mathcal{F}\big(\delta, E, \|\cdot\|_\sigma\big) \le \delta^{\sigma/(\sigma+1)} E^{1/(\sigma+1)}\,,
\]
and for every $E > 0$ there exists a sequence $\delta_j \to 0$ such that $\mathcal{F}\big(\delta_j, E, \|\cdot\|_\sigma\big) = \delta_j^{\sigma/(\sigma+1)} E^{1/(\sigma+1)}$.

The norms $\|\cdot\|_1$, $\|\cdot\|_2$, and $\|\cdot\|_\sigma$ are well-defined because $K^*$ and $(K^*K)^{\sigma/2}$ are one-to-one. In concrete examples, the assumptions $x \in \mathcal{R}(K^*)$ and $x \in \mathcal{R}\big((K^*K)^{\sigma/2}\big)$ are smoothness assumptions on the exact solution $x$ (together
with boundary conditions), as we have already mentioned. In the preceding example, where $(Kx)(t) = \int_0^t x(s)\,ds$, the ranges $\mathcal{R}(K^*)$ and $\mathcal{R}(K^*K)$ coincide with the Sobolev spaces $\hat X_1$ and $\hat X_2$ defined in (1.25a) and (1.25b) (see Problem 1.3).

Proof of Theorem 1.21: (a) Let $x = K^*z \in \hat X_1$ with $\|Kx\|_Y \le \delta$ and $\|x\|_1 \le E$; that is, $\|z\|_Y \le E$. Then
\[
\|x\|_X^2 = (K^*z, x)_X = (z, Kx)_Y \le \|z\|_Y\,\|Kx\|_Y \le E\,\delta\,.
\]
This proves the first estimate. Now let $\{\mu_j, x_j, y_j : j \in J\}$ be a singular system for $K$ (see Appendix A.6, Theorem A.57). Set $\hat x_j = E\,K^*y_j = \mu_j E\,x_j$ and $\delta_j := \mu_j^2 E \to 0$. Then $\|\hat x_j\|_1 = E$, $\|K\hat x_j\|_Y = \delta_j$, and $\|\hat x_j\|_X = \mu_j E = \sqrt{\delta_j E}$. This proves part (a). Part (b) is proven similarly or as a special case ($\sigma = 2$) of part (c).

(c) With a singular system $\{\mu_j, x_j, y_j : j \in J\}$, we have $\|x\|_X^2 = \sum_{j \in J} |\rho_j|^2$, where $\rho_j = (x, x_j)_X$ are the expansion coefficients of $x$. In the following estimate, we use Hölder's inequality with $p = (\sigma+1)/\sigma$ and $q = \sigma+1$ (note that $1/p + 1/q = 1$):
\begin{align*}
\|x\|_X^2 \,=\, \sum_{j \in J} |\rho_j|^2 &\,=\, \sum_{j \in J} \big(|\rho_j|\,\mu_j\big)^{2\sigma/(\sigma+1)} \big(|\rho_j|/\mu_j^\sigma\big)^{2/(\sigma+1)} \\
&\,\le\, \Big(\sum_{j \in J} \big(|\rho_j|\,\mu_j\big)^{2p\sigma/(\sigma+1)}\Big)^{1/p} \Big(\sum_{j \in J} \big(|\rho_j|/\mu_j^\sigma\big)^{2q/(\sigma+1)}\Big)^{1/q} \\
&\,=\, \Big(\sum_{j \in J} |\rho_j|^2\,\mu_j^2\Big)^{\sigma/(\sigma+1)} \Big(\sum_{j \in J} |\rho_j|^2\,\mu_j^{-2\sigma}\Big)^{1/(\sigma+1)} \\
&\,=\, \|Kx\|_Y^{2\sigma/(\sigma+1)}\, \big\|(K^*K)^{-\sigma/2} x\big\|_X^{2/(\sigma+1)}\,.
\end{align*}
Thus $\|x\|_X \le \|Kx\|_Y^{\sigma/(\sigma+1)}\,\|x\|_\sigma^{1/(\sigma+1)} \le \delta^{\sigma/(\sigma+1)} E^{1/(\sigma+1)}$. This ends the proof.
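The interpolation inequality (1.27) behind these worst-case bounds can be checked numerically. The following sketch (not from the book) verifies $\|x\|_{L^2}^2 \le \|Kx\|_{L^2}\,\|x'\|_{L^2}$ for the integration operator and the hypothetical sample $x(t) = 1 - t$, which satisfies $x(1) = 0$:

```python
import numpy as np

# Sketch: numerically check ||x||^2_{L2} <= ||Kx||_{L2} ||x'||_{L2} from (1.27)
# for the integration operator (Kx)(t) = int_0^t x(s) ds, with x(t) = 1 - t.

h = 1.0 / 20000
t = np.arange(20001) * h
x = 1.0 - t                        # x(1) = 0, x'(t) = -1
dx = -np.ones_like(t)

# cumulative trapezoidal rule approximates (Kx)(t) = t - t^2/2
Kx = np.concatenate(([0.0], np.cumsum((x[1:] + x[:-1]) / 2.0) * h))

l2 = lambda f: np.sqrt(np.sum((f[1:] ** 2 + f[:-1] ** 2) / 2.0) * h)
lhs = l2(x) ** 2                   # exactly 1/3 for this x
rhs = l2(Kx) * l2(dx)              # sqrt(2/15) ~ 0.365
print(lhs, rhs, lhs <= rhs)
```

For this $x$, both sides are computable in closed form ($1/3$ versus $\sqrt{2/15}$), so the quadrature only confirms the algebra.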
Next, we consider Example 1.9 again. We are given the parabolic initial boundary value problem
\[
\frac{\partial u(x,t)}{\partial t} = \frac{\partial^2 u(x,t)}{\partial x^2}\,, \quad 0 < x < \pi\,,\ t > 0\,,
\]
\[
u(0,t) = u(\pi,t) = 0\,, \quad t > 0\,, \qquad u(x,0) = u_0(x)\,, \quad 0 < x < \pi\,.
\]
In the inverse problem, we know the final temperature distribution $u(x,T)$, $0 \le x \le \pi$, and we want to determine the temperature $u(x,\tau)$ at an intermediate time $\tau \in (0,T)$. As additional information, we also assume the knowledge of $E > 0$ with $\|u(\cdot,0)\|_{L^2} \le E$. The solution of the initial boundary value problem is given by the series
\[
u(x,t) \,=\, \frac{2}{\pi} \sum_{n=1}^\infty e^{-n^2 t}\,\sin(nx) \int_0^\pi u_0(y)\,\sin(ny)\,dy\,, \quad 0 \le x \le \pi\,,\ t > 0\,.
\]
We denote the unknown function by $v := u(\cdot,\tau)$, set $X = Y = L^2(0,\pi)$, and
\[
\hat X := \Big\{ v \in L^2(0,\pi) : v = \sum_{n=1}^\infty a_n\, e^{-n^2\tau} \sin(n\,\cdot) \ \text{with}\ \sum_{n=1}^\infty |a_n|^2 < \infty \Big\}\,.
\]
Writing $v = \sum_n a_n e^{-n^2\tau}\sin(n\,\cdot)$, we estimate $\|v\|_{L^2}^2$ by Hölder's inequality (with exponents $p, q > 1$, $1/p + 1/q = 1$, to be specified in a moment):
\begin{align*}
\frac{\pi}{2}\sum_{n=1}^\infty |a_n|^2\, e^{-2n^2\tau} &\,=\, \frac{\pi}{2}\sum_{n=1}^\infty |a_n|^{2/q}\,|a_n|^{2/p}\, e^{-2n^2\tau} \\
&\,\le\, \Big(\frac{\pi}{2}\sum_{n=1}^\infty |a_n|^2\Big)^{1/q} \Big(\frac{\pi}{2}\sum_{n=1}^\infty |a_n|^2\, e^{-2pn^2\tau}\Big)^{1/p}.
\end{align*}
We now choose $p = T/\tau$. Then $1/p = \tau/T$ and $1/q = 1 - \tau/T$, so that $e^{-2pn^2\tau} = e^{-2n^2T}$: the second factor is the squared norm of the final data $u(\cdot,T)$ raised to the power $\tau/T$, while the first factor is bounded by $E^{2(1-\tau/T)}$. This yields the assertion.
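The severity of this problem is easy to see numerically. The sketch below (not from the book) shows how reconstructing $u(\cdot,\tau)$ from final data $u(\cdot,T)$ amplifies the $n$-th Fourier coefficient by $e^{n^2(T-\tau)}$, while the worst-case error $\delta^{\tau/T} E^{1-\tau/T}$ stays moderate; the values $T$, $\tau$, $\delta$, $E$ are hypothetical:

```python
import numpy as np

# Sketch: reconstructing u(., tau) from final data u(., T) multiplies the
# n-th Fourier coefficient by exp(n^2 (T - tau)); these factors explode,
# while the worst-case bound delta^(tau/T) * E^(1 - tau/T) stays tame.

T, tau = 1.0, 0.5
n = np.arange(1, 6)
amplification = np.exp(n**2 * (T - tau))
print(amplification)   # approx [1.65, 7.39, 90.0, 2981.0, 268337.3]

delta, E = 1e-3, 1.0
worst_case = delta ** (tau / T) * E ** (1 - tau / T)
print(worst_case)      # sqrt(delta) ~ 0.0316 for tau = T/2
```

Only the first few modes of the data carry usable information; everything beyond them must be controlled by the a priori bound $E$.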
The next chapter is devoted to the construction of regularization schemes that are asymptotically optimal in the sense that, under the information $x \in \hat X$, $\|x\|_{\hat X} \le E$, and $\|\tilde y - y\|_Y \le \delta$, an approximation $\tilde x$ and a constant $c > 0$ are constructed such that $\|\tilde x - x\|_X \le c\,\mathcal{F}\big(\delta, E, \|\cdot\|_{\hat X}\big)$. As the first tutorial example, we consider the problem of numerical differentiation; see Examples 1.15 and 1.20.

Example 1.22 Let again, as in Examples 1.15 and 1.20, $(Kx)(t) = \int_0^t x(s)\,ds$, $t \in (0,1)$. Solving $Kx = y$ is equivalent to differentiating $y$. We fix $h \in (0, 1/2)$ and define the one-sided difference quotient by
\[
v(t) = \begin{cases} \frac{1}{h}\big[y(t+h) - y(t)\big], & 0 < t < 1/2, \\[2pt] \frac{1}{h}\big[y(t) - y(t-h)\big], & 1/2 < t < 1, \end{cases}
\]
for any $y \in L^2(0,1)$. First, we estimate $\|v - y'\|_{L^2}$ for smooth functions $y$; that is, $y \in H^2(0,1)$. From Taylor's formula (see Problem 1.4), we have
\[
y(t \pm h) \,=\, y(t) \pm y'(t)\,h + \int_t^{t \pm h} (t \pm h - s)\, y''(s)\,ds\,;
\]
that is,
\[
v(t) - y'(t) \,=\, \frac{1}{h}\int_t^{t+h} (t+h-s)\,y''(s)\,ds \,=\, \frac{1}{h}\int_0^h \tau\, y''(t+h-\tau)\,d\tau
\]
for $t \in (0, 1/2)$ and analogously for $t \in (1/2, 1)$. Hence, we estimate
\begin{align*}
h^2 \int_0^{1/2} |v(t) - y'(t)|^2\,dt &\,=\, \int_0^h\!\!\int_0^h \tau\,s \Big[\int_0^{1/2} y''(t+h-\tau)\,\overline{y''(t+h-s)}\,dt\Big] d\tau\,ds \\
&\,\le\, \int_0^h\!\!\int_0^h \tau\,s \Big(\int_0^{1/2} |y''(t+h-\tau)|^2\,dt\Big)^{1/2} \Big(\int_0^{1/2} |y''(t+h-s)|^2\,dt\Big)^{1/2} d\tau\,ds \\
&\,\le\, \|y''\|_{L^2}^2 \Big[\int_0^h \tau\,d\tau\Big]^2 \,=\, \frac{1}{4}\,h^4\,\|y''\|_{L^2}^2\,,
\end{align*}
and analogously for $h^2 \int_{1/2}^1 |v(t) - y'(t)|^2\,dt$. Summing these estimates yields
\[
\|v - y'\|_{L^2} \,\le\, \frac{1}{\sqrt{2}}\,E\,h\,,
\]
where $E$ is some bound on $\|y''\|_{L^2}$. Now we treat the situation with errors. Instead of $y(t)$ and $y(t \pm h)$, we measure $\tilde y(t)$ and $\tilde y(t \pm h)$, respectively. We assume that $\|\tilde y - y\|_{L^2} \le \delta$. Instead of $v(t)$, we compute $\tilde v(t) = \pm\big[\tilde y(t \pm h) - \tilde y(t)\big]/h$ for $t \in (0, 1/2)$ or $t \in (1/2, 1)$, respectively. Because
\[
|\tilde v(t) - v(t)| \,\le\, \frac{|\tilde y(t \pm h) - y(t \pm h)|}{h} + \frac{|\tilde y(t) - y(t)|}{h}\,,
\]
we conclude that $\|\tilde v - v\|_{L^2} \le 2\delta/h$. Therefore, the total error due to the error on the right-hand side and the discretization error is
\[
\|\tilde v - y'\|_{L^2} \,\le\, \|\tilde v - v\|_{L^2} + \|v - y'\|_{L^2} \,\le\, \frac{2\delta}{h} + \frac{1}{\sqrt{2}}\,E\,h\,. \tag{1.29}
\]
By this estimate, it is desirable to choose the discretization parameter $h$ as the minimizer of the right-hand side of (1.29). The minimum is attained at $h = \sqrt{2\sqrt{2}\,\delta/E}$. This results in the optimal error $\|\tilde v - y'\|_{L^2} \le 2\,\sqrt[4]{2}\,\sqrt{E\,\delta}$.

Summarizing, we note that the discretization parameter $h$ should be of order $\sqrt{\delta/E}$ if the derivative of a function is computed by the one-sided difference quotient. With this choice, the method is asymptotically optimal under the information $\|x'\|_{L^2} \le E$. The two-sided difference quotient is optimal under the a priori information $\|x''\|_{L^2} \le E$ and results in an algorithm of order $\delta^{2/3}$ (see Example 2.4 in the following chapter). We have carried out the preceding analysis with respect to the $L^2$ norm rather than the maximum norm, mainly because we present the general theory in Hilbert spaces. For this example, however, estimates with respect to $\|\cdot\|_\infty$ are simpler to derive (see the estimates preceding Definition 1.18 of the worst-case error). The result of this example is of practical importance: for many algorithms using numerical derivatives (for example, quasi-Newton methods in optimization), it is recommended to choose the discretization parameter $\varepsilon$ as the square root of the floating-point precision of the computer, because a one-sided difference quotient is used.
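The trade-off in (1.29) is easy to visualize numerically. The sketch below (not from the book, with hypothetical values for $\delta$ and $E$) evaluates the bound $e(h) = 2\delta/h + Eh/\sqrt{2}$ over a grid of step sizes and compares the grid minimizer with the analytic optimum $h^* = \sqrt{2\sqrt{2}\,\delta/E}$:

```python
import numpy as np

# Sketch: the error bound (1.29), e(h) = 2*delta/h + E*h/sqrt(2), is
# minimized at h* = sqrt(2*sqrt(2)*delta/E), giving total error
# 2 * 2**(1/4) * sqrt(delta*E) -- the O(sqrt(delta*E)) optimal rate.

delta, E = 1e-6, 1.0
h = np.logspace(-5, -1, 400)
err = 2 * delta / h + E * h / np.sqrt(2)

h_star = np.sqrt(2 * np.sqrt(2) * delta / E)
best = 2 * delta / h_star + E * h_star / np.sqrt(2)

print(h[np.argmin(err)])           # close to h_star ~ 1.68e-3
print(best / np.sqrt(delta * E))   # 2 * 2**0.25 ~ 2.378
```

Note how both terms of the bound contribute equally at $h^*$, which is typical for such a priori parameter choices.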
1.4 Problems
1.1 Show that equations (1.1) and (1.20) have at most one solution. Hints: Extend ρ in (1.1) by zero into R, and apply the Fourier transform and the convolution theorem. For (1.20) use results of the Laplace transform.
1.2 Let the Sobolev spaces $\hat X_1$ and $\hat X_2$ be defined by (1.25a) and (1.25b), respectively. Define the bilinear forms by
\[
(x, y)_1 := \int_0^1 x'(t)\,\overline{y'(t)}\,dt \quad\text{and}\quad (x, y)_2 := \int_0^1 x''(t)\,\overline{y''(t)}\,dt
\]
on $\hat X_1$ and $\hat X_2$, respectively. Prove that $\hat X_j$ are Hilbert spaces with respect to the inner products $(\cdot,\cdot)_j$, $j = 1, 2$, and that $\|x\|_{L^2} \le \|x\|_j$ for all $x \in \hat X_j$, $j = 1, 2$.

1.3 Let $K : L^2(0,1) \to L^2(0,1)$ be defined by (1.19). Show that the ranges $\mathcal{R}(K^*)$ and $\mathcal{R}(K^*K)$ coincide with the spaces $\hat X_1$ and $\hat X_2$ defined by (1.25a) and (1.25b), respectively.

1.4 Prove the following version of Taylor's formula by induction with respect to $n$ and partial integration: Let $y \in H^{n+1}(a,b)$ and $t, t+h \in [a,b]$. Then
\[
y(t+h) \,=\, \sum_{k=0}^n \frac{y^{(k)}(t)}{k!}\,h^k + R_n(t; h)\,,
\]
where the error term is given by
\[
R_n(t; h) \,=\, \frac{1}{n!} \int_t^{t+h} (t+h-s)^n\, y^{(n+1)}(s)\,ds\,.
\]
1.5 Verify the assertions of Example A.59 of Appendix A.6.
Chapter 2
© Springer Nature Switzerland AG 2021. A. Kirsch, An Introduction to the Mathematical Theory of Inverse Problems, Applied Mathematical Sciences 120, https://doi.org/10.1007/978-3-030-63343-1_2

Regularization Theory for Equations of the First Kind

We saw in the previous chapter that many inverse problems can be formulated as operator equations of the form $Kx = y$, where $K$ is a linear compact operator between Hilbert spaces $X$ and $Y$ over the field $\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$. We also saw that a successful reconstruction strategy requires additional a priori information about the solution. This chapter is devoted to a systematic study of regularization strategies for solving $Kx = y$. In particular, we wish to investigate under which conditions they are asymptotically optimal, that is, of the same asymptotic order as the worst-case error.

In Section 2.1, we introduce the general concept of regularization. In Sections 2.2 and 2.3, we study Tikhonov's method and the Landweber iteration as two of the most important regularization strategies. In these three sections, the regularization parameter $\alpha = \alpha(\delta)$ is chosen a priori, that is, before we start to compute the regularized solution. We will see that the optimal regularization parameter $\alpha$ depends on bounds of the exact solution, which are not known in advance. Therefore, it is advantageous to study strategies for the choice of $\alpha$ that depend on the numerical algorithm and are made during the algorithm (a posteriori). Different a posteriori choices are studied in Sections 2.5–2.7. All of them are motivated by the idea that it is certainly sufficient to compute an approximation $x^{\alpha,\delta}$ of the solution $x$ such that the norm of the defect $\|Kx^{\alpha,\delta} - y^\delta\|_Y$ is of the same order as the perturbation error $\delta$ of the right-hand side. The classical strategy, due to Morozov [194], determines $\alpha$ by solving a nonlinear scalar equation. To solve this equation, we still need a numerical algorithm such as the regula falsi or Newton's method. In Sections 2.6 and 2.7, we investigate two well-known iterative algorithms for solving linear (or nonlinear) equations: Landweber's method (see [172]), which is the steepest
descent method, and the conjugate gradient method. The choices of $\alpha$ are made implicitly by stopping the algorithm as soon as the defect $\|Kx_m - y^\delta\|_Y$ is less than $r\delta$. Here, $r > 1$ is a given parameter. Landweber's method and Morozov's discrepancy principle are easy to investigate theoretically because they can be formulated as linear regularization methods. The study of the conjugate gradient method is more difficult because the choice of $\alpha$ depends nonlinearly on the right-hand side $y$. Because the proofs in Section 2.7 are very technical, we postpone them to an appendix (Appendix B).
2.1 A General Regularization Theory
For simplicity, we assume throughout this chapter that the compact operator $K$ is one-to-one. This is not a serious restriction because we can always replace the domain $X$ by the orthogonal complement of the kernel of $K$. We make the assumption that there exists a solution $x^* \in X$ of the unperturbed equation $Kx^* = y^*$. In other words, we assume that $y^* \in \mathcal{R}(K)$. The injectivity of $K$ implies that this solution is unique. In practice, the right-hand side $y^* \in Y$ is never known exactly but only up to an error of, say, $\delta > 0$. Therefore, we assume that we know $\delta > 0$ and $y^\delta \in Y$ with
\[
\|y^* - y^\delta\|_Y \le \delta\,. \tag{2.1}
\]
It is our aim to "solve" the perturbed equation
\[
Kx^\delta = y^\delta\,. \tag{2.2}
\]
In general, (2.2) is not solvable because we cannot assume that the measured data $y^\delta$ are in the range $\mathcal{R}(K)$ of $K$. Therefore, the best we can hope for is to determine an approximation $x^\delta \in X$ to the exact solution $x^*$ that is "not much worse" than the worst-case error $\mathcal{F}\big(\delta, E, \|\cdot\|_{\hat X}\big)$ of Definition 1.18. An additional requirement is that the approximate solution $x^\delta$ should depend continuously on the data $y^\delta$. In other words, it is our aim to construct a suitable bounded approximation $R : Y \to X$ of the (unbounded) inverse operator $K^{-1} : \mathcal{R}(K) \to X$.

Definition 2.1 A regularization strategy is a family of linear and bounded operators
\[
R_\alpha : Y \longrightarrow X\,, \quad \alpha > 0\,,
\]
such that
\[
\lim_{\alpha \to 0} R_\alpha K x = x \quad \text{for all } x \in X\,;
\]
that is, the operators $R_\alpha K$ converge pointwise to the identity.

From this definition and the compactness of $K$, we conclude the following.

Theorem 2.2 Let $R_\alpha$ be a regularization strategy for a compact and injective operator $K : X \to Y$ where $\dim X = \infty$. Then we have
(1) The operators $R_\alpha$ are not uniformly bounded; that is, there exists a sequence $(\alpha_j)$ with $\|R_{\alpha_j}\|_{\mathcal{L}(Y,X)} \to \infty$ for $j \to \infty$.

(2) The sequence $(R_\alpha K x)$ does not converge uniformly on bounded subsets of $X$; that is, there is no convergence of $R_\alpha K$ to the identity $I$ in the operator norm.

Proof: (1) Assume, on the contrary, that there exists $c > 0$ such that $\|R_\alpha\|_{\mathcal{L}(Y,X)} \le c$ for all $\alpha > 0$. From $R_\alpha y \to K^{-1}y$ ($\alpha \to 0$) for all $y \in \mathcal{R}(K)$ and $\|R_\alpha y\|_X \le c\,\|y\|_Y$ for $\alpha > 0$, we conclude that $\|K^{-1}y\|_X \le c\,\|y\|_Y$ for every $y \in \mathcal{R}(K)$; that is, $K^{-1}$ is bounded. This implies that $I = K^{-1}K : X \to X$ is compact, a contradiction to $\dim X = \infty$.

(2) Assume that $R_\alpha K \to I$ in $\mathcal{L}(X,X)$. From the compactness of $R_\alpha K$ and Theorem A.34, we conclude that $I$ is also compact, which again would imply that $\dim X < \infty$.

The notion of a regularization strategy is based on unperturbed data; that is, the regularizer $R_\alpha y^*$ converges to $x^*$ for the exact right-hand side $y^* = Kx^*$. Now let $y^* \in \mathcal{R}(K)$ be the exact right-hand side and $y^\delta \in Y$ be the measured data with $\|y^* - y^\delta\|_Y \le \delta$. We define
\[
x^{\alpha,\delta} := R_\alpha y^\delta \tag{2.3}
\]
as an approximation of the solution $x^*$ of $Kx^* = y^*$. Then the error splits into two parts by the following obvious application of the triangle inequality:
\[
\|x^{\alpha,\delta} - x^*\|_X \,\le\, \|R_\alpha y^\delta - R_\alpha y^*\|_X + \|R_\alpha y^* - x^*\|_X \,\le\, \|R_\alpha\|_{\mathcal{L}(Y,X)}\,\|y^\delta - y^*\|_Y + \|R_\alpha Kx^* - x^*\|_X
\]
and thus
\[
\|x^{\alpha,\delta} - x^*\|_X \,\le\, \delta\,\|R_\alpha\|_{\mathcal{L}(Y,X)} + \|R_\alpha Kx^* - x^*\|_X\,. \tag{2.4a}
\]
Analogously, for the defect in the equation we have
\[
\|Kx^{\alpha,\delta} - y^*\|_Y \,\le\, \delta\,\|KR_\alpha\|_{\mathcal{L}(Y)} + \|KR_\alpha y^* - y^*\|_Y\,. \tag{2.4b}
\]
These are our fundamental estimates, which we use often in the following. We observe that the error between the exact and computed solutions consists of two parts: the first term on the right-hand side of (2.4a) describes the error in the data multiplied by the "condition number" $\|R_\alpha\|_{\mathcal{L}(Y,X)}$ of the regularized problem. By Theorem 2.2, this term tends to infinity as $\alpha$ tends to zero. The second term denotes the approximation error $\|(R_\alpha - K^{-1})y^*\|_X$ at the exact right-hand side $y^* = Kx^*$. By the definition of a regularization strategy, this term tends to zero with $\alpha$. Figure 2.1 illustrates the situation.
Figure 2.1: Behavior of the total error (the data-error term $\delta\,\|R_\alpha\|_{\mathcal{L}(Y,X)}$ increases as $\alpha \to 0$, while the approximation error $\|R_\alpha Kx^* - x^*\|_X$ decreases; the sum is minimal at some $\alpha^*$).

We need a strategy to choose $\alpha = \alpha(\delta)$ dependent on $\delta$ in order to keep the total error as small as possible. This means that we would like to minimize $\delta\,\|R_\alpha\|_{\mathcal{L}(Y,X)} + \|R_\alpha Kx^* - x^*\|_X$. The procedure is the same in every concrete situation: one has to estimate the quantities $\|R_\alpha\|_{\mathcal{L}(Y,X)}$ and $\|R_\alpha Kx^* - x^*\|_X$ in terms of $\alpha$ and then minimize this upper bound with respect to $\alpha$. Before we carry out these steps for two model examples, we introduce the following notation.

Definition 2.3 A parameter choice $\alpha = \alpha(\delta)$ for the regularization strategy $R_\alpha$ is called admissible if $\lim_{\delta \to 0} \alpha(\delta) = 0$ and
\[
\sup\big\{ \|R_{\alpha(\delta)} y^\delta - x\|_X : y^\delta \in Y,\ \|Kx - y^\delta\|_Y \le \delta \big\} \to 0\,, \quad \delta \to 0\,,
\]
for every $x \in X$. From the fundamental estimate (2.4a), we note that a parameter choice $\alpha = \alpha(\delta)$ is admissible if $\alpha(\delta) \to 0$ and $\delta\,\|R_{\alpha(\delta)}\|_{\mathcal{L}(Y,X)} \to 0$ as $\delta \to 0$ and $\|R_\alpha Kx - x\|_X \to 0$ as $\alpha \to 0$.

Example 2.4 (Numerical differentiation by the two-sided difference quotient) Let again $(Kx)(t) = \int_0^t x(s)\,ds$, $t \in (0,1)$; that is, solving $Kx = y$ is equivalent to differentiating $y$. It is our aim to compute the derivative of $y$ by the two-sided difference quotient (see Example 1.22 for the one-sided difference quotient). Here $\alpha = h$ is the step size, and we define
\[
(R_h y)(t) = \begin{cases} \frac{1}{h}\big[ 4\,y(t + \tfrac{h}{2}) - y(t+h) - 3\,y(t) \big], & 0 < t < \tfrac{h}{2}, \\[2pt] \frac{1}{h}\big[ y(t + \tfrac{h}{2}) - y(t - \tfrac{h}{2}) \big], & \tfrac{h}{2} < t < 1 - \tfrac{h}{2}, \\[2pt] \frac{1}{h}\big[ 3\,y(t) + y(t-h) - 4\,y(t - \tfrac{h}{2}) \big], & 1 - \tfrac{h}{2} < t < 1, \end{cases}
\]
for $y \in L^2(0,1)$. In order to prove that $R_h$ defines a regularization strategy, it suffices to show that the $R_h K$ are uniformly bounded with respect to $h$ in the
operator norm of $L^2(0,1)$ and that $\|R_h Kx - x\|_{L^2}$ tends to zero for smooth $x$ (see Theorem A.29 of Appendix A.3). Later, we show convergence for $x \in H^2(0,1)$. The fundamental theorem of calculus (or Taylor's formula from Problem 1.4 for $n = 0$) yields
\[
(R_h y)(t) \,=\, \frac{1}{h}\int_{t-h/2}^{t+h/2} y'(s)\,ds \,=\, \frac{1}{h}\int_{-h/2}^{h/2} y'(s+t)\,ds
\]
for $\tfrac{h}{2} < t < 1 - \tfrac{h}{2}$, with analogous representations near the endpoints. From these representations one obtains constants $c_1, c_2 > 0$ with $\|R_h Kx - x\|_{L^2} \le c_1\,\|x''\|_{L^2}\,h^2$ for all $x \in H^2(0,1)$ and $\|R_h y\|_{L^2} \le c_2\,\|y\|_{L^2}/h$ for all $y \in L^2(0,1)$. Estimate (2.4a) yields
\[
\|R_h y^\delta - x^*\|_{L^2} \,\le\, c_2\,\frac{\delta}{h} + c_1\,E\,h^2\,,
\]
where $E$ is a bound on $\|(x^*)''\|_{L^2} = \|(y^*)'''\|_{L^2}$. Minimization with respect to $h$ of the expression on the right-hand side leads to $h(\delta) = c\,\sqrt[3]{\delta/E}$ and
\[
\|R_{h(\delta)} y^\delta - x^*\|_{L^2} \,\le\, \tilde c\,E^{1/3}\,\delta^{2/3}
\]
for some $c > 0$ and $\tilde c = c_2/c + c_1 c^2$. We observe that this strategy is asymptotically optimal for the information $\|x''\|_{L^2} \le E$ because it provides an approximation $x^\delta$ that is asymptotically not worse than the worst-case error (see Example 1.20).

The (one- or two-sided) difference quotient uses only local portions of the function $y$. An alternative approach is to first smooth the function $y$ by mollification and then differentiate the mollified function.

Example 2.5 (Numerical differentiation by mollification) Again, we define the operator $(Kx)(t) = \int_0^t x(s)\,ds$, $t \in [0,1]$, but now as an operator from the (closed) subspace
\[
L^2_0(0,1) := \Big\{ z \in L^2(0,1) : \int_0^1 z(s)\,ds = 0 \Big\}
\]
of $L^2(0,1)$ into $L^2(0,1)$. We define the Gaussian kernel $\psi_\alpha$ by
\[
\psi_\alpha(t) \,=\, \frac{1}{\alpha\sqrt{\pi}}\,\exp\!\big(-t^2/\alpha^2\big)\,, \quad t \in \mathbb{R}\,,
\]
where $\alpha > 0$ denotes a parameter. Then $\int_{-\infty}^\infty \psi_\alpha(t)\,dt = 1$, and the convolution
\[
(\psi_\alpha * y)(t) := \int_{-\infty}^\infty \psi_\alpha(t-s)\,y(s)\,ds \,=\, \int_{-\infty}^\infty \psi_\alpha(s)\,y(t-s)\,ds\,, \quad t \in \mathbb{R}\,,
\]
exists and is an $L^2$ function for every $y \in L^2(\mathbb{R})$. Furthermore, by Young's inequality (see [44], p. 102), we have
\[
\|\psi_\alpha * y\|_{L^2(\mathbb{R})} \,\le\, \|\psi_\alpha\|_{L^1(\mathbb{R})}\,\|y\|_{L^2(\mathbb{R})} \,=\, \|y\|_{L^2(\mathbb{R})} \quad \text{for all } y \in L^2(\mathbb{R})\,.
\]
Therefore, the operators $y \mapsto \psi_\alpha * y$ are uniformly bounded in $L^2(\mathbb{R})$ with respect to $\alpha$. We note that $\psi_\alpha * y$ is infinitely often differentiable on $\mathbb{R}$ for every $y \in L^2(\mathbb{R})$. We need the two convergence properties
\[
\|\psi_\alpha * z - z\|_{L^2(\mathbb{R})} \to 0 \quad \text{as } \alpha \to 0 \text{ for every } z \in L^2(0,1) \tag{2.5a}
\]
and
\[
\|\psi_\alpha * z - z\|_{L^2(\mathbb{R})} \,\le\, \sqrt{2}\,\alpha\,\|z'\|_{L^2(0,1)} \tag{2.5b}
\]
for every $z \in H^1(0,1)$ with $z(0) = z(1) = 0$. Here and in the following, we identify functions $z \in L^2(0,1)$ with functions $z \in L^2(\mathbb{R})$, thinking of them as extended by zero outside of $[0,1]$.

Proof of (2.5a), (2.5b): It is sufficient to prove (2.5b) because the space $\{z \in H^1(0,1) : z(0) = z(1) = 0\}$ is dense in $L^2(0,1)$ and the operators $z \mapsto \psi_\alpha * z$ are uniformly bounded from $L^2(0,1)$ into $L^2(\mathbb{R})$. Let the Fourier transform be defined by
\[
(\mathcal{F}z)(t) := \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty z(s)\,e^{ist}\,ds\,, \quad t \in \mathbb{R}\,,
\]
for $z \in \mathcal{S}$, where the Schwartz space $\mathcal{S}$ is defined by
\[
\mathcal{S} := \Big\{ z \in C^\infty(\mathbb{R}) : \sup_{t \in \mathbb{R}} \big| t^p\,z^{(q)}(t) \big| < \infty \ \text{for all } p, q \in \mathbb{N}_0 \Big\}\,.
\]
With this normalization, Plancherel's theorem and the convolution theorem take the form (see [44])
\[
\|\mathcal{F}z\|_{L^2(\mathbb{R})} = \|z\|_{L^2(\mathbb{R})}\,, \qquad \mathcal{F}(u * z)(t) = \sqrt{2\pi}\,(\mathcal{F}u)(t)\,(\mathcal{F}z)(t)\,, \quad t \in \mathbb{R}\,,
\]
for all $z, u \in \mathcal{S}$. Because $\mathcal{S}$ is dense in $L^2(\mathbb{R})$, the first formula allows us to extend $\mathcal{F}$ to a bounded operator from $L^2(\mathbb{R})$ onto itself (see Theorem A.30), and both formulas hold also for $z \in L^2(\mathbb{R})$. Now we combine these properties and conclude that
\[
\|\psi_\alpha * z - z\|_{L^2(\mathbb{R})} \,=\, \big\|\mathcal{F}(\psi_\alpha * z) - \mathcal{F}z\big\|_{L^2(\mathbb{R})} \,=\, \big\|\big(\sqrt{2\pi}\,\mathcal{F}(\psi_\alpha) - 1\big)\,\mathcal{F}z\big\|_{L^2(\mathbb{R})}
\]
for every $z \in L^2(0,1)$. Partial integration yields
\[
\mathcal{F}(z')(t) \,=\, \frac{1}{\sqrt{2\pi}}\int_0^1 z'(s)\,e^{ist}\,ds \,=\, -\frac{it}{\sqrt{2\pi}}\int_0^1 z(s)\,e^{ist}\,ds \,=\, (-it)\,(\mathcal{F}z)(t)
\]
for all $z \in H^1(0,1)$ with $z(0) = z(1) = 0$. We define the function $\varphi_\alpha$ by
\[
\varphi_\alpha(t) := \frac{1}{it}\Big(1 - \sqrt{2\pi}\,\mathcal{F}(\psi_\alpha)(t)\Big) \,=\, \frac{1}{it}\Big(1 - e^{-\alpha^2 t^2/4}\Big)\,, \quad t \in \mathbb{R}\,.
\]
Then we conclude that
\[
\|\psi_\alpha * z - z\|_{L^2(\mathbb{R})} \,=\, \|\varphi_\alpha\,\mathcal{F}(z')\|_{L^2(\mathbb{R})} \,\le\, \|\varphi_\alpha\|_\infty\,\|\mathcal{F}(z')\|_{L^2(\mathbb{R})} \,=\, \|\varphi_\alpha\|_\infty\,\|z'\|_{L^2(0,1)}\,.
\]
From
\[
|\varphi_\alpha(t)| \,=\, \frac{\alpha}{2}\,\frac{1 - e^{-(\alpha t/2)^2}}{(\alpha t)/2}
\]
and the elementary estimate $\big(1 - \exp(-\tau^2)\big)/\tau \le 2\sqrt{2}$ for all $\tau > 0$, the desired estimate (2.5b) follows.

After these preparations, we define the regularization operators $R_\alpha : L^2(0,1) \to L^2_0(0,1)$ by
\[
(R_\alpha y)(t) := \frac{d}{dt}(\psi_\alpha * y)(t) - \int_0^1 \frac{d}{ds}(\psi_\alpha * y)(s)\,ds \,=\, (\psi_\alpha' * y)(t) - \int_0^1 (\psi_\alpha' * y)(s)\,ds
\]
for $t \in (0,1)$ and $y \in L^2(0,1)$. First, we note that $R_\alpha$ is well-defined, that is, maps $L^2(0,1)$ into $L^2_0(0,1)$, and is bounded. To prove that $R_\alpha$ is a regularization strategy, we proceed as in the previous example and show that

(i) $\|R_\alpha y\|_{L^2} \le \frac{4}{\alpha\sqrt{\pi}}\,\|y\|_{L^2}$ for all $\alpha > 0$ and $y \in L^2(0,1)$,

(ii) $\|R_\alpha Kx\|_{L^2} \le 2\,\|x\|_{L^2}$ for all $\alpha > 0$ and $x \in L^2_0(0,1)$; that is, the operators $R_\alpha K$ are uniformly bounded in $L^2_0(0,1)$, and

(iii) $\|R_\alpha Kx - x\|_{L^2} \le 2\sqrt{2}\,\alpha\,\|x'\|_{L^2}$ for all $\alpha > 0$ and $x \in H^1_{00}(0,1)$, where we have set
\[
H^1_{00}(0,1) := \Big\{ x \in H^1(0,1) : x(0) = x(1) = 0\,,\ \int_0^1 x(s)\,ds = 0 \Big\}\,.
\]
To prove part (i), we estimate with Young's inequality
\[
\|R_\alpha y\|_{L^2(0,1)} \,\le\, 2\,\|\psi_\alpha' * y\|_{L^2(0,1)} \,\le\, 2\,\|\psi_\alpha' * y\|_{L^2(\mathbb{R})} \,\le\, 2\,\|\psi_\alpha'\|_{L^1(\mathbb{R})}\,\|y\|_{L^2(0,1)} \,\le\, \frac{4}{\alpha\sqrt{\pi}}\,\|y\|_{L^2(0,1)}
\]
for all $y \in L^2(0,1)$, because
\[
\|\psi_\alpha'\|_{L^1(\mathbb{R})} \,=\, -2\int_0^\infty \psi_\alpha'(s)\,ds \,=\, 2\,\psi_\alpha(0) \,=\, \frac{2}{\alpha\sqrt{\pi}}\,.
\]
This proves part (i). Now let $y \in H^1(0,1)$ with $y(0) = y(1) = 0$. Then, by partial integration,
\[
(\psi_\alpha' * y)(t) \,=\, \int_0^1 \psi_\alpha'(t-s)\,y(s)\,ds \,=\, \int_0^1 \psi_\alpha(t-s)\,y'(s)\,ds \,=\, (\psi_\alpha * y')(t)\,.
\]
Taking $y = Kx$, $x \in L^2_0(0,1)$, yields
\[
(R_\alpha Kx)(t) \,=\, (\psi_\alpha * x)(t) - \int_0^1 (\psi_\alpha * x)(s)\,ds\,.
\]
Part (ii) now follows from Young's inequality. Finally, we write
\[
(R_\alpha Kx)(t) - x(t) \,=\, \big[(\psi_\alpha * x)(t) - x(t)\big] - \int_0^1 \big[(\psi_\alpha * x)(s) - x(s)\big]\,ds
\]
because $\int_0^1 x(s)\,ds = 0$. Therefore, by (2.5b),
\[
\|R_\alpha Kx - x\|_{L^2(0,1)} \,\le\, 2\,\|\psi_\alpha * x - x\|_{L^2(0,1)} \,\le\, 2\sqrt{2}\,\alpha\,\|x'\|_{L^2}
\]
for all $x \in H^1_{00}(0,1)$. This proves part (iii). Now we conclude that $R_\alpha Kx$ converges to $x$ for every $x \in L^2_0(0,1)$ by (ii), (iii), and the denseness of $H^1_{00}(0,1)$ in $L^2_0(0,1)$. Therefore, $R_\alpha$ defines a regularization strategy. From (i) and (iii), we rewrite the fundamental estimate (2.4a) as
\[
\|R_\alpha y^\delta - x^*\|_{L^2} \,\le\, \frac{4\delta}{\alpha\sqrt{\pi}} + 2\sqrt{2}\,\alpha\,E
\]
if $x^* \in H^1_{00}(0,1)$ with $\|(x^*)'\|_{L^2} \le E$, $y^* = Kx^*$, and $y^\delta \in L^2(0,1)$ such that $\|y^\delta - y^*\|_{L^2} \le \delta$. The choice $\alpha = c\,\sqrt{\delta/E}$ again leads to the optimal order $\mathcal{O}\big(\sqrt{\delta E}\big)$.

For further applications of the mollification method, we refer to the monograph by Murio [198]. There exists an enormous number of publications on numerical differentiation. We mention only the papers [3, 69, 74, 165] and, for more general Volterra equations of the first kind, [25, 26, 76, 77, 178].
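The mollification idea can be sketched in a few lines. The example below is not from the book: it convolves noisy samples with $\psi_\alpha'$ (so that $\psi_\alpha' * y \approx y'$ for smooth $y$), using hypothetical values for $\delta$ and for $\alpha$, the latter chosen of the order $\sqrt{\delta/E}$ as above:

```python
import numpy as np

# Sketch: differentiation by mollification. Convolve noisy data with the
# derivative of the Gaussian kernel psi_alpha; the parameter alpha of
# order sqrt(delta/E) balances smoothing against the data error.

def psi_prime(s, alpha):
    # derivative of psi_alpha(s) = exp(-s^2/alpha^2) / (alpha*sqrt(pi))
    return -2.0 * s / alpha**2 * np.exp(-(s / alpha) ** 2) / (alpha * np.sqrt(np.pi))

delta, alpha = 1e-3, 0.05
s = np.linspace(-0.5, 0.5, 20001)     # quadrature grid for the convolution
ds = s[1] - s[0]
rng = np.random.default_rng(1)

y = lambda u: np.where((u > 0) & (u < 1), np.sin(np.pi * u), 0.0)  # extended by zero

t0 = 0.25
data = y(t0 - s) + delta * rng.standard_normal(s.size)  # noisy samples of y(t0 - s)
approx = np.sum(psi_prime(s, alpha) * data) * ds        # (psi'_alpha * y~)(t0)
print(approx, np.pi * np.cos(np.pi * t0))               # both approximately 2.2
```

Unlike the difference quotients, the Gaussian weight averages over many noisy samples, which is why the noise contribution here is much smaller than $\delta/h$ for a comparable resolution.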
A convenient method to construct classes of admissible regularization strategies is given by ﬁltering singular systems. Let K : X → Y be a linear
compact operator, and let $\{\mu_j, x_j, y_j : j \in J\}$ be a singular system for $K$ (see Appendix A.6, Definition A.56, and Theorem A.57). As readily seen, the solution $x$ of $Kx = y$ is given by Picard's theorem (see Theorem A.58 of Appendix A.6) as
\[
x \,=\, \sum_{j=1}^\infty \frac{1}{\mu_j}\,(y, y_j)_Y\, x_j \tag{2.6}
\]
provided the series converges, that is, $y \in \mathcal{R}(K)$. This result illustrates again the influence of errors in $y$ because the large factors $1/\mu_j$ (note that $\mu_j \to 0$ as $j \to \infty$) amplify the errors in the expansion coefficients $(y, y_j)_Y$. We construct regularization strategies by damping the factors $1/\mu_j$.

Theorem 2.6 Let $K : X \to Y$ be compact and one-to-one with singular system $\{\mu_j, x_j, y_j : j \in \mathbb{N}\}$, and let
\[
q : (0,\infty) \times \big(0, \|K\|_{\mathcal{L}(X,Y)}\big] \longrightarrow \mathbb{R}
\]
be a function with the following properties:

(1) $|q(\alpha, \mu)| \le 1$ for all $\alpha > 0$ and $0 < \mu \le \|K\|_{\mathcal{L}(X,Y)}$.

(2) For every $\alpha > 0$, there exists $c(\alpha)$ such that
\[
|q(\alpha, \mu)| \le c(\alpha)\,\mu \quad \text{for all } 0 < \mu \le \|K\|_{\mathcal{L}(X,Y)}\,.
\]

(3a) $\lim_{\alpha \to 0} q(\alpha, \mu) = 1$ for every $0 < \mu \le \|K\|_{\mathcal{L}(X,Y)}$.

Then the operator $R_\alpha : Y \to X$, $\alpha > 0$, defined by
\[
R_\alpha y := \sum_{j=1}^\infty \frac{q(\alpha, \mu_j)}{\mu_j}\,(y, y_j)_Y\, x_j\,, \quad y \in Y\,, \tag{2.7}
\]
is a regularization strategy with $\|R_\alpha\|_{\mathcal{L}(Y,X)} \le c(\alpha)$ and $\|KR_\alpha\|_{\mathcal{L}(Y)} \le 1$. A choice $\alpha = \alpha(\delta)$ is admissible if $\alpha(\delta) \to 0$ and $\delta\,c\big(\alpha(\delta)\big) \to 0$ as $\delta \to 0$. The function $q$ is called a regularizing filter for $K$.

Proof:
The operators $R_\alpha$ are bounded because we have by assumption (2) that
\[
\|R_\alpha y\|_X^2 \,=\, \sum_{j=1}^\infty \frac{q(\alpha, \mu_j)^2}{\mu_j^2}\,|(y, y_j)_Y|^2 \,\le\, c(\alpha)^2 \sum_{j=1}^\infty |(y, y_j)_Y|^2 \,\le\, c(\alpha)^2\,\|y\|_Y^2\,;
\]
that is, $\|R_\alpha\|_{\mathcal{L}(Y,X)} \le c(\alpha)$. From
\[
KR_\alpha y \,=\, \sum_{j=1}^\infty \frac{q(\alpha, \mu_j)}{\mu_j}\,(y, y_j)_Y\, Kx_j \,=\, \sum_{j=1}^\infty q(\alpha, \mu_j)\,(y, y_j)_Y\, y_j\,,
\]
we conclude that $\|KR_\alpha y\|_Y^2 = \sum_{j=1}^\infty q(\alpha, \mu_j)^2\,|(y, y_j)_Y|^2 \le \|y\|_Y^2$ and thus $\|KR_\alpha\|_{\mathcal{L}(Y)} \le 1$. Furthermore, from
\[
R_\alpha Kx \,=\, \sum_{j=1}^\infty \frac{q(\alpha, \mu_j)}{\mu_j}\,(Kx, y_j)_Y\, x_j \quad \text{and} \quad x \,=\, \sum_{j=1}^\infty (x, x_j)_X\, x_j\,,
\]
and $(Kx, y_j)_Y = (x, K^*y_j)_X = \mu_j\,(x, x_j)_X$, we conclude that
\[
\|R_\alpha Kx - x\|_X^2 \,=\, \sum_{j=1}^\infty \big[q(\alpha, \mu_j) - 1\big]^2\,|(x, x_j)_X|^2\,. \tag{2.8}
\]
Here, $K^*$ denotes the adjoint of $K$ (see Theorem A.24). This fundamental representation will be used quite often in the following. Now let $x \in X$ be arbitrary but fixed. For $\varepsilon > 0$, there exists $N \in \mathbb{N}$ such that
\[
\sum_{j=N+1}^\infty |(x, x_j)_X|^2 \,<\, \frac{\varepsilon}{8}\,.
\]
By property (3a), there exists $\alpha_0 > 0$ such that $\big[q(\alpha, \mu_j) - 1\big]^2 < \frac{\varepsilon}{2\,\|x\|_X^2}$ for all $j = 1, \dots, N$ and all $0 < \alpha \le \alpha_0$. Because $\big[q(\alpha, \mu_j) - 1\big]^2 \le 4$ by property (1), we conclude from (2.8) that
\[
\|R_\alpha Kx - x\|_X^2 \,\le\, \frac{\varepsilon}{2\,\|x\|_X^2} \sum_{j=1}^N |(x, x_j)_X|^2 \,+\, 4 \sum_{j=N+1}^\infty |(x, x_j)_X|^2 \,<\, \frac{\varepsilon}{2} + \frac{\varepsilon}{2} \,=\, \varepsilon
\]
for all $0 < \alpha \le \alpha_0$; that is, $R_\alpha Kx \to x$ as $\alpha \to 0$.
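A filtered expansion of the form (2.7) is straightforward to realize numerically once a singular system is available. The sketch below is not from the book: it discretizes the integration operator, computes its SVD, and compares the unfiltered Picard sum (2.6) with a damped version (here the simple choice $q = 1$ for $\mu^2 \ge \alpha$ and $q = 0$ otherwise, one admissible filter among many):

```python
import numpy as np

# Sketch: a regularizing filter in the sense of Theorem 2.6, realized for
# a discretized integration operator via the SVD.
# R_alpha y = sum_j q(alpha, mu_j)/mu_j (y, y_j) x_j as in (2.7).

n = 200
h = 1.0 / n
K = np.tril(np.ones((n, n))) * h              # (Kx)_i ~ int_0^{t_i} x(s) ds
U, mu, Vt = np.linalg.svd(K)

t = (np.arange(n) + 0.5) * h
x_true = np.sin(np.pi * t)
rng = np.random.default_rng(2)
delta = 1e-3
y_delta = K @ x_true + delta * rng.standard_normal(n) / np.sqrt(n)

def R_alpha(y, alpha):
    q = (mu**2 >= alpha).astype(float)        # damping factors q(alpha, mu_j)
    return Vt.T @ (q / mu * (U.T @ y))

naive_err = np.linalg.norm(R_alpha(y_delta, 0.0) - x_true)  # all 1/mu_j kept
reg_err = np.linalg.norm(R_alpha(y_delta, 1e-4) - x_true)   # damped
print(naive_err, reg_err)   # the filter suppresses the noise amplification
```

The unfiltered sum amplifies each noise coefficient by $1/\mu_j$, exactly the effect described after (2.6); the filter trades a small truncation bias for removing that amplification.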
Theorem 2.7 In addition to (1), (2), and (3a) of Theorem 2.6, assume that for some $\sigma > 0$ there exists a function $\omega_\sigma$ such that
\[
\mu^\sigma\,\big|q(\alpha, \mu) - 1\big| \,\le\, \omega_\sigma(\alpha) \quad \text{for all } \alpha > 0 \text{ and } 0 < \mu \le \|K\|_{\mathcal{L}(X,Y)}\,. \tag{3b}
\]
Let $x^* = (K^*K)^{\sigma/2}z$ with $z \in X$ (for $\sigma = 1$, equivalently $x^* = K^*z$ with $z \in Y$), $y^* = Kx^*$, and $\|y^\delta - y^*\|_Y \le \delta$. Then the fundamental estimates (2.4a), (2.4b) take the form
\[
\|x^{\alpha,\delta} - x^*\|_X \,\le\, \delta\,c(\alpha) + \omega_\sigma(\alpha)\,\|z\|\,, \tag{2.12a}
\]
\[
\|Kx^{\alpha,\delta} - y^*\|_Y \,\le\, \delta + \omega_{\sigma+1}(\alpha)\,\|z\|\,. \tag{2.12b}
\]

Theorem 2.8 The following functions $q$ satisfy properties (1), (2), and (3a) of Theorem 2.6.

(a) $q(\alpha, \mu) = \dfrac{\mu^2}{\alpha + \mu^2}$. Property (2) holds with $c(\alpha) = \dfrac{1}{2\sqrt{\alpha}}$, and (3b) is satisfied with $\omega_\sigma(\alpha) = c_\sigma\,\alpha^{\sigma/2}$ for $0 < \sigma \le 2$ and $\omega_\sigma(\alpha) = c_\sigma\,\alpha$ for $\sigma > 2$. Here $c_\sigma$ is independent of $\alpha$; it is $c_1 = 1/2$ and $c_2 = 1$.

(b) $q(\alpha, \mu) = 1 - (1 - a\,\mu^2)^{1/\alpha}$ for some $0 < a < 1/\|K\|_{\mathcal{L}(X,Y)}^2$. In this case, (2) holds with $c(\alpha) = \sqrt{a/\alpha}$, and (3b) is satisfied with $\omega_\sigma(\alpha) = \big(\frac{\sigma}{2a}\,\alpha\big)^{\sigma/2}$ for all $\sigma, \alpha > 0$.

(c) Let $q$ be defined by
\[
q(\alpha, \mu) = \begin{cases} 1, & \mu^2 \ge \alpha, \\ 0, & \mu^2 < \alpha. \end{cases}
\]
In this case, (2) holds with $c(\alpha) = 1/\sqrt{\alpha}$, and (3b) is satisfied with $\omega_\sigma(\alpha) = \alpha^{\sigma/2}$ for all $\sigma, \alpha > 0$.

Therefore, all of the functions $q$ defined in (a), (b), and (c) are regularizing filters.

Proof:
For all three cases, properties (1) and (3a) are obvious.

(a) Property (2) follows from the elementary estimate $\dfrac{\mu}{\alpha + \mu^2} \le \dfrac{1}{2\sqrt{\alpha}}$ for all $\alpha, \mu > 0$ (which is equivalent to $(\mu - \sqrt{\alpha})^2 \ge 0$). We observe that $1 - q(\alpha, \mu) = \alpha/(\alpha + \mu^2)$. For fixed $\alpha > 0$ and $\sigma > 0$, we define the function
\[
f(\mu) \,=\, \mu^\sigma\,\big(1 - q(\alpha, \mu)\big) \,=\, \frac{\alpha\,\mu^\sigma}{\alpha + \mu^2}\,, \quad 0 \le \mu \le \mu_0\,,
\]
and compute its derivative as
\[
f'(\mu) \,=\, \alpha\,\mu^{\sigma-1}\,\frac{(\sigma - 2)\,\mu^2 + \alpha\,\sigma}{(\alpha + \mu^2)^2}\,.
\]
Then $f$ is monotonically increasing for $\sigma \ge 2$ and thus $f(\mu) \le f(\mu_0) \le \mu_0^{\sigma-2}\,\alpha$. If $\sigma < 2$, then we compute its maximum as $\mu_{\max}^2 = \frac{\alpha\sigma}{2-\sigma}$ with value $f(\mu_{\max}) \le c_\sigma\,\alpha^{\sigma/2}$. We leave the details to the reader.

(b) Property (2) follows immediately from Bernoulli's inequality:
\[
\big(1 - a\,\mu^2\big)^{1/\alpha} \,\ge\, 1 - \frac{a\,\mu^2}{\alpha}\,, \quad \text{thus} \quad q(\alpha, \mu) \,\le\, \frac{a\,\mu^2}{\alpha}\,,
\]
and therefore $q(\alpha, \mu) \le \sqrt{q(\alpha, \mu)} \le \sqrt{a/\alpha}\;\mu$. Property (3b) is shown by the same method as in (a) when we define
\[
f(\mu) \,=\, \mu^\sigma\,\big(1 - q(\alpha, \mu)\big) \,=\, \mu^\sigma\,\big(1 - a\,\mu^2\big)^{1/\alpha}\,, \quad 0 \le \mu \le 1/\sqrt{a}\,,
\]
with derivative $f'(\mu) = \mu^{\sigma-1}(1 - a\mu^2)^{1/\alpha - 1}\big[\sigma(1 - a\mu^2) - \frac{2a\mu^2}{\alpha}\big]$. Then $a\,\mu_{\max}^2 = \frac{\alpha\sigma}{2 + \alpha\sigma}$ with value $f(\mu_{\max}) \le c_\sigma\,\alpha^{\sigma/2}$. Again, we leave the details to the reader.

(c) For property (2), it is sufficient to consider the case $\mu^2 \ge \alpha$. In this case, $q(\alpha, \mu) = 1 \le \mu/\sqrt{\alpha}$. For (3b), we consider only the case $\mu^2 < \alpha$ and have $\mu^\sigma\,\big(1 - q(\alpha, \mu)\big) = \mu^\sigma \le \alpha^{\sigma/2}$.
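The filter properties just proved can also be checked numerically. The following sketch (not from the book, with a hypothetical value of $a$ and of $\alpha$) evaluates the three filters of Theorem 2.8 on a grid and verifies properties (1) and (2) with the stated constants:

```python
import numpy as np

# Sketch: evaluate the three regularizing filters of Theorem 2.8:
# (a) Tikhonov q = mu^2/(alpha+mu^2), (b) q = 1-(1-a*mu^2)^(1/alpha),
# (c) spectral cut-off; then check |q| <= 1 and q <= c(alpha)*mu.

alpha, a = 1e-2, 0.5
mu = np.linspace(1e-6, 1.0, 100000)

q_a = mu**2 / (alpha + mu**2)
q_b = 1.0 - (1.0 - a * mu**2) ** (1.0 / alpha)
q_c = np.where(mu**2 >= alpha, 1.0, 0.0)

# property (1): all filters are bounded by 1
print(q_a.max() <= 1.0, q_b.max() <= 1.0, q_c.max() <= 1.0)

# property (2): q(alpha, mu) <= c(alpha) * mu with the stated c(alpha)
print((q_a <= mu / (2 * np.sqrt(alpha)) + 1e-9).all())   # c = 1/(2 sqrt(alpha))
print((q_b <= np.sqrt(a / alpha) * mu + 1e-9).all())     # c = sqrt(a/alpha)
print((q_c <= mu / np.sqrt(alpha) + 1e-9).all())         # c = 1/sqrt(alpha)
```

All three filters damp the small singular values ($q \approx \mu^2$-like behavior near $\mu = 0$) while leaving the large ones essentially untouched, which is the common mechanism behind these regularization methods.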
We will see later that the regularization methods for the first two choices of $q$ admit a characterization that avoids knowledge of the singular system. The choice (c) of $q$ is called the spectral cutoff. The spectral cutoff solution $x^{\alpha,\delta} \in X$ is therefore defined by
\[
x^{\alpha,\delta} \,=\, \sum_{\mu_j^2 \ge \alpha} \frac{1}{\mu_j}\,(y^\delta, y_j)_Y\, x_j\,.
\]
For this spectral cutoff solution, we combine the fundamental estimates (2.12a), (2.12b) with the previous theorem and show the following result.

Theorem 2.9 (a) Let $K : X \to Y$ be a compact and injective operator with singular system $\{\mu_j, x_j, y_j : j \in \mathbb{N}\}$. The operators
\[
R_\alpha y := \sum_{\mu_j^2 \ge \alpha} \frac{1}{\mu_j}\,(y, y_j)_Y\, x_j\,, \quad y \in Y\,, \tag{2.13}
\]
define a regularization strategy with $\|R_\alpha\|_{\mathcal{L}(Y,X)} \le 1/\sqrt{\alpha}$. Thus the parameter choice $\alpha = \alpha(\delta)$ is admissible if $\alpha(\delta) \to 0$ and $\delta^2/\alpha(\delta) \to 0$ as $\delta \to 0$.

(b) Let $Kx^* = y^*$ and $y^\delta \in Y$ be such that $\|y^\delta - y^*\|_Y \le \delta$. Furthermore, let $x^* = K^*z \in \mathcal{R}(K^*)$ with $\|z\|_Y \le E$ and $c > 0$. For the choice $\alpha(\delta) = c\,\delta/E$, we have the following error estimates for the spectral cutoff regularization:
\[
\|x^{\alpha(\delta),\delta} - x^*\|_X \,\le\, \Big(\frac{1}{\sqrt{c}} + \sqrt{c}\Big)\sqrt{\delta E}\,, \tag{2.14a}
\]
\[
\|Kx^{\alpha(\delta),\delta} - y^*\|_Y \,\le\, (1 + c)\,\delta\,. \tag{2.14b}
\]

(c) Let $x^* = (K^*K)^{\sigma/2}z \in \mathcal{R}\big((K^*K)^{\sigma/2}\big)$ for some $\sigma > 0$ with $\|z\|_X \le E$. The choice $\alpha(\delta) = c\,(\delta/E)^{2/(\sigma+1)}$ leads to the estimates
\[
\|x^{\alpha(\delta),\delta} - x^*\|_X \,\le\, \Big(\frac{1}{\sqrt{c}} + c^{\sigma/2}\Big)\,\delta^{\sigma/(\sigma+1)}\,E^{1/(\sigma+1)}\,, \tag{2.14c}
\]
\[
\|Kx^{\alpha(\delta),\delta} - y^*\|_Y \,\le\, \big(1 + c^{(\sigma+1)/2}\big)\,\delta\,. \tag{2.14d}
\]
Therefore, the spectral cutoff regularization is optimal under the information $\|(K^*)^{-1}x^*\|_Y \le E$ or $\|(K^*K)^{-\sigma/2}x^*\|_X \le E$, respectively (if $K^*$ is one-to-one).

Proof: Combining the fundamental estimates (2.12a), (2.12b) with Theorem 2.8 (part (c)) yields the error estimates
\[
\|x^{\alpha,\delta} - x^*\|_X \,\le\, \frac{\delta}{\sqrt{\alpha}} + \sqrt{\alpha}\,\|z\|_Y\,, \qquad \|Kx^{\alpha,\delta} - y^*\|_Y \,\le\, \delta + \alpha\,\|z\|_Y
\]
for part (b) and
\[
\|x^{\alpha,\delta} - x^*\|_X \,\le\, \frac{\delta}{\sqrt{\alpha}} + \alpha^{\sigma/2}\,\|z\|_X\,, \qquad \|Kx^{\alpha,\delta} - y^*\|_Y \,\le\, \delta + \alpha^{(\sigma+1)/2}\,\|z\|_X
\]
for part (c). The choices $\alpha(\delta) = c\,\delta/E$ and $\alpha(\delta) = c\,(\delta/E)^{2/(\sigma+1)}$, respectively, lead to the estimates (2.14a), (2.14b) and (2.14c), (2.14d), respectively.

The general regularization concept discussed in this section can be found in many books on inverse theory [17, 110, 182]. It was not the aim of this section to study the most general theory. This concept has been extended in several directions. For example, in [84] (see also [88]) the notions of strong and weak convergence and divergence are defined, and in [182] different notions of optimality of regularization schemes are discussed. The idea of using filters has a long history [109, 265] and is very convenient for theoretical purposes. For given concrete integral operators, however, one often wants to avoid the computation of a singular system. In the next sections, we give equivalent characterizations for the first two examples without using singular systems.
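The rate in (2.14a) can be observed experimentally. The sketch below is not from the book: for a discretized integration operator and a solution satisfying the source condition $x^* = K^*z$, it applies the spectral cutoff with the a priori choice $\alpha = \delta/E$ (that is, $c = 1$) for several hypothetical noise levels and checks that the error stays within the predicted multiple of $\sqrt{\delta E}$:

```python
import numpy as np

# Sketch: empirical check of Theorem 2.9(b). With x* = K^T z (source
# condition), noise level delta, and alpha = delta/E, the spectral cut-off
# error is bounded by 2*sqrt(delta*E) -- the bound delta/sqrt(alpha)
# + sqrt(alpha)*E holds verbatim in the matrix setting.

n = 400
h = 1.0 / n
K = np.tril(np.ones((n, n))) * h
U, mu, Vt = np.linalg.svd(K)
t = (np.arange(n) + 0.5) * h

z = np.sin(np.pi * t)
x_star = K.T @ z                 # x* in R(K*) with ||z|| = E
E = np.linalg.norm(z)
y_star = K @ x_star

rng = np.random.default_rng(4)
ratios = []
for delta in [1e-2, 1e-3, 1e-4]:
    noise = rng.standard_normal(n)
    y_delta = y_star + delta * noise / np.linalg.norm(noise)
    alpha = delta / E
    q = (mu**2 >= alpha).astype(float)
    x_rec = Vt.T @ (q / mu * (U.T @ y_delta))
    ratios.append(np.linalg.norm(x_rec - x_star) / np.sqrt(delta * E))
print(ratios)   # stays bounded (<= 2) as delta decreases, as (2.14a) predicts
```

Since the proof of Theorem 2.9 is purely Hilbert-space-theoretic, the bound $(1/\sqrt{c} + \sqrt{c})\sqrt{\delta E}$ applies verbatim to the finite-dimensional surrogate, which is what the ratios confirm.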
2.2 Tikhonov Regularization
A common method to deal with overdetermined finite linear systems of the form $Kx = y$ is to determine the best fit in the sense that one tries to minimize the defect $\|Kx - y\|_Y$ with respect to $x \in X$ for some norm in $Y$. If $X$ is infinite-dimensional and $K$ is compact, this minimization problem is also ill-posed by the following lemma.
Lemma 2.10 Let X and Y be Hilbert spaces, K : X → Y be linear and bounded, and y ∗ ∈ Y . There exists x ˆ ∈ X with K x ˆ − y ∗ Y ≤ Kx − y ∗ Y for all ˆ = K ∗ y ∗ . Here, x ∈ X if and only if x ˆ ∈ X solves the normal equation K ∗ K x ∗ K : Y → X denotes the adjoint of K. Proof:
A simple application of the binomial theorem yields
Kx − y ∗ 2Y − K x ˆ − y ∗ 2Y = 2 Re K x ˆ − y ∗ , K(x − x ˆ) Y + K(x − x ˆ)2Y
∗ = 2 Re K (K x ˆ − y ∗ ), x − x ˆ X + K(x − x ˆ)2Y
for all x, x ˆ ∈ X. If x ˆ satisﬁes K ∗ K x ˆ = K ∗ y ∗ , then Kx−y ∗ 2Y −K x ˆ−y ∗ 2Y ≥ 0, ∗ ˆ minimizes Kx−y ∗ Y , that is, x ˆ minimizes Kx−y Y . If, on the other hand, x then we substitute x = x ˆ + tz for any t > 0 and z ∈ X and arrive at
ˆ − y ∗ ), z X + t2 Kz2Y . 0 ≤ 2t Re K ∗ (K x
Division by t > 0 and t → 0 yields Re K ∗ (K x ˆ − y ∗ ), z X ≥ 0 for all z ∈ X; ˆ − y ∗ ) = 0, and x ˆ solves the normal equation. that is, K ∗ (K x
As a consequence of this lemma, we should penalize the defect (in the language of optimization theory) or replace the equation of the ﬁrst kind K ∗ K x ˆ= K ∗ y ∗ by an equation of the second kind (in the language of integral equation theory). Both viewpoints lead to the following minimization problem. Given the linear, bounded operator K : X → Y and y ∈ Y , determine xα ∈ X that minimizes the Tikhonov functional Jα (x) := Kx − y2Y + αx2X
for x ∈ X .
(2.15)
We prove the following theorem. Theorem 2.11 Let K : X → Y be a linear and bounded operator between Hilbert spaces and α > 0. Then the Tikhonov functional Jα has a unique minimum xα ∈ X. This minimum xα is the unique solution of the normal equation α xα + K ∗ Kxα = K ∗ y .
(2.16)
∗
The operator αI + K K is an isomorphism from X onto itself for every α > 0. Proof:
We use the following formula as in the proof of the previous lemma:
Jα (x) − Jα (xα ) = 2 Re Kxα − y, K(x − xα ) Y + 2α Re(xα , x − xα )X + K(x − xα )2Y + αx − xα 2X
= 2 Re K ∗ (Kxα − y) + αxα , x − xα X + K(x − xα )2Y + αx − xα 2X
(2.17)
for all x ∈ X. From this, the equivalence of the normal equation with the minimization problem for Jα is shown exactly as in the proof of Lemma 2.10. Next,
2.2 Tikhonov Regularization
39
we show that α I + K ∗ K is onetoone for every α > 0. Let αx + K ∗ Kx = 0. Multiplication by x yields α(x, x)X + (Kx, Kx)Y = 0, that is, x = 0. Finally, we show that αI + K ∗ K is onto. Since αI + K ∗ K is onetoone and selfadjoint, we conclude that its range is dense in X. It remains to show that the range is closed. Let zn = αxn + K ∗ Kxn converge to some z ∈ X. Then zn − zm = α(xn − xm ) + K ∗ K(xn − xm ). Multiplication of this equation by xn − xm yields αxn − xm 2X + K(xn − xm )2Y
=
(zn − zm , xn − xm )X
≤
zn − zm X xn − xm X .
From this, we conclude that αxn − xm X ≤ zn − zm X . Therefore, (xn ) is a Cauchy sequence and thus convergent to some x ∈ X which obviously satisﬁes αx + K ∗ Kx = z. The solution xα of equation (2.16) can be written in the form xα = Rα y with Rα := (αI + K ∗ K)−1 K ∗ : Y −→ X . (2.18) Choosing a singular system {μj , xj , yj : j ∈ N} for the compact and injective operator K, we see that Rα y has the representation Rα y =
∞
∞ μj q(α, μj ) (y, y ) x = (y, yj )Y xj , j Y j 2 α + μ μj j n=0 n=0
y ∈ Y , (2.19)
with $q(\alpha,\mu) = \mu^2/(\alpha + \mu^2)$. This function $q$ is exactly the filter function that was studied in Theorem 2.8, part (a). Therefore, applications of Theorems 2.6 and 2.7 yield the following.

Theorem 2.12 Let $K : X \to Y$ be a linear, compact, and injective operator and $\alpha > 0$ and $x^* \in X$ be the exact solution of $Kx^* = y^*$. Furthermore, let $y^\delta \in Y$ with $\|y^\delta - y^*\|_Y \le \delta$.

(a) The operators $R_\alpha : Y \to X$ from (2.18) form a regularization strategy with $\|R_\alpha\|_{\mathcal{L}(Y,X)} \le 1/(2\sqrt{\alpha})$. It is called the Tikhonov regularization method. $R_\alpha y^\delta$ is determined as the unique solution $x^{\alpha,\delta} \in X$ of the equation of the second kind
$$\alpha\, x^{\alpha,\delta} + K^*Kx^{\alpha,\delta} = K^* y^\delta. \quad (2.20)$$
Every choice $\alpha(\delta) \to 0$ $(\delta \to 0)$ with $\delta^2/\alpha(\delta) \to 0$ $(\delta \to 0)$ is admissible.

(b) Let $x^* = K^*z \in \mathcal{R}(K^*)$ with $\|z\|_Y \le E$. We choose $\alpha(\delta) = c\,\delta/E$ for some $c > 0$. Then the following estimates hold:
$$\|x^{\alpha(\delta),\delta} - x^*\|_X \le \tfrac{1}{2}\big(1/\sqrt{c} + \sqrt{c}\big)\sqrt{\delta E}, \quad (2.21a)$$
$$\|Kx^{\alpha(\delta),\delta} - y^*\|_Y \le (1 + c)\,\delta. \quad (2.21b)$$
40
Regularization Theory
(c) For some $\sigma \in (0, 2]$, let $x^* = (K^*K)^{\sigma/2}z \in \mathcal{R}\big((K^*K)^{\sigma/2}\big)$ with $\|z\|_X \le E$. The choice $\alpha(\delta) = c\,(\delta/E)^{2/(\sigma+1)}$ for $c > 0$ leads to the error estimates
$$\|x^{\alpha(\delta),\delta} - x^*\|_X \le \Big(\frac{1}{2\sqrt{c}} + c_\sigma\, c^{\sigma/2}\Big)\,\delta^{\sigma/(\sigma+1)} E^{1/(\sigma+1)}, \quad (2.21c)$$
$$\|Kx^{\alpha(\delta),\delta} - y^*\|_Y \le \big(1 + c_{\sigma+1}\, c^{(\sigma+1)/2}\big)\,\delta. \quad (2.21d)$$
Here, $c_\sigma$ are the constants for the choice of $q$ of part (a) of Theorem 2.8. Therefore, for $\sigma \le 2$ Tikhonov's regularization method is optimal for the information $\|(K^*)^{-1}x^*\|_Y \le E$ or $\|(K^*K)^{-\sigma/2}x^*\|_X \le E$, respectively (provided $K^*$ is one-to-one).

Proof: Combining the fundamental estimates (2.12a), (2.12b) with Theorem 2.8 (part (a)) yields the error estimates
$$\|x^{\alpha,\delta} - x^*\|_X \le \frac{\delta}{2\sqrt{\alpha}} + \frac{\sqrt{\alpha}}{2}\|z\|_Y, \qquad \|Kx^{\alpha,\delta} - y^*\|_Y \le \delta + \alpha\|z\|_Y$$
for part (b) and
$$\|x^{\alpha,\delta} - x^*\|_X \le \frac{\delta}{2\sqrt{\alpha}} + c_\sigma\,\alpha^{\sigma/2}\|z\|_X, \qquad \|Kx^{\alpha,\delta} - y^*\|_Y \le \delta + c_{\sigma+1}\,\alpha^{(\sigma+1)/2}\|z\|_X$$
for part (c). The choices $\alpha(\delta) = c\,\delta/E$ and $\alpha(\delta) = c(\delta/E)^{2/(\sigma+1)}$, respectively, lead to the estimates (2.21a), (2.21b) and (2.21c), (2.21d), respectively.

From Theorem 2.12, we observe that $\alpha$ has to be chosen to depend on $\delta$ in such a way that it converges to zero as $\delta$ tends to zero but not as fast as $\delta^2$. From parts (b) and (c), we conclude that the smoother the solution $x^*$ is, the slower $\alpha$ has to tend to zero. On the other hand, the convergence can be arbitrarily slow if no a priori assumption about the solution $x^*$ (such as (b) or (c)) is available (see [243]).

The case $\sigma = 2$ leads to the order $\mathcal{O}(\delta^{2/3})$ for $\|x^{\alpha(\delta),\delta} - x^*\|_X$. It is surprising to note that this order of convergence of Tikhonov's regularization method cannot be improved even if $x^* \in \mathcal{R}\big((K^*K)^{\sigma/2}\big)$ for $\sigma > 2$. Indeed, we prove the following result.

Theorem 2.13 Let $K : X \to Y$ be linear, compact, and one-to-one such that the range $\mathcal{R}(K)$ is infinite-dimensional. Furthermore, let $x \in X$, and assume that there exists a continuous function $\alpha : [0,\infty) \to [0,\infty)$ with $\alpha(0) = 0$ such that
$$\lim_{\delta \to 0}\|x^{\alpha(\delta),\delta} - x\|_X\,\delta^{-2/3} = 0$$
for every $y^\delta \in Y$ with $\|y^\delta - Kx\|_Y \le \delta$, where $x^{\alpha(\delta),\delta} \in X$ solves (2.20) for $\alpha = \alpha(\delta)$. Then $x = 0$.
Proof: Assume, on the contrary, that $x \ne 0$. First, we show that $\alpha(\delta)\,\delta^{-2/3} \to 0$. Set $y = Kx$. From
$$\big(\alpha(\delta) I + K^*K\big)\big(x^{\alpha(\delta),\delta} - x\big) = K^*(y^\delta - y) - \alpha(\delta)\,x,$$
we estimate
$$\alpha(\delta)\|x\|_X \le \|K\|_{\mathcal{L}(X,Y)}\,\delta + \big(\alpha(\delta) + \|K\|_{\mathcal{L}(X,Y)}^2\big)\|x^{\alpha(\delta),\delta} - x\|_X.$$
We multiply this estimate by $\delta^{-2/3}$ and use the assumption that $x^{\alpha(\delta),\delta}$ tends to $x$ faster than $\delta^{2/3}$ tends to zero, that is, $\|x^{\alpha(\delta),\delta} - x\|_X\,\delta^{-2/3} \to 0$. This yields $\alpha(\delta)\,\delta^{-2/3} \to 0$.

In the second part we construct a contradiction. Let $\{\mu_j, x_j, y_j : j \in \mathbb{N}\}$ be a singular system for $K$. Define
$$\delta_j := \mu_j^3 \quad\text{and}\quad y^{\delta_j} := y + \delta_j y_j, \quad j \in \mathbb{N}.$$
Then $\delta_j \to 0$ as $j \to \infty$ and, with $\alpha_j := \alpha(\delta_j)$ and $x^{\alpha_j} := (\alpha_j I + K^*K)^{-1}K^*y$,
$$x^{\alpha_j,\delta_j} - x = x^{\alpha_j,\delta_j} - x^{\alpha_j} + x^{\alpha_j} - x = \big(\alpha_j I + K^*K\big)^{-1}K^*(\delta_j y_j) + x^{\alpha_j} - x = \frac{\delta_j\,\mu_j}{\alpha_j + \mu_j^2}\,x_j + x^{\alpha_j} - x.$$
Because also $\|x^{\alpha_j} - x\|_X\,\delta_j^{-2/3} \to 0$, we conclude that
$$\frac{\delta_j^{1/3}\mu_j}{\alpha_j + \mu_j^2} \longrightarrow 0, \quad j \to \infty.$$
But, on the other hand,
$$\frac{\delta_j^{1/3}\mu_j}{\alpha_j + \mu_j^2} = \frac{\mu_j^2}{\alpha_j + \mu_j^2} = \Big(1 + \alpha_j\,\delta_j^{-2/3}\Big)^{-1} \longrightarrow 1, \quad j \to \infty.$$
This is a contradiction.
This result shows that Tikhonov's regularization method is not optimal under stronger "smoothness" assumptions $x^* \in \mathcal{R}\big((K^*K)^{\sigma/2}\big)$ for $\sigma > 2$. This is in contrast to, for example, the spectral cutoff regularization (see Theorem 2.9 above), Landweber's method, or the conjugate gradient method, which are discussed later.

The choice of $\alpha$ in Theorem 2.12 is made a priori, that is, before starting the computation of $x^{\alpha,\delta}$ by solving the least squares problem. In Sections 2.5
to 2.7 we study a posteriori choices of $\alpha$, that is, choices of $\alpha$ made during the process of computing $x^{\alpha,\delta}$.

It is possible to choose stronger norms in the penalty term of the Tikhonov functional. Instead of (2.15), one can minimize the functional
$$\|Kx - y^\delta\|_Y^2 + \alpha\|x\|_1^2 \quad\text{on } X_1,$$
where $\|\cdot\|_1$ is a stronger norm (or only seminorm) on a subspace $X_1 \subset X$. This was originally done by Phillips [217] and Tikhonov [261, 262] (see also [97]) for linear integral equations of the first kind. They chose the seminorm $\|x\|_1 := \|x'\|_{L^2}$ or the $H^1$ norm $\|x\|_1 := \big(\|x\|_{L^2}^2 + \|x'\|_{L^2}^2\big)^{1/2}$. By characterizing $\|\cdot\|_1$ through a singular system for $K$, one obtains similar convergence results as above in the stronger norm $\|\cdot\|_1$. For further aspects of regularization with differential operators or stronger norms, we refer to [70, 119, 180, 205] and the monographs [110, 111, 182]. The interpretation of regularization by smoothing norms in terms of reproducing kernel Hilbert spaces has been observed in [133].
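The equivalence of the normal-equation form (2.20) and the spectral filter form (2.19) is easy to verify numerically. The following sketch uses an assumed ill-conditioned test matrix (a Hilbert matrix, standing in for a discretized compact operator; the matrix, noise level, and $\alpha$ are illustrative choices, not from the text):

```python
import numpy as np

# Assumed test problem: Hilbert matrix as a discretized compact operator.
n = 8
K = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
x_true = np.ones(n)
y = K @ x_true
rng = np.random.default_rng(0)
y_delta = y + 1e-6 * rng.standard_normal(n)      # perturbed right-hand side

alpha = 1e-4
# Direct form: solve the regularized normal equation (2.20).
x_normal = np.linalg.solve(alpha * np.eye(n) + K.T @ K, K.T @ y_delta)

# Spectral form (2.19): apply the filter q(alpha, mu) = mu^2 / (alpha + mu^2)
# in a singular basis of K.
U, mu, Vt = np.linalg.svd(K)
q = mu**2 / (alpha + mu**2)
x_spectral = Vt.T @ (q / mu * (U.T @ y_delta))

agree = np.allclose(x_normal, x_spectral)
```

The filter values $q(\alpha,\mu_j)$ lie in $(0,1)$: components belonging to singular values $\mu_j \gg \sqrt{\alpha}$ pass almost unchanged, while components with $\mu_j \ll \sqrt{\alpha}$ are damped.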
2.3 Landweber Iteration
Landweber [172], Fridman [98], and Bialy [18] suggested rewriting the equation $Kx = y$ in the form $x = (I - aK^*K)x + aK^*y$ for some $a > 0$ and iterating this equation, that is, computing
$$x^0 := 0 \quad\text{and}\quad x^m = (I - aK^*K)\,x^{m-1} + aK^*y \quad (2.22)$$
for $m = 1, 2, \dots$. This iteration scheme can be interpreted as the steepest descent algorithm applied to the quadratic functional $x \mapsto \|Kx - y\|_Y^2$, as the following lemma shows.

Lemma 2.14 Let the sequence $(x^m)$ be defined by (2.22) and define the functional $\psi : X \to \mathbb{R}$ by $\psi(x) = \tfrac{1}{2}\|Kx - y\|_Y^2$, $x \in X$. Then $\psi$ is Fréchet differentiable in every $z \in X$ and
$$\psi'(z)x = \mathrm{Re}\,(Kz - y, Kx)_Y = \mathrm{Re}\,\big(K^*(Kz - y), x\big)_X, \quad x \in X. \quad (2.23)$$
The linear functional $\psi'(z)$ can be identified with $K^*(Kz - y) \in X$ in the Hilbert space $X$ over the field $\mathbb{R}$. Therefore, $x^m = x^{m-1} - aK^*(Kx^{m-1} - y)$ is the steepest descent step with stepsize $a$.

Proof: The binomial formula yields
$$\psi(z + x) - \psi(z) - \mathrm{Re}\,(Kz - y, Kx)_Y = \tfrac{1}{2}\|Kx\|_Y^2$$
and thus
$$\big|\psi(z + x) - \psi(z) - \mathrm{Re}\,(Kz - y, Kx)_Y\big| \le \tfrac{1}{2}\|K\|_{\mathcal{L}(X,Y)}^2\,\|x\|_X^2,$$
which proves that the mapping $x \mapsto \mathrm{Re}\,(Kz - y, Kx)_Y$ is the Fréchet derivative of $\psi$ at $z$.

Equation (2.22) is a linear recursion formula for $x^m$. By induction with respect to $m$, it is easily seen that $x^m$ has the explicit form $x^m = R_m y$, where the operator $R_m : Y \to X$ is defined by
$$R_m := a\sum_{k=0}^{m-1}(I - aK^*K)^k K^* \quad\text{for } m = 1, 2, \dots. \quad (2.24)$$
Choosing a singular system $\{\mu_j, x_j, y_j : j \in \mathbb{N}\}$ for the compact and injective operator $K$, we see that $R_m y$ has the representation
$$R_m y = a\sum_{j=1}^\infty \mu_j\sum_{k=0}^{m-1}(1 - a\mu_j^2)^k\,(y, y_j)_Y\, x_j = \sum_{j=1}^\infty \frac{1}{\mu_j}\Big[1 - (1 - a\mu_j^2)^m\Big](y, y_j)_Y\, x_j = \sum_{j=1}^\infty \frac{q(m, \mu_j)}{\mu_j}\,(y, y_j)_Y\, x_j, \quad y \in Y, \quad (2.25)$$
with $q(m,\mu) = 1 - (1 - a\mu^2)^m$. We studied this filter function $q$ in Theorem 2.8, part (b), when we define $\alpha = 1/m$. Therefore, applications of Theorems 2.6 and 2.7 yield the following result.

Theorem 2.15 Again let $K : X \to Y$ be a compact and injective operator and let $0 < a < 1/\|K\|_{\mathcal{L}(X,Y)}^2$. Let $x^* \in X$ be the exact solution of $Kx^* = y^*$. Furthermore, let $y^\delta \in Y$ with $\|y^\delta - y^*\|_Y \le \delta$.

(a) Define the linear and bounded operators $R_m : Y \to X$ by (2.24). These operators $R_m$ define a regularization strategy with discrete regularization parameter $\alpha = 1/m$, $m \in \mathbb{N}$, and $\|R_m\|_{\mathcal{L}(Y,X)} \le \sqrt{am}$. The sequence $x^{m,\delta} = R_m y^\delta$ is computed by the iteration (2.22), that is,
$$x^{0,\delta} = 0 \quad\text{and}\quad x^{m,\delta} = (I - aK^*K)\,x^{m-1,\delta} + aK^*y^\delta \quad (2.26)$$
for $m = 1, 2, \dots$. Every strategy $m(\delta) \to \infty$ $(\delta \to 0)$ with $\delta^2\, m(\delta) \to 0$ $(\delta \to 0)$ is admissible.

(b) Again let $x^* = K^*z \in \mathcal{R}(K^*)$ with $\|z\|_Y \le E$ and $0 < c_1 < c_2$. For every choice $m(\delta)$ with $c_1 E/\delta \le m(\delta) \le c_2 E/\delta$, the following estimates hold:
$$\|x^{m(\delta),\delta} - x^*\|_X \le c_3\sqrt{\delta E}, \quad (2.27a)$$
$$\|Kx^{m(\delta),\delta} - y^*\|_Y \le \big(1 + 1/(ac_1)\big)\,\delta, \quad (2.27b)$$
for some $c_3$ depending on $c_1$, $c_2$, and $a$. Therefore, the Landweber iteration is optimal under the information $\|(K^*)^{-1}x^*\|_Y \le E$.
(c) For some $\sigma > 0$, let $x^* = (K^*K)^{\sigma/2}z \in \mathcal{R}\big((K^*K)^{\sigma/2}\big)$ with $\|z\|_X \le E$ and let $0 < c_1 < c_2$. For every choice $m(\delta)$ with $c_1(E/\delta)^{2/(\sigma+1)} \le m(\delta) \le c_2(E/\delta)^{2/(\sigma+1)}$, we have
$$\|x^{m(\delta),\delta} - x^*\|_X \le c_3\,\delta^{\sigma/(\sigma+1)} E^{1/(\sigma+1)}, \quad (2.27c)$$
$$\|Kx^{m(\delta),\delta} - y^*\|_Y \le c_3\,\delta, \quad (2.27d)$$
for some $c_3$ depending on $c_1$, $c_2$, $\sigma$, and $a$. Therefore, the Landweber iteration is also optimal for the information $\|(K^*K)^{-\sigma/2}x^*\|_X \le E$ for every $\sigma > 0$.

Proof: Combining the fundamental estimates (2.4a), (2.12a), and (2.12b) with Theorem 2.8 (part (b)) yields the error estimates
$$\|x^{m,\delta} - x^*\|_X \le \delta\sqrt{am} + \frac{1}{\sqrt{2am}}\|z\|_Y, \qquad \|Kx^{m,\delta} - y^*\|_Y \le \delta + \frac{1}{am}\|z\|_Y$$
for part (b) and
$$\|x^{m,\delta} - x^*\|_X \le \delta\sqrt{am} + \Big(\frac{\sigma}{2am}\Big)^{\sigma/2}\|z\|_X, \qquad \|Kx^{m,\delta} - y^*\|_Y \le \delta + \Big(\frac{\sigma+1}{2am}\Big)^{(\sigma+1)/2}\|z\|_X \quad (2.28)$$
for part (c). Replacing $m$ in the first term by the upper bound and in the second by the lower bound yields estimates (2.27a), (2.27b) and (2.27c), (2.27d), respectively.

The choice $x^0 = 0$ is made to simplify the analysis. In general, the explicit iteration $x^m$ is given by
$$x^m = a\sum_{k=0}^{m-1}(I - aK^*K)^k K^* y + (I - aK^*K)^m x^0, \quad m = 1, 2, \dots.$$
In this case, $R_m$ is affine linear, that is, of the form $R_m y = z^m + S_m y$, $y \in Y$, for some $z^m \in X$ and some linear operator $S_m : Y \to X$. For this method, we observe again that high precision (ignoring the presence of errors) requires a large number $m$ of iterations, but stability forces us to keep $m$ small enough. We come back to the Landweber iteration in the next chapter, where we show that an optimal choice of $m(\delta)$ can be made a posteriori through a proper stopping rule.

Other possibilities for regularizing first kind equations $Kx = y$ with compact operators $K$, which we have not discussed, are methods using positivity or more general convexity constraints (see [27, 30, 235, 236]).
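The scheme (2.26) is a few lines of code. The following minimal sketch runs the iteration with exact data ($y^\delta = y$); the smoothing kernel matrix and the smooth solution are assumptions chosen only for the demonstration:

```python
import numpy as np

# Assumed test operator: a symmetric positive definite smoothing kernel matrix.
n = 10
t = np.linspace(0, 1, n)
K = np.exp(-np.abs(t[:, None] - t[None, :])) / n
x_true = np.sin(np.pi * t)
y = K @ x_true                                  # exact right-hand side

a = 0.9 / np.linalg.norm(K, 2) ** 2             # step size a < 1 / ||K||^2
x = np.zeros(n)                                 # x^0 := 0
errors = []
for m in range(200):
    x = x + a * K.T @ (y - K @ x)               # Landweber step (2.26)
    errors.append(np.linalg.norm(x - x_true))
```

With exact data, $x^m - x^* = (I - aK^*K)^m(x^0 - x^*)$, so the error decreases monotonically; with noisy data it would eventually increase again, which is why a stopping rule is needed.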
2.4 A Numerical Example
In this section, we demonstrate the regularization methods of Tikhonov and Landweber for the following integral equation of the first kind:
$$\int_0^1 (1 + ts)\,e^{ts}\,x(s)\,ds = e^t, \quad 0 \le t \le 1, \quad (2.29)$$
with unique solution $x^*(t) = 1$ (see Problem 2.1). The operator $K : L^2(0,1) \to L^2(0,1)$ is given by
$$(Kx)(t) = \int_0^1 (1 + ts)\,e^{ts}\,x(s)\,ds$$
and is self-adjoint, that is, $K^* = K$. We note that $x^*$ does not belong to the range of $K$ (see Problem 2.1). For the numerical evaluation of $Kx$, we use Simpson's rule. With $t_i = i/n$, $i = 0, \dots, n$, $n$ even, we replace $(Kx)(t_i)$ by
$$\sum_{j=0}^n w_j\,(1 + t_i t_j)\,e^{t_i t_j}\,x(t_j), \quad\text{where}\quad w_j = \begin{cases} \frac{1}{3n}, & j = 0 \text{ or } n,\\[2pt] \frac{4}{3n}, & j = 1, 3, \dots, n-1,\\[2pt] \frac{2}{3n}, & j = 2, 4, \dots, n-2. \end{cases}$$
We note that the corresponding matrix $A$ is not symmetric. This leads to the discretized Tikhonov equation
$$\alpha\, x^{\alpha,\delta} + A^2 x^{\alpha,\delta} = A\, y^\delta.$$
Here, $y^\delta = (y_i^\delta) \in \mathbb{R}^{n+1}$ is a perturbation (uniformly distributed random vector) of the discrete right-hand side $y_i^* = \exp(i/n)$ such that
$$\|y^* - y^\delta\|_2 := \sqrt{\frac{1}{n+1}\sum_{i=0}^n\big(y_i^* - y_i^\delta\big)^2} \le \delta.$$
The average results of ten computations are given in the following tables, where we have listed the discrete norms $\|1 - x^{\alpha,\delta}\|_2$ of the errors between the exact solution $x^*(t) = 1$ and Tikhonov's approximation $x^{\alpha,\delta}$ (Table 2.1).
Table 2.1: Tikhonov regularization for δ = 0

| α      | n = 8        | n = 16       |
|--------|--------------|--------------|
| 10^-1  | 2.4 · 10^-1  | 2.3 · 10^-1  |
| 10^-2  | 7.2 · 10^-2  | 6.8 · 10^-2  |
| 10^-3  | 2.6 · 10^-2  | 2.4 · 10^-2  |
| 10^-4  | 1.3 · 10^-2  | 1.2 · 10^-2  |
| 10^-5  | 2.6 · 10^-3  | 2.3 · 10^-3  |
| 10^-6  | 9.3 · 10^-4  | 8.7 · 10^-4  |
| 10^-7  | 3.5 · 10^-4  | 4.4 · 10^-4  |
| 10^-8  | 1.3 · 10^-3  | 3.2 · 10^-5  |
| 10^-9  | 1.6 · 10^-3  | 9.3 · 10^-5  |
| 10^-10 | 3.9 · 10^-3  | 2.1 · 10^-4  |
Table 2.2: Tikhonov regularization for δ > 0

| α      | δ = 0.0001 | δ = 0.001 | δ = 0.01 | δ = 0.1 |
|--------|------------|-----------|----------|---------|
| 10^-1  | 0.2317     | 0.2317    | 0.2310   | 0.2255  |
| 10^-2  | 0.0681     | 0.0677    | 0.0692   | 0.1194  |
| 10^-3  | 0.0238     | 0.0240    | 0.0268   | 0.1651  |
| 10^-4  | 0.0119     | 0.0127    | 0.1172   | 1.0218  |
| 10^-5  | 0.0031     | 0.0168    | 0.2553   | 3.0065  |
| 10^-6  | 0.0065     | 0.0909    | 0.6513   | 5.9854  |
| 10^-7  | 0.0470     | 0.2129    | 2.4573   | 30.595  |
| 10^-8  | 0.1018     | 0.8119    | 5.9775   |         |
| 10^-9  | 0.1730     | 1.8985    | 16.587   |         |
| 10^-10 | 1.0723     | 14.642    |          |         |
In the first table, we have chosen δ = 0; that is, only the discretization error of Simpson's rule is responsible for the increase of the error for small α. The difference between the discretization parameters n = 8 and n = 16 is noticeable for α ≤ 10^-8. We refer to [267] for further examples (Table 2.2).

In the second table, we always took n = 16 and observe that the total error first decreases with decreasing α up to an optimal value and then increases again. This is predicted by the theory, in particular by estimates (2.21a) and (2.21b).

In the following table, we list results corresponding to the iteration steps of Landweber's method with parameter a = 0.5 and again n = 16 (Table 2.3).

Table 2.3: Landweber iteration

| m   | δ = 0.0001 | δ = 0.001 | δ = 0.01 | δ = 0.1 |
|-----|------------|-----------|----------|---------|
| 1   | 0.8097     | 0.8097    | 0.8088   | 0.8135  |
| 2   | 0.6274     | 0.6275    | 0.6278   | 0.6327  |
| 3   | 0.5331     | 0.5331    | 0.5333   | 0.5331  |
| 4   | 0.4312     | 0.4311    | 0.4322   | 0.4287  |
| 5   | 0.3898     | 0.3898    | 0.3912   | 0.3798  |
| 6   | 0.3354     | 0.3353    | 0.3360   | 0.3339  |
| 7   | 0.3193     | 0.3192    | 0.3202   | 0.3248  |
| 8   | 0.2905     | 0.2904    | 0.2912   | 0.2902  |
| 9   | 0.2838     | 0.2838    | 0.2845   | 0.2817  |
| 10  | 0.2675     | 0.2675    | 0.2677   | 0.2681  |
| 100 | 0.0473     | 0.0474    | 0.0476   | 0.0534  |
| 200 | 0.0248     | 0.0248    | 0.0253   | 0.0409  |
| 300 | 0.0242     | 0.0242    | 0.0249   | 0.0347  |
| 400 | 0.0241     | 0.0241    | 0.0246   | 0.0385  |
| 500 | 0.0239     | 0.0240    | 0.0243   | 0.0424  |
We observe that the error decreases quickly in the first few steps and then slows down. To compare Tikhonov's method and Landweber's iteration, we note that the error corresponding to iteration number m has to be compared with the error corresponding to α = 1/(2m) (see the estimates in the proofs of Theorems 2.15 and 2.12). Taking this into account, we observe that both methods are comparable where precision is concerned. We note, however, that the computation time of Landweber's method is considerably higher than that of Tikhonov's method, in particular if the error δ is small. On the other hand, Landweber's method is stable with respect to perturbations of the right-hand side and gives very good results even for large errors δ. We refer also to Section 3.5, where these regularization methods are compared with those to be discussed in the subsequent sections for Symm's integral equation.
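The computation behind Table 2.1 can be reproduced in a few lines. This is a sketch for the unperturbed case δ = 0 (so only the Simpson discretization error enters); it follows the discretization described above, including the nonsymmetric matrix A and the equation αx + A²x = Ay:

```python
import numpy as np

# Simpson discretization of (2.29) on t_i = i/n, n even.
n = 16
t = np.arange(n + 1) / n
w = np.full(n + 1, 2.0 / (3 * n))       # even interior nodes
w[1::2] = 4.0 / (3 * n)                 # odd nodes
w[0] = w[n] = 1.0 / (3 * n)             # endpoints
T = np.outer(t, t)
A = (1 + T) * np.exp(T) * w             # A[i, j] = w_j (1 + t_i t_j) e^{t_i t_j}
y = np.exp(t)                           # unperturbed right-hand side e^t

errors = {}
for alpha in [1e-2, 1e-4, 1e-6]:
    # discretized Tikhonov equation: alpha x + A^2 x = A y
    x_alpha = np.linalg.solve(alpha * np.eye(n + 1) + A @ A, A @ y)
    errors[alpha] = np.sqrt(np.mean((1.0 - x_alpha) ** 2))  # ||1 - x^alpha||_2
```

In this range of α the errors decrease with α, in line with the n = 16 column of Table 2.1.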
2.5 The Discrepancy Principle of Morozov
The following three sections are devoted to a posteriori choices of the regularization parameter α. In this section, we study a discrepancy principle based on the Tikhonov regularization method. Throughout this section, we assume again that $K : X \to Y$ is a compact and injective operator between Hilbert spaces $X$ and $Y$ with dense range $\mathcal{R}(K) \subset Y$. Again, we study the equation $Kx = y^\delta$ for given perturbations $y^\delta \in Y$ of the exact right-hand side $y^* = Kx^*$. The Tikhonov regularization of this equation was investigated in Section 2.2. It corresponds to the regularization operators
$$R_\alpha = (\alpha I + K^*K)^{-1}K^* \quad\text{for } \alpha > 0$$
that approximate the unbounded inverse of $K$ on $\mathcal{R}(K)$. We have seen that $x^\alpha = R_\alpha y$ exists and is the unique minimum of the Tikhonov functional
$$J_\alpha(x) := \|Kx - y\|_Y^2 + \alpha\|x\|_X^2, \quad x \in X, \quad \alpha > 0. \quad (2.30)$$
More facts about the dependence on $\alpha$ and $y$ are proven in the following theorem.

Theorem 2.16 Let $y \in Y$, $\alpha > 0$, and $x^\alpha$ be the unique solution of the equation
$$\alpha\, x^\alpha + K^*Kx^\alpha = K^*y. \quad (2.31)$$
Then $x^\alpha$ depends continuously on $y$ and $\alpha$. The mapping $\alpha \mapsto \|x^\alpha\|_X$ is monotonically nonincreasing and
$$\lim_{\alpha\to\infty} x^\alpha = 0.$$
The mapping $\alpha \mapsto \|Kx^\alpha - y\|_Y$ is monotonically nondecreasing and
$$\lim_{\alpha\to 0} Kx^\alpha = y.$$
If $y \ne 0$, then strict monotonicity holds in both cases.
Proof: We proceed in five steps.

(i) Using the definition of $J_\alpha$ and the optimality of $x^\alpha$, we conclude that
$$\alpha\|x^\alpha\|_X^2 \le J_\alpha(x^\alpha) \le J_\alpha(0) = \|y\|_Y^2,$$
that is, $\|x^\alpha\|_X \le \|y\|_Y/\sqrt{\alpha}$. This proves that $x^\alpha \to 0$ as $\alpha \to \infty$.

(ii) We choose $\alpha > 0$ and $\beta > 0$ and subtract the equations for $x^\alpha$ and $x^\beta$:
$$\alpha(x^\alpha - x^\beta) + K^*K(x^\alpha - x^\beta) + (\alpha - \beta)\,x^\beta = 0. \quad (2.32)$$
Multiplication by $x^\alpha - x^\beta$ yields
$$\alpha\|x^\alpha - x^\beta\|_X^2 + \|K(x^\alpha - x^\beta)\|_Y^2 = (\beta - \alpha)\,(x^\beta, x^\alpha - x^\beta)_X. \quad (2.33)$$
From this equation, we first conclude that
$$\alpha\|x^\alpha - x^\beta\|_X^2 \le |\beta - \alpha|\,\big|(x^\beta, x^\alpha - x^\beta)_X\big| \le |\beta - \alpha|\,\|x^\beta\|_X\,\|x^\alpha - x^\beta\|_X,$$
that is,
$$\alpha\|x^\alpha - x^\beta\|_X \le |\beta - \alpha|\,\|x^\beta\|_X \le |\beta - \alpha|\,\frac{\|y\|_Y}{\sqrt{\beta}}.$$
This proves continuity of the mapping $\alpha \mapsto x^\alpha$.

(iii) Now let $\beta > \alpha > 0$. From (2.33) we conclude that $(x^\beta, x^\alpha - x^\beta)_X$ is (real and) positive (if it were zero, then $x^\alpha = x^\beta$ and thus $x^\alpha = x^\beta = 0$ from (2.32), which would contradict the assumption $y \ne 0$). Therefore, $\|x^\beta\|_X^2 < (x^\beta, x^\alpha)_X \le \|x^\beta\|_X\|x^\alpha\|_X$, that is, $\|x^\beta\|_X < \|x^\alpha\|_X$, which proves strict monotonicity of $\alpha \mapsto \|x^\alpha\|_X$.

(iv) We multiply the normal equation for $x^\beta$ by $x^\alpha - x^\beta$. This yields
$$\beta\,(x^\beta, x^\alpha - x^\beta)_X + \big(Kx^\beta - y, K(x^\alpha - x^\beta)\big)_Y = 0.$$
Now let $\alpha > \beta$. From (2.33), we see that $(x^\beta, x^\alpha - x^\beta)_X < 0$; that is,
$$0 < \big(Kx^\beta - y, K(x^\alpha - x^\beta)\big)_Y = (Kx^\beta - y, Kx^\alpha - y)_Y - \|Kx^\beta - y\|_Y^2.$$
The Cauchy–Schwarz inequality yields $\|Kx^\beta - y\|_Y < \|Kx^\alpha - y\|_Y$.

(v) Finally, let $\varepsilon > 0$. Because the range of $K$ is dense in $Y$, there exists $x \in X$ with $\|Kx - y\|_Y^2 \le \varepsilon^2/2$. Choose $\alpha_0$ such that $\alpha_0\|x\|_X^2 \le \varepsilon^2/2$. Then $\|Kx^\alpha - y\|_Y^2 \le J_\alpha(x^\alpha) \le J_\alpha(x) \le \varepsilon^2$; that is, $\|Kx^\alpha - y\|_Y \le \varepsilon$ for all $\alpha \le \alpha_0$.
Now we consider the determination of $\alpha(\delta)$ from the discrepancy principle; see [194–196]. We compute $\alpha = \alpha(\delta) > 0$ such that the corresponding Tikhonov solution $x^{\alpha,\delta}$, that is, the solution of the equation
$$\alpha\, x^{\alpha,\delta} + K^*Kx^{\alpha,\delta} = K^*y^\delta,$$
that is, the minimum of $J_{\alpha,\delta}(x) := \|Kx - y^\delta\|_Y^2 + \alpha\|x\|_X^2$, satisfies the equation
$$\|Kx^{\alpha,\delta} - y^\delta\|_Y = \delta. \quad (2.34)$$
Note that this choice of $\alpha$ by the discrepancy principle guarantees that, on the one hand, the error of the defect is $\delta$ and, on the other hand, $\alpha$ is not too small. Equation (2.34) is uniquely solvable provided $\delta < \|y^\delta\|_Y$ because, by the previous theorem,
$$\lim_{\alpha\to\infty}\|Kx^{\alpha,\delta} - y^\delta\|_Y = \|y^\delta\|_Y > \delta \quad\text{and}\quad \lim_{\alpha\to 0}\|Kx^{\alpha,\delta} - y^\delta\|_Y = 0 < \delta.$$
Furthermore, $\alpha \mapsto \|Kx^{\alpha,\delta} - y^\delta\|_Y$ is continuous and strictly increasing.

Theorem 2.17 Let $K : X \to Y$ be linear, compact, and one-to-one with dense range in $Y$. Let $Kx^* = y^*$ with $x^* \in X$, $y^* \in Y$, and $y^\delta \in Y$ such that $\|y^\delta - y^*\|_Y \le \delta < \|y^\delta\|_Y$. Let the Tikhonov solution $x^{\alpha(\delta),\delta}$ satisfy $\|Kx^{\alpha(\delta),\delta} - y^\delta\|_Y = \delta$ for all $\delta \in (0, \delta_0)$. Then

(a) $x^{\alpha(\delta),\delta} \to x^*$ for $\delta \to 0$; that is, the discrepancy principle is admissible.

(b) Let $x^* = K^*z \in K^*(Y)$ with $\|z\|_Y \le E$. Then
$$\|x^{\alpha(\delta),\delta} - x^*\|_X \le 2\sqrt{\delta E}.$$
Therefore, the discrepancy principle is an optimal regularization strategy under the information $\|(K^*)^{-1}x^*\|_Y \le E$.
Proof: $x^\delta := x^{\alpha(\delta),\delta}$ minimizes the Tikhonov functional
$$J^{(\delta)}(x) := J_{\alpha(\delta),\delta}(x) = \alpha(\delta)\|x\|_X^2 + \|Kx - y^\delta\|_Y^2.$$
Therefore, we conclude that
$$\alpha(\delta)\|x^\delta\|_X^2 + \delta^2 = J^{(\delta)}(x^\delta) \le J^{(\delta)}(x^*) = \alpha(\delta)\|x^*\|_X^2 + \|y^* - y^\delta\|_Y^2 \le \alpha(\delta)\|x^*\|_X^2 + \delta^2,$$
and hence $\|x^\delta\|_X \le \|x^*\|_X$ for all $\delta > 0$. This yields the following important estimate:
$$\|x^\delta - x^*\|_X^2 = \|x^\delta\|_X^2 - 2\,\mathrm{Re}\,(x^\delta, x^*)_X + \|x^*\|_X^2 \le 2\big(\|x^*\|_X^2 - \mathrm{Re}\,(x^\delta, x^*)_X\big) = 2\,\mathrm{Re}\,(x^* - x^\delta, x^*)_X.$$
First, we prove part (b): Let $x^* = K^*z$, $z \in Y$. Then
$$\|x^\delta - x^*\|_X^2 \le 2\,\mathrm{Re}\,(x^* - x^\delta, K^*z)_X = 2\,\mathrm{Re}\,(y^* - Kx^\delta, z)_Y \le 2\,\mathrm{Re}\,(y^* - y^\delta, z)_Y + 2\,\mathrm{Re}\,(y^\delta - Kx^\delta, z)_Y \le 2\delta\|z\|_Y + 2\delta\|z\|_Y = 4\delta\|z\|_Y \le 4\delta E.$$

(a) Now let $x^* \in X$ and $\varepsilon > 0$ be arbitrary. The range $\mathcal{R}(K^*)$ is dense in $X$ because $K$ is one-to-one. Therefore, there exists $\hat{x} = K^*z \in \mathcal{R}(K^*)$ such that $\|x^* - \hat{x}\|_X \le \varepsilon/3$. Then we conclude by similar arguments as above that
$$\|x^\delta - x^*\|_X^2 \le 2\,\mathrm{Re}\,(x^* - x^\delta, x^* - \hat{x})_X + 2\,\mathrm{Re}\,(x^* - x^\delta, K^*z)_X \le 2\|x^* - x^\delta\|_X\,\frac{\varepsilon}{3} + 2\,\mathrm{Re}\,(y^* - Kx^\delta, z)_Y \le 2\|x^* - x^\delta\|_X\,\frac{\varepsilon}{3} + 4\delta\|z\|_Y.$$
This can be rewritten as $\big(\|x^* - x^\delta\|_X - \varepsilon/3\big)^2 \le \varepsilon^2/9 + 4\delta\|z\|_Y$. Now we choose $\delta > 0$ such that the right-hand side is less than $4\varepsilon^2/9$. Taking the square root, we conclude that $\|x^* - x^\delta\|_X \le \varepsilon$ for this $\delta$.

The condition $\|y^\delta\|_Y > \delta$ certainly makes sense because otherwise the right-hand side would be less than the error level $\delta$, and $x^\delta = 0$ would be an acceptable approximation to $x^*$. The determination of $\alpha(\delta)$ is thus equivalent to the problem of finding the zero of the monotonic function $\varphi(\alpha) := \|Kx^{\alpha,\delta} - y^\delta\|_Y^2 - \delta^2$ (for fixed $\delta > 0$). It is not necessary to satisfy the equation $\|Kx^{\alpha,\delta} - y^\delta\|_Y = \delta$ exactly. An inclusion of the form $c_1\delta \le \|Kx^{\alpha,\delta} - y^\delta\|_Y \le c_2\delta$ for fixed $1 \le c_1 < c_2$ is sufficient to prove the assertions of the previous theorem. The computation of $\alpha(\delta)$ can be carried out with Newton's method. The derivative of the mapping $\alpha \mapsto x^{\alpha,\delta}$ is given by the solution of the equation $(\alpha I + K^*K)\frac{d}{d\alpha}x^{\alpha,\delta} = -x^{\alpha,\delta}$, as is easily seen by differentiating (2.31) with respect to $\alpha$.
In the following theorem, we prove that the order of convergence $\mathcal{O}(\sqrt{\delta})$ is best possible for the discrepancy principle. Therefore, by the results of Example 1.20, it cannot be optimal under the information $\|(K^*K)^{-\sigma/2}x\|_X \le E$ for $\sigma > 1$.

Theorem 2.18 Let $K$ be one-to-one and compact, and assume that there exists $\sigma > 0$ such that for every $x \in \mathcal{R}\big((K^*K)^{\sigma/2}\big)$ with $y = Kx \ne 0$, and all sequences $\delta_n \to 0$ and $y^{\delta_n} \in Y$ with $\|y - y^{\delta_n}\|_Y \le \delta_n$ and $\|y^{\delta_n}\|_Y > \delta_n$ for all $n$, the Tikhonov solutions $x^n = x^{\alpha(\delta_n),\delta_n}$ (where $\alpha(\delta_n)$ is chosen by the discrepancy principle) converge to $x$ faster than $\sqrt{\delta_n}$ tends to zero, that is,
$$\frac{1}{\sqrt{\delta_n}}\|x^n - x\|_X \longrightarrow 0 \quad\text{as } n \to \infty. \quad (2.35)$$
Then the range $\mathcal{R}(K)$ has to be finite-dimensional.
Proof: We show first that the choice of $\alpha(\delta)$ by the discrepancy principle implies the boundedness of $\alpha(\delta)/\delta$. Abbreviating $x^\delta := x^{\alpha(\delta),\delta}$, we write for $\delta \le \frac{1}{3}\|y\|_Y$
$$\frac{1}{3}\|y\|_Y = \Big(1 - \frac{2}{3}\Big)\|y\|_Y \le \|y\|_Y - 2\delta \le \|y - y^\delta\|_Y + \|y^\delta\|_Y - 2\delta \le \|y^\delta\|_Y - \delta = \|y^\delta\|_Y - \|y^\delta - Kx^\delta\|_Y \le \|Kx^\delta\|_Y = \frac{1}{\alpha(\delta)}\|KK^*(y^\delta - Kx^\delta)\|_Y \le \frac{\delta}{\alpha(\delta)}\|K\|_{\mathcal{L}(X,Y)}^2,$$
where we applied $K$ to (2.31). Thus we have shown that there exists $c > 0$ with $\alpha(\delta) \le c\delta$ for all sufficiently small $\delta$.

Now we assume that $\dim\mathcal{R}(K) = \infty$ and construct a contradiction. Let $\{\mu_j, x_j, y_j : j \in \mathbb{N}\}$ be a singular system of $K$ and define
$$x := \frac{1}{\mu_1}x_1 \quad\text{and}\quad y^{\delta_n} := y_1 + \delta_n y_n \quad\text{with}\quad \delta_n := \mu_n^2.$$
Then $y = Kx = y_1$ and $\delta_n \to 0$ as $n \to \infty$ and $x \in \mathcal{R}\big((K^*K)^{\sigma/2}\big)$ for every $\sigma > 0$ and
$$\|y^{\delta_n} - y\|_Y = \delta_n < \sqrt{1 + \delta_n^2} = \|y^{\delta_n}\|_Y.$$
Therefore, the assumptions for the discrepancy principle are satisfied and thus (2.35) holds. The solution of $\alpha(\delta_n)x^n + K^*Kx^n = K^*y^{\delta_n}$ is given by
$$x^n = \frac{\mu_1}{\alpha(\delta_n) + \mu_1^2}\,x_1 + \frac{\mu_n\,\delta_n}{\alpha(\delta_n) + \mu_n^2}\,x_n.$$
We compute
$$x^n - x = -\frac{\alpha(\delta_n)}{\mu_1\big(\alpha(\delta_n) + \mu_1^2\big)}\,x_1 + \frac{\mu_n\,\delta_n}{\alpha(\delta_n) + \mu_n^2}\,x_n$$
and hence
$$\frac{1}{\sqrt{\delta_n}}\|x^n - x\|_X \ge \frac{\mu_n\sqrt{\delta_n}}{\alpha(\delta_n) + \mu_n^2} = \frac{\delta_n}{\alpha(\delta_n) + \delta_n} = \frac{1}{1 + \alpha(\delta_n)/\delta_n} \ge \frac{1}{1 + c}.$$
This contradicts (2.35).

We remark that the estimate $\alpha(\delta) \le \delta\|K\|_{\mathcal{L}(X,Y)}^2/\big(\|y^\delta\|_Y - \delta\big)$ derived in the previous proof suggests using $\delta\|K\|_{\mathcal{L}(X,Y)}^2/\big(\|y^\delta\|_Y - \delta\big)$ as a starting value for Newton's method to determine $\alpha(\delta)$! There has been an enormous effort to modify the original discrepancy principle while still retaining optimal orders of convergence. We refer to [86, 93, 102, 209, 238].
2.6 Landweber's Iteration Method with Stopping Rule
It is very natural to use the following stopping criterion, which can be implemented in every iterative algorithm for the solution of $Kx = y$. Let $r > 1$ be a fixed number. Stop the algorithm at the first occurrence of $m \in \mathbb{N}_0$ with $\|Kx^{m,\delta} - y^\delta\|_Y \le r\delta$. The following theorem shows that this choice of $m$ is possible for Landweber's method and leads to an admissible and even optimal regularization strategy.

Theorem 2.19 Let $K : X \to Y$ be linear, compact, and one-to-one with dense range. Let $Kx^* = y^*$ and $y^\delta \in Y$ be perturbations with $\|y^* - y^\delta\|_Y \le \delta$ and $\|y^\delta\|_Y \ge r\delta$ for all $\delta \in (0, \delta_0)$, where $r > 1$ is some fixed parameter (independent of $\delta$). Let the sequence $x^{m,\delta}$, $m = 0, 1, 2, \dots$, be determined by Landweber's method; that is, $x^{0,\delta} = 0$ and
$$x^{m+1,\delta} = x^{m,\delta} + aK^*\big(y^\delta - Kx^{m,\delta}\big), \quad m = 0, 1, 2, \dots, \quad (2.36)$$
for some $0 < a < 1/\|K\|_{\mathcal{L}(X,Y)}^2$. Then the following assertions hold:

(1) $\lim_{m\to\infty}\|Kx^{m,\delta} - y^\delta\|_Y = 0$ for every $\delta > 0$; that is, the following stopping rule is well-defined: Let $m = m(\delta) \in \mathbb{N}_0$ be the smallest integer with $\|Kx^{m,\delta} - y^\delta\|_Y \le r\delta$.

(2) $\delta^2\, m(\delta) \to 0$ for $\delta \to 0$; that is, this choice of $m(\delta)$ is admissible. Therefore, by the assertions of Theorem 2.15, the sequence $x^{m(\delta),\delta}$ converges to $x^*$ as $\delta$ tends to zero.

(3) If $x^* = K^*z \in \mathcal{R}(K^*)$ or $x^* = (K^*K)^{\sigma/2}z \in \mathcal{R}\big((K^*K)^{\sigma/2}\big)$ for some $\sigma > 0$ and some $z$, then we have the following orders of convergence:
$$\|x^{m(\delta),\delta} - x^*\|_X \le c\sqrt{E\delta} \quad\text{or} \quad (2.37a)$$
$$\|x^{m(\delta),\delta} - x^*\|_X \le c\,E^{1/(\sigma+1)}\delta^{\sigma/(\sigma+1)}, \quad (2.37b)$$
respectively, for some $c > 0$ where again $E = \|z\|$. This means that this choice of $m(\delta)$ is optimal for all $\sigma > 0$.

Proof: In (2.25), we showed the representation
$$R_m y = \sum_{j=1}^\infty \frac{1 - (1 - a\mu_j^2)^m}{\mu_j}\,(y, y_j)_Y\, x_j$$
for every $y \in Y$ and thus
$$\|KR_m y - y\|_Y^2 = \sum_{j=1}^\infty (1 - a\mu_j^2)^{2m}\,\big|(y, y_j)_Y\big|^2.$$
From $1 - a\mu_j^2 < 1$, we conclude that $\|KR_m - I\|_{\mathcal{L}(Y)} \le 1$. Application to $y^\delta$ instead of $y$ yields
$$\|Kx^{m,\delta} - y^\delta\|_Y^2 = \sum_{j=1}^\infty (1 - a\mu_j^2)^{2m}\,\big|(y^\delta, y_j)_Y\big|^2.$$

(1) Let $\varepsilon > 0$ be given. Choose $j_1 \in \mathbb{N}$ with
$$\sum_{j=j_1+1}^\infty \big|(y^\delta, y_j)_Y\big|^2 < \frac{\varepsilon^2}{2}.$$
Because $(1 - a\mu_j^2)^{2m} \to 0$ as $m \to \infty$ uniformly for $j = 1, \dots, j_1$, we can find $m_0 \in \mathbb{N}$ with
$$\sum_{j=1}^{j_1}(1 - a\mu_j^2)^{2m}\big|(y^\delta, y_j)_Y\big|^2 \le \max_{j=1,\dots,j_1}(1 - a\mu_j^2)^{2m}\sum_{j=1}^{j_1}\big|(y^\delta, y_j)_Y\big|^2 \le \frac{\varepsilon^2}{2} \quad\text{for } m \ge m_0.$$
This implies that $\|Kx^{m,\delta} - y^\delta\|_Y^2 \le \varepsilon^2$ for $m \ge m_0$; that is, the method is admissible.

It is sufficient to prove assertion (2) only for the case $m(\delta) \to \infty$. We set $m := m(\delta)$ for abbreviation and derive an upper bound of $m$. By the choice of $m(\delta)$, we have with $y^* = Kx^*$
$$\|KR_{m-1}y^* - y^*\|_Y \ge \|KR_{m-1}y^\delta - y^\delta\|_Y - \|(KR_{m-1} - I)(y^* - y^\delta)\|_Y \ge r\delta - \|KR_{m-1} - I\|_{\mathcal{L}(Y)}\,\delta \ge (r-1)\,\delta, \quad (2.38)$$
and hence
$$m\,(r-1)^2\delta^2 \le m\sum_{j=1}^\infty (1 - a\mu_j^2)^{2m-2}\big|(y^*, y_j)_Y\big|^2 = \sum_{j=1}^\infty m\,(1 - a\mu_j^2)^{2m-2}\mu_j^2\,\big|(x^*, x_j)_X\big|^2. \quad (2.39)$$
We show that the series converges to zero as $\delta \to 0$. (The dependence on $\delta$ is hidden in $m$.) First we note that $m\mu^2(1 - a\mu^2)^{2m-2} \le 1/a$ for all $m \ge 1$ and all $\mu \ge 0$ (see Problem 2.6). Now we again split the series into a finite sum and a remaining series, estimate in the "long tail" the expression $m(1 - a\mu_j^2)^{2m-2}\mu_j^2$ by $1/a$, and note that $m(1 - a\mu_j^2)^{2m-2}$ tends to zero as $m \to \infty$ uniformly in $j \in \{1, \dots, j_1\}$. This proves convergence and thus part (2).

For part (3) we remind the reader of the fundamental estimate (2.4a), which we need in the following form:
$$\|x^{m,\delta} - x^*\|_X \le \delta\sqrt{am} + \|R_m Kx^* - x^*\|_X. \quad (2.40)$$
We restrict ourselves to the case that $x^* = (K^*K)^{\sigma/2}z$ for some $\sigma > 0$. We set again $E = \|z\|_X$ and estimate $m = m(\delta)$ from above by using (2.38) and (2.28) for $y^*$ instead of $y^\delta$ (that is, $\delta = 0$) and $m - 1$ instead of $m$:
$$(r-1)\,\delta \le \|KR_{m-1}y^* - y^*\|_Y \le \Big(\frac{\sigma+1}{2a(m-1)}\Big)^{(\sigma+1)/2}E.$$
Solving for $m - 1$ yields an estimate of the form
$$m(\delta) \le 2\big(m(\delta) - 1\big) \le c\,\Big(\frac{E}{\delta}\Big)^{2/(\sigma+1)}$$
for some $c$ which depends solely on $\sigma$, $r$, and $a$. Substituting this into the first term of (2.40) for $m = m(\delta)$ yields $\delta\sqrt{a\,m(\delta)} \le \sqrt{ac}\,E^{1/(\sigma+1)}\delta^{\sigma/(\sigma+1)}$.

Next, we estimate the second term of (2.40). In the following estimates, we use again that $x^* = (K^*K)^{\sigma/2}z$ and also Hölder's inequality with $p = \frac{\sigma+1}{\sigma}$ and $q = \sigma + 1$:
$$\|R_m Kx^* - x^*\|_X^2 = \sum_{j=1}^\infty (1 - a\mu_j^2)^{2m}\big|(x^*, x_j)_X\big|^2 = \sum_{j=1}^\infty (1 - a\mu_j^2)^{2m}\mu_j^{2\sigma}\big|(z, x_j)_X\big|^2$$
$$= \sum_{j=1}^\infty \Big[(1 - a\mu_j^2)^{2m}\mu_j^{2\sigma+2}\big|(z, x_j)_X\big|^2\Big]^{1/p}\Big[(1 - a\mu_j^2)^{2m}\big|(z, x_j)_X\big|^2\Big]^{1/q}$$
$$\le \Bigg[\sum_{j=1}^\infty (1 - a\mu_j^2)^{2m}\mu_j^{2\sigma+2}\big|(z, x_j)_X\big|^2\Bigg]^{1/p}\Bigg[\sum_{j=1}^\infty (1 - a\mu_j^2)^{2m}\big|(z, x_j)_X\big|^2\Bigg]^{1/q}$$
$$\le \|KR_m y^* - y^*\|_Y^{2\sigma/(\sigma+1)}\,E^{2/(\sigma+1)} \le \Big(\|KR_m y^\delta - y^\delta\|_Y + \|(KR_m - I)(y^\delta - y^*)\|_Y\Big)^{2\sigma/(\sigma+1)}E^{2/(\sigma+1)}.$$
Now we use the stopping rule for $m = m(\delta)$ in the first term and the estimate $\|KR_m - I\|_{\mathcal{L}(Y)} \le 1$ in the second one. This yields
$$\|R_{m(\delta)}Kx^* - x^*\|_X^2 \le (r+1)^{2\sigma/(\sigma+1)}\,E^{2/(\sigma+1)}\,\delta^{2\sigma/(\sigma+1)}.$$
Substituting this into (2.40) for $m = m(\delta)$ yields the assertion.
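The stopping rule of Theorem 2.19 is straightforward to implement. The following sketch iterates (2.36) until the defect first drops below $r\delta$; the singular values, exact solution, and noise level are assumed test data, constructed so that the run is short:

```python
import numpy as np

# Assumed test operator with prescribed singular values.
rng = np.random.default_rng(2)
n = 10
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
mu = np.linspace(1.0, 0.1, n)                 # assumed singular values
K = U @ np.diag(mu) @ V.T
x_star = V[:, 0] + 0.5 * V[:, -1]             # exact solution
y_star = K @ x_star
delta = 1e-3
e = rng.standard_normal(n)
y_delta = y_star + delta * e / np.linalg.norm(e)

a = 0.9 / mu[0] ** 2                          # a < 1 / ||K||^2
r = 2.0                                       # stopping parameter r > 1
x = np.zeros(n)                               # x^{0,delta} = 0
m = 0
while np.linalg.norm(K @ x - y_delta) > r * delta and m < 10000:
    x = x + a * K.T @ (y_delta - K @ x)       # Landweber step (2.36)
    m += 1
```

At termination the defect satisfies $\|Kx^{m,\delta} - y^\delta\|_Y \le r\delta$, and the reconstruction error stays of the order predicted by (2.37a)/(2.37b) rather than blowing up as it would if the iteration were continued indefinitely with noisy data.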
It is also possible to formulate a similar stopping criterion for Morozov's discrepancy principle: choose an arbitrary monotonically decreasing sequence $(\alpha_m)$ in $\mathbb{R}$ with $\lim_{m\to\infty}\alpha_m = 0$ and determine $m = m(\delta)$ as the smallest integer $m$ with $\|Kx^{\alpha_m,\delta} - y^\delta\|_Y \le r\delta$. For details, we refer the reader to [89] or [182].

One can construct more general classes of methods through the spectral representation of the solution $x^*$. Comparing the regularizer $x^{m,\delta}$ of Landweber's method with the true solution $x^*$, we observe that the function $\phi(\mu) = 1/\mu$, $\mu > 0$, is approximated by the polynomial $P_m(\mu) = \big[1 - (1 - a\mu^2)^m\big]/\mu$. It is certainly possible to choose better polynomial approximations of the function $\mu \mapsto 1/\mu$. Orthogonal polynomials are particularly useful. This leads to the ν-methods; see [21, 118], or [120]. A common feature of these methods that is very crucial in the analysis is the fact that all of the polynomials $P_m$ are independent of $y$ and $y^\delta$. For the important conjugate gradient algorithm discussed in the next section, this is not the case, and that makes an error analysis much more difficult to obtain.
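How well the Landweber polynomial $P_m(\mu) = \big[1 - (1 - a\mu^2)^m\big]/\mu$ approximates $1/\mu$ away from zero is easy to check numerically; note that $|1/\mu - P_m(\mu)|\,\mu = (1 - a\mu^2)^m$, so the relative error decays geometrically on any interval bounded away from zero. The interval $[0.2, 1]$ and $a = 1$ below are arbitrary illustrative choices:

```python
import numpy as np

a = 1.0
mu = np.linspace(0.2, 1.0, 400)
errs = []
for m in [5, 50, 500]:
    P_m = (1 - (1 - a * mu**2) ** m) / mu      # Landweber polynomial
    errs.append(np.max(np.abs(1 - mu * P_m)))  # max relative error = max (1 - a mu^2)^m
```

Near $\mu = 0$ the approximation is necessarily poor for fixed $m$, which is exactly the regularizing effect: the unbounded part of $1/\mu$ is never reproduced.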
2.7 The Conjugate Gradient Method
In this section, we study the regularizing properties of the conjugate gradient method. Because the proofs of the theorems are rather technical, we only state the results and defer the proofs to Appendix B.

First, we recall the conjugate gradient method for least squares problems for overdetermined systems of linear equations of the form $Kx = y$. Here, $K \in \mathbb{R}^{m\times n}$ and $y \in \mathbb{R}^m$ with $m \ge n$ are given. Because it is hopeless to satisfy all equations simultaneously, one minimizes the defect $f(x) := \|Kx - y\|^2$, $x \in \mathbb{R}^n$, where $\|\cdot\|$ denotes the Euclidean norm in $\mathbb{R}^m$. Standard algorithms for solving least squares problems are the QR-algorithm or the conjugate gradient method; see [71, 106, 132]. Because we assume that the latter is known for systems of equations, we formulate it now (in Figure 2.2) for the operator equation $Kx = y$, where $K : X \to Y$ is a bounded, linear, and injective operator between Hilbert spaces $X$ and $Y$ with adjoint $K^* : Y \to X$.

We define the function
$$f(x) := \|Kx - y\|_Y^2 = (Kx - y, Kx - y)_Y, \quad x \in X.$$
We abbreviate $\nabla f(x) := 2K^*(Kx - y) \in X$ and note that $\nabla f(x)$ is indeed the Riesz representation (see Theorem A.23) of the Fréchet derivative $f'(x)$ of $f$ at $x$ (see Lemma 2.14). We call two elements $p, q \in X$ $K$-conjugate if $(Kp, Kq)_Y = 0$. If $K$ is one-to-one, this bilinear form has the properties of an inner product on $X$.

Theorem 2.20 (Fletcher–Reeves) Let $K : X \to Y$ be a compact, linear, and injective operator between Hilbert spaces $X$ and $Y$. Then the conjugate gradient method is well-defined and either stops or produces sequences $(x^m)$, $(p^m)$ in $X$ with the properties
$$\big(\nabla f(x^m), \nabla f(x^j)\big)_X = 0 \quad\text{for all } j \ne m \quad (2.41a)$$
and
$$(Kp^m, Kp^j)_Y = 0 \quad\text{for all } j \ne m; \quad (2.41b)$$
that is, the gradients are orthogonal and the directions $p^m$ are $K$-conjugate. Furthermore,
$$\big(\nabla f(x^j), K^*Kp^m\big)_X = 0 \quad\text{for all } j < m. \quad (2.41c)$$
Figure 2.2: The conjugate gradient method (the flowchart transcribed as pseudocode):

    x^0 := 0; m := 0.
    If K*y = 0, STOP. Otherwise set p^0 := -K*y.
    Repeat:
        t_m := (Kx^m - y, Kp^m)_Y / ||Kp^m||_Y^2
        x^{m+1} := x^m - t_m p^m
        If K*(Kx^{m+1} - y) = 0, STOP.
        gamma_m := ||K*(Kx^{m+1} - y)||_X^2 / ||K*(Kx^m - y)||_X^2
        p^{m+1} := K*(Kx^{m+1} - y) + gamma_m p^m
        m := m + 1

The following theorem gives an interesting and different interpretation of the elements $x^m$.
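For a matrix $K$, the scheme of Figure 2.2 transcribes directly into code. The following sketch (the random test data are assumptions) follows the figure step by step; for full-column-rank matrices it also exhibits the finite termination that the theory explains via Krylov spaces:

```python
import numpy as np

def cg_normal_equations(K, y, tol=1e-10, max_iter=100):
    """Transcription of Figure 2.2 for a matrix K."""
    x = np.zeros(K.shape[1])                    # x^0 := 0
    g = K.T @ (K @ x - y)                       # K*(Kx^0 - y) = -K*y
    if np.linalg.norm(g) <= tol:                # "K*y = 0 ?" -> STOP
        return x
    p = -K.T @ y                                # p^0 := -K*y
    for _ in range(max_iter):
        t = ((K @ x - y) @ (K @ p)) / np.linalg.norm(K @ p) ** 2
        x = x - t * p                           # x^{m+1} = x^m - t_m p^m
        g_new = K.T @ (K @ x - y)
        if np.linalg.norm(g_new) <= tol:        # "K*(Kx^{m+1} - y) = 0 ?" -> STOP
            break
        gamma = np.linalg.norm(g_new) ** 2 / np.linalg.norm(g) ** 2
        p = g_new + gamma * p                   # p^{m+1} = K*(Kx^{m+1}-y) + gamma_m p^m
        g = g_new
    return x

# Assumed test data: overdetermined consistent system.
rng = np.random.default_rng(3)
K = rng.standard_normal((8, 5))
x_star = rng.standard_normal(5)
x_cg = cg_normal_equations(K, K @ x_star)
```

Since the data $y = Kx^*$ lie in the range of $K$ and $K$ has rank 5 (almost surely), the iteration recovers $x^*$ after at most 5 steps up to rounding.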
Theorem 2.21 Let $(x^m)$ and $(p^m)$ be the sequences of the conjugate gradient method. Define the space $V_m := \mathrm{span}\{p^0, \dots, p^m\}$. Then we have the following equivalent characterizations of $V_m$:
$$V_m = \mathrm{span}\big\{\nabla f(x^0), \dots, \nabla f(x^m)\big\} \quad (2.42a)$$
$$\phantom{V_m} = \mathrm{span}\big\{p^0, K^*Kp^0, \dots, (K^*K)^m p^0\big\} \quad (2.42b)$$
for $m = 0, 1, \dots$. The spaces $V_m$ are called Krylov spaces. Furthermore, $x^m$ is the minimum of $f$ on $V_{m-1}$ for every $m \ge 1$.

By this result, we can write $x^m$ in the form
$$x^m = -P_{m-1}(K^*K)\,p^0 = P_{m-1}(K^*K)\,K^*y \quad (2.43)$$
with a well-defined polynomial $P_{m-1} \in \mathcal{P}_{m-1}$ of degree at most $m - 1$ (which depends itself on $y$). Analogously, we write the defect in the form
$$y - Kx^m = y - KP_{m-1}(K^*K)K^*y = y - KK^*P_{m-1}(KK^*)y = Q_m(KK^*)\,y$$
with the polynomial $Q_m(t) := 1 - t\,P_{m-1}(t)$ of degree $m$.

Let $\{\mu_j, x_j, y_j : j \in \mathbb{N}\}$ be a singular system for $K$. If it happens that
$$y = \sum_{j=1}^N \alpha_j y_j \in W_N := \mathrm{span}\{y_1, \dots, y_N\}$$
for some $N \in \mathbb{N}$, then all iterates $x^m \in A_N := \mathrm{span}\{x_1, \dots, x_N\}$ because
$$x^m = P_{m-1}(K^*K)K^*y = \sum_{j=1}^N \alpha_j\, P_{m-1}(\mu_j^2)\,\mu_j\, x_j.$$
In this exceptional case, the algorithm terminates after at most $N$ iterations because the dimension of $A_N$ is at most $N$ and the gradients $\nabla f(x^i)$ are orthogonal to each other. This is the reason why the conjugate gradient method applied to matrix equations stops after finitely many iterations. For operator equations in infinite-dimensional Hilbert spaces, this method produces sequences of, in general, infinitely many elements.

The following characterizations of $Q_m$ are useful.

Lemma 2.22 (a) The polynomial $Q_m$ minimizes the functional
$$H(Q) := \|Q(KK^*)y\|_Y^2 \quad\text{on}\quad \{Q \in \mathcal{P}_m : Q(0) = 1\}$$
and satisfies $H(Q_m) = \|Kx^m - y\|_Y^2$.

(b) For $k \ne \ell$, the following orthogonality relation holds:
$$\langle Q_k, Q_\ell\rangle := \sum_{j=1}^\infty \mu_j^2\,Q_k(\mu_j^2)\,Q_\ell(\mu_j^2)\,\big|(y, y_j)_Y\big|^2 = 0. \quad (2.44)$$
If $y \notin \mathrm{span}\{y_1, \dots, y_N\}$ for any $N \in \mathbb{N}$, then $\langle\cdot,\cdot\rangle$ defines an inner product on the space $\mathcal{P}$ of all polynomials.
Without a priori information, the sequence $(x_m)$ does not converge to the solution $x$ of $Kx = y$. The images, however, do converge to $y$. This is the subject of the next theorem.

Theorem 2.23 Let $K$ and $K^*$ be one-to-one, and assume that the conjugate gradient method does not stop after finitely many steps. Then

$$Kx_m \longrightarrow y \quad \text{as } m \to \infty$$

for every $y \in Y$.

We give a proof of this theorem because it is a simple conclusion of the previous lemma.

Proof: Let $Q \in \mathcal{P}_m$ be an arbitrary polynomial with $Q(0) = 1$. Then, by the previous lemma,

$$\|Kx_m - y\|_Y^2 \;=\; H(Q_m) \;\le\; H(Q) \;=\; \sum_{j=1}^{\infty} Q(\mu_j^2)^2\, |(y, y_j)_Y|^2\,. \qquad (2.45)$$
Now let $\varepsilon > 0$ be arbitrary. Choose $j_1 \in \mathbb{N}$ such that

$$\sum_{j=j_1+1}^{\infty} |(y, y_j)_Y|^2 \;<\; \frac{\varepsilon^2}{2}$$

and choose a function $R \in C[0, \mu_1^2]$ with $R(0) = 1$, $\|R\|_\infty \le 1$, and $R(\mu_j^2) = 0$ for $j = 1, 2, \ldots, j_1$. By the theorem of Weierstrass, there exist polynomials $\tilde{Q}_m \in \mathcal{P}_m$ with $\|R - \tilde{Q}_m\|_\infty \to 0$ as $m \to \infty$. We set $\hat{Q}_m = \tilde{Q}_m / \tilde{Q}_m(0)$, which is possible for sufficiently large $m$ because $R(0) = 1$. Then $\hat{Q}_m$ converges to $R$ as $m \to \infty$ and $\hat{Q}_m(0) = 1$. Substituting this into (2.45) yields

$$\|Kx_m - y\|_Y^2 \;\le\; H(\hat{Q}_m) \;\le\; \sum_{j=1}^{j_1} \bigl|\hat{Q}_m(\mu_j^2) - \underbrace{R(\mu_j^2)}_{=0}\bigr|^2\, |(y, y_j)_Y|^2 \;+\; \|\hat{Q}_m\|_\infty^2 \sum_{j > j_1} |(y, y_j)_Y|^2$$
$$\le\; \|\hat{Q}_m - R\|_\infty^2\, \|y\|_Y^2 \;+\; \frac{\varepsilon^2}{2}\, \|\hat{Q}_m\|_\infty^2\,.$$

This expression is less than $\varepsilon^2$ for sufficiently large $m$. $\square$
Now we return to the regularization of the operator equation $Kx^* = y^*$. The operator $P_{m-1}(K^*K)K^* : Y \to X$ corresponds to the regularization operator $R_\alpha$ of the general theory. But this operator certainly depends on the right-hand side $y$. The mapping $y \mapsto P_{m-1}(K^*K)K^*y$ is therefore nonlinear. So far, we have formulated and studied the conjugate gradient method for unperturbed right-hand sides. Now we consider the situation where we know only an approximation $y^\delta$ of $y^*$ such that $\|y^\delta - y^*\|_Y \le \delta$. We apply the
algorithm to $y^\delta$ instead of $y$. This yields a sequence $x_{m,\delta}$ and polynomials $P_m^\delta$ and $Q_m^\delta$. There is no a priori strategy $m = m(\delta)$ such that $x_{m(\delta),\delta}$ converges to $x^*$ as $\delta$ tends to zero; see [78]. An a posteriori choice as in the previous section, however, again leads to an optimal strategy. We stop the algorithm with the smallest $m = m(\delta)$ such that the defect satisfies $\|Kx_{m,\delta} - y^\delta\|_Y \le r\delta$, where $r > 1$ is some given parameter. From now on, we make the assumption that $y^\delta$ is never a finite linear combination of the $y_j$. Then, by Theorem 2.23, the defect tends to zero, and this stopping rule is well-defined. We want to show that the choice $m = m(\delta)$ leads to an optimal algorithm. The following analysis, which we learned from [121] (see also [226]), is more elementary than, for example, those in [21, 181], or [182]. We carry out the complete analysis but, again, postpone the proofs to Appendix B because they are rather technical. We recall that by our stopping rule

$$\|Kx_{m(\delta),\delta} - y^\delta\|_Y \;\le\; r\delta \;<\; \|Kx_{m(\delta)-1,\delta} - y^\delta\|_Y\,. \qquad (2.46)$$
The following theorem establishes the optimality of the conjugate gradient method with this stopping rule.

Theorem 2.24 Assume that $y^* = Kx^*$ and $y^\delta$ do not belong to the linear span of finitely many $y_j$. Let the sequence $x_{m(\delta),\delta}$ be constructed by the conjugate gradient method with stopping rule (2.46) for fixed parameter $r > 1$. Let $x^* = (K^*K)^{\sigma/2} z \in \mathcal{R}\bigl((K^*K)^{\sigma/2}\bigr)$ for some $\sigma > 0$ and $z \in X$. Then there exists $c > 0$ with

$$\|x_{m(\delta),\delta} - x^*\|_X \;\le\; c\, \delta^{\sigma/(\sigma+1)}\, E^{1/(\sigma+1)}\,, \qquad (2.47)$$

where $E = \|z\|_X$.

As with Landweber's method and the regularization method by spectral cutoff (but in contrast to Tikhonov's method), the conjugate gradient method is optimal for all $\sigma > 0$ under the a priori information $\|(K^*K)^{-\sigma/2} x^*\|_X \le E$.

There is a much simpler implementation of the conjugate gradient method for self-adjoint positive definite operators $K : X \to X$. For such $K$, there exists a unique self-adjoint positive operator $A : X \to X$ with $A^2 = K$. Let $Kx = y$ and set $z := Ax$, that is, $Az = y$. We apply the conjugate gradient method to the equation $Ax = z$ without knowing $z$. In the process of the algorithm, only the elements $A^*z = y$, $\|Ap_m\|^2 = (Kp_m, p_m)$, and $A^*(Ax_m - z) = Kx_m - y$ have to be computed. The square root $A$ and the quantity $z$ do not have to be known explicitly, and the method is much simpler to implement.

Actually, the conjugate gradient method presented here is only one member of a large class of conjugate gradient methods. For a detailed study of these methods in connection with ill-posed problems, we refer to [104, 124, 207, 208, 210] and, in particular, the work [122].
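The stopping rule (2.46) is easy to try out numerically. The following sketch (our discretization and parameters, offered only as an illustration) runs the conjugate gradient method for the normal equations on a discretized integration operator with noisy data and stops at the first defect below $r\delta$ with $r = 2$:

```python
import numpy as np

def cg_discrepancy(K, y_delta, delta, r=2.0, max_steps=200):
    """CG on the normal equations, stopped by the discrepancy principle
    ||K x_m - y_delta|| <= r * delta  (cf. stopping rule (2.46), r > 1)."""
    x = np.zeros(K.shape[1])
    res = K.T @ y_delta
    p = res.copy()
    for m in range(max_steps):
        if np.linalg.norm(K @ x - y_delta) <= r * delta:
            return x, m
        Kp = K @ p
        a = (res @ res) / (Kp @ Kp)
        x = x + a * p
        res_new = res - a * (K.T @ Kp)
        p = res_new + (res_new @ res_new) / (res @ res) * p
        res = res_new
    return x, max_steps

rng = np.random.default_rng(1)
n = 40
t = np.linspace(0.0, 1.0, n)
K = np.tril(np.ones((n, n))) / n            # smoothing (integration) operator
x_true = np.sin(np.pi * t)
y = K @ x_true
noise = rng.standard_normal(n)
delta = 1e-3
y_delta = y + delta * noise / np.linalg.norm(noise)

x_stop, m_stop = cg_discrepancy(K, y_delta, delta)
```

The iteration stops after finitely many steps with a defect at most $2\delta$, while the reconstruction error stays moderate; iterating far beyond the stopping index would let the data noise dominate.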
2.8 Problems
2.1 Let $K : L^2(0,1) \to L^2(0,1)$ be the integral operator

$$(Kx)(t) \;:=\; \int_0^1 (1 + ts)\, e^{ts}\, x(s)\, ds\,, \quad 0 < t < 1\,.$$

Show by induction that

$$\frac{d^n}{dt^n}(Kx)(t) \;=\; \int_0^1 (n + 1 + ts)\, s^n\, e^{ts}\, x(s)\, ds\,, \quad 0 < t < 1,\ n = 0, 1, \ldots\,.$$
Prove that $K$ is one-to-one and that the constant functions do not belong to the range of $K$.

2.2 Apply Tikhonov's method of Section 2.2 to the integral equation

$$\int_0^t x(s)\, ds \;=\; y(t)\,, \quad 0 \le t \le 1\,.$$

Prove that for $y \in H^1(0,1)$ with $y(0) = 0$, Tikhonov's solution $x^\alpha$ is given by the solution of the boundary value problem

$$-\alpha\, x''(t) + x(t) \;=\; y'(t)\,, \quad 0 < t < 1\,, \qquad x(1) = 0\,,\ x'(0) = 0\,.$$
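The behavior claimed in Problem 2.2 can be observed numerically. The sketch below (our discretization, not part of the exercise) applies Tikhonov's method to a midpoint-rule discretization of the integration operator with exact data for $x^*(t) = t$; the error decreases as $\alpha \to 0$, and the endpoint value is pulled toward the boundary condition $x(1) = 0$:

```python
import numpy as np

n = 200
t = (np.arange(n) + 0.5) / n           # midpoints of a uniform grid on [0, 1]
K = np.tril(np.ones((n, n))) / n       # midpoint-rule discretization of integration

x_true = t.copy()                      # a smooth solution with x(1) != 0
y = K @ x_true                         # exact data, y(t) ~ t^2 / 2

def tikhonov(alpha):
    """Discrete Tikhonov equation alpha*x + K^T K x = K^T y; for the inner
    product (u, v) = u^T v / n the matrix K^T is the discrete adjoint of K."""
    return np.linalg.solve(alpha * np.eye(n) + K.T @ K, K.T @ y)

rms = lambda v: np.sqrt(np.mean(v ** 2))
e_coarse = rms(tikhonov(1e-2) - x_true)
e_fine = rms(tikhonov(1e-6) - x_true)
x_mid = tikhonov(1e-4)                 # note the boundary layer near t = 1
```

The boundary layer near $t = 1$ has width of order $\sqrt{\alpha}$, consistent with the boundary value problem stated in the exercise.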
2.3 Let $K : X \to Y$ be compact and one-to-one. For any $\sigma \ge 0$, let the spaces $X_\sigma$ be defined by $X_\sigma = \mathcal{R}\bigl((K^*K)^{\sigma/2}\bigr)$, equipped with the inner product

$$\bigl((K^*K)^{\sigma/2} z_1,\, (K^*K)^{\sigma/2} z_2\bigr)_{X_\sigma} \;=\; (z_1, z_2)_X\,, \quad z_1, z_2 \in X\,.$$
Prove that the $X_\sigma$ are Hilbert spaces and that $X_{\sigma_2}$ is compactly embedded in $X_{\sigma_1}$ for $\sigma_2 > \sigma_1$.

2.4 The iterated Tikhonov regularization $x^{m,\alpha,\delta}$ of order $m \in \mathbb{N}$ (see [87, 153]) is iteratively defined by $x^{0,\alpha,\delta} = 0$ and

$$(\alpha I + K^*K)\, x^{m+1,\alpha,\delta} \;=\; K^* y^\delta + \alpha\, x^{m,\alpha,\delta}$$

for $m = 0, 1, 2, \ldots$ (Note that $x^{1,\alpha,\delta}$ is the ordinary Tikhonov regularization.)

(a) Show that $q^{(m)}(\alpha, \mu) := 1 - \bigl(\frac{\alpha}{\alpha + \mu^2}\bigr)^m$ is the corresponding filter function.

(b) Prove that this filter function leads to a regularizing operator $R_\alpha^{(m)}$ with $\|R_\alpha^{(m)}\|_{L(Y,X)} \le m/(2\sqrt{\alpha})$, and that $q^{(m)}$ satisfies (2.9) from Theorem 2.7 with

$$\omega_\sigma^{(m)}(\alpha) \;=\; c\, \alpha^{\min\{\sigma/2,\, m\}}$$
where $c$ depends only on $m$ and $\sigma$.

(c) Show that the iterated Tikhonov regularization of order $m$ is asymptotically optimal under the information $\|(K^*K)^{-\sigma/2} x^*\|_X \le E$ for every $\sigma \le 2m$.

2.5 Fix $y^\delta$ with $\|y^\delta - y^*\|_Y \le \delta$ and let $x^{\alpha,\delta}$ be the Tikhonov solution corresponding to $\alpha > 0$. The curve

$$\alpha \;\mapsto\; \bigl(f(\alpha),\, g(\alpha)\bigr) \;:=\; \bigl(\|Kx^{\alpha,\delta} - y^\delta\|_Y^2\,,\; \|x^{\alpha,\delta}\|_X^2\bigr)\,, \quad \alpha > 0\,,$$

in $\mathbb{R}^2$ is called an L-curve because it often has the shape of the letter L; see [90, 125, 127]. Show by using a singular system that $f'(\alpha) = -\alpha\, g'(\alpha)$. Furthermore, compute the curvature

$$C(\alpha) \;:=\; \frac{f'(\alpha)\, g''(\alpha) - g'(\alpha)\, f''(\alpha)}{\bigl(f'(\alpha)^2 + g'(\alpha)^2\bigr)^{3/2}}$$

and show that the curvature increases monotonically for $0 < \alpha \le 1/\|K\|_{L(X,Y)}^2$.

2.6 Show that

$$m\, \mu^2\, (1 - a\mu^2)^{2m-2} \;\le\; \frac{1}{a} \quad \text{for all } m \ge 1 \text{ and } 0 \le \mu \le \frac{1}{\sqrt{a}}\,.$$
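The filter formula claimed in Problem 2.4(a) can be sanity-checked numerically: for a single singular value $\mu$ the iteration acts on scalars, and $\mu\, x_m$ should reproduce $q^{(m)}(\alpha, \mu)$. A minimal sketch (our construction, not part of the exercise):

```python
def mu_times_iterate(alpha, mu, m):
    """Scalar model of (alpha I + K*K) x_{k+1} = K* y + alpha x_k with K = mu,
    data y = 1, and x_0 = 0; returns mu * x_m, which should equal the filter."""
    x = 0.0
    for _ in range(m):
        x = (mu * 1.0 + alpha * x) / (alpha + mu ** 2)
    return mu * x

def q_filter(alpha, mu, m):
    """Closed form from Problem 2.4(a): 1 - (alpha / (alpha + mu^2))^m."""
    return 1.0 - (alpha / (alpha + mu ** 2)) ** m

gaps = [abs(mu_times_iterate(a, mu, m) - q_filter(a, mu, m))
        for a in (0.05, 0.5, 2.0) for mu in (0.1, 1.0, 3.0) for m in (1, 2, 7)]
```

The agreement is exact up to rounding, because the recursion is a linear contraction toward the fixed point $1/\mu$ with factor $\alpha/(\alpha + \mu^2)$.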
Chapter 3
Regularization by Discretization

In this chapter, we study a different approach to regularizing operator equations of the form $Kx = y$, where $x$ and $y$ are elements of certain function spaces. This approach is motivated by the fact that for the numerical treatment of such equations, one has to discretize the continuous problem and reduce it to a finite system of (linear or nonlinear) equations. We see in this chapter that the discretization schemes themselves are regularization strategies in the sense of Chapter 2. In Section 3.1, we study the general concept of projection methods and give a necessary and sufficient condition for convergence. Although we have in mind the treatment of integral equations of the first kind, we treat the general case where $K$ is a linear, bounded, not necessarily compact operator between (real or complex) Banach or Hilbert spaces. Section 3.2 is devoted to Galerkin methods. As special cases, we study least squares and dual least squares methods in Subsections 3.2.1 and 3.2.2. In Subsection 3.2.3, we investigate the Bubnov–Galerkin method for the case where the operator satisfies Gårding's inequality. In Section 3.3, we illustrate the Galerkin methods for Symm's integral equation of the first kind. This equation arises in potential theory and serves as a model equation for more complicated situations. Section 3.4 is devoted to collocation methods. We restrict ourselves to the moment method in Subsection 3.4.1 and to collocation by piecewise constant functions in Subsection 3.4.2, where the analysis is carried out only for Symm's integral equation. In Section 3.5, we present numerical results for various regularization techniques (Tikhonov, Landweber, conjugate gradient, projection, and collocation methods) tested for Dirichlet boundary value problems for the Laplacian in an ellipse. Finally, we study the Backus–Gilbert method in Section 3.6. Although not very popular among mathematicians, this method is extensively used by scientists in geophysics and other applied sciences. The general ideas of Sections 3.1 and 3.2 can also be found in, for example, [17, 168, 182, 226].
© Springer Nature Switzerland AG 2021. A. Kirsch, An Introduction to the Mathematical Theory of Inverse Problems, Applied Mathematical Sciences 120, https://doi.org/10.1007/978-3-030-63343-1_3
3.1 Projection Methods
First, we recall the definition of a projection operator.

Definition 3.1 Let $X$ be a normed space over the field $\mathbb{K}$, where $\mathbb{K} = \mathbb{R}$ or $\mathbb{K} = \mathbb{C}$. Let $U \subset X$ be a closed subspace. A linear bounded operator $P : X \to X$ is called a projection operator on $U$ if
• $Px \in U$ for all $x \in X$, and
• $Px = x$ for all $x \in U$.

We now summarize some obvious properties of projection operators.

Theorem 3.2 Every nontrivial projection operator satisfies $P^2 = P$ and $\|P\|_{L(X)} \ge 1$.

Proof: $P^2 x = P(Px) = Px$ follows from $Px \in U$. Furthermore, $\|P\|_{L(X)} = \|P^2\|_{L(X)} \le \|P\|_{L(X)}^2$ and $P \neq 0$. This implies $\|P\|_{L(X)} \ge 1$. $\square$

In the following two examples, we introduce the most important projection operators.

Example 3.3 (a) (Orthogonal projection) Let $X$ be a pre-Hilbert space over $\mathbb{K} = \mathbb{R}$ or $\mathbb{K} = \mathbb{C}$ and $U \subset X$ be a complete subspace. Let $Px \in U$ be the best approximation to $x$ in $U$; that is, $Px$ satisfies

$$\|Px - x\|_X \;\le\; \|u - x\|_X \quad \text{for all } u \in U\,. \qquad (3.1)$$
By the projection theorem (Theorem A.13 of Appendix A.1), $P : X \to U$ is linear, and $Px \in U$ is characterized by the abstract "normal equation" $(x - Px, u)_X = 0$ for all $u \in U$; that is, $x - Px \in U^\perp$. In this example, by the binomial theorem we have

$$\|x\|_X^2 \;=\; \|Px + (x - Px)\|_X^2 \;=\; \|Px\|_X^2 + \|x - Px\|_X^2 + 2\,\mathrm{Re}\,\underbrace{(x - Px, Px)_X}_{=0} \;\ge\; \|Px\|_X^2\,,$$

that is, $\|P\|_{L(X)} = 1$. Important examples of subspaces $U$ are spaces of splines or finite elements.

(b) (Interpolation operator) Let $X = C[a,b]$ be the space of real-valued continuous functions on $[a,b]$, supplied with the supremum norm. Then $X$ is a normed space over $\mathbb{R}$. Let $U = \operatorname{span}\{u_1, \ldots, u_n\}$ be an $n$-dimensional subspace and $t_1, \ldots, t_n \in [a,b]$ be such that the interpolation problem in $U$ is uniquely solvable; that is, $\det[u_j(t_k)] \neq 0$. We define $Px \in U$ as the interpolant of $x \in C[a,b]$ in $U$; i.e., $u = Px \in U$ satisfies $u(t_j) = x(t_j)$ for all $j = 1, \ldots, n$. Then $P : X \to U$ is a projection operator.
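The interpolation operator of Example 3.3(b) is easy to realize for linear splines (the hat basis is written out in (3.3) below). The following sketch (our code; the function names are ours) checks the two defining projection properties, reproduction of elements of $U$ and, for this particular operator, norm one:

```python
import numpy as np

def hat(t, nodes, j):
    """Piecewise-linear basis function on `nodes` that equals 1 at nodes[j]
    and 0 at all other nodes (cf. the hat functions in (3.3))."""
    left = (t - nodes[j - 1]) / (nodes[j] - nodes[j - 1]) if j > 0 else np.full_like(t, np.inf)
    right = (nodes[j + 1] - t) / (nodes[j + 1] - nodes[j]) if j < len(nodes) - 1 else np.full_like(t, np.inf)
    return np.maximum(0.0, np.minimum(left, right))

def Qn(x_at_nodes, nodes, t):
    """Interpolation operator Q_n x = sum_j x(t_j) * hat_j, evaluated at t."""
    return sum(xj * hat(t, nodes, j) for j, xj in enumerate(x_at_nodes))

nodes = np.linspace(0.0, 1.0, 6)
t = np.linspace(0.0, 1.0, 501)

x = np.cos(2 * np.pi * t)
Qx = Qn(np.cos(2 * np.pi * nodes), nodes, t)
```

A piecewise-linear function is reproduced exactly ($Px = x$ on $U$), and the interpolant never exceeds the data in the supremum norm, reflecting $\|Q_n\| = 1$ for this operator.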
Examples for $U$ are spaces of algebraic or trigonometric polynomials. As a drawback of these choices, we note that by the results of Faber (see [203]) the interpolating polynomials of continuous functions $x$ do not, in general, converge to $x$ as the degree of the polynomials tends to infinity. For smooth periodic functions, however, trigonometric interpolation at equidistant points converges with an optimal order of convergence. We use this fact in Subsection 3.4.2.

Here, as an example, we recall interpolation by linear splines. For simplicity, we formulate only the case where the endpoints are included in the set of interpolation points. Let $a = t_1 < \cdots < t_n = b$ be given points, and let $U \subset C[a,b]$ be defined by

$$U \;=\; S_1(t_1, \ldots, t_n) \;:=\; \bigl\{x \in C[a,b] : x|_{[t_j, t_{j+1}]} \in \mathcal{P}_1,\ j = 1, \ldots, n-1\bigr\}\,, \qquad (3.2)$$

where $\mathcal{P}_1$ denotes the space of polynomials of degree at most one. Then the interpolation operator $Q_n : C[a,b] \to S_1(t_1, \ldots, t_n)$ is given by

$$Q_n x \;=\; \sum_{j=1}^{n} x(t_j)\, \hat{y}_j \quad \text{for } x \in C[a,b]\,,$$

where the basis functions $\hat{y}_j \in S_1(t_1, \ldots, t_n)$ are defined by

$$\hat{y}_j(t) \;=\; \begin{cases} \dfrac{t - t_{j-1}}{t_j - t_{j-1}}\,, & t \in [t_{j-1}, t_j] \ (\text{if } j \ge 2),\\[1ex] \dfrac{t_{j+1} - t}{t_{j+1} - t_j}\,, & t \in [t_j, t_{j+1}] \ (\text{if } j \le n-1),\\[1ex] 0\,, & t \notin [t_{j-1}, t_{j+1}], \end{cases} \qquad (3.3)$$
for $j = 1, \ldots, n$. In this example, $\|Q_n\|_{L(C[a,b])} = 1$ (see Problem 3.1). For general interpolation operators, $\|Q_n\|_{L(X)}$ exceeds one and does not have to be bounded with respect to $n$.

Now we define the class of projection methods.

Definition 3.4 Let $X$ and $Y$ be Banach spaces and $K : X \to Y$ be bounded and one-to-one. Furthermore, let $X_n \subset X$ and $Y_n \subset Y$ be finite-dimensional subspaces of dimension $n$ and $Q_n : Y \to Y_n$ be a projection operator. For given $y \in Y$, the projection method for solving the equation $Kx = y$ is to solve the equation

$$Q_n K x_n \;=\; Q_n y \quad \text{for } x_n \in X_n\,. \qquad (3.4)$$

Equation (3.4) reduces to a finite linear system by choosing bases $\{\hat{x}_1, \ldots, \hat{x}_n\}$ and $\{\hat{y}_1, \ldots, \hat{y}_n\}$ of $X_n$ and $Y_n$, respectively. One possibility is to represent $Q_n y$ and every $Q_n K \hat{x}_j$, $j = 1, \ldots, n$, in the forms

$$Q_n y \;=\; \sum_{i=1}^{n} \beta_i\, \hat{y}_i \quad \text{and} \quad Q_n K \hat{x}_j \;=\; \sum_{i=1}^{n} A_{ij}\, \hat{y}_i\,, \quad j = 1, \ldots, n\,, \qquad (3.5)$$
with $\beta_i, A_{ij} \in \mathbb{K}$. The linear combination $x_n = \sum_{j=1}^{n} \alpha_j \hat{x}_j$ solves (3.4) if and only if $\alpha = (\alpha_1, \ldots, \alpha_n) \in \mathbb{K}^n$ solves the finite system of linear equations

$$\sum_{j=1}^{n} A_{ij}\, \alpha_j \;=\; \beta_i\,, \quad i = 1, \ldots, n\,; \quad \text{that is,} \quad A\alpha = \beta\,. \qquad (3.6)$$
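As a concrete instance of the reduction (3.4) → (3.6), the following sketch (our operator, basis, and collocation points; not from the text) sets up the system $A\alpha = \beta$ for the integration operator with piecewise-constant trial functions and pointwise (collocation-type) projection, and recovers a solution that lies in $X_n$:

```python
import numpy as np

n = 5
edges = np.linspace(0.0, 1.0, n + 1)
t_col = edges[1:]                          # collocation points (panel right ends)

# A_ij = (K x̂_j)(t_i) for (Kx)(t) = \int_0^t x(s) ds and piecewise-constant
# basis x̂_j = 1 on panel j: the overlap length of [0, t_i] with panel j.
A = np.array([[max(0.0, min(t, edges[j + 1]) - edges[j]) for j in range(n)]
              for t in t_col])

# Data beta_i = y(t_i) from an independent fine-grid quadrature, for x* in X_n:
c_true = np.array([1.0, -1.0, 2.0, 0.5, -2.0])
m = 2000                                   # fine grid; a multiple of n
s = (np.arange(m) + 0.5) / m
x_fine = c_true[np.minimum((s * n).astype(int), n - 1)]
beta = np.array([np.sum(x_fine[s < t]) / m for t in t_col])

coef = np.linalg.solve(A, beta)            # the finite system (3.6)
```

Here $A$ is lower triangular with positive diagonal, so $Q_n K|_{X_n}$ is one-to-one and the system is uniquely solvable, as Assumption 3.6 below requires.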
There is a second possibility to reduce (3.4) to a finite system, which is used for Galerkin methods. Let $Y^*$ be the dual space of $Y$ with dual pairing $\langle \cdot, \cdot \rangle_{Y^*, Y}$. We choose elements $\hat{y}_i^* \in Y^*$ for $i = 1, \ldots, n$ such that the matrix $\mathcal{Y} \in \mathbb{K}^{n \times n}$ given by $\mathcal{Y}_{ij} = \langle \hat{y}_i^*, \hat{y}_j \rangle_{Y^*, Y}$ is regular. Then, representing $x_n$ again as $x_n = \sum_{j=1}^{n} \alpha_j \hat{x}_j$, equation (3.4) is equivalent to

$$\sum_{j=1}^{n} A_{ij}\, \alpha_j \;=\; \beta_i\,, \quad i = 1, \ldots, n\,; \quad \text{that is,} \quad A\alpha = \beta\,, \qquad (3.7)$$

where now

$$A_{ij} \;=\; \langle \hat{y}_i^*, Q_n K \hat{x}_j \rangle_{Y^*, Y} \quad \text{and} \quad \beta_i \;=\; \langle \hat{y}_i^*, Q_n y \rangle_{Y^*, Y}\,. \qquad (3.8)$$
The orthogonal projection and the interpolation operator from Example 3.3 lead to the following important classes of projection methods, which are studied in more detail in the next sections.

Example 3.5 Let $K : X \to Y$ be bounded and one-to-one.

(a) (Galerkin method) Let $X$ and $Y$ be pre-Hilbert spaces and $X_n \subset X$ and $Y_n \subset Y$ be finite-dimensional subspaces with $\dim X_n = \dim Y_n = n$. Let $Q_n : Y \to Y_n$ be the orthogonal projection. Then the projected equation $Q_n K x_n = Q_n y$ is equivalent to

$$(K x_n, z_n)_Y \;=\; (y, z_n)_Y \quad \text{for all } z_n \in Y_n\,. \qquad (3.9a)$$

Again let $X_n = \operatorname{span}\{\hat{x}_1, \ldots, \hat{x}_n\}$ and $Y_n = \operatorname{span}\{\hat{y}_1, \ldots, \hat{y}_n\}$. Looking for a solution of (3.9a) in the form $x_n = \sum_{j=1}^{n} \alpha_j \hat{x}_j$ leads to the system

$$\sum_{j=1}^{n} \alpha_j\, (K\hat{x}_j, \hat{y}_i)_Y \;=\; (y, \hat{y}_i)_Y \quad \text{for } i = 1, \ldots, n\,, \qquad (3.9b)$$

or $A\alpha = \beta$, where $A_{ij} := (K\hat{x}_j, \hat{y}_i)_Y$ and $\beta_i = (y, \hat{y}_i)_Y$. This corresponds to (3.7) with $\hat{y}_i^* = \hat{y}_i$ after the identification of $Y^*$ with $Y$ (Theorem A.23 of Riesz).

(b) (Collocation method) Let $X$ be a Banach space, $Y = C[a,b]$, and $K : X \to C[a,b]$ be a bounded operator. Let $a = t_1 < \cdots < t_n = b$ be given points (collocation points) and $Y_n = S_1(t_1, \ldots, t_n)$ be the corresponding space (3.2) of linear splines with interpolation operator $Q_n y = \sum_{j=1}^{n} y(t_j)\, \hat{y}_j$. Let $y \in C[a,b]$ and some $n$-dimensional subspace $X_n \subset X$ be given. Then $Q_n K x_n = Q_n y$ is equivalent to

$$(K x_n)(t_i) \;=\; y(t_i) \quad \text{for all } i = 1, \ldots, n\,. \qquad (3.10a)$$
If we denote by $\{\hat{x}_1, \ldots, \hat{x}_n\}$ a basis of $X_n$, then looking for a solution of (3.10a) in the form $x_n = \sum_{j=1}^{n} \alpha_j \hat{x}_j$ leads to the finite linear system

$$\sum_{j=1}^{n} \alpha_j\, (K\hat{x}_j)(t_i) \;=\; y(t_i)\,, \quad i = 1, \ldots, n\,, \qquad (3.10b)$$

or $A\alpha = \beta$, where $A_{ij} = (K\hat{x}_j)(t_i)$ and $\beta_i = y(t_i)$.

We are particularly interested in the study of integral equations of the form

$$\int_a^b k(t, s)\, x(s)\, ds \;=\; y(t)\,, \quad t \in [a, b]\,, \qquad (3.11)$$
in $L^2(a,b)$ or $C[a,b]$ for some continuous or weakly singular kernel $k$. Equations (3.9b) and (3.10b) now take the form

$$A\alpha \;=\; \beta\,, \qquad (3.12)$$

where $x_n = \sum_{j=1}^{n} \alpha_j \hat{x}_j$ and

$$A_{ij} \;=\; \int_a^b \int_a^b k(t, s)\, \hat{x}_j(s)\, \hat{y}_i(t)\, ds\, dt\,, \qquad (3.13a)$$

$$\beta_i \;=\; \int_a^b y(t)\, \hat{y}_i(t)\, dt \qquad (3.13b)$$

for the Galerkin method, and

$$A_{ij} \;=\; \int_a^b k(t_i, s)\, \hat{x}_j(s)\, ds\,, \qquad (3.13c)$$

$$\beta_i \;=\; y(t_i) \qquad (3.13d)$$

for the collocation method. Comparing these systems of equations, we observe that the computation of the matrix elements (3.13c) is less expensive than that of (3.13a) because of the double integration for every matrix element in (3.13a). For this reason, collocation methods are generally easier to implement than Galerkin methods. On the other hand, Galerkin methods have convergence properties of high order in weak norms (superconvergence), which are of practical importance in many cases, such as boundary element methods for the solution of boundary value problems. For the remaining part of this section, we make the following assumption.
Assumption 3.6 Let $K : X \to Y$ be a linear, bounded, and injective operator between Banach spaces, $X_n \subset X$ and $Y_n \subset Y$ be finite-dimensional subspaces of dimension $n$, and $Q_n : Y \to Y_n$ be a projection operator. We assume that $\bigcup_{n \in \mathbb{N}} X_n$ is dense in $X$ and that $Q_n K|_{X_n} : X_n \to Y_n$ is one-to-one and thus invertible.

Let $x \in X$ be the solution of

$$Kx \;=\; y\,. \qquad (3.14)$$

By $x_n \in X_n$, we denote the unique solutions of the equations

$$Q_n K x_n \;=\; Q_n y \qquad (3.15)$$

for $n \in \mathbb{N}$. We can represent the solutions $x_n \in X_n$ of (3.15) in the form $x_n = R_n y$, where $R_n : Y \to X_n \subset X$ is defined by

$$R_n \;:=\; \bigl(Q_n K|_{X_n}\bigr)^{-1} Q_n : Y \longrightarrow X_n \subset X\,. \qquad (3.16)$$

The projection method is called convergent if the approximate solutions $x_n \in X_n$ of (3.15) converge to the exact solution $x \in X$ of (3.14) for every $y \in \mathcal{R}(K)$, that is, if

$$R_n K x \;=\; \bigl(Q_n K|_{X_n}\bigr)^{-1} Q_n K x \;\longrightarrow\; x\,, \quad n \to \infty\,, \qquad (3.17)$$

for every $x \in X$. We observe that this definition of convergence coincides with Definition 2.1 of a regularization strategy for the equation $Kx = y$ with regularization parameter $\alpha = 1/n$. Therefore, the projection method converges if and only if $R_n$ is a regularization strategy for the equation $Kx = y$. Obviously, we can only expect convergence if we require that $\bigcup_{n \in \mathbb{N}} X_n$ is dense in $X$ and that $Q_n y \to y$ for all $y \in \mathcal{R}(K)$. But, in general, this is not sufficient for convergence if $K$ is compact. We have to assume the following boundedness condition.

Theorem 3.7 Let Assumption 3.6 be satisfied. The solution $x_n = R_n y \in X_n$ of (3.15) converges to $x$ for every $y = Kx$ if and only if there exists $c > 0$ such that

$$\|R_n K\|_{L(X)} \;\le\; c \quad \text{for all } n \in \mathbb{N}\,. \qquad (3.18)$$

If (3.18) is satisfied, the following error estimate holds:

$$\|x_n - x\|_X \;\le\; (1 + c) \min_{z_n \in X_n} \|z_n - x\|_X \qquad (3.19)$$

with the same constant $c$ as in (3.18).

Proof: Let the projection method be convergent. Then $R_n K x \to x$ for every $x \in X$. The assertion follows directly from the principle of uniform boundedness (Theorem A.28 of Appendix A.3).
Now let $\|R_n K\|_{L(X)}$ be bounded. The operator $R_n K$ is a projection operator onto $X_n$ because for $z_n \in X_n$, we have $R_n K z_n = \bigl(Q_n K|_{X_n}\bigr)^{-1} Q_n K z_n = z_n$. Thus we conclude that

$$x_n - x \;=\; (R_n K - I)\,x \;=\; (R_n K - I)(x - z_n) \quad \text{for all } z_n \in X_n\,.$$

This yields

$$\|x_n - x\|_X \;\le\; (c + 1)\, \|x - z_n\|_X$$
for all $z_n \in X_n$ and proves (3.19). Convergence $x_n \to x$ follows because $\bigcup_{n \in \mathbb{N}} X_n$ is dense in $X$. $\square$

We now show a perturbation result: it is sufficient to study the question of convergence for the "principal part" of an operator $K$. In particular, if the projection method converges for an operator $K$, then convergence and the error estimates also hold for $K + C$, where $C$ is compact relative to $K$ (that is, $K^{-1}C$ is compact).

Theorem 3.8 Let Assumption 3.6 hold. Let $C : X \to Y$ be a linear operator with $\mathcal{R}(C) \subset \mathcal{R}(K)$ such that $K + C$ is one-to-one and $K^{-1}C$ is compact in $X$. Assume, furthermore, that the projection method converges for $K$; that is, $R_n K x \to x$ for every $x \in X$, where again $R_n = \bigl(Q_n K|_{X_n}\bigr)^{-1} Q_n$. Then it converges also for $K + C$; that is, $\tilde{R}_n (K + C)x \to x$ for every $x \in X$, where now $\tilde{R}_n = \bigl(Q_n (K + C)|_{X_n}\bigr)^{-1} Q_n$.

Let $x^* \in X$ be the solution of $(K + C)x^* = y^*$ and $\tilde{x}_n = \tilde{R}_n y^* \in X_n$ be the solution of the corresponding projected equation $Q_n (K + C)\tilde{x}_n = Q_n y^*$. Then there exists $c > 0$ with

$$\|\tilde{x}_n - x^*\|_X \;\le\; c\, \Bigl( \|R_n y^* - K^{-1} y^*\|_X + \|R_n C x^* - K^{-1} C x^*\|_X \Bigr) \qquad (3.20)$$

for all sufficiently large $n$.

Proof: First we note that $I + K^{-1}C = K^{-1}(K + C)$ is one-to-one and thus, because of the compactness of $K^{-1}C$, an isomorphism from $X$ onto itself (see Theorem A.36). We write the equation $Q_n(K + C)\tilde{x}_n = Q_n y^*$ as $(I + R_n C)\tilde{x}_n = R_n y^*$ and have

$$(I + R_n C)(\tilde{x}_n - x^*) \;=\; R_n y^* - R_n C x^* - x^* \;=\; [R_n y^* - K^{-1} y^*] - [R_n C x^* - K^{-1} C x^*]\,,$$

where we have used that $x^* + K^{-1}Cx^* = K^{-1}y^*$. Now we note that the operators $R_n C = (R_n K)(K^{-1}C)$ converge to $K^{-1}C$ in the operator norm because $R_n K x \to x$ for every $x \in X$ and $K^{-1}C$ is compact in $X$ (see Theorem A.34, part
(d), of Appendix A.3). Therefore, by the general Theorem A.37 of Appendix A.3, we conclude that the inverses $(I + R_n C)^{-1}$ exist for sufficiently large $n$ and that their operator norms are uniformly bounded by some $c > 0$. This yields the estimate (3.20). $\square$

The first term on the right-hand side of (3.20) is just the error of the projection method for the equation $Kx = y^*$, the second for the equation $Kx = Cx^*$. By the previous Theorem 3.7, these terms are estimated by

$$\|R_n y^* - K^{-1} y^*\|_X \;\le\; (1 + c) \min_{z_n \in X_n} \|K^{-1} y^* - z_n\|_X\,,$$
$$\|R_n C x^* - K^{-1} C x^*\|_X \;\le\; (1 + c) \min_{z_n \in X_n} \|K^{-1} C x^* - z_n\|_X\,,$$
where $c$ is a bound of $\|R_n K\|_{L(X)}$.

So far, we have considered the case where the right-hand side $y^* = Kx^*$ is known exactly. Now we study the case where the right-hand side is known only approximately. We understand the operator $R_n$ from (3.16) as a regularization operator in the sense of the previous chapter. We have to distinguish between two kinds of errors on the right-hand side. The first kind corresponds to the kind of perturbation discussed in Chapter 2. Instead of the exact right-hand side $y^*$, only $y^\delta \in Y$ is given with $\|y^\delta - y^*\|_Y \le \delta$. We call this the continuous perturbation of the right-hand side. A simple application of the triangle inequality yields the following result.

Theorem 3.9 Let Assumption 3.6 be satisfied and let again $R_n = \bigl(Q_n K|_{X_n}\bigr)^{-1} Q_n : Y \to X_n \subset X$ as in (3.16). Let $x^* \in X$ be the solution of the unperturbed equation $Kx^* = y^*$. Furthermore, we assume that the projection method converges; that is, by Theorem 3.7, the norms $\|R_n K\|_{L(X)}$ are uniformly bounded with respect to $n$. Furthermore, let $y^\delta \in Y$ with $\|y^\delta - y^*\|_Y \le \delta$ and let $x_n^\delta = R_n y^\delta$ be the solution of the projected equation $Q_n K x_n^\delta = Q_n y^\delta$. Then

$$\|x_n^\delta - x^*\|_X \;\le\; \|x_n^\delta - R_n y^*\|_X + \|R_n y^* - x^*\|_X \;\le\; \|R_n\|_{L(Y,X)}\, \|y^\delta - y^*\|_Y + \|R_n K x^* - x^*\|_X\,. \qquad (3.21)$$
This estimate corresponds to the fundamental estimate (2.4a). The first term reflects the ill-posedness of the equation: the (continuous) error $\delta$ of the right-hand side is multiplied by the norm of $R_n$. The second term describes the discretization error for exact data.

In practice, one solves the discrete systems (3.6) or (3.7) where the coefficients $\beta_i$ are replaced by perturbed coefficients $\beta_i^\delta \in \mathbb{K}$; that is, one solves

$$\sum_{j=1}^{n} A_{ij}\, \alpha_j^\delta \;=\; \beta_i^\delta\,, \quad i = 1, \ldots, n\,; \quad \text{that is,} \quad A\alpha^\delta = \beta^\delta\,, \qquad (3.22)$$
instead of $A\alpha = \beta$, where now

$$\|\beta^\delta - \beta\|^2 \;=\; \sum_{i=1}^{n} |\beta_i^\delta - \beta_i|^2 \;\le\; \delta^2\,.$$

Recall that the elements $A_{ij}$ of the matrix $A \in \mathbb{K}^{n \times n}$ and the exact coefficients $\beta_i$ of $\beta \in \mathbb{K}^n$ are given by (3.5) or (3.8). We call this the discrete perturbation of the right-hand side. Then $x_n^\delta \in X_n$ is defined by

$$x_n^\delta \;=\; \sum_{j=1}^{n} \alpha_j^\delta\, \hat{x}_j\,.$$

In this case, the choices of the basis functions $\hat{x}_j \in X_n$ and $\hat{y}_j \in Y_n$ (and the dual basis functions $\hat{y}_i^*$) are essential. We will also see that the condition number of $A$ reflects the ill-conditioning of the equation $Kx = y$. We do not carry out the analysis for these two forms of discrete perturbations in the general case but do it only for Galerkin methods in the next section.
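The remark that the condition number of $A$ reflects the ill-conditioning of $Kx = y$ can be observed directly. The sketch below (our choice of operator and basis) assembles the Bubnov–Galerkin matrix for the integration operator with piecewise-constant basis functions, for which the entries have the closed form $A_{ij} = h^2$ for $i > j$, $A_{ii} = h^2/2$, and $A_{ij} = 0$ for $i < j$, and watches the condition number grow with $n$:

```python
import numpy as np

def galerkin_matrix(n):
    """Bubnov-Galerkin matrix for (Kx)(t) = \int_0^t x(s) ds with
    piecewise-constant basis functions on n equal subintervals of [0, 1]."""
    h = 1.0 / n
    return np.tril(np.full((n, n), h * h), -1) + np.eye(n) * (h * h / 2)

conds = [np.linalg.cond(galerkin_matrix(n)) for n in (8, 16, 32, 64)]
```

Refining the discretization does not tame the problem: the finer the subspaces, the worse the conditioning of the linear system, which is exactly the interplay between discretization and regularization studied in this chapter.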
3.2 Galerkin Methods
In this section, we assume that $X$ and $Y$ are (real or complex) Hilbert spaces; $K : X \to Y$ is linear, bounded, and one-to-one; $X_n \subset X$ and $Y_n \subset Y$ are finite-dimensional subspaces with $\dim X_n = \dim Y_n = n$; and $Q_n : Y \to Y_n$ is the orthogonal projection operator onto $Y_n$. Then the equation $Q_n K x_n = Q_n y$ reduces to the Galerkin equations (see Example 3.5)

$$(K x_n, z_n)_Y \;=\; (y, z_n)_Y \quad \text{for all } z_n \in Y_n\,. \qquad (3.23)$$

If we choose bases $\{\hat{x}_1, \ldots, \hat{x}_n\}$ and $\{\hat{y}_1, \ldots, \hat{y}_n\}$ of $X_n$ and $Y_n$, respectively, then this leads to a finite system for the coefficients of $x_n = \sum_{j=1}^{n} \alpha_j \hat{x}_j$ (compare with (3.9b)):

$$\sum_{j=1}^{n} A_{ij}\, \alpha_j \;=\; \beta_i\,, \quad i = 1, \ldots, n\,, \qquad (3.24)$$

where

$$A_{ij} \;=\; (K\hat{x}_j, \hat{y}_i)_Y \quad \text{and} \quad \beta_i \;=\; (y, \hat{y}_i)_Y\,. \qquad (3.25)$$
The Galerkin method is also known as the Petrov–Galerkin method (see [216]) because Petrov was the first to consider the general situation of (3.23). The special case $X = Y$ and $X_n = Y_n$ was studied by Bubnov in 1913 and later by Galerkin in 1915 (see [99]). For this reason, this special case is also known as the Bubnov–Galerkin method. In the case when the operator $K$ is self-adjoint and positive definite, we will see that the Bubnov–Galerkin method coincides with the Rayleigh–Ritz method; see [221] and [228].
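The cost remark after (3.13) — that the Galerkin entries (3.13a) need a double integration while the collocation entries (3.13c) need only a single one — can be made quantitative. The sketch below (our kernel $k(t,s) = ts$ and quadrature, purely illustrative) assembles both matrices for piecewise-constant basis functions and counts kernel evaluations; for this separable kernel the midpoint rule is exact, so both assemblies can be checked against closed forms:

```python
import numpy as np

calls = {"k": 0}
def k(t, s):
    """Model kernel k(t, s) = t*s (our choice); the counter records evaluations."""
    calls["k"] += 1
    return t * s

n, q = 6, 10                           # basis functions; quadrature points per panel
edges = np.linspace(0.0, 1.0, n + 1)   # piecewise-constant trial/test functions
mid = 0.5 * (edges[:-1] + edges[1:])
w = 1.0 / (n * q)                      # midpoint-rule weight on the fine grid

def panel(i):                          # quadrature points inside the i-th panel
    return edges[i] + (np.arange(q) + 0.5) * w

# Galerkin entries (3.13a): a double integral for every A_ij
A_gal = np.array([[sum(k(t, s) for t in panel(i) for s in panel(j)) * w * w
                   for j in range(n)] for i in range(n)])
calls_gal, calls["k"] = calls["k"], 0

# Collocation entries (3.13c): a single integral per A_ij
A_col = np.array([[sum(k(mid[i], s) for s in panel(j)) * w
                   for j in range(n)] for i in range(n)])
calls_col = calls["k"]
```

The Galerkin assembly uses $q$ times as many kernel evaluations as collocation with this quadrature, matching the remark that collocation methods are generally cheaper to set up.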
Theorem 3.10 Let Assumption 3.6 be satisfied and let again $R_n = \bigl(Q_n K|_{X_n}\bigr)^{-1} Q_n : Y \to X_n \subset X$ as in (3.16). Let $x^* \in X$ be the solution of the unperturbed equation $Kx^* = y^*$. Furthermore, we assume that the Galerkin method converges; that is, by Theorem 3.7, the norms $\|R_n K\|_{L(X)}$ are uniformly bounded with respect to $n$.

(a) Let $y^\delta \in Y$ with $\|y^\delta - y^*\|_Y \le \delta$ and let $x_n^\delta = R_n y^\delta$ be the solution of the projected equation $Q_n K x_n^\delta = Q_n y^\delta$. Then

$$\|x_n^\delta - x^*\|_X \;\le\; \|R_n\|_{L(Y,X)}\, \delta + \|R_n K x^* - x^*\|_X\,. \qquad (3.26)$$

(b) Let $Q_n y^* = \sum_{i=1}^{n} \beta_i \hat{y}_i$ and $\beta_i^\delta \in \mathbb{K}$ with $\|\beta^\delta - \beta\| = \sqrt{\sum_{i=1}^{n} |\beta_i^\delta - \beta_i|^2} \le \delta$, and let $\alpha^\delta \in \mathbb{K}^n$ be the solution of $A\alpha^\delta = \beta^\delta$. Then, with $x_n^\delta = \sum_{j=1}^{n} \alpha_j^\delta \hat{x}_j$,

$$\|x_n^\delta - x^*\|_X \;\le\; \frac{a_n}{\lambda_n}\, \delta + \|R_n K x^* - x^*\|_X\,, \qquad (3.27a)$$

$$\|x_n^\delta - x^*\|_X \;\le\; \|R_n\|_{L(Y,X)}\, b_n\, \delta + \|R_n K x^* - x^*\|_X\,, \qquad (3.27b)$$
where

$$a_n \;=\; \max\Biggl\{ \Bigl\| \sum_{j=1}^{n} \rho_j\, \hat{x}_j \Bigr\|_X : \sum_{j=1}^{n} |\rho_j|^2 = 1 \Biggr\}\,, \qquad (3.28a)$$

$$b_n \;=\; \max\Biggl\{ \sqrt{\sum_{i=1}^{n} |\rho_i|^2} : \Bigl\| \sum_{i=1}^{n} \rho_i\, \hat{y}_i \Bigr\|_Y = 1 \Biggr\}\,, \qquad (3.28b)$$

and $\lambda_n > 0$ denotes the smallest singular value of the matrix $A$. We note that if $X$ or $Y$ are Hilbert spaces and $\{\hat{x}_j : j = 1, \ldots, n\}$ or $\{\hat{y}_i : i = 1, \ldots, n\}$, respectively, are orthonormal systems, then $a_n = 1$ or $b_n = 1$, respectively.

Proof:
(a) has been shown before.

(b) Again we use $\|x_n^\delta - x^*\|_X \le \|x_n^\delta - R_n y^*\|_X + \|R_n y^* - x^*\|_X$ and estimate the first term. We note that $R_n y^* = \sum_{j=1}^{n} \alpha_j \hat{x}_j$ with $A\alpha = \beta$, and thus

$$\|x_n^\delta - R_n y^*\|_X \;=\; \Bigl\| \sum_{j=1}^{n} (\alpha_j^\delta - \alpha_j)\, \hat{x}_j \Bigr\|_X \;\le\; a_n\, \|\alpha^\delta - \alpha\| \;=\; a_n\, \|A^{-1}(\beta^\delta - \beta)\| \;\le\; a_n\, \|A^{-1}\|_2\, \|\beta^\delta - \beta\| \;\le\; \frac{a_n}{\lambda_n}\, \delta\,,$$

where $\|A^{-1}\|_2$ denotes the spectral norm of $A^{-1}$, that is, the inverse of the smallest singular value of $A$. This yields (3.27a).
To prove (3.27b), we choose $y_n^\delta \in Y_n$ such that $(y_n^\delta, \hat{y}_i)_Y = \beta_i^\delta$ for $i = 1, \ldots, n$. Then $R_n y_n^\delta = x_n^\delta$ and thus

$$\|x_n^\delta - R_n y^*\|_X \;\le\; \|R_n\|_{L(Y,X)}\, \|y_n^\delta - Q_n y^*\|_Y \;=\; \|R_n\|_{L(Y,X)} \sup_{z_n \in Y_n} \frac{\bigl|(y_n^\delta - Q_n y^*, z_n)_Y\bigr|}{\|z_n\|_Y}$$
$$=\; \|R_n\|_{L(Y,X)} \sup_{\rho} \frac{\Bigl| \sum_{j=1}^{n} \rho_j\, (y_n^\delta - Q_n y^*, \hat{y}_j)_Y \Bigr|}{\Bigl\| \sum_{j=1}^{n} \rho_j\, \hat{y}_j \Bigr\|_Y} \;=\; \|R_n\|_{L(Y,X)} \sup_{\rho} \frac{\Bigl| \sum_{j=1}^{n} \rho_j\, (\beta_j^\delta - \beta_j) \Bigr|}{\Bigl\| \sum_{j=1}^{n} \rho_j\, \hat{y}_j \Bigr\|_Y}$$
$$\le\; \|R_n\|_{L(Y,X)}\, \|\beta^\delta - \beta\| \sup_{\rho} \frac{\sqrt{\sum_{j=1}^{n} |\rho_j|^2}}{\Bigl\| \sum_{j=1}^{n} \rho_j\, \hat{y}_j \Bigr\|_Y} \;\le\; \|R_n\|_{L(Y,X)}\, b_n\, \delta\,.$$

This ends the proof. $\square$
In the following three subsections, we derive error estimates for three special choices of the finite-dimensional subspaces $X_n$ and $Y_n$. The cases where $X_n$ and $Y_n$ are coupled by $Y_n = K(X_n)$ or $X_n = K^*(Y_n)$ lead to the least squares method and the dual least squares method, respectively. Here, $K^* : Y \to X$ denotes the adjoint of $K$. In Subsection 3.2.3, we study the Bubnov–Galerkin method for the case where $K$ satisfies Gårding's inequality. In all of the subsections, we formulate the Galerkin equations for the perturbed cases first without using particular bases and then with respect to given bases in $X_n$ and $Y_n$.
3.2.1 The Least Squares Method
An obvious method to solve an equation of the kind $Kx = y$ is the following: Given a finite-dimensional subspace $X_n \subset X$, determine $x_n \in X_n$ such that

$$\|Kx_n - y\|_Y \;\le\; \|Kz_n - y\|_Y \quad \text{for all } z_n \in X_n\,. \qquad (3.29)$$

Existence and uniqueness of $x_n \in X_n$ follow easily because $X_n$ is finite-dimensional and $K$ is one-to-one. The solution $x_n \in X_n$ of this least squares problem is characterized by

$$(Kx_n, Kz_n)_Y \;=\; (y, Kz_n)_Y \quad \text{for all } z_n \in X_n\,. \qquad (3.30a)$$

We observe that this method is a special case of the Galerkin method when we set $Y_n := K(X_n)$. Choosing a basis $\{\hat{x}_j : j = 1, \ldots, n\}$ of $X_n$ leads to the finite system

$$\sum_{j=1}^{n} \alpha_j\, (K\hat{x}_j, K\hat{x}_i)_Y \;=\; \beta_i \;=\; (y, K\hat{x}_i)_Y \quad \text{for } i = 1, \ldots, n\,, \qquad (3.30b)$$
or $A\alpha = \beta$. This has the form of (3.25) for $\hat{y}_i = K\hat{x}_i$. The corresponding matrix $A \in \mathbb{K}^{n \times n}$ with $A_{ij} = (K\hat{x}_j, K\hat{x}_i)_Y$ is symmetric (if $\mathbb{K} = \mathbb{R}$) or Hermitian (if $\mathbb{K} = \mathbb{C}$) and positive definite because $K$ is also one-to-one.

Again, we study the case where the exact right-hand side $y^*$ is perturbed by an error. For continuous perturbations, let $x_n^\delta \in X_n$ be the solution of

$$(Kx_n^\delta, Kz_n)_Y \;=\; (y^\delta, Kz_n)_Y \quad \text{for all } z_n \in X_n\,, \qquad (3.31a)$$

where $y^\delta \in Y$ is the perturbed right-hand side with $\|y^\delta - y^*\|_Y \le \delta$.

For the discrete perturbation, we assume that $\beta_i = (y^*, K\hat{x}_i)_Y$, $i = 1, \ldots, n$, is replaced by a vector $\beta^\delta \in \mathbb{K}^n$ with $\|\beta^\delta - \beta\| \le \delta$, where $\|\cdot\|$ denotes the Euclidean norm in $\mathbb{K}^n$. This leads to the following finite system of equations for the coefficients of $x_n^\delta = \sum_{j=1}^{n} \alpha_j^\delta \hat{x}_j$:

$$\sum_{j=1}^{n} \alpha_j^\delta\, (K\hat{x}_j, K\hat{x}_i)_Y \;=\; \beta_i^\delta \quad \text{for } i = 1, \ldots, n\,. \qquad (3.31b)$$

This system is uniquely solvable because the matrix $A$ is positive definite.

For least squares methods, the boundedness condition (3.18) is not satisfied without additional assumptions. We refer to [246] or [168], Problem 17.2, for an example. However, we can prove the following theorem.

Theorem 3.11 Let $K : X \to Y$ be a linear, bounded, and injective operator between Hilbert spaces and $X_n \subset X$ be finite-dimensional subspaces such that $\bigcup_{n \in \mathbb{N}} X_n$ is dense in $X$. Let $x^* \in X$ be the solution of $Kx^* = y^*$ and $x_n^\delta \in X_n$ be the least squares solution from (3.31a) or (3.31b). Define

$$\sigma_n \;:=\; \max\bigl\{ \|z_n\|_X : z_n \in X_n,\ \|Kz_n\|_Y = 1 \bigr\} \qquad (3.32)$$

and let there exist $c > 0$, independent of $n$, such that

$$\min_{z_n \in X_n} \bigl\{ \|x - z_n\|_X + \sigma_n\, \|K(x - z_n)\|_Y \bigr\} \;\le\; c\, \|x\|_X \quad \text{for all } x \in X\,. \qquad (3.33)$$

Then the least squares method is convergent and $\|R_n\|_{L(Y,X)} \le \sigma_n$. In this case, we have the error estimate

$$\|x^* - x_n^\delta\|_X \;\le\; b_n\, \sigma_n\, \delta + \tilde{c}\, \min\bigl\{ \|x^* - z_n\|_X : z_n \in X_n \bigr\} \qquad (3.34)$$

for some $\tilde{c} > 0$. Here, $b_n = 1$ if $x_n^\delta \in X_n$ solves (3.31a); that is, $\delta$ measures the continuous perturbation $\|y^\delta - y^*\|_Y$. If $\delta$ measures the discrete error $\|\beta^\delta - \beta\|$ in the Euclidean norm and $x_n^\delta = \sum_{j=1}^{n} \alpha_j^\delta \hat{x}_j \in X_n$, where $\alpha^\delta$ solves (3.31b), then $b_n$ is given by

$$b_n \;=\; \max\Biggl\{ \sqrt{\sum_{j=1}^{n} |\rho_j|^2} : \Bigl\| K \sum_{j=1}^{n} \rho_j\, \hat{x}_j \Bigr\|_Y = 1 \Biggr\}\,. \qquad (3.35)$$

If $\{\hat{x}_j : j = 1, \ldots, n\}$ is an orthonormal basis of $X_n$, then $b_n = \sigma_n$.
Proof: The aim is to apply Theorem 3.10. First we prove that the norms $\|R_n K\|_{L(X)}$ are bounded uniformly in $n$. Let $x \in X$ and $x_n := R_n K x$. Then $x_n$ satisfies $(Kx_n, Kz_n)_Y = (Kx, Kz_n)_Y$ for all $z_n \in X_n$. This yields

$$\|K(x_n - z_n)\|_Y^2 \;=\; \bigl(K(x_n - z_n), K(x_n - z_n)\bigr)_Y \;=\; \bigl(K(x - z_n), K(x_n - z_n)\bigr)_Y \;\le\; \|K(x - z_n)\|_Y\, \|K(x_n - z_n)\|_Y$$

and thus $\|K(x_n - z_n)\|_Y \le \|K(x - z_n)\|_Y$ for all $z_n \in X_n$. Using this and the definition of $\sigma_n$, we conclude that $\|x_n - z_n\|_X \le \sigma_n \|K(x_n - z_n)\|_Y \le \sigma_n \|K(x - z_n)\|_Y$, and thus

$$\|x_n\|_X \;\le\; \|x_n - z_n\|_X + \|z_n - x\|_X + \|x\|_X \;\le\; \|x\|_X + \|z_n - x\|_X + \sigma_n\, \|K(x - z_n)\|_Y\,.$$

This holds for all $z_n \in X_n$. Taking the minimum, we have by assumption (3.33) that $\|x_n\|_X \le (1 + c)\|x\|_X$. Thus the boundedness condition (3.18) is satisfied. The application of Theorem 3.7 proves convergence.

Analogously, we prove the estimate for $\|R_n\|_{L(Y,X)}$. Let $y \in Y$ and set $x_n := R_n y$. Then from (3.30a), we have $\|Kx_n\|_Y^2 = (Kx_n, Kx_n)_Y = (y, Kx_n)_Y \le \|y\|_Y \|Kx_n\|_Y$, and thus $\|x_n\|_X \le \sigma_n \|Kx_n\|_Y \le \sigma_n \|y\|_Y$. This proves the estimate $\|R_n\|_{L(Y,X)} \le \sigma_n$.

The error estimates (3.34) follow directly from Theorem 3.10 and the estimates (3.26) and (3.27b) for $\hat{y}_j = K\hat{x}_j$. $\square$

For further numerical aspects of least squares methods, we refer to [79, 82, 107, 150, 188, 189, 201, 202].
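A quick numerical sanity check of the least squares system (3.30b) (our discretization; the operator and basis are our choices): when the exact solution already lies in $X_n$ and the data are exact, the method must recover it, since $K$ is one-to-one on $X_n$:

```python
import numpy as np

m, n = 240, 6                          # fine grid size; coarse basis dimension
K = np.tril(np.ones((m, m))) / m       # discretized integration operator on [0, 1]

# Piecewise-constant basis on n equal panels, sampled on the fine grid.
B = np.zeros((m, n))
for j in range(n):
    B[j * (m // n):(j + 1) * (m // n), j] = 1.0

c_true = np.array([1.0, -2.0, 0.5, 3.0, -1.0, 2.0])
x_true = B @ c_true                    # x* lies in X_n
y = K @ x_true

KB = K @ B                             # the images K x̂_j
A = KB.T @ KB / m                      # entries (K x̂_j, K x̂_i)_Y of (3.30b)
beta = KB.T @ y / m                    # entries (y, K x̂_i)_Y
coef = np.linalg.solve(A, beta)
x_ls = B @ coef
```

The matrix $A$ is symmetric positive definite, as stated above, and the solve returns the coefficients of $x^*$ up to rounding. With noisy $\beta^\delta$, the amplification of the data error is governed by the factors $b_n \sigma_n$ from (3.34).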
3.2.2 The Dual Least Squares Method
As the next example of the Galerkin method, we study the dual least squares method. We will see that the boundedness condition (3.18) is always satisfied. We assume, in addition to the general assumptions of this section, that the range $\mathcal{R}(K)$ of $K$ is dense in $Y$. Given any finite-dimensional subspace $Y_n \subset Y$, determine $u_n \in Y_n$ such that

$$(KK^* u_n, z_n)_Y \;=\; (y, z_n)_Y \quad \text{for all } z_n \in Y_n\,, \qquad (3.36)$$

where $K^* : Y \to X$ denotes the adjoint of $K$. Then $x_n := K^* u_n$ is called the dual least squares solution. It is a special case of the Galerkin method when we set $X_n := K^*(Y_n)$. Writing equation (3.36) for $y = Kx$ in the form

$$(K^* u_n, K^* z_n)_X \;=\; (x, K^* z_n)_X \quad \text{for all } z_n \in Y_n\,,$$
we observe that the dual least squares method is just the least squares method for the equation $K^* u = x$. This explains the name.

We assume again that the exact right-hand side is perturbed. Let $y^\delta \in Y$ with $\|y^\delta - y^*\|_Y \le \delta$. Instead of equation (3.36), one determines $x_n^\delta = K^* u_n^\delta \in X_n$ with

$$(K^* u_n^\delta, K^* z_n)_X \;=\; (y^\delta, z_n)_Y \quad \text{for all } z_n \in Y_n\,. \qquad (3.37)$$

For discrete perturbations, we choose a basis $\{\hat{y}_j : j = 1, \ldots, n\}$ of $Y_n$ and assume that the right-hand sides $\beta_i = (y^*, \hat{y}_i)_Y$, $i = 1, \ldots, n$, of the Galerkin equations are perturbed by a vector $\beta^\delta \in \mathbb{K}^n$ with $\|\beta^\delta - \beta\| \le \delta$, where $\|\cdot\|$ denotes the Euclidean norm in $\mathbb{K}^n$. Instead of (3.36), we determine

$$x_n^\delta \;=\; K^* u_n^\delta \;=\; \sum_{j=1}^{n} \alpha_j^\delta\, K^* \hat{y}_j\,,$$

where $\alpha^\delta \in \mathbb{K}^n$ solves

$$\sum_{j=1}^{n} \alpha_j^\delta\, (K^* \hat{y}_j, K^* \hat{y}_i)_X \;=\; \beta_i^\delta \quad \text{for } i = 1, \ldots, n\,. \qquad (3.38)$$
First we show that equations (3.37) and (3.38) are uniquely solvable. $K^* : Y \to X$ is one-to-one because the range $\mathcal{R}(K)$ is dense in $Y$. Thus the dimensions of $Y_n$ and $X_n$ coincide, and $K^*$ is an isomorphism from $Y_n$ onto $X_n$. It is sufficient to prove the uniqueness of a solution to (3.37). Let $u_n \in Y_n$ with $(K^* u_n, K^* z_n)_X = 0$ for all $z_n \in Y_n$. For $z_n = u_n$ we conclude that $0 = (K^* u_n, K^* u_n)_X = \|K^* u_n\|_X^2$, that is, $K^* u_n = 0$ and hence $u_n = 0$.

Convergence and error estimates are proven in the following theorem.

Theorem 3.12 Let $X$ and $Y$ be Hilbert spaces and $K : X \to Y$ be linear, bounded, and one-to-one such that the range $\mathcal{R}(K)$ is dense in $Y$. Let $Y_n \subset Y$ be finite-dimensional subspaces such that $\bigcup_{n\in\mathbb{N}} Y_n$ is dense in $Y$. Let $x^* \in X$ be the solution of $K x^* = y^*$. Then the Galerkin equations (3.37) and (3.38) are uniquely solvable for every right-hand side and every $n \in \mathbb{N}$. The dual least squares method is convergent and
$$\|R_n\|_{\mathcal{L}(Y,X)} \le \sigma_n := \max\bigl\{\|z_n\|_Y : z_n \in Y_n,\ \|K^* z_n\|_X = 1\bigr\}. \tag{3.39}$$
Furthermore, we have the error estimate
$$\|x^* - x_n^\delta\|_X \le b_n\, \sigma_n\, \delta + c\, \min\bigl\{\|x^* - z_n\|_X : z_n \in K^*(Y_n)\bigr\} \tag{3.40}$$
for some $c > 0$. Here, $b_n = 1$ if $x_n^\delta \in X_n$ solves (3.37); that is, $\delta$ measures the norm $\|y^\delta - y^*\|_Y$ in $Y$. If $\delta$ measures the discrete error $\|\beta^\delta - \beta\|$ and $x_n^\delta = \sum_{j=1}^n \alpha_j^\delta K^* \hat{y}_j \in X_n$, where $\alpha^\delta$ solves (3.38), then $b_n$ is given by
$$b_n = \max\left\{ \Bigl( \sum_{j=1}^n |\rho_j|^2 \Bigr)^{1/2} : \Bigl\| \sum_{j=1}^n \rho_j \hat{y}_j \Bigr\|_Y = 1 \right\}. \tag{3.41}$$
We note that $b_n = 1$ if $\{\hat{y}_j : j = 1, \ldots, n\}$ forms an orthonormal system in $Y$.
Proof: We have already seen that (3.37) and (3.38) are uniquely solvable for every right-hand side and every $n \in \mathbb{N}$. Now we prove the estimate $\|R_n K\|_{\mathcal{L}(X)} \le 1$, that is, condition (3.18) with $c = 1$. Let $x \in X$ and set $x_n := R_n K x \in X_n$. Then $x_n = K^* u_n$, and $u_n \in Y_n$ satisfies $(K^* u_n, K^* z_n)_X = (Kx, z_n)_Y$ for all $z_n \in Y_n$. For $z_n = u_n$ this implies
$$\|x_n\|_X^2 = \|K^* u_n\|_X^2 = (Kx, u_n)_Y = (x, K^* u_n)_X \le \|x\|_X\, \|x_n\|_X,$$
which proves the desired estimate. If we replace $Kx$ by $y$ in the preceding arguments, we have
$$\|x_n\|_X^2 \le \|y\|_Y\, \|u_n\|_Y \le \sigma_n\, \|y\|_Y\, \|K^* u_n\|_X = \sigma_n\, \|y\|_Y\, \|x_n\|_X,$$
which proves (3.39). Finally, we show that $\bigcup_{n\in\mathbb{N}} X_n$ is dense in $X$. Let $x \in X$ and $\varepsilon > 0$. Because $K^*(Y)$ is dense in $X$, there exists $y \in Y$ with $\|x - K^* y\|_X < \varepsilon/2$. Because $\bigcup_{n\in\mathbb{N}} Y_n$ is dense in $Y$, there exists $y_n \in Y_n$ with $\|y - y_n\|_Y < \varepsilon/(2\|K\|_{\mathcal{L}(X,Y)})$. The triangle inequality yields that for $x_n := K^* y_n \in X_n$,
$$\|x - x_n\|_X \le \|x - K^* y\|_X + \|K^*(y - y_n)\|_X \le \varepsilon.$$
The application of Theorem 3.10 and the estimates (3.26) and (3.27b) proves (3.40). $\Box$
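A finite-dimensional sketch may clarify the construction of the dual least squares method. Here $X = Y = \mathbb{R}^3$, $K$ is an injective matrix, and $Y_n$ is spanned by two vectors; all concrete numbers are our own illustrative choices, not from the text. For exact data $y = Kx$, the proof above gives $\|x_n\|_X \le \|x\|_X$, which the script checks.

```python
# Dual least squares in R^3: solve (K* u_n, K* y_i) = (y, y_i) for u_n in Y_n
# and set x_n = K* u_n.  K, x, and the basis of Y_n are made-up data.
def mat_vec(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def mat_T_vec(A, v):            # A^T v, i.e. the adjoint K* for real matrices
    return [sum(A[i][j] * v[i] for i in range(len(A))) for j in range(len(A[0]))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

K = [[2.0, 1.0, 0.0],
     [0.0, 1.0, 1.0],
     [1.0, 0.0, 3.0]]           # nonsingular, hence one-to-one
x = [1.0, -2.0, 0.5]
y = mat_vec(K, x)               # exact data y = K x

y1, y2 = [1.0, 1.0, 0.0], [0.0, 1.0, 1.0]    # basis of Y_n
k1, k2 = mat_T_vec(K, y1), mat_T_vec(K, y2)  # K* y_j, spanning X_n = K*(Y_n)

# Galerkin system for the coefficients of u_n in the basis y1, y2.
a11, a12 = dot(k1, k1), dot(k2, k1)
a21, a22 = dot(k1, k2), dot(k2, k2)
r1, r2 = dot(y, y1), dot(y, y2)
det = a11 * a22 - a12 * a21
c1 = (r1 * a22 - r2 * a12) / det
c2 = (a11 * r2 - a21 * r1) / det
x_n = [c1 * u + c2 * v for u, v in zip(k1, k2)]

print(dot(x_n, x_n) <= dot(x, x) + 1e-12)    # boundedness ||x_n|| <= ||x||
```

In fact, for $y = Kx$ the dual least squares solution is exactly the orthogonal projection of $x$ onto $X_n = K^*(Y_n)$, since $(x_n - x, K^* z_n)_X = 0$ for all $z_n \in Y_n$; this is the structural reason why $\|R_n K\|_{\mathcal{L}(X)} \le 1$.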
3.2.3 The Bubnov–Galerkin Method for Coercive Operators
In this subsection, we assume that $Y = X$, that $K : X \to X$ is a linear and bounded operator, and that $X_n$, $n \in \mathbb{N}$, are finite-dimensional subspaces. The Galerkin method reduces to the problem of determining $x_n \in X_n$ such that
$$(K x_n, z_n)_X = (y, z_n)_X \quad \text{for all } z_n \in X_n. \tag{3.42}$$
This special case is called the Bubnov–Galerkin method. Again, we consider two kinds of perturbations of the right-hand side. If $y^\delta \in X$ with $\|y^\delta - y^*\|_X \le \delta$ is a perturbed right-hand side, then instead of (3.42) we study the equation
$$(K x_n^\delta, z_n)_X = (y^\delta, z_n)_X \quad \text{for all } z_n \in X_n. \tag{3.43}$$
The other possibility is to choose a basis $\{\hat{x}_j : j = 1, \ldots, n\}$ of $X_n$ and assume that the right-hand sides $\beta_i = (y^*, \hat{x}_i)_X$, $i = 1, \ldots, n$, of the Galerkin equations are perturbed by a vector $\beta^\delta \in \mathbb{K}^n$ with $\|\beta^\delta - \beta\| \le \delta$, where $\|\cdot\|$ again denotes the Euclidean norm in $\mathbb{K}^n$. In this case, instead of (3.42), we have to solve
$$\sum_{j=1}^n \alpha_j^\delta\, (K \hat{x}_j, \hat{x}_i)_X = \beta_i^\delta \quad \text{for } i = 1, \ldots, n, \tag{3.44}$$
for $\alpha^\delta \in \mathbb{K}^n$, and set $x_n^\delta = \sum_{j=1}^n \alpha_j^\delta \hat{x}_j$.
Before we prove a convergence result for this method, we briefly describe the Rayleigh–Ritz method and show that it is a special case of the Bubnov–Galerkin method. Let $K : X \to X$ also be self-adjoint and positive definite, that is, $(Kx, y)_X = (x, Ky)_X$ for all $x, y \in X$ and $(Kx, x)_X > 0$ for all $x \ne 0$. We define the functional
$$\psi(z) := (Kz, z)_X - 2\,\mathrm{Re}\,(y, z)_X \quad \text{for } z \in X. \tag{3.45}$$
From the equation
$$\psi(z) - \psi(x) = 2\,\mathrm{Re}\,(Kx - y, z - x)_X + (K(z - x), z - x)_X \tag{3.46}$$
and the positivity of $K$, we easily conclude (see Problem 3.2) that $x \in X$ is the unique minimum of $\psi$ if and only if $x$ solves $Kx = y$. The Rayleigh–Ritz method is to minimize $\psi$ over the finite-dimensional subspace $X_n$. From (3.46), we see that if $x_n \in X_n$ minimizes $\psi$ on $X_n$, then, for $z_n = x_n \pm \varepsilon u_n$ with $u_n \in X_n$ and $\varepsilon > 0$, we have
$$0 \le \psi(z_n) - \psi(x_n) = \pm \varepsilon\, 2\,\mathrm{Re}\,(K x_n - y, u_n)_X + \varepsilon^2\, (K u_n, u_n)_X$$
for all $u_n \in X_n$. Dividing by $\varepsilon > 0$ and letting $\varepsilon \to 0$ yields that $x_n \in X_n$ satisfies the Galerkin equation (3.42). If, on the other hand, $x_n \in X_n$ solves (3.42), then (3.46) yields
$$\psi(z_n) - \psi(x_n) = (K(z_n - x_n), z_n - x_n)_X \ge 0 \quad \text{for all } z_n \in X_n.$$
Therefore, the Rayleigh–Ritz method is identical to the Bubnov–Galerkin method.

Now we generalize the Rayleigh–Ritz method and study the Bubnov–Galerkin method for the important class of coercive operators in Gelfand triples $V \subset X \subset V'$. For the definition of a Gelfand triple and of coercive operators $K : V' \to V$, we refer to Definition A.26 of Appendix A.3. We just recall that $V'$ is the space of bounded antilinear functionals on $V$, with corresponding bounded sesquilinear form $\langle \cdot, \cdot \rangle : V' \times V \to \mathbb{K}$ extending the inner product in $X$, that is, $\langle x, v \rangle = (x, v)_X$ for all $v \in V$ and $x \in X$. Coercivity of an operator $K$ from $V'$ into $V$ means that there exists a constant $\gamma > 0$ with
$$\langle x, Kx \rangle \ge \gamma\, \|x\|_{V'}^2 \quad \text{for all } x \in V'.$$
This condition implies that $K$ is an isomorphism from $V'$ onto $V$; see the remark following Definition A.26. If $V$ is compactly imbedded in $X$, that is, $j : V \to X$ is compact, then $K|_X$ is a compact operator from $X$ into itself. Therefore, in this subsection we measure the compactness by the "smoothing" of $K$ from $V'$ onto $V$ rather than by a decay of the singular values.

Now we can prove the main theorem about convergence of the Bubnov–Galerkin method for coercive operators in Gelfand triples.
Theorem 3.13 Let $V \subset X \subset V'$ be a Gelfand triple, and let $X_n \subset V$ be finite-dimensional subspaces such that $\bigcup_{n\in\mathbb{N}} X_n$ is dense in $X$. Let $K : V' \to V$ be coercive in the sense of Definition A.26 with constant $\gamma > 0$. Let $x^* \in X$ be the solution of $K x^* = y^*$ for some $y^* \in V$. Then we have the following:

(a) There exist unique solutions of the Galerkin equations (3.42)–(3.44). The Bubnov–Galerkin solutions $x_n \in X_n$ of (3.42) converge in $V'$ with
$$\|x^* - x_n\|_{V'} \le c\, \min\bigl\{\|x^* - z_n\|_{V'} : z_n \in X_n\bigr\} \tag{3.47}$$
for some $c > 0$.

(b) Define $\rho_n > 0$ by
$$\rho_n = \max\bigl\{\|z_n\|_X : z_n \in X_n,\ \|z_n\|_{V'} = 1\bigr\} \tag{3.48}$$
and let $P_n$ be the orthogonal projection operator from $X$ onto $X_n$. The Bubnov–Galerkin solutions converge in $X$ (rather than in the weaker norm of $V'$) if there exists $c > 0$ with
$$\|x - P_n x\|_{V'} \le \frac{c}{\rho_n}\, \|x\|_X \quad \text{for all } x \in X. \tag{3.49}$$
In this case, we have the estimates
$$\|R_n\|_{\mathcal{L}(X)} \le \frac{\rho_n^2}{\gamma} \tag{3.50}$$
and
$$\|x^* - x_n^\delta\|_X \le c \left[ b_n\, \frac{\rho_n^2}{\gamma}\, \delta + \min\bigl\{\|x^* - z_n\|_X : z_n \in X_n\bigr\} \right] \tag{3.51}$$
for some $c > 0$. Here, $b_n = 1$ if $x_n^\delta \in X_n$ solves (3.43); that is, $\delta$ measures the norm $\|y^\delta - y^*\|_X$ in $X$. If $\delta$ measures the discrete error $\|\beta^\delta - \beta\|$ in the Euclidean norm and $x_n^\delta = \sum_{j=1}^n \alpha_j^\delta \hat{x}_j \in X_n$, where $\alpha^\delta$ solves (3.44), then $b_n$ is given by
$$b_n = \max\left\{ \Bigl( \sum_{j=1}^n |a_j|^2 \Bigr)^{1/2} : \Bigl\| \sum_{j=1}^n a_j \hat{x}_j \Bigr\|_X = 1 \right\}. \tag{3.52}$$
Again, we note that $b_n = 1$ if $\{\hat{x}_j : j = 1, \ldots, n\}$ forms an orthonormal system in $X$.

Proof: (a) We apply Theorem 3.7 to the equations $Kx = y$, $x \in V'$, and $P_n K x_n = P_n y$, $x_n \in X_n$, where we consider $K$ as an operator from $V'$ into $V$. We observe that the projection operator $P_n$ is also bounded from $V$ into $X_n$, where we consider $X_n$ as a subspace of $V'$. This follows from the observation that on the finite-dimensional space $X_n$ the norms $\|\cdot\|_X$ and $\|\cdot\|_{V'}$ are equivalent, and thus
$$\|P_n u\|_{V'} \le c\, \|P_n u\|_X \le c\, \|u\|_X \le \tilde{c}\, \|u\|_V \quad \text{for } u \in V.$$
The constants $c$, and thus $\tilde{c}$, depend on $n$. Because $V$ is dense in $X$ and $X$ is dense in $V'$, we conclude that $\bigcup_{n\in\mathbb{N}} X_n$ is also dense in $V'$. To apply Theorem 3.7, we have to show that (3.42) is uniquely solvable in $X_n$ and that $R_n K : V' \to X_n \subset V'$ is uniformly bounded with respect to $n$. Because (3.42) is a finite-dimensional quadratic system, it is sufficient to prove uniqueness. Let $x_n \in X_n$ satisfy (3.42) for $y = 0$. Because $K$ is coercive, we have
$$\gamma\, \|x_n\|_{V'}^2 \le \langle x_n, K x_n \rangle = (x_n, K x_n)_X = 0;$$
thus $x_n = 0$. Now let $x \in V'$ and set $x_n = R_n K x$. Then $x_n \in X_n$ satisfies
$$(K x_n, z_n)_X = (K x, z_n)_X \quad \text{for all } z_n \in X_n. \tag{3.53}$$
Again, we conclude that
$$\gamma\, \|x_n\|_{V'}^2 \le \langle x_n, K x_n \rangle = (x_n, K x_n)_X = (x_n, K x)_X = \langle x_n, K x \rangle \le \|K x\|_V\, \|x_n\|_{V'},$$
and thus
$$\|x_n\|_{V'} \le \frac{1}{\gamma}\, \|K x\|_V \le \frac{1}{\gamma}\, \|K\|_{\mathcal{L}(V',V)}\, \|x\|_{V'}.$$
Because this holds for all $x \in V'$, we conclude that
$$\|R_n K\|_{\mathcal{L}(V')} \le \frac{1}{\gamma}\, \|K\|_{\mathcal{L}(V',V)}.$$
Then the assumptions of Theorem 3.7 are satisfied for $K : V' \to V$.

(b) In this part we wish to apply Theorem 3.10. Let $x \in X$ and $x_n = R_n K x$. Using the estimates (3.47) and (3.49), we conclude that
$$\begin{aligned}
\|x - x_n\|_X &\le \|x - P_n x\|_X + \|P_n x - x_n\|_X \\
&\le \|x - P_n x\|_X + \rho_n\, \|P_n x - x_n\|_{V'} \\
&\le \|x - P_n x\|_X + \rho_n\, \|P_n x - x\|_{V'} + \rho_n\, \|x - x_n\|_{V'} \\
&\le \|x - P_n x\|_X + \rho_n\, \|P_n x - x\|_{V'} + c\, \rho_n \min_{z_n \in X_n} \|x - z_n\|_{V'} \\
&\le \|x - P_n x\|_X + (c + 1)\, \rho_n\, \|P_n x - x\|_{V'} \\
&\le 2\, \|x\|_X + c_1\, \|x\|_X = (2 + c_1)\, \|x\|_X,
\end{aligned}$$
and thus $\|x_n\|_X \le \|x_n - x\|_X + \|x\|_X \le (3 + c_1)\, \|x\|_X$. Therefore, the operators $R_n K$ are uniformly bounded in $\mathcal{L}(X)$. Next, we prove the estimate of $R_n$ in $\mathcal{L}(X)$. Let $y \in X$ and $x_n = R_n y$. We estimate
$$\gamma\, \|x_n\|_{V'}^2 \le \langle x_n, K x_n \rangle = (x_n, K x_n)_X = (x_n, y)_X \le \|y\|_X\, \|x_n\|_X \le \rho_n\, \|y\|_X\, \|x_n\|_{V'},$$
and thus
$$\|x_n\|_X \le \rho_n\, \|x_n\|_{V'} \le \frac{1}{\gamma}\, \rho_n^2\, \|y\|_X,$$
which proves the estimate (3.50). The application of Theorem 3.10 yields the estimate (3.51). $\Box$

From our general perturbation theorem (Theorem 3.8), we observe that the assumption of $K$ being coercive can be weakened. It is sufficient to assume that $K$ is one-to-one and satisfies Gårding's inequality. We formulate the result in the next theorem.

Theorem 3.14 The assertions of Theorem 3.13 also hold if $K : V' \to V$ is one-to-one and satisfies Gårding's inequality with some compact operator $C : V' \to V$.

For further reading, we refer to [204] and the monographs [17, 168, 182].
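The equivalence of the Bubnov–Galerkin and Rayleigh–Ritz methods for self-adjoint positive definite $K$ can be observed directly in a small example. The following sketch is illustrative only: the matrix $K$, the right-hand side $y$, and the trial basis are made-up data, not from the text.

```python
# Bubnov-Galerkin = Rayleigh-Ritz for self-adjoint positive definite K:
# the Galerkin solution on X_n minimizes psi(z) = (Kz, z) - 2 (y, z) there.
K = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]             # symmetric positive definite (made-up)
y = [1.0, 2.0, 3.0]

def mat_vec(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def psi(z):                        # the Rayleigh-Ritz functional (3.45)
    return dot(mat_vec(K, z), z) - 2 * dot(y, z)

b1, b2 = [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]    # basis of the trial space X_n

# Galerkin equations (K x_n, b_i) = (y, b_i): a 2x2 linear system.
a11, a12 = dot(mat_vec(K, b1), b1), dot(mat_vec(K, b2), b1)
a21, a22 = dot(mat_vec(K, b1), b2), dot(mat_vec(K, b2), b2)
r1, r2 = dot(y, b1), dot(y, b2)
det = a11 * a22 - a12 * a21
c1 = (r1 * a22 - r2 * a12) / det
c2 = (a11 * r2 - a21 * r1) / det
x_n = [c1 * u + c2 * v for u, v in zip(b1, b2)]

# Any other element of X_n gives a strictly larger value of psi.
z = [x_n[i] + 0.5 * b1[i] - 1.5 * b2[i] for i in range(3)]
print(psi(x_n) <= psi(z))          # True
```

By (3.46), $\psi(z) - \psi(x_n) = (K(z - x_n), z - x_n)_X \ge 0$ for every $z \in X_n$, so the comparison above holds exactly (up to rounding), not just for this particular competitor.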
3.3 Application to Symm's Integral Equation of the First Kind
In this section, we apply the Galerkin methods to an integral equation of the first kind that occurs in potential theory. We study the Dirichlet problem for harmonic functions, that is, solutions of the Laplace equation subject to a boundary condition:
$$\Delta u = 0 \ \text{ in } \Omega, \qquad u = f \ \text{ on } \partial\Omega, \tag{3.54}$$
where $\Omega \subset \mathbb{R}^2$ is some bounded, simply connected region with analytic boundary $\partial\Omega$ and $f \in C(\partial\Omega)$ is some given function. The single layer potential
$$u(x) = -\frac{1}{\pi} \int_{\partial\Omega} \phi(y)\, \ln|x - y|\, ds(y), \quad x \in \Omega, \tag{3.55}$$
solves the boundary value problem (3.54) if and only if the density $\phi \in C(\partial\Omega)$ solves Symm's equation
$$-\frac{1}{\pi} \int_{\partial\Omega} \phi(y)\, \ln|x - y|\, ds(y) = f(x) \quad \text{for } x \in \partial\Omega; \tag{3.56}$$
see [53]. It is well-known (see [141]) that in general the corresponding integral operator is not one-to-one. One has to make assumptions on the transfinite diameter of $\Omega$; see [274]. We give a more elementary assumption in the following theorem.

Theorem 3.15 Suppose there exists $z_0 \in \Omega$ with $|x - z_0| \ne 1$ for all $x \in \partial\Omega$. Then the only solution $\phi \in C(\partial\Omega)$ of Symm's equation (3.56) for $f = 0$ is $\phi = 0$; that is, the integral operator is one-to-one.
Proof: We give a more elementary proof than in [142], but we still need a few results from potential theory. From the continuity of $x \mapsto |x - z_0|$, we conclude that either $|x - z_0| < 1$ for all $x \in \partial\Omega$ or $|x - z_0| > 1$ for all $x \in \partial\Omega$. Assume first that $|x - z_0| < 1$ for all $x \in \partial\Omega$ and choose a small disk $A \subset \Omega$ with center $z_0$ such that $|x - z| < 1$ for all $x \in \partial\Omega$ and $z \in A$. Let $\phi \in C(\partial\Omega)$ satisfy (3.56) for $f = 0$ and define $u$ by
$$u(x) = -\frac{1}{\pi} \int_{\partial\Omega} \phi(y)\, \ln|x - y|\, ds(y) \quad \text{for } x \in \mathbb{R}^2.$$
From potential theory (see [53]), we conclude that $u$ is continuous in $\mathbb{R}^2$, harmonic in $\mathbb{R}^2 \setminus \partial\Omega$, and vanishes on $\partial\Omega$. The maximum principle for harmonic functions implies that $u$ vanishes in $\Omega$. We show that $u$ also vanishes in the exterior $\Omega_e$ of $\Omega$. The main part is to prove that
$$\hat{\phi} := \int_{\partial\Omega} \phi(y)\, ds(y) = 0.$$
Without loss of generality, we can assume that $\hat{\phi} \ge 0$. We study the harmonic function $v$ defined by
$$v(x) := u(x) + \frac{\hat{\phi}}{\pi}\, \ln|x - z| = \frac{1}{\pi} \int_{\partial\Omega} \phi(y)\, \ln\frac{|x - z|}{|x - y|}\, ds(y), \quad x \in \Omega_e,$$
for some $z \in A$. From the choice of $A$, we have
$$v(x) = \frac{\hat{\phi}}{\pi}\, \ln|x - z| \le 0 \quad \text{for } x \in \partial\Omega.$$
We study the asymptotic behavior of $v(x)$ as $|x|$ tends to infinity. Elementary calculations show that
$$\frac{|x - z|}{|x - y|} = 1 + \frac{1}{|x|}\, \hat{x} \cdot (y - z) + \mathcal{O}\bigl(1/|x|^2\bigr)$$
for $|x| \to \infty$ uniformly in $y \in \partial\Omega$ and $z \in A$, where $\hat{x} := x/|x|$. In particular, $v(x)$ tends to zero as $|x|$ tends to infinity. The maximum principle applied to $v$ in $\Omega_e$ yields that $v(x) \le 0$ for all $x \in \Omega_e$. From the asymptotic formula and $\ln(1 + \varepsilon) = \varepsilon + \mathcal{O}(\varepsilon^2)$, we conclude furthermore that
$$v(x) = \frac{1}{\pi |x|}\, \hat{x} \cdot \int_{\partial\Omega} \phi(y)\,(y - z)\, ds(y) + \mathcal{O}\bigl(1/|x|^2\bigr)$$
and thus
$$\hat{x} \cdot \int_{\partial\Omega} \phi(y)\,(y - z)\, ds(y) \le 0 \quad \text{for all } |\hat{x}| = 1.$$
This implies that
$$\int_{\partial\Omega} \phi(y)\, y\, ds(y) = z \int_{\partial\Omega} \phi(y)\, ds(y).$$
Because this holds for all $z \in A$, we conclude that $\int_{\partial\Omega} \phi(y)\, ds(y) = 0$. Now we see from the definition of $v$ (for any fixed $z \in A$) that
$$u(x) = v(x) \to 0 \quad \text{as } |x| \to \infty.$$
The maximum principle again yields $u = 0$ in $\Omega_e$. Finally, the jump conditions for the normal derivative of the single layer potential (see [53]) yield
$$2\, \phi(x) = \lim_{\varepsilon \to 0^+} \bigl[ \nabla u\bigl(x - \varepsilon \nu(x)\bigr) - \nabla u\bigl(x + \varepsilon \nu(x)\bigr) \bigr] \cdot \nu(x) = 0$$
for $x \in \partial\Omega$, where $\nu(x)$ denotes the unit normal vector at $x \in \partial\Omega$ directed into the exterior of $\Omega$. This ends the proof for the case $\max_{x\in\partial\Omega} |x - z_0| < 1$. The case $\min_{x\in\partial\Omega} |x - z_0| > 1$ is settled by the same arguments. $\Box$

Now we assume that the boundary $\partial\Omega$ has a parametrization of the form
$$x = \gamma(s),$$
$s \in [0, 2\pi]$, for some $2\pi$-periodic analytic function $\gamma : [0, 2\pi] \to \mathbb{R}^2$ that satisfies $|\dot{\gamma}(s)| > 0$ for all $s \in [0, 2\pi]$. Then Symm's equation (3.56) takes the form
$$-\frac{1}{\pi} \int_0^{2\pi} \psi(s)\, \ln|\gamma(t) - \gamma(s)|\, ds = g(t) := f(\gamma(t)) \quad \text{for } t \in [0, 2\pi] \tag{3.57}$$
for the transformed density $\psi(s) := \phi(\gamma(s))\, |\dot{\gamma}(s)|$, $s \in [0, 2\pi]$. For the special case where $\Omega$ is the disk with center $0$ and radius $a > 0$, we have $\gamma_a(s) = a\,(\cos s, \sin s)$ and thus
$$\ln|\gamma_a(t) - \gamma_a(s)| = \ln a + \frac{1}{2}\, \ln\Bigl( 4 \sin^2 \frac{t-s}{2} \Bigr). \tag{3.58}$$
For general boundaries, we can split the kernel in the form
$$-\frac{1}{\pi}\, \ln|\gamma(t) - \gamma(s)| = -\frac{1}{2\pi}\, \ln\Bigl( 4 \sin^2 \frac{t-s}{2} \Bigr) + k(t, s), \quad t \ne s, \tag{3.59}$$
for some function $k$ that is analytic for $t \ne s$. From the mean value theorem, we conclude that
$$\lim_{s \to t} k(t, s) = -\frac{1}{\pi}\, \ln|\dot{\gamma}(t)|.$$
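The limit of $k(t,s)$ on the diagonal can be checked numerically for a concrete boundary. The ellipse $\gamma(s) = (2\cos s, \sin s)$ below is an arbitrary test curve of our own choosing, not one used in the text.

```python
import math

def gamma(s):                      # test curve: ellipse with half-axes 2 and 1
    return (2 * math.cos(s), math.sin(s))

def gamma_dot_norm(s):             # |gamma'(s)|
    return math.hypot(-2 * math.sin(s), math.cos(s))

def k(t, s):                       # remainder kernel from the splitting (3.59)
    gx, gy = gamma(t)
    hx, hy = gamma(s)
    dist = math.hypot(gx - hx, gy - hy)
    return (-math.log(dist) / math.pi
            + math.log(4 * math.sin((t - s) / 2) ** 2) / (2 * math.pi))

t = 0.7
print(k(t, t + 1e-4))                          # approaches the limit below
print(-math.log(gamma_dot_norm(t)) / math.pi)  # -(1/pi) ln|gamma'(t)|
```

Near the diagonal, $\ln|\gamma(t)-\gamma(s)| \approx \ln\bigl(|\dot{\gamma}(t)|\,|t-s|\bigr)$ while $\tfrac{1}{2}\ln\bigl(4\sin^2\tfrac{t-s}{2}\bigr) \approx \ln|t-s|$, so the logarithmic singularities cancel and only $-\tfrac{1}{\pi}\ln|\dot{\gamma}(t)|$ survives.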
This implies that $k$ has an analytic continuation onto $[0, 2\pi] \times [0, 2\pi]$. With this splitting, the integral equation (3.57) takes the form
$$-\frac{1}{2\pi} \int_0^{2\pi} \psi(s)\, \ln\Bigl( 4 \sin^2 \frac{t-s}{2} \Bigr)\, ds + \int_0^{2\pi} \psi(s)\, k(t, s)\, ds = g(t) \tag{3.60}$$
for $t \in [0, 2\pi]$. We want to apply the results of the previous section on Galerkin methods to this integral equation. As the Hilbert space $X$, we choose $X = L^2(0, 2\pi)$. The operators $K$, $K_0$, and $C$ are defined by
$$(K\psi)(t) = -\frac{1}{\pi} \int_0^{2\pi} \psi(s)\, \ln|\gamma(t) - \gamma(s)|\, ds, \tag{3.61a}$$
$$(K_0\psi)(t) = -\frac{1}{2\pi} \int_0^{2\pi} \psi(s) \left[ \ln\Bigl( 4 \sin^2 \frac{t-s}{2} \Bigr) - 1 \right] ds, \tag{3.61b}$$
$$C\psi = K\psi - K_0\psi \tag{3.61c}$$
for $t \in [0, 2\pi]$ and $\psi \in L^2(0, 2\pi)$. First, we observe that $K$, $K_0$, and $C$ are well-defined and compact operators in $L^2(0, 2\pi)$ because the kernels are weakly singular (see Theorem A.35 of Appendix A.3). They are also self-adjoint in $L^2(0, 2\pi)$. Then Symm's equation (3.57) takes the form $K\psi = g$ in the Hilbert space $L^2(0, 2\pi)$. As finite-dimensional subspaces $X_n$ and $Y_n$, we choose the spaces of truncated Fourier series, that is,
$$X_n = Y_n = \left\{ \sum_{j=-n}^{n} \alpha_j \hat{\psi}_j : \alpha_j \in \mathbb{C} \right\}, \tag{3.62}$$
where $\hat{\psi}_j(t) = e^{ijt}$ for $t \in [0, 2\pi]$ and $j \in \mathbb{Z}$. The corresponding orthogonal projection operator $P_n$ from $L^2(0, 2\pi)$ onto $X_n = Y_n$ is given by
$$P_n \psi = \sum_{j=-n}^{n} \alpha_j \hat{\psi}_j, \tag{3.63}$$
where $\psi = \sum_{j=-\infty}^{\infty} \alpha_j \hat{\psi}_j$. To investigate the mapping properties of $K$, we need the following technical result (see [168], Lemma 8.21).

Lemma 3.16
$$\frac{1}{2\pi} \int_0^{2\pi} e^{ins}\, \ln\Bigl( 4 \sin^2 \frac{s}{2} \Bigr)\, ds =
\begin{cases}
-1/|n|, & n \in \mathbb{Z},\ n \ne 0, \\
0, & n = 0.
\end{cases} \tag{3.64}$$
Proof: It suffices to study the case $n \in \mathbb{N}_0$. First let $n \in \mathbb{N}$. Integrating the geometric sum
$$1 + 2 \sum_{j=1}^{n-1} e^{ijs} + e^{ins} = i\, \bigl(1 - e^{ins}\bigr) \cot\frac{s}{2}, \quad 0 < s < 2\pi,$$
yields
$$\int_0^{2\pi} \bigl(e^{ins} - 1\bigr) \cot\frac{s}{2}\, ds = 2\pi i.$$
Integration of the identity
$$\frac{d}{ds}\left[ \bigl(e^{ins} - 1\bigr)\, \ln\Bigl( 4 \sin^2 \frac{s}{2} \Bigr) \right] = i n\, e^{ins}\, \ln\Bigl( 4 \sin^2 \frac{s}{2} \Bigr) + \bigl(e^{ins} - 1\bigr) \cot\frac{s}{2}$$
yields
$$\int_0^{2\pi} e^{ins}\, \ln\Bigl( 4 \sin^2 \frac{s}{2} \Bigr)\, ds = -\frac{1}{in} \int_0^{2\pi} \bigl(e^{ins} - 1\bigr) \cot\frac{s}{2}\, ds = -\frac{2\pi}{n},$$
which proves the assertion for $n \in \mathbb{N}$. It remains to study the case $n = 0$. Define
$$I := \int_0^{2\pi} \ln\Bigl( 4 \sin^2 \frac{s}{2} \Bigr)\, ds.$$
Then we conclude that
$$\begin{aligned}
2I &= \int_0^{2\pi} \ln\Bigl( 4 \sin^2 \frac{s}{2} \Bigr)\, ds + \int_0^{2\pi} \ln\Bigl( 4 \cos^2 \frac{s}{2} \Bigr)\, ds \\
&= \int_0^{2\pi} \ln\Bigl( 16 \sin^2 \frac{s}{2} \cos^2 \frac{s}{2} \Bigr)\, ds = \int_0^{2\pi} \ln\bigl( 4 \sin^2 s \bigr)\, ds \\
&= \frac{1}{2} \int_0^{4\pi} \ln\Bigl( 4 \sin^2 \frac{s}{2} \Bigr)\, ds = I,
\end{aligned}$$
and thus $I = 0$. $\Box$
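Lemma 3.16 lends itself to a quick numerical check with a midpoint rule, whose quadrature nodes avoid the integrable logarithmic singularities at the endpoints. The function name and the grid size $N$ below are our own choices.

```python
import cmath
import math

def fourier_log_coefficient(n, N=20000):
    # Midpoint-rule approximation of
    # (1/2pi) * integral_0^{2pi} e^{ins} ln(4 sin^2(s/2)) ds.
    h = 2 * math.pi / N
    total = 0j
    for j in range(N):
        s = (j + 0.5) * h          # midpoints avoid the singularities at 0, 2pi
        total += cmath.exp(1j * n * s) * math.log(4.0 * math.sin(s / 2) ** 2)
    return total * h / (2 * math.pi)

# Lemma 3.16 predicts -1/|n| for n != 0 and 0 for n = 0.
for n in (0, 1, 3, -2):
    print(n, fourier_log_coefficient(n))
```

For $n = 3$, for instance, the computed value is close to $-1/3$; the quadrature error decays only like $1/N$ here because of the logarithmic singularity.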
This lemma shows that the functions
$$\hat{\psi}_n(t) := e^{int}, \quad t \in [0, 2\pi],\ n \in \mathbb{Z}, \tag{3.65}$$
are eigenfunctions of $K_0$:
$$K_0 \hat{\psi}_n = \frac{1}{|n|}\, \hat{\psi}_n \ \text{ for } n \ne 0 \qquad \text{and} \qquad K_0 \hat{\psi}_0 = \hat{\psi}_0. \tag{3.66a, 3.66b}$$
Now we can prove the mapping properties of the operators.

Theorem 3.17 Suppose there exists $z_0 \in \Omega$ with $|x - z_0| \ne 1$ for all $x \in \partial\Omega$. Let the operators $K$ and $K_0$ be given by (3.61a) and (3.61b), respectively. By $H^s_{per}(0, 2\pi)$, we denote the Sobolev spaces of order $s$ (see Section A.4 of Appendix A).

(a) The operators $K$ and $K_0$ can be extended to isomorphisms from $H^{s-1}_{per}(0, 2\pi)$ onto $H^s_{per}(0, 2\pi)$ for every $s \in \mathbb{R}$.

(b) The operator $K_0$ is coercive from $H^{-1/2}_{per}(0, 2\pi)$ onto $H^{1/2}_{per}(0, 2\pi)$.

(c) The operator $C = K - K_0$ is compact from $H^{s-1}_{per}(0, 2\pi)$ into $H^s_{per}(0, 2\pi)$ for every $s \in \mathbb{R}$.

Proof: Let $\psi \in L^2(0, 2\pi)$. Then $\psi$ has the representation
$$\psi(t) = \sum_{n\in\mathbb{Z}} \alpha_n e^{int} \quad \text{with} \quad \sum_{n\in\mathbb{Z}} |\alpha_n|^2 < \infty.$$
1 αn eint n
n =0
and thus for any s ∈ R: K0 ψ2Hper s
= α0 2 + ⎡
(ψ, K0 ψ)L2
1 + n2
n =0
s 1 αn 2 n2 ⎤
1 αn 2 ⎦ n
=
2π ⎣α0 2 +
≥
−1/2 2π 1 + n2 αn 2
n =0
n∈Z
=
2πψ2H −1/2 .
From the elementary estimate s s s−1 1 + n2 1 + n2 2 s−1 = 2 1 + n2 1+n ≤ ≤ 1 , 2 2 n 2 1+n
per
n = 0 ,
s−1 we see that K0 can be extended to an isomorphism from Hper (0, 2π) onto s Hper (0, 2π) and is coercive for s = 1/2. The operator C is bounded from
$H^r_{per}(0, 2\pi)$ into $H^s_{per}(0, 2\pi)$ for all $r, s \in \mathbb{R}$ by Theorem A.47 of Appendix A.4. This proves part (c) and that $K = K_0 + C$ is bounded from $H^{s-1}_{per}(0, 2\pi)$ into $H^s_{per}(0, 2\pi)$. It remains to show that $K$ is also an isomorphism from $H^{s-1}_{per}(0, 2\pi)$ onto $H^s_{per}(0, 2\pi)$. From the Riesz theory (Theorem A.36), it is sufficient to prove injectivity. Let $\psi \in H^{s-1}_{per}(0, 2\pi)$ with $K\psi = 0$. From $K_0 \psi = -C\psi$ and the mapping properties of $C$, we conclude that $K_0 \psi \in H^r_{per}(0, 2\pi)$ for all $r \in \mathbb{R}$, that is, $\psi \in H^r_{per}(0, 2\pi)$ for all $r \in \mathbb{R}$. In particular, this implies that $\psi$ is continuous, and the transformed function $\phi(\gamma(t)) = \psi(t)/|\dot{\gamma}(t)|$ satisfies Symm's equation (3.56) for $f = 0$. The application of Theorem 3.15 yields $\phi = 0$. $\Box$
We are now in a position to apply all of the Galerkin methods of the previous section to Symm's equation (3.56), that is, $K\psi = g$ in $L^2(0, 2\pi)$. We have seen that the convergence results require estimates of the condition numbers of $K$ on the finite-dimensional spaces $X_n$ and also approximation properties of $P_n\psi$. In Lemma A.45 of Appendix A, we show the following estimates for any $r \ge s$:
$$\|\psi_n\|_{H^r_{per}} \le c\, n^{r-s}\, \|\psi_n\|_{H^s_{per}} \quad \text{for all } \psi_n \in X_n, \tag{3.67a}$$
$$\|P_n \psi - \psi\|_{H^s_{per}} \le \frac{c}{n^{r-s}}\, \|\psi\|_{H^r_{per}} \quad \text{for all } \psi \in H^r_{per}(0, 2\pi), \tag{3.67b}$$
and all $n \in \mathbb{N}$. Estimate (3.67a) is sometimes called the stability property (see [142]). From (3.67a) and the continuity of $K^{-1}$ from $L^2(0, 2\pi)$ into $H^{-1}_{per}(0, 2\pi)$, we conclude that
$$\|\psi_n\|_{L^2} \le c\, n\, \|K \psi_n\|_{L^2} \quad \text{for all } \psi_n \in X_n. \tag{3.67c}$$
Indeed, this follows from
$$\|\psi_n\|_{L^2} \le c\, n\, \|\psi_n\|_{H^{-1}_{per}} = c\, n\, \|K^{-1} K \psi_n\|_{H^{-1}_{per}} \le c\, n\, \|K^{-1}\|_{\mathcal{L}(L^2, H^{-1}_{per})}\, \|K \psi_n\|_{L^2}.$$
Combining these estimates with the convergence results of the previous section, we have (almost) shown the following.¹

Theorem 3.18 Let $\psi^* \in H^r_{per}(0, 2\pi)$ be the unique solution of (3.57), that is,
$$(K\psi^*)(t) := -\frac{1}{\pi} \int_0^{2\pi} \psi^*(s)\, \ln|\gamma(t) - \gamma(s)|\, ds = g^*(t) := f(\gamma(t)),$$
for $t \in [0, 2\pi]$ and some $g^* \in H^{r+1}_{per}(0, 2\pi)$ with $r \ge 0$. Let $g^\delta \in L^2(0, 2\pi)$ with $\|g^\delta - g^*\|_{L^2} \le \delta$, and let $X_n$ be defined by (3.62).

¹ Note that in this theorem, $\psi^*$ denotes the exact solution in accordance with the notation of the general theory. It must not be mixed up with the complex conjugate.
(a) Let $\psi_n^\delta \in X_n$ be the least squares solution, that is, the solution of
$$(K\psi_n^\delta, K\phi_n)_{L^2} = (g^\delta, K\phi_n)_{L^2} \quad \text{for all } \phi_n \in X_n, \tag{3.68a}$$
or

(b) let $\psi_n^\delta = K\tilde{\psi}_n^\delta$ with $\tilde{\psi}_n^\delta \in X_n$ be the dual least squares solution, that is, $\tilde{\psi}_n^\delta$ solves
$$(K\tilde{\psi}_n^\delta, K\phi_n)_{L^2} = (g^\delta, \phi_n)_{L^2} \quad \text{for all } \phi_n \in X_n, \tag{3.68b}$$
or

(c) let $\psi_n^\delta \in X_n$ be the Bubnov–Galerkin solution, that is, the solution of
$$(K\psi_n^\delta, \phi_n)_{L^2} = (g^\delta, \phi_n)_{L^2} \quad \text{for all } \phi_n \in X_n. \tag{3.68c}$$
Then there exists $c > 0$ with
$$\|\psi_n^\delta - \psi^*\|_{L^2} \le c \left( n\, \delta + \frac{1}{n^r}\, \|\psi^*\|_{H^r_{per}} \right) \tag{3.69}$$
for all $n \in \mathbb{N}$.

Proof: We apply Theorems 3.11, 3.12, and 3.14 (the latter with $V = H^{1/2}_{per}(0, 2\pi)$ and $V' = H^{-1/2}_{per}(0, 2\pi)$). For the least squares method (Theorem 3.11), we have to show Assumption (3.33). By (3.67c) we have
$$\sigma_n = \max\bigl\{\|\phi_n\|_{L^2} : \phi_n \in X_n,\ \|K\phi_n\|_{L^2} = 1\bigr\} \le c\, n, \tag{3.70}$$
and thus, using (3.67b) for $r = 0$ and $s = -1$,
$$\begin{aligned}
\min_{\phi_n \in X_n} \bigl[ \|\psi - \phi_n\|_{L^2} + \sigma_n\, \|K(\psi - \phi_n)\|_{L^2} \bigr]
&\le \|\psi - P_n\psi\|_{L^2} + \sigma_n\, \|K(\psi - P_n\psi)\|_{L^2} \\
&\le \|\psi\|_{L^2} + c\, n\, \|K\|_{\mathcal{L}(H^{-1}_{per}, L^2)}\, \|\psi - P_n\psi\|_{H^{-1}_{per}} \le c\, \|\psi\|_{L^2}.
\end{aligned}$$
Furthermore,
$$\min\bigl\{\|\psi^* - \phi_n\|_{L^2} : \phi_n \in X_n\bigr\} \le \|\psi^* - P_n\psi^*\|_{L^2} \le \frac{c}{n^r}\, \|\psi^*\|_{H^r_{per}},$$
which proves the result for the least squares method. The estimate for the dual least squares method follows immediately from (3.40) and $\sigma_n \le c\, n$. For the Bubnov–Galerkin method, we have to estimate $\rho_n$ from (3.48):
$$\rho_n = \max\bigl\{\|\phi_n\|_{L^2} : \phi_n \in X_n,\ \|\phi_n\|_{H^{-1/2}_{per}} = 1\bigr\} \le c\, \sqrt{n}$$
by (3.67a) for r = 0 and s = −1/2. This ends the proof.
It is interesting to note that different error estimates hold for discrete perturbations of the right-hand side. Let us denote by $\beta_k$ the right-hand sides of (3.68a), (3.68b), or (3.68c), respectively, for $\phi_k(t) = \exp(ikt)$. Assume that $\beta$ is perturbed by a vector $\beta^\delta \in \mathbb{C}^{2n+1}$ with $\|\beta - \beta^\delta\| \le \delta$. We have to compute $b_n$ of (3.35), (3.41), and (3.52), respectively. Because the functions $\hat{\psi}_k(t) = \exp(ikt)$, $k = -n, \ldots, n$, are orthogonal, we compute $b_n$ for (3.41) and (3.52) by
$$\max\left\{ \Bigl( \sum_{j=-n}^{n} |\rho_j|^2 \Bigr)^{1/2} : \Bigl\| \sum_{j=-n}^{n} \rho_j \hat{\psi}_j \Bigr\|_{L^2} = 1 \right\} = \frac{1}{\sqrt{2\pi}}.$$
For the least squares method, however, we have to compute
$$b_n^2 = \max\left\{ \sum_{j=-n}^{n} |\rho_j|^2 : \Bigl\| \sum_{j=-n}^{n} \rho_j K_0 \hat{\psi}_j \Bigr\|_{L^2}^2 = 1 \right\} = \max\left\{ \sum_{j=-n}^{n} |\rho_j|^2 : 2\pi \Bigl( |\rho_0|^2 + \sum_{j \ne 0} \frac{1}{j^2}\, |\rho_j|^2 \Bigr) = 1 \right\} = \frac{n^2}{2\pi};$$
that is, for discrete perturbations of the right-hand side, the estimate (3.69) is asymptotically the same for the dual least squares method and the Bubnov–Galerkin method, while for the least squares method it has to be replaced by
$$\|\psi_n^\delta - \psi^*\|_{L^2} \le c \left( n^2\, \delta + \frac{1}{n^r}\, \|\psi^*\|_{H^r_{per}} \right). \tag{3.71}$$
This is optimal by Problem 3.4. From the preceding analysis, it is clear that the convergence property !r−s / . 1 r s r : φn ∈ X n ≤ c ψ ∗ Hper , ψ ∗ ∈ Hper (0, 2π) , min ψ ∗ − φn Hper n and the stability property r s φn Hper ≤ c nr−s φn Hper ,
φn ∈ X n ,
for r ≥ s and n ∈ N, are the essential tools in the proofs. For regions Ω with nonsmooth boundaries, ﬁnite element spaces for Xn are more suitable. They
satisfy these conditions for a certain range of values of r and s (depending on the smoothness of the solution and the order of the ﬁnite elements). We refer to [65, 68, 141–143, 268] for more details and boundary value problems for more complicated partial diﬀerential equations. We refer to Problem 3.5 and Section 3.5, where the Galerkin methods are explicitly compared for special cases of Symm’s equation. For further literature on Symm’s and related integral equations, we refer to [11, 12, 22, 83, 223, 249, 275].
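For the circle of radius $a$, combining (3.58) with Lemma 3.16 shows that $K$ diagonalizes in the Fourier basis: $K\hat{\psi}_m = \frac{1}{|m|}\hat{\psi}_m$ for $m \ne 0$ and $K\hat{\psi}_0 = -2\ln a\;\hat{\psi}_0$. The following quadrature sketch verifies this; the radius $a = 0.5$ and the grid size $N$ are arbitrary choices of ours.

```python
import cmath
import math

def apply_K(m, t, a=0.5, N=20000):
    # (K psi_m)(t) = -(1/pi) * integral_0^{2pi} e^{ims} ln|gamma_a(t)-gamma_a(s)| ds,
    # using the chord length |gamma_a(t) - gamma_a(s)| = 2 a |sin((t - s)/2)|.
    h = 2 * math.pi / N
    total = 0j
    for j in range(N):
        s = t + (j + 0.5) * h      # shifted midpoints avoid the point s = t
        dist = 2 * a * abs(math.sin((t - s) / 2))
        total += cmath.exp(1j * m * s) * math.log(dist)
    return -total * h / math.pi

t = 1.0
print(apply_K(3, t), cmath.exp(3j * t) / 3)      # both close to e^{3it}/3
print(apply_K(0, t).real, -2 * math.log(0.5))    # both close to -2 ln a
```

Note that for $a = 1$ the eigenvalue of the constant function vanishes, which is exactly the loss of injectivity excluded by the assumption $|x - z_0| \ne 1$ in Theorem 3.15.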
3.4 Collocation Methods
We have seen that collocation methods are subsumed under the general theory of projection methods through the use of interpolation operators. This requires the space $Y$ to be a reproducing kernel Hilbert space, that is, a Hilbert space in which all the evaluation functionals $y \mapsto y(t)$ for $y \in Y$ and $t \in [a, b]$ are bounded. Instead of presenting a general theory as in [202], we avoid the explicit introduction of reproducing kernel Hilbert spaces and investigate only two special, but important, cases in detail. First, we study the minimum norm collocation method. It turns out that this is a special case of a least squares method and can be treated by the methods of the previous section. In Subsection 3.4.2, we investigate a second collocation method for the important example of Symm's equation. We derive a complete and satisfactory error analysis for two choices of ansatz functions.

First, we formulate the general collocation method again and derive an error estimate in the presence of discrete perturbations of the right-hand side. Let $X$ be a Hilbert space over the field $\mathbb{K}$, let $X_n \subset X$ be finite-dimensional subspaces with $\dim X_n = n$, and let $a \le t_1 < \cdots < t_n \le b$ be the collocation points. Let $K : X \to C[a, b]$ be bounded and one-to-one. Let $K x^* = y^*$, and assume that the collocation equations
$$(K x_n)(t_i) = y(t_i), \quad i = 1, \ldots, n, \tag{3.72}$$
are uniquely solvable in $X_n$ for every right-hand side. Choosing a basis $\{\hat{x}_j : j = 1, \ldots, n\}$ of $X_n$, we rewrite this as a system $A\alpha = \beta$, where $x_n = \sum_{j=1}^n \alpha_j \hat{x}_j$ and
$$A_{ij} = (K \hat{x}_j)(t_i), \quad \beta_i = y(t_i). \tag{3.73}$$
{ˆ n ˆj 1, . . . , n} of Xn , we rewrite this as a system Aα = β, where xn = j=1 αj x and Aij = (K x ˆj )(ti ) , βi = y(ti ) . (3.73) The following main theorem is the analogue of Theorem 3.10 for collocation methods. We restrict ourselves to the important case of discrete perturbations of the righthand side. Continuous perturbations could also be handled but are not of particular interest because point evaluation is no longer possible when the righthand side is perturbed in the L2 sense. This would require stronger norms in the range space and leads to the concept of reproducing kernel Hilbert spaces (see [168]).
Theorem 3.19 Let $K x^* = y^*$ and let $\{t_1^{(n)}, \ldots, t_n^{(n)}\} \subset [a, b]$, $n \in \mathbb{N}$, be a sequence of collocation points. Assume that $\bigcup_{n\in\mathbb{N}} X_n$ is dense in $X$ and that the collocation method converges. Let $x_n^\delta = \sum_{j=1}^n \alpha_j^\delta \hat{x}_j \in X_n$, where $\alpha^\delta$ solves $A\alpha^\delta = \beta^\delta$. Here, $\beta^\delta \in \mathbb{K}^n$ satisfies $\|\beta - \beta^\delta\| \le \delta$, where $\beta_i = y^*(t_i)$ and $\|\cdot\|$ again denotes the Euclidean norm in $\mathbb{K}^n$. Then the following error estimate holds:
$$\|x_n^\delta - x^*\|_X \le c \left( \frac{a_n}{\lambda_n}\, \delta + \inf\bigl\{\|x^* - z_n\|_X : z_n \in X_n\bigr\} \right), \tag{3.74}$$
where
$$a_n = \max\left\{ \Bigl\| \sum_{j=1}^n \rho_j \hat{x}_j \Bigr\|_X : \sum_{j=1}^n |\rho_j|^2 = 1 \right\} \tag{3.75}$$
and $\lambda_n$ denotes the smallest singular value of $A$.

Proof: Again we write $\|x_n^\delta - x^*\|_X \le \|x_n^\delta - x_n\|_X + \|x_n - x^*\|_X$, where $x_n = R_n y^*$ solves the collocation equations for $\beta$ instead of $\beta^\delta$. The second term is estimated by Theorem 3.7. We estimate the first term by
$$\|x_n^\delta - x_n\|_X \le a_n\, \|\alpha^\delta - \alpha\| = a_n\, \|A^{-1}(\beta^\delta - \beta)\| \le a_n\, \|A^{-1}\|_2\, \|\beta^\delta - \beta\| \le \frac{a_n}{\lambda_n}\, \delta. \qquad \Box$$
Again, we remark that $a_n = 1$ if $\{\hat{x}_j : j = 1, \ldots, n\}$ forms an orthonormal system in $X$.
3.4.1 Minimum Norm Collocation
Again, let $K : X \to C[a, b]$ be a linear, bounded, and injective operator from the Hilbert space $X$ into the space $C[a, b]$ of continuous functions on $[a, b]$. We assume that there exists a unique solution $x^* \in X$ of $K x^* = y^*$. Let $a \le t_1 < \cdots < t_n \le b$ be the set of collocation points. Solving the equations (3.72) in $X$ is certainly not enough to specify the solution $x_n$ uniquely. An obvious choice is to determine that $x_n \in X$ from the set of solutions of (3.72) that has minimal norm among all solutions.

Definition 3.20 $x_n \in X$ is called the moment solution of (3.72) with respect to the collocation points $a \le t_1 < \cdots < t_n \le b$ if $x_n$ satisfies (3.72) and
$$\|x_n\|_X = \min\bigl\{\|z_n\|_X : z_n \in X \text{ satisfies (3.72)}\bigr\}.$$
We can interpret this moment solution as a least squares solution. Because $z \mapsto (Kz)(t_i)$ is bounded from $X$ into $\mathbb{K}$, Theorem A.23 by Riesz yields the existence of $k_i \in X$ with $(Kz)(t_i) = (z, k_i)_X$ for all $z \in X$ and $i = 1, \ldots, n$. If, for example, $X = L^2(a, b)$ and $K$ is the integral operator
$$(Kz)(t) = \int_a^b k(t, s)\, z(s)\, ds, \quad t \in [a, b],\ z \in L^2(a, b),$$
with real-valued kernel $k$, then $k_i \in L^2(a, b)$ is explicitly given by $k_i(s) = k(t_i, s)$. Going back to the general case, we rewrite the moment equation (3.72) in the form
$$(x_n, k_i)_X = y(t_i) = (x, k_i)_X, \quad i = 1, \ldots, n.$$
The minimum norm solution $x_n$ of this set of equations is characterized by the projection theorem (see Theorem A.13 of Appendix A.1) and is given by the solution of (3.72) in the space $X_n := \mathrm{span}\{k_j : j = 1, \ldots, n\}$. Now we define the Hilbert space $Y$ by $Y := K(X) = \mathcal{R}(K)$ with inner product
$$(y, z)_Y := (K^{-1} y, K^{-1} z)_X \quad \text{for } y, z \in K(X) = \mathcal{R}(K).$$
We omit the simple proof of the following lemma.

Lemma 3.21 $Y$ is a Hilbert space that is continuously embedded in $C[a, b]$. Furthermore, $K$ is an isomorphism from $X$ onto $Y$.

Now we can rewrite (3.72) in the form
$$(K x_n, K k_i)_Y = (y, K k_i)_Y, \quad i = 1, \ldots, n.$$
Comparing this equation with (3.31a), we observe that (3.72) is the Galerkin equation for the least squares method with respect to $X_n$. Thus we have shown that the moment solution can be interpreted as the least squares solution for the operator $K : X \to Y$. The application of Theorem 3.11 yields the following theorem.
where α ∈ Kn solves the linear system Aα = β with Aij = (Kkj )(ti ) = (kj , ki )X (n)
and
βi = y(ti ) .
(3.77)
(n)
Let {t1 , . . . , tn } ⊂ [a, b], n ∈ N, be a sequence of collocation points such that n∈N Xn is dense in X where (n)
Xn := span{kj
: j = 1, . . . , n} .
Then the moment method converges; that is, the moment solution xn ∈ Xn of (3.72) converges in X to the solution x∗ ∈ X of Kx∗ = y ∗ . If xδn =
3.4
Collocation Methods
n
93
(n)
αjδ kj , where αδ ∈ Kn solves Aαδ = β δ with β − β δ  ≤ δ, then the following error estimate holds: an (3.78) x∗ − xδn X ≤ δ + c min x∗ − zn X : zn ∈ Xn , λn j=1
where
n (n) an = max ρ k j j
:
X
j=1
n
ρj 2 = 1
(3.79)
j=1
and where λn denotes the smallest singular value of A. Proof: The deﬁnition of · Y implies that σn = 1, where σn is given by (3.32). Assumption (3.33) for the convergence of the least squares method is obviously satisﬁed because min x∗ − zn X + σn K(x∗ − zn )Y ≤ x∗ X + σn x∗ X = 2 x∗ X .
zn ∈Xn
The application of Theorem 3.11 yields the assertion.
As an example, we again consider numerical diﬀerentiation. Example 3.23 Let X = L2 (0, 1) and K be deﬁned by t
1 x(s) ds =
(Kx)(t) = 0
k(t, s) x(s) ds,
t ∈ [0, 1] ,
0
1, s ≤ t, 0, s > t. We choose equidistant nodes, that is, tj = nj for j = 0, . . . , n. The moment method is to minimize x2L2 under the restrictions that
with k(t, s) =
tj x(s) ds = y(tj ),
j = 1, . . . , n .
(3.80)
0
The solution xn is piecewise constant because it is a linear combination of the piecewise constant functions k(tj , ·). Therefore, the ﬁnitedimensional space Xn is given by (3.81) Xn = zn ∈ L2 (0, 1) : zn (tj−1 ,tj ) constant, j = 1, . . . , n . ˆj (s) = k(tj , s). As basis functions ˆj of Xn we choose x
n x Then xn = j=1 αj k(tj , ·) is the moment solution, where α solves Aα = β with βi = y(ti ) and 1 Aij =
k(ti , s) k(tj , s) ds = 0
1 min{i, j} . n
94
Regularization by Discretization
It is not diﬃcult to see that the moment solution is just the onesided diﬀerence quotient xn (t1 ) =
1 y(t1 ), h
xn (tj ) =
1 y(tj ) − y(tj−1 ) , h
j = 2, . . . , n ,
for h = 1/n. We have to check the assumptions of Theorem 3.22. First, K isonetoone and {k(tj , ·) : j = 1 . . . , n} are linearly independent. The union n∈N Xn is dense in L2 (0, 1) (see Problem 3.6). We have to estimate an from (3.79), the smallest eigenvalue λn of A, and min x − zn L2 : zn ∈ Xn . n Let ρ ∈ Rn with j=1 ρ2j = 1. Using the Cauchy–Schwarz inequality, we estimate 2 1 1 n n n n+1 2 . ρ k(t , s) ds ≤ k(t , s) ds = tj = j j j 2 j=1 j=1 j=1 0
0
0 Thus an ≤ (n + 1)/2. It is straightforward to check that the inverse of A is given by the tridiagonal matrix ⎤ ⎡ 2 −1 ⎥ ⎢ −1 2 −1 ⎥ ⎢ ⎥ ⎢ .. .. .. A−1 = n ⎢ ⎥. . . . ⎥ ⎢ ⎣ −1 2 −1 ⎦ −1 1 We estimate the largest eigenvalue μmax of A−1 by the maximum absolute row sum μmax ≤ 4 n. This is asymptotically sharp because we can give a lower estimate of μmax by the trace formula n μmax ≥ trace(A−1 ) =
n −1 A = (2n − 1) n ; jj j=1
that is, we have an estimate of λn of the form 1 1 ≤ λn ≤ . 4n 2n − 1 In Problem 3.6, it is shown that 1 min x − zn L2 : zn ∈ Xn ≤ x L2 . n Thus we have proven the following theorem. Theorem 3.24 The moment method for (3.80) converges. The following error estimate holds: 3 c n+1 ∗ δ x − xn L2 ≤ δ + (x∗ ) L2 2 n
3.4
Collocation Methods
95
if $x^* \in H^1(0,1)$. Here, $\delta$ is the discrete error on the right-hand side; that is, $\sum_{j=1}^n |\beta_j^\delta - y^*(t_j)|^2 \le \delta^2$ and $x_n^\delta = \sum_{j=1}^n \alpha_j^\delta\,\hat{x}_j$, where $\alpha^\delta \in \mathbb{R}^n$ solves $A\alpha^\delta = \beta^\delta$.

The choice $X_n = S_1(t_1,\dots,t_n)$ of linear splines leads to the two-sided difference quotient (see Problem 3.8). We refer to [85, 201, 202] for further reading on moment collocation.
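The structure of Example 3.23 is easy to check numerically. The following sketch (my own illustration, not code from the book; the choice of test data is arbitrary) builds the Gram matrix $A_{ij} = \min\{i,j\}/n$, verifies the stated tridiagonal inverse, and confirms that the moment solution reproduces the one-sided difference quotient.

```python
import numpy as np

n = 8
h = 1.0 / n
idx = np.arange(1, n + 1)
A = np.minimum.outer(idx, idx) / n       # Gram matrix A_ij = min(i, j)/n

# Claimed inverse: n * tridiag(-1, 2, -1), with last diagonal entry n * 1
Ainv = n * (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
Ainv[-1, -1] = n

# Moment solution for the data y(t) = t^2 (so the exact derivative is 2t)
t = idx * h
y = t ** 2
alpha = Ainv @ y                          # solves A alpha = beta with beta_i = y(t_i)
# x_n is piecewise constant; on (t_{m-1}, t_m) its value is sum_{j >= m} alpha_j
xn = np.array([alpha[m:].sum() for m in range(n)])
# One-sided difference quotient (with y(t_0) = 0 here)
diffq = np.diff(np.concatenate(([0.0], y))) / h
```

For these data `xn` and `diffq` agree to machine precision, illustrating the claim preceding Theorem 3.24.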
3.4.2
Collocation of Symm’s Equation
We now study the numerical treatment of Symm's equation (3.57), that is,
\[
(K\psi^*)(t) := -\frac{1}{\pi}\int_0^{2\pi} \psi^*(s)\,\ln\bigl|\gamma(t) - \gamma(s)\bigr|\,ds = g^*(t) \tag{3.82}
\]
for $0 \le t \le 2\pi$ by collocation methods. The integral operator $K$ from (3.82) is well-defined and bounded from $L^2(0,2\pi)$ into $H^1_{per}(0,2\pi)$. We assume throughout this subsection that $K$ is one-to-one (see Theorem 3.15). Then we have seen in Theorem 3.17 that equation (3.82) is uniquely solvable in $L^2(0,2\pi)$ for every $g \in H^1_{per}(0,2\pi)$; that is, $K$ is an isomorphism. We define equidistant collocation points by
\[
t_k := k\,\frac{\pi}{n} \qquad \text{for } k = 0,\dots,2n-1.
\]
There are several choices for the space $X_n \subset L^2(0,2\pi)$ of basis functions. Before we study particular cases, let $X_n = \operatorname{span}\{\hat{x}_j : j \in J_n\} \subset L^2(0,2\pi)$ be arbitrary. Here $J_n \subset \mathbb{Z}$ denotes a set of indices with $2n$ elements. We assume that $\{\hat{x}_j : j \in J_n\}$ forms an orthonormal system in $L^2(0,2\pi)$. The collocation equations (3.72) take the form
\[
-\frac{1}{\pi}\int_0^{2\pi} \psi_n(s)\,\ln\bigl|\gamma(t_k) - \gamma(s)\bigr|\,ds = g^*(t_k), \qquad k = 0,\dots,2n-1, \tag{3.83}
\]
with $\psi_n \in X_n$. Let $Q_n : H^1_{per}(0,2\pi) \to Y_n$ be the trigonometric interpolation operator into the $2n$-dimensional space
\[
Y_n := \Bigl\{ \sum_{m=-n}^{n-1} a_m\,e^{imt} : a_m \in \mathbb{C} \Bigr\}. \tag{3.84}
\]
We recall some approximation properties of the interpolation operator $Q_n : H^1_{per}(0,2\pi) \to Y_n$. First, it is easily checked (see also Theorem A.46 of Appendix A.4) that $Q_n$ is given by
\[
Q_n\psi = \sum_{k=0}^{2n-1} \psi(t_k)\,\hat{y}_k
\]
with Lagrange interpolation basis functions
\[
\hat{y}_k(t) = \frac{1}{2n}\sum_{m=-n}^{n-1} e^{im(t-t_k)}, \qquad k = 0,\dots,2n-1. \tag{3.85}
\]
From Theorem A.46 of Appendix A.4, we also have the estimates
\[
\|\psi - Q_n\psi\|_{L^2} \;\le\; \frac{c}{n}\,\|\psi\|_{H^1_{per}} \quad \text{for all } \psi \in H^1_{per}(0,2\pi), \tag{3.86a}
\]
\[
\|Q_n\psi\|_{H^1_{per}} \;\le\; c\,\|\psi\|_{H^1_{per}} \quad \text{for all } \psi \in H^1_{per}(0,2\pi). \tag{3.86b}
\]
Now we can reformulate the collocation equations (3.83) as
\[
Q_n K\psi_n = Q_n g^* \qquad \text{with } \psi_n \in X_n. \tag{3.87}
\]
We use the perturbation result of Theorem 3.8 again and split $K$ into the form $K = K_0 + C$ with
\[
(K_0\psi)(t) := -\frac{1}{2\pi}\int_0^{2\pi} \psi(s)\Bigl[\ln\Bigl(4\sin^2\frac{t-s}{2}\Bigr) - 1\Bigr]ds. \tag{3.88}
\]
Now we specify the spaces $X_n$. As a first example, we choose the orthonormal functions
\[
\hat{x}_j(t) = \frac{1}{\sqrt{2\pi}}\,e^{ijt} \qquad \text{for } j = -n,\dots,n-1. \tag{3.89}
\]
We prove the following convergence result.

Theorem 3.25 Let $\hat{x}_j$, $j = -n,\dots,n-1$, be given by (3.89). The collocation method is convergent; that is, the solution $\psi_n \in X_n$ of (3.83) converges to the solution $\psi^* \in L^2(0,2\pi)$ of (3.82) in $L^2(0,2\pi)$. Let the right-hand side of (3.83) be replaced by $\beta^\delta \in \mathbb{C}^{2n}$ with
\[
\sum_{k=0}^{2n-1} \bigl|\beta_k^\delta - g^*(t_k)\bigr|^2 \;\le\; \delta^2.
\]
Let $\alpha^\delta \in \mathbb{C}^{2n}$ be the solution of $A\alpha^\delta = \beta^\delta$, where $A_{kj} = (K\hat{x}_j)(t_k)$. Then the following error estimate holds:
\[
\|\psi_n^\delta - \psi^*\|_{L^2} \;\le\; c\Bigl[\sqrt{n}\,\delta + \min\bigl\{\|\psi^* - \phi_n\|_{L^2} : \phi_n \in X_n\bigr\}\Bigr], \tag{3.90}
\]
where
\[
\psi_n^\delta(t) = \frac{1}{\sqrt{2\pi}}\sum_{j=-n}^{n-1} \alpha_j^\delta\,e^{ijt}.
\]
If $\psi^* \in H^r_{per}(0,2\pi)$ for some $r > 0$, then
\[
\|\psi_n^\delta - \psi^*\|_{L^2} \;\le\; c\Bigl[\sqrt{n}\,\delta + \frac{1}{n^r}\,\|\psi^*\|_{H^r_{per}}\Bigr]. \tag{3.91}
\]
Proof: By the perturbation Theorem 3.8, it is sufficient to prove the result for $K_0$ instead of $K$. By (3.66a) and (3.66b), the operator $K_0$ maps $X_n$ into $Y_n = X_n$. Therefore, the collocation equation (3.87) for $K_0$ reduces to $K_0\psi_n = Q_n g^*$. We want to apply Theorem 3.7 and have to estimate $\|R_n K_0\|$, where in this case $R_n = \bigl(K_0|_{X_n}\bigr)^{-1} Q_n$. Because $K_0 : L^2(0,2\pi) \to H^1_{per}(0,2\pi)$ is invertible, we conclude that
\[
\|R_n g\|_{L^2} = \|\psi_n\|_{L^2} \;\le\; c_1\,\|K_0\psi_n\|_{H^1_{per}} = c_1\,\|Q_n g\|_{H^1_{per}} \;\le\; c_2\,\|g\|_{H^1_{per}}
\]
for all $g \in H^1_{per}(0,2\pi)$, and thus
\[
\|R_n K\psi\|_{L^2} \;\le\; c_2\,\|K\psi\|_{H^1_{per}} \;\le\; c_3\,\|\psi\|_{L^2}
\]
for all $\psi \in L^2(0,2\pi)$. The application of Theorem 3.7 yields convergence.

To prove the error estimate (3.90), we want to apply Theorem 3.19 and hence have to estimate the singular values of the matrix $B$ defined by
\[
B_{kj} = (K_0\hat{x}_j)(t_k), \qquad k,j = -n,\dots,n-1,
\]
with $\hat{x}_j$ from (3.89). From (3.66a) and (3.66b), we observe that
\[
B_{kj} = \frac{1}{\sqrt{2\pi}}\,\frac{1}{|j|}\,e^{ijk\pi/n}, \qquad k,j = -n,\dots,n-1,
\]
where $1/|j|$ has to be replaced by $1$ if $j = 0$. Because the singular values of $B$ are the square roots of the eigenvalues of $B^*B$, we compute
\[
\bigl(B^*B\bigr)_{\ell j} = \sum_{k=-n}^{n-1} \overline{B_{k\ell}}\,B_{kj} = \frac{1}{2\pi}\,\frac{1}{|\ell|\,|j|}\sum_{k=-n}^{n-1} e^{ik(j-\ell)\pi/n} = \frac{n}{\pi}\,\frac{1}{\ell^2}\,\delta_{\ell j},
\]
where again $1/\ell^2$ has to be replaced by $1$ for $\ell = 0$. From this, we see that the singular values of $B$ are given by $\sqrt{n/(\pi\ell^2)}$ for $\ell = 1,\dots,n$. The smallest singular value is $1/\sqrt{n\pi}$. Estimate (3.74) of Theorem 3.19 yields the assertion. (3.91) follows from Theorem A.46. $\Box$

Comparing the estimate (3.91) with the corresponding error estimate (3.69) for the Galerkin methods, it seems that the estimate for the collocation method is better because the error $\delta$ is only multiplied by $\sqrt{n}$ instead of $n$. Let us now compare the errors of the continuous perturbation $\|y^* - y^\delta\|_{L^2}$ with the discrete perturbation for both methods. To do this, we have to "extend" the discrete vector $\beta^\delta$ to a function $y_n^\delta \in X_n$. For the collocation method, we have to use the interpolation operator $Q_n$ and define $y_n^\delta \in X_n$ by $y_n^\delta = \sum_{j=0}^{2n-1} \beta_j^\delta\,\hat{y}_j$, where the $\hat{y}_j$ are the Lagrange basis functions (3.85). Then $y_n^\delta(t_k) = \beta_k^\delta$, and we estimate
\[
\|y_n^\delta - y^*\|_{L^2} \;\le\; \|y_n^\delta - Q_n y^*\|_{L^2} + \|Q_n y^* - y^*\|_{L^2}.
\]
Writing
\[
y_n^\delta(t) - Q_n y^*(t) = \sum_{j=-n}^{n-1} \rho_j\,e^{ijt},
\]
a simple computation shows that
\[
\sum_{k=0}^{2n-1} \bigl|\beta_k^\delta - y^*(t_k)\bigr|^2 = \sum_{k=0}^{2n-1} \bigl|\beta_k^\delta - Q_n y^*(t_k)\bigr|^2 = \sum_{k=0}^{2n-1}\Bigl|\sum_{j=-n}^{n-1} \rho_j\,e^{ikj\pi/n}\Bigr|^2 = 2n\sum_{j=-n}^{n-1} |\rho_j|^2 = \frac{n}{\pi}\,\bigl\|y_n^\delta - Q_n y^*\bigr\|_{L^2}^2. \tag{3.92}
\]
Therefore, for the collocation method we have to compare the continuous error $\delta$ with the discrete error $\delta\sqrt{n/\pi}$. This gives an extra factor of $\sqrt{n}$ in the first terms of (3.90) and (3.91).
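The singular values computed in the proof of Theorem 3.25 are easy to verify numerically. The sketch below (my own check, not code from the book) assembles $B_{kj} = \varepsilon_j\,e^{ijk\pi/n}/\sqrt{2\pi}$ with $\varepsilon_j = 1/\max(|j|,1)$ and compares its spectrum with the predicted values $\sqrt{n/\pi}\,\varepsilon_j$, in particular the smallest singular value $1/\sqrt{n\pi}$.

```python
import numpy as np

n = 16
j = np.arange(-n, n)                       # column indices j = -n, ..., n-1
k = np.arange(-n, n)                       # collocation (row) indices
eps = 1.0 / np.maximum(np.abs(j), 1)       # 1/|j|, replaced by 1 for j = 0
B = (eps / np.sqrt(2 * np.pi)) * np.exp(1j * np.pi * np.outer(k, j) / n)

sigma = np.linalg.svd(B, compute_uv=False)         # singular values of B
predicted = np.sqrt(n / np.pi) * eps               # sigma_l = sqrt(n/pi)/|l|
```

The smallest computed singular value matches $1/\sqrt{n\pi}$ to machine precision, which is the quantity that drives the $\sqrt{n}\,\delta$ term in (3.90).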
For Galerkin methods, however, we define $y_n^\delta(t) = \frac{1}{2\pi}\sum_{j=-n}^{n-1} \beta_j^\delta\,e^{ijt}$. Then $\langle y_n^\delta, e^{ij\cdot}\rangle_{L^2} = \beta_j^\delta$. Let $P_n$ be the orthogonal projection onto $X_n$. In
\[
\|y_n^\delta - y^*\|_{L^2} \;\le\; \|y_n^\delta - P_n y^*\|_{L^2} + \|P_n y^* - y^*\|_{L^2},
\]
we estimate the first term as
\[
\|y_n^\delta - P_n y^*\|_{L^2}^2 = \frac{1}{2\pi}\sum_{j=-n}^{n-1} \bigl|\beta_j^\delta - \langle y^*, e^{ij\cdot}\rangle_{L^2}\bigr|^2.
\]
In this case, the continuous and discrete errors are of the same order.

Choosing trigonometric polynomials as basis functions is particularly suitable for smooth boundary data. If $\partial\Omega$ or the right-hand side $f$ of the boundary value problem (3.54) is not smooth, then spaces of piecewise constant functions are more appropriate. We now study the case where the basis functions $\hat{x}_j \in L^2(0,2\pi)$ are defined by
\[
\hat{x}_0(t) = \begin{cases} \sqrt{n/\pi}, & \text{if } t < \frac{\pi}{2n} \text{ or } t > 2\pi - \frac{\pi}{2n}, \\[1mm] 0, & \text{if } \frac{\pi}{2n} < t < 2\pi - \frac{\pi}{2n}, \end{cases} \tag{3.93a}
\]
\[
\hat{x}_j(t) = \begin{cases} \sqrt{n/\pi}, & \text{if } |t - t_j| < \frac{\pi}{2n}, \\[1mm] 0, & \text{if } |t - t_j| > \frac{\pi}{2n}, \end{cases} \tag{3.93b}
\]
for $j = 1,\dots,2n-1$. Then $\hat{x}_j$, $j = 0,\dots,2n-1$, are also orthonormal in $L^2(0,2\pi)$. In the following lemma, we collect some approximation properties of the corresponding spaces $X_n$.
Lemma 3.26 Let $X_n = \operatorname{span}\{\hat{x}_j : j = 0,\dots,2n-1\}$, where the $\hat{x}_j$ are defined by (3.93a) and (3.93b). Let $P_n : L^2(0,2\pi) \to X_n$ be the orthogonal projection operator. Then $\bigcup_{n\in\mathbb{N}} X_n$ is dense in $L^2(0,2\pi)$, and there exists $c > 0$ with
\[
\|\psi - P_n\psi\|_{L^2} \;\le\; \frac{c}{n}\,\|\psi\|_{H^1_{per}} \quad \text{for all } \psi \in H^1_{per}(0,2\pi), \tag{3.94a}
\]
\[
\|K(\psi - P_n\psi)\|_{L^2} \;\le\; \frac{c}{n}\,\|\psi\|_{L^2} \quad \text{for all } \psi \in L^2(0,2\pi). \tag{3.94b}
\]

Proof: Estimate (3.94a) is left as an exercise. To prove estimate (3.94b), we use (implicitly) a duality argument:
\begin{align*}
\|K(\psi - P_n\psi)\|_{L^2} &= \sup_{\phi\neq 0} \frac{|\langle K(\psi - P_n\psi), \phi\rangle_{L^2}|}{\|\phi\|_{L^2}} = \sup_{\phi\neq 0} \frac{|\langle \psi - P_n\psi, K\phi\rangle_{L^2}|}{\|\phi\|_{L^2}} \\
&= \sup_{\phi\neq 0} \frac{|\langle \psi, (I - P_n)K\phi\rangle_{L^2}|}{\|\phi\|_{L^2}} \;\le\; \|\psi\|_{L^2}\,\sup_{\phi\neq 0} \frac{\|(I - P_n)K\phi\|_{L^2}}{\|\phi\|_{L^2}} \\
&\le\; \frac{\tilde{c}}{n}\,\|\psi\|_{L^2}\,\sup_{\phi\neq 0} \frac{\|K\phi\|_{H^1_{per}}}{\|\phi\|_{L^2}} \;\le\; \frac{c}{n}\,\|\psi\|_{L^2}. \qquad\Box
\end{align*}
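The rate $1/n$ in (3.94a) is easy to observe experimentally. The sketch below (my own illustration; the test function and cell layout are my choices, using equal cells rather than the wrapped first cell of (3.93a)) projects $\psi(t) = \sin t$ onto piecewise constants and checks that the $L^2$ error halves when the number of cells doubles.

```python
import numpy as np

def proj_error(m, M=400):
    """L2 error of the orthogonal projection of sin onto piecewise
    constants on m equal cells of [0, 2*pi] (midpoint quadrature)."""
    h = 2 * np.pi / m
    err2 = 0.0
    for c in range(m):
        s = c * h + (np.arange(M) + 0.5) * h / M   # midpoints inside cell c
        psi = np.sin(s)
        # the projection is the cell average of psi
        err2 += np.sum((psi - psi.mean()) ** 2) * h / M
    return np.sqrt(err2)

e8, e16 = proj_error(8), proj_error(16)
```

The ratio `e8 / e16` is close to 2, which is the first-order convergence claimed in (3.94a).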
Before we prove a convergence theorem, we compute the singular values of the matrix $B$ defined by
\[
B_{kj} = (K_0\hat{x}_j)(t_k) = -\frac{1}{2\pi}\int_0^{2\pi} \hat{x}_j(s)\Bigl[\ln\Bigl(4\sin^2\frac{t_k - s}{2}\Bigr) - 1\Bigr]ds. \tag{3.95}
\]

Lemma 3.27 $B$ is symmetric and positive definite. The singular values of $B$ coincide with the eigenvalues and are given by
\[
\mu_0 = \sqrt{\frac{n}{\pi}} \qquad\text{and}\qquad \mu_m = \sqrt{\frac{n}{\pi}}\;\frac{1}{2n\pi}\sum_{j\in\mathbb{Z}} \frac{\sin\bigl(\pi\,\bigl|\frac{m}{2n}+j\bigr|\bigr)}{\bigl(\frac{m}{2n}+j\bigr)^2} \tag{3.96a}
\]
for $m = 1,\dots,2n-1$. Furthermore, there exists $c > 0$ with
\[
\frac{1}{\sqrt{\pi n}} \;\le\; \mu_m \;\le\; c\,\sqrt{n} \qquad\text{for all } m = 0,\dots,2n-1. \tag{3.96b}
\]
We observe that the condition number of $B$, that is, the ratio between the largest and smallest singular values, is again of order $n$.
Proof: We write
\[
B_{kj} = -\frac{1}{2\pi}\sqrt{\frac{n}{\pi}}\int_{t_j - \frac{\pi}{2n}}^{t_j + \frac{\pi}{2n}} \Bigl[\ln\Bigl(4\sin^2\frac{s - t_k}{2}\Bigr) - 1\Bigr]ds \;=\; b_{j-k}
\]
with
\[
b_\ell = -\frac{1}{2\pi}\sqrt{\frac{n}{\pi}}\int_{t_\ell - \frac{\pi}{2n}}^{t_\ell + \frac{\pi}{2n}} \Bigl[\ln\Bigl(4\sin^2\frac{s}{2}\Bigr) - 1\Bigr]ds,
\]
where we extended the definition of $t_\ell$ to all $\ell \in \mathbb{Z}$. Therefore, $B$ is circulant and symmetric. The eigenvectors $x^{(m)}$ and eigenvalues $\mu_m$ of $B$ are given by
\[
x^{(m)} = \bigl(e^{imk\pi/n}\bigr)_{k=0}^{2n-1} \qquad\text{and}\qquad \mu_m = \sum_{k=0}^{2n-1} b_k\,e^{imk\pi/n},
\]
respectively, for $m = 0,\dots,2n-1$, as is easily checked. We write $\mu_m$ in the form
\[
\mu_m = -\frac{1}{2\pi}\int_0^{2\pi} \psi_m(s)\Bigl[\ln\Bigl(4\sin^2\frac{s}{2}\Bigr) - 1\Bigr]ds = \bigl(K_0\psi_m\bigr)(0)
\]
with
\[
\psi_m(s) = \sqrt{\frac{n}{\pi}}\,e^{imk\pi/n} \qquad \text{for } |s - t_k| \le \frac{\pi}{2n},\; k \in \mathbb{Z}.
\]
Let $\psi_m(t) = \sum_{k\in\mathbb{Z}} \rho_{m,k}\,e^{ikt}$. Then by (3.66a) and (3.66b), we have
\[
\mu_m = \rho_{m,0} + \sum_{k\neq 0} \frac{\rho_{m,k}}{|k|}.
\]
Therefore, we have to compute the Fourier coefficients $\rho_{m,k}$ of $\psi_m$. They are given by
\[
\rho_{m,k} = \frac{1}{2\pi}\int_0^{2\pi} \psi_m(s)\,e^{-iks}\,ds = \frac{1}{2\pi}\sqrt{\frac{n}{\pi}}\sum_{j=0}^{2n-1} e^{imj\pi/n}\int_{t_j - \frac{\pi}{2n}}^{t_j + \frac{\pi}{2n}} e^{-iks}\,ds.
\]
For $k = 0$, this reduces to
\[
\rho_{m,0} = \sqrt{\frac{n}{\pi}}\,\frac{1}{2n}\sum_{j=0}^{2n-1} e^{imj\pi/n} = \begin{cases} \sqrt{n/\pi}, & \text{if } m = 0, \\ 0, & \text{if } m = 1,\dots,2n-1, \end{cases}
\]
and for $k \neq 0$ to
\[
\rho_{m,k} = \begin{cases} \sqrt{\dfrac{n}{\pi}}\,\dfrac{2n}{\pi k}\,\sin\dfrac{\pi k}{2n}, & \text{if } m - k \in 2n\mathbb{Z}, \\[2mm] 0, & \text{if } m - k \notin 2n\mathbb{Z}. \end{cases}
\]
Thus we have $\mu_0 = \sqrt{n/\pi}$ and
\[
\mu_m = \sqrt{\frac{n}{\pi}}\,\frac{2n}{\pi}\sum_{\substack{k - m\,\in\,2n\mathbb{Z}\\ k\neq 0}} \frac{\sin\frac{\pi k}{2n}}{k\,|k|} = \sqrt{\frac{n}{\pi}}\,\frac{1}{2n\pi}\sum_{j\in\mathbb{Z}} \frac{\sin\bigl(\pi\,\bigl|\frac{m}{2n}+j\bigr|\bigr)}{\bigl(\frac{m}{2n}+j\bigr)^2}
\]
for $m = 1,\dots,2n-1$, substituting $k = m + 2nj$ and using that $\sin(\pi k/(2n))/(k|k|)$ is an even function of $k$. This proves (3.96a). Because all eigenvalues are positive, the matrix $B$ is positive definite and the eigenvalues coincide with the singular values.

We set $x = m/(2n) \in (0,1)$ and separate the two leading terms $j = 0$ and $j = -1$, which are positive. This yields
\[
\mu_m = \sqrt{\frac{n}{\pi}}\,\frac{1}{2n\pi}\Bigl[\frac{\sin\pi x}{x^2} + \frac{\sin\pi(1-x)}{(1-x)^2}\Bigr] + \sqrt{\frac{n}{\pi}}\,\frac{1}{2n\pi}\sum_{j=1}^{\infty} \frac{\sin\bigl(\pi|x+j|\bigr)}{(x+j)^2} + \sqrt{\frac{n}{\pi}}\,\frac{1}{2n\pi}\sum_{j=2}^{\infty} \frac{\sin\bigl(\pi|x-j|\bigr)}{(x-j)^2}. \tag{3.97}
\]
The two tails are alternating series with terms of decreasing magnitude, so each is bounded below by the negative of its first term; hence
\[
\mu_m \;\ge\; \sqrt{\frac{n}{\pi}}\,\frac{1}{2n\pi}\Bigl[\sin\pi x\Bigl(\frac{1}{x^2} - \frac{1}{(1+x)^2}\Bigr) + \sin\pi(1-x)\Bigl(\frac{1}{(1-x)^2} - \frac{1}{(2-x)^2}\Bigr)\Bigr].
\]
From the elementary estimate
\[
\sin\pi x\Bigl(\frac{1}{x^2} - \frac{1}{(1+x)^2}\Bigr) + \sin\pi(1-x)\Bigl(\frac{1}{(1-x)^2} - \frac{1}{(2-x)^2}\Bigr) \;\ge\; 2\pi, \qquad x \in (0,1),
\]
we conclude that
\[
\mu_m \;\ge\; \sqrt{\frac{n}{\pi}}\,\frac{1}{n} \;=\; \frac{1}{\sqrt{\pi n}}
\]
for $m = 1,\dots,2n-1$. The upper estimate of (3.96b) is proven analogously. Indeed, from (3.97) we estimate, using also $|\sin t|/|t| \le 1$,
\[
\mu_m \;\le\; \sqrt{\frac{n}{\pi}}\,\frac{1}{2n\pi}\Bigl[\frac{\sin\pi x}{x^2} + \frac{\sin\pi(1-x)}{(1-x)^2} + 2\sum_{j=1}^{\infty}\frac{1}{j^2}\Bigr] \;\le\; \sqrt{\frac{n}{\pi}}\,\Bigl[\frac{1}{2n}\Bigl(\frac{1}{x} + \frac{1}{1-x}\Bigr) + \frac{1}{n\pi}\sum_{j=1}^{\infty}\frac{1}{j^2}\Bigr] \;\le\; c\,\sqrt{n}
\]
for some $c > 0$. $\Box$
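The bounds (3.96b) can be checked by summing the series for $\mu_m$ numerically. The sketch below is my own verification (the truncation level $J$ is an arbitrary choice) of the representation used above, with $x = m/(2n)$.

```python
import numpy as np

def mu(m, n, J=20000):
    """Eigenvalue mu_m of B from the series representation,
    truncated at |j| <= J."""
    if m == 0:
        return np.sqrt(n / np.pi)
    x = m / (2 * n)
    j = np.arange(-J, J + 1)
    series = np.sum(np.sin(np.pi * np.abs(x + j)) / (x + j) ** 2)
    return np.sqrt(n / np.pi) * series / (2 * n * np.pi)

n = 8
mus = np.array([mu(m, n) for m in range(2 * n)])
```

All computed eigenvalues lie between the lower bound $1/\sqrt{\pi n}$ and $\sqrt{n}$, consistent with a condition number of order $n$.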
Now we can prove the following convergence result.

Theorem 3.28 Let $\hat{x}_j$, $j = 0,\dots,2n-1$, be defined by (3.93a) and (3.93b). The collocation method is convergent; that is, the solution $\psi_n \in X_n$ of (3.83) converges to the solution $\psi^* \in L^2(0,2\pi)$ of (3.82) in $L^2(0,2\pi)$. Let the right-hand side be replaced by $\beta^\delta \in \mathbb{C}^{2n}$ with
\[
\sum_{j=0}^{2n-1} \bigl|\beta_j^\delta - g^*(t_j)\bigr|^2 \;\le\; \delta^2.
\]
Let $\alpha^\delta \in \mathbb{C}^{2n}$ be the solution of $A\alpha^\delta = \beta^\delta$, where $A_{kj} = (K\hat{x}_j)(t_k)$. Then the following error estimate holds:
\[
\|\psi_n^\delta - \psi^*\|_{L^2} \;\le\; c\Bigl[\sqrt{n}\,\delta + \min\bigl\{\|\psi^* - \phi_n\|_{L^2} : \phi_n \in X_n\bigr\}\Bigr], \tag{3.98}
\]
where $\psi_n^\delta = \sum_{j=0}^{2n-1} \alpha_j^\delta\,\hat{x}_j$. If $\psi^* \in H^1_{per}(0,2\pi)$, then
\[
\|\psi_n^\delta - \psi^*\|_{L^2} \;\le\; c\Bigl[\sqrt{n}\,\delta + \frac{1}{n}\,\|\psi^*\|_{H^1_{per}}\Bigr]. \tag{3.99}
\]

Proof: By the perturbation theorem (Theorem 3.8), it is sufficient to prove the result for $K_0$ instead of $K$. Again set
\[
R_n = \bigl(Q_n K_0|_{X_n}\bigr)^{-1} Q_n : H^1_{per}(0,2\pi) \longrightarrow X_n \subset L^2(0,2\pi),
\]
let $\psi \in H^1_{per}(0,2\pi)$, and set $\psi_n = R_n\psi = \sum_{j=0}^{2n-1} \alpha_j\,\hat{x}_j$. Then $\alpha \in \mathbb{C}^{2n}$ solves $B\alpha = \beta$ with $\beta_k = \psi(t_k)$, and thus by (3.96b)
\[
\|\psi_n\|_{L^2} = |\alpha| \;\le\; \|B^{-1}\|\,|\beta| \;\le\; \sqrt{\pi n}\,\Bigl[\sum_{k=0}^{2n-1} |\psi(t_k)|^2\Bigr]^{1/2},
\]
where $|\cdot|$ again denotes the Euclidean norm in $\mathbb{C}^{2n}$. Using this estimate and (3.92) for $\beta_k^\delta = 0$, we conclude that
\[
\|R_n\psi\|_{L^2} = \|\psi_n\|_{L^2} \;\le\; n\,\|Q_n\psi\|_{L^2} \tag{3.100}
\]
for all $\psi \in H^1_{per}(0,2\pi)$. Thus
\[
\|R_n K_0\psi\|_{L^2} \;\le\; n\,\|Q_n K_0\psi\|_{L^2}
\]
for all $\psi \in L^2(0,2\pi)$. Now we estimate $\|R_n K_0\psi\|_{L^2}$ by the $L^2$ norm of $\psi$ itself. Let $\tilde{\psi}_n = P_n\psi \in X_n$ be the orthogonal projection of $\psi \in L^2(0,2\pi)$ onto $X_n$. Then $R_n K_0\tilde{\psi}_n = \tilde{\psi}_n$ and $\|\tilde{\psi}_n\|_{L^2} \le \|\psi\|_{L^2}$, and thus
\begin{align*}
\|R_n K_0\psi - \tilde{\psi}_n\|_{L^2} &= \|R_n K_0(\psi - \tilde{\psi}_n)\|_{L^2} \;\le\; n\,\|Q_n K_0(\psi - \tilde{\psi}_n)\|_{L^2} \\
&\le\; n\,\|Q_n K_0\psi - K_0\psi\|_{L^2} + n\,\|K_0\psi - K_0\tilde{\psi}_n\|_{L^2} + n\,\|K_0\tilde{\psi}_n - Q_n K_0\tilde{\psi}_n\|_{L^2}.
\end{align*}
Now we use the error estimates (3.86a), (3.94a), and (3.94b) of Lemma 3.26. This yields
\[
\|R_n K_0\psi - \tilde{\psi}_n\|_{L^2} \;\le\; c_1\bigl[\|K_0\psi\|_{H^1_{per}} + \|\psi\|_{L^2} + \|K_0\tilde{\psi}_n\|_{H^1_{per}}\bigr] \;\le\; c_2\bigl[\|\psi\|_{L^2} + \|\tilde{\psi}_n\|_{L^2}\bigr] \;\le\; c_3\,\|\psi\|_{L^2};
\]
that is, $\|R_n K_0\psi\|_{L^2} \le c_4\,\|\psi\|_{L^2}$ for all $\psi \in L^2(0,2\pi)$. Therefore, the assumptions of Theorem 3.7 are satisfied. The application of Theorem 3.19 yields the error estimate (3.99). $\Box$
Among the extensive literature on collocation methods for Symm’s integral equation and related equations, we mention only the work of [8, 66, 67, 140, 147, 241, 242]. Symm’s equation has also been numerically treated by quadrature methods; see [90, 169, 239, 240, 250, 251]. For more general problems, we refer to [9, 68].
3.5
Numerical Experiments for Symm’s Equation
In this section, we apply all of the previously investigated regularization strategies to Symm’s integral equation 1 (Kψ)(t) := − π
2π ψ(s) ln γ(t) − γ(s) ds = g(t),
0 ≤ t ≤ 2π ,
0
where in this example γ(s) = cos s, 2 sin s , 0 ≤ s ≤ 2π, denotes the parametrization of the ellipse with semiaxes 1 and 2. First, we discuss the numerical computation of Kψ. We write Kψ in the form (see (3.60)) (Kψ)(t) = −
1 2π
2π 0
2π % t − s& ds + ψ(s) ln 4 sin2 ψ(s) k(t, s) ds , 2 0
for 0 ≤ t ≤ 2π, with the analytic function k(t, s) k(t, t)
1 γ(t) − γ(s)2 ln , t = s, 2π 4 sin2 t−s 2 1 ˙ , 0 ≤ t ≤ 2π . = − ln γ(t) π = −
We use the trapezoidal rule for periodic functions (see [168]). Let tj = j nπ , j = 0, . . . , 2n − 1. The smooth part is approximated by 2π k(t, s) ψ(s) ds ≈ 0
2n−1 π k(t, tj ) ψ(tj ), n j=0
0 ≤ t ≤ 2π .
For the weakly singular part, we replace ψ by its trigonometric interpolation
2n−1 polynomial Qn ψ = j=0 ψ(tj ) Lj into the 2ndimensional space ⎧ n ⎨ ⎩
j=0
aj cos(jt) +
n−1 j=1
⎫ ⎬ bj sin(jt) : aj , bj ∈ R ⎭
104
Regularization by Discretization
over R (see Section A.4 of Appendix A.4). From (A.37) and Lemma 3.16, we conclude that 1 − 2π
2π 0
% t − s& ds ψ(s) ln 4 sin2 2
≈
=
1 − 2π 2n−1
2π 0
% t − s& Qn ψ (s) ln 4 sin2 ds 2
ψ(tj ) Rj (t) ,
0 ≤ t ≤ 2π ,
j=0
where Rj (t)
2π % t − s& 1 ds = − Lj (s) ln 4 sin2 2π 2 0 ' 4 n−1 1 1 1 = cos n(t − tj ) + cos m(t − tj ) n 2n m m=1
for j = 0, . . . , 2n − 1. Therefore, the operator K is replaced by (Kn ψ)(t) :=
2n−1 j=0
π k(t, tj ) , ψ(tj ) Rj (t) + n
0 ≤ t ≤ 2π .
It is wellknown (see [168]) that Kn ψ converges uniformly to Kψ for every 2πperiodic continuous function ψ. Furthermore, the error Kn ψ − Kψ∞ is exponentially decreasing for analytic functions ψ. For t = tk , k = 0, . . . , 2n − 1,
2n−1 we have (Kn ψ)(tk ) = j=0 Akj ψ(tj ) with the symmetric matrix Akj := Rk−j +
where
1 R = n
'
π k(tk , tj ), n
k, j = 0 . . . , 2n − 1 ,
n−1 1 (−1) m π + cos 2n m n m=1
4 ,
= 0, . . . , 2n − 1.
For the numerical example, we take ψ(s) = exp 3 sin s , 0 ≤ s ≤ 2π, and ˜ We g = Kψ or, discretized, ψ˜j = exp 3 sin tj , j = 0, . . . , 2n − 1, and g˜ = Aψ. take n = 60 and add uniformly distributed random noise on the data g˜. All the results show the average of 10 computations. The errors are measured in the 2n−1 1 2 2n discrete norm z22 := 2n j=0 zj  , z ∈ C . First, we consider Tikhonov’s regularization method for δ = 0.1, δ = 0.01, δ = 0.001, and δ = 0. In Figure 3.1, we plot the errors ψ˜α,δ − ψ˜2 and α,δ Aψ˜ − g˜ in the solution and the righthand side, respectively, versus the 2 regularization parameter α. We clearly observe the expected behavior of the errors: For δ > 0 the error in the solution has a welldeﬁned minimum that depends on δ, while the defect always converges to zero as α tends to zero.
3.5
Numerical Experiments for Symm’s Equation
105
The minimal values errδ of the errors in the solution are approximately 0.351, 0.0909, and 0.0206 for δ = 0.1, 0.01, and 0.001, respectively. From this, we observe the order of convergence: increasing the error by factor 10 should increase the error by factor 102/3 ≈ 4.64, which roughly agrees with the numerical results where errδ=0.1 /errδ=0.01 ≈ 3.86 and errδ=0.01 /errδ=0.001 ≈ 4.41. In Figure 3.2, we show the results for the Landweber iteration with a = 0.5 for the same example where again δ = 0.1, δ = 0.01, δ = 0.001, and δ = 0. The errors in the solution and the defects are now plotted versus the iteration number m.
1
0.5
0.9
0.45
0.8
0.4
0.7
0.35
0.6
0.3
0.5
0.25
0.4
0.2
0.3
0.15
0.2
0.1
0.1
0.05
0 0
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0.04
0.08
0 0
0.005
0.01
0.015
0.02
0.025
0.03
0.035
0.04
0.02 0.018
0.035
0.016 0.03 0.014 0.025
0.012
0.02
0.01 0.008
0.015
0.006 0.01 0.004 0.005 0 0
0.002 0.0002
0.0006
0.001
0.0014
0.0018
0 0
0.0002
0.0004
0.0006
Figure 3.1: Error for Tikhonov’s regularization method.
0.0008
0.001
106
Regularization by Discretization 1
0.5
0.9
0.45
0.8
0.4
0.7
0.35
0.6
0.3
0.5
0.25
0.4
0.2
0.3
0.15
0.2
0.1
0.1
0.05
0 0
10
20
30
40
50
60
70
80
90
100
0.04
0 0
50
100
150
200
250
300
0.02 0.018
0.035
0.016 0.03 0.014 0.025
0.012
0.02
0.01 0.008
0.015
0.006 0.01 0.004 0.005 0 0
0.002 50
100
150
200
250
300
350
400
450
500
0
50
100
150
200
250
300
350
400
450
500
Figure 3.2: Error for Landweber’s method (a = 0.5).
1
0.5
0.9
0.45
0.8
0.4
0.7
0.35
0.6
0.3
0.5
0.25
0.4
0.2
0.3
0.15
0.2
0.1
0.1
0.05
0 2
3
4
5
6
7
8
9
10
0 0
2
4
6
8
10
12
14
16
18
20
0.02
0.04
0.018
0.035
0.016 0.03 0.014 0.025
0.012 0.01
0.02
0.008
0.015
0.006 0.01 0.004 0.005 0 0
0.002 5
10
15
20
25
30
0 5
6
7
8
Figure 3.3: Error for the conjugate gradient method.
9
10
3.5
Numerical Experiments for Symm’s Equation
107
In Figure 3.3, we show the results for the conjugate gradient method for the same example where again δ = 0.1, δ = 0.01, δ = 0.001, and δ = 0. The errors in the solution and the defects are again plotted versus the iteration number m. Here, we observe the same behavior as for Tikhonov’s method. We note the diﬀerence in the results for the Landweber method and the conjugate gradient method. The latter decreases the errors very quickly but is very sensitive to the exact stopping rule, while the Landweber iteration is slow but very stable with respect to the stopping parameter τ . The minimal values are errδ=0.1 ≈ 0.177, errδ=0.01 ≈ 0.0352, and errδ=0.001 ≈ 0.0054 for the Landweber iteration and errδ=0.1 ≈ 0.172, errδ=0.01 ≈ 0.0266, and errδ=0.001 ≈ 0.0038 for the conjugate gradient method. The corresponding factors are considerably larger than 102/3 ≈ 4.64 indicating the optimality of these methods also for smooth solutions (see the remarks following Theorem 2.15). Table 3.1: Least squares method n
δ = 0.1
δ = 0.01
1 2 3 4 5 6 10 12 15 20
38.190 15.772 5.2791 1.6209 1.0365 1.1954 2.7944 3.7602 4.9815 7.4111
38.190 15.769 5.2514 1.4562 0.3551 0.1571 0.2358 0.3561 0.4871 0.7270
δ = 0.001
δ=0
38.190 15.768 5.2511 1.4541 3.433 ∗ 10−1 7.190 ∗ 10−2 2.742 ∗ 10−2 3.187 ∗ 10−2 4.977 ∗ 10−2 7.300 ∗ 10−2
38.190 15.768 5.2511 1.4541 3.432 ∗ 10−1 7.045 ∗ 10−2 4.075 ∗ 10−5 5.713 ∗ 10−7 5.570 ∗ 10−10 3.530 ∗ 10−12
Table 3.2: Bubnov–Galerkin method n
δ = 0.1
δ = 0.01
1 2 3 4 5 6 10 12 15 20
38.190 15.771 5.2752 1.6868 1.1467 1.2516 2.6849 3.3431 4.9549 7.8845
38.190 15.769 5.2514 1.4565 0.3580 0.1493 0.2481 0.3642 0.4333 0.7512
δ = 0.001
δ=0
38.190 15.768 5.2511 1.4541 3.434 ∗ 10−1 7.168 ∗ 10−2 2.881 ∗ 10−2 3.652 ∗ 10−2 5.719 ∗ 10−2 7.452 ∗ 10−2
38.190 15.768 5.2511 1.4541 3.432 ∗ 10−1 7.045 ∗ 10−2 4.075 ∗ 10−5 5.713 ∗ 10−7 5.570 ∗ 10−10 3.519 ∗ 10−12
Next, we compute the same example using some projection methods. First, we list the results for the least squares method and the Bubnov–Galerkin method of Subsections 3.2.1 and 3.2.3 in Tables 3.1 and 3.2. We observe that both methods produce almost the same results, which reﬂect the estimates of Theorem 3.18. Note that for δ = 0 the error decreases exponentially with m. This
108
Regularization by Discretization
reﬂects the fact that the best approximation min ψ − φn L2 : φn ∈ Xn converges to zero exponentially due to the analyticity of the solution ψ(s) = exp(3 sin s) (see [168], Theorem 11.7).
Now we turn to the collocation methods of Section 3.4. To implement the collocation method (3.83) for Symm’s integral equation and the basis functions (3.89), (3.93a), and (3.93b), we have to compute the integrals
1 − π
2π
eijs ln γ(tk ) − γ(s) ds ,
(3.101a)
0
j = −m, . . . , m − 1, k = 0, . . . , 2m − 1, and 1 − π
2π x ˆj (s) ln γ(tk ) − γ(s) ds ,
j, k = 0, . . . , 2m − 1 ,
(3.101b)
0
respectively. For the ﬁrst integral (3.101a), we write using (3.64),
−
1 π
2π
eijs ln γ(tk ) − γ(s) ds
0
1 = − 2π
2π
ijs
e
% ln 4 sin
2 tk
0
= εj eijtk −
1 2π
2π
eijs ln
0
1 − s& ds − 2 2π
2π
eijs ln
0
γ(tk ) − γ(s)2 ds 4 sin2 (tk − s)/2
γ(tk ) − γ(s)2 ds , 4 sin2 (tk − s)/2
where εj = 0 for j = 0 and εj = 1/j otherwise. The remaining integral is computed by the trapezoidal rule. The computation of (3.101b) is more complicated. By Deﬁnition (3.93a), (3.93b) of x ˆj , we have to calculate tj +π/(2m)
2
π/(2m)
ln γ(tk ) − γ(s) ds = tj −π/(2m)
−π/(2m)
ln γ(tk ) − γ(s + tj )2 ds .
3.5
Numerical Experiments for Symm’s Equation
109
For j = k, the integrand is analytic, and we use Simpson’s rule π/(2m)
g(s) ds ≈
n
where π π − , s =
m n 2m
w g(s ) ,
=0
−π/(2m)
π w = · 3m n
'
1, 4, 2,
= 0 or n,
= 1, 3, . . . , n − 1,
= 2, 4, . . . , n − 2,
= 0, . . . , n. For j = k, the integral has a weak singularity at s = 0. We split the integrand into π/(2m)
−π/(2m)
s& ds + ln 4 sin 2
π = −2 π/(2m)
%
π/(2m)
2
ln −π/(2m) π/(2m)
% s& ds + ln 4 sin2 2
γ(tk ) − γ(s + tk )2 ds 4 sin2 (s/2)
ln −π/(2m)
γ(tk ) − γ(s + tk )2 ds 4 sin2 (s/2)
$π because ln 4 sin2 (s/2) is even and 0 ln 4 sin2 (s/2) ds = 0 by (3.64). Both integrals are approximated by Simpson’s rule. For the same example as earlier, with 100 integration points for Simpson’s rule we obtain the following results for basis functions (3.89) (in Table 3.3) and basis functions (3.93a), (3.93b) (in Table 3.4).
Table 3.3: Collocation method for basis functions (3.89) m
δ = 0.1
δ = 0.01
δ = 0.001
δ=0
1 2 3 4 5 6 10 12 15 20
6.7451 1.4133 0.3556 0.2525 0.3096 0.3404 0.5600 0.6974 0.8017 1.1539
6.7590 1.3877 2.791 ∗ 10−1 5.979 ∗ 10−2 3.103 ∗ 10−2 3.486 ∗ 10−2 5.782 ∗ 10−2 6.766 ∗ 10−2 8.371 ∗ 10−2 1.163 ∗ 10−1
6.7573 1.3880 2.770 ∗ 10−1 5.752 ∗ 10−2 1.110 ∗ 10−2 3.753 ∗ 10−3 5.783 ∗ 10−3 6.752 ∗ 10−3 8.586 ∗ 10−3 1.182 ∗ 10−2
6.7578 1.3879 2.769 ∗ 10−1 5.758 ∗ 10−2 1.099 ∗ 10−2 1.905 ∗ 10−3 6.885 ∗ 10−7 8.135 ∗ 10−9 6.436 ∗ 10−12 1.806 ∗ 10−13
110
Regularization by Discretization
Table 3.4: Collocation method for basis functions (3.93a) and (3.93b) m
δ = 0.1
δ = 0.01
δ = 0.001
δ=0
1 2 3 4 5 6 10 12 15 20 30
6.7461 1.3829 0.4944 0.3225 0.3373 0.3516 0.5558 0.6216 0.8664 1.0959 1.7121
6.7679 1.3562 4.874 ∗ 10−1 1.971 ∗ 10−1 1.649 ∗ 10−1 1.341 ∗ 10−1 8.386 ∗ 10−2 7.716 ∗ 10−2 9.091 ∗ 10−2 1.168 ∗ 10−1 1.688 ∗ 10−1
6.7626 1.3599 4.909 ∗ 10−1 2.000 ∗ 10−1 1.615 ∗ 10−1 1.291 ∗ 10−1 6.140 ∗ 10−2 4.516 ∗ 10−2 3.137 ∗ 10−2 2.121 ∗ 10−2 1.862 ∗ 10−2
6.7625 1.3600 4.906 ∗ 10−1 2.004 ∗ 10−1 1.617 ∗ 10−1 1.291 ∗ 10−1 6.107 ∗ 10−2 4.498 ∗ 10−2 3.044 ∗ 10−2 1.809 ∗ 10−2 8.669 ∗ 10−3
The diﬀerence for δ = 0 reﬂects the fact that the best approximation min ψ − φn L2 : φn ∈ span {ˆ xj : j ∈ J} converges to zero exponentially for x ˆj deﬁned by (3.89), while it converges to zero only of order 1/n for x ˆj deﬁned by (3.93a) and (3.93b) (see Theorem 3.28). We have seen in this section that the theoretical investigations of the regularization strategies are conﬁrmed by the numerical results for Symm’s integral equation.
3.6
The Backus–Gilbert Method
In this section, we study a diﬀerent numerical method for “solving” ﬁnite moment problems of the following type: b kj (s) x(s) ds = yj ,
j = 1, . . . , n .
(3.102)
a
Here, yj ∈ R are any given numbers and kj ∈ L2 (a, b) arbitrary given functions. Certainly, we have in mind that yj = y(tj ) and kj = k(tj , ·). In Section 3.4, we studied the moment solution of such problems; see [184, 229]. We saw that the moment solution xn is a ﬁnite linear combination of the functions {k1 , . . . , kn }. Therefore, the moment solution xn is as smooth as the functions kj even if the true solution is smoother. The concept originally proposed by Backus and Gilbert ([13, 14]) does not primarily wish to solve the moment problem but rather wants to determine how well all possible models x can be recovered pointwise.
3.6
The Backus–Gilbert Method
111
Deﬁne the ﬁnitedimensional operator K : L2 (a, b) → Rn by b (Kx)j =
kj (s) x(s) ds ,
j = 1, . . . , n,
x ∈ L2 (a, b) .
(3.103)
a
We try to ﬁnd a left inverse S, that is, a linear operator S : Rn → L2 (a, b) such that (3.104) SKx ≈ x for all x ∈ L2 (a, b) . Therefore, SKx should be a simultaneous approximation to all possible x ∈ L2 (a, b). Of course, we have to make clear the meaning of the approximation. The general form of a linear operator S : Rn → L2 (a, b) has to be (Sy)(t) =
n
t ∈ (a, b) ,
yj ϕj (t) ,
y = (yj ) ∈ Rn ,
(3.105)
j=1
for some ϕj ∈ L2 (a, b) that are to be determined from the requirement (3.104): (SKx)(t)
n
=
b ϕj (t)
j=1
kj (s) x(s) ds a
⎡ ⎤ b n ⎣ = kj (s) ϕj (t)⎦ x(s) ds . a
j=1
The requirement SKx ≈ x leads to the problem of approximating Dirac’s delta
n distribution δ(s − t) by linear combinations of the form j=1 kj (s) ϕj (t). For example, one can show that the minimum of 2 b b n ds dt k (s) ϕ (t) − δ(s − t) j j a
a
j=1
(in the sense of distributions) is attained at ϕ(s) = A−1 k(s), where k(s) = $b k1 (s), . . . , kn (s) and A ij = a ki (s) kj (s) ds, i, j = 1, . . . , n. For this minn imization criterion, x = j=1 yj ϕj is again the moment solution of Subsec−s norm tion 3.4.1. In [184], it is shown that minimizing with respect to an Hper s spaces. We refer also to [272] for s > 1/2 leads to projection methods in Hper for a comparison of several minimization criteria. The Backus–Gilbert method is based on a pointwise minimization criterion: Treat t ∈ [a, b] as a ﬁxed parameter and determine the numbers ϕj = ϕj (t) for j = 1, . . . , n, as the solution of the following minimization problem: b minimize a
2 n s − t kj (s) ϕj ds 2
j=1
(3.106a)
112
Regularization by Discretization
subject to ϕ ∈ Rn and
b n a
kj (s) ϕj ds = 1 .
(3.106b)
j=1
Using the matrixvector notation, we rewrite this problem in short form: minimize
ϕ Q(t) ϕ
r · ϕ = 1,
subject to
where b Q(t)ij
=
s − t2 ki (s) kj (s) ds,
i, j = 1, . . . , n,
a
b rj
=
kj (s) ds ,
j = 1, . . . , n .
a
This is a quadratic minimization problem with one linear equality constraint. We assume that r = 0 because otherwise the constraint (3.106b) cannot be satisﬁed. Uniqueness and existence are assured by the following theorem, which also gives a characterization by the Lagrange multiplier rule. Theorem 3.29 Assume that {k1 , . . . , kn } are linearly independent. Then the symmetric matrix Q(t) ∈ Rn×n is positive deﬁnite for every t ∈ [a, b]. The minimization problem (3.106a), (3.106b) is uniquely solvable. ϕ ∈ Rn is a solution of (3.106a) and (3.106b) if and only if there exists a number λ ∈ R (the Lagrange multiplier) such that (ϕ, λ) ∈ Rn × R solves the linear system Q(t)ϕ − λ r = 0
and
r · ϕ = 1.
(3.107)
λ = ϕ Q(t) ϕ is the minimal value of this problem. Proof:
From
b
ϕ Q(t) ϕ =
n 2 s − t kj (s) ϕj ds, 2
a
j=1
we conclude ﬁrst that ϕ Q(t) ϕ ≥ 0 and second that ϕ Q(t) ϕ = 0 implies
n k that j=1 j (s) ϕj = 0 for almost all s ∈ (a, b). Because {kj } are linearly independent, ϕj = 0 for all j follows. Therefore, Q(t) is positive deﬁnite. Existence, uniqueness, and equivalence to (3.107) are elementary results from optimization theory; see [269]. n Definition 3.30 We denote by ϕj (t) j=1 ∈ Rn the unique solution ϕ ∈ Rn of (3.106a) and (3.106b). The Backus–Gilbert solution xn of b kj (s) xn (s) ds = yj , a
j = 1, . . . , n ,
3.6
The Backus–Gilbert Method
is deﬁned as xn (t) =
n
113
yj ϕj (t) ,
t ∈ [a, b] .
(3.108)
j=1
The minimal value λ = λ(t) = ϕ(t) Q(t)ϕ(t) is called the spread.
n We remark that, in general, the Backus–Gilbert solution xn = j=1 yj ϕj $b is not a solution of the moment problem, that is, a kj (s) xn (s) ds = yj ! This is certainly a disadvantage. On the other hand, the solution x is analytic in [a, b]—even for nonsmooth data kj . We can prove the following lemma. Lemma 3.31 ϕj and λ are rational functions. More precisely, there exist polynomials pj , q ∈ P2(n−1) and ρ ∈ P2n such that ϕj = pj /q, j = 1, . . . , n, and λ = ρ/q. The polynomial q has no zeros in [a, b]. Proof: Obviously, Q(t) = Q0 − 2t Q1 + t2 Q2 with symmetric matrices Q0 , n Q1 , Q2 . We search for a polynomial solution p ∈ Pm and ρ ∈ Pm+2 of Q(t)p(t) − ρ(t) r = 0 with m = 2(n − 1). Because the number of equations is n(m + 3) = 2n2 + n and the number of unknowns is n(m n + 1) + (m + 3) = 2n2 + n + 1, there exists a nontrivial solution p ∈ Pm and ρ ∈ Pm+2 . If p(tˆ) = 0 for some tˆ ∈ [a, b], then ρ(tˆ) = 0 because r = 0. In this case, we divide the equation by (t − tˆ). Therefore, we can assume that p has no zero in [a, b]. Now we deﬁne q(t) := r · p(t) for t ∈ [a, b]. Then q ∈ Pm has no zero in [a, b] because otherwise we would have 0 = ρ(tˆ) r · p(tˆ) = p(tˆ) Q(tˆ) p(tˆ) ; thus p(tˆ) = 0, a contradiction. Therefore, ϕ := p/q and λ := ρ/q solves (3.107). By the uniqueness result, this is the only solution. For the following error estimates, we assume two kinds of a priori information on x depending on the norm of the desired error estimate. Let Xn = span{kj : j = 1, . . . , n} . 2 Theorem 3.32 Let
nx ∈ L (a, b) be any solution of the ﬁnite moment problem (3.102) and xn = j=1 yj ϕj be the Backus–Gilbert solution. Then the following error estimates hold:
(a) Assume that x is Lipschitz continuous with constant > 0, that is, x(t) − x(s) ≤ s − t Then
for all s, t ∈ [a, b] .
√ xn (t) − x(t) ≤ b − a n (t)
(3.109)
for all n ∈ N, t ∈ [a, b], where n (t) is deﬁned by b b s−t2 zn (s)2 ds : zn ∈ Xn , zn (s) ds = 1 . (3.110) 2n (t) := min a
a
114
Regularization by Discretization
(b) Let x ∈ H 1 (a, b). Then there exists c > 0, independent of x, such that xn − xL2 ≤ c x L2 n ∞
for all n ∈ N .
(3.111)
Proof: By the deﬁnition of the Backus–Gilbert solution and the constraint on ϕ, we have n
xn (t) − x(t) =
yj ϕj (t) − x(t)
j=1
a
n b
=
b n
kj (s) ϕj (t) ds
j=1
kj (s) x(s) − x(t) ϕj (t) ds .
j=1 a
Thus
b n x(s) − x(t) ds . k (s) ϕ (t) j j
xn (t) − x(t) ≤
a
j=1
Now we distinguish between parts (a) and (b): (a) Let x(t) − x(s) ≤ t − s. Then, by the Cauchy–Schwarz inequality and the deﬁnition of ϕj , b xn (t) − x(t)
≤
a
≤
√
n 1 · kj (s) ϕj (t) t − s ds j=1
⎡ b ⎤1/2 2 n b − a ⎣ kj (s) ϕj (t) t − s2 ds⎦ a
=
√
j=1
b − a n (t) .
(b) First, we deﬁne the cutoﬀ function λδ on [a, b] × [a, b] by 1, t − s ≥ δ, λδ (t, s) = 0, t − s < δ. Then, by the Cauchy–Schwarz inequality again, ⎤2 ⎡ b n ⎣ λδ (t, s) kj (s) ϕj (t) x(s) − x(t) ds⎦ j=1
a
⎤2 ⎡ b n x(s) − x(t) ds⎦ = ⎣ kj (s) ϕj (t) (t − s) λδ (t, s) t−s a
≤
j=1 2
b
n (t)
a
x(s) − x(t) 2 ds . λδ (t, s) s−t
(3.112)
3.6
The Backus–Gilbert Method
115
Integration with respect to t yields ⎤2 ⎡ b b n ⎣ λδ (t, s) kj (s) ϕj (t) x(s) − x(t) ds⎦ dt a
≤
j=1
a
n 2∞
x(s) − x(t) 2 s − t λδ (t, s) ds dt .
b
b
a
a
The following technical lemma from the theory of Sobolev spaces yields the assertion. Lemma 3.33 There exists c > 0 such that b b x(s) − x(t) 2 2 s − t λδ (t, s) ds dt ≤ cx L2 a
a
for all δ > 0 and x ∈ H 1 (a, b). Here, the cutoﬀ function λδ is deﬁned by (3.112). Proof:
First, we estimate x(s) − x(t)
2
t 2 t = 1 · x (τ ) dτ ≤ t − s x (τ )2 dτ s
s
and thus, for s = t, t x(s) − x(t) 2 1 ≤ x (τ )2 dτ . s−t s − t s
Now we ﬁx t ∈ (a, b) and write b x(s) − x(t) 2 s − t λδ (t, s) ds a
t ≤ a
b =
λδ (t, s) t−s
t
2
b
x (τ ) dτ ds + s
t
λδ (t, s) s−t
s
x (τ )2 dτ ds
t
x (s)2 Aδ (t, s) ds ,
a
where
Aδ (t, s) =
⎧ $s ⎪ ⎪ ⎨ ⎪ ⎪ ⎩
a $b s
λδ (t,τ ) t−τ  λδ (t,τ ) t−τ 
dτ,
'
a ≤ s < t, =
dτ,
t < s ≤ b,
t−a ln max(δ,t−s) , s ≤ t, b−t , s ≥ t. ln max(δ,s−t)
Finally, we estimate
$$\int_a^b\!\!\int_a^b \lambda_\delta(t,s)\,\frac{|x(s)-x(t)|^2}{|s-t|^2}\,ds\,dt \;\le\; \int_a^b |x'(s)|^2\Biggl(\int_a^b A_\delta(t,s)\,dt\Biggr)ds$$
and
$$\int_a^b A_\delta(t,s)\,dt \;\le\; c \qquad\text{for all } s\in(a,b) \text{ and } \delta>0\,,$$
which is seen by elementary integration. $\Box$
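Lemma 3.33 can also be checked numerically. The following sketch (the test function $x(t)=\sin 2\pi t$ on $(a,b)=(0,1)$ and the simple quadrature are assumptions for illustration, not part of the text) evaluates the double integral for decreasing $\delta$ and compares it with $\|x'\|_{L^2}^2$:

```python
import numpy as np

# Numerical check of Lemma 3.33 for the assumed test function x(t) = sin(2*pi*t)
# on (a, b) = (0, 1): the double integral stays bounded by c * ||x'||_{L^2}^2
# uniformly as delta decreases.
m = 400
t = np.linspace(0.0, 1.0, m)
h = t[1] - t[0]
x = np.sin(2 * np.pi * t)
S, T = np.meshgrid(t, t)
D = np.abs(S - T)                        # |s - t| on the grid

ratio2 = np.zeros_like(D)
mask = D > 0
diff = x[None, :] - x[:, None]           # x(s) - x(t)
ratio2[mask] = (diff[mask] / D[mask]) ** 2

xprime_sq = 2 * np.pi ** 2               # ||x'||_{L^2}^2 for x(t) = sin(2*pi*t)

vals = [((D >= delta) * ratio2).sum() * h * h    # cutoff lambda_delta, then quadrature
        for delta in [0.3, 0.1, 0.03, 0.01]]
```

The values grow monotonically as $\delta$ shrinks (the integration region grows) but remain bounded by $\|x'\|_{L^2}^2$ for this particular $x$, consistent with the lemma.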
From these error estimates, we observe that the rate of convergence depends on the magnitude of $e_n$; that is, on how well the kernels approximate the delta distribution. Finally, we study the question of convergence for $n\to\infty$.

Theorem 3.34 Assume that $\{k_j : j\in\mathbb{N}\}$ is linearly independent and dense in $L^2(a,b)$. Then
$$\|e_n\|_\infty \;\longrightarrow\; 0 \quad\text{for } n\to\infty\,.$$

Proof: For fixed $t\in[a,b]$ and arbitrary $\delta\in\bigl(0,(b-a)/2\bigr)$, we define
$$\tilde v(s) \;:=\; \begin{cases}\dfrac{1}{|s-t|}, & |s-t|\ge\delta,\\[1mm] 0, & |s-t|<\delta,\end{cases} \qquad\text{and}\qquad v(s) \;:=\; \Biggl[\int_a^b \tilde v(\tau)\,d\tau\Biggr]^{-1}\tilde v(s)\,.$$
Then $v\in L^2(a,b)$ and $\int_a^b v(s)\,ds = 1$. Because $\bigcup_{n\in\mathbb{N}} X_n$ is dense in $L^2(a,b)$, there exists a sequence $\tilde v_n\in X_n$ with $\tilde v_n\to v$ in $L^2(a,b)$. This implies also that $\int_a^b \tilde v_n(s)\,ds \to \int_a^b v(s)\,ds = 1$. Therefore, the functions
$$v_n \;:=\; \Biggl(\int_a^b \tilde v_n(s)\,ds\Biggr)^{-1}\tilde v_n \;\in\; X_n$$
converge to $v$ in $L^2(a,b)$ and are normalized by $\int_a^b v_n(s)\,ds = 1$. Thus $v_n$ is admissible, and we conclude that
$$e_n(t)^2 \;\le\; \int_a^b |s-t|^2\,|v_n(s)|^2\,ds \;=\; \int_a^b |s-t|^2\,|v(s)|^2\,ds + 2\int_a^b |s-t|^2\,v(s)\bigl[v_n(s)-v(s)\bigr]\,ds + \int_a^b |s-t|^2\,\bigl|v_n(s)-v(s)\bigr|^2\,ds$$
$$\le\; (b-a)\Biggl[\int_a^b \tilde v(s)\,ds\Biggr]^{-2} + (b-a)^2\Bigl[2\,\|v\|_{L^2}\,\|v_n-v\|_{L^2} + \|v_n-v\|_{L^2}^2\Bigr]\,.$$
This shows that
$$\limsup_{n\to\infty} e_n(t) \;\le\; \sqrt{b-a}\,\Biggl[\int_a^b \tilde v(s)\,ds\Biggr]^{-1} \qquad\text{for all } t\in[a,b]\,.$$
Direct computation yields
$$\int_a^b \tilde v(s)\,ds \;\ge\; c + |\ln\delta|$$
for some $c$ independent of $\delta$; thus
$$\limsup_{n\to\infty} e_n(t) \;\le\; \frac{\sqrt{b-a}}{c+|\ln\delta|} \qquad\text{for all } \delta\in\bigl(0,(b-a)/2\bigr)\,.$$
This yields pointwise convergence; that is, $e_n(t)\to 0$ $(n\to\infty)$ for every $t\in[a,b]$. Because $e_n(t)$ is monotonic with respect to $n$, Dini's well-known theorem from classical analysis (see, for example, [231]) yields uniform convergence. $\Box$

For further aspects of the Backus–Gilbert method, we refer to [32, 114, 144, 162, 163, 244, 272, 273].
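The quantities appearing in this section can be explored numerically. The following sketch (the monomial kernels $k_j(s)=s^{j-1}$ on $[0,1]$ are an assumed example; they are linearly independent with dense span) computes the Backus–Gilbert spread by minimizing $\varphi^\top A\varphi$ subject to the normalization $r^\top\varphi = 1$, where $A_{jk}=\int_0^1 (s-t)^2 k_j(s)k_k(s)\,ds$ and $r_j=\int_0^1 k_j(s)\,ds$; the minimizer is $\varphi = A^{-1}r/(r^\top A^{-1}r)$:

```python
import numpy as np

# Sketch: Backus-Gilbert spread e_n(t) for assumed kernels k_j(s) = s^(j-1)
# on [0,1].  phi minimizes phi^T A phi subject to r^T phi = 1, giving
# phi = A^{-1} r / (r^T A^{-1} r) and e_n(t)^2 = 1 / (r^T A^{-1} r).
def spread(t, n, m=2000):
    s = np.linspace(0.0, 1.0, m)
    w = np.full(m, s[1] - s[0]); w[0] *= 0.5; w[-1] *= 0.5   # trapezoidal rule
    K = np.vstack([s ** j for j in range(n)])                # kernel values k_j(s)
    A = (K * ((s - t) ** 2 * w)) @ K.T                       # A_jk = int (s-t)^2 k_j k_k ds
    r = K @ w                                                # r_j = int k_j ds
    phi = np.linalg.solve(A, r)
    phi /= r @ phi                                           # enforce r^T phi = 1
    return float(np.sqrt(phi @ A @ phi))                     # e_n(t)

e3, e8 = spread(0.5, 3), spread(0.5, 8)
```

The spread decreases as $n$ grows, but only slowly, in line with the logarithmic bound $\sqrt{b-a}/(c+|\ln\delta|)$ from the proof of Theorem 3.34.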
3.7 Problems
3.1 Let $Q_n : C[a,b]\to S_1(t_1,\dots,t_n)$ be the interpolation operator from Example 3.3. Prove that $\|Q_n\|_{L(C[a,b])} = 1$ and derive an estimate of the form $\|Q_n x - x\|_\infty \le c\,h\,\|x'\|_\infty$ for $x\in C^1[a,b]$, where $h = \max\{t_j - t_{j-1} : j = 2,\dots,n\}$.

3.2 Let $K : X\to X$ be self-adjoint and positive definite and let $y\in X$. Define $\psi(x) = (Kx,x)_X - 2\operatorname{Re}(y,x)_X$ for $x\in X$. Prove that $x^*\in X$ is a minimum of $\psi$ if and only if $x^*$ solves $Kx^* = y$.

3.3 Define the space $X_n$ by
$$X_n \;=\; \Biggl\{\sum_{|j|\le n} a_j\,e^{ijt} : a_j\in\mathbb{C}\Biggr\}$$
and let $P_n : L^2(0,2\pi)\to X_n$ be the orthogonal projection operator. Prove that for $r\ge s$ there exists $c>0$ such that
$$\|\psi_n\|_{H^r_{per}} \;\le\; c\,n^{r-s}\,\|\psi_n\|_{H^s_{per}} \qquad\text{for all } \psi_n\in X_n\,,$$
$$\|P_n\psi - \psi\|_{H^s_{per}} \;\le\; \frac{c}{n^{r-s}}\,\|\psi\|_{H^r_{per}} \qquad\text{for all } \psi\in H^r_{per}(0,2\pi)\,.$$
3.4 Show that the worst-case error of Symm's equation under the information $\|\psi\|_{H^s_{per}}\le E$ for some $s>0$ is given by
$$F\bigl(\delta, E, \|\cdot\|_{H^s_{per}}\bigr) \;\le\; c\,\delta^{s/(s+1)}\,.$$

3.5 Let $\Omega\subset\mathbb{R}^2$ be the disk of radius $a = \exp(-1/2)$. Then $\psi = 1$ is the unique solution of Symm's integral equation (3.57) for $f = 1$. Compute explicitly the errors of the least squares solution, the dual least squares solution, and the Bubnov–Galerkin solution as in Section 3.3, and verify that the error estimates of Theorem 3.18 are asymptotically sharp.

3.6 Let $t_k = k/n$, $k = 1,\dots,n$, be equidistant collocation points. Let $X_n$ be the space of piecewise constant functions as in (3.81) and $P_n : L^2(0,1)\to X_n$ be the orthogonal projection operator. Prove that $\bigcup_n X_n$ is dense in $L^2(0,1)$ and
$$\|x - P_n x\|_{L^2} \;\le\; \frac{1}{n}\,\|x'\|_{L^2}$$
for all $x\in H^1(0,1)$ (see Problem 3.1).

3.7 Show that the moment solution can also be interpreted as the solution of a dual least squares method.

3.8 Consider moment collocation of the equation
$$\int_0^t x(s)\,ds \;=\; y(t)\,, \qquad t\in[0,1]\,,$$
in the space $X_n = S_1(t_1,\dots,t_n)$ of linear splines. Show that the moment solution $x_n$ coincides with the two-sided difference quotient; that is,
$$x_n(t_j) \;=\; \frac{1}{2h}\bigl[y(t_j + h) - y(t_j - h)\bigr]\,,$$
where $h = 1/n$. Derive an error estimate for $\|x_n^\delta - x\|_{L^2}$ as in Example 3.23.
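As a hedged illustration of the behavior asked for in Problem 3.8 (the smooth example below is an assumption, not part of the exercise): for $y(t)=\int_0^t x(s)\,ds$ with $x(s)=\cos\pi s$, the two-sided difference quotient recovers $x$ with error $O(h^2)$ for exact data, while a data error of size $\delta$ is amplified by the factor $1/h$:

```python
import numpy as np

# Sketch for Problem 3.8 (assumed example): two-sided difference quotient
# applied to y(t) = sin(pi t)/pi, the antiderivative of x(t) = cos(pi t).
def dq(y, t, h):
    return (y(t + h) - y(t - h)) / (2.0 * h)

x = lambda t: np.cos(np.pi * t)
y = lambda t: np.sin(np.pi * t) / np.pi

h, t0 = 1e-3, 0.3
err_exact = abs(dq(y, t0, h) - x(t0))                  # truncation error O(h^2)

delta = 1e-4
y_noisy = lambda t: y(t) + delta * np.sin(t / delta)   # perturbation of size delta
err_noisy = abs(dq(y_noisy, t0, h) - x(t0))            # bounded by delta/h + O(h^2)
```

The trade-off $h^2 + \delta/h$ is exactly the balance exploited in the error estimate of Example 3.23.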
Chapter 4

Nonlinear Inverse Problems

In the previous chapters, we considered linear problems which we wrote as $Kx = y$, where $K$ was a linear and (often) compact operator between Hilbert spaces. Needless to say, most problems in applications are nonlinear. For example, even in the case of a linear differential equation of the form $-u'' + cu = f$ for the function $u$, the dependence of $u$ on the parameter function $c$ is nonlinear; that is, the mapping $c\mapsto u$ is nonlinear.¹ In Chapters 5, 6, and 7 we will study particular nonlinear problems to determine parameters of an ordinary or partial differential equation from the knowledge of the solution. Although we believe that the best strategies for solving nonlinear problems are intrinsically linked to the particular nature of the underlying problem, there are general methods for solving these nonlinear problems if they can be written in the form $K(x) = y$, where $K$ is now a nonlinear operator between Hilbert spaces or Banach spaces. Guided by the structure of Chapter 2, we will study the nonlinear form of Tikhonov regularization in Section 4.2 and the extension of the Landweber method in Section 4.3. Since the investigation of the latter is already rather involved, we do not present the extension of the conjugate gradient method or methods of Newton type, but refer the interested reader to the monograph [149] of Kaltenbacher, Neubauer, and Scherzer. We start this chapter with a clarification of the notion of ill-posedness in Section 4.1 and its relation to the ill-posedness of the linearized problem. In Section 4.2, we study Tikhonov's regularization method for nonlinear problems. In contrast to the linear case, the question of existence of a global minimum of the Tikhonov functional is not obvious and requires more advanced tools from functional analysis, in particular on weak topologies.
We include the arguments in Subsection 4.2.1 but note already here that this subsection is not central for the understanding of the further theory. It can be omitted because we formulate the existence of a minimum also as Assumption 4.9 at the beginning of Subsection 4.2.2. After the application of the general theory to the above-mentioned parameter identification problem for a boundary value problem for an ordinary differential equation, in Subsection 4.2.4 we present some of the basic ideas for Tikhonov's method in Banach spaces and more general metrics to penalize the discrepancy and to measure the error. As a particular example, we consider the determination of a sparse solution of a linear equation. Further tools from convex analysis are needed, such as the subdifferential and the Bregman distance. Finally, in Section 4.3, we return to the Hilbert space setting and extend the Landweber method from Sections 2.3 and 2.6 to the nonlinear case.

¹ The differential equation has to be complemented by initial or boundary conditions, of course.

© Springer Nature Switzerland AG 2021 A. Kirsch, An Introduction to the Mathematical Theory of Inverse Problems, Applied Mathematical Sciences 120, https://doi.org/10.1007/978-3-030-63343-1_4
4.1 Local Ill-posedness
In this chapter, we assume that $X$ and $Y$ are normed spaces (in most cases Hilbert spaces) and $K : X\supset D(K)\to Y$ a nonlinear mapping with domain of definition $D(K)\subset X$. Let $x^*\in D(K)$ and $y^*\in Y$ such that $K(x^*) = y^*$. It is the aim, as in the linear case, to determine an approximate solution to $x^*$ when the right-hand side $y^*$ is perturbed; that is, replaced by $y^\delta\in Y$ with $\|y^\delta - y^*\|_Y\le\delta$. In the following, let $B(x,r) = \{z : \|x - z\|_X < r\}$ denote the open ball centered at $x$ with radius $r$. The following notion of local ill-posedness goes back to Hofmann and Scherzer (see, e.g., [137]).

Definition 4.1 Let $x^*\in D(K)$ and $y^* = K(x^*)$. The equation $K(x) = y$ is called locally improperly posed or locally ill-posed at $x^*$ if for any sufficiently small $\rho>0$ there exists a sequence $x_n\in D(K)\cap B(x^*,\rho)$ such that $K(x_n)\to K(x^*)$ but $(x_n)$ does not converge to $x^*$ as $n$ tends to infinity.

Example 4.2 Let $k\in C^1\bigl([0,1]\times[0,1]\times\mathbb{R}\bigr)$, $k = k(t,s,r)$, and let there exist $c_1>0$ with $|\partial k(t,s,r)/\partial r|\le c_1$ for all $(t,s,r)\in[0,1]\times[0,1]\times\mathbb{R}$. Define
$$K(x)(t) \;=\; \int_0^1 k\bigl(t,s,x(s)\bigr)\,ds\,, \qquad t\in[0,1]\,, \quad\text{for } x\in L^2(0,1)\,.$$
Then $K$ is well-defined from $L^2(0,1)$ into itself, and the equation $K(x) = y$ is locally ill-posed at $x = 0$.

Proof: Let $x\in L^2(0,1)$. The application of the fundamental theorem of calculus in the form
$$k(t,s,r) \;=\; k(t,s,0) + \int_0^r \frac{\partial k}{\partial r}(t,s,r')\,dr'\,, \qquad r\in\mathbb{R}\,, \qquad(4.1)$$
implies $|k(t,s,r)|\le|k(t,s,0)| + c_1|r|$, thus $|k(t,s,r)|^2\le 2|k(t,s,0)|^2 + 2c_1^2 r^2$, and thus, using the Cauchy–Schwarz inequality,
$$\bigl|K(x)(t)\bigr|^2 \;\le\; \int_0^1 \bigl|k\bigl(t,s,x(s)\bigr)\bigr|^2\,ds \;\le\; 2\int_0^1 \Bigl[|k(t,s,0)|^2 + c_1^2\,x(s)^2\Bigr]\,ds$$
for all $t\in[0,1]$. Therefore, $K(x)$ is measurable and $|K(x)(t)|$ is bounded, which implies $K(x)\in L^2(0,1)$.

Let now $\rho>0$ be arbitrary and $x_n(t) = \rho\sqrt{2n+1}\,t^n$, $t\in[0,1]$. Then $\|x_n\|_{L^2(0,1)}^2 = \rho^2(2n+1)\int_0^1 t^{2n}\,dt = \rho^2$ and, with (4.1),
$$\bigl|K(x_n)(t) - K(0)(t)\bigr| \;\le\; c_1\int_0^1 |x_n(s)|\,ds \;=\; c_1\rho\sqrt{2n+1}\int_0^1 s^n\,ds \;=\; \frac{c_1\rho\sqrt{2n+1}}{n+1} \;\longrightarrow\; 0 \quad\text{as } n\to\infty\,.$$
Therefore, the equation $K(x) = y$ is locally improperly posed at $x = 0$. $\Box$

A second, and more concrete, example is formulated as Problem 4.2. If $K$ is continuously Fréchet-differentiable at $x^*$ with Lipschitz continuous derivative, then local ill-posedness of the nonlinear problem implies the ill-posedness of the linearization.

Theorem 4.3 Let $K$ be Fréchet-differentiable in the ball $B(x^*,\hat\rho)$ and let there exist $\gamma>0$ with $\|K'(x) - K'(x^*)\|_{L(X,Y)}\le\gamma\|x - x^*\|_X$ for all $x\in B(x^*,\hat\rho)$. Let the equation $K(x) = y$ be locally ill-posed at $x^*$. Then $K'(x^*)$ is not boundedly invertible; that is, the linear equation $K'(x^*)h = z$ is also ill-posed.

Proof: We assume on the contrary that $K'(x^*)$ is boundedly invertible and choose $\rho\in(0,\hat\rho)$ such that $\frac{\rho\gamma}{2}\,\|K'(x^*)^{-1}\|_{L(Y,X)} =: q < 1$. For this $\rho$ we choose a sequence $x_n\in D(K)\cap B(x^*,\rho)$ by Definition 4.1. Lemma A.63 of the Appendix A.7 implies the representation
$$K(x_n) - K(x^*) \;=\; K'(x^*)(x_n - x^*) + r_n \qquad\text{with}\qquad \|r_n\|_Y \;\le\; \frac{\gamma}{2}\,\|x_n - x^*\|_X^2$$
for all $n\in\mathbb{N}$; that is,
$$K'(x^*)^{-1}\bigl[K(x_n) - K(x^*)\bigr] \;=\; x_n - x^* + K'(x^*)^{-1} r_n \qquad\text{for all } n\in\mathbb{N}\,,$$
thus
$$\|x_n - x^*\|_X \;\le\; \bigl\|K'(x^*)^{-1}\bigl[K(x_n) - K(x^*)\bigr]\bigr\|_X + \bigl\|K'(x^*)^{-1} r_n\bigr\|_X$$
$$\le\; \|K'(x^*)^{-1}\|_{L(Y,X)}\,\|K(x_n) - K(x^*)\|_Y + \frac{\gamma}{2}\,\|K'(x^*)^{-1}\|_{L(Y,X)}\,\|x_n - x^*\|_X^2$$
$$\le\; \|K'(x^*)^{-1}\|_{L(Y,X)}\,\|K(x_n) - K(x^*)\|_Y + q\,\|x_n - x^*\|_X\,,$$
thus
$$(1-q)\,\|x_n - x^*\|_X \;\le\; \|K'(x^*)^{-1}\|_{L(Y,X)}\,\|K(x_n) - K(x^*)\|_Y\,,$$
and this expression converges to zero because of $K(x_n)\to K(x^*)$. This contradicts the fact that $(x_n)$ does not converge to $x^*$. $\Box$

We will see in Section 4.3 that also the reverse assertion is true provided an additional condition (the "tangential cone condition") is satisfied; see Lemma 4.35.
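The mechanism of Example 4.2 is easy to observe numerically. The following sketch uses an assumed concrete kernel $k(t,s,r) = e^{-ts}\sin r$ (so $|\partial k/\partial r|\le 1$) and the sequence $x_n(t) = \rho\sqrt{2n+1}\,t^n$ from the proof:

```python
import numpy as np

# Sketch illustrating Example 4.2 with the assumed kernel k(t,s,r) = exp(-t*s)*sin(r):
# along x_n(t) = rho*sqrt(2n+1)*t^n we have ||x_n||_{L^2} = rho for every n,
# yet K(x_n) -> K(0), so K(x) = y is locally ill-posed at x = 0.
m = 20001
s = np.linspace(0.0, 1.0, m)
w = np.full(m, s[1] - s[0]); w[0] *= 0.5; w[-1] *= 0.5     # trapezoidal weights
t = np.linspace(0.0, 1.0, 51)[:, None]
rho = 1.0

def K(xv):                                                  # (Kx)(t) = int exp(-t s) sin(x(s)) ds
    return (np.exp(-t * s) * np.sin(xv) * w).sum(axis=1)

K0 = K(np.zeros(m))
img_norms, x_norms = [], []
for n in [1, 4, 16, 64]:
    x_n = rho * np.sqrt(2 * n + 1) * s ** n
    x_norms.append(np.sqrt((x_n ** 2 * w).sum()))           # stays ~ rho
    img_norms.append(np.max(np.abs(K(x_n) - K0)))           # decays like sqrt(2n+1)/(n+1)
```

The image norms shrink while the preimage norms stay at $\rho$, which is exactly the defining property of local ill-posedness in Definition 4.1.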
4.2 The Nonlinear Tikhonov Regularization
In this section (except for Subsection 4.2.4), we assume that $X$ and $Y$ are Hilbert spaces with inner products $(\cdot,\cdot)_X$ and $(\cdot,\cdot)_Y$, respectively, and corresponding norms. Let again $x^*\in D(K)$ and $y^*\in Y$ with $K(x^*) = y^*$ and $y^\delta\in Y$ with $\|y^\delta - y^*\|_Y\le\delta$. Let, in addition, $\hat x\in X$ be given which is thought of as being an approximation of the true solution $x^*$. We define the Tikhonov functional by
$$J_{\alpha,\delta}(x) \;=\; \|K(x) - y^\delta\|_Y^2 + \alpha\,\|x - \hat x\|_X^2\,, \qquad x\in D(K)\,. \qquad(4.2)$$
In the first subsection, we will discuss briefly the question of existence of minima of the Tikhonov functional and stability with respect to perturbations of $y^\delta$. This part needs some knowledge of the weak topology in Hilbert spaces. We have collected the needed results in Section A.9 of the Appendix for the convenience of the reader. One can easily skip this subsection if one is not familiar with (or interested in) this part of functional analysis. The further analysis is quite independent of this subsection.
4.2.1 Existence of Solutions and Stability
We recall the following definition (see the remark following Definition A.75 of the Appendix A.9).

Definition 4.4 Let $X$ be a Hilbert space. A sequence $(x_n)$ in $X$ is said to converge weakly to $x\in X$ if $\lim_{n\to\infty}(x_n, z)_X = (x, z)_X$ for all $z\in X$. If $Y$ is a second Hilbert space and $K : X\supset D(K)\to Y$ is a (nonlinear) mapping, then $K$ is called weak-to-weak continuous if $K$ maps weakly convergent sequences in $D(K)$ into weakly convergent sequences in $Y$.

We will need the following two results (see Corollary A.78 and part (d) of Theorem A.76).

Theorem 4.5 Let $X$ be a Hilbert space.
(a) Every bounded sequence $(x_n)$ in $X$ contains a weak accumulation point; that is, a weakly convergent subsequence.
(b) Every convex and closed set $U\subset X$ is also weakly closed; that is, if a sequence $(x_n)$ in $U$ converges weakly to some $x\in X$, then necessarily $x\in U$.
The norm function $\|\cdot\|_X$ fails to be weakly continuous but has the following property, which is sometimes called "weak lower semicontinuity".

Lemma 4.6 Let $X$ be a Hilbert space and let the sequence $(x_n)$ converge weakly to $x$. Then
$$\liminf_{n\to\infty}\|x_n - z\|_X \;\ge\; \|x - z\|_X \qquad\text{for all } z\in X\,.$$

Proof: This follows from the formula
$$\|x_n - z\|_X^2 - \|x - z\|_X^2 \;=\; 2\operatorname{Re}(x_n - x,\, x - z)_X + \|x_n - x\|_X^2 \;\ge\; 2\operatorname{Re}(x_n - x,\, x - z)_X\,,$$
because the right-hand side of the inequality tends to zero as $n$ tends to infinity. $\Box$

Now we are able to prove the existence of minima of the Tikhonov functional under appropriate assumptions.

Theorem 4.7 Let $X$ and $Y$ be Hilbert spaces, $y^\delta\in Y$, and $K : X\supset D(K)\to Y$ weak-to-weak continuous with convex and closed domain of definition $D(K)$. Then there exists a global minimum of $J_{\alpha,\delta}$, defined in (4.2), on $D(K)$ for all $\alpha>0$.

Proof: Let $x_n\in D(K)$ be a minimizing sequence; that is, $J_{\alpha,\delta}(x_n)\to J^* := \inf\{J_{\alpha,\delta}(x) : x\in D(K)\}$ as $n\to\infty$. From $\alpha\|x_n - \hat x\|_X^2\le J_{\alpha,\delta}(x_n)\le J^* + 1$ for sufficiently large $n$, we conclude that the sequence $(x_n)$ is bounded. By part (a) of Theorem 4.5, there exists a subsequence, which we also denote by $(x_n)$, which converges weakly to some $x$. Also, $x\in D(K)$ by part (b) of this theorem. Furthermore, $K(x_n)$ converges weakly to $K(x)$ by the assumption on $K$. Lemma 4.6 implies that $\liminf_{n\to\infty}\|x_n - \hat x\|_X\ge\|x - \hat x\|_X$ and $\liminf_{n\to\infty}\|K(x_n) - y^\delta\|_Y\ge\|K(x) - y^\delta\|_Y$. Therefore, for any $\varepsilon>0$ there exists $N\in\mathbb{N}$ such that for all $n\ge N$
$$J_{\alpha,\delta}(x) \;=\; \alpha\|x - \hat x\|_X^2 + \|K(x) - y^\delta\|_Y^2 \;\le\; \alpha\|x_n - \hat x\|_X^2 + \|K(x_n) - y^\delta\|_Y^2 + \varepsilon \;=\; J_{\alpha,\delta}(x_n) + \varepsilon \;\le\; J^* + 2\varepsilon\,.$$
This holds for all $\varepsilon>0$, thus $J^*\le J_{\alpha,\delta}(x)\le J^*$, which proves that $x$ is a minimum of $J_{\alpha,\delta}$. $\Box$

In the same way one proves stability.
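Theorem 4.7 only guarantees that a global minimizer exists; computing one is a separate matter. The following minimal sketch (the toy map $K(x) = Ax + 0.1\,x^3$, the data, and all parameter values are assumptions for illustration) minimizes the Tikhonov functional (4.2) by plain gradient descent:

```python
import numpy as np

# Minimal sketch (assumed toy problem): minimize the Tikhonov functional
# J(x) = ||K(x) - y_delta||^2 + alpha*||x - xhat||^2 by gradient descent
# for the nonlinear map K(x) = A x + 0.1*x**3 (componentwise cube).
A = np.array([[1.0, 0.5], [0.5, 1.0]]) / 3.0

def K(z):
    return A @ z + 0.1 * z ** 3

def Kprime(z):
    return A + np.diag(0.3 * z ** 2)        # Frechet derivative K'(z)

x_true = np.array([1.0, -0.5])
y_delta = K(x_true) + np.array([1e-3, -1e-3])   # perturbed data
alpha, xhat = 1e-3, np.zeros(2)

x = xhat.copy()
for _ in range(20000):
    # gradient of J:  2 K'(x)^T (K(x) - y_delta) + 2 alpha (x - xhat)
    grad = 2.0 * Kprime(x).T @ (K(x) - y_delta) + 2.0 * alpha * (x - xhat)
    x = x - 0.1 * grad
```

For this well-behaved toy problem the iteration approaches a minimizer close to $x^*$; in general, nonconvexity of $J_{\alpha,\delta}$ means a descent method may stall in a local minimum, which is why the theory above only addresses existence of global minima.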
Theorem 4.8 Let the assumptions of the previous theorem hold and, in addition, let $(y_n)$ be a sequence with $y_n\to y^\delta$ as $n\to\infty$. Let $x_n\in D(K)$ be a minimum of $J_n(x) := \alpha\|x - \hat x\|_X^2 + \|K(x) - y_n\|_Y^2$ on $D(K)$. Then there exist weak accumulation points of the sequence $(x_n)$, and every weak accumulation point $x$ is a minimum of $J_{\alpha,\delta}$. If, in addition, $K$ is weak-to-norm continuous, then every weak accumulation point $x$ of $(x_n)$ is also an accumulation point with respect to the norm.

Proof: The estimate
$$\alpha\|x_n - \hat x\|_X^2 + \|K(x_n) - y_n\|_Y^2 \;=\; J_n(x_n) \;\le\; J_n(\hat x) \;=\; \|K(\hat x) - y_n\|_Y^2 \;\le\; \bigl[\|K(\hat x) - y^\delta\|_Y + \|y^\delta - y_n\|_Y\bigr]^2 \;\le\; \bigl[\|K(\hat x) - y^\delta\|_Y + 1\bigr]^2$$
for sufficiently large $n$ implies again that the sequence $(x_n)$ is bounded. Thus, it contains weak accumulation points by Theorem 4.5. Let $x\in D(K)$ be a weak accumulation point; without loss of generality let $x_n$ itself converge weakly to $x$. Then $K(x_n) - y_n$ converges weakly to $K(x) - y^\delta$ and thus, as before, $\liminf_{n\to\infty}\|x_n - \hat x\|_X\ge\|x - \hat x\|_X$ and $\liminf_{n\to\infty}\|K(x_n) - y_n\|_Y\ge\|K(x) - y^\delta\|_Y$. Set $J_n^* = J_n(x_n)$. Then, for any $\varepsilon>0$, there exists $N\in\mathbb{N}$ such that for all $n\ge N$ and all $\tilde x\in D(K)$
$$J_{\alpha,\delta}(x) \;\le\; \alpha\|x_n - \hat x\|_X^2 + \|K(x_n) - y_n\|_Y^2 + \varepsilon \;=\; J_n^* + \varepsilon \;\le\; J_n(\tilde x) + \varepsilon \;=\; \alpha\|\tilde x - \hat x\|_X^2 + \|K(\tilde x) - y_n\|_Y^2 + \varepsilon\,. \qquad(4.3)$$
Now we let $n$ tend to infinity. This yields
$$J_{\alpha,\delta}(x) \;\le\; \liminf_{n\to\infty} J_n^* + \varepsilon \;\le\; \limsup_{n\to\infty} J_n^* + \varepsilon \;\le\; J_{\alpha,\delta}(\tilde x) + \varepsilon\,.$$
Letting $\varepsilon$ tend to zero proves the optimality of $x$ (as $\tilde x\in D(K)$ is arbitrary) and, taking $\tilde x = x$, also the convergence of $J_n^*$ to $J_{\alpha,\delta}(x)$. Let now $K$ be weak-to-norm continuous. From
$$\alpha\bigl[\|x_n - \hat x\|_X^2 - \|x - \hat x\|_X^2\bigr] \;=\; J_n^* - J_{\alpha,\delta}(x) + \|K(x) - y^\delta\|_Y^2 - \|K(x_n) - y_n\|_Y^2$$
and the convergence of $K(x_n)$ to $K(x)$, we conclude that the right-hand side converges to zero and thus $\|x_n - \hat x\|_X\to\|x - \hat x\|_X$. Finally, the binomial formula yields
$$\|x_n - x\|_X^2 \;=\; \|(x_n - \hat x) - (x - \hat x)\|_X^2 \;=\; \|x_n - \hat x\|_X^2 + \|x - \hat x\|_X^2 - 2\operatorname{Re}(x_n - \hat x,\, x - \hat x)_X\,,$$
which tends to zero. $\Box$

4.2.2 Source Conditions and Convergence Rates
We make the following general assumption.
Assumption 4.9
(a) $D(K)$ is open, $x^*\in D(K)\cap B(\hat x,\rho)$ is a solution of $K(x) = y^*$ in some ball $B(\hat x,\rho)$, and $K$ is continuously Fréchet-differentiable on $D(K)\cap B(\hat x,\rho)$,
(b) $K'(x^*)$ is compact from $X$ into $Y$,
(c) the Tikhonov functional $J_{\alpha,\delta}$ possesses global minima $x^{\alpha,\delta}\in D(K)$ on $D(K)$ for all $\alpha,\delta>0$; that is,
$$\alpha\|x^{\alpha,\delta} - \hat x\|_X^2 + \|K(x^{\alpha,\delta}) - y^\delta\|_Y^2 \;\le\; \alpha\|x - \hat x\|_X^2 + \|K(x) - y^\delta\|_Y^2 \qquad(4.4)$$
for all $x\in D(K)$.

We refer to the previous subsection where we showed part (c) of this assumption under appropriate smoothness assumptions on $K$. Now we are able to formulate the condition which corresponds to the "source condition" in the linear case. We will include the more general form with index functions and introduce the following notion.

Definition 4.10 Any monotonically increasing and continuous function $\varphi : [0,\delta_{\max}]\to\mathbb{R}$ (for some $\delta_{\max}>0$) with $\varphi(0) = 0$ is called an index function.

The most prominent examples of index functions are $\varphi(t) = t^\sigma$ for any $\sigma>0$ but also $\varphi(t) = -1/\ln t$ for $0<t<1$ (and $\varphi(0) = 0$). By calculating the second derivative, one observes that the latter is concave on $[0,1/e^2]$ and the first class is concave whenever $\sigma\le 1$. The linear case $\varphi(t) = \beta t$ for $t\ge 0$ is particularly important.

Assumption 4.11 (Source condition) Let $D(K)$ be open, let $x^*\in D(K)\cap B(\hat x,\rho)$ be a solution of $K(x) = y^*$ in some ball $B(\hat x,\rho)$, and let $K$ be differentiable on $D(K)\cap B(\hat x,\rho)$. Furthermore, let $\varphi : [0,\delta_{\max}]\to\mathbb{R}$ be a concave index function with $\delta_{\max}\ge\|K'(x^*)\|_{L(X,Y)}$.
(i) Let $K'$ be locally Lipschitz continuous; that is, there exists $\gamma>0$ with
$$\|K'(x) - K'(x^*)\|_{L(X,Y)} \;\le\; \gamma\,\|x - x^*\|_X \qquad\text{for all } x\in B(x^*,\rho)\cap D(K)\,,$$
(ii) and there exists $w\in X$ with
$$x^* - \hat x \;=\; \varphi\Bigl(\bigl[K'(x^*)^*\,K'(x^*)\bigr]^{1/2}\Bigr)\,w \qquad\text{and}\qquad \gamma\,\|w\|_X < 1\,.$$

We note that for a linear compact operator $A : X\to Y$ (in the present case $A := K'(x^*)$) the operator $\varphi\bigl([A^*A]^{1/2}\bigr)$ from $X$ into itself is defined as in (A.47) by a singular system $\{\mu_j, x_j, y_j : j\in J\}$ for $A$ (see Appendix A.6, Theorem A.57), where $J$ is finite or $J = \mathbb{N}$, namely
$$\varphi\bigl([A^*A]^{1/2}\bigr)\,z \;=\; \sum_{j\in J}\varphi(\mu_j)\,(z, x_j)_X\,x_j\,, \qquad z\in X\,.$$
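The singular-system formula above is straightforward to evaluate numerically via an SVD. A minimal sketch (the matrix $A$, the vector $z$, and the choice $\varphi(t)=\sqrt t$ are arbitrary illustrations):

```python
import numpy as np

# Sketch: evaluating phi([A^* A]^{1/2}) z via a singular system of A.
# The matrix A and the vector z are arbitrary illustrations.
A = np.array([[2.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
U, mu, Vt = np.linalg.svd(A, full_matrices=False)   # A = U diag(mu) V^T

def phi_of_AstarA_sqrt(phi, z):
    # phi([A^* A]^{1/2}) z = sum_j phi(mu_j) (z, x_j) x_j, x_j = rows of Vt
    return Vt.T @ (phi(mu) * (Vt @ z))

z = np.array([1.0, 2.0, 3.0])
w = phi_of_AstarA_sqrt(np.sqrt, z)                  # phi(t) = sqrt(t)
```

For $\varphi(t) = t$, applying the operator twice reproduces $A^*Az$, a quick consistency check; components of $z$ in the nullspace of $A$ are annihilated since $\varphi(0) = 0$.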
In the special case $\varphi(t) = t^\sigma$ the condition reads as $x^* - \hat x = (A^*A)^{\sigma/2} w$, which is just the source condition of the linear case (for $\hat x = 0$). Again, for $\sigma = 1$ the ranges $\mathcal{R}\bigl([K'(x^*)^* K'(x^*)]^{1/2}\bigr)$ and $\mathcal{R}\bigl(K'(x^*)^*\bigr)$ coincide, which is seen from the singular system $\{\mu_j, x_j, y_j : j\in J\}$. Therefore, in this linear case part (ii) takes the form $x^* - \hat x = K'(x^*)^* v$ for some $v\in Y$ with $\gamma\|v\|_Y < 1$.

In the past decade, a different kind of source condition has been developed which does not need the derivative of $K$. It can be generalized to a wider class of Tikhonov functionals with nondifferentiable $K$ acting between Banach spaces. We refer to Subsection 4.2.4 for a short glimpse of these extensions.

Assumption 4.12 (Variational source condition) Let $D(K)$ be open, let $x^*\in D(K)\cap B(\hat x,\rho)$ be a solution of $K(x) = y^*$ in some ball $B(\hat x,\rho)$, and let $\varphi : [0,\delta_{\max}]\to\mathbb{R}$ be a concave index function with $\delta_{\max}\ge\sup\bigl\{\|K(x^*) - K(x)\|_Y : x\in D(K)\cap B(\hat x,\rho)\bigr\}$. Furthermore, there exists a constant $\beta>0$ such that
$$\beta\,\|x^* - x\|_X^2 \;\le\; \|x - \hat x\|_X^2 - \|x^* - \hat x\|_X^2 + \varphi\bigl(\|K(x^*) - K(x)\|_Y\bigr)$$
for all $x\in B(\hat x,\rho)\cap D(K)$.

This assumption is also known as a variational inequality (see, e.g., [245]). We prefer the notion of variational source condition as, e.g., in [96], because it takes the role of the source condition. We now show a relationship between Assumption 4.11 and Assumption 4.12.

Theorem 4.13 Let $D(K)$ be open, let $x^*\in D(K)\cap B(\hat x,\rho)$ be a solution of $K(x) = y^*$ with $\rho\in(0,1/2)$, and let $\varphi : [0,\delta_{\max}]\to\mathbb{R}$ be a concave index function with $\delta_{\max}\ge\|K'(x^*)\|_{L(X,Y)}$ and $\delta_{\max}\ge\sup\bigl\{\|K(x^*) - K(x)\|_Y : x\in D(K)\cap B(\hat x,\rho)\bigr\}$.

(a) The variational source condition of Assumption 4.12 is equivalent to the following condition: There exists $0\le\sigma<1$ such that
$$2\operatorname{Re}(x^* - \hat x,\, x^* - x)_X \;\le\; \sigma\,\|x^* - x\|_X^2 + \varphi\bigl(\|K(x^*) - K(x)\|_Y\bigr) \qquad(4.5)$$
for all $x\in B(\hat x,\rho)\cap D(K)$.

(b) Let Assumption 4.11 hold and, in addition, $\varphi(t) = t$ for all $t$ or $\|K'(x^*)(x - x^*)\|_Y\le\eta\bigl(\|K(x) - K(x^*)\|_Y\bigr)$ for all $x\in B(\hat x,\rho)\cap D(K)$, where $\eta$ is another concave index function. Then also Assumption 4.12 holds with some index function $\tilde\varphi$ which is linear if $\varphi$ is linear.

(c) Let Assumption 4.12 hold for $\varphi(t) = \beta t$ for all $t\ge 0$. Then $x^* - \hat x\in\mathcal{R}\bigl(K'(x^*)^*\bigr) = \mathcal{R}\bigl([K'(x^*)^* K'(x^*)]^{1/2}\bigr)$.

Proof:
(a) From the elementary equation
$$\|x - \hat x\|_X^2 - \|x^* - \hat x\|_X^2 \;=\; 2\operatorname{Re}(x^* - \hat x,\, x - x^*)_X + \|x - x^*\|_X^2$$
we observe that the variational source condition is equivalent to
$$\beta\,\|x - x^*\|_X^2 \;\le\; 2\operatorname{Re}(x^* - \hat x,\, x - x^*)_X + \|x - x^*\|_X^2 + \varphi\bigl(\|K(x^*) - K(x)\|_Y\bigr)\,,$$
that is,
$$2\operatorname{Re}(x^* - \hat x,\, x^* - x)_X \;\le\; (1-\beta)\,\|x - x^*\|_X^2 + \varphi\bigl(\|K(x^*) - K(x)\|_Y\bigr)\,,$$
which has the desired form with $\sigma = 1-\beta$ if $\beta\le 1$ and $\sigma = 0$ if $\beta>1$.

(b) Set $A := K'(x^*)$ for abbreviation. Then, for $x\in B(\hat x,\rho)\cap D(K)$,
$$\operatorname{Re}(x^* - \hat x,\, x^* - x)_X \;=\; \operatorname{Re}\bigl(\varphi([A^*A]^{1/2})\,w,\; x^* - x\bigr)_X \;=\; \operatorname{Re}\bigl(w,\; \varphi([A^*A]^{1/2})(x^* - x)\bigr)_X \;\le\; \|w\|_X\,\bigl\|\varphi([A^*A]^{1/2})(x^* - x)\bigr\|_X\,.$$
Now we use the fact that for any concave index function
$$\bigl\|\varphi([A^*A]^{1/2})\,z\bigr\|_X \;\le\; \varphi\bigl(\|Az\|_Y\bigr) \qquad\text{for all } z\in X \text{ with } \|z\|_X\le 1\,.$$
For a proof, we refer to Lemma A.73 of the Appendix. Therefore,
$$\operatorname{Re}(x^* - \hat x,\, x^* - x)_X \;\le\; \|w\|_X\,\varphi\bigl(\|K'(x^*)(x^* - x)\|_Y\bigr)\,. \qquad(4.6)$$
If $\|K'(x^*)(x - x^*)\|_Y\le\eta\bigl(\|K(x) - K(x^*)\|_Y\bigr)$ for all $x\in B(\hat x,\rho)\cap D(K)$, then
$$\operatorname{Re}(x^* - \hat x,\, x^* - x)_X \;\le\; \|w\|_X\,\varphi\Bigl(\eta\bigl(\|K(x) - K(x^*)\|_Y\bigr)\Bigr)$$
because of the monotonicity of $\varphi$. This proves the estimate (4.5) with $\sigma = 0$ and $\tilde\varphi = 2\|w\|_X\,\varphi\circ\eta$. Note that the composition of two concave index functions is again a concave index function. If $\varphi(t) = t$ for all $t$, we use the estimate
$$\|K(x) - K(x^*) - K'(x^*)(x - x^*)\|_Y \;\le\; \frac{\gamma}{2}\,\|x^* - x\|_X^2$$
for all $x\in B(\hat x,\rho)\cap D(K)$ (see Lemma A.63 of the Appendix A.7), thus $\|K'(x^*)(x - x^*)\|_Y\le\|K(x) - K(x^*)\|_Y + \frac{\gamma}{2}\|x^* - x\|_X^2$, and thus from (4.6)
$$2\operatorname{Re}(x^* - \hat x,\, x^* - x)_X \;\le\; 2\|w\|_X\,\|K(x^*) - K(x)\|_Y + \|w\|_X\,\gamma\,\|x^* - x\|_X^2$$
for all $x\in B(\hat x,\rho)\cap D(K)$, which proves the estimate (4.5) with $\sigma = \gamma\|w\|_X < 1$ and $\tilde\varphi(t) = 2\|w\|_X\,t$ for $t\ge 0$.

(c) Let Assumption 4.12 hold for $\varphi(t) = \beta t$ for all $t\ge 0$. For any fixed $z\in X$ and $t\in\mathbb{K}$ sufficiently small such that $x := x^* - tz\in B(\hat x,\rho)\cap D(K)$, we substitute $x$ into (4.5) for $\varphi(t) = \beta t$, which yields
$$2\operatorname{Re}\bigl(t\,(x^* - \hat x,\, z)_X\bigr) \;\le\; |t|^2\sigma\,\|z\|_X^2 + \beta\,\|K(x^* - tz) - K(x^*)\|_Y \;\le\; |t|^2\sigma\,\|z\|_X^2 + |t|\,\beta\,\|K'(x^*)z\|_Y + \frac{\beta\gamma}{2}\,|t|^2\|z\|_X^2\,.$$
Dividing by $|t|$ and letting $t$ tend to zero yields² $|(x^* - \hat x,\, z)_X|\le\frac{\beta}{2}\|Az\|_Y$ for all $z\in X$, where again $A = K'(x^*)$. We show that $x^* - \hat x$ belongs to the range of $A^*$. First we note that $x^* - \hat x$ is orthogonal to the nullspace $N(A)$ of $A$. We choose a singular system $\{\mu_j, x_j, y_j : j\in J\}$ for $A$ (see Appendix A.6, Theorem A.57), where $J$ is finite or $J = \mathbb{N}$, and expand $x^* - \hat x$ in the form $x^* - \hat x = \sum_{j\in J}\gamma_j x_j$. (The component in the nullspace $N(A)$ vanishes because $x^* - \hat x$ is orthogonal to $N(A)$.) We set $J_n = J$ if $J$ is finite and $J_n = \{1,\dots,n\}$ if $J = \mathbb{N}$. For $z = \sum_{j\in J_n}\frac{\gamma_j}{\mu_j^2}\,x_j$ the inequality $|(x^* - \hat x,\, z)_X|\le\frac{\beta}{2}\|Az\|_Y$ is equivalent to
$$\sum_{j\in J_n}\frac{\gamma_j^2}{\mu_j^2} \;\le\; \frac{\beta}{2}\Biggl[\sum_{j\in J_n}\frac{\gamma_j^2}{\mu_j^2}\Biggr]^{1/2}\,;$$
that is, $\sum_{j\in J_n}\frac{\gamma_j^2}{\mu_j^2}\le\frac{\beta^2}{4}$. This proves that $w := \sum_{j\in J}\frac{\gamma_j}{\mu_j}\,y_j\in Y$ is well-defined. Finally, $A^*w = x^* - \hat x$. $\Box$
Under the Assumptions 4.9 and 4.12, we are able to prove convergence and also rates of convergence as in the linear theory. As we know from the linear theory, there are (at least) two strategies to choose the regularization parameter $\alpha$. To achieve the rate $\mathcal{O}(\sqrt\delta)$, we should choose $\alpha = \alpha(\delta)$ to be proportional to $\delta$ (a priori choice) or such that the "discrepancy" $\|K(x^{\alpha(\delta),\delta}) - y^\delta\|_Y$ is proportional to $\delta$ (a posteriori choice). For nonlinear operators $K$, essentially the same arguments as for linear operators (substitute $x = \hat x$ and $x = x^*$ into (4.4)) show that
$$\limsup_{\alpha\to 0}\|K(x^{\alpha,\delta}) - y^\delta\|_Y \;\le\; \delta \qquad\text{and}\qquad \lim_{\alpha\to\infty}\|K(x^{\alpha,\delta}) - y^\delta\|_Y \;=\; \|K(\hat x) - y^\delta\|_Y$$
for any choice of minimizers $x^{\alpha,\delta}$. However, for nonlinear operators, the mapping $\alpha\mapsto\|K(x^{\alpha,\delta}) - y^\delta\|_Y$ is not necessarily continuous (see, e.g., [219] for a discussion of this topic). Therefore, the discrepancy principle is not well-defined unless more restrictive assumptions are made. In the following, we simply take the possibility to choose the regularization parameter by the discrepancy principle as an assumption (see also [245], Section 4.1.2).

Theorem 4.14 Let Assumptions 4.9 and 4.12 hold and let $\rho > 2\|x^* - \hat x\|_X$.
(a) Let $\alpha = \alpha(\delta)$ be chosen such that
$$c_-\,\frac{\delta^2}{\varphi(\delta)} \;\le\; \alpha(\delta) \;\le\; c_+\,\frac{\delta^2}{\varphi(\delta)} \qquad\text{for all } \delta>0\,,$$
where $c_+\ge c_->0$ are independent of $\delta$ (a priori choice),
(b) or assume that there exist $r_+ > r_-\ge 1$ and $\alpha(\delta)>0$ such that
$$r_-\,\delta \;\le\; \|K(x^{\alpha(\delta),\delta}) - y^\delta\|_Y \;\le\; r_+\,\delta \qquad\text{for all } \delta>0$$
(a posteriori choice), where $x^{\alpha(\delta),\delta}$ denotes a minimum of the Tikhonov functional $J_{\alpha(\delta),\delta}$ on $D(K)$.

² Note that the phases of $t/|t|$ can be chosen arbitrarily!
Then $x^{\alpha(\delta),\delta}\in B(\hat x,\rho)$ for sufficiently small $\delta$ and
$$\|x^{\alpha(\delta),\delta} - x^*\|_X^2 \;=\; \mathcal{O}\bigl(\varphi(\delta)\bigr) \qquad\text{and}\qquad \|K(x^{\alpha(\delta),\delta}) - y^*\|_Y \;=\; \mathcal{O}(\delta)\,, \qquad \delta\to 0\,.$$

Proof: We show first that $x^{\alpha(\delta),\delta}\in B(\hat x,\rho)$ for sufficiently small $\delta$. From (4.4) for $x = x^*$, we conclude that
$$\|K(x^{\alpha,\delta}) - y^\delta\|_Y^2 + \alpha\,\|x^{\alpha,\delta} - \hat x\|_X^2 \;\le\; \delta^2 + \alpha\,\|x^* - \hat x\|_X^2 \qquad(4.7a)$$
$$\le\; \delta^2 + \alpha\,\frac{\rho^2}{4}\,. \qquad(4.7b)$$
If $c_-\frac{\delta^2}{\varphi(\delta)}\le\alpha(\delta)\le c_+\frac{\delta^2}{\varphi(\delta)}$, we conclude that
$$\|x^{\alpha(\delta),\delta} - \hat x\|_X^2 \;\le\; \frac{\delta^2}{\alpha(\delta)} + \frac{\rho^2}{4} \;\le\; \frac{\varphi(\delta)}{c_-} + \frac{\rho^2}{4} \;\le\; \rho^2$$
for sufficiently small $\delta$. If the discrepancy principle holds, then from (4.7b),
$$r_-^2\,\delta^2 + \alpha(\delta)\,\|x^{\alpha(\delta),\delta} - \hat x\|_X^2 \;\le\; \delta^2 + \alpha(\delta)\,\frac{\rho^2}{4}\,,$$
and thus $\alpha(\delta)\,\|x^{\alpha(\delta),\delta} - \hat x\|_X^2\le\alpha(\delta)\,\frac{\rho^2}{4}$ because $r_-\ge 1$. This shows $\|x^{\alpha(\delta),\delta} - \hat x\|_X\le\rho$ and ends the first part of the proof.

To show the error estimates, we use the variational source condition of Assumption 4.12 and (4.7a):
$$\|K(x^{\alpha,\delta}) - y^\delta\|_Y^2 + \alpha\beta\,\|x^{\alpha,\delta} - x^*\|_X^2 \;\le\; \|K(x^{\alpha,\delta}) - y^\delta\|_Y^2 + \alpha\,\|x^{\alpha,\delta} - \hat x\|_X^2 - \alpha\,\|x^* - \hat x\|_X^2 + \alpha\,\varphi\bigl(\|K(x^*) - K(x^{\alpha,\delta})\|_Y\bigr)$$
$$\le\; \delta^2 + \alpha\,\varphi\bigl(\|K(x^*) - K(x^{\alpha,\delta})\|_Y\bigr) \;\le\; \delta^2 + \alpha\,\varphi\bigl(\|y^\delta - K(x^{\alpha,\delta})\|_Y + \delta\bigr)\,. \qquad(4.8)$$
Let first $\alpha(\delta)$ be chosen according to the discrepancy principle. Then
$$(r_-^2 - 1)\,\delta^2 + \alpha(\delta)\,\beta\,\|x^{\alpha(\delta),\delta} - x^*\|_X^2 \;\le\; \alpha(\delta)\,\varphi\bigl((r_+ + 1)\delta\bigr) \;\le\; \alpha(\delta)\,(1 + r_+)\,\varphi(\delta)\,,$$
where we have used that $\varphi(s\delta)\le s\,\varphi(\delta)$ for all $s\ge 1$ (see Lemma A.73). This proves the assertion for $\|x^{\alpha(\delta),\delta} - x^*\|_X$ after division by $\alpha(\delta)$ and dropping the first term on the left-hand side. The estimate for $\|K(x^{\alpha(\delta),\delta}) - y^*\|_Y$ follows from the triangle inequality because $\|K(x^{\alpha(\delta),\delta}) - y^*\|_Y\le\|K(x^{\alpha(\delta),\delta}) - y^\delta\|_Y + \delta\le(r_+ + 1)\,\delta$.
Let now $c_-\frac{\delta^2}{\varphi(\delta)}\le\alpha(\delta)\le c_+\frac{\delta^2}{\varphi(\delta)}$. Substituting this into (4.8) and dropping the second term on the left-hand side yields
$$\|K(x^{\alpha(\delta),\delta}) - y^\delta\|_Y^2 \;\le\; \delta^2 + \frac{c_+\,\delta^2}{\varphi(\delta)}\,\varphi\bigl(\|y^\delta - K(x^{\alpha(\delta),\delta})\|_Y + \delta\bigr)\,.$$
Now we set $t = \|K(x^{\alpha(\delta),\delta}) - y^\delta\|_Y/\delta$ for abbreviation. Then the previous estimate reads as
$$t^2 \;\le\; 1 + \frac{c_+}{\varphi(\delta)}\,\varphi\bigl((1+t)\delta\bigr) \;\le\; 1 + c_+(1+t) \;=\; 1 + c_+ + c_+\,t\,,$$
where we used again $\varphi\bigl((1+t)\delta\bigr)\le(1+t)\,\varphi(\delta)$. Completing the square yields
$$t \;\le\; \frac{c_+}{2} + \sqrt{1 + c_+ + \frac{c_+^2}{4}}\,;$$
that is, $\|K(x^{\alpha(\delta),\delta}) - y^\delta\|_Y\le c\,\delta$ for some $c>0$. Now we substitute this and the bounds of $\alpha(\delta)$ into (4.8) again and drop the first term on the left-hand side, which yields
$$c_-\,\frac{\delta^2}{\varphi(\delta)}\,\beta\,\|x^{\alpha(\delta),\delta} - x^*\|_X^2 \;\le\; \delta^2 + c_+\,\frac{\delta^2}{\varphi(\delta)}\,\varphi\bigl((1+c)\delta\bigr) \;\le\; \delta^2 + c_+(1+c)\,\delta^2\,,$$
which yields $c_-\,\beta\,\|x^{\alpha(\delta),\delta} - x^*\|_X^2\le\bigl[1 + c_+(1+c)\bigr]\,\varphi(\delta)$ and ends the proof of the theorem. $\Box$

By Theorem 4.13, the special case $\varphi(t) = t$ corresponds to the source condition $x^* - \hat x\in\mathcal{R}\bigl((A^*A)^{1/2}\bigr) = \mathcal{R}(A^*)$ and leads to the order $\mathcal{O}(\sqrt\delta)$ just as in the linear case. The cases $\varphi(t) = t^\sigma$ for $\sigma>1$ are not covered by the previous theorem because these index functions are not concave anymore. For proving the analogue of Theorem 2.12 to get the optimal order of convergence up to $\mathcal{O}(\delta^{2/3})$, we have to use the classical source condition of Assumption 4.11, which is the obvious extension of the one in the linear case; see Theorem 2.12. A variational source condition for this case is not available. We follow the approach in [92] (see also [260] for the original proof).

Theorem 4.15 Let Assumptions 4.9 and 4.11 hold with $\rho > 2\|x^* - \hat x\|_X$ and (ii) modified in the way that $x^* - \hat x = \bigl[K'(x^*)^*\,K'(x^*)\bigr]^{\sigma/2}\,v\in\mathcal{R}\bigl(\bigl[K'(x^*)^*\,K'(x^*)\bigr]^{\sigma/2}\bigr)$ for some $\sigma\in[1,2]$ and $v\in X$ such that $\gamma\,\bigl\|\bigl[K'(x^*)^*\,K'(x^*)\bigr]^{-1/2}(x^* - \hat x)\bigr\|_X < 1$, where $\gamma$ denotes the Lipschitz constant from part (i) of Assumption 4.11. We choose $\alpha(\delta)$ such that $c_-\,\delta^{2/(\sigma+1)}\le\alpha(\delta)\le c_+\,\delta^{2/(\sigma+1)}$. Then
$$\|x^{\alpha(\delta),\delta} - x^*\|_X \;=\; \mathcal{O}\bigl(\delta^{\sigma/(\sigma+1)}\bigr)\,, \qquad \|K(x^{\alpha(\delta),\delta}) - y^*\|_Y \;=\; \mathcal{O}\bigl(\delta^{\sigma/(\sigma+1)}\bigr)\,, \qquad \delta\to 0\,.$$
Proof: We leave $\alpha>0$ arbitrary until the end of the proof. We set $A := K'(x^*)$ and choose a singular system $\{\mu_j, x_j, y_j : j\in J\}$ for $A$ (see Appendix A.6, Theorem A.57), where $J$ is finite or $J = \mathbb{N}$. Then we write $x^* - \hat x$ as
$$x^* - \hat x \;=\; (A^*A)^{\sigma/2}\,v \;=\; \sum_{j\in J}\mu_j^\sigma\,v_j\,x_j \;=\; A^*w \qquad\text{with}\qquad w \;=\; \sum_{j\in J}\mu_j^{\sigma-1}\,v_j\,y_j\,,$$
where $v_j = (v, x_j)_X$ are the expansion coefficients of $v$. Then $\gamma\|w\|_Y<1$. We define $z_\alpha\in X$ by
$$z_\alpha \;=\; x^* - \alpha\,(A^*A + \alpha I)^{-1} A^*w\,; \qquad(4.9)$$
that is,
$$z_\alpha \;=\; x^* - \alpha\sum_{j\in J}\frac{\mu_j^\sigma}{\mu_j^2 + \alpha}\,v_j\,x_j\,, \qquad\text{thus}\qquad \|z_\alpha - x^*\|_X^2 \;=\; \alpha^2\sum_{j\in J}\biggl(\frac{\mu_j^\sigma}{\mu_j^2 + \alpha}\biggr)^2|v_j|^2\,.$$
Later, we will also need the form
$$A(z_\alpha - x^*) + \alpha w \;=\; \alpha\sum_{j\in J}\Bigl(\mu_j^{\sigma-1} - \frac{\mu_j^{\sigma+1}}{\mu_j^2 + \alpha}\Bigr)v_j\,y_j \;=\; \alpha^2\sum_{j\in J}\frac{\mu_j^{\sigma-1}}{\mu_j^2 + \alpha}\,v_j\,y_j$$
with
$$\|A(z_\alpha - x^*) + \alpha w\|_Y^2 \;=\; \alpha^4\sum_{j\in J}\biggl(\frac{\mu_j^{\sigma-1}}{\mu_j^2 + \alpha}\biggr)^2|v_j|^2\,.$$
With the elementary estimate (see Problem 4.3)
$$\frac{\mu^t}{\mu^2 + \alpha} \;\le\; c_t\,\alpha^{t/2 - 1}\,, \qquad \mu\ge 0\,,$$
for $t = \sigma$ and $t = \sigma - 1$, respectively (where $c_t$ depends on $t$ only), we obtain
$$\|z_\alpha - x^*\|_X^2 \;\le\; c_\sigma^2\,\alpha^\sigma\,, \qquad(4.10a)$$
$$\|A(z_\alpha - x^*) + \alpha w\|_Y^2 \;\le\; c_{\sigma-1}^2\,\alpha^{\sigma+1}\,. \qquad(4.10b)$$
In particular, $z_\alpha$ converges to $x^*$ as $\alpha\to 0$ and is, therefore, in $B(\hat x,\rho)\cap D(K)$ for sufficiently small $\alpha$. We use the optimality of $x^{\alpha,\delta}$, that is, (4.4) for $x = z_\alpha$, to obtain
$$\|K(x^{\alpha,\delta}) - y^\delta\|_Y^2 + \alpha\,\|x^{\alpha,\delta} - \hat x\|_X^2 \;\le\; \|K(z_\alpha) - y^\delta\|_Y^2 + \alpha\,\|z_\alpha - \hat x\|_X^2\,.$$
With
$$\|x^{\alpha,\delta} - \hat x\|_X^2 \;=\; \|x^* - \hat x\|_X^2 + 2\operatorname{Re}\bigl(x^{\alpha,\delta} - x^*,\, x^* - \hat x\bigr)_X + \|x^{\alpha,\delta} - x^*\|_X^2 \;=\; \|x^* - \hat x\|_X^2 + 2\operatorname{Re}\bigl(A(x^{\alpha,\delta} - x^*),\, w\bigr)_Y + \|x^{\alpha,\delta} - x^*\|_X^2\,,$$
$$\|z_\alpha - \hat x\|_X^2 \;=\; \|x^* - \hat x\|_X^2 + 2\operatorname{Re}\bigl(A(z_\alpha - x^*),\, w\bigr)_Y + \|z_\alpha - x^*\|_X^2\,,$$
we obtain
$$\|K(x^{\alpha,\delta}) - y^\delta\|_Y^2 + 2\alpha\operatorname{Re}\bigl(w,\, A(x^{\alpha,\delta} - x^*)\bigr)_Y + \alpha\,\|x^{\alpha,\delta} - x^*\|_X^2 \;\le\; \|K(z_\alpha) - y^\delta\|_Y^2 + 2\alpha\operatorname{Re}\bigl(w,\, A(z_\alpha - x^*)\bigr)_Y + \alpha\,\|z_\alpha - x^*\|_X^2\,,$$
and thus
$$\|K(x^{\alpha,\delta}) - y^\delta + \alpha w\|_Y^2 + \alpha\,\|x^{\alpha,\delta} - x^*\|_X^2 \;\le\; \alpha^2\|w\|_Y^2 + 2\alpha\operatorname{Re}\bigl(w,\, K(x^{\alpha,\delta}) - y^\delta - A(x^{\alpha,\delta} - x^*)\bigr)_Y + \|K(z_\alpha) - y^\delta\|_Y^2 + 2\alpha\operatorname{Re}\bigl(w,\, A(z_\alpha - x^*)\bigr)_Y + \alpha\,\|z_\alpha - x^*\|_X^2\,.$$
Now we use
$$K(x^{\alpha,\delta}) \;=\; y^* + A(x^{\alpha,\delta} - x^*) + r^{\alpha,\delta}\,, \qquad K(z_\alpha) \;=\; y^* + A(z_\alpha - x^*) + s_\alpha$$
with $\|r^{\alpha,\delta}\|_Y\le\frac{\gamma}{2}\|x^{\alpha,\delta} - x^*\|_X^2$ and $\|s_\alpha\|_Y\le\frac{\gamma}{2}\|z_\alpha - x^*\|_X^2$. Rearranging the right-hand side (expanding $\|y^* - y^\delta + A(z_\alpha - x^*) + s_\alpha\|_Y^2$ and completing the square $\|A(z_\alpha - x^*) + \alpha w + s_\alpha\|_Y^2$) and estimating with $\|y^* - y^\delta\|_Y\le\delta$ yields
$$\|K(x^{\alpha,\delta}) - y^\delta + \alpha w\|_Y^2 + \alpha\,\|x^{\alpha,\delta} - x^*\|_X^2 \;\le\; 2\delta\,\|A(z_\alpha - x^*) + \alpha w\|_Y + \gamma\delta\,\|z_\alpha - x^*\|_X^2 + \alpha\gamma\|w\|_Y\,\|x^{\alpha,\delta} - x^*\|_X^2 + 2\,\|A(z_\alpha - x^*) + \alpha w\|_Y^2 + \frac{\gamma^2}{2}\,\|z_\alpha - x^*\|_X^4 + \alpha\gamma\|w\|_Y\,\|z_\alpha - x^*\|_X^2 + \alpha\,\|z_\alpha - x^*\|_X^2\,.$$
Now we use that $\gamma\|w\|_Y<1$ and thus
$$\|K(x^{\alpha,\delta}) - y^\delta + \alpha w\|_Y^2 + \alpha\bigl(1 - \gamma\|w\|_Y\bigr)\|x^{\alpha,\delta} - x^*\|_X^2 \;\le\; 2\delta\,\|A(z_\alpha - x^*) + \alpha w\|_Y + 2\,\|A(z_\alpha - x^*) + \alpha w\|_Y^2 + \frac{\gamma^2}{2}\,\|z_\alpha - x^*\|_X^4 + \bigl(\gamma\delta + \alpha\gamma\|w\|_Y + \alpha\bigr)\|z_\alpha - x^*\|_X^2\,.$$
So far, we have not used the definition of $z_\alpha$. We substitute the estimates (4.10a) and (4.10b) and arrive at
$$\|K(x^{\alpha,\delta}) - y^\delta + \alpha w\|_Y^2 + \alpha\bigl(1 - \gamma\|w\|_Y\bigr)\|x^{\alpha,\delta} - x^*\|_X^2 \;\le\; c\bigl(\delta\,\alpha^{(\sigma+1)/2} + \alpha^{\sigma+1} + \alpha^{2\sigma} + \delta\,\alpha^\sigma\bigr)$$
for some $c>0$. Dropping one of the terms on the left-hand side yields
$$\|K(x^{\alpha,\delta}) - y^\delta\|_Y^2 \;\le\; 2\,\|K(x^{\alpha,\delta}) - y^\delta + \alpha w\|_Y^2 + 2\alpha^2\|w\|_Y^2 \;\le\; c\bigl(\delta\,\alpha^{(\sigma+1)/2} + \alpha^{\sigma+1} + \alpha^{2\sigma} + \delta\,\alpha^\sigma + \alpha^2\bigr)\,,$$
$$\|x^{\alpha,\delta} - x^*\|_X^2 \;\le\; c\bigl(\delta\,\alpha^{(\sigma-1)/2} + \alpha^\sigma + \alpha^{2\sigma-1} + \delta\,\alpha^{\sigma-1}\bigr)$$
for some $c>0$. The choice $c_-\,\delta^{2/(\sigma+1)}\le\alpha(\delta)\le c_+\,\delta^{2/(\sigma+1)}$ yields the desired result (note that $1\le\sigma\le 2$). $\Box$

We note that the modified defect satisfies $\|K(x^{\alpha(\delta),\delta}) - y^* + \alpha(\delta)\,w\|_Y\le c\,\delta$ (see Problem 4.4). In the next subsection, we apply the result to the Tikhonov regularization of a parameter identification problem.
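The rate of Theorem 4.15 is easy to observe in the linear special case. The following sketch (a diagonal toy operator, an assumed singular-value decay, and $\sigma = 2$ are all illustrative choices) applies Tikhonov regularization with the a priori choice $\alpha\sim\delta^{2/(\sigma+1)}$ and measures the error decay:

```python
import numpy as np

# Sketch (assumed linear toy model): with a source element
# x* = (A^T A)^{sigma/2} v and alpha ~ delta^(2/(sigma+1)), the Tikhonov
# error should decay like delta^(sigma/(sigma+1)), the rate in Theorem 4.15.
rng = np.random.default_rng(1)
n, sigma = 40, 2.0
mu = 0.9 ** np.arange(n)                 # assumed singular values of a diagonal A
A = np.diag(mu)
v = rng.standard_normal(n); v /= np.linalg.norm(v)
x_star = mu ** sigma * v                 # (A^T A)^{sigma/2} v for diagonal A

deltas = [1e-2, 1e-4, 1e-6]
errs = []
for delta in deltas:
    noise = rng.standard_normal(n); noise *= delta / np.linalg.norm(noise)
    y = A @ x_star + noise               # data with ||noise|| = delta
    alpha = delta ** (2.0 / (sigma + 1.0))
    x_ad = np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ y)   # Tikhonov
    errs.append(np.linalg.norm(x_ad - x_star))
```

For $\sigma = 2$ the observed log-log slope is close to $\sigma/(\sigma+1) = 2/3$, the saturation rate $\mathcal{O}(\delta^{2/3})$ mentioned before Theorem 4.15.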
4.2.3 A Parameter-Identification Problem
Let $f\in L^2(0,1)$ be given. It is the aim to determine the parameter function $c\in L^2(0,1)$, $c\ge 0$ on $(0,1)$, in the boundary value problem
$$-u''(t) + c(t)\,u(t) \;=\; f(t)\,, \quad 0<t<1\,, \qquad u(0) = u(1) = 0\,, \qquad(4.11)$$
from perturbed data $u^\delta(t)$. We recall the Sobolev spaces $H^p(0,1)$ from (1.24) as the spaces
$$H^p(0,1) \;=\; \biggl\{u\in C^{p-1}[0,1] : u^{(p-1)}(t) = \alpha + \int_0^t \psi(s)\,ds\,,\ \alpha\in\mathbb{R}\,,\ \psi\in L^2(0,1)\biggr\}$$
and set $u^{(p)} := \psi$ for the $p$th derivative. Note that $H^p(0,1)\subset C[0,1]$ for $p\ge 1$ by definition. Then the differential equation of (4.11) for $u\in H^2(0,1)$ is understood in the $L^2$ sense. It is an easy exercise (see Problem 4.7) to show that $\|u\|_\infty\le\|u'\|_{L^2(0,1)}$ for all $u\in H^1(0,1)$ with $u(0) = 0$. First, we consider the direct problem and show that the boundary value problem is equivalent to an integral equation of the second kind.

Lemma 4.16 Let $f\in L^2(0,1)$ and $c\in L^2_+(0,1) := \{c\in L^2(0,1) : c\ge 0 \text{ almost everywhere on } (0,1)\}$.

(a) If $u\in H^2(0,1)$ solves (4.11), then $u$ solves the integral equation
$$u(t) + \int_0^1 g(t,s)\,c(s)\,u(s)\,ds \;=\; \int_0^1 g(t,s)\,f(s)\,ds\,, \qquad t\in[0,1]\,, \qquad(4.12)$$
where
$$g(t,s) \;=\; \begin{cases} s\,(1-t), & 0\le s\le t\le 1,\\ t\,(1-s), & 0\le t\le s\le 1.\end{cases}$$
(b) We define the integral operator $G : L^2(0,1)\to L^2(0,1)$ by
$$(Gv)(t) \ =\ \int_0^1 g(t,s)\,v(s)\,ds \ =\ (1-t)\int_0^t s\,v(s)\,ds + t\int_t^1 (1-s)\,v(s)\,ds,$$
$t\in(0,1)$, $v\in L^2(0,1)$. The operator $G$ is bounded from $L^2(0,1)$ into $H^2(0,1)$ and $(Gv)'' = -v$.
(c) If $u\in C[0,1]$ is a solution of (4.12), that is, of $u + G(cu) = Gf$, then $u\in H^2(0,1)$ and $u$ is a solution of (4.11).

Note that the right-hand side of (4.12) is continuous because $Gf\in H^2(0,1)$.

Proof: (a) Let $u\in H^2(0,1)$ solve (4.11) and set $h = f - cu$. Then $h\in L^2(0,1)$ (because $u$ is continuous) and $-u'' = h$. Integrating this equation twice and using the boundary conditions $u(0)=u(1)=0$ yields the assertion (see Problem 4.5).

(b) Let $v\in L^2(0,1)$ and set
$$u(t) = (Gv)(t) = (1-t)\int_0^t s\,v(s)\,ds + t\int_t^1 (1-s)\,v(s)\,ds, \quad t\in[0,1],$$
$$\psi(t) = -\int_0^t v(s)\,ds + \int_0^1 (1-s)\,v(s)\,ds, \quad t\in[0,1].$$
Then it is easy to see that $u(t) = \int_0^t \psi(s)\,ds$. Therefore, $u\in H^1(0,1)$ and $u'=\psi$. From the definitions of $\psi$ and $H^2(0,1)$, we observe that $u\in H^2(0,1)$ and $v = -u''$.

(c) Let now $u\in C[0,1]$ be a solution of (4.12) and set again $h = f - cu$. Then again $h\in L^2(0,1)$, and $u$ has the representation $u = Gh$. By part (b), we conclude that $u\in H^2(0,1)$ and $-u'' = h = f - cu$.

Theorem 4.17 The integral equation (4.12) and the boundary value problem (4.11) are uniquely solvable for all $f\in L^2(0,1)$ and $c\in L^2_+(0,1)$. Furthermore, there exists $\gamma > 0$ (independent of $f$ and $c$) such that $\|u\|_{H^2(0,1)} \le \gamma\,\big(1+\|c\|_{L^2(0,1)}\big)\,\|f\|_{L^2(0,1)}$.

Proof:
By the previous lemma we have to study the integral equation
$$u + G(cu) \ =\ Gf \tag{4.13}$$
with the integral operator $G$ with kernel $g$ from the previous lemma. The operator $T : u\mapsto G(cu)$ is bounded from $C[0,1]$ into $H^2(0,1)$ and thus compact from $C[0,1]$ into itself (see again Problem 4.5). Now we use the following result from linear functional analysis (see Theorem A.36 of Appendix A.3): If the homogeneous linear equation $u + Tu = 0$ with the compact operator $T$ from $C[0,1]$ into itself admits only the trivial solution $u = 0$, then the inhomogeneous equation $u + Tu = h$ is uniquely solvable for all $h\in C[0,1]$, and the solution depends continuously on $h$. In other words, if $I+T$ is one-to-one then it is also onto, and $I+T$ is boundedly invertible. Therefore, we have to show injectivity of $I+T$ in $C[0,1]$. Let $u\in C[0,1]$ solve (4.13) for $h=0$. Then $u\in H^2(0,1)$ and $u$ solves (4.11) for $f=0$ by the previous lemma. Multiplication of (4.11) by $u(t)$ and integration yields
$$0 \ =\ \int_0^1 \big(-u''(t) + c(t)\,u(t)\big)\,u(t)\,dt \ =\ \int_0^1 \big(u'(t)^2 + c(t)\,u(t)^2\big)\,dt,$$
where we used partial integration and the fact that $u$ vanishes at the boundary of $[0,1]$. Since $c\ge 0$ we conclude that $u'$ vanishes on $[0,1]$. Therefore, $u$ is constant and thus zero because of the boundary conditions. Therefore, $I+T$ is one-to-one and thus invertible. This shows that (4.11) is uniquely solvable in $H^2(0,1)$ for every $f, c\in L^2(0,1)$ with $c\ge 0$ almost everywhere on $(0,1)$.

In order to derive the explicit estimate for $\|u\|_{H^2(0,1)}$ we observe first that any solution $u\in H^2(0,1)$ of (4.11) satisfies $\|u'\|^2_{L^2(0,1)} \le \|f\|_{L^2(0,1)}\,\|u\|_{L^2(0,1)}$. Indeed, this follows by multiplication of the differential equation by $u(t)$ and integration:
$$-\int_0^1 u''(t)\,u(t)\,dt + \int_0^1 c(t)\,u(t)^2\,dt \ =\ \int_0^1 f(t)\,u(t)\,dt \ \le\ \|f\|_{L^2(0,1)}\,\|u\|_{L^2(0,1)}\,.$$
Partial integration and the assumption $c(t)\ge 0$ yields the estimate $\|u'\|^2_{L^2(0,1)} \le \|f\|_{L^2(0,1)}\,\|u\|_{L^2(0,1)}$. With $\|u\|_\infty \le \|u'\|_{L^2(0,1)}$ and $\|u\|_{L^2(0,1)} \le \|u\|_\infty$ this implies that $\|u\|_\infty \le \|f\|_{L^2(0,1)}$. Therefore
$$\|u\|_{H^2(0,1)} = \|G(f-cu)\|_{H^2(0,1)} \le \|G\|_{\mathcal{L}(L^2(0,1),H^2(0,1))}\,\|f-cu\|_{L^2(0,1)} \le \|G\|_{\mathcal{L}(L^2(0,1),H^2(0,1))}\,\big(\|f\|_{L^2(0,1)} + \|c\|_{L^2(0,1)}\,\|u\|_\infty\big) \le \big(1+\|c\|_{L^2(0,1)}\big)\,\|G\|_{\mathcal{L}(L^2(0,1),H^2(0,1))}\,\|f\|_{L^2(0,1)}\,.$$
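The equivalence of Lemma 4.16 also gives a simple numerical scheme for the direct problem: replace the integrals in (4.12) by a quadrature rule and solve the resulting linear system. The following Nyström-type sketch is not from the text; the grid size and the choice $c\equiv 1$, $u(t)=\sin(\pi t)$ are my own test data.

```python
import numpy as np

# Nystroem discretization of the integral equation (4.12):
#   u(t) + int_0^1 g(t,s) c(s) u(s) ds = int_0^1 g(t,s) f(s) ds
# with the Green's kernel g(t,s) = s(1-t) for s <= t and t(1-s) for t <= s.
n = 200
t = (np.arange(n) + 0.5) / n            # midpoint quadrature nodes, weight 1/n
S, T = np.meshgrid(t, t)                # S[i, j] = t_j (variable s), T[i, j] = t_i
G = np.where(S <= T, S * (1 - T), T * (1 - S)) / n

c = np.ones(n)                          # parameter c(t) = 1
f = (np.pi**2 + 1) * np.sin(np.pi * t)  # chosen so that u(t) = sin(pi t) solves (4.11)

# (I + G diag(c)) u = G f
u = np.linalg.solve(np.eye(n) + G * c, G @ f)

err = np.max(np.abs(u - np.sin(np.pi * t)))
print(err)
```

The kink of $g$ along $s=t$ is harmless for the midpoint rule here, so the computed solution agrees with the exact one up to quadrature error.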
We can even show existence and uniqueness for $c$ from a small open neighborhood of $L^2_+(0,1) = \{c\in L^2(0,1) : c\ge 0$ on $(0,1)\}$.

Corollary 4.18 There exists $\delta > 0$ such that the boundary value problem (4.11) is uniquely solvable for all $f\in L^2(0,1)$ and $c\in U_\delta$ where
$$U_\delta := \Big\{ c = c_1 + h \in L^2(0,1) \ :\ c_1\in L^2_+(0,1),\ h\in L^2(0,1),\ \big(1+\|c_1\|_{L^2(0,1)}\big)\,\|h\|_{L^2(0,1)} < \delta \Big\}.$$
Furthermore, there exists $\gamma > 0$ such that $\|u\|_{H^2(0,1)} \le \gamma\,\big(1+\|c\|_{L^2(0,1)}\big)\,\|f\|_{L^2(0,1)}$ for all $f\in L^2(0,1)$ and $c\in U_\delta$.

Proof: Let $K_{c_1}: L^2(0,1)\to H^2(0,1)$ be the operator $f\mapsto u$ where $u$ is the solution of (4.11) for $c_1\in L^2_+(0,1)$. Let $c = c_1 + h\in U_\delta$. We consider the fixed point equation $\tilde u + K_{c_1}(h\tilde u) = K_{c_1} f$ for $\tilde u\in H^2(0,1)$. We have for $v\in H^2(0,1)$ that
$$\|K_{c_1}(hv)\|_{H^2(0,1)} \ \le\ \gamma\,\big(1+\|c_1\|_{L^2(0,1)}\big)\,\|hv\|_{L^2(0,1)} \ \le\ \gamma\,\big(1+\|c_1\|_{L^2(0,1)}\big)\,\|h\|_{L^2(0,1)}\,\|v\|_\infty \ \le\ \gamma\,\delta\,\|v\|_{H^2(0,1)}\,.$$
For $\delta < 1/\gamma$ we observe that $v\mapsto K_{c_1}(hv)$ is a contraction and, by the Contraction Theorem A.31, $\tilde u + K_{c_1}(h\tilde u) = K_{c_1} f$ has a unique solution $\tilde u\in H^2(0,1)$
and
$$\|\tilde u\|_{H^2(0,1)} \ \le\ \frac{1}{1-\delta\gamma}\,\|K_{c_1}f\|_{H^2(0,1)} \ \le\ \gamma\,\frac{1+\|c_1\|_{L^2(0,1)}}{1-\delta\gamma}\,\|f\|_{L^2(0,1)} \ \le\ \gamma\,\frac{1+\|c_1+h\|_{L^2(0,1)}+\delta}{1-\delta\gamma}\,\|f\|_{L^2(0,1)} \ \le\ \gamma\,\frac{1+\delta}{1-\delta\gamma}\,\big(1+\|c\|_{L^2(0,1)}\big)\,\|f\|_{L^2(0,1)}$$
because $\|h\|_{L^2(0,1)}\le\delta$. Finally, we note that the equation $\tilde u = K_{c_1}f - K_{c_1}(h\tilde u)$ is equivalent to $-\tilde u'' + (c_1+h)\,\tilde u = f$.

We note that the set $U_\delta$ is an open set containing $L^2_+(0,1)$ (see Problem 4.5). Therefore, $K$ can be extended to the set $U_\delta$. As a next step towards the inverse problem, we show that the nonlinear mapping $K : c\mapsto u$ is continuous and even differentiable.

Theorem 4.19 Let $U_\delta\supset L^2_+(0,1)$ be as in the previous corollary and let $K : U_\delta\to H^2(0,1)$ be defined by $K(c) = u$ where $u\in H^2(0,1)$ solves the boundary value problem (4.11). Then $K$ is continuous and even differentiable at every $c\in U_\delta$. The derivative is given by $K'(c)h = v$ where $v\in H^2(0,1)$ is the solution of the boundary value problem
$$-v''(t) + c(t)\,v(t) = -h(t)\,u(t),\quad 0<t<1,\qquad v(0)=v(1)=0, \tag{4.14}$$
and $u\in H^2(0,1)$ is the solution of (4.11) for $c$; that is, $u = K(c)$.

Proof: Let $h\in L^2(0,1)$ such that $c+h\in U_\delta$ and let $\tilde u = K(c+h)$. Then $\tilde u$ and $u$ satisfy
$$-\tilde u'' + (c+h)\,\tilde u = f, \qquad -u'' + (c+h)\,u = f + hu, \qquad \tilde u(0)=\tilde u(1)=u(0)=u(1)=0,$$
respectively. We subtract both equations which yields $-(\tilde u-u)'' + (c+h)(\tilde u-u) = -hu$. The stability estimate yields
$$\|\tilde u - u\|_{H^2(0,1)} \ \le\ \gamma\,\big(1+\|c+h\|_{L^2(0,1)}\big)\,\|hu\|_{L^2(0,1)} \ \le\ \gamma\,\big(1+\|c\|_{L^2(0,1)}+\delta\big)\,\|h\|_{L^2(0,1)}\,\|u\|_\infty$$
which proves continuity (even Lipschitz continuity on bounded sets for $c$). For the differentiability, we just subtract the equations for $u$ and $v$ from the one for $\tilde u$:
$$-(\tilde u - u - v)'' + (c+h)(\tilde u - u - v) \ =\ -hv, \tag{4.15}$$
with homogeneous boundary conditions. The stability estimate yields
$$\|\tilde u - u - v\|_{H^2(0,1)} \ \le\ \gamma\,\big(1+\|c+h\|_{L^2(0,1)}\big)\,\|hv\|_{L^2(0,1)} \ \le\ \gamma\,\big(1+\|c\|_{L^2(0,1)}+\delta\big)\,\|h\|_{L^2(0,1)}\,\|v\|_\infty\,.$$
The stability estimate applied to $v$ yields
$$\|v\|_\infty \ \le\ \|v\|_{H^2(0,1)} \ \le\ \gamma\,\big(1+\|c\|_{L^2(0,1)}\big)\,\|hu\|_{L^2(0,1)} \ \le\ \gamma\,\big(1+\|c\|_{L^2(0,1)}\big)\,\|h\|_{L^2(0,1)}\,\|u\|_\infty$$
which altogether ends up with
$$\|\tilde u - u - v\|_{H^2(0,1)} \ \le\ \gamma^2\,\big(1+\|c\|_{L^2(0,1)}+\delta\big)^2\,\|h\|^2_{L^2(0,1)}\,\|u\|_\infty\,.$$
This proves differentiability.
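The derivative formula of Theorem 4.19 can be checked numerically by a Taylor test: the remainder $K(c+\varepsilon h)-K(c)-\varepsilon\,K'(c)h$ should decay like $\varepsilon^2$. A sketch with a standard finite-difference solver standing in for the boundary value problems; the solver, the parameter $c$, and the direction $h$ below are my own choices.

```python
import numpy as np

def solve_bvp(c, rhs, n):
    # Second-order finite differences for -z'' + c z = rhs, z(0) = z(1) = 0,
    # on the interior grid points t_i = i/n, i = 1, ..., n-1.
    h = 1.0 / n
    A = (np.diag(2.0 / h**2 + c)
         - np.diag(np.ones(n - 2) / h**2, 1)
         - np.diag(np.ones(n - 2) / h**2, -1))
    return np.linalg.solve(A, rhs)

n = 400
t = np.arange(1, n) / n
f = np.sin(np.pi * t)
c = 1.0 + t**2                  # an admissible parameter, c >= 0
hdir = np.cos(3 * t)            # a fixed perturbation direction h

u = solve_bvp(c, f, n)          # u = K(c)
v = solve_bvp(c, -hdir * u, n)  # v = K'(c)h, the linearized problem (4.14)

# The remainder ||K(c + eps*h) - K(c) - eps*v|| should decay like eps^2:
for eps in (1e-1, 1e-2, 1e-3):
    rem = np.max(np.abs(solve_bvp(c + eps * hdir, f, n) - u - eps * v))
    print(eps, rem)
```

Halving $\varepsilon$ by a factor of ten should reduce the remainder by roughly a factor of one hundred, in accordance with the quadratic estimate above.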
In the following, we consider the parameter-to-solution map $K$ as a mapping from $U_\delta\subset L^2(0,1)$ into $L^2(0,1)$ instead of $H^2(0,1)$. Then, of course, $K$ is also differentiable with respect to this space with the same derivative. From the theory, we know that for Assumption 4.11, we need the adjoint of $K'(c)$.

Lemma 4.20 The adjoint operator $K'(c)^* : L^2(0,1)\to L^2(0,1)$ is given by $K'(c)^*w = -u\,y$ where $u = K(c)$, and $y\in H^2(0,1)$ solves the following boundary value problem (the "adjoint problem"):
$$-y''(t) + c(t)\,y(t) = w(t),\quad 0<t<1,\qquad y(0)=y(1)=0. \tag{4.16}$$
Proof: Let $h, w\in L^2(0,1)$, let $v$ be the solution of (4.14) for $c = c^*$, and let $y$ be the solution of (4.16). By partial integration we compute
$$\big(K'(c)h,\,w\big)_{L^2(0,1)} \ =\ \int_0^1 v(t)\,\big[-y''(t) + c(t)\,y(t)\big]\,dt \ =\ \int_0^1 \big[-v''(t) + c(t)\,v(t)\big]\,y(t)\,dt \ =\ -\int_0^1 h(t)\,u(t)\,y(t)\,dt \ =\ -(h,\,uy)_{L^2(0,1)}\,.$$
This proves the assertion.
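The adjoint representation of Lemma 4.20 can be verified discretely: with a symmetric matrix $A$ discretizing $-d^2/dt^2 + c$, the identity $(K'(c)h, w) = (h, -u\,y)$ even holds exactly up to rounding, since $(A^{-1}(-hu), w) = (-hu, A^{-1}w)$. The discretization and all parameters in this sketch are my own choices, not from the text.

```python
import numpy as np

def solve_bvp(c, rhs, n):
    # -z'' + c z = rhs, z(0) = z(1) = 0, symmetric finite-difference matrix
    h = 1.0 / n
    A = (np.diag(2.0 / h**2 + c)
         - np.diag(np.ones(n - 2) / h**2, 1)
         - np.diag(np.ones(n - 2) / h**2, -1))
    return np.linalg.solve(A, rhs)

n = 300
t = np.arange(1, n) / n
wt = 1.0 / n                      # quadrature weight for the L2 inner product

c = 2.0 + np.sin(2 * np.pi * t)
f = t * (1 - t)
h = np.exp(-t)
w = np.cos(np.pi * t)

u = solve_bvp(c, f, n)            # u = K(c)
v = solve_bvp(c, -h * u, n)       # v = K'(c)h, cf. (4.14)
y = solve_bvp(c, w, n)            # adjoint state, cf. (4.16)
Kstar_w = -u * y                  # K'(c)* w = -u y  (Lemma 4.20)

lhs = wt * np.dot(v, w)           # (K'(c)h, w)_{L2}
rhs = wt * np.dot(h, Kstar_w)     # (h, K'(c)* w)_{L2}
print(lhs, rhs)
```

Because the same symmetric matrix is used for the direct and the adjoint solve, the two inner products agree to machine precision; this "adjoint test" is a standard sanity check when implementing gradients for parameter identification.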
Now we can formulate condition (ii) of Assumption 4.11 for a linear index function $\varphi$: The existence of $w\in L^2(0,1)$ with $c^*-\hat c = K'(c^*)^*w$ is equivalent to the existence of $y\in H^2(0,1)$ with $y(0)=y(1)=0$ and $c^*-\hat c = -u^*\,y$. Therefore, the condition is equivalent to
$$\frac{c^*-\hat c}{u^*} \ \in\ H^2(0,1)\cap H^1_0(0,1)\,.$$
This includes smoothness of $c^*-\hat c$ as well as a sufficiently strong boundary condition, because also $u^*$ vanishes at the boundary of $[0,1]$.
4.2.4
A Glimpse on Extensions to Banach Spaces
Recalling the classical Tikhonov functional $J_{\alpha,\delta}$ in Hilbert spaces from (4.2) we observe that the first term measures the misfit in the equation while the second part serves as a penalty term. The error $\|x^{\alpha,\delta}-x^*\|$ in the solution is measured in a third metric. In many cases, the canonical space for the unknown quantity $x$ is only a Banach space rather than a Hilbert space. For example, in parameter identification problems as in the previous subsection the canonical spaces for the parameters are $L^\infty$-spaces rather than $L^2$-spaces. The Hilbert space setting in the previous subsection only works because the pointwise multiplication is continuous as a mapping from $L^2\times H^2$ into $L^2$. For more general partial differential equations this is not always true. For an elaborate motivation of Banach space settings, we refer to Chapter I of the excellent monograph [245] by Schuster, Kaltenbacher, Hofmann, and Kazimierski. In this subsection, we get the flavor of some aspects of this theory.

Let $X$ and $Y$ be Banach spaces, $K : X\supset\mathcal{D}(K)\to Y$ a (nonlinear) operator with the domain of definition $\mathcal{D}(K)\subset X$ where $\mathcal{D}(K)$ is again convex and closed, and let $x^*\in\mathcal{D}(K)$ be the exact solution of $K(x)=y^*$ for some $y^*$. In the following we fix this pair $x^*, y^*$. As before, $y^*$ is perturbed by $y^\delta\in Y$ such that $\|y^\delta-y^*\|_Y\le\delta$ for all $\delta\in(0,\delta_{\max})$ for some $\delta_{\max}>0$. We note that we measure the error in the data with respect to the Banach space norm. The penalty term $\|x-\hat x\|_X^2$ is now replaced by any convex and continuous function $\Omega : X\to[0,\infty]$ where $\mathcal{D}(\Omega):=\{x\in X:\Omega(x)<\infty\}$ is not empty and, even more, $\mathcal{D}(K)\cap\mathcal{D}(\Omega)\ne\emptyset$. Further assumptions on $\Omega$ and $K$ are needed to ensure the existence of minima of the Tikhonov functional
$$J_{\alpha,\delta}(x) := \|K(x)-y^\delta\|_Y^p + \alpha\,\Omega(x), \quad x\in\mathcal{D}(K)\cap\mathcal{D}(\Omega). \tag{4.17}$$
Here $p > 1$ is a fixed parameter. Instead of posing assumptions concerning the weak topology, we just make the same assumption as at the beginning of Subsection 4.2.2.

Assumption 4.21 The Tikhonov functional $J_{\alpha,\delta}$ possesses global minima $x^{\alpha,\delta}\in\mathcal{D}(K)\cap\mathcal{D}(\Omega)$ on $\mathcal{D}(K)\cap\mathcal{D}(\Omega)$ for all $\alpha,\delta>0$; that is,
$$\|K(x^{\alpha,\delta})-y^\delta\|_Y^p + \alpha\,\Omega(x^{\alpha,\delta}) \ \le\ \|K(x)-y^\delta\|_Y^p + \alpha\,\Omega(x) \tag{4.18}$$
for all $x\in\mathcal{D}(K)\cap\mathcal{D}(\Omega)$.

It remains to specify the metric in which we measure the error in $x$. As we will see in a moment, the norm in $X$ is not always the best possibility. To have more flexibility, we take any "measure function" $E(x)$ which measures the distance of $x$ to $x^*$. We only require that $E(x)\ge 0$ for all $x\in X$ and $E(x^*)=0$. Then the variational source condition of Assumption 4.12 is generalized into the following form.
Assumption 4.22 (Variational Source Condition) Let $x^*\in\mathcal{D}(K)\cap\mathcal{D}(\Omega)$ be a solution of $K(x)=y^*$. Furthermore, let $\delta_{\max}=\sup\big\{\|K(x^*)-K(x)\|_Y : x\in\mathcal{D}(K)\cap\mathcal{D}(\Omega)\big\}$ and $\varphi:[0,\delta_{\max})\to\mathbb{R}$ be a concave index function (see Definition 4.10), and, for some constants $\beta>0$ and $\rho>0$, let the following estimate hold:
$$\beta\,E(x) \ \le\ \Omega(x) - \Omega(x^*) + \varphi\big(\|K(x^*)-K(x)\|_Y\big)$$
for all $x\in M_\rho := \big\{x\in\mathcal{D}(K)\cap\mathcal{D}(\Omega) : \Omega(x)\le\Omega(x^*)+\rho\big\}$.

Theorem 4.23 Let Assumptions 4.21 and 4.22 hold.

(a) Let $\alpha=\alpha(\delta)$ be chosen such that
$$c_-\,\frac{\delta^p}{\varphi(\delta)} \ \le\ \alpha(\delta) \ \le\ c_+\,\frac{\delta^p}{\varphi(\delta)} \quad\text{for all } \delta\in(0,\delta_1) \tag{4.19}$$
where $c_+\ge c_-\ge \frac{1}{\rho}$ are independent of $\delta$ and where $\delta_1\in(0,\delta_{\max})$ is chosen such that $\varphi(\delta_1)\le\rho\,c_-$ (a priori choice),

(b) or assume that there exist $r_+>r_-\ge 1$ and $\alpha(\delta)>0$ such that
$$r_-\,\delta \ \le\ \|K(x^{\alpha(\delta),\delta})-y^\delta\|_Y \ \le\ r_+\,\delta \quad\text{for all } \delta\in(0,\delta_1) \tag{4.20}$$
where $\delta_1\in(0,\delta_{\max})$ is arbitrary (a posteriori choice).

Then $x^{\alpha(\delta),\delta}\in M_\rho$ and
$$E(x^{\alpha(\delta),\delta}) = \mathcal{O}\big(\varphi(\delta)\big) \quad\text{and}\quad \|K(x^{\alpha(\delta),\delta})-y^*\|_Y = \mathcal{O}(\delta), \qquad \delta\to 0.$$
Proof: We follow almost exactly the proof of Theorem 4.14. Substituting $x=x^*$ into (4.18) yields
$$\|K(x^{\alpha,\delta})-y^\delta\|_Y^p + \alpha\,\Omega(x^{\alpha,\delta}) \ \le\ \delta^p + \alpha\,\Omega(x^*). \tag{4.21}$$
First we show that $x^{\alpha(\delta),\delta}\in M_\rho$. If $\alpha(\delta)$ is chosen as in (4.19) then
$$\Omega(x^{\alpha(\delta),\delta}) \ \le\ \frac{\delta^p}{\alpha(\delta)} + \Omega(x^*) \ \le\ \frac{1}{c_-}\,\varphi(\delta) + \Omega(x^*) \ \le\ \frac{1}{c_-}\,\varphi(\delta_1) + \Omega(x^*)$$
which shows $x^{\alpha(\delta),\delta}\in M_\rho$ by the choice of $\delta_1$. If $\alpha(\delta)$ is chosen by the discrepancy principle (4.20) then again from (4.21)
$$\big(r_-^p - 1\big)\,\delta^p + \alpha(\delta)\,\Omega(x^{\alpha(\delta),\delta}) \ \le\ \alpha(\delta)\,\Omega(x^*)$$
and thus $\Omega(x^{\alpha(\delta),\delta})\le\Omega(x^*)$ because $r_-\ge 1$. Therefore, $x^{\alpha(\delta),\delta}\in M_0\subset M_\rho$.

Now we show the rates of convergence. Applying the variational source condition we conclude from (4.21) that
$$\|K(x^{\alpha,\delta})-y^\delta\|_Y^p + \alpha\beta\,E(x^{\alpha,\delta}) \ \le\ \delta^p + \alpha\,\varphi\big(\|K(x^{\alpha,\delta})-y^*\|_Y\big) \ \le\ \delta^p + \alpha\,\varphi\big(\|K(x^{\alpha,\delta})-y^\delta\|_Y + \delta\big). \tag{4.22}$$
If $\alpha=\alpha(\delta)$ is chosen according to the discrepancy principle (4.20) then
$$\big(r_-^p - 1\big)\,\delta^p + \alpha(\delta)\,\beta\,E(x^{\alpha(\delta),\delta}) \ \le\ \alpha(\delta)\,\varphi\big((r_+ + 1)\,\delta\big)$$
and thus $\beta\,E(x^{\alpha(\delta),\delta}) \le \varphi\big((r_++1)\,\delta\big)$. Now we use the elementary estimate $\varphi(s\delta)\le s\,\varphi(\delta)$ for all $s\ge 1$ and $\delta\ge 0$ (see Lemma A.73 of Appendix A.8). Therefore, $\beta\,E(x^{\alpha(\delta),\delta}) \le (r_++1)\,\varphi(\delta)$. This proves the estimate for $E(x^{\alpha(\delta),\delta})$. The estimate for $\|K(x^{\alpha(\delta),\delta})-y^*\|_Y$ follows obviously from the discrepancy inequality and the triangle inequality.

Let now $\alpha=\alpha(\delta)$ be given by (4.19). From (4.22), we obtain, using the upper estimate of $\alpha(\delta)$,
$$\|K(x^{\alpha(\delta),\delta})-y^\delta\|_Y^p \ \le\ \delta^p + c_+\,\frac{\delta^p}{\varphi(\delta)}\,\varphi\big(\|K(x^{\alpha,\delta})-y^\delta\|_Y + \delta\big).$$
We set $t = \|K(x^{\alpha(\delta),\delta})-y^\delta\|_Y/\delta$ for abbreviation. Then the previous formula takes the form
$$t^p \ \le\ 1 + c_+\,\frac{\varphi\big(\delta(t+1)\big)}{\varphi(\delta)} \ \le\ 1 + c_+\,(t+1) \ =\ (1+c_+) + c_+\,t$$
where we used the estimate $\varphi(s\delta)\le s\,\varphi(\delta)$ for $s\ge 1$ again. Choose $c>0$ with $c\,(c^{p-1}-c_+) > 1+c_+$. Then $t\le c$. Indeed, if $t>c$ then $t^p - c_+t = t\,(t^{p-1}-c_+) > c\,(c^{p-1}-c_+) > 1+c_+$, a contradiction. This proves that $\|K(x^{\alpha(\delta),\delta})-y^\delta\|_Y \le c\,\delta$. Now we substitute this into the right-hand side of (4.22), which yields
$$\beta\,E(x^{\alpha(\delta),\delta}) \ \le\ \frac{\delta^p}{\alpha(\delta)} + \varphi\big((c+1)\,\delta\big) \ \le\ \frac{1}{c_-}\,\varphi(\delta) + (c+1)\,\varphi(\delta).$$
This ends the proof.
We note that in the case of Hilbert spaces $X$ and $Y$ with $\Omega(x)=\|x-\hat x\|_X^2$ and $E(x)=\|x-x^*\|_X^2$, Assumption 4.22 and Theorem 4.23 reduce to Assumption 4.12 and Theorem 4.14, respectively.

Before we continue with the general theory, we apply this theorem to the special situation of determining a sparse approximation of the linear problem $Kx=y^\delta$. By sparse we mean that the solution $x^*\in X$ can be expressed by only finitely many elements of a given basis of $X$. Therefore, let $\tilde X$ be a Banach space having a Schauder basis $\{b_j : j\in\mathbb{N}\}$ with $\|b_j\|_{\tilde X}=1$ for all $j\in\mathbb{N}$; that is, every element $x\in\tilde X$ has a unique representation as $x=\sum_{j\in\mathbb{N}} x_j b_j$ where, of course, the convergence is understood in the norm of $\tilde X$. We define the subspace $X\subset\tilde X$ by
$$X \ =\ \Big\{ x = \sum_{j\in\mathbb{N}} x_j b_j \ :\ \sum_{j\in\mathbb{N}} |x_j| < \infty \Big\}$$
with the norm $\|x\|_X := \sum_{j\in\mathbb{N}} |x_j|$ for $x = \sum_{j\in\mathbb{N}} x_j b_j \in X$. Then $X$ is boundedly imbedded in $\tilde X$ because $\|x\|_{\tilde X} = \big\|\sum_{j\in\mathbb{N}} x_j b_j\big\|_{\tilde X} \le \sum_{j\in\mathbb{N}} |x_j| = \|x\|_X$ for $x\in X$. Obviously, the space $X$ is norm-isomorphic to the space $\ell^1$ of sequences $(x_j)$ such that $\sum_{j=1}^\infty |x_j|$ converges, equipped with the canonical norm $\|x\|_{\ell^1} = \sum_{j=1}^\infty |x_j|$ for $x=(x_j)$. As the Schauder basis of $\ell^1$ we take $\{e^{(j)} : j=1,2,\dots\}\subset\ell^1$, where $e^{(j)}\in\ell^1$ is defined by $e^{(j)}_k = 0$ for $k\ne j$ and $e^{(j)}_j = 1$. Therefore, we can take directly $\ell^1$ as the space $X$. We note that the dual of $\ell^1$ is just $(\ell^1)^* = \ell^\infty$, the space of bounded sequences with the sup-norm and the dual pairing³ $\langle y,x\rangle_{\ell^\infty,\ell^1} = \sum_{j\in\mathbb{N}} y_j x_j$ for $y\in\ell^\infty$ and $x\in\ell^1$. Also we note that $\ell^1$ itself is the dual of the space $c_0$ of sequences converging to zero (see Example A.21). Therefore, by Theorem A.77 of Appendix A.9 the unit ball in $\ell^1$ is weak* compact, which is an important ingredient to prove existence of minimizers of the Tikhonov functional. With these introductory remarks, we are able to show the following result, where we follow the presentation in [96].

Theorem 4.24 Let $Y$ be a Banach space and $K:\ell^1\to Y$ be a linear bounded operator such that $\langle\mu, Ke^{(j)}\rangle_{Y^*,Y}\to 0$ as $j\to\infty$ for all $\mu\in Y^*$, where $\langle\mu,y\rangle_{Y^*,Y}$ denotes the application of $\mu\in Y^*$ to $y\in Y$. Let $Kx^*=y^*$ and $y^\delta\in Y$ with $\|y^\delta-y^*\|_Y\le\delta$.

(a) Let $x^{\alpha,\delta}\in\ell^1$ be a minimizer of
$$J_{\alpha,\delta}(x) \ =\ \|Kx-y^\delta\|_Y^p + \alpha\,\|x\|_{\ell^1}, \quad x\in\ell^1.$$
Then $x^{\alpha,\delta}\in\ell^1$ is sparse; that is, the number of non-vanishing components $x^{\alpha,\delta}_j\ne 0$ is finite.

(b) For every $j\in\mathbb{N}$ let there exist $f_j\in Y^*$ with $e^{(j)} = K^* f_j$ where $K^*:Y^*\to(\ell^1)^*=\ell^\infty$ is the dual operator corresponding to $K$. Define the function $\varphi:[0,\infty)\to\mathbb{R}$ as
$$\varphi(t) := 2\,\inf_{n\in\mathbb{N}}\Big[\gamma_n\,t + \sum_{j>n} |x_j^*|\Big] \quad\text{where}\quad \gamma_n = \sup_{s_j\in\{0,1,-1\}} \Big\|\sum_{j=1}^n s_j f_j\Big\|_{Y^*}.$$
Then $\varphi$ is a concave index function. With the choices (4.19) or (4.20) of $\alpha=\alpha(\delta)$ the following convergence rates hold:
$$\|x^{\alpha(\delta),\delta} - x^*\|_{\ell^1} = \mathcal{O}\big(\varphi(\delta)\big), \qquad \|Kx^{\alpha(\delta),\delta} - y^*\|_Y = \mathcal{O}(\delta)$$
as $\delta$ tends to zero.

(c) If $x^*\in\ell^1$ is such that $\sum_{j=1}^\infty \gamma_j^\sigma\,|x_j^*| < \infty$ for some $\sigma > 0$ then we have
$$\|x^{\alpha(\delta),\delta} - x^*\|_{\ell^1} \ =\ \mathcal{O}\big(\delta^{\sigma/(1+\sigma)}\big), \quad \delta\to 0.$$

³ Note that we denote the dual pairing by $\langle\ell, x\rangle_{X^*,X} = \ell(x)$ for $\ell\in X^*$ and $x\in X$. The mapping $(\ell,x)\mapsto\langle\ell,x\rangle_{X^*,X}$ is bilinear.
If $x^*$ is sparse or if $(\gamma_n)$ is bounded then we have $\|x^{\alpha(\delta),\delta} - x^*\|_{\ell^1} = \mathcal{O}(\delta)$ as $\delta$ tends to zero.
Proof: (a) Set $z = Kx^{\alpha,\delta} - y^\delta$ for abbreviation. The optimality of $x^{\alpha,\delta}$ reads as
$$\|z+Kh\|_Y^p - \|z\|_Y^p \ \ge\ -\alpha\,\big(\|x^{\alpha,\delta}+h\|_{\ell^1} - \|x^{\alpha,\delta}\|_{\ell^1}\big) \quad\text{for all } h\in\ell^1. \tag{4.23}$$
Define the sets $A, B\subset\mathbb{R}\times Y$ as follows:
$$A = \big\{(r,y)\in\mathbb{R}\times Y : r > \|z+y\|_Y^p - \|z\|_Y^p\big\},\qquad B = \big\{(r,Kh)\in\mathbb{R}\times Y : h\in\ell^1,\ r \le -\alpha\,\big(\|x^{\alpha,\delta}+h\|_{\ell^1} - \|x^{\alpha,\delta}\|_{\ell^1}\big)\big\}.$$
Then it is not difficult to show that $A$ and $B$ are convex, $A$ is open, and $A\cap B=\emptyset$ because of (4.23). Now we apply the separation theorem for convex sets (see Theorem A.69 of Appendix A.8). There exist $(s,\mu)\in\mathbb{R}\times Y^*$ and $\gamma\in\mathbb{R}$ such that $(s,\mu)\ne(0,0)$ and
$$s\,r + \langle\mu, y\rangle_{Y^*,Y} \ \ge\ \gamma\ \ge\ s\,r' + \langle\mu, Kh\rangle_{Y^*,Y} \quad\text{for all } (r,y)\in A \text{ and } (r',Kh)\in B.$$
Letting $r$ tend to infinity while keeping the other variables constant yields $s\ge 0$. It is $s\ne 0$ because otherwise we would have $\langle\mu,y\rangle_{Y^*,Y}\ge\gamma$ for all $y\in Y$ (set $r:=\|z+y\|_Y^p-\|z\|_Y^p+1$) which would yield that also $\mu$ vanishes⁴, a contradiction. Therefore, $s>0$, and without loss of generality, $s=1$. Now we set $y=0$ and fix any $h\in\ell^1$, let $r$ tend to zero from above, and set $r' = -\alpha\,(\|x^{\alpha,\delta}+h\|_{\ell^1}-\|x^{\alpha,\delta}\|_{\ell^1})$. This yields the inequality
$$0 \ \ge\ -\alpha\,\big(\|x^{\alpha,\delta}+h\|_{\ell^1} - \|x^{\alpha,\delta}\|_{\ell^1}\big) + \langle\mu, Kh\rangle_{Y^*,Y} \quad\text{for all } h\in\ell^1.$$
For any $t\in\mathbb{R}$ and $k\in\mathbb{N}$, we set $h = t\,e^{(k)}$ and arrive at
$$\alpha\,\big(|x_k^{\alpha,\delta}+t| - |x_k^{\alpha,\delta}|\big) - t\,\langle\mu, Ke^{(k)}\rangle_{Y^*,Y} \ \ge\ 0.$$
For fixed $k$ with $x_k^{\alpha,\delta}\ne 0$ we choose $|t|$ so small that $\mathrm{sign}(x_k^{\alpha,\delta}+t) = \mathrm{sign}(x_k^{\alpha,\delta})$. Then the previous inequality reads as
$$t\,\big[\alpha\,\mathrm{sign}(x_k^{\alpha,\delta}) - \langle\mu, Ke^{(k)}\rangle_{Y^*,Y}\big] \ \ge\ 0.$$
Choosing $t > 0$ and $t < 0$ yields $\alpha\,\mathrm{sign}(x_k^{\alpha,\delta}) = \langle\mu, Ke^{(k)}\rangle_{Y^*,Y}$; that is, $\big|\langle\mu, Ke^{(k)}\rangle_{Y^*,Y}\big| = \alpha$. This holds for every $k\in J := \{k\in\mathbb{N} : x_k^{\alpha,\delta}\ne 0\}$. This implies that $J$ is finite because of $\langle\mu, Ke^{(k)}\rangle_{Y^*,Y}\to 0$ as $k\to\infty$.

(b) We apply Theorem 4.23 and have to verify Assumptions 4.21 and 4.22 for the special case $\mathcal{D}(K)=\mathcal{D}(\Omega)=X=\ell^1$, $E(x)=\|x-x^*\|_{\ell^1}$, and $\Omega(x)=\|x\|_{\ell^1}$ for $x\in X=\ell^1$. First we show that $\varphi$ is a concave index function. Indeed, $\varphi$ is continuous, monotonic, and concave as the infimum of affine functions

⁴ The reader should prove this himself.
(see Problem 4.6). Furthermore, $\varphi(0) = 2\inf_{n\in\mathbb{N}} \sum_{j>n} |x_j^*| = 0$, which shows that $\varphi$ is a concave index function. Assumption 4.21, that is, existence of a minimum of $J_{\alpha,\delta}$, can be shown using results on weak and weak*-topologies. (The assumption that $\langle y^*, Ae^{(k)}\rangle_{Y^*,Y}$ tends to zero for all $y^*$ is equivalent to the weak*-weak continuity of $K$. Then one uses that $\|\cdot\|_Y^p$ and $\|\cdot\|_{\ell^1}$ are weakly lower semicontinuous and weak* lower semicontinuous, respectively.) We do not carry out this part but refer to, e.g., [96]. Assumption 4.22 is therefore equivalent to (with $\beta = 1$)
$$\|x-x^*\|_{\ell^1} - \|x\|_{\ell^1} + \|x^*\|_{\ell^1} \ \le\ \varphi\big(\|K(x-x^*)\|_Y\big) \quad\text{for all } x\in\ell^1. \tag{4.24}$$
To prove this we have for any $n\in\mathbb{N}$
$$\sum_{j=1}^n |x_j - x_j^*| \ =\ \sum_{j=1}^n \big\langle s_j e^{(j)},\, x-x^*\big\rangle_{\ell^\infty,\ell^1} \ =\ \sum_{j=1}^n \big\langle s_j K^* f_j,\, x-x^*\big\rangle_{\ell^\infty,\ell^1} \ =\ \sum_{j=1}^n s_j\,\big\langle f_j,\, K(x-x^*)\big\rangle_{Y^*,Y} \ \le\ \Big\|\sum_{j=1}^n s_j f_j\Big\|_{Y^*}\,\|K(x-x^*)\|_Y \ \le\ \gamma_n\,\|K(x-x^*)\|_Y$$
where $s_j = \mathrm{sign}(x_j-x_j^*)\in\{0,1,-1\}$. Therefore, with $|x_j| \ge |x_j^*| - |x_j-x_j^*|$,
$$\sum_{j=1}^n \big(|x_j-x_j^*| - |x_j| + |x_j^*|\big) \ \le\ 2\sum_{j=1}^n |x_j-x_j^*| \ \le\ 2\,\gamma_n\,\|K(x-x^*)\|_Y\,.$$
Furthermore,
$$\sum_{j>n} \big(|x_j-x_j^*| - |x_j| + |x_j^*|\big) \ \le\ 2\sum_{j>n} |x_j^*|\,;$$
that is,
$$\|x-x^*\|_{\ell^1} - \|x\|_{\ell^1} + \|x^*\|_{\ell^1} \ \le\ 2\Big[\gamma_n\,\|K(x-x^*)\|_Y + \sum_{j>n} |x_j^*|\Big]. \tag{4.25}$$
This shows (4.24) since this estimate holds for all $n\in\mathbb{N}$. Therefore, all of the assumptions of Theorem 4.23 are satisfied, and the error estimates are shown.

(c) We estimate $\varphi(t)$. First we observe that $(\gamma_n)$ is monotonically increasing. If $(\gamma_n)$ is bounded then it converges to some finite $\gamma\in\mathbb{R}$. Therefore, we let $n$ tend to infinity in the definition of $\varphi$ and arrive at $\varphi(t) = 2\gamma\,t$. We consider now the case that $\gamma_n$ tends to infinity. First we estimate
$$\gamma_n\,t + \sum_{j>n} |x_j^*| \ \le\ \gamma_n\,t + \frac{1}{\gamma_{n+1}^\sigma}\sum_{j>n} \gamma_j^\sigma\,|x_j^*| \ \le\ \gamma_n\,t + \frac{c}{\gamma_{n+1}^\sigma}$$
with $c = \sum_{j=1}^\infty \gamma_j^\sigma\,|x_j^*|$. For sufficiently small $t$, the index
$$n(t) \ =\ \max\Big\{n\in\mathbb{N} : \gamma_n \le \frac{1}{t^{1/(1+\sigma)}}\Big\}$$
is well-defined and finite. Then $\gamma_{n(t)} \le t^{-1/(1+\sigma)}$ and $\gamma_{n(t)+1} > t^{-1/(1+\sigma)}$. Therefore,
$$\varphi(t) \ \le\ \gamma_{n(t)}\,t + \frac{c}{\gamma_{n(t)+1}^\sigma} \ \le\ (1+c)\,t^{\sigma/(1+\sigma)}\,.$$
Finally, if $x^*$ is sparse then there exists $n\in\mathbb{N}$ such that $x_j^* = 0$ for all $j > n$. For that $n$ the series $\sum_{j>n}|x_j^*|$ in (4.25) vanishes, which shows the result for the linear index function $\varphi(t) = 2\gamma_n\,t$.

Remark: The reciprocals $1/\gamma_n$ play the role of the singular values in the case of Hilbert spaces $X=\ell^2$ and $Y$ with a singular system $\{\mu_j, e^{(j)}, g^{(j)} : j\in J\}$. Indeed, the assumption $e^{(j)} = K^* f_j$ is satisfied with $f_j = \frac{1}{\mu_j}\,g^{(j)}$ and for $\gamma_n$ one has the form
$$\gamma_n^2 \ =\ \sup_{s_j\in\{0,1,-1\}} \Big\|\sum_{j=1}^n s_j f_j\Big\|_Y^2 \ =\ \sup_{s_j\in\{0,1,-1\}} \sum_{j=1}^n \frac{s_j^2}{\mu_j^2} \ =\ \sum_{j=1}^n \frac{1}{\mu_j^2}\,.$$
The condition that $\sum_{j=1}^\infty \gamma_j^\sigma\,|x_j^*|$ converges corresponds to the source condition $x^*\in\mathcal{R}\big((A^*A)^{\sigma/2}\big)$.
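For a finite-dimensional impression of Theorem 4.24, one can minimize the functional of part (a) with $p=2$ and a Euclidean data space by iterative soft-thresholding (ISTA), a standard scheme for computing such $\ell^1$-penalized minimizers that is not discussed in the text. The matrix, noise level, and $\alpha$ in this sketch are arbitrary choices of mine; the computed minimizer is indeed sparse, as part (a) predicts.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 60, 120
K = rng.standard_normal((m, n)) / np.sqrt(m)     # random operator, columns ~ unit norm

x_true = np.zeros(n)
x_true[[5, 40, 77]] = [1.5, -2.0, 1.0]           # sparse exact solution x*
delta = 1e-2
y_delta = K @ x_true + delta * rng.standard_normal(m) / np.sqrt(m)

alpha = 0.1
L = np.linalg.norm(K, 2) ** 2                    # ||K||^2; gradient of misfit is 2L-Lipschitz

def soft(z, tau):
    # soft-thresholding, the proximal map of tau * ||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

# ISTA for ||Kx - y||^2 + alpha ||x||_1 with step size 1/(2L)
x = np.zeros(n)
for _ in range(5000):
    x = soft(x - (K.T @ (K @ x - y_delta)) / L, alpha / (2 * L))

print(np.count_nonzero(x))                       # only few components survive
```

The shrinkage step sets small components exactly to zero, so the iterates are sparse vectors, mirroring the subgradient argument in the proof of part (a).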
We go now back to the general case studied in Theorem 4.23. We wish to carry over Theorem 4.13, which proves the variational source condition from the classical one of Assumption 4.11. The essential formula used in the proof of part (a) of Theorem 4.13 was based on the binomial formula; that is,
$$\|x^*-x\|_X^2 \ =\ \|x-\hat x\|_X^2 - \|x^*-\hat x\|_X^2 - 2\,(x-x^*,\, x^*-\hat x)_X\,.$$
If we denote the penalty term by $\Omega(x)$, that is, $\Omega(x) = \|x-\hat x\|_X^2$, then we can write this formula as
$$\|x^*-x\|_X^2 \ =\ B^\Omega(x,x^*) \ :=\ \Omega(x) - \Omega(x^*) - \big\langle\Omega'(x^*),\, x-x^*\big\rangle_{X^*,X}$$
where $\Omega'(x^*) : X\to\mathbb{R}$ is the Fréchet derivative of $\Omega$ at $x^*$ and $\langle\ell, z\rangle_{X^*,X}$ is the application of $\ell\in X^*$ to $z\in X$. The function $B^\Omega(x,x^*)$ is the famous Bregman distance corresponding to the function $\Omega$. This can be extended to Banach spaces because for convex and differentiable functions $\Omega$ from a Banach space $X$ into $\mathbb{R}$ the function
$$B^\Omega(x,x^*) \ :=\ \Omega(x) - \Omega(x^*) - \big\langle\Omega'(x^*),\, x-x^*\big\rangle_{X^*,X}\,, \quad x\in X,$$
is nonnegative on $X$ (see Lemma A.70 of Appendix A.8). If $\Omega$ is even strictly convex, then $B^\Omega(x,x^*)=0 \Leftrightarrow x=x^*$. If $\Omega:X\to\mathbb{R}$ is convex and only continuous, then the subdifferential $\partial\Omega(x^*)\subset X^*$, that is, the set of subgradients, is nonempty (see Lemma A.72 of Appendix A.8). We recall that $\partial\Omega(x^*)\subset X^*$ is the set of all $\ell\in X^*$ with
$$\Omega(x) - \Omega(x^*) - \langle\ell,\, x-x^*\rangle_{X^*,X} \ \ge\ 0 \quad\text{for all } x\in X\,.$$
If $\Omega$ is convex and differentiable then $\partial\Omega(x^*) = \{\Omega'(x^*)\}$ (Lemma A.72). Therefore, we formulate the following definition:
Definition 4.25 Let $X$ be a normed space, $A\subset X$ convex and open, and $\Omega : A\to\mathbb{R}$ convex and continuous with subdifferential $\partial\Omega(x^*)$ at some $x^*\in A$. For $\ell\in\partial\Omega(x^*)$ the Bregman distance is defined as
$$B^\Omega(x,x^*) \ :=\ \Omega(x) - \Omega(x^*) - \langle\ell,\, x-x^*\rangle_{X^*,X}\,, \quad x\in A\,.$$
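A tiny numerical illustration of this definition (the choices of $\Omega$ and of the points are mine): for $\Omega(x)=\|x\|_2^2$ the Bregman distance with $\ell=\Omega'(x^*)$ is exactly $\|x-x^*\|_2^2$, while for the merely convex $\Omega(x)=\|x\|_1$ it can vanish for $x\ne x^*$.

```python
import numpy as np

def bregman(Omega, ell, x, xstar):
    # B^Omega(x, x*) = Omega(x) - Omega(x*) - <ell, x - x*>
    return Omega(x) - Omega(xstar) - ell @ (x - xstar)

xstar = np.array([1.0, -2.0, 0.5])
x = np.array([2.0, -1.0, 1.5])

# Omega(x) = ||x||_2^2: the unique subgradient at x* is 2 x*,
# and the Bregman distance equals ||x - x*||_2^2 exactly.
B2 = bregman(lambda z: z @ z, 2 * xstar, x, xstar)
print(B2, np.sum((x - xstar) ** 2))    # both print 3.0

# Omega(x) = ||x||_1: ell = sign(x*) is a subgradient at x*; the Bregman
# distance vanishes for every x with the same sign pattern as x*, so it
# does not dominate any norm of x - x* (the l1-norm is not strictly convex).
x_same_signs = np.array([3.0, -0.5, 2.0])
B1 = bregman(lambda z: np.abs(z).sum(), np.sign(xstar), x_same_signs, xstar)
print(B1)                              # 0.0
```

This is why the Bregman distance is only a weak error measure for penalties like the $\ell^1$-norm, and why lower bounds as in the p-convexity discussion below are of interest.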
We note that the Bregman distance depends on the function $\Omega$. It measures the defect of the function with respect to its linearization at $x^*$. With the Bregman distance as $E(x)$ we have an analogue of Theorem 4.13.

Lemma 4.26 Let $x^*\in\mathcal{D}(K)\cap\mathcal{D}(\Omega)$ be a solution of $K(x)=y^*$ and let $\varphi:[0,\infty)\to\mathbb{R}$ be a concave index function. Furthermore, let $\Omega:X\to\mathbb{R}$ be convex and continuous, $\ell\in\partial\Omega(x^*)$, and $E(x) = B^\Omega(x,x^*)$.

(a) Then, for this particular choice of $E(x)$, the variational source condition of Assumption 4.22 is equivalent to the following condition: There exist $\rho > 0$ and $0\le\sigma<1$ such that
$$\langle\ell,\, x^*-x\rangle_{X^*,X} \ \le\ \sigma\,B^\Omega(x,x^*) + \varphi\big(\|K(x^*)-K(x)\|_Y\big) \tag{4.26}$$
for all $x\in M_\rho$.

(b) Assume that there exist $w\in Y^*$ and a concave index function $\eta$ such that $\ell = K'(x^*)^* w$ and $\|K'(x^*)(x-x^*)\|_Y \le \eta\big(\|K(x^*)-K(x)\|_Y\big)$ for all $x\in M_\rho$. Then Assumption 4.22 holds with $\varphi(t) = \|w\|_{Y^*}\,\eta(t)$ for $t\ge 0$.

Proof: (a) This follows directly from the definitions of $E(x)$ and $B^\Omega(x,x^*)$. Indeed, the estimate in Assumption 4.22 reads as
$$\beta\,B^\Omega(x,x^*) \ \le\ \Omega(x) - \Omega(x^*) + \varphi\big(\|K(x^*)-K(x)\|_Y\big);$$
that is,
$$\beta\,B^\Omega(x,x^*) \ \le\ B^\Omega(x,x^*) + \langle\ell,\, x-x^*\rangle_{X^*,X} + \varphi\big(\|K(x^*)-K(x)\|_Y\big)$$
which is equivalent to
$$\langle\ell,\, x^*-x\rangle_{X^*,X} \ \le\ (1-\beta)\,B^\Omega(x,x^*) + \varphi\big(\|K(x^*)-K(x)\|_Y\big).$$
This proves part (a) with $\sigma = 1-\beta$ if $\beta\le 1$ and $\sigma = 0$ otherwise.

(b) If $\ell = K'(x^*)^* w$ then
$$\langle\ell,\, x^*-x\rangle_{X^*,X} = \big\langle K'(x^*)^* w,\, x^*-x\big\rangle_{X^*,X} = \big\langle w,\, K'(x^*)(x^*-x)\big\rangle_{Y^*,Y} \le \|w\|_{Y^*}\,\|K'(x^*)(x^*-x)\|_Y \le \|w\|_{Y^*}\,\eta\big(\|K(x^*)-K(x)\|_Y\big).$$
This proves the condition of (a) with $\sigma = 0$; that is, $\beta = 1$.
Combining Theorem 4.23 with these particular choices, we have the following theorem:
Theorem 4.27 Let Assumption 4.21 hold and let $x^*\in\mathcal{D}(K)\cap\mathcal{D}(\Omega)$ be a solution of $K(x)=y^*$. Furthermore, let $\delta_{\max} = \sup\big\{\|K(x^*)-K(x)\|_Y : x\in\mathcal{D}(K)\cap\mathcal{D}(\Omega)\big\}$ and $\varphi:[0,\delta_{\max})\to\mathbb{R}$ be a concave index function, and for some constants $\rho > 0$ and $0\le\sigma<1$, let the source condition (4.26) hold for some $\ell\in\partial\Omega(x^*)$. Let $\alpha=\alpha(\delta)$ be chosen according to (4.19) or (4.20). Then we have the error estimates
$$B^\Omega(x^{\alpha(\delta),\delta}, x^*) = \mathcal{O}\big(\varphi(\delta)\big) \quad\text{and}\quad \|K(x^{\alpha(\delta),\delta})-y^*\|_Y = \mathcal{O}(\delta)$$
as $\delta$ tends to zero.

As a particular and obviously important example, we now take
$$\Omega(x) \ :=\ \|x-\hat x\|_X^p\,, \quad x\in X,$$
for some $p > 1$. Then one would like to characterize the Bregman distance or, at least, construct lower bounds of $B^\Omega(x,x^*)$ in terms of $\|x-x^*\|_X^p$. This leads to the concept of p-convex Banach spaces.

Definition 4.28 A Banach space $X$ is called p-convex for some $p > 0$ if there exists $c > 0$ such that
$$\|x+y\|_X^p - \|x\|_X^p - \langle \ell_x,\, y\rangle_{X^*,X} \ \ge\ c\,\|y\|_X^p$$
for all $\ell_x\in\partial\|\cdot\|_X^p(x)$ and $x,y\in X$.⁵

In other words, for p-convex spaces the Bregman distance $B^\Omega(x,z)$ corresponding to $\Omega(x) = \|x-z\|_X^p$ can be bounded below by $c\,\|x-z\|_X^p$ for all $x,z\in X$. Therefore, if the assumptions of the previous Theorem 4.27 hold, one has the rate $\|x^{\alpha(\delta),\delta}-x^*\|_X^p = \mathcal{O}\big(\varphi(\delta)\big)$. As a particular example, we show that the spaces $L^p(D)$ are p-convex for all $p > 2$.

Lemma 4.29 Let $p > 1$ and $D\subset\mathbb{R}^n$ open and
$$f(x) \ =\ \|x\|_{L^p(D)}^p \ =\ \int_D |x(t)|^p\,dt \quad\text{for } x\in L^p(D).$$
(a) Then $f$ is differentiable and
$$f'(x)y \ =\ p\int_D y(t)\,|x(t)|^{p-1}\,\mathrm{sign}\,x(t)\,dt \quad\text{for } x,y\in L^p(D).$$
(b) Let $p > 2$. Then there exists $c_p > 0$ with
$$f(x+y) - f(x) - f'(x)y \ \ge\ c_p\,\|y\|_{L^p(D)}^p \quad\text{for all } x,y\in L^p(D).$$
5 Actually, the classical deﬁnition uses the dual mapping instead of the subdiﬀerential. However, by Asplund’s theorem (see [48]) they are equivalent.
Proof: (a) First we observe that the integral for $f'(x)y$ exists by Hölder's inequality. Indeed, set $r = \frac{p}{p-1}$ and $s = p$; then $\frac1r+\frac1s=1$ and thus
$$\int_D |x(t)|^{p-1}\,|y(t)|\,dt \ \le\ \Big(\int_D |x(t)|^{r(p-1)}\,dt\Big)^{1/r} \Big(\int_D |y(t)|^s\,dt\Big)^{1/s} \ =\ \Big(\int_D |x(t)|^p\,dt\Big)^{(p-1)/p} \Big(\int_D |y(t)|^p\,dt\Big)^{1/p} \ =\ \|x\|_{L^p(D)}^{p-1}\,\|y\|_{L^p(D)}\,.$$
We use the following elementary estimate. There exist constants $c_+>0$ and $c_p\ge 0$ with $c_p=0$ for $p\le 2$ and $c_p>0$ for $p>2$ such that for all $z\in\mathbb{R}$
$$c_p\,|z|^p \ \le\ |1+z|^p - 1 - p\,z \ \le\ \begin{cases} c_+\,|z|^p & \text{if } p\le 2 \text{ or } |z|\ge\frac12,\\ c_+\,|z|^2 & \text{if } p>2 \text{ and } |z|\le\frac12. \end{cases} \tag{4.27}$$
We give a proof of this estimate in Lemma A.74 for the convenience of the reader. Therefore, for any $x,y\in\mathbb{R}$ with $x\ne 0$, we have
$$|x+y|^p - |x|^p - p\,y\,|x|^{p-1}\,\mathrm{sign}\,x \ =\ |x|^p\,\Big[\Big|1+\frac{y}{x}\Big|^p - 1 - p\,\frac{y}{x}\Big] \ \ge\ 0$$
and
$$|x+y|^p - |x|^p - p\,y\,|x|^{p-1}\,\mathrm{sign}\,x \ \le\ \begin{cases} c_+\,|x|^p\,|y/x|^p = c_+\,|y|^p & \text{if } p\le 2 \text{ or } 2|y|\ge|x|,\\ c_+\,|x|^p\,|y/x|^2 = c_+\,|x|^{p-2}\,|y|^2 & \text{if } p>2 \text{ and } 2|y|\le|x|. \end{cases}$$
This holds obviously also for $x = 0$. Now we apply this to $x(t)$ and $y(t)$ with $x,y\in L^p(D)$. This shows already that
$$\int_D \big[|x(t)+y(t)|^p - |x(t)|^p - p\,y(t)\,|x(t)|^{p-1}\,\mathrm{sign}\,x(t)\big]\,dt \ \ge\ 0\,.$$
Next we show that
$$\int_D \big[|x(t)+y(t)|^p - |x(t)|^p - p\,y(t)\,|x(t)|^{p-1}\,\mathrm{sign}\,x(t)\big]\,dt \ \le\ c\,\|y\|_{L^p(D)}^{\min\{2,p\}}$$
for $\|y\|_{L^p(D)}\le 1$. This would finish the proof of part (a) because $p > 1$. For proving this estimate we define $T := \{t\in D : 2|y(t)|\ge|x(t)|\}$. Then
$$\int_T \big[|x(t)+y(t)|^p - |x(t)|^p - p\,y(t)\,|x(t)|^{p-1}\,\mathrm{sign}\,x(t)\big]\,dt \ \le\ c_+\int_T |y(t)|^p\,dt$$
and
$$\int_{D\setminus T} \big[|x(t)+y(t)|^p - |x(t)|^p - p\,y(t)\,|x(t)|^{p-1}\,\mathrm{sign}\,x(t)\big]\,dt \ \le\ \begin{cases} c_+\displaystyle\int_{D\setminus T} |y(t)|^p\,dt & \text{if } p\le 2,\\[6pt] c_+\displaystyle\int_{D\setminus T} |x(t)|^{p-2}\,|y(t)|^2\,dt & \text{if } p>2. \end{cases}$$
If $p\le 2$ we just add the two estimates and have shown the estimate
$$\int_D \big[|x(t)+y(t)|^p - |x(t)|^p - p\,y(t)\,|x(t)|^{p-1}\,\mathrm{sign}\,x(t)\big]\,dt \ \le\ c_+\,\|y\|_{L^p(D)}^p\,.$$
If $p > 2$ we apply Hölder's inequality to the integral $\int_{D\setminus T} |x(t)|^{p-2}\,|y(t)|^2\,dt$. Indeed, set $r=\frac{p}{p-2}$ and $s=\frac{p}{2}$; then $\frac1r+\frac1s=1$ and thus
$$\int_{D\setminus T} |x(t)|^{p-2}\,|y(t)|^2\,dt \ \le\ \Big(\int_{D\setminus T} |x(t)|^{r(p-2)}\,dt\Big)^{1/r} \Big(\int_{D\setminus T} |y(t)|^{2s}\,dt\Big)^{1/s} \ =\ \Big(\int_{D\setminus T} |x(t)|^p\,dt\Big)^{(p-2)/p} \Big(\int_{D\setminus T} |y(t)|^p\,dt\Big)^{2/p} \ \le\ \|x\|_{L^p(D)}^{p-2}\,\|y\|_{L^p(D)}^2\,.$$
Therefore,
$$\int_D \big[|x(t)+y(t)|^p - |x(t)|^p - p\,y(t)\,|x(t)|^{p-1}\,\mathrm{sign}\,x(t)\big]\,dt \ \le\ c_+\,\|y\|_{L^p(D)}^p + c_+\,\|x\|_{L^p(D)}^{p-2}\,\|y\|_{L^p(D)}^2\,.$$
For $p > 2$ and $\|y\|_{L^p(D)}\le 1$ we have $\|y\|_{L^p(D)}^p \le \|y\|_{L^p(D)}^2$ and thus
$$\int_D \big[|x(t)+y(t)|^p - |x(t)|^p - p\,y(t)\,|x(t)|^{p-1}\,\mathrm{sign}\,x(t)\big]\,dt \ \le\ c\,\|y\|_{L^p(D)}^2\,.$$
This proves part (a).

(b) By part (a) we have to show that
$$\int_D \big[|x(t)+y(t)|^p - |x(t)|^p - p\,y(t)\,|x(t)|^{p-1}\,\mathrm{sign}\,x(t) - c_p\,|y(t)|^p\big]\,dt \ \ge\ 0$$
for all $x,y\in L^p(D)$. We show that the integrand is nonnegative. Indeed, we have
$$|x+y|^p - |x|^p - p\,y\,|x|^{p-1}\,\mathrm{sign}\,x - c_p\,|y|^p \ =\ |x|^p\,\Big[\Big|1+\frac{y}{x}\Big|^p - 1 - p\,\frac{y}{x} - c_p\,\Big|\frac{y}{x}\Big|^p\Big]$$
and this is nonnegative by (4.27).

By the same method, it can be shown that also the spaces $\ell^p$ of sequences and the Sobolev spaces $W^{m,p}(D)$ are p-convex for $p\ge 2$ and any $m\in\mathbb{N}_0$.
4.3
The Nonlinear Landweber Iteration
As a general criticism of Tikhonov's method in the nonlinear case, we mention the disadvantage that the convergence results hold only for the global minima of the Tikhonov functional $J_{\alpha,\delta}$, which is, in general, a nonconvex function. In the best of all cases, the global minima can only be computed by iterative methods⁶. Therefore, it seems natural to solve the nonlinear equation $K(x)=y^\delta$ directly by an iterative scheme such as Newton-type methods or Landweber methods. In this section, we present the simplest such algorithm and follow the presentation of the paper [126] and also the monograph [149].

In this section, let again $X$ and $Y$ be Hilbert spaces and $K : X\supset\mathcal{D}(K)\to Y$ a continuously Fréchet differentiable mapping on the open domain of definition $\mathcal{D}(K)$, and let $\hat x\in\mathcal{D}(K)$. First we recall the nonlinear Tikhonov functional
$$J_{\alpha,\delta}(x) \ =\ \|K(x)-y^\delta\|_Y^2 + \alpha\,\|x-\hat x\|_X^2\,, \quad x\in\mathcal{D}(K),$$
for $\alpha,\delta\ge 0$. This functional is differentiable.

Lemma 4.30 $J_{\alpha,\delta}$ is differentiable at every $x^*\in\mathcal{D}(K)$ and
$$J'_{\alpha,\delta}(x^*)h \ =\ 2\,\mathrm{Re}\,\big(K(x^*)-y^\delta,\ K'(x^*)h\big)_Y + 2\alpha\,\mathrm{Re}\,(x^*-\hat x,\, h)_X\,, \quad h\in X.$$

Proof: We have $K(x^*+h) = K(x^*) + K'(x^*)h + r(h)$ with $\|r(h)\|_Y/\|h\|_X\to 0$ for $h\to 0$. Using the binomial formula $\|a+b\|^2 = \|a\|^2 + 2\,\mathrm{Re}\,(a,b) + \|b\|^2$ we have
$$\begin{aligned}
J_{\alpha,\delta}(x^*+h) &= \|K(x^*+h)-y^\delta\|_Y^2 + \alpha\,\|x^*+h-\hat x\|_X^2\\
&= \big\|(K(x^*)-y^\delta) + (K'(x^*)h + r(h))\big\|_Y^2 + \alpha\,\|(x^*-\hat x)+h\|_X^2\\
&= \|K(x^*)-y^\delta\|_Y^2 + 2\,\mathrm{Re}\,\big(K(x^*)-y^\delta,\ K'(x^*)h + r(h)\big)_Y + \|K'(x^*)h + r(h)\|_Y^2\\
&\quad + \alpha\,\|x^*-\hat x\|_X^2 + 2\alpha\,\mathrm{Re}\,(x^*-\hat x,\, h)_X + \alpha\,\|h\|_X^2\\
&= J_{\alpha,\delta}(x^*) + 2\,\mathrm{Re}\,\big(K(x^*)-y^\delta,\ K'(x^*)h\big)_Y + 2\alpha\,\mathrm{Re}\,(x^*-\hat x,\, h)_X + \tilde r(h)
\end{aligned}$$

⁶ Even this is only possible in very special cases. Usually, one must be satisfied with critical points of $J_{\alpha,\delta}$.
with $\tilde r(h) = 2\,\mathrm{Re}\,\big(K(x^*)-y^\delta,\ r(h)\big)_Y + \|K'(x^*)h + r(h)\|_Y^2 + \alpha\,\|h\|_X^2$; thus $\lim_{h\to 0}\tilde r(h)/\|h\|_X = 0$.
Lemma 4.31 If $x^* \in \mathcal{D}(K)$ is a local minimum of $J_{\alpha,\delta}$, then $J'_{\alpha,\delta}(x^*)h = 0$ for all $h \in X$; thus $K'(x^*)^*\bigl(K(x^*) - y^\delta\bigr) + \alpha\,(x^* - \hat{x}) = 0$.

Proof: Let $h \neq 0$ be fixed. For sufficiently small $t > 0$ the point $x^* + th$ belongs to $\mathcal{D}(K)$ because $\mathcal{D}(K)$ is open. Therefore, for sufficiently small $t > 0$ the optimality of $x^*$ implies $\bigl(J_{\alpha,\delta}(x^* + th) - J_{\alpha,\delta}(x^*)\bigr)/(t\,\|h\|_X) \ge 0$, thus
\[
0 \;\le\; \frac{J_{\alpha,\delta}(x^* + th) - J_{\alpha,\delta}(x^*) - J'_{\alpha,\delta}(x^*)(th)}{t\,\|h\|_X} \;+\; \frac{J'_{\alpha,\delta}(x^*)h}{\|h\|_X}\,.
\]
The inequality $J'_{\alpha,\delta}(x^*)h \ge 0$ follows as $t$ tends to zero. This implies the assertion since it holds for all $h$ (in particular also for $-h$). □
The equation
\[
K'(x^*)^*\bigl(K(x^*) - y^\delta\bigr) + \alpha\,(x^* - \hat{x}) \;=\; 0 \tag{4.28}
\]
reduces to the well-known normal equation (2.16) in the case that $K$ is linear and $\hat{x} = 0$, because then the derivative is given by $K'(x^*) = K$. In general, however, this equation is nonlinear, and one has to apply an iterative method for solving it. Instead of solving the well-posed nonlinear equation (4.28), one can directly use an iterative scheme to solve, and thereby regularize, the unregularized equation $K'(x^*)^*\bigl(K(x^*) - y^\delta\bigr) = 0$, which can be written as
\[
x^* \;=\; x^* - a\,K'(x^*)^*\bigl(K(x^*) - y^\delta\bigr)
\]
with an arbitrary number $a > 0$. This is a fixed point equation, and it is natural to solve it by the fixed point iteration: Fix $\hat{x} \in \mathcal{D}(K)$, set $x_0^\delta = \hat{x}$, and
\[
x_{k+1}^\delta \;=\; x_k^\delta - a\,K'(x_k^\delta)^*\bigl(K(x_k^\delta) - y^\delta\bigr)\,, \qquad k = 0, 1, 2, \ldots
\]
This is called the nonlinear Landweber iteration because it reduces to the well-known Landweber iteration in the linear case; see Sections 2.3 and 2.6. At the moment, it is not clear that it is well-defined; that is, that all of the iterates lie in the domain of definition $\mathcal{D}(K)$.

We choose $\rho > 0$ and $a > 0$ such that $a\,\|K'(x)\|_{\mathcal{L}(X,Y)}^2 \le 1$ for all $x \in B(\hat{x},\rho)$. Then we scale the equation $K(x) = y^\delta$; that is, we replace it by the equivalent equation $\tilde{K}(x) = \tilde{y}^\delta$ with $\tilde{K} = \sqrt{a}\,K$ and $\tilde{y}^\delta = \sqrt{a}\,y^\delta$. Then $\|\tilde{K}'(x)\|_{\mathcal{L}(X,Y)}^2 \le 1$ for all $x \in B(\hat{x},\rho)$, and the Landweber iteration takes the form
\[
x_{k+1}^\delta \;=\; x_k^\delta - \tilde{K}'(x_k^\delta)^*\bigl(\tilde{K}(x_k^\delta) - \tilde{y}^\delta\bigr)\,, \qquad k = 0, 1, 2, \ldots
\]
Therefore, we assume from now on that $a = 1$ and $\|K'(x)\|_{\mathcal{L}(X,Y)} \le 1$ for all $x \in B(\hat{x},\rho)$. The Landweber iteration thus takes the form
\[
x_0^\delta = \hat{x}\,, \qquad x_{k+1}^\delta = x_k^\delta - K'(x_k^\delta)^*\bigl(K(x_k^\delta) - y^\delta\bigr)\,, \quad k = 0, 1, 2, \ldots \tag{4.29}
\]
The following, rather strong, assumption is called the "tangential cone condition". It will ensure that the Landweber iteration is well-defined, and it will also provide convergence.

Assumption 4.32 (Tangential Cone Condition) Let $B(\hat{x},\rho) \subset \mathcal{D}(K)$ be some ball and let $K$ be differentiable in $B(\hat{x},\rho)$ with $\|K'(x)\|_{\mathcal{L}(X,Y)} \le 1$ for all $x \in B(\hat{x},\rho)$. Furthermore, let $K'$ be Lipschitz continuous on $B(\hat{x},\rho)$, and let there exist $\eta < \frac12$ with
\[
\bigl\|K(\tilde{x}) - K(x) - K'(x)(\tilde{x} - x)\bigr\|_Y \;\le\; \eta\,\|K(\tilde{x}) - K(x)\|_Y \tag{4.30}
\]
for all $x, \tilde{x} \in B(\hat{x},\rho)$.

We compare this estimate with the estimate $\|K(\tilde{x}) - K(x) - K'(x)(\tilde{x} - x)\|_Y \le c\,\|\tilde{x} - x\|_X^2$, which holds for Lipschitz-continuously differentiable functions (see Lemma A.63). If there exists $c$ with $\|\tilde{x} - x\|_X \le c\,\|K(\tilde{x}) - K(x)\|_Y$, then (4.30) follows. However, such an estimate does not hold for ill-posed problems (see Definition 4.1 and Problem 4.1). The condition (4.30) is very strong, but in some cases it can be verified (see Problem 4.7).

First we draw some conclusions from this condition. The first one is a basic inequality which will be used quite often in the following.

Corollary 4.33 Under Assumption 4.32, the following holds:
\[
\frac{1}{1+\eta}\,\|K'(x)(\tilde{x} - x)\|_Y \;\le\; \|K(\tilde{x}) - K(x)\|_Y \;\le\; \frac{1}{1-\eta}\,\|K'(x)(\tilde{x} - x)\|_Y \tag{4.31}
\]
for all $x, \tilde{x} \in B(\hat{x},\rho)$.

Proof: The right estimate follows from
\[
\|K(\tilde{x}) - K(x)\|_Y \;\le\; \|K(\tilde{x}) - K(x) - K'(x)(\tilde{x} - x)\|_Y + \|K'(x)(\tilde{x} - x)\|_Y \;\le\; \eta\,\|K(\tilde{x}) - K(x)\|_Y + \|K'(x)(\tilde{x} - x)\|_Y\,,
\]
and the left from
\[
\|K(\tilde{x}) - K(x)\|_Y \;=\; \bigl\|K'(x)(\tilde{x} - x) - \bigl(K'(x)(\tilde{x} - x) - K(\tilde{x}) + K(x)\bigr)\bigr\|_Y \;\ge\; \|K'(x)(\tilde{x} - x)\|_Y - \eta\,\|K(\tilde{x}) - K(x)\|_Y\,. \qquad \square
\]
With this lemma, we try to justify the notion "tangential cone condition" and define the convex cones $C_\pm(x)$ in $\mathbb{R}\times X$ with vertex at $(0,x)$ by
\[
C_\pm(x) \;=\; \Bigl\{ (r,\tilde{x}) \in \mathbb{R}\times X \;:\; r \,\ge\, \tfrac{1}{1\pm\eta}\,\|K'(x)(x - \tilde{x})\|_Y \Bigr\}\,.
\]
Then (4.31) can be formulated as $\bigl(\|K(\tilde{x}) - K(x)\|_Y,\,\tilde{x}\bigr) \in C_+(x)\setminus \operatorname{int} C_-(x)$ for every $x, \tilde{x} \in B(\hat{x},\rho)$. Therefore, for every $x \in B(\hat{x},\rho)$ the graph of $\tilde{x} \mapsto \|K(\tilde{x}) - K(x)\|_Y$ lies between the cones $C_+(x)$ and $C_-(x)$, which are built from the tangent space at $x$.

We draw two conclusions from this corollary. Under Assumption 4.32, the reverse of Theorem 4.3 holds, and there exists a unique minimum norm solution with respect to $\hat{x}$.

Definition 4.34 Let $K : X \supset \mathcal{D}(K) \to Y$ be a nonlinear mapping, $y^* \in Y$, and $\hat{x} \in X$. A point $x^* \in \mathcal{D}(K)$ with $K(x^*) = y^*$ is called a minimum norm solution with respect to $\hat{x}$ if $\|x^* - \hat{x}\|_X \le \|x - \hat{x}\|_X$ for all solutions $x \in \mathcal{D}(K)$ of $K(x) = y^*$.

Lemma 4.35 Let Assumption 4.32 hold for some ball $B(\hat{x},\rho)$ and let $x^* \in B(\hat{x},\rho)$ with $K(x^*) = y^*$.
(a) Let the linear equation $K'(x^*)h = 0$ be ill-posed in the sense of Definition 4.1. Then the nonlinear equation $K(x) = y$ is locally ill-posed in $x^*$.
(b) $x^*$ is a minimum norm solution with respect to $\hat{x}$ if, and only if, $x^* - \hat{x} \perp \mathcal{N}\bigl(K'(x^*)\bigr)$.
(c) There exists a unique minimum norm solution $x^\dagger$ of $K(x) = y^*$ with respect to $\hat{x}$.

Proof: (a) Let $r \in \bigl(0,\, \rho - \|x^* - \hat{x}\|_X\bigr)$. Since the linear equation is ill-posed, there exists a sequence $h_n \in X$ with $\|h_n\|_X = 1$ and $\|K'(x^*)h_n\|_Y \to 0$ for $n \to \infty$. We set $x_n = x^* + r\,h_n$. For $\tilde{x} = x^*$ and $x = x_n$ in (4.31) it follows that $K(x_n) \to K(x^*)$ and $\|x_n - x^*\|_X = r$.

(b) For the characterization of a minimum norm solution it suffices to compare $\|x^* - \hat{x}\|_X$ with $\|\tilde{x} - \hat{x}\|_X$ for solutions $\tilde{x} \in B(\hat{x},\rho)$ of $K(\tilde{x}) = y^*$. The estimate (4.31) for $x = x^*$ implies that $\tilde{x} \in B(\hat{x},\rho)$ is a solution of $K(\tilde{x}) = y^*$ if, and only if, $K'(x^*)(x^* - \tilde{x}) = 0$; that is, if $x^* - \tilde{x} \in \mathcal{N}\bigl(K'(x^*)\bigr)$. The point $x^*$ is a minimum norm solution with respect to $\hat{x}$ if, and only if, $\|x^* - \hat{x}\|_X \le \|\tilde{x} - \hat{x}\|_X$ for all solutions $\tilde{x} \in B(\hat{x},\rho)$ of $K(\tilde{x}) = y^*$; that is, for all $\tilde{x}$ with $x^* - \tilde{x} \in \mathcal{N}\bigl(K'(x^*)\bigr)$. Replacing $x^* - \tilde{x}$ by $z$ shows that $x^*$ is a minimum norm solution with respect to $\hat{x}$ if, and only if, $0$ is the best approximation of $x^* - \hat{x}$ in $\mathcal{N}\bigl(K'(x^*)\bigr)$, which is equivalent to $x^* - \hat{x} \perp \mathcal{N}\bigl(K'(x^*)\bigr)$.

(c) The subspace $\mathcal{N}\bigl(K'(x^*)\bigr)$ is closed. Therefore, there exists a unique best approximation $p$ of $x^* - \hat{x}$ in $\mathcal{N}\bigl(K'(x^*)\bigr)$.⁷ Then $x^\dagger = x^* - p \in x^* + \mathcal{N}\bigl(K'(x^*)\bigr)$ is the best approximation to $\hat{x}$ in $x^* + \mathcal{N}\bigl(K'(x^*)\bigr)$. Also, $K(x^\dagger) = y^*$ because $x^* - x^\dagger = p \in \mathcal{N}\bigl(K'(x^*)\bigr)$. Finally, $\|x^\dagger - \hat{x}\|_X \le \|x^* - \hat{x}\|_X < \rho$, thus $x^\dagger \in B(\hat{x},\rho)$. □

⁷This follows from a general theorem on best approximations in Hilbert spaces.

The following result proves that the Landweber iteration is well-defined, and it motivates a stopping rule.

Theorem 4.36 Let Assumption 4.32 hold. Let $\|y^\delta - y^*\|_Y \le \delta$ and $x^* \in B(\hat{x},\rho/2)$ with $K(x^*) = y^*$. Furthermore, let $r > \frac{2(1+\eta)}{1-2\eta}$ and assume that there exists $k_* \in \mathbb{N}$ with $x_k^\delta \in B(x^*,\rho/2)$ for all $k = 0,\ldots,k_*-1$ (where $x_k^\delta$ are defined by (4.29)) such that
\[
\|K(x_k^\delta) - y^\delta\|_Y \;\ge\; r\,\delta \qquad \text{for all } k = 0,\ldots,k_*-1\,. \tag{4.32}
\]
Then the following holds.
(a) $\|x_{k+1}^\delta - x^*\|_X \le \|x_k^\delta - x^*\|_X$ for all $k = 0,\ldots,k_*-1$. In particular, $\|x_k^\delta - x^*\|_X \le \|x_0^\delta - x^*\|_X = \|\hat{x} - x^*\|_X < \rho/2$ for all $k = 0,\ldots,k_*$. Therefore, all $x_k^\delta$ belong to $B(x^*,\rho/2) \subset B(\hat{x},\rho)$ for $k = 0,\ldots,k_*$.
(b) For all $\ell \in \{0,\ldots,k_*-1\}$ it holds that
\[
\Bigl(1 - 2\eta - \frac{2(1+\eta)}{r}\Bigr) \sum_{k=\ell}^{k_*-1} \|K(x_k^\delta) - y^\delta\|_Y^2 \;\le\; \|x_\ell^\delta - x^*\|_X^2 - \|x_{k_*}^\delta - x^*\|_X^2\,. \tag{4.33}
\]
(c) If $\delta = 0$, then (4.32) holds for all $k \in \mathbb{N}$, and (we write $x_k$ instead of $x_k^0$)
\[
\sum_{k=0}^{\infty} \|K(x_k) - y^*\|_Y^2 \;\le\; \frac{\|\hat{x} - x^*\|_X^2}{1 - 2\eta}\,. \tag{4.34}
\]
In particular, $K(x_k)$ converges to $y^*$.

Proof: Since $\|K'(x_k^\delta)^*\|_{\mathcal{L}(Y,X)} \le 1$ we have for $k = 0,\ldots,k_*-1$
\begin{align*}
\|x_{k+1}^\delta - x^*\|_X^2 - \|x_k^\delta - x^*\|_X^2 
&= 2\operatorname{Re}\bigl(x_k^\delta - x^*,\, x_{k+1}^\delta - x_k^\delta\bigr)_X + \|x_{k+1}^\delta - x_k^\delta\|_X^2 \\
&= 2\operatorname{Re}\bigl(x_k^\delta - x^*,\, K'(x_k^\delta)^*(y^\delta - K(x_k^\delta))\bigr)_X + \bigl\|K'(x_k^\delta)^*(y^\delta - K(x_k^\delta))\bigr\|_X^2 \\
&\le 2\operatorname{Re}\bigl(K'(x_k^\delta)(x_k^\delta - x^*),\, y^\delta - K(x_k^\delta)\bigr)_Y + \|y^\delta - K(x_k^\delta)\|_Y^2 \\
&= 2\operatorname{Re}\bigl(K'(x_k^\delta)(x_k^\delta - x^*) - K(x_k^\delta) + y^\delta,\, y^\delta - K(x_k^\delta)\bigr)_Y - \|y^\delta - K(x_k^\delta)\|_Y^2 \\
&\le \bigl[2\eta\,\|K(x_k^\delta) - y^*\|_Y + 2\delta - \|K(x_k^\delta) - y^\delta\|_Y\bigr]\,\|K(x_k^\delta) - y^\delta\|_Y \\
&\le \bigl[(2\eta - 1)\,\|K(x_k^\delta) - y^\delta\|_Y + 2\delta + 2\eta\delta\bigr]\,\|K(x_k^\delta) - y^\delta\|_Y \\
&= \bigl[2\delta(1+\eta) - (1 - 2\eta)\,\|K(x_k^\delta) - y^\delta\|_Y\bigr]\,\|K(x_k^\delta) - y^\delta\|_Y \tag{4.35} \\
&\le \frac{2(1+\eta) - (1 - 2\eta)\,r}{r}\, \|K(x_k^\delta) - y^\delta\|_Y^2\,,
\end{align*}
where we used (4.32) in the last estimate. This implies (a), because $2(1+\eta) - (1-2\eta)r < 0$ by the choice of $r$. With $\alpha = \bigl((1-2\eta)r - 2(1+\eta)\bigr)/r > 0$ we have just shown the estimate
\[
\alpha\,\|K(x_k^\delta) - y^\delta\|_Y^2 \;\le\; \|x_k^\delta - x^*\|_X^2 - \|x_{k+1}^\delta - x^*\|_X^2\,, \qquad k = 0,\ldots,k_*-1\,,
\]
and thus for $\ell \in \{0,\ldots,k_*-1\}$:
\[
\alpha \sum_{k=\ell}^{k_*-1} \|K(x_k^\delta) - y^\delta\|_Y^2 \;\le\; \|x_\ell^\delta - x^*\|_X^2 - \|x_{k_*}^\delta - x^*\|_X^2\,.
\]
(c) We consider the above estimate up to line (4.35) for $\delta = 0$; that is,
\[
(1 - 2\eta)\,\|K(x_k) - y^*\|_Y^2 \;\le\; \|x_k - x^*\|_X^2 - \|x_{k+1} - x^*\|_X^2\,,
\]
and thus for all $m \in \mathbb{N}$
\[
(1 - 2\eta) \sum_{k=0}^{m-1} \|K(x_k) - y^*\|_Y^2 \;\le\; \|x_0 - x^*\|_X^2 - \|x_m - x^*\|_X^2 \;\le\; \|x_0 - x^*\|_X^2\,.
\]
Therefore the series converges, which yields the desired estimate. □
Let now $\delta > 0$ and again $r > \frac{2(1+\eta)}{1-2\eta}$. The condition (4.32) cannot hold for every $k_*$. Indeed, otherwise the sequence $\|x_k - x^*\|_X$ would be monotonically decreasing and bounded, thus convergent. From (4.33) for $\ell = k_* - 1$ it would follow that
\[
\Bigl(1 - 2\eta - \frac{2(1+\eta)}{r}\Bigr)\, \|K(x_{k_*-1}^\delta) - y^\delta\|_Y^2 \;\le\; \|x_{k_*-1}^\delta - x^*\|_X^2 - \|x_{k_*}^\delta - x^*\|_X^2 \;\longrightarrow\; 0
\]
for $k_* \to \infty$, a contradiction. Therefore, the following stopping rule is well-defined.

Stopping rule: Let $\delta > 0$ and $r > \frac{2(1+\eta)}{1-2\eta}$. We define $k_* = k_*(\delta) \in \mathbb{N}$ as the uniquely determined number such that
\[
\|K(x_k^\delta) - y^\delta\|_Y \;\ge\; r\,\delta \;>\; \|K(x_{k_*}^\delta) - y^\delta\|_Y \qquad \text{for all } k = 0,\ldots,k_*-1\,. \tag{4.36}
\]
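The iteration (4.29) combined with the stopping rule (4.36) is straightforward to implement in finite dimensions, where the adjoint $K'(x)^*$ is the transposed Jacobian. The following sketch uses a small smooth forward map on $\mathbb{R}^2$ that is our own illustrative choice (not an example from the text), scaled so that $\|K'(x)\| \le 1$ near the solution, with $r = 3$ as the discrepancy parameter (admissible when $\eta$ is small):

```python
import numpy as np

def landweber(K, Kprime, y_delta, x_hat, delta, r=3.0, max_iter=10_000):
    """Nonlinear Landweber iteration (4.29) with the stopping rule (4.36):
    iterate x_{k+1} = x_k - K'(x_k)^* (K(x_k) - y_delta) and stop at the
    first k with ||K(x_k) - y_delta|| < r * delta."""
    x = np.asarray(x_hat, dtype=float)
    for k in range(max_iter):
        residual = K(x) - y_delta
        if np.linalg.norm(residual) < r * delta:    # stopping index k_*(delta)
            return x, k
        x = x - Kprime(x).T @ residual              # adjoint = transposed Jacobian
    return x, max_iter

# Hypothetical smooth forward map on R^2 (an assumption for illustration),
# scaled by 0.5 so that ||K'(x)|| <= 1 near the solution, as assumed after (4.29).
def K(x):
    return 0.5 * np.array([x[0] + 0.05 * x[1]**2, x[1] + 0.05 * x[0]**2])

def Kprime(x):
    return 0.5 * np.array([[1.0, 0.1 * x[1]], [0.1 * x[0], 1.0]])

x_true = np.array([1.0, 2.0])
delta = 1e-4
noise = delta * np.array([1.0, 0.0]) / np.sqrt(2)   # ||noise|| <= delta
y_delta = K(x_true) + noise
x_rec, k_stop = landweber(K, Kprime, y_delta, x_hat=np.zeros(2), delta=delta)
print(k_stop, np.linalg.norm(x_rec - x_true))
```

For a well-posed toy problem like this one the iteration stops after a few dozen steps with a reconstruction error of the order of the noise level; for genuinely ill-posed problems the stopping index grows like $1/\delta$, as quantified later in (4.51).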
With this stopping rule, we can show convergence as $\delta \to 0$. First we note that for any fixed $\delta > 0$ and $k \le k_*(\delta)$, the mapping $y^\delta \mapsto x_k^\delta$ is continuous. Indeed, in each of the first $k$ steps of the algorithm only continuous operations of the Landweber iteration are performed. We first consider the case of no noise; that is, $\delta = 0$.

Theorem 4.37 Let Assumption 4.32 hold and let $\rho > 0$ and $x^* \in B(\hat{x},\rho/2)$ with $K(x^*) = y^*$. The Landweber iteration is well-defined for $\delta = 0$ (that is, the iterates $x_k$ belong to $\mathcal{D}(K)$), and the sequence $(x_k)$ converges to a solution $\tilde{x} \in B[x^*,\rho/2] \subset B(\hat{x},\rho)$ of $K(x) = y^*$.
Proof: We have seen already in Theorem 4.36 that $x_k \in B(x^*,\rho/2)$ for all $k$. We now show that $(x_k)$ is a Cauchy sequence. Let $\ell \le m$ be fixed. Determine $k$ with $\ell \le k \le m$ and
\[
\|K(x_k) - y^*\|_Y \;\le\; \|K(x_i) - y^*\|_Y \qquad \text{for all } \ell \le i \le m\,.
\]
Because of $\|x_\ell - x_m\|^2 \le 2\,\|x_\ell - x_k\|^2 + 2\,\|x_k - x_m\|^2$ we estimate both terms separately and use the formula $\|u - v\|^2 + \|u\|^2 - \|v\|^2 = 2\operatorname{Re}(u,\, u - v)$ and the left estimate of (4.31):
\begin{align*}
\|x_k - x_m\|_X^2 &+ \|x_k - x^*\|_X^2 - \|x_m - x^*\|_X^2 \\
&= 2\operatorname{Re}\,(x_k - x^*,\, x_k - x_m)_X \;=\; 2\operatorname{Re}\Bigl(x_k - x^*,\, \sum_{i=k}^{m-1}(x_i - x_{i+1})\Bigr)_X \\
&= 2\sum_{i=k}^{m-1} \operatorname{Re}\bigl(x_k - x^*,\, K'(x_i)^*(K(x_i) - y^*)\bigr)_X \\
&= 2\sum_{i=k}^{m-1} \operatorname{Re}\bigl(K'(x_i)(x_k - x^*),\, K(x_i) - y^*\bigr)_Y \\
&\le 2\sum_{i=k}^{m-1} \|K(x_i) - y^*\|_Y\, \|K'(x_i)(x_k - x^*)\|_Y \\
&\le 2\sum_{i=k}^{m-1} \|K(x_i) - y^*\|_Y \bigl(\|K'(x_i)(x_i - x^*)\|_Y + \|K'(x_i)(x_i - x_k)\|_Y\bigr) \\
&\le 2(1+\eta)\sum_{i=k}^{m-1} \|K(x_i) - y^*\|_Y \bigl(\|K(x_i) - K(x^*)\|_Y + \|K(x_i) - K(x_k)\|_Y\bigr) \\
&\le 2(1+\eta)\sum_{i=k}^{m-1} \|K(x_i) - y^*\|_Y \bigl(\|K(x_i) - y^*\|_Y + \|K(x_i) - y^*\|_Y + \|y^* - K(x_k)\|_Y\bigr) \\
&\le 2(1+\eta)\sum_{i=k}^{m-1} \|K(x_i) - y^*\|_Y \bigl(\|K(x_i) - y^*\|_Y + \|K(x_i) - y^*\|_Y + \|y^* - K(x_i)\|_Y\bigr) \\
&= 6(1+\eta)\sum_{i=k}^{m-1} \|K(x_i) - y^*\|_Y^2\,.
\end{align*}
Analogously, we estimate
\[
\|x_k - x_\ell\|_X^2 + \|x_k - x^*\|_X^2 - \|x_\ell - x^*\|_X^2 \;\le\; 6(1+\eta)\sum_{i=\ell}^{k-1} \|K(x_i) - y^*\|_Y^2\,,
\]
and thus
\begin{align*}
\|x_\ell - x_m\|_X^2 &\le 2\,\|x_\ell - x_k\|_X^2 + 2\,\|x_k - x_m\|_X^2 \\
&\le 2\,\|x_\ell - x^*\|_X^2 - 2\,\|x_k - x^*\|_X^2 + 12(1+\eta)\sum_{i=\ell}^{k-1}\|K(x_i) - y^*\|_Y^2 \\
&\quad + 2\,\|x_m - x^*\|_X^2 - 2\,\|x_k - x^*\|_X^2 + 12(1+\eta)\sum_{i=k}^{m-1}\|K(x_i) - y^*\|_Y^2 \\
&= 2\,\|x_\ell - x^*\|_X^2 + 2\,\|x_m - x^*\|_X^2 - 4\,\|x_k - x^*\|_X^2 + 12(1+\eta)\sum_{i=\ell}^{m-1}\|K(x_i) - y^*\|_Y^2\,.
\end{align*}
Since both the sequence $\|x_\ell - x^*\|_X$ and the series $\sum_{i=0}^{\infty}\|K(x_i) - y^*\|_Y^2$ converge by Theorem 4.36, we conclude that $\|x_\ell - x_m\|_X^2$ converges to zero as $\ell \to \infty$. Therefore, $(x_m)$ is a Cauchy sequence and thus convergent; that is, $x_m \to \tilde{x}$ for some $\tilde{x} \in B[x^*,\rho/2]$. The continuity of $K$ implies that $K(\tilde{x}) = y^*$. □

Now we prove convergence of $x_{k_*(\delta)}^\delta$ as $\delta \to 0$ if $k_*(\delta)$ is determined by the stopping rule.

Theorem 4.38 Let Assumption 4.32 hold and let $\rho > 0$ and $x^* \in B(\hat{x},\rho/2)$ with $K(x^*) = y^*$ and $r > \frac{2(1+\eta)}{1-2\eta}$. Then the iteration stops by the stopping rule (4.36), which defines $k_*(\delta)$ for all $\delta > 0$, and $\lim_{\delta\to 0} x_{k_*(\delta)}^\delta = \tilde{x}$, where $\tilde{x} \in B[x^*,\rho/2]$ is the limit of the sequence for $\delta = 0$, which exists by the previous theorem and is a solution of $K(\tilde{x}) = y^*$.

Proof: Let $(x_k)$ be the sequence of the Landweber iteration for $\delta = 0$; that is, $x_k = x_k^0$, and $\tilde{x} = \lim_{k\to\infty} x_k$. Furthermore, let $(\delta_n)$ be a sequence which converges to zero. Set for abbreviation $k_n = k_*(\delta_n)$. We distinguish between two cases.

Case 1: The sequence $(k_n)$ of natural numbers has finite accumulation points. Let $k \in \mathbb{N}$ be the smallest accumulation point. Then there exists $I \subset \mathbb{N}$ of infinite cardinality such that $k_n = k$ for all $n \in I$. By the definition of $k_n$, we have
\[
\|K(x_{k_n}^{\delta_n}) - y^{\delta_n}\|_Y \;<\; r\,\delta_n \qquad \text{for all } n \in I\,.
\]
Since for this fixed $k$ the iterate $x_k^{\delta_n}$ depends continuously on $y^{\delta_n}$, and since $y^{\delta_n}$ converges to $y^*$, we conclude for $n \in I$, $n \to \infty$, that $x_k^{\delta_n} \to x_k$ (the $k$th iterate of the sequence for $\delta = 0$) and $K(x_k) = y^*$. The Landweber iteration for $\delta = 0$ then implies that $x_m = x_k$ for all $m \ge k$; that is, the sequence is constant for $m \ge k$. In particular, $x_m = \tilde{x}$ for all $m \ge k$. The same argument holds for any other accumulation point $\tilde{k} \ge k$, and thus $\lim_{n\to\infty,\, n\in\tilde{I}} x_{k_*(\delta_n)}^{\delta_n} = x_{\tilde{k}} = \tilde{x}$.

Case 2: The sequence $(k_n)_n$ tends to infinity. Without loss of generality, we assume that $k_n$ converges monotonically to infinity. Let $n > m$. We apply Theorem 4.36 with $\tilde{x}$ instead of $x^*$ and get
\[
\|x_{k_n}^{\delta_n} - \tilde{x}\|_X \;\le\; \|x_{k_n - 1}^{\delta_n} - \tilde{x}\|_X \;\le\; \cdots \;\le\; \|x_{k_m}^{\delta_n} - \tilde{x}\|_X \;\le\; \|x_{k_m}^{\delta_n} - x_{k_m}\|_X + \|x_{k_m} - \tilde{x}\|_X\,.
\]
Let $\varepsilon > 0$. Choose $m$ such that $\|x_{k_m} - \tilde{x}\|_X \le \varepsilon/2$. For this fixed $m$, the sequence $x_{k_m}^{\delta_n}$ converges to $x_{k_m}$ for $n \to \infty$. Therefore, we can find $n_0$ with $\|x_{k_m}^{\delta_n} - x_{k_m}\|_X \le \varepsilon/2$ for all $n \ge n_0$. For these $n \ge n_0$, we have $\|x_{k_n}^{\delta_n} - \tilde{x}\|_X \le \varepsilon$. Therefore, we have shown convergence of every sequence $\delta_n \to 0$ to the same limit $\tilde{x}$. This ends the proof. □

Before we prove a result on the order of convergence, we formulate a condition under which the Landweber iteration converges to the minimum norm solution.

Lemma 4.39 Let Assumption 4.32 hold and let $x^* \in B(\hat{x},\rho/2)$ be the minimum norm solution of $K(x) = y^*$ with respect to $\hat{x}$. (It exists and is unique by Lemma 4.35.) Let, furthermore, $\mathcal{N}\bigl(K'(x^*)\bigr) \subset \mathcal{N}\bigl(K'(x)\bigr)$ for all $x \in B(\hat{x},\rho/2)$. Then the sequences $(x_k)$ and $\bigl(x_{k(\delta_n)}^{\delta_n}\bigr)_n$ of the Landweber iteration converge to the minimum norm solution $x^*$ of $K(x) = y^*$.

Proof: Convergence of the sequences has been shown in Theorems 4.37 and 4.38 already. Let $\tilde{x} = \lim_{k\to\infty} x_k$. Then also $\tilde{x} = \lim_{\delta\to 0} x_{k(\delta)}^\delta$ by Theorem 4.38. We show by induction that $x_k - x^* \perp \mathcal{N}\bigl(K'(x^*)\bigr)$ for all $k = 0, 1, \ldots$. For $k = 0$ this is true because of $x_0 = \hat{x}$, the minimum property of $x^*$, and part (b) of Lemma 4.35. Let the assertion be true for $k$ and let $z \in \mathcal{N}\bigl(K'(x^*)\bigr)$. Then $z \in \mathcal{N}\bigl(K'(x_k)\bigr)$, thus
\[
(x_{k+1} - x^*,\, z)_X \;=\; (x_k - x^*,\, z)_X - \bigl(K(x_k) - y^*,\, K'(x_k)z\bigr)_Y \;=\; 0\,.
\]
Since the orthogonal complement of $\mathcal{N}\bigl(K'(x^*)\bigr)$ is closed, we conclude that also $\tilde{x} - x^* \perp \mathcal{N}\bigl(K'(x^*)\bigr)$. Furthermore, $\tilde{x} - x^* \in \mathcal{N}\bigl(K'(x^*)\bigr)$ because of (4.31) for $x = x^*$. Therefore, $\tilde{x} = x^*$. □

We will now prove a rate of convergence. This part will be a bit more technical. It is not surprising that we need the source condition of Assumption 4.11, just as in the linear case or for the nonlinear Tikhonov regularization. Unfortunately, Assumptions 4.32 and 4.11 are not sufficient, and we have to strengthen them.

Assumption 4.40 Let Assumption 4.32 hold and let $x^* \in B(\hat{x},\rho)$ be the minimum norm solution of $K(x) = y^*$ with respect to $\hat{x}$. (It exists by Lemma 4.35.) Furthermore, let $K'(x^*)$ be compact, and let there exist $C > 0$ and a family $\{R_x : x \in B(\hat{x},\rho)\}$ of linear bounded operators $R_x : Y \to Y$ with
\[
K'(x) \;=\; R_x\,K'(x^*) \qquad \text{and} \qquad \|R_x - I\|_{\mathcal{L}(Y,Y)} \;\le\; C\,\|x - x^*\|_X \qquad \text{for } x \in B(\hat{x},\rho)\,.
\]
In the linear case, both conditions are satisfied for $R_x = I$ because $K'(x) = K$ for all $x$. Under this additional assumption, we can partially sharpen the tangential cone condition of Assumption 4.32.

Lemma 4.41 Under Assumption 4.40, we have $\mathcal{N}\bigl(K'(x^*)\bigr) \subset \mathcal{N}\bigl(K'(x)\bigr)$ for all $x \in B(\hat{x},\rho)$ and
\[
\bigl\|K(x) - K(x^*) - K'(x^*)(x - x^*)\bigr\|_Y \;\le\; \frac{C}{2}\,\|x - x^*\|_X\, \|K'(x^*)(x - x^*)\|_Y\,, \tag{4.37}
\]
\[
\bigl\|K(x) - K(x^*) - K'(x^*)(x - x^*)\bigr\|_Y \;\le\; \frac{C}{2 - \rho C}\,\|x - x^*\|_X\, \|K(x) - K(x^*)\|_Y \tag{4.38}
\]
for all $x \in B(\hat{x},\rho)$. Therefore, if we choose $\rho$ such that $\eta := \frac{C\rho}{2 - \rho C} < 1/2$, then condition (4.30) is satisfied for $x = x^*$.

Proof: The first assertion is obvious. For the second, we write
\[
K(x) - K(x^*) - K'(x^*)(x - x^*) \;=\; \int_0^1 \bigl[K'\bigl(tx + (1-t)x^*\bigr) - K'(x^*)\bigr](x - x^*)\,dt\,,
\]
thus
\begin{align*}
\|K(x) - K(x^*) - K'(x^*)(x - x^*)\|_Y 
&\le \int_0^1 \bigl\|\bigl[K'\bigl(tx + (1-t)x^*\bigr) - K'(x^*)\bigr](x - x^*)\bigr\|_Y\,dt \\
&\le \int_0^1 \|R_{tx + (1-t)x^*} - I\|_{\mathcal{L}(Y,Y)}\,dt\; \|K'(x^*)(x - x^*)\|_Y \\
&\le C \int_0^1 t\,dt\; \|x - x^*\|_X\, \|K'(x^*)(x - x^*)\|_Y \;=\; \frac{C}{2}\,\|x - x^*\|_X\, \|K'(x^*)(x - x^*)\|_Y\,,
\end{align*}
which proves (4.37). We continue and estimate
\begin{align*}
\|K(x) - K(x^*) - K'(x^*)(x - x^*)\|_Y 
&\le \frac{C}{2}\,\|x - x^*\|_X\, \bigl\|K'(x^*)(x - x^*) + K(x^*) - K(x)\bigr\|_Y \\
&\quad + \frac{C}{2}\,\|x - x^*\|_X\, \|K(x^*) - K(x)\|_Y
\end{align*}
and thus
\[
\bigl(2 - C\,\|x - x^*\|_X\bigr)\, \bigl\|K(x) - K(x^*) - K'(x^*)(x - x^*)\bigr\|_Y \;\le\; C\,\|x - x^*\|_X\, \|K(x^*) - K(x)\|_Y\,.
\]
This proves (4.38) because $\|x - x^*\|_X \le \rho$. □
Under Assumption 4.40 (and $x^* \in B(\hat{x},\rho/2)$), we know from Lemma 4.39 that $x_{k(\delta)}^\delta$ converges to $x^*$ as $\delta \to 0$. For the proof of the order of convergence, we need the following elementary estimates.

Lemma 4.42
\[
\frac{1 - (1 - x^2)^k}{x} \;\le\; \sqrt{k} \qquad \text{for all } k \in \mathbb{N} \text{ and } x \in [0,1]\,, \tag{4.39}
\]
\[
x\,(1 - x^2)^k \;\le\; \frac{1}{\sqrt{k+1}} \qquad \text{for all } k \in \mathbb{N} \text{ and } x \in [0,1]\,, \tag{4.40}
\]
\[
x^2 (1 - x^2)^k \;\le\; \frac{1}{k+1} \qquad \text{for all } k \in \mathbb{N} \text{ and } x \in [0,1]\,, \tag{4.41}
\]
and for every $\alpha \in (0,1]$ there exists $c(\alpha) > 0$ with
\[
\sum_{j=0}^{k-1} (j+1)^{-\alpha}\,(k-j)^{-3/2} \;\le\; \frac{c(\alpha)}{(k+1)^{\alpha}} \qquad \text{for all } k \in \mathbb{N}\,. \tag{4.42}
\]
Proof: To show (4.39), let first $0 < x \le 1/\sqrt{k}$. Bernoulli's inequality implies $(1 - x^2)^k \ge 1 - kx^2$, thus
\[
\frac{1 - (1 - x^2)^k}{x} \;\le\; \frac{1 - (1 - kx^2)}{x} \;=\; kx \;\le\; \sqrt{k}\,.
\]
Let now $x \ge 1/\sqrt{k}$. Then $\frac{1 - (1 - x^2)^k}{x} \le \frac{1}{x} \le \sqrt{k}$. For $x = 0$ the estimate follows by l'Hospital's rule. The proofs of estimates (4.40) and (4.41) are elementary and left to the reader (see also the proof of Theorem 2.8).

Proof of (4.42): Define the function $f(x) = (x+1)^{-\alpha}(k-x)^{-3/2}$ for $-1 < x < k$. By explicit computation of the second derivative, one obtains that $f''(x) \ge 0$ for all $-1 < x < k$; therefore, $f$ is convex. Taylor's formula yields for $j < k$
\[
f(x) \;=\; f(j) + f'(j)(x - j) + \tfrac12\,f''(z_{j,x})(x - j)^2 \;\ge\; f(j) + f'(j)(x - j)
\]
for $j - 1/2 \le x \le j + 1/2$ with some intermediate point $z_{j,x}$, thus
\[
\int_{j-1/2}^{j+1/2} f(x)\,dx \;\ge\; \int_{j-1/2}^{j+1/2} \bigl[f(j) + f'(j)(x - j)\bigr]\,dx \;=\; f(j)\,,
\]
because the integral $\int_{j-1/2}^{j+1/2} (x - j)\,dx$ vanishes by the symmetry of the interval. Therefore,
\[
\sum_{j=0}^{k-1} (j+1)^{-\alpha}(k-j)^{-3/2} \;=\; \sum_{j=0}^{k-1} f(j) \;\le\; \sum_{j=0}^{k-1} \int_{j-1/2}^{j+1/2} f(x)\,dx \;=\; \int_{-1/2}^{k-1/2} f(x)\,dx \;=\; \int_{-1/2}^{k/2} f(x)\,dx + \int_{k/2}^{k-1/2} f(x)\,dx\,.
\]
For the first integral, we have
\[
\int_{-1/2}^{k/2} f(x)\,dx \;\le\; \Bigl(\frac{2}{k}\Bigr)^{3/2} \int_{-1/2}^{k/2} (x+1)^{-\alpha}\,dx\,.
\]
If $\alpha < 1$, then
\[
\int_{-1/2}^{k/2} f(x)\,dx \;\le\; \Bigl(\frac{2}{k}\Bigr)^{3/2} \frac{1}{1-\alpha}\Bigl(\frac{k}{2} + 1\Bigr)^{1-\alpha} \;\le\; \tilde{c}(\alpha)\,k^{-\alpha - 1/2} \;\le\; \frac{c(\alpha)}{(k+1)^{\alpha}}\,.
\]
If $\alpha = 1$, then
\[
\int_{-1/2}^{k/2} f(x)\,dx \;\le\; \Bigl(\frac{2}{k}\Bigr)^{3/2} \ln(k/2) \;\le\; \frac{c(1)}{k+1}\,.
\]
For the remaining integral, we estimate
\[
\int_{k/2}^{k-1/2} f(x)\,dx \;\le\; \Bigl(\frac{2}{k}\Bigr)^{\alpha} \int_{k/2}^{k-1/2} (k-x)^{-3/2}\,dx \;=\; \Bigl(\frac{2}{k}\Bigr)^{\alpha}\, \Bigl[\,2\,(k-x)^{-1/2}\Bigr]_{k/2}^{k-1/2} \;\le\; \Bigl(\frac{2}{k}\Bigr)^{\alpha}\, 2\sqrt{2}\,.
\]
This finishes the proof. □
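The elementary estimates of Lemma 4.42 are easy to check numerically; the following sketch (the grid sizes are arbitrary choices) verifies (4.39), (4.40), and (4.41) on a grid and also estimates lower bounds for the constants $c(1/2)$ and $c(1)$ in (4.42) by computing the largest observed ratio of the two sides:

```python
import numpy as np

ks = np.arange(1, 200)
xs = np.linspace(1e-6, 1.0, 500)

# check (4.39)-(4.41) on a grid of k and x (sanity check, not a proof)
for k in range(1, 60):
    assert np.all((1 - (1 - xs**2)**k) / xs <= np.sqrt(k) + 1e-10)      # (4.39)
    assert np.all(xs * (1 - xs**2)**k <= 1 / np.sqrt(k + 1) + 1e-10)    # (4.40)
    assert np.all(xs**2 * (1 - xs**2)**k <= 1 / (k + 1) + 1e-10)        # (4.41)

# (4.42): the weighted sum decays like (k+1)^(-alpha); the observed maximum
# of (sum) * (k+1)^alpha is a lower bound for the constant c(alpha)
c_est = {}
for alpha in (0.5, 1.0):
    c_est[alpha] = max(
        sum((j + 1.0)**(-alpha) * (k - j)**(-1.5) for j in range(k)) * (k + 1)**alpha
        for k in ks
    )
print(c_est)
```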
The estimates (4.39), (4.40), and (4.41) directly imply the following corollary.

Corollary 4.43 Let $A : X \to Y$ be a linear and compact operator with adjoint $A^*$ and let $\|A\|_{\mathcal{L}(X,Y)} \le 1$. Then
(a) $\|(I - A^*A)^k A^*\|_{\mathcal{L}(Y,X)} \le (k+1)^{-1/2}$,
(b) $\|(I - AA^*)^k AA^*\|_{\mathcal{L}(Y,Y)} \le (k+1)^{-1}$,
(c) $\Bigl\|\sum_{j=0}^{k-1} (I - A^*A)^j A^*\Bigr\|_{\mathcal{L}(Y,X)} \le \sqrt{k}$.

Proof: Let $\{\mu_i, x_i, y_i : i \in I\}$ be a singular system for $A$; see Appendix A.6, Theorem A.57. For $y = y_0 + \sum_{i\in I}\alpha_i y_i \in Y$ with $A^* y_0 = 0$, we have
\[
\|(I - A^*A)^k A^* y\|_X^2 \;=\; \sum_{i\in I} \bigl[(1 - \mu_i^2)^k \mu_i\bigr]^2 \alpha_i^2 \;\le\; \frac{1}{k+1}\sum_{i\in I}\alpha_i^2 \;\le\; \frac{1}{k+1}\,\|y\|_Y^2
\]
because of (4.40). Part (b) follows in the same way from (4.41). Finally, we observe that
\[
\sum_{j=0}^{k-1} (I - A^*A)^j A^* y \;=\; \sum_{j=0}^{k-1}\sum_{i\in I} (1 - \mu_i^2)^j \mu_i\,\alpha_i\, x_i \;=\; \sum_{i\in I} \frac{1 - (1 - \mu_i^2)^k}{\mu_i}\,\alpha_i\, x_i
\]
and thus by (4.39)
\[
\Bigl\|\sum_{j=0}^{k-1} (I - A^*A)^j A^* y\Bigr\|_X^2 \;\le\; k\sum_{i\in I}\alpha_i^2 \;\le\; k\,\|y\|_Y^2\,. \qquad \square
\]
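The three operator bounds can be checked directly for a matrix (a compact operator between finite-dimensional spaces). In the sketch below the norms are spectral norms and the random matrix is an arbitrary test case, scaled so that $\|A\| \le 1$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 6))
A /= np.linalg.norm(A, 2)              # scale so that ||A||_2 <= 1
I_X, I_Y = np.eye(6), np.eye(8)

for k in (1, 5, 20, 100):
    Ba = np.linalg.matrix_power(I_X - A.T @ A, k) @ A.T
    assert np.linalg.norm(Ba, 2) <= 1 / np.sqrt(k + 1) + 1e-12          # (a)
    Bb = np.linalg.matrix_power(I_Y - A @ A.T, k) @ A @ A.T
    assert np.linalg.norm(Bb, 2) <= 1 / (k + 1) + 1e-12                 # (b)
    Bc = sum(np.linalg.matrix_power(I_X - A.T @ A, j) @ A.T for j in range(k))
    assert np.linalg.norm(Bc, 2) <= np.sqrt(k) + 1e-12                  # (c)
print("bounds (a)-(c) verified for a random A with ||A|| <= 1")
```

The assertions hold with equality-type sharpness only in the worst case over all singular values; for a fixed matrix the left-hand sides are usually much smaller.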
Now we are able to prove the main theorem on the order of convergence. The proof is lengthy.

Theorem 4.44 Let Assumption 4.40 hold, and let $x^* \in B(\hat{x},\rho)$ be the minimum norm solution of $K(x) = y^*$ with respect to $\hat{x}$. Define the constant
\[
c_* \;=\; 2\,\Bigl(1 + \frac{1 - 2\eta}{\eta\,(5 - 4\eta)}\Bigr)\,,
\]
where $\eta < 1/2$ is the constant of Assumption 4.32. Furthermore, let the source condition hold; that is, let there exist $w \in Y$ such that $x^* - \hat{x} = K'(x^*)^* w$ and
\[
\frac{9}{2}\,C\,c_*^2\, \max\bigl\{c(1/2),\, c(1)\bigr\}\, \|w\|_Y \;\le\; 1\,,
\]
where $C$ is the constant of Assumption 4.40 and $c(\alpha)$ are the constants of (4.42) for $\alpha = 1/2$ and $\alpha = 1$. Then there exists a constant $c > 0$ such that
\[
\|x_{k_*(\delta)}^\delta - x^*\|_X \;\le\; c\,\|w\|_Y^{1/2}\,\delta^{1/2}\,, \qquad \|K(x_{k_*(\delta)}^\delta) - y^*\|_Y \;\le\; (1 + r)\,\delta\,. \tag{4.43}
\]
Again, $k_*(\delta)$ is the iteration index determined by the stopping rule (4.36).
Proof: The second estimate follows immediately from the property of $k_*(\delta)$ and the triangle inequality. Set for abbreviation $e_k = x_k^\delta - x^*$ and $A = K'(x^*)$. Then $K'(x_k^\delta) = R_{x_k^\delta}\,A$ because of Assumption 4.40, and thus
\begin{align*}
e_{k+1} &= e_k - K'(x_k^\delta)^*\bigl(K(x_k^\delta) - y^\delta\bigr) \\
&= e_k - A^*\bigl(K(x_k^\delta) - y^\delta\bigr) + A^*\bigl(I - R_{x_k^\delta}^*\bigr)\bigl(K(x_k^\delta) - y^\delta\bigr) \\
&= (I - A^*A)\,e_k - A^*\bigl(K(x_k^\delta) - y^\delta - K'(x^*)\,e_k\bigr) + A^*\bigl(I - R_{x_k^\delta}^*\bigr)\bigl(K(x_k^\delta) - y^\delta\bigr) \\
&= (I - A^*A)\,e_k + A^*(y^\delta - y^*) - A^*\bigl(K(x_k^\delta) - K(x^*) - K'(x^*)(x_k^\delta - x^*)\bigr) + A^*\bigl(I - R_{x_k^\delta}^*\bigr)\bigl(K(x_k^\delta) - y^\delta\bigr) \\
&= (I - A^*A)\,e_k + A^*(y^\delta - y^*) + A^* z_k\,, \qquad k = 0,\ldots,k_*(\delta) - 1\,, \tag{4.44}
\end{align*}
with
\[
z_k \;=\; \bigl(I - R_{x_k^\delta}^*\bigr)\bigl(K(x_k^\delta) - y^\delta\bigr) \;-\; \bigl(K(x_k^\delta) - K(x^*) - K'(x^*)(x_k^\delta - x^*)\bigr)\,. \tag{4.45}
\]
This is a recursion formula for $e_k$. It is solved in terms of $y^\delta - y^*$ and $z_j$ by
\[
e_k \;=\; (I - A^*A)^k e_0 + \sum_{j=0}^{k-1} (I - A^*A)^j A^*\bigl(y^\delta - y^* + z_{k-j-1}\bigr)
\;=\; -(I - A^*A)^k A^* w + \Bigl[\sum_{j=0}^{k-1} (I - A^*A)^j A^*\Bigr](y^\delta - y^*) + \sum_{j=0}^{k-1} (I - A^*A)^j A^* z_{k-j-1} \tag{4.46}
\]
for $k = 0,\ldots,k_*(\delta)$. (Proof by induction with respect to $k$, using $e_0 = \hat{x} - x^* = -A^* w$.) Furthermore, because $A(I - A^*A)^j = (I - AA^*)^j A$,
\begin{align*}
A e_k &= -(I - AA^*)^k AA^*\, w + \sum_{j=0}^{k-1} (I - AA^*)^j AA^*\,(y^\delta - y^*) + \sum_{j=0}^{k-1} (I - AA^*)^j AA^*\, z_{k-j-1} \\
&= -(I - AA^*)^k AA^*\, w - \sum_{j=0}^{k-1} \bigl[(I - AA^*)^{j+1} - (I - AA^*)^j\bigr](y^\delta - y^*) + \sum_{j=0}^{k-1} (I - AA^*)^j AA^*\, z_{k-j-1} \\
&= -(I - AA^*)^k AA^*\, w + \bigl[I - (I - AA^*)^k\bigr](y^\delta - y^*) + \sum_{j=0}^{k-1} (I - AA^*)^j AA^*\, z_{k-j-1} \tag{4.47}
\end{align*}
for $k = 0,\ldots,k_*(\delta)$.

Now we consider $z_k$ from (4.45) for $k \in \{0,\ldots,k_*(\delta) - 1\}$ and estimate both terms of $z_k$ separately. Since $r > 2$ and $\|K(x_k^\delta) - y^\delta\|_Y \ge r\delta > 2\delta$, we have
\[
\|K(x_k^\delta) - y^\delta\|_Y \;\le\; \|K(x_k^\delta) - y^*\|_Y + \delta \;\le\; \|K(x_k^\delta) - y^*\|_Y + \tfrac12\,\|K(x_k^\delta) - y^\delta\|_Y
\]
and thus $\|K(x_k^\delta) - y^\delta\|_Y \le 2\,\|K(x_k^\delta) - y^*\|_Y$. With (4.31) and because $\eta < 1/2$, we have
\[
\bigl\|(I - R_{x_k^\delta}^*)\bigl(K(x_k^\delta) - y^\delta\bigr)\bigr\|_Y \;\le\; \|I - R_{x_k^\delta}^*\|_{\mathcal{L}(Y,Y)}\, \|K(x_k^\delta) - y^\delta\|_Y \;\le\; 4C\,\|e_k\|_X\, \|A e_k\|_Y\,.
\]
For the second term of $z_k$ we use (4.37) for $x = x_k^\delta$:
\[
\bigl\|K(x_k^\delta) - K(x^*) - K'(x^*)(x_k^\delta - x^*)\bigr\|_Y \;\le\; \frac{C}{2}\,\|e_k\|_X\, \|A e_k\|_Y\,,
\]
and thus $\|z_k\|_Y \le \frac{9}{2}\,C\,\|e_k\|_X\,\|A e_k\|_Y$.

Now we estimate $\delta$ from above. Because of the stopping rule (4.36) and (4.31), we have for $k = 0,\ldots,k_*(\delta) - 1$
\[
r\,\delta \;\le\; \|K(x_k^\delta) - y^\delta\|_Y \;\le\; \|K(x_k^\delta) - K(x^*)\|_Y + \delta \;\le\; \frac{1}{1-\eta}\,\|A e_k\|_Y + \delta\,,
\]
and thus, because $r - 1 > \frac{2(1+\eta)}{1-2\eta} - 1 = \frac{1+4\eta}{1-2\eta}$,
\[
\delta \;\le\; \frac{1 - 2\eta}{(1 + 4\eta)(1 - \eta)}\,\|A e_k\|_Y\,, \qquad k = 0,\ldots,k_*(\delta) - 1\,. \tag{4.48}
\]
Now we show
\[
\|e_j\|_X \;\le\; c_*\,\|w\|_Y\,\frac{1}{\sqrt{j+1}}\,, \qquad \|A e_j\|_Y \;\le\; c_*\,\|w\|_Y\,\frac{1}{j+1}\,, \tag{4.49}
\]
and
\[
\delta \;\le\; \frac{2(1 - 2\eta)}{\eta\,(5 - 4\eta)}\,\frac{\|w\|_Y}{k+1} \tag{4.50}
\]
for $j = 0,\ldots,k$ and every $k \le k_*(\delta) - 1$. We show these three estimates by induction with respect to $k < k_*(\delta)$. They are true for $k = 0$ because $\|e_0\|_X \le \|w\|_Y$ and $\|A e_0\|_Y \le \|w\|_Y$, and by (4.48) and $\frac{1-2\eta}{(1+4\eta)(1-\eta)} \le \frac{2(1-2\eta)}{\eta(5-4\eta)}$.

Let now (4.49) and (4.50) hold for $k - 1$. From the representation (4.46) of $e_k$, parts (a) and (c) of Corollary 4.43, the induction hypothesis, and (4.42) for $\alpha = 1/2$, we conclude that
\begin{align*}
\|e_k\|_X &\le \frac{\|w\|_Y}{\sqrt{k+1}} + \sqrt{k}\,\delta + \frac{9}{2}\,C \sum_{j=0}^{k-1} \frac{1}{\sqrt{j+1}}\, \|e_{k-j-1}\|_X\, \|A e_{k-j-1}\|_Y \\
&\le \frac{\|w\|_Y}{\sqrt{k+1}} + \sqrt{k}\,\delta + \frac{9}{2}\,C\,c_*^2\,\|w\|_Y^2 \sum_{j=0}^{k-1} \frac{1}{\sqrt{j+1}}\,\frac{1}{\sqrt{k-j}}\,\frac{1}{k-j} \\
&\le \frac{\|w\|_Y}{\sqrt{k+1}} + \sqrt{k}\,\delta + \frac{9}{2}\,C\,c_*^2\,\|w\|_Y^2\, \frac{c(1/2)}{\sqrt{k+1}} \;\le\; \sqrt{k+1}\,\delta + 2\,\frac{\|w\|_Y}{\sqrt{k+1}}
\end{align*}
by the assumption on $\|w\|_Y$. Analogously, it follows for $\|A e_k\|_Y$ with (4.47), part (b) of Corollary 4.43, and (4.42) for $\alpha = 1$ with constant $c(1)$:
\begin{align*}
\|A e_k\|_Y &\le \frac{\|w\|_Y}{k+1} + \delta + \frac{9}{2}\,C \sum_{j=0}^{k-1} \frac{1}{j+1}\, \|e_{k-j-1}\|_X\, \|A e_{k-j-1}\|_Y \\
&\le \frac{\|w\|_Y}{k+1} + \delta + \frac{9}{2}\,C\,c_*^2\,\|w\|_Y^2 \sum_{j=0}^{k-1} \frac{1}{j+1}\,\frac{1}{\sqrt{k-j}}\,\frac{1}{k-j} \\
&\le \frac{\|w\|_Y}{k+1} + \delta + \frac{9}{2}\,C\,c_*^2\,\|w\|_Y^2\,\frac{c(1)}{k+1} \;\le\; \delta + 2\,\frac{\|w\|_Y}{k+1}\,.
\end{align*}
Next we substitute this estimate of $\|A e_k\|_Y$ into the estimate (4.48) for $\delta$ and obtain
\[
\delta \;\le\; \frac{1 - 2\eta}{(1 + 4\eta)(1 - \eta)}\,\|A e_k\|_Y \;\le\; \frac{1 - 2\eta}{(1 + 4\eta)(1 - \eta)}\,\Bigl(\delta + 2\,\frac{\|w\|_Y}{k+1}\Bigr)\,.
\]
Since $1 - \frac{1-2\eta}{(1+4\eta)(1-\eta)} = \frac{\eta(5-4\eta)}{(1+4\eta)(1-\eta)}$, estimate (4.50) follows for $k$; that is,
\[
\delta \;\le\; \frac{2(1 - 2\eta)}{\eta\,(5 - 4\eta)}\,\frac{\|w\|_Y}{k+1}\,.
\]
We substitute this into the estimates of $\|e_k\|_X$ and $\|A e_k\|_Y$:
\[
\|e_k\|_X \;\le\; 2\,\Bigl(1 + \frac{1-2\eta}{\eta(5-4\eta)}\Bigr)\frac{\|w\|_Y}{\sqrt{k+1}} \;=\; c_*\,\frac{\|w\|_Y}{\sqrt{k+1}}\,, \qquad
\|A e_k\|_Y \;\le\; 2\,\Bigl(1 + \frac{1-2\eta}{\eta(5-4\eta)}\Bigr)\frac{\|w\|_Y}{k+1} \;=\; c_*\,\frac{\|w\|_Y}{k+1}\,.
\]
This shows (4.49) for $k$. Therefore, (4.49) and (4.50) are proven for all $0 \le j \le k < k_*(\delta)$.

In the case $k_*(\delta) \ge 1$, we take $k = k_*(\delta) - 1$ in (4.50) and obtain
\[
k_*(\delta) \;\le\; \frac{2(1 - 2\eta)}{\eta\,(5 - 4\eta)}\,\frac{\|w\|_Y}{\delta}\,. \tag{4.51}
\]
From the estimate $\|K(x_k^\delta) - y^\delta\|_Y \le 2\,\|K(x_k^\delta) - K(x^*)\|_Y \le \frac{2}{1-\eta}\,\|A e_k\|_Y \le 4\,\|A e_k\|_Y$ for $k < k_*(\delta)$, it follows that
\[
\|K(x_k^\delta) - y^\delta\|_Y \;\le\; 4\,c_*\,\|w\|_Y\,\frac{1}{k+1}\,, \qquad k = 0,\ldots,k_*(\delta) - 1\,. \tag{4.52}
\]
Now we come to the final part of the proof. We write $e_k$ from (4.46) in the form
\begin{align*}
e_k &= -(I - A^*A)^k A^* w + \Bigl[\sum_{j=0}^{k-1}(I - A^*A)^j A^*\Bigr](y^\delta - y^*) + \sum_{j=0}^{k-1}(I - A^*A)^j A^* z_{k-j-1} \\
&= -A^*(I - AA^*)^k w + \Bigl[\sum_{j=0}^{k-1}(I - A^*A)^j A^*\Bigr](y^\delta - y^*) + A^* \sum_{j=0}^{k-1}(I - AA^*)^j z_{k-j-1} \\
&= A^* w_k + \Bigl[\sum_{j=0}^{k-1}(I - A^*A)^j A^*\Bigr](y^\delta - y^*) \tag{4.53}
\end{align*}
for $k = 0,\ldots,k_*(\delta)$, with
\[
w_k \;=\; -(I - AA^*)^k w + \sum_{j=0}^{k-1}(I - AA^*)^j z_{k-j-1}\,.
\]
Since $\|(I - AA^*)^j\|_{\mathcal{L}(Y,Y)} \le 1$ for all $j$, we conclude
\begin{align*}
\|w_k\|_Y &\le \|w\|_Y + \sum_{j=0}^{k-1} \|z_{k-j-1}\|_Y \;\le\; \|w\|_Y + \frac{9}{2}\,C \sum_{j=0}^{k-1} \|e_{k-j-1}\|_X\,\|A e_{k-j-1}\|_Y \\
&\le \|w\|_Y + \frac{9}{2}\,C\,c_*^2\,\|w\|_Y^2 \sum_{j=0}^{k-1} \frac{1}{\sqrt{k-j}}\,\frac{1}{k-j} \;\le\; \|w\|_Y + \frac{9}{2}\,C\,c_*^2\,\|w\|_Y^2 \sum_{j=1}^{\infty} \frac{1}{j^{3/2}} \\
&\le \|w\|_Y \Bigl[1 + \frac{9}{2}\,C\,c_*^2 \sum_{j=1}^{\infty} \frac{1}{j^{3/2}}\Bigr]
\end{align*}
for all $k = 0,\ldots,k_*(\delta)$. Here we used, without loss of generality, that $\|w\|_Y \le 1$. The bound on the right-hand side is independent of $\delta$.

Similarly, we estimate $\|AA^* w_k\|_Y$. Indeed, with
\[
AA^* w_k \;=\; A e_k - \Bigl[A \sum_{j=0}^{k-1}(I - A^*A)^j A^*\Bigr](y^\delta - y^*)
\]
from (4.53) and
\[
A \sum_{j=0}^{k-1}(I - A^*A)^j A^* \;=\; \sum_{j=0}^{k-1}(I - AA^*)^j AA^* \;=\; -\sum_{j=0}^{k-1}(I - AA^*)^{j+1} + \sum_{j=0}^{k-1}(I - AA^*)^j \;=\; I - (I - AA^*)^k\,,
\]
we estimate
\[
\|AA^* w_k\|_Y \;\le\; \|A e_k\|_Y + \bigl\|I - (I - AA^*)^k\bigr\|_{\mathcal{L}(Y,Y)}\, \|y^\delta - y^*\|_Y \;\le\; \|A e_k\|_Y + \delta
\]
for $k = 0,\ldots,k_*(\delta)$. For $k = k_*(\delta)$ the estimate (4.31) and the stopping rule imply that
\[
\|AA^* w_{k_*(\delta)}\|_Y \;\le\; (1+\eta)\,\|K(x_{k_*(\delta)}^\delta) - y^*\|_Y + \delta \;\le\; (1+\eta)\bigl(\|K(x_{k_*(\delta)}^\delta) - y^\delta\|_Y + \delta\bigr) + \delta \;\le\; \bigl[(1+\eta)(1+r) + 1\bigr]\,\delta\,.
\]
This implies
\[
\|A^* w_{k_*(\delta)}\|_X^2 \;=\; \bigl(A^* w_{k_*(\delta)},\, A^* w_{k_*(\delta)}\bigr)_X \;=\; \bigl(AA^* w_{k_*(\delta)},\, w_{k_*(\delta)}\bigr)_Y \;\le\; \|AA^* w_{k_*(\delta)}\|_Y\, \|w_{k_*(\delta)}\|_Y \;\le\; c_1\,\delta\,\|w\|_Y
\]
with some constant $c_1$ which is independent of $\delta$ and $w$. Now we go back to (4.53) and get, using part (c) of Corollary 4.43 and (4.51) (in the case $k_*(\delta) \ge 1$, otherwise directly),
\[
\|e_{k_*(\delta)}\|_X \;\le\; \sqrt{c_1\,\delta\,\|w\|_Y} + \Bigl\|\sum_{j=0}^{k_*(\delta)-1}(I - A^*A)^j A^*\Bigr\|_{\mathcal{L}(Y,X)}\,\delta
\;\le\; \sqrt{c_1\,\delta\,\|w\|_Y} + \sqrt{k_*(\delta)}\,\delta
\;\le\; \sqrt{c_1\,\delta\,\|w\|_Y} + \sqrt{\frac{2(1-2\eta)}{\eta(5-4\eta)}\,\|w\|_Y\,\delta} \;\le\; c\,\|w\|_Y^{1/2}\,\delta^{1/2}\,.
\]
This, finally, ends the proof. □
The Landweber iteration is only one member of the large class of iterative regularization methods. In particular, Newton-type methods, combined with various forms of regularization, have been investigated in the past and are the subject of current research. We refer to the monograph [149] by Kaltenbacher, Neubauer, and Scherzer.
4.4 Problems
4.1 Let $K : X \supset \mathcal{D}(K) \to Y$ be a mapping with $\|\tilde{x} - x\|_X \le c\,\|K(\tilde{x}) - K(x)\|_Y$ for all $x, \tilde{x} \in B(x^*,\rho)$, where $c$ is independent of $\tilde{x}$ and $x$. Set $y^* = K(x^*)$. Show that the equation $K(x) = y^*$ is not locally ill-posed in the sense of Definition 4.1.

4.2 Define the integral operator (autoconvolution) $K$ from $L^2(0,1)$ into itself by
\[
K(x)(t) \;=\; \int_0^t x(t-s)\,x(s)\,ds\,, \qquad t \in (0,1)\,.
\]
(a) Show that $K$ is well-defined; that is, that $K(x) \in L^2(0,1)$ for every $x \in L^2(0,1)$. Remark: $K$ is even well-defined as a mapping from $L^2(0,1)$ into $C[0,1]$. One may try to prove this.
(b) Show that $K(x) = y$ is locally ill-posed in the sense of Definition 4.1 in every $x^* \in L^2(0,1)$ with $x^*(t) \ge 0$ for almost all $t \in (0,1)$. Hint: For any $r > 0$ and $n \in \mathbb{N}$ define $x_n \in L^2(0,1)$ by
\[
x_n(t) \;=\; \begin{cases} x^*(t)\,, & 0 < t < 1 - 1/n\,, \\ x^*(t) + r\sqrt{n}\,, & 1 - 1/n < t < 1\,. \end{cases}
\]

4.3 Show that for every $t \in [0,2]$ there exists $c_t > 0$ such that
\[
\mu^t \;\le\; c_t\,\alpha^{t/2-1}\bigl(\mu^2 + \alpha\bigr) \qquad \text{for all } \mu, \alpha > 0\,.
\]

4.4 Show that $\|K(x^{\alpha(\delta),\delta}) - y^\delta + \alpha w\|_Y = \mathcal{O}(\delta)$ as $\delta \to 0$ for the choice $c_-\,\delta^{2/(\sigma+1)} \le \alpha(\delta) \le c_+\,\delta^{2/(\sigma+1)}$ and $x^{\alpha,\delta}$ and $w$ as in Theorem 4.15.

4.5 (a) Show that $\|u\|_\infty \le \|u'\|_{L^2(0,1)}$ for all $u \in H^1(0,1)$ with $u(0) = 0$.
(b) Show that $H^2(0,1)$ is compactly imbedded in $C[0,1]$.
(c) Show that the solution $u \in H^2(0,1)$ of $u'' = -g$ in $(0,1)$ with $u(0) = u(1) = 0$ is given by $u(t) = \int_0^1 G(t,s)\,g(s)\,ds$, where $G(t,s)$ is defined in Lemma 4.16.

4.6 Let the affine functions $f_n : \mathbb{R}_{\ge 0} \to \mathbb{R}$ for $n \in \mathbb{N}$ be given by $f_n(t) = \gamma_n t + \eta_n$, $t \ge 0$, where $\gamma_n, \eta_n > 0$. Show that the function $\varphi(t) := \inf_{n\in\mathbb{N}} f_n(t)$, $t \ge 0$, is continuous, monotonic, and concave.

4.7 (a) Show that the set $U_\delta$ from Corollary 4.18 is open in $L^2(0,1)$.
(b) Let $K : U_\delta \to H^2(0,1)$ be the operator from Theorem 4.19; that is, $c \mapsto u$ where $u \in H^2(0,1)$ solves the boundary value problem (4.11). Show that this $K$ satisfies the tangential cone condition of Assumption 4.32. Hint: Modify the differential equation (4.15) such that $u - \tilde{u}$ appears on the right-hand side and continue as in the proof of Theorem 4.19.
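Problem 4.2(b) can be explored numerically: discretizing the autoconvolution with a Riemann sum, the perturbations $x_n$ from the hint keep a fixed $L^2$ distance $r$ from $x^*$ while the images $K(x_n)$ approach $K(x^*)$. The grid size and the choice $x^* \equiv 1$ below are our own arbitrary choices:

```python
import numpy as np

def autoconv(x, h):
    # Riemann-sum discretization of K(x)(t) = int_0^t x(t-s) x(s) ds
    return h * np.convolve(x, x)[:len(x)]

N = 2000
h = 1.0 / N
t = np.arange(N) * h
x_star = np.ones(N)                     # x* = 1, nonnegative as required
r = 1.0
dists_x, dists_y = [], []
for n in (10, 100, 1000):
    x_n = x_star.copy()
    x_n[t >= 1 - 1/n - h/2] += r * np.sqrt(n)   # perturbation on (1 - 1/n, 1)
    dists_x.append(np.sqrt(h) * np.linalg.norm(x_n - x_star))
    dists_y.append(np.sqrt(h) * np.linalg.norm(autoconv(x_n, h) - autoconv(x_star, h)))
print(dists_x)   # all approximately r = 1
print(dists_y)   # tends to 0 as n grows: local ill-posedness
```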
Chapter 5

Inverse Eigenvalue Problems

5.1 Introduction
Inverse eigenvalue problems are not only interesting in their own right, but also have important practical applications. We recall the fundamental paper by Kac [148]. Other applications appear in parameter identification problems for parabolic or hyperbolic differential equations, as we study in Section 5.6 for a model problem (see also [167, 187, 255]), or in grating theory [156].

We study the Sturm–Liouville eigenvalue problem in canonical form. The direct problem is to determine the eigenvalues $\lambda$ and the corresponding eigenfunctions $u \neq 0$ such that
\[
-\frac{d^2 u(x)}{dx^2} + q(x)\,u(x) \;=\; \lambda\,u(x)\,, \qquad 0 \le x \le 1\,, \tag{5.1a}
\]
\[
u(0) = 0 \quad \text{and} \quad h\,u'(1) + H\,u(1) = 0\,, \tag{5.1b}
\]
where $q \in L^2(0,1)$ and $h, H \in \mathbb{R}$ with $h^2 + H^2 > 0$ are given. In this chapter, we assume that all functions are real-valued. In some applications, e.g., in grating theory, complex-valued functions $q$ are also of practical importance. Essentially, all of the results of this chapter hold also for complex-valued $q$ and are proven mainly by the same arguments. We refer to the remarks at the end of each section.

The eigenvalue problem (5.1a), (5.1b) is a special case of the more general eigenvalue problem to determine $\rho \in \mathbb{R}$ and nonvanishing $w$ such that
\[
\frac{d}{dt}\Bigl(p(t)\,\frac{dw(t)}{dt}\Bigr) + \bigl(\rho\,r(t) - g(t)\bigr)\,w(t) \;=\; 0\,, \qquad t \in [a,b]\,, \tag{5.2a}
\]
\[
\alpha_a\,w'(a) + \beta_a\,w(a) = 0\,, \qquad \alpha_b\,w'(b) + \beta_b\,w(b) = 0\,. \tag{5.2b}
\]

© Springer Nature Switzerland AG 2021. A. Kirsch, An Introduction to the Mathematical Theory of Inverse Problems, Applied Mathematical Sciences 120, https://doi.org/10.1007/978-3-030-63343-1_5
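The Dirichlet case $h = 0$ of (5.1a), (5.1b), i.e. $u(0) = u(1) = 0$, is easy to solve approximately by finite differences. The sketch below (grid size and sample potential are arbitrary choices) also illustrates the classical eigenvalue asymptotics $\lambda_n \approx (n\pi)^2 + \int_0^1 q(x)\,dx$, which is the kind of formula proved in Section 5.3:

```python
import numpy as np

def dirichlet_eigenvalues(q, N=500):
    """Eigenvalues of -u'' + q u = lambda u, u(0) = u(1) = 0, by central differences."""
    h = 1.0 / N
    x = np.linspace(h, 1 - h, N - 1)                 # interior grid points
    A = (np.diag(2.0 / h**2 + q(x))
         + np.diag(-np.ones(N - 2) / h**2, 1)
         + np.diag(-np.ones(N - 2) / h**2, -1))
    return np.linalg.eigvalsh(A)                     # ascending order

lam0 = dirichlet_eigenvalues(lambda x: np.zeros_like(x))
print(lam0[:3])                                      # approx. pi^2, 4 pi^2, 9 pi^2

lam = dirichlet_eigenvalues(lambda x: 10.0 + np.sin(2 * np.pi * x))
print(lam[5] - lam0[5])                              # approx. int_0^1 q dx = 10
```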
Here $p$, $r$, and $g$ are given functions with $p(t) > 0$ and $r(t) > 0$ for $t \in [a,b]$, and $\alpha_a, \alpha_b, \beta_a, \beta_b \in \mathbb{R}$ are constants with $\alpha_a^2 + \beta_a^2 > 0$ and $\alpha_b^2 + \beta_b^2 > 0$. If we assume, however, that $g \in C[a,b]$ and $p, r \in C^2[a,b]$, then the Liouville transformation reduces the eigenvalue problem (5.2a), (5.2b) to the canonical form (5.1a), (5.1b). In particular, we define
\[
\sigma(t) := \sqrt{\frac{r(t)}{p(t)}}\,, \qquad f(t) := \bigl(p(t)\,r(t)\bigr)^{1/4}\,, \qquad L := \int_a^b \sigma(s)\,ds\,, \tag{5.3}
\]
the monotonic function $x : [a,b] \to [0,1]$ by
\[
x(t) := \frac{1}{L}\int_a^t \sigma(s)\,ds\,, \qquad t \in [a,b]\,, \tag{5.4}
\]
and the new function $u : [0,1] \to \mathbb{R}$ by $u(x) := f\bigl(t(x)\bigr)\,w\bigl(t(x)\bigr)$, $x \in [0,1]$, where $t = t(x)$ denotes the inverse of $x = x(t)$. Elementary calculations show that $u$ satisfies the differential equation (5.1a) with $\lambda = L^2\rho$ and
\[
q(x) \;=\; L^2 \left[\frac{g(t)}{r(t)} + \frac{f(t)}{r(t)}\,\Bigl(\frac{p(t)\,f'(t)}{f(t)^2}\Bigr)'\,\right]_{t=t(x)}\,. \tag{5.5}
\]
Also, it is easily checked that the boundary conditions (5.2b) are mapped into the boundary conditions
\[
h_0\,u'(0) + H_0\,u(0) = 0 \quad \text{and} \quad h_1\,u'(1) + H_1\,u(1) = 0 \tag{5.6}
\]
with $h_0 = \alpha_a\,\sigma(a)/\bigl(L\,f(a)\bigr)$ and $H_0 = \beta_a/f(a) - \alpha_a\,f'(a)/f(a)^2$ and, analogously, $h_1$, $H_1$ with $a$ replaced by $b$.

In this chapter, we restrict ourselves to the study of the canonical Sturm–Liouville eigenvalue problem (5.1a), (5.1b). In the first part, we study the case $h = 0$ in some detail. At the end of Section 5.3, we briefly discuss the case where $h = 1$. In Section 5.3, we prove that there exists a countable number of eigenvalues $\lambda_n$ of this problem and also prove an asymptotic formula. Because $q$ is real-valued, the problem is self-adjoint, and the existence of a countable number of eigenvalues follows from the general spectral theorem of functional analysis (see Appendix A.6, Theorem A.53). Because this general theorem provides only the information that the eigenvalues tend to infinity, we need other tools to obtain more information about the rate of convergence. The basic ingredient in the proof of the asymptotic formula is the asymptotic behavior of the fundamental system of the differential equation (5.1a) as $\lambda$ tends to infinity. Although all of the data and the eigenvalues are real-valued, we use results from complex analysis, in particular Rouché's theorem. This makes it necessary to allow the parameter $\lambda$ in the fundamental system to be complex-valued. The existence of a fundamental solution and its asymptotics is the subject of the next section.
Section 5.5 is devoted to the corresponding inverse problem: given the eigenvalues λ_n, determine the function q. In Section 5.6, we demonstrate how inverse spectral problems arise in a parameter identification problem for a parabolic initial value problem. Section 5.7, finally, studies numerical procedures for recovering q that have been suggested by Rundell and others (see [186, 233, 234]). We finish this section with a "negative" result, as seen in Example 5.1.

Example 5.1 Let λ be an eigenvalue and u a corresponding eigenfunction of

\[ -u''(x) + q(x)\,u(x) = \lambda\,u(x), \quad 0 < x < 1, \qquad u(0) = 0, \ u(1) = 0. \]

Then λ is also an eigenvalue with corresponding eigenfunction v(x) := u(1 − x) of the eigenvalue problem

\[ -v''(x) + \tilde q(x)\,v(x) = \lambda\,v(x), \quad 0 < x < 1, \qquad v(0) = 0, \ v(1) = 0, \]

where q̃(x) := q(1 − x). This example shows that it is generally impossible to recover the function q unless more information is available. We will see that q can be recovered uniquely, provided we know that it is an even function with respect to 1/2 or if we know a second spectrum; that is, a spectrum for a boundary condition different from u(1) = 0.
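The non-uniqueness in Example 5.1 is easy to observe numerically. The following is a minimal sketch (our own illustration, not from the text): a standard second-order finite-difference discretization of −u'' + qu = λu with Dirichlet conditions, applied to a potential q and to its reflection q(1 − ·). The two discrete spectra coincide, since the reflected matrix is permutation-similar to the original one.

```python
import numpy as np

def dirichlet_spectrum(q, n=200, k=5):
    """First k Dirichlet eigenvalues of -u'' + q u = lam u on (0,1),
    approximated by a second-order finite-difference discretization."""
    h = 1.0 / n
    x = np.linspace(h, 1 - h, n - 1)          # interior grid points
    main = 2.0 / h**2 + q(x)                  # diagonal of -d^2/dx^2 + q
    off = -np.ones(n - 2) / h**2
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    return np.sort(np.linalg.eigvalsh(A))[:k]

q = lambda x: np.exp(np.sin(2 * np.pi * x))
q_reflected = lambda x: q(1 - x)
lam = dirichlet_spectrum(q)
lam_refl = dirichlet_spectrum(q_reflected)
print(np.max(np.abs(lam - lam_refl)))   # essentially zero: the spectra coincide
```

The grid size n = 200 and the number k of eigenvalues are arbitrary choices for the illustration.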
5.2 Construction of a Fundamental System
It is well-known from the theory of linear ordinary differential equations that the following initial value problems are uniquely solvable for every fixed (real- or complex-valued) q ∈ C[0, 1] and every given λ ∈ C:

\[ -u_1'' + q(x)\,u_1 = \lambda\,u_1, \ 0 < x < 1, \qquad u_1(0) = 1, \ u_1'(0) = 0, \tag{5.7a} \]

\[ -u_2'' + q(x)\,u_2 = \lambda\,u_2, \ 0 < x < 1, \qquad u_2(0) = 0, \ u_2'(0) = 1. \tag{5.7b} \]

Uniqueness and existence for q ∈ L²(0, 1) is shown in Theorem 5.4 below. The set of functions {u₁, u₂} is called a fundamental system of the differential equation −u'' + qu = λu in (0, 1). The functions u₁ and u₂ are linearly independent because the Wronskian determinant is one:

\[ [u_1, u_2] := \det\begin{pmatrix} u_1 & u_2 \\ u_1' & u_2' \end{pmatrix} = u_1 u_2' - u_1' u_2 = 1. \tag{5.8} \]

This is seen from

\[ \frac{d}{dx}[u_1, u_2] = u_1 u_2'' - u_1'' u_2 = u_1 (q - \lambda) u_2 - u_2 (q - \lambda) u_1 = 0 \]

and [u₁, u₂](0) = 1. The functions u₁ and u₂ depend on λ and q. We express this dependence often by u_j = u_j(·, λ, q), j = 1, 2. For q ∈ L²(0, 1), the solution
is not twice continuously differentiable anymore but is only an element of the Sobolev space

\[ H^2(0,1) := \Bigl\{ u \in C^1[0,1] : u'(x) = \alpha + \int_0^x v(t)\,dt,\ \alpha \in \mathbb{C},\ v \in L^2(0,1) \Bigr\}, \]

see (1.24). We write u'' for v and observe that u'' ∈ L²(0, 1). The most important example is when q = 0. In this case, we can solve (5.7a) and (5.7b) explicitly and have the following:

Example 5.2 Let q = 0. Then the solutions of (5.7a) and (5.7b) are given by

\[ u_1(x, \lambda, 0) = \cos(\sqrt{\lambda}\,x) \quad\text{and}\quad u_2(x, \lambda, 0) = \frac{\sin(\sqrt{\lambda}\,x)}{\sqrt{\lambda}}, \tag{5.9} \]

respectively. An arbitrary branch of the square root can be taken because s ↦ cos(sx) and s ↦ sin(sx)/s are even functions.

We will see that the fundamental system for any function q ∈ L²(0, 1) behaves as (5.9) as λ tends to infinity. For the proof of the next theorem, we need the following technical lemma.

Lemma 5.3 Let q ∈ L²(0, 1) and k, k̃ ∈ C[0, 1] such that there exists μ > 0 with |k(τ)| ≤ exp(μτ) and |k̃(τ)| ≤ exp(μτ) for all τ ∈ [0, 1]. Let K, K̃ : C[0, 1] → C[0, 1] be the Volterra integral operators with kernels k(x − t) q(t) and k̃(x − t) q(t), respectively; that is,

\[ (K\varphi)(x) = \int_0^x k(x-t)\,q(t)\,\varphi(t)\,dt, \qquad 0 \le x \le 1, \]

and analogously for K̃. Then the following estimate holds:

\[ \bigl|(\tilde K K^{n-1}\varphi)(x)\bigr| \;\le\; \|\varphi\|_\infty\, \frac{1}{n!}\,\hat q(x)^n\, e^{\mu x}, \qquad 0 \le x \le 1, \tag{5.10} \]

for all φ ∈ C[0, 1] and all n ∈ N. Here, q̂(x) := ∫₀ˣ |q(t)| dt. If φ ∈ C[0, 1] satisfies also the estimate |φ(τ)| ≤ exp(μτ) for all τ ∈ [0, 1], then we have

\[ \bigl|(\tilde K K^{n-1}\varphi)(x)\bigr| \;\le\; \frac{1}{n!}\,\hat q(x)^n\, e^{\mu x}, \qquad 0 \le x \le 1, \tag{5.11} \]

for all n ∈ N.

Proof: We prove the estimates by induction with respect to n.
For n = 1, we estimate

\[ \bigl|(\tilde K\varphi)(x)\bigr| = \Bigl| \int_0^x \tilde k(x-t)\,q(t)\,\varphi(t)\,dt \Bigr| \le \|\varphi\|_\infty \int_0^x e^{\mu(x-t)}\,|q(t)|\,dt \le \|\varphi\|_\infty\, e^{\mu x}\, \hat q(x). \]

Now we assume the validity of (5.10) for n. Because it holds also for K̃ = K, we estimate

\[ \bigl|(\tilde K K^{n}\varphi)(x)\bigr| \le \int_0^x e^{\mu(x-t)}\,|q(t)|\,\bigl|(K^{n}\varphi)(t)\bigr|\,dt \le \|\varphi\|_\infty\, \frac{1}{n!}\, e^{\mu x} \int_0^x |q(t)|\,\hat q(t)^n\,dt. \]

We compute the last integral by

\[ \int_0^x |q(t)|\,\hat q(t)^n\,dt = \int_0^x \hat q'(t)\,\hat q(t)^n\,dt = \frac{1}{n+1}\int_0^x \frac{d}{dt}\,\hat q(t)^{n+1}\,dt = \frac{1}{n+1}\,\hat q(x)^{n+1}. \]

This proves the estimate (5.10) for n + 1. For estimate (5.11), we only change the initial step n = 1 into

\[ \bigl|(\tilde K\varphi)(x)\bigr| \le \int_0^x e^{\mu(x-t)}\, e^{\mu t}\,|q(t)|\,dt \le e^{\mu x}\,\hat q(x). \]

The remaining part is proven by the same arguments.
Now we prove the equivalence of the initial value problems for u_j, j = 1, 2, to Volterra integral equations.

Theorem 5.4 Let q ∈ L²(0, 1) and λ ∈ C. Then we have

(a) u₁, u₂ ∈ H²(0, 1) are solutions of (5.7a) and (5.7b), respectively, if and only if u₁, u₂ ∈ C[0, 1] solve the Volterra integral equations

\[ u_1(x) = \cos(\sqrt{\lambda}\,x) + \int_0^x \frac{\sin\bigl(\sqrt{\lambda}(x-t)\bigr)}{\sqrt{\lambda}}\, q(t)\, u_1(t)\,dt, \tag{5.12a} \]

\[ u_2(x) = \frac{\sin(\sqrt{\lambda}\,x)}{\sqrt{\lambda}} + \int_0^x \frac{\sin\bigl(\sqrt{\lambda}(x-t)\bigr)}{\sqrt{\lambda}}\, q(t)\, u_2(t)\,dt, \tag{5.12b} \]

respectively, for 0 ≤ x ≤ 1.
(b) The integral equations (5.12a) and (5.12b) and the initial value problems (5.7a) and (5.7b) are uniquely solvable. The solutions can be represented by a Neumann series. Let K denote the integral operator

\[ (K\varphi)(x) := \int_0^x \frac{\sin\bigl(\sqrt{\lambda}(x-t)\bigr)}{\sqrt{\lambda}}\, q(t)\,\varphi(t)\,dt, \qquad x \in [0,1], \tag{5.13} \]

and define

\[ C(x) := \cos(\sqrt{\lambda}\,x) \quad\text{and}\quad S(x) := \frac{\sin(\sqrt{\lambda}\,x)}{\sqrt{\lambda}}. \tag{5.14} \]

Then

\[ u_1 = \sum_{n=0}^\infty K^n C \quad\text{and}\quad u_2 = \sum_{n=0}^\infty K^n S. \tag{5.15} \]

The series converge uniformly with respect to (x, λ, q) ∈ [0, 1] × Λ × Q for all bounded sets Λ ⊂ C and Q ⊂ L²(0, 1).

Proof: (a) We use the following version of partial integration for f, g ∈ H²(0, 1):

\[ \int_a^b \bigl( f''(t)\,g(t) - f(t)\,g''(t) \bigr)\,dt = \bigl[ f'(t)\,g(t) - f(t)\,g'(t) \bigr]_a^b. \tag{5.16} \]

We restrict ourselves to the proof for u₁. Let u₁ be a solution of (5.7a). Then

\[ \int_0^x S(x-t)\,q(t)\,u_1(t)\,dt = \int_0^x S(x-t)\,\bigl( \lambda\,u_1(t) + u_1''(t) \bigr)\,dt \]

\[ = \underbrace{\int_0^x u_1(t)\,\bigl( \lambda\,S(x-t) + S''(x-t) \bigr)\,dt}_{=0} + \bigl[ u_1(t)\,S'(x-t) + u_1'(t)\,S(x-t) \bigr]_{t=0}^{t=x} = u_1(x) - \cos\sqrt{\lambda}\,x. \]

On the other hand, let u₁ ∈ C[0, 1] be a solution of the integral equation (5.12a). The operator A : L²(0, 1) → L²(0, 1) defined by (Aφ)(x) = ∫₀ˣ S(x − t) φ(t) dt, x ∈ (0, 1), is bounded, and it is easily seen that (Aφ)'' + λ(Aφ) = φ for φ ∈ C[0, 1]. Therefore, A is even bounded from L²(0, 1) into H²(0, 1). Writing (5.12a) in the form u₁ = C + A(qu₁) yields that u₁ ∈ H²(0, 1) and u₁'' = −λC + qu₁ − λA(qu₁) = qu₁ − λu₁. This proves the assertion because the initial conditions are obviously satisfied.

(b) We observe that all of the functions k(τ) = cos(√λ τ), k(τ) = sin(√λ τ), and k(τ) = sin(√λ τ)/√λ for τ ∈ [0, 1] satisfy the estimate |k(τ)| ≤ exp(μτ)
with μ = |Im √λ|. This is obvious for the first two functions. For the third, it follows from

\[ \Bigl| \frac{\sin(\sqrt{\lambda}\,\tau)}{\sqrt{\lambda}} \Bigr| \le \int_0^\tau \bigl|\cos(\sqrt{\lambda}\,s)\bigr|\,ds \le \int_0^\tau \cosh(\mu s)\,ds \le \cosh(\mu\tau) \le e^{\mu\tau}. \]

We have to study the integral operator K with kernel k(x − t) q(t), where k(τ) = sin(√λ τ)/√λ. We apply Lemma 5.3 with K̃ = K. Estimate (5.10) yields

\[ \|K^n\|_\infty \le \frac{\hat q(1)^n}{n!}\, e^{\mu} < 1 \]

for sufficiently large n uniformly for q ∈ Q and λ ∈ Λ. Therefore, the Neumann series converges (see Appendix A.3, Theorem A.31), and part (b) is proven.

The integral representation of the previous theorem yields the following asymptotic behavior of the fundamental system by comparing the case for arbitrary q with the case of q = 0.

Theorem 5.5 Let q ∈ L²(0, 1), λ ∈ C, and u₁, u₂ be the fundamental system; that is, the solutions of the initial value problems (5.7a) and (5.7b), respectively. Then we have for all x ∈ [0, 1]:

\[ \bigl| u_1(x) - \cos(\sqrt{\lambda}\,x) \bigr| \le \frac{1}{|\sqrt{\lambda}|}\, \exp\Bigl( |\mathrm{Im}\,\sqrt{\lambda}|\,x + \int_0^x |q(t)|\,dt \Bigr), \tag{5.17a} \]

\[ \Bigl| u_2(x) - \frac{\sin(\sqrt{\lambda}\,x)}{\sqrt{\lambda}} \Bigr| \le \frac{1}{|\lambda|}\, \exp\Bigl( |\mathrm{Im}\,\sqrt{\lambda}|\,x + \int_0^x |q(t)|\,dt \Bigr), \tag{5.17b} \]

\[ \bigl| u_1'(x) + \sqrt{\lambda}\,\sin(\sqrt{\lambda}\,x) \bigr| \le \exp\Bigl( |\mathrm{Im}\,\sqrt{\lambda}|\,x + \int_0^x |q(t)|\,dt \Bigr), \tag{5.17c} \]

\[ \bigl| u_2'(x) - \cos(\sqrt{\lambda}\,x) \bigr| \le \frac{1}{|\sqrt{\lambda}|}\, \exp\Bigl( |\mathrm{Im}\,\sqrt{\lambda}|\,x + \int_0^x |q(t)|\,dt \Bigr). \tag{5.17d} \]

Proof: Again, we use the Neumann series and define C(τ) := cos(√λ τ) and S(τ) := sin(√λ τ)/√λ. Let K be the integral operator with kernel q(t) sin(√λ(x − t))/√λ. Then

\[ \bigl| u_1(x) - \cos(\sqrt{\lambda}\,x) \bigr| \le \sum_{n=1}^\infty \bigl|(K^n C)(x)\bigr|. \]
Now we set k̃(τ) = sin(√λ τ) and k(τ) = sin(√λ τ)/√λ and denote by K̃ and K the Volterra integral operators with kernels k̃(x − t) q(t) and k(x − t) q(t), respectively. Then Kⁿ = (1/√λ) K̃ K^{n−1} and, by Lemma 5.3, part (b), we conclude that

\[ \bigl|(K^n C)(x)\bigr| \le \frac{1}{|\sqrt{\lambda}|\; n!} \Bigl( \int_0^x |q(t)|\,dt \Bigr)^{n} \exp\bigl( |\mathrm{Im}\,\sqrt{\lambda}|\,x \bigr) \]

for n ≥ 1. Summation now yields the desired estimate:

\[ \bigl| u_1(x) - \cos(\sqrt{\lambda}\,x) \bigr| \le \frac{1}{|\sqrt{\lambda}|}\, \exp\Bigl( |\mathrm{Im}\,\sqrt{\lambda}|\,x + \int_0^x |q(t)|\,dt \Bigr). \]

Because |S(x)| ≤ (1/|√λ|) exp(|Im √λ| x), the same arguments prove the estimate (5.17b). Differentiation of the integral equations (5.12a) and (5.12b) yields

\[ u_1'(x) + \sqrt{\lambda}\,\sin(\sqrt{\lambda}\,x) = \int_0^x \cos\bigl(\sqrt{\lambda}(x-t)\bigr)\, q(t)\, u_1(t)\,dt, \]

\[ u_2'(x) - \cos(\sqrt{\lambda}\,x) = \int_0^x \cos\bigl(\sqrt{\lambda}(x-t)\bigr)\, q(t)\, u_2(t)\,dt. \]

With K as before and K̃ defined as the operator with kernel q(t) cos(√λ(x − t)), we then have

\[ u_1'(x) + \sqrt{\lambda}\,\sin(\sqrt{\lambda}\,x) = \tilde K \sum_{n=0}^\infty K^n C, \qquad u_2'(x) - \cos(\sqrt{\lambda}\,x) = \tilde K \sum_{n=0}^\infty K^n S, \]

and we use Lemma 5.3, estimate (5.11), again. Summation yields the estimates (5.17c) and (5.17d).

In the next section, we need the fact that the eigenfunctions are continuously differentiable with respect to q and λ. We remind the reader of the concept of Fréchet differentiability (F-differentiability) of an operator between Banach spaces X and Y (see Appendix A.7, Definition A.60). Here we consider the mapping (λ, q) ↦ u_j(·, λ, q) from C × L²(0, 1) into C[0, 1] for j = 1, 2. We denote these mappings by u_j again and prove the following theorem:

Theorem 5.6 Let u_j : C × L²(0, 1) → C[0, 1], j = 1, 2, be the solution operator of (5.7a) and (5.7b), respectively. Then we have the following:

(a) u_j is continuous.
(b) u_j is continuously F-differentiable for every (λ̂, q̂) ∈ C × L²(0, 1) with partial derivatives

\[ \frac{\partial}{\partial\lambda}\, u_j(\cdot, \hat\lambda, \hat q) = u_{j,\lambda}(\cdot, \hat\lambda, \hat q) \tag{5.18a} \]

and

\[ \frac{\partial}{\partial q}\, u_j(\cdot, \hat\lambda, \hat q)\,(q) = u_{j,q}(\cdot, \hat\lambda, \hat q), \tag{5.18b} \]

where u_{j,λ}(·, λ̂, q̂) and u_{j,q}(·, λ̂, q̂) are solutions of the following initial value problems for j = 1, 2:

\[ -u_{j,\lambda}'' + (\hat q - \hat\lambda)\, u_{j,\lambda} = u_j(\cdot, \hat\lambda, \hat q) \ \text{in } (0,1), \qquad u_{j,\lambda}(0) = 0, \ u_{j,\lambda}'(0) = 0, \tag{5.19} \]

\[ -u_{j,q}'' + (\hat q - \hat\lambda)\, u_{j,q} = -q\, u_j(\cdot, \hat\lambda, \hat q) \ \text{in } (0,1), \qquad u_{j,q}(0) = 0, \ u_{j,q}'(0) = 0. \]

(c) Furthermore, for all x ∈ [0, 1] we have

\[ \int_0^x u_j(t)^2\,dt = [u_{j,\lambda}, u_j](x), \qquad j = 1, 2, \tag{5.20a} \]

\[ \int_0^x u_1(t)\,u_2(t)\,dt = [u_{1,\lambda}, u_2](x) = [u_{2,\lambda}, u_1](x), \tag{5.20b} \]

\[ -\int_0^x q(t)\,u_j(t)^2\,dt = [u_{j,q}, u_j](x), \qquad j = 1, 2, \tag{5.20c} \]

\[ -\int_0^x q(t)\,u_1(t)\,u_2(t)\,dt = [u_{1,q}, u_2](x) = [u_{2,q}, u_1](x), \tag{5.20d} \]

where [u, v] denotes the Wronskian determinant from (5.8).

Proof: (a), (b): Continuity and differentiability of u_j follow from the integral equations (5.12a) and (5.12b) because the kernel and the right-hand sides depend continuously and differentiably on λ and q. It remains to show the representation of the derivatives in (b). Let u = u_j, j = 1 or 2. Then

\[ -u''(\cdot, \hat\lambda + \varepsilon) + (\hat q - \hat\lambda - \varepsilon)\, u(\cdot, \hat\lambda + \varepsilon) = 0, \qquad -u''(\cdot, \hat\lambda) + (\hat q - \hat\lambda)\, u(\cdot, \hat\lambda) = 0; \]

thus

\[ -\frac{1}{\varepsilon}\bigl( u(\cdot, \hat\lambda + \varepsilon) - u(\cdot, \hat\lambda) \bigr)'' + (\hat q - \hat\lambda)\, \frac{1}{\varepsilon}\bigl( u(\cdot, \hat\lambda + \varepsilon) - u(\cdot, \hat\lambda) \bigr) = u(\cdot, \hat\lambda + \varepsilon). \]
Furthermore, the homogeneous initial conditions are satisfied for the difference quotient. The right-hand side converges uniformly to u(·, λ̂) as ε → 0. Therefore, the difference quotient converges to u_λ uniformly in x. The same arguments yield the result for the derivative with respect to q.

(c) Multiplication of the differential equation for u_{j,λ} by u_j and of the differential equation for u_j by u_{j,λ} and subtraction yields

\[ u_j(x)^2 = u_{j,\lambda}(x)\, u_j''(x) - u_j(x)\, u_{j,\lambda}''(x) = \frac{d}{dx}\bigl( u_{j,\lambda}(x)\, u_j'(x) - u_{j,\lambda}'(x)\, u_j(x) \bigr). \]

Integration of this equation and the homogeneous initial conditions yield the first equation of (5.20a). The proofs for the remaining equations use the same arguments and are left to the reader.

At no place in this section have we used the assumption that q is real-valued. Therefore, the assertions of Theorems 5.4, 5.5, and 5.6 also hold for complex-valued q.
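The Neumann series (5.15) and the estimate (5.17b) are easy to observe numerically. The following Python sketch (our own illustration; the grid size, number of series terms, and sample potential are ad hoc choices, not taken from the text) sums the series for the Volterra equation (5.12b) with a trapezoidal rule and compares u₂ with the free solution sin(√λ x)/√λ; the deviation decays like 1/λ, as (5.17b) predicts.

```python
import numpy as np

def u2_neumann(q_vals, lam, x, n_terms=25):
    """Sum the Neumann series u2 = sum_n K^n S for equation (5.12b).
    K has kernel sin(sqrt(lam)(x - t))/sqrt(lam) * q(t); integrals are
    approximated by the trapezoidal rule on the uniform grid x."""
    s = np.sqrt(lam)
    S = np.sin(s * x) / s
    h = x[1] - x[0]
    u, term = S.copy(), S.copy()
    for _ in range(n_terms):
        new = np.zeros_like(term)
        for i in range(1, len(x)):
            g = np.sin(s * (x[i] - x[: i + 1])) / s * q_vals[: i + 1] * term[: i + 1]
            new[i] = h * (g.sum() - 0.5 * (g[0] + g[-1]))   # trapezoid on [0, x_i]
        term = new
        u = u + term
    return u

x = np.linspace(0.0, 1.0, 201)
q_vals = np.exp(np.sin(2 * np.pi * x))
for lam in (100.0, 10000.0):
    dev = np.max(np.abs(u2_neumann(q_vals, lam, x) - np.sin(np.sqrt(lam) * x) / np.sqrt(lam)))
    print(lam, dev)   # bounded by exp(int |q|)/lam, cf. (5.17b)
```

Only positive real λ is treated here; for complex λ the same code works after switching to complex arithmetic.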
5.3 Asymptotics of the Eigenvalues and Eigenfunctions
We first restrict ourselves to the Dirichlet problem; that is, the eigenvalue problem

\[ -u''(x) + q(x)\,u(x) = \lambda\,u(x), \quad 0 < x < 1, \qquad u(0) = u(1) = 0. \tag{5.21} \]

We refer to the end of this section for different boundary conditions. Again, let q ∈ L²(0, 1) be real-valued. We observe that λ ∈ C is an eigenvalue of this problem if and only if λ is a zero of the function

\[ f(\lambda) := u_2(1, \lambda, q). \]

Again, u₂ = u₂(·, λ, q) denotes the solution of the differential equation −u₂'' + qu₂ = λu₂ in (0, 1) with initial conditions u₂(0) = 0 and u₂'(0) = 1. If u₂(1, λ, q) = 0, then u = u₂(·, λ, q) is an eigenfunction corresponding to the eigenvalue λ, normalized such that u'(0) = 1. There are different ways to normalize the eigenfunctions. Later we will sometimes normalize them such that the L²-norms are one; that is, use g = u/‖u‖_{L²} instead of u. The function f plays exactly the role of the well-known characteristic polynomial for matrices and is, therefore, called the characteristic function of the eigenvalue problem. Theorem 5.6 implies that f is differentiable; that is, analytic in all of C. This observation makes it possible to use tools from complex analysis. First, we summarize well-known facts about eigenvalues and eigenfunctions for the Sturm–Liouville problem.

Theorem 5.7 Let q ∈ L²(0, 1) be real-valued. Then
(a) All eigenvalues λ are real.

(b) There exists a countable number of real eigenvalues λ_j, j ∈ N, which tend to infinity as j → ∞. The corresponding eigenfunctions g_j ∈ C[0, 1], normalized by ‖g_j‖_{L²} = 1, form a complete orthonormal system in L²(0, 1).

(c) The geometric and algebraic multiplicities of the eigenvalues λ_j are one; that is, the eigenspaces are one-dimensional and the zeros of the characteristic function f are simple.

(d) Let the eigenvalues be ordered as λ₁ < λ₂ < λ₃ < ···. The eigenfunction g_j corresponding to λ_j has exactly j − 1 zeros in (0, 1).

(e) Let q be even with respect to 1/2; that is, q(1 − x) = q(x) for all x ∈ [0, 1]. Then g_j is even with respect to 1/2 for odd j and odd with respect to 1/2 for even j.

Proof: (a) and (b) follow from the fact that the boundary value problem is self-adjoint. We refer to Problem 5.1 for a repetition of the proof (see also Theorems A.52 and A.53).

(c) Let λ be an eigenvalue and u, v be two corresponding eigenfunctions. Choose α, β with α² + β² > 0 such that α u'(0) = β v'(0). The function w := αu − βv solves the differential equation and w(0) = w'(0) = 0; that is, w vanishes identically. Therefore, u and v are linearly dependent. We apply Theorem 5.6, part (c), to show that λ is a simple zero of f. Because u₂(1, λ, q) = 0, we have from (5.20a) for j = 2 that

\[ f'(\lambda) = \frac{\partial}{\partial\lambda}\, u_2(1, \lambda, q) = u_{2,\lambda}(1, \lambda, q) = \frac{1}{u_2'(1, \lambda, q)} \int_0^1 u_2(x, \lambda, q)^2\,dx \ne 0. \tag{5.22} \]

This proves part (c).

(d) First we note that every g_j has only a finite number of zeros. Otherwise, they would accumulate at some point x ∈ [0, 1], and it is not difficult to show that g_j and also g_j' vanish at x. This would imply that g_j vanishes identically. We fix j ∈ N and define the function h : [0, 1] × [0, 1] → R by h(t, x) = u_j(x; tq). Here, u_j(·; tq) is the jth eigenfunction corresponding to tq instead of q, normalized such that u_j'(0) = 1. Then h is continuously differentiable, h(t, 0) = h(t, 1) = 0, and every zero of h(t, ·) is simple. This holds for every t. By part (a) of Lemma 5.8 below, the number of zeros of h(t, ·) is constant with respect to t. Therefore, u_j(·, q) = h(1, ·) has exactly the same number of zeros as h(0, x) = u_j(x, 0) = √2 sin(jπx), which is j − 1. The normalization to g_j = u_j(·, q)/‖u_j(·, q)‖_{L²} does not change the number of zeros.

(e) Again, it is sufficient to prove that v_j = u₂(·, λ_j, q) is even (or odd) for odd (or even) j. First, we note that also ṽ_j(x) := v_j(1 − x) is an eigenfunction.
Since the eigenspace is one-dimensional, we conclude that there exists ρ_j ∈ R with ṽ_j(x) = v_j(1 − x) = ρ_j v_j(x) for all x. For x = 1/2 this implies that (1 − ρ_j) v_j(1/2) = 0 and also, by differentiation, (1 + ρ_j) v_j'(1/2) = 0 for all j. Since v_j(1/2) and v_j'(1/2) cannot vanish simultaneously, we conclude that ρ_j ∈ {+1, −1} and even ρ_j = ρ_j v_j'(0) = −v_j'(1). From (5.22), we conclude that sign f'(λ_j) = sign v_j'(1) = −sign ρ_j. Since the λ_j are the subsequent zeros of f, we conclude that f'(λ_j) f'(λ_{j+1}) < 0 for all j; that is, ρ_j = σ(−1)^{j+1}, j ∈ N, for some σ ∈ {+1, −1}. The first eigenfunction v₁ has no zero by part (d), which yields σ = 1. This ends the proof.

The first part of the following technical result has been used in the previous proof; the second part will be needed below.

Lemma 5.8 (a) Let h : [0, 1] × [0, 1] → R be continuously differentiable such that h(t, ·) has finitely many zeros in [0, 1] and all are simple for every t ∈ [0, 1]. Then the number m(t) of zeros of h(t, ·) is constant with respect to t.

(b) Let z ∈ C with |z − nπ| ≥ π/4 for all n ∈ Z. Then exp(|Im z|) < 4 |sin z|.

Proof: (a) It suffices to show that t ↦ m(t) is continuous. Fix t̂ ∈ [0, 1] and let x̂_j, j = 1, ..., m(t̂), be the zeros of h(t̂, ·). Because ∂h(t̂, x̂_j)/∂x ≠ 0, there exist intervals T = (t̂ − δ, t̂ + δ) ∩ [0, 1] and J_j = (x̂_j − δ, x̂_j + δ) ∩ [0, 1] with ∂h(t, x)/∂x ≠ 0 for all t ∈ T and x ∈ ⋃_j J_j. Therefore, for every t ∈ T, the function h(t, ·) has at most one zero in every J_j. On the other hand, by the implicit function theorem (applicable because ∂h(t̂, x̂_j)/∂x ≠ 0), for every t ∈ T there exists at least one zero of h(t, ·) in every J_j, where T and the J_j are possibly made smaller. Outside of ⋃_j J_j there are no zeros of h(t̂, ·), and thus, by making perhaps T smaller again, no zeros of h(t, ·) either, for all t ∈ T. This shows that m(t) = m(t̂) for all t ∈ T; hence t ↦ m(t) is continuous, thus constant.
(b) Let ψ(z) = exp(|z₂|)/sin z for z = z₁ + iz₂, z₁, z₂ ∈ R with z₁ ∉ {nπ : n ∈ Z}. We consider two cases:

1st case: |z₂| > (ln 2)/2. Then

\[ |\psi(z)| = \frac{2\, e^{|z_2|}}{\bigl| e^{i z_1 - z_2} - e^{-i z_1 + z_2} \bigr|} \le \frac{2\, e^{|z_2|}}{e^{|z_2|} - e^{-|z_2|}} = \frac{2}{1 - e^{-2|z_2|}} < 4 \]

because exp(−2|z₂|) < 1/2.

2nd case: |z₂| ≤ (ln 2)/2. From |z − nπ| ≥ π/4 for all n, we conclude that |z₁ − nπ|² ≥ π²/16 − z₂² ≥ π²/16 − (ln 2)²/4 ≥ π²/64; thus |sin z₁| ≥ sin(π/8). With |Re sin z| = |sin z₁| |cosh z₂| ≥ |sin z₁|, we conclude that

\[ |\psi(z)| \le \frac{e^{|z_2|}}{|\mathrm{Re}\,\sin z|} \le \frac{\sqrt{2}}{|\sin z_1|} \le \frac{\sqrt{2}}{\sin\frac{\pi}{8}} < 4. \]
Now we prove the "counting lemma," a first crude asymptotic formula for the eigenvalues. As the essential tool in the proof, we use the theorem of Rouché from complex analysis (see [2]), which we state for the convenience of the reader: Let U ⊂ C be a domain and the functions F and G be analytic in C with |F(z) − G(z)| < |G(z)| for all z ∈ ∂U. Then F and G have the same number of zeros in U.

Lemma 5.9 Let q ∈ L²(0, 1) and N > 2 exp(‖q‖_{L¹}) be an integer. Then

(a) The characteristic function f(λ) := u₂(1, λ, q) has exactly N zeros in the half-plane

\[ H := \bigl\{ \lambda \in \mathbb{C} : \mathrm{Re}\,\lambda < (N + 1/2)^2 \pi^2 \bigr\}. \tag{5.23} \]

(b) For every m > N there exists exactly one zero of f in the set

\[ U_m := \bigl\{ \lambda \in \mathbb{C} : |\sqrt{\lambda} - m\pi| < \pi/2 \bigr\}. \tag{5.24} \]

Here we take the branch with Re √λ ≥ 0.

(c) There are no other zeros of f in C.

Proof: We are going to apply Rouché's theorem to the function F(z) = f(z²) = u₂(1, z², q) and the corresponding function G of the eigenvalue problem for q = 0; that is, G(z) := sin z/z. For U, we take one of the sets W_m or V_R defined by

\[ W_m := \{ z \in \mathbb{C} : |z - m\pi| < \pi/2 \}, \qquad V_R := \{ z \in \mathbb{C} : |\mathrm{Re}\,z| < (N + 1/2)\pi, \ |\mathrm{Im}\,z| < R \} \]

for fixed R > (N + 1/2)π, and want to apply Lemma 5.8:

(i) First let z ∈ ∂W_m: For n ∈ Z, n ≠ m, we have |z − nπ| ≥ |m − n|π − |z − mπ| ≥ π − π/2 > π/4. For n = m, we observe that |z − mπ| = π/2 > π/4. Therefore, we can apply Lemma 5.8 for z ∈ ∂W_m. Furthermore, we note the estimate |z| ≥ mπ − |z − mπ| = (m − 1/2)π > Nπ > 2N for all z ∈ ∂W_m.

(ii) Let z ∈ ∂V_R, n ∈ Z. Then |Re z| = (N + 1/2)π or |Im z| = R. In either case, we estimate |z − nπ|² = (Re z − nπ)² + (Im z)² ≥ π²/4 > π²/16. Therefore, we can apply Lemma 5.8 for z ∈ ∂V_R. Furthermore, we have the estimate |z| ≥ (N + 1/2)π > 2N for all z ∈ ∂V_R.

Application of Theorem 5.5 and Lemma 5.8 yields the following estimate for all z ∈ ∂V_R ∪ ∂W_m:

\[ \Bigl| F(z) - \frac{\sin z}{z} \Bigr| \le \frac{1}{|z|^2}\, \exp\bigl( |\mathrm{Im}\,z| + \|q\|_{L^1} \bigr) \le \frac{4\,|\sin z|}{|z|^2}\cdot\frac{N}{2} = \frac{2N}{|z|}\, \Bigl| \frac{\sin z}{z} \Bigr| < \Bigl| \frac{\sin z}{z} \Bigr|. \]

Therefore, F and G(z) := sin z/z have the same number of zeros in V_R and every W_m. Because the zeros of G are ±nπ, n = 1, 2, ..., we conclude that G
has exactly 2N zeros in V_R and exactly one zero in every W_m. By the theorem of Rouché, this also holds for F.

Now we show that F has no zero outside of V_R ∪ ⋃_{m>N} W_m. Again, we apply Lemma 5.8: Let z ∉ V_R ∪ ⋃_{m>N} W_m. From z ∉ V_R, we conclude that |z| = √((Re z)² + (Im z)²) ≥ (N + 1/2)π. For n > N, we have that |z − nπ| > π/2 because z ∉ W_n. For n ≤ N, we conclude that |z − nπ| ≥ |z| − nπ ≥ (N + 1/2 − n)π ≥ π/2. We apply Theorem 5.5 and Lemma 5.8 again and use the second triangle inequality. This yields

\[ |F(z)| \ge \Bigl| \frac{\sin z}{z} \Bigr| - \frac{1}{|z|^2}\, \exp\bigl( |\mathrm{Im}\,z| + \|q\|_{L^1} \bigr) \ge \Bigl| \frac{\sin z}{z} \Bigr| \Bigl( 1 - \frac{4\exp\|q\|_{L^1}}{|z|} \Bigr) \ge \Bigl| \frac{\sin z}{z} \Bigr| \Bigl( 1 - \frac{2N}{|z|} \Bigr) > 0 \]

because |z| ≥ (N + 1/2)π > 2N. Therefore, we have shown that f has exactly one zero in every U_m, m > N, and N zeros in the set

\[ H_R := \bigl\{ \lambda \in \mathbb{C} : 0 < \mathrm{Re}\,\sqrt{\lambda} < (N + 1/2)\pi, \ |\mathrm{Im}\,\sqrt{\lambda}| < R \bigr\} \]

and no other zeros. It remains to show that H_R ⊂ H. For λ = |λ| exp(iθ) ∈ H_R, we conclude that Re √λ = √|λ| cos(θ/2) < (N + 1/2)π; thus Re λ = |λ| cos θ = |λ|(2cos²(θ/2) − 1) ≤ |λ| cos²(θ/2) < (N + 1/2)²π².

This lemma proves again the existence of infinitely many eigenvalues. The arguments are not changed for the case of complex-valued functions q. In this case, the general spectral theory is not applicable anymore because the boundary value problem is not self-adjoint. This lemma also provides more information about the eigenvalue distribution, even for the real-valued case. First, we order the eigenvalues in the form λ₁ < λ₂ < λ₃ < ···. Lemma 5.9 implies that

\[ \sqrt{\lambda_n} = n\pi + O(1); \quad\text{that is,}\quad \lambda_n = n^2\pi^2 + O(n). \tag{5.25} \]
For the treatment of the inverse problem, it is necessary to improve this formula. It is our aim to prove that

\[ \lambda_n = n^2\pi^2 + \int_0^1 q(t)\,dt + \tilde\lambda_n, \quad\text{where}\quad \sum_{n=1}^\infty |\tilde\lambda_n|^2 < \infty. \tag{5.26} \]

There are several methods to prove (5.26). We follow the treatment in [218]. The key is to apply the fundamental theorem of calculus to the function t ↦ λ_n(tq)
for t ∈ [0, 1], thus connecting the eigenvalues λ_n corresponding to q with the eigenvalues n²π² corresponding to q = 0 by the parameter t. For this approach, we need the differentiability of the eigenvalues with respect to q. For fixed n ∈ N, the function q ↦ λ_n(q) from L²(0, 1) into C is well-defined and Fréchet differentiable by the following theorem.

Theorem 5.10 For every n ∈ N, the mapping q ↦ λ_n(q) from L²(0, 1) into C is continuously Fréchet differentiable for every q̂ ∈ L²(0, 1) and

\[ \lambda_n'(\hat q)\,q = \int_0^1 g_n(x, \hat q)^2\, q(x)\,dx, \qquad q \in L^2(0,1). \tag{5.27} \]

Here,

\[ g_n(x, \hat q) := \frac{u_2(x, \hat\lambda_n, \hat q)}{\bigl\| u_2(\cdot, \hat\lambda_n, \hat q) \bigr\|_{L^2}} \]

denotes the L²-normalized eigenfunction corresponding to λ̂_n := λ_n(q̂). Note that the integral is well-defined because u₂(·, λ̂_n, q̂) ∈ H²(0, 1) ⊂ C[0, 1].

Proof: We observe that u₂(1, λ̂_n, q̂) = 0 and apply the implicit function theorem to the equation

\[ u_2(1, \lambda, q) = 0 \]

in a neighborhood of (λ̂_n, q̂). This is possible because the zero λ̂_n of u₂(1, ·, q̂) is simple by Lemma 5.7. The implicit function theorem (see Appendix A.7, Theorem A.66) yields the existence of a unique function λ_n = λ_n(q) such that u₂(1, λ_n(q), q) = 0 for all q in a neighborhood of q̂; we know this already. But it also implies that the function λ_n is continuously differentiable with respect to q and

\[ 0 = \frac{\partial}{\partial\lambda}\, u_2(1, \hat\lambda_n, \hat q)\,\lambda_n'(\hat q)\,q + \frac{\partial}{\partial q}\, u_2(1, \hat\lambda_n, \hat q)\,q; \]

that is, u_{2,λ}(1) λ_n'(q̂)q + u_{2,q}(1) = 0. With Theorem 5.6, part (c), we conclude that

\[ \lambda_n'(\hat q)\,q = -\frac{u_{2,q}(1)}{u_{2,\lambda}(1)} = -\frac{u_2'(1)\,u_{2,q}(1)}{u_2'(1)\,u_{2,\lambda}(1)} = -\frac{[u_{2,q}, u_2](1)}{[u_{2,\lambda}, u_2](1)} = \frac{\int_0^1 q(x)\,u_2(x)^2\,dx}{\int_0^1 u_2(x)^2\,dx} = \int_0^1 g_n(x, \hat q)^2\, q(x)\,dx, \]

where we have dropped the arguments λ̂_n and q̂.
184
Inverse Eigenvalue Problems
Theorem 5.11 Let Q ⊂ L2 (0, 1) be bounded, q ∈ Q, and λn ∈ C the corresponding eigenvalues. Then we have 2 2
λn = n π
1
1 q(t) dt −
+ 0
q(t) cos(2nπt) dt + O(1/n)
(5.28)
0
for n → ∞ uniformly for q ∈ Q. Furthermore, the corresponding eigenfunctions gn , normalized to gn L2 = 1, have the following asymptotic behavior: √ gn (x) = 2 sin(nπx) + O(1/n) and (5.29a) √ gn (x) = 2 n π cos(nπx) + O(1) (5.29b) as n → ∞ uniformly for x ∈ [0, 1] and q ∈ Q. We observe that the second integral on the righthand side of (5.28) isthe nth Fourier coeﬃcient an of q with respect to cos(2πnt) : n = 0, 1, 2, . . . , . From Fourier theory, it is known that ∞an converges to zero, and even more: Bessel’s inequality (A.7) yields that n=0 an 2 < ∞; that is, (5.26) is satisﬁed. If q is smooth enough, e.g., continuously diﬀerentiable, then an tends to zero faster than 1/n. In that case, this term is absorbed in the O(1/n) expression. Proof: We split the proof into four parts: √ √ (a) First, we show that gn (x) = 2 sin( λn x) √ + O(1/n) uniformly for (x, q) ∈ [0, 1] × Q. By Lemma 5.9, we know that λn = nπ + O(1), and thus by Theorem 5.5 √ sin( λn x) √ + O(1/n2 ) . u2 (x, λn ) = λn 1 With the formula 2 0 sin2 (αt) dt = 1 − sin(2α)/(2α), we compute 1
2
u2 (t, λn ) dt = 0
= =
1 λn
1
sin2
λn t dt + O(1/n3 )
0 √
1 sin(2 λn ) √ 1− + O(1/n3 ) 2λn 2 λn 1 [1 + O(1/n)] . 2λn
Therefore, we have √ u2 (x, λn ) = 2 sin( λn x) + O(1/n) . gn (x) = 1 u (t, λn )2 dt 0 2 (b) Now we show that
√
λn = nπ + O(1/n) and gn (x) =
√ 2 sin(nπx) + O(1/n).
5.3
Asymptotics of the Eigenvalues and Eigenfunctions
185
We apply the fundamental theorem of calculus and use Theorem 5.10 2 2
λn − n π
1 = λn (q) − λn (0) = 1 =
λn (tq)q dt =
0
d λn (tq) dt dt
0 1 1
0
(5.30)
gn (x, tq)2 q(x) dx dt = O(1) .
0
√ This √ yields λn = nπ+O(1/n) and, with part (a), the asymptotic form gn (x) = 2 sin(nπx) + O(1/n). (c) Now the asymptotics of the eigenvalues follow easily from (5.30) by the observation that gn (x, tq)2 = 2 sin2 (nπx) + O(1/n) = 1 − cos(2nπx) + O(1/n) , uniformly for t ∈ [0, 1] and q ∈ Q. (d) Similarly, we have for the derivatives √ √ √ 2 λn cos( λn x) + O(1) u (x, λn ) gn (x) = 2 = 1 1 + O(1/n) u (t, λn )2 dt 0 2 =
√
2 nπ cos(nπx) + O(1) .
Example 5.12 We illustrate Theorem 5.11 by the following two numerical examples: (a) Let q1 (x) = exp sin(2πx) , x ∈ [0, 1]. Then q1 is analytic and periodic with period 1. Plots√of the √ characteristic functions λ → f (λ) for q1 and q = 0; that is, λ → sin λ/ λ are shown in Figure 5.1. 1.4
0.5
1.2
0.4
1
0.3
0.8
0.2
0.6 0.1 0.4 0
0.2
0.1
0
0.2
0.2 0.4 0
2
4
6
8
10
12
14
16
18
20
0.3 0
10
20
30
40
50
60
70
80
90
100
Figure 5.1: Characteristic functions of q, q1 , respectively, on [0, 20] and [5, 100]
(b) Let q₂(x) = −5x for 0 ≤ x ≤ 0.4 and q₂(x) = 4 for 0.4 < x ≤ 1. The function q₂ is not continuous. Plots of the characteristic functions λ ↦ f(λ) for q₂ and q = 0 are shown in Figure 5.2.

[Figure 5.2: Characteristic functions of q, q₂, respectively, on [0, 20] and [5, 100].]

The Fourier coefficients of q₁ converge to zero of exponential order. The following table shows the eigenvalues λ_n corresponding to q₁, the eigenvalues n²π² corresponding to q = 0, and the difference

\[ c_n := \lambda_n - n^2\pi^2 - \int_0^1 q(x)\,dx \]

for n = 1, ..., 10:
      n     λ_n      n²π²      c_n
      1     11.1      9.9    −2.04·10^{−2}
      2     40.9     39.5     1.49·10^{−1}
      3     90.1     88.8     2.73·10^{−3}
      4    159.2    157.9    −1.91·10^{−3}
      5    248.0    246.7     7.74·10^{−4}
      6    356.6    354.3     4.58·10^{−4}
      7    484.9    483.6     4.58·10^{−4}
      8    632.9    631.7     4.07·10^{−4}
      9    800.7    799.4     3.90·10^{−4}
     10    988.2    987.0     3.83·10^{−4}

We clearly observe the rapid convergence. Because q₂ is not continuous, the Fourier coefficients converge to zero only slowly. Again, we list the eigenvalues λ_n for q₂, the eigenvalues n²π² corresponding to q = 0, and the differences

\[ c_n := \lambda_n - n^2\pi^2 - \int_0^1 q(x)\,dx \quad\text{and}\quad d_n := \lambda_n - n^2\pi^2 - \int_0^1 q(x)\,dx + \int_0^1 q(x)\cos(2\pi n x)\,dx \]
for n = 1, ..., 10:

      n     λ_n      n²π²      c_n             d_n
      1     12.1      9.9     1.86·10^{−1}   −1.46·10^{−1}
      2     41.1     39.5    −3.87·10^{−1}    8.86·10^{−2}
      3     91.1     88.8     3.14·10^{−1}    2.13·10^{−2}
      4    159.8    157.9     1.61·10^{−1}   −6.70·10^{−3}
      5    248.8    246.7     2.07·10^{−2}    2.07·10^{−2}
      6    357.4    354.3     8.29·10^{−2}   −4.24·10^{−3}
      7    484.5    483.6    −1.25·10^{−1}    6.17·10^{−3}
      8    633.8    631.7     1.16·10^{−1}    3.91·10^{−3}
      9    801.4    799.4    −6.66·10^{−2}   −1.38·10^{−3}
     10    989.0    987.0     5.43·10^{−3}    5.43·10^{−3}
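The different decay rates in the two tables reflect the Fourier cosine coefficients a_n = ∫₀¹ q(t) cos(2πnt) dt: they decay exponentially for the analytic periodic q₁ but only slowly for the discontinuous q₂. A small sketch (our own illustration, using a plain trapezoidal quadrature) makes this visible:

```python
import math

def cos_coeff(q, n, m=4000):
    """a_n = int_0^1 q(t) cos(2 pi n t) dt by the composite trapezoidal rule."""
    h = 1.0 / m
    s = 0.5 * (q(0.0) + q(1.0) * math.cos(2 * math.pi * n))
    for k in range(1, m):
        t = k * h
        s += q(t) * math.cos(2 * math.pi * n * t)
    return s * h

q1 = lambda t: math.exp(math.sin(2 * math.pi * t))       # analytic, 1-periodic
q2 = lambda t: -5.0 * t if t <= 0.4 else 4.0             # jump at t = 0.4

for n in (2, 4, 8):
    print(n, cos_coeff(q1, n), cos_coeff(q2, n))
```

The coefficients of q₁ are already negligible at n = 8, while those of q₂ remain of the order 10⁻¹ to 10⁻²; this is exactly why the d_n column for q₂ improves on c_n only slowly.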
Now we sketch the modifications necessary for Sturm–Liouville eigenvalue problems of the type

\[ -u''(x) + q(x)\,u(x) = \lambda\,u(x), \quad 0 < x < 1, \tag{5.31a} \]

\[ u(0) = 0, \qquad u'(1) + H\,u(1) = 0. \tag{5.31b} \]

Now the eigenvalues are zeros of the characteristic function

\[ f(\lambda) = u_2'(1, \lambda, q) + H\,u_2(1, \lambda, q), \qquad \lambda \in \mathbb{C}. \tag{5.32} \]

For the special case where q = 0, we have u₂(x, λ, 0) = sin(√λ x)/√λ. The characteristic function for this case is then given by

\[ g(\lambda) = \cos\sqrt{\lambda} + H\,\frac{\sin\sqrt{\lambda}}{\sqrt{\lambda}}. \]

The zeros of f for q = 0 and H = 0 are λ_n = (n + 1/2)²π², n = 0, 1, 2, .... If H ≠ 0, one has to solve the transcendental equation z cot z + H = 0. One can show (see Problem 5.2) by an application of the implicit function theorem in R² that the eigenvalues for q = 0 behave as

\[ \lambda_n = (n + 1/2)^2\pi^2 + 2H + O(1/n). \]

Lemma 5.7 is also valid because the boundary value problem is again self-adjoint. The Counting Lemma 5.9 now takes the following form:

Lemma 5.13 Let q ∈ L²(0, 1) and N > 2 exp(‖q‖_{L¹})(1 + |H|) be an integer. Then we have

(a) The mapping f(λ) := u₂'(1, λ, q) + H u₂(1, λ, q) has exactly N zeros in the half-plane H := {λ ∈ C : Re λ < N²π²}.
(b) f has exactly one zero in every set

\[ U_m := \bigl\{ \lambda \in \mathbb{C} : |\sqrt{\lambda} - (m - 1/2)\pi| < \pi/2 \bigr\} \]

provided m > N.

(c) There are no other zeros of f in C.

For the proof, we refer to Problem 5.3.

We can apply the implicit function theorem to the equation u₂'(1, λ_n(q), q) + H u₂(1, λ_n(q), q) = 0 because the zeros are again simple. Differentiating this equation with respect to q yields

\[ \bigl( u_{2,\lambda}'(1, \hat\lambda_n, \hat q) + H\,u_{2,\lambda}(1, \hat\lambda_n, \hat q) \bigr)\,\lambda_n'(\hat q)\,q + u_{2,q}'(1, \hat\lambda_n, \hat q) + H\,u_{2,q}(1, \hat\lambda_n, \hat q) = 0. \]

Theorem 5.6 yields

\[ \int_0^1 u_2(t)^2\,dt = u_{2,\lambda}(1)\,\underbrace{u_2'(1)}_{=-H u_2(1)} - u_{2,\lambda}'(1)\,u_2(1) = -u_2(1)\,\bigl( u_{2,\lambda}'(1) + H\,u_{2,\lambda}(1) \bigr), \]

where again we have dropped the arguments λ̂_n and q̂. Analogously, we compute

\[ -\int_0^1 q(t)\,u_2(t)^2\,dt = -u_2(1)\,\bigl( u_{2,q}'(1) + H\,u_{2,q}(1) \bigr) \]

and thus

\[ \lambda_n'(\hat q)\,q = -\frac{u_{2,q}'(1) + H\,u_{2,q}(1)}{u_{2,\lambda}'(1) + H\,u_{2,\lambda}(1)} = \frac{\int_0^1 q(t)\,u_2(t)^2\,dt}{\int_0^1 u_2(t)^2\,dt}. \]
1 n+ 2
2
2
1
1 q(t) dt −
π + 2H + 0
q(t) cos(2n + 1)πt dt + O(1/n) (5.33) 0
5.4
Some Hyperbolic Problems
189
as n tends to inﬁnity, uniformly in q ∈ Q. For the L2 normalized eigenfunctions, we have √ 2 sin(n + 1/2)πx + O(1/n) and (5.34a) gn (x) = √ 2 (n + 1/2) π cos(n + 1/2)πx + O(1) (5.34b) gn (x) = uniformly for x ∈ [0, 1] and q ∈ Q. As mentioned at the beginning of this section, there are other ways to prove the asymptotic formulas for the eigenvalues and eigenfunctions that avoid Lemma 5.9 and the diﬀerentiability of λn with respect to q. But the proof in, e.g., [276], seems to yield only the asymptotic behavior λn = m2n π 2 +
1 q(t) dt + O(1/n) 0
instead of (5.28). Here, (mn ) denotes some sequence of natural numbers. Before we turn to the inverse problem, we make some remarks concerning the case where q is complexvalued. Now the eigenvalue problems are no longer selfadjoint, and the general spectral theory is not applicable anymore. With respect to Lemma 5.7, it is still easy to show that the eigenfunctions corresponding to diﬀerent eigenvalues are linearly independent and that the geometric multiplicities are still one. The Counting Lemma 5.9 is valid without restrictions. From this, we observe also that the algebraic multiplicities of λn are one, at least for n > N . Thus, the remaining arguments of this section are valid if we restrict ourselves to the eigenvalues λn with n > N . Therefore, the asymptotic formulas (5.28), (5.29a), (5.29b), (5.33), (5.34a), and (5.34b) hold equally well for complexvalued q.
5.4 Some Hyperbolic Problems
As a preparation for the following sections, in particular Sections 5.5 and 5.7, we study some initial value problems for the two-dimensional linear hyperbolic partial differential equation

$$\frac{\partial^2 W(x,t)}{\partial x^2} - \frac{\partial^2 W(x,t)}{\partial t^2} + a(x,t)\, W(x,t) = 0\,,$$

where the coefficient $a$ has the special form $a(x,t) = p(t) - q(x)$. It is well-known that the method of characteristics reduces initial value problems to Volterra integral equations of the second kind, which can be studied in spaces of continuous functions. This approach naturally leads to solution concepts for nonsmooth coefficients and boundary data. We summarize the results in three theorems. In each of them, we formulate first the results for the case of smooth coefficients and then for the nonsmooth case. We remark that it is not our aim to
relax the solution concept to the weakest possible case but rather to relax the assumptions only to the extent that is needed in Sections 5.5 and 5.7 and in Subsection 7.6.3. Although most of the problems (at least for smooth data) are subjects of elementary courses on partial differential equations, we include the complete proofs for the convenience of the reader.

Before we begin with the statements of the theorems, we recall some function spaces:

$$C_{00}[0,1] := \bigl\{ f \in C[0,1] : f(0) = 0 \bigr\}\,,$$
$$H^1(0,1) := \Bigl\{ f \in C[0,1] : f(x) = \alpha + \int_0^x g(t)\, dt,\ \alpha \in \mathbb{R},\ g \in L^2(0,1) \Bigr\}\,,$$
$$H^1_{00}(0,1) := H^1(0,1) \cap C_{00}[0,1]\,,$$

and equip them with their canonical norms

$$\|f\|_\infty := \max_{0 \le x \le 1} |f(x)| \ \text{ in } C_{00}[0,1]\,, \qquad \|f\|_{H^1} := \sqrt{\|f\|_{L^2}^2 + \|f'\|_{L^2}^2} \ \text{ in } H^1(0,1) \text{ and } H^1_{00}(0,1)\,.$$

The notations $C_{00}[0,1]$ and $H^1_{00}(0,1)$ should indicate that the boundary condition is set only at $x = 0$. By $\|\cdot\|_{C^j}$ for $j \ge 1$ we denote the canonical norm in $C^j[0,1]$. Furthermore, we define the triangular regions $\Delta_0 \subset \mathbb{R}^2$ and $\Delta \subset \mathbb{R}^2$ by

$$\Delta_0 := \bigl\{ (x,t) \in \mathbb{R}^2 : 0 < t < x < 1 \bigr\}\,, \tag{5.35a}$$
$$\Delta := \bigl\{ (x,t) \in \mathbb{R}^2 : |t| < x < 1 \bigr\}\,, \tag{5.35b}$$
respectively. We begin with an initial value problem, sometimes called the Goursat problem.

Theorem 5.15 (a) Let $p, q \in C[0,1]$ and $f \in C^2[0,1]$ with $f(0) = 0$. Then there exists a unique solution $W \in C^2(\overline{\Delta_0})$ of the following hyperbolic initial value problem:

$$\frac{\partial^2 W(x,t)}{\partial x^2} - \frac{\partial^2 W(x,t)}{\partial t^2} + \bigl(p(t) - q(x)\bigr)\, W(x,t) = 0 \quad\text{in } \Delta_0\,, \tag{5.36a}$$
$$W(x,x) = f(x)\,, \quad 0 \le x \le 1\,, \tag{5.36b}$$
$$W(x,0) = 0\,, \quad 0 \le x \le 1\,. \tag{5.36c}$$

(b) The solution operator $(p,q,f) \mapsto W$ has an extension to a bounded operator from $L^2(0,1) \times L^2(0,1) \times C_{00}[0,1]$ into $C(\overline{\Delta_0})$.

(c) The operator $(p,q,f) \mapsto \bigl(W(1,\cdot),\, W_x(1,\cdot)\bigr)$ has an extension to a bounded operator from $L^2(0,1) \times L^2(0,1) \times H^1_{00}(0,1)$ into $H^1(0,1) \times L^2(0,1)$. Here and in the following, we denote by $W_x$ the partial derivative with respect to $x$.
Proof: (a) First, we extend the problem to the larger region $\Delta$ and study the problem

$$\frac{\partial^2 W(x,t)}{\partial x^2} - \frac{\partial^2 W(x,t)}{\partial t^2} + a(x,t)\, W(x,t) = 0 \quad\text{in } \Delta\,, \tag{5.37a}$$
$$W(x,x) = f(x)\,, \quad 0 \le x \le 1\,, \tag{5.37b}$$
$$W(x,-x) = -f(x)\,, \quad 0 \le x \le 1\,, \tag{5.37c}$$

where we have extended $p(t) - q(x)$ to $a(x,t) := p(|t|) - q(x)$ for $(x,t) \in \Delta$. To treat problem (5.37a)–(5.37c), we make the change of variables

$$x = \xi + \eta\,, \qquad t = \xi - \eta\,.$$

Then $(x,t) \in \Delta$ if and only if $(\xi,\eta) \in D$, where

$$D := \bigl\{ (\xi,\eta) \in (0,1) \times (0,1) : \xi + \eta < 1 \bigr\}\,. \tag{5.38}$$

We set $w(\xi,\eta) := W(\xi+\eta,\, \xi-\eta)$ for $(\xi,\eta) \in D$. Then $W$ solves problem (5.37a)–(5.37c) if and only if $w$ solves the hyperbolic problem

$$\frac{\partial^2 w(\xi,\eta)}{\partial\xi\, \partial\eta} = \underbrace{-a(\xi+\eta,\, \xi-\eta)}_{=:\ \tilde a(\xi,\eta)}\, w(\xi,\eta)\,, \quad (\xi,\eta) \in D\,, \tag{5.39a}$$
$$w(\xi,0) = f(\xi) \quad\text{for } \xi \in [0,1]\,, \tag{5.39b}$$
$$w(0,\eta) = -f(\eta) \quad\text{for } \eta \in [0,1]\,. \tag{5.39c}$$

Now let $w$ be a solution of (5.39a)–(5.39c). We integrate the differential equation twice and use the initial conditions. Then $w$ solves the integral equation

$$w(\xi,\eta) = \int_0^\eta \int_0^\xi \tilde a(\xi',\eta')\, w(\xi',\eta')\, d\xi'\, d\eta' - f(\eta) + f(\xi) \tag{5.40}$$

for $(\xi,\eta) \in D$. This is a Volterra integral equation in two dimensions. We use the standard method to solve this equation by successive iteration in $C(\overline D)$, where we assume only $p, q \in L^2(0,1)$, and thus $\tilde a \in L^2(D)$. Let $A$ be the Volterra integral operator defined by the integral on the right-hand side of (5.40). By induction with respect to $n \in \mathbb{N}$, it can easily be seen (compare with the proof of Theorem 5.18 below for a similar, but more complicated estimate) that

$$\bigl| (A^n w)(\xi,\eta) \bigr| \;\le\; \|w\|_\infty\, \|\tilde a\|_{L^2}^n\, \frac{(\xi\eta)^{n/2}}{n!}\,, \qquad n = 1, 2, \dots;$$

thus $\|A^n w\|_\infty \le \|w\|_\infty\, \|\tilde a\|_{L^2}^n\, \frac{1}{n!}$. Therefore, $\|A^n\|_{\mathcal L(C(\overline D))} < 1$ for sufficiently large $n$, and the Neumann series converges (see Appendix A.3, Theorem A.31).
This proves that there exists a unique solution $w \in C(\overline D)$ of (5.40). From our arguments, uniqueness also holds for (5.37a)–(5.37c).

Now we prove that the solution $w \in C(\overline D)$ is even in $C^2(\overline D)$. Obviously, from (5.40) and the differentiability of $f$, we conclude that $w$ is differentiable with partial derivative (with respect to $\xi$; the derivative with respect to $\eta$ is seen analogously)

$$w_\xi(\xi,\eta) = \int_0^\eta \bigl[ q(\xi+\eta') - p(\xi-\eta') \bigr]\, w(\xi,\eta')\, d\eta' + f'(\xi)$$
$$= \int_\xi^{\xi+\eta} q(s)\, w(\xi,\, s-\xi)\, ds - \int_{\xi-\eta}^{\xi} p(s)\, w(\xi,\, \xi-s)\, ds + f'(\xi)\,.$$
This second form can be differentiated again. Thus $w \in C^2(\overline D)$, and we have shown that $W$ is the unique solution of (5.37a)–(5.37c). Because $a(x,\cdot)$ is an even function and the initial data are odd functions with respect to $t$, we conclude from the uniqueness result that the solution $W(x,\cdot)$ is also odd. In particular, this implies that $W(x,0) = 0$ for all $x \in [0,1]$, which proves that $W$ solves problem (5.36a)–(5.36c) and finishes part (a).

Part (b) follows immediately from the integral equation (5.40) because the integral operator $A : C(\overline D) \to C(\overline D)$ depends continuously on the kernel $\tilde a \in L^2(D)$.

For part (c), we observe that

$$W(1,\, 2\xi-1) = w(\xi,\, 1-\xi)\,, \quad\text{thus}$$
$$\frac{d}{d\xi}\, W(1,\, 2\xi-1) = w_\xi(\xi,\, 1-\xi) - w_\eta(\xi,\, 1-\xi) \quad\text{and}$$
$$W_x(1,\, 2\xi-1) = \frac12\, w_\xi(\xi,\, 1-\xi) + \frac12\, w_\eta(\xi,\, 1-\xi)\,.$$

We have computed $w_\xi$ already above and have

$$w_\xi(\xi,\, 1-\xi) = \int_\xi^1 q(s)\, w(\xi,\, s-\xi)\, ds - \int_{2\xi-1}^{\xi} p(s)\, w(\xi,\, \xi-s)\, ds + f'(\xi)\,,$$

which is in $L^2(0,1)$ for $p, q \in L^2(0,1)$ and $f \in H^1_{00}(0,1)$. An analogous formula holds for $w_\eta$ and shows that $W(1,\cdot) \in H^1(0,1)$ and $W_x(1,\cdot) \in L^2(0,1)$ for $p, q \in L^2(0,1)$ and $f \in H^1_{00}(0,1)$. This ends the proof.
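The successive-approximation argument of the proof is easy to run numerically. The following sketch (our own discretization: uniform grid, trapezoidal quadrature; $p$, $q$, $f$ are arbitrary sample data) iterates (5.40) and checks the antisymmetry $w(\xi,\eta) = -w(\eta,\xi)$, which is exactly the oddness of $W(x,\cdot)$ used at the end of part (a):

```python
import numpy as np

N = 100                                  # grid intervals, step h = 1/N
h = 1.0 / N
xi = np.linspace(0.0, 1.0, N + 1)
XI, ETA = np.meshgrid(xi, xi, indexing="ij")

# sample data (our choice): p, q continuous, f with f(0) = 0
p = lambda s: np.cos(3.0 * s)
q = lambda s: 1.0 + s
f = lambda s: s ** 2

# kernel a~(xi,eta) = q(xi+eta) - p(|xi-eta|); values with xi+eta > 1 never
# influence the triangle D, but must be finite, so the argument is clipped
A = q(np.clip(XI + ETA, 0.0, 1.0)) - p(np.abs(XI - ETA))

def cumtrap(g, axis):
    """Cumulative trapezoidal integral from 0 along one axis."""
    mid = 0.5 * (np.take(g, range(1, N + 1), axis) + np.take(g, range(N), axis))
    out = np.zeros_like(g)
    sl = [slice(None), slice(None)]
    sl[axis] = slice(1, None)
    out[tuple(sl)] = np.cumsum(mid, axis=axis) * h
    return out

rhs = f(XI) - f(ETA)                     # the data term f(xi) - f(eta) of (5.40)
w = rhs.copy()
for k in range(40):                      # successive approximation (Neumann series)
    w_new = cumtrap(cumtrap(A * w, 0), 1) + rhs
    diff = float(np.max(np.abs(w_new - w)))
    w = w_new
    if diff < 1e-14:
        break

asym = float(np.max(np.abs(w + w.T)))    # w(xi,eta) = -w(eta,xi) <=> W(x,.) odd
print("iterations:", k + 1, "last update:", diff, "antisymmetry defect:", asym)
```

The iterates contract at the factorial rate of the estimate above, and the boundary condition $w(\xi,0) = f(\xi)$ holds exactly by construction of the right-hand side.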
Remark 5.16 (a) If $p, q \in L^2(0,1)$ and $f \in C[0,1]$ with $f(0) = 0$, we call the solution $W \in C(\overline{\Delta_0})$, given by

$$W(x,t) = w\Bigl( \tfrac12 (x+t),\ \tfrac12 (x-t) \Bigr)\,,$$
where $w \in C(\overline D)$ solves the integral equation (5.40), the weak solution of the Goursat problem (5.36a)–(5.36c). We observe that for every weak solution $W$ there exist sequences $(p_n)$, $(q_n)$ in $C[0,1]$ and $(f_n)$ in $C^2[0,1]$ with $f_n(0) = 0$ and $\|p_n - p\|_{L^2} \to 0$, $\|q_n - q\|_{L^2} \to 0$, and $\|f_n - f\|_\infty \to 0$ such that the solutions $W_n \in C^2(\overline{\Delta_0})$ of (5.36a)–(5.36c) corresponding to $p_n$, $q_n$, and $f_n$ converge uniformly to $W$.

(b) We observe from the integral equation (5.40) that $w$ has a decomposition $w(\xi,\eta) = w_1(\xi,\eta) - f(\eta) + f(\xi)$ with $w_1 \in C^1(\overline D)$, even if only $p, q \in L^2(0,1)$. This transforms into $W(x,t) = W_1(x,t) - f\bigl(\tfrac12(x-t)\bigr) + f\bigl(\tfrac12(x+t)\bigr)$ with $W_1 \in C^1(\overline\Delta)$. For the special case $p = q = 0$, the integral equation (5.40) reduces to the well-known solution formula

$$W(x,t) = f\Bigl( \tfrac12 (x+t) \Bigr) - f\Bigl( \tfrac12 (x-t) \Bigr)\,.$$

The next theorem studies a Cauchy problem for the same hyperbolic differential equation.

Theorem 5.17 (a) Let $f \in C^2[0,1]$, $g \in C^1[0,1]$ with $f(0) = f'(0) = g(0) = 0$, and $p, q \in C[0,1]$ and $F \in C^1(\overline{\Delta_0})$. Then there exists a unique solution $W \in C^2(\overline{\Delta_0})$ of the Cauchy problem

$$\frac{\partial^2 W(x,t)}{\partial x^2} - \frac{\partial^2 W(x,t)}{\partial t^2} + \bigl(p(t) - q(x)\bigr)\, W(x,t) = F(x,t) \quad\text{in } \Delta_0\,, \tag{5.41a}$$
$$W(1,t) = f(t) \quad\text{for } 0 \le t \le 1\,, \tag{5.41b}$$
$$\frac{\partial W}{\partial x}(1,t) = g(t) \quad\text{for } 0 \le t \le 1\,. \tag{5.41c}$$
(b) Furthermore, the solution operator $(p,q,F,f,g) \mapsto W$ has an extension to a bounded operator from $L^2(0,1) \times L^2(0,1) \times L^2(\Delta_0) \times H^1_{00}(0,1) \times L^2(0,1)$ into $C(\overline{\Delta_0})$.

Proof: As in the proof of Theorem 5.15, we set $a(x,t) := p(|t|) - q(x)$ for $(x,t) \in \Delta$ and extend $F$ to an even function on $\Delta$ by $F(x,-t) = F(x,t)$ for $(x,t) \in \Delta_0$. We also extend $f$ and $g$ to odd functions on $[-1,1]$ by $f(-t) = -f(t)$ and $g(-t) = -g(t)$ for $t \in [0,1]$. Then $F \in C^1\bigl(\overline\Delta \setminus ([0,1] \times \{0\})\bigr) \cap C(\overline\Delta)$, $f \in C^2[-1,1]$, and $g \in C^1[-1,1]$. We again make the change of variables

$$x = \xi + \eta\,, \qquad t = \xi - \eta\,, \qquad w(\xi,\eta) = W(\xi+\eta,\, \xi-\eta) \quad\text{for } (\xi,\eta) \in D\,,$$

where $D$ is given by (5.38). Then $W$ solves (5.41a)–(5.41c) if and only if $w$ solves

$$\frac{\partial^2 w(\xi,\eta)}{\partial\xi\, \partial\eta} = \tilde a(\xi,\eta)\, w(\xi,\eta) + \tilde F(\xi,\eta)\,, \quad (\xi,\eta) \in D\,,$$
where $\tilde F(\xi,\eta) = F(\xi+\eta,\, \xi-\eta)$ and $\tilde a(\xi,\eta) = -a(\xi+\eta,\, \xi-\eta) = q(\xi+\eta) - p(\xi-\eta)$. The Cauchy conditions (5.41b) and (5.41c) transform into

$$w(\xi,\, 1-\xi) = f(2\xi-1) \qquad\text{and}\qquad w_\xi(\xi,\, 1-\xi) + w_\eta(\xi,\, 1-\xi) = 2\, g(2\xi-1)$$

for $0 \le \xi \le 1$. Differentiating the first equation and solving for $w_\xi$ and $w_\eta$ yields

$$w_\xi(\xi,\, 1-\xi) = g(2\xi-1) + f'(2\xi-1) \qquad\text{and}\qquad w_\eta(\xi,\, 1-\xi) = g(2\xi-1) - f'(2\xi-1)$$

for $0 \le \xi \le 1$. Integration of the differential equation with respect to $\xi$ from $\xi$ to $1-\eta$ yields

$$\frac{\partial w(\xi,\eta)}{\partial\eta} = -\int_\xi^{1-\eta} \bigl[ \tilde a(\xi',\eta)\, w(\xi',\eta) + \tilde F(\xi',\eta) \bigr]\, d\xi' + g(1-2\eta) - f'(1-2\eta)\,.$$

Now we integrate this equation with respect to $\eta$ from $\eta$ to $1-\xi$ and arrive at

$$w(\xi,\eta) = \int_\eta^{1-\xi} \int_\xi^{1-\eta'} \bigl[ \tilde a(\xi',\eta')\, w(\xi',\eta') + \tilde F(\xi',\eta') \bigr]\, d\xi'\, d\eta' - \int_\eta^{1-\xi} g(1-2\eta')\, d\eta' + \frac12\, f(2\xi-1) + \frac12\, f(1-2\eta) \tag{5.42}$$

for $(\xi,\eta) \in D$. This is again a Volterra integral equation in two variables. The solution $w \in C(\overline D)$ is in $C^2(\overline D)$. Indeed, since it is obviously differentiable, we take the derivative with respect to $\eta$ and arrive at the formula above and, after substitution of $\tilde a(\xi',\eta)$ and making the changes of variables $s = \xi' - \eta$ and $s = \xi' + \eta$, respectively, at the representation

$$\frac{\partial w(\xi,\eta)}{\partial\eta} = \int_{\xi-\eta}^{1-2\eta} p(s)\, w(s+\eta,\, \eta)\, ds - \int_{\xi+\eta}^{1} q(s)\, w(s-\eta,\, \eta)\, ds - \int_\xi^{1-\eta} \tilde F(\xi',\eta)\, d\xi' + g(1-2\eta) - f'(1-2\eta)\,.$$

An analogous formula holds for the derivative with respect to $\xi$. We can differentiate again because the function $\psi(\xi,\eta) = \int_\xi^{1-\eta} \tilde F(\xi',\eta)\, d\xi'$ is differentiable in $D$, although $\tilde F$ is not differentiable at the line $\xi = \eta$. The reader should try to prove this. If only $p, q \in L^2(0,1)$, $F \in L^2(\Delta_0)$, $f \in H^1_{00}(0,1)$, and $g \in L^2(0,1)$, then (5.42) defines the weak solution. Let $A$ denote the integral operator

$$(Aw)(\xi,\eta) = \int_\eta^{1-\xi} \int_\xi^{1-\eta'} \tilde a(\xi',\eta')\, w(\xi',\eta')\, d\xi'\, d\eta'\,, \qquad (\xi,\eta) \in D\,.$$
By induction, it is easily seen that

$$\bigl| (A^n w)(\xi,\eta) \bigr| \;\le\; \|w\|_\infty\, \|\tilde a\|_{L^2}^n\, \frac{(1-\xi-\eta)^n}{\sqrt{(2n)!}}$$

for all $(\xi,\eta) \in D$ and $n \in \mathbb{N}$; thus

$$\|A^n w\|_\infty \;\le\; \|w\|_\infty\, \|\tilde a\|_{L^2}^n\, \frac{1}{\sqrt{(2n)!}}$$

for all $n \in \mathbb{N}$. For sufficiently large $n$, we conclude that $\|A^n\|_{\mathcal L(C(\overline D))} < 1$, which again implies that (5.42) is uniquely solvable in $C(\overline D)$ for any $p, q, g \in L^2(0,1)$, $F \in L^2(\Delta_0)$, and $f \in H^1_{00}(0,1)$.

For the special case $p = q = 0$ and $F = 0$, the integral equation (5.42) reduces to the well-known d'Alembert formula

$$W(x,t) = -\frac12 \int_{t-(1-x)}^{t+(1-x)} g(\tau)\, d\tau + \frac12 \Bigl[ f\bigl(t+(1-x)\bigr) + f\bigl(t-(1-x)\bigr) \Bigr]\,.$$
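The d'Alembert formula can be sanity-checked numerically. With the sample data $f(t) = t^3$ and $g(t) = t$ (our choice; both odd, with $f(0) = f'(0) = g(0) = 0$), $W$ is a cubic polynomial, so central second differences reproduce $W_{xx}$ and $W_{tt}$ exactly up to rounding; the wave-equation residual and the Cauchy data at $x = 1$ should then vanish to near machine precision:

```python
f = lambda s: s ** 3            # Cauchy data at x = 1; f(0) = f'(0) = 0, odd
g = lambda s: s                 # g(0) = 0, odd
G = lambda s: 0.5 * s ** 2      # antiderivative of g

def W(x, t):
    """d'Alembert formula with data W(1,t) = f(t), Wx(1,t) = g(t)."""
    r, l = t + (1.0 - x), t - (1.0 - x)
    return -0.5 * (G(r) - G(l)) + 0.5 * (f(r) + f(l))

h = 1e-3

def residual(x, t):
    """Central-difference residual of Wxx - Wtt (exact for cubics up to rounding)."""
    wxx = (W(x + h, t) - 2.0 * W(x, t) + W(x - h, t)) / h ** 2
    wtt = (W(x, t + h) - 2.0 * W(x, t) + W(x, t - h)) / h ** 2
    return wxx - wtt

pde_res = max(abs(residual(x, t)) for (x, t) in [(0.9, 0.3), (0.7, 0.5), (0.6, 0.1)])
bc_res = max(abs(W(1.0, t) - f(t)) for t in (0.1, 0.4, 0.8))
wx1 = (W(1.0 + h, 0.5) - W(1.0 - h, 0.5)) / (2.0 * h)   # should equal g(0.5)
print(pde_res, bc_res, wx1 - g(0.5))
```

The same check works for any odd $f \in C^2$ and $g \in C^1$; only the finite-difference exactness argument is special to polynomial data.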
Finally, the third theorem studies a quite unusual coupled system for a pair $(W,r)$ of functions. We treat this system with the same methods as above.

Theorem 5.18 (a) Let $q \in C[0,1]$, $F \in C^1(\overline{\Delta_0})$, $f \in C^2[0,1]$, and $g \in C^1[0,1]$ such that $f(0) = f'(0) = g(0) = 0$. Then there exists a unique pair of functions $(W,r) \in C^2(\overline{\Delta_0}) \times C^1[0,1]$ with

$$\frac{\partial^2 W(x,t)}{\partial x^2} - \frac{\partial^2 W(x,t)}{\partial t^2} - q(x)\, W(x,t) = F(x,t)\, r(x) \quad\text{in } \Delta_0\,, \tag{5.43a}$$
$$W(x,x) = \frac12 \int_0^x r(s)\, ds\,, \quad 0 \le x \le 1\,, \tag{5.43b}$$
$$W(x,0) = 0\,, \quad 0 \le x \le 1\,, \tag{5.43c}$$

and

$$W(1,t) = f(t) \qquad\text{and}\qquad \frac{\partial W}{\partial x}(1,t) = g(t) \quad\text{for all } t \in [0,1]\,. \tag{5.43d}$$

(b) Furthermore, the solution operator $(q,F,f,g) \mapsto (W,r)$ has an extension to a bounded operator from $L^2(0,1) \times C(\overline{\Delta_0}) \times H^1_{00}(0,1) \times L^2(0,1)$ into $C(\overline{\Delta_0}) \times L^2(0,1)$.
Proof: (a) We apply the same arguments as in the proofs of Theorems 5.15 and 5.17. We extend $F(x,\cdot)$ to an even function and $f$ and $g$ to odd functions. We again make the change of variables $x = \xi+\eta$ and $t = \xi-\eta$ and set $\tilde F(\xi,\eta) = F(\xi+\eta,\, \xi-\eta)$. In Theorem 5.17 (for $p = 0$), we have shown that $W$ solves the Cauchy problem (5.43a), (5.43d) if and only if $w(\xi,\eta) = W(\xi+\eta,\, \xi-\eta)$ satisfies the integral equation

$$w(\xi,\eta) = \int_\eta^{1-\xi} \int_\xi^{1-\eta'} \bigl[ q(\xi'+\eta')\, w(\xi',\eta') + \tilde F(\xi',\eta')\, r(\xi'+\eta') \bigr]\, d\xi'\, d\eta' - \int_\eta^{1-\xi} g(1-2\eta')\, d\eta' + \frac12\, f(2\xi-1) + \frac12\, f(1-2\eta) \tag{5.44a}$$

(see equation (5.42)). From this and the initial condition (5.43b), we derive a second integral equation. We set $\eta = 0$ in (5.44a), differentiate, and substitute (5.43b). This yields the following Volterra equation after an obvious change of variables:

$$\frac12\, r(x) = -\int_x^1 \bigl[ q(s)\, w(x,\, s-x) + r(s)\, \tilde F(x,\, s-x) \bigr]\, ds + g(2x-1) + f'(2x-1)\,. \tag{5.44b}$$

Assume that there exists a solution $(w,r) \in C(\overline D) \times L^2(0,1)$ of (5.44a) and (5.44b). From (5.44b), we observe that $r$ is continuous on $[0,1]$, and thus by (5.44a), $w \in C^1(\overline D)$ and thus also $r \in C^1[0,1]$ because the function $\psi(x) = \int_x^1 r(s)\, \tilde F(x,\, s-x)\, ds$ is differentiable on $[0,1]$. Therefore, the right-hand side of (5.43a) is differentiable, and we conclude as in the previous theorem that $w \in C^2(\overline D)$. Furthermore, $\frac{d}{dx} W(x,x) = \frac{d}{dx} w(x,0) = \frac12 r(x)$. Now, because $F(x,\cdot)$ is even and $f$ and $g$ are odd functions, we conclude that $W(x,\cdot)$ is also odd. In particular, $W(x,0) = 0$ for all $x \in [0,1]$. This implies $W(0,0) = 0$ and thus $W(x,x) = \frac12 \int_0^x r(s)\, ds$. Therefore, we have shown that every solution of equations (5.44a) and (5.44b) satisfies (5.43a)–(5.43d) and vice versa.

Now we sketch the proof that the system (5.44a), (5.44b) is uniquely solvable for $(w,r) \in C(\overline D) \times L^2(0,1)$ for given $q \in L^2(0,1)$, $F \in C(\overline{\Delta_0})$, $f \in H^1_{00}(0,1)$, and $g \in L^2(0,1)$. This also includes the proof of part (b). We write this system in the form $(w,r) = A(w,r) + b$ in the product space $C(\overline D) \times L^2(0,1)$, which we equip with the norm $\|(w,r)\|_{\infty,L^2} = \max\bigl\{ \|w\|_\infty,\, \|r\|_{L^2(0,1)} \bigr\}$. To apply the fixed point theorem, we first define the constant $c := 2\bigl( \|q\|_{L^2(0,1)} + \|F\|_\infty \bigr)$ and, for given $(w,r) \in C(\overline D) \times L^2(0,1)$, the functions $(w_n, r_n) = A^n(w,r)$. By induction, we prove the following estimates:

$$|w_n(\xi,\eta)| \le \|(w,r)\|_{\infty,L^2}\, \frac{c^n}{\sqrt{n!}}\, (1-\xi-\eta)^{n/2}\,, \quad (\xi,\eta) \in D\,,$$
$$|r_n(x)| \le \|(w,r)\|_{\infty,L^2}\, \frac{c^n}{\sqrt{n!}}\, (1-x)^{n/2}\,, \quad x \in (0,1)\,,$$
for all $n = 1, 2, \dots$.

We use the elementary integral $\int_\eta^{1-\xi} \int_\xi^{1-\eta'} (1-\xi'-\eta')^n\, d\xi'\, d\eta' = \frac{(1-\xi-\eta)^{n+2}}{(n+1)(n+2)}$ for $n = 0, 1, \dots$ and set $\tilde q(\xi,\eta) = q(\xi+\eta)$ and $\tilde r(\xi,\eta) = r(\xi+\eta)$ for abbreviation. We note that $\|\tilde q\|_{L^2(D)}^2 = \int_D q(\xi+\eta)^2\, d(\xi,\eta) = \int_0^1 t\, q(t)^2\, dt \le \|q\|_{L^2(0,1)}^2$ and analogously $\|\tilde r\|_{L^2(D)} \le \|r\|_{L^2(0,1)}$.

For $n = 1$, we have by the Cauchy–Schwarz inequality

$$|w_1(\xi,\eta)| \le \|w\|_\infty \int_\eta^{1-\xi}\! \int_\xi^{1-\eta'} |\tilde q(\xi',\eta')|\, d\xi'\, d\eta' + \|F\|_\infty \int_\eta^{1-\xi}\! \int_\xi^{1-\eta'} |\tilde r(\xi',\eta')|\, d\xi'\, d\eta'$$
$$\le \|(w,r)\|_{\infty,L^2}\, \bigl( \|\tilde q\|_{L^2(D)} + \|F\|_\infty \bigr)\, \sqrt{ \int_\eta^{1-\xi}\! \int_\xi^{1-\eta'} d\xi'\, d\eta' }$$
$$\le c\, \|(w,r)\|_{\infty,L^2}\, \frac{1}{\sqrt 2}\, (1-\xi-\eta) \;\le\; c\, \|(w,r)\|_{\infty,L^2}\, \sqrt{1-\xi-\eta}\,,$$

and

$$|r_1(x)| \le 2\, \bigl( \|w\|_\infty\, \|q\|_{L^2(0,1)} + \|r\|_{L^2(0,1)}\, \|F\|_\infty \bigr)\, \sqrt{1-x} \le 2\, \|(w,r)\|_{\infty,L^2}\, \bigl( \|q\|_{L^2(0,1)} + \|F\|_\infty \bigr)\, \sqrt{1-x} = c\, \|(w,r)\|_{\infty,L^2}\, \sqrt{1-x}\,.$$

The step from $n$ to $n+1$ is proven in just the same way. Therefore, $\|A^n\|_{\infty,L^2} \le \frac{c^n}{\sqrt{n!}}$, which tends to zero as $n$ tends to infinity. Application of Theorem A.31 yields that (5.44a), (5.44b) has a unique solution in $C(\overline D) \times L^2(0,1)$ for all $q \in L^2(0,1)$, $F \in C(\overline{\Delta_0})$, $f \in H^1_{00}(0,1)$, and $g \in L^2(0,1)$.
5.5 The Inverse Problem
Now we study the inverse spectral problem. This is: given the eigenvalues $\lambda_n$ of the Sturm–Liouville eigenvalue problem

$$-u''(x) + q(x)\, u(x) = \lambda\, u(x)\,, \quad 0 < x < 1\,, \qquad u(0) = 0\,,\ u(1) = 0\,, \tag{5.45}$$

determine the function $q$. We saw in Example 5.1 that the knowledge of the spectrum $\{\lambda_n : n \in \mathbb{N}\}$ is, in general, not sufficient to determine $q$ uniquely. We need more information, such as a second spectrum $\mu_n$ of an eigenvalue problem of the form

$$-v''(x) + q(x)\, v(x) = \mu\, v(x)\,, \qquad v(0) = 0\,,\ v'(1) + H v(1) = 0\,, \tag{5.46}$$

or some knowledge about the eigenfunctions.

The basic tool in the uniqueness proof for this inverse problem is the Gelfand–Levitan–Marchenko integral operator (see [101]). This integral operator maps solutions of initial value problems for the equation $-u'' + q u = \lambda u$ onto solutions of the equation $-u'' + p u = \lambda u$ and, most importantly, does not
depend on $\lambda$. It turns out that the kernel of this operator is the solution of the hyperbolic boundary value problem that was studied in the previous section.

Theorem 5.19 Let $p, q \in L^2(0,1)$, $\lambda \in \mathbb{C}$, and $u, v \in H^2(0,1)$ be solutions of

$$-u''(x) + q(x)\, u(x) = \lambda\, u(x)\,, \quad 0 < x < 1\,, \qquad u(0) = 0\,, \tag{5.47a}$$
$$-v''(x) + p(x)\, v(x) = \lambda\, v(x)\,, \quad 0 < x < 1\,, \qquad v(0) = 0\,, \tag{5.47b}$$

such that $u'(0) = v'(0)$. Also let $K \in C(\overline{\Delta_0})$ be the weak solution of the Goursat problem

$$\frac{\partial^2 K(x,t)}{\partial x^2} - \frac{\partial^2 K(x,t)}{\partial t^2} + \bigl(p(t) - q(x)\bigr)\, K(x,t) = 0 \quad\text{in } \Delta_0\,, \tag{5.48a}$$
$$K(x,0) = 0\,, \quad 0 \le x \le 1\,, \tag{5.48b}$$
$$K(x,x) = \frac12 \int_0^x \bigl( q(s) - p(s) \bigr)\, ds\,, \quad 0 \le x \le 1\,, \tag{5.48c}$$

where the triangular region $\Delta_0$ is again defined by

$$\Delta_0 := \bigl\{ (x,t) \in \mathbb{R}^2 : 0 < t < x < 1 \bigr\}\,. \tag{5.49}$$

Then we have

$$u(x) = v(x) + \int_0^x K(x,t)\, v(t)\, dt\,, \quad 0 \le x \le 1\,. \tag{5.50}$$
We remark that Theorem 5.15 with $f(x) = \frac12 \int_0^x \bigl( q(s) - p(s) \bigr)\, ds$ implies that this Goursat problem is uniquely solvable in the weak sense.

Proof: First, let $p, q \in C[0,1]$. Then $K \in C^2(\overline{\Delta_0})$ by Theorem 5.15. Define $w$ by the right-hand side of (5.50); that is,

$$w(x) := v(x) + \int_0^x K(x,t)\, v(t)\, dt \quad\text{for } 0 \le x \le 1\,.$$

Then $w(0) = v(0) = 0 = u(0)$ and $w$ is differentiable with

$$w'(x) = v'(x) + K(x,x)\, v(x) + \int_0^x K_x(x,t)\, v(t)\, dt\,, \quad 0 < x < 1\,.$$
Again, we denote by $K_x$, $K_t$, etc., the partial derivatives. For $x = 0$, we have $w'(0) = v'(0) = u'(0)$. Furthermore,

$$w''(x) = v''(x) + v(x)\, \frac{d}{dx} K(x,x) + K(x,x)\, v'(x) + K_x(x,x)\, v(x) + \int_0^x K_{xx}(x,t)\, v(t)\, dt$$
$$= \Bigl[ p(x) - \lambda + \frac{d}{dx} K(x,x) + K_x(x,x) \Bigr] v(x) + K(x,x)\, v'(x) + \int_0^x \Bigl[ \bigl( q(x) - p(t) \bigr)\, K(x,t)\, v(t) + K_{tt}(x,t)\, v(t) \Bigr]\, dt\,.$$

Partial integration yields

$$\int_0^x K_{tt}(x,t)\, v(t)\, dt = \int_0^x K(x,t)\, v''(t)\, dt + \Bigl[ K_t(x,t)\, v(t) - K(x,t)\, v'(t) \Bigr]_{t=0}^{t=x}$$
$$= \int_0^x \bigl( p(t) - \lambda \bigr)\, K(x,t)\, v(t)\, dt + K_t(x,x)\, v(x) - K(x,x)\, v'(x)\,.$$

Therefore, we have

$$w''(x) = \Bigl[ p(x) - \lambda + \underbrace{\frac{d}{dx} K(x,x) + K_x(x,x) + K_t(x,x)}_{=\, 2\frac{d}{dx} K(x,x)\, =\, q(x) - p(x)} \Bigr] v(x) + \bigl( q(x) - \lambda \bigr) \int_0^x K(x,t)\, v(t)\, dt$$
$$= \bigl( q(x) - \lambda \bigr) \Bigl[ v(x) + \int_0^x K(x,t)\, v(t)\, dt \Bigr] = \bigl( q(x) - \lambda \bigr)\, w(x)\;;$$

that is, $w$ solves the same initial value problem as $u$. The Picard–Lindelöf uniqueness theorem for initial value problems yields $w = u$. Thus, we have proven the theorem for smooth functions $p$ and $q$.

Now let $p, q \in L^2(0,1)$. Then we choose functions $(p_n)$, $(q_n)$ in $C[0,1]$ with $p_n \to p$ and $q_n \to q$ in $L^2(0,1)$, respectively. Let $K_n$ be the solution of (5.48a)–
(5.48c) for $p_n$ and $q_n$. We have already shown that

$$u_n(x) = v_n(x) + \int_0^x K_n(x,t)\, v_n(t)\, dt\,, \quad 0 \le x \le 1\,,$$

for all $n \in \mathbb{N}$, where $u_n$ and $v_n$ solve (5.47a) and (5.47b), respectively, with $u_n'(0) = v_n'(0) = u'(0) = v'(0)$. From the continuous dependence results of Theorems 5.6 and 5.15(b), the functions $u_n$, $v_n$, and $K_n$ converge uniformly to $u$, $v$, and $K$, respectively. This proves the assertion of the theorem for $p, q \in L^2(0,1)$.

As an example, we take $p = 0$ and $v(x) = \sin(\sqrt\lambda\, x)/\sqrt\lambda$ and have the following result:

Example 5.20 Let $u$ be a solution of

$$-u''(x) + q(x)\, u(x) = \lambda\, u(x)\,, \qquad u(0) = 0\,,\ u'(0) = 1\,, \tag{5.51}$$

for given $q \in L^2(0,1)$. Then we have the representation

$$u(x) = \frac{\sin(\sqrt\lambda\, x)}{\sqrt\lambda} + \int_0^x K(x,t)\, \frac{\sin(\sqrt\lambda\, t)}{\sqrt\lambda}\, dt\,, \quad 0 \le x \le 1\,, \tag{5.52}$$
where the kernel $K$ solves the following Goursat problem in the weak sense:

$$K_{xx}(x,t) - K_{tt}(x,t) - q(x)\, K(x,t) = 0 \quad\text{in } \Delta_0\,, \tag{5.53a}$$
$$K(x,0) = 0\,, \quad 0 \le x \le 1\,, \tag{5.53b}$$
$$K(x,x) = \frac12 \int_0^x q(s)\, ds\,, \quad 0 \le x \le 1\,. \tag{5.53c}$$
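The representation (5.52) can be checked numerically for a concrete potential. The sketch below is our own discretization, not part of the text: $q(x) = x$ and $\lambda = 5$ are arbitrary sample choices, the kernel $K$ is computed by the fixed-point iteration from the proof of Theorem 5.15 in characteristic variables (trapezoidal quadrature on a uniform grid), and the right-hand side of (5.52) is compared with a Runge–Kutta solution of (5.51):

```python
import numpy as np

lam = 5.0
qfun = lambda s: s                       # sample potential q(x) = x (our choice)
N = 400                                  # characteristic grid: step h = 1/N
h = 1.0 / N
grid = np.linspace(0.0, 1.0, N + 1)
XI, ETA = np.meshgrid(grid, grid, indexing="ij")

# Goursat kernel in characteristic variables: w(xi,eta) = K(xi+eta, xi-eta)
# solves w = II q(xi'+eta') w + fdiag(xi) - fdiag(eta), cf. (5.40), with
# fdiag(x) = (1/2) * int_0^x q(s) ds  (= x^2/4 for q(x) = x)
fdiag = lambda s: 0.25 * s ** 2
A = qfun(np.clip(XI + ETA, 0.0, 1.0))    # values with xi+eta > 1 are never used

def cumtrap(g, axis):
    mid = 0.5 * (np.take(g, range(1, N + 1), axis) + np.take(g, range(N), axis))
    out = np.zeros_like(g)
    sl = [slice(None), slice(None)]
    sl[axis] = slice(1, None)
    out[tuple(sl)] = np.cumsum(mid, axis=axis) * h
    return out

rhs = fdiag(XI) - fdiag(ETA)
w = rhs.copy()
for _ in range(40):                      # fixed-point iteration as in Theorem 5.15
    w = cumtrap(cumtrap(A * w, 0), 1) + rhs

# right-hand side of (5.52): u(x) = v(x) + int_0^x K(x,t) v(t) dt,
# with v(t) = sin(sqrt(lam) t)/sqrt(lam) and K(x,t) = w((x+t)/2, (x-t)/2)
rl = lam ** 0.5
v = lambda t: np.sin(rl * t) / rl
M = N // 2
h2 = 2.0 * h                             # x and t on the even sub-grid
u_rep = np.zeros(M + 1)
for m in range(M + 1):
    tvals = np.arange(m + 1) * h2
    Kxt = np.array([w[m + l, m - l] for l in range(m + 1)])
    wts = np.full(m + 1, h2)
    wts[0] = wts[-1] = 0.5 * h2          # trapezoidal weights
    u_rep[m] = v(m * h2) + (float(np.dot(wts, Kxt * v(tvals))) if m > 0 else 0.0)

# reference: RK4 for u'' = (q - lam) u, u(0) = 0, u'(0) = 1
u, up = 0.0, 1.0
u_ode = [u]
for m in range(M):
    x = m * h2
    def deriv(x, u, up):
        return up, (qfun(x) - lam) * u
    k1 = deriv(x, u, up)
    k2 = deriv(x + 0.5 * h2, u + 0.5 * h2 * k1[0], up + 0.5 * h2 * k1[1])
    k3 = deriv(x + 0.5 * h2, u + 0.5 * h2 * k2[0], up + 0.5 * h2 * k2[1])
    k4 = deriv(x + h2, u + h2 * k3[0], up + h2 * k3[1])
    u += h2 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
    up += h2 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    u_ode.append(u)

err = float(np.max(np.abs(u_rep - np.array(u_ode))))
print("max difference between (5.52) and the ODE solution:", err)
```

Restricting $x$ and $t$ to the even sub-grid makes the half-sum and half-difference $(x\pm t)/2$ land exactly on grid points, so no interpolation of $w$ is needed; the remaining discrepancy is the $O(h^2)$ quadrature error.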
This example has an application that is interesting in itself but that we also need in Section 5.7 and later in Subsection 7.6.3.

Theorem 5.21 Let $\lambda_n$ be the eigenvalues of one of the eigenvalue problems (5.45) or (5.46), where again $q \in L^2(0,1)$. Then the set of functions $\bigl\{ \sin(\sqrt{\lambda_n}\, \cdot) : n \in \mathbb{N} \bigr\}$ is complete in $L^2(0,1)$. This means that $\int_0^1 h(x)\, \sin(\sqrt{\lambda_n}\, x)\, dx = 0$ for all $n \in \mathbb{N}$ implies that $h = 0$.

Proof: Let $T : L^2(0,1) \to L^2(0,1)$ be the Volterra integral operator of the second kind with kernel $K$; that is,

$$(Tv)(x) := v(x) + \int_0^x K(x,t)\, v(t)\, dt\,, \quad x \in (0,1)\,,\ v \in L^2(0,1)\,,$$
where $K$ solves the Goursat problem (5.53a)–(5.53c) in the weak sense. Then we know that $T$ is an isomorphism from $L^2(0,1)$ onto itself. Define $v_n(x) := \sin(\sqrt{\lambda_n}\, x)$ for $x \in [0,1]$, $n \in \mathbb{N}$. Let $u_n$ be the eigenfunction corresponding to $\lambda_n$, normalized to $u_n'(0) = 1$. By the preceding example,

$$u_n = \frac{1}{\sqrt{\lambda_n}}\, T v_n \qquad\text{or}\qquad v_n = \sqrt{\lambda_n}\, T^{-1} u_n\,.$$

Now, if $\int_0^1 h(x)\, v_n(x)\, dx = 0$ for all $n \in \mathbb{N}$, then

$$0 = \int_0^1 h(x)\, \bigl( T^{-1} u_n \bigr)(x)\, dx = \int_0^1 u_n(x)\, \bigl( (T^*)^{-1} h \bigr)(x)\, dx \quad\text{for all } n \in \mathbb{N}\,,$$

where $T^*$ denotes the $L^2$-adjoint of $T$. Because $\{ u_n / \|u_n\|_{L^2} : n \in \mathbb{N} \}$ is complete in $L^2(0,1)$ by Lemma 5.7, we conclude that $(T^*)^{-1} h = 0$ and thus $h = 0$.

Now we can prove the main uniqueness theorem.

Theorem 5.22 Let $H \in \mathbb{R}$, $p, q \in L^2(0,1)$, and let $\lambda_n(p)$, $\lambda_n(q)$ be the eigenvalues of the eigenvalue problem

$$-u'' + r\, u = \lambda\, u \ \text{ in } (0,1)\,, \qquad u(0) = 0\,,\ u(1) = 0\,,$$

corresponding to $r = p$ and $r = q$, respectively. Furthermore, let $\mu_n(p)$ and $\mu_n(q)$ be the eigenvalues of

$$-u'' + r\, u = \mu\, u \ \text{ in } (0,1)\,, \qquad u(0) = 0\,,\ u'(1) + H u(1) = 0\,,$$

corresponding to $r = p$ and $r = q$, respectively. If $\lambda_n(p) = \lambda_n(q)$ and $\mu_n(p) = \mu_n(q)$ for all $n \in \mathbb{N}$, then $p = q$.

Proof: From the asymptotics of the eigenvalues (Theorem 5.11), we conclude that

$$\lambda_n(p) = n^2 \pi^2 + \int_0^1 p(t)\, dt + o(1)\,, \quad n \to \infty\,,$$
$$\lambda_n(q) = n^2 \pi^2 + \int_0^1 q(t)\, dt + o(1)\,, \quad n \to \infty\,,$$

and thus

$$\int_0^1 \bigl( p(t) - q(t) \bigr)\, dt = \lim_{n\to\infty} \bigl( \lambda_n(p) - \lambda_n(q) \bigr) = 0\,. \tag{5.54}$$

Now let $K$ be the weak solution of the Goursat problem (5.48a)–(5.48c). Then $K$ depends only on $p$ and $q$ and is independent of the eigenvalues $\lambda_n := \lambda_n(p) =$
$\lambda_n(q)$ and $\mu_n := \mu_n(p) = \mu_n(q)$. Furthermore, from (5.54), we conclude that $K(1,1) = 0$.

Now let $u_n$, $v_n$ be the eigenfunctions corresponding to $\lambda_n(q)$ and $\lambda_n(p)$, respectively; that is, solutions of the differential equations

$$-u_n''(x) + q(x)\, u_n(x) = \lambda_n\, u_n(x)\,, \qquad -v_n''(x) + p(x)\, v_n(x) = \lambda_n\, v_n(x)$$

for $0 < x < 1$ with homogeneous Dirichlet boundary conditions on both sides. Furthermore, we assume that they are normalized by $u_n'(0) = v_n'(0) = 1$. Then Theorem 5.19 is applicable and yields the relationship

$$u_n(x) = v_n(x) + \int_0^x K(x,t)\, v_n(t)\, dt \quad\text{for } x \in [0,1] \tag{5.55}$$

and all $n \in \mathbb{N}$. For $x = 1$, the boundary conditions yield

$$0 = \int_0^1 K(1,t)\, v_n(t)\, dt \quad\text{for all } n \in \mathbb{N}\,. \tag{5.56}$$

Now we use the fact that the set $\{ v_n / \|v_n\|_{L^2} : n \in \mathbb{N} \}$ forms a complete orthonormal system in $L^2(0,1)$. From this, $K(1,t) = 0$ for all $t \in [0,1]$ follows.

Now let $\tilde u_n$ and $\tilde v_n$ be eigenfunctions corresponding to $\mu_n$ and $q$ and $p$, respectively, with the normalization $\tilde u_n'(0) = \tilde v_n'(0) = 1$. Again, Theorem 5.19 is applicable and yields the relationship (5.55) for $\tilde u_n$ and $\tilde v_n$ instead of $u_n$ and $v_n$, respectively. Assume for the moment that $K$ is differentiable. Then we can differentiate this equation, set $x = 1$, and arrive at

$$0 = \tilde u_n'(1) - \tilde v_n'(1) + H\, \bigl( \tilde u_n(1) - \tilde v_n(1) \bigr) = \underbrace{K(1,1)}_{=0}\, \tilde v_n(1) + \int_0^1 \bigl[ K_x(1,t) + H\, \underbrace{K(1,t)}_{=0} \bigr]\, \tilde v_n(t)\, dt\,.$$

We conclude that $\int_0^1 K_x(1,t)\, \tilde v_n(t)\, dt = 0$ for all $n \in \mathbb{N}$. From this, $K_x(1,t) = 0$ for all $t \in (0,1)$ follows because $\{ \tilde v_n / \|\tilde v_n\|_{L^2} \}$ forms a complete orthonormal system.

However, $K$ is only a weak solution since $p, q \in L^2(0,1)$. From part (b) of Remark 5.16 we know that $K$ has the form $K(x,t) = K_1(x,t) - f\bigl(\tfrac12(x-t)\bigr) + f\bigl(\tfrac12(x+t)\bigr)$ with $K_1 \in C^1(\overline{\Delta_0})$, where in the present situation $f(x) = \frac12 \int_0^x \bigl( q(s) - p(s) \bigr)\, ds$ is in $H^1_{00}(0,1)$. Then one can easily prove (approximation of $f$ by $C^1$-functions) that $\psi \in C^1[0,1]$, where $\psi(x) = \int_0^x f\bigl(\tfrac12(x \pm t)\bigr)\, \tilde v_n(t)\, dt$ and $\psi'(x) = f\bigl(\tfrac12(x \pm x)\bigr)\, \tilde v_n(x) + \frac12 \int_0^x f'\bigl(\tfrac12(x \pm t)\bigr)\, \tilde v_n(t)\, dt$. Therefore, one can argue as in the smooth case of $K$.

Now we apply the uniqueness part of Theorem 5.17 (in particular, the integral equation (5.42) for $f = g = 0$ and $F = 0$), which yields that $K$ has to vanish identically. In particular, this means that

$$0 = K(x,x) = \frac12 \int_0^x \bigl( q(s) - p(s) \bigr)\, ds \quad\text{for all } x \in (0,1)\,.$$
Differentiating this equation yields $p = q$.

We have seen in Example 5.1 that the knowledge of one spectrum for the Sturm–Liouville differential equation is not enough information to recover the function $q$ uniquely. Instead of knowing the spectrum for a second pair of boundary conditions, we can use other kinds of information, as the following theorem shows:

Theorem 5.23 Let $p, q \in L^2(0,1)$ with eigenvalues $\lambda_n(p)$, $\lambda_n(q)$ and eigenfunctions $u_n$ and $v_n$, respectively, corresponding to Dirichlet boundary conditions $u(0) = 0$, $u(1) = 0$. Let the eigenvalues coincide; that is, $\lambda_n(p) = \lambda_n(q)$ for all $n \in \mathbb{N}$. Let one of the following assumptions also be satisfied:

(a) Let $p$ and $q$ be even functions with respect to $1/2$; that is, $p(1-x) = p(x)$ and $q(1-x) = q(x)$ for all $x \in [0,1]$.

(b) Let the Neumann boundary values coincide; that is, let

$$\frac{u_n'(1)}{u_n'(0)} = \frac{v_n'(1)}{v_n'(0)} \quad\text{for all } n \in \mathbb{N}\,. \tag{5.57}$$

Then $p = q$.

Proof: (a) From Theorem 5.7, part (e), we know that the eigenfunctions $u_n$ and $v_n$, again normalized by $u_n'(0) = v_n'(0) = 1$, are even with respect to $x = 1/2$ for odd $n$ and odd for even $n$. In particular, $u_n'(1) = v_n'(1)$. This reduces the uniqueness question for part (a) to part (b).

(b) We follow the first part of the proof of Theorem 5.22. From (5.56), we again conclude that $K(1,t)$ vanishes for all $t \in (0,1)$. The additional assumption (5.57) yields that $u_n'(1) = v_n'(1)$. We differentiate (5.55), set $x = 1$, and arrive at $\int_0^1 K_x(1,t)\, v_n(t)\, dt = 0$ for all $n \in \mathbb{N}$. Again, this implies that $K_x(1,\cdot) = 0$, and the proof follows the same lines as the proof of Theorem 5.22.
5.6 A Parameter Identification Problem
This section and the next two chapters are devoted to the important field of parameter identification problems for partial differential equations. In Chapter 6, we study the problem of impedance tomography to determine the conductivity distribution from boundary measurements, while in Chapter 7, we study the inverse scattering problem to determine the refractive index of a medium from measurements of the scattered field. In the present section, we consider an application of the inverse Sturm–Liouville eigenvalue problem to the following parabolic initial boundary value problem.

First, we formulate the direct problem: Let $T > 0$ and $\Omega_T := (0,1) \times (0,T) \subset \mathbb{R}^2$, $q \in C[0,1]$, and $f \in C^2[0,T]$ be given with $f(0) = 0$ and $q(x) \ge 0$ for $x \in [0,1]$. Determine $U \in C(\overline{\Omega_T})$, which is twice continuously differentiable with respect to $x$ and continuously differentiable with respect to $t$ in $\Omega_T$, such that $\partial U/\partial x \in C(\overline{\Omega_T})$ and

$$\frac{\partial U(x,t)}{\partial t} = \frac{\partial^2 U(x,t)}{\partial x^2} - q(x)\, U(x,t) \quad\text{in } \Omega_T\,, \tag{5.58a}$$
$$U(x,0) = 0\,, \quad x \in [0,1]\,, \tag{5.58b}$$
$$U(0,t) = 0\,, \quad \frac{\partial U}{\partial x}(1,t) = f(t)\,, \quad t \in (0,T)\,. \tag{5.58c}$$
From the theory of parabolic initial boundary value problems, it is known that there exists a unique solution of this problem. We prove uniqueness and refer to [170] or (5.60) for the question of existence.

Theorem 5.24 Let $f = 0$. Then $U = 0$ is the only solution of (5.58a)–(5.58c) in $\Omega_T$.

Proof: Multiply the differential equation (5.58a) by $U(x,t)$ and integrate with respect to $x$. This yields

$$\frac12 \frac{d}{dt} \int_0^1 U(x,t)^2\, dx = \int_0^1 \Bigl[ \frac{\partial^2 U(x,t)}{\partial x^2} - q(x)\, U(x,t) \Bigr]\, U(x,t)\, dx\,.$$

We integrate by parts and use the homogeneous boundary conditions:

$$\frac12 \frac{d}{dt} \int_0^1 U(x,t)^2\, dx = -\int_0^1 \Bigl[ \Bigl( \frac{\partial U(x,t)}{\partial x} \Bigr)^2 + q(x)\, U(x,t)^2 \Bigr]\, dx \;\le\; 0\,.$$

This implies that $t \mapsto \int_0^1 U(x,t)^2\, dx$ is nonnegative and monotonically nonincreasing. From $\int_0^1 U(x,0)^2\, dx = 0$, we conclude that $\int_0^1 U(x,t)^2\, dx = 0$ for all $t$; that is, $U = 0$.

Now we turn to the inverse problem. Let $f$ be known and, in addition, $U(1,t)$ for all $0 < t \le T$. The inverse problem is to determine the coefficient $q$. In this section, we are only interested in the question whether this provides sufficient information in principle to recover $q$ uniquely; that is, we study the question of uniqueness for the inverse problem. It is our aim to prove the following theorem:

Theorem 5.25 Let $U_1$, $U_2$ be solutions of (5.58a)–(5.58c) corresponding to $q = q_1 \ge 0$ and $q = q_2 \ge 0$, respectively, and to the same $f \in C^2[0,T]$ with $f(0) = 0$ and $f'(0) \ne 0$. Let $U_1(1,t) = U_2(1,t)$ for all $t \in (0,T)$. Then $q_1 = q_2$ on $[0,1]$.

Proof: Let $(q,U)$ be $(q_1,U_1)$ or $(q_2,U_2)$, respectively. Let $\lambda_n$ and $g_n$, $n \in \mathbb{N}$, be the eigenvalues and eigenfunctions, respectively, of the Sturm–Liouville eigenvalue problem (5.46) for $H = 0$; that is,

$$-u''(x) + q(x)\, u(x) = \lambda\, u(x)\,, \quad 0 < x < 1\,, \qquad u(0) = u'(1) = 0\,.$$
We assume that the eigenfunctions are normalized by $\|g_n\|_{L^2} = 1$ for all $n \in \mathbb{N}$. Furthermore, we can assume that $g_n(1) > 0$ for all $n \in \mathbb{N}$.¹ We know that $\{ g_n : n \in \mathbb{N} \}$ forms a complete orthonormal system in $L^2(0,1)$. Theorem 5.14 implies the asymptotic behavior

$$\lambda_n = (n + 1/2)^2 \pi^2 + \hat q + \tilde\lambda_n \quad\text{with}\quad \sum_{n=1}^\infty \tilde\lambda_n^2 < \infty\,, \tag{5.59a}$$
$$g_n(x) = \sqrt 2\, \sin\bigl( (n+1/2)\pi x \bigr) + O(1/n)\,, \tag{5.59b}$$

where $\hat q = \int_0^1 q(x)\, dx$.

In the first step, we derive a series expansion for the solution $U$ of the initial boundary value problem (5.58a)–(5.58c). From the completeness of $\{ g_n : n \in \mathbb{N} \}$, we have the Fourier expansion

$$U(x,t) = \sum_{n=1}^\infty a_n(t)\, g_n(x) \quad\text{with}\quad a_n(t) = \int_0^1 U(x,t)\, g_n(x)\, dx\,, \quad n \in \mathbb{N}\,,$$

where the convergence is understood in the $L^2(0,1)$ sense for every $t \in (0,T]$. We would like to substitute this into the differential equation and the initial and boundary conditions. Because for this formal procedure the interchanging of summation and differentiation is not justified, we use a different derivation of $a_n$. We differentiate $a_n$ and use the partial differential equation (5.58a). This yields
$$a_n'(t) = \int_0^1 \frac{\partial U(x,t)}{\partial t}\, g_n(x)\, dx = \int_0^1 \Bigl[ \frac{\partial^2 U(x,t)}{\partial x^2} - q(x)\, U(x,t) \Bigr]\, g_n(x)\, dx$$
$$= \Bigl[ g_n(x)\, \frac{\partial U(x,t)}{\partial x} - U(x,t)\, g_n'(x) \Bigr]_{x=0}^{x=1} + \int_0^1 U(x,t)\, \underbrace{\bigl[ g_n''(x) - q(x)\, g_n(x) \bigr]}_{=\, -\lambda_n g_n(x)}\, dx$$
$$= f(t)\, g_n(1) - \lambda_n\, a_n(t)\,.$$

With the initial condition $a_n(0) = 0$, the solution is given by

$$a_n(t) = g_n(1) \int_0^t f(\tau)\, e^{-\lambda_n (t-\tau)}\, d\tau\;;$$

that is, the solution $U$ of (5.58a)–(5.58c) takes the form

$$U(x,t) = \sum_{n=1}^\infty g_n(1)\, g_n(x) \int_0^t f(\tau)\, e^{-\lambda_n (t-\tau)}\, d\tau\,. \tag{5.60}$$

¹ $g_n(1) = 0$ is impossible because of $g_n'(1) = 0$.
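The ordinary differential equation for $a_n$ and its variation-of-constants solution are easy to confirm numerically. In the sketch below, $\lambda_n = 22$ and $g_n(1) = 0.9$ are made-up sample values and $f(t) = \sin 3t$ is an arbitrary datum with $f(0) = 0$; the ODE is integrated by RK4 and compared with a trapezoidal quadrature of the integral formula:

```python
import math

lam, g1 = 22.0, 0.9                 # made-up sample values for lambda_n, g_n(1)
f = lambda t: math.sin(3.0 * t)     # boundary datum with f(0) = 0

# RK4 for a'(t) = f(t) g1 - lam a(t), a(0) = 0, on [0, 1]
T, steps = 1.0, 2000
h = T / steps
a = 0.0
for m in range(steps):
    t = m * h
    deriv = lambda tt, aa: f(tt) * g1 - lam * aa
    k1 = deriv(t, a)
    k2 = deriv(t + 0.5 * h, a + 0.5 * h * k1)
    k3 = deriv(t + 0.5 * h, a + 0.5 * h * k2)
    k4 = deriv(t + h, a + h * k3)
    a += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

# closed-form solution a(T) = g1 * int_0^T f(tau) e^{-lam (T-tau)} dtau (trapezoid)
n = 20000
integral = 0.0
for i in range(n + 1):
    tau = T * i / n
    wgt = 0.5 if i in (0, n) else 1.0
    integral += wgt * f(tau) * math.exp(-lam * (T - tau))
a_formula = g1 * integral * T / n
print(a, a_formula, abs(a - a_formula))
```

The two values agree to the quadrature tolerance; the same computation also exhibits the $1/\lambda_n$ decay of the integral used below to prove uniform convergence of (5.60).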
From partial integration, we observe that

$$\int_0^t f(\tau)\, e^{-\lambda_n (t-\tau)}\, d\tau = \frac{1}{\lambda_n}\, f(t) - \frac{1}{\lambda_n} \int_0^t f'(\tau)\, e^{-\lambda_n (t-\tau)}\, d\tau\,,$$

and this decays as $1/\lambda_n$. Using this and the asymptotic behavior (5.59a) and (5.59b), we conclude that the series (5.60) converges uniformly in $\overline{\Omega_T}$. For $x = 1$, the representation (5.60) reduces to

$$U(1,t) = \sum_{n=1}^\infty g_n(1)^2 \int_0^t f(\tau)\, e^{-\lambda_n (t-\tau)}\, d\tau = \int_0^t f(\tau)\, \underbrace{\sum_{n=1}^\infty g_n(1)^2\, e^{-\lambda_n (t-\tau)}}_{=:\ A(t-\tau)}\, d\tau\,, \quad t \in [0,T]\,.$$

Changing the orders of integration and summation is justified by Lebesgue's theorem of dominated convergence. This is seen from the estimate

$$\sum_{n=1}^\infty g_n(1)^2\, e^{-\lambda_n s} \;\le\; c \sum_{n=1}^\infty e^{-n^2 \pi^2 s} \;\le\; c \int_0^\infty e^{-\sigma^2 \pi^2 s}\, d\sigma \;=\; \frac{c}{2\sqrt{\pi s}}$$

and the fact that the function $s \mapsto 1/\sqrt s$ is integrable on $(0,T]$. Such a representation holds for $U_1(1,\cdot)$ and $U_2(1,\cdot)$ corresponding to $q_1$ and $q_2$, respectively. We denote the dependence on $q_1$ and $q_2$ by superscripts $(1)$ and $(2)$, respectively. From $U_1(1,\cdot) = U_2(1,\cdot)$, we conclude that
$$0 = \int_0^t f(\tau)\, \bigl[ A^{(1)}(t-\tau) - A^{(2)}(t-\tau) \bigr]\, d\tau = \int_0^t f(t-\tau)\, \bigl[ A^{(1)}(\tau) - A^{(2)}(\tau) \bigr]\, d\tau\;;$$

that is, the function $w := A^{(1)} - A^{(2)}$ solves the homogeneous Volterra integral equation of the first kind with kernel $f(t-\tau)$. We differentiate this equation twice and use $f(0) = 0$ and $f'(0) \ne 0$. This yields a Volterra equation of the second kind for $w$:

$$f'(0)\, w(t) + \int_0^t f''(t-s)\, w(s)\, ds = 0\,, \quad t \in [0,T]\,.$$

Because Volterra equations of the second kind are uniquely solvable (see Example A.32 of Appendix A.3), this yields $w(t) = 0$ for all $t$; that is,

$$\sum_{n=1}^\infty g_n^{(1)}(1)^2\, e^{-\lambda_n^{(1)} t} = \sum_{n=1}^\infty g_n^{(2)}(1)^2\, e^{-\lambda_n^{(2)} t} \quad\text{for all } t \in (0,T)\,.$$
(j)
We note that gn (1) > 0 for j = 1, 2 by our normalization. Now we can apply a result from the theory of Dirichlet series (see Lemma 5.26) and conclude that (1) (2) (1) (2) λn = λn and gn (1) = gn (1) for all n ∈ N. Applying the uniqueness result analogous to Theorem 5.23, part (b), for the boundary conditions u(0) = 0 and u (1) = 0 (see Problem 5.5), we conclude that q1 = q2 . It remains to prove the following lemma: Lemma 5.26 Let λn and μn be strictly increasing sequences that tend to inﬁnity. Let the series ∞ ∞ αn e−λn t and βn e−μn t n=1
n=1
converge for every t ∈ (0, T ] and uniformly on some interval [δ, T ]. Let the limits coincide, that is ∞ n=1
αn e−λn t =
∞
βn e−μn t
for all t ∈ (0, T ] .
n=1
If we also assume that αn = 0 and βn = 0 for all n ∈ N, then αn = βn and λn = μn for all n ∈ N. Proof: Assume that λ1 = μ1 or α1 = β1 . Without loss of generality, we can assume that μ1 ≥ λ1 (otherwise, interchange the roles of λn and μn ). Deﬁne Cn (t) := αn e−(λn −λ1 )t − βn e−(μn −λ1 )t for t ≥ δ . ∞ By analytic continuation, we conclude that n=1 Cn (t) = 0 for all t ≥ δ and that the series converges uniformly on [δ, ∞). Because C1 (t) = α1 − β1 e−(μ1 −λ1 )t and α1 = β1 or μ1 > λ1 there exist > 0 and t1 > δ such that C1 (t) ≥ for all t ≥ t1 . Choose n0 ∈ N with n0 < for all t ≥ t1 . C (t) n 2 n=1 Then we conclude that n0 n0 n0 Cn (t) = C1 (t) − Cn (t) ≥ C1 (t) − Cn (t) ≥ 2 n=2 n=1 n=1 for all t ≥ t1 . Now we let t tend to inﬁnity. The ﬁrst ﬁnite sum converges to zero, which is a contradiction. Therefore, we have shown that λ1 = μ1 and α1 = β1 . Now we repeat the argument for n = 2, etc. This proves the lemma.
5.7 Numerical Reconstruction Techniques
In this section, we discuss some numerical algorithms for solving the inverse spectral problem which were suggested and tested by W. Rundell, P. Sacks, and others. We follow closely the papers [186, 233, 234]. From now on, we assume knowledge of the eigenvalues $\lambda_n$ and $\mu_n$, $n \in \mathbb{N}$, of the Sturm–Liouville eigenvalue problems (5.45) or (5.46). It is our aim to determine the unknown function $q$. Usually, only a finite number of eigenvalues is known. Then one cannot expect to recover the total function $q$ but only “some portion” of it (see (5.62)).

The first algorithm we discuss uses the concept of the characteristic function again. For simplicity, we describe the method only for the case where $q$ is known to be an even function; that is, $q(1 - x) = q(x)$. Then we know that only one spectrum suffices to recover $q$ (see Theorem 5.23). Recalling the characteristic function $f(\lambda) = u_2(1, \lambda, q)$ for the problem (5.45), the inverse problem can be written as the problem of solving the equations
\[ u_2(1, \lambda_n, q) = 0 \quad \text{for all } n \in \mathbb{N} \tag{5.61} \]
for the function $q$. If we know only a finite number, say $\lambda_n$ for $n = 1, \ldots, N$, then we assume that $q$ is of the form
\[ q(x; a) = \sum_{n=1}^N a_n\, q_n(x), \quad x \in [0, 1], \tag{5.62} \]
for coefficients $a = (a_1, \ldots, a_N) \in \mathbb{R}^N$ and some given linearly independent even functions $q_n$. If $q$ is expected to be smooth and periodic, a good choice for $q_n$ is $q_n(x) = \cos(2\pi(n-1)x)$, $n = 1, \ldots, N$. Equation (5.61) then reduces to the finite nonlinear system $F(a) = 0$, where $F : \mathbb{R}^N \to \mathbb{R}^N$ is defined by
\[ F_n(a) := u_2\big(1, \lambda_n, q(\cdot\,; a)\big) \quad \text{for } a \in \mathbb{R}^N \text{ and } n = 1, \ldots, N. \]
Therefore, all of the well-developed methods for solving systems of nonlinear equations can be used. For example, Newton's method
\[ a^{(k+1)} = a^{(k)} - F'\big(a^{(k)}\big)^{-1} F\big(a^{(k)}\big), \quad k = 0, 1, \ldots, \]
is known to be quadratically convergent if $F'(a)$ is regular. As we know from Section 5.2, Theorem 5.6, the mapping $F$ is continuously Fréchet differentiable for every $a \in \mathbb{R}^N$. The computation of the derivative is rather expensive, and in general, it is not known if $F'(a)$ is regular. In [186], it was proven that $F'(a)$ is regular for sufficiently small $a$ and is of triangular form for $a = 0$. This observation leads to the simplified Newton method of the form
\[ a^{(k+1)} = a^{(k)} - F'(0)^{-1} F\big(a^{(k)}\big), \quad k = 0, 1, \ldots. \]
For further aspects of this method, we refer to [186].
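The Newton update above is straightforward to implement once $F$ can be evaluated. The following Python sketch is our own illustration, not code from [186]: evaluating the true $F_n(a) = u_2(1, \lambda_n, q(\cdot\,; a))$ would additionally require an ODE solver, so a small hypothetical test system stands in for $F$, and the Jacobian is approximated by forward differences.

```python
import numpy as np

def newton(F, a0, tol=1e-10, max_iter=50, h=1e-7):
    """Newton iteration a_{k+1} = a_k - F'(a_k)^{-1} F(a_k), with the
    Jacobian F'(a_k) approximated column-wise by forward differences."""
    a = np.asarray(a0, dtype=float)
    for _ in range(max_iter):
        Fa = F(a)
        if np.linalg.norm(Fa) < tol:
            break
        J = np.empty((a.size, a.size))
        for j in range(a.size):
            e = np.zeros_like(a)
            e[j] = h
            J[:, j] = (F(a + e) - Fa) / h
        a = a - np.linalg.solve(J, Fa)
    return a

# Hypothetical toy system standing in for F_n(a) = u_2(1, λ_n, q(·; a));
# its root is a_1 = a_2 = (√5 - 1)/2.
F = lambda a: np.array([a[0] ** 2 + a[1] - 1.0, a[0] - a[1]])
a = newton(F, np.array([1.0, 0.0]))
```

The simplified Newton variant of the text would replace `J` by the fixed matrix $F'(0)$, trading quadratic for linear convergence but saving the Jacobian evaluation in every step.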
Before we describe a second algorithm, we observe that from the asymptotic form (5.28) of the eigenvalues, we have an estimate of $\hat{q} = \int_0^1 q(x)\,dx$. Writing the differential equation in the form
\[ -u_n''(x) + \big( q(x) - \hat{q} \big)\, u_n(x) = \big( \lambda_n - \hat{q} \big)\, u_n(x), \quad 0 \leq x \leq 1, \]
we observe that we can assume without loss of generality that $\int_0^1 q(x)\,dx = 0$.

Now we describe an algorithm that follows the idea of the uniqueness Theorem 5.22. We allow $q \in L^2(0, 1)$ to be arbitrary. The algorithm consists of two steps. First, we recover the Cauchy data $f = K(1, \cdot)$ and $g = K_x(1, \cdot)$ from the two sets of eigenvalues. Then we suggest Newton-type methods to compute $q$ from these Cauchy data.

The starting point is Theorem 5.19 for the case $p = 0$. We have already formulated this special case in Example 5.20. Therefore, let $(\lambda_n, u_n)$ be the eigenvalues and eigenfunctions of the eigenvalue problem (5.45) normalized such that $u_n'(0) = 1$. The eigenvalues $\lambda_n$ are assumed to be known. From Example 5.20, we have the representation
\[ u_n(x) = \frac{\sin\sqrt{\lambda_n}\, x}{\sqrt{\lambda_n}} + \int_0^x K(x, t)\, \frac{\sin\sqrt{\lambda_n}\, t}{\sqrt{\lambda_n}}\, dt, \quad 0 \leq x \leq 1, \tag{5.63} \]
where $K$ satisfies (5.53a)–(5.53c) with $K(1, 1) = \frac{1}{2} \int_0^1 q(t)\, dt = 0$. From (5.63) for $x = 1$, we can compute $K(1, t)$ because, by Theorem 5.21, the functions $v_n(t) = \sin\sqrt{\lambda_n}\, t$ form a complete system in $L^2(0, 1)$. When we know only a finite number $\lambda_1, \ldots, \lambda_N$ of eigenvalues, we suggest representing $K(1, \cdot)$ as a finite sum of the form
\[ K(1, t) = \sum_{k=1}^N a_k \sin(k\pi t), \]
arriving at the finite linear system
\[ \sum_{k=1}^N a_k \int_0^1 \sin(k\pi t)\, \sin\sqrt{\lambda_n}\, t\, dt \;=\; -\sin\sqrt{\lambda_n} \quad \text{for } n = 1, \ldots, N. \tag{5.64} \]
The same arguments yield a set of equations for the second boundary condition $u'(1) + H u(1) = 0$ in the form
\[ \sqrt{\mu_n}\, \cos\sqrt{\mu_n} + H \sin\sqrt{\mu_n} + \int_0^1 \big[ K_x(1, t) + H\, K(1, t) \big]\, \sin\sqrt{\mu_n}\, t\, dt = 0, \]
where now $\mu_n$ are the corresponding known eigenvalues. The representation
\[ K_x(1, t) + H\, K(1, t) = \sum_{k=1}^N b_k \sin(k\pi t) \]
leads to the system
\[ \sum_{k=1}^N b_k \int_0^1 \sin(k\pi t)\, \sin\sqrt{\mu_n}\, t\, dt \;=\; -\sqrt{\mu_n}\, \cos\sqrt{\mu_n} - H \sin\sqrt{\mu_n} \tag{5.65} \]
for $n = 1, \ldots, N$. Equations (5.64) and (5.65) are of the same form, and we restrict ourselves to the discussion of (5.64). Asymptotically, the matrix $A \in \mathbb{R}^{N \times N}$ defined by $A_{kn} = \int_0^1 \sin(k\pi t)\, \sin\sqrt{\lambda_n}\, t\, dt$ is just $\frac{1}{2} I$. More precisely, from Parseval's identity (see (A.8) from Theorem A.15 of Appendix A.2)
\[ \sum_{k=1}^\infty \bigg( \int_0^1 \psi(t)\, \sin(k\pi t)\, dt \bigg)^2 \;=\; \frac{1}{2} \int_0^1 \psi(t)^2\, dt, \]
we conclude (set $\psi(t) = \sin\sqrt{\lambda_n}\, t - \sin(n\pi t)$ for some $n \in \mathbb{N}$) that
\[ \sum_{k=1}^\infty \bigg( \int_0^1 \sin(k\pi t) \big[ \sin\sqrt{\lambda_n}\, t - \sin(n\pi t) \big]\, dt \bigg)^2 = \frac{1}{2} \int_0^1 \big[ \sin\sqrt{\lambda_n}\, t - \sin(n\pi t) \big]^2 dt \leq \frac{1}{2}\, \big| \sqrt{\lambda_n} - n\pi \big|^2, \]
where we used the mean value theorem. The estimate (5.30) yields $|\lambda_n - n^2\pi^2| \leq \tilde{c}\, \|q\|_\infty$ and thus
\[ \big| \sqrt{\lambda_n} - n\pi \big| \leq \frac{c\, \|q\|_\infty}{n}, \]
where $c$ is independent of $q$ and $n$. From this, we conclude that
\[ \sum_{k=1}^\infty \bigg( \int_0^1 \sin(k\pi t) \big[ \sin\sqrt{\lambda_n}\, t - \sin(n\pi t) \big]\, dt \bigg)^2 \leq \frac{c^2\, \|q\|_\infty^2}{2 n^2}. \]
The matrix $A$ is thus diagonally dominant and, therefore, invertible for sufficiently small $\|q\|_\infty$. Numerical experiments have shown that also for “large” values of $q$ the numerical solution of (5.65) does not cause any problems.

We are now facing the following inverse problem: Given (approximate values of) the Cauchy data $f = K(1, \cdot) \in H_{00}^1(0, 1)$ and $g = K_x(1, \cdot) \in L^2(0, 1)$, compute $q \in L^2(0, 1)$ such that the solution of the Cauchy problem (5.41a)–(5.41c) for $p = 0$ and $F = 0$ assumes the boundary data $K(x, x) = \frac{1}{2} \int_0^x q(t)\, dt$ for $x \in [0, 1]$. An alternative way of formulating the inverse problem is to start with the Goursat problem (5.53a)–(5.53c): Compute $q \in L^2(0, 1)$ such that the solution of the initial value problem (5.53a)–(5.53c) has Cauchy data $f(t) = K(1, t)$ and $g(t) = K_x(1, t)$ for $t \in [0, 1]$.
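To make the linear system (5.64) concrete, the following sketch (our own illustration, not code from [186, 233, 234]) assembles the matrix $A_{nk} = \int_0^1 \sin(k\pi t)\, \sin(\sqrt{\lambda_n}\, t)\, dt$ by the trapezoidal rule and solves for the sine coefficients of $K(1, \cdot)$. For the test case $q = 0$, the eigenvalues are exactly $\lambda_n = n^2\pi^2$, $A$ reduces to $\frac{1}{2} I$, the right-hand side $-\sin\sqrt{\lambda_n}$ vanishes, and hence $K(1, \cdot) = 0$.

```python
import numpy as np

def solve_for_K1(lams, m=4000):
    """Assemble A[n, k] = ∫_0^1 sin((k+1)πt) sin(√λ_n t) dt by the
    trapezoidal rule and solve (5.64) for the sine coefficients of K(1,·)."""
    N = len(lams)
    t = np.linspace(0.0, 1.0, m + 1)
    h = t[1] - t[0]
    A = np.empty((N, N))
    for n, lam in enumerate(lams):
        sn = np.sin(np.sqrt(lam) * t)
        for k in range(1, N + 1):
            y = np.sin(k * np.pi * t) * sn
            A[n, k - 1] = h * (y.sum() - 0.5 * (y[0] + y[-1]))  # trapezoid
    rhs = -np.sin(np.sqrt(np.array(lams, dtype=float)))
    return np.linalg.solve(A, rhs)

# For q = 0 the Dirichlet eigenvalues are exactly λ_n = n²π² and K ≡ 0,
# so the computed coefficients must (numerically) vanish.
lams0 = [(n * np.pi) ** 2 for n in range(1, 6)]
a0 = solve_for_K1(lams0)
```

For perturbed eigenvalues the matrix stays close to $\frac{1}{2} I$, which is exactly the diagonal-dominance argument made above.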
We have studied these coupled systems for $K$ and $q$ in Theorem 5.18. Here we apply it for the case where $F = 0$. It has been shown that the pair $(K, r)$ solves the system
\[ \frac{\partial^2 K(x, t)}{\partial x^2} - \frac{\partial^2 K(x, t)}{\partial t^2} - q(x)\, K(x, t) = 0 \quad \text{in } \Delta_0, \]
\[ K(x, x) = \frac{1}{2} \int_0^x r(t)\, dt, \quad 0 \leq x \leq 1, \qquad K(x, 0) = 0, \quad 0 \leq x \leq 1, \]
and
\[ K(1, t) = f(t) \quad \text{and} \quad \frac{\partial K}{\partial x}(1, t) = g(t) \quad \text{for all } t \in [0, 1], \]
if and only if $w(\xi, \eta) = K(\xi + \eta, \xi - \eta)$ and $r$ solve the system of integral equations (5.44a) and (5.44b). For this special choice of $F$, (5.44b) reduces to
\[ r(x) = -\frac{1}{2} \int_x^1 q(s)\, K(s, 2x - s)\, ds + g(2x - 1) + f'(2x - 1), \tag{5.66} \]
where we have extended $f$ and $g$ to odd functions on $[-1, 1]$. Denote by $T(q)$ the expression on the right-hand side of (5.66). For the evaluation of $T(q)$, one has to solve the Cauchy problem (5.41a)–(5.41c) for $p = 0$. Note that the solution $K$, that is, the kernel $K(y, 2x - y)$ of the integral operator $T$, also depends on $q$. The operator $T$ is therefore nonlinear! The requirement $r = q$ leads to a fixed point equation $q = 2\, T(q)$ in $L^2(0, 1)$. It was shown in [233] that there exists at most one fixed point $q \in L^\infty(0, 1)$ of $T$. Even more, Rundell and Sacks proved that the projected operator $P_M T$ is a contraction on the ball $B_M := \{ q \in L^\infty(0, 1) : \|q\|_\infty \leq M \}$ with respect to some weighted $L^\infty$ norms. Here, $P_M$ denotes the projection onto $B_M$ defined by
\[ (P_M q)(x) = \begin{cases} q(x), & |q(x)| \leq M, \\ M \operatorname{sign} q(x), & |q(x)| > M. \end{cases} \]
Also, they showed the effectiveness of the iteration method $q^{(k+1)} = 2\, T\big(q^{(k)}\big)$ by several numerical examples. We observe that for $q^{(0)} = 0$ the first iterate $q^{(1)}$ is simply
\[ q^{(1)}(x) = 2\, g(2x - 1) + 2\, f'(2x - 1), \quad x \in [0, 1]. \]
We refer to [233] for more details.

As suggested earlier, an alternative numerical procedure based on the kernel function $K$ is to define the operator $S$ from $L^2(0, 1)$ into $H_{00}^1(0, 1) \times L^2(0, 1)$ by $S(q) = \big( K(1, \cdot),\, K_x(1, \cdot) \big)$, where $K$ solves the Goursat problem (5.53a)–(5.53c) in the weak sense. This operator is well-defined and bounded by Theorem 5.15, part (c). If $f \in H_{00}^1(0, 1)$ and $g \in L^2(0, 1)$ are the given Cauchy values $K(1, \cdot)$
and $K_x(1, \cdot)$, respectively, then we have to solve the nonlinear equation $S(q) = (f, g)$. Newton's method does it by the iteration procedure
\[ q^{(k+1)} = q^{(k)} - S'\big(q^{(k)}\big)^{-1} \big[ S\big(q^{(k)}\big) - (f, g) \big], \quad k = 0, 1, \ldots. \tag{5.67} \]
For the implementation, one has to compute the Fréchet derivative of $S$. Using the Volterra equation (5.40) derived in the proof of Theorem 5.15, it is not difficult to prove that $S$ is Fréchet differentiable and that $S'(q)\, r = \big( W(1, \cdot),\, W_x(1, \cdot) \big)$, where $W$ solves the inhomogeneous Goursat problem
\[ W_{xx}(x, t) - W_{tt}(x, t) - q(x)\, W(x, t) = K(x, t)\, r(x) \quad \text{in } \Delta_0, \tag{5.68a} \]
\[ W(x, 0) = 0, \quad 0 \leq x \leq 1, \tag{5.68b} \]
\[ W(x, x) = \frac{1}{2} \int_0^x r(t)\, dt, \quad 0 \leq x \leq 1. \tag{5.68c} \]
In part (b) of Theorem 5.18, we showed that $S'(q)$ is an isomorphism. We reformulate this result.

Theorem 5.27 Let $q \in L^2(0, 1)$ and $K$ be the weak solution of (5.53a)–(5.53c). For every $f \in H_{00}^1(0, 1)$ and $g \in L^2(0, 1)$, there exist a unique $r \in L^2(0, 1)$ and a weak solution $W$ of (5.68a)–(5.68c) with $W(1, \cdot) = f$ and $W_x(1, \cdot) = g$ in the sense of (5.44a), (5.44b); that is, $S'(q)$ is an isomorphism.

Implementing Newton's method is quite expensive because in every step one has to solve a coupled system of the form (5.68a)–(5.68c). Rundell and Sacks suggested a simplified Newton method of the form
\[ q^{(k+1)} = q^{(k)} - S'(0)^{-1} \big[ S\big(q^{(k)}\big) - (f, g) \big], \quad k = 0, 1, \ldots. \]
Because $S(0) = 0$, we can invert the linear operator $S'(0)$ analytically. In particular, we have $S'(0)\, r = \big( W(1, \cdot),\, W_x(1, \cdot) \big)$, where $W$ now solves
\[ W_{xx}(x, t) - W_{tt}(x, t) = 0 \quad \text{in } \Delta_0, \qquad W(x, 0) = 0 \quad \text{and} \quad W(x, x) = \frac{1}{2} \int_0^x r(t)\, dt, \quad 0 \leq x \leq 1, \]
because also $K = 0$. The solution $W$ of the Cauchy problem
\[ W_{xx}(x, t) - W_{tt}(x, t) = 0 \quad \text{in } \Delta_0, \qquad W(1, t) = f(t), \quad W_x(1, t) = g(t), \quad 0 \leq t \leq 1, \]
is given by
\[ W(x, t) = -\frac{1}{2} \int_{t - (1 - x)}^{t + (1 - x)} g(\tau)\, d\tau + \frac{1}{2}\, f\big( t + (1 - x) \big) + \frac{1}{2}\, f\big( t - (1 - x) \big), \]
where we have extended $f$ and $g$ to odd functions again. The solution $r$ of $S'(0)\, r = (f, g)$ is, therefore, given by
\[ r(x) = 2\, \frac{d}{dx} W(x, x) = 2\, f'(2x - 1) + 2\, g(2x - 1). \]
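The explicit inversion of $S'(0)$ can be checked numerically (our own sanity check, with arbitrarily chosen odd data $f$ and $g$): build $W(x, x)$ from the d'Alembert formula, differentiate along the diagonal by a central difference, and compare with $2 f'(2x - 1) + 2 g(2x - 1)$.

```python
import numpy as np

# Hypothetical smooth odd Cauchy data on [-1, 1], chosen only for this check.
f  = lambda t: np.sin(np.pi * t)           # odd
fp = lambda t: np.pi * np.cos(np.pi * t)   # derivative of f
g  = lambda t: np.sin(2 * np.pi * t)       # odd

def W_diag(x, m=2000):
    """W(x, x) from the d'Alembert formula: on the diagonal t = x it reads
    W(x, x) = -1/2 * ∫_{2x-1}^{1} g(τ) dτ + 1/2 * [ f(1) + f(2x-1) ]."""
    tau = np.linspace(2 * x - 1, 1, m + 1)
    y = g(tau)
    integral = (tau[1] - tau[0]) * (y.sum() - 0.5 * (y[0] + y[-1]))
    return -0.5 * integral + 0.5 * (f(1.0) + f(2 * x - 1))

# compare r(x) = 2 d/dx W(x, x) (central difference) with the closed form
x, h = 0.7, 1e-5
r_numeric = 2 * (W_diag(x + h) - W_diag(x - h)) / (2 * h)
r_formula = 2 * fp(2 * x - 1) + 2 * g(2 * x - 1)
```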
In this chapter, we have studied only one particular inverse eigenvalue problem. Similar theoretical results and constructive algorithms can be obtained for other inverse spectral problems; see [4, 16]. For an excellent overview, we refer to the lecture notes by W. Rundell [232].
5.8 Problems
5.1 Let $q, f \in C[0, 1]$ and $q(x) \geq 0$ for all $x \in [0, 1]$.

(a) Show that the following boundary value problem on $[0, 1]$ has at most one solution $u \in C^2[0, 1]$:
\[ -u''(x) + q(x)\, u(x) = f(x), \quad u(0) = u(1) = 0. \tag{5.69} \]
(b) Let $v_1$ and $v_2$ be the solutions of the following initial value problems on $[0, 1]$:
\[ -v_1''(x) + q(x)\, v_1(x) = 0, \quad v_1(0) = 0, \; v_1'(0) = 1, \]
\[ -v_2''(x) + q(x)\, v_2(x) = 0, \quad v_2(1) = 0, \; v_2'(1) = 1. \]
Show that the Wronskian $v_1 v_2' - v_2 v_1'$ is constant. Define the following function for some $a \in \mathbb{R}$:
\[ G(x, y) = \begin{cases} a\, v_1(x)\, v_2(y), & 0 \leq x \leq y \leq 1, \\ a\, v_2(x)\, v_1(y), & 0 \leq y < x \leq 1. \end{cases} \]
Determine $a \in \mathbb{R}$ such that
\[ u(x) := \int_0^1 G(x, y)\, f(y)\, dy, \quad x \in [0, 1], \]
solves (5.69). The function $G$ is called Green's function of the boundary value problem (5.69).

(c) Show that the eigenvalue problem
\[ -u''(x) + q(x)\, u(x) = \lambda\, u(x), \quad u(0) = u(1) = 0, \]
is equivalent to the eigenvalue problem for the integral equation
\[ u(x) = \lambda \int_0^1 G(x, y)\, u(y)\, dy, \quad x \in [0, 1]. \]
Prove Theorem 5.7, parts (a) and (b), by the general spectral theorem (Theorem A.53 of Appendix A.6).

(d) How can one treat the case of part (c) when $q$ changes sign?
5.2 Let $H \in \mathbb{R}$. Prove that the transcendental equation $z \cot z + H = 0$ has a countable number of zeros $z_n$ and that
\[ z_n = (n + 1/2)\, \pi + \frac{H}{(n + 1/2)\, \pi} + O(1/n^2). \]
From this,
\[ z_n^2 = (n + 1/2)^2 \pi^2 + 2H + O(1/n) \]
follows. Hint: Make the substitution $z = x + (n + 1/2)\pi$, set $\varepsilon = 1/(n + 1/2)\pi$, write $z \cot z + H = 0$ in the form $f(x, \varepsilon) = 0$, and apply the implicit function theorem.

5.3 Prove Lemma 5.13.

5.4 Let $q \in C[0, 1]$ be real- or complex-valued and $\lambda_n, g_n$ be the eigenvalues and $L^2$-normalized eigenfunctions, respectively, corresponding to $q$ and the boundary conditions $u(0) = 0$ and $h u'(1) + H u(1) = 0$. Show by modifying the proof of Theorem 5.21 that $\{g_n : n \in \mathbb{N}\}$ is complete in $L^2(0, 1)$. This gives—even for real $q$—a proof different from the one obtained by applying the general spectral theory.

5.5 Consider the eigenvalue problem on $[0, 1]$:
\[ -u''(x) + q(x)\, u(x) = \lambda\, u(x), \quad u(0) = u'(1) = 0. \]
By modifying the proof of Theorem 5.22, prove the following uniqueness result for the inverse problem: Let $(\lambda_n, u_n)$ and $(\mu_n, v_n)$ be the eigenvalues and eigenfunctions corresponding to $p$ and $q$, respectively. If $\lambda_n = \mu_n$ for all $n \in \mathbb{N}$ and
\[ \frac{u_n(1)}{u_n'(0)} = \frac{v_n(1)}{v_n'(0)} \quad \text{for all } n \in \mathbb{N}, \]
then $p$ and $q$ coincide.
Chapter 6

An Inverse Problem in Electrical Impedance Tomography

6.1 Introduction
Electrical impedance tomography (EIT) is a medical imaging technique in which an image of the conductivity (or permittivity) of part of the body is determined from electrical surface measurements. Typically, conducting electrodes are attached to the skin of the subject, and small alternating currents are applied to some or all of the electrodes. The resulting electrical potentials are measured, and the process may be repeated for numerous different configurations of applied currents. Applications of EIT as an imaging tool can be found in fields such as medicine (monitoring of the lung function or the detection of skin cancer or breast cancer), geophysics (locating underground deposits, detection of leaks in underground storage tanks), or nondestructive testing (determination of cracks in materials).

To derive the EIT model, we start from the time-harmonic Maxwell system in the form
\[ \operatorname{curl} H + (i\omega\varepsilon - \gamma)\, E = 0, \qquad \operatorname{curl} E - i\omega\mu\, H = 0 \]
in some domain which we take as a cylinder of the form $B \times \mathbb{R} \subset \mathbb{R}^3$ with bounded cross-section $B \subset \mathbb{R}^2$. Here, $\omega$, $\varepsilon$, $\gamma$, and $\mu$ denote the frequency, electric permittivity, conductivity, and magnetic permeability, respectively, which are all assumed to be constant along the axis of the cylinder; that is, they depend on $x_1$ and $x_2$ only. We note that the real parts $\operatorname{Re}\big[ \exp(-i\omega t)\, E(x) \big]$ and $\operatorname{Re}\big[ \exp(-i\omega t)\, H(x) \big]$ are the physically meaningful electric and magnetic fields, respectively. For low frequencies $\omega$ (i.e., for small $\omega\mu\gamma \cdot L^2$ where $L$ is a typical
© Springer Nature Switzerland AG 2021 A. Kirsch, An Introduction to the Mathematical Theory of Inverse Problems, Applied Mathematical Sciences 120, https://doi.org/10.1007/9783030633431 6
length scale of $B$), one can show (see, e.g., [47]) that the Maxwell system is approximated by
\[ \operatorname{curl} H - \gamma\, E = 0, \qquad \operatorname{curl} E = 0. \]
The second equation yields the existence¹ of a scalar potential $u$ such that $E = -\nabla u$. Substituting this into the first equation and taking the divergence yields $\operatorname{div}(\gamma \nabla u) = 0$ in the cylinder. We restrict ourselves to the two-dimensional case and consider the conductivity equation
\[ \operatorname{div}(\gamma \nabla u) = 0 \quad \text{in } B. \tag{6.1} \]
There are several possibilities for modeling the attachment of the electrodes on the boundary $\partial B$ of $B$. The simplest of these is the continuum model in which the potential $U = u|_{\partial B}$ and the boundary current distribution $f = \gamma\, \nabla u \cdot \nu = \gamma\, \partial u/\partial\nu$ are both given on the boundary $\partial B$. Here, $\nu = \nu(x)$ is the unit normal vector at $x \in \partial B$ directed into the exterior of $B$. First, we observe that,² by the divergence theorem,
\[ 0 = \int_B \operatorname{div}(\gamma \nabla u)\, dx = \int_{\partial B} \gamma\, \frac{\partial u}{\partial \nu}\, ds = \int_{\partial B} f\, ds; \]
that is, the boundary current distribution $f$ has zero mean. In practice, $f(x)$ is not known for all $x \in \partial B$. One actually knows the currents sent along wires attached to $N$ discrete electrodes that in turn are attached to the boundary $\partial B$. Therefore, in the gap model one approximates $f$ by assuming that $f$ is constant at the surface of each electrode and zero in the gaps between the electrodes.

An even better choice is the complete model. Suppose that $f_j$ is the electric current sent through the wire attached to the $j$th electrode. At the surface $S_j$ of this electrode, the current density satisfies
\[ \int_{S_j} \gamma\, \frac{\partial u}{\partial \nu}\, ds = f_j. \]
In the gaps between the electrodes, we have
\[ \gamma\, \frac{\partial u}{\partial \nu} = 0 \quad \text{in } \partial B \setminus \bigcup_j S_j. \]
If electrochemical effects at the contact of $S_j$ with $\partial B$ are taken into account, the Dirichlet boundary condition $u = U_j$ on $S_j$ is replaced by
\[ u + z_j\, \gamma\, \frac{\partial u}{\partial \nu} = U_j \quad \text{on } S_j, \]
where $z_j$ denotes the surface impedance of the $j$th electrode. We refer to [20, 145, 146, 252] for a discussion of these electrode models and the well-posedness of the corresponding boundary value problems (for given $\gamma$).

¹ If the domain is simply connected.
² Provided $\gamma$, $f$, and $u$ are smooth enough.
In the inverse problem of EIT, the conductivity function $\gamma$ is unknown and has to be determined from simultaneous measurements of the boundary voltages $U$ and current densities $f$, respectively. In this introductory chapter on EIT, we restrict ourselves to the continuum model as the simplest electrode model. We start with the precise mathematical formulation of the direct and the inverse problem and prove well-posedness of the direct problem: existence, uniqueness, and continuous dependence on both the boundary data $f$ and the conductivity $\gamma$. Then we consider the inverse problem of EIT. The question of uniqueness is addressed, and we prove uniqueness of the linearized inverse problem. This problem is interesting also from a historical point of view because the proof, given in Calderón's fundamental paper [38], has influenced research on inverse medium problems monumentally. In the last section, we introduce a technique to determine the support of the contrast $\gamma - \gamma_1$, where $\gamma_1$ denotes the known background conductivity. This factorization method has been developed fairly recently—after publication of the first edition of this monograph—and is a prominent member of a whole class of newly developed methods subsumed under the name Sampling Methods.
6.2 The Direct Problem and the Neumann–Dirichlet Operator
Let $B \subset \mathbb{R}^2$ be a given bounded domain with boundary $\partial B$, and let $\gamma : B \to \mathbb{R}$ and $f : \partial B \to \mathbb{R}$ be given real-valued functions. The direct problem is to determine $u$ such that
\[ \operatorname{div}(\gamma \nabla u) = 0 \quad \text{in } B, \qquad \gamma\, \frac{\partial u}{\partial \nu} = f \quad \text{on } \partial B. \tag{6.2} \]
Throughout this chapter, $\nu = \nu(x)$ again denotes the exterior unit normal vector at $x \in \partial B$. As mentioned in the introduction, we have to assume that $\int_{\partial B} f\, ds = 0$. Therefore, throughout this chapter, we make the following assumptions on $B$, $\gamma$, and $f$:

Assumption 6.1
(a) $B \subset \mathbb{R}^2$ is a bounded Lipschitz domain³ such that the exterior of $B$ is connected.
(b) $\gamma \in L^\infty(B)$, and there exists $\gamma_0 > 0$ such that $\gamma(x) \geq \gamma_0$ for almost all $x \in B$.
(c) $f \in L^2_\diamond(\partial B)$, where
\[ L^2_\diamond(\partial B) = \Big\{ f \in L^2(\partial B) : \int_{\partial B} f\, ds = 0 \Big\}. \]

³ For a definition see, e.g., [191, 161].
In this chapter and the following one on scattering theory, we have to use Sobolev spaces as the appropriate function spaces for the solution. For a general theory of Sobolev spaces, we refer to the standard literature such as [1, 191]. For obvious reasons, we also refer to the monograph [161], Sections 4.1 and 5.1. Also, in Appendix A.5, we introduce and study the Sobolev space $H^1(B)$ for the particular case of $B$ being the unit disc. At this place we recall only the very basic definition. For any open and bounded set $B \subset \mathbb{R}^2$, the Sobolev space $H^1(B)$ is defined as the completion of $C^1(\overline{B})$ with respect to the norm
\[ \|u\|_{H^1(B)} = \sqrt{ \int_B \big( |\nabla u|^2 + |u|^2 \big)\, dx }. \]
An important property of Sobolev spaces on Lipschitz domains is the existence of traces; that is, for every $u \in H^1(B)$ the trace $u|_{\partial B}$ on $\partial B$ is well-defined and represents an $L^2(\partial B)$ function⁴ (Theorem 5.10 in [161]). Also, there exists $c_T > 0$ (independent of $u$) such that $\big\| u|_{\partial B} \big\|_{L^2(\partial B)} \leq c_T\, \|u\|_{H^1(B)}$ for all $u \in H^1(B)$; that is, the trace operator $u \mapsto u|_{\partial B}$ is bounded. We note that the solution $u$ of (6.2) is unique only up to an additive constant. Therefore, we normalize the solution $u \in H^1(B)$ such that it has vanishing mean on the boundary; that is, $u \in H^1_\diamond(B)$, where
\[ H^1_\diamond(B) = \Big\{ u \in H^1(B) : \int_{\partial B} u\, ds = 0 \Big\}. \tag{6.3} \]
The formulation (6.2) of the boundary value problem has to be understood in the variational (or weak) sense. By multiplying the first equation of (6.2) with some test function $\psi$ and using Green's first formula, we arrive at
\[ 0 = \int_B \psi\, \operatorname{div}(\gamma \nabla u)\, dx = -\int_B \gamma\, \nabla\psi \cdot \nabla u\, dx + \int_{\partial B} \psi\, \gamma\, \nabla u \cdot \nu\, ds = -\int_B \gamma\, \nabla\psi \cdot \nabla u\, dx + \int_{\partial B} \psi\, f\, ds. \]
Therefore, we define the variational solution $u \in H^1_\diamond(B)$ of (6.2) as the solution of
\[ \int_B \gamma\, \nabla\psi \cdot \nabla u\, dx = \int_{\partial B} \psi\, f\, ds \quad \text{for all } \psi \in H^1_\diamond(B). \tag{6.4} \]
Existence and uniqueness follow from the representation theorem due to Riesz (cf. Theorem A.23 of Appendix A.3).

Theorem 6.2 Let Assumption 6.1 be satisfied. For every $f \in L^2_\diamond(\partial B)$ there exists a unique variational solution $u \in H^1_\diamond(B)$ of (6.2), that is, a solution

⁴ The trace is even more regular and belongs to the fractional Sobolev space $H^{1/2}(\partial B)$.
219
of the variational equation (6.4). Furthermore, there exists a constant c > 0 (independent of f ) such that uH 1 (B) ≤ c f L2 (∂B) . In other words: the operator f → u from L2 (∂B) to H1 (B) is bounded. Proof: We deﬁne a new inner product in the space H1 (B) by (u, v)∗ = γ ∇u · ∇v dx , u, v ∈ H1 (B) . B
The corresponding norm u∗ = (u, u)∗ is equivalent to the ordinary norm · H 1 (B) in H1 (B). This follows from Friedrich’s inequality in the form (see Theorem A.50 for the case of B being the unit disc and [1, 191] for more general Lipschitz domains): There exists cF > 0 such that vL2 (B) ≤ cF ∇vL2 (B)
for all v ∈ H1 (B) .
(6.5)
Indeed, from this the equivalence follows inasmuch as γ0 v2H 1 (B) ≤ γ0 ∇v2L2 (B) ≤ v2∗ ≤ γ∞ v2H 1 (B) 1 + c2F
(6.6)
for all v ∈ H1 (B). For ﬁxed f ∈ L2 (∂B) we can interpret the righthand side of (6.4) as a linear functional F on the space H1 (B); that is, F (ψ) = (f, ψ)L2 (∂B) for ψ ∈ H1 (B). This functional F is bounded by the inequality of CauchySchwarz and the trace theorem (with constant cT > 0) because F (ψ) ≤ f L2 (∂B) ψL2 (∂B) ≤ cT f L2 (∂B) ψH 1 (B) ≤ c f L2 (∂B) ψ∗ with c = cT (1 + c2F )/γ0 . In particular, F H1 (B)∗ ≤ c f L2 (∂B) , and we can apply the representation theorem of Riesz in the Hilbert space H1 (B), (·, ·)∗ : there exists a unique u ∈ H1 (B) with (u, ψ)∗ = F (ψ) for all ψ ∈ H1 (B). This is exactly the variational equation (6.4). Furthermore, u∗ = F H1 (B)∗ and thus by (6.6), u2H 1 (B) ≤
1 + c2F 1 + c2F 1 + c2F u2∗ = F 2H1 (B)∗ ≤ c2 f 2L2 (∂B) ; γ0 γ0 γ0
that is, the operator f → u is bounded from L2 (∂B) into H1 (B).
This theorem implies the existence and boundedness of the Neumann–Dirichlet operator. Definition 6.3 The Neumann–Dirichlet operator Λ : L2 (∂B) → L2 (∂B) is deﬁned by Λf = u∂B , where u ∈ H1 (B) is the uniquely determined variational solution of (6.2); that is, the solution of (6.4).
220
Electrical Impedance Tomography
Remark: This operator is bounded by the boundedness of the solution map f → u from L2 (∂B) to H1 (B) and the boundedness of the trace operator from H1 (B) to L2 (∂B). It is even compact because the trace operator is compact from H1 (B) to L2 (∂B). However, we do not make use of this latter property. We show some properties of the Neumann–Dirichlet operator. Theorem 6.4 Let Assumption 6.1 be satisﬁed. Then the Neumann–Dirichlet map Λ is selfadjoint and positive; that is, (Λf, g)L2 (∂B) = (f, Λg)L2 (∂B) and (Λf, f )L2 (∂B) > 0 for all f, g ∈ L2 (∂B), f = 0. Proof: This follows simply from the deﬁnition of Λ and Green’s ﬁrst identity. Let u, v ∈ H1 (B) be the solutions of (6.4) corresponding to boundary data f and g, respectively. Then, by (6.4) for the pair g, v and the choice ψ = u (note that u∂B = Λf ), u g d = γ ∇u · ∇v dx , (Λf, g)L2 (∂B) = ∂B
B
and this term is symmetric with respect to u and v. For f = g this also yields that Λ is positive. In the following, we write Λγ to indicate the dependence on γ. The next interesting property is a monotonicity result. Theorem 6.5 Let γ1 , γ2 ∈ L∞ (B) with γ1 ≥ γ2 ≥ γ0 a.e. on B. Then Λγ1 ≤ Λγ2 in the sense that Λγ1 f, f L2 (∂B) ≤ Λγ2 f, f L2 (∂B) for all f ∈ L2 (∂B) . Proof: For ﬁxed f ∈ L2 (∂B) let uj ∈ H1 (B) be the corresponding solution of (6.2) for γj , j = 1, 2. From (6.4) (for γ = γ2 , u = u2 , and ψ = u1 − u2 ) we conclude that (Λγ1 − Λγ2 )f, f L2 (∂B) = (u1 − u2 ) f d = γ2 (∇u1 − ∇u2 ) · ∇u2 dx ∂B
which proves the result.
B
γ2 ∇u1 2 − ∇u2 2 − ∇(u1 − u2 )2 dx
=
1 2
≤
1 2
≤
1 2
=
1 (Λγ1 − Λγ2 )f, f L2 (∂B) 2
B
B
B
γ2 ∇u1 2 dx −
1 2
1 γ1 ∇u1 2 dx − 2
γ2 ∇u2 2 dx
B
B
γ2 ∇u2 2 dx
6.3
6.3
The Inverse Problem
221
The Inverse Problem
As described in the introduction, the problem of electrical impedance tomography is to determine (properties of) the conductivity distribution $\gamma$ from all—or at least a large number of—pairs $(f, u|_{\partial B})$. Because $u|_{\partial B} = \Lambda f$, we can rephrase this problem as follows:

Inverse Problem: Determine the conductivity $\gamma$ from the given Neumann–Dirichlet operator $\Lambda_\gamma : L^2_\diamond(\partial B) \to L^2_\diamond(\partial B)$!

As we have seen already in the previous chapter (and this is typical for studying inverse problems), an intensive investigation of the direct problem has to precede the treatment of the inverse problem. In particular, we study the dependence of the Neumann–Dirichlet map on $\gamma$. First, we show with an example that the inverse problem of impedance tomography is ill-posed.

Example 6.6 Let $B = B(0, 1)$ be the unit disk, $\hat{q} > 0$ constant, and $R \in (0, 1)$. We define $\gamma_R \in L^\infty(B)$ by
\[ \gamma_R(x) = \begin{cases} 1, & R < |x| < 1, \\ 1 + \hat{q}, & |x| < R. \end{cases} \]
Because $\gamma_R$ is piecewise constant, the solution $u \in H^1_\diamond(B)$ is a harmonic function in $B(0, 1) \setminus \{x : |x| = R\}$ (that is, $\Delta u = 0$ in $B(0, 1) \setminus \{x : |x| = R\}$) and satisfies the jump conditions $u_- = u_+$ and $(1 + \hat{q})\, \partial u/\partial r|_- = \partial u/\partial r|_+$ for $|x| = R$, where $v_\pm$ denotes the trace of $v$ from the interior ($-$) and exterior ($+$) of $\{x : |x| = R\}$, respectively. We refer to Problem 6.1 for a justification of this statement. We solve the boundary value problem (6.2) by expanding the boundary data $f \in L^2_\diamond(\partial B)$ and the solution $u$ into Fourier series; that is, for
\[ f(\varphi) = \sum_{n \neq 0} f_n\, e^{in\varphi}, \quad \varphi \in [0, 2\pi], \]
we make an ansatz for the solution of (6.2) in the form
\[ u(r, \varphi) = \begin{cases} \displaystyle \sum_{n \neq 0} (b_n + c_n) \Big( \frac{r}{R} \Big)^{|n|} e^{in\varphi}, & r < R, \\[2mm] \displaystyle \sum_{n \neq 0} \Big[ b_n \Big( \frac{r}{R} \Big)^{|n|} + c_n \Big( \frac{r}{R} \Big)^{-|n|} \Big] e^{in\varphi}, & r > R. \end{cases} \]
The ansatz already guarantees that $u$ is continuous on the circle $r = R$. The unknown coefficients $b_n, c_n$ are determined from the conditions $(1 + \hat{q})\, \partial u/\partial r|_- = \partial u/\partial r|_+$ for $r = R$ and $\partial u/\partial r = f$ for $r = 1$. This yields the set of equations
\[ (1 + \hat{q})\, (b_n + c_n) = b_n - c_n \quad \text{and} \quad |n|\, \frac{b_n}{R^{|n|}} - |n|\, c_n\, R^{|n|} = f_n \]
for all $n \neq 0$, which yields explicit formulas for $b_n$ and $c_n$. Substituting this into the form of $u$ and taking $r = 1$ yields
\[ (\Lambda_{\gamma_R} f)(\varphi) = u(1, \varphi) = \sum_{n \neq 0} \frac{\alpha - R^{2|n|}}{|n|\, \big( \alpha + R^{2|n|} \big)}\, f_n\, e^{in\varphi}, \quad \varphi \in [0, 2\pi], \tag{6.7} \]
with $\alpha = 1 + 2/\hat{q}$. We observe that $\Lambda_{\gamma_R}$ is a diagonal operator from $L^2(0, 2\pi)$ into itself with eigenvalues that behave asymptotically as $1/|n|$. Therefore, the natural setting for $\Lambda_{\gamma_R}$ is to consider it as an operator from the Sobolev space $H^{-1/2}(0, 2\pi)$ of order $-1/2$ into the Sobolev space $H^{1/2}(0, 2\pi)$ of order $1/2$; see Section A.4 of Appendix A. We prefer the setting in $L^2(0, 2\pi)$ because the more general setting does not give any more insight with respect to the inverse problem. Let $\Lambda_1$ be the operator for $\gamma = 1$, which is given by
\[ (\Lambda_1 f)(\varphi) = \sum_{n \neq 0} \frac{f_n}{|n|}\, e^{in\varphi}, \quad \varphi \in [0, 2\pi]. \]
We estimate the difference by
\[ \big\| (\Lambda_{\gamma_R} - \Lambda_1) f \big\|_{L^2(0, 2\pi)}^2 = 2\pi \sum_{n \neq 0} \bigg( \frac{\alpha - R^{2|n|}}{\alpha + R^{2|n|}} - 1 \bigg)^2 \frac{|f_n|^2}{n^2} = 8\pi \sum_{n \neq 0} \frac{R^{4|n|}}{\big( \alpha + R^{2|n|} \big)^2}\, \frac{|f_n|^2}{n^2} \leq \frac{4 R^4}{\alpha^2}\, \|f\|_{L^2(0, 2\pi)}^2; \]
that is,
\[ \big\| \Lambda_{\gamma_R} - \Lambda_1 \big\|_{\mathcal{L}(L^2(0, 2\pi))} \leq \frac{2 R^2}{\alpha} \leq 2 R^2, \]
because $\alpha \geq 1$. Therefore, we have convergence of $\Lambda_{\gamma_R}$ to $\Lambda_1$ in the operator norm as $R$ tends to zero. On the other hand, the difference $\|\gamma_R - 1\|_\infty = \hat{q}$ is constant and does not converge to zero as $R$ tends to zero. This shows clearly that the inverse problem to determine $\gamma_R$ from $\Lambda$ is ill-posed. One can argue that perhaps the sup-norm for $\gamma$ is not appropriate to measure the error in $\gamma$. Our example, however, shows that even if we replace $\hat{q}$ by a constant $\hat{q}_R$ which depends on $R$ such that $\lim_{R \to 0} \hat{q}_R = \infty$, we still have convergence of $\Lambda_{\gamma_R}$ to $\Lambda_1$ in the operator norm as $R$ tends to zero. Taking, for example, $\hat{q}_R = \hat{q}/R^3$, we observe that $\|\gamma_R - 1\|_{L^p(B)} \to \infty$ as $R$ tends to zero for arbitrary $p \geq 1$, and the problem of impedance tomography is also ill-posed with respect to any $L^p$ norm.
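Since $\Lambda_{\gamma_R} - \Lambda_1$ is diagonal with entries $-2R^{2|n|}/\big(|n|(\alpha + R^{2|n|})\big)$, the operator-norm bound is easy to check numerically. The following sketch is our own illustration of this computation (including the monotonicity of Theorem 6.5 for this radial example), not part of the text.

```python
import numpy as np

def op_norm_diff(R, qhat, nmax=500):
    """Largest eigenvalue magnitude of the diagonal operator Λ_{γ_R} - Λ_1,
    using (6.7): the n-th eigenvalue of Λ_{γ_R} is (α - R^{2n})/(n(α + R^{2n}))
    with α = 1 + 2/q̂, and that of Λ_1 is 1/n."""
    n = np.arange(1, nmax + 1, dtype=float)
    alpha = 1.0 + 2.0 / qhat
    diff = np.abs((alpha - R ** (2 * n)) / (alpha + R ** (2 * n)) - 1.0) / n
    return diff.max(), 2.0 * R ** 2 / alpha   # computed norm and bound 2R²/α

norms = [op_norm_diff(R, qhat=1.0) for R in (0.5, 0.3, 0.1, 0.01)]

# Theorem 6.5 in this example: a larger conductivity (larger q̂, hence smaller
# α) gives a smaller first eigenvalue of Λ_{γ_R}.
eig1 = lambda R, qhat: ((1 + 2 / qhat) - R ** 2) / ((1 + 2 / qhat) + R ** 2)
mono_ok = eig1(0.5, 2.0) <= eig1(0.5, 1.0)
```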
A fundamental question for every inverse problem is the question of uniqueness: is the information—at least in principle—suﬃcient to determine the
unknown quantity? Therefore, in electrical impedance tomography, we ask: does the knowledge of the Neumann–Dirichlet operator $\Lambda$ determine the conductivity $\gamma$ uniquely, or is it possible that two different $\gamma$ correspond to the same $\Lambda$? In full generality, this fundamental question was not answered until 2006 by K. Astala and L. Päivärinta in [10]. We state the result without proof.

Theorem 6.7 Let $\gamma_1, \gamma_2 \in L^\infty(B)$ with $\gamma_j(x) \geq \gamma_0$ for $j = 1, 2$ and almost all $x \in B$. We denote the corresponding Neumann–Dirichlet operators by $\Lambda_1$ and $\Lambda_2$, respectively. If $\Lambda_1 = \Lambda_2$, then $\gamma_1 = \gamma_2$ in $B$.

Instead of proving this theorem, which uses refined arguments from complex analysis, we consider the linearized problem. Therefore, writing $\Lambda(\gamma)$ instead of $\Lambda_\gamma$ to indicate the dependence on $\gamma$, we consider the linear problem
\[ \Lambda(\gamma) + \Lambda'(\gamma)\, q = \Lambda_{\text{meas}}, \tag{6.8} \]
where $\Lambda'(\gamma) : L^\infty(B) \to \mathcal{L}\big( L^2_\diamond(\partial B) \big)$ denotes the Fréchet derivative of the nonlinear operator $\gamma \mapsto \Lambda(\gamma)$ from $L^\infty(B)$ to $\mathcal{L}\big( L^2_\diamond(\partial B) \big)$ at $\gamma$. Here, $\mathcal{L}\big( L^2_\diamond(\partial B) \big)$ denotes again the space of all linear and bounded operators from $L^2_\diamond(\partial B)$ into itself equipped with the operator norm. The right-hand side $\Lambda_{\text{meas}} \in \mathcal{L}\big( L^2_\diamond(\partial B) \big)$ is given (“measured”), and the contrast $q \in L^\infty(B)$ has to be determined.

Theorem 6.8 Let $U \subset L^\infty(B)$ be given by $U = \{ \gamma \in L^\infty(B) : \gamma \geq \gamma_0 \text{ a.e. on } B \}$.
(a) The mapping $\gamma \mapsto \Lambda(\gamma)$ from $U$ to $\mathcal{L}\big( L^2_\diamond(\partial B) \big)$ is Lipschitz continuous.
(b) The mapping $\gamma \mapsto \Lambda(\gamma)$ from $U$ to $\mathcal{L}\big( L^2_\diamond(\partial B) \big)$ is Fréchet differentiable. The Fréchet derivative $\Lambda'(\gamma)$ at $\gamma \in U$ in the direction $q \in L^\infty(B)$ is given by $\big( \Lambda'(\gamma)\, q \big) f = v|_{\partial B}$, where $v \in H^1_\diamond(B)$ solves
\[ \operatorname{div}(\gamma \nabla v) = -\operatorname{div}(q \nabla u) \quad \text{in } B, \qquad \gamma\, \frac{\partial v}{\partial \nu} = -q\, \frac{\partial u}{\partial \nu} \quad \text{on } \partial B, \tag{6.9} \]
and $u \in H^1_\diamond(B)$ solves (6.2) with data $\gamma \in U$ and $f \in L^2_\diamond(\partial B)$. The solution of (6.9) is again understood in the weak sense; that is,
\[ \int_B \gamma\, \nabla\psi \cdot \nabla v\, dx = -\int_B q\, \nabla\psi \cdot \nabla u\, dx \quad \text{for all } \psi \in H^1_\diamond(B). \tag{6.10} \]
Proof: (a) Let $\gamma_1, \gamma_2 \in U$, $f \in L^2_\diamond(\partial B)$, and $u_1, u_2 \in H^1_\diamond(B)$ be the corresponding weak solutions of (6.2). Taking the difference of (6.4) for the triples $(\gamma_1, u_1, f)$ and $(\gamma_2, u_2, f)$ yields
\[ \int_B \gamma_1\, \nabla(u_1 - u_2) \cdot \nabla\psi\, dx = \int_B (\gamma_2 - \gamma_1)\, \nabla u_2 \cdot \nabla\psi\, dx \quad \text{for all } \psi \in H^1_\diamond(B). \]
With $\psi = u_1 - u_2$ and the lower bound $\gamma_0 \leq \gamma_1$, this yields
\[ \gamma_0\, \|\nabla(u_1 - u_2)\|_{L^2(B)}^2 \leq \int_B \gamma_1\, |\nabla(u_1 - u_2)|^2\, dx = \int_B (\gamma_2 - \gamma_1)\, \nabla u_2 \cdot \nabla(u_1 - u_2)\, dx \leq \|\gamma_1 - \gamma_2\|_\infty\, \|\nabla(u_1 - u_2)\|_{L^2(B)}\, \|\nabla u_2\|_{L^2(B)}; \]
that is, there exists a constant $c_1 > 0$ (independent of $\gamma_1, \gamma_2$) with
\[ \|\nabla(u_1 - u_2)\|_{L^2(B)} \leq \frac{1}{\gamma_0}\, \|\gamma_1 - \gamma_2\|_\infty\, \|\nabla u_2\|_{L^2(B)} \leq c_1\, \|\gamma_1 - \gamma_2\|_\infty\, \|f\|_{L^2(\partial B)}, \tag{6.11} \]
where we use the boundedness of the mapping $f \mapsto u_2$ (see Theorem 6.2). Now we use the trace theorem and (6.6) to conclude that
\[ \|\Lambda(\gamma_1) f - \Lambda(\gamma_2) f\|_{L^2(\partial B)} = \big\| (u_1 - u_2)|_{\partial B} \big\|_{L^2(\partial B)} \leq c_2\, \|\gamma_1 - \gamma_2\|_\infty\, \|f\|_{L^2(\partial B)}; \]
that is, $\|\Lambda(\gamma_1) - \Lambda(\gamma_2)\|_{\mathcal{L}(L^2(\partial B))} \leq c_2\, \|\gamma_1 - \gamma_2\|_\infty$, which proves part (a).

(b) Let $\gamma \in U$ and $q \in L^\infty(B)$ such that $\|q\|_\infty \leq \gamma_0/2$. Then $\gamma + q \geq \gamma_0/2$ a.e. on $B$. Let $u, u_q \in H^1_\diamond(B)$ correspond to $\gamma$ and $\gamma + q$, respectively, and boundary data $f$. Subtraction of (6.4) for the triple $(\gamma, u, f)$ and (6.10) from (6.4) for $(\gamma + q, u_q, f)$ yields
\[ \int_B \gamma\, \nabla(u_q - u - v) \cdot \nabla\psi\, dx = \int_B q\, \nabla(u - u_q) \cdot \nabla\psi\, dx \quad \text{for all } \psi \in H^1_\diamond(B). \]
Taking $\psi = u_q - u - v$ yields, as in part (a), an estimate of the form
\[ \|\nabla(u_q - u - v)\|_{L^2(B)} \leq \frac{1}{\gamma_0}\, \|q\|_\infty\, \|\nabla(u - u_q)\|_{L^2(B)}. \]
Now we use (6.11) (with $u_1 = u$ and $u_2 = u_q$) to conclude that
\[ \|\nabla(u_q - u - v)\|_{L^2(B)} \leq \frac{c_1}{\gamma_0}\, \|q\|_\infty^2\, \|f\|_{L^2(\partial B)}. \]
Again by the trace theorem and (6.6), this yields
\[ \big\| \Lambda(\gamma + q) f - \Lambda(\gamma) f - \big( \Lambda'(\gamma)\, q \big) f \big\|_{L^2(\partial B)} = \big\| (u_q - u - v)|_{\partial B} \big\|_{L^2(\partial B)} \leq c\, \|q\|_\infty^2\, \|f\|_{L^2(\partial B)}, \]
which proves part (b).
We now show that, for any given constant background medium $\gamma$, the linearized inverse problem of electrical impedance tomography (6.8) has at most one solution; that is, the Fréchet derivative is one-to-one. As already mentioned in the introduction, this proof is due to Calderón (see [38]) and has “opened the door” to many uniqueness results in tomography and scattering theory. We come back to this method in the next chapter where we prove uniqueness of an inverse scattering problem by this method.

Theorem 6.9 Let $\gamma$ be constant. Then the Fréchet derivative $\Lambda'(\gamma) : L^\infty(B) \to \mathcal{L}\big( L^2_\diamond(\partial B) \big)$ is one-to-one.
Proof: First we note that we can assume without loss of generality that γ = 1. Let q ∈ L∞(B) be such that Λ′(γ)q = 0; that is, (Λ′(γ)q)f = 0 for all f ∈ L²(∂B). The proof consists of two parts. First, we show that q is orthogonal to all products of two gradients of harmonic functions. Then, in the second part, by choosing special harmonic functions we show that the Fourier transform of q vanishes.

Let u₁ ∈ C²(B̄) be any harmonic function; that is, Δu₁ = 0 in B. Define f ∈ L²(∂B) by f = ∂u₁/∂ν on ∂B. Then u₁ is the solution of (6.2) with Neumann boundary data f. We denote by v₁ ∈ H¹⋄(B) the corresponding solution of (6.10); that is,
\[ \int_B \nabla\psi\cdot\nabla v_1\,dx = -\int_B q\,\nabla\psi\cdot\nabla u_1\,dx \quad\text{for all } \psi\in H^1_\diamond(B)\,. \]
Now we take a second arbitrary harmonic function u₂ ∈ C²(B̄) and set ψ = u₂ in the previous equation. This yields
\[ \int_B q\,\nabla u_2\cdot\nabla u_1\,dx = -\int_B \nabla u_2\cdot\nabla v_1\,dx = -\int_{\partial B} v_1\,\frac{\partial u_2}{\partial\nu}\,ds \]
by Green's first theorem. Now we note that v₁|_{∂B} = (Λ′(γ)q)f = 0. Therefore, we conclude that the right-hand side vanishes; that is,
\[ \int_B q\,\nabla u_2\cdot\nabla u_1\,dx = 0 \quad\text{for all harmonic functions } u_1, u_2 \in C^2(\bar B)\,. \tag{6.12} \]
So far, we considered real-valued functions u₁ and u₂. By taking the real and imaginary parts, we can also allow complex-valued harmonic functions for u₁ and u₂. Now we fix any y ∈ R² with y ≠ 0. Let y⊥ ∈ R² be a vector (unique up to sign) with y·y⊥ = 0 and |y| = |y⊥|. Define the complex vectors z± ∈ C² by z± = ½(i y ± y⊥). Then one computes that
\[ z^\pm\cdot z^\pm = \sum_{j=1}^{2} (z_j^\pm)^2 = 0\,, \qquad z^+\cdot z^- = \sum_{j=1}^{2} z_j^+ z_j^- = -\tfrac12\,|y|^2\,, \qquad z^+ + z^- = i\,y\,. \]
From this, we observe that the functions u±(x) = exp(z±·x), x ∈ R², are harmonic in all of R². Therefore, substituting u⁺ and u⁻ into (6.12) yields
\[ 0 = \int_B q\,\nabla u^+\cdot\nabla u^-\,dx = z^+\cdot z^- \int_B q(x)\,e^{(z^+ + z^-)\cdot x}\,dx = -\frac12\,|y|^2 \int_B q(x)\,e^{i\,y\cdot x}\,dx\,. \]
From this, we conclude that the Fourier transform of q (extended by zero in the exterior of B) vanishes on R² \ {0}, and thus also q itself. This ends the proof. □
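The algebra behind Calderón's choice of the vectors z± is easy to verify numerically. The following sketch (ours, not from the text) checks the three identities used above for a sample y ∈ R²:

```python
import numpy as np

# Check of the identities behind Calderon's exponentials (our own sketch):
# for y in R^2 with y != 0, take y_perp with y . y_perp = 0, |y_perp| = |y|,
# and set z_pm = (i*y +/- y_perp)/2.
def calderon_vectors(y):
    y = np.asarray(y, dtype=float)
    y_perp = np.array([-y[1], y[0]])        # rotation by 90 degrees
    z_plus = 0.5 * (1j * y + y_perp)
    z_minus = 0.5 * (1j * y - y_perp)
    return z_plus, z_minus

y = np.array([3.0, -2.0])
zp, zm = calderon_vectors(y)

# z.z = 0 (bilinear product, no conjugation), so u_pm(x) = exp(z_pm . x)
# satisfy Delta u_pm = (z_pm . z_pm) u_pm = 0, i.e. they are harmonic
assert abs(np.sum(zp * zp)) < 1e-12 and abs(np.sum(zm * zm)) < 1e-12
# z+ . z- = -|y|^2 / 2 and z+ + z- = i*y, which produce the Fourier kernel
assert abs(np.sum(zp * zm) + 0.5 * y @ y) < 1e-12
assert np.allclose(zp + zm, 1j * y)
```

Since ∇u± = z± u±, the first identity is exactly the harmonicity of u±, and the last two turn (6.12) into the statement about the Fourier transform of q.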
6.4 The Factorization Method

In this section, we consider the full nonlinear problem but restrict ourselves to the more modest goal of determining only the shape of the region D where γ differs from the known background medium, which we assume to be homogeneous with conductivity 1. We sharpen the assumptions on γ of Assumption 6.1.

Assumption 6.10 In addition to Assumption 6.1, let there exist finitely many domains D_j, j = 1, …, m, such that D̄_j ⊂ B and D̄_j ∩ D̄_k = ∅ for j ≠ k, and such that the complement B \ D̄ of the closure of the union D = ∪_{j=1}^m D_j is connected. Every domain D_j is assumed to satisfy the exterior cone condition (see, e.g., [103]); that is, for every z ∈ ∂D_j there exists a set C (part of a cone) of the form
\[ C = \bigl\{\, z + x \in \mathbb R^2 : \hat\theta\cdot\hat x > 1-\delta\,,\ 0 < |x| < \varepsilon_0 \,\bigr\}\,, \qquad \hat x = x/|x|\,, \]
for some ε₀, δ > 0 and θ̂ ∈ R² with |θ̂| = 1, such that C ∩ D_j = ∅. Furthermore, there exists q₀ > 0 such that γ = 1 on B \ D and γ ≥ 1 + q₀ on D.

We define the contrast q ∈ L∞(B) by q = γ − 1 and note that D̄ is the support of q. It is not difficult to show that every Lipschitz domain D satisfies the exterior cone condition (see Problem 6.6).

The inverse problem of this section is to determine the shape of D from the Neumann–Dirichlet operator Λ. In the following, we use the relative data Λ − Λ₁, where Λ₁ : L²(∂B) → L²(∂B) corresponds to the known background medium; that is, to γ = 1. The information that Λ − Λ₁ does not vanish simply means that the background is perturbed by some contrast q = γ − 1. In the factorization method, we develop a criterion to decide whether or not a given point z ∈ B belongs to D. The idea is then to take a fine grid in B and to check this criterion for every grid point z. This provides a pixel-based picture of D.
We recall that Λf = u|_{∂B} and Λ₁f = u₁|_{∂B}, where u, u₁ ∈ H¹⋄(B) solve
\[ \int_B (1+q)\,\nabla u\cdot\nabla\psi\,dx = (f,\psi)_{L^2(\partial B)} \quad\text{for all } \psi\in H^1_\diamond(B)\,, \tag{6.13} \]
\[ \int_B \nabla u_1\cdot\nabla\psi\,dx = (f,\psi)_{L^2(\partial B)} \quad\text{for all } \psi\in H^1_\diamond(B)\,. \tag{6.14} \]
For the difference, we have (Λ₁ − Λ)f = (u₁ − u)|_{∂B}, and u₁ − u ∈ H¹⋄(B) satisfies the variational equation
\[ \int_B (1+q)\,\nabla(u_1-u)\cdot\nabla\psi\,dx = \int_D q\,\nabla u_1\cdot\nabla\psi\,dx \quad\text{for all } \psi\in H^1_\diamond(B)\,. \tag{6.15} \]
It is the aim to factorize the operator Λ₁ − Λ in the form
\[ \Lambda_1 - \Lambda = A^* T A\,, \]
where the operators A : L²(∂B) → L²(D)² and T : L²(D)² → L²(D)² are defined as follows (here, L²(D)² denotes the space of vector-valued functions D → R² such that both components are in L²(D)):

• Af = ∇u₁|_D, where u₁ ∈ H¹⋄(B) solves the variational equation (6.14), and
• Th = q(h − ∇w), where w ∈ H¹⋄(B) solves the variational equation
\[ \int_B (1+q)\,\nabla w\cdot\nabla\psi\,dx = \int_D q\,h\cdot\nabla\psi\,dx \quad\text{for all } \psi\in H^1_\diamond(B)\,. \tag{6.16} \]

We note that the solution w of (6.16) exists and is unique. This is seen by the representation theorem A.23 of Riesz because the right-hand side again defines a linear and bounded functional F(ψ) = ∫_D q h·∇ψ dx on H¹⋄(B), and the left-hand side of (6.16) is again the inner product (w, ψ)_*. The classical interpretation of the variational equation (under the assumption that all functions are sufficiently smooth) can again be seen from Green's first theorem, applied in D and in B \ D̄. Indeed, in this case (6.16) is equivalent to
\[ 0 = \int_D \psi\,\operatorname{div}\bigl((1+q)\nabla w - q h\bigr)\,dx - \int_{\partial D} \psi\,\nu\cdot\bigl((1+q)\nabla w - q h\bigr)\big|_-\,ds + \int_{B\setminus\overline D} \psi\,\Delta w\,dx - \int_{\partial(B\setminus\overline D)} \psi\,\frac{\partial w}{\partial\nu}\,ds \]
for all ψ. This yields
\[ \operatorname{div}\bigl((1+q)\nabla w - q h\bigr) = 0 \ \text{in } D\,, \qquad \Delta w = 0 \ \text{in } B\setminus\overline D\,, \]
\[ \nu\cdot\bigl((1+q)\nabla w - q h\bigr)\big|_- = \frac{\partial w}{\partial\nu}\Big|_+ \ \text{on } \partial D\,, \qquad\text{and}\qquad \frac{\partial w}{\partial\nu} = 0 \ \text{on } \partial B\,; \]
that is,
\[ \nu\cdot\bigl((1+q)\nabla w\bigr)\big|_- - \frac{\partial w}{\partial\nu}\Big|_+ = q\big|_-\,\nu\cdot h \ \text{on } \partial D\,, \qquad \frac{\partial w}{\partial\nu} = 0 \ \text{on } \partial B\,. \]
Theorem 6.11 Let the operators A : L²(∂B) → L²(D)² and T : L²(D)² → L²(D)² be defined as above by (6.14) and (6.16), respectively. Then
\[ \Lambda_1 - \Lambda = A^* T A\,. \tag{6.17} \]

Proof: We define the auxiliary operator H : L²(D)² → L²(∂B) by Hh = w|_{∂B}, where w ∈ H¹⋄(B) solves (6.16). Obviously, we conclude from (6.15) that Λ₁ − Λ = HA.

We determine the adjoint A* : L²(D)² → L²(∂B) of A and prove that A*h = v|_{∂B}, where v ∈ H¹⋄(B) solves the variational equation
\[ \int_B \nabla v\cdot\nabla\psi\,dx = \int_D h\cdot\nabla\psi\,dx \quad\text{for all } \psi\in H^1_\diamond(B) \tag{6.18} \]
and even for all ψ ∈ H¹(B) because it obviously holds for constants. The solution v exists and is unique by the same arguments as above. Again, by applying Green's theorem we note that v is the variational solution of the boundary value problem
\[ \Delta v = \operatorname{div} h \ \text{in } D\,, \qquad \Delta v = 0 \ \text{in } B\setminus\overline D\,, \qquad \frac{\partial v}{\partial\nu} = 0 \ \text{on } \partial B\,, \tag{6.19a} \]
\[ v_- = v_+ \ \text{on } \partial D\,, \qquad \frac{\partial v_-}{\partial\nu} - \frac{\partial v_+}{\partial\nu} = \nu\cdot h \ \text{on } \partial D\,. \tag{6.19b} \]
To prove the representation of A*h, we conclude from the definition of A, equation (6.18) for ψ = u₁, and (6.14) that
\[ (Af, h)_{L^2(D)^2} = \int_D \nabla u_1\cdot h\,dx = \int_B \nabla u_1\cdot\nabla v\,dx = (f, v)_{L^2(\partial B)}\,, \]
and thus v|_{∂B} is indeed the value of the adjoint A*h.

Now it remains to show that H = A*T. Let h ∈ L²(D)² and w ∈ H¹⋄(B) solve (6.16). Then Hh = w|_{∂B}. We rewrite (6.16) as
\[ \int_B \nabla w\cdot\nabla\psi\,dx = \int_D q\,(h - \nabla w)\cdot\nabla\psi\,dx \quad\text{for all } \psi\in H^1_\diamond(B)\,. \tag{6.20} \]
The comparison with (6.18) yields A*[q(h − ∇w)] = w|_{∂B} = Hh; that is, A*T = H. Substituting this into Λ₁ − Λ = HA yields the assertion. □

Properties of the operators A and T are listed in the following theorem:
Theorem 6.12 The operator A : L²(∂B) → L²(D)² is compact, and the operator T : L²(D)² → L²(D)² is self-adjoint and coercive:
\[ (T h, h)_{L^2(D)^2} \ \ge\ c\,\|h\|_{L^2(D)^2}^2 \quad\text{for all } h\in L^2(D)^2\,, \tag{6.21} \]
where c = q₀(1 − q₀/(1 + q₀)) > 0.

Proof: (i) For smooth functions u₁ ∈ C²(B̄) with Δu₁ = 0 in B and ∂u₁/∂ν = f on ∂B, the following representation formula holds (see [53] or Theorem 7.16 for the case of the three-dimensional Helmholtz equation):
\[ u_1(x) = \int_{\partial B} \Bigl[\Phi(x,y)\,\frac{\partial u_1}{\partial\nu}(y) - u_1(y)\,\frac{\partial\Phi(x,y)}{\partial\nu(y)}\Bigr]\,ds(y) = \int_{\partial B} \Bigl[\Phi(x,y)\,f(y) - (\Lambda_1 f)(y)\,\frac{\partial\Phi(x,y)}{\partial\nu(y)}\Bigr]\,ds(y)\,, \quad x\in B\,, \]
where Φ denotes the fundamental solution of the Laplace equation in R²; that is,
\[ \Phi(x,y) = -\frac{1}{2\pi}\,\ln|x-y|\,, \quad x\ne y\,. \]
We can write ∇u₁ in D in the form ∇u₁|_D = K₁f − K₂Λ₁f, where the operators K₁, K₂ : L²(∂B) → L²(D)², defined by
\[ (K_1 f)(x) = \nabla\int_{\partial B} \Phi(x,y)\,f(y)\,ds(y)\,, \quad x\in D\,, \]
\[ (K_2 g)(x) = \nabla\int_{\partial B} g(y)\,\frac{\partial\Phi(x,y)}{\partial\nu(y)}\,ds(y)\,, \quad x\in D\,, \]
are compact as integral operators with smooth kernels on bounded regions of integration. (Note that D̄ ⊂ B.) The representation A = K₁ − K₂Λ₁ holds by a density argument (see Theorem A.30). Therefore, A is also compact.

(ii) Let h₁, h₂ ∈ L²(D)² with corresponding solutions w₁, w₂ ∈ H¹⋄(B) of (6.16). Then, with (6.16) for h₂, w₂ and ψ = w₁:
\[ (T h_1, h_2)_{L^2(D)^2} = \int_D q\,(h_1 - \nabla w_1)\cdot h_2\,dx = \int_D q\,h_1\cdot h_2\,dx - \int_D q\,\nabla w_1\cdot h_2\,dx = \int_D q\,h_1\cdot h_2\,dx - \int_B (1+q)\,\nabla w_1\cdot\nabla w_2\,dx\,. \]
This expression is symmetric with respect to h₁ and h₂. Therefore, T is self-adjoint.
For h ∈ L²(D)² and the corresponding solution w ∈ H¹⋄(B) of (6.16), we conclude that
\[ (T h, h)_{L^2(D)^2} = \int_D q\,|h - \nabla w|^2\,dx + \int_D q\,(h - \nabla w)\cdot\nabla w\,dx = \int_D q\,|h - \nabla w|^2\,dx + \int_B |\nabla w|^2\,dx \]
(with the help of (6.20) for ψ = w). Using q ≥ q₀ on D, we estimate further:
\[ (T h, h)_{L^2(D)^2} \ \ge\ \int_D \bigl[\,q_0\,|h|^2 - 2\,q_0\,h\cdot\nabla w + (1+q_0)\,|\nabla w|^2\,\bigr]\,dx = \int_D \Biggl[\,\Bigl|\sqrt{1+q_0}\,\nabla w - \frac{q_0}{\sqrt{1+q_0}}\,h\Bigr|^2 + q_0\Bigl(1 - \frac{q_0}{1+q_0}\Bigr)|h|^2\,\Biggr]\,dx \ \ge\ q_0\Bigl(1 - \frac{q_0}{1+q_0}\Bigr)\,\|h\|_{L^2(D)^2}^2\,. \ \Box \]
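The completed square in the last step can be spot-checked numerically. The sketch below (ours, not the book's) verifies the pointwise inequality q₀|h|² − 2q₀ h·g + (1+q₀)|g|² ≥ q₀(1 − q₀/(1+q₀))|h|² on random vectors; this is only a sanity check, not a proof:

```python
import numpy as np

# Pointwise inequality behind the coercivity constant in (6.21), with g
# playing the role of grad(w), checked on random vectors in R^2.
def lower_bound_holds(q0, h, g):
    lhs = q0 * h @ h - 2 * q0 * h @ g + (1 + q0) * g @ g
    rhs = q0 * (1 - q0 / (1 + q0)) * (h @ h)
    return lhs >= rhs - 1e-12      # small slack for floating-point round-off

rng = np.random.default_rng(2)
q0 = 0.7
assert all(lower_bound_holds(q0, rng.standard_normal(2), rng.standard_normal(2))
           for _ in range(1000))
```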
From this result and the factorization (6.17), we note that Λ₁ − Λ is compact, self-adjoint (this follows already from Theorem 6.4), and nonnegative.

Now we derive the binary criterion on a point z ∈ B to decide whether or not this point belongs to D. First, for every point z ∈ B we define a particular function G(·,z) such that ΔG(·,z) = 0 in B \ {z} and ∂G(·,z)/∂ν = 0 on ∂B and such that G(x,z) becomes singular as x tends to z. We construct G from the Green's function N for Δ in B with respect to the Neumann boundary conditions. We make an ansatz for N in the form N(x,z) = Φ(x,z) − Ñ(x,z), where again
\[ \Phi(x,z) = -\frac{1}{2\pi}\,\ln|x-z|\,, \quad x\ne z\,, \]
is the fundamental solution of the Laplace equation in R², and determine Ñ(·,z) ∈ H¹⋄(B) as the unique solution of the Neumann problem
\[ \Delta\tilde N(\cdot,z) = 0 \ \text{in } B \qquad\text{and}\qquad \frac{\partial\tilde N}{\partial\nu}(\cdot,z) = \frac{\partial\Phi}{\partial\nu}(\cdot,z) + \frac{1}{|\partial B|} \ \text{on } \partial B\,. \]
We note that the solution exists because ∫_{∂B} [∂Φ(·,z)/∂ν + 1/|∂B|] ds = 0. This is seen by Green's first theorem in the region B \ B[z,ε]:
\[ \int_{\partial B} \frac{\partial\Phi}{\partial\nu}(\cdot,z)\,ds = \int_{|x-z|=\varepsilon} \frac{\partial\Phi}{\partial\nu}(x,z)\,ds(x) = -\frac{1}{2\pi}\int_{|x-z|=\varepsilon} \frac{x-z}{|x-z|^2}\cdot\frac{x-z}{|x-z|}\,ds(x) = -1\,. \]
Then N = Φ − Ñ is the Green's function in B with respect to the Neumann boundary conditions; that is, N satisfies
\[ \Delta N(\cdot,z) = 0 \ \text{in } B\setminus\{z\} \qquad\text{and}\qquad \frac{\partial N}{\partial\nu}(\cdot,z) = -\frac{1}{|\partial B|} \ \text{on } \partial B\,. \]
From the differentiable dependence of the solution Ñ(·,z) on the parameter z ∈ B, we conclude that, for any fixed a ∈ R² with |a| = 1, the function G(·,z) = a·∇_z N(·,z) satisfies
\[ \Delta G(\cdot,z) = 0 \ \text{in } B\setminus\{z\} \qquad\text{and}\qquad \frac{\partial G}{\partial\nu}(\cdot,z) = 0 \ \text{on } \partial B\,. \tag{6.22} \]
The function G(·,z) has the following desired property.

Lemma 6.13 Let z ∈ B, ε₀ > 0, θ ∈ [0, 2π], and δ > 0 be kept fixed. For ε ∈ [0, ε₀) define the set (part of a cone; compare with Assumption 6.10 for θ̂ = (cos θ, sin θ)ᵀ)
\[ C_\varepsilon = \Bigl\{\, z + r\begin{pmatrix}\cos t\\ \sin t\end{pmatrix} : \varepsilon < r < \varepsilon_0\,,\ |\theta - t| < \arccos(1-\delta) \Bigr\} \]
with vertex in z. Let ε₀ be so small that C₀ ⊂ B. Then
\[ \lim_{\varepsilon\to 0}\,\|G(\cdot,z)\|_{L^2(C_\varepsilon)} = \infty\,. \]

Proof: By the smoothness of Ñ(·,z), it is sufficient to consider only the part a·∇_z ln|x−z|. Using polar coordinates for x with respect to z (i.e., x = z + r(cos t, sin t)ᵀ), the representation a = (cos s, sin s)ᵀ, and η = arccos(1−δ), we have
\[ \int_{C_\varepsilon} \bigl|a\cdot\nabla_z \ln|x-z|\bigr|^2\,dx = \int_{C_\varepsilon} \frac{\bigl[(z-x)\cdot a\bigr]^2}{|x-z|^4}\,dx = \int_\varepsilon^{\varepsilon_0}\!\int_{\theta-\eta}^{\theta+\eta} \frac{r^2\cos^2(s-t)}{r^4}\,r\,dt\,dr = \underbrace{\int_{\theta-\eta}^{\theta+\eta}\cos^2(s-t)\,dt}_{=:\,c}\,\int_\varepsilon^{\varepsilon_0}\frac{dr}{r} = c\,\ln\frac{\varepsilon_0}{\varepsilon}\,. \]
Therefore,
\[ \|G(\cdot,z)\|_{L^2(C_\varepsilon)} \ \ge\ \frac{1}{2\pi}\sqrt{c\,\ln\frac{\varepsilon_0}{\varepsilon}} - \|a\cdot\nabla_z\tilde N(\cdot,z)\|_{L^2(C_0)} \ \longrightarrow\ \infty \quad\text{for } \varepsilon\to 0\,. \ \Box \]
We observe that the functions φ_z = G(·,z)|_{∂B} are traces of harmonic functions in B \ {z} with vanishing normal derivatives on ∂B. Comparing this with the classical formulation (6.19a), (6.19b) of the adjoint A* of A, it seems plausible that the "source region" D of (6.19a), (6.19b) can be determined by moving the source point z in φ_z. This is confirmed in the following theorem.
Theorem 6.14 Let Assumption 6.10 hold, and let a ∈ R² with |a| = 1 be fixed. For every z ∈ B define φ_z ∈ L²(∂B) by
\[ \phi_z(x) = G(x,z) = a\cdot\nabla_z N(x,z)\,, \quad x\in\partial B\,, \tag{6.23} \]
where N denotes the Green's function with respect to the Neumann boundary condition. Then
\[ z\in D \iff \phi_z\in\mathcal R(A^*)\,, \tag{6.24} \]
where A* : L²(D)² → L²(∂B) is the adjoint of A, given by (6.18), and R(A*) denotes its range.

Proof: First let z ∈ D. Choose a disk B[z,ε] = {x ∈ R² : |x−z| ≤ ε} with center z and radius ε > 0 such that B[z,ε] ⊂ D. Furthermore, choose a function ϕ ∈ C^∞(R²) such that ϕ(x) = 0 for |x−z| ≤ ε/2 and ϕ(x) = 1 for |x−z| ≥ ε, and set w(x) = ϕ(x) G(x,z) for x ∈ B. Then w ∈ H¹(B) and w = G(·,z) in B \ D; thus w|_{∂B} = φ_z. Next, we determine u ∈ H¹(D) as a solution of Δu = Δw in D, ∂u/∂ν = 0 on ∂D; that is, in weak form,
\[ \int_D \nabla u\cdot\nabla\psi\,dx = \int_D \nabla w\cdot\nabla\psi\,dx - \int_{\partial D} \psi\,\frac{\partial G(\cdot,z)}{\partial\nu}\,ds\,, \quad \psi\in H^1_\diamond(D)\,, \]
because ∂w/∂ν = ∂G(·,z)/∂ν on ∂D. Again, the solution exists and is unique. Application of Green's first theorem in B \ D̄ yields
\[ \int_{\partial D} \frac{\partial G(\cdot,z)}{\partial\nu}\,ds = \int_{\partial B} \frac{\partial G(\cdot,z)}{\partial\nu}\,ds = 0\,. \]
Therefore, the previous variational equation holds also for constants and thus for all ψ ∈ H¹(D). Now let ψ ∈ H¹(B) be a test function on B. Then
\[ \int_D \nabla u\cdot\nabla\psi\,dx = \int_D \nabla w\cdot\nabla\psi\,dx - \int_{\partial D} \psi\,\frac{\partial G(\cdot,z)}{\partial\nu}\,ds = \int_D \nabla w\cdot\nabla\psi\,dx + \int_{B\setminus\overline D} \nabla G(\cdot,z)\cdot\nabla\psi\,dx = \int_B \nabla w\cdot\nabla\psi\,dx\,. \]
Therefore, the definition h := ∇u in D yields A*h = w|_{∂B} = φ_z, and thus φ_z ∈ R(A*).

Now we prove the opposite direction. Let z ∉ D. We have to show that φ_z is not contained in the range of A*; we assume, on the contrary, that φ_z = A*h for some h ∈ L²(D)². Let v ∈ H¹⋄(B) be the corresponding solution of (6.18). Then the function w = v − G(·,z) vanishes on ∂B and solves, in the weak form,
\[ \Delta w = 0 \ \text{in } B\setminus\overline{D_\varepsilon(z)}\,, \qquad \frac{\partial w}{\partial\nu} = 0 \ \text{on } \partial B\,, \]
for every ε > 0 such that D_ε(z) := D ∪ B(z,ε) ⊂ B; that is,
\[ \int_{B\setminus\overline{D_\varepsilon(z)}} \nabla w\cdot\nabla\psi\,dx = 0 \]
for all ψ ∈ H¹(B \ D̄_ε(z)) with ψ = 0 on ∂D_ε(z). We extend w by zero into the exterior of B. Then w ∈ H¹(R² \ D̄_ε(z)) because w = 0 on ∂B,⁶ and
\[ \int_{\mathbb R^2\setminus\overline{D_\varepsilon(z)}} \nabla w\cdot\nabla\psi\,dx = 0 \]
for all ψ ∈ H¹(R² \ D̄_ε(z)) which vanish on ∂D_ε(z). This is the variational form of Δw = 0 in R² \ D̄_ε(z). Since this holds for all sufficiently small ε > 0, we conclude that Δw = 0 in the exterior Ω := R² \ (D̄ ∪ {z}) of D ∪ {z}. Now we use without proof⁷ that w is analytic in this set Ω and thus satisfies the unique continuation principle; see, e.g., Theorem 4.39 of [161]. Therefore, because it vanishes in the exterior of B, it has to vanish in all of the connected set Ω. (Here we make use of the assumption that B \ D̄ is connected.) Therefore, v = G(·,z) in B \ (D̄ ∪ {z}). The point z can either be on the boundary ∂D or in the exterior of D̄. In either case, there is a cone C₀ of the form C₀ = {z + r(cos t, sin t)ᵀ : 0 < r < ε₀, |θ − t| < δ} with C₀ ⊂ B \ D̄. (Here we use the fact that every component of D satisfies the exterior cone condition.) We have v|_{C₀} ∈ L²(C₀) because even v ∈ H¹(B). However, Lemma 6.13 yields that ‖G(·,z)‖_{L²(C_ε)} → ∞ for ε → 0, where C_ε = {z + r(cos t, sin t)ᵀ : ε < r < ε₀, |θ − t| < δ}. This is a contradiction because v = G(·,z) in C₀, and it ends the proof. □

Therefore, we have shown an explicit characterization of the unknown domain D by the range of the operator A*. This operator, however, is also unknown: only Λ₁ − Λ is known! The operators A* and Λ₁ − Λ are connected by the factorization Λ₁ − Λ = A*TA. We can easily derive a second factorization of Λ₁ − Λ. The operator Λ₁ − Λ is self-adjoint and compact as an operator from L²(∂B) into itself. Therefore, there exists a spectral decomposition of the form
\[ (\Lambda_1 - \Lambda)f = \sum_{j=1}^{\infty} \lambda_j\,(f,\psi_j)_{L^2(\partial B)}\,\psi_j\,, \]
where λ_j ∈ R denote the eigenvalues and ψ_j ∈ L²(∂B) the corresponding orthonormal eigenfunctions of Λ₁ − Λ (see Theorem A.53 of Appendix A.6). Furthermore, from the factorization and the coercivity of T it follows that

⁶ It is not quite obvious that the extension is in H¹(R² \ D̄_ε(z)); see, e.g., Corollary 5.13 in [161].
⁷ See again [161], Theorem 4.38 and Corollary 3.4.
((Λ₁ − Λ)f, f)_{L²(∂B)} ≥ 0 for all f ∈ L²(∂B). This implies λ_j ≥ 0 for all j. Therefore, we can define
\[ W f = \sum_{j=1}^{\infty} \sqrt{\lambda_j}\,(f,\psi_j)_{L^2(\partial B)}\,\psi_j\,, \]
and have a second factorization in the form W W = Λ₁ − Λ. We write (Λ₁ − Λ)^{1/2} for W. The operator (Λ₁ − Λ)^{1/2} is also self-adjoint, and we have
\[ (\Lambda_1 - \Lambda)^{1/2}\,(\Lambda_1 - \Lambda)^{1/2} = \Lambda_1 - \Lambda = A^* T A\,. \tag{6.25} \]
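In a finite-dimensional discretization (a symmetric positive semidefinite matrix M standing in for Λ₁ − Λ), the square root W can be formed directly from the spectral decomposition. The following sketch is ours; all names and data are hypothetical:

```python
import numpy as np

# Square root of a symmetric positive semidefinite matrix M via its spectral
# decomposition, mirroring W f = sum_j sqrt(lambda_j) (f, psi_j) psi_j.
def sqrt_psd(M):
    lam, Psi = np.linalg.eigh(M)           # lam[j] eigenvalues, Psi[:, j] eigenvectors
    lam = np.clip(lam, 0.0, None)          # clip tiny negative round-off
    return (Psi * np.sqrt(lam)) @ Psi.T    # sum_j sqrt(lam_j) psi_j psi_j^T

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
M = A @ A.T                                # symmetric positive semidefinite
W = sqrt_psd(M)
assert np.allclose(W, W.T)                 # W is self-adjoint
assert np.allclose(W @ W, M)               # W W = M
```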
We show that the ranges of (Λ₁ − Λ)^{1/2} and A* coincide.⁸ This follows directly from the following functional analytic result.

Lemma 6.15 Let X and Y be Hilbert spaces, B : X → X, A : X → Y, and T : Y → Y linear and bounded such that B = A*TA. Furthermore, let T be self-adjoint and coercive; that is, there exists c > 0 such that (Ty, y)_Y ≥ c‖y‖²_Y for all y ∈ Y. Then, for any φ ∈ X, φ ≠ 0,
\[ \phi\in\mathcal R(A^*) \iff \inf\bigl\{ (Bx,x)_X : x\in X\,,\ (x,\phi)_X = 1 \bigr\} > 0\,. \]

Proof: (i) First, let φ = A*y ∈ R(A*) for some y ∈ Y. Then y ≠ 0, and we estimate for arbitrary x ∈ X with (x,φ)_X = 1:
\[ (Bx,x)_X = (A^*TAx,x)_X = (TAx,Ax)_Y \ \ge\ c\,\|Ax\|_Y^2 \ \ge\ \frac{c}{\|y\|_Y^2}\,\bigl|(Ax,y)_Y\bigr|^2 = \frac{c}{\|y\|_Y^2}\,\bigl|(x,A^*y)_X\bigr|^2 = \frac{c}{\|y\|_Y^2}\,\bigl|(x,\phi)_X\bigr|^2 = \frac{c}{\|y\|_Y^2}\,. \]
Therefore, we have found a positive lower bound for the infimum.

(ii) Second, let φ ∉ R(A*). Define the closed subspace
\[ V = \{\, x\in X : (\phi,x)_X = 0 \,\} = \{\phi\}^\perp\,. \]
We show that the image A(V) is dense in the closure of the range of A. Indeed, let y ∈ closure(R(A)) be such that y ⊥ Ax for all x ∈ V; that is, 0 = (Ax,y)_Y = (x,A*y)_X for all x ∈ V; that is, A*y ∈ V^⊥ = span{φ}. Because φ ∉ R(A*) we conclude that A*y = 0. Therefore, y ∈ closure(R(A)) ∩ N(A*). This yields y = 0.⁹ Therefore, A(V) is dense in closure(R(A)). Because −Aφ/‖φ‖²_X is in the range of A, there exists a sequence x̃_n ∈ V such that Ax̃_n → −Aφ/‖φ‖²_X. We define x_n := x̃_n + φ/‖φ‖²_X. Then (x_n,φ)_X = 1 and Ax_n → 0 for n → ∞, and we estimate
\[ (Bx_n,x_n)_X = (TAx_n,Ax_n)_Y \ \le\ \|T\|_{\mathcal L(Y)}\,\|Ax_n\|_Y^2 \ \longrightarrow\ 0\,, \quad n\to\infty\,, \]
and thus inf{(Bx,x)_X : x ∈ X, (x,φ)_X = 1} = 0. □

⁸ This is also known as Douglas' Lemma; see [75].
⁹ Take a sequence (x_j) in X such that Ax_j → y. Then 0 = (A*y, x_j)_X = (y, Ax_j)_Y → (y,y)_Y; that is, y = 0.
We apply this result to both of the factorizations in (6.25). In both cases, B = Λ₁ − Λ and X = L²(∂B). First, we set Y = L²(D)² with A : L²(∂B) → L²(D)² and T : L²(D)² → L²(D)² as in the second factorization of (6.25). Because T is self-adjoint and coercive, we conclude for any φ ∈ L²(∂B), φ ≠ 0, that
\[ \phi\in\mathcal R(A^*) \iff \inf\bigl\{ \bigl((\Lambda_1-\Lambda)f,f\bigr)_{L^2(\partial B)} : f\in L^2(\partial B)\,,\ (f,\phi)_{L^2(\partial B)} = 1 \bigr\} > 0\,. \]
Second, we consider the first factorization of (6.25) with T being the identity. For φ ∈ L²(∂B), φ ≠ 0, we conclude that
\[ \phi\in\mathcal R\bigl((\Lambda_1-\Lambda)^{1/2}\bigr) \iff \inf\bigl\{ \bigl((\Lambda_1-\Lambda)f,f\bigr)_{L^2(\partial B)} : (f,\phi)_{L^2(\partial B)} = 1 \bigr\} > 0\,. \]
The right-hand sides of these characterizations depend only on Λ₁ − Λ; therefore, we conclude that
\[ \mathcal R\bigl((\Lambda_1-\Lambda)^{1/2}\bigr) = \mathcal R(A^*)\,. \tag{6.26} \]
Application of Theorem 6.14 yields the main result of the factorization method:

Theorem 6.16 Let Assumption 6.10 be satisfied. For fixed a ∈ R² with a ≠ 0 and every z ∈ B, let φ_z ∈ L²(∂B) be defined by (6.23); that is, φ_z(x) = a·∇_z N(x,z), x ∈ ∂B, where N denotes the Green's function for Δ with respect to the Neumann boundary conditions. Then
\[ z\in D \iff \phi_z\in\mathcal R\bigl((\Lambda_1-\Lambda)^{1/2}\bigr)\,. \tag{6.27} \]

We now rewrite the right-hand side with Picard's Theorem A.58 of Appendix A.6. First, we show injectivity of the operator Λ₁ − Λ.

Theorem 6.17 The operator Λ₁ − Λ is one-to-one.

Proof: From
\[ \bigl((\Lambda_1-\Lambda)f,f\bigr)_{L^2(\partial B)} = (A^*TAf,f)_{L^2(\partial B)} = (TAf,Af)_{L^2(D)^2} \ \ge\ c\,\|Af\|_{L^2(D)^2}^2 \quad\text{for } f\in L^2(\partial B)\,, \]
it suffices to prove injectivity of A. Let Af = ∇u₁|_D = 0, where u₁ ∈ H¹⋄(B) denotes the weak solution of Δu₁ = 0 in B and ∂u₁/∂ν = f on ∂B. Therefore, u₁ is constant in every component of D. Without proof, we use again the regularity result that u₁ is analytic in B. The derivatives v_j = ∂u₁/∂x_j are solutions of Δv_j = 0 in B with v_j = 0 in D. The unique continuation property yields v_j = ∂u₁/∂x_j = 0 in all of B and thus f = 0. □

Therefore, the operator Λ₁ − Λ is self-adjoint, compact, one-to-one, and all eigenvalues are positive. Let {λ_j, ψ_j} be an eigensystem of Λ₁ − Λ; that
is, λ_j > 0 are the eigenvalues of Λ₁ − Λ and ψ_j ∈ L²(∂B) are the corresponding orthonormal eigenfunctions (see Theorem A.53 of Appendix A.6). The set {ψ_j : j = 1, 2, …} is complete by the spectral theorem. Therefore, {√λ_j, ψ_j, ψ_j} is a singular system of the operator (Λ₁ − Λ)^{1/2}. Application of Picard's Theorem A.58 of Appendix A.6 yields:

Theorem 6.18 Let Assumption 6.10 be satisfied. For fixed a ∈ R² with a ≠ 0 and for every z ∈ B, let φ_z ∈ L²(∂B) be defined by (6.23); that is, φ_z(x) = a·∇_z N(x,z), x ∈ ∂B. Then
\[ z\in D \iff \sum_{j=1}^{\infty} \frac{(\phi_z,\psi_j)_{L^2(\partial B)}^2}{\lambda_j} < \infty \tag{6.28} \]
or, equivalently,
\[ z\in D \iff \chi(z) := \Biggl[\,\sum_{j=1}^{\infty} \frac{(\phi_z,\psi_j)_{L^2(\partial B)}^2}{\lambda_j}\Biggr]^{-1} > 0\,. \tag{6.29} \]

Here we agree on the convention that the inverse of the series is zero in the case of divergence. Therefore, χ vanishes outside of D and is positive in the interior of D. The function
\[ \operatorname{sign}\chi(z) = \begin{cases} 1, & \chi(z) > 0,\\ 0, & \chi(z) = 0, \end{cases} \]
is thus the characteristic function of D.

We finish this section with some further remarks. We leave it to the reader to show (see Problems 6.2–6.4) that in the case of B = B(0,1) being the unit disk and D = B(0,R) the disk of radius R < 1, the ratios (φ_z, ψ_j)²_{L²(∂B)}/λ_j behave as (|z|/R)^{2j}. Therefore, convergence holds if and only if |z| < R, which confirms the assertion of the last theorem.

In practice, only finitely many measurements are available; that is, the data operator Λ₁ − Λ is replaced by a matrix M ∈ R^{m×m}. The question of convergence of the series is then obsolete because the sum consists of only finitely many terms. However, in practice it is observed that the value of this sum is much smaller for points z inside D than for points outside of D. Some authors (see [123]) suggest testing the "convergence" by determining the slope of the straight line that best fits the curve j ↦ ln[(φ_z, ψ_j)²_{L²(∂B)}/λ_j] (for some j only). The points z for which the slope is negative provide a good picture of D. A rigorous justification of a projection method approximating the (infinite) series by a (finite) sum has been given in [177].

In the implementation of the factorization method, only the relative data operator Λ₁ − Λ has to be known and no other information on D. For example, it is allowed (see Assumption 6.10) that D consists of several components. Furthermore, the fact that the medium D is penetrable is not used. If one imposes some boundary condition on ∂D, the same characterization as in Theorem 6.18
holds. For example, in [123] the factorization method has been justified for an insulating object D. In particular, the factorization method provides a proof of uniqueness of D independent of the nature of D; that is, whether it is finitely conducting, a perfect conductor (Dirichlet boundary conditions on ∂D), a perfect insulator (Neumann boundary conditions on ∂D), or subject to a boundary condition of Robin type.
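For finitely many measurements, the test (6.29) reduces to a finite Picard sum. The sketch below is entirely ours; the matrix and test vectors are synthetic stand-ins, not actual EIT data:

```python
import numpy as np

# Finite-dimensional version of the indicator chi(z) from (6.29): M is a
# symmetric positive semidefinite m x m stand-in for Lambda_1 - Lambda and
# phi_z a sampled test function; both are synthetic here.
def chi(M, phi_z, tol=1e-12):
    lam, Psi = np.linalg.eigh(M)
    keep = lam > tol                        # drop numerically zero eigenvalues
    coeffs = Psi.T @ phi_z                  # (phi_z, psi_j)
    s = np.sum(coeffs[keep] ** 2 / lam[keep])
    return 0.0 if s == 0.0 else 1.0 / s    # larger chi hints at z inside D

# a phi_z concentrated on large-eigenvalue directions yields a larger
# indicator than one concentrated on small-eigenvalue directions
Psi0, _ = np.linalg.qr(np.random.default_rng(1).standard_normal((6, 6)))
lam0 = np.array([1.0, 0.5, 0.1, 1e-2, 1e-4, 1e-6])
M = (Psi0 * lam0) @ Psi0.T
assert chi(M, Psi0[:, 0]) > chi(M, Psi0[:, 4])
```

In an imaging loop one would evaluate chi on every grid point z and plot the result, in the spirit of the pixel-based picture described above.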
6.5 Problems
6.1 Let D be a domain with D̄ ⊂ B and γ ∈ L∞(B) piecewise constant with γ = γ₀ in D for some γ₀ ∈ R and γ = 1 in B \ D̄. Let u ∈ H¹⋄(B) ∩ C²(B \ ∂D) be a solution of the variational equation (6.4), and assume that u|_D and u|_{B\D̄} have differentiable extensions to D̄ and B̄ \ D, respectively. Show that u solves Δu = 0 in B \ ∂D and ∂u/∂ν = f on ∂B as well as u₊ = u₋ on ∂D and γ₀ ∂u₋/∂ν = ∂u₊/∂ν on ∂D. Hint: Use Green's first theorem.

For the following problems, let B be the unit disk in R² with center at the origin.
+ ρ ,2 1 ρ 1 ln r − ln 1 + − 2 cos(t − τ ) 2π 4π r r ∞ 1 n 1 2 and show n=1 n α cos(ns) = − 2 ln 1 + α − 2 α cos s by diﬀerentiation respect to α and applying the geometric series formula for ∞ with n−1 exp(ins). n=1 α Φ(x, z) = −
6.3 Show that φ_z from (6.23) is given by
\[ \phi_z(x) = \frac{a\cdot(x-z)}{\pi\,|x-z|^2}\,, \quad |x| = 1\,. \]
Also compute φ_z in polar coordinates for a = (cos α, sin α) by the formulas of Problem 6.2.
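The closed form in Problem 6.3 can be cross-checked numerically. The sketch below (ours) assumes the boundary reduction N(x,z) = −(1/π) ln|x−z| for |x| = 1, which follows from the formulas of Problem 6.2, and compares a central difference of N in z with the stated expression:

```python
import numpy as np

# On |x| = 1 the Neumann Green's function of the unit disk reduces to
# N(x, z) = -(1/pi) ln|x - z|, so phi_z(x) = a . (x - z) / (pi |x - z|^2).
def N_boundary(x, z):
    return -np.log(np.linalg.norm(x - z)) / np.pi

def phi_closed(x, z, a):
    d = x - z
    return a @ d / (np.pi * d @ d)

x = np.array([np.cos(0.7), np.sin(0.7)])   # point on the unit circle
z = np.array([0.2, -0.3])                  # point inside the disk
a = np.array([1.0, 0.0])
h = 1e-6                                   # central difference in z along a
fd = (N_boundary(x, z + h * a) - N_boundary(x, z - h * a)) / (2 * h)
assert abs(fd - phi_closed(x, z, a)) < 1e-6
```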
6.4 Compute the eigenvalues λ_n and the normalized eigenfunctions ψ_n ∈ L²(∂B) of Λ₁ − Λ_{γ_R} and the coefficients (φ_z, ψ_n)_{L²(∂B)} for the case of Example 6.6. Compute the ratios (φ_z, ψ_n)²_{L²(∂B)}/λ_n and validate the condition (6.29) of Theorem 6.18.

6.5 Consider the case of D ⊂ B being the annulus D = {x ∈ B : R₁ < |x| < R₂} for some 0 < R₁ < R₂ < 1. Compute again the eigenvalues λ_n and the normalized eigenfunctions ψ_n ∈ L²(∂B) of Λ₁ − Λ and the coefficients (φ_z, ψ_n)_{L²(∂B)}. Verify that you can only determine the outer boundary {x : |x| = R₂} by the factorization method.

6.6 Let f : [a,b] → R_{>0} be a Lipschitz continuous function; that is, f(x) > 0 for all x ∈ [a,b] and there exists L > 0 with |f(x) − f(y)| ≤ L|x − y| for all x, y ∈ [a,b]. Define
\[ D := \bigl\{\,(x_1, x_2) \in [a,b]\times\mathbb R : 0 < x_2 < f(x_1)\,\bigr\}\,. \]
Show that this Lipschitz domain D ⊂ R² satisfies the exterior cone condition of Assumption 6.10.
Chapter 7
An Inverse Scattering Problem

7.1 Introduction
We consider acoustic waves that travel in a medium such as a fluid. Let v(x,t) be the velocity vector of a particle at x ∈ R³ and time t, and let p(x,t), ρ(x,t), and S(x,t) denote the pressure, density, and specific entropy, respectively, of the fluid. We assume that no exterior forces act on the fluid. Then the motion of the particle is described by the following equations:
\[ \frac{\partial v}{\partial t} + (v\cdot\nabla)v + \gamma\,v + \frac{1}{\rho}\,\nabla p = 0 \quad\text{(Euler's equation)}\,, \tag{7.1a} \]
\[ \frac{\partial\rho}{\partial t} + \operatorname{div}(\rho\,v) = 0 \quad\text{(continuity equation)}\,, \tag{7.1b} \]
\[ f(\rho, S) = p \quad\text{(equation of state)}\,, \tag{7.1c} \]
\[ \frac{\partial S}{\partial t} + v\cdot\nabla S = 0 \quad\text{(adiabatic hypothesis)}\,, \tag{7.1d} \]
where the function f depends on the fluid and γ is a damping coefficient, which we assume to be piecewise constant. This system is nonlinear in the unknown functions v, ρ, p, and S.

Let the stationary case be described by v₀ = 0, time-independent distributions ρ = ρ₀(x) and S = S₀(x), and constant p₀ such that p₀ = f(ρ₀(x), S₀(x)). The linearization of this nonlinear system is given by the (directional) derivative of this system at (v₀, p₀, ρ₀, S₀). For deriving the linearization, we set
\[ v(x,t) = \varepsilon\,v_1(x,t) + O(\varepsilon^2)\,, \qquad\qquad p(x,t) = p_0 + \varepsilon\,p_1(x,t) + O(\varepsilon^2)\,, \]
\[ \rho(x,t) = \rho_0(x) + \varepsilon\,\rho_1(x,t) + O(\varepsilon^2)\,, \qquad S(x,t) = S_0(x) + \varepsilon\,S_1(x,t) + O(\varepsilon^2)\,, \]
© Springer Nature Switzerland AG 2021. A. Kirsch, An Introduction to the Mathematical Theory of Inverse Problems, Applied Mathematical Sciences 120, https://doi.org/10.1007/978-3-030-63343-1_7
and we substitute this into (7.1a)–(7.1d). Ignoring terms of order O(ε²) leads to the linear system
\[ \frac{\partial v_1}{\partial t} + \gamma\,v_1 + \frac{1}{\rho_0}\,\nabla p_1 = 0\,, \tag{7.2a} \]
\[ \frac{\partial\rho_1}{\partial t} + \operatorname{div}(\rho_0\,v_1) = 0\,, \tag{7.2b} \]
\[ \frac{\partial f(\rho_0,S_0)}{\partial\rho}\,\rho_1 + \frac{\partial f(\rho_0,S_0)}{\partial S}\,S_1 = p_1\,, \tag{7.2c} \]
\[ \frac{\partial S_1}{\partial t} + v_1\cdot\nabla S_0 = 0\,. \tag{7.2d} \]
First, we eliminate S₁. Because
\[ 0 = \nabla f\bigl(\rho_0(x), S_0(x)\bigr) = \frac{\partial f(\rho_0,S_0)}{\partial\rho}\,\nabla\rho_0 + \frac{\partial f(\rho_0,S_0)}{\partial S}\,\nabla S_0\,, \]
we conclude by differentiating (7.2c) with respect to t and using (7.2d) that
\[ \frac{\partial p_1}{\partial t} = c(x)^2\,\Bigl[\frac{\partial\rho_1}{\partial t} + v_1\cdot\nabla\rho_0\Bigr]\,, \tag{7.2e} \]
where the speed of sound c is defined by
\[ c(x)^2 := \frac{\partial f}{\partial\rho}\bigl(\rho_0(x), S_0(x)\bigr)\,. \]
Now we eliminate v₁ and ρ₁ from the system. This can be achieved by differentiating (7.2e) with respect to time and using equations (7.2a) and (7.2b). This leads to the wave equation for p₁:
\[ \frac{\partial^2 p_1(x,t)}{\partial t^2} + \gamma\,\frac{\partial p_1(x,t)}{\partial t} = c(x)^2\,\rho_0(x)\,\operatorname{div}\Bigl(\frac{1}{\rho_0(x)}\,\nabla p_1(x,t)\Bigr)\,. \tag{7.3} \]
Now we assume that terms involving ∇ρ₀ are negligible and that p₁ is time-periodic; that is, of the form p₁(x,t) = Re[u(x) e^{−iωt}] with frequency ω > 0 and a complex-valued function u = u(x) depending only on the spatial variable. Substituting this into the wave equation (7.3) yields the three-dimensional Helmholtz equation for u:
\[ \Delta u(x) + \frac{\omega^2}{c(x)^2}\,\Bigl(1 + i\,\frac{\gamma}{\omega}\Bigr)\,u = 0\,. \]
In free space, c = c₀ is constant and γ = 0. We define the wave number and the index of refraction by
\[ k := \frac{\omega}{c_0} > 0 \quad\text{and}\quad n(x) := \frac{c_0^2}{c(x)^2}\,\Bigl(1 + i\,\frac{\gamma}{\omega}\Bigr)\,. \tag{7.4} \]
The Helmholtz equation then takes the form
\[ \Delta u + k^2\,n\,u = 0\,, \tag{7.5} \]
where n is a complex-valued function with Re n(x) ≥ 0 and Im n(x) ≥ 0. This equation holds in every source-free domain in R³. We assume in this chapter that there exists a > 0 such that c(x) = c₀ and γ(x) = 0 for all |x| ≥ a; that is, n(x) = 1 for |x| ≥ a. This means that the inhomogeneous medium {x ∈ R³ : n(x) ≠ 1} is bounded and contained in the ball B(0,a) := {y ∈ R³ : |y| < a} of radius a. By B[0,a] := {y ∈ R³ : |y| ≤ a} we denote its closure. We further assume that the sources lie outside the ball B[0,a]. These sources generate "incident" fields uⁱ that satisfy the unperturbed Helmholtz equation Δuⁱ + k²uⁱ = 0 outside the sources. In this introduction, we assume that uⁱ is either a point source or a plane wave; that is, the time-dependent incident fields have the form
\[ p_1^i(x,t) = \frac{1}{|x-z|}\,\operatorname{Re} e^{ik|x-z| - i\omega t}\,; \quad\text{that is,}\quad u^i(x) = \frac{e^{ik|x-z|}}{|x-z|}\,, \]
for a source at z ∈ R³, or
\[ p_1^i(x,t) = \operatorname{Re} e^{ik\hat\theta\cdot x - i\omega t}\,; \quad\text{that is,}\quad u^i(x) = e^{ik\hat\theta\cdot x}\,, \]
for a unit vector θ̂ ∈ R³. In any case, uⁱ is a solution of the Helmholtz equation Δuⁱ + k²uⁱ = 0 in R³ \ {z} or R³, respectively. In the first case, p₁ⁱ describes a spherical wave that travels away from the source with velocity c₀. In the second case, p₁ⁱ is a plane wave that travels in the direction θ̂ with velocity c₀.

The incident field is disturbed by the medium described by the index of refraction n and produces a "scattered wave" uˢ. The total field u = uⁱ + uˢ satisfies the Helmholtz equation Δu + k²n u = 0 outside the sources; that is, the scattered field uˢ satisfies the inhomogeneous equation
\[ \Delta u^s + k^2\,n\,u^s = k^2\,(1-n)\,u^i\,, \tag{7.6} \]
where the right-hand side is a function of compact support in B(0,a). Furthermore, we expect the scattered field uˢ to behave as a spherical wave far away from the medium. This can be described by the following radiation condition:
\[ \frac{\partial u^s(x)}{\partial r} - ik\,u^s(x) = O(1/r^2) \quad\text{as } r = |x| \to \infty\,, \tag{7.7} \]
uniformly in x/|x| ∈ S². Here we denote by S² the unit sphere in R³. The smoothness of the solution uˢ depends on the smoothness of the refractive index n. We refer to the beginning of Section 7.2 for more details.

We have now derived an (almost) complete description of the direct scattering problem: Let the wave number k > 0, the index of refraction n ∈ L∞(R³) with n(x) = 1 for |x| ≥ a, and the incident field uⁱ be given. Determine the scattered
field uˢ that satisfies the source equation (7.6) in some generalized sense (to be made precise later) and the radiation condition (7.7).

In the inverse problem, one tries to determine the index of refraction n from measurements of the field u outside of B(0,a) for several different incident fields uⁱ and/or different wave numbers k. The following example shows that the radially symmetric case reduces to an ordinary differential equation.

Example 7.1 Let n = n(r) be radially symmetric; that is, n is independent of the spherical angular coordinates. Because in spherical polar coordinates (r, φ, θ),
\[ \Delta = \frac{1}{r^2}\,\frac{\partial}{\partial r}\Bigl(r^2\,\frac{\partial}{\partial r}\Bigr) + \frac{1}{r^2\sin\theta}\,\frac{\partial}{\partial\theta}\Bigl(\sin\theta\,\frac{\partial}{\partial\theta}\Bigr) + \frac{1}{r^2\sin^2\theta}\,\frac{\partial^2}{\partial\phi^2}\,, \]
the Helmholtz equation for radially symmetric u = u(r) reduces to the following ordinary differential equation of second order,
\[ \frac{1}{r^2}\,\bigl(r^2\,u'(r)\bigr)' + k^2\,n(r)\,u(r) = 0\,; \quad\text{that is,}\quad u''(r) + \frac{2}{r}\,u'(r) + k^2\,n(r)\,u(r) = 0 \ \text{ for } r > 0\,. \tag{7.8a} \]
From the theory of linear ordinary differential equations of second order with singular coefficients, we know that in a neighborhood of r = 0 there exist two linearly independent solutions, a regular one and one with a singularity at r = 0. We construct them by making the substitution u(r) = v(r)/r in (7.8a). This yields the equation
\[ v''(r) + k^2\,n(r)\,v(r) = 0 \quad\text{for } r > 0\,. \tag{7.8b} \]
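Equation (7.8b) is also convenient numerically. The following sketch (the discretization and all names are ours, not the book's) integrates it with a fixed-step classical Runge–Kutta scheme, which suffices to build the fundamental system discussed in this example:

```python
import numpy as np

# Integrate v'' + k^2 n(r) v = 0 from (7.8b) by classical RK4 with the
# initial data of the regular solution: v(0) = 0, v'(0) = 1.
def solve_radial(n_of_r, k, r_max=1.0, steps=2000, v0=0.0, dv0=1.0):
    h = r_max / steps
    y = np.array([v0, dv0])                        # y = (v, v')
    f = lambda r, y: np.array([y[1], -k**2 * n_of_r(r) * y[0]])
    r = 0.0
    for _ in range(steps):
        k1 = f(r, y)
        k2 = f(r + h / 2, y + h / 2 * k1)
        k3 = f(r + h / 2, y + h / 2 * k2)
        k4 = f(r + h, y + h * k3)
        y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        r += h
    return y[0]                                    # v(r_max)

# for n = 1 the regular solution is v(r) = sin(kr)/k
k = 5.0
assert abs(solve_radial(lambda r: 1.0, k) - np.sin(k) / k) < 1e-8
```

Swapping the initial data (v0=1.0, dv0=0.0) produces the second fundamental solution, whose quotient v(r)/r is singular at the origin.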
For the simplest case, where n(r) = 1, we readily see that u₁(r) = α sin(kr)/r and u₂(r) = β cos(kr)/r are two linearly independent solutions: u₁ is regular and u₂ is singular at the origin. Neither of them satisfies the radiation condition. However, the combination u(r) = γ exp(ikr)/r does satisfy the radiation condition because
\[ u'(r) - ik\,u(r) = -\gamma\,\frac{\exp(ikr)}{r^2} = O\Bigl(\frac{1}{r^2}\Bigr)\,, \]
as is readily seen. For the case of arbitrary n, we construct a fundamental system {v₁, v₂} of (7.8b) (compare with Section 5.2); that is, v₁ and v₂ satisfy (7.8b) with v₁(0) = 0, v₁′(0) = 1 and v₂(0) = 1, v₂′(0) = 0. Then u₁(r) = v₁(r)/r is the regular and u₂(r) = v₂(r)/r is the singular solution.

In the next section, we rigorously formulate the direct scattering problem and prove the uniqueness and existence of a solution. The basic ingredients for the uniqueness proof are a result by Rellich (see [222]) and a unique continuation principle for solutions of the Helmholtz equation. We prove neither
Rellich's lemma nor the general continuation principle, but rather give a simple proof of a special case of a unique continuation principle that is sufficient for the uniqueness proof of the direct problem. This approach was suggested by Hähner (see [116]). We then show the equivalence of the scattering problem with an integral equation. Existence is then proven by an application of the Riesz theorem A.36 of Appendix A. Section 7.3 is devoted to the introduction of the far field patterns that describe the scattered fields "far away" from the medium. We collect some results on the far field operator, several of which are needed in Sections 7.5 and 7.7. The question of injectivity of the far field operator is closely related to an unusual eigenvalue problem which we call the interior transmission eigenvalue problem; we investigate this eigenvalue problem in Section 7.6. In Section 7.4, we prove uniqueness of the inverse problem. Section 7.5 is devoted to the factorization method, which corresponds to the method of Section 6.4 and provides a very simple characterization of the support of the contrast by the far field patterns. This method is rigorously justified under the assumption that, again, the wave number is not an interior transmission eigenvalue. Since the interior transmission eigenvalue problem is an interesting problem in itself and has been widely studied during the past decade, we include Section 7.6 on some aspects of this eigenvalue problem. Finally, in Section 7.7, we present three classical numerical algorithms for solving the inverse scattering problem.
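The reduction in Example 7.1 is easy to check numerically. The following sketch (assuming NumPy and SciPy are available; the function names are our own) integrates (7.8b) for the two initial conditions of the fundamental system and verifies that, for n ≡ 1, the regular solution is v₁(r) = sin(kr)/k, i.e. u₁(r) = sin(kr)/r.

```python
import numpy as np
from scipy.integrate import solve_ivp

def fundamental_system(n, k, r_max=1.0, num=201):
    """Fundamental system {v1, v2} of v'' + k^2 n(r) v = 0 (equation (7.8b)):
    v1(0) = 0, v1'(0) = 1 and v2(0) = 1, v2'(0) = 0."""
    r = np.linspace(0.0, r_max, num)
    rhs = lambda t, y: [y[1], -k**2 * n(t) * y[0]]
    v1 = solve_ivp(rhs, (0.0, r_max), [0.0, 1.0], t_eval=r,
                   rtol=1e-10, atol=1e-12).y[0]
    v2 = solve_ivp(rhs, (0.0, r_max), [1.0, 0.0], t_eval=r,
                   rtol=1e-10, atol=1e-12).y[0]
    return r, v1, v2

k = 2.0
r, v1, v2 = fundamental_system(lambda t: 1.0, k)
# for n = 1 the regular solution is v1(r) = sin(kr)/k, i.e. u1(r) = sin(kr)/r
assert np.max(np.abs(v1 - np.sin(k * r) / k)) < 1e-6
```

For variable n(r) the same routine yields u₁ = v₁/r away from r = 0; only the constant-coefficient case has the closed form used in the assertion.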
7.2 The Direct Scattering Problem
In this section, we collect properties of solutions to the Helmholtz equation that are needed later. We prove uniqueness and existence of the direct scattering problem and introduce the far field pattern. In the remaining part of this chapter, we restrict ourselves to scattering problems for plane incident fields.

Throughout this chapter, we make the following assumptions. Let n ∈ L∞(R³) and a > 0 with n(x) = 1 for almost all ‖x‖ ≥ a. Assume that Re n(x) ≥ 0 and Im n(x) ≥ 0 for almost all x ∈ R³. Let k ∈ R, k > 0, and θ̂ ∈ R³ with ‖θ̂‖ = 1. We set u^i(x) := exp(ik θ̂·x) for x ∈ R³. Then u^i solves the Helmholtz equation

$$\Delta u^i + k^2 u^i = 0 \quad\text{in } \mathbb{R}^3\,. \qquad (7.9)$$

We again formulate the direct scattering problem. Given n, k, θ̂ satisfying the previous assumptions, determine u ∈ H²_loc(R³) such that

$$\Delta u + k^2 n\,u = 0 \quad\text{in } \mathbb{R}^3\,, \qquad (7.10)$$

and u^s := u − u^i satisfies the Sommerfeld radiation condition

$$\frac{\partial u^s}{\partial r} - iku^s = \mathcal{O}(1/r^2) \quad\text{for } r = \|x\| \to \infty\,, \qquad (7.11)$$

uniformly in x/‖x‖ ∈ S². Since the index function n is not smooth, we cannot expect the solution u to be smooth either. Rather, it belongs to the local
Sobolev space H²_loc(R³). We recall that for any open set Ω ⊂ R³ and p ∈ N the Sobolev space H^p(Ω) is defined as the completion of

$$\Bigl\{ u \in C^p(\Omega) : \frac{\partial^{|j|} u}{\partial x_1^{j_1}\partial x_2^{j_2}\partial x_3^{j_3}} \in L^2(\Omega)\ \text{for all } j \in \mathbb{N}_0^3 \text{ with } |j| \le p \Bigr\}$$

with respect to the norm

$$\|u\|^2_{H^p(\Omega)} = \sum_{j\in\mathbb{N}_0^3,\ |j|\le p}\ \int_\Omega \Bigl|\frac{\partial^{|j|} u(x)}{\partial x_1^{j_1}\partial x_2^{j_2}\partial x_3^{j_3}}\Bigr|^2\,dx\,.$$

Here we have set |j| = j₁ + j₂ + j₃ for j = (j₁, j₂, j₃) ∈ N₀³. We refer to Chapter 6, where we already used Sobolev spaces of functions on two-dimensional domains B. The local spaces H^p_loc(Ω) are defined by

$$H^p_{loc}(\Omega) = \bigl\{ u : \Omega \to \mathbb{C} \ :\ u|_B \in H^p(B) \text{ for every bounded domain } B \text{ with } \bar B \subseteq \Omega \bigr\}\,.$$

For us, the spaces H¹(Ω) and H²(Ω) (and their local analogues) are particularly important. We define the subspace H₀^p(Ω) of H^p(Ω) as the closure in H^p(Ω) of the set

$$C_0^p(\Omega) = \bigl\{ \psi \in C^p(\Omega) : \psi \text{ has compact support in } \Omega \bigr\}\,.$$

By definition it is closed, and one can show that it is a strict subspace of H^p(Ω). The space H₀¹(Ω) models the class of functions which vanish on the boundary of Ω, while for functions in H₀²(Ω) also the normal derivatives ∂ψ/∂ν vanish on ∂Ω. The following versions of Green's theorem are not difficult to prove by approximating u and v by sequences of smooth functions.

Lemma 7.2 Let Ω ⊂ R³ be a bounded domain. Then

$$\int_\Omega \bigl(u\,\Delta v + \nabla u\cdot\nabla v\bigr)\,dx = 0 \quad\text{for all } u \in H_0^1(\Omega),\ v\in H^2(\Omega)\,, \qquad (7.12a)$$

$$\int_\Omega \bigl(u\,\Delta v - v\,\Delta u\bigr)\,dx = 0 \quad\text{for all } u \in H^2(\Omega),\ v\in H_0^2(\Omega)\,. \qquad (7.12b)$$
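A one-dimensional analogue of (7.12a) makes a quick sanity check: if u vanishes at the endpoints, the boundary term of partial integration disappears. The sketch below is our own illustration (not from the text) and verifies ∫₀¹ (u v″ + u′ v′) dx = 0 numerically.

```python
import numpy as np
from scipy.integrate import quad

# u has double zeros at both endpoints, so it mimics an H_0^1 function on (0,1)
u  = lambda x: np.sin(np.pi * x) ** 2
du = lambda x: np.pi * np.sin(2 * np.pi * x)
# v is an arbitrary smooth function playing the role of an H^2 function
v_p  = lambda x: -3.0 * np.sin(3.0 * x)
v_pp = lambda x: -9.0 * np.cos(3.0 * x)

integral, _ = quad(lambda x: u(x) * v_pp(x) + du(x) * v_p(x), 0.0, 1.0)
assert abs(integral) < 1e-10   # equals the boundary term [u v']_0^1 = 0
```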
The application of (7.12a) to solutions of the Helmholtz equation yields the following version.

Lemma 7.3 Let v, w ∈ H²(B(0,b)) be solutions of the Helmholtz equation Δu + k²u = 0 in some annular region A = {x ∈ R³ : a < ‖x‖ < b}. Then v and w are analytic in A, and for every R ∈ (a,b) it holds that

$$\int_{\|x\|=R} v\,\frac{\partial w}{\partial r}\,ds = \int_{\|x\|<R} \bigl(\nabla v\cdot\nabla w + v\,\Delta w\bigr)\,dx\,. \qquad (7.13)$$

Proof: The smoothness of v and w follows from general regularity results for solutions of the Helmholtz equation Δu + k²u = 0 and is not proven here; see, e.g., [161], Corollary 3.4, or [55], Theorem 2.2. To show (7.13) we choose ρ ∈ C∞(R³) with compact support in B(0,b) such that ρ(x) = 1 for ‖x‖ ≤ R. Then ρv ∈ H₀¹(B(0,b)), as is easily seen, and thus by (7.12a)

$$0 = \int_{\|x\|<b} \bigl[\rho v\,\Delta w + \nabla w\cdot\nabla(\rho v)\bigr]\,dx\,.$$

For t > 0 we estimate

$$\bigl|-\|j\|^2 + i\,j\cdot(2t\hat e - ip) - (it + \alpha)\bigr| \ge \bigl|\mathrm{Re}[\cdots]\bigr| = \bigl|\,\|j\|^2 + 2t j_2 - j\cdot p + \alpha\,\bigr| \ge \tfrac12\bigl(1 + \|j\|^2\bigr)$$

for all j ∈ Z³ with ‖j‖ ≥ j₀, for some j₀ ∈ N. Furthermore,

$$\bigl|-\|j\|^2 + i\,j\cdot(2t\hat e - ip) - (it + \alpha)\bigr| \ge \bigl|\mathrm{Im}[\cdots]\bigr| = t\,|2j_1 - 1| \ge t$$

for all j ∈ Z³ and t > 0. Therefore, the coefficients

$$w_j = \frac{g_j}{-\|j\|^2 + i\,j\cdot(2t\hat e - ip) - (it + \alpha)}$$

are well defined for all j ∈ Z³, and w ∈ H²_per(Q) because

$$\sum_{\|j\|\ge j_0} \bigl(1 + \|j\|^2\bigr)^2 |w_j|^2 \le 4 \sum_{\|j\|\ge j_0} |g_j|^2\,.$$

Furthermore, the solution operator

$$(L_t g)(x) := \sum_{j\in\mathbb{Z}^3} \frac{g_j\,e^{i\,j\cdot x}}{-\|j\|^2 + i\,j\cdot(2t\hat e - ip) - (it + \alpha)}\,, \quad g \in L^2(Q)\,,$$

is bounded from L²(Q) into itself with ‖L_t‖_{L(L²(Q))} ≤ 1/t for every t > 0.

Now we can give a simple proof of the following version of a unique continuation principle.

Theorem 7.7 Let n ∈ L∞(R³) with n(x) = 1 for ‖x‖ ≥ a be given. Let u ∈ H²(R³) be a solution of the Helmholtz equation Δu + k²nu = 0 in R³ such that u(x) = 0 for all ‖x‖ ≥ b for some b ≥ a. Then u has to vanish in all of R³.

Proof: Define ê = (1, i, 0) ∈ C³ as before, set ρ = 2b/π, and define the function

$$w(x) := e^{i x_1/2 - t\,\hat e\cdot x}\,u(\rho x)\,, \quad x \in Q := (-\pi,\pi)^3\,,$$
for some t > 0. Then w(x) = 0 for all ‖x‖ ≥ π/2, in particular near the boundary of the cube Q. Extend w to a 2π-periodic function on R³ by w(2πj + x) := w(x) for x ∈ Q and all j ∈ Z³, j ≠ 0. Then w ∈ H²_per(Q), and w satisfies the differential equation

$$\Delta w + (2t\hat e - ip)\cdot\nabla w - (it + 1/4)\,w = -\rho^2 k^2\,\tilde n\,w\,.$$

Here we have set p = (1, 0, 0) and ñ(2πj + x) := n(ρx) for almost all x ∈ [−π,π]³ and j ∈ Z³. Application of the previous lemma to this differential equation yields the existence of a bounded linear operator L_t from L²(Q) into itself with ‖L_t‖_{L(L²(Q))} ≤ 1/t such that the differential equation is equivalent to

$$w = -\rho^2 k^2\,L_t(\tilde n\,w)\,.$$

Estimating

$$\|w\|_{L^2(Q)} \le \frac{\rho^2 k^2}{t}\,\|\tilde n\,w\|_{L^2(Q)} \le \frac{\rho^2 k^2\,\|n\|_\infty}{t}\,\|w\|_{L^2(Q)}$$

yields w = 0 for sufficiently large t > 0. Thus also u has to vanish. □
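The heart of this argument is the lower bound |symbol| ≥ t for the Fourier symbol of Δ + (2tê − ip)·∇ − (it + α), which gives ‖L_t‖ ≤ 1/t. A small numerical sketch (our own, with α = 1/4 as in the proof) confirms the bound on a grid of frequencies j ∈ Z³.

```python
import itertools

def symbol(j, t, alpha=0.25):
    """Value of -|j|^2 + i j.(2t e - i p) - (i t + alpha) for e = (1, i, 0),
    p = (1, 0, 0), i.e. the symbol acting on exp(i j.x)."""
    j1, j2, j3 = j
    norm2 = j1 * j1 + j2 * j2 + j3 * j3
    # real part: -|j|^2 - 2 t j2 + j.p - alpha ; imaginary part: t (2 j1 - 1)
    return complex(-norm2 - 2 * t * j2 + j1 - alpha, t * (2 * j1 - 1))

t = 3.0
m = min(abs(symbol(j, t)) for j in itertools.product(range(-8, 9), repeat=3))
# |Im symbol| = t |2 j1 - 1| >= t for integer j1, hence ||L_t|| <= 1/t
assert m >= t
```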
The preceding theorem is a special case of a far more general unique continuation principle, which we formulate without proof here: Let u ∈ H²_loc(Ω) be a solution of the Helmholtz equation Δu + k²nu = 0 in a domain Ω ⊂ R³ (i.e., Ω is open and connected), where n ∈ L∞(Ω), and let u vanish on some open subset of Ω. Then u = 0 in all of Ω. For a proof we refer to, for example, [55]. Now we can prove the following uniqueness result.

Theorem 7.8 (Uniqueness) The problem (7.10), (7.11) has at most one solution; that is, if u is a solution corresponding to u^i = 0, then u = 0.

Proof: Let u^i = 0. The radiation condition (7.11) yields

$$\mathcal{O}(1/R^2) = \int_{\|x\|=R} \Bigl|\frac{\partial u}{\partial r} - iku\Bigr|^2 ds = \int_{\|x\|=R} \Bigl[\Bigl|\frac{\partial u}{\partial r}\Bigr|^2 + k^2|u|^2\Bigr] ds + 2k\,\mathrm{Im} \int_{\|x\|=R} u\,\frac{\partial \bar u}{\partial r}\,ds\,. \qquad (7.18)$$

We transform the last integral using Green's formula (7.13) for v = u and w = ū; that is,

$$\int_{\|x\|=R} u\,\frac{\partial \bar u}{\partial r}\,ds = \int_{\|x\|<R} \bigl(|\nabla u|^2 - k^2\,\bar n\,|u|^2\bigr)\,dx\,,$$

there exists c > 0 (depending only on n, D, and A) such that

$$\|w\|_{H^2(A)} \le c\,\bigl(\|f\|_{L^2(D)} + \|w\|_{L^2(D)}\bigr)\,.$$
(b) Every solution w ∈ H²(D) of Δw + k²nw = f in D is also an ultraweak solution.

Proof: (a) Let A be any open bounded set such that Ā ⊂ D. We choose ρ ∈ C∞(D) with compact support in D such that ρ = 1 on A. Furthermore, let Q be a cube containing D̄ in its interior. For any φ ∈ C∞(R³) we take ψ = ρφ in (7.21). With Δψ = ρΔφ + 2∇ρ·∇φ + φΔρ we have

$$\int_D \rho w\,(\varphi - \Delta\varphi)\,dx = \int_D \Bigl[ w\bigl((k^2 n + 1)\rho\varphi + 2\nabla\rho\cdot\nabla\varphi + \varphi\,\Delta\rho\bigr) - f\rho\varphi \Bigr]\,dx = \int_D \bigl(g\,\varphi + h\cdot\nabla\varphi\bigr)\,dx \qquad (7.22)$$

with g = w[(k²n + 1)ρ + Δρ] − fρ and h = 2w∇ρ. We can replace the region of integration by Q because the integrand vanishes in Q \ D. Without loss of generality we assume that Q = (−π,π)³. Since g, h ∈ L²(Q), we expand them into Fourier series g(x) = Σ_{j∈Z³} g_j e^{ij·x} and h(x) = Σ_{j∈Z³} h_j e^{ij·x} with g_j ∈ C and h_j ∈ C³ such that Σ_{j∈Z³} |g_j|² < ∞ and Σ_{j∈Z³} ‖h_j‖² < ∞, and make the ansatz (ρw)(x) = Σ_{j∈Z³} w_j e^{ij·x} in Q. We take φ(x) = e^{−iℓ·x} for some ℓ ∈ Z³ in (7.22) and obtain (1 + ‖ℓ‖²) w_ℓ = g_ℓ − i ℓ·h_ℓ, and thus

$$\bigl(1 + \|\ell\|^2\bigr)^2 |w_\ell|^2 \le 2\bigl(|g_\ell|^2 + \|\ell\|^2 \|h_\ell\|^2\bigr) \le 2\,\bigl(1 + \|\ell\|^2\bigr)\,\bigl(|g_\ell|^2 + \|h_\ell\|^2\bigr)\,. \qquad (7.23)$$

Therefore, ρw ∈ H¹_per(Q) and

$$\|w\|_{H^1(A)} \le c_1\,\|\rho w\|_{H^1_{per}(Q)} \le c_2\bigl(\|g\|_{L^2(Q)} + \|h\|_{L^2(Q)}\bigr) \le c_3\bigl(\|w\|_{L^2(D)} + \|f\|_{L^2(D)}\bigr)\,.$$

Since A was arbitrary, we have shown that w ∈ H¹_loc(D). Now we repeat the first part of the proof but apply Green's first formula to the second term of the
right-hand side of (7.22). This yields

$$\int_Q \rho w\,(\varphi - \Delta\varphi)\,dx = \int_Q \varphi\,\bigl[ w(k^2 n + 1)\rho - 2\,\mathrm{div}(w\nabla\rho) + w\,\Delta\rho - f\rho \bigr]\,dx =: \int_Q \tilde g\,\varphi\,dx\,.$$

Now we argue in the same way but with h = 0. Estimate (7.23) yields (1 + ‖ℓ‖²)² |w_ℓ|² ≤ 2|g̃_ℓ|², and thus ρw ∈ H²_per(Q) and

$$\|w\|_{H^2(A)} \le c_1\,\|\rho w\|_{H^2_{per}(Q)} \le c_2\,\|\tilde g\|_{L^2(Q)} \le c_3\bigl(\|w\|_{L^2(D)} + \|\nabla w\|_{L^2(B)} + \|f\|_{L^2(D)}\bigr)\,,$$

where B is the support of ρ. Now we substitute the estimate for ‖w‖_{H¹(B)} from the first step, which yields the desired estimate.

(b) This follows directly from Green's theorem in the form (7.12b). □

Now we construct volume potentials with the fundamental solution (7.19).

Theorem 7.11 Let Ω ⊂ R³ be a bounded domain. For every φ ∈ L²(Ω) the volume potential

$$v(x) := \int_\Omega \varphi(y)\,\Phi(x,y)\,dy\,, \quad x \in \mathbb{R}^3\,, \qquad (7.24)$$

yields a function v ∈ H²_loc(R³) that satisfies the radiation condition (7.11) and is the only radiating solution (that is, a solution satisfying the radiation condition (7.11)) of Δv + k²v = −φ. Furthermore, for every ball B = B(0,R) containing Ω̄ in its interior there exists c > 0 (depending only on B, k, and Ω) such that

$$\|v\|_{H^2(B)} \le c\,\|\varphi\|_{L^2(\Omega)}\,. \qquad (7.25)$$

Proof: First we state without proof (see, e.g., [161], Theorem 3.9) that for any k ∈ C and φ ∈ C₀¹(Ω), that is, φ ∈ C¹(Ω) with compact support in Ω, the potential v is in C²(R³) and solves Δv + k²v = −φ in Ω and Δv + k²v = 0 in the exterior of Ω. Second, we fix φ ∈ L²(Ω) and choose a sequence φ_j ∈ C₀¹(Ω) which converges to φ in L²(Ω). Let v and v_j be the corresponding potentials, and let ψ ∈ C∞(R³) be a test function with compact support. Then Δv_j + k²v_j = −φ_j in R³ and thus, by Green's second theorem,

$$\int_{\mathbb{R}^3} v_j\,(\Delta\psi + k^2\psi)\,dx = -\int_\Omega \varphi_j\,\psi\,dx\,.$$

Let B(0,R) be a ball that contains the support of ψ. From the boundedness of the volume integral operator from L²(Ω) into L²(B(0,R)) we conclude that v_j converges to v in L²(B(0,R)) as j tends to infinity. Therefore,

$$\int_{\mathbb{R}^3} v\,(\Delta\psi + k^2\psi)\,dx = -\int_\Omega \varphi\,\psi\,dx$$
for all ψ ∈ C∞(R³) with compact support. Therefore, v is an ultraweak solution of Δv + k²v = −φ in R³. The regularity result of Lemma 7.10 applied to D = B(0, R+1) yields v ∈ H²_loc(B(0, R+1)) and the estimate

$$\|v\|_{H^2(B(0,R))} \le c_1\bigl(\|v\|_{L^2(B(0,R+1))} + \|\varphi\|_{L^2(\Omega)}\bigr) \le c_2\,\|\varphi\|_{L^2(\Omega)}\,,$$

where we used again the boundedness of the volume potential from L²(Ω) to L²(B(0, R+1)). □
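The definition (7.19) of the fundamental solution Φ is not repeated in this excerpt; assuming it is the standard outgoing kernel Φ(x, y) = e^{ik|x−y|}/(4π|x−y|), one can confirm numerically that x ↦ Φ(x, 0) satisfies the Helmholtz equation away from the source. The sketch below is our own check, approximating the Laplacian by central differences.

```python
import numpy as np

k = 2.0

def phi(x):
    """Assumed outgoing fundamental solution e^{ik|x|}/(4 pi |x|), source at 0."""
    r = np.linalg.norm(x)
    return np.exp(1j * k * r) / (4.0 * np.pi * r)

def helmholtz_residual(x, h=1e-3):
    """|Delta phi + k^2 phi| at x, Laplacian via second-order differences."""
    lap = sum((phi(x + h * e) - 2.0 * phi(x) + phi(x - h * e)) / h**2
              for e in np.eye(3))
    return abs(lap + k**2 * phi(x))

# the Helmholtz equation holds away from the source point
assert helmholtz_residual(np.array([1.0, 0.5, 0.3])) < 1e-4
```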
Now we can transform the scattering problem into a Fredholm integral equation of the second kind. The following theorem is needed quite often later on.

Theorem 7.12 (a) Let u ∈ H²_loc(R³) be a solution of the scattering problem (7.10), (7.11). Then u|_{B(0,a)} belongs to L²(B(0,a)) and solves the Lippmann–Schwinger integral equation

$$u(x) = u^i(x) - k^2 \int_{\|y\|<a} \bigl(1 - n(y)\bigr)\,\Phi(x,y)\,u(y)\,dy\,, \quad x \in B(0,a)\,. \qquad (7.26)$$

Lemma 7.44 The characteristic function d from (7.87) is an even entire function of order one provided n(1) ≠ 1 or η := ∫₀¹ √(n(s)) ds ≠ 1.

Proof: d is an even function because y_k and also k ↦ sin k/k and cos are all functions of k². Furthermore, from the Volterra integral equation (5.12b) for the part u₂(·, λ, q) of the fundamental system {u₁, u₂} of (5.7a), (5.7b) we derive easily that λ ↦ u₂(η, λ, q) is holomorphic in all of C, and thus so is k ↦ y_k(1) = n(1)^{−1/4} z_k(η) = [n(0)n(1)]^{−1/4} u₂(η, k², q). The same holds for the derivative. Therefore, d is an entire function.

It remains to compute the order of d. From (7.90) and the estimate |sin z| ≤ e^{|z|} we observe that |d(k)| ≤ c e^{|k|(η+1)} for some c > 0. Therefore, for |k| = r,

$$\frac{\ln\bigl(\ln|d(k)|\bigr)}{\ln r} \le \frac{\ln\bigl[\ln c + (\eta+1)r\bigr]}{\ln r} \le \frac{\ln\bigl[(\eta+2)r\bigr]}{\ln r} = \frac{\ln(\eta+2)}{\ln r} + 1$$

for large values of r, which proves that the order satisfies ρ ≤ 1. Now we set k = it for t ∈ R_{>0}. If n(1) ≠ 1, then it is easy to see that |f(it)| ≥ (|A|/2) e^{t(η+1)} for large values of t (where f is given below (7.90)) and thus, again from (7.90), |d(it)| ≥ c e^{t(η+1)} for some c > 0. This yields ρ ≥ 1 and ends the proof if n(1) ≠ 1. If n(1) = 1, then η ≠ 1 and A = 0, and one gets |d(it)| ≥ c e^{t|η−1|} and again ρ ≥ 1. □

The following theorem is a special case of the more general factorization theorem of Hadamard, which we cite without proof (see, e.g., [19], Chapter 2).
Theorem 7.45 (Hadamard) Let f be an entire function of order one, let m ≥ 0 be the order of the root z = 0 of f (in particular, m = 0 if f(0) ≠ 0; note that the order of a root of a holomorphic function is always finite), and let {a_j : j ∈ N} be the nonzero roots of f, repeated according to multiplicity. Then there exists a polynomial p of degree at most one such that

$$f(z) = e^{p(z)}\,z^m \prod_{j=1}^{\infty} \Bigl(1 - \frac{z}{a_j}\Bigr)\,e^{z/a_j}\,, \quad z \in \mathbb{C}\,. \qquad (7.92)$$
With this theorem we can prove a slight generalization of a theorem of Laguerre.

Theorem 7.46 Let f be an entire function of order one which is real for real values of z. Suppose that f has infinitely many real zeros and only a finite number of complex ones. Then f has a single critical point (that is, a zero of f′) in each interval formed by two consecutive real zeros of f, provided this interval is sufficiently far away from the origin.

Proof: Since f has only finitely many complex zeros (which occur in conjugate pairs), there exists a polynomial q whose roots are exactly those complex roots of f together with a possible root at z = 0, of the same order as the possible zero of f at z = 0. We factorize f(z) = q(z)g(z), where the roots of g are given by {a_j : j ∈ N} in nondecreasing magnitude (multiple roots being repeated according to their multiplicities). We apply Hadamard's theorem to the function g and obtain, for real values of x ∉ {a_j : j ∈ N},

$$\frac{g'(x)}{g(x)} = \frac{d}{dx}\ln|g(x)| = p'(x) + \sum_{j=1}^{\infty}\Bigl[\frac{d}{dx}\ln\Bigl(1 - \frac{x}{a_j}\Bigr) + \frac{1}{a_j}\Bigr] = \alpha + \sum_{j=1}^{\infty}\Bigl(\frac{1}{x-a_j} + \frac{1}{a_j}\Bigr)\,,$$

where α = p′(x) is a real constant (since p is a polynomial of degree at most one). Differentiating this expression yields

$$\frac{d}{dx}\,\frac{f'(x)}{f(x)} = \frac{d}{dx}\,\frac{q'(x)}{q(x)} + \frac{d}{dx}\,\frac{g'(x)}{g(x)} = \frac{d}{dx}\,\frac{q'(x)}{q(x)} - \sum_{j=1}^{\infty}\frac{1}{(x-a_j)^2}\,, \quad x \notin \{a_j : j\in\mathbb{N}\}\,.$$

Since q is a polynomial with no real zeros (except possibly z = 0), there exists c > 0 with |d/dx (q′(x)/q(x))| ≤ c/x² for all |x| ≥ 1. Choose N ∈ N with N > 2c and
then R > 0 with |a_j − x|² ≤ 2x² for all |x| ≥ R and j = 1,...,N. Then

$$\frac{d}{dx}\,\frac{f'(x)}{f(x)} \le \frac{c}{x^2} - \sum_{j=1}^{N}\frac{1}{(x-a_j)^2} \le \frac{c}{x^2} - \frac{N}{2x^2} = \frac{2c-N}{2x^2} < 0$$

for |x| ≥ R, x ∉ {a_j : j ∈ N}. Therefore, f′/f is strictly decreasing in every interval (a_ℓ, a_{ℓ+1}) with a_ℓ < a_{ℓ+1} and ℓ large enough. Furthermore, from

$$\frac{f'(x)}{f(x)} = \frac{q'(x)}{q(x)} + \frac{g'(x)}{g(x)} = \frac{q'(x)}{q(x)} + \alpha + \sum_{j\ne\ell,\ell+1}\Bigl(\frac{1}{x-a_j} + \frac{1}{a_j}\Bigr) + \frac{1}{x-a_\ell} + \frac{1}{a_\ell} + \frac{1}{x-a_{\ell+1}} + \frac{1}{a_{\ell+1}} \qquad (7.93)$$

we conclude that lim_{x→a_ℓ+} f′(x)/f(x) = +∞ and lim_{x→a_{ℓ+1}−} f′(x)/f(x) = −∞. Therefore, f′/f has exactly one zero in the interval (a_ℓ, a_{ℓ+1}), which ends the proof. □
As a corollary we obtain Laguerre's theorem (see [19], Chapter 2).

Corollary 7.47 (Laguerre) Let f be an entire function of order one which is real for real values of z and all of whose zeros are real. Then all of the zeros of f′ are real as well and interlace those of f.

Proof: In this case we set q(z) = z^m, where m is the order of the zero z = 0. Then, by analytic extension, (7.93) also holds for complex z instead of x ∈ R. If z ∈ C is a critical point, then from (7.93)

$$0 = \frac{m}{z} + \alpha + \sum_{j=1}^{\infty}\Bigl(\frac{1}{z-a_j} + \frac{1}{a_j}\Bigr) = \frac{m\,\bar z}{|z|^2} + \alpha + \sum_{j=1}^{\infty}\Bigl(\frac{\bar z - a_j}{|z-a_j|^2} + \frac{1}{a_j}\Bigr)\,.$$

Taking the imaginary part yields

$$0 = -\,\mathrm{Im}\,z\,\Bigl[\frac{m}{|z|^2} + \sum_{j=1}^{\infty}\frac{1}{|z-a_j|^2}\Bigr]\,,$$

which implies that z is real. Now we follow exactly the proof of the preceding theorem. □

Now we are able to answer the question of existence of complex eigenvalues for the special case of constant n. For constant n, the function y_k from
(7.86) is given by y_k(r) = sin(kηr)/(kη) for r ≥ 0, where η = √n. Therefore,

$$d(k) = \frac{1}{k\eta}\,f_\eta(k)\,, \quad\text{where}\quad f_\eta(k) = \eta\cos(k\eta)\sin k - \cos k\,\sin(k\eta)\,, \quad k \in \mathbb{C}\,. \qquad (7.94)$$

Here we have indicated the dependence on η > 0. For the author the following result is rather surprising.

Theorem 7.48 If η = √n ≠ 1 is an integer or the reciprocal of an integer, then no complex eigenvalues with eigenfunctions depending only on r exist. Otherwise, there exist infinitely many complex eigenvalues.

Proof: We consider the function f_η from (7.94) and note first that −η f_{1/η}(ηk) = f_η(k) for all k ∈ C. Therefore, a complex zero of f_η exists if, and only if, a complex zero of f_{1/η} exists. It is thus sufficient to study the case η > 1.

Let first η > 1 be an integer. Then f_η(k) = 0 if, and only if, k is a critical point of the entire function g(k) = sin(ηk)/sin k, because g′(k) = f_η(k)/sin²k. Since all of the zeros of g are real, by Corollary 7.47 its critical points are real as well; that is, all of the zeros of f_η are real.

Let now η > 1 not be an integer. We construct a sequence I_ℓ of intervals tending to infinity such that f_η does not change its sign on I_ℓ while each I_ℓ contains two consecutive real critical points of f_η. By Theorem 7.46 this is only possible if there exist infinitely many complex zeros of f_η. From

$$f_\eta'(k) = (1 - \eta^2)\,\sin(k\eta)\,\sin k$$

we observe that {jπ/η : j ∈ N} and {jπ : j ∈ N} are the critical points of f_η. Choose a sequence m_ℓ ∈ N converging to infinity such that η m_ℓ ∉ N for all ℓ ∈ N. Fix ℓ ∈ N and set m = m_ℓ for abbreviation. The interval (mη − 1, mη) contains an integer j. Set ε = mη − j ∈ (0,1). The two points jπ/η and mπ are consecutive critical points of f_η because their distance is mπ − jπ/η = επ/η < π/η. Furthermore,

$$f_\eta\Bigl(\frac{j\pi}{\eta}\Bigr) = \eta\cos(j\pi)\sin\frac{j\pi}{\eta} - \cos\frac{j\pi}{\eta}\,\sin(j\pi) = \eta\,(-1)^j \sin\Bigl(m\pi - \frac{\varepsilon\pi}{\eta}\Bigr) = \eta\,(-1)^{j+1+m}\sin\frac{\varepsilon\pi}{\eta}$$

and

$$f_\eta(m\pi) = \eta\cos(m\pi\eta)\sin(m\pi) - \cos(m\pi)\sin(m\pi\eta) = (-1)^{m+1}\sin(j\pi + \varepsilon\pi) = (-1)^{m+1+j}\sin(\varepsilon\pi)\,.$$

We observe that the signs of f_η(jπ/η) and f_η(mπ) coincide because ε, ε/η ∈ (0,1). Furthermore, f_η has no zero between jπ/η and mπ, because otherwise there would be another critical point between them. Therefore, the interval I_ℓ = [jπ/η, mπ] has the desired properties. □

We refer to Problem 7.5 for two explicit examples where the assertions of this theorem are illustrated.
There exist also results on the existence of complex transmission eigenvalues for nonconstant refractive indices under certain conditions. For example, as shown in [55] (see also [58]), if either 1 < n(1) < η or η < n(1) < 1 (where η is again given by (7.88d)), there exist infinitely many real and infinitely many complex eigenvalues. Also, all complex eigenvalues lie in a strip around the real axis if n(1) ≠ 1. For a more detailed investigation of their location in the constant case we refer to [257] and to [55] and the references therein.
7.6.2 Discreteness and Existence in the General Case
We continue now with the general case; that is, D is a bounded Lipschitz domain and n is real-valued with n(x) ≥ 1 + q₀ on D for some q₀ > 0. For the definition of a Lipschitz domain we refer again to [191] or [161]. For Lipschitz domains we can give an alternative characterization of an interior transmission eigenvalue. We allow k ≠ 0 to be complex.

Theorem 7.49 Let D be a Lipschitz domain. Then k ∈ C \ {0} is an interior transmission eigenvalue if and only if there exist u ∈ H₀²(D) and v ∈ L²(D), not vanishing simultaneously, such that

$$\Delta u + k^2 n u = k^2 (n-1)\,v \quad\text{a.e. in } D \qquad (7.95a)$$

and Δv + k²v = 0 in D in the ultraweak sense; that is,

$$\int_D v\,\bigl(\Delta\varphi + k^2\varphi\bigr)\,dx = 0 \quad\text{for all } \varphi \in H_0^2(D)\,. \qquad (7.95b)$$
Proof: Let first u ∈ H₀²(D) and v ∈ L²(D) satisfy (7.95a), (7.95b). Set w = v − u and let φ, ψ ∈ H²(D) with φ − ψ ∈ H₀²(D). Then

$$\int_D \bigl[v(\Delta\varphi + k^2\varphi) - w(\Delta\psi + k^2 n\psi)\bigr]\,dx = \int_D v\bigl[\Delta(\varphi-\psi) + k^2(\varphi-\psi)\bigr]\,dx + \int_D \bigl[k^2 v\,(1-n)\,\psi + u\,(\Delta\psi + k^2 n\psi)\bigr]\,dx\,.$$

The first integral on the right-hand side vanishes because v is an ultraweak solution of Δv + k²v = 0. For the second integral we use Green's second formula (7.12b), which yields

$$\int_D u\,(\Delta\psi + k^2 n\psi)\,dx = \int_D \psi\,(\Delta u + k^2 n u)\,dx = k^2 \int_D \psi\,(n-1)\,v\,dx\,.$$

Therefore, the pair (v, w) ∈ L²(D) × L²(D) satisfies (7.84). This proves the first direction.
For the reverse direction let v, w ∈ L²(D) satisfy (7.84). Define u = v − w in D and u = 0 in R³ \ D. Furthermore, set φ = ψ in (7.84) for some ψ ∈ C∞(R³) with compact support. Then

$$\int_{\mathbb{R}^3} u\,(\Delta\psi + k^2 n\psi)\,dx = k^2 \int_D (n-1)\,v\,\psi\,dx\,;$$

that is, u is an ultraweak solution of Δu + k²nu = k²(n−1)v in R³. The regularity result of Lemma 7.10 yields u ∈ H²(R³) with u = 0 in R³ \ D; that is, u ∈ H₀²(D) by Lemma 7.4, and Δu + k²nu = k²(n−1)v almost everywhere in D. Finally, let φ ∈ H₀²(D) and set ψ = 0 in (7.84). Then ∫_D v(Δφ + k²φ) dx = 0. This ends the proof. □

This theorem makes it possible to eliminate the function v from the system. Indeed, let u ∈ H₀²(D) and v ∈ L²(D) satisfy (7.95a) and (7.95b). We divide (7.95a) by n − 1 (note that n(x) − 1 ≥ q₀ on D by assumption), multiply the resulting equation by the complex conjugate of Δψ + k²ψ for some ψ ∈ H₀²(D), and integrate. This gives

$$\int_D \frac{(\Delta u + k^2 n u)\,\overline{(\Delta\psi + k^2\psi)}}{n-1}\,dx = k^2 \int_D v\,\overline{(\Delta\psi + k^2\psi)}\,dx = 0$$

by (7.95b); that is, replacing ψ by its complex conjugate,

$$\int_D \frac{(\Delta u + k^2 n u)\,(\Delta\psi + k^2\psi)}{n-1}\,dx = 0 \quad\text{for all } \psi \in H_0^2(D)\,. \qquad (7.96)$$

On the other hand, if u ∈ H₀²(D) satisfies (7.96), then we set v = [Δu + k²nu]/(k²(n−1)) ∈ L²(D), which satisfies (7.95b). In this sense the system (7.95a), (7.95b) is equivalent to (7.96). Equation (7.96) is the weak form of the fourth order equation

$$\bigl(\Delta + k^2\bigr)\,\frac{1}{n-1}\,\bigl(\Delta u + k^2 n u\bigr) = 0\,.$$

With respect to (7.96) it is convenient to introduce a new inner product (·,·)_* in H₀²(D) by

$$(u,\psi)_* := \int_D \frac{\Delta u\,\overline{\Delta\psi}}{n-1}\,dx\,, \quad u, \psi \in H_0^2(D)\,. \qquad (7.97)$$

Lemma 7.50 (·,·)_* defines an inner product in H₀²(D) with corresponding norm ‖·‖_* which is equivalent to the ordinary norm ‖·‖_{H²(D)} in H₀²(D).

Proof: By the definition of H₀²(D) and a denseness argument it is sufficient to prove the existence of c₁, c₂ > 0 with c₁‖ψ‖_* ≤ ‖ψ‖_{H²(D)} ≤ c₂‖ψ‖_* for all ψ ∈ C₀∞(D). The left estimate is obvious because n − 1 ≥ q₀ > 0 on D. To prove the right estimate we choose a cube Q ⊂ R³ which contains D̄ in its interior, extend any ψ ∈ C₀∞(D) by zero into Q, and then periodically (with respect to Q) into R³. Without loss of generality we assume that Q = (−π,π)³.
Then ψ ∈ H²_per(Q) and ‖ψ‖_{H²(D)} ≤ c‖ψ‖_{H²_per(Q)}, where c > 0 is independent of ψ. With the Fourier coefficients ψ_j of ψ (see (7.15b)) we have

$$\|\psi\|^2_{H^2_{per}(Q)} = \sum_{j\in\mathbb{Z}^3} \bigl(1 + \|j\|^2\bigr)^2 |\psi_j|^2 = |\psi_0|^2 + \sum_{0\ne j\in\mathbb{Z}^3} \bigl(1 + 2\|j\|^2 + \|j\|^4\bigr)\,|\psi_j|^2$$
$$\le |\psi_0|^2 + 4 \sum_{0\ne j\in\mathbb{Z}^3} \|j\|^4\,|\psi_j|^2 = |\psi_0|^2 + \frac{4}{(2\pi)^3}\,\|\Delta\psi\|^2_{L^2(Q)} \le |\psi_0|^2 + \frac{4\,\|n-1\|_\infty}{(2\pi)^3}\,\|\psi\|^2_*\,.$$

It remains to estimate |ψ₀|. For x̂ = (π, 0, 0) ∈ ∂Q we conclude from the boundary condition that Σ_j ψ_j e^{ij₁π} = 0, and thus

$$|\psi_0| \le \sum_{0\ne j\in\mathbb{Z}^3} |\psi_j| = \sum_{0\ne j} \frac{1}{\|j\|^2}\,\|j\|^2\,|\psi_j| \le \sqrt{\sum_{0\ne j} \frac{1}{\|j\|^4}}\ \sqrt{\sum_{0\ne j} \|j\|^4\,|\psi_j|^2} \le c\,\|\psi\|_*$$

as before. □
Now we rewrite (7.96) in the form

$$(u,\psi)_* + k^2\,a(u,\psi) + k^4\,b(u,\psi) = 0 \quad\text{for all } \psi \in H_0^2(D)\,, \qquad (7.98)$$

where

$$a(u,\psi) = \int_D \frac{n\,u\,\overline{\Delta\psi} + \overline{\psi}\,\Delta u}{n-1}\,dx = \int_D \frac{u\,\overline{\Delta\psi} + \overline{\psi}\,\Delta u}{n-1}\,dx + \int_D u\,\overline{\Delta\psi}\,dx\,, \qquad b(u,\psi) = \int_D \frac{n\,u\,\overline{\psi}}{n-1}\,dx\,,$$

for u, ψ ∈ H₀²(D). The sesquilinear forms a and b are hermitian. For a this is seen from the second form and Green's second identity (7.12b). By the representation theorem of Riesz (Theorem A.23 of the Appendix), in the Hilbert space H₀²(D) for every u ∈ H₀²(D) there exists a unique ũ ∈ H₀²(D) with a(u,ψ) = (ũ,ψ)_* for all ψ ∈ H₀²(D). We define the operator A from H₀²(D) into itself by Au = ũ; therefore, a(u,ψ) = (Au,ψ)_* for all u, ψ ∈ H₀²(D). Then it is not difficult to prove that A is linear, selfadjoint, and compact. (For the proof of compactness one needs, however, the fact that H₀¹(D) is compactly imbedded in L²(D); see Problem 7.1.)
Analogously, there exists a linear, selfadjoint, and compact operator B from H₀²(D) into itself with b(u,ψ) = (Bu,ψ)_* for all u, ψ ∈ H₀²(D). Then (7.96) can be written as (u,ψ)_* + k²(Au,ψ)_* + k⁴(Bu,ψ)_* = 0 for all ψ ∈ H₀²(D); that is,

$$u + k^2 A u + k^4 B u = 0 \quad\text{in } H_0^2(D)\,. \qquad (7.99)$$

This is a quadratic eigenvalue problem in the parameter τ = k². We can reduce it to a linear eigenvalue problem for a compact operator. Indeed, since B is positive (that is, (Bu,u)_* > 0 for all u ≠ 0), there exists a positive and compact operator B^{1/2} from H₀²(D) into itself with B^{1/2}B^{1/2} = B (see formula (A.47) of the Appendix). If u satisfies (7.99) for some k ∈ C \ {0}, then set u₁ = u and u₂ = k²B^{1/2}u. Then the pair (u₁, u₂)ᵀ ∈ H₀²(D) × H₀²(D) satisfies

$$\begin{pmatrix} A & B^{1/2} \\ -B^{1/2} & 0 \end{pmatrix}\begin{pmatrix} u_1 \\ u_2 \end{pmatrix} + \frac{1}{k^2}\begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}\,. \qquad (7.100)$$

Therefore, 1/k² is an eigenvalue of the compact (but not selfadjoint) operator

$$\begin{pmatrix} -A & -B^{1/2} \\ B^{1/2} & 0 \end{pmatrix}\,.$$

Conversely, if (u₁, u₂)ᵀ ∈ H₀²(D) × H₀²(D) satisfies (7.100), then u = u₁ satisfies (7.99). Therefore, well-known results on the spectrum of compact operators (see, e.g., [151]) imply the following theorem.

Theorem 7.51 There exists at most a countable number of eigenvalues k ∈ C with no accumulation point in C.

By different methods the discreteness can be shown under the weaker assumption that n > 1 only in a neighborhood of the boundary ∂D (see, e.g., [256, 159] or [34]). The question of the existence of real eigenvalues was open for about 20 years (note that the matrix operator in (7.100) fails to be selfadjoint!). The idea of the proof of the following result goes back to Päivärinta and Sylvester ([213]) and was generalized to general refractive indices by Cakoni, Gintides, and Haddar ([36]).

Theorem 7.52 There exists a countable number of real eigenvalues k > 0 which converge to infinity.

Proof: With the Riesz representations A and B of the hermitian forms a and b, respectively, from above we define the family of operators Φ(κ) = I + κA + κ²B for κ ≥ 0 with corresponding sesquilinear forms

$$\varphi(\kappa; u, \psi) = \int_D \frac{(\Delta u + \kappa n u)\,\overline{(\Delta\psi + \kappa\psi)}}{n-1}\,dx = \int_D \frac{(\Delta u + \kappa u)\,\overline{(\Delta\psi + \kappa\psi)}}{n-1}\,dx + \kappa \int_D u\,\overline{(\Delta\psi + \kappa\psi)}\,dx\,; \qquad (7.101)$$
that is, φ(κ; u, ψ) = (Φ(κ)u, ψ)_* for all u, ψ ∈ H₀²(D). Here, κ plays the role of k² and is considered as a parameter. We search for parameters κ > 0 such that 0 is an eigenvalue of Φ(κ); that is, such that −1 is an eigenvalue of the compact and selfadjoint operator κA + κ²B.

For κ ≥ 0 let λ_j(κ) ∈ R, j ∈ N, be the nonzero eigenvalues of the compact and selfadjoint operator κA + κ²B. They converge to zero as j tends to infinity (if there exist infinitely many). The corresponding eigenspaces are finite-dimensional. Let the negative eigenvalues be sorted as λ₁⁻(κ) ≤ λ₂⁻(κ) ≤ ··· < 0, where the entries are repeated according to multiplicity. In general, there could be none, finitely many, or infinitely many of them.

Let m ∈ N be any natural number. First we construct κ̂ > 0 and a subspace V_m of H₀²(D) of dimension m such that φ(κ̂; u, u) ≤ 0 for all u ∈ V_m. We choose ε > 0 and m pairwise disjoint balls B_j = B(z_j, ε), j = 1,...,m, of radius ε with ∪_{j=1}^m B_j ⊂ D. Setting n₀ := 1 + q₀, we note that n(x) ≥ n₀ on D. In every ball B_j we consider the interior transmission eigenvalue problem with constant refractive index n₀. By Theorem 7.41 infinitely many real and positive transmission eigenvalues exist. Note that these eigenvalues do not depend on j, because a translation of a domain results in the same interior transmission eigenvalues. Let k̂ > 0 be the smallest one, with corresponding eigenfunctions u_j ∈ H₀²(B_j). Set κ̂ = k̂² and let φ_j(κ; u, ψ) be the sesquilinear form corresponding to n₀ in B_j. Then φ_j(κ̂; u_j, ψ) = 0 for all ψ ∈ H₀²(B_j). We extend each u_j by zero to a function in H₀²(D) (we use Lemma 7.4 again). Then {u_j : j = 1,...,m} are certainly linearly independent because their supports are pairwise disjoint. We define V_m = span{u_j : j = 1,...,m} as an m-dimensional subspace of H₀²(D) and compute, for u = Σ_{j=1}^m α_j u_j ∈ V_m, using (7.101),

$$\varphi(\hat\kappa; u, u) = \sum_{j=1}^m |\alpha_j|^2\,\varphi(\hat\kappa; u_j, u_j) = \sum_{j=1}^m |\alpha_j|^2 \Bigl[\,\int_{B_j} \frac{|\Delta u_j + \hat\kappa u_j|^2}{n-1}\,dx + \hat\kappa \int_{B_j} u_j\,\overline{(\Delta u_j + \hat\kappa u_j)}\,dx\Bigr]$$
$$\le \sum_{j=1}^m |\alpha_j|^2 \Bigl[\,\int_{B_j} \frac{|\Delta u_j + \hat\kappa u_j|^2}{n_0-1}\,dx + \hat\kappa \int_{B_j} u_j\,\overline{(\Delta u_j + \hat\kappa u_j)}\,dx\Bigr] = \sum_{j=1}^m |\alpha_j|^2\,\varphi_j(\hat\kappa; u_j, u_j) = 0\,.$$

This shows that (Φ(κ̂)u, u)_* ≤ 0 for all u ∈ V_m. By Corollary A.55 we conclude that there exist eigenvalues λ_ℓ(κ̂) ≤ −1 of κ̂A + κ̂²B for ℓ = 1,...,m. Furthermore, again by Corollary A.55 of Appendix A.6, the eigenvalues λ_ℓ(κ) depend continuously on κ and λ_ℓ(0) = 0 for all ℓ. Therefore, for every ℓ ∈ {1,...,m} there exists κ_ℓ ∈ (0, κ̂] with λ_ℓ(κ_ℓ) = −1. The corresponding eigenfunctions u_ℓ, ℓ = 1,...,m, satisfy Φ(κ_ℓ)u_ℓ = 0, which implies that √κ_ℓ, ℓ = 1,...,m, are interior transmission eigenvalues. Since m was arbitrary, the existence of infinitely many real eigenvalues is shown. □
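The reduction (7.100) of the quadratic problem u + k²Au + k⁴Bu = 0 to a linear eigenvalue problem can be illustrated in finite dimensions. The sketch below is our own, with random matrices standing in for the compact operators A and B; it builds the block operator and checks that an eigenvalue μ = 1/k² of it yields a solution of the quadratic problem.

```python
import numpy as np
from scipy.linalg import sqrtm, eig

rng = np.random.default_rng(1)
m = 5
A = rng.standard_normal((m, m)); A = (A + A.T) / 2          # selfadjoint
C = rng.standard_normal((m, m)); B = C @ C.T + np.eye(m)    # positive definite
S = sqrtm(B).real                                           # B^{1/2}

# block operator from (7.100); its eigenvalues are the values 1/k^2
M = np.block([[-A, -S], [S, np.zeros((m, m))]])
mu, V = eig(M)
i = int(np.argmax(np.abs(mu)))     # pick any (nonzero) eigenvalue mu = 1/k^2
k2 = 1.0 / mu[i]
u = V[:m, i]                       # first block solves the quadratic problem
res = u + k2 * (A @ u) + k2**2 * (B @ u)
assert np.linalg.norm(res) < 1e-8 * np.linalg.norm(u)
```

The block matrix is not symmetric even though A and B are, mirroring the remark that the operator in (7.100) fails to be selfadjoint.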
The very natural question of the existence of complex transmission eigenvalues in this general situation is completely open. Since even in the case of a constant refractive index in a ball both existence and nonexistence of complex eigenvalues can occur (see, for example, Theorem 7.48), the problem is certainly hard to solve. Aspects concerning the asymptotic distribution of the eigenvalues (Weyl's formula), their location, and properties of the eigenfunctions have attracted a lot of attention, also for much more general types of elliptic problems. We mention only the special issue [37] of Inverse Problems from 2013 and the already mentioned monographs [34, 55], and refer to the references therein.
7.6.3 The Inverse Spectral Problem for the Radially Symmetric Case
After the study of the existence and discreteness of the interior transmission eigenvalues, the natural question arises whether these eigenvalues determine the refractive index n(x) uniquely. For the spherically stratified media studied in Subsection 7.6.1, this question is the analogue of the inverse Sturm–Liouville problem of Chapter 5 and has been the subject of intensive research. It started with the work by J. McLaughlin and P. Polyakov ([190]) and was picked up in, e.g., [56–58, 46, 45]. We follow the presentations of [34, 55].

We assume, as at the beginning of Subsection 7.6.1, that n ∈ C²[0,1] is positive on [0,1]. This smoothness assumption is mainly necessary to apply the Liouville transform. We have seen in Subsection 7.6.1 that the interior transmission eigenvalues of (7.44a), (7.44b) are, for radially symmetric eigenfunctions, just the zeros of the characteristic function d given by (7.87); that is,

$$d(k) = y_k'(1)\,\frac{\sin k}{k} - y_k(1)\cos k\,, \quad k \in \mathbb{C}\,, \qquad (7.102)$$

where y_k solves the initial value problem

$$y_k''(r) + k^2 n(r)\,y_k(r) = 0 \ \text{ in } (0,1)\,, \qquad y_k(0) = 0\,,\ y_k'(0) = 1\,. \qquad (7.103)$$
Therefore, the inverse problem is to recover the refractive index n = n(r) from the zeros of the characteristic function d = d(k). This corresponds exactly to the situation of Chapter 5, where the inverse spectral problem was to recover q = q(x) from the zeros of the characteristic function f = f(λ). It is therefore not surprising that we use similar arguments. We saw already (in Lemma 7.44) that the characteristic function d is an even entire function of order one provided n(1) ≠ 1 or η := ∫₀¹ √(n(s)) ds ≠ 1. We prove a further property of d.

Lemma 7.53 Let d be the characteristic function from (7.102). Then

$$\lim_{k\to 0} \frac{d(k)}{k^2} = \frac{1}{3} - \int_0^1 n(s)\,s^2\,ds\,,$$
which is not zero if, for example, n(r) ≤ 1 for all r ∈ [0,1] and n ≢ 1. Therefore, under this condition k = 0 is a root of d of order 2.

Proof: For small values of k it is convenient to use an integral equation argument directly for y_k. Indeed, it is not difficult to show that y_k ∈ C²[0,1] solves (7.103) if, and only if, y_k ∈ C[0,1] satisfies the Volterra equation

$$y_k(r) = r - k^2 \int_0^r (r-s)\,n(s)\,y_k(s)\,ds\,, \quad 0 \le r \le 1\,. \qquad (7.104)$$

For sufficiently small k (actually, for all values of k) this fixed point equation is solved by the Neumann series (see Theorem A.31 of Appendix A). The first two terms yield

$$y_k(r) = r - k^2 \int_0^r (r-s)\,n(s)\,s\,ds + \mathcal{O}(k^4)\,.$$

Also, for the derivative we get

$$y_k'(r) = 1 - k^2 \int_0^r n(s)\,y_k(s)\,ds = 1 - k^2 \int_0^r n(s)\,s\,ds + \mathcal{O}(k^4)\,.$$

Substituting these expansions for r = 1 and the power series of sin k/k and cos k into the definition of d yields, after collecting the terms with k²,

$$d(k) = k^2\Bigl[\frac{1}{3} - \int_0^1 n(s)\,s^2\,ds\Bigr] + \mathcal{O}(k^4)\,.$$

This ends the proof. □
From the properties derived in Lemmas 7.44 and 1 7.53 we have the following form of the characteristic function, provided 0 n(s)s2 ds = 1/3 holds and n(1) = 1 or η = 1. ∞
% k2 1 − 2 , k ∈ C, (7.105) d(k) = γ k 2 kj j=1 for some γ ∈ C where kj ∈ C are all nonzero roots of d repeated according to multiplicity. This follows directly from Theorem 7.45 and the fact that with k also −k is a zero. Indeed, let {kj : j ∈ J} be the set of zeros in the half plane {z ∈ C : Re z > 0 or z = it, t > 0}. Then the disjoint union {kj : j ∈ J} ∪ {−kj : j ∈ J} cover all of the roots. We group the factors of Hadamard’s formula (7.5) into pairs 2 1 − kkj ek/kj 1 + kkj e−k/kj = 1 − kk2 for j ∈ J. Furthermore, the polynomial j
p(k) must be constant because d is even. This shows that ∞
%
k2 1 p 2% k2 e k 1− 2 = 1− 2 . d(k) = ep k 2 kj 2 kj j=1 j∈J
298
Inverse Scattering Problem
After these preparations we turn to the inverse problem to determine n(r) from (all of) the roots kj ∈ C of the characteristic function d. Our main concern is again the question of uniqueness, namely, do the refractive indices n1 and n2 have to coincide if the roots of the corresponding characteristic functions coincide? Therefore, we assume that we have two positive refractive indices n1 , n2 ∈ C 2 [0, 1] such that the zeros kj of their characteristic functions d1 and d2 coincide. 1 Furthermore, we assume that 0 n (s) s2 ds = 1/3 for = 1, 2 and also n (1) = 1 for = 1, 2. Then the characteristic functions d (k) have the representations (7.105) with constants γ for = 1, 2. Then we conclude that d1 (k)/γ1 = d2 (k)/γ2 for all k ∈ C; that is, we can determine d(k)/γ from the data {kj : j ∈ N}. In the following we proceed in several steps. StepA:We determine η from the data; that is, we show that η1 = η2 where 1 η = 0 n (s) ds for = 1, 2 if the transmission eigenvalues corresponding to n1 and n2 coincide. Fix a ∈ R and set k = a + it for t > 0. From the asymptotic behavior (7.90) we conclude that k
d(k) γ
1 1/4 A sin k(1 + η) + B sin k(1 − η) γ n(0)n(1)
exp(t(η + 1)) +O t A −ia(η+1) t(η+1) 1 + O(1/t) , t → ∞ , e = − 1/4 e γ n(0)n(1) 2i
=
because 1 + η > 1 − η. Here, A and B are given by (7.91) and A = 0 because of n(1) = 1. Analogously, we have for the complex conjugate k = a − it k
A d(k) = eia(η+1) et(η+1) 1 + O(1/t) , t → ∞ . 1/4 γ γ n(0)n(1) 2i
Therefore, also the ratio is known and also ψ(a) := lim
t→∞
k d(k) = −e2ia(1+η) k d(k
for all a ∈ R .
This determines η through 2i(1 + η) = ψ (0). Step B: Under the assumption η = 1 we determine n(1) and γ n(0)1/4 from the data. Let now k > 0 be real valued. Estimate (7.90) has the form k
1 d(k) A sin k(1 + η) + B sin k(1 − η) + R(k) = 1/4 γ γ n(0)n(1)
7.6
The Interior Transmission Eigenvalue Problem
299
with R(k) ≤ c1 /k for k ≥ 1 and some c1 > 0. The left hand side is known and also η from Step A. Therefore, for ﬁxed a > 0, also
ψ1 (T )
=
1 T
T k
d(k) sin k(1 + η) dk γ
a
=
1
1/4 γ n(0)n(1)
1 T
T
A sin2 k(1 + η) +
a
1 +B sin k(1 − η) sin k(1 + η) dk + T
T
R(k) sin k(1 + η) dk
a
and ψ2 (T )
=
1 T
T k
d(k) sin k(1 − η) dk γ
a
=
1
1/4 γ n(0)n(1)
1 T
T
A sin k(1 + η) sin k(1 − η) +
a
1 +B sin k(1 − η) dk + T 2
T
R(k) sin k(1 − η) dk
a
are known for T ≥ a. The terms involving R(k) tend to zero as T tends T T c1 ln(T /a) . The other elemento inﬁnity because T1 a R(k) dk ≤ cT1 a dk k = T T tary integrals can be computed explicitly which yields that lim T1 a sin k(1 + T →∞ T η) sin k(1 − η) dk = 0 and lim T1 a sin2 k(1 ± η) dk = 12 (note that η = 1). T →∞
Therefore, the limits lim ψ1 (T ) = T →∞
are known and thus also
A 2γ[n(0)n(1)]1/4
B and lim ψ2 (T )= 2γ[n(0)n(1)] 1/4 T →∞
n(1) − 1 A ψ1 (T ) = = . T →∞ ψ2 (T ) B n(1) + 1 lim
This determines n(1), thus also A and B and therefore also γ n(0)1/4 . From now on we assume that n(r) ≤ 1 for all r ∈ [0, 1] and n(1) < 1. Then 1 η < 1 and 0 n(s) s2 ds < 13 . Therefore, all of the assumptions are satisﬁed for the determination of η, n(1), and γ n(0)1/4 . −1/4 zk (s(r)) from Step C: Now we use the Liouville transform r yk (r) = n(s(r)) (7.88a)–(7.88d) again where s(r) = 0 n(s)ds and zk (s) solves (7.88b) in (0, η) with zk (0) = 0 and zk (0) = n(0)−1/4 . The application of Theorem 5.19 in Example 5.20 yields an explicit form of n(0)1/4 zk (r) in terms of the solution
300
Inverse Scattering Problem
K ∈ C 2 (Δ0 ) of the Gousat problem11 Kss (s, t) − Ktt (s, t) − q(s) K(s, t) = 0 in Δ0 , K(s, 0) K(s, s)
(7.106a)
= 0, 0 ≤ s ≤ η, s 1 q(σ) dσ , 0 ≤ s ≤ η , = 2
(7.106b) (7.106c)
0
where q(s) is related to n(r) by (7.88c) and Δ0 = {(s, t) ∈ R2 : 0 < t < s < η}. Indeed, equation (5.52) yields ⎡ ⎤ s(r) sin(kt) ⎥ 1 ⎢ sin ks(r) + dt⎦ (7.107) yk (r) = K s(r), t ⎣ 1/4 k k [n(0)n(r)] 0
for 0 ≤ r ≤ 1 and thus
⎡
sin(kη) 1 + yk (1) = 1/4 ⎣ k n(0)n(1)
⎤
η K(η, t) 0
sin(kt) ⎦ dt . k
Diﬀerentiation of (7.107) and setting r = 1 yields ⎤ ⎡ 1/4 η η n(1) ∂K(η, t) sin(kt) ⎦ ⎣cos(kη) + sin(kη) q(s) ds + yk (1) = dt n(0) 2k ∂s k 0 0 ⎤ ⎡ η n (1) sin(kt) ⎦ ⎣ sin(kη) + − dt K(η, t) k k 4 n(0)1/4 n(1)3/4 0
Step D: We determine K(η, t) for t ∈ [0, η]. With the constant γ from Hadamard’s formula (7.105) we have for k = π, ∈ N, π
d(π) γ
= π =
yπ (1) (−1)+1 γ
⎡ ⎤ η (−1)+1 ⎣sin(πη) + K(η, t) sin(πt) dt⎦ . [γ n(0)1/4 ] n(1)1/4 0
The left hand ηside and the factor in front of the bracket are known which implies that 0 K(η, t) sin(πt) dt is determined from the data for all ∈ N. This determines K(η, ·) because {sin(πt) : ∈ N} is complete in L2 (0, η) since η < 1. Indeed, extend K(η, ·) by zero into (0, 1) and then to an odd function 11 The
interval [0, 1] is now replaced by [0, η] which does not aﬀect the result.
7.6
The Interior Transmission Eigenvalue Problem
301
η into (−1, 1). Then 0 K(η, t) sin(πt) dt are the Fourier coeﬃcients of this odd extension which determine K(η, ·) uniquely. Step E: Determination of n (1). From the previous arguments yk (1)/γ is known yk (1) yk (1) and thus also ψ(k) := k d(k) γ + k γ cos k = γ sin k. We determine the asymptotic behavior of this η expression as k ∈ R tends to inﬁnity. The integrals η K(η, t) sin(kt) dt and ∂K(η, t)/∂s sin(kt) dt tend to zero as O(1/k) as seen 0 0 from partial integration. Therefore, ψ(k)
d(k) yk (1) y (1) +k cos k = k sin k γ γ γ ⎡ ⎤ η n(1)1/4 ⎣ sin(kη) cos(kη) + q(s) ds⎦ sin k 2k γ n(0)1/4
= k =
0
n (1) sin(kη) − sin k + O(1/k 2 ) . 1/4 3/4 k 4 γ n(0) n(1) The left hand side is known and also the ﬁrst term on the right hand side η because 0 q(s) ds = 2K(η, η). This determines n (1) from the data. Step F: Determination of ∂K(η, t)/∂s for t ∈ [0, η]. We compute ψ (π) from the previously deﬁned function ψ as (1) n(1)1/4 yπ sin(πη) cos(π) = K(η, η) (−1) cos(πη) + ψ (π) = 1/4 γ π γ n(0) η ∂K(η, t) sin(πt) dt + ∂s π 0 ⎤ ⎡ η n (1) sin(πη) sin(πt) − + dt⎦ . (−1) ⎣ K(η, t) π π 4 γ n(0)1/4 n(1)3/4 0
η
From this we conclude that also 0 ∂K(η, t)/∂s sin(πt) dt is known for all ∈ N and thus also ∂K(η, t)/∂s by the same arguments as in Step D. Step G: Determination of q = q(s) for 0 ≤ s ≤ η. We recall from Steps D and F, that K(η, t) and ∂K(η, t)/∂s are determined from the data for all t ∈ [0, η]. More precisely, if K denote the solutions of (7.106a)–(7.106c) for q = q , = 1, 2, then η1 = η2 =: η and K1 (η, ·) = K2 (η, ·) and ∂K1 (η, ·)/∂s = ∂K2 (η, ·)/∂s. The diﬀerence K := K1 − K2 satisﬁes Kss (s, t) − Ktt (s, t) − q1 (s) K(s, t) = [q1 (s) − q2 (s)] K2 (s, t) and K(·, 0) = 0 on [0, η] and K(s, s) =
1 2
s 0
in Δ0 ,
[q1 (σ) − q2 (σ)] dσ for 0 ≤ s ≤ η.
Furthermore, K(η, ·) = ∂K(η, ·)/∂s = 0 on [0, η]. Now we apply Theorem 5.18 from Chapter 5 with q = q1 , F = K2 , and f = g = 0. We observe that the pair
302
Inverse Scattering Problem
(K, q1 − q2 ) solves the homogeneous system (5.43a)–(5.43d). The uniqueness result of this theorem yields that K = 0 and q1 = q2 . This proves Part G. Step H: In this ﬁnal part we have to determine n from q where their relationship is given by (7.88c). Let again n1 and n2 be two indices with the same transmission eigenvalues. Deﬁne u by 1/4 n r (s) ,
s ∈ [0, η ] = [0, η] , r where again r = r (s) is the inverse of s = s (r) = 0 n (σ) dσ for = 1, 2. An elementary computation (using the chain rule and the derivative of the inverse function and (7.88c)) yields that u satisﬁes the ordinary linear diﬀerential equation u (s) =
u (s) = q (s) u (s) ,
0 ≤ s ≤ η, n (1)
with end conditions u (η) = n (1)1/4 and u (η) = 4 n(1)5/4 . From q1 = q2 and n1 (1) = n2 (1) and n1 (1) = n2 (1) and the uniqueness of this initial value problem we conclude that u1 (s) = u2 (s) for all s; that is, s1 r1 (s) = n1 r1 (s) = n2 r2 (s) = s2 r2 (s) . On the other hand, diﬀerentiating s r (s) = s yields s r (s) r (s) = 1 which implies that r1 = r2 , thus r1 = r2 and, ﬁnally, n1 = n2 . We summarize the result in the following theorem. Theorem 7.54 Let nj ∈ C 2 [0, 1], j = 1, 2, be positive with nj (r) ≤ 1 on [0, 1] and nj (1) < 1 such that all of the corresponding transmission eigenvalues with radially symmetric eigenfunctions coincide. Then n1 and n2 have to coincide.
7.7
Numerical Methods
In this section, we describe three types of numerical algorithms for the approximate solution of the inverse scattering problem for the determination of n and not only of the support D of n − 1. We assume—unless stated otherwise—that n ∈ L∞ (R3 ) with n(x) = 1 outside some ball B = B(0, a) of radius a > 0. The numerical methods we describe now are all based on the Lippmann– Schwinger integral equation. We deﬁne the volume potential V φ with density φ by eikx−y φ(y) dy , x ∈ B . (7.108) (V φ)(x) := 4πx − y y 0 such that, if m ∈ C(Q) with m ∞ ≤ ˆ is the corresponding total ﬁeld with exact far ﬁeld pattern ε and u = u(x, θ) ∞ ˆ ˆ then the sequence (m , u ) constructed by the regularized x, θ) = u∞ (ˆ x, θ), u (ˆ ˜ u ˜) ∈ XN × C Q × S 2 that algorithm (Ar ), (Br ), (Cr ) converges to some (m, satisﬁes the scattering problem with refraction contrast m. ˜ Its far ﬁeld pattern u ˜∞ coincides with u∞ at the points x ˆj , θˆj ∈ S 2 × S 2 , j ∈ ZN . If, in addition, the exact solution m satisﬁes m ∈ XN , then the sequence m , u converges to (m, u). Proof:
We deﬁne the operator L : C Q × S 2 × C S 2 × S 2 −→ XN × C Q × S 2
by
L1 ρ , w − k 2 V (ui L1 ρ) . Then L is a left inverse of T (0, ui ) on XN × C Q × S 2 ; that is, L T (0, ui )(μ, v) = (μ, v) for all (μ, v) ∈ XN × C Q × S 2 . Indeed, let (μ, v) ∈ XN × C Q × S 2 and set (w, ρ) = T (0, ui )(μ, v), i.e., w = v + k 2 V (μui ) and ρ = W (μui ). The latter equation implies that 4π μ(y) e−i j·y dy = − 2 ρ x ˆj , θˆj , j ∈ ZN . k L(w, ρ) :=
Q
Because μ ∈ XN , this yields μ = L1 ρ and thus L(w, ρ) = (μ, v). With the abbreviations z = m , u and R = (ui , u∞ ), we can write the regularized algorithm in the form z+1 = z − L T (z ) − R , = 0, 1, 2, . . . in the space XN × C Q × S 2 . We can now apply a general result about local convergence of the simpliﬁed Newton method (see Appendix A, Theorem A.65). 2 of This yields the existence of a unique solution ( m, ˜ u ˜ ) ∈ X × C Q × S N , u ) to ( m, ˜ u ˜ ). L T m, ˜ u ˜ − R = 0 and linear convergence of the sequence (m 2 i ˜ u = u is equivalent to the scattering problem by The equation u ˜ + k V m˜ ˆj , θˆj = ˜ u = L1 f is equivalent to u ˜∞ x Theorem 7.12. The equation L1 W m˜ ˆj , θˆj forall j ∈ ZN . Finally, if m ∈ XN , then (m, u) satisﬁes L T (m, u) = u∞ x L R and thus m, ˜ u ˜ = (m, u). This proves the assertion. We have formulated the algorithm with respect to the Lippmann–Schwinger integral equation because our analysis of existence and continuous dependence is based on this setting. There is an alternative way to formulate the simpliﬁed Newton method in terms of the original scattering problems; see [113]. We note also that our analysis can easily be modiﬁed to treat the case where only n ∈ L∞ (B). For numerical examples, we refer to [113].
7.7
Numerical Methods
7.7.2
307
A Modified Gradient Method
The idea of the numerical method proposed and numerically tested by Kleinman and van den Berg (see [164]) is to solve (7.112a), (7.112b) by a gradienttype method. For simplicity, we describe the method again in the function space setting and refer for discretization aspects to the original literature [164]. Again let B = B(0, a) contain the support of m = 1 − n. (A) Choose m0 ∈ L∞ (B), u0 ∈ L2 (B × S 2 ), and set = 0. (B) Choose directions e ∈ L2 (B × S 2 ) and d ∈ L∞ (B), and set u+1 = u + α e ,
m+1 = m + β d .
(7.122)
The stepsizes α , β > 0 are chosen in such a way that they minimize the functional Ψ (α, β) :=
r+1 2L2 (B×S 2 )
s+1 2L2 (S 2 ×S 2 )
,
(7.123a)
where the defects r+1 and s+1 are deﬁned by r+1 := ui − u+1 − k 2 V m+1 u+1 , s+1 := u∞ − W m+1 u+1 .
(7.123b)
ui 2L2 (B×S 2 )
+
f 2L2 (S 2 ×S 2 )
(7.123c)
(C) Replace by + 1 and continue with step (B). There are diﬀerent choices for the directions d and e . In [164], ˆ u (x, θ) ˆ ds(θ) ˆ , x ∈ B , and e := r d (x) = − d˜ (x, θ)
(7.124)
S2
have been chosen where d˜ = −W ∗ W (m u ) − u∞ ∈ L∞ B × S 2 . In this case, d is the steepest descent direction of μ → W (μu )−u∞ 2L2 (S 2 ×S 2 ) . In [266], for d and e Polak–Ribi`ere conjugate gradient directions are chosen. A rigorous convergence analysis of either method has not been carried out. A severe drawback of the methods discussed in Sections 7.7.1 and 7.7.2 is ˆ To estimate the that they iterate on functions m = m (x) and u = u (x, θ). storage requirements, we choose a grid of order N ·N ·N grid points in B and M directions θ1 , . . . , θM ∈ S 2 . Then both methods iterate on vectors of dimension N 6 · M . From the uniqueness results, M is expected to be large, say, of order N 2 . For large values of M , the method described next has proven to be more eﬃcient.
308
7.7.3
Inverse Scattering Problem
The Dual Space Method
The method described here is due to Colton and Monk [61, 62] based on their earlier work for inverse obstacle scattering problems (see [59, 60]). There exist various modiﬁcations of this method, but we restrict ourselves to the simplest case. This method consists of two steps. In the ﬁrst step, one tries to determine ˆ such that the corresponding a superposition of the incident ﬁelds ui = ui (·, θ) ˆ far ﬁeld pattern u∞ (·, θ) is (close to) the far ﬁeld pattern of radiating multipoles. In the second step, the function m = n − 1 is determined from an interior transmission problem. We describe both steps separately. Assume for the following that the origin is contained in B = B(0, a). By u∞ we denote again the measured far ﬁeld pattern. Step 1: Determine g ∈ L2 (S 2 ) with ˆ g(θ) ˆ ds(θ) ˆ = 1, u∞ (ˆ x, θ)
x ˆ ∈ S2 .
(7.125)
S2
ˆ x, θ) In Theorem 7.22, we have proven that for the exact far ﬁeld pattern u∞ (ˆ 2 2 this integral equation of the ﬁrst kind is solvable in L (S ) if and only if the interior transmission problem Δv + k 2 v = 0 in B ,
Δw + k 2 nw = 0 in B ,
w(x) − v(x) = ∂w(x) ∂v(x) − ∂ν ∂ν
=
(7.126a)
ikx
e
x
on ∂B ,
∂ eikx ∂ν x
on ∂B ,
(7.126b) (7.126c)
has a solution v, w ∈ L2 (B) in the ultra weak sense of Deﬁnition 7.21 such that v(x) = eikx·ˆy g(ˆ y ) ds(ˆ y ) , x ∈ R3 . (7.127) S2
The kernel of the integral operator in (7.125) is (for the exact far ﬁeld pattern) analytic with respect to both variables, thus (7.125) represents a severely illposed—but linear—equation and can be treated by Tikhonov’s regularization method as described in Chapter 2 in detail. (In this connection, see the remark following Theorem 7.23.) We formulate the interior transmission problem (7.126a)–(7.126c) as an integral equation. Lemma 7.56 (a)Let v, w ∈ L2 (B) solve the boundary value problem (7.126a)– (7.126c) in the ultra weak sense of (7.46b). Deﬁne u = w − v in B and
7.7
Numerical Methods
309
2 u(x) = exp(ikx)/x in R3 \ B. Then u ∈ Hloc (R3 ) and u and w ∈ L2 (B) solve 2 m(y) w(y) Φ(x, y) dy , x ∈ R3 , (7.128a) u(x) = k y 0 and λ1 , λ2 > 0, minimize J(g, w, m) on L2 (S 2 ) × L2 (B) × C,
(7.130a)
7.7
Numerical Methods
311
where J(g, w, m)
:= F g − 1 2L2 (S 2 ) + ε g 2L2 (S 2 ) 2
+ λ1 w − vg − k V
(7.130b)
(mw) 2L2 (B)
+ λ2 k 2 V (mw) − 4π Φ(·, 0) 2L2 (∂B) , and the far ﬁeld operator F : L2 (S 2 ) → L2 (S 2 ) is deﬁned by (see (7.40)) ˆ , x ˆ, θˆ g θˆ ds(θ) ˆ ∈ S2. u∞ x (F g) x ˆ := S2
Theorem 7.57 This optimization problem (7.130a), (7.130b) has an optimal solution (g, w, m) for every choice of ε, λ1 , λ2 > 0 and every compact subset C ⊂ L∞ (B). Proof: Let (gj , wj , mj ) ∈ L2 (S 2 ) × L2 (B) × C be a minimizing sequence; that is, J(gj , wj , mj ) → J ∗ where the optimal value J ∗ is deﬁned by J ∗ := inf J(g, w, m) : (g, w, m) ∈ L2 (S 2 ) × L2 (B) × C . We can assume that (mj ) converges to some m ∈ C because C is compact. Several tedious applications of the parallelogram equality
a + b 2 = − a − b 2 + 2 a 2 + 2 b 2 and the binomial formula
b 2 = a 2 + 2 Re (a, b − a) + a − b 2 yield −J
∗
1 1 (gj + g ), (wj + w ), mj ≥ −J 2 2 1 1 = − J(gj , wj , mj ) − J(g , w , mj ) 2 2 1 ε + F (gj − g ) 2L2 (S 2 ) + gj − g 2L2 (S 2 ) 4 4 λ1
(wj − w ) − vgj −g − k 2 V mj (wj − w ) 2L2 (B) + 4 λ2 k 4
V mj (wj − w ) 2L2 (∂B) . + 4
From this we conclude that 1 1 J(gj , wj , mj ) + J(g , w , mj ) 2 2 λ1 ε
gj − g 2L2 (S 2 ) +
(wj − w ) − vgj −g − k 2 V mj (wj − w ) 2L2 (B) . 4 4
−J ∗ + ≥
312
Inverse Scattering Problem
The lefthand side tends to zero as j and tend to inﬁnity, therefore we conclude that (gj ) is a Cauchy sequence, thus converging gj → g in L2 (S 2 ). Furthermore, from
(wj − w ) − vgj −g − k 2 V mj (wj − w ) L2 (B) −→ 0 as , j → ∞ and the convergence gj → g we conclude that
(wj − w ) − k 2 V mj (wj − w ) L2 (B) −→ 0 as , j → ∞. The operators I − k 2 V (mj ·) converge to the isomorphism I − k 2 V (m·) in the operator norm of L2 (B). Therefore, by Theorem A.37 of Appendix A, we conclude that (wj ) is a Cauchy sequence and thus is convergent in L2 (B) to some w. The continuity of J implies that J(gj , wj , mj ) → J(g, w, m). Therefore, (g, w, m) is optimal.
7.8
Problems
p 7.1 Let Q = (−π, π)3 ⊂ R3 be the cube and H p (Q), H0p (Q), and Hper (Q) be the Sobolev spaces deﬁned at the beginning of this chapter. Show that p p (Q) ⊂ H p (Q) and H0p (Q) ⊂ Hper (Q) with bounded inclusions. Use Hper p this result to show that H0 (Q) is compactly imbedded in L2 (Q).
ˆ k) and ub (ˆ ˆ x, θ, 7.2 Let ub1,∞ (ˆ 2,∞ x, θ, k) be the far ﬁeld patterns of the Born ˆ wave approximations corresponding to observation x ˆ, angle of incidence θ, number k, and indices of refraction n1 and n2 , respectively. Assume that ˆ k) = ub (ˆ ˆ ub1,∞ (ˆ x, θ, 2,∞ x, θ, k) for all x ˆ ∈ S 2 and k ∈ [k1 , k2 ] ⊂ R+ and some θˆ ∈ S 2 . Prove that n1 = n2 . ˆ; θˆ , 7.3 Prove the following result, sometimes called Karp’s theorem. Let u∞ x x ˆ, θˆ ∈ S 2 , be the far ﬁeld pattern and assume that there exists a function f : [−1, 1] → C with ˆ; θˆ = f x ˆ · θˆ for all x ˆ, θˆ ∈ S 2 . u∞ x Prove that the index of refraction n has to be radially symmetric: n = n(r). Hint: Rotate the geometry and use the uniqueness result. 7.4 Show that for any a > 0 max
x≤a y a, and solve this elliptic equation explicitly by separation of variables.
7.8
Problems
313
7.5 Show that the characteristic functions fη from (7.94) for η = and η = 2/3 have the forms f1/2 (k)
=
f2/3 (k)
=
k , k ∈ C, 3 2 2k 3 k sin 3 + 2 cos , 3 3 3
√ n = 1/2
sin3
k ∈ C,
respectively. Discuss the existence of zeros of these functions and justify for these examples the assertions of Theorem 7.48. Hint: Use the addition formulas for the trigonometric functions to express fη in terms of sin(k/3) and cos(k/3).
Appendix A
Basic Facts from Functional Analysis In this appendix, we collect some of the basic deﬁnitions and theorems from functional analysis. We prove only those theorems whose proofs are not easily accessible. We recommend the monographs [151, 168, 230, 271] for a comprehensive treatment of linear and nonlinear functional analysis.
A.1
Normed Spaces and Hilbert Spaces
First, we recall two basic deﬁnitions. Definition A.1 (Scalar Product, PreHilbert Space) Let X be a vector space over the ﬁeld K = R or K = C. A scalar product or inner product is a mapping (·, ·)X : X × X −→ K with the following properties: (i) (x + y, z)X = (x, z)X + (y, z)X for all x, y, z ∈ X, (ii) (αx, y)X = α (x, y)X for all x, y ∈ X and α ∈ K, (iii) (x, y)X = (y, x)X for all x, y ∈ X, (iv) (x, x)X ∈ R and (x, x)X ≥ 0, for all x ∈ X, (v) (x, x)X > 0 if x = 0. A vector space X over K with inner product (·, ·)X is called a preHilbert space over K. © Springer Nature Switzerland AG 2021 A. Kirsch, An Introduction to the Mathematical Theory of Inverse Problems, Applied Mathematical Sciences 120, https://doi.org/10.1007/9783030633431
315
316
Appendix A The following properties are easily derived from the deﬁnition:
(vi) (x, y + z)X = (x, y)X + (x, z)X for all x, y, z ∈ X, (vii) (x, αy)X = α (x, y)X for all x, y ∈ X and α ∈ K. Definition A.2 (Norm) Let X be a vector space over the ﬁeld K = R or K = C. A norm on X is a mapping · X : X −→ R with the following properties: (i) xX > 0 for all x ∈ X with x = 0, (ii) αxX = α xX for all x ∈ X and α ∈ K, (iii) x + yX ≤ xX + yX for all x, y ∈ X. A vector space X over K with norm · X is called normed space over K. Property (iii) is called triangle inequality. Applying it to the identities x = (x − y) + y and y = (y − x) + x yields the second triangle inequality x − yX ≥ xX − yX for all x, y ∈ X. Theorem A.3 Let X be a preHilbert space. The mapping · X : X −→ R deﬁned by (x, x)X , x ∈ X , xX := is a norm; that is, it has properties (i), (ii), and (iii) of Deﬁnition A.2. Furthermore, (iv) (x, y)X  ≤ xX yX for all x, y ∈ X (Cauchy–Schwarz inequality), (v) x ± y2X = x2X + y2X ± 2 Re(x, y)X for all x, y ∈ X (binomial formula), (vi) x + y2X + x − y2X = 2x2X + 2y2X for all x, y ∈ X. In the following example, we list some of the most important preHilbert and normed spaces. Example A.4 (a) Cn is a preHilbert space of dimension n over C with inner product n (x, y)2 := k=1 xk y k . (b) Cn is a preHilbert space of dimension 2n over R with inner product n (x, y)2 := Re k=1 xk y k . (c) Rn is a preHilbert space of dimension n over R with inner product n (x, y)2 := k=1 xk yk .
A.1 Normed Spaces and Hilbert Spaces
317
(d) For p ≥ 1 deﬁne the set p of complexvalued sequences by ∞ p p xk  < ∞ . := (xk ) :
(A.1)
k=1
Then p is a linear space over C because if (xk ), (yk ) ∈ p , then (λxk ) and (xk + yk ) are also in p . The latter follows from the inequality xk + yk p ≤ (2 max{xk , yk })p ≤ 2p (xk p + yk p ). xp :=
∞
1/p xk 
p
,
x = (xk ) ∈ p ,
k=1
deﬁnes a norm in p . The triangle inequality in the case p > 1 is known as the Minkowski inequality. In the case p = 2, the sesquilinear form (x, y)2 :=
∞
x = (xk ) , y = (yk ) ∈ 2 ,
xk yk ,
k=1
deﬁnes an inner product on 2 . It is welldeﬁned by the Cauchy–Schwarz inequality. (e) The space C[a, b] of (real or complexvalued) continuous functions on [a, b] is a preHilbert space over R or C with inner product
b (x, y)L2 :=
x(t) y(t) dt ,
x, y ∈ C[a, b] .
(A.2a)
a
The corresponding norm is called the Euclidean norm and is denoted by
b (x, x)L2 = x(t)2 dt , x ∈ C[a, b] . (A.2b) xL2 := a
(f) On the same vector space C[a, b] as in example (e), we introduce a norm by (A.3) x∞ := max x(t) , x ∈ C[a, b] , a≤t≤b
that we call the supremum norm. (g) Let m ∈ N and α ∈ (0, 1]. We deﬁne the spaces C m [a, b] and C m,α [a, b] by x is m times continuously m C [a, b] := x ∈ C[a, b] : , diﬀerentiable on [a, b]
C
m,α
[a, b]
x(m) (t) − x(m) (s) m := x ∈ C [a, b] : sup 0 with M ⊂ B(0, r). The set M ⊂ X is called open if for every x ∈ M there exists ε > 0 such that B(x, ε) ⊂ M . The set M ⊂ X is called closed if the complement X \ M is open. (b) A sequence (xk )k in X is called bounded if there exists c > 0 such that xk X ≤ c for all k. The sequence (xk )k in X is called convergent if there exists x ∈ X such that x − xk X converges to zero in R. We denote the limit by x = limk→∞ xk , or we write xk → x as k → ∞. The sequence (xk )k in X is called a Cauchy sequence if for every > 0 there exists N ∈ N with xm − xk X < for all m, k ≥ N . (c) Let (xk )k be a sequence in X. A point x ∈ X is called an accumulation point if there exists a subsequence (akn )n that converges to x. (d) A set M ⊂ X is called compact if every sequence in M has an accumulation point in M . Example A.6 Let X = C[0, 1] over R and xk (t) = tk , t ∈ [0, 1], k ∈ N. The sequence (xk )k converges to zero with respect to the Euclidean norm ·L2 introduced in (A.2b). With respect to the supremum norm · ∞ of (A.3), however, the sequence does not converge to zero. It is easy to prove (see Problem A.1) that a set M is closed if and only if the limit of every convergent sequence (xk )k in M also belongs to M . The sets int(M ) := {x ∈ M : there exists ε > 0 with B(x, ε) ⊂ M } and closure(M ) :=
x ∈ X : there exists (xk )k in M with x = lim xk k→∞
A.1 Normed Spaces and Hilbert Spaces
319
are called the interior and closure, respectively, of M . The set M ⊂ X is called dense in X if closure(M ) = X. In general, the topological properties depend on the norm in X as we have seen already in Example A.6. For ﬁnitedimensional spaces, however, these properties are independent of the norm. This is seen from the following theorem. Theorem A.7 Let X be a ﬁnitedimensional space with norms · 1 and · 2 . Then both norms are equivalent; that is, there exist constants c2 ≥ c1 > 0 with c1 x1 ≤ x2 ≤ c2 x1
for all x ∈ X .
In other words, every ball with respect to · 1 contains a ball with respect to · 2 and vice versa. Further properties are collected in the following theorem. Theorem A.8 Let X be a normed space over K and M ⊂ X be a subset. (a) M is closed if and only if M = closure(M ), and M is open if and only if M = int(M ). (b) If M = X is a linear subspace, then int(M ) = ∅, and closure(M ) is also a linear subspace. (c) In ﬁnitedimensional spaces, every subspace is closed. (d) Every compact set is closed and bounded. In ﬁnitedimensional spaces, the reverse is also true (Theorem of Bolzano–Weierstrass): In a ﬁnitedimensional normed space, every closed and bounded set is compact. A crucial property of the set of real numbers is its completeness. It is also a necessary assumption for many results in functional analysis. Definition A.9 (Banach Space, Hilbert Space) A normed space X over K is called complete or a Banach space if every Cauchy sequence converges in X. A complete preHilbert space is called a Hilbert space. The spaces Cn and Rn are Hilbert spaces with respect to their canonical inner products. The space C[a, b] is not complete with respect to the inner product (·, ·)L2 of (A.2a)! As an example, we consider the sequence xk (t) = tk for 0 ≤ t ≤ 1 and xk (t) = 1 for 1 ≤ t ≤ 2. Then (xk )k is a Cauchy sequence in C[0, 2] but does not converge in C[0, 2] with respect to (·, ·)L2 because it converges to the function 0, t < 1, x(t) = 1, t ≥ 1, that is not continuous. The space C[a, b], · ∞ , however, is a Banach space. Every normed space or preHilbert space X can be “completed”; that is, ˜ respectively, that extends there exists a “smallest” Banach or Hilbert space X, X (that is, xX = xX˜ or (x, y)X = (x, y)X˜ , respectively, for all x, y ∈ X). More precisely, we have the following (formulated only for normed spaces).
320
Appendix A
Theorem A.10 Let X be a normed space with norm · X . There exist a ˜ such that ˜ · ˜ and an injective linear operator J : X → X Banach space X, X ˜ is dense in X, ˜ and (i) The range J(X) ⊂ X (ii) JxX˜ = xX for all x ∈ X; that is, J preserves the norm. ˜ is uniquely determined in the sense that if X ˆ is a second space Furthermore, X ˆ then with properties (i) and (ii) with respect to a linear injective operator J, ˆ the operator Jˆ J −1 : J(X) → J(X) has an extension to a normpreserving ˜ onto X. ˆ In other words, X ˜ and X ˆ can be identiﬁed. isomorphism from X We denote the completion of the preHilbert space C[a, b], (·, ·)L2 by L2 (a, b). Using Lebesgue integration theory, it can be shown that the space L2 (a, b) is characterized as follows. (The notions “measurable,” “almost everywhere” (a.e.), and “integrable” are understood with respect to the Lebesgue measure.) First, we deﬁne the vector space
L2 (a, b) := x : (a, b) → C : x is measurable and x2 integrable , where addition and scalar multiplication are deﬁned pointwise almost everywhere. Then L2 (a, b) is a vector space because, for x, y ∈ L2 (a, b) and α ∈ C, x + y and αx are also measurable and αx, x + y ∈ L2 (a, b), the latter by the binomial theorem x(t) + y(t)2 ≤ 2x(t)2 + 2y(t)2 . We deﬁne a sesquilinear form on L2 (a, b) by
b x, y :=
x(t) y(t) dt ,
x, y ∈ L2 (a, b) .
a
·, · is not an inner product on L2 (a, b) because x, x = 0 only implies that x vanishes almost everywhere; that is, that x ∈ N , where N is deﬁned by
N := x ∈ L2 (a, b) : x(t) = 0 a.e. on (a, b) . Now we deﬁne L2 (a, b) as the factor space L2 (a, b) := L2 (a, b)/N and equip L2 (a, b) with the inner product
[x], [y] L2 :=
b x(t) y(t) dt ,
x ∈ [x] , y ∈ [y] .
a 2
Here, [x], [y] ∈ L (a, b) are equivalence classes of functions in L2 (a, b). Then it can be shown that this deﬁnition is well deﬁned and yields an inner product on L2 (a, b). From now on, we write x ∈ L2 (a, b) instead of x ∈ [x] ∈ L2 (a, b). Furthermore, it can be shown by fundamental results of Lebesgue integration theory that L2 (a, b) is complete; that is, a Hilbert space and contains C[a, b] as a dense subspace.
A.2 Orthonormal Systems
321
Definition A.11 (Separable Space) The normed space X is called separable if there exists a countable dense subset M ⊂ X; that is, if there exist M and a bijective mapping j : N → M with closure(M ) = X. The spaces Cn , Rn , L2 (a, b), and C[a, b] are all separable. For the ﬁrst two examples, let M consist of all vectors with rational coeﬃcients; for the latter examples, take polynomials with rational coeﬃcients. Definition A.12 (Orthogonal Complement) Let X be a preHilbert space (over K = R or C). (a) Two elements x and y are called orthogonal if (x, y)X = 0. (b) Let M ⊂ X be a subset. The set
M ⊥ := x ∈ X : (x, y)X = 0 for all y ∈ M is called the orthogonal complement of M . ⊥ M ⊥ is always a closed subspace and M ⊂ M ⊥ . Furthermore, A ⊂ B implies that B ⊥ ⊂ A⊥ . The following theorem is a fundamental result in Hilbert space theory and relies heavily on the completeness property. Theorem A.13 (Projection Theorem) Let X be a preHilbert space and V ⊂ X be a complete subspace. Then V = ⊥ ⊥ V . Every x ∈ X possesses a unique decomposition of the form x = v + w, where v ∈ V and w ∈ V ⊥ . The operator P : X → V , x → v, is called the orthogonal projection operator onto V and has the properties (a) P v = v for v ∈ V ; that is, P 2 = P ; (b) x − P xX ≤ x − v X for all v ∈ V . This means that P x ∈ V is the best approximation of x ∈ X in the subspace V.
A.2
Orthonormal Systems
In this section, let X always be a separable Hilbert space over the ﬁeld K = R or C. Definition A.14 (Orthonormal System) A countable set of elements A = {xk : k = 1, 2, 3, . . .} is called an orthonormal system (ONS) if (i) (xk , xj )X = 0 for all k = j and
322
Appendix A
(ii) xk X = 1 for all k ∈ N. A is called a complete or a maximal orthonormal system if, in addition, there is no ONS B with A ⊂ B and A = B. One can show using Zorn’s Lemma that every separable Hilbert possesses a maximal ONS. Furthermore, it is well known from linear algebra that every countable set of linearly independent elements of X can be orthonormalized. For any set A ⊂ X, let span A :=
n
αk xk : αk ∈ K, xk ∈ A, n ∈ N
(A.5)
k=1
be the subspace of X spanned by A.

Theorem A.15 Let A = {x_k : k = 1, 2, 3, ...} be an orthonormal system. Then

(a) Every finite subset of A is linearly independent.
(b) If A is finite, say A = {x_k : k = 1, 2, ..., n}, then for every x ∈ X there exist uniquely determined coefficients α_k ∈ K, k = 1, ..., n, such that

‖x − Σ_{k=1}^n α_k x_k‖_X ≤ ‖x − a‖_X for all a ∈ span A.   (A.6)

The coefficients α_k are given by α_k = (x, x_k)_X for k = 1, ..., n.
(c) For every x ∈ X, the following Bessel inequality holds:

Σ_{k=1}^∞ |(x, x_k)_X|² ≤ ‖x‖²_X,   (A.7)

and the series Σ_{k=1}^∞ (x, x_k)_X x_k converges in X.
(d) A is complete if and only if span A is dense in X.
(e) A is complete if and only if for all x ∈ X the following Parseval equation holds:

Σ_{k=1}^∞ |(x, x_k)_X|² = ‖x‖²_X.   (A.8)
(f) A is complete if and only if every x ∈ X has a (generalized) Fourier expansion of the form

x = Σ_{k=1}^∞ (x, x_k)_X x_k,   (A.9)
where the convergence is understood in the norm of X. In this case, the Parseval equation holds in the following more general form:

(x, y)_X = Σ_{k=1}^∞ (x, x_k)_X \overline{(y, x_k)_X}.   (A.10)
This important theorem includes, as special examples, the classical Fourier expansion of periodic functions and the expansion with respect to orthogonal polynomials. We recall two examples.

Example A.16 (Fourier Expansion)

(a) The functions x_k(t) := exp(ikt)/√(2π), k ∈ Z, form a complete system of orthonormal functions in L²(0, 2π). By part (f) of the previous theorem, every function x ∈ L²(0, 2π) has an expansion of the form

x(t) = (1/2π) Σ_{k=−∞}^∞ e^{ikt} ∫_0^{2π} x(s) e^{−iks} ds,

where the convergence is understood in the sense of L²; that is,

∫_0^{2π} | x(t) − (1/2π) Σ_{k=−M}^{N} e^{ikt} ∫_0^{2π} x(s) e^{−iks} ds |² dt → 0

as M, N tend to infinity. Parseval's identity holds in the form

Σ_{k∈Z} |a_k|² = (1/2π) ‖x‖²_{L²},   where   a_k = (1/2π) ∫_0^{2π} x(s) e^{−iks} ds.   (A.11)
For smooth periodic functions, one can even show uniform convergence (see Section A.4).

(b) The Legendre polynomials P_k, k = 0, 1, ..., form a maximal orthonormal system in L²(−1, 1). They are defined by

P_k(t) = γ_k (d^k/dt^k)(1 − t²)^k,   t ∈ (−1, 1), k ∈ N₀,

with normalizing constants

γ_k = √((2k + 1)/2) · 1/(k! 2^k).

We refer to [135] for details. Other important examples will be given later.
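Parseval's identity (A.11) can be checked numerically with the discrete Fourier transform. The sketch below (assuming NumPy; the test function is an arbitrary smooth choice) approximates the coefficients a_k by the rectangle rule, for which the discrete analogue of (A.11) holds exactly:

```python
import numpy as np

n = 4096
t = 2 * np.pi * np.arange(n) / n
x = np.exp(np.sin(t))                    # a smooth 2*pi-periodic test function

# a_k = (1/2pi) int_0^{2pi} x(s) e^{-iks} ds, approximated by fft(x)/n.
a = np.fft.fft(x) / n

lhs = np.sum(np.abs(a) ** 2)             # sum_k |a_k|^2
rhs = np.mean(np.abs(x) ** 2)            # (1/2pi) ||x||^2_{L2} by the rectangle rule

assert abs(lhs - rhs) < 1e-10
```

For smooth x the discrete sums also approximate the continuous quantities in (A.11) very accurately, since the Fourier coefficients decay rapidly.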
A.3 Linear Bounded and Compact Operators
For this section, let X and Y always be normed spaces and A : X → Y be a linear operator.

Definition A.17 (Boundedness, Norm of A) The linear operator A is called bounded if there exists c > 0 such that

‖Ax‖_Y ≤ c ‖x‖_X for all x ∈ X.

The smallest of these constants is called the norm of A; that is,

‖A‖_{L(X,Y)} := sup_{x≠0} ‖Ax‖_Y / ‖x‖_X.   (A.12)
Theorem A.18 The following assertions are equivalent:

(a) A is bounded.
(b) A is continuous at x = 0; that is, x_j → 0 implies that Ax_j → 0.
(c) A is continuous for every x ∈ X.

The space L(X, Y) of all linear bounded mappings from X to Y with the operator norm is a normed space; that is, the operator norm has properties (i), (ii), and (iii) of Definition A.2 and the following: Let B ∈ L(X, Y) and A ∈ L(Y, Z); then AB ∈ L(X, Z) and ‖AB‖_{L(X,Z)} ≤ ‖A‖_{L(Y,Z)} ‖B‖_{L(X,Y)}.

Integral operators are the most important examples for our purposes.

Theorem A.19 (a) Let k ∈ L²((c, d) × (a, b)). The operator

(Ax)(t) := ∫_a^b k(t, s) x(s) ds,   t ∈ (c, d), x ∈ L²(a, b),   (A.13)

is well-defined, linear, and bounded from L²(a, b) into L²(c, d). Furthermore,

‖A‖_{L(L²(a,b),L²(c,d))} ≤ ( ∫_c^d ∫_a^b |k(t, s)|² ds dt )^{1/2}.

(b) Let k be continuous on [c, d] × [a, b]. Then A is also well-defined, linear, and bounded from C[a, b] into C[c, d] and

‖A‖_∞ := ‖A‖_{L(C[a,b],C[c,d])} = max_{t∈[c,d]} ∫_a^b |k(t, s)| ds.
We can extend this theorem to integral operators with weakly singular kernels. We recall that a kernel k is called weakly singular on [a, b] × [a, b] if k is defined and continuous for all t, s ∈ [a, b], t ≠ s, and there exist constants c > 0 and α ∈ [0, 1) such that

|k(t, s)| ≤ c |t − s|^{−α} for all t, s ∈ [a, b], t ≠ s.
Theorem A.20 Let k be weakly singular on [a, b]. Then the integral operator A, defined by (A.13) for [c, d] = [a, b], is well-defined and bounded as an operator in L²(a, b) as well as in C[a, b].

For the special case Y = K, we denote by X* := L(X, K) the dual space of X. Often we write ⟨ℓ, x⟩_{X*,X} instead of ℓ(x) for ℓ ∈ X* and x ∈ X and call ⟨·, ·⟩_{X*,X} the dual pairing. The dual pairing is a bilinear form from X* × X into K. The space X** = (X*)* is called the bidual of X. The canonical embedding J : X → X**, defined by

(Jx)(ℓ) := ⟨ℓ, x⟩_{X*,X},   x ∈ X, ℓ ∈ X*,   (A.14)

is linear, bounded, one-to-one, and satisfies ‖Jx‖_{X**} = ‖x‖_X for all x ∈ X. We recall some important examples of dual spaces (where we write ⟨·, ·⟩ for the dual pairing).
is linear, bounded, onetoone, and satisﬁes JxX ∗∗ = xX for all x ∈ X. We recall some important examples of dual spaces (where we write ·, · for the dual pairing). Example A.21 Let again K = R or K = C. with Kn itself. The identiﬁcation I : (a) The dual of Kn can be identiﬁed n Kn → (Kn )∗ is given by Ix, y = j=1 xj yj for x, y ∈ Kn . (b) Let p > 1 and q > 1 with p1 + 1q = 1. The dual of p (see Example A.4) can be identiﬁed with q . The identiﬁcation I : q → (p )∗ is given by Ix, y =
∞
xj yj
for x = (xj ) ∈ q and y = (yj ) ∈ p .
j=1
(c) The dual (1 )∗ of the space 1 can be identiﬁed with the space ∞ of bounded sequences (equipped with the supremum norm). The identiﬁcation I : ∞ → (1 )∗ is given by the form as in (b) for x = (xj ) ∈ ∞ and y = (yj ) ∈ 1 . (d) Let c0 ⊂ ∞ be the space of sequences in K which converge to zero, equipped with the supremum norm. Then c∗0 can be identiﬁed with 1 . The identiﬁcation I : 1 → c∗0 is given by the form as in (b) for x ∈ 1 and y ∈ c0 . Definition A.22 (Reﬂexive Space) The normed space X is called reﬂexive if the canonical embedding of X into X ∗∗ is surjective; that is, a normpreserving isomorphism from X onto the bidual space X ∗∗ .
The spaces ℓ^p for p > 1 of Example A.21 are typical examples of reflexive spaces. The spaces ℓ¹, ℓ^∞, and c₀ fail to be reflexive. The following important result gives a characterization of X* in Hilbert spaces.

Theorem A.23 (Riesz) Let X be a Hilbert space. For every x ∈ X, the functional ℓ_x(y) := (y, x)_X, y ∈ X, defines a linear bounded mapping from X to K; that is, ℓ_x ∈ X*. Furthermore, for every ℓ ∈ X* there exists one and only one x ∈ X with ℓ(y) = (y, x)_X for all y ∈ X and

‖ℓ‖_{X*} := sup_{y≠0} |⟨ℓ, y⟩| / ‖y‖_X = ‖x‖_X.
This theorem implies that every Hilbert space is reflexive. It also yields the existence of a unique adjoint operator for every linear bounded operator A : X → Y. We recall that for any linear and bounded operator A : X → Y between normed spaces X and Y the dual operator A* : Y* → X* is defined by A*ℓ = ℓ ∘ A for all ℓ ∈ Y*. Here ℓ ∘ A is the composition of A and ℓ; that is, (ℓ ∘ A)x = ℓ(Ax) for x ∈ X.

Theorem A.24 (Adjoint Operator) Let A : X → Y be a linear and bounded operator between Hilbert spaces. Then there exists one and only one linear bounded operator A* : Y → X with the property

(Ax, y)_Y = (x, A*y)_X for all x ∈ X, y ∈ Y.

This operator A* : Y → X is called the adjoint operator to A. For X = Y, the operator A is called self-adjoint if A* = A.

Example A.25 (a) Let X = L²(a, b), Y = L²(c, d), and k ∈ L²((c, d) × (a, b)). The adjoint A* of the integral operator

(Ax)(t) = ∫_a^b k(t, s) x(s) ds,   t ∈ (c, d), x ∈ L²(a, b),

is given by

(A*y)(t) = ∫_c^d \overline{k(s, t)} y(s) ds,   t ∈ (a, b), y ∈ L²(c, d).
(b) Let the space X = C[a, b] of continuous functions over C be supplied with the L² inner product. Define f, g : C[a, b] → C by

f(x) := ∫_a^b x(t) dt and g(x) := x(a) for x ∈ C[a, b].
Both f and g are linear. f is bounded but g is unbounded. There is an extension of f to a linear bounded functional (also denoted by f) on L²(a, b); that is, f ∈ L²(a, b)*. By Theorem A.23, we can identify L²(a, b)* with L²(a, b) itself. For the given f, the representing function is just the constant function 1 because f(x) = (x, 1)_{L²} for x ∈ L²(a, b). The adjoint of f is calculated by

f(x) \overline{y} = ∫_a^b x(t) \overline{y} dt = (x, y·1)_{L²} = (x, f*(y))_{L²}

for all x ∈ L²(a, b) and y ∈ C. Therefore, f*(y) ∈ L²(a, b) is the constant function with value y.

(c) Let X be the Sobolev space H¹(a, b); that is, the space of L²-functions that possess generalized L²-derivatives:

H¹(a, b) := { x ∈ L²(a, b) : there exist α ∈ K and y ∈ L²(a, b) with x(t) = α + ∫_a^t y(s) ds for t ∈ (a, b) }.

We denote the generalized derivative y ∈ L²(a, b) by x′. We observe that H¹(a, b) ⊂ C[a, b] with bounded embedding. As an inner product in H¹(a, b), we define

(x, y)_{H¹} := x(a) \overline{y(a)} + (x′, y′)_{L²},   x, y ∈ H¹(a, b).
Now let Y = L²(a, b) and A : H¹(a, b) → L²(a, b) be the operator x ↦ x′ for x ∈ H¹(a, b). Then A is well-defined, linear, and bounded. It is easily seen that the adjoint of A is given by

(A*y)(t) = ∫_a^t y(s) ds,   t ∈ (a, b), y ∈ L²(a, b).
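The defining relation (Ax, y)_Y = (x, A*y)_X of Theorem A.24 can be tested on a discretization of the integral operator from Example A.25(a). The sketch below (assuming NumPy; the kernel, intervals, and test functions are arbitrary choices) uses the midpoint rule, for which the identity holds exactly at the matrix level:

```python
import numpy as np

a0, b0, c0, d0, n = 0.0, 1.0, 0.0, 2.0, 400
s = a0 + (b0 - a0) * (np.arange(n) + 0.5) / n    # midpoints of (a, b)
t = c0 + (d0 - c0) * (np.arange(n) + 0.5) / n    # midpoints of (c, d)
hs, ht = (b0 - a0) / n, (d0 - c0) / n

K = np.exp(-np.abs(t[:, None] - s[None, :]))     # hypothetical real kernel k(t, s)

x = np.sin(np.pi * s)
y = np.cos(t)

Ax = hs * (K @ x)            # (Ax)(t)  = int_a^b k(t, s) x(s) ds
Asy = ht * (K.T @ y)         # (A*y)(t) = int_c^d k(s, t) y(s) ds  (k real)

lhs = ht * np.dot(Ax, y)     # (Ax, y)_{L2(c,d)}
rhs = hs * np.dot(x, Asy)    # (x, A*y)_{L2(a,b)}
assert abs(lhs - rhs) < 1e-10
```

For a real kernel the adjoint is simply the transposed kernel k(s, t); for complex kernels the conjugate \overline{k(s, t)} would be needed.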
In the following situation, we consider the case that a Banach space V is contained in a Hilbert space X with bounded imbedding j : V → X such that also j(V) is dense in X. We have in mind examples such as H¹(0, 1) ⊂ L²(0, 1). Then the dual operator j* is a linear bounded operator from X* into V* with dense range (the latter follows from the injectivity of j). Also, j* is one-to-one because j(V) is dense in X; see Problem A.3. Now we use the fact that X and X* are anti-isomorphic by the Theorem A.23 of Riesz; that is, the operator j_R : X → X*, x ↦ ℓ_x, where ℓ_x(z) = (z, x)_X for z ∈ X, is bijective and antilinear; that is, it satisfies j_R(λx + μy) = \overline{λ} j_R x + \overline{μ} j_R y for all x, y ∈ X and λ, μ ∈ K. Therefore, also the composition j* ∘ j_R : X → V* is antilinear. For this reason we define the antidual space V′ of V by

V′ = { \overline{ℓ} : V → K : ℓ ∈ V* },   (A.15)

where \overline{ℓ}(v) = \overline{ℓ(v)} for all v ∈ V. Then the operator j′ : X → V′, defined by j′x = \overline{j*(j_R x)}, is linear and one-to-one with dense range, thus an imbedding. In this sense, X is
densely imbedded in V′. We denote the application of ℓ ∈ V′ to v ∈ V by ⟨ℓ, v⟩ and note that (ℓ, v) ↦ ⟨ℓ, v⟩ is a sesquilinear form on V′ × V. (It should not be mixed up with the dual pairing ⟨·, ·⟩_{V*,V} : V* × V → K, which is bilinear.) From this analysis, we conclude that (x, v)_X = ℓ_x(v) = ⟨j′x, v⟩ for all x ∈ X and v ∈ V and thus

|(x, v)_X| ≤ ‖j′x‖_{V′} ‖v‖_V for all x ∈ X, v ∈ V.   (A.16)

Definition A.26 (a) A Gelfand triple (or rigged Hilbert space, see [100]) V ⊂ X ⊂ V′ consists of a reflexive Banach space V, a separable Hilbert space X, and the antidual space V′ of V (all over the same field K = R or K = C) such that V is a dense subspace of X, and the imbedding j : V → X is bounded. Furthermore, the sesquilinear form ⟨·, ·⟩ : V′ × V → K is an extension of the inner product in X; that is, ⟨x, v⟩ = (x, v)_X for all v ∈ V and x ∈ X.

(b) A linear bounded operator K : V → V′ is called coercive if there exists γ > 0 with

Re⟨Kx, x⟩ ≥ γ ‖x‖²_V for all x ∈ V.   (A.17)

The operator K satisfies Gårding's inequality if there exists a linear compact operator C : V → V′ such that K + C is coercive; that is,

Re⟨(K + C)x, x⟩ ≥ γ ‖x‖²_V for all x ∈ V.

By the same arguments as in the proof of the Lax–Milgram theorem (see [129]), it can be shown that every coercive operator is an isomorphism from V onto V′. Coercive operators play an important role in the study of partial differential equations and integral equations by variational methods. Often, the roles of V and V′ are interchanged. For integral operators that are "smoothing", our definition seems more appropriate. However, both definitions are equivalent in the sense that the inverse operator K^{−1} : V′ → V is coercive in the usual sense with γ replaced by γ/‖K‖²_{L(V,V′)}.

The following theorems are two of the most important results of linear functional analysis.

Theorem A.27 (Open Mapping Theorem) Let X, Y be Banach spaces and A : X → Y a linear bounded operator from X onto Y. Then A is open; that is, the images A(U) ⊂ Y are open in Y for all open sets U ⊂ X.
In particular, if A is a bounded isomorphism from X onto Y, then the inverse A^{−1} : Y → X is bounded. This result is sometimes called the Banach–Schauder theorem.

Theorem A.28 (Banach–Steinhaus, Principle of Uniform Boundedness) Let X be a Banach space, Y be a normed space, I be an index set, and A_α ∈ L(X, Y), α ∈ I, be a collection of linear bounded operators such that

sup_{α∈I} ‖A_α x‖_Y < ∞ for every x ∈ X.

Then sup_{α∈I} ‖A_α‖_{L(X,Y)} < ∞.
As an immediate consequence, we have the following.

Theorem A.29 Let X be a Banach space, Y be a normed space, D ⊂ X be a dense subspace, and A_n ∈ L(X, Y) for n ∈ N. Then the following two assertions are equivalent:

(i) A_n x → 0 as n → ∞ for all x ∈ X.
(ii) sup_{n∈N} ‖A_n‖_{L(X,Y)} < ∞ and A_n x → 0 as n → ∞ for all x ∈ D.

We saw in Theorem A.10 that every normed space X possesses a unique completion X̃. Every linear bounded operator defined on X can also be extended to X̃.

Theorem A.30 Let X̃, Ỹ be Banach spaces, X ⊂ X̃ a dense subspace, and A : X → Ỹ be linear and bounded. Then there exists a linear bounded operator Ã : X̃ → Ỹ with

(i) Ãx = Ax for all x ∈ X; that is, Ã is an extension of A, and
(ii) ‖Ã‖_{L(X̃,Ỹ)} = ‖A‖_{L(X,Ỹ)}.

Furthermore, the operator Ã is uniquely determined.

We now study equations of the form

x − Kx = y,   (A.18)

where the operator norm of the linear bounded operator K : X → X is small. The following theorem plays an essential role in the study of Volterra integral equations.

Theorem A.31 (Contraction Theorem, Neumann Series) Let X be a Banach space over R or C and K : X → X be a linear bounded operator with

limsup_{n→∞} ‖K^n‖_{L(X)}^{1/n} < 1.   (A.19)

Then I − K is invertible, the Neumann series Σ_{n=0}^∞ K^n converges in the operator norm, and

Σ_{n=0}^∞ K^n = (I − K)^{−1}.

Condition (A.19) is satisfied if, for example, ‖K^m‖_{L(X)} < 1 for some m ∈ N.

Example A.32 Let Δ := { (t, s) ∈ R² : a < s < t < b }.
(a) Let k ∈ L²(Δ). Then the Volterra operator

(Kx)(t) := ∫_a^t k(t, s) x(s) ds,   a < t < b, x ∈ L²(a, b),   (A.20)

is bounded in L²(a, b). There exists m ∈ N with ‖K^m‖_{L(L²(a,b))} < 1. The Volterra equation of the second kind

x(t) − ∫_a^t k(t, s) x(s) ds = y(t),   a < t < b,   (A.21)

is uniquely solvable in L²(a, b) for every y ∈ L²(a, b), and the solution x depends continuously on y. The solution x ∈ L²(a, b) has the form

x(t) = y(t) + ∫_a^t r(t, s) y(s) ds,   t ∈ (a, b),
with some kernel r ∈ L²(Δ).

(b) Let k ∈ C(Δ̄). Then the operator K defined by (A.20) is bounded in C[a, b], and there exists m ∈ N with ‖K^m‖_∞ < 1. Equation (A.21) is also uniquely solvable in C[a, b] for every y ∈ C[a, b], and the solution x depends continuously on y.

For the remaining part of this section, we assume that X and Y are normed spaces and K : X → Y a linear and bounded operator.

Definition A.33 (Compact Operator) The operator K : X → Y is called compact if it maps every bounded set S into a relatively compact set K(S). We recall that a set M ⊂ Y is called relatively compact if every bounded sequence (y_j)_j in M has an accumulation point in closure(M); that is, if the closure of M is compact.

The set of all compact operators from X into Y is a closed subspace of L(X, Y) and even a two-sided ideal by part (c) of the following theorem.

Theorem A.34 (a) If K₁ and K₂ are compact from X into Y, then so are K₁ + K₂ and λK₁ for every λ ∈ K.

(b) Let K_n : X → Y be a sequence of compact operators between Banach spaces X and Y. Let K : X → Y be bounded, and let K_n converge to K in the operator norm; that is,

‖K_n − K‖_{L(X,Y)} := sup_{x≠0} ‖K_n x − Kx‖_Y / ‖x‖_X → 0 (n → ∞).
Then K is also compact.

(c) If L ∈ L(X, Y) and K ∈ L(Y, Z), and L or K is compact, then KL is also compact.

(d) Let A_n ∈ L(X, Y) be pointwise convergent to some A ∈ L(X, Y); that is, A_n x → Ax for all x ∈ X. If K : Z → X is compact, then ‖A_n K − AK‖_{L(Z,Y)} → 0; that is, the operators A_n K converge to AK in the operator norm.

(e) The identity operator x ↦ x is compact as an operator from X into itself if, and only if, X is finite-dimensional.

(f) Every bounded operator K from X into Y with finite-dimensional range is compact.

Theorem A.35 (a) Let k ∈ L²((c, d) × (a, b)). The operator K : L²(a, b) → L²(c, d), defined by

(Kx)(t) := ∫_a^b k(t, s) x(s) ds,   t ∈ (c, d), x ∈ L²(a, b),   (A.22)

is compact from L²(a, b) into L²(c, d).

(b) Let k be continuous on [c, d] × [a, b] or weakly singular on [a, b] × [a, b] (in this case [c, d] = [a, b]). Then K defined by (A.22) is also compact as an operator from C[a, b] into C[c, d].

We now study equations of the form

x − Kx = y,   (A.23)

where the linear operator K : X → X is compact. The following theorem extends the well-known existence results for finite linear systems of n equations and n variables to compact perturbations of the identity.

Theorem A.36 (Riesz) Let X be a normed space and K : X → X be a linear compact operator.

(a) The null space N(I − K) = {x ∈ X : x = Kx} is finite-dimensional and the range R(I − K) = (I − K)(X) is closed in X.

(b) If I − K is one-to-one, then I − K is also surjective, and the inverse (I − K)^{−1} is bounded. In other words, if the homogeneous equation x − Kx = 0 admits only the trivial solution x = 0, then the inhomogeneous equation x − Kx = y is uniquely solvable for every y ∈ X and the solution x depends continuously on y.

The next theorem studies approximations of equations of the form Ax = y. Again, we have in mind that A = I − K.
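For the Volterra operator of Example A.32, the Neumann series of Theorem A.31 gives a constructive way to solve x − Kx = y. The sketch below (assuming NumPy; the kernel k(t, s) = t − s and right-hand side y = 1 are hypothetical choices, for which the exact solution is x(t) = cosh t) discretizes K by the midpoint rule and sums the series:

```python
import numpy as np

n = 300
h = 1.0 / n
t = (np.arange(n) + 0.5) * h             # midpoints of (0, 1)

# Midpoint-rule discretization of (Kx)(t) = int_0^t (t - s) x(s) ds;
# the matrix is lower triangular, as befits a Volterra operator.
K = np.tril(t[:, None] - t[None, :]) * h

y = np.ones(n)

# Neumann series x = sum_m K^m y; the powers of a Volterra operator
# decay factorially, so few terms are needed.
x = np.zeros(n)
term = y.copy()
for _ in range(40):
    x += term
    term = K @ term

# Check against the direct solve of (I - K) x = y ...
x_direct = np.linalg.solve(np.eye(n) - K, y)
assert np.max(np.abs(x - x_direct)) < 1e-10

# ... and against the exact solution x(t) = cosh(t) of this equation.
assert np.max(np.abs(x - np.cosh(t))) < 1e-3
```

The fast factorial decay of ‖K^m‖ is exactly why condition (A.19) holds for Volterra operators even when ‖K‖ itself is not small.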
Theorem A.37 Assume that the operator A : X → Y between Banach spaces X and Y has a bounded inverse A^{−1}. Let A_n ∈ L(X, Y) be a sequence of bounded operators that converge in norm to A; that is, ‖A_n − A‖_{L(X,Y)} → 0 as n → ∞. Then, for sufficiently large n, more precisely for all n with

‖A^{−1}(A_n − A)‖_{L(X)} < 1,   (A.24)

the inverse operators A_n^{−1} : Y → X exist and are uniformly bounded by

‖A_n^{−1}‖_{L(Y,X)} ≤ ‖A^{−1}‖_{L(Y,X)} / (1 − ‖A^{−1}(A_n − A)‖_{L(X)}) ≤ c.   (A.25)

For the solutions of the equations

Ax = y and A_n x_n = y_n,

the error estimate

‖x_n − x‖_X ≤ c ( ‖A_n x − Ax‖_Y + ‖y_n − y‖_Y )   (A.26)

holds with the constant c from (A.25).
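The bound (A.25) can be observed with matrices. The sketch below (assuming NumPy; the matrix A and the perturbation are arbitrary random choices, scaled so that condition (A.24) holds) checks the uniform bound for the perturbed inverse:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
A = np.eye(n) + 0.1 * rng.normal(size=(n, n))    # boundedly invertible
An = A + 0.01 * rng.normal(size=(n, n))          # perturbed operator A_n

Ainv = np.linalg.inv(A)
q = np.linalg.norm(Ainv @ (An - A), 2)   # ||A^{-1}(A_n - A)||, condition (A.24)
assert q < 1

# (A.25): ||A_n^{-1}|| <= ||A^{-1}|| / (1 - ||A^{-1}(A_n - A)||)
bound = np.linalg.norm(Ainv, 2) / (1 - q)
assert np.linalg.norm(np.linalg.inv(An), 2) <= bound + 1e-9
```

The proof behind the check is the Neumann-series argument: A_n = A(I + A^{−1}(A_n − A)), and the bracket is invertible once q < 1.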
A.4 Sobolev Spaces of Periodic Functions

In this section, we recall definitions and properties of Sobolev (Hilbert) spaces of periodic functions. A complete discussion including proofs can be found in the monograph [168]. From Parseval's identity (A.11), we note that x ∈ L²(0, 2π) if and only if the Fourier coefficients

a_k = (1/2π) ∫_0^{2π} x(s) e^{−iks} ds,   k ∈ Z,   (A.27)

are square summable. In this case

Σ_{k∈Z} |a_k|² = (1/2π) ‖x‖²_{L²}.

If x is periodic and continuously differentiable on [0, 2π], partial integration of (A.27) yields the formula

a_k = (−i/2πk) ∫_0^{2π} x′(s) e^{−iks} ds;

that is, ik a_k are the Fourier coefficients of x′ and are thus square summable. This motivates the introduction of subspaces H^r_per(0, 2π) of L²(0, 2π) by requiring for their elements a certain decay of the Fourier coefficients a_k. In the following we set

ψ̂_k(t) = e^{ikt} for k ∈ Z and t ∈ (0, 2π).
Definition A.38 (Sobolev Space of Periodic Functions) For r ≥ 0, the Sobolev space H^r_per(0, 2π) of order r is defined by

H^r_per(0, 2π) := { x = Σ_{k∈Z} a_k ψ̂_k ∈ L²(0, 2π) : Σ_{k∈Z} (1 + k²)^r |a_k|² < ∞ }.

We note that H⁰_per(0, 2π) coincides with L²(0, 2π).

Theorem A.39 The Sobolev space H^r_per(0, 2π) is a Hilbert space with the inner product defined by

(x, y)_{H^r_per} := Σ_{k∈Z} (1 + k²)^r a_k \overline{b_k},   (A.28)

where x = Σ_{k∈Z} a_k ψ̂_k and y = Σ_{k∈Z} b_k ψ̂_k. The norm in H^r_per(0, 2π) is given by

‖x‖_{H^r_per} = ( Σ_{k∈Z} (1 + k²)^r |a_k|² )^{1/2}.

The Sobolev space H^r_per(0, 2π) is dense in L²(0, 2π).

We note that ‖x‖_{L²} = √(2π) ‖x‖_{H⁰_per}; that is, the norms ‖x‖_{L²} and ‖x‖_{H⁰_per} are equivalent on L²(0, 2π).

Theorem A.40 (a) For r ∈ N₀ := N ∪ {0}, the space C^r_per[0, 2π] = { x ∈ C^r[0, 2π] : x is 2π-periodic } is boundedly embedded in H^r_per(0, 2π).

(b) The space T of all trigonometric polynomials

T := { Σ_{k=−n}^{n} a_k ψ̂_k : a_k ∈ C, n ∈ N }

is dense in H^r_per(0, 2π) for every r ≥ 0.
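The H^r_per-norm of Definition A.38 can be evaluated directly from Fourier coefficients. The sketch below (assuming NumPy; both test functions are arbitrary choices) illustrates the connection between smoothness and coefficient decay:

```python
import numpy as np

def sobolev_norm(samples, r):
    """||x||_{H^r_per} from equidistant samples, via a_k = fft(samples)/n."""
    n = len(samples)
    a = np.fft.fft(samples) / n
    k = np.fft.fftfreq(n, d=1.0 / n)      # integer frequencies k
    return np.sqrt(np.sum((1.0 + k**2) ** r * np.abs(a) ** 2))

n = 2048
t = 2 * np.pi * np.arange(n) / n
smooth = np.exp(np.cos(t))    # analytic: coefficients decay exponentially
rough = np.abs(t - np.pi)     # kink at t = pi: a_k ~ k^{-2}, in H^r only for r < 3/2

assert sobolev_norm(smooth, 4) < 1e3      # finite for every r
assert sobolev_norm(rough, 1) < 1e3       # 1 < 3/2: still finite
assert sobolev_norm(smooth, 0) <= sobolev_norm(smooth, 2)
```

The monotonicity in r checked in the last line reflects the inclusions H^r_per ⊂ H^s_per for r > s of Theorem A.44 below.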
We consider H^r_per(0, 2π) ⊂ L²(0, 2π) as a Banach space (that is, forget the inner product for a moment) with bounded imbedding j : H^r_per(0, 2π) → L²(0, 2π), which has dense range. Therefore, we can consider the corresponding Gelfand triple H^r_per(0, 2π) ⊂ L²(0, 2π) ⊂ H^r_per(0, 2π)′, where H^r_per(0, 2π)′ denotes the space of all antilinear bounded functionals on H^r_per(0, 2π); see Definition A.26. We make the following definition.

Definition A.41 For r ≥ 0, we denote by H^{−r}_per(0, 2π) = H^r_per(0, 2π)′ the antidual space of H^r_per(0, 2π); that is, the space of all antilinear bounded functionals on H^r_per(0, 2π). Then H^r_per(0, 2π) ⊂ L²(0, 2π) ⊂ H^{−r}_per(0, 2π) with bounded and dense imbeddings. The corresponding sesquilinear form ⟨·, ·⟩ : H^{−r}_per(0, 2π) × H^r_per(0, 2π) → C extends the inner product in L²(0, 2π); that is,

⟨ψ, φ⟩ = (ψ, φ)_{L²} = ∫_0^{2π} ψ(t) \overline{φ(t)} dt

for all ψ ∈ L²(0, 2π) and φ ∈ H^r_per(0, 2π).
The following theorems give characterizations in terms of the Fourier coefficients.

Theorem A.42 Let again ψ̂_k(t) = e^{ikt} for k ∈ Z and t ∈ (0, 2π).

(a) Let ℓ ∈ H^{−r}_per(0, 2π) = H^r_per(0, 2π)′ and define c_k := ⟨ℓ, ψ̂_k⟩ for k ∈ Z. Then

‖ℓ‖_{H^{−r}_per} = ( Σ_{k∈Z} (1 + k²)^{−r} |c_k|² )^{1/2}

and

⟨ℓ, x⟩ = Σ_{k∈Z} c_k \overline{a_k} for all x = Σ_{k∈Z} a_k ψ̂_k ∈ H^r_per(0, 2π).   (A.29)

(b) Conversely, let c_k ∈ C satisfy Σ_{k∈Z} (1 + k²)^{−r} |c_k|² < ∞. Then ℓ, defined by (A.29), is in H^{−r}_per(0, 2π) with Σ_{k∈Z} (1 + k²)^{−r} |c_k|² = ‖ℓ‖²_{H^{−r}_per}.
Proof: (a) Set z^N = Σ_{|k|≤N} c_k (1 + k²)^{−r} ψ̂_k for N ∈ N. Then (note that ℓ is antilinear)

Σ_{|k|≤N} (1 + k²)^{−r} |c_k|² = ⟨ℓ, z^N⟩ ≤ ‖ℓ‖_{H^{−r}_per} ‖z^N‖_{H^r_per}
= ‖ℓ‖_{H^{−r}_per} ( Σ_{|k|≤N} (1 + k²)^{−2r} |c_k|² (1 + k²)^r )^{1/2}
= ‖ℓ‖_{H^{−r}_per} ( Σ_{|k|≤N} (1 + k²)^{−r} |c_k|² )^{1/2},

which proves Σ_{k∈Z} (1 + k²)^{−r} |c_k|² ≤ ‖ℓ‖²_{H^{−r}_per} by letting N tend to infinity. In particular, the series converges. Furthermore, for x = Σ_{k∈Z} a_k ψ̂_k ∈ H^r_per(0, 2π) we set x^N = Σ_{|k|≤N} a_k ψ̂_k and have that ⟨ℓ, x^N⟩ = Σ_{|k|≤N} c_k \overline{a_k} and thus

|⟨ℓ, x^N⟩| ≤ Σ_{|k|≤N} |a_k| (1 + k²)^{r/2} |c_k| (1 + k²)^{−r/2}
≤ ( Σ_{|k|≤N} |a_k|² (1 + k²)^r )^{1/2} ( Σ_{|k|≤N} |c_k|² (1 + k²)^{−r} )^{1/2}
= ‖x^N‖_{H^r_per} ( Σ_{|k|≤N} |c_k|² (1 + k²)^{−r} )^{1/2},

and thus also ⟨ℓ, x⟩ = Σ_{k∈Z} c_k \overline{a_k} and ‖ℓ‖²_{H^{−r}_per} ≤ Σ_{k∈Z} (1 + k²)^{−r} |c_k|² by letting N tend to infinity.

(b) This is shown in the same way. □
r (0, 2π) Hper
Theorem A.43 Let r > 0. On L2 (0, 2π), we deﬁne the inner product and norm by (x, y)−r := (1 + k 2 )−r ak bk , (A.30a) k∈Z
x−r
:=
(1 + k 2 )−r ak 2 ,
(A.30b)
k∈Z
respectively, where x = k∈Z ak ψˆk and y = k∈Z bk ψˆk . Then the completion ˜ −r (0, 2π) of L2 (0, 2π) with respect to · −r can be identiﬁed with H −r (0, 2π). H per per −r (0, 2π), where The isomorphism is given by the extension of J : L2 (0, 2π) → Hper Jx, y :=
ak bk
for x =
k∈Z
r ak ψˆk ∈ Hper (0, 2π)
k∈Z
−r and y = k∈Z bk ψˆk ∈ L2 (0, 2π). Therefore, we identify p−r with JpHper −r . and simply write pHper −r Proof: First we show that Jx ∈ Hper (0, 2π). Indeed, by the Cauchy–Schwarz r (0, 2π) inequality, we have for y = k∈Z bk ψˆk ∈ Hper
Jx, y 
≤
(1 + k 2 )−r/2 ak 
k∈Z
≤
(1 + k 2 )r/2 bk 
1/2 ⎛ ⎝
(1 + k 2 )−r ak 2
⎞1/2 (1 + k 2 )r bk 2 ⎠
k≤N
k∈Z
≤
−r yr , xHper
−r 2 −r ≤ x−r for all x ∈ L (0, 2π). By the (0, 2π) with JxHper and thus Jx ∈ Hper previous theorem, applied to = Jx, we have Jx2H −r = k∈Z (1 + k 2 )−r ck 2 per with ck = Jx, ψˆk = ak . Therefore, Jx −r = x−r , and J can be extended Hper
˜ −r (0, 2π) into H −r (0, 2π) (by Theorem A.30). to a bounded operator from H per per −r r (0, 2π) = Hper (0, 2π) and It remains to show that J is surjective. Let ∈ Hper deﬁne ck = , ψˆk and xN = k≤N ck ψˆk for N ∈ N. For y = k∈Z bk ψˆk ∈ r Hper (0, 2π) we have JxN = k≤N bk ck = , k≤N bk ψˆk which converges
336
Appendix A
to , y . Furthermore, (xN ) is a Cauchy sequence with respect to ·−r because of the convergence of k∈Z (1 + k 2 )−r ck 2 by the previous theorem, part (a). r Theorem A.44 (a) For r > s, the Sobolev space Hper (0, 2π) is a dense subspace s r s (0, 2π) is of Hper (0, 2π). The inclusion operator from Hper (0, 2π) into Hper compact. r (b) For all r ≥ 0 and x ∈ L2 (0, 2π) and y ∈ Hper (0, 2π), there holds (x, y)L2 = 2π (x, y)H 0 ≤ 2π x −r yH r . Hper per per
(A.31)
We note that the estimate (A.31) is in accordance with (A.16) because, with the imbedding j′ from L²(0, 2π) into H^{−r}_per(0, 2π) and the identification J from H̃^{−r}_per(0, 2π) onto H^{−r}_per(0, 2π) of the previous theorem, we show easily that the imbedding of L²(0, 2π) into H̃^{−r}_per(0, 2π) is given by J^{−1} ∘ j′, where (J^{−1} ∘ j′)x = 2πx for x ∈ L²(0, 2π).

Proof: (a) Denseness is easily seen by truncating the Fourier series. Compactness is shown by defining the finite-dimensional operators J_N from H^r_per(0, 2π) into H^s_per(0, 2π) by J_N x = Σ_{|k|≤N} a_k ψ̂_k where x = Σ_{k∈Z} a_k ψ̂_k. Then J_N is compact by part (f) of Theorem A.34 and

‖J_N x − x‖²_{H^s_per} = Σ_{|k|>N} (1 + k²)^s |a_k|² ≤ (1 + N²)^{−(r−s)} Σ_{|k|>N} (1 + k²)^r |a_k|² ≤ N^{−2(r−s)} ‖x‖²_{H^r_per}.

Therefore, J_N converges in the operator norm to the imbedding J, which implies, again by Theorem A.34, that also J is compact.

(b) This is seen as in the proof of Theorem A.43 with the Cauchy–Schwarz inequality. □

Theorems A.40 and A.43 imply that the space T of all trigonometric polynomials is dense in H^r_per(0, 2π) for every r ∈ R. Now we study the orthogonal projection and the interpolation operators with respect to equidistant knots and the 2n-dimensional space

T_n := { Σ_{k=−n}^{n−1} a_k ψ̂_k : a_k ∈ C },   (A.32)

where again ψ̂_k(t) = e^{ikt} for t ∈ [0, 2π] and k ∈ Z.

Lemma A.45 Let r, s ∈ R with r ≥ s.

(a) The following stability estimate holds:

‖z_n‖_{H^r_per} ≤ c n^{r−s} ‖z_n‖_{H^s_per} for all z_n ∈ T_n.
(b) Let P_n : L²(0, 2π) → T_n ⊂ L²(0, 2π) be the orthogonal projection operator. Then P_n is given by

P_n x = Σ_{k=−n}^{n−1} a_k ψ̂_k,   x ∈ L²(0, 2π),   (A.33)

where

a_k = (1/2π) ∫_0^{2π} x(s) e^{−iks} ds,   k ∈ Z,

are the Fourier coefficients of x. Furthermore, the following estimate holds:

‖x − P_n x‖_{H^s_per} ≤ (1/n^{r−s}) ‖x‖_{H^r_per} for all x ∈ H^r_per(0, 2π).   (A.34)

Proof: (a) With z_n = Σ_{k=−n}^{n−1} a_k ψ̂_k this follows simply by

‖z_n‖²_{H^r_per} = Σ_{|k|≤n} (1 + k²)^r |a_k|² = Σ_{|k|≤n} (1 + k²)^{r−s} (1 + k²)^s |a_k|²
≤ (1 + n²)^{r−s} ‖z_n‖²_{H^s_per} ≤ (2n²)^{r−s} ‖z_n‖²_{H^s_per}.

(b) Let x = Σ_{k∈Z} a_k ψ̂_k ∈ L²(0, 2π) and define the right-hand side of (A.33) by z; that is, z = Σ_{k=−n}^{n−1} a_k ψ̂_k ∈ T_n. The orthogonality of the ψ̂_k implies that x − z is orthogonal to T_n. This proves that z coincides with P_n x. Now let x ∈ H^r_per(0, 2π). Then

‖x − P_n x‖²_{H^s_per} ≤ Σ_{|k|≥n} (1 + k²)^s |a_k|² = Σ_{|k|≥n} (1 + k²)^{−(r−s)} (1 + k²)^r |a_k|²
≤ (1 + n²)^{s−r} ‖x‖²_{H^r_per} ≤ n^{2(s−r)} ‖x‖²_{H^r_per}. □
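The rate n^{−(r−s)} in (A.34) is observable numerically. The sketch below (assuming NumPy) works with synthetic Fourier coefficients a_k = (1 + k²)^{−1}, so that x lies in H^r_per exactly for r < 3/2 and the L²-type projection error should decay like n^{−3/2}:

```python
import numpy as np

# Synthetic coefficients a_k = (1 + k^2)^{-1}: x is in H^r_per exactly for
# r < 3/2, so (A.34) predicts an H^0-error ||x - P_n x|| of order n^{-3/2}.
def proj_error(n, N=10**6):
    k = np.arange(n, N)
    return np.sqrt(2.0 * np.sum((1.0 + k.astype(float) ** 2) ** (-2.0)))

e1, e2 = proj_error(50), proj_error(100)
rate = np.log2(e1 / e2)       # observed order when n doubles
assert abs(rate - 1.5) < 0.05
```

The tail sum Σ_{|k|≥n} (1 + k²)^{−2} behaves like (2/3) n^{−3}, which is exactly the squared error predicted by the estimate.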
Now let t_j := jπ/n, j = 0, ..., 2n − 1, be equidistantly chosen points in [0, 2π]. Interpolation of smooth periodic functions by trigonometric polynomials can be found in numerous books as, for example, in [72]. Interpolation in Sobolev spaces of integer orders can be found in [42]. We give a different and much simpler proof of the error estimates that are optimal and hold in Sobolev spaces of fractional order.

Theorem A.46 For every n ∈ N and every 2π-periodic function x ∈ C[0, 2π], there exists a unique p_n ∈ T_n with x(t_j) = p_n(t_j) for all j = 0, ..., 2n − 1. The
trigonometric interpolation operator Q_n : C_per[0, 2π] = { x ∈ C[0, 2π] : x(0) = x(2π) } → T_n has the form

Q_n x = Σ_{k=0}^{2n−1} x(t_k) L_k

with Lagrange interpolation basis functions

L_k(t) = (1/2n) Σ_{m=−n}^{n−1} e^{im(t−t_k)},   k = 0, ..., 2n − 1.   (A.35)

The interpolation operator Q_n has an extension to a bounded operator from H^r_per(0, 2π) into T_n ⊂ H^r_per(0, 2π) for all r > 1/2. Furthermore, Q_n obeys estimates of the form

‖x − Q_n x‖_{H^s_per} ≤ (c/n^{r−s}) ‖x‖_{H^r_per} for all x ∈ H^r_per(0, 2π),   (A.36)

where 0 ≤ s ≤ r and r > 1/2. The constant c depends only on s and r. In particular, ‖Q_n‖_{L(H^r_per(0,2π))} is uniformly bounded with respect to n.

Proof: The proof of the first part can be found in, for example, [168]. Let x(t) = Σ_{m∈Z} a_m exp(imt). Direct calculation shows that for smooth functions x the interpolation is given by

(Q_n x)(t) = Σ_{j=−n}^{n−1} â_j e^{ijt} with â_j = (1/2n) Σ_{k=0}^{2n−1} x(t_k) e^{−ijkπ/n},   j = −n, ..., n − 1.

The connection between the continuous and discrete Fourier coefficients is simply

â_j = (1/2n) Σ_{k=0}^{2n−1} Σ_{m∈Z} a_m e^{imkπ/n − ijkπ/n} = Σ_{m∈Z} [ (1/2n) Σ_{k=0}^{2n−1} e^{i(m−j)kπ/n} ] a_m = Σ_{ℓ∈Z} a_{j+2ℓn},

where x = Σ_{m∈Z} a_m ψ̂_m. It is sufficient to estimate P_n x − Q_n x because the required estimate holds for x − P_n x by formula (A.34). We have

P_n x − Q_n x = Σ_{m=−n}^{n−1} [a_m − â_m] ψ̂_m
and thus by the Cauchy–Schwarz inequality

‖P_n x − Q_n x‖²_{H^s_per} = Σ_{m=−n}^{n−1} |a_m − â_m|² (1 + m²)^s ≤ c n^{2s} Σ_{m=−n}^{n−1} | Σ_{ℓ≠0} a_{m+2ℓn} |²
≤ c n^{2s} Σ_{m=−n}^{n−1} ( Σ_{ℓ≠0} [1 + (m + 2ℓn)²]^{r/2} |a_{m+2ℓn}| [1 + (m + 2ℓn)²]^{−r/2} )²
≤ c n^{2s} Σ_{m=−n}^{n−1} ( Σ_{ℓ≠0} [1 + (m + 2ℓn)²]^r |a_{m+2ℓn}|² ) ( Σ_{ℓ≠0} [1 + (m + 2ℓn)²]^{−r} ).

From the obvious estimate

Σ_{ℓ≠0} [1 + (m + 2ℓn)²]^{−r} ≤ (2n)^{−2r} Σ_{ℓ≠0} | m/(2n) + ℓ |^{−2r} ≤ c n^{−2r}

for all |m| ≤ n and n ∈ N, we conclude that

‖P_n x − Q_n x‖²_{H^s_per} ≤ c n^{2(s−r)} Σ_{m=−n}^{n−1} Σ_{ℓ∈Z} [1 + (m + 2ℓn)²]^r |a_{m+2ℓn}|² ≤ c n^{2(s−r)} ‖x‖²_{H^r_per}. □
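The aliasing relation â_j = Σ_ℓ a_{j+2ℓn} at the heart of this proof can be checked directly with the FFT. The sketch below (assuming NumPy; the test function is an arbitrary smooth choice, and the true coefficients a_m are themselves approximated on a much finer grid) truncates the sum over ℓ, which is harmless here because the coefficients decay very fast:

```python
import numpy as np

n = 8                                    # 2n = 16 interpolation points
tk = np.pi * np.arange(2 * n) / n        # t_k = k*pi/n
x = np.exp(np.sin(tk))

# Discrete coefficients a_hat_j = (1/2n) sum_k x(t_k) e^{-ijk pi/n} are
# exactly fft(x)/(2n); reorder them to j = -n, ..., n-1.
ahat = np.fft.fft(x) / (2 * n)
ahat = np.concatenate([ahat[n:], ahat[:n]])
j = np.arange(-n, n)

# Accurate Fourier coefficients a_m of the same function from a finer grid
# (their rapid decay makes the fine-grid aliasing negligible).
M = 512
am_fine = np.fft.fft(np.exp(np.sin(np.pi * np.arange(2 * M) / M))) / (2 * M)

def a(m):
    return am_fine[m % (2 * M)]          # a_m for |m| < M

# Aliasing relation: a_hat_j = sum_ell a_{j + 2*ell*n} (sum truncated).
for idx, jj in enumerate(j):
    s = sum(a(jj + 2 * ell * n) for ell in range(-20, 21))
    assert abs(ahat[idx] - s) < 1e-12
```

Interpolation thus "folds" all high frequencies j + 2ℓn onto the low frequency j, which is exactly the term the proof has to control.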
For real-valued functions, it is more convenient to study the orthogonal projection and interpolation in the 2n-dimensional space

{ Σ_{j=0}^{n} a_j cos(jt) + Σ_{j=1}^{n−1} b_j sin(jt) : a_j, b_j ∈ R }.

In this case, the Lagrange interpolation basis functions L_k are given by (see [168])

L_k(t) = (1/2n) [ 1 + 2 Σ_{m=1}^{n−1} cos m(t − t_k) + cos n(t − t_k) ],   k = 0, ..., 2n − 1,   (A.37)

and the estimates (A.34) and (A.36) are proven by the same arguments.

Theorem A.47 Let r ∈ N and k ∈ C^r([0, 2π] × [0, 2π]) be 2π-periodic with respect to both variables. Then the integral operator K, defined by

(Kx)(t) := ∫_0^{2π} k(t, s) x(s) ds,   t ∈ (0, 2π),   (A.38)

can be extended to a bounded operator from H^p_per(0, 2π) into H^r_per(0, 2π) for every −r ≤ p ≤ r.
Proof: Let x ∈ L²(0, 2π). From

(d^j/dt^j)(Kx)(t) = ∫_0^{2π} (∂^j k(t, s)/∂t^j) x(s) ds,   j = 0, ..., r,

we conclude from Theorem A.44 that for x ∈ L²(0, 2π)

| (d^j/dt^j)(Kx)(t) | ≤ 2π ‖∂^j k(t, ·)/∂t^j‖_{H^r_per} ‖x‖_{H^{−r}_per}

and thus

‖Kx‖_{H^r_per} ≤ c₁ ‖Kx‖_{C^r} ≤ c₂ ‖x‖_{H^{−r}_per}

for all x ∈ L²(0, 2π). Application of Theorem A.30 yields the assertion because L²(0, 2π) is dense in H^{−r}_per(0, 2π). □
A.5 Sobolev Spaces on the Unit Disc

Let B = { x ∈ R² : |x| < 1 } be the open unit disc with boundary ∂B. In this section, we consider functions from B into C that we describe by Cartesian coordinates x = (x₁, x₂) or by polar coordinates (r, φ). Functions on the boundary are identified with 2π-periodic functions on R. As in the case of the Sobolev spaces H^s_per(0, 2π), we define the Sobolev space H¹(B) by completion.

Definition A.48 The Sobolev space H¹(B) is defined as the completion of C^∞(B̄) with respect to the norm

‖f‖_{H¹(B)} = ( ∫_B [ |f(x)|² + |∇f(x)|² ] dx )^{1/2}.   (A.39)

We express the norm (A.39) in polar coordinates (r, φ). The gradient is given in polar coordinates as

∇f(r, φ) = (∂f(r, φ)/∂r) r̂ + (1/r)(∂f(r, φ)/∂φ) φ̂,

where r̂ = (cos φ, sin φ)ᵀ and φ̂ = (−sin φ, cos φ)ᵀ denote the unit vectors. We fix r > 0 and expand the function f(r, ·) (formally) into a Fourier series with respect to φ:

f(r, φ) = Σ_{m∈Z} f_m(r) e^{imφ}

with Fourier coefficients

f_m(r) = (1/2π) ∫_0^{2π} f(r, t) e^{−imt} dt,   m ∈ Z,
341
that depend on r. Therefore, ∂f (r, ϕ) = fm (r) eimϕ , ∂r m∈Z
∂f (r, ϕ) = i fm (r) m eimϕ . ∂ϕ m∈Z
The norm in H 1 (B) is given by f 2H 1 (B)
1 % $ m2 = 2π (r)2 r dr , 1 + 2 fm (r)2 + fm r
(A.40)
m∈Z 0
because ∇f (r, ϕ)
2
∂f (r, ϕ) 2 + 1 = ∂r r2
∂f (r, ϕ) 2 ∂ϕ
and 2
2π imϕ f (r) e fm (r)2 . m dϕ = 2π 0
m∈Z
m∈Z
To every function f ∈ C^∞(B̄), one can assign the trace f|_∂B on ∂B. We denote this mapping by τ; thus τ : C^∞(B̄) → C^∞(∂B) is defined as τf = f|_∂B. The following result is central.

Theorem A.49 (Trace Theorem) The trace operator τ has an extension to a bounded operator from H¹(B) to H^{1/2}(∂B), where again H^{1/2}(∂B) is identified with the Sobolev space H^{1/2}_per(0, 2π) of periodic functions (see Definition A.38). Furthermore, τ : H¹(B) → H^{1/2}(∂B) is surjective. More precisely, there exists a bounded linear operator E : H^{1/2}(∂B) → H¹(B) with τ ∘ E = I on H^{1/2}(∂B) (that is, E is a right inverse of τ).

Proof: Let f ∈ C^∞(B̄). Then (see (A.40) and (A.28))

‖f‖²_{H¹(B)} = 2π Σ_{m∈Z} ∫_0^1 [ (1 + m²/r²) |f_m(r)|² + |f_m′(r)|² ] r dr,
‖τf‖²_{H^{1/2}(∂B)} = Σ_{m∈Z} √(1 + m²) |f_m(1)|².
342
fm (1)
Appendix A
2
1 = 0
d 2 r fm (r)2 dr dr
1 =
1
2
fm (r) r dr + 2 Re
2 0
≤
(r) r 2 dr fm (r) fm
0
1 1
1 (r)2 r 2 dr . 2 fm (r)2 r dr + 2 fm (r)2 r2 dr fm 0
0
Using the inequality 2ab ≤ 1 + m2 fm (1)2
√
≤ 2
2 √ b 1+m2
1 + m2 a2 +
1 1+
0
yields
fm (r)2 r dr
m2 0 2
1
fm (r) r dr +
+ (1 + m ) 0
≤ 3 (1 + m2 )
1 ≤ 3 0
1
2 2
1 0 2
m 1+ 2 r
fm (r)2 r dr +
1 0
2
fm (r)2 r2 dr
0 fm (r)2 r dr
1
fm (r) r dr +
fm (r)2 r dr ,
0
where we have also used the estimates r2 ≤ r for r ∈ [0, 1] and 1 + m2 . By summation, we conclude that τ f 2H 1/2 (∂B) ≤
3 f 2H 1 (B) 2π
√ 1 + m2 ≤
for all f ∈ C ∞ (B) .
(A.41)
Therefore, the trace operator is bounded with respect to the norms of H 1 (B) and H 1/2 (∂B). By the general functional analytic Theorem A.30, the operator has an extension to a bounded operator from H 1 (B) into H 1/2 (∂B). We deﬁne the operator E : C ∞ (∂B) → C ∞ (B) by (Ef )(r, ϕ) =
fm rm eimϕ ,
r ∈ [0, 1] , ϕ ∈ [0, 2π] .
m∈Z
Here, again, fm are the Fourier coeﬃcients of f ∈ C ∞ (∂B). Obviously, (τ Ef )(ϕ) = m∈Z fm eimϕ = f (ϕ); that is, E is a right inverse of τ .
It remains to show the boundedness of $E$:
$$\|Ef\|^2_{H^1(B)} \;=\; 2\pi \sum_{m\in\mathbb Z} |f_m|^2 \int_0^1 \Bigl[\Bigl(1+\frac{m^2}{r^2}\Bigr) r^{2|m|} + m^2\, r^{2|m|-2}\Bigr]\, r\, dr \;=\; 2\pi \sum_{m\in\mathbb Z} |f_m|^2 \Bigl( \frac{1}{2|m|+2} + |m| \Bigr)$$
$$\le\; 2\pi \sum_{m\in\mathbb Z} (1+|m|)\,|f_m|^2 \;\le\; 2\sqrt2\,\pi \sum_{m\in\mathbb Z} \sqrt{1+m^2}\,|f_m|^2 \;=\; 2\sqrt2\,\pi\,\|f\|^2_{H^{1/2}(\partial B)}\,,$$
where we used the inequality $1+|m| \le \sqrt2\,\sqrt{1+m^2}$. Therefore, $E$ also possesses an extension to a bounded operator from $H^{1/2}(\partial B)$ to $H^1(B)$.

Remark: The trace operator is compact when considered as an operator from $H^1(B)$ to $L^2(\partial B)$ because it is the composition of the bounded operator $\tau: H^1(B)\to H^{1/2}(\partial B)$ and the compact embedding $j: H^{1/2}(\partial B)\to L^2(\partial B)$.

We now consider the subspaces of functions with vanishing mean:
$$L^2_\diamond(\partial B) \;=\; \Bigl\{ f\in L^2(\partial B) : \int_{\partial B} f\, d\ell = 0 \Bigr\}\,,$$
$$H^{1/2}_\diamond(\partial B) \;=\; \Bigl\{ f\in H^{1/2}(\partial B) : \int_{\partial B} f\, d\ell = 0 \Bigr\}\,,$$
$$H^1_\diamond(B) \;=\; \Bigl\{ f\in H^1(B) : \int_{\partial B} \tau f\, d\ell = 0 \Bigr\}\,.$$
Because $\int_0^{2\pi} \exp(im\varphi)\, d\varphi = 0$ for $m\neq 0$, the spaces $H^{1/2}_\diamond(\partial B)$ and $H^1_\diamond(B)$ consist exactly of the functions with the representations
$$f(\varphi) \;=\; \sum_{\substack{m\in\mathbb Z\\ m\neq 0}} f_m\, e^{im\varphi} \qquad\text{and}\qquad f(r,\varphi) \;=\; \sum_{m\in\mathbb Z} f_m(r)\, e^{im\varphi}\,,$$
that satisfy the summation conditions
$$\sum_{\substack{m\in\mathbb Z\\ m\neq 0}} \sqrt{1+m^2}\,|f_m|^2 < \infty \qquad\text{and}\qquad \sum_{m\in\mathbb Z} \int_0^1 \Bigl[\Bigl(1+\frac{m^2}{r^2}\Bigr)|f_m(r)|^2 + |f_m'(r)|^2\Bigr]\, r\, dr < \infty \ \text{ and } \ f_0(1) = 0\,,$$
respectively. We can define an equivalent norm in the subspace $H^1_\diamond(B)$. This is a consequence of the following result.
Theorem A.50 (Friedrich's Inequality)
For all $f\in H^1_\diamond(B)$, we have
$$\|f\|_{L^2(B)} \;\le\; \sqrt2\,\|\nabla f\|_{L^2(B)}\,. \tag{A.42}$$

Proof: Again, we use the representation of the norm in polar coordinates:
$$r\,|f_m(r)|^2 \;=\; \int_0^r \frac{d}{ds}\bigl( s\,|f_m(s)|^2 \bigr)\, ds \;=\; \int_0^r |f_m(s)|^2\, ds + 2\,\mathrm{Re}\int_0^r f_m'(s)\,\overline{f_m(s)}\, s\, ds$$
$$\le\; \int_0^1 |f_m(s)|^2\, ds + 2\sqrt{\int_0^1 |f_m(s)|^2\, s\, ds\;\int_0^1 |f_m'(s)|^2\, s\, ds} \;\le\; \int_0^1 |f_m(s)|^2\,(1+s)\, ds + \int_0^1 |f_m'(s)|^2\, s\, ds\,,$$
where we again used $2ab\le a^2+b^2$ in the last step. First let $m\neq 0$. By $1+s\le 2\le 2m^2/s$, we obtain
$$r\,|f_m(r)|^2 \;\le\; 2\int_0^1 \frac{m^2}{s^2}\,|f_m(s)|^2\, s\, ds + \int_0^1 |f_m'(s)|^2\, s\, ds\,,$$
and thus by integration
$$\int_0^1 r\,|f_m(r)|^2\, dr \;\le\; 2\int_0^1 \Bigl[ \frac{m^2}{s^2}\,|f_m(s)|^2 + |f_m'(s)|^2 \Bigr]\, s\, ds\,. \tag{A.43}$$
We finally consider $f_0$. It is $f_0(r) = -\int_r^1 f_0'(s)\, ds$ because $f_0(1)=0$; thus
$$r\,|f_0(r)|^2 \;\le\; r\,(1-r)\int_r^1 |f_0'(s)|^2\, ds \;\le\; \int_r^1 |f_0'(s)|^2\, s\, ds \;\le\; \int_0^1 |f_0'(s)|^2\, s\, ds\,.$$
Therefore, (A.43) also holds for $m=0$. Summation with respect to $m$ yields the assertion.

Remark: Therefore, $f\mapsto\|\nabla f\|_{L^2(B)}$ defines an equivalent norm to $\|\cdot\|_{H^1(B)}$ in $H^1_\diamond(B)$. Indeed, for $f\in H^1_\diamond(B)$ it holds by Friedrich's inequality that
$$\|f\|^2_{H^1(B)} \;=\; \|f\|^2_{L^2(B)} + \|\nabla f\|^2_{L^2(B)} \;\le\; 3\,\|\nabla f\|^2_{L^2(B)}\,,$$
thus
$$\frac{1}{\sqrt3}\,\|f\|_{H^1(B)} \;\le\; \|\nabla f\|_{L^2(B)} \;\le\; \|f\|_{H^1(B)} \quad\text{for all } f\in H^1_\diamond(B)\,. \tag{A.44}$$

So far, we considered spaces of complex-valued functions. The spaces of real-valued functions are closed subspaces. In the Fourier representation, one has
$$f(r,\varphi) \;=\; \overline{f(r,\varphi)} \;=\; \sum_{m\in\mathbb Z} \overline{f_m(r)}\, e^{-im\varphi} \;=\; \sum_{m\in\mathbb Z} \overline{f_{-m}(r)}\, e^{im\varphi}$$
because $f(r,\varphi) = \overline{f(r,\varphi)}$. Therefore, $f_{-m} = \overline{f_m}$ for all $m$. All of the theorems remain valid also for Sobolev spaces of real-valued functions.
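The mode-wise estimate (A.43) can be sanity-checked numerically. A minimal sketch, assuming the (hypothetical, not from the text) profile $f_m(r) = r(1-r)$ for $m=1$:

```python
# Numerical sanity check of estimate (A.43) for one Fourier mode
# f(r, phi) = f_m(r) e^{i m phi}; the profile f_m(r) = r(1 - r) and m = 1
# are hypothetical illustration choices.
import numpy as np

def trap(y, x):
    # simple trapezoidal rule
    return np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(x))

r = np.linspace(1e-6, 1.0, 100001)
m = 1
fm = r * (1.0 - r)           # vanishes at r = 0 like r^{|m|}
dfm = 1.0 - 2.0 * r          # f_m'(r)

lhs = trap(r * fm**2, r)                                   # left side of (A.43)
rhs = 2.0 * trap((m**2 / r**2 * fm**2 + dfm**2) * r, r)    # right side of (A.43)
assert lhs <= rhs
```

Multiplying both sides by $2\pi$ and summing over $m$ is exactly the step that turns (A.43) into Friedrich's inequality (A.42).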
A.6 Spectral Theory for Compact Operators in Hilbert Spaces
Definition A.51 (Spectrum)
Let $X$ be a normed space and $A: X\to X$ be a linear operator. The spectrum $\sigma(A)$ is defined as the set of (complex) numbers $\lambda$ such that the operator $A-\lambda I$ does not have a bounded inverse on $X$. Here, $I$ denotes the identity on $X$. $\lambda\in\sigma(A)$ is called an eigenvalue of $A$ if $A-\lambda I$ is not one-to-one. If $\lambda$ is an eigenvalue, then the nontrivial elements $x$ of the kernel $N(A-\lambda I) = \{x\in X: Ax-\lambda x = 0\}$ are called eigenvectors of $A$.

This definition makes sense for arbitrary linear operators in normed spaces. For noncompact operators $A$, it is possible that even for $\lambda\neq 0$ the operator $A-\lambda I$ is one-to-one but fails to be bijective. As an example, we consider $X=\ell^2$ and define $A$ by
$$(Ax)_k \;:=\; \begin{cases} 0\,, & \text{if } k=1\,,\\ x_{k-1}\,, & \text{if } k\ge 2\,, \end{cases} \qquad\text{for } x = (x_k)\in\ell^2\,.$$
Then $\lambda=1$ belongs to the spectrum of $A$ but is not an eigenvalue of $A$; see Problem A.4.

Theorem A.52
Let $A: X\to X$ be a linear bounded operator.
(a) Let $x_j\in X$, $j=1,\dots,n$, be a finite set of eigenvectors corresponding to pairwise different eigenvalues $\lambda_j\in\mathbb C$. Then $\{x_1,\dots,x_n\}$ are linearly independent. If $X$ is a Hilbert space and $A$ is self-adjoint (that is, $A^*=A$), then all eigenvalues $\lambda_j$ are real-valued and the corresponding eigenvectors $x_1,\dots,x_n$ are pairwise orthogonal.
(b) Let $X$ be a Hilbert space and $A: X\to X$ be self-adjoint. Then
$$\|A\|_{L(X)} \;=\; \sup_{\|x\|_X=1} |(Ax,x)_X| \;=\; r(A)\,,$$
where $r(A) = \sup\{|\lambda| : \lambda\in\sigma(A)\}$ is called the spectral radius of $A$.
The situation is simpler for compact operators. We collect the most important results in the following fundamental theorem.

Theorem A.53 (Spectral Theorem for Compact Self-Adjoint Operators)
Let $K: X\to X$ be compact and self-adjoint (and $K\neq 0$). Then the following holds:
(a) The spectrum consists only of eigenvalues and possibly $0$. Every eigenvalue of $K$ is real-valued. $K$ has at least one but at most a countable number of eigenvalues with $0$ as the only possible accumulation point.
(b) For every eigenvalue $\lambda\neq 0$, there exist only finitely many linearly independent eigenvectors; that is, the eigenspaces are finite-dimensional. Eigenvectors corresponding to different eigenvalues are orthogonal.
(c) We order the eigenvalues in the form $|\lambda_1|\ge|\lambda_2|\ge|\lambda_3|\ge\cdots$ and denote by $P_j: X\to N(K-\lambda_j I)$ the orthogonal projection onto the eigenspace corresponding to $\lambda_j$. If there exist only a finite number $\lambda_1,\dots,\lambda_m$ of eigenvalues, then
$$K \;=\; \sum_{j=1}^m \lambda_j\, P_j\,.$$
If there exists an infinite sequence $(\lambda_j)$ of eigenvalues, then
$$K \;=\; \sum_{j=1}^\infty \lambda_j\, P_j\,,$$
where the series converges in the operator norm. Furthermore,
$$\Bigl\| K - \sum_{j=1}^m \lambda_j\, P_j \Bigr\|_{L(X)} \;=\; |\lambda_{m+1}|\,.$$
(d) Let $H$ be the linear span of all of the eigenvectors corresponding to the eigenvalues $\lambda_j\neq 0$ of $K$. Then
$$X \;=\; \mathrm{closure}(H) \oplus N(K)\,.$$

Sometimes, part (d) is formulated differently. For a common treatment of the cases of finitely and infinitely many eigenvalues, we introduce the index set $J\subset\mathbb N$, where $J$ is finite in the first case and $J=\mathbb N$ in the second case. For every eigenvalue $\lambda_j$, $j\in J$, we choose an orthonormal basis of the corresponding eigenspace $N(K-\lambda_j I)$. Again, let the eigenvalues $\lambda_j\neq 0$ be ordered in the form
$$|\lambda_1| \ge |\lambda_2| \ge |\lambda_3| \ge \cdots > 0\,.$$
By counting every $\lambda_j\neq 0$ relative to its multiplicity, we can assign an eigenvector $x_j$ to every eigenvalue $\lambda_j$. Then every $x\in X$ possesses an abstract Fourier expansion of the form
$$x \;=\; x_0 + \sum_{j\in J} (x,x_j)_X\, x_j$$
for some $x_0\in N(K)$, and
$$Kx \;=\; \sum_{j\in J} \lambda_j\,(x,x_j)_X\, x_j\,.$$
As a corollary, we observe that the set $\{x_j: j\in J\}$ of all eigenvectors forms a complete system in $X$ if $K$ is one-to-one. The eigenvalues can be expressed by Courant's max-inf and min-sup principle. We need it in the following form.

Theorem A.54
Let $K: X\to X$ be compact and self-adjoint (and $K\neq 0$) and let $\{\lambda_j^- : j=1,\dots,n_-\}$ and $\{\lambda_j^+ : j=1,\dots,n_+\}$ be its negative and positive eigenvalues, respectively, ordered as
$$\lambda_1^- \le \lambda_2^- \le \cdots \le \lambda_j^- \le \cdots < 0 < \cdots \le \lambda_j^+ \le \cdots \le \lambda_2^+ \le \lambda_1^+$$
and counted according to multiplicity. Here $n_\pm$ can be zero (if no positive or negative, respectively, eigenvalues occur), finite, or infinity.
(a) If there exist positive eigenvalues $\lambda_m^+$ then
$$\lambda_m^+ \;=\; \min_{\substack{V\subset X\\ \dim V = m-1}}\ \sup_{x\in V^\perp\setminus\{0\}} \frac{(Kx,x)_X}{\|x\|_X^2}\,.$$
(b) If there exist negative eigenvalues $\lambda_m^-$ then
$$\lambda_m^- \;=\; \max_{\substack{V\subset X\\ \dim V = m-1}}\ \inf_{x\in V^\perp\setminus\{0\}} \frac{(Kx,x)_X}{\|x\|_X^2}\,.$$
Proof: We only prove part (b) because this is needed in the proof of Theorem 7.52. Set
$$\mu_m \;=\; \sup_{\substack{V\subset X\\ \dim V = m-1}}\ \inf_{x\in V^\perp\setminus\{0\}} \frac{(Kx,x)_X}{\|x\|_X^2}$$
for abbreviation. For any $x\in X$ we use the representation $x = x_0 + \sum_j c_j^+ x_j^+ + \sum_j c_j^- x_j^-$ with $Kx_0 = 0$. Here, $x_j^\pm$ are the eigenvectors corresponding to $\lambda_j^\pm$. Then
$$(Kx,x)_X \;=\; \sum_j \lambda_j^-\,|c_j^-|^2 + \sum_j \lambda_j^+\,|c_j^+|^2 \;\ge\; \sum_j \lambda_j^-\,|c_j^-|^2\,.$$
For $V = \mathrm{span}\{x_j^- : j=1,\dots,m-1\}$ and $x\in V^\perp$ we have that $c_j^- = (x,x_j^-)_X = 0$ for $j=1,\dots,m-1$ and thus
$$\inf_{x\in V^\perp} \frac{(Kx,x)_X}{\|x\|_X^2} \;\ge\; \inf_{x\in V^\perp} \frac{\sum_{j\ge m} \lambda_j^-\,|c_j^-|^2}{\|x_0\|_X^2 + \sum_j |c_j^-|^2 + \sum_j |c_j^+|^2} \;\ge\; \lambda_m^-$$
because $\lambda_j^- < 0$. Therefore, $\mu_m \ge \lambda_m^-$. We note that for this choice of $V$ the infimum is attained for $x = x_m^-$.

Let, on the other hand, $V\subset X$ be an arbitrary subspace of dimension $m-1$. Choose a basis $\{v_j : j=1,\dots,m-1\}$ of $V$. Construct $\hat x\in\mathrm{span}\{x_j^- : j=1,\dots,m\}$ such that $\hat x\perp V$ and $\|\hat x\|_X = 1$. This is possible. Indeed, the ansatz $\hat x = \sum_{j=1}^m c_j x_j^-$ leads to the system $\sum_{j=1}^m c_j\,(x_j^-, v_\ell)_X = 0$ for $\ell = 1,\dots,m-1$. This system of $m-1$ equations and $m$ variables has a nontrivial solution which can be normalized such that $\|\hat x\|_X^2 = \sum_{j=1}^m |c_j|^2 = 1$. Therefore,
$$\inf_{x\in V^\perp} \frac{(Kx,x)_X}{\|x\|_X^2} \;\le\; (K\hat x,\hat x)_X \;=\; \sum_{j=1}^m \lambda_j^-\,|c_j|^2 \;\le\; \lambda_m^- \sum_{j=1}^m |c_j|^2 \;=\; \lambda_m^-\,.$$
This shows $\mu_m \le \lambda_m^-$ and ends the proof.
The following corollary is helpful.

Corollary A.55
Let $K: X\to X$ and $\lambda_j^\pm$ be as in the previous theorem.
(a) If there exists a subspace $W\subset X$ of dimension $m$ with $(Kx,x)_X + \|x\|_X^2 \le 0$ for all $x\in W$, then (at least) $m$ negative eigenvalues exist and $\lambda_1^- \le \cdots \le \lambda_m^- \le -1$.

Applying the spectral theorem to the self-adjoint compact operator $K^*K$ yields the singular value decomposition of a compact operator between Hilbert spaces; the positive singular values of $K$ are the square roots of the positive eigenvalues of $K^*K$.

Theorem A.57 (Singular Value Decomposition)
Let $K: X\to Y$ be a linear compact operator between Hilbert spaces and $\mu_1\ge\mu_2\ge\cdots > 0$ the ordered sequence of the positive singular values of $K$, counted relative to its multiplicity. Then there exist orthonormal systems $\{x_j : j\in J\}\subset X$ and $\{y_j : j\in J\}\subset Y$ with the following properties:
$$Kx_j \;=\; \mu_j\, y_j \qquad\text{and}\qquad K^* y_j \;=\; \mu_j\, x_j \qquad\text{for all } j\in J\,.$$
The system $\{\mu_j, x_j, y_j : j\in J\}$ is called a singular system for $K$. Every $x\in X$ possesses the singular value decomposition
$$x \;=\; x_0 + \sum_{j\in J} (x,x_j)_X\, x_j$$
for some $x_0\in N(K)$, and
$$Kx \;=\; \sum_{j\in J} \mu_j\,(x,x_j)_X\, y_j\,.$$
Note that $J\subset\mathbb N$ can be finite or infinite; that is, $J=\mathbb N$.
The following theorem characterizes the range of a compact operator with the help of a singular system.

Theorem A.58 (Picard)
Let $K: X\to Y$ be a linear compact operator with singular system $\{\mu_j, x_j, y_j : j\in J\}$. The equation
$$Kx \;=\; y \tag{A.45}$$
is solvable if and only if
$$y\in N(K^*)^\perp \qquad\text{and}\qquad \sum_{j\in J} \frac{1}{\mu_j^2}\,|(y,y_j)_Y|^2 < \infty\,. \tag{A.46}$$
In this case
$$x \;=\; \sum_{j\in J} \frac{1}{\mu_j}\,(y,y_j)_Y\, x_j$$
is a solution of (A.45).

We note that the solvability conditions (A.46) require a fast decay of the Fourier coefficients of $y$ with respect to the orthonormal system $\{y_j : j\in J\}$ in order for the series $\sum_{j=1}^\infty \frac{1}{\mu_j^2}\,|(y,y_j)_Y|^2$ to converge. Of course, this condition is a genuine restriction only in the important case where there exist infinitely many singular values. As a simple example, we study the following integral operator.

Example A.59
Let $K: L^2(0,1)\to L^2(0,1)$ be defined by
$$(Kx)(t) \;:=\; \int_0^t x(s)\, ds\,, \qquad t\in(0,1)\,,\ x\in L^2(0,1)\,.$$
Then
$$(K^* y)(t) \;=\; \int_t^1 y(s)\, ds \qquad\text{and}\qquad (K^*Kx)(t) \;=\; \int_t^1 \int_0^s x(\tau)\, d\tau\, ds\,.$$
The eigenvalue problem $K^*Kx = \lambda x$ is equivalent to
$$\lambda\, x(t) \;=\; \int_t^1 \int_0^s x(\tau)\, d\tau\, ds\,, \qquad t\in[0,1]\,.$$
Differentiating twice, we observe that for $\lambda\neq 0$ this is equivalent to the eigenvalue problem
$$\lambda\, x'' + x \;=\; 0 \ \text{ in } (0,1)\,, \qquad x(1) \;=\; x'(0) \;=\; 0\,.$$
Solving this yields
$$x_j(t) \;=\; \sqrt2\,\cos\frac{2j-1}{2}\pi t\,, \quad t\in[0,1]\,, \qquad\text{and}\qquad \lambda_j \;=\; \frac{4}{(2j-1)^2\pi^2}$$
for $j\in\mathbb N$. The singular values $\mu_j$ and the orthonormal system $\{y_j : j\in\mathbb N\}$ are given by
$$\mu_j \;=\; \frac{2}{(2j-1)\pi} \qquad\text{and}\qquad y_j(t) \;=\; \sqrt2\,\sin\frac{2j-1}{2}\pi t\,, \qquad j\in\mathbb N\,.$$
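The singular values of Example A.59 can be checked numerically. A minimal sketch (the discretization below is a choice made here, not from the text): replace $K$ by a rectangle-rule approximation on a uniform grid and compare its largest singular values with $\mu_j = 2/((2j-1)\pi)$.

```python
# Numerical check of Example A.59: discretize the integration operator
# (Kx)(t) = ∫_0^t x(s) ds on a uniform grid (hypothetical rectangle-rule
# choice) and compare its singular values with mu_j = 2 / ((2j - 1) pi).
import numpy as np

n = 800
h = 1.0 / n
K = h * np.tril(np.ones((n, n)))     # cumulative sums approximate integration
mu_num = np.linalg.svd(K, compute_uv=False)[:5]
mu_exact = 2.0 / ((2.0 * np.arange(1, 6) - 1.0) * np.pi)
assert np.allclose(mu_num, mu_exact, rtol=1e-2)
```

The $O(h)$ quadrature error of the rectangle rule is why only a modest relative tolerance is asserted.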
The singular value decomposition of Theorem A.57 makes it possible to define, for every continuous function $f: \bigl[0, \|K\|^2_{L(X,Y)}\bigr]\to\mathbb R$, the operator $f(K^*K)$ from $X$ into itself by
$$f(K^*K)\,x \;=\; \sum_{j\in J} f(\mu_j^2)\,(x,x_j)_X\, x_j\,, \qquad x\in X\,. \tag{A.47}$$
This operator is always well-defined, linear, and bounded. It is compact if, and only if, $f(0)=0$ (see Problem A.5). The special cases $f(t)=t$ and $f(t)=\sqrt t$ are of particular importance. From this definition we note that $R\bigl(\sqrt{K^*K}\bigr) = R(K^*)$. For the operator $K$ of the previous Example A.59 we note that $x\in R\bigl((K^*K)^{\sigma/2}\bigr)$ if, and only if,
$$\sum_{j=1}^\infty (2j-1)^{2\sigma}\,|c_j|^2 \;<\; \infty\,, \qquad\text{where}\qquad c_j \;=\; \sqrt2\int_0^1 x(t)\,\cos\frac{2j-1}{2}\pi t\; dt$$
are the Fourier coefficients of $x$. Therefore, $R\bigl((K^*K)^{\sigma/2}\bigr)$ plays the role of the periodic Sobolev spaces $H^\sigma_{per}(0,2\pi)$ of Section A.4.
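The functional calculus (A.47) is easy to realize through a singular system. A finite-dimensional sketch (the matrix stand-in for $K$ is a choice made here): with $f(t)=t$ one recovers $K^\top K$, and applying $f(t)=\sqrt t$ twice does the same.

```python
# Finite-dimensional sketch of (A.47): f(K^T K) x = sum_j f(mu_j^2)(x, x_j) x_j,
# implemented through the singular system of a random matrix stand-in for K.
import numpy as np

rng = np.random.default_rng(0)
K = rng.standard_normal((6, 4))                  # full column rank a.s., so N(K) = {0}
U, mu, Vt = np.linalg.svd(K, full_matrices=False)

def f_of_KtK(f, x):
    # expansion coefficients (x, x_j) with x_j = Vt[j], scaled by f(mu_j^2)
    return Vt.T @ (f(mu**2) * (Vt @ x))

x = rng.standard_normal(4)
assert np.allclose(f_of_KtK(lambda t: t, x), K.T @ K @ x)
assert np.allclose(f_of_KtK(np.sqrt, f_of_KtK(np.sqrt, x)), K.T @ K @ x)
```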
A.7 The Fréchet Derivative
In this section, we briefly recall some of the most important results for nonlinear mappings between normed spaces. The notions of continuity and differentiability carry over in a very natural way.

Definition A.60
Let $X$ and $Y$ be normed spaces over the field $\mathbb K=\mathbb R$ or $\mathbb C$, $U\subset X$ an open subset, $\hat x\in U$, and $T: X\supset U\to Y$ be a (possibly nonlinear) mapping.
(a) $T$ is called continuous at $\hat x$ if for every $\varepsilon>0$ there exists $\delta>0$ such that $\|T(x)-T(\hat x)\|_Y\le\varepsilon$ for all $x\in U$ with $\|x-\hat x\|_X\le\delta$.
(b) $T$ is called Fréchet differentiable at $\hat x\in U$ if there exists a linear bounded operator $A: X\to Y$ (depending on $\hat x$) such that
$$\lim_{h\to 0} \frac{1}{\|h\|_X}\,\bigl\| T(\hat x+h) - T(\hat x) - Ah \bigr\|_Y \;=\; 0\,. \tag{A.48}$$
We write $T'(\hat x) := A$. In particular, $T'(\hat x)\in L(X,Y)$.
(c) The mapping $T$ is called continuously Fréchet differentiable at $\hat x\in U$ if $T$ is Fréchet differentiable in a neighborhood $V$ of $\hat x$ and the mapping $T': V\to L(X,Y)$ is continuous at $\hat x$.

Continuity and differentiability of a mapping depend on the norms in $X$ and $Y$, in contrast to the finite-dimensional case. If $T$ is differentiable at $\hat x$, then the linear bounded mapping $A$ in part (b) of Definition A.60 is unique. Therefore, $T'(\hat x) := A$ is well-defined. If $T$ is differentiable at $x$, then $T$ is also continuous at $x$. In the finite-dimensional case $X=\mathbb K^n$ and $Y=\mathbb K^m$, the linear bounded mapping $T'(x)$ is given by the Jacobian (with respect to the Cartesian coordinates).

Example A.61 (Integral Operator)
Let $f: [c,d]\times[a,b]\times\mathbb K\to\mathbb K$, $f = f(t,s,r)$, $\mathbb K=\mathbb R$ or $\mathbb K=\mathbb C$, be continuous with respect to all arguments and also $\partial f/\partial r\in C\bigl([c,d]\times[a,b]\times\mathbb R\bigr)$.
(a) Let the mapping $T: C[a,b]\to C[c,d]$ be defined by
$$T(x)(t) \;:=\; \int_a^b f\bigl(t,s,x(s)\bigr)\, ds\,, \qquad t\in[c,d]\,,\ x\in C[a,b]\,. \tag{A.49}$$
We equip the normed spaces $C[c,d]$ and $C[a,b]$ with the maximum norm. Then $T$ is continuously Fréchet differentiable with derivative
$$\bigl(T'(x)z\bigr)(t) \;=\; \int_a^b \frac{\partial f}{\partial r}\bigl(t,s,x(s)\bigr)\, z(s)\, ds\,, \qquad t\in[c,d]\,,\ x,z\in C[a,b]\,.$$
Indeed, let $(Az)(t)$ be the term on the right-hand side. By assumption, $\partial f/\partial r$ is continuous on $[c,d]\times[a,b]\times\mathbb R$, thus uniformly continuous on the compact set $M = \{(t,s,r)\in[c,d]\times[a,b]\times\mathbb K : |r|\le\|x\|_\infty+1\}$. Let $\varepsilon>0$. Choose $\delta\in(0,1)$ with $\bigl|\frac{\partial f}{\partial r}(t,s,r) - \frac{\partial f}{\partial r}(t,s,\tilde r)\bigr| \le \frac{\varepsilon}{b-a}$ for all $(t,s,r),(t,s,\tilde r)\in M$ with $|r-\tilde r|\le\delta$. We estimate for $z\in C[a,b]$ with $\|z\|_\infty\le\delta$:
$$\bigl| T(x+z)(t) - T(x)(t) - (Az)(t) \bigr| \;=\; \Bigl| \int_a^b \Bigl[ f\bigl(t,s,x(s)+z(s)\bigr) - f\bigl(t,s,x(s)\bigr) - \frac{\partial f}{\partial r}\bigl(t,s,x(s)\bigr)\, z(s) \Bigr]\, ds \Bigr|$$
$$=\; \Bigl| \int_a^b \int_0^1 \Bigl[ \frac{\partial f}{\partial r}\bigl(t,s,x(s)+r z(s)\bigr) - \frac{\partial f}{\partial r}\bigl(t,s,x(s)\bigr) \Bigr]\, z(s)\, dr\, ds \Bigr|$$
$$\le\; \int_a^b \int_0^1 \Bigl| \frac{\partial f}{\partial r}\bigl(t,s,x(s)+r z(s)\bigr) - \frac{\partial f}{\partial r}\bigl(t,s,x(s)\bigr) \Bigr|\, |z(s)|\, dr\, ds \;\le\; \int_a^b \|z\|_\infty\,\frac{\varepsilon}{b-a}\, ds \;=\; \varepsilon\,\|z\|_\infty\,.$$
This holds for all $t\in[c,d]$, thus also for the maximum. Therefore,
$$\|T(x+z) - T(x) - Az\|_\infty \;\le\; \varepsilon\,\|z\|_\infty \quad\text{for all } z\in C[a,b] \text{ with } \|z\|_\infty\le\delta\,.$$
This proves the differentiability of $T$ from $C[a,b]$ into $C[c,d]$.
(b) Let now in addition $\partial f/\partial r\in C\bigl([c,d]\times[a,b]\times\mathbb K\bigr)$ be Lipschitz continuous with respect to $r$; that is, there exists $\kappa>0$ with $\bigl|\partial f(t,s,r)/\partial r - \partial f(t,s,\tilde r)/\partial r\bigr| \le \kappa\,|r-\tilde r|$ for all $(t,s,r),(t,s,\tilde r)\in[c,d]\times[a,b]\times\mathbb K$. Then the operator $T$ of (A.49) is also Fréchet differentiable as an operator from $L^2(a,b)$ into $C[c,d]$ (and thus into $L^2(c,d)$) with the same representation of the derivative. Indeed, define again the operator $A$ as in part (a). Then $A$ is bounded because
$$|(Az)(t)| \;\le\; \int_a^b \Bigl|\frac{\partial f}{\partial r}\bigl(t,s,x(s)\bigr)\Bigr|\,|z(s)|\, ds \;\le\; \int_a^b \Bigl[ \Bigl|\frac{\partial f}{\partial r}(t,s,0)\Bigr| + \kappa\,|x(s)| \Bigr]\,|z(s)|\, ds$$
$$\le\; \Bigl[ \max_{c\le\tau\le d} \|\partial f(\tau,\cdot,0)/\partial r\|_{L^2(a,b)} + \kappa\,\|x\|_{L^2(a,b)} \Bigr]\, \|z\|_{L^2(a,b)}$$
for all $t\in[c,d]$. Concerning the computation of the derivative we can proceed in the same way as in part (a):
$$\bigl| T(x+z)(t) - T(x)(t) - (Az)(t) \bigr| \;\le\; \int_a^b \int_0^1 \Bigl| \frac{\partial f}{\partial r}\bigl(t,s,x(s)+rz(s)\bigr) - \frac{\partial f}{\partial r}\bigl(t,s,x(s)\bigr) \Bigr|\, |z(s)|\, dr\, ds$$
$$\le\; \int_a^b \int_0^1 \kappa\, r\, |z(s)|^2\, dr\, ds \;=\; \frac{\kappa}{2}\int_a^b |z(s)|^2\, ds \;=\; \frac{\kappa}{2}\,\|z\|^2_{L^2(a,b)}\,.$$
This holds for all $t\in[c,d]$. Therefore,
$$\|T(x+z) - T(x) - Az\|_\infty \;\le\; \frac{\kappa}{2}\,\|z\|^2_{L^2(a,b)}\,,$$
which proves the differentiability.
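The quadratic remainder estimate of Example A.61 can be observed numerically. A minimal sketch, assuming the hypothetical kernel $f(t,s,r)=\sin(t+r)$ on $[a,b]=[c,d]=[0,1]$ (a choice made here for illustration):

```python
# Numerical illustration of Example A.61 with the hypothetical kernel
# f(t, s, r) = sin(t + r): the linearization remainder
# ||T(x+z) - T(x) - T'(x)z|| should decay quadratically in the size of z.
import numpy as np

s = np.linspace(0.0, 1.0, 2001)      # quadrature grid for the s-integral
t = np.linspace(0.0, 1.0, 50)        # evaluation points t

def trap(y):
    return np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(s))

def T(x):
    return np.array([trap(np.sin(ti + x)) for ti in t])

def dT(x, z):                        # (T'(x)z)(t) = ∫ cos(t + x(s)) z(s) ds
    return np.array([trap(np.cos(ti + x) * z) for ti in t])

x = np.cos(3.0 * s)
z = np.sin(5.0 * s)

def remainder(h):
    return np.max(np.abs(T(x + h * z) - T(x) - dT(x, h * z)))

# halving the perturbation size should divide the remainder by about four
assert remainder(5e-3) < 0.3 * remainder(1e-2)
```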
The following theorem collects further properties of the Fréchet derivative.

Theorem A.62
(a) Let $T,S: X\supset U\to Y$ be Fréchet differentiable at $x\in U$. Then $T+S$ and $\lambda T$ are also Fréchet differentiable for all $\lambda\in\mathbb K$, and
$$(T+S)'(x) \;=\; T'(x) + S'(x)\,, \qquad (\lambda T)'(x) \;=\; \lambda\, T'(x)\,.$$
(b) Chain rule: Let $T: X\supset U\to V\subset Y$ and $S: Y\supset V\to Z$ be Fréchet differentiable at $x\in U$ and $T(x)\in V$, respectively. Then $S\circ T$ is also Fréchet differentiable at $x$ and
$$(S\circ T)'(x) \;=\; \underbrace{S'\bigl(T(x)\bigr)}_{\in L(Y,Z)}\;\underbrace{T'(x)}_{\in L(X,Y)} \;\in\; L(X,Z)\,.$$
(c) Special case: If $\hat x, h\in X$ are fixed and $T: X\to Y$ is Fréchet differentiable on $X$, then $\psi: \mathbb K\to Y$, defined by $\psi(t) := T(\hat x+th)$, $t\in\mathbb K$, is differentiable on $\mathbb K$ and $\psi'(t) = T'(\hat x+th)h\in Y$. Note that originally $\psi'(t)\in L(\mathbb K,Y)$. In this case, one identifies the linear mapping $\psi'(t): \mathbb K\to Y$ with its generating element $\psi'(t)\in Y$.

If $T'$ is Lipschitz continuous then the following estimate holds.

Lemma A.63
Let $T$ be differentiable in the ball $B(\bar x,\rho)$ centered at $\bar x\in U$ with radius $\rho>0$, and let there exist $\gamma>0$ with $\|T'(x)-T'(\bar x)\|_{L(X,Y)} \le \gamma\,\|x-\bar x\|_X$ for all $x\in B(\bar x,\rho)$. Then
$$\bigl\| T(x) - T(\bar x) - T'(\bar x)(x-\bar x) \bigr\|_Y \;\le\; \frac{\gamma}{2}\,\|x-\bar x\|_X^2 \quad\text{for all } x\in B(\bar x,\rho)\,.$$
Proof: Let $\ell\in Y^*$ and $x\in B(\bar x,\rho)$ be kept fixed. Set $h = x-\bar x$ and define the scalar function $f(t) = \langle \ell, T(\bar x+th)\rangle$ for $|t| < \rho/\|h\|_X$, where $\langle\cdot,\cdot\rangle = \langle\cdot,\cdot\rangle_{Y^*,Y}$ denotes the dual pairing in $(Y^*,Y)$. Then, by the chain rule of the previous theorem, $f'(t) = \langle \ell, T'(\bar x+th)h\rangle$ and thus
$$\bigl\langle \ell,\, T(x) - T(\bar x) - T'(\bar x)h \bigr\rangle \;=\; f(1) - f(0) - \langle \ell, T'(\bar x)h\rangle \;=\; \int_0^1 \bigl[ f'(t) - \langle \ell, T'(\bar x)h\rangle \bigr]\, dt$$
$$=\; \int_0^1 \bigl\langle \ell,\, \bigl[ T'(\bar x+th) - T'(\bar x) \bigr] h \bigr\rangle\, dt \;\le\; \|\ell\|_{Y^*}\,\|h\|_X \int_0^1 \bigl\| T'(\bar x+th) - T'(\bar x) \bigr\|_{L(X,Y)}\, dt \;\le\; \gamma\,\|\ell\|_{Y^*}\,\|h\|_X^2 \int_0^1 t\, dt \;=\; \frac{\gamma}{2}\,\|\ell\|_{Y^*}\,\|h\|_X^2\,.$$
We set $y := T(x) - T(\bar x) - T'(\bar x)h$ and choose $\ell\in Y^*$ with $\|\ell\|_{Y^*}=1$ and $\langle \ell, y\rangle = \|y\|_Y$. This is possible by a well-known consequence of the Hahn–Banach theorem (see [151], Chap. V, §7, Theorem 2). Then the assertion follows.

We recall Banach's contraction mapping principle (compare with Theorem A.31 for the linear case).

Theorem A.64 (Contraction Mapping Principle)
Let $C\subset X$ be a closed subset of the Banach space $X$ and $T: X\supset C\to X$ a (nonlinear) mapping with the properties
(a) $T$ maps $C$ into itself; that is, $T(x)\in C$ for all $x\in C$, and
(b) $T$ is a contraction on $C$; that is, there exists $c<1$ with
$$\|T(x) - T(y)\|_X \;\le\; c\,\|x-y\|_X \quad\text{for all } x,y\in C\,. \tag{A.50}$$
Then there exists a unique $\tilde x\in C$ with $T(\tilde x) = \tilde x$. The sequence $(x_\ell)$ in $C$, defined by $x_{\ell+1} := T(x_\ell)$, $\ell = 0,1,\dots$, converges to $\tilde x$ for every $x_0\in C$. Furthermore, the following error estimates hold:
$$\|x_{\ell+1} - \tilde x\|_X \;\le\; c\,\|x_\ell - \tilde x\|_X\,, \qquad \ell = 0,1,\dots\,; \tag{A.51a}$$
that is, the sequence converges linearly to $\tilde x$, and for $\ell = 1,2,\dots$
$$\|x_\ell - \tilde x\|_X \;\le\; \frac{c^\ell}{1-c}\,\|x_1 - x_0\|_X \quad\text{(a priori estimate)}\,, \tag{A.51b}$$
$$\|x_\ell - \tilde x\|_X \;\le\; \frac{1}{1-c}\,\|x_{\ell+1} - x_\ell\|_X \quad\text{(a posteriori estimate)}\,. \tag{A.51c}$$
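The fixed-point iteration and the a priori estimate (A.51b) can be illustrated with the classical example $T(x)=\cos x$ on $C=[0,1]$ (an example chosen here, not from the text), for which $c = \sup_{[0,1]}|T'| = \sin 1 < 1$:

```python
# Illustration of Theorem A.64: T(x) = cos(x) maps [0, 1] into itself and is
# a contraction with c = sin(1); the a priori bound (A.51b) is verified.
import math

c = math.sin(1.0)                    # sup of |T'| on [0, 1]
xs = [0.5]                           # x_0 in C
for _ in range(200):
    xs.append(math.cos(xs[-1]))      # x_{l+1} = T(x_l)
fixed = xs[-1]                       # numerical fixed point

d10 = abs(xs[1] - xs[0])
for l in range(1, 60):
    # a priori estimate (A.51b): ||x_l - x~|| <= c^l/(1-c) ||x_1 - x_0||
    assert abs(xs[l] - fixed) <= c**l / (1.0 - c) * d10 + 1e-12

assert abs(math.cos(fixed) - fixed) < 1e-12
```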
The Newton method for systems of nonlinear equations has a direct analogue for equations of the form $T(x) = y$, where $T: X\to Y$ is a continuously Fréchet differentiable mapping between Banach spaces $X$ and $Y$. We formulate a simplified Newton method and prove local linear convergence. It differs from the ordinary Newton method not only by replacing the derivative $T'(x_\ell)$ by $T'(\hat x)$ but also by requiring only the existence of a left inverse.

Theorem A.65 (Simplified Newton Method)
Let $T: X\to Y$ be continuously Fréchet differentiable between Banach spaces $X$ and $Y$. Let $V\subset X$ be a closed subspace, $\hat x\in V$, and $\hat y := T(\hat x)\in Y$. Let $L: Y\to V$ be linear and bounded such that $L$ is a left inverse of $T'(\hat x): X\to Y$ on $V$; that is, $L\,T'(\hat x)v = v$ for all $v\in V$.
Then there exists $\varepsilon>0$ such that for any $\bar y = T(\bar x)$ with $\bar x\in X$ and $\|\bar x-\hat x\|_X\le\varepsilon$ the following algorithm converges linearly to some $\tilde x\in V$:
$$x_0 = \hat x\,, \qquad x_{\ell+1} = x_\ell - L\bigl[ T(x_\ell) - \bar y \bigr]\,, \qquad \ell = 0,1,2,\dots\,. \tag{A.52}$$
The limit $\tilde x\in V$ satisfies $L\bigl[T(\tilde x) - \bar y\bigr] = 0$.

Proof: We apply the contraction mapping principle of the preceding theorem to the mapping
$$S(x) \;:=\; x - L\bigl[T(x) - \bar y\bigr] \;=\; L\bigl[ T'(\hat x)x - T(x) + T(\bar x) \bigr]$$
from $V$ into itself on some closed ball $B[\hat x,\rho]\subset V$. We estimate
$$\|S(x) - S(z)\|_X \;\le\; \|L\|_{L(Y,X)}\, \bigl\| T'(\hat x)(x-z) + T(z) - T(x) \bigr\|_Y \;\le\; \|L\|_{L(Y,X)}\,\|x-z\|_X \Bigl[ \|T'(\hat x) - T'(z)\|_{L(X,Y)} + \frac{\|T(z) - T(x) + T'(z)(x-z)\|_Y}{\|x-z\|_X} \Bigr]$$
and
$$\|S(x) - \hat x\|_X \;\le\; \|L\|_{L(Y,X)}\, \bigl\| T'(\hat x)(x-\hat x) - T(x) + T(\bar x) \bigr\|_Y \;\le\; \|L\|_{L(Y,X)}\, \bigl\| T'(\hat x)(x-\hat x) + T(\hat x) - T(x) \bigr\|_Y + \|L\|_{L(Y,X)}\, \|T(\hat x) - T(\bar x)\|_Y\,.$$
First, we choose $\rho>0$ such that
$$\|L\|_{L(Y,X)} \Bigl[ \|T'(\hat x) - T'(z)\|_{L(X,Y)} + \frac{\|T(z) - T(x) + T'(z)(x-z)\|_Y}{\|x-z\|_X} \Bigr] \;\le\; \frac12$$
for all $x,z\in B[\hat x,\rho]$ with $x\neq z$. This is possible because $T$ is continuously differentiable. Next, we choose $\varepsilon>0$ such that
$$\|L\|_{L(Y,X)}\,\|T(\hat x) - T(\bar x)\|_Y \;\le\; \frac{\rho}{2}$$
for $\|\bar x - \hat x\|_X \le \varepsilon$. Then we conclude that
$$\|S(x) - S(z)\|_X \;\le\; \frac12\,\|x-z\|_X \quad\text{for all } x,z\in B[\hat x,\rho]\,,$$
$$\|S(x) - \hat x\|_X \;\le\; \frac12\,\|x-\hat x\|_X + \frac{\rho}{2} \;\le\; \rho \quad\text{for all } x\in B[\hat x,\rho]\,.$$
Application of the contraction mapping principle ends the proof.
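A minimal sketch of the iteration (A.52), assuming a hypothetical map $T:\mathbb R^2\to\mathbb R^2$ with $V=X=\mathbb R^2$, so that the left inverse $L$ can simply be taken as the inverse of the frozen Jacobian $T'(\hat x)$:

```python
# Sketch of the simplified Newton iteration (A.52) with a frozen Jacobian:
# x_{l+1} = x_l - L (T(x_l) - y_bar), L = T'(x_hat)^{-1} (example chosen here).
import numpy as np

def T(x):
    return np.array([x[0]**2 + x[1], np.sin(x[1]) + x[0]])

def dT(x):  # Jacobian of T
    return np.array([[2.0 * x[0], 1.0], [1.0, np.cos(x[1])]])

x_hat = np.array([1.0, 0.5])
L = np.linalg.inv(dT(x_hat))              # L T'(x_hat) = I
x_bar = x_hat + np.array([0.02, -0.01])   # nearby exact solution
y_bar = T(x_bar)

x = x_hat.copy()
for _ in range(60):
    x = x - L @ (T(x) - y_bar)            # iteration (A.52)

assert np.allclose(x, x_bar, atol=1e-10)  # linear convergence to x_bar
```

Because the Jacobian is only evaluated once, each step costs a single application of the fixed operator $L$, which is the practical appeal of the simplified method.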
The notion of partial derivatives of mappings $T: X\times Z\to Y$ is introduced just as for functions of two scalar variables, as the Fréchet derivative of the mappings $T(\cdot,z): X\to Y$ for $z\in Z$ and $T(x,\cdot): Z\to Y$ for $x\in X$. We denote the partial derivatives at $(x,z)\in X\times Z$ by
$$\frac{\partial}{\partial x} T(x,z)\in L(X,Y) \qquad\text{and}\qquad \frac{\partial}{\partial z} T(x,z)\in L(Z,Y)\,.$$

Theorem A.66 (Implicit Function Theorem)
Let $T: X\times Z\to Y$ be continuously Fréchet differentiable with partial derivatives $\frac{\partial}{\partial x}T(x,z)\in L(X,Y)$ and $\frac{\partial}{\partial z}T(x,z)\in L(Z,Y)$. Furthermore, let $T(\hat x,\hat z) = 0$ and let $\frac{\partial}{\partial z}T(\hat x,\hat z): Z\to Y$ be a norm-isomorphism from $Z$ onto $Y$. Then there exists a neighborhood $U$ of $\hat x$ and a Fréchet differentiable function $\psi: U\to Z$ such that $\psi(\hat x) = \hat z$ and $T\bigl(x,\psi(x)\bigr) = 0$ for all $x\in U$. The Fréchet derivative $\psi'(x)\in L(X,Z)$ is given by
$$\psi'(x) \;=\; -\Bigl[ \frac{\partial}{\partial z} T\bigl(x,\psi(x)\bigr) \Bigr]^{-1} \frac{\partial}{\partial x} T\bigl(x,\psi(x)\bigr)\,, \qquad x\in U\,.$$
The following special case is particularly important. Let $Z = Y = \mathbb K$; thus $T: X\times\mathbb K\to\mathbb K$, and let $T(\hat x,\hat\lambda) = 0$ and $\frac{\partial}{\partial\lambda}T(\hat x,\hat\lambda)\neq 0$. Then there exists a neighborhood $U$ of $\hat x$ and a Fréchet differentiable function $\psi: U\to\mathbb K$ such that $\psi(\hat x) = \hat\lambda$ and $T\bigl(x,\psi(x)\bigr) = 0$ for all $x\in U$, and
$$\psi'(x) \;=\; -\frac{1}{\frac{\partial}{\partial\lambda} T\bigl(x,\psi(x)\bigr)}\; \frac{\partial}{\partial x} T\bigl(x,\psi(x)\bigr) \;\in\; L(X,\mathbb K) = X^*\,, \qquad x\in U\,,$$
where again $X^*$ denotes the dual space of $X$.
A.8 Convex Analysis
Definition A.67
Let $X$ be a normed space.
(a) A set $M\subset X$ is called convex if for all $x,y\in M$ and all $\lambda\in[0,1]$ also $\lambda x + (1-\lambda)y\in M$.
(b) Let $M\subset X$ be convex. A function $f: M\to\mathbb R$ is called convex if for all $x,y\in M$ and all $\lambda\in[0,1]$
$$f\bigl(\lambda x + (1-\lambda)y\bigr) \;\le\; \lambda f(x) + (1-\lambda) f(y)\,.$$
(c) A function $f: M\to\mathbb R$ is called concave if $-f$ is convex; that is, if for all $x,y\in M$ and all $\lambda\in[0,1]$
$$f\bigl(\lambda x + (1-\lambda)y\bigr) \;\ge\; \lambda f(x) + (1-\lambda) f(y)\,.$$
(d) $f$ is called strictly convex if
$$f\bigl(\lambda x + (1-\lambda)y\bigr) \;<\; \lambda f(x) + (1-\lambda) f(y)$$
for all $x,y\in M$ with $x\neq y$ and all $\lambda\in(0,1)$. The definition of a strictly concave function is formulated analogously.

The definition of convexity of a set or a function can be extended to more than two elements.

Lemma A.68
Let $X$ be a normed space.
(a) A set $M\subset X$ is convex if, and only if, for any elements $x_j\in M$, $j=1,\dots,m$, and $\lambda_j\ge 0$ with $\sum_{j=1}^m \lambda_j = 1$ also the convex combination $\sum_{j=1}^m \lambda_j x_j$ belongs to $M$.
(b) Let $M\subset X$ be convex. A function $f: M\to\mathbb R$ is convex if, and only if,
$$f\Bigl( \sum_{j=1}^m \lambda_j x_j \Bigr) \;\le\; \sum_{j=1}^m \lambda_j\, f(x_j)$$
for all $x_j\in M$, $j=1,\dots,m$, and $\lambda_j\ge 0$ with $\sum_{j=1}^m \lambda_j = 1$. For concave functions the characterization holds analogously.

For any set $A\subset X$ of a normed space $X$ the set
$$\mathrm{conv}\, A \;=\; \Bigl\{ \sum_{j=1}^m \lambda_j a_j \,:\, a_j\in A\,,\ \lambda_j\ge 0\,,\ \sum_{j=1}^m \lambda_j = 1\,,\ m\in\mathbb N \Bigr\} \tag{A.53}$$
is convex by the previous lemma and is called the convex hull of $A$.

The following separation theorem is one of the most important tools in the area of convex analysis.

Theorem A.69
Let $X$ be a normed space over $\mathbb R$ and $A,B\subset X$ two convex sets with $A\cap B = \emptyset$. Furthermore, let $A$ be open. Then there exists a hyperplane which separates $A$ and $B$; that is, there exist $\ell\in X^*$ and $\gamma\in\mathbb R$ such that $\ell\neq 0$ and
$$\langle \ell, a\rangle_{X^*,X} \;\ge\; \gamma \;\ge\; \langle \ell, b\rangle_{X^*,X} \quad\text{for all } a\in A \text{ and } b\in B\,.$$
Here, $\langle \ell, x\rangle_{X^*,X}$ denotes the dual pairing; that is, the application of $\ell\in X^*$ to $x\in X$.
For a proof we refer to, e.g., [139], Chapter II. The hyperplane is given by $\bigl\{ x\in X : \langle \ell, x\rangle_{X^*,X} = \gamma \bigr\}$.

Convexity can be characterized easily for differentiable functions.

Lemma A.70
Let $X$ be a normed space, $M\subset X$ an open convex set, and $f: M\to\mathbb R$ Fréchet differentiable on $M$. Then $f$ is convex if, and only if,
$$f(y) - f(x) - f'(x)(y-x) \;\ge\; 0 \quad\text{for all } x,y\in M\,.$$
$f$ is strictly convex if, and only if, the inequality holds strictly for all $x\neq y$. For concave functions the characterizations hold analogously.

Proof: Let first $f$ be convex, $x,y\in M$, and $t\in(0,1]$. From the convexity of $f$ we conclude that $f\bigl(x+t(y-x)\bigr) \le f(x) + t\,\bigl[f(y)-f(x)\bigr]$, thus
$$f(y) - f(x) \;\ge\; \frac1t \Bigl[ f\bigl(x+t(y-x)\bigr) - f(x) \Bigr] \;=\; \frac1t \Bigl[ f\bigl(x+t(y-x)\bigr) - f(x) - t\,f'(x)(y-x) \Bigr] + f'(x)(y-x)\,.$$
The first term on the right-hand side tends to zero as $t\to 0$. This proves $f(y) - f(x) \ge f'(x)(y-x)$.
Let now $f(u) - f(v) \ge f'(v)(u-v)$ for all $u,v\in M$. For $x,y\in M$ and $\lambda\in[0,1]$ apply this twice to $v = \lambda x + (1-\lambda)y$ and $u = y$ and $u = x$, respectively. With $y - v = -\lambda(x-y)$ and $x - v = (1-\lambda)(x-y)$ this yields
$$f(y) - f(v) \;\ge\; f'(v)(y-v) \;=\; \lambda\, f'(v)(y-x)\,,$$
$$f(x) - f(v) \;\ge\; f'(v)(x-v) \;=\; (1-\lambda)\, f'(v)(x-y)\,.$$
Multiplying the first inequality by $1-\lambda$ and the second by $\lambda$ and adding these inequalities yields the assertion.

Note that we could equally well write $\langle f'(x), y-x\rangle_{X^*,X}$ for $f'(x)(y-x)$. We use both notations synonymously. This characterization motivates the definition of the subgradient of a convex function.

Definition A.71
Let $X$ be a normed space over $\mathbb R$ with dual $X^*$, $M\subset X$ an open convex set, and $f: M\to\mathbb R$ a convex function. For $x\in M$ the set
$$\partial f(x) \;:=\; \bigl\{ \ell\in X^* \,:\, f(z) - f(x) - \langle \ell, z-x\rangle_{X^*,X} \ge 0 \ \text{for all } z\in M \bigr\}$$
is called the subgradient of $f$ at $x$.

As one sees from the function $f(x) = |x|$ for $x\in\mathbb R$, the subgradient $\partial f$ is, in general, a multivalued function. It is not empty for continuous functions, and it consists of the derivative as the only element for differentiable functions.
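For the model function $f(x)=|x|$ on $\mathbb R$ mentioned above, the subgradient at $x=0$ is the whole interval $[-1,1]$. A quick sampling-based check (thresholds chosen here):

```python
# Numerical check of Definition A.71 for f(x) = |x| on R: every l in [-1, 1]
# satisfies the subgradient inequality at x = 0, while l = 1.5 does not.
import numpy as np

z = np.linspace(-5.0, 5.0, 1001)
for l in np.linspace(-1.0, 1.0, 21):
    # f(z) - f(0) - l * (z - 0) >= 0 for all sampled z
    assert np.all(np.abs(z) - l * z >= -1e-12)

assert np.any(np.abs(z) - 1.5 * z < 0)   # 1.5 is not a subgradient at 0
```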
Lemma A.72
Let $X$ be a normed space over $\mathbb R$ with dual $X^*$, $M\subset X$ an open convex set, and $f: M\to\mathbb R$ a convex and continuous function.
(a) Then $\partial f(x)\neq\emptyset$ for all $x\in M$.
(b) If $f$ is Fréchet differentiable at $x\in M$ then $\partial f(x) = \{f'(x)\}$. In particular, $\partial f$ is single-valued at $x$.

Proof: (a) Define the set $D\subset X\times\mathbb R$ by
$$D \;:=\; \bigl\{ (z,r)\in M\times\mathbb R \,:\, r > f(z) \bigr\}\,.$$
Then $D$ is open because $M$ is open and $f$ is continuous, and $D$ is also convex (see Problem A.6). Fix $x\in M$. Then we observe that $\bigl(x, f(x)\bigr)\notin D$. The separation theorem for convex sets (see Theorem A.69) yields the existence of $(\ell,s)\in X^*\times\mathbb R$ with $(\ell,s)\neq(0,0)$ and $\gamma\in\mathbb R$ such that
$$\langle \ell, z\rangle_{X^*,X} + s\,r \;\le\; \gamma \;\le\; \langle \ell, x\rangle_{X^*,X} + s\,f(x) \quad\text{for all } r > f(z)\,,\ z\in M\,.$$
Letting $r$ tend to $+\infty$ implies that $s\le 0$. Also $s\neq 0$ because otherwise $\langle \ell, z\rangle_{X^*,X} \le \langle \ell, x\rangle_{X^*,X}$ for all $z\in M$, which would imply that $\ell$ also vanishes², a contradiction to $(\ell,s)\neq(0,0)$. Therefore, $s<0$ and, without loss of generality (division by $|s|$), we can assume that $s=-1$. Letting $r$ tend to $f(z)$ yields
$$\langle \ell, z\rangle_{X^*,X} - f(z) \;\le\; \langle \ell, x\rangle_{X^*,X} - f(x)\,,$$
which shows that $\ell\in\partial f(x)$.
(b) $f'(x)\in\partial f(x)$ follows from Lemma A.70. Let $\ell\in\partial f(x)$ and $y\in X$ be arbitrary with $y\neq 0$. For sufficiently small $t>0$ we have that $x+ty\in M$ and thus
$$f(x+ty) - f(x) - t\,f'(x)y \;\ge\; t\,\bigl[\ell - f'(x)\bigr]y\,.$$
Division by $t>0$ and letting $t$ tend to zero implies that the left-hand side tends to zero by the definition of the derivative. Therefore, $\bigl[\ell - f'(x)\bigr]y \le 0$. Since this holds for all $y\in X$ we conclude that $\ell = f'(x)$.

²The reader should verify this by using that $M$ is an open set.

Lemma A.73
Let $\psi: [0,a]\to\mathbb R$ be a continuous, concave, and monotonically increasing function with $\psi(0)=0$.
(a) Then $\psi(st) \le \max\{1,s\}\,\psi(t)$ for all $s,t\ge 0$ with $st,t\in[0,a]$.
(b) The function $t\mapsto \psi(\sqrt t)^2$ is concave on $[0,a^2]$.
(c) Let $K: X\to Y$ be a linear compact operator between Hilbert spaces and $a\ge\|K\|_{L(X,Y)}$. Then
$$\bigl\| \psi\bigl((K^*K)^{1/2}\bigr) z \bigr\|_X \;\le\; \psi\bigl( \|Kz\|_Y \bigr) \quad\text{for all } z\in X \text{ with } \|z\|_X\le 1\,.$$
Proof: (a) If $s\le 1$ the assertion follows from the monotonicity of $\psi$. If $s\ge 1$ then, by concavity,
$$\psi(t) \;=\; \psi\Bigl( \frac1s\,(st) + \Bigl(1-\frac1s\Bigr)\cdot 0 \Bigr) \;\ge\; \frac1s\,\psi(st) + \Bigl(1-\frac1s\Bigr)\psi(0) \;=\; \frac1s\,\psi(st)\,,$$
that is, $\psi(st)\le s\,\psi(t)$.

(b) Set $\varphi(t) = \psi(\sqrt t)^2$ for $t\in[0,a^2]$. If $\psi$ were twice differentiable then an elementary calculation shows that
$$\varphi''(t) \;=\; \frac{\psi'(\sqrt t)}{2t^{3/2}}\,\bigl[ \sqrt t\,\psi'(\sqrt t) - \psi(\sqrt t) \bigr] \;+\; \frac{1}{2t}\,\psi(\sqrt t)\,\psi''(\sqrt t)\,.$$
The first term is nonpositive because $\psi'\ge 0$ and $s\,\psi'(s) - \psi(s) = \psi(0) - \bigl[\psi(s) + (0-s)\psi'(s)\bigr] \le 0$ by Lemma A.70 (applied to the concave function $\psi$). The second term is also nonpositive because $\psi\ge 0$ and $\psi''\le 0$. Therefore, $\varphi''(t)\le 0$ for all $t$, which proves that $\varphi$ is concave.

If $\psi$ is merely continuous then we approximate $\psi$ by a sequence $(\psi_\ell)$ of smooth, concave, and monotonically increasing functions with $\psi_\ell(0)=0$ and $\|\psi_\ell - \psi\|_\infty\to 0$ as $\ell$ tends to infinity. We sketch the proof but leave the details to the reader. In the first step we approximate $\psi$ by the interpolating polygonal function $p_m$ with respect to $t_j = j\frac{a}{m}$, $j=0,\dots,m$, with values $\psi_j := \psi(t_j)$ at $t_j$. Then, for any $\ell\in\mathbb N$ there exists $m = m(\ell)\in\mathbb N$ with $\|p_m - \psi\|_\infty \le 1/\ell$. Next, we extend $p_m$ onto $\mathbb R$ by extending the first and the last segment linearly; that is, by setting $p_m(t) = \psi_1\,\frac{t}{t_1}$ for $t\le 0$ and $p_m(t) = \psi(a) + \frac{\psi(a)-\psi_{m-1}}{a-t_{m-1}}\,(t-a)$ for $t\ge a$. In the third step we smooth $p_m$ by using a mollifier; that is, a nonnegative function $\phi\in C^\infty(\mathbb R)$ with $\phi(t) = 0$ for $|t|\ge 1$ and $\int_{-1}^1 \phi(t)\,dt = 1$. We set $\phi_\rho(t) = \frac1\rho\,\phi\bigl(\frac t\rho\bigr)$ and
$$\tilde\psi_\rho(t) \;=\; \int_{-\infty}^\infty \phi_\rho(t-s)\, p_m(s)\, ds \;=\; \int_{-\rho}^{\rho} \phi_\rho(s)\, p_m(t-s)\, ds\,, \qquad t\in[0,a]\,.$$
Then $\tilde\psi_\rho$ is in $C^\infty[0,a]$, concave, monotonically increasing, and $\|\tilde\psi_\rho - p_m\|_{C[0,a]}\to 0$ as $\rho$ tends to zero. Finally we set $\psi_\ell(t) = \tilde\psi_\rho(t) - \tilde\psi_\rho(0)$, where $\rho = \rho(\ell)>0$ is such that $\|\psi_\ell - p_m\|_{C[0,a]}\le 1/\ell$. This sketches the construction of the sequence $(\psi_\ell)$.
From the first part we know that $t\mapsto \psi_\ell(\sqrt t)^2$ is concave for every $\ell$. Letting $\ell$ tend to infinity proves the assertion.

(c) We set again $\varphi(t) = \psi(\sqrt t)^2$ for $t\in\bigl[0,\|K\|^2_{L(X,Y)}\bigr]$ and let $z\in X$ with $\|z\|_X\le 1$. We decompose $z = z_0 + z^\perp$ with $z_0\in N(K)$ and $z^\perp\perp N(K)$. Then $\psi\bigl((K^*K)^{1/2}\bigr)z = \psi\bigl((K^*K)^{1/2}\bigr)z^\perp$ and $Kz = Kz^\perp$. Therefore, it suffices to take $z\in X$ with $z\perp N(K)$ and $\|z\|_X\le 1$. With a singular system $\{\mu_j, x_j, y_j : j\in J\}$ of $K$ we expand such a $z\in X$ as $z = \sum_{j\in J} z_j x_j$. We set $J_n = J$ if $J$ is finite and $J_n = \{1,\dots,n\}$ if $J=\mathbb N$, and $\hat z^{(n)} = \sum_{j\in J_n} \hat z_j x_j$ with $\hat z_j = z_j/\sqrt{\sum_{\ell\in J_n} |z_\ell|^2}$. Then $\|\hat z^{(n)}\|_X^2 = \sum_{j\in J_n} |\hat z_j|^2 = 1$ and thus
$$\bigl\| \psi\bigl((K^*K)^{1/2}\bigr)\hat z^{(n)} \bigr\|_X^2 \;=\; \sum_{j\in J_n} \psi(\mu_j)^2\,|\hat z_j|^2 \;=\; \sum_{j\in J_n} \varphi(\mu_j^2)\,|\hat z_j|^2 \;\le\; \varphi\Bigl( \sum_{j\in J_n} \mu_j^2\,|\hat z_j|^2 \Bigr) \;=\; \Bigl[ \psi\Bigl( \bigl\| K\hat z^{(n)} \bigr\|_Y \Bigr) \Bigr]^2\,,$$
where we used that $\varphi$ is concave. Letting $n$ tend to infinity yields $\hat z^{(n)}\to\hat z = z/\|z\|_X$ and thus $\bigl\|\psi\bigl((K^*K)^{1/2}\bigr)\hat z\bigr\|_X \le \psi\bigl(\|K\hat z\|_Y\bigr)$. Therefore,
$$\bigl\| \psi\bigl((K^*K)^{1/2}\bigr) z \bigr\|_X \;=\; \|z\|_X\, \bigl\| \psi\bigl((K^*K)^{1/2}\bigr)\hat z \bigr\|_X \;\le\; \|z\|_X\, \psi\Bigl( \frac{1}{\|z\|_X}\,\|Kz\|_Y \Bigr) \;\le\; \psi\bigl( \|Kz\|_Y \bigr)\,,$$
where we used the estimate $\psi(st)\le s\,\psi(t)$ from part (a) for $s = 1/\|z\|_X \ge 1$ and $t = \|Kz\|_Y$.

Lemma A.74
There exist constants $c_+>0$ and $c_p\ge 0$ with $c_p = 0$ for $p\le 2$ and $c_p>0$ for $p>2$ such that for all $z\in\mathbb R$
$$c_p\,|z|^p \;\le\; |1+z|^p - 1 - p\,z \;\le\; \begin{cases} c_+\,|z|^p & \text{if } p\le 2 \text{ or } |z|\ge\frac12\,,\\[2pt] c_+\,z^2 & \text{if } p>2 \text{ and } |z|\le\frac12\,. \end{cases} \tag{A.54}$$
(1 + z)p − 1 − pz = f (z) = f (0) + f (0)z +
1 p(p − 1) f (ξ)z 2 = (1 + ξ)p−2 z 2 2 2
for some ξ with ξ ≤ z. Let ﬁrst p ≥ 2. Then (1 + ξ)p−2 ≤ (3/2)p−2 , and the estimate is shown. 1 1 p−2 because 1 + ξ ≥ Let now p < 2. Then (1 + ξ)p−2 = (1+ξ) 2−p ≤ z2−p = z 1 − ξ ≥ 2z − z = z. Therefore, f (z) ≤ the upper estimate of (A.54) is shown.
p(p−1) 2
zp−2 z 2 =
p(p−1) 2
zp , and
(p−1)/(p−2)
To show the lower estimate, we choose cp ∈(0, 1) for p>2 such that cp − 1/(p−2) p−1 ≥ 0 and set cp = 0 for p ≤ 2. We distinguish between cp + 1 − cp
A.9 Weak Topologies
363
several cases. Case A: z ≥ 0. Set f (z) := 1 + zp − 1 − pz − cp zp = (1 + z)!p − 1 − pz − cp z p for z ≥ 0. Then f (0) = 0 and f (z) = p (1! + z)p−1 − 1 − cp z p−1 and !f (0) = 0 and f (z) = p(p − 1) (1 + z)p−2 − cp z p−2 ≥ p(p − 1) z p−2 − cp z p−2 ≥ 0 because cp ≤ 1. Therefore, f is monotonically increasing, thus positive. Therefore, f is monotonically increasing, thus positive. Case B: z ≤ 0. We replace z by −z ≥ 0 and have to show that f (z) := 1 − zp − 1 + pz − cp z p ≥ 0 for all z ≥ 0. p − 1 + pz − cp z p and f (0) = 0 and Case B1: 0 ≤ z ≤ 1. Then f (z) = ! (1 − z) p−1 p−1 and f (0) = 0 and f (1) = p(1 − cp ) > 0. − cp z f (z) = p 1 − (1 − z) If p ≤ 2 then cp = 0 and thus f ≥ 0, thus f ≥ 0 on [0, 1].! Let p > 2. Then f (z) = p(p − 1) (1 − z)p−2 − cp z p−2 and f (0) > 0 and f (1) < 0. Since f < 0 on (0, 1) there exists exactly one zero zˆ ∈ (0, 1) of f . z , 1]. Therefore, the minimal Therefore, f increases on [0, zˆ] and decreases on [ˆ values of f on [0, 1] are obtained for z = 0 or z = 1 which are both nonnegative. Therefore, f ≥ 0 on [0, 1] which implies that also f is nonnegative.
Case B2: $z \ge 1$. Then $f(z) = (z-1)^p - 1 + pz - c_p\,z^p$ and $f(1) = -1 + p - c_p > 0$ and $f'(z) = p\bigl[(z-1)^{p-1} + 1 - c_p\,z^{p-1}\bigr]$ and $f'(1) = p\,(1-c_p) > 0$.
If $p \le 2$ we conclude that $f'' \ge 0$ on $[1,\infty)$, thus $f' \ge 0$ on $[1,\infty)$ and thus also $f \ge 0$ on $[1,\infty)$.
Let finally $p > 2$. Then $f''(z) = p(p-1)\bigl[(z-1)^{p-2} - c_p\,z^{p-2}\bigr]$ and $f''(1) < 0$ and $f''(z) \to \infty$ as $z \to \infty$. The second derivative $f''$ vanishes at $\hat{z} = \bigl(1 - c_p^{1/(p-2)}\bigr)^{-1} > 1$, and is negative on $[1,\hat{z})$ and positive for $z > \hat{z}$. Therefore, $f'$ decreases on $(1,\hat{z})$ and increases for $z > \hat{z}$. Its minimal value is attained at $\hat{z}$, which is computed as
\[
f'(\hat{z}) \;=\; p\bigl[(\hat{z}-1)^{p-1} - c_p\,\hat{z}^{p-1} + 1\bigr] \;=\; p\left[\frac{c_p^{(p-1)/(p-2)} - c_p}{\bigl(1 - c_p^{1/(p-2)}\bigr)^{p-1}} + 1\right] \;\ge\; 0
\]
by the choice of $c_p$. Therefore, $f'$ is positive for $z \ge 1$ and thus also $f$. $\square$
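Since (A.54) is used repeatedly later on, a quick numerical sanity check may be helpful. The following sketch (an illustration only; the constants used below are ad-hoc choices consistent with the proof, not the sharpest possible ones) samples the two-sided estimate on a grid of values of $z$ for several exponents $p$:

```python
import numpy as np

def middle(p, z):
    # the quantity |1+z|^p - 1 - p*z bounded in (A.54)
    return np.abs(1.0 + z) ** p - 1.0 - p * z

zs = np.linspace(-5.0, 5.0, 2001)
zs = zs[np.abs(zs) > 1e-12]          # exclude z = 0, where everything vanishes

for p in [1.0, 1.5, 2.0, 3.0, 4.0]:
    m = middle(p, zs)
    # lower estimate: c_p = 0 for p <= 2; a small ad-hoc c_p > 0 for p > 2
    cp = 0.0 if p <= 2 else 1e-2
    assert np.all(m >= cp * np.abs(zs) ** p - 1e-12)
    # upper estimate for |z| >= 1/2, with the constant from the proof
    big = np.abs(zs) >= 0.5
    c_plus = 3.0 ** p + p * 2.0 ** (p - 1)
    assert np.all(m[big] <= c_plus * np.abs(zs[big]) ** p + 1e-12)
    # upper estimate for |z| <= 1/2: exponent 2 for p > 2, exponent p otherwise
    small = ~big
    c_small = p * (p - 1) / 2.0 * 1.5 ** max(p - 2.0, 0.0)
    expo = 2.0 if p > 2 else p
    assert np.all(m[small] <= c_small * np.abs(zs[small]) ** expo + 1e-12)

print("estimate (A.54) verified on the grid")
```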
A.9
Weak Topologies
In this subsection, we recall the basics on weak topologies. We avoid the definition of the topology itself, that is, the definition of the family of open sets, but restrict ourselves to the concept of weak convergence (or weak* convergence).

Definition A.75 Let $X$ be a normed space and $X^*$ its dual space with dual pairing $\langle \ell, x\rangle_{X^*,X}$ for $\ell \in X^*$ and $x \in X$.
(a) A sequence $(x_n)$ in $X$ is said to converge weakly to $x \in X$ if $\lim_{n\to\infty} \langle \ell, x_n\rangle_{X^*,X} = \langle \ell, x\rangle_{X^*,X}$ for all $\ell \in X^*$. We write $x_n \rightharpoonup x$ for the weak convergence.
(b) A sequence $(\ell_n)$ in $X^*$ is said to converge weak* to $\ell \in X^*$ if $\lim_{n\to\infty} \langle \ell_n, x\rangle_{X^*,X} = \langle \ell, x\rangle_{X^*,X}$ for all $x \in X$. We write $\ell_n \stackrel{*}{\rightharpoonup} \ell$ for the weak* convergence.

First we note that weak convergence is indeed weaker than norm convergence. This follows directly from the continuity of the functionals $\ell \in X^*$. Furthermore, we note that weak convergence in $X^*$ means that $\langle T, \ell_n\rangle_{X^{**},X^*} \to \langle T, \ell\rangle_{X^{**},X^*}$ as $n \to \infty$ for all $T \in X^{**}$, where now $\langle \cdot,\cdot\rangle_{X^{**},X^*}$ denotes the dual pairing in $(X^{**}, X^*)$. By the canonical imbedding $X \hookrightarrow X^{**}$ (see formula (A.14)), weak convergence in $X^*$ is stronger than weak* convergence. In reflexive spaces (see Definition A.22) the bidual $X^{**}$ can be identified with $X$ and, therefore, weak and weak* convergence coincide. If $X$ is a Hilbert space with inner product $(\cdot,\cdot)_X$ then, by the representation Theorem A.23 of Riesz, the dual space $X^*$ is identified with $X$ itself, and weak convergence $x_n \rightharpoonup x$ is equivalent to $(x_n, z)_X \to (x, z)_X$ for all $z \in X$.

We will need the following results, which we cite without proof.

Theorem A.76 Let $X$ be a normed space, $X^*$ its dual space, and $(x_n)$ a sequence in $X$ which converges weakly to some $x \in X$. Then the following holds:
(a) The weak limit is well-defined; that is, if $x_n \rightharpoonup x$ and $x_n \rightharpoonup y$ then $x = y$.
(b) The sequence $(x_n)$ is bounded in norm; that is, $\|x_n\|_X$ is bounded in $\mathbb{R}$.
(c) If $X$ is finite dimensional then $(x_n)$ converges to $x$ in norm. Therefore, in finite-dimensional spaces there is no difference between weak and norm convergence.
(d) Let $U \subset X$ be a convex and closed set. Then $U$ is also weakly sequentially closed; that is, every weak limit point $x \in X$ of a weakly convergent sequence $(x_n)$ in $U$ also belongs to $U$.
(e) Let $Y$ be another normed space and $K : X \to Y$ be a linear bounded operator. Then $(Kx_n)$ converges weakly to $Kx$ in $Y$. If $K$ is compact, then $(Kx_n)$ converges in norm to $Kx$.
The following theorem of Alaoglu–Bourbaki (see, e.g., [139]) is the essential ingredient to assure the existence of global minima of the Tikhonov functional for nonlinear inverse problems.

Theorem A.77 Let $X$ be a Banach space with dual space $X^*$. Then every bounded sequence $(\ell_n)$ in $X^*$ has a weak* convergent subsequence.

For Hilbert spaces the theorem takes the following form.

Corollary A.78 Let $X$ be a Hilbert space. Every bounded sequence $(x_n)$ in $X$ contains a weak accumulation point; that is, a weakly convergent subsequence.
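A standard illustration of the gap between weak and norm convergence is the orthonormal sequence $(e_n)$ in $\ell^2$: every pairing $(e_n, z)$ equals the $n$-th coefficient of $z$ and tends to zero, so $e_n \rightharpoonup 0$, while $\|e_n\| = 1$ for all $n$, so there is no norm convergence. The following sketch (a finite truncation of $\ell^2$, for illustration only) shows this numerically:

```python
import numpy as np

# finite truncation of l^2 (illustration only)
N = 10_000
z = 1.0 / np.arange(1, N + 1)      # z_k = 1/k defines an element of l^2

for n in [9, 99, 999, 9999]:
    e_n = np.zeros(N)
    e_n[n] = 1.0                   # the n-th unit vector (0-based index)
    pairing = float(e_n @ z)       # (e_n, z)_X picks out the n-th coefficient of z
    assert np.isclose(np.linalg.norm(e_n), 1.0)   # ||e_n|| = 1: no norm convergence
    print(n + 1, pairing)          # pairings decay like 1/n: weak convergence to 0
```

In line with Corollary A.78, the bounded sequence $(e_n)$ does have a weak accumulation point, namely $0$; it just has no norm-convergent subsequence.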
A.10 Problems
A.1 (a) Show with Definition A.5 that a set $M \subset X$ is closed if, and only if, the limit of every convergent sequence $(x_k)_k$ in $M$ also belongs to $M$.
(b) Show that a subspace $V$ of a Hilbert space $X$ is dense if, and only if, the orthogonal complement is trivial; that is, $V^\perp = \{0\}$.
A.2 Try to prove Theorem A.8.
A.3 Let $V \subset X \subset V'$ be a Gelfand triple (see Definition A.26) and $j : V \to X$ and $j' : X \to V'$ the corresponding embedding operators. Show that both of them are one-to-one with dense range.
A.4 Define the operator $A$ from the sequence space $\ell^2$ into itself by
\[
(Ax)_k \;:=\; \begin{cases} 0\,, & \text{if } k = 1\,, \\ x_{k-1}\,, & \text{if } k \ge 2\,, \end{cases} \qquad \text{for } x = (x_k) \in \ell^2\,.
\]
Show that $\lambda = 1$ is not an eigenvalue of $A$ but $I - A$ fails to be surjective. Therefore, $1$ belongs to the spectrum of $A$.
A.5 Let $K : X \to Y$ be a compact operator between Hilbert spaces $X$ and $Y$, and let $f : \bigl[0, \|K\|^2_{L(X,Y)}\bigr] \to \mathbb{R}$ be continuous. Define the operator $f(K^*K)$ from $X$ into itself by (A.47). Show that this operator is well defined (that is, $f(K^*K)x$ defines an element in $X$ for every $x \in X$), linear, and bounded and that it is compact if, and only if, $f(0) = 0$.
A.6 Let $A \subset X$ be an open set of a Hilbert space $X$ and $f : A \to \mathbb{R}$ convex and continuous. Show that the set $D \subset X \times \mathbb{R}$, defined by
\[
D \;:=\; \bigl\{ (z, r) \in A \times \mathbb{R} : r > f(z) \bigr\}\,,
\]
is convex and open in $X \times \mathbb{R}$.
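For Problem A.4, the failure of surjectivity can be made visible with finite sections of the shift operator (an illustration, not a substitute for the proof): solving $(I-A)x = e_1$ by forward substitution forces $x_k = 1$ for every $k$, so the candidate solution has norm $\sqrt{N}$ on the first $N$ coordinates and cannot belong to $\ell^2$.

```python
import numpy as np

def solve_truncated(N):
    # N-th finite section of the shift operator A from Problem A.4:
    # (Ax)_1 = 0 and (Ax)_k = x_{k-1} for k >= 2
    A = np.diag(np.ones(N - 1), k=-1)
    e1 = np.zeros(N)
    e1[0] = 1.0
    return np.linalg.solve(np.eye(N) - A, e1)

for N in [10, 100, 1000]:
    x = solve_truncated(N)
    assert np.allclose(x, 1.0)                        # forward substitution gives x_k = 1
    assert np.isclose(np.linalg.norm(x), np.sqrt(N))  # norms grow like sqrt(N)
print("candidate solutions are constant 1; their l2-norms diverge")
```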
Appendix B
Proofs of the Results of Section 2.7

In this appendix, we give the complete proofs of the theorems and lemmas of Chapter 2, Section 2.7. For the convenience of the reader, we formulate the results again.

Theorem 2.20 (Fletcher–Reeves) Let $K : X \to Y$ be a bounded, linear, and injective operator between Hilbert spaces $X$ and $Y$. The conjugate gradient method is well-defined and either stops or produces sequences $(x^m)$, $(p^m)$ in $X$ with the properties
\[
\bigl(\nabla f(x^m), \nabla f(x^j)\bigr)_X \;=\; 0 \qquad\text{for all } j \ne m\,, \tag{2.41a}
\]
and
\[
(Kp^m, Kp^j)_Y \;=\; 0 \qquad\text{for all } j \ne m\,; \tag{2.41b}
\]
that is, the gradients are orthogonal and the directions $p^m$ are $K$-conjugate. Furthermore,
\[
\bigl(\nabla f(x^j), K^*Kp^m\bigr)_X \;=\; 0 \qquad\text{for all } j < m\,. \tag{2.41c}
\]
Proof:
First, we note the following identities:
(α) $\nabla f(x^{m+1}) = 2\,K^*(Kx^{m+1} - y) = 2\,K^*(Kx^m - y) - 2t_m\,K^*Kp^m = \nabla f(x^m) - 2t_m\,K^*Kp^m$.
(β) $\bigl(p^m, \nabla f(x^{m+1})\bigr)_X = \bigl(p^m, \nabla f(x^m)\bigr)_X - 2t_m\,(Kp^m, Kp^m)_Y = 0$ by the definition of $t_m$.
(γ) $t_m = \frac12\,\bigl(\nabla f(x^m), p^m\bigr)_X\,\|Kp^m\|_Y^{-2} = \frac14\,\|\nabla f(x^m)\|_X^2 / \|Kp^m\|_Y^2$ since $p^m = \frac12\nabla f(x^m) + \gamma_{m-1}\,p^{m-1}$ and (β).

Now we prove the following identities by induction with respect to $m$:
© Springer Nature Switzerland AG 2021 A. Kirsch, An Introduction to the Mathematical Theory of Inverse Problems, Applied Mathematical Sciences 120, https://doi.org/10.1007/9783030633431
(i) $\bigl(\nabla f(x^m), \nabla f(x^j)\bigr)_X = 0$ for $j = 0, \ldots, m-1$,
(ii) $(Kp^m, Kp^j)_Y = 0$ for $j = 0, \ldots, m-1$.
Let $m = 1$. Then, using (α),
(i) $\bigl(\nabla f(x^1), \nabla f(x^0)\bigr)_X = \|\nabla f(x^0)\|_X^2 - 2t_0\,\bigl(Kp^0, K\nabla f(x^0)\bigr)_Y = 0$, which vanishes by (γ) since $p^0 = \frac12\nabla f(x^0)$.
(ii) By the definition of $p^1$ and identity (α), we conclude that
\[
(Kp^1, Kp^0)_Y \;=\; (p^1, K^*Kp^0)_X \;=\; -\frac{1}{2t_0}\left(\frac12\nabla f(x^1) + \gamma_0\,p^0\,,\; \nabla f(x^1) - \nabla f(x^0)\right)_X
\]
\[
=\; -\frac{1}{2t_0}\left[\frac12\,\|\nabla f(x^1)\|_X^2 - \gamma_0\,\bigl(p^0, \nabla f(x^0)\bigr)_X\right] \;=\; 0\,,
\]
where we have used (β), the definition of $p^0$, and the choice of $\gamma_0$.
Now we assume the validity of (i) and (ii) for $m$ and show it for $m+1$:
(i) For $j = 0, \ldots, m-1$ we conclude that (setting $\gamma_{-1} = 0$ in the case $j = 0$)
\[
\bigl(\nabla f(x^{m+1}), \nabla f(x^j)\bigr)_X \;=\; \bigl(\nabla f(x^m) - 2t_m\,K^*Kp^m, \nabla f(x^j)\bigr)_X \;=\; -2t_m\,\bigl(\nabla f(x^j), K^*Kp^m\bigr)_X
\]
\[
=\; -4t_m\,\bigl(Kp^j - \gamma_{j-1}\,Kp^{j-1},\, Kp^m\bigr)_Y \;=\; 0\,,
\]
where we have used $\frac12\nabla f(x^j) + \gamma_{j-1}\,p^{j-1} = p^j$ and assertion (ii) for $m$.
For $j = m$, we conclude that
\[
\bigl(\nabla f(x^{m+1}), \nabla f(x^m)\bigr)_X \;=\; \|\nabla f(x^m)\|_X^2 - 2t_m\,\bigl(\nabla f(x^m), K^*Kp^m\bigr)_X
\;=\; \|\nabla f(x^m)\|_X^2 - \frac{\|\nabla f(x^m)\|_X^2}{2\,\|Kp^m\|_Y^2}\,\bigl(\nabla f(x^m), K^*Kp^m\bigr)_X
\]
by (γ). Now we write
\[
(Kp^m, Kp^m)_Y \;=\; \left(Kp^m,\; K\left[\tfrac12\nabla f(x^m) + \gamma_{m-1}\,p^{m-1}\right]\right)_Y \;=\; \tfrac12\,\bigl(Kp^m, K\nabla f(x^m)\bigr)_Y\,,
\]
which implies that $\bigl(\nabla f(x^{m+1}), \nabla f(x^m)\bigr)_X$ vanishes.
(ii) For $j = 0, \ldots, m-1$, we conclude that, using (α),
\[
(Kp^{m+1}, Kp^j)_Y \;=\; \left(\tfrac12\nabla f(x^{m+1}) + \gamma_m\,p^m,\; K^*Kp^j\right)_X \;=\; -\frac{1}{4t_j}\left(\nabla f(x^{m+1}),\; \nabla f(x^{j+1}) - \nabla f(x^j)\right)_X\,,
\]
which vanishes by (i). For $j = m$, by (α) and the definition of $p^{m+1}$, we have
\[
(Kp^{m+1}, Kp^m)_Y \;=\; \frac{1}{2t_m}\left(\frac12\nabla f(x^{m+1}) + \gamma_m\,p^m,\; \nabla f(x^m) - \nabla f(x^{m+1})\right)_X
\]
\[
=\; \frac{1}{2t_m}\Bigl[\frac12\,\bigl(\nabla f(x^{m+1}), \nabla f(x^m)\bigr)_X - \frac12\,\|\nabla f(x^{m+1})\|_X^2 + \gamma_m \underbrace{\bigl(p^m, \nabla f(x^m)\bigr)_X}_{=\,\frac12\|\nabla f(x^m)\|_X^2} -\; \gamma_m\,\bigl(p^m, \nabla f(x^{m+1})\bigr)_X\Bigr]
\]
\[
=\; \frac{1}{4t_m}\Bigl[\gamma_m\,\|\nabla f(x^m)\|_X^2 - \|\nabla f(x^{m+1})\|_X^2\Bigr]
\]
by (i) and (β). This term vanishes by the definition of $\gamma_m$. Thus we have proven (i) and (ii) for $m+1$ and thus for all $m = 1, 2, 3, \ldots$
To prove (2.41c) we write
\[
\bigl(\nabla f(x^j), K^*Kp^m\bigr)_X \;=\; 2\,\bigl(p^j - \gamma_{j-1}\,p^{j-1},\; K^*Kp^m\bigr)_X \;=\; 0 \qquad\text{for } j < m\,,
\]
and note that we have already shown this in the proof. $\square$
Theorem 2.21 Let $(x^m)$ and $(p^m)$ be the sequences of the conjugate gradient method. Define the space $V_m := \mathrm{span}\{p^0, \ldots, p^m\}$. Then we have the following equivalent characterizations of $V_m$:
\[
V_m \;=\; \mathrm{span}\bigl\{\nabla f(x^0), \ldots, \nabla f(x^m)\bigr\} \tag{2.42a}
\]
\[
\phantom{V_m} \;=\; \mathrm{span}\bigl\{p^0, K^*Kp^0, \ldots, (K^*K)^m p^0\bigr\} \tag{2.42b}
\]
for $m = 0, 1, \ldots$. Furthermore, $x^m$ is the minimum of $f(x) = \|Kx - y\|_Y^2$ on $V_{m-1}$ for every $m \ge 1$.
Proof: Let $\tilde{V}_m = \mathrm{span}\bigl\{\nabla f(x^0), \ldots, \nabla f(x^m)\bigr\}$. Then $V_0 = \tilde{V}_0$. Assume that we have already shown that $V_m = \tilde{V}_m$. Since $p^{m+1} = \frac12\nabla f(x^{m+1}) + \gamma_m\,p^m$ we also have that $V_{m+1} = \tilde{V}_{m+1}$.
Analogously, define the space $\hat{V}_m := \mathrm{span}\bigl\{p^0, K^*Kp^0, \ldots, (K^*K)^m p^0\bigr\}$. Then $V_0 = \hat{V}_0$ and $V_1 = \hat{V}_1$. Assume that we have already shown that $V_j = \hat{V}_j$ for all $j = 0, \ldots, m$. Then we conclude that
\[
p^{m+1} \;=\; K^*(Kx^{m+1} - y) + \gamma_m\,p^m \;=\; K^*(Kx^m - y) - t_m\,K^*Kp^m + \gamma_m\,p^m
\;=\; p^m - \gamma_{m-1}\,p^{m-1} - t_m\,K^*Kp^m + \gamma_m\,p^m \;\in\; \hat{V}_{m+1}\,. \tag{B1}
\]
On the other hand, from
\[
(K^*K)^{m+1} p^0 \;=\; (K^*K)\bigl[(K^*K)^m p^0\bigr] \;\in\; (K^*K)(V_m)
\]
and $K^*Kp^j \in V_{j+1} \subset V_{m+1}$ by (B1) for $j = 0, \ldots, m$, we conclude also that $V_{m+1} = \hat{V}_{m+1}$.
Now every $x^m$ lies in $V_{m-1}$. This is certainly true for $m = 1$, and if it holds for $m$ then it holds also for $m+1$ since $x^{m+1} = x^m - t_m\,p^m \in V_m$. Finally, $x^m$ is the minimum of $f$ on $V_{m-1}$ if and only if $(Kx^m - y, Kz)_Y = 0$ for all $z \in V_{m-1}$. By (2.42a), this is the case if and only if $\bigl(\nabla f(x^m), \nabla f(x^j)\bigr)_X = 0$ for all $j = 0, \ldots, m-1$. This holds by the preceding theorem. $\square$

Lemma 2.22 (a) The polynomial $Q_m$, defined by $Q_m(t) = 1 - t\,P_{m-1}(t)$ with $P_{m-1}$ from (2.43), minimizes the functional $H(Q) := \|Q(KK^*)y\|_Y^2$ on $\{Q \in \mathcal{P}_m : Q(0) = 1\}$ and satisfies
\[
H(Q_m) \;=\; \|Kx^m - y\|_Y^2\,.
\]
(b) For $k \ne \ell$, the following orthogonality relation holds:
\[
\langle Q_k, Q_\ell \rangle \;:=\; \sum_{j=1}^{\infty} \mu_j^2\, Q_k(\mu_j^2)\, Q_\ell(\mu_j^2)\, \bigl|(y, y_j)_Y\bigr|^2 \;=\; 0\,. \tag{2.44}
\]
If $y \notin \mathrm{span}\{y_1, \ldots, y_N\}$ for any $N \in \mathbb{N}$, then $\langle \cdot, \cdot \rangle$ defines an inner product on the space $\mathcal{P}$ of all polynomials.

Proof: (a) Let $Q \in \mathcal{P}_m$ be an arbitrary polynomial with $Q(0) = 1$. Set $P(t) := \bigl(1 - Q(t)\bigr)/t$ and $x := P(K^*K)K^*y = -P(K^*K)p^0 \in V_{m-1}$. Then
\[
y - Kx \;=\; y - KP(K^*K)K^*y \;=\; Q(KK^*)y\,.
\]
Thus
\[
H(Q) \;=\; \|Kx - y\|_Y^2 \;\ge\; \|Kx^m - y\|_Y^2 \;=\; H(Q_m)\,.
\]
(b) Let $k \ne \ell$. From the identity
\[
\frac12\,\nabla f(x^k) \;=\; K^*(Kx^k - y) \;=\; -\sum_{j=1}^{\infty} \mu_j\, Q_k(\mu_j^2)\,(y, y_j)_Y\, x_j\,,
\]
we conclude that
\[
0 \;=\; \frac14\,\bigl(\nabla f(x^k), \nabla f(x^\ell)\bigr)_X \;=\; \langle Q_k, Q_\ell \rangle\,.
\]
The properties of the inner product are obvious, except perhaps the definiteness. If $\langle Q_k, Q_k \rangle = 0$, then $Q_k(\mu_j^2)\,(y, y_j)_Y$ vanishes for all $j \in \mathbb{N}$. The assumption on $y$ implies that $(y, y_j)_Y \ne 0$ for infinitely many $j$. But then the polynomial $Q_k$ has infinitely many zeros $\mu_j^2$. This implies $Q_k = 0$, which ends the proof. $\square$

The following lemma is needed for the proof of Theorem 2.24.

Lemma Let $0 < m \le m(\delta)$ where $m(\delta)$ satisfies (2.46), and define the space $X^\sigma = (K^*K)^{\sigma/2}(X)$ for any $\sigma > 0$, equipped with the norm $\|x\|_{X^\sigma} := \|(K^*K)^{-\sigma/2}x\|_X$.
Let $y^\delta \notin \mathrm{span}\{y_1, \ldots, y_N\}$ for any $N \in \mathbb{N}$ with $\|y^\delta - Kx^*\|_Y \le \delta$, and let $x^* \in X^\sigma$ for some $\sigma > 0$ with $\|x^*\|_{X^\sigma} \le E$. Then
\[
\|Kx^{m,\delta} - y^\delta\|_Y \;\le\; \delta \;+\; (1+\sigma)^{(\sigma+1)/2}\,\frac{E}{\bigl|\frac{d}{dt}Q_m^\delta(0)\bigr|^{(\sigma+1)/2}}\,,
\]
where $Q_m^\delta$ denotes the polynomial $Q_m$ for $y = y^\delta$.

Before we prove this lemma, we recall some properties of orthogonal functions (see [254]). As we saw in Lemma 2.22, the polynomials $Q_m$ are orthogonal with respect to the inner product $\langle \cdot, \cdot \rangle$. Therefore, the zeros $\lambda_{j,m}$, $j = 1, \ldots, m$, of $Q_m$ are all real and positive and lie in the interval $\bigl(0, \|K\|^2_{L(X,Y)}\bigr)$. By their normalization, $Q_m$ must have the form
\[
Q_m(t) \;=\; \prod_{j=1}^{m} \left(1 - \frac{t}{\lambda_{j,m}}\right).
\]
Furthermore, the zeros of two subsequent polynomials interlace; that is,
\[
0 < \lambda_{1,m} < \lambda_{1,m-1} < \lambda_{2,m} < \lambda_{2,m-1} < \cdots < \lambda_{m-1,m-1} < \lambda_{m,m} < \|K\|^2_{L(X,Y)}\,.
\]
Finally, from the factorization of $Q_m$, we see that
\[
\frac{d}{dt}Q_m(t) \;=\; -\,Q_m(t)\sum_{j=1}^{m} \frac{1}{\lambda_{j,m} - t} \qquad\text{and}
\]
\[
\frac{d^2}{dt^2}Q_m(t) \;=\; Q_m(t)\left[\left(\sum_{j=1}^{m} \frac{1}{\lambda_{j,m} - t}\right)^2 - \sum_{j=1}^{m} \frac{1}{(\lambda_{j,m} - t)^2}\right]
\;=\; Q_m(t) \sum_{\substack{j,\ell = 1 \\ j \ne \ell}}^{m} \frac{1}{(\lambda_{j,m} - t)(\lambda_{\ell,m} - t)}\,.
\]
For $0 \le t \le \lambda_{1,m}$, we conclude that $\frac{d}{dt}Q_m(t) \le 0$, $\frac{d^2}{dt^2}Q_m(t) \ge 0$, and $0 \le Q_m(t) \le 1$.
For the proof of the lemma and the following theorem, it is convenient to introduce two orthogonal projections. For any $\varepsilon > 0$, we denote by $L_\varepsilon : X \to X$ and $M_\varepsilon : Y \to Y$ the orthogonal projections
\[
L_\varepsilon z \;:=\; \sum_{\mu_n^2 \le \varepsilon} (z, x_n)_X\, x_n\,,\quad z \in X\,, \qquad
M_\varepsilon z \;:=\; \sum_{\mu_n^2 \le \varepsilon} (z, y_n)_Y\, y_n\,,\quad z \in Y\,,
\]
where $\{\mu_n, x_n, y_n : n \in J\}$ denotes a singular system of $K$.
The following estimates are easily checked:
\[
\|M_\varepsilon Kx\|_Y \;\le\; \sqrt{\varepsilon}\,\|L_\varepsilon x\|_X \qquad\text{and}\qquad \|(I - L_\varepsilon)x\|_X \;\le\; \frac{1}{\sqrt{\varepsilon}}\,\|(I - M_\varepsilon)Kx\|_Y
\]
for all $x \in X$.

Proof of the lemma: Let $\lambda_{j,m}$ be the zeros of $Q_m^\delta$. We suppress the dependence on $\delta$. The orthogonality relation (2.44) implies that $Q_m^\delta$ is orthogonal to the polynomial $t \mapsto Q_m^\delta(t)/(\lambda_{1,m} - t)$ of degree $m-1$; that is,
\[
\sum_{n=1}^{\infty} \mu_n^2\, Q_m^\delta(\mu_n^2)\, \frac{Q_m^\delta(\mu_n^2)}{\lambda_{1,m} - \mu_n^2}\, \bigl|(y^\delta, y_n)_Y\bigr|^2 \;=\; 0\,.
\]
This implies that
\[
\sum_{\mu_n^2 \le \lambda_{1,m}} Q_m^\delta(\mu_n^2)^2\, \frac{\mu_n^2}{\lambda_{1,m} - \mu_n^2}\, \bigl|(y^\delta, y_n)_Y\bigr|^2
\;=\; \sum_{\mu_n^2 > \lambda_{1,m}} Q_m^\delta(\mu_n^2)^2\, \frac{\mu_n^2}{\mu_n^2 - \lambda_{1,m}}\, \bigl|(y^\delta, y_n)_Y\bigr|^2
\;\ge\; \sum_{\mu_n^2 > \lambda_{1,m}} Q_m^\delta(\mu_n^2)^2\, \bigl|(y^\delta, y_n)_Y\bigr|^2\,.
\]
From this, we see that
\[
\|y^\delta - Kx^{m,\delta}\|_Y^2 \;=\; \Biggl(\,\sum_{\mu_n^2 \le \lambda_{1,m}} + \sum_{\mu_n^2 > \lambda_{1,m}}\Biggr)\, Q_m^\delta(\mu_n^2)^2\, \bigl|(y^\delta, y_n)_Y\bigr|^2
\]
\[
\le\; \sum_{\mu_n^2 \le \lambda_{1,m}} \underbrace{Q_m^\delta(\mu_n^2)^2 \left[1 + \frac{\mu_n^2}{\lambda_{1,m} - \mu_n^2}\right]}_{=:\,\Phi_m(\mu_n^2)^2}\, \bigl|(y^\delta, y_n)_Y\bigr|^2
\;=\; \|M_{\lambda_{1,m}}\Phi_m(KK^*)y^\delta\|_Y^2\,,
\]
where we have set
\[
\Phi_m(t) \;:=\; Q_m^\delta(t)\,\sqrt{1 + \frac{t}{\lambda_{1,m} - t}} \;=\; Q_m^\delta(t)\,\sqrt{\frac{\lambda_{1,m}}{\lambda_{1,m} - t}}\,.
\]
Therefore,
\[
\|y^\delta - Kx^{m,\delta}\|_Y \;\le\; \|M_{\lambda_{1,m}}\Phi_m(KK^*)(y^\delta - y^*)\|_Y + \|M_{\lambda_{1,m}}\Phi_m(KK^*)Kx^*\|_Y\,.
\]
We estimate both terms on the right-hand side separately:
\[
\|M_{\lambda_{1,m}}\Phi_m(KK^*)(y^\delta - y^*)\|_Y^2 \;=\; \sum_{\mu_n^2 \le \lambda_{1,m}} \Phi_m(\mu_n^2)^2\, \bigl|(y^\delta - y^*, y_n)_Y\bigr|^2 \;\le\; \max_{0 \le t \le \lambda_{1,m}} \Phi_m(t)^2\; \|y^\delta - y^*\|_Y^2\,,
\]
\[
\|M_{\lambda_{1,m}}\Phi_m(KK^*)Kx^*\|_Y^2 \;=\; \sum_{\mu_n^2 \le \lambda_{1,m}} \Phi_m(\mu_n^2)^2\, \mu_n^{2+2\sigma}\, \mu_n^{-2\sigma}\bigl|(x^*, x_n)_X\bigr|^2
\;\le\; \max_{0 \le t \le \lambda_{1,m}} \bigl[t^{1+\sigma}\,\Phi_m(t)^2\bigr]\; \|x^*\|_{X^\sigma}^2\,.
\]
The proof is finished provided we can show that
\[
0 \;\le\; \Phi_m(t) \;\le\; 1 \qquad\text{and}\qquad t^{1+\sigma}\,\Phi_m(t)^2 \;\le\; \left(\frac{1+\sigma}{\bigl|\frac{d}{dt}Q_m^\delta(0)\bigr|}\right)^{\sigma+1} \qquad\text{for all } 0 \le t \le \lambda_{1,m}\,.
\]
The first assertion follows from $\Phi_m(0) = 1$, $\Phi_m(\lambda_{1,m}) = 0$, and
\[
\frac{d}{dt}\bigl[\Phi_m(t)^2\bigr] \;=\; 2\,Q_m(t)\,\frac{d}{dt}Q_m(t)\,\frac{\lambda_{1,m}}{\lambda_{1,m} - t} + Q_m(t)^2\,\frac{\lambda_{1,m}}{(\lambda_{1,m} - t)^2}
\;=\; \Phi_m(t)^2\left[\frac{1}{\lambda_{1,m} - t} - 2\sum_{j=1}^{m} \frac{1}{\lambda_{j,m} - t}\right] \;\le\; 0\,.
\]
Now we set $\psi(t) := t^{1+\sigma}\,\Phi_m(t)^2$. Then $\psi(0) = \psi(\lambda_{1,m}) = 0$. Let $\hat{t} \in (0, \lambda_{1,m})$ be the maximum of $\psi$ in this interval. Then $\psi'(\hat{t}\,) = 0$, and thus by differentiation
\[
(\sigma+1)\,\hat{t}^{\,\sigma}\,\Phi_m(\hat{t}\,)^2 + \hat{t}^{\,\sigma+1}\,\frac{d}{dt}\bigl[\Phi_m(\hat{t}\,)^2\bigr] \;=\; 0\,;
\]
that is,
\[
\sigma + 1 \;=\; \hat{t}\left[\,2\sum_{j=1}^{m} \frac{1}{\lambda_{j,m} - \hat{t}} - \frac{1}{\lambda_{1,m} - \hat{t}}\,\right]
\;\ge\; \hat{t}\,\sum_{j=1}^{m} \frac{1}{\lambda_{j,m} - \hat{t}}
\;\ge\; \hat{t}\,\sum_{j=1}^{m} \frac{1}{\lambda_{j,m}} \;=\; \hat{t}\,\left|\frac{d}{dt}Q_m^\delta(0)\right|.
\]
This implies that $\hat{t} \le (\sigma+1)/\bigl|\frac{d}{dt}Q_m^\delta(0)\bigr|$. With $\psi(t) \le \hat{t}^{\,\sigma+1}$ for all $t \in [0, \lambda_{1,m}]$, the assertion follows. $\square$
Theorem 2.24 Assume that $y^*$ and $y^\delta$ do not belong to the linear span of finitely many $y_j$. Let the sequence $x^{m(\delta),\delta}$ be constructed by the conjugate gradient method with stopping rule (2.46) for fixed parameter $r > 1$. Let $x^* = (K^*K)^{\sigma/2}z \in X^\sigma$ for some $\sigma > 0$ and $z \in X$. Then there exists $c > 0$ with
\[
\|x^* - x^{m(\delta),\delta}\|_X \;\le\; c\,\delta^{\sigma/(\sigma+1)}\,E^{1/(\sigma+1)}\,, \tag{2.47}
\]
where $E = \|z\|_X$.

Proof: Similar to the analysis of Landweber's method, we estimate the error by the sum of two terms: the first converges to zero as $\delta \to 0$ independently of $m$, and the second term tends to infinity as $m \to \infty$. The role of the norm $\|R_\alpha\|_{L(Y,X)}$ here is played by $\bigl|\frac{d}{dt}Q_m^\delta(0)\bigr|^{1/2}$.
First, let $\delta$ and $m := m(\delta)$ be fixed. Set for abbreviation
\[
q \;:=\; \left|\frac{d}{dt}Q_m^\delta(0)\right|.
\]
Choose $0 < \varepsilon \le 1/q \le \lambda_{1,m}$. With $\tilde{x} := x^* - Q_m^\delta(K^*K)x^* = P_{m-1}^\delta(K^*K)K^*y^*$ we conclude that
\[
\|x^* - x^{m,\delta}\|_X \;\le\; \|L_\varepsilon(x^* - x^{m,\delta})\|_X + \|(I - L_\varepsilon)(x^* - x^{m,\delta})\|_X
\]
\[
\le\; \|L_\varepsilon(x^* - \tilde{x})\|_X + \|L_\varepsilon(\tilde{x} - x^{m,\delta})\|_X + \frac{1}{\sqrt{\varepsilon}}\,\|(I - M_\varepsilon)(y^* - Kx^{m,\delta})\|_Y
\]
\[
\le\; \|L_\varepsilon Q_m^\delta(K^*K)x^*\|_X + \|L_\varepsilon P_{m-1}^\delta(K^*K)K^*(y^* - y^\delta)\|_X + \frac{1}{\sqrt{\varepsilon}}\,\|y^* - y^\delta\|_Y + \frac{1}{\sqrt{\varepsilon}}\,\|y^\delta - Kx^{m,\delta}\|_Y
\]
\[
\le\; E\,\max_{0 \le t \le \varepsilon} t^{\sigma/2}\bigl|Q_m^\delta(t)\bigr| \;+\; \delta\,\max_{0 \le t \le \varepsilon} \sqrt{t}\,\bigl|P_{m-1}^\delta(t)\bigr| \;+\; \frac{1+r}{\sqrt{\varepsilon}}\,\delta\,.
\]
From $\varepsilon \le \lambda_{1,m}$ and $0 \le Q_m^\delta(t) \le 1$ for $0 \le t \le \lambda_{1,m}$, we conclude that
\[
0 \;\le\; t^{\sigma/2}\,Q_m^\delta(t) \;\le\; \varepsilon^{\sigma/2} \qquad\text{for } 0 \le t \le \varepsilon\,.
\]
Furthermore,
\[
0 \;\le\; t\,P_{m-1}^\delta(t)^2 \;=\; \underbrace{\bigl(1 - Q_m^\delta(t)\bigr)}_{\le\,1}\;\underbrace{\frac{1 - Q_m^\delta(t)}{t}}_{=\,-\frac{d}{dt}Q_m^\delta(s)} \;\le\; \left|\frac{d}{dt}Q_m^\delta(0)\right| \;=\; q
\]
for some $s \in [0, \varepsilon]$. Thus we have proven the basic estimate
\[
\|x^* - x^{m,\delta}\|_X \;\le\; E\,\varepsilon^{\sigma/2} + (1+r)\,\frac{\delta}{\sqrt{\varepsilon}} + \sqrt{q}\,\delta \tag{B2}
\]
for $0 < \varepsilon \le 1/q$.
Define $\varepsilon_* := \bigl(\frac{(1+r)\,\delta}{\sigma E}\bigr)^{2/(\sigma+1)}$, the minimizer of $\varepsilon \mapsto E\,\varepsilon^{\sigma/2} + (1+r)\,\delta/\sqrt{\varepsilon}$, and distinguish two cases.
Case I: $\varepsilon_* \le 1/q$. We substitute $\varepsilon = \varepsilon_*$ in (B2). Since $\sqrt{q}\,\delta \le \delta/\sqrt{\varepsilon_*}$, all three terms are bounded by multiples of $\delta^{\sigma/(\sigma+1)}E^{1/(\sigma+1)}$, so that
\[
\|x^* - x^{m,\delta}\|_X \;\le\; c\,\delta^{\sigma/(\sigma+1)}\,E^{1/(\sigma+1)}
\]
for some $c > 0$, which depends only on $\sigma$ and $r$. This case is finished.
Case II: $\varepsilon_* > 1/q$. In this case, we substitute $\varepsilon = 1/q$ in (B2) and conclude that
\[
\|x^* - x^{m,\delta}\|_X \;\le\; E\,q^{-\sigma/2} + (r+2)\,\sqrt{q}\,\delta \;\le\; E\,\varepsilon_*^{\sigma/2} + (r+2)\,\sqrt{q}\,\delta
\;\le\; \left(\frac{r+1}{\sigma}\right)^{\sigma/(\sigma+1)} \delta^{\sigma/(\sigma+1)}\,E^{1/(\sigma+1)} + (r+2)\,\sqrt{q}\,\delta\,.
\]
It remains to estimate the quantity $q = q_m = \bigl|\frac{d}{dt}Q_{m(\delta)}^\delta(0)\bigr|$. Until now, we have not used the stopping rule. We will now use this rule to prove the estimate
\[
q_m \;\le\; c \left(\frac{E}{\delta}\right)^{2/(\sigma+1)} \tag{B3}
\]
for some $c > 0$, which depends only on $\sigma$ and $r$. Analogously to $q_m$, we define $q_{m-1} := \bigl|\frac{d}{dt}Q_{m(\delta)-1}^\delta(0)\bigr|$. By the previous lemma, we already know that
\[
r\,\delta \;<\; \|y^\delta - Kx^{m(\delta)-1,\delta}\|_Y \;\le\; \delta + (1+\sigma)^{(\sigma+1)/2}\,\frac{E}{q_{m-1}^{(\sigma+1)/2}}\,;
\]
that is,
\[
q_{m-1}^{(\sigma+1)/2} \;\le\; \frac{(1+\sigma)^{(1+\sigma)/2}}{r-1}\,\frac{E}{\delta}\,. \tag{B4}
\]
We have to prove such an estimate for $m$ instead of $m-1$.
Choose $T > 1$ and $\rho_* \in (0,1)$ with $T\rho_*/(1-\rho_*) \le 2$, and assume in the following that $q_{m-1} \le \rho_*\,q_m$ (otherwise, (B3) follows directly from (B4)). From $\lambda_{j,m} \ge \lambda_{j-1,m-1}$ for all $j = 2, \ldots, m$, we conclude that
\[
q_m \;=\; \left|\frac{d}{dt}Q_m^\delta(0)\right| \;=\; \sum_{j=1}^{m} \frac{1}{\lambda_{j,m}} \;\le\; \frac{1}{\lambda_{1,m}} + \sum_{j=1}^{m-1} \frac{1}{\lambda_{j,m-1}} \;=\; \frac{1}{\lambda_{1,m}} + q_{m-1}\,.
\]
This implies that
\[
q_{m-1} \;\le\; \rho_*\,q_m \;\le\; \frac{\rho_*}{\lambda_{1,m}} + \rho_*\,q_{m-1}\,;
\]
that is,
\[
q_{m-1} \;\le\; \frac{\rho_*}{(1-\rho_*)\,\lambda_{1,m}}\,.
\]
Finally, we need 1 λ2,m
≤
1 λ1,m−1
≤
m−1
1
j=1
λj,m−1
= qm−1 ≤
1 ρ∗ . 1 − ρ∗ λ1,m
Now we set $\varepsilon := T\,\lambda_{1,m}$. Then
\[
\varepsilon \;\le\; T\,\frac{\rho_*}{1-\rho_*}\,\lambda_{2,m} \;\le\; 2\,\lambda_{2,m}\,.
\]
Define the polynomial $\varphi \in \mathcal{P}_{m-1}$ by
\[
\varphi(t) \;:=\; Q_m^\delta(t)\left(1 - \frac{t}{\lambda_{1,m}}\right)^{-1} \;=\; \prod_{j=2}^{m} \left(1 - \frac{t}{\lambda_{j,m}}\right).
\]
For $t \le \varepsilon$ and $j \ge 2$, we note that
\[
1 \;\ge\; 1 - \frac{t}{\lambda_{j,m}} \;\ge\; 1 - \frac{\varepsilon}{\lambda_{2,m}} \;\ge\; -1\,;
\]
that is, $|\varphi(t)| \le 1$ for all $0 \le t \le \varepsilon$. For $t \ge \varepsilon$, we conclude that
\[
\left|1 - \frac{t}{\lambda_{1,m}}\right| \;=\; \frac{t - \lambda_{1,m}}{\lambda_{1,m}} \;\ge\; \frac{\varepsilon}{\lambda_{1,m}} - 1 \;=\; T - 1\,;
\]
that is,
\[
|\varphi(t)| \;\le\; \frac{1}{T-1}\,\bigl|Q_m^\delta(t)\bigr| \qquad\text{for all } t \ge \varepsilon\,.
\]
Since $\varphi(0) = 1$, we can apply Lemma 2.22. Using the projector $M_\varepsilon$, we conclude that
rδ