In the past two decades, convex analysis and optimization have been developed in Hadamard spaces. This book represents a
English Pages 193 [194] Year 2014
Table of contents :
Preface
1 Geometry of nonpositive curvature
1.1 Geodesic metric spaces
1.2 Meet Hadamard spaces
1.3 Equivalent conditions for CAT(0)
2 Convex sets and convex functions
2.1 Convex sets
2.2 Convex functions
2.3 Convexity and probability measures
3 Weak convergence in Hadamard spaces
3.1 Existence of weak limits
3.2 Weak convergence and convexity
3.3 An application in fixed point theory
4 Nonexpansive mappings
4.1 Kirszbraun–Valentine extension
4.2 Resolvent of a nonexpansive mapping
4.3 Strongly continuous semigroup
5 Gradient flow of a convex functional
5.1 Gradient flow semigroup
5.2 Mosco convergence and its consequences
5.3 Lie–Trotter–Kato formula
6 Convex optimization algorithms
6.1 Convex feasibility problems
6.2 Fixed point approximations
6.3 Proximal point algorithm
7 Probabilistic tools in Hadamard spaces
7.1 Random variables and expectations
7.2 Law of large numbers
7.3 Conditional expectations
8 Tree space and its applications
8.1 Construction of the BHV tree space
8.2 Owen–Provan algorithm
8.3 Medians and means of trees
References
Index
Miroslav Bačák Convex Analysis and Optimization in Hadamard Spaces
De Gruyter Series in Nonlinear Analysis and Applications
Editor in Chief: Jürgen Appell, Würzburg, Germany. Editors: Catherine Bandle, Basel, Switzerland; Alain Bensoussan, Richardson, Texas, USA; Avner Friedman, Columbus, Ohio, USA; Karl-Heinz Hoffmann, Munich, Germany; Mikio Kato, Nagano, Japan; Umberto Mosco, Worcester, Massachusetts, USA; Louis Nirenberg, New York, USA; Boris N. Sadovsky, Voronezh, Russia; Alfonso Vignoli, Rome, Italy; Katrin Wendland, Freiburg, Germany
Volume 22
Mathematics Subject Classification 2010 46T99, 47H20, 49M20, 49M25, 49M27, 51F99, 52A01, 60B99, 60J10, 92D15 Author Miroslav Bačák Max Planck Institute for Mathematics in the Sciences Inselstraße 22 04103 Leipzig Germany [email protected]
ISBN 9783110361032, eISBN (PDF) 9783110361629, Set-ISBN 9783110361636, ISSN 0941-813X. Library of Congress Cataloging-in-Publication Data: A CIP catalog record for this book has been applied for at the Library of Congress. Bibliographic information published by the Deutsche Nationalbibliothek: The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de. © 2014 Walter de Gruyter GmbH, Berlin/Boston. Typesetting: le-tex publishing services GmbH, Leipzig. Printing and binding: CPI books GmbH, Leck. ♾ Printed on acid-free paper. Printed in Germany. www.degruyter.com
Preface Convex analysis and consequently also convex optimization have been traditionally developed in the realm of linear spaces. In this book we attempt to show that a rich theory of convex functions and convex sets can be built also in spaces with no linear structure, namely in Hadamard spaces. Surprisingly, many of the classical results we came to expect in Euclidean or Hilbert spaces extend into this nonlinear setting. On the other hand some areas of convex analysis have so far no counterparts in Hadamard spaces and this offers a challenge for years to come. The history of Hadamard spaces and their role in geometry and geometric group theory have been described in a number of books, for instance [3, 25, 51, 56, 104], and we refer the interested reader therein. Since convexity is an important element in Hadamard space geometry, convex analysis in these spaces is believed to arise as a rightful subject field. It can be also viewed as a subfield of analysis in metric spaces [8], and then it turns out that a Hadamard space is the metric space where the strongest results can be obtained. Building upon convex analysis, convex optimization provides us with algorithms for solving a variety of problems which may appear in sciences and engineering. We will describe an application of convex optimization in Hadamard spaces to computational biology in the last chapter. One of the nice features of the theory presented in this book is that it originated in a pleasant mix of subject areas. The tools and methods are far from being restricted just to the analytical ones. Indeed, we shall see that analysis, geometry, probability, optimization and combinatorics are all mixed together here in a natural and fruitful way, and convexity is an underlying property which stands behind every single result. Let us now briefly outline the origins of convex analysis in Hadamard spaces. 
Since the 1990s there has been considerable interest in generalized harmonic maps from a measure space into a Hadamard space, which was triggered by the seminal papers [96, 102, 121] and built a bridge from geometry to analysis in Hadamard spaces. When developing the theory of generalized harmonic maps, J. Jost established numerous convex analytical results in Hadamard spaces. His contributions include resolvents of convex functions, gradient flow semigroups, 𝛤-limits of convex functions [103, 105] and barycenters of probability measures [102]. At the same time, gradient flows were independently studied by U. Mayer [139], who was the first to consider parabolic problems in Hadamard spaces. We would also like to mention a related work in a hyperbolic space due to S. Reich and D. Shoikhet [175]. Substantial advancements in the theory of generalized harmonic maps were later achieved by K.-T. Sturm via a probabilistic approach. As a matter of fact, he developed probability theory in Hadamard spaces and also surveyed many already existing results in an extraordinarily lucid way [192, 193, 195, 196]. A number of other authors have contributed since then to convex analysis in Hadamard spaces and they shall be mentioned in the
course of our text. Each chapter ends with bibliographical remarks describing where the material comes from. In this book we intend to present a systematic theory of convex functions, convex sets and mappings from a measure space into a Hadamard space, from the rudiments of the theory to the more advanced results which have already achieved a sufficient level of maturity. The only existing book on analysis in Hadamard spaces is by J. Jost [104] and it is our aim here to cover also those parts of the theory which have appeared since it came out. Our exposition is rather self-contained. The only prerequisite which is anticipated is a positive attitude towards nonpositive curvature. This book was written at the Max Planck Institute in Leipzig. It is my great pleasure to thank Jürgen Jost and Martin Kell for all the discussions on the subject we had during that time. Special thanks go to Simeon Reich for his very valuable comments on the manuscript. Finally, I am grateful to the people at De Gruyter for their very efficient and professional work during the whole process of publication. The research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013) / ERC grant agreement no 267087. We acknowledge the existence of IPE, the drawing editor, which we used to draw all the pictures.
1 Geometry of nonpositive curvature
The first chapter is devoted to basic notions in metric spaces including a geodesic, a metric midpoint and an angle. We define geodesic metric spaces and then explain how one can compare geodesic triangles in these spaces with triangles in the Euclidean plane. Such comparisons enable us to define nonpositive curvature in geodesic spaces and hence to define Hadamard spaces. The condition of nonpositive curvature can be neatly expressed by an analytical inequality and, since we are concerned more with analysis than geometry, this is how we shall use it in our developments. We feel, however, that it is also helpful to gain a geometrical intuition for Hadamard spaces and therefore we start with the triangle comparisons. The chapter ends by providing a number of equivalent conditions for Hadamard spaces. Unless stated otherwise, the 2-dimensional vector space ℝ² is assumed to be equipped with the Euclidean norm
\[ \|x\| := \sqrt{x_1^2 + x_2^2}, \qquad x = (x_1, x_2) \in \mathbb{R}^2. \]
The corresponding inner product is denoted ⟨⋅, ⋅⟩.
1.1 Geodesic metric spaces
Let (𝑋, 𝑑) be a metric space. A continuous mapping from the interval [0, 1] to 𝑋 is called a path. The length of a path 𝛾 : [0, 1] → 𝑋 is defined as
\[ \operatorname{length}(\gamma) := \sup \sum_{i=1}^{n} d\bigl(\gamma(t_{i-1}), \gamma(t_i)\bigr), \]
where the supremum is taken over the set of all partitions 0 = 𝑡₀ < ⋯ < 𝑡ₙ = 1 of the interval [0, 1], with an arbitrary 𝑛 ∈ ℕ. Given a pair of points 𝑥, 𝑦 ∈ 𝑋, we say that a path 𝛾 : [0, 1] → 𝑋 joins 𝑥 and 𝑦 if 𝛾(0) = 𝑥 and 𝛾(1) = 𝑦. A metric space (𝑋, 𝑑) is a length space if for every 𝑥, 𝑦 ∈ 𝑋 and 𝜀 > 0 there exists a path 𝛾 : [0, 1] → 𝑋 joining 𝑥 and 𝑦 such that length(𝛾) ≤ 𝑑(𝑥, 𝑦) + 𝜀.
If 𝛾 : [0, 1] → 𝑋 is a path and 𝑡 ∈ [0, 1], we often use the symbol 𝛾ₜ to denote the point 𝛾(𝑡). A path 𝛾 : [0, 1] → 𝑋 is called a geodesic if 𝑑(𝛾ₛ, 𝛾ₜ) = 𝑑(𝛾₀, 𝛾₁) |𝑠 − 𝑡| for every 𝑠, 𝑡 ∈ [0, 1], that is, if it is parametrized proportionally to the arc length. In particular, a geodesic is an injection unless it is trivial, that is, unless 𝛾₀ = 𝛾₁. When no confusion is likely, we do not distinguish between a geodesic 𝛾 : [0, 1] → 𝑋 and its range 𝛾([0, 1]) ⊂ 𝑋. This allows us to say, for instance, that a point lies on a geodesic, or that a geodesic goes through a point.
We say that (𝑋, 𝑑) is a geodesic space if every two points 𝑥, 𝑦 ∈ 𝑋 are connected by a geodesic. In this case we denote such a geodesic by [𝑥, 𝑦], but one must remember that such a geodesic is not uniquely determined by its endpoints in general. For a point 𝑧 ∈ [𝑥, 𝑦] we often use the notation 𝑧 = (1 − 𝑡)𝑥 + 𝑡𝑦, where 𝑡 = 𝑑(𝑥, 𝑧)/𝑑(𝑥, 𝑦), and say that 𝑧 is a convex combination¹ of 𝑥 and 𝑦. If 𝑥₀, 𝑥₁ ∈ 𝑋 and 𝑡 ∈ [0, 1], then the point (1 − 𝑡)𝑥₀ + 𝑡𝑥₁ on a geodesic [𝑥₀, 𝑥₁] is usually denoted 𝑥(𝑡) or 𝑥ₜ. The point 𝑥_{1/2} is called a midpoint of 𝑥₀ and 𝑥₁. Every geodesic space is a length space; for the converse, see Exercise 1.6.
If moreover every two points of (𝑋, 𝑑) are connected by a unique geodesic, the space (𝑋, 𝑑) is called uniquely geodesic. Since we deal with Hadamard spaces in this book, which are uniquely geodesic, the symbol [𝑥, 𝑦] will then denote the unique geodesic connecting 𝑥 and 𝑦, and no confusion can occur. In particular, we have uniqueness of midpoints.
Having defined a midpoint of a given pair of points in a geodesic space, we now introduce a more general notion in an arbitrary metric space.
Definition 1.1.1 (Metric midpoint). Let (𝑋, 𝑑) be a metric space and 𝑥, 𝑦 ∈ 𝑋. We say that a point 𝑚 ∈ 𝑋 is a metric midpoint of 𝑥 and 𝑦 if 𝑑(𝑥, 𝑦) = 2𝑑(𝑥, 𝑚) = 2𝑑(𝑚, 𝑦), and we say that a pair of points 𝑥 and 𝑦 has approximate metric midpoints if for every 𝜀 > 0 there is 𝑎 ∈ 𝑋 such that
\[ \max\{d(x, a), d(y, a)\} \le \tfrac12 d(x, y) + \varepsilon. \]
If (𝑋, 𝑑) is a geodesic space and 𝑥₀, 𝑥₁ ∈ 𝑋, then a midpoint 𝑥_{1/2} := ½𝑥₀ + ½𝑥₁ is of course also a metric midpoint of 𝑥₀ and 𝑥₁. The existence of approximate metric midpoints characterizes length spaces among all complete metric spaces.
Proposition 1.1.2. Let (𝑋, 𝑑) be a complete metric space. Then the following are equivalent:
(i) The space (𝑋, 𝑑) is a length space.
(ii) For every 𝑥, 𝑦 ∈ 𝑋 and 𝜀 > 0 there exists a point 𝑏 ∈ 𝑋 such that
\[ d(x, b)^2 + d(y, b)^2 \le \tfrac12 d(x, y)^2 + \varepsilon. \]
(iii) Every pair of points in 𝑋 has approximate metric midpoints.
Proof. Exercise 1.5.
In a similar way, one can characterize geodesic spaces via the existence of metric midpoints. Indeed, every geodesic space obviously admits metric midpoints. The converse is also true in complete spaces as the following proposition shows.
Proposition 1.1.3. Let (𝑋, 𝑑) be a complete metric space. Then the following are equivalent:
(i) The space (𝑋, 𝑑) is a geodesic space.
1 Note that we do not define a convex combination of three or more points.
(ii) For every 𝑥, 𝑦 ∈ 𝑋 there exists a point 𝑚 ∈ 𝑋 such that
\[ d(x, m)^2 + d(y, m)^2 = \tfrac12 d(x, y)^2. \]
(iii) Every pair of points in 𝑋 has a metric midpoint.
Proof. Clearly (i) ⇒ (ii), and by Exercise 1.1 also (ii) ⇒ (iii). To see that (iii) ⇒ (i) we choose a pair of points 𝑥₀, 𝑥₁ ∈ 𝑋 and repeatedly employ (iii) to obtain 𝑥ₜ for every dyadic 𝑡 := 𝑘/2ˡ, where 𝑘 = 1, . . . , 2ˡ and 𝑙 ∈ ℕ. These points satisfy 𝑑(𝑥ₛ, 𝑥ₜ) = 𝑑(𝑥₀, 𝑥₁) |𝑠 − 𝑡| for dyadic 𝑠, 𝑡, so the assignment 𝑡 ↦ 𝑥ₜ is uniformly continuous on the dyadic numbers. Since 𝑋 is complete, it extends to a geodesic 𝑥 : [0, 1] → 𝑋 connecting 𝑥₀ and 𝑥₁.
In order to give the geometrical definition of nonpositive curvature we need to make precise how to compare geodesic triangles with triangles in the Euclidean plane.
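The dyadic construction in the proof of Proposition 1.1.3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`dyadic_points`, `euclid_midpoint`, `euclid_dist`) are hypothetical, and the Euclidean plane is used as a stand-in for a space in which metric midpoints are available.

```python
from fractions import Fraction


def dyadic_points(x0, x1, midpoint, depth):
    """Return {t: x_t} for all dyadic t = k / 2**depth, using only midpoints."""
    points = {Fraction(0): x0, Fraction(1): x1}
    for _ in range(depth):
        ts = sorted(points)  # snapshot of the current dyadic parameters
        for a, b in zip(ts, ts[1:]):
            # insert the midpoint between each pair of neighbouring points
            points[(a + b) / 2] = midpoint(points[a], points[b])
    return points


# Assumed test space: the Euclidean plane with its usual midpoints.
def euclid_midpoint(p, q):
    return tuple((a + b) / 2 for a, b in zip(p, q))


def euclid_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


pts = dyadic_points((0.0, 0.0), (3.0, 4.0), euclid_midpoint, depth=4)
```

On the dyadic parameters the points obtained this way satisfy the geodesic identity 𝑑(𝑥ₛ, 𝑥ₜ) = 𝑑(𝑥₀, 𝑥₁) |𝑠 − 𝑡|; completeness is what allows the passage to all 𝑡 ∈ [0, 1].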
Geodesic triangles
Let (𝑋, 𝑑) be a geodesic metric space. A geodesic triangle with vertices 𝑝, 𝑞, 𝑟 ∈ 𝑋 consists of three geodesics [𝑝, 𝑞], [𝑞, 𝑟], [𝑟, 𝑝]. If the space is not uniquely geodesic, a geodesic triangle is not uniquely determined by its vertices. We will however denote it by △(𝑝, 𝑞, 𝑟), keeping in mind that the definition implicitly presumes a choice of geodesics [𝑝, 𝑞], [𝑞, 𝑟], [𝑟, 𝑝]. Given such a geodesic triangle in 𝑋, there exists a comparison triangle in ℝ², that is, three line segments [𝑝̄, 𝑞̄], [𝑞̄, 𝑟̄] and [𝑟̄, 𝑝̄] in ℝ², such that
\[ d(p, q) = \|\overline{p} - \overline{q}\|, \qquad d(p, r) = \|\overline{p} - \overline{r}\|, \qquad d(r, q) = \|\overline{r} - \overline{q}\|, \]
see Figure 1.1. The comparison triangle is unique up to isometries and we shall denote it △(𝑝̄, 𝑞̄, 𝑟̄). Note that both a geodesic triangle and a comparison triangle are a union of three geodesics, nothing like a convex combination of three points. The orientation of the sides plays no role, meaning that geodesics [𝑝, 𝑞] and [𝑞, 𝑝] represent the same side of a triangle. Let us now assume 𝑥 ∈ △(𝑝, 𝑞, 𝑟), say 𝑥 = (1 − 𝑡)𝑝 + 𝑡𝑞 for some 𝑡 ∈ [0, 1]. Then the comparison point for 𝑥 in the comparison triangle △(𝑝̄, 𝑞̄, 𝑟̄) is the point 𝑥̄ := (1 − 𝑡)𝑝̄ + 𝑡𝑞̄. A word of caution is in order: the triangle △(𝑝̄, 𝑥̄, 𝑟̄) is not a comparison triangle for the geodesic triangle △(𝑝, 𝑥, 𝑟), because in general we do not have 𝑑(𝑥, 𝑟) = ‖𝑥̄ − 𝑟̄‖.
Fig. 1.1. Geodesic triangle in 𝑋 (left) and its comparison triangle in ℝ² (right).
Next we get to the definition of nonpositive curvature in the sense of Busemann. This is however only an intermediate step for us, since we will need a stronger notion of nonpositive curvature, which is to appear later (more precisely in Definition 1.2.1).
Definition 1.1.4 (Busemann nonpositive curvature). Let (𝑋, 𝑑) be a geodesic space. We say that 𝑋 has nonpositive curvature in the sense of Busemann if for every 𝑥, 𝑦, 𝑧 ∈ 𝑋 we have 2𝑑(𝑚₁, 𝑚₂) ≤ 𝑑(𝑥, 𝑦), where 𝑚₁ is a midpoint of [𝑥, 𝑧] and 𝑚₂ is a midpoint of [𝑦, 𝑧]. We will alternatively say that (𝑋, 𝑑) is a Busemann space.
The geometric condition in the definition means that a triangle in a Busemann space is thinner than its comparison triangle in the Euclidean plane. It has an analytic counterpart in terms of convexity of the distance function.
Proposition 1.1.5. A geodesic space (𝑋, 𝑑) is a Busemann space if and only if, for every pair of geodesics 𝑥, 𝑦 : [0, 1] → 𝑋, the function 𝑡 ↦ 𝑑(𝑥ₜ, 𝑦ₜ) is convex on [0, 1].
Proof. It of course suffices to prove the implication that in Busemann spaces the function 𝑡 ↦ 𝑑(𝑥ₜ, 𝑦ₜ) is convex. Let 𝑚 be a midpoint of 𝑥₁ and 𝑦₀. We then have
\[ d\bigl(x_{1/2}, y_{1/2}\bigr) \le d\bigl(x_{1/2}, m\bigr) + d\bigl(m, y_{1/2}\bigr) \le \tfrac12 d(x_0, y_0) + \tfrac12 d(x_1, y_1). \]
From this we easily obtain
\[ d(x_t, y_t) \le (1 - t)\, d(x_0, y_0) + t\, d(x_1, y_1), \]
for every dyadic 𝑡 := 𝑘/2ˡ, with 𝑘 = 1, . . . , 2ˡ and 𝑙 ∈ ℕ. An easy approximation argument extends the last inequality to every 𝑡 ∈ [0, 1].
It is immediate that Busemann spaces are uniquely geodesic. A typical example of a Busemann space is a strictly convex Banach space. Even though Busemann’s definition captures the essence of nonpositive curvature very well indeed (a complete simply connected Riemannian manifold has nonpositive sectional curvature if and only if it is a Busemann space), it is too weak for our purposes. A major disadvantage of Busemann spaces is that metric projections onto closed convex sets are not nonexpansive in general. Since nonexpansiveness of metric projections underpins so many results in the present book, we need a stronger notion of nonpositive curvature, due to Alexandrov. Using Gromov’s terminology, we will refer to spaces of nonpositive curvature in the sense of Alexandrov as CAT(0) spaces. The difference between the Alexandrov curvature and Busemann curvature is similar to that between Hilbert spaces and strictly convex Banach spaces; see Example 1.2.9. It is worth mentioning that there is also a notion of uniform Busemann spaces, which in a sense correspond to uniformly convex Banach spaces.
Actually, there is a precisely formulated difference between Busemann’s and Alexandrov’s nonpositive curvatures. A geodesic metric space has nonpositive curvature in the sense of Alexandrov if and only if it has nonpositive curvature in the sense of Busemann and it is a Ptolemaic metric space; see Remark 1.3.4 below.
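The comparison triangle and comparison points used throughout this section are easy to compute from the three side lengths alone. The following Python sketch is one possible normalization, not a construction taken from the text: it places 𝑝̄ at the origin and 𝑞̄ on the positive first axis, and the function names `comparison_triangle` and `comparison_point` are hypothetical.

```python
import math


def comparison_triangle(d_pq, d_pr, d_qr):
    """Return planar points (p, q, r) whose pairwise distances are the given ones.

    Assumes d_pq > 0 and that the three numbers satisfy the triangle inequality.
    """
    if d_pq <= 0 or 2 * max(d_pq, d_pr, d_qr) > d_pq + d_pr + d_qr:
        raise ValueError("side lengths must satisfy the triangle inequality")
    p = (0.0, 0.0)
    q = (d_pq, 0.0)
    # law of cosines gives the first coordinate of r
    x = (d_pq ** 2 + d_pr ** 2 - d_qr ** 2) / (2 * d_pq)
    y = math.sqrt(max(d_pr ** 2 - x ** 2, 0.0))  # clamp tiny negative rounding
    return p, q, (x, y)


def comparison_point(a, b, t):
    """Point (1 - t) a + t b on the Euclidean segment [a, b]."""
    return tuple((1 - t) * ai + t * bi for ai, bi in zip(a, b))


# Example: a geodesic triangle with side lengths 3, 4 and 5.
p_bar, q_bar, r_bar = comparison_triangle(3.0, 4.0, 5.0)
x_bar = comparison_point(p_bar, q_bar, 0.5)  # comparison point of the midpoint of [p, q]
```

Since the comparison triangle is unique only up to isometries, any other placement in the plane would serve equally well; only distances between comparison points matter.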
1.2 Meet Hadamard spaces
We now introduce CAT(0) spaces, which are geodesic spaces with nonpositive curvature in a stronger sense than that of Busemann. As with Busemann curvature, we compare geodesic triangles of a CAT(0) space with triangles in the Euclidean plane, but require a stronger inequality to hold. Similarly, we will also see that this geometric condition can be expressed analytically as convexity of the distance function and that this convexity condition is quantified. Complete CAT(0) spaces are called Hadamard spaces (Definition 1.2.3) and these are the spaces we are going to use throughout the book.
Definition 1.2.1 (CAT(0) space). Let (𝑋, 𝑑) be a geodesic space. It is a CAT(0) space if for every geodesic triangle with vertices 𝑝, 𝑞, 𝑟 ∈ 𝑋 and 𝑥 ∈ [𝑝, 𝑟], 𝑦 ∈ [𝑝, 𝑞] we have 𝑑(𝑥, 𝑦) ≤ ‖𝑥̄ − 𝑦̄‖, where 𝑥̄ and 𝑦̄ are the corresponding comparison points in the comparison triangle △(𝑝̄, 𝑞̄, 𝑟̄).
The geometrical meaning of this definition is depicted in Figure 1.2. One can see from the very definition that CAT(0) spaces are Busemann. In particular, they are uniquely geodesic.
Fig. 1.2. Geodesic triangle (left) and its comparison triangle (right).
If (𝑋, 𝑑) is a CAT(0) space, then for every geodesic triangle with vertices 𝑝, 𝑞, 𝑟 ∈ 𝑋 and 𝑥 ∈ [𝑞, 𝑟] we have
\[ d(p, x) \le \|\overline{p} - \overline{x}\|, \tag{1.2.1} \]
where 𝑥̄ ∈ △(𝑝̄, 𝑞̄, 𝑟̄) is the comparison point for 𝑥. Using an elementary calculation with the inner product in the Euclidean plane ℝ², we can equivalently express the condition in (1.2.1) as
\[ d(p, x_t)^2 \le (1 - t)\, d(p, q)^2 + t\, d(p, r)^2 - t(1 - t)\, d(q, r)^2, \tag{1.2.2} \]
where 𝑥ₜ := (1 − 𝑡)𝑞 + 𝑡𝑟 for every 𝑡 ∈ [0, 1]. Inequality (1.2.2) says that the distance function in CAT(0) spaces is at least as convex as in the Euclidean plane. It is often applied with 𝑡 := 1/2, in which case 𝑚 := 𝑥_{1/2} is the midpoint of [𝑞, 𝑟] and we have
\[ d(p, m)^2 \le \tfrac12 d(p, q)^2 + \tfrac12 d(p, r)^2 - \tfrac14 d(q, r)^2. \tag{1.2.3} \]
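For the reader's convenience, the elementary Euclidean calculation behind the passage from (1.2.1) to (1.2.2) can be spelled out as follows. This is a routine sketch written with the comparison points of the triangle with vertices p, q, r, where the comparison point of x_t is (1 − t) times the comparison point of q plus t times that of r.

```latex
\begin{aligned}
\overline{p} - \overline{x}_t
  &= (1-t)\,(\overline{p} - \overline{q}) + t\,(\overline{p} - \overline{r}), \\
\|\overline{p} - \overline{x}_t\|^2
  &= (1-t)^2 \|\overline{p} - \overline{q}\|^2 + t^2 \|\overline{p} - \overline{r}\|^2
     + 2t(1-t)\,\langle \overline{p} - \overline{q},\, \overline{p} - \overline{r} \rangle, \\
2\,\langle \overline{p} - \overline{q},\, \overline{p} - \overline{r} \rangle
  &= \|\overline{p} - \overline{q}\|^2 + \|\overline{p} - \overline{r}\|^2
     - \|\overline{q} - \overline{r}\|^2, \\
\|\overline{p} - \overline{x}_t\|^2
  &= (1-t)\,\|\overline{p} - \overline{q}\|^2 + t\,\|\overline{p} - \overline{r}\|^2
     - t(1-t)\,\|\overline{q} - \overline{r}\|^2 .
\end{aligned}
```

The sides of the comparison triangle have the lengths d(p, q), d(p, r) and d(q, r), so the last line equals the right-hand side of (1.2.2), and (1.2.1) states precisely that d(p, x_t) is bounded by the norm on the left.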
As we shall see later, this inequality in turn implies CAT(0). Let us record here that by Proposition 1.1.5 we have in a CAT(0) space (𝑋, 𝑑) the following convexity property of the metric
\[ d(x_t, y_t) \le (1 - t)\, d(x_0, y_0) + t\, d(x_1, y_1), \tag{1.2.4} \]
where 𝑥, 𝑦 : [0, 1] → 𝑋 are geodesics. This is sometimes referred to as joint convexity of the metric. It in particular applies in the case 𝑥0 = 𝑥1 . Lemma 1.2.2. Let (𝑋, 𝑑) be a CAT(0) space and 𝑥 : [0, 1] → 𝑋 be a geodesic. Then for every 𝑡 ∈ [0, 1] the point 𝑥𝑡 := (1 − 𝑡)𝑥0 + 𝑡𝑥1 depends continuously on the endpoints 𝑥0 and 𝑥1 . Proof. Choose one more geodesic 𝑦 : [0, 1] → 𝑋. Applying (1.2.2) twice yields
\[
\begin{aligned}
d(x_t, y_t)^2 &\le (1 - t)\, d(x_0, y_t)^2 + t\, d(x_1, y_t)^2 - t(1 - t)\, d(x_0, x_1)^2 \\
&\le (1 - t)^2 d(x_0, y_0)^2 + t^2 d(x_1, y_1)^2 \\
&\quad + t(1 - t) \bigl[ d(x_0, y_1)^2 + d(x_1, y_0)^2 - d(x_0, x_1)^2 - d(y_0, y_1)^2 \bigr].
\end{aligned}
\]
The right hand side converges to 0 if 𝑦₀ → 𝑥₀ and 𝑦₁ → 𝑥₁.
As a consequence of the preceding lemma we obtain that CAT(0) spaces are simply connected.
Definition 1.2.3 (Hadamard space). A complete CAT(0) space is called a Hadamard space.
The CAT(0) geometry is given by comparing geodesic triangles with the Euclidean ones. As a consequence, it is possible to compare geodesic quadrilaterals with quadrilaterals in the Euclidean space, which will be illuminated in Corollary 1.2.5 and Exercise 1.9. We now present a rather subtle inequality containing the quadruple comparison as a special case.
Proposition 1.2.4. Let (𝑋, 𝑑) be a CAT(0) space and 𝑥₀, 𝑥₁, 𝑦₀, 𝑦₁ ∈ 𝑋. Then for every 𝑡 ∈ [0, 1] and every parameter 𝛽 ∈ [0, 1] we have
\[
\begin{aligned}
d(x_0, y_t)^2 + d(x_1, y_{1-t})^2 &\le d(x_0, y_0)^2 + d(x_1, y_1)^2 + (2t^2 - t)\, d(y_0, y_1)^2 \\
&\quad + t\, d(x_0, x_1)^2 - t\beta \bigl[ d(x_0, x_1) - d(y_0, y_1) \bigr]^2 \\
&\quad - t(1 - \beta) \bigl[ d(x_0, y_0) - d(x_1, y_1) \bigr]^2,
\end{aligned}
\]
where, as always, 𝑥ₜ := (1 − 𝑡)𝑥₀ + 𝑡𝑥₁ and 𝑦ₜ := (1 − 𝑡)𝑦₀ + 𝑡𝑦₁.
Proof. Step 1: We first assume that 𝑡 = 1 and 𝛽 = 0. Denote by 𝛾 : [0, 1] → 𝑋 the geodesic connecting 𝑥₀ and 𝑦₁. Then (1.2.2) yields
\[
\begin{aligned}
d(y_0, \gamma_s)^2 &\le (1 - s)\, d(y_0, x_0)^2 + s\, d(y_0, y_1)^2 - s(1 - s)\, d(x_0, y_1)^2, \\
d(x_1, \gamma_s)^2 &\le (1 - s)\, d(x_1, x_0)^2 + s\, d(x_1, y_1)^2 - s(1 - s)\, d(x_0, y_1)^2,
\end{aligned}
\]
and furthermore, for every 𝜀 > 0,
\[
\begin{aligned}
d(x_1, y_0)^2 &\le \bigl[ d(x_1, \gamma_s) + d(\gamma_s, y_0) \bigr]^2 \\
&\le (1 + \varepsilon)\, d(x_1, \gamma_s)^2 + \bigl(1 + \tfrac{1}{\varepsilon}\bigr)\, d(\gamma_s, y_0)^2 \\
&\le (1 + \varepsilon)(1 - s)\, d(x_1, x_0)^2 + (1 + \varepsilon)\, s\, d(x_1, y_1)^2 \\
&\quad + \bigl(1 + \tfrac{1}{\varepsilon}\bigr)(1 - s)\, d(y_0, x_0)^2 + \bigl(1 + \tfrac{1}{\varepsilon}\bigr)\, s\, d(y_0, y_1)^2 \\
&\quad - \bigl(2 + \varepsilon + \tfrac{1}{\varepsilon}\bigr)\, s(1 - s)\, d(x_0, y_1)^2.
\end{aligned}
\]
Choosing 𝜀 := 𝑠/(1 − 𝑠) gives
\[
d(x_0, y_1)^2 + d(x_1, y_0)^2 \le d(x_0, x_1)^2 + d(y_0, y_1)^2 + \frac{1 - s}{s}\, d(x_0, y_0)^2 + \frac{s}{1 - s}\, d(x_1, y_1)^2.
\]
If we further set
\[ s := \frac{d(x_0, y_0)}{d(x_0, y_0) + d(x_1, y_1)}, \]
the last inequality becomes
\[
\begin{aligned}
d(x_0, y_1)^2 + d(x_1, y_0)^2 &\le d(x_0, x_1)^2 + d(y_0, y_1)^2 + d(x_0, y_0)^2 + d(x_1, y_1)^2 \\
&\quad - \bigl[ d(x_0, y_0) - d(x_1, y_1) \bigr]^2,
\end{aligned}
\]
which shows the case 𝑡 = 1 and 𝛽 = 0.
Step 2: By a symmetry argument we also get
\[
\begin{aligned}
d(x_0, y_1)^2 + d(x_1, y_0)^2 &\le d(x_0, x_1)^2 + d(y_0, y_1)^2 + d(x_0, y_0)^2 + d(x_1, y_1)^2 \\
&\quad - \bigl[ d(x_0, x_1) - d(y_0, y_1) \bigr]^2,
\end{aligned}
\]
and taking convex combinations yields the case of 𝑡 = 1 and 𝛽 ∈ [0, 1], namely,
\[
\begin{aligned}
d(x_0, y_1)^2 + d(x_1, y_0)^2 &\le d(x_0, x_1)^2 + d(y_0, y_1)^2 + d(x_0, y_0)^2 + d(x_1, y_1)^2 \\
&\quad - (1 - \beta) \bigl[ d(x_0, y_0) - d(x_1, y_1) \bigr]^2 - \beta \bigl[ d(x_0, x_1) - d(y_0, y_1) \bigr]^2.
\end{aligned}
\tag{1.2.5}
\]
Step 3: Finally, we obtain the general case applying (1.2.2) and (1.2.5) as follows
\[
\begin{aligned}
d(x_0, y_t)^2 + d(x_1, y_{1-t})^2 &\le (1 - t)\, d(x_0, y_0)^2 + t\, d(x_0, y_1)^2 \\
&\quad + (1 - t)\, d(x_1, y_1)^2 + t\, d(x_1, y_0)^2 - 2t(1 - t)\, d(y_0, y_1)^2 \\
&\le d(x_0, y_0)^2 + d(x_1, y_1)^2 + (2t^2 - t)\, d(y_0, y_1)^2 \\
&\quad + t\, d(x_0, x_1)^2 - t\beta \bigl[ d(x_0, x_1) - d(y_0, y_1) \bigr]^2 \\
&\quad - t(1 - \beta) \bigl[ d(x_0, y_0) - d(x_1, y_1) \bigr]^2.
\end{aligned}
\]
The proof is now complete.
Corollary 1.2.5. Let (𝑋, 𝑑) be a CAT(0) space. Then for every four points 𝑥₀, 𝑥₁, 𝑦₀, 𝑦₁ ∈ 𝑋 we have
\[
d(x_0, y_1)^2 + d(x_1, y_0)^2 \le d(x_0, y_0)^2 + d(x_1, y_1)^2 + 2\, d(x_0, x_1)\, d(y_0, y_1),
\]
and in particular,
\[
d(x_0, y_1)^2 + d(x_1, y_0)^2 \le d(x_0, y_0)^2 + d(x_1, y_1)^2 + d(x_0, x_1)^2 + d(y_0, y_1)^2.
\]
Proof. Follows directly from Proposition 1.2.4. If (𝑋, 𝑑) is complete, we can alternatively use Theorem 2.3.5 (iv) with
\[ s := t := \frac{d(x_3, x_4)}{d(x_1, x_2) + d(x_3, x_4)} \]
to obtain the desired inequalities.
The second inequality in Corollary 1.2.5 can actually be strengthened; see Exercise 1.9, where we also present its geometrical meaning.
Angles in geodesic spaces The notion of an angle in Hadamard spaces is more important in geometry than analysis. We will see in Theorem 1.3.3 that the CAT(0) property can be expressed in terms of angles. Let first 𝑎, 𝑏, 𝑐 be three points in the Euclidean plane ℝ2 . The angle between the line segments [𝑎, 𝑏] and [𝑎, 𝑐] will be denoted ∡(𝑏, 𝑎, 𝑐), that is,
\[ \measuredangle(b, a, c) := \arccos \frac{\langle b - a, c - a \rangle}{\|b - a\|\, \|c - a\|}. \]
Definition 1.2.6 (Angle between two geodesics). Let now (𝑋, 𝑑) be a geodesic space. We define the angle between two geodesics 𝛾 : [0, 1] → 𝑋 and 𝜂 : [0, 1] → 𝑋 with 𝛾₀ = 𝜂₀ by
\[ \alpha(\gamma, \eta) := \limsup_{s, t \to 0+} \measuredangle\bigl(\overline{\gamma}_s, \overline{\gamma}_0, \overline{\eta}_t\bigr), \]
where 𝛾̄ₛ, 𝛾̄₀, 𝜂̄ₜ are the vertices of the comparison triangle △(𝛾̄ₛ, 𝛾̄₀, 𝜂̄ₜ) for the geodesic triangle △(𝛾ₛ, 𝛾₀, 𝜂ₜ).
Note that 𝛾̄ₛ is not a comparison point for 𝛾ₛ in the comparison triangle △(𝛾̄₀, 𝛾̄₁, 𝜂̄₁), that is, 𝛾̄ₛ is not (1 − 𝑠)𝛾̄₀ + 𝑠𝛾̄₁. And likewise for 𝜂̄ₜ. An angle is hence a number from [0, 𝜋].
Let now (H, 𝑑) be a Hadamard space and 𝛾 : [0, 1] → H and 𝜂 : [0, 1] → H be geodesics with 𝛾₀ = 𝜂₀. Without loss of generality we may assume 𝑑(𝛾₀, 𝛾₁) = 𝑑(𝜂₀, 𝜂₁). Since both functions 𝑠 ↦ ∡(𝛾̄ₛ, 𝛾̄₀, 𝜂̄ₜ) and 𝑡 ↦ ∡(𝛾̄ₛ, 𝛾̄₀, 𝜂̄ₜ) are then nondecreasing, we have
\[ \alpha(\gamma, \eta) = \lim_{t \to 0+} \measuredangle\bigl(\overline{\gamma}_t, \overline{\gamma}_0, \overline{\eta}_t\bigr) = \lim_{t \to 0+} 2 \arcsin \frac{d(\gamma_t, \eta_t)}{2t\, d(\gamma_0, \gamma_1)}, \]
where the second equality follows by easy trigonometry in the isosceles comparison triangle.
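The two expressions for an angle, the arccos formula in the plane and the arcsin formula for equal side lengths, can be compared numerically. The sketch below does this in the Euclidean plane, where every triangle is its own comparison triangle; the function names `euclid_angle` and `angle_from_chord` and the chosen test rays are assumptions made for illustration.

```python
import math


def euclid_angle(b, a, c):
    """Angle at a between the segments [a, b] and [a, c], via the arccos formula."""
    u = (b[0] - a[0], b[1] - a[1])
    v = (c[0] - a[0], c[1] - a[1])
    cos = (u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding


def angle_from_chord(chord, side):
    """Apex angle of an isosceles triangle with two sides `side` and base `chord`."""
    return 2.0 * math.asin(min(1.0, chord / (2.0 * side)))


# Two geodesics (segments) of common length 2 issuing from the origin
# and enclosing the angle theta.
theta, length = 1.0, 2.0
angles = []
for t in (1.0, 0.5, 0.1, 0.01):
    g = (t * length, 0.0)
    e = (t * length * math.cos(theta), t * length * math.sin(theta))
    chord = math.dist(g, e)
    angles.append((euclid_angle(g, (0.0, 0.0), e), angle_from_chord(chord, t * length)))
```

In the flat case both values agree with the enclosed angle for every 𝑡, so the limit is attained immediately; in a Hadamard space the comparison angles decrease to the angle as 𝑡 → 0+.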
Given points 𝑝, 𝑞, 𝑟 ∈ H, we will denote the angle between the geodesics [𝑝, 𝑞] and [𝑝, 𝑟] by 𝛼(𝑞, 𝑝, 𝑟), and the corresponding angle in the comparison triangle by 𝛼̄(𝑞, 𝑝, 𝑟), that is, 𝛼̄(𝑞, 𝑝, 𝑟) = ∡(𝑞̄, 𝑝̄, 𝑟̄), where 𝑝̄, 𝑞̄, 𝑟̄ are the vertices of a comparison triangle for the geodesic triangle △(𝑝, 𝑞, 𝑟).
Angles also behave nicely when we perturb the geodesics. To show this, we need the following lemma.
Lemma 1.2.7. Let (𝑋, 𝑑) be a geodesic space and 𝛾¹, 𝛾², 𝛾³ be three geodesics issuing from the same point. Then
\[ \alpha(\gamma^1, \gamma^2) \le \alpha(\gamma^1, \gamma^3) + \alpha(\gamma^3, \gamma^2). \]
Proof. Exercise 1.10.
Now we can prove the continuity of an angle.
Proposition 1.2.8. Let (H, 𝑑) be a Hadamard space. Then:
(i) The function 𝛼(⋅, ⋅, ⋅) is upper semicontinuous on H³.
(ii) For a fixed 𝑝 ∈ H the function 𝛼(⋅, 𝑝, ⋅) is continuous on H².
Proof. (i) Choose sequences (𝑥ₙ), (𝑦ₙ) and (𝑝ₙ) in H converging to points 𝑥, 𝑦 and 𝑝, respectively. For 𝑡 ∈ (0, 1] put 𝑎(𝑡) := 𝛼̄((1 − 𝑡)𝑝 + 𝑡𝑥, 𝑝, (1 − 𝑡)𝑝 + 𝑡𝑦) and 𝑎ₙ(𝑡) := 𝛼̄((1 − 𝑡)𝑝ₙ + 𝑡𝑥ₙ, 𝑝ₙ, (1 − 𝑡)𝑝ₙ + 𝑡𝑦ₙ). We already know that 𝑎(𝑡) is a nondecreasing function of 𝑡 and 𝛼(𝑥, 𝑝, 𝑦) = lim_{𝑡→0+} 𝑎(𝑡), along with 𝛼(𝑥ₙ, 𝑝ₙ, 𝑦ₙ) = lim_{𝑡→0+} 𝑎ₙ(𝑡) for each 𝑛 ∈ ℕ. Obviously, for a fixed 𝑡 ∈ (0, 1] we have 𝑎ₙ(𝑡) → 𝑎(𝑡) as 𝑛 → ∞. Given 𝜀 > 0, there exists 𝜏 ∈ (0, 1] such that 𝑎(𝑡) − 𝜀 ≤ 𝛼(𝑥, 𝑝, 𝑦) for every 𝑡 ∈ (0, 𝜏]. Then for 𝑛 big enough, we have 𝑎ₙ(𝜏) ≤ 𝑎(𝜏) + 𝜀, and therefore
\[ \alpha(x_n, p_n, y_n) \le a_n(\tau) \le a(\tau) + \varepsilon \le \alpha(x, p, y) + 2\varepsilon. \]
Thus lim sup 𝛼(𝑥ₙ, 𝑝ₙ, 𝑦ₙ) ≤ 𝛼(𝑥, 𝑝, 𝑦).
(ii) We keep the notation, but assume 𝑝ₙ = 𝑝 for each 𝑛 ∈ ℕ. Applying Lemma 1.2.7 yields
\[ \bigl| \alpha(x_n, p, y_n) - \alpha(x, p, y) \bigr| \le \alpha(x, p, x_n) + \alpha(y, p, y_n), \]
which, along with the facts that limₙ 𝛼(𝑥, 𝑝, 𝑥ₙ) = 0 and limₙ 𝛼(𝑦, 𝑝, 𝑦ₙ) = 0, gives limₙ 𝛼(𝑥ₙ, 𝑝, 𝑦ₙ) = 𝛼(𝑥, 𝑝, 𝑦).
Examples of Hadamard spaces
Let us now list classical examples of Hadamard spaces. For detailed constructions and proofs, the reader is referred to the Bridson–Haefliger book [51]. We start with two basic examples of Hadamard spaces, which in some sense represent the most extreme cases: curvature 0 and curvature −∞.
Example 1.2.9 (Hilbert spaces). Hilbert spaces are Hadamard. The geodesics are the line segments. Moreover, it is known that a Banach space is CAT(0) if and only if it is Hilbert; see Exercise 1.4.
Example 1.2.10 (ℝ-trees). Recall that a metric space (𝑋, 𝑑) is an ℝ-tree if it is uniquely geodesic and for every 𝑥, 𝑦, 𝑧 ∈ 𝑋 we have [𝑥, 𝑧] = [𝑥, 𝑦] ∪ [𝑦, 𝑧] whenever [𝑥, 𝑦] ∩ [𝑦, 𝑧] = {𝑦}. Notice that all triangles in an ℝ-tree are trivial.
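A concrete ℝ-tree to experiment with is the tripod: three half-lines glued at a common endpoint. The Python sketch below encodes a point as a pair (leg, distance from the centre) and tests the first inequality of Corollary 1.2.5 on random quadruples. The encoding, the function names `tripod_dist` and `four_point_gap`, and the sampling range are all assumptions chosen for illustration.

```python
import random


def tripod_dist(p, q):
    """Distance in the tripod; a point is (leg index, distance from the centre)."""
    (leg_p, r), (leg_q, s) = p, q
    if leg_p == leg_q:
        return abs(r - s)  # both points on the same leg
    return r + s           # the geodesic passes through the centre


def four_point_gap(x0, x1, y0, y1, d=tripod_dist):
    """Right-hand side minus left-hand side of the first inequality in Corollary 1.2.5."""
    lhs = d(x0, y1) ** 2 + d(x1, y0) ** 2
    rhs = d(x0, y0) ** 2 + d(x1, y1) ** 2 + 2 * d(x0, x1) * d(y0, y1)
    return rhs - lhs


rng = random.Random(0)


def random_point():
    return (rng.randrange(3), rng.uniform(0.0, 5.0))


gaps = [four_point_gap(*(random_point() for _ in range(4))) for _ in range(2000)]
```

A nonnegative gap (up to rounding) on every sample is consistent with the tripod being a CAT(0) space; a numerical experiment of this kind is of course no substitute for the proof.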
Hadamard manifolds
The most illuminating instances of Hadamard spaces are Hadamard manifolds. By definition, a Hadamard manifold is a complete simply connected Riemannian manifold of nonpositive sectional curvature. The class of Hadamard manifolds includes:
Example 1.2.11 (Hyperbolic spaces). We equip ℝⁿ⁺¹ with the (−1, 𝑛)-inner product
\[ \langle x, y \rangle_{(-1,n)} := -x^0 y^0 + \sum_{i=1}^{n} x^i y^i \]
for 𝑥 := (𝑥⁰, 𝑥¹, . . . , 𝑥ⁿ) and 𝑦 := (𝑦⁰, 𝑦¹, . . . , 𝑦ⁿ). Define
\[ \mathbb{H}^n := \bigl\{ x \in \mathbb{R}^{n+1} : \langle x, x \rangle_{(-1,n)} = -1,\ x^0 > 0 \bigr\}. \]
Then ⟨⋅, ⋅⟩_{(−1,𝑛)} induces a Riemannian metric 𝑔 on the tangent spaces 𝑇ₚℍⁿ ⊂ 𝑇ₚℝⁿ⁺¹ for 𝑝 ∈ ℍⁿ. The sectional curvature of (ℍⁿ, 𝑔) is −1 at every point.
Example 1.2.12 (Manifolds of positive definite matrices). The space 𝑃(𝑛, ℝ) of symmetric positive definite 𝑛 × 𝑛 matrices with real entries is a Hadamard manifold if it is equipped with the Riemannian metric
\[ \langle X, Y \rangle_A := \operatorname{Tr}\bigl( A^{-1} X A^{-1} Y \bigr), \qquad X, Y \in T_A\bigl(P(n, \mathbb{R})\bigr), \]
for every 𝐴 ∈ 𝑃(𝑛, ℝ).
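The hyperboloid model of Example 1.2.11 lends itself to a quick numerical check of inequality (1.2.3). The sketch below relies on two standard facts that are not stated in the text and are therefore assumptions here: the distance on ℍⁿ is given by cosh 𝑑(𝑥, 𝑦) = −⟨𝑥, 𝑦⟩ with the (−1, 𝑛)-inner product, and the geodesic midpoint of 𝑥 and 𝑦 is the sum 𝑥 + 𝑦 rescaled back onto the hyperboloid. All function names are hypothetical.

```python
import math
import random


def minkowski(x, y):
    """(-1, n)-inner product: -x0*y0 + sum of xi*yi for i >= 1."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def lift(v):
    """Lift v in R^n to the hyperboloid, so that <x, x> = -1 and x0 > 0."""
    return (math.sqrt(1.0 + sum(a * a for a in v)),) + tuple(v)


def dist(x, y):
    """Assumed distance formula d(x, y) = arccosh(-<x, y>), clamped against rounding."""
    return math.acosh(max(1.0, -minkowski(x, y)))


def midpoint(x, y):
    """Assumed midpoint formula: x + y rescaled so that <m, m> = -1."""
    s = tuple(a + b for a, b in zip(x, y))
    scale = math.sqrt(-minkowski(s, s))
    return tuple(a / scale for a in s)


def cn_gap(p, q, r):
    """Right-hand side minus left-hand side of (1.2.3) for the midpoint m of [q, r]."""
    m = midpoint(q, r)
    rhs = 0.5 * dist(p, q) ** 2 + 0.5 * dist(p, r) ** 2 - 0.25 * dist(q, r) ** 2
    return rhs - dist(p, m) ** 2


rng = random.Random(0)


def random_point(n=2):
    return lift([rng.uniform(-2.0, 2.0) for _ in range(n)])


gaps = [cn_gap(random_point(), random_point(), random_point()) for _ in range(1000)]
```

The gaps come out nonnegative, as (1.2.3) predicts for a Hadamard manifold, and they are typically much larger than in the Euclidean plane, where (1.2.3) holds with equality.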
The class of Hadamard manifolds also contains symmetric spaces of noncompact type, like 𝑆𝐿(𝑛, ℝ)/𝑆𝑂(𝑛, ℝ). Note that 𝑆𝐿(𝑛, ℝ)/𝑆𝑂(𝑛, ℝ) can be identified with the submanifold 𝑃₁(𝑛, ℝ) := {𝐴 ∈ 𝑃(𝑛, ℝ) : det 𝐴 = 1}.
As far as infinite dimensional manifolds (that is, those modeled on a Hilbert space) are concerned, the Hopf–Rinow theorem is not valid any more and we do not have the existence of geodesics in general. However, if the manifold in question is Hadamard, we obtain the existence of geodesics as a consequence of nonpositive curvature. Thus infinite dimensional Hadamard manifolds are another instance of Hadamard spaces; see [125, XI. Proposition 3.4.]. The complex Hilbert ball with the hyperbolic metric is a prominent example of an infinite dimensional Hadamard manifold and we will describe it in detail:
Example 1.2.13 (Hilbert ball). Let (𝐻, ‖⋅‖) be a complex Hilbert space with inner product ⟨⋅, ⋅⟩ and put 𝔹 := {𝑥 ∈ 𝐻 : ‖𝑥‖ < 1}. We equip 𝔹 with the hyperbolic metric
\[ \rho(x, y) := \tanh^{-1} \sqrt{1 - \sigma(x, y)}, \qquad x, y \in \mathbb{B}, \]
where
\[ \sigma(x, y) := \frac{\bigl(1 - \|x\|^2\bigr)\bigl(1 - \|y\|^2\bigr)}{|1 - \langle x, y \rangle|^2}, \qquad x, y \in \mathbb{B}. \]
The metric space (𝔹, 𝜌) is called the Hilbert ball. For each 𝑎 ∈ 𝔹, the Möbius transformation associated with 𝑎 is defined by
\[ M_a(x) := \Bigl( \sqrt{1 - \|a\|^2}\, Q_a + P_a \Bigr) \frac{x + a}{1 - \langle x, a \rangle}, \qquad x \in \mathbb{B}, \]
where 𝑃ₐ is the projection onto the one-dimensional space span(𝑎) and 𝑄ₐ := 𝐼 − 𝑃ₐ, with 𝐼 being the identity mapping. Möbius transformations are automorphisms of 𝔹, and we have 𝑀₋ₐ(𝑎) = 0 for each 𝑎 ∈ 𝔹. This will enable us in the sequel, given a point 𝑥 ∈ 𝔹, to assume 𝑥 = 0.
The Hilbert ball is a uniquely geodesic space and each pair of points lies on a geodesic line. It also has the structure of an infinite dimensional Riemannian manifold. Indeed, to each point 𝑥 ∈ 𝔹, we associate a Hilbert space 𝐻ₓ, whose elements will be denoted by (𝑥, 𝑦), where 𝑦 ∈ 𝔹. The space 𝐻ₓ is equipped with the inner product
\[ \langle (x, y), (x, z) \rangle_{H_x} := \frac{\rho(x, y)\, \rho(x, z)}{\|M_{-x}(y)\|\, \|M_{-x}(z)\|}\, \langle M_{-x}(y), M_{-x}(z) \rangle, \]
for 𝑥 ≠ 𝑦 and 𝑥 ≠ 𝑧 from 𝔹. The symbol ‖⋅‖_{𝐻ₓ} will then stand for the corresponding norm. For more details, see [92].
We will show that (𝔹, 𝜌) is a Hadamard space in a few steps.
Step 1: We claim first it is a Busemann space, that is,
𝜌 (𝑚1 , 𝑚2 ) ≤ 12 𝜌(𝑥, 𝑦) ,
(1.2.6)
12  1 Geometry of nonpositive curvature for every 𝑥, 𝑦, 𝑧 ∈ 𝔹, with 𝑚1 , 𝑚2 being the midpoints of [𝑥, 𝑦] and [𝑥, 𝑧], respectively. Without loss of generality (by virtue of a Möbius transformation), we may assume 𝑧 = 0. Then
\[ x = \frac{2m_1}{1+\|m_1\|^{2}}, \qquad y = \frac{2m_2}{1+\|m_2\|^{2}}, \]
and one can readily check that the desired inequality (1.2.6) is equivalent to
\[ \sigma(x,y) \le \frac{\sigma(m_1,m_2)^{2}}{\left[2-\sigma(m_1,m_2)\right]^{2}}. \tag{1.2.7} \]
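Before going through the computation, inequalities (1.2.6) and (1.2.7) can be sanity-checked numerically in the normalized situation 𝑧 = 0. The sketch below samples random points 𝑚1 , 𝑚2 in the unit ball of ℂ² (a finite dimensional stand-in for 𝐻), forms 𝑥 and 𝑦 by the formulas above, and tests both inequalities up to a rounding tolerance. All function names and the sampling scheme are our own assumptions; this is an experiment, not a proof.

```python
import math
import random

def inner(x, y):
    """Inner product on C^n, linear in x (our convention)."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(x, y))

def norm_sq(x):
    return inner(x, x).real

def sigma(x, y):
    return (1 - norm_sq(x)) * (1 - norm_sq(y)) / abs(1 - inner(x, y)) ** 2

def rho(x, y):
    return math.atanh(math.sqrt(max(0.0, 1 - sigma(x, y))))

def double(m):
    """Point x whose midpoint with the origin is m, i.e. x = 2m / (1 + |m|^2)."""
    c = 2 / (1 + norm_sq(m))
    return [c * a for a in m]

def random_point(rng, dim=2, max_norm=0.9):
    """Random vector in C^dim with norm at most max_norm (hypothetical sampler)."""
    v = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(dim)]
    n = math.sqrt(norm_sq(v))
    if n == 0.0:
        return v
    r = max_norm * rng.random()
    return [r * a / n for a in v]

def check_busemann(trials=1000, seed=0, tol=1e-7):
    """Return True if (1.2.6) and (1.2.7) hold, with z = 0, on all sampled pairs."""
    rng = random.Random(seed)
    for _ in range(trials):
        m1, m2 = random_point(rng), random_point(rng)
        x, y = double(m1), double(m2)
        s = sigma(m1, m2)
        if rho(m1, m2) > 0.5 * rho(x, y) + tol:        # inequality (1.2.6)
            return False
        if sigma(x, y) > s * s / (2 - s) ** 2 + tol:   # inequality (1.2.7)
            return False
    return True

print(check_busemann())
```

The two tests are equivalent in exact arithmetic; checking both guards against a slip in either formula.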
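To see that the last inequality indeed holds true, let us compute
\[
\begin{aligned}
\sigma(x,y) &= \frac{\left(1-\|m_1\|^{2}\right)^{2}\left(1-\|m_2\|^{2}\right)^{2}}{\left|\left(1-\|m_1\|^{2}\right)\left(1-\|m_2\|^{2}\right)+2\left(\|m_1\|^{2}+\|m_2\|^{2}-2\langle m_1,m_2\rangle\right)\right|^{2}} \\
&\le \frac{\left(1-\|m_1\|^{2}\right)^{2}\left(1-\|m_2\|^{2}\right)^{2}}{\left[\left(1-\|m_1\|^{2}\right)\left(1-\|m_2\|^{2}\right)+2\|m_1-m_2\|^{2}\right]^{2}} \\
&\le \frac{\sigma(m_1,m_2)^{2}}{\left[\sigma(m_1,m_2)+2\,\dfrac{\|m_1-m_2\|^{2}}{\left|1-\langle m_1,m_2\rangle\right|^{2}}\right]^{2}} \\
&\le \frac{\sigma(m_1,m_2)^{2}}{\left[2-\sigma(m_1,m_2)\right]^{2}},
\end{aligned}
\]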
which is exactly (1.2.7). The Hilbert ball is thus a Busemann space.
Step 2: Next we show that
\[ \rho(y,z)^{2} \ge \rho(x,y)^{2}+\rho(x,z)^{2}-2\operatorname{Re}\langle (x,y),(x,z)\rangle \tag{1.2.8} \]
holds for every 𝑥, 𝑦, 𝑧 ∈ 𝔹. To this end, put 𝑦𝑡 := (1 − 𝑡)𝑥 + 𝑡𝑦 and 𝑧𝑡 := (1 − 𝑡)𝑥 + 𝑡𝑧, for each 𝑡 ∈ [0, 1]. By the previous step, we know the Hilbert ball is a Busemann space and we can employ Proposition 1.1.5 to conclude that 𝜌(𝑦𝑡 , 𝑧𝑡 )² ≤ 𝑡² 𝜌(𝑦, 𝑧)² for each 𝑡 ∈ [0, 1], and hence it suffices to show
\[ \lim_{t\to 0}\frac{\rho(y_t,z_t)^{2}}{t^{2}} \ge \rho(x,y)^{2}+\rho(x,z)^{2}-2\operatorname{Re}\langle (x,y),(x,z)\rangle . \]
Without loss of generality, we may assume 𝑦 ≠ 𝑥, 𝑧 ≠ 𝑥 and 𝑥 = 0. We then have
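\[ \rho(y_t,z_t) = \rho\!\left(\tanh\!\left(t\tanh^{-1}\|y\|\right)\frac{y}{\|y\|},\ \tanh\!\left(t\tanh^{-1}\|z\|\right)\frac{z}{\|z\|}\right). \]
Since 𝜌 and the Hilbert space metric are locally isometric, one can write
\[
\begin{aligned}
\lim_{t\to 0}\frac{\rho(y_t,z_t)^{2}}{t^{2}}
&= \lim_{t\to 0}\frac{1}{t^{2}}\left\|\tanh\!\left(t\tanh^{-1}\|y\|\right)\frac{y}{\|y\|}-\tanh\!\left(t\tanh^{-1}\|z\|\right)\frac{z}{\|z\|}\right\|^{2} \\
&= \left\|\tanh^{-1}(\|y\|)\frac{y}{\|y\|}-\tanh^{-1}(\|z\|)\frac{z}{\|z\|}\right\|^{2} \\
&= \left\|(0,y)-(0,z)\right\|_{H_0}^{2} \\
&= \rho(x,y)^{2}+\rho(x,z)^{2}-2\operatorname{Re}\langle (x,y),(x,z)\rangle ,
\end{aligned}
\]
which proves (1.2.8).
Step 3: Finally, we will show that the Hilbert ball satisfies (1.2.2), that is,
\[ \rho(y,x_t)^{2} \le (1-t)\,\rho(y,x_0)^{2}+t\,\rho(y,x_1)^{2}-t(1-t)\,\rho(x_0,x_1)^{2}, \tag{1.2.9} \]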
for each geodesic 𝑥 : [0, 1] → 𝔹 and 𝑦 ∈ 𝔹. Applying (1.2.8) with 𝑥 := 𝑥𝑡 , 𝑦 := 𝑥0 , and 𝑧 := 𝑦, we get
\[ \rho(x_0,y)^{2} \ge \rho(x_t,x_0)^{2}+\rho(x_t,y)^{2}-2\operatorname{Re}\langle (x_t,x_0),(x_t,y)\rangle , \]
and applying (1.2.8) again with 𝑥 := 𝑥𝑡 , 𝑦 := 𝑥1 , and 𝑧 := 𝑦, we get
\[ \rho(x_1,y)^{2} \ge \rho(x_t,x_1)^{2}+\rho(x_t,y)^{2}-2\operatorname{Re}\langle (x_t,x_1),(x_t,y)\rangle . \]
Now multiply the last inequality by 𝑡 and the second last inequality by (1 − 𝑡) and sum them up. The inner product terms cancel, because (𝑥𝑡 , 𝑥0 ) and (𝑥𝑡 , 𝑥1 ) point in opposite directions with lengths 𝑡𝜌(𝑥0 , 𝑥1 ) and (1 − 𝑡)𝜌(𝑥0 , 𝑥1 ), and we obtain (1.2.9). Thus the Hilbert ball is a CAT(0) space. Since (𝔹, 𝜌) is obviously complete, we can conclude it is a Hadamard space.
A more subtle construction of an infinite-dimensional weak Riemannian manifold, whose metric completion with respect to the 𝐿² metric is a Hadamard space, is due to B. Clarke [63]; see also [64, 65]. This manifold arises as the space of Riemannian metrics on a given closed (finite dimensional) manifold. Note however that the completion is no longer a manifold. This construction has a complex counterpart in Kähler geometry, where the metric completion with respect to the 𝐿² metric of the space of Kähler potentials in a given cohomology class is also a Hadamard space [191], which will be revisited in Example 5.1.17. An analogous (and earlier) result due to S. Yamada [208] states that the completion of the Teichmüller space with respect to the Weil–Petersson metric is a Hadamard space; see also [207] for a nice survey.
Complexes
A significant part of CAT(0) geometry and geometric group theory is devoted to CAT(0) complexes, simplicial and cubical complexes being the most important instances. One of the prominent classes of CAT(0) complexes is called Euclidean buildings, which have been thoroughly treated in [1]. In Chapter 8 we shall present the BHV tree space, which is a nice example of a CAT(0) cubical complex. The following two facts are recorded here for future reference.
Theorem 1.2.14 (Bridson’s theorem). If a cubical complex is finite dimensional or locally finite, then it is a complete geodesic space.
There exists a useful criterion, due to M. Gromov, for deciding whether a cubical complex is CAT(0).
Theorem 1.2.15. A finite dimensional cubical complex is a Hadamard space if and only if it is simply connected and the link of each of its vertices is a flag complex.
As an authoritative text on the subject, we recommend [51] by M. Bridson and A. Haefliger.
Constructions of Hadamard spaces
After presenting examples of Hadamard spaces, we now turn to various techniques which enable us to construct new Hadamard spaces out of known ones. We start with the definition of convexity, which will be further explored in Chapter 2. Let (H, 𝑑) be a Hadamard space. A set 𝐶 ⊂ H is convex if, given 𝑥, 𝑦 ∈ 𝐶, we have [𝑥, 𝑦] ⊂ 𝐶. We say that a function 𝑓 : H → (−∞, ∞] is convex if, for every geodesic 𝛾 : [0, 1] → H, the function 𝑓 ∘ 𝛾 is convex, that is,
𝑓 (𝛾𝑡 ) ≤ (1 − 𝑡)𝑓 (𝛾0 ) + 𝑡𝑓 (𝛾1 ) , for every 𝑡 ∈ (0, 1).
Subspaces
A closed convex subset of a Hadamard space with the restricted metric is itself a Hadamard space.
Gluing along convex sets
Let (H𝑖 , 𝑑𝑖 )𝑖∈𝐼 be a family of Hadamard spaces (with an arbitrary index set 𝐼) and 𝐶 be a Hadamard space. Suppose 𝐶𝑖 ⊂ H𝑖 are convex closed and 𝐹𝑖 : 𝐶 → 𝐶𝑖 are isometries, for each 𝑖 ∈ 𝐼. Define the equivalence relation 𝐹𝑖 (𝑥) ∼ 𝐹𝑗 (𝑥) for every 𝑥 ∈ 𝐶 and 𝑖, 𝑗 ∈ 𝐼. Then the quotient of the disjoint union
\[ \mathcal{H} := \bigsqcup_{i\in I}\mathcal{H}_i \,/\sim \]
is said to be constructed by gluing of (H𝑖 , 𝑑𝑖 ) along 𝐶 and it is a Hadamard space.
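A simple instance of gluing is a tripod: copies of the half-line [0, ∞) glued along the single point 𝐶 = {0}. The sketch below models a point as a pair (ray index, distance from the glued point) and assumes the usual quotient metric, in which a path between different rays passes through the glued point; this metric is not spelled out in the text, and the names `d`, `midpoint` and `cn_gap` are ours. It then tests inequality (1.2.2) at 𝑡 = 1/2 on random triples.

```python
import random

def d(p, q):
    """Distance on the glued space: points are (ray index, distance from the glued point)."""
    (i, s), (j, t) = p, q
    return abs(s - t) if i == j else s + t

def midpoint(p, q):
    """Midpoint of the geodesic from p to q."""
    (i, s), (j, t) = p, q
    if i == j:
        return (i, (s + t) / 2)
    half = (s + t) / 2
    # Walk `half` from p towards the glued point, then continue along ray j if needed.
    return (i, s - half) if half <= s else (j, half - s)

def cn_gap(x, y, z):
    """Right-hand side minus left-hand side of (1.2.2) at t = 1/2; nonnegative in a CAT(0) space."""
    m = midpoint(x, y)
    return 0.5 * d(z, x) ** 2 + 0.5 * d(z, y) ** 2 - 0.25 * d(x, y) ** 2 - d(z, m) ** 2

rng = random.Random(0)
samples = [tuple((rng.randrange(3), 5 * rng.random()) for _ in range(3)) for _ in range(1000)]
print(min(cn_gap(x, y, z) for x, y, z in samples) >= -1e-9)
```

The gap vanishes when all three points lie on one ray (a Euclidean segment) and is strictly positive when the geodesic bends through the glued point.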
Products
Let (H𝑖 , 𝑑𝑖 ) be a Hadamard space for every 𝑖 = 1, . . . , 𝑁, where 𝑁 ∈ ℕ. Define H := H1 × ⋅ ⋅ ⋅ × H𝑁 and for every 𝑥 := (𝑥1 , . . . , 𝑥𝑁 ), 𝑦 := (𝑦1 , . . . , 𝑦𝑁 ) ∈ H put
\[ d(x,y) := \left(\sum_{i=1}^{N} d_i(x_i,y_i)^{2}\right)^{\frac12}. \]
Then (H, 𝑑) is a Hadamard space. We can extend this construction for an infinite index set and add weights. Assume (H𝑖 , 𝑑𝑖 , 𝑜𝑖 ) are pointed metric spaces and 𝑤𝑖 > 0 for each 𝑖 ∈ 𝐼, where 𝐼 is an arbitrary index set. Then the weighted product of these spaces is defined as
\[ \mathcal{H} := \left\{x\in\prod_{i\in I}\mathcal{H}_i \;:\; \sum_{i\in I} w_i\, d_i(x_i,o_i)^{2} < \infty\right\}, \tag{1.2.10} \]
with the metric
\[ d(x,y) := \left(\sum_{i\in I} w_i\, d_i(x_i,y_i)^{2}\right)^{\frac12}. \]
It is easy to show that (H, 𝑑) is a Hadamard space if all the factors are such; see Exercise 1.8.
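For a finite index set the weighted product metric is straightforward to implement. The following minimal sketch builds it from a list of factor metrics; the function name `product_metric` and the default of unit weights are our own choices, and the infinite case (1.2.10) is not covered.

```python
import math

def product_metric(metrics, weights=None):
    """Return the weighted l2 product metric of finitely many factor metrics."""
    if weights is None:
        weights = [1.0] * len(metrics)

    def d(x, y):
        # x and y are tuples with one coordinate per factor.
        return math.sqrt(sum(w * di(xi, yi) ** 2
                             for w, di, xi, yi in zip(weights, metrics, x, y)))
    return d

def line(a, b):
    """Metric of the real line."""
    return abs(a - b)

plane = product_metric([line, line])
print(plane((0.0, 0.0), (3.0, 4.0)))  # Euclidean distance in R x R
```

With both factors equal to the real line and unit weights, the product is the Euclidean plane, as the printed value illustrates.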
Warped products
Let (H1 , 𝑑1 ) and (H2 , 𝑑2 ) be Hadamard spaces and 𝑓 : H1 → (0, ∞) be a convex continuous function. The length of a path (𝛽1 , 𝛽2 ) in H1 × H2 is given as
\[ \sup \sum_{i=1}^{n}\left[d_1\big(\beta_1(t_{i-1}),\beta_1(t_i)\big)^{2} + f\big(\beta_1(t_{i-1})\big)^{2}\, d_2\big(\beta_2(t_{i-1}),\beta_2(t_i)\big)^{2}\right]^{\frac12}, \]
where we take supremum over the set of all partitions 0 = 𝑡0 < ⋅ ⋅ ⋅ < 𝑡𝑛 = 1 of [0, 1], with an arbitrary 𝑛 ∈ ℕ. We then define the distance between two points in H1 × H2 as the infimum of pathlengths between these two points. The resulting space, called the warped product of H1 and H2 with respect to the function 𝑓, is a Hadamard space and is denoted H1 ×𝑓 H2 .
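The path length above can be approximated by evaluating the sum on one fine partition. The sketch below uses a uniform partition, which yields one of the sums over which the supremum is taken and hence a lower bound for the length; the function name `warped_length` and the choice 𝑓 = cosh on H1 = H2 = ℝ are illustrative assumptions on our part.

```python
import math

def warped_length(beta1, beta2, d1, d2, f, n=1000):
    """Sum from the definition of warped path length over the uniform partition t_i = i/n."""
    total = 0.0
    for i in range(1, n + 1):
        s, t = (i - 1) / n, i / n
        a = d1(beta1(s), beta1(t))
        b = d2(beta2(s), beta2(t))
        total += math.sqrt(a * a + f(beta1(s)) ** 2 * b * b)
    return total

def line(a, b):
    return abs(a - b)

# Path with constant first coordinate 1 and second coordinate running from 0 to 1.
print(warped_length(lambda t: 1.0, lambda t: t, line, line, math.cosh))  # about cosh(1)
# With f identically 1 the construction reduces to the ordinary product.
print(warped_length(lambda t: 3 * t, lambda t: 4 * t, line, line, lambda r: 1.0))  # about 5
```

The distance between two points would then be obtained by minimizing such lengths over paths, which this sketch does not attempt.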
Ultraproducts and ultrapowers
A finitely additive probability measure 𝜔 on ℕ is called a nonprincipal ultrafilter on ℕ if every set 𝑆 ⊂ ℕ is measurable, the value of 𝜔(𝑆) is either 0 or 1, and 𝜔(𝑆) = 0 whenever 𝑆 is finite. The existence of nonprincipal ultrafilters is a standard result, which follows from Zorn’s lemma². There is however no distinguished nonprincipal ultrafilter on ℕ and we can therefore choose an arbitrary one, let us denote it 𝜔. On the other hand one must remember that all the objects in the following construction do depend on this choice. Ultralimits are limits with respect to a nonprincipal ultrafilter; more precisely, given a bounded sequence (𝑎𝑛 ) ⊂ ℝ, there exists a unique number 𝑙 ∈ ℝ such that
\[ \omega\{n\in\mathbb{N} : |a_n - l| < \varepsilon\} = 1, \]
for every 𝜀 > 0. We say 𝑙 is the ultralimit of (𝑎𝑛 ) and use the notation 𝑙 = 𝜔-lim𝑛 𝑎𝑛 . Let (𝑋𝑛 , 𝑑𝑛 , 𝑜𝑛 )𝑛∈ℕ be a sequence of pointed metric spaces. Let 𝑋 denote the set of sequences (𝑥𝑛 ), where 𝑥𝑛 ∈ 𝑋𝑛 and sup𝑛∈ℕ 𝑑𝑛 (𝑥𝑛 , 𝑜𝑛 ) < ∞. Next define
\[ x\sim y \quad\text{if}\quad \omega\text{-}\lim_{n} d_n(x_n,y_n) = 0, \]
which is an equivalence relation on 𝑋. Denote the quotient 𝑋𝜔 = 𝑋/ ∼ and put
\[ d_\omega\big((x_n),(y_n)\big) := \omega\text{-}\lim_{n} d_n(x_n,y_n). \]
Then (𝑋𝜔 , 𝑑𝜔 ) is a complete metric space which we call the ultralimit, or ultraproduct, of (𝑋𝑛 , 𝑑𝑛 , 𝑜𝑛 ). For a constant sequence of spaces, we use the term ultrapower. If all (𝑋𝑛 , 𝑑𝑛 ) are length spaces (geodesic spaces, respectively), so is the ultralimit. The ultralimit of a sequence of pointed CAT(0) spaces is a Hadamard space.
Example 1.2.16 (Asymptotic cone). Let (𝑋, 𝑑) be a metric space and 𝑜 ∈ 𝑋. The ultralimit of the sequence (𝑋, (1/𝑛) 𝑑, 𝑜) is called the asymptotic cone of 𝑋 at the point 𝑜.
² We assume Zorn’s lemma here. In this connection recall Jerry Bona’s famous quote: “The axiom of choice is obviously true, the well-ordering principle obviously false, and who can tell about Zorn’s lemma?”
Gromov–Hausdorff limits Recall that the Gromov–Hausdorff distance between compact metric spaces 𝑋, 𝑌 is
𝑑GH (𝑋, 𝑌) := inf 𝑑H (𝑓(𝑋), 𝑔(𝑌)) , where we take inf over all metric spaces 𝑍 and all isometries 𝑓 : 𝑋 → 𝑍 and 𝑔 : 𝑌 → 𝑍. The convergence induced by 𝑑GH is called Gromov–Hausdorff. Note 𝑋 and 𝑌 are isometric if and only if 𝑑GH (𝑋, 𝑌) = 0. The pointed Gromov–Hausdorff convergence is an appropriate analog of the Gromov–Hausdorff convergence for locally compact spaces. A sequence of pointed metric spaces (𝑋𝑛 , 𝑑𝑛 , 𝑜𝑛 ) converges to (𝑋, 𝑑, 𝑜) if for every 𝑟 > 0 we have
𝑑GH (𝐵 (𝑜𝑛, 𝑟) , 𝐵 (𝑜, 𝑟)) → 0 . Both the Gromov–Hausdorff and pointed Gromov–Hausdorff convergences preserve the property of being Hadamard. In particular, this applies to a sequence of Hadamard manifolds, which can be actually regarded as one of the motivations for defining Hadamard spaces.
If (𝑋𝑛 , 𝑑𝑛 , 𝑜𝑛 ) is a sequence of locally compact metric spaces, then its pointed Gromov–Hausdorff limit is isometric to its ultralimit with respect to an arbitrary ultrafilter. Ultralimits are hence a faithful generalization of the pointed Gromov–Hausdorff convergence for spaces which are not locally compact. For another such generalization called an asymptotic relation, see [124, Definition 3.1].
Universal coverings
We will now show how to construct a Hadamard space out of a locally Hadamard space. Recall that a metric space is locally Hadamard if it is connected and each of its points has a neighborhood which is a Hadamard space. Let $X$ be a connected topological space. A connected and simply connected topological space $\tilde{X}$ is called a universal covering of $X$ if there exists a continuous surjective mapping $\pi : \tilde{X} \to X$ such that each $x \in X$ has an open neighborhood $U$ with the property that $\pi^{-1}(U)$ is a disjoint union of open sets in $\tilde{X}$ each of which is mapped homeomorphically onto $U$ by $\pi$. A universal covering is unique. Given a path $\gamma : [0,1] \to X$ and a point $x_0 \in \pi^{-1}(\gamma_0)$, there exists a unique path $\tilde{\gamma} : [0,1] \to \tilde{X}$ such that $\tilde{\gamma}_0 = x_0$ and $\pi(\tilde{\gamma}_t) = \gamma_t$ for $t \in [0,1]$. The path $\tilde{\gamma}$ is called a lift of $\gamma$ through $x_0$. We define the distance $\tilde{d}$ to be the induced length metric on $\tilde{X}$. Hadamard spaces also arise as universal coverings of locally Hadamard spaces by a version of the Cartan–Hadamard theorem [4]. More precisely, if $(\mathcal{H}, d)$ is a locally Hadamard space, then its universal covering $(\tilde{\mathcal{H}}, \tilde{d})$ is a Hadamard space.
Tangent cones Unlike Riemannian manifolds, a geodesic metric space has no tangent spaces. It is however possible to construct a tangent cone at a given point. It turns out that a (completed) tangent cone of a locally Hadamard space is again a Hadamard space. Let us start with the notion of a Euclidean cone over a metric space. Given a metric space (𝑋, 𝑑), we define the Euclidean cone over 𝑋 by
\[ \operatorname{cone} X := \big([0,\infty)\times X\big)/\sim , \]
where the equivalence relation ∼ is defined by (𝑠, 𝑥) ∼ (𝑡, 𝑦) if (𝑠 = 𝑡 = 0) or (𝑠 = 𝑡 and 𝑥 = 𝑦). For a pair of points (𝑠, 𝑥) and (𝑡, 𝑦) in cone 𝑋, we set
\[ d_{\mathrm{cone}}\big((s,x),(t,y)\big) := \left[s^{2}+t^{2}-2st\cos\big(\min\{d(x,y),\pi\}\big)\right]^{\frac12}. \tag{1.2.11} \]
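Formula (1.2.11) is the law of cosines with the angle capped at 𝜋. As a minimal sketch (the names `cone_distance`, `circle` and `line` are ours), the code below evaluates it and compares the cone over a circle of length 2𝜋 with the Euclidean plane in polar coordinates.

```python
import math

def cone_distance(d, p, q):
    """Euclidean cone metric (1.2.11) for points p = (s, x), q = (t, y) over a metric d."""
    (s, x), (t, y) = p, q
    angle = min(d(x, y), math.pi)
    # Clamp tiny negative rounding errors before the square root.
    return math.sqrt(max(0.0, s * s + t * t - 2 * s * t * math.cos(angle)))

def circle(a, b):
    """Arc-length metric on the circle R / 2*pi*Z."""
    r = abs(a - b) % (2 * math.pi)
    return min(r, 2 * math.pi - r)

def line(a, b):
    return abs(a - b)

# Cone over the circle: the distance agrees with the Euclidean one in polar coordinates.
p, q = (2.0, 0.3), (3.0, 1.1)
euclid = math.hypot(2 * math.cos(0.3) - 3 * math.cos(1.1), 2 * math.sin(0.3) - 3 * math.sin(1.1))
print(cone_distance(circle, p, q), euclid)
# Points at distance at least pi in X are joined through the apex: the distance is s + t.
print(cone_distance(line, (2.0, 0.0), (3.0, 10.0)))
```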
Then $(\operatorname{cone} X, d_{\mathrm{cone}})$ is a metric space which is complete if and only if $(X, d)$ is such. Next we define the space of directions. Let now $(X, d)$ be a geodesic space and fix a point $p \in X$. Two geodesics $\gamma, \eta$ issuing from $p$ are said to determine the same direction if their angle $\alpha(\gamma, \eta) = 0$. By Lemma 1.2.7, the angle determines an equivalence relation and the space of directions at the point $p$ is the quotient of the set of geodesics issuing from $p$ with respect to this equivalence relation. Lemma 1.2.7 also gives that $\alpha(\cdot,\cdot)$ is a metric on the space of directions. We denote the space of directions at $p$ by $\operatorname{dir}_p X$ and its completion with respect to the metric $\alpha$ by $\overline{\operatorname{dir}_p X}$. Finally, we define the tangent cone at $p$ as the Euclidean cone over $\operatorname{dir}_p X$, that is $T_p X := \operatorname{cone}(\operatorname{dir}_p X)$, and the completed tangent cone at $p$ as the Euclidean cone over $\overline{\operatorname{dir}_p X}$, that is $\overline{T_p X} := \operatorname{cone}(\overline{\operatorname{dir}_p X})$. As a matter of fact, the completed tangent cone at $p$ can be equivalently obtained as the completion of the tangent cone $T_p X$ with respect to the cone metric (1.2.11). This metric on $T_p X$ or $\overline{T_p X}$ is usually denoted by $d_p$.
Theorem 1.2.17. Let $(\mathcal{H}, d)$ be a locally Hadamard space. The completed tangent cone $\overline{T_p \mathcal{H}}$ at a point $p \in \mathcal{H}$ is a Hadamard space.
The previous theorem will be used in the proof of Theorem 4.1.2. For further information on tangent cones and the proof of the above theorem, see [51].
Nonlinear Lebesgue spaces
We will now define the spaces $L^p(\Omega, \mathcal{H})$ of mappings from a probability space $(\Omega, \mathcal{F}, \mu)$ to a Hadamard space $(\mathcal{H}, d)$. Here $\mathcal{F}$ denotes a $\sigma$-algebra on $\Omega$ and $\mu$ is a probability measure. On $\mathcal{H}$ we consider the Borel $\sigma$-algebra. Given two measurable maps $f : \Omega \to \mathcal{H}$ and $g : \Omega \to \mathcal{H}$, we put
\[
d_p(f,g) := \begin{cases}
\left[\displaystyle\int_\Omega d\big(f(\omega),g(\omega)\big)^{p}\,\mathrm{d}\mu(\omega)\right]^{\frac1p}, & \text{if } p\in[1,\infty), \\[2mm]
\operatorname{ess\,sup}_{\omega\in\Omega}\, d\big(f(\omega),g(\omega)\big), & \text{if } p=\infty .
\end{cases}
\]
The space $\mathcal{H}$ can be identified with constant maps from $\Omega$ to $\mathcal{H}$. We will say that an $\mathcal{F}$-measurable map $f : \Omega \to \mathcal{H}$ belongs to the space $L^p(\Omega, \mathcal{H})$ if $d_p(f, x) < \infty$, for some (or equivalently any) point $x \in \mathcal{H}$. More precisely, the space $L^p(\Omega, \mathcal{H})$ contains equivalence classes of such maps, that is, we take the quotient with respect to
\[ f\sim g \quad\text{if}\quad d_p(f,g)=0 . \]
Then $d_p$ becomes a metric on $L^p(\Omega, \mathcal{H})$. Notice that the space $L^p(\Omega, \mathcal{H})$ of course depends on $\Omega$, $\mathcal{F}$, $\mu$, $\mathcal{H}$ and $d$, but we explicitly write only $\Omega$ and $\mathcal{H}$ if no ambiguity is possible. In Section 7.3, we will need to specify the $\sigma$-algebra as well.
Proposition 1.2.18. The metric space $(L^p(\Omega, \mathcal{H}), d_p)$ is complete. If $p = 2$, it is Hadamard.
Proof. Step 1: We first show completeness. Let $(f_n)$ be a Cauchy sequence in $L^p(\Omega, \mathcal{H})$. Select a subsequence such that
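\[ d_p\big(f_{n_k}, f_{n_{k+1}}\big) \le 2^{-k-1}, \]
for every $k \in \mathbb{N}$, and define the mapping $g_l(\omega) := \sum_{k=l}^{\infty} d\big(f_{n_k}(\omega), f_{n_{k+1}}(\omega)\big)$, for each $l \in \mathbb{N}$ and almost every $\omega \in \Omega$. Then $g_l : \Omega \to [0, \infty]$ and its $L^p$ norm satisfies
\[ \|g_l\|_{L^p} \le \sum_{k=l}^{\infty} d_p\big(f_{n_k}, f_{n_{k+1}}\big) \le \sum_{k=l}^{\infty} 2^{-k-1} = 2^{-l}. \]
Thus $g_l \to 0$ in $L^p$ and consequently $g_l \to 0$ almost everywhere on $\Omega$. Since
\[ d\big(f_{n_j}(\omega), f_{n_l}(\omega)\big) \le \sum_{k=l}^{j-1} d\big(f_{n_k}(\omega), f_{n_{k+1}}(\omega)\big) \le g_l(\omega), \]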
for almost every 𝜔 ∈ 𝛺, the sequence (𝑓𝑛𝑙 (𝜔)) is Cauchy. There exists a point 𝑓(𝜔) ∈ H such that 𝑓𝑛𝑙 (𝜔) → 𝑓(𝜔) as 𝑙 → ∞, for almost every 𝜔 ∈ 𝛺. Moreover, we have
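\[ d\big(f(\omega), f_{n_l}(\omega)\big) \le g_l(\omega) \to 0, \]
which implies that $f : \Omega \to \mathcal{H}$ is measurable and also $d_p(f, f_{n_l}) \le \|g_l\|_{L^p} \to 0$. Thus $f_{n_l} \to f$ in $L^p(\Omega, \mathcal{H})$, and since $(f_n)$ is Cauchy, also $f_n \to f$ in $L^p(\Omega, \mathcal{H})$. This shows the first statement.
Step 2: Let us now consider the case $p = 2$. For $f_0, f_1 \in L^2(\Omega, \mathcal{H})$ and almost every $\omega \in \Omega$, let $t \mapsto f_t(\omega)$ be the geodesic connecting $f_0(\omega)$ and $f_1(\omega)$, hence
\[ d\big(f_s(\omega), f_t(\omega)\big) = |t-s|\, d\big(f_0(\omega), f_1(\omega)\big), \tag{1.2.12} \]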
for every 𝑠, 𝑡 ∈ [0, 1]. By Lemma 1.2.2, the geodesic depends continuously on the endpoints 𝑓0 (𝜔) and 𝑓1 (𝜔), and thus in a measurable way on 𝜔. We can integrate (1.2.12) to get
\[ d_2(f_s, f_t) = |t-s|\, d_2(f_0, f_1). \]
This implies that, for each $t \in [0, 1]$, the mapping $f_t : \Omega \to \mathcal{H}$ belongs to $L^2(\Omega, \mathcal{H})$ and $t \mapsto f_t$ is a geodesic in $L^2(\Omega, \mathcal{H})$. For $g \in L^2(\Omega, \mathcal{H})$ and almost every $\omega \in \Omega$, we have by (1.2.2) that
\[ d\big(g(\omega), f_t(\omega)\big)^{2} \le (1-t)\, d\big(g(\omega), f_0(\omega)\big)^{2} + t\, d\big(g(\omega), f_1(\omega)\big)^{2} - t(1-t)\, d\big(f_0(\omega), f_1(\omega)\big)^{2}, \]
and after integration,
\[ d_2(g, f_t)^{2} \le (1-t)\, d_2(g, f_0)^{2} + t\, d_2(g, f_1)^{2} - t(1-t)\, d_2(f_0, f_1)^{2}, \]
which completes the proof.
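On a finite probability space the integral in the definition of 𝑑2 is a weighted sum, so Step 2 can be illustrated directly. The sketch below takes H = ℝ² with the Euclidean metric, where pointwise geodesics are straight segments and (1.2.2) holds with equality; the sample data and the names `d2`, `geodesic` and `cn_gap` are made up for illustration.

```python
import math

def d2(f, g, mu, d=math.dist):
    """L^2 distance of maps f, g : Omega -> H on a finite probability space mu."""
    return math.sqrt(sum(mu[w] * d(f[w], g[w]) ** 2 for w in mu))

def geodesic(f0, f1, t):
    """Pointwise geodesic f_t in H = R^2 (straight segments)."""
    return {w: tuple((1 - t) * a + t * b for a, b in zip(f0[w], f1[w])) for w in f0}

mu = {"a": 0.5, "b": 0.3, "c": 0.2}
f0 = {"a": (0.0, 0.0), "b": (1.0, 2.0), "c": (-1.0, 0.5)}
f1 = {"a": (2.0, 1.0), "b": (0.0, 0.0), "c": (3.0, -1.0)}
g = {"a": (1.0, 1.0), "b": (-2.0, 0.0), "c": (0.0, 4.0)}

s, t = 0.25, 0.75
fs, ft = geodesic(f0, f1, s), geodesic(f0, f1, t)
lhs = d2(fs, ft, mu)
rhs = abs(t - s) * d2(f0, f1, mu)
# Right-hand side minus left-hand side of the integrated inequality; zero in the Euclidean case.
cn_gap = ((1 - t) * d2(g, f0, mu) ** 2 + t * d2(g, f1, mu) ** 2
          - t * (1 - t) * d2(f0, f1, mu) ** 2 - d2(g, ft, mu) ** 2)
print(lhs, rhs, cn_gap)
```

Replacing `math.dist` and `geodesic` by the metric and geodesics of another Hadamard space would turn the final equality into an inequality.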
1.3 Equivalent conditions for CAT(0)
We first give equivalent conditions for a complete metric space to be CAT(0) in Theorem 1.3.2. Then in Theorem 1.3.3 we give equivalent conditions for a geodesic metric space to be CAT(0).
Definition 1.3.1 (4-point property). A metric space $(X, d)$ is said to have the 4-point property if for every $x_1, x_2, y_1, y_2 \in X$ there exist $\bar{x}_1, \bar{x}_2, \bar{y}_1, \bar{y}_2 \in \mathbb{R}^2$ such that $d(x_i, y_j) = \|\bar{x}_i - \bar{y}_j\|$ for $i, j \in \{1, 2\}$, and $d(x_1, x_2) \le \|\bar{x}_1 - \bar{x}_2\|$ and $d(y_1, y_2) \le \|\bar{y}_1 - \bar{y}_2\|$. Such a quadruple of points of $\mathbb{R}^2$ is called a subembedding of $x_1, x_2, y_1, y_2$.
Theorem 1.3.2. Let (𝑋, 𝑑) be a complete metric space. The following conditions are equivalent:
(i) The space (𝑋, 𝑑) is CAT(0).
(ii) Each pair of points in 𝑋 has approximate metric midpoints and 𝑋 enjoys the 4-point property.
(iii) For every pair of points 𝑥, 𝑦 ∈ 𝑋 there exists 𝑚 ∈ 𝑋 such that for each 𝑧 ∈ 𝑋 we have
\[ d(m,z)^{2} \le \tfrac12 d(x,z)^{2} + \tfrac12 d(y,z)^{2} - \tfrac14 d(x,y)^{2}. \]
(iv) For every pair of points 𝑥, 𝑦 ∈ 𝑋 and 𝜀 > 0 there exists 𝑚 ∈ 𝑋 such that for each 𝑧 ∈ 𝑋 we have
\[ d(m,z)^{2} \le \tfrac12 d(x,z)^{2} + \tfrac12 d(y,z)^{2} - \tfrac14 d(x,y)^{2} + \varepsilon. \tag{1.3.13} \]
Proof. Step 1: We start with the implication (i) ⇒ (ii). Each pair of points has a midpoint, hence it suffices to show the 4-point property. Choose four points $x_1, y_1, x_2, y_2 \in X$ and consider the quadrilateral $Q \subset \mathbb{R}^2$ formed by comparison triangles $\triangle(\bar{x}_1, \bar{x}_2, \bar{y}_1)$ and $\triangle(\bar{x}_1, \bar{x}_2, \bar{y}_2)$ with a common edge $[\bar{x}_1, \bar{x}_2]$ and with $\bar{y}_1$ and $\bar{y}_2$ on opposite sides of the line containing $[\bar{x}_1, \bar{x}_2]$. If $Q$ is convex, then the diagonals $[\bar{x}_1, \bar{x}_2]$ and $[\bar{y}_1, \bar{y}_2]$ intersect at a point $\bar{z}$. Let $z \in [x_1, x_2]$ be the point such that $d(x_1, z) = \|\bar{x}_1 - \bar{z}\|$. By the definition of CAT(0) we get
\[ d(y_1,y_2) \le d(y_1,z) + d(z,y_2) \le \|\bar{y}_1-\bar{z}\| + \|\bar{z}-\bar{y}_2\| = \|\bar{y}_1-\bar{y}_2\| , \]
which shows that the vertices of $Q$ form a subembedding of $x_1, y_1, x_2, y_2$. If $Q$ is not convex, then an easy application of Alexandrov’s lemma (Exercise 1.3) yields a quadruple of points of $\mathbb{R}^2$ satisfying the condition of the 4-point property.
Step 2: To prove the implication (ii) ⇒ (iii), we first show that (ii) ensures the existence of metric midpoints. Given a pair of points $x, y \in X$, choose a sequence of approximate metric midpoints $(m_i)_{i\in\mathbb{N}}$ such that
\[ \max\{d(x,m_i),\, d(y,m_i)\} \le \tfrac12 d(x,y) + \tfrac1i . \]
We need to show that this sequence is Cauchy. Let $(\bar{x}, \bar{m}_i, \bar{y}, \bar{m}_j)$ be a subembedding of $(x, m_i, y, m_j)$. Then
\[ d(x,y) \le \|\bar{x}-\bar{y}\| \le \|\bar{x}-\bar{m}_i\| + \|\bar{m}_i-\bar{y}\| = d(x,m_i) + d(m_i,y) \le d(x,y) + \tfrac2i , \]
which implies that the sequence $(\bar{m}_i)$ converges to the metric midpoint of $[\bar{x}, \bar{y}]$. Since $d(m_i, m_j) \le \|\bar{m}_i - \bar{m}_j\|$, we get that $(m_i)$ is Cauchy, and hence converges to a point $m \in X$, which is clearly a metric midpoint of $[x, y]$. It remains to show that for each $z \in X$, the inequality in (iii) holds. Find a subembedding $(\bar{z}, \bar{x}, \bar{m}, \bar{y}) \subset \mathbb{R}^2$ of $(z, x, m, y)$. Since $d(z, m) \le \|\bar{z} - \bar{m}\|$, we get the inequality in (iii) by an elementary calculation in $\mathbb{R}^2$.
Step 3: We prove (iii) ⇒ (i). Applying (iii) with $z := x$ and then with $z := y$ gives the existence of metric midpoints. Proposition 1.1.3 then yields that $X$ is a geodesic space. The proof that it has the CAT(0) property is identical to that of the implications (ii) ⇒ (iii) and (iii) ⇒ (i) of Theorem 1.3.3 below.
Step 4: The implication (iii) ⇒ (iv) is trivial. We will prove its converse. For each $\varepsilon > 0$, the condition in (iv) gives a point $m_\varepsilon$ satisfying (1.3.13). If we apply (1.3.13) first with $z := x$ and then with $z := y$, and add the resulting inequalities up, we obtain
\[ \tfrac12 d(m_\varepsilon,x)^{2} + \tfrac12 d(m_\varepsilon,y)^{2} \le \tfrac14 d(x,y)^{2} + \varepsilon . \]
On the other hand, invoking the condition in (iv) with $\delta > 0$ and $z := m_\varepsilon$ yields
\[ d(m_\delta,m_\varepsilon)^{2} \le \tfrac12 d(x,m_\varepsilon)^{2} + \tfrac12 d(y,m_\varepsilon)^{2} - \tfrac14 d(x,y)^{2} + \delta \le \varepsilon+\delta , \]
which shows that (𝑚𝜀 ) is Cauchy and one can readily verify that its limit point is the desired point 𝑚 in (iii).
Using the condition in Theorem 1.3.2 (iii) as a definition of a Hadamard space is probably most elegant. One then shows the existence of geodesics as a consequence of the definition. We however choose to use the classical definition from comparison geometry. We will now state the second theorem which gives equivalent conditions for a geodesic space to be CAT(0).
Theorem 1.3.3. Let (𝑋, 𝑑) be a geodesic space. The following assertions are equivalent:
(i) The space (𝑋, 𝑑) is CAT(0).
(ii) For every 𝑥, 𝑦, 𝑧 ∈ 𝑋 we have
\[ d(x,m)^{2} \le \tfrac12 d(x,y)^{2} + \tfrac12 d(x,z)^{2} - \tfrac14 d(y,z)^{2}, \tag{1.3.14} \]
where 𝑚 is the midpoint of [𝑦, 𝑧].
(iii) For every geodesic 𝑥 : [0, 1] → 𝑋 and every point 𝑝 ∈ 𝑋 we have
\[ d(p,x_t)^{2} \le (1-t)\,d(p,x_0)^{2} + t\,d(p,x_1)^{2} - t(1-t)\,d(x_0,x_1)^{2}. \]
(iv) The angle between the sides of every geodesic triangle in 𝑋 is no greater than the angle between the corresponding sides of its comparison triangle.
(v) For every 𝑥, 𝑦, 𝑢, 𝑣 ∈ 𝑋 we have
\[ d(x,u)^{2} + d(y,v)^{2} \le d(x,y)^{2} + d(u,v)^{2} + 2\,d(x,v)\,d(y,u). \tag{1.3.15} \]
(vi) For every 𝑥, 𝑦, 𝑢, 𝑣 ∈ 𝑋 we have
\[ d(x,u)^{2} + d(y,v)^{2} \le d(x,y)^{2} + d(y,u)^{2} + d(u,v)^{2} + d(v,x)^{2}. \tag{1.3.16} \]
(vii) The space (𝑋, 𝑑) is Busemann, and
\[ d(x,y)\,d(u,v) \le d(x,u)\,d(y,v) + d(x,v)\,d(y,u), \tag{1.3.17} \]
for all $x, y, u, v \in X$.
Proof. The implication (i) ⇒ (ii) follows from (1.2.3). To see that (ii) implies (iii) it suffices to show the inequality in (iii) for all dyadic $t \in [0, 1]$. We will proceed by induction. The inequality obviously holds for $t = 0$ and $t = 1$. Assume it holds for each $t = k/2^n$, where $k = 1, \dots, 2^n$ and $n \in \mathbb{N}$ is fixed. We want to show it holds also for $t = k/2^{n+1}$, where $k = 1, \dots, 2^{n+1}$. For $k$ even it is just the assumption. Fix $t := k/2^{n+1}$ with $k$ odd and put $s := 2^{-n-1}$. By (ii) we have
\[ d(p,x_t)^{2} \le \tfrac12 d(p,x_{t-s})^{2} + \tfrac12 d(p,x_{t+s})^{2} - \tfrac14 d(x_{t-s},x_{t+s})^{2}, \]
and by the assumption
\[ d(p,x_{t\pm s})^{2} \le (1-t\mp s)\,d(p,x_0)^{2} + (t\pm s)\,d(p,x_1)^{2} - (1-t\mp s)(t\pm s)\,d(x_0,x_1)^{2}, \]
which together gives
\[
\begin{aligned}
d(p,x_t)^{2} &\le (1-t)\,d(p,x_0)^{2} + t\,d(p,x_1)^{2} - \left[s^{2} + \tfrac12(1-t-s)(t+s) + \tfrac12(1-t+s)(t-s)\right] d(x_0,x_1)^{2} \\
&= (1-t)\,d(p,x_0)^{2} + t\,d(p,x_1)^{2} - t(1-t)\,d(x_0,x_1)^{2}.
\end{aligned}
\]
This shows (iii).
To prove (iii) ⇒ (i) we choose a geodesic triangle $\triangle(p, q, r) \subset X$ along with $x \in [p, q]$ and $y \in [p, r]$. Let $\tilde{y}$ denote the corresponding point for $y$ in the comparison triangle $\triangle(\tilde{p}, \tilde{x}, \tilde{r})$ of $\triangle(p, x, r)$, and let $\bar{x}$ and $\bar{y}$ be the comparison points for $x$ and $y$, respectively, in the comparison triangle $\triangle(\bar{p}, \bar{q}, \bar{r})$ of $\triangle(p, q, r)$. By (iii) we have $d(x, y) \le \|\tilde{x} - \tilde{y}\|$ and $d(x, r) \le \|\bar{x} - \bar{r}\|$. These two inequalities give $\bar{\alpha}(x, p, y) \le \bar{\alpha}(x, p, r)$ and $\bar{\alpha}(x, p, r) \le \bar{\alpha}(q, p, r)$, respectively, where $\bar{\alpha}$ denotes the comparison angle, which together yields $\bar{\alpha}(x, p, y) \le \bar{\alpha}(q, p, r)$. Therefore $d(x, y) \le \|\bar{x} - \bar{y}\|$.
The implication (i) ⇒ (iv) follows from the definition of an angle. Let us next show (iv) ⇒ (iii). Recall that (iii) is equivalent to (1.2.1). Choose a geodesic triangle $\triangle(p, q, r)$ and a point $x \in [q, r]$. Fix a geodesic $[p, x]$ and let
$\triangle(\tilde{p}, \tilde{q}, \tilde{x})$ and $\triangle(\tilde{p}, \tilde{r}, \tilde{x})$ be comparison triangles for $\triangle(p, q, x)$ and $\triangle(p, r, x)$, respectively, with a common edge $[\tilde{p}, \tilde{x}]$ and $\tilde{q}$ and $\tilde{r}$ on opposite sides of the line containing $[\tilde{p}, \tilde{x}]$. Since by (iv) we have $\alpha(p, x, r) \le \bar{\alpha}(p, x, r)$ and $\alpha(p, x, q) \le \bar{\alpha}(p, x, q)$, and also $\alpha(q, x, r) = \pi$, we get that $\bar{\alpha}(p, x, r) + \bar{\alpha}(p, x, q) \ge \pi$. Applying Alexandrov’s lemma (Exercise 1.3) yields that $d(p, x) \le \|\bar{p} - \bar{x}\|$, where $\bar{x}$ is the comparison point for $x$ in the comparison triangle $\triangle(\bar{p}, \bar{q}, \bar{r})$ for the geodesic triangle $\triangle(p, q, r)$.
The implications (i) ⇒ (v) ⇒ (vi) follow immediately from Corollary 1.2.5. We will show (vi) implies (ii) in several steps.
Step 1: Inequality (1.3.16) forces $X$ to be uniquely geodesic, for if $\gamma, \eta : [0, 1] \to X$ were two geodesics connecting a pair of points $x, y \in X$, applying (1.3.16) to $x, \gamma_{1/2}, y, \eta_{1/2}$ yields $\gamma_{1/2} = \eta_{1/2}$. Hence $\gamma_{m/2^n} = \eta_{m/2^n}$
for every $n \in \mathbb{N}$ and $m = 1, \dots, 2^n$, which finally gives $\gamma = \eta$.
Step 2: For a triple $x, y, z \in X$ denote $y'$ the midpoint of $[x, y]$ and $z'$ the midpoint of $[x, z]$. We claim that
\[ d(y',z') \le d(y,z). \tag{1.3.18} \]
Indeed, applying (1.3.16) to $x, y', y, z'$ yields
\[ d(x,y)^{2} + d(y',z')^{2} \le \tfrac12 d(x,y)^{2} + d(y,z')^{2} + \tfrac14 d(x,z)^{2}, \]
and similarly we obtain
\[ d(x,z)^{2} + d(y',z')^{2} \le \tfrac12 d(x,z)^{2} + d(y',z)^{2} + \tfrac14 d(x,y)^{2}, \]
which together gives
\[ 2\,d(y',z')^{2} + \tfrac14 d(x,y)^{2} + \tfrac14 d(x,z)^{2} \le d(y,z')^{2} + d(y',z)^{2}. \tag{1.3.19} \]
We next apply (1.3.16) to $y, z, z', y'$ and get
\[ d(y,z')^{2} + d(y',z)^{2} \le d(y,z)^{2} + \tfrac14 d(x,y)^{2} + \tfrac14 d(x,z)^{2} + d(y',z')^{2}. \tag{1.3.20} \]
Combining (1.3.19) and (1.3.20) finishes the claim.
Step 3: For every $w, x, y, z \in X$ we have
\[ d(w,y)^{2} + d(x,z)^{2} \le 2\,d(w,x)^{2} + d(x,y)^{2} + \tfrac12 d(y,z)^{2} + d(z,w)^{2}. \tag{1.3.21} \]
Indeed, let 𝑣 be the midpoint of [𝑦, 𝑧] and apply (1.3.16) to 𝑤, 𝑥, 𝑦, 𝑣 and then to 𝑤, 𝑥, 𝑣, 𝑧, which already gives (1.3.21). Step 4: For every 𝑥, 𝑦, 𝑧 ∈ 𝑋 we have
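\[
\begin{aligned}
& 2\,d(x,y)^{2} + 2\,d(x,z)^{2} - d(y,z)^{2} - 4\,d\!\left(x, \tfrac{y+z}{2}\right)^{2} \\
&\qquad \ge 2\left[2\,d\!\left(\tfrac{x+y}{2}, y\right)^{2} + 2\,d\!\left(\tfrac{x+y}{2}, z\right)^{2} - d(y,z)^{2} - 4\,d\!\left(\tfrac{x+y}{2}, \tfrac{y+z}{2}\right)^{2}\right].
\end{aligned}
\tag{1.3.22}
\]
Indeed, applying (1.3.21) to $\tfrac{y+z}{2}, \tfrac{x+y}{2}, x, z$ gives the desired inequality (1.3.22).
Step 5: Given $x, y, z \in X$, put $w := \tfrac{y+z}{2}$ and $y_n := \left(1 - \tfrac{1}{2^n}\right) y + \tfrac{1}{2^n} x$, for every $n \in \mathbb{N}$. We want to show
\[ 2\,d(x,y)^{2} + 2\,d(x,z)^{2} - d(y,z)^{2} - 4\,d(x,w)^{2} \ge 0, \tag{1.3.23} \]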
which already gives (ii). We first apply (1.3.22) repeatedly to get
\[
\begin{aligned}
& 2\,d(x,y)^{2} + 2\,d(x,z)^{2} - d(y,z)^{2} - 4\,d(x,w)^{2} \\
&\qquad \ge 2^{n}\left[2\,d(y_n,y)^{2} + 2\,d(y_n,z)^{2} - d(y,z)^{2} - 4\,d(y_n,w)^{2}\right].
\end{aligned}
\tag{1.3.24}
\]
Then let $v_n := \tfrac12 y_n + \tfrac12 z$ and apply (1.3.22) to the right hand side of (1.3.24) to get
\[
\begin{aligned}
& 2\,d(y_n,y)^{2} + 2\,d(y_n,z)^{2} - d(y,z)^{2} - 4\,d(y_n,w)^{2} \\
&\qquad \ge 2\left[2\,d(v_n,y)^{2} + 2\,d(v_n,z)^{2} - d(y,z)^{2} - 4\,d(v_n,w)^{2}\right].
\end{aligned}
\tag{1.3.25}
\]
Inequality (1.3.16) applied to $v_n, y, w, z$ gives
\[ d(v_n,w)^{2} + d(y,z)^{2} \le d(v_n,y)^{2} + d(v_n,z)^{2} + \tfrac12 d(y,z)^{2}. \]
The last inequality along with (1.3.18), (1.3.24) and (1.3.25) yields
\[
\begin{aligned}
2\,d(x,y)^{2} + 2\,d(x,z)^{2} - d(y,z)^{2} - 4\,d(x,w)^{2}
&\ge -2^{n+2}\, d(v_n,w)^{2} \\
&\ge -2^{n+2}\, d(y_n,y)^{2} = -2^{2-n}\, d(x,y)^{2}.
\end{aligned}
\]
Taking the limit $n \to \infty$ gives (1.3.23), and completes the proof of (vi) ⇒ (ii).
It remains to show that (i) ⇐⇒ (vii). Inequality (1.3.17) holds for every four points in $\mathbb{R}^2$ according to Remark 1.3.4. Since a CAT(0) space has the 4-point property (see Definition 1.3.1) by Theorem 1.3.2 (completeness is not needed), we immediately get that (i) ⇒ (vii). For the converse implication, the reader is referred to [86, Theorem 1.3].
A consequence of Theorem 1.3.3 (iv) is that the sum of the angles in a geodesic triangle in a Hadamard space is less than or equal to $\pi$.
Remark 1.3.4 (Ptolemy inequality). The classical theorem of Ptolemy states that for every four points $a, b, c, d \in \mathbb{R}^2$ we have that
\[ \|a-b\|\,\|c-d\| \le \|a-c\|\,\|b-d\| + \|a-d\|\,\|b-c\| , \]
and the equality holds if and only if all the four points lie on a common circle. Inequality (1.3.17) is called the Ptolemy inequality and metric spaces where it holds true are called Ptolemaic. Hence a geodesic metric space is CAT(0) if and only if it is Busemann and Ptolemaic.
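Since the Euclidean plane is CAT(0), the four-point inequalities (1.3.15), (1.3.16) and (1.3.17) of Theorem 1.3.3 can be observed on random quadruples in ℝ². The following sketch returns, for each inequality, the right-hand side minus the left-hand side; the function name `four_point_gaps` is ours and the experiment only illustrates the statements, it proves nothing.

```python
import math
import random

def four_point_gaps(x, y, u, v, d=math.dist):
    """Right-hand side minus left-hand side of (1.3.15), (1.3.16) and (1.3.17)."""
    diagonals = d(x, u) ** 2 + d(y, v) ** 2
    gap_15 = d(x, y) ** 2 + d(u, v) ** 2 + 2 * d(x, v) * d(y, u) - diagonals
    gap_16 = d(x, y) ** 2 + d(y, u) ** 2 + d(u, v) ** 2 + d(v, x) ** 2 - diagonals
    gap_17 = d(x, u) * d(y, v) + d(x, v) * d(y, u) - d(x, y) * d(u, v)
    return gap_15, gap_16, gap_17

rng = random.Random(0)

def random_point():
    return (rng.uniform(-1, 1), rng.uniform(-1, 1))

worst = min(min(four_point_gaps(*(random_point() for _ in range(4)))) for _ in range(1000))
print(worst >= -1e-9)

# For the unit square the first two inequalities hold with equality.
print(four_point_gaps((0, 0), (1, 0), (1, 1), (0, 1)))
```

Passing a different metric as `d` allows the same test on other spaces, for instance to see that (1.3.16) can fail outside the CAT(0) setting.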
Part (vi) of Theorem 1.3.3 also deserves more attention. It is quite a remarkable characterization of CAT(0) in terms of inequalities between mutual distances of finitely many points which was asked for by M. Gromov [97, Section 1.19+]. Inequality (1.3.16) holds for instance for the metric space $(Y, \sigma^{1/2})$ where $(Y, \sigma)$ is an arbitrary metric space, and for ultrametric spaces; see [182, Example 1.2]. There is moreover a deeper structural meaning of (1.3.16), which will be stated in Corollary 1.3.6 below.
Definition 1.3.5 (Enflo type). A metric space $(X, d)$ has Enflo type $p \ge 0$ if $p$ is the greatest number for which there is a constant $K \ge 1$ such that for every $N \in \mathbb{N}$ and $\{x_\varepsilon\}_{\varepsilon\in\{-1,1\}^N} \subset X$ we have
\[ \sum_{\varepsilon\in\{-1,1\}^{N}} d(x_\varepsilon, x_{-\varepsilon})^{p} \le K^{p} \sum_{\varepsilon\sim\varepsilon'} d(x_\varepsilon, x_{\varepsilon'})^{p}, \tag{1.3.26} \]
where $\varepsilon = (\varepsilon_1, \dots, \varepsilon_N)$ and $\varepsilon \sim \varepsilon'$ stands for $\sum_{i=1}^{N} |\varepsilon_i - \varepsilon'_i| = 2$. The least such constant $K$ will be denoted by $E_p(X)$.
All metric spaces have Enflo type ≥ 1 and metric spaces with approximate metric midpoints have Enflo type ≤ 2. The reader is invited to prove these two facts in Exercise 1.11. A Ptolemaic metric space $(X, d)$ has Enflo type 2 with $E_2(X) = \sqrt{3}$; see [154, Proposition 5.3]. As an easy corollary of Theorem 1.3.3 (vi) we obtain the Enflo type for CAT(0) spaces.
Corollary 1.3.6. Let $(X, d)$ be a CAT(0) space. Then it has Enflo type 2 with $E_2(X) = 1$.
Proof. Since a CAT(0) space has (approximate metric) midpoints we know that the Enflo type is ≤ 2 by Exercise 1.11. We will prove the converse inequality. Notice that inequality (1.3.26) holds with $N = 2$ by Theorem 1.3.3 (vi). Let us assume it holds for some fixed $N \in \mathbb{N}$ and prove it holds for $N + 1$. We thus want to estimate
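\[ \sum_{\eta\in\{-1,1\}^{N+1}} d(x_\eta, x_{-\eta})^{2} = \sum_{\varepsilon\in\{-1,1\}^{N}} \left[d\big(x_{(\varepsilon,1)}, x_{(-\varepsilon,-1)}\big)^{2} + d\big(x_{(\varepsilon,-1)}, x_{(-\varepsilon,1)}\big)^{2}\right], \]
using Theorem 1.3.3 (vi) gives
\[
\begin{aligned}
\le \sum_{\varepsilon\in\{-1,1\}^{N}} \Big[ & d\big(x_{(\varepsilon,1)}, x_{(\varepsilon,-1)}\big)^{2} + d\big(x_{(-\varepsilon,-1)}, x_{(-\varepsilon,1)}\big)^{2} \\
& + d\big(x_{(\varepsilon,1)}, x_{(-\varepsilon,1)}\big)^{2} + d\big(x_{(\varepsilon,-1)}, x_{(-\varepsilon,-1)}\big)^{2} \Big].
\end{aligned}
\]
By the assumption that the conclusion holds for $N$ we obtain
\[
\begin{aligned}
\le & \sum_{\varepsilon\in\{-1,1\}^{N}} \left[ d\big(x_{(\varepsilon,1)}, x_{(\varepsilon,-1)}\big)^{2} + d\big(x_{(-\varepsilon,-1)}, x_{(-\varepsilon,1)}\big)^{2} \right] \\
& + \sum_{\varepsilon\sim\varepsilon'} \left[ d\big(x_{(\varepsilon,1)}, x_{(\varepsilon',1)}\big)^{2} + d\big(x_{(\varepsilon,-1)}, x_{(\varepsilon',-1)}\big)^{2} \right],
\end{aligned}
\]
which is precisely
\[ = \sum_{\eta\sim\eta'} d(x_\eta, x_{\eta'})^{2}, \]
and the induction step is complete.
Finally, let us remark that yet another equivalent condition for CAT(0) in terms of probability measures is given in Theorem 2.3.5.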
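Inequality (1.3.26) with 𝑝 = 2 and 𝐾 = 1 can be tested numerically in the Euclidean plane, which is a CAT(0) space. The sketch below computes both sides for a family of points indexed by the cube {−1, 1}ᴺ. It assumes that the sum over 𝜀 ∼ 𝜀′ runs over ordered pairs, which is the reading under which the case 𝑁 = 2 reduces to (1.3.16); the name `enflo_sides` is ours.

```python
import math
import random
from itertools import product

def enflo_sides(points, n, d, p=2):
    """Both sides of (1.3.26) with K = 1 for points indexed by the cube {-1, 1}^n."""
    cube = list(product((-1, 1), repeat=n))
    # Left side: distances between antipodal vertices of the cube.
    lhs = sum(d(points[e], points[tuple(-c for c in e)]) ** p for e in cube)
    # Right side: ordered pairs of vertices differing in exactly one coordinate.
    rhs = sum(d(points[e], points[e2]) ** p
              for e in cube for e2 in cube
              if sum(abs(a - b) for a, b in zip(e, e2)) == 2)
    return lhs, rhs

rng = random.Random(1)
n = 3
points = {e: (rng.uniform(-1, 1), rng.uniform(-1, 1)) for e in product((-1, 1), repeat=n)}
lhs, rhs = enflo_sides(points, n, math.dist)
print(lhs <= rhs)
```

For the vertices of a square (𝑁 = 2) both sides coincide, which shows that the constant 𝐾 = 1 cannot be improved in the Euclidean plane.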
Exercises
Exercise 1.1. Let $(X, d)$ be a metric space. Show that for every $x, y, z \in X$ the following inequality holds
\[ \tfrac12 d(x,y)^{2} \le d(x,z)^{2} + d(y,z)^{2}, \tag{1.3.27} \]
with the equality occurring if and only if $z$ is a midpoint of $x$ and $y$.
Hint. For every $a, b, c \in \mathbb{R}$ we have $0 \le (a + b - 2c)^2$, and hence
\[ (a-b)^{2} \le 2a^{2} + 2b^{2} + 4c^{2} - 4ac - 4bc = 2(a-c)^{2} + 2(b-c)^{2}. \]
Conclude that then (1.3.27) holds in $\mathbb{R}^2$ and hence also in any other metric space.
Exercise 1.2. Let $(X, d)$ be a metric space. Show that for every $x, y, z \in X$ and $t \in (0, 1)$ the following inequality holds
\[ t(1-t)\,d(x,y)^{2} \le (1-t)\,d(x,z)^{2} + t\,d(y,z)^{2}. \tag{1.3.28} \]
The equality occurs if and only if $d(x, z) = t\,d(x, y)$ and $d(z, y) = (1 - t)\,d(x, y)$.
Hint. Use the hint in Exercise 1.1 and modify as appropriate.
Exercise 1.3 (Alexandrov’s lemma). Prove the following result in $\mathbb{R}^2$, which is called Alexandrov’s lemma. Consider four points $a, b, b', c \in \mathbb{R}^2$ and assume that $b$ and $b'$ lie on the opposite sides of the line containing $[a, c]$. Denote $\alpha, \beta, \gamma$, and $\alpha', \beta', \gamma'$ the angles at vertices of the triangle $\triangle(a, b, c)$, and $\triangle(a, b', c)$, respectively. Assume $\gamma + \gamma' \ge \pi$. Then
\[ \|b-c\| + \|c-b'\| \le \|b-a\| + \|a-b'\| , \]
and hence there exists a triangle $\triangle(\bar{a}, \bar{b}, \bar{b}')$ such that $\|\bar{a} - \bar{b}\| = \|a - b\|$, and $\|\bar{a} - \bar{b}'\| = \|a - b'\|$, and $\|\bar{b} - \bar{b}'\| = \|b - c\| + \|c - b'\|$. Further, if we denote $\bar{c} \in [\bar{b}, \bar{b}']$ the point with $\|\bar{b} - \bar{c}\| = \|b - c\|$, and $\bar{\alpha}, \bar{\beta}, \bar{\beta}'$ the angles at vertices $\bar{a}, \bar{b}, \bar{b}'$ in the triangle $\triangle(\bar{a}, \bar{b}, \bar{b}')$, respectively, then we have $\alpha + \alpha' \le \bar{\alpha}$, and $\beta \le \bar{\beta}$, and $\beta' \le \bar{\beta}'$.
Exercise 1.4. Show that a Banach space is CAT(0) if and only if it is Hilbert.
Exercise 1.5. Prove Proposition 1.1.2 by modifying the proof of Proposition 1.1.3.
Exercise 1.6. Give an example of
(i) a metric space, which is not a length space,
(ii) a length space, which is not geodesic,
(iii) a geodesic space, which is not uniquely geodesic,
(iv) a uniquely geodesic space, which is not CAT(0).
Exercise 1.7. Show that (1.2.1) and (1.2.2) are equivalent.
Exercise 1.8. Show that a product of Hadamard spaces is a Hadamard space.
Exercise 1.9. Let $(\mathcal{H}, d)$ be a Hadamard space. Show that for every four points $x_1, x_2, x_3, x_4 \in \mathcal{H}$ we have
\[ d(x_1,x_3)^{2} + d(x_2,x_4)^{2} \le d(x_1,x_2)^{2} + d(x_2,x_3)^{2} + d(x_3,x_4)^{2} + d(x_4,x_1)^{2} - 4\,d(m_1,m_2)^{2}, \tag{1.3.29} \]
where $m_1, m_2$ are the midpoints of $[x_1, x_3]$ and $[x_2, x_4]$, respectively. Note that we have equality in (1.3.29) if the space is Euclidean. Also, in the Euclidean case, the points $x_1, x_2, x_3, x_4$ form a parallelogram in $\mathbb{R}^2$ if and only if $m_1 = m_2$. Then (1.3.29) becomes the parallelogram identity.
Hint. Apply (1.2.3) three times: to the point $x_2$ and geodesic $[x_1, x_3]$, to the point $x_4$ and geodesic $[x_1, x_3]$, and finally to the point $m_1$ and geodesic $[x_2, x_4]$.
Exercise 1.10. Prove Lemma 1.2.7.
Hint. See [51, Proposition I.1.14].
Exercise 1.11 (Enflo type). Prove that all metric spaces have Enflo type ≥ 1 and metric spaces with approximate metric midpoints have Enflo type ≤ 2.
Bibliographical remarks
Much of the material of this chapter has already been covered in the authoritative monograph [51] by M. Bridson and A. Haefliger, and the interested reader is referred to it for further details on the geometry of Hadamard spaces. The analytical approach to nonpositive curvature was promoted by K. T. Sturm [195]. For more details on the geometry of Hadamard spaces, we also recommend the books [3, 25, 56, 74, 104] and the survey paper [38]. The notion of a metric midpoint goes back to K. Menger [142]. Propositions 1.1.2 and 1.1.3 in this form appeared in [195, Remark 1.3] and [195, Proposition 1.2], respectively.
H. Busemann presented his approach to nonpositive curvature in geodesic spaces in [57, 58]. For a modern account of Busemann spaces, see [104] and [162]. The property in Definition 1.1.4 was used in spaces with unique geodesic lines to define hyperbolic spaces in the sense of Reich–Shafrir [173]. The definition of nonpositive curvature in the sense of Alexandrov comes from [2]. An overview of the historical developments before and after Alexandrov can be found in [51]. Following M. Gromov [95], we use the term CAT(0) spaces, where CAT is an acronym for Cartan–Alexandrov–Toponogov. Another name under which these spaces appear in the literature is globally NPC spaces, or just NPC spaces. There exist CAT($\kappa$) spaces for any $\kappa \in \mathbb{R}$. The real parameter $\kappa$ denotes the upper curvature bound and in the manifold case coincides with the upper bound of sectional curvature. If a space is of class CAT($\kappa$), then it is also of class CAT($\kappa'$) for each $\kappa' \ge \kappa$; see [51, Theorem II.1.12]. A. D. Alexandrov [2] was also the first to use an angle in geodesic spaces, and therefore it is sometimes referred to as an Alexandrov angle. Lemma 1.2.7 is due to Alexandrov [2] as well; see also [51, Proposition I.1.14]. Proposition 1.2.8 comes from [51, Proposition II.3.3]. Alexandrov's lemma of Exercise 1.3 first appeared in [6] and has become a basic tool in comparison geometry [51, 56]. Lemma 1.2.2 comes from [51, Proposition II.1.4]; we follow an alternative proof from [195, Proposition 2.3]. The statement holds true in Busemann spaces; see [104, Corollary 2.2.5]. The quadrilateral comparison in Proposition 1.2.4 goes back to J. Reshetnyak [179] and was later explored by others [51, 104, 121, 195]. Our proof follows [104, Theorem 2.3.1]. Almost all examples and constructions of Hadamard spaces presented in this chapter have already appeared in [51], and we refer the reader to that book. There exist monographs exclusively on Hadamard manifolds, for instance [26, 74].
As references on the Hilbert ball, we recommend [92, 173]. The proof that the Hilbert ball is CAT(0) is scattered across [92, Lemma 17.1], [186, Lemma 2.2] and [187, Lemma 2.3]. Theorem 1.2.14 comes from M. Bridson's thesis [50], where it appeared in a more general form. Warped products were studied in [5] and in a special case in [59]. Tangent cones are described in detail in [51]. Theorem 1.2.17 is due to I. Nikolaev [153]. Proposition 1.2.18 comes from [192], but nonlinear Lebesgue spaces had already been studied by N. Korevaar and R. Schoen [121] and J. Jost [102, 104, 105]. M. Bridson's thesis [50] is the first place where the various definitions of CAT(0) were carefully proved to be equivalent, after M. Gromov had asserted them to be so in his essay on hyperbolic groups [95]. These equivalences were then presented in the Bridson–Haefliger book [51]. The 4-point condition first appeared in [179]. Its equivalence with CAT(0) in Theorem 1.3.2 is taken from [51, Proposition II.1.11]. For the characterization in Theorem 1.3.2 (iii) we follow [195, Proposition 2.3]. The condition in Theorem 1.3.2 (iii) is due to F. Bruhat and J. Tits [54] and is sometimes referred to as the Bruhat–Tits inequality, or the CN inequality, where CN is an acronym for courbure négative. Part (iv) of Theorem 1.3.2 is [195, Remark 2.2]. Theorem 1.3.3 partly appeared already in [51, Proposition II.1.7]. The proof of (ii) ⇒ (iii) comes from [195, Proposition 2.3]. The implication (vi) ⇒ (ii) is a deep result due to I. Berg and I. Nikolaev [39, Theorem 6]. We follow a newer proof by T. Sato [182]. The characterization in Theorem 1.3.3 (vii) comes from [86, Theorem 1.3]. The Ptolemy inequality is equivalent to the fact that the inversion of the space is a metric space [55]. The Enflo type was introduced by P. Enflo [76–78] under the name roundness and belongs to the classical parts of Banach space theory [37]. Various metric types and cotypes have recently become very popular; see for instance [140, 141] for results connected to Hadamard spaces. There is also a connection with the Kirszbraun extension theorem [127].
2 Convex sets and convex functions
In this chapter we introduce convex sets and functions in Hadamard spaces. Convexity is ubiquitous in Hadamard space geometry and it will also be a central element in all our developments in the present book. Since our aim is to show that Hadamard spaces are a suitable framework for convex analysis and optimization, we focus on analytical aspects of convexity. After introducing basic objects and their properties in this chapter, we will get to more advanced topics in the remainder of the book.
Let $(H, d)$ be a Hadamard space. Recall that a set $C \subset H$ is convex if, given $x, y \in C$, we have $[x, y] \subset C$. We say that a function $f : H \to (-\infty, \infty]$ is convex if, for each geodesic $\gamma : [0, 1] \to H$, the function $f \circ \gamma$ is convex, that is,
$$f(\gamma_t) \le (1 - t) f(\gamma_0) + t f(\gamma_1),$$
for every $t \in (0, 1)$. We say that $f$ is strictly convex if the inequality is strict whenever $\gamma_0 \ne \gamma_1$. For instance, given a point $x_0 \in H$, the function $d(\cdot, x_0)$ is convex by (1.2.4), and consequently the function $d(\cdot, x_0)^2$ is strictly convex.
2.1 Convex sets
We will first focus on (closed) convex sets. After presenting a few examples, we shall examine properties of convex sets. It should be clear that the examples below can be combined using any set operations which preserve convexity. For instance, it is easy to verify that the intersection of an arbitrary family of closed convex sets is itself a closed convex set.
Examples
Here are several examples of closed convex sets in a Hadamard space $(H, d)$.
Example 2.1.1 (Metric balls). Let $z \in H$ and $r > 0$. Then the metric ball centered at $z$ with radius $r$, that is,
$$B(z, r) := \{x \in H : d(x, z) \le r\},$$
is a closed convex set by (1.2.2). The value of this example is more obvious when we recall that there exist geodesic spaces with metric balls which are not convex; see Exercise 2.1.
Example 2.1.2 (Geodesic rays and lines). An isometry $c : [0, \infty) \to H$ is called a geodesic ray and an isometry $\gamma : \mathbb{R} \to H$ is called a geodesic line. Then both $c([0, \infty))$ and $\gamma(\mathbb{R})$ are closed convex sets. Likewise, (the range of) a geodesic $x : [0, 1] \to H$ is a closed convex set.
Example 2.1.3 (Sublevel sets of convex functions). Let $f : H \to (-\infty, \infty]$ be a convex lower semicontinuous function and $\beta \in \mathbb{R}$. Then the $\beta$-sublevel set of $f$, that is,
$$\{x \in H : f(x) \le \beta\},$$
is a closed convex set. Note that this also captures Example 2.1.1 when we put $f := d(\cdot, z)$ and $\beta := r$.
Example 2.1.4 (Horoballs). If the function $f$ in Example 2.1.3 is a Busemann function (see Example 2.2.10 for the definition), we call the sublevel sets horoballs. They carry a lot of information about the geometry of the underlying space [51]. See also Exercise 2.5.
Example 2.1.5 (Fixed point sets). Let $C \subset H$ be a closed convex set and $F : C \to C$ be a nonexpansive mapping. Then the set of fixed points of $F$, that is,
$$\operatorname{Fix} F := \{x \in C : x = Fx\},$$
is closed and convex. According to Theorem 3.3.1 below, the set $\operatorname{Fix} F$ is nonempty, provided $C$ is bounded.
Example 2.1.6 (Minimal displacement sets). Let $F : H \to H$ be an isometry and let
$$\delta_F(x) := d(x, Fx), \qquad x \in H,$$
be its displacement function (see Example 2.2.9). The set
$$\operatorname{Min} F := \{x \in H : \delta_F(x) = \inf_H \delta_F\}$$
is then closed and convex. We say that $F$ is semisimple if $\operatorname{Min} F \ne \emptyset$. More generally, given a group $G$ which acts by isometries on $H$, we denote
$$\operatorname{Min} G := \bigcap_{F \in G} \operatorname{Min} F.$$
It is again a closed convex set.
Example 2.1.7. Let $(H, d)$ be a Hadamard space and $(H_\omega, d_\omega)$ be its ultrapower. Then $H \subset H_\omega$ is a closed convex set, which is a 1-retract.
Let $(H, d)$ be a Hadamard space and $S \subset H$. Then the convex hull of $S$, denoted $\operatorname{co} S$, is defined as the intersection of all convex supersets of $S$. Likewise, the closed convex hull of $S$, denoted $\overline{\operatorname{co}}\, S$, is the intersection of all closed convex supersets of $S$.
Lemma 2.1.8. Let $(H, d)$ be a Hadamard space and $S \subset H$. Put $C_0 := S$ and, for $n \in \mathbb{N}$, let $C_n$ be the union of all geodesics with endpoints in $C_{n-1}$. Then
$$\operatorname{co} S = \bigcup_{n=0}^{\infty} C_n.$$
Proof. It is easy to see that $\bigcup_{n=0}^{\infty} C_n$ is a convex set and contains $\operatorname{co} S$. On the other hand, it is the smallest set with those properties.
Next we define an operation closely related to convex hulls which we call a formal convex hull. The following notation will be useful.
Notation 2.1.9. Given $N \in \mathbb{N}$, the symbol $\Delta^{N-1}$ stands for the standard $(N-1)$-dimensional simplex in $\mathbb{R}^N$, that is, the convex hull of the canonical basis $e_1, \dots, e_N \in \mathbb{R}^N$.
Definition 2.1.10 (Formal convex hull). Let $(H, d)$ be a Hadamard space and $x_1, \dots, x_N \in H$. The formal convex hull of the points $x_1, \dots, x_N$ is the smallest set $S \subset \operatorname{co}\{x_1, \dots, x_N\} \times \Delta^{N-1}$ satisfying the following two properties:
(i) $(x_n, e_n) \in S$ for every $n = 1, \dots, N$.
(ii) If $(u, \lambda), (w, \mu) \in S$ and $\gamma : [0, 1] \to H$ is the geodesic from $u$ to $w$, then $(\gamma_t, (1 - t)\lambda + t\mu) \in S$ for every $t \in [0, 1]$.
The formal convex hull of the points $x_1, \dots, x_N$ is denoted by $\operatorname{fco}\{x_1, \dots, x_N\}$. Note that the canonical projections
$$\pi_1 : \operatorname{fco}\{x_1, \dots, x_N\} \to \operatorname{co}\{x_1, \dots, x_N\}$$
and
$$\pi_2 : \operatorname{fco}\{x_1, \dots, x_N\} \to \Delta^{N-1}$$
are both surjective. The formal convex hull will be an important tool in the proof of Theorem 4.1.2, but will be used nowhere else in this book.
Question 2.1.11. It is not known whether the following version of Mazur's theorem holds. Let $(H, d)$ be a Hadamard space and $K \subset H$ be compact. Is then $\overline{\operatorname{co}}\, K$ compact? This problem is probably open even for $K$ finite.
Projections onto convex sets
It is well known that metric projections onto closed convex sets in strictly convex reflexive Banach spaces are well-defined single-valued mappings, which are nonexpansive if and only if the space is Hilbert. It is comforting to learn that metric projections in Hadamard spaces behave equally well. This fact underpins a number of results in the sequel.
Given a metric space $(X, d)$ and $S \subset X$, define the distance function by
$$d(x, S) := \inf_{s \in S} d(x, s), \qquad x \in X.$$
Interchangeably, we use the symbol $d_S$ for $d(\cdot, S)$. Furthermore, denote the nearest-point mapping by
$$P_S(x) := \{s \in S : d(x, S) = d(x, s)\}, \qquad x \in X.$$
Another term for $P_S$ is the metric projection onto $S$. Hence $P_S : X \to 2^S$ is a set-valued mapping. If the set $P_S(x)$ is a singleton for every $x \in X$, we say that $S$ is a Chebyshev set, and consider $P_S$ as a mapping from $X$ to $S$. Usually, we write $P_C x$ instead of $P_C(x)$. The following theorem states that any closed convex subset of a Hadamard space is Chebyshev and summarizes the basic properties of the projection.
Theorem 2.1.12 (Metric projections). Let $(H, d)$ be a Hadamard space. Let $C \subset H$ be a closed convex set. Then:
(i) The set $C$ is Chebyshev.
(ii) If $x \in H \setminus C$ and $z \in C \setminus \{P_C x\}$, then
$$d(x, P_C x)^2 + d(P_C x, z)^2 \le d(x, z)^2, \tag{2.1.1}$$
or equivalently, the angle satisfies $\alpha(x, P_C x, z) \ge \frac{\pi}{2}$.
(iii) The mapping $P_C : H \to C$ is nonexpansive, that is, we have
$$d(P_C x, P_C y) \le d(x, y),$$
for every $x, y \in H$.
Proof. (i) Choose $x \in H$. We need to show that the function $c \mapsto d(x, c)$ on $C$ has a unique minimizer. It is clear that instead of minimizing $d(x, \cdot)$, we can equivalently minimize the function $d(x, \cdot)^2$. Since the latter is strictly convex, it has at most one minimizer. To show the existence of a minimizer, we take a minimizing sequence $(c_n) \subset C$, that is,
$$d(x, c_n)^2 \to \inf_C d(x, \cdot)^2,$$
and denote $c_{mn} := \frac12 c_m + \frac12 c_n$. Then by the convexity of $C$ we have $c_{mn} \in C$, and (1.2.3) yields
$$d(x, c_{mn})^2 \le \tfrac12 d(x, c_m)^2 + \tfrac12 d(x, c_n)^2 - \tfrac14 d(c_m, c_n)^2,$$
or,
$$\tfrac14 d(c_m, c_n)^2 \le \tfrac12 d(x, c_m)^2 + \tfrac12 d(x, c_n)^2 - d(x, c_{mn})^2,$$
which, together with the fact that $(c_n)$ is a minimizing sequence, implies that $(c_n)$ is Cauchy. The limit point is clearly a minimizer and lies in $C$.
To prove (ii), we denote by $\gamma : [0, 1] \to H$ the geodesic connecting $P_C x$ and $z$. Then by (1.2.2) we obtain
$$d(x, \gamma_t)^2 \le (1 - t)\, d(x, P_C x)^2 + t\, d(x, z)^2 - t(1 - t)\, d(P_C x, z)^2,$$
for any $t \in (0, 1)$. Since $C$ is convex, the geodesic $\gamma$ lies in $C$, and
$$d(x, P_C x)^2 \le d(x, \gamma_t)^2.$$
The last two inequalities together yield
$$t\, d(x, P_C x)^2 + t(1 - t)\, d(P_C x, z)^2 \le t\, d(x, z)^2.$$
After dividing by $t$ and taking the limit $t \to 0$, we get the desired inequality
$$d(x, P_C x)^2 + d(P_C x, z)^2 \le d(x, z)^2.$$
By the same argument, we arrive at
$$d(x', P_C x)^2 + d(P_C x, z')^2 \le d(x', z')^2,$$
for every $x' \in [x, P_C x]$ and $z' \in [z, P_C x]$, which shows $\alpha(x, P_C x, z) \ge \frac{\pi}{2}$.
We finally show (iii). Corollary 1.2.5 gives
$$d(x, P_C y)^2 + d(y, P_C x)^2 \le d(x, y)^2 + d(P_C x, P_C y)^2 + d(x, P_C x)^2 + d(y, P_C y)^2.$$
We apply (2.1.1) twice to arrive at
$$d(x, P_C y)^2 + d(y, P_C x)^2 \ge d(x, P_C x)^2 + d(y, P_C y)^2 + 2\, d(P_C x, P_C y)^2,$$
and the inequality $d(P_C x, P_C y) \le d(x, y)$ follows. The proof is now complete.
It is worth noting that the assumptions in Theorem 2.1.12 can be weakened. We need only the set $C$ to be complete, not the whole space.
Definition 2.1.13 (Firmly nonexpansive mapping). A map $F : H \to H$ is called firmly nonexpansive if, for every $x, y \in H$, the mapping $\Phi_{x,y} : [0, 1] \to [0, \infty)$ defined by
$$\Phi_{x,y}(\lambda) := d\big((1 - \lambda)x + \lambda Fx,\ (1 - \lambda)y + \lambda Fy\big)$$
is nonincreasing.
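As a quick numerical sanity check of Definition 2.1.13, the following sketch works in the Euclidean plane, the simplest Hadamard space, where $(1 - \lambda)x + \lambda Fx$ is the usual convex combination. It takes $F$ to be the metric projection onto the closed unit disc and tabulates $\Phi_{x,y}$ on a grid. The function names and the sample points are illustrative assumptions and do not come from the text.

```python
import math

def project_to_disc(x, radius=1.0):
    """Metric projection of x in R^2 onto the closed disc of the given radius centred at the origin."""
    norm = math.hypot(x[0], x[1])
    if norm <= radius:
        return x
    return (radius * x[0] / norm, radius * x[1] / norm)

def combine(x, y, lam):
    """The point (1 - lam) * x + lam * y on the segment [x, y]."""
    return ((1 - lam) * x[0] + lam * y[0], (1 - lam) * x[1] + lam * y[1])

def phi(x, y, lam, F=project_to_disc):
    """Phi_{x,y}(lam) = d((1 - lam) x + lam F x, (1 - lam) y + lam F y)."""
    return math.dist(combine(x, F(x), lam), combine(y, F(y), lam))

# Hypothetical sample points, both outside the disc.
x, y = (3.0, 1.0), (-0.5, 2.5)
values = [phi(x, y, k / 10) for k in range(11)]

# Firm nonexpansiveness predicts that the tabulated values never increase.
is_nonincreasing = all(b <= a + 1e-12 for a, b in zip(values, values[1:]))
print(is_nonincreasing)
```

The first tabulated value is $d(x, y)$ and the last is $d(Fx, Fy)$, so monotonicity of the table contains ordinary nonexpansiveness as the comparison of its two endpoints.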
In fact, metric projections are firmly nonexpansive. This follows directly from (1.2.4) and Theorem 2.1.12 (iii). Another important example of a firmly nonexpansive mapping will be the resolvent $J_\lambda$ of a convex lsc function defined in (2.2.5) below, as remarked after Proposition 2.2.24. In fact, the metric projection is a special case of the resolvent; see Example 2.2.21. The resolvent $R_\lambda$ of a nonexpansive mapping defined in Definition 4.2.1 is yet another instance of a firmly nonexpansive mapping; see Lemma 4.2.2.
Example 2.1.14 (Expectation of a random variable). In Section 7.3 we will use metric projections to define the conditional expectation of a random variable which takes values in a Hadamard space; see Definition 7.3.2.
Next we see that, for a Chebyshev set, nonexpansiveness of the projection in turn implies convexity of the set.
Proposition 2.1.15. Let $(H, d)$ be a Hadamard space and $C \subset H$ a Chebyshev set. If $P_C$ is nonexpansive, then $C$ is convex.
Proof. Choose $x, y \in C$ and $z \in [x, y]$. By the triangle inequality and the fact that $P_C$ is nonexpansive, we have
$$d(x, z) + d(z, y) = d(x, y) \le d(x, P_C z) + d(P_C z, y) \le d(x, z) + d(z, y),$$
which gives $P_C z \in [x, y]$. Since $P_C$ is nonexpansive, we conclude $d(x, z) = d(x, P_C z)$, and hence $z = P_C z \in C$.
At this point, we recall one of the most intriguing open problems in convex analysis: Is every Chebyshev set in a Hilbert space necessarily convex? Surprisingly, there exists a nonconvex Chebyshev set in the real hyperbolic plane $\mathbb{H}^2$.
Proposition 2.1.16. Let $(H, d)$ be a Hadamard space. If $(C_i)_{i \in I}$ is a nonincreasing family of bounded closed convex sets in $H$, where $I$ is an arbitrary directed set, then $\bigcap_{i \in I} C_i \ne \emptyset$.
Proof. Choose $x \in H$ and denote its projection onto $C_i$ by $x_i := P_{C_i} x$. Then $(d(x, x_i))_i$ is a bounded nondecreasing net of nonnegative numbers, and hence has a limit $l$. If $l = 0$, then $x \in \bigcap_i C_i$. If $l > 0$, then $(x_i)$ is Cauchy. Indeed, denote $x_{ij} := \frac12 x_i + \frac12 x_j$. By (1.2.3) we have
$$d(x, x_{ij})^2 \le \tfrac12 d(x, x_i)^2 + \tfrac12 d(x, x_j)^2 - \tfrac14 d(x_i, x_j)^2,$$
or,
$$\tfrac14 d(x_i, x_j)^2 \le \tfrac12 d(x, x_i)^2 + \tfrac12 d(x, x_j)^2 - d(x, x_{ij})^2,$$
which implies that $(x_i)$ is Cauchy. The limit point clearly lies in $\bigcap_i C_i$.
2.2 Convex functions
Let $(H, d)$ be a Hadamard space. As is usual in convex analysis, we work with functions which take values in $(-\infty, \infty]$. The domain of a function $f : H \to (-\infty, \infty]$ is by definition the set
$$\operatorname{dom} f := \{x \in H : f(x) < \infty\}.$$
As a convention, we usually assume $\operatorname{dom} f \ne \emptyset$ without explicit mention, to avoid trivial cases. In analogy with our interest in closed convex sets, we will now consider convex lower semicontinuous functions. Let us recall that a function $f : H \to (-\infty, \infty]$ is lower semicontinuous, or lsc, if the set
$$\{x \in H : f(x) \le \alpha\}$$
is closed for each $\alpha \in \mathbb{R}$. Note that the class of convex lsc functions is stable with respect to the usual operations: if $f, g : H \to (-\infty, \infty]$ are convex lsc, then $f + g$ is also convex lsc, and $cf$ is convex lsc for every $c > 0$. If $f_n : H \to (-\infty, \infty]$ is a sequence of convex lsc functions, then the function $\sup_{n \in \mathbb{N}} f_n$ is convex lsc, and the function $\limsup_{n \to \infty} f_n$ is convex. We also observe the following.
Lemma 2.2.1. Let $(H, d)$ be a Hadamard space and $f : H \to (-\infty, \infty]$. Then $f$ is convex lsc if and only if its epigraph
$$\operatorname{epi} f := \{(x, t) \in H \times \mathbb{R} : f(x) \le t\}$$
is a closed convex subset of $H \times \mathbb{R}$.
Proof. See Exercise 2.3.
A function $h : H \to (-\infty, \infty]$ is strongly convex with parameter $\kappa > 0$ if
$$h((1 - t)x + ty) \le (1 - t)h(x) + t h(y) - \kappa t(1 - t)\, d(x, y)^2,$$
for every $x, y \in H$ and $t \in [0, 1]$.
Remark 2.2.2. Having established this terminology, inequality (1.2.2) says that, given a fixed $z \in H$, the function $d(\cdot, z)^2$ is strongly convex with parameter $\kappa = 1$. We can hence say that a geodesic metric space $(X, d)$ is CAT(0) if (and only if) the function $x \mapsto d(x, z)^2$ is strongly convex with parameter $\kappa = 1$, for each $z \in X$.
As we would expect, strongly convex functions have 'stronger' properties; see for instance Proposition 5.1.15.
Let $(H, d)$ be a Hadamard space. We say that a point $x \in H$ is a minimizer of a function $f : H \to (-\infty, \infty]$ if $f(x) = \inf_H f$. The set of minimizers of $f$ is denoted $\operatorname{Min} f$. If $f$ is convex lsc, then $\operatorname{Min} f$ is convex and closed by Example 2.1.3.
Examples
We now present several important examples of convex lsc functions on a Hadamard space $(H, d)$.
Example 2.2.3 (Indicator functions). Let $C \subset H$ be a convex set. Define the indicator function of $C$ by
$$\iota_C(x) := \begin{cases} 0, & \text{if } x \in C, \\ \infty, & \text{if } x \notin C. \end{cases}$$
Then $\iota_C$ is a convex function, and it is lsc if and only if $C$ is closed.
Example 2.2.4 (Distance functions). Given $x_0 \in H$, the function
$$x \mapsto d(x, x_0), \qquad x \in H, \tag{2.2.2}$$
is convex and continuous. The function $d(\cdot, x_0)^p$ for $p > 1$ is strictly convex, and if $p = 2$, it is even strongly convex; see Remark 2.2.2. More generally, the distance function to a closed convex subset $C \subset H$, defined as
$$d_C(x) := \inf_{c \in C} d(x, c), \qquad x \in H,$$
is convex and 1-Lipschitz. This follows immediately from (1.2.4) and the very definition, respectively.
Example 2.2.5. Given a finite number of points $a_1, \dots, a_N \in H$ and $(w_1, \dots, w_N) \in \Delta^{N-1}$, we consider the function
$$f(x) := \sum_{n=1}^{N} w_n\, d(x, a_n)^p, \qquad x \in H,$$
where $p \in [1, \infty)$. The function $f$ is convex and continuous and will be used in several places in this book. In particular, we are concerned with two important cases:
(i) If $p = 1$, then $f$ becomes the objective function in the Fermat–Weber problem for optimal facility location. A minimizer of $f$ exists by Lemma 2.2.19 and is called a median of the points $a_1, \dots, a_N$.
(ii) If $p = 2$, then $f$ has a unique minimizer by Proposition 2.2.17, which is called the mean of the points $a_1, \dots, a_N$.
In Section 2.3 we will introduce barycenters of probability measures and observe that a mean can be viewed as the barycenter of the probability measure
$$\mu := \sum_{n=1}^{N} w_n \delta_{a_n},$$
where $\delta_{a_n}$ stands for the Dirac measure at the point $a_n$. In Hilbert spaces, this definition of a mean is equivalent to the arithmetic mean, as you can prove in Exercise 2.9. See also Corollary 2.3.11 for a nice property of the mean. Medians and means have applications in Section 8.3.
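The remark about Hilbert spaces can be checked numerically in the Euclidean plane. The sketch below evaluates the objective of Example 2.2.5 for $p = 2$ and compares it with the weighted arithmetic mean. The sample points, weights and function names are illustrative assumptions. In the Euclidean case one has the exact identity $f(y) - f(m) = d(y, m)^2$ for the weighted mean $m$, which the loop prints side by side.

```python
import math

def objective(x, points, weights, p):
    """f(x) = sum_n w_n * d(x, a_n)**p for the Euclidean distance on R^2."""
    return sum(w * math.dist(x, a) ** p for a, w in zip(points, weights))

def weighted_mean(points, weights):
    """Weighted arithmetic mean in R^2; assumes the weights sum to 1."""
    return tuple(sum(w * a[i] for a, w in zip(points, weights)) for i in range(2))

# Hypothetical data: three points with weights in the standard simplex.
points = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
weights = [0.5, 0.25, 0.25]
mean = weighted_mean(points, weights)

# For p = 2 the gap f(y) - f(mean) should coincide with d(y, mean)^2,
# so the arithmetic mean is the unique minimizer of f.
for y in [(0.0, 0.0), (2.0, -1.0), (1.5, 1.5)]:
    gap = objective(y, points, weights, 2) - objective(mean, points, weights, 2)
    print(y, gap, math.dist(y, mean) ** 2)
```

For $p = 1$ no such closed form is available in general, which is one reason why medians are computed by iterative methods.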
Example 2.2.6. A way of generalizing the previous example is to replace the points $a_1, \dots, a_N$ by closed convex sets $C_1, \dots, C_N \subset H$. Then
$$f(x) := \sum_{n=1}^{N} w_n\, d(x, C_n)^p, \qquad x \in H,$$
where $p \in [1, \infty)$, is again convex and continuous.
Example 2.2.7 (Enclosing ball radius). Let $A \subset H$ be a bounded set and consider the function
$$f(x) := \sup_{a \in A} d(x, a)^2, \qquad x \in H.$$
For a fixed $x \in H$, the function value $f(x)$ is the square of the radius of the smallest ball centered at $x$ which encloses the set $A$. A minimizer of $f$ is therefore the center of a smallest ball enclosing $A$, and its existence is established in Example 2.2.18.
Example 2.2.8. Let $(x_n) \subset H$ be a bounded sequence. Define the function $\omega(\cdot\,; (x_n)) : H \to [0, \infty)$ as
$$\omega(x; (x_n)) := \limsup_{n \to \infty} d(x, x_n)^2, \qquad x \in H.$$
It is locally Lipschitz, because the functions $d(\cdot, x_n)^2$ are locally Lipschitz with a common Lipschitz constant for all $n \in \mathbb{N}$. By Lemma 2.2.15 below, it is strongly convex. The function $\omega$ will be used in the definition of weak convergence; see (3.1.1).
Example 2.2.9 (Displacement functions). Let $F : H \to H$ be an isometry. The displacement function of $F$ is the function $\delta_F : H \to [0, \infty)$ defined by
$$\delta_F(x) := d(x, Fx), \qquad x \in H.$$
It is convex and 2-Lipschitz. Indeed, convexity follows from (1.2.4) and the Lipschitz property from
$$|d(x, Fx) - d(y, Fy)| \le d(x, y) + d(Fx, Fy) \le 2 d(x, y).$$
Example 2.2.10 (Busemann functions). Let $c : [0, \infty) \to H$ be a geodesic ray (see Example 2.1.2 for the definition). The function $b_c : H \to \mathbb{R}$ defined by
$$b_c(x) := \lim_{t \to \infty} \left[ d(x, c(t)) - t \right], \qquad x \in H,$$
is called the Busemann function associated to the ray $c$. Busemann functions are convex and 1-Lipschitz (Exercise 2.5). Concrete examples of Busemann functions are given in [51, p. 273]. Another explicit example of a Busemann function in the Hadamard space of positive definite $n \times n$ matrices with real entries can be found in [51, Proposition 10.69]. The sublevel sets of Busemann functions are called horoballs and were introduced in Example 2.1.4.
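To get a feeling for the definition, consider the Euclidean plane, where a geodesic ray has the form $c(t) = c(0) + tv$ with a unit vector $v$. A direct computation shows that the limit is then the affine function $b_c(x) = -\langle x - c(0), v \rangle$, so horoballs are half-planes. The sketch below compares this closed form with the defining expression at a large value of $t$. The ray, the point $x$ and the function names are illustrative assumptions.

```python
import math

def busemann_approx(x, origin, direction, t):
    """d(x, c(t)) - t for the ray c(t) = origin + t * direction, with direction of unit length."""
    ct = (origin[0] + t * direction[0], origin[1] + t * direction[1])
    return math.dist(x, ct) - t

def busemann_euclidean(x, origin, direction):
    """Closed form in R^2: b_c(x) = -<x - c(0), direction>."""
    return -((x[0] - origin[0]) * direction[0] + (x[1] - origin[1]) * direction[1])

# Hypothetical ray starting at (1, 0) in the unit direction (0.6, 0.8).
origin, direction = (1.0, 0.0), (0.6, 0.8)
x = (2.0, 3.0)

approx = busemann_approx(x, origin, direction, t=1e6)
exact = busemann_euclidean(x, origin, direction)
print(approx, exact)
```

The two printed numbers agree up to an error of order $1/t$, which reflects how slowly the spheres centered at $c(t)$ flatten into the limiting horosphere.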
Example 2.2.11 (Energy functional). The energy functional is another important instance of a convex continuous function on a Hadamard space of $L^2$-mappings; see [192, p. 342]. Minimizers of the energy functional are called harmonic maps and are important in both analysis and geometry. We refer the interested reader to [96, 102, 121] and the follow-up papers. For a probabilistic approach to harmonic maps in Hadamard spaces, see [192, 193, 196].
As we shall see in the sequel, many properties of convex functions we came to expect in Hilbert spaces extend to Hadamard spaces. For an exception, see Exercise 2.4.
Lemma 2.2.12. Let $(H, d)$ be a Hadamard space and $f : H \to (-\infty, \infty]$ be a convex lsc function. Then $f$ is bounded from below on bounded sets.
Proof. Let $C \subset H$ be bounded, and without loss of generality assume that $C$ is closed convex. If $\inf_C f = -\infty$, then the sets $S_k := \{x \in C : f(x) \le -k\}$ for $k \in \mathbb{N}$ are all nonempty. Since each $S_k$ is closed convex and bounded, Proposition 2.1.16 yields a point $z \in \bigcap_{k \in \mathbb{N}} S_k$. Clearly $f(z) = -\infty$, which is not possible.
Lemma 2.2.13. Let $f : H \to (-\infty, \infty]$ be a convex lsc function. For each $x_0 \in H$ there exist constants $\alpha, \beta \in \mathbb{R}$ such that
$$f(x) \ge \alpha + \beta\, d(x, x_0),$$
for every $x \in H$.
Proof. Assume that this is not the case, that is, for each $k \in \mathbb{N}$, there exists $x_k \in H$ such that
$$f(x_k) < -k \left[ d(x_k, x_0) + 1 \right].$$
Then we have
$$\liminf_{k \to \infty} f(x_k) \le -\limsup_{k \to \infty} k \left[ d(x_k, x_0) + 1 \right] \le -\infty,$$
which via Lemma 2.2.12 implies that $(x_k)$ is unbounded. Choose $y \in \operatorname{dom} f$ and put
$$z_k := (1 - t_k) y + t_k x_k, \qquad \text{with } t_k := \frac{1}{\sqrt{k}\, d(y, x_k)}.$$
Then $z_k \to y$. By convexity,
$$f(z_k) \le (1 - t_k) f(y) + t_k f(x_k) \le (1 - t_k) f(y) - t_k k \left[ d(x_k, x_0) + 1 \right] \le (1 - t_k) f(y) - \sqrt{k}\, \frac{d(x_k, x_0) + 1}{d(x_k, y)}.$$
Thus, by lower semicontinuity, we get
$$f(y) \le \liminf_{k \to \infty} f(z_k) \le -\infty,$$
which is not possible.
Lemma 2.2.14. Let $f : H \to (-\infty, \infty]$ be a strongly convex lsc function. Then $f$ is bounded from below.
Proof. Let $f$ be strongly convex with parameter $\kappa > 0$ and $(x_n) \subset H$ be a minimizing sequence of $f$, that is, $f(x_n) \to \inf f$. By contradiction, assume $\inf f = -\infty$. Then $(x_n)$ is unbounded by Lemma 2.2.12 and so is $y_n := \frac12 x + \frac12 x_n$ for some fixed $x \in \operatorname{dom} f$. By strong convexity
$$f(y_n) \le \tfrac12 f(x) + \tfrac12 f(x_n) - \tfrac{\kappa}{4} d(x, x_n)^2.$$
Lemma 2.2.13 gives $\alpha, \beta \in \mathbb{R}$ such that
$$f(y) \ge \alpha + \beta\, d(x, y),$$
for every $y \in H$. Hence
$$\alpha + \beta\, d(x, y_n) \le \tfrac12 f(x) + \tfrac12 f(x_n) - \tfrac{\kappa}{4} d(x, x_n)^2,$$
which is not possible, because $d(x, y_n) = \frac12 d(x, x_n)$ and the right-hand side decreases quadratically in $d(x, x_n)$. So $f$ is bounded from below.
Lemma 2.2.15. Let $f_n : H \to (-\infty, \infty]$ be strongly convex functions with common parameter $\kappa$, for every $n \in \mathbb{N}$. Then the function $f := \limsup_{n \to \infty} f_n$ is strongly convex with parameter $\kappa$. Let $f_i : H \to (-\infty, \infty]$ be strongly convex functions with common parameter $\kappa$, for every $i \in I$, where $I$ is an arbitrary index set. Then the function $f := \sup_{i \in I} f_i$ is strongly convex with parameter $\kappa$.
Proof. See Exercise 2.7.
Question 2.2.16 (Lower semicontinuity of ultraextensions). Assume $(H, d)$ is a Hadamard space and $(H_\omega, d_\omega)$ is an ultralimit of the constant sequence $H_n := H$ for each $n \in \mathbb{N}$. Given a convex lsc function $f : H \to (-\infty, \infty]$, define its ultraextension by
$$f_\omega(x_\omega) := \inf \left\{ \omega\text{-}\lim_n f(x_n) : (x_n) \in x_\omega \right\}, \qquad x_\omega \in H_\omega.$$
Since $f$ is bounded from below on bounded sets by Lemma 2.2.12, we have $f_\omega > -\infty$. The ultraextension of a convex function $f : H \to (-\infty, \infty]$ is a convex function $f_\omega : H_\omega \to (-\infty, \infty]$ and we have $f_\omega{\restriction_H} = f$. We do not know whether the ultraextension of a convex lsc function is lsc.
Minimizers of convex functions
Next we establish conditions which ensure the existence and uniqueness of minimizers of convex lsc functions. Clearly, a strictly convex function admits at most one minimizer. Strong convexity is sufficient to guarantee also the existence, as the following proposition states.
Proposition 2.2.17. Let $(H, d)$ be a Hadamard space and $f : H \to (-\infty, \infty]$ be a lsc strongly convex function with parameter $\kappa > 0$. Then there exists a unique minimizer $x \in H$ of $f$ and each minimizing sequence converges to $x$. Moreover,
$$f(x) + \kappa\, d(x, y)^2 \le f(y), \tag{2.2.3}$$
for each $y \in H$.
Proof. Let $(x_n) \subset H$ be a minimizing sequence of $f$. Lemma 2.2.14 states that $f$ is bounded from below. Denote by $x_{mn}$ the midpoint of $[x_m, x_n]$. Then again by strong convexity
$$f(x_{mn}) \le \tfrac12 f(x_m) + \tfrac12 f(x_n) - \tfrac{\kappa}{4} d(x_m, x_n)^2,$$
which yields that $(x_n)$ is Cauchy, and hence converges to a point $x \in H$. Finally, we use the lower semicontinuity of $f$ to conclude that $x \in \operatorname{Min} f$. Uniqueness is obvious and consequently also the fact that any other minimizing sequence converges to $x$. To show (2.2.3) we choose $y \in H$ and denote by $\gamma : [0, 1] \to H$ the geodesic from $x$ to $y$. Strong convexity gives
$$f(x) \le f(\gamma_t) \le (1 - t) f(x) + t f(y) - \kappa t(1 - t)\, d(x, y)^2,$$
for every $t \in (0, 1)$. Dividing by $t$ and taking the limit $t \to 0$ yields (2.2.3). The proof is now complete.
Note that Theorem 2.1.12 (i) and (ii) is a special case of Proposition 2.2.17. Indeed, let $C \subset H$ be a closed convex set and $x \in H \setminus C$. Put
$$f(y) := \begin{cases} d(x, y)^2, & \text{if } y \in C, \\ \infty, & \text{if } y \in H \setminus C. \end{cases}$$
Then $f$ is strongly convex due to the nonpositive curvature of the Hadamard space.
Example 2.2.18 (Minimal enclosing ball center). Let $A \subset H$ be a bounded set and consider the function from Example 2.2.7, that is,
$$f(x) := \sup_{a \in A} d(x, a)^2, \qquad x \in H.$$
According to Lemma 2.2.15, the function is strongly convex, and therefore has a unique minimizer due to Proposition 2.2.17. This minimizer is the center of the minimal ball enclosing the set $A$. If $A := \{a_1, \dots, a_N\}$ is finite, finding a minimizer of $f$ amounts to the minmax problem
$$\min_{x \in H} \max_{n} d(x, a_n)^2,$$
where $n$ runs over $\{1, \dots, N\}$.
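On the real line the minmax problem can be solved by hand: the minimal enclosing ball of a finite set is the interval between its smallest and largest element, so the minimizer is their midpoint. The sketch below checks this against a brute-force grid search and also verifies inequality (2.2.3) with $\kappa = 1$, the parameter inherited from the functions $d(\cdot, a_n)^2$. The data, the grid and the function names are illustrative assumptions.

```python
def radius_squared(x, points):
    """f(x) = max_n d(x, a_n)^2 on the real line."""
    return max((x - a) ** 2 for a in points)

def enclosing_center(points):
    """Center of the smallest interval containing all points."""
    return (min(points) + max(points)) / 2

# Hypothetical finite set A on the real line.
points = [-1.0, 0.5, 2.0, 5.0]
center = enclosing_center(points)

# Brute-force comparison on a uniform grid over [-10, 10] with step 0.01.
grid = [-10 + k / 100 for k in range(2001)]
best_on_grid = min(grid, key=lambda x: radius_squared(x, points))

# Inequality (2.2.3) with kappa = 1: f(center) + d(center, y)^2 <= f(y).
satisfies_2_2_3 = all(
    radius_squared(center, points) + (center - y) ** 2
    <= radius_squared(y, points) + 1e-12
    for y in grid
)
print(center, best_on_grid, satisfies_2_2_3)
```

In a general Hadamard space no such closed form exists, and the center has to be approximated by an iterative scheme.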
Another condition to guarantee the existence of a minimizer is coercivity.
Lemma 2.2.19. Let $f : H \to (-\infty, \infty]$ be a convex lsc function which satisfies $f(x) \to \infty$ whenever $d(x, x_0) \to \infty$ for some $x_0 \in H$. Then $f$ has a minimizer.
Proof. By Lemma 2.2.12 we know that $f$ is bounded from below on bounded sets, and hence it is bounded from below on $H$ since $f(x) \to \infty$ whenever $d(x, x_0) \to \infty$. Therefore $\inf_H f > -\infty$, and we can apply Proposition 2.1.16 to the sublevel sets
$$C_n := \left\{ x \in H : f(x) \le \inf_H f + \tfrac{1}{n} \right\},$$
which are bounded by coercivity, to get the existence of a minimizer.
Notice that the above lemma covers, as a trivial case, the situation when $H$ is bounded. Then every convex lsc function is coercive and has a minimizer.
The Moreau–Yosida envelope of a convex lsc function $f : H \to (-\infty, \infty]$ with parameter $\lambda > 0$ is defined as
$$f_\lambda(x) := \inf_{y \in H} \left[ f(y) + \frac{1}{2\lambda} d(x, y)^2 \right], \qquad x \in H. \tag{2.2.4}$$
Since $f$ is convex and $d(x, \cdot)^2$ strongly convex, the function $f + \frac{1}{2\lambda} d(x, \cdot)^2$ is strongly convex, and hence bounded from below (Lemma 2.2.14), which implies $f_\lambda > -\infty$. Also $f_\lambda < \infty$, unless of course $\operatorname{dom} f = \emptyset$, but we do not consider such a function. Furthermore, the function $f_\lambda$ is convex, but not necessarily lsc; see Exercise 2.8. Given $\lambda, \mu > 0$, the following holds
$$(f_\lambda)_\mu(x) = f_{\lambda + \mu}(x), \qquad x \in H,$$
as one can easily check. This then gives rise to the Hamilton–Jacobi semigroup.
As we already noted above, the function $f + \frac{1}{2\lambda} d(x, \cdot)^2$ is strongly convex and the following mapping is therefore well defined.
Definition 2.2.20 (Resolvent of a function). Let $(H, d)$ be a Hadamard space and $f : H \to (-\infty, \infty]$ be convex lsc. For $\lambda > 0$, define the resolvent of $f$ as
$$J_\lambda x := \operatorname*{arg\,min}_{y \in H} \left[ f(y) + \frac{1}{2\lambda} d(x, y)^2 \right], \qquad x \in H, \tag{2.2.5}$$
and put $J_0 x := x$ for each $x \in H$.
If confusion is likely, one should use the notation $J_\lambda^f$ to specify which function a resolvent belongs to. But it will usually be clear from the context, so we omit the superscript. Some authors use the term proximal mapping for the resolvent and denote it $P_\lambda$ instead of $J_\lambda$, but we will not do that.
Example 2.2.21 (Metric projections revisited). If $f$ is the indicator function of a closed convex set $C \subset H$ (see Example 2.2.3 for the definition), then the resolvent is the metric projection onto $C$, that is, $J_\lambda = P_C$ for each $\lambda > 0$.
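On the real line two resolvents can be written down explicitly, which makes Definition 2.2.20 easy to test. For $f(y) = |y|$ the minimizer in (2.2.5) is the soft-thresholding map, and for the indicator function of an interval $[a, b]$ it is the clamp onto $[a, b]$, in line with Example 2.2.21. The sketch below compares both closed forms with a brute-force minimization over a grid. All function names, the grid and the sample values are illustrative assumptions.

```python
def resolvent_abs(x, lam):
    """Closed form of J_lam x for f(y) = |y| on the real line (soft thresholding)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def resolvent_indicator(x, a, b):
    """J_lam x for the indicator function of [a, b]: the metric projection, for every lam > 0."""
    return min(max(x, a), b)

def resolvent_by_grid(f, x, lam, lo=-10.0, hi=10.0, steps=20001):
    """Brute-force minimizer of y -> f(y) + (x - y)^2 / (2 lam) over a uniform grid."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return min(grid, key=lambda y: f(y) + (x - y) ** 2 / (2 * lam))

def indicator(y):
    """Indicator function of the interval [-1, 1]."""
    return 0.0 if -1.0 <= y <= 1.0 else float("inf")

x, lam = 3.0, 1.25
print(resolvent_abs(x, lam), resolvent_by_grid(abs, x, lam))
print(resolvent_indicator(x, -1.0, 1.0), resolvent_by_grid(indicator, x, lam))
```

The grid search is only accurate up to the grid spacing, so the two columns agree up to about $10^{-3}$ rather than exactly.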
Theorem 2.2.22. Let $(H, d)$ be a Hadamard space and $f : H \to (-\infty, \infty]$ be convex lsc. Given $\lambda > 0$, the resolvent $J_\lambda$ of $f$ is nonexpansive, that is,
$$d(J_\lambda x_0, J_\lambda x_1) \le d(x_0, x_1), \qquad x_0, x_1 \in H.$$
Proof. Denote by $y : [0, 1] \to H$ the geodesic from $J_\lambda x_0$ to $J_\lambda x_1$. By the convexity of $f$ we have
$$f(y_t) + f(y_{1-t}) \le f(y_0) + f(y_1). \tag{2.2.6}$$
Proposition 1.2.4 gives for any $t \in (0, 1)$ the following estimate
$$d(y_t, x_0)^2 + d(y_{1-t}, x_1)^2 \le d(y_0, x_0)^2 + d(y_1, x_1)^2 + 2t^2 d(y_0, y_1)^2 + t\, d(x_0, x_1)^2 - t\, d(y_0, y_1)^2 - t \left[ d(x_0, x_1) - d(y_0, y_1) \right]^2. \tag{2.2.7}$$
Multiplying (2.2.7) by $\frac{1}{2\lambda}$, adding to (2.2.6) and dividing by $t$ yields
$$0 \le 2t\, d(y_0, y_1)^2 + d(x_0, x_1)^2 - d(y_0, y_1)^2 - \left[ d(x_0, x_1) - d(y_0, y_1) \right]^2,$$
since we have
$$f(y_t) + \frac{1}{2\lambda} d(y_t, x_0)^2 + f(y_{1-t}) + \frac{1}{2\lambda} d(y_{1-t}, x_1)^2 \ge f_\lambda(x_0) + f_\lambda(x_1)$$
by the definition of $y_0 = J_\lambda x_0$ and $y_1 = J_\lambda x_1$. Taking the limit $t \to 0$ in the second last inequality gives $0 \le 2\, d(y_0, y_1) \left[ d(x_0, x_1) - d(y_0, y_1) \right]$, which finishes the proof.
The following lemma is similar to the variational inequality in (2.2.3), but is more subtle since we take the structure of the strongly convex function in question into account. One can also view it as a discrete version of the evolution variational inequality from Theorem 5.1.11.
Lemma 2.2.23. Let $f : H \to (-\infty, \infty]$ be a convex lsc function. Then for every $x, y \in H$ and $\lambda > 0$, we have
$$\frac{1}{2\lambda} d(J_\lambda x, y)^2 - \frac{1}{2\lambda} d(x, y)^2 \le f(y) - f_\lambda(x).$$
Proof. Denote by $\gamma : [0, 1] \to H$ the geodesic connecting $J_\lambda x$ and $y$. Applying (1.2.2) and the convexity of $f$ yields
$$f_\lambda(x) \le f(\gamma_t) + \frac{1}{2\lambda} d(x, \gamma_t)^2 \le (1 - t) f_\lambda(x) + t f(y) + \frac{t}{2\lambda} d(x, y)^2 - \frac{t(1 - t)}{2\lambda} d(J_\lambda x, y)^2,$$
for any $t \in (0, 1)$. After dividing by $t$ we obtain
$$0 \le -f_\lambda(x) + f(y) + \frac{1}{2\lambda} d(x, y)^2 - \frac{1 - t}{2\lambda} d(J_\lambda x, y)^2,$$
and passing to the limit $t \to 0$ finally gives the desired inequality.
The resolvent $J_\lambda$ is not a semigroup in the variable $\lambda$. The following Proposition 2.2.24 describes the dependence of $J_\lambda$ on $\lambda$.
Proposition 2.2.24 (Resolvent identity). Let $(H, d)$ be a Hadamard space and $f : H \to (-\infty, \infty]$ be convex lsc. Then the following identity holds
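\[ J_\lambda x = J_\mu\Bigl(\frac{\lambda-\mu}{\lambda} J_\lambda x + \frac{\mu}{\lambda} x\Bigr), \qquad x\in\mathcal{H}, \]
for every $\lambda>\mu>0$.

Proof. Denote $y := \frac{\lambda-\mu}{\lambda} J_\lambda x + \frac{\mu}{\lambda} x$. By the definition of the resolvent we have the inequalities
\[ f(J_\mu y) + \frac{1}{2\mu} d(J_\mu y,y)^2 \le f(J_\lambda x) + \frac{1}{2\mu} d(J_\lambda x,y)^2, \tag{2.2.8} \]
and
\[ f(J_\lambda x) + \frac{1}{2\lambda} d(J_\lambda x,x)^2 \le f(J_\mu y) + \frac{1}{2\lambda} d(J_\mu y,x)^2. \tag{2.2.9} \]
Summing (2.2.8) and (2.2.9) yields
\[ \frac{1}{2\mu} d(J_\mu y,y)^2 + \frac{1}{2\lambda} d(J_\lambda x,x)^2 \le \frac{1}{2\mu} d(J_\lambda x,y)^2 + \frac{1}{2\lambda} d(J_\mu y,x)^2. \]
By a simple computation we have
\[ \frac{1}{\mu} d(J_\mu y,y)^2 + \frac{\lambda}{(\lambda-\mu)^2} d(y,x)^2 \le \frac{\mu}{(\lambda-\mu)^2} d(y,x)^2 + \frac{1}{\lambda} d(J_\mu y,x)^2, \]
and furthermore,
\[ \frac{\lambda}{\mu} d(J_\mu y,y)^2 + \frac{\lambda}{\lambda-\mu} d(y,x)^2 \le d(J_\mu y,x)^2. \]
On the other hand, by (1.3.28) we know that
\[ d(J_\mu y,x)^2 \le \frac{\lambda}{\mu} d(J_\mu y,y)^2 + \frac{\lambda}{\lambda-\mu} d(y,x)^2. \]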
Hence all of the above inequalities are actually equalities. In particular, by the uniqueness of the minimizer, we get from (2.2.8) that 𝐽𝜆 𝑥 = 𝐽𝜇 𝑦, which finishes the proof.
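The resolvent identity and the nonexpansiveness from Theorem 2.2.22 can be checked numerically in the simplest Hadamard space, the real line. The following is only an illustrative sketch under the assumption $f(x)=|x|$ on $\mathbb{R}$, for which the resolvent is the well-known soft-thresholding map; the function names are hypothetical and not taken from the text.

```python
def resolvent_abs(x: float, lam: float) -> float:
    """Resolvent J_lam of f(x) = |x| on the real line (soft thresholding).

    Assumption for illustration: H = R with the usual distance.
    """
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def identity_gap(x: float, lam: float, mu: float) -> float:
    """|J_lam x - J_mu(y)| with y = ((lam - mu)/lam) J_lam x + (mu/lam) x."""
    assert lam > mu > 0
    jx = resolvent_abs(x, lam)
    # y is the point of the geodesic from J_lam x to x used in Proposition 2.2.24
    y = (lam - mu) / lam * jx + mu / lam * x
    return abs(jx - resolvent_abs(y, mu))


def is_nonexpansive(x0: float, x1: float, lam: float) -> bool:
    """Check d(J_lam x0, J_lam x1) <= d(x0, x1), up to rounding."""
    lhs = abs(resolvent_abs(x0, lam) - resolvent_abs(x1, lam))
    return lhs <= abs(x0 - x1) + 1e-15


if __name__ == "__main__":
    for x in (-3.0, -0.4, 0.0, 1.0, 2.5, 3.0):
        print(x, identity_gap(x, lam=2.0, mu=0.5))
    print(is_nonexpansive(-3.0, 2.5, lam=2.0))
```

Such a check proves nothing in a general Hadamard space, but it is a quick sanity test of the roles of $\lambda$ and $\mu$ in the identity.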
As a corollary, we obtain that a resolvent is even firmly nonexpansive; see Definition 2.1.13. Indeed, denote 𝑥𝑡 the geodesic from 𝑥0 to 𝐽𝜆 𝑥0 , and 𝑦𝑡 the geodesic from 𝑦0 to 𝐽𝜆 𝑦0 . The resolvent identity from Proposition 2.2.24 gives
\[ d(J_\lambda x_0, J_\lambda y_0) = d\bigl(J_{(1-t)\lambda} x_t, J_{(1-t)\lambda} y_t\bigr) \le d(x_t,y_t), \]
for every $t\in(0,1)$. The joint convexity of the metric (1.2.4) then gives that $J_\lambda$ is firmly nonexpansive.

The resolvents are essential in the proof of the existence of harmonic maps. Indeed, the energy functional is convex and lsc on a suitable Hadamard space $Y$ of $L^2$ mappings (Example 2.2.11) and $J_\lambda y$, with an arbitrary $y\in Y$, converges to a minimizer of the energy functional as $\lambda\to\infty$. This relies upon the following theorem. For the details, see [103–106].

Theorem 2.2.25. Let $(\mathcal{H},d)$ be a Hadamard space, $f\colon\mathcal{H}\to(-\infty,\infty]$ be convex lsc and $x\in\mathcal{H}$. If $(J_{\lambda_n}x)_{n\in\mathbb{N}}$ is bounded for some sequence $\lambda_n\to\infty$, then $\operatorname{Min} f\neq\emptyset$ and
\[ \lim_{\lambda\to\infty} J_\lambda x = P_{\operatorname{Min} f}(x), \]
where $P_{\operatorname{Min} f}$ stands for the metric projection onto $\operatorname{Min} f$.

Proof. For simplicity denote $y_\lambda := J_\lambda x$, with $\lambda>0$. Observe that for each $n\in\mathbb{N}$, the point $y_{\lambda_n}$ is a minimizer of the function $f + \frac{1}{2\lambda_n} d(\cdot,x)^2$ and recall that the sequence $(y_{\lambda_n})_{n\in\mathbb{N}}$ is bounded by the assumptions. These two facts imply that $(y_{\lambda_n})_{n\in\mathbb{N}}$ is a minimizing sequence of $f$. Next we show that the function $\lambda\mapsto d(x,y_\lambda)$ is increasing. Suppose $0<\mu<\lambda$. Then
\[ f(y_\lambda) + \frac{1}{2\mu} d(x,y_\lambda)^2 > f(y_\mu) + \frac{1}{2\mu} d(x,y_\mu)^2, \]
and furthermore,
\[ f(y_\lambda) + \frac{1}{2\lambda} d(x,y_\lambda)^2 > f(y_\mu) + \frac{1}{2\lambda} d(x,y_\mu)^2 + \Bigl(\frac{1}{2\mu} - \frac{1}{2\lambda}\Bigr)\bigl[d(x,y_\mu)^2 - d(x,y_\lambda)^2\bigr], \]
which already implies $d(x,y_\mu) < d(x,y_\lambda)$. As the function $\lambda\mapsto d(x,y_\lambda)$ is bounded on the sequence $(\lambda_n)$ and monotone, it has to be bounded on all of $(0,\infty)$. Clearly, we have
\[ f(y_\lambda) = \inf_{y\in\mathcal{H}} \{ f(y) : d(x,y) \le d(x,y_\lambda) \}, \]
and since $\lambda\mapsto d(x,y_\lambda)$ is increasing, we conclude that $\lambda\mapsto f(y_\lambda)$ is nonincreasing and furthermore that $f(y_\lambda)$ tends to $\inf_{\mathcal{H}} f$ as $\lambda\to\infty$.
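Choose now $\varepsilon>0$ and find $\Lambda>0$ such that for every $\lambda\ge\mu\ge\Lambda$ the following inequality holds
\[ d(x,y_\lambda)^2 - d(x,y_\mu)^2 < \varepsilon. \]
If $y_{\mu\lambda}$ is the midpoint of $y_\mu$ and $y_\lambda$, then we get
\begin{align*}
f(y_\mu) + \frac{1}{2\mu} d(x,y_\mu)^2 &\le f(y_{\mu\lambda}) + \frac{1}{2\mu} d(x,y_{\mu\lambda})^2 \\
&\le f(y_{\mu\lambda}) + \frac{1}{2\mu}\Bigl[d(x,y_\mu)^2 + \frac{\varepsilon}{2} - \frac14 d(y_\mu,y_\lambda)^2\Bigr] \\
&\le f(y_\mu) + \frac{1}{2\mu}\Bigl[d(x,y_\mu)^2 + \frac{\varepsilon}{2} - \frac14 d(y_\mu,y_\lambda)^2\Bigr].
\end{align*}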
Hence $d(y_\mu,y_\lambda)^2 < 2\varepsilon$, that is, $(y_\lambda)_{\lambda>0}$ is Cauchy. By completeness, there is a point $y\in\mathcal{H}$ such that $y_\lambda\to y$ as $\lambda\to\infty$. By the lsc of $f$, the point $y$ is a minimizer of $f$. It is straightforward to show $y = P_{\operatorname{Min} f}(x)$.

Proposition 2.2.26. Let $f\colon\mathcal{H}\to(-\infty,\infty]$ be a convex lsc function and $x\in\mathcal{H}$. Then the function $\lambda\mapsto J_\lambda x$ is continuous on $(0,\infty)$ and
\[ \lim_{\lambda\to0+} J_\lambda x = P_{\overline{\operatorname{dom} f}}(x). \]
In particular, if $x\in\overline{\operatorname{dom} f}$, the function $\lambda\mapsto J_\lambda x$ is continuous on $[0,\infty)$.

Proof. Denote $P := P_{\overline{\operatorname{dom} f}}$. The resolvent identity (Proposition 2.2.24) and Theorem 2.2.22 easily yield continuity at any positive $\lambda$. We leave the details to the reader. To show the second part of the statement, put $x_\lambda := J_\lambda x$ and recall that by the proof of Theorem 2.2.25, the function $\lambda\mapsto f(x_\lambda)$ is nonincreasing; in particular, there exists $\alpha\in\mathbb{R}$ such that $f(x_\lambda)\ge\alpha$ for any $\lambda\in(0,1]$. Fix $\varepsilon>0$ and find $\delta>0$ such that
\[ [d(x,Px)+\delta]^2 - d(x,Px)^2 < \varepsilon. \]
Then choose $z\in\operatorname{dom} f$ with $d(z,Px)<\delta$. Applying (2.1.1) we obtain
\begin{align*}
f(x_\lambda) + \frac{1}{2\lambda} d(x_\lambda,Px)^2 + \frac{1}{2\lambda} d(Px,x)^2 &\le f(x_\lambda) + \frac{1}{2\lambda} d(x_\lambda,x)^2 \\
&\le f(z) + \frac{1}{2\lambda} d(z,x)^2 \\
&\le f(z) + \frac{1}{2\lambda} [d(x,Px)+\delta]^2.
\end{align*}
Consequently,
\[ d(x_\lambda,Px)^2 \le [d(x,Px)+\delta]^2 - d(x,Px)^2 + 2\lambda[f(z)-\alpha], \]
whenever $\lambda\in(0,1]$. By choosing $\lambda$ such that $2\lambda[f(z)-\alpha]<\varepsilon$, we finally get
\[ d(x_\lambda,Px)^2 < 2\varepsilon, \]
which finishes the proof.
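The two limits of the resolvent (as $\lambda\to\infty$ and as $\lambda\to0+$) can be observed on a one-dimensional example. The sketch below assumes $\mathcal{H}=\mathbb{R}$ and $f(x)=\max(|x|-1,0)$, the distance to the interval $[-1,1]$, so that the set of minimizers is $[-1,1]$ and $\operatorname{dom} f=\mathbb{R}$; the closed-form resolvent used here was derived by hand for this particular $f$ and is not part of the text.

```python
def resolvent_dist(x: float, lam: float) -> float:
    """Resolvent of f(x) = max(|x| - 1, 0) on the real line (illustrative choice).

    Inside [-1, 1] the point is already a minimizer of f; outside, the point
    moves toward the interval by lam but never past its boundary.
    """
    if abs(x) <= 1.0:
        return x
    sign = 1.0 if x > 0 else -1.0
    return sign * max(abs(x) - lam, 1.0)


def displacement(x: float, lam: float) -> float:
    """d(x, J_lam x); the proof of Theorem 2.2.25 shows it is monotone in lam."""
    return abs(x - resolvent_dist(x, lam))


if __name__ == "__main__":
    x = 3.0
    for lam in (1e-6, 0.5, 1.0, 2.0, 10.0, 1e6):
        print(lam, resolvent_dist(x, lam), displacement(x, lam))
```

For $x=3$ the printed values move from approximately $3$ (small $\lambda$, the projection onto the closure of the domain) to $1$ (large $\lambda$, the projection onto the set of minimizers), and the displacement never decreases.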
The following result is due to de Giorgi.

Proposition 2.2.27. Let $f\colon\mathcal{H}\to(-\infty,\infty]$ be convex lsc and $x\in\mathcal{H}$. Then the function $\lambda\mapsto f_\lambda(x)$ is locally Lipschitz on $(0,\infty)$ and
\[ \frac{\mathrm{d}}{\mathrm{d}\lambda} f_\lambda(x) = -\frac{d(x,J_\lambda x)^2}{2\lambda^2}, \]
for almost every $\lambda\in(0,\infty)$.

Proof. Let $\mu,\lambda\in(0,\infty)$. By the definition of the resolvent we have
\[ f(J_\lambda x) + \frac{1}{2\mu} d(J_\lambda x,x)^2 \ge f(J_\mu x) + \frac{1}{2\mu} d(J_\mu x,x)^2, \]
and therefore
\begin{align*}
\frac{\lambda-\mu}{2\lambda\mu} d(J_\lambda x,x)^2 &= \Bigl(\frac{1}{2\mu}-\frac{1}{2\lambda}\Bigr) d(J_\lambda x,x)^2 \\
&\ge f(J_\mu x) + \frac{1}{2\mu} d(J_\mu x,x)^2 - f(J_\lambda x) - \frac{1}{2\lambda} d(J_\lambda x,x)^2.
\end{align*}
By symmetry we obtain
\begin{align*}
\frac{\mu-\lambda}{2\lambda\mu} d(J_\mu x,x)^2 &= \Bigl(\frac{1}{2\lambda}-\frac{1}{2\mu}\Bigr) d(J_\mu x,x)^2 \\
&\ge f(J_\lambda x) + \frac{1}{2\lambda} d(J_\lambda x,x)^2 - f(J_\mu x) - \frac{1}{2\mu} d(J_\mu x,x)^2.
\end{align*}
The statement follows easily.

The theory of resolvents continues in Chapter 5, where we use resolvents to define nonlinear gradient flow semigroups.

Remark 2.2.28. The existence of minimizers of the displacement function (see Example 2.2.9 for the definition) gives rise to a classification of isometries on metric spaces. Let $F$ be an isometry on a metric space $(X,d)$. Then it is called
– parabolic if $d_F$ does not attain its infimum,
– elliptic if $d_F$ attains its infimum of $0$,
– hyperbolic if $d_F$ attains its strictly positive infimum.
It is clear that $F$ is semisimple if and only if it is not parabolic; see Example 2.1.6. We refer the interested reader to the Bridson–Haefliger book [51] for more details.

Other properties of convex functions will appear at many places throughout the rest of the book. For instance, in Lemma 3.2.3 we will show that convex functions are lsc if and only if they are weakly lsc.
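Returning briefly to the derivative formula of Proposition 2.2.27, it can be compared with a finite difference in a case where everything is explicit. The sketch below again assumes $\mathcal{H}=\mathbb{R}$ and $f(x)=|x|$ (an illustrative choice, not taken from the text), for which $f_\lambda$ is the Huber function; the helper names and the step size `h` are hypothetical.

```python
def resolvent_abs(x: float, lam: float) -> float:
    """Resolvent of f(x) = |x| on the real line (soft thresholding)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def envelope_abs(x: float, lam: float) -> float:
    """Moreau-Yosida envelope f_lam(x) = f(J_lam x) + d(x, J_lam x)^2 / (2 lam)."""
    j = resolvent_abs(x, lam)
    return abs(j) + (x - j) ** 2 / (2.0 * lam)


def derivative_formula(x: float, lam: float) -> float:
    """Right-hand side of Proposition 2.2.27: -d(x, J_lam x)^2 / (2 lam^2)."""
    j = resolvent_abs(x, lam)
    return -((x - j) ** 2) / (2.0 * lam ** 2)


def finite_difference(x: float, lam: float, h: float = 1e-6) -> float:
    """Central difference of lam -> f_lam(x); valid away from the kink lam = |x|."""
    return (envelope_abs(x, lam + h) - envelope_abs(x, lam - h)) / (2.0 * h)


if __name__ == "__main__":
    for x, lam in ((3.0, 2.0), (1.0, 2.0), (-0.5, 0.2)):
        print(x, lam, finite_difference(x, lam), derivative_formula(x, lam))
```

The formula holds only for almost every $\lambda$; in this example the exceptional point is $\lambda=|x|$, which the sample values avoid.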
2.3 Convexity and probability measures

Next, a relationship between convexity and probability measures is studied in detail. We derive the analytical properties of probability measures here and thereby prepare the ground for Chapter 7, where the stochastic properties of probability measures will be investigated.

Let $(X,d)$ be a metric space and denote by $\mathcal{P}(X)$ the set of all probability measures on $X$. For $p\in[1,\infty)$ let $\mathcal{P}^p(X)$ be the set of $\mu\in\mathcal{P}(X)$ such that
\[ \int_X d(x,y)^p \,\mathrm{d}\mu(y) < \infty, \]
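for some, and hence all, $x\in X$. Denote by $\mathcal{P}^\infty(X)$ the set of measures in $\mathcal{P}(X)$ with bounded support. Observe that $\mathcal{P}^\infty(X)\subset\mathcal{P}^p(X)\subset\mathcal{P}^1(X)$. The Dirac measure at a point $x\in X$ will be denoted $\delta_x$.

Theorem 2.3.1 (Existence of a barycenter). Let $(\mathcal{H},d)$ be a Hadamard space and $\mu\in\mathcal{P}^1(\mathcal{H})$. Given $y\in\mathcal{H}$, there is a unique minimizer of the function
\[ F_y(z) := \int_{\mathcal{H}} \bigl[d(x,z)^2 - d(x,y)^2\bigr] \,\mathrm{d}\mu(x), \qquad z\in\mathcal{H}. \tag{2.3.10} \]
This minimizer is independent of $y$ and is called the barycenter of $\mu$, denoted $b(\mu)$. Given $z\in\mathcal{H}$, the following inequality holds:
\[ \int_{\mathcal{H}} \bigl[d(z,x)^2 - d(b(\mu),x)^2\bigr] \,\mathrm{d}\mu(x) \ge d(z,b(\mu))^2. \tag{2.3.11} \]
If, moreover, $\mu\in\mathcal{P}^2(\mathcal{H})$, then
\[ b(\mu) = \operatorname*{arg\,min}_{z\in\mathcal{H}} \int_{\mathcal{H}} d(x,z)^2 \,\mathrm{d}\mu(x). \tag{2.3.12} \]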
Proof. First observe that for any $y,y'\in\mathcal{H}$ the function $z\mapsto F_y(z) - F_{y'}(z)$ is constant, which means that the barycenter (if it exists) does not depend on $y$. Furthermore,
\begin{align*}
F_y(z) &= \int_{\mathcal{H}} [d(x,z) - d(x,y)]\cdot[d(x,z) + d(x,y)] \,\mathrm{d}\mu(x) \\
&\le d(z,y)\cdot\Bigl[\int_{\mathcal{H}} d(x,z)\,\mathrm{d}\mu(x) + \int_{\mathcal{H}} d(x,y)\,\mathrm{d}\mu(x)\Bigr],
\end{align*}
which yields $F_y(z)<\infty$. We now claim that $F_y$ is strongly convex. Indeed, let $z_t := (1-t)z_0 + tz_1$ with $t\in[0,1]$, then by (1.2.2) we have
\begin{align*}
F_y(z_t) &= \int_{\mathcal{H}} \bigl[d(x,z_t)^2 - d(x,y)^2\bigr] \,\mathrm{d}\mu(x) \\
&\le (1-t)\int_{\mathcal{H}} \bigl[d(x,z_0)^2 - d(x,y)^2\bigr] \,\mathrm{d}\mu(x) + t\int_{\mathcal{H}} \bigl[d(x,z_1)^2 - d(x,y)^2\bigr] \,\mathrm{d}\mu(x) - t(1-t)d(z_0,z_1)^2 \\
&= (1-t)F_y(z_0) + tF_y(z_1) - t(1-t)d(z_0,z_1)^2. \tag{2.3.13}
\end{align*}
Since $F_y$ is obviously continuous, applying Lemma 2.2.17 gives the existence and uniqueness of the barycenter. If we now apply (2.3.13) with $z_0 := b(\mu)$, $z_1 := z$ and $y := b(\mu)$, we get
\[ 0 \le F_{b(\mu)}(z_t) \le tF_{b(\mu)}(z) - t(1-t)d(z,b(\mu))^2, \]
and consequently for any $t\in(0,1)$ one obtains
\[ \int_{\mathcal{H}} \bigl[d(z,x)^2 - d(b(\mu),x)^2\bigr] \,\mathrm{d}\mu(x) \ge (1-t)\,d(z,b(\mu))^2. \]
Letting $t\to0$ yields (2.3.11). If finally $\mu\in\mathcal{P}^2(\mathcal{H})$, then we immediately get (2.3.12).

We have already met barycenters of discrete measures in Example 2.2.5. The barycenter of a probability measure can be approximated via the law of large numbers in Theorem 7.2.1, which turns out to be very useful from the computational point of view; see Section 7.2.

Example 2.3.2 (Pettis integral). If $H$ is a Hilbert space and $\mu\in\mathcal{P}^1(H)$, then
\[ b(\mu) = \int_H x \,\mathrm{d}\mu(x), \]
in the weak (Pettis) sense, that is,
\[ \langle b(\mu),y\rangle = \int_H \langle x,y\rangle \,\mathrm{d}\mu(x), \]
for every $y\in H$. Indeed, by the definition, $b(\mu)$ is the unique minimizer of the function
\[ F\colon z\mapsto \int_H \bigl[d(x,z)^2 - d(x,0)^2\bigr] \,\mathrm{d}\mu(x) = \int_H \bigl[\|x-z\|^2 - \|x\|^2\bigr] \,\mathrm{d}\mu(x). \]
Therefore $z = b(\mu)$ if and only if
\[ \frac{\mathrm{d}}{\mathrm{d}t} F(z+ty)\Big|_{t=0} = 2\int_H \langle z-x,y\rangle \,\mathrm{d}\mu(x) = 0, \]
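for every $y\in H$.

Lemma 2.3.3. Let $C\subset\mathcal{H}$ be a closed convex set and $\mu\in\mathcal{P}^1(\mathcal{H})$. If $\operatorname{supp}\mu\subset C$, then $b(\mu)\in C$.

Proof. Denote by $p := P_C(b(\mu))$ the projection of $b(\mu)$ onto $C$. Then (2.1.1) gives
\[ \int_{\mathcal{H}} \bigl[d(p,x)^2 - d(y,x)^2\bigr] \,\mathrm{d}\mu(x) \le \int_{\mathcal{H}} \bigl[d(b(\mu),x)^2 - d(y,x)^2\bigr] \,\mathrm{d}\mu(x), \]
and by the definition of $b(\mu)$ we must have $b(\mu) = p$.

Definition 2.3.4 (Variance of a measure). For $\mu\in\mathcal{P}(\mathcal{H})$ denote its variance by
\[ \operatorname{var}\mu := \inf_{z\in\mathcal{H}} \int_{\mathcal{H}} d(x,z)^2 \,\mathrm{d}\mu(x). \]
Hence $\mathcal{P}^2(\mathcal{H})$ is the set of the probability measures of finite variance.

Let $\mu\in\mathcal{P}^2(\mathcal{H})$. Inequality (2.3.11) reads
\[ \int_{\mathcal{H}} d(z,x)^2 \,\mathrm{d}\mu(x) \ge d(z,b(\mu))^2 + \operatorname{var}\mu, \tag{2.3.14} \]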
for every $z\in\mathcal{H}$. We call (2.3.14) the variance inequality for $\mu$.

We will now give conditions on a complete metric space in terms of probability measures, each of which is equivalent to CAT(0).

Theorem 2.3.5. Let $(X,d)$ be a complete metric space. Then the following properties are equivalent:
(i) The space $(X,d)$ is CAT(0).
(ii) For any $\mu\in\mathcal{P}^2(X)$, or equivalently for $\mu$ discrete, there exists a point $z_\mu\in X$ such that for each $z\in X$ we have
\[ d(z,z_\mu)^2 + \int_X d(x,z_\mu)^2 \,\mathrm{d}\mu(x) \le \int_X d(x,z)^2 \,\mathrm{d}\mu(x). \]
(iii) We have
\[ \operatorname{var}\mu \le \frac12 \int_X\int_X d(x,y)^2 \,\mathrm{d}\mu(x)\,\mathrm{d}\mu(y), \tag{2.3.15} \]
for every $\mu\in\mathcal{P}^2(X)$, or equivalently for every discrete $\mu\in\mathcal{P}^2(X)$.
(iv) $(X,d)$ is a length space and for every $x_1,x_2,x_3,x_4\in X$ and $s,t\in[0,1]$, we have
\[ s(1-s)d(x_1,x_3)^2 + t(1-t)d(x_2,x_4)^2 \le st\,d(x_1,x_2)^2 + (1-s)t\,d(x_2,x_3)^2 + (1-s)(1-t)d(x_3,x_4)^2 + s(1-t)d(x_4,x_1)^2. \]

Proof. Step 1: The variance inequality (2.3.14) immediately gives (i) ⇒ (ii). To show the converse implication (ii) ⇒ (i), choose $x_0,x_1\in X$ and set $\mu := \frac12(\delta_{x_0}+\delta_{x_1})$. Then there exists $z_\mu\in X$ such that for any $z\in X$ we have
\begin{align*}
\tfrac12 d(z,x_0)^2 + \tfrac12 d(z,x_1)^2 &\ge \tfrac12 d(z_\mu,x_0)^2 + \tfrac12 d(z_\mu,x_1)^2 + d(z,z_\mu)^2 \\
&\ge \tfrac14 d(x_0,x_1)^2 + d(z,z_\mu)^2,
\end{align*}
where the second inequality follows by (1.3.27). Employing Theorem 1.3.2 gives (i).

Step 2: We prove (ii) ⇒ (iii). By integrating the inequality in (ii) against $\mathrm{d}\mu(z)$, we obtain (iii). We get to (iii) ⇒ (iv). Let us first show that $(X,d)$ is a length space. Choose $x,y\in X$ and $\varepsilon>0$. Set $\mu := \frac12(\delta_x+\delta_y)$. Then (iii) yields a point $m\in X$ such that
\[ d(m,x)^2 + d(m,y)^2 \le \tfrac12 d(x,y)^2 + \varepsilon. \]
This via Proposition 1.1.2 gives that $(X,d)$ is a length space. To prove the inequality in (iv), set
\[ \mu := \tfrac12\bigl[s\delta_{x_1} + t\delta_{x_2} + (1-s)\delta_{x_3} + (1-t)\delta_{x_4}\bigr]. \]
Then given $\varepsilon>0$, we know by (iii) that there exists $y\in X$ such that
\begin{align*}
&\tfrac14\bigl[st\,d(x_1,x_2)^2 + (1-s)t\,d(x_2,x_3)^2 + (1-s)(1-t)d(x_3,x_4)^2 \\
&\qquad + s(1-t)d(x_4,x_1)^2 + s(1-s)d(x_1,x_3)^2 + t(1-t)d(x_2,x_4)^2\bigr] + \varepsilon \\
&\quad\ge \tfrac12\bigl[s\,d(y,x_1)^2 + t\,d(y,x_2)^2 + (1-s)d(y,x_3)^2 + (1-t)d(y,x_4)^2\bigr] \\
&\quad\ge \tfrac12\bigl[s(1-s)d(x_1,x_3)^2 + t(1-t)d(x_2,x_4)^2\bigr],
\end{align*}
where the last inequality follows by (1.3.28). We hence obtained (iv).

Step 3: It remains to show that (iv) ⇒ (i). Given $x,y\in X$ and $s\in(0,1)$, Proposition 1.1.2 yields a point $m\in X$ such that
\[ d(x,m)^2 + d(y,m)^2 \le \tfrac12 d(x,y)^2 + s^2. \]
Let $z\in X$ and apply (iv) to $x_1 = z$, $x_2 = y$, $x_3 = m$, $x_4 = x$, and $t = \frac12$. Then
\begin{align*}
s(1-s)d(m,z)^2 &\le \frac{s}{2} d(y,z)^2 + \frac{s}{2} d(x,z)^2 + \frac{1-s}{2} d(m,y)^2 + \frac{1-s}{2} d(m,x)^2 - \frac14 d(x,y)^2 \\
&\le \frac{s}{2} d(y,z)^2 + \frac{s}{2} d(x,z)^2 - \frac{s}{4} d(x,y)^2 + \frac{1-s}{2} s^2.
\end{align*}
Dividing by $s$ and letting $s\to0$ finishes the proof.
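In a Euclidean space the barycenter of a discrete measure is the weighted arithmetic mean (compare Exercise 2.9), and both the variance inequality (2.3.14) and the estimate (2.3.15) hold with equality. The sketch below checks this on three points of $\mathbb{R}^2$; the points, weights and function names are hypothetical choices made only for illustration.

```python
def sqdist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def barycenter(points, weights):
    """Weighted arithmetic mean; in a Hilbert space this is b(mu)."""
    dim = len(points[0])
    return tuple(sum(w * p[k] for p, w in zip(points, weights)) for k in range(dim))


def second_moment(z, points, weights):
    """Integral of d(z, x)^2 against the discrete measure sum_i w_i * delta_{x_i}."""
    return sum(w * sqdist(z, p) for p, w in zip(points, weights))


def variance(points, weights):
    """var(mu), attained at the barycenter."""
    return second_moment(barycenter(points, weights), points, weights)


def half_pairwise(points, weights):
    """Right-hand side of (2.3.15): half the double integral of d(x, y)^2."""
    total = 0.0
    for pi, wi in zip(points, weights):
        for pj, wj in zip(points, weights):
            total += wi * wj * sqdist(pi, pj)
    return 0.5 * total


if __name__ == "__main__":
    pts = [(0.0, 0.0), (4.0, 0.0), (0.0, 2.0)]
    wts = [0.5, 0.25, 0.25]
    b = barycenter(pts, wts)
    z = (2.0, -1.0)
    print("barycenter", b)
    print("var", variance(pts, wts), "half pairwise", half_pairwise(pts, wts))
    print("gap in (2.3.14)", second_moment(z, pts, wts) - sqdist(z, b) - variance(pts, wts))
```

In a genuinely nonpositively curved space one would expect strict inequalities in general; the equality here reflects the flat geometry of the example.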
Lemma 2.3.6. Let $(\mathcal{H}_i,d_i)_{i\in I}$ be a family of (pointed) Hadamard spaces with an arbitrary index set $I$. Let $w_i>0$ for each $i\in I$, and let $(\mathcal{H},d)$ be the weighted product of $(\mathcal{H}_i,d_i)$ defined in (1.2.10). If $\mu\in\mathcal{P}^2(\mathcal{H})$, then $b(\mu) = (b(\mu_i))_{i\in I}$, where $\mu_i\in\mathcal{P}^2(\mathcal{H}_i)$ are the marginal measures of $\mu$.

Proof. Put $y := (b(\mu_i))_{i\in I}$. By the definition, $b(\mu)$ is a minimizer of the function $F_y(z) := \int_{\mathcal{H}} [d(z,x)^2 - d(y,x)^2]\,\mathrm{d}\mu(x)$, where $z\in\mathcal{H}$. We have
\begin{align*}
F_y(z) &= \int_{\mathcal{H}} \sum_{i\in I} w_i \bigl[d_i(z_i,x_i)^2 - d_i(b(\mu_i),x_i)^2\bigr] \,\mathrm{d}\mu(x) \\
&= \sum_{i\in I} w_i \int_{\mathcal{H}_i} \bigl[d_i(z_i,x_i)^2 - d_i(b(\mu_i),x_i)^2\bigr] \,\mathrm{d}\mu_i(x_i)
\end{align*}
and by the variance inequality (2.3.14) we obtain
\[ F_y(z) \ge \sum_{i\in I} w_i\, d_i(b(\mu_i),z_i)^2 = d(y,z)^2, \]
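for any $z\in\mathcal{H}$. It follows that $b(\mu) = y$.

Lemma 2.3.7. Let $f\colon\mathcal{H}\to(-\infty,\infty]$ be a convex lsc function and $\mu\in\mathcal{P}^1(\mathcal{H})$. Then $\int_{\mathcal{H}} f\,\mathrm{d}\mu$ is well defined and is equal to a real number or $\infty$.

Proof. Denote $f_+ := \max\{f,0\}$ and $f_- := \max\{-f,0\}$. The integral $\int_{\mathcal{H}} f\,\mathrm{d}\mu$ is well defined if $\int f_+\,\mathrm{d}\mu<\infty$ or $\int f_-\,\mathrm{d}\mu<\infty$. We claim that one always has $\int f_-\,\mathrm{d}\mu<\infty$. Indeed, choose a point $x_0\in\mathcal{H}$. Lemma 2.2.13 gives $\alpha,\beta\in\mathbb{R}$ such that
\[ f(x) \ge \alpha + \beta\, d(x,x_0), \]
for every $x\in\mathcal{H}$. Without loss of generality assume $\alpha = 0$ and $\beta<0$. Then $f_-(x)\le-\beta\, d(x,x_0)$ for every $x\in\mathcal{H}$ and therefore
\[ \int_{\mathcal{H}} f_-(x)\,\mathrm{d}\mu(x) \le \int_{\mathcal{H}} -\beta\, d(x,x_0)\,\mathrm{d}\mu(x) < \infty, \]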
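since $\mu\in\mathcal{P}^1(\mathcal{H})$.

Theorem 2.3.8 (Jensen inequality). Let $(\mathcal{H},d)$ be a Hadamard space and $\mu\in\mathcal{P}^1(\mathcal{H})$. For a convex lsc function $f\colon\mathcal{H}\to(-\infty,\infty]$ we have
\[ f(b(\mu)) \le \int_{\mathcal{H}} f(x)\,\mathrm{d}\mu(x). \tag{2.3.16} \]

Proof. Put $\hat f(x) := (x,f(x))$ for every $x\in\mathcal{H}$, and let $\hat\mu$ be the image of $\mu$ under $\hat f$.

Step 1: Let us first assume that $f$ is bounded from below. Without loss of generality we can then assume $\int_{\mathcal{H}} f(x)\,\mathrm{d}\mu(x)<\infty$, for otherwise the right hand side of (2.3.16) is infinite. Then $\hat\mu\in\mathcal{P}^1(\mathcal{H}\times\mathbb{R})$. Indeed, for any $\hat z := (z,t)\in\mathcal{H}\times\mathbb{R}$ we have
\begin{align*}
\int_{\mathcal{H}\times\mathbb{R}} d(\hat z,\hat x)\,\mathrm{d}\hat\mu(\hat x) &= \int_{\mathcal{H}} d\bigl(\hat z,(x,f(x))\bigr)\,\mathrm{d}\mu(x) \\
&= \int_{\mathcal{H}} \bigl[d(z,x)^2 + |t-f(x)|^2\bigr]^{\frac12}\,\mathrm{d}\mu(x) \\
&\le \int_{\mathcal{H}} \bigl[d(z,x) + |t-f(x)|\bigr]\,\mathrm{d}\mu(x) < \infty.
\end{align*}
Applying Lemma 2.3.6 gives
\[ b(\hat\mu) = \Bigl(b(\mu), \int_{\mathcal{H}} f(x)\,\mathrm{d}\mu(x)\Bigr). \]
Since $\operatorname{supp}\hat\mu\subset\operatorname{epi} f$, we have by Lemma 2.3.3 that $b(\hat\mu)\in\operatorname{epi} f$. That is,
\[ f(b(\mu)) \le \int_{\mathcal{H}} f(x)\,\mathrm{d}\mu(x). \]

Step 2: If $f$ is not bounded from below, define the functions $g_n := \max(f,-n)$, for $n\in\mathbb{N}$, which are convex lsc and bounded from below. By Step 1 and the monotonicity of the approximation,
\[ f(b(\mu)) \le \lim_{n\to\infty} \int_{\mathcal{H}} g_n(x)\,\mathrm{d}\mu(x) = \int_{\mathcal{H}} f(x)\,\mathrm{d}\mu(x). \]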
The proof is complete.

A stochastic proof of the Jensen inequality will appear in Theorem 7.2.2. The Jensen inequality in turn implies convexity. Indeed, let $f\colon\mathcal{H}\to(-\infty,\infty]$ be a measurable function and $x\colon[0,1]\to\mathcal{H}$ be a geodesic. When we choose $t\in(0,1)$, the measure $\mu := (1-t)\delta_{x_0} + t\delta_{x_1}$ has the barycenter $b(\mu) = x_t$, and the Jensen inequality (2.3.16) gives
\[ f(x_t) \le (1-t)f(x_0) + tf(x_1), \]
which is of course the convexity condition.

We finish this section by making a connection with the Wasserstein distance. For the details, the interested reader is referred to [8]. Given $\mu,\nu\in\mathcal{P}(\mathcal{H})$, we say that a measure $\zeta\in\mathcal{P}(\mathcal{H}\times\mathcal{H})$ is a coupling of $\mu$ and $\nu$ if
\[ \zeta(A\times\mathcal{H}) = \mu(A), \quad\text{and}\quad \zeta(\mathcal{H}\times A) = \nu(A), \]
for every Borel $A\subset\mathcal{H}$. One such coupling $\zeta$ is the product measure $\mu\otimes\nu$.

Definition 2.3.9 (Wasserstein distance). The Wasserstein distance of measures $\mu,\nu\in\mathcal{P}^1(\mathcal{H})$ is given as
\[ d_W(\mu,\nu) := \inf\Bigl\{ \int_{\mathcal{H}\times\mathcal{H}} d(x,y)\,\mathrm{d}\zeta(x,y) : \zeta \text{ is a coupling of } \mu,\nu \Bigr\}. \]
It is well known that $(\mathcal{P}^1(\mathcal{H}),d_W)$ is a complete metric space. We have the following estimate on barycenters in terms of the Wasserstein distance.

Proposition 2.3.10. If $\mu,\nu\in\mathcal{P}^1(\mathcal{H})$, then $d(b(\mu),b(\nu))\le d_W(\mu,\nu)$.

Proof. Let $\zeta\in\mathcal{P}^1(\mathcal{H}\times\mathcal{H})$ be a coupling of $\mu$ and $\nu$. Then $b(\zeta) = (b(\mu),b(\nu))$ by Lemma 2.3.6. Applying Theorem 2.3.8 on the Hadamard space $\mathcal{H}\times\mathcal{H}$ to the convex function $d\colon\mathcal{H}\times\mathcal{H}\to\mathbb{R}$ yields
\[ d(b(\mu),b(\nu)) = d(b(\zeta)) \le \int_{\mathcal{H}\times\mathcal{H}} d(w)\,\mathrm{d}\zeta(w). \]
Therefore $d(b(\mu),b(\nu))\le d_W(\mu,\nu)$.

Let $\psi$ be an isometry between two Hadamard spaces $(\mathcal{H},d)$ and $(\mathcal{H}',d')$. If $\mu\in\mathcal{P}^1(\mathcal{H})$, then the pushforward $\psi_*(\mu)$ belongs to $\mathcal{P}^1(\mathcal{H}')$ and
\[ \psi_*\colon\mathcal{P}^1(\mathcal{H})\to\mathcal{P}^1(\mathcal{H}') \]
is an isometry between the spaces of probability measures equipped with the Wasserstein distances.

The last result is just a special case of Proposition 2.3.10; we record it explicitly for its significance and give an elementary proof. We use Notation 2.1.9.

Corollary 2.3.11. Given $(w_1,\dots,w_N)\in\Delta_{N-1}$, the mean of points $a_1,\dots,a_N\in\mathcal{H}$ was defined in Example 2.2.5 as
\[ y := \operatorname*{arg\,min}_{x\in\mathcal{H}} \sum_{n=1}^N w_n\, d(x,a_n)^2. \]
Let now $a'_1,\dots,a'_N\in\mathcal{H}$ and denote
\[ y' := \operatorname*{arg\,min}_{x\in\mathcal{H}} \sum_{n=1}^N w_n\, d(x,a'_n)^2. \]
Then we have
\[ d(y,y') \le \sum_{n=1}^N w_n\, d(a_n,a'_n). \]
Proof. Inequality (1.3.15) yields
\[ d(a_n,y')^2 + d(a'_n,y)^2 \le d(a_n,y)^2 + d(a'_n,y')^2 + 2d(y,y')\,d(a_n,a'_n), \]
multiplying by $w_n$ and summing up over $n$ from $1$ to $N$ further gives
\[ \sum_{n=1}^N w_n\bigl[d(a_n,y')^2 + d(a'_n,y)^2\bigr] \le \sum_{n=1}^N w_n\bigl[d(a_n,y)^2 + d(a'_n,y')^2\bigr] + 2d(y,y')\sum_{n=1}^N w_n\, d(a_n,a'_n). \]
By (2.2.3) we have
\[ \sum_{n=1}^N w_n\bigl[d(a_n,y')^2 + d(a'_n,y)^2\bigr] \ge \sum_{n=1}^N w_n\bigl[d(a_n,y)^2 + d(a'_n,y')^2\bigr] + 2d(y,y')^2. \]
Altogether we obtain
\[ d(y,y') \le \sum_{n=1}^N w_n\, d(a_n,a'_n), \]
which finishes the proof.
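The stability estimate of Corollary 2.3.11 is easy to observe numerically in the Euclidean plane, where the mean is the weighted arithmetic mean (Exercise 2.9) and the estimate reduces to the triangle inequality. The sketch below is only an illustration under that assumption; the sample points and the helper names are hypothetical.

```python
import math


def weighted_mean(points, weights):
    """Mean of points in R^2 in the sense of Example 2.2.5 (Euclidean case)."""
    return tuple(sum(w * p[k] for p, w in zip(points, weights)) for k in range(2))


def transport_bound(points_a, points_b, weights):
    """Right-hand side of Corollary 2.3.11: sum_n w_n d(a_n, a'_n)."""
    return sum(w * math.dist(a, b) for a, b, w in zip(points_a, points_b, weights))


if __name__ == "__main__":
    a = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
    a_prime = [(1.0, 0.0), (2.0, 1.0), (0.0, 4.0)]
    w = [0.5, 0.25, 0.25]
    y, y_prime = weighted_mean(a, w), weighted_mean(a_prime, w)
    print("d(y, y') =", math.dist(y, y_prime))
    print("bound    =", transport_bound(a, a_prime, w))
```

The bound is the cost of the coupling that pairs each $a_n$ with $a'_n$, which is how the corollary fits into Proposition 2.3.10.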
Exercises

Exercise 2.1. Find a geodesic metric space $(X,d)$ and a ball $B(x,r)\subset X$ which is not convex.

Exercise 2.2. Let $(\mathcal{H}_1,d_1)$ and $(\mathcal{H}_2,d_2)$ be two Hadamard spaces and $C\subset\mathcal{H}_1\times\mathcal{H}_2$ be convex. Denote by $\pi_2\colon\mathcal{H}_1\times\mathcal{H}_2\to\mathcal{H}_2$ the projection onto the second factor. Show that the set $\pi_2(C)$ is convex.

Exercise 2.3. Prove Lemma 2.2.1.

Exercise 2.4. Any continuous convex function on a Hilbert space is locally Lipschitz. Is it also true in Hadamard spaces?

Exercise 2.5. Show that a Busemann function from Example 2.2.10 is well defined, that is, the limit exists. Furthermore, show it is convex and Lipschitz. Finally, choose a geodesic ray in $\mathbb{R}^2$ and find the corresponding Busemann function and horoballs.

Exercise 2.6. Assume $f\colon\mathbb{R}^n\to\mathbb{R}$ is a twice differentiable function and $\kappa>0$. Show that the following are equivalent:
(i) The function $f$ is strongly convex with parameter $\kappa$.
(ii) Given $p\in\mathbb{R}^n$, each eigenvalue $\lambda$ of the Hessian matrix $\nabla^2 f(p)$ satisfies $\lambda\ge\kappa$.
(iii) If $x\colon[0,\infty)\to\mathbb{R}^n$ and $y\colon[0,\infty)\to\mathbb{R}^n$ satisfy
\[ \dot x(t) = -\nabla f(x(t)), \qquad \dot y(t) = -\nabla f(y(t)), \qquad t\in(0,\infty), \]
along with the initial conditions $x(0) = x_0$ and $y(0) = y_0$ for given $x_0,y_0\in\mathbb{R}^n$, then we have
\[ d(x(t),y(t)) \le \exp(-t\kappa)\, d(x(0),y(0)), \]
for every $t\in[0,\infty)$.

Exercise 2.7. Prove Lemma 2.2.15.

Exercise 2.8. Let $(\mathcal{H},d)$ be a Hadamard space and $f\colon\mathcal{H}\to(-\infty,\infty]$ be a convex lsc function. Show that the Moreau–Yosida envelope $f_\lambda$ is convex, but not necessarily lsc.
Hint. Observe that the function $g\colon\mathcal{H}\times\mathcal{H}\to(-\infty,\infty]$ given by
\[ g(x,y) := f(x) + \frac{1}{2\lambda} d(x,y)^2 \]
is jointly convex because of (1.2.4) and $\operatorname{epi} f_\lambda = \pi_2(\operatorname{epi} g)$, where $\pi_2\colon\mathcal{H}\times\mathcal{H}\times\mathbb{R}\to\mathcal{H}\times\mathbb{R}$ is the projection onto the factor $\mathcal{H}\times\mathbb{R}$. Apply Exercise 2.2 with $\mathcal{H}_1 := \mathcal{H}$ and $\mathcal{H}_2 := \mathcal{H}\times\mathbb{R}$.

Exercise 2.9 (Means in Hilbert spaces). Show that if $H$ is a Hilbert space, $a_1,\dots,a_N\in H$ and $(w_1,\dots,w_N)\in\Delta_{N-1}$, then
\[ \operatorname*{arg\,min}_{x\in H} \sum_{n=1}^N w_n\, d(x,a_n)^2 = w_1a_1 + \dots + w_Na_N. \tag{2.3.17} \]
In other words, the mean coincides with the arithmetic mean.

Exercise 2.10. Find an alternative proof of the variance inequality (2.3.14) for a measure $\mu\in\mathcal{P}^2(\mathcal{H})$.
Hint. See (7.1.2).

Exercise 2.11. Let $(\mathcal{H},d)$ be a Hadamard space. Show that an arbitrary family of bounded closed convex sets has the finite intersection property.
Hint. Finite subsets of the index set form a directed set. Apply Proposition 2.1.16 to the appropriate net of sets.

Exercise 2.12. Prove that the closure of $\operatorname{co} S$ equals $\overline{\operatorname{co}}\,S$ for any subset $S$ of a Hadamard space.
Bibliographical remarks

Convexity in Hadamard manifolds was systematically studied in [43]. Geometric aspects of convexity in Hadamard spaces appear throughout the whole Bridson–Haefliger book [51]. A more analytical perspective was presented by J. Jost [104]. Lemma 2.1.8 comes from [102, Lemma 2.6]. The formal convex hull was introduced by U. Lang and V. Schroeder in [127] under the name correspondence. Theorem 2.1.12 appeared in [51, Proposition 2.4, p.176] and in the Hadamard manifold case in [43, Lemma 3.2] and [74, Proposition 1.6.3]. We follow Sturm’s exposition [195, Proposition 2.6]. Firmly nonexpansive mappings (Definition 2.1.13) in the Hilbert ball are defined on page 124 of the Goebel–Reich book [92]. More recently, they were used in the paper [119] by E. Kopecká and S. Reich. Proposition 2.1.15 mimics the linear version from [166, Theorem 5.6]; see also [90, p. 247]. The fact that there exists a nonconvex Chebyshev set in the hyperbolic plane was pointed out by Genaro López–Acedo. Proposition 2.1.16 comes from [88, Lemma 2.2] and the statement in Exercise 2.11 from [126, Lemma 2.2]. The Hilbert ball case of Proposition 2.1.16 appears in [92, Theorem 18.1].

Optimization problems with the objective function as in Example 2.2.6 have been recently studied from various perspectives [16, 145–147]. The estimate in Lemma 2.2.13 is a nonlinear variant of [13, Lemma 1.5]. Lemma 2.2.19 comes from [88, Lemma 2.2] and Proposition 2.2.27 from [7, Theorem 3.9]. See also [92, Theorem 18.1].

The study of resolvents in Hadamard spaces was initiated by J. Jost in [103] and independently by U. Mayer [139]. The fact that resolvents are nonexpansive (Theorem 2.2.22) was shown in [103, Lemma 4] and [139, Lemma 1.12]. Lemma 2.2.23 is well known; see for instance [8, (4.1.2)]. The resolvent identity in Proposition 2.2.24 is taken from [139, Lemma 1.10]. A similar form of the resolvent identity appeared in [105]. Theorem 2.2.25 is due to J. Jost [103, Theorem 1] and was used to obtain the existence of harmonic maps. For a recent application of the Hamilton–Jacobi semigroup in length spaces, see [134]. The resolvent of a convex lsc function on a Hilbert space is a standard tool in convex analysis [31].

Section 2.3 comes from [195]. Since this paper leaves no space for improvement, we follow its results word by word. It is worth mentioning that the definition of a barycenter given therein applies to $\mathcal{P}^1$ measures, not just $\mathcal{P}^2$ measures. J. Jost was the first to define a barycenter in Hadamard spaces [102, Definition 2.3] and [104, Definition 3.2.1]. The proof of Corollary 2.3.11 is from [126, Lemma 4.2].
3 Weak convergence in Hadamard spaces

Let $(\mathcal{H},d)$ be a Hadamard space. So far we have considered only the topology induced by the metric $d$, in which a sequence $(x_n)\subset\mathcal{H}$ converges to a point $x\in\mathcal{H}$ if $d(x,x_n)\to0$. This convergence is occasionally referred to as strong and denoted $x_n\to x$. It turns out however that if the space is not locally compact, some sequences which do not converge strongly at least converge in a weaker sense, and in many cases this weak convergence is sufficient to obtain a desired result. We will encounter this phenomenon chiefly in Chapters 5 and 6. The situation is completely analogous to strong and weak convergence in Hilbert spaces. In particular, we will show that
– any bounded sequence has a weakly convergent subsequence,
– any convex closed set is (sequentially) weakly closed, and
– any convex lsc function is (sequentially) weakly lsc.
On the other hand it is not known whether on a given Hadamard space there exists a topology which induces this type of convergence; see Question 3.1.8. Note that we will work with sequences even though it causes no trouble to replace sequences by nets.

3.1 Existence of weak limits

Our first important result is the fact that a bounded sequence has a weakly convergent subsequence. Let $(\mathcal{H},d)$ be a Hadamard space and $(x_n)\subset\mathcal{H}$ be a bounded sequence. Define the function $\omega\colon\mathcal{H}\to[0,\infty)$ as
\[ \omega(x;(x_n)) := \limsup_{n\to\infty} d(x,x_n)^2, \qquad x\in\mathcal{H}. \tag{3.1.1} \]
This function is strongly convex (see Example 2.2.8) and therefore, by Proposition 2.2.17, it has a unique minimizer, which is called the asymptotic center of $(x_n)$.

Definition 3.1.1 (Weak convergence). We shall say that $(x_n)\subset\mathcal{H}$ weakly converges to a point $x\in\mathcal{H}$ if $x$ is the asymptotic center of each subsequence of $(x_n)$. We use the notation $x_n\xrightarrow{w}x$.

Clearly, if $x_n\to x$, then $x_n\xrightarrow{w}x$. If there is a subsequence $(x_{n_k})$ of $(x_n)$ such that $x_{n_k}\xrightarrow{w}z$ for some $z\in\mathcal{H}$, we say that $z$ is a weak cluster point of the sequence $(x_n)$. It is convenient to denote $r(x_n) := \inf_{x\in\mathcal{H}}\omega(x;(x_n))$.

Proposition 3.1.2. Each bounded sequence has a weakly convergent subsequence, or in other words, each bounded sequence has a weak cluster point.
Proof. If $(u_n)$ is a subsequence of $(v_n)$, we will use the notation $(u_n)\preceq(v_n)$. Let $(x_n)\subset\mathcal{H}$ be a bounded sequence. Denote
\[ \rho_0 := \inf\{ r(v_n) : (v_n)\preceq(x_n) \}, \]
and select $(v_n^1)\preceq(x_n)$ such that
\[ r(v_n^1) < \rho_0 + 1. \]
Denote
\[ \rho_1 := \inf\{ r(v_n) : (v_n)\preceq(v_n^1) \}. \]
Having $(v_n^i)\preceq(v_n^{i-1})$, set
\[ \rho_i := \inf\{ r(v_n) : (v_n)\preceq(v_n^i) \}. \]
Select $(v_n^{i+1})\preceq(v_n^i)$ such that
\[ r(v_n^{i+1}) \le \rho_i + \frac{1}{i+1}. \]
Since $(\rho_n)$ is nondecreasing and bounded from above by $r(x_n)$, it has a limit, say $\rho$. Now take the diagonal sequence $(v_k^k)$ and fix $i\in\mathbb{N}$. Then $(v_k^k)$ is a subsequence (modulo the first $i-1$ elements) of $(v_n^i)$, and hence $r(v_k^k)\ge\rho_i$. On the other hand, for the same fixed $i\in\mathbb{N}$, we have that $(v_k^k)$ is a subsequence (modulo the first $i$ elements) of $(v_n^{i+1})$, which gives $r(v_k^k)\le\rho_i+\frac{1}{i+1}$. Taking the limit $i\to\infty$ gives $r(v_k^k) = \rho$.

Since any subsequence $(u_n)$ of $(v_k^k)$ also (for the same reasons) satisfies the inequalities $r(u_n)\ge\rho_i$ and $r(u_n)\le\rho_i+\frac{1}{i+1}$, for any $i\in\mathbb{N}$, one gets
\[ r(u_n) = \rho. \tag{3.1.2} \]
One can conclude that $(v_k^k)$ is the desired subsequence. We also know that there exists a unique point $x\in\mathcal{H}$ such that $\limsup_{k\to\infty} d(x,v_k^k)^2 = \rho$. By (3.1.2) we get $v_k^k\xrightarrow{w}x$.

Proposition 3.1.3. For a bounded sequence $(x_n)\subset\mathcal{H}$ and $x\in\mathcal{H}$, the following are equivalent:
(i) The sequence $(x_n)$ weakly converges to $x$.
(ii) For every geodesic $\gamma\colon[0,1]\to\mathcal{H}$ with $x\in\gamma$, we have $P_\gamma x_n\to x$ as $n\to\infty$.
(iii) For every $y\in\mathcal{H}$, we have $P_{[x,y]}x_n\to x$ as $n\to\infty$.

Proof. (i) ⇒ (ii): Let $\gamma\colon[0,1]\to\mathcal{H}$ be a geodesic with $x\in\gamma$. If
\[ \limsup_{n\to\infty} d(x,P_\gamma x_n) > 0, \]
then there exists a subsequence $(y_n)$ of $(x_n)$ such that $P_\gamma y_n$ converges to some $y\in\gamma\setminus\{x\}$. But then
\[ \limsup_{n\to\infty} d(y,y_n)^2 = \limsup_{n\to\infty} d(P_\gamma y_n,y_n)^2 \le \limsup_{n\to\infty} d(x,y_n)^2, \]
as a consequence of (2.1.1). It contradicts $y_n\xrightarrow{w}x$.

(ii) ⇒ (iii): Trivial.

(iii) ⇒ (i): If the sequence $(x_n)$ does not converge weakly to $x$, then there exists a subsequence $(y_n)$ of $(x_n)$ such that
\[ \limsup_{n\to\infty} d(y,y_n)^2 < \limsup_{n\to\infty} d(x,y_n)^2, \]
for some $y\in\mathcal{H}\setminus\{x\}$. But then
\[ \limsup_{n\to\infty} d(x,P_{[x,y]}y_n) > 0, \]
which contradicts (iii).

One can easily see from Proposition 3.1.3 that in Hilbert spaces, the notion of weak convergence defined above coincides with the classical weak convergence; see Exercise 3.1. In the following series of lemmas we extend various properties of the weak convergence from Hilbert spaces into Hadamard spaces.

Lemma 3.1.4 (Opial property). Let $(x_n)\subset\mathcal{H}$ be a sequence weakly converging to a point $x\in\mathcal{H}$. Then we have
\[ \liminf_{n\to\infty} d(x_n,x) < \liminf_{n\to\infty} d(x_n,z) \]
for every $z\in\mathcal{H}\setminus\{x\}$.

Proof. Follows from Proposition 3.1.3.

Theorem 3.1.5 is an analog of the weak Banach–Saks property in Hilbert spaces, which uses means of points defined in Example 2.2.5 to replace Cesàro means in Hilbert space. The proof relies upon the Opial property from Lemma 3.1.4.

Theorem 3.1.5 (Weak Banach–Saks property). Let $(x_n)\subset\mathcal{H}$ be a sequence weakly converging to a point $x$. Then it has a subsequence $(x_{n_k})$ such that the sequence of the barycenters
\[ b\Bigl(\frac1k\sum_{i=1}^k \delta_{x_{n_i}}\Bigr), \qquad k\in\mathbb{N}, \]
converges to $x$ as $k\to\infty$.

Proof. Select a subsequence of $(x_n)$, still denoted $(x_n)$, such that $d(x,x_n)\to r$, for some $r\in[0,\infty)$. If $r = 0$, we are done. Let us hence assume $r>0$. Also observe that
\[ \lim_{n\to\infty} \frac1n\sum_{i=1}^n d(x,x_i)^2 = r^2. \]
We will proceed in two steps.
Step 1: Given a finite set $I\subset\mathbb{N}$, we denote
\[ V(I) := \operatorname{var}\Bigl(\frac{1}{|I|}\sum_{i\in I}\delta_{x_i}\Bigr), \]
where $|I|$ stands for the cardinality of $I$. Recall that the variance of a measure was defined in Definition 2.3.4. Put $I_k^N := \{(k-1)2^N,\dots,k2^N\}\subset\mathbb{N}$, for every $k,N\in\mathbb{N}$. We claim that if
\[ \sup_{N\in\mathbb{N}} \liminf_{k\to\infty} V(I_k^N) = r^2, \tag{3.1.3} \]
then
\[ b_n := b\Bigl(\frac1n\sum_{i=1}^n\delta_{x_i}\Bigr), \qquad n\in\mathbb{N}, \]
converges to $x$. Indeed, for every $\varepsilon>0$, there exists $N\in\mathbb{N}$ such that
\[ \liminf_{k\to\infty} V(I_k^N) \ge r^2 - \varepsilon. \]
Then we have
\begin{align*}
r^2 &\ge \limsup_{n\to\infty} \frac1n\sum_{i=1}^n d(b_n,x_i)^2 \\
&\ge \liminf_{n\to\infty} \frac1n\sum_{i=1}^n d(b_n,x_i)^2 \\
&\ge \liminf_{k\to\infty} \frac1k\sum_{i=1}^k V(I_i^N) \\
&\ge r^2 - \varepsilon.
\end{align*}
Since $\varepsilon$ was arbitrary, we get $\lim_{n\to\infty}\frac1n\sum_{i=1}^n d(b_n,x_i)^2 = r^2$. The variance inequality (2.3.14) reads
\[ \frac1n\sum_{i=1}^n d(x,x_i)^2 - \frac1n\sum_{i=1}^n d(b_n,x_i)^2 \ge d(b_n,x)^2, \]
and immediately implies $b_n\to x$.

Step 2: We will select a subsequence of $(x_n)$ which ultimately satisfies assumption (3.1.3) in the previous step. Put $J_k^0 := \{k\}$ for each $k\in\mathbb{N}$, and construct inductively, for every $N\in\mathbb{N}$, a sequence $(J_k^N)_{k\in\mathbb{N}}$ of subsets of $\mathbb{N}$ of cardinality $2^N$ such that $J_k^N = J_l^{N-1}\cup J_m^{N-1}$ for some $l,m\in\mathbb{N}$, and $\max J_k^N < \min J_{k+1}^N$, and $J_1^N\subset J_1^{N+1}$, and
\[ \lim_{k\to\infty} V(J_k^N) = V_N := \limsup_{l,m\to\infty} V\bigl(J_l^{N-1}\cup J_m^{N-1}\bigr). \]
It is easy to see that $V_N\le V_{N+1}\le r^2$ for every $N\in\mathbb{N}$. We claim that $V_N\to r^2$ as $N\to\infty$. To prove this, we will show that for each $\varepsilon\in(0,r)$ there exists $\delta>0$ such
that if $V_N<(r-\varepsilon)^2$ for some $N\in\mathbb{N}$, then $V_{N+1}>V_N+\delta$. Fix $N\in\mathbb{N}$ and for each $l\in\mathbb{N}$, denote by $b_l^N$ the barycenter of the measure
\[ \frac{1}{2^N}\sum_{i\in J_l^N}\delta_{x_i}. \]
Fix $l\in\mathbb{N}$ such that $V(J_i^N)<(r-\varepsilon)^2$ for each $i>l$. Then choose $m>l$ such that $d(b_l^N,x_i)>r$ for every $i\in J_m^N$ on account of Lemma 3.1.4. We arrive at
\[ \frac{1}{2^N}\sum_{i\in J_m^N} d(b_l^N,x_i)^2 > r^2 > (r-\varepsilon)^2 > V(J_m^N) = \frac{1}{2^N}\sum_{i\in J_m^N} d(b_m^N,x_i)^2, \]
and consequently,
\[ 2\max\bigl\{ d(b_m^N,b_{l\cup m}^{N+1}),\, d(b_{l\cup m}^{N+1},b_l^N) \bigr\} \ge d(b_m^N,b_l^N) > \varepsilon, \]
where $b_{l\cup m}^{N+1}$ stands for the barycenter of the measure
\[ \frac{1}{2^{N+1}}\sum_{i\in J_l^N\cup J_m^N}\delta_{x_i}. \]
Invoking the variance inequality (2.3.14) again, we obtain
\begin{align*}
V(J_l^N\cup J_m^N) &= \frac{1}{2^{N+1}}\Bigl[\sum_{i\in J_l^N} d(b_{l\cup m}^{N+1},x_i)^2 + \sum_{i\in J_m^N} d(b_{l\cup m}^{N+1},x_i)^2\Bigr] \\
&\ge \frac12\Bigl[V(J_l^N) + V(J_m^N) + \frac{\varepsilon^2}{4}\Bigr].
\end{align*}
We have just proved $V_N\to r^2$ as $N\to\infty$.

Finally, observe that the set $\bigcap_N\bigcup_k J_k^N\subset\mathbb{N}$ is infinite. Let us denote its elements in increasing order by $(n_1,n_2,\dots)$. Then the sequence $(x_{n_k})$ satisfies assumption (3.1.3) in Step 1, modulo renaming $(x_{n_k})$ to $(x_n)$. The proof is hence complete.

The conclusion of Proposition 3.1.6 is a counterpart of the Kadec–Klee property in Banach spaces. Recall that a Banach space has the Kadec–Klee property if the weak and strong convergences of sequences coincide on the unit sphere of the Banach space [82]. Hilbert spaces of course enjoy this property.

Proposition 3.1.6 (Kadec–Klee property). Let $(x_n)\subset\mathcal{H}$ and $x\in\mathcal{H}$. Then $x_n\to x$ if and only if $x_n\xrightarrow{w}x$ and $d(x_n,y)\to d(x,y)$ for some $y\in\mathcal{H}$.

Proof. Let $x_n\xrightarrow{w}x$ and $d(x_n,y)\to d(x,y)$ for some $y\in\mathcal{H}$. Note that $d(x_n,y)^2\ge d(x_n,P_{[x,y]}x_n)^2 + d(P_{[x,y]}x_n,y)^2$ holds. Since $P_{[x,y]}x_n$ converges to $x$ by Proposition 3.1.3, we have $d(x_n,P_{[x,y]}x_n)^2\to0$ and consequently $x_n\to x$. The converse implication is trivial.
A somewhat quantified version of Proposition 3.1.6 is in the following result.

Proposition 3.1.7 (Uniform Kadec–Klee property). For every $r>0$ and $\varepsilon\in(0,r)$ there exists $\delta\in(0,r)$ such that for every $y\in\mathcal{H}$ and a sequence $(x_n)\subset B(y,r)$ weakly converging to a point $x\in\mathcal{H}$ and satisfying
\[ \limsup_{n\to\infty} d(x,x_n) > \varepsilon, \]
we have $d(x,y)\le r-\delta$.

Proof. Select a subsequence of $(x_n)$, still denoted $(x_n)$, such that $d(x,x_n)>\varepsilon$ for each $n\in\mathbb{N}$. Given $\lambda>0$, there exists $n_0\in\mathbb{N}$ so that for every $n>n_0$, we have
\[ d(y,x_n)^2 \ge d(x,y)^2 + d(x,x_n)^2 - \lambda, \]
on account of (2.1.1). Then $d(x,y)^2\le r^2-\varepsilon^2+\lambda$, and since $\lambda$ was arbitrary, we have $d(x,y)^2\le r^2-\varepsilon^2$.

The existence of weak convergence raises the following question.

Question 3.1.8 (Weak topology). Is there a topology $\tau$ on $(\mathcal{H},d)$ such that for each bounded sequence $(x_n)\subset\mathcal{H}$ and a point $x\in\mathcal{H}$ we have $x_n\xrightarrow{w}x$ if and only if $x_n\xrightarrow{\tau}x$?

In Chapters 4 and 5 we will need the notion of the weak convergence of a curve to a point.

Definition 3.1.9 (Weak convergence of a curve). Let $(\mathcal{H},d)$ be a Hadamard space and $c\colon[0,\infty)\to\mathcal{H}$ be a curve (that is, a continuous mapping). Then $c$ is said to weakly converge to a point $x\in\mathcal{H}$ if $c(t_n)\xrightarrow{w}x$ for each sequence $(t_n)\subset[0,\infty)$ with $t_n\to\infty$. We then write $c(t)\xrightarrow{w}x$.

Note that the previous definition implicitly presumes that $c$ is bounded. Alternatively, one might view a curve as a net.
3.2 Weak convergence and convexity

We now get to the remaining two properties of weak convergence promised in the beginning of the present chapter. Let again $(\mathcal{H}, d)$ be a Hadamard space.

Lemma 3.2.1. Let $C \subset \mathcal{H}$ be a closed convex set and $(x_n) \subset C$. If the sequence $(x_n)$ weakly converges to a point $x \in \mathcal{H}$, then $x \in C$.

Proof. Assume that $x \notin C$ and denote $\gamma := [x, P_C x]$. We claim that $P_\gamma x_n = P_C x$ for each $n \in \mathbb{N}$. Indeed, if for some $m \in \mathbb{N}$ we had $P_\gamma x_m \ne P_C x$, then by (2.1.1), we would have both
\[
d(x_m, P_\gamma x_m)^2 \ge d(x_m, P_C x)^2 + d(P_C x, P_\gamma x_m)^2
\]
and
\[
d(x_m, P_C x)^2 \ge d(x_m, P_\gamma x_m)^2 + d(P_C x, P_\gamma x_m)^2,
\]
which is impossible. Finally,
\[
d(P_\gamma x_n, x) = d(P_C x, x) \ne 0
\]
for each $n \in \mathbb{N}$, which, by Proposition 3.1.3, contradicts $x_n \xrightarrow{w} x$.

Definition 3.2.2 (Weak lower semicontinuity). We shall say that a function $f : \mathcal{H} \to (-\infty, \infty]$ is weakly lsc at a given point $x \in \operatorname{dom} f$ if
\[
\liminf_{n \to \infty} f(x_n) \ge f(x)
\]
for each sequence $x_n \xrightarrow{w} x$. We say that $f$ is weakly lsc if it is weakly lsc at every $x \in \operatorname{dom} f$.

Lemma 3.2.3. If $f : \mathcal{H} \to (-\infty, \infty]$ is a convex lsc function, then it is weakly lsc.

Proof. By contradiction. Let $(x_n) \subset \mathcal{H}$ be a sequence weakly converging to a point $x \in \operatorname{dom} f$. Suppose that
\[
\liminf_{n \to \infty} f(x_n) < f(x).
\]
That is, there exist a subsequence $(x_{n_k})$, an index $k_0 \in \mathbb{N}$ and $\delta > 0$ such that $f(x_{n_k}) < f(x) - \delta$ for each $k > k_0$. By the lower semicontinuity and convexity of $f$, we get
\[
f(y) \le f(x) - \delta
\]
for every $y \in \overline{\operatorname{co}}\{x_{n_k} : k > k_0\}$. But this, through Lemma 3.2.1, yields a contradiction to $x_n \xrightarrow{w} x$.

Corollary 3.2.4. Let $C \subset \mathcal{H}$ be a closed convex set. The distance function $d_C$ as well as its square $d_C^2$ are weakly lsc.

Next we introduce a simple, but useful notion.

Definition 3.2.5 (Fejér monotone sequence). A sequence $(x_n) \subset \mathcal{H}$ is Fejér monotone with respect to a set $S \subset \mathcal{H}$ if, for each $y \in S$, we have
\[
d(x_{n+1}, y) \le d(x_n, y), \qquad n \in \mathbb{N}.
\]
Recall that a sequence $(x_n) \subset \mathcal{H}$ is said to converge linearly to a point $x \in \mathcal{H}$ if there exist $L \ge 0$ and $\theta \in [0, 1)$ such that
\[
d(x, x_n) \le L \theta^n, \qquad n \in \mathbb{N}. \tag{3.2.4}
\]
The parameter $\theta$ is called the rate of linear convergence.
Proposition 3.2.6 (Properties of Fejér monotonicity). Let $C \subset \mathcal{H}$ be a convex closed set and $(x_n) \subset \mathcal{H}$ be a Fejér monotone sequence with respect to $C$. Then the following holds:
(i) The sequence $(x_n)$ is bounded.
(ii) $d(x_{n+1}, C) \le d(x_n, C)$ for each $n \in \mathbb{N}$.
(iii) The sequence $(x_n)$ weakly converges to some $x \in C$ if and only if all weak cluster points of $(x_n)$ belong to $C$.
(iv) The sequence $(x_n)$ converges to some $x \in C$ if and only if $d(x_n, C) \to 0$.
(v) The sequence $(x_n)$ converges linearly to some $x \in C$, provided there exists $\theta \in [0, 1)$ such that $d(x_{n+1}, C) \le \theta\, d(x_n, C)$ for each $n \in \mathbb{N}$.

Proof. (i) and (ii) are easy. Let us prove the nontrivial implication of (iii). Assume that all weak cluster points of $(x_n)$ lie in $C$. It suffices to show that $(x_n)$ has a unique weak cluster point. By contradiction, let $c_1, c_2 \in C$, with $c_1 \ne c_2$, be weak cluster points of $(x_n)$. That is, there are subsequences $(x_{n_k})$ and $(x_{m_k})$ such that $x_{n_k} \xrightarrow{w} c_1$ and $x_{m_k} \xrightarrow{w} c_2$. Without loss of generality, assume $r(x_{n_k}) \le r(x_{m_k})$. For any $\varepsilon > 0$ there exists $k_0 \in \mathbb{N}$ such that $d(x_{n_k}, c_1)^2 < r(x_{n_k}) + \varepsilon$ for each $k \ge k_0$. By Fejér monotonicity we also have
\[
d(x_{m_k}, c_1)^2 < r(x_{n_k}) + \varepsilon
\]
for every $m_k \ge n_{k_0}$. Hence, there exists $k_1 \in \mathbb{N}$ such that $d(x_{m_k}, c_1)^2 < r(x_{m_k}) + \varepsilon$ for each $k \ge k_1$. But this contradicts the fact that $c_2$ is the unique asymptotic center of $(x_{m_k})$.

Now we prove (iv). Suppose $d(x_n, C) \to 0$. Since for each $k \in \mathbb{N}$ we have
\[
d(x_{n+k}, x_n) \le d(x_{n+k}, P_C x_n) + d(x_n, P_C x_n) \tag{3.2.5a}
\]
and hence, by Fejér monotonicity,
\[
d(x_{n+k}, x_n) \le d(x_n, P_C x_n) + d(x_n, P_C x_n) \le 2 d(x_n, C), \tag{3.2.5b}
\]
we obtain that $(x_n)$ is Cauchy and so converges to a point from $C$. The converse implication in (iv) is trivial.

It remains to prove (v). From (3.2.5) we get
\[
d(x_{n+k}, x_n) \le 2 d(x_n, C) \le 2 \theta^n d(x_0, C)
\]
for every $n, k \in \mathbb{N}$. The sequence $(x_n)$ obviously converges to some $x \in C$ and thus letting $k \to \infty$ we have
\[
d(x, x_n) \le 2 d(x_n, C) \le 2 \theta^n d(x_0, C), \qquad n \in \mathbb{N}.
\]
In other words, $x_n \to x$ linearly with rate $\theta$, completing the proof.
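The statements (ii) and (v) can be illustrated numerically in the Euclidean plane, which is a Hadamard space. The following is only a minimal sketch under assumptions of our own choosing: two lines $A$ and $B$ through the origin meeting at the angle `PHI`, the set $C = A \cap B = \{0\}$, and the sequence obtained by projecting alternately onto $A$ and $B$. The names `project_onto_line`, `alternating_projections`, `PHI` and `THETA` are hypothetical and do not come from the text.

```python
import math

def project_onto_line(point, angle):
    """Orthogonal projection in R^2 onto the line through 0 with direction `angle`."""
    ux, uy = math.cos(angle), math.sin(angle)
    t = point[0] * ux + point[1] * uy
    return (t * ux, t * uy)

def dist(p, q):
    """Euclidean distance in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

PHI = math.pi / 3            # angle between the lines A and B (illustrative choice)
THETA = abs(math.cos(PHI))   # each sweep P_B P_A shrinks the norm at least by this factor
ORIGIN = (0.0, 0.0)          # C = A ∩ B = {0}

def alternating_projections(x0, sweeps):
    """Return x_0, x_1, ... where x_{n+1} = P_B P_A x_n."""
    xs = [x0]
    for _ in range(sweeps):
        on_a = project_onto_line(xs[-1], 0.0)      # project onto A
        xs.append(project_onto_line(on_a, PHI))    # then onto B
    return xs

xs = alternating_projections((2.0, 1.0), 20)
dists = [dist(x, ORIGIN) for x in xs]   # d(x_n, C); here the limit point is 0
```

Since projections are nonexpansive and fix the origin, the sequence is Fejér monotone with respect to $C$, and `dists` should satisfy both $d(x_{n+1}, C) \le \theta\, d(x_n, C)$ and the bound $d(x, x_n) \le 2\theta^n d(x_0, C)$ from the proof of (v) with $x = 0$.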
3.3 An application in fixed point theory

As the first application of weak convergence in Hadamard spaces, we present the following fixed point theorem.

Theorem 3.3.1. Let $(\mathcal{H}, d)$ be a Hadamard space and $C \subset \mathcal{H}$ be convex, closed and bounded. Then for any $F : C \to C$ nonexpansive, the set of fixed points $\operatorname{Fix} F$ is convex, closed and nonempty.

Proof. We start by proving the existence of a fixed point.

Step 1: We first show that $\inf_{x \in C} d(x, Fx) = 0$. Choose $\varepsilon \in (0, 1)$ and $z \in C$. Define a mapping $G : C \to C$ by
\[
Gx := (1 - \varepsilon) Fx + \varepsilon z
\]
for every $x \in C$. Since $F$ is nonexpansive, the mapping $G$ is a contraction and hence has a fixed point $y \in C$ by the Banach contraction principle. An easy estimate gives
\[
d(y, Fy) = d(Gy, Fy) = \varepsilon\, d(z, Fy) \le \varepsilon \operatorname{diam} C,
\]
which yields $\inf_{x \in C} d(x, Fx) = 0$.

Step 2: Choose a sequence $x_n \in C$ such that $d(x_n, Fx_n) \to 0$. Since $C$ is bounded, there exists a subsequence of $(x_n)$, still denoted by $(x_n)$, weakly converging to some $x \in C$. Let us now estimate
\[
\limsup_{n \to \infty} d(Fx, x_n) \le \limsup_{n \to \infty} \left[ d(Fx, Fx_n) + d(Fx_n, x_n) \right] = \limsup_{n \to \infty} d(Fx, Fx_n) \le \limsup_{n \to \infty} d(x, x_n).
\]
By the uniqueness of the weak limit we get $x = Fx$.

Step 3: It has already been observed in Example 2.1.5 that the set of fixed points $\operatorname{Fix} F$ of a nonexpansive mapping $F : \mathcal{H} \to \mathcal{H}$ is convex and closed.

Approximation methods for fixed points will be studied in Section 6.2.
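Step 1 of the proof above can be tried out numerically. The sketch below is an illustration under assumptions that are not in the text: the Hadamard space is the complex plane, $C$ is the closed unit disc (so $\operatorname{diam} C = 2$), and $F$ is the rotation by 90 degrees, whose only fixed point is $0$. The function names are hypothetical.

```python
def F(x):
    """Rotation of the closed unit disc by 90 degrees: nonexpansive, Fix F = {0}."""
    return x * 1j

def approximate_fixed_point(eps, z, iterations=5000):
    """Banach iteration for the contraction G x = (1 - eps) F x + eps z."""
    y = z
    for _ in range(iterations):
        y = (1 - eps) * F(y) + eps * z
    return y

DIAM_C = 2.0      # diameter of the unit disc
z = 1.0 + 0.0j    # an arbitrary anchor point z in C

residuals = {}
for eps in (0.5, 0.1, 0.01):
    y = approximate_fixed_point(eps, z)
    residuals[eps] = abs(y - F(y))   # d(y, F y), to be compared with eps * diam C
```

Each residual should stay below $\varepsilon \operatorname{diam} C$, in line with the estimate in Step 1, and it shrinks as $\varepsilon$ decreases.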
Exercises

Exercise 3.1 (Hilbert space). Let $H$ be a Hilbert space, and $\langle \cdot, \cdot \rangle$ its inner product. A sequence $(x_n) \subset H$ is said to converge weakly to a point $x \in H$ if
\[
\langle x_n - x, y \rangle \to 0, \qquad n \to \infty,
\]
for each $y \in H$. Show that this weak convergence coincides with the one defined in Definition 3.1.1.

Exercise 3.2 (Hilbert ball). Let $(\mathbb{B}, \rho)$ be the Hilbert ball from Example 1.2.13. Assume that $(x_n)$ is a bounded sequence (with respect to $\rho$) and $x \in \mathbb{B}$. Show that $x_n \xrightarrow{w} x$ if and only if $(x_n)$ converges to $x$ weakly in the usual Hilbert space sense.
Hint. See [92, Proposition 21.4] if need be.

Exercise 3.3 (Projection theorem revisited). Use weak convergence to show that, given $C \subset \mathcal{H}$ a closed convex subset of a Hadamard space $(\mathcal{H}, d)$ and $x \in \mathcal{H}$, there exists a point $c \in C$ such that $d_C(x) = d(c, x)$.
Hint. There exists $(c_n) \subset C$ such that $d(c_n, x) \to d_C(x)$. It is bounded, so a subsequence $(c_{n_k})$ weakly converges to some $c \in \mathcal{H}$. Since $C$ is closed and convex, $c \in C$. Finally,
\[
d_C(x) \le d(x, c) \le \liminf_{n \to \infty} d(x, c_n) = d_C(x).
\]

Exercise 3.4 (Weak Banach–Saks property). Let $(x_n)$ be a sequence in a Hilbert space which weakly converges to a point $x$. Show that it has a subsequence $(x_{n_k})$ such that the sequence of Cesàro means
\[
a_k := \frac{x_{n_1} + \cdots + x_{n_k}}{k}, \qquad k \in \mathbb{N},
\]
converges to $x$.
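As an illustration of the statement in Exercise 3.4 (not a solution of it), consider an orthonormal sequence $(e_n)$ in a Hilbert space. It converges weakly to $0$ but not strongly, since $\|e_n\| = 1$, while its Cesàro means have norm $1/\sqrt{k}$. The sketch below checks this in coordinates; the choice of sequence and the function name are our own assumptions, and in this special case no passage to a subsequence is needed.

```python
import math

def cesaro_mean_norm(k):
    """Norm of (e_1 + ... + e_k) / k for orthonormal vectors e_1, ..., e_k."""
    coords = [1.0 / k] * k   # coordinates of the mean in the basis (e_1, ..., e_k)
    return math.sqrt(sum(c * c for c in coords))

norms = {k: cesaro_mean_norm(k) for k in (1, 4, 100, 10000)}
```

The computed norms agree with $1/\sqrt{k}$ and tend to $0$, so the Cesàro means converge strongly to the weak limit.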
Bibliographical remarks

Weak convergence is an essential notion in Banach spaces [82]. J. Jost was the first to introduce weak convergence in Hadamard spaces [102, Definition 2.7]. It was later rediscovered in the following development. E. N. Sosov [188] introduced $\psi$- and $\varphi$-convergences, both generalizing Hilbert space weak convergence into geodesic metric spaces, which motivated T. C. Lim [132] to define his $\Delta$-convergence. It was then brought into Hadamard spaces by W. A. Kirk and B. Panyanak [111] and finally, R. Espínola and A. Fernández–León [81] modified Sosov’s $\varphi$-convergence to obtain an equivalent formulation of $\Delta$-convergence in Hadamard spaces, which however was exactly the original weak convergence due to J. Jost [102]. Weak convergence theory was in a similar form surveyed in [17, Section 3].

Unlike [111], we use in (3.1.1) the square power of the distance function, which immediately yields a unique minimizer of the function $\omega$. The existence and uniqueness of an asymptotic center in the Hilbert ball was proved in [92, Proposition 21.1]. The proof of Proposition 3.1.2 mimics [91, Lemma 15.2]. J. Jost [102, Theorem 2.1] gave a different proof of Proposition 3.1.2. The characterization of weak convergence in Proposition 3.1.3 comes from [81, Proposition 5.2]. The property in Lemma 3.1.4 is named after Z. Opial [156] who established it for Hilbert spaces.

The weak Banach–Saks property is known to be enjoyed by Hilbert spaces [82]. Its counterpart in Hadamard spaces was asserted in [102, Theorem 2.2]. In Theorem 3.1.5 we reproduce the proof of [209, Theorem C] due to T. Yokota. Proposition 3.1.7 was inspired by [111, Theorem 3.9] and [17, Proposition 3.6]. Lemmas 3.2.1 and 3.2.3 appeared in [22] and [19], respectively. The Hilbert ball case of Lemma 3.2.1 can be deduced from [92, Lemma 21.2]. Question 3.1.8 was made explicit in [111, p. 3696].

P. Combettes [67] surveys the importance and usefulness of Fejér monotonicity in optimization and also describes its history. In the context of Hadamard spaces, this property was first used in [22]. Proposition 3.2.6 is from [22, Proposition 3.3].

It is also possible to define weak convergence in an asymptotic relation of Hadamard spaces [124, Definition 5.2]. See also [17, Section 3].

Theorem 3.3.1 is due to W. A. Kirk; see [112, Theorem 18] and [110, Theorem 12]. For a nice application in graph theory, see [109].

I am grateful to Takumi Yokota for informing me of his work [209].
4 Nonexpansive mappings

Let $(\mathcal{H}, d)$ be a Hadamard space. A mapping $F : \mathcal{H} \to \mathcal{H}$ is nonexpansive if $d(Fx, Fy) \le d(x, y)$ for every $x, y \in \mathcal{H}$. We have already met such mappings in previous chapters, for instance in Theorem 3.3.1. The metric projection in Theorem 2.1.12 and the resolvent of a function in Definition 2.2.20 are important examples of nonexpansive mappings in Hadamard spaces.

In this chapter, we first present an extension theorem in the spirit of the Kirszbraun–Valentine theorem, then introduce a resolvent of a nonexpansive mapping, and finally show that the celebrated Crandall–Liggett construction of semigroups carries over into Hadamard spaces. The results developed here are the crucial building blocks in the proof of the Lie–Trotter–Kato formula in Section 5.3. We believe, though, that they are interesting in their own right.
4.1 Kirszbraun–Valentine extension

In this section we are concerned with the question as to whether it is possible to generalize the Kirszbraun–Valentine theorem for mappings with values in a Hadamard space. More precisely, given a nonexpansive mapping defined on a subset of a Hilbert space with values in a Hadamard space, does there exist a nonexpansive extension of this mapping defined on the whole Hilbert space? The answer is yes and before presenting this nice result, we recall the notion of a tangent cone from Section 1.2.

Let $(\mathcal{H}, d)$ be a Hadamard space and $p \in \mathcal{H}$. Then the tangent cone $T_p\mathcal{H}$ was defined as the Euclidean cone over the space of directions at $p$, and the completed tangent cone $\overline{T_p\mathcal{H}}$ is the Euclidean cone over the completed space of directions at $p$. Also recall that the latter is a Hadamard space when equipped with the cone metric $d_p$. The point $(0, x) \in T_p\mathcal{H}$, with an arbitrary $x$, is called the origin of $T_p\mathcal{H}$ and we denote it by $o$. Put also $\|v\|_p := d_p(o, v)$ for each $v \in T_p\mathcal{H}$, and define an analog of a Riemannian metric on $T_p\mathcal{H}$ by setting
\[
\langle u, v \rangle_p := \frac{\|u\|_p^2 + \|v\|_p^2 - d_p(u, v)^2}{2}, \qquad u, v \in T_p\mathcal{H}.
\]
One can easily verify that
\[
\langle u, v \rangle_p = st \cos \alpha(x, y),
\]
whenever $u = (s, x)$ and $v = (t, y)$ and $\alpha(x, y)$ is the angle distance between the directions $x$ and $y$.

The following lemma will be needed. It uses the notion of a formal convex hull introduced in Definition 2.1.10.
Lemma 4.1.1. Let $(\mathcal{H}, d)$ be a Hadamard space and $p \in \mathcal{H}$. If $v_1, \dots, v_N \in T_p\mathcal{H}$ and $(u, \lambda), (w, \mu) \in \operatorname{fco}\{v_1, \dots, v_N\}$, then
\[
\langle u, w \rangle_p \ge \sum_{i,j=1}^{N} \lambda_i \mu_j \langle v_i, v_j \rangle_p,
\]
where $\lambda := (\lambda_1, \dots, \lambda_N)$ and $\mu := (\mu_1, \dots, \mu_N)$ are from $\Delta_{N-1}$.

Proof. Step 1: For the moment, let us denote the midpoint of $v_1$ and $v_2$ by $m$. We claim that
\[
\langle m, z \rangle_p \ge \frac{1}{2} \langle v_1, z \rangle_p + \frac{1}{2} \langle v_2, z \rangle_p \tag{4.1.1}
\]
for each $z \in T_p\mathcal{H}$. Indeed, since we have
\[
\|m\|_p^2 = \frac{1}{2} \|v_1\|_p^2 + \frac{1}{2} \|v_2\|_p^2 - \frac{1}{4} d_p(v_1, v_2)^2,
\]
we arrive at
\[
\frac{\|m\|_p^2 + \|z\|_p^2 - d_p(m, z)^2}{2} \ge \frac{\|v_1\|_p^2 + \|z\|_p^2 - d_p(v_1, z)^2}{4} + \frac{\|v_2\|_p^2 + \|z\|_p^2 - d_p(v_2, z)^2}{4}
\]
on account of (1.2.3), and the last inequality already implies (4.1.1).

Step 2: Let $\gamma : [0, 1] \to T_p\mathcal{H}$ be the geodesic from $v_1$ to $v_2$. We claim that
\[
\langle \gamma_t, z \rangle_p \ge (1 - t) \langle \gamma_0, z \rangle_p + t \langle \gamma_1, z \rangle_p \tag{4.1.2}
\]
for each $z \in T_p\mathcal{H}$ and $t \in [0, 1]$. Indeed, we repeatedly use (4.1.1) to prove (4.1.2) for every dyadic $t := k / 2^l$, where $k = 1, \dots, 2^l$ and $l \in \mathbb{N}$, and then extend the inequality for an arbitrary $t \in [0, 1]$ by continuity.¹

Step 3: Fix now $t \in [0, 1]$ and let $\eta : [0, 1] \to T_p\mathcal{H}$ be the geodesic from $\gamma_t$ to $v_3$. Applying (4.1.2) twice, first with $\gamma$ and then with $\eta$, yields
\[
\langle \eta_s, z \rangle_p \ge (1 - t)(1 - s) \langle v_1, z \rangle_p + t (1 - s) \langle v_2, z \rangle_p + s \langle v_3, z \rangle_p
\]
for each $z \in T_p\mathcal{H}$ and $s \in [0, 1]$. Iterating this process gives the proof.

We are now ready to state the main result of the present section.

Theorem 4.1.2 (Extending nonexpansive mappings). Let $(H, \|\cdot\|)$ be a Hilbert space and $(\mathcal{H}, d)$ be a Hadamard space. Given a set $S \subset H$ and a nonexpansive mapping $F : S \to \mathcal{H}$, there exists a nonexpansive mapping $G : H \to \mathcal{H}$ such that $G{\restriction}_S = F$ and $G(H) \subset \operatorname{co} F(S)$.
1 Such an argument was used several times in Chapter 1.
Proof. Step 1: First we show that a nonexpansive mapping defined on a set of $N$ points $\{x_1, \dots, x_N\} \subset H$, where $N \in \mathbb{N}$, can be extended onto the set $\{x_1, \dots, x_N\} \cup \{x\}$, where $x \in H$. Let hence $F : \{x_1, \dots, x_N\} \to \mathcal{H}$ be nonexpansive and denote $y_n := F(x_n)$ for every $n = 1, \dots, N$. Since the set of minimizers of the function
\[
z \mapsto \max_{n = 1, \dots, N} \frac{d(z, y_n)}{\|x - x_n\|}, \qquad z \in \mathcal{H},
\]
coincides with the set of minimizers of the function
\[
z \mapsto \max_{n = 1, \dots, N} \frac{d(z, y_n)^2}{\|x - x_n\|^2}, \qquad z \in \mathcal{H},
\]
and the latter is strongly convex, there exists a unique point $y \in \mathcal{H}$ such that
\[
y = \operatorname*{arg\,min}_{z \in \mathcal{H}} \max_{n = 1, \dots, N} \frac{d(z, y_n)}{\|x - x_n\|}. \tag{4.1.3}
\]
Here we used Proposition 2.2.17. It is easy to see that $y \in \operatorname{co}\{y_1, \dots, y_N\}$. If we denote
\[
L := \max_{n = 1, \dots, N} \frac{d(y, y_n)}{\|x - x_n\|},
\]
then obviously $d(y, y_n) \le L \|x - x_n\|$ for each $n = 1, \dots, N$, and there exist indices for which the equality is attained. Without loss of generality we may assume there exists $1 \le k \le N$ such that $d(y, y_n) = L \|x - x_n\|$ for each $n = 1, \dots, k$.

Assume that $L > 1$. Let $\gamma_n : [0, 1] \to \mathcal{H}$ be the geodesic from $y$ to $y_n$ and let $v_n := (1, \gamma_n) \in T_y\mathcal{H}$ for $n = 1, \dots, k$. Here we identified the geodesics $\gamma_n$ with the directions they determine. We will now show that the origin $o$ of the completed tangent cone $\overline{T_y\mathcal{H}}$ lies in $\operatorname{co}\{v_1, \dots, v_k\}$. If it were not the case, denote by $w$ the projection of $o$ onto $\operatorname{co}\{v_1, \dots, v_k\}$. By Theorems 1.2.17 and 2.1.12 we know that $\alpha(o, w, v_n) \ge \frac{\pi}{2}$ and consequently $\alpha(w, o, v_n) < \frac{\pi}{2}$. Varying $w$ a little, we may assume $w \in T_y\mathcal{H}$, and therefore there exists a geodesic $\eta : [0, 1] \to \mathcal{H}$ with $\eta(0) = y$ in the direction of $w$. Then of course $\alpha(\eta, \gamma_n) = \alpha(w, o, v_n) < \frac{\pi}{2}$ for $n = 1, \dots, k$. We hence obtain that there exists $t \in (0, 1)$ such that $d(\eta_t, y_n) < L \|x - x_n\|$ for $n = 1, \dots, k$, which contradicts (4.1.3). We therefore arrive at $o \in \operatorname{co}\{v_1, \dots, v_k\}$.

Given $\varepsilon > 0$, there exists $v \in \operatorname{co}\{v_1, \dots, v_k\}$ with $\|v\|_y \le \varepsilon$. Let $\lambda := (\lambda_1, \dots, \lambda_k)$ be such that $(v, \lambda) \in \operatorname{fco}\{v_1, \dots, v_k\}$. By virtue of Lemma 4.1.1 we get
\[
\varepsilon^2 \ge \|v\|_y^2 \ge \sum_{i,j=1}^{k} \lambda_i \lambda_j \langle v_i, v_j \rangle_y = \sum_{i} \lambda_i^2 + 2 \sum_{i<j} \lambda_i \lambda_j \langle v_i, v_j \rangle_y.
\]
Since ∑𝑖 𝜆2𝑖 ≥
1 𝑘
𝑖
𝑖 𝜉𝑖𝑗 , where
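\[
\xi_{ij} := \frac{\langle x_i - x, x_j - x \rangle}{\|x_i - x\| \, \|x_j - x\|},
\]
with $\langle \cdot, \cdot \rangle$ standing for the inner product in $H$. Consequently, there exists $\delta > 0$ such that $\langle v_i, v_j \rangle_y \ge \xi_{ij} + \delta$ for $1 \le i < j \le k$. Note that $\delta$ is independent of $\varepsilon$. Finally, we arrive at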
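\[
\varepsilon^2 \ge \sum_{i} \lambda_i^2 + 2 \sum_{i<j} \lambda_i \lambda_j \xi_{ij} + \delta \left( \frac{1}{k} - \varepsilon^2 \right)
\]
$\mu > 0$.

Proof. Denote
\[
y := \frac{\lambda - \mu}{\lambda} R_\lambda x + \frac{\mu}{\lambda} x
\]
and recall that $R_\mu y$ is the unique point $z \in \mathcal{H}$ satisfying
\[
d(y, z) = \frac{\mu}{1 + \mu}\, d(y, Fz) \qquad \text{and} \qquad d(z, Fz) = \frac{1}{1 + \mu}\, d(y, Fz).
\]
It therefore suffices to show that these two equalities hold for $z := R_\lambda x$. By easy computations we have
\[
d(y, R_\lambda x) = \frac{\mu}{1 + \lambda}\, d(x, F R_\lambda x) = \frac{\mu}{1 + \mu}\, d(y, F R_\lambda x),
\]
which shows the first desired equality and the second one is analogous.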
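The computation in the proof above can be checked numerically in a Hilbert space, where by Exercise 4.2 the resolvent of a nonexpansive mapping is $R_\lambda = (I + \lambda(I - F))^{-1}$. The following sketch rests on assumptions of our own: the space is the complex plane, $F$ is a rotation (hence linear and nonexpansive), and the identity being tested is $R_\lambda x = R_\mu y$ with $y$ as defined in the proof and $\lambda > \mu > 0$. The names `C_ROT` and `resolvent` are hypothetical.

```python
import cmath

C_ROT = cmath.exp(0.7j)   # F z = C_ROT * z rotates the plane by 0.7 radians

def F(z):
    """A linear nonexpansive mapping of the plane (assumed example)."""
    return C_ROT * z

def resolvent(lam, x):
    """R_lam x = (I + lam (I - F))^{-1} x, in closed form for the linear F above."""
    return x / (1 + lam * (1 - C_ROT))

x = 2.0 - 1.0j
lam, mu = 3.0, 0.5            # any parameters with lam > mu > 0

z = resolvent(lam, x)                         # z := R_lam x
y = (lam - mu) / lam * z + mu / lam * x       # the point y from the proof

identity_gap = abs(resolvent(mu, y) - z)      # R_mu y should equal R_lam x
first_gap = abs(abs(y - z) - mu / (1 + mu) * abs(y - F(z)))
second_gap = abs(abs(z - F(z)) - 1 / (1 + mu) * abs(y - F(z)))
```

All three gaps should vanish up to rounding: the first two-line check is the pair of equalities characterizing $R_\mu y$, the last is the resolvent identity itself in this special case.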
4.2 Resolvent of a nonexpansive mapping 
Lemma 4.2.7. Let $F : \mathcal{H} \to \mathcal{H}$ be nonexpansive and $x \in \mathcal{H}$. Then the mapping $\lambda \mapsto R_\lambda x$ is continuous on $[0, \infty)$.

Proof. Lemma 4.2.4 gives $R_\lambda x \to x$ as $\lambda \to 0$. Continuity at a positive $\lambda$ follows by the resolvent identity (Proposition 4.2.6) in a similar way as in the proof of Proposition 2.2.26.

The following result is a counterpart to Theorem 2.2.25.

Theorem 4.2.8. Let $F : \mathcal{H} \to \mathcal{H}$ be a nonexpansive mapping and $x \in \mathcal{H}$. If there exists $(\lambda_n) \subset (0, \infty)$ such that $\lambda_n \to \infty$ and the sequence $(R_{\lambda_n} x)_{n \in \mathbb{N}}$ is bounded, then $\operatorname{Fix} F$ is nonempty and
\[
\lim_{\lambda \to \infty} R_\lambda x = P_{\operatorname{Fix} F}(x). \tag{4.2.6}
\]
Conversely, if $\operatorname{Fix} F \ne \emptyset$, then $(R_\lambda x)_{\lambda \in (0, \infty)}$ is bounded.

Proof. To simplify our notation, put $x_\lambda := R_\lambda(x)$ for each $\lambda \in (0, \infty)$. Fix now 0
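$\mu > 0$ and $m, n \in \mathbb{N}$, then the resolvent satisfies
\[
d\left(R_\mu^{(n)} x, R_\lambda^{(m)} x\right) \le \sum_{k=0}^{m-1} \left(\frac{\mu}{\lambda}\right)^{k} \left(\frac{\lambda - \mu}{\lambda}\right)^{n-k} \binom{n}{k}\, d\left(R_\lambda^{(m-k)} x, x\right) + \sum_{k=m}^{n} \left(\frac{\mu}{\lambda}\right)^{m} \left(\frac{\lambda - \mu}{\lambda}\right)^{k-m} \binom{k-1}{m-1}\, d\left(R_\mu^{(n-k)} x, x\right),
\]
whenever $m \le n$, whereas
\[
d\left(R_\mu^{(n)} x, R_\lambda^{(m)} x\right) \le \sum_{k=0}^{n} \left(\frac{\mu}{\lambda}\right)^{k} \left(\frac{\lambda - \mu}{\lambda}\right)^{n-k} \binom{n}{k}\, d\left(R_\lambda^{(m-k)} x, x\right),
\]
provided $m > n$.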
4.3 Strongly continuous semigroup 
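Proof. To derive the above inequalities, one can mimic the original Crandall–Liggett proof [68, Lemma 1.3].

Lemma 4.3.2. Let $\alpha \in (0, 1)$ and $m, n \in \mathbb{N}$ with $m \le n$. Then:
(i) $\displaystyle \sum_{k=0}^{m-1} \binom{n}{k} \alpha^k (1 - \alpha)^{n-k} (m - k) \le \sqrt{(n\alpha - m)^2 + n\alpha(1 - \alpha)}$,
(ii) $\displaystyle \sum_{k=m}^{n} \binom{k-1}{m-1} \alpha^m (1 - \alpha)^{k-m} (n - k) \le \sqrt{\frac{m(1 - \alpha)}{\alpha^2} + \left( \frac{m(1 - \alpha)}{\alpha} + m - n \right)^2}$.

Proof. See Exercise 4.3.

We can now state the main result of the current section.

Theorem 4.3.3 (Existence of the semigroup). Let $(\mathcal{H}, d)$ be a Hadamard space and $F : \mathcal{H} \to \mathcal{H}$ be a nonexpansive mapping. Then the limit
\[
T_t x := \lim_{n \to \infty} R_{t/n}^{(n)} x, \qquad x \in \mathcal{H}, \tag{4.3.8}
\]
exists and is uniform with respect to $t$ on each bounded subinterval of $[0, \infty)$. Moreover, the family $(T_t)_t$ is a strongly continuous semigroup of nonexpansive mappings, that is,
(i) $\lim_{t \to 0+} T_t x = x$,
(ii) $T_t(T_s x) = T_{s+t} x$ for every $s, t \in [0, \infty)$,
(iii) $d(T_t x, T_t y) \le d(x, y)$ for each $t \in [0, \infty)$,
for every $x, y \in \mathcal{H}$. Finally, we have the estimates
\[
d(T_t x, T_s x) \le 2 d(x, Fx)\, |t - s|, \qquad x \in \mathcal{H}, \tag{4.3.9}
\]
for every $s, t \in [0, \infty)$, and
\[
d\left(T_t x, R_{t/n}^{(n)} x\right) \le \frac{2t}{\sqrt{n}}\, d(x, Fx), \qquad x \in \mathcal{H}, \tag{4.3.10}
\]
for every $t \in [0, \infty)$ and $n \in \mathbb{N}$.

Proof. We divide the proof into several steps.

Step 1: Choose $\lambda \ge \mu > 0$ and $m, n \in \mathbb{N}$ with $m \le n$. Applying Lemma 4.3.1, Corollary 4.2.5 and Lemma 4.3.2 with $\alpha := \mu / \lambda$, we obtain
\begin{align*}
d\left(R_\mu^{(n)} x, R_\lambda^{(m)} x\right) &\le \lambda \sum_{k=0}^{m-1} \left(\frac{\mu}{\lambda}\right)^{k} \left(\frac{\lambda - \mu}{\lambda}\right)^{n-k} \binom{n}{k} (m - k)\, d(x, Fx) \\
&\quad + \mu \sum_{k=m}^{n} \left(\frac{\mu}{\lambda}\right)^{m} \left(\frac{\lambda - \mu}{\lambda}\right)^{k-m} \binom{k-1}{m-1} (n - k)\, d(x, Fx) \\
&\le \lambda\, d(x, Fx) \sqrt{\left(n \frac{\mu}{\lambda} - m\right)^2 + n \frac{\mu}{\lambda} \frac{\lambda - \mu}{\lambda}} \\
&\quad + \mu\, d(x, Fx) \sqrt{\frac{\lambda^2}{\mu^2} \frac{\lambda - \mu}{\lambda} m + \left( \frac{\lambda}{\mu} \frac{\lambda - \mu}{\lambda} m + m - n \right)^2} \\
&= d(x, Fx) \sqrt{(n\mu - m\lambda)^2 + n\mu(\lambda - \mu)} + d(x, Fx) \sqrt{m\lambda(\lambda - \mu) + (m\lambda - n\mu)^2}. \tag{4.3.11}
\end{align*}
If we put $\mu := \frac{t}{n}$ and $\lambda := \frac{t}{m}$, the above inequality reads
\[
d\left(R_{t/n}^{(n)} x, R_{t/m}^{(m)} x\right) \le 2t\, d(x, Fx) \sqrt{\frac{1}{m} - \frac{1}{n}}.
\]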
We immediately get the existence of the limit (4.3.8), uniformly in $t$ on bounded intervals, as well as the error estimate (4.3.10). Consequently also (i) holds with the help of Lemma 4.2.7.

Step 2: In order to show (4.3.9) for $0 < s \le t$, it suffices to put $\mu := \frac{s}{n}$ and $\lambda := \frac{t}{n}$ along with $m = n$ in (4.3.11) and take the limit. The cases $s = 0$ or $t = 0$ then follow by (i). It is easy to see that $T_t$ is a nonexpansive mapping.

Step 3: It remains to prove the semigroup property in (ii). Let us first show that $T_{kt} = T_t^{(k)}$, where $t \in (0, \infty)$ and $k \in \mathbb{N}$. We proceed by induction on $k$. For $k = 2$, we have
\[
d\left(T_t^{(2)} x, R_{t/n}^{(2n)} x\right) \le d\left(T_t T_t x, R_{t/n}^{(n)} T_t x\right) + d\left(R_{t/n}^{(n)} T_t x, R_{t/n}^{(n)} R_{t/n}^{(n)} x\right),
\]
and the right-hand side tends to $0$ as $n \to \infty$. Therefore,
\[
T_t^{(2)} x = \lim_{n \to \infty} R_{t/n}^{(2n)} x = \lim_{n \to \infty} R_{2t/(2n)}^{(2n)} x = T_{2t} x.
\]
Choose now $k \in \mathbb{N}$ with $k \ge 3$ and assume the claim holds for $k - 1$. Then
\[
d\left(T_t^{(k)} x, R_{t/n}^{(kn)} x\right) \le d\left(T_t T_t^{(k-1)} x, R_{t/n}^{(n)} T_t^{(k-1)} x\right) + d\left(R_{t/n}^{(n)} T_t^{(k-1)} x, R_{t/n}^{(n)} R_{t/n}^{(kn-n)} x\right),
\]
and again the right-hand side tends to $0$ as $n \to \infty$ by our assumption. We hence have
\[
T_t^{(k)} x = \lim_{n \to \infty} R_{t/n}^{(kn)} x = \lim_{n \to \infty} R_{kt/(kn)}^{(kn)} x = T_{kt} x.
\]
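Consequently, we obtain $T_t T_s = T_{s+t}$ for $s, t$ rational, that is, $s := \frac{k}{l}$ and $t := \frac{p}{q}$ for some $k, l, p, q \in \mathbb{N}$. Indeed,
\[
T_{\frac{k}{l} + \frac{p}{q}} = T_{\frac{kq + lp}{lq}} = T_{\frac{1}{lq}}^{(kq + lp)} = T_{\frac{1}{lq}}^{(kq)}\, T_{\frac{1}{lq}}^{(lp)} = T_{\frac{k}{l}}\, T_{\frac{p}{q}}.
\]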
Finally, by the continuity of $t \mapsto T_t x$ we get the desired semigroup property for every $s, t \in [0, \infty)$. The proof is complete.

By Lemma 4.2.4 we have
\[
d\left(x, R_{t/n}^{(n)} x\right) \le \sum_{j=0}^{n-1} d\left(R_{t/n}^{(j)} x, R_{t/n}^{(j+1)} x\right) \le n\, d\left(x, R_{t/n} x\right) \le t\, d(x, Fx),
\]
and taking the limit $n \to \infty$ gives
\[
d(x, T_t x) \le t\, d(x, Fx) \tag{4.3.12}
\]
for every $t \in [0, \infty)$.

Before stating the following theorem, recall that the weak convergence of a curve was defined in Definition 3.1.9.

Theorem 4.3.4 (Asymptotic behavior of the semigroup). Let $F : \mathcal{H} \to \mathcal{H}$ be a nonexpansive mapping with at least one fixed point and $x \in \mathcal{H}$. Then $T_t x$ weakly converges to a fixed point of $F$ as $t \to \infty$.

Proof. Step 1: First observe that
\[
d(R_\lambda x, F R_\lambda x) = \frac{1}{\lambda}\, d(x, R_\lambda x) \le d(x, Fx),
\]
by Lemma 4.2.4. We hence have
\[
d(x, Fx) \ge d\left(R_{t/n} x, F R_{t/n} x\right) \ge d\left(R_{t/n}^{(n)} x, F R_{t/n}^{(n)} x\right),
\]
and after taking the limit $n \to \infty$, also
\[
d(x, Fx) \ge d(T_t x, F T_t x).
\]
The semigroup property implies
\[
d(T_s x, F T_s x) \ge d(T_t x, F T_t x)
\]
whenever $s \le t$, and therefore the limit
\[
\lim_{t \to \infty} d(T_t x, F T_t x) \tag{4.3.13}
\]
exists.

Step 2: We will now show that this limit actually equals $0$. Let $0 \le s \le t$. Then inequality (4.3.12) gives
\[
d(T_s x, T_t x) \le \sum_{j=0}^{n-1} d\left(T_{s + \frac{j}{n}(t-s)} x, T_{s + \frac{j+1}{n}(t-s)} x\right) \le \frac{t - s}{n} \sum_{j=0}^{n-1} d\left(T_{s + \frac{j}{n}(t-s)} x, F T_{s + \frac{j}{n}(t-s)} x\right),
\]
and after sending $n \to \infty$, we obtain
\[
d(T_s x, T_t x) \le \int_s^t d(T_r x, F T_r x) \, \mathrm{d}r. \tag{4.3.14}
\]
Next we prove that
\[
\lim_{t \to \infty} d(T_t x, F T_t x) \le \frac{1}{h} \lim_{t \to \infty} d(T_{t+h} x, T_t x). \tag{4.3.15}
\]
To this end, we repeatedly use the inequality
\[
d\left(F R_\lambda^{(n)} x, R_\lambda^{(n-k+1)} x\right) \le \frac{1}{1 + \lambda}\, d\left(F R_\lambda^{(n)} x, R_\lambda^{(n-k)} x\right) + \frac{\lambda}{1 + \lambda}\, d\left(R_\lambda^{(n)} x, R_\lambda^{(n-k+1)} x\right),
\]
valid for each $1 \le k \le n$, to obtain
\[
d\left(F R_\lambda^{(n)} x, R_\lambda^{(n)} x\right) \le \frac{1}{(1 + \lambda)^n}\, d\left(F R_\lambda^{(n)} x, x\right) + \lambda \sum_{j=1}^{n} \frac{1}{(1 + \lambda)^j}\, d\left(R_\lambda^{(n)} x, R_\lambda^{(n-j+1)} x\right).
\]
Put now $\lambda := \frac{t}{n}$ and take the limit $n \to \infty$. One arrives at
\[
d(T_t x, F T_t x) \le \exp(-t)\, d(F T_t x, x) + \int_0^t \exp(-r)\, d(T_t x, T_{t-r} x) \, \mathrm{d}r.
\]
Applying inequality (4.3.14) and an elementary calculation we arrive at
\[
\exp(t)\, d(T_t x, F T_t x) \le d(F T_t x, x) + \int_0^t (\exp(r) - 1)\, d(T_r x, F T_r x) \, \mathrm{d}r,
\]
or
\[
(\exp(t) - 1)\, d(T_t x, F T_t x) \le d(T_t x, x) + \int_0^t (\exp(r) - 1)\, d(T_r x, F T_r x) \, \mathrm{d}r.
\]
Replacing $t$ by $h$ and then $x$ by $T_t x$ gives
\[
(\exp(h) - 1)\, d(T_{t+h} x, F T_{t+h} x) \le \int_t^{t+h} (\exp(r - t) - 1)\, d(T_r x, F T_r x) \, \mathrm{d}r + d(T_{t+h} x, T_t x).
\]
By an easy calculation we obtain
\[
d(T_{t+h} x, T_t x) \ge (\exp(h) - 1) \left[ d(T_{t+h} x, F T_{t+h} x) - d(T_t x, F T_t x) \right] + h\, d(T_t x, F T_t x),
\]
which proves (4.3.15). Then (4.3.15) and (4.3.14) yield
\[
\lim_{t \to \infty} d(T_t x, F T_t x) \le \limsup_{h \to \infty} \frac{1}{h}\, d(T_h x, x) \le \lim_{h \to \infty} \frac{1}{h} \int_0^h d(T_r x, F T_r x) \, \mathrm{d}r = \lim_{t \to \infty} d(T_t x, F T_t x),
\]
and thus
\[
\lim_{t \to \infty} d(T_t x, F T_t x) = \limsup_{h \to \infty} \frac{1}{h}\, d(T_h x, x). \tag{4.3.16}
\]
Let now $y \in \mathcal{H}$. Then
\[
\limsup_{h \to \infty} \frac{d(T_h x, x)}{h} \le \limsup_{h \to \infty} \frac{1}{h} \left[ d(T_h x, T_h y) + d(T_h y, y) + d(y, x) \right] \le \limsup_{h \to \infty} \frac{d(T_h y, y)}{h},
\]
and since $y$ was arbitrary, the value on the left-hand side is independent of $x$. Consequently, by virtue of (4.3.16), the limit in (4.3.13) is independent of $x$ and is therefore equal to $0$, because one may choose $x \in \operatorname{Fix} F$.

Step 3: To finish the proof, choose a sequence $t_n \to \infty$ and put $x_n := T_{t_n} x$. Since $T_t$ is nonexpansive, we know that $(x_n)$ is Fejér monotone with respect to $\operatorname{Fix} F$. In particular, the sequence $(x_n)$ is bounded and has therefore a weak cluster point $z \in \mathcal{H}$. In order to apply Proposition 3.2.6, it suffices to show that $z \in \operatorname{Fix} F$. We easily get
\[
\limsup_{n \to \infty} d(Fz, x_n) \le \limsup_{n \to \infty} d(Fz, F x_n) + \limsup_{n \to \infty} d(F x_n, x_n) \le \limsup_{n \to \infty} d(z, x_n),
\]
which by the uniqueness of the weak limit gives $z = Fz$. Here we used, of course, the fact that the limit in (4.3.13) is $0$.

It is easy to see that $z$ is independent of the choice of the sequence $(t_n)$ and therefore $T_t x \xrightarrow{w} z$.

As noted in [24, Remark, p. 7], there exists a counterexample in Hilbert space showing that the convergence in Theorem 4.3.4 is not strong in general.

We finish this chapter by a rather technical result which will be needed in Section 5.3.

Lemma 4.3.5. Let $F : \mathcal{H} \to \mathcal{H}$ be a nonexpansive mapping. Then its semigroup satisfies
\[
d\left(T_t x, F^{(k)} x\right) \le 2 d(x, Fx) \sqrt{(k - t)^2 + t}, \qquad x \in \mathcal{H},
\]
for every $t \in [0, \infty)$ and $k \in \mathbb{N}_0$.

Proof. Step 1: Choose $n \in \mathbb{N}$. The convexity of the squared distance function gives
\[
d\left(F^{(k)} x, R_{t/n}^{(n)} x\right)^2 \le \frac{1}{1 + \frac{t}{n}}\, d\left(F^{(k)} x, R_{t/n}^{(n-1)} x\right)^2 + \frac{\frac{t}{n}}{1 + \frac{t}{n}}\, d\left(F^{(k)} x, F R_{t/n}^{(n)} x\right)^2,
\]
and repeating the argument $n$ times, we obtain
\begin{align*}
d\left(F^{(k)} x, R_{t/n}^{(n)} x\right)^2 &\le \left(1 + \frac{t}{n}\right)^{-n} d\left(F^{(k)} x, x\right)^2 + \frac{t}{n} \sum_{j=1}^{n} \left(1 + \frac{t}{n}\right)^{-j} d\left(F^{(k)} x, F R_{t/n}^{(n-j+1)} x\right)^2 \\
&\le \left(1 + \frac{t}{n}\right)^{-n} k^2 d(Fx, x)^2 + \frac{t}{n} \sum_{j=1}^{n} \left(1 + \frac{t}{n}\right)^{-j} d\left(F^{(k-1)} x, R_{t/n}^{(n-j+1)} x\right)^2 \\
&= \left(1 + \frac{t}{n}\right)^{-n} k^2 d(Fx, x)^2 + \frac{t}{n} \sum_{j=1}^{n} \left(1 + \frac{t}{n}\right)^{-(n-j+1)} d\left(F^{(k-1)} x, R_{t/n}^{(j)} x\right)^2.
\end{align*}
If we define the functions
\[
\varphi_n(s) := \sum_{j=1}^{n} \left(1 + \frac{t}{n}\right)^{-(n-j+1)} \chi_{\left( \frac{(j-1)t}{n}, \frac{jt}{n} \right]}(s)
\]
and
\[
\psi_n(s) := \sum_{j=1}^{n} d\left(F^{(k-1)} x, R_{t/n}^{(j)} x\right)^2 \chi_{\left( \frac{(j-1)t}{n}, \frac{jt}{n} \right]}(s),
\]
where $s \in [0, t]$ and $n \in \mathbb{N}$, the above inequality reads
\[
d\left(F^{(k)} x, R_{t/n}^{(n)} x\right)^2 \le \left(1 + \frac{t}{n}\right)^{-n} k^2 d(Fx, x)^2 + \int_0^t \varphi_n(s) \psi_n(s) \, \mathrm{d}s.
\]
Step 2: We claim that
\[
\lim_{n \to \infty} \varphi_n(s) = \exp(s - t) \tag{4.3.17}
\]
and
\[
\lim_{n \to \infty} \psi_n(s) = d\left(F^{(k-1)} x, T_s x\right)^2
\]
for every $s \in (0, t]$, and that $\sup_{n \in \mathbb{N},\, s \in [0, t]} \varphi_n(s) \psi_n(s) < \infty$. Then we could use the dominated convergence theorem and conclude
\[
d\left(F^{(k)} x, T_t x\right)^2 \le k^2 \exp(-t)\, d(x, Fx)^2 + \int_0^t \exp(s - t)\, d\left(F^{(k-1)} x, T_s x\right)^2 \mathrm{d}s. \tag{4.3.18}
\]
Let us prove the claim. For every $n \in \mathbb{N}$ and $s \in (0, t]$ there exists a unique $\nu = \nu(n, s) \in \mathbb{N}$ such that $(\nu - 1) t < s n \le \nu t$. As a matter of fact, it satisfies $\nu \le n$. Inequality (4.3.11) with $m := \nu$, $\lambda := \frac{t}{n}$, $n := l$ and $\mu := \frac{s}{l}$ reads
\[
d\left(R_{s/l}^{(l)} x, R_{t/n}^{(\nu)} x\right) \le d(x, Fx) \sqrt{\left(s - \frac{\nu t}{n}\right)^2 + s \left(\frac{t}{n} - \frac{s}{l}\right)} + d(x, Fx) \sqrt{\frac{\nu t}{n} \left(\frac{t}{n} - \frac{s}{l}\right) + \left(s - \frac{\nu t}{n}\right)^2},
\]
and, since we assume $\nu \le l$, we can take the limit $l \to \infty$ to arrive at
\[
d\left(T_s x, R_{t/n}^{(\nu)} x\right) \le d(x, Fx) \left( \sqrt{\left(s - \frac{\nu t}{n}\right)^2 + \frac{st}{n}} + \sqrt{\frac{\nu t^2}{n^2} + \left(s - \frac{\nu t}{n}\right)^2} \right).
\]
Since
\[
\psi_n(s) - d\left(F^{(k-1)} x, T_s x\right)^2 = \left[ d\left(F^{(k-1)} x, R_{t/n}^{(\nu)} x\right) + d\left(F^{(k-1)} x, T_s x\right) \right] \cdot \left[ d\left(F^{(k-1)} x, R_{t/n}^{(\nu)} x\right) - d\left(F^{(k-1)} x, T_s x\right) \right],
\]
and the set $\{R_{t/n}^{(i)} x : n, i \in \mathbb{N},\ i \le n\}$ is bounded, the above inequality gives
\[
\lim_{n \to \infty} \psi_n(s) = d\left(F^{(k-1)} x, T_s x\right)^2, \qquad s \in (0, t],
\]
as well as that $(\psi_n)$ is a uniformly bounded sequence. The analogous claims about $\varphi_n$ are an easy exercise and are omitted. We have now justified (4.3.18).

Step 3: If $d(x, Fx) = 0$, the lemma holds trivially. Let us therefore assume $d(x, Fx) > 0$ and put
\[
\xi_k(s) := \frac{d\left(F^{(k)} x, T_s x\right)}{2 d(x, Fx)}, \qquad s \in [0, \infty),
\]
where $k \in \mathbb{N}_0$. Then (4.3.18) implies
\[
\xi_k(t)^2 \le \frac{1}{2} \exp(-t) k^2 + \int_0^t \exp(s - t)\, \xi_{k-1}(s)^2 \, \mathrm{d}s, \tag{4.3.19}
\]
and we need to show
\[
\xi_k(t)^2 \le (k - t)^2 + t
\]
to complete the proof. We will proceed by induction on $k$. The case $k = 0$ is a consequence of (4.3.9). Fix now $k \in \mathbb{N}$ and assume the desired inequality holds for $k - 1$. Then by (4.3.19) we have
\[
\xi_k(t)^2 \le \frac{1}{2} \exp(-t) k^2 + \int_0^t \exp(s - t) \left[ (k - 1 - s)^2 + s \right] \mathrm{d}s,
\]
and it suffices to show that
\[
\frac{1}{2} k^2 + \int_0^t \exp(s) \left[ (k - 1 - s)^2 + s \right] \mathrm{d}s \le \exp(t) \left[ (k - t)^2 + t \right].
\]
This is the case because
\[
k^2 + \int_0^t \exp(s) \left[ (k - 1 - s)^2 + s \right] \mathrm{d}s = \exp(t) \left[ (k - t)^2 + t \right]:
\]
the expressions on either side of the last identity coincide at $t = 0$ and their derivatives with respect to $t$ are equal. This finishes the proof.
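The estimate (4.3.10) and Lemma 4.3.5 can be sanity-checked in a simple Hilbert space model. The sketch below makes several assumptions that are not in the text: the space is the complex plane, $F$ is a rotation, the resolvent is computed from the formula $R_\lambda = (I + \lambda(I - F))^{-1}$ of Exercise 4.2, and for this linear $F$ the limit (4.3.8) is the exponential $T_t = \exp(-t(I - F))$. All function names are hypothetical.

```python
import cmath
import math

C_ROT = cmath.exp(2.0j)   # F z = C_ROT * z rotates the plane by 2 radians

def F(z, k=1):
    """k-th iterate F^(k) of the rotation F (k = 0 gives the identity)."""
    return C_ROT ** k * z

def resolvent_iterate(t, n, x):
    """R_{t/n}^{(n)} x, using R_lam = (I + lam (I - F))^{-1} for the linear F above."""
    return x / (1 + (t / n) * (1 - C_ROT)) ** n

def semigroup(t, x):
    """Closed form of T_t x = lim_n R_{t/n}^{(n)} x for this particular F."""
    return cmath.exp(-t * (1 - C_ROT)) * x

x = 1.5 + 0.5j
speed = abs(x - F(x))   # the quantity d(x, Fx)

# Estimate (4.3.10): d(T_t x, R_{t/n}^{(n)} x) <= 2 t d(x, Fx) / sqrt(n).
approx_ok = all(
    abs(semigroup(t, x) - resolvent_iterate(t, n, x))
    <= 2 * t * speed / math.sqrt(n) + 1e-12
    for t in (0.1, 1.0, 5.0)
    for n in (1, 10, 1000)
)

# Lemma 4.3.5: d(T_t x, F^(k) x) <= 2 d(x, Fx) sqrt((k - t)^2 + t).
lemma_ok = all(
    abs(semigroup(t, x) - F(x, k))
    <= 2 * speed * math.sqrt((k - t) ** 2 + t) + 1e-12
    for t in (0.0, 0.5, 3.0, 10.0)
    for k in range(8)
)
```

Both flags should come out true on the sampled grid. This is only a spot check in one linear example, not evidence about the sharpness of the constants.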
Exercises

Exercise 4.1. Show that if we assumed $F$ to be $(1 + \varepsilon)$-Lipschitz in Theorem 3.3.1, for some $\varepsilon > 0$, instead of $1$-Lipschitz, the statement fails.
Hint. Let $B \subset \ell_2$ be the closed unit ball of the Hilbert space $\ell_2$. Choose $\lambda > 0$ and consider the mapping $F : B \to B$ defined by
\[
F : (x_1, x_2, x_3, \dots) \mapsto \left( \frac{1 - \|x\|}{\lambda}, x_1, x_2, \dots \right),
\]
for every $x = (x_1, x_2, x_3, \dots) \in \ell_2$.

Exercise 4.2. Let $H$ be a Hilbert space. Show that the resolvent of a convex lsc function $f : H \to (-\infty, \infty]$ satisfies $J_\lambda = (I + \lambda \partial f)^{-1}$. Furthermore, show that the resolvent of a nonexpansive mapping $F : H \to H$ satisfies $R_\lambda = (I + \lambda (I - F))^{-1}$.

Exercise 4.3. Prove Lemma 4.3.2.
Hint. See [68, Lemma 1.4].

Exercise 4.4. Give the missing details in the proof of Lemma 4.3.5, that is, prove (4.3.17) and show that the sequence $(\varphi_n)$ is uniformly bounded.
Bibliographical remarks

In Section 4.1, we follow the original paper [127] by U. Lang and V. Schroeder. There have been many important follow-up results, for instance [53, 126, 128, 141, 149]. These developments were motivated by the famous papers [113] by M. Kirszbraun and [200] by F. Valentine. A beautiful proof of the classical result is due to S. Reich and S. Simons [176]. Interestingly, a similar extension theorem for firmly nonexpansive mappings in Hilbert spaces was proved by H. Bauschke [30] and H. Bauschke and X. Wang [33].

The material of Sections 4.2 and 4.3 comes mostly from [189], where the author developed a Hadamard space analog of the theory due to M. Crandall and T. Liggett [68]. Formula (4.3.8) was in a nonlinear context used already in [173, Theorem 8.1]. Theorem 4.2.8 is due to W. A. Kirk [112, Theorem 26]. Theorem 4.3.4 was proved in [21] mimicking a Hilbert ball result of S. Reich [171]. In this connection, see also the paper [24] by J.-B. Baillon, R. E. Bruck and S. Reich.

We would like to mention a recent monograph by A. Zaslavski and S. Reich [178], where the interested reader can find a lot of information on nonexpansive mappings in linear and hyperbolic spaces as well as other aspects of nonlinear analysis discussed in our book.
5 Gradient flow of a convex functional

Let $(\mathcal{H}, d)$ be a Hadamard space and $f : \mathcal{H} \to (-\infty, \infty]$ be a convex lsc function. It turns out that the resolvent of $f$ generates a semigroup of nonexpansive mappings $(S_t)$ which for a given point $x \in \operatorname{dom} f$ represents a curve $x_t := S_t x$ moving in the direction of the steepest descent of $f$. This is called a gradient flow.

In the present chapter, we first prove the existence of a gradient flow in Hadamard spaces and show that the properties of a gradient flow in a Hilbert space carry over into our nonlinear setting. In Section 5.2 we will see that if a sequence of functions converges in the sense of Mosco, the corresponding resolvents (resp. semigroups) converge pointwise to the resolvent (resp. semigroup) of the limit function. Finally, in Section 5.3 we derive a nonlinear version of the celebrated Lie–Trotter–Kato formula.

We start by recalling the classical theory of gradient flows. Let $H$ be a Hilbert space. Given a smooth convex function $f : H \to \mathbb{R}$ and $x_0 \in H$, consider the following parabolic problem
\begin{align*}
\dot{x}(t) &= -\nabla f(x(t)), \qquad t \in (0, \infty), \tag{5.0.1} \\
x(0) &= x_0,
\end{align*}
for an unknown curve $x : [0, \infty) \to H$. In other words, one looks for a curve starting at $x_0$ which moves in the direction of the steepest descent of the function $f$. A solution $x : [0, \infty) \to H$ is then called a gradient flow governed by the functional $f$.

If we assume $f$ to be only convex lsc, we can study a similar parabolic problem after replacing the gradient $\nabla f$ by the subgradient $\partial f$. More precisely, let $f : H \to (-\infty, \infty]$ be a convex lsc function and $x_0 \in H$. Then the problem
\begin{align*}
\dot{x}(t) &\in -\partial f(x(t)), \qquad t \in (0, \infty), \tag{5.0.2} \\
x(0) &= x_0,
\end{align*}
for an unknown curve $x : [0, \infty) \to H$ is a natural generalization of (5.0.1) and has been well understood for quite some time.

While a generalization of (5.0.2) into Riemannian manifolds is rather straightforward, it is less obvious how to define gradient flows for convex lsc functionals in Hadamard spaces. The main obstacle is that there is no notion of a subdifferential in Hadamard spaces and hence we must adopt a completely different approach. Remarkably, we will in the present chapter arrive at a very satisfactory definition of a gradient flow, which is fully equivalent to (5.0.2) in classical settings and behaves equally well.
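The claim that the resolvent generates the gradient flow can be illustrated in the simplest classical setting. The sketch below assumes $H = \mathbb{R}$ and $f(x) = x^2/2$, for which (5.0.1) has the solution $x(t) = x_0 e^{-t}$ and the resolvent $J_\lambda = (I + \lambda \nabla f)^{-1}$ is $J_\lambda x = x/(1 + \lambda)$; iterating $J_{t/n}$ a total of $n$ times is the implicit Euler scheme. The function names and the sample values are our own choices.

```python
import math

def resolvent(lam, x):
    """J_lam x = argmin_y [ y^2/2 + (x - y)^2 / (2 lam) ] = x / (1 + lam)."""
    return x / (1.0 + lam)

def implicit_euler(x0, t, n):
    """Apply the resolvent with step t/n exactly n times, starting from x0."""
    x = x0
    for _ in range(n):
        x = resolvent(t / n, x)
    return x

x0, t = 3.0, 2.0
exact = x0 * math.exp(-t)   # solution of x'(t) = -x(t), x(0) = x0, at time t
errors = [abs(implicit_euler(x0, t, n) - exact) for n in (10, 100, 1000)]
```

The errors decrease as $n$ grows, which is the behavior one expects if the iterated resolvents converge to the gradient flow at time $t$.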
5.1 Gradient flow semigroup Let (H, 𝑑) be a Hadamard space and 𝑓 : H → (−∞, ∞] be a convex lsc function. Given a point 𝑥0 ∈ H, we would like to find a curve 𝑥 : [0, ∞) → H with 𝑥(0) = 𝑥0 which at each point moves in the direction of the steepest descent of the function 𝑓. To this end, we define a slope of a function and derive its properties prior to stating the main result on the existence of a gradient flow in Theorem 5.1.6.
Slope of a function The notion of a slope is to partially compensate for the lack of a subdifferential of a convex lsc function, but in contrast to the subdifferential, the slope is a nonnegative real number. Definition 5.1.1 (Slope of a function). Let 𝑓 : H → (−∞, ∞] be a convex lsc function and 𝑥 ∈ dom 𝑓. Define the slope of 𝑓 at 𝑥 as
\[
\partial f(x) := \limsup_{y \to x} \frac{\max\{f(x) - f(y),\, 0\}}{d(x,y)},
\]
and dom 𝜕𝑓 := {𝑥 ∈ H : 𝜕𝑓(𝑥) < ∞}. If 𝑓(𝑥) = ∞, we set 𝜕𝑓(𝑥) := ∞. Note that if 𝐻 is a Hilbert space and 𝑓 : 𝐻 → ℝ is smooth, then ‖∇𝑓(𝑥)‖ = 𝜕𝑓(𝑥) for each 𝑥 ∈ 𝐻. The definition of a slope has a local character. The following lemma shows that it can be equivalently obtained as a global supremum. As a consequence we get the lower semicontinuity of the function 𝑥 ↦ 𝜕𝑓(𝑥) on dom 𝜕𝑓.

Lemma 5.1.2. Let 𝑓 : H → (−∞, ∞] be a convex lsc function. Then
\[
\partial f(x) = \sup_{y \in \mathcal{H} \setminus \{x\}} \frac{\max\{f(x) - f(y),\, 0\}}{d(x,y)}, \qquad x \in \operatorname{dom} f.
\]
Furthermore, the slope of 𝑓 is lsc, that is,
\[
\partial f(x) \le \liminf_{n \to \infty} \partial f(x_n),
\]
whenever 𝑥𝑛 → 𝑥 and 𝑥 ∈ dom 𝜕𝑓.

Proof. It is immediate that
\[
\partial f(x) \le \sup_{y \in \mathcal{H} \setminus \{x\}} \frac{\max\{f(x) - f(y),\, 0\}}{d(x,y)}.
\]
To show the converse inequality, we may assume that 𝑥 ∈ dom 𝜕𝑓 and 𝑦 ∈ H satisfy 𝑓(𝑥) − 𝑓(𝑦) > 0. If we denote 𝛾 : [0, 1] → H the geodesic from 𝑥 to 𝑦, then
by convexity
\[
\frac{f(x) - f(\gamma_t)}{d(x,\gamma_t)} \ge \frac{f(x) - f(y)}{d(x,y)},
\]
and hence,
\[
\partial f(x) \ge \limsup_{t \to 0+} \frac{f(x) - f(\gamma_t)}{d(x,\gamma_t)} \ge \frac{f(x) - f(y)}{d(x,y)}.
\]
Taking supremum over 𝑦 finishes the first part of the proof. To show that 𝜕𝑓 is lsc, let 𝑥𝑛 ∈ H converge to 𝑥 ∈ dom 𝜕𝑓 and let 𝑦 ∈ H \ {𝑥}. Then we have
\[
\liminf_{n \to \infty} \partial f(x_n) \ge \liminf_{n \to \infty} \frac{\max\{f(x_n) - f(y),\, 0\}}{d(x_n,y)} \ge \frac{\max\{f(x) - f(y),\, 0\}}{d(x,y)}.
\]
Now it suffices to take the supremum over 𝑦 and we are done, by the first part of the lemma.

Observe that 𝑥 ∈ H is a minimizer of 𝑓 if and only if 𝜕𝑓(𝑥) = 0. By the definition, dom 𝜕𝑓 ⊂ dom 𝑓 and it is easy to find a function 𝑓 such that dom 𝜕𝑓 ≠ dom 𝑓; see Exercise 5.1. On the other hand, the following lemma shows that the difference between these two domains cannot be too big. First recall the resolvent of 𝑓 was defined in Definition 2.2.20 as
\[
J_\lambda x := \operatorname*{arg\,min}_{y \in \mathcal{H}} \left[ f(y) + \frac{1}{2\lambda} d(x,y)^2 \right], \qquad x \in \mathcal{H},
\]
where 𝜆 ∈ (0, ∞). Lemma 5.1.3. Let 𝑓 : H → (−∞, ∞] be a convex lsc function. Then for every 𝑥 ∈ H and 𝜆 ∈ (0, ∞) we have 𝐽𝜆 𝑥 ∈ dom 𝜕𝑓 and
\[
\partial f(J_\lambda x) \le \frac{d(x, J_\lambda x)}{\lambda}.
\]
In particular, we see that dom 𝜕𝑓 is dense in dom 𝑓.
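Proof. Let 𝑦 ∈ H, then
\[
f(J_\lambda x) - f(y) \le \frac{1}{2\lambda} d(x,y)^2 - \frac{1}{2\lambda} d(x, J_\lambda x)^2 \le \frac{1}{2\lambda}\, d(y, J_\lambda x) \left[ d(y,x) + d(x, J_\lambda x) \right],
\]
and hence we obtain
\[
\limsup_{y \to J_\lambda x} \frac{\max\{0,\, f(J_\lambda x) - f(y)\}}{d(y, J_\lambda x)} \le \limsup_{y \to J_\lambda x} \frac{1}{2\lambda} \left[ d(y,x) + d(x, J_\lambda x) \right] = \frac{d(x, J_\lambda x)}{\lambda}.
\]
This finishes the proof.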
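The global-supremum formula of Lemma 5.1.2 can be checked numerically on the real line. The following is only a sketch under stated assumptions: the test function f(x) = x², the base point, and the sampling grid are illustrative choices, not taken from the text. For a smooth convex function the slope equals |f′(x)|, so the largest sampled difference quotient should approach |f′(x)| from below.

```python
import numpy as np

def slope_as_global_sup(f, x, offsets):
    """Largest quotient max(f(x) - f(y), 0) / |x - y| over the points y = x + offset."""
    ys = x + offsets
    quotients = np.maximum(f(x) - f(ys), 0.0) / np.abs(offsets)
    return float(quotients.max())

def f(x):
    return x ** 2  # smooth convex test function (an assumption of this sketch)

# Sample points y on both sides of x, at distances between 1e-4 and 10.
distances = np.logspace(-4, 1, 2000)
offsets = np.concatenate([-distances, distances])

x = 1.5
approx_slope = slope_as_global_sup(f, x, offsets)
exact_slope = abs(2.0 * x)  # |f'(x)| for the smooth function f
print(approx_slope, exact_slope)
```

The grid deliberately stays a fixed distance away from x, which avoids cancellation errors in the difference quotient; the supremum is approached, not attained, as y → x.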
Lemma 5.1.4. Let 𝑓 : H → (−∞, ∞] be a convex lsc function. Then for every 𝑥 ∈ H and 𝜆 ∈ (0, ∞), we have
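\[
\frac{f(x) - f_\lambda(x)}{\lambda} \le \frac{1}{2}\, \partial f(x)^2.
\]
Proof. By Lemma 5.1.2 we have
\[
\begin{aligned}
\frac{f(x) - f_\lambda(x)}{\lambda} &= \frac{f(x) - f(J_\lambda x)}{\lambda} - \frac{d(x, J_\lambda x)^2}{2\lambda^2} \\
&\le \partial f(x)\, \frac{d(x, J_\lambda x)}{\lambda} - \frac{d(x, J_\lambda x)^2}{2\lambda^2} \\
&\le \frac{1}{2}\, \partial f(x)^2 + \frac{d(x, J_\lambda x)^2}{2\lambda^2} - \frac{d(x, J_\lambda x)^2}{2\lambda^2} \\
&= \frac{1}{2}\, \partial f(x)^2.
\end{aligned}
\]
Combining Lemma 5.1.3, Lemma 2.2.23 and Lemma 5.1.4, we arrive at a useful inequality
\[
\partial f(J_\lambda x)^2 \le \frac{d(x, J_\lambda x)^2}{\lambda^2} \le 2\, \frac{f(x) - f_\lambda(x)}{\lambda} \le \partial f(x)^2 \tag{5.1.3}
\]
for every 𝑥 ∈ H, which we record here for future reference.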
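As a sanity check, the chain (5.1.3) can be evaluated in closed form for a simple example. The sketch below assumes H = ℝ and f(x) = x²/2 (choices made here for illustration only), for which the resolvent is J_λ x = x/(1 + λ) and the slope is |x|. The helper names are hypothetical.

```python
def f(x):
    return 0.5 * x * x  # assumed test function on the real line

def resolvent(x, lam):
    # Minimizer of y -> f(y) + (x - y)^2 / (2 * lam) for f(y) = y^2 / 2.
    return x / (1.0 + lam)

def envelope(x, lam):
    # Moreau-Yosida envelope f_lam(x) = f(J x) + d(x, J x)^2 / (2 * lam).
    j = resolvent(x, lam)
    return f(j) + (x - j) ** 2 / (2.0 * lam)

def slope(x):
    return abs(x)  # slope of the smooth function f(x) = x^2 / 2

def chain(x, lam):
    """Return the four quantities of (5.1.3), from left to right."""
    j = resolvent(x, lam)
    return (
        slope(j) ** 2,
        (x - j) ** 2 / lam ** 2,
        2.0 * (f(x) - envelope(x, lam)) / lam,
        slope(x) ** 2,
    )

for x in (-3.0, 0.5, 2.0):
    for lam in (0.1, 1.0, 10.0):
        print(x, lam, chain(x, lam))
```

For this particular f the first two quantities coincide (both equal x²/(1 + λ)²), so the first inequality in (5.1.3) holds with equality, while the remaining two are strict whenever x ≠ 0.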
Existence of a gradient flow semigroup

At first we show the existence of a flow (Theorem 5.1.6) and next prove its semigroup property (Proposition 5.1.8). Recall some notation.

Notation 5.1.5. Given a mapping 𝐹 : H → H, denote
\[
F^{(n)} := F \circ \cdots \circ F, \qquad n \in \mathbb{N},
\]
where 𝐹 appears 𝑛 times in the composition on the right-hand side. As a convention, we put $F^{(0)} x := x$ for every 𝑥 ∈ H.

Theorem 5.1.6 (Existence of a flow). Let (H, 𝑑) be a Hadamard space and 𝑓 : H → (−∞, ∞] be a convex lsc function. Assume that 𝑥0 lies in the closure of dom 𝑓. Then the following limit exists
\[
S_t x_0 := \lim_{n \to \infty} J_{t/n}^{(n)} x_0, \qquad t \in [0,\infty), \tag{5.1.4}
\]
and defines a nonexpansive mapping 𝑆𝑡 of the closure of dom 𝑓 into itself. The limit in (5.1.4) is uniform with respect to 𝑡 on bounded subintervals of [0, ∞) and, if 𝑥0 ∈ dom 𝑓, we have the error estimate
\[
d\left(S_t x_0, J_{t/n}^{(n)} x_0\right) \le \sqrt{\frac{t}{n} \left[ f(x_0) - f_{t/n}(x_0) \right]} \le \frac{t}{\sqrt{2}\, n}\, \partial f(x_0), \tag{5.1.5}
\]
for every 𝑡 ∈ [0, ∞) and 𝑛 ∈ ℕ.
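The construction in Theorem 5.1.6 is an implicit (backward) Euler scheme: the time interval [0, t] is cut into n steps and the resolvent with parameter t/n is applied n times. The following sketch illustrates the error estimate (5.1.5) under assumptions made here only for illustration: H = ℝ and f(x) = x²/2, so that J_λ x = x/(1 + λ), the flow is S_t x0 = x0·e^{−t}, and the slope at x0 is |x0|. Function names are hypothetical.

```python
import math

def f(x):
    return 0.5 * x * x  # assumed test function on the real line

def resolvent(x, lam):
    return x / (1.0 + lam)  # resolvent of f(y) = y^2 / 2

def discrete_flow(x0, t, n):
    """Compute J_{t/n}^{(n)} x0 by applying the resolvent n times."""
    x = x0
    for _ in range(n):
        x = resolvent(x, t / n)
    return x

def error_and_bounds(x0, t, n):
    """Return the three quantities of (5.1.5), from left to right."""
    lam = t / n
    exact = x0 * math.exp(-t)  # S_t x0 for this particular f
    err = abs(exact - discrete_flow(x0, t, n))
    j = resolvent(x0, lam)
    f_lam = f(j) + (x0 - j) ** 2 / (2.0 * lam)  # Moreau-Yosida envelope at x0
    middle = math.sqrt(lam * (f(x0) - f_lam))
    outer = t / (math.sqrt(2.0) * n) * abs(x0)
    return err, middle, outer

for n in (1, 10, 100, 1000):
    print(n, error_and_bounds(3.0, 2.0, n))
```

In this example the actual error decays like 1/n, which is the rate predicted by the right-hand side of (5.1.5).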
Proof. Fix 𝜏 > 0. Choose 𝑁 ∈ ℕ and put 𝜆 := 𝜆(𝑁) := 𝜏/𝑁. This gives a partition $\mathcal{P}_\lambda := \{n\lambda\}_{n \in \mathbb{N}}$ of the time interval [0, ∞) with time step 𝜆 > 0. Also put $I_n^\lambda := ((n-1)\lambda, n\lambda]$ for every 𝑛 ∈ ℕ. Let us first assume 𝑥0 ∈ dom 𝑓. We divide the proof into several steps.

Step 1: Put $x_n^\lambda := J_\lambda^{(n)} x_0$ for every $n \in \mathbb{N}_0$. Define the fractional part function
\[
l_\lambda(t) := \frac{t - (n-1)\lambda}{\lambda}, \qquad t \in I_n^\lambda,\ n \in \mathbb{N},
\]
and, for each 𝑦 ∈ H, interpolate the distance function squared by putting
\[
D_\lambda(t; y)^2 := (1 - l_\lambda(t))\, d(y, x_{n-1}^\lambda)^2 + l_\lambda(t)\, d(y, x_n^\lambda)^2, \qquad t \in I_n^\lambda,
\]
and in the same way interpolate the function 𝑓, that is, put
\[
\varphi_\lambda(t) := (1 - l_\lambda(t))\, f(x_{n-1}^\lambda) + l_\lambda(t)\, f(x_n^\lambda), \qquad t \in I_n^\lambda,
\]
for every 𝑛 ∈ ℕ.

Step 2: Define the function
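\[
r_\lambda(t) := 2 (1 - l_\lambda(t)) \left[ f(x_{n-1}^\lambda) - f(x_n^\lambda) - \frac{1}{\lambda} d(x_n^\lambda, x_{n-1}^\lambda)^2 \right] + \frac{1}{\lambda} (1 - 2 l_\lambda(t))\, d(x_n^\lambda, x_{n-1}^\lambda)^2, \qquad t \in I_n^\lambda,
\]
and observe that
\[
\int_{(n-1)\lambda}^{n\lambda} r_\lambda(t)\, \mathrm{d}t = \lambda \left[ f(x_{n-1}^\lambda) - f(x_n^\lambda) - \frac{1}{\lambda} d(x_n^\lambda, x_{n-1}^\lambda)^2 \right].
\]
Consequently,
\[
\int_0^\tau r_\lambda(t)\, \mathrm{d}t = \lambda \sum_{n=1}^{N} \left[ f(x_{n-1}^\lambda) - f(x_n^\lambda) - \frac{1}{\lambda} d(x_n^\lambda, x_{n-1}^\lambda)^2 \right].
\]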
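Step 3: From (5.1.3) we get
\[
\begin{aligned}
f(x_{n-1}^\lambda) - f(x_n^\lambda) - \frac{1}{\lambda} d(x_n^\lambda, x_{n-1}^\lambda)^2 &= f(x_{n-1}^\lambda) - f_\lambda(x_{n-1}^\lambda) - \frac{1}{2\lambda} d(x_n^\lambda, x_{n-1}^\lambda)^2 \\
&\le \left[ f(x_{n-1}^\lambda) - f_\lambda(x_{n-1}^\lambda) \right] - \left[ f(x_n^\lambda) - f_\lambda(x_n^\lambda) \right],
\end{aligned}
\]
which yields
\[
\begin{aligned}
\sum_{n=1}^{N} \left[ f(x_{n-1}^\lambda) - f(x_n^\lambda) - \frac{1}{\lambda} d(x_n^\lambda, x_{n-1}^\lambda)^2 \right] &\le \left[ f(x_0^\lambda) - f_\lambda(x_0^\lambda) \right] - \left[ f(x_N^\lambda) - f_\lambda(x_N^\lambda) \right] \\
&\le f(x_0^\lambda) - f_\lambda(x_0^\lambda).
\end{aligned}
\]
This implies, again by (5.1.3), that
\[
\int_0^\tau r_\lambda(t)\, \mathrm{d}t \le \lambda \left[ f(x_0^\lambda) - f_\lambda(x_0^\lambda) \right] \le \frac{1}{2} \lambda^2\, \partial f(x_0)^2.
\]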
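Step 4: We will now show that the functions 𝐷𝜆 and 𝜑𝜆 defined above satisfy
\[
\frac{1}{2} \frac{\mathrm{d}}{\mathrm{d}t} D_\lambda(t; y)^2 + \varphi_\lambda(t) - f(y) \le \frac{1}{2} r_\lambda(t), \tag{5.1.6}
\]
for every 𝑦 ∈ dom 𝑓 and $t \in (0,\infty) \setminus \mathcal{P}_\lambda$. To prove this inequality, let us assume 𝑡 ∈ ((𝑛 − 1)𝜆, 𝑛𝜆). Applying Lemma 2.2.23 yields
\[
\begin{aligned}
\frac{1}{2} \frac{\mathrm{d}}{\mathrm{d}t} D_\lambda(t; y)^2 + \varphi_\lambda(t) - f(y) &= \frac{1}{2\lambda} \left[ d(x_n^\lambda, y)^2 - d(x_{n-1}^\lambda, y)^2 \right] + \varphi_\lambda(t) - f(y) \\
&\le -\frac{1}{2\lambda} d(x_{n-1}^\lambda, x_n^\lambda)^2 + f(y) - f(x_n^\lambda) + \varphi_\lambda(t) - f(y) \\
&= -\frac{1}{2\lambda} d(x_{n-1}^\lambda, x_n^\lambda)^2 + (1 - l_\lambda(t)) \left[ f(x_{n-1}^\lambda) - f(x_n^\lambda) \right],
\end{aligned}
\]
which implies (5.1.6).

Step 5: Let us now choose 𝑀 ∈ ℕ and put 𝜂 = 𝜂(𝑀) := 𝜏/𝑀. We hence have another partition $\mathcal{P}_\eta := \{m\eta\}_{m \in \mathbb{N}}$ of the time interval [0, ∞) with time step 𝜂. Put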
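\[
x_m^\eta := J_\eta^{(m)} x_0
\]
for $m \in \mathbb{N}_0$. We will now interpolate the discrete time flows $(x_n^\lambda)_{n \in \mathbb{N}}$ and $(x_m^\eta)_{m \in \mathbb{N}}$ by setting
\[
\underline{x}_\lambda(t) := x_{n-1}^\lambda, \qquad \overline{x}^\lambda(t) := x_n^\lambda, \qquad t \in I_n^\lambda,
\]
and likewise,
\[
\underline{x}_\eta(t) := x_{m-1}^\eta, \qquad \overline{x}^\eta(t) := x_m^\eta, \qquad t \in I_m^\eta,
\]
for every 𝑛, 𝑚 ∈ ℕ.

Step 6: We claim that the function $D_{\lambda\eta}$ defined as
\[
D_{\lambda\eta}(t, s)^2 := (1 - l_\eta(s))\, D_\lambda(t; \underline{x}_\eta(s))^2 + l_\eta(s)\, D_\lambda(t; \overline{x}^\eta(s))^2, \qquad t, s \in [0,\infty),
\]
satisfies
\[
\frac{\mathrm{d}}{\mathrm{d}t} D_{\lambda\eta}(t, t)^2 \le r_\lambda(t) + r_\eta(t), \qquad \text{for all } t \in (0,\infty) \setminus (\mathcal{P}_\lambda \cup \mathcal{P}_\eta),
\]
and therefore also
\[
D_{\lambda\eta}(\tau, \tau)^2 \le \int_0^\tau \left[ r_\lambda(t) + r_\eta(t) \right] \mathrm{d}t. \tag{5.1.7}
\]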
Indeed, recall that
\[
\varphi_\eta(s) = (1 - l_\eta(s))\, f(\underline{x}_\eta(s)) + l_\eta(s)\, f(\overline{x}^\eta(s)),
\]
and apply (5.1.6) first with $y := \underline{x}_\eta(s)$, and then with $y := \overline{x}^\eta(s)$, to get
\[
\frac{1}{2} \frac{\mathrm{d}}{\mathrm{d}t} D_\lambda(t; \underline{x}_\eta(s))^2 + \varphi_\lambda(t) - f(\underline{x}_\eta(s)) \le \frac{1}{2} r_\lambda(t),
\]
and,
\[
\frac{1}{2} \frac{\mathrm{d}}{\mathrm{d}t} D_\lambda(t; \overline{x}^\eta(s))^2 + \varphi_\lambda(t) - f(\overline{x}^\eta(s)) \le \frac{1}{2} r_\lambda(t),
\]
respectively, for every 𝑠 ∈ (0, ∞) and $t \in (0,\infty) \setminus \mathcal{P}_\lambda$. Taking the convex combination of the two last inequalities with coefficients 1 − 𝑙𝜂 (𝑠) and 𝑙𝜂 (𝑠) gives
\[
\frac{1}{2} \frac{\partial}{\partial t} D_{\lambda\eta}(t, s)^2 + \varphi_\lambda(t) - \varphi_\eta(s) \le \frac{1}{2} r_\lambda(t),
\]
for every 𝑠 ∈ (0, ∞) and $t \in (0,\infty) \setminus \mathcal{P}_\lambda$. In the same way, applying (5.1.6) to $D_\eta(s; y)$ with $y := \underline{x}_\lambda(t)$, and then with $y := \overline{x}^\lambda(t)$, yields
\[
\frac{1}{2} \frac{\partial}{\partial s} D_{\eta\lambda}(s, t)^2 + \varphi_\eta(s) - \varphi_\lambda(t) \le \frac{1}{2} r_\eta(s),
\]
for every 𝑡 ∈ (0, ∞) and $s \in (0,\infty) \setminus \mathcal{P}_\eta$. Altogether we obtain
\[
\frac{\partial}{\partial t} D_{\lambda\eta}(t, s)^2 + \frac{\partial}{\partial s} D_{\eta\lambda}(s, t)^2 \le r_\lambda(t) + r_\eta(s),
\]
for every $t, s \in (0,\infty) \setminus (\mathcal{P}_\lambda \cup \mathcal{P}_\eta)$. This immediately implies the desired inequality and hence proves the claim.

Step 7: By (5.1.7) we have that $(\overline{x}^\lambda(\tau))_\lambda$ is Cauchy. Passing to the limit 𝜂 → 0 in (5.1.7) yields (5.1.5) and hence proves the whole theorem if 𝑥0 ∈ dom 𝑓. If 𝑥0 lies only in the closure of dom 𝑓, we use a simple approximation argument. The proof is now complete.

The error estimate (5.1.5) yields with the help of Proposition 2.2.26 that 𝑡 ↦ 𝑆𝑡 𝑥0 is a continuous map from [0, ∞) to H.

Notation 5.1.7. We will often write 𝑥𝑡 := 𝑆𝑡 𝑥0 to denote a gradient flow curve. However, a word of caution is in order. In some places, the symbol 𝑥𝑡 stands for a point on a geodesic 𝑥𝑡 := (1 − 𝑡)𝑥0 + 𝑡𝑥1 . It is always clear from the context which case we have in mind.
Proposition 5.1.8 (Semigroup property of a flow). Let (H, 𝑑) be a Hadamard space and 𝑓 : H → (−∞, ∞] be a convex lsc function. Then (𝑆𝑡 ) is a strongly continuous semigroup of nonexpansive mappings, that is, (i) lim𝑡→0+ 𝑆𝑡 𝑥 = 𝑥, (ii) 𝑆𝑡 (𝑆𝑠 𝑥) = 𝑆𝑠+𝑡 𝑥, for every 𝑠, 𝑡 ∈ [0, ∞), (iii) 𝑑(𝑆𝑡 𝑥, 𝑆𝑡 𝑦) ≤ 𝑑(𝑥, 𝑦), for each 𝑡 ∈ [0, ∞), for every 𝑥, 𝑦 ∈ dom 𝑓. Proof. (i) Follows immediately by Proposition 2.2.26 and (5.1.5). As for (ii), the proof is the same as in Theorem 4.3.3, only replace 𝑅𝜆 by 𝐽𝜆 and 𝑇𝑡 by 𝑆𝑡 . The nonexpansive property in (iii) is a direct consequence of Theorem 2.2.22 and (5.1.5). As a first and easy consequence of the semigroup property, we get that the function 𝑡 → 𝑓(𝑥𝑡 ) is nonincreasing. Corollary 5.1.9. Let 𝑓 : H → (−∞, ∞] be convex lsc and 𝑥𝑡 be the gradient flow starting at a point 𝑥0 ∈ dom 𝑓. Then the function 𝑡 → 𝑓(𝑥𝑡 ) is nonincreasing. Indeed, since we have
\[
f\left(J_{t/n}^{(n)} x_0\right) \le f\left(J_{t/n} x_0\right) \le f(x_0),
\]
and 𝑓 is lsc, we obtain that 𝑓(𝑥𝑡 ) ≤ 𝑓(𝑥0 ). Monotonicity then follows by the semigroup property.
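The monotonicity argument of Corollary 5.1.9 can be observed on a nonsmooth example. The sketch below assumes H = ℝ and f(x) = |x| (an illustrative choice, not from the text). For this function the resolvent is the soft-thresholding map, and the discrete scheme reproduces the flow x_t = sign(x0)·max(|x0| − t, 0), which reaches the minimizer 0 in finite time. Function names are hypothetical.

```python
def resolvent_abs(x, lam):
    """Resolvent of f(y) = |y| on the real line (soft thresholding)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def discrete_flow_path(x0, t, n):
    """Return the iterates x0, J x0, J^(2) x0, ..., J^(n) x0 with step t / n."""
    path = [x0]
    for _ in range(n):
        path.append(resolvent_abs(path[-1], t / n))
    return path

path = discrete_flow_path(2.0, 3.0, 50)
values = [abs(x) for x in path]  # f along the discrete flow
print(values[0], values[10], values[-1])
```

Each resolvent step can only decrease f, so the values of f along the iterates form a nonincreasing sequence, in line with the chain of inequalities used in the corollary.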
Properties of a gradient flow

Having constructed the gradient flow semigroup, we will proceed by showing its features. You may notice that many properties of the solution to the problem (5.0.2) in a Hilbert space transfer to our nonlinear setting.

Proposition 5.1.10 (Regularity of a flow). Let 𝑓 : H → (−∞, ∞] be a convex lsc function and 𝑥0 ∈ dom 𝑓. Then the mapping 𝑡 ↦ 𝑆𝑡 𝑥0 is locally Lipschitz on (0, ∞) and Lipschitz on [𝑡0 , ∞) where 𝑡0 is an arbitrary positive time.

Proof. Let 𝑥, 𝑦 ∈ H. Applying iteratively Lemma 2.2.23 yields
\[
\begin{aligned}
d\left(J_{\varepsilon/n} x, y\right)^2 &\le 2 \frac{\varepsilon}{n} \left[ f(y) - f\left(J_{\varepsilon/n} x\right) \right] + d(x, y)^2, \\
d\left(J_{\varepsilon/n}^{(2)} x, y\right)^2 &\le 2 \frac{\varepsilon}{n} \left[ f(y) - f\left(J_{\varepsilon/n}^{(2)} x\right) \right] + d\left(J_{\varepsilon/n} x, y\right)^2, \\
&\ \,\vdots \\
d\left(J_{\varepsilon/n}^{(n)} x, y\right)^2 &\le 2 \frac{\varepsilon}{n} \left[ f(y) - f\left(J_{\varepsilon/n}^{(n)} x\right) \right] + d\left(J_{\varepsilon/n}^{(n-1)} x, y\right)^2.
\end{aligned}
\]
Summing up these inequalities, dividing by 𝜀² and putting 𝑥 := 𝑥𝑡 and 𝑦 := 𝑥𝑡 gives
\[
\frac{d\left(J_{\varepsilon/n}^{(n)} x_t, x_t\right)^2}{\varepsilon^2} \le 2\, \frac{f(x_t) - f\left(J_{\varepsilon/n}^{(n)} x_t\right)}{\varepsilon},
\]
and after taking lim sup𝑛→∞ we obtain
\[
\frac{d(x_{t+\varepsilon}, x_t)^2}{\varepsilon^2} \le 2\, \frac{f(x_t) - f(x_{t+\varepsilon})}{\varepsilon}.
\]
Since 𝑡 ↦ 𝑓(𝑥𝑡 ) is monotone by Corollary 5.1.9, it is differentiable almost everywhere on [0, ∞) and thus
\[
\limsup_{\varepsilon \to 0+} \frac{d(x_{t+\varepsilon}, x_t)}{\varepsilon} < \infty,
\]
almost everywhere on [0, ∞). Fix 𝑠 ∈ [0, ∞) and 𝐶 > 0 such that
\[
d(x_{s+\varepsilon}, x_s) \le C\varepsilon,
\]
for every sufficiently small 𝜀 > 0. Then, by the semigroup property,
\[
d(x_{t+\varepsilon}, x_t) = d\left(S_{t-s} x_{s+\varepsilon}, S_{t-s} x_s\right) \le d(x_{s+\varepsilon}, x_s) \le C\varepsilon,
\]
for every 𝑡 ∈ [𝑠, ∞). In conclusion, the mapping 𝑡 ↦ 𝑆𝑡 𝑥 is locally Lipschitz on (0, ∞) and Lipschitz on [𝑡0 , ∞) where 𝑡0 is an arbitrary positive time.

It is a well-known fact in Hilbert spaces that the evolution variational inequality completely characterizes the gradient flow. Next we see that this extends to Hadamard spaces.

Theorem 5.1.11 (Evolution variational inequality). Let (H, 𝑑) be a Hadamard space and 𝑓 : H → (−∞, ∞] be a convex lsc function. Assume that 𝑥0 lies in the closure of dom 𝑓 and denote 𝑥𝑡 := 𝑆𝑡 𝑥0 for 𝑡 ∈ (0, ∞). Then 𝑡 ↦ 𝑥𝑡 is absolutely continuous on (0, ∞) and satisfies
\[
\frac{1}{2} \frac{\mathrm{d}}{\mathrm{d}t} d(y, x_t)^2 + f(x_t) \le f(y), \tag{5.1.9}
\]
for almost every 𝑡 ∈ (0, ∞) and every 𝑦 ∈ dom 𝑓. Conversely, if an absolutely continuous curve 𝑧 : (0, ∞) → H with lim𝑡→0+ 𝑧(𝑡) = 𝑥0 satisfies (5.1.9), then 𝑧𝑡 = 𝑆𝑡 𝑥0 for every 𝑡 ∈ (0, ∞).

Proof. By an approximation argument, we can assume 𝑥0 ∈ dom 𝑓. Absolute continuity of 𝑡 ↦ 𝑥𝑡 follows by the local Lipschitz property. Choose 𝑡1 ∈ [0, ∞) and 𝑡2 ∈ [𝑡1 , ∞). Integrating (5.1.6) from 𝑡1 to 𝑡2 and sending 𝜆 → 0 gives
\[
\frac{1}{2} d(y, x_{t_2})^2 - \frac{1}{2} d(y, x_{t_1})^2 + \int_{t_1}^{t_2} f(x_t)\, \mathrm{d}t \le (t_2 - t_1)\, f(y), \tag{5.1.10}
\]
for each 𝑦 ∈ dom 𝑓. Differentiating the above inequality with respect to 𝑡2 yields (5.1.9). It remains to show that this solution is unique. Let 𝑣 : (0, ∞) → H be an absolutely continuous solution to (5.1.9) and integrate from 𝑡 − ℎ to 𝑡 to obtain
\[
\frac{1}{2} d(y, v_t)^2 - \frac{1}{2} d(y, v_{t-h})^2 + \int_{t-h}^{t} f(v_s)\, \mathrm{d}s \le h f(y),
\]
for each 𝑦 ∈ dom 𝑓. Dividing by ℎ and passing to the limit gives
\[
\limsup_{h \to 0+} \left[ \frac{1}{2h} d(y, v_t)^2 - \frac{1}{2h} d(y, v_{t-h})^2 \right] + f(v_t) \le f(y), \tag{5.1.11}
\]
for each 𝑡 ∈ (0, ∞). By the same argument we arrive at
\[
\limsup_{h \to 0+} \left[ \frac{1}{2h} d(y, v_{t+h})^2 - \frac{1}{2h} d(y, v_t)^2 \right] + f(v_t) \le f(y), \tag{5.1.12}
\]
for each 𝑡 ∈ (0, ∞). Let now 𝑢 : (0, ∞) → dom 𝑓 and 𝑤 : (0, ∞) → dom 𝑓 be absolutely continuous curves satisfying (5.1.9) and
\[
\lim_{t \to 0} u_t = \lim_{t \to 0} w_t = u_0 \in \operatorname{dom} f.
\]
Put 𝑣𝑡 := 𝑢𝑡 and 𝑦 := 𝑤𝑡 in (5.1.11) and 𝑣𝑡 := 𝑤𝑡 and 𝑦 := 𝑢𝑡 in (5.1.12) and sum the resulting inequalities up. Then Exercise 5.2 gives
\[
\frac{\mathrm{d}}{\mathrm{d}t} d(u_t, w_t)^2 \le 0,
\]
for almost every 𝑡 ∈ (0, ∞), which shows the uniqueness of the solution and the proof is complete.
If we put 𝑦 := 𝑥𝑡1 in (5.1.10), then we see that the mapping 𝑡 ↦ 𝑆𝑡 𝑥0 is 1/2-Hölder on [0, ∞). The Hölder exponent is optimal in the sense that for each 𝛽 ∈ (1/2, 1) there exists a convex lsc function 𝑓 : [0, 1] → ℝ, for instance
\[
f(x) := 1 - x^\alpha, \qquad x \in [0, 1], \tag{5.1.13}
\]
where 𝛼 ∈ (0, 1) is a number satisfying $\frac{1}{2-\alpha} < \beta$, such that the gradient flow is not 𝛽-Hölder. Indeed, the solution to the equation
\[
\dot{x}(t) = -\nabla f(x(t)), \qquad t \in [0, 1],
\]
with the initial condition 𝑥(0) := 0 is $x(t) := (\alpha(2-\alpha)t)^{\frac{1}{2-\alpha}}$, where 𝑡 ∈ [0, 1], which is not a 𝛽-Hölder mapping.

The following is a quantified version of Corollary 5.1.9.
Proposition 5.1.12. Let 𝑓 : H → (−∞, ∞] be a convex lsc function. Given 𝑥0 ∈ dom 𝑓, put 𝑥𝑡 := 𝑆𝑡 𝑥0 . Then
\[
f(x_t) \le f(y) + \frac{1}{2t} d(y, x_0)^2 \tag{5.1.14}
\]
for every 𝑦 ∈ H. Consequently, lim𝑡→∞ 𝑓(𝑥𝑡 ) = inf H 𝑓.

Proof. To show (5.1.14) we need to integrate (5.1.9) from 0 to 𝜏 and use monotonicity of 𝑓(𝑥𝑡 ) to get
\[
\tau f(x_\tau) \le \int_0^\tau f(x_t)\, \mathrm{d}t \le \tau f(y) + \frac{1}{2} d(y, x_0)^2 - \frac{1}{2} d(y, x_\tau)^2. \tag{5.1.15}
\]
Then (5.1.14) follows. We get to a remarkable point, where one can see that a gradient flow moves at each point in the direction of the steepest descent of the function. Theorem 5.1.13. Let (H, 𝑑) be a Hadamard space and 𝑓 : H → (−∞, ∞] be a convex lsc function. Given 𝑥0 ∈ dom 𝑓, put 𝑥𝑡 := 𝑆𝑡 𝑥0 . Then,
\[
\partial f(x_t) = \lim_{h \to 0+} \frac{d(x_{t+h}, x_t)}{h}, \tag{5.1.16}
\]
as well as,
\[
\partial f(x_t) = \lim_{h \to 0+} \frac{f(x_t) - f(x_{t+h})}{d(x_{t+h}, x_t)}, \tag{5.1.17}
\]
and also,
\[
\partial f(x_t)^2 = \lim_{h \to 0+} \frac{f(x_t) - f(x_{t+h})}{h}, \tag{5.1.18}
\]
for every 𝑡 ∈ (0, ∞).
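The three limits (5.1.16)–(5.1.18) can be observed numerically in a case where the flow is known explicitly. The sketch below assumes H = ℝ and f(x) = x²/2 (illustrative choices), so that x_t = x0·e^{−t} and the slope at x_t is |x_t|. Function names are hypothetical.

```python
import math

def f(x):
    return 0.5 * x * x  # assumed test function on the real line

def flow(x0, t):
    return x0 * math.exp(-t)  # S_t x0 for f(x) = x^2 / 2

def difference_quotients(x0, t, h):
    """Return the quotients appearing in (5.1.16), (5.1.17) and (5.1.18)."""
    x_t, x_th = flow(x0, t), flow(x0, t + h)
    dist = abs(x_th - x_t)
    drop = f(x_t) - f(x_th)
    return dist / h, drop / dist, drop / h

x0, t = 2.0, 0.7
slope_at_xt = abs(flow(x0, t))
for h in (1e-2, 1e-4, 1e-6):
    print(h, difference_quotients(x0, t, h), slope_at_xt, slope_at_xt ** 2)
```

As h decreases, the first two quotients approach the slope at x_t and the third approaches its square: the speed of the curve, the rate of descent per unit distance, and the rate of descent per unit time are all governed by the slope.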
Proof. Step 1: Integrating (5.1.9) from 𝑡 to 𝑡 + ℎ and dividing by ℎ yields
\[
\begin{aligned}
\frac{1}{h} \int_0^h f(x_{t+s})\, \mathrm{d}s - f(y) &\le \frac{1}{2h} \left[ d(x_t, y)^2 - d(x_{t+h}, y)^2 \right] \\
&\le \left[ d(x_t, y) + d(x_{t+h}, y) \right] \frac{d(x_{t+h}, x_t)}{2h},
\end{aligned} \tag{5.1.19}
\]
for every 𝑦 ∈ dom 𝑓. Taking lim inf ℎ→0+ and recalling that 𝑡 ↦ 𝑓(𝑥𝑡 ) is lsc, we have
\[
f(x_t) - f(y) \le d(x_t, y) \liminf_{h \to 0+} \frac{d(x_{t+h}, x_t)}{h}, \tag{5.1.20}
\]
for every 𝑦 ∈ dom 𝑓. Taking supremum over 𝑦 ∈ dom 𝑓 gives
\[
\partial f(x_t) \le \liminf_{h \to 0+} \frac{d(x_{t+h}, x_t)}{h},
\]
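by Lemma 5.1.2. On the other hand if we put 𝑦 := 𝑥𝑡 in (5.1.19) and rescale the integrand, then
\[
\begin{aligned}
\frac{d(x_{t+h}, x_t)^2}{2h^2} &\le \frac{1}{h} \int_0^1 \left[ f(x_t) - f(x_{t+hs}) \right] \mathrm{d}s \\
&\le \partial f(x_t) \int_0^1 \frac{d(x_{t+hs}, x_t)}{hs}\, s\, \mathrm{d}s,
\end{aligned}
\]
where we again used Lemma 5.1.2. Taking lim supℎ→0+ yields
\[
\begin{aligned}
\limsup_{h \to 0+} \frac{d(x_{t+h}, x_t)^2}{2h^2} &\le \partial f(x_t) \int_0^1 \limsup_{h \to 0+} \frac{d(x_{t+hs}, x_t)}{hs}\, s\, \mathrm{d}s \\
&\le \partial f(x_t) \int_0^1 \limsup_{h \to 0+} \frac{d(x_{t+h}, x_t)}{h}\, s\, \mathrm{d}s \\
&\le \frac{1}{2}\, \partial f(x_t) \limsup_{h \to 0+} \frac{d(x_{t+h}, x_t)}{h}.
\end{aligned}
\]
We hence get
\[
\limsup_{h \to 0+} \frac{d(x_{t+h}, x_t)}{h} \le \partial f(x_t).
\]
The one-sided limit
\[
\lim_{h \to 0+} \frac{d(x_{t+h}, x_t)}{h}
\]
therefore exists and is equal to 𝜕𝑓(𝑥𝑡 ), which proves (5.1.16).

Step 2: We now prove (5.1.17). The inequality
\[
\limsup_{h \to 0+} \frac{f(x_t) - f(x_{t+h})}{d(x_t, x_{t+h})} \le \partial f(x_t)
\]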
follows immediately from the definition of the slope. To show the converse, choose 𝑥 ∈ dom 𝑓 such that 𝑓(𝑥𝑡 ) > 𝑓(𝑥) and choose 𝜀1 > 0 such that 3𝜀1 < 𝑑(𝑥, 𝑥𝑡 ). Then consider ℎ ∈ (0, ∞) such that
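\[
d\left(x_t, J_{h/n}^{(n)} x_t\right) < \varepsilon_1,
\]
for every 𝑛 ∈ ℕ, which for a fixed 𝑛 ∈ ℕ and 𝑘 = 1, . . . , 𝑛 implies that the number
\[
\lambda_k := \frac{d\left(J_{h/n}^{(k)} x_t, J_{h/n}^{(k-1)} x_t\right)}{d\left(x, J_{h/n}^{(k-1)} x_t\right)}
\]
satisfies $\lambda_k \in [0, 1)$. Denote
\[
y_k := (1 - \lambda_k)\, J_{h/n}^{(k-1)} x_t + \lambda_k x,
\]
and observe
\[
d\left(y_k, J_{h/n}^{(k-1)} x_t\right) = d\left(J_{h/n}^{(k)} x_t, J_{h/n}^{(k-1)} x_t\right),
\]
which further gives $f(J_{h/n}^{(k)} x_t) \le f(y_k)$, for 𝑘 = 1, . . . , 𝑛. Since 𝑓 is convex, we have
\[
f\left(J_{h/n}^{(k)} x_t\right) \le (1 - \lambda_k)\, f\left(J_{h/n}^{(k-1)} x_t\right) + \lambda_k f(x),
\]
and, after the 𝑛th iteration, we arrive at
\[
f\left(J_{h/n}^{(n)} x_t\right) \le \left[ \prod_{k=1}^{n} (1 - \lambda_k) \right] f(x_t) + \left[ 1 - \prod_{k=1}^{n} (1 - \lambda_k) \right] f(x).
\]
We already know that $d(J_{h/n}^{(k-1)} x_t, x) \le d(x_t, x) + \varepsilon_1$ and compute
\[
\begin{aligned}
\log \left[ \prod_{k=1}^{n} (1 - \lambda_k) \right] &= \sum_{k=1}^{n} \log(1 - \lambda_k) \\
&\le -\sum_{k=1}^{n} \lambda_k \\
&\le -\sum_{k=1}^{n} \frac{d\left(J_{h/n}^{(k)} x_t, J_{h/n}^{(k-1)} x_t\right)}{d(x_t, x) + \varepsilon_1} \\
&\le -\frac{d\left(J_{h/n}^{(n)} x_t, x_t\right)}{d(x_t, x) + \varepsilon_1}.
\end{aligned}
\]
Then the previous inequality reads
\[
f\left(J_{h/n}^{(n)} x_t\right) \le f(x) + \exp\left( -\frac{d\left(J_{h/n}^{(n)} x_t, x_t\right)}{d(x_t, x) + \varepsilon_1} \right) \left[ f(x_t) - f(x) \right],
\]
and after letting 𝑛 → ∞,
\[
f(x_{t+h}) \le f(x) + \exp\left( -\frac{d(x_{t+h}, x_t)}{d(x_t, x) + \varepsilon_1} \right) \left[ f(x_t) - f(x) \right].
\]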
Observe that if 𝜀2 ∈ (0, 1), then exp(−𝛼) ≤ 1 − (1 − 𝜀2 )𝛼 for sufficiently small 𝛼 ∈ [0, 1). So we can further decrease ℎ to have
\[
\frac{d(x_{t+h}, x_t)}{d(x_t, x) + \varepsilon_1}
\]
sufficiently small and we then have
\[
(1 - \varepsilon_2)\, \frac{f(x_t) - f(x)}{d(x_t, x) + \varepsilon_1} \le \frac{f(x_t) - f(x_{t+h})}{d(x_t, x_{t+h})}.
\]
Letting ℎ → 0+ gives
\[
(1 - \varepsilon_2)\, \frac{f(x_t) - f(x)}{d(x_t, x) + \varepsilon_1} \le \liminf_{h \to 0+} \frac{f(x_t) - f(x_{t+h})}{d(x_t, x_{t+h})}.
\]
Now send 𝜀1 and 𝜀2 to 0 and take lim sup𝑥→𝑥𝑡 to obtain
\[
\partial f(x_t) = \limsup_{x \to x_t} \frac{f(x_t) - f(x)}{d(x_t, x)} \le \liminf_{h \to 0+} \frac{f(x_t) - f(x_{t+h})}{d(x_t, x_{t+h})},
\]
which finishes the proof of (5.1.17).

Step 3: We easily get (5.1.18) from (5.1.16) and (5.1.17) since
\[
\frac{f(x_t) - f(x_{t+h})}{h} = \frac{f(x_t) - f(x_{t+h})}{d(x_t, x_{t+h})} \cdot \frac{d(x_t, x_{t+h})}{h}.
\]
The proof is complete.

A slope has a similar decay as 𝑓(𝑥𝑡 ) in Proposition 5.1.12.

Proposition 5.1.14. Let 𝑓 : H → (−∞, ∞] be a convex lsc function and 𝑥0 ∈ dom 𝑓. Put 𝑥𝑡 := 𝑆𝑡 𝑥0 . Then the function 𝑡 ↦ 𝜕𝑓(𝑥𝑡 ) is nonincreasing and
\[
\partial f(x_t)^2 \le \partial f(y)^2 + \frac{1}{t^2} d(y, x_0)^2,
\]
for every 𝑦 ∈ H.

Proof. By (5.1.3) we have
\[
\partial f\left(J_{t/n}^{(n)} x\right) \le \partial f\left(J_{t/n} x\right) \le \partial f(x),
\]
and since 𝜕𝑓 is lsc (Lemma 5.1.2), we obtain 𝜕𝑓(𝑥𝑡 ) ≤ 𝜕𝑓(𝑥). Monotonicity then follows by the semigroup property.
Multiply (5.1.18) by 𝑡 and integrate from 0 to 𝜏 and use the monotonicity of 𝜕𝑓 to get
\[
\begin{aligned}
\frac{\tau^2}{2}\, \partial f(x_\tau)^2 &\le \int_0^\tau t\, \partial f(x_t)^2\, \mathrm{d}t \\
&\le -\int_0^\tau t\, \frac{\mathrm{d}}{\mathrm{d}t} \left[ f(x_t) \right] \mathrm{d}t \\
&= \int_0^\tau f(x_t)\, \mathrm{d}t - \tau f(x_\tau),
\end{aligned}
\]
and by (5.1.15) we have, for each 𝑦 ∈ H,
\[
\begin{aligned}
\frac{\tau^2}{2}\, \partial f(x_\tau)^2 &\le \tau f(y) + \frac{1}{2} d(y, x_0)^2 - \tau f(x_\tau) - \frac{1}{2} d(y, x_\tau)^2 \\
&\le \tau\, \partial f(y)\, d(y, x_\tau) - \frac{1}{2} d(y, x_\tau)^2 + \frac{1}{2} d(y, x_0)^2 \\
&\le \frac{\tau^2}{2}\, \partial f(y)^2 + \frac{1}{2} d(y, x_0)^2.
\end{aligned}
\]
The proof is now complete.

It is obvious that 𝑥𝑡 ∈ dom 𝜕𝑓 for every 𝑡 ∈ (0, ∞). By the Lipschitz property of 𝑡 ↦ 𝑥𝑡 and (5.1.20) with 𝑦 := 𝑥𝑡+ℎ we get that the function 𝑡 ↦ 𝑓(𝑥𝑡 ) is Lipschitz on (𝑡0 , ∞) for an arbitrary 𝑡0 > 0. Furthermore, it is convex by (5.1.18).

In the end, we take a look at gradient flows of strongly convex lsc functions, which we have already met in Exercise 2.6.

Proposition 5.1.15. Let 𝑓 : H → (−∞, ∞] be a strongly convex function with parameter 𝜅. Then the corresponding gradient flow satisfies
\[
d(S_t x, S_t y) \le \exp(-t\kappa)\, d(x, y), \tag{5.1.21}
\]
for every 𝑥, 𝑦 ∈ dom 𝑓, and also a stronger version of the evolution variational inequality
\[
\frac{1}{2} \frac{\mathrm{d}}{\mathrm{d}t} d(y, x_t)^2 + \frac{\kappa}{2} d(y, x_t)^2 + f(x_t) \le f(y),
\]
for almost every 𝑡 ∈ (0, ∞) and every 𝑦 ∈ dom 𝑓.

Proof. See Exercise 5.4.

We know by Proposition 2.2.17 that a lsc strongly convex function 𝑓 : H → (−∞, ∞] has a unique minimizer 𝑧 ∈ H. By (5.1.21) we have the following error estimate for the flow
\[
d(S_t x, z) \le \exp(-t\kappa)\, d(x, z), \tag{5.1.22}
\]
for every 𝑥 ∈ dom 𝑓. Here again 𝜅 ∈ (0, ∞) is the parameter of the strong convexity of 𝑓.
Large time behavior of a gradient flow

Next we study what happens with a flow if 𝑡 → ∞. It is natural to expect that it converges to a minimizer of 𝑓 provided 𝑓 attains its minimum. It is indeed the case, but unfortunately, the flow (𝑆𝑡 𝑥0 ) converges only weakly to a minimizer; see Definition 3.1.9. Compare with the convergence of (𝐽𝜆 𝑥0 ) as 𝜆 → ∞ given in Theorem 2.2.25.

Theorem 5.1.16 (Gradient flow convergence). Let (H, 𝑑) be a Hadamard space and 𝑓 : H → (−∞, ∞] be a convex lsc function attaining its minimum. Then, given a starting point 𝑥0 ∈ dom 𝑓, the gradient flow 𝑥𝑡 := 𝑆𝑡 𝑥0 weakly converges to a minimizer of 𝑓 as 𝑡 → ∞.

Proof. Choose a sequence (𝑡𝑛 ) ⊂ [0, ∞) such that 𝑡𝑛 → ∞. The set of minimizers Min 𝑓 is nonempty by the assumptions and $(x_{t_n})_n$ is Fejér monotone with respect to Min 𝑓 since 𝑆𝑡 is nonexpansive. Furthermore, the sequence $(x_{t_n})_n$ is minimizing, that is,
\[
\lim_{n \to \infty} f(x_{t_n}) = \inf_{y \in \mathcal{H}} f(y),
\]
by Proposition 5.1.12. Since 𝑓 is weakly lsc (Lemma 3.2.3), it follows that all weak cluster points of $(x_{t_n})$ lie in Min 𝑓. Apply Proposition 3.2.6 (iii) to get that $(x_{t_n})$ weakly converges to a point 𝑧 ∈ Min 𝑓. If we now choose another sequence $(t'_n) \subset [0, \infty)$ such that $t'_n \to \infty$, we have to show that $x_{t'_n} \xrightarrow{w} z$. Since the sequence $(x_{t_n}) \cup (x_{t'_n})$ is also Fejér monotone with respect to Min 𝑓, we obtain that it weakly converges to 𝑧 using the same argument as above. Consequently, $x_{t'_n} \xrightarrow{w} z$. The proof is complete.

There exists a convex lsc function 𝑓 : ℓ2 → (−∞, ∞] attaining its minimum and 𝑥0 ∈ dom 𝑓 such that the corresponding gradient flow semigroup converges weakly to a minimizer, but fails to converge strongly [23]. If the function in question is strongly convex on (bounded subsets of) its domain, then the gradient flow converges strongly by (5.1.22). The reader is invited to give an elementary proof of this fact in Exercise 5.5. We also obtain strong convergence provided the underlying space is locally compact. Note that the proximal point algorithm is a discrete version of the gradient flow (see Section 6.3) and it is not surprising then that it has the same asymptotic behavior.

In the end, we give an application of gradient flows in Hadamard spaces in Kähler geometry. For the details, the reader is referred to [190, 191].
Example 5.1.17 (Calabi flow). Let $(M^{2n}, \omega)$ be a compact Kähler manifold. The set of Kähler potentials
\[
\mathcal{K} := \left\{ \varphi \in C^\infty(M, \mathbb{R}) : \omega_\varphi := \omega + \sqrt{-1}\, \partial\bar{\partial}\varphi > 0 \right\}
\]
is equipped with the $L^2$-metric as follows. Given $\varphi \in \mathcal{K}$, we can identify the tangent space $T_\varphi \mathcal{K}$ with $C^\infty(M, \mathbb{R})$ and put
\[
\langle \psi_1, \psi_2 \rangle := \int_M \psi_1 \psi_2\, \omega_\varphi^n,
\]
which gives a Riemannian metric on $\mathcal{K}$; see [70, 135, 183]. Since $\mathcal{K}$ is not complete with respect to this metric, we consider its completion $\overline{\mathcal{K}}$, which turns out to be a Hadamard space. Denote now by $s(\omega_\varphi)$ the scalar curvature of a metric $\omega_\varphi$ and put
\[
\bar{s} := \frac{1}{\operatorname{Vol}(M)} \int_M s(\omega_\varphi)\, \omega_\varphi^n,
\]
which is independent of 𝜑. A solution $\varphi : [0, \infty) \to \mathcal{K}$ to the Calabi equation
\[
\frac{\partial}{\partial t} \varphi(t) = s\left(\omega_{\varphi(t)}\right) - \bar{s},
\]
is called a Calabi flow and it can be equivalently defined as the gradient flow of the Mabuchi functional
\[
f(\varphi) := -\int_0^1 \int_M \frac{\partial \varphi(t)}{\partial t} \left[ s\left(\omega_{\varphi(t)}\right) - \bar{s} \right] \omega_{\varphi(t)}^n \wedge \mathrm{d}t,
\]
where $\varphi(\cdot) : [0, 1] \to \mathcal{K}$ is a curve with 𝜑(0) = 0 and 𝜑(1) = 𝜑. The Mabuchi functional $f : \mathcal{K} \to \mathbb{R}$ is convex and lsc and we can extend it onto $\overline{\mathcal{K}}$ by
\[
\bar{f}(\varphi) := \begin{cases} f(\varphi), & \text{whenever } \varphi \in \mathcal{K}, \\ \liminf_{\mathcal{K} \ni \varphi_n \to \varphi} f(\varphi_n), & \text{elsewhere.} \end{cases}
\]
Then $\bar{f}$, defined on $\overline{\mathcal{K}}$, is also convex and lsc. Recall that a conjecture of S. Donaldson [71] states that if $(M^{2n}, \omega)$ is a compact Kähler manifold of constant scalar curvature, then the Calabi flow with any initial condition exists for all times and converges to the constant scalar curvature metric. While this conjecture remains open, we can apply Theorem 5.1.16 to conclude that the gradient flow of the functional $\bar{f}$ with any initial condition exists for all times and weakly converges to a minimizer $\varphi \in \overline{\mathcal{K}}$ of $\bar{f}$. A next step towards the full Donaldson conjecture is to show that the minimizer of the Mabuchi energy remains unique in the completion.
5.2 Mosco convergence and its consequences

In this section we consider a sequence of convex lsc functions 𝑓𝑛 : H → (−∞, ∞], where 𝑛 ∈ ℕ, instead of a single function. Let us first establish the notation.

Notation 5.2.1. Given a sequence of convex lsc functions (𝑓𝑛 ), the resolvent of the function 𝑓𝑛 , where 𝑛 ∈ ℕ, will be denoted $J_\lambda^n$, and its semigroup $S_t^n$.

We study a situation when the sequence (𝑓𝑛 ) converges (in some sense) to a limit function 𝑓 : H → (−∞, ∞], whose resolvent and semigroup are denoted 𝐽𝜆 and 𝑆𝑡 , respectively. We establish a relationship among the sequences (𝑓𝑛 ), $(J_\lambda^n)$ and $(S_t^n)$. Namely, we show that the Mosco convergence of functions implies the convergence of the resolvents, and that the convergence of the resolvents implies the convergence of the semigroups.

Definition 5.2.2 (𝛤-convergence). A sequence (𝑓𝑛 ) of functions 𝑓𝑛 : H → (−∞, ∞] is said to 𝛤-converge to a function 𝑓 : H → (−∞, ∞] if, for each 𝑥 ∈ H, we have:
(𝛤1) 𝑓(𝑥) ≤ lim inf 𝑛→∞ 𝑓𝑛 (𝑥𝑛 ), whenever 𝑥𝑛 → 𝑥, and
(𝛤2) there exists (𝑦𝑛 ) ⊂ H such that 𝑦𝑛 → 𝑥 and 𝑓𝑛 (𝑦𝑛) → 𝑓(𝑥).

Like in Hilbert spaces, 𝛤-convergence preserves convexity and the limit function is always lsc; see Exercises 5.6 and 5.7. We will however use a stronger type of convergence, called the Mosco convergence.

Definition 5.2.3 (Mosco convergence). The sequence (𝑓𝑛 ) converges to 𝑓 in the sense of Mosco if, for each 𝑥 ∈ H, we have:
(M1) 𝑓(𝑥) ≤ lim inf 𝑛→∞ 𝑓𝑛 (𝑥𝑛 ), whenever $x_n \xrightarrow{w} x$ (weak convergence), and
(M2) there exists (𝑦𝑛 ) ⊂ H such that 𝑦𝑛 → 𝑥 and 𝑓𝑛 (𝑦𝑛 ) → 𝑓(𝑥).

The advantage of the Mosco convergence is that it implies the convergence of the Moreau–Yosida envelopes and resolvents. As a matter of fact, these conditions are equivalent in Hilbert spaces [12, Theorem 3.26]. It is not known whether the same holds in Hadamard spaces; see Question 5.2.5 below.

Theorem 5.2.4. Let (H, 𝑑) be a Hadamard space and 𝑓𝑛 : H → (−∞, ∞] be convex lsc functions, for each 𝑛 ∈ ℕ. If 𝑓𝑛 → 𝑓 in the sense of Mosco, as 𝑛 → ∞, then
\[
\lim_{n \to \infty} f_\lambda^n(x) = f_\lambda(x), \tag{5.2.23}
\]
and,
\[
\lim_{n \to \infty} J_\lambda^n x = J_\lambda x, \tag{5.2.24}
\]
for every 𝜆 ∈ (0, ∞) and 𝑥 ∈ H. Here $J_\lambda^n$ stands for the resolvent of 𝑓𝑛 and 𝐽𝜆 for the resolvent of 𝑓.
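A simple instance of Theorem 5.2.4 can be computed in closed form. The sketch below assumes H = ℝ and the sequence f_n(x) = (1 + 1/n)·x²/2, which converges to f(x) = x²/2 in the sense of Mosco (on the real line weak and strong convergence coincide and the convergence of f_n is locally uniform). For a quadratic c·y²/2 the resolvent is x/(1 + cλ). All names and parameter values are illustrative.

```python
def resolvent(x, lam, c):
    """Resolvent J_lam x of the function y -> c * y^2 / 2 on the real line."""
    return x / (1.0 + c * lam)

def envelope(x, lam, c):
    """Moreau-Yosida envelope of y -> c * y^2 / 2, evaluated via the resolvent."""
    j = resolvent(x, lam, c)
    return 0.5 * c * j * j + (x - j) ** 2 / (2.0 * lam)

def gaps(x, lam, n):
    """Distances |J^n_lam x - J_lam x| and |f^n_lam(x) - f_lam(x)| for c_n = 1 + 1/n."""
    c_n = 1.0 + 1.0 / n
    return (
        abs(resolvent(x, lam, c_n) - resolvent(x, lam, 1.0)),
        abs(envelope(x, lam, c_n) - envelope(x, lam, 1.0)),
    )

for n in (1, 10, 100, 1000):
    print(n, gaps(3.0, 0.5, n))
```

Both gaps shrink roughly like 1/n, illustrating (5.2.23) and (5.2.24) for this particular sequence.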
Proof. Step 1: We will first show that the sequence $(J_\lambda^n x)_n$ is bounded. To do so, we need the following claim, which extends Lemma 2.2.13.

Claim: Given 𝑥0 ∈ H, there exist 𝛼, 𝛽 ∈ (0, ∞) such that
\[
f_n(y) \ge -\alpha\, d(y, x_0) - \beta, \tag{5.2.25}
\]
for every 𝑦 ∈ H and 𝑛 ∈ ℕ. Indeed, assume that this is not the case, that is, for each 𝑘 ∈ ℕ, there exist 𝑛𝑘 ∈ ℕ and 𝑥𝑘 ∈ H such that
\[
f_{n_k}(x_k) + k \left[ d(x_k, x_0) + 1 \right] < 0.
\]
Without loss of generality we may assume 𝑛𝑘 → ∞ as 𝑘 → ∞, for otherwise a contradiction follows already from Lemma 2.2.13. If (𝑥𝑘 ) were bounded, then there would exist $\hat{x} \in \mathcal{H}$ and a subsequence of (𝑥𝑘 ), still denoted (𝑥𝑘 ), such that $x_k \xrightarrow{w} \hat{x}$. By the Mosco convergence of (𝑓𝑛 ) we have
\[
f(\hat{x}) \le \liminf_{k \to \infty} f_{n_k}(x_k) \le -\limsup_{k \to \infty} k \left[ d(x_k, x_0) + 1 \right] \le -\infty,
\]
which is impossible. Assume therefore (𝑥𝑘 ) is unbounded. Choose 𝑦0 ∈ H and find 𝑦𝑘 → 𝑦0 such that $f_{n_k}(y_k) \to f(y_0)$. Put
\[
z_k := (1 - t_k)\, y_k + t_k x_k, \qquad \text{with } t_k := \frac{1}{\sqrt{k}\, d(x_k, y_k)}.
\]
Then 𝑧𝑘 → 𝑦0 . By convexity,
\[
\begin{aligned}
f_{n_k}(z_k) &\le (1 - t_k)\, f_{n_k}(y_k) + t_k f_{n_k}(x_k) \\
&\le (1 - t_k)\, f_{n_k}(y_k) - t_k k \left[ d(x_k, x_0) + 1 \right] \\
&\le (1 - t_k)\, f_{n_k}(y_k) - \sqrt{k}\, \frac{d(x_k, x_0) + 1}{d(x_k, y_k)}.
\end{aligned}
\]
Hence,
\[
f(y_0) \le \liminf_{k \to \infty} f_{n_k}(z_k) \le -\infty,
\]
which is not possible, either. This proves the claim.

Step 2: For each 𝑛 ∈ ℕ we hence have
\[
f_n(J_\lambda^n x) \ge -\alpha\, d(J_\lambda^n x, x_0) - \beta.
\]
Choose a sequence (𝑢𝑛 ) ⊂ H such that 𝑢𝑛 → 𝑥0 and 𝑓𝑛 (𝑢𝑛 ) → 𝑓(𝑥0 ). From the definition of $J_\lambda^n x$, we have
\[
f_n(u_n) + \frac{1}{2\lambda} d(x, u_n)^2 \ge f_n(J_\lambda^n x) + \frac{1}{2\lambda} d(x, J_\lambda^n x)^2,
\]
and furthermore,
\[
f_n(u_n) + \alpha\, d(J_\lambda^n x, x_0) + \beta + \frac{1}{2\lambda} d(x, u_n)^2 \ge \frac{1}{2\lambda} d(x, J_\lambda^n x)^2,
\]
which implies that the sequence $(J_\lambda^n x)_n$ is bounded.
Let 𝑐 ∈ H be a weak cluster point of $(J_\lambda^n x)_n$. Its existence is guaranteed by boundedness of the sequence. Since 𝑓𝑛 → 𝑓 in the sense of Mosco, there exists a sequence (𝑦𝑛 ) ⊂ H such that 𝑦𝑛 → 𝐽𝜆 𝑥 and 𝑓𝑛 (𝑦𝑛) → 𝑓(𝐽𝜆 𝑥). Then
\[
\begin{aligned}
\limsup_{n \to \infty} f_\lambda^n(x) &\le \limsup_{n \to \infty} \left[ f_n(y_n) + \frac{1}{2\lambda} d(x, y_n)^2 \right] \\
&= f(J_\lambda x) + \frac{1}{2\lambda} d(x, J_\lambda x)^2 \\
&\le f(c) + \frac{1}{2\lambda} d(x, c)^2 \\
&\le \liminf_{n \to \infty} \left[ f_n(J_\lambda^n x) + \frac{1}{2\lambda} d(x, J_\lambda^n x)^2 \right],
\end{aligned} \tag{5.2.26}
\]
which gives 𝐽𝜆 𝑥 = 𝑐, by the uniqueness of 𝐽𝜆 𝑥. Hence, since 𝑐 was arbitrary, the whole sequence $(J_\lambda^n x)_n$ weakly converges to 𝐽𝜆 𝑥. Furthermore,
\[
\begin{aligned}
\limsup_{n \to \infty} \frac{1}{2\lambda} d(x, J_\lambda^n x)^2 &\le \limsup_{n \to \infty} \left( -f_n(J_\lambda^n x) \right) + \limsup_{n \to \infty} f_n(y_n) + \limsup_{n \to \infty} \frac{1}{2\lambda} d(x, y_n)^2 \\
&\le -\liminf_{n \to \infty} f_n(J_\lambda^n x) + f(J_\lambda x) + \frac{1}{2\lambda} d(x, J_\lambda x)^2 \\
&\le \frac{1}{2\lambda} d(x, J_\lambda x)^2 \\
&\le \liminf_{n \to \infty} \frac{1}{2\lambda} d(x, J_\lambda^n x)^2.
\end{aligned} \tag{5.2.27}
\]
Proposition 3.1.6 and (5.2.27) give together the strong convergence
\[
J_\lambda^n x \to J_\lambda x, \qquad \text{as } n \to \infty,
\]
which proves (5.2.24). Finally, inequality (5.2.26) immediately gives (5.2.23) and the proof is complete.

As we have already mentioned, each of (5.2.23) and (5.2.24) in Theorem 5.2.4 in turn implies the Mosco convergence of (𝑓𝑛 ) as long as the underlying space is Hilbert. We raise the question whether the same is true in a general Hadamard space.

Question 5.2.5. Do (5.2.23) or (5.2.24) in Theorem 5.2.4 in turn imply 𝑓𝑛 → 𝑓 in the sense of Mosco?

Definition 5.2.6 (Mosco convergence of sets). A sequence of closed convex sets 𝐶𝑛 ⊂ H is said to Mosco converge to a set 𝐶 ⊂ H if the indicator functions 𝜄𝐶𝑛 converge in the sense of Mosco to the indicator function 𝜄𝐶 .

Example 5.2.7 (Monotone sequences of sets). Let (𝐶𝑛 ) be a sequence of convex closed subsets of a Hadamard space. If (𝐶𝑛 ) is nonincreasing, then it converges in
the sense of Mosco to its intersection. Likewise, if (𝐶𝑛 ) is nondecreasing, it converges in the sense of Mosco to the closure of its union. These facts are rather straightforward to prove; see [148, Lemma 1.2, Lemma 1.3] for the linear case.

Corollary 5.2.8. Let (H, 𝑑) be a Hadamard space. Assume 𝐶𝑛 ⊂ H are convex closed sets for each 𝑛 ∈ ℕ. If the sequence (𝐶𝑛 ) converges to a set 𝐶 ⊂ H in the sense of Mosco, then the set 𝐶 is closed convex and:
(i) 𝑑(𝑥, 𝐶𝑛) → 𝑑(𝑥, 𝐶),
(ii) 𝑃𝐶𝑛 𝑥 → 𝑃𝐶 𝑥,
for each 𝑥 ∈ H.

Proof. The Mosco convergence of functions preserves convexity and the limit function is always lsc. This gives the first statement. The rest follows immediately from the fact that the Moreau–Yosida envelope (with 𝜆 := 1/2) of the indicator function is the distance function squared and the resolvent of the indicator function is the nearest point mapping.

The convergence in Corollary 5.2.8 (i) is called Frolík–Wijsman. Hence the Mosco convergence of convex closed sets implies the Frolík–Wijsman convergence. We next show that the convergence of the resolvents implies the convergence of the semigroups.

Theorem 5.2.9. Let (H, 𝑑) be a Hadamard space. Assume 𝑓 : H → (−∞, ∞] and 𝑓𝑛 : H → (−∞, ∞], for 𝑛 ∈ ℕ, are convex lsc functions. We use Notation 5.2.1. Assume that for every 𝑥 ∈ dom 𝑓 and 𝜆 ∈ (0, ∞) we have
\[
\lim_{n \to \infty} J_\lambda^n x = J_\lambda x. \tag{5.2.28}
\]
Then
\[
\lim_{n \to \infty} S_t^n x = S_t x, \tag{5.2.29}
\]
for every 𝑥 ∈ dom 𝑓 and 𝑡 ∈ (0, ∞).

Proof. Step 1: Assume first 𝑥 ∈ dom 𝜕𝑓. We then have
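\[
\begin{aligned}
d(S_t^n x, S_t x) &\le d\left(S_t^n x, S_t^n (J_\lambda^n x)\right) + d\left(S_t^n (J_\lambda^n x), S_t x\right) \\
&\le d(x, J_\lambda^n x) + d\left(S_t^n (J_\lambda^n x), S_t x\right),
\end{aligned}
\]
for every 𝜆 ∈ (0, ∞) and 𝑛 ∈ ℕ. The second term on the right-hand side can be further estimated
\[
\begin{aligned}
d\left(S_t^n (J_\lambda^n x), S_t x\right) \le{} & d\left(S_t^n (J_\lambda^n x), (J_{t/k}^n)^{(k)} J_\lambda^n x\right) + d\left((J_{t/k}^n)^{(k)} J_\lambda^n x, (J_{t/k}^n)^{(k)} x\right) \\
& + d\left((J_{t/k}^n)^{(k)} x, (J_{t/k})^{(k)} x\right) + d\left((J_{t/k})^{(k)} x, S_t x\right).
\end{aligned}
\]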
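By (5.1.5) and (5.1.3),
\[
d\left(S_t^n (J_\lambda^n x), (J_{t/k}^n)^{(k)} J_\lambda^n x\right) \le \frac{t}{\sqrt{2}\, k}\, \partial f_n(J_\lambda^n x) \le \frac{t}{\sqrt{2}\, k}\, \frac{d(J_\lambda^n x, x)}{\lambda}.
\]
Furthermore, inequality (5.1.5) also reads
\[
d\left((J_{t/k})^{(k)} x, S_t x\right) \le \frac{t}{\sqrt{2}\, k}\, \partial f(x).
\]
Altogether we obtain,
\[
d(S_t^n x, S_t x) \le 2 d(x, J_\lambda^n x) + \frac{t}{\sqrt{2}\, k}\, \frac{d(J_\lambda^n x, x)}{\lambda} + d\left((J_{t/k}^n)^{(k)} x, (J_{t/k})^{(k)} x\right) + \frac{t}{\sqrt{2}\, k}\, \partial f(x).
\]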
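Step 2: Now fix 𝜀 > 0 and choose $\lambda_0 \in (0, 1)$ so that $\sqrt{\lambda_0}\,|\partial f|(x) < \varepsilon$. By the assumption (5.2.28) and by inequality (5.1.3), we have
\[ \lim_{n\to\infty} \frac{d(x, J^n_\lambda x)}{\sqrt{\lambda}} = \frac{d(x, J_\lambda x)}{\sqrt{\lambda}} \le |\partial f|(x), \]
for each 𝜆 ∈ (0, ∞). There is therefore $n_0 \in \mathbb{N}$ such that for every $n > n_0$ we have
\[ \frac{d(x, J^n_{\lambda_0} x)}{\sqrt{\lambda_0}} \le \frac{d(x, J_{\lambda_0} x)}{\sqrt{\lambda_0}} + \varepsilon, \]
and hence
\[ d(x, J^n_{\lambda_0} x) \le \sqrt{\lambda_0}\,|\partial f|(x) + \varepsilon\sqrt{\lambda_0} < 2\varepsilon. \]
Next choose $k_0 \in \mathbb{N}$ such that
\[ \frac{t}{\sqrt{2k_0}}\,|\partial f|(x) < \varepsilon, \]
and simultaneously,
\[ \frac{t}{\sqrt{2k_0}}\,\frac{d(x, J^n_{\lambda_0} x)}{\lambda_0} < \varepsilon, \]
for every $n > n_0$. Then we can find $n_1 > n_0$ so that for every $n > n_1$ we have
\[ d\bigl((J^n_{t/k_0})^{(k_0)} x, (J_{t/k_0})^{(k_0)} x\bigr) < \varepsilon. \]
Altogether we have
\[ d(S^n_t x, S_t x) < 7\varepsilon, \]
for each $n > n_1$.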
Step 3: Let finally 𝑥 ∈ dom 𝑓. Since dom 𝜕𝑓 is dense in dom 𝑓 by Lemma 5.1.3, there exists, for each 𝛿 > 0, a point 𝑦 ∈ dom 𝜕𝑓 such that 𝑑(𝑥, 𝑦) < 𝛿. Then
\[ d(S^n_t x, S_t x) \le d(S^n_t x, S^n_t y) + d(S^n_t y, S_t y) + d(S_t y, S_t x) < 2\delta + d(S^n_t y, S_t y), \]
which finishes the proof.

Question 5.2.10. Is the convergence in (5.2.29) uniform with respect to 𝑡 on each bounded time interval?
5.3 Lie–Trotter–Kato formula

Let (H, 𝑑) be a Hadamard space. In this section we consider a function 𝑓 : H → (−∞, ∞] of the form
\[ f := \sum_{n=1}^{N} f_n, \tag{5.3.30} \]
where 𝑓𝑛 : H → (−∞, ∞] are convex lsc functions, 𝑛 = 1, . . . , 𝑁 and 𝑁 ∈ ℕ. The Lie–Trotter–Kato formula tells us that the semigroup of 𝑓 can be approximated by the resolvents of the components 𝑓𝑛. Functions of the type (5.3.30) appeared for instance in Examples 2.2.5 and 2.2.6 and will be revisited in (6.3.15) and (8.3.2). Before stating the main result in Theorem 5.3.7, we need several auxiliary results. Define first a resolvent of a family of nonexpansive mappings, extending Definition 4.2.1.

Definition 5.3.1. Let (H, 𝑑) be a Hadamard space and 𝐹𝜌 : H → H be a nonexpansive map for each 𝜌 ∈ (0, ∞). Given 𝜆 ∈ (0, ∞) and 𝑥 ∈ H, the map
\[ y \mapsto \frac{1}{1+\frac{\lambda}{\rho}}\,x + \frac{\frac{\lambda}{\rho}}{1+\frac{\lambda}{\rho}}\,F_\rho y, \qquad y \in \mathcal{H}, \tag{5.3.31} \]
is a contraction with Lipschitz constant $\frac{\lambda/\rho}{1+(\lambda/\rho)}$ and hence has a unique fixed point, which will be denoted $R_{\lambda,\rho}(x)$. The mapping $x \mapsto R_{\lambda,\rho}(x)$ is called the resolvent of the family (𝐹𝜌).

Definition 5.3.2. Let (H, 𝑑) be a Hadamard space and 𝐹𝜌 : H → H be a nonexpansive map for each 𝜌 ∈ (0, ∞). The semigroup of the family (𝐹𝜌) is the mapping
\[ T^\rho_t x := \lim_{j\to\infty} R^{(j)}_{\frac{t}{j},\rho}\,x, \qquad x \in \mathcal{H}, \]
where 𝑡 ∈ [0, ∞).
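Definition 5.3.1 is constructive: the fixed point of a contraction can be approximated by Banach iteration. The following is a minimal sketch of this in the simplest Hadamard space, the real line, where the geodesic combination in (5.3.31) is an ordinary weighted average. The function name `resolvent`, the tolerance, and the choice 𝐹 = cos (nonexpansive on ℝ because |sin| ≤ 1) are illustrative assumptions, not part of the text.

```python
import math

def resolvent(F, x, lam, rho, tol=1e-12, max_iter=10_000):
    """Approximate R_{lam,rho}(x), the fixed point of the contraction
    y -> (x + (lam/rho) * F(y)) / (1 + lam/rho), i.e. (5.3.31) on the real line."""
    c = lam / rho                      # only the ratio lam/rho enters the map
    y = x
    for _ in range(max_iter):
        y_next = (x + c * F(y)) / (1.0 + c)
        if abs(y_next - y) <= tol:     # Banach iteration has (numerically) converged
            return y_next
        y = y_next
    return y

F = math.cos                           # assumed example of a nonexpansive map on R
r = resolvent(F, x=2.0, lam=4.0, rho=1.0)
# How far r is from being a fixed point of the map in (5.3.31):
residual = abs(r - (2.0 + 4.0 * F(r)) / 5.0)
```

Since only the ratio 𝜆/𝜌 enters the map, `resolvent(F, x, 8.0, 2.0)` returns the same value as `resolvent(F, x, 4.0, 1.0)`.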
If we replace 𝐹 by 𝐹𝜌 in Theorem 4.3.3 and its proof, we can see that $(T^\rho_t)$ is a well-defined strongly continuous semigroup of nonexpansive mappings and satisfies
\[ d(T^\rho_t x, T^\rho_s x) \le 2\,\frac{d(x, F_\rho x)}{\rho}\,|t - s|, \qquad x \in \mathcal{H}, \]
for every 𝑠, 𝑡 ∈ [0, ∞), and
\[ d\bigl(T^\rho_t x, R^{(j)}_{\frac{t}{j},\rho}\,x\bigr) \le \frac{2t}{\rho\sqrt{j}}\,d(x, F_\rho x), \qquad x \in \mathcal{H}, \tag{5.3.32} \]
for every 𝑡 ∈ [0, ∞) and 𝑗 ∈ ℕ.

Lemma 5.3.3. Let 𝐹𝜌 : H → H be a nonexpansive map for each 𝜌 ∈ (0, ∞). The semigroup of the family (𝐹𝜌) satisfies
\[ d\bigl(T^\rho_t x, F^{(k)}_\rho x\bigr) \le 2\,d(x, F_\rho x)\sqrt{\Bigl(k - \frac{t}{\rho}\Bigr)^{2} + \frac{t}{\rho}}, \qquad x \in \mathcal{H}, \]
for every 𝑡 ∈ [0, ∞) and 𝑘 ∈ ℕ0.

Proof. Lemma 4.3.5 gives the estimate
\[ d\bigl(T^1_t x, F^{(k)}_\rho x\bigr) \le 2\,d(x, F_\rho x)\sqrt{(k - t)^{2} + t}, \qquad x \in \mathcal{H}. \]
Next observe that $T^\rho_t = T^1_{t/\rho}$, because $R_{\lambda,\rho} = R_{\lambda/\rho,1}$. Combining these two facts finishes the proof.

It is useful to set up some notation.

Notation 5.3.4. Let 𝑓 : H → (−∞, ∞] be a function of the form (5.3.30). Denote its resolvent $J_\lambda$ and semigroup $(S_t)$, respectively. The resolvent of the function 𝑓𝑛 is denoted $J^n_\lambda$, for each 𝑛 = 1, . . . , 𝑁. At the same time, there will be a family of nonexpansive maps 𝐹𝜌 : H → H and we will use its resolvents $R_{\lambda,\rho}$ and semigroup $(T^\rho_t)$ defined in Definitions 5.3.1 and 5.3.2, respectively.

The proof of the Lie–Trotter–Kato formula relies upon the next two approximation results (Theorems 5.3.5 and 5.3.6) which may also be of independent interest. It would be more economical to state these two theorems as a single result, but we choose not to do so for the sake of clarity.

Theorem 5.3.5. Let (H, 𝑑) be a Hadamard space and 𝑓 : H → (−∞, ∞] be of the form (5.3.30). We use Notation 5.3.4 and put $F_\rho := J^N_\rho \circ \cdots \circ J^1_\rho$. If we have
\[ J_\lambda x = \lim_{\rho\to 0+} R_{\lambda,\rho}\,x, \]
for every 𝑥 ∈ dom 𝑓 and 𝜆 ∈ [0, ∞), then
\[ S_t x = \lim_{\rho\to 0+} T^\rho_t x, \tag{5.3.33} \]
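for every 𝑥 ∈ dom 𝑓 and 𝑡 ∈ [0, ∞).

Proof. Step 1: Let us first assume that 𝑥 ∈ dom 𝜕𝑓. By the triangle inequality,
\begin{align*}
d(T^\rho_t x, S_t x) &\le d(T^\rho_t x, T^\rho_t R_{\lambda,\rho} x) + d(T^\rho_t R_{\lambda,\rho} x, S_t x) \\
&\le d(x, R_{\lambda,\rho} x) + d(T^\rho_t R_{\lambda,\rho} x, S_t x), \tag{5.3.34}
\end{align*}
and the second term on the right-hand side can be further estimated
\begin{align*}
d(T^\rho_t R_{\lambda,\rho} x, S_t x) &\le d\bigl(T^\rho_t R_{\lambda,\rho} x, R^{(j)}_{\frac{t}{j},\rho} R_{\lambda,\rho} x\bigr) + d\bigl(R^{(j)}_{\frac{t}{j},\rho} R_{\lambda,\rho} x, R^{(j)}_{\frac{t}{j},\rho} x\bigr) \\
&\quad + d\bigl(R^{(j)}_{\frac{t}{j},\rho} x, J^{(j)}_{\frac{t}{j}} x\bigr) + d\bigl(J^{(j)}_{\frac{t}{j}} x, S_t x\bigr).
\end{align*}
The error estimate in (5.3.32) gives
\[ d\bigl(T^\rho_t R_{\lambda,\rho} x, R^{(j)}_{\frac{t}{j},\rho} R_{\lambda,\rho} x\bigr) \le \frac{2t}{\rho\sqrt{j}}\,d(R_{\lambda,\rho} x, F_\rho R_{\lambda,\rho} x) = \frac{2t}{\lambda\sqrt{j}}\,d(x, R_{\lambda,\rho} x), \]
and applying also (5.1.5), inequality (5.3.34) becomes
\begin{align*}
d(T^\rho_t x, S_t x) &\le 2 d(x, R_{\lambda,\rho} x) + \frac{2t}{\lambda\sqrt{j}}\,d(x, R_{\lambda,\rho} x) \\
&\quad + d\bigl(R^{(j)}_{\frac{t}{j},\rho} x, J^{(j)}_{\frac{t}{j}} x\bigr) + \frac{t}{\sqrt{2j}}\,|\partial f|(x), \tag{5.3.35}
\end{align*}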
for each 𝑗 ∈ ℕ. At the next step, we will show that all the terms on the right-hand side tend to 0 as 𝜌 → 0.

Step 2: Choose 𝜀 > 0 and find $\lambda_0 \in (0, 1)$ such that $\lambda_0\,|\partial f|(x) < \varepsilon$. By the assumptions and (5.1.3), we have
\[ \lim_{\rho\to 0+} d(x, R_{\lambda,\rho} x) = d(x, J_\lambda x) \le \lambda\,|\partial f|(x), \]
for each 𝜆 > 0, and hence there exists $\rho_0 > 0$ such that
\[ d(x, R_{\lambda_0,\rho} x) \le d(x, J_{\lambda_0} x) + \varepsilon \le \lambda_0\,|\partial f|(x) + \varepsilon < 2\varepsilon, \]
for every $\rho \in (0, \rho_0)$. Next find $j_0 \in \mathbb{N}$ such that
\[ \frac{t}{\sqrt{2j_0}}\,|\partial f|(x) < \varepsilon, \]
and simultaneously
\[ \frac{2t}{\lambda_0\sqrt{j_0}}\,d(x, R_{\lambda_0,\rho} x) < \varepsilon, \]
for every $\rho \in (0, \rho_0)$. Finally, by the assumptions we can find $\rho_1 \in (0, \rho_0)$ such that
\[ d\bigl(R^{(j_0)}_{\frac{t}{j_0},\rho} x, J^{(j_0)}_{\frac{t}{j_0}} x\bigr) < \varepsilon, \]
for every $\rho \in (0, \rho_1)$. This finishes the proof of (5.3.33) for 𝑥 ∈ dom 𝜕𝑓.

Step 3: Let now 𝑥 ∈ dom 𝑓 and 𝛿 > 0. By Lemma 5.1.3 there exists 𝑦 ∈ dom 𝜕𝑓 with 𝑑(𝑥, 𝑦) < 𝛿. Then
\begin{align*}
d(T^\rho_t x, S_t x) &\le d(T^\rho_t x, T^\rho_t y) + d(T^\rho_t y, S_t y) + d(S_t y, S_t x) \\
&\le 2 d(x, y) + d(T^\rho_t y, S_t y),
\end{align*}
which gives (5.3.33) for 𝑥 ∈ dom 𝑓.

Theorem 5.3.6. Let (H, 𝑑) be a Hadamard space and 𝑓 : H → (−∞, ∞] be of the form (5.3.30). We use Notation 5.3.4. Let $R_{\lambda,\rho}$ be the resolvent of the map $F_\rho := J^N_\rho \circ \cdots \circ J^1_\rho$. If we have
\[ J_\lambda x = \lim_{\rho\to 0+} R_{\lambda,\rho}\,x, \]
for every 𝑥 ∈ dom 𝑓 and 𝜆 ∈ [0, ∞), then
\[ S_t x = \lim_{j\to\infty} \bigl(J^N_{\frac{t}{j}} \circ \cdots \circ J^1_{\frac{t}{j}}\bigr)^{(j)} x, \]
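for every 𝑥 ∈ dom 𝑓 and 𝑡 ∈ [0, ∞).

Proof. Assume that 𝑥 ∈ dom 𝜕𝑓. We have
\[ d\bigl(S_t x, F^{(j)}_{\frac{t}{j}} x\bigr) \le d\bigl(S_t x, T^{t/j}_t x\bigr) + d\bigl(T^{t/j}_t x, F^{(j)}_{\frac{t}{j}} x\bigr), \]
for 𝑗 ∈ ℕ, and the first term on the right-hand side converges to 0 as 𝑗 → ∞ by Theorem 5.3.5. The second term can be estimated
\begin{align*}
d\bigl(T^{t/j}_t x, F^{(j)}_{\frac{t}{j}} x\bigr) &\le d\bigl(T^{t/j}_t x, T^{t/j}_t R_{\lambda,\frac{t}{j}} x\bigr) + d\bigl(T^{t/j}_t R_{\lambda,\frac{t}{j}} x, F^{(j)}_{\frac{t}{j}} R_{\lambda,\frac{t}{j}} x\bigr) + d\bigl(F^{(j)}_{\frac{t}{j}} R_{\lambda,\frac{t}{j}} x, F^{(j)}_{\frac{t}{j}} x\bigr) \\
&\le 2 d\bigl(x, R_{\lambda,\frac{t}{j}} x\bigr) + d\bigl(T^{t/j}_t R_{\lambda,\frac{t}{j}} x, F^{(j)}_{\frac{t}{j}} R_{\lambda,\frac{t}{j}} x\bigr),
\end{align*}
for 𝑗 ∈ ℕ and 𝜆 > 0. By Lemma 5.3.3 we obtain
\[ d\bigl(T^{t/j}_t R_{\lambda,\frac{t}{j}} x, F^{(j)}_{\frac{t}{j}} R_{\lambda,\frac{t}{j}} x\bigr) \le 2\sqrt{j}\,d\bigl(R_{\lambda,\frac{t}{j}} x, F_{\frac{t}{j}} R_{\lambda,\frac{t}{j}} x\bigr) = \frac{2t}{\lambda\sqrt{j}}\,d\bigl(x, R_{\lambda,\frac{t}{j}} x\bigr), \]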
which can be made arbitrarily small by taking 𝑗 ∈ ℕ big enough. Choose now 𝜀 > 0 and find $\lambda_0 \in (0, 1)$ such that $\lambda_0\,|\partial f|(x) < \varepsilon$. By the assumptions and (5.1.3), we have
\[ \lim_{\rho\to 0+} d(x, R_{\lambda_0,\rho} x) = d(x, J_{\lambda_0} x) \le \lambda_0\,|\partial f|(x), \]
and hence there exists $j_0 \in \mathbb{N}$ such that
\[ 2 d\bigl(x, R_{\lambda_0,\frac{t}{j_0}} x\bigr) \le 2 d(x, J_{\lambda_0} x) + \varepsilon \le 2\lambda_0\,|\partial f|(x) + \varepsilon < 3\varepsilon, \]
which proves the statement for 𝑥 ∈ dom 𝜕𝑓. Using an easy approximation argument like in the proof of Theorem 5.3.5, the statement holds also for 𝑥 ∈ dom 𝑓.

We are now ready to present the Lie–Trotter–Kato product formula, which says that the semigroup of the function 𝑓 of the form (5.3.30) can be approximated by the resolvents of the components 𝑓𝑛, with 𝑛 = 1, . . . , 𝑁.

Theorem 5.3.7 (Lie–Trotter–Kato formula). Let (H, 𝑑) be a Hadamard space and 𝑓 : H → (−∞, ∞] be of the form (5.3.30). We use Notation 5.3.4. Then we have
\[ S_t x = \lim_{j\to\infty} \bigl(J^N_{\frac{t}{j}} \circ \cdots \circ J^1_{\frac{t}{j}}\bigr)^{(j)} x, \tag{5.3.37} \]
for every 𝑡 ∈ [0, ∞) and 𝑥 ∈ dom 𝑓.

Proof. Step 1: By Theorem 5.3.6, it suffices to show $R_{\lambda,\rho} x \to J_\lambda x$ as 𝜌 → 0, where $R_{\lambda,\rho}$ is the resolvent of $F_\rho := J^N_\rho \circ \cdots \circ J^1_\rho$. Put $x_0(\rho) := R_{\lambda,\rho} x$ and $x_n(\rho) := J^n_\rho x_{n-1}(\rho)$ for 𝑛 = 1, . . . , 𝑁. By the definition of $R_{\lambda,\rho}$ we have
\[ x_0(\rho) = \frac{1}{1+\frac{\lambda}{\rho}}\,x + \frac{\frac{\lambda}{\rho}}{1+\frac{\lambda}{\rho}}\,x_N(\rho), \tag{5.3.38} \]
and consequently,
\[ d(x, x_N(\rho)) = \frac{\rho+\lambda}{\lambda}\,d(x, x_0(\rho)), \tag{5.3.39} \]
together with
\[ d(x_0(\rho), x_N(\rho)) = \frac{\rho}{\lambda}\,d(x, x_0(\rho)). \tag{5.3.40} \]

Step 2: Applying Lemma 2.2.23 for each 𝑓𝑛, with 𝑛 = 1, . . . , 𝑁, and summing the resulting inequalities up, we arrive at
\begin{align*}
2\rho f(v) &\ge 2\rho \sum_{n=1}^{N} f_n(x_n(\rho)) + d(x_N(\rho), v)^2 - d(x_0(\rho), v)^2 \\
&\quad + \sum_{n=1}^{N} d(x_{n-1}(\rho), x_n(\rho))^2, \tag{5.3.41}
\end{align*}
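for every 𝑣 ∈ dom 𝑓. Inequality (1.2.2) yields
\[ \frac{\frac{\lambda}{\rho}}{1+\frac{\lambda}{\rho}}\,d(v, x_N(\rho))^2 \ge d(v, x_0(\rho))^2 - \frac{1}{1+\frac{\lambda}{\rho}}\,d(v, x)^2 + \frac{\frac{\lambda}{\rho}}{\bigl(1+\frac{\lambda}{\rho}\bigr)^2}\,d(x, x_N(\rho))^2. \tag{5.3.42} \]
Combining this inequality with (5.3.39) and (5.3.41) gives after some elementary calculations that
\[ 2\lambda f(v) \ge 2\lambda \sum_{n=1}^{N} f_n(x_n(\rho)) + d(x_0(\rho), x)^2 + d(x_0(\rho), v)^2 - d(x, v)^2, \tag{5.3.43} \]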
for every 𝑣 ∈ dom 𝑓.

Step 3: Fix now a sequence $\rho_j \to 0$. We will show that, for every 𝑛 = 0, 1, . . . , 𝑁, the sequence $(x_n(\rho_j))_j$ is bounded. To this end, apply Lemma 2.2.13 to obtain 𝛼, 𝛽 ∈ ℝ such that
\[ f_n(x_n(\rho_j)) \ge \alpha + \beta\,d(x, x_n(\rho_j)), \]
for every 𝑛 = 1, . . . , 𝑁 and 𝑗 ∈ ℕ. Since, by (1.1), we have
\[ d(x_0(\rho_j), v)^2 \ge \frac{1}{2}\,d(x_0(\rho_j), x)^2 - d(x, v)^2, \]
inequality (5.3.43) yields
\[ \lambda f(v) \ge \lambda\Bigl[N\alpha + \beta \sum_{n=1}^{N} d(x, x_n(\rho_j))\Bigr] + \frac{3}{4}\,d(x_0(\rho_j), x)^2 - d(x, v)^2, \tag{5.3.44} \]
for every 𝑣 ∈ dom 𝑓. The case 𝛽 ≥ 0 is easy, we therefore assume 𝛽 < 0. Next, observe that
\begin{align*}
d(x, x_n(\rho_j)) &\le d(x, J^n_{\rho_j} x) + d(J^n_{\rho_j} x, J^n_{\rho_j} J^{n-1}_{\rho_j} x) + \cdots + d(J^n_{\rho_j} \cdots J^1_{\rho_j} x, x_n(\rho_j)) \\
&\le \sum_{k=1}^{n} d(x, J^k_{\rho_j} x) + d(x, x_0(\rho_j)) \\
&\le L + d(x, x_0(\rho_j)), \tag{5.3.45}
\end{align*}
for some 𝐿 > 0 and every 𝑛 = 1, . . . , 𝑁 and 𝑗 ∈ ℕ. Plugging this inequality into (5.3.44) gives that $(x_0(\rho_j))_j$ is bounded and by (5.3.45) we get that the sequence $(x_n(\rho_j))_j$ is bounded also for 𝑛 = 1, . . . , 𝑁. Then Lemma 2.2.12 yields that the sequence $(f_n(x_n(\rho_j)))_j$ is bounded from below and inequality (5.3.43) implies that it is also bounded from above.

Step 4: Consider (5.3.41) with $\rho := \rho_j$ and take the limit 𝑗 → ∞ to obtain via (5.3.40) that
\[ \lim_{j\to\infty} \sum_{n=1}^{N} d(x_{n-1}(\rho_j), x_n(\rho_j))^2 = 0. \tag{5.3.46} \]
Let 𝑧 ∈ H be a weak cluster point of $x_0(\rho_j)$ and $x_0(\rho_{j_p})$ be a sequence weakly converging to 𝑧. Recall that the existence of a weak cluster point was guaranteed by Proposition 3.1.2. By (5.3.46), also $x_n(\rho_{j_p})$ weakly converges to 𝑧, for every 𝑛 = 1, . . . , 𝑁. Consequently, 𝑧 ∈ dom 𝑓 due to Lemma 3.2.1. Consider next inequality (5.3.43) with $\rho := \rho_{j_p}$ and take the limit 𝑝 → ∞. Since the functions 𝑓𝑛, with 𝑛 = 1, . . . , 𝑁, are weakly lsc by Lemma 3.2.3, we obtain $z = J_\lambda x$. In particular, 𝑧 ∈ dom 𝑓. Since 𝑧 was an arbitrary weak cluster point of $x_0(\rho_j)$, we get $x_0(\rho_j) \xrightarrow{w} J_\lambda x$. Applying once again (5.3.43) with $v := J_\lambda x$ gives $x_0(\rho_j) \to J_\lambda x$ as 𝑗 → ∞. This finishes the proof of (5.3.37).

Question 5.3.8. Is the convergence in (5.3.37) uniform with respect to 𝑡 on each bounded time interval?
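The product formula (5.3.37) can be checked numerically in a case where everything is explicit. The sketch below works on the real line with 𝑁 = 2 and the assumed components 𝑓1(𝑦) = 𝑦²/2 and 𝑓2(𝑦) = (𝑦 − 1)²/2, whose resolvents have closed forms and whose sum has the gradient flow 𝑥′ = −(2𝑥 − 1). All function names are hypothetical and serve only this illustration.

```python
import math

def J1(lam, x):
    # Resolvent of f1(y) = y^2 / 2: minimizer of y^2/2 + (y - x)^2 / (2*lam).
    return x / (1.0 + lam)

def J2(lam, x):
    # Resolvent of f2(y) = (y - 1)^2 / 2.
    return (x + lam) / (1.0 + lam)

def product_formula(x, t, j):
    # j-fold composition of J2_{t/j} o J1_{t/j}, as on the right of (5.3.37).
    h = t / j
    for _ in range(j):
        x = J2(h, J1(h, x))
    return x

def exact_flow(x, t):
    # Gradient flow of f = f1 + f2, i.e. the solution of x' = -(2x - 1).
    return 0.5 + (x - 0.5) * math.exp(-2.0 * t)

x0, t = 3.0, 1.0
errors = [abs(product_formula(x0, t, j) - exact_flow(x0, t)) for j in (10, 100, 1000)]
```

In this example the entries of `errors` shrink as 𝑗 grows, in line with the convergence asserted by Theorem 5.3.7; the sketch says nothing about uniformity in 𝑡 (Question 5.3.8).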
Exercises

Exercise 5.1. Find a convex lsc function 𝑓 : ℝ → (−∞, ∞] such that dom 𝜕𝑓 ≠ dom 𝑓.

Exercise 5.2. Let 𝑣 : (0, 1) → ℝ be a locally absolutely continuous function and $D : (0, 1)^2 \to \mathbb{R}$ be a map satisfying
\[ |D(s, t) - D(s', t)| \le |v(s) - v(s')|, \qquad |D(s, t) - D(s, t')| \le |v(t) - v(t')|, \]
for every 𝑠, 𝑠′, 𝑡, 𝑡′ ∈ (0, 1). Show that the function 𝑡 ↦ 𝐷(𝑡, 𝑡) is locally absolutely continuous on (0, 1) and
\[ \frac{\mathrm{d}}{\mathrm{d}t} D(t, t) \le \limsup_{h\to 0+} \frac{D(t, t) - D(t-h, t)}{h} + \limsup_{h\to 0+} \frac{D(t, t+h) - D(t, t)}{h}, \]
for almost every 𝑡 ∈ (0, 1). Hint. See [8, Lemma 4.3.4].

Exercise 5.3. Let 𝑓 : H → (−∞, ∞] be convex lsc and 𝑥 ∈ H. Show that for every 𝑡 ∈ [0, ∞) we have
\[ \sup_{n\in\mathbb{N}} d\bigl(x, J^{(n)}_{\frac{t}{n}} x\bigr) < \infty. \]
Hint. First use (5.1.3) to get the claim for 𝑥 ∈ dom 𝜕𝑓 and then apply the fact that dom 𝜕𝑓 is dense in dom 𝑓 along with the nonexpansiveness of the resolvent, which gives the claim for 𝑥 ∈ dom 𝑓. Finally, use the limit behavior of the resolvent at 0 from Proposition 2.2.26.

Exercise 5.4 (Gradient flows of strongly convex functions). Prove Proposition 5.1.15.
Exercise 5.5 (Strong convergence). Give an elementary proof of the fact that if 𝑓 : H → (−∞, ∞] in Theorem 5.1.16 is strongly convex, then the gradient flow converges strongly to a minimizer.

Exercise 5.6. Let (H, 𝑑) be a Hadamard space and 𝑓𝑛 : H → (−∞, ∞] be a sequence of functions which Γ-converges to a function 𝑓 : H → (−∞, ∞]. Show that 𝑓 is convex whenever 𝑓𝑛 are such.

Exercise 5.7. Let (𝑋, 𝑑) be a complete metric space and 𝑓𝑛 : 𝑋 → (−∞, ∞] be a sequence of functions which Γ-converges to a function 𝑓 : 𝑋 → (−∞, ∞]. Show that 𝑓 is lsc.
Bibliographical remarks

A classical reference for problem (5.0.2) is H. Brezis’ book [46]. The study of gradient flows in Hadamard spaces was initiated independently by U. Mayer [139] and J. Jost [105], who both obtained the main existence result (5.1.4) and the semigroup property (Proposition 5.1.8) based on the celebrated Crandall–Liggett construction in linear spaces [68]. There had been however earlier results of this type in a special case [173, 175]. On the other hand a significant part of this theory works in more general metric spaces and is now a hot topic in analysis [8]. Our exposition in Section 5.1 follows the development in [8], which we adapt into Hadamard spaces. A considerable number of these results were already established in [139], namely, the Hölder and Lipschitz property of the flow (Proposition 5.1.10) coming from [139, Theorems 2.2 and 2.9], and including the function (5.1.13). Furthermore, the properties of a gradient flow in Theorem 5.1.13 are from [139] and in the case of (5.1.17), we follow the original proof [139, Theorem 2.14]. The proof of (5.1.16) comes from [9, p. 84]. Large time behavior of a gradient flow (Theorem 5.1.16) comes from [19]. Strongly convex functions were known to produce strongly convergent gradient flow already in [139, Lemma 1.7]. For the Hilbert space case, see [31, Theorem 27.1 (iii)]. The existence of a gradient flow in Hadamard spaces and its basic properties can be alternatively established in a more elementary way [7, Theorem 4.25].

Theorems 5.2.4 and 5.2.9 come from [17], but earlier results on Mosco and Γ-convergences in Hadamard spaces were obtained already in [105, 124]. For linear counterparts of these two theorems, see [12, Theorem 3.26], [13, Théorème 1.2] and [46, Theorems 3.16 and 4.2]. Closely related results can be found for instance in [49, 68]. The claim in the proof of Theorem 5.2.4 was in the context of Hilbert spaces proved in [13, Lemme 1.5].
For the details on the Frolík–Wijsman convergence, see [87, 199, 203]. A Banach space version of Corollary 5.2.8 (ii) appears in the paper [101] by M. Israel, Jr. and S. Reich. Question 5.2.10 was motivated by [46, Theorems 3.16 and 4.2].

Theorem 5.3.7 is due to I. Stojkovic [189, Theorem 4.4]. Our proof comes from [18]. The Lie–Trotter–Kato formula in Banach spaces is a classical part of functional analysis. It originated in seminal works by H. Brezis and A. Pazy [48, 49], P. Chernoff [60], T. Kato [107], I. Miyadera and S. Ôharu [144], S. Reich [168–170], H. Trotter [197, 198] and others. Theorem 5.3.7 is particularly close to [108]. Question 5.3.8 is natural with regard to the above mentioned Hilbert space results. For the details, see also the monographs [46, 61, 79, 164]. We also note that Theorem 5.3.5 was proved in [189, Theorem 3.12] with the additional claim that the semigroup convergence is uniform in 𝑡 on compact intervals. I was however unable to understand the argument in the proof leading to such a claim and Theorem 5.3.5 is therefore stated here without uniform convergence. Consequently, we do not have uniform convergence in Theorem 5.3.6, which is taken from [189, Theorem 3.13], and ultimately we pose Question 5.3.8.

A very nice application of the Lie–Trotter–Kato formula in Hadamard spaces was discovered by L. Kovalev in [122]. We should like to also mention a semigroup associated to a nonlinear Markov operator in a Hadamard space, which was obtained by a formula similar to (5.1.4) in [192, Section 8]. I would like to thank Jeffrey Streets for a discussion about Calabi flows and the Donaldson conjecture.
6 Convex optimization algorithms After being concerned with convex analysis in the previous chapters, we now turn to a few topics in convex optimization. That is, computational and algorithmic questions about convex sets and convex functions will be studied. Since the optimization problems under consideration are typically impossible to solve exactly, we develop approximation methods, which means that a sequence of points is generated by an algorithm and its convergence to a desired exact solution is proved. In particular, the algorithms studied in this chapter work in infinitely many steps. We present several optimization methods which are classical in Hilbert spaces and show that they function in Hadamard spaces equally well. In Section 8.3 we will see that such generalizations are not only of theoretical interest, but can be important in real world applications. It is well worth pointing out that all the algorithms rely on the fact that a metric projection on a closed convex set is a nonexpansive mapping (Theorem 2.1.12). Also Fejér monotonicity (Definition 3.2.5) plays a prominent role in the proofs of convergence.
6.1 Convex feasibility problems

In many situations in optimization, we are to minimize a function on a set 𝐶 which is given as an intersection of (finitely many) closed convex sets and before running a minimization algorithm itself, one needs to find a feasible solution, that is, a point from 𝐶. This task is in the literature called a convex feasibility problem. Here we will consider the following case: given two closed convex subsets 𝐴 and 𝐵 of a Hadamard space (H, 𝑑) such that 𝐴 ∩ 𝐵 ≠ ∅, we are to find a point 𝑥 ∈ 𝐴 ∩ 𝐵. We shall present two approximation algorithms, namely the alternating projection method and the averaged projection method, each of which generates a sequence of points (𝑥𝑛) ⊂ H converging to a point 𝑥 ∈ 𝐴 ∩ 𝐵. It turns out that in order to get strong convergence of such a sequence (𝑥𝑛), we need additional assumptions. The following regularity conditions mean that if a point is close to both sets 𝐴 and 𝐵, then it is close to their intersection 𝐴 ∩ 𝐵. Under even stronger assumptions on the sets, one obtains linear convergence of the algorithm; see (3.2.4) to recall this type of convergence.
Regularity of two intersecting sets

We say that 𝐴, 𝐵 ⊂ H are boundedly regular if for every bounded set 𝑆 ⊂ H and 𝜀 > 0 there exists 𝛿 > 0 such that 𝑑(𝑥, 𝐴 ∩ 𝐵) < 𝜀 whenever 𝑥 ∈ 𝑆 and max{𝑑(𝑥, 𝐴), 𝑑(𝑥, 𝐵)} ≤ 𝛿.

The sets 𝐴, 𝐵 ⊂ H are boundedly linearly regular if for every bounded set 𝑆 ⊂ H there exists 𝜅 > 0 such that we have
\[ d(x, A \cap B) \le \kappa \max\{d(x, A), d(x, B)\}, \]
for each 𝑥 ∈ 𝑆.

Finally, we say that 𝐴, 𝐵 ⊂ H are linearly regular if there exists 𝜅 > 0 such that we have
\[ d(x, A \cap B) \le \kappa \max\{d(x, A), d(x, B)\}, \]
for each 𝑥 ∈ H.
Alternating projection method

Now we are ready to introduce the alternating projection method, a well-known optimization algorithm in Hilbert spaces, which can be traced back to von Neumann. Let (H, 𝑑) be a Hadamard space and 𝐴, 𝐵 ⊂ H convex closed subsets such that 𝐴 ∩ 𝐵 ≠ ∅. Choose a starting point 𝑥0 ∈ H and define
\[ x_{2n-1} := P_A x_{2n-2}, \qquad x_{2n} := P_B x_{2n-1}, \qquad n \in \mathbb{N}. \tag{6.1.1} \]
This sequence is sometimes referred to as the alternating sequence. Theorem 6.1.1 (Alternating projections). Let (𝑥𝑛 ) ⊂ H be the sequence generated by (6.1.1). Then the following holds: (i) The sequence (𝑥𝑛 ) weakly converges to a point 𝑥 ∈ 𝐴 ∩ 𝐵. (ii) If 𝐴 and 𝐵 are boundedly regular, then 𝑥𝑛 → 𝑥. (iii) If 𝐴 and 𝐵 are boundedly linearly regular, then 𝑥𝑛 → 𝑥 linearly. (iv) If 𝐴 and 𝐵 are linearly regular, then 𝑥𝑛 → 𝑥 linearly with a rate independent of the starting point. Proof. By (2.1.1) the sequence (𝑥𝑛 ) is Fejér monotone with respect to 𝐴 ∩ 𝐵. We claim that
\[ \max\{d(x_n, A)^2, d(x_n, B)^2\} \le d(x_n, A \cap B)^2 - d(x_{n+1}, A \cap B)^2, \tag{6.1.2} \]
for every 𝑛 ∈ ℕ. Indeed, fix 𝑛 ∈ ℕ and without loss of generality assume $x_n \in A$ and $x_{n+1} \notin A \cap B$. Inequality (2.1.1) reads
\begin{align*}
d(x_n, P_{A\cap B}(x_n))^2 &\ge d(x_n, x_{n+1})^2 + d(P_{A\cap B}(x_n), x_{n+1})^2, \\
d(x_n, A \cap B)^2 &\ge d(x_n, B)^2 + d(A \cap B, x_{n+1})^2,
\end{align*}
which yields (6.1.2). Now, by Fejér monotonicity, Proposition 3.2.6 (ii) and (6.1.2) we get
\[ \max\{d(x_n, A), d(x_n, B)\} \to 0, \quad \text{as } n \to \infty. \tag{6.1.3} \]
Let us prove (i). Since (𝑥𝑛) is bounded, it has a weak cluster point 𝑥 ∈ H. Take a subsequence $(x_{n_k})$ which weakly converges to 𝑥. Using Corollary 3.2.4 and (6.1.3), we have 𝑑(𝑥, 𝐴) = 𝑑(𝑥, 𝐵) = 0. Hence 𝑥 ∈ 𝐴 ∩ 𝐵 and we conclude, by Proposition 3.2.6 (iii), that $x_n \xrightarrow{w} x$.

As for (ii), the bounded regularity of 𝐴 and 𝐵 along with (6.1.3) gives 𝑑(𝑥𝑛, 𝐴 ∩ 𝐵) → 0 as 𝑛 → ∞. Applying Proposition 3.2.6 (iv) yields (ii).

To prove (iii), recall that (𝑥𝑛) is bounded. Hence, by bounded linear regularity, there exists 𝜅 > 0 such that
\[ d(x_n, A \cap B) \le \kappa \max\{d(x_n, A), d(x_n, B)\}, \]
for every 𝑛 ∈ ℕ. Using (6.1.2), we arrive at
\begin{align*}
d^2(x_n, A \cap B) &\le \kappa^2 \bigl[d^2(x_n, A \cap B) - d^2(x_{n+1}, A \cap B)\bigr], \\
d(x_{n+1}, A \cap B) &\le \sqrt{1 - \frac{1}{\kappa^2}}\; d(x_n, A \cap B).
\end{align*}
Applying Proposition 3.2.6 (v) finishes the proof of (iii). Finally, the proof of (iv) is similar to that of (iii).

It is natural to ask whether it is possible to obtain the strong convergence of (6.1.1) in general. This was a longstanding problem in Hilbert spaces, which eventually turned out to have a negative answer.

Example 6.1.2. There exist a hyperplane 𝐴 ⊂ ℓ2, a closed convex cone 𝐵 ⊂ ℓ2 and a point 𝑥0 ∈ ℓ2 such that the sequence generated by (6.1.1) from the starting point 𝑥0 converges weakly to a point in 𝐴 ∩ 𝐵, but not in norm [100, 138].

On the other hand already John von Neumann showed that if 𝐻 is a Hilbert space and 𝐴, 𝐵 ⊂ 𝐻 are its closed subspaces, then for any starting point 𝑥0 ∈ 𝐻, the sequence defined by (6.1.1) converges in norm to a point from 𝐴 ∩ 𝐵. There exist surprisingly nice sets 𝐴, 𝐵 ⊂ ℓ2 for which we do not know whether the convergence of the alternating sequence (6.1.1) is strong.

Question 6.1.3. Let 𝐴 ⊂ ℓ2 be a closed (affine) subspace and
\[ B := \{(x_n) \in \ell_2 : x_n \ge 0 \text{ for every } n \in \mathbb{N}\} \]
be the positive cone. Assume 𝐴 ∩ 𝐵 ≠ ∅ and 𝑥0 ∈ ℓ2. Does the sequence generated by (6.1.1) converge strongly to a point from 𝐴 ∩ 𝐵?
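To make the iteration (6.1.1) concrete, here is a minimal sketch in the Euclidean plane, which is a (finite-dimensional) Hadamard space. The two sets, a closed unit disc 𝐴 and a closed half-plane 𝐵, as well as all function names, are assumptions chosen only because their metric projections have closed forms.

```python
import math

def proj_disc(p, radius=1.0):
    # Metric projection onto A = closed disc of the given radius around the origin.
    x, y = p
    n = math.hypot(x, y)
    if n <= radius:
        return (x, y)
    return (radius * x / n, radius * y / n)

def proj_halfplane(p, a=0.5):
    # Metric projection onto B = {(x, y) : x >= a}.
    x, y = p
    return (max(x, a), y)

def dist_disc(p, radius=1.0):
    return max(0.0, math.hypot(p[0], p[1]) - radius)

def dist_halfplane(p, a=0.5):
    return max(0.0, a - p[0])

def alternating_projections(x0, steps=100):
    x = x0
    for _ in range(steps):
        x = proj_disc(x)       # x_{2n-1} := P_A x_{2n-2}
        x = proj_halfplane(x)  # x_{2n}   := P_B x_{2n-1}
    return x

limit = alternating_projections((-3.0, 4.0))
```

In finite dimensions weak and strong convergence coincide, so this sketch cannot exhibit the phenomenon of Example 6.1.2; it only illustrates the mechanics of the scheme.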
Averaged projection method

Another popular method for solving convex feasibility problems is the averaged projection method. Let again 𝐴 and 𝐵 be closed convex subsets of a Hadamard space (H, 𝑑) with 𝐴 ∩ 𝐵 ≠ ∅. Choose a starting point 𝑥0 ∈ H and put
\[ x_n := \frac{P_A x_{n-1} + P_B x_{n-1}}{2}, \qquad n \in \mathbb{N}. \tag{6.1.5} \]
That is, at each step, the averaged projection method updates $x_{n-1}$ to $x_n$ by projecting $x_{n-1}$ on 𝐴 and 𝐵, respectively, and taking the midpoint of the two projections $P_A x_{n-1}$ and $P_B x_{n-1}$.

Theorem 6.1.4 (Averaged projections). Let (𝑥𝑛) be a sequence generated by (6.1.5). Then the following holds:
(i) The sequence (𝑥𝑛) weakly converges to a point 𝑥 ∈ 𝐴 ∩ 𝐵.
(ii) If 𝐴 and 𝐵 are boundedly regular, then 𝑥𝑛 → 𝑥.
(iii) If 𝐴 and 𝐵 are boundedly linearly regular, then 𝑥𝑛 → 𝑥 linearly.
(iv) If 𝐴 and 𝐵 are linearly regular, then 𝑥𝑛 → 𝑥 linearly with a rate independent of the starting point.

Proof. The proof is rather similar to that of Theorem 6.1.1 so we will be brief. The sequence (𝑥𝑛) is Fejér monotone with respect to 𝐴 ∩ 𝐵. Let 𝑐 ∈ H be a weak cluster point of (𝑥𝑛). We have
\begin{align*}
d(x_n, A)^2 &\le d(x_n, A \cap B)^2 - d(P_A x_n, A \cap B)^2, \\
d(x_n, A)^2 &\le d(x_n, A \cap B)^2 - d(x_{n+1}, A \cap B)^2,
\end{align*}
and after interchanging 𝐴 and 𝐵 also
\[ d(x_n, B)^2 \le d(x_n, A \cap B)^2 - d(x_{n+1}, A \cap B)^2, \]
which together with the weak lower semicontinuity of the metric yields 𝑐 ∈ 𝐴 ∩ 𝐵. By Proposition 3.2.6 (iii), we finally get $x_n \xrightarrow{w} x$. This proves (i). The remainder is similar to the proof of Theorem 6.1.1 and we leave it as Exercise 6.1.

In general, the convergence in Theorem 6.1.4 is not strong [32].

Remark 6.1.5. Convex feasibility problems can be also formulated in the language of fixed point theory. Note that 𝑥 ∈ 𝐴 ∩ 𝐵 if and only if $\bigl(\frac{P_A + P_B}{2}\bigr) x = x$. And likewise for $P_B \circ P_A$, more precisely, 𝑥 ∈ 𝐴 ∩ 𝐵 if and only if $(P_B \circ P_A) x = x$. Algorithms for finding fixed points of nonexpansive mappings will be studied in Section 6.2. Moreover, both $P_B \circ P_A$ and $\frac{P_A + P_B}{2}$ are firmly nonexpansive (Definition 2.1.13) and the fact from Exercise 6.2 can be used to obtain weak convergence as in Theorems 6.1.1 and 6.1.4.
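The averaged scheme (6.1.5) admits an equally short sketch. As before, the setting is the Euclidean plane, where the midpoint of two points is their arithmetic mean; the disc 𝐴, the half-plane 𝐵 and the function names are illustrative assumptions rather than anything prescribed by the text.

```python
import math

def proj_disc(p, radius=1.0):
    # Metric projection onto A = closed disc of the given radius around the origin.
    x, y = p
    n = math.hypot(x, y)
    if n <= radius:
        return (x, y)
    return (radius * x / n, radius * y / n)

def proj_halfplane(p, a=0.5):
    # Metric projection onto B = {(x, y) : x >= a}.
    x, y = p
    return (max(x, a), y)

def averaged_projections(x0, steps=500):
    x = x0
    for _ in range(steps):
        pa, pb = proj_disc(x), proj_halfplane(x)
        # x_n := midpoint of P_A x_{n-1} and P_B x_{n-1}, cf. (6.1.5)
        x = ((pa[0] + pb[0]) / 2.0, (pa[1] + pb[1]) / 2.0)
    return x

limit = averaged_projections((-3.0, 4.0))
dist_to_A = max(0.0, math.hypot(limit[0], limit[1]) - 1.0)
dist_to_B = max(0.0, 0.5 - limit[0])
```

Unlike the alternating scheme, both projections are evaluated at the same point, so they could be computed in parallel; the iterates need not lie in either set until the limit.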
6.2 Fixed point approximations Let again (H, 𝑑) be a Hadamard space and 𝐹 : H → H be a nonexpansive mapping. We are now interested in the set of fixed points of 𝐹, that is,
Fix 𝐹 := {𝑥 ∈ H : 𝑥 = 𝐹𝑥} . More precisely we would like to find a point 𝑥 ∈ Fix 𝐹, but since it is difficult in general, we will content ourselves with a sequence (𝑥𝑛 ) converging to a fixed point of 𝐹. We present three approximation algorithms each of which generates such an approximating sequence.
Krasnoselski–Mann algorithm The first method is called the Krasnoselski–Mann algorithm and is defined as follows. Given a starting point 𝑥0 ∈ H, we set
\[ x_{n+1} := (1 - \alpha_n) F x_n + \alpha_n x_n, \qquad n \in \mathbb{N}_0, \tag{6.2.6} \]
where 𝛼𝑛 ∈ [0, 1] for each 𝑛 ∈ ℕ0. As we shall see in Theorem 6.2.1 below, the sequence obtained by (6.2.6) converges weakly to a fixed point.

Theorem 6.2.1. Let (H, 𝑑) be a Hadamard space and 𝐹 : H → H be a nonexpansive mapping with Fix 𝐹 ≠ ∅. Assume that (𝛼𝑛) ⊂ [0, 1] satisfies $\sum_n \alpha_n (1 - \alpha_n) = \infty$. Then given a starting point 𝑥0 ∈ H, the sequence (𝑥𝑛) defined by (6.2.6) weakly converges to some 𝑥 ∈ Fix 𝐹.

Proof. Step 1: We first establish Fejér monotonicity. Let 𝑧 ∈ Fix 𝐹. From (1.2.2) we obtain
\begin{align*}
d(z, x_{n+1})^2 &\le (1 - \alpha_n)\,d(Fz, Fx_n)^2 + \alpha_n d(z, x_n)^2 - \alpha_n (1 - \alpha_n)\,d(Fx_n, x_n)^2 \\
&\le d(z, x_n)^2 - \alpha_n (1 - \alpha_n)\,d(Fx_n, x_n)^2,
\end{align*}
which implies that the sequence (𝑥𝑛) is Fejér monotone with respect to Fix 𝐹.

Step 2: Applying Exercise 6.5 to the last inequality, we get
\[ \sum_n \alpha_n (1 - \alpha_n)\,d(Fx_n, x_n)^2 < \infty. \]
By the assumptions on (𝛼𝑛), we consequently obtain lim inf 𝑑(𝐹𝑥𝑛, 𝑥𝑛) = 0. By the convexity of the metric, we have
\begin{align*}
d(Fx_{n+1}, x_{n+1}) &= d(Fx_{n+1}, (1 - \alpha_n) Fx_n + \alpha_n x_n) \\
&\le d(Fx_{n+1}, Fx_n) + d(Fx_n, (1 - \alpha_n) Fx_n + \alpha_n x_n) \\
&\le d(x_{n+1}, x_n) + \alpha_n d(Fx_n, x_n) \\
&= d(Fx_n, x_n),
\end{align*}
and the sequence (𝑑(𝐹𝑥𝑛, 𝑥𝑛)) hence has a limit. Therefore lim 𝑑(𝐹𝑥𝑛, 𝑥𝑛) = 0.
Step 3: According to Proposition 3.2.6 it suffices to show that every weak cluster point of (𝑥𝑛) is a fixed point of 𝐹. Let 𝑥 ∈ H be a weak cluster point of (𝑥𝑛) and let (𝑦𝑛) be a subsequence of (𝑥𝑛) such that $y_n \xrightarrow{w} x$. Since 𝑑(𝑦𝑛, 𝐹𝑦𝑛) → 0, we have
\begin{align*}
\limsup_{n\to\infty} d(Fx, y_n) &\le \limsup_{n\to\infty} \bigl[d(Fx, Fy_n) + d(Fy_n, y_n)\bigr] \\
&= \limsup_{n\to\infty} d(Fx, Fy_n) \\
&\le \limsup_{n\to\infty} d(x, y_n).
\end{align*}
This, by the uniqueness of the weak limit, implies 𝑥 = 𝐹𝑥. Applying Proposition 3.2.6 finishes the proof.

It is known that the convergence in Theorem 6.2.1 is not strong in general [89]; see also [32, Corollary 5.2].
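A small experiment shows why the averaging in (6.2.6) matters. In the sketch below the Hadamard space is the Euclidean plane, identified with the complex numbers, and 𝐹 is assumed to be the rotation by 90° about the origin, a nonexpansive map whose only fixed point is 0. The plain iteration 𝐹(𝑛)𝑥0 keeps circling, whereas the Krasnoselski–Mann iterates with the (assumed) constant choice 𝛼𝑛 = 1/2 approach the fixed point.

```python
def F(z):
    # Rotation by 90 degrees about the origin; an isometry with Fix F = {0}.
    return 1j * z

def krasnoselski_mann(x0, alphas):
    x = x0
    for a in alphas:
        x = (1.0 - a) * F(x) + a * x   # (6.2.6); geodesics in the plane are segments
    return x

x0 = 1.0 + 0.0j
km_limit = krasnoselski_mann(x0, [0.5] * 60)   # sum of alpha_n * (1 - alpha_n) diverges

picard = x0
for _ in range(60):
    picard = F(picard)                 # plain iteration stays on the unit circle
```

The constant sequence satisfies the hypothesis of Theorem 6.2.1, since every term 𝛼𝑛(1 − 𝛼𝑛) equals 1/4.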
Halpern algorithm

Next we introduce our second algorithm approximating fixed points, which is attributed to B. Halpern. Given a point 𝑥0 ∈ H, define
\[ x_{n+1} := (1 - \alpha_n) F x_n + \alpha_n x_0, \qquad n \in \mathbb{N}_0. \tag{6.2.7} \]
The strong convergence of the Halpern algorithm is shown in Theorem 6.2.2 below. Notice however that the sequence (6.2.7) is not Fejér monotone.

It is worth mentioning that if 𝐹 is a linear nonexpansive mapping on a Hilbert space 𝐻 and we set $\alpha_n := \frac{1}{n+1}$, then the iterations in (6.2.7) become
\[ x_n = \frac{1}{n+1} \sum_{k=0}^{n} F^{(k)} x_0, \]
and are usually called the Cesàro means, or ergodic averages.

Theorem 6.2.2. Let (H, 𝑑) be a Hadamard space and 𝐹 : H → H be a nonexpansive mapping with Fix 𝐹 ≠ ∅. Assume that a sequence (𝛼𝑛) ⊂ (0, 1) satisfies:
(i) $\alpha_n \to 0$,
(ii) $\sum_n \alpha_n = \infty$,
(iii) $\sum_n |\alpha_n - \alpha_{n-1}| < \infty$.
Given a starting point 𝑥0 ∈ H, let (𝑥𝑛) be the sequence defined by (6.2.7). Then,
\[ \lim_{n\to\infty} x_n = P_{\operatorname{Fix} F}(x_0). \tag{6.2.8} \]
Proof. Step 1: First observe that the sequence (𝑥𝑛) is bounded. By the convexity of the metric we have
\begin{align*}
d(x_{n+1}, x_n) &= d\bigl((1 - \alpha_n) Fx_n + \alpha_n x_0,\ \alpha_{n-1} x_0 + (1 - \alpha_{n-1}) Fx_{n-1}\bigr) \\
&\le d\bigl((1 - \alpha_n) Fx_n + \alpha_n x_0,\ \alpha_n x_0 + (1 - \alpha_n) Fx_{n-1}\bigr) \\
&\quad + d\bigl(\alpha_n x_0 + (1 - \alpha_n) Fx_{n-1},\ \alpha_{n-1} x_0 + (1 - \alpha_{n-1}) Fx_{n-1}\bigr) \\
&\le (1 - \alpha_n)\,d(Fx_n, Fx_{n-1}) + |\alpha_n - \alpha_{n-1}|\,d(x_0, Fx_{n-1}) \\
&\le (1 - \alpha_n)\,d(x_n, x_{n-1}) + |\alpha_n - \alpha_{n-1}|\,d(x_0, Fx_{n-1}).
\end{align*}
Exercise 6.4 yields that $d(x_{n+1}, x_n) \to 0$. Furthermore,
\begin{align*}
d(x_n, Fx_n) &\le d(x_n, x_{n+1}) + d(x_{n+1}, Fx_n) \\
&\le d(x_n, x_{n+1}) + d(\alpha_n x_0 + (1 - \alpha_n) Fx_n, Fx_n) \\
&\le d(x_n, x_{n+1}) + \alpha_n d(x_0, Fx_n).
\end{align*}
The assumption (i) now gives 𝑑(𝑥𝑛, 𝐹𝑥𝑛) → 0.

Step 2: Denote $z := P_{\operatorname{Fix} F}(x_0)$. We have
\begin{align*}
d(z, x_{n+1})^2 &\le (1 - \alpha_n)\,d(z, Fx_n)^2 + \alpha_n d(z, x_0)^2 - \alpha_n (1 - \alpha_n)\,d(Fx_n, x_0)^2 \\
&\le (1 - \alpha_n)\,d(z, Fx_n)^2 + \alpha_n \bigl[d(z, x_0)^2 - (1 - \alpha_n)\,d(Fx_n, x_0)^2\bigr].
\end{align*}
In order to apply the convergence condition from Exercise 6.4 and hence finish the proof, we need to show that
\[ \limsup_{n\to\infty} \bigl[d(z, x_0)^2 - (1 - \alpha_n)\,d(Fx_n, x_0)^2\bigr] \le 0. \tag{6.2.9} \]
Choose a sequence $\lambda_m \in (0, \infty)$ with $\lim \lambda_m = \infty$. Denote $z_m := R_{\lambda_m} x_0$, where $R_\lambda$ is the resolvent of 𝐹, and observe that this sequence is bounded due to Theorem 4.2.8, because Fix 𝐹 ≠ ∅. Then,
\begin{align*}
d(x_n, z_m)^2 &\le \frac{\lambda_m}{1 + \lambda_m}\,d(x_n, Fz_m)^2 + \frac{1}{1 + \lambda_m}\,d(x_n, x_0)^2 - \frac{\lambda_m}{(1 + \lambda_m)^2}\,d(x_0, Fz_m)^2 \\
&\le \frac{\lambda_m}{1 + \lambda_m}\,\bigl[d(x_n, Fx_n) + d(Fx_n, Fz_m)\bigr]^2 + \frac{1}{1 + \lambda_m}\,d(x_n, x_0)^2 - \frac{\lambda_m}{(1 + \lambda_m)^2}\,d(x_0, Fz_m)^2 \\
&\le \frac{\lambda_m}{1 + \lambda_m}\,\bigl[d(x_n, Fx_n) + d(x_n, z_m)\bigr]^2 + \frac{1}{1 + \lambda_m}\,d(x_n, x_0)^2 - \frac{\lambda_m}{(1 + \lambda_m)^2}\,d(x_0, Fz_m)^2.
\end{align*}
Since lim 𝑑(𝑥𝑛, 𝐹𝑥𝑛) = 0, we obtain
\[ \frac{\lambda_m}{(1 + \lambda_m)^2}\,d(x_0, Fz_m)^2 \le \frac{1}{1 + \lambda_m}\,\liminf_{n\to\infty} d(x_n, x_0)^2, \]
and after dividing by $\frac{1}{1+\lambda_m}$ and taking the limit 𝑚 → ∞ it becomes
\[ d(x_0, z)^2 \le \liminf_{n\to\infty} d(x_n, x_0)^2. \]
Consequently,
\[ d(x_0, z)^2 \le \liminf_{n\to\infty}\,(1 - \alpha_n)\,d(Fx_n, x_0)^2, \]
which already implies (6.2.9). The proof is now complete.
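The Halpern iteration (6.2.7) differs from (6.2.6) only in anchoring every step at the starting point 𝑥0. A minimal sketch, again in the Euclidean plane written as complex numbers, with 𝐹 assumed to be the rotation by 90° (so that Fix 𝐹 = {0} and the projection of any 𝑥0 onto Fix 𝐹 is 0) and with the assumed step sizes 𝛼𝑛 = 1/(𝑛 + 2), which lie in (0, 1) and satisfy conditions (i)–(iii):

```python
def F(z):
    # Rotation by 90 degrees about the origin; Fix F = {0}.
    return 1j * z

def halpern(x0, steps):
    x = x0
    for n in range(steps):
        alpha = 1.0 / (n + 2)                # alpha_n -> 0, sum diverges, bounded variation
        x = (1.0 - alpha) * F(x) + alpha * x0  # (6.2.7): always pull back toward x0
    return x

x0 = 1.0 + 0.0j
x_final = halpern(x0, 5000)
```

In this example the iterates approach 0 only at a sublinear rate, so a fairly large number of steps is used; the sketch is meant to illustrate the statement (6.2.8), not to suggest an efficient method.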
Yet another fixed point algorithm

We now give one more algorithm for finding a fixed point of a nonexpansive mapping 𝐹 : H → H. In Hilbert spaces it coincides with the proximal point algorithm for the maximal monotone operator 𝐴 := 𝐼 − 𝐹, where 𝐼 is the identity. The algorithm is however probably only of theoretical interest because at each step, we need to find a fixed point of a contraction mapping, which itself requires an approximation algorithm.

Proposition 6.2.3. Let 𝐹 : H → H be a nonexpansive mapping with at least one fixed point and let $(\lambda_n) \subset (0, \infty)$ be a sequence satisfying $\sum_n \lambda_n^2 = \infty$. Given a point 𝑥0 ∈ H, put
\[ x_n := R_{\lambda_n} x_{n-1}, \qquad n \in \mathbb{N}, \]
where $R_{\lambda_n}$ is the resolvent of 𝐹. Then the sequence (𝑥𝑛) weakly converges to a fixed point of 𝐹.

Proof. Let 𝑥 ∈ H be a fixed point of 𝐹. Then, for each 𝑛 ∈ ℕ, we have
\[ d(x_{n-1}, x) \ge d(R_{\lambda_n} x_{n-1}, R_{\lambda_n} x) = d(x_n, x), \]
which verifies the Fejér monotonicity of (𝑥𝑛) with respect to Fix 𝐹. Put
\[ \beta_n := \frac{1}{1 + \lambda_n}. \]
Inequality (1.2.2) yields
\begin{align*}
d(x, x_n)^2 &\le \beta_n d(x, x_{n-1})^2 + (1 - \beta_n)\,d(x, Fx_n)^2 - \beta_n (1 - \beta_n)\,d(x_{n-1}, Fx_n)^2 \\
&\le \beta_n d(x, x_{n-1})^2 + (1 - \beta_n)\,d(x, x_n)^2 - \beta_n d(x_{n-1}, x_n)^2,
\end{align*}
which gives
\[ d(x_{n-1}, x_n)^2 \le d(x, x_{n-1})^2 - d(x, x_n)^2, \]
6.3 Proximal point algorithm

125
and hence
𝑑 (𝑥𝑛−1 , 𝑥𝑛) 𝜆2𝑛 𝜆2𝑛
2 2
2
≤ 𝑑 (𝑥, 𝑥𝑛−1 ) − 𝑑 (𝑥, 𝑥𝑛) .
(6.2.10)
By the triangle inequality, we have
𝑑 (𝑥𝑛, 𝑥𝑛+1 ) + 𝑑 (𝑥𝑛+1 , 𝐹𝑥𝑛+1 ) = 𝑑 (𝑥𝑛, 𝐹𝑥𝑛+1 ) ≤ 𝑑 (𝑥𝑛, 𝐹𝑥𝑛) + 𝑑 (𝐹𝑥𝑛, 𝐹𝑥𝑛+1 ) ≤ 𝑑 (𝑥𝑛, 𝐹𝑥𝑛) + 𝑑 (𝑥𝑛, 𝑥𝑛+1 ) , and therefore
𝑑 (𝑥𝑛, 𝑥𝑛+1 ) 𝑑 (𝑥𝑛−1 , 𝑥𝑛) = 𝑑 (𝑥𝑛+1 , 𝐹𝑥𝑛+1 ) ≤ 𝑑 (𝑥𝑛 , 𝐹𝑥𝑛) = 𝜆 𝑛+1 𝜆𝑛
(6.2.11)
Summing up (6.2.10) over 𝑛 = 1, . . . , 𝑚, where 𝑚 ∈ ℕ, and using (6.2.11) gives 𝑚
( ∑ 𝜆2𝑛) 𝑛=1
2
𝑑 (𝑥𝑚−1 , 𝑥𝑚 ) 2 2 ≤ 𝑑 (𝑥, 𝑥0 ) − 𝑑 (𝑥, 𝑥𝑚 ) . 𝜆2𝑚
Take the limit 𝑚 → ∞ to arrive at
𝑑 (𝑥𝑚 , 𝐹𝑥𝑚 ) =
1 𝑑 (𝑥𝑚−1 , 𝑥𝑚 ) → 0 , 𝜆𝑚
as 𝑚 → ∞.
Assume now that $z \in \mathcal{H}$ is a weak cluster point of $(x_n)$. Then
\[ \limsup_{n \to \infty} d(Fz, x_n) \le \limsup_{n \to \infty} \bigl[ d(Fz, F x_n) + d(F x_n, x_n) \bigr] \le \limsup_{n \to \infty} d(z, x_n) + 0. \]
By the uniqueness of the weak limit we get $z = Fz$. Finally, apply Proposition 3.2.6 (iii) to conclude that $(x_n)$ weakly converges to a fixed point of $F$.

Proposition 6.2.3 with a constant sequence $\lambda_n := \lambda$ for every $n \in \mathbb{N}$ follows by Exercise 6.2. As already alluded to in Remark 6.1.5, the above algorithms for finding fixed points can be used in convex feasibility problems.
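To make the iteration of Proposition 6.2.3 concrete, the following Python sketch runs it in the Euclidean plane, which is a Hadamard space. It assumes that the resolvent $R_\lambda x$ is the unique fixed point of the contraction $y \mapsto \frac{1}{1+\lambda} x + \frac{\lambda}{1+\lambda} F y$, in line with the role of $\beta_n$ in the proof. The choice of $F$ (a rotation, whose only fixed point is the origin), the inner Banach iteration and all function names are illustrative assumptions, not part of the text.

```python
import math

def rotate(point, angle=math.pi / 2):
    """A nonexpansive map F of the plane: rotation about the origin (Fix F = {0})."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

def resolvent(F, x, lam, inner_steps=200):
    """Approximate R_lam x, the fixed point of y -> (x + lam * F(y)) / (1 + lam).

    The inner map is a contraction with factor lam / (1 + lam), so plain
    Banach iteration converges; inner_steps is an illustrative choice.
    """
    y = x
    for _ in range(inner_steps):
        fy = F(y)
        y = ((x[0] + lam * fy[0]) / (1 + lam), (x[1] + lam * fy[1]) / (1 + lam))
    return y

def resolvent_iteration(F, x0, lambdas):
    """Return the iterates x_n := R_{lambda_n} x_{n-1}."""
    xs = [x0]
    for lam in lambdas:
        xs.append(resolvent(F, xs[-1], lam))
    return xs

# Constant step size lambda_n = 1, so the sum of lambda_n**2 diverges.
iterates = resolvent_iteration(rotate, (1.0, 0.0), [1.0] * 30)
norms = [math.hypot(*p) for p in iterates]  # distances to the fixed point 0
```

The list `norms` records the distances to the fixed point; they are nonincreasing, which is the Fejér monotonicity from the proof. The need for an inner loop at every step is exactly why the text calls the method mainly of theoretical interest.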
6.3 Proximal point algorithm

Next we are concerned with a minimization problem for a convex lsc function on a Hadamard space. It turns out that the proximal point algorithm (PPA), which is a popular optimization method in Hilbert spaces, can be extended to the Hadamard space setting and works equally well. We first introduce a basic form of this algorithm and then turn to its more sophisticated version (the so-called splitting PPA), which applies in a rather special, albeit very important, case.
Basic form of the PPA

The proximal point algorithm is a method for finding a minimizer of a convex lsc function. It can be viewed as a discrete version of the gradient flow semigroup presented in Chapter 5. Let $(\mathcal{H}, d)$ be a Hadamard space and $f\colon \mathcal{H} \to (-\infty, \infty]$ be a convex lsc function. The proximal point algorithm starting at a point $x_0 \in \mathcal{H}$ generates at the $n$th step, $n \in \mathbb{N}$, the point
\[ x_n := J_{\lambda_n} x_{n-1} = \operatorname*{arg\,min}_{y \in \mathcal{H}} \Bigl[ f(y) + \frac{1}{2\lambda_n} d(y, x_{n-1})^2 \Bigr]. \tag{6.3.12} \]
Theorem 6.3.1 (Proximal point algorithm). Let $(\mathcal{H}, d)$ be a Hadamard space and $f\colon \mathcal{H} \to (-\infty, \infty]$ be a convex lsc function attaining its minimum. Then, for an arbitrary starting point $x_0 \in \mathcal{H}$ and a sequence of positive reals $(\lambda_n)$ such that $\sum_{n=1}^{\infty} \lambda_n = \infty$, the sequence $(x_n) \subset \mathcal{H}$ defined by (6.3.12) weakly converges to a minimizer of $f$.

Proof. The set of minimizers $\operatorname{Min} f$ is nonempty by the assumptions, and Theorem 2.2.22 gives that the sequence $(x_n)$ is Fejér monotone with respect to $\operatorname{Min} f$. Without loss of generality we may assume that $f = 0$ on $\operatorname{Min} f$. Choose now $p \in \operatorname{Min} f$ and $k \in \mathbb{N}$ and apply Lemma 2.2.23 with $y := p$ and $x := x_{k-1}$ to obtain
\[ \lambda_k f(x_k) \le \frac{1}{2} d(x_{k-1}, p)^2 - \frac{1}{2} d(x_k, p)^2. \tag{6.3.13} \]
From (6.3.13) and from the monotonicity of $(f(x_n))_n$, we have
\[ 2 f(x_n) \sum_{k=1}^{n} \lambda_k \le 2 \sum_{k=1}^{n} \lambda_k f(x_k) \le d(x_0, p)^2 - d(x_n, p)^2, \]
and thus
\[ f(x_n) \le \frac{d(x_0, p)^2}{\sum_{k=1}^{n} \lambda_k}, \tag{6.3.14} \]
for every $n \in \mathbb{N}$. By the assumptions, the right-hand side of the last inequality goes to $0$ as $n \to \infty$. We thence have that $(x_n)$ is a minimizing sequence, that is, $f(x_n) \to 0$ as $n \to \infty$. Assume now that a point $z \in \mathcal{H}$ is a weak cluster point of $(x_n)$, that is, there exists a subsequence $(x_{n_k})$ of $(x_n)$ such that $x_{n_k} \xrightarrow{w} z$. Since $f$ is lsc, and therefore weakly lsc by Lemma 3.2.3, we get $f(z) = 0$ and hence $z \in \operatorname{Min} f$. Applying Proposition 3.2.6 (iii) finally gives $x_n \xrightarrow{w} z$ as $n \to \infty$. This finishes the proof.

The above proximal point algorithm for a constant sequence $\lambda_n := \lambda$, for every $n \in \mathbb{N}$, follows by Exercise 6.2.
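As a small illustration of the basic PPA, the sketch below runs the iteration (6.3.12) on the real line for $f(y) = |y|$, whose resolvent is the well-known soft-thresholding map. The choice of $f$, of the step sizes $\lambda_n = 1/n$ and of the starting point are assumptions made only for this example. The code also records the bound $d(x_0, p)^2 / (2 \sum_{k \le n} \lambda_k)$ with $p = 0$, which follows from the chain of inequalities in the proof and in particular implies (6.3.14).

```python
def prox_abs(x, lam):
    """Resolvent J_lam of f(y) = |y| on the real line (soft-thresholding)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def proximal_point(x0, lambdas):
    """Iterates x_n := J_{lambda_n} x_{n-1} for f(y) = |y|."""
    xs = [x0]
    for lam in lambdas:
        xs.append(prox_abs(xs[-1], lam))
    return xs

x0 = 5.0
lambdas = [1.0 / n for n in range(1, 201)]  # divergent harmonic series
xs = proximal_point(x0, lambdas)

# Compare f(x_n) with d(x_0, p)^2 / (2 * sum_{k<=n} lambda_k) for the minimizer p = 0.
partial_sum = 0.0
bounds = []
for lam in lambdas:
    partial_sum += lam
    bounds.append(x0 ** 2 / (2.0 * partial_sum))
values = [abs(x) for x in xs[1:]]  # f(x_1), f(x_2), ...
```

In this run the function values decrease monotonically and stay below the recorded bounds. The iterates reach the minimizer after finitely many steps, which is a feature of this particular $f$ and not of the PPA in general.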
Remark 6.3.2 (Rate of the convergence). As we saw in (6.3.14), for each $n \in \mathbb{N}$, we have
\[ f(x_n) - \inf f \le \frac{K}{\sum_{k=1}^{n} \lambda_k}, \]
where $K$ is a positive constant depending on the objective function $f$ and the starting point $x_0$. Interestingly, there exist a convex lsc function $f\colon \ell_2 \to (-\infty, \infty]$ attaining its minimum and a point $x_0 \in \ell_2$ such that the PPA starting at $x_0$ does not converge strongly to a minimizer of $f$. For more details, see [29, 32, 98]. Consequently, if one wants to obtain strong convergence of a PPA sequence, more assumptions are necessary. For instance, it would be sufficient to require the underlying space to be locally compact. Alternatively, we may require the function $f$ to be strongly convex on bounded subsets of its domain, since then every minimizing sequence converges strongly to a minimizer by Proposition 2.2.17.
Splitting proximal point algorithm

If the objective function in question has a special form, we can take advantage of that and improve the proximal point algorithm. We will now study minimization problems for a function $f\colon \mathcal{H} \to (-\infty, \infty]$ of the form
\[ f := \sum_{n=1}^{N} f_n, \tag{6.3.15} \]
where $f_n\colon \mathcal{H} \to (-\infty, \infty]$ are all convex and lsc. The main trick here is that instead of applying iteratively the resolvent
\[ J_\lambda x := \operatorname*{arg\,min}_{y \in \mathcal{H}} \Bigl[ f(y) + \frac{1}{2\lambda} d(x, y)^2 \Bigr] \]
of the function $f$, we apply the resolvents
\[ J_\lambda^n x := \operatorname*{arg\,min}_{y \in \mathcal{H}} \Bigl[ f_n(y) + \frac{1}{2\lambda} d(x, y)^2 \Bigr] \tag{6.3.16} \]
of its components 𝑓𝑛 . Such a method is called the splitting proximal point algorithm, and abbreviated SPPA. There are essentially two ways of applying the resolvents (6.3.16) in the SPPA. We either fix an order of the components (that is, a permutation of the numbers 1, . . . , 𝑁, which without loss of generality may be the identity permutation) and at each cycle we will apply the resolvents sequentially in this fixed order, or alternatively, we will at each step pick a number 𝑟 ∈ {1, . . . , 𝑁} at random and apply the resolvent of 𝑓𝑟 .
Theorems 6.3.7 and 6.3.13 below say that in either case, we get a sequence converging to a minimizer of $f$. Note that in order to obtain the convergence of the SPPA, the underlying space is assumed to be locally compact.

Question 6.3.3. Is it possible to obtain weak convergence in Theorems 6.3.7 and 6.3.13 below, provided we drop the assumption that the space be locally compact?

Prototypical instances of (6.3.15) are in Examples 2.2.5 and 2.2.6.

Example 6.3.4 (Computing medians and means). Given a finite number of points $a_1, \dots, a_N \in \mathcal{H}$ and $(w_1, \dots, w_N) \in \Delta_{N-1}$, we consider the function
\[ f(x) := \sum_{n=1}^{N} w_n d(x, a_n)^p, \qquad x \in \mathcal{H}, \tag{6.3.17} \]
where $p \in [1, \infty)$. In this case, computing the resolvent $J_\lambda^n$ is actually a one-dimensional problem. In other words, employing the splitting PPA reduces the computation of a minimizer of $f$ to a sequence of one-dimensional problems. We are in particular interested in computing medians and means, that is, the cases $p = 1$ and $p = 2$, respectively. This will be greatly exploited in Section 8.3. Interestingly, a mean can also be computed via the law of large numbers; see Algorithm 7.2.3.

Example 6.3.5 (Constrained minimization). Let $f\colon \mathcal{H} \to (-\infty, \infty]$ be convex lsc and $C \subset \mathcal{H}$ be closed convex. To solve the constrained minimization problem
\[ \operatorname*{arg\,min}_{x \in C} f(x), \]
one can write it in an equivalent form
\[ \operatorname*{arg\,min}_{x \in \mathcal{H}} g(x), \]
where $g := f + \iota_C$, and apply the splitting PPA. Recall that the resolvent of $\iota_C$ is $P_C$, the metric projection onto $C$.
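The remark in Example 6.3.4 that the resolvent of a single term $w_n d(\cdot, a_n)^p$ is a one-dimensional problem can be sketched as follows in the Euclidean plane. The sketch assumes that the minimizer lies on the geodesic from $x$ to $a_n$, so that only a parameter $t \in [0, 1]$ has to be determined. Minimizing $w (1-t)^p D^p + t^2 D^2 / (2\lambda)$ over $t$, with $D := d(x, a_n)$, gives $t = \min\{1, \lambda w / D\}$ for $p = 1$ and $t = 2\lambda w / (1 + 2\lambda w)$ for $p = 2$. The function names and the test points are hypothetical.

```python
import math

def geodesic_point(x, a, t):
    """Point (1 - t) x + t a on the segment from x to a (Euclidean geodesic)."""
    return tuple((1.0 - t) * xi + t * ai for xi, ai in zip(x, a))

def resolvent_distance_power(x, a, w, lam, p):
    """Resolvent of y -> w * d(y, a)**p at x, for p = 1 or p = 2.

    Assumes the minimizer lies on the geodesic from x to a, so only the
    parameter t in [0, 1] has to be found.
    """
    dist = math.dist(x, a)
    if dist == 0.0:
        return x
    if p == 1:
        t = min(1.0, lam * w / dist)
    elif p == 2:
        t = 2.0 * lam * w / (1.0 + 2.0 * lam * w)
    else:
        raise ValueError("only p = 1 and p = 2 are sketched here")
    return geodesic_point(x, a, t)

def objective(y, x, a, w, lam, p):
    """Value w * d(y, a)**p + d(x, y)**2 / (2 * lam) minimized by the resolvent."""
    return w * math.dist(y, a) ** p + math.dist(x, y) ** 2 / (2.0 * lam)

x, a = (4.0, 0.0), (0.0, 3.0)
median_step = resolvent_distance_power(x, a, w=0.5, lam=1.0, p=1)
mean_step = resolvent_distance_power(x, a, w=0.5, lam=1.0, p=2)
```

In a general Hadamard space the call to `geodesic_point` would be replaced by whatever routine evaluates geodesics in that space; the formulas for $t$ do not change.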
Cyclic order version of the SPPA

We will now prove the first main result, namely, that the proximal point algorithm with the cyclic order of applying the marginal resolvents (6.3.16) gives a sequence which converges to a minimizer. Let us first precisely define the procedure. Consider a function $f$ of the form (6.3.15). Let $(\lambda_k)$ be a sequence of positive reals satisfying
\[ \sum_{k=0}^{\infty} \lambda_k = \infty \qquad \text{and} \qquad \sum_{k=0}^{\infty} \lambda_k^2 < \infty. \tag{6.3.18} \]
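Let $x_0 \in \mathcal{H}$ be an arbitrary starting point. For each $k \in \mathbb{N}_0$ we set
\begin{align*}
x_{kN+1} &:= J^1_{\lambda_k}(x_{kN}), \\
x_{kN+2} &:= J^2_{\lambda_k}(x_{kN+1}), \\
&\;\;\vdots \tag{6.3.19} \\
x_{kN+N} &:= J^N_{\lambda_k}(x_{kN+N-1}),
\end{align*}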
where the resolvents are defined by (6.3.16) above. Note that the step size parameter $\lambda_k$ is constant throughout each cycle. The convergence of the above algorithm is assured by the following theorem. The assumption (6.3.20) will be commented on in Remark 6.3.14.

Example 6.3.6 (Alternating projections revisited). Let $(\mathcal{H}, d)$ be a Hadamard space and $A, B \subset \mathcal{H}$ be closed convex with $A \cap B \neq \emptyset$. Then the cyclic version of the SPPA applied to the function
\[ f := \iota_A + \iota_B \]
generates the same sequence as the alternating projection method from Section 6.1.

Theorem 6.3.7 (SPPA with cyclic order). Let $(\mathcal{H}, d)$ be a locally compact Hadamard space and $f\colon \mathcal{H} \to (-\infty, \infty]$ be of the form (6.3.15) with $\operatorname{Min} f \neq \emptyset$. Given a starting point $x_0 \in \mathcal{H}$, let $(x_j)$ be the sequence defined in (6.3.19). Assume there exists $L > 0$ such that
\begin{align}
f_n(x_{kN}) - f_n(x_{kN+n}) &\le L\, d(x_{kN}, x_{kN+n}), \tag{6.3.20a} \\
f_n(x_{kN+n-1}) - f_n(x_{kN+n}) &\le L\, d(x_{kN+n-1}, x_{kN+n}), \tag{6.3.20b}
\end{align}
for each $k \in \mathbb{N}_0$ and $n = 1, \dots, N$. Then $(x_j)$ converges to a minimizer of $f$.

Proof. We divide the proof into two steps.

Step 1: We claim that
\[ d(x_{kN+N}, y)^2 \le d(x_{kN}, y)^2 - 2\lambda_k [f(x_{kN}) - f(y)] + 2\lambda_k^2 L^2 N(N+1), \tag{6.3.21} \]
for every $y \in \mathcal{H}$. Indeed, apply Lemma 2.2.23 to obtain
\[ d(x_{kN+n}, y)^2 \le d(x_{kN+n-1}, y)^2 - 2\lambda_k [f_n(x_{kN+n}) - f_n(y)], \]
for every $y \in \mathcal{H}$ and $n = 1, \dots, N$. By summing up we obtain
\begin{align*}
d(x_{kN+N}, y)^2 &\le d(x_{kN}, y)^2 - 2\lambda_k \sum_{n=1}^{N} [f_n(x_{kN+n}) - f_n(y)] \\
&= d(x_{kN}, y)^2 - 2\lambda_k [f(x_{kN}) - f(y)] + 2\lambda_k \sum_{n=1}^{N} [f_n(x_{kN}) - f_n(x_{kN+n})].
\end{align*}
By assumption (6.3.20a), we have
\[ f_n(x_{kN}) - f_n(x_{kN+n}) \le L\, d(x_{kN}, x_{kN+n}), \]
where the right-hand side can be further estimated as
\[ d(x_{kN}, x_{kN+n}) \le d(x_{kN}, x_{kN+1}) + \dots + d(x_{kN+n-1}, x_{kN+n}). \]
By the definition of the algorithm we have
\[ f_n(x_{kN+n}) + \frac{1}{2\lambda_k} d(x_{kN+n-1}, x_{kN+n})^2 \le f_n(x_{kN+n-1}), \]
for each $n = 1, \dots, N$, which then gives
\[ d(x_{kN+n-1}, x_{kN+n}) \le 2\lambda_k \frac{f_n(x_{kN+n-1}) - f_n(x_{kN+n})}{d(x_{kN+n-1}, x_{kN+n})} \le 2\lambda_k L, \tag{6.3.22} \]
where we employed assumption (6.3.20b). Hence,
\[ f_n(x_{kN}) - f_n(x_{kN+n}) \le 2\lambda_k L^2 n, \]
and finally,
\[ d(x_{kN+N}, y)^2 \le d(x_{kN}, y)^2 - 2\lambda_k [f(x_{kN}) - f(y)] + 2\lambda_k^2 L^2 N(N+1), \]
which finishes the proof of (6.3.21).

Step 2: Let now $z \in \operatorname{Min} f$, and apply (6.3.21) with $y := z$. Then
\[ d(x_{kN+N}, z)^2 \le d(x_{kN}, z)^2 - 2\lambda_k [f(x_{kN}) - f(z)] + 2\lambda_k^2 L^2 N(N+1), \]
which according to Exercise 6.5 implies that the sequence
\[ \bigl( d(x_{kN}, z) \bigr)_{k \in \mathbb{N}_0} \]
converges (and in particular, the sequence $(x_{kN})$ is bounded), and
\[ \sum_{k=0}^{\infty} \lambda_k [f(x_{kN}) - f(z)] < \infty. \tag{6.3.23} \]
From (6.3.23) we immediately obtain that there exists a subsequence $(x_{k_l N})$ of $(x_{kN})$ for which
\[ f(x_{k_l N}) \to f(z), \qquad \text{as } l \to \infty. \]
Since the sequence $(x_{k_l N})$ is bounded, it has a subsequence which converges to a point $\hat{z} \in \mathcal{H}$. By the lower semicontinuity of $f$ we obtain $\hat{z} \in \operatorname{Min} f$. Then we know that
\[ \bigl( d(x_{kN}, \hat{z}) \bigr)_{k \in \mathbb{N}_0} \]
converges, and also that it converges to $0$, since a subsequence of $(x_{kN})$ converges to $\hat{z}$.
By virtue of (6.3.22), we obtain
\[ \lim_{k \to \infty} x_{kN+n} = \hat{z}, \]
for every $n = 1, \dots, N$. Hence the whole sequence $(x_j)$ converges to $\hat{z}$ and the proof is complete.
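The cyclic procedure (6.3.19) can be sketched in Python for the mean functional, that is, (6.3.17) with $p = 2$, in the Euclidean plane. The sketch assumes the closed-form resolvent $J^n_\lambda x = (1-t) x + t a_n$ with $t = 2\lambda w_n / (1 + 2\lambda w_n)$, obtained by minimizing along the segment from $x$ to $a_n$, and the step sizes $\lambda_k = 1/(k+1)$, which satisfy (6.3.18). Points, weights and function names are illustrative choices.

```python
import math

def step_toward(x, a, t):
    """Move from x toward a along the segment: (1 - t) x + t a."""
    return tuple((1.0 - t) * xi + t * ai for xi, ai in zip(x, a))

def resolvent_sq(x, a, w, lam):
    """Resolvent of y -> w * d(y, a)**2 in Euclidean space (closed form)."""
    t = 2.0 * lam * w / (1.0 + 2.0 * lam * w)
    return step_toward(x, a, t)

def cyclic_sppa_mean(points, weights, x0, cycles):
    """Cyclic SPPA for f(x) = sum_n w_n d(x, a_n)**2 with lambda_k = 1/(k+1)."""
    x = x0
    for k in range(cycles):
        lam = 1.0 / (k + 1)  # sum of lam diverges, sum of lam**2 converges
        for a, w in zip(points, weights):
            x = resolvent_sq(x, a, w, lam)  # same lam throughout the cycle
    return x

points = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
weights = [0.5, 0.25, 0.25]
approx = cyclic_sppa_mean(points, weights, x0=(10.0, -3.0), cycles=2000)

# In Euclidean space the minimizer is the weighted average of the points.
exact = tuple(sum(w * a[i] for a, w in zip(points, weights)) for i in range(2))
```

Here `approx` can be compared with `exact` to observe the convergence. The error decays slowly because the step sizes must tend to zero, which is the price paid for never evaluating the resolvent of the full sum $f$.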
Random order variant of the SPPA

Instead of applying the marginal resolvents in a cyclic order, one can at each step select a number from $\{1, \dots, N\}$ at random and apply the corresponding resolvent. Using some tools from probability theory, namely a refined form of the classical supermartingale convergence theorem, we will prove that the resulting SPPA sequence converges to a minimizer of the function $f$. Let us first recall some terminology and notation. Let $(\Omega, \mathcal{F})$ be a measurable space. A sequence $(\mathcal{F}_k)_{k \in \mathbb{N}_0}$ of $\sigma$-algebras on $\Omega$ is called a filtration if $\mathcal{F}_0 \subset \mathcal{F}_1 \subset \dots \subset \mathcal{F}$. Then $(\Omega, \mathcal{F}, (\mathcal{F}_k))$ is a filtered measurable space. A filtered measurable space with a probability measure is called a filtered probability space. We also denote
\[ \mathcal{F}_\infty := \sigma\Bigl( \bigcup_{k \in \mathbb{N}_0} \mathcal{F}_k \Bigr), \]
and observe that $\mathcal{F}_\infty$ is a sub-$\sigma$-algebra of $\mathcal{F}$. Assume that
\[ \bigl( \Omega, \mathcal{F}, (\mathcal{F}_k)_{k \in \mathbb{N}_0}, \mathbb{P} \bigr) \]
is now a fixed filtered probability space.

Definition 6.3.8 (Nonnegative supermartingale). A nonnegative stochastic process $(X_k)$, where $k \in \mathbb{N}_0$, is a supermartingale if
(i) $X_k$ is $\mathcal{F}_k$-measurable,
(ii) $\mathbb{E}_{\mathcal{F}_k} X_{k+1} \le X_k$,
for each $k \in \mathbb{N}_0$. Note that if we have equality in (ii), the process is called a martingale. The following theorem is a fundamental result in martingale theory.

Theorem 6.3.9 (Supermartingale convergence theorem). Let $(X_k)$ be a nonnegative supermartingale. Then there exists a finite nonnegative random variable $X$ which is $\mathcal{F}_\infty$-measurable and $(X_k)$ converges to $X$ almost surely.

Proof. See [204, Corollary 11.7].

We shall need a slightly improved form of the above theorem. In order to make the proof more transparent, we state its intermediate version as a lemma.
Lemma 6.3.10. Let $(X_k)$ and $(W_k)$ be nonnegative stochastic processes such that:
(i) $X_k$ and $W_k$ are $\mathcal{F}_k$-measurable for each $k \in \mathbb{N}_0$.
(ii) $\mathbb{E}_{\mathcal{F}_k} X_{k+1} \le X_k + W_k$, for each $k \in \mathbb{N}_0$.
(iii) $\sum_k W_k < \infty$.
Then $(X_k)$ converges almost surely to a finite nonnegative random variable.

Proof. Denote
\[ U_0 := X_0, \qquad U_k := X_k - \sum_{i=0}^{k-1} W_i, \qquad k \in \mathbb{N}, \]
and observe
\[ \mathbb{E}_{\mathcal{F}_k} U_{k+1} = \mathbb{E}_{\mathcal{F}_k} X_{k+1} - \sum_{i=0}^{k} W_i \le X_k + W_k - \sum_{i=0}^{k} W_i = U_k, \]
for each $k \in \mathbb{N}_0$. Given $\alpha \in (0, \infty)$, we define a random variable $\nu_\alpha := \inf\{ n \in \mathbb{N}_0 : \sum_{i=0}^{n} W_i > \alpha \}$. Then
\[ \bigl( (\alpha + U_k)\, \chi_{\{\nu_\alpha > k\}} \bigr)_{k \in \mathbb{N}_0} \]
is a nonnegative supermartingale and hence converges almost surely by Theorem 6.3.9. Hence $(U_k)$ converges almost surely on
\[ \{\nu_\alpha = \infty\} = \Bigl\{ \sum_{k=0}^{\infty} W_k \le \alpha \Bigr\}. \]
Since $\alpha$ was arbitrary, it follows that $(U_k)$ converges almost surely on $\Omega$. Consequently, the process $(X_k)$ converges almost surely on $\Omega$ to a finite random variable.

We can now state the promised refinement of the supermartingale convergence theorem.¹

Theorem 6.3.11. Assume $(Y_k)$, $(Z_k)$ and $(W_k)$ are sequences of nonnegative real-valued random variables defined on $\Omega$ and assume that:
(i) $Y_k$, $Z_k$, $W_k$ are $\mathcal{F}_k$-measurable for each $k \in \mathbb{N}_0$.
(ii) $\mathbb{E}_{\mathcal{F}_k} Y_{k+1} \le Y_k - Z_k + W_k$, for each $k \in \mathbb{N}_0$.
(iii) $\sum_k W_k < \infty$.
Then the sequence $(Y_k)$ converges to a finite random variable $Y$ almost surely, and $\sum_k Z_k < \infty$ almost surely.

Proof. Lemma 6.3.10, applied with $X_k := Y_k$, yields the claim on the convergence of $(Y_k)$. To show that $\sum_k Z_k < \infty$ almost surely, we put
\[ V_0 := Y_0, \qquad V_k := Y_k + \sum_{i=0}^{k-1} Z_i, \qquad k \in \mathbb{N}, \]
and observe that
\[ \mathbb{E}_{\mathcal{F}_k} V_{k+1} = \mathbb{E}_{\mathcal{F}_k} Y_{k+1} + \sum_{i=0}^{k} Z_i \le Y_k - Z_k + W_k + \sum_{i=0}^{k} Z_i = V_k + W_k. \]
Apply Lemma 6.3.10 again, this time with $X_k := V_k$, and conclude that $(V_k)$ converges almost surely to a finite random variable $V$. Then $\sum_{k=0}^{\infty} Z_k = V - Y$ and the proof is complete.

¹ Its deterministic counterpart is in Exercise 6.5 and was used in the proof of Theorem 6.3.7.

For the purposes of the randomized SPPA, consider the probability space $\Omega := \{1, \dots, N\}^{\mathbb{N}_0}$ equipped with the product of the uniform probability measure on $\{1, \dots, N\}$ and let $(r_k)$ be the sequence of random variables
\[ r_k(\omega) := \omega_k, \qquad \omega = (\omega_0, \omega_1, \dots) \in \Omega. \]
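Let $f\colon \mathcal{H} \to (-\infty, \infty]$ be a function of the form (6.3.15) and let $(\lambda_k)$ be a sequence of positive reals satisfying
\[ \sum_{k=0}^{\infty} \lambda_k = \infty \qquad \text{and} \qquad \sum_{k=0}^{\infty} \lambda_k^2 < \infty. \tag{6.3.24} \]
Given a starting point $x_0 \in \mathcal{H}$, we put
\[ x_{k+1} := J^{r_k}_{\lambda_k} x_k, \qquad k \in \mathbb{N}_0. \tag{6.3.25} \]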
Define the natural filtration $\mathcal{F}_k := \sigma(x_0, \dots, x_k)$ on $\Omega$, and finally, denote by $x^n_{k+1}$ the result of the iteration (6.3.25) with $x_k$ if $r_k(\omega) = n$. Having Theorem 6.3.11 at hand, we prove the following lemma. Again, the assumption (6.3.26) below will be commented on in Remark 6.3.14.

Lemma 6.3.12. Let $(\mathcal{H}, d)$ be a locally compact Hadamard space and $f$ be of the form (6.3.15). Given a starting point $x_0 \in \mathcal{H}$, let $(x_k)$ be the sequence defined in (6.3.25). Assume there exists $L > 0$ such that
\[ f_n(x_k) - f_n(x^n_{k+1}) \le L\, d(x_k, x^n_{k+1}), \tag{6.3.26} \]
for every $k \in \mathbb{N}_0$ and $n = 1, \dots, N$. Then we have
\[ \mathbb{E}_{\mathcal{F}_k} d(x_{k+1}, y)^2 \le d(x_k, y)^2 - \frac{2\lambda_k}{N} [f(x_k) - f(y)] + 4\lambda_k^2 L^2, \]
almost surely, for every $y \in \mathcal{H}$.

Proof. By Lemma 2.2.23 we have
\[ d(x_{k+1}, y)^2 \le d(x_k, y)^2 - 2\lambda_k [f_{r_k}(x_{k+1}) - f_{r_k}(y)]. \]
Taking the conditional expectation with respect to $\mathcal{F}_k$ gives
\[ \mathbb{E}_{\mathcal{F}_k} d(x_{k+1}, y)^2 \le d(x_k, y)^2 - 2\lambda_k\, \mathbb{E}_{\mathcal{F}_k} [f_{r_k}(x_{k+1}) - f_{r_k}(y)]. \]
If we denote by $x^n_{k+1}$ the result of the iteration with $x_k$ when $r_k = n$, we get
\begin{align*}
\mathbb{E}_{\mathcal{F}_k} d(x_{k+1}, y)^2 &\le d(x_k, y)^2 - \frac{2\lambda_k}{N} \sum_{n=1}^{N} [f_n(x^n_{k+1}) - f_n(y)] \\
&= d(x_k, y)^2 - \frac{2\lambda_k}{N} [f(x_k) - f(y)] + \frac{2\lambda_k}{N} \sum_{n=1}^{N} [f_n(x_k) - f_n(x^n_{k+1})].
\end{align*}
By assumption (6.3.26) we have
\[ \sum_{n=1}^{N} [f_n(x_k) - f_n(x^n_{k+1})] \le L \sum_{n=1}^{N} d(x_k, x^n_{k+1}) \le 2 L^2 \lambda_k N, \]
since
\[ d(x_k, x^n_{k+1}) \le 2\lambda_k \frac{f_n(x_k) - f_n(x^n_{k+1})}{d(x_k, x^n_{k+1})} \le 2\lambda_k L. \]
We hence finally obtain
\[ \mathbb{E}_{\mathcal{F}_k} d(x_{k+1}, y)^2 \le d(x_k, y)^2 - \frac{2\lambda_k}{N} [f(x_k) - f(y)] + 4\lambda_k^2 L^2, \]
which finishes the proof.

We now prove the convergence of the random order version of the SPPA.

Theorem 6.3.13 (SPPA with random order). Let $(\mathcal{H}, d)$ be a locally compact Hadamard space and $f$ be of the form (6.3.15) with $\operatorname{Min} f \neq \emptyset$. Assume that the Lipschitz condition (6.3.26) holds true. Then, given a starting point $x_0 \in \mathcal{H}$, the sequence $(x_k)$ defined in (6.3.25) converges to a minimizer of $f$ almost surely.

Proof. Since $\operatorname{Min} f$ is a locally compact Hadamard space, its closed balls are compact by the Hopf–Rinow theorem [51, p. 35] and consequently it is separable. We can thus choose a countable dense subset $\{v_i\}$ of $\operatorname{Min} f$. For each $i \in \mathbb{N}$ apply Lemma 6.3.12 with $y := v_i$ to obtain
\[ \mathbb{E}_{\mathcal{F}_k} d(x_{k+1}(\omega), v_i)^2 \le d(x_k(\omega), v_i)^2 - \frac{2\lambda_k}{N} [f(x_k(\omega)) - f(v_i)] + 4\lambda_k^2 L^2, \]
for every $\omega$ from a full measure set $\Omega_{v_i} \subset \Omega$. Theorem 6.3.11 immediately gives that $d(v_i, x_k(\omega))$ converges, and
\[ \sum_{k=0}^{\infty} \lambda_k [f(x_k(\omega)) - \inf f] < \infty, \]
for all $\omega \in \Omega_{v_i}$, where $\Omega_{v_i}$ is a set of full measure. Next denote
\[ \Omega_\infty := \bigcap_{i \in \mathbb{N}} \Omega_{v_i}, \]
which is by countable subadditivity again a set of full measure. The last inequality yields that for $\omega \in \Omega_\infty$, we have $\liminf_{k \to \infty} f(x_k(\omega)) = \inf f$, and since $(x_k(\omega))$ is bounded, it has a cluster point $x(\omega) \in \mathcal{H}$. By the lower semicontinuity of $f$ we know that $x(\omega) \in \operatorname{Min} f$. For each $\varepsilon > 0$ there exists $v_{i(\varepsilon)} \in (v_i)$ such that $d(x(\omega), v_{i(\varepsilon)}) < \varepsilon$. Because the sequence $d(x_k(\omega), v_{i(\varepsilon)})$ converges and $x(\omega)$ is a cluster point of $x_k(\omega)$, we have
\[ \lim_{k \to \infty} d(x_k(\omega), v_{i(\varepsilon)}) < \varepsilon. \]
This yields $x_k(\omega) \to x(\omega)$. We obtain that $x_k$ converges to a minimizer almost surely. This finishes the proof.

Remark 6.3.14. The assumptions (6.3.20) in Theorem 6.3.7, and (6.3.26) in Theorem 6.3.13 are satisfied, for instance, if
(i) the functions $f_n$ are Lipschitz on $\mathcal{H}$ with constant $L$, or
(ii) the SPPA sequence (6.3.19), or (6.3.25) respectively, is bounded and the functions $f_n$ are Lipschitz on this bounded set with constant $L$.

To summarize the most important cases of Theorems 6.3.7 and 6.3.13, we state the following corollary.

Corollary 6.3.15. Let $(\mathcal{H}, d)$ be a locally compact Hadamard space and $f$ be of the form (6.3.15). Assume that (at least) one of the following conditions is satisfied:
(i) the functions $f_n$ are Lipschitz on $\mathcal{H}$ with constant $L$ and $\operatorname{Min} f \neq \emptyset$, or
(ii) the function $f$ is of the form (6.3.17).
Then:
(i) The sequence defined in (6.3.19) converges to a minimizer of $f$.
(ii) The sequence defined in (6.3.25) converges to a minimizer of $f$ almost surely.

Proof. The proof follows immediately from Remark 6.3.14 and the fact that if the function $f$ is of the form (6.3.17), the SPPA sequences (6.3.19) and (6.3.25) are contained in the bounded set $\operatorname{co}\{x_0, a_1, \dots, a_N\}$.
Exercises

Exercise 6.1. Give the details in the proof of Theorem 6.1.4.

Exercise 6.2. Let $(\mathcal{H}, d)$ be a Hadamard space and $F\colon \mathcal{H} \to \mathcal{H}$ be firmly nonexpansive with $\operatorname{Fix} F \neq \emptyset$. Given $x_0 \in \mathcal{H}$, put
\[ x_n := F x_{n-1}, \qquad n \in \mathbb{N}. \]
Show that the sequence $(x_n)$ weakly converges to a fixed point of $F$.

Hint. The sequence is Fejér monotone with respect to $\operatorname{Fix} F$. Show that $d(x_n, x_{n+1}) \to 0$ and all cluster points of $(x_n)$ lie in $\operatorname{Fix} F$.
Exercise 6.3. Let $F\colon \mathcal{H} \to \mathcal{H}$ be nonexpansive and $\alpha \in (0, 1)$. Is then $(1 - \alpha) F + \alpha I$, where $I\colon \mathcal{H} \to \mathcal{H}$ stands for the identity mapping, a firmly nonexpansive mapping?

Exercise 6.4. Let $(a_n)$ and $(c_n)$ be sequences of nonnegative numbers such that $\sum c_n < \infty$ and $(b_n) \subset \mathbb{R}$ be a sequence with $\limsup b_n \le 0$. Let $(t_n) \subset [0, 1]$ be a sequence with $\sum t_n = \infty$. Assume that
\[ a_{n+1} \le (1 - t_n) a_n + t_n b_n + c_n, \]
for every $n \in \mathbb{N}$. Show that $\lim a_n = 0$.

Hint. Use recursion to estimate $a_{n+m}$ from above. If in trouble, see [10, Lemma 2.3].

The following is a deterministic version of Theorem 6.3.11.

Exercise 6.5. Let $(a_k)$, $(b_k)$ and $(c_k)$ be sequences of nonnegative real numbers. Assume that
\[ a_{k+1} \le a_k - b_k + c_k, \tag{6.3.27} \]
for each $k \in \mathbb{N}$ and $\sum_{k=1}^{\infty} c_k < \infty$. Show that the sequence $(a_k)$ converges and $\sum_{k=1}^{\infty} b_k < \infty$.

Hint. Fix $l \in \mathbb{N}$. Sum (6.3.27) over $k \ge l$ and take $\limsup_{k \to \infty}$ to obtain
\[ \limsup_{k \to \infty} a_k \le a_l + \sum_{k=l}^{\infty} c_k. \]
Taking $\liminf_{l \to \infty}$ yields $\limsup_{k \to \infty} a_k \le \liminf_{l \to \infty} a_l$, and hence $(a_k)$ converges. Now fix $n \in \mathbb{N}$ and sum (6.3.27) from $k = 1$ to $k = n$,
\[ \sum_{k=1}^{n} b_k \le a_1 + \sum_{k=1}^{n} c_k - a_{n+1}. \]
Since the last inequality holds for any $n \in \mathbb{N}$, we get $\sum_{k=1}^{\infty} b_k < \infty$.
Bibliographical remarks

Convex feasibility problems are a classical topic in optimization. The alternating projection method appeared in [22] as a modification of its earlier Hilbert space variant from [28]. The proofs of classical results due to J. von Neumann and L. Bregman can be found in [32, Section 3]. A simple geometric proof of von Neumann’s theorem is given in the paper [118] by E. Kopecká and S. Reich. For the construction from Example 6.1.2, see [100, 138]. The Hilbert ball case was studied by S. Reich [172]. For recent information regarding algorithm (6.1.1) in Hilbert space, see the paper [120] by E. Kopecká and S. Reich. It is known that Question 6.1.3 has a positive answer if $\operatorname{codim} A = 1$, but is otherwise largely open [28].
The averaged projection method in Hilbert spaces was shown to weakly converge by A. Auslender [14]. A counterexample to strong convergence was provided in [32] as late as 2004. To the best of our knowledge, Theorem 6.1.4 appears nowhere in the literature.

Theorem 6.2.1 copies its Hilbert space version from [31, Theorem 5.14]. This approximation algorithm has been used extensively since its discovery by M. Krasnoselski [123] and W. Mann [136]. For further analyses, see the papers [129] and [117]. The Krasnoselski–Mann algorithm was studied also in [45, 167].

Theorem 6.2.2 comes from [181] and extends a Hilbert space result of R. Wittmann [205]. We give a simpler proof, which avoids Banach limits and exploits only the geometry of nonpositive curvature. The conditions (i)–(iii) in Theorem 6.2.2 on the parameters $\lambda_n$ are discussed in detail in [99, 205]. For a deeper study of the Halpern algorithm using proof mining, see the papers [115, 116] by U. Kohlenbach and L. Leustean. An authoritative reference on proof mining is the monograph [114] by U. Kohlenbach. We would like to mention a recent paper of S. Reich and L. Shemen which is devoted to the Halpern algorithm in the Hilbert ball [174].

Proposition 6.2.3 appeared in [21]. In a Hilbert space $H$ it corresponds to the PPA for the maximal monotone operator $I - F$, where $F\colon H \to H$; see [31, Theorem 23.41]. For earlier results in nonlinear spaces, see also [52, Theorem 2.6], [173, Corollary 7.10], [187, Theorem 4.7], and [11, Theorem 6.4]. Related results in Banach spaces are contained in the paper [151] by O. Nevanlinna and S. Reich.

The proximal point algorithm originated in the celebrated papers [47, 137, 180]. O. Güler was the first to give a counterexample to strong convergence [98] and newer counterexamples appeared later in [29, 32]. A recent and unifying treatment of the PPA can be found in [31].

The proximal point algorithm in the context of Riemannian manifolds (of nonpositive sectional curvature) was studied in [36, 85, 130, 160]. The proof of Theorem 6.3.1 comes from [19]. For another approach to the PPA in metric spaces, see [211] by A. Zaslavski.

The splitting PPA appeared in [16], based on its Euclidean space predecessor due to D. Bertsekas [40]. The cyclic order version of the SPPA was later worked out also in Alexandrov spaces by S. Ohta and M. Pálfia [155]. In the Hadamard space setting, S. Banert studied the SPPA for two functions in [27]. Infinite products of resolvents of accretive operators were studied in [177] by S. Reich and A. Zaslavski. We note that splitting methods go back to P.L. Lions and B. Mercier [133] and G. Passty [163]. For a nice survey, see [66].

Martingale theory is explained in a number of books on probability. We used [204] by D. Williams. Theorem 6.3.9 is [204, Corollary 11.7]. Lemma 6.3.10 was proved in a more general form in [44, Theorem 3.3.6]; see also [152, Exercise II4]. Theorem 6.3.11 is stated in [41, Proposition 4.2] by D. Bertsekas and J. Tsitsiklis. We reproduce its proof from unpublished notes [210] by H. Yu.

The statement in Exercise 6.2 appeared in [11, Theorem 6.4]. The claim in Exercise 6.5 comes from [41, Lemma 3.4].
There exist many interesting results in optimization in Hadamard manifolds which have so far no analogues in general Hadamard spaces. Let us mention for instance the paper [131].

I would like to thank Brailey Sims for helpful discussions about fixed point approximations in Hadamard spaces.
7 Probabilistic tools in Hadamard spaces

We will now study random variables with values in a Hadamard space. Probability theory with such mappings was developed by K.T. Sturm and includes also martingales and Markov processes, which are necessary for a stochastic approach to harmonic maps with values in a Hadamard space. We will however mention only rudiments of this rich probability theory that are needed for our purposes. As applications, we reprove the Jensen inequality, employ the law of large numbers as an algorithm for computing the Fréchet mean and prove a Hadamard space variant of the Wolfowitz theorem, which consequently gives a consensus algorithm.

7.1 Random variables and expectations

We start by introducing basic objects. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and $(\mathcal{H}, d)$ be a Hadamard space. We say that a mapping $Y\colon \Omega \to \mathcal{H}$ is a random variable if it is $\mathcal{F}$-measurable, that is, if
\[ Y^{-1}(A) \in \mathcal{F}, \]
for each Borel set $A \subset \mathcal{H}$. A random variable $Y\colon \Omega \to \mathcal{H}$ defines a probability measure $Y_* \mathbb{P} \in \mathcal{P}(\mathcal{H})$ by pushforward of $\mathbb{P}$ under $Y$, that is,
\[ Y_* \mathbb{P}(A) := \mathbb{P}\bigl( Y^{-1}(A) \bigr), \]
where $A \subset \mathcal{H}$ is Borel measurable. The measure $Y_* \mathbb{P}$ is called the distribution of $Y$ and denoted by $\mathbb{P}_Y$. We hence have a probability measure on a Hadamard space and can use the results of Section 2.3. Also note that $Y \in \mathcal{L}^p(\Omega, \mathcal{H})$ if and only if $\mathbb{P}_Y \in \mathcal{P}^p(\mathcal{H})$, where the spaces $\mathcal{L}^p(\Omega, \mathcal{H})$ were introduced on page 18. Define the variance of $Y$ by
\[ \operatorname{var} Y := \operatorname{var} \mathbb{P}_Y = \inf_{x \in \mathcal{H}} \mathbb{E}\, d(x, Y(\omega))^2. \tag{7.1.1} \]
Obviously, $\operatorname{var} Y < \infty$ if and only if $Y \in \mathcal{L}^2(\Omega, \mathcal{H})$.

Definition 7.1.1 (Expectation). The expectation of a random variable $Y \in \mathcal{L}^1(\Omega, \mathcal{H})$ is defined as
\[ \mathbb{E} Y := b(\mathbb{P}_Y), \]
that is, as the barycenter of the distribution $\mathbb{P}_Y$. Recall that a barycenter was defined in Theorem 2.3.1. If $Y\colon \Omega \to [-\infty, \infty]$ is a random variable, then our definition of the expectation coincides with the usual one; see Example 2.3.2. We can hence use the symbol $\mathbb{E}$ in both cases.
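If we identify $\mathcal{H}$ with the constant maps in $\mathcal{L}^2(\Omega, \mathcal{H})$, then the map $\mathbb{E}\colon \mathcal{L}^2(\Omega, \mathcal{H}) \to \mathcal{H}$ can be regarded as the metric projection onto the closed convex set of constant maps. Inequality (2.1.1) then reads
\[ \mathbb{E}\, d(z, Y)^2 \ge d(z, \mathbb{E} Y)^2 + \mathbb{E}\, d(Y, \mathbb{E} Y)^2, \tag{7.1.2} \]
or, equivalently,
\[ \mathbb{E}\, d(z, Y)^2 \ge d(z, \mathbb{E} Y)^2 + \operatorname{var} Y, \]
for every $z \in \mathcal{H}$ and $Y \in \mathcal{L}^2(\Omega, \mathcal{H})$. It is exactly the variance inequality (2.3.14) for the measure $\mathbb{P}_Y \in \mathcal{P}^2(\mathcal{H})$.

Proposition 7.1.2. If $X, Y \in \mathcal{L}^1(\Omega, \mathcal{H})$, then $d(\mathbb{E} X, \mathbb{E} Y) \le \mathbb{E}\, d(X, Y)$. In particular, if a sequence $X_n \in \mathcal{L}^1(\Omega, \mathcal{H})$ converges to some $X \in \mathcal{L}^1(\Omega, \mathcal{H})$ in $\mathcal{L}^1(\Omega, \mathcal{H})$, then $\mathbb{E} X_n \to \mathbb{E} X$.

Proof. Inequality (2.3.14) gives
\[ d(\mathbb{E} Y, \mathbb{E} X)^2 \le \int_\Omega \bigl[ d(\mathbb{E} Y, X(\omega))^2 - d(\mathbb{E} X, X(\omega))^2 \bigr] \, \mathrm{d}\mathbb{P}(\omega), \]
as well as
\[ d(\mathbb{E} X, \mathbb{E} Y)^2 \le \int_\Omega \bigl[ d(\mathbb{E} X, Y(\omega))^2 - d(\mathbb{E} Y, Y(\omega))^2 \bigr] \, \mathrm{d}\mathbb{P}(\omega). \]
Apply Corollary 1.2.5 to the points $X(\omega)$, $Y(\omega)$, $\mathbb{E} X$, and $\mathbb{E} Y$ to obtain
\[ d(\mathbb{E} Y, X)^2 - d(\mathbb{E} X, X)^2 + d(Y, \mathbb{E} X)^2 - d(\mathbb{E} Y, Y)^2 \le 2\, d(\mathbb{E} X, \mathbb{E} Y)\, d(X, Y), \]
almost surely on $\Omega$. Take expectations in the last inequality and sum up all the inequalities above to get
\[ 2\, d(\mathbb{E} X, \mathbb{E} Y)^2 \le 2\, d(\mathbb{E} X, \mathbb{E} Y)\, \mathbb{E}\, d(X, Y), \]
or
\[ d(\mathbb{E} X, \mathbb{E} Y) \le \mathbb{E}\, d(X, Y), \tag{7.1.3} \]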
which is the first claim. The second part is a direct consequence.

Before moving on to the next section, we recall a few elementary definitions, which readily extend to random variables with values in a Hadamard space $(\mathcal{H}, d)$. Random variables $(Y_n)_{n \in \mathbb{N}}$ are independent if for every finite set of indices $i_1, \dots, i_N \in \mathbb{N}$ and Borel measurable sets $A_{i_1}, \dots, A_{i_N} \subset \mathcal{H}$, the events
\[ \{Y_{i_1} \in A_{i_1}\}, \dots, \{Y_{i_N} \in A_{i_N}\} \subset \Omega \]
are independent. Random variables $(Y_n)_{n \in \mathbb{N}}$ are identically distributed if
\[ \mathbb{P}_{Y_1} = \mathbb{P}_{Y_2} = \mathbb{P}_{Y_3} = \dots. \]
A sequence of random variables $(Y_n)_{n \in \mathbb{N}}$ converges in probability to a random variable $Y\colon \Omega \to \mathcal{H}$ if we have
\[ \mathbb{P}\bigl( d(Y, Y_n) > \varepsilon \bigr) \to 0, \qquad \text{as } n \to \infty, \]
for an arbitrary $\varepsilon > 0$.
7.2 Law of large numbers

Here we prove a nonlinear version of the law of large numbers and as an application we give an alternative proof of the Jensen inequality (Theorem 7.2.2). Another application is an algorithm for computing the Fréchet mean at the end of this section. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and $(\mathcal{H}, d)$ be a Hadamard space. Given a sequence $(Y_n)$ of random variables $Y_n\colon \Omega \to \mathcal{H}$, we define a new sequence $(S_n)$ of random variables $S_n\colon \Omega \to \mathcal{H}$ as
\[ S_1(\omega) := Y_1(\omega), \tag{7.2.4a} \]
and for $n \in \mathbb{N}$ inductively
\[ S_{n+1}(\omega) := \frac{n}{n+1} S_n(\omega) + \frac{1}{n+1} Y_{n+1}(\omega), \tag{7.2.4b} \]
where $\omega \in \Omega$. The following Theorem 7.2.1 states both the weak and strong laws of large numbers.

Theorem 7.2.1 (Law of large numbers). Let $(Y_n)$ be a sequence of independent identically distributed random variables $Y_n \in \mathcal{L}^2(\Omega, \mathcal{H})$ and let $(S_n)$ be the sequence from (7.2.4). Then:
(i) The sequence $(S_n)$ converges to $\mathbb{E} Y_1$ in $\mathcal{L}^2(\Omega, \mathcal{H})$ and in probability. (Weak law of large numbers)
(ii) If moreover $Y_n \in \mathcal{L}^\infty(\Omega, \mathcal{H})$ for each $n \in \mathbb{N}$, then
\[ S_n(\omega) \to \mathbb{E} Y_1, \qquad \text{as } n \to \infty, \]
for $\mathbb{P}$-almost every $\omega \in \Omega$. (Strong law of large numbers)

Proof. Step 1: We claim that
\[ \mathbb{E}\, d(Y_{n+1}, S_n)^2 \ge \mathbb{E}\, d(\mathbb{E} Y_{n+1}, S_n)^2 + \mathbb{E}\, d(\mathbb{E} Y_{n+1}, Y_{n+1})^2, \tag{7.2.5} \]
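for every $n \in \mathbb{N}$. Indeed, observe that $Y_{n+1}$ and $S_n$ are independent and hence
\[ \int_\Omega d(Y_{n+1}(\omega), S_n(\omega))^2 \, \mathrm{d}\mathbb{P}(\omega) = \int_\Omega \Bigl[ \int_\Omega d(Y_{n+1}(\omega), S_n(\omega'))^2 \, \mathrm{d}\mathbb{P}(\omega) \Bigr] \mathrm{d}\mathbb{P}(\omega'). \]
Fix $\omega' \in \Omega$ and apply (7.1.2) to obtain
\[ \mathbb{E}\, d(Y_{n+1}, S_n(\omega'))^2 \ge d(S_n(\omega'), \mathbb{E} Y_{n+1})^2 + \mathbb{E}\, d(Y_{n+1}, \mathbb{E} Y_{n+1})^2. \]
Integration with respect to $\omega'$ yields (7.2.5).

Step 2: Next we show by induction on $n \in \mathbb{N}$ that
\[ \mathbb{E}\, d(\mathbb{E} Y_1, S_n)^2 \le \frac{1}{n} \operatorname{var} Y_1. \tag{7.2.6} \]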
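It obviously holds for $n = 1$ and we assume it holds for some fixed $n \in \mathbb{N}$. We have
\[ \mathbb{E}\, d(\mathbb{E} Y_1, S_{n+1})^2 = \mathbb{E}\, d\Bigl( \mathbb{E} Y_1, \frac{n}{n+1} S_n + \frac{1}{n+1} Y_{n+1} \Bigr)^2, \]
by (1.2.2) we get
\[ \le \frac{n}{n+1} \mathbb{E}\, d(\mathbb{E} Y_1, S_n)^2 + \frac{1}{n+1} \mathbb{E}\, d(\mathbb{E} Y_1, Y_{n+1})^2 - \frac{n}{(n+1)^2} \mathbb{E}\, d(Y_{n+1}, S_n)^2, \]
and applying (7.2.5) gives
\begin{align*}
&\le \frac{n}{n+1} \mathbb{E}\, d(\mathbb{E} Y_1, S_n)^2 + \frac{1}{n+1} \mathbb{E}\, d(\mathbb{E} Y_1, Y_{n+1})^2 - \frac{n}{(n+1)^2} \bigl[ \mathbb{E}\, d(\mathbb{E} Y_{n+1}, S_n)^2 + \mathbb{E}\, d(\mathbb{E} Y_{n+1}, Y_{n+1})^2 \bigr] \\
&= \Bigl( \frac{n}{n+1} \Bigr)^2 \mathbb{E}\, d(\mathbb{E} Y_1, S_n)^2 + \frac{1}{(n+1)^2} \operatorname{var} Y_1 \\
&\le \frac{1}{n+1} \operatorname{var} Y_1.
\end{align*}
This proves (7.2.6), which via Exercise 7.1 immediately yields (i).

Step 3: It remains to show (ii). We first claim that
\[ S_{n^2}(\omega) \to \mathbb{E} Y_1, \qquad \text{as } n \to \infty, \]
for $\mathbb{P}$-almost every $\omega \in \Omega$. Indeed, choose $\varepsilon > 0$ and apply (7.2.6) to arrive at
\[ \sum_{n=1}^{\infty} \mathbb{P}\bigl( d(S_{n^2}, \mathbb{E} Y_1) > \varepsilon \bigr) \le \sum_{n=1}^{\infty} \frac{1}{\varepsilon^2} \mathbb{E}\, d(S_{n^2}, \mathbb{E} Y_1)^2 \le \sum_{n=1}^{\infty} \frac{1}{n^2 \varepsilon^2} \operatorname{var} Y_1 < \infty, \]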
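and the Borel–Cantelli lemma yields the claim. Now assume $Y_n \in \mathcal{L}^\infty(\Omega, \mathcal{H})$, that is, there exists $K > 0$ such that $d(z, Y_1(\omega)) \le K$ for some $z \in \mathcal{H}$ and $\mathbb{P}$-almost every $\omega \in \Omega$. Then also $d(z, Y_n(\omega)) \le K$ for every $n \in \mathbb{N}$ and $\mathbb{P}$-almost every $\omega \in \Omega$, since all the random variables are identically distributed, and by convexity also $d(z, S_n(\omega)) \le K$ for every $n \in \mathbb{N}$ and $\mathbb{P}$-almost every $\omega \in \Omega$. Consequently,
\[ d(S_n(\omega), S_{n+1}(\omega)) = \frac{1}{n+1} d(S_n(\omega), Y_{n+1}(\omega)) \le \frac{2}{n+1} K, \]
for $\mathbb{P}$-almost every $\omega \in \Omega$. Therefore, for any $n, k \in \mathbb{N}$ with $n^2 \le k < (n+1)^2$, we have
\[ d(S_{n^2}(\omega), S_k(\omega)) \le \Bigl( \frac{1}{n^2+1} + \frac{1}{n^2+2} + \dots + \frac{1}{k} \Bigr) 2K \le \frac{k - n^2}{n^2}\, 2K \le \frac{4}{n} K, \]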
which gives (ii).

As a nice application of the law of large numbers, we reprove the Jensen inequality of Theorem 2.3.8.

Theorem 7.2.2 (Jensen inequality revisited). Let $(\mathcal{H}, d)$ be a Hadamard space and $\mu \in \mathcal{P}^1(\mathcal{H})$. Then for a convex lsc function $f\colon \mathcal{H} \to (-\infty, \infty]$ we have
\[ f(b(\mu)) \le \int_{\mathcal{H}} f(x) \, \mathrm{d}\mu(x). \]

Proof. Step 1: Let us first assume $\mu \in \mathcal{P}^2(\mathcal{H})$ and $f \in \mathcal{L}^2(\mathcal{H}, \mu)$. Choose a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and a sequence $(Y_n)$ of independent random variables $Y_n \in \mathcal{L}^2(\Omega, \mathcal{H})$ such that $\mathbb{P}_{Y_n} = \mu$ for every $n \in \mathbb{N}$. Let $S_n$ be defined by (7.2.4) and put
\[ Z_n := f(Y_n), \qquad \text{and} \qquad T_n := \frac{1}{n} \sum_{i=1}^{n} Z_i, \]
for $n \in \mathbb{N}$. Then by the weak law of large numbers of Theorem 7.2.1 applied to $(Y_n)$ and $(f \circ Y_n)$ we obtain
\[ S_n \to \mathbb{E} Y_1 = b(\mu), \qquad \text{and} \qquad T_n \to \mathbb{E}(f \circ Y_1) = \int_{\mathcal{H}} f \, \mathrm{d}\mu, \tag{7.2.7} \]
in probability. We further prove by induction on $n \in \mathbb{N}$ that $f(S_n) \le T_n$. Indeed, it is true for $n = 1$ and assume it is also true for some fixed $n \in \mathbb{N}$. By the convexity of $f$ we have
\[ f(S_{n+1}) \le \frac{n}{n+1} f(S_n) + \frac{1}{n+1} f(Y_{n+1}) \le \frac{n}{n+1} T_n + \frac{1}{n+1} Z_{n+1} = T_{n+1}. \]
There exist subsequences $(S_{n_k})$ and $(T_{n_k})$ along which the convergence in (7.2.7) holds almost surely. Since $f$ is lsc we get
\[
f(b(\mu)) \le \liminf_{k \to \infty} f(S_{n_k}) \le \liminf_{k \to \infty} T_{n_k} = \int_{\mathcal{H}} f\, \mathrm{d}\mu.
\]
In other words,
\[
f\left(b\left(\mathbb{P}_{Y_1}\right)\right) \le \mathbb{E}(f \circ Y_1). \tag{7.2.8}
\]
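Step 2: Let now $\mu \in \mathcal{P}^1(\mathcal{H})$. By Lemma 2.3.7 we may assume that $f^+ \in \mathcal{L}^1(\mathcal{H}, \mu)$. Choose a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and a random variable $X : \Omega \to \mathcal{H}$ such that $\mathbb{P}_X = \mu$. Then $X \in \mathcal{L}^1(\Omega, \mathcal{H})$ as well as $(f \circ X)^+ \in \mathcal{L}^1(\Omega, \mathbb{P})$. Fix some $z \in \mathcal{H}$ and put
\[
\Omega_n := \{\omega \in \Omega : d(z, X(\omega)) < n,\ f(X(\omega)) < n\}.
\]
Define $X_n \in \mathcal{L}^\infty(\Omega, \mathcal{H})$ with $f \circ X_n \in \mathcal{L}^\infty(\Omega)$ by
\[
X_n(\omega) := \begin{cases} X(\omega) & \text{if } \omega \in \Omega_n, \\ z & \text{if } \omega \in \Omega \setminus \Omega_n. \end{cases}
\]
Then $X_n \to X$ in $\mathcal{L}^1(\Omega, \mathcal{H})$ and hence by Proposition 7.1.2 we obtain $\mathbb{E}X_n \to \mathbb{E}X$. Since $f$ is lsc we arrive at
\[
f(\mathbb{E}X) \le \liminf_{n \to \infty} f(\mathbb{E}X_n).
\]
Moreover,
\[
\mathbb{E}(f \circ X) = \lim_{n \to \infty} \mathbb{E}\left[\chi_{\Omega_n}(f \circ X)\right] = \lim_{n \to \infty} \mathbb{E}\left[\chi_{\Omega_n}(f \circ X_n)\right] = \lim_{n \to \infty} \mathbb{E}(f \circ X_n),
\]
and recalling (7.2.8) we get
\[
\lim_{n \to \infty} \mathbb{E}(f \circ X_n) \ge \liminf_{n \to \infty} f(\mathbb{E}X_n).
\]
In summary,
\[
f(b(\mu)) = f(\mathbb{E}X) \le \mathbb{E}(f \circ X) = \int_{\mathcal{H}} f\, \mathrm{d}\mu,
\]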
which finishes the proof. As a second application of the law of large numbers, we show how to compute means.
Computing the Fréchet mean
Recall first the definition of a mean from Example 2.2.5. If $a_1, \dots, a_N \in \mathcal{H}$ and $(w_1, \dots, w_N) \in \Delta_{N-1}$, then
\[
\operatorname*{arg\,min}_{x \in \mathcal{H}} \sum_{n=1}^{N} w_n\, d(x, a_n)^2
\]
exists, is unique and we call it the mean of $a_1, \dots, a_N$. Except for a few trivial situations, it is not possible to compute a mean exactly and an approximation algorithm is desirable. We have already presented the splitting proximal point algorithm (Theorems 6.3.7 and 6.3.13), which of course applies to computing means in locally compact Hadamard spaces. Here we introduce a different approximation algorithm based on the law of large numbers (Theorem 7.2.1), which however in some special cases coincides with the algorithm from Theorem 6.3.13, as will be discussed in detail in Remark 8.3.3. The approximation sequence is defined as follows.

Algorithm 7.2.3 (Computing a mean). At each step $i \in \mathbb{N}_0$, choose randomly $r_i \in \{1, \dots, N\}$ according to the distribution $w := (w_1, \dots, w_N)$ and put
\[
s_{i+1} := \frac{i}{i+1}\, s_i + \frac{1}{i+1}\, a_{r_i}. \tag{7.2.9}
\]
The convergence of the sequence $(s_i)$ to the mean of $a_1, \dots, a_N$ is guaranteed by Theorem 7.2.1. To explain the details, denote $[N] := \{1, \dots, N\}$ and $\Omega := [N]^{\mathbb{N}_0}$. We define a $\sigma$-algebra on $\Omega$ as the power of the $\sigma$-algebra $2^{[N]}$, and we define a probability measure on $\Omega$ as the power of the probability distribution $w$, respectively. Then the random variables $Y_i : \Omega \to \{a_1, \dots, a_N\}$ given by
\[
Y_i(\omega) := a_{\omega_i}, \qquad \omega = (\omega_1, \omega_2, \dots) \in \Omega,
\]
for each $i \in \mathbb{N}_0$, are independent and identically distributed, and satisfy the assumptions of Theorem 7.2.1. Define then the sequence of random variables $(S_n)$ as in (7.2.4). Since $s_i = S_i(\omega)$ for almost every $\omega \in \Omega$ and every $i \in \mathbb{N}_0$, we obtain the desired convergence to
\[
\mathbb{E}Y_1 = \operatorname*{arg\,min}_{x \in \mathcal{H}} \sum_{n=1}^{N} w_n\, d(x, a_n)^2,
\]
where the equality follows immediately from the definition.
7.3 Conditional expectations
We continue by defining a conditional expectation and establishing its basic properties. It can then be used to develop the theory of stochastic processes and martingales in Hadamard spaces. Since this is already beyond the scope of our book, we at least give a number of references at the end of the present chapter. Our motivation is the Grohs–Wolfowitz theory and a consensus algorithm which follows from it. On the other hand, a conditional expectation is interesting on its own and its definition makes use of metric projections onto convex subsets of a Hadamard space.
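Scalar-valued random variables
Let us first recall the definition of a conditional expectation of a scalar-valued random variable. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. Assume $\mathcal{G} \subset \mathcal{F}$ is another $\sigma$-algebra. It is now important to distinguish whether a function is measurable with respect to $\mathcal{F}$ or $\mathcal{G}$. The restriction of $\mathbb{P}$ onto $\mathcal{G}$ is still denoted $\mathbb{P}$. Let $X : \Omega \to [-\infty, \infty]$ be a random variable from $\mathcal{L}^1(\Omega, \mathcal{F}, \mathbb{P})$, that is, an $\mathcal{F}$-measurable function with $\mathbb{E}|X| < \infty$. Then the function
\[
\varphi(A) := \int_A X\, \mathrm{d}\mathbb{P}, \qquad A \in \mathcal{G},
\]
is a new measure on $\mathcal{G}$, which is absolutely continuous with respect to (the restriction of) the measure $\mathbb{P}$. The Radon–Nikodym theorem hence gives a $\mathcal{G}$-measurable function
\[
\mathbb{E}_{\mathcal{G}} X := \frac{\mathrm{d}\varphi}{\mathrm{d}\mathbb{P}},
\]
which is a unique $\mathcal{G}$-measurable function satisfying
\[
\int_A \mathbb{E}_{\mathcal{G}} X\, \mathrm{d}\mathbb{P} = \int_A X\, \mathrm{d}\mathbb{P}, \tag{7.3.10}
\]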
for every 𝐴 ∈ G. This function 𝔼G 𝑋 is called the conditional expectation of 𝑋 with respect to G. The conditional expectation
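\[
\mathbb{E}_{\mathcal{G}} : \mathcal{L}^1(\Omega, \mathcal{F}, \mathbb{P}) \to \mathcal{L}^1(\Omega, \mathcal{G}, \mathbb{P})
\]
is a linear mapping with the following properties:
(i) If $X \in \mathcal{L}^1(\Omega, \mathcal{F}, \mathbb{P})$ and $Y \in \mathcal{L}^1(\Omega, \mathcal{G}, \mathbb{P})$, then $\mathbb{E}_{\mathcal{G}}(XY) = Y\, \mathbb{E}_{\mathcal{G}} X$. In particular, $\mathbb{E}_{\mathcal{G}}(Y) = Y$.
(ii) If $X \in \mathcal{L}^1(\Omega, \mathcal{F}, \mathbb{P})$ and $X \ge 0$, then $\mathbb{E}_{\mathcal{G}} X \ge 0$.
(iii) If $X \in \mathcal{L}^1(\Omega, \mathcal{F}, \mathbb{P})$ and $f : \mathbb{R} \to \mathbb{R}$ is convex with $f \circ X \in \mathcal{L}^1(\Omega, \mathcal{F}, \mathbb{P})$, then
\[
f\left(\mathbb{E}_{\mathcal{G}} X\right) \le \mathbb{E}_{\mathcal{G}}(f \circ X), \tag{7.3.11}
\]
which is the Jensen inequality for conditional expectations.
(iv) If $X_n \to X$ in $\mathcal{L}^1(\Omega, \mathcal{F}, \mathbb{P})$, or in $\mathcal{L}^2(\Omega, \mathcal{F}, \mathbb{P})$, then $\mathbb{E}_{\mathcal{G}} X_n \to \mathbb{E}_{\mathcal{G}} X$ in $\mathcal{L}^1(\Omega, \mathcal{F}, \mathbb{P})$, or in $\mathcal{L}^2(\Omega, \mathcal{F}, \mathbb{P})$, respectively.

It is important that the conditional expectation of an $L^2$ random variable can be characterized in terms of metric projections.

Proposition 7.3.1. Let $X \in \mathcal{L}^2(\Omega, \mathcal{F}, \mathbb{P})$. Then
\[
\mathbb{E}\left|X - \mathbb{E}_{\mathcal{G}} X\right|^2 \le \mathbb{E}|X - Z|^2,
\]
for every $Z \in \mathcal{L}^2(\Omega, \mathcal{G}, \mathbb{P})$. That is, the conditional expectation $\mathbb{E}_{\mathcal{G}} X$ is the metric projection of $X$ onto the closed linear subspace $\mathcal{L}^2(\Omega, \mathcal{G}, \mathbb{P})$.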
The statement in Proposition 7.3.1 can be used as the definition of the conditional expectation for $L^2$ functions, and such a definition can then be extended to $L^1$ functions by a density argument. This motivates the following developments.
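The projection characterization of Proposition 7.3.1 is easy to test on a finite probability space. The sketch below is an illustration only: it assumes that the sub-$\sigma$-algebra is generated by a partition of a four-point sample space, so that the conditional expectation is the block average, and the helper names `conditional_expectation` and `sq_dist` are hypothetical.

```python
def conditional_expectation(values, probs, blocks):
    """E_G X on a finite space: average X over each block of the partition generating G."""
    result = [0.0] * len(values)
    for block in blocks:
        mass = sum(probs[w] for w in block)
        avg = sum(probs[w] * values[w] for w in block) / mass
        for w in block:
            result[w] = avg
    return result

def sq_dist(u, v, probs):
    """E|U - V|^2 for two random variables on the finite sample space."""
    return sum(p * (a - b) ** 2 for p, a, b in zip(probs, u, v))

probs = [0.25, 0.25, 0.25, 0.25]
X = [1.0, 3.0, 2.0, 6.0]
blocks = [[0, 1], [2, 3]]          # G is generated by {0, 1} and {2, 3}
EX = conditional_expectation(X, probs, blocks)

# Every G-measurable Z is constant on the blocks; none of them is closer to X than EX.
best = sq_dist(X, EX, probs)
candidates = [[c1, c1, c2, c2] for c1 in range(-5, 6) for c2 in range(-5, 6)]
is_projection = all(best <= sq_dist(X, Z, probs) for Z in candidates)
print(EX, best, is_projection)
```

The grid of candidates is of course not exhaustive; it only spot-checks the inequality.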
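Conditional expectation in Hadamard spaces
We will now turn to Hadamard space valued random variables and define conditional expectations for them. Let $(\mathcal{H}, d)$ be a Hadamard space and $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. Assume $\mathcal{G} \subset \mathcal{F}$ is another $\sigma$-algebra. Given an $\mathcal{F}$-measurable mapping $Y : \Omega \to \mathcal{H}$, we define its mean conditional variance as
\[
\operatorname{var}_{\mathcal{G}} Y := \inf\left\{\mathbb{E}\, d(Y, Z)^2 : Z : \Omega \to \mathcal{H},\ \mathcal{G}\text{-measurable}\right\}. \tag{7.3.12}
\]
Recalling (7.1.1) we see that the variance of $Y$ is
\[
\operatorname{var} Y := \inf_{z \in \mathcal{H}} \mathbb{E}\, d(Y, z)^2,
\]
and is equal to the conditional variance with $\mathcal{G} = \{\emptyset, \Omega\}$. In general, we have $\operatorname{var}_{\mathcal{G}} Y \le \operatorname{var} Y$. In the present section, it is convenient to change the notation and denote the Lebesgue space $\mathcal{L}^p(\Omega, \mathcal{H})$ from Proposition 1.2.18 by $\mathcal{L}^p(\mathcal{F})$. Its subset consisting of $\mathcal{G}$-measurable mappings will be denoted $\mathcal{L}^p(\mathcal{G})$. Note that $\mathcal{L}^2(\mathcal{G})$ is a closed convex subset of the Hadamard space $\mathcal{L}^2(\mathcal{F})$.

Definition 7.3.2 (Conditional expectation). Let $Y \in \mathcal{L}^2(\mathcal{F})$. Its conditional expectation under $\mathcal{G}$ is defined as the metric projection of $Y$ on the convex closed set $\mathcal{L}^2(\mathcal{G})$ and we denote it $\mathbb{E}_{\mathcal{G}}(Y)$.

Recall that conditional expectation is well defined due to Theorem 2.1.12, and (2.1.1) gives the following form of a variance inequality
\[
d_2(Z, Y)^2 \ge \operatorname{var}_{\mathcal{G}} Y + d_2\left(Z, \mathbb{E}_{\mathcal{G}} Y\right)^2,
\]
for each $Z \in \mathcal{L}^2(\mathcal{G})$. Observe that if $\mathcal{G} = \{\emptyset, \Omega\}$, we recover the usual expectation, that is, $\mathbb{E}_{\mathcal{G}} Y = \mathbb{E}Y$. Furthermore, by Proposition 7.3.1, we see that this definition gives the usual conditional expectation in the case of $\mathcal{H} = \mathbb{R}$. We can hence use the symbol $\mathbb{E}_{\mathcal{G}}$ for both. Also, the following conditional variance inequality holds.

Proposition 7.3.3 (Conditional variance inequality). Let $Y \in \mathcal{L}^2(\mathcal{F})$ and $Z \in \mathcal{L}^2(\mathcal{G})$. Then
\[
\mathbb{E}_{\mathcal{G}}\, d(Z, Y)^2 \ge \mathbb{E}_{\mathcal{G}}\, d\left(\mathbb{E}_{\mathcal{G}} Y, Y\right)^2 + d\left(\mathbb{E}_{\mathcal{G}} Y, Z\right)^2, \tag{7.3.13}
\]
holds $\mathbb{P}$-almost surely on $\Omega$.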
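Proof. Step 1: Fix a set $A \in \mathcal{G}$ and put
\[
d_A(Z_1, Z_2) := \left[\int_A d\left(Z_1(\omega), Z_2(\omega)\right)^2 \mathrm{d}\mathbb{P}(\omega)\right]^{\frac{1}{2}},
\]
for all $\mathcal{G}$-measurable mappings $Z_1, Z_2 : \Omega \to \mathcal{H}$. Denote by $\mathcal{L}^2(A, \mathcal{G})$ the quotient of $\mathcal{L}^2(\mathcal{G})$ with respect to the equivalence relation $Z_1 \sim Z_2$ if $d_A(Z_1, Z_2) = 0$. Exercise 7.2 yields that $(\mathcal{L}^2(A, \mathcal{G}), d_A)$ is a Hadamard space. Since the function $d_A(\cdot, Y)^2$ is strongly convex on $\mathcal{L}^2(A, \mathcal{G})$, by Proposition 2.2.17 we have a unique minimizer $\mathbb{E}_{A,\mathcal{G}} Y \in \mathcal{L}^2(A, \mathcal{G})$ along with the variance inequality
\[
d_A\left(\mathbb{E}_{A,\mathcal{G}} Y, Y\right)^2 + d_A\left(\mathbb{E}_{A,\mathcal{G}} Y, Z\right)^2 \le d_A(Z, Y)^2, \tag{7.3.14}
\]
for every $Z \in \mathcal{L}^2(A, \mathcal{G})$. Likewise, for each $Z \in \mathcal{L}^2(\Omega \setminus A, \mathcal{G})$, we obtain
\[
d_{\Omega \setminus A}\left(\mathbb{E}_{\Omega \setminus A,\mathcal{G}} Y, Y\right)^2 + d_{\Omega \setminus A}\left(\mathbb{E}_{\Omega \setminus A,\mathcal{G}} Y, Z\right)^2 \le d_{\Omega \setminus A}(Z, Y)^2. \tag{7.3.15}
\]
Obviously,
\[
\begin{aligned}
d_2\left(\mathbb{E}_{\mathcal{G}} Y, Y\right)^2 &= \inf\left\{d_\Omega(Z, Y)^2 : Z \in \mathcal{L}^2(\Omega, \mathcal{G}, Y)\right\} \\
&\ge \inf\left\{d_A(Z, Y)^2 : Z \in \mathcal{L}^2(A, \mathcal{G}, Y)\right\} + \inf\left\{d_{\Omega \setminus A}(Z, Y)^2 : Z \in \mathcal{L}^2(\Omega \setminus A, \mathcal{G}, Y)\right\} \\
&= d_A\left(\mathbb{E}_{A,\mathcal{G}} Y, Y\right)^2 + d_{\Omega \setminus A}\left(\mathbb{E}_{\Omega \setminus A,\mathcal{G}} Y, Y\right)^2.
\end{aligned} \tag{7.3.16}
\]
Step 2: Now define a mapping $Y_A : \Omega \to \mathcal{H}$ by
\[
Y_A := \begin{cases} \mathbb{E}_{A,\mathcal{G}} Y, & \text{on } A, \\ \mathbb{E}_{\Omega \setminus A,\mathcal{G}} Y, & \text{on } \Omega \setminus A. \end{cases}
\]
Observe that $Y_A \in \mathcal{L}^2(\mathcal{G})$. The inequalities (7.3.14)–(7.3.16) imply that $Y_A$ is a unique minimizer of the function $d_2(\cdot, Y)^2$ on $\mathcal{L}^2(\mathcal{G})$. Thus $Y_A = \mathbb{E}_{\mathcal{G}} Y$. In particular, $\mathbb{E}_{\mathcal{G}} Y = \mathbb{E}_{A,\mathcal{G}} Y$ almost surely on $A$. Inequality (7.3.14) then reads
\[
\int_A d\left(Z, \mathbb{E}_{\mathcal{G}} Y\right)^2 \mathrm{d}\mathbb{P} + \int_A d\left(\mathbb{E}_{\mathcal{G}} Y, Y\right)^2 \mathrm{d}\mathbb{P} \le \int_A d(Z, Y)^2\, \mathrm{d}\mathbb{P},
\]
for each $Z \in \mathcal{L}^2(\mathcal{G})$, or equivalently
\[
d\left(Z, \mathbb{E}_{\mathcal{G}} Y\right)^2 + \mathbb{E}_{\mathcal{G}}\, d\left(\mathbb{E}_{\mathcal{G}} Y, Y\right)^2 \le \mathbb{E}_{\mathcal{G}}\, d(Z, Y)^2,
\]
holds $\mathbb{P}$-almost surely on $\Omega$, for each $Z \in \mathcal{L}^2(\mathcal{G})$, which finishes the proof.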
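Theorem 7.3.4. Let $X, Y \in \mathcal{L}^2(\mathcal{F})$. Then we have
\[
d\left(\mathbb{E}_{\mathcal{G}} X, \mathbb{E}_{\mathcal{G}} Y\right) \le \mathbb{E}_{\mathcal{G}}\, d(X, Y),
\]
almost surely, and for every $p \in [1, \infty]$ also
\[
d_p\left(\mathbb{E}_{\mathcal{G}} X, \mathbb{E}_{\mathcal{G}} Y\right) \le d_p(X, Y).
\]
Proof. The conditional variance inequality (7.3.13) gives
\[
d\left(\mathbb{E}_{\mathcal{G}} X, \mathbb{E}_{\mathcal{G}} Y\right)^2 \le \mathbb{E}_{\mathcal{G}}\, d\left(\mathbb{E}_{\mathcal{G}} Y, X\right)^2 - \mathbb{E}_{\mathcal{G}}\, d\left(\mathbb{E}_{\mathcal{G}} X, X\right)^2,
\]
almost surely, as well as
\[
d\left(\mathbb{E}_{\mathcal{G}} X, \mathbb{E}_{\mathcal{G}} Y\right)^2 \le \mathbb{E}_{\mathcal{G}}\, d\left(\mathbb{E}_{\mathcal{G}} X, Y\right)^2 - \mathbb{E}_{\mathcal{G}}\, d\left(\mathbb{E}_{\mathcal{G}} Y, Y\right)^2.
\]
Apply Corollary 1.2.5 to the points $X(\omega)$, $Y(\omega)$, $\mathbb{E}_{\mathcal{G}} X(\omega)$, and $\mathbb{E}_{\mathcal{G}} Y(\omega)$ to obtain
\[
d\left(\mathbb{E}_{\mathcal{G}} X, Y\right)^2 + d\left(X, \mathbb{E}_{\mathcal{G}} Y\right)^2 \le d\left(\mathbb{E}_{\mathcal{G}} X, X\right)^2 + d\left(\mathbb{E}_{\mathcal{G}} Y, Y\right)^2 + 2\, d\left(\mathbb{E}_{\mathcal{G}} X, \mathbb{E}_{\mathcal{G}} Y\right) d(X, Y),
\]
almost surely. Take conditional expectations and add up these inequalities to get
\[
2\, d\left(\mathbb{E}_{\mathcal{G}} X, \mathbb{E}_{\mathcal{G}} Y\right)^2 \le 2\, d\left(\mathbb{E}_{\mathcal{G}} X, \mathbb{E}_{\mathcal{G}} Y\right) \mathbb{E}_{\mathcal{G}}\, d(X, Y),
\]
almost surely, or,
\[
d\left(\mathbb{E}_{\mathcal{G}} X, \mathbb{E}_{\mathcal{G}} Y\right) \le \mathbb{E}_{\mathcal{G}}\, d(X, Y),
\]
which is the first claim. This in turn implies the second one,
\[
\mathbb{E}\, d\left(\mathbb{E}_{\mathcal{G}} X, \mathbb{E}_{\mathcal{G}} Y\right)^p \le \mathbb{E}\left[\mathbb{E}_{\mathcal{G}}\, d(X, Y)\right]^p \le \mathbb{E}\left[\mathbb{E}_{\mathcal{G}}\, d(X, Y)^p\right] \le \mathbb{E}\, d(X, Y)^p,
\]
where the middle inequality follows from (7.3.11). The proof is now complete.

As a corollary we obtain that the definition of conditional expectation $\mathbb{E}_{\mathcal{G}}$ extends continuously from $\mathcal{L}^2(\mathcal{F})$ to $\mathcal{L}^1(\mathcal{F})$. Indeed, approximate a random variable $Y \in \mathcal{L}^1(\mathcal{F})$ by random variables $Y_n \in \mathcal{L}^\infty(\mathcal{F})$ defined as
\[
Y_n(\omega) := \begin{cases} Y(\omega), & \text{if } d(Y(\omega), z) \le n, \\ z, & \text{otherwise,} \end{cases}
\]
where $z \in \mathcal{H}$ is fixed. Then $Y_n \to Y$ in $\mathcal{L}^1(\mathcal{F})$. Theorem 7.3.4 implies that $(\mathbb{E}_{\mathcal{G}} Y_n)$ is a Cauchy sequence in $\mathcal{L}^1(\mathcal{G})$ and we denote its limit point $\mathbb{E}_{\mathcal{G}} Y$. One can conclude that a conditional expectation is well defined also on $\mathcal{L}^1(\mathcal{F})$. Moreover, both inequalities of Theorem 7.3.4 remain valid in $\mathcal{L}^1(\mathcal{F})$. Finally, given $p \in [1, \infty]$, the mapping $\mathbb{E}_{\mathcal{G}} : \mathcal{L}^p(\mathcal{F}) \to \mathcal{L}^p(\mathcal{G})$ is nonexpansive.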
Filtered conditional expectations
Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and $\mathcal{G}_1 \subset \mathcal{G}_2 \subset \mathcal{F}$ be other $\sigma$-algebras. If a random variable $X : \Omega \to [-\infty, \infty]$ is from $\mathcal{L}^1(\Omega, \mathcal{F}, \mathbb{P})$, that is, it is an $\mathcal{F}$-measurable function with $\mathbb{E}|X| < \infty$, then
\[
\mathbb{E}_{\mathcal{G}_1} \mathbb{E}_{\mathcal{G}_2} X = \mathbb{E}_{\mathcal{G}_1} X. \tag{7.3.17}
\]
On the other hand, if $(\mathcal{H}, d)$ is a Hadamard space and $X : \Omega \to \mathcal{H}$ is from $\mathcal{L}^1(\mathcal{F})$, then (7.3.17) is no longer true in general; see Exercise 7.3. To overcome this noncommutativity issue, filtered conditional expectations are now introduced. Let $(\Omega, \mathcal{F})$ be a measurable space. A sequence $(\mathcal{F}_i)_{i \in \mathbb{N}_0}$ of $\sigma$-algebras on $\Omega$ is called a filtration if $\mathcal{F}_0 \subset \mathcal{F}_1 \subset \dots \subset \mathcal{F}$. Then $(\Omega, \mathcal{F}, (\mathcal{F}_i))$ is a filtered measurable space. A filtered measurable space with a probability measure is called a filtered probability space.

Definition 7.3.5 (Filtered conditional expectation). Let $(\Omega, \mathcal{F}, (\mathcal{F}_i), \mathbb{P})$ be a filtered probability space and $Y \in \mathcal{L}^1(\mathcal{F}_l)$ for some $l \in \mathbb{N}_0$. Then we define the filtered conditional expectation by
\[
\mathbb{E}\left(Y \| \mathcal{F}_k\right) := \mathbb{E}_{\mathcal{F}_k} \mathbb{E}_{\mathcal{F}_{k+1}} \cdots \mathbb{E}_{\mathcal{F}_{l-1}} Y, \qquad k \in \mathbb{N}_0,
\]
provided $k < l$. Otherwise we put $\mathbb{E}(Y \| \mathcal{F}_k) := Y$.

The filtered conditional expectation is a crucial building block for developing martingale theory in Hadamard spaces, but it is beyond the scope of the present book. We merely show the following version of the Jensen inequality, which is needed in the proof of Theorem 7.3.16.

Theorem 7.3.6 (Conditional Jensen inequality). Let $(\mathcal{H}, d)$ be a Hadamard space and $(\Omega, \mathcal{F}, (\mathcal{F}_i)_{i \in \mathbb{N}_0}, \mathbb{P})$ be a filtered probability space. If $f : \mathcal{H} \to (-\infty, \infty]$ is a convex lsc function and $Y \in \mathcal{L}^1(\mathcal{F})$ with $\mathbb{E}(f \circ Y)^+ < \infty$, then
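\[
f\left(\mathbb{E}_{\mathcal{F}_i} Y\right) \le \mathbb{E}_{\mathcal{F}_i}(f \circ Y),
\]
and if moreover $Y \in \mathcal{L}^1(\mathcal{F}_k)$, we have
\[
f\left(\mathbb{E}\left(Y \| \mathcal{F}_i\right)\right) \le \mathbb{E}_{\mathcal{F}_i}(f \circ Y),
\]
for each $i, k \in \mathbb{N}_0$ with $i \le k$.

Proof. Step 1: Assume first $Y \in \mathcal{L}^2(\mathcal{F})$ and $\mathbb{E}|f \circ Y|^2 < \infty$. The random variable $\hat{Y} := (Y, f \circ Y)$ has values in the Hadamard space $\hat{\mathcal{H}} := \mathcal{H} \times \mathbb{R}$ and it is an $L^2$ mapping satisfying
\[
\mathbb{E}_{\mathcal{F}_i} \hat{Y} = \left(\mathbb{E}_{\mathcal{F}_i} Y, \mathbb{E}_{\mathcal{F}_i}(f \circ Y)\right).
\]
Since $\hat{Y}(\Omega) \subset \operatorname{epi} f$, we have $\mathbb{E}_{\mathcal{F}_i} \hat{Y} \in \operatorname{epi} f$ and therefore
\[
f\left(\mathbb{E}_{\mathcal{F}_i} Y\right) \le \mathbb{E}_{\mathcal{F}_i}(f \circ Y).
\]
Step 2: Assume now $Y \in \mathcal{L}^1(\mathcal{F})$ and $\mathbb{E}(f \circ Y)^+ < \infty$. Choose $z \in \mathcal{H}$ and put
\[
\Omega_n := \{\omega \in \Omega : d(z, Y(\omega)) < n,\ f(Y(\omega)) < n\}.
\]
Define $Y_n \in \mathcal{L}^\infty(\Omega, \mathcal{H})$ with $f \circ Y_n \in \mathcal{L}^\infty(\Omega)$ by
\[
Y_n(\omega) := \begin{cases} Y(\omega) & \text{if } \omega \in \Omega_n, \\ z & \text{if } \omega \in \Omega \setminus \Omega_n. \end{cases}
\]
Then $Y_n \to Y$ in $\mathcal{L}^1(\mathcal{F})$ and by Theorem 7.3.4 also $\mathbb{E}_{\mathcal{F}_i} Y_n \to \mathbb{E}_{\mathcal{F}_i} Y$ in $\mathcal{L}^1(\mathcal{F})$ as $n \to \infty$. The lower semicontinuity of $f$ implies
\[
\liminf_{n \to \infty} f\left(\mathbb{E}_{\mathcal{F}_i} Y_n\right) \ge f\left(\mathbb{E}_{\mathcal{F}_i} Y\right).
\]
Moreover,
\[
\begin{aligned}
\mathbb{E}_{\mathcal{F}_i}(f \circ Y) &= \lim_{n \to \infty} \mathbb{E}_{\mathcal{F}_i}\left[\chi_{\Omega_n}(f \circ Y)\right] = \lim_{n \to \infty} \mathbb{E}_{\mathcal{F}_i}\left[\chi_{\Omega_n}(f \circ Y_n)\right] \\
&= \lim_{n \to \infty} \mathbb{E}_{\mathcal{F}_i}(f \circ Y_n) \ge \liminf_{n \to \infty} f\left(\mathbb{E}_{\mathcal{F}_i} Y_n\right),
\end{aligned}
\]
by Step 1.
Step 3: Finally, assume $Y \in \mathcal{L}^1(\mathcal{F}_k)$ and $\mathbb{E}(f \circ Y)^+ < \infty$. For $i \in \mathbb{N}$ with $i \le k$ define
\[
Y_i := \mathbb{E}\left(Y \| \mathcal{F}_i\right).
\]
Using Step 2, we can iteratively deduce from $Y_i \in \mathcal{L}^1(\mathcal{F}_k)$ and $\mathbb{E}(f \circ Y_i)^+ < \infty$ that $Y_{i-1} \in \mathcal{L}^1(\mathcal{F}_k)$ and
\[
f \circ Y_{i-1} \le \mathbb{E}_{\mathcal{F}_{i-1}}(f \circ Y_i).
\]
Iterating the last inequality finishes the proof.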
Grohs–Wolfowitz theory
We finish this chapter by presenting a remarkable result about a certain family of stochastic matrices and showing its application to distributed computing. The following notation will be used.
Notation 7.3.7. Given $k \in \mathbb{N}$, we denote $[k] := \{1, \dots, k\}$. For a mapping $x : [k] \to \mathcal{H}$, the symbol $\operatorname{diam}(x)$ stands for the diameter of its range, that is, $\operatorname{diam}(x) := \max_{m,n \in [k]} d(x(m), x(n))$.

Recall some classical definitions.

Definition 7.3.8 (Markov kernels). For every $i, j \in \mathbb{N}_0$ with $i \le j$ let $p_{ij} : [N] \times [N] \to [0, 1]$ be a stochastic matrix. If the family $(p_{ij})_{ij}$ satisfies:
(i) $p_{ii}(m, n) = \delta_{mn}$, for every $i \in \mathbb{N}_0$ and $m, n = 1, \dots, N$,
(ii) $p_{ik} = p_{ij}\, p_{jk}$, for every $i, j, k \in \mathbb{N}_0$ with $i \le j \le k$,
then we say that $(p_{ij})$ form a family of Markov kernels.

Having a family of Markov kernels $p_{ij}$, where $i, j \in \mathbb{N}_0$ with $i \le j$, we define a Markov chain $(X_i)_{i \in \mathbb{N}_0}$ on the space $\Omega := [N]^{\mathbb{N}_0}$ equipped with the standard product $\sigma$-algebra $\mathcal{F}$. Given an initial distribution $\pi$ on $[N]$ we equip the measurable space $(\Omega, \mathcal{F})$ with the probability measure $\mathbb{P}$ which is uniquely determined by its restrictions onto the cylinders:
\[
\mathbb{P}\left(\{k_0\} \times \dots \times \{k_i\} \times [N] \times \dots\right) := \pi(k_0)\, p_{01}(k_0, k_1) \cdots p_{i-1,i}(k_{i-1}, k_i),
\]
where $k_0, \dots, k_i \in [N]$ and $i \in \mathbb{N}_0$. Then for each $i \in \mathbb{N}_0$, the random variable $X_i : \Omega \to [N]$ is given by
\[
X_i(\omega) := \omega_i, \qquad \omega := (\omega_0, \omega_1, \dots) \in \Omega.
\]
This process is adapted to the filtration
\[
\mathcal{F}_i := \sigma\left\{\{k_0\} \times \dots \times \{k_i\} \times [N] \times \dots : (k_0, \dots, k_i) \in [N]^{i+1}\right\},
\]
where $i \in \mathbb{N}_0$, meaning that the random variable $X_i$ is $\mathcal{F}_i$-measurable. If the expectation $\mathbb{E}$ is taken with respect to $\mathbb{P}$, then the linear Markov property, that is,
\[
\mathbb{E}_{\mathcal{F}_i} f(X_j)(\omega) = \sum_{n=1}^{N} p_{ij}(X_i(\omega), n)\, f(n), \qquad \omega \in \Omega, \tag{7.3.18}
\]
holds for an arbitrary function $f : [N] \to [0, \infty)$ and $i, j \in \mathbb{N}_0$ with $i \le j$.

Example 7.3.9 (Homogeneous Markov chain). Let $A \in \mathbb{R}^{N \times N}$ be a stochastic matrix and put $p_{i-1,i} := A$ for $i \in \mathbb{N}$. Then $p_{ij} = A^{j-i}$ are Markov kernels, where $i, j \in \mathbb{N}_0$ and $i \le j$, and the associated Markov process is called homogeneous.

Proposition 7.3.10 (Nonlinear Markov property). Let $p_{ij}$ be Markov kernels for $i, j \in \mathbb{N}_0$ with $i \le j$ and let $x : [N] \to \mathcal{H}$ be a mapping. Then we have
\[
\mathbb{E}_{\mathcal{F}_i}(x \circ X_j)(\omega) = \operatorname*{arg\,min}_{z \in \mathcal{H}} \sum_{n=1}^{N} p_{ij}(X_i(\omega), n)\, d(x(n), z)^2,
\]
for every $i, j \in \mathbb{N}_0$ with $i \le j$. Here $(X_i)$ is the Markov chain constructed above.
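Proof. Step 1: We will first prove that
\[
\int_\Omega \sum_{n=1}^{N} p_{ij}(X_i(\omega), n)\, f\left(x(n), Y(\omega)\right) \mathrm{d}\mathbb{P}(\omega) = \int_\Omega f\left(x \circ X_j(\omega), Y(\omega)\right) \mathrm{d}\mathbb{P}(\omega),
\]
holds for each $Y : \Omega \to \mathcal{H}$ which is $\mathcal{F}_i$-measurable and each measurable function $f : \mathcal{H} \times \mathcal{H} \to [0, \infty)$. By a monotone class argument, we may assume $f(u, v) = g(u) h(v)$ for some measurable functions $g, h : \mathcal{H} \to [0, \infty)$. Then we obtain
\[
\int_\Omega \sum_{n=1}^{N} p_{ij}(X_i(\omega), n)\, g(x(n))\, h(Y(\omega))\, \mathrm{d}\mathbb{P}(\omega) = \int_\Omega h(Y(\omega)) \sum_{n=1}^{N} p_{ij}(X_i(\omega), n)\, g(x(n))\, \mathrm{d}\mathbb{P}(\omega),
\]
and using the linear Markov property (7.3.18) we get
\[
\begin{aligned}
&= \int_\Omega h(Y(\omega))\, \mathbb{E}_{\mathcal{F}_i}\left[g(x \circ X_j)\right] \mathrm{d}\mathbb{P}(\omega) \\
&= \int_\Omega \mathbb{E}_{\mathcal{F}_i}\left[h(Y(\omega))\, g(x \circ X_j)\right] \mathrm{d}\mathbb{P}(\omega) \\
&= \mathbb{E}\left[h(Y)\, g(x \circ X_j)\right],
\end{aligned}
\]
which is already the desired equality.
Step 2: If we denote
\[
Y(\omega) := \operatorname*{arg\,min}_{z \in \mathcal{H}} \sum_{n=1}^{N} p_{ij}(X_i(\omega), n)\, d(x(n), z)^2, \qquad \omega \in \Omega,
\]
we see that $Y$ is $\mathcal{F}_i$-measurable, because $X_i$ is such. It remains to show that $\mathbb{E}\, d(x \circ X_j, Y)^2 \le \mathbb{E}\, d(x \circ X_j, Z)^2$ for every $\mathcal{F}_i$-measurable random variable $Z : \Omega \to \mathcal{H}$. We obviously have
\[
\sum_{n=1}^{N} p_{ij}(X_i, n)\, d(x(n), Y)^2 \le \sum_{n=1}^{N} p_{ij}(X_i, n)\, d(x(n), Z)^2,
\]
and consequently
\[
\begin{aligned}
\mathbb{E}\, d(x \circ X_j, Y)^2 &= \mathbb{E} \sum_{n=1}^{N} p_{ij}(X_i, n)\, d(x(n), Y)^2 \\
&\le \mathbb{E} \sum_{n=1}^{N} p_{ij}(X_i, n)\, d(x(n), Z)^2 \\
&= \mathbb{E}\, d(x \circ X_j, Z)^2.
\end{aligned}
\]
The proof is complete.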
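For the homogeneous chain of Example 7.3.9 the objects in the nonlinear Markov property can be computed by hand. The sketch below is an illustration under assumptions not made in the text: the Hadamard space is the real line, where the arg min is the weighted average, the states are indexed from 0 as is usual in Python, and the helper names `matmul`, `kernel` and `conditional_mean` are hypothetical.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def kernel(A, i, j):
    """Markov kernel p_ij = A^(j-i) of the homogeneous chain."""
    n = len(A)
    p = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for _ in range(j - i):
        p = matmul(p, A)
    return p

def conditional_mean(p, x, m):
    """arg min over z of sum_n p(m, n) (x(n) - z)^2; on the real line this is the weighted average."""
    return sum(p[m][n] * x[n] for n in range(len(x)))

A = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
x = [0.0, 4.0, 8.0]

p02 = kernel(A, 0, 2)
# Value of E_{F_0}(x o X_2) on the event {X_0 = state 0}.
value = conditional_mean(p02, x, 0)
# Property (ii) of Definition 7.3.8 for (i, j, k) = (0, 1, 3).
p03 = kernel(A, 0, 3)
p01_p13 = matmul(kernel(A, 0, 1), kernel(A, 1, 3))
print(p02[0], value)
```

As Remark 7.3.11 points out, the computed value depends only on the current state and the kernel, not on the initial distribution.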
Remark 7.3.11. The expectation $\mathbb{E}$ of course depends on the initial probability distribution $\pi$, but Proposition 7.3.10 tells us that $\mathbb{E}_{\mathcal{F}_i}(x \circ X_j)(\omega)$ is independent of $\pi$.

A stochastic matrix $W \in \mathbb{R}^{N \times N}$ is irreducible if for every $m, n \in [N]$ there exists $i \in \mathbb{N}$ such that $W^i(m, n) > 0$. The matrix $W$ is aperiodic if for every $m, n \in [N]$ we have $\gcd\{k \in \mathbb{N} : W^k(m, n) > 0\} = 1$. The following result is often referred to as the Fundamental theorem of Markov chains.

Theorem 7.3.12. Let $W \in \mathbb{R}^{N \times N}$ be a stochastic matrix. It is irreducible and aperiodic if and only if the limit $Q := \lim_{i \to \infty} W^i$ exists and the matrix $Q$ has all rows identical.

A stochastic irreducible and aperiodic matrix $W$ is called an SIA matrix. Note that $(Q(1, 1), \dots, Q(1, N))$ is then a unique stationary distribution on $[N]$ for the homogeneous Markov chain from Example 7.3.9, that is, it is the only vector $q \in [0, 1]^N$ satisfying $q = qW$.

Definition 7.3.13 (SIA family). A family $\{W_1, \dots, W_k\} \subset \mathbb{R}^{N \times N}$ is called an SIA family if every matrix
\[
W_\eta := W_{\eta_1} W_{\eta_2} \cdots W_{\eta_i}
\]
is SIA, where $\eta \in [k]^i$ and $i \in \mathbb{N}$.

With regard to Theorem 7.3.12 it is natural to introduce the following pseudonorm
\[
\|A\| := \max_{n \in [N]} \max_{k,m \in [N]} |A(k, n) - A(m, n)|,
\]
of a matrix $A \in \mathbb{R}^{N \times N}$. It quantifies how different the rows of a given matrix are. A sufficiently long word composed from an SIA family is a matrix with almost identical rows, as the following Wolfowitz theorem states.

Theorem 7.3.14. Let $\{W_1, \dots, W_k\} \subset \mathbb{R}^{N \times N}$ be an SIA family. Then for every $\varepsilon > 0$ there exists $i_0 \in \mathbb{N}$ such that $\|W_\eta\| < \varepsilon$ whenever $\eta \in [k]^i$ for some $i \ge i_0$.
Proof. See [206].

Example 7.3.15. Let $\{W_1, \dots, W_k\} \subset \mathbb{R}^{N \times N}$ be an SIA family and $\eta \in [k]^{\mathbb{N}}$. If we put $p_{i,i+1} := W_{\eta_i}$ for $i \in \mathbb{N}_0$, then we obtain Markov kernels and Theorem 7.3.14 tells us that the associated Markov chain converges to a stationary state.

We will now show that Theorem 7.3.14 can be extended into a Hadamard space $(\mathcal{H}, d)$ and applied as a consensus algorithm. Given a stochastic matrix $W \in \mathbb{R}^{N \times N}$ and a mapping $x : [N] \to \mathcal{H}$, we define
\[
Wx(m) := \operatorname*{arg\,min}_{y \in \mathcal{H}} \sum_{n=1}^{N} W(m, n)\, d(y, x(n))^2,
\]
where $m = 1, \dots, N$.
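The pseudonorm and the statement of Theorem 7.3.14 can be explored numerically. The following sketch makes one assumption beyond the text: the two matrices are chosen with strictly positive entries, so every word in them is again a positive stochastic matrix and hence SIA. The names `row_spread` and `word` are hypothetical, and indices start at 0.

```python
import itertools

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def row_spread(A):
    """Pseudonorm ||A||: largest difference between two entries of the same column."""
    n = len(A)
    return max(abs(A[k][c] - A[m][c])
               for c in range(n) for k in range(n) for m in range(n))

def word(family, eta):
    """Product W_eta = W_{eta_1} ... W_{eta_i} for a tuple eta of indices into the family."""
    product = family[eta[0]]
    for index in eta[1:]:
        product = matmul(product, family[index])
    return product

family = [[[0.5, 0.5], [0.25, 0.75]],
          [[0.9, 0.1], [0.5, 0.5]]]

# Largest pseudonorm over all words of a given length; it shrinks as the words get longer.
for length in (1, 3, 6):
    worst = max(row_spread(word(family, eta))
                for eta in itertools.product(range(2), repeat=length))
    print(length, worst)
```

Enumerating all words is only feasible for short lengths; the theorem itself gives a uniform bound for all words of length at least $i_0$.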
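Theorem 7.3.16. Let $(\mathcal{H}, d)$ be a Hadamard space and $\{W_1, \dots, W_k\} \subset \mathbb{R}^{N \times N}$ be an SIA family. Then there exist $\kappa \in (0, 1)$ and $i_0 \in \mathbb{N}$ such that we have
\[
\operatorname{diam}(W_\eta x) \le \kappa \operatorname{diam}(x),
\]
for every $\eta \in [k]^i$ with $i \ge i_0$ and every $x : [N] \to \mathcal{H}$.

Proof. We proceed in several steps.
Step 1: We first show that
\[
W_\eta x \circ X_0 = \mathbb{E}\left(x \circ X_i \| \mathcal{F}_0\right), \tag{7.3.19}
\]
for each $\eta \in [k]^i$ and $i \in \mathbb{N}$, with $(X_i)$ being the Markov chain from Example 7.3.15. Indeed, by Proposition 7.3.10 we have
\[
\mathbb{E}_{\mathcal{F}_{i-1}}(x \circ X_i)(\omega) = \operatorname*{arg\,min}_{z \in \mathcal{H}} \sum_{n=1}^{N} W_{\eta_i}(X_{i-1}(\omega), n)\, d(z, x(n))^2 = W_{\eta_i} x \circ X_{i-1}(\omega),
\]
for each $\omega \in \Omega$, and repeating the argument yields
\[
\mathbb{E}_{\mathcal{F}_{i-2}}\left(\mathbb{E}_{\mathcal{F}_{i-1}}(x \circ X_i)\right)(\omega) = \mathbb{E}_{\mathcal{F}_{i-2}}\left(W_{\eta_i} x \circ X_{i-1}\right)(\omega) = W_{\eta_{i-1}} W_{\eta_i} x \circ X_{i-2}(\omega),
\]
and easily also (7.3.19).
Step 2: Let $z \in \mathcal{H}$ and $i \in \mathbb{N}$. Using (7.3.19) gives
\[
d\left(W_\eta x \circ X_0, z\right) = d\left(\mathbb{E}\left(x \circ X_i \| \mathcal{F}_0\right), z\right)
\]
and if we apply Theorem 7.3.6, then
\[
\le \mathbb{E}_{\mathcal{F}_0}\left[d(x \circ X_i, z)\right],
\]
which by (7.3.18) gives
\[
= \sum_{k=1}^{N} p_{0i}(X_0(\omega), k)\, d(z, x(k)).
\]
We hence arrive at
\[
d\left(W_\eta x(m), z\right) \le \sum_{k=1}^{N} p_{0i}(m, k)\, d(z, x(k)) \tag{7.3.20}
\]
for each $m \in [N]$. Put now $z := W_\eta x(n)$, with $n \in [N]$, and repeat the argument leading to (7.3.20) to obtain
\[
d\left(W_\eta x(m), W_\eta x(n)\right) \le \sum_{k,l=1}^{N} p_{0i}(m, k)\, p_{0i}(n, l)\, d(x(k), x(l)). \tag{7.3.21}
\]
Step 3: The right-hand side (RHS) of (7.3.21) can be further estimated
\[
\begin{aligned}
\mathrm{RHS} &= \sum_{k,l=1}^{N} p_{0i}(1, k)\, p_{0i}(1, l)\, d(x(k), x(l)) \\
&\quad + \sum_{k,l=1}^{N} \left[p_{0i}(m, k) - p_{0i}(1, k)\right] p_{0i}(n, l)\, d(x(k), x(l)) \\
&\quad + \sum_{k,l=1}^{N} p_{0i}(1, k) \left[p_{0i}(n, l) - p_{0i}(1, l)\right] d(x(k), x(l)) \\
&\le \sum_{k,l=1}^{N} p_{0i}(1, k)\, p_{0i}(1, l)\, d(x(k), x(l)) + 2 \|p_{0i}\| \operatorname{diam}(x) \\
&\le \Bigg(\sum_{\substack{k,l=1 \\ k \ne l}}^{N} p_{0i}(1, k)\, p_{0i}(1, l) + 2 \|p_{0i}\|\Bigg) \operatorname{diam}(x) \\
&= \left(1 - \sum_{k=1}^{N} p_{0i}(1, k)^2 + 2 \|p_{0i}\|\right) \operatorname{diam}(x).
\end{aligned} \tag{7.3.22}
\]
Now observe that
\[
\sum_{k=1}^{N} p_{0i}(1, k)^2 \ge \frac{1}{\sqrt{N}} \sum_{k=1}^{N} p_{0i}(1, k) = \frac{1}{\sqrt{N}},
\]
and also, since $p_{0i} = W_{\eta_1} W_{\eta_2} \cdots W_{\eta_i}$, there exists $i_0 \in \mathbb{N}$ by Theorem 7.3.14 such that
\[
\|p_{0i}\| \le \frac{1}{4\sqrt{N}},
\]
whenever $i \ge i_0$. Altogether, we arrive at
\[
d\left(W_\eta x(m), W_\eta x(n)\right) \le \left(1 - \frac{1}{2\sqrt{N}}\right) \operatorname{diam}(x),
\]
which finishes the proof.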
A consensus algorithm
We end this section by showing a possible application of Theorem 7.3.16. Assume we have a mapping $x : [N] \to \mathcal{H}$ and we would like to compute an average element of $x(1), \dots, x(N)$. It may happen that computing the geodesic between a certain pair of points $x(m)$ and $x(n)$ is difficult and we therefore cannot use Algorithm 7.2.3 to compute the mean. A possible solution to this issue is to find an SIA family $\{W_1, \dots, W_k\}$ such that each of its matrices reflects the possibility to compute the geodesic between $x(m)$ and $x(n)$ at a particular step. Then it suffices to put
\[
x_i(\cdot) := W_{\eta_i} x_{i-1}(\cdot),
\]
for $i \in \mathbb{N}$ and Theorem 7.3.16 yields a point $\hat{x} \in \mathcal{H}$ such that $x_i(n) \to \hat{x}$ as $i \to \infty$ for every $n = 1, \dots, N$. The limit point $\hat{x}$ is a natural alternative to the mean if the latter is difficult to compute. Algorithms of this type are called consensus algorithms.
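The consensus iteration $x_i := W_{\eta_i} x_{i-1}$ can be sketched as follows. This is only an illustration under stated assumptions: the Hadamard space is the Euclidean plane, where $Wx(m)$ reduces to the weighted average of the points $x(n)$; the two matrices are taken with strictly positive entries so that they form an SIA family; and the names `apply` and `diam` as well as the random choice of the word $\eta$ are hypothetical. The matrices here do not model any restriction on which geodesics are computable.

```python
import math
import random

def apply(W, x):
    """(W x)(m): weighted mean of the points x(n) with weights W(m, n), computed in the plane."""
    dim = len(x[0])
    return [tuple(sum(W[m][n] * x[n][j] for n in range(len(x))) for j in range(dim))
            for m in range(len(W))]

def diam(x):
    """Diameter of the range of x."""
    return max(math.dist(p, q) for p in x for q in x)

family = [
    [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]],
    [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.3, 0.3, 0.4]],
]

rng = random.Random(1)
x = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
diameters = [diam(x)]
for _ in range(60):
    x = apply(rng.choice(family), x)   # x_i := W_{eta_i} x_{i-1}
    diameters.append(diam(x))

print(diameters[0], diameters[-1], x[0])
```

All three agents end up at (numerically) the same point, which in general differs from the mean of the initial points and depends on the chosen word.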
Exercises
Exercise 7.1. Let $Y_n \in \mathcal{L}^2(\Omega, \mathcal{H})$ be a sequence of random variables which converges to some $Y \in \mathcal{L}^2(\Omega, \mathcal{H})$ in the metric $d_2$. Show that then $(Y_n)$ converges to $Y$ in probability.
Exercise 7.2. Show that the space $(\mathcal{L}^2(A, \mathcal{G}), d_A)$ from the proof of Proposition 7.3.3 is Hadamard. Hint. Mimic the proof of Proposition 1.2.18.
Exercise 7.3. Show that (7.3.17) holds for functions, but not for mappings into a Hadamard space. Hint. If need be, see [194, Example 3.2].
Bibliographical remarks
Probability theory in Hadamard spaces was developed by K.T. Sturm. The material of this chapter comes from his papers [192, 194–196]. We follow the original text very closely. There are also other deep results, which were not mentioned in the present book, for instance, martingale theory [62, 194] and a probabilistic approach to harmonic maps [192, 193, 196]. Proposition 7.1.2 is a special case of [194, Theorem 2.3]. The law of large numbers appeared in [195, Theorem 4.7] and it was used to prove the Jensen inequality in [195, Theorem 6.2]. For computing the Fréchet mean via the law of large numbers, see [16, 143, 195]. Conditional expectation theory (Section 7.3) is from [194]. Its counterpart for scalar-valued random variables can be found in many books on probability, for instance [73]. The nonlinear Markov property (Proposition 7.3.10) is a special case of [194, Theorem 5.2], which was mentioned in a similar form in [75, Proposition 6] and used in [94, Proposition 2.6]. Theorem 7.3.16 is the main result of [94] and it extends Theorem 7.3.14 by J. Wolfowitz [206] into Hadamard spaces. For details on Markov chains and SIA matrices, see for instance [185]. The papers [15, 75, 80, 150] contain further results in probability in Hadamard spaces.
8 Tree space and its applications
Although the Hadamard space is an interesting object on its own and plays an important role in geometry and geometric group theory, we are pleased to present a Hadamard space which has real-world applications. This space was constructed by L. Billera, S. Holmes and K. Vogtmann as a (moduli) space of metric trees with a fixed number of terminal vertices and is now called the BHV tree space, or just tree space. Since such trees typically model the evolutionary history of a group of species, tree space becomes a fitting framework for various computations in statistical phylogenetics. We first present the construction of tree space and then, in Section 8.2, we will be concerned with an algorithm for computing a convex combination of a given pair of trees in tree space. Unlike the approximation algorithms in Chapter 6, this algorithm works in finitely many steps and returns an exact result. Finally, in Section 8.3 we give approximation algorithms for computing medians and means of trees and outline their applications in phylogenetics.
8.1 Construction of the BHV tree space
We now construct a space of trees, which will turn out to be a cubical CAT(0) complex. First we need to make precise what we mean by a tree. Given $n \in \mathbb{N}$, a metric $n$-tree is a tree¹ with $n + 1$ terminal vertices called leaves that are labeled $0, 1, \dots, n$, such that the edges have positive lengths. Vertices other than leaves have no labels since we consider them just as ‘branching points’ and their degree is at least three. The edges which are adjacent to a leaf are called leaf edges, and the remaining edges are called inner. We see an example of a metric 6-tree with three inner edges $e_1$, $e_2$, and $e_3$ in Figure 8.1. Recall that all edges, both leaf and inner, have positive lengths. Instead of a metric $n$-tree, we will write simply a tree. The number $n$ will be fixed and clear from the context. Later, when we consider multiple trees, it will be important that they all have the same number of leaves. The leaf 0 may in some applications play a distinguished role (for instance in phylogenetics it can represent a so-called outgroup), but in our considerations it has the same status as other leaves.

Fig. 8.1. An example of a 6-tree with three inner edges.

¹ A tree is an undirected, connected, cycle-free graph.

Each inner edge of a tree determines a unique partition of the set of leaves $L$ into two disjoint and nonempty subsets $L_1 \cup L_2 = L$ called a split, which we denote $L_1 | L_2$. A split is defined as the disjoint union of the two sets of leaves which arise when we remove the inner edge under consideration. For instance the inner edges $e_1, e_2, e_3$ of the tree in Figure 8.1 have splits $(0, 4, 5, 6 \mid 1, 2, 3)$, $(0, 1, 2, 3 \mid 4, 5, 6)$ and $(0, 1, 2, 3, 4 \mid 5, 6)$, respectively. On the other hand, having a set of leaves and splits subject to certain conditions, we can uniquely construct a tree [72, 184]. Namely, we require each pair of splits $L_1 | L_2$ and $L_1' | L_2'$ be compatible, that is, one of the sets
\[
L_1 \cap L_1', \qquad L_1 \cap L_2', \qquad L_2 \cap L_1', \qquad L_2 \cap L_2'
\]
be empty. We say that a set of inner edges $I$ is compatible if for each pair of edges $e, e' \in I$, the corresponding splits are compatible. These terms will be essential for computing convex combinations of a given pair of points in tree space in Section 8.2.

We will now proceed to construct a space of trees, that is, a space whose elements will be all metric $n$-trees, where $n \in \mathbb{N}$ is a fixed number representing the number of leaf vertices. The resulting space will be the tree space $\mathcal{T}_n$. First, it is useful to observe that one can treat separately leaf edges and inner edges. Indeed, since the lengths of the $n + 1$ leaf edges correspond to points in $[0, \infty)^{n+1}$, the space $\mathcal{T}_n$ is the product of $[0, \infty)^{n+1}$ and a factor representing the inner edges. We will now describe the latter. For simplicity, we may ignore the Euclidean factor of the space in the construction and denote the factor representing the inner edges also by $\mathcal{T}_n$.

Binary $n$-trees have the maximal possible number of inner edges, namely $n - 2$. Fix now a metric $n$-tree $T$ with $r$ inner edges of lengths $l_1, \dots, l_r$, where $1 \le r \le n - 2$. Clearly $(l_1, \dots, l_r)$ lies in the open orthant $(0, \infty)^r$, and conversely, each point of $(0, \infty)^r$ corresponds to an $n$-tree of the same combinatorial structure as $T$. Note that a tree $T'$ is said to have the same combinatorial structure as $T$ if it has the same number of inner edges as $T$ and all its inner edges have the same splits as the inner edges of $T$. In other words, the trees $T$ and $T'$ differ only by inner edge lengths. To each point from the boundary $\partial(0, \infty)^r$ we associate a metric $n$-tree obtained from $T$ by shrinking some inner edges to zero length. Hence, each point from the closed orthant $[0, \infty)^r$ corresponds to a metric $n$-tree of the same combinatorial structure as $T$, where we allowed some inner edges to shrink to zero length. An orthant containing trees that are not binary appears as a face of the orthants corresponding to (at least three) binary trees.
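The compatibility condition for splits translates directly into a check on four intersections. The following sketch is an illustration: the representation of a split as a pair of frozen sets and the function names are assumptions, and the three splits are those of the inner edges $e_1, e_2, e_3$ of the 6-tree in Figure 8.1.

```python
from itertools import combinations

def compatible(split_a, split_b):
    """Two splits are compatible if at least one of the four pairwise intersections is empty."""
    return any(not (p & q) for p in split_a for q in split_b)

def compatible_set(splits):
    """A set of inner edges is compatible if its splits are pairwise compatible."""
    return all(compatible(a, b) for a, b in combinations(splits, 2))

e1 = (frozenset({0, 4, 5, 6}), frozenset({1, 2, 3}))
e2 = (frozenset({0, 1, 2, 3}), frozenset({4, 5, 6}))
e3 = (frozenset({0, 1, 2, 3, 4}), frozenset({5, 6}))
# A split that cannot occur in the same tree as e1: all four intersections are nonempty.
other = (frozenset({0, 1, 4}), frozenset({2, 3, 5, 6}))

print(compatible_set([e1, e2, e3]), compatible(e1, other))
```

The splits of one tree are always pairwise compatible, which is what the first printed value reflects.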
In Figure 8.2, we see a copy of $[0, \infty)^2$ representing all 4-trees of a given combinatorial structure, namely, all 4-trees with two inner edges $e_1$ and $e_2$, such that the split of $e_1$ is $(1, 2 \mid 0, 3, 4)$ and the split of $e_2$ is $(1, 2, 3 \mid 0, 4)$. If the length of $e_1$ is zero, then the tree lies on the vertical boundary ray. If the length of $e_2$ is zero, then the tree lies on the horizontal boundary ray. In summary, each orthant $\mathcal{O} = [0, \infty)^r$, where $1 \le r \le n - 2$, corresponds to a compatible set of inner edges, and conversely, each compatible set of inner edges $A := (e_1, \dots, e_r)$ corresponds to a unique orthant $\mathcal{O}(A)$, which is a copy of $[0, \infty)^r$.

Fig. 8.2. 4-trees of a given combinatorial structure.

Fig. 8.3. Five out of 15 orthants of the tree space $\mathcal{T}_4$.

The tree space $\mathcal{T}_n$ consists of $(2n - 3)(2n - 5) \cdots 5 \cdot 3$ copies of orthants $[0, \infty)^{n-2}$ glued together along lower-dimensional faces, which correspond to non-binary trees, that is, compatible sets of inner edges of cardinality $< n - 2$. Note that the tree all of whose inner edges have zero length can be viewed as the origin of tree space. We hence denote it $0$.

We now equip the tree space $\mathcal{T}_n$ with a metric. Let $T, T' \in \mathcal{T}_n$. If both $T$ and $T'$ lie in the same orthant, we define their distance as the length of the Euclidean line segment $[T, T']$. If they lie in different orthants, we can connect them by a curve consisting of finitely many Euclidean line segments, and define the distance between $T$ and $T'$ as the infimum of the lengths of all such curves. Then $\mathcal{T}_n$ becomes a geodesic space. One can easily observe that each geodesic consists of finitely many Euclidean line segments each of which lies in one orthant. If we subdivide every orthant into the unit cubes having integral vertices, we obtain a cubical complex. The following important theorem states that tree space has nonpositive curvature.
Theorem 8.1.1. The space $\mathcal{T}_n$ is a Hadamard space.
Proof. We have already observed that $\mathcal{T}_n$ is a cubical complex. Indeed, it suffices to divide each orthant into cubes with integral vertices. By Theorem 1.2.14, it is a complete geodesic space. In order to apply Theorem 1.2.15, we need to show that the link of each vertex is a flag complex, which we leave as Exercise 8.1. Since $\mathcal{T}_n$ is obviously simply connected, we conclude that it is a Hadamard space.
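Two elementary facts from the construction in Section 8.1 are easy to compute: the number $(2n - 3)(2n - 5) \cdots 5 \cdot 3$ of top-dimensional orthants, and the distance of two trees lying in the same orthant. The sketch below is an illustration only; the function names are hypothetical, and a tree is represented (by assumption) as the tuple of its inner edge lengths in a common orthant.

```python
def num_top_orthants(n):
    """(2n-3)(2n-5)...5*3: number of orthants [0, inf)^(n-2), one per binary n-tree."""
    count = 1
    for k in range(3, 2 * n - 2, 2):   # k runs over 3, 5, ..., 2n - 3
        count *= k
    return count

def same_orthant_distance(t, t_prime):
    """Distance of two trees in the same orthant: Euclidean distance of the inner edge lengths."""
    return sum((a - b) ** 2 for a, b in zip(t, t_prime)) ** 0.5

print(num_top_orthants(4))                             # 15 orthants, as in Fig. 8.3
print(same_orthant_distance((3.0, 0.0), (0.0, 4.0)))   # 5.0
```

For trees in different orthants the distance is not given by such a formula; computing it is the subject of Section 8.2.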
8.2 Owen–Provan algorithm

Next we are concerned with computing a convex combination of a given pair of trees $T, T' \in \mathcal{T}_n$ and computing their mutual distance. We will describe an ingenious algorithm which, given $t \in [0, 1]$, in a finite number of steps returns the tree $(1-t)T + tT'$ and computes $d(T, T')$. From a practical point of view, it is important that this algorithm works in polynomial time. Our aim is to give a description detailed enough so that the reader understands the algorithm and is able to implement it on a computer. We do not, however, give proofs, due to their technical nature, and refer the reader to the original papers instead.

As in the construction of tree space in Section 8.1, we observe that the leaf edges of any tree on the geodesic between $T$ and $T'$ lie in the Euclidean factor of $\mathcal{T}_n$, and hence we can restrict our attention to the inner edges. Denote by $\mathcal{E}$ the set of inner edges of $T$ and by $\mathcal{E}'$ the set of inner edges of $T'$. If the trees $T$ and $T'$ have a common inner edge, that is, if there exist $e \in \mathcal{E}$ and $e' \in \mathcal{E}'$ which have the same splits, then this common edge will be present in any tree on the geodesic. This means that we can remove this edge and solve the original problem for the two subtrees which arise by removing this common edge. Hence we may assume without loss of generality that $T$ and $T'$ have no edge in common.

Let $\mathcal{A} := \{A_1, \dots, A_k\}$ and $\mathcal{B} := \{B_1, \dots, B_k\}$ be partitions of $\mathcal{E}$ and $\mathcal{E}'$, respectively. If

$$B_1 \cup \dots \cup B_l \cup A_{l+1} \cup \dots \cup A_k$$

is a compatible set for each $l \in \{1, \dots, k\}$, then there exist corresponding orthants $\mathcal{O}_l = \mathcal{O}_l(B_1 \cup \dots \cup B_l \cup A_{l+1} \cup \dots \cup A_k)$ in the tree space $\mathcal{T}_n$. The finite sequence of such orthants $\mathcal{P} := (\mathcal{O}_1, \dots, \mathcal{O}_k)$ is called a path space for the pair $(T, T')$. The shortest curve through a path space which connects $T$ and $T'$ is called a path space geodesic. The pair $(\mathcal{A}, \mathcal{B})$ is called its support. As the next result shows, it suffices to look for a geodesic among path space geodesics.

Proposition 8.2.1. Let $T, T' \in \mathcal{T}_n$ be trees with no edge in common. Then the geodesic connecting $T$ and $T'$ is a path space geodesic for some path space between $T$ and $T'$.
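The path space condition above can be checked mechanically once splits are encoded. The following Python sketch is an illustration only: it assumes the standard notion of compatibility (two splits $X | X^c$ and $Y | Y^c$ of the leaf set are compatible when at least one of the four pairwise intersections is empty), it encodes an inner edge by one side of its split, and all function names are hypothetical.

```python
def splits_compatible(x, y, leaves) -> bool:
    """Assumed standard test: some intersection of the four side pairs is empty."""
    x, y, leaves = frozenset(x), frozenset(y), frozenset(leaves)
    x_sides = (x, leaves - x)
    y_sides = (y, leaves - y)
    return any(not (p & q) for p in x_sides for q in y_sides)


def is_compatible_set(splits, leaves) -> bool:
    """A set of inner edges is compatible if its splits are pairwise compatible."""
    splits = list(splits)
    return all(
        splits_compatible(s, t, leaves)
        for i, s in enumerate(splits)
        for t in splits[i + 1:]
    )


def is_path_space_support(blocks_a, blocks_b, leaves) -> bool:
    """Check that B_1 u ... u B_l u A_{l+1} u ... u A_k is compatible for l = 1..k."""
    k = len(blocks_a)
    if len(blocks_b) != k:
        return False
    for l in range(1, k + 1):
        edges = [s for block in blocks_b[:l] for s in block]
        edges += [s for block in blocks_a[l:] for s in block]
        if not is_compatible_set(edges, leaves):
            return False
    return True


if __name__ == "__main__":
    leaves = {0, 1, 2, 3, 4}
    # Inner edges of T and T' given by one side of each split.
    print(is_path_space_support([[{1, 2}, {3, 4}]], [[{2, 3}, {1, 4}]], leaves))
    print(is_path_space_support([[{1, 2}], [{3, 4}]], [[{2, 3}], [{1, 4}]], leaves))
```

In this small example the one-block support is always admissible, whereas the two-block support fails because the edges with sides {2, 3} and {3, 4} cannot occur in the same tree.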
We shall now proceed to identify path space geodesics. For a set of inner edges $A$, denote

$$\|A\| := \sqrt{\sum_{e \in A} |e|^2},$$

where $|e|$ stands for the length of $e$. It is sometimes important to specify in which tree a given edge is considered; we then use the symbol $|e|_T$ to denote the length of an edge $e$ of a tree $T$.

Theorem 8.2.2. Let $T, T' \in \mathcal{T}_n$ be trees with no edge in common. Then a curve $\gamma : [0, 1] \to \mathcal{T}_n$ such that $\gamma(0) = T$ and $\gamma(1) = T'$ is a geodesic if and only if there exist partitions $\mathcal{A} = \{A_1, \dots, A_k\}$ and $\mathcal{B} = \{B_1, \dots, B_k\}$ of $\mathcal{E}$ and $\mathcal{E}'$, respectively, satisfying the following conditions: (i) For any $l > m$, the sets $A_l$ and $B_m$ are compatible. (ii) The sets satisfy
$$\frac{\|A_1\|}{\|B_1\|} \le \frac{\|A_2\|}{\|B_2\|} \le \dots \le \frac{\|A_k\|}{\|B_k\|}.$$

(iii) For each $l \in \{1, \dots, k\}$, there are no partitions $C_1 \cup C_2$ of $A_l$ and $D_1 \cup D_2$ of $B_l$ such that $C_2$ is compatible with $D_1$ and

$$\frac{\|C_1\|}{\|D_1\|} < \frac{\|C_2\|}{\|D_2\|},$$

and $\gamma$ is a path space geodesic with support $(\mathcal{A}, \mathcal{B})$.

The algorithm for computing a geodesic is based on Theorem 8.2.2. We start with the support $(\mathcal{A}^0, \mathcal{B}^0)$ given by the trivial partitions $\mathcal{A}^0 := \{\mathcal{E}\}$ and $\mathcal{B}^0 := \{\mathcal{E}'\}$, and with the path space geodesic $\gamma_0$ which consists of the line segment connecting $T$ and the origin $0$ of tree space and the line segment connecting $0$ and $T'$. Having a path space geodesic $\gamma_i$ with support $(\mathcal{A}^i, \mathcal{B}^i)$ satisfying conditions (i) and (ii) of Theorem 8.2.2, we check whether condition (iii) of Theorem 8.2.2 is also satisfied. If so, we have the geodesic; otherwise we construct a shorter path space geodesic $\gamma_{i+1}$ with support $(\mathcal{A}^{i+1}, \mathcal{B}^{i+1})$ satisfying conditions (i) and (ii) of Theorem 8.2.2. By [157, Theorem 3.5] we know that this algorithm gives a geodesic in finitely many steps.

Let us now take a closer look at the iterative step $i \to i+1$ and reformulate it as the extension problem for bipartite graphs. Given sets $A \subset \mathcal{E}$ and $B \subset \mathcal{E}'$, we define their incompatibility graph as a bipartite graph $G(A \cup B, E)$ with the vertex set $A \cup B$, whose edges $E$ correspond to pairs $e \in A$ and $f \in B$ with incompatible splits. Clearly, two sets $C \subset A$ and $D \subset B$ are compatible if and only if $C \cup D$ is an independent set in $G(A \cup B, E)$. We define the weight of a vertex to be the squared length of the corresponding tree edge. The extension problem for $G(A \cup B, E)$ asks whether there exist a partition $C_1 \cup C_2$ of $A$ and a partition $D_1 \cup D_2$ of $B$ such that $C_2 \cup D_1$ is an
independent set in $G(A \cup B, E)$, and

$$\frac{\|C_1\|}{\|D_1\|} < \frac{\|C_2\|}{\|D_2\|}. \tag{8.2.1}$$
Hence a path space geodesic $\gamma_i$ with support $(\mathcal{A}^i, \mathcal{B}^i)$ is a geodesic if and only if the extension problem has no solution for any pair $(A_l, B_l)$ of $(\mathcal{A}^i, \mathcal{B}^i)$. To reformulate the extension problem, we first note that scaling the edge lengths does not affect (8.2.1); hence we rescale the edge lengths so that $\|A\| = \|B\| = 1$. Since then $\|C_1\|^2 = 1 - \|C_2\|^2$ and $\|D_2\|^2 = 1 - \|D_1\|^2$, inequality (8.2.1) is equivalent to

$$\|C_2\|^2 + \|D_1\|^2 > 1,$$

and we therefore look for an independent set in $G(A \cup B, E)$ having sufficiently large total weight. Since the sets $C_2$ and $D_1$ form an independent set if and only if their complements $C_1$ and $D_2$ form a vertex cover of the graph $G(A \cup B, E)$, the extension problem is equivalent to finding a vertex cover $C_1 \cup D_2 \subset A \cup B$ such that

$$\|C_1\|^2 + \|D_2\|^2 < 1.$$
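For very small incompatibility graphs, the normalized form of the extension problem can be checked by brute force, which may help when testing an implementation. The Python sketch below is an illustration under stated assumptions: vertex weights are squared edge lengths already rescaled so that each side sums to 1, the graph is passed as a list of incompatible pairs, and the function name is hypothetical.

```python
from itertools import combinations


def extension_solution_bruteforce(weights_a, weights_b, edges):
    """Search for an independent set C2 u D1 of total weight > 1.

    weights_a, weights_b: dict mapping a vertex to its squared edge length,
                          each side assumed to be normalized to total weight 1.
    edges: iterable of pairs (a, b) of incompatible vertices.
    Returns (C2, D1) for a heaviest independent set if its weight exceeds 1,
    and None otherwise.
    """
    edges = list(edges)
    best_weight, best_sets = 0.0, (set(), set())
    a_vertices, b_vertices = list(weights_a), list(weights_b)
    for size_a in range(len(a_vertices) + 1):
        for c2 in combinations(a_vertices, size_a):
            for size_b in range(len(b_vertices) + 1):
                for d1 in combinations(b_vertices, size_b):
                    c2_set, d1_set = set(c2), set(d1)
                    # Skip subsets containing both ends of an incompatible pair.
                    if any(a in c2_set and b in d1_set for a, b in edges):
                        continue
                    weight = sum(weights_a[a] for a in c2) + sum(weights_b[b] for b in d1)
                    if weight > best_weight:
                        best_weight, best_sets = weight, (c2_set, d1_set)
    return best_sets if best_weight > 1.0 + 1e-12 else None


if __name__ == "__main__":
    wa = {"a1": 0.9, "a2": 0.1}
    wb = {"b1": 0.1, "b2": 0.9}
    print(extension_solution_bruteforce(wa, wb, [("a1", "b1"), ("a2", "b2")]))
```

Enumerating all subsets takes exponential time in the number of edges, so this is only a testing aid; the minimum-weight vertex cover formulation is what makes the step efficient.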
For a general graph, this is a typical NP-hard problem (its decision version is NP-complete), but in the case of bipartite graphs it is classically solved via a flow network in polynomial time. Indeed, we transform the bipartite graph $G(A \cup B, E)$, depicted in Figure 8.4, into a flow network as follows. First we impose an orientation on all edges from $E$ so that they go from $A$ to $B$, and assign infinite capacity to each of these edges. Then we add a new vertex called a source, and connect it with each vertex of $A$ so that all these edges are oriented from the source to $A$. Set the capacity of an edge going from the source to a vertex $a \in A$ to be the weight of $a$. In a similar way, we add a new vertex called a sink, and connect all vertices of $B$ with it. These edges are oriented from $B$ to the sink and have capacities equal to the weights of the vertices of $B$. We thus obtain a flow network as in Figure 8.5. The aim now is to push as much flow from the source to the sink as possible. A maximal flow then gives a minimal cut by the max-flow min-cut theorem, and a minimal cut is exactly the desired vertex cover, as one can easily observe. There exist many algorithms which give a maximal flow in polynomial time. We recommend for instance the push-relabel algorithm due to A. Goldberg and R. Tarjan [93].

Recall that we consider two trees $T, T' \in \mathcal{T}_n$ which have no common edge, and our ultimate goal is to compute their distance and convex combination. So far we have found partitions $\mathcal{A} = \{A_1, \dots, A_k\}$ and $\mathcal{B} = \{B_1, \dots, B_k\}$ of the sets of edges $\mathcal{E}$ and $\mathcal{E}'$, respectively, satisfying the properties of Theorem 8.2.2. A path space geodesic $\gamma : [0, 1] \to \mathcal{T}_n$ from $T$ to $T'$ with support $(\mathcal{A}, \mathcal{B})$ consists of $k + 1$ Euclidean line
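[Fig. 8.4. A bipartite graph with vertex weights. The vertices of A carry the weights w1, w2, w3 and the vertices of B carry the weights w4, w5, w6, w7.]

[Fig. 8.5. The flow network with edge capacities. The edges from the source to A have capacities w1, w2, w3, the edges from A to B have capacity ∞, and the edges from B to the sink have capacities w4, w5, w6, w7.]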
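segments

$$\begin{cases} \left\{\gamma(t) : \dfrac{t}{1-t} \le \dfrac{\|A_1\|}{\|B_1\|}\right\}, & \text{if } i = 0, \\[2ex] \left\{\gamma(t) : \dfrac{\|A_i\|}{\|B_i\|} \le \dfrac{t}{1-t} \le \dfrac{\|A_{i+1}\|}{\|B_{i+1}\|}\right\}, & \text{if } i = 1, \dots, k-1, \\[2ex] \left\{\gamma(t) : \dfrac{\|A_k\|}{\|B_k\|} \le \dfrac{t}{1-t}\right\}, & \text{if } i = k, \end{cases}$$

where $t \in [0, 1]$, and where the points on the $i$-th segment are associated with the tree $T_i$ having the edge set

$$B_1 \cup \dots \cup B_i \cup A_{i+1} \cup \dots \cup A_k,$$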
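and edge lengths

$$|e|_{T_i} := \begin{cases} \dfrac{(1-t)\|A_j\| - t\|B_j\|}{\|A_j\|} \, |e|_T, & \text{if } e \in A_j, \\[2ex] \dfrac{t\|B_j\| - (1-t)\|A_j\|}{\|B_j\|} \, |e|_{T'}, & \text{if } e \in B_j. \end{cases}$$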
According to Theorem 8.2.2, this path space geodesic is actually a geodesic. Given now $t \in [0, 1]$, it is straightforward to construct the tree $(1-t)T + tT'$. The distance between $T$ and $T'$ is given by

$$d(T, T') = \left[\sum_{i=1}^{k} \left(\|A_i\| + \|B_i\|\right)^2\right]^{\frac{1}{2}}.$$
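Once a support satisfying Theorem 8.2.2 is known, the distance formula and the edge lengths along the geodesic can be evaluated directly. The Python sketch below is one possible reading of these formulas, not a reference implementation: each block $A_j$ or $B_j$ is assumed to be a dictionary from a hypothetical edge label to a positive edge length, and finding the support itself is not covered.

```python
import math


def block_norm(block) -> float:
    """||A|| for a block given as a dict {edge label: length}."""
    return math.sqrt(sum(length * length for length in block.values()))


def tree_distance(blocks_a, blocks_b) -> float:
    """d(T, T') = sqrt(sum_i (||A_i|| + ||B_i||)^2) for a geodesic support."""
    return math.sqrt(
        sum((block_norm(a) + block_norm(b)) ** 2 for a, b in zip(blocks_a, blocks_b))
    )


def geodesic_point(t, blocks_a, blocks_b):
    """Edge lengths of the tree (1 - t) T + t T' for t in [0, 1].

    Because the ratios ||A_j|| / ||B_j|| are nondecreasing in j, block j still
    contributes the edges of A_j exactly when t / (1 - t) <= ||A_j|| / ||B_j||,
    which is tested here in the division-free form (1 - t)||A_j|| >= t||B_j||.
    Edges may come out with length zero at the breakpoints of the geodesic.
    """
    tree = {}
    for a, b in zip(blocks_a, blocks_b):
        norm_a, norm_b = block_norm(a), block_norm(b)
        if (1 - t) * norm_a >= t * norm_b:
            scale = ((1 - t) * norm_a - t * norm_b) / norm_a
            tree.update({edge: scale * length for edge, length in a.items()})
        else:
            scale = (t * norm_b - (1 - t) * norm_a) / norm_b
            tree.update({edge: scale * length for edge, length in b.items()})
    return tree


if __name__ == "__main__":
    A = [{"e1": 1.0}, {"e2": 2.0}]
    B = [{"f1": 2.0}, {"f2": 1.0}]
    print(tree_distance(A, B))
    print(geodesic_point(0.5, A, B))
```

In the usage example the ratios are 1/2 and 2, so at $t = 1/2$ the first block has already been replaced by its $B$-edges while the second still carries its $A$-edges.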
If two trees have common edges, the decomposition into subtrees with no common edge can be done in linear time. Hence the whole Owen–Provan algorithm for finding a geodesic and computing distances in tree space works in polynomial time.
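The flow-network step described earlier in this section can be sketched as follows. For brevity the sketch uses a plain breadth-first augmenting-path method (Edmonds–Karp) in place of the push-relabel algorithm recommended in the text; the function name, the dictionary-based graph representation and the numerical tolerance are assumptions made for illustration.

```python
from collections import deque


def min_weight_vertex_cover(weights_a, weights_b, edges):
    """Minimum-weight vertex cover of a bipartite graph via max flow / min cut.

    weights_a, weights_b: dict vertex -> weight (squared edge length).
    edges: iterable of pairs (a, b) with a in A and b in B.
    Returns (total weight, C1, D2), where C1 is a subset of A and D2 of B.
    """
    inf = float("inf")
    source, sink = ("source",), ("sink",)
    cap = {}  # residual capacities cap[u][v]

    def add_edge(u, v, c):
        cap.setdefault(u, {})[v] = cap.get(u, {}).get(v, 0.0) + c
        cap.setdefault(v, {}).setdefault(u, 0.0)

    for a, w in weights_a.items():
        add_edge(source, ("A", a), w)
    for b, w in weights_b.items():
        add_edge(("B", b), sink, w)
    for a, b in edges:
        add_edge(("A", a), ("B", b), inf)

    def reachable_from_source():
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in cap.get(u, {}).items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    total = 0.0
    while True:
        parent = reachable_from_source()
        if sink not in parent:
            break
        # Recover the augmenting path and push its bottleneck capacity.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push
        total += push

    # Min cut: A-vertices cut off from the source and B-vertices still reachable.
    cover_a = {a for a in weights_a if ("A", a) not in parent}
    cover_b = {b for b in weights_b if ("B", b) in parent}
    return total, cover_a, cover_b


if __name__ == "__main__":
    wa = {"a1": 0.9, "a2": 0.1}
    wb = {"b1": 0.1, "b2": 0.9}
    print(min_weight_vertex_cover(wa, wb, [("a1", "b1"), ("a2", "b2")]))
```

With weights normalized so that $\|A\| = \|B\| = 1$, the extension problem has a solution exactly when the returned total weight is smaller than 1; the complements of the two returned sets then play the roles of $C_2$ and $D_1$.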
8.3 Medians and means of trees

In phylogenetics, the terminal vertices of a tree represent living species and the tree is to reflect the evolutionary history of these species. As a result of statistical computations (e.g. a Markov chain Monte Carlo simulation) or other methods, biologists obtain a large set of trees and it is desirable to represent this data set by a single tree. There are various reasons why the mean is a good choice. The task hence is, given a finite family of trees, to find their mean. This should serve as a brief motivation for the next developments.

Now we will show how to address the issue of computing a median and mean via the splitting proximal point algorithm as promised in Example 6.3.4. Let $(\mathcal{H}, d)$ be a locally compact Hadamard space. Assume $a_1, \dots, a_N \in \mathcal{H}$ and $(w_1, \dots, w_N) \in \Delta_{N-1}$. We minimize the function

$$f(x) := \sum_{n=1}^{N} w_n d(x, a_n)^p, \qquad x \in \mathcal{H}, \tag{8.3.2}$$
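where $p = 1$ or $p = 2$, employing the splitting PPA with $f_n := w_n d(\cdot, a_n)^p$, for $n = 1, \dots, N$. Given $\lambda > 0$, we denote the resolvent of $f_n$ by $J_\lambda^n$, hence,

$$J_\lambda^n x := \operatorname*{arg\,min}_{y \in \mathcal{H}} \left[ w_n d(y, a_n)^p + \frac{1}{2\lambda} d(y, x)^2 \right], \qquad x \in \mathcal{H}, \tag{8.3.3}$$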
and it is obvious that $J_\lambda^n x$ lies on the geodesic $[x, a_n]$, that is,

$$J_\lambda^n x = (1 - t)x + t a_n,$$

for some $t \in [0, 1]$, and our only task is to find such $t$. In other words, we are to solve a one-dimensional problem at each step. We will now look at explicit forms of the approximation algorithms.
Algorithms for computing the mean

Let us first consider the cyclic order version from (6.3.19). Let $(\lambda_k)$ be a sequence of positive reals satisfying (6.3.18). We start at some point $x_0 \in \mathcal{H}$ and for each $k \in \mathbb{N}_0$ we set

$$\begin{aligned} x_{kN+1} &:= J_{\lambda_k}^1 (x_{kN}), \\ x_{kN+2} &:= J_{\lambda_k}^2 (x_{kN+1}), \\ &\;\;\vdots \\ x_{kN+N} &:= J_{\lambda_k}^N (x_{kN+N-1}). \end{aligned}$$

It is easy to find the resolvents explicitly by (8.3.3). Indeed, fix $k \in \mathbb{N}_0$ and $1 \le n \le N$. Then $x_{kN+n}$ lies on the geodesic $[x_{kN+n-1}, a_n]$ and by an elementary calculation we get
$$x_{kN+n} = (1 - t_k^n) \, x_{kN+n-1} + t_k^n a_n, \tag{8.3.4}$$

where

$$t_k^n = \frac{2\lambda_k w_n}{1 + 2\lambda_k w_n}. \tag{8.3.5}$$
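The cyclic scheme (8.3.4) with coefficients (8.3.5) can be tried out in the simplest Hadamard space, Euclidean space, where convex combinations are computed coordinatewise. The Python sketch below is an illustration only: the function names are hypothetical, and the step sizes $\lambda_k = 1/(k+1)$ are an assumed choice, since condition (6.3.18) is not restated here. In tree space, `convex_combination` would be replaced by the Owen–Provan geodesic computation.

```python
def convex_combination(x, a, t):
    """(1 - t) x + t a in Euclidean space, standing in for a Hadamard space."""
    return tuple((1 - t) * xi + t * ai for xi, ai in zip(x, a))


def cyclic_mean(points, weights, x0, cycles, step=lambda k: 1.0 / (k + 1)):
    """Cyclic-order proximal point iteration for the weighted mean.

    In cycle k every point a_n is visited once, with coefficient
    t = 2 * lam * w_n / (1 + 2 * lam * w_n) as in (8.3.5).
    """
    x = tuple(x0)
    for k in range(cycles):
        lam = step(k)
        for a, w in zip(points, weights):
            t = 2 * lam * w / (1 + 2 * lam * w)
            x = convex_combination(x, a, t)
    return x


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (10.0,)]
    wts = [1 / 3, 1 / 3, 1 / 3]
    print(cyclic_mean(pts, wts, x0=(0.0,), cycles=2000))  # approaches 11/3
```

The iterates approach the weighted mean only in the limit; with decreasing step sizes of this kind the remaining error after $K$ cycles is roughly of order $1/K$ in this example.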
The above algorithm then reads:

Algorithm 8.3.1 (Computing mean, cyclic order version). Given $x_0 \in \mathcal{H}$ and $(\lambda_k)$ satisfying (6.3.18) we set

$$\begin{aligned} x_{kN+1} &:= \frac{1}{1 + 2\lambda_k w_1} \, x_{kN} + \frac{2\lambda_k w_1}{1 + 2\lambda_k w_1} \, a_1, \\ x_{kN+2} &:= \frac{1}{1 + 2\lambda_k w_2} \, x_{kN+1} + \frac{2\lambda_k w_2}{1 + 2\lambda_k w_2} \, a_2, \\ &\;\;\vdots \\ x_{kN+N} &:= \frac{1}{1 + 2\lambda_k w_N} \, x_{kN+N-1} + \frac{2\lambda_k w_N}{1 + 2\lambda_k w_N} \, a_N, \end{aligned}$$

for each $k \in \mathbb{N}_0$ and $n = 1, \dots, N$.

The convergence of the sequence $(x_j)$ produced by Algorithm 8.3.1 to the weighted mean of the points $a_1, \dots, a_N$ follows by Corollary 6.3.15. Note that if the weights are uniform, that is, $w_n = \frac{1}{N}$ for each $n = 1, \dots, N$, then the coefficients $t_k^n$ are independent of $n$. We will now turn to the randomized version from (6.3.25). By a similar process as above we obtain the following algorithm.

Algorithm 8.3.2 (Computing mean, random order version). Let $x_0 \in \mathcal{H}$ be a starting point and $(\lambda_k)$ satisfy (6.3.24). At each step $k \in \mathbb{N}_0$, choose randomly $r_k \in \{1, \dots, N\}$
according to the uniform distribution and put

$$x_{k+1} := \frac{1}{1 + 2\lambda_k w_{r_k}} \, x_k + \frac{2\lambda_k w_{r_k}}{1 + 2\lambda_k w_{r_k}} \, a_{r_k}.$$
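A minimal Python sketch of the random order version follows, again in Euclidean space. The convex combination is passed in as a callable so that a tree space implementation could be substituted; this interface, the function names, the seeded generator and the step sizes $\lambda_k = 1/(k+1)$ are all assumptions for illustration, since condition (6.3.24) is not restated here.

```python
import random


def euclidean_combination(x, a, t):
    """(1 - t) x + t a computed coordinatewise."""
    return tuple((1 - t) * xi + t * ai for xi, ai in zip(x, a))


def random_order_mean(points, weights, x0, steps, combine=euclidean_combination,
                      step=lambda k: 1.0 / (k + 1), rng=None):
    """Random-order proximal point iteration for the weighted mean.

    At step k an index r is drawn uniformly and the iterate moves towards a_r
    with coefficient 2 * lam * w_r / (1 + 2 * lam * w_r).
    """
    rng = rng or random.Random()
    x = tuple(x0)
    for k in range(steps):
        r = rng.randrange(len(points))
        lam = step(k)
        t = 2 * lam * weights[r] / (1 + 2 * lam * weights[r])
        x = combine(x, points[r], t)
    return x


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (10.0,)]
    wts = [1 / 3, 1 / 3, 1 / 3]
    print(random_order_mean(pts, wts, (0.0,), steps=20000, rng=random.Random(0)))
```

Because the indices are random, individual runs fluctuate around the mean 11/3 of this example; only the limit is deterministic.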
The convergence of the sequence $(x_k)$ produced by Algorithm 8.3.2 to the weighted mean of the points $a_1, \dots, a_N$ follows by Corollary 6.3.15.

Remark 8.3.3 (Comparing with Algorithm 7.2.3). We shall now compare Algorithm 8.3.2 with Algorithm 7.2.3. Let us first consider the unweighted case, that is, $w_n = \frac{1}{N}$ for every $n = 1, \dots, N$. At each iteration $k \in \mathbb{N}_0$, Algorithm 7.2.3 selects $r_k \in \{1, \dots, N\}$ according to the uniform distribution and generates a new point

$$s_{k+1} := \frac{k}{k+1} \, s_k + \frac{1}{k+1} \, a_{r_k}.$$

In Algorithm 8.3.2, at each step $k \in \mathbb{N}_0$ we again randomly choose a number $r_k \in \{1, \dots, N\}$ according to the uniform distribution and put

$$x_{k+1} := \frac{1}{1 + 2\lambda_k} \, x_k + \frac{2\lambda_k}{1 + 2\lambda_k} \, a_{r_k}.$$

Thus Algorithm 8.3.2 produces the same sequence as Algorithm 7.2.3 provided we set $\lambda_k := \frac{1}{2k}$ for each $k \in \mathbb{N}$. In other words, Algorithm 7.2.3 is a special case of Algorithm 8.3.2. On the other hand, as far as weighted Fréchet means are concerned, there is a difference between these two algorithms. Indeed, if $w := (w_1, \dots, w_N)$ are the weights, then the algorithm (7.2.9) selects $r_k \in \{1, \dots, N\}$ according to the distribution $w$ and generates a new point

$$s_{k+1} := \frac{k}{k+1} \, s_k + \frac{1}{k+1} \, a_{r_k},$$

that is, with the same coefficients as in the unweighted case. Algorithm 8.3.2, in contrast, still selects $r_k \in \{1, \dots, N\}$ according to the uniform distribution, but the new point is given by

$$x_{k+1} = \frac{1}{1 + 2\lambda_k w_{r_k}} \, x_k + \frac{2\lambda_k w_{r_k}}{1 + 2\lambda_k w_{r_k}} \, a_{r_k},$$

that is, the coefficients now do depend on the weights. In summary, introducing weights affects either the coefficients (Algorithm 8.3.2), or the probability distribution which is used for selecting the points $a_1, \dots, a_N$ (Algorithm 7.2.3).
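The identification of the two coefficient sequences in the unweighted case of Remark 8.3.3 can be verified with exact rational arithmetic. The short Python check below uses the coefficient as it is displayed in the remark, with no weight factor; the function name is hypothetical.

```python
from fractions import Fraction


def ppa_coefficient(lam):
    """Coefficient of the new point a_{r_k}: 2 * lam / (1 + 2 * lam)."""
    return 2 * lam / (1 + 2 * lam)


if __name__ == "__main__":
    for k in range(1, 6):
        lam_k = Fraction(1, 2 * k)
        # Prints the pair (k, 1/(k+1)), matching the coefficient of Algorithm 7.2.3.
        print(k, ppa_coefficient(lam_k))
```

Indeed, with $\lambda_k = \frac{1}{2k}$ one gets $\frac{2\lambda_k}{1 + 2\lambda_k} = \frac{1/k}{1 + 1/k} = \frac{1}{k+1}$.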
Algorithms for computing the median

We have to treat the cyclic and the random case separately. In the cyclic order version, we start at some point $x_0 \in \mathcal{H}$ and for each $k \in \mathbb{N}_0$ we set

$$\begin{aligned} x_{kN+1} &:= J_{\lambda_k}^1 (x_{kN}), \\ x_{kN+2} &:= J_{\lambda_k}^2 (x_{kN+1}), \\ &\;\;\vdots \\ x_{kN+N} &:= J_{\lambda_k}^N (x_{kN+N-1}), \end{aligned}$$

where $J_{\lambda_k}^n$ is the resolvent of the function $f_n := w_n d(\cdot, a_n)$, for $n = 1, \dots, N$, and $(\lambda_k)$ is a sequence of positive reals satisfying (6.3.18). Given $k \in \mathbb{N}_0$ and $1 \le n \le N$, it is obvious that $J_{\lambda_k}^n (x_{kN+n-1})$ lies on the geodesic $[x_{kN+n-1}, a_n]$, that is,
$$x_{kN+n} = (1 - t_k^n) \, x_{kN+n-1} + t_k^n a_n,$$

for some $t_k^n \in [0, 1]$. These coefficients are again easy to determine.

Algorithm 8.3.4 (Computing median, cyclic order version). Given $x_0 \in \mathcal{H}$ and $(\lambda_k)$ satisfying (6.3.18) we set

$$\begin{aligned} x_{kN+1} &:= (1 - t_k^1) \, x_{kN} + t_k^1 a_1, \\ x_{kN+2} &:= (1 - t_k^2) \, x_{kN+1} + t_k^2 a_2, \\ &\;\;\vdots \\ x_{kN+N} &:= (1 - t_k^N) \, x_{kN+N-1} + t_k^N a_N, \end{aligned}$$

with $t_k^n$ defined by

$$t_k^n := \min\left\{1, \frac{\lambda_k w_n}{d(a_n, x_{kN+n-1})}\right\},$$
for each $k \in \mathbb{N}_0$ and $n = 1, \dots, N$.

The convergence of the sequence $(x_j)$ produced by Algorithm 8.3.4 to the median of the points $a_1, \dots, a_N$ with the weights $(w_1, \dots, w_N)$ follows by Corollary 6.3.15 above. Finally, the randomized version can be derived in a similar way.

Algorithm 8.3.5 (Computing median, random order version). Let $x_0 \in \mathcal{H}$ be a starting point and $(\lambda_k)$ satisfy (6.3.24). At each step $k \in \mathbb{N}_0$, choose randomly $r_k \in \{1, \dots, N\}$ according to the uniform distribution and put

$$x_{k+1} := (1 - t_k) \, x_k + t_k a_{r_k},$$

with $t_k$ defined by

$$t_k := \min\left\{1, \frac{\lambda_k w_{r_k}}{d(a_{r_k}, x_k)}\right\}, \tag{8.3.6}$$

for each $k \in \mathbb{N}_0$.
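The median iterations differ from the mean iterations only in the coefficient, which now depends on the current distance to the target point. The following Python sketch of the cyclic order version (Algorithm 8.3.4) works in Euclidean space; the function names, the handling of the case $d = 0$ (the iterate is left where it is) and the step sizes $\lambda_k = 1/(k+1)$ are assumptions for illustration.

```python
import math


def convex_combination(x, a, t):
    """(1 - t) x + t a in Euclidean space, standing in for a Hadamard space."""
    return tuple((1 - t) * xi + t * ai for xi, ai in zip(x, a))


def median_coefficient(lam, w, dist):
    """t = min{1, lam * w / dist}; by convention 0 when the iterate equals a_n."""
    if dist == 0.0:
        return 0.0
    return min(1.0, lam * w / dist)


def cyclic_median(points, weights, x0, cycles, step=lambda k: 1.0 / (k + 1)):
    """Cyclic-order proximal point iteration for the weighted median."""
    x = tuple(x0)
    for k in range(cycles):
        lam = step(k)
        for a, w in zip(points, weights):
            t = median_coefficient(lam, w, math.dist(x, a))
            x = convex_combination(x, a, t)
    return x


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (10.0,)]
    wts = [1 / 3, 1 / 3, 1 / 3]
    print(cyclic_median(pts, wts, x0=(2.0,), cycles=200))  # approaches the median 1
```

Each step moves the iterate towards $a_n$ by the distance $\lambda_k w_n$, or all the way to $a_n$ if it is closer than that, so progress from a distant starting point is slow when the step sizes decay quickly.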
Table 8.1. Multiple sequence alignment of the 13 species.

Squirrel:      TAAGTAGAGGTGAAAAGCCTACCGAGCCTGGTGATAGCTGGTTGTCCAGACTAGAATTTTAGTTCTACTTT
Baboon:        TGGGTAGAGGTGACAAGCCTACCGAGCCTGGTGATAGCTGGTTATCCAAGATAGAATCTTAGTTCAGCCTT
Pika:          CAGATAGTGGTGAAAAGCCAAACGAGCCTAGTGATAGCTGGTTGTCCAGGTTAGAATTTTAGTTCAACTTT
Guinea pig:    GAAGTAGAGGTGAAAAGCCAATCGAGCTTGGTGATAGCTGGTTATCCAAATAAGAATATCAGTTCAGCTTT
Marmoset:      TGGGTAGTGGTGACAAACCTACCGAGCCTGGTGATAGCTGGTTATCCAAGACAGAATCTTAGTTCAACTTT
Orangutan:     TGGGTAGAGGCGACAAACCTACCGAGCCTGGTGATAGCTGGTTGTCCAAGACAGAATCTTAGTTCAACTTT
Mouse:         TAGGTAGAGGTGAAAAGCCTAACGAGCTTGGTGATAGCTGGTTACCCAAAAAATGAATTTAAGTTCAATTTT
Human:         TAGGTAGAGGCGACAAACCTACCGAGCCTGGTGATAGCTGGTTGTCCAAGATAGAATCTTAGTTCAACTTT
Rat:           TAGGTAGAGGTGAAAAGCCTATCGAGCTTGGTGATAGCTGGTTGCCCAAAAAAGAATTTCAGTTCAAACTT
Rabbit:        TAAATAGAGGTGAAAAGCCAACCGAGCCTGGTGATAGCTGGTTGTCCAGAATAGAATTTTAGTTCAACTTT
Chimpanzee:    TAGGTAGAGGCGACAAACCTACCGAGCCTGGTGATAGCTGGTTGTCCAAGATAGAATCTTAGTTCAACTTT
Kangaroo rat:  AAGGTAGAGGTGAAAAGCCTAACGAGCCTGATGATAGCTGGTTGTCCAAGAAAGAATCTTAGTTCACCTTA
Gorilla:       TAGGTAGAGGCGACAAACCTACCGAGCCTGGTGATAGCTGGTTGTCCAAGACAGAATCTTAGTTCAACTTT
The convergence of the sequence (𝑥𝑘 ) produced by Algorithm 8.3.5 to the median of the points 𝑎1 , . . . , 𝑎𝑁 follows by Corollary 6.3.15.
A real data computation

We can apply the preceding algorithms in tree space $\mathcal{T}_n$ to compute the median and mean of a family of trees $T_1, \dots, T_N \in \mathcal{T}_n$. In this case the Owen–Provan algorithm is needed to compute convex combinations of a given pair of points. For instance, in Algorithm 8.3.1 we have to compute the convex combination in (8.3.4), and this is done at each iteration of the algorithm. It is therefore important that the Owen–Provan algorithm works in polynomial time, which makes Algorithm 8.3.1 fast and usable in practice. The same holds for the remaining algorithms for computing medians and means.

To demonstrate a real data computation, we present the following example from phylogenetics. The labels $\{0, \dots, n\}$ of a (phylogenetic) tree now represent a group of living species. In our example 13 species are considered and hence $n = 12$. In Table 8.1 we list the species along with a piece of their DNA sequences², called a multiple sequence alignment. There are several methods for constructing trees out of a multiple sequence alignment, and we refer the interested reader to [34] for a description of one such statistical model relying upon a Markov chain Monte Carlo simulation.³ As a result we obtain sample trees from a posterior distribution in the tree space $\mathcal{T}_{12}$. The mean of the generated trees is an approximation of the posterior expectation, and the higher the number of sample trees, the more accurate this approximation. In our computation we generated 100,000 such trees and depict four of them in Figure 8.6. Looking at, for instance, the position of the guinea pig, one can easily observe that the trees have different topologies and we therefore really need to use multiple orthants
2 This particular data comes from the MTRNR2 gene. 3 For a detailed background, see [159].
[Fig. 8.6. Four out of the 100,000 trees in our example.]

[Fig. 8.7. The mean of the 100,000 trees in our example.]
of tree space and hence the full strength of Hadamard space optimization. The mean of these 100,000 trees was computed using Algorithm 8.3.2 with step sizes $\lambda_k := \frac{1}{k+1}$, where $k \in \mathbb{N}_0$. The result of $10^8$ iterations of this algorithm is depicted in Figure 8.7. This tree is a good point estimate of the posterior distribution.
Exercises

Exercise 8.1. Complete the missing bit in the proof of Theorem 8.1.1 by showing that the link of each vertex is a flag complex. Hint. See [42, Lemma 4.1].
Bibliographical remarks

Tree space was introduced in the seminal paper [42], where it was also shown that it is a Hadamard space. For a long time there was no efficient algorithm for computing convex combinations of a given pair of points, which made the BHV tree space less attractive for applications. This situation changed with the discovery of M. Owen and S. Provan [157]. The Owen–Provan algorithm builds upon prior studies of geodesics in tree space, which were worked out in [42, 158]. Proposition 8.2.1 was proved in [42, Proposition 4.1]; see also [157, Theorem 2.2]. Theorem 8.2.2 contains results from [158] and [157, Theorem 2.5].

The algorithms for computing medians and means appeared in [16]. Their implementations are available as free software [20, 35]. A statistical model for phylogenetic inference which relies upon these algorithms was developed in [34]. As a reference for the graph theory needed in this chapter, we used [161]. We also recommend the excellent book [159] by L. Pachter and B. Sturmfels for the mathematical background of contemporary phylogenetics.

Since a tree-like structure is frequent in nature, tree space can be used as a framework for various biological models, and computing medians and means is often of importance. In this way tree space has been used to model brain arteries [202] and lung airways [83, 84].

The space $P(n, \mathbb{R})$ of symmetric positive definite $n \times n$ matrices with real entries is a Hadamard manifold; see Example 1.2.12. This manifold plays a key role in diffusion tensor imaging as explained in [165], and computing Fréchet means of a finite family of symmetric positive definite matrices is one of the crucial operations [165, Section 3.7]. The algorithms of Section 8.3 can therefore find applications also in this area.

The real data computations and visualizations in this chapter are courtesy of Philipp Benner. I am most grateful to Megan Owen for her very valuable comments on this chapter.
Martin Bridson kindly pointed out to me that the BHV tree space is also a very interesting object in geometric group theory: it is essentially the cone over the link of the rose in the simplicial spine of Outer space introduced in [69] by M. Culler and K. Vogtmann. The BHV construction shows that there is a natural CAT(0) metric on this cone, but in contrast there does not exist a CAT(0) metric on the whole of the spine of Outer space. This result is due to M. Bridson [50]. For a gentle introduction to Culler–Vogtmann’s Outer space, we recommend [201].
References

[1] P. Abramenko and K. S. Brown, Buildings, Graduate Texts in Mathematics 248, Springer, New York, 2008, Theory and applications.
[2] A. D. Aleksandrov, A theorem on triangles in a metric space and some of its applications, Trudy Mat. Inst. Steklov., v 38, Izdat. Akad. Nauk SSSR, Moscow, 1951, pp. 5–23.
[3] S. Alexander, V. Kapovich and A. Petrunin, Alexandrov geometry, Book in preparation.
[4] S. B. Alexander and R. L. Bishop, The Hadamard-Cartan theorem in locally convex metric spaces, Enseign. Math. (2) 36 (1990), 309–320.
[5] S. B. Alexander and R. L. Bishop, Warped products of Hadamard spaces, Manuscripta Math. 96 (1998), 487–505.
[6] A. D. Alexandrow, Die innere Geometrie der konvexen Flächen, Akademie-Verlag, Berlin, 1955.
[7] L. Ambrosio and N. Gigli, A user’s guide to optimal transport, Modelling and optimisation of flows on networks, Lecture Notes in Math. 2062, Springer, Heidelberg, 2013, pp. 1–155.
[8] L. Ambrosio, N. Gigli and G. Savaré, Gradient flows in metric spaces and in the space of probability measures, second ed, Lectures in Mathematics ETH Zürich, Birkhäuser Verlag, Basel, 2008.
[9] L. Ambrosio and G. Savaré, Gradient flows of probability measures, Handbook of differential equations: evolutionary equations. Vol. III, Handb. Differ. Equ., Elsevier/North-Holland, Amsterdam, 2007, pp. 1–136.
[10] K. Aoyama, Y. Kimura, W. Takahashi and M. Toyoda, Approximation of common fixed points of a countable family of nonexpansive mappings in a Banach space, Nonlinear Anal. 67 (2007), 2350–2360.
[11] D. Ariza-Ruiz, L. Leuştean and G. López-Acedo, Firmly nonexpansive mappings in classes of geodesic spaces, Trans. Amer. Math. Soc. 366 (2014), 4299–4322.
[12] H. Attouch, Variational convergence for functions and operators, Applicable Mathematics Series, Pitman (Advanced Publishing Program), Boston, MA, 1984.
[13] H. Attouch, Familles d’opérateurs maximaux monotones et mesurabilité, Ann. Mat. Pura Appl. (4) 120 (1979), 35–111.
[14] A. Auslender, Méthodes numériques pour la résolution des problèmes d’optimisation avec constraintes, Ph.D. thesis, Faculté des Sciences, Grenoble, 1969.
[15] T. Austin, A CAT(0)-valued pointwise ergodic theorem, J. Topol. Anal. 3 (2011), 145–152.
[16] M. Bačák, Computing medians and means in Hadamard spaces, to appear in SIAM J. Optim., arXiv:1210.2145.
[17] M. Bačák, Convergence of semigroups under nonpositive curvature, to appear in Trans. Amer. Math. Soc., arXiv:1211.0414.
[18] M. Bačák, A new proof of the Lie-Trotter-Kato formula in Hadamard spaces, appeared in Commun. Contemp. Math., http://www.worldscientific.com/doi/abs/10.1142/S0219199713500442.
[19] M. Bačák, The proximal point algorithm in metric spaces, Israel J. Math. 194 (2013), 689–701.
[20] M. Bačák, TrAP – Tree Averaging Program: https://github.com/bacak/TrAP, 2013.
[21] M. Bačák and S. Reich, The asymptotic behavior of a class of nonlinear semigroups in Hadamard spaces, accepted in J. Fixed Point Theory Appl., arXiv:1405.6637.
[22] M. Bačák, I. Searston and B. Sims, Alternating projections in CAT(0) spaces, J. Math. Anal. Appl. 385 (2012), 599–607.
[23] J. B. Baillon, Un exemple concernant le comportement asymptotique de la solution du problème $du/dt + \partial\varphi(u) \ni 0$, J. Funct. Anal. 28 (1978), 369–376.
[24] J. B. Baillon, R. E. Bruck and S. Reich, On the asymptotic behavior of nonexpansive mappings and semigroups in Banach spaces, Houston J. Math. 4 (1978), 1–9.
[25] W. Ballmann, Lectures on spaces of nonpositive curvature, DMV Seminar 25, Birkhäuser Verlag, Basel, 1995, With an appendix by Misha Brin.
[26] W. Ballmann, M. Gromov and V. Schroeder, Manifolds of nonpositive curvature, Progress in Mathematics 61, Birkhäuser Boston Inc., Boston, MA, 1985.
[27] S. Banert, Backward–backward splitting in Hadamard spaces, J. Math. Anal. Appl. 414 (2014), 656–665.
[28] H. H. Bauschke and J. M. Borwein, On the convergence of von Neumann’s alternating projection algorithm for two sets, Set-Valued Anal. 1 (1993), 185–212.
[29] H. H. Bauschke, J. V. Burke, F. R. Deutsch, H. S. Hundal and J. D. Vanderwerff, A new proximal point iteration that converges weakly but not in norm, Proc. Amer. Math. Soc. 133 (2005), 1829–1835 (electronic).
[30] H. H. Bauschke, Fenchel duality, Fitzpatrick functions and the extension of firmly nonexpansive mappings, Proc. Amer. Math. Soc. 135 (2007), 135–139 (electronic).
[31] H. H. Bauschke and P. L. Combettes, Convex analysis and monotone operator theory in Hilbert spaces, CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC, Springer, New York, 2011, With a foreword by Hédy Attouch.
[32] H. H. Bauschke, E. Matoušková and S. Reich, Projection and proximal point methods: convergence results and counterexamples, Nonlinear Anal. 56 (2004), 715–738.
[33] H. H. Bauschke and X. Wang, Firmly nonexpansive and Kirszbraun-Valentine extensions: a constructive approach via monotone operator theory, Nonlinear analysis and optimization I. Nonlinear analysis, Contemp. Math. 513, Amer. Math. Soc., Providence, RI, 2010, pp. 55–64.
[34] P. Benner, M. Bačák and P. Y. Bourguignon, Point estimates in phylogenetic reconstructions, To appear in Bioinformatics.
[35] P. Benner, TFBayes: https://github.com/pbenner/tfbayes, 2013.
[36] G. C. Bento, O. P. Ferreira and P. R. Oliveira, Local convergence of the proximal point method for a special class of nonconvex functions on Hadamard manifolds, Nonlinear Anal. 73 (2010), 564–572.
[37] Y. Benyamini and J. Lindenstrauss, Geometric nonlinear functional analysis. Vol. 1, American Mathematical Society Colloquium Publications 48, American Mathematical Society, Providence, RI, 2000.
[38] V. N. Berestovskij and I. G. Nikolaev, Multidimensional generalized Riemannian spaces, Geometry, IV, Encyclopaedia Math. Sci. 70, Springer, Berlin, 1993, pp. 165–243, 245–250.
[39] I. D. Berg and I. G. Nikolaev, Quasilinearization and curvature of Aleksandrov spaces, Geom. Dedicata 133 (2008), 195–218.
[40] D. P. Bertsekas, Incremental proximal methods for large scale convex optimization, Math. Program. 129 (2011), 163–195.
[41] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996.
[42] L. J. Billera, S. P. Holmes and K. Vogtmann, Geometry of the space of phylogenetic trees, Adv. in Appl. Math. 27 (2001), 733–767.
[43] R. L. Bishop and B. O’Neill, Manifolds of negative curvature, Trans. Amer. Math. Soc. 145 (1969), 1–49.
[44] V. S. Borkar, Probability theory, Universitext, Springer-Verlag, New York, 1995, An advanced course.
[45] J. Borwein, S. Reich and I. Shafrir, Krasnoselski-Mann iterations in normed spaces, Canad. Math. Bull. 35 (1992), 21–28.
[46] H. Brézis, Opérateurs maximaux monotones et semi-groupes de contractions dans les espaces de Hilbert, North-Holland Publishing Co., Amsterdam, 1973.
[47] H. Brézis and P. L. Lions, Produits infinis de résolvantes, Israel J. Math. 29 (1978), 329–345.
[48] H. Brezis and A. Pazy, Semigroups of nonlinear contractions on convex sets, J. Functional Analysis 6 (1970), 237–281.
[49] H. Brézis and A. Pazy, Convergence and approximation of semigroups of nonlinear operators in Banach spaces, J. Functional Analysis 9 (1972), 63–74.
[50] M. R. Bridson, Geodesics and curvature in metric simplicial complexes, Group theory from a geometrical viewpoint (Trieste, 1990), World Sci. Publ., River Edge, NJ, 1991, pp. 373–463.
[51] M. R. Bridson and A. Haefliger, Metric spaces of nonpositive curvature, Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences] 319, Springer-Verlag, Berlin, 1999.
[52] R. E. Bruck and S. Reich, Nonexpansive projections and resolvents of accretive operators in Banach spaces, Houston J. Math. 3 (1977), 459–470.
[53] Y. Brudnyi and P. Shvartsman, Stability of the Lipschitz extension property under metric transforms, Geom. Funct. Anal. 12 (2002), 73–79.
[54] F. Bruhat and J. Tits, Groupes réductifs sur un corps local, Inst. Hautes Études Sci. Publ. Math. (1972), 5–251.
[55] S. M. Buckley, K. Falk and D. J. Wraith, Ptolemaic spaces and CAT(0), Glasg. Math. J. 51 (2009), 301–314.
[56] D. Burago, Y. Burago and S. Ivanov, A course in metric geometry, Graduate Studies in Mathematics 33, American Mathematical Society, Providence, RI, 2001.
[57] H. Busemann, Spaces with nonpositive curvature, Acta Math. 80 (1948), 259–310.
[58] H. Busemann, The geometry of geodesics, Academic Press Inc., New York, N. Y., 1955.
[59] C. H. Chen, Warped products of metric spaces of curvature bounded from above, Trans. Amer. Math. Soc. 351 (1999), 4727–4740.
[60] P. R. Chernoff, Note on product formulas for operator semigroups, J. Functional Analysis 2 (1968), 238–242.
[61] P. R. Chernoff, Product formulas, nonlinear semigroups, and addition of unbounded operators, American Mathematical Society, Providence, R. I., 1974, Memoirs of the American Mathematical Society, No. 140.
[62] T. Christiansen and K. T. Sturm, Expectations and martingales in metric spaces, Stochastics 80 (2008), 1–17.
[63] B. Clarke, Geodesics, distance, and the CAT(0) property for the manifold of Riemannian metrics, Math. Z. 273 (2013), 55–93.
[64] B. Clarke and Y. Rubinstein, Ricci flow and the metric completion of the space of Kähler metrics, Amer. J. Math. 135 (2013), 1477–1505.
[65] B. Clarke and Y. A. Rubinstein, Conformal deformations of the Ebin metric and a generalized Calabi metric on the space of Riemannian metrics, Ann. Inst. H. Poincaré Anal. Non Linéaire 30 (2013), 251–274.
[66] P. L. Combettes and J. C. Pesquet, Proximal splitting methods in signal processing, Fixed-point algorithms for inverse problems in science and engineering, Springer Optim. Appl. 49, Springer, New York, 2011, pp. 185–212.
[67] P. Combettes, Fejér monotonicity in convex optimization, Encyclopedia of Optimization (C. Floudas and P. Pardalos, eds.), Kluwer, Boston, MA, 2001.
[68] M. G. Crandall and T. M. Liggett, Generation of semigroups of nonlinear transformations on general Banach spaces, Amer. J. Math. 93 (1971), 265–298.
[69] M. Culler and K. Vogtmann, Moduli of graphs and automorphisms of free groups, Invent. Math. 84 (1986), 91–119.
[70] S. K. Donaldson, Symmetric spaces, Kähler geometry and Hamiltonian dynamics, Northern California Symplectic Geometry Seminar, Amer. Math. Soc. Transl. Ser. 2 196, Amer. Math. Soc., Providence, RI, 1999, pp. 13–33.
[71] S. K. Donaldson, Conjectures in Kähler geometry, Strings and geometry, Clay Math. Proc. 3, Amer. Math. Soc., Providence, RI, 2004, pp. 71–78.
[72] A. Dress, K. Huber, J. Koolen, V. Moulton and A. Spillner, Basic phylogenetic combinatorics, Cambridge University Press, Cambridge, 2012.
[73] R. Durrett, Probability: theory and examples, fourth ed, Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, Cambridge, 2010.
[74] P. B. Eberlein, Geometry of nonpositively curved manifolds, Chicago Lectures in Mathematics, University of Chicago Press, Chicago, IL, 1996.
[75] O. Ebner, Nonlinear Markov semigroups and refinement schemes on metric spaces, preprint, arXiv:1112.6003.
[76] P. Enflo, Uniform structures and square roots in topological groups. I, Israel J. Math. 8 (1970), 230–252, II, Israel J. Math. 8 (1970), 253–272.
[77] P. Enflo, On a problem of Smirnov, Ark. Mat. 8 (1969), 107–109.
[78] P. Enflo, On the nonexistence of uniform homeomorphisms between $L_p$ spaces, Ark. Mat. 8 (1969), 103–105.
[79] K. J. Engel and R. Nagel, A short course on operator semigroups, Universitext, Springer, New York, 2006.
[80] A. Es-Sahib and H. Heinich, Barycentre canonique pour un espace métrique à courbure négative, Séminaire de Probabilités, XXXIII, Lecture Notes in Math. 1709, Springer, Berlin, 1999, pp. 355–370.
[81] R. Espínola and A. Fernández-León, CAT($k$)-spaces, weak convergence and fixed points, J. Math. Anal. Appl. 353 (2009), 410–427.
[82] M. Fabian, P. Habala, P. Hájek, V. Montesinos and V. Zizler, Banach space theory, CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC, Springer, New York, 2011.
[83] A. Feragen, S. Hauberg, M. Nielsen and F. Lauze, Means in spaces of tree-like shapes, International Conference of Computer Vision (ICCV) (2011).
[84] A. Feragen, P. Lo, M. de Bruijne, M. Nielsen and F. Lauze, Towards a theory of statistical tree-shape analysis, IEEE Transactions of Pattern Analysis and Machine Intelligence, in press, arXiv:1207.5371v1.
[85] O. P. Ferreira and P. R. Oliveira, Proximal point algorithm on Riemannian manifolds, Optimization 51 (2002), 257–270.
[86] T. Foertsch, A. Lytchak and V. Schroeder, Nonpositive curvature and the Ptolemy inequality, Int. Math. Res. Not. IMRN (2007), Art. ID rnm100, 15.
[87] Z. Frolík, Concerning topological convergence of sets, Czechoslovak Math. J 10(85) (1960), 168–180.
[88] T. Gelander, A. Karlsson and G. A. Margulis, Superrigidity, generalized harmonic maps and uniformly convex spaces, Geom. Funct. Anal. 17 (2008), 1524–1550.
[89] A. Genel and J. Lindenstrauss, An example concerning fixed points, Israel J. Math. 22 (1975), 81–86.
[90] J. R. Giles, Convex analysis with application in the differentiation of convex functions, Research Notes in Mathematics 58, Pitman (Advanced Publishing Program), Boston, Mass., 1982.
[91] K. Goebel and W. A. Kirk, Topics in metric fixed point theory, Cambridge Studies in Advanced Mathematics 28, Cambridge University Press, Cambridge, 1990.
[92] K. Goebel and S. Reich, Uniform convexity, hyperbolic geometry, and nonexpansive mappings, Monographs and Textbooks in Pure and Applied Mathematics 83, Marcel Dekker Inc., New York, 1984.
[93] A. Goldberg and R. Tarjan, A new approach to the maximum-flow problem, J. Assoc. Comput. Mach. 35 (1988), 921–940.
[94] P. Grohs, Wolfowitz’s theorem and convergence of consensus algorithms in Hadamard spaces, to appear in Proc. Amer. Math. Soc.
[95] M. Gromov, Hyperbolic groups, Essays in group theory, Math. Sci. Res. Inst. Publ. 8, Springer, New York, 1987, pp. 75–263.
[96] M. Gromov and R. Schoen, Harmonic maps into singular spaces and p-adic superrigidity for lattices in groups of rank one, Inst. Hautes Études Sci. Publ. Math. (1992), 165–246.
[97] M. Gromov, Metric structures for Riemannian and non-Riemannian spaces, Progress in Mathematics 152, Birkhäuser Boston Inc., Boston, MA, 1999, Based on the 1981 French original [MR0682063 (85e:53051)], With appendices by M. Katz, P. Pansu and S. Semmes, Translated from the French by Sean Michael Bates.
[98] O. Güler, On the convergence of the proximal point algorithm for convex minimization, SIAM J. Control Optim. 29 (1991), 403–419.
[99] B. Halpern, Fixed points of nonexpanding maps, Bull. Amer. Math. Soc. 73 (1967), 957–961.
[100] H. S. Hundal, An alternating projection that does not converge in norm, Nonlinear Anal. 57 (2004), 35–61.
[101] M. M. Israel, Jr. and S. Reich, Extension and selection problems for nonlinear semigroups in Banach spaces, Math. Japon. 28 (1983), 1–8.
[102] J. Jost, Equilibrium maps between metric spaces, Calc. Var. Partial Differential Equations 2 (1994), 173–204.
[103] J. Jost, Convex functionals and generalized harmonic maps into spaces of nonpositive curvature, Comment. Math. Helv. 70 (1995), 659–673.
[104] J. Jost, Nonpositive curvature: geometric and analytic aspects, Lectures in Mathematics ETH Zürich, Birkhäuser Verlag, Basel, 1997.
[105] J. Jost, Nonlinear Dirichlet forms, New directions in Dirichlet forms, AMS/IP Stud. Adv. Math. 8, Amer. Math. Soc., Providence, RI, 1998, pp. 1–47.
[106] J. Jost, Riemannian geometry and geometric analysis, fifth ed., Universitext, Springer-Verlag, Berlin, 2008.
[107] T. Kato, Nonlinear semigroups and evolution equations, J. Math. Soc. Japan 19 (1967), 508–520.
[108] T. Kato and K. Masuda, Trotter’s product formula for nonlinear semigroups generated by the subdifferentials of convex functionals, J. Math. Soc. Japan 30 (1978), 169–178.
[109] W. A. Kirk, Fixed point theorems in CAT(0) spaces and ℝ-trees, Fixed Point Theory Appl. (2004), 309–316.
[110] W. A. Kirk, Geodesic geometry and fixed point theory. II, International Conference on Fixed Point Theory and Applications, Yokohama Publ., Yokohama, 2004, pp. 113–142.
[111] W. A. Kirk and B. Panyanak, A concept of convergence in geodesic spaces, Nonlinear Anal. 68 (2008), 3689–3696.
[112] W. A. Kirk, Geodesic geometry and fixed point theory, Seminar of Mathematical Analysis (Malaga/Seville, 2002/2003), Colecc. Abierta 64, Univ. Sevilla Secr. Publ., Seville, 2003, pp. 195–225.
[113] M. Kirszbraun, Über die zusammenziehende und Lipschitzsche Transformationen, Fundamenta Math. (1934), 77–108.
[114] U. Kohlenbach, Applied proof theory: proof interpretations and their use in mathematics, Springer Monographs in Mathematics, Springer-Verlag, Berlin, 2008.
[115] U. Kohlenbach and L. Leuştean, Effective metastability of Halpern iterates in CAT(0) spaces, Adv. Math. 231 (2012), 2526–2556.
[116] U. Kohlenbach and L. Leuştean, Addendum to “Effective metastability of Halpern iterates in CAT(0) spaces” [Adv. Math. 231 (5) (2012) 2526–2556], Adv. Math. 250 (2014), 650–651.
[117] U. Kohlenbach and L. Leuştean, Mann iterates of directionally nonexpansive mappings in hyperbolic spaces, Abstr. Appl. Anal. (2003), 449–477.
[118] E. Kopecká and S. Reich, A note on the von Neumann alternating projections algorithm, J. Nonlinear Convex Anal. 5 (2004), 379–386.
[119] E. Kopecká and S. Reich, Asymptotic behavior of resolvents of coaccretive operators in the Hilbert ball, Nonlinear Anal. 70 (2009), 3187–3194.
[120] E. Kopecká and S. Reich, A note on alternating projections in Hilbert space, J. Fixed Point Theory Appl. 12 (2012), 41–47.
[121] N. J. Korevaar and R. M. Schoen, Sobolev spaces and harmonic maps for metric space targets, Comm. Anal. Geom. 1 (1993), 561–659.
[122] L. Kovalev, Lipschitz retraction of finite subsets of Hadamard spaces, preprint, arXiv:1406.6742.
[123] M. A. Krasnoselskij, Two remarks on the method of successive approximations, Uspehi Mat. Nauk (N.S.) 10 (1955), 123–127.
[124] K. Kuwae and T. Shioya, Variational convergence over metric spaces, Trans. Amer. Math. Soc. 360 (2008), 35–75.
[125] S. Lang, Fundamentals of differential geometry, Graduate Texts in Mathematics 191, Springer-Verlag, New York, 1999.
[126] U. Lang, B. Pavlović and V. Schroeder, Extensions of Lipschitz maps into Hadamard spaces, Geom. Funct. Anal. 10 (2000), 1527–1553.
[127] U. Lang and V. Schroeder, Kirszbraun’s theorem and metric spaces of bounded curvature, Geom. Funct. Anal. 7 (1997), 535–560.
[128] J. R. Lee and A. Naor, Extending Lipschitz functions via random metric partitions, Invent. Math. 160 (2005), 59–95.
[129] L. Leustean, A quadratic rate of asymptotic regularity for CAT(0)-spaces, J. Math. Anal. Appl. 325 (2007), 386–399.
[130] C. Li, G. López and V. Martín-Márquez, Monotone vector fields and the proximal point algorithm on Hadamard manifolds, J. Lond. Math. Soc. (2) 79 (2009), 663–683.
[131] C. Li, B. S. Mordukhovich, J. Wang and J. C. Yao, Weak sharp minima on Riemannian manifolds, SIAM J. Optim. 21 (2011), 1523–1560.
[132] T. C. Lim, Remarks on some fixed point theorems, Proc. Amer. Math. Soc. 60 (1976), 179–182 (1977).
[133] P. L. Lions and B. Mercier, Splitting algorithms for the sum of two nonlinear operators, SIAM J. Numer. Anal. 16 (1979), 964–979.
[134] J. Lott and C. Villani, Hamilton-Jacobi semigroup on length spaces and applications, J. Math. Pures Appl. (9) 88 (2007), 219–229.
[135] T. Mabuchi, Some symplectic geometry on compact Kähler manifolds. I, Osaka J. Math. 24 (1987), 227–252.
[136] W. R. Mann, Mean value methods in iteration, Proc. Amer. Math. Soc. 4 (1953), 506–510.
[137] B. Martinet, Régularisation d’inéquations variationnelles par approximations successives, Rev. Française Informat. Recherche Opérationnelle 4 (1970), 154–158.
[138] E. Matoušková and S. Reich, The Hundal example revisited, J. Nonlinear Convex Anal. 4 (2003), 411–427.
[139] U. F. Mayer, Gradient flows on nonpositively curved metric spaces and harmonic maps, Comm. Anal. Geom. 6 (1998), 199–253.
[140] M. Mendel and A. Naor, Expanders with respect to Hadamard spaces and random graphs, to appear in Duke Math. J.
[141] M. Mendel and A. Naor, Spectral calculus and Lipschitz extension for barycentric metric spaces, Anal. Geom. Metr. Spaces 1 (2013), 163–199.
[142] K. Menger, Untersuchungen über allgemeine Metrik, Math. Ann. 100 (1928), 75–163.
[143] E. Miller, M. Owen and S. Provan, Averaging metric phylogenetic trees, preprint, arXiv:1211.7046v1.
[144] I. Miyadera and S. Ôharu, Approximation of semigroups of nonlinear operators, Tôhoku Math. J. (2) 22 (1970), 24–47.
[145] B. Mordukhovich and N. M. Nam, Applications of variational analysis to a generalized Fermat-Torricelli problem, J. Optim. Theory Appl. 148 (2011), 431–454.
[146] B. S. Mordukhovich, N. M. Nam and J. Salinas, Applications of variational analysis to a generalized Heron problem, Appl. Anal. 91 (2012), 1915–1942.
[147] B. S. Mordukhovich, N. M. Nam and J. Salinas, Jr., Solving a generalized Heron problem by means of convex analysis, Amer. Math. Monthly 119 (2012), 87–99.
[148] U. Mosco, Convergence of convex sets and of solutions of variational inequalities, Advances in Math. 3 (1969), 510–585.
[149] A. Naor and S. Sheffield, Absolutely minimal Lipschitz extension of tree-valued mappings, Math. Ann. 354 (2012), 1049–1078.
[150] A. Navas, An L^1 ergodic theorem with values in a nonpositively curved space via a canonical barycenter map, Ergodic Theory Dynam. Systems 33 (2013), 609–623.
[151] O. Nevanlinna and S. Reich, Strong convergence of contraction semigroups and of iterative methods for accretive operators in Banach spaces, Israel J. Math. 32 (1979), 44–58.
[152] J. Neveu, Discrete-parameter martingales, revised ed., North-Holland Publishing Co., Amsterdam, 1975, Translated from the French by T. P. Speed, North-Holland Mathematical Library, Vol. 10.
[153] I. Nikolaev, The tangent cone of an Aleksandrov space of curvature ≤ K, Manuscripta Math. 86 (1995), 137–147.
[154] S. I. Ohta, Markov type of Alexandrov spaces of nonnegative curvature, Mathematika 55 (2009), 177–189.
[155] S. I. Ohta and M. Pálfia, Discrete-time gradient flows and law of large numbers in Alexandrov spaces, preprint, arXiv:1402.1629v1.
[156] Z. Opial, Weak convergence of the sequence of successive approximations for nonexpansive mappings, Bull. Amer. Math. Soc. 73 (1967), 591–597.
[157] M. Owen and S. Provan, A fast algorithm for computing geodesic distances in tree space, IEEE/ACM Trans. Computational Biology and Bioinformatics 8 (2011), 2–13.
[158] M. Owen, Computing geodesic distances in tree space, SIAM J. Discrete Math. 25 (2011), 1506–1529.
[159] L. Pachter and B. Sturmfels (eds.), Algebraic statistics for computational biology, Cambridge University Press, New York, 2005.
[160] E. A. Papa Quiroz and P. R. Oliveira, Proximal point methods for quasiconvex and convex functions with Bregman distances on Hadamard manifolds, J. Convex Anal. 16 (2009), 49–69.
[161] C. Papadimitriou and K. Steiglitz, Combinatorial optimization: algorithms and complexity, Prentice-Hall Inc., Englewood Cliffs, N.J., 1982.
[162] A. Papadopoulos, Metric spaces, convexity and nonpositive curvature, IRMA Lectures in Mathematics and Theoretical Physics 6, European Mathematical Society (EMS), Zürich, 2005.
[163] G. B. Passty, Ergodic convergence to a zero of the sum of monotone operators in Hilbert space, J. Math. Anal. Appl. 72 (1979), 383–390.
[164] A. Pazy, Semigroups of linear operators and applications to partial differential equations, Applied Mathematical Sciences 44, Springer-Verlag, New York, 1983.
[165] X. Pennec, P. Fillard and N. Ayache, A Riemannian framework for tensor computing, International Journal of Computer Vision 66 (2006), 41–66.
[166] R. R. Phelps, Convex sets and nearest points, Proc. Amer. Math. Soc. 8 (1957), 790–797.
[167] S. Reich, Weak convergence theorems for nonexpansive mappings in Banach spaces, J. Math. Anal. Appl. 67 (1979), 274–276.
[168] S. Reich, Product formulas, nonlinear semigroups, and accretive operators, J. Funct. Anal. 36 (1980), 147–168.
[169] S. Reich, A complement to Trotter’s product formula for nonlinear semigroups generated by the subdifferentials of convex functionals, Proc. Japan Acad. Ser. A Math. Sci. 58 (1982), 193–195.
[170] S. Reich, Solutions of two problems of H. Brézis, J. Math. Anal. Appl. 95 (1983), 243–250.
[171] S. Reich, The asymptotic behavior of a class of nonlinear semigroups in the Hilbert ball, J. Math. Anal. Appl. 157 (1991), 237–242.
[172] S. Reich, The alternating algorithm of von Neumann in the Hilbert ball, Dynam. Systems Appl. 2 (1993), 21–25.
[173] S. Reich and I. Shafrir, Nonexpansive iterations in hyperbolic spaces, Nonlinear Anal. 15 (1990), 537–558.
[174] S. Reich and L. Shemen, A note on Halpern’s algorithm in the Hilbert ball, J. Nonlinear Convex Anal. 14 (2013), 853–862.
[175] S. Reich and D. Shoikhet, Semigroups and generators on convex domains with the hyperbolic metric, Atti Accad. Naz. Lincei Cl. Sci. Fis. Mat. Natur. Rend. Lincei (9) Mat. Appl. 8 (1997), 231–250.
[176] S. Reich and S. Simons, Fenchel duality, Fitzpatrick functions and the Kirszbraun-Valentine extension theorem, Proc. Amer. Math. Soc. 133 (2005), 2657–2660 (electronic).
[177] S. Reich and A. J. Zaslavski, Infinite products of resolvents of accretive operators, Topol. Methods Nonlinear Anal. 15 (2000), 153–168, Dedicated to Juliusz Schauder, 1899–1943.
[178] S. Reich and A. J. Zaslavski, Genericity in nonlinear analysis, Developments in Mathematics 34, Springer, New York, 2014.
[179] J. G. Rešetnjak, Nonexpansive maps in a space of curvature no greater than K, Sibirsk. Mat. Ž. 9 (1968), 918–927.
[180] R. T. Rockafellar, Monotone operators and the proximal point algorithm, SIAM J. Control Optimization 14 (1976), 877–898.
[181] S. Saejung, Halpern’s iteration in CAT(0) spaces, Fixed Point Theory Appl. (2010), Art. ID 471781, 13.
[182] T. Sato, An alternative proof of Berg and Nikolaev’s characterization of CAT(0)-spaces via quadrilateral inequality, Arch. Math. (Basel) 93 (2009), 487–490.
[183] S. Semmes, Complex Monge-Ampère and symplectic manifolds, Amer. J. Math. 114 (1992), 495–550.
[184] C. Semple and M. Steel, Phylogenetics, Oxford Lecture Series in Mathematics and its Applications 24, Oxford University Press, Oxford, 2003.
[185] E. Seneta, Non-negative matrices and Markov chains, Springer Series in Statistics, Springer, New York, 2006, Revised reprint of the second (1981) edn. [Springer-Verlag, New York; MR0719544].
[186] I. Shafrir, Coaccretive operators and firmly nonexpansive mappings in the Hilbert ball, Nonlinear Anal. 18 (1992), 637–648.
[187] I. Shafrir, Theorems of ergodic type for ρ-nonexpansive mappings in the Hilbert ball, Ann. Mat. Pura Appl. (4) 163 (1993), 313–327.
[188] E. N. Sosov, On analogues of weak convergence in a special metric space, Izv. Vyssh. Uchebn. Zaved. Mat. (2004), 84–89.
[189] I. Stojkovic, Approximation for convex functionals on nonpositively curved spaces and the Trotter-Kato product formula, Adv. Calc. Var. 5 (2012), 77–126.
[190] J. Streets, The consistency and convergence of K-energy minimizing movements, preprint, arXiv:1301.3948v1.
[191] J. Streets, Long time existence of minimizing movement solutions of Calabi flow, preprint, arXiv:1208.2718v3.
[192] K. T. Sturm, Nonlinear Markov operators associated with symmetric Markov kernels and energy minimizing maps between singular spaces, Calc. Var. Partial Differential Equations 12 (2001), 317–357.
[193] K. T. Sturm, Nonlinear Markov operators, discrete heat flow, and harmonic maps between singular spaces, Potential Anal. 16 (2002), 305–340.
[194] K. T. Sturm, Nonlinear martingale theory for processes with values in metric spaces of nonpositive curvature, Ann. Probab. 30 (2002), 1195–1222.
[195] K. T. Sturm, Probability measures on metric spaces of nonpositive curvature, Heat kernels and analysis on manifolds, graphs, and metric spaces (Paris, 2002), Contemp. Math. 338, Amer. Math. Soc., Providence, RI, 2003, pp. 357–390.
[196] K. T. Sturm, A semigroup approach to harmonic maps, Potential Anal. 23 (2005), 225–277.
[197] H. F. Trotter, Approximation of semigroups of operators, Pacific J. Math. 8 (1958), 887–919.
[198] H. F. Trotter, On the product of semigroups of operators, Proc. Amer. Math. Soc. 10 (1959), 545–551.
[199] M. Tsukada, Convergence of best approximations in a smooth Banach space, J. Approx. Theory 40 (1984), 301–309.
[200] F. A. Valentine, On the extension of a vector function so as to preserve a Lipschitz condition, Bull. Amer. Math. Soc. 49 (1943), 100–108.
[201] K. Vogtmann, What is... outer space?, Notices Amer. Math. Soc. 55 (2008), 784–786.
[202] H. Wang and J. S. Marron, Object oriented data analysis: sets of trees, Ann. Statist. 35 (2007), 1849–1873.
[203] R. A. Wijsman, Convergence of sequences of convex sets, cones and functions. II, Trans. Amer. Math. Soc. 123 (1966), 32–45.
[204] D. Williams, Probability with martingales, Cambridge Mathematical Textbooks, Cambridge University Press, Cambridge, 1991.
[205] R. Wittmann, Approximation of fixed points of nonexpansive mappings, Arch. Math. (Basel) 58 (1992), 486–491.
[206] J. Wolfowitz, Products of indecomposable, aperiodic, stochastic matrices, Proc. Amer. Math. Soc. 14 (1963), 733–737.
[207] S. A. Wolpert, Geometry of the Weil-Petersson completion of Teichmüller space, Surveys in differential geometry, Vol. VIII (Boston, MA, 2002), Surv. Differ. Geom., VIII, Int. Press, Somerville, MA, 2003, pp. 357–393.
[208] S. Yamada, On the geometry of Weil-Petersson completion of Teichmüller spaces, Math. Res. Lett. 11 (2004), 327–344.
[209] T. Yokota, Convex functions and barycenter on CAT(1)-spaces of small radii, preprint, http://www.kurims.kyoto-u.ac.jp/~takumiy/.
[210] H. Yu, Some proof details for asynchronous stochastic approximation algorithms, Unpublished manuscript (2012).
[211] A. J. Zaslavski, Inexact proximal point methods in metric spaces, Set-Valued Var. Anal. 19 (2011), 589–608.
Index

Γ-convergence 103
4-point property 20

A
Alexandrov’s lemma 26
alternating projection method 118
alternating sequence 118
angle between geodesics 9
aperiodic matrix 154
approximate metric midpoints 2
asymptotic center 58
asymptotic cone 16
averaged projection method 119

B
barycenter of a measure 48
BHV tree space 158
boundedly linearly regular sets 118
boundedly regular sets 117
Bridson theorem 14
Busemann function 38
Busemann space 4

C
Calabi equation 102
Calabi flow 102
CAT(0) complex 13
CAT(0) space 5
Cesàro means 67, 122
Chebyshev set 33
coercive function 42
comparison point 3
comparison triangle 3
compatible edges 159
compatible splits 159
completed tangent cone 18
conditional expectation 147
conditional Jensen inequality 150
conditional variance inequality 147
consensus algorithm 156
convergence in probability 141
convex combination 2
convex feasibility problem 117
convex function 30
convex hull 31
convex set 30
coupling of measures 53

D
diffusion tensor imaging 171
displacement function 38
distance function 33, 37
distribution of a random variable 139
domain of a function 36
Donaldson conjecture 102

E
energy functional 39
epigraph 36
ergodic averages 122
Euclidean building 13
Euclidean cone 17
evolution variational inequality 94
expectation 139
extension problem 162

F
Fejér monotone sequence 64
Fermat–Weber problem 37
filtered conditional expectation 150
filtered probability space 131, 150
filtration 131, 150
firmly nonexpansive map 34
flow network 163
formal convex hull 32
Frolík–Wijsman convergence 106

G
geodesic 1
geodesic line 30
geodesic ray 30
geodesic space 1
geodesic triangle 3
gluing along convex sets 14
gradient flow 87
Grohs–Wolfowitz theory 151
Gromov–Hausdorff convergence 16
Gromov–Hausdorff distance 16
guinea pig 169

H
Hadamard manifold 10
Hadamard space 6
Halpern algorithm 122
Hamilton–Jacobi semigroup 42
Hilbert ball 11
homogeneous Markov chain 152
horoball 31

I
identically distributed 141
incompatibility graph 162
independent random variables 140
independent set 162
indicator function 37
inner edge 158
irreducible matrix 154

J
Jensen inequality 52, 143
joint convexity 6

K
Kadec–Klee property 62
Kähler potential 102
Kirszbraun–Valentine theorem 69
Krasnoselski–Mann algorithm 121

L
leaf edge 158
leaf of a tree 158
length of a path 1
length space 1
Lie–Trotter–Kato formula 112
linear convergence 64
linear Markov property 152
linearly regular sets 118
locally Hadamard space 17
lower semicontinuity 36
lsc 36

M
Mabuchi functional 102
Markov chain Monte Carlo 169
Markov kernel 152
martingale 131
max-flow min-cut theorem 163
mean 37
mean conditional variance 147
median 37
metric midpoint 2
metric n-tree 158
metric projection 33
midpoint 2
minimal displacement sets 31
minimizer 36
Möbius transformation 11
Moreau–Yosida envelope 42
Mosco convergence 103
Mosco convergence of sets 105
multiple sequence alignment 169

N
nearest-point mapping 33
nonlinear Lebesgue space 18
nonlinear Markov property 152
nonprincipal ultrafilter 15

O
Opial property 60
origin of tree space 160
Owen–Provan algorithm 161

P
path 1
path space 161
path space geodesic 161
Pettis integral 49
probability measures 48
proximal point algorithm 126
Ptolemy inequality 24
push-relabel algorithm 163

R
random variable 139
rate of linear convergence 64
resolvent identity 44, 74
resolvent of a family of maps 108
resolvent of a function 42
resolvent of a nonexpansive map 73

S
semisimple isometry 31
SIA family 154
SIA matrix 154
sink 163
slope of a function 87
source 163
space of directions 17
split 159
splitting proximal point algorithm 127
strong convergence 58
strongly convex function 36
subembedding 20
supermartingale 131
supermartingale convergence theorem 131

T
tangent cone 18
tree 158
tree space 160

U
ultraextension of a function 40
ultralimit 16
ultrapower 16
ultraproduct 16
uniform Kadec–Klee property 63
uniquely geodesic space 2
universal covering 17

V
variance inequality 50
variance of a measure 50
variance of a random variable 139
vertex cover 163

W
warped product 15
Wasserstein distance 54
weak Banach–Saks property 60
weak cluster point 58
weak convergence 58
weak topology 63
weakly lsc function 64
weighted product 15
Wolfowitz theorem 154